Coding with LLMs is not easy. I’ve been deeply involved lately in using large language models (LLMs) for coding, inspired by Andrej Karpathy’s idea of “vibe coding,” where you simply give in to the vibes and let the LLM handle the coding details; even errors get fed straight back into the LLM for fixes rather than being debugged by hand.

What is Vibe Coding?
A key part of vibe coding is that the user accepts the generated code without necessarily understanding every detail. AI researcher Simon Willison emphasizes this clearly, stating: “If an LLM wrote every line of your code, but you’ve reviewed, tested, and understood it all, that’s not vibe coding in my book—that’s using an LLM as a typing assistant.” This approach works particularly well for greenfield projects initiated directly with AI-driven methods.
From Vibe Coding to Real-World Applications
Initially, coding with LLMs sounds revolutionary and straightforward: just give a prompt, get working code. Tools like Cursor promise exactly that. And while it does feel incredible to see code appear effortlessly, the reality of integrating LLM-generated code into complex, real-world products is far more nuanced.
There’s a fundamental difference between using an LLM to write isolated functions and integrating entire features into a live product. When working on real-world applications, such as an itinerary planner, a reservation system, or a payment-gateway integration, context becomes everything. Providing proper context to an LLM isn’t as simple as saying, “Write a function for XYZ.” You must thoroughly communicate your project’s structure, business logic, and specific constraints. This shift from unit-level coding to feature-level coding is critical, and it is challenging.
My approach has evolved significantly: I now treat LLMs as collaborative partners rather than simple code generators. Instead of immediately asking for code, I begin by brainstorming the feature with the LLM to make sure it understands both the functionality and the context. After this discussion, I instruct the LLM to create a spec file that documents the feature and the brainstorming context. This spec is then used to generate additional prompts, which in turn guide the code generation. I’ve found that a few starter prompts work well; for example, I discovered this one on Harper Reed’s blog:
Ask me one question at a time so we can develop a thorough, step-by-step spec for this idea. Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let’s do this iteratively and dig into every relevant detail. Remember, only one question at a time. Here’s the idea:
Reasoning models are particularly helpful here, asking the right questions and forcing clarity before code generation even begins.
My experience
When developing a lead-scoring feature, I initially provided exhaustive context about numerous parameters. Each parameter had several potential values, each with different scores. The excessive initial context overwhelmed the LLM, resulting in unclear and overly complicated code filled with redundant or misunderstood logic. Realizing this, I refined my approach by limiting the initial scope to just two or three parameters clearly outlined with their scoring rules. Once the LLM understood the pattern clearly, scaling up to additional parameters became straightforward.
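To make the “start small” approach concrete, here is a minimal sketch of the kind of pattern I mean, with just two hypothetical parameters and explicit scoring rules (the parameter names and scores are illustrative, not the actual feature):

```typescript
// Illustrative only: two hypothetical lead-scoring parameters with explicit rules.
type Lead = {
  companySize: "small" | "medium" | "large";
  industry: "saas" | "ecommerce" | "other";
};

const companySizeScores: Record<Lead["companySize"], number> = {
  small: 5,
  medium: 10,
  large: 20,
};

const industryScores: Record<Lead["industry"], number> = {
  saas: 15,
  ecommerce: 10,
  other: 0,
};

// Once the LLM reproduces this pattern cleanly, adding further parameters
// is just a matter of repeating the same shape.
function scoreLead(lead: Lead): number {
  return companySizeScores[lead.companySize] + industryScores[lead.industry];
}
```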
Another challenging experience involved refactoring duplicated types into a shared package. Initially, I attempted to refactor all duplicated types at once. The LLM struggled, creating overly complex shared types, mixing unnecessary fields, and introducing errors. Recognizing my mistake, I shifted to refactoring incrementally—one type at a time—carefully verifying after each step. This allowed the LLM to clearly understand each specific context, resulting in clean and manageable code.
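As a rough illustration of what “one type at a time” looked like (the package name, path, and fields below are hypothetical), the first step moves a single duplicated type into the shared package and updates its consumers before anything else is touched:

```typescript
// packages/shared/src/reservation.ts (hypothetical path)
// Step 1: move exactly one duplicated type into the shared package.
export interface Reservation {
  id: string;
  guestName: string;
  startDate: string; // ISO date
  endDate: string;   // ISO date
}

// Step 2, in each consumer: delete the local copy and import the shared type,
//   import { Reservation } from "@acme/shared";
// then verify the build and tests before touching the next duplicated type.
```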
Product Hat, Developer Hat
One crucial insight from these experiences is that a developer needs to clearly separate their roles when working with an LLM. Initially, adopting a product-focused perspective—discussing only the product, its requirements, and desired outcomes without worrying about technical details—is essential. Once this product-level clarity is achieved, transitioning into a developer-focused mindset to explicitly provide technical context and constraints ensures the LLM can effectively assist in code generation.
Fast Until They Slow You Down
The biggest advantage I’ve noticed from this approach is speed. AI-generated code significantly accelerates development, enabling rapid iterations. However, speed without control can quickly turn into a disadvantage. If context and prompts are not managed properly, rapid output from an LLM can lead to confusion and inefficiency. Iterative corrections become tedious if the initial prompt was unclear or too extensive. Therefore, while speed is a substantial advantage, you must actively manage the AI’s workflow to avoid getting overwhelmed.
Good Documentation: Your LLM’s Best Friend
Documentation is critical. Every spec file, LLM-generated prompt, and blueprint must be stored in the same code repository as the code it describes. These become invaluable artifacts that continuously help the LLM generate better and more accurate output. Remember, English is the new coding language.
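In practice this simply means the planning artifacts live next to the code they describe; a layout along these lines (the names are purely illustrative) is enough:

```
docs/
  specs/lead-scoring.md     # spec distilled from the brainstorming session
  prompts/lead-scoring.md   # prompts generated from that spec
src/
  ...
```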
Greenfield and Brownfield
There is also a distinct contrast between greenfield and brownfield projects. Greenfield projects, initiated entirely through LLM-driven methods, benefit significantly from clarity, speed, and streamlined development. For instance, I recently started a greenfield project purely from markdown spec files and prompts. The AI quickly handled project initialization, dependency management, and basic scaffolding, making the initial phases extremely efficient.
Conversely, brownfield projects require precision in providing context, particularly since existing documentation is often sparse or outdated. Carefully identifying specific files or code points—such as initial README files, package.json, or key configuration files—is crucial for effective collaboration. Without precise targeting, the LLM can easily become confused and generate incorrect or extraneous code.
Various tools enhance this workflow. Cursor, for example, has a docs feature that lets developers pull external documentation into the context, and Cursor rules let you set specific guardrails, such as preferring Vitest over Jest, using defined types instead of `any` or `unknown`, and enforcing consistent coding practices. Another helpful tool is RepoMix, which generates a detailed markdown context file summarizing the project structure, making the LLM more effective.
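To give a sense of what such guardrails can look like, here is an illustrative sketch of a project-level rules file; the file name and wording are only an example, not copied from any real project:

```
# .cursorrules (illustrative)
- Use Vitest, not Jest, for all new tests.
- Do not use `any` or `unknown`; define explicit types instead.
- Follow the existing folder structure and naming conventions when adding features.
```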
Reality Check
Despite the optimistic portrayals in YouTube demos and on Twitter, the truth remains nuanced. Tools like Cursor offer impressive capabilities, but they’re not magical. The quality of the output relies heavily on your ability to provide the right amount of context: enough to clarify the task, but not so much that it overwhelms the LLM. It’s about carefully managing the dialogue, anticipating potential misinterpretations, and ensuring every step is deliberately crafted.
Conclusion
Currently, I am not fully embracing vibe coding; I’m somewhere between AI-assisted coding and vibe coding. I’m cautious about fully adopting vibe coding for critical real-world projects due to inherent risks. However, I actively explore vibe coding approaches in proof-of-concept projects or side projects where I can safely experiment, learn, and better understand its potential and limitations.
And honestly, that’s what makes this journey both challenging and exciting.