The AI delegation dilemma
AI-assisted programming has rapidly become commonplace in software development. This is largely due to the rise of large language models (LLMs), which bring with them the promise of improved developer productivity and faster delivery. AI tools can excel when used surgically: fix this bug, document this function, generate a series of test cases. These bite-sized tasks are tightly scoped, so they are easy to specify and, just as importantly, easy to validate.
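To see why these surgical tasks work so well, consider a hedged sketch: the slugify helper and its checks below are hypothetical, but a task at this scale is trivial to describe in a prompt and just as trivial to verify at a glance.

```typescript
// A tightly scoped request: "write test cases for this function."
// slugify is a hypothetical example, but the shape of the task is
// typical: small input, small output, trivially verifiable.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
    .replace(/^-+|-+$/g, "");    // strip leading/trailing hyphens
}

// The kind of checks an LLM can reliably produce at this scale:
console.assert(slugify("Hello, World!") === "hello-world");
console.assert(slugify("  Account  Settings ") === "account-settings");
```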
But once you ask AI to build larger components—like an entire authentication flow, a service layer, or even a full-stack CRUD application—you’re not just using AI to assist; you’re delegating work to AI. This shift to delegation introduces a subtle but significant challenge.

Imagine you ask an AI to build a complete feature, such as a new account settings page in your web application. The LLM produces a correct-seeming result very quickly, which feels like magic. You test its output and find that a few things are off and require fixing. You go back to the LLM and ask for a few tweaks. This fixes some things but causes others to regress. You try again, attempting to be more specific. Before you know it, you’re drowning in prompt revisions or sifting through code that doesn’t quite fit your architecture or behave exactly as you intended.
This is where the constraint of AI delegation emerges: what we refer to as the AI delegation dilemma.
A triangle of trade-offs
The more you ask the AI to do, the more you find yourself navigating a three-way trade-off between the following priorities:
- Design Fidelity: The design and functionality of the code reflects your desired intent—not the AI’s assumptions
- Prompt Efficiency: Writing the prompts requires less effort than writing the code yourself
- Output Usability: There is minimal post-generation cleanup or rewriting of the output code required of you
You can optimize for at most two at a time. Much like the classic “Fast, Cheap, Good: pick two” triangle, this model shows the constraints of AI-generated code:
- If you want code that matches your intended design (Design Fidelity) and generated results that are ready to use (Output Usability), you’ll spend substantial time writing and revising suitable prompts (loss of Prompt Efficiency).
- If you want a quick prompting experience (Prompt Efficiency) and don’t want to spend time revising the output (Output Usability), you’ll have to let the AI make design decisions and assumptions for you (loss of Design Fidelity).
- If you want to stay in control (Design Fidelity) and write prompts quickly (Prompt Efficiency), you will have to make extensive revisions to the AI’s output (loss of Output Usability).
We call this triangle of trade-offs the AI delegation dilemma. Inherent to the dilemma is the constraint that ‘pick three’ isn’t possible. There are several reasons why.
The roots of the dilemma
Understanding why the AI delegation dilemma is unavoidable requires examining how modern LLMs work and the limitations one quickly runs into when delegating work to them.
The nature of LLMs and code generation
LLMs operate by predicting the most probable sequence of tokens based on patterns in their training data. That data is vast, but predicting the most probable continuation invariably pulls the output toward the generic; in this sense LLMs behave as “stochastic parrots.” For code generation, additional techniques such as reinforcement learning are used to hone the output of LLMs and produce code that humans favor. But none of this grants genuine comprehension: an LLM has no understanding of your specific project’s nuances, architectural philosophy, organizational constraints, or business logic beyond what is explicitly relayed to it.

The result is that LLMs produce plausible code, but not necessarily code that is correct or optimal for a specific context. This is why LLMs are prone not only to drifting from your intent, but also to hallucinating code that is entirely believable yet completely wrong (ask an LLM to write code for you and it won’t be long before it uses a nonexistent function or imports a fake-but-real-sounding library).
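The pattern is easy to reproduce. The snippet below is a fabricated illustration (the package and function names are deliberately invented, because that is exactly what hallucinated output looks like):

```typescript
// Illustrative only: "auth-toolkit-pro" and validateSession are invented.
// The code reads as plausible, type-checks in the LLM's imagination, and
// fails the moment you run `npm install`.
import { validateSession } from "auth-toolkit-pro"; // no such package

export async function getAccountSettings(userId: string) {
  // A plausible name and signature, corresponding to nothing real.
  const session = await validateSession(userId, { refresh: true });
  return session.preferences;
}
```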
The cost of context and implicit knowledge
Human developers implicitly carry vast amounts of context (team conventions, project history, domain-specific nuances, unspoken requirements, security best practices) that are difficult or impossible to convey fully in a prompt. AI can typically only glimpse a small portion of this through what is conveyed in a prompt or supplemental file, often leading to generic or misaligned output.
The other issue regarding context acquisition by LLMs is that they suffer from anterograde amnesia, the inability to form new memories. OpenAI co-founder and former Tesla AI director Andrej Karpathy describes this in a talk on the role of AI in software development:
LLMs also suffer from anterograde amnesia. I’m alluding to the fact that if you have a coworker who joins your organization, this coworker will over time learn your organization and they will understand and gain a huge amount of context about the organization. And they go home and they sleep and they consolidate knowledge. They develop expertise over time. LLMs don’t natively do this and this is not something that has really been solved in the R&D of LLMs.
Context windows are really like working memory and you have to program the working memory quite directly because they don’t get smarter by default. I think a lot of people get tripped up by the analogies in this way.
In popular culture I recommend people watch these two movies: Memento and 50 First Dates. In both of these movies the protagonists have their weights fixed and their context windows get wiped every single morning. It’s really problematic to go to work or have relationships when this happens, and this happens to them all the time.
The very essence of the dilemma is that the complete context for the required features and functionality (the design fidelity) must be relayed to the LLM (at the cost of prompt efficiency) or else post-generation revisions are required (at the cost of output usability). Achieving maximal design fidelity through prompting demands a level of detail that effectively mirrors writing the code itself. To get truly bespoke, usable code quickly, you’d need an AI that perfectly understands your intricate design intent and all necessary context without explicit instruction—a capability LLMs simply do not possess.
This means a sacrifice must always be made: in the ease of prompting, in the correctness and usability of the output, or in design control, settling for generic or somewhat misaligned results. Be prepared to be highly specific in your instructions to an LLM, and even then, expect to spend significant time on follow-up prompts or manual code revisions to achieve the results you want.
Navigating the dilemma
The AI delegation dilemma means that the LLM-wielding software developer must make a series of trade-offs when opting to delegate work to an LLM. There are several approaches to navigating the dilemma, and the best one largely depends on the nature of the work the developer is performing.
Picking your two priorities wisely
Each pair of priorities requires a different set of expectations and leads to distinct usage of the LLM.
Get a working prototype quickly: Prompt Efficiency + Output Usability (sacrificing Design Fidelity)
For quick, immediately usable results without concern for the specifics, you can write concise prompts and embrace generic solutions. This approach works for rapid prototyping. Accept that the AI will make design and functionality assumptions, and budget for potential refactoring if the code later becomes critical.
The downside of this approach is obvious: you’re going to get generic results that don’t necessarily do what you’d ideally want them to do. Depending on the brevity of your prompts and your willingness to accept whatever the LLM produces, you might end up with what is often referred to as “AI slop”: low-quality output that is immediately recognizable as automatically generated. Use this approach for production code at your peril.
Produce a useful starting point: Design Fidelity + Prompt Efficiency (sacrificing Output Usability)
If you need to maintain tight control over the design while keeping prompting quick, view the AI as a brainstorming partner generating a rough draft. This means using concise prompts to convey your intent, but being ready for extensive manual editing and cleanup of the AI’s output, utilizing it primarily for ideation or as a conceptual starting point.
This is a path to getting the results you actually want while shaving away some of the tedium in a new project: writing boilerplate code beyond what the framework provides for you, setting up configuration, or creating template files that will be filled in later.
With this approach, the downside—or, depending on your perspective, upside—is that you’re only using the AI for a boost, recognizing that most of the work will be done by you, the software developer.
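As a hedged illustration of this mode, a one-line prompt such as “scaffold a typed config loader for a Node service” might yield a draft like the following. Every name and default here is a stand-in for whatever the LLM guesses; revising those guesses to match your project is precisely the manual work this approach accepts.

```typescript
// A rough draft of the sort an LLM produces from a one-line prompt.
// The environment variable names and defaults below are the model's
// guesses, not your team's conventions; they are what you would rewrite.
interface AppConfig {
  port: number;
  databaseUrl: string;
  logLevel: "debug" | "info" | "warn" | "error";
}

export function loadConfig(): AppConfig {
  return {
    port: Number(process.env.PORT ?? 3000),
    databaseUrl: process.env.DATABASE_URL ?? "postgres://localhost:5432/app",
    logLevel: (process.env.LOG_LEVEL ?? "info") as AppConfig["logLevel"],
  };
}
```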
Create comprehensive specifications for maximal precision: Design Fidelity + Output Usability (sacrificing Prompt Efficiency)
When demanding usable code that aligns with your design and functional requirements, be prepared for substantial effort in writing detailed prompts iteratively. This means providing explicit examples, defining all constraints, and enabling “thinking modes” that let the LLM produce a better result at the cost of more time.
Like the preceding approach, this tactic seeks to maintain Design Fidelity over the results, except that it puts the onus of implementation on the LLM rather than a human developer. The task of the human is to supply all of the relevant details of the look, feel, and behavior of the software in English (or whatever natural language is chosen).
The downsides of this approach are the burden of relaying a level of detail that coerces the LLM into producing the correct results, along with the need to subsequently review and test the extensive output. As mentioned earlier, prompting with enough detail to achieve the highest level of fidelity converges with the effort required to write the code itself. Moreover, the limitations of the context window that an LLM operates under mean that very large prompts are often at odds with good performance and cost. So even when you can provide ample context, delegating the task to the LLM may be impractical.
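One way to spend that prompt-writing effort productively, sketched below with hypothetical names, is to encode part of the specification as types and worked examples rather than prose alone, and hand the bundle to the LLM along with your written requirements.

```typescript
// Hypothetical excerpt from a detailed prompt: rather than describing
// the behavior only in prose, you pin it down with a type signature and
// worked examples, and ask the LLM for an implementation satisfying both.

/** Spec: normalize a user-entered display name for the settings page. */
export type NameNormalizer = (raw: string) => string;

// Worked examples pasted into the prompt; they double as acceptance tests.
export const examples: Array<[input: string, expected: string]> = [
  ["  ada LOVELACE ", "Ada Lovelace"],
  ["GRACE hopper", "Grace Hopper"],
];
```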
Avoid the dilemma entirely

Sometimes the best way to navigate the dilemma is to avoid it altogether. When working on critical code with hyper-specific requirements, on large code bases that demand extensive contextual knowledge, or when integrating with existing complex systems, it is best to avoid delegating large amounts of work to an LLM.
For example, in a well-established code base, delegating a large refactoring task to an LLM can result in significant deviation from existing conventions, the introduction of new assumptions, or other undesirable changes. If you’re inclined to use an LLM for such a task, the safer path is to have it assist with refactoring smaller pieces while you steer the high-level process yourself.
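As a hedged sketch of what that narrower scope looks like (the function names here are hypothetical), you might hand the LLM a single callback-based function and ask for an async/await version, keeping the larger structural decisions for yourself.

```typescript
// A narrowly scoped delegation: convert one callback-based function to
// async/await while you steer the overall refactor yourself. The names
// are hypothetical; the narrow scope is the point.
import { readFile } from "node:fs";
import { readFile as readFileAsync } from "node:fs/promises";

// Before: the single function handed to the LLM.
function loadUserData(
  path: string,
  cb: (err: Error | null, data?: string) => void
) {
  readFile(path, "utf8", (err, data) => (err ? cb(err) : cb(null, data)));
}

// After: a result small enough to review line by line against the original.
async function loadUserDataAsync(path: string): Promise<string> {
  return readFileAsync(path, "utf8");
}
```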
Keep in mind that with large volumes of work delegated to an AI, the burden of review and revision falls on you, the human developer. Just as a senior engineer may sometimes avoid delegating a critical task to a junior engineer because of the overall burden of more laborious review, avoiding delegation to an AI will in many instances be the best way to get the highest-quality result for the lowest total effort.
Takeaways
AI-assisted programming brings with it the potential for additional productivity, but it also introduces the AI delegation dilemma. This fundamental trade-off between Design Fidelity, Prompt Efficiency, and Output Usability means that when offloading larger programming tasks to LLMs, you can achieve at most two out of the three priorities. This isn’t a flaw in the tools themselves, but a consequence of their probabilistic nature and their inherent lack of complete contextual understanding of your specific project.
Recognizing this dilemma allows you to move beyond the initial “magic” of AI code generation and adopt a more strategic approach. Whether you choose to spend time on detailed prompting for precise output, accept generic solutions for rapid prototyping, or use AI as a quick ideation partner for later refinement, understanding the trade-offs is imperative. And for truly critical, complex, or deeply integrated work, sometimes the wisest choice is to sidestep the dilemma altogether and forgo LLMs for wholesale code generation.
Christian Charukiewicz is a Partner at Foxhound Systems, a small US-based software engineering agency building outstanding software systems. Need help with a project? Contact us.