AI wrapper vs AI agent: where the chatbot ends
Why an endpoint that forwards to OpenAI isn't a product, and when a wrapper really is enough. An operational definition of an agent and the four components that separate it from any linear call to an LLM.

I've spent the last few months reviewing projects that call themselves an "AI product", and most of them are the same thing over and over again. A form, a fixed prompt, a call to OpenAI or a gateway, and the result gets rendered straight to the screen. I call that a wrapper, not as an insult. It's a valid architecture when it fits, but confusing it with a real agentic product is what gets a lot of teams stuck.
This series is about how you cross that line. Before I get into frameworks, observability, or pricing, I want to lay out the map.
The question that separates a wrapper from an agent
There's a simple question that almost always makes it clear which side you're on: who decides how many steps it takes to solve the user's request, your code or the model?
If the answer is your code, you have a wrapper. The flow is linear, the logic lives outside the model, the LLM does a text-to-text transformation, and the rest of the system looks a lot like a traditional API. Take a product description generator. The user uploads a photo, your backend sends the image and a prompt to the model, gets the text back, and stores it. That's a wrapper, and it's perfectly fine if the problem fits that shape.
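To make the shape concrete, here's roughly what that generator looks like as a wrapper, sketched with the official openai SDK. The model name and prompt are placeholders, not recommendations:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One linear step: image in, text out. No loop, no tools, no state.
async function describeProduct(imageBase64: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Write a product description for this photo." },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${imageBase64}` } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```

The number of model calls is exactly one, and it's decided before the request ever arrives. That's the whole point.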
If the answer is the model, you have something else. The LLM decides which tool to call, in what order, when to stop, when to ask for more context, when to give up. Your code is reduced to giving it the loop, the tools, and the limits. That's an agent, even if it's a small one.
The difference looks subtle on paper, but it breaks almost everything you know from classical programming. Latency stops being predictable. Cost per request can vary by an order of magnitude between two sessions that look identical on the surface. Logs stop being enough and you need traces. The usual tests don't catch regressions because the output isn't deterministic. And you have to protect yourself from things that wouldn't even come up in a wrapper, like the agent getting stuck in a loop, burning through the budget in a single session, or running a destructive tool because of a prompt injection.
When a wrapper is enough
I don't want to paint the wrapper as an inferior architecture. There are three situations where it's the right choice.
The first is when the problem is turning an input into an output and nothing else. Summarizing text, translating, classifying, extracting entities, rewriting an email. There's no decision about the path, only about the result.
The second is when the sequence of steps is stable and known. A simple RAG system where I search against an embeddings database, put the context into a prompt, and return an answer. I define the order of the steps; the model just writes. There's a sketch of this shape right after these three cases.
The third is when error tolerance is high and responsibility stays with the user. Autocomplete, a tag suggestion, an imperfect placeholder. If the model gets it wrong, the cost is zero.
If your project falls into any of those three, save yourself the complexity of an agent. You're going to suffer a lot building what you don't need, and you won't get anything back from it.
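Here's the sketch of that second case I promised. `searchEmbeddings` stands in for whatever vector store you actually run; what matters is that my code decides every step, in a fixed order:

```ts
import OpenAI from "openai";

// Hypothetical vector-store helper; swap in pgvector, Pinecone, whatever you use.
declare function searchEmbeddings(query: string, topK: number): Promise<string[]>;

const client = new OpenAI();

async function answerWithRag(question: string): Promise<string> {
  // Step 1: my code decides to search. Always. The model has no say.
  const chunks = await searchEmbeddings(question, 5);

  // Step 2: my code assembles the prompt from the retrieved context.
  const context = chunks.join("\n---\n");

  // Step 3: the model writes. One call, and the pipeline is done.
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```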
When the wrapper falls short
The problem shows up when you start patching the wrapper so it can do things it was never meant to do. These are the symptoms I know well.
The prompt keeps growing. Every new case adds two paragraphs of instructions, an example, an exception. You get to 4,000 tokens of system prompt and the model gets more confused, not less.
You start parsing the response to decide what to do next. If it says "X", call this API; if it says "Y", do the other thing. That's a janky agent hiding in plain sight, with the control loop scattered between your code and the prompt. I'll sketch that pattern right after the symptoms.
The user asks you to "just do it on its own". They want to upload a file, say what they need, and come back twenty minutes later for the result. Your wrapper can't wait, resume context, and chain tasks together. You need a system that keeps state across turns and makes decisions on the fly.
When two or more of those symptoms show up, you've gone from wrapper to agent by accident, and in the worst possible way, with logic out of control and no observability.
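Here's a caricature of that second symptom. `runSearch` and `createTicket` are hypothetical helpers; what matters is the string matching that quietly became your control loop:

```ts
// Hypothetical tools; the problem is the branching, not the functions.
declare function runSearch(query: string): Promise<string>;
declare function createTicket(text: string): Promise<string>;

async function handleReply(reply: string): Promise<string> {
  if (reply.includes("SEARCH")) {
    // Hope the model also emitted a query we can regex out of the prose.
    const query = reply.match(/SEARCH:\s*(.+)/)?.[1] ?? "";
    return runSearch(query);
  }
  if (reply.includes("CREATE_TICKET")) {
    return createTicket(reply);
  }
  // No keyword matched. Retry? Give up? Nobody decided.
  return reply;
}
```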
Operational definition of an agent
To keep this from staying abstract, here's the definition I'll use for the rest of the series.
An agent is a system with four components at minimum. I'm going to break them down one by one in later posts, and there's a minimal sketch tying all four together just below:
Control loop. The code repeats model invocations until a stop condition is met. That condition can be a final answer, exhausting a turn budget, or a timeout.
Tools. The model doesn't just transform text, it decides which external function to call. A function can be a search, a database query, a POST to an API, a file operation. Your code defines them and describes them in a schema the model understands.
Memory. The state of the conversation beyond the current prompt. Previous messages, results from earlier tools, facts extracted during the session. That memory persists for the whole session and, in more advanced systems, across different sessions.
Guardrails. Hard limits that the loop respects without asking the model for permission. Token budget, maximum number of turns, list of allowed tools, output validation. Without these, an agent in production is a bomb.
Any one of those four is enough to distinguish an agent from a wrapper. Having all four is what separates a toy agent from one that can survive production.
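To tie the four together before the deep dives, here's a minimal sketch in raw TypeScript using OpenAI's tool-calling API. The tool, the model name, and the turn budget are illustrative; the next post builds a real version of this loop:

```ts
import OpenAI from "openai";
import type { ChatCompletionMessageParam, ChatCompletionTool } from "openai/resources/chat/completions";

const client = new OpenAI();

// Tools: my code defines and executes them; the model only chooses.
const tools: ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "search_docs",
      description: "Search the internal docs and return matching snippets.",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  },
];

declare function searchDocs(query: string): Promise<string>; // hypothetical implementation

const MAX_TURNS = 8; // guardrail: a hard budget the loop enforces, not the model

async function runAgent(task: string): Promise<string> {
  // Memory: the message history is the session state, fed back every turn.
  const messages: ChatCompletionMessageParam[] = [{ role: "user", content: task }];

  // Control loop: repeat until the model answers or the budget runs out.
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const response = await client.chat.completions.create({ model: "gpt-4o-mini", messages, tools });
    const message = response.choices[0].message;
    messages.push(message);

    // Stop condition: no tool calls means the model produced a final answer.
    if (!message.tool_calls?.length) return message.content ?? "";

    for (const call of message.tool_calls) {
      if (call.type !== "function") continue;
      // Guardrail: only tools in our schema run; anything else is rejected.
      const result =
        call.function.name === "search_docs"
          ? await searchDocs(JSON.parse(call.function.arguments).query)
          : "Error: tool not allowed";
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
  return "Stopped: turn budget exhausted.";
}
```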
The price of crossing into the agentic side
This isn't free. Before you jump in, it's worth knowing what you lose when you move from wrapper to agent.
You lose determinism. Two sessions with the same input can take different paths. That forces you to build evals, not tests, and to accept that you're never going to have a binary green or red again.
You lose cost predictability. An agent can spend $0.02 or $4 on the same task depending on how much context it accumulates, how many tools it calls, and whether it drifts into extended reasoning. Without per-session budgets, one user can burn through your plan a hundred times over in an afternoon.
You lose the ability to debug the way you used to. A stack trace doesn't tell you why the agent decided to call the search tool three times before answering. You need traces with spans, you need to annotate the inputs and outputs of every turn and every tool, and you need to review whole sessions as if they were replays.
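What that looks like in practice, sketched with the OpenTelemetry JS API; the span and attribute names are mine, not any standard:

```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("agent");

// Wrap one turn of the loop in a span and annotate what went in and out,
// so a whole session can be reviewed later like a replay.
async function tracedTurn<T>(turn: number, input: string, run: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(`agent.turn.${turn}`, async (span) => {
    span.setAttribute("agent.turn.input", input);
    try {
      const output = await run();
      span.setAttribute("agent.turn.output", JSON.stringify(output));
      return output;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```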
What you get in return is real capability. Tasks that don't fit in a fixed prompt. Products a user can delegate to and come back later to pick up. Defensibility against the next free feature OpenAI ships, because your value isn't in the model call but in the system around it.
Why I'm starting this series now
The blog already has quite a few posts about what I'd call AI infrastructure: comparisons of gateways like OpenRouter or Vercel AI Gateway, hardening chats with budgets and rate limits, image generation with Replicate, post narration with TTS. They're all valid pieces, almost all of them wrappers in the strict sense, and most are well solved as wrappers because the problem fit that shape.
What's missing from the blog is the conversation about what happens when the problem doesn't fit. When it makes sense to jump to an agentic system, which framework to choose, how to observe it, how to evaluate it, what pricing it can support, and what moat you still have when wrappers get commoditized layer by layer.
This is going to be a journey along three axes in parallel. The first is conceptual, going deep into each component. The second is a practical roadmap for an agent I'm going to build and document as a public build log while I do it. The third is strategic, looking at product and market, because choosing the right technical architecture doesn't help if the product angle can't hold up for six months.
The next post opens with the minimum anatomy of an agent, no framework, just raw TypeScript, so it's clear what's underneath before choosing abstractions.
Another entry in the "From wrapper to agent" series. The next post is "Building an agent from scratch in TypeScript".
