The anatomy of an AI agent


An AI agent is software you hand a goal to, and it works through the steps to reach it on its own, using the tools and data you already have. The phrase doing the heavy lifting is "on its own." A chatbot waits for your next instruction. An agent takes the objective and keeps going until the job is done.

[Figure: The whole picture. A goal goes in, the brain reasons over memory and acts through tools, and the loop runs the cycle until the goal is met.]

Underneath, an agent is still a language model. The difference is everything wrapped around it. A bare model answers once and stops. An agent gives that model a goal, a memory, a set of tools, and a loop, so it can take an action, see what happened, and decide what to do next. That wrapper is the whole game.

Strip away the noise and every agent is the same five parts, working together. Name them and you can tell a real agent from a thin wrapper, judge whether a vendor's demo will survive contact with your business, and see where one would actually pay off. Here they are, one at a time.

Goal: the objective, the rules, and the definition of done

[Figure: Goal. The brief, the rules, and a definition of done it can check against.]

Everything starts here. The goal is the job to be done, briefed the way you would brief a capable new hire: the outcome you want and the constraints it has to follow. "Reconcile yesterday's invoices against the bank feed and flag anything that doesn't match" is a goal. So is "draft replies to support tickets, but escalate anything about a refund to a person."

In practice the goal lives in two places. There is the task you hand it in the moment, and there is the standing brief behind it, usually called the system prompt: the agent's role, its tone, the rules it must never break, and what counts as finished. Think of the system prompt as the job description and the task as today's ticket. Both shape what the agent does.

The part teams skip is the definition of done. An agent will happily declare victory on a fuzzy goal. Spell out success in terms it can actually check, "every invoice is either matched or flagged with a reason," and you give it a target to aim at. The teams who get real value go one step further and make the agent pass explicit checks before it can call the work finished, the same way you would not let a new hire close a ticket without a review. A vague goal produces a vague agent, and the clarity of the brief is the ceiling on the quality of the work.
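A checkable definition of done can be as small as one function. The sketch below encodes "every invoice is either matched or flagged with a reason" for the reconciliation example above. The `Invoice` shape and the names `unresolved` and `is_done` are hypothetical, chosen for illustration rather than taken from any framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invoice:
    # Hypothetical record shape, for illustration only.
    invoice_id: str
    matched: bool = False
    flag_reason: Optional[str] = None


def unresolved(invoices: list[Invoice]) -> list[str]:
    """Return IDs of invoices that are neither matched nor flagged with a reason."""
    return [
        inv.invoice_id
        for inv in invoices
        if not inv.matched and not inv.flag_reason
    ]


def is_done(invoices: list[Invoice]) -> bool:
    """Done means every invoice is either matched or flagged with a reason."""
    return not unresolved(invoices)


invoices = [
    Invoice("INV-1", matched=True),
    Invoice("INV-2", flag_reason="amount differs from bank feed"),
    Invoice("INV-3"),  # neither matched nor flagged: the work is not finished
]
print(is_done(invoices), unresolved(invoices))
```

An agent that has to pass `is_done` before it reports success cannot declare victory while `INV-3` is still open.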

Brain: the model that reasons, plans, and decides

[Figure: Brain. Reasoning over its context window, deciding when to call a tool.]

The brain is the model. It reads the goal, looks at the situation in front of it, and decides the next move. This is the piece that used to need a person: judgment, planning, working out what to do when the path isn't written down anywhere.

Everything the model can use at the moment it decides sits in its context window: a working desk of fixed size holding the goal, the relevant data, the steps so far, and the tools on offer. The model knows nothing beyond what is on that desk. And counterintuitively, more is not better. As the window fills, models tend to lose track of what sits in the middle of it, so a good agent curates what goes on the desk instead of dumping everything in. Managing that desk well is half of what separates a reliable agent from a flaky one.
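One simple way to picture curating the desk is packing by priority under a fixed budget. The sketch below is an illustration under loud assumptions: token counts are approximated by counting words, and the priorities are assigned by hand. Real agents use the model's own tokenizer and smarter relevance scoring.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def curate(items: list[tuple[int, str]], budget: int) -> list[str]:
    """Keep the highest-priority items that fit within the token budget."""
    chosen = []
    used = 0
    # Visit items from highest to lowest priority and skip any that overflow.
    for _priority, text in sorted(items, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


items = [
    (3, "Goal: reconcile yesterday's invoices"),
    (1, "Full email archive " * 50),  # large and low priority
    (2, "Bank feed rows for yesterday"),
]
desk = curate(items, budget=20)
print(desk)
```

With a budget of 20, the goal and the bank feed rows make it onto the desk and the bulky archive is left out.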

The model also decides when to act, not only what to say. Today's models are trained for tool use, often called function calling: instead of answering in prose, the model can emit a structured request like "call send_email with these arguments." That single capability is what lets the brain reach out of the chat and do something in the world. Model choice matters here too. A frontier model plans several moves ahead and recovers from surprises; a weaker one loses the thread, which is the gap between an agent that finishes and one that stalls.
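To show what a structured tool request means in practice, here is a minimal sketch of the receiving side. The JSON shape, the `dispatch` helper, and the stand-in `send_email` function are assumptions for illustration. Each model provider defines its own exact format for tool calls.

```python
import json


def send_email(to: str, subject: str) -> str:
    # Stand-in for a real email integration; it only reports what it would do.
    return f"queued email to {to}: {subject}"


# Registry of functions the model is allowed to call.
TOOLS = {"send_email": send_email}

# What a model's structured request might look like (the shape is an assumption).
model_output = (
    '{"tool": "send_email", '
    '"arguments": {"to": "ana@example.com", "subject": "Invoice 42"}}'
)


def dispatch(raw: str) -> str:
    """Parse the model's request and run the matching tool, if it is registered."""
    request = json.loads(raw)
    name = request["tool"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**request["arguments"])


result = dispatch(model_output)
print(result)
```

The model never runs anything itself. It only asks, and the surrounding code decides whether the named tool exists and then executes it.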

It is also the part people overestimate. The model reasons well and it can still be wrong. That is exactly why the other parts surround it with the right context, the right tools, and a loop that checks the work instead of trusting it blind.

Current models leading the charge. As of mid-2026, a handful of models sit at the frontier and trade the top spot constantly. Anthropic's Claude (Opus for the hardest reasoning, Sonnet for fast everyday work), OpenAI's GPT, and Google's Gemini are the names most often at the top, with xAI's Grok and DeepSeek close behind. No single one wins everything: one leads on coding, another on math, another on cost per run. Any of them can power a capable agent, so the right pick comes down to the job, the budget, and how much real reasoning the work needs.

Source: live model rankings at LLM Stats and LMArena, checked June 2026.

Memory: working context, short-term and long

[Figure: Memory. Short-term context, plus long-term recall pulled in on demand.]

A model on its own starts every conversation from zero. Memory is what stops that. It is the agent's working context: your data, the task so far, the steps it has already taken, and what it learned from them.

Short-term memory is that context window again, the working desk for the current job, keeping step ten aware of step one. But the desk is finite, so on a long task the agent has to summarize and prune as it goes, holding on to what matters and letting go of the rest, the way you keep notes on a months-long project instead of rereading every email.
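Summarize-and-prune can be sketched in a few lines: keep the most recent steps word for word and collapse everything older into a single note. In the sketch below `prune_history` and `naive_summary` are hypothetical names, and the summary function is a placeholder. In a real agent that step would typically be another model call.

```python
from typing import Callable


def prune_history(
    steps: list[str],
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Collapse older steps into one summary line and keep the latest ones verbatim."""
    if len(steps) <= keep_recent:
        return list(steps)
    split = len(steps) - keep_recent
    older, recent = steps[:split], steps[split:]
    return [summarize(older)] + recent


def naive_summary(older: list[str]) -> str:
    # Placeholder: a real summary would preserve what mattered in those steps.
    return f"Summary of {len(older)} earlier steps"


steps = [f"step {i}" for i in range(1, 11)]
pruned = prune_history(steps, keep_recent=3, summarize=naive_summary)
print(pruned)
```

Ten entries shrink to four, and step ten still has a trace of step one through the summary line.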

Long-term memory is how an agent knows your business and not just the world in general. The common technique is retrieval-augmented generation, or RAG: your documents, tickets, and records are split into chunks and turned into embeddings, numerical fingerprints of their meaning, then kept in a vector database. When the agent needs context, it searches that store for the few most relevant chunks and pulls only those onto the desk. It matches on meaning rather than keywords, so a question about "refunds" can surface a policy that only ever says "money back." That is how an agent cites your refund terms or last quarter's numbers without holding your whole company in its head.
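The retrieval step can be illustrated with a toy store. In the sketch below the three-number vectors are written by hand and only stand in for real embeddings, which come from an embedding model and have hundreds or thousands of dimensions. The stored sentences and the names `STORE` and `retrieve` are likewise made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 for vectors pointing the same way, near 0 if unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each chunk is stored next to its (hand-written, pretend) embedding.
STORE = [
    ("Customers can ask for their money back.", [0.9, 0.1, 0.0]),
    ("Office hours are on weekdays.", [0.0, 0.2, 0.9]),
    ("Invoices are issued monthly.", [0.3, 0.9, 0.1]),
]


def retrieve(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are closest in meaning to the query."""
    ranked = sorted(
        STORE, key=lambda item: cosine(query_vector, item[1]), reverse=True
    )
    return [text for text, _vector in ranked[:k]]


# Pretend embedding of the question "What is our refund policy?"
refund_query = [0.8, 0.2, 0.1]
top = retrieve(refund_query, k=1)
print(top)
```

The query never contains the words "money back," yet that chunk ranks first because its vector points in nearly the same direction as the query's.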

One thing worth setting straight, because it trips up almost everyone: this is memory, not learning. The model's underlying training does not change as the agent works. It gets better because you sharpen its instructions, give it better tools, or improve what sits in its memory, not because it quietly teaches itself overnight. An agent is only as current as the data you feed it.

Tools: the functions it calls to act on your systems

[Figure: Tools. Functions the agent calls, increasingly through a shared standard like MCP.]

Without tools an agent can only talk. Tools are how it acts: your CRM, your inbox, your database, your internal systems, the same software your team clicks through every day. Each one is a door you decide to open.

Technically, a tool is just a function the model is allowed to call, described in plain terms: what it does, what inputs it needs, what it returns. "create_ticket takes a title and a priority." "lookup_customer takes an email and returns the account." Hand an agent a few of these and the work turns concrete: it reads a Salesforce record, drafts a reply in Gmail, runs a query against your database, books a slot on a calendar, starts a refund in Stripe.
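The two examples above can be written down as data plus a function. This is a minimal sketch, assuming a hypothetical `Tool` record and stand-in implementations. Real frameworks usually describe inputs with JSON Schema rather than the plain-language type names used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str            # what it does, in plain terms
    parameters: dict[str, str]  # input name -> plain-language type
    run: Callable[..., dict]    # the function that does the work


def create_ticket(title: str, priority: str) -> dict:
    # Stand-in: a real version would call the ticketing system.
    return {"title": title, "priority": priority, "status": "open"}


def lookup_customer(email: str) -> dict:
    # Stand-in data; a real version would query the CRM.
    accounts = {"ana@example.com": {"account_id": "A-1"}}
    return accounts.get(email, {"error": "not found"})


TOOLS = [
    Tool("create_ticket", "Create a support ticket.",
         {"title": "string", "priority": "string"}, create_ticket),
    Tool("lookup_customer", "Return the account for an email address.",
         {"email": "string"}, lookup_customer),
]


def describe(tools: list[Tool]) -> str:
    """Render the tool list as text that could be placed in the model's context."""
    lines = []
    for tool in tools:
        params = ", ".join(f"{n}: {t}" for n, t in tool.parameters.items())
        lines.append(f"{tool.name}({params}) - {tool.description}")
    return "\n".join(lines)


print(describe(TOOLS))
```

The description is what the model reads when deciding what to call. The `run` function is what actually executes when it does.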

Wiring each of those connections by hand, for every agent and every system, is where most projects bog down. That is the problem the Model Context Protocol, or MCP, was built to solve. MCP is an open standard, introduced by Anthropic and now adopted across the major model providers, for how agents talk to outside systems. A system runs an "MCP server" that advertises the tools it offers, and any MCP-compatible agent can connect and use them with no custom integration. It is roughly a USB-C port for AI: build the connector once, and any agent can plug in.

It is worth being precise, because this is the question every technical person asks: MCP does not replace your APIs. It wraps them in a description the model can read and a format any agent can discover on its own. The API is the underlying capability; MCP is the standard envelope that makes it agent-ready and swappable. Because the standard is open, the same server works whether you run Claude, a model from another provider, or several at once, so your agent can reach Slack, GitHub, a data warehouse, or your own internal service through one interface, and you can grant or revoke a capability without rewiring anything.
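The "envelope" idea can be sketched without any SDK. MCP messages are JSON-RPC 2.0, with a `tools/list` method for discovery and a `tools/call` method for invocation. The handler below is a simplified, in-process imitation of that shape, not a working MCP server: it skips the transport, the initialization handshake, and most of the specification. `get_order_status` is a hypothetical stand-in for an API you already have.

```python
def get_order_status(order_id: str) -> str:
    # Stand-in for an existing internal API call (hypothetical).
    return f"order {order_id}: shipped"


# The description an agent discovers: a name, a purpose, and an input schema.
TOOL_LIST = [{
    "name": "get_order_status",
    "description": "Look up the shipping status of an order.",
    "inputSchema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]


def handle(request: dict) -> dict:
    """Answer a discovery or call request in a simplified MCP-like shape."""
    envelope = {"jsonrpc": "2.0", "id": request["id"]}
    method = request.get("method")
    if method == "tools/list":
        return {**envelope, "result": {"tools": TOOL_LIST}}
    if method == "tools/call":
        params = request["params"]
        if params["name"] != "get_order_status":
            return {**envelope, "error": {"code": -32602, "message": "Unknown tool"}}
        text = get_order_status(**params["arguments"])
        return {**envelope, "result": {"content": [{"type": "text", "text": text}]}}
    return {**envelope, "error": {"code": -32601, "message": "Method not found"}}


listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_order_status", "arguments": {"order_id": "42"}},
})
print(call["result"]["content"][0]["text"])
```

The underlying function is untouched. The handler only adds a description an agent can discover and a uniform way to invoke it.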

Tools are also where control lives. You decide which doors exist, which are read-only, and which need a human to approve before anything irreversible happens: sending the email, issuing the refund, deleting the record. A well-built agent is generous with read access and deliberately stingy with write access. It is exactly as capable, and exactly as bounded, as the tools you expose.
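That control can live in a small policy table checked before every tool call. The sketch below is one possible shape, with hypothetical tool names, access levels, and an `authorize` helper invented for illustration.

```python
from typing import Optional

# Hypothetical policy table: which doors exist and how tightly each is held.
POLICY = {
    "lookup_customer": "read_only",
    "draft_reply": "write",
    "issue_refund": "write_needs_approval",
}


def authorize(tool_name: str, approved_by: Optional[str] = None) -> str:
    """Decide whether a tool call may run right now."""
    level = POLICY.get(tool_name)
    if level is None:
        return "denied"            # no door was ever opened for this tool
    if level == "write_needs_approval" and not approved_by:
        return "pending_approval"  # park the call until a person signs off
    return "allowed"


print(authorize("lookup_customer"))
print(authorize("issue_refund"))
print(authorize("issue_refund", approved_by="maria"))
print(authorize("delete_record"))
```

Granting or revoking a capability is then a one-line change to the table, and anything not listed is denied by default.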

The loop: observe, decide, act, repeat

[Figure: The loop. Each pass feeds the result back in, until the goal is met.]

Here is the part that turns four components into something that earns the name. The agent observes the current state, decides the next step, acts through a tool, then reads the result and goes again. Observe, decide, act, repeat, until the goal is met.

Mechanically, each pass is concrete. The model reads the goal and the current context, decides to call a tool, and the tool runs. Its result, the rows from a query, an API response, an error message, gets written back into the context window. Now the model looks again, with new information, and picks the next move. This cycle of reason, act, observe, repeat is what researchers call the agent loop, and it is the whole difference between software that thinks once and software that works a problem.
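The pass described above fits in a short function. In this sketch `scripted_model` is a hard-coded stand-in for a real model call, `count_unmatched` returns a made-up number, and the action format is an assumption. The point is only the shape of the cycle: decide, act, write the result back, look again.

```python
def count_unmatched(day: str) -> int:
    # Stand-in tool with a made-up result; a real one would query the ledger.
    return 2


def scripted_model(context: list[str]) -> dict:
    # Stand-in for a model call: picks the next move from what is in the context.
    if not any(line.startswith("count_unmatched") for line in context):
        return {"type": "tool", "tool": "count_unmatched",
                "arguments": {"day": "yesterday"}}
    return {"type": "finish", "answer": context[-1]}


def run_agent(decide, tools: dict, goal: str, max_steps: int = 10):
    """Run decide -> act -> observe until the model finishes or the step cap is hit."""
    context = [f"goal: {goal}"]
    for _ in range(max_steps):
        action = decide(context)                      # decide the next move
        if action["type"] == "finish":
            return action["answer"], context
        result = tools[action["tool"]](**action["arguments"])  # act
        context.append(f"{action['tool']} -> {result}")        # observe
    return None, context                              # cap reached without finishing


answer, context = run_agent(
    scripted_model, {"count_unmatched": count_unmatched},
    goal="report unmatched invoices",
)
print(answer)
```

On the first pass the stand-in model calls the tool. On the second it sees the result sitting in the context and finishes.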

It is also the line between automation and an agent. Old automation, including the rule-based bots sold as RPA, runs a fixed script and breaks the moment reality shifts: a renamed field, an unexpected error, a case nobody foresaw. An agent adapts inside the loop. It reads the error, tries another path, and keeps moving toward the outcome.

That autonomy has a price, and it is worth naming. Every pass of the loop is a fresh call to the model, which means more steps mean more tokens, more cost, and more time. Part of building a good agent is reaching the goal in as few well-chosen steps as possible. It is also why you bound it: sensible agents run with guardrails, a cap on how many steps they take, checkpoints where a human signs off, and clear stopping conditions, so "until the goal is met" never turns into "forever." And for larger jobs the loop is often not one agent but several, a planner that breaks the work down and specialist agents that each own a piece and report back. The principle is the same, just nested.
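A little arithmetic shows why step count matters. The sketch below assumes the full context is sent again on every pass and grows by a fixed amount per step. Both are simplifications, and all the numbers are invented for illustration. Provider features such as prompt caching change the real bill.

```python
def total_input_tokens(steps: int, base_tokens: int, added_per_step: int) -> int:
    """Sum the input tokens over all passes, assuming the whole context is resent each time."""
    return sum(base_tokens + i * added_per_step for i in range(steps))


five_steps = total_input_tokens(steps=5, base_tokens=1000, added_per_step=500)
ten_steps = total_input_tokens(steps=10, base_tokens=1000, added_per_step=500)
print(five_steps, ten_steps)
```

Under these assumptions five steps read 10,000 input tokens and ten steps read 32,500, so doubling the steps more than triples the tokens. That is the case for a step cap and for reaching the goal in few well-chosen moves.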

This is where the work is heading: you set the goal, and the system runs the steps until it is done.

Where agents fit your operation

You don't need to build anything to use this. The next time someone pitches you an agent, walk the five parts. Is the goal sharp? What is the brain? What does it remember, and from where? Which tools can it actually touch? And is there a real loop, or just a single prompt wearing a costume?

The best place to start is rarely the flashiest. Look for work that is repetitive, rules-heavy, and light on judgment but not free of it: the tasks that quietly eat your team's week and follow a pattern a sharp new hire could learn. Triaging inbound, reconciling records, drafting first-pass responses, assembling the same report every Monday. High volume, a clear definition of done, and a low cost when a mistake slips through. That is where an agent pays for itself first, and it is usually where we begin.

How we'd find your first agents

You do not have to run this play alone. When we work with a team, the first four phases are pure discovery, no code, ending in a build-ready spec. The fifth phase is the build itself, sized to the number of agents we agree are worth it. Here is the shape of it.

Discovery

We map the work

We start in your operation, not in a model. We read your SOPs and sit down with the people doing the work, and map where the repeatable, rules-heavy hours actually go.

Discovery

We surface the candidates

From that map we draw up a shortlist of tasks that are prime for an agent, and score each one on volume, clarity, tool access, and the cost of a mistake.

Discovery

We map the tools

For the strongest candidates we confirm what an agent would need to reach, and whether those systems are available through an API or an MCP server. Feasibility before promises.

Discovery

We write the agent spec

We turn each candidate into a build-ready spec: the goal, the guardrails, the definition of done, and where a human stays in the loop. You finish discovery with a documented plan, not a hunch.

Deliverable · build-ready agent spec

Build

We decide, then build

We bring the shortlist and the specs to your leadership and decide together which agents are worth building. Then we build, in phases scaled to the number of agents, and ship them into your stack.

Curious whether your operation has agents worth building? Book a free working session and we will run through your operations together to find the strongest candidates. No commitment.

