{"id":3939,"date":"2025-10-18T13:27:17","date_gmt":"2025-10-18T13:27:17","guid":{"rendered":"https:\/\/violethoward.com\/new\/under-the-hood-of-ai-agents-a-technical-guide-to-the-next-frontier-of-gen-ai\/"},"modified":"2025-10-18T13:27:17","modified_gmt":"2025-10-18T13:27:17","slug":"under-the-hood-of-ai-agents-a-technical-guide-to-the-next-frontier-of-gen-ai","status":"publish","type":"post","link":"https:\/\/violethoward.com\/new\/under-the-hood-of-ai-agents-a-technical-guide-to-the-next-frontier-of-gen-ai\/","title":{"rendered":"Under the hood of AI agents: A technical guide to the next frontier of gen AI"},"content":{"rendered":"
Agents are the trendiest topic in AI today, and with good reason. AI agents act on their users' behalf, autonomously handling tasks like making online purchases, building software, researching business trends or booking travel. By taking generative AI out of the protected sandbox of the chat interface and allowing it to act directly on the world, agentic AI represents a leap forward in the power and utility of AI.
Agentic AI has been moving fast: For example, one of the core building blocks of today's agents, the model context protocol (MCP), is only a year old! As in any fast-moving field, there are many competing definitions, hot takes and misleading opinions.
To cut through the noise, I'd like to describe the core components of an agentic AI system and how they fit together: It's really not as complicated as it may seem. Hopefully, when you've finished reading this post, agents won't seem as mysterious.
Definitions of the word "agent" abound, but I like a slight variation on the British programmer Simon Willison's minimalist take:
*An LLM agent runs tools in a loop to achieve a goal.*
The user prompts a large language model (LLM) with a goal: say, booking a table at a restaurant near a specific theater. Along with the goal, the model receives a list of the tools at its disposal, such as a database of restaurant locations or a record of the user's food preferences. The model then plans how to achieve the goal and calls one of the tools; the tool returns a response, and on the basis of that observation, the model calls the next tool. Through repetitions of this loop, the agent moves toward accomplishing the goal. In some cases, the model's orchestration and planning choices are complemented or enhanced by imperative code.
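To make that loop concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: `call_llm` abstracts whatever model API the agent uses, and the two tools are toy functions, not any real framework's interface.

```python
# Minimal sketch of "tools in a loop". call_llm and both tools are
# hypothetical stand-ins, not any real framework's API.

def find_restaurants(near: str) -> list[str]:
    """Stand-in for a map/database tool."""
    return ["Pizza Roma", "Saffron House"]

def book_table(restaurant: str, time: str) -> str:
    """Stand-in for a reservation tool."""
    return f"Booked {restaurant} at {time}"

TOOLS = {"find_restaurants": find_restaurants, "book_table": book_table}

def call_llm(goal: str, history: list) -> dict:
    """Hypothetical model call. Returns either a tool invocation, e.g.
    {"tool": "find_restaurants", "args": {"near": "the theater"}},
    or a final result, e.g. {"answer": "Table booked for 7 pm."}"""
    raise NotImplementedError  # stand-in for a real model API call

def run_agent(goal: str) -> str:
    history = []
    while True:                                        # tools in a loop
        step = call_llm(goal, history)                 # model plans the next step
        if "answer" in step:                           # goal achieved: exit the loop
            return step["answer"]
        observation = TOOLS[step["tool"]](**step["args"])  # execute the chosen tool
        history.append((step, observation))            # feed the result back in
```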
But what kind of infrastructure does it take to realize this approach? An agentic system needs a few core components:
- **A way to build the agent.** When you deploy an agent, you don't want to have to code it from scratch. There are several agent development frameworks out there.
- **Somewhere to run the AI model.** A seasoned AI developer can download an open-weight LLM, but it takes expertise to do that right. It also takes expensive hardware that's going to be poorly utilized by the average user.
- **Somewhere to run the agentic code.** With established frameworks, the user creates code for an agent object with a defined set of functions. Most of those functions involve sending prompts to an AI model, but the code needs to run somewhere. In practice, most agents will run in the cloud, because we want them to keep running when our laptops are closed, and we want them to scale up and out to do their work.
- **A mechanism for translating between the text-based LLM and tool calls.**
- **A short-term memory** for tracking the content of agentic interactions.
- **A long-term memory** for tracking the user's preferences and affinities across sessions.
- **A way to trace the system's execution,** to evaluate the agent's performance.

Let's dive into more detail on each of these components.

## Building an agent

Asking an LLM to explain how it plans to approach a particular task improves its performance on that task. This "chain-of-thought reasoning" is now ubiquitous in AI.

The analogue in agentic systems is the ReAct (reasoning + action) model, in which the agent has a thought ("I'll use the map function to locate nearby restaurants"), performs an action (issuing an API call to the map function), then makes an observation ("There are two pizza places and one Indian restaurant within two blocks of the movie theater").

ReAct isn't the only way to build agents, but it is at the core of most successful agentic systems. Today, agents are commonly loops over the *thought-action-observation* sequence.

The tools available to the agent can include local tools and remote tools such as databases, microservices and software as a service. A tool's specification includes a natural-language explanation of how and when it's used and the syntax of its API calls.
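As an illustration, here is what such a specification often looks like in practice. The exact field names vary by framework; this sketch follows the JSON-Schema style that most function-calling APIs use, and the tool itself is hypothetical.

```python
# A representative tool specification (hypothetical tool; field names
# follow the common JSON-Schema style but vary across frameworks).
FIND_RESTAURANTS_SPEC = {
    "name": "find_restaurants",
    "description": (
        "Search for restaurants near a location. Use this when the user "
        "needs dining options close to a venue such as a theater."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "near": {"type": "string", "description": "Address or landmark"},
            "radius_m": {"type": "integer", "description": "Search radius, meters"},
            "cuisine": {"type": "string", "description": "Optional cuisine filter"},
        },
        "required": ["near"],
    },
}
```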
The developer can also tell the agent to, essentially, build its own tools on the fly. Say that a tool retrieves a table stored as comma-separated text, and to fulfill its goal, the agent needs to sort the table.

Sorting a table by repeatedly sending it through an LLM and evaluating the results would be a colossal waste of resources, and it's not even guaranteed to give the right result. Instead, the developer can simply instruct the agent to generate its own Python code when it encounters a simple but repetitive task. These snippets of code can run locally alongside the agent or in a dedicated secure code interpreter tool.
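For instance, instead of round-tripping the table through the model, the agent might emit and execute a throwaway snippet like the following. This is a sketch of the kind of code an agent could generate, not output from any particular system:

```python
# Sketch of the kind of throwaway snippet an agent might generate to sort
# a comma-separated table, rather than pushing it through the LLM repeatedly.
import csv
import io

def sort_table(text: str, column: str) -> str:
    rows = list(csv.DictReader(io.StringIO(text)))
    rows.sort(key=lambda r: float(r[column]))          # numeric sort on one column
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

table = "name,distance_km\nSaffron House,0.4\nPizza Roma,0.2\n"
print(sort_table(table, "distance_km"))   # Pizza Roma now comes first
```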
Tool use can divide responsibility between the LLM and the developer in different ways. Once the tools available to the agent have been specified, the developer can simply instruct the agent to use whatever tools it judges necessary. Or the developer can specify which tool to use for which types of data, and even which data items to use as arguments during function calls.

Similarly, the developer can simply tell the agent to generate Python code when necessary to automate repetitive tasks or, alternatively, tell it which algorithms to use for which data types and even provide pseudocode. The approach can vary from agent to agent.

## Runtime

Historically, there were two main ways to isolate code running on shared servers: containerization, which was efficient but offered lower security, and virtual machines, which were secure but came with a lot of computational overhead.

In 2018, Amazon Web Services' (AWS's) Lambda serverless-computing service deployed Firecracker, a new paradigm in server isolation. Firecracker creates "microVMs", complete with hardware isolation and their own Linux kernels but with reduced overhead (as low as a few megabytes) and startup times (as low as a few milliseconds). The low overhead means that each function executed on a Lambda server can have its own microVM.

However, because instantiating an agent requires deploying an LLM, together with the memory resources to track the LLM's inputs and outputs, the per-function isolation model is impractical. Instead, with session-based isolation, every session is assigned its own microVM. When the session finishes, the LLM's state information is copied to long-term memory, and the microVM is destroyed. This ensures the secure and efficient deployment of hosts of agents.
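That lifecycle is easy to sketch. Note that `microvm_client` and `memory_store` below are invented placeholders used purely to illustrate the flow just described; this is not the Firecracker or Lambda API:

```python
# Hypothetical sketch of session-based isolation: one microVM per session.
# microvm_client and memory_store are invented for illustration; this is
# NOT the real Firecracker or AWS Lambda interface.
import contextlib

@contextlib.contextmanager
def agent_session(session_id: str, microvm_client, memory_store):
    vm = microvm_client.create(session_id)    # fresh microVM for this session
    try:
        yield vm                              # the agent loop runs inside the VM
    finally:
        state = vm.snapshot_state()           # capture the LLM's session state
        memory_store.save(session_id, state)  # persist it to long-term memory
        microvm_client.destroy(vm)            # tear the microVM down
```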
## Tool calls

Just as there are several existing development frameworks for agent creation, there are several existing standards for communication between agents and tools, the most popular of which, currently, is the model context protocol (MCP).

MCP establishes a one-to-one connection between the agent's LLM and a dedicated MCP server that executes tool calls, and it also establishes a standard format for passing different types of data back and forth between the LLM and its server.

Many platforms use MCP by default but are also configurable, so they will support a growing set of protocols over time.
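To give a feel for the protocol, here is a minimal MCP server sketch using the FastMCP helper from the official Python SDK (assuming the `mcp` package; details may differ across SDK versions, and the tool body is a toy stand-in):

```python
# Minimal MCP server sketch (assumes the official `mcp` Python SDK;
# the tool body is a toy stand-in for a real lookup).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("restaurants")

@mcp.tool()
def find_restaurants(near: str, cuisine: str = "") -> list[str]:
    """Search for restaurants near a location."""
    return ["Pizza Roma", "Saffron House"]

if __name__ == "__main__":
    mcp.run()   # serve tool listings and tool calls to a connected agent
```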
Sometimes, however, the necessary tool is not one with an available API. In such cases, the only way to retrieve data or perform an action is through cursor movements and clicks on a website. There are a number of services available to perform such *computer use*. This makes any website a potential tool for agents, opening up decades of content and valuable services that aren't yet available directly through APIs.
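As a rough sketch of what a computer-use tool does under the hood, here is a browser driven with Playwright. The site and selectors are hypothetical, and a real computer-use system would choose each action from model output rather than running a fixed script:

```python
# Sketch of browser-driven "computer use" (hypothetical site and selectors;
# a real computer-use tool picks actions from model output, not a script).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example-reservations.test")  # hypothetical website
    page.fill("#party-size", "2")                   # type like a user would
    page.click("text=Find a table")                 # click like a user would
    print(page.inner_text(".results"))              # observation for the agent
    browser.close()
```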
## Authorizations

With agents, authorization works in two directions. First, of course, users require authorization to run the agents they've created. But as the agent is acting on the user's behalf, it will usually require its own authorization to access networked resources.

There are a few different ways to approach the problem of authorization. One is with an access delegation algorithm like OAuth, which essentially plumbs the authorization process through the agentic system. The user enters login credentials into OAuth, and the agentic system uses OAuth to log into protected resources, but the agentic system never has direct access to the user's passwords.

In another approach, the user logs into a secure session on a server, and the server has its own login credentials on protected resources. Permissions systems allow the user to select from a variety of authorization strategies and algorithms for implementing those strategies.
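Here is a sketch of the token exchange at the heart of the OAuth approach: the agent ends up holding a short-lived access token, never the user's password. The token endpoint and client credentials are hypothetical placeholders.

```python
# Sketch of an OAuth 2.0 authorization-code token exchange. The agent stores
# only the resulting access token; the user's password never reaches it.
# token_url, the client credentials and the code are hypothetical placeholders.
import requests

def exchange_code_for_token(token_url: str, client_id: str, client_secret: str,
                            code: str, redirect_uri: str) -> str:
    resp = requests.post(token_url, data={
        "grant_type": "authorization_code",  # standard OAuth 2.0 grant
        "code": code,                        # proof that the user consented
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]       # sent as a Bearer token on tool calls
```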
## Memory and traces

*Short-term memory*

LLMs are next-word prediction engines. What makes them so astoundingly versatile is that their predictions are based on long sequences of words they've already seen, known as *context*. Context is, in itself, a kind of memory. But it's not the only kind an agentic system needs.

Suppose, again, that an agent is trying to book a restaurant near a movie theater, and from a map tool, it's retrieved a couple dozen restaurants within a mile radius. It doesn't want to dump information about all those restaurants into the LLM's context: All that extraneous information could wreak havoc with next-word probabilities.

Instead, it can store the complete list in short-term memory and retrieve one or two records at a time, based on, say, the user's price and cuisine preferences and proximity to the theater. If none of those restaurants pans out, the agent can dip back into short-term memory, rather than having to execute another tool call.
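A minimal sketch of that pattern: the full tool result is parked outside the model's context, and only the few records that match the user's preferences are surfaced into the prompt. The record fields used here are hypothetical.

```python
# Sketch of short-term memory: keep the full tool result out of the LLM's
# context and retrieve only a few matching records at a time.
# The record fields (price, cuisine) are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ShortTermMemory:
    records: list[dict] = field(default_factory=list)

    def store(self, new_records: list[dict]) -> None:
        self.records.extend(new_records)   # full result set, outside the context

    def retrieve(self, max_price: int, cuisine: str, k: int = 2) -> list[dict]:
        matches = [r for r in self.records
                   if r["price"] <= max_price and r["cuisine"] == cuisine]
        return matches[:k]                 # only these records enter the prompt

memory = ShortTermMemory()
memory.store([{"name": "Pizza Roma", "price": 2, "cuisine": "pizza"},
              {"name": "Saffron House", "price": 3, "cuisine": "indian"}])
print(memory.retrieve(max_price=2, cuisine="pizza"))  # -> Pizza Roma only
```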
*Long-term memory*

Agents also need to remember their prior interactions with their clients. If last week I told the restaurant booking agent what type of food I like, I don't want to have to tell it again this week. The same goes for my price tolerance, the sort of ambiance I'm looking for, and so on.

Long-term memory allows the agent to look up what it needs to know about prior conversations with the user. Agents don't typically create long-term memories themselves, however. Instead, after a session is complete, the whole conversation passes to a separate AI model, which creates new long-term memories or updates existing ones.

Memory creation can involve LLM summarization and "chunking", in which documents are split into sections grouped according to topic for ease of retrieval during subsequent sessions. Available systems allow the user to select strategies and algorithms for summarization, chunking and other information-extraction techniques.

## Observability

Agents are a new kind of software system, and they require new ways to think about observing, monitoring and auditing their behavior. Some of the questions we ask will look familiar: whether the agents are running fast enough, how much they're costing, how many tool calls they're making and whether users are happy. But new questions will arise, too, and we can't necessarily predict what data we'll need to answer them.

Observability and tracing tools can provide an end-to-end view of the execution of a session with an agent, breaking down step by step which actions were taken and why. For the agent builder, these traces are key to understanding how well agents are working, and they provide the data to make them work better.

I hope this explanation has demystified agentic AI enough that you're willing to try building your own agents!