Building a Local-First Agent Framework in Rust (Part 1): Introduction

See Part 0 for the latest table of contents and sample code. New chapters will be added over time.

Introduction: Building the Loop

The word "agent" is already overloaded enough that it can hide more than it explains. Sometimes it means a chatbot with tools. Sometimes it means an autonomous process that keeps working until a goal is done. Sometimes it means a product interface wrapped around a model, memory, a scheduler, a browser, a shell, a file system, and a safety policy. The shape changes from product to product, but the core mechanism is usually smaller than the word suggests.

An agent framework is the machinery around the model.


There is no single formal definition I want to rely on here, so I will use a practical one: an agent framework is the orchestration layer that connects a model, context, tools, memory, and a loop. The model is important, of course. It is the part that reads language, proposes actions, and writes answers. But the useful behavior comes from the system around it: how the application stores the conversation, how it tells the model what tools exist, how it parses the model's decision, how it executes a tool, how it feeds the result back, how it remembers what happened, and how it prevents the model from doing something risky without human approval. Those are not things the model does by itself. They are things the agent framework, or the application using the model, has to manage. Once those pieces are visible, agent products become easier to reason about. They stop looking like a single mysterious intelligence and start looking like a set of engineering decisions.

That is what this series is about.

We are going to build abcb, a small local-first agent framework in Rust. It starts as a command-line program. By the end of this first arc, it can talk to a local OpenAI-compatible model server, run a tool-using loop, keep an event log, store simple notes, evaluate model behavior against fixtures, and gate dangerous tools behind human approval. It is not a production framework. It is a teaching artifact, but a real one. The code compiles. The commands run. The failures are not imaginary.

Frankly, this is also for my own study. The best way I know to study something is to explain what I think I understand, expose the gaps, and share the result with other people. These posts are part of that process.

The name abcb is a small personal reference. ABCB, pronounced "ah-ba-ka-bu," is the cafe from Kimagure Orange Road, one of my favorite anime TV shows from the 1980s. In the show, the cafe is not a busy place packed with customers. It is more often a place where the main characters sit, talk, and let the story happen around them. That felt like the right borrowed name for a small framework where experiments can happen.

The larger motivation, as I wrote in the preface, is game development with local agents. But that is not where the implementation starts. Before the agent can be useful inside an editor, it needs the basic loop. It needs to know how to talk to a model, expose tools, receive tool results, recover from malformed output, remember enough to continue, and record enough that I can inspect what happened later.

This series stays with that smaller problem first. That is intentional. If the command-line version is unclear, the editor-integrated version will only make the confusion harder to see.

Who This Is For

If you just want to make a game with agents, this series is probably not the shortest path. You may be better served by using the strongest commercial tools available, accepting their workflow, and spending your time directly on the game. That is a valid choice. You can make a good game without knowing the internals of an agent framework, just as you can make good software without writing your own compiler or database.

This series is for people who want to understand the basic internals of an agent framework and learn Rust through a real example. It is for readers who want to understand the recurring pieces behind agent products: how an application talks to a model, how context is represented, how the model chooses actions, how external tools are exposed and executed, how results flow back into the next step, how memory and observability differ, how failures are handled, and where safety checks belong before an agent changes the outside world.

There is an accountability reason too. We can use tools we do not fully understand, and often we should. But we can only be accountable for what we understand well enough to inspect, question, and change. This series is for people who want to extend that boundary. Not because every project requires it, but because some projects become more interesting when the hidden machinery is no longer hidden.

Why Understand the Framework?

The obvious question is why we should build any of this ourselves. Claude Code, Codex, OpenCode, and other agentic tools already exist. There are also open-source frameworks that expose tools, route model calls, manage memory, and run loops. For many projects, using one of them is the right choice. If the goal is simply to ship something, it is usually better to stand on existing infrastructure than to rebuild it.

But using a framework and understanding one are different things.

You do not need to understand an engine to drive a car, but the analogy only goes so far. When an agent fails, it is tempting to blame the model immediately, especially if the model is local and obviously less capable than a frontier model. Sometimes that blame is correct. Often, though, the failure could have come from any of several places, or from more than one at once. The model may misunderstand the instruction. The prompt may describe the tools poorly. The parser may be too strict. The tool may return the wrong shape of result. The loop may fail to feed that result back in a useful way. The memory layer may retrieve the wrong context. The approval policy may block the right action or allow the wrong one. If all of that is hidden behind a product interface, you can still use the tool, but you cannot easily reason about it.

The internals are not huge. A minimal agent framework needs a typed conversation, a way to call a model, a contract for structured model output, a set of tools, a loop that calls the model and executes what it asks for, a record of what happened, some form of memory, and a safety boundary. Real systems add many layers on top of that, but the recurring pieces are surprisingly portable. Once you have built them in a small form, larger frameworks become less opaque. You can look at them and ask better questions.

That is the first purpose of abcb: to make the mechanism small enough to inspect.

The second purpose is to make the decisions visible. Agent frameworks are full of boundaries. Does the core library own file paths, or does the CLI own them? Does a session borrow messages, or own them? Does a tool call return an error, or does a failed command become data the model can see? Should the framework store an interpretation of a run, or store raw events and derive summaries later? Should safety be a property of each tool, or a separate policy consulted before every call?

Those questions sound abstract until they become code. Rust is useful here because it does not let the decisions stay fuzzy for long. The type signature has to say who owns the value. The error type has to say what can fail. A trait has to say whether a method can mutate its implementation. An async boundary has to say what survives across an await point. The compiler does not design the system for you, but it forces the design to become explicit.

Why Rust?

The honest origin is simple: I wanted to learn Rust, and I wanted a real project to learn it through.

I do not think Rust is the only reasonable language for an agent framework. Python has the ecosystem. TypeScript has the product surface. If I only cared about immediate productivity, I would probably reach for Go. But Rust is useful here because it makes certain choices hard to avoid. An agent loop is stateful and long-running. It calls out to a model over HTTP. It reads and writes files. It may later talk to an editor over a socket. It has to model failures clearly, because a silent failure inside an autonomous loop is not harmless. Rust makes those concerns hard to ignore.

For example, the ownership model is especially useful as a teacher. In a small example, ownership can feel like ceremony. In an agent framework, it quickly becomes architecture. A session lives longer than a single function call, so it should own its messages. A core library should not know where files live, so it should write to something abstract and let the CLI open the file. A provider may be stateful, so calling it may require mutable access. A run may fail, but the session should still survive so the log and summary can tell us what happened. These are design choices, and the language forces us to honor them.
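To make those ownership choices a little more concrete, here is a minimal sketch. It is not the actual abcb code: the names Role, Message, Session, write_transcript, and Provider are assumptions chosen for illustration, and the real types in later chapters may look different. The sketch only shows the three ideas from the paragraph above: the session owns its messages, the core writes to something abstract, and a provider call takes mutable access.

```rust
use std::io::{self, Write};

/// Who said a message. A simplified, hypothetical set of roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// A message owns its text (`String`), so it can outlive the caller's buffer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// The session owns its messages; nothing in here borrows from the caller.
#[derive(Debug, Default)]
pub struct Session {
    messages: Vec<Message>,
}

impl Session {
    /// Accepts a borrowed `&str` and copies it into an owned `String`.
    pub fn push(&mut self, role: Role, content: &str) {
        self.messages.push(Message {
            role,
            content: content.to_string(),
        });
    }

    /// Read-only view of the conversation so far.
    pub fn messages(&self) -> &[Message] {
        &self.messages
    }

    /// The core writes to any `Write`. The caller decides whether that is
    /// a file, stdout, or an in-memory buffer; no file path appears here.
    pub fn write_transcript<W: Write>(&self, out: &mut W) -> io::Result<()> {
        for m in &self.messages {
            writeln!(out, "{:?}: {}", m.role, m.content)?;
        }
        Ok(())
    }
}

/// `&mut self` because a provider may keep state between calls.
pub trait Provider {
    fn complete(&mut self, messages: &[Message]) -> String;
}
```

A CLI could open a file and pass it to write_transcript, while a test passes a Vec<u8>. The core library never learns which one it got.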

Rust also makes error handling part of the shape of the program. Provider failures, parse failures, tool failures, policy denials, and max-step exhaustion are different kinds of events. Treating them all as one vague failure would make the loop harder to recover from. Modeling them separately makes the code more verbose, but it also makes the system more accountable. Later, when the model emits malformed JSON or asks for a tool that does not exist, the loop can decide whether to recover, retry, or stop.
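As a rough sketch of what modeling them separately can look like, the enum below gives each kind of failure its own variant. The names RunError, Recovery, and recovery_for are hypothetical, and the mapping from failure to reaction is an assumption made for illustration, not the policy abcb ends up with.

```rust
use std::fmt;

/// One variant per kind of failure named above (hypothetical names).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RunError {
    Provider(String),
    Parse(String),
    UnknownTool(String),
    ToolFailed { tool: String, reason: String },
    PolicyDenied(String),
    MaxSteps(usize),
}

impl fmt::Display for RunError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RunError::Provider(msg) => write!(f, "provider failed: {msg}"),
            RunError::Parse(msg) => write!(f, "could not parse model output: {msg}"),
            RunError::UnknownTool(name) => write!(f, "unknown tool: {name}"),
            RunError::ToolFailed { tool, reason } => write!(f, "tool {tool} failed: {reason}"),
            RunError::PolicyDenied(tool) => write!(f, "policy denied tool: {tool}"),
            RunError::MaxSteps(limit) => write!(f, "gave up after {limit} steps"),
        }
    }
}

impl std::error::Error for RunError {}

/// What the loop might do next. The three options mirror
/// "recover, retry, or stop" from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    /// Show the problem to the model and let it try a different step.
    Recover,
    /// Repeat the same call, for example after a transient HTTP error.
    Retry,
    /// End the run and report the error.
    Stop,
}

/// An assumed mapping, for illustration only.
pub fn recovery_for(err: &RunError) -> Recovery {
    match err {
        RunError::Parse(_) | RunError::UnknownTool(_) | RunError::ToolFailed { .. } => {
            Recovery::Recover
        }
        RunError::Provider(_) => Recovery::Retry,
        RunError::PolicyDenied(_) | RunError::MaxSteps(_) => Recovery::Stop,
    }
}
```

Because the match is exhaustive, adding a new variant later produces a compile error here until someone decides how the loop should react to it.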

That is why this series tries to understand Rust through the agent framework rather than beside it. Traits arrive when we need a provider abstraction. String and &str matter when messages have to outlive a single function. Box<dyn Tool> appears when the registry needs to store different tool implementations together. Async appears when the mock model becomes a real HTTP call. Lifetimes appear when the HTTP request can borrow data instead of allocating new strings. The language concepts are not decorative. They show up because the project asks for them.
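For instance, the registry case can be sketched in a few lines. The Tool trait, the Echo and WordCount tools, and the Registry type below are illustrative assumptions, not the definitions the series arrives at. What the sketch shows is why Box<dyn Tool> is needed: the two tools are different types, and a single map has to hold both.

```rust
use std::collections::HashMap;

/// A hypothetical, minimal tool interface.
pub trait Tool {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> Result<String, String>;
}

/// Returns its input unchanged.
pub struct Echo;

impl Tool for Echo {
    fn name(&self) -> &str {
        "echo"
    }
    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

/// Counts whitespace-separated words.
pub struct WordCount;

impl Tool for WordCount {
    fn name(&self) -> &str {
        "word_count"
    }
    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.split_whitespace().count().to_string())
    }
}

/// `Echo` and `WordCount` are different types, so the map stores them
/// behind a trait object: `Box<dyn Tool>`.
#[derive(Default)]
pub struct Registry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl Registry {
    /// Registers a tool under the name it reports for itself.
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Looks a tool up by name and runs it; an unknown name is an error.
    pub fn call(&self, name: &str, input: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.call(input),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}
```

Registering both tools and calling word_count with "a b c" returns Ok("3"). Calling a name that was never registered returns an error instead of panicking, which matters once a model is the one choosing the names.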

What We Will Build

The first useful version of abcb is a command-line agent harness.

It begins with almost nothing: a Rust workspace, a CLI shell, a doctor command, and a basic check script. From there, it builds the conversation model: messages, roles, sessions, and serialization tests. Then it adds a provider trait and a scripted mock provider, so the framework can be tested without calling a real model.

Once the one-turn conversation works, the project becomes an agent. We define tools, put them in a registry, ask the model to return a structured envelope, parse that envelope, execute the requested tool, append the tool result to the conversation, and call the model again. That loop is the center of the series. Everything else either supports it, protects it, or makes it easier to understand after the fact.
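The shape of that loop fits in a small sketch. Everything named here (Step, Message, ScriptedProvider, run_tool, run) is an assumption for illustration. In particular, the provider below hands back an already-parsed Step so the example needs no JSON crate, whereas in the series the envelope arrives as model text and has to be parsed first.

```rust
use std::collections::VecDeque;

/// What the model asked for in one turn (stand-in for the parsed envelope).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Step {
    ToolCall { name: String, input: String },
    Final(String),
}

/// Conversation entries, including tool results fed back to the model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    User(String),
    Assistant(String),
    ToolResult { name: String, output: String },
}

/// A mock provider that replays a fixed script instead of calling a model.
pub struct ScriptedProvider {
    script: VecDeque<Step>,
}

impl ScriptedProvider {
    pub fn new(steps: Vec<Step>) -> Self {
        Self { script: steps.into() }
    }

    /// A real provider would read `history`; the script ignores it.
    pub fn next_step(&mut self, _history: &[Message]) -> Option<Step> {
        self.script.pop_front()
    }
}

/// A single hard-coded tool keeps the sketch short.
fn run_tool(name: &str, input: &str) -> String {
    match name {
        "upper" => input.to_uppercase(),
        other => format!("error: unknown tool {other}"),
    }
}

/// Call the model, execute what it asks for, append the result, repeat.
pub fn run(
    provider: &mut ScriptedProvider,
    history: &mut Vec<Message>,
    max_steps: usize,
) -> Result<String, String> {
    for _ in 0..max_steps {
        let step = provider
            .next_step(history)
            .ok_or("provider returned nothing")?;
        match step {
            Step::Final(answer) => {
                history.push(Message::Assistant(answer.clone()));
                return Ok(answer);
            }
            Step::ToolCall { name, input } => {
                // A failed tool becomes data the model can see next turn.
                let output = run_tool(&name, &input);
                history.push(Message::ToolResult { name, output });
            }
        }
    }
    Err(format!("stopped after {max_steps} steps without a final answer"))
}
```

With a script of one upper tool call followed by a final answer, run returns that answer and leaves three entries in the history: the user message, the tool result, and the assistant reply. The max_steps bound is what keeps a model that never finishes from looping forever.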

The next arc connects the framework to a real local model over an OpenAI-compatible HTTP endpoint. In my setup, that means MLX and Gemma, but the important boundary is the wire protocol, not the brand. The provider speaks /v1/chat/completions. Any server that looks enough like that can be substituted. This is where async Rust enters, and also where local-model behavior becomes concrete. The model does not always return clean JSON. Sometimes it returns prose. Sometimes it wraps the answer in markdown fences. Sometimes it chooses the wrong schema. The framework has to survive that.
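One of those failure modes, the markdown fence, can be handled with a small pre-parse step. The function below is a sketch under assumptions: the name strip_code_fence is hypothetical, and it only covers the case where the reply starts with a fence. Prose before or after the JSON would need a different strategy.

```rust
/// Returns the payload inside a leading triple-backtick fence if one is
/// present; otherwise returns the trimmed input unchanged.
pub fn strip_code_fence(raw: &str) -> &str {
    // Built at runtime so this snippet can itself sit inside a fenced block.
    let fence = "`".repeat(3);
    let trimmed = raw.trim();

    let Some(rest) = trimmed.strip_prefix(fence.as_str()) else {
        return trimmed; // no opening fence: nothing to strip
    };

    // Drop the rest of the opening line, e.g. a `json` language tag.
    let body = match rest.find('\n') {
        Some(idx) => &rest[idx + 1..],
        None => return trimmed, // a fence with no body: leave it alone
    };

    // Drop the closing fence if the model remembered to emit one.
    match body.rfind(fence.as_str()) {
        Some(idx) => body[..idx].trim(),
        None => body.trim(),
    }
}
```

The function borrows from its input and allocates nothing for the result, so the caller can pass the returned slice straight to whatever parser comes next.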

After that, the project adds the parts that make the loop useful rather than merely functional: session directories, timestamped event logs, simple project-scoped notes, an approval policy, run summaries derived from the log, a system prompt that teaches the model the contract, evaluation fixtures, and finally a few real tools for reading files and running commands. By the end, abcb can answer questions about a local codebase, run safe tools without prompting, ask before shell commands, and leave an audit trail of what happened.
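The approval seam in particular is easy to sketch. The names Risk, Decision, Approver, FixedAnswer, and check are assumptions for illustration, and the two-level risk classification is a simplification, not necessarily how abcb labels its tools. The idea is that safe tools pass straight through and dangerous ones are held until something outside the model says yes.

```rust
/// How risky a tool is considered to be (assumed two-level scheme).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Risk {
    Safe,
    Dangerous,
}

/// The outcome of consulting the policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Whoever answers the approval question. In a CLI this would prompt a
/// human; in tests it can be a canned answer.
pub trait Approver {
    fn approve(&mut self, tool: &str, input: &str) -> bool;
}

/// A test double that always gives the same answer and counts questions.
pub struct FixedAnswer {
    pub answer: bool,
    pub asked: usize,
}

impl Approver for FixedAnswer {
    fn approve(&mut self, _tool: &str, _input: &str) -> bool {
        self.asked += 1;
        self.answer
    }
}

/// Safe tools run without prompting; dangerous ones need approval first.
pub fn check(risk: Risk, tool: &str, input: &str, approver: &mut dyn Approver) -> Decision {
    match risk {
        Risk::Safe => Decision::Allow,
        Risk::Dangerous => {
            if approver.approve(tool, input) {
                Decision::Allow
            } else {
                Decision::Deny
            }
        }
    }
}
```

With a FixedAnswer that says no, a safe read_file call is allowed without the approver ever being asked, while a dangerous shell call is denied after exactly one question.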

That final shape is still modest. It is a CLI agent, not a Godot assistant. But it contains the parts that a future editor-integrated agent would need: model access, tool access, memory, observability, recovery, evaluation, and approval.

A Note on the Tools

abcb was created, and is still being created, with Claude Code. Most of the code was written by Claude, but not as a blind generation exercise. It came out of many discussions, corrections, design arguments, failed attempts, revisions, and Rust language Q&A with me. That matters because the point of this project is not to pretend the human wrote every line by hand. The point is to understand the system well enough to guide it and be responsible for it.

This post and the larger writing project are being made with Codex. Every sentence originates from me, but Codex may edit, reorganize, research, and propose better phrasing, closer to the role of an editor than an author. That distinction is important to me. I am using AI in the making of this series about making an AI agent framework, but the responsibility for the argument, the examples, and the judgment is mine.

abcb is still under development. I do not know yet whether the original objective, making an agent framework for game development using only local models, can be achieved in the way I imagine. The gaps may be larger than I expect. The workflow may need to change. But the current state is already enough for the narrower objective of this series: to show the internals of a small agent framework and use that process to learn Rust against a real project. It should also help me evaluate other agent frameworks and tools more clearly, because I will have built a small version of the same moving parts myself.

How To Read This Series

This is a build log, not a reference manual. The chapters follow the order in which the framework grows. Each chapter adds one main capability, explains the framework decision behind it, and then uses the code to teach the Rust concept that naturally appears at that point.

The chapters are also written as blog posts, so they should be readable on their own. But the code is cumulative. Chapter 3 assumes the workspace from Chapter 2. Chapter 8 assumes the tool registry and structured model output from Chapters 6 and 7. Chapter 17 makes more sense if you have already seen the event log from Chapter 14. The sequence matters because the sample code grows with the argument.

Each chapter includes focused snippets in the text and a complete sample-code snapshot in the chapter folder. The snapshots are not raw checkouts from the original project history. They are curated chapter states, copied into this writing project so the prose and working code stay together. That duplication is intentional. It makes each chapter self-contained enough to inspect later, even if the original abcb repository keeps changing.

There are a few recurring patterns. Decision notes explain the fork: what options existed, what the implementation chose, what it gave up, and when the decision should be revisited. Rust notes focus on the language concept in play. Trap notes call out the small problems that are easy to miss, such as a local model wrapping JSON in markdown fences, or format! requiring escaped braces inside prompt examples. The point is not to decorate the text. The point is to keep the engineering decision, the Rust concept, and the practical failure mode close to the code that produced them.
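As a small taste of the format! trap: inside a format string, a literal brace has to be doubled, because a single brace starts a placeholder. The function name system_prompt and the envelope fields tool and input below are hypothetical, used only to show the escaping.

```rust
/// Builds a prompt line containing a literal JSON example.
/// `{{` and `}}` produce literal braces; `{tool_name}` is interpolated.
pub fn system_prompt(tool_name: &str) -> String {
    format!(
        "Reply with JSON only, for example: {{\"tool\": \"{tool_name}\", \"input\": \"...\"}}"
    )
}
```

Calling system_prompt("read_file") yields a line ending in {"tool": "read_file", "input": "..."}. Writing the JSON example with single braces would not compile, because format! would try to read the JSON as a placeholder.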

What You Need

To follow the code, you need a Rust toolchain, cargo, and an editor with rust-analyzer support. The project uses the 2024 edition. Every chapter's sample code is meant to pass the same basic check set: tests, formatting, and clippy with warnings treated as errors.

For the model chapters, you need a local OpenAI-compatible model server. My reference setup uses MLX with Gemma, because that is what I had running locally. That is not a requirement. A llama.cpp server, Ollama's OpenAI-compatible shim, LM Studio, vLLM, or another compatible endpoint can serve the same role if it exposes the expected chat-completions API. The framework cares about the protocol, not the vendor.

I wrote separately about running Gemma locally with MLX if you want the setup I used as a starting point.

Basic Rust knowledge is not mandatory, but general programming experience will help a lot. I will explain the Rust concepts that matter for the project, but I will not explain every piece of basic syntax in detail. If you have written code in another language, you should be able to follow along and fill in the small grammar details as needed. I also will not pretend that ownership, traits, async, and error modeling are effortless. They are not. The promise is narrower: the hard parts will arrive attached to a concrete problem, not as isolated theory.

You also do not need deep knowledge of LLM tool-calling. We will build a small version by hand. That means the series does not start from provider-specific function-calling APIs. It starts from a JSON contract we control, because that makes the loop visible. That choice has tradeoffs, and we will name them when they show up.

What This Is Not

abcb is not a production agent framework. It is small, single-process, and intentionally sequential. That is a feature for this series, not a claim about how production systems should be built. If you need a robust framework for a product, use a mature one and treat this series as background knowledge.

There are also several things this first arc does not build. It does not stream tokens. It fetches model responses whole, because streaming complicates the parse-and-act contract. It does not use native provider tool-calling. It hand-rolls a JSON envelope so the model contract is visible and provider-agnostic. It does not build multi-agent orchestration, RAG, embeddings, fine-tuning, or a wiki-like memory system. The memory here is simple notes and event history.

The shell tool is not sandboxed. Human approval is the safety boundary. That is enough to teach the approval seam, but it is not an operating-system security model. The file-reading tool also has no size cap in the current version, so a large file can overwhelm a local model's context window. These are real limitations, and they are left visible. A teaching project should not pretend its rough edges are features.

The Godot bridge is also not part of the main build. It appears as an epilogue and a direction for the next project. The first complete artifact is the CLI agent. That matters because it gives this series a real ending. A command-line framework that can talk to a local model, call tools, remember notes, record events, summarize runs, evaluate behavior, and ask before executing commands is already enough to understand the core loop.

The rest can wait until the loop is solid.

That is where the next chapter begins: not with a model, not with tools, and not with Godot, but with the workspace that will hold the whole thing.

To Chapter 2: Setting Up the Workspace