Building a Local-First Agent Framework in Rust (Part 21): Beyond The CLI

See Part 0 for the latest table of contents and sample code.

Chapter 21: Beyond The CLI: The Godot Bridge

This is the final chapter of this book.

It is also not a build chapter.

Until now, each chapter ended with a concrete checkpoint. The sample code changed. A command gained one more behavior. The loop learned one more responsibility. By Chapter 20, abcb had become a small but usable local-first agent framework: it can talk to a local model, keep a session, use tools, recover from bad model output, record events, summarize a run, ask for approval, and even inspect local files or run approved commands.

That is a good place to stop the main build.

This post is also available on Medium. If you’re a paid Medium member and happen to read it there, it helps fund my next cup of coffee. Much appreciated ☕️😄

This chapter is different. It is a preview of the plan that motivated the project in the first place: taking the agent out of the CLI and embedding it into a creative environment. For me, that environment is Godot.

The goal is not only to have an agent that can answer questions from the terminal. The original motivation was closer to this: can I build an agent framework that helps make a game, locally, with enough of the process visible that I can understand and trust it?

The CLI was the right first shape because it removed distractions. A command-line program made the loop easy to see. A local OpenAI-compatible endpoint made the provider boundary easy to test. JSONL made the event log visible. A simple approval policy made safety concrete.

But a game is not made only from text prompts.

A game project has scenes, nodes, scripts, assets, editor state, playtest feedback, and visual context. If the agent cannot see or touch those things, the user remains the bridge. The user has to copy screenshots, describe scene trees, paste errors, run commands, and manually apply changes.

That was always too much friction.

So this final chapter looks forward. It sketches the next version of abcb: a local agent daemon that can connect to the Godot editor, read editor context, inspect scenes, and eventually perform approved editor mutations. This is not a promise that the design is finished. It is a map of the next questions.

There is no Chapter 21 sample code. The last runnable checkpoint is Chapter 20. Here, the artifact is the plan.

21.1 Why The CLI Was Only The First Host

The CLI gave us a clean place to build the agent loop, but it is not the natural home for every workflow.

When I ask an agent to explain a Rust file, the terminal is enough. The agent can call read_file, inspect Cargo.toml, maybe run cargo test after I approve it, and produce an answer. The input and output are mostly text.

Game development is different.

If I am working in Godot, the important context may be the currently open scene, the selected node, the script attached to that node, the current editor errors, or the shape of the scene tree. Some of that exists on disk, but not all of it is easiest to understand from files alone. Some of it lives in the editor.

The CLI agent can read project.godot or a .tscn file. That is useful, but it is not the same as being editor-aware. It cannot know what I am currently looking at. It cannot ask the editor for the selected node. It cannot save the current scene. It cannot make a small change and let me inspect it immediately.

That is the next boundary.

The first twenty chapters built the agent's internal shape. The next project would be about giving that agent a host environment.

21.2 The Shape Of The Bridge

The plan is to keep the agent local and split the bridge into a few clear pieces:

  • A Rust daemon that runs on the local machine.
  • A WebSocket endpoint, probably something like ws://127.0.0.1:8765/godot.
  • A protocol crate that defines the messages passed between Rust and Godot.
  • A Godot editor addon written in GDScript that connects out to the daemon.
  • A small first set of bridge commands.

The first commands do not need to be dramatic. In fact, they should not be.

I would start with:

  • ping, to prove the connection is alive.
  • get_editor_context, to ask what the editor is currently focused on.
  • get_scene_tree, to inspect the current scene structure.
  • save_current_scene, to make one small editor action visible and auditable.
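As a rough illustration, the four starter commands could live in the protocol crate as a small enum. This is only a sketch under stated assumptions: the names BridgeCommand, UnknownCommand, and mutates_editor are hypothetical, and a real protocol crate would more likely derive serde traits than parse strings by hand. It stays std-only so it can be read without extra dependencies.

```rust
use std::str::FromStr;

/// The four starter commands named in the text.
/// `BridgeCommand` is a hypothetical name, not part of the book's sample code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BridgeCommand {
    Ping,
    GetEditorContext,
    GetSceneTree,
    SaveCurrentScene,
}

impl BridgeCommand {
    /// Wire name, matching the command names used in the chapter.
    pub fn as_str(self) -> &'static str {
        match self {
            BridgeCommand::Ping => "ping",
            BridgeCommand::GetEditorContext => "get_editor_context",
            BridgeCommand::GetSceneTree => "get_scene_tree",
            BridgeCommand::SaveCurrentScene => "save_current_scene",
        }
    }

    /// Assumption: of the starter set, only saving changes editor state,
    /// so only it would need to pass through approval.
    pub fn mutates_editor(self) -> bool {
        matches!(self, BridgeCommand::SaveCurrentScene)
    }
}

/// Returned when the editor or the model sends a name outside the known set.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownCommand(pub String);

impl FromStr for BridgeCommand {
    type Err = UnknownCommand;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ping" => Ok(BridgeCommand::Ping),
            "get_editor_context" => Ok(BridgeCommand::GetEditorContext),
            "get_scene_tree" => Ok(BridgeCommand::GetSceneTree),
            "save_current_scene" => Ok(BridgeCommand::SaveCurrentScene),
            other => Err(UnknownCommand(other.to_string())),
        }
    }
}
```

A closed enum means an unknown command name fails at the protocol boundary instead of reaching the editor, and the read-only versus mutating split is recorded next to the command itself.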

This is the same instinct we used throughout the book. Do not begin with the most powerful version. Begin with the smallest version that proves the boundary.

In Chapter 6, echo was enough to prove that the loop could call a tool. In Chapter 14, one JSONL event per line was enough to prove that the run could be audited. In Chapter 16, DenyAll was enough to prove that approval was part of the loop. The Godot bridge should start the same way.

First prove that Rust and Godot can talk.

Then prove that the agent can ask for editor context.

Then prove that a safe editor action can be requested, approved, executed, and recorded.

Only after that should the bridge become ambitious.

21.3 The Approval Boundary Becomes More Important

Chapter 20 made approval feel practical because run_command can affect the system. The Godot bridge would make approval even more important.

Reading the scene tree is one thing. Modifying a scene is another.

An agent that can inspect the current scene can help explain it. An agent that can change the current scene can also break it. It might rename a node, delete a child, move a script, save an unwanted state, or make a change that looks correct in text but feels wrong in the editor.

That means editor mutations should go through the same approval seam we already built.

This is the reason I am glad ApprovalPolicy exists as a trait. Today, InteractivePolicy asks a simple y/N question in the terminal. A Godot-integrated version could show a richer approval dialog inside the editor. It could display:

  • the requested command,
  • the target scene or node,
  • the arguments,
  • the expected effect,
  • and maybe a preview or diff when that becomes possible.

The policy decision would still be the same shape: allow or deny. The user experience around that decision would become more specific to the host.

This also brings back the leaky trait problem from Chapter 20. ApprovalPolicy::approve currently exposes serde_json::Value in the public signature. That was acceptable for the CLI. In an editor bridge, the approval request may deserve its own typed vocabulary. A future ToolRequest or EditorActionRequest could make the policy boundary clearer than a raw JSON value.
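To make the idea of a typed vocabulary concrete, here is one possible shape. Everything in it is an assumption for illustration: EditorActionRequest, Decision, EditorApprovalPolicy, AllowListed, and describe are hypothetical names, and the trait below is a stand-in rather than the book's actual ApprovalPolicy signature. DenyAll only mirrors the idea of the Chapter 16 policy.

```rust
/// Hypothetical typed request that an editor approval dialog could render.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EditorActionRequest {
    pub command: String,
    pub target: String,
    pub arguments: Vec<(String, String)>,
    pub expected_effect: String,
}

/// The decision keeps the same shape as before: allow or deny.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Stand-in trait for this sketch; not the real `ApprovalPolicy` definition.
pub trait EditorApprovalPolicy {
    fn approve(&self, request: &EditorActionRequest) -> Decision;
}

/// Denies everything, in the spirit of the Chapter 16 policy.
pub struct DenyAll;

impl EditorApprovalPolicy for DenyAll {
    fn approve(&self, _request: &EditorActionRequest) -> Decision {
        Decision::Deny
    }
}

/// Allows only commands that appear on an explicit list.
pub struct AllowListed {
    pub commands: Vec<String>,
}

impl EditorApprovalPolicy for AllowListed {
    fn approve(&self, request: &EditorActionRequest) -> Decision {
        if self.commands.iter().any(|c| c == &request.command) {
            Decision::Allow
        } else {
            Decision::Deny
        }
    }
}

/// One-line description a terminal prompt or editor dialog could show.
pub fn describe(request: &EditorActionRequest) -> String {
    let args: Vec<String> = request
        .arguments
        .iter()
        .map(|(key, value)| format!("{key}={value}"))
        .collect();
    format!(
        "{} on {} [{}] -> {}",
        request.command,
        request.target,
        args.join(", "),
        request.expected_effect
    )
}
```

With named fields, the host decides how to present the request, while the policy still returns a plain allow-or-deny answer.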

This is what I mean by a deferred decision coming due. The earlier design was not wrong. It was good enough for the CLI. The next host would apply pressure to it.

21.4 The Loop Signature Is Getting Crowded

There is another piece of pressure we can already see.

By the end of the book, run_loop takes a lot of things:

  • a provider,
  • a registry,
  • a session,
  • a maximum step count,
  • an event writer,
  • and an approval policy.

That was fine while we were learning. Each parameter made a dependency visible. It was useful to see the loop's needs one by one.

But this is also how a design starts to ask for a new type.

In a daemon, the loop may need even more context: session storage, cancellation, per-client metadata, editor connection state, maybe a summary sink, maybe a tracing span. Cancellation means the host can ask a running task to stop cleanly. A tracing span is a structured logging context, a way to say "these logs belong to this run or this client." If we keep adding parameters, the signature will become harder to read and harder to change.

The next shape is probably something like a loop context or loop options struct.

Not because "structs are cleaner" in the abstract, but because the group of values has become a real concept. They are the environment in which the loop runs.

That would be a different Rust lesson from the ones earlier in the book. At first, explicit parameters are a teaching advantage. Later, too many explicit parameters become a smell. A good refactor is not about hiding information. It is about naming the thing that has emerged.
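A minimal sketch of what that named thing might look like follows. It is not the book's run_loop: LoopContext, the placeholder Session, and the generic parameters P, R, and A are assumptions that stand in for the real provider, registry, and approval types, and the loop body only writes one event line per step so the example stays runnable.

```rust
use std::io::{self, Write};

/// Placeholder session type for this sketch only.
pub struct Session {
    pub id: String,
    pub steps_taken: usize,
}

/// Hypothetical grouping of the six values `run_loop` takes today.
/// `P`, `R`, and `A` stand in for the provider, registry, and approval policy.
pub struct LoopContext<'a, P, R, A, W: Write> {
    pub provider: &'a P,
    pub registry: &'a R,
    pub approval: &'a A,
    pub session: &'a mut Session,
    pub events: &'a mut W,
    pub max_steps: usize,
}

/// Stand-in loop body: records one JSONL-style event per step.
/// A real loop would call the provider, tools, and approval policy here.
pub fn run_loop<P, R, A, W: Write>(ctx: &mut LoopContext<'_, P, R, A, W>) -> io::Result<usize> {
    for step in 0..ctx.max_steps {
        writeln!(
            ctx.events,
            "{{\"session\":\"{}\",\"step\":{}}}",
            ctx.session.id, step
        )?;
        ctx.session.steps_taken += 1;
    }
    Ok(ctx.session.steps_taken)
}
```

Adding cancellation or per-client metadata later would then mean adding a field to the struct, not another parameter at every call site.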

21.5 Async Comes Back

Chapter 11 introduced async, but we kept the runtime simple. The provider runs sequentially. The CLI starts, performs one command, and exits. That kept the mental model small.

A daemon changes that.

A WebSocket server may handle multiple clients. Even if I only expect one Godot editor at first, the architecture starts to look more concurrent. There may be one task accepting connections, another task driving the agent loop, and another task waiting for editor responses. A long-running daemon also has to care more about cancellation, shutdown, and whether futures are safe to move across threads.

This is where one of the earlier Rust footnotes returns: Send futures.

In Chapter 11, we accepted native async fn in a public trait because the project was simple and sequential. A daemon may force that decision back onto the table. If the runtime becomes multi-threaded, provider futures may need to be Send. The trait may need a different shape, or the project may need a helper like trait_variant, a Rust crate that can generate related trait variants such as a local-only version and a Send version. Or the daemon may intentionally stay on a single-threaded runtime.
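One of the "different shapes" can be sketched without any extra crate: the trait method returns impl Future + Send in place of an async fn, so every implementation has to produce a future that can move across threads. The names SendProvider, EchoProvider, and require_send are hypothetical, and the tiny block_on below is only an assumption-level substitute for a real runtime so the example runs with the standard library alone.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical provider trait whose returned future must be `Send`.
pub trait SendProvider {
    fn complete(&self, prompt: String) -> impl Future<Output = String> + Send;
}

/// Toy provider that echoes the prompt back.
pub struct EchoProvider;

impl SendProvider for EchoProvider {
    fn complete(&self, prompt: String) -> impl Future<Output = String> + Send {
        async move { format!("echo: {prompt}") }
    }
}

/// Compiles only when the future is `Send`, which is what a
/// multi-threaded runtime would require before spawning it.
pub fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// Waker that does nothing; enough for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor so the sketch needs no runtime crate.
/// Not suitable for real workloads.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

If an implementation held a non-Send value such as an Rc across an await point, the impl would stop compiling, which surfaces the problem at the trait boundary and not deep inside the daemon.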

I do not want to decide that in this book because we do not have the daemon yet. But I do want to name it as the next Rust question. Good deferral should leave a clear note for the future version of the project.

21.6 Session Identity Would Need To Grow Up

Chapter 13 used a timestamp-derived session id:

sess-<unix-epoch-millis>

For the CLI, that was enough. One command starts one process. Collisions are unlikely enough that the simplicity is worth it.

For a daemon, that assumption gets weaker.

If the daemon can run multiple sessions inside one process, or if a Godot editor can ask for multiple agent tasks, session identity should become stronger. A UUID would be the obvious next step. The nice part is that the Session struct is already flexible enough:

id: String

The field does not care whether the id came from a timestamp, a UUID, or something else. That means the early choice was simple without closing the later door.
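The sketch below shows why the timestamp-only id weakens inside one process, and how the String field absorbs a stronger scheme. It is deliberately not a UUID: to stay std-only it appends a per-process counter, which is an assumption made for illustration and is unique only within a single daemon process. The function names timestamp_id, daemon_id, and now_millis are hypothetical, and Session is reduced to its id field.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Reduced stand-in for the book's Session; only the id field matters here.
pub struct Session {
    pub id: String,
}

/// Process-wide counter used to separate sessions created in the same millisecond.
static NEXT_SEQ: AtomicU64 = AtomicU64::new(0);

/// Chapter 13 style id: two sessions in the same millisecond collide.
pub fn timestamp_id(millis: u128) -> String {
    format!("sess-{millis}")
}

/// Std-only stand-in for a UUID: timestamp plus a per-process sequence number.
/// Unique within one process only; a real daemon would likely use a UUID crate.
pub fn daemon_id(millis: u128) -> String {
    let seq = NEXT_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("sess-{millis}-{seq}")
}

/// Current time in milliseconds since the Unix epoch (0 if the clock is before it).
pub fn now_millis() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_millis())
        .unwrap_or(0)
}
```

Either function produces a String, so Session { id: daemon_id(now_millis()) } needs no change to the struct.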

This is one of the small design satisfactions in the project. Not every temporary choice becomes debt. Some are just modest choices made behind a flexible boundary.

21.7 Memory Is Still The Frontier

The book built notes, events, sessions, replay, and eval. That is already more memory than many toy agents have.

But it is not the full memory story.

For game development, memory probably needs at least one more layer. The agent may need to know project conventions, design notes, asset naming rules, character constraints, known bugs, and decisions from previous sessions. Some of that can live in notes. Some of it belongs in a more deliberate project wiki. Some of it might eventually need retrieval, embeddings, or indexed summaries.

I am cautious about jumping there too early.

Retrieval can make an agent feel smarter, but it can also make the system harder to reason about. If the wrong note is retrieved, the model may act with false confidence. If old design decisions remain in memory after they are obsolete, the agent may fight the current direction of the project. Memory is useful only when the framework can explain where it came from and why it was used.

So the next memory step should stay accountable.

Before adding a big retrieval layer, I would rather add a small project knowledge primitive with clear source, clear update path, and clear visibility to the user. The rule from this book still applies: we can be accountable only for what we can inspect and understand.
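One way to picture such a primitive, purely as a sketch: each entry carries its text, where it came from, and whether it has been superseded, and the function that builds model context always shows the source next to the text. KnowledgeEntry, its fields, and visible_context are hypothetical names, not part of the existing sample code.

```rust
/// Hypothetical project-knowledge entry with an explicit source.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KnowledgeEntry {
    /// The fact or convention itself.
    pub text: String,
    /// Where it came from, for example a notes file or a session id.
    pub source: String,
    /// Set when a later decision replaces this entry.
    pub superseded: bool,
}

/// Lines that would be shown to both the model and the user.
/// Superseded entries are left out so obsolete decisions do not leak back in.
pub fn visible_context(entries: &[KnowledgeEntry]) -> Vec<String> {
    entries
        .iter()
        .filter(|entry| !entry.superseded)
        .map(|entry| format!("{} (source: {})", entry.text, entry.source))
        .collect()
}
```

Because the output is plain strings with the source attached, the user can read exactly what the agent was told and trace each line back to its origin.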

21.8 What The Godot Agent Might Do First

If I imagine the first useful Godot workflow, it is not "make my whole game."

It is smaller.

Maybe I select an isometric level scene and ask:

Explain how this isometric level editor scene is structured.

The agent asks the editor for the scene tree. It reads the scripts attached to the level editor nodes. It checks how tiles, layers, selection, collision, and save/load data are represented. It answers with a map of the current design.

Then I ask:

Add a small blocked-tile painting tool to this level editor.

The agent proposes actions: add a toolbar toggle, update the tile-painting script, mark blocked cells in the level data, and save the current scene. The editor shows me what will change. I approve the safe steps one by one. The daemon records the event log. If something looks wrong, replay can show what the model asked for, what I approved, and what the editor did.

That would already be useful.

Not because the agent becomes autonomous, but because it becomes situated. It can see more of the environment. It can propose actions in the place where the work actually happens. It can still be local. It can still be inspectable. It can still ask before doing something risky.

That is the kind of assistant I wanted when this project started.

21.9 The Deferred-Decision Ledger

A book like this leaves many things intentionally unfinished. That is part of the point of a build log.

The useful question is whether the unfinished things are vague or named.

Here is the ledger I would carry into the next version of abcb:

  • ApprovalPolicy probably needs a better request type than raw serde_json::Value.
  • run_loop probably needs a context struct once the daemon adds more dependencies.
  • Provider async traits may need Send futures if the daemon uses a multi-threaded runtime.
  • Session ids should probably move from timestamp strings to UUIDs when concurrency becomes real.
  • RunSummary and eval may need to move into a shared crate if the daemon also wants them.
  • run_command still needs a timeout.
  • read_file still needs a size cap.
  • Memory needs a more deliberate project-knowledge layer before it becomes retrieval-heavy.
  • Prompt injection becomes more serious when the agent can read many project files and talk to an editor.
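Some of these items are small enough to sketch now. As one example, a size cap for read_file could be built on std::io::Read::take. The names read_capped and CappedRead are hypothetical, and how the real read_file tool would report truncation to the model is an open assumption here.

```rust
use std::io::{self, Read};

/// Result of a capped read: the bytes kept and whether anything was cut off.
#[derive(Debug, PartialEq, Eq)]
pub struct CappedRead {
    pub bytes: Vec<u8>,
    pub truncated: bool,
}

/// Reads at most `max_bytes` from `reader` and reports whether more was available.
pub fn read_capped<R: Read>(reader: R, max_bytes: u64) -> io::Result<CappedRead> {
    let mut bytes = Vec::new();
    // Read one byte past the cap so truncation can be detected
    // without loading the rest of the input.
    reader
        .take(max_bytes.saturating_add(1))
        .read_to_end(&mut bytes)?;
    let truncated = bytes.len() as u64 > max_bytes;
    if truncated {
        // Safe cast: the buffer is longer than `max_bytes`, so the cap fits in usize.
        bytes.truncate(max_bytes as usize);
    }
    Ok(CappedRead { bytes, truncated })
}
```

Because the function accepts any Read, a file opened with std::fs::File::open can be passed in directly, and tests can use an in-memory byte slice. The timeout for run_command is a separate problem and is not covered by this sketch.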

This list does not make me feel that the project is messy. It makes me feel the opposite. The earlier chapters left seams. Those seams make the next work easier to see.

Good deferral leaves a clean to-do list.

21.10 Closing The Book

This book began with a personal question: could I build an agent framework myself, locally, in Rust, while keeping enough of the process visible that I could learn from it and trust it?

The answer is not "yes, finished."

The answer is more like this: the CLI version is small enough to understand, and real enough to matter.

It has a provider boundary. It has a typed conversation. It has a model-output contract. It has tools. It has recovery. It has events. It has sessions. It has notes. It has approval. It has eval. It has local file tools and a gated command tool. It has limitations that are visible instead of hidden.

The Rust path also has a shape now. We used ownership and borrowing to decide who holds the session, the registry, the provider, and the event writer. We used traits to put providers, tools, and approval policies behind boundaries. We used Result and error types to keep failure visible. We used serde to turn Rust values into model and log contracts. We used async where the model server forced us to wait. We touched lifetimes, trait objects, iterators, slices, and the pure/impure split, not as isolated exercises, but because the framework needed them.

That is enough for this book.

The next version belongs to the editor.

If abcb becomes a Godot bridge, it will need new Rust lessons and new framework decisions. It will need WebSockets, protocol design, editor-side tooling, richer approval, better memory, and probably a few redesigns of things that were good enough for the CLI.

But the shape is no longer mysterious.

The agent is not magic. It is a loop, a contract, tools, memory, policy, and a host. We built the first five pieces in the CLI. The next plan is to give them a better host.

That is where I want to go next.