Building a Local-First Agent Framework in Rust (Part 18): Teaching the Model

See Part 0 for the latest table of contents and sample code. New chapters will be added over time.

Chapter 18: Teaching the Model: The System Prompt and Robust Parsing

By Chapter 17, abcb run has become a real loop. It can keep a session, call tools, deny tools, record an event log, and print a run summary. The loop is no longer just a toy control flow.

But there is still one uncomfortable boundary: the model.

This post is also available on Medium.

The Rust side expects a strict envelope:

{"kind":"tool_call","tool_name":"echo","arguments":{"text":"hello"}}

or:

{"kind":"final","content":"All done."}

The local model, however, does not automatically know that contract. A model may answer with clean JSON. It may wrap JSON in a markdown code fence. It may add a sentence before the JSON. It may decide to be helpful in a way that is helpful to a person, but not helpful to a parser.

This chapter adds three layers of defense.

  • First, we teach the model what to emit by adding a system prompt.
  • Second, we make parsing tolerant of one common local-model habit: markdown code fences.
  • Third, we still keep the recovery loop from Chapter 9. If the output is still wrong, the environment talks back and gives the model another chance.

The order matters. I do not want parsing to become a magic rescue machine. The primary fix is to state the contract clearly. The parser should tolerate a little. The recovery loop should catch the rest.

The sample code for this chapter is in chapter18/abcb/.

18.1 A Role We Already Had

One nice thing about the earlier design is that Role::System already exists:

File: abcb/crates/abcb-core/src/lib.rs

#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
#[serde(rename_all = "snake_case")]
pub enum Role {
    User,
    Assistant,
    System,
    Tool,
}

This chapter does not need a new message type. It does not need a special prompt field. It uses the same Message and Session structure that already carries user, assistant, and tool messages.

That is a small payoff from keeping the message model general. A system prompt is not a separate field in Session. It is just another message in the session, with a different role.

The model adapter in abcb-models already knows how to translate that role into the OpenAI-compatible HTTP request sent to the local model server:

File: abcb/crates/abcb-models/src/lib.rs

fn wire_role(role: &Role) -> &'static str {
    match role {
        Role::User => "user",
        Role::Assistant => "assistant",
        Role::System => "system",
        Role::Tool => "user",
    }
}

The OpenAI-compatible chat format has the familiar user, assistant, and system roles, so Role::System can go out as system. Tool messages are different. Some local servers may not support a dedicated tool role consistently, so abcb still sends Role::Tool as user. That is not perfect, but it keeps the wire format simple while the loop is still small.
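To make the mapping concrete, here is a minimal sketch of how a session's messages could be flattened into wire pairs. Role and Message are simplified stand-ins for the abcb-core types, and wire_messages is a hypothetical helper, not part of the adapter. The real adapter builds a JSON request body instead of tuples.

```rust
// Simplified stand-ins for the abcb-core types (assumption: the real
// Message carries at least a role and a content string).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    System,
    Tool,
}

pub struct Message {
    pub role: Role,
    pub content: String,
}

// Same mapping as the adapter: tool results travel as "user" on the wire.
pub fn wire_role(role: &Role) -> &'static str {
    match role {
        Role::User => "user",
        Role::Assistant => "assistant",
        Role::System => "system",
        Role::Tool => "user",
    }
}

// Hypothetical helper: flatten messages into (role, content) pairs, the
// shape an OpenAI-compatible `messages` array is built from.
pub fn wire_messages(messages: &[Message]) -> Vec<(&'static str, &str)> {
    messages
        .iter()
        .map(|m| (wire_role(&m.role), m.content.as_str()))
        .collect()
}
```

With a system, a user, and a tool message in the input, the output roles are system, user, user. The tool result is indistinguishable from a user turn on the wire, which is the trade-off described above.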

That means the new behavior can be introduced at the CLI layer by adding one message before the user message.

18.2 The System Prompt Lives In Core

The new prompt is built by system_prompt:

File: abcb/crates/abcb-core/src/lib.rs

pub fn system_prompt(registry: &ToolRegistry) -> String {
    let mut tools: Vec<(&str, &str)> = registry
        .names()
        .map(|name| {
            let description = registry
                .get(name)
                .map(|tool| tool.description())
                .unwrap_or("");
            (name, description)
        })
        .collect();
    tools.sort_by_key(|(name, _)| *name);

    let mut prompt = String::from(
        "You are abcb, an agent that acts by emitting JSON.\n\n\
         Respond with EXACTLY ONE JSON object and nothing else: no prose, no markdown code fences.\n\n\
         The object must be one of two shapes:\n\n\
         To call a tool:\n  \
         {\"kind\":\"tool_call\",\"tool_name\":\"<one of the tools below>\",\"arguments\":{...}}\n\n\
         To give your final answer:\n  \
         {\"kind\":\"final\",\"content\":\"<your answer>\"}\n\n\
         Available tools:\n",
    );
    for (name, description) in &tools {
        prompt.push_str("- ");
        prompt.push_str(name);
        prompt.push_str(": ");
        prompt.push_str(description);
        prompt.push('\n');
    }
    prompt.push_str(
        "\nExamples:\n  \
         {\"kind\":\"tool_call\",\"tool_name\":\"echo\",\"arguments\":{\"text\":\"hello\"}}\n  \
         {\"kind\":\"final\",\"content\":\"All done.\"}",
    );
    prompt
}

This function lives in abcb-core, not in the CLI. That may look surprising at first, because the CLI is the place that injects the prompt.

Here is the boundary I have in mind.

The CLI crate is the outer shell. It knows about command-line arguments, config files, paths, and terminal behavior. It decides that the run command should create a session, add the user's message, open the event log, call run_loop, and print a summary.

The core crate is the engine. It defines the concepts the loop depends on: Message, Role, Session, ModelOutput, Tool, ToolRegistry, ApprovalPolicy, and run_loop. It also parses the model's response into ModelOutput.

The system prompt sits between those two worlds. The CLI is the place that decides when to insert it, because only the CLI knows which command is running. But the content of the prompt is not really a CLI concern. It tells the model how to speak to the core loop: emit a ModelOutput envelope, and choose one of the tools registered in ToolRegistry.

That is why system_prompt lives in abcb-core. If we later rename tool_call, add a new envelope shape, or change how tools are listed, the Rust type, the parser, the tests, and the prompt can move together. The CLI only has to say, "for this command, add the system prompt before the user message."

The prompt says three things.

  • It tells the model to emit exactly one JSON object.
  • It shows the two allowed shapes: tool_call and final.
  • It lists the tools currently registered.

This is not just nicer wording. For local models, it is part of the interface. If we expect a structured response, we should say so plainly.
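One detail in the prompt literal is easy to misread: the trailing backslashes. A small sketch of what they do, using one fragment of the prompt. The function name shape_hint is made up for this illustration.

```rust
// A backslash at the end of a line inside a string literal drops the
// newline and the leading whitespace of the next source line. Spaces
// written before the backslash are kept, which is where the two-space
// indent in front of the JSON example comes from.
pub fn shape_hint() -> &'static str {
    "To give your final answer:\n  \
     {\"kind\":\"final\",\"content\":\"<your answer>\"}\n"
}
```

The model sees a heading line, then the JSON example indented by two spaces. The source indentation that keeps the Rust code readable never reaches the prompt.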

18.3 Sorting For A Stable Prompt

The first half of system_prompt turns the registry into a list:

File: abcb/crates/abcb-core/src/lib.rs

let mut tools: Vec<(&str, &str)> = registry
    .names()
    .map(|name| {
        let description = registry
            .get(name)
            .map(|tool| tool.description())
            .unwrap_or("");
        (name, description)
    })
    .collect();
tools.sort_by_key(|(name, _)| *name);

The registry stores tools in a HashMap. A HashMap does not promise a stable iteration order. That is fine for lookup, but it is not good for a prompt we want to test.

So the code collects the tools into a Vec, then sorts them by name.

This has two practical effects. The generated prompt is stable from run to run, and the tests can check exact pieces of it without depending on accidental map order.

This is one of those small engineering choices that makes a system easier to reason about. In this prompt, the tool list is not ordered by priority or workflow. Each tool is just one available action, so echo appearing before session_note_search is not meant to carry meaning. The test suite, however, does care about stable text.
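The same idea in isolation, as a sketch. A plain HashMap from tool name to description stands in for ToolRegistry here, and tool_lines is a hypothetical name.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the registry: tool name -> description.
pub fn tool_lines(tools: &HashMap<String, String>) -> Vec<String> {
    let mut entries: Vec<(&str, &str)> = tools
        .iter()
        .map(|(name, description)| (name.as_str(), description.as_str()))
        .collect();
    // HashMap iteration order is unspecified, so sort before rendering.
    entries.sort_by_key(|(name, _)| *name);
    entries
        .iter()
        .map(|(name, description)| format!("- {name}: {description}"))
        .collect()
}
```

No matter which order the entries were inserted in, or how the map happens to iterate, echo is rendered before session_note_search.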

18.4 Why push_str Instead Of format!

The prompt builder uses String::from, then appends pieces with push_str:

File: abcb/crates/abcb-core/src/lib.rs

for (name, description) in &tools {
    prompt.push_str("- ");
    prompt.push_str(name);
    prompt.push_str(": ");
    prompt.push_str(description);
    prompt.push('\n');
}

This is partly ordinary string building. But it also avoids a common Rust trap.

The prompt contains JSON examples, and JSON uses braces:

File: abcb/crates/abcb-core/src/lib.rs

{"kind":"final","content":"<your answer>"}

If we put the whole prompt inside format!, those braces would become special. format! reads { and } as placeholder delimiters, so every literal JSON brace has to be escaped as {{ or }}.

That is easy to get wrong and hard to read.

Here, most of the prompt is a plain string literal, and the dynamic tool list is appended with push_str. There is no formatting parser trying to interpret the JSON examples. The code is more boring, and that is a good thing.
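For comparison, here is a sketch of the same one-line example built both ways. The two function names are made up for this illustration, and neither appears in abcb.

```rust
// Both functions build the same line. The format! version has to double
// every literal brace; the push_str version leaves the JSON untouched.
pub fn with_format(tool: &str) -> String {
    format!("{{\"kind\":\"tool_call\",\"tool_name\":\"{tool}\",\"arguments\":{{}}}}")
}

pub fn with_push_str(tool: &str) -> String {
    let mut line = String::from("{\"kind\":\"tool_call\",\"tool_name\":\"");
    line.push_str(tool);
    line.push_str("\",\"arguments\":{}}");
    line
}
```

The outputs are identical. The difference is only in how many braces a reader has to count in the source.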

18.5 Injecting The Prompt In run

The CLI imports the prompt builder:

File: abcb/crates/abcb-cli/src/main.rs

use abcb_core::{
    AllowAll, Event, LoggedEvent, LoopError, Message, MockProvider, Role, Session, ToolRegistry,
    one_turn, read_events, run_loop, system_prompt, write_event,
};

Then run_run seeds the session with a system message before the user message:

File: abcb/crates/abcb-cli/src/main.rs

let mut session = Session::start();
session.push_message(Message::new(Role::System, system_prompt(&registry)));
session.push_message(Message::new(Role::User, message.as_str()));

This is the whole injection point.

The prompt is not passed as a separate parameter to run_loop. It is not a special field on Provider. It is just part of the session history.

That keeps the layers simple. The provider already receives the session and serializes messages. The loop already calls the provider with the session. Adding a system message does not require changing those APIs.

One detail matters: this happens in run, not in chat. chat is still a one-shot assistant reply path. run is the agent path that parses ModelOutput, calls tools, and needs the strict envelope contract.

18.6 System Messages Are Not Run Events

When run_loop starts, it records the user messages that were already placed in the session:

File: abcb/crates/abcb-core/src/lib.rs

for message in &session.messages {
    if message.role == Role::User {
        write_event(
            events,
            &Event::UserMessage {
                content: message.content.clone(),
            },
        )?;
    }
}

This scan happens before the loop asks the provider for the next assistant reply. It gives the event log a clear starting point: first record what the user asked, then record what the agent does in response.

The code already filters for Role::User. Because the new prompt is Role::System, it is not written as a UserMessage event.

That is the behavior I want. The event log records what the user asked and what the loop did. The system prompt is part of the model context, but it is not a user event and it should not count as a model step.

This is another small layering payoff. We add one system message to the session, and the existing event-log boundary still does the right thing.
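A stripped-down sketch of that filter, with simplified stand-in types and a hypothetical function name. The real loop writes Event::UserMessage values to the event log; this version only collects the contents that would be logged.

```rust
// Simplified stand-ins for the abcb-core types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    System,
    Tool,
}

pub struct Message {
    pub role: Role,
    pub content: String,
}

// Hypothetical stand-in for the start-of-run scan: return the contents
// that would be recorded as UserMessage events.
pub fn initial_user_events(messages: &[Message]) -> Vec<&str> {
    messages
        .iter()
        .filter(|m| m.role == Role::User)
        .map(|m| m.content.as_str())
        .collect()
}
```

For a session seeded with the system prompt followed by one user message, only the user message comes back.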

18.7 Parsing Now Has A Small Safety Net

The second layer of defense is in ModelOutput::parse:

File: abcb/crates/abcb-core/src/lib.rs

impl ModelOutput {
    pub fn parse(raw: &str) -> Result<ModelOutput, ModelOutputError> {
        Ok(serde_json::from_str(extract_json(raw))?)
    }
}

Before this chapter, parse handed the raw model output directly to serde_json::from_str. Now it first calls extract_json, a small helper that decides which part of the model's text should be treated as JSON.

This is not meant to bless messy output. The system prompt still says "no prose, no markdown code fences." But local models sometimes wrap the JSON in a code block anyway. If the JSON is clearly inside a fence, we can tolerate that without pretending arbitrary prose is valid.

That is the moderate choice.

We are not doing aggressive brace matching. We are not searching for the first { and last }. Those strategies look tempting, but they are fragile: naive brace counting breaks when a string value contains }, and a first-to-last scan breaks when the surrounding prose contains braces of its own. Instead, this parser keys off a marker the model deliberately emits: a markdown fence. The next section walks through that helper.
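To show why the brace-based shortcuts are fragile, here is a sketch of two naive extractors. Neither exists in abcb. They are written only to demonstrate the failure modes.

```rust
// Naive depth counting: slice from the first '{' to the point where the
// brace depth returns to zero. It ignores JSON string rules on purpose.
pub fn naive_brace_match(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    for (offset, ch) in raw[start..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&raw[start..start + offset + 1]);
                }
            }
            _ => {}
        }
    }
    None
}

// First '{' to last '}': survives braces inside strings, but not braces
// in prose that follows the JSON.
pub fn first_to_last_brace(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    (start <= end).then(|| &raw[start..=end])
}
```

Given a final answer whose content is "close with } please", the depth counter stops at the brace inside the string and returns a truncated object. Given valid JSON followed by "(I used {} as a placeholder)", the first-to-last scan swallows part of the prose. Fixing either one properly means tracking strings and escapes, which is most of a JSON parser.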

18.8 extract_json: A Narrow Borrowed-Slice Helper

The helper looks like this:

File: abcb/crates/abcb-core/src/lib.rs

fn extract_json(raw: &str) -> &str {
    let Some(open) = raw.find("```") else {
        return raw.trim();
    };

    let after_open = &raw[open + 3..];
    let body = match after_open.find('\n') {
        Some(newline) => &after_open[newline + 1..],
        None => after_open,
    };

    let inner = match body.find("```") {
        Some(close) => &body[..close],
        None => body,
    };

    inner.trim()
}

The signature is important:

File: abcb/crates/abcb-core/src/lib.rs

fn extract_json(raw: &str) -> &str

The return value is not a new String. It is a borrowed slice of raw.

The name extract_json sounds broad, but the helper is deliberately narrow. It does not find JSON anywhere in arbitrary text. It only handles two shapes: clean JSON after trimming, or JSON wrapped in a markdown code fence.

If there is no fence, the function returns raw.trim(): still a borrowed slice.

If there is a fence, the function slices into raw using byte indexes returned by find. after_open, body, and inner are all &str views into the same original string.

One small Rust safety detail is worth noticing here. Slicing a &str by byte index can panic if the index lands in the middle of a UTF-8 character. This code is okay because str::find returns byte indexes at valid character boundaries, and the markers we add to those indexes are ASCII: three backticks for the fence and one newline for the line break. Even if the model writes Korean, Japanese, emoji, or accented text around the fence, the slice boundaries still come from the ASCII markers, not from guessing inside the text.

This is a compact lifetime example. The returned &str cannot outlive raw, because it points into raw. Lifetime elision covers this case: with a single reference parameter, Rust ties the returned reference to that parameter, so we do not need to write lifetime parameters by hand.

The function is also zero-allocation. It does not copy the model output. It only finds boundaries and returns the part that should be parsed.
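The borrowing and UTF-8 claims can be checked with a small sketch. The helper below follows the same logic as the chapter's function, with two cosmetic changes that are mine: the lifetime is written out to show what elision expands to, and the fence marker is a constant spelled with escapes. is_view_into is a hypothetical check added for this illustration.

```rust
// Three backticks, written with escapes so this snippet can itself sit
// inside a fenced block.
const FENCE: &str = "\x60\x60\x60";

// Same logic as the chapter's helper, with the elided lifetime spelled
// out: the returned slice borrows from `raw` and cannot outlive it.
fn extract_json<'a>(raw: &'a str) -> &'a str {
    let Some(open) = raw.find(FENCE) else {
        return raw.trim();
    };

    let after_open = &raw[open + FENCE.len()..];
    let body = match after_open.find('\n') {
        Some(newline) => &after_open[newline + 1..],
        None => after_open,
    };

    let inner = match body.find(FENCE) {
        Some(close) => &body[..close],
        None => body,
    };

    inner.trim()
}

// Hypothetical check: true when `part` lies inside the bytes of `whole`,
// which means it is a view into the original and not a copy.
pub fn is_view_into(whole: &str, part: &str) -> bool {
    let whole_range = whole.as_bytes().as_ptr_range();
    let part_range = part.as_bytes().as_ptr_range();
    whole_range.start <= part_range.start && part_range.end <= whole_range.end
}
```

Feeding it a reply with Korean text and an emoji before the fence, and an accented character inside the JSON, returns the JSON object without panicking, and is_view_into confirms the result points into the original buffer.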

18.8.1 let-else For The Fast Path

The first line of extract_json uses let-else:

File: abcb/crates/abcb-core/src/lib.rs

let Some(open) = raw.find("```") else {
    return raw.trim();
};

The find call looks for the opening markdown fence, the three backtick characters, and returns Option<usize>.

If a fence exists, the Some(open) pattern extracts the starting byte index into open, and the function continues.

If there is no fence, the else block runs and returns the trimmed input immediately.

That makes the no-fence path easy to see. Clean JSON stays clean JSON. It does not go through the fence-stripping logic.

18.8.2 What The Fence Logic Actually Does

After the opening fence is found, the code skips the fence line:

File: abcb/crates/abcb-core/src/lib.rs

let after_open = &raw[open + 3..];
let body = match after_open.find('\n') {
    Some(newline) => &after_open[newline + 1..],
    None => after_open,
};

The + 3 skips the three backticks. If the model wrote ```json, the language tag is still on that first fence line, so the code looks for the newline and starts after it. If there is no newline at all, the None arm keeps everything after the backticks as the body.

Then it cuts at the closing fence:

File: abcb/crates/abcb-core/src/lib.rs

let inner = match body.find("```") {
    Some(close) => &body[..close],
    None => body,
};

inner.trim()

If a closing fence exists, inner is the content before it. If not, the function takes the rest of the body. The final trim() removes surrounding whitespace.

This is deliberately modest parsing. It handles a common wrapper. It does not try to understand every possible malformed response.

There is also a known edge case. Because this helper keys off the first markdown fence it sees, a legitimate JSON string that contains triple backticks could confuse it. That is acceptable for this stage. The system prompt asks the model not to use fences at all, and the recovery loop still catches parse failures. The helper only tolerates the most common wrapper. It does not become a second language parser.
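Both edge cases can be traced with a short sketch. The helper is repeated so the snippet stands alone, with the fence marker spelled with escapes. edge_cases is a hypothetical function that only exists to show the two results side by side.

```rust
// Three backticks, written with escapes so this snippet can itself sit
// inside a fenced block.
const FENCE: &str = "\x60\x60\x60";

fn extract_json(raw: &str) -> &str {
    let Some(open) = raw.find(FENCE) else {
        return raw.trim();
    };
    let after_open = &raw[open + FENCE.len()..];
    let body = match after_open.find('\n') {
        Some(newline) => &after_open[newline + 1..],
        None => after_open,
    };
    let inner = match body.find(FENCE) {
        Some(close) => &body[..close],
        None => body,
    };
    inner.trim()
}

// Runs the helper on the two edge cases and returns what it extracted.
pub fn edge_cases() -> (&'static str, &'static str) {
    // Unclosed fence: the rest of the body is taken, so this still
    // yields the JSON object.
    let unclosed = extract_json("\x60\x60\x60json\n{\"kind\":\"final\",\"content\":\"done\"}");
    // Clean JSON whose string value contains a fence marker: the helper
    // mistakes it for an opening fence and returns a fragment.
    let backticks =
        extract_json("{\"kind\":\"final\",\"content\":\"wrap it in \x60\x60\x60 fences\"}");
    (unclosed, backticks)
}
```

The first result is the intact object. The second is the fragment fences"} which is not valid JSON, so parsing fails and the reply goes to the recovery loop instead of being silently misread.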

18.9 The Tests Describe The Boundary

The parser tests cover the cases we want to tolerate:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn parse_tolerates_a_json_code_fence() {
    let raw = "```json\n{\"kind\":\"final\",\"content\":\"done\"}\n```";

    let output = ModelOutput::parse(raw).expect("fenced JSON should parse");

    assert_eq!(
        output,
        ModelOutput::Final {
            content: "done".into()
        }
    );
}

A fenced JSON block should parse.

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn parse_tolerates_a_bare_fence_with_surrounding_prose() {
    let raw = "Here is my response:\n```\n{\"kind\":\"final\",\"content\":\"done\"}\n```";

    let output = ModelOutput::parse(raw).expect("prose + fence should parse");

    assert_eq!(
        output,
        ModelOutput::Final {
            content: "done".into()
        }
    );
}

Prose around a fenced block should parse too. This is a pragmatic choice, because that is a common local-model shape.

Clean JSON still works:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn parse_still_accepts_clean_json_and_trims_whitespace() {
    let raw = "  {\"kind\":\"final\",\"content\":\"done\"}  ";

    let output = ModelOutput::parse(raw).expect("clean JSON should still parse");

    assert_eq!(
        output,
        ModelOutput::Final {
            content: "done".into()
        }
    );
}

But prose with no JSON still fails:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn parse_rejects_prose_with_no_json() {
    let err = ModelOutput::parse("Sure, I can help with that!").expect_err("no json");

    assert!(matches!(err, ModelOutputError::Parse(_)));
}

That last test is important. Fence stripping is a safety net, not magic. If the model does not produce JSON at all, parsing fails, and the recovery loop can respond.

18.10 Testing The Prompt Contract

The system prompt test checks the contract at a few important points:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn system_prompt_states_the_contract_and_lists_tools() {
    let registry = registry_with_stub_echo();

    let prompt = system_prompt(&registry);

    assert!(prompt.contains(r#""kind":"tool_call""#));
    assert!(prompt.contains(r#""kind":"final""#));
    assert!(prompt.contains("no markdown code fences"));
    assert!(prompt.contains("- stub_echo: Echoes the `text` field of its arguments."));
}

The test does not compare the entire prompt string. That would make the test brittle. Instead, it checks the pieces that define the contract: both envelope shapes, the no-fence instruction, and the registered tool listing.

This is also why the tool list is sorted. As the registry grows, the generated prompt should remain stable enough to test and reason about.

18.11 The Three-Layer Boundary

At this point, the model boundary has three layers.

The first layer is instruction. The system prompt tells the model the exact JSON envelope it should emit.

The second layer is tolerance. extract_json accepts clean JSON and fenced JSON, without trying to rescue every possible malformed response.

The third layer is recovery. If parsing still fails, the existing loop can feed a recovery message back to the model.

This is the shape I want for local models. Tell the model what we want. Tolerate a common small deviation. Recover from the rest.
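As a rough picture of how the layers stack, here is a heavily simplified sketch. It is not the Chapter 9 loop: parse_stub, ask_with_recovery, the string-based history, and the wording of the recovery message are all assumptions made for this illustration, and the stub only checks that the reply looks like a JSON object instead of running a real parser.

```rust
// Stand-in for ModelOutput::parse. The real parser deserializes JSON;
// this stub only checks that the trimmed reply looks like one object.
fn parse_stub(raw: &str) -> Result<&str, String> {
    let text = raw.trim();
    if text.starts_with('{') && text.ends_with('}') {
        Ok(text)
    } else {
        Err(format!("expected one JSON object, got: {text}"))
    }
}

// Hypothetical loop: ask, parse, and on failure append a recovery
// message so the next attempt can see what went wrong.
pub fn ask_with_recovery<F>(mut provider: F, max_attempts: usize) -> Result<String, String>
where
    F: FnMut(&[String]) -> String,
{
    // Layer one: the contract is stated before anything else.
    let mut history = vec!["system: emit exactly one JSON object".to_string()];
    let mut last_error = String::from("no attempts made");
    for _ in 0..max_attempts {
        let reply = provider(&history);
        // Layer two would sit inside the parser (fence tolerance).
        match parse_stub(&reply) {
            Ok(json) => return Ok(json.to_string()),
            Err(err) => {
                // Layer three: the environment talks back.
                history.push(format!("assistant: {reply}"));
                history.push(format!("user: {err}. Reply with JSON only."));
                last_error = err;
            }
        }
    }
    Err(last_error)
}
```

A provider that answers with prose first and JSON second succeeds on the second attempt. A provider that never emits JSON runs out of attempts and returns the last parse error.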

The opposite design would be to keep the prompt vague and make parsing aggressive. That feels clever, but it moves too much responsibility into string manipulation. An agent framework should make the contract visible first.

18.12 What Changed

Chapter 18 teaches the model the envelope contract and makes parsing a little more robust.

The framework lesson is that the model needs an interface just as much as the Rust code does. The system prompt is not decoration. It is part of the boundary between probabilistic output and deterministic code.

The Rust lesson is that a small borrowed-slice helper can be enough. extract_json(raw: &str) -> &str returns a view into the original text, uses let-else for the no-fence path, and slices carefully around markdown fences.

The design lesson is moderation. The parser tolerates fences, but it does not pretend arbitrary prose is valid JSON. The prompt prevents the common failure. The parser handles the common leftover wrapper. The recovery loop catches the rest.

The next chapter measures behavior more directly. Once the model has a clearer contract, we can start asking whether the agent actually follows it across repeatable scenarios.

To be continued