Building a Local-First Agent Framework in Rust (Part 9): Failing Gracefully With Recovery

See Part 0 for the latest table of contents and sample code. New chapters will be added over time.

Chapter 9: Failing Gracefully With Recovery

Chapter 8 gave abcb a loop. The framework could create a session, send it to the provider, parse the model's structured output, execute a tool, append the tool result, and repeat until the model returned a final answer.

That was enough for the happy path. It was not enough for a local-model agent.


When we work with commercial coding agents, we can sometimes forget how much correction is happening inside the tool. The model emits something slightly wrong, the system nudges it, the model tries again, and we only see the polished surface. With local models, especially smaller ones, that surface is thinner. The model may return malformed JSON. It may call a tool that does not exist. It may provide the wrong argument shape for a valid tool.

If the framework treats every one of those mistakes as a fatal error, the agent loop becomes fragile. One bad envelope ends the whole run.

But some errors are useful information. A malformed envelope can be turned into: "your previous output was not valid JSON." An unknown tool can be turned into: "call one of the registered tools instead." Bad arguments can be turned into: "correct them and try again."

That is the chapter's main change: some errors become context for the next model turn.

The sample code for this chapter is in chapter09/abcb/.

9.1 Recovery Is Loop Policy

The loop now has one more possible path: from a recoverable error back to the session.

Taking that path is not the same as pretending the error did not happen. The framework records the problem as a Role::Tool message and lets the provider see it on the next turn.

This choice belongs in run_loop, not in run_step.

run_step still means: call the provider once, record the assistant message, parse the envelope, and either return a final answer, execute one tool, or report an error. It does not decide whether an error deserves another try.

run_loop owns that policy because the loop controls repeated attempts. It already owns the session. It already owns the step budget. It is the right place to say: this error can become feedback, but that error should stop the run.

9.2 The Errors We Can Classify

The classification starts with LoopError, the error type returned by the agent loop:

File: abcb/crates/abcb-core/src/lib.rs

pub enum LoopError {
    Provider(ProviderError),
    Parse(ModelOutputError),
    UnknownTool(String),
    Tool(ToolError),
    MaxStepsExceeded { max_steps: usize },
}

These variants come from different layers.

Provider means the provider itself failed. In this chapter, the mock provider can run out of scripted responses. Later, a real provider may fail because an HTTP request failed or a model server was unavailable.

Parse means the provider returned text, but the framework could not parse it as ModelOutput.

UnknownTool means the model asked for a tool name that the registry does not know.

Tool means the registry found the tool, but invoking it failed.

MaxStepsExceeded means the loop consumed its step budget without receiving a final answer.

Only some of these are problems the model can plausibly fix. That distinction matters. If the model returned bad JSON, a retry with better instructions may work. If a file-backed tool hit an IO error, asking the model to try again may only hide a real environment problem.

9.3 Option<String> as a Recovery Verdict

The policy is expressed as a method on LoopError:

File: abcb/crates/abcb-core/src/lib.rs

impl LoopError {
    /// Classify this error for the agent loop's recovery policy.
    ///
    /// Returns `Some(feedback)` when the model can plausibly fix the problem by
    /// trying again. The feedback is appended to the session as a `Role::Tool`
    /// message so the next turn sees what went wrong. Returns `None` for
    /// environment, provider, or terminal errors that retrying cannot fix.
    pub fn recovery_feedback(&self) -> Option<String> {
        match self {
            LoopError::Parse(e) => Some(format!(
                "your previous output was not valid JSON ({e}); reply with a single valid envelope"
            )),
            LoopError::UnknownTool(name) => Some(format!(
                "unknown tool `{name}`; call one of the registered tools instead"
            )),
            LoopError::Tool(ToolError::InvalidArguments(e)) => Some(format!(
                "invalid tool arguments: {e}; correct them and try again"
            )),
            LoopError::Tool(ToolError::Execution(msg)) => Some(format!(
                "tool execution failed: {msg}; try a different approach"
            )),
            // Environment / provider / terminal errors are not the model's to fix.
            LoopError::Tool(ToolError::Io(_)) => None,
            LoopError::Provider(_) => None,
            LoopError::MaxStepsExceeded { .. } => None,
        }
    }
}

The return type is small, but it carries the whole recovery decision:

Option<String>

Some(feedback) means the loop should recover. The string is the message that will be appended to the session.

None means the loop should stop and return the original error.

Option as a policy result

We have already used Option for "a value may or may not be present." Here it also works as a compact policy result. Some is not just "there is a string." It means "this error is recoverable, and this is the feedback to send." None means "do not recover from this error."

There are other ways to design recovery. The framework could try to repair malformed JSON by itself. The provider layer could hide a retry inside complete. A stricter model interface could reject bad output before the loop sees it. Those approaches may become useful later, especially when the provider becomes real.

In this chapter, recovery is handled by feeding the error back to the model. That does not mean recovery is only the model's job. The framework still owns the policy: it decides which errors can be retried, writes the feedback, and appends that feedback to the session. Then the next provider call can respond with that context. This is simple, but it is also accountable. The session shows the mistake, the feedback, and the next attempt.

Returning Option<String> keeps the caller simple. The loop does not need a separate enum like Recoverable or Terminal yet. If there is feedback, it can continue. If there is no feedback, it stops.
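For comparison, here is a minimal sketch of what the separate-enum alternative might look like next to the Option<String> version. DemoError and Verdict are hypothetical names invented for this illustration; they are not part of abcb, and the error type is trimmed to two variants.

```rust
/// Hypothetical, trimmed-down error type for illustration only.
#[derive(Debug)]
enum DemoError {
    UnknownTool(String),
    ProviderDown,
}

/// Hypothetical explicit verdict enum: the alternative to `Option<String>`.
#[derive(Debug, PartialEq)]
enum Verdict {
    Recoverable(String),
    Terminal,
}

impl DemoError {
    /// The chapter's shape: `Some(feedback)` means retry, `None` means stop.
    fn recovery_feedback(&self) -> Option<String> {
        match self {
            DemoError::UnknownTool(name) => Some(format!("unknown tool `{}`", name)),
            DemoError::ProviderDown => None,
        }
    }

    /// The same decision expressed with a dedicated enum.
    fn verdict(&self) -> Verdict {
        match self.recovery_feedback() {
            Some(feedback) => Verdict::Recoverable(feedback),
            None => Verdict::Terminal,
        }
    }
}

fn main() {
    let unknown = DemoError::UnknownTool("grep".to_string());
    let down = DemoError::ProviderDown;
    println!("{:?} -> {:?}", unknown, unknown.verdict());
    println!("{:?} -> {:?}", down, down.verdict());
}
```

The two forms carry the same information. The enum only starts to pay for itself once there are more than two verdicts, for example a hypothetical "retry without feedback" case.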

9.4 The Recoverable Cases

The first recoverable case is malformed model output:

File: abcb/crates/abcb-core/src/lib.rs

LoopError::Parse(e) => Some(format!(
    "your previous output was not valid JSON ({e}); reply with a single valid envelope"
)),

This is a model-output problem. The provider returned a message, but the message was not a valid ModelOutput envelope. Since the model can plausibly correct its next reply, we turn the parse error into direct feedback.

The second recoverable case is an unknown tool:

File: abcb/crates/abcb-core/src/lib.rs

LoopError::UnknownTool(name) => Some(format!(
    "unknown tool `{name}`; call one of the registered tools instead"
)),

This also belongs to the model side. The registry is not broken. The tool call simply used a name the framework cannot resolve. The next turn can choose a real tool.

The next two cases go through the nested ToolError enum:

File: abcb/crates/abcb-core/src/lib.rs

LoopError::Tool(ToolError::InvalidArguments(e)) => Some(format!(
    "invalid tool arguments: {e}; correct them and try again"
)),
LoopError::Tool(ToolError::Execution(msg)) => Some(format!(
    "tool execution failed: {msg}; try a different approach"
)),

This is a good example of why the error taxonomy from earlier chapters matters. LoopError::Tool is not enough by itself. A tool can fail because the model sent the wrong argument shape. It can fail because the tool's own domain logic rejected the request. It can also fail because the environment failed.

Those should not all receive the same recovery treatment.

9.4.1 Matching Through Layers

The nested pattern lets the code say exactly what it means:

LoopError::Tool(ToolError::InvalidArguments(e))

This pattern matches a LoopError::Tool variant only when the inner ToolError is InvalidArguments. The code does not have to match the outer error first and then write a second match inside the arm. Rust lets us pattern match through the layers.

This is shorter and more precise because the recovery policy depends on the specific nested error, and the pattern shows that nested shape directly.
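To see the difference, here is a small self-contained sketch that writes the same classification twice: once with nested patterns and once with a match inside a match. The enums are simplified stand-ins for the chapter's types (fewer variants, String payloads), and the two is_recoverable_* functions are hypothetical helpers for this comparison only.

```rust
use std::io;

// Simplified stand-ins for the chapter's error types.
#[derive(Debug)]
enum ToolError {
    InvalidArguments(String),
    Io(io::Error),
}

#[derive(Debug)]
enum LoopError {
    UnknownTool(String),
    Tool(ToolError),
}

/// Nested form: one flat match, each arm names the full shape it handles.
fn is_recoverable_nested(e: &LoopError) -> bool {
    match e {
        LoopError::UnknownTool(_) => true,
        LoopError::Tool(ToolError::InvalidArguments(_)) => true,
        LoopError::Tool(ToolError::Io(_)) => false,
    }
}

/// Two-level form: same behavior, but the policy is split across two matches.
fn is_recoverable_two_level(e: &LoopError) -> bool {
    match e {
        LoopError::UnknownTool(_) => true,
        LoopError::Tool(inner) => match inner {
            ToolError::InvalidArguments(_) => true,
            ToolError::Io(_) => false,
        },
    }
}

fn main() {
    let errors = vec![
        LoopError::UnknownTool("grep".to_string()),
        LoopError::Tool(ToolError::InvalidArguments("missing field `text`".to_string())),
        LoopError::Tool(ToolError::Io(io::Error::new(io::ErrorKind::NotFound, "no such file"))),
    ];
    for e in &errors {
        // Both forms agree on every error; only the readability differs.
        assert_eq!(is_recoverable_nested(e), is_recoverable_two_level(e));
        println!("{:?}: recoverable = {}", e, is_recoverable_nested(e));
    }
}
```

In both forms the compiler still checks exhaustiveness, so adding a new ToolError variant forces a decision about whether it is recoverable.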

9.5 The Terminal Cases

The terminal cases are just as important:

File: abcb/crates/abcb-core/src/lib.rs

// Environment / provider / terminal errors are not the model's to fix.
LoopError::Tool(ToolError::Io(_)) => None,
LoopError::Provider(_) => None,
LoopError::MaxStepsExceeded { .. } => None,

ToolError::Io(_) is deliberately not recoverable.

That may feel strict at first. If a file-backed tool failed to read a file, couldn't the model try a different tool? Sometimes, maybe. But in this version of the framework, IO errors are treated as environment errors. A missing directory, permission problem, or filesystem failure is not something the model can reliably fix by producing another envelope.

Provider(_) is also terminal. If the provider cannot give us another response, the loop cannot recover by asking it for another response. The mechanism needed for recovery is the thing that failed.

MaxStepsExceeded is terminal by definition. The loop already used its allowed number of attempts. Recovering from that error by appending more feedback would ignore the safety boundary we just hit.

What can we do after MaxStepsExceeded?

MaxStepsExceeded should not be recovered inside the same loop, but the caller still has practical options. The CLI could show a clear message and ask the user to rerun with a higher step limit. A future session layer could save the partial session so the user can inspect what happened. A more advanced agent could summarize the unfinished state and ask for confirmation before continuing with a fresh budget. The important point is that those are caller-level decisions. The current loop should not silently continue after its own safety boundary.
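As a rough sketch of the first option, a caller could translate the terminal error into a message for the user. The user_message helper below is hypothetical and not part of abcb's CLI, and LoopError is reduced to two variants with a String payload so the example stands alone.

```rust
// Simplified stand-in for the chapter's LoopError; only two variants are sketched.
#[derive(Debug)]
enum LoopError {
    Provider(String),
    MaxStepsExceeded { max_steps: usize },
}

/// Hypothetical caller-side helper: turn a terminal loop error into a
/// message the user can act on. The loop itself stays out of this decision.
fn user_message(err: &LoopError) -> String {
    match err {
        LoopError::MaxStepsExceeded { max_steps } => format!(
            "stopped after {} steps without a final answer; rerun with a higher step limit",
            max_steps
        ),
        LoopError::Provider(detail) => format!("provider failed: {}", detail),
    }
}

fn main() {
    let err = LoopError::MaxStepsExceeded { max_steps: 8 };
    eprintln!("{}", user_message(&err));
}
```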

The split between ToolError::InvalidArguments and ToolError::Io is the key design judgment in this chapter. Both are tool errors, but only one is something the model can plausibly correct.

9.6 The Recovery Branch in run_loop

Chapter 8's loop used ? on run_step, which meant any step error returned immediately. Chapter 9 replaces that with an explicit match:

File: abcb/crates/abcb-core/src/lib.rs

for _ in 0..max_steps {
    match run_step(provider, registry, &mut session) {
        Ok(StepOutcome::Final(answer)) => return Ok(answer),
        Ok(StepOutcome::ToolExecuted { .. }) => {}
        Err(e) => match e.recovery_feedback() {
            Some(feedback) => session.push_message(Message::new(Role::Tool, feedback)),
            None => return Err(e),
        },
    }
}

Err(LoopError::MaxStepsExceeded { max_steps })

The two success arms are unchanged in spirit.

If the step returns a final answer, the loop returns it:

Ok(StepOutcome::Final(answer)) => return Ok(answer),

If the step executed a tool, run_step has already appended the tool result to the session, so the loop simply continues:

Ok(StepOutcome::ToolExecuted { .. }) => {}

The new behavior is in the error arm:

Err(e) => match e.recovery_feedback() {
    Some(feedback) => session.push_message(Message::new(Role::Tool, feedback)),
    None => return Err(e),
},

First, the loop asks the error whether it can become recovery feedback:

e.recovery_feedback()

If the answer is Some(feedback), the loop appends that feedback as a tool message:

session.push_message(Message::new(Role::Tool, feedback))

Then the loop continues to the next iteration. There is no return in that arm.

If the answer is None, the loop returns the original error:

None => return Err(e),

This is why recovery_feedback takes &self. The loop can ask the error for a recovery verdict without consuming it. If recovery is not allowed, the loop still owns e and can return it.
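The ownership point can be shown in isolation. The sketch below uses a hypothetical two-variant DemoError and a handle function that mirrors the shape of the loop's error arm. If recovery_feedback took self instead of &self, the Err(e) line would fail to compile because e would already have been moved.

```rust
#[derive(Debug, PartialEq)]
enum DemoError {
    BadJson,
    ProviderDown,
}

impl DemoError {
    /// Borrows the error, so the caller keeps ownership after asking.
    fn recovery_feedback(&self) -> Option<String> {
        match self {
            DemoError::BadJson => Some("output was not valid JSON".to_string()),
            DemoError::ProviderDown => None,
        }
    }
}

/// Mirrors the loop's error arm: record feedback, or hand the original error back.
fn handle(e: DemoError, log: &mut Vec<String>) -> Result<(), DemoError> {
    match e.recovery_feedback() {
        Some(feedback) => {
            log.push(feedback);
            Ok(())
        }
        // `e` is still owned here because `recovery_feedback` only borrowed it.
        None => Err(e),
    }
}

fn main() {
    let mut log = Vec::new();
    println!("{:?}", handle(DemoError::BadJson, &mut log));
    println!("{:?}", handle(DemoError::ProviderDown, &mut log));
    println!("feedback recorded: {:?}", log);
}
```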

We could write the recovery branch with if let Some(feedback) = e.recovery_feedback(), but the match form makes both outcomes visible together. For a policy branch, that explicitness is useful. The reader can see the two decisions side by side: append feedback, or abort.

9.7 Recovery Consumes a Step

There is no special retry counter in this chapter. Recovery uses the same loop budget as ordinary tool execution:

File: abcb/crates/abcb-core/src/lib.rs

for _ in 0..max_steps {
    match run_step(provider, registry, &mut session) {
        Ok(StepOutcome::Final(answer)) => return Ok(answer),
        Ok(StepOutcome::ToolExecuted { .. }) => {}
        Err(e) => match e.recovery_feedback() {
            Some(feedback) => session.push_message(Message::new(Role::Tool, feedback)),
            None => return Err(e),
        },
    }
}

Each iteration calls the provider once. If that provider response is malformed and the loop appends recovery feedback, that still used one iteration. The next retry is another iteration.

That choice keeps termination simple. max_steps is the single backstop for the loop. It bounds normal tool chains and it also limits recovery attempts.

A separate retry cap may look attractive, but then we would have two overlapping safety mechanisms: step budget and recovery budget. For now, one budget is easier to reason about. If the model keeps failing, it eventually hits MaxStepsExceeded.
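The shared budget is easy to model without any of the framework types. In this sketch, Step, Stopped, and run are hypothetical names: each scripted step stands in for one run_step call, and tool executions and recoverable errors both simply fall through to the next iteration.

```rust
/// Hypothetical result of one scripted step, standing in for what run_step reports.
#[derive(Clone, Copy, Debug)]
enum Step {
    Final,
    ToolExecuted,
    Recoverable,
}

#[derive(Debug, PartialEq)]
enum Stopped {
    BudgetExhausted { max_steps: usize },
    ScriptExhausted,
}

/// Runs the script under a single budget and returns how many steps were used.
fn run(script: &[Step], max_steps: usize) -> Result<usize, Stopped> {
    let mut steps = script.iter();
    for used in 1..=max_steps {
        match steps.next() {
            Some(Step::Final) => return Ok(used),
            // A tool call and a recovery attempt cost the same: one iteration.
            Some(Step::ToolExecuted) | Some(Step::Recoverable) => {}
            // Stand-in for the provider running out of responses.
            None => return Err(Stopped::ScriptExhausted),
        }
    }
    Err(Stopped::BudgetExhausted { max_steps })
}

fn main() {
    let script = [Step::Recoverable, Step::Recoverable, Step::Final];
    println!("budget 5: {:?}", run(&script, 5));
    println!("budget 2: {:?}", run(&script, 2));
}
```

With a budget of 5 the third step succeeds. With a budget of 2 the same script stops at the boundary, even though every failure along the way was recoverable.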

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_recovery_is_bounded_by_max_steps() {
    let mut provider = MockProvider::new(["garbage one", "garbage two"]);
    let registry = registry_with_stub_echo();

    let err = run_loop(&mut provider, &registry, "hi", 2).expect_err("loop should fail");

    assert!(matches!(err, LoopError::MaxStepsExceeded { max_steps: 2 }));
}

The provider gives two invalid responses. Both are recoverable parse errors, but max_steps is 2. After two attempts, the loop stops. Recovery does not create an escape hatch from the loop boundary.

9.8 Recovery Tests

The recovery behavior is easiest to see through scripted providers.

The first test starts with malformed JSON and then returns a final answer:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_recovers_from_parse_error() {
    let mut provider = MockProvider::new([
        "this is not json",
        r#"{"kind":"final","content":"recovered"}"#,
    ]);
    let registry = registry_with_stub_echo();

    let answer = run_loop(&mut provider, &registry, "hi", 5).expect("loop should recover");

    assert_eq!(answer, "recovered");
}

The first provider response cannot be parsed. run_step returns LoopError::Parse. run_loop asks that error for feedback, appends the feedback to the session, and continues. The second provider response is a valid final envelope, so the loop returns "recovered".

The unknown-tool case has the same shape:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_recovers_from_unknown_tool() {
    let mut provider = MockProvider::new([
        r#"{"kind":"tool_call","tool_name":"nonexistent","arguments":{}}"#,
        r#"{"kind":"final","content":"recovered"}"#,
    ]);
    let registry = registry_with_stub_echo();

    let answer = run_loop(&mut provider, &registry, "hi", 5).expect("loop should recover");

    assert_eq!(answer, "recovered");
}

Here the JSON is valid and the envelope is valid, but the tool name is not registered. The registry returns None, run_step turns that into LoopError::UnknownTool, and the loop feeds back a correction.

Bad arguments are also recoverable:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_recovers_from_bad_arguments() {
    let mut provider = MockProvider::new([
        r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"wrong":1}}"#,
        r#"{"kind":"final","content":"recovered"}"#,
    ]);
    let registry = registry_with_stub_echo();

    let answer = run_loop(&mut provider, &registry, "hi", 5).expect("loop should recover");

    assert_eq!(answer, "recovered");
}

This time the tool exists, but StubEcho expects a text field:

File: abcb/crates/abcb-core/src/lib.rs

#[derive(Deserialize)]
struct StubEchoArgs {
    text: String,
}

struct StubEcho;

impl Tool for StubEcho {
    fn name(&self) -> &str {
        "stub_echo"
    }

    fn description(&self) -> &str {
        "Echoes the `text` field of its arguments."
    }

    fn invoke(&self, args: &serde_json::Value) -> Result<String, ToolError> {
        let typed: StubEchoArgs = serde_json::from_value(args.clone())?;
        Ok(typed.text)
    }
}

The provider sends {"wrong":1} instead of {"text":"..."}. serde_json::from_value fails, the ? operator converts that serde_json::Error into ToolError::InvalidArguments through ToolError's From implementation, and run_loop treats that as recoverable.
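The conversion that ? performs can be sketched with the standard library alone. This example is an assumption-laden stand-in: it uses std::num::ParseIntError where the real code has serde_json::Error, the ToolError here has a single String-carrying variant, and invoke is a hypothetical tool body. How abcb actually declares its From implementation is not shown in this chapter.

```rust
use std::num::ParseIntError;

/// Simplified stand-in for ToolError with a single variant.
#[derive(Debug)]
enum ToolError {
    InvalidArguments(String),
}

// `?` calls `From::from` on the error value, so this impl is what lets a
// parse failure become `ToolError::InvalidArguments` without `map_err`.
impl From<ParseIntError> for ToolError {
    fn from(e: ParseIntError) -> Self {
        ToolError::InvalidArguments(e.to_string())
    }
}

/// Hypothetical tool body: parse the argument, double it, return it as text.
fn invoke(arg: &str) -> Result<String, ToolError> {
    let n: i64 = arg.trim().parse()?;
    Ok((n * 2).to_string())
}

fn main() {
    println!("{:?}", invoke("21"));
    println!("{:?}", invoke("not a number"));
}
```

The tool body never names InvalidArguments. The classification happens once, in the From implementation, and every ? in every tool reuses it.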

The tests do not prove that a real model will always improve after feedback. They prove the framework gives the model a chance to improve.

9.9 Terminal Tests

The provider-error test checks the opposite behavior:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_aborts_on_provider_error() {
    // One scripted tool call, then the provider is exhausted on the next turn.
    let mut provider = MockProvider::new([
        r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"text":"x"}}"#,
    ]);
    let registry = registry_with_stub_echo();

    let err = run_loop(&mut provider, &registry, "hi", 5).expect_err("loop should abort");

    assert!(matches!(err, LoopError::Provider(_)));
}

The first step succeeds as a tool call. The loop continues. On the second step, the mock provider has no more scripted responses, so provider.complete(session) returns ProviderError::NoMoreResponses.

That becomes LoopError::Provider, and recovery_feedback returns None for provider errors:

File: abcb/crates/abcb-core/src/lib.rs

// Environment / provider / terminal errors are not the model's to fix.
LoopError::Tool(ToolError::Io(_)) => None,
LoopError::Provider(_) => None,
LoopError::MaxStepsExceeded { .. } => None,

The provider case sits with the other terminal cases. The loop can feed back a model mistake, but it cannot repair the provider by asking it for another response. So the loop returns the error instead of appending feedback.

There is also a direct classification test:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn recovery_feedback_classifies_errors() {
    // Recoverable: model can plausibly fix these by trying again.
    assert!(
        LoopError::UnknownTool("x".into())
            .recovery_feedback()
            .is_some()
    );
    assert!(
        LoopError::Tool(ToolError::Execution("boom".into()))
            .recovery_feedback()
            .is_some()
    );

    // Not recoverable: environment / provider / terminal conditions.
    let io = ToolError::Io(io::Error::new(io::ErrorKind::PermissionDenied, "denied"));
    assert!(LoopError::Tool(io).recovery_feedback().is_none());
    assert!(
        LoopError::Provider(ProviderError::NoMoreResponses)
            .recovery_feedback()
            .is_none()
    );
    assert!(
        LoopError::MaxStepsExceeded { max_steps: 3 }
            .recovery_feedback()
            .is_none()
    );
}

The test uses is_some() and is_none() because the exact feedback text is less important than the classification. A recoverable error must produce some feedback. A terminal error must not.

is_some() and is_none()

Option has small helper methods for tests like this. is_some() returns true when the value is Some(...), without caring what is inside. is_none() returns true when the value is None. Here that is exactly what we want to check. The test is about the recovery classification, not the exact wording of the feedback string.

9.10 Proving the Next Turn Sees the Updated Session

The recovery tests above prove the outcome. They show that the loop can recover when the provider's next scripted response is valid. But they do not prove that the provider was called with the updated session.

For that, the test suite adds a different provider:

File: abcb/crates/abcb-core/src/lib.rs

/// A provider that records every session it is asked to complete, so tests
/// can assert what the model actually saw on each turn.
struct RecordingProvider {
    scripted: VecDeque<String>,
    seen: Vec<Session>,
}

RecordingProvider still uses a queue of scripted responses, but it also stores each session it receives:

File: abcb/crates/abcb-core/src/lib.rs

impl Provider for RecordingProvider {
    fn complete(&mut self, session: &Session) -> Result<Message, ProviderError> {
        self.seen.push(session.clone());
        let content = self
            .scripted
            .pop_front()
            .ok_or(ProviderError::NoMoreResponses)?;
        Ok(Message::new(Role::Assistant, content))
    }
}

The important line is:

self.seen.push(session.clone());

The provider receives &Session, so it cannot move the session into seen. It clones the session and stores the snapshot. That gives the test a record of what the provider saw on each turn.

The test uses that record to check the second provider call:

File: abcb/crates/abcb-core/src/lib.rs

#[test]
fn run_loop_feeds_tool_result_back_to_model() {
    let mut provider = RecordingProvider::new([
        r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"text":"pong"}}"#,
        r#"{"kind":"final","content":"done"}"#,
    ]);
    let registry = registry_with_stub_echo();

    let answer = run_loop(&mut provider, &registry, "hi", 5).expect("loop should succeed");
    assert_eq!(answer, "done");

    // The provider was called twice. The second call's session must include
    // the tool result, proving the loop fed it back for the model to see.
    assert_eq!(provider.seen.len(), 2);
    let second_turn = &provider.seen[1];
    assert!(
        second_turn
            .messages
            .iter()
            .any(|m| m.role == Role::Tool && m.content == "pong"),
        "second turn should see the tool result; saw: {:?}",
        second_turn.messages
    );
}

This test is about tool result feedback, not recovery feedback, but the mechanism is the same. The loop appends a Role::Tool message, and the next provider call receives a session that includes that message.

The last assertion is worth reading slowly:

second_turn
    .messages
    .iter()
    .any(|m| m.role == Role::Tool && m.content == "pong")

second_turn.messages.iter() borrows each message in the second recorded session. any(...) returns true if at least one borrowed message matches the condition. The closure checks for a tool message whose content is "pong".

That small iterator chain proves the loop's promise: information produced after the first provider call is visible to the second provider call.
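The snapshot idea also works outside the framework. The following is a minimal sketch with pared-down Role, Message, and Session types and a RecordingProvider whose complete returns Option instead of Result; all of these are simplifications for illustration, and demo is a hypothetical driver function.

```rust
use std::collections::VecDeque;

#[derive(Clone, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Clone, Debug)]
struct Message {
    role: Role,
    content: String,
}

#[derive(Clone, Debug, Default)]
struct Session {
    messages: Vec<Message>,
}

struct RecordingProvider {
    scripted: VecDeque<String>,
    seen: Vec<Session>,
}

impl RecordingProvider {
    fn complete(&mut self, session: &Session) -> Option<Message> {
        // Clone first: later pushes to the session must not change this snapshot.
        self.seen.push(session.clone());
        let content = self.scripted.pop_front()?;
        Some(Message { role: Role::Assistant, content })
    }
}

/// Drives two turns by hand and returns what the provider saw on each.
fn demo() -> Vec<Session> {
    let mut provider = RecordingProvider {
        scripted: VecDeque::from(vec!["first".to_string(), "second".to_string()]),
        seen: Vec::new(),
    };
    let mut session = Session::default();
    session.messages.push(Message { role: Role::User, content: "hi".to_string() });

    let reply = provider.complete(&session).expect("scripted reply");
    session.messages.push(reply);
    session.messages.push(Message { role: Role::Tool, content: "pong".to_string() });

    let _ = provider.complete(&session);
    provider.seen
}

fn main() {
    for (turn, snapshot) in demo().iter().enumerate() {
        println!("turn {}: {} message(s)", turn + 1, snapshot.messages.len());
    }
}
```

Because each snapshot is a clone, the first recorded session still has only the user message even after the session has grown.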

9.11 Why Recovery Uses Role::Tool

One design choice may look odd: recovery feedback is appended as a Role::Tool message.

File: abcb/crates/abcb-core/src/lib.rs

Some(feedback) => session.push_message(Message::new(Role::Tool, feedback)),

The feedback did not come from a normal tool invocation. It came from the framework's parser, registry, or tool invocation boundary.

Still, Role::Tool is the closest role we have right now. From the model's point of view, the previous assistant message requested an action or produced an invalid action. The framework is responding with the result of that action: it could not parse the envelope, could not find the tool, or could not accept the arguments.

Using Role::Tool also keeps the session shape simple. The next provider call sees:

  1. the user's message,
  2. the assistant's attempted envelope,
  3. the framework's feedback as a tool message.

Later, a richer framework may separate tool output from framework diagnostics. For this chapter, using Role::Tool lets us teach the model through the same session channel already used for tool results.
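To make that session shape concrete, here is a toy loop that recovers from one bad reply and returns the full transcript. Everything in it is a stand-in: parse_final is a hypothetical replacement for envelope parsing (any reply starting with "final:" counts as a final answer), the session is a plain Vec of (Role, String) pairs, and there are no tools.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
    Tool,
}

/// Hypothetical stand-in for envelope parsing.
fn parse_final(raw: &str) -> Result<String, String> {
    raw.strip_prefix("final:")
        .map(|s| s.to_string())
        .ok_or_else(|| format!("not a valid envelope: {:?}", raw))
}

/// Tiny loop: returns the answer (if any) and the full session transcript.
fn run(
    mut scripted: VecDeque<&str>,
    prompt: &str,
    max_steps: usize,
) -> (Option<String>, Vec<(Role, String)>) {
    let mut session = vec![(Role::User, prompt.to_string())];
    for _ in 0..max_steps {
        let raw = match scripted.pop_front() {
            Some(raw) => raw,
            None => break, // scripted provider exhausted; this toy loop just stops
        };
        session.push((Role::Assistant, raw.to_string()));
        match parse_final(raw) {
            Ok(answer) => return (Some(answer), session),
            // Recoverable: the feedback joins the session as a tool message.
            Err(feedback) => session.push((Role::Tool, feedback)),
        }
    }
    (None, session)
}

fn main() {
    let script = VecDeque::from(vec!["oops", "final:recovered"]);
    let (answer, session) = run(script, "hi", 5);
    println!("answer: {:?}", answer);
    for (role, content) in &session {
        println!("{:?}: {}", role, content);
    }
}
```

The transcript reads user, assistant, tool, assistant: the mistake, the feedback, and the corrected attempt are all on record.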

9.12 What We Chose

This chapter makes the loop more forgiving without making it silent.

Recoverable model-side errors become feedback. Parse errors, unknown tools, invalid arguments, and tool execution failures produce a String that is appended to the session as a Role::Tool message. The next provider call can see that feedback and try again.

Terminal errors still stop the loop. Provider failures, IO failures, and MaxStepsExceeded return as errors. The model is not asked to repair problems that are outside its control.

Rust helps the design stay explicit. Option<String> represents the recovery verdict. Nested match patterns distinguish one kind of tool failure from another. The loop's match makes the two branches visible: append feedback and continue, or return the error.

There is still only one loop budget. Recovery attempts consume steps just like normal tool calls. That keeps the termination story simple: every provider call counts, and max_steps remains the final backstop.

At this point, abcb can run a multi-step session and recover from some bad model outputs. The loop is still using a mock provider, but the control flow now looks much closer to what a local-first agent framework needs in practice.

To be continued
