Building a Local-First Agent Framework in Rust (Part 16): Approval: Gating Tool Calls

See Part 0 for the latest table of contents and sample code. New chapters will be added over time.

Chapter 16: Approval: Gating Tool Calls

By the end of Chapter 15, the agent could remember notes across runs. It could call tools, write event logs, keep sessions on disk, and use a local model through an OpenAI-compatible HTTP provider. The loop was no longer only a toy conversation. It had begun to do things, and that is when another boundary becomes necessary.

A tool call is different from a final answer. A final answer is text. A tool call can touch state. Even our current tools are still mild: echoing text, adding numbers, appending notes, and searching notes. But the framework is clearly moving toward tools that can read files, run commands, inspect project state, or talk to a game editor. If the loop can execute a tool just because the model asked for it, the framework has no place to say, "wait, should this be allowed?"

This chapter adds that place as an approval gate. Before the loop executes any tool call, it asks an ApprovalPolicy. The policy can allow the tool call or deny it. If the call is allowed, the tool runs as before. If the call is denied, the tool does not run, but the loop does not crash. The denial becomes part of the session and part of the event log.

That last part matters. A denial is not an exception. It is a policy decision.

The sample code for this chapter is in chapter16/abcb/.

16.1 The New Event: A Tool Was Denied

The event log already records user messages, model responses, tool results, and final answers. If approval becomes part of the loop, denial should also become part of the log.

File: abcb/crates/abcb-core/src/lib.rs

#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
pub enum Event {
    UserMessage { content: String },
    ModelResponse { content: String },
    ToolResult { tool_name: String, output: String },
    ToolDenied { tool_name: String },
    FinalAnswer { content: String },
}

This is the first visible change. ToolDenied sits beside ToolResult, not inside it.

That is a deliberate choice. A denied tool call is not a failed tool result. The tool did not run. If the event log later contains a JSONL record such as this:

{"kind":"tool_denied","tool_name":"session_note_append"}

then the meaning is clear. The model asked for a tool, the approval layer blocked it, and no side effect happened.

This is different from the recovery feedback we added earlier. Recovery feedback is loop-internal guidance. It helps the model recover from malformed JSON, unknown tools, or bad arguments. A denied tool call is more important than that. It is a policy action, and it deserves its own event.
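To make the audit value of a separate variant concrete, here is a minimal sketch, not the chapter's code. It mirrors the Event enum without the serde derives so it compiles with only the standard library. The two helper functions, tools_that_ran and tools_that_were_denied, are hypothetical names introduced for illustration.

```rust
// Simplified mirror of the chapter's Event enum. The serde derives are
// omitted so this sketch needs no external crates.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Event {
    UserMessage { content: String },
    ModelResponse { content: String },
    ToolResult { tool_name: String, output: String },
    ToolDenied { tool_name: String },
    FinalAnswer { content: String },
}

/// Hypothetical audit helper: names of tools that actually ran.
pub fn tools_that_ran(events: &[Event]) -> Vec<&str> {
    events
        .iter()
        .filter_map(|event| match event {
            Event::ToolResult { tool_name, .. } => Some(tool_name.as_str()),
            _ => None,
        })
        .collect()
}

/// Hypothetical audit helper: names of tools the policy blocked.
pub fn tools_that_were_denied(events: &[Event]) -> Vec<&str> {
    events
        .iter()
        .filter_map(|event| match event {
            Event::ToolDenied { tool_name } => Some(tool_name.as_str()),
            _ => None,
        })
        .collect()
}
```

Because the two cases are different variants, neither helper has to inspect an output string to guess whether a side effect happened.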

16.2 The New Step Outcome

The same idea appears in StepOutcome:

File: abcb/crates/abcb-core/src/lib.rs

#[derive(Clone, Debug, PartialEq)]
pub enum StepOutcome {
    Final(String),
    ToolExecuted { tool_name: String, output: String },
    ToolDenied { tool_name: String },
}

The loop already had two normal step outcomes. A step could finish with a final answer, or it could execute a tool and continue. Now there is a third normal outcome: the model requested a tool, but the policy denied it.

This is a useful Rust moment. Adding an enum variant is noisy in a good way. Every match that lists the outcomes explicitly, without a catch-all _ arm, now fails to compile until it handles the new one. The compiler becomes a checklist. If denial is a real state in the framework, the type system makes us carry it through the places that need to know.

That is one reason I like enums for framework states. They make hidden behavior harder to hide.
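As a small illustration of that checklist effect, here is a sketch with a hypothetical helper named describe. It is not part of the sample code. The match has no catch-all arm, so removing the ToolDenied arm would stop the function from compiling.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum StepOutcome {
    Final(String),
    ToolExecuted { tool_name: String, output: String },
    ToolDenied { tool_name: String },
}

/// Hypothetical helper: a one-line summary of a step outcome.
pub fn describe(outcome: &StepOutcome) -> String {
    // No `_` arm on purpose: a new variant makes this match non-exhaustive,
    // and the compiler points at it until the new case is handled.
    match outcome {
        StepOutcome::Final(answer) => format!("final: {answer}"),
        StepOutcome::ToolExecuted { tool_name, output } => {
            format!("ran {tool_name}: {output}")
        }
        StepOutcome::ToolDenied { tool_name } => format!("denied {tool_name}"),
    }
}
```

A catch-all arm would have compiled silently after the new variant was added, which is exactly the hidden behavior the enum is meant to prevent.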

16.3 The Approval Decision

The approval layer starts with a small enum:

File: abcb/crates/abcb-core/src/lib.rs

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

There are only two answers for now. Either the tool call may run (Allow), or it may not (Deny).

This enum is intentionally plain. It does not include a reason string. It does not include "ask the user" or "allow once" or "allow always." Those may become useful later, but this chapter is about the core boundary. The loop only needs to know whether it can execute the requested tool.

The decision itself comes from a trait:

File: abcb/crates/abcb-core/src/lib.rs

pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &serde_json::Value) -> ApprovalDecision;
}

This is the third time this series has used the same framework shape: core defines a small interface, and the caller supplies the concrete behavior.

The first example was Provider. The loop could ask for a model response without knowing whether the answer came from a mock provider or a real HTTP model. The second example was the event sink from Chapter 14. The loop could write events without knowing whether the writer was a file, a buffer, or io::sink().

Now ApprovalPolicy follows the same pattern. The loop can ask for permission without knowing whether the answer came from an allow-all default, a test policy, a config allow-list, or a future interactive prompt.

There is one important caveat about that future interactive prompt. The approve method in this chapter is synchronous. That is fine for AllowAll and DenyAll, because they never wait. A real human prompt would involve blocking I/O, and Chapter 11 already warned us not to casually block inside async code. When that version arrives, approval may need to become async, or the blocking prompt may need to run on a blocking thread. For now, the synchronous trait is enough to establish the approval boundary without adding another async layer.
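The allow-list idea mentioned above could look roughly like this. It is a sketch under stated assumptions, not code from the sample repository: AllowList is a hypothetical type, and the trait here takes the arguments as raw JSON text (&str) instead of &serde_json::Value so the block compiles with only the standard library.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

// Stand-in for the chapter's trait. Assumption: `arguments` is raw JSON text
// here, where the real trait receives `&serde_json::Value`.
pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &str) -> ApprovalDecision;
}

/// Hypothetical policy: allow only tools whose names were listed up front.
pub struct AllowList {
    allowed: HashSet<String>,
}

impl AllowList {
    pub fn new<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            allowed: names.into_iter().map(Into::into).collect(),
        }
    }
}

impl ApprovalPolicy for AllowList {
    fn approve(&mut self, tool_name: &str, _arguments: &str) -> ApprovalDecision {
        if self.allowed.contains(tool_name) {
            ApprovalDecision::Allow
        } else {
            ApprovalDecision::Deny
        }
    }
}
```

The loop would not change at all to use a policy like this. It only sees the trait.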

16.4 Why `&mut self`?

The approve method takes &mut self:

File: abcb/crates/abcb-core/src/lib.rs

fn approve(&mut self, tool_name: &str, arguments: &serde_json::Value) -> ApprovalDecision;

At first, this may look unnecessary. The two policies in this chapter do not need mutation. AllowAll always returns Allow, and DenyAll always returns Deny.

But the trait is designed for the policies we will need later.

An approval policy may want to remember state. It might allow the first call and deny the rest. It might record which tools were requested. It might ask a human once and then remember "allow this tool for the rest of this session." It might enforce a budget, such as "only three file reads before asking again."

Those policies need mutable access to themselves. So the trait is honest about that from the beginning.

This is similar to the Provider trait from Chapter 4. MockProvider stored scripted responses in a queue, and each call to complete removed the next response from that queue. That mutation required &mut self. Here, the motivation is not a response queue. It is the possibility of stateful approval.

The receiver tells us something about the role of the object. &self says the object can answer without changing itself. &mut self says the answer may depend on state that changes over time.
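Here is a minimal sketch of one such stateful policy: a call budget. CallBudget is a hypothetical type, not part of the sample code, and the trait again uses raw JSON text for the arguments as a standard-library-only stand-in for &serde_json::Value.

```rust
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

// Stand-in for the chapter's trait. Assumption: `arguments` is raw JSON text.
pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &str) -> ApprovalDecision;
}

/// Hypothetical policy: allow a fixed number of calls, then deny the rest.
pub struct CallBudget {
    remaining: usize,
}

impl CallBudget {
    pub fn new(limit: usize) -> Self {
        Self { remaining: limit }
    }
}

impl ApprovalPolicy for CallBudget {
    fn approve(&mut self, _tool_name: &str, _arguments: &str) -> ApprovalDecision {
        if self.remaining == 0 {
            return ApprovalDecision::Deny;
        }
        // This write is the reason the receiver has to be `&mut self`.
        self.remaining -= 1;
        ApprovalDecision::Allow
    }
}
```

With a &self receiver, the same policy would need interior mutability such as Cell or a Mutex just to count calls.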

16.5 Two Basic Policies

The core crate provides two tiny policies: AllowAll, which approves every tool call, and DenyAll, which denies every tool call.

File: abcb/crates/abcb-core/src/lib.rs

pub struct AllowAll;

impl ApprovalPolicy for AllowAll {
    fn approve(&mut self, _tool_name: &str, _arguments: &serde_json::Value) -> ApprovalDecision {
        ApprovalDecision::Allow
    }
}

AllowAll is the production default for now. That may sound strange in a chapter about approval, but it keeps existing behavior unchanged while the framework learns where the approval boundary belongs.

The second policy denies everything:

File: abcb/crates/abcb-core/src/lib.rs

pub struct DenyAll;

impl ApprovalPolicy for DenyAll {
    fn approve(&mut self, _tool_name: &str, _arguments: &serde_json::Value) -> ApprovalDecision {
        ApprovalDecision::Deny
    }
}

DenyAll is useful even if it is not a realistic end-user policy. It gives tests a clean lever. If the loop behaves correctly with a policy that denies every tool call, we know denial is not just a comment in the code. We know the tool is skipped, the event is recorded, and the model receives feedback.

Notice the underscore parameters:

_tool_name: &str
_arguments: &serde_json::Value

The trait gives every policy the tool name and arguments. These two policies ignore them, so the parameter names start with _. The leading underscore suppresses Rust's unused-variable warning and tells the reader that the values are intentionally unused.

16.6 The Gate Inside `run_step`

The approval policy becomes a new dependency of run_step:

File: abcb/crates/abcb-core/src/lib.rs

pub async fn run_step(
    provider: &mut impl Provider,
    registry: &ToolRegistry,
    session: &mut Session,
    events: &mut impl Write,
    policy: &mut impl ApprovalPolicy,
) -> Result<StepOutcome, LoopError> {

This signature is getting crowded, but each argument has a clear responsibility.

The provider supplies model responses. The registry supplies tools. The session carries conversation state. The event writer records what happened. The approval policy decides whether a tool call may execute.

Inside the tool-call branch, the loop still looks up the tool first:

File: abcb/crates/abcb-core/src/lib.rs

let tool = registry
    .get(&tool_name)
    .ok_or_else(|| LoopError::UnknownTool(tool_name.clone()))?;

This ordering is intentional. An unknown tool is still an unknown-tool error. There is nothing meaningful to approve or deny if the registry has no such tool. Once the tool name resolves, the loop asks the policy before invoking it:

File: abcb/crates/abcb-core/src/lib.rs

if policy.approve(&tool_name, &arguments) == ApprovalDecision::Deny {
    write_event(
        events,
        &Event::ToolDenied {
            tool_name: tool_name.clone(),
        },
    )?;
    session.push_message(Message::new(
        Role::Tool,
        format!("the tool `{tool_name}` was denied by the approval policy"),
    ));
    return Ok(StepOutcome::ToolDenied { tool_name });
}

This is the center of the chapter.

The tool has been found, but it has not run yet. If the policy denies the call, three things happen.

First, the loop writes a ToolDenied event. This makes the denial visible in the audit log.

Second, the loop pushes a Role::Tool message into the session. That is the same channel we use for tool results and recovery feedback. The model asked the environment to do something, and the environment answered: no, that tool call was denied.

Third, the function returns Ok(StepOutcome::ToolDenied { tool_name }). It is Ok, not Err. That is the earlier point, that a denial is a policy decision and not an exception, encoded in the return type.
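The ordering of lookup, approval, and execution can be reduced to a small synchronous sketch. This is not the real run_step. The names gated_call, GateOutcome, GateError, ToolFn, and demo_tools are hypothetical, tools are plain function pointers instead of Tool trait objects, arguments are raw text, and the event log and session are left out.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

// Stand-in for the chapter's trait. Assumption: `arguments` is raw text.
pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &str) -> ApprovalDecision;
}

pub struct AllowAll;
impl ApprovalPolicy for AllowAll {
    fn approve(&mut self, _tool_name: &str, _arguments: &str) -> ApprovalDecision {
        ApprovalDecision::Allow
    }
}

pub struct DenyAll;
impl ApprovalPolicy for DenyAll {
    fn approve(&mut self, _tool_name: &str, _arguments: &str) -> ApprovalDecision {
        ApprovalDecision::Deny
    }
}

#[derive(Debug, PartialEq)]
pub enum GateOutcome {
    Executed(String),
    Denied,
}

#[derive(Debug, PartialEq)]
pub enum GateError {
    UnknownTool(String),
}

// Hypothetical tool representation: a function from argument text to output.
pub type ToolFn = fn(&str) -> String;

pub fn gated_call(
    tools: &HashMap<&str, ToolFn>,
    policy: &mut impl ApprovalPolicy,
    tool_name: &str,
    arguments: &str,
) -> Result<GateOutcome, GateError> {
    // 1. Resolve the name first: an unknown tool is an error, not a denial.
    let tool = tools
        .get(tool_name)
        .copied()
        .ok_or_else(|| GateError::UnknownTool(tool_name.to_string()))?;
    // 2. Ask the policy before anything runs. A denial is an Ok outcome.
    if policy.approve(tool_name, arguments) == ApprovalDecision::Deny {
        return Ok(GateOutcome::Denied);
    }
    // 3. Only an allowed call reaches the tool.
    Ok(GateOutcome::Executed(tool(arguments)))
}

fn stub_echo(text: &str) -> String {
    text.to_string()
}

pub fn demo_tools() -> HashMap<&'static str, ToolFn> {
    HashMap::from([("stub_echo", stub_echo as ToolFn)])
}
```

Even with DenyAll, asking for a tool that is not registered still produces the unknown-tool error, because the lookup happens before the policy is consulted.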

16.7 Denial Is Not Failure

This distinction is easy to miss, so it is worth slowing down.

An unknown tool is an error, represented as LoopError::UnknownTool(name). A tool that receives invalid arguments is also an error, represented as LoopError::Tool(ToolError::InvalidArguments(_)).

A denied tool call is not an error. The framework did exactly what it was supposed to do. It intercepted the requested action, applied the policy, recorded the denial, and gave feedback to the model.

That is why run_loop treats ToolDenied like ToolExecuted:

File: abcb/crates/abcb-core/src/lib.rs

match run_step(provider, registry, session, events, policy).await {
    Ok(StepOutcome::Final(answer)) => {
        write_event(
            events,
            &Event::FinalAnswer {
                content: answer.clone(),
            },
        )?;
        return Ok(answer);
    }
    Ok(StepOutcome::ToolExecuted { .. }) | Ok(StepOutcome::ToolDenied { .. }) => {}
    Err(e) => match e.recovery_feedback() {
        Some(feedback) => session.push_message(Message::new(Role::Tool, feedback)),
        None => return Err(e),
    },
}

If a tool ran, the result was already pushed into the session. If a tool was denied, the denial message was already pushed into the session. Either way, the next model turn has new information.

The loop can continue.

That is the main framework lesson of this chapter: approval is not just a safety check. It is part of the agent's data flow.

The full run_loop signature now carries the same policy dependency:

File: abcb/crates/abcb-core/src/lib.rs

pub async fn run_loop(
    provider: &mut impl Provider,
    registry: &ToolRegistry,
    session: &mut Session,
    max_steps: usize,
    events: &mut impl Write,
    policy: &mut impl ApprovalPolicy,
) -> Result<String, LoopError> {

The parameter list is longer than it was in the early chapters, but that length is visible architecture. The loop needs a model, tools, session state, an event sink, and now an approval policy. Each one is injected from outside, so core can stay focused on orchestration.

Rust: how many parameters is too many?

Rust does not have a special rule for the "right" number of function parameters. The compiler is fine with long signatures. The question is whether the signature still communicates the shape of the program.

A long parameter list is often a smell when several values always travel together. In that case, a small struct can give the group a name. For example, configuration values often become a Config, and repeated runtime dependencies may become a context struct.

Here, I am leaving the dependencies explicit because they are still teaching the architecture: provider, registry, session, event sink, and approval policy are separate boundaries. Later, if these same arguments keep moving together through more functions, a RunContext-style struct may become clearer.

16.8 Wiring the Default Policy

Changing the core signature means the CLI has to pass a policy.

File: abcb/crates/abcb-cli/src/main.rs

use abcb_core::{
    AllowAll, Event, Message, MockProvider, Role, Session, ToolRegistry, one_turn, read_events,
    run_loop, write_event,
};

For now, both the mock and real-provider paths use AllowAll. In the mock path, the step limit still comes from DEFAULT_MAX_STEPS; the new part is the final &mut AllowAll argument:

File: abcb/crates/abcb-cli/src/main.rs

let answer = run_loop(
    &mut provider,
    &registry,
    &mut session,
    DEFAULT_MAX_STEPS,
    &mut events,
    &mut AllowAll,
)
.await?;

The real-provider path still reads its step limit from configuration, but it passes the same approval policy:

File: abcb/crates/abcb-cli/src/main.rs

let answer = run_loop(
    &mut provider,
    &registry,
    &mut session,
    config.max_steps(),
    &mut events,
    &mut AllowAll,
)
.await?;

This keeps the command-line behavior the same for readers running the sample code. The framework now has an approval seam, but the CLI does not yet expose a real approval mode.

That may seem like a small result. But this is how framework boundaries often appear. First the boundary exists in the core. Then tests prove it. Later, product behavior can grow on top of it.

One Rust detail in the call site may look odd: &mut AllowAll.

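AllowAll is a unit-like struct. It has no fields. The expression AllowAll creates a value of that type, and &mut AllowAll borrows that temporary value mutably. The temporary lives until the end of the enclosing statement, which here includes the .await, so the borrow stays valid for the whole call.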

This works because run_loop only needs the policy while the function is running. If the policy had state we wanted to inspect afterward, we would bind it to a local variable:

let mut policy = AllowAll;
let answer = run_loop(
    &mut provider,
    &registry,
    &mut session,
    config.max_steps(),
    &mut events,
    &mut policy,
)
.await?;

Both forms satisfy the same parameter, policy: &mut impl ApprovalPolicy. In this chapter, the short form is fine because AllowAll has no state to keep.
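To show when the local binding starts to matter, here is a sketch with a policy that records what was requested. RecordingPolicy and run_scripted are hypothetical. run_scripted is a tiny synchronous stand-in for run_loop that only asks the policy about a scripted list of tool names, and the trait takes raw text for the arguments so the block needs no external crates.

```rust
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

// Stand-in for the chapter's trait. Assumption: `arguments` is raw text.
pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &str) -> ApprovalDecision;
}

pub struct AllowAll;
impl ApprovalPolicy for AllowAll {
    fn approve(&mut self, _tool_name: &str, _arguments: &str) -> ApprovalDecision {
        ApprovalDecision::Allow
    }
}

/// Hypothetical policy: allow everything, but remember each requested tool.
#[derive(Default)]
pub struct RecordingPolicy {
    pub requested: Vec<String>,
}

impl ApprovalPolicy for RecordingPolicy {
    fn approve(&mut self, tool_name: &str, _arguments: &str) -> ApprovalDecision {
        self.requested.push(tool_name.to_string());
        ApprovalDecision::Allow
    }
}

/// Hypothetical stand-in for `run_loop`: consults the policy once per
/// scripted tool call and returns how many calls were allowed.
pub fn run_scripted(policy: &mut impl ApprovalPolicy, calls: &[&str]) -> usize {
    let mut allowed = 0;
    for name in calls {
        if policy.approve(name, "{}") == ApprovalDecision::Allow {
            allowed += 1;
        }
    }
    allowed
}

/// Binds the policy to a local so its state can be read after the call.
pub fn requested_tools(calls: &[&str]) -> Vec<String> {
    let mut policy = RecordingPolicy::default();
    run_scripted(&mut policy, calls);
    policy.requested
}
```

Passing &mut RecordingPolicy::default() directly would also compile, but the recorded names would be dropped with the temporary at the end of the statement.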

16.9 Replay Learns the New Event

Because Event has a new variant, replay must handle it. The change is inside run_replay, where the CLI converts each event into a printable kind and content pair:

File: abcb/crates/abcb-cli/src/main.rs

fn run_replay(path: PathBuf) -> Result<(), Box<dyn Error>> {
    // ...
    for (index, logged) in events.iter().enumerate() {
        let (kind, content) = match &logged.event {
            Event::UserMessage { content } => ("user_message", content.clone()),
            Event::ModelResponse { content } => ("model_response", content.clone()),
            Event::ToolResult { tool_name, output } => {
                ("tool_result", format!("{tool_name}: {output}"))
            }
            Event::ToolDenied { tool_name } => ("tool_denied", tool_name.clone()),
            Event::FinalAnswer { content } => ("final_answer", content.clone()),
        };
        println!("[{}] {kind}: {content}", index + 1);
    }
    Ok(())
}

This is another place where Rust's exhaustive matching helps. If run_replay forgets about ToolDenied, the match is no longer exhaustive and the code does not compile. There is no silent fallback that prints an incorrect event or hides the new state.

The user-facing string is tool_denied. That matches the serialized event kind and keeps replay readable.

16.10 Testing Denial

The first denial test runs a single step with DenyAll:

File: abcb/crates/abcb-core/src/lib.rs

#[tokio::test]
async fn run_step_denied_by_policy_skips_the_tool_and_feeds_back() {
    let mut provider = MockProvider::new([
        r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"text":"pong"}}"#,
    ]);
    let registry = registry_with_stub_echo();
    let mut session = session_with_user("hi");

    let outcome = run_step(
        &mut provider,
        &registry,
        &mut session,
        &mut io::sink(),
        &mut DenyAll,
    )
    .await
    .expect("denial is policy, not error");

    assert_eq!(
        outcome,
        StepOutcome::ToolDenied {
            tool_name: "stub_echo".into()
        }
    );
    assert!(!session.messages.iter().any(|m| m.content == "pong"));
    let last = session.messages.last().expect("a message");
    assert_eq!(last.role, Role::Tool);
    assert!(last.content.contains("denied"));
}

This test proves the most important behavior: the tool does not run. If stub_echo had run, "pong" would appear in the session as a tool result. The assertion checks that it does not.

The last message is still a Role::Tool message, but it contains denial feedback rather than tool output. That keeps the conversation shape stable. The model still receives an environment response after its tool call. The content of that response tells it what happened.

The second test runs the full loop:

File: abcb/crates/abcb-core/src/lib.rs

#[tokio::test]
async fn run_loop_logs_denial_and_keeps_going() {
    let mut provider = MockProvider::new([
        r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"text":"pong"}}"#,
        r#"{"kind":"final","content":"ok then"}"#,
    ]);
    let registry = registry_with_stub_echo();
    let mut log: Vec<u8> = Vec::new();

    let answer = run_loop(
        &mut provider,
        &registry,
        &mut session_with_user("hi"),
        5,
        &mut log,
        &mut DenyAll,
    )
    .await
    .expect("loop reaches a final answer despite the denial");

    assert_eq!(answer, "ok then");
}

The model first asks for a tool. The policy denies it. Then the model sees that denial and returns a final answer. That is exactly the flow we want.

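The same test then checks the event stream. In this excerpt, events is the list of Event values recovered from the log buffer; the line that produces it is not shown: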

File: abcb/crates/abcb-core/src/lib.rs

assert_eq!(
    events,
    vec![
        Event::UserMessage {
            content: "hi".into()
        },
        Event::ModelResponse {
            content:
                r#"{"kind":"tool_call","tool_name":"stub_echo","arguments":{"text":"pong"}}"#
                    .into()
        },
        Event::ToolDenied {
            tool_name: "stub_echo".into()
        },
        Event::ModelResponse {
            content: r#"{"kind":"final","content":"ok then"}"#.into()
        },
        Event::FinalAnswer {
            content: "ok then".into()
        },
    ]
);

There is a ToolDenied event and no ToolResult event. The log does not blur "blocked" into "ran." That is the audit property this chapter adds.

16.11 Why Not Put Approval on the Tool?

One alternative would be to add a method like requires_approval(&self) -> bool to the Tool trait. That looks simple, but it puts the wrong object in charge.

Risk is not only a property of the tool. It can depend on the environment, the arguments, the user, the current mode, the project, or the previous approvals in this session. A file-reading tool may be harmless for one path and sensitive for another. A command-running tool may be blocked in unattended mode and allowed in an interactive development session.

The current design keeps that decision outside the tool. The policy sees both the tool name and the proposed arguments through policy.approve(&tool_name, &arguments). It can make a decision without changing the Tool trait and without forcing every tool implementation to know about product-level safety rules.

That separation is important. Tools describe capabilities. Policies decide whether a particular use of a capability is allowed.
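As an illustration of an argument-sensitive decision, here is a sketch of a policy that allows a file-reading tool only for relative paths that stay inside the project directory. Everything specific here is an assumption: the tool name file_read, the ProjectFilesOnly type, and the Arguments struct, which stands in for &serde_json::Value so the block compiles with only the standard library.

```rust
use std::path::{Component, Path};

#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ApprovalDecision {
    Allow,
    Deny,
}

// Hypothetical stand-in for `serde_json::Value`: only an optional path field.
pub struct Arguments {
    pub path: Option<String>,
}

// Stand-in for the chapter's trait, using the simplified `Arguments` above.
pub trait ApprovalPolicy {
    fn approve(&mut self, tool_name: &str, arguments: &Arguments) -> ApprovalDecision;
}

/// Hypothetical policy: `file_read` may only touch relative paths that never
/// step outside the project directory. Other tools are left alone.
pub struct ProjectFilesOnly;

impl ApprovalPolicy for ProjectFilesOnly {
    fn approve(&mut self, tool_name: &str, arguments: &Arguments) -> ApprovalDecision {
        if tool_name != "file_read" {
            return ApprovalDecision::Allow;
        }
        let path = match arguments.path.as_deref() {
            Some(path) if !path.is_empty() => path,
            // A file read without a usable path is denied.
            _ => return ApprovalDecision::Deny,
        };
        // Reject absolute paths and any `..` component.
        let stays_inside = Path::new(path)
            .components()
            .all(|part| matches!(part, Component::Normal(_) | Component::CurDir));
        if stays_inside {
            ApprovalDecision::Allow
        } else {
            ApprovalDecision::Deny
        }
    }
}
```

The same tool gets two different answers depending on its arguments, which a requires_approval flag on the tool itself could not express.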

16.12 What This Is Not Yet

This chapter does not implement a human approval prompt. There is no CLI question that asks, "allow this tool call?" There is no configuration file for allow-lists or deny-lists. There is no per-tool risk classification yet.

That is intentional. If we tried to build the user experience first, core would have to know too much about terminals, prompts, timeouts, defaults, and product policy. Instead, this chapter adds the seam where those things can attach later.

For now, AllowAll preserves current behavior, and DenyAll proves the denial path. That is enough for the framework layer.

16.13 What Changed

Chapter 16 adds an approval gate before tool execution.

The framework lesson is that risky actions need a boundary before they run. The loop now asks an ApprovalPolicy before every tool call. A denied call does not execute the tool, but it also does not crash the loop. The denial is recorded as Event::ToolDenied, fed back to the model as a Role::Tool message, and returned as StepOutcome::ToolDenied.

The Rust lesson is trait injection and stateful receivers. ApprovalPolicy is another example of core defining an interface and callers supplying behavior. Its method takes &mut self because a realistic policy may carry state across calls.

We also saw how enum variants ripple through a Rust program. Adding ToolDenied to Event and StepOutcome forces the code to handle denial explicitly. That is not busywork. It is one of the ways Rust helps framework code stay honest as the state machine grows.

The next chapter turns the event log around. We have been writing down what happened. Now we can read that log back and summarize a run from the records it left behind.

To be continued
