The Librarian Method

Intent-Stated Tool Discovery via Embeddings
Open source (MIT). Part of the OpenPawz project.

The Problem

Every AI agent platform faces the same scaling wall: tool bloat. An agent that can send emails, manage files, query databases, generate images, post to Slack, and trade crypto needs dozens of tool definitions injected into its context window. When you connect to an automation engine like n8n with thousands of available workflows and integrations, this becomes catastrophic — you simply cannot fit all those tool schemas into a 32K or even 128K context window. The conventional approaches all have critical flaws:
Approach                     Problem
--------                     -------
Load all tools               Impossible at scale. Hundreds of workflow definitions would consume excessive tokens.
Pre-filter by keyword        Fragile. “Send a message to John” — is that email, Slack, SMS, WhatsApp, or Telegram? A keyword filter guesses wrong.
Category menus               Requires the user to know which category their request falls into. Breaks natural language interaction.
Static tool sets per agent   Limits each agent’s capability. Defeats the purpose of a universal platform.
The fundamental issue: any system that decides which tools are relevant before the LLM has interpreted the user’s intent is solving the wrong problem. Only the LLM knows what the user actually wants.

The Invention

The Librarian Method inverts the problem. Instead of the system guessing which tools to load, the agent itself requests the tools it needs — after it has understood the user’s intent. The metaphor is literal: a library patron (the agent) walks up to a librarian and describes what they need. The librarian finds the right books. The patron never needs to know the Dewey Decimal System.

The Three Roles

Role        Implementation                                                                          Cost
----        --------------                                                                          ----
Patron      Cloud LLM (Gemini, Claude, GPT)                                                         Per-token (paid)
Librarian   Any embedding model (e.g. Ollama nomic-embed-text locally, or a cloud embedding API)    $0 local / low cloud
Library     In-memory ToolIndex with pre-computed embeddings                                        ~230 KB RAM

How It Works — Round by Round

Round 1: User says "Email John about the quarterly report"
  ┌─ Agent has: a small set of core tools (memory, filesystem, identity, request_tools)
  ├─ Agent understands intent: needs email capabilities
  └─ Agent calls: request_tools({"query": "email sending capabilities"})

  ┌─ Librarian embeds "email sending capabilities" → vector
  ├─ Cosine similarity search against tool index
  ├─ Top matches: email_send, email_read
  └─ Domain expansion: email_send → also returns email_read (siblings)

Round 2: Tools hot-loaded into this round
  ┌─ Agent now has: core tools + email_send + email_read
  ├─ Agent calls: email_send({to: "john@...", subject: "Q4 Report", ...})
  └─ Done ✅

The agent used only the tools it needed instead of carrying every available tool definition in its context window.
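The search step at the heart of the round above can be sketched in a few lines. This is an illustrative toy, not the OpenPawz source: `IndexedTool`, `search`, and the 3-dimensional "embeddings" are stand-ins for the real index and real model output.

```rust
// Toy sketch of the Librarian's lookup: each tool is stored with a
// pre-computed embedding; a query vector is matched by cosine similarity.

struct IndexedTool {
    name: &'static str,
    embedding: Vec<f32>,
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Names of tools whose similarity to `query` clears `threshold`, best first.
fn search(index: &[IndexedTool], query: &[f32], threshold: f32) -> Vec<&'static str> {
    let mut hits: Vec<(f32, &'static str)> = index
        .iter()
        .map(|t| (cosine_similarity(&t.embedding, query), t.name))
        .filter(|(score, _)| *score >= threshold)
        .collect();
    hits.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    hits.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    // Tiny fake "embeddings" stand in for real model output.
    let index = vec![
        IndexedTool { name: "email_send", embedding: vec![0.9, 0.1, 0.0] },
        IndexedTool { name: "image_gen",  embedding: vec![0.0, 0.2, 0.9] },
    ];
    let query = vec![1.0, 0.0, 0.0]; // embedding of "email sending capabilities"
    let hits = search(&index, &query, 0.5);
    assert_eq!(hits, vec!["email_send"]);
    println!("{:?}", hits);
}
```

In the real system the query vector comes from the embedding model and the index holds one vector per registered tool; the math is the same.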

Key Design Decisions

1. Agent-Driven Discovery

The LLM forms the search query, not a pre-filter guessing from the raw user message. This is the critical insight — the agent has parsed intent, so its query to the Librarian is precise and contextual. When a user says “Can you check if the deployment went through?”, a keyword filter might match deploy, check, or container. The agent understands the intent is monitoring and calls request_tools("deployment status monitoring CI/CD") — a much better search query.

2. Domain Expansion

When the Librarian finds email_send, it also returns email_read — tools from the same domain travel together. This prevents the agent from needing a second request_tools call to discover related capabilities. The 17 domains: system · filesystem · web · identity · memory · agents · communication · squads · tasks · skills · dashboard · storage · email · messaging · github · integrations · trading
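Expansion itself is just a set union over domain membership. A minimal sketch, assuming hypothetical `domains` and `tool_domain` maps (the real ToolIndex stores this mapping internally):

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical sketch of domain expansion: any hit pulls in its domain
// siblings, so a match on email_send also surfaces email_read.
fn expand_domains(
    domains: &HashMap<&str, Vec<&'static str>>, // domain -> tools in that domain
    tool_domain: &HashMap<&str, &str>,          // tool -> its domain
    matches: &[&str],
) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    for m in matches {
        if let Some(domain) = tool_domain.get(m) {
            if let Some(siblings) = domains.get(domain) {
                out.extend(siblings.iter().copied());
            }
        }
    }
    out
}

fn main() {
    let mut domains = HashMap::new();
    domains.insert("email", vec!["email_send", "email_read"]);
    let mut tool_domain = HashMap::new();
    tool_domain.insert("email_send", "email");
    tool_domain.insert("email_read", "email");

    let expanded = expand_domains(&domains, &tool_domain, &["email_send"]);
    assert!(expanded.contains("email_read")); // sibling came along for free
    println!("{:?}", expanded);
}
```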

3. Round Carryover

Tools loaded in round N remain available in round N+1 within the same chat turn. The agent doesn’t lose access to tools it just discovered. Tools are cleared between chat turns to prevent stale accumulation.
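The lifecycle above amounts to a per-turn set of loaded tool names. A sketch under that assumption (`TurnState` is an invented name, not the OpenPawz type):

```rust
use std::collections::HashSet;

// Illustrative round-carryover state: tools discovered in round N stay
// loaded for round N+1 of the same chat turn, then the set is cleared.
struct TurnState {
    loaded_tools: HashSet<String>,
}

impl TurnState {
    fn new() -> Self {
        Self { loaded_tools: HashSet::new() }
    }

    /// Hot-load tools returned by the Librarian for the rest of this turn.
    fn load(&mut self, tools: &[&str]) {
        self.loaded_tools.extend(tools.iter().map(|t| t.to_string()));
    }

    /// Called when the chat turn ends, preventing stale accumulation.
    fn end_turn(&mut self) {
        self.loaded_tools.clear();
    }
}

fn main() {
    let mut turn = TurnState::new();
    turn.load(&["email_send", "email_read"]);          // round 1 discovery
    assert!(turn.loaded_tools.contains("email_send")); // still there in round 2
    turn.end_turn();
    assert!(turn.loaded_tools.is_empty());             // fresh for the next turn
}
```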

4. Swarm Bypass

Orchestrated/swarm agents skip Tool RAG entirely and receive all tools upfront. These agents are autonomous and operate without interactive discovery — the overhead of request_tools calls would slow them down unnecessarily.

5. Multiple Fallback Layers

If semantic search returns no results:
  1. Exact name match — direct tool name lookup
  2. Domain match — load all tools from a named domain
  3. Full domain list — return the list of available domains so the agent can refine its query
The agent always gets something useful back.
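The ladder can be sketched as a simple chain of checks. Names here (`lookup`, `LibrarianResult`) are hypothetical; only the ordering of the fallbacks comes from the design above:

```rust
// Sketch of the fallback ladder: semantic hits first, then exact name
// match, then whole-domain load, then the list of domains so the agent
// can refine its query.
enum LibrarianResult {
    Tools(Vec<String>),
    Domains(Vec<String>),
}

fn lookup(
    query: &str,
    semantic_hits: Vec<String>,  // result of the embedding search
    all_tools: &[&str],
    domains: &[(&str, &[&str])], // (domain name, tools in that domain)
) -> LibrarianResult {
    if !semantic_hits.is_empty() {
        return LibrarianResult::Tools(semantic_hits);
    }
    // 1. Exact tool name match
    if all_tools.contains(&query) {
        return LibrarianResult::Tools(vec![query.to_string()]);
    }
    // 2. Domain name match: load every tool in that domain
    if let Some((_, tools)) = domains.iter().find(|(name, _)| *name == query) {
        return LibrarianResult::Tools(tools.iter().map(|t| t.to_string()).collect());
    }
    // 3. Nothing matched: return the domain list so the agent can refine
    LibrarianResult::Domains(domains.iter().map(|(n, _)| n.to_string()).collect())
}

fn main() {
    let domains: &[(&str, &[&str])] = &[("email", &["email_send", "email_read"])];
    match lookup("email", vec![], &["email_send", "email_read"], domains) {
        LibrarianResult::Tools(t) => assert_eq!(t, vec!["email_send", "email_read"]),
        _ => panic!("expected a domain match"),
    }
}
```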

Why This Is Novel

The Librarian Method combines five properties that, to the best of our knowledge, have not appeared together in any prior system:
Property                Description
--------                -----------
Agent-stated intent     The LLM writes the search query after understanding user intent — not a pre-filter on raw input
Local embeddings        Zero-cost semantic search via a local model (e.g. Ollama), or low-cost via any cloud embedding API
On-demand hot-loading   Tools injected into the active agent round, not pre-configured
Domain expansion        Sibling tools auto-included to reduce round trips
Scale-independent       Works the same whether you have 50 tools or 50,000

Prior Art Comparison

Academic work on tool retrieval for LLMs exists (ToolGen, ICLR 2025; ToolWeaver, ICLR 2026; ToolkenGPT, NeurIPS 2023), but these systems focus on fine-tuning LLMs to generate tool tokens or training retrieval models on tool documentation. They require expensive training pipelines and don’t adapt gracefully as the tool set changes. The Librarian Method requires zero training. It uses off-the-shelf embedding models (e.g. nomic-embed-text, OpenAI text-embedding-3-small, or any model that produces vector embeddings) and works with any tool added to the index — including tools that didn’t exist when the embedding model was trained.

Why This Matters for Cost and Context

Without the Librarian, every request carries the full set of available tool definitions in the context window. With the Librarian, only the tools relevant to the current task are loaded — typically a handful instead of dozens or hundreds. This has two compounding effects:
  1. Lower token usage — Fewer tool definitions means fewer tokens consumed per request, which directly reduces API costs regardless of which provider or model you use.
  2. More room for conversation — Context window space freed from tool schemas is available for actual conversation, memory, reasoning, and skill instructions.
The embedding search itself is powered by any embedding model you configure. Running locally via Ollama costs nothing; cloud embedding APIs like OpenAI’s text-embedding-3-small cost fractions of a cent per query.
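A back-of-the-envelope calculation makes the first effect concrete. All numbers below are assumptions for illustration (average schema size and tool counts vary widely by platform), not measurements:

```rust
// Illustrative per-request token cost of tool schemas, with and without
// the Librarian. Every number here is an assumed example value.
fn schema_tokens(tool_count: u64, avg_tokens_per_schema: u64) -> u64 {
    tool_count * avg_tokens_per_schema
}

fn main() {
    let avg = 150; // assumed average tokens per tool schema
    let all_tools = schema_tokens(200, avg);   // naive: load all 200 tools
    let librarian = schema_tokens(6 + 4, avg); // core tools + a few hot-loaded
    assert_eq!(all_tools, 30_000);
    assert_eq!(librarian, 1_500);
    println!("saved {} tokens per request", all_tools - librarian);
}
```

Under those assumptions the Librarian frees roughly 28,500 tokens per request for conversation, memory, and reasoning.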

Implementation

The Librarian Method is implemented across five files in the OpenPawz Rust engine:
File                            Purpose
----                            -------
engine/tool_index.rs            ToolIndex struct — embedding storage, cosine similarity, domain mapping
engine/tools/request_tools.rs   request_tools meta-tool — the agent-facing Librarian call
engine/chat.rs                  build_chat_tools() — filters tool list to core + loaded tools
engine/agent_loop/mod.rs        Hot-loads newly discovered tools between rounds
engine/state.rs                 tool_index + loaded_tools on EngineState

Core Flow (Simplified)

// 1. Agent calls request_tools with a natural-language query
fn request_tools(query: &str, state: &EngineState) -> Vec<ToolDef> {
    let index = state.tool_index.lock();

    // 2. Embed the query using the configured model (e.g. nomic-embed-text)
    let query_embedding = ollama_embed(query); // ~50 ms, $0 locally

    // 3. Cosine similarity against all indexed tools (0.5 similarity threshold)
    let matches = index.search(&query_embedding, 0.5);

    // 4. Domain expansion — include sibling tools from the same domain
    let expanded = index.expand_domains(matches);

    // 5. Return tool definitions for hot-loading into the next round
    expanded
}
The ToolIndex is built lazily on the first request_tools call — all registered tools (built-in + MCP) are embedded and stored as vectors. Subsequent searches are pure vector math against the in-memory index.
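The build-once-on-first-use pattern maps naturally onto Rust's `OnceLock`. A sketch of that pattern, assuming a simplified `ToolIndex` that only records names (the real index would embed every registered tool here):

```rust
use std::sync::OnceLock;

// Illustrative lazy build: the index is constructed once on first access,
// and every later call reuses the cached instance.
struct ToolIndex {
    names: Vec<&'static str>, // stand-in for tools + their embeddings
}

fn tool_index() -> &'static ToolIndex {
    static INDEX: OnceLock<ToolIndex> = OnceLock::new();
    INDEX.get_or_init(|| {
        // In OpenPawz this is where all registered tools (built-in + MCP)
        // would be embedded; here we just record the names.
        ToolIndex { names: vec!["email_send", "email_read"] }
    })
}

fn main() {
    // Both calls return the same cached index; the closure ran only once.
    assert!(std::ptr::eq(tool_index(), tool_index()));
    assert_eq!(tool_index().names.len(), 2);
}
```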

Try It

The Librarian Method is active by default in OpenPawz. No configuration needed beyond having an embedding model available. For zero-cost local operation, have Ollama running with nomic-embed-text installed:
ollama pull nomic-embed-text
Then ask your agent to do something that requires a specific tool:
“Check my GitHub notifications”

The agent calls request_tools("GitHub notifications"), the Librarian finds github_notifications and github_list_repos, and the agent proceeds — carrying only the tools it needs instead of every tool available.

License & Attribution

The Librarian Method is part of OpenPawz and is released under the MIT License. You are free to use, modify, and redistribute this technique in any project, commercial or otherwise. Attribution is appreciated but not required. If you reference this work in academic papers or technical writing:
OpenPawz (2025). “The Librarian Method: Intent-Stated Tool Discovery via Local Embeddings.” https://github.com/OpenPawz/openpawz