The Librarian Method

Intent-Stated Tool Discovery via Embeddings
Open source (MIT). Part of the OpenPawz project.

The Problem

Every AI agent platform faces the same scaling wall: tool bloat. An agent that can send emails, manage files, query databases, generate images, post to Slack, and trade crypto needs dozens of tool definitions injected into its context window. When you connect to an automation engine like n8n with thousands of available workflows and integrations, this becomes catastrophic — you simply cannot fit all those tool schemas into a 32K or even 128K context window. The conventional approaches all have critical flaws:
Approach                     Problem
--------                     -------
Load all tools               Impossible at scale. Hundreds of workflow definitions would consume excessive tokens.
Pre-filter by keyword        Fragile. “Send a message to John” — is that email, Slack, SMS, WhatsApp, or Telegram? A keyword filter guesses wrong.
Category menus               Requires the user to know which category their request falls into. Breaks natural language interaction.
Static tool sets per agent   Limits each agent’s capability. Defeats the purpose of a universal platform.
The fundamental issue: any system that decides which tools are relevant before the LLM has interpreted the user’s intent is solving the wrong problem. Only the LLM knows what the user actually wants.

The Invention

The Librarian Method inverts the problem. Instead of the system guessing which tools to load, the agent itself requests the tools it needs — after it has understood the user’s intent. The metaphor is literal: a library patron (the agent) walks up to a librarian and describes what they need. The librarian finds the right books. The patron never needs to know the Dewey Decimal System.

The Three Roles

Role        Implementation                                                                          Cost
----        --------------                                                                          ----
Patron      Cloud LLM (Gemini, Claude, GPT)                                                         Per-token (paid)
Librarian   Any embedding model (e.g. Ollama nomic-embed-text locally, or a cloud embedding API)    $0 local / low cloud
Library     In-memory ToolIndex with pre-computed embeddings                                        ~230 KB RAM

How It Works — Round by Round

Round 1: User says "Email John about the quarterly report"
  ┌─ Agent has: a small set of core tools (memory, filesystem, identity, request_tools)
  ├─ Agent understands intent: needs email capabilities
  └─ Agent calls: request_tools({"query": "email sending capabilities"})

  ┌─ Librarian embeds "email sending capabilities" → vector
  ├─ Cosine similarity search against tool index
  ├─ Top matches: email_send, email_read
  └─ Domain expansion: email_send → also returns email_read (siblings)

Round 2: Tools hot-loaded into this round
  ┌─ Agent now has: core tools + email_send + email_read
  ├─ Agent calls: email_send({to: "john@...", subject: "Q4 Report", ...})
  └─ Done ✅

The agent used only the tools it needed instead of carrying every available tool definition in its context window.
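The search step at the heart of the round above can be sketched in a few lines. This is an illustrative toy, not the OpenPawz source: `IndexedTool`, `search`, and the 3-dimensional "embeddings" are stand-ins for the real index and real model output.

```rust
// Toy sketch of the Librarian's lookup: each tool is stored with a
// pre-computed embedding; a query vector is matched by cosine similarity.

struct IndexedTool {
    name: &'static str,
    embedding: Vec<f32>,
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Names of tools whose similarity to `query` clears `threshold`, best first.
fn search(index: &[IndexedTool], query: &[f32], threshold: f32) -> Vec<&'static str> {
    let mut hits: Vec<(f32, &'static str)> = index
        .iter()
        .map(|t| (cosine_similarity(&t.embedding, query), t.name))
        .filter(|(score, _)| *score >= threshold)
        .collect();
    hits.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    hits.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    // Tiny fake "embeddings" stand in for real model output.
    let index = vec![
        IndexedTool { name: "email_send", embedding: vec![0.9, 0.1, 0.0] },
        IndexedTool { name: "image_gen",  embedding: vec![0.0, 0.2, 0.9] },
    ];
    let query = vec![1.0, 0.0, 0.0]; // embedding of "email sending capabilities"
    let hits = search(&index, &query, 0.5);
    assert_eq!(hits, vec!["email_send"]);
    println!("{:?}", hits);
}
```

In the real system the query vector comes from the embedding model and the index holds one vector per registered tool; the math is the same.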

Key Design Decisions

1. Agent-Driven Discovery

The LLM forms the search query, not a pre-filter guessing from the raw user message. This is the critical insight — the agent has parsed intent, so its query to the Librarian is precise and contextual. When a user says “Can you check if the deployment went through?”, a keyword filter might match deploy, check, or container. The agent understands the intent is monitoring and calls request_tools("deployment status monitoring CI/CD") — a much better search query.

2. Domain Expansion

When the Librarian finds email_send, it also returns email_read — tools from the same domain travel together. This prevents the agent from needing a second request_tools call to discover related capabilities. The 17 domains: system · filesystem · web · identity · memory · agents · communication · squads · tasks · skills · dashboard · storage · email · messaging · github · integrations · trading
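Expansion itself is just a set union over domain membership. A minimal sketch, assuming hypothetical `domains` and `tool_domain` maps (the real ToolIndex stores this mapping internally):

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical sketch of domain expansion: any hit pulls in its domain
// siblings, so a match on email_send also surfaces email_read.
fn expand_domains(
    domains: &HashMap<&str, Vec<&'static str>>, // domain -> tools in that domain
    tool_domain: &HashMap<&str, &str>,          // tool -> its domain
    matches: &[&str],
) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    for m in matches {
        if let Some(domain) = tool_domain.get(m) {
            if let Some(siblings) = domains.get(domain) {
                out.extend(siblings.iter().copied());
            }
        }
    }
    out
}

fn main() {
    let mut domains = HashMap::new();
    domains.insert("email", vec!["email_send", "email_read"]);
    let mut tool_domain = HashMap::new();
    tool_domain.insert("email_send", "email");
    tool_domain.insert("email_read", "email");

    let expanded = expand_domains(&domains, &tool_domain, &["email_send"]);
    assert!(expanded.contains("email_read")); // sibling came along for free
    println!("{:?}", expanded);
}
```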

3. Round Carryover

Tools loaded in round N remain available in round N+1 within the same chat turn. The agent doesn’t lose access to tools it just discovered. Tools are cleared between chat turns to prevent stale accumulation.
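The lifecycle above amounts to a per-turn set of loaded tool names. A sketch under that assumption (`TurnState` is an invented name, not the OpenPawz type):

```rust
use std::collections::HashSet;

// Illustrative round-carryover state: tools discovered in round N stay
// loaded for round N+1 of the same chat turn, then the set is cleared.
struct TurnState {
    loaded_tools: HashSet<String>,
}

impl TurnState {
    fn new() -> Self {
        Self { loaded_tools: HashSet::new() }
    }

    /// Hot-load tools returned by the Librarian for the rest of this turn.
    fn load(&mut self, tools: &[&str]) {
        self.loaded_tools.extend(tools.iter().map(|t| t.to_string()));
    }

    /// Called when the chat turn ends, preventing stale accumulation.
    fn end_turn(&mut self) {
        self.loaded_tools.clear();
    }
}

fn main() {
    let mut turn = TurnState::new();
    turn.load(&["email_send", "email_read"]);          // round 1 discovery
    assert!(turn.loaded_tools.contains("email_send")); // still there in round 2
    turn.end_turn();
    assert!(turn.loaded_tools.is_empty());             // fresh for the next turn
}
```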

4. Swarm Bypass

Orchestrated/swarm agents skip Tool RAG entirely and receive all tools upfront. These agents are autonomous and operate without interactive discovery — the overhead of request_tools calls would slow them down unnecessarily.

5. Multiple Fallback Layers

If semantic search returns no results:
  1. Exact name match — direct tool name lookup
  2. Domain match — load all tools from a named domain
  3. Full domain list — return the list of available domains so the agent can refine its query
The agent always gets something useful back.
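The ladder can be sketched as a simple chain of checks. Names here (`lookup`, `LibrarianResult`) are hypothetical; only the ordering of the fallbacks comes from the design above:

```rust
// Sketch of the fallback ladder: semantic hits first, then exact name
// match, then whole-domain load, then the list of domains so the agent
// can refine its query.
enum LibrarianResult {
    Tools(Vec<String>),
    Domains(Vec<String>),
}

fn lookup(
    query: &str,
    semantic_hits: Vec<String>,  // result of the embedding search
    all_tools: &[&str],
    domains: &[(&str, &[&str])], // (domain name, tools in that domain)
) -> LibrarianResult {
    if !semantic_hits.is_empty() {
        return LibrarianResult::Tools(semantic_hits);
    }
    // 1. Exact tool name match
    if all_tools.contains(&query) {
        return LibrarianResult::Tools(vec![query.to_string()]);
    }
    // 2. Domain name match: load every tool in that domain
    if let Some((_, tools)) = domains.iter().find(|(name, _)| *name == query) {
        return LibrarianResult::Tools(tools.iter().map(|t| t.to_string()).collect());
    }
    // 3. Nothing matched: return the domain list so the agent can refine
    LibrarianResult::Domains(domains.iter().map(|(n, _)| n.to_string()).collect())
}

fn main() {
    let domains: &[(&str, &[&str])] = &[("email", &["email_send", "email_read"])];
    match lookup("email", vec![], &["email_send", "email_read"], domains) {
        LibrarianResult::Tools(t) => assert_eq!(t, vec!["email_send", "email_read"]),
        _ => panic!("expected a domain match"),
    }
}
```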

Why This Is Novel

The Librarian Method combines five properties that, to the best of our knowledge, have not appeared together in any prior system:
Property                Description
--------                -----------
Agent-stated intent     The LLM writes the search query after understanding user intent — not a pre-filter on raw input
Local embeddings        Zero-cost semantic search via a local model (e.g. Ollama), or low-cost via any cloud embedding API
On-demand hot-loading   Tools injected into the active agent round, not pre-configured
Domain expansion        Sibling tools auto-included to reduce round trips
Scale-independent       Works the same whether you have 50 tools or 50,000

Prior Art Comparison

Academic work on tool retrieval for LLMs exists (ToolGen, ICLR 2025; ToolWeaver, ICLR 2026; ToolkenGPT, NeurIPS 2023), but these systems focus on fine-tuning LLMs to generate tool tokens or training retrieval models on tool documentation. They require expensive training pipelines and don’t adapt gracefully as the tool set changes. The Librarian Method requires zero training. It uses off-the-shelf embedding models (e.g. nomic-embed-text, OpenAI text-embedding-3-small, or any model that produces vector embeddings) and works with any tool added to the index — including tools that didn’t exist when the embedding model was trained.

Why This Matters for Cost and Context

Without the Librarian, every request carries the full set of available tool definitions in the context window. With the Librarian, only the tools relevant to the current task are loaded — typically a handful instead of dozens or hundreds. This has two compounding effects:
  1. Lower token usage — Fewer tool definitions means fewer tokens consumed per request, which directly reduces API costs regardless of which provider or model you use.
  2. More room for conversation — Context window space freed from tool schemas is available for actual conversation, memory, reasoning, and skill instructions.
The embedding search itself is powered by any embedding model you configure. Running locally via Ollama costs nothing; cloud embedding APIs like OpenAI’s text-embedding-3-small cost fractions of a cent per query.
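A back-of-the-envelope calculation makes the first effect concrete. All numbers below are assumptions for illustration (average schema size and tool counts vary widely by platform), not measurements:

```rust
// Illustrative per-request token cost of tool schemas, with and without
// the Librarian. Every number here is an assumed example value.
fn schema_tokens(tool_count: u64, avg_tokens_per_schema: u64) -> u64 {
    tool_count * avg_tokens_per_schema
}

fn main() {
    let avg = 150; // assumed average tokens per tool schema
    let all_tools = schema_tokens(200, avg);   // naive: load all 200 tools
    let librarian = schema_tokens(6 + 4, avg); // core tools + a few hot-loaded
    assert_eq!(all_tools, 30_000);
    assert_eq!(librarian, 1_500);
    println!("saved {} tokens per request", all_tools - librarian);
}
```

Under those assumptions the Librarian frees roughly 28,500 tokens per request for conversation, memory, and reasoning.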

Implementation

The Librarian Method is implemented across five files in the OpenPawz Rust engine:
File                            Purpose
----                            -------
engine/tool_index.rs            ToolIndex struct — embedding storage, cosine similarity, domain mapping
engine/tools/request_tools.rs   request_tools meta-tool — the agent-facing Librarian call
engine/chat.rs                  build_chat_tools() — filters tool list to core + loaded tools
engine/agent_loop/mod.rs        Hot-loads newly discovered tools between rounds
engine/state.rs                 tool_index + loaded_tools on EngineState

Core Flow (Simplified)

// 1. Agent calls request_tools with a natural-language query
fn request_tools(query: &str, state: &EngineState) -> Vec<ToolDef> {
    let index = state.tool_index.lock();

    // 2. Embed the query using the configured model (e.g. nomic-embed-text)
    let query_embedding = ollama_embed(query); // ~50 ms, $0 locally

    // 3. Cosine similarity against all indexed tools (0.5 similarity threshold)
    let matches = index.search(&query_embedding, 0.5);

    // 4. Domain expansion — include sibling tools from the same domain
    let expanded = index.expand_domains(matches);

    // 5. Return tool definitions for hot-loading into the next round
    expanded
}
The ToolIndex is built lazily on the first request_tools call — all registered tools (built-in + MCP) are embedded and stored as vectors. Subsequent searches are pure vector math against the in-memory index.
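The build-once-on-first-use pattern maps naturally onto Rust's `OnceLock`. A sketch of that pattern, assuming a simplified `ToolIndex` that only records names (the real index would embed every registered tool here):

```rust
use std::sync::OnceLock;

// Illustrative lazy build: the index is constructed once on first access,
// and every later call reuses the cached instance.
struct ToolIndex {
    names: Vec<&'static str>, // stand-in for tools + their embeddings
}

fn tool_index() -> &'static ToolIndex {
    static INDEX: OnceLock<ToolIndex> = OnceLock::new();
    INDEX.get_or_init(|| {
        // In OpenPawz this is where all registered tools (built-in + MCP)
        // would be embedded; here we just record the names.
        ToolIndex { names: vec!["email_send", "email_read"] }
    })
}

fn main() {
    // Both calls return the same cached index; the closure ran only once.
    assert!(std::ptr::eq(tool_index(), tool_index()));
    assert_eq!(tool_index().names.len(), 2);
}
```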

Try It

The Librarian Method is active by default in OpenPawz. No configuration needed beyond having an embedding model available. For zero-cost local operation, have Ollama running with nomic-embed-text installed:
ollama pull nomic-embed-text
Then ask your agent to do something that requires a specific tool:
“Check my GitHub notifications”

The agent calls request_tools("GitHub notifications"), the Librarian finds github_notifications and github_list_repos, and the agent proceeds — carrying only the tools it needs instead of every tool available.

License & Attribution

The Librarian Method is part of OpenPawz and is released under the MIT License. You are free to use, modify, and redistribute this technique in any project, commercial or otherwise. Attribution is appreciated but not required. If you reference this work in academic papers or technical writing:
OpenPawz (2025). “The Librarian Method: Intent-Stated Tool Discovery via Local Embeddings.” https://github.com/OpenPawz/openpawz