The Librarian Method
Intent-Stated Tool Discovery via Embeddings. Open source (MIT). Part of the OpenPawz project.

The Problem
Every AI agent platform faces the same scaling wall: tool bloat. An agent that can send emails, manage files, query databases, generate images, post to Slack, and trade crypto needs dozens of tool definitions injected into its context window. When you connect to an automation engine like n8n with thousands of available workflows and integrations, this becomes catastrophic — you simply cannot fit all those tool schemas into a 32K or even 128K context window. The conventional approaches all have critical flaws:

| Approach | Problem |
|---|---|
| Load all tools | Impossible at scale. Hundreds of workflow definitions would consume excessive tokens. |
| Pre-filter by keyword | Fragile. “Send a message to John” — is that email, Slack, SMS, WhatsApp, or Telegram? A keyword filter guesses wrong. |
| Category menus | Requires the user to know which category their request falls into. Breaks natural language interaction. |
| Static tool sets per agent | Limits each agent’s capability. Defeats the purpose of a universal platform. |
The Invention
The Librarian Method inverts the problem. Instead of the system guessing which tools to load, the agent itself requests the tools it needs — after it has understood the user’s intent. The metaphor is literal: a library patron (the agent) walks up to a librarian and describes what they need. The librarian finds the right books. The patron never needs to know the Dewey Decimal System.

The Three Roles
| Role | Implementation | Cost |
|---|---|---|
| Patron | Cloud LLM (Gemini, Claude, GPT) | Per-token (paid) |
| Librarian | Any embedding model (e.g. Ollama nomic-embed-text locally, or a cloud embedding API) | $0 local / low cloud |
| Library | In-memory ToolIndex with pre-computed embeddings | ~230 KB RAM |
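As a plausibility check on the footprint above: assuming on the order of 75 tools embedded as 768-dimensional f32 vectors (the dimensionality of nomic-embed-text), the index lands right around 230 KB. A minimal sketch of such an index follows; the field and type names are illustrative assumptions, not the actual OpenPawz definitions.

```rust
use std::collections::HashMap;

/// Illustrative sketch of the in-memory "Library": one pre-computed
/// embedding vector plus a domain label per tool.
pub struct ToolEntry {
    pub embedding: Vec<f32>, // computed once by the embedding model
    pub domain: String,      // e.g. "email", "github"
}

pub struct ToolIndex {
    pub entries: HashMap<String, ToolEntry>,
}

impl ToolIndex {
    pub fn new() -> Self {
        ToolIndex { entries: HashMap::new() }
    }

    /// Register a tool with its pre-computed embedding and domain.
    pub fn insert(&mut self, name: &str, domain: &str, embedding: Vec<f32>) {
        self.entries.insert(
            name.to_string(),
            ToolEntry { embedding, domain: domain.to_string() },
        );
    }

    /// Rough memory footprint of the stored vectors (4 bytes per f32).
    pub fn approx_bytes(&self) -> usize {
        self.entries.values().map(|e| e.embedding.len() * 4).sum()
    }
}
```

At 768 dimensions each tool costs about 3 KB of vector storage, so 75 tools comes to roughly 230 KB.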
How It Works — Round by Round
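The round-by-round mechanics described throughout this document can be simulated in miniature. The names below (`librarian`, `run_turn`) are illustrative stand-ins, and the stub `librarian` replaces the real embedding search.

```rust
/// Stand-in for the real embedding search: returns tool names for a query.
fn librarian(query: &str) -> Vec<String> {
    if query.contains("email") {
        // Domain expansion: siblings travel together.
        vec!["email_send".into(), "email_read".into()]
    } else {
        vec![]
    }
}

/// One chat turn: round 1 has only core tools (including request_tools);
/// discovered tools are hot-loaded before round 2 and carried over until
/// the turn ends, after which the loaded set is cleared.
fn run_turn(intent_query: &str) -> Vec<String> {
    let mut loaded: Vec<String> = vec!["request_tools".into()]; // core set
    // Round 1: the model parses intent and asks the Librarian.
    let discovered = librarian(intent_query);
    loaded.extend(discovered); // hot-load before round 2
    // Round 2+: the model calls the discovered tools directly.
    loaded
}
```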
Key Design Decisions
1. Agent-Driven Discovery
The LLM forms the search query, not a pre-filter guessing from the raw user message. This is the critical insight — the agent has parsed intent, so its query to the Librarian is precise and contextual. When a user says “Can you check if the deployment went through?”, a keyword filter might match deploy, check, or container. The agent understands the intent is monitoring and calls request_tools("deployment status monitoring CI/CD") — a much better search query.
2. Domain Expansion
When the Librarian finds email_send, it also returns email_read — tools from the same domain travel together. This prevents the agent from needing a second request_tools call to discover related capabilities.
The 17 domains:
system · filesystem · web · identity · memory · agents · communication · squads · tasks · skills · dashboard · storage · email · messaging · github · integrations · trading
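Domain expansion reduces to a lookup over a tool-to-domain map: take the domains of the directly matched tools, then return every tool in those domains. The function name and map shape are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Given the tools the semantic search matched, also return their domain
/// siblings, so one request_tools call surfaces related capabilities.
fn expand_domains(matched: &[&str], domain_of: &HashMap<&str, &str>) -> Vec<String> {
    // Domains of the directly matched tools.
    let domains: Vec<&str> = matched
        .iter()
        .filter_map(|t| domain_of.get(t).copied())
        .collect();
    // Every tool in those domains travels along.
    let mut out: Vec<String> = domain_of
        .iter()
        .filter(|(_, d)| domains.contains(*d))
        .map(|(t, _)| t.to_string())
        .collect();
    out.sort();
    out
}
```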
3. Round Carryover
Tools loaded in round N remain available in round N+1 within the same chat turn. The agent doesn’t lose access to tools it just discovered. Tools are cleared between chat turns to prevent stale accumulation.

4. Swarm Bypass
Orchestrated/swarm agents skip Tool RAG entirely and receive all tools upfront. These agents are autonomous and operate without interactive discovery — the overhead of request_tools calls would slow them down unnecessarily.
5. Multiple Fallback Layers
If semantic search returns no results:

- Exact name match — direct tool name lookup
- Domain match — load all tools from a named domain
- Full domain list — return the list of available domains so the agent can refine its query
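The fallback chain can be sketched as follows; the `SearchResult` enum and `resolve` function are illustrative names, not the actual engine types.

```rust
/// Outcome of the fallback chain when semantic search found nothing.
enum SearchResult {
    Tools(Vec<String>),      // matched tools to hot-load
    DomainList(Vec<String>), // nothing matched: let the agent refine
}

/// Tools are (name, domain) pairs here for brevity.
fn resolve(query: &str, tools: &[(&str, &str)]) -> SearchResult {
    // Layer 1: exact tool-name match.
    if let Some((name, _)) = tools.iter().find(|(n, _)| *n == query) {
        return SearchResult::Tools(vec![name.to_string()]);
    }
    // Layer 2: the query names a domain, so load all of its tools.
    let in_domain: Vec<String> = tools
        .iter()
        .filter(|(_, d)| *d == query)
        .map(|(n, _)| n.to_string())
        .collect();
    if !in_domain.is_empty() {
        return SearchResult::Tools(in_domain);
    }
    // Layer 3: return the full domain list so the agent can retry.
    let mut domains: Vec<String> = tools.iter().map(|(_, d)| d.to_string()).collect();
    domains.sort();
    domains.dedup();
    SearchResult::DomainList(domains)
}
```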
Why This Is Novel
The Librarian Method combines five properties that, to the best of our research, have not appeared together in any prior system:

| Property | Description |
|---|---|
| Agent-stated intent | The LLM writes the search query after understanding user intent — not a pre-filter on raw input |
| Local embeddings | Zero-cost semantic search via a local model (e.g. Ollama), or low-cost via any cloud embedding API |
| On-demand hot-loading | Tools injected into the active agent round, not pre-configured |
| Domain expansion | Sibling tools auto-included to reduce round trips |
| Scale-independent | Works the same whether you have 50 tools or 50,000 |
Prior Art Comparison
Academic work on tool retrieval for LLMs exists (ToolGen, ICLR 2025; ToolWeaver, ICLR 2026; ToolkenGPT, NeurIPS 2023), but these systems focus on fine-tuning LLMs to generate tool tokens or training retrieval models on tool documentation. They require expensive training pipelines and don’t scale gracefully. The Librarian Method requires zero training. It uses off-the-shelf embedding models (e.g. nomic-embed-text, OpenAI text-embedding-3-small, or any model that produces vector embeddings) and works with any tool added to the index — including tools that didn’t exist when the embedding model was trained.
Why This Matters for Cost and Context
Without the Librarian, every request carries the full set of available tool definitions in the context window. With the Librarian, only the tools relevant to the current task are loaded — typically a handful instead of dozens or hundreds. This has two compounding effects:

- Lower token usage — Fewer tool definitions means fewer tokens consumed per request, which directly reduces API costs regardless of which provider or model you use.
- More room for conversation — Context window space freed from tool schemas is available for actual conversation, memory, reasoning, and skill instructions.
Local embedding models run at zero marginal cost, and cloud embedding APIs such as text-embedding-3-small cost fractions of a cent per query.
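As a back-of-envelope illustration (the ~150 tokens per tool schema is an assumed figure, not a measurement): carrying 500 schemas costs about 75,000 tokens per request, while carrying the five relevant ones costs about 750 — a 100x reduction before the conversation even starts.

```rust
/// Context cost of carrying `tools` schemas at a flat `tokens_per_schema`.
/// Both inputs are illustrative assumptions, not measured figures.
fn schema_tokens(tools: usize, tokens_per_schema: usize) -> usize {
    tools * tokens_per_schema
}
```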
Implementation
The Librarian Method is implemented across five files in the OpenPawz Rust engine:

| File | Purpose |
|---|---|
| engine/tool_index.rs | ToolIndex struct — embedding storage, cosine similarity, domain mapping |
| engine/tools/request_tools.rs | request_tools meta-tool — the agent-facing Librarian call |
| engine/chat.rs | build_chat_tools() — filters tool list to core + loaded tools |
| engine/agent_loop/mod.rs | Hot-loads newly discovered tools between rounds |
| engine/state.rs | tool_index + loaded_tools on EngineState |
Core Flow (Simplified)
ToolIndex is built lazily on the first request_tools call — all registered tools (built-in + MCP) are embedded and stored as vectors. Subsequent searches are pure vector math against the in-memory index.
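The search itself reduces to cosine similarity between the query embedding and the stored tool vectors. A minimal sketch, with tiny hand-supplied vectors standing in for real model output and illustrative function names:

```rust
/// Cosine similarity between two vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Rank tools by similarity to the query vector and keep the top `k`.
fn top_k<'a>(query: &[f32], tools: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = tools
        .iter()
        .map(|(name, emb)| (*name, cosine(query, emb)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(n, _)| n).collect()
}
```

In the real index the vectors are high-dimensional and produced once per tool; each request_tools call then costs only one query embedding plus this pure vector math.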
Try It
The Librarian Method is active by default in OpenPawz. No configuration is needed beyond having an embedding model available. For zero-cost local operation, have Ollama running with nomic-embed-text installed (ollama pull nomic-embed-text), then ask the agent something like:
“Check my GitHub notifications”. The agent calls request_tools("GitHub notifications"), the Librarian finds github_notifications and github_list_repos, and the agent proceeds — carrying only the tools it needs instead of every tool available.
License & Attribution
The Librarian Method is part of OpenPawz and is released under the MIT License. You are free to use, modify, and redistribute this technique in any project, commercial or otherwise. Attribution is appreciated but not required. If you reference this work in academic papers or technical writing:

OpenPawz (2025). “The Librarian Method: Intent-Stated Tool Discovery via Local Embeddings.” https://github.com/OpenPawz/openpawz

