
Houtini LM
by houtini-ai
Offload bounded LLM tasks from Claude Code to local or cloud LLMs to save tokens and avoid rate limits.
What it does
Houtini LM connects Claude Code to local LLM servers (LM Studio, Ollama) or OpenAI-compatible cloud APIs (DeepSeek, Groq, Cerebras, OpenRouter). It allows Claude to delegate "grunt work"—like generating boilerplate, drafting commit messages, and performing code reviews—to cheaper or free models while keeping high-level architecture and planning on the frontier model.
Tools
- chat: General task offloading with planning triggers to nudge Claude into delegating work.
- custom_prompt: A three-part prompt (system, context, instruction) designed to reduce context bleed.
- code_task: Specialized tool for code analysis, bug finding, and test generation.
- code_task_files: Analyzes multiple files directly from disk without flooding the MCP client's context window.
- embed: Generates text embeddings via OpenAI-compatible endpoints.
- discover: Health check and real-time performance readout (tok/s and TTFT).
- list_models: Lists all available models on the server with detailed capability profiles.
- stats: Displays cumulative token savings and per-model performance history.
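Like all MCP tools, these are invoked by the client over JSON-RPC via a tools/call request. A minimal sketch of delegating a task to the chat tool — the argument names ("prompt", "model") and the model identifier are assumptions for illustration, not the tool's documented schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "chat",
    "arguments": {
      "prompt": "Draft a commit message for the attached diff",
      "model": "qwen2.5-coder-7b"
    }
  }
}
```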
Installation
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"
      }
    }
  }
}
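The endpoint above is LM Studio's default local server port (1234). To target Ollama instead, a sketch of the same env block pointing at Ollama's default port — whether Houtini LM expects the bare host or a /v1 path suffix here is an assumption to verify against the server's docs:

```json
"env": {
  "HOUTINI_LM_ENDPOINT_URL": "http://localhost:11434"
}
```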
Supported hosts
- claude
Quick install
npx -y @houtini/lm
Information
- Pricing: free
- Published