Back to Apps

Houtini LM
by houtini-ai
Offload bounded LLM tasks from Claude Code to local or cloud LLMs to save tokens and avoid rate limits.
0 stars
Works in:claude
Exposes:ToolsResourcesPrompts
What it does
Houtini LM connects Claude Code to local LLM servers (LM Studio, Ollama) or OpenAI-compatible cloud APIs (DeepSeek, Groq, Cerebras, OpenRouter). It allows Claude to delegate "grunt work"—like generating boilerplate, drafting commit messages, and performing code reviews—to cheaper or free models while keeping high-level architecture and planning on the frontier model.
Tools
chat: General task offloading with planning triggers to nudge Claude into delegating work.custom_prompt: A three-part prompt (system, context, instruction) designed to reduce context bleed.code_task: Specialized tool for code analysis, bug finding, and test generation.code_task_files: Analyzes multiple files directly from disk without flooding the MCP client's context window.embed: Generates text embeddings via OpenAI-compatible endpoints.discover: Health check and real-time performance readout (tok/s and TTFT).list_models: Lists all available models on the server with detailed capability profiles.stats: Displays cumulative token savings and per-model performance history.
Installation
Add to claude_desktop_config.json:
{
"mcpServers": {
"houtini-lm": {
"command": "npx",
"args": ["-y", "@houtini/lm"],
"env": {
"HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"
}
}
}
}
Supported hosts
- claude
Quick install
npx -y @houtini/lmInformation
- Pricing
- free
- Published
- 5/7/2026
- stars
- 0
Categories
Choose your AI client and follow the steps below.
Claude Desktop
{"mcpServers": {"houtini-lm": {"command": "npx", "args": ["-y", "@houtini/lm"], "env": {"HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"}}}}





