
from ai-toolkit142
Guidelines and patterns for implementing prompt caching with Anthropic APIs to reduce input-token costs and latency, including TTLs, breakpoints, and hit-rate measurement.
This skill documents practical patterns for implementing prompt caching when building with Anthropic (Claude) APIs. It explains cache mechanics, recommended TTLs, how to structure cached blocks and breakpoints, anti-patterns that cause cache misses, and methods for measuring cache hit rate. Concrete code examples in Python and TypeScript show how to mark cached content and keep dynamic data outside cached prefixes.
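A minimal Python sketch of the marking pattern (the model name, system text, and reference document here are placeholders; `cache_control` with `{"type": "ephemeral"}` is Anthropic's marker for a cacheable block):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_REFERENCE_DOC = "..."  # stable content: load your actual reference text here

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use whichever model you target
    max_tokens=1024,
    system=[
        # Stable prefix: system prompt plus reference doc. The breakpoint goes
        # on the LAST stable block; everything up to and including it becomes
        # the cached prefix.
        {"type": "text", "text": "You are a support assistant for Acme Corp."},
        {
            "type": "text",
            "text": LARGE_REFERENCE_DOC,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        # Dynamic data stays AFTER the cached prefix, so it never
        # invalidates the cache.
        {"role": "user", "content": "How do I reset my password?"}
    ],
)
```

Keeping anything that changes per request (timestamps, user IDs, retrieved snippets) out of the blocks before the breakpoint is what preserves the byte-identical prefix the cache matches on.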
Use this skill when you need to cut API input-token costs or latency significantly by caching the stable parts of prompts. It's useful during production integration, performance tuning, or any high-volume loop with repeated stable prefixes (system prompts, tool definitions, large reference docs). Avoid it for one-shot calls or very small prompts: cache writes cost more than uncached input, and short prompts may fall below the minimum cacheable prefix length.
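One way to measure hit rate is to aggregate the cache fields the Messages API already returns in `usage`. A sketch, assuming the response object from the example above; the `CacheStats` helper is illustrative, not part of the SDK:

```python
# Illustrative hit-rate tracker; `usage` comes from a Messages API response.
class CacheStats:
    def __init__(self) -> None:
        self.read = 0      # input tokens served from cache
        self.written = 0   # input tokens written to cache (cache misses)
        self.uncached = 0  # input tokens never covered by a breakpoint

    def record(self, usage) -> None:
        self.read += usage.cache_read_input_tokens or 0
        self.written += usage.cache_creation_input_tokens or 0
        self.uncached += usage.input_tokens or 0

    @property
    def hit_rate(self) -> float:
        total = self.read + self.written + self.uncached
        return self.read / total if total else 0.0

stats = CacheStats()
stats.record(response.usage)  # call after each API response
print(f"cache hit rate: {stats.hit_rate:.1%}")
```

A persistently low hit rate usually means dynamic data is leaking into the cached prefix or calls are spaced further apart than the cache TTL.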
Likely used by agents and tooling that call Anthropic/Claude APIs (Claude Opus/Haiku), and by developer-facing tools (Claude Code, Cursor, Copilot integrations) that need to optimize token costs.