
from ai-toolkit142
Guidelines and patterns for implementing prompt caching with Anthropic APIs to reduce input-token costs and latency, including TTLs, breakpoints, and hit-rate measurement.
This skill documents practical patterns for implementing prompt caching when building with Anthropic (Claude) APIs. It explains cache mechanics, recommended TTLs, how to structure cached blocks and breakpoints, anti-patterns that cause cache misses, and methods for measuring cache hit rate. Concrete code examples in Python and TypeScript show how to mark cached content and keep dynamic data outside cached prefixes.
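A minimal Python sketch of the marking pattern (the model name, system text, and reference document here are placeholders; `cache_control` with `{"type": "ephemeral"}` is Anthropic's marker for a cacheable block):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_REFERENCE_DOC = "..."  # stable content: load your actual reference text here

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use whichever model you target
    max_tokens=1024,
    system=[
        # Stable prefix: system prompt plus reference doc. The breakpoint goes
        # on the LAST stable block; everything up to and including it becomes
        # the cached prefix.
        {"type": "text", "text": "You are a support assistant for Acme Corp."},
        {
            "type": "text",
            "text": LARGE_REFERENCE_DOC,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        # Dynamic data stays AFTER the cached prefix, so it never
        # invalidates the cache.
        {"role": "user", "content": "How do I reset my password?"}
    ],
)
```

Keeping anything that changes per request (timestamps, user IDs, retrieved snippets) out of the blocks before the breakpoint is what preserves the byte-identical prefix the cache matches on.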
Use this skill when you need to cut API input-token costs or latency significantly by caching the stable parts of prompts. It's useful during production integration, performance tuning, or any high-volume loop with repeated stable prefixes (system prompts, tool definitions, large reference docs). Avoid it for one-shot calls or very small prompts: cache writes cost more than uncached input, and short prompts may fall below the minimum cacheable prefix length.
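One way to measure hit rate is to aggregate the cache fields the Messages API already returns in `usage`. A sketch, assuming the response object from the example above; the `CacheStats` helper is illustrative, not part of the SDK:

```python
# Illustrative hit-rate tracker; `usage` comes from a Messages API response.
class CacheStats:
    def __init__(self) -> None:
        self.read = 0      # input tokens served from cache
        self.written = 0   # input tokens written to cache (cache misses)
        self.uncached = 0  # input tokens never covered by a breakpoint

    def record(self, usage) -> None:
        self.read += usage.cache_read_input_tokens or 0
        self.written += usage.cache_creation_input_tokens or 0
        self.uncached += usage.input_tokens or 0

    @property
    def hit_rate(self) -> float:
        total = self.read + self.written + self.uncached
        return self.read / total if total else 0.0

stats = CacheStats()
stats.record(response.usage)  # call after each API response
print(f"cache hit rate: {stats.hit_rate:.1%}")
```

A persistently low hit rate usually means dynamic data is leaking into the cached prefix or calls are spaced further apart than the cache TTL.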
Likely used by agents and tooling that call Anthropic/Claude APIs (Claude Opus/Haiku), and by developer-facing tools (Claude Code, Cursor, Copilot integrations) that need to optimize token costs.