
from claude-code-toolkit
Practical guide to building production Retrieval-Augmented Generation (RAG) systems: vector DB selection, chunking strategies, embedding model choices, retrieval optimization, and production deployment practices.
Provides a hands-on, production-focused blueprint for building Retrieval-Augmented Generation (RAG) systems. Covers selecting and configuring vector databases (Qdrant, Pinecone, Chroma, Weaviate, Milvus), chunking strategies (fixed, semantic, hierarchical, sliding window), embedding model trade-offs (OpenAI, Sentence Transformers, Cohere), retrieval optimizations (hybrid search, reranking, metadata filtering), and production practices like caching, async ingestion, and monitoring. Includes code snippets and decision trees to guide practical implementation and deployment.
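To give a flavor of the chunking strategies mentioned above, here is a minimal sliding-window chunker sketch; the character-based window, chunk size, and overlap are illustrative assumptions (production pipelines typically chunk on tokens or sentence boundaries instead):

```python
def sliding_window_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative defaults, not recommendations;
    real systems usually measure size in tokens and respect sentence
    boundaries so that chunks stay semantically coherent.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once the window has reached the end of the text,
        # so we don't emit a redundant tail-only fragment.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, which tends to improve recall at the cost of some index redundancy.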
Use this skill when you are designing or debugging a semantic search / RAG pipeline: choosing a vector DB, deciding chunking and embedding strategies, optimizing retrieval quality, implementing hybrid dense+sparse search, or building production ingestion and monitoring. It's aimed at engineers building search, Q&A, or assistant systems that need reliable, scalable retrieval.
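As one common way to combine dense and sparse result lists in the hybrid search mentioned above, a minimal reciprocal rank fusion (RRF) sketch is shown below; the function name and the k=60 constant are conventional choices, not something this skill prescribes:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids (e.g. dense + BM25) via RRF.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; k=60 is the customary constant and worth tuning.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only needs ranks, not raw scores, so it sidesteps the problem of dense cosine similarities and sparse BM25 scores living on incompatible scales.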
Best suited for code-focused and engineering agent runtimes that can run Python snippets and interact with vector DBs (Claude Code, Copilot/Codex-style agents, other code-capable assistants).
This skill has not been reviewed by our automated audit pipeline yet.