What it does
This skill provides production-ready patterns for deduplicating events and articles aggregated from multiple sources. It supports two complementary modes: ID-based deduplication (when sources provide stable IDs) and content-based deduplication (grouping records by normalized title and date). Within each group of duplicates, canonical selection logic picks a single record: it prefers authoritative sources, ranked by a reputation score, and falls back on heuristics to choose the most complete or relevant version.
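A minimal sketch of how the two modes and canonical selection could fit together. The `Item` shape, the `REPUTATION` table and its scores, and the function names are all illustrative assumptions, not the skill's actual API. The title normalization here is ASCII-only and the date is taken from the first ten characters of an ISO 8601 timestamp, which assumes all sources report in the same time zone.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface Item {
  id?: string;         // stable source-provided ID, if the source has one
  source: string;      // name of the feed or outlet
  title: string;
  publishedAt: string; // ISO 8601 timestamp
  body?: string;
}

// Assumed reputation scores in [0, 1]; unknown sources get a neutral default.
const REPUTATION: Record<string, number> = { reuters: 0.9, blogspam: 0.2 };
const DEFAULT_REPUTATION = 0.5;

// Content key: lowercased title without punctuation, plus the calendar day.
function contentKey(item: Item): string {
  const title = item.title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  const day = item.publishedAt.slice(0, 10);
  return `${title}|${day}`;
}

// ID-based mode when an ID exists, content-based mode otherwise.
function dedupKey(item: Item): string {
  return item.id ? `id:${item.id}` : `content:${contentKey(item)}`;
}

// Higher reputation wins; ties go to the longer body as a proxy for
// "most complete".
function pickCanonical(group: Item[]): Item {
  return group.reduce((best, cur) => {
    const rb = REPUTATION[best.source] ?? DEFAULT_REPUTATION;
    const rc = REPUTATION[cur.source] ?? DEFAULT_REPUTATION;
    if (rc !== rb) return rc > rb ? cur : best;
    return (cur.body?.length ?? 0) > (best.body?.length ?? 0) ? cur : best;
  });
}

// Group items by dedup key, then keep one canonical record per group.
function deduplicate(items: Item[]): Item[] {
  const groups = new Map<string, Item[]>();
  for (const item of items) {
    const key = dedupKey(item);
    const group = groups.get(key);
    if (group) group.push(item);
    else groups.set(key, [item]);
  }
  return Array.from(groups.values()).map(pickCanonical);
}
```

In this sketch an item with an ID and an item without one never land in the same group, even if their titles match. A real pipeline might also compute the content key for ID-bearing items and merge across both.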
When to use it
Use this skill when you ingest overlapping feeds (news, event streams, product lists) and need to collapse duplicates into a single canonical record for downstream processing, search, or analytics. It applies when: multiple outlets publish the same story, sources provide inconsistent IDs, or you must track reduction metrics and attribution.
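Reduction metrics and attribution can be derived from the same grouping step. The sketch below assumes each incoming item has already been assigned a dedup key; `SourcedItem`, `DedupReport`, and `buildReport` are hypothetical names used only for illustration.

```typescript
// Hypothetical input: an item reduced to its dedup key and originating source.
interface SourcedItem {
  key: string;
  source: string;
}

interface DedupReport {
  inputCount: number;
  outputCount: number;
  reductionRatio: number;             // fraction of inputs collapsed as duplicates
  attribution: Map<string, string[]>; // dedup key -> distinct contributing sources
}

function buildReport(items: SourcedItem[]): DedupReport {
  const attribution = new Map<string, string[]>();
  for (const { key, source } of items) {
    let sources = attribution.get(key);
    if (!sources) {
      sources = [];
      attribution.set(key, sources);
    }
    // Record each source once per key, in first-seen order.
    if (sources.indexOf(source) === -1) sources.push(source);
  }
  const inputCount = items.length;
  const outputCount = attribution.size;
  // Guard against division by zero on an empty batch.
  const reductionRatio = inputCount === 0 ? 0 : 1 - outputCount / inputCount;
  return { inputCount, outputCount, reductionRatio, attribution };
}
```

For example, four inputs that collapse into two groups give a reduction ratio of 0.5, and the attribution map lists which outlets carried each story.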
What's included
- Scripts: no standalone scripts included
- References: none bundled
- Instructions: TypeScript examples for ID-based and content-based deduplication, dedup key generation, reputation scoring, canonical selection, and usage examples showing multi-source aggregation.
Compatible agents
Best suited for agents and tooling that run TypeScript/Node workflows or integrate with data pipelines: Copilot/Codex-style code assistants, automation agents that can run Node scripts, or server-side MCP agents that perform ETL and aggregation.
Not yet audited
This skill has not been reviewed by our automated audit pipeline yet.







