This skill provides production-ready patterns for deduplicating events and articles aggregated from multiple sources. It supports two complementary modes: ID-based deduplication (when sources provide stable IDs) and content-based deduplication (semantic grouping by normalized title and date). Canonical selection prefers authoritative sources via a reputation score and uses heuristics to pick the most complete or relevant version of each record.
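The two modes and the canonical selection step described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `Item` shape, the `REPUTATION` table and its scores, and the helper names `contentKey`, `pickCanonical`, and `dedupe` are all hypothetical, and body length stands in as an assumed proxy for "most complete".

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface Item {
  id?: string;    // stable source ID, when the source provides one
  title: string;
  date: string;   // ISO 8601 date or timestamp
  source: string;
  body?: string;
}

interface Deduped {
  canonical: Item;
  sources: string[]; // attribution: every source that reported the record
}

// Assumed reputation scores (higher = more authoritative); unknown sources score 0.
const REPUTATION: Record<string, number> = { reuters: 0.9, blogspam: 0.2 };

// Content key: lowercased title with punctuation stripped, plus the calendar day.
function contentKey(item: Item): string {
  const title = item.title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return `${title}|${item.date.slice(0, 10)}`;
}

// Prefer the higher-reputation source; break ties by the longer body.
function pickCanonical(cluster: Item[]): Item {
  return cluster.reduce((best, cur) => {
    const diff = (REPUTATION[cur.source] ?? 0) - (REPUTATION[best.source] ?? 0);
    if (diff !== 0) return diff > 0 ? cur : best;
    return (cur.body?.length ?? 0) > (best.body?.length ?? 0) ? cur : best;
  });
}

function dedupe(items: Item[]): Deduped[] {
  // Pass 1 (ID-based): items sharing an ID form one cluster.
  // Items without an ID get a unique placeholder key. Assumes IDs are
  // unique across sources; namespace them by source if they are not.
  const byId = new Map<string, Item[]>();
  items.forEach((item, i) => {
    const key = item.id !== undefined ? `id:${item.id}` : `noid:${i}`;
    byId.set(key, (byId.get(key) ?? []).concat(item));
  });

  // Pass 2 (content-based): merge pass-1 clusters whose canonical item
  // shares a normalized title/date key.
  const byContent = new Map<string, Item[]>();
  byId.forEach((cluster) => {
    const key = contentKey(pickCanonical(cluster));
    byContent.set(key, (byContent.get(key) ?? []).concat(cluster));
  });

  return Array.from(byContent.values()).map((cluster) => ({
    canonical: pickCanonical(cluster),
    sources: Array.from(new Set(cluster.map((i) => i.source))),
  }));
}

// Made-up sample data: three reports of one story and one unrelated item.
const sample: Item[] = [
  { id: "a1", title: "Fed Raises Rates!", date: "2024-05-01T09:00:00Z", source: "blogspam", body: "short" },
  { id: "a1", title: "Fed raises rates", date: "2024-05-01T10:00:00Z", source: "reuters", body: "longer body text" },
  { title: "Fed Raises Rates", date: "2024-05-01", source: "localnews" },
  { title: "Unrelated story", date: "2024-05-02", source: "reuters" },
];

const result = dedupe(sample);
console.log(result.length); // 2
```

Running the ID pass first means a trusted ID match is never overridden by a noisy title comparison, while the content pass still catches the same story arriving from a source that supplies no ID.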
Use this skill when you ingest overlapping feeds (news, event streams, product lists) and need to collapse duplicates into a single canonical record for downstream processing, search, or analytics. Typical cases are multiple outlets publishing the same story, sources providing inconsistent IDs, and pipelines that must track reduction metrics and attribution.
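As one way to track reduction metrics, the record counts before and after deduplication can be compared. The sketch below is an assumption about what such a metric might look like; the `ReductionMetrics` shape and the `reductionMetrics` helper are hypothetical names, not part of the skill.

```typescript
// Hypothetical metrics shape for reporting how much a dedup pass collapsed.
interface ReductionMetrics {
  inputCount: number;
  outputCount: number;
  duplicatesRemoved: number;
  reductionRatio: number; // fraction of input records removed, in [0, 1]
}

function reductionMetrics(inputCount: number, outputCount: number): ReductionMetrics {
  if (inputCount < 0 || outputCount < 0 || outputCount > inputCount) {
    throw new RangeError("counts must satisfy 0 <= outputCount <= inputCount");
  }
  return {
    inputCount,
    outputCount,
    duplicatesRemoved: inputCount - outputCount,
    // Guard against division by zero on an empty feed.
    reductionRatio: inputCount === 0 ? 0 : 1 - outputCount / inputCount,
  };
}

// Example: 4 ingested records collapsed into 2 canonical records.
const metrics = reductionMetrics(4, 2);
console.log(metrics.reductionRatio); // 0.5
```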
Best suited for agents and tooling that run TypeScript/Node workflows or integrate with data pipelines: Copilot/Codex-style code assistants, automation agents that can run Node scripts, or server-side MCP agents that perform ETL and aggregation.
This skill has not yet been reviewed by our automated audit pipeline.