
from drift771
Group and deduplicate multi-source events/articles using ID- and content-based grouping, reputation scoring, and canonical selection to pick the best version.
This skill provides production-ready patterns for deduplicating events and articles aggregated from multiple sources. It supports two complementary modes: ID-based deduplication (when sources provide stable IDs) and content-based deduplication (semantic grouping by normalized title/date). The implementation includes canonical selection logic that prefers authoritative sources via a reputation score and heuristics for selecting the most complete or relevant version.
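As a rough illustration of the two modes and the canonical selection described above, here is a minimal sketch. It is not the skill's actual code: the `SourceEvent` shape, the `REPUTATION` table (with made-up domains and scores), and the function names are all assumptions chosen for this example, and the tie-break on body length is just one possible completeness heuristic.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface SourceEvent {
  id?: string;     // stable source ID, when the source provides one
  title: string;
  date: string;    // assumed to be an ISO 8601 timestamp
  domain: string;  // publishing domain
  body?: string;
}

type Mode = "id" | "content";

// Placeholder reputation tiers; domains and scores are invented examples.
const REPUTATION: Record<string, number> = {
  "wire.example": 3,
  "daily.example": 2,
};
const DEFAULT_REPUTATION = 1;

// Simple ASCII normalization: lowercase, drop punctuation, collapse whitespace.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// In "id" mode use the source ID when present; otherwise fall back to
// normalized title plus calendar day (first 10 characters of the ISO date).
function groupKey(e: SourceEvent, mode: Mode): string {
  if (mode === "id" && e.id) return `id:${e.id}`;
  return `content:${normalizeTitle(e.title)}|${e.date.slice(0, 10)}`;
}

// Prefer the higher reputation score; on a tie, prefer the longer body
// as a rough proxy for completeness.
function pickCanonical(group: SourceEvent[]): SourceEvent {
  return group.reduce((best, cur) => {
    const b = REPUTATION[best.domain] ?? DEFAULT_REPUTATION;
    const c = REPUTATION[cur.domain] ?? DEFAULT_REPUTATION;
    if (c !== b) return c > b ? cur : best;
    return (cur.body?.length ?? 0) > (best.body?.length ?? 0) ? cur : best;
  });
}

// Group events by key, then keep one canonical record per group.
function dedupe(events: SourceEvent[], mode: Mode): SourceEvent[] {
  const groups = new Map<string, SourceEvent[]>();
  for (const e of events) {
    const key = groupKey(e, mode);
    const group = groups.get(key);
    if (group) group.push(e);
    else groups.set(key, [e]);
  }
  return Array.from(groups.values()).map(pickCanonical);
}
```

Calling `dedupe(events, "content")` collapses records whose normalized title and calendar day match, regardless of IDs. Note that in this sketch a record with an ID and a record without one never share a group in `"id"` mode; a real pipeline would likely need a second content-based pass to catch those.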
Use this skill when you ingest overlapping feeds (news, event streams, product lists) and need to collapse duplicates into a single canonical record for downstream processing, search, or analytics. It applies when multiple outlets publish the same story, when sources provide inconsistent IDs, or when you must track reduction metrics and attribution.
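To make "reduction metrics and attribution" concrete, the sketch below shows one way they might be computed. The `KeyedItem` and `DedupReport` types and the `summarize` function are hypothetical names introduced here, and the sketch assumes each item already carries a precomputed group key.

```typescript
// Hypothetical input: each item already has a group key assigned upstream.
interface KeyedItem {
  key: string;     // group key (assumed to be precomputed)
  source: string;  // name of the feed or outlet the item came from
}

interface DedupReport {
  inputCount: number;
  outputCount: number;                 // number of distinct groups
  reductionRatio: number;              // fraction of inputs removed as duplicates
  attribution: Map<string, string[]>;  // group key -> distinct contributing sources
}

function summarize(items: KeyedItem[]): DedupReport {
  const attribution = new Map<string, string[]>();
  for (const item of items) {
    const sources = attribution.get(item.key) ?? [];
    // Record each contributing source once per group.
    if (sources.indexOf(item.source) === -1) sources.push(item.source);
    attribution.set(item.key, sources);
  }
  const inputCount = items.length;
  const outputCount = attribution.size;
  // Guard against division by zero on an empty input.
  const reductionRatio =
    inputCount === 0 ? 0 : (inputCount - outputCount) / inputCount;
  return { inputCount, outputCount, reductionRatio, attribution };
}
```

Keeping the attribution list next to each group means the canonical record can still credit every outlet that carried the story, even after the duplicates are dropped.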
Best suited for agents and tooling that run TypeScript/Node workflows or integrate with data pipelines: Copilot/Codex-style code assistants, automation agents that can run Node scripts, or server-side MCP agents that perform ETL and aggregation.
Event deduplication skill providing TypeScript functions for ID-based and content-based deduplication with reputation scoring and canonical selection. The SKILL.md is well-structured with clear concepts, code examples, and best practices. No scripts to run. Code is functional but uses hardcoded domain reputation tiers and a simplistic normalization approach. Sourced from a 'drift v1 depreciated' folder, indicating it may be outdated.
Clean skill, no security concerns. Pure documentation with inline TypeScript snippets — no scripts, no network calls, no credentials. The 'drift v1 depreciated' path is a red flag for staleness but not a security issue. Hardcoded reputation scoring and simple normalization limit production usefulness.