Event Deduplication

Trust Score 85/100

from drift (771 stars)

Group and deduplicate multi-source events/articles using ID- and content-based grouping, reputation scoring, and canonical selection to pick the best version.

triggers: deduplicate, canonical selection, reputation, group by title, generate event id


What it does

This skill provides production-ready patterns for deduplicating events and articles aggregated from multiple sources. It supports two complementary modes: ID-based deduplication (when sources provide stable IDs) and content-based deduplication (semantic grouping by normalized title/date). The implementation includes canonical selection logic that prefers authoritative sources via a reputation score and heuristics for selecting the most complete or relevant version.
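The two modes described above can be sketched as two key functions feeding one generic grouping helper. This is a minimal illustration, not the skill's actual code: the `SourceEvent` shape and the names `normalizeTitle`, `idKey`, `contentKey`, and `groupBy` are all assumptions, and the normalization is deliberately simple (ASCII only).

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
interface SourceEvent {
  id?: string;         // stable source-provided ID, when the source has one
  title: string;
  publishedAt: string; // ISO 8601 timestamp in UTC, e.g. "2024-03-01T08:00:00Z"
  source: string;      // e.g. the publishing domain
}

// Lowercase, replace anything that is not a-z, 0-9 or whitespace with a space,
// then collapse runs of whitespace. Non-ASCII titles would need a richer rule.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// ID-based key: only usable when the source supplies a stable ID.
function idKey(event: SourceEvent): string | undefined {
  return event.id ? "id:" + event.id : undefined;
}

// Content-based key: normalized title plus the calendar day (first 10 chars
// of the ISO timestamp), so the same story published hours apart still groups.
function contentKey(event: SourceEvent): string {
  return "content:" + normalizeTitle(event.title) + "|" + event.publishedAt.slice(0, 10);
}

// Group events by whichever key function is passed in; events for which the
// key function returns undefined are skipped.
function groupBy(
  events: SourceEvent[],
  keyFn: (event: SourceEvent) => string | undefined
): Map<string, SourceEvent[]> {
  const groups = new Map<string, SourceEvent[]>();
  for (let i = 0; i < events.length; i++) {
    const key = keyFn(events[i]);
    if (key === undefined) continue;
    const existing = groups.get(key);
    if (existing) existing.push(events[i]);
    else groups.set(key, [events[i]]);
  }
  return groups;
}
```

Calling `groupBy(events, idKey)` covers feeds with trustworthy IDs, while `groupBy(events, contentKey)` covers feeds whose IDs are missing or inconsistent.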

When to use it

Use this skill when you ingest overlapping feeds (news, event streams, product lists) and need to collapse duplicates into a single canonical record for downstream processing, search, or analytics. It applies when: multiple outlets publish the same story, sources provide inconsistent IDs, or you must track reduction metrics and attribution.
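Collapsing a group into one canonical record, while keeping attribution and a reduction metric, might look like the sketch below. The `Article` and `CanonicalRecord` shapes, the function names, and the tie-break rule (longer body wins) are assumptions made for illustration; reputation scores are passed in by the caller rather than hardcoded.

```typescript
// Hypothetical shapes; none of these names come from the skill itself.
interface Article {
  title: string;
  source: string; // publishing domain
  body: string;
}

interface CanonicalRecord {
  canonical: Article;
  sources: string[]; // attribution: every distinct source that carried the story
}

// Pick the article from the highest-reputation source; on a reputation tie,
// prefer the longer body as a rough proxy for completeness.
function pickCanonical(
  group: Article[],
  reputation: Map<string, number>,
  defaultScore: number = 0
): CanonicalRecord {
  if (group.length === 0) throw new Error("cannot pick a canonical record from an empty group");
  const score = (a: Article): number => {
    const s = reputation.get(a.source);
    return s === undefined ? defaultScore : s;
  };
  let best = group[0];
  for (let i = 1; i < group.length; i++) {
    const diff = score(group[i]) - score(best);
    if (diff > 0 || (diff === 0 && group[i].body.length > best.body.length)) {
      best = group[i];
    }
  }
  // Distinct sources in first-seen order.
  const sources = group
    .map((a) => a.source)
    .filter((s, i, all) => all.indexOf(s) === i);
  return { canonical: best, sources };
}

// Fraction of input records removed by deduplication (0 when nothing was ingested).
function reductionRatio(inputCount: number, outputCount: number): number {
  return inputCount === 0 ? 0 : 1 - outputCount / inputCount;
}
```

For example, four overlapping articles collapsed into one canonical record give `reductionRatio(4, 1)`, which is 0.75.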

What's included

Scripts: no standalone scripts included (has_scripts=false)
References: none bundled (has_references=false)
Instructions: TypeScript examples for ID-based and content-based deduplication, dedup key generation, reputation scoring, canonical selection, and usage examples showing multi-source aggregation.

Compatible agents

Best suited for agents and tooling that run TypeScript/Node workflows or integrate with data pipelines: Copilot/Codex-style code assistants, automation agents that can run Node scripts, or server-side MCP agents that perform ETL and aggregation.

Audit Summary

Event deduplication skill providing TypeScript functions for ID-based and content-based deduplication with reputation scoring and canonical selection. The SKILL.md is well-structured with clear concepts, code examples, and best practices. No scripts to run. Code is functional but uses hardcoded domain reputation tiers and a simplistic normalization approach. Sourced from a 'drift v1 depreciated' folder, indicating it may be outdated.

Watch Out

Hardcoded reputation tier lists — not configurable
Source path contains 'depreciated' (misspelled 'deprecated') suggesting this may be stale
No actual runnable scripts — skill is purely documentation/code snippets
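MD5 hashing for event IDs is collision-prone at 12 hex chars (48 bits): by the birthday bound, collisions become likely as the number of events approaches roughly 2^24 (about 17 million)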
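One possible mitigation for the truncated-hash concern is to keep more of the digest and make the length configurable. The sketch below is not the skill's implementation: it swaps MD5 for SHA-256 via Node's built-in `crypto` module, and the function name `generateEventId`, its parameters, and the 32-character default are assumptions.

```typescript
import { createHash } from "crypto";

// Derive a deterministic event ID from a title and a date string.
// hexLength is the number of hex characters kept from the SHA-256 digest
// (each character carries 4 bits; SHA-256 yields 64 characters in total).
function generateEventId(title: string, date: string, hexLength: number = 32): string {
  if (hexLength < 1 || hexLength > 64 || Math.floor(hexLength) !== hexLength) {
    throw new RangeError("hexLength must be an integer between 1 and 64");
  }
  // Light normalization so trivial casing/whitespace differences hash alike.
  const input = title.trim().toLowerCase() + "|" + date;
  return createHash("sha256").update(input, "utf8").digest("hex").slice(0, hexLength);
}
```

At the default of 32 hex characters the ID carries 128 bits, which pushes the birthday bound far beyond any realistic event count; shorter lengths trade collision resistance for compactness.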

Notes

Clean skill, no security concerns. Pure documentation with inline TypeScript snippets — no scripts, no network calls, no credentials. The 'drift v1 depreciated' path is a red flag for staleness but not a security issue. Hardcoded reputation scoring and simple normalization limit production usefulness.

Information

Repository: drift
Stars: 771

Trust Score

Overall: 85
Security: 100
Code Quality: 72
Architecture: 65
Usefulness: 55
