
from ds-skills24
Post-process entity-match predictions by enforcing symmetry and transitivity (graph closure), improving recall in deduplication and record linkage.
This skill provides a simple, practical technique to post-process pairwise entity match predictions by enforcing bidirectional links (symmetry) and propagating matches across connected components (transitive closure). It converts independent pair predictions into consistent match groups, improving recall and producing coherent deduplicated clusters for downstream workflows.
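As a rough illustration of the two steps described above, the following sketch first mirrors every link and then repeatedly merges neighbours-of-neighbours until nothing changes. It is not the skill's inline code: the function name `close_matches`, the dict-of-iterables input shape, and the `max_iters` safety cap are all assumptions made for this example.

```python
def close_matches(matches, max_iters=100):
    """Return a dict mapping each id to the set of ids it matches after closure.

    `matches` is assumed to map an id to an iterable of predicted match ids.
    `max_iters` is a hypothetical safety cap on the number of closure passes.
    """
    closed = {key: set(values) for key, values in matches.items()}

    # Symmetry: if A lists B, make sure B lists A.
    for a, partners in list(closed.items()):
        for b in list(partners):
            closed.setdefault(b, set()).add(a)

    # Transitivity: absorb each partner's partners until a pass changes nothing.
    for _ in range(max_iters):
        changed = False
        for a in closed:
            expanded = set(closed[a])
            for b in closed[a]:
                expanded |= closed[b]
            expanded.discard(a)  # an id is not listed as its own match
            if expanded != closed[a]:
                closed[a] = expanded
                changed = True
        if not changed:
            break
    return closed


result = close_matches({"A": ["B"], "B": ["C"], "C": [], "D": []})
```

In this example `A` and `C` end up linked through `B`, while `D` stays unmatched because no prediction connects it to anything.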
Use this after a binary pairwise matching model has produced candidate matches but the results contain inconsistencies (A→B but not B→A) or fragmented clusters (A→B, B→C but not A→C). It is helpful in record linkage, entity deduplication, and any pipeline where consistent grouping of IDs matters. Avoid full closure on extremely noisy predictions unless confidence filtering is applied first: closure propagates every link, so a single false-positive pair can merge two otherwise unrelated clusters.
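One way to apply confidence filtering before closure is to drop low-scoring pairs so they cannot act as bridges between clusters. The sketch below assumes predictions arrive as `(id_a, id_b, score)` tuples; the function name `filter_pairs`, the threshold of 0.9, and the sample scores are illustrative choices, not values from the skill.

```python
def filter_pairs(scored_pairs, threshold=0.9):
    """Keep only the pairs whose score meets the (hypothetical) threshold."""
    return [(a, b) for a, b, score in scored_pairs if score >= threshold]


# Made-up scores for illustration; B-C is a weak link that would bridge two groups.
scored = [("A", "B", 0.97), ("B", "C", 0.55), ("C", "D", 0.93)]
kept = filter_pairs(scored)
```

With the weak `B`-`C` pair removed, closure would produce two groups (`A`, `B` and `C`, `D`) instead of one group of four. A suitable threshold depends on the model and should be tuned on validation data.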
Likely compatible with general-purpose code-capable agents that can run Python snippets (Copilot-like assistants, Claude Code, Cursor).
A pure-documentation skill for post-processing entity match predictions to enforce symmetry and transitivity via graph closure. There are no bundled scripts; the implementation is inline Python in the SKILL.md. The code is functional and copy-pasteable but lacks error handling and iteration limits. It is a useful niche tool for Kaggle-style deduplication tasks.
Simple, harmless skill. Essentially a code recipe packaged as a skill. Architecture is flat (no scripts/references dirs) but appropriate for this scope. The union-find alternative mentioned in Key Decisions would be a better implementation for production use.
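For reference, a union-find version of the closure might look like the sketch below. It is an assumption about what such an alternative could look like, not code taken from the skill; the names `find` and `cluster_pairs` and the list-of-pairs input are hypothetical.

```python
def find(parent, x):
    """Return the root of x, shortening the path along the way (path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_pairs(pairs):
    """Group ids into clusters given an iterable of (id_a, id_b) match pairs."""
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: attach one root under the other

    # Collect members under their final root.
    groups = {}
    for x in parent:
        groups.setdefault(find(parent, x), set()).add(x)
    return list(groups.values())


clusters = cluster_pairs([("A", "B"), ("B", "C"), ("D", "E")])
```

Because each pair is treated as an undirected edge, symmetry comes for free, and the whole closure takes a single pass over the pairs rather than repeated passes until convergence.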