
from autoskill383
Patches OCR invoice entity-to-bounding-box mapping to avoid shared boxes for duplicate values, reverse-search amounts sections, and ensure coordinate uniqueness
Adds concrete logic to OCR invoice entity mapping workflows to handle duplicate entity values safely: when the same entity value appears multiple times, the algorithm assigns distinct bounding boxes (no reuse), uses memoization to track taken boxes, and falls back to the next-best match when overlaps occur. For amounts_and_tax sections it reverses the search order (bottom-up) to better match invoice layouts. Multi-token entities get sequence-aware matching and overlap checks so tokens don't claim the same coordinates.
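The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the skill patches: the OCR output is assumed to be a list of row dicts in reading order with hypothetical `text` and `box` keys (for example, what `ocr_df.to_dict("records")` would give for a dataframe with those columns), matching is exact token equality, and the names `find_entity_boxes` and `map_entities` are invented for this example.

```python
def find_entity_boxes(rows, entity_value, used_boxes, reverse=False):
    """Return one box per token of entity_value, or None if no free match exists.

    rows       -- OCR tokens in reading order, each {"text": str, "box": tuple}
                  (hypothetical layout assumed for this sketch)
    used_boxes -- set of boxes already claimed by earlier entities; updated in place
    reverse    -- scan candidate positions bottom-up instead of top-down
    """
    tokens = entity_value.split()
    n = len(tokens)
    if n == 0:
        return None

    starts = range(len(rows) - n + 1)
    if reverse:
        starts = reversed(starts)

    for start in starts:
        window = range(start, start + n)
        # Skip any candidate that would reuse coordinates already taken,
        # which makes the search fall through to the next-best match.
        if any(rows[i]["box"] in used_boxes for i in window):
            continue
        # Sequence-aware match: all tokens must appear consecutively, in order.
        if [rows[i]["text"] for i in window] == tokens:
            boxes = [rows[i]["box"] for i in window]
            used_boxes.update(boxes)
            return boxes
    return None


def map_entities(rows, entities, section):
    """Map (name, value) pairs to boxes, never assigning the same box twice."""
    used_boxes = set()
    # Amounts and tax fields are searched bottom-up, as the description states.
    reverse = section == "amounts_and_tax"
    return [
        (name, find_entity_boxes(rows, value, used_boxes, reverse))
        for name, value in entities
    ]
```

With two entities that both have the value `100.00` in an `amounts_and_tax` section, the first one claims the lowest occurrence on the page and the second falls back to the next free occurrence above it; a third gets `None` once no unclaimed box remains. Tracking claimed boxes by coordinates rather than by row index is a design choice made here so that uniqueness holds even if two rows carry identical coordinates.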
Use when extracting structured fields from scanned invoices or receipts where the same textual value can appear multiple times (e.g., repeated amounts, line-item names). It's useful during OCR post-processing to increase mapping accuracy and avoid misattributing coordinates.
Relevant to Python-capable coding assistants (Codex, Copilot, GPT-style code assistants) and OCR pipelines that run post-processing scripts. Recommended for teams working with Tesseract/ocr-dataframes or CV-assisted extraction pipelines.
A prompt-only skill that instructs an LLM to modify OCR invoice entity-to-bounding-box mapping code so that it handles duplicate values. No scripts are bundled; the skill is purely a structured prompt template with operational rules for dynamic programming, reverse dataframe search, and coordinate uniqueness. The constraints are well defined, but the skill ships no runnable code, no examples, and no error-handling guidance.
Prompt-only skill from the AutoSkill research project (ecnu-icalk/autoskill). It is clean from a security perspective because it contains no scripts or executable code. Its practical value as a standalone skill is limited, since it only provides instructions for modifying code that must already exist elsewhere.