MessyData produces realistic synthetic DataFrames from a declarative YAML config or Python schema. It injects controlled anomalies (missing values, duplicates, invalid categories, date errors, outliers) to emulate messy real-world datasets for testing, QA, and ML robustness checks.
Trigger this skill when you need synthetic dirty data for testing pipelines, validating data-cleaning code, creating edge-case samples for model training or QA, or scheduling daily backfills of generated datasets.
Useful for agents that can run CLI commands or Python (Bash/uv runner, Python-capable agents like Copilot/Cursor/Claude Code).
This skill has not been reviewed by our automated audit pipeline yet.