
from mineru-ecosystem112
CLI-driven document extraction skill: convert PDFs, images, Word/PPT/Excel, and web pages to Markdown/HTML/LaTeX/DOCX with OCR, table and formula recognition.
MinerU provides a CLI skill that converts a wide range of documents (PDFs, scanned images, Word, PPT, Excel, web pages) into clean Markdown or other formats. It has two modes: a zero-config 'flash-extract' mode for quick Markdown output, which needs no token, and a precision 'extract' mode, which requires a token and adds multi-format output, VLM layout parsing, batch processing, OCR, and table and formula recognition.
Use MinerU when you need fast, reliable conversion of messy or scanned documents into editable text (research papers, reports, slides, scans), to extract tables/formulas accurately, or to batch-process large numbers of files. Choose 'flash-extract' for quick one-off conversions and 'extract' for higher accuracy and larger files.
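The choice between the two modes can be sketched as a small decision helper. This is only an illustration of the guidance above: the `Job` fields and the `choose_mode` function are hypothetical names, not part of MinerU, and the rule "anything beyond a single quick Markdown conversion goes to 'extract'" is an assumed reading of the description.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a conversion request (not a MinerU type)."""
    has_token: bool                    # whether a token is available for 'extract'
    output_format: str = "markdown"    # desired output format
    file_count: int = 1                # number of files to convert
    needs_high_accuracy: bool = False  # e.g. dense tables, formulas, large scans


def choose_mode(job: Job) -> str:
    """Return 'flash-extract' or 'extract' following the guidance above."""
    needs_extract = (
        job.output_format.lower() != "markdown"  # flash-extract outputs Markdown
        or job.file_count > 1                    # batch processing
        or job.needs_high_accuracy               # precision mode
    )
    if not needs_extract:
        return "flash-extract"
    if not job.has_token:
        # Assumption: fail loudly rather than silently downgrade the job.
        raise ValueError("'extract' mode requires a token")
    return "extract"
```

For example, `choose_mode(Job(has_token=False))` selects 'flash-extract', while a batch job or a non-Markdown target without a token raises an error instead of quietly producing lower-fidelity output.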
Best used by agents or tools that can invoke system CLIs and stream/process stdout/stderr (automation agents, developer assistants, research pipelines).
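A minimal sketch of what "invoke a system CLI and stream stdout/stderr" can look like for an agent, using only the Python standard library. The helper name `run_and_stream` is hypothetical, and the command is passed in by the caller: the actual MinerU executable name and its arguments are not given here, so none are assumed.

```python
import subprocess
import sys
import threading


def run_and_stream(cmd, on_stdout=print, on_stderr=None):
    """Run cmd, forwarding each output line to a callback; return the exit code."""
    if on_stderr is None:
        on_stderr = lambda line: print(line, file=sys.stderr)

    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered in text mode
    )

    def pump(stream, callback):
        # Read line by line so progress is visible while the process runs.
        for line in stream:
            callback(line.rstrip("\n"))
        stream.close()

    # One thread per pipe so a full stderr buffer cannot block stdout (or vice versa).
    threads = [
        threading.Thread(target=pump, args=(proc.stdout, on_stdout)),
        threading.Thread(target=pump, args=(proc.stderr, on_stderr)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return proc.wait()
```

A caller would pass the real command as a list of arguments and collect lines as needed, for example `run_and_stream([sys.executable, "-c", "print('done')"], on_stdout=lines.append)` with a stand-in command in place of the actual CLI.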
This skill has not been reviewed by our automated audit pipeline yet.