
from sciclaw9
Handles common PDF tasks: extract text and tables, merge/split files, rotate/watermark pages, and run OCR to make scanned documents searchable.
The PDF Skill provides pragmatic, scriptable operations for working with PDF artifacts in reproducible workflows. It helps agents extract text and tables, merge or split documents, rotate or watermark pages, and perform OCR on scanned pages. The skill emphasises preservation of page order and metadata, validation of output page counts, and recording provenance for transformations.
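The page-count validation and provenance recording described above could look roughly like the sketch below. It uses only the Python standard library. The function names (`check_page_counts`, `provenance_record`, `write_provenance`) and the record's field names are hypothetical illustrations, not part of the skill itself, and the page counts are assumed to come from whichever PDF library the agent already uses.

```python
import hashlib
import json
from datetime import datetime, timezone


def check_page_counts(operation, input_counts, output_counts):
    """Return True if a merge or split preserved the total number of pages.

    Hypothetical helper: a merge maps many inputs to one output,
    a split maps one input to many outputs.
    """
    if operation not in ("merge", "split"):
        raise ValueError(f"unsupported operation: {operation}")
    if operation == "merge" and len(output_counts) != 1:
        return False
    if operation == "split" and len(input_counts) != 1:
        return False
    return sum(input_counts) == sum(output_counts)


def sha256_of(path):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(operation, inputs, outputs):
    """Build a JSON-serialisable record of one transformation (assumed layout)."""
    return {
        "operation": operation,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
        "outputs": [{"path": str(p), "sha256": sha256_of(p)} for p in outputs],
    }


def write_provenance(record, path):
    """Write the record next to the outputs as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
```

A pipeline might call `check_page_counts("merge", [3, 5], [8])` after merging a 3-page and a 5-page file, and refuse to write the provenance record if the check fails. Hashing both inputs and outputs lets a later run confirm that the same files were involved.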
Use this skill whenever a user asks to read, transform, or extract data from PDF files. Typical cases include extracting tables for data analysis, converting scanned reports to searchable text, splitting a multi-article PDF into separate files, and applying consistent watermarks to generated documents. It is intended for automated pipelines where reproducibility and validation matter.
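For the multi-article split mentioned above, one plausible first step is to turn the known article start pages into page ranges before any PDF library is invoked. The sketch below is an assumption about how that planning could be done; `split_ranges` is a hypothetical helper, pages are 1-based and inclusive, and any pages before the first start are kept as a leading segment so that no page is dropped.

```python
def split_ranges(start_pages, total_pages):
    """Convert article start pages into (first, last) page ranges.

    Hypothetical helper: ranges are 1-based and inclusive, and together
    they cover every page from 1 to total_pages exactly once.
    """
    starts = sorted(set(start_pages))
    if not starts:
        raise ValueError("at least one start page is required")
    if starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("start pages must lie within 1..total_pages")
    if starts[0] != 1:
        # Keep front matter as its own segment instead of discarding it.
        starts.insert(0, 1)
    # Each range ends one page before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


def pages_covered(ranges):
    """Total number of pages across all ranges, for validating the split."""
    return sum(last - first + 1 for first, last in ranges)
```

Because the ranges partition the document, `pages_covered(split_ranges(starts, n))` should always equal `n`, which gives a cheap check on the output page counts before the split is carried out.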
This skill is language- and tool-agnostic in principle, but in practice it targets agents with Python runtime support (Copilot/Codex, Claude Code, and other automation agents that can run Python snippets). It is well suited to CLI-capable agents integrated into reproducible research pipelines.
This skill has not been reviewed by our automated audit pipeline yet.