
from phd-deepread-workflow28
A guided CLI workflow that extracts text from academic PDFs (PyMuPDF + Tesseract), generates structured Obsidian notes, and creates JSON Canvas critical-thinking canvases.
PhD Deep Read packages a four-stage pipeline for turning academic PDFs into richly structured literature notes and critical-thinking canvases. It uses a Text-First decision tree (PyMuPDF for searchable pages with Tesseract OCR fallback) to extract text and images, then generates Obsidian-friendly markdown with YAML frontmatter and Dataview callouts. The skill also produces JSON Canvas files for deep analysis and includes verification steps to ensure output consistency.
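The Text-First decision tree described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function names `choose_page_text` and `extract_pdf` and the 50-character threshold are assumptions made for the example. Only the PyMuPDF, pytesseract, and Pillow calls are real library APIs.

```python
from typing import Callable, List, Tuple

# Hypothetical threshold: a page with fewer embedded characters than this
# is treated as "not searchable" and sent to OCR.
MIN_CHARS = 50


def choose_page_text(
    embedded_text: str,
    ocr: Callable[[], str],
    min_chars: int = MIN_CHARS,
) -> Tuple[str, str]:
    """Return (method, text) for one page, preferring embedded text."""
    cleaned = embedded_text.strip()
    if len(cleaned) >= min_chars:
        return "text", cleaned
    # OCR is only invoked when the embedded text layer is too thin.
    return "ocr", ocr().strip()


def extract_pdf(path: str) -> List[Tuple[str, str]]:
    """Apply the text-first rule to every page of a PDF.

    Requires PyMuPDF, pytesseract, Pillow, and the tesseract binary.
    """
    import io

    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    results = []
    with fitz.open(path) as doc:
        for page in doc:
            def ocr(page=page) -> str:
                # Render the page to an image, then run Tesseract on it.
                pix = page.get_pixmap(dpi=300)
                image = Image.open(io.BytesIO(pix.tobytes("png")))
                return pytesseract.image_to_string(image)

            results.append(choose_page_text(page.get_text(), ocr))
    return results
```

Keeping the decision in `choose_page_text` separate from the PDF handling means the rule can be checked without any OCR dependencies installed.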
Use this skill when processing individual or batches of academic PDFs for literature reviews, generating reproducible notes for Obsidian, or when you need structured synthesis and critique (assumptions, evidence assessment, future directions). It's appropriate for researchers, graduate students, and knowledge workers preparing reading corpora.
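As an illustration of what a critique canvas might contain, the sketch below builds a small JSON Canvas document with one node per critique heading. The top-level `nodes` and `edges` arrays and the node and edge fields follow the JSON Canvas format; the function name `build_critique_canvas`, the layout coordinates, and the node ids are assumptions for this example and may not match what the skill emits.

```python
import json
from typing import Dict


def build_critique_canvas(title: str, sections: Dict[str, str]) -> dict:
    """Build a JSON Canvas dict: one central paper node linked to one node per section."""
    nodes = [
        {"id": "paper", "type": "text", "text": f"# {title}",
         "x": 0, "y": 0, "width": 400, "height": 120}
    ]
    edges = []
    for i, (heading, body) in enumerate(sections.items()):
        node_id = f"section-{i}"
        nodes.append({
            "id": node_id,
            "type": "text",
            "text": f"## {heading}\n\n{body}",
            # Hypothetical layout: stack section nodes in a column to the right.
            "x": 500, "y": i * 220, "width": 400, "height": 200,
        })
        edges.append({
            "id": f"edge-{i}",
            "fromNode": "paper", "fromSide": "right",
            "toNode": node_id, "toSide": "left",
        })
    return {"nodes": nodes, "edges": edges}


canvas = build_critique_canvas(
    "Example paper",
    {
        "Assumptions": "What does the argument take for granted?",
        "Evidence assessment": "How strong is the support for each claim?",
        "Future directions": "What follow-up work does this suggest?",
    },
)
canvas_json = json.dumps(canvas, indent=2)  # write this string to a .canvas file
```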
Works with agents that can run or orchestrate CLI or Python tools (Claude Code, assistant shells, or local CLI wrappers). It works best when the environment provides PyMuPDF and Tesseract for OCR and the agent can read and write files for Obsidian integration.
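Before running the pipeline, an agent could check that the environment meets these requirements. The following is a sketch using only the Python standard library; `check_environment` is a hypothetical helper, not part of the skill, and the default module and binary names are assumptions based on the dependencies named in this listing.

```python
import importlib.util
import shutil
from typing import List, Sequence


def check_environment(
    modules: Sequence[str] = ("fitz", "pytesseract", "PIL"),
    binaries: Sequence[str] = ("tesseract",),
) -> List[str]:
    """Return a list of missing dependencies (empty if everything is present)."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"executable: {name}")
    return missing


if __name__ == "__main__":
    problems = check_environment()
    print("environment OK" if not problems else "\n".join(problems))
```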
PhD Deep Read Workflow processes academic PDFs into structured Obsidian notes using a Text-First decision tree (PyMuPDF with Tesseract OCR fallback). It generates JSON Canvas critical-thinking canvases and structured literature note prompts. The CLI entry point and two helper scripts ran cleanly; most other scripts did not run, either because they require arguments or because dependencies (PyMuPDF, Tesseract) were missing. No security concerns were found: no network calls, no credential exposure, and no destructive operations.
Dependencies: PyMuPDF (fitz), tesseract-ocr, pytesseract, pillow
Clean, well-documented academic tool. process.py automatically chmods .sh scripts to 0o755, which is minor but harmless. extract.py passes the --lang argument directly to the tesseract subprocess, a theoretical injection risk if the lang value is untrusted, but low severity because the user controls the input.
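One way to close the --lang gap noted above is to validate the value before it reaches the subprocess. This is a sketch of a possible hardening step, not code from extract.py: `validate_lang` and `tesseract_argv` are hypothetical names, and the pattern is a deliberately strict assumption that accepts codes such as `eng` or `chi_sim+eng` while rejecting anything that starts with a dash or contains path or shell characters (including script-style names like `script/Latin`).

```python
import re
from typing import List

# Assumed allow-list: one or more language codes joined by "+",
# each starting with a letter and containing only letters, digits, or "_".
_LANG_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*(\+[A-Za-z][A-Za-z0-9_]*)*")


def validate_lang(value: str) -> str:
    """Return the value unchanged if it looks like a Tesseract language spec."""
    if not _LANG_RE.fullmatch(value):
        raise ValueError(f"unsupported --lang value: {value!r}")
    return value


def tesseract_argv(image_path: str, lang: str) -> List[str]:
    """Build an argument list for subprocess.run (no shell involved)."""
    # "stdout" as the output base tells the tesseract CLI to print the text.
    return ["tesseract", image_path, "stdout", "-l", validate_lang(lang)]
```

Passing the result to `subprocess.run` as a list keeps the shell out of the picture, and the allow-list prevents a value such as `--tessdata-dir` from being read as an extra option.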