
from web-scraper39
Adaptive web-scraping skill that chooses the cheapest reliable approach (HTTP, browser, API, or hybrid), discovers APIs via traffic interception, and can produc
This skill guides an agent through an adaptive, phased web-scraping reconnaissance and implementation workflow. It starts with lightweight HTTP checks (curl), escalates to browser reconnaissance and traffic interception when needed, discovers APIs/endpoints, validates selectors/JSON paths, and documents a repeatable extraction strategy. It also includes guidance for turning a working scraper into a production Apify Actor (TypeScript-first).
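As a rough illustration of the API-discovery and JSON-path validation steps, the sketch below filters a list of intercepted responses for likely JSON endpoints and checks that a dotted path resolves in a sample payload. The `CapturedResponse` shape and the `findApiCandidates` and `resolvePath` helpers are hypothetical names invented for this example, not part of the skill. In practice the records would come from whatever interception tool the agent uses.

```typescript
// Hypothetical shape of one intercepted response; the skill does not define this.
interface CapturedResponse {
  url: string;
  method: string;
  status: number;
  contentType: string;
  body: string;
}

// Keep successful responses that declare JSON and whose body actually parses.
function findApiCandidates(captured: CapturedResponse[]): CapturedResponse[] {
  return captured.filter((r) => {
    if (r.status < 200 || r.status >= 300) return false;
    if (!r.contentType.toLowerCase().includes("json")) return false;
    try {
      JSON.parse(r.body);
      return true;
    } catch {
      return false;
    }
  });
}

// Resolve a dotted path such as "data.items.0.title".
// Returns undefined as soon as any step along the path is missing.
function resolvePath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}
```

A path that resolves to `undefined` on a real sample is a signal to re-inspect the payload before committing the path to the extraction strategy.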
Use when you need to extract structured data from a website, investigate blocking or 403 responses, find the APIs behind a site, or convert an ad-hoc scraper into a production actor. Trigger phrases include: "scrape [site]", "extract data from", "I'm getting blocked", and "Make this an Apify Actor".
Works well with agents that can run shell and Node workflows, such as Claude Code, Copilot and other code-writing agents, and any agent that can run Playwright/Crawlee examples.
Comprehensive web-scraping skill with an adaptive phased workflow (quick curl → browser → deep scan → protection testing → report). No bundled scripts: it is a purely instruction-based SKILL.md with extensive progressive disclosure across the strategies/, workflows/, reference/, and apify/ subdirectories. Quality gates and a self-critique phase are well structured. The implementation guidance is tied to the Apify/Crawlee ecosystem, but the recon phases are generally applicable.
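One way to picture a quality gate in this phased workflow is a small function that decides whether to escalate or to stop and report. The sketch below is only an assumption about how such a gate could look: the phase names follow the workflow order given above, while the `GateInput` fields and the rule "data found and not blocked means stop escalating" are hypothetical and not taken from the skill.

```typescript
// Phase order as described in the workflow summary.
const PHASES = ["curl", "browser", "deep-scan", "protection-testing", "report"] as const;
type Phase = (typeof PHASES)[number];

// Hypothetical gate inputs gathered at the end of each phase.
interface GateInput {
  dataFound: boolean; // the target fields were extracted in this phase
  blocked: boolean;   // the site returned a block page or an error such as 403
}

// Decide the next phase. If the current phase already produced the data
// without being blocked, skip the remaining (more expensive) phases.
function nextPhase(current: Phase, gate: GateInput): Phase {
  if (current === "report") return "report";
  if (gate.dataFound && !gate.blocked) return "report";
  const index = PHASES.indexOf(current);
  return PHASES[index + 1] ?? "report";
}
```

Under these assumptions, a successful curl check moves straight to the report and no browser is launched, which is the saving the gates are meant to provide.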
One of the more thoroughly documented skills seen. Progressive disclosure via subdirectories is exemplary. The phased approach, with quality gates that prevent unnecessary browser launches, is well designed. Security is clean: no credential leaks, destructive commands, or exfiltration risks.