Web Scraping — Adaptive Strategy

Trust Score 87/100

Adaptive web-scraping skill that chooses the cheapest reliable approach (HTTP, browser, API, or hybrid), discovers APIs via traffic interception, and can produc

triggers: scrape, extract data, 403, productionize, make this an apify actor, api discovery, traffic interception


What it does

This skill guides an agent through an adaptive, phased web-scraping reconnaissance and implementation workflow. It starts with lightweight HTTP checks (curl), escalates to browser reconnaissance and traffic interception when needed, discovers APIs/endpoints, validates selectors/JSON paths, and documents a repeatable extraction strategy. It also includes guidance for turning a working scraper into a production Apify Actor (TypeScript-first).
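
The escalation described above can be sketched as a small decision function. This is a minimal TypeScript sketch and not the skill's actual logic: the `ProbeResult` fields, the `Strategy` names, the function `chooseStrategy`, and the preference for a discovered API over plain HTTP are all assumptions made for illustration. The hybrid case is left out for brevity.

```typescript
// Hypothetical summary of what the lightweight HTTP check (e.g. curl) and any
// later reconnaissance revealed. Field names are illustrative assumptions.
interface ProbeResult {
  status: number;         // HTTP status of the plain request
  hasDataInHtml: boolean; // target fields are present in the raw HTML
  jsonApiFound: boolean;  // a JSON endpoint was discovered, e.g. via traffic interception
}

type Strategy = "api" | "http" | "browser";

// Pick the cheapest approach that the probe suggests will work.
// Assumption: a discovered API is preferred over parsing HTML.
function chooseStrategy(probe: ProbeResult): Strategy {
  if (probe.jsonApiFound) {
    return "api";
  }
  const ok = probe.status >= 200 && probe.status < 300;
  if (ok && probe.hasDataInHtml) {
    return "http";
  }
  // Blocked (for example a 403) or content rendered client-side:
  // escalate to a browser session.
  return "browser";
}
```

Under these assumptions, a probe with `status: 403` and no discovered API falls through to `"browser"`, while a 200 response that already contains the data stays on plain HTTP.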

When to use it

Use when you need to extract structured data from a website, investigate blocking/403 issues, find APIs behind a site, or convert an ad-hoc scraper into a production actor. Triggers include: "scrape [site]", "extract data from", "I'm getting blocked", and "Make this an Apify Actor".

What's included

Scripts: none bundled in the repo (has_scripts=false), but the skill contains runnable examples and patterns (examples/*, apify templates).
References: the top-level flag is has_references=false, but SKILL.md references many in-repo subfiles, including strategy docs.
Instructions: a detailed 6-phase reconnaissance workflow (Phases 0-5), validation steps, protection-testing guidance, report schema, and productionization checklist for Apify Actors.

Compatible agents

Works well with agents that can run shell and Node workflows, such as Claude Code, Copilot and other code-writing agents, and any agent that can run the Playwright/Crawlee examples.

Audit Summary

Comprehensive web-scraping skill with an adaptive phased workflow (quick curl → browser → deep scan → protection testing → report). No scripts are bundled: the skill is a purely instruction-based SKILL.md with extensive progressive disclosure across the strategies/, workflows/, reference/, and apify/ subdirectories. It has well-structured quality gates and a self-critique phase. It is tied to the Apify/Crawlee ecosystem, but the recon phases are generally applicable.

Watch Out

Relies heavily on proxy-mcp tools (interceptor_chrome_*, proxy_*, humanizer_*), which may not be present
Anti-detection/anti-blocking guidance is detailed, but how well it applies depends on the specific proxy infrastructure
No bundled scripts to validate — all instruction-based
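
One way to surface the first caveat early is to check which tool families are present before starting a run. The following is a minimal sketch, assuming the agent can list its available tool names as plain strings; `missingToolFamilies` is a hypothetical helper, and only the three prefixes come from the note above.

```typescript
// Tool-name prefixes the skill relies on, taken from the caveat above.
const REQUIRED_PREFIXES = ["interceptor_chrome_", "proxy_", "humanizer_"] as const;

// Return every required prefix for which no available tool name matches.
// `availableTools` is assumed to be supplied by the agent runtime.
function missingToolFamilies(availableTools: string[]): string[] {
  return REQUIRED_PREFIXES.filter(
    (prefix) => !availableTools.some((name) => name.startsWith(prefix)),
  );
}
```

An empty result means all three families are represented. A non-empty result lists the prefixes that would need a fallback, such as staying on the plain HTTP phases.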

Notes

One of the more thoroughly documented skills seen. Progressive disclosure via subdirectories is exemplary. The phased approach with quality gates preventing unnecessary browser launches is well-designed. Security is clean — no credential leaks, destructive commands, or exfiltration risks.

Information

Repository: web-scraper
Stars: 39

Trust Score

Overall: 87
Security: 92
Code Quality: 78
Architecture: 88
Usefulness: 82
