
MCP Web Content Extractor
by hamz115
Unlimited content extraction from dynamic JavaScript-heavy sites and Google Gemini conversations using Playwright.
What it does
Connects an AI assistant to web content that traditional HTTP scrapers cannot reach. It uses browser automation to handle dynamic content, lazy loading, and authenticated sessions, and is specifically optimized for extracting long Gemini chat histories.
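The lazy-loading behavior described above can be illustrated with a minimal sketch. This is not the server's actual implementation: the helper names `scroll_until_stable` and `extract_text`, the round limit, and the pause length are all assumptions, and it scrolls the main window, whereas a chat UI such as Gemini may scroll inside a nested container. It only shows the general idea of scrolling until the page height stops growing, using Playwright's synchronous API.

```python
def scroll_until_stable(page, max_rounds=20, pause_ms=500):
    """Scroll to the bottom until the document height stops growing.

    `page` is expected to behave like a Playwright Page (evaluate,
    wait_for_timeout). Returns the number of scroll rounds performed.
    """
    last_height = page.evaluate("document.body.scrollHeight")
    for round_no in range(max_rounds):
        # Jump to the current bottom so lazy-loaded content is requested.
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)  # give new content time to render
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == last_height:
            return round_no + 1  # nothing new appeared, so stop
        last_height = new_height
    return max_rounds


def extract_text(url):
    """Hypothetical end-to-end helper: load a page, scroll, return body text."""
    # Imported lazily so the scroll helper can be used without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        scroll_until_stable(page)
        text = page.inner_text("body")
        browser.close()
        return text
```

Capping the number of rounds keeps an infinite-scroll page from looping forever; the right limit depends on how long the target page is.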
Tools
- extract_dynamic_content: The primary tool for high-volume extraction with automated scrolling and authentication.
- extract_url_content: Fast HTTP-based extraction for static pages.
- extract_url_content_clean: Returns cleaned HTML content.
- extract_url_content_structured: Organizes page content into logical sections.
- extract_with_browser_session: Uses provided cookies to access authenticated pages.
- login_and_extract_google: Automates Google login to fetch private data like Gemini chats.
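As a rough illustration of how a client might choose between these tools and invoke one, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0). The tool names come from the list above; the `pick_tool` helper and the `url` argument name are assumptions, since the listing does not document each tool's input schema.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def pick_tool(needs_javascript, has_cookies=False):
    """Hypothetical helper: map page requirements to a tool name."""
    if has_cookies:
        return "extract_with_browser_session"
    if needs_javascript:
        return "extract_dynamic_content"
    return "extract_url_content"


def build_tool_call(tool, arguments):
    """Build an MCP tools/call request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    tool = pick_tool(needs_javascript=True)
    # "url" is an assumed argument name; check the server's tool schema.
    request = build_tool_call(tool, {"url": "https://example.com"})
    print(json.dumps(request, indent=2))
```

In practice an MCP host such as Cursor constructs these requests itself, so a sketch like this is mainly useful for understanding or testing the server directly.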
Installation
{
  "mcpServers": {
    "Web Content Extractor": {
      "command": "python3",
      "args": ["/path/to/main.py"],
      "protocol": "mcp"
    }
  }
}
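If the host's MCP configuration file already lists other servers, the entry above needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the `add_server` helper is hypothetical, and the location of the configuration file depends on the host, so the caller must supply it.

```python
import json
from pathlib import Path


def add_server(config_path, main_py, name="Web Content Extractor"):
    """Merge the extractor's entry into an MCP config file, keeping other servers."""
    path = Path(config_path)
    raw = path.read_text() if path.exists() else ""
    config = json.loads(raw) if raw.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same fields as the configuration snippet above.
    servers[name] = {
        "command": "python3",
        "args": [str(main_py)],
        "protocol": "mcp",
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_server(config_file, "/path/to/main.py")` with the host's own config file path leaves any existing entries under `mcpServers` untouched.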
Supported hosts
Support is confirmed for Cursor. Claude is listed by default as a standard MCP host.
Quick install
pip install -r requirements.txt && playwright install chromium
Information
- Pricing: free
- Published: 5/28/2026
- Updated






