
Fetcher MCP

★ 1,100

from jae-jae

Fetch and extract web content using a Playwright headless browser, with support for intelligent extraction and flexible output.




An MCP server for fetching web page content using a Playwright headless browser.


Advantages

JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Features

fetch_url - Retrieve web page content from a specified URL

  • Uses a Playwright headless browser to execute JavaScript before reading the page

  • Supports intelligent extraction of main content and conversion to Markdown

  • Supports the following parameters:

    • url: The URL of the web page to fetch (required)

    • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)

    • waitUntil: Specifies when navigation is considered complete; options are 'load', 'domcontentloaded', 'networkidle', and 'commit', default is 'load'

    • extractContent: Whether to intelligently extract the main content, default is true

    • maxLength: Maximum length of returned content (in characters), default is no limit

    • returnHtml: Whether to return HTML content instead of Markdown, default is false

    • waitForNavigation: Whether to wait for additional navigation after the initial page load (useful for sites with anti-bot verification), default is false

    • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)

    • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true

    • debug: Whether to enable debug mode (showing the browser window); overrides the --debug command line flag if specified
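As a rough illustration, the fetch_url parameters could be modeled as a TypeScript type with the documented defaults applied. The names `FetchUrlArgs` and `withDefaults` are hypothetical and not taken from Fetcher MCP's code; only the field names and default values come from the parameter list above.

```typescript
// Hypothetical typing of the fetch_url arguments. Field names and defaults
// follow the documented parameter list; everything else is illustrative.
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

interface FetchUrlArgs {
  url: string;
  timeout?: number;
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number; // omitted means no limit
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;
  disableMedia?: boolean;
  debug?: boolean; // omitted means "defer to the --debug flag"
}

// Fill in the documented defaults, letting caller-supplied values win.
function withDefaults(args: FetchUrlArgs): FetchUrlArgs {
  return {
    timeout: 30000,
    waitUntil: "load",
    extractContent: true,
    returnHtml: false,
    waitForNavigation: false,
    navigationTimeout: 10000,
    disableMedia: true,
    ...args,
  };
}

const resolved = withDefaults({ url: "https://example.com", timeout: 60000 });
console.log(resolved);
```

Here the caller overrides only `timeout`; the remaining fields keep their documented defaults, and `maxLength` and `debug` stay unset.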

fetch_urls - Batch retrieve web page content from multiple URLs in parallel

  • Uses multi-tab parallel fetching for improved performance

  • Returns combined results with clear separation between web pages

  • Supports the following parameters:

    • urls: Array of URLs to fetch (required)

    • Other parameters are the same as fetch_url
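The batch behavior can be pictured as starting one independent fetch per URL and awaiting them together. The sketch below uses a stub `fetchOne` in place of a real browser tab, and the separator layout is invented for illustration; the actual output format of fetch_urls may differ.

```typescript
// Stand-in for fetching one page in its own tab (hypothetical; no browser involved).
async function fetchOne(url: string): Promise<string> {
  return `content of ${url}`;
}

// Fetch all URLs concurrently and join the results with a visible separator.
// A failure for one URL is recorded in place instead of aborting the batch.
async function fetchAll(urls: string[]): Promise<string> {
  const bodies = await Promise.all(
    urls.map((u) => fetchOne(u).catch((e) => `error: ${String(e)}`))
  );
  return bodies
    .map((body, i) => `[${i + 1}/${urls.length}] ${urls[i]}\n${body}`)
    .join("\n\n-----\n\n");
}

fetchAll(["https://example.com/a", "https://example.com/b"]).then((out) =>
  console.log(out)
);
```

Catching errors per URL keeps one slow or broken page from discarding the results of the others.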

browser_install - Install Playwright Chromium browser binary automatically

  • Installs the required Chromium browser binary when it is not available

  • Automatically suggested when browser installation errors occur

  • Supports the following parameters:

    • withDeps: Install system dependencies required by the Chromium browser, default is false

    • force: Force installation even if Chromium is already installed, default is false
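MCP clients invoke tools through a JSON-RPC `tools/call` request. As a sketch, a call to browser_install that also installs system dependencies could be built as follows. The helper name `buildToolCall` and the request id are illustrative; in practice an MCP client library constructs this message for you.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const installRequest = buildToolCall(1, "browser_install", {
  withDeps: true, // also install system dependencies
  force: false,   // skip if Chromium is already present
});
console.log(JSON.stringify(installRequest, null, 2));
```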

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

Please wait for the page to fully load

This sets waitForNavigation: true.

Increase Timeout Duration: For websites that load slowly:

Please set the page loading timeout to 60 seconds

This adjusts both timeout and navigationTimeout parameters accordingly.
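To make the two prompts above concrete, here is roughly what the resulting fetch_url arguments might look like for a slow site behind a verification page. This is a sketch under assumptions: the URL is a placeholder, and raising both timeouts to the same 60-second value is a guess, since the text only says they are adjusted "accordingly".

```typescript
// Convert seconds, as written in the prompt, to the milliseconds the tool expects.
const secondsToMs = (s: number): number => s * 1000;

// Assumed arguments for a slow site with a verification step.
const slowSiteArgs = {
  url: "https://example.com/protected", // placeholder URL
  waitForNavigation: true,              // wait out redirects or verification
  timeout: secondsToMs(60),             // raised from the 30000 ms default
  navigationTimeout: secondsToMs(60),   // assumption: same value as timeout
};
console.log(slowSiteArgs);
```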

Content Retrieval Adjustments

Preserve Original HTML Structure: When content extraction might fail:

Please preserve the original HTML content

Sets extractContent: false and returnHtml: true.

Fetch Complete Page Content: When extracted content is too limited:

Please fetch the complete webpage content instead of just the main content

Sets extractContent: false.

Return Content as HTML: When HTML format is needed instead of default Markdown:

Please return the content in HTML format

Sets returnHtml: true.
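The three content retrieval tips map onto two boolean flags. The lookup below summarizes that mapping; the mode names and the `argsFor` helper are hypothetical labels for illustration, while the flag values come from the tips above.

```typescript
// Hypothetical labels for the three prompts described above.
type OutputMode = "preserveHtml" | "fullPage" | "html";

// Flags each prompt is described as setting.
const outputFlags: Record<OutputMode, { extractContent?: boolean; returnHtml?: boolean }> = {
  preserveHtml: { extractContent: false, returnHtml: true },
  fullPage: { extractContent: false },
  html: { returnHtml: true },
};

// Combine a URL with the flags for the chosen mode.
function argsFor(url: string, mode: OutputMode) {
  return { url, ...outputFlags[mode] };
}

console.log(argsFor("https://example.com", "preserveHtml"));
```

Flags left out of a mode fall back to the tool's defaults (extractContent: true, returnHtml: false).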

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation:
Please enable debug mode for this fetch operation

This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

Manual Login: To log in with your own credentials:

Please run in debug mode so I can manually log in to the website

Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

Interacting with Debug Browser: When debug mode is enabled:

  • The browser window remains open

  • You can manually log into the website using your credentials

  • After login is complete, content will be fetched with your authenticated session

Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request:

Please enable debug mode for this authentication step

Sets debug: true for this specific request only, opening the browser window for manual login.
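The precedence between the per-request debug parameter and the --debug flag can be summarized in one expression. This is a minimal sketch of the documented rule, not the server's actual code; the function name `effectiveDebug` is hypothetical.

```typescript
// A per-request debug value wins when present; otherwise the server's
// --debug flag decides whether the browser window is shown.
function effectiveDebug(
  requestDebug: boolean | undefined,
  serverDebugFlag: boolean
): boolean {
  return requestDebug ?? serverDebugFlag;
}

console.log(effectiveDebug(true, false));      // request enables debug on a non-debug server
console.log(effectiveDebug(undefined, true));  // no per-request value, so the flag applies
```

Under this reading, passing debug: false on a request would also hide the browser on a server started with --debug, since the parameter is described as overriding the flag whenever it is specified.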

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously. Perfect for batch search operations and data collection.

License

Licensed under the MIT License