ScrapeGraph AI

AI-powered web scraping using the ScrapeGraph AI API. Requires an API key.

ScrapeGraph MCP Server

License: MIT · Python 3.13+

A production-ready Model Context Protocol (MCP) server that provides seamless integration with the ScrapeGraph AI API. This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.

API v2

This MCP server targets ScrapeGraph API v2 (https://v2-api.scrapegraphai.com/api), aligned 1:1 with scrapegraph-py PR #84. Auth uses the SGAI-APIKEY header. Environment variables mirror the Python SDK:

  • SGAI_API_URL: override the base URL (default https://v2-api.scrapegraphai.com/api)
  • SGAI_TIMEOUT: request timeout in seconds (default 120)
  • SGAI_API_KEY: API key (can also be passed via MCP scrapegraphApiKey or X-API-Key header)

Legacy aliases (still honored): SCRAPEGRAPH_API_BASE_URL for SGAI_API_URL, SGAI_TIMEOUT_S for SGAI_TIMEOUT.
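
The lookup order described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper names `first_set` and `resolve_settings` are hypothetical, and giving the current variable names precedence over the legacy aliases is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_API_URL = "https://v2-api.scrapegraphai.com/api"
DEFAULT_TIMEOUT_S = 120.0


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve base URL, timeout, and API key from the environment.

    Assumption: current names win over legacy aliases when both are set.
    """
    api_url = first_set(env, "SGAI_API_URL", "SCRAPEGRAPH_API_BASE_URL") or DEFAULT_API_URL
    timeout_raw = first_set(env, "SGAI_TIMEOUT", "SGAI_TIMEOUT_S")
    return {
        "api_url": api_url.rstrip("/"),
        "timeout": float(timeout_raw) if timeout_raw else DEFAULT_TIMEOUT_S,
        "api_key": env.get("SGAI_API_KEY"),
    }
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check in isolation, for example `resolve_settings({"SGAI_TIMEOUT_S": "30"})`.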

Key Features

  • Scrape & extract: scrape (POST /scrape, multi-format), extract (POST /extract, URL + prompt)
  • Search: search (POST /search; num_results clamped 3–20)
  • Crawl: Async multi-page crawl with crawl_start / crawl_get_status / crawl_stop / crawl_resume
  • Schema: schema (POST /schema), which generates or augments a JSON Schema from a prompt
  • Monitors: Scheduled jobs via monitor_create, monitor_list, monitor_get, pause/resume/delete, monitor_activity (paginated tick history)
  • Account: credits, history
  • Easy integration: Claude Desktop, Cursor, Smithery, HTTP transport
  • Developer docs: .agent/ folder

Migration: v2 → v3

v3 renames every MCP tool that diverged from the v2 API docs. Hard rename, no aliases.

| v2 (old) | v3 (new) |
| --- | --- |
| smartscraper | extract |
| searchscraper | search |
| smartcrawler_initiate | crawl_start |
| smartcrawler_fetch_results | crawl_get_status |
| sgai_history | history |
| generate_schema | schema |
| markdownify | removed; use scrape with output_format="markdown" |
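
Client code that still issues v2 tool names could be translated with a small lookup like the one below. This is a hedged sketch: `RENAMED_TOOLS` and `migrate_tool_call` are hypothetical helpers, and it assumes the arguments of renamed tools carry over unchanged, which the table above does not state.

```python
# v2 -> v3 tool names, copied from the migration table above.
RENAMED_TOOLS = {
    "smartscraper": "extract",
    "searchscraper": "search",
    "smartcrawler_initiate": "crawl_start",
    "smartcrawler_fetch_results": "crawl_get_status",
    "sgai_history": "history",
    "generate_schema": "schema",
}


def migrate_tool_call(name: str, args: dict) -> tuple[str, dict]:
    """Translate a v2 tool call to its v3 name; unchanged names pass through."""
    if name == "markdownify":
        # markdownify was removed; scrape with output_format="markdown" replaces it.
        return "scrape", {**args, "output_format": "markdown"}
    # Assumption: arguments of renamed tools are passed through as-is.
    return RENAMED_TOOLS.get(name, name), dict(args)
```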

Available Tools

| Tool | Role |
| --- | --- |
| scrape | POST /scrape (output_format: markdown, html, screenshot, branding, links, images, summary) |
| extract | POST /extract (requires website_url + user_prompt; optional output_schema) |
| search | POST /search (num_results 1–20; supports country_search, time_range, output_schema) |
| crawl_start | POST /crawl; extraction_mode: markdown / html / links / images / summary / branding / screenshot |
| crawl_get_status | GET /crawl/:id (poll until status: completed) |
| crawl_stop, crawl_resume | POST /crawl/:id/stop or /crawl/:id/resume |
| schema | POST /schema (generate or augment a JSON Schema from a prompt) |
| credits | GET /credits |
| history | GET /history (paginated, service filter) |
| monitor_create, monitor_list, monitor_get, monitor_pause, monitor_resume, monitor_delete | /monitor API |
| monitor_activity | GET /monitor/:id/activity (paginated tick history: id, createdAt, status, changed, elapsedMs, diffs) |
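
As an illustration of the extract row, the required and optional arguments could be validated before the tool is called. This is a sketch only: `build_extract_args` is a hypothetical helper, and the URL and prompt checks are assumptions about what a sensible client would enforce, not documented server behavior.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_extract_args(
    website_url: str,
    user_prompt: str,
    output_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Validate and assemble the arguments for the extract tool."""
    parsed = urlparse(website_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"website_url must be an absolute http(s) URL, got {website_url!r}")
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")
    args: dict[str, Any] = {"website_url": website_url, "user_prompt": user_prompt}
    if output_schema is not None:
        # Optional JSON Schema describing the shape of the extracted data.
        args["output_schema"] = output_schema
    return args
```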

Removed: sitemap, agentic_scrapper, async-status polling, and (in v3) markdownify; use scrape with output_format="markdown" instead.

Google ADK Integration

The ScrapeGraph MCP server can be integrated with Google ADK (Agent Development Kit) to create AI agents with web scraping capabilities.

Prerequisites

  • Python 3.13 or higher
  • Google ADK installed
  • ScrapeGraph API key

Installation

  1. Install Google ADK (if not already installed):
pip install google-adk
  2. Set your API key:
export SGAI_API_KEY=your-api-key-here

Basic Integration Example

Create an agent file (e.g., agent.py) with the following configuration:

import os
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# Path to the scrapegraph-mcp server directory
SCRAPEGRAPH_MCP_PATH = "/path/to/scrapegraph-mcp"

# Path to the server.py file
SERVER_SCRIPT_PATH = os.path.join(
    SCRAPEGRAPH_MCP_PATH, 
    "src", 
    "scrapegraph_mcp", 
    "server.py"
)

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='scrapegraph_assistant_agent',
    instruction='Help the user with web scraping and data extraction using ScrapeGraph AI. '
                'You can convert webpages to markdown, extract structured data using AI, '
                'perform web searches, crawl multiple pages, and automate complex scraping workflows.',
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command='python3',
                    args=[
                        SERVER_SCRIPT_PATH,
                    ],
                    env={
                        'SGAI_API_KEY': os.getenv('SGAI_API_KEY'),
                    },
                ),
                timeout=300.0,
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=['scrape', 'extract', 'search']
        )
    ],
)

Configuration Options

Timeout Settings:

  • Default timeout is 5 seconds, which may be too short for web scraping operations
  • Recommended: set timeout=300.0 (5 minutes)
  • Adjust based on your use case (crawling operations may need even longer timeouts)

Tool Filtering:

  • By default, all registered MCP tools are exposed to the agent (see Available Tools)
  • Use tool_filter to limit which tools are available:
    tool_filter=['scrape', 'extract', 'search']

API Key Configuration:

  • Set via environment variable: export SGAI_API_KEY=your-key
  • Or pass directly in env dict: 'SGAI_API_KEY': 'your-key-here'
  • Environment variable approach is recommended for security
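
Because os.getenv returns None when a variable is unset, a missing key would otherwise only surface once the MCP subprocess starts. A minimal sketch of failing early instead, where `server_env` is a hypothetical helper and not part of Google ADK or this server:

```python
import os
from typing import Mapping


def server_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the env dict for the MCP subprocess, failing early if the key is missing."""
    api_key = environ.get("SGAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SGAI_API_KEY is not set; export it before starting the agent")
    return {"SGAI_API_KEY": api_key}
```

The result could be passed as the env argument of StdioServerParameters in place of the inline dict.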

Usage Example

Once configured, your agent can use natural language to interact with web scraping tools:

# The agent can now handle queries like:
# - "Convert https://example.com to markdown"
# - "Extract all product prices from this e-commerce page"
# - "Search for recent AI research papers and summarize them"
# - "Crawl this documentation site and extract all API endpoints"

For more information about Google ADK, visit the official documentation.

Example Use Cases

The server enables sophisticated queries across various scraping scenarios:

Single Page Scraping

  • Scrape (markdown): "Convert the ScrapeGraph documentation page to markdown"
  • Extract: "Extract all product names, prices, and ratings from this e-commerce page"
  • Extract with scrolling: "Scrape this infinite scroll page with 5 scrolls and extract all items"
  • Basic Scrape: "Fetch the HTML content of this JavaScript-heavy page with full rendering"

Search and Research

  • Search: "Research and summarize recent developments in AI-powered web scraping"
  • Search: "Search for the top 5 articles about machine learning frameworks and extract key insights"
  • Search: "Find recent news about GPT-4 and provide a structured summary"
  • Search: v2 does not apply time_range; phrase queries to bias recency in natural language instead

Website analysis

  • Use crawl_start plus crawl_get_status to map and capture multi-page content; there is no separate sitemap tool on v2.

Multi-page crawling

  • Crawl: "Crawl the blog in markdown mode and poll until complete"
  • For structured fields per page, run extract on individual URLs (or monitor_create on a schedule)
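
The "poll until complete" step can be sketched as a loop around whatever function fetches the crawl status. This is an illustration under assumptions: `wait_for_crawl` is hypothetical, `get_status` stands in for a call to crawl_get_status, and the "failed" and "stopped" terminal states are guesses; only "completed" appears in this document.

```python
import time
from typing import Any, Callable


def wait_for_crawl(
    get_status: Callable[[str], dict[str, Any]],
    crawl_id: str,
    interval_s: float = 5.0,
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll get_status(crawl_id) until the crawl completes, fails, or max_wait_s elapses."""
    waited = 0.0
    while True:
        result = get_status(crawl_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status in ("failed", "stopped"):  # assumed terminal states
            raise RuntimeError(f"crawl {crawl_id} ended with status {status!r}")
        if waited >= max_wait_s:
            raise TimeoutError(f"crawl {crawl_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real delays; in normal use the default time.sleep applies.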

Monitors and account

  • Monitor: "Run this extract prompt on https://example.com every day at 9am" (monitor_create with interval)
  • Credits / history: credits, history
  • Agentic Scraper: "Execute a complex workflow: login, navigate to reports, download data, and extract summary statistics"

Error Handling

The server implements robust error handling with detailed, actionable error messages for:

  • API authentication issues
  • Malformed URL structures
  • Network connectivity failures
  • Rate limiting and quota management
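
One way such messages could be produced is to map HTTP status codes to hints. The sketch below is not the server's implementation: `HINTS`, `error_payload`, and the "hint" key are hypothetical, and which status codes the API actually returns is an assumption based on common HTTP conventions.

```python
from typing import Any

# Conventional HTTP status codes mapped to hints.
# Assumption: the API follows these conventions.
HINTS = {
    400: "Invalid request: check the URL and parameters.",
    401: "Authentication failed: check SGAI_API_KEY.",
    403: "Authentication failed: check SGAI_API_KEY.",
    429: "Rate limit or quota reached: retry later or check credits.",
}


def error_payload(status_code: int, body: str) -> dict[str, Any]:
    """Turn a failed HTTP response into an error dict with an actionable hint."""
    hint = HINTS.get(status_code, "Unexpected error from the API.")
    return {"error": f"Error {status_code}: {body}", "hint": hint}
```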

Development

Prerequisites

  • Python 3.13 or higher
  • pip or uv package manager
  • ScrapeGraph API key

Installation from Source

# Clone the repository
git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
cd scrapegraph-mcp

# Install dependencies
pip install -e ".[dev]"

# Set your API key
export SGAI_API_KEY=your-api-key

# Run the server
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server

Testing with MCP Inspector

Test your server locally using the MCP Inspector tool:

npx @modelcontextprotocol/inspector scrapegraph-mcp

This provides a web interface to test all available tools.

Code Quality

Linting:

ruff check src/

Type Checking:

mypy src/

Format Checking:

ruff format --check src/

Project Structure

scrapegraph-mcp/
├── src/
│   └── scrapegraph_mcp/
│       ├── __init__.py      # Package initialization
│       └── server.py        # Main MCP server (all code in one file)
├── .agent/                  # Developer documentation
│   ├── README.md           # Documentation index
│   └── system/             # System architecture docs
├── assets/                  # Images and badges
├── pyproject.toml          # Project metadata & dependencies
├── smithery.yaml           # Smithery deployment config
└── README.md               # This file

Contributing

We welcome contributions! Here's how you can help:

Adding a New Tool

  1. Add a method to the ScapeGraphClient class in server.py:
def new_tool(self, param: str) -> Dict[str, Any]:
    """Tool description."""
    url = f"{self.BASE_URL}/new-endpoint"
    data = {"param": param}
    response = self.client.post(url, headers=self.headers, json=data)
    if response.status_code != 200:
        raise Exception(f"Error {response.status_code}: {response.text}")
    return response.json()
  2. Add the MCP tool decorator:
@mcp.tool()
def new_tool(param: str) -> Dict[str, Any]:
    """
    Tool description for AI assistants.

    Args:
        param: Parameter description

    Returns:
        Dictionary containing results
    """
    if scrapegraph_client is None:
        return {"error": "ScrapeGraph client not initialized. Please provide an API key."}

    try:
        return scrapegraph_client.new_tool(param)
    except Exception as e:
        return {"error": str(e)}
  3. Test with MCP Inspector:
npx @modelcontextprotocol/inspector scrapegraph-mcp
  4. Update documentation

  5. Submit a pull request

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run linting and type checking
  5. Test with MCP Inspector and Claude Desktop
  6. Update documentation
  7. Commit your changes (git commit -m 'Add amazing feature')
  8. Push to the branch (git push origin feature/amazing-feature)
  9. Open a Pull Request

Code Style

  • Line length: 100 characters
  • Type hints: Required for all functions
  • Docstrings: Google-style docstrings
  • Error handling: Return error dicts, don't raise exceptions in tools
  • Python version: Target 3.13+
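
The "return error dicts" rule can be factored into a decorator rather than repeating try/except in every tool. This is a sketch of that design, not code from the repository: `returns_error_dict` and the toy `divide` tool are hypothetical, and how such a wrapper interacts with @mcp.tool() signature introspection would need to be verified before use.

```python
import functools
from typing import Any, Callable


def returns_error_dict(func: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Wrap a tool function so exceptions become {"error": ...} instead of propagating."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            return {"error": str(exc)}

    return wrapper


@returns_error_dict
def divide(a: int, b: int) -> dict[str, Any]:
    """Toy tool used only to demonstrate the wrapper."""
    return {"result": a / b}
```

Calling `divide(1, 0)` returns a dict with an "error" key rather than raising, which matches the style rule above.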

For detailed development guidelines, see the .agent documentation.

Documentation

For comprehensive developer documentation, see the .agent/ folder.

Technology Stack

Core Framework

  • Python 3.13+ - Modern Python with type hints
  • FastMCP - Lightweight MCP server framework
  • httpx 0.24.0+ - Modern async HTTP client

Development Tools

  • Ruff - Fast Python linter and formatter
  • mypy - Static type checker
  • Hatchling - Modern build backend

Deployment

  • Smithery - Automated MCP server deployment
  • Docker - Container support with Alpine Linux
  • stdio transport - Standard MCP communication

API Integration

  • ScrapeGraph AI API - Enterprise web scraping service
  • Base URL: https://v2-api.scrapegraphai.com/api
  • Authentication: API key-based

License

This project is distributed under the MIT License. For detailed terms and conditions, please refer to the LICENSE file.

Acknowledgments

Special thanks to tomekkorbak for his implementation of oura-mcp-server, which served as the starting point for this repo.

Made with ❤️ by ScrapeGraphAI Team