
Apache Doris


from apache

MCP Server for Apache Doris, an MPP-based real-time data warehouse.

<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

Doris MCP Server

Doris MCP (Model Context Protocol) Server is a backend service built with Python and FastAPI. It implements the MCP, allowing clients to interact with it through defined "Tools". It's primarily designed to connect to Apache Doris databases, potentially leveraging Large Language Models (LLMs) for tasks like converting natural language queries to SQL (NL2SQL), executing queries, and performing metadata management and analysis.
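Under the hood, MCP messages are JSON-RPC 2.0 objects, and a client invokes a tool by sending a `tools/call` request that names the tool and passes its arguments. The sketch below only builds such a message with the standard library; it does not talk to the server. The tool name `exec_query` and its `sql` argument are assumptions for illustration, so check the server's real tool list (a `tools/list` request) before relying on them.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC 2.0 matches each response to its request by id.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# Hypothetical tool name and argument, for illustration only.
print(make_tool_call("exec_query", {"sql": "SELECT 1"}))
```

The server answers with a JSON-RPC response carrying the same `id`, whose `result` holds the tool's content items.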

🚀 What's New in v0.6.0

  • 🔐 Enterprise Authentication System: Token-bound database configuration with Token, JWT, and OAuth authentication support, enabling secure multi-tenant access with granular control switches and secure defaults
  • ⚡ Immediate Database Validation: Database configurations are validated when the connection is established rather than at query time, so invalid settings are reported immediately and late-stage connection failures are eliminated
  • 🔄 Hot Reload Configuration Management: Zero-downtime configuration updates with intelligent hot reloading of tokens.json, automatic token revalidation, and comprehensive error handling with rollback mechanisms
  • 🏗️ Advanced Connection Architecture: Session caching and connection pool optimization with 60% reduction in connection overhead, intelligent pool recreation, and automatic resource management
  • 🌐 Multi-Worker Scalability: True horizontal scaling with stateless multi-worker architecture, efficient load distribution, and enterprise-grade concurrent processing capabilities
  • 🔒 Enhanced Security Framework: Comprehensive access control and SQL security validation with immediate validation, role-based permissions, and enhanced injection detection patterns
  • 🛠️ Unified Configuration System: Streamlined configuration management with proper command-line precedence, Docker compatibility improvements, and cross-platform deployment support
  • 📊 Token Management Dashboard: Complete token lifecycle management with creation, revocation, statistics, and comprehensive audit trails for enterprise token governance
  • 🌐 Web-Based Management Interface: Secure localhost-only token administration with intuitive dashboard, database binding configuration, real-time operations, and enterprise-grade access controls
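Token-bound database configuration can be pictured as a lookup from a presented token to the database settings that token is allowed to use. The sketch below assumes a hypothetical `tokens.json` layout (a `tokens` list whose entries carry `token` and `database_config` keys); the project's real file format may differ, so check the shipped `tokens.json` before relying on these field names. Tokens are compared with `hmac.compare_digest` so the comparison time does not depend on where two strings differ.

```python
import hmac
from typing import Any, Dict, Optional

# Hypothetical tokens.json content; the field names are assumptions.
TOKENS_CONFIG: Dict[str, Any] = {
    "tokens": [
        {
            "token": "analyst-token-123",
            "user_id": "analyst",
            "database_config": {"host": "10.0.0.5", "port": 9030, "user": "analyst_ro"},
        },
        {"token": "admin-token-456", "user_id": "admin", "database_config": None},
    ]
}


def resolve_database_config(
    presented: str, config: Dict[str, Any], default_db: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the DB config bound to `presented`, the default config if the
    token has no binding, or None if the token is unknown."""
    for entry in config.get("tokens", []):
        # compare_digest avoids an early-exit, timing-dependent comparison.
        if hmac.compare_digest(entry["token"].encode(), presented.encode()):
            return entry.get("database_config") or default_db
    return None
```

Falling back to the default connection for an unbound token is only one possible policy; a stricter deployment could reject such tokens instead.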

🚀 Major Milestone: v0.6.0 establishes the platform as a production-ready enterprise authentication and database management system, combining zero-downtime operations (hot reload, immediate validation, and multi-worker scaling), advanced security controls, and token-bound database configuration.
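Hot reloading with rollback can be sketched as polling the file's modification time and swapping in a new configuration only if it parses; on a parse error the previous configuration stays active. This is a minimal illustration, not the project's implementation (which may use a file watcher instead of polling), and the `HotReloadingConfig` class is hypothetical.

```python
import json
import os
from typing import Any, Dict, Optional


class HotReloadingConfig:
    """Reload a JSON file when its mtime changes; keep the last good copy on error."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self.data: Dict[str, Any] = {}  # stays empty if the first load fails
        self.last_error: Optional[str] = None
        self.reload_if_changed()

    def reload_if_changed(self) -> bool:
        """Return True if a new version of the file was loaded."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except OSError as exc:
            self.last_error = f"stat failed: {exc}"
            return False
        if mtime_ns == self._mtime_ns:
            return False  # unchanged since the last check
        try:
            with open(self.path, encoding="utf-8") as fh:
                candidate = json.load(fh)
        except (OSError, json.JSONDecodeError) as exc:
            # Roll back: keep serving the previous configuration.
            self.last_error = f"reload failed, keeping previous config: {exc}"
            self._mtime_ns = mtime_ns  # don't re-parse the same broken file every poll
            return False
        self.data = candidate
        self._mtime_ns = mtime_ns
        self.last_error = None
        return True
```

A background task would call `reload_if_changed()` every few seconds and, when it returns True, revalidate the affected tokens.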

What's Also Included from v0.5.1

  • 🔥 Critical at_eof Connection Fix: Complete elimination of connection pool errors with intelligent health monitoring and self-healing recovery
  • 🔧 Enterprise Logging System: Level-based file separation with automatic cleanup and millisecond precision timestamps
  • 📊 Advanced Data Analytics Suite: 7 enterprise-grade data governance tools including quality analysis, lineage tracking, and performance monitoring
  • 🏃‍♂️ High-Performance ADBC Integration: Apache Arrow Flight SQL support with 3-10x performance improvements for large datasets
  • ⚙️ Enhanced Configuration Management: Complete ADBC configuration system with intelligent parameter validation
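A common building block behind self-healing recovery and automatic retries is retrying a transient connection failure with exponential backoff. The sketch below is generic and not the project's code; treating `ConnectionError` as the transient error is an assumption, since a real database driver raises its own exception types.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    base_delay: float = 0.1,
) -> T:
    """Await `operation()`, retrying on `retry_on` errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% jitter
            delay = base_delay * (2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

The jitter spreads out retries from many concurrent workers so they do not hit a recovering server at the same instant.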

Core Features

  • MCP Protocol Implementation: Provides standard MCP interfaces, supporting tool calls, resource management, and prompt interactions.
  • Streamable HTTP Communication: Unified HTTP endpoint supporting both request/response and streaming communication for optimal performance and reliability.
  • Stdio Communication: Standard input/output mode for direct integration with MCP clients like Cursor.
  • Enterprise-Grade Architecture: Modular design with comprehensive functionality:
    • Tools Manager: Centralized tool registration and routing with unified interfaces (doris_mcp_server/tools/tools_manager.py)
    • Enhanced Monitoring Tools Module: Advanced memory tracking, metrics collection, and flexible BE node discovery with modular, extensible design
    • Query Information Tools: Enhanced SQL explain and profiling with configurable content truncation, file export for LLM attachments, and advanced query analytics
    • Resources Manager: Resource management and metadata exposure (doris_mcp_server/tools/resources_manager.py)
    • Prompts Manager: Intelligent prompt templates for data analysis (doris_mcp_server/tools/prompts_manager.py)
  • Advanced Database Features:
    • Query Execution: High-performance SQL execution with advanced caching and optimization, enhanced connection stability and automatic retry mechanisms (doris_mcp_server/utils/query_executor.py)
    • Security Management: Comprehensive SQL security validation with configurable blocked keywords, SQL injection protection, data masking, and unified security configuration management (doris_mcp_server/utils/security.py)
    • Metadata Extraction: Comprehensive database metadata with catalog federation support (doris_mcp_server/utils/schema_extractor.py)
    • Performance Analysis: Advanced column analysis, performance monitoring, and data analysis tools (doris_mcp_server/utils/analysis_tools.py)
  • Catalog Federation Support: Full support for multi-catalog environments (internal Doris tables and external data sources like Hive, MySQL, etc.)
  • Enterprise Security: Comprehensive security framework with authentication, authorization, SQL injection protection, and data masking capabilities with environment variable configuration support
  • Web-Based Token Management: Secure localhost-only interface for complete token lifecycle management with database binding, real-time statistics, and enterprise-grade access controls (doris_mcp_server/auth/token_handlers.py)
  • Unified Configuration Framework: Centralized configuration management through config.py with comprehensive validation, standardized parameter naming, and smart default database handling with automatic fallback to information_schema
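Blocked-keyword checking, one of the SQL security measures listed above, can be sketched as stripping string literals and comments and then matching whole words against a blocklist. This is an illustration only, not the code in `doris_mcp_server/utils/security.py`; the default blocklist here is an assumption. Keyword scanning is a coarse filter and works best alongside a least-privilege database account.

```python
import re
from typing import FrozenSet, List

# Hypothetical default blocklist; the server's real list is configurable.
DEFAULT_BLOCKED: FrozenSet[str] = frozenset(
    {"DROP", "TRUNCATE", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE", "GRANT", "REVOKE"}
)

# Literals and comments are removed first so text such as 'drop me' is not flagged.
_STRIP = re.compile(
    r"'(?:[^'\\]|\\.|'')*'"  # single-quoted strings ('' is an escaped quote)
    r'|"(?:[^"\\]|\\.)*"'  # double-quoted strings
    r"|`[^`]*`"  # backtick-quoted identifiers
    r"|--[^\n]*"  # SQL line comments
    r"|#[^\n]*"  # MySQL-style line comments
    r"|/\*.*?\*/",  # block comments
    re.DOTALL,
)


def find_blocked_keywords(sql: str, blocked: FrozenSet[str] = DEFAULT_BLOCKED) -> List[str]:
    """Return blocked keywords appearing as whole words outside literals and comments."""
    cleaned = _STRIP.sub(" ", sql)
    words = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*", cleaned)
    return sorted({w.upper() for w in words} & blocked)
```

For example, `find_blocked_keywords("select 1; DROP TABLE users")` reports `["DROP"]`, while a column named `updated_at` is not mistaken for `UPDATE`.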

Connecting with Cursor

You can connect Cursor to this MCP server using Stdio mode (recommended) or Streamable HTTP mode.

Stdio Mode

Stdio mode allows Cursor to manage the server process directly. Configuration is done within Cursor's MCP Server settings file (typically ~/.cursor/mcp.json or similar).

Method 1: Using PyPI Installation (Recommended)

Install the package from PyPI and configure Cursor to use it:

pip install doris-mcp-server

Configure Cursor: Add an entry like the following to your Cursor MCP configuration:

{
  "mcpServers": {
    "doris-stdio": {
      "command": "doris-mcp-server",
      "args": ["--transport", "stdio"],
      "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password"
      }
    }
  }
}
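If you prefer to script this step, the helper below merges the entry into an existing config file instead of overwriting servers already configured there. It is a convenience sketch, not part of the project; the config path is the one mentioned above and may differ by platform or Cursor version. Note that the database password ends up in plain text in that file.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_mcp_server(config_path: Path, name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Insert or replace one server under "mcpServers", keeping other entries intact."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


DORIS_STDIO_ENTRY = {
    "command": "doris-mcp-server",
    "args": ["--transport", "stdio"],
    "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password",
    },
}

# Usage (adjust the path for your system):
# add_mcp_server(Path.home() / ".cursor" / "mcp.json", "doris-stdio", DORIS_STDIO_ENTRY)
```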

Method 2: Using uv (Development)

If you have uv installed and want to run from source:

uv run --project /path/to/doris-mcp-server doris-mcp-server

Note: Replace /path/to/doris-mcp-server with the actual absolute path to your project directory.

Configure Cursor: Add an entry like the following to your Cursor MCP configuration:

{
  "mcpServers": {
    "doris-stdio": {
      "command": "uv",
      "args": ["run", "--project", "/path/to/doris-mcp-server", "doris-mcp-server"],
      "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password"
      }
    }
  }
}

Streamable HTTP Mode

Streamable HTTP mode requires you to run the MCP server independently first, and then configure Cursor to connect to it.

  1. Configure .env: Ensure your database credentials and any other necessary settings are correctly configured in the .env file within the project directory.

  2. Start the Server: Run the server from your terminal in the project's root directory:

    ./start_server.sh

    This script reads the .env file and starts the FastAPI server with Streamable HTTP support. Note the host and port the server is listening on (default is 0.0.0.0:3000).

  3. Configure Cursor: Add an entry like the following to your Cursor MCP configuration, pointing to the running server's Streamable HTTP endpoint:

    {
      "mcpServers": {
        "doris-http": {
          "url": "http://127.0.0.1:3000/mcp"
        }
      }
    }

    Note: Adjust the host/port if your server runs on a different address. The /mcp endpoint is the unified Streamable HTTP interface.
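For reference, a client talking to the /mcp endpoint sends each JSON-RPC message as an HTTP POST whose `Accept` header lists both `application/json` and `text/event-stream`, because the server may reply with either a single JSON body or a stream. The sketch below builds (but does not send) such a request with `urllib`; the protocol version string and client name are assumptions, so use whatever your MCP client library negotiates.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_mcp_post(
    url: str, payload: Dict[str, Any], session_id: Optional[str] = None
) -> urllib.request.Request:
    """Build a Streamable HTTP POST carrying one JSON-RPC message."""
    headers = {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Issued by the server after `initialize`; echoed on later requests.
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # assumption: match your client's version
        "capabilities": {},
        "clientInfo": {"name": "readme-sketch", "version": "0.1"},
    },
}
req = build_mcp_post("http://127.0.0.1:3000/mcp", initialize)
# urllib.request.urlopen(req) would send it to a running server.
```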

After configuring either mode in Cursor, you should be able to select the server (e.g., doris-stdio or doris-http) and use its tools.

Connecting with Kiro

Add the following to your Kiro MCP config file (~/.kiro/settings/mcp.json for global, or .kiro/settings/mcp.json for project-scoped). See the Kiro MCP documentation for more details.

{
  "mcpServers": {
    "doris-stdio": {
      "command": "doris-mcp-server",
      "args": ["--transport", "stdio"],
      "env": {
        "DORIS_HOST": "127.0.0.1",
        "DORIS_PORT": "9030",
        "DORIS_USER": "root",
        "DORIS_PASSWORD": "your_db_password"
      }
    }
  }
}

Directory Structure

doris-mcp-server/
├── doris_mcp_server/           # Main server package
│   ├── main.py                 # Main entry point and FastAPI app
│   ├── multiworker_app.py      # Multi-worker application module (New in v0.6.0)
│   ├── auth/                   # Authentication modules (New in v0.6.0)
│   │   ├── token_manager.py    # Enterprise token management with hot reload
│   │   ├── jwt_manager.py      # JWT authentication provider
│   │   ├── oauth_provider.py   # OAuth authentication provider  
│   │   ├── oauth_handlers.py   # OAuth HTTP endpoint handlers
│   │   ├── token_handlers.py   # Token management HTTP endpoints
│   │   ├── auth_middleware.py  # Authentication middleware
│   │   └── __init__.py
│   ├── tools/                  # MCP tools implementation
│   │   ├── tools_manager.py    # Centralized tools management and registration
│   │   ├── resources_manager.py # Resource management and metadata exposure
│   │   ├── prompts_manager.py  # Intelligent prompt templates for data analysis
│   │   └── __init__.py
│   ├── utils/                  # Core utility modules
│   │   ├── config.py           # Configuration management with validation
│   │   ├── db.py               # Enhanced database connection management with token binding (Enhanced in v0.6.0)
│   │   ├── query_executor.py   # High-performance SQL execution with caching
│   │   ├── security.py         # Advanced security management and authentication (Enhanced in v0.6.0)
│   │   ├── schema_extractor.py # Metadata extraction with catalog federation
│   │   ├── analysis_tools.py   # Data analysis and performance monitoring
│   │   ├── data_governance_tools.py  # Data lineage and freshness monitoring (v0.5.0)
│   │   ├── data_quality_tools.py     # Comprehensive data quality analysis (v0.5.0)
│   │   ├── data_exploration_tools.py # Advanced statistical analysis (v0.5.0)
│   │   ├── security_analytics_tools.py # Access pattern analysis (v0.5.0)
│   │   ├── dependency_analysis_tools.py # Impact analysis and dependency mapping (v0.5.0)
│   │   ├── performance_analytics_tools.py # Query optimization and capacity planning (v0.5.0)
│   │   ├── adbc_query_tools.py       # High-performance Arrow Flight SQL operations (v0.5.0)
│   │   ├── logger.py           # Logging configuration
│   │   └── __init__.py
│   └── __init__.py
├── doris_mcp_client/           # MCP client implementation
│   ├── client.py               # Unified MCP client for testing and integration
│   ├── README.md               # Client documentation
│   └── __init__.py
├── logs/                       # Log files directory
├── tokens.json                 # Token configuration file (New in v0.6.0)
├── README.md                   # This documentation
├── RELEASE_NOTES_v0.6.0.md     # Release notes for v0.6.0
├── .env.example                # Environment variables template
├── requirements.txt            # Python dependencies
├── pyproject.toml              # Project configuration and entry points
├── uv.lock                     # UV package manager lock file
├── generate_requirements.py    # Requirements generation script
├── start_server.sh             # Server startup script
└── restart_server.sh           # Server restart script

Developing New Tools

This section outlines the process for adding new MCP tools to the Doris MCP Server, based on the unified modular architecture with centralized tool management.

1. Leverage Existing Utility Modules

The server provides comprehensive utility modules for common database operations:

  • doris_mcp_server/utils/db.py: Database connection management with connection pooling and health monitoring.
  • doris_mcp_server/utils/query_executor.py: High-performance SQL execution with advanced caching, optimization, and performance monitoring.
  • doris_mcp_server/utils/schema_extractor.py: Metadata extraction with full catalog federation support.
  • doris_mcp_server/utils/security.py: Comprehensive security management, SQL validation, and data masking.
  • doris_mcp_server/utils/analysis_tools.py: Advanced data analysis and statistical tools.
  • doris_mcp_server/utils/config.py: Configuration management with validation.
  • doris_mcp_server/utils/data_governance_tools.py: Data lineage tracking and freshness monitoring (New in v0.5.0).
  • doris_mcp_server/utils/data_quality_tools.py: Comprehensive data quality analysis framework (New in v0.5.0).
  • doris_mcp_server/utils/adbc_query_tools.py: High-performance Arrow Flight SQL operations (New in v0.5.0).

2. Implement Tool Logic

Add your new tool to the DorisToolsManager class in doris_mcp_server/tools/tools_manager.py. The tools manager provides a centralized approach to tool registration and execution with unified interfaces.

Example: Adding a new analysis tool:

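# In doris_mcp_server/tools/tools_manager.py
# Module-level imports this method relies on (the real module's logger setup may differ):
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Inside class DorisToolsManager:
async def your_new_analysis_tool(self, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Your new analysis tool implementation
    
    Args:
        arguments: Tool arguments from MCP client
        
    Returns:
        List of MCP response messages
    """
    try:
        # Reuse the shared query executor (caching and retry handling)
        result = await self.query_executor.execute_sql_for_mcp(
            sql="SELECT COUNT(*) FROM your_table",
            max_rows=arguments.get("max_rows", 100)
        )
        
        # default=str keeps Decimal/date values in result rows from breaking serialization
        return [{
            "type": "text",
            "text": json.dumps(result, ensure_ascii=False, indent=2, default=str)
        }]
        
    except Exception as e:
        logger.error(f"Tool execution failed: {e}", exc_info=True)
        return [{
            "type": "text",
            "text": f"Error: {e}"
        }]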
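Because the tool method depends only on `self.query_executor`, its logic can be exercised without a live Doris cluster by injecting a stand-in. In the sketch below, `FakeQueryExecutor`, its canned result shape, and `ToolsManagerUnderTest` are all hypothetical; the real executor's signature and return format may differ.

```python
import asyncio
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class FakeQueryExecutor:
    """Hypothetical stand-in for the real query executor; records calls, returns canned rows."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    async def execute_sql_for_mcp(self, sql: str, max_rows: int = 100) -> Dict[str, Any]:
        self.calls.append({"sql": sql, "max_rows": max_rows})
        return {"success": True, "data": [{"count": 42}], "row_count": 1}


class ToolsManagerUnderTest:
    """Minimal class holding the tool method the way DorisToolsManager would."""

    def __init__(self, query_executor: Any) -> None:
        self.query_executor = query_executor

    async def your_new_analysis_tool(self, arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
        try:
            result = await self.query_executor.execute_sql_for_mcp(
                sql="SELECT COUNT(*) FROM your_table",
                max_rows=arguments.get("max_rows", 100),
            )
            text = json.dumps(result, ensure_ascii=False, indent=2, default=str)
            return [{"type": "text", "text": text}]
        except Exception as e:
            logger.error("Tool execution failed: %s", e, exc_info=True)
            return [{"type": "text", "text": f"Error: {e}"}]


executor = FakeQueryExecutor()
manager = ToolsManagerUnderTest(executor)
response = asyncio.run(manager.your_new_analysis_tool({"max_rows": 10}))
```

The same injection point makes it easy to test the error branch with an executor that raises.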

3. Register the Tool

Add your tool to the _register_tools method in the same class:

# In the _register_tools method of DorisToolsManager

@self.mcp.tool(
    name="your_new_analysis_tool",
    description="Description of your new analysis tool",
    inputSchema={
        "type": "object",
        "properties": {
            "parameter1": {
                "type": "string",
                "description": "Description of parameter1"
            },
            "parameter2": {
                "type": "integer", 
                "description": "Description of parameter2",
                "default": 100
            }
        },
        "required": ["parameter1"]
    }
)
async def your_new_analysis_tool_wrapper(arguments: Dict[str, Any]) -> List[Dict[str, Any]]:
    return await self.your_new_analysis_tool(arguments)
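In JSON Schema, `default` is an annotation: clients are not required to send it and validators do not fill it in, so the tool should not assume `parameter2` is present. A small helper that applies top-level defaults and checks `required` keys before dispatch keeps the tool body simple. The helper is hypothetical and deliberately shallow; a library such as `jsonschema` is the usual choice for full validation.

```python
from typing import Any, Dict


def apply_schema_defaults(schema: Dict[str, Any], arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Enforce top-level `required` keys and fill declared defaults (not a full validator)."""
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    merged = dict(arguments)  # don't mutate the caller's dict
    for name, spec in schema.get("properties", {}).items():
        if name not in merged and "default" in spec:
            merged[name] = spec["default"]
    return merged


# Same schema as the registration example above.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "parameter1": {"type": "string", "description": "Description of parameter1"},
        "parameter2": {"type": "integer", "description": "Description of parameter2", "default": 100},
    },
    "required": ["parameter1"],
}
```

Inside the wrapper, `arguments = apply_schema_defaults(INPUT_SCHEMA, arguments)` would run before calling the tool method.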

4. Advanced Features

For more complex tools, you can leverage the comprehensive framework:

  • Advanced Caching: Use the query executor's built-in caching for enhanced performance
  • Enterprise Security: Apply comprehensive SQL validation and data masking through the security manager
  • Intelligent Prompts: Use the prompts manager for advanced query generation
  • Resource Management: Expose metadata through the resources manager
  • Performance Monitoring: Integrate with the analysis tools for monitoring capabilities
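Data masking is typically applied to result rows before they are returned to the client. The sketch below masks values by column name; the rules and the `mask_phone`, `mask_email`, and `mask_rows` functions are hypothetical and are not the security manager's API.

```python
import re
from typing import Any, Dict, List


def mask_phone(value: str) -> str:
    """Keep the first 3 and last 4 characters: 13812345678 -> 138****5678."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def mask_email(value: str) -> str:
    """Keep the first character of the local part: alice@example.com -> a****@example.com."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "*" * len(value)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


# Hypothetical rules: (column-name pattern, masking function); first match wins.
MASKING_RULES = [
    (re.compile(r"phone|mobile", re.IGNORECASE), mask_phone),
    (re.compile(r"e?mail", re.IGNORECASE), mask_email),
]


def mask_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of `rows` with sensitive columns masked; NULLs are left as-is."""
    masked = []
    for row in rows:
        out = dict(row)
        for column, value in row.items():
            if value is None:
                continue
            for pattern, mask in MASKING_RULES:
                if pattern.search(column):
                    out[column] = mask(str(value))
                    break
        masked.append(out)
    return masked
```

Matching on column names is simple but can miss sensitive data stored under unexpected names, so real rules are usually configured per table.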

5. Testing

Test your new tool using the included MCP client:

# Using doris_mcp_client/client.py
import asyncio

from doris_mcp_client.client import DorisUnifiedMCPClient

async def test_new_tool():
    client = DorisUnifiedMCPClient()
    result = await client.call_tool("your_new_analysis_tool", {
        "parameter1": "test_value",
        "parameter2": 50
    })
    print(result)

if __name__ == "__main__":
    # The coroutine only runs when driven by an event loop
    asyncio.run(test_new_tool())

MCP Client

The project includes a unified MCP client (doris_mcp_client/) for testing and integration purposes. The client supports multiple connection modes and provides a convenient interface for interacting with the MCP server.

For detailed client documentation, see doris_mcp_client/README.md.

Contributing

Contributions are welcome via Issues or Pull Requests.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.