Local RAG

from shinpr

Privacy-first local RAG server for semantic document search without external APIs

MCP Local RAG

Local RAG for developers via MCP or CLI. Semantic search with keyword boost for exact technical terms — fully private, zero setup.

Features

Semantic search with keyword boost. Vector search runs first, then keyword matching boosts exact matches. Terms like useEffect, error codes, and class names rank higher instead of relying on semantic similarity alone.

Smart semantic chunking. Chunks documents by meaning, not character count. Uses embedding similarity to find natural topic boundaries, keeping related content together and splitting where topics change.

Quality-first result filtering. Groups results by relevance gaps instead of arbitrary top-K cutoffs, so you get fewer but more trustworthy chunks.

Runs entirely locally. No API keys, no cloud, no data leaving your machine. Works fully offline after the first model download.

Zero-friction setup. One npx command. No Docker, no Python, no servers to manage. Use it via MCP, the CLI, or both. Optional Agent Skills help AI assistants form better queries and interpret results.
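The relevance-gap grouping behind quality-first filtering (and the RAG_GROUPING option) can be pictured with a short TypeScript sketch. It is not the project's implementation: the SearchResult shape, the groupByGaps and applyGrouping helpers, and the rule that a gap counts when it exceeds the mean plus one standard deviation of all gaps are hypothetical choices made for illustration.

```typescript
// Hypothetical result shape: lower distance means more relevant.
interface SearchResult {
  id: string;
  distance: number;
}

type Grouping = "similar" | "related";

// Split a distance-sorted list wherever the jump between neighbors is
// unusually large: above mean + stdFactor * std of all jumps. This statistic
// and its default factor are illustrative assumptions.
function groupByGaps(results: SearchResult[], stdFactor = 1.0): SearchResult[][] {
  if (results.length === 0) return [];
  const gaps: number[] = [];
  for (let i = 1; i < results.length; i++) {
    gaps.push(results[i].distance - results[i - 1].distance);
  }
  if (gaps.length === 0) return [results.slice()];
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  const threshold = mean + stdFactor * Math.sqrt(variance);

  const groups: SearchResult[][] = [[results[0]]];
  for (let i = 1; i < results.length; i++) {
    if (gaps[i - 1] > threshold) groups.push([]); // a relevance gap starts a new group
    groups[groups.length - 1].push(results[i]);
  }
  return groups;
}

// "similar" keeps only the top group; "related" keeps the top two groups.
function applyGrouping(results: SearchResult[], mode: Grouping): SearchResult[] {
  const keep = mode === "similar" ? 1 : 2;
  return groupByGaps(results)
    .slice(0, keep)
    .reduce<SearchResult[]>((acc, group) => acc.concat(group), []);
}
```

With distances 0.10, 0.12, 0.14, 0.50, 0.52, 0.90, the two large jumps split the list into three groups, so similar returns the first three results and related returns five. The cutoff adapts to each result set instead of always returning a fixed K.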

Why This Exists

You want AI to search your documents—technical specs, research papers, internal docs. But most solutions send your files to external APIs.

Privacy. Your documents might contain sensitive data. This runs entirely locally.

Cost. External embedding APIs charge per use. This is free after the initial model download.

Offline. Works without internet after setup.

Code search. Pure semantic search can miss exact terms like useEffect or ERR_CONNECTION_REFUSED. Adding a keyword boost captures both meaning and exact matches.

Agent reality. In practice, many AI environments rely mainly on generic tool calling rather than full MCP integration. CLI support and Agent Skills make the same workflows available in those environments too.

Search Tuning

Adjust these environment variables for your use case:

| Variable | Default | Description |
|---|---|---|
| RAG_HYBRID_WEIGHT | 0.6 | Keyword boost factor. 0 = semantic only; higher = stronger keyword boost. |
| RAG_GROUPING | (not set) | similar for top group only, related for top 2 groups. |
| RAG_MAX_DISTANCE | (not set) | Filter out low-relevance results (e.g., 0.5). |
| RAG_MAX_FILES | (not set) | Limit results to top N files (e.g., 1 for single best file). |
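Here is one way the two optional filters could behave, sketched in TypeScript under stated assumptions: results carry a hypothetical filePath and distance, RAG_MAX_DISTANCE is treated as an inclusive upper bound on distance, and "top N files" is read as the N files that own the best-ranked hits. The helper names (readFilterOptions, applyFilters) are illustrative and not part of mcp-local-rag.

```typescript
// Hypothetical result shape; filePath identifies the source document.
interface Hit {
  filePath: string;
  distance: number; // lower = more relevant
}

interface FilterOptions {
  maxDistance?: number; // from RAG_MAX_DISTANCE
  maxFiles?: number; // from RAG_MAX_FILES
}

// Parse an optional numeric setting; unset, blank, or invalid values disable it.
function parseOptional(value: string | undefined): number | undefined {
  if (value === undefined || value.trim() === "") return undefined;
  const n = Number(value);
  return Number.isFinite(n) ? n : undefined;
}

function readFilterOptions(env: Record<string, string | undefined>): FilterOptions {
  return {
    maxDistance: parseOptional(env["RAG_MAX_DISTANCE"]),
    maxFiles: parseOptional(env["RAG_MAX_FILES"]),
  };
}

function applyFilters(hits: Hit[], opts: FilterOptions): Hit[] {
  // Best hits first, so "top N files" means the files with the best hits.
  let out = hits.slice().sort((a, b) => a.distance - b.distance);
  const { maxDistance, maxFiles } = opts;
  if (maxDistance !== undefined) {
    out = out.filter((h) => h.distance <= maxDistance);
  }
  if (maxFiles !== undefined) {
    const allowed = new Set<string>();
    for (const h of out) {
      if (allowed.size >= maxFiles) break;
      allowed.add(h.filePath);
    }
    out = out.filter((h) => allowed.has(h.filePath));
  }
  return out;
}
```

In a Node process you would pass process.env to readFilterOptions. Leaving both variables unset returns every hit, only re-sorted by distance.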

Code-focused tuning

For codebases and API specs, increase keyword boost so exact identifiers (useEffect, ERR_*, class names) dominate ranking:

"env": {
 "RAG_HYBRID_WEIGHT": "0.7",
 "RAG_GROUPING": "similar"
}

Suggested RAG_HYBRID_WEIGHT values:

  • 0.7: balanced semantic + keyword

  • 1.0: aggressive; exact matches strongly rerank results

Keyword boost is applied after semantic filtering, so it improves precision without surfacing unrelated matches.
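To make RAG_HYBRID_WEIGHT concrete, here is a hypothetical reranking formula in TypeScript. It assumes the boost divides each candidate's semantic distance by 1 + weight * (share of query terms found verbatim in the chunk); the actual scoring in mcp-local-rag may differ. Because it only reorders candidates that semantic search already returned, it reflects the point above: keyword matches cannot pull in unrelated chunks.

```typescript
// Hypothetical candidate shape: a chunk already returned by semantic search.
interface Candidate {
  text: string;
  distance: number; // lower = more relevant
}

// Identifier-friendly tokenizer: underscores are kept, so ERR_CONNECTION_REFUSED
// stays a single token. Lower-casing is an illustrative choice.
function tokenize(s: string): string[] {
  return (s.match(/[A-Za-z0-9_]+/g) ?? []).map((t) => t.toLowerCase());
}

// Rerank candidates: distance is divided by 1 + weight * (share of query terms
// found in the chunk). weight = 0 leaves every distance unchanged.
function keywordBoost(query: string, candidates: Candidate[], weight: number): Candidate[] {
  const terms = Array.from(new Set(tokenize(query)));
  return candidates
    .map((c) => {
      const chunkTokens = new Set(tokenize(c.text));
      const matched = terms.filter((t) => chunkTokens.has(t)).length;
      const share = terms.length > 0 ? matched / terms.length : 0;
      return { ...c, distance: c.distance / (1 + weight * share) };
    })
    .sort((a, b) => a.distance - b.distance);
}
```

With the default weight of 0.6, a chunk at distance 0.35 that contains both useEffect and cleanup drops to about 0.219 and overtakes a 0.30 chunk with no exact matches; with weight 0 the semantic order holds.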

How It Works

TL;DR:

  • Documents are chunked by semantic similarity, not fixed character counts

  • Each chunk is embedded locally using Transformers.js

  • Search uses semantic similarity with keyword boost for exact matches

  • Results are filtered based on relevance gaps, not raw scores

Details

When you ingest a document, the parser extracts text based on file type (PDF via mupdf, DOCX via mammoth, text files directly).

The semantic chunker splits text into sentences, then groups them using embedding similarity. It finds natural topic boundaries where the meaning shifts—keeping related content together instead of cutting at arbitrary character limits. This produces chunks that are coherent units of meaning, typically 500-1000 characters. Markdown code blocks are kept intact—never split mid-block—preserving copy-pastable code in search results.
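The chunking idea can be sketched in TypeScript as follows. All of these are assumptions for illustration: sentences are split with a naive punctuation regex, embed is a synchronous stand-in for the embedding model (the real Transformers.js model runs asynchronously), a topic boundary is any adjacent pair whose cosine similarity falls below minSimilarity, and fenced code blocks are treated as single units so they are never split. The names and thresholds are hypothetical, not the project's defaults.

```typescript
// Hypothetical stand-in for an embedding model; a real model (such as the
// Transformers.js one) would be asynchronous.
type Embed = (text: string) => number[];

const FENCE = "`".repeat(3); // Markdown code-fence marker

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Break text into units: each fenced code block stays whole (assumes fences
// are balanced); prose is split into sentences with a naive regex.
function toUnits(text: string): string[] {
  const units: string[] = [];
  text.split(FENCE).forEach((part, i) => {
    if (i % 2 === 1) {
      units.push(FENCE + part + FENCE); // odd parts sit between fences
      return;
    }
    const sentences = part.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
    for (const s of sentences) {
      if (s.trim() !== "") units.push(s.trim());
    }
  });
  return units;
}

// Greedily grow a chunk; close it when the next unit's similarity to the
// previous one drops below minSimilarity (a topic shift) or when adding the
// unit would exceed maxChars.
function semanticChunks(
  text: string,
  embed: Embed,
  minSimilarity = 0.5,
  maxChars = 1000,
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let prevVec: number[] | null = null;
  for (const unit of toUnits(text)) {
    const vec = embed(unit);
    const topicShift = prevVec !== null && cosine(prevVec, vec) < minSimilarity;
    const tooLong = current.join("\n").length + 1 + unit.length > maxChars;
    if (current.length > 0 && (topicShift || tooLong)) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(unit);
    prevVec = vec;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Boundaries here come from similarity drops, with maxChars acting only as a safety cap; a code block longer than the cap still stays in one chunk.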

Each chunk goes through a Transformers.js embedding model (default: all-MiniLM-L6-v2, configurable via MODEL_NAME), converting text into vectors. Vectors are stored in LanceDB, a file-based vector database requiring no server process.
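Since RAG_MAX_DISTANCE is a threshold in this vector space, it can help to relate distances to cosine similarity. A sketch in TypeScript, assuming unit-normalized embeddings (all-MiniLM-L6-v2 is commonly used with normalization) and Euclidean (L2) distance; which metric the index actually uses is not stated here. For unit vectors the identity ||a - b||^2 = 2 - 2 cos(a, b) holds, so one quantity converts directly into the other.

```typescript
// Scale a vector to unit length (assumed to match how embeddings are stored).
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Cosine similarity of two unit vectors is just their dot product.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(normalize(a), normalize(b));
}

function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

// For unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b).
function cosineFromL2(d: number): number {
  return 1 - (d * d) / 2;
}

function l2FromCosine(c: number): number {
  return Math.sqrt(Math.max(0, 2 - 2 * c)); // clamp guards tiny negative rounding
}
```

Under these assumptions, an L2 cutoff of 0.5 corresponds to cosine similarity 1 - 0.25 / 2 = 0.875; if the store reports squared L2 instead, the same cutoff corresponds to 0.75.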

When you search:

  • Your query becomes a vector using the same model

  • Semantic (vector) search finds the most relevant chunks

  • Quality filters apply (distance threshold, grouping)

  • Keyword matches boost rankings for exact term matching
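The vector-search step can be illustrated with a brute-force nearest-neighbor search in TypeScript. This is a stand-in, not LanceDB's API: StoredChunk, vectorSearch, and the use of Euclidean distance are assumptions chosen to show what "finds the most relevant chunks" means before any distance filter, grouping, or keyword boost is applied.

```typescript
// Hypothetical stored record: one embedded chunk.
interface StoredChunk {
  id: string;
  vector: number[];
}

interface ScoredChunk {
  id: string;
  distance: number; // lower = closer to the query
}

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Brute-force k-nearest-neighbor search: score every chunk, keep the k closest.
// A vector database such as LanceDB does this job at scale; this is only a stand-in.
function vectorSearch(queryVector: number[], store: StoredChunk[], k: number): ScoredChunk[] {
  return store
    .map((chunk) => ({ id: chunk.id, distance: euclidean(queryVector, chunk.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

Scanning every vector is fine for a handful of documents; a vector database exists so this step stays fast as the collection grows.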

The keyword boost ensures exact terms like useEffect or error codes rank higher when they match.

Agent Skills

Agent Skills provide optimized prompts that help AI assistants use RAG tools more effectively. Install skills for better query formulation, result interpretation, and ingestion workflows:

# Claude Code (project-level)
npx mcp-local-rag skills install --claude-code

# Claude Code (user-level)
npx mcp-local-rag skills install --claude-code --global

# Codex
npx mcp-local-rag skills install --codex

Skills include:

  • Query optimization: Better search query formulation

  • Result interpretation: Score thresholds and filtering guidelines

  • HTML ingestion: Format selection and source naming

Ensuring Skill Activation

Skills are loaded automatically in most cases—AI assistants scan skill metadata and load relevant instructions when needed. For consistent behavior:

Option 1: Explicit request (natural language). Before RAG operations, ask in natural language, for example:

  • "Use the mcp-local-rag skill for this search"

  • "Apply RAG best practices from skills"

Option 2: Add to your agent instruction file. Add the following to your AGENTS.md, CLAUDE.md, or other agent instruction file:

When using query_documents, ingest_file, or ingest_data tools,
apply the mcp-local-rag skill for better query formulation and result interpretation.

Contributing

Contributions welcome! See CONTRIBUTING.md for setup and guidelines.

License

MIT License. Free for personal and commercial use.

Acknowledgments

Built with Model Context Protocol by Anthropic, LanceDB, and Transformers.js.