YourMemory

from sachitrafa

Persistent memory for AI agents with Ebbinghaus forgetting curve decay, hybrid BM25 + vector + knowledge graph retrieval, temporal reasoning, and a local dashboard. 89.4% Recall@5 on LongMemEval.

Persistent memory for AI agents, built on the science of how humans remember.

What Is YourMemory?

Every session, your AI assistant starts from zero. It asks the same questions, forgets your preferences, re-learns your stack. There is no memory between conversations.

YourMemory fixes that with a one-command install that plugs into Claude, Cursor, Cline, Windsurf, or any MCP client. It gives your AI a persistent memory layer modelled on human cognition:

  • Things that matter stick: an importance score controls how quickly a memory decays

  • Outdated facts get replaced: subject-aware deduplication merges or supersedes memories automatically

  • Related context surfaces together: an entity graph links memories that share people, places, or concepts

  • Old memories fade naturally: an Ebbinghaus forgetting curve prunes stale context every 24 hours

Zero infrastructure required. DuckDB by default, PostgreSQL for teams.

Table of Contents

  • Benchmarks

  • Quick Start

  • Memory Dashboard

  • Ask Without Calling the API

  • API Proxy: Guaranteed Memory

  • MCP Tools

  • How It Works

  • Multi-Agent Memory

  • Stack

  • Architecture

  • Troubleshooting

  • Contributing

Benchmarks

Three external datasets, all scripts open source and reproducible. Full methodology in BENCHMARKS.md.

LongMemEval-S: 500 questions, ~53 distractor sessions each

The hardest standard benchmark for long-term memory systems. Each question is backed by ~53 conversation sessions; the model must retrieve the right one(s) from the haystack.

| Metric | Score |
| --- | --- |
| Recall@5 (any gold session in top-5) | 89.4% |
| Recall-all@5 (all gold sessions in top-5) | 84.8% |
| nDCG@5 (ranking quality) | 87.4% |

By question type (Recall@5):

| Question Type | Recall@5 | n |
| --- | --- | --- |
| single-session-assistant | 98.2% | 56 |
| knowledge-update | 96.2% | 78 |
| multi-session | 95.5% | 133 |
| single-session-preference | 90.0% | 30 |
| temporal-reasoning | 84.2% | 133 |
| single-session-user | 72.9% | 70 |
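The three metrics above can be sketched in a few lines. This is an illustration, not the benchmark script: the function names are hypothetical, and nDCG is assumed to use binary relevance (a session is either gold or not), which the text does not specify.

```python
import math

def recall_any_at_k(ranked_ids, gold_ids, k=5):
    """1.0 if at least one gold session appears in the top-k, else 0.0."""
    return 1.0 if set(ranked_ids[:k]) & set(gold_ids) else 0.0

def recall_all_at_k(ranked_ids, gold_ids, k=5):
    """1.0 only if every gold session appears in the top-k."""
    return 1.0 if set(gold_ids) <= set(ranked_ids[:k]) else 0.0

def ndcg_at_k(ranked_ids, gold_ids, k=5):
    """Binary-relevance nDCG (assumed): DCG of the ranking over the ideal DCG."""
    gold = set(gold_ids)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, sid in enumerate(ranked_ids[:k]) if sid in gold)
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Made-up ranking: one of the two gold sessions lands in the top-5.
ranked = ["s7", "s2", "s9", "s4", "s1", "s3"]
gold = ["s2", "s3"]
print(recall_any_at_k(ranked, gold))  # 1.0
print(recall_all_at_k(ranked, gold))  # 0.0
```

Averaging each function over all questions gives the per-benchmark score; the gap between Recall@5 and Recall-all@5 comes from questions like this one, where only some of the gold sessions are retrieved.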

LoCoMo-10: 1,534 QA pairs across 10 multi-session conversations

Conversations spanning weeks to months. Every system ingests the same session summaries in the same order.

| System | Recall@5 | 95% CI |
| --- | --- | --- |
| YourMemory (BM25 + vector + graph + decay) | 59% | 56–61% |
| Zep Cloud | 28% | 26–30% |
| Supermemory | 31%* | 28–33% |
| Mem0 | 18%* | 16–20% |

Roughly 2× the recall of Zep Cloud (59% vs 28%) across all 10 samples.

* Supermemory and Mem0 exhausted their free-tier quotas mid-benchmark; their scores are computed over the full 1,534 pairs, counting unfinished samples as 0.

HotpotQA: 200 multi-hop questions requiring two facts from different articles

| System | BOTH_FOUND@5 |
| --- | --- |
| YourMemory (vector + BM25 + entity graph) | 71.5% |
| YourMemory (no entity edges) | 59.5% |

Entity graph edges add +12 pp: they traverse from Fact 1 to Fact 2 even when Fact 2 has low embedding similarity to the query.

Writeup: I built memory decay for AI agents using the Ebbinghaus forgetting curve

Memory Dashboard

Two built-in browser UIs start automatically with the MCP server; no extra setup is needed.

Memory Browser: http://localhost:3033/ui

A full read/write view of everything stored in memory.

| What you see | Details |
| --- | --- |
| Stats bar | Total · Strong ≥50% · Fading 5–50% · Near prune <10% |
| Agent tabs | All / User / per-agent views |
| Memory cards | Content · strength bar · category · recall count · last accessed |
| Filters | Category (fact / strategy / assumption / failure) · Sort by strength, recency, recall |

Pass ?user=<id> to pre-load a specific user: http://localhost:3033/ui?user=sachit

Graph Visualiser: http://localhost:3033/graph

An interactive force-directed map of how memories connect.

http://localhost:3033/graph?memoryId=42&userId=sachit&depth=2

  • The root memory appears as a larger cyan node; neighbours are color-coded by category

  • Edge thickness = connection strength

  • Click any node for full content; drag, zoom, reposition freely
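If you open the visualiser from a script, the URL can be assembled with the standard library. A small sketch: the helper name `graph_url` is hypothetical, and the query parameters are the ones shown in the example URL above.

```python
from urllib.parse import urlencode

def graph_url(memory_id, user_id, depth=2, base="http://localhost:3033/graph"):
    """Build a Graph Visualiser URL rooted at one memory."""
    params = {"memoryId": memory_id, "userId": user_id, "depth": depth}
    return f"{base}?{urlencode(params)}"

print(graph_url(42, "sachit"))
# http://localhost:3033/graph?memoryId=42&userId=sachit&depth=2
```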

Ask Without Calling the API

The only memory system that can answer questions without making any LLM API call.

yourmemory ask "what database does this project use"
# → YourMemory uses DuckDB locally and Postgres in production.

yourmemory ask "what port does the dashboard run on"
# → 3033

yourmemory ask "how do I fix a kubernetes deployment"
# → Not enough memory context to answer without Claude.

When memory is strong enough, it answers locally: zero tokens, zero cloud cost, and no network round trip. When it isn't, it declines cleanly rather than hallucinating.

| Query | Mem0 / Zep / LangMem | YourMemory |
| --- | --- | --- |
| "What port does the server run on?" | Full LLM API call | Instant, $0 |
| "What database does this project use?" | Full LLM API call | Instant, $0 |
| "How do I fix a k8s deployment?" | Full LLM API call | Declines → Claude |
| Privacy | Query sent to cloud | Never leaves your machine |

API Proxy: Guaranteed Memory

MCP tools are called at the AI's discretion. The API proxy removes that uncertainty: it intercepts every LLM call, injects relevant memories automatically, and handles store_memory / update_memory without any model configuration.

Start the YourMemory server (yourmemory), then point your LLM client at localhost:3033:

OpenAI

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:3033/proxy/openai",
)

# Memory is injected automatically; no other changes needed
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What database do I use?"}],
)

Anthropic

from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="http://localhost:3033/proxy/anthropic",
)

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What database do I use?"}],
)

Per-user memory

Pass X-YourMemory-User to isolate memory per person:

from openai import OpenAI

# Every request from this client carries the user ID header
client = OpenAI(
    api_key="sk-...",
    base_url="http://localhost:3033/proxy/openai",
    default_headers={"X-YourMemory-User": "sachit"},
)

How it works

On every request the proxy:

  • Recalls relevant memories and injects them into the system prompt (guaranteed, no tool call needed)

  • Adds store_memory and update_memory as tools, which the model calls when it learns something new

  • Executes those tool calls locally and returns the final response transparently

Streaming note: recall injection works for all requests. Tool call interception (store/update) works for non-streaming requests only; streaming responses pass through, and the tools execute on the next turn.
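The injection step can be pictured as a small transformation of the request before it is forwarded. The sketch below is an assumption about the shape of that step, not the proxy's actual code: `inject_memories` and the "Relevant memories:" header are hypothetical, and it handles only OpenAI-style message lists (the Anthropic API takes the system prompt as a separate parameter).

```python
def inject_memories(messages, memories, header="Relevant memories:"):
    """Return a new message list with recalled memories placed in the system prompt."""
    if not memories:
        return list(messages)
    block = header + "\n" + "\n".join(f"- {m}" for m in memories)
    out = [dict(m) for m in messages]  # copy so the caller's list is untouched
    for msg in out:
        if msg.get("role") == "system":
            # Prepend to an existing system prompt
            msg["content"] = f"{block}\n\n{msg['content']}"
            return out
    # No system prompt yet: add one at the front
    return [{"role": "system", "content": block}] + out

request = [{"role": "user", "content": "What database do I use?"}]
forwarded = inject_memories(request, ["YourMemory uses DuckDB locally"])
print(forwarded[0]["role"])  # system
```

Because the memories travel in the system prompt, the model sees them on every turn without having to decide to call a tool.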

MCP Tools

Three tools, called by your AI automatically.

| Tool | When your AI calls it | What it does |
| --- | --- | --- |
| recall_memory(query, current_path?) | Start of every task | Surfaces memories ranked by similarity × decay strength; spatial boost for path-matched memories |
| store_memory(content, importance, category?, context_paths?) | After learning something new | Embeds, deduplicates, stores with decay; tags optional file/dir paths |
| update_memory(id, new_content, importance) | When a stored fact is outdated | Re-embeds and replaces; logs old content to audit trail |

# Store with spatial context
store_memory(
    "Sachit prefers tabs over spaces in Python",
    importance=0.9,
    category="fact",
    context_paths=["/projects/backend"],
)

# Next session: spatial boost fires when working in that directory
recall_memory("Python formatting", current_path="/projects/backend")
# → {"content": "Sachit prefers tabs over spaces in Python", "strength": 0.87}

Memory categories control decay rate

| Category | Half-life | Best for |
| --- | --- | --- |
| strategy | ~38 days | Patterns that worked, architectural decisions |
| fact | ~24 days | Preferences, identity, stable knowledge |
| assumption | ~19 days | Inferred context, uncertain beliefs |
| failure | ~11 days | Errors, wrong approaches, environment-specific issues |
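To get a feel for these half-lives, the sketch below converts each one into an exponential decay constant (λ = ln 2 / half-life). It assumes the listed values describe the effective decay rate with no recalls in between; the table does not say at what importance they were measured, and the names here are hypothetical.

```python
import math

# Approximate half-lives from the table above, in active days
HALF_LIFE_DAYS = {"strategy": 38, "fact": 24, "assumption": 19, "failure": 11}

def decay_constant(half_life_days):
    """Decay constant of an exponential whose value halves every half_life_days."""
    return math.log(2) / half_life_days

def remaining_fraction(category, active_days):
    """Fraction of the initial strength left after active_days (no recalls assumed)."""
    return math.exp(-decay_constant(HALF_LIFE_DAYS[category]) * active_days)

for category in HALF_LIFE_DAYS:
    print(category, round(remaining_fraction(category, 30), 2))
```

After 30 active days a strategy memory retains more than half its strength while a failure memory is down to roughly 15%, which matches the intent of letting environment-specific errors fade fastest.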

How It Works

Ebbinghaus Forgetting Curve

Memory strength decays exponentially. Importance and recall frequency slow that decay:

effective_λ = base_λ × (1 − importance × 0.8)
strength = clamp(importance × e^(−effective_λ × active_days) × (1 + recall_count × 0.2), 0, 1)
hybrid_score = 0.4 × bm25_norm + 0.6 × cosine_similarity

active_days counts only days the user was active, so vacations don't cause memory loss. Memories below strength 0.05 are pruned automatically every 24 hours.
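A direct transcription of the strength formula, as a minimal sketch. The constants 0.8, 0.2, and 0.05 come from the text; `base_lambda` is left as a required argument because its value is not given here, and the 0.1 used in the example is made up.

```python
import math

PRUNE_THRESHOLD = 0.05  # from the text: weaker memories are pruned

def memory_strength(importance, active_days, recall_count, base_lambda):
    """Strength in [0, 1] following the formula above."""
    # Higher importance shrinks the decay rate (by up to 80%)
    effective_lambda = base_lambda * (1 - importance * 0.8)
    raw = (importance
           * math.exp(-effective_lambda * active_days)
           * (1 + recall_count * 0.2))  # each recall adds 20% to the strength
    return max(0.0, min(1.0, raw))

def should_prune(strength):
    return strength < PRUNE_THRESHOLD

# base_lambda=0.1 is an illustrative value, not the project's setting
important = memory_strength(0.9, active_days=30, recall_count=0, base_lambda=0.1)
trivial = memory_strength(0.3, active_days=30, recall_count=0, base_lambda=0.1)
print(important > trivial)  # True
```

Importance acts twice: it sets the starting strength and it slows the decay, so a high-importance memory both starts higher and falls more slowly.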

Session wrap-up: recalled memory IDs are tracked per session. When a session goes idle (30 min default), those memories get a recall_count boost. Set YOURMEMORY_SESSION_IDLE to change the window.
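One way to picture the wrap-up is a tracker that collects recalled IDs per session and applies the boost once the session has been idle long enough. This is a hedged sketch: the class and method names are hypothetical, sessions are keyed by user for simplicity, and a boost of +1 per memory is an assumption (the text does not state the amount).

```python
import time

class SessionTracker:
    def __init__(self, idle_seconds=30 * 60):  # 30 min default, per the text
        self.idle_seconds = idle_seconds
        self.sessions = {}  # user_id -> (last_seen, set of recalled memory ids)

    def record_recall(self, user_id, memory_ids, now=None):
        now = time.time() if now is None else now
        _, recalled = self.sessions.get(user_id, (now, set()))
        self.sessions[user_id] = (now, recalled | set(memory_ids))

    def close_idle(self, recall_counts, now=None):
        """Bump recall_count for memories recalled in sessions that went idle."""
        now = time.time() if now is None else now
        for user_id, (last_seen, recalled) in list(self.sessions.items()):
            if now - last_seen >= self.idle_seconds:
                for memory_id in recalled:
                    # Assumed boost of +1 per recalled memory
                    recall_counts[memory_id] = recall_counts.get(memory_id, 0) + 1
                del self.sessions[user_id]

tracker = SessionTracker()
counts = {}
tracker.record_recall("sachit", [1, 2], now=0)
tracker.record_recall("sachit", [2, 3], now=100)
tracker.close_idle(counts, now=1000)        # still active: nothing happens
tracker.close_idle(counts, now=100 + 1800)  # idle for 30 min: boost applied
print(counts)  # {1: 1, 2: 1, 3: 1}
```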

Recall throttling: identical (user, query) pairs are cached within a configurable window. Set YOURMEMORY_RECALL_COOLDOWN (seconds, default 0 = off).
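The throttle behaves like a small time-limited cache in front of retrieval. The sketch below assumes that shape; `RecallCache` and `get_or_compute` are hypothetical names, and only the (user, query) key and the "0 = off" default come from the text.

```python
import time

class RecallCache:
    def __init__(self, cooldown_seconds=0):  # 0 disables caching, per the text
        self.cooldown = cooldown_seconds
        self._entries = {}  # (user_id, query) -> (stored_at, result)

    def get_or_compute(self, user_id, query, compute, now=None):
        now = time.time() if now is None else now
        key = (user_id, query)
        hit = self._entries.get(key)
        if self.cooldown > 0 and hit and now - hit[0] < self.cooldown:
            return hit[1]  # identical query inside the window: reuse the result
        result = compute(query)
        if self.cooldown > 0:
            self._entries[key] = (now, result)
        return result

calls = []
def fake_recall(query):
    """Stand-in for the real retrieval; records each invocation."""
    calls.append(query)
    return [f"memory about {query}"]

cache = RecallCache(cooldown_seconds=60)
cache.get_or_compute("sachit", "python", fake_recall, now=0)
cache.get_or_compute("sachit", "python", fake_recall, now=30)  # served from cache
cache.get_or_compute("sachit", "python", fake_recall, now=61)  # window expired
print(len(calls))  # 2
```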

Hybrid Retrieval: Vector + BM25 + Entity Graph

Retrieval runs in two rounds:

Round 1 (hybrid search): cosine similarity + BM25 keyword scoring; returns the top-k candidates above the threshold.

Round 2 (graph expansion): BFS traversal from the Round 1 seeds surfaces memories that share context but not vocabulary, connected via semantic or entity edges.

recall("Python backend")
  Round 1 → [1] Python/MongoDB (sim=0.61)
            [2] DuckDB/spaCy (sim=0.19)
  Round 2 → [5] Docker/Kubernetes (sim=0.29, below cut-off; surfaced via shared entity "backend")
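The two rounds can be sketched with the hybrid_score weights given earlier (0.4 for BM25, 0.6 for cosine) and a plain breadth-first search. Everything else is an assumption for illustration: the function names, the 0.2 threshold, the depth limit, and the BM25 values in the example, which are made up so that the result mirrors the trace above.

```python
from collections import deque

def hybrid_score(bm25_norm, cosine_similarity):
    return 0.4 * bm25_norm + 0.6 * cosine_similarity

def expand_via_graph(seeds, edges, max_depth=2):
    """BFS from the Round 1 seeds; returns newly reached memory ids in visit order."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found

def recall(candidates, edges, threshold=0.2, top_k=5):
    """candidates maps memory id -> (bm25_norm, cosine_similarity)."""
    scored = {mid: hybrid_score(b, c) for mid, (b, c) in candidates.items()}
    round1 = sorted((m for m, s in scored.items() if s >= threshold),
                    key=lambda m: scored[m], reverse=True)[:top_k]
    round2 = expand_via_graph(round1, edges)
    return round1, round2

# Cosine values from the trace above; BM25 values are invented for the example
candidates = {1: (0.8, 0.61), 2: (0.5, 0.19), 5: (0.0, 0.29)}
edges = {1: [5], 5: [1]}  # memories 1 and 5 share the entity "backend"
print(recall(candidates, edges))  # ([1, 2], [5])
```

Memory 5 misses the Round 1 cut because it has no keyword overlap with the query, yet it is still returned because it sits one hop from a seed.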

Chain-aware pruning: a decayed memory is kept alive if any graph neighbour is above the prune threshold. Related memories age together.
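A minimal sketch of the chain-aware rule, assuming that only direct neighbours count and that a neighbour at or above the 0.05 threshold is enough to keep a memory alive. The function name and the data layout are hypothetical.

```python
def prunable(strengths, edges, threshold=0.05):
    """Ids of memories below the threshold with no neighbour at or above it."""
    doomed = []
    for memory_id, strength in strengths.items():
        if strength >= threshold:
            continue  # still strong enough on its own
        neighbours = edges.get(memory_id, ())
        if any(strengths.get(n, 0.0) >= threshold for n in neighbours):
            continue  # kept alive by a stronger direct neighbour
        doomed.append(memory_id)
    return doomed

strengths = {1: 0.60, 2: 0.03, 3: 0.02, 4: 0.01}
edges = {1: [2], 2: [1, 3], 3: [2], 4: []}
print(prunable(strengths, edges))  # [3, 4]
```

Memory 2 survives because it is linked to memory 1; memory 3 does not, since its only neighbour is itself below the threshold.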

Subject-Aware Deduplication

Before storing, YourMemory checks whether the new memory is about the same entity as the nearest existing one:

"Sachit uses DuckDB"      vs  "YourMemory uses DuckDB"
 subject: Sachit               subject: YourMemory
 → different entities → stored separately ✓

"YourMemory uses DuckDB"  vs  "YourMemory stores data in DuckDB"
 subject: YourMemory           subject: YourMemory
 → same entity → merged ✓

Subject comparison embeds the first two tokens of each sentence, so there are no hardcoded word lists and the check generalises to any language.
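The check can be sketched as "embed the first two tokens of each sentence, then compare". To keep the example self-contained, it swaps in a toy embedder based on character trigrams; this is NOT the sentence-transformers model the project uses, and the 0.5 threshold and all function names are assumptions.

```python
import math
from collections import Counter

def subject_of(sentence, n_tokens=2):
    """First two whitespace tokens, lower-cased."""
    return " ".join(sentence.split()[:n_tokens]).lower()

def toy_embed(text):
    """Stand-in embedder: character-trigram counts (not the real model)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def same_subject(new, existing, embed=toy_embed, threshold=0.5):
    """True if both sentences appear to be about the same entity."""
    return cosine(embed(subject_of(new)), embed(subject_of(existing))) >= threshold

print(same_subject("Sachit uses DuckDB", "YourMemory uses DuckDB"))                # False
print(same_subject("YourMemory uses DuckDB", "YourMemory stores data in DuckDB"))  # True
```

Passing a real embedding function as `embed` keeps the same structure while replacing the trigram shortcut.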

Multi-Agent Memory

Multiple agents can share one YourMemory instance, each with isolated private memories and controlled access to shared context.

from src.services.api_keys import register_agent

result = register_agent(
    agent_id="coding-agent",
    user_id="sachit",
    can_read=["shared", "private"],
    can_write=["shared", "private"],
)
# → result["api_key"] is ym_xxxx (shown once only)

# Agent stores a private failure memory
store_memory(
    "Staging uses self-signed cert; skip SSL verify",
    importance=0.7, category="failure",
    api_key="ym_xxxx", visibility="private",
)

# Recalls shared + its own private memories; other agents see shared only
recall_memory("staging SSL", api_key="ym_xxxx")
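The visibility rule described in the last comment can be expressed as a filter applied before ranking. This is an illustrative sketch, not the project's access-control code: the `Memory` dataclass and `visible_to` are hypothetical, and only the "shared"/"private" values and the can_read list come from the example above.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    agent_id: str
    visibility: str  # "shared" or "private"

def visible_to(agent_id, can_read, memories):
    """Shared memories plus the agent's own private ones, subject to can_read."""
    visible = []
    for memory in memories:
        if memory.visibility == "shared" and "shared" in can_read:
            visible.append(memory)
        elif (memory.visibility == "private" and "private" in can_read
              and memory.agent_id == agent_id):
            visible.append(memory)
    return visible

store = [
    Memory("Staging uses self-signed cert", "coding-agent", "private"),
    Memory("Deploys happen on Fridays", "ops-agent", "shared"),
    Memory("Ops pager rotation", "ops-agent", "private"),
]
seen = visible_to("coding-agent", ["shared", "private"], store)
print([m.content for m in seen])
# ['Staging uses self-signed cert', 'Deploys happen on Fridays']
```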

Stack

| Component | Role |
| --- | --- |
| DuckDB | Default vector DB: zero setup, native cosine similarity |
| NetworkX | Default graph backend; persists at ~/.yourmemory/graph.pkl |
| sentence-transformers | Local embeddings (multi-qa-mpnet-base-dot-v1, 768 dims) |
| spaCy | Local NLP for deduplication and entity extraction |
| APScheduler | Automatic 24h decay and pruning job |
| PostgreSQL + pgvector | Optional, for teams or large datasets |
| Neo4j | Optional graph backend |

Architecture

Claude / Cline / Cursor / Any MCP client
    │
    ├── recall_memory(query, current_path?, api_key?)
    │     └── throttle check → embed → hybrid search (Round 1)
    │           → graph BFS expansion (Round 2)
    │           → score = sim × strength
    │           → spatial boost (+0.08) if current_path matches context_paths
    │           → temporal boost (+0.25) if query has time window expression
    │           → session tracking → recall_count bump on session end
    │
    ├── store_memory(content, importance, category?, context_paths?, api_key?)
    │     └── question? → reject
    │           subject-aware dedup → same entity? merge/reinforce : new
    │           embed() → INSERT → index_memory() → graph node + edges
    │           record_activity(user_id) → active days log
    │
    └── update_memory(id, new_content, importance)
          └── log old content → memory_history (audit trail)
                embed(new_content) → UPDATE → refresh graph node

    Vector DB (Round 1)              Graph DB (Round 2)
    DuckDB (default)                 NetworkX (default)
    memories.duckdb                  graph.pkl
    ├── embedding FLOAT[768]         ├── nodes: memory_id, strength
    ├── importance FLOAT             └── edges: sim × verb_weight ≥ 0.4
    ├── recall_count INTEGER
    ├── context_paths JSON           Neo4j (opt-in)
    ├── created_at TIMESTAMP         └── bolt://localhost:7687
    ├── visibility VARCHAR
    ├── agent_id VARCHAR
    user_activity (active days log)
    memory_history (supersession audit)
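The scoring line of the recall path can be sketched as follows. The base score (sim × strength) and the two boost values come from the diagram; how a path "matches" is not specified, so the prefix rule below is an assumption, as are the function names and the idea that both boosts are simply added.

```python
from pathlib import PurePosixPath

SPATIAL_BOOST = 0.08   # from the diagram
TEMPORAL_BOOST = 0.25  # from the diagram

def path_matches(current_path, context_paths):
    """Assumed rule: current_path equals a tagged path or lies beneath it."""
    if not current_path:
        return False
    current = PurePosixPath(current_path)
    return any(current == PurePosixPath(p) or PurePosixPath(p) in current.parents
               for p in context_paths)

def final_score(sim, strength, current_path=None, context_paths=(),
                has_time_window=False):
    score = sim * strength
    if path_matches(current_path, context_paths):
        score += SPATIAL_BOOST
    if has_time_window:
        score += TEMPORAL_BOOST
    return score

base = final_score(0.5, 0.8)
boosted = final_score(0.5, 0.8, current_path="/projects/backend/api",
                      context_paths=["/projects/backend"])
print(round(base, 2), round(boosted, 2))  # 0.4 0.48
```

Comparing whole path components rather than raw string prefixes avoids a false match between /projects/backend and /projects/backend2.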

Contributing

PRs are welcome. See CONTRIBUTORS.md for contributors who have already improved YourMemory.

Dataset References

  • LoCoMo: Maharana et al. (2024). Evaluating Very Long-Term Conversational Memory of LLM Agents. Snap Research.

  • LongMemEval: Wu et al. (2024). LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.

  • HotpotQA: Yang et al. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.

License

Copyright 2026 Sachit Misra. Licensed under CC-BY-NC-4.0.

Free for: personal use, education, academic research, open-source projects. Not permitted: commercial use without a separate written agreement.

Commercial licensing: [email protected]