Cognitive Memory OS for AI Agents

The First AI Memory That Thinks Like a Brain

5-layer cognitive architecture. Zero-LLM ingestion pipeline. Self-improving memory that decays, consolidates, and reasons — like a real brain. The memory OS that every other system forgot to build.

npm install mnemosy-ai
33
Production Features
$0
Per Memory Stored
<50ms
Ingestion Latency
Built in Production

Not a Demo. Not a Roadmap. Running Right Now.

Mnemosyne powers a 10-machine AI fleet with sub-200ms retrieval latency. Every feature on this page is in production.

13,000+
Memories in Production
Managed across a live 10-machine AI fleet
10
Concurrent Agents
Real-time pub/sub mesh, zero locking
33
Shipping Features
Every feature listed is deployed — not roadmap
28
Features Nobody Else Has
Capabilities that exist in no other system
<50ms store latency (zero LLM calls)
<10ms cached recall
>60% cache hit rate
~1,000 memories/min consolidation
The Problem

AI Agents Have Amnesia. Mnemosyne Gives Them a Brain.

Every Other Memory System

  • Burns LLM tokens on every memory stored (~$0.01 each)
  • Retrieves by single signal only (cosine similarity)
  • No knowledge graph, or paywalled at $249/mo
  • Single-agent only — no multi-agent collaboration
  • Static storage — quality degrades over time
  • No activation decay, no consolidation, no reasoning

Mnemosyne

  • Memories persist across sessions, restarts, and context window resets
  • Retrieval uses 5 signals weighted by detected intent — not just cosine similarity
  • Knowledge grows through auto-linking, graph expansion, and cross-agent corroboration
  • Quality improves via reinforcement learning and autonomous 4-phase consolidation
  • Agents collaborate through a real-time memory mesh with pub/sub broadcasting
  • Zero LLM calls during ingestion — deterministic, $0 per memory, works offline
Quick Start

5 Lines to Cognitive Memory

Drop-in TypeScript SDK. Zero LLM required for the ingestion pipeline. Your agent has a brain in under a minute.

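The code sample did not survive extraction, so here is a minimal sketch of what those five lines might look like. Because the SDK's real API surface isn't shown on this page, an in-memory stub stands in for the client, and every name here (`MnemosyneStub`, `store`, `recall`) is an assumption, not the package's documented interface:

```typescript
// Illustrative stub: stands in for the real mnemosy-ai client so the call
// shape is visible end to end. Method names and signatures are assumptions.
interface Memory { id: number; text: string; }

class MnemosyneStub {
  private memories: Memory[] = [];
  private nextId = 1;

  // In the real system this would run the 12-step zero-LLM ingestion pipeline.
  async store(text: string): Promise<Memory> {
    const mem = { id: this.nextId++, text };
    this.memories.push(mem);
    return mem;
  }

  // The real system scores by 5 signals; the stub does a naive substring match.
  async recall(query: string): Promise<Memory[]> {
    const q = query.toLowerCase();
    return this.memories.filter((m) => m.text.toLowerCase().includes(q));
  }
}

// The promised five lines, against the stub:
async function main() {
  const memory = new MnemosyneStub();
  await memory.store("Deploy target is the staging cluster");
  await memory.store("User prefers TypeScript over Python");
  const hits = await memory.recall("staging");
  console.log(hits.map((m) => m.text)); // logs the staging memory
}
main();
```

Swap the stub for the real client and the shape of the calls stays the same.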
Bring your own vector store — or use our Docker Compose to spin everything up in one command.

Works With Your Stack

Vector DB
Qdrant, Pinecone, Weaviate, Chroma — configurable
Cache
Redis, Memcached, or in-memory
Graph
FalkorDB, Neo4j, or disable entirely
Embeddings
Any OpenAI-compatible endpoint

Mnemosyne is the brain. The databases are just where data lives.

Architecture

5-Layer Cognitive Architecture

Inspired by how the human brain organizes, retrieves, and strengthens memories over time. Each layer is independently toggleable.

L5

Self-Improvement

Reinforcement Learning, Active Consolidation, Flash Reasoning, Theory of Mind

L4

Cognitive

Activation Decay, Multi-Signal Scoring, Intent-Aware Retrieval, Diversity Reranking

L3

Knowledge Graph

Temporal Graph, Auto-Linking, Path Finding, Timeline Reconstruction

L2

Pipeline

12-Step Zero-LLM Ingestion: Security, Embed, Dedup, Extract, Classify, Score, Link, Graph, Broadcast

L1

Infrastructure

Pluggable Storage Backends: Vector DB, Graph DB, 2-Tier Cache, Pub/Sub — bring your own or use defaults

33 Features

Every Feature. All Shipping.

33 production features across 5 cognitive layers. Every feature is independently toggleable — start simple, enable progressively.

L1

Infrastructure

6 features

Vector Storage

768-dim embeddings with HNSW indexing for sub-linear search that scales to billions of vectors. Supports Qdrant, Pinecone, Weaviate, Chroma

2-Tier Cache

L1 in-memory (50 entries, 5min TTL) + L2 cache (1hr TTL) for sub-10ms recall. Supports Redis, Memcached, or in-memory

Pub/Sub Broadcast

Real-time memory events across your entire agent mesh via configurable pub/sub backend

Knowledge Graph

Temporal entity graph with auto-linking, path finding, timeline reconstruction. Supports FalkorDB, Neo4j, or disable

Bi-Temporal Model

Every memory tracks eventTime (when it happened) + ingestedAt (when stored)

Soft-Delete Architecture

Memories are never physically deleted — full audit trails and recovery

L2

Pipeline

6 features

12-Step Ingestion

Security → embed → dedup → extract → classify → score → link → graph → broadcast

Zero-LLM Pipeline

All classification, extraction, scoring runs algorithmically — $0 per memory, <50ms

Security Filter

3-tier classification (public/private/secret), blocks API keys and credentials

Smart Dedup & Merge

Cosine ≥0.92 = duplicate merge, 0.70–0.92 = conflict alert broadcast
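The two thresholds above transcribe directly into a decision function. A minimal sketch, assuming plain `number[]` embeddings; the `cosine` helper and the outcome labels are illustrative, not the shipped pipeline:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

type DedupOutcome = "merge" | "conflict_alert" | "store_new";

// The stated rule: >= 0.92 is a duplicate to merge; 0.70-0.92 is a
// related-but-differing memory that triggers a conflict alert broadcast.
function dedupDecision(similarity: number): DedupOutcome {
  if (similarity >= 0.92) return "merge";
  if (similarity >= 0.70) return "conflict_alert";
  return "store_new";
}
```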

Entity Extraction

Automatic identification of people, machines, IPs, dates, technologies, URLs — zero LLM

7-Type Taxonomy

Episodic, semantic, preference, relationship, procedural, profile, core — algorithmic

L3

Knowledge Graph

5 features

Temporal Queries

"What was X connected to as of date Y?" — relationships carry since timestamps
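An "as of" query reduces to filtering relationships by their `since` timestamp. A sketch under assumed shapes (the `Edge` interface here is invented for illustration):

```typescript
// A relationship edge carrying the time it came into effect.
interface Edge { from: string; to: string; relation: string; since: Date; }

// "What was `entity` connected to as of date `asOf`?" — keep only edges
// touching the entity whose `since` predates the as-of instant.
function connectedAsOf(edges: Edge[], entity: string, asOf: Date): Edge[] {
  return edges.filter(
    (e) => (e.from === entity || e.to === entity) && e.since <= asOf,
  );
}
```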

Auto-Linking

New memories automatically discover and link to related memories, bidirectional, Zettelkasten-style

Path Finding

Shortest-path queries between any two entities with configurable max depth

Timeline Reconstruction

Ordered history of all memories mentioning a given entity

Depth-Limited Traversal

Configurable graph exploration (default: 2 hops) balancing relevance vs. noise

L4

Cognitive

6 features

Activation Decay

Logarithmic decay model — critical memories stay months, core/procedural are immune
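A sketch of what a logarithmic decay curve with type-based immunity could look like. Only the shape (slow logarithmic falloff, core/procedural immune) comes from the description above; the exact formula and the one-year normalization constant are assumptions:

```typescript
type MemoryType = "episodic" | "semantic" | "preference" | "relationship"
  | "procedural" | "profile" | "core";

// Activation of a memory with the given base importance after `ageDays` days.
function activation(base: number, ageDays: number, type: MemoryType): number {
  if (type === "core" || type === "procedural") return base; // immune to decay
  // Logarithmic decay: falls with log(age), never abruptly.
  // Normalized so the curve reaches ~1.0 (full decay) at one year (assumed).
  const decay = Math.log1p(ageDays) / Math.log1p(365);
  return Math.max(0, base * (1 - decay));
}
```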

Multi-Signal Scoring

5 signals: similarity, recency, importance×confidence, frequency, type relevance

Intent-Aware Retrieval

Auto-detects query intent (factual, temporal, procedural, preference, exploratory)

Diversity Reranking

Cluster detection, overlap penalty, type diversity — prevents echo chambers in results

4-Tier Confidence

Mesh Fact ≥0.85, Grounded 0.65–0.84, Inferred 0.40–0.64, Uncertain <0.40
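The four tiers above are a direct threshold ladder; transcribed into a classifier (tier names lowercased, otherwise verbatim from the stated ranges):

```typescript
type Tier = "mesh_fact" | "grounded" | "inferred" | "uncertain";

// Map a confidence score onto the stated tiers:
// >= 0.85 mesh fact, 0.65-0.84 grounded, 0.40-0.64 inferred, < 0.40 uncertain.
function confidenceTier(score: number): Tier {
  if (score >= 0.85) return "mesh_fact";
  if (score >= 0.65) return "grounded";
  if (score >= 0.40) return "inferred";
  return "uncertain";
}
```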

Priority Scoring

Urgency × Domain composite score — critical+technical = 1.0, background+general = 0.2
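The composite can be reverse-engineered from the two stated anchor points (critical × technical = 1.0, background × general = 0.2). The intermediate weights below are assumptions chosen only to fit those endpoints:

```typescript
// Assumed urgency weights: only "critical" (1.0) and "background" (0.4)
// are pinned by the stated anchors; the rest are interpolated guesses.
const URGENCY: Record<string, number> = {
  critical: 1.0, high: 0.8, normal: 0.6, background: 0.4,
};
// Assumed domain weights: only "technical" (1.0) and "general" (0.5) are pinned.
const DOMAIN: Record<string, number> = {
  technical: 1.0, operational: 0.8, general: 0.5,
};

// Priority = urgency weight x domain weight, defaulting unknowns to 0.5.
function priority(urgency: string, domain: string): number {
  return (URGENCY[urgency] ?? 0.5) * (DOMAIN[domain] ?? 0.5);
}
```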

L5

Self-Improvement

10 features

Reinforcement Learning

Feedback loop tracks usefulness, auto-promotes memories with >0.7 ratio after 3+ retrievals
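The stated promotion rule (usefulness ratio above 0.7 after 3 or more retrievals) is a two-condition check. Field names in this sketch are assumptions:

```typescript
// Per-memory feedback counters (assumed shape).
interface FeedbackStats { retrievals: number; useful: number; }

// Promote only once there is enough evidence (>= 3 retrievals)
// and the usefulness ratio clears the 0.7 bar.
function shouldPromote(stats: FeedbackStats): boolean {
  if (stats.retrievals < 3) return false;
  return stats.useful / stats.retrievals > 0.7;
}
```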

Active Consolidation

4-phase autonomous maintenance: contradiction detection, dedup merge, promotion, demotion

Flash Reasoning

BFS traversal through linked memory graphs, reconstructs multi-step logic chains

Theory of Mind (TOMA)

"What does Agent-B know about X?" — knowledge gap analysis across the mesh

Cross-Agent Synthesis

3+ agents agree on a fact → auto-synthesized into fleet-level insight
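The quorum rule above amounts to grouping claims by fact and counting distinct agents. A sketch with invented shapes (`Claim`, `factsToSynthesize` are illustrative names):

```typescript
// One agent asserting one fact.
interface Claim { agentId: string; fact: string; }

// Return every fact asserted by at least `quorum` distinct agents,
// i.e. the candidates for fleet-level synthesis.
function factsToSynthesize(claims: Claim[], quorum = 3): string[] {
  const byFact = new Map<string, Set<string>>();
  for (const c of claims) {
    if (!byFact.has(c.fact)) byFact.set(c.fact, new Set());
    byFact.get(c.fact)!.add(c.agentId);
  }
  return Array.from(byFact.entries())
    .filter(([, agents]) => agents.size >= quorum)
    .map(([fact]) => fact);
}
```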

Proactive Recall

Generates speculative queries from incoming prompts, injects context before the agent asks

Session Survival

Snapshot/recovery across context window resets — zero discontinuity

Observational Memory

Compresses raw conversation streams into structured, high-signal memory cells

Procedural Memory

Learned procedures as first-class objects, immune to decay, shared across the mesh

Mesh Sync

Named, versioned shared state blocks with real-time broadcast propagation

Research-Grade

10 Capabilities From Research Papers. All Production-Ready.

These capabilities exist almost exclusively in academic literature and closed research labs. Mnemosyne ships all 10 as deployable infrastructure.

| Capability | Industry Status | Mnemosyne |
| --- | --- | --- |
| Flash Reasoning: chain-of-thought traversal through linked memory graphs with BFS and cycle detection | Research paper only | Production |
| Theory of Mind for Agents: agents model what other agents know, enabling task routing and collaborative problem-solving | Research paper only | Production |
| Observational Memory: raw conversation streams compressed into structured observations, like human working memory | Research paper only | Production |
| Reinforcement Learning on Memory: feedback loop auto-promotes useful memories and flags misleading ones | Research paper only | Production |
| Self-Improving Consolidation: 4-phase autonomous maintenance (contradictions, dedup, promotion, demotion) | Not implemented anywhere | Production |
| Cross-Agent Cognitive State: named, versioned shared blocks participate in retrieval, reasoning, and consolidation | Not implemented anywhere | Production |
| Bi-Temporal Knowledge Graph: tracks what was true at any point in time (eventTime + ingestedAt on every relationship) | Research paper only | Production |
| Proactive Anticipatory Recall: speculative queries surface relevant context before the agent asks for it | Not implemented anywhere | Production |
| Procedural Memory / Skill Library: learned procedures as first-class objects, immune to decay, shared across the mesh | Not implemented anywhere | Production |
| Session Survival: cognitive continuity across context window resets via snapshot/recovery | Not implemented anywhere | Production |
Comparison

33 Features. 28 That Nobody Else Has.

Every feature listed is in production. Not planned. Not in beta. Shipping.

Feature | Mnemosyne | Mem0 | Zep | Cognee | LangMem | Letta
Pipeline & Ingestion
Zero-LLM Ingestion Pipeline
12-Step Structured Pipeline
Security Filter (Secret Blocking)
Smart Dedup with Semantic Merge
Conflict Detection & Alerts
7-Type Memory Taxonomy
Entity Extraction (Zero-LLM; competitors: LLM $)
Cognitive Features
Activation Decay Model
Multi-Signal Scoring (5 Signals)
Intent-Aware Retrieval
Diversity Reranking
Flash Reasoning Chains
Reinforcement Learning
Active Consolidation (4-Phase)
Proactive Recall
Session Survival (Compaction)
Observational Memory
Preference Modeling
Knowledge Graph
Built-in Knowledge Graph (competitors: $249/mo)
Temporal Graph Queries
Auto-Linking (Bidirectional)
Path Finding Between Entities
Timeline Reconstruction
Bi-Temporal Data Model
Multi-Agent
Real-Time Broadcast (Pub/Sub)
Theory of Mind (TOMA)
Cross-Agent Synthesis
Knowledge Gap Analysis
Shared State Blocks (Mesh Sync)
Infrastructure
2-Tier Caching (L1 + L2)
Soft-Delete Architecture
Procedural Memory (Skill Library)
CLI Tools
Mnemosyne 33/33
Mem0 5/33
Cognee 5/33
Letta 4/33
Zep 3/33
LangMem 2/33
Open Source

MIT License. Free Knowledge Graph. $0 Per Memory.

Competitors paywall their graph, charge per memory via LLM, and gate multi-agent behind enterprise tiers.

| | Mnemosyne | Mem0 | Zep | Letta |
| --- | --- | --- | --- | --- |
| License | MIT (fully open) | Open core (limited) | Open core (limited) | Open source |
| Self-hosted | Free — all features | Free — limited features | Free — limited features | Free |
| Knowledge graph | Free (self-hosted) | $249/mo (Pro tier) | N/A | N/A |
| Per memory stored | $0.00 (zero LLM) | ~$0.01 (LLM call) | ~$0.01 (LLM call) | ~$0.01 (LLM call) |
| 100K memories cost | $0 | ~$1,000 | ~$1,000 | ~$1,000 |
| Multi-agent | Free — built-in mesh | Enterprise pricing | N/A | N/A |
| Cognitive features | Free — all 10 | N/A | N/A | Session mgmt only |

LLM costs estimated at ~$0.01/memory (conservative). Mnemosyne's zero-LLM pipeline has exactly $0 in per-memory costs beyond infrastructure.

How It Works

Store. Recall. Learn. Collaborate.

Four operations that transform stateless agents into intelligent, collaborative systems.

01

Store

Input goes through a 12-step zero-LLM pipeline: security filter, embedding, dedup, extraction, classification, scoring, linking, graph ingest, and broadcast. All in <50ms.

02

Recall

Queries hit the 2-tier cache, then vector search with 5-signal intent-aware scoring, diversity reranking, graph enrichment, and flash reasoning chains.

03

Learn

Feedback signals promote useful memories and flag poor ones. Active consolidation merges duplicates, resolves contradictions, and promotes popular knowledge.

04

Collaborate

Agents share memories via pub/sub mesh. Theory of Mind queries what others know. 3+ agents agreeing on a fact triggers fleet-level synthesis.

Performance

10-40x Faster Ingestion Than Any Competitor

Real numbers from a 10-machine AI fleet running Mnemosyne in production.

13,000+
Memories Managed
<50ms
Store Latency
<10ms
Cached Recall
10
Concurrent Agents
| Operation | Mnemosyne | LLM-Based Systems |
| --- | --- | --- |
| Store (full pipeline) | <50ms | 500ms – 2s |
| Recall (cached) | <10ms | No caching |
| Recall (uncached) | <200ms | 200ms – 500ms |
| Consolidation | ~1,000/min | Not available |
| Embedding generation | ~15ms (cached) | ~15ms |
Use Cases

Built for Agents That Actually Think

From single-agent coding assistants to enterprise-scale agent meshes.

AI Coding Assistants

Session survival, procedural memory, temporal graph. Agents remember project context, deployment procedures, and past debugging sessions.

Enterprise Knowledge Agents

Agent mesh, Theory of Mind, cross-agent synthesis. Specialized agents (HR, IT, Finance) build domain expertise while sharing verified facts.

Customer Support

Preference tracking, reinforcement learning, procedural memory. Agents remember customer history, resolution patterns, and preferences.

Research Assistants

Flash reasoning, auto-linking, knowledge graph. Agents accumulate domain knowledge and surface non-obvious connections between findings.

DevOps & Infrastructure

Temporal queries, proactive warnings, mesh sync. Agents remember topology, incidents, and answer "What changed since last stable state?"

Personal AI Companions

Activation decay, observational memory, preference modeling. Long-running assistants that develop genuine understanding over time.

API

9 Tools. Any Agent Framework.

Drop-in tools that work with Claude, GPT, LangChain, CrewAI, AutoGen, or any LLM agent framework.

memory_recall

Multi-signal search with intent detection, diversity reranking, graph enrichment, flash reasoning

memory_store

Full 12-step ingestion: security filter, dedup, classify, link, graph ingest, broadcast

memory_forget

Soft-delete by ID or semantic search, mesh-wide cache invalidation

memory_block_get

Read a named shared memory block (Mesh Sync)

memory_block_set

Write/update a named shared block with versioning and broadcast

memory_feedback

Reinforcement signal — drives memory promotion/demotion

memory_consolidate

4-phase active maintenance: contradictions, dedup, promotion, demotion

memory_toma

Query what a specific agent knows about a topic (Theory of Mind)

before_agent_start

Automatic hook: session recovery, proactive recall, context injection
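To register one of these with a framework, each tool needs a schema. A sketch of what `memory_recall` might look like in the JSON-Schema style most agent frameworks accept; the parameter names (`query`, `limit`) and the default are assumptions, not the package's documented schema:

```typescript
// Hypothetical tool definition for memory_recall. Only the tool name and
// description come from this page; the parameter schema is assumed.
const memoryRecallTool = {
  name: "memory_recall",
  description:
    "Multi-signal memory search with intent detection and diversity reranking",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "What to recall" },
      limit: { type: "number", description: "Max results (assumed default: 5)" },
    },
    required: ["query"],
  },
};
```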

Give Your Agents the Memory They Deserve

Open source. MIT licensed. 33 features. $0 per memory. Deploy cognitive memory in minutes.

npm install mnemosy-ai

Because intelligence without memory isn't intelligence.