Disclosure: memweave is an open-source project I built. This post describes the problem it solves and the design decisions behind it.
Picture this: you spend a day building an AI coding assistant. It learns your project's conventions, remembers that you use Valkey instead of Redis, and knows your team's preferred testing patterns. The session ends. You open a new conversation the next morning, and it has forgotten everything. Back to square one.
This is the default state of every LLM agent. Models are stateless by design. Each call starts with a blank slate. Memory is your problem to solve.
The most common workaround is to stuff the entire conversation history into the context window. It works, until it doesn't. Context windows are finite and expensive. A long-running agent accumulates thousands of tokens of history, most of which are irrelevant to the current question. You end up paying to repeatedly feed your agent last week's debugging notes when all it needs is one architecture decision from three months ago.
So you reach for a vector database. Spin up Chroma, or provision a Pinecone index, embed everything, and query by semantic similarity. This works too, but it introduces a new class of problems:
- Opacity. Your agent's memory lives in a binary index you can't open, read, or reason about. What does your agent actually know? You can only find out by querying it.
- No version control. There is no git diff for a vector store. You can't see what an agent learned between runs, audit its knowledge, or roll back a bad memory.
- Infrastructure overhead. Even for a single local agent, you now have a server process to manage, credentials to configure, and a service to keep running.
- Stale memory, no remedy. A vector DB ranks results by semantic similarity, full stop. A debugging note from six months ago competes on equal footing with a decision made this morning. Older, stale context surfaces confidently alongside fresh knowledge, and there is no built-in mechanism to prefer the recent over the old.
- Invisible edits. If you want to correct a memory (fix a wrong assumption the agent stored), you have to delete and re-embed. You can't just open the file and change a line.
The deeper issue is that none of these tools were designed for agent memory. They were designed for document retrieval at scale. Using them for a personal or project-scoped agent is like deploying a PostgreSQL cluster to store a config file.
There is a simpler way.
The Approach: Markdown + SQLite
The core idea behind memweave is deliberately simple: memories are .md files you write to disk. memweave indexes them into a local SQLite database and lets you search across them with hybrid BM25 + semantic vector search. The database is always a derived cache: if you delete it, memweave rebuilds it from the files. The files are the source of truth.
pip install memweave
Here is everything you need to give an agent persistent memory:
import asyncio
from pathlib import Path
from memweave import MemWeave, MemoryConfig

async def main():
    async with MemWeave(MemoryConfig(workspace_dir=".")) as mem:
        # Write a memory - just a plain Markdown file
        memory_file = Path("memory/stack.md")
        memory_file.parent.mkdir(exist_ok=True)
        memory_file.write_text("We use Valkey instead of Redis. Target latency SLA: 5ms p99.")
        await mem.add(memory_file)

        # Search across all memories.
        # min_score=0.0 ensures results surface in a small corpus;
        # in production the default 0.35 threshold filters low-confidence matches.
        results = await mem.search("caching layer decision", min_score=0.0)
        for r in results:
            print(f"[{r.score:.2f}] {r.snippet} ← {r.path}:{r.start_line}")

asyncio.run(main())
Output:
[0.34] We use Valkey instead of Redis. Target latency SLA: 5ms p99. ← memory/stack.md:1
Every result includes its relevance score, the exact file it came from, and the line number: full source provenance out of the box. No post-processing is needed to trace where an answer originated.
And because memories are just files, you can inspect them with any tool you already have:
cat memory/stack.md
grep -r "Valkey" memory/
git diff memory/
That last command, git diff memory/, is the one that changes how you think about agent memory. Every fact your agent stores is a line in a file. Every session is a commit. What your agent learned is as auditable as any other change in your codebase.
Why Files and SQLite Instead of a Vector Database
Vector databases were designed for large-scale document retrieval: millions of documents, multi-tenant services, production search infrastructure. They are excellent at that job. Agent memory is a different job entirely: hundreds or thousands of files, personal or project-scoped, where the data is as important as the code itself. These constraints pushed me toward a different set of tradeoffs:

Each of these differences compounds in practice, but version control illustrates the gap most concretely. Consider what happens when your agent stores a wrong assumption. Say it learned that your team uses PostgreSQL when you actually migrated to CockroachDB last quarter. With a vector DB, correcting this means finding the right embedding, deleting it, and re-inserting the corrected version via API. With memweave, you open the file and fix the line. Then you commit it.
# git diff memory/stack.md
- Database: PostgreSQL (primary), Redis (cache)
+ Database: CockroachDB (primary, migrated Q1 2026), Valkey (cache)
+ Reason: geo-distribution requirement from the platform team
That diff is now part of your project history. Any teammate, or any future agent, can see what changed, when, and why. This is the operational model that memweave is built around: agent memory as a first-class artifact of your project, not a side effect stored in a service you can't inspect.
Architecture
memweave is built around one central idea: separate storage from search. The Markdown files are the source of truth. The SQLite database is a derived index: always rebuildable, never irreplaceable.
┌──────────────────────────────────────────────────────────────┐
│ SOURCE OF TRUTH (Markdown files)                             │
│   memory/MEMORY.md         ← evergreen facts                 │
│   memory/2026-03-21.md     ← daily logs                      │
│   memory/researcher_agent/ ← agent-scoped namespace          │
└───────────────────────┬──────────────────────────────────────┘
                        │ chunking → hashing → embedding
┌───────────────────────▼──────────────────────────────────────┐
│ DERIVED INDEX (SQLite)                                       │
│   chunks          - text + metadata                          │
│   chunks_fts      - FTS5 full-text index (BM25)              │
│   chunks_vec      - sqlite-vec SIMD index (cosine)           │
│   embedding_cache - hash → vector (compute once, reuse)      │
│   files           - SHA-256 change detection                 │
└───────────────────────┬──────────────────────────────────────┘
                        │ hybrid merge → post-processing
                        ▼
                 list[SearchResult]
This separation has a practical consequence that is easy to miss: losing the database is not data loss. Losing the files is. If the SQLite index is deleted or corrupted, await mem.index() rebuilds it entirely from the Markdown files in the workspace. No data is gone. No embeddings have to be re-fetched if the cache is intact.
The Write Path
When you call await mem.add(path) or await mem.index(), memweave processes each file through a deterministic pipeline, with no LLM involved at any step:
.md file
   │
   ▼
chunking - split into overlapping text chunks
   │
   ▼
sha256(chunk_text) - fingerprint each chunk by content
   │
   ▼
embedding cache lookup - bulk SQL query: which hashes are already cached?
   │
   ├── cache hit ──────── reuse stored vector, skip API call
   │
   └── cache miss ─────── call embedding API (batched)
          │
          ▼
        store in cache - write vector to embedding_cache table
          │
          ▼
insert into FTS5 + sqlite-vec tables
The SHA-256 hash is the key efficiency lever. A chunk's hash is determined entirely by its text content, so if a file is re-indexed and 90% of its chunks are unchanged, only the changed chunks trigger an API call. The rest are served from cache instantly.
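The cache logic is small enough to sketch in stdlib Python. This is illustrative only, under stated assumptions: chunk_fingerprint and embed_chunks are hypothetical names, not memweave's API, and the cache is a plain dict standing in for the embedding_cache table.

```python
import hashlib

def chunk_fingerprint(text: str) -> str:
    # Deterministic content hash: identical text always maps to the same key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed_chunks(chunks, cache, embed_fn):
    """Return a vector per chunk, calling embed_fn only for cache misses."""
    misses = [c for c in chunks if chunk_fingerprint(c) not in cache]
    if misses:
        # One batched API call for everything not already cached.
        for text, vec in zip(misses, embed_fn(misses)):
            cache[chunk_fingerprint(text)] = vec
    return [cache[chunk_fingerprint(c)] for c in chunks]
```

Re-indexing a file where two of three chunks are unchanged then costs exactly one embedding call, because the lookup keys on content rather than on file path or modification time.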
The Search Path
When you call await mem.search(query), both search backends run in parallel against the same query and their results are merged before post-processing:
query
  │
  ├─── FTS5 BM25 (keyword) ─────────────────────┐
  │      exact term matching                    │
  │                                             ▼
  └─── sqlite-vec ANN (semantic) ──────► weighted merge
         cosine similarity            score = 0.7 × vector
                                            + 0.3 × BM25
                                                │
                                                ▼
                                    post-processing pipeline
                                    (threshold → decay → MMR)
                                                │
                                                ▼
                                        list[SearchResult]
Running both backends in parallel matters: BM25 catches exact matches (error codes, config values, proper names) while vector search catches semantically related content even when no keywords overlap. Together they cover the full range of how an agent's memory is likely to be queried. The post-processing pipeline that follows the merge is covered in detail in later sections.
Why SQLite as the Infrastructure Layer?
The choice of SQLite deserves a brief note. SQLite is not a compromise; it is a deliberate fit for this use case. It ships with Python, requires no server, supports full-text search via FTS5, and with the sqlite-vec extension gains SIMD-accelerated vector similarity search. The entire memory store (chunks, embeddings, cache, file metadata) is a single file on disk that you can copy, back up, or inspect with any SQLite browser. At the scale of agent memory (thousands of files), it is not just sufficient; it is optimal.
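You can verify the FTS5 half of this from the Python you already have; most stock builds of the sqlite3 module ship with FTS5 compiled in. A minimal sketch of the kind of BM25-ranked query this enables (the table layout here is illustrative, not memweave's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# An FTS5 virtual table gives full-text indexing with BM25 ranking built in.
con.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path, body)")
con.executemany(
    "INSERT INTO chunks_fts VALUES (?, ?)",
    [
        ("memory/stack.md", "We use Valkey instead of Redis. Target latency SLA: 5ms p99."),
        ("memory/2026-01-15.md", "Session notes: debugged flaky DNS in the staging deploy."),
    ],
)
# bm25() returns a rank where lower is better, so sort ascending.
rows = con.execute(
    "SELECT path FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY bm25(chunks_fts)",
    ("Valkey",),
).fetchall()
print(rows)  # only the stack.md chunk matches
```

No server, no driver, no credentials: keyword search with ranking in a dozen lines against the standard library.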
How memweave Organises Memory: Evergreen Facts, Dated Logs, and Agent Namespaces
Not all knowledge ages equally. A team's decision to use CockroachDB over PostgreSQL is as relevant today as the day it was made. A debugging note from a session six months ago probably isn't. memweave enforces this distinction at the file level: no metadata tagging, no configuration, just a naming convention.
There are two kinds of memory files:
┌───────────┬────────────────────────┬─────────────────────────────────┐
│ Kind      │ Filename               │ Behaviour                       │
├───────────┼────────────────────────┼─────────────────────────────────┤
│ Dated     │ matches YYYY-MM-DD.md  │ subject to temporal decay       │
│ Evergreen │ everything else        │ always surfaces at full score   │
└───────────┴────────────────────────┴─────────────────────────────────┘

The rule is simple: any file whose name matches YYYY-MM-DD.md is dated. Everything else is evergreen. memweave reads the date directly from the filename: no filesystem metadata, no frontmatter parsing, no manual tagging.
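Because the split is just a filename pattern, it can be reproduced in a few lines of stdlib Python. file_age_days here is a hypothetical helper for illustration, not part of memweave:

```python
import re
from datetime import date, datetime

DATED = re.compile(r"\d{4}-\d{2}-\d{2}\.md")

def file_age_days(filename: str, today: date):
    """Age in days for dated files; None means evergreen (never decays)."""
    if not DATED.fullmatch(filename):
        return None
    written = datetime.strptime(filename[:-3], "%Y-%m-%d").date()
    return (today - written).days

print(file_age_days("MEMORY.md", date(2026, 4, 11)))      # None -> evergreen
print(file_age_days("2026-04-01.md", date(2026, 4, 11)))  # 10
```

Note that a file like notes-2026.md stays evergreen: only an exact YYYY-MM-DD.md name opts a file into decay.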
A typical workspace organises itself naturally around this convention:
memory/
├── MEMORY.md            ← evergreen - permanent facts, always surfaces
├── architecture.md      ← evergreen - stack decisions, constraints
├── 2026-01-15.md        ← dated - session notes from January
├── 2026-03-10.md        ← dated - session notes from March
├── 2026-04-11.md        ← dated - today's session, full score for now
└── researcher_agent/
    ├── findings.md      ← evergreen - agent's standing knowledge
    └── 2026-04-11.md    ← dated - agent's session log, will decay
Over time, the dated files accumulate and fade. The evergreen files stay anchored at full score regardless of how much history builds up around them. An agent asking about the tech stack always gets architecture.md at the top of its results, even after hundreds of session logs have been written since.
Agent Namespaces (Multi-Agent Memory)
When multiple agents share one workspace, you need a way to keep their knowledge isolated without spinning up separate databases. memweave handles this via subdirectories. The immediate subdirectory beneath memory/ becomes the source label for every file inside it:

Each agent writes to its own subdirectory. All agents index against the same SQLite database. Searches are global by default: any agent can read any other agent's memories. Pass source_filter to scope a search exclusively to one namespace:
# Researcher writes to its own namespace
researcher = MemWeave(MemoryConfig(workspace_dir="./project"))
writer = MemWeave(MemoryConfig(workspace_dir="./project"))

async with researcher, writer:
    # Researcher indexes its findings under memory/researcher_agent/
    await researcher.index()

    # Writer queries only the researcher's namespace
    results = await writer.search(
        "water ice on the Moon",
        source_filter="researcher_agent",
    )
This pattern scales naturally to any number of agents. Each agent's knowledge is isolated by path convention, inspectable as a folder, and versionable independently: git log memory/researcher_agent/ shows exactly what that agent learned and when.
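Under this convention, the source label is derivable from the path alone. A sketch of the idea (source_label is a hypothetical helper; memweave's internal naming may differ):

```python
from pathlib import Path

def source_label(path: Path, memory_root: Path) -> str:
    """The first subdirectory under memory/ is the namespace; top-level files get ""."""
    rel = path.relative_to(memory_root)
    return rel.parts[0] if len(rel.parts) > 1 else ""

print(source_label(Path("memory/researcher_agent/findings.md"), Path("memory")))  # researcher_agent
print(source_label(Path("memory/MEMORY.md"), Path("memory")))                     # "" (workspace root)
```

Because the label is a pure function of the path, no registration step is needed: creating a folder creates a namespace.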
The memweave Search Pipeline
Every mem.search(query) call moves through five fixed stages in order. Each stage is independent, composable, and tunable. Here is the full pipeline, then each stage in detail.
query
  │
  ▼
Stage 1 — hybrid score merge (BM25 + vector)
  │
  ▼
Stage 2 — score threshold (min_score)
  │
  ▼
Stage 3 — temporal decay (opt-in)
  │
  ▼
Stage 4 — MMR re-ranking (opt-in)
  │
  ▼
Stage 5 — custom post-processors
  │
  ▼
list[SearchResult]

Stage 1 — Hybrid Score Merge
Both backends run in parallel against the same query, and their scores are normalised and then linearly combined:
merged_score = α × vector_score + (1 − α) × bm25_score
Default α = 0.7. Each backend contributes what it does best:
- FTS5 BM25 ranks by term frequency and inverse document frequency. It is a precision anchor: exact technical terms, error codes, config values, and proper names score high. If your query and your document use the same words, BM25 finds it.
- sqlite-vec cosine similarity measures distance in embedding space. It catches semantically related content even when no keywords overlap: a query for “caching layer” will surface a chunk mentioning “Redis latency” because the embeddings are close, even though the words differ.
The 70/30 split reflects the nature of most agent memory queries: conceptual and paraphrased more often than exact-string lookups. Tune the weights via HybridConfig if your use case skews toward exact technical retrieval:
from memweave.config import MemoryConfig, QueryConfig, HybridConfig

config = MemoryConfig(
    query=QueryConfig(
        hybrid=HybridConfig(
            vector_weight=0.5,  # equal weight for keyword-heavy corpora
            text_weight=0.5,
        )
    )
)
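The normalise-then-combine step itself is easy to sketch outside the library. This assumes min-max normalisation per backend (the post doesn't specify memweave's exact normalisation), with plain dicts of chunk-id → score standing in for backend output:

```python
def normalize(scores: dict) -> dict:
    """Min-max normalise one backend's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {cid: (s - lo) / span for cid, s in scores.items()}

def hybrid_merge(vector_scores: dict, bm25_scores: dict, alpha: float = 0.7):
    """merged = alpha * vector + (1 - alpha) * bm25, sorted descending."""
    v, b = normalize(vector_scores), normalize(bm25_scores)
    ids = set(v) | set(b)
    merged = {cid: alpha * v.get(cid, 0.0) + (1 - alpha) * b.get(cid, 0.0) for cid in ids}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# At alpha = 0.7, a chunk strong on semantics beats one strong on keywords.
print(hybrid_merge({"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}))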
Stage 2 — Score Threshold
drop result if merged_score < min_score (default: 0.35)
A noise gate that runs before the more expensive post-processing stages. Without it, low-confidence tail results enter the MMR and decay calculations and waste compute. The default of 0.35 is calibrated for typical agent memory corpora: lower it for small workspaces where you want more results to surface, raise it when precision matters more than recall.
# Override per call - no config change needed
results = await mem.search("architecture decision", min_score=0.5)
Stage 3 — Temporal Decay (opt-in)
Agents accumulate knowledge over time, but not all knowledge ages equally. Without decay, a stale debugging note from six months ago can outrank a decision made this morning simply because it embeds well. Temporal decay solves this by multiplying each result's score by an exponential factor based on the age of its source file.
The formula is standard exponential decay:
λ = ln(2) / half_life_days
multiplier = exp(−λ × age_days)
decayed_score = original_score × multiplier
At age_days = 0 the multiplier is 1.0: no change. At age_days = half_life_days it is exactly 0.5. The curve is smooth and continuous: scores are never zeroed, old memories still surface, they simply rank lower than recent ones.
Evergreen files bypass this stage entirely: their multiplier is always 1.0, regardless of when they were written.
from memweave.config import MemoryConfig, QueryConfig, TemporalDecayConfig

config = MemoryConfig(
    query=QueryConfig(
        temporal_decay=TemporalDecayConfig(
            enabled=True,
            half_life_days=30.0,  # tune to your workflow
        )
    )
)
Tune half_life_days to your workflow: 7 for fast-moving projects where week-old context is already stale, 90 for research or documentation repositories where knowledge stays relevant for months.
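The arithmetic is small enough to check directly. This standalone sketch reproduces the formula above, including the ~0.46 multiplier that appears in the book club demo later in this post (the gap between 2025-12-30 and 2026-04-11 is 102 days at half_life = 90):

```python
import math

def decay_multiplier(age_days: float, half_life_days: float) -> float:
    """exp(-ln(2)/half_life × age): 1.0 at age 0, exactly 0.5 at one half-life."""
    lam = math.log(2) / half_life_days
    return math.exp(-lam * age_days)

print(decay_multiplier(0, 30))               # 1.0
print(decay_multiplier(30, 30))              # 0.5
print(round(decay_multiplier(102, 90), 2))   # 0.46
```

Halving every half-life, never reaching zero: a 7-day half-life makes last month's notes nearly invisible, while 90 days keeps a quarter's worth of context in play.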
Stage 4 — MMR Re-ranking (opt-in)
Without diversity control, the top results from a hybrid search are often near-duplicates: multiple chunks from the same file, or different phrasings of the same fact. An agent loading all of them into its context window wastes tokens and misses other relevant but distinct memories.
MMR (Maximal Marginal Relevance) reorders results after scoring to balance relevance against diversity. At each selection step it picks the candidate that maximises:
MMR(cᵢ) = λ × relevance(cᵢ) − (1 − λ) × max sim(cᵢ, cⱼ) for cⱼ ∈ S
Where:
- S = the set of already-selected results
- relevance(cᵢ) = the merged score from Stage 1, after temporal decay
- sim(cᵢ, cⱼ) = Jaccard token overlap between the candidate and each selected result
- λ = the diversity dial: 0 is pure diversity, 1 is pure relevance, default 0.7
Why Jaccard overlap rather than cosine similarity?
Two chunks that share many of the same words, even from different files, are genuinely redundant for an agent loading them as context. Jaccard catches this at the token level without requiring an additional embedding call per pair.
┌──────────────┬─────────────────────────────────────────────────────────┐
│ lambda_param │ Behaviour                                               │
├──────────────┼─────────────────────────────────────────────────────────┤
│ 1.0          │ Pure relevance — identical to no MMR                    │
│ 0.7          │ Default — strong relevance, light diversity push        │
│ 0.5          │ Equal weight between relevance and diversity            │
│ 0.0          │ Pure diversity — maximally novel results                │
└──────────────┴─────────────────────────────────────────────────────────┘
from memweave.config import MemoryConfig, QueryConfig, MMRConfig

config = MemoryConfig(
    query=QueryConfig(
        mmr=MMRConfig(enabled=True, lambda_param=0.7)
    )
)

# Or override λ per call without touching the config
diverse_results = await mem.search("deployment steps", mmr_lambda=0.3)
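Greedy MMR with Jaccard overlap fits in a dozen lines. This is a sketch of the technique as described above, not memweave's implementation; result shapes are simplified to (relevance, text) tuples:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two chunks of text."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0

def mmr(candidates, lam: float = 0.7, k: int = 3):
    """Greedy MMR selection: candidates are (relevance, text) pairs, already scored."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * c[0]
            - (1 - lam) * max((jaccard(c[1], s[1]) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```

At λ = 0.5, a lower-scored but novel chunk beats a near-duplicate of the top hit; at λ = 1.0 the ordering collapses back to pure relevance, matching the table above.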
Stage 5 — Custom Post-processors
Any processors registered via mem.register_postprocessor() run last, in registration order. Each receives the output of the previous stage and can filter, reorder, or rescore freely: domain-specific boosting, hard-pinning a result to the top, or integrating an external signal. The built-in pipeline runs first; custom stages extend it without replacing it.
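As a sketch of the kind of stage this enables (plain dicts stand in for memweave's SearchResult, whose exact shape isn't shown in this post, and pin_architecture is a hypothetical example), a pin-to-top processor might look like:

```python
def pin_architecture(results):
    """Hypothetical post-processor: surface architecture.md hits first,
    preserving the existing order within each group."""
    pinned = [r for r in results if r["path"].endswith("architecture.md")]
    rest = [r for r in results if not r["path"].endswith("architecture.md")]
    return pinned + rest

ranked = [
    {"path": "memory/2026-03-10.md", "score": 0.61},
    {"path": "memory/architecture.md", "score": 0.48},
]
print(pin_architecture(ranked)[0]["path"])  # memory/architecture.md
```

Because processors are plain list-in, list-out functions, they compose: each one sees exactly what the previous stage produced.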
Real-World Example Using memweave — Book Club Decision Log
The best way to see memweave in action is to watch two agents answer the same question with different retrieval strategies. The full runnable notebook is available at examples/book_club_demo.ipynb.
The Setup
The workspace contains nine memory files spanning 18 months of a book club's history:

One evergreen file holds standing information that should always surface at full score. Seven dated files accumulate the club's history. One file written today holds the current state.
The Query
Both agents are asked the same question:
“What genre did the club vote on most recently?”
The correct answer, grounded in the most recent records, is science fiction, with literary fiction likely next. But an agent without temporal awareness will not necessarily find it.
Agent A — No Temporal Decay
config = MemoryConfig(
    workspace_dir=WORKSPACE,
    embedding=EmbeddingConfig(model="text-embedding-3-small"),
)

async with MemWeave(config) as mem:
    results = await mem.search(
        "What genre did the club vote on most recently?",
        max_results=3,
        min_score=0.1,
    )
Agent A's top 3 results by raw semantic similarity:
[0.339] 2025-11-03.md ← Non-fiction vote (5 months ago)
[0.336] 2024-10-05.md ← Fantasy vote (18 months ago)
[0.320] 2025-05-10.md ← Mystery vote (11 months ago)
Today's file doesn't appear in the top 3. The older “vote” files outscore it on raw semantic similarity because they contain more explicit voting language. Agent A's answer:
“The club most recently voted on the genre of non-fiction.”
Factually stale: the November 2025 vote, not the most recent one.
Agent B — With Temporal Decay (half_life = 90 days)
async with MemWeave(config) as mem:
    results = await mem.search(
        "What genre did the club vote on most recently?",
        max_results=3,
        min_score=0.1,
        decay_half_life_days=90.0,
    )
Agent B's top 3 results after the age penalty:
[0.313] 2026-04-11.md ← Today's notes (multiplier: 1.00) ↑ rank 1
[0.293] club_info.md  ← Evergreen (multiplier: 1.00)
[0.128] 2025-12-30.md ← Sci-fi plan (multiplier: ~0.46)
Today's file floats to rank 1 after the age penalty collapses the scores of older files. The end-of-year review retains ~46% of its score; the November 2025 non-fiction vote drops out of the top 3 entirely.
Agent B's answer, grounded in today's file:
“The club most recently voted for science fiction.”
What This Demonstrates
- The stale memory problem is real and silent. Agent A doesn't know it is wrong. It returns a confident answer based on the highest-scoring semantic matches, which happen to be older files with more explicit voting language. There is no error, no warning, just subtly outdated context.
- Decay's advantage compounds with history. With 18 months of files, Agent A's context fills with increasingly stale votes. The larger the memory grows, the worse the problem becomes, and the more dramatic the difference between the two agents.
- club_info.md (evergreen) surfaces in Agent B at full score. With decay enabled, the age penalty clears out stale vote records, and the evergreen standing information rises into the top 3 despite never being the closest semantic match to the query. In Agent A, older dated files with explicit voting language outscore it on raw similarity. Evergreen immunity is determined by the file path, not the content.
- A single parameter change is all it takes. decay_half_life_days=90.0 is the only difference between Agent A and Agent B. No schema changes, no re-indexing, no metadata tagging.
Summary
Agent memory doesn't have to mean infrastructure. memweave takes a different bet: memories are plain Markdown files you can open, edit, and git diff. A local SQLite database indexes them for hybrid search: BM25 for exact matches, vector search for semantic retrieval, merged into a single ranked list. Temporal decay keeps recent context above stale history automatically. MMR ensures the top results cover different aspects of your query rather than repeating the same fact. An embedding cache means only changed content ever hits the API. The entire store is a single file on disk: no server, no Docker, no cloud service.
The book club demo makes the tradeoff concrete: two agents, one question, one parameter difference, two different answers. The agent with temporal decay surfaces today's file at rank one. The agent without it surfaces a five-month-old vote with more explicit “voting” language, and confidently gives the wrong answer without knowing it.
The broader point is that the stale-memory problem is silent. There is no error, no warning, just subtly outdated context fed to the model. The larger the memory grows, the more stale files accumulate, and the more aggressively they compete with recent ones on raw semantic similarity. Temporal decay is the mechanism that keeps the retrieval honest as history builds up.
Get Started
pip install memweave
If you hit something unexpected, find a use case the library doesn't cover well, or just want to share what you built, open an issue or start a discussion on GitHub. The feedback will be genuinely appreciated.

