    I Replaced Vector DBs with Google’s Memory Agent Pattern for my notes in Obsidian

April 3, 2026


This started because my Obsidian assistant kept getting amnesia. I didn't want to stand up Pinecone or Redis just so Claude could remember that Alice approved the Q3 budget last week. Turns out, with 200K+ context windows, you may not need any of that.

I want to share a new mechanism that I've started running. It's a system built on SQLite and direct LLM reasoning: no vector databases, no embedding pipeline. Vector search was mostly a workaround for tiny context windows and for keeping prompts from getting messy. With modern context sizes, you can often skip that and just let the model read your memories directly.


    The Setup

I take detailed notes, both in my personal life and at work. I used to scrawl in notebooks that would get lost or get stuck on a shelf and never be referenced again. A few years ago, I moved to Obsidian for everything, and it has been fantastic. In the last year, I've started hooking up genAI to my notes. Today I run both Claude Code (for my personal notes) and Kiro-CLI (for my work notes). I can ask questions, get them to do roll-ups for leadership, track my goals, and write my reports. But it's always had one big Achilles' heel: memory. When I ask about a meeting, it uses an Obsidian MCP to search my vault. It's time-consuming, error-prone, and I want it to be better.

The obvious fix is a vector database. Embed the memories. Store the vectors. Do a similarity search at query time. It works. But it also means a Redis stack, a Pinecone account, or a locally running Chroma instance, plus an embedding API, plus pipeline code to stitch it all together. For a personal tool, that's a lot, and there's a real risk that it won't work exactly like I want it to. I need to ask "what happened on Feb 1 2026" or "recap the last meeting I had with this person", things that embeddings and RAG aren't great at.

Then I ran across Google's always-on-memory agent (https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/agents/always-on-memory-agent). The idea is pretty simple: don't do a similarity search at all; just give the LLM your recent memories directly and let it reason over them.

I wanted to know if that held up on AWS Bedrock with Claude Haiku 4.5. So I built it (along with Claude Code, of course) and added some extra bells and whistles.

Go to my GitHub repo, but make sure to come back!

    https://github.com/ccrngd1/ProtoGensis/tree/main/memory-agent-bedrock


An Insight That Changes the Math

Older models topped out at 4K or 8K tokens. You couldn't fit very many documents in a prompt. Embeddings let you retrieve the relevant documents without loading everything. That was genuinely important. Haiku 4.5 offers a 200K context window, so what can we do with that?

A structured memory (summary, entities, topics, importance score) runs about 300 tokens. Which means we can fit about 650 memories before hitting the ceiling. In practice it's a bit less, since the system prompt and query also eat tokens, but for a personal assistant that tracks meetings, notes, and conversations, that's months of context.
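
To sanity-check that math, here's the back-of-envelope version (a sketch; the 300-token figure is an average, and the headroom number is my own assumption):

# Back-of-envelope context budget for direct-prompt memory.
# HEADROOM is an assumed allowance, not a measured value.
CONTEXT_WINDOW = 200_000     # Haiku 4.5 context window, in tokens
TOKENS_PER_MEMORY = 300      # rough average for one formatted structured memory
HEADROOM = 5_000             # assumed room for system prompt, query, and answer

print((CONTEXT_WINDOW - HEADROOM) // TOKENS_PER_MEMORY)  # -> 650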

    No embeddings, no vector indexes, no cosine similarity.

The LLM reasons directly over semantics, and it's better at that than cosine similarity.


The Architecture

The orchestrator isn't a separate service. It's a Python class inside the FastAPI process that coordinates the three agents.

The IngestAgent's job is simple: take raw text and ask Haiku what's worth remembering. It extracts a summary, entities (names, places, things), topics, and an importance score from 0 to 1. That package goes into the `memories` table.

The ConsolidateAgent runs with intelligent scheduling: at startup if any unconsolidated memories exist, when a threshold is reached (5+ memories by default), and daily as a forced pass. When triggered, it batches unconsolidated memories and asks Haiku to find cross-cutting connections and generate insights. Results land in a `consolidations` table. The system tracks the last consolidation timestamp to ensure regular processing even with low memory accumulation.

The QueryAgent reads recent memories plus consolidation insights into a single prompt and returns a synthesized answer with citation IDs. That's the whole query path.
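
A minimal sketch of that shape (class and method names here are illustrative stand-ins, not the repo's actual API):

# One class, three agents, no separate service. Illustrative only.
class Orchestrator:
    def __init__(self, ingest_agent, consolidate_agent, query_agent, store):
        self.ingest = ingest_agent
        self.consolidate = consolidate_agent
        self.query_agent = query_agent
        self.store = store  # thin wrapper over the SQLite file

    def ingest_text(self, text: str, source: str) -> dict:
        # Haiku pulls out summary, entities, topics, and importance.
        memory = self.ingest.extract(text, source)
        self.store.save_memory(memory)
        if self.store.unconsolidated_count() >= 5:  # default threshold
            self.consolidate.run(self.store.unconsolidated())
        return memory

    def answer(self, question: str) -> str:
        # Recent memories plus consolidation insights go into one prompt.
        recent = self.store.recent_memories(limit=50)
        insights = self.store.consolidations()
        return self.query_agent.synthesize(question, recent, insights)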


What Actually Gets Stored

When you ingest text like "Met with Alice today. Q3 budget is approved, $2.4M," the system doesn't just dump that raw string into the database. Instead, the IngestAgent sends it to Haiku and asks, "What's important here?"

    The LLM extracts structured metadata:

{
  "id": "a3f1c9d2-...",
  "summary": "Alice confirmed Q3 budget approval of $2.4M",
  "entities": ["Alice", "Q3 budget"],
  "topics": ["finance", "meetings"],
  "importance": 0.82,
  "source": "notes",
  "timestamp": "2026-03-27T14:23:15.123456+00:00",
  "consolidated": 0
}

The memories table holds these individual records. At ~300 tokens per memory when formatted into a prompt (including the metadata), the theoretical ceiling is around 650 memories in Haiku's 200K context window. I deliberately set the default to 50 recent memories, so I'm well short of that ceiling.

When the ConsolidateAgent runs, it doesn't just summarize memories. It reasons over them. It finds patterns, draws connections, and generates insights about what the memories mean together. These insights get stored as separate records in the consolidations table:

{
  "id": "3c765a26-...",
  "memory_ids": ["a3f1c9d2-...", "b7e4f8a1-...", "c9d2e5b3-..."],
  "connections": "All three meetings with Alice mentioned budget concerns...",
  "insights": "Budget oversight appears to be a recurring priority...",
  "timestamp": "2026-03-27T14:28:00.000000+00:00"
}

When you query, the system loads both the raw memories *and* the consolidation insights into the same prompt. The LLM reasons over both layers at once: recent facts plus synthesized patterns. That's how you get answers like "Alice has raised budget concerns in three separate meetings [memory:a3f1c9d2, memory:b7e4f8a1] and the pattern suggests this is a high priority [consolidation:3c765a26]."

This two-table design is the entire persistence layer. A single SQLite file. No Redis. No Pinecone. No embedding pipeline. Just structured records that an LLM can reason over directly.
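
That layer is small enough to sketch in full. Here's an approximate version of the two tables via Python's sqlite3; the column names mirror the JSON records above, but check the repo for the authoritative schema.

import sqlite3

# Approximate two-table schema (a sketch, not the repo's exact DDL).
conn = sqlite3.connect("memory.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS memories (
    id           TEXT PRIMARY KEY,
    summary      TEXT NOT NULL,
    entities     TEXT,               -- JSON-encoded list
    topics       TEXT,               -- JSON-encoded list
    importance   REAL,               -- 0.0 to 1.0
    source       TEXT,
    timestamp    TEXT,               -- ISO 8601
    consolidated INTEGER DEFAULT 0   -- flipped after a consolidation pass
);
CREATE TABLE IF NOT EXISTS consolidations (
    id          TEXT PRIMARY KEY,
    memory_ids  TEXT,                -- JSON-encoded list of memory ids
    connections TEXT,
    insights    TEXT,
    timestamp   TEXT
);
""")
conn.commit()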

What the Consolidation Agent Actually Does

Most memory systems are purely retrieval. They store, search, and return similar text. The consolidation agent works differently: it reads a batch of unconsolidated memories and asks, "What connects these?", "What do these have in common?", "How do these relate?"

These insights get written as a separate consolidations record. When you query, you get both the raw memories and the synthesized insights. The agent isn't just recalling. It's reasoning.

The sleeping brain analogy from the original Google implementation seems pretty accurate. During idle time, the system is processing rather than just waiting. This is something I often struggle with when building agents: how can I make them more autonomous so they'll work when I don't? This is a good use of that "downtime".

For a personal tool, this matters. "You've had three meetings with Alice this month, and all of them mentioned budget concerns" is more useful than three individual recall hits.

The original design used a simple threshold for consolidation: it waited for five memories before consolidating. That works for active use. But if you're only ingesting sporadically (a note here, an image there), you might wait days before hitting the threshold. Meanwhile, those memories sit unprocessed, and queries don't benefit from the consolidation agent's pattern recognition.

So, I decided to add two more triggers. When the server starts, it checks for unconsolidated memories from the previous session and processes them immediately. No waiting. And on a daily timer (configurable), it forces a consolidation pass if anything is waiting, regardless of whether the 5-memory threshold has been met. So even a single note per week still gets consolidated within 24 hours.

The original threshold-based mode still runs for active use. But now there's a safety net beneath it. If you're actively ingesting, the threshold catches it. If you're not, the daily pass does. And on restart, nothing falls through the cracks.
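
Sketched out, the three triggers reduce to a single loop. This is illustrative only; `store` and `agent` are hypothetical stand-ins for the repo's actual components.

import asyncio
import time

THRESHOLD = 5                  # consolidate once this many memories are waiting
FORCED_INTERVAL = 24 * 3600    # daily forced pass, in seconds (configurable)

async def consolidation_loop(store, agent):
    # Trigger 1: at startup, catch anything left over from the last session.
    if store.unconsolidated_count() > 0:
        agent.run(store.unconsolidated())
    last_forced = time.monotonic()
    while True:
        await asyncio.sleep(60)  # check once a minute
        pending = store.unconsolidated_count()
        overdue = time.monotonic() - last_forced >= FORCED_INTERVAL
        # Trigger 2: threshold reached during active use.
        # Trigger 3: daily forced pass, even for a single straggler.
        if pending >= THRESHOLD or (pending > 0 and overdue):
            agent.run(store.unconsolidated())
            last_forced = time.monotonic()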

    File Watching and Change Detection

I have an Obsidian vault with hundreds of notes, and I don't want to manually ingest each one. I want to point the watcher at the vault and let it handle the rest. That's exactly what this does.

On startup, the watcher scans the directory and ingests everything it hasn't seen before. It runs two modes in the background: a quick scan every 60 seconds checks for new files (fast, no hash calculation, just "is this path in the database?"), and a full scan every 30 minutes calculates SHA-256 hashes and compares them to stored values. If a file has changed, the system deletes the old memories, cleans up any consolidations that referenced them, re-ingests the new version, and updates the tracking record. No duplicates. No stale data.
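
The change-detection core is just hashing. Here's a sketch of the full-scan pass; the `stored_hashes` dict is a stand-in for the watcher's tracking table, and the real watcher covers more extensions and handles the consolidation cleanup.

import hashlib
from pathlib import Path

def full_scan(vault: Path, stored_hashes: dict[str, str]) -> list[Path]:
    # Hash every file and compare against what we saw last time.
    changed = []
    for path in vault.rglob("*.md"):  # real watcher matches more extensions
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if stored_hashes.get(str(path)) != digest:
            changed.append(path)      # re-ingest: delete old memories first
            stored_hashes[str(path)] = digest
    return changed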

For personal note workflows, the watcher covers what you'd expect:

• Text files (.txt, .md, .json, .csv, .log, .yaml, .yml)
• Images (.png, .jpg, .jpeg, .gif, .webp), analyzed via Claude Haiku's vision capabilities
• PDFs (.pdf), text extracted via PyPDF2

Recursive scanning and directory exclusions are configurable. Edit a note in Obsidian, and within 30 minutes, the agent's memory reflects the change.


    Why No Vector DB

Whether you need embeddings for your personal notes boils down to two things: how many notes you have and how you want to search them.

Vector search is genuinely important when you have millions of documents and can't fit the relevant ones in context. It's a retrieval optimization for large-scale problems.

At personal scale, you're working with hundreds of memories, not millions. Vector means you're running an embedding pipeline, paying for the API calls, managing the index, and implementing similarity search to solve a problem that a 200K context window already solves.

Here's how I think about the tradeoffs:

• Complexity
• Accuracy
• Scale

I couldn't justify having to set up and maintain a vector database, even FAISS, for the few notes that I generate.

On top of that, this new method gives me better accuracy for the way I want to search my notes.


Seeing It in Action

Here's what using it actually looks like. Configuration is handled via a .env file with sensible defaults. You can copy the example directly and start using it (assuming you have already run aws configure on your machine).

cp .env.example .env

Then, start the server with the file watcher active:

    ./scripts/run-with-watcher.sh

curl the /ingest endpoint to test a sample ingestion. This is optional, just to demonstrate how it works; you can skip it if you're setting up for a real use case. (The host and port below assume the server's local defaults.)

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Met with Alice today. Q3 budget is approved, $2.4M.", "source": "notes"}'

The response will look like:

{
  "id": "a3f1c9d2-...",
  "summary": "Alice confirmed Q3 budget approval of $2.4M.",
  "entities": ["Alice", "Q3 budget"],
  "topics": ["finance", "meetings"],
  "importance": 0.82,
  "source": "notes"
}

To query it later, curl the query endpoint with:

/query?q=What+did+Alice+say+about+the+budget

    Or use the CLI:

python cli.py ingest "Paris is the capital of France." --source wikipedia
python cli.py query "What do you know about France?"
python cli.py consolidate  # trigger manually
python cli.py status       # see memory count, consolidation state

Making It Useful Beyond curl

curl works, but you're not going to curl your memory system at 2 a.m. when you have an idea, so the project has two integration paths.

Claude Code / Kiro-CLI skill. I added a native skill that auto-activates when relevant. Say "remember that Alice approved the Q3 budget" and it stores it without you needing to invoke anything. Ask "what did Alice say about the budget?" next week, and it checks memory before answering. It handles ingestion, queries, file uploads, and status checks through natural conversation. This is how I interact with the memory system most often, since I tend to live in CC/Kiro most of the time.

CLI. For terminal users or scripting:

python cli.py ingest "Paris is the capital of France." --source wikipedia

python cli.py query "What do you know about France?"

python cli.py consolidate

python cli.py status

python cli.py list --limit 10

The CLI talks to the same SQLite database, so you can mix API, CLI, and skill usage interchangeably. Ingest from a script, query from Claude Code, and check status from the terminal. It all hits the same store.


What's Next

The good news: the system works, and I'm using it today. But here are a few additions it could benefit from.

Importance-weighted query filtering. Right now, the query agent reads the N most recent memories. That means old but important memories can get pushed out by recent noise. I want to filter by importance score before building the context, but I'm not sure yet how aggressive to be. I don't want a high-importance memory from two months ago to vanish just because I ingested a bunch of meeting notes this week.
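
One possible approach, sketched below and definitely not implemented yet: blend the recency ranking with the importance score so old-but-important memories survive. The weighting is made up; tuning it is exactly the open question.

# Hypothetical selection: score each memory by importance plus a recency bonus.
def select_memories(memories, limit=50, recency_weight=0.3):
    # Most recent first, so index doubles as a recency rank.
    ranked = sorted(memories, key=lambda m: m["timestamp"], reverse=True)
    n = max(len(ranked) - 1, 1)
    scored = [
        (m["importance"] + recency_weight * (1 - i / n), m)
        for i, m in enumerate(ranked)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:limit]]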

Metadata filtering. Similarly, since each memory has associated metadata, I could use that metadata to filter out memories that are clearly irrelevant. If I'm asking questions about Alice, I don't need any memories that only involve Bob or Charlie. For my use case, this could be based on my note hierarchy, since I keep notes aligned to customers and/or specific projects.

Delete and update endpoints. The store is append-only right now. That's fine until you ingest something wrong and want to fix it. DELETE /memory/{id} is an obvious gap. I just haven't needed it badly enough yet to build it.

MCP integration. Wrapping this as an MCP server would let any Claude-compatible client use it as persistent memory. That's probably the highest-leverage item on this list, but it's also the most work.


Try It

The project is up on GitHub as part of an ongoing series I started, where I implement research papers, explore interesting ideas, and repurpose useful tools for Bedrock (https://github.com/ccrngd1/ProtoGensis/tree/main/memory-agent-bedrock).

It's Python with no exotic dependencies, just boto3, FastAPI, and SQLite.

The default model is `us.anthropic.claude-haiku-4-5-20251001-v1:0` (Bedrock cross-region inference profile), configurable via .env.

A note on security: the server has no authentication by default; it's designed for local use. If you expose it on a network, add auth first. The SQLite database will contain everything you've ever ingested, so treat it accordingly (chmod 600 memory.db is a good start).

If you're building personal AI tooling and stalling on the memory problem, this pattern is worth a look. Let me know if you decide to try it out, how it works for you, and which project you're using it on.


    About

Nicholaus Lawson is a Solution Architect with a background in software engineering and AI/ML. He has worked across many verticals, including Industrial Automation, Health Care, Financial Services, and Software companies, from start-ups to large enterprises.

This article and any opinions expressed by Nicholaus are his own and not a reflection of his current, past, or future employers or any of his colleagues or associates.

Feel free to connect with Nicholaus via LinkedIn at https://www.linkedin.com/in/nicholaus-lawson/


