    A Practical Guide to Memory for Autonomous LLM Agents

    By Editor Times Featured · April 17, 2026 · 14 min read


    I have been running a distributed multi-agent system in each of OpenClaw and AWS AgentCore for some time now. My OpenClaw setup alone has an analysis agent, a writing agent, a simulation engine, a heartbeat scheduler, and a number of others. They collaborate asynchronously, hand off context via shared files, and maintain state across sessions spanning days or even weeks.

    When I bring in other agentic systems like Claude Code or the agents I have deployed in AgentCore, coordination, memory, and state all become harder to solve for.

    Eventually, I came to a realization: most of what makes these agents actually work isn't the model choice. It's the memory architecture.

    So when I came across "Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers" (arXiv 2603.07670), I was curious whether its formal taxonomy matched what I had built by feel and iteration. It does, fairly closely. It also codifies a lot of what I had learned on my own and helped me see that some of my current pain points are not unique to me and are being observed more broadly.

    Let's walk through the survey and discuss its findings as I share my experiences.


    Why Memory Matters More Than You Think

    The paper leads with an empirical observation that should recalibrate your priorities if it hasn't already:

    "The gap between 'has memory' and 'doesn't have memory' is often larger than the gap between different LLM backbones."

    This is a huge claim. Swapping your underlying model matters less than whether your agent can remember things. I've felt this intuitively, but seeing it stated this plainly in a formal survey is useful. Practitioners spend enormous energy on model selection and prompt tuning while treating memory as an afterthought. That's backward.

    The paper frames agent memory within a Partially Observable Markov Decision Process (POMDP) structure, where memory functions as the agent's belief state over a partially observable world. That's a tidy formalization. In practice, it means the agent can't see everything, so it builds and maintains an internal model of what's true. Memory is that model. Get it wrong, and every downstream decision degrades.


    The Write-Manage-Read Loop

    The paper characterizes agent memory as a write-manage-read loop, not just "store and retrieve."

    • Write: New information enters memory (observations, outcomes, reflections)
    • Manage: Memory is maintained, pruned, compressed, and consolidated
    • Read: Relevant memory is retrieved and injected into the context
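    The loop can be sketched as a minimal interface. The class and method names here are my own illustration, not from the paper, and the keyword-overlap scoring stands in for real embedding retrieval:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    text: str
    kind: str  # "observation" | "outcome" | "reflection"


@dataclass
class MemoryStore:
    records: list = field(default_factory=list)

    def write(self, text, kind="observation"):
        # Write: new information enters memory.
        self.records.append(MemoryRecord(text, kind))

    def manage(self, max_records=100):
        # Manage: prune; here we simply drop the oldest records beyond a cap.
        if len(self.records) > max_records:
            self.records = self.records[-max_records:]

    def read(self, query, k=3):
        # Read: naive keyword-overlap relevance; real systems use embeddings.
        words = query.lower().split()
        scored = sorted(
            self.records,
            key=lambda r: sum(w in r.text.lower() for w in words),
            reverse=True,
        )
        return [r.text for r in scored[:k]]
```

    Even at this toy scale, the three phases are distinct methods, which is the point: each one can fail independently.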

    Most implementations I see nail "write" and "read" and completely neglect "manage." They accumulate without curation. The result is noise, contradiction, and bloated context. Managing is the hard part, and it's where most systems struggle or outright fail.

    Before the latest OpenClaw improvements, I handled this with a heuristic management policy: rules for what to store, what to summarize, when to escalate to long-term memory, and when to let things age out. It's not elegant, but it forces me to be explicit about the management step rather than ignoring it.
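    A heuristic policy of that shape is straightforward to make explicit. This sketch is illustrative; the thresholds and action names are mine, not OpenClaw's actual rules:

```python
def manage_memory(entry):
    """Decide what to do with a new memory entry.

    Illustrative heuristic rules: escalate important items to long-term
    memory, summarize long ones, and let trivial ones age out.
    """
    importance = entry.get("importance", 0)  # assumed 0-10 scale
    if importance >= 8:
        return "escalate_to_long_term"
    if len(entry.get("text", "")) > 2000:
        return "summarize"
    if importance <= 2:
        return "age_out"
    return "store"
```

    The value is not in the specific thresholds; it is that the management decision exists at all, in one reviewable place.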

    In other systems I build, I often rely on mechanisms such as AgentCore short-term/long-term memory, vector databases, and dedicated agent memory systems. The file-based memory approach doesn't scale well for large, distributed systems (though for single agents or chatbots, it's not off the table).


    Four Temporal Scopes (And Where I See Them in Practice)

    The paper breaks memory into four temporal scopes.

    Working Memory

    This is the context window.

    It's ephemeral, high-bandwidth, and limited. Everything lives here briefly. The failure mode is attentional dilution and the "lost in the middle" effect, where relevant content gets ignored because the window is too crowded. I've hit this, as have most of the teams I've worked with.

    When the context in OpenClaw, Claude Code, or your chatbot gets long, agent behavior degrades in ways that are hard to debug, because the model technically "has" the information but isn't using it. The most common remedy I see from teams (and use myself) is to create new threads for different chunks of work. You don't keep Claude Code open all day while working on 20+ different JIRA tasks; it degrades over time and performs poorly.

    Episodic Memory

    This captures concrete experiences: what happened, when, and in what sequence.

    In my OpenClaw instance, this is the daily standup logs. Each agent writes a brief summary of what it did, what it learned, and what it escalated. These accumulate as a searchable timeline. The practical value is huge: agents can look back at yesterday's work, spot patterns, and avoid repeating failures. Tools like Claude Code struggle here unless you set up instructions to force the behavior.

    Production agents can leverage things like AgentCore's short-term memory to keep these episodic memories. There are even mechanisms to decide what deserves to be persisted beyond a single interaction.

    The paper validates this as a distinct and important tier.

    Semantic Memory

    This holds abstracted, distilled knowledge: facts, heuristics, and learned conclusions.

    In my OpenClaw setup, this is the MEMORY.md file in each agent's workspace. It's curated. Not everything goes in. The agent (or I, periodically) decides what's worth preserving as a lasting fact versus what was merely situational.

    In AgentCore Memory, this is essentially the long-term memory feature.

    This curation step is critical; without it, semantic memory becomes a junk drawer.
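    The curation gate can start as a simple predicate that rejects situational phrasing before a candidate fact reaches MEMORY.md. This is a deliberately crude illustration; a real gate would lean on an LLM judgment or periodic human review rather than keyword markers:

```python
# Phrases that suggest a fact is situational rather than durable (illustrative).
SITUATIONAL_MARKERS = ("today", "currently", "right now", "this session")


def is_durable_fact(candidate):
    """Crude curation gate: reject facts phrased in situational terms."""
    lowered = candidate.lower()
    return not any(marker in lowered for marker in SITUATIONAL_MARKERS)


def curate(candidates):
    """Keep only candidates worth persisting as semantic memory."""
    return [c for c in candidates if is_durable_fact(c)]
```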

    Procedural Memory

    This encodes executable skills, behavioral patterns, and learned conduct.

    In OpenClaw, this maps largely to the AGENTS.md and SOUL.md files, which contain persona instructions, behavioral constraints, and escalation rules. When the agent reads these at the start of a session, it's loading procedural memory. These should be updated based on user feedback, or even through "dream" processes that analyze past interactions.

    This is an area I've been remiss in (as have the teams I've worked with). I spend time tuning a prompt, but the feedback mechanisms that drive the storage of procedural memory and the iteration on these personas often get left out.

    The paper formalizes this as a distinct tier, which I found validating. These aren't just system prompts. They're a form of long-term learned behavior that shapes every action.


    Five Mechanism Families

    Now that we have some common definitions around the types of memories, let's dive into memory mechanisms.

    Context-Resident Compression

    This covers sliding windows, rolling summaries, and hierarchical compression: the "stay in context" strategies. Rolling summaries are seductive because they feel clean (they're not; I'll get to why in a moment).

    I'm sure everyone has run into Claude Code or Kiro CLI compacting a conversation when it gets too big for the context window. Oftentimes, you're better off restarting in a new thread.
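    A rolling-summary strategy looks roughly like this, where `summarize` is a placeholder for an LLM call:

```python
def compress_history(messages, window, summarize):
    """Keep the last `window` messages verbatim and fold everything
    older into a single rolling summary (the "stay in context" strategy).
    `summarize` is a stand-in for an LLM summarization call."""
    if len(messages) <= window:
        return messages
    older, recent = messages[:-window], messages[-window:]
    return ["[summary] " + summarize(older)] + recent
```

    Every pass through `summarize` discards detail; compounding that loss across many passes is exactly the summarization drift covered under failure modes below.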

    Retrieval-Augmented Stores

    This is RAG applied to agent interaction history rather than static documents. The agent embeds past observations and retrieves by similarity. This is powerful for long-running agents with deep history, but retrieval quality becomes a bottleneck fast. If your embeddings don't capture semantic intent well, you'll miss relevant memories and surface stale ones.

    You also run into issues where questions like "what happened last Monday" don't retrieve quality memories.
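    A minimal version of similarity retrieval over history shows why temporal questions need metadata, not embeddings alone. Toy two-element vectors stand in for real embeddings, and the `on_day` filter is my own illustration:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, memories, k=3, on_day=None):
    """Similarity retrieval over agent history. A question like
    "what happened last Monday" needs the metadata filter (on_day),
    because embedding similarity alone carries no notion of time."""
    pool = [m for m in memories if on_day is None or m["day"] == on_day]
    pool.sort(key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return pool[:k]
```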

    Reflective Self-Improvement

    This includes systems such as Reflexion and ExpeL, where agents write verbal post-mortems and store the conclusions for future runs. The idea is compelling: agents learn from mistakes and improve. The failure mode is severe, though (we'll cover it in more detail in a minute).

    I believe other "dream"-based reflection approaches and systems like the Google Memory Agent pattern belong in this category as well.

    Hierarchical Digital Context

    A MemGPT’s OS-inspired structure (see GitHub repo also). A important context window is “RAM”, a recall database is the “disk”, and archival storage is “chilly storage”, whereas the agent manages its personal paging. Whereas this class is fascinating, the overhead/work of sustaining these separate tiers is burdensome and tends to fail.

    The MemGPT paper and git repo are each nearly 3 years outdated, and I’ve but to see any precise use in manufacturing.

    Policy-Learned Management

    This is a frontier approach in which models learn to invoke RL-trained memory operators (such as store, retrieve, update, summarize, and discard) optimally. I think there is a lot of promise here, but I haven't seen real harnesses for developers to use, or any actual production use.


    Failure Modes

    We've covered the types of memories and the mechanisms that make them. Next is how these can fail.

    Context-Resident Failures

    Summarization drift occurs when you repeatedly compress history to fit it within a context window. Each compression or summarization throws away details, and eventually you're left with memory that doesn't really match what happened. Again, you see this in Claude Code and Kiro CLI when coding sessions cover too many features without creating new threads. One way I've seen teams combat this is to keep raw memories linked to the summarized/consolidated memories.
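    Keeping raw records linked to their summaries can be as simple as storing source IDs alongside each summary. A sketch, with `summarize` again standing in for an LLM call:

```python
from dataclasses import dataclass


@dataclass
class Summary:
    text: str
    source_ids: list  # back-links to the raw records this summary condenses


def summarize_with_links(raw, ids, summarize):
    """Compress a set of raw records but keep back-links so the
    original detail can be re-expanded if the summary drifts."""
    return Summary(text=summarize([raw[i] for i in ids]), source_ids=ids)


def expand(summary, raw):
    """Recover what actually happened from the linked raw records."""
    return [raw[i] for i in summary.source_ids]
```

    The back-links cost almost nothing to store and give you an escape hatch when a summary stops matching reality.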

    Attention dilution is the other failure mode in this class. Even if you can keep everything in context (as with the new 1 million-token windows), larger prompts "lose" information in the middle. While agents technically have all the memories, they can't focus on the right parts at the right time.

    Retrieval Failures

    Semantic vs. causal mismatch occurs when similarity searches return memories that seem related but aren't. Embeddings are great at identifying when two pieces of text "look like" each other, but terrible at identifying "this is the cause." In practice, I often see this when debugging with coding assistants. They see similar errors but can miss the underlying cause, which often leads to thrashing: lots of changes, but never fixing the real issue.

    Memory blindness occurs in tiered systems when important facts never resurface. The information exists, but the agent never sees it again. This can happen because a sliding window has moved on, or because you only retrieve 10 memories from a data source and what you need would have been the 11th.
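    One mitigation I've seen is pinning critical facts so they bypass the top-k cutoff entirely. A sketch; the `pinned` flag and the scoring callback are my own illustration:

```python
def read_with_pins(memories, score, k=10):
    """Top-k retrieval plus always-include pinned facts, so a critical
    memory cannot be silently cut off as the "11th" result."""
    pinned = [m for m in memories if m.get("pinned")]
    rest = sorted(
        (m for m in memories if not m.get("pinned")),
        key=score,
        reverse=True,
    )
    # Pinned facts consume part of the budget; the rest is filled by score.
    return pinned + rest[: max(0, k - len(pinned))]
```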

    Silent orchestration failures are the most dangerous in this class. Paging, eviction, or archival policies do the wrong things, but no errors are thrown (or they are lost in the noise by the autonomous system or by the humans running it). The only symptom is that responses get worse, more generic, and less grounded. While I've seen this arise in a number of ways, the most recent for me was when OpenClaw failed to write daily memory files, so the daily stand-ups/summarizations had nothing to work with. I only noticed because it kept forgetting things we worked on during those days.

    Knowledge-Integrity Failures

    Staleness is probably the most common. The outside world changes, but your system's memory doesn't. Addresses, system states, user preferences, and anything else your system relies on to make decisions can drift over time. Long-lived agents will act on facts from 2024 even in 2026 (who hasn't seen an LLM insist the date is wrong, the wrong President is in office, or that the latest technology hasn't actually hit the scene yet?).

    Self-reinforcing errors (confirmation loops) occur when a system treats a memory as ground truth, but that memory is wrong. While you generally want systems to learn and build a new basis of truth, if a system creates a bad memory, its whole view of the world is affected. My OpenClaw instance once decided that my SmartThings integration with Home Assistant was faulty; therefore, all information from any SmartThings device was deemed inaccurate, and it ignored everything from it (in fact, there were just a few dead batteries in my system).

    Over-generalization is a quieter version of self-reinforcement. Agents learn a lesson in a narrow context, then apply it everywhere. A workaround for a single customer or a single error becomes a default pattern.

    Environmental Failures

    Contradiction handling can be incredibly frustrating. As new information is collected, if it conflicts with existing information, systems can't always determine the actual truth. In my OpenClaw system, I asked it to create some N8N workflows. They were all created correctly, but the action timed out, so the agent thought it had failed. I verified the workflows existed, told my OpenClaw agent to remember that, and it agreed. Yet for the next several interactions, the agent oscillated between believing the workflows were available and believing the setup had failed.


    Design Tensions

    There is going to be push-and-pull among all of these concerns for agents and memory systems.

    Utility vs. Efficiency

    Better memory usually means more tokens, more latency, more storage, and more systems.

    Utility vs. Adaptivity

    Memory that's useful now will be stale in the future. Updating is expensive and risky.

    Adaptivity vs. Faithfulness

    The more you update, revise, and compress, the more you risk distorting what actually happened.

    Faithfulness vs. Governance

    Accurate memory may contain sensitive information (PHI, PII, and so on) that you may be required to delete, obfuscate, or protect.

    All of the Above vs. Governance

    Enterprises have complex compliance requirements that can conflict with all of these.


    Practical Takeaways for Builders

    Engineering teams often ask me for the best memory system, or where they should start their journey. Here's what I say.

    Start with explicit temporal scopes

    Don't build "memory." When you need episodic memory, build it. When your use case grows and needs semantic memory, build it. Don't try to find one system that does it all, and don't build every type of memory before you need it.

    Take the management step seriously

    Plan how to maintain your memory. Don't plan on accumulating indefinitely; figure out whether you need compression or memory-consolidation/"dream" behavior. How will you decide what goes into semantic memory versus RAG memory? How do you handle updates? Without working these out, you'll accumulate noise, get contradictions, and your system will degrade.

    Keep raw episodic records

    Don't just rely on summaries; they can drift or lose details. Raw records let you go back to what actually happened and pull it in when necessary.

    Version reflective memory

    To help avoid contradictions among summaries, long-term memories, and compressions, add timestamps or versions to each. This helps your agents determine what's true and which entry is the most accurate reflection of the system.
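    Versioning can be as light as appending timestamped entries per fact instead of overwriting. A sketch, with a simple latest-version-wins read:

```python
from datetime import datetime, timezone


def record_fact(store, key, value):
    """Append a new timestamped version of a fact instead of overwriting."""
    versions = store.setdefault(key, [])
    versions.append({
        "value": value,
        "ts": datetime.now(timezone.utc).isoformat(),
        "version": len(versions) + 1,
    })


def current_fact(store, key):
    """Latest version wins when versions contradict each other."""
    return store[key][-1]["value"]
```

    Because old versions are retained, an agent (or a human) can always inspect how a belief changed over time instead of silently losing the history.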

    Treat procedural memory as code

    In OpenClaw, your AGENTS.md, MEMORY.md, persona files, and behavioral configs are all part of your memory architecture. Review them and keep them under source control so you can see what changes and when. This is especially important if your autonomous system can alter them based on feedback.


    Wrap-up

    The write-manage-read framing is the most useful takeaway from this paper. It's simple, it's complete, and it forces you to think about all three phases instead of just "store stuff, retrieve stuff."

    The taxonomy maps surprisingly well to what I built in OpenClaw through iteration and frustration. That's either validating or humbling, depending on how you look at it (probably both). The paper formalizes patterns that practitioners have been discovering independently, which is what a good survey should do.

    The open-problems section is honest about how much remains unsolved. Evaluation is still primitive. Governance is often ignored in practice. Policy-learned management is promising but immature. There's a lot of runway here.

    Memory is where the real differentiation happens in agent systems. Not the model, not the prompts. The memory architecture. The paper gives you a vocabulary and a framework to think more clearly about it.

    About

    Nicholaus Lawson is a Solution Architect with a background in software engineering and AI/ML. He has worked across many verticals, including industrial automation, health care, financial services, and software companies ranging from start-ups to large enterprises.

    This article and any opinions expressed by Nicholaus are his own and not a reflection of his current, past, or future employers or any of his colleagues or associates.

    Feel free to connect with Nicholaus via LinkedIn at https://www.linkedin.com/in/nicholaus-lawson/


