of OpenAI) posted a GitHub gist earlier this year.
It's called "LLM Wiki." About 1,500 words. It describes a pattern where you build a personal wiki that an LLM maintains for you: a persistent, compounding artifact that gets richer every time you add to it.
Knowledge compiled once and kept current, rather than re-derived from scratch on every query.
Most people probably read it, thought "that's interesting," and closed the tab.
I built it. This article shows how to set it up, and what I learned during implementation.
Every conversation starts blank.
You open a chat, explain who you are, what you're working on, what you decided last week. You get a useful response. You close the tab. Tomorrow you do it again.
The tool works fine, but the context layer beneath it is missing.
It's true that built-in memory helps a little.
Claude remembers your name and job title. ChatGPT knows you as a list of bullet points. But neither knows the specifics of your active projects, the deal you're about to close, the vendor you ruled out last month, or what happened in your pipeline this week.
That kind of operational state doesn't live anywhere persistent.
The option most engineers reach for next is RAG.
RAG is genuinely useful, but it's solving a different problem.
It re-derives knowledge from scratch on every query. You embed documents, retrieve chunks at query time, and hope the right fragments surface. Nothing accumulates.
A question that requires synthesising five documents means the LLM has to find and reassemble those fragments every single time.
The vault approach in this article compiles knowledge once and keeps it current. When you add something new, the LLM indexes it, reads it, integrates it, updates related pages, flags contradictions, and maintains cross-references.
The synthesis is already done before you ask your next question.
Karpathy puts it cleanly: the wiki is a persistent, compounding artifact.
The cross-references are already there. The analysis doesn't disappear into chat history. It builds.
Hey there! My name is Sara and I cover practical AI building every week on Learn AI. Tools, patterns, and what actually breaks in production. Free to subscribe.
The architecture: two folders and a schema file
The core structure fits in a single directory tree:
vault/
├── CLAUDE.md ← schema file, entry point for any AI
├── Raw/ ← immutable source documents
│   ├── Meeting Notes/
│   ├── Documents/
│   └── _pending.md ← compilation queue
└── Wiki/ ← LLM-generated, structured, indexed
    ├── Projects/
    ├── People/
    ├── Decisions/
    ├── _hot.md ← active cache
    ├── _log.md ← audit trail
    └── _index.md ← master index
(This is just an example. Feel free to customise it.)
Raw is your source of truth.
Meeting transcripts, exported Slack threads, documents pulled from wherever your work actually happens. The rule is absolute: the AI reads Raw, never edits it. Append-only.
Wiki is what the AI builds and maintains. One file per project, person, decision, or domain area. Structured, cross-referenced. This is what the AI reads first when you ask a question.

If you've worked with data pipelines, this split is familiar. Raw is your landing zone. Wiki is your curated layer. If Wiki drifts or gets corrupted, you rebuild from Raw. You never lose the source.
The schema file sits at the root and tells any AI how the vault is organised, what to read first, and what the operating rules are. I call it CLAUDE.md. If you're using Codex, AGENTS.md works. Name it anything, as long as you point the AI to it at the start of every session.
This is the part most implementations skip, and it's why most implementations quietly die.
A folder of markdown files is not a system. These three files make it one.

_hot.md is the cache. Every morning, the daily automation rewrites this file with the most active threads, any key numbers or deadlines that surfaced, and one line on anything urgent. It stays under 500 tokens. When you open a conversation and want a fast briefing, the AI reads _hot.md first, with no need to load the full Wiki.
_pending.md is the queue. Every time a new file lands in Raw, its filename and date get appended here. When the weekly compilation runs, it reads this file, processes each entry, compiles it into Wiki, and marks it [COMPILED — 2026-05-01]. Without this file, the daily ingest and the weekly compilation can't coordinate. You get orphaned raw files and a Wiki that's weeks behind.
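To make the queue mechanics concrete, here is a minimal Python sketch of the two halves of that handshake. The exact entry format is my assumption, not the author's; only the convention matters (ingest appends, compilation tags).

```python
# Sketch of the _pending.md queue handshake: daily ingest appends an
# entry, weekly compilation tags it [COMPILED — date]. The line format
# used here is illustrative, not prescribed by the article.
from datetime import date
from pathlib import Path

PENDING = Path("Raw/_pending.md")

def enqueue(filename: str) -> None:
    """Daily ingest: append a newly landed file to the queue."""
    with PENDING.open("a", encoding="utf-8") as f:
        f.write(f"- {date.today().isoformat()} {filename}\n")

def mark_compiled(filename: str) -> None:
    """Weekly compilation: tag the entry once its Wiki page exists."""
    lines = PENDING.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if filename in line and "[COMPILED" not in line:
            lines[i] = f"{line} [COMPILED — {date.today().isoformat()}]"
    PENDING.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

In practice the LLM does this bookkeeping itself from the prompt; the point of the sketch is that the coordination is just two append/tag operations on one small file.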
_log.md is the audit trail. Every automated run appends a timestamped entry: what ran, which files were processed, which Wiki pages were created or updated. If the system drifts, this is how you find where. Karpathy's gist has a useful tip here: start each log entry with a consistent prefix like ## [2026-05-01] daily-ingest so the whole log is grep-parseable with basic unix tools.
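That prefix convention pays off immediately. Assuming entries follow the ## [date] job-name shape, plain unix tools can answer "what ran, and how often?" without involving an LLM at all:

```shell
# List every automated run recorded in the audit trail.
grep '^## \[' Wiki/_log.md

# Count runs per job name, to spot a cadence that silently stopped.
grep '^## \[' Wiki/_log.md | awk '{print $3}' | sort | uniq -c
```

If the daily-ingest count stops climbing, the pipeline broke somewhere and the vault is going stale.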
A vault without these files accumulates dust. With them, you have a working pipeline.
The schema file: teaching any AI how to read your vault
CLAUDE.md is the entry point. Every session starts here.
What goes in it:
- The folder map (what's in Raw, what's in Wiki, what each subdirectory is for)
- Read order (_hot.md always first, then the relevant domain index)
- Hard rules: "never edit files in Raw/", "never invent facts not present in source files", "always append to _log.md after every run"
- Domain structure (which indexes exist, how they're named)
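Put together, a minimal CLAUDE.md along those lines might look like this. The folder names come from the example tree above; the domain-index path is illustrative, and you should adapt all of it to your own vault:

```markdown
# Vault schema (example)

## Map
- Raw/: immutable sources. Meeting Notes/, Documents/, _pending.md (queue).
- Wiki/: compiled knowledge. Projects/, People/, Decisions/,
  plus _hot.md (cache), _log.md (audit trail), _index.md (master index).

## Read order
1. Wiki/_hot.md, always first.
2. The relevant domain index (e.g. Wiki/Projects/_index.md).

## Hard rules
- Never edit files in Raw/.
- Never invent facts not present in source files.
- Always append to _log.md after every run.
```

A page of this is enough; the goal is that any model, pointed at the file cold, knows where things live and what it must not touch.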

The schema file is also where you encode your prompting defaults. I use a well-known pattern, adapted straight into the schema:
I want to [TASK] so that [WHAT SUCCESS LOOKS LIKE].
First, read the uploaded files completely before responding.
DO NOT start executing yet. Ask me clarifying questions so we
can refine the approach together.
Only begin work once we have aligned.
With this integrated into your schema, any AI that reads your vault already knows to ask before executing. You stop getting half-baked output from a model that assumed it understood the task.
The prompting philosophy worth encoding explicitly:
- Context beats prompts. Feed the AI files, not instructions.
- Examples beat prescriptions. Show what you want, don't describe it.
- Constraints beat rules. Say what the output is NOT, and let the AI choose how.
- Goals beat instructions. Say what to achieve, not how.
- State the task and the success criteria. Two sentences.
The automation layer: three cadences, not one
Two failure modes I've seen: you update the vault manually and it's fine for a week, then life happens and it's been three weeks since anything got filed.
Or you build one big automated job that ingests, synthesises, and audits in one pass, and now your daily ingest is editing Wiki files it should never touch.
The fix is to split the jobs by cadence.
Daily (weekday mornings): ingestion only
Pull from your sources. Drop new files into Raw/. Queue them in _pending.md. Rewrite _hot.md based on what surfaced.
No Wiki edits. The daily job is mechanical, fast, and safe enough to run unattended every day.

Here's what the prompt looks like in practice:
Every weekday morning, do the following:
1. Check [your project management tool] for items updated or
created in the last 24 hours.
2. Check [your meeting notes source] for new transcripts. For
each one found, save it as a markdown file in Raw/Meeting Notes/
using the format YYYY-MM-DD — [meeting title].md.
Add a line to Raw/_pending.md with the filename and date.
3. Check [your team communication tool] for messages in key
channels. Extract decisions, action items, and anything
that affects an active project.
4. Check [your email] for flagged or important messages.
Summarize what needs attention.
After completing the above, rewrite Wiki/_hot.md with:
- The most active threads or open decisions from today's scan
- Any key numbers or deadlines that surfaced
- One line on anything urgent
Keep _hot.md under 500 tokens.
Replace the bracketed placeholders with your actual tools. The structure works whether you're pulling from Linear and Slack, Notion and email, or anything else.
Weekly (Monday mornings): compilation
Read _pending.md. For each unprocessed file, read it in full, create a structured Wiki page in the right domain folder, update the relevant index, add backlinks to related pages, and mark the entry compiled.

The weekly job does interpretation. It synthesises raw content into structured knowledge. It's slower, more expensive, and worth reviewing occasionally to check that the AI is filing things correctly.
Monthly (1st of the month): linting
Health check only. Scan the entire Wiki for stale pages (dates or statuses that newer content has superseded), missing backlinks, contradictions between pages, coverage gaps, and orphaned pages not referenced in any index.
Write a report file. Post a plain-English summary. Don't auto-fix anything.
The monthly job never touches Wiki content directly. That boundary is what makes it safe to run without supervision.
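One of those lint checks, the orphaned-page scan, is simple enough to sketch directly. This is a hypothetical illustration, not the author's implementation: it assumes pages are referenced by filename in _index.md files and that infrastructure files are underscore-prefixed, and it only reports, never edits.

```python
# Hypothetical sketch of the monthly orphan check: find Wiki pages
# that no _index.md mentions. Report-only; nothing is modified.
from pathlib import Path

def find_orphans(wiki: Path) -> list[str]:
    """Return paths of pages not referenced in any index file."""
    index_text = "\n".join(
        p.read_text(encoding="utf-8") for p in wiki.rglob("_index.md")
    )
    orphans = []
    for page in sorted(wiki.rglob("*.md")):
        if page.name.startswith("_"):
            continue  # _hot.md, _log.md, and indexes are infrastructure
        if page.name not in index_text:
            orphans.append(str(page.relative_to(wiki)))
    return orphans
```

The real monthly prompt asks the LLM to do this plus the fuzzier checks (staleness, contradictions) that need actual reading; the mechanical parts are this small.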

Each cadence has a different risk tolerance: daily is mechanical, weekly does interpretation, and monthly does evaluation. Mixing them in a single job is how vaults get corrupted.
On tooling: any system with scheduling works here. A cron job with an MCP-enabled CLI, n8n, or an AI desktop application that supports scheduled tasks.
The prompts above are the logic. The runner is interchangeable.
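For the cron route, the wiring can be three lines. This crontab is a sketch under one stated assumption: vault-run is a placeholder for whatever command actually feeds a stored prompt to your MCP-enabled CLI.

```
# Example crontab: one job per cadence, never mixed.
# "vault-run" is a hypothetical wrapper around your prompt-running CLI.
0 7 * * 1-5  vault-run ~/vault/prompts/daily-ingest.md     # weekday ingest
0 8 * * 1    vault-run ~/vault/prompts/weekly-compile.md   # Monday compilation
0 9 1 * *    vault-run ~/vault/prompts/monthly-lint.md     # 1st-of-month lint
```

Keeping each cadence as its own scheduled entry is what preserves the risk boundaries described above: if the weekly job misbehaves, the daily ingest keeps landing files untouched.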
What actually changes
You stop re-explaining yourself, and the conversations shift in character.
When context is already loaded, you stop using AI for isolated questions and start using it for actual work.

The AI knows your open projects, your recent decisions, your team. You ask "what should I prioritise today?" and it reads _hot.md plus your project files and gives you a grounded answer.
Portability is the other thing.
Your context lives in a folder on your machine, not inside any AI's memory system. Point a different AI at the same folder and it reads the same files. Switch tools whenever you want. The vault travels.
A few failure modes worth knowing before you build:
_pending.md backs up if the daily ingest is too broad and the weekly compilation can't drain it fast enough. Tighten what you pull in daily.
Wiki drifts if nobody reads _log.md. The monthly linter catches this, but only if you actually read the report.
The whole system breaks if automation ever touches Raw. One job that writes to Raw "just this once" and you've lost the source-of-truth guarantee. That boundary doesn't bend.
The tedious part of maintaining a knowledge base isn't the reading or the thinking.
It's the bookkeeping: updating cross-references, keeping summaries current, noting when new data contradicts old claims. Humans abandon wikis because the maintenance burden grows faster than the value.
LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in a single pass.
Karpathy traces this back to Vannevar Bush's Memex concept from 1945: a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became. The part he couldn't solve was who does the maintenance.
The vault I've been running uses Claude as the AI layer and a markdown tool as the front end.
The pattern works with any AI that reads files and any scheduler that can run a prompt on a clock. The folder is just a folder. The files are just text.
You set this up once. After that, your AI stops starting from zero.
Thanks for reading!

