    Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost

    By Editor Times Featured · April 5, 2026 · 23 Mins Read


    The recent launch of PageIndex is part of a broader shift in AI architecture toward "Vectorless RAG" or "Reasoning-Based Retrieval." Instead of the standard method of splitting documents into arbitrary chunks and searching via mathematical similarity, PageIndex builds a "Smart Table of Contents" (a hierarchical tree) that lets LLMs navigate documents the way a human expert would. Numerous blogs (including one from Microsoft) outline the working principles (no vector database, no chunking, enhanced explainability) along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is best suited to deep-dive queries on complex structured or semi-structured documents (such as financial statements), rather than searching across many independent documents, such as customer support knowledge bases, where we should continue to use Vector RAG.

    Why is that?

    If Vectorless RAG using PageIndex delivers better (or at least as good) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex's tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index, a prerequisite ingestion step, is slow and expensive to build with an LLM. Moreover, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response synthesis step, again with the LLM.

    In comparison, building a vector index is fast and cheap, and retrieval uses an LLM only once, during the synthesis step. Also, ingestion with an embedding model costs far less than summarization of the full document by an LLM.

    What if you could get the excellent structure-aware reasoning accuracy of Vectorless RAG, together with the low latency and cost of Vector RAG, in a way that scales across an enterprise database? In this article, I'll walk through a real use case on a large, complex document to build Proxy-Pointer RAG, an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we'll explore and demonstrate the following:

    • Why exactly is PageIndex so accurate, and why is it difficult to practically scale the concept to multi-document knowledge bases?
    • A quick comparison of Vectorless RAG using PageIndex vs. flat Vector RAG to establish a baseline.
    • How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
    • A comparison across a variety of queries using PageIndex and Proxy-Pointer to test the quality of retrievals.

    Use Case Setup

    We'll use a World Bank report named South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). This is a 131-page report comprising several chapters, complex charts, tables, content in boxes, etc., and it is a good candidate for PageIndex to demonstrate its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a markdown file using the Adobe PDF Extract API, but any other method that preserves the integrity of tables, charts, etc. (such as a VLM) would work just as well. For the vector database, FAISS is used.

    How does PageIndex work?

    Rather than the "chunk your document, embed the chunks, retrieve the top-K, feed them to an LLM" pipeline of Vector RAG, PageIndex takes a radically different approach to document retrieval. Instead of treating a document as a flat sequence of chunks, it builds a semantic skeleton tree, a hierarchical map of every section, sub-section, and content block in the document, and then uses an LLM to navigate that tree at query time.

    Phase 1: Indexing (once per document)

    PageIndex parses the document's heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:

    • A title (extracted from the heading)
    • A node ID (a unique identifier like 0012)
    • Line boundaries (start and end line in the source document)
    • A summary (generated by an LLM; this is the expensive and time-consuming part)

    The result is a JSON that looks like this:

    {
      "node_id": "0011",
      "title": "Chapter 1. Deceptive Strength",
      "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
      "line_num": 621,
      "nodes": [
        {
          "node_id": "0012",
          "title": "Introduction",
          "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
          "line_num": 625
        },
        ...
      ]
    }

    Phase 2: Retrieval (per query)

    When a user asks a question, PageIndex hands the entire tree of summaries to an LLM and says, "Which nodes contain the answer?". This is unlike Vector RAG, which relies on mathematical similarity between query and chunk embeddings to build the relevant context.

    The LLM reads the summaries, not the full text, and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the exact, contiguous, full section from the original markdown file and passes it to the synthesis LLM.
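    To make those mechanics concrete, here is a minimal sketch in Python. The field names follow the JSON shown above; the function names are illustrative, and the LLM call itself is elided, since only the tree-flattening and line-slicing steps matter here:

    ```python
    def collect_summaries(node, rows=None):
        """Flatten the tree into (node_id, title, summary) rows that
        form the LLM's navigation prompt."""
        rows = [] if rows is None else rows
        rows.append((node["node_id"], node["title"], node.get("summary", "")))
        for child in node.get("nodes", []):
            collect_summaries(child, rows)
        return rows

    def slice_section(source_lines, start_line, end_line):
        """Once the LLM returns node IDs, slice the full contiguous
        section between the node's line boundaries."""
        return "\n".join(source_lines[start_line:end_line])
    ```

    In a real pipeline, the rows from `collect_summaries` would be serialized into the navigation prompt, and the returned node IDs mapped back to line boundaries before calling `slice_section`.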

    Why does this work so well?

    PageIndex excels due to three architectural advantages:

    1. Structural Navigation, Not Pattern Matching

    When we ask "What are the main messages of Chapter 1?", PageIndex doesn't search for chunks containing those phrases. It reads the summary of node 0011 ("Chapter 1. Deceptive Strength"), which says "Covers growth outlook, inflation, financial vulnerabilities, climate risks, and policy challenges", and immediately knows this is the right node. It reasons about relevance, not semantic or lexical similarity.

    2. Contiguous Context Extraction

    Once the right nodes are identified, PageIndex extracts the full, unbroken section that the node represents from the original Markdown: headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.

    3. Zero Chunk Boundary Artifacts

    There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Each piece of context has a natural beginning (the section header) and a natural end (the next section's start). This dramatically reduces hallucination from ambiguous context.

    However, this power comes at a cost, and this is where the approach starts to break down.

    Why is this difficult to scale?

    The reasons are cost and latency (at ingestion as well as retrieval time). For our 131-page report, we get 137 structural nodes in the tree, and PageIndex therefore needs 137 LLM calls just for the indexing phase: one summary per node. For a corpus of 50 documents, that's ~7,000 LLM calls before we have used it to answer a single question.

    Here is a comparative cost of PageIndex vs. a Vector Index:

    | Metric | PageIndex | Standard Vector RAG |
    |---|---|---|
    | Indexing LLM calls | ~137 per document | 0 |
    | Indexing time | 5-10 minutes per document | ~30 seconds per document |
    | Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
    | Multi-document support | Requires per-document tree traversal during retrieval | Single unified index, fast and efficient |
    | Retrieval quality | ★★★★★ | ★★★☆☆ |

    For a single critical document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base on customer-service SOPs that updates weekly? It's a non-starter.

    This is where PageIndex starts becoming impractical for real systems at scale.

    Before proposing an alternative, let's establish a baseline: how much better is PageIndex compared to standard Vector RAG?

    Comparison of Vectorless vs. Flat Vector RAG

    To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat Vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:

    Query about table content

    Here is the response from Vectorless RAG:

    === VECTORLESS RAG INTERACTION ===
    Question: what does annexure table a2.1.1 say
    Nodes Retrieved: 0125, 0131
    Response:
    Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers used to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).

    The table catalogs 25 specific studies, detailing the following information for each:
    *   **Paper:** The author(s) and publication source.
    *   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
    *   **Sample coverage:** The geographical or organizational scope (e.g., International, Europe, South Asia, Africa, Asia).
    *   **Region:** The specific region analyzed.
    *   **Economies:** The specific countries included in the study sample.

    According to the provided text, the articles listed in this table were selected based on specific criteria: they focus on the growth elasticity of employment using country-level estimates. The selection excludes articles that present only subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and the unemployment elasticity of growth (Annex 2.1).

    And the same using Vector RAG, which completely misses the content:

    === VECTOR RAG INTERACTION ===
    Question: what does annexure table a2.1.1 say
    Retrieved 5 chunks.
    Response:
    Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."

    The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:

    *   **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "an extensive literature search was conducted of studies on the elasticity of employment with respect to output growth."
    *   **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates on the growth elasticity of total employment."
    *   **Annex Table A2.3.3:** Provides specific regression data regarding the robustness tests for the female employment ratio.

    There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.

    Query requiring structural knowledge of the document

    The query is "what are the questions answered by chapter 2?" Here is the response from Vectorless RAG. It precisely located Chapter 2 and the "Questions" section within it, and reproduced it verbatim along with the grounding information for explainability.

    === VECTORLESS RAG INTERACTION ===
    Question: what are the questions answered by chapter 2
    Nodes Retrieved: 0098
    Response:
    Based on the provided excerpts, Chapter 2 addresses the following questions:

    *   **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?**
    *   **What are the correlates of long-run employment ratios among EMDEs?**
    *   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?**

    These questions are explicitly listed in the section titled **"Questions."**

    The chunks in Vector RAG don't preserve the structural hierarchy and section integrity, so while it locates the "Questions" section, it is not able to find the specific questions in it.

    === VECTOR RAG INTERACTION ===
    Question: what are the questions answered by chapter 2
    Retrieved 5 chunks.
    Response:
    Based on the provided document, Chapter 2, titled "Jobless Development", is structured to address specific inquiries regarding labor markets and economic growth.

    The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the specific text of these questions, it indicates that the chapter covers the following thematic areas and analytical components:

    *   **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
    *   **Employment Ratios:** A significant portion of the chapter is dedicated to "Long-run employment ratios" (page 75) and their various "Correlates," including:
        *   **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
        *   **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
        *   **Human Capital and Firm Characteristics:** Education (page 83) and establishment size (page 78).
    *   **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
    *   **Policy Implications:** The analysis concludes by addressing "Policy options to boost employment" (page 85).

    So the question becomes: can we retain these advantages without paying the cost?

    Engineering a Better Retriever: Proxy-Pointer RAG

    Let's build Proxy-Pointer RAG to answer that question. The core insight is intuitive:

    You don't need LLM summaries to give a vector database structural awareness. You just need to encode the structure into the embeddings themselves.

    The system uses the same structural tree from PageIndex, but without the expensive summarization flag set. Building this skeletal tree requires no costly LLM calls during indexing: the tree is built purely from regex-based heading detection, which runs in milliseconds.

    Then, instead of asking an LLM to navigate the tree, we let FAISS do the retrieval, but we engineer the chunks so that FAISS "understands" where each chunk lives in the document's hierarchy.

    Here is a view of the ingestion pipeline:

    Ingestion Pipeline

    Build a Skeleton Tree

    PageIndex's tree parser doesn't actually need an LLM to build the structural hierarchy. The heading detection is regex-based: it finds Markdown headers (#, ##, ###) and builds the nesting from indentation levels. The LLM is only used to summarize each node.

    We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries, but no summaries.

    # Build the skeleton tree: no LLM, runs in milliseconds
    pageindex = PageIndex(doc_path, enable_ai=False)
    tree = pageindex.build_structure()  # Pure regex parsing

    The skeleton tree and the summarized tree produced for the earlier Vectorless RAG have identical structures: the same 137 nodes, the same nesting depths, the same line numbers, the same titles. The only difference is the missing summary field.

    Cost: $0. Time: < 1 second.

    Structural Metadata Pointers (The Core Differentiator)

    This is the heart of why PageIndex works so well, and the trick we'll adopt.

    In standard vector RAG, the retrieved chunk is the context. Whatever 500 words FAISS returns, that's what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response will miss the intent of the query entirely (as illustrated in the earlier Vectorless vs. Vector RAG comparison).

    PageIndex does something fundamentally different: the chunk isn't the context. Each node in the tree knows its exact position in the original document: its title, its node ID, and, crucially, the start and end line numbers of the entire section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.

    We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from its tree node:

    metadata = {
        "doc_id": "SADU",           # Which document
        "node_id": "0012",          # Which structural node
        "title": "Introduction",    # Section heading
        "start_line": 624,          # Where the section begins in the original file
        "end_line": 672             # Where the section ends
    }

    At retrieval time, we don't feed the matched chunks to the LLM. Instead, we:

    1. Use the chunks as proxies: they exist only to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get the unique top-k.
    2. Follow the metadata pointers: open the original Markdown and slice the nodes' lines, e.g., 624 to 672.
    3. Send the full sections: the LLM receives the complete, pristine, structurally intact text.

    Here is a view of the retrieval pipeline:

    Retrieval Pipeline

    This means that even if a chunk matched only on a single sentence deep within a section, the synthesis LLM gets the entire section, with its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.

    This is why I call it Proxy-Pointer RAG: the vectors are proxies for location, and the metadata are pointers to the actual content.

    Cost: $0. Impact: Transforms context quality from fragmented chunks to complete document sections.

    Breadcrumb Injection (Structural Context)

    This is key to answering queries about a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:

    "While private investment growth has slowed in both South Asia and other EMDEs..."

    FAISS has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks for the "main messages of Chapter 1," this chunk won't rank highly: it doesn't contain the phrases "Chapter 1" or "main messages."

    Breadcrumb injection prepends the full ancestry path from the Skeleton Tree to every chunk before embedding:

    "[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
    While private investment growth has slowed in both South Asia and other EMDEs..."

    Now the embedding vector encodes both the content AND its structural location. When someone asks about "Chapter 1," FAISS knows which chunks belong to Chapter 1, because the words "Chapter 1. Deceptive Strength" are present in the embedding.

    # Build the breadcrumb from the node's ancestry
    current_crumb = f"{parent_breadcrumb} > {node_title}"

    # Prepend it to the chunk text before embedding
    enriched_text = f"[{current_crumb}]\n{section_text}"
    chunks = text_splitter.split_text(enriched_text)

    This is a zero-cost encoding of the tree structure into the vector space. We're using the same embeddings API, the same FAISS index, the same retrieval code. The only difference is what we feed into the embedder.

    Cost: $0 extra. Impact: Transforms retrieval quality for structural queries.

    Structure-Guided Chunking (No Blind Sliding Windows)

    Standard vector RAG applies a sliding window across the entire document: a 2000-character window that moves forward with some overlap, completely oblivious to the document's structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a figure caption. The boundaries are arbitrary, and each chunk is an island unto itself, with no knowledge of its place in the overall document structure.

    Proxy-Pointer does something fundamentally different: we walk the tree, not the text.

    For each node in the skeleton tree, we extract only its own section text, from start_line to end_line, and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it's longer, the splitter divides it, but strictly within that section's boundaries.

    Standard RAG:  Blind sliding window across the entire document
    [====chunk1====][====chunk2====][====chunk3====]...
        ↑ might start in the Introduction, end in a figure caption

    Proxy-Pointer: Chunk within each node's boundaries
    Introduction (lines 624-672)      → [chunk A] [chunk B]
    Economic Activity (lines 672-676) → [chunk C]
    BOX 1.1 (lines 746-749)           → skipped (< 100 chars)
    Inflation (lines 938-941)         → [chunk D]

    This ensures three things:

    1. Chunks never cross section boundaries: a chunk from the Introduction will never overlap with Economic Activity.
    2. Each chunk belongs to exactly one node, so the node_id metadata is always precise.
    3. Breadcrumbs are accurate per chunk: they reflect the exact structural container, not a guess.

    Importantly, when a node is skipped (because its text is too short, e.g., a "BOX 1.1" heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like "Introduction," "Features," and "Figures," all of which get embedded with the parent's title in their breadcrumb (e.g., BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 > Features of...). No content is ever lost; only empty structural headers are excluded.
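    A minimal sketch of that tree walk, assuming each skeleton node carries start_line/end_line for its own section text as in the metadata above; the thresholds and the fixed-size splitter below are illustrative stand-ins for the real text splitter:

    ```python
    def walk_and_chunk(node, source_lines, parent_crumb="", min_chars=100, max_chars=2000):
        """Depth-first walk: chunk each node's own section within its line
        boundaries, prepend its breadcrumb, skip too-short nodes but still recurse."""
        crumb = f"{parent_crumb} > {node['title']}" if parent_crumb else node["title"]
        text = "\n".join(source_lines[node["start_line"]:node["end_line"]]).strip()
        chunks = []
        if len(text) >= min_chars:  # empty structural headers yield no chunks...
            for i in range(0, len(text), max_chars):  # ...splits stay in-boundary
                chunks.append({"text": f"[{crumb}]\n{text[i:i + max_chars]}",
                               "node_id": node["node_id"]})
        for child in node.get("nodes", []):  # ...but children are always visited
            chunks.extend(walk_and_chunk(child, source_lines, crumb,
                                         min_chars, max_chars))
        return chunks
    ```

    Because the breadcrumb is threaded through the recursion, a skipped parent still contributes its title to every descendant's breadcrumb.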

    Cost: $0. Impact: Every chunk is structurally traceable, enabling precise metadata pointers.

    Noise Filtering (Removing Distractions)

    Some sections in any document are irrelevant for retrieval and contaminate the context: they contain keywords that match almost every query but provide no useful content.

    • Table of Contents: mentions every chapter title (matches "Chapter 1," "Chapter 2," everything)
    • Executive Summary: paraphrases every key finding (matches every topical query)
    • Abbreviations: lists every acronym used in the document
    • Acknowledgments: mentions organizations, countries, and themes

    These sections act as distractors in vector space. They sit near the centroid of the document's embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.

    We apply a simple title-based filter during indexing:

    NOISE_TITLES = {
        "contents", "summary of contents", "table of contents",
        "abbreviations", "acknowledgments", "foreword",
        "executive summary", "references"
    }

    if title.strip().lower() in NOISE_TITLES:
        continue  # Skip this node entirely

    This filter removed 7 nodes from our test document. The impact was immediate: for the query "What are the main messages of Chapter 1?", the system stopped retrieving the Executive Summary and Table of Contents and started retrieving the actual Introduction section.

    Cost: $0. Impact: Eliminates the most common retrieval failure mode.

    That's it. Proxy-Pointer RAG is now ready to test.

    Testing the Proxy-Pointer Pipeline

    While a detailed comparison follows in the next section, here are the results of the same two queries cited in the "Comparison of Vectorless vs. Flat Vector RAG" section. As you can see, the answers are as accurate and detailed as those from Vectorless RAG using PageIndex.

    Query about table content

    === PROXY-POINTER RAG INTERACTION ===
    Question: what does annexure table a2.1.1 say
    Unique Context Nodes Extracted:
      - [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
      - [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal probability of studies documenting above-average effect for poor-average households' | Lines: 2537-2745
      - [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the probability of private investment accelerations starting' | Lines: 1663-1899

    Response:
    Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**

    This table provides a summary of various research studies, detailing the following information for each:
    *   **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
    *   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
    *   **Sample coverage:** The scale or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
    *   **Region:** The geographic scope of the research (e.g., International, South Asia, Africa, Europe, Middle East, Asia).
    *   **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...", "G20 countries").

    The table documents a diverse set of research spanning from 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and particular regions such as South Asia and Africa.

    Query requiring structural knowledge

    === PROXY-POINTER RAG INTERACTION ===
    Question: what are the questions answered by chapter 2
    Unique Context Nodes Extracted:
      - [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
      - [SADU] Node 0098: 'Questions' | Lines: 3115-3127
      - [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812

    Response:
    Based on the provided text, Chapter 2 addresses the following questions regarding South Asia's economic growth and labor market:

    *   **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
    *   **What are the correlates of long-run employment ratios among EMDEs?** (Source: *Chapter 2, Questions section*)
    *   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)

    Additionally, the chapter explores these issues with the aim of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.

    We have demonstrated the cost advantage so far, but does the architecture overcome the scalability challenge of PageIndex?

    Is Proxy-Pointer Scalable?

    Here is why the architecture scales across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, and an additional LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.

    • No LLM at indexing. The skeleton tree is regex-built in milliseconds. The only API calls are to the embedding model, just like standard vector RAG.
    • No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.

    Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers), not in an LLM reasoning loop. It inherits all of vector RAG's scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.

    Fail-safe for unstructured documents: If a document has no headings, or the skeleton tree produces only a single root node, the system detects this during chunking and falls back to a standard sliding window. Those chunks are flagged with an empty node_id and line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG: no errors, no special handling required.
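    Assuming a degenerate skeleton tree is one whose root has no children, the fallback can be sketched as follows (the window/overlap values and field names are illustrative):

    ```python
    def fallback_chunks(tree, source_lines, window=2000, overlap=200):
        """Detect a headingless document (single root, no children) and fall back
        to a plain sliding window; flagged chunks carry an empty node_id so
        retrieval uses them directly instead of following pointers."""
        if tree.get("nodes"):
            return None  # structured document: use the structure-guided path
        text = "\n".join(source_lines)
        step = window - overlap
        return [{"text": text[i:i + window], "node_id": "",
                 "start_line": None, "end_line": None}
                for i in range(0, len(text), step)]
    ```

    The empty node_id is the only signal retrieval needs: a chunk without a pointer is its own context.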

    Let's compare Vectorless RAG and Proxy-Pointer head-to-head.

    Vectorless vs Proxy-Pointer RAG

    I ran a variety of queries, such as broad structural, cross-reference, specific factual, and figure-specific, and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer along with the full Quality Comparison report here.

    The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries, all at the scalability and cost of flat Vector RAG.

    Here is the summary verdict:

    | # | Query Type | Winner |
    |---|------------|--------|
    | 1 | Broad structural (Ch.1 messages) | 🔴 PageIndex |
    | 2 | Broad structural (Ch.2 messages) | 🔴 PageIndex (narrow) |
    | 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
    | 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
    | 5 | Comparative (India vs region) | 🟢 Proxy-Pointer |
    | 6 | Figure-specific (B1.1.1 traits) | 🟢 Proxy-Pointer |
    | 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
    | 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
    | 9 | Navigational (Ch.2 questions) | 🟡 Tie |
    | 10 | Inferential/policy (govt vs shocks) | 🟢 Proxy-Pointer |

    And here is the cost comparison:

    | Metric | PageIndex | Proxy-Pointer | Standard Vector RAG |
    |--------|-----------|---------------|---------------------|
    | Indexing LLM calls | ~137 per doc | 0 | 0 |
    | Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
    | Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs PageIndex) | ★★★☆☆ |
    | Multi-doc scalability | Poor (per-doc tree navigation) | Excellent (unified vector index) | Excellent |
    | Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
    | Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
    | Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |

    Key Takeaways

    1. Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn't about better embeddings; it's about preserving hierarchy.
    2. You don't need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no extra cost.
    3. Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
    4. Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesizer LLM sees.
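    Takeaway 4 can be sketched as a small resolution step at retrieval time: the matched chunk is only a proxy, and the synthesizer LLM receives the full section it points to. The record shape and the `sections` lookup (mapping a doc ID and node ID to full section text) are assumptions for illustration.

    ```python
    # Illustrative "proxy pointer" resolution: follow each hit's node pointer
    # to the full section, de-duplicating sections, and fall back to the raw
    # chunk text for flagged unstructured chunks (empty node_id).
    def resolve_context(hits, sections):
        seen, context = set(), []
        for hit in hits:
            meta = hit["metadata"]
            key = (meta["doc_id"], meta["node_id"])
            if meta["node_id"] == "":            # unstructured fallback chunk
                context.append(hit["raw_text"])
            elif key not in seen:                # de-duplicate repeated sections
                seen.add(key)
                context.append(sections[key])    # full section, not the chunk
        return "\n\n".join(context)
    ```

    Two hits inside the same section cost one section of context, not two overlapping chunks, which is both cheaper and more coherent for the synthesizer.
    
    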

    Conclusion

    Proxy-Pointer RAG proves a simple thesis: you don't need an expensive LLM to make a retriever structurally aware; you just have to be clever about what you embed.

    Five zero-cost engineering techniques (skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering) close the quality gap with a full LLM-navigated system, while retaining the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of a standard vector RAG.

    The next time you're building RAG for your structured (or unstructured) document repository, don't reach for a bigger model. Reach for a Proxy-Pointer index.

    Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

    Reference

    World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience. License: CC BY 3.0 IGO.

    Images used in this article were generated using Google Gemini. Code created by me.

