    How to Context Engineer to Optimize Question Answering Pipelines

By Editor Times Featured · September 5, 2025 · 9 Mins Read


Context engineering is one of the most relevant topics in machine learning today, which is why I'm writing my third article on the subject. My goal is both to deepen my understanding of engineering contexts for LLMs and to share that knowledge through my articles.

In today's article, I'll discuss improving the context you feed into your LLMs for question answering. Usually, this context is based on retrieval augmented generation (RAG); however, in today's ever-shifting environment, this approach should be updated.

The co-founder of Chroma (a vector database provider) tweeted that RAG is dead. I don't fully agree that we won't use RAG anymore, but his tweet highlights that there are different options for filling the context of your LLM.

You can also read my earlier context engineering articles:

1. Basic context engineering techniques
2. Advanced context engineering techniques


Why you should care about context engineering

First, let me highlight three key reasons why you should care about context engineering:

• Better output quality by avoiding context rot. Fewer unnecessary tokens improve output quality; you can read more details about this in this article
• Cheaper (don't send unnecessary tokens; they cost money)
• Speed (fewer tokens = faster response times)

These are three core metrics for most question answering systems. Output quality is naturally of utmost priority, considering users won't want to use a low-performing system.

Furthermore, cost should always be a consideration, and if you can lower it (without too much engineering cost), it's a simple decision to do so. Lastly, a faster question answering system provides a better user experience. You don't want users waiting several seconds for a response when ChatGPT would answer much faster.

The traditional question-answering approach

Traditional, in this sense, means the most common question answering approach in systems built after the release of ChatGPT. This approach is conventional RAG, which works as follows:

1. Fetch the documents most relevant to the user's question, using vector similarity retrieval
2. Feed the relevant documents, together with the question, into an LLM, and receive a response
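The two-step flow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration: `embed` and `similarity` are toy word-overlap stand-ins for a real embedding model and vector similarity, and the final LLM call is left as a stub.

```python
def embed(text: str) -> set[str]:
    # Toy "embedding": the set of lowercase words. A stand-in for a
    # real embedding model; only the retrieve/answer flow matters here.
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a stand-in for vector (cosine) similarity.
    return len(a & b) / (len(a | b) or 1)

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Step 1: fetch the documents most similar to the question.
    q = embed(question)
    return sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)[:top_k]

def answer(question: str, documents: list[str]) -> str:
    # Step 2: feed the retrieved context plus the question to an LLM.
    context = "\n\n".join(retrieve(question, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return prompt  # in a real system, this prompt goes to your LLM client
```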

Considering its simplicity, this approach works incredibly well. Interestingly enough, we also see this with another traditional technique: BM25 has been around since 1994 and was, for example, recently used by Anthropic when they launched Contextual Retrieval, proving how effective even simple information retrieval techniques are.
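As a point of reference, the BM25 scoring mentioned above is simple enough to fit in a short snippet. This is a minimal implementation of the standard Okapi BM25 formula with the usual `k1` and `b` parameters; a production system would use a tuned library implementation instead.

```python
import math
from collections import Counter

def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against the query terms (Okapi BM25)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```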

However, you can still vastly improve your question answering system by updating your RAG setup with some of the techniques I'll describe in the next section.

Improving RAG context fetching

Even though RAG works relatively well, you can likely achieve better performance by introducing the techniques I discuss in this section. They all focus on improving the context you feed to the LLM, which you can do with two main approaches:

1. Use fewer tokens on irrelevant context (for example, removing documents or using less material from relevant documents)
2. Add documents that are relevant

Thus, you should focus on achieving one of the points above. If you think in terms of precision and recall, these correspond to:

1. Increasing precision (at the cost of recall)
2. Increasing recall (at the cost of precision)

This is a tradeoff you have to make while context engineering your question answering system.

Reducing the number of irrelevant tokens

In this section, I highlight three main approaches to reduce the number of irrelevant tokens you feed into the LLM's context:

    • Reranking
    • Summarization
    • Prompting GPT

When you fetch documents via vector similarity search, they are returned in order from most relevant to least relevant, according to the vector similarity score. However, this similarity score might not accurately reflect which documents are most relevant.

    Reranking

You can thus use a reranking model, for example a Qwen reranker, to reorder the document chunks. You can then choose to keep only the top X most relevant chunks (according to the reranker), which should remove some irrelevant documents from your context.
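As a sketch of this step, the snippet below reorders chunks and keeps only the top X. The `rerank_score` function is a hypothetical stub; in practice it would call a cross-encoder reranking model (such as a Qwen reranker) on each question-chunk pair.

```python
def rerank_score(question: str, chunk: str) -> float:
    # Stub reranker: fraction of question words that appear in the chunk.
    # A real system would replace this with a cross-encoder model call.
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / (len(q_words) or 1)

def rerank(question: str, chunks: list[str], keep_top: int = 3) -> list[str]:
    """Reorder vector-search results and drop all but the top chunks."""
    reranked = sorted(chunks, key=lambda c: rerank_score(question, c), reverse=True)
    return reranked[:keep_top]
```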

    Summarization

You can also choose to summarize documents, reducing the number of tokens used per document. For example, you can keep the full text of the 10 most relevant documents fetched, summarize the documents ranked 11-20, and discard the rest.

This approach increases the likelihood that you keep the full context from relevant documents, while still maintaining at least some context (the summary) from documents that are less likely to be relevant.
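A minimal sketch of this tiering, assuming the documents arrive already ranked. `summarize` is a hypothetical stand-in for an LLM summarization call; here it simply truncates.

```python
def summarize(doc: str, max_words: int = 20) -> str:
    # Stand-in for an LLM summarization call: naive truncation.
    return " ".join(doc.split()[:max_words])

def build_context(ranked_docs: list[str], full_cutoff: int = 10, summary_cutoff: int = 20) -> str:
    """Keep full text for top ranks, summaries for the middle tier, drop the rest."""
    parts = []
    for rank, doc in enumerate(ranked_docs, start=1):
        if rank <= full_cutoff:
            parts.append(doc)             # ranks 1-10: full document
        elif rank <= summary_cutoff:
            parts.append(summarize(doc))  # ranks 11-20: summary only
        # ranks beyond summary_cutoff are discarded entirely
    return "\n\n".join(parts)
```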

    Prompting GPT

Finally, you can also prompt GPT to judge whether each fetched document is relevant to the user query. For example, if you fetch 15 documents, you can make 15 individual LLM calls to assess whether each document is relevant, and then discard the documents deemed irrelevant. Keep in mind that these LLM calls should be parallelized to keep the response time within an acceptable limit.
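A sketch of this parallelized filter using a thread pool. `is_relevant` is a hypothetical stub; a real system would replace it with an LLM call returning a yes/no verdict per document.

```python
from concurrent.futures import ThreadPoolExecutor

def is_relevant(question: str, doc: str) -> bool:
    # Stub relevance judge: any shared word counts as relevant.
    # A real system would make an LLM call here instead.
    return bool(set(question.lower().split()) & set(doc.lower().split()))

def filter_relevant(question: str, docs: list[str], max_workers: int = 15) -> list[str]:
    """Run one relevance check per document, in parallel, and keep the passes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(lambda d: is_relevant(question, d), docs))
    return [doc for doc, keep in zip(docs, verdicts) if keep]
```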

Adding relevant documents

Before or after removing irrelevant documents, you should also make sure you include relevant documents. I cover two main approaches in this subsection:

• Better embedding models
• Searching through more documents (at the cost of lower precision)

Better embedding models

To find the best embedding models, you can visit the HuggingFace embedding model leaderboard, where Gemini and Qwen are in the top 3 as of this writing. Updating your embedding model is usually a cheap way to fetch more relevant documents, because running and storing embeddings is typically inexpensive: for example, embedding via the Gemini API and storing the vectors in Pinecone.

Search more documents

Another (relatively simple) way to fetch more relevant documents is simply to fetch more documents in general. Fetching more documents naturally increases the chance that you add relevant ones. However, you have to balance this against avoiding context rot and keeping the number of irrelevant documents to a minimum. Every unnecessary token in an LLM call is, as noted earlier, likely to:

• Reduce output quality
• Increase cost
• Lower speed

These are all important aspects of a question-answering system.

Agentic search approach

I've discussed agentic search approaches in earlier articles, for example when I covered Scaling your AI Search. In this section, however, I'll dive deeper into setting up an agentic search, which replaces some or all of the vector retrieval step in your RAG pipeline.

The first step is that the user poses their question against a given set of data points, for example a set of documents. You then set up an agentic system consisting of an orchestrator agent and a list of sub-agents.

This figure highlights an orchestrator system of LLM agents: the main agent receives the user query and assigns tasks to subagents. Image by ChatGPT.

This is an example of the pipeline the agents could follow (though there are many ways to set it up):

1. The orchestrator agent tells two subagents to iterate over all document filenames and return the relevant documents
2. The relevant documents are fed back to the orchestrator agent, which then dispatches a subagent to each relevant document to fetch the subparts (chunks) of that document that are relevant to the user's question. These chunks are fed back to the orchestrator agent
3. The orchestrator agent answers the user's question, given the provided chunks
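The three steps above can be sketched with plain functions standing in for the agents. `select_documents` and `extract_chunks` are hypothetical stubs for the sub-agent LLM calls, and the filename convention and keyword matching are illustrative assumptions only.

```python
def select_documents(question: str, filenames: list[str]) -> list[str]:
    # Sub-agent step 1 (stub): pick filenames that share a word with the question.
    q_words = set(question.lower().split())
    return [
        f for f in filenames
        if q_words & set(f.lower().replace("_", " ").replace(".txt", "").split())
    ]

def extract_chunks(question: str, document_text: str, chunk_size: int = 50) -> list[str]:
    # Sub-agent step 2 (stub): return the chunks of one document that
    # mention any word from the question.
    words = document_text.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    q_words = set(question.lower().split())
    return [c for c in chunks if q_words & set(c.lower().split())]

def orchestrate(question: str, corpus: dict[str, str]) -> list[str]:
    """Orchestrator: delegate selection and extraction, then gather the chunks."""
    relevant_files = select_documents(question, list(corpus))
    chunks = []
    for filename in relevant_files:
        chunks.extend(extract_chunks(question, corpus[filename]))
    # Step 3 would feed `chunks` plus the question to the orchestrator LLM.
    return chunks
```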

Another flow you could implement is to store document embeddings and replace step one with a vector similarity comparison between the user's question and each document.

This agentic approach has upsides and downsides.

Upsides:

• A better chance of fetching relevant chunks than with traditional RAG
• More control over the RAG system. You can update system prompts, etc., whereas RAG is relatively static with its embedding similarities

Downside:

• Higher cost, since a single retrieval step is replaced by multiple LLM calls

In my opinion, building such an agent-based retrieval system is a super powerful approach that can lead to amazing results. The consideration you have to make when building such a system is whether the increase in quality you'll (likely) see is worth the increase in cost.

Other context engineering aspects

In this article, I've primarily covered context engineering for the documents we fetch in a question answering system. However, there are also other aspects you should be aware of, primarily:

• The system/user prompt you're using
• Other information fed into the prompt

The prompt you write for your question answering system should be precise and structured, and it should avoid irrelevant information. You can read many other articles on structuring prompts, and you can often ask an LLM to improve these aspects of your prompt.

Often, you also feed other information into your prompt. A typical example is metadata, for instance information about the user, such as:

• Name
• Job role
• What they usually search for
• etc.

Whenever you add such information, you should always ask yourself:

Does adding this information help my question answering system answer the question?

Sometimes the answer is yes, other times it's no. The important part is that you make a rational decision about whether the information is needed in the prompt. If you can't justify having the information in the prompt, it should usually be removed.
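One way to make that decision explicit in code is a prompt builder that only includes the metadata fields you have deliberately allow-listed for your system. The field names here are illustrative assumptions.

```python
def build_prompt(question: str, metadata: dict[str, str], include_fields: list[str]) -> str:
    # `include_fields` encodes the deliberate decision of which metadata
    # fields earn their tokens; anything not listed never enters the prompt.
    lines = [f"Question: {question}"]
    for field in include_fields:
        if field in metadata:
            lines.append(f"{field}: {metadata[field]}")
    return "\n".join(lines)
```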

    Conclusion

In this article, I've discussed context engineering for your question answering system and why it's important. Question answering systems usually contain an initial step that fetches relevant information. The focus of this step should be to reduce the number of irrelevant tokens to a minimum, while including as many relevant pieces of information as possible.

👉 Find me on socials:

    🧑‍💻 Get in touch

    🔗 LinkedIn

    🐦 X / Twitter

    ✍️ Medium

You can also read my in-depth article on Anthropic's contextual retrieval below:



