    Retrieval for Time-Series: How Looking Back Improves Forecasts

By Editor Times Featured | January 8, 2026


How Retrieval Helps in Time-Series Forecasting

Everyone knows how it goes: time-series data is hard.

Traditional forecasting models are unprepared for incidents like sudden market crashes, black swan events, or unusual weather patterns.

Even big fancy models like Chronos often struggle because they haven't dealt with that kind of pattern before.

We can mitigate this with retrieval. Retrieval lets us ask, "Has anything like this happened before?" and then use that past example to guide the forecast.

In natural language processing (NLP), this idea is called Retrieval-Augmented Generation (RAG). It is becoming popular in the time-series forecasting world too.

The model considers past situations that look similar to the current one, and from there it can make more reliable predictions.

How is this different from traditional time-series forecasting? Retrieval-augmented forecasting (RAF) adds an explicit memory-access step.

Instead of:

Past -> parameters -> forecast

With retrieval we have:

Current situation -> similarity search -> concrete past episodes -> forecast

Retrieval-Augmented Forecasting Cycle. Image by Author | Napkin AI.

Instead of just using what the model learned during training, the idea is to give it access to a range of similar situations.

It's like letting a weather model check, "What did past winters like this one look like?"


Hey there, I'm Sara Nóbrega, an AI Engineer. If you're working on similar problems or want feedback on applying these ideas, I collect my writing, resources, and mentoring links here.


In this article, I explore retrieval-augmented forecasting from first principles and show, with concrete examples and code, how retrieval can be used in real forecasting pipelines.

    What Is Retrieval-Augmented Forecasting (RAF)?

What is RAF? At a very high level, instead of leaning only on what a model learned in training, RAF lets the model actively look up concrete past situations similar to the current one and use their outcomes to guide its prediction.

Let's look at it in more detail:

• You convert the current situation (e.g., the past few weeks of a stock time series) into a query.
• This query is then used to search a database of historical time-series segments for the most similar patterns.
• These matches don't need to come from the same stock; the system can also surface similar movements from other stocks or financial products.

It retrieves those patterns along with what happened afterwards.

This information is then fed to the forecasting model to help it make better predictions.

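To make the loop concrete before diving deeper, here is the whole cycle end-to-end in a few lines. Everything is a toy stand-in: summary statistics play the role of an embedding model, and a plain average of the retrieved "futures" plays the role of the forecaster.

import numpy as np

def embed_window(w: np.ndarray) -> np.ndarray:
    # Toy embedding: summary statistics stand in for a learned encoder
    return np.array([w.mean(), w.std(), w.min(), w.max()])

def raf_forecast(current_window, past_windows, past_futures, k=2):
    """Toy retrieval-augmented forecast: find the k most similar past
    windows and average what happened after them (illustration only)."""
    q = embed_window(current_window)
    embs = np.stack([embed_window(w) for w in past_windows])
    # Cosine similarity between the query and every past window
    sims = embs @ q / (np.linalg.norm(embs, axis=1) * np.linalg.norm(q) + 1e-9)
    top_k = np.argsort(sims)[::-1][:k]
    # Naive forecast: average the futures of the retrieved windows
    return np.mean([past_futures[i] for i in top_k], axis=0)

# Two past windows and what followed each of them
past_windows = [np.array([10., 12, 18, 25]), np.array([2., 2, 2, 2])]
past_futures = [np.array([14., 11]), np.array([2., 2])]
print(raf_forecast(np.array([9., 11, 17, 24]), past_windows, past_futures))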
This approach is powerful in:

• Zero-shot scenarios: when the model faces something it wasn't trained on.
• Rare or anomalous events: like COVID, sudden financial crashes, and so on.
• Evolving seasonal trends: where past data contains useful patterns, but they shift over time.

RAF doesn't replace your forecasting model; it augments it by giving it extra hints and grounding it in relevant historical examples.

Another example: say you want to forecast energy consumption during an unusually hot week.

Instead of hoping your model recalls how heatwaves affect usage, retrieval finds similar past heatwaves and lets the model consider what happened then.

What Do These Models Actually Retrieve?

The retrieved "knowledge" isn't just raw data. It's context that gives the model clues.

Here are some common examples:

Examples of Data Retrieval. Image by Author | Napkin AI.

As you can see, retrieval focuses on meaningful historical situations, like rare shocks, seasonal effects, and patterns with similar structure. These give actionable context for the current forecast.

How Do These Models Retrieve?

To find relevant patterns from the past, these models use structured mechanisms that represent the current situation in a way that makes it easy to search large databases and find the closest matches.

The code snippets in this section are simplified illustrations meant to build intuition; they don't represent production code.

Retrieval methods for time-series forecasting. Image by Author | Napkin AI.

Some of these methods are:

Embedding-Based Similarity

This method converts time series (or patches/windows of a series) into compact vectors, then compares them with distance metrics like Euclidean distance or cosine similarity.

In simple terms: the model turns chunks of time-series data into short summaries and then checks which past summaries look most similar to what's happening now.

Some retrieval-augmented forecasters (e.g., RAFT) retrieve the most similar historical patches from the training data / entire series and then aggregate the retrieved values with attention-like weights.

In simple terms: it finds similar situations from the past and averages them, paying more attention to the best matches.

import numpy as np

# Example: embedding-based retrieval for time-series patches
# This is a toy example to show the *idea* behind retrieval.
# In practice:
# - embeddings are learned by neural networks
# - similarity search runs over millions of vectors
# - this logic lives inside a larger forecasting pipeline


def embed_patch(patch: np.ndarray) -> np.ndarray:
    """
    Convert a short time-series window ("patch") into a compact vector.

    Here we use simple statistics (mean, std, min, max) purely for illustration.
    Real-world systems might use:
      - a trained encoder network
      - shape-based representations
      - frequency-domain features
      - latent vectors from a forecasting backbone
    """
    return np.array([
        patch.mean(),   # average level
        patch.std(),    # volatility
        patch.min(),    # lowest point
        patch.max()     # highest point
    ])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    Measure how similar two vectors are.
    Cosine similarity focuses on *direction* rather than magnitude,
    which is often useful for comparing patterns or shapes.
    """
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)


# Step 1: Represent the current situation

# A short window representing the current time-series behavior
query_patch = np.array([10, 12, 18, 25, 14, 11])

# Turn it into an embedding
query_embedding = embed_patch(query_patch)


# Step 2: Represent historical situations

# Past windows extracted from historical data
historical_patches = [
    np.array([9, 11, 17, 24, 13, 10]),   # looks similar
    np.array([2, 2, 2, 2, 2, 2]),        # flat, unrelated
    np.array([10, 13, 19, 26, 15, 12])   # very similar
]

# Convert all historical patches into embeddings
historical_embeddings = [
    embed_patch(patch) for patch in historical_patches
]

# Step 3: Compare and retrieve the most similar past cases

# Compute similarity scores between the current situation
# and each historical example
similarities = [
    cosine_similarity(query_embedding, hist_emb)
    for hist_emb in historical_embeddings
]

# Rank historical patches by similarity
top_k_indices = np.argsort(similarities)[::-1][:2]

print("Most similar historical patches:", top_k_indices)

# Step 4 (conceptual):
# In a retrieval-augmented forecaster, the model would now:
# - retrieve the *future outcomes* of these similar patches
# - weight them by similarity (attention-like weighting)
# - use them to guide the final forecast
# This integration step is model-specific; a toy version is sketched below.
    
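Here is a minimal version of that fourth step, in the spirit of RAFT's attention-like weighting (an illustration, not the paper's actual implementation). The futures and similarity values are made up so the snippet stays self-contained:

import numpy as np

# For each historical patch above, suppose we also stored what came next
historical_futures = [
    np.array([10.0, 9.0]),   # what followed the first (similar) patch
    np.array([2.0, 2.0]),    # what followed the flat patch
    np.array([11.0, 10.0]),  # what followed the very similar patch
]

# Similarity scores as computed in Step 3 (illustrative values)
similarities = np.array([0.97, 0.10, 0.99])
top_k_indices = np.argsort(similarities)[::-1][:2]

# Attention-like weighting: softmax over the similarities of the top matches
top_sims = similarities[top_k_indices]
weights = np.exp(top_sims) / np.exp(top_sims).sum()

# Retrieval-based forecast: similarity-weighted average of retrieved futures
retrieval_forecast = sum(
    w * historical_futures[i] for w, i in zip(weights, top_k_indices)
)
print("Retrieval-based forecast:", retrieval_forecast)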

Retrieval Tools and Libraries

1. FAISS
FAISS is a very fast, GPU-friendly library for similarity search over dense vectors. It works best with large, in-memory datasets, though its structure makes real-time updates harder to implement.

import faiss
import numpy as np

# Suppose we already have embeddings for historical windows
d = 128  # embedding dimension
xb = np.random.randn(100_000, d).astype("float32")  # historical embeddings
xq = np.random.randn(1, d).astype("float32")        # query embedding

index = faiss.IndexFlatIP(d)   # inner product (often used with normalized vectors for cosine-like behavior)
index.add(xb)

k = 5
scores, ids = index.search(xq, k)
print("Nearest neighbors (ids):", ids)
print("Similarity scores:", scores)

# Some FAISS indexes/algorithms can run on GPU.
    
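One caveat worth knowing: IndexFlatIP only behaves like cosine similarity if the vectors are L2-normalized first. FAISS ships a helper for exactly this; continuing the snippet above (same xb, xq, d, k):

# Normalize in place so that inner product == cosine similarity
faiss.normalize_L2(xb)
faiss.normalize_L2(xq)

cos_index = faiss.IndexFlatIP(d)
cos_index.add(xb)
scores, ids = cos_index.search(xq, k)  # scores now fall in [-1, 1]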

2. Nearest-neighbor lookup (Annoy)
The Annoy library is comparatively lightweight and easy to work with.

It is best suited to historical datasets that stay largely static, since any modification to the dataset requires rebuilding the index.

from annoy import AnnoyIndex
import numpy as np

# Number of values in each embedding vector.
# The "length" of each fingerprint.
f = 64

# Create an Annoy index.
# This object will store many past embeddings and help us quickly find the most similar ones.
ann = AnnoyIndex(f, "angular")
# "angular" distance is often used to compare patterns
# and behaves similarly to cosine similarity.

# Add historical embeddings (past situations).
# Each item represents a compressed version of a past time-series window.
# Here we use random numbers just as an example.
for i in range(10000):
    ann.add_item(i, np.random.randn(f).tolist())

# Build the search structure.
# This step organizes the data so similarity searches are fast.
# After this, the index becomes read-only.
ann.build(10)

# Save the index to disk.
# This lets us load it later without rebuilding everything.
ann.save("hist.ann")

# Create a query embedding.
# This represents the current situation we want to compare
# against past situations.
q = np.random.randn(f).tolist()

# Find the 5 most similar past embeddings.
# Annoy returns the IDs of the closest matches.
neighbors = ann.get_nns_by_vector(q, 5)

print("Nearest neighbors:", neighbors)

# Important note:
# Once the index is built, you cannot add new items.
# If new historical data arrives, the index must be rebuilt.
    
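A nice side effect of saving to disk: Annoy memory-maps the file on load, so a separate serving process can reload the index cheaply instead of rebuilding it. Continuing the snippet above (same f and q):

# Reload the saved index later (memory-mapped, so loading is fast)
ann2 = AnnoyIndex(f, "angular")  # dimension and metric must match the saved index
ann2.load("hist.ann")
print("Neighbors after reload:", ann2.get_nns_by_vector(q, 5))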

3. Qdrant / Pinecone

Qdrant and Pinecone are like Google for embeddings.

You store lots of vector "fingerprints" (plus extra tags like city/season), and when you have a new fingerprint, you ask:

"Show me the most similar ones, but only from this city/season/store type."
This is what makes them easier than rolling your own retrieval: they handle fast search and filtering!

Qdrant calls metadata "payload", and you can filter search results using conditions.

# Example only (for intuition). Real code needs a running Qdrant instance + real embeddings.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

collection = "time_series_windows"

# Pretend this is the embedding of the *current* time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Filter = "only consider past windows from New York in summer"
# Qdrant filters are built from FieldCondition + MatchValue.
query_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="city",
            match=models.MatchValue(value="New York"),
        ),
        models.FieldCondition(
            key="season",
            match=models.MatchValue(value="summer"),
        ),
    ]
)

# In real usage, you'd call search/query and get back the closest matches
# plus their payload (metadata) if you request it.
results = client.search(
    collection_name=collection,
    query_vector=query_vector,
    query_filter=query_filter,
    limit=5,
    with_payload=True,   # return metadata so you can inspect what you retrieved
)

print(results)

# What you'd do next (conceptually):
# - take the matched IDs
# - load the actual historical windows behind them
# - feed those windows (or their outcomes) into your forecasting model
    

Pinecone stores metadata key-value pairs alongside vectors and lets you filter at query time (including $eq) and return metadata.

# Example only (for intuition). Real code needs an API key + an index host.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")

# Pretend this is the embedding of the current time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Ask for the most similar past windows, but only where:
# city == "New York" AND season == "summer"
# Pinecone supports query-time filtering, including `$eq`.
res = index.query(
    namespace="windows",
    vector=query_vector,
    top_k=5,
    filter={
        "city": {"$eq": "New York"},
        "season": {"$eq": "summer"},
    },
    include_metadata=True,  # return tags so you can sanity-check matches
    include_values=False
)

print(res)

# Conceptually next:
# - use the returned IDs to fetch the underlying historical windows/outcomes
# - condition your forecast on these retrieved examples
    

Why do vector DBs help? They let you do similarity search + "SQL-like WHERE filters" in one step, which is hard to do cleanly with a DIY setup (both Qdrant payload filtering and Pinecone metadata filtering are first-class features in their docs).

Each tool has its trade-offs. For instance, FAISS is great for performance but isn't suited to frequent updates. Qdrant offers flexibility and real-time filtering. Pinecone is easy to set up but SaaS-only.

Retrieval + Forecasting: How to Combine Them

After understanding what to retrieve, the next step is to combine that information with the current input.

How you do this varies with the architecture and the task. There are several strategies (see the image below).

Strategies for Combining Retrieval and Forecasting. Image by Author | Napkin AI.

A. Concatenation
Idea:
treat retrieved context as "extra input" by appending it to the current sequence (very common in retrieval-augmented generation setups).

Works well with transformer-based models like Chronos and doesn't require architecture changes.

import torch

# x_current: the model's regular input sequence (e.g., last N timesteps or tokens)
# shape: [batch, time, d_model]   (or [batch, time] if you think in tokens)
x_current = torch.randn(8, 128, 256)

# x_retrieved: retrieved context encoded in the SAME representation space
# e.g., embeddings for similar past windows (or their summaries)
# shape: [batch, retrieved_time, d_model]
x_retrieved = torch.randn(8, 32, 256)

# Simple fusion: just append the retrieved context to the end of the input sequence
# Now the model sees: [current history ... + retrieved context ...]
x_fused = torch.cat([x_current, x_retrieved], dim=1)

# In practice, you'd also add:
# - an attention mask (so the model knows what's real vs padded)
# - segment/type embeddings (so the model knows which part is retrieved context)
# Then feed x_fused to your transformer.
    

B. Cross-Attention Fusion
Idea:
keep the "current input" and "retrieved context" separate, and let the model attend to the retrieved context when it needs it. This is the core "fusion in the decoder via cross-attention" pattern used by retrieval-augmented architectures like FiD.

import torch

# current_repr: representation of the current time-series window
# shape: [batch, time, d_model]
current_repr = torch.randn(8, 128, 256)

# retrieved_repr: representation of retrieved windows (could be concatenated)
# shape: [batch, retrieved_time, d_model]
retrieved_repr = torch.randn(8, 64, 256)

# Think of cross-attention like this:
# - Query (Q) comes from the current sequence
# - Keys/Values (K/V) come from the retrieved context
Q = current_repr
K = retrieved_repr
V = retrieved_repr

# Attention scores: "How much should each current timestep look at each retrieved timestep?"
scores = torch.matmul(Q, K.transpose(-1, -2)) / (Q.size(-1) ** 0.5)

# Turn scores into weights (so they sum to 1 across retrieved positions)
weights = torch.softmax(scores, dim=-1)

# Weighted sum of retrieved information (this is the "fused" retrieval signal)
retrieval_signal = torch.matmul(weights, V)

# Final fused representation: current knowledge + retrieved knowledge
# (Some models add, some concatenate, some use a learned projection)
fused = current_repr + retrieval_signal

# Then the forecasting head reads from `fused` to predict the future.
    
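In practice you would rarely hand-roll these matmuls; PyTorch's built-in multi-head attention implements the same pattern, plus learned projections and multiple heads. A minimal sketch with the same shapes as above:

import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)

current_repr = torch.randn(8, 128, 256)    # queries: current window
retrieved_repr = torch.randn(8, 64, 256)   # keys/values: retrieved context

# query = current, key = value = retrieved: each current timestep
# attends over the retrieved positions
retrieval_signal, attn_weights = cross_attn(current_repr, retrieved_repr, retrieved_repr)

fused = current_repr + retrieval_signal    # residual fusion, as above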

C. Mixture-of-Experts (MoE)
Idea: combine two "experts":

• the retrieval-based forecaster (non-parametric, case-based)
• the base forecaster (parametric knowledge)

A "gate" decides which one to trust more at each time step.

import torch

# base_pred: forecast from the main model (what it "learned in weights")
# shape: [batch, horizon]
base_pred = torch.randn(8, 24)

# retrieval_pred: forecast suggested by retrieved similar cases
# shape: [batch, horizon]
retrieval_pred = torch.randn(8, 24)

# context_for_gate: summary of the current situation (could be the last hidden state)
# shape: [batch, d_model]
context_for_gate = torch.randn(8, 256)

# gate: a number between 0 and 1 saying "how much to trust retrieval"
# (In real models, this is a tiny neural net.)
gate = torch.sigmoid(torch.randn(8, 1))

# Mixture: convex combination
# - if gate ~ 1 -> trust retrieval more
# - if gate ~ 0 -> trust the base model more
final_pred = gate * retrieval_pred + (1 - gate) * base_pred

# In practice:
# - gate might be timestep-dependent: shape [batch, horizon, 1]
# - you might also add training losses to stabilize routing/usage (common in MoE)
    
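The random gate above is just a stand-in. In a real model the gate is typically a small learned network over the context summary; a minimal sketch, assuming the same shapes as above:

import torch
import torch.nn as nn

# A tiny learned gate: maps the context summary to a scalar in (0, 1)
gate_net = nn.Sequential(
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)

context_for_gate = torch.randn(8, 256)
gate = gate_net(context_for_gate)   # shape: [batch, 1]

base_pred = torch.randn(8, 24)
retrieval_pred = torch.randn(8, 24)
final_pred = gate * retrieval_pred + (1 - gate) * base_pred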

D. Channel Prompting
Idea:
treat retrieved series as extra input channels/features (especially natural in multivariate time series, where each variable is a "channel").

import torch

# x: multivariate time-series input
# shape: [batch, time, channels]
# Example: channels could be [sales, price, promo_flag, temperature, ...]
x = torch.randn(8, 128, 5)

# retrieved_series_aligned: retrieved signal aligned to the same time grid
# Example: average of the top-k similar past windows (or one representative neighbor)
# shape: [batch, time, retrieved_channels]
retrieved_series_aligned = torch.randn(8, 128, 2)

# Channel prompting = append retrieved channels as extra features
# Now the model gets "normal channels + retrieved channels"
x_prompted = torch.cat([x, retrieved_series_aligned], dim=-1)

# In practice you'd likely also include:
# - a mask or confidence score for retrieved channels
# - normalization so retrieved signals are on a comparable scale
# Then feed x_prompted into the forecaster.
    

Some models even combine multiple strategies.

A typical approach is to retrieve several similar series, merge them using attention so the model can focus on the most relevant parts, and then feed them to an expert, as sketched below.

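Here is a rough sketch of that hybrid; the shapes and modules are illustrative, not taken from any specific paper. Retrieved windows are merged via cross-attention, and the fused representation feeds a small expert head:

import torch
import torch.nn as nn

batch, time, d_model, k = 8, 128, 256, 4

current_repr = torch.randn(batch, time, d_model)   # encoded current window
retrieved = torch.randn(batch, k * 32, d_model)    # k retrieved windows, concatenated

# 1. Merge the retrieved series with cross-attention
#    (the model focuses on the most relevant retrieved parts)
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
retrieval_signal, _ = attn(current_repr, retrieved, retrieved)

# 2. Feed the fused representation to an "expert" forecasting head
expert = nn.Linear(d_model, 1)                     # toy one-step head
retrieval_pred = expert(current_repr + retrieval_signal)[:, -1, :]

print(retrieval_pred.shape)  # torch.Size([8, 1])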
    Wrap-up

Retrieval-Augmented Forecasting (RAF) lets your model learn from the past in a way that traditional time-series modeling doesn't achieve.

It acts like an external memory that helps the model navigate unfamiliar situations with more confidence.

It's simple to experiment with and can deliver meaningful improvements in forecasting tasks.

Retrieval is no longer just academic hype; it's already delivering results in real-world systems.

Thanks for reading!

My name is Sara Nóbrega. I'm an AI engineer focused on MLOps and on deploying machine learning systems into production.


    References

[1] J. Liu, Y. Zhang, Z. Wang et al., Retrieval-Augmented Time Series Forecasting (2025), arXiv preprint. Source: https://arxiv.org/html/2505.04163v1

[2] UConn DSIS, TS-RAG: Time-Series Retrieval-Augmented Generation (n.d.), GitHub repository. Source: https://github.com/UConn-DSIS/TS-RAG

[3] Y. Zhang, H. Xu, X. Chen et al., Memory-Augmented Forecasting for Time Series with Rare Events (2024), arXiv preprint. Source: https://arxiv.org/abs/2412.20810


