
    From Connections to Meaning: Why Heterogeneous Graph Transformers (HGT) Change Demand Forecasting

By Editor Times Featured · January 28, 2026 · 13 Min Read


Forecasting errors are not caused by bad time-series models.

They're caused by ignoring structure.

SKUs don't behave independently. They interact through shared plants, product groups, warehouses, and storage locations. A demand shock to one SKU often propagates to others, yet most forecasting systems model each SKU in isolation.

In my previous article, we showed that explicitly modeling these connections matters. Using a real FMCG supply-chain graph, a simple Graph Neural Network (GraphSAGE) reduced SKU-level forecast error by over 27% compared to a strong naïve baseline, purely by allowing information to flow across related SKUs.

But GraphSAGE makes a simplifying assumption: all relationships are equal.

A shared plant is treated the same as a shared product group. Substitutes and complements are averaged into a single signal. This limits the model's ability to anticipate real demand shifts.

This article explores what happens when the model is allowed not just to see the supply-chain network, but to understand the meaning of each relationship within it.

We show how Heterogeneous Graph Transformers (HGT) introduce relationship-aware learning into demand forecasting, and why that seemingly small change produces more anticipatory forecasts, tighter error distributions, and materially better outcomes, even on intermittent, daily per-SKU demand, turning connected forecasts into meaning-aware, operationally grounded predictions.

A brief recap: What GraphSAGE told us

In the previous article, we trained a spatio-temporal GraphSAGE model on a real FMCG supply-chain graph with:

• 40 SKUs
• 9 plants
• 21 product groups
• 36 subgroups
• 13 storage locations

Each SKU was connected to others through shared plants, groups, and locations, creating a dense web of operational dependencies. The temporal characteristics displayed lumpy production and intermittent demand, a typical scenario in FMCG.

GraphSAGE allowed each SKU to aggregate information from its neighbors. That produced a significant leap in forecast quality.

Model | WAPE (SKU-daily)
Naïve baseline | 0.86
GraphSAGE | ~0.62

At the hardest possible level, daily, per-SKU, intermittent demand, a WAPE of ~0.62 is already nearly production-grade in FMCG.

But the error plots showed something important:

• The model followed trends well
• It handled zeros well
• But it smoothed away extreme spikes
• And it reacted instead of anticipating

This is because GraphSAGE assumes that all relationships are equal. Giving all relations equal weight means the model cannot learn that:

• A demand spike in a complementary SKU in the same plant should increase my forecast
• But a spike in a substitute SKU in the same product group should reduce it

Let's see how the Heterogeneous Graph Transformer (HGT) addresses the problem.

What HGT adds: Relationship-aware learning

Heterogeneous Graph Transformers are built for graphs where:

• There are multiple types of nodes (SKUs, plants, warehouses, groups) and/or
• There are multiple types of edges (shared plants, product groups, and so on)

In this case, while all nodes in the graph are SKUs, the relationships between them are heterogeneous. Here, HGT isn't used to model multiple entity types, but to learn relation-aware message passing.

The model learns separate transformation and attention mechanisms for each type of SKU–SKU relationship, allowing demand signals to propagate differently depending on why two SKUs are connected.

It learns:

"How should information flow across each type of relationship?"

Formally, instead of one aggregation function, HGT learns:

[
h_i = \sum_{r \in \{\text{plant},\ \text{group},\ \text{subgroup},\ \text{storage}\}} \sum_{j \in N_r(i)} \alpha_{r,i,j} W_r h_j
]

where

• r represents the type of operational relationship between SKUs (shared plant, product group, and so on)
• Wᵣ allows the model to treat each relationship differently
• αᵣ,ᵢ,ⱼ lets the model focus on the most influential neighbors
• The set Nᵣ(i) contains all SKUs that are directly connected to SKU i through a shared relationship r.

This lets the model learn, for example:

• Plant edges propagate capacity and production signals
• Product-group edges propagate substitution and demand transfer
• Warehouse edges propagate inventory buffering

The graph becomes economically meaningful, not just topologically connected.
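To make the formula concrete, here is a minimal NumPy sketch of one relation-aware aggregation step. It is not a full HGT layer (which also uses multi-head attention and per-type priors); the function name, dimensions, and toy graph are all illustrative:

```python
import numpy as np

def relation_aware_aggregate(h, neighbors_by_relation, W):
    """One relation-aware aggregation step: each relation type r gets its
    own projection W[r] and its own attention weights over neighbors.

    h                     : (num_skus, d) node embeddings
    neighbors_by_relation : {relation: {sku index: [neighbor indices]}}
    W                     : {relation: (d, d) projection (random stand-in here)}
    """
    out = np.zeros_like(h)
    for r, neigh in neighbors_by_relation.items():
        for i, js in neigh.items():
            if not js:
                continue
            msgs = h[js] @ W[r].T              # relation-specific transform
            scores = msgs @ h[i]               # dot-product attention logits
            alpha = np.exp(scores - scores.max())
            alpha /= alpha.sum()               # softmax over this relation's neighbors
            out[i] += alpha @ msgs             # attention-weighted sum, summed over relations
    return out

# Toy example: 4 SKUs, two relation types with different (random) weights
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
W = {"plant": rng.normal(size=(8, 8)), "group": rng.normal(size=(8, 8))}
edges = {"plant": {0: [1, 2]}, "group": {0: [3]}}
h_new = relation_aware_aggregate(h, edges, W)
print(h_new.shape)  # (4, 8)
```

Because `W["plant"]` and `W["group"]` are separate matrices, a spike arriving over a plant edge is transformed differently from the same spike arriving over a group edge, which is exactly the distinction GraphSAGE cannot express.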

Implementation (high-level)

Just as in the GraphSAGE model, we use:

• The same SupplyGraph dataset, temporal features, log1p normalization, and sliding window of 14 days.
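As a rough sketch of that shared preprocessing (only the 14-day window and log1p transform come from the setup above; the function name and toy series are hypothetical):

```python
import numpy as np

def make_windows(sales, window=14):
    """Turn a per-SKU daily sales series into (X, y) training pairs:
    X = log1p of the previous `window` days, y = log1p of the next day.
    log1p compresses the heavy right tail of intermittent demand and
    maps zero-sales days to exactly 0; np.expm1 inverts it."""
    z = np.log1p(np.asarray(sales, dtype=float))
    X = np.lib.stride_tricks.sliding_window_view(z[:-1], window)
    y = z[window:]
    return X, y

sales = [0, 3, 0, 0, 12, 5, 0, 7, 0, 0, 2, 9, 0, 1, 4, 0, 6]
X, y = make_windows(sales, window=14)
print(X.shape, y.shape)  # (3, 14) (3,)
```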

The difference is in the spatial encoder. The following is an overview of the architecture.

1. Heterogeneous Graph Encoder
  • Nodes: SKUs
  • Edges: shared plant, shared group, shared subgroup, and shared storage
  • HGT layers learn relation-specific message passing
2. Temporal Encoder
  • A time-series encoder processes the last 14 days of embeddings
  • This captures how the graph evolves over time
3. Output Head
  • A regressor predicts next-day log1p sales per SKU

Everything else (training, loss, evaluation) stays identical to GraphSAGE, so any difference in performance comes purely from better structural understanding.
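A shape-level sketch of how data flows through the three stages. The encoders are stand-ins (identity for the HGT encoder, a mean over time for the temporal encoder, zero weights for the head), so only the tensor shapes are meaningful; all dimensions are illustrative:

```python
import numpy as np

num_skus, window, d = 40, 14, 16
daily_features = np.random.default_rng(1).normal(size=(window, num_skus, d))

# 1. Spatial encoder: one embedding per SKU per day
#    (the real model applies HGT layers here; identity stands in)
spatial = daily_features                    # (14, 40, 16)

# 2. Temporal encoder: summarize each SKU's 14-day embedding sequence
#    (a mean stands in for a learned sequence model)
temporal = spatial.mean(axis=0)             # (40, 16)

# 3. Output head: regressor to next-day log1p sales per SKU
W_out, b_out = np.zeros((d, 1)), 0.0
pred_log1p = temporal @ W_out + b_out       # (40, 1)
print(pred_log1p.shape)  # (40, 1)
```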

The housing market analogy, now with meaning

In the previous article, we used a simple housing-market analogy to explain why graph-based forecasting works.

Let's upgrade it.

GraphSAGE: structure without meaning

GraphSAGE is like predicting the price of your house by looking at:

• The historical price of your house
• The average price movement of nearby houses

This already improves over treating your house in isolation. But GraphSAGE makes a critical simplifying assumption:

All neighbors influence your house in the same way.

In practice, this means GraphSAGE treats all nearby entities as identical signals. A luxury villa, a school, a shopping mall, a highway, or a factory are all just "neighbors" whose price signals get averaged together.

The model learns that houses are connected, but not why they're connected.

HGT: structure with meaning

Now imagine a more realistic housing model.

Every data point is still a house; there are no different node types.
But houses are connected through different kinds of relationships:

• Some share the same school district
• Some share the same builder or construction quality
• Some are near parks
• Others are near highways or industrial zones

Each of these relationships affects prices differently.

• Schools and parks tend to increase value
• Highways and factories typically reduce it
• Luxury houses matter more than neglected ones

A Heterogeneous Graph Transformer (HGT) learns these distinctions explicitly. Instead of averaging all neighbor signals, HGT learns:

• which type of relationship a neighbor represents, and
• how strongly that relationship should influence the prediction.

That distinction is what turns a connected demand forecast into a meaning-aware, operationally grounded prediction.

Comparison of Results

Here is the comparison of WAPE for HGT against GraphSAGE and the naïve baseline:

Model | WAPE
Naïve baseline | 0.86
GraphSAGE | 0.62
HGT | 0.58

At a daily per-SKU WAPE below 0.60, the Heterogeneous Graph Transformer (HGT) delivers a clear production-grade step-change over both traditional forecasting and GraphSAGE. The results show a ~32% reduction in misallocated demand versus traditional forecasting and a further 6–7% improvement over GraphSAGE.
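For reference, WAPE as used in these tables is total absolute error divided by total actual volume; a minimal implementation with made-up numbers:

```python
def wape(actual, forecast):
    """Weighted Absolute Percentage Error: sum of absolute errors over
    sum of actual volume. Unlike MAPE, it stays well defined on the
    many zero-demand days of intermittent SKU series."""
    num = sum(abs(a - f) for a, f in zip(actual, forecast))
    den = sum(abs(a) for a in actual)
    return num / den

actual   = [0, 10, 0, 4, 6]
forecast = [1,  8, 0, 5, 6]
print(round(wape(actual, forecast), 2))  # 0.2
```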

The following scatter chart depicts actual vs. predicted sales on the log1p scale for both GraphSAGE (purple dots) and HGT (cyan dots). While both models are good, there is greater dispersion among the purple GraphSAGE dots compared to the tight clustering of the cyan HGT ones, corresponding to the 6% improvement in WAPE.

Actual vs predicted (GraphSAGE vs HGT)

At the scale of this dataset (≈ 1.1 million units), that improvement translates into ~45,000 fewer units misallocated over the evaluation period.

Operationally, reducing misallocation by this magnitude leads to:

• Fewer emergency production changes
• Lower expediting and premium freight costs
• More stable plant and warehouse operations
• Better service levels on high-volume SKUs
• Less inventory trapped in the wrong locations

Importantly, these improvements come without adding business rules, planner overrides, or manual tuning.

And the bias comparison is as follows:

Model | Mean Forecast | Bias (Units) | Bias %
Naïve | ~701 | 0 | 0%
GraphSAGE | ~733 | +31 | ~4.5%
HGT | ~710 | +8.4 | ~1.2%

HGT introduces a very small positive bias, roughly 1–2%.

That is well within production-safe limits and aligns with how FMCG planners operate in practice, where a slight upward bias is often preferred to avoid stock-outs. The following histogram confirms a Gaussian distribution centered around zero, indicating unbiased performance on typical forecasting days.
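Bias is computed alongside WAPE; a small sketch with illustrative numbers (not the article's actual data):

```python
def forecast_bias(actual, forecast):
    """Mean signed error in units, and its size relative to mean actual
    demand. Positive bias means over-forecasting, which FMCG planners
    often slightly prefer to avoid stock-outs."""
    n = len(actual)
    bias_units = sum(f - a for a, f in zip(actual, forecast)) / n
    bias_pct = bias_units / (sum(actual) / n)
    return bias_units, bias_pct

actual   = [700, 710, 690, 705]
forecast = [708, 715, 700, 714]
units, pct = forecast_bias(actual, forecast)
print(round(units, 1), f"{pct:.1%}")  # 8.0 1.1%
```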

    Prediction error

The real difference between GraphSAGE and HGT becomes evident when we compare the forecasts for the top-4 SKUs by volume. Here is the GraphSAGE chart:

Forecast vs Actual – Top 4 SKUs (GraphSAGE)

And the same for HGT:

Forecast vs Actual – Top 4 SKUs (HGT)

The distinction is evident from the area highlighted in the first chart and across all the other SKUs:

• HGT isn't reactive like GraphSAGE. It's a stronger forecast, anticipating and tracking the peaks and troughs of the actual demand rather than smoothing out the fluctuations.
• This is a result of the differential learning of the structural relations between neighboring SKUs, which lets it predict a change in demand confidently before it starts.

And finally, the performance across SKUs with non-zero volumes clearly shows that all of the high-volume SKUs have a WAPE < 0.60, which is desirable for a production forecast and is an improvement over GraphSAGE.

Performance across SKUs

Explainability

HGT makes it practical to add explainability to the forecasts, which is essential for planners to trust the causality behind the features. When the model predicts a dip and we can show it is because "Neighbor X in the same subgroup is trending down," planners can validate the signal against real-world logistics, turning an AI prediction into actionable business insight.

Let's look at the influence of the various spatial and temporal features on the forecast for the first 7 days and last 7 days of the period, for the SKU with the highest volume (SOS001L12P). Here is the comparison of the temporal features:

Evolution of temporal features

And the spatial features:

Evolution of spatial features

The charts show that different features and SKUs/edges play a role during different time periods:

• For the first 7 days, Sales Lag (7d) has the maximum influence (23%), which shifts to Rolling Mean (21%) for the last 7 days.
• Similarly, during the initial 7 days there is heavy reliance on SOS005L04P, likely a primary storage node or precursor SKU that dictates immediate availability. By the end of the test period, the influence redistributes: SOS005L04P shares the stage with SOS002L09P (~40% share each), both from the same subgroup as our target SKU. This means the model is now aggregating signals from a broader subgroup of related products to form a more holistic view.

This type of analysis is crucial for understanding and forecasting the impact of marketing campaigns and promotions, or of external factors such as interest rates, on specific SKUs. These should be included in the spatial structure as additional nodes in the graph, with the SKUs linked to them.

Not All Supply Chains Are Created Equal

The use case here is a relatively simple one, with only SKUs as nodes. That is because in FMCG, plants and warehouses act largely as buffers: they smooth volatility but rarely hard-stop the system. That is why HGT can learn much of their effect purely from edge types like shared plant or shared warehouse, without modeling them as explicit nodes. Supply chains can be far more complex, however. Automotive supply chains, for example, are very different. A paint shop, engine line, or regional distribution center is a hard capacity bottleneck: when it is constrained, demand for specific trims or colors collapses regardless of market demand. In that setting, HGT still benefits from typed relationships, but it also requires explicit Plant and Warehouse nodes with their own time-series signals (capacity, output, backlogs, delays) to model how supply-side physics interact with customer demand. In other words, FMCG needs structure-aware graphs; automotive needs causality-aware graphs.

Other factors that are common across industries are promotions, marketing spend, seasonality, and external factors such as economic conditions (e.g., fuel prices) or competitor launches in a segment. These also affect SKUs in different ways. For example, a fuel price increase or a new regulation may dampen sales of ICE vehicles and boost sales of electric ones. Such factors need to be included in the graph as nodes, with their relations to the SKUs included in the spatial model, and their temporal features need to include the historical data from when the events occurred. This would enable HGT to learn the effects of these factors on demand in the weeks and months following an event.
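One way such factor nodes could be represented, sketched as a plain edge-type dictionary in the style of heterogeneous-graph libraries; all node names, relation names, and feature values here are hypothetical, not from the article's dataset:

```python
# Extending the graph with external-factor ("event") nodes so HGT can
# learn their typed influence on SKUs alongside the SKU-SKU relations.
graph = {
    "nodes": {
        "sku":   ["SKU_A", "SKU_B"],
        "event": ["promo_Q3", "fuel_price"],
    },
    "edges": {
        ("sku", "shared_plant", "sku"): [("SKU_A", "SKU_B")],
        ("sku", "shared_group", "sku"): [("SKU_A", "SKU_B")],
        ("event", "affects", "sku"):    [("promo_Q3", "SKU_A"),
                                         ("fuel_price", "SKU_B")],
    },
    # Event nodes carry their own history, so the temporal encoder can
    # see when each event was active or how strong it was.
    "features": {
        "fuel_price": [1.9, 2.1, 2.4],  # e.g. a weekly price index
        "promo_Q3":   [0, 0, 1],        # on/off indicator per week
    },
}

# Each distinct relation name gets its own W_r and attention in HGT
relations = sorted({r for (_, r, _) in graph["edges"]})
print(relations)  # ['affects', 'shared_group', 'shared_plant']
```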

Key Takeaways

• Supply-chain demand isn't just connected; it's structured. Treating all SKU relationships as equal fails to harness the full predictive potential.
• GraphSAGE proves that networks matter: simply allowing SKUs to exchange information across shared plants, groups, and locations delivers a significant accuracy leap over classical forecasting.
• Heterogeneous Graph Transformers go one step further by learning why SKUs are connected. A shared plant, a shared subgroup, and a shared warehouse don't propagate demand in the same way, and HGT learns that distinction directly from data.
• That structural awareness translates into real outcomes: lower WAPE, tighter forecast dispersion, better peak anticipation, and materially fewer misallocated units, without business rules, manual tuning, or planner overrides.
• Explainability becomes operational, not cosmetic. Relation-aware attention allows planners to trace forecasts back to economically meaningful drivers, turning predictions into trusted decisions.
• The broader lesson: as supply chains grow more interdependent, forecasting models must evolve from time-series-only to relationship-aware systems. In FMCG this means structure-aware graphs; in more constrained industries like automotive, it means causality-aware graphs with explicit bottlenecks.

In short: when the model understands the meaning of connections, forecasting stops being reactive and starts being anticipatory.

What's next? From Concepts to Code

Across this article and the previous one, we moved step by step through the evolution of demand forecasting, from isolated time-series models, to GraphSAGE, and finally to Heterogeneous Graph Transformers, showing how each shift progressively improves forecast quality by better reflecting how real supply chains operate.

The next logical step is to move from concepts to code.

In the next article, we will translate these ideas into an end-to-end, implementable workflow. Using focused code examples, we will walk through how to:

• Construct the supply-chain graph and define relationship types
• Engineer temporal features for intermittent, SKU-level demand
• Design and train GraphSAGE and HGT models
• Evaluate performance using production-grade metrics
• Visualize forecasts, errors, and relation-aware attention
• Add explainability so planners can understand why a forecast changed

The goal is not just to show how to train a model, but how to build a production-ready, interpretable graph-based forecasting system that practitioners can adapt to their own supply chains.

If this article explained why structure and meaning matter, the next one will show exactly how to make them work in code.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

Reference

SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib.

Images used in this article were generated using Google Gemini. Charts and underlying code were created by me.


