
    From Reactive to Predictive: Forecasting Network Congestion with Machine Learning and INT

    By Editor Times Featured · July 19, 2025 · 8 min read


    Context

    In data centers, network slowdowns can seem to come out of nowhere. A sudden burst of traffic from distributed systems, microservices, or AI training jobs can overwhelm switch buffers in seconds. The problem is not just knowing when something goes wrong. It's being able to see it coming before it happens.

    Telemetry systems are widely used to monitor network health, but most operate in a reactive mode. They flag congestion only after performance has degraded. Once a link is saturated or a queue is full, you are already past the point of early diagnosis, and tracing the original cause becomes significantly harder.

    In-band Network Telemetry, or INT, tries to close that gap by tagging live packets with metadata as they travel through the network. It gives you a real-time view of how traffic flows, where queues are building up, where latency is creeping in, and how each switch is handling forwarding. It's a powerful tool when used carefully. But it comes with a cost: enabling INT on every packet can introduce serious overhead and push a flood of telemetry data to the control plane, much of which you might not even need.

    What if we could be more selective? Instead of monitoring everything, we forecast where trouble is likely to form and enable INT only for those areas, and only for a short time. This way, we get detailed visibility when it matters most without paying the full cost of always-on monitoring.

    The Problem with Always-On Telemetry

    INT gives you a powerful, detailed view of what's happening inside the network. You can observe queue lengths, hop-by-hop latency, and timestamps directly from the packet path. But there's a cost: this telemetry data adds weight to every packet, and if you apply it to all traffic, it can eat up significant bandwidth and processing capacity.

    To get around that, many systems take shortcuts:

    Sampling: Tag only a fraction (e.g., 1%) of packets with telemetry data.

    Event-triggered telemetry: Turn on INT only when something bad is already happening, like a queue crossing a threshold.

    These strategies help control overhead, but they miss the critical early moments of a traffic surge, the part you most want to understand if you're trying to prevent slowdowns.
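    To make the tradeoff concrete, the two shortcuts can be sketched as simple tagging predicates (the 1% rate and the queue threshold here are illustrative values, not taken from any particular deployment):

```python
import random

SAMPLE_RATE = 0.01       # tag roughly 1% of packets (illustrative)
QUEUE_THRESHOLD = 800    # queue depth, in packets, that triggers INT (illustrative)

def sampled_int(rng):
    """Sampling: tag a packet with probability SAMPLE_RATE, regardless of network state."""
    return rng.random() < SAMPLE_RATE

def event_triggered_int(queue_depth):
    """Event-triggered: tag packets only once a queue is already deep."""
    return queue_depth >= QUEUE_THRESHOLD

rng = random.Random(42)
tagged = sum(sampled_int(rng) for _ in range(100_000))
print(f"sampling tagged {tagged} of 100000 packets")       # roughly 1000
print(event_triggered_int(120), event_triggered_int(950))  # False True
```

    Neither predicate fires during the early build-up of a surge: sampling almost certainly misses the first packets of a burst, and the threshold check only fires after the queue is already deep.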

    Introducing a Predictive Approach

    Instead of reacting to symptoms, we designed a system that can forecast congestion before it happens and turn on detailed telemetry proactively. The idea is simple: if we can anticipate when and where traffic is going to spike, we can selectively enable INT just for that hotspot and just for the right window of time.

    This keeps overhead low but gives you deep visibility when it actually matters.

    System Design

    We came up with a simple approach that makes network monitoring more intelligent: it can predict when and where monitoring is actually needed. The idea is not to sample every packet, and not to wait for congestion to happen. Instead, we want a system that can catch signs of trouble early and selectively enable high-fidelity monitoring only when it's needed.

    So, how did we get this done? We created the following four key components, each with a distinct task.

    Image source: Author

    Data Collector

    We begin by gathering network data to monitor how much traffic is moving through different network ports at any given moment. We use sFlow for data collection because it captures key metrics without affecting network performance. These metrics are collected at regular intervals to give a real-time view of the network at any time.
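    As a rough sketch, the collector can poll sFlow-RT's REST API on a fixed interval (the endpoint path and the `ifinoctets` metric name follow sFlow-RT conventions; the host, port, and polling interval are assumptions):

```python
import json
import time
from urllib.request import urlopen

SFLOW_RT = "http://localhost:8008"  # default sFlow-RT REST endpoint (assumed deployment)

def parse_rates(metrics):
    """Map (agent, dataSource) -> current metric value from an sFlow-RT /metric response."""
    return {(m.get("agent"), m.get("dataSource")): m.get("metricValue") for m in metrics}

def poll_port_rates(metric="ifinoctets", interval=5.0, samples=3):
    """Poll sFlow-RT every `interval` seconds for per-port ingress byte rates."""
    history = []
    for _ in range(samples):
        with urlopen(f"{SFLOW_RT}/metric/ALL/{metric}/json", timeout=2) as resp:
            history.append(parse_rates(json.load(resp)))
        time.sleep(interval)
    return history
```

    Each poll yields a snapshot keyed by switch agent and port, which the forecasting engine can consume as a time series.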

    Forecasting Engine

    The forecasting engine is the most important component of our system. It's built using a Long Short-Term Memory (LSTM) model. We went with LSTM because it learns how patterns evolve over time, making it a good fit for network traffic. We're not looking for perfection here. The important thing is to spot unusual traffic spikes that typically show up before congestion begins.
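    A minimal version of such a model in Keras might look like this (the layer sizes and window length are illustrative, not a tuned configuration):

```python
import numpy as np
from tensorflow import keras

WINDOW = 12  # number of past rate samples fed to the model (illustrative)

# Small LSTM regressor: a window of recent traffic rates in, next rate out.
model = keras.Sequential([
    keras.Input(shape=(WINDOW, 1)),   # WINDOW time steps, 1 feature (byte rate)
    keras.layers.LSTM(32),            # learns short-term temporal patterns
    keras.layers.Dense(1),            # predicted rate for the next interval
])
model.compile(optimizer="adam", loss="mse")

# One synthetic batch just to exercise the shapes; real training uses traffic traces.
X = np.random.rand(64, WINDOW, 1).astype("float32")
y = X[:, -1, 0]
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1], verbose=0).shape)  # (1, 1)
```

    A single scalar output per window is enough here: the controller only compares it against a threshold, so the model doesn't need to forecast the whole future trajectory.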

    Telemetry Controller

    The controller listens to these forecasts and makes decisions. When a predicted spike crosses the alert threshold, the system responds: it sends a command to the switches to move into a detailed monitoring mode, but only for the flows or ports that matter. It also knows when to back off, turning off the extra telemetry once conditions return to normal.
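    The decision logic can be sketched as a small state machine with separate trigger and release thresholds, so telemetry doesn't flap on and off around a single value (the threshold values and the switch-API methods are hypothetical):

```python
class SwitchAPIStub:
    """Stand-in for the real switch client; records the commands it would send."""
    def __init__(self):
        self.calls = []

    def enable_int(self, port):
        self.calls.append(("on", port))

    def disable_int(self, port):
        self.calls.append(("off", port))


class TelemetryController:
    """Enables fine-grained INT per port based on forecasts, with hysteresis."""

    def __init__(self, switch_api, on_threshold=0.8, off_threshold=0.5):
        self.switch_api = switch_api        # pushes monitoring rules to switches
        self.on_threshold = on_threshold    # forecast utilization that enables INT
        self.off_threshold = off_threshold  # lower level at which INT is released
        self.int_enabled = set()            # ports currently under detailed monitoring

    def handle_forecast(self, port, predicted_utilization):
        if predicted_utilization >= self.on_threshold and port not in self.int_enabled:
            self.int_enabled.add(port)
            self.switch_api.enable_int(port)
        elif predicted_utilization <= self.off_threshold and port in self.int_enabled:
            self.int_enabled.remove(port)
            self.switch_api.disable_int(port)


api = SwitchAPIStub()
ctl = TelemetryController(api)
for forecast in [0.3, 0.9, 0.6, 0.4]:  # rise above 0.8, then fall below 0.5
    ctl.handle_forecast("s1-eth1", forecast)
print(api.calls)  # [('on', 's1-eth1'), ('off', 's1-eth1')]
```

    The gap between the two thresholds is what lets the controller "back off" cleanly: a forecast hovering between 0.5 and 0.8 changes nothing, avoiding repeated mode switches.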

    Programmable Data Plane

    The final piece is the switch itself. In our setup, we use P4-programmable BMv2 switches that let us alter packet behavior on the fly. Most of the time, the switch simply forwards traffic without making any changes. But when the controller activates INT, the switch starts embedding telemetry metadata into packets that match specific rules. These rules are pushed by the controller and let us target just the traffic we care about.
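    Conceptually, the per-hop behavior can be emulated in a few lines of Python (a stand-in for the actual P4/BMv2 program, which does this inside the forwarding pipeline; the field names are hypothetical):

```python
import time

def forward(packet, int_rules, switch_id, queue_depth):
    """Emulate one switch hop: forward the packet, embedding INT metadata
    only if its flow matches a controller-installed rule."""
    if (packet["src"], packet["dst"]) in int_rules:
        packet.setdefault("int_stack", []).append({
            "switch": switch_id,          # which hop added this record
            "queue_depth": queue_depth,   # queue occupancy seen at this hop
            "ts_ns": time.monotonic_ns(), # per-hop timestamp
        })
    return packet

rules = {("10.0.0.1", "10.0.0.2")}  # installed by the controller (hypothetical flow)
pkt = forward({"src": "10.0.0.1", "dst": "10.0.0.2"}, rules, "s1", 42)
print(len(pkt["int_stack"]))  # 1 -- metadata added on the matching hop
```

    Non-matching traffic passes through untouched, which is what keeps the overhead proportional to the small set of flows under suspicion.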

    This avoids the tradeoff between constant monitoring and blind sampling. Instead, we get detailed visibility exactly when it's needed, without flooding the system with unnecessary data the rest of the time.

    Experimental Setup

    We built a full simulation of this system using:

    • Mininet for emulating a leaf-spine network
    • BMv2 (P4 software switch) for programmable data plane behavior
    • sFlow-RT for real-time traffic stats
    • TensorFlow + Keras for the LSTM forecasting model
    • Python + gRPC + P4Runtime for the controller logic

    The LSTM was trained on synthetic traffic traces generated in Mininet using iperf. Once trained, the model runs in a loop, making predictions every 30 seconds and storing forecasts for the controller to act on.
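    Slicing a rate trace into supervised training windows might look like this (the window length of 12 samples and the synthetic sine-wave trace are assumptions for illustration):

```python
import numpy as np

def make_windows(trace, window=12):
    """Slice a 1-D rate trace into (window -> next value) training pairs."""
    X = np.array([trace[i:i + window] for i in range(len(trace) - window)])
    y = np.array([trace[i + window] for i in range(len(trace) - window)])
    return X[..., np.newaxis], y  # add a feature axis for the LSTM

trace = np.sin(np.linspace(0, 8 * np.pi, 200)) + 1.0  # stand-in for an iperf-generated trace
X, y = make_windows(trace)
print(X.shape, y.shape)  # (188, 12, 1) (188,)
```

    Each training example pairs the last 12 samples with the value that followed them, which is exactly the question the live loop asks every 30 seconds.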

    Here's a simplified version of the prediction loop:

    every 30 seconds:
        latest_sample = data_collector.current_traffic()
        sliding_window.append(latest_sample)
        if len(sliding_window) >= WINDOW_SIZE:
            forecast = forecast_engine.predict_upcoming_traffic(sliding_window)
            if forecast > ALERT_THRESHOLD:
                telem_controller.trigger_INT()

    Switches respond immediately by switching telemetry modes for specific flows.

    Why LSTM?

    We went with an LSTM model because network traffic tends to have structure. It's not completely random. There are patterns tied to time of day, background load, or batch processing jobs, and LSTMs are particularly good at picking up on these temporal relationships. Unlike simpler models that treat each data point independently, an LSTM can remember what came before and use that memory to make better short-term predictions. For our use case, that means spotting early signs of an upcoming surge just by looking at how the past few minutes behaved. We didn't need it to forecast exact numbers, just to flag when something abnormal might be coming. LSTM gave us just enough accuracy to trigger proactive telemetry without overfitting to noise.

    Evaluation

    We didn't run large-scale performance benchmarks, but through our prototype and the system's behavior under test conditions, we can outline the practical advantages of this design approach.

    Lead Time Advantage

    One of the primary benefits of a predictive system like this is its ability to catch trouble early. Reactive telemetry solutions typically wait until a queue threshold is crossed or performance degrades, which means you're already behind the curve. By contrast, our design anticipates congestion based on traffic trends and activates detailed monitoring in advance, giving operators a clearer picture of what led to the issue, not just the symptoms once they appear.

    Monitoring Efficiency

    A key goal in this project was to keep overhead low without compromising visibility. Instead of applying full INT across all traffic or relying on coarse-grained sampling, our system selectively enables high-fidelity telemetry for short bursts, and only where forecasts indicate potential problems. While we haven't quantified the exact cost savings, the design naturally limits overhead by keeping INT focused and short-lived, something that static sampling or reactive triggering can't match.

    Conceptual Comparison of Telemetry Strategies

    While we didn't record overhead metrics, the intent of the design was to find a middle ground: delivering deeper visibility than sampling or reactive systems, but at a fraction of the cost of always-on telemetry. Here's how the approach compares at a high level:

    Image source: Author

    Conclusion

    We wanted to find a better way to monitor network traffic. By combining machine learning and programmable switches, we built a system that predicts congestion before it happens and activates detailed telemetry in just the right place at just the right time.

    It seems like a minor change to predict instead of react, but it opens up a new level of observability. As telemetry becomes increasingly important in AI-scale data centers and low-latency services, this kind of intelligent monitoring will become a baseline expectation, not just a nice-to-have.



