
What It Actually Takes to Run Code on a 200M€ Supercomputer

By Editor Times Featured | April 16, 2026 | 12 min read


If you stroll across the campus of the Polytechnic University of Catalonia in Barcelona, you may stumble upon the Torre Girona chapel in a beautiful park. Built in the nineteenth century, it features a large cross, high arches, and stained glass. But inside the main hall, encased in a vast illuminated glass box, sits a different kind of architecture.

This is the historic home of MareNostrum. While the original 2004 racks remain on display in the chapel as a museum piece, the latest iteration, MareNostrum V, one of the fifteen most powerful supercomputers in the world, spans a dedicated, heavily cooled facility right next door.

Most data scientists are used to spinning up a heavy EC2 instance on AWS or relying on distributed frameworks like Spark or Ray. High-Performance Computing (HPC) at the supercomputer level is a different beast entirely. It operates on different architectural rules, different schedulers, and a scale that is difficult to fathom until you use it.

I recently had the chance to use MareNostrum V to generate massive amounts of synthetic data for a machine learning surrogate model. What follows is a look under the hood of a 200M€ machine: what it is, why its architecture looks the way it does, and how you actually interact with it.

The Architecture: Why You Should Care About the Wiring

The mental model that causes the most confusion when approaching HPC is this: you are not renting time on a single, impossibly powerful computer. You are submitting work to be distributed across thousands of independent computers that happen to share an extremely fast network.

Why should a data scientist care about the physical networking? Because if you have ever tried to train a large neural network across several AWS instances and watched your expensive GPUs idle while waiting for a data batch to transfer, you know that in distributed computing, the network is the computer.

To prevent bottlenecks, MareNostrum V uses an InfiniBand NDR200 fabric arranged in a fat-tree topology. In a standard office network, as multiple computers try to talk across the same main switch, bandwidth gets congested. A fat-tree topology solves this by increasing the bandwidth of the links as you move up the network hierarchy, literally making the "branches" thicker near the "trunk." This ensures non-blocking bandwidth: any of the 8,000 nodes can talk to any other node at exactly the same minimal latency.

Fat-tree architecture, by HoriZZon~commonswiki via Wikimedia Commons (CC BY-SA 4.0)

The machine itself represents a joint investment from the EuroHPC Joint Undertaking, Spain, Portugal, and Turkey, split into two main computational partitions:

General Purpose Partition (GPP):

It is designed for highly parallel CPU tasks. It contains 6,408 nodes, each packing 112 Intel Sapphire Rapids cores, with a combined peak performance of 45.9 PFlops. This is the partition you will use most often for "general" computing tasks.

    Accelerated Partition (ACC):

This one is more specialized, designed with AI training, molecular dynamics, and similar workloads in mind. It contains 1,120 nodes, each with four NVIDIA H100 SXM GPUs. Considering that a single H100 retails for roughly $25,000, the GPU cost alone exceeds $110 million.
The GPUs give it a much higher peak performance than the GPP, reaching up to 260 PFlops.

There is also a special type of node, the Login Nodes. These act as the front door to the supercomputer. When you SSH into MareNostrum, this is where you land. Login nodes are strictly for lightweight tasks: moving files, compiling code, and submitting job scripts to the scheduler. They are not for computing.

Photo by Planet Volumes on Unsplash

Quantum Infrastructure: Classical nodes are not the only hardware inside the glass box. Recently, MareNostrum 5 was physically and logically integrated with Spain's first quantum computers. This includes a digital gate-based quantum system and the newly acquired MareNostrum-Ona, a state-of-the-art quantum annealer based on superconducting qubits. Rather than replacing the classical supercomputer, these quantum processing units (QPUs) act as highly specialized accelerators.

When the supercomputer encounters fiercely complex optimization problems or quantum chemistry simulations that can choke even the H100 GPUs, it can offload those specific calculations to the quantum hardware, creating a massive hybrid classical-quantum computing powerhouse.

Airgaps, Quotas, and the Reality of HPC

Understanding the hardware is only half the battle. The operational rules of a supercomputer are entirely different from those of a commercial cloud provider. MareNostrum V is a shared public resource, which means the environment is heavily restricted to ensure security and fair play.

The airgap on MN-V, by author using Inkscape

The Airgap: One of the biggest shocks for data scientists transitioning to HPC is the network restriction. You can access the supercomputer from the outside world via SSH, but the compute nodes absolutely cannot access the outside world. There is no outbound internet connection. You cannot pip install a missing library, wget a dataset, or connect to an external HuggingFace repository as you see fit. Everything your script needs must be pre-downloaded, compiled, and sitting in your storage directory before you submit your job.
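For Python dependencies, one common workaround is to fetch wheels on a machine that does have internet access and install them offline on the cluster. This is a sketch, not the officially documented MareNostrum workflow; the hostname, paths, and package names are placeholders:

```shell
# On your local machine (with internet): download wheels for your dependencies
pip download numpy scipy -d ./wheels

# Copy the wheels to your storage directory via the login node
# (login.example-hpc.es is a placeholder hostname)
rsync -av ./wheels/ user@login.example-hpc.es:~/wheels/

# On the cluster: install entirely offline from the local wheel directory
pip install --no-index --find-links ~/wheels numpy scipy
```

The `--no-index` flag stops pip from ever contacting PyPI, so the install succeeds even on airgapped nodes.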

In reality, this is less of an issue than it sounds, since the MareNostrum administrators provide most of the libraries and software you may need through a module system.
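On clusters that follow the Environment Modules/Lmod convention, a typical session looks like the following sketch (the module name is illustrative, not MareNostrum's actual catalog):

```shell
module avail                  # list every module the admins provide
module load Python/3.11.3     # put a specific toolchain on your PATH
module list                   # show what is currently loaded
module purge                  # reset to a clean environment
```

Loading a module simply rewrites environment variables like PATH and LD_LIBRARY_PATH, which is why scripts usually start with `module purge` to guarantee a reproducible environment.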

Moving Data: Because of this strict boundary, data ingress and egress happen via scp or rsync through the login nodes. You push your raw datasets in over SSH, wait for the compute nodes to chew through the simulations, and pull the processed tensors back out to your local machine. One surprising side effect of this restriction is that, since the actual computation can be so incredibly fast, the bottleneck becomes extracting the finished results to your local machine for postprocessing and visualization.
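In practice that looks like the sketch below (hostname and paths are placeholders). rsync is usually preferable to scp because it can resume a dropped transfer instead of restarting it:

```shell
# Push raw inputs up to the cluster through the login node
rsync -avz --progress ./datasets/ user@login.example-hpc.es:~/project/datasets/

# Later, pull the processed results back down; --partial keeps
# half-transferred files so an interrupted pull can resume
rsync -avz --partial user@login.example-hpc.es:~/project/results/ ./results/
```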

Limits and Quotas: You cannot simply launch a thousand jobs and monopolize the machine. Your project is assigned a specific CPU-hour budget. Additionally, there are hard limits on how many concurrent jobs a single user can have running or queued at any given time.

You must also specify a strict wall-time limit for every single job you submit. Supercomputers do not tolerate loitering: if you request two hours of compute time and your script needs two hours and one second, the scheduler will ruthlessly kill your process mid-calculation to make room for the next researcher.
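One defensive pattern against this is to ask SLURM to warn your job shortly before the limit so it can checkpoint. This is a sketch, not something the article's author describes using; `my_solver` and its flag are hypothetical, and whether this is worthwhile depends on your solver supporting restarts:

```shell
#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --signal=B:USR1@120   # deliver SIGUSR1 to the batch shell 120s before the limit

# On SIGUSR1, drop a checkpoint marker so the next submission can resume
trap 'echo "wall time almost up, checkpointing"; touch CHECKPOINT' USR1

# Run the real work in the background and wait, so the trap can fire
srun ./my_solver --resume-from CHECKPOINT &
wait
```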

Logging in the Dark: Because you submit these jobs to a scheduler and walk away, there is no live terminal output to stare at. Instead, all standard output (stdout) and standard error (stderr) are automatically redirected into log files (e.g., sim_12345.out and sim_12345.err). When your job completes, or if it crashes overnight, you have to comb through these generated text files to verify the results or debug your code. You do, however, have tools to monitor the status of your submitted jobs, such as squeue, or the classic tail -f on the log files.
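A handful of commands cover most day-to-day monitoring (the job ID here is a placeholder):

```shell
squeue -u $USER                # your jobs: pending, running, and the reason why
scontrol show job 12345        # full details for one job (nodes, limits, priority)
sacct -j 12345 --format=JobID,State,Elapsed,MaxRSS   # accounting after it finishes
tail -f logs/sim_12345.out     # follow a running job's stdout in real time
```

`sacct` is particularly useful after an overnight crash, since it records the exit state and peak memory of jobs that no longer appear in the queue.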

Understanding the SLURM Workload Manager

When you finally get your research allocation approved and log into MareNostrum V via SSH, your reward is… a completely standard Linux terminal prompt.

After months of writing proposals for access to a 200M€ machine, it is, frankly, a bit underwhelming. There are no flashing lights, no holographic progress bars, nothing to signal just how powerful the engine behind the wheel is.

Initial terminal view after login, by author

Because thousands of researchers use the machine concurrently, you cannot just execute a heavy Python or C++ script directly in the terminal. If you do, it will run on the login node, quickly grinding it to a halt for everyone else and earning you an incredibly polite but rather firm and angry email from the system administrators.

SLURM schema on MN-V, by author using Inkscape

Instead, HPC relies on a workload manager called SLURM. You write a bash script detailing exactly what hardware you need, what software environments to load, and what code to execute. SLURM puts your job in a queue, finds the hardware when it becomes available, executes your code, and releases the nodes.

SLURM stands for Simple Linux Utility for Resource Management, and it is free and open-source software that handles job scheduling on many computer clusters and supercomputers.

Before building a complex pipeline, you need to understand how to communicate with the scheduler. This is done using #SBATCH directives placed at the top of your submission script. These directives act as your shopping list for resources:

    • --nodes: The number of distinct physical machines you need.
    • --ntasks: The total number of separate MPI processes (tasks) you want to spawn. SLURM handles distributing these tasks across your requested nodes.
    • --time: The strict wall-clock time limit for your job. Supercomputers do not tolerate loitering; if your script runs even one second over this limit, SLURM ruthlessly kills the job.
    • --account: The specific project ID that will be billed for your CPU-hours.
    • --qos: The "Quality of Service," or specific queue, you are targeting. For instance, using a debug queue grants faster access but limits you to short runtimes for testing.

A Practical Example: Orchestrating an OpenFOAM Sweep

To ground this in reality, here is how I actually used the machine. I was building an ML surrogate model to predict aerodynamic downforce, which required ground-truth data from 50 high-fidelity computational fluid dynamics (CFD) simulations across 50 different 3D meshes.

Example flow around one of the 3D meshes, by author using ParaView

Here is the actual SLURM job script for a single OpenFOAM CFD case on the General Purpose Partition:

    #!/bin/bash
    #SBATCH --job-name=cfd_sweep
    #SBATCH --output=logs/sim_%j.out
    #SBATCH --error=logs/sim_%j.err
    #SBATCH --qos=gp_debug
    #SBATCH --time=00:30:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=6
    #SBATCH --account=nct_293
    
    module purge
    module load OpenFOAM/11-foss-2023a
    source $FOAM_BASH
    
    # MPI launchers handle core mapping automatically
    srun --mpi=pmix surfaceFeatureExtract
    srun --mpi=pmix blockMesh
    srun --mpi=pmix decomposePar -force
    srun --mpi=pmix snappyHexMesh -parallel -overwrite
    srun --mpi=pmix potentialFoam -parallel
    srun --mpi=pmix simpleFoam -parallel
    srun --mpi=pmix reconstructPar
    

Rather than manually submitting this 50 times and flooding the scheduler, I used SLURM dependencies to chain each job behind the previous one. This creates a clean, automated data pipeline:

    #!/bin/bash
    PREV_JOB_ID=""
    
    for CASE_DIR in cases/case_*; do
      cd $CASE_DIR
      
      if [ -z "$PREV_JOB_ID" ]; then
        OUT=$(sbatch run_all.sh)
      else
        OUT=$(sbatch --dependency=afterany:$PREV_JOB_ID run_all.sh)
      fi
      
      PREV_JOB_ID=$(echo $OUT | awk '{print $4}')
      cd ../..
    done
    

This orchestrator drops a chain of fifty jobs into the queue in seconds. I walked away, and by the next morning, my 50 aerodynamic evaluations had been processed, logged, and ready to be formatted into tensors for ML training.
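When the cases are fully independent, a SLURM job array is a common alternative to a dependency chain. The sketch below assumes the same directory layout as above; the throttle (%5) keeps the scheduler from seeing all fifty tasks as runnable at once:

```shell
#!/bin/bash
#SBATCH --array=0-49%5              # 50 tasks, at most 5 running at once
#SBATCH --job-name=cfd_array
#SBATCH --output=logs/sim_%A_%a.out # %A = array job ID, %a = task index
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=6

# Each array task selects its own case directory from its index
CASE_DIR="cases/case_${SLURM_ARRAY_TASK_ID}"
cd "$CASE_DIR"

# ...the per-case OpenFOAM commands from the single-case script go here...
```

This counts as a single submission against per-user job limits on many sites, though whether an array or a dependency chain fits better depends on local queue policy.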

Example underside pressure on one of the 3D meshes, by author using ParaView

Parallelism Limits: Amdahl's Law

A common question from newcomers is: if you have 112 cores per node, why did you only request 6 tasks (ntasks=6) for your CFD simulation?

The answer is Amdahl's Law. Every program has a serial fraction that cannot be parallelized. The law states that the theoretical speedup of executing a program across multiple processors is strictly limited by the fraction of the code that must be executed serially. It is a very intuitive law and, mathematically, it is expressed as:

\[
S = \frac{1}{(1 - p) + \frac{p}{N}}
\]

where S is the overall speedup, p is the proportion of the code that can be parallelized, 1 − p is the strictly serial fraction, and N is the number of processing cores.

Because of that (1 − p) term in the denominator, you face an insurmountable ceiling. If just 5% of your program is fundamentally sequential, the maximum theoretical speedup you can achieve, even if you use every single core in MareNostrum V, is 20x.
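Plugging the 5% example (p = 0.95) into the formula makes the ceiling explicit: as N grows without bound, the p/N term vanishes and only the serial fraction survives.

\[
\lim_{N \to \infty} S = \frac{1}{(1 - p) + \frac{p}{N}} = \frac{1}{1 - p} = \frac{1}{0.05} = 20
\]

No amount of additional hardware can push past that limit; only shrinking the serial fraction itself can.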

Furthermore, dividing a task across too many cores increases the communication overhead on that InfiniBand network we discussed earlier. If the cores spend more time passing boundary conditions to one another than doing actual math, adding more hardware slows the program down.

Runtime as resources increase, for several problem sizes N, by author using matplotlib

As shown in this figure, when simulating a small problem (N=100), runtime actually increases beyond 16 threads. Only at massive scales (N=10k+) does the hardware become fully productive. Writing code for a supercomputer is an exercise in managing this compute-to-communication ratio.

Getting Access to the Prompt

Despite the staggering cost of the hardware, access to MareNostrum V is free for researchers, as compute time is treated as a publicly funded scientific resource.

If you are affiliated with a Spanish institution, you can apply through the Spanish Supercomputing Network (RES). For researchers across the rest of Europe, the EuroHPC Joint Undertaking runs regular access calls. Their "Development Access" track is specifically designed for projects porting code or benchmarking ML models, making it highly accessible for data scientists.

When you sit at your desk staring at that utterly unremarkable SSH prompt, it is easy to forget what you are actually connected to. What that blinking cursor does not show is the 8,000 nodes behind it, the fat-tree fabric routing messages between them at 200 Gb/s, or the scheduler coordinating hundreds of concurrent jobs from researchers across six countries.

The "single powerful computer" picture persists in our heads because it is simpler. But the distributed reality is what makes modern computing possible, and it is far more accessible than most people realize.

