Why AI Still Can’t Solve Your Real Mathematical Optimization Problem

If you’ve tried to use AI to build a mathematical optimization model for a real business problem, you’ve probably run into the same wall: the AI works beautifully on textbook examples and falls apart the moment you hand it your actual data and your actual problem.

That gap isn’t a coincidence. It’s by design, and it’s the reason I built ORPilot.

The Promise of AI-Powered Optimization

Operations Research (OR) has been quietly powering some of the most impactful decisions in business for decades: routing delivery trucks, scheduling factory production, designing supply chains, allocating shipments to carriers. The math is mature and the solvers are excellent. The bottleneck has always been the human expertise required to translate a business problem into a mathematical model.

Large Language Models (LLMs) seemed like the perfect solution. A growing body of research, including the OptiMUS series, OR-LLM, and others, has shown that state-of-the-art LLMs can generate correct solver code for well-specified linear programming (LP) and mixed integer programming (MIP) problems. The results looked impressive. The demos were compelling.

Then you try to use one of these tools on a real problem, and the cracks appear immediately.

Where Current Tools Break Down

Almost every LLM-for-OR tool built so far shares a hidden assumption: the problem description is complete, unambiguous, and handed to the AI in a single, well-formatted prompt with all the data neatly embedded inline.

That’s not how real OR problems work. Not even close.

Consider what actually happens when a supply chain team wants to build an optimization model:

The problem description is incomplete and ambiguous. A business analyst will say “we want to minimize transportation costs” and forget to mention that each distribution center has a throughput limit, that some routes don’t exist, or that opening a facility incurs a one-time fixed cost. These omissions aren’t carelessness. They’re assumptions the analyst considers obvious, which is exactly why they’re dangerous. An AI system that starts modeling before these details are nailed down produces a model that’s technically correct but practically wrong.
The data is too large to fit in a prompt. A real supply chain problem might involve hundreds of production sites, distribution centers, and customers, and thousands of products over multiple periods. The demand table alone might have millions of entries. You cannot embed that in a prompt. Even if you could, flooding the context window with raw data dramatically increases the risk of hallucinations.
The data you have is not the data the model needs. The model might need a distance matrix between all pairs of locations. What you have is a table of GPS coordinates. The model might need aggregate demand by product and period. What you have is a transaction ledger with one row per order. Bridging this gap, namely computing derived parameters from raw data, is a significant engineering step that no existing LLM-for-OR tool handles automatically.
Once you have a working model, portability and reproducibility matter. If you want to re-run the model on updated data, switch from Gurobi to an open-source solver, or hand the model off to a colleague on a different machine, you’re back to square one unless the tool produces a durable, solver-agnostic artifact. Most tools produce solver-specific code and nothing else.

These aren’t edge cases. They’re the standard conditions for any real-world OR deployment. Current LLM-for-OR tools were built for a different world, a textbook world, and they show their seams the moment they leave it.

Introducing ORPilot

ORPilot is an open-source AI agent built from the ground up for production conditions. It is, to my knowledge, the first LLM-based OR tool designed explicitly for the messy, large-scale, data-heavy reality of industrial optimization.

Most AI tools for optimization jump straight to writing code the moment you describe your problem. ORPilot does something different: it asks questions first.

That design decision, prioritizing understanding over speed, reflects a single guiding principle: an AI agent should work the same way a skilled human OR consultant would.

A good consultant doesn’t walk into a client meeting and start writing a mathematical model on the whiteboard. They ask questions. They listen carefully. They push back when something is ambiguous. They make sure the data is in the right shape before the modeling starts. Only after all of that do they pick up the pen.

ORPilot’s pipeline reflects this discipline through five sequentially connected stages.

Stage 1: Interview Agent

The interview agent is the entry point. It receives your initial description of the business problem, which may be vague, incomplete, or even self-contradictory, and engages you in a structured conversation to fill in the gaps. The key design principle is that no modeling starts until the interview is complete.

The agent is prompted to identify information gaps in the current description, ask at most one targeted clarifying question per turn (to avoid overwhelming you), and terminate once the objective function, decision variables, constraints, and data requirements are all unambiguously specified.

In practice, this means conversations like:

ORPilot: “Once a facility is opened, does it remain open for all subsequent periods, or can it be closed later?”

ORPilot: “Does this model handle a single product type or multiple products?”

ORPilot: “You mentioned a transportation cost. Is this cost per unit shipped, per shipment regardless of quantity, or something else?”

Before ending the interview, the agent presents a full structured summary (objective function, decision variables, constraints, parameters, and indices) and gives you the chance to correct anything before that summary is passed downstream. This is the guard against the most common failure mode in LLM-for-OR tools: modeling the wrong problem.

Stage 2: Data Collection Agent

This stage has no counterpart in most existing LLM-for-OR tools. It is one of the most important structural innovations in ORPilot.

Most existing LLM-for-OR tools assume the data is embedded in the problem text and small enough to fit in a prompt. For textbook problems, this works. For real problems, it breaks down in two ways. First, real datasets are too large. For example, a 500-customer, 500-product, 12-period supply chain problem would have 3,000,000 demand entries. Second, embedding data in the prompt inflates hallucination risk and burns through the context window unnecessarily.
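The arithmetic behind that figure is easy to check. The snippet below is only a back-of-the-envelope illustration: the tokens-per-row constant is my own rough assumption, not a measured value, and real tokenizers will give different counts.

```python
# Dimensions from the example above.
n_customers, n_products, n_periods = 500, 500, 12

# One demand value per (customer, product, period) combination.
n_demand_entries = n_customers * n_products * n_periods
print(f"{n_demand_entries:,} demand entries")  # 3,000,000 demand entries

# ASSUMPTION: roughly 10 tokens per CSV row (three keys, one value, separators).
# This is an illustrative guess, not a property of any particular tokenizer.
ASSUMED_TOKENS_PER_ROW = 10
estimated_tokens = n_demand_entries * ASSUMED_TOKENS_PER_ROW
print(f"about {estimated_tokens:,} tokens if pasted inline")
```

Even if the per-row guess is off by a wide margin, the table is orders of magnitude too large to paste into a prompt.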

ORPilot’s answer is to treat data as entirely separate from the prompt. Data lives in CSV files. The AI accesses it only by writing and executing code. The data collection agent’s job is to determine exactly what those CSV files need to look like.

Based on the problem specification from the interview agent, the data collection agent determines:

Which entities (sets) exist in the model
What attributes (parameters) each entity needs
The precise schema for each required table: column names, types, semantics

It presents this specification to you and waits until you’ve supplied all the files in the correct format. It validates completeness before proceeding.
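To make the idea of a table schema and a completeness check concrete, here is a minimal sketch. It is not ORPilot’s actual implementation: the table names, column names, and the validate_table helper are all hypothetical, chosen only to illustrate checking a supplied CSV against a declared schema.

```python
import csv
import io

# HYPOTHETICAL schema: table name -> {column name: type used to parse each value}.
SCHEMA = {
    "customers": {"customer_id": str, "latitude": float, "longitude": float},
    "demand": {"customer_id": str, "period": int, "quantity": float},
}

def validate_table(table, csv_text, schema=SCHEMA):
    """Return a list of problems found; an empty list means the table passes."""
    expected = schema[table]
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Completeness check: every required column must be present.
    missing = [column for column in expected if column not in header]
    if missing:
        return [f"{table}: missing columns {missing}"]

    # Type check: every value must parse as the declared type.
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, parse in expected.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                problems.append(
                    f"{table} line {line_no}: {column}={row[column]!r} "
                    f"is not a valid {parse.__name__}"
                )
    return problems

# Usage: a well-formed table yields no problems.
print(validate_table("demand", "customer_id,period,quantity\nC1,1,10.5\n"))
```

Returning a list of problems rather than raising on the first one lets the user fix a whole file in one pass.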

Crucially, the agent is flexible: if you don’t have a particular piece of model-ready data (say, the model needs a distance matrix but you only have GPS coordinates), you tell the agent what you do have, and it updates the schema accordingly, passing the gap to the next stage to handle.

Stage 3: Parameter Computation Agent

Almost every existing LLM-for-OR tool assumes the numerical quantities needed by the model appear directly in the user-supplied data. In practice, this is almost never true. Two examples that come up constantly in real OR problems:

A vehicle routing model needs a pairwise distance matrix. The user has GPS coordinates. Computing Euclidean or geographic distances is a transformation entirely outside the scope of LP/MIP formulation.
A multi-period production model needs aggregate demand per period. The user has a transaction ledger with one row per order. The model parameter is a sum-aggregation that has to be computed from the raw data.

The parameter computation agent bridges this gap automatically. It receives the problem specification and the raw CSV files, then:

Identifies which model parameters cannot be read directly from the raw tables
Generates a Python script to compute those derived parameters
Executes the script in a sandboxed environment
Writes the results as additional CSV files, passed to the modeling step

This ensures that by the time the modeling agent sees the data, it is clean, correctly typed, correctly indexed, and model-ready. In our experiments, this step significantly reduced code generation failures and retry counts.
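For the two examples above, the kind of script such an agent might generate could look like the sketch below. It is illustrative only: the function names and input shapes are my assumptions, and the great-circle (haversine) formula is one reasonable choice of geographic distance, not necessarily the one ORPilot generates.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # min() guards against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def distance_matrix(coords):
    """coords: {location_id: (lat, lon)} -> {(i, j): distance in km} for all pairs."""
    return {
        (i, j): haversine_km(*coords[i], *coords[j])
        for i in coords
        for j in coords
    }

def aggregate_demand(ledger):
    """ledger: iterable of (product, period, quantity) order rows
    -> {(product, period): total quantity}."""
    totals = defaultdict(float)
    for product, period, quantity in ledger:
        totals[(product, period)] += quantity
    return dict(totals)

# Usage with made-up inputs.
dist = distance_matrix({"A": (0.0, 0.0), "B": (0.0, 1.0)})
demand = aggregate_demand([("P1", 1, 5), ("P1", 1, 7), ("P2", 1, 1)])
```

Neither transformation involves any optimization modeling, which is why it makes sense to run it as a separate step before the model is written.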

Another common situation where the parameter computation agent can be useful is computing BigM values. In some experiments I ran with ORPilot, the parameter computation agent computed a BigM value needed for constraints linking continuous shipment variables to binary facility-opening decisions. This is a derived parameter that would be impractical to ask the user to provide directly.
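As an illustration of what such a derived parameter can look like, consider a linking constraint of the form "total shipments out of facility j <= M_j * y_j", where y_j is the binary open/closed decision. One standard valid choice is M_j = min(capacity of j, total demand), since no facility can ever ship more than either quantity. The sketch below assumes that formulation; the article’s experiments do not state which bound ORPilot actually computed, and the function name and inputs are hypothetical.

```python
def big_m_per_facility(capacity, demand):
    """Return a valid big-M bound for each facility.

    capacity: {facility_id: maximum throughput}
    demand:   {customer_id: demand quantity}

    ASSUMPTION: the linking constraint is
        sum of shipments out of facility j <= M_j * y_j
    so M_j only needs to bound the largest shipment total that could
    ever be feasible: the smaller of the facility's capacity and the
    total demand in the system.
    """
    total_demand = sum(demand.values())
    return {j: min(cap, total_demand) for j, cap in capacity.items()}

# Usage with made-up numbers: F1 is capacity-bound, F2 is demand-bound.
print(big_m_per_facility({"F1": 100, "F2": 1000}, {"C1": 150, "C2": 50}))
```

Keeping M as small as validity allows matters in practice, because loose big-M values weaken the LP relaxation and can cause numerical trouble in the solver.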

Stage 4: Code Generation Agent

With a complete problem specification, raw data, and derived parameters all in hand, the code generation agent produces a complete Python solver script for your chosen backend. ORPilot currently supports five backends: Gurobi, CPLEX, PuLP, Pyomo, and OR-Tools.

The generated code is immediately executed in a sandbox. If anything goes wrong (a syntax error, a runtime exception, or an infeasible/unbounded solver result), the full error message and traceback are fed back to the LLM along with the previously generated code. The agent retries, up to a user-configurable maximum number of attempts.

In practice, the majority of failures are resolved within one or two retries. The key reason ORPilot’s retry loop is effective is that the upstream stages have already done the hard work: the problem is correctly specified, the data is model-ready, and the agent only needs to fix a code-level mistake rather than rethink the entire model structure.
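The execute-and-retry pattern described above can be sketched in a few lines. This is a simplified illustration, not ORPilot’s code: generate is a hypothetical stand-in for the LLM call, a plain subprocess is not a real sandbox, and the sketch assumes the generated script exits with a non-zero status when the solver reports an infeasible or unbounded model.

```python
import subprocess
import sys

def run_script(code, timeout=60):
    """Run code in a separate Python process.

    Returns (ok, text): stdout on success, stderr (including the traceback) on failure.
    NOTE: a subprocess isolates crashes but is NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout

def generate_with_retries(generate, max_attempts=3):
    """generate(previous_code, error) -> new code (hypothetical LLM wrapper).

    On failure, the previous code and the full error text are passed back
    so the next attempt can fix the specific mistake.
    """
    code, error = None, None
    for attempt in range(1, max_attempts + 1):
        code = generate(code, error)
        ok, output = run_script(code)
        if ok:
            return code, output, attempt
        error = output
    raise RuntimeError(f"no working script after {max_attempts} attempts:\n{error}")

# Usage with a fake generator: the first attempt crashes, the second succeeds.
def fake_generate(previous_code, error):
    return "print(1 / 0)" if previous_code is None else "print('optimal')"

code, output, attempts = generate_with_retries(fake_generate)
```

The important design point is that the error text goes back verbatim: a traceback usually pinpoints the failing line far better than a summary would.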

Stage 5: Reporter Agent

After a successful solve, a reporter agent translates the numerical results into plain English, explaining which facilities to open, what routes to use, and what quantities to produce, in the domain language of the original business problem, for consumption by a business user rather than an OR expert.

Why This Order Matters

The pipeline is deliberately sequential. Each stage is gated on the previous one completing successfully. The interview must finish before data collection starts. Data must be validated before parameter computation runs. Parameters must be ready before code is generated.

This sequencing prevents the most common failure mode in LLM-based OR tools: cascading errors where an ambiguous problem description propagates through the pipeline and produces code that is syntactically valid but models the wrong objective.
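A gated sequential pipeline of this kind can be expressed very compactly. The sketch below is only a schematic of the control flow; the stage functions, the shared state dictionary, and the StageFailed exception are hypothetical names of mine, not ORPilot’s API.

```python
class StageFailed(Exception):
    """Raised when a stage does not complete, so later stages never run."""

def run_pipeline(stages, state):
    """Run stages in order; each stage is a (name, function) pair.

    Each function takes the shared state dict and returns (ok, new_state).
    The first stage that reports ok=False stops the whole pipeline.
    """
    for name, stage in stages:
        ok, state = stage(state)
        if not ok:
            raise StageFailed(name)
    return state

# Usage with trivial placeholder stages (hypothetical, for illustration only).
def interview(state):
    return True, {**state, "spec": "minimize transportation cost"}

def collect_data(state):
    # Gate: refuse to proceed unless the interview produced a specification.
    return "spec" in state, {**state, "tables": ["customers", "demand"]}

result = run_pipeline([("interview", interview), ("data", collect_data)], {})
print(result["tables"])
```

Because a failed stage raises immediately, an incomplete specification can never reach the code generation step silently.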

What This Looks Like at Scale

I tested ORPilot on a few OR problems, one of which is a supply chain network design problem with 50 production sites, 50 distribution centers, 500 customers, 500 products, and 12 periods. The resulting model had more than 9.7 million decision variables and 963,000 constraints. ORPilot successfully handled the full pipeline end to end, from the initial conversation through data collection, parameter computation, code generation, and solution reporting, producing an optimal solution with Gurobi. See my paper at https://arxiv.org/abs/2605.02728 for results on more test problems.

Getting Started

ORPilot is open source and available now:

GitHub: https://github.com/GuangruiXieVT/ORPilot
Paper: https://arxiv.org/abs/2605.02728

Installation takes a few minutes. ORPilot supports OpenAI, Anthropic, Google, and DeepSeek as LLM providers, and Gurobi, CPLEX, PuLP, Pyomo, and OR-Tools as solver backends.

In the next post in this series, we’ll take a deep dive into the Intermediate Representation (IR), the solver-agnostic JSON artifact that makes ORPilot’s results reproducible and portable across backends without ever calling the LLM again. Stay tuned!
