A Gentle Introduction to Stochastic Programming

In my first TDS post, I wrote about how to translate a real-world problem into an integer linear program. In my second, I wrote about how to make that program robust against uncertainty. Both were variations on the same idea: take a fuzzy real-world question, squeeze it into an LP, and let a solver do the rest.

There’s a moment in every optimizer’s life, though, when the LP starts to feel a bit too neat. Demand is a number. Travel time is a number. Wind speed is a number. The model accepts the input, returns an optimal solution, and goes on its way. The reality these numbers were supposed to describe (messy, jittery, and sometimes surprising) doesn’t really show up anywhere.

Stochastic programming is the field that takes that discomfort seriously. Instead of pretending the data is exact, it builds the uncertainty directly into the model. The price you pay is a bit more notation; the payoff is decisions that hold up when the world doesn’t cooperate.

This post is a gentle tour of the basics. We’ll see why the obvious approach doesn’t work, walk through the four standard ways to handle uncertainty in a linear program, and finish with a quick sanity check on whether any of this is worth the effort. There’s some math, but it’s the same math you already know from LP, with one extra symbol attached.

Starting point: a fashion company with a bad crystal ball

To make this concrete, we’ll use the running example from dr. Ruben van Beesten’s lectures (more on that in the credits below). It goes like this.

You run a fashion company that sells winter clothing in Germany. Production happens in Bangladesh, which is cheap but slow: the goods take a few weeks to arrive. So in the fall, you have to decide how much to produce for the upcoming winter season.

Two ways this can go wrong: produce too little, and you lose sales; produce too much, and you’re stuck with stock you can’t sell. The whole question is how much to produce now, and the answer depends on something you don’t actually know yet: winter demand.

If you ignored the uncertainty for a moment and pretended demand was a fixed number, you could write down a vanilla LP:

Here x is how much you produce, c is the unit production cost, h is demand, and T is simply the identity matrix (one unit produced satisfies one unit of demand). The constraint says: produce at least as much as is demanded.
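Written out, the LP these definitions describe takes roughly the following form. This is a reconstruction in standard notation from the symbols defined above; the non-negativity bound on x is an assumption.

```latex
\begin{aligned}
\min_{x} \quad & c^{\top} x \\
\text{s.t.} \quad & T x \ge h, \\
& x \ge 0.
\end{aligned}
```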

This is fine if h is actually known. The trouble is that demand isn’t a number, it’s a random variable. Let’s call it ξ. The honest version of the model would look like this:

And here we hit a wall. What does it mean for x to satisfy a constraint that depends on a random variable? Is x = 100 feasible if demand might be 80, might be 120, and might be anywhere in between? The problem isn’t hard to solve: it’s ill-defined. The solver doesn’t even know which problem you’re asking it to solve.
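To see the ambiguity in action, here is a minimal sketch that checks the constraint x ≥ ξ (with T the identity) for a handful of demand values. The specific demand values are illustrative assumptions, picked from the 80 to 120 range mentioned above.

```python
# Hypothetical numbers: one production level and a few demand values in [80, 120].
x = 100
possible_demands = [80, 90, 100, 110, 120]


def is_feasible(x, demand):
    """Deterministic check of the constraint x >= demand (T is the identity)."""
    return x >= demand


# Evaluate the constraint once per possible realization of demand.
verdicts = {d: is_feasible(x, d) for d in possible_demands}
for demand, ok in verdicts.items():
    print(f"demand = {demand:3d}: x = {x} is {'feasible' if ok else 'infeasible'}")

# The verdict is not a single yes/no: it changes with the realization.
is_well_defined = len(set(verdicts.values())) == 1
print("same verdict for every realization?", is_well_defined)
```

The same x is feasible for some realizations and infeasible for others, so "is x feasible?" has no answer until we say how the constraint should treat ξ.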

Stochastic programming is, in essence, a collection of principled answers to that question. We’ll look at the four most common ones.

Four ways to handle the uncertainty

Each of the four approaches takes the ill-defined LP above and turns it into a well-defined optimization problem. They differ in what they assume you know about the uncertainty, and in how cautious they are about bad outcomes.

1. Robust optimization: prepare for the worst

The most cautious approach. You don’t need to know the full probability distribution of ξ, only its support, i.e., the set of values it can take. We call this set the uncertainty set, written U. Then you ask: what is the best decision that stays feasible no matter which ξ ∈ U actually shows up?

The constraint now has to hold for every ξ in the uncertainty set. In our fashion example with U = [0, 10], you’d be planning for demand of 10, the worst case, every time.

That’s the strength and the weakness of robust optimization in one sentence. The solution is bulletproof, but it’s also conservative: you’ll often be sitting on inventory you didn’t need, because you planned as if the unlikely worst case were guaranteed. If you’ve read my previous post on robustifying linear programs, this is exactly the framework that sits behind those four steps.
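In symbols, the robust counterpart of the fashion LP can be sketched as follows. This is a reconstruction in standard notation, assuming the uncertainty sits only in the demand on the right-hand side and that x is non-negative.

```latex
\begin{aligned}
\min_{x} \quad & c^{\top} x \\
\text{s.t.} \quad & T x \ge \xi \quad \text{for all } \xi \in U, \\
& x \ge 0.
\end{aligned}
```

With T the identity and U = [0, 10], the infinitely many constraints collapse to the single constraint x ≥ 10, which is why the robust plan always produces for the worst case.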

2. Chance constraints: relax the worst case

Robust optimization plans for any possible outcome. Chance constraints relax that to: plan for most of them. You pick a probability level α, say 95%, and require the constraint to hold with at least that probability:

This is called a joint chance constraint: all the entries of the constraint vector have to be satisfied simultaneously, with joint probability ≥ α. A weaker variant treats each row separately:

These are individual chance constraints: each constraint i must hold with probability at least αᵢ, but you don’t care about the joint event. Quick exercise: if you set every αᵢ equal to the joint α, which formulation is more conservative?

Answer: the joint version. Satisfying all constraints simultaneously is a stricter requirement than satisfying each one in isolation, so the joint formulation has a smaller (or at best equal) feasible region and an optimal cost that is at least as high. Either way, chance constraints give you a knob, α, to dial how cautious you want to be. Crank it to 1, and you’re back to (almost) robust. Drop it to 0.5, and you’re basically flipping a coin on feasibility. Most real applications live somewhere in the 0.9–0.99 range.
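For reference, the two variants can be written side by side. This is a reconstruction in standard notation, assuming (as in the fashion example) that the randomness sits in the right-hand side; m is my label for the number of constraint rows.

```latex
% Joint chance constraint: all rows must hold simultaneously
\mathbb{P}\bigl( T x \ge \xi \bigr) \ge \alpha

% Individual chance constraints: one probability level per row i
\mathbb{P}\bigl( (T x)_i \ge \xi_i \bigr) \ge \alpha_i, \qquad i = 1, \dots, m
```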

There’s a catch worth flagging: chance constraints are hard in general. The probability term inside the constraint is a non-linear, often non-convex function of x, so you usually can’t hand the formulation directly to a standard LP solver. There are tractable special cases (Gaussian noise, certain mixtures of distributions, sample-based approximations), but the general problem is harder than it looks at first glance.
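The one-product fashion example happens to be one of the easy cases: requiring P(x ≥ ξ) ≥ α just means x must be at least the α-quantile of demand. Below is a minimal sample-based sketch of that idea. The uniform demand model on [80, 120], the sample size, and the function name are all illustrative assumptions, not part of the original example.

```python
import math
import random


def chance_constrained_level(samples, alpha):
    """Smallest sampled level x such that at least a fraction alpha of the
    sampled demands satisfies demand <= x (an empirical alpha-quantile)."""
    ordered = sorted(samples)
    k = math.ceil(alpha * len(ordered))  # number of samples that must be covered
    return ordered[max(k, 1) - 1]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
demand_samples = [rng.uniform(80, 120) for _ in range(10_000)]  # hypothetical demand model

x_robust = max(demand_samples)                           # covers every sampled demand
x_95 = chance_constrained_level(demand_samples, 0.95)    # covers 95% of them
x_50 = chance_constrained_level(demand_samples, 0.50)    # a coin flip on feasibility

coverage_95 = sum(d <= x_95 for d in demand_samples) / len(demand_samples)
print(f"robust level:      {x_robust:.2f}")
print(f"alpha = 0.95:      {x_95:.2f} (empirical coverage {coverage_95:.3f})")
print(f"alpha = 0.50:      {x_50:.2f}")
```

Lowering α lets you produce less at the price of occasionally falling short. With several products and a joint constraint, this quantile shortcut no longer applies, which is where the difficulty described above kicks in.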

3. Two-stage recourse models: decide, observe, correct

The first two approaches treat constraint violation as something to avoid, either always (robust) or with high probability (chance). Sometimes that’s the wrong frame. In our fashion example, falling short of demand isn’t catastrophic. It’s annoying. You can usually fix it: produce a small emergency batch in Germany at a higher cost, or ship by air, or just accept the lost sales and move on.

This idea, that violating a constraint isn’t the end of the world because you can take a corrective action later, is the heart of recourse models. In the two-stage version, the timeline looks like this:

Stage 1 (now): you make a first-stage decision x while ξ is still uncertain.
Then: ξ is realized, i.e., the random variable becomes a known number.
Stage 2 (later): you make a second-stage decision y, knowing ξ.

Mathematically, the first stage looks almost like a vanilla LP, except the objective now contains an expected future cost:

The function v(ξ, x) is the optimal value of the second-stage problem, given that you chose x in the first stage and that ξ turned out to be the realized value:

Read this carefully. The right-hand side, h(ξ) − T(ξ) x, is the shortfall: how much your first-stage decision failed to cover after ξ was revealed. The recourse decision y then closes that gap, at a cost q(ξ)ᵀ y. So the structure is: pay the up-front cost cᵀ x, and on top of it pay the expected cost of cleaning up after the random variable does its thing.
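Put together, the two stages can be sketched in the usual textbook form. This is a reconstruction from the symbols used in the text; the first-stage constraint data A and b and the equality form of the second-stage constraint are assumptions on my part, and W is the recourse matrix.

```latex
% First stage: decide x before xi is known
\begin{aligned}
\min_{x} \quad & c^{\top} x + \mathbb{E}_{\xi}\bigl[ v(\xi, x) \bigr] \\
\text{s.t.} \quad & A x = b, \quad x \ge 0
\end{aligned}

% Second stage: decide y after xi is revealed
\begin{aligned}
v(\xi, x) = \min_{y} \quad & q(\xi)^{\top} y \\
\text{s.t.} \quad & W(\xi)\, y = h(\xi) - T(\xi)\, x, \quad y \ge 0
\end{aligned}
```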

That’s the whole idea. Two-stage recourse models are by far the most common formulation in practice, partly because they capture the actual chronology of decisions in many real problems (production planning, inventory, energy dispatch, scheduling), and partly because they’re relatively well-behaved mathematically.
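Here is a minimal sketch of that structure for the one-product fashion example. All numbers are made up for illustration: a unit production cost c = 1, an emergency (recourse) cost q = 5 per missing unit, and three demand scenarios. In this toy, the second-stage LP is simple enough to solve by hand, so no solver is involved.

```python
def second_stage_cost(demand, x, q=5.0):
    """v(xi, x): the cheapest way to close the shortfall at unit cost q.
    In this one-product toy the recourse LP is
        min q*y  s.t.  y >= demand - x,  y >= 0,
    whose solution is simply y = max(demand - x, 0)."""
    shortfall = max(demand - x, 0.0)
    return q * shortfall


def total_expected_cost(x, scenarios, c=1.0, q=5.0):
    """Up-front cost c*x plus the probability-weighted second-stage cost."""
    expected_recourse = sum(p * second_stage_cost(d, x, q) for d, p in scenarios)
    return c * x + expected_recourse


# Hypothetical (demand, probability) scenarios.
scenarios = [(80.0, 0.25), (100.0, 0.50), (120.0, 0.25)]

for x in (80.0, 100.0, 120.0):
    print(f"x = {x:5.1f}: expected total cost = {total_expected_cost(x, scenarios):.1f}")
```

With these particular numbers, producing for the highest demand turns out cheapest, because the emergency batch is five times as expensive as regular production. Make the recourse cheaper and the balance shifts toward producing less up front.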

A couple of pieces of vocabulary you’ll trip over if you read further:

A model has fixed recourse if the recourse matrix W doesn’t depend on ξ. Many algorithms only work in this case.
A model has (relatively) complete recourse if there’s always a feasible recourse decision y, no matter what ξ turns out to be and no matter what x you chose. If complete recourse fails, the second-stage problem can be infeasible, which becomes an implicit constraint on the first stage. (This is exactly where Benders’ feasibility cuts come from, but that’s a story for another post.)

4. Multi-stage recourse models: keep going

Sometimes life isn’t two stages. You don’t just decide-observe-correct once and go home; you decide, observe, decide, observe, decide, … over and over. Multi-stage recourse models are the natural extension.

In our fashion example, suppose we’re not choosing once in the fall, but three times: in the fall (cheap, in Bangladesh), in early winter (more expensive, in Romania), and in late winter (most expensive, in Germany). Demand is gradually revealed over the season, and at each stage we decide based on what we’ve observed so far.

The notation gets heavier (you end up writing recursive value functions Qₜ, with histories ξ_[t] = (ξ₁, …, ξₜ) hanging off them), but conceptually nothing new is happening. Each stage is a recourse problem nested inside the previous one. The natural way to picture this is as a scenario tree: each node is a state of the world, each branch is a possible realization of the next random variable, and a scenario is a complete root-to-leaf path.

Example of a three-stage scenario tree, source: course slides by dr. Ruben van Beesten.

One subtlety. A scenario is the full trajectory of ξ, not just one realization. Knowing that ξ₂ = 10 doesn’t tell you which scenario you’re in, because ξ₃ hasn’t happened yet. This matters when you start writing the deterministic equivalent (next section), because you have to be careful that your decisions only depend on information that has actually been observed by the time the decision is made. That property is called non-anticipativity: you can’t anticipate the future. The model would happily cheat if you didn’t enforce it explicitly.
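A small sketch may help make the distinction between a realization and a scenario tangible. It enumerates the scenarios of a tiny three-stage tree and groups them by shared history; decisions at a given stage must be identical within each group. The outcome values (ξ₂ ∈ {5, 10}, ξ₃ ∈ {0, 10}) and the function name `bundles` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical outcomes revealed at stages 2 and 3 (stage 1 is "now", nothing observed yet).
stage_outcomes = {2: [5, 10], 3: [0, 10]}

# A scenario is a full root-to-leaf path: one outcome per stage.
scenarios = list(product(stage_outcomes[2], stage_outcomes[3]))


def bundles(scenarios, t):
    """Group scenario indices that share the same history up to stage t.
    Non-anticipativity: the stage-t decision must be the same within each group."""
    groups = defaultdict(list)
    for idx, path in enumerate(scenarios):
        history = path[: t - 1]  # outcomes observed at stages 2..t
        groups[history].append(idx)
    return dict(groups)


for t in (1, 2, 3):
    print(f"stage {t}: {bundles(scenarios, t)}")
```

At stage 2, observing ξ₂ = 10 leaves two scenarios in the same group, so the model is not allowed to pick different stage-2 decisions for them. In a deterministic equivalent, this shows up as equality constraints tying those decision copies together.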

How do we actually solve a recourse model?

So far we’ve been writing models. To solve them, we typically transform them into something a standard LP solver can chew on. The trick is the deterministic equivalent formulation.

Suppose the random variable ξ has a discrete distribution: it takes finitely many values ξ¹, ξ², …, ξ^S (called scenarios), where scenario ξˢ has probability pₛ. Then the expected second-stage cost is just a finite sum, and we can write the full two-stage problem as one big LP by introducing one copy of y per scenario:

That’s a regular LP. Big, possibly very big (with S scenarios, you have essentially copied the second stage S times), but it’s an LP. You can hand it straight to HiGHS, Gurobi, CPLEX, or whatever solver you like, and it will solve it.
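For concreteness, the deterministic equivalent can be sketched like this. It is a reconstruction in standard notation: the subscript s is shorthand for "evaluated at scenario ξˢ" (for example, q_s = q(ξˢ)), and the first-stage constraints A x = b are assumed to follow the usual textbook form.

```latex
\begin{aligned}
\min_{x,\, y_1, \dots, y_S} \quad & c^{\top} x + \sum_{s=1}^{S} p_s \, q_s^{\top} y_s \\
\text{s.t.} \quad & A x = b, \\
& T_s x + W_s y_s = h_s, \qquad s = 1, \dots, S, \\
& x \ge 0, \quad y_s \ge 0, \qquad s = 1, \dots, S.
\end{aligned}
```

The single x is shared by every scenario block, while each y_s only appears in its own block. That block structure is what decomposition methods exploit.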

Two natural questions follow.

First: what if the distribution of ξ is not discrete? In that case the deterministic equivalent has infinitely many scenarios and isn’t finite-dimensional. The standard fix is sample average approximation: draw a sample of size S from the true distribution, solve the sampled deterministic equivalent, and let S grow until your solution stabilizes statistically. There’s a whole literature on how large S needs to be and what guarantees you get.
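Here is a minimal sketch of sample average approximation on the one-product fashion example. The assumptions are mine: demand is uniform on [80, 120], the unit production cost is c = 1, and the recourse cost is q = 5. For this toy, the sampled problem has a closed-form solution (an empirical quantile), so the sketch does not call an LP solver; a realistic model would build and solve the sampled deterministic equivalent instead.

```python
import math
import random


def solve_sampled_problem(samples, c=1.0, q=5.0):
    """Minimize c*x + q * mean(max(d - x, 0)) over x for the sampled demands.
    For this one-product toy the minimizer is the empirical (1 - c/q)-quantile;
    a general model would solve an LP here instead."""
    if q <= c:
        return 0.0  # recourse is no more expensive than producing: produce nothing up front
    ordered = sorted(samples)
    k = math.ceil((1.0 - c / q) * len(ordered))
    return ordered[max(k, 1) - 1]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

solutions = {}
for S in (10, 100, 1_000, 10_000, 100_000):
    sample = [rng.uniform(80.0, 120.0) for _ in range(S)]  # hypothetical demand model
    solutions[S] = solve_sampled_problem(sample)
    print(f"S = {S:>7,}: x = {solutions[S]:.2f}")
```

Under these assumptions, the true optimum is the 0.8-quantile of demand, which is 112. The small-sample solutions bounce around that value and the large-sample ones settle near it, which is the stabilization you look for in practice.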

Second: what if the deterministic equivalent is too big to solve directly? This is where decomposition methods come in. Benders’ decomposition splits the problem into a master problem in the first-stage variables and a subproblem per scenario, then iteratively passes information between them. For multi-stage models with many stages, the analogous trick is stochastic dual dynamic programming (SDDP), which uses sampling and approximate value functions to avoid building the full scenario tree. Both are advanced enough to deserve their own posts, so I’ll come back to them later.

Is any of this actually worth the trouble?

Honest question. Stochastic programs are messier to formulate, harder to solve, and slower to run than their deterministic cousins. If your real-world problem isn’t very sensitive to uncertainty, you might be better off just plugging the expected demand into a regular LP and calling it a day.

The good news is, you can quantify exactly how much the stochastic formulation buys you. There are two classical metrics, and both are worth knowing.

Define four numbers:

In words: SP is the optimal value of the actual stochastic program. EV is what you get if you replace ξ with its expected value and solve the resulting deterministic problem; call its solution x̄. EEV is the expected cost of implementing that deterministic solution x̄ in the actual stochastic world. And WS (“wait-and-see”) is the expected cost if you got to peek at the realized ξ before deciding x, the cheating-but-best case.

From these four numbers you can build two highly informative quantities:

VSS is the Value of the Stochastic Solution: how much worse off you’d be if you just solved the deterministic problem with average values and implemented its solution. If VSS is small, the stochastic program isn’t buying you much; the deterministic shortcut is fine.

EVPI is the Expected Value of Perfect Information: how much you’d gain if a benevolent oracle handed you the realized ξ before you had to decide. If EVPI is small, your forecasts already contain most of the information you need; investing in better predictions probably won’t move the needle. If EVPI is large, better data has real value.
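In symbols, the four numbers and the two metrics can be written as follows. This is a reconstruction for a minimization problem; the shorthand f(x, ξ) for the total cost of decision x under realization ξ (first-stage cost plus optimal recourse cost) and X for the first-stage feasible set are my own notation.

```latex
\begin{aligned}
\mathrm{SP}   &= \min_{x \in X} \; \mathbb{E}_{\xi}\bigl[ f(x, \xi) \bigr], \\
\mathrm{EV}   &= \min_{x \in X} \; f\bigl(x, \mathbb{E}[\xi]\bigr),
                 \qquad \bar{x} \in \arg\min_{x \in X} f\bigl(x, \mathbb{E}[\xi]\bigr), \\
\mathrm{EEV}  &= \mathbb{E}_{\xi}\bigl[ f(\bar{x}, \xi) \bigr], \\
\mathrm{WS}   &= \mathbb{E}_{\xi}\Bigl[ \min_{x \in X} f(x, \xi) \Bigr], \\[4pt]
\mathrm{VSS}  &= \mathrm{EEV} - \mathrm{SP}, \\
\mathrm{EVPI} &= \mathrm{SP} - \mathrm{WS}.
\end{aligned}
```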

Explanation of useful metrics for a stochastic program.

The two metrics ride along on a tidy chain of inequalities (assuming uncertainty only on the right-hand side):

Read it left to right: cheating-with-the-mean (EV) is at most as bad as cheating-with-the-realization (WS), which is at most as bad as the honest stochastic answer (SP), which is at most as bad as plugging in the deterministic solution and living with it (EEV). The chain implies a free upper bound on VSS that you can compute before you ever solve the SP: VSS ≤ EEV − EV. If that gap is tiny, the deterministic shortcut is good enough and you can save yourself the headache.
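To make the chain EV ≤ WS ≤ SP ≤ EEV concrete, here is a minimal sketch that computes all four numbers, plus VSS and EVPI, for the one-product fashion example. The cost parameters and the three demand scenarios are made-up illustration values. The problems are solved by checking the breakpoints of the piecewise-linear cost, a shortcut that only works for this one-dimensional toy.

```python
C, Q = 1.0, 5.0  # hypothetical unit production cost and unit emergency (recourse) cost
SCENARIOS = [(80.0, 0.25), (100.0, 0.50), (120.0, 0.25)]  # hypothetical (demand, probability)


def cost(x, demand):
    """Total cost of producing x once demand is realized: c*x + q*shortfall."""
    return C * x + Q * max(demand - x, 0.0)


def expected_cost(x, scenarios):
    return sum(p * cost(x, d) for d, p in scenarios)


def best_decision(scenarios):
    """Minimize expected cost by checking the breakpoints of the piecewise-linear
    objective (x = 0 and each scenario demand). Valid for this 1-D toy only."""
    candidates = [0.0] + [d for d, _ in scenarios]
    return min(candidates, key=lambda x: expected_cost(x, scenarios))


mean_demand = sum(p * d for d, p in SCENARIOS)

# SP: solve the actual stochastic program.
x_sp = best_decision(SCENARIOS)
SP = expected_cost(x_sp, SCENARIOS)

# EV and EEV: solve with the mean demand, then evaluate that plan under uncertainty.
x_bar = best_decision([(mean_demand, 1.0)])
EV = expected_cost(x_bar, [(mean_demand, 1.0)])
EEV = expected_cost(x_bar, SCENARIOS)

# WS: peek at each realization first, then decide.
WS = sum(p * cost(best_decision([(d, 1.0)]), d) for d, p in SCENARIOS)

VSS = EEV - SP
EVPI = SP - WS
print(f"EV = {EV}, WS = {WS}, SP = {SP}, EEV = {EEV}")
print(f"VSS = {VSS} (upper bound EEV - EV = {EEV - EV}), EVPI = {EVPI}")
```

With these numbers, the mean-value plan produces 100 units while the stochastic plan produces 120, and the chain holds as it should. The VSS is small compared with the EVPI here, which would suggest that better forecasts are worth more than a fancier model for this particular toy.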

Where to go from here

This post stuck to the basics: how to write a stochastic program down. The next natural step is how to solve large ones efficiently. The two big workhorses are:

Benders’ decomposition: for two-stage models, decomposes the deterministic equivalent into a master problem (in x) plus one subproblem per scenario, and reconciles them with cuts. Particularly elegant when you have a lot of scenarios but a relatively small first stage.
Stochastic Dual Dynamic Programming (SDDP): for multi-stage models, uses sampling and piecewise-linear approximations of the future value functions. Famously used in hydropower scheduling, where the scenario tree is so large that explicit enumeration is hopeless.

Both deserve their own posts. If there’s interest, I’ll write them up.

Takeaway

If you’re using LPs in any context where the input data is genuinely uncertain because of forecasted demand, weather, prices, travel times, or anything else, then your model is making an implicit choice about how to handle that uncertainty. “Just use the mean” is a choice. So is “plan for the worst.” Stochastic programming gives you the vocabulary to make that choice explicit, and the tools to evaluate whether your choice was a good one (hello, VSS).

To summarize the four main ways to model uncertainty in an LP:

Robust optimization: plan for the worst case in a given uncertainty set.
Chance constraints: require feasibility with at least probability α.
Two-stage recourse: decide, observe, correct; pay an expected recourse cost.
Multi-stage recourse: the same idea, repeated over time on a scenario tree.

And two metrics worth keeping in your back pocket: VSS (does the stochastic model help?) and EVPI (would better forecasts help?).

Most real problems aren’t deterministic. The good news is your modeling toolkit doesn’t have to be either.

Credits and references

This post is based on lectures by dr. Ruben van Beesten (Norwegian University of Science and Technology) from his course on Stochastic Programming given in October 2023, which I had the pleasure of attending in Trondheim, Norway. The fashion-company example, the four-way taxonomy of formulations, and the VSS/EVPI framing all come straight from his slides; any clumsiness in the retelling is mine.

The original modeling exercise that motivates much of the recourse-model intuition is from

Higle, J. L. (2005). Stochastic Programming: Optimization When Uncertainty Matters. In INFORMS TutORials in Operations Research, pp. 30–53.

A couple of further pointers worth knowing about:

Kleywegt, A. J., Shapiro, A., and Homem-de-Mello, T. (2002). The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12(2), 479–502. The standard reference for SAA.
Higle, J. L., and Sen, S. (1991). Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3), 650–669. One of the few methods that handles non-discrete distributions directly.

And of course, the two earlier posts in this series: Five questions that will help you model integer linear programs better and Four steps to robustify your linear program.
