Close Menu
    Facebook LinkedIn YouTube WhatsApp X (Twitter) Pinterest
    Trending
    • I Spent May Evaluating Different Engines for OCR
    • Extra-wide tiny house combines premium finishes with spacious design
    • Property investment startup Dashdot in liquidation, with Budget as ‘the straw that broke the camel’s back’
    • This Is How Trump Finally Signed the AI Executive Order
    • 7 of the Best A24 Movies You Can Stream Free on Your Next Movie Night
    • Why AI Is NOT Stealing Your Job
    • First production roadster with a roof
    • GAMING: How Australia decides its Game of The Year
    Facebook LinkedIn WhatsApp
    Times FeaturedTimes Featured
    Wednesday, June 3
    • Home
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    • More
      • AI
      • Robotics
      • Industries
      • Global
    Times FeaturedTimes Featured
    Home»Artificial Intelligence»Benders’ Decomposition 101: How to Crack Open a Stochastic Program That’s Too Big to Swallow Whole
    Artificial Intelligence

    Benders’ Decomposition 101: How to Crack Open a Stochastic Program That’s Too Big to Swallow Whole

    Editor Times FeaturedBy Editor Times FeaturedMay 21, 2026No Comments17 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email WhatsApp Copy Link


    In my first TDS post, I a real-world drawback into an integer linear program. In my second, I made that program strong in opposition to uncertainty. In my third, I launched stochastic programming: 4 principled methods to place uncertainty into the mannequin reasonably than hand-waving it away.

    The third submit ended with a promise. I stated two-stage recourse fashions are simple to put in writing down and tidy in concept, however in observe they’ve an embarrassing drawback: as you add extra situations, the so-called “deterministic equal” explodes in measurement and the solver begins to complain. I name-dropped Benders’ decomposition as the usual repair after which disappeared with out exhibiting the way it really works.

    That is the submit that makes good on that promise.

    We’ll begin with the place the earlier submit left off, present why the plain strategy falls aside at scale, work out the one statement that saves us, after which stroll slowly via the algorithm itself. There’s a bit extra math than final time, however nothing unique; probably the most superior instrument we’ll use is LP duality, and even that comes packaged as “we will write the identical drawback otherwise.”

    Fast recap: the mannequin we’re making an attempt to unravel

    Within the earlier submit, our hero was a trend firm in Germany sourcing winter clothes from Bangladesh. Demand ξ is a random variable; manufacturing x must be determined earlier than ξ is thought. After demand is revealed, a recourse determination y patches up any shortfall at a better value (an emergency batch from Romania, say).

    If we assume a discrete distribution with S situations — values ξ¹, ξ², …, ξS^S with chances p₁, …, pS — the two-stage recourse mannequin collapses right into a single huge LP, the deterministic equal:

    One copy of the second stage per state of affairs, all glued to the identical first-stage xx. Hand it to HiGHS or Gurobi, wait, and get an answer. Finish of story.

    Besides typically the ready turns into the entire story.

    When the deterministic equal hits a wall

    For small SS, every little thing is okay. You write the LP, you remedy it, you go dwelling. However situations are likely to multiply quicker than your endurance:

    • A steady demand distribution that you simply approximate with pattern common approximation may want 1000’s of samples earlier than the optimum xx stabilizes.
    • A multi-stage mannequin followers out exponentially with the variety of levels. Three levels with ten branches every already provides you a thousand leaves; 5 levels and also you’re nicely into the lots of of 1000’s.
    • In actual purposes, corresponding to hydropower scheduling, supply-chain planning beneath demand shocks, or vitality dispatch, SS within the tens of 1000’s isn’t uncommon. Hydropower individuals repeatedly mannequin state of affairs bushes which might be too huge to even enumerate, not to mention hand to a solver.

    The deterministic equal dutifully grows with SS. It has one block of constraints per state of affairs, one set of ysy_s variables per state of affairs, and one TsxT_sx coupling every block to the shared first-stage xx. When you check out the constraint matrix, it has a really specific form:

    That is referred to as block-angular construction. Every state of affairs contributes its personal diagonal block WsW_s, plus a column of TsT_s on the left that ties it to xx. The shared row on the backside is the first-stage feasibility constraint Ax≥bAx geq b.

    The form is informative, however for the solver it’s unhealthy information. LP runtime grows superlinearly in drawback measurement. A fast Google search will let you know it’s roughly O(n2.5)O(n^{2.5}) for simplex-style strategies and O(n3.5)O(n^{3.5}) for interior-point strategies within the worst case, which suggests doubling the variety of situations doesn’t double the runtime — it sextuples it. In some unspecified time in the future the issue stops becoming in reminiscence completely, and your pleasant solver begins swapping to disk, which is a well mannered approach of claiming “that is an excessive amount of.”

    So we’d like one thing cleverer than “shove the entire thing right into a solver and hope for the perfect.”

    Why naive tips don’t assist

    Earlier than we get to the intelligent thought, let’s rule out the plain ones.

    Thought 1: Simply use fewer situations. Typically that is high-quality; typically the reply modifications meaningfully as you add situations, and your “optimum” was a coincidence of the pattern you occurred to attract. The pattern common approximation literature is exactly about when you will get away with this and when you may’t.

    Thought 2: Clear up every state of affairs individually and common the outcomes. Tempting, however flawed. The entire level of getting one xx is that it has to work for all situations concurrently. Fixing state of affairs ss by itself provides you the scenario-specific optimum (the “wait-and-see” resolution from the final submit), not a first-stage determination you may decide to.

    Thought 3: Clear up the deterministic drawback with imply demand 𝔼[ξ], then take care of the leftovers. That is the EV resolution. It may be very flawed, and that wrongness is precisely what VSS measures.

    So we will’t shrink the issue, we will’t decouple it, and we will’t common our approach round it. We’d like one thing that respects the construction of the deterministic equal whereas not fixing it as one big LP.

    That one thing has been round since 1962.

    The one statement that saves us

    Take a look at the constraint matrix once more:

    Now think about that xx is mounted and decide any worth, doesn’t matter which. The primary-stage block Ax≥bAx geq b both holds or it doesn’t. And the remainder of the constraints decouple: state of affairs s’s constraint turns into Wsys≥hs−TsxW_s y_s ≥ h_s − T_s x, which includes solely yₛ. The connection between situations runs via xx. Repair xx, and the situations collapse into SS unbiased LPs.

    That is the important thing thought. The variable xx is a complicating variable: if it have been recognized, life could be a lot simpler. Every subproblem is small (the dimensions of 1 state of affairs, not all of them), they usually’re unbiased, so you may even remedy them in parallel.

    The Benders’ decomposition recipe is constructed round this statement. Roughly:

    1. Guess a first-stage determination xx.
    2. Clear up the S state of affairs subproblems independently, given xx.
    3. Use what the subproblems let you know to replace your guess of xx.
    4. Repeat till your guesses cease enhancing.

    The laborious half (and the elegant half) is step 3: how do the subproblems inform us something helpful about xx? That’s the place LP duality earns its maintain.

    Benders!

    Let’s decelerate and construct the algorithm one piece at a time.

    The worth operate v(ξˢ, x)

    For a set first-stage determination xx and a set state of affairs ss, the second-stage subproblem is only a small LP:

    Its twin is

    This twin has a function we’ll lean on. Its possible area Λs={λ≥0:λ⊤Ws≤qs}Lambda_s = {lambda geq 0 : lambda^high W_s leq q_s} is a polyhedron, and it doesn’t depend upon xx in any respect. The xx solely reveals up within the goal. Mixed with the LP indisputable fact that an optimum is at all times attained at a vertex of the possible area, we get a helpful rewrite:

    the place the λokslambda^s_k are the (finitely many) vertices of ΛsLambda_s.

    That’s a really good expression to stare at. As a operate of xx, v(ξs,x)v(xi^s, x) is the pointwise most of a finite assortment of affine capabilities. That makes it piecewise linear and convex. Image the higher envelope of a fan of traces.

    The anticipated recourse value is only a weighted sum of those:

    A weighted sum of piecewise-linear convex capabilities is itself piecewise-linear and convex. So Q(x)Q(x), the factor we’re making an attempt to attenuate within the first stage on high of c⊤xc^high x, is a piecewise-linear convex operate of xx. We don’t have an express method for it, as a result of that might require enumerating each vertex of each ΛsLambda_s, however we all know its form.

    That form is the lever Benders pulls.

    Approximating Q with cuts

    We don’t know QQ in closed type, however a piecewise-linear convex operate has a really forgiving property: at any level x‾bar{x} the place we all know the worth Q(x‾)Q(bar x) and a subgradient g∈∂Q(x‾)g ∈ ∂Q(x̄), the linear operate

    is a world decrease sure on QQ. It touches QQ at x‾x̄ and stays beneath all over the place else.

    For the recourse value particularly, the subgradient is one thing we will compute. If we remedy the twin subproblems and acquire optimum twin vertices λok∗slambda^s_{ok*} for every state of affairs, a subgradient of QQ at x‾x̄ is

    So each time we remedy the subproblems at some candidate xτx^tau, we get a decent affine decrease sure on QQ totally free. Benders calls every such decrease sure an optimality reduce.

    The plan is now seen. We’re going to construct up a sequence of affine decrease bounds on QQ, one per iteration, and use them as a stand-in for QQ within the first-stage drawback.

    The grasp drawback

    We change Q(x)Q(x) with a placeholder variable θtheta that has to lie above each decrease sure we’ve collected to date. Iteration tt has the grasp drawback:

    Right here LL is any secure decrease sure on QQ — usually L=0L = 0 if recourse prices are non-negative, or one thing extra damaging if not — and every (αt,βt)(αₜ, βₜ) is an optimality reduce from a earlier iteration. The grasp drawback is only a common LP, a lot smaller than the deterministic equal as a result of it solely carries the first-stage variables and a rising pile of cuts.

    The candidate resolution (xτ,θτ)(x^tau, θ^tau) from this grasp is then fed to the subproblems, which return a recent reduce, which will get added, and we go round once more.

    The algorithm

    Placing it collectively, the uni-cut Benders algorithm is:

    1. Clear up the grasp drawback to get (xτ,θτ)(x^tau, θ^tau).

    2. For every state of affairs s=1,…,Ss = 1, …, S, remedy the twin subproblem at xτx^tau to get an optimum vertex λτsλˢ_τ and the optimum worth v(ξs,xτ)v(ξˢ, x^tau).

    3. Construct the optimality reduce θ≥αt+βt⊤xθ ≥ α_t + β_t^high x with

    and add it to the grasp.

    4. Cease if the grasp’s θτθ^tau already satisfies the brand new reduce at xτx^tau — that’s, if θτ≥αt+βt⊤xτθ^tau ≥ α_t + β_t^high x^tau. In any other case, return to step 1.

    A number of issues are quietly exceptional about this loop.

    Finiteness. Each reduce comes from some vertex λoksλ^s_k of some ΛsLambda_s. Every ΛsLambda_s has finitely many vertices. So within the worst case, the algorithm enumerates all doable cuts and stops — it can not loop ceaselessly. In observe, it stops a lot sooner: usually solely a small handful of vertices find yourself mattering.

    Decomposition finished proper. The large shared LP has been changed by a small grasp drawback plus S tiny, unbiased subproblems. The subproblems could be solved in parallel. The grasp grows, however solely by one constraint per iteration.

    Info, not enumeration. We by no means write out all of the situations in a single LP. We let the subproblems inform us, via their twin variables, what the grasp must learn about them. The cuts are the language they converse.

    Uni-cut versus multi-cut

    Within the recipe above, we mixture the per-scenario data right into a single reduce on QQ. The pure different is to maintain SS separate placeholders θ1,…,θSθ₁, …, θ_S, one per state of affairs, and add SS separate cuts every iteration. That is the multi-cut variant.

    Multi-cut makes use of extra data per iteration, so the grasp’s approximation of QQ tightens quicker. The price is that you simply’re including SS constraints per iteration as an alternative of 1, and the grasp will get unwieldy quicker. There’s no common winner — You and Grossmann (2013) report significant speedups from multi-cut on supply-chain fashions, but it surely’s case-by-case. An affordable default is to start out with uni-cut and take a look at multi-cut if convergence is sluggish.

    An image of 1 iteration

    Typically the best technique to soak up the algorithm is to stroll via what every iteration accomplishes geometrically.

    Think about Q(x)Q(x) because the true (unknown) piecewise-linear convex operate plotted over xx. The grasp’s present approximation Q^τ(x)hat Q^tau(x) is the pointwise most of all cuts we’ve added to date — a polyhedral decrease sure that sits beneath Q all over the place.

    At iteration τtau, the grasp minimizes c⊤x+Q^τ(x)c^high x + Q̂^tau (x), and lands on some xτx^tau the place its approximation thinks the optimum is. We then ask the subproblems: “what’s the true QQ at xτx^tau?” They reply with each a price Q(xτ)Q(x^tau) and a subgradient (the twin resolution). That subgradient provides us a brand new linear operate that touches QQ at xτx^tau — the reduce. We add it; the grasp’s approximation rises, simply sufficient to be precise at xτx^tau; and the following iteration’s xτ+1x^{tau+1} will (normally) be totally different.

    The loop stops when the grasp’s approximation already predicts the true worth at xτx^tau. Visually, this implies our polyhedral decrease envelope has caught up with QQ precisely on the level we care about.

    You received’t at all times attain an actual match, as a result of for numerical causes most implementations cease at a small tolerance, however the geometry is identical.

    It’s not a magic instrument

    To this point this has learn like an algorithmic free lunch: identical reply, a lot smaller LPs, parallelizable, finite. Let’s be sincere in regards to the disadvantages as nicely.

    Sluggish convergence within the wild. In concept, Benders terminates in finitely many iterations; in observe, “finitely many” can imply rather a lot of iterations, and each solves an LP. Rahmaniani et al. (2017) is the usual literature overview on the dozen-odd tips individuals use to hurry issues up: reduce choice, reduce removing when previous cuts grow to be dominated, belief areas to keep away from wildly oscillating xτx^tau, and so forth.

    The grasp drawback turns into the bottleneck. It’s commonplace for greater than 90% of whole runtime to enter fixing the (frequently rising) grasp. Widespread treatments embrace not fixing the grasp to full optimality on early iterations, eradicating cuts that not assist, and exploiting construction when the grasp is itself a structured LP.

    Subproblems can dominate too. Particularly when SS is massive or every subproblem is itself a tough LP, the second stage takes over. The repair is generally engineering: parallelize throughout situations, warm-start every subproblem from the earlier iteration’s foundation, batch them when doable.

    The hidden assumption: comparatively full recourse. The whole lot above quietly assumed that the second-stage subproblem is possible for any xx the grasp may suggest. That’s the property of (comparatively) full recourse from the earlier submit: for each possible first-stage x and each state of affairs ξξ, there exists a recourse y≥0y ≥ 0 satisfying Wy≥h(ξ)−T(ξ)xWy ≥ h(ξ) − T(ξ)x.

    When that property fails, some xτx^tau will go away the subproblem infeasible — the second-stage LP has no y that works, so the implicit value is +∞+ infty, which means xτx^tau was by no means a possible first-stage alternative to start with. Benders patches this with feasibility cuts, that are derived by fixing an auxiliary LP that measures how infeasible the subproblem is and utilizing its twin to supply a hyperplane that slices xτx^tau out of the grasp’s possible area. With out going via the algebra, the construction is identical as optimality cuts: remedy a small LP, have a look at the twin, construct a linear constraint, hand it again to the grasp. In every iteration of Benders, then, you both add an optimality reduce (if each subproblem was possible) or a feasibility reduce (if some subproblem was not). The recourse mannequin with no complete-recourse assure nonetheless works; you simply want each sorts of cuts.

    It solely totally applies to two-stage fashions. For multi-stage stochastic applications, the pure generalization isn’t simply “Benders with extra levels.” It’s stochastic twin dynamic programming (SDDP), which makes use of sampling and approximate worth capabilities to keep away from constructing out the total state of affairs tree. SDDP is the workhorse in hydropower scheduling, and it deserves, and doubtless will finally get, its personal submit.

    Integer first levels get tough. If xx must be integer (assume capability selections, facility location), then the grasp is an MILP, and Benders’ clear convex concept loses a few of its chunk. There are extensions, corresponding to generalized Benders, integer L-shaped, branch-and-Benders-cut, and many others., however they’re previous the scope of this submit.

    None of this implies Benders is unhealthy. It means it’s a way, not a miracle. Used thoughtfully — with the best grasp/subproblem cut up, the best reduce variant, and the best speedup tips — it’s the distinction between fixing an issue and never.

    Wrapping up

    We began with an issue: stochastic applications are simple to put in writing down however simple to make too huge to unravel. The deterministic equal grows linearly within the variety of situations, and LP solvers develop superlinearly in drawback measurement, which is a quadratic-or-worse collision course.

    Benders’ decomposition takes that block-angular construction severely. It carves out the complicating first-stage variables, leaves the scenario-decoupled second stage to a swarm of small subproblems, and makes use of LP duality to ferry data between the 2: optimality cuts when the subproblems are possible, feasibility cuts after they’re not. The result’s an iterative scheme that converges in finitely many steps and fairly often lots quicker than that.

    To summarize what’s really doing the work:

    1. The second-stage worth operate v(ξ,x)v(ξ, x) is piecewise-linear and convex in xx. So is the anticipated recourse value Q(x)Q(x).
    2. A piecewise-linear convex operate could be approximated arbitrarily nicely by a rising assortment of affine decrease bounds (cuts), every obtained totally free from a twin resolution.
    3. The primary stage doesn’t must know QQ precisely — simply to lower-bound it tightly on the factors it cares about. The grasp/subproblem loop builds up precisely that.
    4. Uni-cut and multi-cut are two methods to bundle the per-scenario data. Feasibility cuts deal with the case when subproblems can fail.

    When you take one factor away, let it’s the block-angular look of the constraint matrix. Each time you may rewrite an optimization drawback in order that fixing some variables makes the remainder separable, you may attempt Benders. This concept reveals up in community design, scheduling, vitality planning, anyplace you’ve a “laborious” set of choices that bind collectively in any other case simple ones.

    Within the subsequent submit on this sequence, we’ll see what to do when even a two-stage Benders is the flawed body — when the world doesn’t politely cut up into “resolve, observe, appropriate” however goes “resolve, observe, resolve, observe, resolve…” indefinitely. That’s the territory of SDDP. When you’ve ever puzzled how hydropower operators schedule reservoirs over a multi-year horizon with 1000’s of joint influx situations, the reply is in that subsequent submit.

    Credit and references

    As with the earlier submit, this one relies on lectures by Dr. Ruben van Beesten (Norwegian College of Science and Expertise) from his October 2023 course on Stochastic Programming, which I had the pleasure of attending in Trondheim. The pedagogical buildup comes straight from his slides; any clumsiness within the retelling is mine. Additionally, all photographs, apart from the thumbnail, on this submit are made by me.

    4 references value figuring out about:

    • Benders, J. F. (1962). Partitioning procedures for fixing mixed-variables programming issues. Numerische Mathematik, 4(1), 238–252. The unique paper. Surprisingly readable.
    • Van Slyke, R., and Wets, R. (1969). L-shaped linear applications with purposes to optimum management and stochastic programming. SIAM Journal on Utilized Arithmetic, 17(4), 638–663. The stochastic-programming-flavored model; that is the place the identify “L-shaped technique” comes from.
    • Fengqi You and Ignacio E Grossmann. (2013). Multicut benders decomposition algorithm for course of provide chain planning beneath uncertainty. Annals of Operations Analysis, 210:191–211.
    • Rahmaniani, R., Crainic, T. G., Gendreau, M., and Rei, W. (2017). The Benders decomposition algorithm: A literature overview. European Journal of Operational Analysis, 259(3), 801–817. The reference for accelerations, variants, and fashionable utilization.

    And the earlier posts within the sequence: Five questions that will help you model integer linear programs better, Four steps to robustify your linear program, and A Gentle Introduction to Stochastic Programming.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Editor Times Featured
    • Website

    Related Posts

    I Spent May Evaluating Different Engines for OCR

    June 3, 2026

    Why AI Is NOT Stealing Your Job

    June 3, 2026

    What AI Agents Should Never Do on Their Own

    June 3, 2026

    Exploring Income Patterns with Python Pandas, Matplotlib, and Seaborn

    June 2, 2026

    From Local App to Public Website in Minutes

    June 2, 2026

    Code Is Cheap. Engineering Judgement Is Now the Scarce Resource

    June 2, 2026
    Leave A Reply Cancel Reply

    Editors Picks

    I Spent May Evaluating Different Engines for OCR

    June 3, 2026

    Extra-wide tiny house combines premium finishes with spacious design

    June 3, 2026

    Property investment startup Dashdot in liquidation, with Budget as ‘the straw that broke the camel’s back’

    June 3, 2026

    This Is How Trump Finally Signed the AI Executive Order

    June 3, 2026
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    About Us
    About Us

    Welcome to Times Featured, an AI-driven entrepreneurship growth engine that is transforming the future of work, bridging the digital divide and encouraging younger community inclusion in the 4th Industrial Revolution, and nurturing new market leaders.

    Empowering the growth of profiles, leaders, entrepreneurs businesses, and startups on international landscape.

    Asia-Middle East-Europe-North America-Australia-Africa

    Facebook LinkedIn WhatsApp
    Featured Picks

    Liquid metal gives joint implants infection-fighting powers

    October 10, 2025

    Robots learn new skills without custom code changes

    May 19, 2026

    Burning Man Is Over, but Regional Burns Keep the Party Going Year-Round

    September 17, 2024
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    Copyright © 2024 Timesfeatured.com IP Limited. All Rights.
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us

    Type above and press Enter to search. Press Esc to cancel.