Expected Value Analysis in AI Product Management

Decision making under uncertainty is a central concern for product teams. Decisions big and small often have to be made under time pressure, despite incomplete (and potentially inaccurate) information about the problem and solution space. This may be due to a lack of relevant user research, limited knowledge about the intricacies of the business context (typically seen in companies that do too little to foster customer centricity and cross-team collaboration), and/or a flawed understanding of what a certain technology can and cannot do (particularly when building front-runner products with novel, untested technologies).

The situation is especially challenging for AI product teams for at least three reasons. First, many AI algorithms are inherently probabilistic in nature and thus yield uncertain outcomes (e.g., model predictions may be right or wrong with a certain probability). Second, a sufficient quantity of high-quality, relevant data may not always be available to properly train AI systems. Third, the recent explosion in hype around AI, and more specifically generative AI, has led to unrealistic expectations among customers, Wall Street analysts and (inevitably) decision makers in upper management; the feeling among many of these stakeholders seems to be that virtually anything can now be solved easily with AI. Needless to say, it can be difficult for product teams to manage such expectations.

So, what hope is there for AI product teams? While there is no silver bullet, this article introduces readers to the notion of expected value and how it can be used to guide decision making in AI product management. After a brief overview of key theoretical concepts, we will look at three real-life case studies that underscore how expected value analysis can help AI product teams make strategic decisions under uncertainty across the product lifecycle. Given the foundational nature of the subject matter, the target audience of this article includes data scientists, AI product managers, engineers, UX researchers and designers, managers, and all others aspiring to develop great AI products.

Note: All figures and formulas in the following sections have been created by the author of this article.

Expected Value

Before giving a formal definition of expected value, let us consider two simple games to build our intuition.

A Game of Dice

In the first game, imagine you are competing with your friends in a dice-rolling contest. Each of you gets to roll a fair, six-sided die N times. The score for each roll is given by the number of pips (dots) showing on the top face of the die after the roll; 1, 2, 3, 4, 5, and 6 are thus the only achievable scores for any given roll. The player with the highest total score at the end of N rolls wins the game. Assuming that N is a large number (say, 500), what should we expect to see at the conclusion of the game? Will there be an outright winner or a tie?

It turns out that, as N gets large, the total score of each player is likely to end up close to 3.5*N. For example, after 500 rolls, the total scores of you and your friends are likely to be around 3.5*500 = 1750. To see why, notice that, for a fair, six-sided die, the probability of any side being on top after a roll is 1/6. On average, the score of an individual roll will therefore be (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5, i.e., the average of all achievable scores per roll; this also happens to be the expected value of a die roll. Assuming that the outcomes of all rolls are independent of each other, we would expect the average score of the N rolls to be close to 3.5. So, after 500 rolls, we should not be surprised if each player has a total score of roughly 1750. In fact, there is a so-called strong law of large numbers in mathematics, which states that if you repeat an experiment (like rolling a die) a sufficiently large number of times, the average result of all those experiments converges almost surely to the expected value.
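The reasoning above can be checked with a small simulation. The sketch below is illustrative only: the function names and the fixed random seed are assumptions made for this example, not part of the original analysis.

```python
import random


def expected_die_value(sides: int = 6) -> float:
    """Exact expected value of one roll of a fair die with `sides` faces."""
    return sum(range(1, sides + 1)) / sides


def simulate_total_score(n_rolls: int, rng: random.Random) -> int:
    """Total score of one player after `n_rolls` rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls))


# Fixed seed (an arbitrary choice) so that repeated runs give the same numbers.
rng = random.Random(42)
n_rolls = 500

print(f"Expected total after {n_rolls} rolls: {expected_die_value() * n_rolls}")
for player in ("you", "friend 1", "friend 2"):
    total = simulate_total_score(n_rolls, rng)
    print(f"{player}: total = {total}, average per roll = {total / n_rolls:.3f}")
```

Individual totals will scatter around 1750 rather than hit it exactly; the law of large numbers only says that the average per roll settles toward 3.5 as the number of rolls grows.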

A Game of Roulette

Next, let us consider roulette, a popular game at casinos. Imagine you are playing a simplified version of roulette against a friend as follows. The roulette wheel has 38 pockets, and the game ends after N rounds. For each round, you must pick a whole number between 1 and 38, after which your friend will spin the roulette wheel and throw a small ball onto the spinning wheel. Once the wheel stops spinning, if the ball ends up in the pocket with the number that you picked, your friend will pay you $35; if the ball ends up in any of the other pockets, however, you must pay your friend $1. How much money do you expect you and your friend to make after N rounds?

You might think that, since $35 is a lot more than $1, your friend will end up paying you quite a bit of money by the time the game is over, but not so fast. Let us apply the same basic approach we used in the dice game to analyze this seemingly lucrative game of roulette. For any given round, the probability of the ball ending up in the pocket with the number that you picked is 1/38. The probability of the ball ending up in some other pocket is 37/38. From your perspective, the average outcome per round is therefore $35*1/38 - $1*37/38 = -$0.0526. So, on average, you will actually end up owing your friend a little over a nickel per round. After N rounds, you can expect to be out of pocket by around $0.0526*N. If you play 500 rounds, as in the dice game above, you will end up paying your friend roughly $26. This is an example of a game that is rigged to favor the “house” (i.e., the casino, or in this case, your friend).
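The roulette arithmetic can be reproduced exactly with rational numbers, which avoids rounding along the way. This is a minimal sketch; the constant and function names are chosen for illustration.

```python
from fractions import Fraction

POCKETS = 38      # pockets on the wheel, as in the simplified game above
WIN_PAYOUT = 35   # dollars your friend pays you for a correct pick
LOSS_COST = 1     # dollars you pay your friend otherwise


def roulette_expected_value() -> Fraction:
    """Expected payoff per round from the player's perspective, in dollars."""
    p_win = Fraction(1, POCKETS)
    p_lose = 1 - p_win
    return WIN_PAYOUT * p_win - LOSS_COST * p_lose


ev_per_round = roulette_expected_value()
print(f"Expected payoff per round: {ev_per_round} (about {float(ev_per_round):.4f} dollars)")
print(f"Expected payoff after 500 rounds: about {float(ev_per_round * 500):.2f} dollars")
```

The exact per-round value is -1/19 of a dollar, which is where the rounded figure of -$0.0526 comes from.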

Formal Definition

Let X be a random variable that can yield any one of k outcome values, x₁, x₂, …, x_k, with probabilities p₁, p₂, …, p_k of occurring, respectively. The expected value, E(X), of X is the sum of the outcome values weighted by their respective probabilities of occurrence:

E(X) = p₁*x₁ + p₂*x₂ + … + p_k*x_k

The total expected value of N independent occurrences of X will be N*E(X).
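The definition translates directly into a few lines of code. The helper below is a sketch with hypothetical names; the input checks (matching lengths, non-negative probabilities that sum to 1) are additions for safety and are not part of the definition itself.

```python
import math
from typing import Sequence


def expected_value(outcomes: Sequence[float], probabilities: Sequence[float]) -> float:
    """Return E(X): the sum of outcome values weighted by their probabilities."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


def total_expected_value(n: int, outcomes: Sequence[float], probabilities: Sequence[float]) -> float:
    """Return N*E(X), the total expected value of n occurrences of X."""
    return n * expected_value(outcomes, probabilities)


# A made-up bet for illustration: win 10 with probability 0.3, lose 5 otherwise.
print(expected_value([10, -5], [0.3, 0.7]))
print(total_expected_value(100, [10, -5], [0.3, 0.7]))
```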


In the following case studies, we will see how expected value analysis can aid decision making under uncertainty. Fictitious company names are used throughout to preserve the anonymity of the businesses involved.

Case Study 1: Fraud Detection in E-Commerce

Cars Online is an online platform for reselling used cars across Europe. Professional car dealerships and private owners of used cars can list their vehicles for sale on Cars Online. A typical listing will include the seller's asking price, facts about the car (e.g., its basic properties, special features, and details of any damage or wear and tear), and photos of the car's interior and exterior. Buyers can browse through the many listings on the platform and, having found one they like, can click a button on the listing page to contact the seller, arrange a viewing, and ultimately make the purchase. Cars Online charges sellers a small monthly fee to show listings on the platform. To drive such subscription-based revenue, the process for sellers to sign up for the platform and create listings is kept as simple as possible.

The trouble is that some of the listings on the platform may in fact be fake. An unintended consequence of lowering the barriers to creating listings is that malicious users can set up fake seller accounts and create fake listings (often impersonating legitimate car dealerships) to lure and potentially defraud unsuspecting buyers. Fake listings can have a negative business impact on Cars Online in two ways. First, fearing reputational damage, affected sellers may take their listings to competing platforms, publicly criticize Cars Online for its apparently lax security standards (which could trigger other sellers to also leave the platform), and even sue for damages. Second, affected buyers (and those who hear about the instances of fraud in the press, on social media, and from friends and family) may also abandon the platform and write negative reviews online, all of which can further persuade sellers (the platform's key revenue source) to leave.

Against this backdrop, the chief product officer (CPO) at Cars Online has tasked a product manager and a cross-functional team of customer success representatives, data scientists, and engineers to assess the possibility of using AI to combat the scourge of fraudulent listings. The CPO is not interested in mere opinions; she wants a data-driven estimate of the net value of implementing an AI system that can help quickly detect and delete fraudulent listings from the platform before they can cause any damage.

Expected value analysis can be used to estimate the net value of the AI system by considering the probabilities of correct and incorrect predictions and their respective benefits and costs. In particular, we can distinguish between four cases: (1) correctly detected fake listings (true positives), (2) legitimate listings incorrectly deemed fake (false positives), (3) correctly detected legitimate listings (true negatives), and (4) fake listings incorrectly deemed legitimate (false negatives). The net monetary impact, C(i), of each case i can be estimated with the help of historical data and stakeholder interviews. Both true positives and false positives will result in some effort for Cars Online to remove the identified listings, but the false positives will result in additional costs (e.g., revenues lost due to removing legitimate listings and the cost of efforts to reinstate these). Meanwhile, while true negatives should incur no costs, false negatives can be expensive; these represent the very fraud that the CPO aims to combat.

Given an AI model with a certain predictive accuracy, if P(i) denotes the probability of each case i occurring in practice, then the sum S = C(1)*P(1) + C(2)*P(2) + C(3)*P(3) + C(4)*P(4) reflects the expected value of each prediction (see Figure 1 below). The total expected value for N predictions would then be N*S.

Figure 1: Expected Value of Fraud Prediction in Cars Online Case Study

Based on the predictive performance profile of a given AI model and estimates of expected value for each of the four cases (from true positives to false negatives), the CPO can get a better sense of the expected value of building an AI system for fraud detection and make a go/no-go decision for the project accordingly. Of course, additional fixed and variable costs usually associated with building, operating, and maintaining AI systems should also be factored into the overall decision making.
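To illustrate the calculation of S and N*S, the sketch below plugs in made-up numbers. Every figure here (fraud rate, recall, false positive rate, the per-case impacts, the listing volume, and the fixed cost) is a hypothetical placeholder rather than data from the case study. Comparing against a "no AI" baseline, in which every fake listing goes undetected, is also an assumption added here to make a go/no-go comparison possible.

```python
from typing import Dict


def case_probabilities(fraud_rate: float, recall: float, false_positive_rate: float) -> Dict[str, float]:
    """P(i) for the four cases, derived from the fraud base rate and model error rates."""
    return {
        "TP": fraud_rate * recall,                             # fake listing, flagged
        "FP": (1 - fraud_rate) * false_positive_rate,          # legitimate listing, flagged
        "TN": (1 - fraud_rate) * (1 - false_positive_rate),    # legitimate listing, not flagged
        "FN": fraud_rate * (1 - recall),                       # fake listing, not flagged
    }


def expected_value_per_prediction(probabilities: Dict[str, float], impacts: Dict[str, float]) -> float:
    """S = sum over the four cases of C(i) * P(i)."""
    return sum(probabilities[case] * impacts[case] for case in probabilities)


# Hypothetical net monetary impact C(i) per listing (negative values are costs).
HYPOTHETICAL_IMPACTS = {"TP": -2.0, "FP": -50.0, "TN": 0.0, "FN": -500.0}

with_ai = case_probabilities(fraud_rate=0.02, recall=0.90, false_positive_rate=0.01)
without_ai = case_probabilities(fraud_rate=0.02, recall=0.0, false_positive_rate=0.0)

s_with_ai = expected_value_per_prediction(with_ai, HYPOTHETICAL_IMPACTS)
s_without_ai = expected_value_per_prediction(without_ai, HYPOTHETICAL_IMPACTS)

n_listings = 100_000        # hypothetical number of listings scored per year
fixed_cost = 200_000.0      # hypothetical yearly cost of building and running the system

net_value = n_listings * (s_with_ai - s_without_ai) - fixed_cost
print(f"S with AI: {s_with_ai:.3f}, S without AI: {s_without_ai:.3f}")
print(f"Estimated net value over {n_listings} listings: {net_value:,.0f}")
```

With different inputs the sign of the net value can flip, which is why the quality of the estimates for C(i) and P(i) matters so much.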

A related article considers a similar case study, in which a recruiting agency decides to implement an AI system for identifying and prioritizing good leads (candidates likely to be hired by clients) over bad ones. Readers are encouraged to go through that case study and reflect on the similarities and differences with the one discussed here.

Case Study 2: Auto-Completing Purchase Orders

The procurement department of ACME Auto, an American car manufacturer, creates a significant number of purchase orders every month. Building a single car requires several thousand individual parts that need to be procured on time and at the right quality standard from approved suppliers. A team of purchasing clerks is responsible for manually creating the purchase orders; this involves filling out an online form consisting of several data fields that define the precise specifications and quantities of each item to be purchased per order. Needless to say, this is a time-consuming and error-prone activity, and as part of a company-wide cost-cutting initiative, the Chief Procurement Officer of ACME Auto has tasked a cross-functional product team within her department to substantially automate the creation of purchase orders using AI.

Having conducted user research in close collaboration with the purchasing clerks, the product team has decided to build an AI feature for auto-filling fields in purchase orders. The AI can auto-fill fields based on a combination of any initial inputs provided by the purchasing clerk and other relevant information sourced from master data tables, inputs from production lines, and so on. The purchasing clerk can then review the auto-filled order and has the option of either accepting the AI-generated proposals (i.e., predictions) for each field or overriding incorrect proposals with manual entries. In cases where the AI is unsure of the correct value to fill (as indicated by a low model confidence score for the given prediction), the field is left blank, and the clerk must manually fill it with a suitable value. An AI feature for flexibly auto-filling forms in this manner can be built using an approach called denoising, as described in a separate article.

To ensure high quality, the product team would like to set a threshold for model confidence scores, such that only predictions with confidence scores above this predefined threshold are shown to the user (i.e., used to auto-fill the purchase order form). The question is: what threshold value should be chosen?

Let c₁ and c₂ be the payoffs of showing correct and incorrect predictions to the user (due to being above the confidence threshold), respectively. Let c₃ and c₄ be the payoffs of not showing correct and incorrect predictions to the user (due to being below the confidence threshold), respectively. Presumably, there should be a positive payoff (i.e., a benefit) to showing correct predictions (c₁) and not showing incorrect ones (c₄). By contrast, c₂ and c₃ should be negative payoffs (i.e., costs). Choosing a threshold that is too low increases the chance of showing wrong predictions that the clerk must manually correct (c₂). But choosing a threshold that is too high increases the chance of correct predictions not being shown, leaving blank fields on the purchase order form that the clerk would need to spend some effort to manually fill in (c₃). The product team thus has a trade-off on its hands. Can expected value analysis help resolve it?

As it happens, the team is able to estimate reasonable values for the payoff factors c₁, c₂, c₃, and c₄ by leveraging findings from user research and business domain know-how. Furthermore, the data scientists on the product team are able to estimate the probabilities of incurring these payoffs by training an example AI model on a dataset of historical purchase orders at ACME Auto and analyzing the results. Suppose k is the confidence score attached to a prediction. Then, given a predefined model confidence threshold t, let q(k > t) denote the proportion of predictions that have confidence scores greater than t; these are the predictions that would be used to auto-fill the purchase order form. The proportion of predictions with confidence scores at or below the threshold value is q(k ≤ t) = 1 - q(k > t). Furthermore, let p(k > t) and p(k ≤ t) denote the average accuracies of predictions that have confidence scores greater than t and at most t, respectively. The expected value (or expected payoff) S per prediction can be derived by summing up the expected values attributable to each of the four payoff drivers (denoted s₁, s₂, s₃, and s₄), as shown in Figure 2 below. The task for the product team is then to test various threshold values t and identify one that maximizes the expected payoff S.

Figure 2: Expected Payoff per Prediction in ACME Auto Case Study
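The threshold search can be sketched in code. This example assumes the four payoff drivers combine as s₁ = c₁*q(k > t)*p(k > t), s₂ = c₂*q(k > t)*(1 - p(k > t)), s₃ = c₃*q(k ≤ t)*p(k ≤ t), and s₄ = c₄*q(k ≤ t)*(1 - p(k ≤ t)), which follows from the definitions of q and p given in the text. The validation data, payoff values, and candidate thresholds are all hypothetical.

```python
from typing import List, Sequence, Tuple

# (confidence score k, whether the prediction turned out to be correct)
Prediction = Tuple[float, bool]


def expected_payoff(predictions: List[Prediction], t: float,
                    c1: float, c2: float, c3: float, c4: float) -> float:
    """Expected payoff S per prediction for confidence threshold t."""
    if not predictions:
        raise ValueError("need at least one prediction")
    shown = [correct for k, correct in predictions if k > t]
    hidden = [correct for k, correct in predictions if k <= t]
    q_shown = len(shown) / len(predictions)                  # q(k > t)
    q_hidden = 1 - q_shown                                   # q(k <= t)
    p_shown = sum(shown) / len(shown) if shown else 0.0      # p(k > t)
    p_hidden = sum(hidden) / len(hidden) if hidden else 0.0  # p(k <= t)
    s1 = c1 * q_shown * p_shown          # correct prediction shown
    s2 = c2 * q_shown * (1 - p_shown)    # incorrect prediction shown
    s3 = c3 * q_hidden * p_hidden        # correct prediction withheld
    s4 = c4 * q_hidden * (1 - p_hidden)  # incorrect prediction withheld
    return s1 + s2 + s3 + s4


def best_threshold(predictions: List[Prediction], thresholds: Sequence[float],
                   c1: float, c2: float, c3: float, c4: float) -> float:
    """Return the candidate threshold with the highest expected payoff."""
    return max(thresholds, key=lambda t: expected_payoff(predictions, t, c1, c2, c3, c4))


# Hypothetical validation results and payoffs (benefits positive, costs negative).
validation = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False), (0.2, False)]
payoffs = dict(c1=1.0, c2=-2.0, c3=-1.0, c4=0.5)
candidates = [0.0, 0.25, 0.5, 0.65, 0.75, 0.85, 0.95]

for t in candidates:
    print(f"t = {t:.2f}: S = {expected_payoff(validation, t, **payoffs):+.4f}")
print("Best threshold:", best_threshold(validation, candidates, **payoffs))
```

In practice the validation set would be far larger, and the chosen threshold should be re-checked on held-out data, since a threshold tuned on a small sample can be unstable.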

Case Study 3: Standardizing AI Design Guidance

The CEO of Ex Corp, a global enterprise software vendor, has recently declared her intention to make the company “AI-first” and infuse all of its products and services with high-value AI features. To support this company-wide transformation effort, the Chief Product Officer has tasked the central design team at Ex Corp with creating a consistent set of design guidelines to help teams build AI products that enhance user experience. A key challenge is managing the trade-off between creating guidance that is too weak/high-level (giving individual product teams greater freedom of interpretation while risking inconsistent application of the guidance across product teams) and guidance that is too strict (enforcing standardization across product teams without due regard for product-specific exceptions or customization needs).

One well-intentioned piece of guidance that the central design team initially came up with involves displaying labels next to predictions on the UI (e.g., “best option,” “good alternative,” or similar), to give users some indication of the expected quality/relevance of the predictions. It is thought that showing such qualitative labels would help users make informed decisions during their interactions with AI products, without overwhelming them with hard-to-interpret statistics such as model confidence scores. In particular, the central design team believes that by stipulating a consistent, global set of model confidence thresholds, a standardized mapping can be created for translating between model confidence scores and qualitative labels for products across Ex Corp. For example, predictions with confidence scores greater than 0.8 can be labeled as “best,” predictions with confidence scores between 0.6 and 0.8 can be labeled as “good,” and so on.

As we have seen in the previous case study, it is possible to use expected value analysis to derive a model confidence threshold for a specific use case, so it is tempting to try to generalize this threshold across all use cases in the product portfolio. However, this is trickier than it first seems, and the probability theory underlying expected value analysis can help us understand why. Consider two simple games, a coin flip and a die roll. The coin flip entails two possible outcomes, landing heads or tails, each with a 1/2 probability of occurring (assuming a fair coin). Meanwhile, as we discussed previously, rolling a fair, six-sided die entails six possible outcomes for the top-facing side (1, 2, 3, 4, 5, or 6 pips), each with a 1/6 probability of occurring. A key insight here is that, as the number of possible outcomes of a random variable (also called the cardinality of the outcome set) increases, it generally becomes harder and harder to correctly guess the outcome of an arbitrary event. If you guess that the next coin flip will result in heads, you will be right half the time on average. But if you guess that you will roll any particular number (say, 3) on the next die roll, you will only be correct one out of six times on average.

Now, what if we were to set a global confidence threshold of, say, 0.4 for both the coin and dice games? If an AI model for the dice game predicts a 3 on the next roll with a confidence score of 0.45, then we might happily label this prediction as “good” or even “great”; after all, the confidence score is above the predefined global threshold and significantly higher than 1/6 (the success probability of a random guess). However, if an AI model for the coin game predicts heads on the next coin flip with the same confidence score of 0.45, we may suspect that this is a false positive and not show the prediction to the user at all; although the confidence score is above the predefined threshold, it is still below 0.5 (the success probability of a random guess).

The above analysis suggests that a single, one-size-fits-all stipulation to display qualitative labels next to predictions should be struck from the standardized design guidance for AI use cases. Instead, perhaps individual product teams should be empowered to make use-case-specific decisions about how to display qualitative labels (if at all).
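The coin-versus-die comparison can be made concrete by checking a confidence score against both the global threshold and the chance level of the use case. The sketch below assumes equally likely outcomes (a fair coin and a fair die); the function names are hypothetical, while the 0.4 threshold and 0.45 confidence score come from the example above.

```python
def chance_level(n_outcomes: int) -> float:
    """Success probability of a uniform random guess among n_outcomes possibilities."""
    if n_outcomes < 2:
        raise ValueError("need at least two possible outcomes")
    return 1.0 / n_outcomes


def passes_global_threshold(confidence: float, threshold: float = 0.4) -> bool:
    """True if the confidence score clears the one-size-fits-all threshold."""
    return confidence > threshold


def beats_random_guess(confidence: float, n_outcomes: int) -> bool:
    """True if the confidence score exceeds the chance level for this use case."""
    return confidence > chance_level(n_outcomes)


confidence = 0.45
for game, n_outcomes in (("coin flip", 2), ("die roll", 6)):
    print(
        f"{game}: passes global threshold = {passes_global_threshold(confidence)}, "
        f"beats random guess = {beats_random_guess(confidence, n_outcomes)}"
    )
```

Both predictions clear the global threshold, yet only the die prediction is better than guessing, which is the mismatch that makes a single portfolio-wide mapping from scores to labels hard to defend.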

The Wrap

Decision making under uncertainty is a key concern for AI product teams, and will likely gain in importance in a future dominated by AI. In this context, expected value analysis can help guide AI product management. The expected value of an uncertain outcome represents the theoretical, long-term, average value of that outcome. Using real-life case studies, this article shows how expected value analysis can help teams make educated, strategic decisions under uncertainty across the product lifecycle.

As with any such mathematical modeling approach, however, it is worth emphasizing two important points. First, an expected value calculation is only as good as its structural completeness and the accuracy of its inputs. If all relevant value drivers are not included, the calculation will be structurally incomplete, and the resulting findings will be inaccurate. Using conceptual frameworks such as the matrices and tree diagrams shown in Figures 1 and 2 above can help teams verify the completeness of their calculations. Readers can refer to this book to learn how to leverage conceptual frameworks. If the data and/or assumptions used to derive the outcome values and their probabilities are faulty, then the resulting expected value will be inaccurate, and potentially damaging if used to inform strategic decision making (e.g., wrongly sunsetting a promising product). Second, it is usually a good idea to pair a quantitative approach like expected value analysis with qualitative approaches (e.g., customer interviews, observing how users interact with the products) to get a well-rounded picture. Qualitative insights can help us sanity-check the inputs to the expected value calculation, better interpret the quantitative results, and ultimately derive holistic recommendations for decision making.
