Disclaimer: The information presented in this article is entirely synthetic and purely hypothetical. It was generated solely for illustrative and educational purposes. Any resemblance to real-world data, people, or organizations is purely coincidental.
Have you ever played a cooperative game or sport? Let’s consider another example, but this time in the professional world. Let’s say you’re part of an organization whose primary means of driving product sales is its e-commerce website. Within this organization, you likely have various marketing teams that drive customers to the website through online advertisements, email campaigns, and other channels. The website itself is maintained by another set of teams whose responsibilities include design, merchandising, recommendation systems, and many more. The organization must also consider the product itself and the teams that create, improve, and develop new products. This poses a significant question to the organization:
How can value be properly attributed so that more strategic decisions can be made going forward?
If that question was not already difficult enough to answer, we must consider a more realistic viewpoint. All of these teams across the business depend on each other to make a sale. The product teams need a way to sell the products. The organization has a website to facilitate this. For the website to make a sale, it requires customers; that’s where the marketing teams need to create, launch, and maintain campaigns that drive traffic to the website. Recognizing the intertwined dependencies among the organization’s teams, the business must understand value in terms of team coalitions. This is where Harsanyi Dividends come into play.
Harsanyi Dividends
Harsanyi dividends, a concept from cooperative game theory, measure the excess value of coalitions in a cooperative game. The key here is the excess value. A cooperative game is a concept from game theory: a game or activity in which the participants work toward a shared goal.
Let’s take a bird’s-eye view of cooperative game theory, specifically transferable-utility (TU) cooperative games. In TU cooperative game theory, players can form coalitions to achieve a collective payoff that is shared in an agreed-upon way. Those of us who have built robust predictive models using frameworks such as XGBoost or other ensemble methods have probably found ourselves using Shapley values to understand the contribution of each feature, since the model itself is a black box. Shapley values can also be used to determine each player’s payoff in a coalition in TU cooperative games. There is certainly a lot of value in a framework such as Shapley values for understanding individual contributions; however, Harsanyi Dividends help us understand the additional value generated by coalitions. Let’s look at a hypothetical example.
A cooperative game: Dragon Slayer
Let’s say three friends get together to play a new co-op video game where the goal is to work together to inflict as much damage as possible on a dragon. The players are Andrew, Bryan, and Carson. They have played this game many times; however, not all players play together every time. Sometimes it is only Andrew & Carson, Carson & Bryan, Carson by himself, and so on. They have played this game so much that every possible subset of the group has played it many times, including sessions with only one player.
Carson, a data scientist by trade, wants to gain a deeper understanding of the group’s performance. He gathered the scores of every session and ended up with an average score for each coalition and individual. Take a look at these aggregate scores below. Each player is represented by their first initial. We’ll represent the average score of each coalition/individual with v(S), where S is the set of players in the session.
v(a) = 10
v(b) = 12
v(c) = 18
v(a,b) = 27
v(a,c) = 23
v(c,b) = 29
v(a,b,c) = 37
We can clearly see that Carson is the top individual player, while Carson & Bryan are the top-scoring duo (29). You are probably not surprised that the three-player coalition yields the highest score. Carson, the curious data scientist, wants to study these interactions among his friends in greater depth. To do so, he decides to calculate Harsanyi Dividends to see which group collaborated the most effectively. Now this is a tricky question. We can easily see the scores of each coalition; however, what if we adjusted for what the individual players already contributed? We can discover which coalitions enhance one another and which act as a detriment to what the smaller coalitions of players already contribute. In other words, where does 1+1 equal something more than two, and where does 1+1 equal something less than two?
To accomplish this, we’ll use the formula below:

$$d(S) = \sum_{T \subseteq S} (-1)^{|S| - |T|} \, v(T)$$

Let’s break it down piece by piece.

$d(S)$
This represents the Harsanyi Dividend for coalition S.

$\sum_{T \subseteq S}$
The Sigma symbol is used in mathematics to denote the sum of a series of terms in a compact form. Just below it is the expression describing the terms we are summing over. In this case, it reads T ⊆ S: T is a subset of the coalition S, which is the coalition for which we are calculating the dividend. Together with Sigma, this tells us that we are summing over all possible subsets of S in a specific manner that we’ll discuss next. One final note on this part: the coalition itself is considered a subset.

$(-1)^{|S| - |T|} \, v(T)$
The pipe symbols around S and T indicate that we are looking at the sizes of the sets. The difference between those sizes is the power we raise negative one to in each term of the sum. The result is then multiplied by the value of subset T.
Calculating the Dividends: Individuals
Let’s start with the individual players (Andrew, Bryan, and Carson), as this will be the most straightforward. For convenience, here are their individual aggregated scores mentioned earlier:
v(a) = 10
v(b) = 12
v(c) = 18
Can you guess what their Harsanyi Dividends are? Let’s start with Andrew and calculate it step by step (or just step).
(-1)^(|{a}| - |{a}|) * v(a) = (-1)^(1-1) * 10 = 10
For each term of the sum, we need the subset’s value and its size. For individual players, that leaves us with only one subset (the player itself), so we only have to go through the loop once.
Starting with the exponent, the size of set a is just one. T will also be of size one. This leaves us with raising -1 to the power of 0, which yields 1. We then multiply that by our value for a, which is 10, yielding a Harsanyi dividend of 10 for Andrew. For individuals, the dividend is simply the value.
Dividend (a) = 10
Dividend (b) = 12
Dividend (c) = 18
Calculating the Dividends: Pairs
Let’s calculate the dividend for Andrew and Bryan (a, b). The subsets are (a,b), (a), and (b). Therefore, the sum will have three terms.
Sum #1, Subset (a), v(a) = 10, size of (a) = 1
(-1)^(|{a,b}| - |{a}|) * v(a) = (-1)^(2-1) * 10 = -10
Sum #2, Subset (b), v(b) = 12, size of (b) = 1
(-1)^(|{a,b}| - |{b}|) * v(b) = (-1)^(2-1) * 12 = -12
Sum #3, Subset (a,b), v(a,b) = 27, size of (a,b) = 2
(-1)^(|{a,b}| - |{a,b}|) * v(a,b) = (-1)^(2-2) * 27 = 27
Add them all together, and we get:
Dividend (a,b) = 5
Let’s pause here to discuss some quick intuition behind calculating Harsanyi dividends for pairs. To put it simply, the dividend is just the value of the pair minus the values of the individuals in the pair. In other words, it shows whether the pair generates a surplus of value or loses value when its members work together. In this example, Andrew & Bryan show that they play the game more effectively together. Take a look at the dividends for the remaining pairs. What insights can we derive? The first thing that comes to my mind is that Carson is probably not the best teammate, at least when he is in a pair. Let’s see how things change when we look at the trio.
Dividend (a,c) = -5
Dividend (c,b) = -1
Calculating the Dividends: Trios
Buckle up, there are a lot of terms here; however, it is important to understand which values are added versus subtracted in the trio calculation.
- Sum #1, Subset (a), v(a) = 10, size of (a) = 1
- (-1)^(|{a,b,c}| - |{a}|) * v(a) = (-1)^(3-1) * 10 = 10
- Sum #2, Subset (b), v(b) = 12, size of (b) = 1
- (-1)^(|{a,b,c}| - |{b}|) * v(b) = (-1)^(3-1) * 12 = 12
- Sum #3, Subset (c), v(c) = 18, size of (c) = 1
- (-1)^(|{a,b,c}| - |{c}|) * v(c) = (-1)^(3-1) * 18 = 18
- Sum #4, Subset (a,b), v(a,b) = 27, size of (a,b) = 2
- (-1)^(|{a,b,c}| - |{a,b}|) * v(a,b) = (-1)^(3-2) * 27 = -27
- Sum #5, Subset (a,c), v(a,c) = 23, size of (a,c) = 2
- (-1)^(|{a,b,c}| - |{a,c}|) * v(a,c) = (-1)^(3-2) * 23 = -23
- Sum #6, Subset (c,b), v(c,b) = 29, size of (c,b) = 2
- (-1)^(|{a,b,c}| - |{c,b}|) * v(c,b) = (-1)^(3-2) * 29 = -29
- Sum #7, Subset (a,b,c), v(a,b,c) = 37, size of (a,b,c) = 3
- (-1)^(|{a,b,c}| - |{a,b,c}|) * v(a,b,c) = (-1)^(3-3) * 37 = 37
Dividend (a,b,c) = -2
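The three walkthroughs above can be reproduced with a short function. This is a minimal sketch, not code from the project: the name `harsanyi_dividend` and the choice to key coalitions by `frozenset` are assumptions made for illustration. It sums over non-empty subsets only, which matches the worked examples.

```python
from itertools import combinations

# Average scores from the Dragon Slayer example, keyed by coalition.
v = {
    frozenset("a"): 10,
    frozenset("b"): 12,
    frozenset("c"): 18,
    frozenset("ab"): 27,
    frozenset("ac"): 23,
    frozenset("bc"): 29,
    frozenset("abc"): 37,
}


def harsanyi_dividend(coalition, values):
    """Sum (-1)^(|S| - |T|) * v(T) over every non-empty subset T of S."""
    members = sorted(coalition)
    total = 0
    for size in range(1, len(members) + 1):
        # The sign depends only on how many players the subset is missing.
        sign = (-1) ** (len(members) - size)
        for subset in combinations(members, size):
            total += sign * values[frozenset(subset)]
    return total


dividends = {"".join(sorted(s)): harsanyi_dividend(s, v) for s in v}
print(dividends)
```

Note the parentheses in `(-1) ** (...)`: in Python, `-1 ** 0` evaluates to `-1` because exponentiation binds tighter than the unary minus, which would flip every sign in the sum.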
So there is certainly a lot of math, but it is straightforward. What about the intuition behind what is happening? As you just saw, calculating dividends for one- and two-player coalitions is quite simple to do without the formula; however, once you get to three-player coalitions and above, the number of terms grows exponentially. With the three-player coalition specifically, it is easy to see that the two-player coalition values get subtracted, while the one-player coalition values get added back in. What about four-player coalitions? Three-player coalitions would get subtracted, two-player coalitions would get added back in, singles would be subtracted, and so on. You can easily extrapolate the pattern here; however, what does this pattern of subtracting and adding actually do? Let’s focus on the three-player example. By subtracting the two-player coalition values, we remove the synergy obtained from each pair along with the lower-level values of the smaller coalitions within it. However, this over-subtracts, because each individual’s value is contained in two of the pairs and is therefore removed twice. When the one-player values are added back in, we correct for the over-subtraction and are left with the pure synergy of the three-player coalition.
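One way to check this intuition is to run the bookkeeping in reverse: if each dividend is the synergy unique to its coalition, then adding up the dividends of every sub-coalition should rebuild the original score. The sketch below uses the dividends from the walkthrough; the helper name `value_from_dividends` is an assumption for illustration.

```python
from itertools import combinations

# Dividends computed in the walkthrough above.
d = {
    frozenset("a"): 10, frozenset("b"): 12, frozenset("c"): 18,
    frozenset("ab"): 5, frozenset("ac"): -5, frozenset("bc"): -1,
    frozenset("abc"): -2,
}


def value_from_dividends(coalition, dividends):
    """Rebuild v(S) by adding the dividends of all non-empty subsets of S."""
    members = sorted(coalition)
    return sum(
        dividends[frozenset(sub)]
        for size in range(1, len(members) + 1)
        for sub in combinations(members, size)
    )


rebuilt = {"".join(sorted(s)): value_from_dividends(s, d) for s in d}
print(rebuilt)
```

For the trio, the seven dividends add up to the original score of 37, so nothing is double-counted or lost by the alternating signs.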
Real-World Application: E-commerce Website

Going back to our original example, let’s build an application that calculates Harsanyi Dividends for an e-commerce website across all of the actions a customer can perform, so that we can get a sense of which components of the website work well together. These insights can help stakeholders answer the following questions:
- Where should we send customers depending on the page?
- Which products have up-sell or cross-sell opportunities?
- What are the best landing pages for different channels?
- Which journeys should be improved or removed altogether?
The Harsanyi Application
The whole project can be found on my GitHub here. I’ll walk you through the three core files: synthetic_data.py, dividends.py, and app.py.
synthetic_data.py
Why include a synthetic data feature? One of my goals for this project is to be educational, and the synthetic data generation portion allows an end user to quickly explore the tool and gain a sense of the type of data it is designed to handle. Note that there is also an option for a user to upload their own data via a CSV file.
Here is a simplified view of what the data should look like:
| SEO | Product Page | Desktop | Conversion | |
| 1 | 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 |
As you can see, each feature and the target column (Conversion) are Boolean. Each observation can be interpreted as a customer or a website session. In the synthetic data section, the feature variables fall into three categories: channel, page, and device; however, if you are uploading your own data, you can use whatever features you want as long as they are Boolean.
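Since the tool expects every column to be Boolean, a quick check before uploading a CSV can save a failed run. The following is a minimal sketch assuming Booleans are encoded as 0/1; the helper `non_binary_columns` and the column names are hypothetical and not part of the project.

```python
import pandas as pd


def non_binary_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that hold anything other than 0 or 1."""
    bad = []
    for col in df.columns:
        # Missing values are not in {0, 1}, so they are flagged as well.
        if not df[col].isin([0, 1]).all():
            bad.append(col)
    return bad


sessions = pd.DataFrame(
    {
        "seo": [1, 0, 0, 1],
        "product_page_a": [0, 1, 0, 1],
        "device_desktop": [1, 1, 0, 2],  # 2 is deliberately invalid
        "conversion": [0, 0, 0, 1],
    }
)
print(non_binary_columns(sessions))
```

An empty list means the frame is safe to use; anything else names the columns that need cleaning first.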
Feature Variable Propensities
Generating “good” synthetic data means making it as realistic as possible. In this project, that means we must include realistic propensities for each feature variable.
In the file, I added a dictionary of feature propensity ranges. These can be easily configured and are used to model the propensity to convert. Some of them intentionally do not have a range, but those that do are passed through a custom randomization function that outputs a value within the range.
FEATURE_PROPENSITY_RANGES
from typing import Dict, Tuple

# Propensity score range (low, high) per feature; equal bounds mean a fixed score.
FEATURE_PROPENSITY_RANGES: Dict[str, Tuple[float, float]] = {
    # Channels
    "email": (2.0, 2.0),
    "seo": (6.0, 6.0),
    "sem": (6.0, 6.0),
    "direct": (5.0, 5.0),
    "display": (1.0, 1.0),
    "social": (1.0, 1.0),
    "affiliate": (7.0, 7.0),
    # Pages (A-F with ranges where specified)
    "product_page_a": (5.0, 7.0),
    "product_page_b": (4.0, 8.0),
    "product_page_c": (5.0, 7.0),
    "product_page_d": (4.0, 8.0),
    "product_page_e": (5.0, 7.0),
    "product_page_f": (4.0, 8.0),
    "deals_page": (6.0, 6.0),
    "search_page": (5.0, 5.0),
    "homepage": (4.0, 4.0),
    "account_page": (7.0, 7.0),
    "support_page": (3.0, 3.0),
    # Device
    "device_desktop": (6.0, 6.0),
    "device_mobile": (3.0, 3.0),
}
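The custom randomization function itself is not shown in the article. As a rough sketch of what such a step might look like, assuming a uniform draw within each range, the hypothetical `sample_feature_scores` below returns one score per feature and treats a range with equal bounds as a fixed value.

```python
from typing import Dict, Tuple

import numpy as np


def sample_feature_scores(
    ranges: Dict[str, Tuple[float, float]],
    rng: np.random.Generator,
) -> Dict[str, float]:
    """Draw one propensity score per feature from its (low, high) range."""
    scores = {}
    for feature, (low, high) in ranges.items():
        # A degenerate range such as (6.0, 6.0) is treated as a fixed score.
        scores[feature] = low if low == high else float(rng.uniform(low, high))
    return scores


# Two entries copied from the dictionary above, one fixed and one ranged.
example_ranges = {"seo": (6.0, 6.0), "product_page_b": (4.0, 8.0)}
scores = sample_feature_scores(example_ranges, np.random.default_rng(0))
print(scores)
```

Passing the generator in, rather than creating it inside the function, keeps the draw reproducible when a seed is supplied and different on every run when it is not.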
_coef_range_for_score
The ranges themselves cannot be plugged directly into a model to generate a sample that yields an average conversion rate of around 5%. We accomplish this via a logistic regression, which requires realistic coefficients in the linear function. To convert these scores into meaningful coefficients, I created the following function:
def _coef_range_for_score(score: float) -> Tuple[float, float]:
    # Map a propensity score to a (low, high) range of logistic coefficients.
    if score <= 2.0:
        return (-1.0, -0.3)  # negative effect
    elif score <= 4.0:
        return (-0.3, 0.3)  # near neutral
    elif score <= 6.0:
        return (0.3, 1.0)  # moderate positive
    elif score <= 8.0:
        return (1.0, 2.5)  # strong positive
    else:
        return (2.5, 4.0)  # very strong positive
_sample_marginal_probabilities
While propensity is essential, one must also consider how often we expect a user to interact with each channel, page, or device. Therefore, we need a function that determines how frequently a customer interacts with each element. Note that I only show the channel section here; the rest are handled similarly. Keep in mind that all of the functions you have seen so far ensure that the synthetic data is different each time it is generated.
import numpy as np

def _sample_marginal_probabilities(
    rng: np.random.Generator,
) -> Tuple[Dict[str, float], float]:
    probs: Dict[str, float] = {}
    # Channels: fairly sparse, some more common (SEO, Direct)
    probs["email"] = rng.uniform(0.03, 0.15)
    probs["seo"] = rng.uniform(0.10, 0.60)
    probs["sem"] = rng.uniform(0.05, 0.40)
    probs["direct"] = rng.uniform(0.10, 0.50)
    probs["display"] = rng.uniform(0.01, 0.10)
    probs["social"] = rng.uniform(0.03, 0.20)
    probs["affiliate"] = rng.uniform(0.02, 0.15)
    # ... pages and devices are sampled similarly (omitted from this excerpt)
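To make the role of these marginal probabilities concrete, here is a minimal sketch of how they could be turned into Boolean columns. It assumes every feature is drawn independently, which is a simplification; the project may well constrain channels or devices differently. The function name `sample_binary_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def sample_binary_features(probs, n_rows, rng):
    """Draw an independent 0/1 column for each feature from its marginal probability."""
    data = {
        # A uniform draw below p happens with probability p.
        name: (rng.random(n_rows) < p).astype(int)
        for name, p in probs.items()
    }
    return pd.DataFrame(data)


rng = np.random.default_rng(42)
probs = {"email": 0.10, "seo": 0.40, "direct": 0.30}  # illustrative values
df = sample_binary_features(probs, n_rows=10_000, rng=rng)
print(df.mean().round(3))
```

The column means should land close to the requested probabilities, with the gap shrinking as the number of rows grows.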
_build_logistic_spec
The following function is what builds the logistic regression model. Here are the first few lines.
def _build_logistic_spec(
    rng: np.random.Generator,
) -> LogisticSpec:
    scores = _sample_feature_scores(rng)
    # Main effects: one coefficient per feature, drawn from its score's range
    main_effects = {}
    for feature in ALL_BINARY_FEATURES:
        score = scores[feature]
        lo, hi = _coef_range_for_score(score)
        main_effects[feature] = rng.uniform(lo, hi)
To reveal interactions among the variables, we need to add interaction terms to the model. To accomplish this, we add helper functions inside our custom logistic regression function that create interaction terms for combinations of two and three features. These can be configured in the function itself, as you can see in the second code block.
    interactions_2 = {}
    interactions_3 = {}
    # Coefficient ranges for 2-way and 3-way interaction terms
    strong_2 = (1.0, 3.0)
    moderate_2 = (0.5, 1.5)
    weak_2 = (-0.3, 0.3)
    strong_3 = (1.5, 3.5)
    moderate_3 = (0.7, 2.0)

    def add_interaction_2(a, b, coef_range):
        # Sort the names so (a, b) and (b, a) map to the same key
        key = tuple(sorted((a, b)))
        interactions_2[key] = rng.uniform(*coef_range)

    def add_interaction_3(a, b, c, coef_range):
        key = tuple(sorted((a, b, c)))
        interactions_3[key] = rng.uniform(*coef_range)

    add_interaction_3("sem", "product_page_a", "deals_page", strong_3)
    add_interaction_3("seo", "product_page_c", "search_page", moderate_3)
Finally, we add the intercept. Note that this intercept anchors the model at a 5% conversion rate only when every feature is zero, so we will need to fine-tune it later to keep the overall rate close to 5%.
    intercept = float(np.log(0.05 / (1.0 - 0.05)))  # log-odds of a 5% rate
    return LogisticSpec(intercept, main_effects, interactions_2, interactions_3)
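To see why the intercept alone does not pin the overall rate at 5%, it helps to look at what a single coefficient does to the baseline. The sketch below uses only the standard library; the coefficient of 1.0 is a hypothetical value picked from the moderate-to-strong boundary of the ranges above.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


intercept = logit(0.05)                  # same expression as the intercept above
baseline = sigmoid(intercept)            # every feature switched off
with_feature = sigmoid(intercept + 1.0)  # one feature on, hypothetical coefficient 1.0
print(round(baseline, 4), round(with_feature, 4))
```

A single positive coefficient already more than doubles the conversion probability for the sessions it applies to, so once many features are active the average rate drifts well above the baseline. That drift is what the calibration step later corrects.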
_compute_linear_predictor
Now, the previous function does not actually build the model; it sets the stage by creating a dictionary of features, feature interactions, and their associated coefficients. The function below iterates over those terms and returns the linear predictor once the values for each observation are plugged in.
import pandas as pd

def _compute_linear_predictor(
    df: pd.DataFrame,
    spec: LogisticSpec,
) -> np.ndarray:
    # Start every row at the intercept, then add each term that applies
    z = np.full(shape=len(df), fill_value=spec.intercept, dtype=float)
    # Main effects
    for f, beta in spec.main_effects.items():
        if f in df.columns:
            z += beta * df[f].values
    # 2-way
    for (a, b), beta in spec.interactions_2.items():
        if a in df.columns and b in df.columns:
            z += beta * (df[a].values * df[b].values)
    # 3-way
    for (a, b, c), beta in spec.interactions_3.items():
        if a in df.columns and b in df.columns and c in df.columns:
            z += beta * (df[a].values * df[b].values * df[c].values)
    return z
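The article does not show how `LogisticSpec` is defined or how the linear predictor becomes a conversion flag, so the following is a sketch under stated assumptions: `LogisticSpec` is modeled as a plain dataclass with the four fields passed to it earlier, `linear_predictor` is a compact restatement of the function above, and the coefficients are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

import numpy as np
import pandas as pd


@dataclass
class LogisticSpec:
    """Hypothetical container; the article does not show the real definition."""
    intercept: float
    main_effects: Dict[str, float] = field(default_factory=dict)
    interactions_2: Dict[Tuple[str, str], float] = field(default_factory=dict)
    interactions_3: Dict[Tuple[str, str, str], float] = field(default_factory=dict)


def linear_predictor(df: pd.DataFrame, spec: LogisticSpec) -> np.ndarray:
    """Intercept + main effects + products of interacting columns."""
    z = np.full(len(df), spec.intercept, dtype=float)
    for f, beta in spec.main_effects.items():
        z += beta * df[f].to_numpy()
    # An interaction only contributes when all of its columns equal 1.
    for cols, beta in {**spec.interactions_2, **spec.interactions_3}.items():
        z += beta * df[list(cols)].to_numpy().prod(axis=1)
    return z


spec = LogisticSpec(
    intercept=-3.0,
    main_effects={"sem": 0.5, "deals_page": 0.8},
    interactions_2={("deals_page", "sem"): 1.2},
)
df = pd.DataFrame({"sem": [0, 1, 1], "deals_page": [0, 0, 1]})

z = linear_predictor(df, spec)
p = 1.0 / (1.0 + np.exp(-z))        # logistic link: predictor -> probability
rng = np.random.default_rng(7)
converted = rng.binomial(1, p)      # one Bernoulli draw per row
print(z, p.round(3), converted)
```

The third row picks up both main effects plus the interaction term, which is exactly the kind of built-in synergy the Harsanyi Dividends are later expected to surface.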
_calibrate_intercept_to_global_rate
Conversion rates can vary considerably; however, I believe it is safe to assume that most websites receive conversions from a small share of their customers. In this tool, we adjust the data to have a conversion rate of around 5%. There are a few ways we can do this; however, I find the most efficient approach is to adjust the intercept term until the mean predicted conversion rate is close to the 5% target. The function below does just that. The final function that follows this one combines everything preceding it and is what is actually called in the application.
from scipy.special import expit

def _calibrate_intercept_to_global_rate(
    df: pd.DataFrame,
    spec: LogisticSpec,
    target_rate: float = 0.05,
    max_iter: int = 8,
) -> LogisticSpec:
    for _ in range(max_iter):
        z = _compute_linear_predictor(df, spec)
        p = expit(z)
        mean_p = float(p.mean())
        if mean_p <= 0 or mean_p >= 1:
            break  # something degenerate; give up
        current_odds = mean_p / (1.0 - mean_p)
        target_odds = target_rate / (1.0 - target_rate)
        # Shift the intercept by the log odds ratio between target and current rate
        delta = np.log(target_odds / current_odds)
        spec.intercept += float(delta)
        # Early stop if close enough (mean_p was measured before this update)
        if abs(mean_p - target_rate) < 0.002:
            break
    return spec
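A single shift of the intercept rarely lands on the target, because the average of a sigmoid is not the sigmoid of the average; that is why the function loops. The standalone sketch below illustrates this with simulated data. It is a simplified variant rather than the project's function: `calibrate_intercept` is a hypothetical name, the normally distributed `scores` stand in for the non-intercept part of the linear predictor, and the closeness check happens before each update.

```python
import numpy as np


def calibrate_intercept(scores, intercept, target_rate=0.05, tol=1e-4, max_iter=50):
    """Shift the intercept until mean(sigmoid(intercept + scores)) is near target_rate."""
    target_logit = np.log(target_rate / (1.0 - target_rate))
    history = []
    for _ in range(max_iter):
        mean_p = float(np.mean(1.0 / (1.0 + np.exp(-(intercept + scores)))))
        history.append(mean_p)
        if abs(mean_p - target_rate) < tol:
            break
        # Same log-odds correction as in the function above.
        intercept += target_logit - np.log(mean_p / (1.0 - mean_p))
    return intercept, history


rng = np.random.default_rng(0)
scores = rng.normal(0.0, 1.5, size=50_000)  # stand-in for the feature terms of z
intercept, history = calibrate_intercept(scores, intercept=np.log(0.05 / 0.95))
print([round(h, 4) for h in history])
```

The printed history shows the mean rate starting above 5% and closing in on the target over a handful of iterations, with each correction smaller than the last.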
dividends.py
As you probably guessed, this file is the engine that computes the Harsanyi Dividends. We already went through a thorough exercise reviewing how they are calculated; therefore, I think it is much more productive to discuss how the dividends are calculated in the context of this tool.
Clickstream data, in itself, can be very sparse, as a typical customer journey may involve several individual actions. This poses a challenge when calculating coalition values. Say we have a dataset of 100k customers with all of the actions they took, and we want to calculate the coalition value for customers who interacted with the homepage and a product page. We might find only a handful of customers who performed those two actions alone; therefore, for each coalition, we check whether a customer performed those actions regardless of what else they did. From there, we take the average of the target to obtain the coalition’s value. One important note is that there is no formal definition of how a value score should be calculated in the context of Harsanyi dividends; therefore, one needs to use one’s best judgment. In this example, taking the average is effective because we are using binary data and the average yields a proportion or percentage. If we were using revenue instead, taking the average could be somewhat misleading due to potential outliers.
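The coalition value described above can be sketched in a few lines of pandas. This is an illustration rather than the project's code: the function name `coalition_value`, the column names, and the six sample sessions are all assumptions.

```python
import pandas as pd


def coalition_value(df: pd.DataFrame, coalition, target: str = "conversion") -> float:
    """Mean of the target over rows where every feature in the coalition equals 1."""
    # Rows qualify regardless of what the other feature columns contain.
    mask = df[list(coalition)].eq(1).all(axis=1)
    return float(df.loc[mask, target].mean())


sessions = pd.DataFrame(
    {
        "homepage":       [1, 1, 1, 0, 1, 0],
        "product_page_a": [1, 1, 0, 1, 1, 0],
        "sem":            [0, 1, 0, 0, 1, 1],
        "conversion":     [0, 1, 0, 0, 1, 0],
    }
)
value = coalition_value(sessions, ("homepage", "product_page_a"))
print(value)
```

Three sessions touched both the homepage and the product page, and two of them also arrived via SEM; all three count toward the coalition. If no session matches a coalition, the mean is undefined (NaN), which is one reason a minimum sample requirement is useful.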
Finally, I should mention that this file uses parallel programming via the concurrent.futures module and exposes a few dynamic configurations. Parallel programming can significantly reduce the time required to compute Harsanyi dividends when working with large datasets. There is also an option to designate the maximum size of the coalitions for which you want to calculate dividends. The goal of this tool is to give stakeholders something actionable they can work with. If you are delivering coalitions of customer journeys that include many interactions, this can lead to many fragmented opportunities that could stretch available resources rather than focusing on a few small, high-value coalitions. The last configuration I’ll mention is the minimum data proportion for a coalition to be included in the calculations. This ensures that any opportunities the tool uncovers have a decent sample size.
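The sketch below shows one way the two configurations and the parallel step could fit together. It is not the project's implementation: the function names, the `max_size` and `min_share` parameters, and the choice of `ThreadPoolExecutor` are assumptions (the article only names the concurrent.futures module).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

import pandas as pd


def coalition_stats(df, coalition, target):
    """Return (coalition, share of rows covered, mean target over those rows)."""
    mask = df[list(coalition)].eq(1).all(axis=1)
    share = float(mask.mean())
    value = float(df.loc[mask, target].mean()) if mask.any() else 0.0
    return coalition, share, value


def coalition_values(df, target="conversion", max_size=2, min_share=0.2, workers=4):
    features = [c for c in df.columns if c != target]
    # Enumerate every coalition up to the configured maximum size.
    candidates = [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(features, size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: coalition_stats(df, c, target), candidates))
    # Keep only coalitions that cover enough of the data.
    return {c: v for c, share, v in results if share >= min_share}


sessions = pd.DataFrame(
    {
        "homepage":       [1, 1, 1, 0, 1, 0],
        "product_page_a": [1, 1, 0, 1, 1, 0],
        "sem":            [0, 1, 0, 0, 1, 1],
        "conversion":     [0, 1, 0, 0, 1, 0],
    }
)
values = coalition_values(sessions, max_size=2, min_share=0.5)
print(values)
```

A coalition can never cover more rows than any of its sub-coalitions, so whenever a coalition passes the minimum share, every subset needed for its dividend passes too. For CPU-heavy workloads a process pool may be the better fit; threads are used here only to keep the sketch short.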
Demo Using Synthetic Data
Now, let’s do a quick demonstration with the tool. We’ll go from start to finish using the synthetic dataset option and end with a few insights.
Step 1: Generate a synthetic dataset

Step 2: Configure the maximum coalition size and the minimum % of data required for a coalition to be counted, then calculate the Harsanyi Dividends.

Step 3: Analyze the Outcomes

The resulting dataframe is sorted by the Harsanyi Dividend column; therefore, one would most likely see that the first few coalitions are the single players. Given the context in which one would likely use Harsanyi Dividends, the single-player rows are not where the insight lies, although they remain a useful baseline. The real impact comes from analyzing multi-player coalitions. Let’s take a look at a few via the export of the above table.

These are the multi-player coalitions with the largest Harsanyi Dividends; in other words, the players who generate the most synergy together. So, what do we do with this information?
The top multi-player coalition is “deals page” & “SEM”, or, more practically speaking, customers who went to the deals page from an SEM campaign. One recommendation you could provide as a professional is that additional investment would likely be beneficial for these types of campaigns.
What about the next couple of coalitions? They appear to be various combinations of product pages. You could recommend upsell or cross-sell experiences for these products, as conversion rates increase measurably when customers interact with these pages within the same journey.
Conclusion
I could go on and on about the endless opportunities a Harsanyi Dividend-derived analysis could deliver, especially in a high-volume marketing or online retail environment where numerous variables are always at work. To conclude, I want to leave you with a few recommendations when it comes to driving ideas and opportunities via Harsanyi Dividends:
- Find a balance between coalition value and volume: You will undoubtedly encounter situations where you identify valuable coalitions, but focusing on them would affect only a fraction of the business or customers. It is important to find a healthy balance from this perspective.
- Stick to reasonably sized coalitions: Pitching opportunities or ideas for large coalitions could prove costly from several angles. In my e-commerce website example, there may be situations where a valuable coalition spans several pages and perhaps numerous marketing channels. If I tell stakeholders to focus on those combinations, it could require complex investments across various teams and technologies. That said, if it is a large coalition of several similar pages, then any additional investment could be streamlined. Ultimately, a reasonably sized coalition will depend on the business case. As with any data science project, domain knowledge is key here.
- Translate Dividends into measurable impact: Any opportunity or idea pitched to a stakeholder will most likely require an estimate of its financial impact. Therefore, one needs to be able to translate a Harsanyi Dividend into a return on investment. It can be as simple as reverting to the coalition value metric and applying some multiplier if you recommend a project that would increase a coalition’s volume, for example, more campaigns from a specific channel to a particular page, as I mentioned earlier. There will most likely be numerous ways to accomplish this kind of mathematical translation.
I hope you enjoyed this article! I find this area of cooperative game theory a lot of fun! If you want to learn more, be sure to check out the original paper by John Harsanyi entitled A Simplified Bargaining Model for the n-Person Cooperative Game, published in 1963.

