
    DIY AI & ML: Solving The Multi-Armed Bandit Problem with Thompson Sampling

    By Editor Times Featured · April 21, 2026 · 19 min read


    Introduction

    Data-driven decision-making is everywhere. Not only do most organizations keep huge databases of data, they also have numerous teams that rely on this data to inform their decision-making. From clickstream traffic to wearable edge devices, telemetry, and much more, the speed and scale of data-driven decision-making are growing exponentially, driving the popularity of integrating machine learning and AI frameworks.

    Speaking of data-driven decision-making frameworks, one of the most reliable and time-tested approaches is A/B testing. A/B testing is especially popular among websites, digital products, and similar storefronts where customer feedback in the form of clicks, orders, and so on, arrives almost instantly and at scale. What makes A/B testing such a powerful decision framework is the ability to control for various variables, so that a stakeholder can see the effect the element they are introducing in the test has on a key performance indicator (KPI).
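To make that concrete, a classic A/B readout on open rates boils down to a two-proportion z-test. The sketch below is my own minimal illustration (the function name and the counts are made up for the example), not code from this article:

```python
import numpy as np
from scipy.stats import norm

def two_proportion_ztest(opens_a, sends_a, opens_b, sends_b):
    """Two-sided z-test for a difference in open rates between variants A and B."""
    p_a, p_b = opens_a / sends_a, opens_b / sends_b
    p_pool = (opens_a + opens_b) / (sends_a + sends_b)   # pooled open rate
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_b - p_a) / se
    return z, 2 * norm.sf(abs(z))                        # two-sided p-value

# Hypothetical campaign: variant B opens 24% of 5,000 sends vs. 20% for A
z, p = two_proportion_ztest(1000, 5000, 1200, 5000)
```

Even with a decisive p-value, the test only pays off after someone reads it, socializes it, and ships the winner, which is exactly the lag discussed next.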

    Like all things, A/B testing has drawbacks, notably the time it can take. Following the conclusion of a test, someone must communicate the results, and stakeholders must use the appropriate channels to reach a decision and implement it. All that lost time can translate into an opportunity cost, assuming the test experience demonstrated an impact. What if there were a framework or an algorithm that could systematically automate this process? That is where Thompson Sampling comes into play.

    The Multi-Armed Bandit Problem

    Imagine you go to the casino for the first time and, standing before you, are three slot machines: Machine A, Machine B, and Machine C. You have no idea which machine has the best payout; however, you come up with a clever idea. For the first few pulls, assuming you don't run out of luck, you pull the slot machine arms at random. After each pull, you record the outcome. After a few iterations, you look at your results, including the win rate for each machine:

    • Machine A: 40%
    • Machine B: 30%
    • Machine C: 50%

    At this point, you decide to pull Machine C at a slightly higher rate than the other two, as you believe there is more evidence that Machine C has the best win rate, yet you want to gather more data to be sure. After the next few iterations, you look at the new results:

    • Machine A: 45%
    • Machine B: 25%
    • Machine C: 60%

    Now, you have much more confidence that Machine C has the best win rate. This hypothetical example is what gave the Multi-Armed Bandit Problem its name and is a classic example of how Thompson Sampling is applied.

    This Bayesian algorithm is designed to choose between multiple options with unknown reward distributions and maximize the expected reward. It accomplishes this through the exploration-exploitation tradeoff. Because the reward distributions are unknown, the algorithm initially chooses options at random, collects data on the outcomes, and, over time, progressively favors the options that yield a higher average reward.
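The casino strategy above condenses into a few lines. This is a minimal sketch of my own (the seed, pull count, and win rates are illustrative), not the article's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
true_win_rates = [0.40, 0.30, 0.50]   # hidden from the player
alpha = np.ones(3)                    # 1 + successes per machine
beta = np.ones(3)                     # 1 + failures per machine

for _ in range(5000):
    samples = rng.beta(alpha, beta)   # one posterior draw per machine
    arm = int(np.argmax(samples))     # play the most promising machine
    reward = rng.binomial(1, true_win_rates[arm])
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = alpha + beta - 2              # total pulls per machine
# Machine C (index 2) should dominate the pull counts after enough rounds
```

After enough pulls, Machine C absorbs most of the play while A and B are sampled just often enough to keep their estimates honest.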

    In this article, I will walk you through how to build your own Thompson Sampling algorithm object in Python and apply it to a hypothetical yet realistic example.

    Email Headlines – Optimizing the Open Rate

    Image by Mariia Shalabaieva on Unsplash. Free to use under the Unsplash License.

    In this example, assume the role of someone on a marketing team in charge of email campaigns. In the past, the team tested which headlines led to higher email open rates using an A/B testing framework. However, this time, you propose implementing a multi-armed bandit approach to start realizing value faster.

    To demonstrate the effectiveness of a Thompson Sampling (also referred to as the bandit) approach, I will build a Python simulation that compares it to a random approach. Let's get started.

    Step 1 – Base Email Simulation

    This will be the main object for this project; it serves as a base template for both the random and bandit simulations. The initialization function stores some basic information needed to execute the email simulation, namely the headline of each email and the true open rates. One item I want to stress is the true open rates: they are "unknown" to the actual simulation and are treated as probabilities when an email is sent. A random number generator object is also created so that a simulation can be replicated, which can be useful. Finally, we have a built-in function, reset_results(), that I will discuss next.

    import numpy as np
    import pandas as pd
    
    class BaseEmailSimulation:
        """
        Base class for email headline simulations.
    
        Shared responsibilities:
        - store headlines and their true open probabilities
        - simulate a binary email-open outcome
        - reset simulation state
        - build a summary table from the latest run
        """
    
        def __init__(self, headlines, true_probabilities, random_state=None):
            self.headlines = list(headlines)
            self.true_probabilities = np.array(true_probabilities, dtype=float)
    
            if len(self.headlines) == 0:
                raise ValueError("At least one headline must be provided.")
    
            if len(self.headlines) != len(self.true_probabilities):
                raise ValueError("headlines and true_probabilities must have the same length.")
    
            if np.any(self.true_probabilities < 0) or np.any(self.true_probabilities > 1):
                raise ValueError("All true_probabilities must be between 0 and 1.")
    
            self.n_arms = len(self.headlines)
            self.rng = np.random.default_rng(random_state)
    
            # Ground-truth best arm info for evaluation
            self.best_arm_index = int(np.argmax(self.true_probabilities))
            self.best_headline = self.headlines[self.best_arm_index]
            self.best_true_probability = float(self.true_probabilities[self.best_arm_index])
    
            # Results from the latest completed simulation
            self.reset_results()

    reset_results()

    For each simulation, it is helpful to record many details, including:
    
    • Which headline was selected at each step
    • Whether or not the email sent resulted in an open
    • Overall opens and the open rate
    
    The attributes are not explicitly defined in this function; they will be defined later. Instead, this function resets them, allowing a fresh history for each simulation run. This is especially important for the bandit subclass, which I will show you later in the article.

    def reset_results(self):
        """
        Clear all results from the latest simulation.
        Called automatically at initialization and at the start of each run().
        """
        self.reward_history = []
        self.selection_history = []
        self.history = pd.DataFrame()
        self.summary_table = pd.DataFrame()
        self.total_opens = 0
        self.cumulative_opens = []

    send_email()

    The next function to feature is how the email sends are executed. Given an arm index (headline index), the function draws exactly one value from a binomial distribution with the true probability for that headline and exactly one independent trial. This is a sensible approach, since sending an email has exactly two outcomes: it is opened or ignored. Opened and ignored can be represented by 1 and 0, respectively, and the binomial function from numpy does just that, with the chance of returning 1 equal to the true probability of the respective email headline.

    def send_email(self, arm_index):
        """
        Simulate sending an email with the chosen headline.
    
        Returns
        -------
        int
            1 if opened, 0 otherwise.
        """
        if arm_index < 0 or arm_index >= self.n_arms:
            raise IndexError("arm_index is out of bounds.")
    
        true_p = self.true_probabilities[arm_index]
        reward = self.rng.binomial(n=1, p=true_p)
    
        return int(reward)

    _finalize_history() & build_summary_table()

    Finally, these two functions work in conjunction by taking the results of a simulation and building a clean summary table that shows metrics such as the number of times a headline was selected, the opens, the true open rate, and the realized open rate.

    def _finalize_history(self, records):
        """
        Convert round-level records into a DataFrame and populate
        shared result attributes.
        """
        self.history = pd.DataFrame(records)
    
        if not self.history.empty:
            self.reward_history = self.history["reward"].tolist()
            self.selection_history = self.history["arm_index"].tolist()
            self.total_opens = int(self.history["reward"].sum())
            self.cumulative_opens = self.history["reward"].cumsum().tolist()
        else:
            self.reward_history = []
            self.selection_history = []
            self.total_opens = 0
            self.cumulative_opens = []
    
        self.summary_table = self.build_summary_table()
    
    def build_summary_table(self):
        """
        Build a summary table from the latest completed simulation.
    
        Returns
        -------
        pd.DataFrame
            Summary by headline.
        """
        if self.history.empty:
            return pd.DataFrame(columns=[
                "arm_index",
                "headline",
                "selections",
                "opens",
                "realized_open_rate",
                "true_open_rate"
            ])
    
        summary = (
            self.history
            .groupby(["arm_index", "headline"], as_index=False)
            .agg(
                selections=("reward", "size"),
                opens=("reward", "sum"),
                realized_open_rate=("reward", "mean"),
                true_open_rate=("true_open_rate", "first")
            )
            .sort_values("arm_index")
            .reset_index(drop=True)
        )
    
        return summary

    Step 2 – Subclass: Random Email Simulation

    In order to properly gauge the impact of a multi-armed bandit approach for email headlines, we need to compare it against a benchmark, in this case a randomized approach, which also mirrors how an A/B test is executed.

    select_headline()

    This is the core of the Random Email Simulation class: select_headline() chooses an integer between 0 and the number of headlines (or arms) at random.

    def select_headline(self):
        """
        Select one headline uniformly at random.
        """
        return int(self.rng.integers(low=0, high=self.n_arms))

    run()

    This is how the simulation is executed. All that is needed is the number of iterations from the end user. It leverages the select_headline() function in tandem with the send_email() function from the parent class. At each round, an email is sent, and the results are recorded.

    def run(self, num_iterations):
        """
        Run a fresh random simulation from scratch.
    
        Parameters
        ----------
        num_iterations : int
            Number of simulated email sends.
        """
        if num_iterations <= 0:
            raise ValueError("num_iterations must be greater than 0.")
    
        self.reset_results()
        records = []
        cumulative_opens = 0
    
        for round_number in range(1, num_iterations + 1):
            arm_index = self.select_headline()
            reward = self.send_email(arm_index)
            cumulative_opens += reward
    
            records.append({
                "round": round_number,
                "arm_index": arm_index,
                "headline": self.headlines[arm_index],
                "reward": reward,
                "true_open_rate": self.true_probabilities[arm_index],
                "cumulative_opens": cumulative_opens
            })
    
        self._finalize_history(records)

    Thompson Sampling & Beta Distributions

    Before diving into our bandit subclass, it is important to cover the mathematics behind Thompson Sampling in more detail. I will do so through this article's hypothetical email example.

    Let's first consider what we know so far about our current scenario. There is a set of email headlines, and we know each has an associated open rate. We need a framework to decide which email headline to send to a customer. Before going further, let's define some variables:
    
    • Headlines:
      • 1: "Your Exclusive Spring Offer Is Here"
      • 2: "48 Hours Only: Save 25%"
      • 3: "Don't Miss Your Member Discount"
      • 4: "Ending Tonight: Final Chance to Save"
      • 5: "A Little Something Just for You"
    • A_i = Headline (arm) at index i
    • t_i = Time, or the current number of the iteration (email send) to be performed
    • r_i = The reward observed at time t_i; the outcome can be open or ignored

    We have yet to send the first email. Which headline should we select? This is where the Beta Distribution comes into play. A Beta Distribution is a continuous probability distribution defined on the interval (0, 1). It has two key parameters, alpha and beta, representing successes and failures, respectively. At time t = 1, all headlines start with alpha = 1 and beta = 1. Email opens add 1 to alpha; otherwise, beta is incremented by 1.

    At first glance, you might think the algorithm assumes a 50% true open rate at the start. This is not necessarily the case, and that assumption would completely neglect the whole point of the Thompson Sampling approach: the exploration-exploitation tradeoff. The alpha and beta parameters are used to build a Beta Distribution for each individual headline. Prior to the first iteration, these distributions will look something like this:

    Image provided by the author

    I promise there is more to it than just a horizontal line. The x-axis represents probabilities from 0 to 1. The y-axis represents the density for each probability, or the area under the curve. Using this distribution, we sample a random value for each headline, then send the email whose headline produced the highest value. In this first iteration, the selection framework is purely random. Why? Every value has the same area under the curve. But what about after a few more iterations? Remember, each reward is added either to alpha or to beta in the respective Beta distribution. Let's see what the distribution looks like with alpha = 10 and beta = 10.

    Image provided by the author

    There certainly is a difference, but what does that mean in the context of our problem? First of all, if alpha and beta are both equal to 10, it means we selected that headline 18 times and observed 9 successes (email opens) and 9 failures (email ignored). Thus, the realized open rate for this headline is 0.5, or 50%. Remember, we always start with alpha and beta equal to 1. If we randomly sample a value from this distribution, what do you think it will be? Most likely something close to 0.5, but it is not guaranteed. Let's look at one more example and set alpha and beta equal to 100.

    Image provided by the author

    Now there is a much higher likelihood that a randomly sampled value will be somewhere around 0.5. This progression demonstrates how Thompson Sampling seamlessly moves from exploration to exploitation. Let's see how we can build an object that executes this framework.
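That narrowing is easy to verify numerically. Here is a small numpy check of my own (not from the article), matching the three plots above:

```python
import numpy as np

rng = np.random.default_rng(42)

# As alpha and beta grow, Beta(a, a) stays centered at 0.5 but its spread
# shrinks, so posterior samples cluster ever more tightly around the
# realized open rate.
spreads = []
for a in (1, 10, 100):
    draws = rng.beta(a, a, size=100_000)
    spreads.append(draws.std())
    print(f"Beta({a}, {a}): mean ~ {draws.mean():.3f}, std ~ {draws.std():.3f}")
```

The standard deviation drops from roughly 0.29 at Beta(1, 1) toward 0.035 at Beta(100, 100): the exploration-to-exploitation handoff, in numbers.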

    Step 3 – Subclass: Bandit Email Simulation

    Let's look at some key attributes, starting with alpha_prior and beta_prior. They are set to 1 every time a BanditSimulation() object is initialized. "Prior" is a key term in this context. At each iteration, our decision about which headline to send depends on a probability distribution known as the posterior. Next, this object inherits a few select attributes from the BaseEmailSimulation parent class. Finally, a custom function called reset_bandit_state() is called. Let's discuss that function next.

    class BanditSimulation(BaseEmailSimulation):
        """
        Thompson Sampling email headline simulation.
    
        Each headline is modeled with a Beta posterior over its
        unknown open probability. At each iteration, one sample is drawn
        from each posterior, and the headline with the largest sample is selected.
        """
    
        def __init__(
            self,
            headlines,
            true_probabilities,
            alpha_prior=1.0,
            beta_prior=1.0,
            random_state=None
        ):
            super().__init__(
                headlines=headlines,
                true_probabilities=true_probabilities,
                random_state=random_state
            )
    
            if alpha_prior <= 0 or beta_prior <= 0:
                raise ValueError("alpha_prior and beta_prior must be positive.")
    
            self.alpha_prior = float(alpha_prior)
            self.beta_prior = float(beta_prior)
    
            self.reset_bandit_state()
    

    reset_bandit_state()

    The objects I have built for this article are meant to run in a simulation; therefore, we need to include failsafes to prevent data leakage between simulations. The reset_bandit_state() function accomplishes this by resetting the posterior for each headline whenever it is run or when a new bandit class is initialized. Otherwise, we risk running a simulation as if the data had already been gathered, which defeats the whole purpose of a Thompson Sampling approach.

    def reset_bandit_state(self):
        """
        Reset posterior state for a fresh Thompson Sampling run.
        """
        self.alpha = np.full(self.n_arms, self.alpha_prior, dtype=float)
        self.beta = np.full(self.n_arms, self.beta_prior, dtype=float)
    

    Selection & Reward Functions

    Starting with posterior_means(), we can use this function to return the realized open rate for any given headline. The next function, select_headline(), samples a random value from each headline's posterior and returns the index of the largest value. Finally, we have update_posterior(), which increments alpha or beta for a particular headline based on the reward.

    def posterior_means(self):
        """
        Return the posterior mean for each headline.
        """
        return self.alpha / (self.alpha + self.beta)
    
    def select_headline(self):
        """
        Draw one sample from each arm's Beta posterior and
        select the headline with the highest sampled value.
        """
        sampled_values = self.rng.beta(self.alpha, self.beta)
        return int(np.argmax(sampled_values))
    
    def update_posterior(self, arm_index, reward):
        """
        Update the selected arm's Beta posterior using the observed reward.
        """
        if arm_index < 0 or arm_index >= self.n_arms:
            raise IndexError("arm_index is out of bounds.")
    
        if reward not in (0, 1):
            raise ValueError("reward must be either 0 or 1.")
    
        self.alpha[arm_index] += reward
        self.beta[arm_index] += (1 - reward)

    run() and build_summary_table()

    Everything is in place to execute a Thompson Sampling-driven simulation. Note that we call reset_results() and reset_bandit_state() to ensure a fresh run, so as not to rely on previous information. At the end of each simulation, results are aggregated and summarized through the custom build_summary_table() function.

    def run(self, num_iterations):
        """
        Run a fresh Thompson Sampling simulation from scratch.
    
        Parameters
        ----------
        num_iterations : int
            Number of simulated email sends.
        """
        if num_iterations <= 0:
            raise ValueError("num_iterations must be greater than 0.")
    
        self.reset_results()
        self.reset_bandit_state()
    
        records = []
        cumulative_opens = 0
    
        for round_number in range(1, num_iterations + 1):
            arm_index = self.select_headline()
            reward = self.send_email(arm_index)
            self.update_posterior(arm_index, reward)
    
            cumulative_opens += reward
    
            records.append({
                "round": round_number,
                "arm_index": arm_index,
                "headline": self.headlines[arm_index],
                "reward": reward,
                "true_open_rate": self.true_probabilities[arm_index],
                "cumulative_opens": cumulative_opens,
                "posterior_mean": self.posterior_means()[arm_index],
                "alpha": self.alpha[arm_index],
                "beta": self.beta[arm_index]
            })
    
        self._finalize_history(records)
    
        # Rebuild summary table with additional posterior columns
        self.summary_table = self.build_summary_table()
    
    def build_summary_table(self):
        """
        Build a summary table for the latest Thompson Sampling run.
        """
        if self.history.empty:
            return pd.DataFrame(columns=[
                "arm_index",
                "headline",
                "selections",
                "opens",
                "realized_open_rate",
                "true_open_rate",
                "final_posterior_mean",
                "final_alpha",
                "final_beta"
            ])
    
        summary = (
            self.history
            .groupby(["arm_index", "headline"], as_index=False)
            .agg(
                selections=("reward", "size"),
                opens=("reward", "sum"),
                realized_open_rate=("reward", "mean"),
                true_open_rate=("true_open_rate", "first")
            )
            .sort_values("arm_index")
            .reset_index(drop=True)
        )
    
        summary["final_posterior_mean"] = self.posterior_means()
        summary["final_alpha"] = self.alpha
        summary["final_beta"] = self.beta
    
        return summary
    

    Running the Simulation

    Image by Markus Spiske on Unsplash. Free to use under the Unsplash License.

    One final step before running the simulation: take a look at a custom function I built specifically for this step. This function runs multiple simulations given a list of iteration counts. It also outputs a detailed summary directly comparing the random and bandit approaches, specifically showing key metrics such as the additional email opens from the bandit, the overall open rates, and the lift of the bandit open rate over the random open rate.

    def run_comparison_experiment(
        headlines,
        true_probabilities,
        iteration_list=(100, 1000, 10000, 100000, 1000000),
        random_seed=42,
        bandit_seed=123,
        alpha_prior=1.0,
        beta_prior=1.0
    ):
        """
        Run RandomSimulation and BanditSimulation side by side across
        multiple iteration counts.
    
        Returns
        -------
        comparison_df : pd.DataFrame
            High-level comparison table across iteration counts.
    
        detailed_results : dict
            Nested dictionary containing simulation objects and summary tables
            for each iteration count.
        """
    
        comparison_rows = []
        detailed_results = {}
    
        for n in iteration_list:
            # Fresh objects for each simulation size
            random_sim = RandomSimulation(
                headlines=headlines,
                true_probabilities=true_probabilities,
                random_state=random_seed
            )
    
            bandit_sim = BanditSimulation(
                headlines=headlines,
                true_probabilities=true_probabilities,
                alpha_prior=alpha_prior,
                beta_prior=beta_prior,
                random_state=bandit_seed
            )
    
            # Run both simulations
            random_sim.run(num_iterations=n)
            bandit_sim.run(num_iterations=n)
    
            # Core metrics
            random_opens = random_sim.total_opens
            bandit_opens = bandit_sim.total_opens
    
            random_open_rate = random_opens / n
            bandit_open_rate = bandit_opens / n
    
            additional_opens = bandit_opens - random_opens
    
            opens_lift_pct = (
                ((bandit_opens - random_opens) / random_opens) * 100
                if random_opens != 0 else np.nan
            )
    
            open_rate_lift_pct = (
                ((bandit_open_rate - random_open_rate) / random_open_rate) * 100
                if random_open_rate != 0 else np.nan
            )
    
            comparison_rows.append({
                "iterations": n,
                "random_opens": random_opens,
                "bandit_opens": bandit_opens,
                "additional_opens_from_bandit": additional_opens,
                "opens_lift_pct": opens_lift_pct,
                "random_open_rate": random_open_rate,
                "bandit_open_rate": bandit_open_rate,
                "open_rate_lift_pct": open_rate_lift_pct
            })
    
            detailed_results[n] = {
                "random_sim": random_sim,
                "bandit_sim": bandit_sim,
                "random_summary_table": random_sim.summary_table.copy(),
                "bandit_summary_table": bandit_sim.summary_table.copy()
            }
    
        comparison_df = pd.DataFrame(comparison_rows)
    
        # Optional formatting helpers
        comparison_df["random_open_rate"] = comparison_df["random_open_rate"].round(4)
        comparison_df["bandit_open_rate"] = comparison_df["bandit_open_rate"].round(4)
        comparison_df["opens_lift_pct"] = comparison_df["opens_lift_pct"].round(2)
        comparison_df["open_rate_lift_pct"] = comparison_df["open_rate_lift_pct"].round(2)
    
        return comparison_df, detailed_results

    Reviewing the Results

    Here is the code for running both simulations and the comparison, along with a set of email headlines and the corresponding true open rates. Let's see how the bandit performed!

    headlines = [
        "48 Hours Only: Save 25%",
        "Your Exclusive Spring Offer Is Here",
        "Don’t Miss Your Member Discount",
        "Ending Tonight: Final Chance to Save",
        "A Little Something Just for You"
    ]
    
    true_open_rates = [0.18, 0.21, 0.16, 0.24, 0.20]
    
    comparison_df, detailed_results = run_comparison_experiment(
        headlines=headlines,
        true_probabilities=true_open_rates,
        iteration_list=(100, 1000, 10000, 100000, 1000000),
        random_seed=42,
        bandit_seed=123
    )
    
    display_df = comparison_df.copy()
    display_df["random_open_rate"] = (display_df["random_open_rate"] * 100).round(2).astype(str) + "%"
    display_df["bandit_open_rate"] = (display_df["bandit_open_rate"] * 100).round(2).astype(str) + "%"
    display_df["opens_lift_pct"] = display_df["opens_lift_pct"].round(2).astype(str) + "%"
    display_df["open_rate_lift_pct"] = display_df["open_rate_lift_pct"].round(2).astype(str) + "%"
    
    display_df
    Image provided by the author

    At 100 iterations, there is no real difference between the two approaches. At 1,000, it is a similar outcome, except the bandit approach lags this time. Now look at what happens in the final three runs with 10,000 iterations or more: the bandit approach consistently outperforms by roughly 20%! That number may not seem like much; however, consider a large enterprise that sends millions of emails in a single campaign. That 20% could deliver millions of dollars in incremental revenue.
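That plateau is not a coincidence. A uniform random sender averages the five arms, while a converged bandit sends almost exclusively the best arm, so the ceiling on the lift follows directly from the true open rates defined above. This is my own back-of-envelope check, not code from the article:

```python
import numpy as np

true_open_rates = np.array([0.18, 0.21, 0.16, 0.24, 0.20])

random_rate = true_open_rates.mean()   # uniform selection averages all arms
best_rate = true_open_rates.max()      # a converged bandit plays the best arm
max_lift_pct = (best_rate / random_rate - 1) * 100
# random_rate = 0.198, best_rate = 0.24, max_lift_pct ≈ 21.2%
```

The observed ~20% lift at large iteration counts sits just under this 21.2% theoretical ceiling, exactly what you would expect once the posterior has locked onto "Ending Tonight: Final Chance to Save".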

    My Final Thoughts

    The Thompson Sampling approach can certainly be a powerful tool in the digital world, particularly as an online A/B testing alternative for campaigns and recommendations. That being said, it has the potential to work much better in some scenarios than in others. To conclude, here is a quick checklist you can use to determine whether a Thompson Sampling approach could prove useful:

    1. A single, clear KPI
      • The approach depends on a single outcome for rewards; therefore, whatever the underlying activity, its success metric must have a clear, single outcome to be considered successful.
    2. A near-instantaneous reward mechanism
      • The reward mechanism needs to respond somewhere between near-instantaneously and within a matter of minutes once the activity reaches the customer or user. This allows the algorithm to receive feedback quickly, thereby optimizing faster.
    3. Bandwidth or budget for numerous iterations
      • There is no magic number for how many email sends, page views, impressions, and so on, one must reach for an effective Thompson Sampling exercise; however, if you refer back to the simulation results, the bigger the better.
    4. Multiple & distinct arms
      • Whatever the experience, the variations (the arms, per the bandit metaphor), such as the email headlines, must be distinct or have high variability to ensure you are maximizing the exploration space. For example, if you are testing the color of a landing page, instead of testing different shades of a single color, consider testing completely different colors.

    I hope you enjoyed my introduction to and simulation of Thompson Sampling and the Multi-Armed Bandit problem! If you can find a suitable outlet for it, you may find it extremely useful.



