Want to make sure you win the coin toss just a little more often than you should? I certainly do, so I made some unfair coins. We’ll use the beta distribution to see just how unfair they are. While this is just a toy example problem for using the beta distribution, machine learning algorithms rely on this distribution for learning just about everything. Math is an amazing thing that way.
Making the coins
We’ll make our unfair coins by bending them. Our hypothesis is that the concave side will have less area to land on, and so the coin should land on it less often. Let’s get started.
It’s easy to bend the coins with your teeth:
WAIT! That really hurts! Using pliers or wrenches works much better:
I made seven coins this way, each with a different bending angle.
I did 100 flips for each coin, making sure each flip went at least a foot in the air and spun real well. “Umm… only 100 flips?” you ask, “That can’t be enough!” Just you wait until the section on the math.
Here are the raw results:
Now for the math
Coin flipping is a Bernoulli process. This just means that all trials (flips) can have only two outcomes (heads or tails), and each trial is independent of every other trial. What we’re interested in calculating is the expected value of a coin flip for each of our coins. That is, what is the probability it will come up heads? The obvious way to calculate this probability is simply to divide the number of heads by the total number of trials. Unfortunately, this doesn’t give us a good idea about how accurate our estimate is.
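The naive estimate is a one-liner. Here's a sketch with hypothetical counts (the actual tallies for each coin are in the table above):

```python
# Hypothetical tally for one coin -- not from the real data above
heads, flips = 63, 100
p_hat = heads / flips   # naive point estimate of P(heads)
print(p_hat)            # 0.63, but this says nothing about our uncertainty
```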
Enter the beta distribution. This is a distribution over the bias of a Bernoulli process. Intuitively, this means that its CDF(x) gives the probability that the coin's bias—the expected heads rate—is at most x. In other words, we're finding the probability that a probability is what we think it should be. That's a convoluted definition! Some examples should make it clearer.
The beta distribution takes two parameters, α and β. α is the number of heads we have flipped plus one, and β is the number of tails plus one. We’ll talk about why that plus one is there in a bit, but first let’s see what the distribution actually looks like with some example parameters.
In both the above cases, the distribution is centered around 0.5 because α and β are equal—we’ve gotten the same number of heads as we have tails. As these parameters increase, the distribution gets tighter and tighter. This should make sense. The more flips we do, the more confident we can be that the data we’ve collected actually match the characteristics of the coin.
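You can see the tightening numerically. The variance of a Beta(α, β) distribution is αβ / ((α + β)² (α + β + 1)), so a quick sketch shows it shrinking as the parameters grow:

```python
def beta_var(a, b):
    # Variance of Beta(a, b): a*b / ((a + b)^2 * (a + b + 1))
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Same center (0.5) in both cases, but more flips means a tighter distribution
print(beta_var(2, 2))    # 0.05    (1 head, 1 tail)
print(beta_var(11, 11))  # ~0.0109 (10 heads, 10 tails)
```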
When the parameters are not equal to each other—for example, we’ve seen twice as many heads as we have tails—then the distribution is skewed to the left or right accordingly. The peak of the PDF occurs at:

(α − 1) / (α + β − 2) = heads / (heads + tails)
That’s exactly what we said the expectation of the next coin flip should be above. Awesome!
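The agreement is easy to check. With hypothetical counts of 40 heads and 20 tails (twice as many heads as tails, as in the example above):

```python
heads, tails = 40, 20          # hypothetical: twice as many heads as tails
a, b = heads + 1, tails + 1    # beta distribution parameters
mode = (a - 1) / (a + b - 2)   # where the Beta(a, b) PDF peaks
print(mode)                    # 2/3, same as heads / (heads + tails)
```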
So what happens when α and β are one?
We get the flat distribution. Basically, we haven’t flipped the coin at all yet, so we have no data about how our coin is biased, so all biases are equally likely. This is why we must add one to the number of heads and tails we have flipped to get the appropriate α and β.
If α and β are less than one, we get something like this:
Hopefully, this has given you an intuitive sense for what the beta distribution looks like. But for the pedantic, here’s how the beta distribution’s PDF is formally defined:

f(x; α, β) = Γ(α + β) / (Γ(α) Γ(β)) · x^(α−1) (1 − x)^(β−1)
Where Γ is the gamma function—you can think of it as being a generalization of factorials to the real numbers. That is, Γ(n) = (n − 1)!. Excel, many calculators, and any scientific programming package will be able to calculate that for you easily. Most of these applications will even have the beta function already built in.
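In Python, for example, the gamma function is in the standard library, so the PDF above is a few lines:

```python
import math

# gamma generalizes the factorial: gamma(n) == (n - 1)! for integer n
print(math.gamma(5), math.factorial(4))  # both 24

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, straight from the formula above."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x ** (a - 1) * (1 - x) ** (b - 1)

print(beta_pdf(0.5, 2, 2))  # 1.5
```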
Applying the beta distribution to our coins
We’re finally ready to see just how biased our coins actually are!
Amazingly, it takes some pretty big bends to make a biased coin. It is not until coin 3, which has an almost 90-degree bend, that we can say with any confidence that the coin is biased at all. People might notice if you tried to flip that coin to settle a bet!
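"With any confidence" can be made precise by integrating the beta PDF. Here's a sketch, with hypothetical tallies standing in for a sharply bent coin's real counts, that numerically integrates the PDF to get the probability that the bias is below one half:

```python
import math

def prob_bias_below(p, heads, tails, steps=10_000):
    """P(coin bias < p) under Beta(heads+1, tails+1), via the trapezoid rule."""
    a, b = heads + 1, tails + 1
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    pdf = lambda x: coef * x ** (a - 1) * (1 - x) ** (b - 1)
    h = p / steps
    total = 0.5 * (pdf(0.0) + pdf(p))
    total += sum(pdf(i * h) for i in range(1, steps))
    return total * h

# Hypothetical tallies: 38 heads, 62 tails in 100 flips
print(prob_bias_below(0.5, 38, 62))  # well above 0.95: this coin favors tails
```

A coin where this probability hugs 0.5 is indistinguishable from fair; only when it pushes past 0.95 or so can we claim bias with a straight face.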