The categorical distribution is the workhorse distribution for discrete data, that is, data where each observation falls into one of a finite set of categories. I like to think of it as a **histogram**. For example, let’s say Simon has a bag full of marbles. There are four “categories” of marbles: red, green, blue, and white. Now, if Simon reaches into the bag and randomly selects a marble, what’s the probability it will be green? We would use the categorical distribution to find out.
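
To make the histogram picture concrete, here is a minimal sketch in plain Haskell. It does not use HLearn's API; it relies only on `Data.Map` from the `containers` package. The names `Histogram`, `fromSamples`, and `probability`, and the contents of Simon's bag, are made up for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- The four categories of marble from the example.
data Marble = Red | Green | Blue | White
  deriving (Eq, Ord, Show)

-- A histogram: how many times each category has been observed.
type Histogram a = Map.Map a Int

-- Build a histogram by counting every observation.
fromSamples :: Ord a => [a] -> Histogram a
fromSamples = Map.fromListWith (+) . map (\x -> (x, 1))

-- Probability of a category: its count divided by the total count.
-- An empty histogram gives 0 rather than dividing by zero.
probability :: Ord a => Histogram a -> a -> Double
probability hist x
  | total == 0 = 0
  | otherwise  = fromIntegral (Map.findWithDefault 0 x hist) / fromIntegral total
  where
    total = sum (Map.elems hist)

-- Hypothetical contents of Simon's bag: 3 red, 2 green, 4 blue, 1 white.
bag :: [Marble]
bag = replicate 3 Red ++ replicate 2 Green ++ replicate 4 Blue ++ [White]
```

With this made-up bag of ten marbles, `probability (fromSamples bag) Green` evaluates to `0.2`, since two of the ten marbles are green.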

In this article, we’ll go over the math behind the categorical distribution, the algebraic structure of the distribution, and how to manipulate it within Haskell’s HLearn library. We’ll also see some examples of how this focus on algebra makes HLearn’s interface more powerful than those of other common statistical packages. Everything that we’re going to see is in a certain sense very “obvious” to a statistician, but the algebraic framework also makes working with the distribution **convenient**. And since programmers are inherently lazy, this is a Very Good Thing.

Before delving into the “cool stuff,” we have to look at some of the mechanics of the HLearn library.