# Statistics

You are currently browsing the archive for the Statistics category.

## Markov Networks, Monoids, and Futurama

In this post, we’re going to look at how to manipulate multivariate distributions in Haskell’s HLearn library. There are many ways to represent multivariate distributions, but we’ll use a technique called Markov networks.  These networks have the algebraic structure called a monoid (and group and vector space), and training them is a homomorphism.  Despite the scary names, these mathematical structures make working with our distributions really easy and convenient—they give us online and parallel training algorithms “for free.” If you want to go into the details of how, you can check out my TFP13 submission, but in this post we’ll ignore those mathy details to focus on how to use the library in practice.  We’ll use a running example of creating a distribution over characters in the show Futurama.

## The categorical distribution’s algebraic structure

The categorical distribution is the main distribution for handling discrete data. I like to think of it as a histogram.  For example, let’s say Simon has a bag full of marbles.  There are four “categories” of marbles—red, green, blue, and white.  Now, if Simon reaches into the bag and randomly selects a marble, what’s the probability it will be green?  We would use the categorical distribution to find out.

In this article, we’ll go over the math behind the categorical distribution, the algebraic structure of the distribution, and how to manipulate it within Haskell’s HLearn library.  We’ll also see some examples of how this focus on algebra makes HLearn’s interface more powerful than other common statistical packages.  Everything that we’re going to see is in a certain sense very “obvious” to a statistician, but this algebraic framework also makes it convenient.  And since programmers are inherently lazy, this is a Very Good Thing.

Before delving into the “cool stuff,” we have to look at some of the mechanics of the HLearn library.

## Nuclear weapon statistics using monoids, groups, and modules in Haskell

The Bulletin of the Atomic Scientists tracks the nuclear capabilities of every country. We’re going to use their data to demonstrate Haskell’s HLearn library and the usefulness of abstract algebra to statistics. Specifically, we’ll see that the categorical distribution and kernel density estimates have monoid, group, and module algebraic structures.  We’ll explain what this crazy lingo even means, then take advantage of these structures to efficiently answer real-world statistical questions about nuclear war. It’ll be a WOPR!

## Gausian distributions form a monoid

#### (And why machine learning experts should care)

This is the first in a series of posts about the HLearn library for haskell that I’ve been working on for the past few months. The idea of the library is to show that abstract algebra—specifically monoids, groups, and homomorphisms—are useful not just in esoteric functional programming, but also in real world machine learning problems.  In particular, by framing a learning algorithm according to these algebraic properties, we get three things for free: (1) an online version of the algorithm; (2) a parallel version of the algorithm; and (3) a procedure for cross-validation that runs asymptotically faster than the standard version.

We’ll start with the example of a Gaussian distribution. Gaussians are ubiquitous in learning algorithms because they accurately describe most data.  But more importantly, they are easy to work with.  They are fully determined by their mean and variance, and these parameters are easy to calculate.

In this post we’ll start with examples of why the monoid and group properties of Gaussians are useful in practice, then we’ll look at the math underlying these examples, and finally we’ll see that this technique is extremely fast in practice and results in near perfect parallelization.

## How to create an unfair coin and prove it with math

Want to make sure you win the coin toss just a little more often than you should?  I certainly do, so I made some unfair coins.  We’ll use the beta distribution to see just how unfair they are.  While this is just a toy example problem for using the beta distribution, machine learning algorithms rely on this distribution for learning just about everything. Math is an amazing thing that way.