## HLearn’s code is shorter and clearer than Weka’s

Haskell code is expressive.  The HLearn library uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the Weka library takes over 100 lines of Java.  That’s a big difference!  In this post, we’ll look at the actual code and see why the Haskell version is so much more concise.

But first, a disclaimer:  It is really hard to compare two code bases fairly this way.  In both libraries, a lot of supporting code goes into defining each classifier, and it’s not obvious which code to include in the count.  For example, both libraries implement interfaces to a number of probability distributions, and that code is not included in the line counts.  The Haskell code takes more advantage of this abstraction, so this is one language-agnostic reason why it is shorter.  If you think I’m not doing a fair comparison, here are links to the full repositories so you can do it yourself:
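To give a flavor of why the Haskell stays so small, here is a minimal sketch of the key trick.  These are my own toy types, not HLearn’s actual code: once a model forms a monoid, the training function collapses to a one-line fold.

```haskell
import qualified Data.Map.Strict as Map

-- A toy "trained model": per-class counts of each attribute value.
-- (Illustrative types only; HLearn's real definitions differ.)
type Label = String
type Attr  = String
newtype NaiveBayes = NaiveBayes (Map.Map (Label, Attr) Int)
  deriving (Show, Eq)

-- Merging two models just adds their counts.
instance Semigroup NaiveBayes where
  NaiveBayes a <> NaiveBayes b = NaiveBayes (Map.unionWith (+) a b)

instance Monoid NaiveBayes where
  mempty = NaiveBayes Map.empty

-- Training is a fold of singleton models: one line, because all the
-- real work already happened in the Monoid instance.
train :: [(Label, Attr)] -> NaiveBayes
train = foldMap (\(l, a) -> NaiveBayes (Map.singleton (l, a) 1))
```

The Java version has to hand-write the looping, accumulation, and merging logic that `foldMap` and the `Monoid` instance provide here for free.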

## HLearn cross-validates >400x faster than Weka

Weka is one of the most popular tools for data analysis.  But Weka takes 70 minutes to perform leave-one-out cross-validation with a simple naive Bayes classifier on the census income data set, whereas Haskell’s HLearn library takes only 9 seconds.  Weka is 465x slower!

Code and instructions for reproducing these experiments are available on github.
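The speedup comes from the model’s group structure: instead of retraining from scratch on every size n−1 subset, you can train once on the full data set and “subtract” each point.  Here is a toy sketch of that idea with hypothetical types (a trivial counting model, not HLearn’s actual classifier), showing the slow and fast versions computing the same answer.

```haskell
-- Illustrative sketch: when a model forms a group, leave-one-out
-- cross-validation needs one full training pass plus one cheap
-- "subtraction" per point, instead of n full retrainings.
newtype Count = Count Int deriving (Show, Eq)

instance Semigroup Count where
  Count a <> Count b = Count (a + b)

instance Monoid Count where
  mempty = Count 0

-- Group structure: an inverse lets us "untrain" a data point.
inverse :: Count -> Count
inverse (Count a) = Count (negate a)

-- The model of a single data point.
train1 :: a -> Count
train1 _ = Count 1

-- Naive leave-one-out: retrain on every (n-1)-element subset. O(n^2).
looSlow :: [a] -> [Count]
looSlow xs = [ foldMap train1 (take i xs ++ drop (i + 1) xs)
             | i <- [0 .. length xs - 1] ]

-- Group-based leave-one-out: train once, subtract each point. O(n).
looFast :: [a] -> [Count]
looFast xs = [ full <> inverse (train1 x) | x <- xs ]
  where full = foldMap train1 xs
```

A real classifier carries richer sufficient statistics than a single count, but the asymptotic argument is the same, and that gap is where the 465x comes from.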


## Turning an AK-47 into a serving ladle

This is the story of an AK-47 and a dead man named Isaiah.  Because of Isaiah, I forged this AK-47 into a serving ladle.

## Markov Networks, Monoids, and Futurama

In this post, we’re going to look at how to manipulate multivariate distributions in Haskell’s HLearn library.  There are many ways to represent multivariate distributions, but we’ll use a technique called Markov networks.  These networks form an algebraic structure called a monoid (and a group and a vector space), and training them is a homomorphism.  Despite the scary names, these mathematical structures make working with our distributions really easy and convenient—they give us online and parallel training algorithms “for free.”  If you want the details of how this works, you can check out my TFP13 submission, but in this post we’ll ignore the mathy details and focus on how to use the library in practice.  We’ll use a running example of creating a distribution over characters in the show Futurama.
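“Training is a homomorphism” just means that training on concatenated data equals merging the separately trained models.  Here is a toy sketch with a hypothetical frequency type over Futurama character names (not HLearn’s actual Markov network type) that makes the property concrete.

```haskell
import qualified Data.Map.Strict as Map

-- A toy "distribution": how often each character name appears.
-- (Illustrative only; HLearn's Markov networks are far richer.)
newtype Freq = Freq (Map.Map String Int) deriving (Show, Eq)

instance Semigroup Freq where
  Freq a <> Freq b = Freq (Map.unionWith (+) a b)

instance Monoid Freq where
  mempty = Freq Map.empty

train :: [String] -> Freq
train = foldMap (\c -> Freq (Map.singleton c 1))

-- The homomorphism property:
--   train (xs ++ ys) == train xs <> train ys
-- So two machines can each train on half the data and merge results
-- (parallel training), or a running model can absorb one new sample
-- at a time (online training)—both "for free."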

## Why (and how) I’m refusing to pay war taxes

Growing up, I wanted nothing more than to be a Naval officer.  But then Jesus changed my heart.  He’s been teaching me that instead of killing my enemies, I’m supposed to love them.  In fact, I’m supposed to dedicate my life to serving them.  Maybe even die for them.  So after 7 years in the navy, I left as a conscientious objector.  That’s also why I’m not paying my federal taxes this year.

You see, in the United States, roughly half of our tax dollars go to financing war.  (You can find a detailed breakdown here.)  This is ridiculous and unacceptable.  I would gladly pay more taxes to finance roads, schools, or public health care.  But I will no longer pay other people to kill America’s enemies on my behalf.

## The categorical distribution’s algebraic structure

The categorical distribution is the main distribution for handling discrete data. I like to think of it as a histogram.  For example, let’s say Simon has a bag full of marbles.  There are four “categories” of marbles—red, green, blue, and white.  Now, if Simon reaches into the bag and randomly selects a marble, what’s the probability it will be green?  We would use the categorical distribution to find out.

In this article, we’ll go over the math behind the categorical distribution, the algebraic structure of the distribution, and how to manipulate it within Haskell’s HLearn library.  We’ll also see some examples of how this focus on algebra makes HLearn’s interface more powerful than other common statistical packages.  Everything we’re going to see is in a certain sense “obvious” to a statistician, but the algebraic framework also makes it convenient to work with.  And since programmers are inherently lazy, this is a Very Good Thing.
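The histogram intuition can be sketched directly in code.  The names below (`trainCat`, `prob`, `untrain`) are illustrative, not HLearn’s actual API; the point is that the distribution is just counts, and the group inverse lets us remove a marble from Simon’s bag without retraining.

```haskell
import qualified Data.Map.Strict as Map

-- A categorical distribution as a histogram of counts.
newtype Categorical = Categorical (Map.Map String Int)
  deriving (Show, Eq)

trainCat :: [String] -> Categorical
trainCat xs = Categorical (Map.fromListWith (+) [(x, 1) | x <- xs])

-- P(category) = count of category / total count.
prob :: Categorical -> String -> Double
prob (Categorical m) x =
  fromIntegral (Map.findWithDefault 0 x m)
    / fromIntegral (sum (Map.elems m))

-- The group inverse in action: delete one observation.
untrain :: String -> Categorical -> Categorical
untrain x (Categorical m) = Categorical (Map.adjust (subtract 1) x m)
```

With a bag of one red, two green, and one blue marble, `prob` answers Simon’s question directly, and `untrain` updates the answer after a marble is taken out of the bag.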

Before delving into the “cool stuff,” we have to look at some of the mechanics of the HLearn library.

## Nuclear weapon statistics using monoids, groups, and modules in Haskell

The Bulletin of the Atomic Scientists tracks the nuclear capabilities of every country. We’re going to use their data to demonstrate Haskell’s HLearn library and the usefulness of abstract algebra to statistics. Specifically, we’ll see that the categorical distribution and kernel density estimates have monoid, group, and module algebraic structures.  We’ll explain what this crazy lingo even means, then take advantage of these structures to efficiently answer real-world statistical questions about nuclear war. It’ll be a WOPR!
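Before the data, one piece of the “crazy lingo” can be previewed in code: the module structure means a distribution can be multiplied by a scalar, which behaves exactly like repeating the training data that many times.  The types and names below are my own illustrative sketch, not HLearn’s actual API.

```haskell
import qualified Data.Map.Strict as Map

-- A toy histogram with the usual monoid structure.
newtype Hist = Hist (Map.Map String Int) deriving (Show, Eq)

instance Semigroup Hist where
  Hist a <> Hist b = Hist (Map.unionWith (+) a b)

instance Monoid Hist where
  mempty = Hist Map.empty

trainH :: [String] -> Hist
trainH xs = Hist (Map.fromListWith (+) [(x, 1) | x <- xs])

-- The module's scalar action: weight every count by k.
-- Law: scale k (trainH xs) == trainH (concat (replicate k xs))
scale :: Int -> Hist -> Hist
scale k (Hist m) = Hist (Map.map (* k) m)
```

This is what makes weighted data cheap: rather than duplicating a country’s warhead records k times, we scale its trained distribution by k and merge it in with `<>`.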

## My 2012 Experiments in Christianity

We don’t know what God wants, and we wouldn’t know how to do it even if we did.  Therefore (as Gandhi put it) we must “experiment with truth.”  We must discover truth for ourselves, and learn how to achieve it.

These are my experiments from 2012.  I didn’t try these experiments because they are somehow the “most Christ-like” thing to do.  I tried them because I don’t know what the most Christ-like thing is, but I want to learn.  I want to train myself to do it at all times.  Some of these experiments succeeded and some failed.  But all of them made me a better Christian.

## Gaussian distributions form a monoid

#### (And why machine learning experts should care)

This is the first in a series of posts about the HLearn library for Haskell that I’ve been working on for the past few months. The idea of the library is to show that abstract algebra—specifically monoids, groups, and homomorphisms—is useful not just in esoteric functional programming, but also in real world machine learning problems.  In particular, by framing a learning algorithm according to these algebraic properties, we get three things for free: (1) an online version of the algorithm; (2) a parallel version of the algorithm; and (3) a procedure for cross-validation that runs asymptotically faster than the standard version.

We’ll start with the example of a Gaussian distribution. Gaussians are ubiquitous in learning algorithms because they describe many real-world data sets well.  But more importantly, they are easy to work with.  They are fully determined by their mean and variance, and these parameters are easy to calculate.

In this post we’ll start with examples of why the monoid and group properties of Gaussians are useful in practice, then we’ll look at the math underlying these examples, and finally we’ll see that this technique is extremely fast in practice and results in near perfect parallelization.
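As a preview of why the monoid structure exists at all, here is a sketch with hypothetical names (not HLearn’s actual types): a Gaussian is fully determined by the sufficient statistics n, sum of x, and sum of x², and those add componentwise, so models trained on separate chunks of data merge exactly.

```haskell
-- A Gaussian represented by its sufficient statistics.
data Gaussian = Gaussian
  { n  :: !Int     -- number of samples
  , s1 :: !Double  -- sum of samples
  , s2 :: !Double  -- sum of squared samples
  } deriving (Show, Eq)

-- Merging two trained Gaussians just adds the statistics, which is
-- why chunks of data can be trained in parallel and combined.
instance Semigroup Gaussian where
  Gaussian na a1 a2 <> Gaussian nb b1 b2 =
    Gaussian (na + nb) (a1 + b1) (a2 + b2)

instance Monoid Gaussian where
  mempty = Gaussian 0 0 0

trainG :: [Double] -> Gaussian
trainG = foldMap (\x -> Gaussian 1 x (x * x))

mean :: Gaussian -> Double
mean g = s1 g / fromIntegral (n g)

variance :: Gaussian -> Double
variance g = s2 g / fromIntegral (n g) - mean g ^ 2
```

(The one-pass `variance` formula here is the textbook version and can be numerically unstable on real data; a production implementation would use a more careful update, but the algebra is the same.)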