June 2013

You are currently browsing the monthly archive for June 2013.


Haskell code is expressive.  The HLearn library uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the Weka library uses over 100 lines of Java.  That’s a big difference!  In this post, we’ll look at the actual code and see why the Haskell is so much more concise.

But first, a disclaimer:  It is really hard to fairly compare two code bases this way.  In both libraries, there is a lot of supporting code that goes into defining each classifier, and it’s not obvious what code to include and not include.  For example, both libraries implement interfaces to a number of probability distributions, and this code is not contained in the source count.  The Haskell code takes more advantage of this abstraction, so this is one language-agnostic reason why the Haskell code is shorter.  If you think I’m not doing a fair comparison, here’s some links to the full repositories so you can do it yourself:

Read the rest of this entry »

weka-lambda-haskellWeka is one of the most popular tools for data analysis.  But Weka takes 70 minutes to perform leave-one-out cross-validate using a simple naive bayes classifier on the census income data set, whereas Haskell’s HLearn library only takes 9 seconds.  Weka is 465x slower!

Code and instructions for reproducing these experiments are available on github.

Read the rest of this entry »