Functors and monads are powerful design patterns used in Haskell. They give us two cool tricks for analyzing data. First, **we can “preprocess” data after we’ve already trained a model**. The model will be automatically updated to reflect the changes. Second, this whole process happens **asymptotically faster** than the standard method of preprocessing. In some cases, you can do it in constant time no matter how many data points you have!

This post focuses on how to use functors and monads in practice with the HLearn library. We won’t talk about their category-theoretic foundations; instead, we’ll go through **ten concrete examples** involving the categorical distribution. This distribution is somewhat awkwardly named for our purposes because it has nothing to do with category theory; it is simply the most general distribution over non-numeric (i.e. categorical) data. Its simplicity should make the examples a little easier to follow. Some more complicated models (e.g. the kernel density estimator and Bayesian classifier) also have functor and monad instances, but we’ll save those for another post.
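To give a rough feel for the first trick before the real examples, here is a minimal sketch in plain Haskell. It does not use HLearn's API: the `Categorical` type and the `train`, `mapLabels`, and `count` functions are hypothetical names made up for illustration, and the model is assumed to be nothing more than a table of label counts.

```haskell
import qualified Data.Map.Strict as Map

-- | A toy categorical distribution: each label paired with the number of
-- times it was seen. (Hypothetical type; not HLearn's representation.)
newtype Categorical label = Categorical (Map.Map label Int)
  deriving (Show, Eq)

-- | "Train" by counting labels. This pass touches every data point.
train :: Ord label => [label] -> Categorical label
train xs = Categorical (Map.fromListWith (+) [(x, 1) | x <- xs])

-- | Functor-style map over the labels of an already trained model.
-- Labels that collide after applying f have their counts added together.
-- The work depends on the number of distinct labels, not on the number
-- of data points that were used for training.
mapLabels :: Ord b => (a -> b) -> Categorical a -> Categorical b
mapLabels f (Categorical m) = Categorical (Map.mapKeysWith (+) f m)

-- | How many times a label was observed (0 if never seen).
count :: Ord label => label -> Categorical label -> Int
count x (Categorical m) = Map.findWithDefault 0 x m
```

In this sketch, `mapLabels f (train xs)` gives the same model as `train (map f xs)`, but it only walks the table of distinct labels rather than the original data set. That is the sense in which preprocessing after training can be cheaper than preprocessing before it.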
