EDIT: WordPress seems to garble the code sections on occasion for no good reason. If you want to run the code, you should download the original file instead. Sorry.
This is a tutorial for how to use Hidden Markov Models (HMMs) in Haskell. We will use the Data.HMM package to find genes in the second chromosome of Vitis vinifera: the wine grape vine. Predicting gene locations is a common task in bioinformatics that HMMs have proven good at.
The basic procedure has three steps. First, we create an HMM to model the chromosome. We do this by running the Baum-Welch training algorithm on all the DNA. Second, we create an HMM to model transcription factor binding sites. This is where genes are located. Finally, we use Viterbi’s algorithm to determine which HMM best models the DNA at a given location in the chromosome. If it’s the first, this is probably not the start of a gene. If it’s the second, then we’ve found a gene!