I occasionally have skype calls with computer programmers in North Korea, and one of the things we talk about is how to improve their internet infrastructure. Recently, we talked about how their kcna.kp webpage was using javascript incorrectly. This error prevented other websites from linking to articles published on kcna.kp and Google from searching those articles.
This minor technical problem had geopolitical implications. KCNA is the main newspaper in North Korea, and policy wonks closely analyze KCNA’s articles in order to better understand the North Korean government. A broken KCNA website makes their jobs harder and reduces the quality of discussion about North Korean policy.
As of 22 February, these problems with the KCNA webpage are now fixed.
To illustrate the changes that the KCNA web developers made, we’ll use the Internet Archive’s Wayback Machine to look at old versions of the website. The first snapshot of the kcna.kp webpage is from 20-April-2011.[1] The front page shows Kim Jong Il giving on-the-spot guidance, and is the sort of picture that wonks go crazy over:
The webpage is reasonably nice looking, but if you click on any of the article links in the snapshot page, you’ll notice that they don’t work anymore. There’s no way to see the contents of these older articles or their associated images.
Inspecting the HTML source code we can see why. All the link tags look something like
<a href="javascript:onNews('specialnews','2011','410796')">
When you click the link, your browser calls the javascript onNews
function. This function is custom written for the KCNA webpage, and makes an AJAX call to display the article’s contents. Unfortunately, web crawlers cannot access the contents of these AJAX calls unless special procedures are followed, and the KCNA webpage did not follow these procedures. So the Internet Archive was not able to archive these links, and this bit of history is lost.[2]
The Wayback Machine has collected 2395 more snapshots of the KCNA webpage up through today. Looking through these records we can see that the kcna.kp website was redesigned in January 2013, and this redesign broke the webpage even more. The redesigned webpage uses javascript even for displaying the main body of the webpage, and so not even the homepage can be archived. The snapshot from 1-January-2013 is the last working snapshot before this redesign.
After 9 years, the webpage was finally fixed last month on 22-February-2022. The new webpage looks like:
The important part, however, is the underlying HTML code. The link tags now use standard HTML to include the URL directly in the tag with no javascript. For example, the link to the top article about Kim Jong Un above looks like
<a href="/kp/article/q/320150e5ae8e9bc8fdf3d6b8547eaeaf.kcmsf">
Crawlers are able to follow these links. So now, after a 9 year hiatus, the Internet Archive is once again able to archive articles from the KCNA. You can view the article above permanently archived in the Internet Archive repository along with 26 associated pictures. These automated archives of the KCNA are especially important for Western researchers because the KCNA is known to have altered historic articles in response to domestic purges.
Furthermore, Google[3] is now able to index the KCNA’s articles. So analysts can do searches like site:kcna.kp united states
to find KCNA articles mentioning a topic of interest like the United States:
These usability improvements will help Western researchers navigate the KCNA’s published articles and learn about the DPRK. But there are still unfortunately some major problems with the webpage.
For example, if you click on any of the google links above, you’ll be taken to the “secure” webpage using the HTTPS protocol (instead of the HTTP protocol). Ordinarily, that’s a good thing, but the KCNA webpage uses a self-signed certificate, so you get a scary looking error message. On firefox, it looks like:
At first glance, this error message makes it look like the KCNA webpage might have something dangerous like a virus on it. That’s not the case though. The message just means that the browser can’t verify who operates the webpage, because the certificate wasn’t issued by a trusted authority. The connection itself is still encrypted.
The North Korean government wants to fix these problems, and we should too. It’s in both their interest and ours to improve the communication between our countries’ foreign policy experts. Unfortunately, the current US sanctions regime makes this difficult. I have a standing invitation from my North Korean colleagues to visit them and teach about modern web standards, but the US has banned American passport holders from entering North Korea. So American sanctions are effectively preventing North Korea from improving their internet.
[1] Prior to 2011, the KCNA was hosted online at http://kcna.co.jp, and the Wayback Machine has archives going back to 1997. Like most other webpages of that era, the kcna.co.jp webpage used simple HTML and had a rather crude appearance. The switch to the .kp ccTLD also entailed a rewrite of the interface to make it prettier and more modern. This rewrite introduced the javascript bugs described in this post. An archived post from North Korea Tech describes the switch from the kcna.co.jp domain to kcna.kp.
[2] Technically, the contents of the KCNA articles themselves are not lost; they’re just much more difficult to access. Libraries maintain print copies of KCNA publications, and there is a custom archiver/search engine https://kcnawatch.org that was built specifically for tracking North Korean media. But the average policy researcher or reporter doesn’t have access to these resources, and so from their perspective this history was lost.
[3] Other search engines are able to index kcna.kp now too, but the process takes time, especially for low traffic webpages. As of 22-Mar-2022, Yandex had indexed kcna.kp, but Bing and Baidu had not.
In 2016, I went to North Korea to teach a class of masters students how to contribute to open source software. Here’s an image from one of my lectures:
As part of the class, students were required to submit patches to a project of their choosing, and I want to share the stories of how two of these patches landed into the popular machine learning libraries mlpack and vowpal wabbit. I believe these examples highlight how academic collaboration between North Koreans and Americans can benefit ordinary citizens of both countries and improve diplomatic relations.
One of the students was working on a “vision-based vehicle detection system” for his masters thesis. In this problem, we are given a live feed from a video camera mounted near a road, and the goal is to count the number of cars and trucks that pass by. This is a fairly standard machine vision problem that students around the world regularly implement, and the output looks something like:
(This image unfortunately isn’t from the student’s project, but is instead taken from https://github.com/ahmetozlu/vehicle_counting_tensorflow.)
Apparently, the North Korean department of transportation had directed the student to work on this problem because traffic in Pyongyang was growing rapidly. Visitors to Pyongyang in the 1990s would often remark about the lack of cars on the roads, but these days the city is bustling with traffic. I guess Pyongyang’s famous Traffic Girls could use some automated help keeping traffic flowing.
Graduate students in North Korea have unfiltered internet access, and the student had used this access to download the mlpack machine learning library in order to implement their vehicle detection system. They chose to use mlpack because it was written in C++, and that was the main language they had been taught in their undergrad university courses. But they were encountering a very serious problem: their computer was old and did not have enough memory to compile the library!
After an hour or so of debugging with the student, we narrowed down the problem to mlpack’s use of C++ templates. The mlpack library uses templates extensively throughout its code in order to enable generic programming with no runtime overhead. This use of templates has earned mlpack a well-deserved reputation for running models very fast with low memory overhead, but the downside is that compilation is slow and memory intensive. At the time, compilation consumed a peak of about 2GB of RAM, but the North Korean student’s laptop only had 1GB of RAM available.
The student finally managed to get mlpack to compile by greatly optimizing the compilation procedure. The original code contained hundreds of unnecessary #include
statements spread across the project, and the fix was simply to remove this dead code. You can view the actual commit on github. The fix sped up compilation by about 25% and more than halved memory consumption. The tens of thousands of people and companies who have used mlpack since then are all benefiting from this North Korean student’s excellent work.
Many of the masters students in my class had not yet selected a thesis topic, and so I encouraged one student to work on twitter sentiment analysis. In this problem, we are given a tweet like
and we must classify the tweet as having either positive or negative sentiment towards a topic. For example, the tweet above has a negative sentiment towards the 2018 Singapore Summit between President Trump and Chairman Kim Jong-Un. By analyzing thousands (or millions/billions) of tweets this way, we can determine how different communities feel about a particular topic. Again, this is an entirely routine task implemented by students around the world. But for the North Korean students, this task was remarkable.
The students had all heard of twitter before, but they didn’t use it. Even though their computers had a direct, unfiltered connection to the internet, they were not allowed to create accounts on social media websites. The reason—as it was explained to me—is that the United States controls most internet infrastructure (including websites like Twitter), and through programs like the NSA’s PRISM and the Army’s Cyber Command is spying on and manipulating social media. The United States and North Korea never signed a peace treaty after the Korean War, and so North Koreans still very much worry about being attacked by the US. Not creating social media accounts was one of the “defenses” that these North Korean students were required to use in order to limit the effect of potential “American cyberattacks”. Fortunately, studying twitter is one of my research areas, and I had brought some reasonably sized data sets with me for the students to analyze.
What’s remarkable about this project—as one of my North Korean colleagues liked to remind me—is that it was the first time a North Korean student had ever analyzed twitter data. And analyzing twitter data would soon have geopolitical importance: Less than a year after this project was started, Donald Trump was elected president of the United States, and twitter became one of the primary tools that his administration would use to announce its foreign policy to the world. Fortunately, by the time Trump gave twitter this pseudo-official status, at least a small number of North Koreans had experience analyzing twitter data. They could use this experience to better understand both Trump’s tweets and the responses sent by millions of Americans. The North Korean government now recognizes the importance of using social media to understand American policy, and has recently created a new foreign ministry dedicated to analyzing US intentions through social media and other public information sources. And this is great news for both countries! The United States is built on a system of transparency because we actively want everyone—including North Koreans—to understand how our democracy functions and how to best negotiate with us to achieve shared goals.
This interaction between scientific exchanges and diplomacy is called science diplomacy, and it was instrumental in helping the US and Soviet Union negotiate successful limitations on nuclear weapons systems during the cold war. I believe my work teaching open source software in North Korea helps demonstrate that this science diplomacy model can also be successfully applied to US-North Korean negotiations as well.
Now back to the student’s open source contribution. I recommended that the student use vowpal wabbit to perform the analysis, since this is a great tool for analyzing large scale text datasets. The student successfully downloaded the code, compiled it, and analyzed the sentiment of a few thousand tweets. In this case, the code worked fine on the student’s computer without modification. But for the class on open source software, the student was still required to submit a patch.
He found an open issue on github asking to change how the intercept term interacts with L2 regularization when training linear models, and submitted a patch adding this behavior. (The pull requests for both projects were submitted from my github account since github is considered social media, and the North Korean students weren’t allowed to create social media accounts.) A bug was later found in this patch, and a subsequent patch was added to fix the issue. This back-and-forth is exactly how open source software development is supposed to work, and I find it amazing that open source software lets ordinary people from around the world find common purpose building awesome software even across seemingly irreconcilable political differences.
AFAIK, the patches submitted for this class were the first ever open source contributions to come from North Korea; unfortunately, they were also the last.
I had organized another trip to North Korea the following year (2017) that would have brought several other instructors to teach about open source software, but President Trump banned Americans from travelling to North Korea. So I and the other instructors could no longer meet with North Korean students, and there was no one to teach them how to contribute to open source or encourage them to do so.
President Biden has recently announced his policy of “practical diplomacy” with North Korea. But the details of this policy are not yet clear, and the travel ban remains in effect. As long as the ban stands, I and other American instructors will not be able to help North Koreans contribute to open source. So Americans will not benefit from North Koreans fixing bugs in our code, and the science diplomacy that effectively reduced tensions between the US and Soviet Union cannot be used as a tool to reduce tensions between the US and North Korea.
My copy of Settlers of Catan came with two normal wooden dice. To load these dice, I placed them in a small plate of water overnight, leaving the 6 side exposed.
The submerged area absorbed water, becoming heavier. My hope was that when rolled, the heavier wet sides would be more likely to land face down, and the lighter dry side would be more likely to land face up. So by leaving the 6 exposed, I was hoping to create dice that roll 6’s more often.
This effect is called the bias of the dice. To measure this bias, my wife and I spent the next 7 days rolling dice while eating dinner. (She must love me a lot!)
In total, we rolled the dice 4310 times. The raw results are shown below.
|                 |   1   |   2   |   3   |   4   |   5   |   6   |
| number of rolls |  622  |  698  |  650  |  684  |  666  |  812  |
| probability     | 0.151 | 0.169 | 0.157 | 0.165 | 0.161 | 0.196 |
Looking at the data, it’s “obvious” that our dice are biased: The 6 gets rolled more times than any of the other numbers. Before we prove this bias formally, however, let’s design a strategy to exploit this bias while playing Settlers of Catan.
The key to winning at Settlers of Catan is to get a lot of resources. We want to figure out how many extra resources we can get using our biased dice.
First, let’s quickly review the rules. Each settlement is placed on the corner of three tiles, and each tile has a number token. Whenever the dice are rolled, if they add up to one of the numbers on the tokens, you collect the corresponding resource card. For example:
A good settlement will be placed next to numbers that will be rolled often.
To make strategizing easier, the game designers put helpful dots on each token below the number. These dots count the ways to roll that token’s number using two dice.
We can use these dots to calculate the probability of rolling each number. For example, a \(4\) can be rolled in three ways. If we name our two dice \(A\) and \(B\), then the possible combinations are \((A=1,B=3)\), \((A=2,B=2)\), \((A=3,B=1)\). To calculate the probability of rolling a 4, we calculate the probability of each of these rolls and add them together. For fair dice, the probability of every roll is the same \((1/6)\), so the calculation is:
\[\begin{align} Pr(A+B = 4) &= Pr(A = 1)Pr(B=3) + Pr(A=2)Pr(B=2) + Pr(A=3)Pr(B=1) \\ &= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) \\ &= 1/12 \\ &\approx 0.08333 \end{align}\]

For our biased dice, the probability of each roll is different. Using the numbers from the table above, we get:
\[\begin{align} Pr(A+B = 4) &= Pr(A = 1)Pr(B=3) + Pr(A=2)Pr(B=2) + Pr(A=3)Pr(B=1) \\ &= (0.151)(0.157) + (0.169)(0.169) + (0.157)(0.151) \\ &= 0.07597 \end{align}\]

So rolling a \(4\) is now less likely with our biased dice. Performing this calculation for each possible number gives us the following chart.
All the numbers below \(7\) are now less likely, and the numbers above 7 are now more likely. The shift is small, but it has important strategic implications.
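If you want to check these numbers yourself, here’s a minimal Haskell sketch (not part of the original experiment) that computes the full distribution of the sum, assuming both dice share the empirical face probabilities from the table above:

-- empirical face probabilities from the table above
probs :: [(Int, Double)]
probs = zip [1..6] [0.151, 0.169, 0.157, 0.165, 0.161, 0.196]

-- Pr(A+B = s) for two dice loaded the same way
sumDist :: [(Int, Double)]
sumDist = [ (s, sum [ pa*pb | (a,pa) <- probs, (b,pb) <- probs, a+b == s ])
          | s <- [2..12] ]

main :: IO ()
main = mapM_ print sumDist

Running it reproduces the 0.07597 probability of rolling a \(4\) computed above.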
Consider the two initial settlement placements below.
The naughty player knows that the dice are biased and puts her settlements on locations with high numbers, but the nice player doesn’t know the dice are biased and puts her settlements on locations with low numbers. Notice that if the dice were fair, both settlement locations would be equally good because they have the same number of dots.
The following formula calculates the average number of cards a player receives on each dice roll:
\[ \text{expected cards per roll} = \sum_{\text{adjacent tokens}} Pr(A+B=\text{token value}) \]
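Continuing the sketch above, this formula is just a lookup-and-sum over sumDist. The token numbers in the example comment are hypothetical, since the real ones depend on the board:

-- expected cards per roll for a settlement adjacent to the given tokens
-- (reuses sumDist from the previous sketch)
expectedCardsPerRoll :: [Int] -> Double
expectedCardsPerRoll tokens = sum [ maybe 0 id (lookup t sumDist) | t <- tokens ]

-- e.g. expectedCardsPerRoll [9,10,11]  -- a hypothetical high-number placement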
Substituting the appropriate values gives us the following results.
|             | naughty | nice  |
| fair dice   |  0.500  | 0.500 |
| biased dice |  0.543  | 0.457 |
So the difference between the naughty and nice player is \(0.086\) cards per roll of the biased dice. A typical game of Settlers contains about 60 dice rolls (about 15 turns per player in a 4 player game), so this results in \(0.086*60=5.16\) more cards for the naughty player.
And this is only considering the two starting settlements. As the game progresses, more settlements will be built, and some settlements will be upgraded to cities (which receive two cards per roll instead of one). Calculating the exact effect of these additional sources of cards is difficult because these improvements will be built at random points throughout the game. We’ll have to make some additional assumptions.
If we assume that the naughty player gets 0.043 more cards per roll per settlement/city than the nice player (this exact number will vary depending on the quality of the settlement), and that both players build settlement/cities at turns 10,20,25,30,35,40,45, and 50, then the naughty player will on average receive 15.050 more cards than the nice player.
To summarize, the naughty player will receive somewhere between 5 and 15 more resource cards depending on how their future settlements and cities are built. This advantage can’t guarantee a victory, but it’ll definitely help.
To show that the dice are biased, we will use a standard scientific technique called null hypothesis significance testing. We begin by assuming a hypothesis that we want to disprove. In our case, we assume that the dice are not biased. In other words, we assume that each number on the dice has a \(1/6\approx 0.166\) chance of being rolled. Our goal is to show that under this assumption, the number of 6’s rolled above is very unlikely. We therefore conclude that our hypothesis is also unlikely, and that the dice probably are in fact biased.
More formally, we let \(X\) be a random variable that represents the total number of 6’s we would roll if we were to repeat our initial experiment with fair dice. Then \(X\) follows a binomial distribution whose density is plotted below. The \(p\)-value is the probability of rolling at least as many 6’s as we actually did: \[ p\text{-value} = Pr(X\ge k) = \sum_{i=k}^{n} \binom{n}{i} q^i (1-q)^{n-i} , \] where \(n\) is the total number of dice rolls (4310), \(k\) is the number of 6’s actually rolled (812), and \(q\) is the assumed probability of rolling a 6 (1/6). Substituting these numbers gives us \[ p\text{-value}= Pr(X\ge k) \approx 0.0000884 . \] In other words, if we repeated this experiment one million times with fair dice, we would expect to get results similar to the results we actually got only 88 times. Since this is so unlikely, we conclude that our original assumption (that the dice are not biased) is probably false. Most science classes teach that \(p\)-values less than 0.05 are “significant.” We are very far below that threshold, so our result is “very significant.”
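If you don’t want to take my word for the arithmetic, the tail probability is easy to compute directly. Here’s a self-contained Haskell sketch (not the code originally used for this post) that evaluates the binomial tail with log-factorials so nothing overflows; the comments give the approximate values it should print, including the in-game scenarios discussed next:

import Data.List (foldl')

-- log(m!), computed as a running sum so the binomial coefficient never overflows
logFact :: Int -> Double
logFact m = foldl' (+) 0 [ log (fromIntegral i) | i <- [1..m] ]

-- log of the binomial pmf Pr(X = i) with n trials and success probability q
logPmf :: Int -> Double -> Int -> Double
logPmf n q i = logFact n - logFact i - logFact (n-i)
             + fromIntegral i * log q
             + fromIntegral (n-i) * log (1-q)

-- the one-sided p-value Pr(X >= k)
pValue :: Int -> Double -> Int -> Double
pValue n q k = sum [ exp (logPmf n q i) | i <- [k..n] ]

main :: IO ()
main = do
  print (pValue 4310 (1/6) 812)  -- our whole experiment: about 8.8e-5
  print (pValue  120 (1/6)  23)  -- a single game: about 0.27
  print (pValue  720 (1/6) 136)  -- six games: about 0.05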
Our \(p\)-value is so low because the number of trials we conducted was very large \((n=4310)\). In a typical game of Settlers, however, there will be many fewer trials. This makes it hard for our opponents to prove that we’re cheating.
We said before that there are 60 dice rolls in a typical game. Since we have two dice, that means \(n=120\). To keep the math simple, we’ll assume that we roll an average number of 6’s. That is, the number of sixes rolled during the game is \[ k=812\cdot \frac{120}{4310}\approx23. \] Substituting into our formula for the \(p\)-value, we get \[ p\text{-value}=P(X\ge k) \approx 0.265 . \] In words, this means that if the dice were actually fair, then we would still roll this number of 6’s \(26.5\%\) of the time. Since this probability is so high, the standard scientific protocol tells us to conclude that we have no “significant” evidence that the dice are biased. (Notice that this is subtly different from having evidence that the dice are not biased! Confusing these two statements is a common mistake, even for trained phd scientists, and especially for medical doctors.)
So how many games can we play without getting caught? It turns out that if we play 6 games (so \(n=6*120=720\), and \(k=812\cdot(720/4310)\approx136\)), then the resulting \(p\)-value is 0.05. In other words, as long as we play fewer than 6 games, then our opponents won’t have enough data to conclude that their measurements of the biased dice are “significant.” The standard scientific method won’t prove we’re cheating.
The \(p\)-value argument above is how most scientists currently test their hypotheses. But there are some major flaws with this approach. For example:
The \(p\)-value test doesn’t use all the available information. In particular, our opponents may have other reasons to believe that the dice are loaded. If you look closely at the dice, you’ll notice some slight discoloration where they were submerged in water.
This discoloration was caused by the water spreading the ink on the die’s face. If you see similar discoloration on the dice in your game, it makes sense to be extra suspicious about the dice’s bias.
Unfortunately, there’s no way to incorporate this suspicion into the \(p\)-value analysis we conducted above. An alternative to the \(p\)-value called the bayes factor can incorporate this prior evidence. So if our opponent uses a bayes factor analysis, they may be able to determine that we’re cheating. The bayes factor is more complicated than the \(p\)-value, however, and so it is not widely taught to undergraduate science majors. It is rarely even used in phd-level scientific publications, and many statisticians are calling for increased use of these more sophisticated analysis techniques.
Another weakness of the \(p\)-value test is that false positives are very common. Using the standard significance threshold of \(p\le0.05\) means that 5 of every 100 games will have “significant” evidence that the dice are biased to roll 6’s. Common sense, however, tells us that cheating at Settlers of Catan is almost certainly not this common because most people just don’t want to cheat. But when you run many experiments, some of them will give “significant” results just by random chance. This is one of the many reasons why some scientists have concluded that most published research is false. This effect is thought to be one of the reasons that evidence of extrasensory perception (ESP) continues to be published in scientific journals. Some less scrupulous scientists exploit this deficiency in a process called p-hacking to make their research seem more important.
To alleviate the problem of false positives, a group of statisticians is proposing a new significance threshold of \(p\le0.005\) for a result to qualify as “significant”. While this reduces the risk of false positives, it also makes detecting true effects harder. Under this new criterion, we’d have to play 16 games (for \(n=1920\) dice rolls) to get statistically significant evidence that the dice are biased.
At this point, you might be feeling overwhelmed at the complexity of statistical analysis. And this is just for the toy problem of detecting loaded dice in a game. Real world problems like evaluating the effectiveness of chemotherapy drugs are much more complicated, and so require much more complicated statistical analyses. Doing science is hard!
Edit after peer review: Vijay Lulla sent me the following message:
The blog mentions that you rolled the dice 4310 times and all your calculations are based on it, but the frequency table adds up to 4312.
Whooops! It looks like I messed up my addition. Fortunately, this mistake is small enough that it won’t affect any of the numbers in the article by much.
A lot of people mistakenly think that peer review is where other scientists repeat an experiment to test the conclusion. But that’s not the case. The purpose of peer review is for scientists like Vijay to do a sanity check on the whole procedure to make sure obvious mistakes like this get caught. Sadly, another common problem in science is that researchers don’t publish their data, so there’s no way for checks like this to be performed.
If this were a real publication in a scientific journal, I would redo all the calculations. But since it’s not, I’ll leave the mistake for posterity.
Edit 2: There’s a good discussion on reddit’s /r/statistics. This discussion provides a much more nuanced view about significance testing than my discussion above, and a few users point out ways that I might be overstating some conclusions.
Two weeks ago at ICML, I presented a method for making nearest neighbor queries faster. The paper is called Faster Cover Trees and discusses some algorithmic improvements to the cover tree data structure. You can find the code in the HLearn library on github.
The implementation was written entirely in Haskell, and it is the fastest existing method for nearest neighbor queries in arbitrary metric spaces. (If you want a non-Haskell interface, the repo comes with a standalone program called hlearn-allknn
for finding nearest neighbors.) In this blog post, I want to discuss four lessons learned about using Haskell to write high performance numeric code. But before we get to those details, let’s see some benchmarks.
The benchmark task is to find the nearest neighbor of every point in the given dataset. We’ll use the four largest datasets in the mlpack benchmark suite and the Euclidean distance. (HLearn supports other distance metrics, but most of the compared libraries do not.) You can find more details about the compared libraries and instructions for reproducing the benchmarks in the repo’s bench/allknn folder.
the runtime of finding every nearest neighbor
Notice that HLearn’s cover trees perform the fastest on each dataset except for YearPredict. But HLearn can use all the cores on your computer to parallelize the task. Here’s how performance scales with the number of CPUs on an AWS EC2 c3.8xlarge
instance with 16 true cores:
With parallelism, HLearn is now also the fastest on the YearPredict dataset by a wide margin. (FLANN is the only other library supporting parallelism, but in my tests with this dataset parallelism actually slowed it down for some reason.)
You can find a lot more cover tree specific benchmarks in the Faster Cover Trees paper.
The Haskell Wiki’s guide to performance explains the basics of writing fast code. But unfortunately, there are many details that the wiki doesn’t cover. So I’ve selected four lessons from this project that I think summarize the state-of-the-art in high performance Haskell coding.
Lesson 1: Polymorphic code in GHC is slower than it needs to be.
Haskell makes generic programming using polymorphism simple. My implementation of the cover tree takes full advantage of this. The CoverTree_
type implements the data structure that speeds up nearest neighbor queries. It is defined in the module HLearn.Data.SpaceTree.CoverTree as:
data CoverTree_
( childC :: * -> * ) -- the container type to store children in
( leafC :: * -> * ) -- the container type to store leaves in
( dp :: * ) -- the type of the data points
= Node
{ nodedp :: !dp
, level :: {-#UNPACK#-}!Int
, nodeWeight :: !(Scalar dp)
, numdp :: !(Scalar dp)
, maxDescendentDistance :: !(Scalar dp)
, children :: !(childC (CoverTree_ childC leafC dp))
, leaves :: !(leafC dp)
}
Notice that every field except for level
is polymorphic. A roughly equivalent C++ struct (using higher kinded templates) would look like:
template < template <typename> typename childC
         , template <typename> typename leafC
         , typename dp
         >
struct CoverTree_
{
    dp *nodedp;
    int level;
    typename dp::Scalar *nodeWeight;
    typename dp::Scalar *numdp;
    typename dp::Scalar *maxDescendentDistance;
    childC<CoverTree_<childC,leafC,dp>> *children;
    leafC<dp> *leaves;
};
Notice all of those nasty pointers in the C++ code above. These pointers destroy cache performance for two reasons. First, the pointers take up a significant amount of memory. This memory fills up the cache, blocking the data we actually care about from entering cache. Second, the memory the pointers reference can be in arbitrary locations. This causes the CPU prefetcher to load the wrong data into cache.
The solution to make the C++ code faster is obvious: remove the pointers. In Haskell terminology, we call this unpacking the fields of the Node
constructor. Unfortunately, due to a bug in GHC (see issues #3990 and #7647 and a reddit discussion), these polymorphic fields cannot currently be unpacked. In principle, GHC’s polymorphism can be made a zero-cost abstraction similar to templates in C++; but we’re not yet there in practice.
As a temporary work around, HLearn provides a variant of the cover tree specialized to work on unboxed vectors. It is defined in the module HLearn.Data.SpaceTree.CoverTree_Specialized as:
data CoverTree_
( childC :: * -> * ) -- must be set to: BArray
( leafC :: * -> * ) -- must be set to: UArray
( dp :: * ) -- must be set to: Labeled' (UVector "dyn" Float) Int
= Node
{ nodedp :: {-#UNPACK#-}!(Labeled' (UVector "dyn" Float) Int)
, nodeWeight :: {-#UNPACK#-}!Float
, level :: {-#UNPACK#-}!Int
, numdp :: {-#UNPACK#-}!Float
, maxDescendentDistance :: {-#UNPACK#-}!Float
, children :: {-#UNPACK#-}!(BArray (CoverTree_ childC leafC dp))
, leaves :: {-#UNPACK#-}!(UArray dp)
}
Since the Node
constructor no longer has polymorphic fields, all of its fields can be unpacked. The hlearn-allknn
program imports this specialized cover tree type, giving a 10-30% speedup depending on the dataset. It’s a shame that I have to maintain two separate versions of the same code to get this speedup.
Lesson 2: High performance Haskell code must be written for specific versions of GHC.
Because Haskell code is so high level, it requires aggressive compiler optimizations to perform well. Normally, GHC combined with LLVM does an amazing job with these optimizations. But in complex code, sometimes these optimizations don’t get applied when you expect them. Even worse, different versions of GHC apply these optimizations differently. And worst of all, debugging problems related to GHC’s optimizer is hard.
I discovered this a few months ago when GHC 7.10 was released. I decided to upgrade HLearn’s code base to take advantage of the new compiler’s features. This upgrade caused a number of performance regressions which took me about a week to fix. The most insidious example happened in the findNeighbor
function located within the HLearn.Data.SpaceTree.Algorithms module. The inner loop of this function looks like:
go (Labeled' t dist) (Neighbor n distn) = if dist*ε > maxdist
then Neighbor n distn
else inline foldr' go leafres
$ sortBy (\(Labeled' _ d1) (Labeled' _ d2) -> compare d2 d1)
$ map (\t' -> Labeled' t' (distanceUB q (stNode t') (distnleaf+stMaxDescendentDistance t)))
$ toList
$ stChildren t
where
leafres@(Neighbor _ distnleaf) = inline foldr'
(\dp n@(Neighbor _ distn') -> cata dp (distanceUB q dp distn') n)
(cata (stNode t) dist (Neighbor n distn))
(stLeaves t)
maxdist = distn+stMaxDescendentDistance t
For our purposes right now, the important thing to note is that go
contains two calls to foldr'
: one folds over the CoverTree_
’s childC
, and one over the leafC
. In GHC 7.8, this wasn’t a problem. The compiler correctly specialized both functions to the appropriate container type, resulting in fast code.
But for some reason, GHC 7.10 did not want to specialize these functions. It decided to pass around huge class dictionaries for each function call, which is a well known cause of slow Haskell code. In my case, it resulted in more than a 20 times slowdown! Finding the cause of this slowdown was a painful exercise in reading GHC’s intermediate language core. The typical tutorials on debugging core use trivial examples of only a dozen or so lines of core code. But in my case, the core of the hlearn-allknn
program was several hundred thousand lines long. Deciphering this core to find the slowdown’s cause was one of my more painful Haskell experiences. A tool that analyzed core to find function calls that contained excessive dictionary passing would make writing high performance Haskell code much easier.
Once I found the cause of the slowdown, fixing it was trivial. All I had to do was call the inline
function on both calls to foldr
. In my experience, this is a common theme in writing high performance Haskell code: Finding the cause of problems is hard, but fixing them is easy.
Lesson 3: Immutability and laziness can make numeric code faster.
The standard advice in writing numeric Haskell code is to avoid laziness. This is usually true, but I want to provide an interesting counter example.
This lesson relates to the same go
function above, and in particular the call to sortBy
. sortBy
is a standard Haskell function that uses a lazy merge sort. Lazy merge sort is a slow algorithm—typically more than 10 times slower than in-place quicksort. Profiling hlearn-allknn
shows that the most expensive part of nearest neighbor search is calculating distances (taking about 80% of the run time), and the second most expensive part is the call to sortBy
(taking about 10% of the run time). But I nevertheless claim that this lazy merge sort is actually making HLearn’s nearest neighbor queries much faster due to its immutability and its laziness.
We’ll start with immutability since it is pretty straightforward. Immutability makes parallelism easier and faster because there’s no need for separate threads to place locks on any of the containers.
Laziness is a bit trickier. If the explanation below doesn’t make sense, reading Section 2 of the paper where I discuss how a cover tree works might help. Let’s say we’re trying to find the nearest neighbor to a data point we’ve named q
. We can first sort the children according to their distance from q
, then look for the nearest neighbors in the sorted children. The key to the cover tree’s performance is that we don’t have to look in all of the subtrees. If we can prove that a subtree will never contain a point closer to q
than a point we’ve already found, then we “prune” that subtree. Because of pruning, we will usually not descend into every child. So sorting the entire container of children is a waste of time—we should only sort the ones we actually visit. A lazy sort gives us this property for free! And that’s why lazy merge sort is faster than an in-place quick sort for this application.
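Here’s a toy sketch (independent of HLearn) that makes the effect visible: every comparison announces itself, so you can watch the lazy sort do less work when only the head of the list is demanded:

import Data.List (sortBy)
import Debug.Trace (trace)

-- a comparison that prints a message every time it runs
noisyCompare :: Int -> Int -> Ordering
noisyCompare a b = trace "comparing" (compare a b)

main :: IO ()
main = do
  -- demanding only the minimum forces far fewer comparisons...
  print (head (sortBy noisyCompare [5,3,8,1,9,2,7,4,6]))
  -- ...than demanding the entire sorted list
  print (sortBy noisyCompare [5,3,8,1,9,2,7,4,6])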
Lesson 4: Haskell’s standard libraries were not designed for fast numeric computing.
While developing this cover tree implementation, I encountered many limitations in Haskell’s standard libraries. To work around these limitations, I created an alternative standard library called SubHask. I have a lot to say about these limitations, but here I’ll restrict myself to the most important point for nearest neighbor queries: SubHask lets you safely create unboxed arrays of unboxed vectors, but the standard library does not. (Unboxed containers, like the UNPACK
pragma mentioned above, let us avoid the overhead of indirections caused by the Haskell runtime. The Haskell wiki has a good explanation.) In my experiments, this simple optimization let me reduce cache misses by around 30%, causing the program to run about twice as fast!
The distinction between an array and a vector is important in SubHask—arrays are generic containers, and vectors are elements of a vector space. This distinction is what lets SubHask safely unbox vectors. Let me explain:
In SubHask, unboxed arrays are represented using the UArray :: * -> *
type in SubHask.Algebra.Array. For example, UArray Int
is the type of an unboxed array of ints. Arrays can have arbitrary length, and this makes it impossible to unbox an unboxed array. Vectors, on the other hand, must have a fixed dimension. Unboxed vectors in SubHask are represented using the UVector :: k -> * -> *
type in SubHask.Algebra.Vector. The first type parameter k
is a phantom type that specifies the dimension of the vector. So a vector of floats with 20 dimensions could be represented using the type UVector 20 Float
. But often the size of a vector is not known at compile time. In this case, SubHask lets you use a string to identify the dimension of a vector. In hlearn-allknn
, the data points are represented using the type UVector "dyn" Float
. The compiler then statically guarantees that every variable of type UVector "dyn" Float
will have the same dimension. This trick is what lets us create the type UArray (UVector "dyn" Float)
.
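The phantom type trick is easy to replicate in miniature. Here’s a toy sketch of the idea (this is not SubHask’s actual definition) built on top of the standard vector package:

{-# LANGUAGE DataKinds, KindSignatures #-}

import GHC.TypeLits (Symbol)
import qualified Data.Vector.Unboxed as VU

-- the tag k never appears on the right hand side; it exists only so the
-- type checker can tell differently-tagged vectors apart
newtype UVector (k :: Symbol) a = UVector (VU.Vector a)

-- both arguments carry the same tag, so by convention they were created
-- by the same loader and therefore have the same dimension
dot :: UVector "dyn" Float -> UVector "dyn" Float -> Float
dot (UVector xs) (UVector ys) = VU.sum (VU.zipWith (*) xs ys)

main :: IO ()
main = print (dot (UVector (VU.fromList [1,2,3])) (UVector (VU.fromList [4,5,6])))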
The hlearn-allknn
program exploits this unboxing by setting the leafC
parameter of CoverTree_
to UArray
. Then, we call the function packCT
which rearranges the nodes in leafC
to use the cache oblivious van Emde Boas tree layout. Unboxing by itself gives a modest 5% performance gain from the reduced overhead in the Haskell run time system; but unpacking combined with this data rearrangement actually cuts runtime in half!
Unfortunately, due to some current limitations in GHC, I’m still leaving some performance on the table. The childC
parameter to CoverTree_
cannot be UArray
because CoverTree_
s can have a variable size depending on their number of children. Therefore, childC
is typically set to the boxed array type BArray
. The GHC limitation is that the run time system gives us no way to control where the elements of a BArray
exist in memory. Therefore, we do not get the benefits of the CPU prefetcher. I’ve proposed a solution that involves adding new primops to the GHC compiler (see feature request #10652). Since there are typically more elements within the childC
than the leafC
on any given cover tree, I estimate that the speedup due to better cache usage of BArray
s would be even larger than the speedup reported above.
My experience is that Haskell can be amazingly fast, but it takes a lot of work to get this speed. I’d guess that it took about 10 times as much work to create my Haskell-based cover tree as it would have taken to create a similar C-based implementation. (I’m roughly as proficient in each language.) Furthermore, because of the current limitations in GHC I pointed out above, the C version would be just a bit faster.
So then why did I use Haskell? To make cover trees 10 times easier for programmers to use.
Cover trees can make almost all machine learning algorithms faster. For example, they’ve sped up SVMs, clustering, dimensionality reduction, and reinforcement learning. But most libraries do not take advantage of this optimization because it is relatively time consuming for a programmer to do by hand. Fortunately, the fundamental techniques are simple enough that they can be implemented as a compiler optimization pass, and Haskell has great support for libraries that implement their own optimizations. So Real Soon Now (TM) I hope to show you all how cover trees can speed up your Haskell-based machine learning without any programmer involvement at all.
Learning Haskell was excruciating. The error messages from the Haskell compiler ghc were way more difficult to understand than the error messages I was used to from g++. I admit I’m still a novice programmer: My only experience is a year of classes in C++ programming. But the Haskell compiler should do a better job generating error messages for beginners like me.
First we’ll see four concrete examples of ghc doing worse than g++, then Mike will talk about some ways to fix ghc’s error messages.
Below are two equivalent C++ and Haskell programs. I’ve intentionally added some syntax errors:
/* C++ Code */
#include <iostream>
using namespace std;
int main ()
{
int in = -1;
cout << "Please choose 1 for a message" << endl;
cin >> in;
err-> if in == 1
{
cout << "Hello, World!" << endl;
}
else{
cout << "Error, wrong choice" << endl;
}
return 0;
}
{- Haskell Code -}
main = do
putStrLn "Please enter 1 for a message"
num <- getLine
if num == "1"
then do
putStrLn "Hello, World"
err->
putStrLn "Error, wrong choice"
Alright, so the first notable difference is that the Haskell code is much shorter. It takes up roughly half the space that the C++ code does, yet they both output hello world
when the correct number is entered.
Great!
Haskell already seems better, right?
Wrong!
Notice how I messed up the if
statement in both programs. In the C++ version, I forgot the parentheses, and in the Haskell version I forgot the else
. Both omissions are simple mistakes that I’ve made while learning these languages.
Now let’s see the error messages:
-- C++ Error --
main.cpp: In function 'int main()':
main.cpp:15:5: error: expected '(' before 'in'
main.cpp:19:2: error: 'else' without a previous 'if'
Compilation failed.
-- Haskell Error --
[..]main.hs:19:1:
parse error (possibly incorrect indentation or mismatched brackets)
Failed, modules loaded: none.
Both error messages let the programmer know where the mistake happened, but the g++ message is far more helpful. It tells us how to fix the syntax error by adding some missing parentheses. Bam! Problem solved.
Now let us turn to ghc’s output. Okay, something about a parse error… might have indentation errors… and no modules loaded. Cool. Now I’ve never taken a compiler course, so I don’t know what parse error
means, and I have no idea how to fix it. The error message is simply not helpful.
Here’s another example of parsing errors.
/* C++ Code */
#include <iostream>
using namespace std;
int main()
{
err-> string in = ""
cout << "Please enter a single word and get the string size back" << endl;
cin >> in;
cout << "The size of your word, \"" << in << "\", is "
<< in.length() << "!" << endl;
return 0;
}
{- Haskell Code -}
err-> main = {-do-}
putStrLn "Please enter a single word and get the string size back"
num <- getLine
let size = length num
putStrLn $ "The size of your word, \"" ++ num ++ "\", is " ++ show size ++ "!"
As you can see, in the C++ I forgot to include a semicolon and in the Haskell I forgot the do
in main. As a beginner, I’ve personally made both of these mistakes.
Here are the error messages:
-- C++ Error --
main.cpp:8:2: error: expected ',' or ';' before 'cout'
Compilation failed.
-- Haskell Error --
[..]main.hs:4:13:
parse error on input '<-'
Failed, modules loaded: none.
C++ delivers a clear message explaining how to fix the error. Haskell, however, is not so helpful. It says there’s a parse error on the input operator. How should I know this is related to a missing do
?
Next let’s see what happens when you call the built-in strlen
and length
functions with no arguments at all.
/* C++ Code */
#include <iostream>
#include <cstring>
using namespace std;
int main (){
char input[256];
cout << "Please enter a word" << endl;
cin >> input;
err-> cout << "The size of your string is: " << (unsigned)strlen();
cout << "!" << endl;
return 0;
}
{- Haskell Code -}
main = do
putStrLn "Please enter a word"
num <- getLine
err-> let size = length
putStrLn $ "The size of your string is: " ++ show size ++ "!"
Now let us see the different error messages that are produced:
-- C++ Error --
main.cpp: In function 'int main()':
main.cpp:11:61: error: too few arguments to function 'size_t strlen(const char*)'
Compilation failed.
-- Haskell Error --
[..]main.hs:7:36:
No instance for (Show ([a0]->Int)) arising from a use of 'show'
Possible fix: add an instance declaration for (Show ([a0]->Int))
In the first argument of '(++)', namely 'show size'
In the second argument of '(++)', namely 'show size ++ "!"'
In the second argument of '(++)', namely
'"\", is " ++ show size ++ "!"'
Failed, modules loaded: none.
Once again, it appears that the C++ compiler g++ knew exactly what was wrong with the code and how to fix the error. It tells me that there are not enough arguments in my function call.
Wow, Haskell’s error message is quite the mouthful this time. I suppose this is better than just a parse error
message, but I’m not sure what exactly ghc wants me to correct. The error is simply too technical to help me.
Next, we will look at what happens when you pass too many arguments to functions in both languages:
/* C++ Code */
#include <iostream>
using namespace std;
int main () {
string in[256];
cout << "Please enter a single word to get the string size back" << endl;
cin >> in;
err-> cout << "The size of your string, \"" << in << "\", is " << (unsigned)strlen(in, in);
cout << "!" << endl;
return 0;
}
{- Haskell Code -}
main = do
putStrLn "Please enter a single word to get the string size back"
num <- getLine
err-> let size = length num num
putStrLn $ "The size of your string, \"" ++ num ++ "\", is " ++ show size ++ "!"
And the errors:
-- C++ Error --
main.cpp:16:78: error: too many arguments to function 'int newLength(std::string)'
main.cpp:6:5: note: declared here
Compilation failed.
-- Haskell Error --
Couldn't match expected type 'String -> t0' with actual type 'Int'
The function 'length' is applied to two arguments,
but its type '[Char] -> Int' has only one
In the expression: length num num
In an equation for 'size':size = length num num
Failed, modules loaded: none.
The C++ error clearly explains how to fix the code, and I even understand the Haskell error this time. Both languages tell me that there are too many arguments. Yet the C++ error message tells me this without a bunch of technical jargon. So even when Haskell is actually helpful with its error messages, it still manages to hide what it wants the user to do.
To me, Haskell seems like a language only for experienced programmers because the errors are not user-friendly. How can I write code if a few simple mistakes cripple my progress? Haskell’s compiler ghc simply lags behind g++ in terms of useful error messages for beginners.
I’ve created a patch for ghc that clarifies the specific error messages that Paul had trouble with (and a few related ones). In particular:
Anytime there is a parse error caused by a malformed if
, case
, lambda, or (non-monadic) let
, ghc will now remind the programmer of the correct syntax. In the first example Paul gives above, we would get the much clearer:
parse error in if statement: missing required else clause
To help with the second example, anytime ghc encounters a parse error caused by a <-
token, it now outputs the hint:
Perhaps this statement should be within a 'do' block?
The third example Paul points out comes from the type checker, rather than the parser. It’s a little less obvious how to provide good hints here. My idea is based on the fact that it is fairly rare for functions to be instances of type classes. The only example I know of is the Monad
instance for (a->)
.
Therefore, if the type checker can’t find an instance for a function, the more likely scenario is that the programmer simply did not pass enough parameters to the function. My proposed change is that in this situation, ghc would output the hint:
maybe you haven't applied enough arguments to a function?
This patch doesn’t completely fix ghc’s problem with poor error messages. For example, it doesn’t address Paul’s last point about type errors being verbose. But hopefully it will make it a little easier for aspiring haskellers who still aren’t familiar with all the syntax rules.
This post compares the runtimes of AVL tree operations in C++ vs Haskell. In particular, we insert 713,000 strings from a file into an AVL tree. This is an \(O(n \log n)\) operation. But we want to investigate what the constant factor looks like in different situations.
Experimental setup: All the code for these tests is available in the github repository. The C++ AVL tree was created in a data structures course that I took recently and the Haskell AVL tree is from the Haskell library Data.Tree.AVL
. Additionally, the Haskell code stores the strings as ByteString
s because they are much more efficient than the notoriously slow String
. To see how the runtime is affected by files of different sizes, the file was first partitioned into 10 segments. The first segment has 71,300 words, the second 71,300 * 2 words, and so on. Both the C++ and Haskell programs were compiled with the -O2
flag for optimization. The test on each segment is the average runtime of three separate runs.
Here’s the results:
C++ is a bit faster than Haskell on the last partition for this test.
I guess this is because Haskell operates on immutable data. Every time a new element is inserted into the Haskell AVL tree, new parent nodes must be created because the old parent nodes cannot be changed. This creates quite a bit of overhead. C++, on the other hand, has mutable data and can simply change the node that a parent node is pointing to. This is faster than making a whole new copy like the Haskell code does.
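To see where that overhead comes from, here’s a toy insert for a plain (unbalanced) binary search tree. It isn’t the AVL code from the benchmark, but it shows the same pattern: every node on the path from the root to the insertion point gets reallocated:

data Tree a = Leaf | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r  -- allocates a fresh parent node
  | x > v     = Node l v (insert x r)  -- same on the right spine
  | otherwise = t                      -- duplicates are ignored

An AVL tree pays this same copying cost plus rebalancing, while the C++ version just overwrites a child pointer in place.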
Is there an easy way to speed up our Haskell code?
There is a Haskell library called parallel
that makes parallel computations really convenient. We’ll try to speed up our program with this library.
You might think that it is unfair to compare multithreaded Haskell against C++ that is not multithreaded. And you’re absolutely right! But let’s be honest, manually working with pthreads
in C++ is quite the headache, but parallelism in Haskell is super easy.
Before we look at the results, let’s look at the parallelized code. What we do is create four trees each with a portion of the set of strings. Then, we call par
on the trees so that the code is parallelized. Afterwards, we union the trees to make them a single tree. Finally, we call deepseq
so that the code is evaluated.
{-# LANGUAGE TemplateHaskell #-}
import Control.DeepSeq.TH
import Control.Concurrent
import Data.Tree.AVL as A
import Data.COrdering
import System.CPUTime
import qualified Data.ByteString.Char8 as B
import Control.DeepSeq
import Data.List.Split
import System.Environment
import Control.Parallel
$(deriveNFData ''AVL)
-- Inserts elements from list into AVL tree
load :: AVL B.ByteString -> [B.ByteString] -> AVL B.ByteString
load t [] = t
load t (x:xs) = A.push (fstCC x) x (load t xs)
main = do
args <- getArgs
contents <- fmap B.lines $ B.readFile $ args !! 0
let l = splitEvery (length contents `div` 4) contents
deepseq contents $ deepseq l $ return ()
start <- getCPUTime
-- Loading the tree with the subpartitions
let t1 = load empty $ l !! 0
let t2 = load empty $ l !! 1
let t3 = load empty $ l !! 2
let t4 = load empty $ l !! 3
let p = par t1 $ par t2 $ par t3 t4
-- Calling union to combine the trees
let b = union fstCC t1 t2
let t = union fstCC t3 t4
let bt = union fstCC b t
let bt' = par b $ par t bt
deepseq p $ deepseq bt' $ return ()
end <- getCPUTime
n <- getNumCapabilities
let diff = ((fromIntegral (end-start)) / (10^12) / fromIntegral n)
putStrLn $ show diff
Great, so now that the Haskell code has been parallelized, we can compile and run the program again to see the difference. To compile for parallelism, we must use some special flags.
ghc -O2 filename -rtsopts -threaded
And to run the program (-N4
refers to the number of cores).
./filename +RTS -N4
Haskell now gets better runtimes than C++.
Now that we know Haskell is capable of increasing its speeds through parallelism, it would be interesting to see how the runtime is affected by the degree of parallelism.
According to Amdahl’s law, a program that is 100% parallelized will see a proportional speed up based on the number of threads of execution. For example, if a program that is 100% parallelized takes 2 seconds to run on 1 thread, then it should take 1 second to run using 2 threads. The code used for our test, however, is not 100% parallelized since there is a union operation performed at the end to combine the trees created by the separate threads. The union of the trees is an \(O(n)\) operation while the insertion of the strings into the AVL tree is an \(O\left(\frac{n \log n }{p}\right)\) operation, where \(p\) is the number of threads. Therefore, the runtime for our test should be
\[O\left(\frac{n\log{n}}{p} + n\right)\]
Here is a graph showing the runtime of the operation on the largest set (713,000 strings) across increasing levels of parallelism.
Taking a look at the results, we can see that the improvement in runtime does not fit the 100% parallelized theoretical model, but does follow it to some extent. Rather than the 2 core runtime being 50% of the 1 core runtime, the 2 core runtime is 56% of the 1 core runtime, with decreasing efficiency as the number of cores increases. Still, it is clear that there are significant improvements in speed through the use of more processor cores and that parallelism is an easy way to get better runtime speeds with little effort.
Parametric polymorphism is when you write one function that works on many data types. In C++, this is pretty confusing, but it’s really easy in Haskell. Let’s take a look at an example.
Let’s say we want a function that calculates the volume of a box. In C++, we’d use templates so that our function works with any numeric type:
template<typename T>
T boxVolume(T length, T width, T height)
{
return length * width * height;
}
Templates have an awkward syntax, but that isn’t too much of a hassle. C++ has much bigger problems. What if in the course of writing your program, you accidentally pass in some strings to this function?
int main()
{
cout << boxVolume("oops","no","strings") << endl;
}
We get this error when we compile with g++
:
test.cpp: In instantiation of 'T boxVolume(T, T, T) [with T = const char*]':
test.cpp:22:47:   required from here
test.cpp:8:19: error: invalid operands of types 'const char*' and 'const char*' to binary 'operator*'
     return length * width * height;
This error message is a little hard to understand because of the templates. If we had written our function to use double
s instead of templates:
double boxVolume(double length, double width, double height)
{
return length * width * height;
}
We would get this simpler error message:
test.cpp: In function 'int main()':
test.cpp:22:47: error: cannot convert 'const char*' to 'double' for argument '1' to 'double boxVolume(double, double, double)'
     cout << boxVolume("oops","nope","bad!") << endl;
We see that this error is shorter and easier to understand, as it clearly tells us we cannot pass string literals to our function. Plus there is no superfluous comment about our “instantiation” of boxVolume
.
Now let’s try to write a polymorphic boxVolume
in Haskell:
boxVolume :: a -> a -> a -> a
boxVolume length width height = length * width * height
When we try to compile, we get the error:
test.hs:2:50:
No instance for (Num a) arising from a use of `*'
Possible fix:
add (Num a) to the context of
the type signature for boxVolume :: a -> a -> a -> a
In the expression: length * width * height
In an equation for `boxVolume':
boxVolume length width height = length * width * height
Uh-oh! An error message! What went wrong? It says that we tried to use the * operator without declaring our parameters as an instance of the Num type class.
But what is a type class? This leads us to ad hoc polymorphism, also known as function overloading. Ad hoc polymorphism is when a function can be applied to different argument types, each with a different implementation. For example, the STL classes stack and queue each have their own push and pop functions, which, although they have the same names, do different things:
stack<int> s;
queue<int> q;
s.push(1); q.push(1);
s.push(2); q.push(2);
s.push(3); q.push(3);
s.pop(); q.pop();
After the above code is executed, the stack s will be left with the numbers 1,2 while the queue q will be left with the numbers 2,3. The function pop behaves differently on stacks and queues: calling pop on a stack removes the item added last, while calling pop on a queue removes the item added first.
Haskell does not support function overloading, except through type classes. For example, if we were to declare our own Stack and Queue types, each with their own push and pop functions:
data Stack = Stack [Int] deriving Show
data Queue = Queue [Int] deriving Show
push :: Stack -> Int -> Stack
push (Stack xs) x = Stack (x:xs)
pop :: Stack -> Stack
pop (Stack []) = Stack []
pop (Stack xs) = Stack (tail xs)
push :: Queue -> Int -> Queue
push (Queue xs) x = Queue (x:xs)
pop :: Queue -> Queue
pop (Queue []) = Queue []
pop (Queue xs) = Queue (init xs)
It results in a compiler error:
stack.hs:11:1:
Duplicate type signatures for `push'
at stack.hs:4:1-4
stack.hs:11:1-4
stack.hs:12:1:
Multiple declarations of `push'
Declared at: stack.hs:5:1
stack.hs:12:1
stack.hs:14:1:
Duplicate type signatures for `pop'
at stack.hs:7:1-3
stack.hs:14:1-3
stack.hs:15:1:
Multiple declarations of `pop'
Declared at: stack.hs:8:1
stack.hs:15:1
Changing the names of our push and pop functions to, say, stackPush, stackPop, queuePush, and queuePop would let the program compile.
A more generic way, however, is to create a type class. Let’s make a Sequence type class that implements our push and pop functions.
class Sequence s where
push :: s -> Int -> s
pop :: s -> s
This type class declaration says that any data type that is an instance of this Sequence type class can use the push and pop operations, or, in other words, can add and remove an Int. By making our Stack and Queue types instances of the Sequence type class, both data types can have their own implementations of the push and pop functions!
instance Sequence Stack where
push (Stack xs) x = Stack (x:xs)
pop (Stack []) = Stack []
pop (Stack xs) = Stack (tail xs)
instance Sequence Queue where
push (Queue xs) x = Queue (x:xs)
pop (Queue []) = Queue []
pop (Queue xs) = Queue (init xs)
Replacing our function definitions with these instantiations of the Sequence type class lets our program compile.
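To see the payoff, here is a small usage sketch (the values are made up, and it assumes the instances above are in scope): the same push and pop names now work on both types, and each type dispatches to its own implementation.

demoStack :: Stack
demoStack = pop (push (push (Stack []) 1) 2)  -- Stack [1]: pop removed 2, the item added last

demoQueue :: Queue
demoQueue = pop (push (push (Queue []) 1) 2)  -- Queue [2]: pop removed 1, the item added first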
Type classes are also an important part of using templates in function definitions. In our function boxVolume, we got an error because we tried to use the * operation without declaring the type variable a as an instance of the Num type class. The Num type class is basically for anything that acts like a number, such as Int, Float, and Double, and it lets you use the common operations of +, -, and *.
Let’s change our function to declare that a is a Num:
boxVolume :: (Num a) => a -> a -> a -> a
boxVolume length width height = length * width * height
This is called adding a class constraint. Whenever we want to declare a template function that relies on other functions, we have to add a class constraint that tells both the user and the compiler which types of data can be put into the function.
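With the constraint in place, ordinary numeric calls work as expected. A quick ghci sketch:

ghci> boxVolume 2 3 4
24
ghci> boxVolume 2.5 2.0 2.0
10.0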
If we were to call boxVolume on strings, we would get this simple error message:
ghci> boxVolume "a" "b" "c"
<interactive>:14:1:
No instance for (Num [Char]) arising from a use of `boxVolume'
Possible fix: add an instance declaration for (Num [Char])
In the expression: boxVolume "a" "b" "c"
In an equation for `it': it = boxVolume "a" "b" "c"
The compiler tells us it can’t evaluate this function because strings aren’t numbers! If we really wanted to, we could make String an instance of the Num type class, and then this function would work! (Of course, why you would want to do that is beyond me.) That’s the power of parametric polymorphism combined with type classes.
So there you have it. In C++, although we can easily implement ad hoc polymorphism through function overloading, parametric polymorphism is a tricky beast. Haskell makes it easier, especially with type classes. Type classes guarantee that the data passed to a function supports the operations the function uses, and they tell the user exactly what they can pass in. Use type classes to your advantage the next time you write a Haskell program!
Learning to use git, vim, and bash was hard for us. These tools are so different from the tools we used when we first learned to program. And they’re confusing! But our professor made us use them… and eventually… after we learned the tools… we discovered that we really like them! So we’ve put together a simple video guide to help you learn and enjoy these tools too. We did this as part of the CS100 open source software development class at UC Riverside.
Click here to watch the full playlist on YouTube.
This video shows you step by step how to create an account on GitHub. Then we see how to create our first repository, called test, and transfer it from GitHub onto our local machine using the git clone command.
How do we create files and upload them to GitHub? The touch <filename> command will create an empty file for you. The vim <filename> command will open a file in an advanced text editor that we talk about farther down the page. The git push command sends these files from your local machine up to GitHub, and the git pull command downloads files from GitHub and saves them to your local computer.
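As a concrete sketch, a first session might look like this (hello.txt is a hypothetical filename; git add and git commit stage and record the change locally before you push):

touch hello.txt                  # create an empty file
git add hello.txt                # tell git to track the new file
git commit -m "add hello.txt"    # record a snapshot on your local machine
git push origin master           # upload your commits to GitHub
git pull origin master           # download any new commits from GitHub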
Branches let you work on files without messing up your original code. When you finish your changes, you can merge them into the master branch. This is the best part of version control.
Most programs have different versions, for example: 1.0, 1.1, 1.2, 2.1 and 2.2.1. The git tag command lets you create these versions. They’re just like a checkpoint in a Mario game!
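For example (v1.0 is a hypothetical version name):

git tag -a v1.0 -m "first stable release"   # create an annotated tag named v1.0
git push origin v1.0                        # tags must be pushed to GitHub explicitly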
Let’s say you want to contribute to an open source project, but you don’t have permission. In order to contribute to someone else’s repository, you must first “fork” it to create a repo that you do have push permission on. Then you issue a pull request through the GitHub website. This tells the owner of the original repo that you’ve made some changes they can incorporate.
### README.md file

README.md files are how you document your projects. The README.md should explain your program, give installation instructions, and list known bugs. Basically, it explains your program to someone who has absolutely no idea what it does or how the code works, giving them the concepts and the basic directions they need to run it. The .md extension at the end of the filename indicates that the file uses markdown formatting. This is a simple way to create nice looking documentation.
vim is an advanced text editor for Unix operating systems. It’s powerful, but all the commands are intimidating for first-time users. Even though it’s hard to get used to at first, these videos will help you learn some of the basic commands and get comfortable with vim.
It was difficult at first trying to traverse my code while using vim. I was so used to being able to use my mouse and simply click where I wanted to go. There are many ways to maneuver inside of vim. You can use the h, j, k, l keys, the up, down, left, and right arrow keys, or the w, e, and b keys to move. You can also press gg to go to the top of the file, G to go to the bottom of it, and (any number)G to go to the line number typed before the capital G.
Cutting, copying, and pasting took a while to get used to in vim. Sometimes there was something I wanted in my code that was in the instructions for the assignment, and while the p command pastes text inside vim, it cannot paste things copied outside of vim. If I had something copied outside of vim, then to paste it into vim I would right click and select paste, which puts the text wherever the cursor currently is. Copying with a right click does not affect what is stored by the y (yank) command or the d and x (cut) commands, so text stored with those commands can still be pasted by pressing p. There are other ways to store more than one thing while copying or cutting, but these two methods were the most helpful as I learned how to use vim.
Another of my favorite features of vim is the pair of commands shift-a (takes you to the end of the line and into insert mode) and shift-i (takes you to the beginning of the line and into insert mode). You can also press a to append after the cursor position, or i to insert before the current cursor position.
vim also allows you to use the v or shift-v keys to highlight certain text or lines of code. You can then use other vim commands, such as copy, paste, and delete, to perform your needed actions.
At first it felt very time consuming to indent multiple lines. I felt this way until I found out about the V command. V lets you highlight a line, and pressing up or down extends the highlight across as many lines as you desire. All that was left to do was to type > after everything I wanted to indent was highlighted, and it would all be indented once to the right. Typing < would instead shift it once to the left.
There are two commands for deleting a single character. x deletes the character that the cursor is on, while X deletes the character to the left of the cursor.
The d command is a more powerful way to delete. d can be combined with many different motions. dd will delete the entire line. d$ will delete the rest of the current line. de will delete from where the cursor is up until the end of the word.
Lowercase r replaces the single character under the cursor, while uppercase R enters replace mode and overwrites characters as you type.
There are three c commands that I regularly use for replacement: ce, which deletes up until the end of the word that the cursor is currently on, then allows you to insert immediately; c$, which deletes from where the cursor is up until the end of the line, then allows you to insert immediately; and cc, which deletes the whole line that the cursor is on and allows you to insert immediately at the beginning of the line.
Ever wondered how we get our vim editor to work the way we have it, rather than like the default editor? vim has a configuration file where you can set up its defaults, such as auto parentheses, auto-indent, and much more. By watching our video above, you can easily create new defaults for your vim editor that cut down the time spent formatting your text so you can spend more of it coding.
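That file is ~/.vimrc. Here is a minimal sketch using a few standard options (pick whichever you like):

" in a vimrc, lines starting with a double quote are comments
syntax on          " turn on syntax highlighting
set number         " show line numbers
set autoindent     " copy the previous line's indentation
set tabstop=4      " display tab characters as 4 columns
set shiftwidth=4   " > and < shift by 4 columns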
One of the best features of Unix operating systems is the powerful terminal they provide.
### ls command

The ls command is one of the most used terminal commands. The basic ls command, when run, displays the contents of the current working directory. Passing in a directory name as an argument will display the contents of that directory. It is also possible to pass in a path to display the contents of any directory, regardless of the directory the user is currently in.
If the -a flag is passed in with ls, all items in the current working directory whose names begin with a . are also displayed, along with the rest of the items.
Passing in the -l flag prints information for each item in the directory in a series of columns on a single line. The first column displays the read, write, and execute permissions for the file’s owner, for the group that owns the file, and for all other users, in that order. (The next column counts the item’s hard links.) The following two columns show the owner of the item and the group owner. The next column displays the size, in bytes, of the item. The next column displays when the item was last modified, and the last column displays the name of the item.
If the -R flag is passed in, the command will display the contents of the current directory, then recursively enter every directory within it and display that directory’s contents, continuing until there are no more directories left to enter.
All these options are combinable for different uses. For example, I could use the -l and -a flags together to also display the information for the items whose names begin with a ., or use -R and -l together.
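A few sketches of these flags in action (~/projects is a hypothetical directory):

ls                 # list the current directory
ls -a              # also show the hidden . files
ls -l              # long format: permissions, owner, size, date
ls -la ~/projects  # combine flags and pass in a path
ls -R              # recurse into every subdirectory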
### cd and mv commands

The cd and mv commands are crucial for actually using the terminal. Without cd, you would forever be stuck in your home directory. The mv command is necessary for moving files from one section of the hard drive to another. The cd command by itself will change the current working directory to the home directory. If passed a directory name that is within the current working directory, the current working directory will be changed to that directory. cd will also take a path as an argument. When a path is passed in, the current working directory will be changed to the directory specified by the path. When cd is passed .., the current working directory moves backwards, to the directory that contains the current directory.
The mv command will move an item to the directory passed in. If the destination argument is not a path, the command will look for the destination in the current working directory. The destination argument can also be a path, so you can move the item to any directory on the hard drive.
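A quick sketch (the file and directory names are hypothetical):

cd projects                  # enter the projects directory
cd ..                        # go back up to the parent directory
cd                           # with no argument, return to your home directory
mv notes.txt projects/       # move notes.txt into the projects directory
mv projects/notes.txt /tmp/  # paths work for both the source and destination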
With the script command you can record the commands you run in your terminal into a file. By just typing script file_name_here, you can start a script. You don’t need to worry about making the file beforehand; when you specify the filename, script will create one with that name for you. When you’re done, type exit and your terminal will say your script session has ended and restate the filename it recorded all your commands in.
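For example (lab1.txt is a hypothetical filename):

script lab1.txt    # start recording; creates lab1.txt for you
ls                 # ...run whatever commands you want recorded...
exit               # stop recording; the session is saved in lab1.txt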
Computer Science students have the ability to log into the school’s server using the ssh command. To access the server, type the following into the terminal:
ssh your_NetId@bell.cs.ucr.edu
If it is your first time connecting, you will be asked to trust the encryption that the server uses, then prompted to enter the password associated with your NetID. Once you’ve done all those steps, you will be brought to your home directory on the server. To exit the server, type exit into the command prompt and press enter.
A useful command that moves files between the remote server and your home computer is the scp command. To copy items from your home computer to the school’s server, type into the command prompt:
scp filename/absolute_path your_NetID@bell.cs.ucr.edu:absolute_path
To move items from the remote server onto your home computer, type into the command prompt:
scp your_NetID@bell.cs.ucr.edu:absolute_path absolute_path
### vim in one screen

One of the first things I noticed about vim that I initially disliked was that it took over the terminal when I used it. To get around this, I started using two terminals instead of just one while I was programming. I would run vim in the first terminal and run the executable in the second. It was as simple as using :w to save in vim instead of using :wq to save and quit. I could now test my code without ever having to close vim. (To put the two terminals side by side, users on Windows 7 and above can drag a window to the left or right border of the screen. OS X doesn’t have this ability built in, but installing the Spectacle app lets you organize multiple windows on your screen with the touch of a button.)
When programming for Unix-based operating systems (which is a primary component of CS100), system calls are a prominent part of your code. The perror function inspects the error value set by a failed system call and prints to stderr an error message based on the system call and the type of error. It takes in one C-string argument, a message of your choosing that is printed before the error description.
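Here is a minimal sketch (the filename is hypothetical; open is the Unix system call being checked):

#include <cstdio>   // perror
#include <fcntl.h>  // open

int main()
{
    int fd = open("no_such_file.txt", O_RDONLY);  // this system call should fail
    if (fd == -1)
        perror("error opening file");  // prints: error opening file: No such file or directory
    return 0;
}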
In 2006, I saw the dead sea scrolls in San Diego. The experience changed my life. I realized I knew nothing about ancient Judea, and decided to immerse myself in it. I studied biblical Hebrew and began a collection of Hebrew scrolls.
Each scroll is between 100 and 600 years old, and is a fragment of the Torah. These scrolls were used by synagogues throughout Southern Europe, Africa, and the Middle East. As we’ll see in a bit, each region has subtly different scribal traditions. But they all take their Torah very seriously.
The first thing that strikes me about a scroll is its color. Scrolls are made from animal skin, and the color is determined by the type of animal and method of curing the skin. The methods and animals used depend on the local resources, so color gives us a clue about where the scroll originally came from. For example, scrolls with a deep red color usually come from North Africa. As the scroll ages, the color may either fade or deepen slightly, but remains largely the same. The final parchment is called either gevil or klaf depending on the quality and preparation method.
The four scrolls below show the range of colors scrolls come in:
My largest scroll is about 60 feet long. Here I have it partially stretched out on the couch in my living room:
The scroll is about 300 years old, and contains most of Exodus, Leviticus, and Numbers. A complete Torah scroll would also have Genesis and Deuteronomy and be around 150 feet long. Sadly, this scroll has been damaged throughout its long life, and the damaged sections were removed.
As you can imagine, many hides were needed to make these large scrolls. These hides get sewn together to form the full scroll. You can easily see the stitching on the back of the scroll:
Also notice how rough that skin is! The scribes (for obvious reasons) chose to use the nice side of the skin to write on.
Here is a rotated, front-side view of the same seam above. Some columns of text are separated at these seams, but some columns are not.
Animal hides come in many sizes. The hide in this image is pretty large and holds five columns of text:
But this hide is smaller and holds only three columns:
The coolest part of these scrolls is their calligraphy. Here’s a zoomed in look on one of the columns of text above:
There’s a lot to notice in this picture:
The detail is amazing. Many characters have small strokes decorating them. These strokes are called tagin (or crowns in English). A bit farther down the page we’ll see different ways other scribal traditions decorate these characters. Because of this detail in every letter, a scribe (or sopher) might spend the whole day writing without finishing a single piece of parchment. The average sopher takes between nine months and a year to complete a Torah scroll.
There are faint indentations in the parchment that the sopher used to ensure he was writing straight. We learned to write straight in grade school by writing our letters on top of lines on paper. But in biblical Hebrew, the sopher writes their letters below the line!
Hebrew is read and written right to left (backwards from English). To keep the left margin crisp, the letters on the left can be stretched to fill space. This effect is used in different amounts throughout the text. The stretching is more noticeable in this next section:
And sometimes the sopher goes crazy and stretches all the things:
If you look at the pictures above carefully, you can see that only certain letters get stretched: ת ד ח ה ר ל. These letters look nice when stretched because they have a single horizontal stroke.
The next picture shows a fairly rare example of stretching the letter ש. It looks much less elegant than the other stretched letters:
Usually these stretched characters are considered mistakes. An experienced sopher evenly spaces the letters to fill the line exactly. But a novice sopher can’t predict their space usage as well. When they hit the end of the line and realize they can’t fit another full word, they’ll add one of these stretched characters to fill the space.
In certain sections, however, stretched lettering is expected. It is one of the signs of poetic text in the Torah. For example, in the following picture, the sopher intentionally stretched each line, even when they didn’t have to:
Keeping the left margin justified isn’t just about looks. The Torah is divided into thematic sections called parashot. There are two types of breaks separating parashot. The petuha (open) is a jagged edge, much like we end paragraphs in English. The setumah (closed) break is a long space in the middle of the line. The picture below shows both types of breaks:
A sopher takes these parashot divisions very seriously. If the sopher accidentally adds or removes parashot from the text, the entire scroll becomes non-kosher and cannot be used. A mistake like this would typically be fixed by removing the offending piece of parchment from the scroll, rewriting it, and adding the corrected version back in. (We’ll see some pictures of less serious mistakes at the bottom of this page.)
The vast majority of the Torah is formatted as simple blocks of text. But certain sections must be formatted in special ways. This is a visual cue that the text is more poetic.
The passage below is of Numbers 10:35-36. Here we see an example of the inverted nun character being used to highlight some text. This is the only section of the Torah where this formatting appears (although it also appears seven times in the book of psalms). The inverted nun characters are set all by themselves, and surround a command about the Ark of the Covenant:
It’s really cool when two different scrolls have sections that overlap. We can compare them side-by-side to watch the development of different scribal traditions. The image below shows two versions of Numbers 6:22-27.
The writing is almost identical in both versions, with one exception. On the first line with special formatting, the left scroll has two words in the right column: אמור להם, but the right scroll has only the word אמור (the word להם is the last word on the previous line). When the sopher is copying a scroll, he does his best to preserve the formatting in these special sections. But due to the vast Jewish diaspora, small mistakes like this get made and propagate. Eventually they form entirely new scribal traditions. (Note that if a single letter is missing from a Torah, then the Torah is not kosher and is considered unfit for use. These scribal differences are purely stylistic.)
Many individual characters and words also receive special formatting throughout the text. Both images below come from the same piece of parchment (in Genesis 23) and were created by the same sopher. The image on the left shows the letter פ in its standard form, and the image on the right shows it in a modified form.
The meaning of these special characters is not fully known, and every scribal tradition exhibits some variation in what letters get these extra decorations. In the scroll above, the whirled פ appears only once. But some scrolls exhibit the special character dozens of times. Here is another example where you can see a whirled פ a few letters to the right of its normal version:
Another special marker is when dots are placed over the Hebrew letters. The picture below comes from the story when Esau is reconciling with Jacob in Genesis 33. Normally, the dotted word would mean that Esau kissed Jacob in reconciliation; but tradition states that these dots indicate that Esau was being insincere. Some rabbis say that this word, when dotted, could be more accurately translated as Esau “bit” Jacob.
Next, let’s take a look at God’s name written in many different styles. In Hebrew, God’s name is written יהוה. Christians often pronounce God’s name as Yahweh or Jehovah. Jews, however, never say God’s name. Instead, they say the word adonai, which means “lord.” In English old testaments, anytime you see the word Lord rendered in small caps, the Hebrew is actually God’s name. When writing in English, Jews will usually write God’s name as YHWH. Removing the vowels is a reminder to not say the name out loud.
Below are nine selected images of YHWH. Each comes from a different scroll and illustrates the decorations added by a different scribal tradition. A few are starting to fade from age, but they were the best examples I could find in the same style. The simplest letters are in the top left, and the most intricate in the bottom right. In the same scroll, YHWH is always written in the same style.
The next image shows the word YHWH at the end of the line. The ה letters get stretched just like in any other word. When I first saw this I was surprised a sopher would stretch the name of God like this—the name of God is taken very seriously and must be handled according to special rules. I can just imagine rabbis 300 years ago getting into heated debates about whether or not this was kosher!
There is another oddity in the image above. The letter yod (the small, apostrophe looking letter at the beginning of YHWH) appears in each line. But it is written differently in the last line. Here, it is given two tagin, but everywhere else it only has one. Usually, the sopher consistently applies the same styling throughout the scroll. Changes like this typically indicate the sopher is trying to emphasize some aspect of the text. Exactly what the changes mean, however, would depend on the specific scribal tradition.
The more general word for god in Hebrew is אלוהים, pronounced elohim. This word can refer to either YHWH or a non-Jewish god. Here it is below in two separate scrolls:
Many Christians, when they first learn Hebrew, get confused by the word elohim. The ending im on Hebrew words is used to make a word plural, much like the ending s in English. (For example, the plural of sopher is sophrim.) Christians sometimes claim that because the Hebrew word for god looks plural, ancient Jews must have believed in the Christian doctrine of the trinity. But this is very wrong, and rather offensive to Jews.
Tradition holds that Moses is the sole author of the Torah, and that Jewish sophrim have given us perfect copies of Moses’ original manuscripts. Most modern scholars, however, believe in the documentary hypothesis, which challenges this tradition. The hypothesis claims that the Torah combines the work of multiple writers. One writer always referenced God as YHWH, whereas another always referenced God as elohim. The main evidence for the documentary hypothesis is that some stories in the Torah are repeated twice with slightly different details; in one version God is always called YHWH, whereas in the other God is always called elohim. The documentary hypothesis suggests that some later editor merged the sources together, but didn’t feel comfortable editing out the discrepancies, so left them exactly as they were. Orthodox Jews reject the documentary hypothesis, but some strains of Judaism and most Christian denominations are willing to consider that the hypothesis might be true. This controversy marks a very important distinction between different Jewish sects, yet most Christians aren’t even aware of the controversy in their holy book.
The next two pictures show common grammatical modifications of the words YHWH and elohim: they have letters attached to them in the front. The word YHWH below has a ל in front. This signifies that something is being done to YHWH or for YHWH. The word elohim has a ה in front. This signifies that we’re talking about the God, not just a god. In Hebrew, prepositions and articles like “to,” “for,” and “the” are not separate words. They’re just letters that get attached to the words they modify.
Names are very important in Hebrew. Most names are actually phrases. The name Jacob, for example, means “heel puller.” Jacob earned his name because he was pulling the heel of his twin brother Esau when they were born in Genesis 25:26. Below are two different versions of the word Jacob:
But names often change in the book of Genesis. In fact, Jacob’s name is changed to Israel in two separate locations: first in Genesis 32 after Jacob wrestles with “a man”; then again in Genesis 35 after Jacob builds an altar to elohim. (This is one of the stories cited as evidence for the documentary hypothesis.) The name Israel is appropriate because it literally means “persevered with God.” The el at the end of Israel is a shortened form of elohim and is another Hebrew word for god.
Here is the name Israel in two different scripts:
Another important Hebrew name is ישוע. In Hebrew, this name is pronounced yeshua, but Christians commonly pronounce it Jesus! The name literally translates as “salvation.” That’s why the angel in Matthew 1:21 and Luke 1:31 gives Jesus this name. My scrolls are only of the old testament, so I don’t have any examples to show of Jesus’ name!
To wrap up the discussion of scribal writing styles, let’s take a look at the most common phrase in the Torah: ודבר יהוה אל משה. This translates to “and the Lord said to Moses.” Here it is rendered in three different styles:
Now let’s move on to what happens when the sophrim make mistakes.
Copying all of these intricate characters was exhausting work! And hard! So mistakes are bound to happen. But if even a single letter is wrong anywhere in the scroll, the entire scroll is considered unusable. The rules are incredibly strict, and this is why Orthodox Jews reject the documentary hypothesis. To them, it is simply inconceivable to use a version of the Torah that was combined from multiple sources.
The most common way to correct a mistake is to scratch off the outer layer of the parchment, removing the ink. In the picture below, the sopher has written the name Aaron (אהרן) over the scratched off parchment:
The next picture shows the end of a line. Because of the mistake, however, the sopher must write several characters in the margin of the text, ruining the nice sharp edge they created with the stretched characters. Writing that enters the margins like this is not kosher.
Sometimes a sopher doesn’t realize they’ve made a mistake until several lines later. In the picture below, the sopher has had to scratch off and replace three and a half lines:
Scratching the parchment makes it thinner and weaker. Occasionally the parchment is already very thin, and scratching would tear through to the other side. In this case, the sopher can take a thin piece of blank parchment and attach it to the surface. In the following picture, you can see that the attached parchment has a different color and texture.
The next picture shows a rather curious instance of this technique. The new parchment is placed so as to cover only parts of words on multiple lines. I can’t imagine how a sopher would make a mistake that would best be fixed in this manner. So my guess is that this patch was applied some time later, by a different sopher to repair some damage that had occurred to the scroll while it was in use.
Our last example of correcting mistakes is the most rare. Below, the sopher completely forgot a word when copying the scroll, and added it in superscript above the standard text:
If we zoom in, you can see that the superscript word is slightly more faded than the surrounding text. This might be because the word was discovered to be missing a long time (days or weeks) after the original text was written, so a different batch of ink was used to write the word.
Since these scrolls are several hundred years old, they’ve had plenty of time to accumulate damage. When stored improperly, the parchment can tear in some places and bunch up in others:
One of the worst things that can happen to a scroll is water. It damages the parchment and makes the ink run. If this happens, the scroll is ruined permanently.
If you’ve read this far and enjoyed it, then you should learn biblical Hebrew. It’s a lot of fun! You can start right now at any of these great sites:
http://foundationstone.com.au - geared for the Jewish reader, covers both biblical and modern Hebrew
http://hebrew4christians.com - very good site because it also explains lots of Jewish customs that may be unfamiliar to you; this should probably be more accurately named “hebrew4gentiles”
http://www.ulpan.net - the focus is on modern Hebrew, but the basics are the same
http://www.101languages.net/hebrew - also modern Hebrew
When you’re ready to get serious, you’ll need to get some books. The books that helped me the most were:
These books all have lots of exercises and make self study pretty simple. The Biblical Hebrew Workbook is for absolute beginners. Within the first few sessions you’re translating actual bible verses and learning the nuances that get lost in the process. I spent two days a week with this book, two hours at each session. It took about four months to finish.
The other two books start right where the workbook stops. They walk you through many important passages and even entire books of the old testament. After finishing these books, I felt comfortable enough to start reading the old testament by myself. Of course I was still very slow and was constantly looking things up in the dictionary!
For me, learning the vocabulary was the hardest part. I used a great free piece of software called FoundationStone to help. The program remembers which words you struggle with and quizzes you on them more frequently.
Finally, let’s end with my favorite picture of them all. Here we’re looking down through a rolled up Torah scroll at one of my sandals.
In today’s adventure, our hero ghc faces its final challenge: granting parametricity to our lensified Functor, Applicative, and Monad classes. Parametricity is the key to finishing the task of simplifying our type signatures that we started two days ago. At the end we’ll still have some loose ends left untied, but they’ll have to wait for another haskell binge.
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE LiberalTypeSynonyms #-}
> {-# LANGUAGE ImpredicativeTypes #-}
> {-# LANGUAGE ConstraintKinds #-}
> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..), Monad(..) )
> import Data.Params
> import Data.Params.Functor
> import Data.Params.Applicative
> import Data.Params.Monad
At the value level, tying the knot is a classic technique in Haskell data structures. It lets us build circular data structures using self-reference and lazy evaluation. The classic example is the cycle:
> cycle = x where
> x = 0 : y
> y = 1 : x
But there can be no direct analogy between tying the knot at the value and type level. This is because tying the knot requires lazy evaluation, which doesn’t make sense for types.
(Idea! Maybe we should just start calling Python a lazily typed language!)
But let’s check out a new type level technique… and if you look at the right angle… and squint just the right amount… then it sorta kinda looks like tying the knot.
Remember all that fun we had with lensified Functors, Applicatives, and Monads? Our new classes are powerful, but we paid a major price for that power: We gave up parametricity. Parametricity is one of the main properties that makes Haskell code so easy to use and fun to write. Giving it up would be a disaster.
So let’s get it back.
First, we’ll modify our type classes. We’ll need to pass a third parameter to each of them. We’ll call it b. This parameter represents the type at the lens position of our Functor/Applicative/Monad.
> class b ~ GetParam lens tb => Functor' lens tb b where
> fmap' :: TypeLens p lens -> (a -> b) -> SetParam lens a tb -> tb
> class Functor' lens tb b => Applicative' lens tb b where
> pure :: GetParam lens tb -> TypeLens Base lens -> tb
>
> ap :: ( tf ~ SetParam lens (a -> b) tb
> , ta ~ SetParam lens a tb
> , a ~ GetParam lens ta
> )
> => TypeLens Base lens -> tf -> ta -> tb
> class Applicative' lens tfb b => Monad' lens tfb b where
> join :: tffb ~ CoJoin lens tfb
> => TypeLens Base lens -> tffb -> tfb
Now we can guarantee parametricity when we declare instances of the classes. All we have to do is make sure that the b parameter is a variable and not a type constructor. For example, this is parametric:
instance Functor' (Param_a Base) (Either a b) a
but this is not:
instance Functor' (Param_a Base) (Either Int b) Int
Making our instances parametric is not enough for the type checker. We must prove that all instances will always be parametric. These higher-rank type constraints assert this fact:
> type Functor'' p t = forall a t'. ( t' ~ SetParam p a t, Functor' p t' a )
> type Applicative'' p t = forall a t'. ( t' ~ SetParam p a t, Applicative' p t' a )
> type Monad'' p t = forall a t'. ( t' ~ SetParam p a t, Monad' p t' a )
These type synonym constraints are what I’m calling “tying the type knot.” The foralled a and t’ let us represent an “infinite” number of constraints with a finite size, just like tying the knot lets us represent an infinite data structure with finite memory.
This same technique also works for the constrained monad problem:
> type CnstFunctor p c t = forall a t'. ( t' ~ SetParam p a t, Functor' p t' a, c a )
> type CnstApplicative p c t = forall a t'. ( t' ~ SetParam p a t, Applicative' p t' a, c a )
> type CnstMonad p c t = forall a t'. ( t' ~ SetParam p a t, Monad' p t' a, c a )
So what, exactly, do these new parametric constraints buy us?
Remember how we used the idea of type-level rewrite rules to simplify the type of our applicative sequencing operator (*>)? Now we can simplify it even further.
This is the type that we came up with two posts ago:
(*>) ::
( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
, Applicative lens ( SetParam lens ( b -> b ) tb )
, Applicative lens tb
, tb ~ SetParam lens b tb
) => SetParam lens a tb
-> tb -> TypeLens Base lens -> tb
It’s pretty clean except for the multiple Applicative constraints. But if we use the type-knotted constraints, we can combine all the Applicatives into one:
(*>) ::
( Applicative'' lens tb
, tb ~ SetParam lens b tb
) => SetParam lens a tb
-> tb -> TypeLens Base lens -> tb
Unfortunately, we can’t test these parametric constraints today because of a ghc bug/missing feature.
With a heavy dose of sugar, we can use our type lenses to create a type level record syntax. This will make our (*>) operator’s type even clearer… almost like the original.
The sugaring rules are pretty simple. Just replace any type expression of the form:
t { lens = a }
with a call to the SetParam type function:
SetParam lens a t
And that’s it!
Now, the lensified and standard versions of (*>) are pretty similar looking. Here they are in a side-by-side comparison:
original (*>) :: Applicative t => t a -> t b -> t b
lensified (*>) :: Applicative lens t => t { lens = a } -> t { lens = b } -> t { lens = b }
Sweet!
Next time, we’ll see how to promote everything we’ve done to the kind level.
…
…
…
…
Just kidding!
I’m actually getting married this weekend! I wanted to share my excitement with all you haskellers, so I put together this bad little tying the knot pun! Thanks for putting up with me! Yay!
(disclaimer: there’s probably lots of little mistakes floating around in these posts… type theory isn’t really my area… and I just thought of this idea last week… and… now my brain hurts…)
It’s round 5 of typeparams versus GHC. We’ll be extending our Functor and Applicative classes to define a new Monad class. It’s all pretty simple if you just remember: lensified monads are like burritos with fiber optic cables telling you where to bite next. They’re also just monoids in the category of lens-enhanced endofunctors. Piece of cake.
We’ll be using all the same extensions as before:
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}
But we’ll be adding some pretty nasty ones today:
> {-# LANGUAGE OverlappingInstances #-}
> {-# LANGUAGE RebindableSyntax #-}
We need RebindableSyntax to get do notation, but OverlappingInstances is just a product of the Monad class’s definition. I’ll give infinite haskell points to anyone who can refactor this code so we don’t need the extension!
We’ll also be needing all of our previous work on Functors and Applicatives. It has been uploaded to hackage and is sitting in the appropriate modules:
> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..), Monad(..) )
> import qualified Prelude as P
> import GHC.Exts
> import Data.Params
> import Data.Params.Applicative
> import Data.Params.Functor
And we’re off!
We will define our monads in terms of their join function. In the standard libraries, join has the type:
join :: m (m a) -> m a
The input has the same type as the output, except that the Monad m is repeated twice. There are two differences in the lensified join function: First, the monad we’re working with might be nested arbitrarily deeply in other data types. Second, the argument it is monadic in might not be the last one. Here is an example of what the join type signature would look like for the Left Either monad sitting within a Maybe Monad:
join :: TypeLens Base (Param_a (Param_a Base))
-> Maybe (Either (Either String Int) Int)
-> Maybe (Either String Int)
Since we’re all wannabe category theorists here, we’ll create a CoJoin type family that transforms the output of the join function by duplicating the type at the location specified by the lens:
> type family CoJoin (lens :: * -> Constraint) t
> type instance CoJoin lens t
> = SetParam'
> lens
> ( SetParam'
> ( Objective lens )
> ( GetParam lens t )
> ( GetParam (RemoveObjective lens) t )
> )
> t
(We covered the Objective and RemoveObjective families in a previous post. As a reminder, the Objective family returns the innermost type lens from our input, and the RemoveObjective family returns the lens that results when the innermost lens is taken away.)
CoJoin only has one instance, so we could have just used a type synonym. That would make debugging harder, however. The advantage of a type family is that when we ask GHCi what the type is, it will perform the substitutions for us. For example:
ghci> :t undefined :: CoJoin (Param_a Base) (Maybe (Either String Int))
:: Maybe (Maybe (Either String Int))
ghci> :t undefined :: CoJoin (Param_a (Param_a Base)) (Maybe (Either String Int))
:: Maybe (Either (Either String Int) Int)
Now we’re ready to see our new Monad class:
> class Applicative lens tfb => Monad lens tfb where
> join ::
> ( tffb ~ CoJoin lens tfb
> ) => TypeLens Base lens
> -> tffb -> tfb
The Left and Right Either instances are:
> instance Monad (Param_a Base) (Either a b) where
> join lens (Left (Left a)) = Left a
> join lens (Left (Right b)) = Right b
> join lens (Right b) = Right b
> instance Monad (Param_b Base) (Either a b) where
> join lens (Right (Right b)) = Right b
> join lens (Right (Left a)) = Left a
> join lens (Left a) = Left a
And here are some examples of join in action:
ghci> join _b (Right $ Right "monads") :: Either String String
Right "monads"
ghci> join _b (Right $ Left "are") :: Either String String
Left "are"
ghci> join _a (Left $ Left "so") :: Either String String
Left "so"
ghci> join _a (Right "awesome") :: Either String String
Right "awesome"
The instances above don’t consider the case when our lenses point inside of the Either type. We’ll need to define two new recursive instances to handle this case. These instances are the reason we needed the OverlappingInstances language extension:
> instance
> ( Monad p a
> , Either (CoJoin p a) b ~ CoJoin (Param_a p) (Either a b) -- follows from the lens laws
> ) => Monad (Param_a p) (Either a b)
> where
>
> join lens (Left a) = Left $ join (zoom lens) a
> join lens (Right b) = Right b
> instance
> ( Monad p b
> , Either a (CoJoin p b) ~ CoJoin (Param_b p) (Either a b) -- follows from the lens laws
> ) => Monad (Param_b p) (Either a b)
> where
>
> join lens (Left a) = Left a
> join lens (Right b) = Right $ join (zoom lens) b
The equality constraints in the instances above are implied by the lens laws. As we discussed yesterday, with the type rules language extension, those constraints could be removed completely, making the code a bit nicer.
Here are some examples of using join in the nested case:
ghci> join (_a._b) (Left $ Right $ Right "lenses") :: Either (Either a String) b
Left (Right "lenses")
ghci> join (_a._b) (Left $ Right $ Left "are") :: Either (Either String b) b
Left (Left "are")
ghci> join (_b._b) (Left "neat") :: Either String (Either a String)
Left "neat"
Sometimes we will get the same answer if we join in two separate locations. In the first example below, we join the second two Right constructors, whereas in the second example, we join the first two Right constructors. The results are the same:
ghci> join (_b._b) (Right $ Right $ Right "easy") :: Either a (Either a String)
Right (Right "easy")
ghci> join _b (Right $ Right $ Right "peasy") :: Either a (Either a String)
Right (Right "peasy")
We’ll also be needing a Monad instance for Maybe, so here it is:
> instance Monad (Param_a Base) (Maybe a) where
> join lens Nothing = Nothing
> join lens (Just Nothing) = Nothing
> join lens (Just (Just a)) = Just a
> instance
> ( Monad p a
> , Maybe (CoJoin p a) ~ CoJoin (Param_a p) (Maybe a) -- follows from the lens laws
> ) => Monad (Param_a p) (Maybe a)
> where
> join lens Nothing = Nothing
> join lens (Just a) = Just $ join (zoom lens) a
From join and our Applicative instance, we can derive our monadic bind function. We don’t want to use the traditional (>>=) operator for bind just yet. We will need to do something fancy with it to make do notation work out. So instead, we will use the (\\=) operator for bind. Its definition is:
(\\=) ::
( Monad lens tb
, a ~ GetParam lens tfa
, {- ... lens laws go here ... -}
) => ta -> (a -> tb) -> TypeLens Base lens -> tb
> infixl 1 \\=
> (m \\= f) lens = join lens $ fmap lens f m
We will create the “minus bind operators” in the same way we created minus operators for the Applicative class. Remember, the minus sign points to the parameters that will get a lens applied to them because they are “minus a lens”. These minus operators are defined as:
> infixl 1 \\=-
> infixl 1 -\\=-
> infixl 1 -\\=
> (m \\=- f) lens = ( m \\= \a -> f a $ objective lens ) lens
> (m -\\=- f) lens = ( m lens \\= \a -> f a $ objective lens ) lens
> (m -\\= f) lens = ( m lens \\= \a -> f a ) lens
For our example, we’ll build a simple monadic filter. The filterSmall function below sits in the Either Monad, but we’ll be using Left to represent successes (the input passes through the filter), and Right to represent failure (the input doesn’t pass through).
> filterSmall :: (Show a, Ord a) => a -> a -> Either a String
> filterSmall k x = if x > k
> then Left x
> else Right $ show x ++ " is too small"
We can call our function using the monadic bind by:
> chain1 :: Either Int String
> chain1 = at _a $ Left 20 \\= filterSmall 10
ghci> chain1
Left 20
Instead of using the Left constructor, we can make things a little more generic by using the return function. As usual, it is equivalent to pure:
> return :: Monad lens t => GetParam lens t -> TypeLens Base lens -> t
> return = pure
Since pure’s last parameter is a type lens, we must use the left-minus (-\\=) variant of bind to sequence the computation:
> chain2 :: Either Int String
> chain2 = at _a $ return 20 -\\= filterSmall 10
ghci> chain2
Left 20
Similarly, all the bind operators take a type lens as their last parameter. So any future binds must also use left-minus bind:
> chain3 :: Either Int String
> chain3 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15
ghci> chain3
Left 20
And so on:
> chain4 :: Either Int String
> chain4 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25
ghci> chain4
Right "20 is too small"
We can easily nest our monads. Let’s put all of the computations above inside a Maybe wrapper. All we have to do is change the type signature and the lens:
> chain2' :: Maybe (Either Int String)
> chain2' = at (_a._a) $ return 20 -\\= filterSmall 10
> chain3' :: Maybe (Either Int String)
> chain3' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15
> chain4' :: Maybe (Either Int String)
> chain4' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25
We’re using the RebindableSyntax language extension to construct a custom do notation. We do this by defining our own (>>=) operator. The most generic bind operator we have is the double minus bind (-\\=-). Sometimes we will want to feed a lens to both sides of the bind, so that’s what we’ll use:
> infixl 1 >>=
> (m >>= f) lens = (m -\\=- f) lens
Notice that our (>>=) operator and the one from Prelude take different numbers of arguments! GHC is awesome enough that this is not a problem.
RebindableSyntax also requires us to define functions for failed pattern matching and if statements. Our definitions will be pretty simple:
> fail = error
> ifThenElse False _ f = f
> ifThenElse True t _ = t
Now, we can take our chain2’ function above and rewrite it in do notation. Here it is again for easy reference:
chain2' :: Maybe (Either Int String)
chain2' = at (_a._a) $ return 20 -\\= filterSmall 10
First, rewrite it to use (-\\=-) instead of (-\\=) by making the right hand side take a lens parameter even though it won’t use it:
> chain2'' :: Maybe (Either Int String)
> chain2'' = at (_a._a) $ return 20 -\\=- (\x lens -> filterSmall 10 x)
Then, rewrite it using do notation:
> chain2''' :: Maybe (Either Int String)
> chain2''' = at (_a._a) $ do
> x <- return 20
> \lens -> filterSmall 10 x
It looks a little bit nicer if we use const to absorb the lens parameter:
> chain2'''' :: Maybe (Either Int String)
> chain2'''' = at (_a._a) $ do
> x <- return 20
> const $ filterSmall 10 x
Here are our other examples converted into do notation using the same technique:
> chain3''' :: Maybe (Either Int String)
> chain3''' = at (_a._a) $ do
> x <- return 20
> y <- const $ filterSmall 10 x
> const $ filterSmall 15 y
> chain4'' :: Maybe (Either Int String)
> chain4'' = at (_a._a) $ do
> x <- return 20
> y <- const $ filterSmall 10 x
> z <- const $ filterSmall 15 y
> const $ filterSmall 25 z
And here is a more complicated expression with a nested do:
> chain5 :: Either a (Either a (Maybe (Either Int String)))
> chain5 = at (_b._b._a._a) $ do
> x <- return 20
> y <- do
> a <- const $ filterSmall x 10
> b <- const $ filterSmall 1 3
> return $ a+b
> z <- const $ filterSmall y x
> return $ z-x
But there is still a limitation. Due to the way the types work out, the first line of a do block must always be a return statement when using the at function to specify our lens. This is a byproduct of the extra lens parameter our (>>=) operator is passing around. Fortunately, we can automate this construction with the following function:
> atM lens m = at (removeObjective lens) $ do
> return $ at (objective lens) $ m
This lets us rewrite chain5 as:
> chain5' :: Either a (Either a (Maybe (Either Int String)))
> chain5' = atM (_b._b._a._a) $ do
> let x = 20
> y <- do
> a <- const $ filterSmall x 10
> b <- const $ filterSmall 1 3
> return $ a+b
> z <- const $ filterSmall y x
> return $ z-x
Now we fully support do notation!
Hooray!!
How do we get rid of those ugly const functions?
Can optimus prime use type lenses to save our purity from the effects of the evil decepticons?
Does any one actually care about lensified arrow-do?
Stay tuned to find out.
(or how to promote quickcheck and rewrite rules to the type level)
We’ve seen how to use the typeparams library to soup up our Functor and Applicative type classes. But we’ve been naughty little haskellers—we’ve been using type lenses without discussing their laws! Today we fix this oversight. Don’t worry if you didn’t read/understand the previous posts. This post is much simpler and does not require any background.
First, we’ll translate the standard lens laws to the type level. Then we’ll see how these laws can greatly simplify the type signatures of our functions. Finally, I’ll propose a very simple (yes, I promise!) GHC extension that promotes rewrite rules to the type level. These type level rewrite rules would automatically simplify our type signatures for us. It’s pretty freakin awesome.
### What exactly is a type lens?
Today, we won’t actually import anything from the typeparams library. Instead, we’ll be building up everything from scratch.
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE PolyKinds #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE ConstraintKinds #-}
> {-# LANGUAGE FlexibleContexts #-}
> import Control.Category
> import Prelude hiding ( (.), id )
> import GHC.Exts
Given a data type:
> data Example a b c = Example a b c
We construct the following empty classes:
> class Param_a (p :: * -> Constraint) t -- has kind :: * -> Constraint
> class Param_b (p :: * -> Constraint) t
> class Param_c (p :: * -> Constraint) t
These classes are the type level lenses. Each one uniquely identifies a parameter of the Example data type. To use these lenses, we will need to be able to represent them at the value level. So we create the singleton type:
> data TypeLens p q = TypeLens
Now, we can create three values that uniquely identify the three type parameters:
> _a = TypeLens :: TypeLens p (Param_a p)
> _b = TypeLens :: TypeLens p (Param_b p)
> _c = TypeLens :: TypeLens p (Param_c p)
We’re calling these things lenses, so they must be composable. In fact, they compose really easily. Check out their Category instance:
> instance Category TypeLens where
> id = TypeLens
> t1.t2 = TypeLens
When we chain values together using the (.) composition operator, we create a chain of classes at the type level. For example:
ghci> :t _a._b
_a._b :: TypeLens p (Param_a (Param_b p))
ghci> :t _a._b._c
_a._b._c :: TypeLens p (Param_a (Param_b (Param_c p)))
ghci> :t _a._a._b._c._a._b
_a._a._b._c._a._b :: TypeLens p (Param_a (Param_a (Param_b (Param_c (Param_a (Param_b p))))))
These chains of classes correspond to a nesting of data types. For the Example type we created above, _a._b would refer to the type param b1 in the type:
Example (Example a1 b1 c1) b2 c2
_a._b._c would refer to b2 in the type:
Example (Example a1 b1 (Example a2 b2 c2)) b0 c0
and _a._a._b._c._a._b would refer to the parameter b6 in the monster type:
Example
( Example
( Example
a2
( Example
a3
b3
( Example
( Example
a5
( Example
a6
b6
c6
)
c5
)
b4
c4
)
)
c2
)
b1
c1
)
b0
c0
### getters and setters
The whole point of lenses is they give us an easy way to get and set parameters. At the type level, we do that with these type families:
> type family GetParam (p :: * -> Constraint) (t :: *) :: *
> type family SetParam (p :: * -> Constraint) (a :: *) (t :: *) :: *
For our Example data type, the implementations look like:
> type instance GetParam (Param_a p) (Example a b c) = GetParam p a
> type instance GetParam (Param_b p) (Example a b c) = GetParam p b
> type instance GetParam (Param_c p) (Example a b c) = GetParam p c
> type instance SetParam (Param_a p) a' (Example a b c) = Example (SetParam p a' a) b c
> type instance SetParam (Param_b p) b' (Example a b c) = Example a (SetParam p b' b) c
> type instance SetParam (Param_c p) c' (Example a b c) = Example a b (SetParam p c' c)
These definitions are recursive, so we need a base case to halt the recursion:
> class Base t
> type instance GetParam Base t = t
> type instance SetParam Base t' t = t'
Here are some example usages of the GetParam family:
ghci> :t undefined :: GetParam (Param_a Base) (Example Int b c)
:: Int
ghci> :t undefined :: GetParam (Param_b Base) (Example Int Float c)
:: Float
ghci> :t undefined :: GetParam (Param_a (Param_b Base)) (Example (Example a1 Int c1) b2 Float)
:: Int
ghci> :t undefined :: GetParam (Param_c Base) (Example (Example a1 Int c1) b2 Float)
:: Float
And similar uses of the SetParam family:
ghci> :t undefined :: SetParam (Param_a Base) Char (Example Int b Float)
:: Example Char b Float
ghci> :t undefined :: SetParam (Param_c Base) Char (Example Int b Float)
:: Example Int b Char
ghci> :t undefined :: SetParam (Param_a (Param_b Base)) Char (Example (Example a1 Int c1) b2 Float)
:: Example (Example a1 Char c1) b2 Float
ghci> :t undefined :: SetParam (Param_c Base) Char (Example (Example a1 Int c1) b2 Float)
:: Example (Example a1 Int c1) b2 Char
### the lens laws
The first lens law is that if we set a type parameter to its current value, then the overall type does not change. In code, this looks like:
> type LensLaw1 lens t = t ~ SetParam lens (GetParam lens t) t
The second lens law states that if we set a type parameter to a certain value, then get the value at the location of the lens, then we should get back our original type. In code:
> type LensLaw2 lens a t = a ~ GetParam lens (SetParam lens a t)
And lastly, if we set the same parameter twice, then the last setter wins. In code:
> type LensLaw3 lens a b t = a ~ GetParam lens (SetParam lens a (SetParam lens b t))
There are many other laws that can be derived from these three simple laws. For example, we can derive this fourth lens law from laws 1 and 3:
> type LensLaw4 lens a b t = SetParam lens a (SetParam lens b t) ~ SetParam lens a t
We’re glossing over some technicalities involving injective type families here, but we’ll return to this later in the post.
### promoting quick check to the type level
Any time we have laws in Haskell, we’ve got to prove that they hold. Sometimes, parametricity does this for us automatically (as in the case of the Functor laws). But usually, we rely on test frameworks like QuickCheck. Therefore, we need these frameworks at the type level.
This turns out to be straightforward. We can use these functions to verify our laws:
> property_lensLaw1 :: LensLaw1 lens t => TypeLens Base lens -> t -> ()
> property_lensLaw1 _ _ = ()
> property_lensLaw2 :: LensLaw2 lens a t => TypeLens Base lens -> a -> t -> ()
> property_lensLaw2 _ _ _ = ()
> property_lensLaw3 :: LensLaw3 lens a b t => TypeLens Base lens -> a -> b -> t -> ()
> property_lensLaw3 _ _ _ _ = ()
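While we’re at it, the derived fourth law can get the same treatment. Here’s a sketch that follows the same pattern as the properties above:
> property_lensLaw4 :: LensLaw4 lens a b t => TypeLens Base lens -> a -> b -> t -> ()
> property_lensLaw4 _ _ _ _ = ()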
We test the laws as follows. First, specialize all the type variables in the function. Then, ask GHC if the function type checks. If it does, then the law holds for the type variables we chose.
Here is an example:
ghci> property_lensLaw1 _a (undefined :: Example Int Float String)
()
ghci> property_lensLaw2 _a (undefined :: String) (undefined :: Example Int Float String)
()
ghci> property_lensLaw3 _a (undefined :: String) (undefined :: [a]) (undefined :: Example Int Float String)
()
Now, let’s write some GetParam/SetParam instances that do not obey the laws and see what happens. In the NationalSecurityAgency type below, GetParam works just fine, but SetParam is broken.
> data NationalSecurityAgency x = NationalSecurityAgency
> class Param_x (p :: * -> Constraint) t
> _x = TypeLens :: TypeLens p (Param_x p)
> type instance GetParam (Param_x p) (NationalSecurityAgency x) = x
> type instance SetParam (Param_x p) x' (NationalSecurityAgency x) = NationalSecurityAgency String
When we test the first lens law using a String, everything works fine:
ghci> property_lensLaw1 _x (undefined :: NationalSecurityAgency String)
()
But when we test it using an Int, the type checker explodes:
ghci> property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)
:73:1:
Couldn't match type ‘[Char]’ with ‘Int’
Expected type: SetParam
(Param_x Base)
(GetParam (Param_x Base) (NationalSecurityAgency Int))
(NationalSecurityAgency Int)
Actual type: NationalSecurityAgency Int
In the expression: property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)
In an equation for ‘it’:
it = property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)
You can imagine a template haskell quickcheck that calls these property functions many times on random types to give a probabilistic test that our type laws hold.
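We don’t have that template haskell machinery yet, but we can hand-roll a miniature version: a block of specializations that acts as a type level test suite. If the module compiles, the laws hold at every type listed. (The typeLawSmokeTests name and the particular types below are my own choices; this is just a sketch.)
> typeLawSmokeTests :: ()
> typeLawSmokeTests = const ()
>     ( property_lensLaw1 _a (undefined :: Example Int Float String)
>     , property_lensLaw1 _b (undefined :: Example Int Float String)
>     , property_lensLaw2 _c (undefined :: Char) (undefined :: Example Int Float String)
>     , property_lensLaw3 (_a._b) (undefined :: Char) (undefined :: Bool)
>                         (undefined :: Example (Example Int Int Int) Float String)
>     )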
### using the laws
These laws will greatly simplify inferred types in our programs. We’ll see why using an example.
Consider the beloved Applicative sequencing operator (*>). In the standard libraries, it has the type:
(*>) :: Applicative f => f a -> f b -> f b
Sweet and simple.
In the Applicative class we generated yesterday, however, the sequencing operator is pretty nasty looking. GHCi reports that it has the type:
> (*>) ::
> ( Applicative lens
> ( SetParam
> lens
> (a1 -> GetParam lens (SetParam lens (a -> GetParam lens tb1) tb1))
> (SetParam lens (a -> GetParam lens tb1) tb1)
> )
> , Applicative lens (SetParam lens (a -> GetParam lens tb1) tb1)
> , Applicative lens tb1
> , (b1 -> a2 -> a2) ~ GetParam
> lens
> (SetParam
> lens
> (a1 -> GetParam lens (SetParam lens (a -> GetParam lens tb1) tb1))
> (SetParam lens (a -> GetParam lens tb1) tb1))
> , a1 ~ GetParam lens (SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1))
> , tb0 ~ SetParam lens a tb1
> , ta ~ SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1)
> , a ~ GetParam lens (SetParam lens a tb1)
> ) => ta
> -> tb0
> -> TypeLens Base lens
> -> tb1
> (*>) = undefined
> class Applicative lens t
Yikes! What the hell does that beast do?!
Somehow, we need to simplify this type signature, and the type lens laws are what lets us do this. For example, one of the constraints above is:
a1 ~ GetParam lens (SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1))
We can use the third lens law to simplify this to:
a1 ~ GetParam lens (SetParam lens a1 tb1)
If we repeat this process many times, we get a type signature that looks like:
> newop ::
> ( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
> , Applicative lens ( SetParam lens ( b -> b ) tb )
> , Applicative lens tb
> , tb ~ SetParam lens b tb
> , LensLaw2 lens (b->b) tb
> , LensLaw2 lens b tb
> , LensLaw3 lens (a -> b -> b) (b -> b) tb
> , LensLaw3 lens a (b->b) tb
> , LensLaw4 lens (a->b->b) (b->b) tb
> , LensLaw4 lens a (b->b) tb
> ) => SetParam lens a tb
> -> tb
> -> TypeLens Base lens
> -> tb
> newop = (*>)
This looks quite a bit better, but is still less than ideal. Actually, this is as far as you can get with the lens laws in GHC 7.8. You need injective type families to go further. (See this mailing list thread and this ghc trac issue for more details about what injective type families are.) Currently, injective type families are slated to enter GHC 7.10, so the rest of this post will be a bit more speculative about what this future GHC can do.
### injecting power into the lens laws
Let’s take another look at the type synonyms for the lens laws:
type LensLaw1 lens t = t ~ SetParam lens (GetParam lens t) t
type LensLaw2 lens a t = a ~ GetParam lens (SetParam lens a t)
type LensLaw3 lens a b t = a ~ GetParam lens (SetParam lens a (SetParam lens b t))
This code only enforces that the laws hold for certain parameters. But that’s not what we want! All types are equal in the eyes of the law, so what we really want is type synonyms that look like:
type LensLaw1' = forall lens t. t ~ SetParam lens (GetParam lens t) t
type LensLaw2' = forall lens a t. a ~ GetParam lens (SetParam lens a t)
type LensLaw3' = forall lens a b t. a ~ GetParam lens (SetParam lens a (SetParam lens b t))
Unfortunately, sticking this into GHC yields the dreaded “type families may not be injective” error message. With injective type families, we would be able to write these laws. (This is a somewhat bold claim that I won’t justify here.) Then our code would simplify further to:
newop' ::
( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
, Applicative lens ( SetParam lens ( b -> b ) tb )
, Applicative lens tb
, tb ~ SetParam lens b tb
, LensLaw1'
, LensLaw2'
, LensLaw3'
) => SetParam lens a tb
-> tb
-> TypeLens Base lens
-> tb
newop' = (*>)
### a proposal for new syntax
We can still do better. The lens laws are not something that applies only to specific functions. They are global properties of the type families, and they apply everywhere. Therefore, they should be implicitly added as constraints into every type signature.
We could make this happen by adding a new syntax called “type rules”. In the same way that value level rewrite rules simplify our values, these type rules would simplify our types. The syntax could look something like:
type rule LensLaw1' = forall lens t. t ~ SetParam lens (GetParam lens t) t
type rule LensLaw2' = forall lens a t. a ~ GetParam lens (SetParam lens a t)
type rule LensLaw3' = forall lens a b t. a ~ GetParam lens (SetParam lens a (SetParam lens b t))
There are two differences between a type rule and a regular type synonym: First, type rules take no type parameters. Second, they are implicitly added to every type signature in your program.
The three rules above would allow us to rewrite our function as:
newop'' ::
( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
, Applicative lens ( SetParam lens ( b -> b ) tb )
, Applicative lens tb
, tb ~ SetParam lens b tb
) => SetParam lens a tb
-> tb
-> TypeLens Base lens
-> tb
newop'' = (*>)
That is soooo much nicer!
### Stay tuned
We still have a ways to go before our newop function’s type signature is as simple as (*>) from the standard library. But I think we’ve got a realistic shot at it. In a coming post I’ll be proposing a way to combine the multiple Applicative constraints into a single constraint, and a nice looking sugar over the SetParam/GetParam type families.
If you didn’t quite follow the previous posts about Functors and Applicatives, they might make a bit more sense now.
This post covers a pretty neat trick with closed type families. Normal type families are “open” because any file can add new instances of the type. Closed type families, however, must be defined in a single file. This lets the type checker make more assumptions, and so the closed families are more powerful. In this post, we will circumvent this restriction and define certain closed type families over many files.
We only need these two language extensions for the technique:
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE UndecidableInstances #-}
But for our motivating example, we’ll also use these extensions and some basic imports:
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE ConstraintKinds #-}
> import Data.Proxy
> import GHC.Exts
Let’s begin.
Consider the classes:
> class Param_a (p :: * -> Constraint) t
> class Param_b (p :: * -> Constraint) t
> class Param_c (p :: * -> Constraint) t
> class Base t
These classes can be chained together like so:
> type Telescope_abc = Param_a (Param_b (Param_c Base))
It is easy to write a type family that returns the “head” of this list. On a telescope, the lens closest to you is called the eyepiece, so that’s what we’ll call our type family:
> type family EyePiece ( p :: * -> Constraint ) :: * -> Constraint
> type instance EyePiece (Param_a p) = Param_a Base
> type instance EyePiece (Param_b p) = Param_b Base
> type instance EyePiece (Param_c p) = Param_c Base
Again, this type family is “open” because new instances can be defined in any file.
We might use this EyePiece type family as:
ghci> :t Proxy :: Proxy (EyePiece Telescope_abc)
:: Proxy (Param_a Base)
Now, let’s try to write a type family that does the opposite. Instead of extracting the first element in the chain, it will extract the last. On a telescope the lens farthest away from you is called the objective, so that’s what we’ll call our type family. We’ll also need to define it as a closed type family, whose equations are tried in order (so the Base cases must come first):
type family Objective (lens :: * -> Constraint) :: * -> Constraint where
    Objective (Param_a Base) = Param_a Base
    Objective (Param_b Base) = Param_b Base
    Objective (Param_c Base) = Param_c Base
    Objective (Param_a p) = Objective p
    Objective (Param_b p) = Objective p
    Objective (Param_c p) = Objective p
We can use the Objective family like:
ghci> :t Proxy :: Proxy (Objective Telescope_abc)
:: Proxy (Param_c Base)
The Objective family must be closed. This is because the only way to identify when we are at the end of the telescope is by checking if the p parameter is the Base class. If it is, then we’re done. If not, we must keep moving down the telescope recursively. Without a closed type family, we would have to explicitly list all of the recursive paths. This means \(O(n^2)\) type instances whenever we want to add a new Param_xxx class. That’s nasty and error prone.
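To make the blowup concrete, here is a sketch of the open-family encoding we are rejecting. Open family instances may not overlap, so we can’t write a general Objective (Param_a p) equation next to a special Objective (Param_a Base) one; instead we have to dispatch on the head of p:
-- hypothetical open-family encoding: one instance per (outer, inner) pair
type instance Objective (Param_a Base) = Param_a Base
type instance Objective (Param_a (Param_a p)) = Objective (Param_a p)
type instance Objective (Param_a (Param_b p)) = Objective (Param_b p)
type instance Objective (Param_a (Param_c p)) = Objective (Param_c p)
-- ...plus eight more instances for Param_b and Param_c, and the count grows
-- quadratically every time we add a new Param_xxx class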
Again, the downside of closed type families is that they must be defined all in one place. We can work around this limitation by “factoring” the closed type family into a collection of closed and open type families. In the example above, this works out to be:
> type family Objective (lens :: * -> Constraint) :: * -> Constraint
> type instance Objective (Param_a p) = Objective_Param_a (Param_a p)
> type instance Objective (Param_b p) = Objective_Param_b (Param_b p)
> type instance Objective (Param_c p) = Objective_Param_c (Param_c p)
> type instance Objective Base = Base
> type family Objective_Param_a (lens :: * -> Constraint) :: * -> Constraint where
> Objective_Param_a (Param_a Base) = Param_a Base
> Objective_Param_a (Param_a p) = Objective p
> type family Objective_Param_b (lens :: * -> Constraint) :: * -> Constraint where
> Objective_Param_b (Param_b Base) = Param_b Base
> Objective_Param_b (Param_b p) = Objective p
> type family Objective_Param_c (lens :: * -> Constraint) :: * -> Constraint where
> Objective_Param_c (Param_c Base) = Param_c Base
> Objective_Param_c (Param_c p) = Objective p
ghci> :t Proxy :: Proxy (Objective Telescope_abc)
:: Proxy (Param_c Base)
With this factoring, we are able to define the Objective instance for each Param_xxx in separate files and retain the benefits of closed type families.
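For example, if someone later invents a Param_d class in a completely different package, they can hook it into Objective from their own file. Here’s a sketch of what that hypothetical module would contain:
class Param_d (p :: * -> Constraint) t

type instance Objective (Param_d p) = Objective_Param_d (Param_d p)

type family Objective_Param_d (lens :: * -> Constraint) :: * -> Constraint where
    Objective_Param_d (Param_d Base) = Param_d Base
    Objective_Param_d (Param_d p) = Objective p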
Here is another example. The RemoveObjective family acts like the init function from the Prelude:
> type family RemoveObjective (lens :: * -> Constraint) :: * -> Constraint
> type instance RemoveObjective (Param_a p) = RemoveObjective_Param_a (Param_a p)
> type instance RemoveObjective (Param_b p) = RemoveObjective_Param_b (Param_b p)
> type instance RemoveObjective (Param_c p) = RemoveObjective_Param_c (Param_c p)
> type family RemoveObjective_Param_a (lens :: * -> Constraint) :: * -> Constraint where
> RemoveObjective_Param_a (Param_a Base) = Base
> RemoveObjective_Param_a (Param_a p) = Param_a (RemoveObjective p)
> type family RemoveObjective_Param_b (lens :: * -> Constraint) :: * -> Constraint where
> RemoveObjective_Param_b (Param_b Base) = Base
> RemoveObjective_Param_b (Param_b p) = Param_b (RemoveObjective p)
> type family RemoveObjective_Param_c (lens :: * -> Constraint) :: * -> Constraint where
> RemoveObjective_Param_c (Param_c Base) = Base
> RemoveObjective_Param_c (Param_c p) = Param_c (RemoveObjective p)
ghci> :t Proxy :: Proxy (RemoveObjective Telescope_abc)
:: Proxy (Param_a (Param_b Base))
Of course, you can’t do this trick with every closed type family. For example, the RemoveObjective_Param_c family above cannot be factored any smaller. But if you find yourself wanting the benefits of both closed and open type families, then your type probably has the needed structure.
Welcome back for round 2 of adventures in typeparams. In our last episode, we lensified the Functor type class. In this episode, we’re going to lensify the Applicative type class and plunge head first into type lens parsing.
Okay… down to business.
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE OverloadedStrings #-}
We’ve got a few more imports today. Our work from last time has been uploaded to hackage and is in the Data.Params.Functor module. For parsing, we’ll be torturing the attoparsec library.
> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..) )
> import qualified Prelude as P
> import Data.Params
> import Data.Params.Functor
> import qualified Control.Applicative as Ap
> import qualified Data.Attoparsec.Text as A
> import Data.Attoparsec.Text (parse,Parser,Result)
> import Data.Monoid
> import Data.Text (Text,pack)
As a quick warm up, let’s talk about the infix fmap operator <$>. The fmap function has type:
fmap :: Functor lens tb
=> TypeLens Base lens
-> (a -> GetParam lens tb) -> SetParam lens a tb -> tb
All this <$> operator does is move fmap’s lens parameter to the end of the parameter list. This restructuring will help us chain our operators together and will be a common theme throughout the post. The operator is defined as:
> infixl 4 <$>
> (f <$> t) lens = fmap lens f t
We can use the operator like:
ghci> length <$> (Left $ Right "test") $ _a._b
Left (Right 4)
It will also be useful to have an operator just for specifying the type lens. Since a lens specifies the location “at” which we are operating, we call our new operator @@. It is defined as:
> infixr 0 @@
> (@@) :: (TypeLens p q -> b) -> TypeLens p q -> b
> (@@) = id
And used like:
ghci> length <$> (Left $ Right "test") @@ _a._b
Left (Right 4)
The fourth lens law states that we must provide both prefix and infix versions of every combinator. Therefore, we also introduce the function:
> at :: TypeLens q p -> (TypeLens q p -> t) -> t
> at lens f = f lens
ghci> at (_a._b) $ length <$> (Left $ Right "test")
Left (Right 4)
We’re ready to see our new Applicative class:
> class Functor lens tb => Applicative lens tb where
>
> pure :: GetParam lens tb -> TypeLens Base lens -> tb
>
> ap ::
> ( tf ~ SetParam lens (a -> b) tb
> , ta ~ SetParam lens a tb
> , a ~ GetParam lens ta
> , b ~ GetParam lens tb
> )
> => TypeLens Base lens
> -> tf
> -> ta
> -> tb
The functions pure and ap have the exact same meaning and laws as their counterparts in the standard libraries. The only difference is the addition of the TypeLens parameter and corresponding constraints.
The Left and Right Applicative instances for the Either class are defined as:
> instance Applicative p a => Applicative (Param_a p) (Either a b) where
> pure a lens = Left $ pure a (zoom lens)
> ap lens (Right a) _ = Right a
> ap lens (Left f) (Right a) = Right a
> ap lens (Left f) (Left b) = Left $ ap (zoom lens) f b
> instance Applicative p b => Applicative (Param_b p) (Either a b) where
> pure b lens = Right $ pure b (zoom lens)
> ap lens (Left a) _ = Left a
> ap lens (Right f) (Left a) = Left a
> ap lens (Right f) (Right b) = Right $ ap (zoom lens) f b
And just like with Functors, we have to define the base case for our recursive definitions:
> instance Applicative Base t where
> pure a _ = a
> ap _ f = f
Now, to get the Applicative notation we all know and love, we redefine the <*> operator. It is just a thin wrapper around the ap function. Like the <$> operator, we just move the lens parameter to the end:
> infixl 4 <*>
> (tf <*> ta) lens = ap lens (tf lens) ta
Easy as cake!
Let’s try it out!
We’ll start with the doubly nested Either. For nested Eithers, the lens we use specifies what the success constructors are. Any other constructors will act as errors.
Here’s an example without an error:
> fact1 :: Either (Either a String) b
> fact1 = (++) <$> Left (Right "haskell") <*> Left (Right " rocks!") @@ _a._b
ghci> fact1
Left (Right "haskell rocks!")
Here we have one possible way of signaling an error:
> fact2 :: Either (Either a String) String
> fact2 = (++) <$> Left (Right "python") <*> Right "error" @@ _a._b
ghci> fact2
Right "error"
And here we have the other way:
> fact3 :: Either (Either String String) b
> fact3 = (++) <$> Left (Right "c++") <*> Left (Left "error") @@ _a._b
ghci> fact3
Left (Left "error")
Of course, Applicatives are much more useful when our functions have many arguments. Let’s create a function that concatenates four strings together into a phrase:
> cat4 :: String -> String -> String -> String -> String
> cat4 a b c d = a ++ " " ++ b ++ " "++ c ++ " " ++ d
And create a phrase with no errors:
> phrase1 :: Either (Either a String) b
> phrase1 = cat4
> <$> Left (Right "haskell")
> <*> Left (Right "is")
> <*> Left (Right "super")
> <*> Left (Right "awesome")
> @@ _a._b
ghci> phrase1
Left (Right "haskell is super awesome")
And a phrase with two errors:
> phrase2 :: Either (Either String String) String
> phrase2 = cat4
> <$> Left (Right "python")
> <*> Right "error"
> <*> Left (Right "is")
> <*> Left (Left "error")
> @@ _a._b
ghci> phrase2
Right "error"
Notice that in phrase2 we had two different causes of errors. The error wrapped in the fewest constructors always wins. As a proof by example, let’s shuffle around our previous errors. We still get the same result:
> phrase3 :: Either (Either String String) String
> phrase3 = cat4
> <$> Left (Right "python")
> <*> Left (Left "error")
> <*> Left (Right "is")
> <*> Right "error"
> @@ _a._b
ghci> phrase3
Right "error"
This is cool, but it’s not yet very generic. Every time we want a success, we have to manually specify the constructors we want to use. We can avoid this tedium using the pure function. Its type signature is:
pure :: Applicative lens tb
=> GetParam lens tb -> TypeLens Base lens -> tb
The important thing to notice is that the last parameter takes a TypeLens. This follows our magic formula. We can substitute it into our phrase1 variable like:
> phrase1' :: Either (Either a String) b
> phrase1' = cat4
> <$> (pure "haskell" @@ _a._b)
> <*> (pure "is" @@ _a._b)
> <*> (pure "super" @@ _a._b)
> <*> (pure "awesome" @@ _a._b)
> @@ _a._b
But this is nasty! We have to specify the same TypeLens everywhere we want to use the pure function.
Thankfully, we don’t have to do this. The whole point of lenses is to create ridiculous new combinators that reduce boilerplate! So let’s do that! The “ap minus” combinator will automatically apply the lens for us:
> infixl 4 <*>-
> (tf <*>- ta) lens = (tf <*> ta lens) lens
The minus sign signifies that the right side is “minus a lens” and so we should give it one automatically. Using this combinator, we can rewrite our phrase to look like:
> phrase1'' :: Either (Either a String) b
> phrase1'' = cat4
> <$> (pure "haskell" @@ _a._b)
> <*>- pure "is"
> <*>- pure "super"
> <*>- pure "awesome"
> @@ _a._b
In order to get rid of the first lens application, we’ll need to perform the same trick to <$>:
> infixl 4 <$>-
> (f <$>- t) lens = (f <$> t lens) lens
And we get the beautiful:
> phrase1''' :: Either (Either a String) b
> phrase1''' = cat4
> <$>- pure "haskell"
> <*>- pure "is"
> <*>- pure "super"
> <*>- pure "awesome"
> @@ _a._b
There are two more Applicative combinators needed for parsing: *> and <*. They use the same definitions as in the standard libraries, but with a third lens parameter:
> infixl 4 <*
> (u <* v) lens = pure const <*> u <*> v @@ lens
> infixl 4 *>
> (u *> v) lens = pure (const id) <*> u <*> v @@ lens
Now we need to create all of the “minus” operators. Remember that the minus sign points to the variable that will have the lens automatically applied for us:
> infixl 4 <*-
> infixl 4 -<*-
> infixl 4 -<*
> (u <*- v) lens = ( u <* v lens ) lens
> (u -<*- v) lens = ( u lens <* v lens ) lens
> (u -<* v) lens = ( u lens <* v ) lens
> infixl 4 *>-
> infixl 4 -*>-
> infixl 4 -*>
> (u *>- v) lens = ( u *> v lens ) lens
> (u -*>- v) lens = ( u lens *> v lens ) lens
> (u -*> v) lens = ( u lens *> v ) lens
Confused? Just remember: when you master these new combinators, all the n00bs will worship your l33t h4sk311 5ki115.
Now that we’ve constructed our torture chamber, it’s time to put attoparsec on the rack. We’ll use the built-in “blind” Functor and Applicative instances to define our lensified ones as:
> mkParams ''Parser
> instance Functor p a => Functor (Param_a p) (Parser a) where
> fmap' lens f parser = P.fmap (fmap' (zoom lens) f) parser
> instance Applicative (Param_a Base) (Parser a) where
> pure a lens = Ap.pure $ pure a (zoom lens)
> ap lens tf ta = tf Ap.<*> ta
And now we’re ready to start parsing. We’ll start simple. The attoparsec library provides a function called string that matches a specified string. We’ll use it to create a Parser that matches the phrase “haskell rocks”:
> chain1 :: TypeLens Base (Param_a Base) -> Parser Text
> chain1 = A.string "haskell" *> A.string " rocks"
ghci> parse (chain1 @@ _a) "haskell rocks"
Done "" " rocks"
In the above example, we chose to not specify the lens in the chain1 variable. This means that if we want to chain it with another parser, we should use the minus then operator like:
> chain2 :: TypeLens Base (Param_a Base) -> Parser Text
> chain2 = chain1 -*> A.string "!"
ghci> parse (chain2 @@ _a) "haskell rocks!"
Done "" "!"
If we choose to compose on the right, then we’ll need to move the minus sign to the right:
> chain3 :: TypeLens Base (Param_a Base) -> Parser Text
> chain3 = A.string "¡" *>- chain2
ghci> parse (chain3 @@ _a) "¡haskell rocks!"
Done "" "!"
We have to use minus operators whenever we chain more than two parsers together. In the example below, the first *> takes three parameters (two parsers and a lens). It gets the lens from the minus of the first -*> operator. That operator also needs a lens, which it gets from the next -*>, and so on.
> chain4 :: TypeLens Base (Param_a Base) -> Parser Text
> chain4 = A.string "do"
> *> A.string " you"
> -*> A.string " get"
> -*> A.string " it"
> -*> A.string " yet?"
ghci> parse (chain4 @@ _a) "do you get it yet?"
Done "" " yet?"
If we need to apply a lens to both sides, then we use the -*>-
operator:
> chain5 :: TypeLens Base (Param_a Base) -> Parser Text
> chain5 = chain3 -*> A.string " ... " -*>- chain4
ghci> parse (chain5 @@ _a) "¡haskell rocks! ... do you get it yet?"
Done "" " yet?"
Everything in the last section we could have done without type lenses. But now we’re going to lift the Parser into an arbitrary data type and work with it.
As a concrete example, we’ll put our Parser inside a Maybe. The Maybe Applicative instance is:
> instance Applicative p a => Applicative (Param_a p) (Maybe a) where
> pure a lens = Just $ pure a (zoom lens)
> ap lens Nothing _ = Nothing
> ap lens (Just f) Nothing = Nothing
> ap lens (Just f) (Just b) = Just $ ap (zoom lens) f b
And for convenience we’ll use the following parseMaybe function. It has the same effect as the parse function provided by attoparsec, but does everything from within a Maybe.
> parseMaybe :: Maybe (Parser a) -> Text -> Maybe (Result a)
> parseMaybe parser str = flip parse str <$> parser @@ _a
Next, we lensify our parser combinators. The string function below lifts the string function provided by the attoparsec library into an arbitrary parameter specified by our type lens:
> string c lens = pure (A.string c) (zoom lens)
Back to parsing.
Let’s just repeat the same 5 parse chains from above, but now within the Maybe context. Notice two things: First, the A.string function provided by the attoparsec library did not take a type lens parameter, but our new string function does. This means there are a lot more minus combinators! Second, instead of specifying our lens to focus on the _a parameter, we must focus on the _a._a parameter to hit the parser.
> chain1' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain1' = string "haskell" -*>- string " rocks"
ghci> parseMaybe (chain1' @@ _a._a) "haskell rocks"
Just Done "" " rocks"
> chain2' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain2' = chain1' -*>- string "!"
ghci> parseMaybe (chain2' @@ _a._a) "haskell rocks!"
Just Done "" "!"
> chain3' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain3' = string "¡" -*>- chain2'
ghci> parseMaybe (chain3' @@ _a._a) "¡haskell rocks!"
Just Done "" "!"
> chain4' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain4' = string "do" -*>- string " you" -*>- string " get" -*>- string " it" -*>- string " yet?"
ghci> parseMaybe (chain4' @@ _a._a) "do you get it yet?"
Just Done "" " yet?"
> chain5' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain5' = chain3' -*>- string " ... " -*>- chain4'
ghci> parseMaybe (chain5' @@ _a._a) "¡haskell rocks! ... do you get it yet?"
Just Done "" " yet?"
Again, there’s nothing special about being nested inside a Maybe. We could be nested inside any monstrous data type of your choosing. Yay!
But in the example we’ve chosen, what happens if we add a Nothing into the chain? The Nothing takes over and eats the whole Parser. It doesn’t matter if the Parser was failing or succeeding, the answer is Nothing.
> chain6 :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain6 = string "python" -*> Nothing
ghci> parseMaybe (chain6 @@ _a._a) "python"
Nothing
ghci> parseMaybe (chain6 @@ _a._a) "haskell"
Nothing
Now we’re ready for some super coolness. We’re going to design a parsing circuit that parses three separate parse streams simultaneously!
Here is our Circuit definition:
> data Circuit x y z
> = Circuit (Maybe x) (Maybe y) (Maybe z)
> | CircuitFail
> deriving (Show)
> mkParams ''Circuit
The x, y, and z type params will hold the Parsers. These Parsers are wrapped within a Maybe. A value of Nothing means that the parser will not consume any input. A value of (Just parser) means that it will consume input.
The Functor instances are rather interesting because of the Maybe wrapper. We must compose _a with the zoomed lens to make the types work out:
> instance Functor p x => Functor (Param_x p) (Circuit x y z) where
> fmap' lens f CircuitFail = CircuitFail
> fmap' lens f (Circuit x y z) = Circuit (fmap' (_a . zoom lens) f x) y z
> instance Functor p y => Functor (Param_y p) (Circuit x y z) where
> fmap' lens f CircuitFail = CircuitFail
> fmap' lens f (Circuit x y z) = Circuit x (fmap' (_a . zoom lens) f y) z
> instance Functor p z => Functor (Param_z p) (Circuit x y z) where
> fmap' lens f CircuitFail = CircuitFail
> fmap' lens f (Circuit x y z) = Circuit x y (fmap' (_a . zoom lens) f z)
The Applicative instances are where all the action is at. In each case, the pure function is fairly straightforward. It looks just like the other ones we’ve seen, except that it applies the _a to the zoomed lens and gives default values of Nothing to the other parsers. The ap function calls ap on the appropriate parser and uses the First Monoid instance on the other two.
> instance
> ( Applicative p x
> , Monoid y
> , Monoid z
> ) => Applicative (Param_x p) (Circuit x y z)
> where
> pure x lens = Circuit (pure x @@ (_a . zoom lens)) Nothing Nothing
> ap lens CircuitFail _ = CircuitFail
> ap lens _ CircuitFail = CircuitFail
> ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
> (ap (_a . zoom lens) x1 x2)
> (getFirst $ First y1 <> First y2)
> (getFirst $ First z1 <> First z2)
> instance (Monoid x, Applicative p y, Monoid z) => Applicative (Param_y p) (Circuit x y z) where
> pure a lens = Circuit Nothing (pure a @@ _a . zoom lens) Nothing
> ap lens CircuitFail _ = CircuitFail
> ap lens _ CircuitFail = CircuitFail
> ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
> (getFirst $ First x1 <> First x2)
> (ap (_a . zoom lens) y1 y2)
> (getFirst $ First z1 <> First z2)
> instance (Monoid x, Monoid y, Applicative p z) => Applicative (Param_z p) (Circuit x y z) where
> pure a lens = Circuit Nothing Nothing (pure a @@ _a . zoom lens)
> ap lens CircuitFail _ = CircuitFail
> ap lens _ CircuitFail = CircuitFail
> ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
> (getFirst $ First x1 <> First x2)
> (getFirst $ First y1 <> First y2)
> (ap (_a . zoom lens) z1 z2)
We write a nice wrapper so we can parse our circuits:
> parseCircuit
> :: Circuit (Parser x) (Parser y) (Parser z)
> -> Text
> -> Text
> -> Text
> -> Circuit (Result x) (Result y) (Result z)
> parseCircuit CircuitFail _ _ _ = CircuitFail
> parseCircuit (Circuit x y z) str1 str2 str3 = Circuit
> ( parseMaybe x str1 )
> ( parseMaybe y str2 )
> ( parseMaybe z str3 )
And now here is a simple circuit for us to play with:
> circ1 :: Circuit (Parser Text) (Parser Text) (Parser Text)
> circ1 = Circuit
> (string (pack "haskell") @@ _a._a)
> (string (pack "is" ) @@ _a._a)
> (string (pack "fun" ) @@ _a._a)
When we try to parse our circuit, we just match each word in parallel:
ghci> parseCircuit circ1 "haskell" "is" "fun"
Circuit
(Just Done "" "haskell")
(Just Done "" "is")
(Just Done "" "fun")
In this example, we compose our circuit only on the first parameter:
ghci> parseCircuit (circ1 *> circ1 @@ _x._a) "haskell" "is" "fun"
Circuit
(Just Partial _)
(Just Done "" "is")
(Just Done "" "fun")
Notice that (above) we no longer finished after matching the word haskell. We’ve got a whole ’nother haskell to go. Oh Joy!
Here, we match completely:
ghci> parseCircuit (circ1 *> circ1 @@ _x._a) "haskellhaskell" "is" "fun"
Circuit
(Just Done "" "haskell")
(Just Done "" "is")
(Just Done "" "fun")
In our Circuit type, every parser is—at least so far—acting completely independently. That means one parser can fail while the others succeed:
ghci> parseCircuit circ1 "python " "is" "fun"
Circuit
(Just Fail "python " [] "Failed reading: takeWith")
(Just Done "" "is")
(Just Done "" "fun")
Let’s create another simple circuit to play with. In this one, only the first parser performs any actions. The other two are no-ops:
> circ2 :: Circuit (Parser Text) (Parser y) (Parser z)
> circ2 = Circuit
> (string (pack " with lenses") @@ _a._a)
> Nothing
> Nothing
We can compose circ1 and circ2 exactly as you would suspect. Our original string is now only a partial match:
ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell" "is" "fun"
Circuit
(Just Partial _)
(Just Done "" "is")
(Just Done "" "fun")
But this matches perfectly:
ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell with lenses" "is" "fun"
Circuit
(Just Done "" " with lenses")
(Just Done "" "is")
(Just Done "" "fun")
And this fails:
ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell without lenses" "is" "fun"
Circuit
(Just Fail " without lenses" [] "Failed reading: takeWith")
(Just Done "" "is")
(Just Done "" "fun")
We can simplify the code of circ2 even further (and make it more generic) using the pure function:
> circ3 :: Circuit (Parser Text) (Parser y) (Parser z)
> circ3 = pure (string (pack " with lenses") @@ _a) @@ _x
circ3 behaves exactly like circ2 when sequenced with circ1:
ghci> parseCircuit (circ1 *> circ3 @@ _x._a) "haskell with lenses" "is" "fun"
Circuit
(Just Done "" " with lenses")
(Just Done "" "is")
(Just Done "" "fun")
And that’s enough for today. GHC needs to rest. It’s tired.
We’ve still got so many tantalizing questions to answer:
What is that CircuitFail gizmo doing?
How do I use Alternative to branch my parser?
Can a Circuit’s parser depend on the other parsers in the Circuit?
Do fiber optic burritos taste good??!?!
Stay tuned to find out!
The typeparams package provides type lenses. Let’s combine them with Functors. Because why not?! You’ll need to have at least skimmed the linked README to understand what’s going on here.
First, enable some GHC magic:
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}
And import our libraries:
> import Control.Category
> import Prelude hiding ( (.), id, Functor(..) )
> import Data.Params
We’ll use the Either type as our main example. It’s defined as:
data Either a b = Left a | Right b
The Functor instance is pretty straightforward:
class Functor f where
fmap :: (a -> b) -> f a -> f b
instance Functor (Either a) where
fmap f (Left a) = Left a
fmap f (Right b) = Right $ f b
But this instance has a key limitation: We can map a function only over the last type parameter.
Bifunctors are the current solution to this problem. A recent, popular proposal suggested adding them to base. But this is an ad hoc solution whose application does not extend far beyond the Either type.
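For reference, the proposed Bifunctor class looks roughly like the sketch below. Notice that it hard-codes exactly two mappable parameters:
class Bifunctor f where
    bimap :: (a -> b) -> (c -> d) -> f a c -> f b d

-- the Either instance applies one function under Left and the other under Right
instance Bifunctor Either where
    bimap f _ (Left a) = Left (f a)
    bimap _ g (Right b) = Right (g b)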
Type lenses will (kinda sort of) provide a cleaner solution. That is, they fix the problem about as well as regular old lenses fix the problems of record selectors. As a bonus, we’ll get a convenient mechanism for mapping over nested Functors.
Here is the alternative definition of the Functor class using type lenses:
> class Functor lens t where
> fmap' :: a ~ GetParam lens t
> => TypeLens p lens
> -> (a -> b)
> -> t
> -> SetParam lens b t
It’s okay if you don’t understand the type signature at first glance. (That’s how you know you’re using lenses, after all!) Let’s step through it using the Either example.
The first argument is the type lens. This indicates which parameter of the type t we will be mapping over. In the Either data type, we could use the variable _a to map over the Left component or _b to map over the Right.
Next, we encounter two type families, GetParam and SetParam. These act as getters and setters at the type level. In the above example, GetParam is used to extract arbitrary type params from a type. It is defined as:
type family GetParam (p::k1) (t:: *) :: k3
type instance GetParam Param_a (Either a b) = a
type instance GetParam Param_b (Either a b) = b
The SetParam type similarly sets the type of arbitrary params in a type. It is defined as:
type family SetParam (p::k1) (a::k2) (t:: *) :: *
type instance SetParam Param_a a' (Either a b) = Either a' b
type instance SetParam Param_b b' (Either a b) = Either a b'
These instances can be automatically provided for any type by calling the mkParams template haskell function like so:
> mkParams ''Either
Quick aside: With injective type families and a little sugar, we could make this definition of Functor a tad cleaner.
We can replicate the traditional Functor instance with the code:
instance Functor (Param_b Base) (Either a b) where
fmap' lens f (Left a) = Left a
fmap' lens f (Right b) = Right $ f b
and create a “Left” Functor instance as:
instance Functor (Param_a Base) (Either a b) where
fmap' lens f (Left a) = Left $ f a
fmap' lens f (Right b) = Right b
Together, these instances let us run the commands:
ghci> fmap _b length $ Left "Roses are red,"
Left "Roses are red,"
ghci> fmap _b length $ Right "Violets are blue,"
Right 17
ghci> fmap _a length $ Left "Haskell is fun,"
Left 15
ghci> fmap _a length $ Right "Type lenses are cool."
Right "Type lenses are cool."
With the above definitions, we can’t combine our type lenses at all. Enter the funnily named and awkwardly typed zoom combinator:
zoom :: TypeLens a p -> TypeLens a (Zoom p)
This combinator lets us zoom into a composed type lens, removing the outer most layer. For example, given the composed type lens:
ghci> :t _a._b._a._b
_a._b._a._b :: TypeLens a (Param_a (Param_b (Param_a (Param_b a))))
Then zooming in removes the first _a:
ghci> :t zoom (_a._b._a._b)
zoom (_a._b._a._b) :: TypeLens a (Param_b (Param_a (Param_b a)))
We will use this combinator to redefine our Functor instances. The new instances will recursively map over every Functor in our input lens:
> instance Functor p b => Functor (Param_b p) (Either a b) where
> fmap' lens f (Left a) = Left a
> fmap' lens f (Right b) = Right $ fmap' (zoom lens) f b
>
> instance Functor p a => Functor (Param_a p) (Either a b) where
> fmap' lens f (Left a) = Left $ fmap' (zoom lens) f a
> fmap' lens f (Right b) = Right b
The type Base provides the base case of the recursion:
> instance Functor Base t where
> fmap' _ f a = f a
Now, in order to call fmap', we must compose our lens with the type lens:
_base :: TypeLens Base Base
For example:
ghci> :t _a._b._a._b._base
_a._b._a._b._base :: TypeLens Base (Param_a (Param_b (Param_a (Param_b Base))))
And we call fmap' like:
ghci> fmap' (_a._b._a._b._base) length $ Left $ Right $ Left $ Right "still simpler than the lens package ;)"
Left (Right (Left (Right 42)))
ghci> fmap' (_a._b._a._b._base) length $ Left $ Right $ Left $ Left "... for now ..."
Left (Right (Left (Left "... for now ...")))
Composing all of our lenses with _base is tedious. So let’s write a function that automates that task:
> fmap ::
> ( Functor lens t
> ) => TypeLens Base lens
> -> (GetParam lens t -> c)
> -> t
> -> SetParam lens c t
> fmap lens = fmap' (lens._base)
And we call fmap as:
ghci> fmap (_a._b._a._b) length $ Left $ Right $ Left $ Left "mwahhahahaha"
Left (Right (Left (Left "mwahhahahaha")))
We can easily define more of these new Functor instances. In fact, the procedure is exactly as mechanical for type lens based Functors as it is for the traditional Functors. All you have to do is replace every function application with a recursive Functor call:
f x --> fmap' (zoom lens) f x
Here are some examples using the list and Maybe functors:
> mkParams ''[]
> instance Functor p a => Functor (Param_a p) [a] where
> fmap' lens f [] = []
> fmap' lens f (a:as) = fmap' (zoom lens) f a : fmap' lens f as
>
> mkParams ''Maybe
> instance Functor p a => Functor (Param_a p) (Maybe a) where
> fmap' lens f Nothing = Nothing
> fmap' lens f (Just a) = Just $ fmap' (zoom lens) f a
Let’s create a variable that uses all of our functors:
> monster =
> [ Nothing
> , Just (Left "Hello!")
> , Just (Right 42)
> , Just (Left "World!")
> ]
And go to town:
ghci> fmap (_a._a._a._a) succ monster
[Nothing,Just (Left "Ifmmp\""),Just (Right 42),Just (Left "Xpsme\"")]
ghci> fmap (_a._a._a) length monster
[Nothing,Just (Left 6),Just (Right 42),Just (Left 6)]
ghci> fmap (_a._a) (const 3.4) monster
[Nothing,Just 3.4,Just 3.4,Just 3.4]
ghci> fmap _a show monster
["Nothing","Just (Left \"Hello!\")","Just (Right 42)","Just (Left \"World!\")"]
In our next installment, we’ll tackle Applicative parsing with type lenses. Thought the lens package had too many operators??? You ain’t seen nothin’ yet.
In Matthew 17:20, Jesus says:
I tell you the truth, if you have faith as small as a mustard seed, you can say to this mountain, “Move from here to there” and it will move. Nothing will be impossible for you.
This has always been a hard verse for me. My basic interpretation has always been: Either Jesus was wrong, or no one has the faith of even a mustard seed. Because I don’t see any mountains being moved.
But this passage was also recorded in the Gospel of Thomas. This version says:
If two make peace with each other in this one house, they will say to the mountain, “Move Away,” and it will move away.
What Matthew called “faith as small as a mustard seed,” Thomas calls “two making peace with each other.” It amazes me that these two phrases are used interchangeably by the early church.
(Scholars debate the exact date when the Gospel of Thomas was first recorded. Estimates vary from 40AD to 140AD. But most agree that it faithfully represents one of the early oral Christian traditions. See wikipedia for more detail.)
I am a Christian pacifist, but I still have a lot of respect for certain people in the military. This post is about how I resolve this apparent conflict using a tool called the “pacifism parallelogram.” Here’s a picture:
The blue dot in the center of the parallelogram is me (or you!). Each of the dots on the corners represent different archetypes that we can follow.
In the bottom left is Jonah. Jonah was nonviolent, but he wasn’t a very good person. When God commanded Jonah to go help Nineveh, Jonah ran away. He was too concerned about his own personal comfort and safety to think about others. When we act like Jonah, the world suffers. As the saying goes, “all it takes for evil to triumph is for good men to do nothing.”
But we can get quite a bit more evil. If we perform violent actions, we travel right in the diagram. This takes us to Judas. Judas came with armed men to capture and kill the innocent Jesus. By using violence in this way, we can bring about quite a bit more evil than by doing nothing. That is why Judas is farther down in the diagram than Jonah. The farther down you are, the more evil you are.
But not all violence is created equal. If we travel up from Judas, we get to David. David was a king of Israel and is considered one of the most righteous people of the Old Testament. He used violence to protect the innocent. When a soldier kills a suicide bomber before the bomber can kill innocent civilians, the soldier is imitating David. The soldier has risked his own safety and done a good thing.
But it was not the best thing. If we travel again to the left in the diagram we come to Jesus. Jesus did not use violence, and he embodied all that is good in the world. When the devil gave Jesus the opportunity to use violence to stop evil people (Matthew 4:1-11), Jesus chose a better path: He sacrificed himself for those evil people. He died on the cross. While violence may be useful in protecting the innocent, it is useless when saving the guilty from themselves. This is a much harder (and in the Christian perspective much more righteous) task. That is why Jesus is higher in the diagram than David.
My goal is to be as much like Jesus as possible. Here’s two examples of how we can use the parallelogram to do this:
Example 1: Let’s rethink the story of David and Goliath as told by 1 Samuel 17. The Philistines are invading Israel, and are camped inside the borders of Judah. Every day, the giant Goliath comes forward and challenges the Israelites to single combat. At this point, the Jonah option would be to hide in the ranks. Jonah would depend on someone else to save the Israelites. The Judas option would be to secretly meet with the invading army. Judas would help the Philistines kill the Jews in the hope of escaping a similar fate. Enter David. David was brave. He chose to fight Goliath single-handedly. He wanted to save his friends from doom, and this was a good thing.
But it was not the best thing. What would Jesus have done? I can’t know for sure, but I can speculate: I think Jesus would have helped the Philistines. He would have delivered them water and food. He would have healed their wounded and cared for the widows and orphans left behind. Jesus would have been willing to die not just for the Israelites (like David), but also for the Philistines. What greater love is there than that!?
Example 2: The parallelogram has informed my personal development as a Christian. (1) Like most adolescents, I had no desire to risk my own safety for others. I didn’t stand up for the weird kids when the bullies picked on them. I followed Jonah. (2) This changed after September 11th. Around that time, I started taking my Christian faith seriously. The World Trade Center attacks taught me that there is evil in the world, and Christ showed me that this was not how the world was meant to be. I decided to do my best to fix the world, so I joined the Navy. David became my role model. (3) But David couldn’t heal my broken soul. I thought I could be the world’s savior, but only Jesus can do that. So I recommitted myself to Christ and decided to take his teaching to “turn the other cheek” seriously. (There’s a lot more to this transformation, and you can read about it here.)
In graphical form:
Notice that I don’t consider myself more righteous than David. In fact, I firmly believe that there have been violent people more righteous than I am! Nonetheless, my calling is to be like Jesus. That means striving for something better than David. My new goal is to follow the dotted line… to change the world by offering myself as a living sacrifice.
I fail every day. But with Christ’s grace, I find renewed strength to keep trying. That is why I call myself a Christian pacifist.
Functors and monads are powerful design patterns used in Haskell. They give us two cool tricks for analyzing data. First, we can “preprocess” data after we’ve already trained a model. The model will be automatically updated to reflect the changes. Second, this whole process happens asymptotically faster than the standard method of preprocessing. In some cases, you can do it in constant time no matter how many data points you have!
This post focuses on how to use functors and monads in practice with the HLearn library. We won’t talk about their category theoretic foundations; instead, we’ll go through ten concrete examples involving the categorical distribution. This distribution is somewhat awkwardly named for our purposes because it has nothing to do with category theory: it is the most general distribution over non-numeric (i.e. categorical) data. Its simplicity should make the examples a little easier to follow. Some more complicated models (e.g. the kernel density estimator and Bayesian classifier) also have functor and monad instances, but we’ll save those for another post.
Before we dive into using functors and monads, we need to set up our code and create some data. Let’s install the packages:
$ cabal install HLearn-distributions-1.1.0.1
Import our modules:
> import Control.ConstraintKinds.Functor
> import Control.ConstraintKinds.Monad
> import Prelude hiding (Functor(..), Monad (..))
>
> import HLearn.Algebra
> import HLearn.Models.Distributions
For efficiency reasons we’ll be using the Functor and Monad instances provided by the ConstraintKinds package and language extension. From the user’s perspective, everything works the same as normal monads.
Now let’s create a simple marble data type, and a small bag of marbles for our data set.
> data Marble = Red | Pink | Green | Blue | White
> deriving (Read,Show,Eq,Ord)
>
> bagOfMarbles = [ Pink,Green,Red,Blue,Green,Red,Green,Pink,Blue,White ]
This is a very small data set just to make things easy to visualize. Everything we’ll talk about works just as well on arbitrarily large data sets.
We train a categorical distribution on this data set using the train function:
> marblesDist = train bagOfMarbles :: Categorical Double Marble
The Categorical type takes two parameters. The first is the type of our probabilities, and the second is the type of our data points. If you stick your hand into the bag and draw a random marble, this distribution tells you the probability of drawing each color.
Let’s plot our distribution:
ghci> plotDistribution (plotFile "marblesDist" $ PNG 400 300) marblesDist
Okay. Now we’re ready for the juicy bits. We’ll start by talking about the list functor. This will motivate the advantages of the categorical distribution functor.
A functor is a container that lets us “map” a function onto every element of the container. Lists are a functor, and so we can apply a function to our data set using the map function:
map :: (a -> b) -> [a] -> [b]
Example 1:
Let’s say instead of a distribution over the marbles’ colors, I want a distribution over the marbles’ weights. I might have a function that associates a weight with each type of marble:
> marbleWeight :: Marble -> Int -- weight in grams
> marbleWeight Red = 3
> marbleWeight Pink = 2
> marbleWeight Green = 3
> marbleWeight Blue = 6
> marbleWeight White = 2
I can generate my new distribution by first transforming my data set, and then training on the result. Notice that the type of our distribution has changed. It is no longer a categorical distribution over marbles; it’s a distribution over ints.
> weightsDist = train $ map marbleWeight bagOfMarbles :: Categorical Double Int
ghci> plotDistribution (plotFile "weightsDist" $ PNG 400 300) weightsDist
This is the standard way of preprocessing data. But we can do better because the categorical distribution is also a functor. Functors have a function called fmap that is analogous to calling map on a list. This is its type signature specialized for the Categorical type:
fmap :: (Ord dp0, Ord dp1) => (dp0 -> dp1) -> Categorical prob dp0 -> Categorical prob dp1
We can use fmap to apply the marbleWeights function directly to the distribution:
> weightDist' = fmap marbleWeight marblesDist
This is guaranteed to generate the same exact answer, but it is much faster. It takes only constant time to call Categorical’s fmap, no matter how much data we have!
Let me put that another way. Below is a diagram showing the two possible ways to generate a model on a preprocessed data set. Every arrow represents a function application.
The normal way to preprocess data is to take the bottom left path. But because our model is a functor, the top right path becomes available. This path is better because it has the shorter run time.
Furthermore, let’s say we want to experiment with \(k\) different preprocessing functions. The standard method will take \(\Theta(nk)\) time, whereas using the categorical functor takes time \(\Theta(n + k)\).
Note: The diagram treats the number of different categories (m) as a constant because it doesn’t depend on the number of data points. In our case, we have 5 types of marbles, so m=5. Every function call in the diagram is really multiplied by m.
Example 2:
For another example, what if we don’t want to differentiate between red and pink marbles? The following function converts all the pink marbles to red.
> pink2red :: Marble -> Marble
> pink2red Pink = Red
> pink2red dp = dp
Let’s apply it to our distribution, and plot the results:
> nopinkDist = fmap pink2red marblesDist
ghci> plotDistribution (plotFile "nopinkDist" $ PNG 400 300) nopinkDist
That’s about all that a Functor can do by itself. When we call fmap, we can only process individual data points. We can’t change the number of points in the resulting distribution or do other complex processing. Monads give us this power.
Monads are functors with two more functions. The first is called return. Its type signature is
return :: (Ord dp) => dp -> Categorical prob dp
We’ve actually seen this function already in previous posts. It’s equivalent to the train1dp function found in the HomTrainer type class. All it does is train a categorical distribution on a single data point.
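For example, these two expressions build the exact same singleton distribution (redSingleton is just a name I made up for this sketch):
> redSingleton :: Categorical Double Marble
> redSingleton = return Red -- identical to: train1dp Red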
The next function is called join. It’s a little bit trickier, and it’s where all the magic lies. Its type signature is:
join :: (Ord dp) => Categorical prob (Categorical prob dp) -> Categorical prob dp
As input, join takes a categorical distribution whose data points are other categorical distributions. It then “flattens” the distribution into one that does not take other distributions as input.
Example 3
Let’s write a function that removes all the pink marbles from our data set. Whenever we encounter a pink marble, we’ll replace it with an empty categorical distribution; if the marble is not pink, we’ll create a singleton distribution from it.
> forgetPink :: (Num prob) => Marble -> Categorical prob Marble
> forgetPink Pink = mempty
> forgetPink dp = train1dp dp
>
> nopinkDist2 = join $ fmap forgetPink marblesDist
ghci> plotDistribution (plotFile "nopinkDist2" $ PNG 400 300) nopinkDist2
This idiom of join (fmap …) is used a lot. For convenience, the >>= operator (called bind) combines these steps for us. It is defined as:
(>>=) :: Categorical prob dp0 -> (dp0 -> Categorical prob dp1) -> Categorical prob dp1
dist >>= f = join $ fmap f dist
Under this notation, our new distribution can be defined as:
> nopinkDist2' = marblesDist >>= forgetPink
Example 4
Besides removing data points, we can also add new ones. Let’s double the number of pink marbles in our training data:
> doublePink :: (Num prob) => Marble -> Categorical prob Marble
> doublePink Pink = 2 .* train1dp Pink
> doublePink dp = train1dp dp
>
> doublepinkDist = marblesDist >>= doublePink
ghci> plotDistribution (plotFile "doublepinkDist" $ PNG 400 300) doublepinkDist
Example 5
Mistakes are often made when collecting data. One common machine learning task is to preprocess data sets to account for these mistakes. In this example, we’ll assume that our sampling process suffers from uniform noise. Specifically, if one of our data points is red, we will assume there is only a 60% chance that the marble was actually red, and a 10% chance each that it was one of the other colors. We will define a function to add this noise to our data set, increasing the accuracy of our final distribution.
Notice that we are using fractional weights for our noise, and that the weights are carefully adjusted so that the total number of marbles in the distribution still sums to one. We don’t want to add or remove marbles while adding noise.
> addNoise :: (Fractional prob) => Marble -> Categorical prob Marble
> addNoise dp = 0.5 .* train1dp dp <> 0.1 .* train [ Red,Pink,Green,Blue,White ]
>
> noiseDist = marblesDist >>= addNoise
ghci> plotDistribution (plotFile "noiseDist" $ PNG 400 300) noiseDist
Adding uniform noise just made all our probabilities closer together.
Example 6
Of course, the amount of noise we add to each sample doesn’t have to be the same everywhere. If I suffer from red-green color blindness, then I might use this as my noise function:
> rgNoise :: (Fractional prob) => Marble -> Categorical prob Marble
> rgNoise Red = trainW [(0.7,Red),(0.3,Green)]
> rgNoise Green = trainW [(0.1,Red),(0.9,Green)]
> rgNoise dp = train1dp dp
>
> rgNoiseDist = marblesDist >>= rgNoise
ghci> plotDistribution (plotFile "rgNoiseDist" $ PNG 400 300) rgNoiseDist
Because of my color blindness, the probability of drawing a red marble from the bag is higher than drawing a green marble. This is despite the fact that we observed more green marbles in our training data.
Example 7
In the real world, we can never know exactly how much error we have in the samples. Luckily, we can try to learn it by conducting a second experiment. We’ll first experimentally determine how red-green color blind I am, then we’ll use that to update our already trained distribution.
To determine the true error rate, we need some unbiased source of truth. In this case, we can just use someone with good vision. They will select ten red marbles and ten green marbles, and I will guess what color they are.
Let’s train a distribution on what I think green marbles look like:
> greenMarbles = [Green,Red,Green,Red,Green,Red,Red,Green,Green,Green]
> greenDist = train greenMarbles :: Categorical Double Marble
and what I think red marbles look like:
> redMarbles = [Red,Green,Red,Green,Red,Red,Green,Green,Red,Red]
> redDist = train redMarbles :: Categorical Double Marble
Now we’ll create the noise function based off of our empirical data. The (/.) function is scalar division, and we can use it because the categorical distribution is a vector space. We’re dividing by the number of data points in the distribution so that the distribution we output has an effective training size of one. This ensures that we’re not accidentally creating new data points when applying our function to another distribution.
> rgNoise2 :: Marble -> Categorical Double Marble
> rgNoise2 Green = greenDist /. numdp greenDist
> rgNoise2 Red = redDist /. numdp redDist
> rgNoise2 dp = train1dp dp
>
> rgNoiseDist2 = marblesDist >>= rgNoise2
ghci> plotDistribution (plotFile "rgNoiseDist2" $ PNG 400 300) rgNoiseDist2
Example 8
We can chain our preprocessing functions together in arbitrary ways.
> allDist = marblesDist >>= forgetPink >>= addNoise >>= rgNoise
ghci> plotDistribution (plotFile "allDist" $ PNG 400 300) allDist
But wait! Where’d that pink come from? Wasn’t the call to forgetPink supposed to remove it? The answer is that we did remove it, but then we added it back in with our noise functions. When using monadic functions, we must be careful about the order we apply them in. This is just as true when using regular functions.
Here’s another distribution created from those same functions in a different order:
> allDist2 = marblesDist >>= addNoise >>= rgNoise >>= forgetPink
ghci> plotDistribution (plotFile "allDist2" $ PNG 400 300) allDist2
We can also use Haskell’s do notation to accomplish the same exact thing:
> allDist2' :: Categorical Double Marble
> allDist2' = do
>     dp <- train bagOfMarbles
>     dp <- addNoise dp
>     dp <- rgNoise dp
>     dp <- forgetPink dp
>     return dp
(Since we’re using a custom Monad definition, do notation requires the RebindableSyntax
extension.)
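Concretely, that means adding one pragma to the top of the file:
{-# LANGUAGE RebindableSyntax #-}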
Example 9
Do notation gives us a convenient way to preprocess multiple data sets into a single data set. Let’s create two new data sets and their corresponding distributions for us to work with:
> bag1 = [Red,Pink,Green,Blue,White]
> bag2 = [Red,Blue,White]
>
> bag1dist = train bag1 :: Categorical Double Marble
> bag2dist = train bag2 :: Categorical Double Marble
Now, we’ll create a third data set that is a weighted combination of bag1 and bag2. We will do this by repeated sampling. On every iteration, with a 20% probability we’ll sample from bag1, and with an 80% probability we’ll sample from bag2. Imperative pseudo-code for this algorithm is:
let comboDist be an empty distribution
loop until desired accuracy achieved:
    let r be a random number from 0 to 1
    if r < 0.2:
        sample dp1 from bag1
        add dp1 to comboDist
    else:
        sample dp2 from bag2
        add dp2 to comboDist
This sampling procedure will obviously not give us an exact answer. But since the categorical distribution supports weighted data points, we can use this simpler pseudo-code to generate an exact answer:
let comboDist be an empty distribution
foreach datapoint dp1 in bag1:
    foreach datapoint dp2 in bag2:
        add dp1 with weight 0.2 to comboDist
        add dp2 with weight 0.8 to comboDist
Using do notation, we can express this as:
> comboDist :: Categorical Double Marble
> comboDist = do
>     dp1 <- bag1dist
>     dp2 <- bag2dist
>     trainW [(0.2,dp1),(0.8,dp2)]
ghci> plotDistribution (plotFile "comboDist" $ PNG 400 300) comboDist
And because the Categorical monad’s operations run in time independent of the number of training points, constructing comboDist takes constant time. The naive imperative algorithm would have taken time \(O(|\text{bag1}|*|\text{bag2}|)\).
When combining multiple distributions this way, the number of data points in our final distribution will be the product of the number of data points in the initial distributions:
ghci> numdp comboDist
15
Example 10
Finally, arbitrarily complex preprocessing functions can be written using Haskell’s do notation. And remember, no matter how complicated these functions are, their run time never depends on the number of elements in the initial data set.
This function adds uniform sampling noise to our bagOfMarbles, but only on those marbles that are also contained in bag2 above.
> comboDist2 :: Categorical Double Marble
> comboDist2 = do
>     dp1 <- marblesDist
>     dp2 <- bag2dist
>     if dp1==dp2
>         then addNoise dp1
>         else return dp1
ghci> plotDistribution (plotFile "comboDist2" $ PNG 400 300) comboDist2
This application of monads to machine learning generalizes the monad used in probabilistic functional programming. The main difference is that PFP focused on manipulating already known distributions, not training them from data. Also, if you enjoy this kind of thing, you might be interested in the n-category cafe discussion on category theory in machine learning from a few years back.
In future posts, we’ll look at functors and monads for continuous distributions, multivariate distributions, and classifiers.
Subscribe to the RSS feed to stay tuned!
For $5 at Lowes, I built an “external wort chiller.” I’ve never seen any other homebrewers with this setup, so I figured I’d post my results on the internet. With only the standard wort chiller it takes about 30 minutes to cool our wort from boiling to yeast pitching temperature (85F for us). With the external chiller, it took just under 20 minutes. This may seem like a long time to you, but we live in the middle of a desert. Ambient temperature is often over 100F on a summer brew day.
Here’s a picture of our full chiller assembly. The internal chiller is on the right, and the external chiller is on the left.
The external chiller just sits around the outside of the boil pot. The pot’s handles keep the coil in place:
When we start the cooldown, water flows through the internal chiller, then through the external chiller. The external chiller has a number of holes cut into it. Water sprays out of these holes and onto the outside of the pot:
This dramatically increases the surface area of water cooling. Heat is being transferred not just at the internal coils, but also along the whole pot. Here’s a zoomed out picture of the whole thing in action:
We only had to buy a 5 foot section of copper coil to wrap around the pot, and this cost about $5 at Lowes. I used a dremel to cut slots in the copper tubing about every 4 inches.
This external chilling reduced our cooling times from just over 30 minutes to just under 20 minutes. It was definitely worth the investment.
Below is the text of a document I prepared when filing for conscientious objector status to leave the navy. I’ve published it at the request of other conscientious objectors going through the discharge process.
I joined the Navy because I wanted to serve my country. My religious beliefs no longer allow me to kill, but I still want to serve. Service, in fact, is an integral part of my beliefs. My country has given me a lot. I value the ideas of freedom and democracy. I want to give everything I have to my country and the ideals for which it stands. Ideally, I would serve in a capacity that maximizes the peace and welfare of the United States, but minimizes my contribution to war. I believe these goals are not mutually exclusive. This document explores how well my service options meet these goals, both inside and outside the military. This will explain my decision not to apply for noncombatant (1-A-0) status.
All billets in the military are designed to maximize the security of the United States, and these billets contribute to war in varying degrees. If a billet existed which did not contribute to war in any way, I would gladly volunteer for it. No matter how dangerous, difficult, time-consuming, or otherwise undesirable the job may be, I would enthusiastically perform this job to the best of my abilities. I cannot know every billet available, but I do know what communities exist. The Navy’s officer community is divided into four main groups: unrestricted line officers, restricted line officers, special duty officers, and the staff corps. I will classify these communities depending on whether they present high, medium, or low conflict with my beliefs.
I will demonstrate that had I applied for noncombatant (1-A-0) status, I would still be placed in a billet which conflicts with my beliefs. According to regulation MILPERSMAN 1900-020, a noncombatant can be assigned to serve “on board an armed ship or aircraft in a combat zone provided the member is not personally and directly involved in the operation of weapons.” For example, as a nuclear trained officer, I could be assigned to operate the nuclear propulsion system for an aircraft carrier. I would not be the individual delivering bombs to their targets, so according to regulations I would not be responsible. But according to my conscience I would still be responsible.
Most of the Navy’s communities are primarily warfare related. These communities provide the maximum conflict with my convictions. The unrestricted line officers form the heart of the Navy. Their duties involve training for war, and conducting war once begun. This directly goes against my nonviolent religious beliefs. These communities include:
Surface warfare
Submarine warfare
Naval aviation
Naval flight officers
Special warfare
Notably, by the definition of a 1-A-0 noncombatant I could still be billeted within these high conflict communities.
Even if I were guaranteed a billet not in the high conflict communities, all naval communities present at least a medium conflict with my beliefs. They all participate in war indirectly because their missions are to make the warfighters more effective. The navy divides these medium conflict communities into three categories: restricted line officers, special duty officers, and staff corps.
Restricted line officers prepare the Navy for warfare. Without their support, the fighting elements of the Navy could not complete their missions. Therefore, these communities still provide significant conflict with my nonviolent religious beliefs. These communities include:
Human Resources Officers “plan, program and execute life-cycle management of our Navy’s most important resource – people.”
Nuclear Propulsion Training officers teach students the fundamentals of nuclear propulsion. The purpose of this training is to qualify students to drive ships, and it is a critical part of their preparation for war.
Naval Reactors Engineers ensure the safe and reliable operation of the Navy’s nuclear propulsion plants. This ensures the combat readiness of the Navy’s submarine force and aircraft carriers.
Engineering Duty Officers design, construct, and maintain the Navy’s ships. These ships are designed around their capabilities to project power and deliver weapons systems to enemy targets.
Aerospace Engineering Duty officers perform a similar role for the Navy’s airplanes.
Foreign Area Officers “manage and analyze politico-military activities overseas.”
Special duty officers are similar to restricted line officers in that they are usually only indirectly involved in warfare. These communities include:
Intelligence officers provide “tactical, operational and strategic intelligence support to U.S. naval forces, joint services, multi-national forces, and executive level decision-makers.”
Public Affairs are responsible for projecting a good moral image of the Navy’s warfighting
Recruiters convince young men and women to join the warfighting elements of the navy
Fleet Support officers provide engineering assistance to warfighting units
Meteorology/Oceanography officers “collect, analyze, and distribute data about the ocean and the atmosphere to Navy forces operating all over the world. They assist the war fighter in taking tactical advantage of the environment.”
Other special duty communities include:
Information Professionals maintain the electronic equipment aboard naval installations
Information Warfare officers “deliver overwhelming information superiority that successfully supports command objectives… And ultimately, providing war-fighters, planners and policy makers with real-time warning, offensive opportunities and an ongoing operational advantage.”
Cyber Warfare Engineers conduct electronic attacks
Staff corps officers are like special duties officers that require special training. They include doctors and JAGs. I would not be qualified for any of these billets.
I believe there are many opportunities outside the military that would allow me to serve in a manner consistent with my beliefs. Should I be given a discharge, I will pursue such service. I would gladly accept as a condition of my discharge some other type of obligated service. Many conscientious objectors in the past have served honorably in government service. They have volunteered to restore national parks, serve in psychiatric wards, and even have medical experiments conducted on themselves. The smoke jumpers—an elite group of firefighters who parachute into blazing fires—were founded by conscientious objectors.
I have received training that can be utilized nonviolently in two areas: computer science and nuclear power. This training can be used nonviolently to promote the effective defense of the United States.
In a defensive capacity, my computer science training could be used to safeguard electronic systems against attack. Criminal organizations routinely target government electronic infrastructure. Sometimes they are looking for specific information; sometimes they simply want to cause disruptions. I have significant experience protecting electronic assets. I would proudly serve in a role where I would harden the United States government and infrastructure against such threats.
In a defensive capacity, my nuclear training could be used to reduce the threat of nuclear weapons. The current administration has expressed an intent to reduce the nation’s nuclear arsenal. I could apply my nuclear training with the Department of Energy to support this effort.
All available billets within the Navy present high conflict with my belief in Jesus. I therefore cannot apply for noncombatant 1-A-0 status. But there are many other roles within the federal government for which I am highly qualified and which present no such conflict. I would gladly serve in such a capacity, no matter how difficult or dangerous the job may be.
Haskell code is expressive. The HLearn library uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the Weka library uses over 100 lines of Java. That’s a big difference! In this post, we’ll look at the actual code and see why the Haskell is so much more concise.
But first, a disclaimer: It is really hard to fairly compare two code bases this way. In both libraries, there is a lot of supporting code that goes into defining each classifier, and it’s not obvious what code to include and not include. For example, both libraries implement interfaces to a number of probability distributions, and this code is not contained in the source count. The Haskell code takes more advantage of this abstraction, so this is one language-agnostic reason why the Haskell code is shorter. If you think I’m not doing a fair comparison, here are links to the full repositories so you can do it yourself:
HLearn’s bayesian classifier source code (74 lines of code)
Weka’s naive bayes source code (946 lines of code)
HLearn implements training for a bayesian classifier with these six lines of Haskell:
newtype Bayes labelIndex dist = Bayes dist
    deriving (Read,Show,Eq,Ord,Monoid,Abelian,Group)

instance (Monoid dist, HomTrainer dist) => HomTrainer (Bayes labelIndex dist) where
    type Datapoint (Bayes labelIndex dist) = Datapoint dist
    train1dp dp = Bayes $ train1dp dp
This code elegantly captures how to train a Bayesian classifier—just train a probability distribution. Here’s an explanation:
The first two lines define the Bayes data type as a wrapper around a distribution.
The fourth line says that we’re implementing the Bayesian classifier using the HomTrainer type class. We do this because the Haskell compiler automatically generates a parallel batch training function, an online training function, and a fast cross-validation function for all HomTrainer instances.
The fifth line says that our data points have the same type as the underlying distribution.
The sixth line says that in order to train, just train the corresponding distribution.
We only get the benefits of the HomTrainer type class because the bayesian classifier is a monoid. But we didn’t even have to specify what the monoid instance for bayesian classifiers looks like! In this case, it’s automatically derived from the monoid instances for the base distributions using a language extension called GeneralizedNewtypeDeriving. For examples of these monoid structures, check out the algebraic structure of the normal and categorical distributions, or more complex distributions using Markov networks.
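To see why a monoid buys us so much, here is a stripped-down sketch of the idea behind HomTrainer (the class and method names below are made up for illustration; HLearn’s real class provides more, including the parallel and online trainers mentioned above). Given a single-point trainer and a monoid, the batch trainer is just a monoid reduction:
{-# LANGUAGE TypeFamilies #-}

-- A toy version of the HomTrainer idea.
class (Monoid model) => SimpleTrainer model where
    type Point model
    trainPoint :: Point model -> model

    -- The batch trainer comes for free as a monoid reduction.
    trainBatch :: [Point model] -> model
    trainBatch = mconcat . map trainPoint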
Look for these differences between the HLearn and Weka source:
In Weka we must separately define the online and batch trainers, whereas Haskell derived these for us automatically.
Weka must perform a variety of error handling that Haskell’s type system takes care of in HLearn.
The Weka code is tightly coupled to the underlying probability distribution, whereas the Haskell code was generic enough to handle any distribution. This means that while Weka must make the “naive bayes assumption” that all attributes are independent of each other, HLearn can support any dependence structure.
Weka’s code is made more verbose by for loops and if statements that aren’t necessary for HLearn.
The Java code requires extensive comments to maintain readability, but the Haskell code is simple enough to be self-documenting (at least once you know how to read Haskell).
Weka does not have parallel training, fast cross-validation, data point subtraction, or weighted data points, but HLearn does.
For comparison, here is Weka’s batch training code:
/**
* Generates the classifier.
*
* @param instances set of instances serving as training data
* @exception Exception if the classifier has not been generated
* successfully
*/
public void buildClassifier(Instances instances) throws Exception {

  // can classifier handle the data?
  getCapabilities().testWithFail(instances);

  // remove instances with missing class
  instances = new Instances(instances);
  instances.deleteWithMissingClass();

  m_NumClasses = instances.numClasses();

  // Copy the instances
  m_Instances = new Instances(instances);

  // Discretize instances if required
  if (m_UseDiscretization) {
    m_Disc = new weka.filters.supervised.attribute.Discretize();
    m_Disc.setInputFormat(m_Instances);
    m_Instances = weka.filters.Filter.useFilter(m_Instances, m_Disc);
  } else {
    m_Disc = null;
  }

  // Reserve space for the distributions
  m_Distributions = new Estimator[m_Instances.numAttributes() - 1]
                                 [m_Instances.numClasses()];
  m_ClassDistribution = new DiscreteEstimator(m_Instances.numClasses(), true);

  int attIndex = 0;
  Enumeration enu = m_Instances.enumerateAttributes();
  while (enu.hasMoreElements()) {
    Attribute attribute = (Attribute) enu.nextElement();

    // If the attribute is numeric, determine the estimator
    // numeric precision from differences between adjacent values
    double numPrecision = DEFAULT_NUM_PRECISION;
    if (attribute.type() == Attribute.NUMERIC) {
      m_Instances.sort(attribute);
      if ((m_Instances.numInstances() > 0)
          && !m_Instances.instance(0).isMissing(attribute)) {
        double lastVal = m_Instances.instance(0).value(attribute);
        double currentVal, deltaSum = 0;
        int distinct = 0;
        for (int i = 1; i < m_Instances.numInstances(); i++) {
          Instance currentInst = m_Instances.instance(i);
          if (currentInst.isMissing(attribute)) {
            break;
          }
          currentVal = currentInst.value(attribute);
          if (currentVal != lastVal) {
            deltaSum += currentVal - lastVal;
            lastVal = currentVal;
            distinct++;
          }
        }
        if (distinct > 0) {
          numPrecision = deltaSum / distinct;
        }
      }
    }

    for (int j = 0; j < m_Instances.numClasses(); j++) {
      switch (attribute.type()) {
        case Attribute.NUMERIC:
          if (m_UseKernelEstimator) {
            m_Distributions[attIndex][j] = new KernelEstimator(numPrecision);
          } else {
            m_Distributions[attIndex][j] = new NormalEstimator(numPrecision);
          }
          break;
        case Attribute.NOMINAL:
          m_Distributions[attIndex][j] =
            new DiscreteEstimator(attribute.numValues(), true);
          break;
        default:
          throw new Exception("Attribute type unknown to NaiveBayes");
      }
    }
    attIndex++;
  }

  // Compute counts
  Enumeration enumInsts = m_Instances.enumerateInstances();
  while (enumInsts.hasMoreElements()) {
    Instance instance = (Instance) enumInsts.nextElement();
    updateClassifier(instance);
  }

  // Save space
  m_Instances = new Instances(m_Instances, 0);
}
And the code for online learning is:
/**
* Updates the classifier with the given instance.
*
* @param instance the new training instance to include in the model
* @exception Exception if the instance could not be incorporated in
* the model.
*/
public void updateClassifier(Instance instance) throws Exception {

  if (!instance.classIsMissing()) {
    Enumeration enumAtts = m_Instances.enumerateAttributes();
    int attIndex = 0;
    while (enumAtts.hasMoreElements()) {
      Attribute attribute = (Attribute) enumAtts.nextElement();
      if (!instance.isMissing(attribute)) {
        m_Distributions[attIndex][(int) instance.classValue()]
          .addValue(instance.value(attribute), instance.weight());
      }
      attIndex++;
    }
    m_ClassDistribution.addValue(instance.classValue(), instance.weight());
  }
}
Every algorithm implemented in HLearn uses similarly concise code. I invite you to browse the repository and see for yourself. The most complicated algorithm is for Markov chains, which uses only 6 lines for training and about 20 for defining the Monoid.
You can expect lots of tutorials on how to incorporate the HLearn library into Haskell programs over the next few months.
Weka is one of the most popular tools for data analysis. But Weka takes 70 minutes to perform leave-one-out cross-validation using a simple naive bayes classifier on the census income data set, whereas Haskell’s HLearn library only takes 9 seconds. Weka is 465x slower!
Code and instructions for reproducing these experiments are available on github.
Why is HLearn so much faster?
Well, it turns out that the bayesian classifier has the algebraic structure of a monoid, a group, and a vector space. HLearn uses a new cross-validation algorithm that can exploit these algebraic structures. The standard algorithm runs in time \(\Theta(kn)\), where \(k\) is the number of “folds” and \(n\) is the number of data points. The algebraic algorithms, however, run in time \(\Theta(n)\). In other words, the run time doesn’t depend on the number of folds at all! And not only are we faster, but we get the exact same answer. Algebraic cross-validation is not an approximation, it’s just fast.
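To give a flavor of how the factor of \(k\) disappears, here is a sketch of monoidal leave-one-out cross-validation (the idea only, with made-up names; the real algorithms are in the papers). We train one small model per data point, then combine prefix and suffix partial models, so the model that excludes point \(i\) is assembled with a single monoid operation:
import Data.Monoid ((<>))

-- For each i, produce a model trained on every data point except i,
-- using only Theta(n) monoid operations in total.
loocvModels :: (Monoid model) => (dp -> model) -> [dp] -> [model]
loocvModels train1dp xs = zipWith (<>) prefixes suffixes
  where
    ms       = map train1dp xs
    prefixes = scanl (<>) mempty ms        -- models of everything before i
    suffixes = tail (scanr (<>) mempty ms) -- models of everything after i
And because the bayesian classifier is also a group, there is an even simpler route: train on all the data once, then subtract each held-out point.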
Here’s some run times for k-fold cross-validation on the census income data set. Notice that HLearn’s run time is constant as we add more folds.
And when we set k=n, we have leave-one-out cross-validation. Notice that Weka’s cross-validation has quadratic run time, whereas HLearn has linear run time.
HLearn certainly isn’t going to replace Weka any time soon, but it’s got a number of cool tricks like this going on inside. If you want to read more, check out my two recent papers on the topic.
I’ll continue to write more about these tricks in future blog posts.
I really like the classic passage from Isaiah 2:4,
Nations will beat their swords into plowshares and their spears into pruning hooks. Nation will not take up sword against nation, nor will they train for war anymore.
This passage inspired me to convert this former AK-47 into a serving ladle.
This fully automatic AK-47 was used by the Romanian army during the Cold War. It could shoot 600 rounds per minute with an effective range of 400 yards. After the Cold War, the Romanians sold these rifles as “parts kits” for hobbyists to build their own rifle. I bought one while I was in the navy, but now I wanted to try doing something a little cooler with it.
Here it is fully disassembled:
This is a closeup of the barrel assembly that I actually made into a spoon. I’m still looking for ideas on what to make out of everything else.
Here’s a closeup of the end where the bullet enters the barrel. On top is the “rear sights.” This is adjustable to shoot at targets anywhere from 0-800 meters away.
Notice that there’s actually many pieces of metal here—the barrel itself and two large blocks of steel attached to it. These blocks are held in place by “pinions.” These are the circle shaped pieces. If I had a hydraulic press I could push out the pinions and then remove the big metal blocks. But I don’t, so I’ll just beat them off with my forging hammer!
Looking down the barrel. This is where we would insert the bullet when firing.
Now for the business end. Here I have the flash suppressor removed. The forward sight (the tall thing jutting out to the right) will make a great handle for the future ladle.
In order to turn this chunk of metal into a spoon, I needed to build a forge. I bought a basic anvil and 5 lbs hammer on the internet for about $80, and the stump came from craigslist for free:
Next, I needed a way to heat the metal. I decided to build a propane forge, because I already had a good burner. The burner came out of a portable stove that we’ve used to serve free chili with a group called Food Not Bombs. The base of the stove is below the bricks, but I unscrewed the burner and placed it on the bricks.
Then, I simply stacked the bricks on top of each other to create a nice chamber for the flames. Here’s a picture of a test run with a piece of rebar.
The flame is actually gigantic, shooting 1-2 feet past the opening of the forge. The middle of the rebar is a bright glowing orange-yellow, over 2000 degrees Fahrenheit. My camera just can’t do it justice.
The bricks are also really cheap. I paid 25 cents each at Lowes. Overall, the whole setup cost less than $100.
Let the hammering begin! You have to work quickly to hit the metal while it’s still hot from the forge!
Oooohhh…. glowing…… purty…..
After just a few hammer blows, look how much the blocks have shrunk. Also, I did NOT have a flash for this. The metal is so hot it’s putting off enough light to light up the wall I’m holding it next to!
I had to stop hammering after about an hour because of blisters. This is what I was starting with on the next day. Notice that you can still make out the serial number!
And here’s a view from the bottom up.
After a few more hours hammering, everything’s much flatter. The serial number has long since been flattened away.
Here’s the same view from the bottom. It looks like a ghost!
The layers of metal from the attached blocks are quite distinct and starting to get in the way.
After a lot of wrestling with some pliers, I’ve finally managed to remove the blocks of metal. All that remains is this “shrapnel.”
And the gun barrel itself is now the world’s coolest spatula.
And from the bottom:
All that’s left is to turn this spatula into a spoon. The metal is still pretty thick, so I flattened it out as much as I could.
Then I made a spoon shape. I did this by just holding the spatula at a slight angle while hammering. Every blow bent the metal just a little bit until the full bowl shape was complete.
I made my spoon about 2 inches too long. Whoops! No worries, I used a Dremel to cut the extra bits off. I also used it to smooth around some of the edges.
Notice that there’s a little hole in the middle of the spoon. I accidentally hammered the steel too thin and went all the way through. Meh. It’s still good. It’s just a straining spoon now!
There’s also a little bit of burnt steel along the sides. Seriously?! Steel can burn? Thankfully, a soak in vinegar and scrubbing brought it out. Thanks to the folks at iforgeiron.com for giving me the tip!
Here’s the final product:
This was my first project with a forge, so things didn’t turn out perfect. But I learned a lot and am still quite pleased with the result.
I wish I had some pictures of me eating from the spoon, but unfortunately it’s not food safe. Real gunpowder has been detonated countless times inside this barrel. I tried cleaning it as best as I could, but I’m pretty sure there’s plenty of little cancer molecules still hanging out in there.
In this post, we’re going to look at how to manipulate multivariate distributions in Haskell’s HLearn library. There are many ways to represent multivariate distributions, but we’ll use a technique called Markov networks. These networks have the algebraic structure called a monoid (and group and vector space), and training them is a homomorphism. Despite the scary names, these mathematical structures make working with our distributions really easy and convenient—they give us online and parallel training algorithms “for free.” If you want to go into the details of how, you can check out my TFP13 submission, but in this post we’ll ignore those mathy details to focus on how to use the library in practice. We’ll use a running example of creating a distribution over characters in the show Futurama.
As usual, this post is a literate haskell file. To run this code, you’ll need to install the hlearn-distributions package. This package requires GHC version at least 7.6.
bash> cabal install hlearn-distributions-1.1
Now for some code. We start with our language extensions and imports:
>{-# LANGUAGE DataKinds #-}
>{-# LANGUAGE TypeFamilies #-}
>{-# LANGUAGE TemplateHaskell #-}
>
>import HLearn.Algebra
>import HLearn.Models.Distributions
Next, we’ll create data type to represent Futurama characters. There are a lot of characters, so we’ll need to keep things pretty organized. The data type will have a record for everything we might want to know about a character. Each of these records will be one of the variables in our multivariate distribution, and all of our data points will have this type.
>data Character = Character
> { _name :: String
> , _species :: String
> , _job :: Job
> , _isGood :: Maybe Bool
> , _age :: Double -- in years
> , _height :: Double -- in feet
> , _weight :: Double -- in pounds
> }
> deriving (Read,Show,Eq,Ord)
>
>data Job = Manager | Crew | Henchman | Other
> deriving (Read,Show,Eq,Ord)
Now, in order for our library to be able to interpret the Character type, we call the template haskell function:
>makeTypeLenses ''Character
This function creates a bunch of data types and type classes for us. These “type lenses” give us a type-safe way to reference the different variables in our multivariate distribution. We’ll see how to use these type-level lenses a bit later. There’s no need to understand what’s going on under the hood, but if you’re curious, check out the hackage documentation or source code.
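As a rough picture of what gets generated (the names below follow the naming pattern, but the real generated code also includes supporting instances), makeTypeLenses creates one singleton type per record:
data TH_name    = TH_name
data TH_species = TH_species
data TH_job     = TH_job
data TH_isGood  = TH_isGood
data TH_age     = TH_age
data TH_height  = TH_height
data TH_weight  = TH_weight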
Now, we’re ready to create a data set and start training. Here’s a list of the employees of Planet Express provided by the resident bureaucrat Hermes Conrad. This list will be our first data set.
>planetExpress =
> [ Character "Philip J. Fry" "human" Crew (Just True) 1026 5.8 195
> , Character "Turanga Leela" "alien" Crew (Just True) 43 5.9 170
> , Character "Professor Farnsworth" "human" Manager (Just True) 85 5.5 160
> , Character "Hermes Conrad" "human" Manager (Just True) 36 5.3 210
> , Character "Amy Wong" "human" Other (Just True) 21 5.4 140
> , Character "Zoidberg" "alien" Other (Just True) 212 5.8 225
> , Character "Cubert Farnsworth" "human" Other (Just True) 8 4.3 135
> ]
Let’s train a distribution from this data. Here’s how we would train a distribution where every variable is independent of every other variable:
>dist1 = train planetExpress :: Multivariate Character
> '[ Independent Categorical '[String,String,Job,Maybe Bool]
> , Independent Normal '[Double,Double,Double]
> ]
> Double
In the HLearn library, we always use the function train to train a model from data points. We specify which model to train in the type signature.
As you can see, the Multivariate distribution takes three type parameters. The first parameter is the type of our data point, in this case Character. The second parameter describes the dependency structure of our distribution. We’ll go over the syntax for the dependency structure in a bit. For now, just notice that it’s a type-level list of distributions. Finally, the third parameter is the type we will use to store our probabilities.
What can we do with this distribution? One simple task we can do is to find marginal distributions. The marginal distribution is the distribution of a certain variable ignoring all the other variables. For example, let’s say I want a distribution of the species that work at planet express. I can get this by:
>dist1a = getMargin TH_species dist1
Notice that we specified which variable we’re taking the marginal of by using the type level lens TH_species. This data constructor was automatically created for us by our template haskell function makeTypeLenses. Every one of our records in the data type has its own unique type lens. Its name is the name of the record, prefixed by TH. These lenses let us infer the types of our marginal distributions at compile time, rather than at run time. For example, the type of the marginal distribution of species is:
ghci> :t dist1a
dist1a :: Categorical String Double
That is, a categorical distribution whose data points are Strings and which stores probabilities as a Double. Now, if I want a distribution of the weights of the employees, I can get that by:
>dist1b = getMargin TH_weight dist1
And the type of this distribution is:
ghci> :t dist1b
dist1b :: Normal Double
Now, I can easily plot these marginal distributions with the plotDistribution function:
ghci> plotDistribution (plotFile "dist1a" $ PNG 250 250) dist1a
ghci> plotDistribution (plotFile "dist1b" $ PNG 250 250) dist1b
But wait! I accidentally forgot to include Bender in the planetExpress data set! What can I do?
In a traditional statistics library, we would have to retrain our data from scratch. If we had billions of elements in our data set, this would be an expensive mistake. But in our HLearn library, we can take advantage of the model’s monoid structure. In particular, the compiler used this structure to automatically derive a function called add1dp for us. Let’s look at its type:
ghci> :t add1dp
add1dp :: HomTrainer model => model -> Datapoint model -> model
It’s pretty simple. The function takes a model and adds the data point associated with that model. It returns the model we would have gotten if the data point had been in our original data set. This is called online training.
Again, because our distributions form monoids, the compiler derived an efficient and exact online training algorithm for us automatically.
So let’s create a new distribution that considers bender:
>bender = Character "Bender Rodriguez" "robot" Crew (Just True) 44 6.1 612
>dist1' = add1dp dist1 bender
And plot our new marginals:
ghci> plotDistribution (plotFile "dist1-withbender-species" $ PNG 250 250) $
getMargin TH_species dist1'
ghci> plotDistribution (plotFile "dist1-withbender-weight" $ PNG 250 250) $
getMargin TH_weight dist1'
Notice that our categorical marginal has clearly changed, but our normal marginal doesn’t seem to have changed at all. This is because the plotting routines automatically scale the distribution, and the normal distribution, when scaled, always looks the same. We can double check that we actually did change the weight distribution by comparing the means:
ghci> mean dist1b
176.42857142857142
ghci> mean $ getMargin TH_weight dist1'
230.875
Bender’s weight really changed the distribution after all!
That’s cool, but our original distribution isn’t very interesting. What makes multivariate distributions interesting is when the variables affect each other. This is true in our case, so we’d like to be able to model it. For example, we’ve already seen that robots are much heavier than organic lifeforms, and are throwing off our statistics. The HLearn library supports a small subset of Markov Networks for expressing these dependencies.
We represent Markov Networks as graphs with undirected edges. Every attribute in our distribution is a node, and every dependence between attributes is an edge. We can draw this graph with the plotNetwork command:
ghci> plotNetwork "dist1-network" dist1
As expected, there are no edges in our graph because everything is independent. Let’s create a more interesting distribution and plot its Markov network.
>dist2 = train planetExpress :: Multivariate Character
> '[ Ignore '[String]
> , MultiCategorical '[String]
> , Independent Categorical '[Job,Maybe Bool]
> , Independent Normal '[Double,Double,Double]
> ]
> Double
ghci> plotNetwork "dist2-network" dist2
Okay, so what just happened?
The syntax for representing the dependence structure is a little confusing, so let’s go step by step. We represent the dependence information in the graph as a list of types. Each element in the list describes both the marginal distribution and the dependence structure for one or more records in our data type. We must list these elements in the same order as the original data type.
Notice that we’ve made two changes to the list. First, our list now starts with the type Ignore ’[String]. This means that the first string in our data type—the name—will be ignored. Notice that TH_name is no longer in the Markov Network. This makes sense because we expect that a character’s name should not tell us too much about any of their other attributes.
Second, we’ve added a dependence. The MultiCategorical distribution makes everything afterward in the list dependent on that item, but not the things before it. This means that the exact types of dependencies it can specify are dependent on the order of the records in our data type. Let’s see what happens if we change the location of the MultiCategorical:
>dist3 = train planetExpress :: Multivariate Character
> '[ Ignore '[String]
> , Independent Categorical '[String]
> , MultiCategorical '[Job]
> , Independent Categorical '[Maybe Bool]
> , Independent Normal '[Double,Double,Double]
> ]
> Double
ghci> plotNetwork "dist3-network" dist3
As you can see, our species no longer have any relation to anything else. Unfortunately, using this syntax, the order of list elements is important, and so the order we specify our data records is important.
Finally, we can substitute any valid univariate distribution for our Normal and Categorical distributions. The HLearn library currently supports Binomial, Exponential, Geometric, LogNormal, and Poisson distributions. These just don’t make much sense for modelling Futurama characters, so we’re not using them.
Now, we might be tempted to specify that every variable is fully dependent on every other variable. In order to do this, we have to introduce the “Dependent” type. Any valid multivariate distribution can follow Dependent, but only those records specified in the type-list will actually be dependent on each other. For example:
>dist4 = train planetExpress :: Multivariate Character
> '[ Ignore '[String]
> , MultiCategorical '[String,Job,Maybe Bool]
> , Dependent MultiNormal '[Double,Double,Double]
> ]
> Double
ghci> plotNetwork "dist4-network" dist4
Undoubtedly, this is in some sense always the case: everything has at least a slight influence on everything else. Unfortunately, it is not easy in practice to model these fully dependent distributions. We need roughly \(\Theta(2^{n+e})\) data points to accurately train a distribution, where n is the number of nodes in our graph and e is the number of edges in our network. For example, a fully connected network over our six non-ignored variables has 15 edges, so it would need on the order of \(2^{6+15}\) (about two million) data points. Thus, by selecting that two attributes are independent of each other, we can greatly reduce the amount of data we need to train an accurate distribution.
I realize that this syntax is a little awkward. I chose it because it was relatively easy to implement. Future versions of the library should support a more intuitive syntax. I also plan to use copulas to greatly expand the expressiveness of these distributions. In the meantime, the best way to figure out the dependencies in a Markov network is just to plot it and see visually.
Okay. So what distribution makes the most sense for Futurama characters? We’ll say that everything depends on both the characters’ species and job, and that their weight depends on their height.
>planetExpressDist = train planetExpress :: Multivariate Character
> '[ Ignore '[String]
> , MultiCategorical '[String,Job]
> , Independent Categorical '[Maybe Bool]
> , Independent Normal '[Double]
> , Dependent MultiNormal '[Double,Double]
> ]
> Double
ghci> plotNetwork "planetExpress-network" planetExpressDist
We still don’t have enough data to to train this network, so let’s create some more. We start by creating a type for our Markov network called FuturamaDist. This is just for convenience so we don’t have to retype the dependence structure many times.
>type FuturamaDist = Multivariate Character
> '[ Ignore '[String]
> , MultiCategorical '[String,Job]
> , Independent Categorical '[Maybe Bool]
> , Independent Normal '[Double]
> , Dependent MultiNormal '[Double,Double]
> ]
> Double
Next, we train some more distributions of this type on some of the characters. We’ll start with Mom Corporation and the brave Space Forces.
>momCorporation =
> [ Character "Mom" "human" Manager (Just False) 100 5.5 130
> , Character "Walt" "human" Henchman (Just False) 22 6.1 170
> , Character "Larry" "human" Henchman (Just False) 18 5.9 180
> , Character "Igner" "human" Henchman (Just False) 15 5.8 175
> ]
>momDist = train momCorporation :: FuturamaDist
>spaceForce =
> [ Character "Zapp Brannigan" "human" Manager (Nothing) 45 6.0 230
> , Character "Kif Kroker" "alien" Crew (Just True) 113 4.5 120
> ]
>spaceDist = train spaceForce :: FuturamaDist
And now some more robots:
>robots =
> [ bender
> , Character "Calculon" "robot" Other (Nothing) 123 6.8 650
> , Character "The Crushinator" "robot" Other (Nothing) 45 8.0 4500
> , Character "Clamps" "robot" Henchman (Just False) 134 5.8 330
> , Character "DonBot" "robot" Manager (Just False) 178 5.8 520
> , Character "Hedonismbot" "robot" Other (Just False) 69 4.3 1200
> , Character "Preacherbot" "robot" Manager (Nothing) 45 5.8 350
> , Character "Roberto" "robot" Other (Just False) 77 5.9 250
> , Character "Robot Devil" "robot" Other (Just False) 895 6.0 280
> , Character "Robot Santa" "robot" Other (Just False) 488 6.3 950
> ]
>robotDist = train robots :: FuturamaDist
Now we’re going to take advantage of the monoid structure of our multivariate distributions to combine all of these distributions into one.
> futuramaDist = planetExpressDist <> momDist <> spaceDist <> robotDist
The resulting distribution is equivalent to having trained a distribution from scratch on all of the data points:
train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist
We can take advantage of this property any time we use the train function to automatically parallelize our code. The higher order function parallel will split the training task evenly over each of your available processors, then merge them together with the monoid operation. This results in “theoretically perfect” parallel training of these models.
parallel train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist
Again, this is only possible because the distributions have a monoid structure.
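Conceptually, parallel does something like the following sketch (illustrative only; HLearn’s actual implementation handles the chunking and scheduling for you):
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Train a model on each chunk of data in parallel, force each partial
-- model to normal form, then reduce the partial models with mappend.
parallelTrain :: (Monoid model, NFData model) => ([dp] -> model) -> [[dp]] -> model
parallelTrain trainChunk chunks = mconcat (parMap rdeepseq trainChunk chunks)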
Now, let’s ask some questions of our distribution. If I pick a character at random, what’s the probability that they’re a good guy? Let’s plot the marginal.
ghci> plotDistribution (plotFile "goodguy" $ PNG 250 250) $ getMargin TH_isGood futuramaDist
But what if I only want to pick from those characters that are humans, or those characters that are robots? Statisticians call this conditioning. We can do that with the condition function:
ghci> plotDistribution (plotFile "goodguy-human" $ PNG 250 250) $
getMargin TH_isGood $ condition TH_species "human" futuramaDist
ghci> plotDistribution (plotFile "goodguy-robot" $ PNG 250 250) $
getMargin TH_isGood $ condition TH_species "robot" futuramaDist
On the left is the plot for humans, and on the right the plot for robots. Apparently, original robot sin is much worse than that in humans! If only they would listen to Preacherbot and repent of their wicked ways…
Now let’s ask: What’s the average age of an evil robot?
ghci> mean $ getMargin TH_age $
condition TH_isGood (Just False) $ condition TH_species "robot" futuramaDist
273.0769230769231
Notice that conditioning a distribution is a commutative operation. That means we can condition in any order and still get the exact same results. Let’s try it:
ghci> mean $ getMargin TH_age $
condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist
273.0769230769231
There’s one last thing for us to consider. What does our Markov network look like after conditioning? Let’s find out!
plotNetwork "condition-species-isGood" $
condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist
Notice that conditioning on these variables caused them to disappear from our Markov network.
Finally, there’s another similar process to conditioning called “marginalizing out.” This lets us ignore the effects of a single attribute without specifically saying what that attribute must be. When we marginalize out on our Markov network, we get the same dependence structure as if we conditioned.
plotNetwork "marginalizeOut-species-isGood" $
marginalizeOut TH_species $ marginalizeOut TH_isGood futuramaDist
Effectively, what the marginalizeOut function does is “forget” the extra dependencies, whereas the condition function “applies” those dependencies. In the end, the resulting Markov network has the same structure, but different values.
Finally, at the start of the post, I mentioned that our multivariate distributions have group and vector space structure. This gives us two more operations we can use: the inverse and scalar multiplication. You can find more posts on how to take advantage of these structures here and here.
The best part of all of this is still coming. Next, we’ll take a look at full-on Bayesian classification and why it forms a monoid. Besides online and parallel trainers, this also gives us a fast cross-validation method.
There’ll also be a posts about the monoid structure of Markov chains, the Free HomTrainer, and how this whole algebraic framework applies to NP-approximation algorithms as well.
Growing up, I wanted nothing more than to be a naval officer. But then Jesus changed my heart. He’s been teaching me that instead of killing my enemies, I’m supposed to love them. In fact, I’m supposed to dedicate my life to serving them. Maybe even die for them. So after 7 years in the navy, I left as a conscientious objector. That’s also why I’m not paying my federal taxes this year.
In the United States, roughly half of our tax dollars go to financing war:
You can find a detailed breakdown here. This is ridiculous and unacceptable. I would gladly pay more taxes to finance roads, schools, or public health care. But I will no longer pay other people to kill America’s enemies on my behalf.
I deeply regret the need for tax resistance because it contradicts a number of Biblical commands. For example, in Romans 13:7 Paul tells us that “if you owe taxes, pay taxes” and in Matthew 22:21 Jesus commands us to “give unto Caesar the things that are Caesar’s.” I wish I could obey these commands at face value. But obeying the commands to pay taxes would result in me breaking the greatest commandment of them all: to love my neighbor as myself. Jesus calls everyone my neighbor, even my enemies. Even people who kill Americans, like Osama bin Laden. I’m deeply ashamed that my tax dollars helped finance his assassination. Not to mention the near-daily drone strikes that continue to happen, the torture at gitmo, and the DOD’s research into newer and deadlier weapons systems. I paid for it all.
I could say a lot more about why I feel morally compelled to not pay war taxes, but I won’t. I’ll skip right to the part where I’m making a public statement that I will not finance war, and I will accept whatever consequences that entails. I also acknowledge that by taking this stand, I am sinning. But this is the least sinful option my limited wisdom can find. So I will continue on, “sinning boldly” as ever.
To be smart with my protest, I’m following advice provided mainly by the War Resisters League’s War Tax Resistance book.
Today I filed my taxes just like everyone else. I filled out my form 1040, and found out that I owed 48 dollars. It’s not very much, but it’s something. I did my best to be as honest and complete as possible in the paperwork. But instead of including a check, I wrote them the following letter:
To whom it may concern:
After careful consideration, I have decided not to pay my 2012 taxes to the Federal government. I cannot in good conscience provide any financial support for our ongoing wars and excessive military spending.
I do, however, want to be a good citizen and contribute my fair share to society. Therefore, I am paying the taxes I owe to the federal government ($48) to my local state government (CA) instead. I have scanned a copy of my contribution check below.
Sincerely,
Michael Izbicki
I was happy to do my California taxes in addition to giving them this extra money. It’s only war that I’m against, not taxes in general. Here’s a copy of the actual check I wrote:
Also, for anyone interested, I’ve posted my form 1040:
Finally, just before mailing my envelope, I said the St Francis prayer:
Lord, make me an instrument of your peace. Where there is hatred, let me sow love; where there is injury, pardon; where there is doubt, faith; where there is despair, hope; where there is darkness, light; and where there is sadness, joy.
O Divine Master, grant that I may not so much seek to be consoled, as to console; to be understood, as to understand; to be loved, as to love. For it is in giving that we receive; it is in pardoning that we are pardoned, and it is in dying that we are born to Eternal Life.
In Mark 4, Jesus tells the classic parable of the sower:
Listen! Behold, a sower went out to sow. And as he sowed, some seed fell along the path, and the birds came and devoured it. Other seed fell on rocky ground, where it did not have much soil, and immediately it sprang up, since it had no depth of soil. And when the sun rose, it was scorched, and since it had no root, it withered away. Other seed fell among thorns, and the thorns grew up and choked it, and it yielded no grain. And other seeds fell into good soil and produced grain, growing up and increasing and yielding thirtyfold and sixtyfold and a hundredfold.
…
The sower sows the word. And these are the ones along the path, where the word is sown: when they hear, Satan immediately comes and takes away the word that is sown in them. And these are the ones sown on rocky ground: the ones who, when they hear the word, immediately receive it with joy. And they have no root in themselves, but endure for a while; then, when tribulation or persecution arises on account of the word, immediately they fall away. And others are the ones sown among thorns. They are those who hear the word, but the cares of the world and the deceitfulness of riches and the desires for other things enter in and choke the word, and it proves unfruitful. But those that were sown on the good soil are the ones who hear the word and accept it and bear fruit, thirtyfold and sixtyfold and a hundredfold.
My greatest temptation as a Christian is to try to take responsibility for God’s work. I want to feel like I am the one sowing the seeds and growing the fruit. I feel especially tempted to think this way after reading Jesus tell his followers in the sermon on the mount to judge a tree by its fruit. I feel like I should only focus on the fruit.
That’s why I find this parable particularly convicting. Jesus tells us that our focus should just be on our own soil. God’s rain and sun grows the plant, not me. Then God’s sower and wind scatter the seeds, not me. All I can do—the absolute only thing within my power—is to make my soil available for God. To remove the rocks and the weeds and give it good nutrients. I don’t like this because I feel like I should be doing more. I feel like God needs my help to sow seeds and make plants grow, when this is just patently absurd! When I try to do this, I just get in the way, then end up neglecting caring for my own soil.
Also, it’s extra cool that soil and soul sound so close in English.
The categorical distribution is the main distribution for handling discrete data. I like to think of it as a histogram. For example, let’s say Simon has a bag full of marbles. There are four “categories” of marbles—red, green, blue, and white. Now, if Simon reaches into the bag and randomly selects a marble, what’s the probability it will be green? We would use the categorical distribution to find out.
In this article, we’ll go over the math behind the categorical distribution, the algebraic structure of the distribution, and how to manipulate it within Haskell’s HLearn library. We’ll also see some examples of how this focus on algebra makes HLearn’s interface more powerful than other common statistical packages. Everything that we’re going to see is in a certain sense very “obvious” to a statistician, but this algebraic framework also makes it convenient. And since programmers are inherently lazy, this is a Very Good Thing.
Before delving into the “cool stuff,” we have to look at some of the mechanics of the HLearn library.
The HLearn-distributions package contains all the functions we need to manipulate categorical distributions. Let’s install it:
$ cabal install HLearn-distributions-1.1
We import our libraries:
>import Control.DeepSeq
>import HLearn.Algebra
>import HLearn.Models.Distributions
We create a data type for Simon’s marbles:
>data Marble = Red | Green | Blue | White
> deriving (Read,Show,Eq,Ord)
The easiest way to represent Simon’s bag of marbles is with a list:
>simonBag :: [Marble]
>simonBag = [Red, Red, Red, Green, Blue, Green, Red, Blue, Green, Green, Red, Red, Blue, Red, Red, Red, White]
And now we’re ready to train a categorical distribution of the marbles in Simon’s bag:
>simonDist = train simonBag :: Categorical Double Marble
We can load up ghci and plot the distribution with the conveniently named function plotDistribution:
ghci> plotDistribution (plotFile "simonDist" $ PDF 400 300) simonDist
This gives us a histogram of probabilities:
In the HLearn library, every statistical model is generated from data using either train or train’. Because these functions are overloaded, we must specify the type of simonDist so that the compiler knows which model to generate. Categorical takes two parameters. The first is the type of the discrete data (Marble). The second is the type of the probability (Double). We could easily create Categorical distributions with different types depending on the requirements for our application. For example:
>stringDist = train (map show simonBag) :: Categorical Float String
This is the first “cool thing” about Categorical: We can make distributions over any user-defined type. This makes programming with probabilities easier, more intuitive, and more convenient. Most other statistical libraries would require you to assign numbers corresponding to each color of marble, and then create a distribution over those numbers.
Now that we have a distribution, we can find some probabilities. If Simon pulls a marble from the bag, what’s the probability that it would be red? (A quick sanity check: 9 of the 17 marbles in the bag are red, so the answer should be 9/17 ≈ 0.53.)
We can use the pdf function to do this calculation for us:
ghci> pdf simonDist Red
0.5294117647058824
ghci> pdf simonDist Blue
0.17647058823529413
ghci> pdf simonDist Green
0.23529411764705882
ghci> pdf simonDist White
0.058823529411764705
If we sum all the probabilities, as expected we would get 1:
ghci> sum $ map (pdf simonDist) [Red,Green,Blue,White]
1.0
Due to rounding errors, you may not always get 1. If you absolutely, positively, have to avoid rounding errors, you should use Rational probabilities:
>simonDistRational = train simonBag :: Categorical Rational Marble
Rationals are slower, but won’t be subject to floating point errors.
This is just about all the functionality you would get in a “normal” stats package like R or NumPy. But using Haskell’s nice support for algebra, we can get some extra cool features.
First, let’s talk about semigroups. A semigroup is any data structure that has a binary operation (<>) that joins two of those data structures together. The categorical distribution is a semigroup.
Don wants to play marbles with Simon, and he has his own bag. Don’s bag contains only red and blue marbles:
>donBag = [Red,Blue,Red,Blue,Red,Blue,Blue,Red,Blue,Blue]
We can train a categorical distribution on Don’s bag in the same way we did earlier:
>donDist = train donBag :: Categorical Double Marble
In order to play marbles together, Don and Simon will have to add their bags together.
>bothBag = simonBag ++ donBag
Now, we have two options for training our distribution. First is the naive way, we can train the distribution directly on the combined bag:
>bothDist = train bothBag :: Categorical Double Marble
This is the way we would have to approach this problem in most statistical libraries. But with HLearn, we have a more efficient alternative. We can combine the trained distributions using the semigroup operation:
>bothDist' = simonDist <> donDist
Under the hood, the categorical distribution stores the number of times each possibility occurred in the training data. The <> operator just adds the corresponding counts from each distribution together:
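In code, the idea looks something like the following minimal sketch, which assumes a simplified Categorical that is just a Map from labels to counts (the real HLearn type differs in its internals):

import qualified Data.Map as Map

newtype SimpleCategorical prob label = SimpleCategorical (Map.Map label prob)
    deriving (Show)

instance (Num prob, Ord label) => Semigroup (SimpleCategorical prob label) where
    -- merging two distributions just adds the corresponding counts
    SimpleCategorical a <> SimpleCategorical b =
        SimpleCategorical (Map.unionWith (+) a b)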
This method is more efficient because it avoids repeating work we’ve already done. Categorical’s semigroup operation runs in time O(1), so no matter how big the bags are, we can calculate the distribution very quickly. The naive method, in contrast, requires time O(n). If our bags had millions or billions of marbles inside them, this would be a considerable savings!
We get another cool performance trick “for free” based on the fact that Categorical is a semigroup: The function train can be automatically parallelized using the higher order function parallel. I won’t go into the details about how this works, but here’s how you do it in practice.
First, we must show the compiler how to resolve the Marble data type down to “normal form.” This basically means we must show the compiler how to fully evaluate the data type. (We only have to do this because Marble is a type we created. If we were using a built-in type, like String, we could skip this step.) This is fairly easy for a type as simple as Marble:
>instance NFData Marble where
> rnf Red = ()
> rnf Blue = ()
> rnf Green = ()
> rnf White = ()
Then, we can perform the parallel computation by:
>simonDist_par = parallel train simonBag :: Categorical Double Marble
Other languages require a programmer to manually create parallel versions of their functions. But in Haskell with the HLearn library, we get these parallel versions for free! All we have to do is ask for it!
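To give a rough intuition of why the semigroup property makes this possible, here is a conceptual sketch (not HLearn’s actual implementation): split the data into one chunk per core, train on each chunk independently, and merge the partial results with the monoid operation. Associativity guarantees we get the same answer as training serially.

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.List.Split (chunksOf) -- from the "split" package

parallelTrainSketch :: (Monoid model, NFData model)
                    => Int -> ([x] -> model) -> [x] -> model
parallelTrainSketch numCores trainChunk xs =
    -- train each chunk in parallel, then merge the partial models
    mconcat (parMap rdeepseq trainChunk chunks)
  where
    chunkSize = max 1 (length xs `div` numCores)
    chunks    = chunksOf chunkSize xs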
A monoid is a semigroup with an empty element, which is called mempty in Haskell. It obeys the law that:
M <> mempty == mempty <> M == M
And it is easy to show that Categorical is also a monoid. We get this empty element by training on an empty data set:
mempty = train ([] :: [Marble]) :: Categorical Double Marble
The HomTrainer type class requires that all its instances also be instances of Monoid. This lets the compiler automatically derive “online trainers” for us. An online trainer can add new data points to our statistical model without retraining it from scratch.
For example, we could use the function add1dp (stands for: add one data point) to add another white marble into Simon’s bag:
>simonDistWhite = add1dp simonDist White
This also gives us another approach for our earlier problem of combining Simon and Don’s bags. We could use the function addBatch:
>bothDist'' = addBatch simonDist donBag
Because Categorical is a monoid, we maintain the property that:
bothDist == bothDist' == bothDist''
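Under the hood, these derived functions are conceptually very simple. Continuing the Map-based sketch from earlier (the helper names here are made up, not HLearn’s actual definitions):

trainSimple :: (Num prob, Ord label) => [label] -> SimpleCategorical prob label
trainSimple xs = SimpleCategorical (Map.fromListWith (+) [ (x, 1) | x <- xs ])

-- an online update is just a merge with a model trained on one new point
add1dpSimple :: (Num prob, Ord label)
             => SimpleCategorical prob label -> label -> SimpleCategorical prob label
add1dpSimple model x = model <> trainSimple [x]

-- a batch update is a merge with a model trained on the new batch
addBatchSimple :: (Num prob, Ord label)
               => SimpleCategorical prob label -> [label] -> SimpleCategorical prob label
addBatchSimple model xs = model <> trainSimple xs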
Again, statisticians have always known that you could add new points into a categorical distribution without training from scratch. The cool thing here is that the compiler is deriving all of these functions for us, and it’s giving us a consistent interface for use with different data structures. All we had to do to get these benefits was tell the compiler that Categorical is a monoid. This makes designing and programming libraries much easier, quicker, and less error prone.
A group is a monoid with the additional property that all elements have an inverse. This lets us perform subtraction on groups. And Categorical is a group.
Ed wants to play marbles too, but he doesn’t have any of his own. So Simon offers to give Ed some from his own bag. He gives Ed one of each color:
>edBag = [Red,Green,Blue,White]
Now, if Simon draws a marble from his bag, what’s the probability it will be blue?
To answer this question without algebra, we’d have to go back to the original data set, remove the marbles Simon gave Ed, then retrain the distribution. This is awkward and computationally expensive. But if we take advantage of Categorical’s group structure, we can just subtract directly from the distribution itself. This makes more sense intuitively and is easier computationally.
>simonDist2 = subBatch simonDist edBag
This is a shorthand notation for using the group operations directly:
>edDist = train edBag :: Categorical Double Marble
>simonDist2' = simonDist <> (inverse edDist)
The way the inverse operation works is it multiplies the counts for each category by -1. In picture form, this flips the distribution upside down:
Then, adding an upside down distribution to a normal one is just subtracting the histogram columns and renormalizing:
Notice that the green bar in edDist looks really big—much bigger than the green bar in simonDist. But when we subtract it away from simonDist, we still have some green marbles left over in simonDist2. This is because the histogram is only showing the probability of a green marble, and not the actual number of marbles.
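In code, the group inverse is just as short as the semigroup operation. Continuing the Map-based sketch (again, the real HLearn internals differ):

-- negate every count; merging a distribution with its inverse
-- yields the empty (mempty) distribution
inverseSimple :: Num prob => SimpleCategorical prob label -> SimpleCategorical prob label
inverseSimple (SimpleCategorical m) = SimpleCategorical (Map.map negate m)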
Finally, there’s one more crazy trick we can perform with the Categorical group. It’s perfectly okay to have both positive and negative marbles in the same distribution. For example:
ghci> plotDistribution (plotFile "mixedDist" $ PDF 400 300) (edDist <> (inverse donDist))
results in:
Most statisticians would probably say that these upside down Categoricals are not “real distributions.” But at the very least, they are a convenient mathematical trick that makes working with distributions much more pleasant.
Finally, an R-module is a group with two additional properties. First, it is abelian, which means that <> is commutative. So, for all a and b:
a <> b == b <> a
Second, the data type supports multiplication by any element in the ring R. In Haskell, you can think of a ring as any member of the Num type class.
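Continuing the Map-based sketch once more, scalar multiplication is a one-liner (the operator name .*. is made up here to avoid clashing with HLearn’s real .* operator):

(.*.) :: Num prob => prob -> SimpleCategorical prob label -> SimpleCategorical prob label
r .*. SimpleCategorical m = SimpleCategorical (Map.map (r *) m)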
How is this useful? It lets us “retrain” our distribution on the data points it has already seen. Back to the example…
Well, Ed—being the clever guy that he is—recently developed a marble copying machine. That’s right! You just stick some marbles in on one end, and on the other end out pop 10 exact duplicates. Ed’s not just clever, but pretty nice too. He duplicates his new marbles and gives all of them back to Simon. What’s Simon’s new distribution look like?
Again, the naive way to answer this question would be to retrain from scratch:
>duplicateBag = simonBag ++ (concat $ replicate 10 edBag)
>duplicateDist = train duplicateBag :: Categorical Double Marble
Slightly better is to take advantage of the Semigroup property, and just apply that over and over again:
>duplicateDist' = simonDist2 <> (foldl1 (<>) $ replicate 10 edDist)
But even better is to take advantage of the fact that Categorical is a module, and use the (.*) operator:
>duplicateDist'' = simonDist2 <> 10 .* edDist
In picture form:
Also notice that without the scalar multiplication, we would get back our original distribution:
Another way to think about the module’s scalar multiplication is that it allows us to weight our distributions.
Ed just realized that he still needs a marble, and has decided to take one. Someone has left their marble bag sitting nearby, but he’s not sure whose it is. He thinks that Simon is more forgetful than Don is, so he assigns a 60% probability that the bag is Simon’s and a 40% probability that it is Don’s. When he takes a marble, what’s the probability that it is red?
We create a weighted distribution using module multiplication:
>weightedDist = 0.6 .* simonDist <> 0.4 .* donDist
Then in ghci:
ghci> pdf weightedDist Red
0.4929577464788732
(A caveat worth noting: scalar weighting scales the raw counts, and the two bags contain different numbers of marbles. So this is not quite the same as a 60/40 mixture of the two normalized distributions, which would give 0.6·(9/17) + 0.4·(4/10) ≈ 0.478. To get that mixture exactly, we would first divide each weight by the size of its bag.)
We can also train directly on weighted data using the trainW function:
>weightedDataDist = trainW [(0.4,Red),(0.5,Green),(0.2,Green),(3.7,White)] :: Categorical Double Marble
which gives us:
Talking about the categorical distribution in algebraic terms lets us do some cool new stuff with our distributions that we can’t easily do in other libraries. None of this is statistically groundbreaking. The cool thing is that algebra just makes everything so convenient to work with.
I think I’ll do another post on some cool tricks with the kernel density estimator that are not possible at all in other libraries, then do a post about the category (formal category-theoretic sense) of statistical training methods. At that point, we’ll be ready to jump into machine learning tasks. Depending on my mood we might take a pit stop to discuss the computational aspects of free groups and modules and how these relate to machine learning applications.
The Bulletin of the Atomic Scientists tracks the nuclear capabilities of every country. We’re going to use their data to demonstrate Haskell’s HLearn library and the usefulness of abstract algebra to statistics. Specifically, we’ll see that the categorical distribution and kernel density estimates have monoid, group, and module algebraic structures. We’ll explain what this crazy lingo even means, then take advantage of these structures to efficiently answer real-world statistical questions about nuclear war. It’ll be a WOPR!
Before we get into the math, we’ll need to review the basics of nuclear politics.
The nuclear Non-Proliferation Treaty (NPT) is the main treaty governing nuclear weapons. Basically, it says that there are five countries that are “allowed” to have nukes: the USA, UK, France, Russia, and China. “Allowed” is in quotes because the treaty specifies that these countries must eventually get rid of their nuclear weapons at some future, unspecified date. When another country, for example Iran, signs the NPT, they are agreeing to not develop nuclear weapons. What they get in exchange is help from the 5 nuclear weapons states in developing their own civilian nuclear power programs. (Iran has the legitimate complaint that Western countries are actively trying to stop its civilian nuclear program when they’re supposed to be helping it, but that’s a whole ’nother can of worms.)
The Nuclear Notebook tracks the nuclear capabilities of all these countries. The most-current estimates are from mid-2012. Here’s a summary (click the warhead type for more info):
Country | Delivery Method | Warhead | Yield (kt) | # Deployed |
USA | ICBM | W78 | 335 | 250 |
USA | ICBM | W87 | 300 | 250 |
USA | SLBM | W76 | 100 | 468 |
USA | SLBM | W76-1 | 100 | 300 |
USA | SLBM | W88 | 455 | 384 |
USA | Bomber | W80 | 150 | 200 |
USA | Bomber | B61 | 340 | 50 |
USA | Bomber | B83 | 1200 | 50 |
UK | SLBM | W76 | 100 | 225 |
France | SLBM | TN75 | 100 | 150 |
France | Bomber | TN81 | 300 | 150 |
Russia | ICBM | RS-20V | 800 | 500 |
Russia | ICBM | RS-18 | 400 | 288 |
Russia | ICBM | RS-12M | 800 | 135 |
Russia | ICBM | RS-12M2 | 800 | 56 |
Russia | ICBM | RS-12M1 | 800 | 18 |
Russia | ICBM | RS-24 | 100 | 90 |
Russia | SLBM | RSM-50 | 50 | 144 |
Russia | SLBM | RSM-54 | 100 | 384 |
Russia | Bomber | AS-15 | 200 | 820 |
China | ICBM | DF-3A | 3300 | 16 |
China | ICBM | DF-4 | 3300 | 12 |
China | ICBM | DF-5A | 5000 | 20 |
China | ICBM | DF-21 | 300 | 60 |
China | ICBM | DF-31 | 300 | 20 |
China | ICBM | DF-31A | 300 | 20 |
China | Bomber | H-6 | 3100 | 20 |
I’ve consolidated all this data into the file nukes-list.csv, which we will analyze in this post. If you want to try out this code for yourself (or the homework question at the end), you’ll need to download it. Every line in the file corresponds to a single nuclear warhead, not a delivery method. Warheads are the parts that go boom! Bombers, ICBMs, and SSBN/SLBMs are the delivery methods.
There are three things to note about this data. First, it’s only estimates based on public sources. In particular, it probably overestimates the Russian nuclear forces. Other estimates are considerably lower. Second, we will only be considering deployed, strategic warheads. Basically, this means the “really big nukes that are currently aimed at another country.” There are thousands more tactical warheads, and warheads in reserve stockpiles waiting to be disassembled. For simplicity—and because these nukes don’t significantly affect strategic planning—we won’t be considering them here. Finally, there are 4 countries who are not members of the NPT but have nuclear weapons: Israel, Pakistan, India, and North Korea. We will be ignoring them here because their inventories are relatively small, and most of their weapons would not be considered strategic.
First, let’s install the library:
$ cabal install HLearn-distributions-0.1
Now we’re ready to start programming. First, let’s import our libraries:
>import Control.Lens
>import Data.Csv
>import qualified Data.Vector as V
>import qualified Data.ByteString.Lazy.Char8 as BS
>
>import HLearn.Algebra
>import HLearn.Models.Distributions
>import HLearn.Gnuplot.Distributions
Next, we load our data using the Cassava package. (You don’t need to understand how this works.)
>main = do
> Right rawdata <- fmap (fmap V.toList . decode True) $ BS.readFile "nukes-list.csv"
> :: IO (Either String [(String, String, String, Int)])
And we’ll use the Lens package to parse the CSV file into a series of variables containing just the values we want. (You also don’t need to understand this.)
> let list_usa = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="USA" ) rawdata
> let list_uk = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="UK" ) rawdata
> let list_france = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="France") rawdata
> let list_russia = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="Russia") rawdata
> let list_china = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="China" ) rawdata
NOTE: All you need to understand about the above code is what these list_country variables look like. So let’s print one:
> putStrLn $ "List of American nuclear weapon sizes = " ++ show list_usa
gives us the output:
List of American nuclear weapon sizes = fromList [335,335,335,335,335,335,335,335,335,335 ... 1200,1200,1200,1200,1200]
If we want to know how many weapons are in the American arsenal, we can take the length of the list:
> putStrLn $ "Number of American weapons = " ++ show (length list_usa)
We get that there are 1951 American deployed, strategic nuclear weapons. If we want to know the total “blowing up” power, we take the sum of the list:
> putStrLn $ "Explosive power of American weapons = " ++ show (sum list_usa)
We get that the US has 516 megatons of deployed, strategic nuclear weapons. That’s the equivalent of 1,033,870,000,000 pounds of TNT.
To get the total number of weapons in the world, we concatenate every country’s list of weapons and find the length:
> let list_all = list_usa ++ list_uk ++ list_france ++ list_russia ++ list_china
> putStrLn $ "Number of nukes in the whole world = " ++ show (length list_all)
Doing this for every country gives us the table:
Country | Warheads | Total explosive power (kt) |
USA | 1,951 | 516,935 |
UK | 225 | 22,500 |
France | 300 | 60,000 |
Russia | 2,435 | 901,000 |
China | 168 | 284,400 |
Total | 5,079 | 1,784,835 |
Now let’s do some algebra!
In a previous post, we saw that the Gaussian distribution forms a group. This means that it has all the properties of a monoid—an empty element (mempty) that represents the distribution trained on no data, and a binary operation (mappend) that merges two distributions together—plus an inverse. This inverse lets us “subtract” two Gaussians from each other.
It turns out that many other distributions also have this group property. For example, the categorical distribution. This distribution is used for measuring discrete data. Essentially, it assigns some probability to each “label.” In our case, the labels are the size of the nuclear weapon, and the probability is the chance that a randomly chosen nuke will be exactly that destructive. We train our categorical distribution using the train function:
> let cat_usa = train list_usa :: Categorical Int Double
If we plot this distribution, we’ll get a graph that looks something like:
A distribution like this is useful to war planners from other countries. It can help them statistically estimate the casualties and infrastructure damage they would suffer in a nuclear exchange.
Now, let’s train equivalent distributions for our other countries.
> let cat_uk = train list_uk :: Categorical Int Double
> let cat_france = train list_france :: Categorical Int Double
> let cat_russia = train list_russia :: Categorical Int Double
> let cat_china = train list_china :: Categorical Int Double
Because training the categorical distribution is a group homomorphism, we can train a distribution over all nukes by either training directly on the data:
> let cat_allA = train list_all :: Categorical Int Double
or we can merge the already generated categorical distributions:
> let cat_allB = cat_usa <> cat_uk <> cat_france <> cat_russia <> cat_china
Because of the homomorphism property, we will get the same result both ways. Since we’ve already done the calculations for each of the countries, method B will be more efficient—it won’t have to repeat work we’ve already done. If we plot either of these distributions, we get:
The thing to notice in this plot is that most countries have a nuclear arsenal that is distributed similarly to the United States—except for China. These Chinese ICBMs will become much more important when we discuss nuclear strategy in the last section.
But nuclear war planners don’t particularly care about this complete list of nuclear weapons. What war planners care about is the survivable nuclear weapons—that is, weapons that won’t be blown up by a surprise nuclear attack. Our distributions above contain nukes dropped from bombers, but these are not survivable. They are easy to destroy. For our purposes, we’ll call anything that’s not a bomber a survivable weapon.
We’ll use the group property of the categorical distribution to calculate the survivable weapons. First, we create a distribution of just the unsurvivable bombers:
> let list_bomber = fmap (\row -> row^._4) $ filter (\row -> (row^._2)=="Bomber") rawdata
> let cat_bomber = train list_bomber :: Categorical Int Double
Then, we use our group inverse to subtract these unsurvivable weapons away:
> let cat_survivable = cat_allB <> (inverse cat_bomber)
Notice that we calculated this distribution indirectly—there was no possible way to combine our variables above to generate this value without using the inverse! This is the power of groups in statistics.
The categorical distribution is not sufficient to accurately describe the distribution of nuclear weapons. This is because we don’t actually know the yield of a given warhead. Like all things, it has some manufacturing tolerances that we must consider. For example, if we detonate a 300 kt warhead, the actual explosion might be 275 kt, 350 kt, or the bomb might even “fizzle out” and have almost a 0 kt explosion.
We’ll model this by using a kernel density estimator (KDE). The KDE basically takes all our data points, assigns each one a probability distribution called a “kernel,” then sums these kernels together. It is a very powerful and general technique for modelling distributions… and it also happens to form a group!
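To make the idea concrete, here is a rough sketch of what a KDE computes (an assumed simplification, not HLearn’s implementation): the density estimate at a point x is the average of kernel functions centered at each data point, scaled by the bandwidth h.

-- estimated density at x, given bandwidth h, kernel k, and data points xs
kdeSketch :: Double -> (Double -> Double) -> [Double] -> Double -> Double
kdeSketch h k xs x =
    sum [ k ((x - xi) / h) | xi <- xs ] / (fromIntegral (length xs) * h)

-- for example, the standard Gaussian kernel
gaussianKernel :: Double -> Double
gaussianKernel u = exp (-u * u / 2) / sqrt (2 * pi)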
First, let’s create the parameters for our KDE. The bandwidth controls how wide each of the kernels is. Bigger means wider. I selected 20 because it made a reasonable looking density function. The sample points are exactly what they sound like: they are where we will sample the density from. We can generate them using the function genSamplePoints. Finally, the kernel is the shape of the distributions we will be summing up. There are many supported kernels.
> let kdeparams = KDEParams
> { bandwidth = Constant 20
> , samplePoints = genSamplePoints
> 0 -- minimum
> 4000 -- maximum
> 4000 -- number of samples
> , kernel = KernelBox Gaussian
> } :: KDEParams Double
Now, we’ll train kernel density estimates on our data. Notice that because the KDE takes parameters, we must use the train’ function instead of just train.
> let kde_usa = train' kdeparams list_usa :: KDE Double
Again, plotting just the American weapons gives:
And we train the corresponding distributions for the other countries.
> let kde_uk = train' kdeparams list_uk :: KDE Double
> let kde_france = train' kdeparams list_france :: KDE Double
> let kde_russia = train' kdeparams list_russia :: KDE Double
> let kde_china = train' kdeparams list_china :: KDE Double
>
> let kde_all = kde_usa <> kde_uk <> kde_france <> kde_russia <> kde_china
The KDE is a powerful technique, but the drawback is that it is computationally expensive—especially when a large number of sample points are used. Fortunately, all computations in the HLearn library are easily parallelizable by applying the higher order function parallel.
We can calculate the full KDE from scratch in parallel like this:
> let list_double_all = map fromIntegral list_all :: [Double]
> let kde_all_parA = (parallel (train' kdeparams)) list_double_all :: KDE Double
or we can perform a parallel reduction on the KDEs for each country like this:
> let kde_all_parB = (parallel reduce) [kde_usa, kde_uk, kde_france, kde_russia, kde_china]
And because the KDE is a homomorphism, we get the same exact thing either way. Let’s plot the parallel version:
> plotDistribution (genPlotParams "kde_all" kde_all_parA) kde_all_parA
The parallel computation takes about 16 seconds on my Core2 Duo laptop running on 2 processors, whereas the serial computation takes about 28 seconds.
This is a considerable speedup, but we can still do better. It turns out that there is a homomorphism from the Categorical distribution to the KDE:
> let kde_fromcat_all = cat_allB $> kdeparams
> plotDistribution (genPlotParams "kde_fromcat_all" kde_fromcat_all) kde_fromcat_all
(For more information about the morphism chaining operator $>, see the HLearn documentation.) This computation takes less than a second and gets the exact same result as the much more expensive computations above.
We can express this relationship with a commutative diagram:
No matter which path we take to get to a KDE, we will get the exact same answer. So we should always take the path that will be least computationally expensive for the data set we’re working on.
Why does this work? Well, the categorical distribution is a structure called the “free module” in disguise.
R-modules (like groups, but unlike monoids) have not seen much love from functional programmers. This is a shame, because they’re quite handy. It turns out they will increase our performance dramatically in this case.
It’s not super important to know the formal definition of an R-module, but here it is anyway: An R-module is a group with one additional property: it can be “multiplied” by any element of the ring R. This is a generalization of vector spaces because R need only be a ring instead of a field. (Rings do not necessarily have multiplicative inverses.) It’s probably easier to see what this means by an example.
Vectors are modules. Let’s say I have a vector:
> let vec = [1,2,3,4,5] :: [Int]
I can perform scalar multiplication on that vector like this:
> let vec2 = 3 .* vec
which as you might expect results in:
[3,6,9,12,15]
Our next example is the free R-module. A “free” structure is one that obeys only the axioms of the structure and nothing else. Functional programmers are very familiar with the free monoid—it’s the list data type. The free Z-module is like a beefed up list. Instead of just storing the elements in a list, it also stores the number of times that element occurred. (Z is shorthand for the set of integers, which form a ring but not a field.) This lets us greatly reduce the memory required to store a repetitive data set.
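As a rough mental model (an assumed simplification, not HLearn’s actual representation), you can picture the free Z-module as a map from elements to their counts:

import qualified Data.Map as Map

type FreeZModuleSketch a = Map.Map a Int

-- build the module from a list by counting repeated elements
fromListSketch :: Ord a => [a] -> FreeZModuleSketch a
fromListSketch xs = Map.fromListWith (+) [ (x, 1) | x <- xs ]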
In HLearn, we represent the free module over a ring r with the data type:
FreeMod r a
where a is the type of elements to be stored in the free module. We can convert our lists into free modules using the function list2module like this:
> let module_usa = list2module list_usa
But what does the free module actually look like? Let’s print it to find out:
> print module_usa
gives us:
FreeMod (fromList [(100,768),(150,200),(300,250),(335,249),(340,50),(455,384),(1200,50)])
This is much more compact! So this is the takeaway: The free module makes repetitive data sets easier to work with. Now, let’s convert all our country data into module form:
> let module_uk = list2module list_uk
> let module_france = list2module list_france
> let module_russia = list2module list_russia
> let module_china = list2module list_china
Because modules are also groups, we can combine them like so:
> let module_allA = module_usa <> module_uk <> module_france <> module_russia <> module_china
or, we could train them from scratch:
> let module_allB = list2module list_all
Again, because generating a free module is a homomorphism, both methods are equivalent.
The categorical distribution and the KDE both have this module structure. This gives us two cool properties for free.
First, we can train these distributions directly from the free module. Because the free module is potentially much more compact than a list is, this can save both memory and time. If we run:
> let cat_module_all = train module_allB :: Categorical Int Double
> let kde_module_all = train' kdeparams module_allB :: KDE Double
Then we get the properties:
cat_module_all == cat_allB
kde_module_all == kde_all == kde_fromcat_all
Extending our commutative diagram above gives:
Again, no matter which path we take to train our KDE, we still get the same result because each of these arrows is a homomorphism.
Second, if a distribution is a module, we can weight the importance of our data points. Let’s say we’re a general from North Korea (DPRK), and we’re planning our nuclear strategy. The US and North Korea have a very strained relationship in the nuclear department. It is much more likely that the US will try to nuke the DPRK than China will. And modules let us model this! We can weight each country’s influence on our “nuclear threat profile” distribution like this:
> let threats_dprk = 20 .* kde_usa
> <> 10 .* kde_uk
> <> 5 .* kde_france
> <> 2 .* kde_russia
> <> 1 .* kde_china
>
> plotDistribution (genPlotParams "threats_dprk" threats_dprk) threats_dprk
Basically, we’re saying that the USA is 20x more likely to attack the DPRK than China is. Graphically, our threat distribution is:
The maximum threat that we have to worry about is about 1300 kt, so we need to design all our nuclear bunkers to withstand this level of blast. Nuclear war planners would use the above distribution to figure out how much infrastructure would survive a nuclear exchange. To see how this is done, you’ll have to click the link.
On the other hand, if we’re an American general, then we might say that China is our biggest threat… who knows what they’ll do when we can’t pay all the debt we owe them!?
> let threats_usa = 1 .* kde_russia
> <> 5 .* kde_china
>
> plotDistribution (genPlotParams "threats_usa" threats_usa) threats_usa
Graphically:
So now Chinese ICBMs are a real threat. For American infrastructure to be secure, most of it needs to be able to withstand a ~3500 kt blast. (Actually, Chinese nuclear policy is called the “minimum means of reprisal”—these nukes are targeted not at military installations, but at major cities. Unlike the other nuclear powers, China doesn’t hope to win a nuclear war. Instead, its nuclear posture is designed to prevent nuclear war in the first place. This is why China has the fewest weapons of any of these countries. For a detailed analysis, see the book Minimum Means of Reprisal. This means that American military infrastructure isn’t threatened by these large Chinese nukes, and really only needs to be able to withstand an 800 kt explosion to be survivable.)
By the way, since we’ve already calculated all of the kde_country variables before, these computations take virtually no time at all to compute. Again, this is all made possible thanks to our friend abstract algebra.
If you want to try out the HLearn library for yourself, here’s a question you can try to answer: Create the DPRK and US threat distributions above, but only use survivable weapons. Don’t include bombers in the analysis.
In our next post, we’ll go into more detail about the mathematical plumbing that makes all this possible. Then we’ll start talking about Bayesian classification and full-on machine learning. Subscribe to the RSS feed so you don’t miss out!
Why don’t you listen to Tom Lehrer’s “Song for WWIII” while you wait?
We don’t know what God wants, and we wouldn’t know how to do it even if we did. Therefore (as Gandhi put it) we must “experiment with truth.” We must discover truth for ourselves, and how to achieve it.
These are my experiments from 2012. I didn’t try these experiments because they are somehow the “most Christ-like” thing to do. I tried them because I don’t know what the most Christ-like thing is, but I want to learn. I want to train myself to do it at all times. Some of these experiments succeeded and some failed. But even the failures made me a better Christian.
This year, I decided not to eat any food on Mondays. During lent, I didn’t eat on either Monday or Thursday. I had never fasted before this. Growing up, I thought fasting was a “stupid Catholic thing.” But now I see it is immensely valuable.
From an earthly standpoint, fasting helped me exercise my willpower and discipline. From a spiritual standpoint, being hungry was a constant reminder of people in need around the world—I need to help them just like Jesus would, and I need to give thanks to God for what food and blessings I do have. Fasting also helped me grow closer to God. Every hunger pang was a reminder to say a prayer to God asking for His endurance and grace.
From January 1st until Easter Sunday (~3 months), I didn’t eat any meat. I did this experiment for two reasons. First, I feel bad about factory farming and my involvement in the abuse of animals. Second, I want to stand in solidarity with those who choose vegetarianism for moral reasons. I want to understand what it feels like to go into a restaurant or a party and leave hungry because there’s no vegetarian food. I want to share their pain.
This experiment was much easier than I expected. I never had any cravings for meat, and learned to cook quite a variety of different foods. My new favorite food is “buffalo broccoli,” a stir fry with buffalo sauce added at the end. It’s even better on broccoli than it is on chicken because the broccoli has so much surface area for the sauce to cling to. It’s amazing.
Ultimately, however, I decided not to remain strictly vegetarian because it was harming my relationships. Sharing meals is an important part of friendship, and I was unable to participate in several church and family dinners because of my strict vegetarianism. So now, I try to eat “mostly vegetarian.” If I’m by myself, I won’t eat meat. If I’m at a party and there’s a vegetarian option available, I’ll eat only that. But, if I’m with other people and the only food available is meat, then I’ll eat meat with them. To me, strengthening these relationships with humans seems more important than reducing the suffering of factory farmed animals.
Ever since leaving the Navy, I’ve struggled with how to express my pacifism. I want to do it in a way that promotes peace rather than just condemns war. So I helped start a Food not Bombs group on the UCR campus with a few other students. We served free vegetarian meals every Friday during the spring 2012 quarter. In total, we served 1000 meals.
Unfortunately, we were unable to continue serving food after that quarter. Some administrators decided they didn’t want Food not Bombs on campus, despite the fact that we were following all the proper regulations. All of our requests for food permits were denied for various stupid “problems,” and when we fixed those problems, the administrators found new problems. It became clear that they would never approve a permit with our names on it, so our FNB group gradually dissolved.
One thing I learned from the experience is that I suck at organizing and motivating people behind a vision. I just don’t enjoy doing it. So in the future, rather than trying to start my own group, I’m going to help somebody else start theirs. Finding a good peacemaking group to be a part of is my top priority for 2013.
I grew up thinking that the way to be a good Christian was to be a teetotaling nationalist. Now, I believe the way to be a good Christian is to be Christ to everyone you meet. So since Jesus turned water into wine, I decided to try my hand at turning it into beer. (I personally can’t stand wine.)
This has turned out better than I could ever have imagined. Not only is making beer a ton of fun, but it really helps build relationships. Sharing alcohol with strangers is the fastest way I know to turn them into friends. Plus, it can lead into great conversations about God. Mostly what I’ve been brewing is what I affectionately call “monk beer.” It’s based on recipes from Trappist monasteries in Belgium. In my experience, even the most die-hard anti-theists have good things to say about Christians after we share some monk beer together.
Oh, and the beer tastes great too. We got 2nd place out of 550 at the Mayfaire beer competition.
I’ve read through the sermon more times than I can count. Many verses, like “turn the other cheek” and “go the extra mile” are burned into my heart. But I feel there is still so much more for me to learn here. I still don’t fully understand what Jesus is saying, and I’m certainly not yet living it out.
So I decided to memorize it. I made this goal for myself at the beginning of summer, and 6 months later I still haven’t accomplished it. I am normally good at memorizing; I just haven’t put in the time it takes to accomplish this task. I am getting closer, however. Finishing this task is one of my goals for 2013.
I made a commitment in 2012 not to open my computer on Sunday. Instead of wasting my time working or browsing reddit, I wanted to spend this same time building relationships with members of my church and family. Growing up, I thought respecting the sabbath was a “stupid Jewish thing,” but now I realize it is immensely important.
Sadly, I didn’t keep the sabbath as well as I would have liked. Often, I would read academic papers on Sunday, grade homework, or do other things that failed to build relationships. I did this because of poor scheduling throughout the week and misplaced priorities about what is important.
Another goal for 2013 is to change this. I want my every Sunday to be the most holy day of the year. I want to dedicate Sunday to building important relationships with God and men.
Normally I ride my bike everywhere, but when I got a flat tire in early November I decided to try walking everywhere. Bikes are already a much slower lifestyle than cars, but walking is even slower. Every day from when my tire popped until today, I’ve done the 40-minute walk to work. It was amazing.
As I walked to work, I got to enjoy the sights of the snowy San Gabriel mountains to the north, the Box Spring mountains to the east and Saddleback Mountain to the southwest. I got to watch and listen to the birds sing. I got to pick up trash along the side of the road, making the community just a little bit nicer, and I even got to wish the occasional homeless man a “Merry Christmas” and talk about how they’re getting along.
Sadly, I failed to help many of the “least of these” in Riverside. Many times this year I drove past a homeless man despite a gut wrenching conviction that I needed to stop and help this person. I failed to help Christ in his greatest need, and I am ashamed.
In fall of 2011, I did a lot of talking with the homeless. I did most of my grocery shopping at a nearby Stater Brothers, and there were always homeless men and women outside begging for money. I would ask them to come shopping with me, buy them some food, and have a good conversation about their life and problems. But this year my roommates and I have been shopping exclusively at Costco, and I don’t get to meet any of these homeless while going about my daily business. I think this turned out to be a Very Bad Thing for my spiritual health.
I’m still not sure how to fix this problem for 2013.
I’m sharing this for two reasons. First, these experiments have been a tremendous growth to me spiritually. Maybe they will help you as well if you try them. But the second reason is much more important to me personally: I want public accountability for my actions. I want to change the world for the better—I want to be like Christ—but I can’t do that without help. I need others to lift me up, just as I need to lift others up.
So if you have any suggestions for cool religious experiments I can perform in 2013, please tell me!
(And why machine learning experts should care)
This is the first in a series of posts about the HLearn library for Haskell that I’ve been working on for the past few months. The idea of the library is to show that abstract algebra—specifically monoids, groups, and homomorphisms—is useful not just in esoteric functional programming, but also in real world machine learning problems. In particular, by framing a learning algorithm according to these algebraic properties, we get three things for free: (1) an online version of the algorithm; (2) a parallel version of the algorithm; and (3) a procedure for cross-validation that runs asymptotically faster than the standard version.
We’ll start with the example of a Gaussian distribution. Gaussians are ubiquitous in learning algorithms because they accurately describe most data. But more importantly, they are easy to work with. They are fully determined by their mean and variance, and these parameters are easy to calculate.
In this post we’ll start with examples of why the monoid and group properties of Gaussians are useful in practice, then we’ll look at the math underlying these examples, and finally we’ll see that this technique is extremely fast in practice and results in near perfect parallelization.
Install the libraries from a shell:
$ cabal install HLearn-algebra-0.0.1
$ cabal install HLearn-distributions-0.0.1
Then import the HLearn libraries into a literate haskell file:
> import HLearn.Algebra
> import HLearn.Models.Distributions.Gaussian
And some libraries for comparing our performance:
> import Criterion.Main
> import Statistics.Distribution.Normal
> import qualified Data.Vector.Unboxed as VU
Now let’s create some data to work with. For simplicity’s sake, we’ll use a made up data set of how much money people make. Every entry represents one person making that salary. (We use a small data set here for ease of explanation. When we stress test this library at the end of the post we use much larger data sets.)
> gradstudents = [15e3,25e3,18e3,17e3,9e3] :: [Double]
> teachers = [40e3,35e3,89e3,50e3,52e3,97e3] :: [Double]
> doctors = [130e3,105e3,250e3] :: [Double]
In order to train a Gaussian distribution from the data, we simply use the train function, like so:
> gradstudents_gaussian = train gradstudents :: Gaussian Double
> teachers_gaussian = train teachers :: Gaussian Double
> doctors_gaussian = train doctors :: Gaussian Double
The train function is a member of the HomTrainer type class, which we’ll talk more about later. Also, now that we’ve trained some Gaussian distributions, we can perform all the normal calculations we might want to do on a distribution. For example, taking the mean, standard deviation, pdf, and cdf.
Now for the interesting bits. We start by showing that the Gaussian is a semigroup. A semigroup is any data structure that has an associative binary operation called (<>). Basically, we can think of (<>) as “adding” or “merging” the two structures together. (A semigroup is just a monoid without the mempty element.)
So how do we use this? Well, what if we decide we want a Gaussian over everyone’s salaries? Using the traditional approach, we’d have to recompute this from scratch.
> all_salaries = concat [gradstudents,teachers,doctors]
> traditional_all_gaussian = train all_salaries :: Gaussian Double
But this repeats work we’ve already done. On a real world data set with millions or billions of samples, this would be very slow. Better would be to merge the Gaussians we’ve already trained into one final Gaussian. We can do that with the semigroup operation (<>):
> semigroup_all_gaussian = gradstudents_gaussian <> teachers_gaussian <> doctors_gaussian
Now,
traditional_all_gaussian == semigroup_all_gaussian
The coolest part about this is that the semigroup operation takes time O(1), no matter how much data we’ve trained the Gaussians on. The naive approach takes time O(n), so we’ve got a pretty big speed up!
Next, a monoid is a semigroup with an identity. The identity for a Gaussian is easy to define—simply train on the empty data set!
> gaussian_identity = train ([]::[Double]) :: Gaussian Double
Now,
gaussian_identity == mempty
But we’ve still got one more trick up our sleeves. The Gaussian distribution is not just a monoid, but also a group. Groups appear all the time in abstract algebra, but they haven’t seen much attention in functional programming for some reason. Well groups are simple: they’re just monoids with an inverse. This inverse lets us do “subtraction” on our data structures.
So back to our salary example. Let’s say we’ve calculated all our salaries, but we’ve realized that including grad students in the salary calculations was a mistake. (They’re not real people after all.) In a normal library, we would have to recalculate everything from scratch again, excluding the grad students:
> nograds = concat [teachers,doctors]
> traditional_nograds_gaussian = train nograds :: Gaussian Double
But as we’ve already discussed, this takes a lot of time. We can use the inverse function to do this same operation in constant time:
> group_nograds_gaussian = semigroup_all_gaussian <> (inverse gradstudents_gaussian)
And now,
traditional_nograds_gaussian == group_nograds_gaussian
Again, we’ve converted an operation that would have taken time O(n) into one that takes time O(1). Can’t get much better than that!
As I’ve already mentioned, the HomTrainer type class is the basis of the HLearn library. Basically, any learning algorithm that is also a semigroup homomorphism can be made an instance of HomTrainer. This means that if xs and ys are lists of data points, the class obeys the following law:
train (xs ++ ys) == (train xs) <> (train ys)
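We can state this law as a QuickCheck-style property (a sketch, assuming Eq instances for Gaussian; with Double arithmetic the two sides can also differ by floating point error, so a real test would compare approximately):

import Test.QuickCheck

prop_trainHomomorphism :: [Double] -> [Double] -> Bool
prop_trainHomomorphism xs ys =
    train (xs ++ ys) == ((train xs <> train ys) :: Gaussian Double)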
It might be easier to see what this means in picture form:
On the left hand side, we have some data sets, and on the right hand side, we have the corresponding Gaussian distributions and their parameters. Because training the Gaussian is a homomorphism, it doesn’t matter whether we follow the orange or green paths to get to our final answer. We get the exact same answer either way.
Based on this property alone, we get the three “free” properties I mentioned in the introduction. (1) We get an online algorithm for free. The function add1dp can be used to add a single new point to an existing Gaussian distribution. Let’s say I forgot about one of the graduate students—I’m sure this would never happen in real life—I can add their salary like this:
> gradstudents_updated_gaussian = add1dp gradstudents_gaussian (10e3::Double)
This updated Gaussian is exactly what we would get if we had included the new data point in the original data set.
(2) We get a parallel algorithm. We can use the higher order function parallel to parallelize any application of train. For example,
> gradstudents_parallel_gaussian = (parallel train) gradstudents :: Gaussian Double
The function parallel automatically detects the number of processors your computer has and evenly distributes the work load over them. As we’ll see in the performance section, this results in perfect parallelization of the training function. Parallelization literally could not be any simpler!
(3) We get asymptotically faster cross-validation; but that’s not really applicable to a Gaussian distribution so we’ll ignore it here.
One last note about the HomTrainer class: we never actually have to define the train function for our learning algorithm explicitly. All we have to do is define the semigroup operation, and the compiler will derive our training function for us! We’ll save a discussion of why this homomorphism property gives us these results for another post. Instead, we’ll just take a look at what the Gaussian distribution’s semigroup operation looks like.
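A sketch of how that derivation works: given a way to train a model on a single data point, the batch training function falls out of the monoid structure for free. (This is the idea only; HLearn’s actual class is richer.)

trainFromSemigroup :: Monoid model => (datapoint -> model) -> [datapoint] -> model
trainFromSemigroup train1dp = mconcat . map train1dp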
Our Gaussian data type is defined as:
data Gaussian datapoint = Gaussian
{ n :: !Int -- The number of samples trained on
, m1 :: !datapoint -- The mean (first moment) of the trained distribution
, m2 :: !datapoint -- The variance (second moment) times (n-1)
, dc :: !Int -- The number of "dummy points" that have been added
}
In order to estimate a Gaussian from a sample, we must find the total number of samples (n), the mean (m1), and the variance (calculated from m2). (We’ll explain what dc means a little later.) Therefore, we must figure out an appropriate definition for our semigroup operation below:
(Gaussian na m1a m2a dca) <> (Gaussian nb m1b m2b dcb) = Gaussian n' m1' m2' dc'
First, we calculate the number of samples n’. The number of samples in the resulting distribution is simply the sum of the number of samples in both the input distributions:
\[ n' = n_a + n_b \]
Second, we calculate the new average m1’. We start with the definition that the final mean is:
\[ \displaystyle m'_1 = \frac{1}{n'}\sum_{i=1}^{n'} x_i \]
Then we split the summation according to whether the input element \(x_i\) was from the left Gaussian a or right Gaussian b, and substitute with the definition of the mean above:
\[ m'_1 = \frac{1}{n'}\left(\sum_{i=1}^{n_a} x_{ia} + \sum_{i=1}^{n_b} x_{ib}\right) \\ m'_1 = \frac{1}{n'}\left(n_a m_{1a} + n_b m_{1b}\right) \]
Notice that this is simply the weighted average of the two means. This makes intuitive sense. But there is a slight problem with this definition: when implemented on a computer with floating point arithmetic, we will divide by zero whenever n’ is 0. We solve this problem by adding a “dummy” element into the Gaussian whenever n’ would be zero. This increases n’ from 0 to 1, preventing the division by zero. The variable dc counts how many dummy variables have been added, so that we can remove them before performing calculations (e.g. finding the pdf) that would be affected by an incorrect number of samples.
Finally, we must calculate the new m2’. We start with the definition that the variance times (n-1) is:
\[ m'_2 = \sum_{i=1}^{n'}(x_i - m'_1)^2 = \sum_{i=1}^{n'}(x_i^2 - m_1^{'2}) \]
(Note that the second half of the equation is a property of variance, and its derivation can be found on wikipedia.)
Then, we do some algebra, split the summations according to which input Gaussian the data point came from, and resubstitute the definition of m2 to get: \[ m'_2 = \sum_{i=1}^{n'}(x_i^2 - m_1^{'2}) \\ m'_2 = \sum_{i=1}^{n'}(x_i^2) - n' m_1^{'2} \\ m'_2 = \sum_{i=1}^{n_a}(x_{ia}^2) + \sum_{i=1}^{n_b}(x_{ib}^2) - n' m_1^{'2} \\ m'_2 = \sum_{i=1}^{n_a}(x_{ia}^2 -m_{1a}^2) + n_a m_{1a}^2 + \sum_{i=1}^{n_b}(x_{ib}^2 - m_{1b}^2) +n_b m_{1b}^2 - n' m_1^{'2} \\ m'_2 = m_{2a}+ n_a m_{1a}^2 + m_{2b}+ n_b m_{1b}^2-n' m_1^{'2} \]
Notice that this equation has no divisions in it. This is why we are storing m2 as the variance times (n-1) rather than simply the variance. Adding in the extra divisions causes training our Gaussian distribution to run about 4x slower. I’d say Haskell is getting pretty fast if the number of floating point divisions we perform impacts our code’s performance that much!
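Putting the pieces together, the semigroup operation looks something like the following sketch. It ignores the dummy-point bookkeeping described above, so unlike the real instance it would divide by zero when n’ = 0:

instance Fractional datapoint => Semigroup (Gaussian datapoint) where
    (Gaussian na m1a m2a dca) <> (Gaussian nb m1b m2b dcb) =
        Gaussian n' m1' m2' (dca + dcb)
      where
        n'  = na + nb
        -- the new mean is the weighted average of the two input means
        m1' = (fromIntegral na * m1a + fromIntegral nb * m1b) / fromIntegral n'
        -- the new m2 follows the derivation above; note it needs no divisions
        m2' = m2a + fromIntegral na * m1a * m1a
            + m2b + fromIntegral nb * m1b * m1b
            - fromIntegral n' * m1' * m1'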
This algebraic interpretation of the Gaussian distribution has excellent time and space performance. To show this, we’ll compare performance to the excellent Haskell package called “statistics” that also has support for Gaussian distributions. We use the criterion package to create three tests:
> size = 10^8
> main = defaultMain
> [ bench "statistics-Gaussian" $ whnf (normalFromSample . VU.enumFromN 0) (size)
> , bench "HLearn-Gaussian" $ whnf
> (train :: VU.Vector Double -> Gaussian Double)
> (VU.enumFromN (0::Double) size)
> , bench "HLearn-Gaussian-Parallel" $ whnf
> (parallel $ (train :: VU.Vector Double -> Gaussian Double))
> (VU.enumFromN (0::Double) size)
> ]
In these tests, we time three different methods of constructing Gaussian distributions given 100,000,000 data points. On my laptop with 2 cores, I get these results:
statistics-Gaussian | 2.85 sec |
HLearn-Gaussian | 1.91 sec |
HLearn-Gaussian-Parallel | 0.96 sec |
Pretty nice! The algebraic method managed to outperform the traditional method for training a Gaussian by a handy margin. Plus, our parallel algorithm runs exactly twice as fast on two processors. Theoretically, this should scale to an arbitrary number of processors, but I don’t have a bigger machine to try it out on.
Another interesting advantage of the HLearn library is that we can trade off time and space performance by changing which data structures store our data set. Specifically, we can use the same functions to train on a list or an unboxed vector. We do this by using the ConstraintKinds package on Hackage, which extends the base type classes like Functor and Foldable to work on types that require constraints. Thus, we have a Functor instance for unboxed vectors. This is not possible without ConstraintKinds.
Using this benchmark code:
main = do
print $ (train [0..fromIntegral size::Double] :: Gaussian Double)
print $ (train (VU.enumFromN (0::Double) size) :: Gaussian Double)
We generate the following heap profile:
Processing the data as a vector requires that we allocate all the memory in advance. This lets the program run faster, but prevents us from loading data sets larger than the amount of memory we have. Processing the data as a list, however, allows us to allocate the memory only as we use it. But because lists are boxed and lazy data structures, we must accept that our program will run about 10x slower. Lucky for us, GHC takes care of all the boring details of making this happen seamlessly. We only have to write our train function once.
There are still at least four more major topics to cover in the HLearn library: (1) We can extend this discussion to show how the Naive Bayes learning algorithm has a similar monoid and group structure. (2) There are many more learning algorithms with group structures we can look into. (3) We can look at exactly how all these higher order functions, like batch and parallel, work under the hood. And (4) we can see how the fast cross-validation I briefly mentioned works and why it’s important.
Anyone who wants to be first must be the very last, and the servant of all.
– Mark 9:35
Being the servant of all is a hard task that we often forget to do. We need to remind ourselves constantly that we are here to serve everyone. One easy way to do this is to call everyone else “Sir” or “Ma’am.” This language serves as a reminder to ourselves, and at the same time uplifts the person we’re talking to.
I discovered this trick when I was an officer in the Navy. All the enlisted sailors had to call me “Sir,” and that frankly made me feel good about myself. It made me feel important. I had twenty people I was in charge of, and their sole purpose in life was to do what I told them to do. To drive this point home, all midshipmen have to memorize this quote in their first year at the Naval Academy:
Sir, sir is subservient word surviving from the surly days of old Serbia, when certain serfs, too ignorant to remember their lord’s names, yet too servile to blaspheme them, circumvented the situation by surrogating the subservient word sir, by which I now belatedly address a certain senior cirroped who correctly surmised that I was syrupy enough to say sir after every word I said, sir.
But one of the reasons I left the Navy was the realization that the military’s power structure looked nothing like what Jesus wanted from his followers. Jesus came to invert this power structure. He came to make the first last and the last first—so we need to do that in our own lives. As an officer, I should have been calling my sailors sir, because it should have been me serving them. That’s the example that Jesus gave.
So now when I’m walking down the street and see a homeless woman, I greet her with “Good morning, ma’am!” God put me on that path in order to be a Jesus for her—to be her servant. I need to remind myself that I was made just for this moment: to accept this woman’s sins as my own and serve her as Christ would. And she needs someone to treat her with the dignity of a human created in God’s image. Being called ma’am may just be the only dignity she receives for the rest of her life.
So last weekend I was making an awesome double IPA. My roommate’s dad owns a hops farm, and he had just sent us 2 lbs of Cascade hops. So I figured, I wonder how that much hops would taste in a 5 gallon batch?
Now, I live in the desert, and it’s still pretty hot here in November. Like over 90 degrees hot. So naturally I was eating ice cream while brewing. And then it struck me: What if I poured some of the unfermented wort into the ice cream?!
When the boil was done, I put the hops bag inside a bowl and squeezed to get out as much juicy goodness as possible. Then I poured it into a tub of ice cream:
After a bunch of stirring, the final product looked something like:
In total, I added about a cup of the unfermented wort into 1 quart of ice cream. I was a little worried that maybe I had added too much. Maybe the wort would simply turn into ice and ruin the ice cream.
But lo!
One day later, I come back from a hard day of work, ready for dessert, and am greeted by the perfect combo:
Pie à la beer mode FTW!
It’s hard because no one ever listens as soon as a conversation turns political. It’s just about waiting for our turn to regurgitate the pros of our favorite candidate or policy. I’ve been thinking a lot lately about how I should change my conversation strategies in response to this fact.
In the past, when people asked me, “Are you democrat or republican?” I might answer with: “I just call myself a Christian, but other people also call me an anarchist.” Everyone just explodes with how naive and irresponsible that is. Or sometimes how I can’t possibly be both. We go back and forth on political theory. But nobody listens. Not even me. (To my shame!)
I’ve also tried answering with, “I don’t vote.” Then I hear a chorus of “But it’s your duty as an American citizen to vote!” Or “God gave this freedom, you better use it!” I try to explain how I’d rather spend the time it takes to vote personally improving society, rather than voting for a politician to do it for me. But nobody listens. And I don’t listen to them. (To my shame!)
These conversations were never productive. So I’ve been experimenting with different approaches.
What I’ve settled on is: “I always just try to do what Jesus would do. Nothing more or less.”
Some people say, “Well, you can’t force your beliefs on everyone else.” And then I say, “Jesus didn’t, so I won’t either. I only try to make my own actions more Christ-like.”
Some people say, “Well, you’re wasting the chance to make everyone else more Christ-like.” And then I say, “Jesus didn’t force everyone else to be Christ-like, so I won’t either.”
Some people say, “Well, Jesus didn’t talk about government policies.” And then I say, “So then I don’t care about government policies.”
Some people say, “Well, somebody’s got to think about the government!” And then I say, “My faith is in Jesus, not my government. I’ll follow Jesus’s commands and trust in him to take care of the rest.”
Some people say, “Well, then how are you going to make the world a better place?” And then I say, “The same way Jesus did: sharing my food with the poor; sharing my wine with the sinners; and if necessary dying for the people who kill me.”
These conversations are much more humbling for me. I stop trying to convince others that I’m right, and instead simply present my beliefs. Hopefully it helps others understand that I have no king but Christ.
Now I just need to figure out how to listen better…
I’ve been thinking recently about ways to formalize exactly what makes Christianity different. One way to approach this is through ethics.
In the Nicomachean Ethics, Aristotle defines virtue as the mean between two vices. For example,
Every ethical virtue is a condition intermediate between two other states, one involving excess, and the other deficiency. The courageous person judges that some dangers are worth facing and others not, and experiences fear to a degree that is appropriate to his circumstances. He lies between the coward, who flees every danger and experiences excessive fear, and the rash person, who judges every danger worth facing and experiences little or no fear.
In picture form, that looks something like:
If we accept Aristotle’s perspective, then there are two possible ways for Christian ethics to be different. First, we might come to different conclusions about where this virtuous mean lies. If you place me on the spectrum of rash to coward, for example, I definitely think virtue lies far closer to the cowardice end of the spectrum than most people do. (I am a pacifist after all.) But I also do not think this does justice to my Christ-inspired ethics, because I actually despise cowardice. Instead, I value peacemaking. But at first glance most people think peacemaking and cowardice are the same thing. So something else must define what makes Christian ethics different.
This leads us to our second option: redefining which spectrum of vices and virtues we use. Let’s imagine the classic thought experiment against pacifism: Someone is attacking your grandmother; do you fight the attacker or do nothing? By the standard metric above, only a coward would not fight. The courageous man—the virtuous and ethical man—would clearly fight.
But I make decisions based on a different metric. On the one hand, I weigh my responsibility to the victim, and on the other hand I weigh my responsibility to the perpetrator. The virtuous person under this schema is the one who is able to treat both of them with perfect love; the vices are excessively favoring one side at the expense of the other. That is, the virtuous man intervenes but in a nonviolent way to stop the attack.
This reframing has two interesting properties. First, if you can’t find a way to intervene nonviolently in a given situation, then you are not yet a virtuous person. One of the major problems that pacifists must deal with is that these nonviolent intervention strategies are rarely obvious. I think that we haven’t achieved perfect virtue if we are not able to solve this problem. This fits perfectly with the Christian idea that we can never fully overcome our sinful natures.
The second interesting property relates to how we deal with our own lack of virtue. The nonpacifist errs on the side of excessive love of victim, whereas the pacifist errs on the side of excessive love of perpetrator. But both sides must realize that they are making an ethical compromise and not acting in accordance with Christian virtue. If a non-pacifist Christian reasons in this way, then they have my full support for any violence they deem ultimately necessary.
Actually, one of the reasons that I am a complete pacifist is to force myself to reason in this way, because I think it is impossible to think in these terms without adopting complete pacifism. But that would be a whole separate article.
The main point is that “radical” Christians tend to adopt this perspective. We don’t just try to place the mean in a different spot. Instead, we completely redefine the problem. Because that’s what Jesus did.
In Luke 11:24-26, Jesus says:
When an evil spirit comes out of a man, it goes through arid places seeking rest and does not find it. Then it says, “I will return to the house I left.” When it arrives, it finds the house swept clean and put in order. Then it goes and takes seven other spirits more wicked than itself, and they go in and live there. And the final condition of that man is worse than the first.
When I read this tonight, it made me think of a lot of progressive/radical/anarchist Christianity. We see things wrong with society and the church, so we try to remove those “evil spirits” from our lives. The problem is that we need to follow this up by inserting love into our lives to take this evil spirit’s place, but too often we don’t. The result is that more evil spirits move in. And this, according to Jesus, is the worst possible thing we could do.
The main way I see this lived out is in our opposition to militarism. Too often I see people recognize that militarism is destructive, but then they start hating the military and all soldiers. Well, I know a lot of soldiers who are trying their best to serve God, and I certainly don’t think we should hate them for it. Hating won’t end war.
I feel like I need to keep this in mind whenever I discuss polarizing issues. It’s not enough to cast the demon out of my heart; I must also put Christ into my heart. And that’s far more important.
In Luke 10:2, Jesus says:
The harvest is plentiful but the workers are few.
I’ve heard this verse interpreted by churches as a call to evangelism. There’s a lot of lost souls to save, and they won’t get saved unless we go out and get them. But this assumes that I am one of the “workers” and need to go out to “harvest” the lost souls. I don’t like this interpretation because it lacks humility.
What this verse makes me think about is: “Am I really one of the workers, or am I just slacking off and pretending to work?” Jesus goes on to give very strict commands to his workers: “Do not take a purse or bag or sandals.” I like to think I’m trying to live up to this strict command of abandoning everything for Jesus, but it’s really hard to figure out what things I need to abandon to make this happen. What completely ordinary—even necessary—things like sandals am I not giving up?
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. My interview spanned three days and covered both technical legal points and broad theological concepts. Here is a short excerpt where we talk about the role of “salvation,” and how it led to my pacifist convictions.
The investigating officer is in bold, and I am in plain text.
Now you testify that you follow the ways of Jesus Christ, right?
I try to.
You stated on page 3 of your application, “I believe that Christ came, that all might be saved from their sin.” Is that true?
Yes sir, that is.
Have you been saved?
I have.
Do you believe you can lose your salvation?
My interpretation of being saved and salvation is that I’ve come to know God and am in a spiritual process with God. I don’t see how, now that I have that, it would be possible for me to turn away from it, or for something, anything, to cause that to change or lessen. So, for me I don’t think that’s possible, but I don’t know in the general sense.
The Israelites turned away from God several times in the Old Testament. Would you agree?
Yes, sir.
Did God ever give up on them?
He did not.
Do you think God would give up on you?
No.
Do you think if you are somehow forced into warfare, participating in warfare, that you would lose your salvation?
If I were forced into warfare, then it would be my prayer and my goal to not participate in that warfare, so that I’m not harming somebody else in any way. But I recognize that I am a sinner and so is everyone else and we all make mistakes. I make small mistakes; I make large mistakes. I believe God is a god of forgiveness, that if I do make these mistakes, which I know I will, he’s always inviting me to come back to him and repent and he’ll forgive me. At the same time, that’s not an excuse for making mistakes. God is always inviting me to get closer to Him and it’s always my goal to get as close to him as possible.
Edit: I should have made it more clear that I would have refused the orders to participate. I did too much equivocation because I was so nervous.
One of your witnesses said he believes there is more than one way to heaven. Do you believe there is more than one way, or do you believe that Jesus Christ is the way and the truth and the light? No one goes to the father except through Him.
I believe that those two statements—that there’s more than one way to heaven and that Jesus Christ is the way to heaven—are really two ways of saying the same thing. When Jesus Christ said, “I am the way, the truth, and the light,” I don’t believe that he meant that in order to get into heaven you have to say “I worship Jesus.” And that if you do say that it’s not sufficient. In the sermon on the mount, he says there will be those who cry, “Father, Father,” but God will say, “I never knew you.”
I think the basic message of the Gospel is this: God exists. Our goal in life is to draw closer to God. We do that by making our lives conform to Jesus’s life. We will fall short, and that’s where grace comes in, but we still need to make our life as much like Jesus’s as we can. It’s this desire to make my life conform to Jesus that led me to pacifism.
Also, to the extent that other people are pursuing that goal—even if they have never heard of the name Jesus Christ—I think they are acting like Christians. I still don’t believe that those two ideas are incompatible.
I am a little concerned that you don’t have a good feeling for how to define salvation. It seems to me that you’re trying to rectify your own beliefs with the beliefs of the Quakers and Pastors who testified for you earlier. I don’t think it’s so important that you try to do that if that’s what you’re trying to do. There’s nothing wrong with your beliefs. Have you memorized John 3:16?
It says, “For God so loved the world that he gave his only begotten son that whomsoever believes in him shall not perish but have everlasting life.”
That’s right. Do you believe that?
I do, but I don’t understand what it means.
Do you know what only begotten means?
It means that Jesus was God’s one and only son.
Okay. So when the Quakers say that Jesus was just another person like us who was a little closer to God, and that we are all children of God, do you think that we are sons of God in the same manner that the one and only son of God is? Would that make us the same physically as the one and only?
I know Jesus called people his brothers and sisters, and that I consider myself his brother in that same sense. He called other people children of God. What it means for him to be called the son of God or the only begotten son of God is not something that I can limit by giving an exact definition.
So the Bible says that God so loved the world that he gave his only begotten son that whoever should believe in him shall not perish but have everlasting life. So that’s a way to have everlasting life. You mentioned that there were other ways. Can you list those?
Because, my thought is that if there are other ways of getting to heaven, then why bother following Jesus? You can just stay in the Navy and you don’t need to follow Jesus’s ways; you can use other ways to get eternal life. Why don’t you just pick one of the other ways that agrees with the Navy?
I don’t believe that there are other ways. My understanding of John 3:16 is that the sort of belief he talks about is a very deep and personal thing. It doesn’t stop and it doesn’t even start with someone saying, “I believe in Jesus.” Those words and maybe even that idea don’t even matter.
Jesus said, “Whoever believes in me will follow my commands.” The important thing is the internal change of “I want to be like this and I want to follow somebody who was like this and set this kind of example for me.” I cannot list other ways to be like Jesus other than to be like Jesus, and I can’t list other ways to believe in God other than to be like Jesus.
Let me just read this other verse that God presented me. Acts 4:12, “Salvation is found in no one else for there is no other name under heaven given to men by which we must be saved.” No other. Do you believe that’s accurate? It was written by somebody who was with Jesus, right?
My understanding of Acts 4:12 is the same as my understanding of John 3:16. It’s not the words that matter, but the alignment of your heart to God’s.
Okay, so you don’t believe there is any other way other than that relationship with God or Jesus, who yesterday you said is the same person basically. Okay. So you’re counting on Jesus as your one and only way to heaven and that is why it is important for you to follow his ways, is that correct?
Yes.
Edit: My motivation for aligning my will to God’s is not to get into heaven. I really don’t care about heaven, I just want to be like Jesus. I want to make heaven on earth. I didn’t stress that point though because I was getting really sick of this conversation.
Okay. I’m glad we clarified that.
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. My interview spanned three days and covered both technical legal points and broad theological concepts. Here is a short excerpt where we talk about some of the ways nonviolence impacts my interaction with the state outside of a military context.
The investigating officer is in bold, and I am in plain text.
You state on page 2 of your application, “I cannot take someone else’s life, nor can I aid others in doing so. Therefore I cannot participate in war in any form.” If you pay federal taxes, your money is used to fund the military. Explain how you justify that.
My goal is to draw as close to God as I can. At this point, the issue of taxes has not come up on my conscience because I have a more important concern. Every day, I spend my life in the Navy wearing the uniform, somehow contributing to warfare. That has weighed very heavily on me. It’s not compatible with my beliefs anymore. I recognize that a conscientious objector discharge is the first step that I have to take to reconcile my beliefs with what I’m doing. I don’t yet know what to do about taxes. Right now, the Navy automatically deducts them from my pay. I will deal with that problem when I come to it.
My goal, as it has always been, is to work through the system: to be a good citizen, be constructive to my country, and to help and serve my country. I want to keep doing this, but I just can’t continue in the military anymore.
Edit: I have now had lots of time to think about my paying taxes for war. I do not do it, and am a war tax resister.
You say you believe in the country. Do you know this country was founded by military operation?
Yes sir, I do.
So we actually wouldn’t have a country if it weren’t for some military. How do you get that straight in your mind?
My belief in the country—like I think it is for most people—is that I believe in freedom, democracy, and liberty. Those ideals are somehow fundamental to humanity. I think lots of nations are trying to get those ideals in different ways. It’s those ideals that I support.
Edit: Also, those ideals mean more to me than any country ever could. I actively work to undermine the United States when it does not live up to its rhetoric here. I think this is the most patriotic attitude we can have. When the United States falls short, we don’t just sit by and watch. Instead, we strive to make it a better country.
Our country needs laws to support itself. For instance, murder is against the law. How are those rules enforced? Will everyone in the country just decide to follow them?
No, which is why we have a police force.
Does that police force use violence to enforce those rules?
It does.
In a letter from your former roommate, he discusses a conversation you had in December 2009. This was after you submitted your first application. He says, “Not before long, our conversation turned into me justifying to him why it is necessary to even have law enforcement agencies that are willing to do violence on behalf of those who need to be protected. This was a surprise to me because we had discussed law enforcement on a few occasions before he had decided to become a conscientious objector. And, he had not been opposed to the idea of violence in the name of law.”
Do you believe that violence in the name of the law is necessary?
I could not participate in that violence.
I understand that you don’t believe you could participate. Do you believe it’s required? Or can we just get rid of the police force?
My inspiration comes largely from Gandhi on this point. He believed it would be possible to have a society where the police enforced laws nonviolently. In a society like that, I would be able to serve in the police force.
Gandhi believes that?
That’s what he tried to implement. That’s my understanding.
Was he successful in implementing that? A country with a police force that doesn’t use any violence?
In my opinion, he was. He helped create communities called Ashrams that were based entirely on nonviolence. I’m currently living in a place called St. Francis House which is also based on nonviolence and partly inspired by his example.
Is there like a wall up around there? I mean, how do you keep the violent people out?
The same way the early Christians did it: by turning the other cheek. When violence comes to you, you be nonviolent back. Sometimes that means you experience violence. The early church had many martyrs because of this. But through it all, you continue to love those who persecute you.
At least, that’s what I think Jesus would want.
Don’t you think there might be a lot of home break-ins? These people seem like they’d be an easy mark, and it doesn’t really seem believable unless you had this huge wall up around the whole city that kept the violent people away from the nonviolent ones. It would be like a chicken coop with a bunch of wolves around. You’ve got to have something that keeps the wolves away from the chickens. Otherwise, once the walls go down and the wolves go in, all the chickens die.
Gandhi, the early Christians, and people who practice nonviolence today usually do not have much to steal because they are too busy helping people to accumulate riches on earth. Besides, Jesus said to give to anyone who asks and constantly compared Christians to sheep.
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. My interview spanned three days and covered both technical legal points and broad theological concepts. Here is a short excerpt where we talk about the “Jesus Revolution” and the sermon on the mount.
The investigating officer is in bold, and I am in plain text.
Can you explain why specifically that your beliefs cause you to object to warfare?
My beliefs are against warfare because I believe that I’m supposed to follow Jesus as closely as I can and I believe that…
How do you know Jesus was against or for warfare? Did he ever talk about war? War between nations?
Jesus’s main teaching was the sermon on the mount. That’s the one that’s most influenced my life. He talks a lot about how it’s the inside of you that matters, not the outside. My interpretation of his saying “When struck, turn the other cheek” is that it applies to me at all times. I believe this because of his teachings, the way the early church understood Jesus’s teachings, and the example of Jesus’s life.
People expected him to go lead a violent revolution and he didn’t. He said, “That’s not why I came. I came to lead a peaceful revolution. My Kingdom is not of this world. If it were, my people would fight for it.” The early church interpreted Jesus nonviolently, and I want to live like the early church. They actually talked to Jesus, after all. Or knew people who talked to him.
Did Jesus lead a revolution?
In my opinion, yes.
How was it a revolution? When I think about the Revolutionary War, that’s where we fought for our independence and all that. Did Jesus take a part of another country and separate it from the rest of the country?
No, my idea of a revolution is just a complete paradigm shift. There was certainly a paradigm shift at the time, from Judaism into Christianity. Christianity spread rapidly. It’s that rapid growth that I would call the Christian Revolution if you want to put that name on it.
So he talked about how if your enemy slaps you in the face, turn the other cheek. Would that seem to be like a nation against nation thing or a neighbor against neighbor thing? I mean, is there anything in there that would suggest that it’s talking about nation against nation like in the old testament where there were several nation against nation discussions?
My interpretation is that Jesus’s teaching applies to me at all times. It applies to me whether I’m wearing the uniform or not wearing the uniform. It applies to me whether the person striking me is my neighbor, my family, a terrorist, a German Nazi, or anybody. I don’t feel that I can participate in war, a nation to nation conflict, where that is going to cause me to go against how I would resolve a person to person conflict.
I take the sermon on the mount literally. And it’s hard, and it’s very different, and I don’t always know what I’m doing, and sometimes I fail. But I’m trying my best to follow it.
Like I said before, I wanted to be a military officer my whole life. I wanted to serve my country in those nation to nation wars you talked about. But I can’t do that anymore. Jesus turned my life upside down. I wish he didn’t, but he did.
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. My interview spanned three days and covered both technical legal points and broad theological concepts. Here is a short excerpt where we talk about how I interpret the bible, especially with regard to the old testament’s apparent violence.
The investigating officer is in bold, and I am in plain text.
You put in your application that you read Choosing Against War by John Roth. Afterward, you looked at the Bible to make sure it made sense. Do you normally do that: look at what a man has written and compare it back to what you know to be written? Do you believe that the Bible is inspired by God’s word?
I do believe that the Bible’s inspired, but I think by that I mean something different than what some people mean. I’m not sure how to describe that difference. I believe that things today can also be inspired, that the Bible’s not the only inspired thing, that the Bible still needs to be understood in historical context, and it has to be understood that it wasn’t God who wrote those words. It was human beings.
Was there any difference between the human beings who wrote the Bible and yourself?
The main difference is the human beings, at least in the New Testament, had more direct access to Jesus and his teachings. I haven’t had physical encounters with Jesus, but through my prayers and other spiritual practice, I feel like I’ve had some of that encounter. So I guess I see those individuals as teachers, or a path for me to follow, a guide.
So do you believe the story of the parting of the Red Sea? Moses talking to the burning bush? Things like that?
My belief in the Bible is primarily based on my belief in Jesus. Those other parts of the Bible, whether they happened or not, may be interesting historical tidbits, but they’re not foundational. They’re not that important.
My belief in Jesus is not dependent on whether the Old Testament got certain facts exactly right.
In the Old Testament, there are discussions of different warfare. For example, Joshua taking Jericho, Gideon taking the Midianites, Samson against the Philistines, and David’s conquests. Those are all more either nation against nation or tribe against tribe. What do you think about those, contrasting or comparing what happened in the Old Testament to what Jesus is teaching now?
Comparing those two, the way the stories are written in the Old Testament appears very different. It does appear God condoned or even ordered war in those cases. But the important thing to me is that I’m trying to follow Jesus. His example of not choosing war, of choosing peaceful ways, is the way that I have to go.
I just had an epiphany of my own. Do you think what happened then, because in many cases these people that were attacked were not following God, do you think it’s possible that that was a similar situation to Jesus and the temple? God had sent someone to tell them they’re wrong?
I actually like that comparison a lot, because if we compare how Jesus did it to the way other people did it, Jesus did it in a way that did not involve warfare and did not involve killing people.
So Jesus is able to accomplish what was accomplished before, through peace?
Yes.
So let’s talk about that whole thing with Gideon going to take out the Mennonites.
I think you mean Midianites.
Edit: throughout the whole conversation the investigating officer would mix up these two words. It’s pretty hilarious in retrospect, but really flustered me at the time.
I assume you’ve read the story. Tell me about it.
The story is that the Israelites were being attacked by a large army. Gideon raised an army to fight, but according to the Bible Gideon prayed to God, and God kept saying “No that’s too many people. You need fewer people.” Eventually Gideon had a very small number, maybe 100, I forget exactly. They ended up tricking the other army into basically killing themselves. Certainly the way it’s portrayed Gideon thought that killing was God’s command, and that he was doing God’s work.
It was all about going and attacking and killing those people because they were idol worshipers. They weren’t worshiping God. They were spreading false religions much as probably the Pharisees and the Sanhedrin were. They had created their own religion. People were following them, not God. Wasn’t Jesus coming to lead us back to God? Saying, these people have placed all these requirements upon you: special foods that you have to eat, and special ways that you have to walk, and special resting that you have to take, you can’t work in these hours and all that. Would you agree with all that?
I think that’s a very important difference between how Jesus handled the situation and how Gideon did. Like I said before, I follow Jesus.
You said Gideon handled it, but Gideon didn’t actually do anything, right? He just listened to what God said. As you said, he just took troops down there and didn’t even have to fight. God made those people all kill themselves.
That’s the story in the bible, but…
So Gideon didn’t kill anyone, right? It was actually under God’s control? Gideon didn’t even want to go fight, right? Gideon kept saying are you sure? Make this fleece wet if I’m supposed to go. Make the ground wet. I don’t want to go fight. I don’t want to go down there. But ultimately he did God’s will, didn’t he?
I’m listening for what God’s will is for me. It is apparently very different from how Gideon described it for himself. I look to the way Jesus handled that same sort of situation because I believe Jesus is God. I don’t believe Gideon is God. I have no idea whether Gideon was inspired by God or not. I believe Jesus was God, and that I’m inspired by Jesus and inspired by God. I must follow Jesus’s way. I believe Jesus’s way was a way of peace.
Alright, and that’s good. Now, let’s go back. You hinted that maybe you don’t believe in the Old Testament. Do you believe this story is true? You said earlier you believe the old testament is a historical document.
About the inspiration of the books, I believe that they were written by people and that they have those associated flaws, both the Old Testament and the New Testament. We have to understand them in that context. Whether the author of Judges thought Gideon was inspired is a different question from whether Gideon actually was inspired. And really it has no bearing on my life. I don’t think it has a bearing on my beliefs because I believe I have to follow Jesus, not Gideon.
Did the Old Testament predict that Jesus Christ would be coming?
I don’t know. I know that there’s lots of verses that people describe as Messianic verses and that they’re prophecies. And I’ve looked into some of them and said maybe these are prophecies, and I’ve looked into others and said that I think people are clearly misunderstanding these verses. Overall it is not important to my understanding of Jesus whether he was prophesied in the Old Testament.
Was Jesus not alive in the Old Testament? You said you believed in the Trinity. Wasn’t Jesus always there?
He was not alive in the same sense that I’m alive now and had a body on earth.
But he did exist, right?
I wouldn’t know how to describe what he was.
Do you believe that Jesus existed since the beginning of time when God existed, along with the Spirit of God?
I believe that all three of those are eternal.
Do you believe—and we’ll just say God, because as you said, Jesus, God, and The Spirit are one—So do you believe that God’s will is that you should not be in the military?
Correct.
And would it be reasonable that Gideon believed God’s will was that he was supposed to assemble an army to go attack? Remember I just said, do you believe it’s possible?
What I believe is that the author of Judges believed that Gideon believed that God ordered those things. There’s a lot of layers of indirection there. To really understand the meaning of that, we have to peel back and do a lot of exploration and interpretation.
You realize that there’s danger when you start believing that some stuff in the Bible’s not true, because then we might start believing that Jesus is not true. Everything you know about Jesus is in the Bible, and anything that you hear about it is from other people who wrote stuff about the Bible.
As you said, if you have God’s spirit within you because you’re saved, whether you’re born with it or not, you’ve been saved so you have God’s spirit within you. You have the ability to interpret the word of God in the bible on your own.
So, do you really need to rely on any other man’s interpretation? Would another man’s interpretation be better than your interpretation?
I believe the Bible is other men’s interpretation.
Of God’s word directly?
In part guided by God’s word directly, but also in part by their experiences and what they witnessed. It is accounts of what they witnessed guided by God.
You think some parts of the Bible are more important than others. Why? Is there anything in the Bible that you think was written by somebody that did not have that experience with God?
I guess parts of the Bible are more important to me because I feel they are more spiritually enlightening. A large part of why I feel they’re spiritually enlightening is that I can trace a tradition to them. For example I see how important the Sermon on the Mount was to the early church, and that’s why it’s so important to me. So tradition and historical accuracy are my two main guides.
Don’t you think that there might be some pastors out there that speak Greek and Aramaic that might raise a stink if someone was to put a Bible out on the shelf and it was an inaccurate translation? Wouldn’t you be very curious if someone ran up to you and said, “Oh I just learned Greek and I’ve got this Greek Testament. And you know what, we’re wrong. Jesus is not the Son of God. It says here that Jesus was just a regular guy. It says here that a man named John Smith will be born and he will be the son of God and you should worship me.” Don’t you think you’d be curious? Think maybe that guy doesn’t know what in the world he’s talking about?
You know Jesus was seen by hundreds of people when he was raised from the dead and came back. He was seen by hundreds of people. Don’t you think that if someone had recorded it wrong, one of those hundreds of people would have come back and said, “No it wasn’t like that.”
I mean, I agree that the Bible’s been very well analyzed. My study of Hebrew has helped me understand how this analysis process works. It’s helped me understand why I should believe in the Bible and why that stuff doesn’t happen. The historical analysis of the bible has complemented my spiritual understanding of the bible. For me, at least, I can’t separate one from the other—the learning Hebrew, seeing that the history goes back and is well traced, and that Christian pacifism is a tradition that’s not new, that it’s continuous, and that people have been saying it for a while: ever since Jesus came.
So it’s affirmed your beliefs in the old testament? Is that true?
Not quite. I think the old testament and new testament are very different beasts. My study has confirmed my belief in Jesus. I see that as the most important part. I see the old testament as leading up to Jesus and helping me understand Jesus based on his roots. Not that I should be going back to the old testament to see how I should live, but that I should be going to Jesus to see how he lived and that’s how I should live. It’s helpful to understand the old testament so that I can understand Jesus.
How did Jesus feel about the old testament? What did he have to say about it?
He said he didn’t come to abolish the law but to fulfill the law.
Where is the law written?
The law is the Torah, the first five books of the Old Testament.
Did Jesus ever make corrections to the Old Testament? For instance, we talked about Gideon. Maybe when Jesus came did he tell anyone that the story of Gideon was recorded wrong and I need you to write an update or a change to correct it? Did he do any of that?
In my understanding, yes. Jesus says, “You have heard that it has been said, an eye for an eye and a tooth for a tooth.” I forget how the verse continues, but Jesus says it really isn’t an eye for an eye and a tooth for a tooth. It’s love your neighbor and turn the other cheek. It’s my understanding that those aren’t incompatible.
Okay. That seems to be a case where Jesus did say perhaps the Old Testament, part of that way was to change. Why didn’t he make other corrections besides that one?
I’m not sure that it was important for him to go through the Old Testament verse by verse: reject this verse, reinterpret this verse, this person was misunderstanding slightly when he said this.
But the Bible’s important, so don’t you think he would have corrected it? Don’t you think he would care about us that much, that if we could be led down the wrong path by an incorrect verse that he would correct it so we would then be led down the correct path?
Jesus presented a very radical world view, and to explain that view he had to use simple language to be succinct. Even still, he was constantly misunderstood by those around him. I believe that the sermon on the mount is a sufficient explanation of that, of his world view. It is very short and simple, something that someone could sit down and listen to Jesus preach in one day and understand. Now I can sit down and read it in the course of an hour or a half hour and understand what Jesus was all about.
For me to expect Jesus to do some sort of textual analysis using techniques that we’ve only been adopting in modern times is not what the people were looking for then, and not what I’m looking for now from Jesus.
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. My interview spanned three days and covered both technical legal points and broad theological concepts. Here is a short excerpt where we talk about why I prefer to describe myself as “pro-peace” rather than “anti-war.”
The investigating officer is in bold, and I am in plain text.
Why are you anti-war?
I wouldn’t describe myself as anti-war so much as pro-peace.
Maybe we should clarify that statement. Did you mean you’re not against war?
No, sir. This whole process has characterized my beliefs as anti-war in a very narrow way because that’s what the regulations require. That’s true, but it’s much broader than that. I’m not just against something, I’m for something: I’m for trying to follow Jesus and I’m for building peace wherever possible.
This means a lot of things. It means helping the homeless and those marginalized by society. It means working really hard now to prevent wars that might happen ten years from now. It means actually being nice and helpful to my enemies.
A very small aspect of being a peacemaker just happens to be that I can’t participate in war.
On page 10 of your application you state, “I want every aspect of my life to contribute to peacemaking because I believe every aspect of Jesus’ did.” When Jesus cleansed the temple, was he trying to make peace with the Pharisees or was he trying to correct them?
I believe both.
Do you think a peacemaker is someone who goes into a situation and is open to everyone’s concerns, even if someone has thoughts that are wrong? Say somebody was in here and that person wanted to kill you, or wanted to kill me. Just to make peace with that person, you’d say, “Well go ahead. I don’t want you to be un-peaceful, so go ahead and murder people because that way you and I can have peace between us. I won’t tell you it’s wrong.”
That sounds ridiculous, and it’s not what I believe. I think Gandhi’s a good contemporary example of peacemaking that’s at least a little bit easier for me to understand. When people attacked him violently, he didn’t stand back and do nothing (although some might call it that because he didn’t try to kill people in return). He used what he called Satyagraha, or “truth force.” He believed, “What you’re doing is wrong and I’m going to tell you it’s wrong. I’m not going to force you to change, but I’m going to try to convince you it’s wrong and show you the error of your ways.” When that Satyagraha process, which is peacemaking and what I believe I’m called to, is done properly, it’s not that the person doing it wins and the other person loses. It’s that both people win. That’s the essence to me of peacemaking: that both sides will win.
Edit: I really like the protest movement Food not Bombs for this reason. Instead of going out and being obnoxious, we go out and give food to hungry people. You should check us out.
In February 2011 I was discharged as a conscientious objector after 7 years in the Navy. Part of the conscientious objector process is an interview with an investigating officer. His job is to assess the “depth and sincerity” of the applicant’s beliefs. The interview spanned three days and covered both technical legal points and broad theological concepts. Here is a longish excerpt where we talk about the development of my beliefs as a conscientious objector.
The investigating officer is in bold, and I am in plain text.
The criteria for a “conscientious objector” discharge are that you’re opposed to participation in war in any form, that your opposition is based on religious training or belief, and that your position is firm, fixed, sincere, and deeply held. All four of those. The burden of proof is on you. You must establish clear and convincing evidence. This is different than a regular court, where if someone accused you of something they would have to prove that you did wrong. In your case, you’ve come to the hearing, and you have to prove your case.
Your legal counsel will be in here the entire time. I have your petition right here, and I’m going to ask you questions about it so that I can understand your beliefs. Afterward, I’m going to make a summary, and I’ll try to get my report completed as quickly as possible after the hearing and get the stuff to your captain.
Right, I understand.
Okay, let’s begin.
You joined the navy in 2004 right after high school. The invasion of Iraq had just begun. In your application on page 5 you say, “I felt the call to serve my country. The United States was at war, and I firmly believed she needed my service.”
Based on your old religious studies you used to support war. Now based on recent religious studies, you have come to the opposite conclusion. What is your plan if a year from now, your beliefs once again change, and you’re no longer a pacifist?
First, I’d like to describe my beliefs while I was in high school. Like I said in my application I was raised as a nondenominational Christian. When I was very young we attended church regularly. Probably when I was about ten—I forget the exact date—we stopped attending. We’d go only on Christmas and Easter and major events like that. It was in high school when some of my school friends were attending Pacific Coast Church, so I decided to go with them. That was when religion started becoming important to me. I began attending church regularly and joined the youth group.
Those years were very much me trying to understand what those people at church believed and why. For example, they believed in service and sacrifice for a greater cause. That was very important to them and very important to me. That’s why I felt that I had to serve my country. And because I lived in a military community, service to the country meant joining the military. I think in a lot of communities that means the same thing.
But it wasn’t as a result of study. It was through a very different process than how I came to the convictions that I have now. Before I was trying to figure out what other people believed, and now I’m trying to make my life as close to Jesus’s as possible.
Was it more like riding the bus instead of driving the bus before?
No, I was very much involved. But, at that point the involvement was very corporate, and less personal. I recognized that the bible is a big book and Christianity is a big religion. It’s hard to understand it all. Theology is very complicated. Or at least we make it complicated for some reason.
I saw these people at church. They had looked at this stuff already, so to some extent I just trusted what they said. They have a sermon to preach, and I’m going to listen and see what they have to say. And their sermons were never “everyone has to go and join the military.” I don’t even remember any sermons about the military. I remember some of them had been in the military and that was probably related to some of their teachings, but that’s it. I think it was a pretty average nondenominational church.
Do you remember if they were teaching from the bible? Did they quote from the bible a lot?
Yes. They believed the bible was the word of God and that it has no errors.
And you grew up near the military?
We lived right next to Camp Pendleton, the biggest Marine Corps base on the west coast. From our backyard you could hear the practice bombs being dropped. So I had a lot of friends with military parents. In high school, my friends and I would go paintballing on the base and watch the military exercises.
I was only six at the time, but I remember our city throwing parades for the marines when they came back from Desert Storm. From then until high school, Colin Powell was my hero. I remember waiting in line for hours to meet him and tell him he should run for president.
Let’s see. You also wrote in your application that your grandparents had a big impact on your life, right?
Yes, sir. One was in the army and one in the navy. I grew up pretty close to my mom’s side of the family, and that was the grandad in the navy. He was at both Pearl Harbor and Midway. I did a history day project in middle school about him at Midway. It went all the way to the state competition. Sadly, he died a few years ago and I couldn’t go to his funeral because of navy obligations I had.
I think my becoming a conscientious objector has been really hard on my granny. She’s very religious, like me, but I think she saw it as a betrayal of grandad. They were really proud of me for going to the Academy. I remember them coming out to all the events even though it was getting hard for them to travel. It was really hard for me to first tell her why I was trying to leave the navy, but I think she’s okay with it now.
So after high school, you went to the Naval Academy. In your application on page 6, you mention spending a week on a submarine for PROTRAMID. Quickly, what’s PROTRAMID?
PROTRAMID stands for professional training for midshipmen. While I was at the Naval Academy, we would attend classes during the school year. The summer was broken up into 3 blocks, and each block we would have some sort of direct military assignment. The summer before my junior year I went on PROTRAMID. During that month-long period we spent a week with the marines, a week with the aviation community, a week with the submarine community, and a week doing leadership development. And so the week on the submarine I mentioned was that week with the submarine community.
But I did a lot more than just PROTRAMID during the summers. I spent a month on the Bonhomme Richard (LHD-6), a “helicopter carrier.” I did an internship at the National Security Agency doing electronic warfare. And I spent a lot of time piloting patrol ships up and down the east coast. PROTRAMID was just the only time I spent on a submarine before graduating.
Edit: I have a note here saying that the investigating officer spilled coffee on himself. He makes a joke about how he likes to put in just enough creamer so it blends in with his khaki uniform when it spills. I was so nervous, and I remember this relieving a lot of the tension.
So tell me about your time on the submarine since that’s what you decided to go into.
The submarine was out of San Diego. I believe it was the Jefferson City. We only spent 2 days underway. During that time, they did the sorts of things they do for midshipmen, for example, angles-and-dangles. It’s just maneuvers of the submarine where the floor is moving and you’re almost walking on the wall at some points. The whole idea was to show us what life on a submarine would be like.
I think what appealed to me about submarines was really the sort of missions they go on. I really felt submarines were an important part of our national security policy. I felt like this was where I could have the most impact on the world, make the biggest difference. So that ultimately made me choose submarines. That and realizing they’re not as claustrophobic as I thought.
Edit: Some submarine missions are:
ISR (Intelligence, Surveillance, and Reconnaissance): sticking high tech antennas up near foreign militaries to figure out what they’re doing
Strike warfare: launching tomahawk cruise missiles at targets that regular ships can’t reach
Special warfare: delivering Navy SEALs into foreign countries without being detected
Anti-submarine and anti-surface warfare: traditional WWII-style killing each other
Nuclear deterrence
A really good book about the submarine force is Blind Man’s Bluff.
Oh, okay. Now when did you make that choice?
Either late junior year or early senior year. It was a fairly long process because of the interviews and stuff involved so I forget the exact date.
Edit: You have to pass a technical interview with a 4 star admiral—the guy in charge of the whole military nuclear program—to become a submarine officer.
Did you receive any bonus for choosing submarines?
I did. I received a nuclear bonus of fifteen thousand, I believe, and like I said in my application I am prepared to pay that back. Also, I’m prepared to pay back all the education expenses for the Naval Academy. No one who I’ve talked to knows how much that’s going to be, but it doesn’t matter to me. I don’t care if I’m in debt for the rest of my life. It’s just money. I can’t sacrifice my conscience.
So still in your junior year you really weren’t a conscientious objector yet, right?
In no way, shape, or form.
Okay. Let’s see. In your application on page 6, you said, “We calculated the extent of civilian casualties and whether these numbers were politically acceptable for different types of targets. I accepted that this is the way things were done.” What did you mean by politically acceptable?
In that context I was taking a naval weapons class at the Academy. The way the class was structured was, we as the operators of the weapons needed to be able to understand the politics of using those weapons and the strategy of using those weapons. The strategy was calculating the sorts of damage they would do, and the politics was deciding whether that damage was acceptable.
Acceptable to who?
People higher ranking than me. My job as a junior officer is using the weapons; more senior officers do the strategy part; and very senior officers or civilian appointees write the rules of engagement or provide guidelines saying that in this circumstance it is politically acceptable and necessary for the United States to use these weapons. I understood that at the time to be necessary for our national defense and security and our freedom and democracy and things that I supported. So I supported that structure and supported my role in that structure.
Would it be appropriate to put a sailor or a soldier in charge of a weapon without the knowledge of the devastating effect of the weapon?
I’m personally glad that the Navy taught me about the weapons systems and how they worked and the effects that they will have before putting me in a role where I would be expected to use them.
Actually, one of the things that first got me started down this road is the idea that you should be able to do your boss’s job. I asked myself, “Okay, if I were the one making the decision on when and where to launch a missile, what would I decide?” I felt like a good officer should be able to answer that question, and I wanted to be a good officer.
Okay. On page 6 of your application, you talk about the sad state of people’s beliefs regarding the sanctity of human life. Then you state, “I believed it was my responsibility as a Christian to make my service conform to the ideals of the just war, and in doing so, bring others up to my standards.” So you would go off, be in the military, and that would be your ministry. Basically, to bring others up to Jesus’s standards I’m sure is what you meant.
So that means at the time you believed that you should be in the Navy. Because what you had learned was that maybe a lot of people were not Christians out there that perhaps could come to know Jesus through your testimony.
It wasn’t that I wanted to make people come to know Jesus through my testimony. It was that I saw specific things that were bad about the military, things that were not the way I thought the military should be. For example, there is too much killing of civilians. I watched marines brag about it, and I was appalled. The other communities were even worse. Pilots drop bombs from F-18s, but all they care about is how fun it is to fly faster than the speed of sound. On the submarine, nothing matters except passing your ORS.
Edit: ORS stands for Operational Reactor Safeguards exam. It’s a routine inspection done to submarines to ensure the nuclear reactors are being operated correctly. If a submarine doesn’t pass, then it can’t go on its missions until it does, so all the other submarines have to do the missions instead and make fun of you.
I thought to myself, “This needs to stop, and I’m going to use my authority and example as an officer to make it stop.” So my goal wasn’t to convert people to Christianity, but to make the military a better institution. I wanted our military to live up to the ideals of just war theory. I used to believe in a just war, but I don’t anymore.
Your belief to go forth and bring others up to standard is actually supported by the following scripture. Matthew 9:9-12, “As Jesus went on from there, he saw a man named Matthew sitting at the tax collector’s booth. ‘Follow me,’ he told him, and Matthew got up and followed him. While Jesus was having dinner at Matthew’s house, many tax collectors and sinners came and ate with him and his disciples. When the Pharisees saw this, they asked his disciples, ‘Why does your teacher eat with tax collectors and sinners?’ On hearing this, Jesus said, ‘It is not the healthy who need a doctor but the sick.’” Matthew 28:18-20 says, “Then Jesus came to them and said, ‘All authority in heaven and on earth has been given to me. Therefore go and make disciples of all nations, baptizing them in the name of the Father and the Son and the Holy Spirit, and teaching them to obey everything I have commanded you.’”
Why don’t you think you can do this anymore?
I don’t believe it’s possible to have a just war. I look at all the wars throughout history—especially the ones the US has been in—and all I see is economics and greed. War changes you. It makes you think there are things more valuable than human life. Well, I don’t believe that.
But what about wars like World War II where we fought to defend human life?
I struggled for a long time with this. I really like Bonhoeffer’s approach here. He was basically a pacifist until he tried to assassinate Hitler. His thinking was, “Lord, I know what I’m doing is wrong, but I just don’t see any other way. Please forgive me.” It’s people like that who are trying to live out the just war theory. I don’t see any of that in our military, so that’s the sort of thinking I wanted to bring.
The problem I see now is that a military man cannot think this way. War is not normal, but when you dedicate your life to the art of war it becomes normal. Even if you try to keep it from becoming normal, you can’t. When I was first learning to fire our 9mm, we used silhouettes of people for target practice. The whole point of that is to desensitize us to killing, so that it becomes normal and we get better at it. It’s even worse here in America. I can’t leave my house in uniform without people thanking me for my bravery and service. They thank me for killing people! So what I’m doing must be normal, right?! That’s what we start saying to ourselves.
So that’s why I don’t think the military can ever fight just wars, even in cases like WWII.
So you’d let Hitler take over the world?
No, sir. I see lots of inspiring stories about nonviolence. I see the conscientious objectors in America who volunteered for medical experiments that helped save millions from starvation and disease at the end of the war. I see the Danish resistance movement, which was largely nonviolent. I see the Rosenstrasse protest, where Jews were given their freedom to live in Berlin throughout the whole war. I see the White Rose and the village of Le Chambon. And I see all the other Righteous Among the Nations. I see all these examples of nonviolence, and I’m inspired to follow them.
So I don’t know if these things would have worked on a larger scale, but I know they worked on a smaller scale. And my faith is in Jesus. That when I choose to follow him—really to follow him at the expense of everything else—I believe he makes miracles happen. I believe he changes hearts. And I am willing to suffer and die for this belief. If all we could do to fight evil was to wage war, then we should wage war. I used to think that’s all we could do. But now I believe Jesus shows us another way.
Finally, part of being a peacemaker means taking the plank out of our own eyes. We created Hitler. We created the economic misery Germans faced with the Treaty of Versailles. That was our desire for vengeance. Many Jews came and sought asylum here, but we turned them back and handed them to Hitler. We keep on creating enemies, then killing new enemies, then creating more in a vicious cycle. It’s the same thing we did with Bin Laden and Ahmadinejad. When you ask me to go in a submarine and kill some ragheads, you’re really asking me to create an enemy out of their children that my children are going to have to fight.
That stuff only worked because there were soldiers on the ground. If everyone thought that way, then there’d be no one left to actually fight evil.
Well, I think Jesus’s vision of peace is a lot harder to work for than the military’s vision of peace. Even though Chesterton wasn’t a pacifist, he said it best: Christianity hasn’t been tried and found lacking; it’s been found hard and not tried. I want to try it.
Okay. Let’s go back a second. Once again, first you thought you could help reform the military. This was while you were at the Academy and just after graduating. Only later did you become a conscientious objector?
Yes, it was while at the Academy, and I hadn’t had any thoughts about conscientious objection. I didn’t even know what it was.
When did you learn about conscientious objection?
About a year after graduating from the Academy. I was at nuclear power school, learning how to operate the reactors on the submarine, when I really began to struggle with this issue in my faith. I was flagged by a psychological screening as possibly unfit for duty, but after a few conversations with the psychologist he cleared me. But I still wasn’t comfortable, so I went to see the chaplain. We had a lot of conversations, and eventually he showed me MILPERSMAN 1900-020, the regulations for discharge as a conscientious objector. Like I said, I’d never heard of that before. He recommended I fill out the application as a way to clarify my beliefs, to see if this fit me. So I did.
That was in early 2009? Around March?
Yes, sir.
But you submitted your application in October?
Well, it took me a long time to figure out this was what I had to do. It wasn’t a light decision for me, but something I really struggled with. I had to make sure it was right.
You’ve isolated yourself from other people in the Navy. Do you believe it is wrong to associate with people who don’t agree with your beliefs?
No. I do not believe that it’s wrong. In many ways I think it’s a healthy thing to do. I think an important part of growing in your faith is challenging it, so I’ve tried to talk to lots of people who disagree with me.
That’s actually how I first started thinking about pacifism. At the Naval Academy, we took a course on ethics. We talked about things like the just war tradition. But I was never satisfied with that discussion, because I didn’t feel like I had heard both sides of the issue. So I started reading about pacifism, and I learned that there were groups of Christians who were pacifist. I never knew that before. Growing up, no one ever told me that there’s pacifist Christians.
So when I first started reading about these guys, I did it to challenge my beliefs. I thought that it would make me a better naval officer, because then I would better understand why it is that I’m going to war. I kind of assumed these guys were naive and stupid, but I wanted to know why they were naive. It had the opposite effect. I read people like Yoder, Hauerwas, Dorothy Day, Martin Luther King, Walter Wink and Gandhi. I read a lot. And all of these people impressed me. Their theologies impressed me, but even more so their lives impressed me.
At the Academy we would occasionally read off Medal of Honor citations. I would think to myself, “These guys are heroes. What sacrifice! What effort to save their fellow soldiers!” But now I was reading these Christian pacifists and I saw the same sacrifice and effort. It amazed me because these pacifists were helping people at least as much as the Medal of Honor guys did, and they didn’t even have to kill people to do it. In fact, they were helping their enemies just as much as helping their friends. I’d never seen people with such a dedication to Jesus, and that’s what I wanted in my own life. This caused me to read the Sermon on the Mount over and over, and the more I read it the more I felt the call that Jesus wants us to be nonviolent.
This really disturbed me. I didn’t want these newly forming beliefs. These beliefs have totally ruined my life. I always wanted to be a military officer. I spent my whole life dreaming about it. I can’t even describe the conflict inside of me.
So I did everything I could to convince myself that these beliefs weren’t real. I talked to my closest navy friends about them. We had long conversations analyzing the justness of all the wars America’s ever been in. I talked to my pastors. Lots of pastors actually, I think probably more than 10 now. Both military and civilian, from every denomination possible. I talked to my parents and my family. All of these people believed in just war, and tried to convince me to stay in the navy. Like I said before, I wanted them to convince me. But they couldn’t. Now I understood what Jesus’s love meant: loving your enemies, and forgiving them even as they nailed him to the cross. I had to dedicate myself to that. How could I not?
It sounds to me like you were looking for an excuse to become a pacifist and avoid your military obligations.
No, sir. That couldn’t be farther from the truth.
Pacifism isn’t the only part of my faith I challenged this way. I do it to every part of my faith. For example, I love reading books by Dawkins and Hitchens and all the other new atheists. I struggled with atheism for a while because of them. I didn’t know how to answer all their objections. But reading them really helped me focus my understanding of Christianity. They helped inspire me to study the historicity of the bible and to learn Hebrew. Ultimately, I was able to walk away with a stronger faith than I ever had before.
So I thought when I began reading the pacifist books the same thing would happen. Like I said I thought I would become a better officer. But the pacifists just made more sense. I saw Christ in their lives.
So why have you isolated yourself from the other people in the navy?
I haven’t been hanging out here with the other submariners because we don’t have much in common, and at the end of the day I just want to escape. The whole thing’s really frustrating for me. I’ve talked to so many people about these beliefs already. I don’t need these guys making fun of me—you know how submariners can be to outsiders—and making my life harder than it already is.
You’ve spent a lot of time since then learning to operate the nuclear power plant on a submarine. How has that affected your beliefs?
Not a lot. I think nuclear power can be a good thing, but not nuclear weapons. I’ve been training enlisted sailors for war, and that really bothers me. It’s pretty hard working 12 hours a day, 7 days a week, doing things I don’t believe in. This has really contributed to my stress and frustration with the whole situation.
You know that we don’t put people in charge of nuclear missiles who don’t want the responsibility.
Yes, sir, I know. But my beliefs aren’t just against nuclear missiles, they’re against all weapons and all warfare.
When you launched torpedoes in the trainer, why didn’t you just stand up and tell them it’s wrong and they need to stop? Why didn’t you stand up and say, “No, you’re all wrong. Peace is the way, not war. You need to realize the mistakes you’re making!”
My goal in this process has been to work through the system as much as possible and not to be obnoxious about it. This is my testimony, and I’m giving it to the Navy, and the Navy is going to make their decision on it, and I’m going to follow the Navy’s rules to the best of my ability about this decision. I will refuse orders to kill people, but to the best of my ability I’m going to follow the rules.
One of those rules was when my chain of command “highly recommended” I not talk to anyone else about this. I don’t think they were worried about the officers, but about what would happen if the enlisted guys found out. I don’t want to cause a scene, and I don’t want to get charged with conduct unbecoming or spreading dissent or treason or anything.
I remember a story my last captain told us over lunch once. He was talking about when he was a junior officer on his first submarine. One of the other officers really sucked at his job. He was constantly breaking equipment and making more work for the others. One day, the guy had a nervous breakdown and said he couldn’t take it anymore and was kicked out of the submarine force. Everyone called him a coward and a pacifist. Not a pacifist because he didn’t believe in war, but pacifist was an insult. It was a way to mock him and call him stupid.
Well, I’m a pacifist, and I’m not stupid. I’m actually pretty good at running a nuclear plant. I’m the one picking up others’ slack. I’m always given top rankings in my reviews, and I’ve received a couple of honors. I figure that since this work isn’t directly killing someone, I can do it for the time being, and if I’m going to do it, I’m going to do it better than anyone else. This is my way of turning the other cheek. The Navy’s asked for my tunic, and I’m giving you my cloak as well. I’m going the second mile for you, sir.
Maybe one day all these guys I work with will find out about this, and my actions will bring Jesus glory. But I don’t know for sure. Maybe I could be doing things better. I’m just trying to do my best to honor God in a really difficult situation here. It kinda sucks.
I talked to your father yesterday about why you can’t stay in the military and do something that’s of a different nature that doesn’t involve warfare. I think he had said that in his discussions with you there was some remark about just wearing the uniform, you know. People seeing you wearing the uniform would indicate your support for the military, and that would be unacceptable to you. Was that statement correct?
Yes sir, every day. Even though I’m not currently in a combat position, it weighs against my beliefs. Everything I have to do at work contributes to warfare, it makes others more effective killers, and that’s not compatible with my beliefs. That’s why I need a discharge.
My parents and I talked a lot about noncombatant roles I might be able to take. I think they were very worried for me. They knew I couldn’t continue as I was, but thought that getting a discharge just wasn’t the way out. So we spent a lot of time looking at other possibilities. Like I said, I didn’t want a discharge and tried really hard to find any solution that would solve my beliefs without a discharge. I think this is one of the reasons it took me so long to fill out that first application too.
Basically I made a list of all the possible noncombatant jobs in the navy. There’s a manual somewhere that has them all listed. I went down the list of these billets and asked myself, “Is there any way I could live with myself if I did this job for the next four years?” And the answer was always no. Even the people on hospital ships are being sent to foreign countries to “project American sea power” and provide future bases of military operation.
I remember when I was a little kid. I would see marines coming home from the Gulf War, and I was so proud of them! I wanted to be just like them when I grew up! I think that probably was a huge influence for why I joined the navy in the first place. But I don’t want kids looking at me like that. When people look at me, I want them to see a disciple of Jesus, not Washington.
In your letter to Naval Personnel Command dated 24 Nov 2009, you stated “I continue to respect those in the military for their sacrifices in support of our country.” How is it that you have respect for people that do something that you don’t believe in?
Well, I have respect for people because of their values. It’s the values that I see in service members that attracted me originally—of self sacrifice, honor, courage, and commitment—which have been important to me, are important to me, and will continue to be. I think honoring these values with nonviolence is the higher calling, but I still have many friends whom I respect who disagree. So I feel that I cannot be in the military, but I don’t want to insult people who disagree with me. I don’t want to insult people who are still trying to think like Bonhoeffer, people who are trying to make our wars more just.
After almost seven years in the navy, I was discharged as a conscientious objector in February 2011. There’s no one thing I can point to as being the last straw that made me become a pacifist. Instead, it was a very gradual process. I applied twice for discharge, was denied twice, and had to go to federal court before my discharge was granted. The official record for my case is over 1000 pages. I’m posting some of it here in the hope that others might find it helpful.
I’m posting the stuff here mostly so that anyone else going through the process has something to reference. I remember when I went through the process I wished I had more stuff to help guide me. If that’s why you’re reading this, you should call the GI Rights hotline and the Center on Conscience and War right now. I was skeptical at first. I thought I didn’t need help from other people, but I was wrong. After my first application was denied, I called these organizations and they hooked me up with the ACLU to get great legal representation. Also, make sure you find a good support group to help you out. I spent a year at St Francis House, a pacifist community, while all this was going on. The whole thing was a pretty miserable experience, and I wouldn’t have made it without all these people.
Here’s my first application, my second application, and the habeas corpus petition in federal court. The first two applications were denied. It was only after I submitted the petition that the navy granted me a discharge.
Also, I’ve added some excerpts from my testimony before the investigating officer. A lot of it was very technical and procedural, so I’ve just selected excerpts where we talk about theology. Overall, I interviewed with over 30 officers about my beliefs, each for an hour or more. The interviews with the investigating officers spanned multiple days.
How I interpret the bible, especially the old testament
Why I call myself “pro-peace” rather than “anti-war”
How nonviolence affects my interaction with the government
For the CliffsNotes version, just check out the NYTimes article. Most of the news stories received tons of negative comments, but the editors were nice enough to remove them all.
The Day (local paper), article 1
The Day (local paper), article 2 (written in response to people who complained that I got a discharge)
The Day (local paper), op-ed in my support
The Day (local paper), op-ed against me
San Diego Channel 10 News (includes video)
Get Religion (includes video from Fox news)
The Navy Times (just a reprint of the AP article)
MennoWeekly (an update on my status after several months)
ACLU’s report on the initial filing of my case
The Stupid Shall be Punished (an unofficial submarine community blog with their reaction to my discharge)
The Scoop Deck (this is a semiofficial publication from the military with their reaction to my discharge)
Between 1954 and 1973, 522 conscientious objectors volunteered as human test subjects in the US Army’s biological weapons defense program. Conscientious objectors are people whose religious beliefs forbid them from participating in war. In the United States, most of them are Christians, but they can be of any religion, or of no religion at all. These men believed so firmly that killing people was wrong that they decided to risk their lives as medical test subjects rather than be drafted as soldiers.
Under Project Whitecoat, the Seventh Day Adventist Church made special arrangements with the Army Surgeon General so that members could avoid conscription into a combat role. Adventists are one of the peace churches. They believe that Jesus called all men to love each other unconditionally, and that military service is not compatible with this obligation. Instead, they seek alternative ways to serve their country. In this case, by being exposed to biological weapons.
The goal of Project Whitecoat was “to use human volunteers in medical studies to evaluate the effect of certain biological pathogens upon humans in an effort to determine the vulnerability to attack with biological agents.” These human subjects participated in “studies involving exposure to live agents, receipt of investigational vaccines, and studies of metabolic and psychological effects of environmental- and infection-induced stress.”
According to a GAO report:
The human subjects originally consisted of volunteer enlisted men. However, after the enlisted men staged a sitdown strike to obtain more information about the dangers of the biological tests, Seventh-day Adventists who were conscientious objectors were recruited for the studies.
The study helped develop medical defenses against biological warfare. It resulted in techniques for rapid diagnosis, better cures for disease, better preventative medicine, and better vaccines.
These men deserve our undying respect and admiration. As John F. Kennedy said,
To find out more, read the Army’s official medical report.
Daniel 3 tells the story of 3 Jews who refused to worship a statue that King Nebuchadnezzar built to symbolize his and Babylon’s power. Verses 9-12 read:
Your Majesty has issued a decree that everyone who hears the sound of the horn, flute, zither, lyre, harp, pipe and all kinds of music must fall down and worship the image of gold, and that whoever does not fall down and worship will be thrown into a blazing furnace. But there are some Jews whom you have set over the affairs of the province of Babylon—Shadrach, Meshach and Abednego—who pay no attention to you, Your Majesty. They neither serve your gods nor worship the image of gold you have set up.
The rest of the chapter describes how these 3 Jews are thrown into a blazing fire for their insolence, but God saves them.
When I read this chapter, I can’t help but think there is a parallel between the statue of gold and the American flag. Whenever the national anthem gets played, I am expected to stand, place my hand over my heart, and venerate the flag. Of course, I won’t be burned alive if I refuse. But most Americans still get upset when they see me not doing these things.
Personally, the American flag used to be a huge idol in my life. This story is one of the Biblical passages that helped me realize that.
I enjoy brewing beer, and have invested a lot into equipment. But most of this equipment is useful for more than just beer. We can use it to make food! Some friends and I do this with a group called Food Not Bombs. Basically, we just serve free lunches at our local college campus, no strings attached. It’s a great way to connect with your community and make friends.
So what homebrew equipment do I have? A pretty typical all-grain brewing set up.
Here’s the stuff that we can also use for food:
10 gallon pot
5 gallon pot (this is leftover from when I first started brewing doing partial boils)
10 gallon water cooler
mash tun stirrer (I use a wooden dowel bought from Lowe’s)
hops bag
Outdoor propane burner
The only tricky bit is the water cooler, which needs to be converted from a mash tun every time we serve. I used PVC cement to attach half inch threads to the normal water cooler attachment.
This lets us swap between a hose barb for mashing and a water cooler attachment for serving cold water:
Buying all this equipment new would probably cost around $300-400, but your typical homebrewer will already have most of it anyways. The only other things you need are a table and an awning. Luckily, we were able to borrow those from a local church. Here’s what our final setup looks like:
We mostly serve Chinese stew because it is cheap, delicious, and simple to make. If you’re looking for more ideas, check out the Food Not Bombs recipe page, or Ellen’s Kitchen. We’ve also tried making chili, but it is very easy to burn the tomato paste when making these giant batches. So be careful!
We purchase these ingredients from Costco for a total of $19.96 after taxes, but it should be pretty cheap anywhere:
(20 lbs) potatoes
(10 lbs) onion
(10 lbs) carrots
(2 lbs) broccoli
(2 gallons) rice (This actually comes in 50 lbs bags, and you get enough for many feedings)
You could also add some chopped beef if you wanted, but we prefer serving vegetarian food so that more people will be able to eat it. Anyways, there’s so much flavor in this dish from the spices that adding meat doesn’t really make it taste any better.
The spices are:
10 oz freshly chopped ginger
25 bay leaves
25 star anise
3 Tbsp Szechuan pepper (Most peppers make your mouth taste hot, but this pepper makes your mouth taste cold!)
6 Tbsp chili powder
1 cup minced garlic
2 cups soy sauce
4 Tbsp corn starch (this thickens the water, turning it from a soup into a sauce)
I personally prefer my Chinese food to have a lot of spice in it, but this version is pretty mild so that everyone can enjoy it. We usually leave bottles of hot sauce and soy sauce available so that people can spice their bowls exactly how they want it.
All of these ingredients can be found at your local Asian food market. Buying in bulk, it costs us about $20 for enough spices for about 10 meals.
Cooking lunch for 150 really isn’t very different from cooking for only 5. The only difference is that the pots are bigger, and things take a little more time. Since we’re distributing food to the public, we also have to follow certain safety laws, like getting licenses and permits. Due to these regulations, all our food must be cooked at the site where we plan to serve it. So we make some preparations in the kitchen, then take everything to the site and start cooking.
In the kitchen:
Rinse all vegetables, then chop carrots, onion, and potato into 1/2 inch cubes. This takes 2 people about an hour if you chop quickly.
Fill the 10 gallon pot with the chopped potatoes and 2 gallons of water
Fill the 5 gallon pot with the remainder of the vegetables (to prevent burning, we don’t add all the vegetables to the large pot before bringing it to a boil)
Chop the ginger into thin strips. Place the ginger, bay leaves, star anise, and Szechuan pepper into the hops bag. All of these spices make the water taste great, but you don’t want to accidentally bite into them!
Then we load everything into a truck and drive to the site. We setup our kitchen, and begin cooking:
Bring the 10 gallon pot, with potatoes and water to a boil
Add the rest of the vegetables. Bring to boil
Once boiling, add the hops bag and the rest of the spices.
Stir thoroughly to prevent burning on the bottom of the pot
The vegetables will be cooked in about 20 minutes, but the longer you leave them in there, the more flavors they’ll absorb. I usually let the pot simmer for between 30 minutes and an hour, depending on how much time I have until serving.
Once done, take the 10 gallon pot off the burner and allow it to cool as you cook the rice. Add 4 gallons of water to the 2 gallons of rice. (Because there’s room in between rice grains, this still manages to barely fit in the 5 gallon pot.)
It’s ready! At last! How will we eat it?!
Using paper plates and plastic spoons is by far the easiest, but it also adds to your expenses. Altogether, disposable utensils cost about 20 cents per meal, making it half of your overall expenses for serving!
We usually serve each bowl with one big spoon full of rice, and two of vegetables. Make sure to add plenty of sauce, since that’s where all the flavor is! Obviously, the amount you put in each bowl determines how many people you’re going to feed. We usually end up serving between 120-150 with this meal.
Finally, get a local singer to play the guitar while you serve!
Bruce Phillips believed that killing people in war was wrong. After fighting in Korea, he became a conscientious objector. But he was certainly no coward: he volunteered as a smokejumper. Smokejumpers parachute into forest fires to extinguish them while the fire is still remote, before it becomes a direct threat to the public. Conscientious objectors during WWII pioneered the practice, and by the end of the war, 240 were deployed smokejumping across the country. Due to the success of the program, the US Forest Service continues it to this day.
I cried when I first read this poem. It testifies to the conscientious objectors’ courage, and nonviolent convictions. They were real men.
War came; the young men all stood in line to go.
But we, when asked to take the oath, simply answered, "No."
For what we said was simple, though said by just a few:
"I will not shoot another man because I'm ordered to."
No wonder some were puzzled, or took it as a joke,
when COs wrote and volunteered to jump into the smoke.
You said that what we were doing could prove that we were men;
we had---and didn't need your words to prove it once again.
You thought that we were renegades, and the training much too hard;
we packed your words in our duffel bags and left for Camp Menard.
But you shunned us in the cookhouse, and cursed us to our souls;
your words were blurred by the heat and sweat, as we practiced landing rolls.
You said we were too yellow to jump with airborne troops;
we rolled your words in our shroud lines when the rigger packed our chutes.
We turned aside your hatred, and blunted your abuse;
we held your words in clenching teeth, and climbed into the goose.
You told us we were cowards, called each of us a liar;
we hooked your words to the static line, and jumped into the fire.
And all you said hung over us as we saw our chutes deploy;
we took your words to the fire line, to save and not destroy.
You said we'd never understand what war is all about;
we threw your words on the roaring flames and put the fire out.
Reprinted from Mark Mathews’s Smoke Jumping on the Western Fire Line.
One common myth about conscientious objectors in the US is that they are reservists who took the government’s money to pay for college but then refused to fulfill their end of the bargain. This graph, however, shows that most conscientious objectors are in fact full-time, active duty personnel:
The data was obtained from two Government Accountability Office (GAO) reports on conscientious objection. Report GAO/NSIAD-94-35 covers conscientious objection during the First Persian Gulf War, and report GAO-07-1196 covers conscientious objection during Operation Iraqi Freedom.
This is a collection of government files concerning the conscientious objector (CO) process in the United States. Many of these files are outdated; whatever analysis they provide is probably no longer relevant. They are probably not of interest to you unless you are doing some serious historical work.
GAO-07-1196 (2007) “Number of Formally Reported Applications for Conscientious Objectors is Small Relative to the Total Size of the Armed Forces”
GAO/NSIAD-94-35 (1993) “Conscientious Objectors - Number of Applications Remained Small During the Persian Gulf War”
GAO/NSIAD-98-199 (1998) “Gender Issues - Changes Would be Needed to Expand Selective Service Registration to Women”
“An Assessment of Health Status Among Medical Research Volunteers who Served in the Project Whitecoat Program at Fort Detrick, Maryland” (2005) - 2000 conscientious objectors volunteered to have biological weapons tested on them rather than fight in combat
“Restructuring the In-Service Conscientious Objector Program” (1993) - Army JAG argues that in-service CO regulations are too lenient and need to be restricted
Technical Report 70-1, and Appendix (1970) - Army manual about training medical corpsmen who are also 1-A-O conscientious objectors
DoDD 1300.06 (1971) - All changes after this date appear to be trivial. This regulation is a significant deviation from previous regulations, however, which I do not have copies of.
These are full journals, not just the relevant article. Do a search for “conscientious objector” to find the relevant section.
“Conscientious Objectors and Courts-Martial: Some Recent Developments” (1971)
“Nuclear Weapons: The Crisis of Conscience”(1985) - Discusses conscientious objection in relation to nuclear pacifism and denunciation by Catholic Bishops of nuclear weapons
This is a tutorial for how to use Hidden Markov Models (HMMs) in Haskell. We will use the Data.HMM package to find genes in the second chromosome of Vitis vinifera: the wine grape vine. Predicting gene locations is a common task in bioinformatics that HMMs have proven good at.
The basic procedure has three steps. First, we create an HMM to model the chromosome as a whole. We do this by running the Baum-Welch training algorithm on all the DNA. Second, we create an HMM to model transcription factor binding sites, which mark where genes begin. Finally, we use Viterbi’s algorithm to determine which HMM best models the DNA at a given location in the chromosome. If it’s the first, this is probably not the start of a gene. If it’s the second, then we’ve found a gene!
Unfortunately, it’s beyond the scope of this tutorial to go into the math of HMMs and how they work. Instead, we will focus on how to use them in practice. And like all good Haskell tutorials, this page is actually a literate Haskell program, so you can simply cut and paste it into your favorite text editor to run it.
Before we do anything else, we must import the Data.HMM library, and some other libraries for the program
>import Data.HMM
>import Control.Monad
>import Data.Array
>import System.IO
Now, let’s create our first HMM. The HMM datatype is:
data HMM stateType eventType = HMM { states :: [stateType]
, events :: [eventType]
, initProbs :: (stateType -> Prob)
, transMatrix :: (stateType -> stateType -> Prob)
, outMatrix :: (stateType -> eventType -> Prob)
}
Notice that states and events can be any type supported by Haskell. In this example, we will be using both integers and strings for the states, and characters for the events. DNA is composed of 4 bases repeated over and over: adenine (A), guanine (G), cytosine (C), and thymine (T), so “AGCT” will be the list of our events.
We’ll start by creating a simple HMM by hand:
>hmm1 = HMM { states=[1,2]
> , events=['A','G','C','T']
> , initProbs = ip
> , transMatrix = tm
> , outMatrix = om
> }
>
>ip s
> | s == 1 = 0.1
> | s == 2 = 0.9
>
>tm s1 s2
> | s1==1 && s2==1 = 0.9
> | s1==1 && s2==2 = 0.1
> | s1==2 && s2==1 = 0.5
> | s1==2 && s2==2 = 0.5
>
>om s e
> | s==1 && e=='A' = 0.4
> | s==1 && e=='G' = 0.1
> | s==1 && e=='C' = 0.1
> | s==1 && e=='T' = 0.4
> | s==2 && e=='A' = 0.1
> | s==2 && e=='G' = 0.4
> | s==2 && e=='C' = 0.4
> | s==2 && e=='T' = 0.1
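Before moving on, we can sanity-check hmm1 by running Viterbi on a short sequence (this example is my own sketch; viterbi takes the HMM and an array of events, just as in the gene-finding code later in the tutorial). Since state 1 favors A and T while state 2 favors G and C, the decoded states should roughly track the bases:

>testHmm1 = viterbi hmm1 $ listArray (1,8) "AATTGGCC"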
While creating HMMs manually is straightforward, we will typically want to start with one of the built-in HMMs. The simplest way to do this is with the simpleHMM function:
>hmm2 = simpleHMM [1,2] "AGCT"
hmm2 is an HMM with the same states and events as hmm1, but all the initial, transition, and output probabilities are distributed in an unknown manner. This is okay, however, because we will normally want to train our HMM using Baum-Welch to determine those parameters automatically.
Another simple way to create an HMM is by creating a non-hidden Markov model with the simpleMM command (note the absence of an “H”). Below, hmm3 is a 3rd order Markov model for DNA:
>hmm3 = simpleMM "AGCT" 3
Now, how do we train our model? The standard algorithm is called Baum-Welch. To illustrate the process, we’ll create a short array of DNA, then call three iterations of baumWelch on it.
>dnaArray = listArray (1,20) "AAAAGGGGCTCTCTCCAACC"
>hmm4 = baumWelch hmm3 dnaArray 3
We use arrays instead of lists because this gives us better performance when we start passing large training data to Baum-Welch. Doing three iterations is completely arbitrary. Baum-Welch is guaranteed to converge, but there is no way of knowing how long that will take.
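Since there is no way to know in advance how many iterations you will need, one option is to train a single iteration at a time and stop once the model stops changing. Here is a rough sketch of mine (not a package function); it compares printed representations, which works because the package’s HMMs can be shown, though in practice you would compare the probabilities within a tolerance rather than demand exact equality:

>trainUntilStable hmm arr
>    | show hmm' == show hmm = hmm'
>    | otherwise             = trainUntilStable hmm' arr
>    where hmm' = baumWelch hmm arr 1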
Now, let’s train our HMM on an entire chromosome. We will use the winegrape-chromosome2 file. This DNA file was downloaded from the plant genomics database. We can load and process it like this:
>loadDNAArray len = do
>    dna <- readFile "winegrape-chromosome2"
>    let dnaArray = listArray (1,len) $ filter isBP dna
>    return dnaArray
>    where
>        -- filter out the "N" base pairs; "N" means it could be any bp,
>        -- so dropping them should not affect results too much
>        isBP x = x `elem` "AGCT"
>
>createDNAhmm file len hmm = do
>    dna <- loadDNAArray len
>    let hmm' = baumWelch hmm dna 10
>    putStrLn $ show hmm'
>    saveHMM file hmm'
>    return hmm'
The loadDNAArray function simply loads the DNA from the file into an array, and the createDNAhmm function actually calls the Baum-Welch algorithm. This function can take a while on long inputs—and DNA is a long input!—so we also pass a file parameter for it to save our HMM when it’s done for later use. Now let’s create our HMM:
>hmmDNA = createDNAhmm "trainedDNA.hmm" 50000 hmm3
This call takes almost a full day on my laptop. Luckily, you don’t have to repeat it. The Data.HMM.HMMFile module allows us to write our HMMs to disk and retrieve them later. Simply download trainedDNA.hmm and then call loadHMM:
>hmmDNA_file = loadHMM "trainedDNA.hmm" :: IO (HMM String Char)
NOTE: Whenever you use loadHMM, you must specify the type of the resulting HMM. loadHMM relies on the built-in “read” function, and this cannot work unless you specify the type!
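For example, here is how that annotation looks when loading inside a do block (a minimal sketch of mine, reusing the trainedDNA.hmm file saved above):

>printTrained = do
>    hmm <- loadHMM "trainedDNA.hmm" :: IO (HMM String Char)
>    putStrLn $ show $ states hmm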
Great! Now, we have a fully trained HMM for our chromosome. Our next step is to train another HMM on the transcription factor binding sites. There are many advanced ways to do this (e.g. Profile HMMs), but that’s beyond the scope of this tutorial. We’re simply going to download a list of TF binding sites, concatenate them, then train our HMM on them. This won’t be as effective, but saves us from taking an unnecessary tangent.
>createTFhmm file hmm = do
> x <- strTF
> let hmm' = baumWelch hmm (listArray (1,length x) x) 10
> putStrLn $ show hmm'
> saveHMM file hmm'
> return hmm'
> where
> strTF = liftM (concat . map ( (++) "") ) loadTF
> loadTF = liftM (filter isValidTF) $ (liftM lines) $ readFile "TFBindingSites"
> isValidTF str = (length str > 0) && (not $ elemChecker "#(/)[]|N" str)
>
>elemChecker :: (Eq a) => [a] -> [a] -> Bool
>elemChecker elemList list = any (`elem` list) elemList
Now, let’s create our transcription factor HMM:
>hmmTF = createTFhmm "trainedTF.hmm" $ simpleMM "AGCT" 3
Or if you’re in a hurry, just download trainedTF.hmm and load it:
>hmmTF_file = loadHMM "trainedTF.hmm" :: IO (HMM String Char)
So now we have two HMMs; how are we going to use them? We’ll combine the two HMMs into a single HMM, then use Viterbi’s algorithm to determine which HMM best characterizes our DNA at a given point. If it’s hmmDNA, then we do not have a TF binding site at that location, but if it’s hmmTF, then we probably do.
The Data.HMM library provides another convenient function for combining HMMs, hmmJoin. It adds transitions from every state in the first HMM to every state in the second, and vice versa, using the “joinParam” to determine the relative probability of making that transition. This is the simplest way to combine two HMMs. If you want more control over how they get combined, you can implement your own version.
>findGenes len joinParam hout = do
> hmmTF <- loadHMM "hmm/TF-3.hmm" :: IO (HMM String Char)
> hmmDNA <- loadHMM "hmm/autowinegrape-1000-3.hmm" :: IO (HMM String Char)
> let hmm' = seq hmmDNA $ seq hmmTF $ hmmJoin hmmTF hmmDNA joinParam
> dna <- loadDNAArray len
> hPutStrLn hout ("len="++show len++",joinParam="++show joinParam++" -> "++(show $ concat $ map (show . fst) $ viterbi hmm' dna))
>
>main = do
>    hout <- openFile "geneResults.hmm" WriteMode -- hypothetical output file name
>    mapM_ (\len -> mapM_ (\jp -> findGenes len jp hout) [0.5,0.51,0.52,0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.6]) [50000]
> hClose hout
Finally, our main function runs findGenes with several different joinParams. These act as thresholds for finding where the genes actually occur. You can download the full results here.
How should we interpret these results? Let’s look at the output from around 38000 base pairs into the chromosome:
jP=0.50 -> 222222222222222222222222222222222222222222222222222222
jP=0.51 -> 222222222222222222222222222222222222222222222222222222
jP=0.52 -> 222222222222222222222222222222222222222222222222222222
jP=0.53 -> 222222222222222222222222222222222222222222222222222222
jP=0.54 -> 222222222222222222222222222222222222222222222222222222
jP=0.55 -> 222222222222222222222222222222222222222222222222222222
jP=0.56 -> 222222222222222222222222222211112222222222222222222222
jP=0.57 -> 222222222222222222222222222211112222222222222111222222
jP=0.58 -> 222221111111112222222222222211111122222222222111222222
jP=0.59 -> 222221111111112222211111111111111122211111111111222222
jP=0.60 -> 222221111111112222211111111111111111111111111111112222
Everywhere there is a 2, Viterbi selected hmmDNA; wherever there is a 1, it selected hmmTF. Whether you select this area as a likely candidate for a transcription factor binding site depends on how you set your join parameter.
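To turn a row like this into actual coordinates, you can pull out the runs of 1s. The helper below is my own sketch, not part of Data.HMM; it returns the (start, end) position of every stretch that Viterbi attributed to the TF model. Applied to the jP=0.58 row above, it would return the coordinates of the three runs of 1s.

>tfRegions :: String -> [(Int,Int)]
>tfRegions = go 1
>    where
>        go _ [] = []
>        go i xs@(c:_)
>            | c == '1'  = (i, i + length run - 1) : go (i + length run) rest
>            | otherwise = go (i + length run) rest
>            where (run, rest) = span (== c) xs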
Now that you’re familiar with how the Data.HMM module works, let’s look at its performance characteristics.
Overall, the Data.HMM package performs well on medium size datasets of up to about 10,000 items. Unfortunately, on larger datasets, performance begins to suffer. Algorithms that should be running in linear time start taking super-linear time, presumably because Haskell’s garbage collector is interfering. More work is needed to determine the exact cause and fix it. Still, performance remains tractable on these large datasets up to 100,000 items, which is the largest I tried.
I ran these tests using Haskell’s criterion package. Criterion conveniently lets you define multiple tests and handles all the statistical analysis for you. For these tests, I did 3 trials each, and ran them on my Core 2 Duo laptop. The code for the tests can be found in the HMMPerf.hs file. In all graphs, the blue line is actual performance data and the red line is a best fit curve.
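If you want to reproduce the tests, a minimal criterion benchmark looks something like this (a sketch in a standalone file; the HMM and array are the small ones from earlier in the tutorial):

import Criterion.Main
import Data.Array
import Data.HMM

main = do
    let dnaArray = listArray (1,20) "AAAAGGGGCTCTCTCCAACC"
        hmm      = simpleMM "AGCT" 3
    defaultMain
        [ bench "baumWelch 1 iteration" $ whnf (\a -> baumWelch hmm a 1) dnaArray
        , bench "viterbi"               $ whnf (viterbi hmm) dnaArray
        ]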
Baum-Welch’s performance
First, as expected, we find that Baum-Welch runs in linear time in the number of iterations. In an imperative language, there would be no point in even testing this. But in Haskell, laziness can rear its head in unexpected ways, so it is important to ensure this is linear.
For small arrays, Baum-Welch runs in linear time.
But for larger arrays, it runs in super-linear time. It is interesting that the exponent on our best-fit polynomial is not quite 2. This provides evidence that the performance hit has to do with the Haskell compiler and not an incorrect implementation.
Viterbi’s performance
As expected, the Viterbi algorithm runs in quadratic time in the number of states in the HMM.
The curves for the Viterbi algorithm clearly demonstrate that something weird is going on. At small array sizes, Viterbi is only mildly super-linear; its best-fit polynomial curve has an exponent of only 1.3. But at medium array lengths, this exponent increases to 1.8, and at large array lengths, it increases to 1.97.
Data.HMM is a great tool if you just need a small HMM in your Haskell application for some reason. If you’re going to be making heavy use of HMMs and don’t specifically need to interact with Haskell, it’s probably better to use a package written in C++ that’s been optimized for speed.
Let’s make some unfair coins by bending them. Our guess is that the concave side will have less area to land on, and so the coin should land on it less often.
It’s easy to bend the coins with your teeth:
WAIT! That really hurts! Using pliers or wrenches works much better:
I made seven coins this way, each with a different bending angle.
I did 100 flips for each coin, making sure each flip went at least a foot in the air and spun real well. “Umm… only 100 flips?” you ask, “That can’t be enough!” Just you wait until the section on the math.
Here are the raw results:
Coin | Total Flips | Heads | Tails |
0 | 100 | 53 | 47 |
1 | 100 | 55 | 45 |
2 | 100 | 49 | 51 |
3 | 100 | 41 | 59 |
4 | 100 | 39 | 61 |
5 | 100 | 27 | 73 |
6 | 100 | 0 | 100 |
Coin flipping is a Bernoulli process. This just means that all trials (flips) can have only two outcomes (heads or tails), and each trial is independent of every other trial. What we’re interested in calculating is the expected value of a coin flip for each of our coins. That is, what is the probability it will come up heads? The obvious way to calculate this probability is simply to divide the number of heads by the total number of trials. Unfortunately, this doesn’t give us a good idea about how accurate our estimate is.
Enter the beta distribution. This is a distribution over the bias of a Bernoulli process. Intuitively, this means that CDF(x) equals the probability that the expectation of a coin flip is \(\le\) x. In other words, we’re finding the probability that a probability is what we think it should be. That’s a convoluted definition! Some examples should make it clearer.
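To make that concrete: if we have flipped \(h\) heads and \(t\) tails, then (starting from a uniform prior over the bias) the probability that the coin’s true bias \(p\) is at most \(x\) works out to
\[ \Pr[p \le x] = \frac{\int_0^x u^{h} (1-u)^{t} \, du}{\int_0^1 u^{h} (1-u)^{t} \, du} \]
The numerator weighs every bias up to \(x\) by how well it explains our flips, and the denominator normalizes. This is exactly the beta CDF with the parameters described next.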
The beta distribution takes two parameters \(\alpha\) and \(\beta\). \(\alpha\) is the number of heads we have flipped plus one, and \(\beta\) is the number of tails plus one. We’ll talk about why that plus one is there in a bit, but first let’s see what the distribution actually looks like with some example parameters.
In both the above cases, the distribution is centered around 0.5 because \(\alpha\) and \(\beta\) are equal—we’ve gotten the same number of heads as we have tails. As these parameters increase, the distribution gets tighter and tighter. This should make sense. The more flips we do, the more confident we can be that the data we’ve collected actually match the characteristics of the coin.
When the parameters are not equal to each other—for example, we’ve seen twice as many heads as we have tails—then the distribution is skewed to the left or right accordingly. The peak of the PDF occurs at:
\[ \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{h}{h + t} \]
That’s exactly what we said the expectation of the next coin flip should be above. Awesome!
So what happens when \(\alpha\) and \(\beta\) are one?
We get the flat distribution. Basically, we haven’t flipped the coin at all yet, so we have no data about how our coin is biased, so all biases are equally likely. This is why we must add one to the number of heads and tails we have flipped to get the appropriate \(\alpha\) and \(\beta\).
If \(\alpha\) and \(\beta\) are less than one, we get something like this:
Essentially, this means that we know our coin is very biased in one way or the other, but we don’t know which way yet! As you can imagine, such perverse parameterizations are rarely used in practice.
Hopefully, this has given you an intuitive sense for what the beta distribution looks like. But for the pedantic, here’s how the beta distribution’s PDF is formally defined:
\[ f(x; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1} \]
Where \(\Gamma\) is the gamma function—you can think of it as a generalization of the factorial to the real numbers, satisfying \(\Gamma(x+1) = x\,\Gamma(x)\) just as \(x! = x\,(x-1)!\). Excel, many calculators, and any scientific programming package will be able to calculate that for you easily. Most of these applications will even have the beta function already built in.
We’re finally ready to see just how biased our coins actually are!
(Beta distribution plots, one per coin:)
Coin 0: Heads 53, Tails 47
Coin 1: Heads 55, Tails 45
Coin 2: Heads 49, Tails 51
Coin 3: Heads 41, Tails 59
Coin 4: Heads 39, Tails 61
Coin 5: Heads 27, Tails 73
Coin 6: Heads 0, Tails 100
Amazingly, it takes some pretty big bends to make a biased coin. It’s not until coin 3, which has an almost 90 degree bend, that we can say with any confidence that the coin is biased at all. People might notice if you tried to flip that coin to settle a bet!
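To put a number on coin 3: with 41 heads and 59 tails, the bias follows a Beta(42, 60) distribution, and we can estimate \(\Pr[p < 0.5]\), the probability that the coin really does favor tails, by numerically integrating the density. Here is a rough standalone Haskell sketch (the helper names and step count are my own):

-- log-factorials keep the normalizing constant from overflowing a Double
logFact :: Int -> Double
logFact n = sum (map (log . fromIntegral) [1 .. n])

-- log B(a,b) for integer parameters: B(a,b) = (a-1)! (b-1)! / (a+b-1)!
logBeta :: Int -> Int -> Double
logBeta a b = logFact (a - 1) + logFact (b - 1) - logFact (a + b - 1)

-- the Beta(a,b) density at x
betaPdf :: Int -> Int -> Double -> Double
betaPdf a b x = exp ( fromIntegral (a - 1) * log x
                    + fromIntegral (b - 1) * log (1 - x)
                    - logBeta a b )

-- midpoint-rule estimate of Pr[p <= t]
probBelow :: Int -> Int -> Double -> Double
probBelow a b t = dx * sum [ betaPdf a b x | x <- [dx/2, 3*dx/2 .. t - dx/2] ]
    where dx = t / 10000

main :: IO ()
main = print (probBelow 42 60 0.5) -- prints roughly 0.96

So even after 100 flips of a coin bent that badly, there is still a few percent chance of being wrong about which way it is biased.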
Download the source code here. It is released under the BSD license. If you find it useful or do anything cool with it, I’d appreciate a heads up.
Other stuff you’ll need:
Microsoft Visual C++ Express Edition, available here
Royal Vegas Poker software and account, available here (You must have an account to log on to the software and observe a game. You do not need to have money in the account, unless you want to actually have PokerPirate play.)
(optional) VMWare player, available here
You will need a copy of Windows to run inside VMWare.
Optional stuff for the database analysis:
Download my sample database here (currently unavailable)
Download my analysis tool here (currently unavailable)
MySQL
Any web server with php support, for example Apache
The original setup was on an AMD Athlon 1800+ computer running Linux. I used VMWare Player to run a Windows XP virtual machine. On the virtual machine I ran only Visual C++, RVP, and my own code. It worked out pretty well, and I recommend that setup to anyone who is considering a serious pokerbot.
I apologize for the poor quality of coding in places. I never thought I would release it to the public. Anyways, my recommendation is to completely understand what I wrote in the AI section. Once you’ve done that, start reading at main(). Most of the functions and variables have descriptive names. If you don’t understand what a function does or how it does it, then right click -> view implementation. Good luck! Send me an email if you want some help.
When you start PokerPirate, it automatically scans your open windows to detect which ones belong to Royal Vegas Poker. In debug mode, it will automatically connect to the first table it finds. Therefore, first start RVP and open one game table.
Starting RVP…
Select and open a sit-and-go table…
NOTE: Make sure the table is for 10 players, with the appropriate blind structures. This is not required, but it is what the AI is designed for.
An empty table is opened…
NOTE: PokerPirate determines the game’s state from the information in the chat panel. It assumes that, initially, all players started with 1000 chips. Every time a bet, raise, call, or all-in is recorded in the chat box it deducts the appropriate number of chips from that player. Every time a player wins a hand, it adds those chips. The advantage of this is that no OCR techniques are required to determine a player’s chip counts. A simple call to GetWindowText() returns the information in text format, which is much easier to parse. The disadvantage is that the entire game’s status must be displayed in the window. Therefore, it is not possible to begin analyzing a game half way through. You must have the game open when it starts.
In order to launch PokerPirate, you will have to pass the command line option “debug” to the program. The easiest way to do this is to create a shortcut on your desktop (or 5 like I did)…
NOTE: PokerPirate will not be able to connect to a window unless it knows what your alias is! (Notice how your name appears in the window’s title.) Edit the pp.conf file, and change the “Moniker” value to be what yours actually is.
The program is now running…
By default, the program will not play for you. This will let you experiment with the code without the risk of losing money. If you want it to play, simply update pp.conf and set “Observer” to “false”. Be forewarned that PokerPirate will take over your mouse, so you won’t be able to surf the web or anything while this is happening. Good luck!
My AI uses what I like to call the “manual” technique, also known as rule-based artificial intelligence. Basically, a human writes a list of rules that determine how the AI should act in any given circumstance. The advantage of this technique is it is relatively easy to implement from a programming perspective. From an AI perspective, however, it can be very difficult to determine what those rules should be. That is why most pokerbots suck. These rules are much easier to generate, however, for sit-and-go tournaments.
There are many disadvantages to this manual AI technique. First, the AI cannot learn and adapt. It is possible, for example, that a player will go all-in every single hand. A human would recognize this easily and be able to take advantage of it. My AI, however, cannot. Second, the AI is only as capable as its human designer. I was never able to win consistently in the 30+3 sit-and-go’s or above. Therefore, I would not be able to design a set of rules that would enable the AI to do this.
I spent considerable time working on more automated AI techniques, but found them to be unwieldy and not worth the effort. Below, I will discuss only the manual technique as I implemented it. Maybe later I will update this page to include my findings on the automated AI.
Now that we know how the AI sends commands to RVP (makePlay()), we can discuss how it handles the decision making process. This is what getPlay() does. The AI has certain types of moves it knows how to play (blind steal, slow play, value play, etc). The AI only knows about standard moves that should be familiar to anyone who has played Texas Hold’em seriously.
Initially, we assume that we will be folding. This is a conservative assumption, because we usually do not have cards good enough to justify playing (this applies to humans too!). Then, we see if any of our moves apply in the given situation. If one does, we override the fold and return. Otherwise, we try the next move.
Each move is a function, which takes a TableTexture struct as an input. This struct is generated by MsgQueue and defines all the relevant information about the cards on the table. For example, it contains the pot odds and whether or not there is a straight scare on the table. The full implementation can be found here.
getPlay() looks like this:
ActionRet ret;
/*
Heads up is a very unique situation which requires a lot of aggression;
handle this situation separately
*/
ret=headsUp(t);
/*
If ret is set to something other than FOLD, then we are done; otherwise,
we must analyze to see what the best move is. Depending on what stage the
hand is in, the AI has a certain number of "moves" that it can perform.
These moves should be fairly self explainable from their function names.
note: pf=PreFlop
*/
switch (table->getStage()) {
case STAGE_PREFLOP:
if (ret.action==FOLD) ret=pfLimp(t); // call with weak cards, hoping to hit a set, straight, or flush later
if (ret.action==FOLD) ret=pfBlindSteal(t); // raise based solely on position
if (ret.action==FOLD) ret=pfValuePlay(t); // bet if we have a good hand
break;
case STAGE_FLOP:
if (ret.action==FOLD) ret=chaseDraws(t); // do pot odds make it reasonable?
if (ret.action==FOLD) ret=checkSteal(t); // it's been checked down to us, a bet might win uncontested
if (ret.action==FOLD) ret=continueBet(t); // if we bet big pf, then good chance a large bet here will win
if (ret.action==FOLD) ret=probeBet(t); // gather information about the opponents' hand
if (ret.action==FOLD) ret=valuePlay(t); // bet if we have a good hand
break;
case STAGE_TURN:
if (ret.action==FOLD) ret=chaseDraws(t); // do pot odds make it reasonable?
if (ret.action==FOLD) ret=valuePlay(t); // bet if we have a good hand
break;
case STAGE_RIVER:
if (ret.action==FOLD) ret=valuePlay(t); // bet if we have a good hand
break;
}
/*
if we are desperate (defined by our stack/blinds/pot ratios), then we should
turn a call into an all-in. Basically, we're already pot committed, so we have
to go for gold. Also, this prevents us from having to worry about "tricky" folds later on.
*/
if (t.desperate||t.uberAgg||t.uberDesperate) {
if (ret.action==CALL) {
ret.sub=ret.name;
ret.name="nocall";
ret.action=BET;
ret.amt=allin();
}
}
return ret;
Most of these moves are self-explanatory and it will be easy to follow the source code; however, I want to draw special attention to headsUp(), pfValuePlay(), and valuePlay() below as illustrative examples.
It is important to note that there is no bluff() move. My bot was not designed to make any outright bluffs. Instead, it only makes semi-bluffs incorporated into the other moves. At the level of play PokerPirate was designed for, I think this is entirely appropriate.
All the functions can be viewed here:
This is the simplest and most important of the AI’s moves. Basically, if there are only 2 people left and the blinds are high enough, we should always go all-in. This worked so well because other players are surprisingly timid and usually fold. When I got in the money, I would finish 3rd 25% of the time, 2nd 25%, and 1st 50%. Because 1st pays so much better than 2nd or 3rd, this function was my moneymaker.
To make PokerPirate profitable at higher stakes games, the headsUp() function will require an alternate strategy. Luckily, heads up poker playing is nearly a solved problem even for no-limit games. Assuming we will always push (all-in) or fold, we can follow this table.
Here’s how to read the table:
Determine number of BB’s in the short stack (could be either us or the opponent)
From the SB, push if the number of BB’s you are playing for is less than or equal to the number in the cell for the cards you are holding. For example, if you have 94o, you should be pushing when the game is for 2.7 BB’s or less. If you have 3 BBs, you should fold 94o.
From the BB, call a push from the SB when the number of BB’s is less than or equal to the number in the cell corresponding to the cards you hold. For example, from the BB you should call a SB push with any pocket pair when the game is for 15BB or less.
This alternate strategy has two flaws. First, it assumes our opponents will always be making the optimal move. In the low stakes games PokerPirate was designed for, this is definitely not the case. I believe that our opponents fold so often that my always-push strategy is superior for these games. The always-push strategy will obviously not work at high stakes. Second, it assumes the only available moves are all-in or fold. From a practical standpoint, this will usually be the case. Making this assumption is probably sufficient to advance to medium stakes games. For high stakes games, however, further refinement will be required.
I ranked all the preflop hands. (View my rankings here.) PokerPirate was only allowed to play the top X hands, where X was based on position and blinds. PokerPirate would only raise in these situations, never call.
I think this function is currently the AI’s greatest weakness, and that PokerPirate might be ready for 10+1 games by only revising this function. There are two reasons for this.
First, I use the same hand rankings in any given situation. This should not be the case. For example, 10Js is marginal with 10 players, but pretty weak heads up. Therefore I have hard-coded how to handle this case as an exception. Much better would be to have a different set of rankings depending on the number of players playing, and your position. Probably someone has already run a simulation to determine the rankings of each hand in any given situation. My current rankings are optimized for about a 6-8 person table.
Second, because pfValuePlay comes at the very beginning it has a “trickle down” effect on the other moves. For example, if it is correct to play marginal hands more frequently, the chaseDraw() function will become much more successful.
valuePlay() is easily the most complicated of the functions. It bets when it has a good hand, and slow-plays (check-raise) when it has a great hand. It’s a simple idea, but defining “good” and “great” takes a lot of work.
The simple cases are easy to account for, and I was able to directly program them in. Normally, with top pair in late position you bet. There are a lot of “exceptional” cases, however, that must be taken into account. These are obvious when you see them, but it is difficult to just make a list off the top of your head. This function was developed out of experience. I would watch PokerPirate play. Sometimes it would make a bonehead move, like raising with top pair when there’s an obvious flush on the board. I would then code an appropriate exception.
Probably a lot of work could be done on this function to improve it. I doubt I accounted for all of the exceptional cases. Furthermore, it is not always appropriate to slow play a weak hand.
PokerPirate has two main components: an interface with the Royal Vegas Poker software and the AI. This section describes how these components relate.
This flowchart (explained below) shows the overall program execution flow.
The red boxes represent windows opened by the Royal Vegas Poker software. The black and green boxes represent code inside the PokerPirate program. PokerPirate has only one console window.
The black boxes represent the “heart” of the program and are used in all of the program’s modes. The MsgQueue class is what controls all the interaction between PokerPirate and RVP. It reads the cards off the screen and calculates players’ stack sizes. When playing, it controls the mouse and keyboard inputs to perform bets, raise, etc. The TexasHoldemAIManual class contains the actual AI code. Looking over the header for the MsgQueue class (MsgQueue.h) will give you a good understanding of how the interface actually works.
The green boxes represent the program’s flow path when in the “play” mode. Notice how it is able to have an arbitrary number of open tables. The RVP software, however, was limited to five tables at a time.
The actual update loop for “play” mode looks like this:
/*
bring the table to the foreground;
start a timer to see how long AI engine takes
*/
tables[i].loadTable();
/*
read the table to see if anything has changed
*/
tables[i].update();
/*
print the table's status;
display type ("play"/"debug"/etc) controlled by
tables[i].setPrintType() above
*/
cur (0, 4+i);
tables[i].print();
/*
ask the AI for its move;
will actually make the move if tables[i].allowPlay==true
*/
tables[i].getAI()->makePlay();
/*
stop AI engine timer
*/
tables[i].closeTable();
makePlay() is the interface between the AI and the MsgQueue. It calls the function getPlay(), which is where the AI is located. getPlay() will return an ActionRet struct.
struct ActionRet
{
string name,sub;
int action,amt;
};
The strings are only for output purposes. Amt is the amount to bet or raise, if applicable. Action can be any of the following:
#define FOLD 'f'
#define CHECK 'p'
#define CALL 'c'
#define ALLIN 'a'
#define BET 'b'
#define RAISE 'r'
Once it knows what action to make, it calls the appropriate MsgQueue function (DoBet which also handles raises and all-ins, DoCall, DoCheck, and DoFold). These in turn use MsgQueue’s Keyboard and MouseClick functions to send the play to the RVP window.
The interface was surprisingly simple to build. The only challenge I came across was when RVP updated their software, which occurred approximately monthly. These updates would sometimes move card or button positions very slightly, causing PokerPirate to read cards incorrectly or be unable to make plays.
A lot of my losses during development of the program occurred when I just let it play unobserved over a weekend as a test of its robustness. Luckily it was only playing one table. Basically, RVP moved their button positions for betting/calling/etc down about 20 pixels, and PokerPirate was no longer able to make any plays. It went the whole weekend just timing-out and folding every hand.
If you read through some of the table connecting code, it will look very messy. This is a result of constant changes trying to keep up with the pace of updates. Now, RVP has changed its software over to PokerTime. If anyone wants to start using PokerPirate again, they will have to completely update PokerPirate’s ability to find cards and buttons. In practice, this means finding the x and y coordinates and updating the variables, which sounds easy but takes several hours of tedious work.
I thought the casinos would not want to have bots playing on their networks. My biggest fear was that they would seize my assets and I would be out several thousand dollars. Therefore I took some measures to cover my trail.
I did not think I had to fear casinos using software to monitor my computer for bots. Because my bot was custom, it would be relatively difficult to detect. There were two ways I could see myself being detected. The first is that an admin might try to talk to me over the chat system. I could not think of a way to counter this, but considered it unlikely. In retrospect, I am lucky this didn’t happen during the weekend-timeout story above. The second would be monitoring for artificial keyboard/mouse inputs. This would be possible if the software somehow monitored calls to the SendInput() function that I was using. I do not know much about Windows, but believed this would require administrator-like privileges, which I did not give my user. Ultimately, I considered this an insignificant threat.
My greatest counterdetection threat was being incorrectly identified as using collusion (a form of cheating where two people at the same table share information). Poker rooms seem to take this much more seriously than pokerbots. This was a concern because I was using multiple accounts to get the table coverage I wanted. If two of these accounts from the same IP address logged on to the same table, the poker room would flag them even though they weren’t colluding. Even if this weren’t the case, I wouldn’t want my bots playing against each other. Therefore, I took three counter-collusion-detection measures. First, I limited each account to be able to play at only certain tables so they would never overlap. Second, each bot’s virtual machine had a unique IP address. Third, money was deposited and withdrawn from each poker account using different banking information (all of which was legal).
Based on the current proliferation of pokerbots on the net, it doesn’t seem like anyone is too concerned about the matter. I am guessing this is because most of these bots are not successful. If it became widely known that there was a successful bot operating on a casino, I guess that would hurt the casino’s business and they would take measures to shut it down.
Before talking about how the AI actually works, I think it would be useful to talk about the program’s architecture. This section is mostly relevant from a software engineering perspective and can easily be skipped.
I run Linux on my desktop. In order to run the poker software, I had to load a copy of Windows XP inside a VMWare virtual machine. I did the development and testing in this virtual machine.
PokerPirate can be run in any of 5 modes:
Play
Watch
Debug
Proc
ResetDB
This section describes each mode, and the download section describes how to get the program running in each.
This is the default mode, and the way PokerPirate was meant to be operated. This is what my screen looks like when running:
Around the edges is my linux desktop. We’re interested, however, in the VMWare window that occupies most of the space. Inside that are 4 RVP tables, PokerPirate, and Microsoft Visual C++. This screenshot is typical for what the program looks like in play mode. Unfortunately, at the time of the screen shot, no one was seated at any of the tables. Apparently, Royal Vegas Poker is not as popular as it used to be.
This is what the PokerPirate console looks like:
PokerPirate is just a console program, and an ugly one at that. The only purpose of the window is to provide some debugging and status information. All the displayed information is simply the default for an empty screen.
It is important to note that when PokerPirate is running, in any mode, the computer is unusable. PokerPirate is constantly changing which window is in the foreground so it can read new information, and constantly moving the mouse to make plays. This makes it difficult to follow each game, so the PokerPirate console displays the status of each game in red. At the bottom is space for the AI to output its debug information.
Upon starting, the program checks the file pp.conf. This file controls everything about the program’s operation. Most notably, it controls which player the AI will act for (“Moniker”), and what times of day the AI is allowed to play for (“TimeStart” and “TimeEnd”). I ran three accounts on Royal Vegas Poker, each with their own set of times they were allowed to play. This made it appear that someone was just logging in and playing as though it were their “job.”
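For illustration, a minimal pp.conf might look something like this (the key names Moniker, TimeStart, and TimeEnd come from the description above; the syntax and values are made up):

# hypothetical pp.conf -- key names from the description above,
# exact syntax and values are assumptions
Moniker=SomePlayer42
TimeStart=18:00
TimeEnd=23:30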
If there are already RVP tables open when PokerPirate starts, it will take control of them. If there are none open, it will automatically start RVP, join 5 tables, and begin play. 5 tables is the maximum number of tables that RVP will allow a single account to sit down at. This is another reason I operated 3 RVP accounts. Each account had its own virtual machine. Inside the virtual machine would be PokerPirate playing a different account on 5 separate tables. Each account was given specific tables it was allowed to play at, so no two accounts would play at the same table.
Watch mode functions exactly like play mode, except the AI won’t automatically seat you at a table. If you’re already seated, however, it will play for you so you don’t just fold every hand. Because it still records the games, it is useful for building a large dataset of games for future analysis in proc mode.
The screenshots above are actually from watch mode. Because I’m making this page long after I stopped using the program actively, I no longer have any money in my accounts and can’t play. Notice how it says watch in the PokerPirate title bar, and Allow new games (56): 0. The 0 means do not sit down at new games. The 56 is the total number of games it has observed since being reset. This is tracked in gameid.txt and incremented after each completed game.
As the name suggests, I used debug mode to develop PokerPirate and not for actually playing to make money.
With a game like this:
PokerPirate will display:
This screen displays a lot more information about the table, because PokerPirate is only playing on one table. It’s basically just spitting out all the information it knows about the game’s status. I used this screen to, you guessed it, debug the program. It helped debug interfacing with the RVP software in two ways, and it helped in the early stages of AI development.
First, reading the cards. The only way to read the cards is to take a capture of the screen. Fortunately, the cards appear in the same locations every time, so you simply take that region and compare it against all 52 cards to see which one matches. The row called “cards” displays the community cards, and whatever players’ cards are revealed will show up next to that player. Unfortunately, the new version has gotten fancy. They now move the card positions around when people win, to highlight what the winning hand was. They also dim the non-winning cards. This all interferes with my screen reading routines. It should be fairly straightforward to fix if anyone has the motivation.
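The matching step itself can be as simple as a pixel-by-pixel comparison against 52 reference captures. A sketch of the idea, assuming a minimal Bitmap helper type (the real routines live in MsgQueue):

#include <climits>
#include <cstdlib>

// Minimal helper type for the example; the real code reads the
// screen through the Win32 API.
struct Bitmap { int width, height; const unsigned char* pixels; };

// Return the index (0-51) of the reference card that best matches
// the captured screen region, by total pixel difference.
int matchCard(const Bitmap& region, const Bitmap refs[52])
{
    int best = -1;
    long bestDiff = LONG_MAX;
    for (int c = 0; c < 52; c++)
    {
        long diff = 0;
        for (int i = 0; i < region.width * region.height; i++)
            diff += std::abs(region.pixels[i] - refs[c].pixels[i]);
        if (diff < bestDiff) { bestDiff = diff; best = c; }
    }
    return best;
}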
Second, reading the chip counts. Reading the chip counts is much easier than reading the cards. As I mentioned in the note above, all you have to do is use the GetWindowText() function built into Windows to get the text on the side of the screen. Then, parse that text to find out how much everyone has bet/won/lost. Notice how the stack sizes are all integers. Originally, I designed the program to handle only integer stacks, because that is how RVP was originally built. They have since changed their code. Now, when a pot with an odd dollar amount is split, the “change” is split between the winners. My code can’t handle this. These decimals are so small compared to the stack sizes, however, that it is probably not a big deal strategy-wise. The error message at the top has occurred because if you add up all the players’ stacks and the pot, you no longer get the 10000 you’re supposed to.
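The read itself is only a couple of lines. A sketch, where hwndStackLabel is a hypothetical handle to the control holding one player’s chip-count text:

#include <windows.h>
#include <cstdlib>

// Read one player's stack size off the RVP window.
// hwndStackLabel is a hypothetical window handle for illustration.
int readStack(HWND hwndStackLabel)
{
    char buf[64];
    GetWindowTextA(hwndStackLabel, buf, sizeof(buf));
    return std::atoi(buf); // integer stacks only; split-pot "change"
                           // like 512.50 breaks this parse
}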
Finally, displaying the AI. Notice all the stuff on the right. This was an early means for me to output the AI’s decision process. I have no idea what any of it means anymore. (Hint for developers: comment your code!) At the very bottom, I output the AI’s recommended move. In the latter stages of AI development, this screen was no longer useful, because “interesting” hands would only come along about once a game. It was faster to have many games running in play mode so the interesting hands would appear more frequently.
In proc mode, the program uses the game recordings from play mode to compile information about the players. Originally, my idea was to compile a database of every player, to determine their unique playing styles and exploit them. This turned out not to be necessary, and it was also excessively difficult: after playing thousands of games, I realized that I only occasionally played with the same people. The database was, however, useful in characterizing the relative difficulty between buy-in levels. Based on this data, I determined that the level of play was roughly the same from the .75+.25 level through the 5.00+.50 level. The next level, 10.00+1.00, provided a significant increase in player skill. I stopped work on the program before I was able to beat the 10+1 game; however, I believe it would be possible with only a little work.
Proc mode requires access to a MySQL database where all this information is stored. I also had a series of php scripts to go through and analyze the database. My current desktop does not have these available, so I’ve disabled proc mode in the source code. Analyzed results can be viewed in the results section.
This mode clears the database. It must be run before running proc mode again to update the database with new information.
The key to winning at poker is finding a game you can exploit. As Matt Damon tells us in Rounders:
Listen, here’s the thing. If you can’t spot the sucker in the first half hour at the table, then you ARE the sucker.
To make money at poker, you must find and take advantage of your opponents’ weaknesses and minimize your own. This is true if you’re playing as a person, or a bot is playing for you. In this section, I will describe how sit-and-go style games are particularly easy to exploit as a pokerbot.
If you need a refresher on basic poker hands or how to play Texas Hold’em, I recommend reading the Wizard of Odds’ page on Texas Hold’em before continuing.
Everyone has a different style of playing poker, and each style is better suited to different types of games. Before we look at exactly what types of games we can exploit, we should analyze our own strengths and weaknesses.
Advantages of a pokerbot:
Consistency of play
Patience
Won’t “go on tilt”
Accurate odds calculations
Disadvantages of a pokerbot:
Does not respond well to change
Can only handle preprogrammed situations
Suffers from “information overload”
The “traditional” pokerbot plays small stakes, “cash” games. In these games, table conditions are relatively consistent: blinds structures do not change; the number of players at the table is relatively constant; the amount of cash a player has plays only a small part in strategy. All this means that each hand, strategically speaking, should be approached with the same perspective. This plays right to the advantages of a pokerbot. If you can program your bot to handle this situation, you’ve hit the jackpot.
What makes poker a unique game is that a poker hand can have literally billions of variations. Some of these are easy to program for. For example, if your hole cards are 2 7 offsuit preflop, then fold. Most situations are much more difficult than this. Not only do we have to consider our own cards, but we must consider how other people’s actions give clues as to what cards they may have. It turns out this is extremely difficult to program for. There are just too many factors for a human programmer to account for everything.
This is the case for the traditional pokerbot. The information overload dominates. That is why most pokerbots available today are not competitive. They chose the wrong game.
PokerPirate exploited the sit-and-go tournament. These tournaments have become very popular online. Essentially, they are single table tournaments. At Royal Vegas Poker, they worked like this:
Everyone buys in for a set amount, say 5+.50. (This means that the house takes fifty cents, and five dollars go into the prize pool. The total cost to the player is $5.50)
The tournament starts once ten players have joined
Everyone is given 1000 tournament chips
Blinds increment like this:
Hand | RVP blinds |
1-10 | 10/20 |
11-20 | 20/40 |
21-30 | 40/80 |
31-40 | 80/160 |
41-50 | 160/320 |
51-60 | 320/640 |
61+ | 640/1280 |
Once a player reaches 0 chips, he is eliminated from the tournament
The first place player receives 50% of the prize pool, second place 30%, and third place 20% (for the 5+.50 game above, this would be $25 to 1st, $15 to 2nd, and $10 to 3rd)
I developed PokerPirate in 2005-2006 to work on the Royal Vegas Poker casino. Since then they have redesigned their game to use the new PokerTime software, and this has changed their tournament structures somewhat. It is still, however, very similar. Most other online poker rooms offer sit-and-go’s with similar structures.
The sit-and-go tournament, surprisingly, is a goldmine for pokerbots. At first, this seems ridiculous because things are changing all the time. It is almost the exact opposite of a cash game. Blinds structures change frequently; the number of players at the table is constantly decreasing; and, the amount of cash a player has plays a huge part in tournament strategy. This means the bot must be able to respond to many more situations. More situations means more complexity which means harder to program. This apparent contradiction is why there has been little work on sit-and-go bots compared to cash game bots.
Because the house charges an additional 10% to play a game, playing merely average will result in us losing money. We have to be at least 10% better than our opponents just to break even. In practice, a good player will make 10-15% profit over thousands of games after subtracting the house’s cut. Below I go into detail about how to measure success, but at this point it is important to understand that even the best poker players may win or lose any individual sit-and-go. We are only concerned with our net profits over thousands of games.
The secret of PokerPirate is that there are many more opportunities to exploit our opponents’ weaknesses in a sit-and-go tournament, and these exploits are relatively mechanical (i.e. easy to program). This will become apparent if we divide the tournament into 3 distinct phases: the early, mid, and end games.
Most of our advantage comes in the early game. The early game is made up of the first 20 hands. Blinds at this stage are either 10/20 or 20/40. This is very small compared to our initial chip count of 1000. If we went all 20 hands without playing once, we would lose 90 chips, leaving us with 910. This is still a very respectable chip count to enter the mid game with. Furthermore, most players are very impatient, and joined the game just for some action. During the first 20 hands, typically 1-3 players will go all in. This means that 1-3 players will be removed from the game, so only 7-9 will remain. Simply by waiting, we reduce our number of opponents and maintain roughly the same number of chips.
With a normal stack of chips and ten players left, being average over the long term nets us the 10% loss from the house’s cut. With nine players left, being average we break even. Anything fewer than nine, and being average generates a net profit.
The hardest part of the tournament is the midgame. The number of players is reduced from 7-9 down to 4 or fewer, and blinds are typically 40/80 or 80/160, but can get as high as 160/320. We cannot afford to play passively anymore at this point, because the blinds are big enough to take away a significant portion of our stack. The blinds are not yet high enough, however, that we are pot committed if we play a hand. This means the program will encounter “tricky” situations where it will have to decide what to do. It is these tricky situations that make developing a pokerbot difficult.
PokerPirate is at a disadvantage during the midgame, because the midgame so closely resembles a cash game. Cash games are difficult to program because there are a lot of different hand combinations and the AI has to make difficult bets and calls. Luckily, because people have already been removed from the tournament, playing average will provide us with long term profits. Therefore we have a much easier goal to meet than do the cash game bots.
Another advantage of the tournament structure is that deciding the amount to bet in a no-limit game becomes much easier. In an individual hand, we are no longer concerned with maximizing our expected value. Instead, we are concerned with not getting knocked out. This turns out to be much easier to handle because we go all-in much more frequently. In a typical game, we will have to win 1-2 major hands (all-in) or 3-4 minor hands (win blinds) in order to make it to the end game and have a shot at winning first place. A limit holdem sit-and-go bot would probably be more successful because it wouldn’t have to deal with this particular challenge, but in practice there are very few limit sit-and-go tournaments.
We also have a great advantage during the endgame. During the endgame, there are typically 4 or fewer players and blinds are 160/320 or greater. There are two things we can exploit here. First, at this point, most players are desperate just to get “in the money.” That means they will play more conservatively than they should, hoping that someone else will get knocked out. First place prize, however, pays much more than 2nd or 3rd. Therefore, to maximize long-term profits, you should always play to win 1st and not settle to get just in the money. Second, because blinds are so large relative to stack size, you must play many more hands or else be blinded to death. Because blinds are so large, the best moves are typically to either go all-in or fold preflop. This avoids any tricky situations, and allows us to use the computer’s knowledge of preflop odds to greatly bolster our play.
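To give a flavor of how mechanical the endgame is, here is a toy push-or-fold rule, reusing the ActionRet struct and action codes from the architecture section. This is not PokerPirate’s actual getPlay() logic; the 0.6 strength cutoff is a made-up illustration value:

// Toy endgame decision, for when blinds are large relative to stacks:
// either shove all-in preflop or fold. "strength" (0.0-1.0) would
// come from a preflop odds table; the 0.6 cutoff is illustrative.
ActionRet endgamePlay(double strength, int stack)
{
    ActionRet ret;
    ret.name = "endgame";
    if (strength > 0.6)
    {
        ret.action = ALLIN; // push: steal the blinds or double up
        ret.amt = stack;
    }
    else
    {
        ret.action = FOLD;  // avoid tricky postflop spots entirely
        ret.amt = 0;
    }
    return ret;
}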
In summary, a sit-and-go pokerbot has three advantages over the cash game bot:
Players are too aggressive too early, and will get kicked out without us doing anything
Players don’t realize that they should always be playing for first place
Preflop play is much more important, and computers are good at preflop play
For more information about beating sit-and-go’s, I recommend Sit ’n Go Strategy or How to Beat Sit ‘n’ Go Poker Tournaments.
Professional poker players define success in terms of how much money they make; therefore, a pokerbot’s success should be measured in the same manner. The bot must be skilled enough not only to win money from the other players, but also to win enough to cover the house’s take. It must also be able to make money not just occasionally, but over a prolonged period of time.
Return on Investment (ROI) is a measure of a player’s ability at a given game. A negative ROI means the player will lose money over time; a zero ROI means he will neither gain nor lose money; a positive ROI means the player will make money over time, and this is our goal. It is calculated like this:
ROI = (Winnings - Investment)/Investment
It is important that we include the house’s fees in the investment portion of the calculation. For example, if we play 1000 games of $5+.5 sit and goes, the investment would be $5500. If our total winnings over this period were 6000 dollars, then our ROI would be 9%.
Because poker has such a large amount of chance in any individual hand, ROI has a theoretical upper limit. This upper limit is determined by the skill of the other players, which is determined by the buy in for the game. In practice, this has been determined to be about:
Buy in ($) | ROI (%) |
11 | 20 |
22 | 15 |
33 | 10 |
55 | 8 |
109 | 7 |
215 | 6 |
Hourly rate is a measure of how quickly a player wins money. A higher ROI will increase hourly rate, but there are many other factors that contribute as well, such as playing several tables at the same time.
hourly rate = ROI * Buy in * # games per hour
For example, a 9% ROI at a $5+.5 game, playing 5 games per hour yields an hourly rate of $2.50. A human wouldn’t want to quit their day job for such a paltry income, but a pokerbot making this would create a nice additional paycheck.
For human players, ROI and hourly rate can be at odds with each other. A player might be able to have a 20% ROI playing just one table at a time, or a 10% ROI playing 4 tables. The latter results in twice the hourly rate with half the ROI. A pokerbot does not have this dilemma. It will play just as well at 1 table as at 4. Therefore ROI and hourly rate can be treated almost synonymously when measuring the success of a pokerbot. Since human players use hourly rate, however, that will be our convention from here on out.
In 2005-2006 I developed PokerPirate, a pokerbot. Pokerbots are software that play online poker without human interaction. PokerPirate successfully beat single table, no limit, Texas Hold’em tournaments. These are better known as sit-and-go’s. PokerPirate played the $5 tournaments on Royal Vegas Poker. After recouping my losses from the development process, I decided to turn the bot off. I am now releasing the source code and using it as a case study in artificial intelligence and software engineering.
Poker presents a particular challenge to the AI developer. Much like the real world, poker hands can have millions of variations, there is a lot of unknown information, and a lot of human interaction. AI techniques capable of winning at poker would be a significant advance because they could be applied to many problems that are currently unsolved.
A successful pokerbot, however, is intrinsically interesting apart from any potential advancement in AI. For example, it could provide the owner with an effort-free source of income. The AI in PokerPirate does not use any advanced techniques. Instead, I have carefully selected a game that simple techniques would be effective at beating. Most pokerbots specialize in limit table games, but PokerPirate specializes in no-limit sit-and-go tournaments. I don’t think there are many bots competing in these games yet.
PokerPirate was play tested on the .75+.25 sit-and-go’s on RVP. That means everyone would pay one dollar to play: the house would take 25 cents, and 75 cents would go into the prize pool. PokerPirate played over 5000 tournaments during this time. The AI would be revised, then a small test would be run, and the cycle would repeat. The AI never had a positive ROI at this level because the house’s take was so high. A final test of 1000 games was run in which the AI would have had about a 10% ROI if the house take were reduced to 10% vice 33%.
After this test, I felt the AI was ready to move up to the 5+.50 games, the lowest stakes games with only a 10% house take. Over the course of 2000 games, PokerPirate covered its development costs of about $1000 and had an ROI of about 10%.
I decided to stop running PokerPirate after I had recouped my losses. I now believe that online gambling is a net loss to society, and I do not want to be a part of it in any fashion. I’m continuing to make these articles available because people enjoy them.
The links below describe the development of the bot. A working knowledge of C++ will be required to understand the AI engine. Each page builds on the discussion of those previous, so I recommend reading them in order.
Exploiting the sit-and-go game
Pokerbot styles
The Sit-and-go tournament
Exploiting the game
How to measure success
“play” mode
“watch” mode
“debug” mode
“proc” mode
“resetDB” mode
PokerPirate’s architecture and code
How to interface with the poker client
Explaining the makePlay() function
Challenges associated with Royal Vegas Poker
Staying below the radar
Explaining the getPlay() function
Why heads up play is easy
Pre-flop play
Post-flop play
Reading the code
Launching the program in “debug” mode
The first step in data mining images is to create a distance measure for two images. In the intro to data mining images, we called this distance measure the “black box.” This post will cover how to create distance measures based on time series analysis. This technique is great for comparing objects with a constant, rigid shape. For example, it will work well on classifying images of skulls, but not on images of people. Skulls always have the same shape, whereas a person might be walking, standing, sitting, or curled into a ball. By the end of this post, you should understand how to compare these hominid skulls from UC Riverside [1] using radial scanning and dynamic time warping.
But first, we must start from the beginning. What exactly is a time series? Anything that can be plotted on a line graph. For example, the price of Google stock is a time series:
As you can imagine, time series have been studied extensively. Most scientists use them at some point in their careers. Unsurprisingly, they have developed many techniques for analyzing them. If we can convert our images into time series, then all these tools become available to us. Therefore, the time series distance measure has two steps:
STEP 1: Convert the images into a time series
STEP 2: Find the distance between two images by finding the distance between their time series
We have our choice of several algorithms for each step. In the rest of this post, we will look at two algorithms for converting images into time series: radial scanning and linear scanning. Then, we will look at two algorithms for measuring the distance between time series: Euclidean distance and dynamic time warping. We will conclude by looking at the types of problems time series analysis handles best and worst.
Radial scanning is tricky to explain, but once it clicks you’ll realize that it is both simple and elegant. Here’s an example from a human skull:
First we find the skull’s outline. Then we find the distance from the center of the skull to each point on the skull’s outline (B). Finally, we plot those distances as a time series (C). The lines connecting the skull to the graph show where that point on the skull maps to the time series below. In this case, we started at the skull’s mouth and went clockwise.
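A sketch of radial scanning in C++, assuming we already have the outline as an ordered list of points (Point is a minimal helper type for the example):

#include <cmath>
#include <vector>

struct Point { double x, y; };

// Walk the outline in order (e.g. clockwise from the mouth) and
// record the distance from the center to each outline point.
std::vector<double> radialScan(const std::vector<Point>& outline, Point center)
{
    std::vector<double> series;
    for (const Point& p : outline)
        series.push_back(std::hypot(p.x - center.x, p.y - center.y));
    return series; // one "time" step per outline point
}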
Skulls from different species produce different time series:
Take a careful look at these skulls and their time series. Make sure you can spot the differences in the time series between each grouping. Don’t worry yet about how the groupings were made. Right now, just get a feel for how a shape can be converted into a time series.
Another example of radial scanning comes from Korea University. [2] Here we are trying to determine a tree’s species based on its leaf shapes:
The labeled points on the leaf at left correspond to the labeled positions on the time series at right. Radial scanning is a popular technique for leaf classification because every species of plant has a characteristic leaf shape. Each leaf will be unique, but the pattern of peaks and valleys in the resulting time series should be similar if the species of plant is the same.
We can already tell that the graphs created by the skulls and the leaf look very different to the human eye. This is a good sign that radial scanning captures important information about an object’s shape that we will be able to use in the comparison step.
Some objects just aren’t circular, so radial scanning makes no sense. One example is handwritten words. The University of Massachusetts has analyzed a large collection of George Washington’s letters using the linear scanning method. [3] [4] The first image is a picture of the word “Alexandria” as Washington actually wrote it:
Then, we remove the tilt from the image. All of Washington’s writing has a fairly constant tilt, so this process is easy to automate.
Finally, we create a time series from the word:
To create this time series, we start at the left of the image and consider each column of pixels in turn. The value at each “time” is just the number of dark pixels in that column. If you look closely at the time series, you should be able to tell which bump corresponds to which letter. Some letters, like the “d”, get two bumps in the time series because they have two areas with a high concentration of dark pixels.
We could have constructed the time series in other ways as well. For example, we could have counted the number of pixels from the top of the column to the first dark pixel. This would have created an outline of the top of the word. We simply have to consider our application carefully and decide which method will work the best.
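A sketch of the column-count version in C++ (Image is a minimal grayscale helper type; swapping the count for “rows until the first dark pixel” gives the top-outline variant just described):

#include <vector>

// Minimal grayscale image type for the example.
struct Image
{
    int width, height;
    std::vector<unsigned char> pix; // row-major grayscale
    unsigned char at(int x, int y) const { return pix[y * width + x]; }
};

// One value per column: the number of pixels darker than the threshold.
std::vector<int> linearScan(const Image& img, unsigned char darkThresh)
{
    std::vector<int> series(img.width, 0);
    for (int x = 0; x < img.width; x++)
        for (int y = 0; y < img.height; y++)
            if (img.at(x, y) < darkThresh)
                series[x]++;
    return series;
}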
We now have two simple methods for creating time series from images. These are the simplest and most common methods, but not the only ones. WARP [5] and Beam Angle Statistics [6] are two examples of other methods. Which is best depends, as always, on the specific application. Now that we can create the time series, let’s figure out how to compare them.
The whole purpose of creating the time series was to create a distance measure that uses them. The easiest way to do this is the Euclidean distance. (This is the normal straight-line distance, \(\sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}\), that we are used to.) Consider the two time series below: [7]
More formally, \(distance = \sqrt{\sum_{i=1}^{N} (red_i - blue_i)^2}\),
where \(red_i\) is the height of the red series at “time” \(i\), \(blue_i\) is the height of the blue series at “time” \(i\), and \(N\) is the length of the time series. This is a simple and fast calculation, running in time \(O(N)\).
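In code, the formula is a single loop (a sketch; both series must have the same length \(N\)):

#include <cmath>
#include <vector>

// Euclidean distance between two equal-length time series. O(N).
double euclidean(const std::vector<double>& red, const std::vector<double>& blue)
{
    double sum = 0;
    for (size_t i = 0; i < red.size(); i++)
        sum += (red[i] - blue[i]) * (red[i] - blue[i]);
    return std::sqrt(sum);
}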
A more sophisticated way to compare time series is called Dynamic Time Warping (DTW). DTW tries to compare similar areas in each time series with each other. Here are the same two time series compared with DTW:
In this case, each of the humps in the blue series is matched with a hump in the red series, and all the flat areas are paired together. Notice that a single point in one time series can align with multiple points in the other. In this case, DTW gives a distance of nearly zero; it is a nearly perfect match. Euclidean distance made a much worse match and would give a large distance.
For most applications, dynamic time warping outperforms straight Euclidean distance. Take a look at this dendrogram clustering:
The orange series contain three humps, the green four, and the blue five. But the humps do not line up, so this is a difficult problem for straight Euclidean distance. In contrast, DTW successfully clustered the time series based on the number of humps they have.
That’s great, but how did DTW decide which points in the red and blue time series should align?
Exhaustive search. We try every possible alignment and pick the one that works best. This will be easier to see with a simpler example:
We build a matrix in which the entry at row \(i\), column \(j\) is the distance between \(red_i\) and \(blue_j\). This effectively compares every time in the red series with every time in the blue series. Then, we select the path through the matrix that minimizes the total distance:
The colored boxes correspond to the colored lines connecting the two time series in the first image. For example, the four light blue squares in the top right are on a single row, so they map one point on the red series to four points on the blue one.
Using dynamic programming, DTW is an \(O(N^2)\) algorithm, which is much slower than Euclidean distance’s \(O(N)\). This is a serious problem if we want to use the algorithm to search a large database.
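For the curious, here is a minimal dynamic-programming DTW in C++. The cost of aligning two points is their squared difference, matching the Euclidean example above; no band restriction is applied:

#include <algorithm>
#include <limits>
#include <vector>

// dp[i][j] = cheapest cost of aligning red[0..i) with blue[0..j).
// O(N*M) time and space. A Sakoe-Chiba band (described below) would
// simply skip cells with |i - j| greater than the band width.
double dtw(const std::vector<double>& red, const std::vector<double>& blue)
{
    size_t n = red.size(), m = blue.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> dp(n + 1, std::vector<double>(m + 1, INF));
    dp[0][0] = 0;
    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= m; j++)
        {
            double d = red[i-1] - blue[j-1];
            dp[i][j] = d * d + std::min({dp[i-1][j],     // red point repeats
                                         dp[i][j-1],     // blue point repeats
                                         dp[i-1][j-1]}); // both advance
        }
    return dp[n][m];
}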
The easiest way to speed up the algorithm is to calculate only a small fraction of the matrix. Intuitively, we want our warping path to stay relatively close to the diagonal. If it stays exactly on the diagonal, then every red and blue time corresponds exactly; this is the same as the Euclidean distance. At the opposite extreme would be a path that follows the left-most, then top-most edges. In this case we are comparing the first blue value to all red values and the last red value to all blue values. This seems unlikely to make a good match.
There are two common ways to limit the number of calculations. First is the Sakoe-Chiba band:
The second method is the Itakura parallelogram:
The basic idea behind these restrictions is pretty straightforward from their pictures. What isn’t straightforward, however, is that these restrictions also increase DTW’s accuracy. [8] For over a decade, researchers tried to find ways to increase the amount of the matrix they could search because they falsely believed that this would lead to more accurate results.
We can also speed up the calculation using an approximation function called a lower bound. A lower bound is computationally much cheaper than the full DTW function (a good one might run 1000 times faster) and is always less than or equal to the real DTW distance. We can run the lower bound on millions of images, and only run the full DTW algorithm on the potentially closest matches. Two good lower bounds are LB_Improved [9] and LB_Keogh [10].
Finally, there are other methods for comparing time series. The most common is called Longest Common Sub-Sequence (LCSS). It is useful for matching images suffering from occlusion [11].
When to use Time Series Analysis
Time series analysis is only sensitive to an object’s shape. It is invariant to colors and internal features. These properties make time series analysis good for comparing rigid objects, such as skulls, leaves, and handwriting. These shapes do not change over time, so they will have similar time series no matter when they are measured.
Time series analysis will not work on objects that can change their shapes over time. People are good examples of this, because we have many different postures. We can walk, sit, or curl into a ball. Another distance measure called “shock graphs” is better for comparing the shapes of objects that can move. We’ll cover shock graphs in a later post.
[1] Eamonn Keogh, Li Wei, Xiaopeng Xi, Sang-Hee Lee, and Michail Vlachos. “LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures.” VLDB 2006. (PDF)
[2] Yoon-Sik Tak and Eenjun Hwang. “A Leaf Image Retrieval Scheme Based on Partial Dynamic Warping and Two-Level Filtering.” 7th International Conference on Computer and Information Technology, 2007. (Access on IEEE)
[3] Rath, Kane, Lehman, Partridge, and Manmatha. “Indexing for a Digital Library of George Washington’s Manuscripts: A Study of Word Matching Techniques.” CIIR Technical Report. (PDF)
[4] Rath and Manmatha. “Word Image Matching Using Dynamic Time Warping.” Proceedings of CVPR 2003, vol. 2, pp. 521-527. (PDF)
[5] Bartolini, Ciaccia, and Patella. “WARP: Accurate Retrieval of Shapes Using Phase of Fourier Descriptors and Time Warping Distance.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 1, January 2005. (PDF)
[6] Arica and Yarman-Vural. “BAS: A Perceptual Shape Descriptor Based on the Beam Angle Statistics.” Pattern Recognition Letters, 2003. (PDF)
[7] Keogh. “Exact Indexing of Dynamic Time Warping.” (PDF)
[8] Ratanamahatana and Keogh. “Three Myths about Dynamic Time Warping.” SDM 2005. (PDF)
[9] Lemire. “Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound.” Pattern Recognition, 2008. (PDF)
[10] Keogh and Ratanamahatana. “Exact Indexing of Dynamic Time Warping.” Knowledge and Information Systems, 2002. (PDF)
[11] Yazdani and Özsoyoglu. “Sequence Matching of Images.” Proceedings of the 8th International Conference on Scientific and Statistical Database Management, 1996, pp. 53-62.
Image processing is one of those things people are still much better at than computers. Take this set of cats:
Just at a glance, you can easily tell the difference between the cartoon animals and the photographs. You can tell that the hearts in the top left probably don’t belong, and that Odie is tackling Garfield in the top right. The human brain does this really well on small datasets.
But what if we had thousands, millions, or even billions of images? Could we make an image search engine, where I give it a picture of an animal and it says what type it is? Could we make it automatically find patterns that people miss?
Yes! This post is the beginning of a series about how. Finding patterns in large databases of images is still an active research area, and these posts will hopefully make those results more accessible. The current research still isn’t perfect, but it’s probably much better than you’d guess.
There are three basic steps in data mining images:
STEP 1: Create the “black box”
STEP 2: Cluster
STEP 3: Run queries
That’s it!
… well … sort of …
There are many different algorithms that can be used at each step. Which ones you decide to use will depend on the type of information you’re mining from the images. The rest of this post gives a high level overview of how each of these steps works, and later posts will focus on specific implementations for each step.
The black box defines the “distance” between two images. The smaller the distance, the more similar the images are. For example:
Garfield is very similar to himself; that’s why Box A gives him a low score, nearly zero. Odie is not very similar to Garfield, but he’s a lot closer than a palm tree. The specific numbers the box outputs don’t matter. All that matters is the ordering created by those numbers. In this case:
Of course, if we compare against a different image, we will probably get a different ordering.
Likewise, we can get different orderings with a different black box. Let’s imagine that Box A was designed to determine if two pictures are of the same type of animal. If we test it on some new input, we might get:
Notice that Box A thinks the real cat is more similar to Garfield than Odie is. Now let’s consider another black box. Imagine Box B is designed to see if two images were drawn in a similar style. Box B might give the following:
Box B gives the opposite results of Box A.
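In code terms, a black box is nothing more than a distance function over images. A minimal sketch of the interface (Image is a placeholder for whatever representation you load your pictures into):

#include <functional>

struct Image; // placeholder for your image representation

// A black box maps two images to a distance.
// Smaller return value = more similar images.
using BlackBox = std::function<double(const Image&, const Image&)>;

Box A and Box B are simply two different functions with this same shape.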
Creating a good black box is the hardest part of data mining images. Most research is dedicated to this area, and most of this series will be focused on evaluating the performance of different black boxes. Which ones are good depends on your dataset and what information you’re trying to extract. Some general categories of black boxes we’ll look at are:
Histogram analysis (a simple technique that can be surprisingly effective on colored input)
Converting images into a time series (for analyzing the shapes of rigid objects, e.g. fruit)
Creating shock graphs (for analyzing the shapes of non-rigid objects, e.g. animals)
Kolmogorov complexity of the images (for comparing an image’s textures)
But first, let’s take a closer look at what makes a black box good.
There are two more aspects of black boxes we must look at. First, every black box will be sensitive to certain features of an image and invariant to others. In the examples below, Box C is sensitive to shape, but invariant to color. Box D is sensitive to color, but invariant to shape.
Most black box algorithms contain both sensitivities and invariances. These are the properties you will use to decide which black box is best for your application.
Second, a black box is a metric and as such must satisfy four criteria:
distance(x, y) ≥ 0 (non-negativity)
distance(x, y) = 0 if and only if x = y (identity of indiscernibles)
distance(x, y) = distance(y, x) (symmetry)
distance(x, z) ≤ distance(x, y) + distance(y, z) (subadditivity / triangle inequality).
If you don’t understand these criteria, don’t worry too much. All the black boxes we’ll look at in the rest of this series will satisfy these criteria automatically.
Clustering is much easier than designing the black box. Clustering algorithms are used in many fields, so they have received much more attention. Two commonly used clustering algorithms are K-means and hierarchical clustering (often visualized with dendrograms).
There are many more as well. In general, you can use whatever clustering algorithm you want. When developing an application, most people will try several and pick whichever one happens to work the best for their data.
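As a simple illustration, most clustering algorithms only touch the images through the black box, usually via a precomputed pairwise distance matrix. A sketch, assuming a concrete Image type and the BlackBox alias from the earlier sketch:

#include <vector>

// Precompute all pairwise distances once; a clustering algorithm
// (k-medoids, hierarchical, ...) can then run on the matrix without
// calling the expensive black box again.
std::vector<std::vector<double>> distanceMatrix(const std::vector<Image>& imgs,
                                                const BlackBox& dist)
{
    size_t n = imgs.size();
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            m[i][j] = m[j][i] = dist(imgs[i], imgs[j]); // symmetry of the metric
    return m;
}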
Here’s an example clustering of our cat data using Black Box A (i.e. by what the picture is of):
We’ve created three clusters. The red cluster contains hearts, the white cluster contains cats, and the blue cluster is an anomaly. It contains both a cat and a dog, and there is no easy way to separate them. If we had used a hierarchical classifier, the “contains cats and dogs cluster” might be a sub-cluster of the “contains cats cluster.”
Here’s the same data clustered using Black Box B (i.e. by how the picture is drawn):
Now we have only two clusters: the white cluster contains cartoons, and the red cluster contains photographs.
One last note. Most of the CPU work gets done during this step. On large datasets, clustering can take hours to months depending on the algorithm and the speed of the black box. There are many tricks for speeding up clustering, which we will take a look at in later posts.
Queries are fairly easy once the groundwork is set up with the black box and clustering. Sometimes, all you want to know is how STEP 2 clustered your input. For example, you could query “how many types of animals are in this dataset?” The answer would just be the number of clusters using Box A. Typically, however, your query will supply the database with an image and ask for similar images.
If we’ve done steps 1 and 2 well, this should take only seconds even when the database contains millions of images. Of course, it’s not always possible to do steps 1 and 2 well enough to make this happen. Later posts may cover some new techniques for speeding up the querying process.
So far, we’ve seen that the black box framework for image data mining is very simple. The tricky part is putting the right algorithm in each step. In the rest of the series, we’ll look at a few different black boxes, and show how to efficiently combine them with a clustering algorithm. The different types of black boxes are the most interesting part of image mining, so we will focus on them first.