<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>My Experiments in Truth</title>
	<atom:link href="http://izbicki.me/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://izbicki.me/blog</link>
	<description>Writing about computer science and religion</description>
	<lastBuildDate>Tue, 11 Jun 2013 17:59:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>HLearn&#8217;s code is shorter and clearer than Weka&#8217;s</title>
		<link>http://izbicki.me/blog/hlearns-code-is-shorter-and-clearer-than-wekas?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=hlearns-code-is-shorter-and-clearer-than-wekas</link>
		<comments>http://izbicki.me/blog/hlearns-code-is-shorter-and-clearer-than-wekas#comments</comments>
		<pubDate>Tue, 11 Jun 2013 17:50:09 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=2520</guid>
		<description><![CDATA[Haskell code is expressive.  The HLearn library uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the Weka library uses over 100 lines of Java.  That&#8217;s a big difference!  In this post, we&#8217;ll look at the actual code and see why the Haskell is so much [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright  wp-image-2478" alt="weka-lambda-haskell" src="http://izbicki.me/blog/wp-content/uploads/2013/05/weka-lambda-haskell-300x150.png" width="240" height="120" /></p>
<p>Haskell code is expressive.  The <a href="https://github.com/mikeizbicki/HLearn">HLearn library</a> uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the <a href="http://www.cs.waikato.ac.nz/ml/weka/">Weka library</a> uses over 100 lines of Java.  That&#8217;s a big difference!  In this post, we&#8217;ll look at the actual code and see why the Haskell is so much more concise.</p>
<p><strong>But first, a disclaimer:</strong>  It is really hard to fairly compare two code bases this way.  In both libraries, there is a lot of supporting code that goes into defining each classifier, and it&#8217;s not obvious which code to count and which to leave out.  For example, both libraries implement interfaces to a number of probability distributions, and that code is not included in the line counts below.  The Haskell code takes more advantage of this abstraction, so this is one language-agnostic reason why the Haskell code is shorter.  If you think I&#8217;m not doing a fair comparison, here are links to the full source so you can do it yourself:</p>
<ul>
<li><a href="https://github.com/mikeizbicki/HLearn/blob/master/HLearn-classification/src/HLearn/Models/Classifiers/Bayes.hs">HLearn&#8217;s Bayesian classifier source code</a> (74 lines of code)</li>
<li><a href="https://svn.cms.waikato.ac.nz/svn/weka/trunk/weka/src/main/java/weka/classifiers/bayes/NaiveBayes.java">Weka&#8217;s naive Bayes source code</a> (946 lines of code)</li>
</ul>
<p><span id="more-2520"></span></p>
<h3>The HLearn code</h3>
<p>HLearn implements training for a <a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Bayesian classifier</a> with these six lines of Haskell:</p>
<pre>newtype Bayes labelIndex dist = Bayes dist
    deriving (Read,Show,Eq,Ord,Monoid,Abelian,Group)

instance (Monoid dist, HomTrainer dist) =&gt; HomTrainer (Bayes labelIndex dist) where
    type Datapoint (Bayes labelIndex dist) = Datapoint dist
    train1dp dp = Bayes $ train1dp dp</pre>
<p>This code elegantly captures how to train a Bayesian classifier&#8212;just train a probability distribution.  Here&#8217;s an explanation:</p>
<ul>
<li>The first two lines define the Bayes data type as a wrapper around a distribution.</li>
<li>The fourth line says that we&#8217;re implementing the Bayesian classifier using the HomTrainer type class.  We do this because <strong>the Haskell compiler automatically generates a parallel batch training function, an online training function, and a fast cross-validation function for all HomTrainer instances.</strong></li>
<li>The fifth line says that our data points have the same type as the underlying distribution.</li>
<li>The sixth line says that in order to train, just train the corresponding distribution.</li>
</ul>
<p>We only get the benefits of the HomTrainer type class because the Bayesian classifier is a monoid.  But we didn&#8217;t even have to specify what the monoid instance for Bayesian classifiers looks like!  In this case, it&#8217;s automatically derived from the monoid instances for the base distributions using a language extension called <a href="http://www.haskell.org/ghc/docs/7.6.1/html/users_guide/deriving.html">GeneralizedNewtypeDeriving</a>.  For examples of these monoid structures, check out the algebraic structure of the <a href="http://izbicki.me/blog/gausian-distributions-are-monoids">normal</a> and <a href="http://izbicki.me/blog/the-categorical-distributions-algebraic-structure">categorical</a> distributions, or more complex distributions using <a href="http://izbicki.me/blog/markov-networks-monoids-and-futurama">Markov networks</a>.</p>
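<p>To make the pattern concrete, here&#8217;s a rough, self-contained sketch of the same idea outside of HLearn.  Everything in it is a stand-in invented for illustration: <code>ToyTrainer</code>, <code>Counts</code>, <code>Wrapper</code>, and <code>trainBatch</code> are hypothetical names, not HLearn&#8217;s real API, and the sketch assumes a recent GHC (8.4 or later) with the containers package.  It shows a training class with an associated data point type, a default batch trainer that falls out of the monoid, and a newtype wrapper that picks up its monoid instance through GeneralizedNewtypeDeriving.</p>
<pre>{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a training type class (not HLearn's HomTrainer).
-- Any model that is a monoid gets batch training from single-point training.
class Monoid model =&gt; ToyTrainer model where
    type Datapoint model
    train1dp :: Datapoint model -&gt; model

    -- Train on each point separately, then merge the results.
    -- (&lt;&gt;) is associative, so the merges may be grouped in any way,
    -- which is what leaves room for a parallel implementation.
    trainBatch :: [Datapoint model] -&gt; model
    trainBatch = foldMap train1dp

-- A toy "distribution": how many times each label has been seen.
newtype Counts = Counts (Map.Map String Int)
    deriving (Show, Eq)

instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

instance Monoid Counts where
    mempty = Counts Map.empty

instance ToyTrainer Counts where
    type Datapoint Counts = String
    train1dp label = Counts (Map.singleton label 1)

-- A wrapper around any distribution.  Its Semigroup and Monoid instances
-- are derived from the wrapped type; we never write them by hand.
newtype Wrapper dist = Wrapper dist
    deriving (Show, Eq, Semigroup, Monoid)

instance ToyTrainer dist =&gt; ToyTrainer (Wrapper dist) where
    type Datapoint (Wrapper dist) = Datapoint dist
    train1dp dp = Wrapper (train1dp dp)

main :: IO ()
main = do
    let whole, halves, online :: Wrapper Counts
        whole  = trainBatch ["cat", "dog", "cat", "bird"]
        -- Train two halves independently, then merge them.
        halves = trainBatch ["cat", "dog"] &lt;&gt; trainBatch ["cat", "bird"]
        -- Online update: merge one more point into an existing model.
        online = trainBatch ["cat", "dog", "cat"] &lt;&gt; train1dp "bird"
    print (whole == halves)
    print (whole == online)</pre>
<p>Both checks print <code>True</code>: training on the whole data set, merging two separately trained halves, and adding a point to an existing model all give the same result.  That equality is the property the batch, parallel, and online trainers rely on.</p>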
<h3>The Weka code</h3>
<p>Look for these differences between the HLearn and Weka source:</p>
<ul>
<li>In Weka we must separately define the online and batch trainers, whereas Haskell derived these for us automatically.</li>
<li>Weka must perform a variety of error handling that Haskell&#8217;s type system takes care of in HLearn.</li>
<li>The Weka code is tightly coupled to the underlying probability distribution, whereas the Haskell code was generic enough to handle any distribution. This means that while Weka must make the &#8220;naive bayes assumption&#8221; that all attributes are independent of each other, HLearn can support any dependence structure.</li>
<li>Weka&#8217;s code is made more verbose by for loops and if statements that aren&#8217;t necessary for HLearn.</li>
<li>The Java code requires extensive comments to maintain readability, but the Haskell code is simple enough to be self-documenting (at least once you know how to read Haskell).</li>
<li>Weka does not have parallel training, fast cross-validation, data point subtraction, or weighted data points, but HLearn does.</li>
</ul>
<pre>/**
   * Generates the classifier.
   *
   * @param instances set of instances serving as training data 
   * @exception Exception if the classifier has not been generated 
   * successfully
   */
  public void buildClassifier(Instances instances) throws Exception {

    // can classifier handle the data?
    getCapabilities().testWithFail(instances);

    // remove instances with missing class
    instances = new Instances(instances);
    instances.deleteWithMissingClass();

    m_NumClasses = instances.numClasses();

    // Copy the instances
    m_Instances = new Instances(instances);

    // Discretize instances if required
    if (m_UseDiscretization) {
      m_Disc = new weka.filters.supervised.attribute.Discretize();
      m_Disc.setInputFormat(m_Instances);
      m_Instances = weka.filters.Filter.useFilter(m_Instances, m_Disc);
    } else {
      m_Disc = null;
    }

    // Reserve space for the distributions
    m_Distributions = new Estimator[m_Instances.numAttributes() - 1]
      [m_Instances.numClasses()];
    m_ClassDistribution = new DiscreteEstimator(m_Instances.numClasses(), 
                                                true);
    int attIndex = 0;
    Enumeration enu = m_Instances.enumerateAttributes();
    while (enu.hasMoreElements()) {
      Attribute attribute = (Attribute) enu.nextElement();

      // If the attribute is numeric, determine the estimator 
      // numeric precision from differences between adjacent values
      double numPrecision = DEFAULT_NUM_PRECISION;
      if (attribute.type() == Attribute.NUMERIC) {
	m_Instances.sort(attribute);
	if ( (m_Instances.numInstances() &gt; 0)
	    &amp;&amp; !m_Instances.instance(0).isMissing(attribute)) {
	  double lastVal = m_Instances.instance(0).value(attribute);
	  double currentVal, deltaSum = 0;
	  int distinct = 0;
	  for (int i = 1; i &lt; m_Instances.numInstances(); i++) {
	    Instance currentInst = m_Instances.instance(i);
	    if (currentInst.isMissing(attribute)) {
	      break;
	    }
	    currentVal = currentInst.value(attribute);
	    if (currentVal != lastVal) {
	      deltaSum += currentVal - lastVal;
	      lastVal = currentVal;
	      distinct++;
	    }
	  }
 	  if (distinct &gt; 0) {
	    numPrecision = deltaSum / distinct;
	  }
	}
      }

      for (int j = 0; j &lt; m_Instances.numClasses(); j++) {
	switch (attribute.type()) {
	case Attribute.NUMERIC: 
	  if (m_UseKernelEstimator) {
	    m_Distributions[attIndex][j] = 
	      new KernelEstimator(numPrecision);
	  } else {
	    m_Distributions[attIndex][j] = 
	      new NormalEstimator(numPrecision);
	  }
	  break;
	case Attribute.NOMINAL:
	  m_Distributions[attIndex][j] = 
	    new DiscreteEstimator(attribute.numValues(), true);
	  break;
	default:
	  throw new Exception("Attribute type unknown to NaiveBayes");
	}
      }
      attIndex++;
    }

    // Compute counts
    Enumeration enumInsts = m_Instances.enumerateInstances();
    while (enumInsts.hasMoreElements()) {
      Instance instance = 
	(Instance) enumInsts.nextElement();
      updateClassifier(instance);
    }

    // Save space
    m_Instances = new Instances(m_Instances, 0);
  }</pre>
<p>And the code for online learning is:</p>
<pre>/**
   * Updates the classifier with the given instance.
   *
   * @param instance the new training instance to include in the model 
   * @exception Exception if the instance could not be incorporated in
   * the model.
   */
  public void updateClassifier(Instance instance) throws Exception {

    if (!instance.classIsMissing()) {
      Enumeration enumAtts = m_Instances.enumerateAttributes();
      int attIndex = 0;
      while (enumAtts.hasMoreElements()) {
	Attribute attribute = (Attribute) enumAtts.nextElement();
	if (!instance.isMissing(attribute)) {
	  m_Distributions[attIndex][(int)instance.classValue()].
            addValue(instance.value(attribute), instance.weight());
	}
	attIndex++;
      }
      m_ClassDistribution.addValue(instance.classValue(),
                                   instance.weight());
    }
  }</pre>
<h3>Conclusion</h3>
<p>Every algorithm implemented in HLearn uses similarly concise code.  I invite you to <a href="https://github.com/mikeizbicki/HLearn/">browse the repository</a> and see for yourself.  The most complicated algorithm is the one for Markov chains, which uses only <a href="https://github.com/mikeizbicki/HLearn/blob/master/HLearn-markov/src/HLearn/Models/Markov/MarkovChain.hs">6 lines for training, and about 20 for defining the Monoid</a>.</p>
<p>You can expect lots of tutorials on how to incorporate the HLearn library into Haskell programs over the next few months.</p>
<p>Subscribe to the <a href="http://izbicki.me/blog/feed">RSS feed</a> to stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/hlearns-code-is-shorter-and-clearer-than-wekas/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>HLearn cross-validates &gt;400x faster than Weka</title>
		<link>http://izbicki.me/blog/hlearn-cross-validates-400x-faster-than-weka?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=hlearn-cross-validates-400x-faster-than-weka</link>
		<comments>http://izbicki.me/blog/hlearn-cross-validates-400x-faster-than-weka#comments</comments>
		<pubDate>Mon, 03 Jun 2013 15:33:16 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=2468</guid>
		<description><![CDATA[Weka is one of the most popular tools for data analysis.  But Weka takes 70 minutes to perform leave-one-out cross-validation using a simple naive Bayes classifier on the census income data set, whereas Haskell&#8217;s HLearn library only takes 9 seconds.  Weka is 465x slower! Code and instructions for reproducing these experiments are available on github. Why is HLearn so much faster? [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright  wp-image-2478" alt="weka-lambda-haskell" src="http://izbicki.me/blog/wp-content/uploads/2013/05/weka-lambda-haskell-300x150.png" width="240" height="120" /><a href="http://www.cs.waikato.ac.nz/~ml/weka/">Weka</a> is one of the most popular tools for data analysis.  But Weka takes <strong>70 minutes</strong> to perform leave-one-out cross-validation using a simple <a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">naive Bayes classifier</a> on the <a href="http://archive.ics.uci.edu/ml/datasets/Census-Income+(KDD)">census income</a> data set, whereas Haskell&#8217;s <a href="https://github.com/mikeizbicki/HLearn">HLearn</a> library only takes <strong>9 seconds</strong>.  Weka is 465x slower!</p>
<p><strong>Code and instructions for reproducing these experiments are <a href="https://github.com/mikeizbicki/HLearn/tree/master/HLearn-classification/src/examples/weka-cv#readme">available on github</a>.</strong></p>
<p><strong><span id="more-2468"></span></strong></p>
<p>Why is HLearn so much faster?</p>
<p>Well, it turns out that the Bayesian classifier has the algebraic structure of a <a href="https://en.wikipedia.org/wiki/Monoid">monoid</a>, a <a href="https://en.wikipedia.org/wiki/Abelian_group">group</a>, and a <a href="https://en.wikipedia.org/wiki/Vector_space">vector space</a>.  HLearn uses a new cross-validation algorithm that can exploit these algebraic structures.  The standard algorithm runs in time <span id='tex_3405'>&#920;(kn)</span>, where <span id='tex_233'>k</span> is the number of &#8220;folds&#8221; and <span id='tex_5285'>n</span> is the number of data points.  The algebraic algorithms, however, run in time <span id='tex_3482'>&#920;(n)</span>.  In other words, no matter how many folds we use, the run time stays the same!  And not only are we faster, but we get the <em>exact same answer</em>.  Algebraic cross-validation is not an approximation, it&#8217;s just fast.</p>
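<p>Here&#8217;s a small sketch of why the group structure helps.  It is not the algorithm from the papers or from HLearn itself, just an illustration under simplifying assumptions: the &#8220;model&#8221; is a toy count-and-sum summary, the names <code>Stats</code>, <code>inverse</code>, <code>standardCV</code>, and <code>algebraicCV</code> are made up for this example, and the step that scores each held-out fold is left out.  The point is that once every fold has been trained once, the model for &#8220;everything except fold i&#8221; can be obtained by subtracting fold i from the total instead of retraining.</p>
<pre>module Main where

-- A toy model: running count and sum of integer data points.
-- (Hypothetical type, not part of HLearn.)
data Stats = Stats { count :: Int, total :: Int }
    deriving (Show, Eq)

instance Semigroup Stats where
    Stats c1 t1 &lt;&gt; Stats c2 t2 = Stats (c1 + c2) (t1 + t2)

instance Monoid Stats where
    mempty = Stats 0 0

-- The group inverse: merging with it "untrains" a model.
inverse :: Stats -&gt; Stats
inverse (Stats c t) = Stats (negate c) (negate t)

train :: [Int] -&gt; Stats
train = foldMap (\x -&gt; Stats 1 x)

-- Standard approach: for each fold, retrain on all of the other folds.
-- Every data point gets visited k - 1 times.
standardCV :: [[Int]] -&gt; [Stats]
standardCV folds =
    [ train (concat (take i folds ++ drop (i + 1) folds))
    | i &lt;- [0 .. length folds - 1]
    ]

-- Algebraic approach: train each fold once, merge everything,
-- then subtract one fold at a time.  Every data point is visited once.
algebraicCV :: [[Int]] -&gt; [Stats]
algebraicCV folds = [ everything &lt;&gt; inverse m | m &lt;- foldModels ]
  where
    foldModels = map train folds
    everything = mconcat foldModels

main :: IO ()
main = do
    let folds = [[1, 2], [3, 4], [5, 6]]
    print (algebraicCV folds)
    print (standardCV folds == algebraicCV folds)</pre>
<p>The last line prints <code>True</code>: both approaches produce exactly the same training models, but the algebraic version never looks at a data point more than once.</p>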
<p>Here are some run times for k-fold cross-validation on the census income data set.  Notice that HLearn&#8217;s run time is constant as we add more folds.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2479" alt="k-fold-cross-validation-weka" src="http://izbicki.me/blog/wp-content/uploads/2013/05/k-fold-cross-validation-weka1.png" width="555" height="336" /></p>
<p>And when we set k=n, we have leave-one-out cross-validation.  Notice that Weka&#8217;s cross-validation has quadratic run time, whereas HLearn has linear run time.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2480" alt="leave-one-out-fast-cross-validation-weka" src="http://izbicki.me/blog/wp-content/uploads/2013/05/leave-one-out-fast-cross-validation-weka1.png" width="553" height="333" /></p>
<p>HLearn certainly isn&#8217;t going to replace Weka any time soon, but it&#8217;s got a number of cool tricks like this going on inside.  If you want to read more, you should check out these two recent papers:</p>
<ul>
<li>(ICML13) <a href="http://izbicki.me/public/papers/icml2013-algebraic-classifiers.pdf">Algebraic Classifiers: a generic approach to fast cross-validation, online training, and parallel training</a></li>
<li>(TFP13) <a href="http://izbicki.me/public/papers/tfp2013-hlearn-a-machine-learning-library-for-haskell.pdf">HLearn: a machine learning library for Haskell</a></li>
</ul>
<p>I&#8217;ll continue to write more about these tricks in future blog posts.</p>
<p>Subscribe to the <a href="http://izbicki.me/blog/feed">RSS feed</a> to stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/hlearn-cross-validates-400x-faster-than-weka/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Turning an AK-47 into a serving ladle</title>
		<link>http://izbicki.me/blog/turning-an-ak-47-into-a-serving-ladle?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=turning-an-ak-47-into-a-serving-ladle</link>
		<comments>http://izbicki.me/blog/turning-an-ak-47-into-a-serving-ladle#comments</comments>
		<pubDate>Mon, 13 May 2013 14:13:39 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Plowshares]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[Religion]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=2388</guid>
		<description><![CDATA[This is the story of an AK-47 and a dead man named Isaiah.  Because of Isaiah, I forged this AK-47 into a serving ladle. This fully automatic AK-47 was used by the Romanian army during the Cold War.  It could shoot 600 rounds per minute with an effective range of 400 yards.  It was an [...]]]></description>
				<content:encoded><![CDATA[<p>This is the story of an AK-47 and a dead man named Isaiah.  Because of Isaiah, I forged this AK-47 into a serving ladle.</p>
<p><img class="aligncenter size-full wp-image-2389" alt="ak47-into-spoon-arrow" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-into-spoon-arrow.jpg" width="700" height="484" /></p>
<p><span id="more-2388"></span>This fully automatic AK-47 was used by the Romanian army during the Cold War.  It could shoot 600 rounds per minute with an effective range of 400 yards.  It was an instrument of death; but now, it is an instrument of life.  It has been redeemed.</p>
<p>You see, 2500 years ago the prophet Isaiah wrote:</p>
<blockquote><p>Nations will beat their swords into plowshares and their spears into pruning hooks.<br />
Nation will not take up sword against nation, nor will they train for war anymore.</p></blockquote>
<p>I want Isaiah&#8217;s vision to become reality.</p>
<h3>The rifle</h3>
<p>This is the rifle fully disassembled.</p>
<p><img class="aligncenter size-full wp-image-2405" alt="romanian-ak47-disassembled" src="http://izbicki.me/blog/wp-content/uploads/2013/05/romanian-ak47-disassembled.jpg" width="700" height="237" /></p>
<p>This is a closeup of the barrel assembly that I actually made into a spoon.  I&#8217;m still looking for ideas on what to make out of everything else.</p>
<p><img class="aligncenter size-full wp-image-2419" alt="ak47-barrel-all" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-barrel-all.jpg" width="700" height="350" /></p>
<p>Here&#8217;s a closeup of the end where the bullet enters the barrel.  On top is the &#8220;rear sight,&#8221; which is adjustable for shooting at targets anywhere from 0 to 800 meters away.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2411" alt="ak47-block-rotated" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-block-rotated.jpg" width="700" height="419" /></p>
<p>Notice that there are actually many pieces of metal here&#8212;the barrel itself and two large blocks of steel attached to it.  These blocks are held in place by &#8220;pinions,&#8221; the circle-shaped pieces.  If I had a hydraulic press I could push out the pinions and then remove the big metal blocks.  But I don&#8217;t, so I&#8217;ll just beat them off with my forging hammer!</p>
<p><img class="aligncenter size-full wp-image-2412" alt="ak47-block-horizontal-labeled" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-block-horizontal-labeled.jpg" width="700" height="419" /></p>
<p>Looking down the barrel.  This is where we would insert the bullet when firing.</p>
<p><img class="aligncenter size-full wp-image-2413" alt="ak47-block-down-barrel-rifling" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-block-down-barrel-rifling.jpg" width="700" height="419" /></p>
<p>Now for the business end.  Here I have the flash suppressor removed.  The forward sight (the tall thing jutting out to the right) will make a great handle for the future ladle.</p>
<p><img class="aligncenter size-full wp-image-2414" alt="ak47-flash-surpressor" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-flash-surpressor.jpg" width="700" height="419" /></p>
<h3>Building the Forge</h3>
<p>In order to turn this chunk of metal into a spoon, I needed to build a forge.  I bought a basic anvil and a 5 lb hammer on the internet for about $80, and the stump came from craigslist for free:</p>
<p><img class="aligncenter size-full wp-image-2394" alt="stump-forge" src="http://izbicki.me/blog/wp-content/uploads/2013/05/stump-forge.jpg" width="375" height="700" /></p>
<p>Next, I needed a way to heat the metal.  I decided to build a propane forge, because I already had a good burner.  The burner came out of a portable stove that <a href="http://izbicki.me/blog/how-i-serve-150-free-lunches-for-less-than-20-cents-each-using-homebrew-equipment">we&#8217;ve used to serve free chili</a> with a group called <a href="http://www.foodnotbombs.net/">Food not Bombs</a>.  The base of the stove is below the bricks, but I unscrewed the burner and placed it on the bricks.</p>
<p><img class="aligncenter size-full wp-image-2395" alt="forge-burner" src="http://izbicki.me/blog/wp-content/uploads/2013/05/forge-burner.jpg" width="700" height="419" /></p>
<p>Then, I simply stacked the bricks on top of each other to create a nice chamber for the flames.  Here&#8217;s a picture of a test run with a piece of rebar.</p>
<p><img class="aligncenter size-full wp-image-2396" alt="forge-test-rebar" src="http://izbicki.me/blog/wp-content/uploads/2013/05/forge-test-rebar.jpg" width="700" height="419" /></p>
<p>The flame is actually gigantic, shooting 1-2 feet past the opening of the forge.  The middle of the rebar is a bright glowing orange-yellow, over 2000 degrees Fahrenheit.  My camera just can&#8217;t do it justice.</p>
<p>The bricks are also really cheap.  I paid 25 cents each at Lowe&#8217;s.  Overall, the whole setup cost less than $100.</p>
<h3>Forging the metal</h3>
<p>Let the hammering begin!  You have to work quickly to hit the metal while it&#8217;s still hot from the forge!</p>
<p><img class="aligncenter size-full wp-image-2422" alt="forging-at-night-ak47" src="http://izbicki.me/blog/wp-content/uploads/2013/05/forging-at-night-ak47.jpg" width="419" height="700" /></p>
<p>Oooohhh&#8230;. glowing&#8230;&#8230;  purty&#8230;..</p>
<p><img class="aligncenter size-full wp-image-2421" alt="ak47-glowing-2" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-glowing-2.jpg" width="700" height="419" /></p>
<p>After just a few hammer blows, look how much the blocks have shrunk.  Also, I did NOT have a flash for this.  The metal is so hot it&#8217;s putting off enough light to light up the wall I&#8217;m holding it next to!</p>
<p><img class="aligncenter size-full wp-image-2423" alt="ak47-glowing-3" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-glowing-3.jpg" width="700" height="419" /></p>
<p>I had to stop hammering after about an hour because of blisters.  This is what I was starting with on the next day.  Notice that you can still make out the serial number!</p>
<p><img class="aligncenter size-full wp-image-2427" alt="ak47-flat-serial-number" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-flat-serial-number.jpg" width="700" height="276" /></p>
<p>And here&#8217;s a view from the bottom up.</p>
<p><img class="aligncenter size-full wp-image-2426" alt="ak47-flat-bottomup" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-flat-bottomup.jpg" width="700" height="372" /></p>
<p>After a few more hours hammering, everything&#8217;s much flatter.  The serial number has long since been flattened away.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2431" alt="ak47-veryflat-side" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-veryflat-side.jpg" width="700" height="233" /></p>
<p>Here&#8217;s the same view from the bottom.  It looks like a ghost!</p>
<p><img class="aligncenter size-full wp-image-2433" alt="ak47-ghost-escaping" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-ghost-escaping1.jpg" width="700" height="419" /></p>
<p>The layers of metal from the attached blocks are quite distinct and starting to get in the way.</p>
<p><img class="aligncenter size-full wp-image-2429" alt="ak47-layers-peeling-off" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-layers-peeling-off.jpg" width="700" height="419" /></p>
<p>After a lot of wrestling with some pliers, I&#8217;ve finally managed to remove the blocks of metal.  All that remains is this &#8220;shrapnel.&#8221;</p>
<p><img class="aligncenter size-full wp-image-2430" alt="ak47-shrapnel" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-shrapnel.jpg" width="700" height="397" /></p>
<p>And the gun barrel itself is now the world&#8217;s coolest spatula.</p>
<p><img class="aligncenter size-full wp-image-2435" alt="ak47-worlds-coolest-spatula" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-worlds-coolest-spatula.jpg" width="700" height="419" /></p>
<p>And from the bottom:</p>
<p><img class="aligncenter size-full wp-image-2440" alt="ak47-worlds-coolest-spatula-bottom" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-worlds-coolest-spatula-bottom.jpg" width="700" height="419" /></p>
<h3>Rounding out the bowl</h3>
<p>All that&#8217;s left is to turn this spatula into a spoon.  The metal is still pretty thick, so I flattened it out as much as I could.</p>
<p><img class="aligncenter size-full wp-image-2442" alt="ak47-spatual-long-flat" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-spatual-long-flat.jpg" width="419" height="700" /></p>
<div>
<p>Then I made a spoon shape.  I did this by just holding the spatula at a slight angle while hammering.  Every blow bent the metal just a little bit until the full bowl shape was complete.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2441" alt="ak47-worlds-coolest-spoon" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-worlds-coolest-spoon.jpg" width="419" height="700" /></p>
<p>I made my spoon about 2 inches too long.  Whoops!  No worries, I used a Dremel to cut the extra bits off.  I also used it to smooth around some of the edges.</p>
<p><img class="aligncenter size-full wp-image-2445" alt="ak47-spoon-smoothed-edges-hole" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-spoon-smoothed-edges-hole.jpg" width="700" height="490" /></p>
<p>Notice that there&#8217;s a little hole in the middle of the spoon.  I accidentally hammered the steel too thin and went all the way through.  Meh. It&#8217;s still good.  It&#8217;s just a straining spoon now!</p>
<p>There&#8217;s also a little bit of burnt steel along the sides.  Seriously?!  Steel can burn?  Thankfully, a soak in vinegar and scrubbing brought it out.  Thanks to the <a href="http://www.iforgeiron.com/topic/32781-help-my-metal-is-all-flaky-how-can-i-fix-it/">folks at iforgeiron.com</a> for giving me the tip!</p>
<p><img class="aligncenter size-full wp-image-2446" alt="ak47-spoon-final-burnt-steel" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-spoon-final-burnt-steel.jpg" width="700" height="419" /></p>
<p>Here&#8217;s the final product:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-2443" alt="ak47-spoon" src="http://izbicki.me/blog/wp-content/uploads/2013/05/ak47-spoon1.jpg" width="700" height="242" /></p>
<p>That&#8217;s it!</p>
<p>I wish I had some pictures of me eating from the spoon, but unfortunately it&#8217;s not food safe.  Real gunpowder has been detonated countless times inside this barrel.  I tried cleaning it as best I could, but I&#8217;m pretty sure there are plenty of little cancer molecules still hanging out in there.</p>
<p>One gun down, only <a href="http://en.wikipedia.org/wiki/Number_of_guns_per_capita_by_country">874,999,999 to go</a>.  Depressing.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/turning-an-ak-47-into-a-serving-ladle/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Markov Networks, Monoids, and Futurama</title>
		<link>http://izbicki.me/blog/markov-networks-monoids-and-futurama?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=markov-networks-monoids-and-futurama</link>
		<comments>http://izbicki.me/blog/markov-networks-monoids-and-futurama#comments</comments>
		<pubDate>Thu, 09 May 2013 15:14:43 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=2229</guid>
		<description><![CDATA[In this post, we&#8217;re going to look at how to manipulate multivariate distributions in Haskell&#8217;s HLearn library.  There are many ways to represent multivariate distributions, but we&#8217;ll use a technique called Markov networks.  These networks have the algebraic structure called a monoid (and group and vector space), and training them is a homomorphism.  Despite the scary names, these mathematical [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-2241" alt="fry" src="http://izbicki.me/blog/wp-content/uploads/2013/05/fry-300x225.jpg" width="300" height="225" />In this post, we&#8217;re going to look at how to manipulate multivariate distributions in Haskell&#8217;s <a href="https://github.com/mikeizbicki/HLearn">HLearn library</a>.  There are many ways to represent multivariate distributions, but we&#8217;ll use a technique called <a href="https://en.wikipedia.org/wiki/Markov_random_field">Markov networks</a>.  These networks have the algebraic structure called a <a href="https://en.wikipedia.org/wiki/Monoid">monoid</a> (and group and vector space), and training them is a <a href="https://en.wikipedia.org/wiki/Monoid_homomorphism#Monoid_homomorphisms">homomorphism</a>.  Despite the scary names, these mathematical structures make working with our distributions really easy and convenient&#8212;they give us online and parallel training algorithms &#8220;for free.&#8221;  If you want to go into the details of how, you can check out my <a href="http://izbicki.me/public/papers/tfp2013-hlearn-a-machine-learning-library-for-haskell.pdf">TFP13 submission</a>, but in this post we&#8217;ll ignore those mathy details to focus on how to use the library in practice.  We&#8217;ll use a running example of creating a distribution over characters in the show Futurama.</p>
<p><span id="more-2229"></span></p>
<h3>Preliminaries: Creating the data types</h3>
<p>As usual, this post is a literate haskell file.  To run this code, you&#8217;ll need to install the <a href="http://hackage.haskell.org/package/HLearn-distributions">hlearn-distributions</a> package.  This package requires GHC version at least 7.6.</p>
<pre>bash&gt; cabal install hlearn-distributions-1.0.0.1</pre>
<p>Now for some code.  We start with our language extensions and imports:</p>
<pre>&gt;{-# LANGUAGE DataKinds #-}
&gt;{-# LANGUAGE TypeFamilies #-}
&gt;{-# LANGUAGE TemplateHaskell #-}
&gt;
&gt;import HLearn.Algebra
&gt;import HLearn.Models.Distributions</pre>
<p>Next, we&#8217;ll create a data type to represent Futurama characters.  There are a lot of characters, so we&#8217;ll need to keep things pretty organized.  The data type will have a record for everything we might want to know about a character.  Each of these records will be one of the variables in our multivariate distribution, and all of our data points will have this type.</p>
<p><img class="aligncenter size-large wp-image-2250" alt="FuturamaCast" src="http://izbicki.me/blog/wp-content/uploads/2013/05/FuturamaCast-1024x439.png" width="500" height="214" /></p>
<pre>&gt;data Character = Character
&gt;   { _name      :: String
&gt;   , _species   :: String
&gt;   , _job       :: Job
&gt;   , _isGood    :: Maybe Bool
&gt;   , _age       :: Double -- in years
&gt;   , _height    :: Double -- in feet
&gt;   , _weight    :: Double -- in pounds
&gt;   }
&gt;   deriving (Read,Show,Eq,Ord)
&gt;
&gt;data Job = Manager | Crew | Henchman | Other
&gt;   deriving (Read,Show,Eq,Ord)</pre>
<p>Now, in order for our library to be able to interpret the Character type, we call the template haskell function:</p>
<pre>&gt;makeTypeLenses ''Character</pre>
<p>This function creates a bunch of data types and type classes for us.  These &#8220;type lenses&#8221; give us a type-safe way to reference the different variables in our multivariate distribution.  We&#8217;ll see how to use these type level lenses a bit later.  There&#8217;s no need to understand what&#8217;s going on under the hood, but if you&#8217;re curious then check out the <a href="http://hackage.haskell.org/packages/archive/HLearn-distributions/1.0.0.1/doc/html/HLearn-Models-Distributions-Multivariate-Internal-TypeLens.html">hackage documentation</a> or <a href="https://github.com/mikeizbicki/HLearn/blob/master/HLearn-distributions/src/HLearn/Models/Distributions/Multivariate/Internal/TypeLens.hs">source code</a>.</p>
<h3>Training a distribution</h3>
<p>Now, we&#8217;re ready to create a data set and start training.  Here&#8217;s a list of the employees of Planet Express provided by the resident bureaucrat Hermes Conrad.  This list will be our first data set.</p>
<p><img class="aligncenter size-full wp-image-2306" alt="hermes-zoom" src="http://izbicki.me/blog/wp-content/uploads/2013/05/hermes-zoom.png" width="700" height="250" /></p>
<pre>&gt;planetExpress = 
&gt;   [ Character "Philip J. Fry"         "human" Crew     (Just True) 1026   5.8 195
&gt;   , Character "Turanga Leela"         "alien" Crew     (Just True) 43     5.9 170
&gt;   , Character "Professor Farnsworth"  "human" Manager  (Just True) 85     5.5 160
&gt;   , Character "Hermes Conrad"         "human" Manager  (Just True) 36     5.3 210
&gt;   , Character "Amy Wong"              "human" Other    (Just True) 21     5.4 140
&gt;   , Character "Zoidberg"              "alien" Other    (Just True) 212    5.8 225
&gt;   , Character "Cubert Farnsworth"     "human" Other    (Just True) 8      4.3 135
&gt;   ]</pre>
<p>Let&#8217;s train a distribution from this data.  Here&#8217;s how we would train a distribution where every variable is independent of every other variable:</p>
<pre>&gt;dist1 = train planetExpress :: Multivariate Character
&gt;  '[ Independent Categorical '[String,String,Job,Maybe Bool]
&gt;   , Independent Normal '[Double,Double,Double]
&gt;   ]
&gt;   Double</pre>
<p>In the HLearn library, we always use the function <strong>train</strong> to train a model from data points.  We specify which model to train in the type signature.</p>
<p>As you can see, the Multivariate distribution takes three type parameters.  The first parameter is the type of our data point, in this case Character.  The second parameter describes the dependency structure of our distribution.  We&#8217;ll go over the syntax for the dependency structure in a bit.  For now, just notice that it&#8217;s a type-level list of distributions.  Finally, the third parameter is the type we will use to store our probabilities.</p>
<p>What can we do with this distribution?  One simple task we can do is to find <a href="https://en.wikipedia.org/wiki/Marginal_distribution">marginal distributions</a>.  The marginal distribution is the distribution of a certain variable ignoring all the other variables.  For example, let&#8217;s say I want a distribution of the species that work at Planet Express.  I can get this by:</p>
<pre>&gt;dist1a = getMargin TH_species dist1</pre>
<p>Notice that we specified which variable we&#8217;re taking the marginal of by using the type level lens TH_species.  This data constructor was automatically created for us by our template haskell function makeTypeLenses.  Every one of our records in the data type has its own unique type lens.  Its name is the name of the record, prefixed by TH.  These lenses let us infer the types of our marginal distributions at compile time, rather than at run time.  For example, the type of the marginal distribution of species is:</p>
<pre>ghci&gt; :t dist1a
dist1a :: Categorical String Double</pre>
<p>That is, a categorical distribution whose data points are Strings and which stores probabilities as a Double.  Now, if I wanted a distribution of the weights of the employees, I can get that by:</p>
<pre>&gt;dist1b = getMargin TH_weight dist1</pre>
<p>And the type of this distribution is:</p>
<pre>ghci&gt; :t dist1b
dist1b :: Normal Double</pre>
<p>Now, I can easily plot these marginal distributions with the <strong>plotDistribution</strong> function:</p>
<pre>ghci&gt; plotDistribution (plotFile "dist1a") dist1a
ghci&gt; plotDistribution (plotFile "dist1b") dist1b</pre>
<p><center><br />
<img class="alignnone size-full wp-image-2271" alt="dist1a" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist1a.png" width="250" height="250" /><img class="alignnone size-full wp-image-2272" alt="dist1b" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist1b.png" width="250" height="250" /></center><br />
<img class=" wp-image-2310 alignright" alt="futurama-bender-smoking-cigar-wallpaper" src="http://izbicki.me/blog/wp-content/uploads/2013/05/futurama-bender-smoking-cigar-wallpaper-225x300.jpg" width="108" height="144" />But wait! I accidentally forgot to include Bender in the planetExpress data set! What can I do?</p>
<p>In a traditional statistics library, we would have to retrain our model from scratch.  If we had billions of elements in our data set, this would be an expensive mistake.  But in our HLearn library, we can take advantage of the model&#8217;s monoid structure.  In particular, the compiler used this structure to automatically derive a function called <strong>add1dp</strong> for us.  Let&#8217;s look at its type:</p>
<pre>ghci&gt; :t add1dp
add1dp :: HomTrainer model =&gt; model -&gt; Datapoint model -&gt; model</pre>
<p>It&#8217;s pretty simple.  The function takes a model and adds the data point associated with that model.  It returns the model we would have gotten if the data point had been in our original data set.  This is called online training.</p>
<p>Again, because our distributions form monoids, the compiler derived an efficient and exact online training algorithm for us automatically.</p>
<p>So let&#8217;s create a new distribution that considers Bender:</p>
<pre>&gt;bender = Character "Bender Rodriguez" "robot" Crew (Just True) 44 6.1 612
&gt;dist1' = add1dp dist1 bender</pre>
<p>And plot our new marginals:</p>
<pre>ghci&gt; plotDistribution (plotFile "dist1-withbender-species" $ PNG 250 250) $ 
                getMargin TH_species dist1'
ghci&gt; plotDistribution (plotFile "dist1-withbender-weight"  $ PNG 250 250) $ 
                getMargin TH_weight dist1'</pre>
<p><center><br />
<img class="alignnone size-full wp-image-2267" alt="dist1-withbender-species" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist1-withbender-species.png" width="250" height="250" /><img class="alignnone size-full wp-image-2268" alt="dist1-withbender-weight" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist1-withbender-weight.png" width="250" height="250" /></center><br />
Notice that our categorical marginal has clearly changed, but that our normal marginal doesn&#8217;t seem to have changed at all. This is because the plotting routines automatically scale the distribution, and the normal distribution, when scaled, always looks the same. We can double check that we actually did change the weight distribution by comparing the mean:</p>
<pre>ghci&gt; mean dist1b
176.42857142857142
ghci&gt; mean $ getMargin TH_weight dist1'
230.875</pre>
<p>Bender&#8217;s weight really changed the distribution after all!</p>
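<p>To see why a monoid gives us online training for free, here is a minimal sketch that does not use HLearn at all.  It assumes a modern GHC (8.4 or later, where Semigroup is in the Prelude), and the names Moments, train1, trainAll, addOne and meanOf are made up for illustration; HLearn&#8217;s real Normal model stores more than a count and a sum.</p>
<pre>-- Hypothetical stand-in for a trained Normal model: just the count and
-- the sum of the data points, which is enough to recover the mean.
data Moments = Moments Double Double
  deriving (Show, Eq)

-- Merging two models adds their statistics together.
instance Semigroup Moments where
  Moments n1 s1 &lt;&gt; Moments n2 s2 = Moments (n1 + n2) (s1 + s2)

instance Monoid Moments where
  mempty = Moments 0 0

-- Train a model on a single data point.
train1 :: Double -&gt; Moments
train1 x = Moments 1 x

-- Batch training merges the one-point models of every data point.
trainAll :: [Double] -&gt; Moments
trainAll = foldMap train1

-- Online training: merge the existing model with a one-point model.
addOne :: Moments -&gt; Double -&gt; Moments
addOne m x = m &lt;&gt; train1 x

meanOf :: Moments -&gt; Double
meanOf (Moments n s) = s / n

main :: IO ()
main = do
  let weights = [195, 170, 160, 210, 140, 225, 135]  -- planetExpress weights
      model   = trainAll weights
  print (meanOf model)
  print (meanOf (addOne model 612))  -- 230.875, with Bender added</pre>
<p>The point of the sketch is that addOne never looks at the original weights again.  It only needs the already-trained model and the new data point.</p>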
<h3>Complicated dependence structures</h3>
<p>That&#8217;s cool, but our original distribution isn&#8217;t very interesting.  What makes multivariate distributions interesting is when the variables affect each other.  This is true in our case, so we&#8217;d like to be able to model it.  For example, we&#8217;ve already seen that robots are much heavier than organic lifeforms, and are throwing off our statistics.  The HLearn library supports a small subset of Markov Networks for expressing these dependencies.</p>
<p>We represent Markov Networks as graphs with undirected edges.  Every attribute in our distribution is a node, and every dependence between attributes is an edge.  We can draw this graph with the <strong>plotNetwork</strong> command:</p>
<pre>ghci&gt; plotNetwork "dist1-network" dist1</pre>
<p style="text-align: left;"><img class="size-medium wp-image-2276 aligncenter" alt="dist1-network" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist1-network-300x276.png" width="300" height="276" />As expected, there are no edges in our graph because everything is independent.  Let&#8217;s create a more interesting distribution and plot its Markov network.</p>
<pre>&gt;dist2 = train planetExpress :: Multivariate Character
&gt;  '[ Ignore                  '[String]
&gt;   , MultiCategorical        '[String]
&gt;   , Independent Categorical '[Job,Maybe Bool]
&gt;   , Independent Normal      '[Double,Double,Double]
&gt;   ]
&gt;   Double</pre>
<pre>ghci&gt; plotNetwork "dist2-network" dist2</pre>
<p style="text-align: center;"> <img class="size-medium wp-image-2277 aligncenter" alt="dist2-network" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist2-network-300x263.png" width="300" height="263" /></p>
<p>Okay, so what just happened?</p>
<p>The syntax for representing the dependence structure is a little confusing, so let&#8217;s go step by step.  We represent the dependence information in the graph as a list of types.  Each element in the list describes both the marginal distribution and the dependence structure for one or more records in our data type.  We must list these elements in the same order as the original data type.</p>
<p>Notice that we&#8217;ve made two changes to the list.  First, our list now starts with the type Ignore '[String].  This means that the first string in our data type (the name) will be ignored.  Notice that TH_name is no longer in the Markov Network.  This makes sense because we expect that a character&#8217;s name should not tell us too much about any of their other attributes.</p>
<p>Second, we&#8217;ve added a dependence.  The MultiCategorical distribution makes everything afterward in the list dependent on that item, but not the things before it.  This means that the exact types of dependencies it can specify are dependent on the order of the records in our data type.  Let&#8217;s see what happens if we change the location of the MultiCategorical:</p>
<pre>&gt;dist3 = train planetExpress :: Multivariate Character
&gt;  '[ Ignore '[String]
&gt;   , Independent Categorical '[String]
&gt;   , MultiCategorical '[Job]
&gt;   , Independent Categorical '[Maybe Bool]
&gt;   , Independent Normal '[Double,Double,Double]
&gt;   ]
&gt;   Double</pre>
<pre>ghci&gt; plotNetwork "dist3-network" dist3</pre>
<p style="text-align: center;"><img class="size-medium wp-image-2279 aligncenter" alt="dist3-network" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist3-network1-300x246.png" width="300" height="246" /></p>
<p>As you can see, our species no longer have any relation to anything else.  Unfortunately, using this syntax, the order of list elements is important, and so the order we specify our data records is important.</p>
<p>Finally, we can substitute any valid univariate distribution for our Normal and Categorical distributions.  The HLearn library currently supports Binomial, Exponential, Geometric, LogNormal, and Poisson distributions.  These just don&#8217;t make much sense for modelling Futurama characters, so we&#8217;re not using them.</p>
<p>Now, we might be tempted to specify that every variable is fully dependent on every other variable.  In order to do this, we have to introduce the &#8220;Dependent&#8221; type.  Any valid multivariate distribution can follow Dependent, but only those records specified in the type-list will actually be dependent on each other.  For example:</p>
<pre>&gt;dist4 = train planetExpress :: Multivariate Character
&gt;  '[ Ignore '[String]
&gt;   , MultiCategorical '[String,Job,Maybe Bool]
&gt;   , Dependent MultiNormal '[Double,Double,Double]
&gt;   ]
&gt;   Double</pre>
<pre>ghci&gt; plotNetwork "dist4-network" dist4</pre>
<p><img class="aligncenter size-medium wp-image-2281" alt="distb-network" src="http://izbicki.me/blog/wp-content/uploads/2013/05/distb-network-300x226.png" width="300" height="226" /></p>
<p>Undoubtedly, this is always going to be the case: everything always has a slight influence on everything else.  Unfortunately, it is not easy in practice to model these fully dependent distributions.  We need roughly <span id='tex_8666'></span> data points to accurately train a distribution, where n is the number of nodes in our graph and e is the number of edges in our network.  Thus, by selecting that two attributes are independent of each other, we can greatly reduce the amount of data we need to train an accurate distribution.</p>
<p>I realize that this syntax is a little awkward.  I chose it because it was relatively easy to implement.  Future versions of the library should support a more intuitive syntax.  I also plan to use <a href="https://en.wikipedia.org/wiki/Copula_(probability_theory)">copulas</a> to greatly expand the expressiveness of these distributions.  In the meantime, the best way to figure out the dependencies in a Markov Network is just to plot it and see visually.</p>
<p>Okay.  So what distribution makes the most sense for Futurama characters?  We&#8217;ll say that everything depends on both the characters&#8217; species and job, and that their weight depends on their height.</p>
<pre>&gt;planetExpressDist = train planetExpress :: Multivariate Character
&gt;  '[ Ignore '[String]
&gt;   , MultiCategorical '[String,Job]
&gt;   , Independent Categorical '[Maybe Bool]
&gt;   , Independent Normal '[Double]
&gt;   , Dependent MultiNormal '[Double,Double]
&gt;   ]
&gt;   Double</pre>
<pre>ghci&gt; plotNetwork "planetExpress-network" planetExpressDist</pre>
<p><img class="aligncenter size-medium wp-image-2280" alt="dist4-network" src="http://izbicki.me/blog/wp-content/uploads/2013/05/dist4-network-300x225.png" width="300" height="225" /></p>
<p>We still don&#8217;t have enough data to train this network, so let&#8217;s create some more.  We start by creating a type for our Markov network called FuturamaDist.  This is just for convenience so we don&#8217;t have to retype the dependence structure many times.</p>
<pre>&gt;type FuturamaDist = Multivariate Character
&gt;  '[ Ignore '[String]
&gt;   , MultiCategorical '[String,Job]
&gt;   , Independent Categorical '[Maybe Bool]
&gt;   , Independent Normal '[Double]
&gt;   , Dependent MultiNormal '[Double,Double]
&gt;   ]
&gt;   Double</pre>
<p>Next, we train some more distributions of this type on some of the characters.  We&#8217;ll start with Mom Corporation and the brave Space Forces.</p>
<p><center> <img class="alignnone size-full wp-image-2316" alt="200-futurama_mom_and_sons" src="http://izbicki.me/blog/wp-content/uploads/2013/05/200-futurama_mom_and_sons.jpg" width="304" height="200" /> <img class="alignnone size-full wp-image-2318" alt="200-kif and zapp" src="http://izbicki.me/blog/wp-content/uploads/2013/05/200-kif-and-zapp.jpg" width="267" height="200" /></center></p>
<pre>&gt;momCorporation = 
&gt;   [ Character "Mom"                   "human" Manager  (Just False) 100 5.5 130
&gt;   , Character "Walt"                  "human" Henchman (Just False) 22  6.1 170
&gt;   , Character "Larry"                 "human" Henchman (Just False) 18  5.9 180
&gt;   , Character "Igner"                 "human" Henchman (Just False) 15  5.8 175
&gt;   ]
&gt;momDist = train momCorporation :: FuturamaDist</pre>
<pre>&gt;spaceForce = 
&gt;   [ Character "Zapp Brannigan"        "human" Manager  (Nothing)   45  6.0 230
&gt;   , Character "Kif Kroker"            "alien" Crew     (Just True) 113 4.5 120
&gt;   ]
&gt;spaceDist = train spaceForce :: FuturamaDist</pre>
<p style="text-align: left;">And now some more robots:</p>
<p><center><img class="alignnone size-full wp-image-2319" alt="200-robotmafia" src="http://izbicki.me/blog/wp-content/uploads/2013/05/200-robotmafia.jpg" width="330" height="200" /> <img class="alignnone size-full wp-image-2317" alt="200-hedonismbot" src="http://izbicki.me/blog/wp-content/uploads/2013/05/200-hedonismbot.jpg" width="250" height="200" /></center></p>
<pre>&gt;robots = 
&gt;   [ bender
&gt;   , Character "Calculon"              "robot" Other    (Nothing)    123  6.8 650
&gt;   , Character "The Crushinator"       "robot" Other    (Nothing)    45   8.0 4500
&gt;   , Character "Clamps"                "robot" Henchman (Just False) 134  5.8 330
&gt;   , Character "DonBot"                "robot" Manager  (Just False) 178  5.8 520
&gt;   , Character "Hedonismbot"           "robot" Other    (Just False) 69   4.3 1200
&gt;   , Character "Preacherbot"           "robot" Manager  (Nothing)    45   5.8 350
&gt;   , Character "Roberto"               "robot" Other    (Just False) 77   5.9 250
&gt;   , Character "Robot Devil"           "robot" Other    (Just False) 895  6.0 280
&gt;   , Character "Robot Santa"           "robot" Other    (Just False) 488  6.3 950
&gt;   ]
&gt;robotDist = train robots :: FuturamaDist</pre>
<p>Now we&#8217;re going to take advantage of the monoid structure of our multivariate distributions to combine all of these distributions into one.</p>
<pre>&gt; futuramaDist = (train planetExpress :: FuturamaDist) &lt;&gt; momDist &lt;&gt; spaceDist &lt;&gt; robotDist</pre>
<p>The resulting distribution is equivalent to having trained a distribution from scratch on all of the data points:</p>
<pre>train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist</pre>
<div>
<p>We can take advantage of this property any time we use the train function to automatically parallelize our code.  The higher order function <strong>parallel</strong> will split  the training task evenly over each of your available processors, then merge them together with the monoid operation.  This results in &#8220;theoretically perfect&#8221; parallel training of these models.</p>
<pre>parallel train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist</pre>
<div>
<p>Again, this is only possible because the distributions have a monoid structure.</p>
</div>
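<p>The idea behind this kind of parallel training can be sketched without HLearn.  The snippet below is only an illustration under stated assumptions: Counts, trainCounts and chunksOf are hypothetical names, it needs the containers package that ships with GHC, and the chunks are trained one after another here rather than on separate processors.  What it shows is the homomorphism property: training on chunks and merging gives the same model as training on everything at once.</p>
<pre>import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a categorical model: a table of counts.
newtype Counts = Counts (Map.Map String Int)
  deriving (Show, Eq)

-- Merging two models adds the counts of matching categories.
instance Semigroup Counts where
  Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

instance Monoid Counts where
  mempty = Counts Map.empty

-- Train by merging a one-point model for every data point.
trainCounts :: [String] -&gt; Counts
trainCounts = foldMap (\x -&gt; Counts (Map.singleton x 1))

-- Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -&gt; [a] -&gt; [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

main :: IO ()
main = do
  let species = ["human","alien","human","human","human","alien","human","robot"]
      whole   = trainCounts species
      merged  = mconcat (map trainCounts (chunksOf 3 species))
  print (whole == merged)  -- True</pre>
<p>Because each chunk is trained independently, nothing stops us from handing every chunk to a different processor and merging the results at the end.</p>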
<p><span class="Apple-style-span" style="line-height: 17px;">Now, let&#8217;s ask some questions of our distribution.  If I pick a character at random, what&#8217;s the probability that they&#8217;re a good guy?  Let&#8217;s plot the marginal.</span></p>
</div>
<pre>ghci&gt; plotDistribution (plotFile "goodguy" $ PNG 250 250) $ getMargin TH_isGood futuramaDist</pre>
<p><img class="aligncenter size-full wp-image-2294" alt="goodguy" src="http://izbicki.me/blog/wp-content/uploads/2013/05/goodguy.png" width="250" height="250" /></p>
<p>But what if I only want to pick from those characters that are humans, or those characters that are robots?  Statisticians call this conditioning.  We can do that with the condition function:</p>
<pre>ghci&gt; plotDistribution (plotFile "goodguy-human" $ PNG 250 250) $
             getMargin TH_isGood $ condition TH_species "human" futuramaDist
ghci&gt; plotDistribution (plotFile "goodguy-robot" $ PNG 250 250) $
             getMargin TH_isGood $ condition TH_species "robot" futuramaDist</pre>
<p>&nbsp;</p>
<p><center><img class="alignright" alt="Preacherbot" src="http://izbicki.me/blog/wp-content/uploads/2013/05/Preacherbot-174x300.jpg" width="174" height="300" /><img class="alignnone size-full wp-image-2295" alt="goodguy-human" src="http://izbicki.me/blog/wp-content/uploads/2013/05/goodguy-human.png" width="250" height="250" /> <img class="size-medium wp-image-2296 alignnone" alt="goodguy-robot" src="http://izbicki.me/blog/wp-content/uploads/2013/05/goodguy-robot.png" width="250" height="250" /></center>On the left is the plot for humans, and on the right the plot for robots.  Apparently, original robot sin is much worse than that in humans!  If only they would listen to Preacherbot and repent of their wicked ways&#8230;</p>
<p>Now let&#8217;s ask: What&#8217;s the average age of an evil robot?</p>
<pre>ghci&gt; mean $ getMargin TH_age $ 
         condition TH_isGood (Just False) $ condition TH_species "robot" futuramaDist 
273.0769230769231</pre>
<p>Notice that conditioning a distribution is a commutative operation.  That means we can condition in any order and still get the exact same results.  Let&#8217;s try it:</p>
<pre>ghci&gt; mean $ getMargin TH_age $ 
         condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist 
273.0769230769231</pre>
<p>There&#8217;s one last thing for us to consider.  What does our Markov network look like after conditioning?  Let&#8217;s find out!</p>
<pre>plotNetwork "condition-species-isGood" $ 
         condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist</pre>
<p style="text-align: center;"><img class="aligncenter size-medium wp-image-2344" alt="condition-species-isGood" src="http://izbicki.me/blog/wp-content/uploads/2013/05/condition-species-isGood-300x168.png" width="300" height="168" /></p>
<p>Notice that conditioning against these variables caused them to go away from our Markov Network.</p>
<p>Finally, there&#8217;s another similar process to conditioning called &#8220;marginalizing out.&#8221; This lets us ignore the effects of a single attribute without specifically saying what that attribute must be. When we marginalize out on our Markov network, we get the same dependence structure as if we conditioned.</p>
<pre>plotNetwork "marginalizeOut-species-isGood" $ 
         marginalizeOut TH_species $ marginalizeOut TH_isGood futuramaDist</pre>
<p><img class="aligncenter" alt="condition-species-isGood" src="http://izbicki.me/blog/wp-content/uploads/2013/05/condition-species-isGood-300x168.png" width="300" height="168" /></p>
<p>Effectively, what the marginalizeOut function does is &#8220;forget&#8221; the extra dependencies, whereas the condition function &#8220;applies&#8221; those dependencies.  In the end, the resulting Markov network has the same structure, but different values.</p>
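<p>Here is a rough sketch of that difference on a plain table of counts.  It is not how HLearn implements these functions.  The table is tallied by hand from the data sets above, leaving out the aliens and the characters whose alignment is unknown, and the names Joint, conditionSpecies and marginalizeOutSpecies are invented for the example.  It needs the containers package that ships with GHC.</p>
<pre>import qualified Data.Map.Strict as Map

-- Hypothetical joint table of counts over (species, isGood).
type Joint = Map.Map (String, Bool) Int

joint :: Joint
joint = Map.fromList
  [ (("human", True), 5), (("human", False), 4)
  , (("robot", True), 1), (("robot", False), 6)
  ]

-- Conditioning: keep only the rows that match the observed species,
-- then drop that variable from the key.
conditionSpecies :: String -&gt; Joint -&gt; Map.Map Bool Int
conditionSpecies s j =
  Map.fromListWith (+) [ (g, n) | ((s', g), n) &lt;- Map.toList j, s' == s ]

-- Marginalizing out: sum over every value of species.
marginalizeOutSpecies :: Joint -&gt; Map.Map Bool Int
marginalizeOutSpecies j =
  Map.fromListWith (+) [ (g, n) | ((_, g), n) &lt;- Map.toList j ]

main :: IO ()
main = do
  print (conditionSpecies "robot" joint)  -- fromList [(False,6),(True,1)]
  print (marginalizeOutSpecies joint)     -- fromList [(False,10),(True,6)]</pre>
<p>Both results are tables over isGood alone, so they have the same shape, but the counts inside them differ.</p>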
<p>&nbsp;</p>
<p>Finally, at the start of the post, I mentioned that our multivariate distributions have group and vector space structure.  This gives us two more operations we can use: the inverse and scalar multiplication.  You can find more posts on how to take advantage of these structures <a href="http://izbicki.me/blog/the-categorical-distributions-algebraic-structure">here</a> and <a href="http://izbicki.me/blog/nuclear-weapon-statistics-using-monoids-groups-and-modules-in-haskell">here</a>.</p>
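<p>As a taste of what those two operations mean, here is a small sketch on a toy model.  Everything in it is an assumption made for illustration: Moments, train1, inverse and scale are hypothetical names rather than HLearn&#8217;s API, and it needs GHC 8.4 or later.</p>
<pre>-- Hypothetical model: the count and the sum of the observed weights.
data Moments = Moments Double Double
  deriving (Show, Eq)

instance Semigroup Moments where
  Moments n1 s1 &lt;&gt; Moments n2 s2 = Moments (n1 + n2) (s1 + s2)

instance Monoid Moments where
  mempty = Moments 0 0

-- Train a model on a single data point.
train1 :: Double -&gt; Moments
train1 = Moments 1

-- Group inverse: a model that cancels out the one it was built from.
inverse :: Moments -&gt; Moments
inverse (Moments n s) = Moments (negate n) (negate s)

-- Scalar multiplication: weight every data point in a model by k.
scale :: Double -&gt; Moments -&gt; Moments
scale k (Moments n s) = Moments (k * n) (k * s)

main :: IO ()
main = do
  let crew       = foldMap train1 [195, 170, 160, 210, 140, 225, 135]
      withBender = crew &lt;&gt; train1 612
  -- Merging in the inverse of Bender's model removes him again.
  print (withBender &lt;&gt; inverse (train1 612) == crew)  -- True
  -- Counting every crew member twice.
  print (scale 2 crew)</pre>
<p>So the inverse lets us remove a data point from a trained model, and scalar multiplication lets us weight data points.</p>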
<h3>Next time&#8230;</h3>
<p><img class="alignright size-medium wp-image-2248" alt="futurama-spacesuits" src="http://izbicki.me/blog/wp-content/uploads/2013/05/futurama-spacesuits-300x208.jpg" width="300" height="208" /></p>
<p>The best part of all of this is still coming.  Next, we&#8217;ll take a look at full-on Bayesian classification and why it forms a monoid.  Besides online and parallel trainers, this also gives us a fast cross-validation method.</p>
<p>There&#8217;ll also be posts about the monoid structure of Markov <em>chains</em>, the Free HomTrainer, and how this whole algebraic framework applies to NP-approximation algorithms as well.</p>
<p>Subscribe to the <a href="http://izbicki.me/blog/feed">RSS feed</a> to stay tuned.</p>
<div><span class="Apple-style-span" style="line-height: 17px;"> </span></div>
 <img src="http://izbicki.me/blog/?feed-stats-post-id=2229" width="1" height="1" style="display: none;" />]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/markov-networks-monoids-and-futurama/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why (and how) I&#8217;m refusing to pay war taxes</title>
		<link>http://izbicki.me/blog/why-and-how-im-refusing-to-pay-war-taxes?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=why-and-how-im-refusing-to-pay-war-taxes</link>
		<comments>http://izbicki.me/blog/why-and-how-im-refusing-to-pay-war-taxes#comments</comments>
		<pubDate>Mon, 15 Apr 2013 14:45:19 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Anarchism]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Religion]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=2102</guid>
		<description><![CDATA[Growing up, I wanted nothing more than to be a Naval officer.  But then Jesus changed my heart.  He&#8217;s been teaching me that instead of killing my enemies, I&#8217;m supposed to love them.  In fact, I&#8217;m supposed to dedicate my life to serving them.  Maybe even die for them.  So after 7 years in the [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright" alt="" src="http://www.warresisters.org/sites/default/files/2013pie.jpg" width="250" height="250" />Growing up, I wanted nothing more than to be a Naval officer.  But then Jesus changed my heart.  He&#8217;s been teaching me that instead of killing my enemies, I&#8217;m supposed to love them.  In fact, I&#8217;m supposed to dedicate my life to serving them.  Maybe even die for them.  So <a href="http://izbicki.me/blog/co-testimony-the-development-of-my-beliefs">after 7 years in the navy, I left as a conscientious objector</a>.  That&#8217;s also why I&#8217;m not paying my federal taxes this year.</p>
<p>You see, <strong>in the United States, roughly half of our tax dollars go to financing war</strong>.  (You can find a detailed breakdown <a href="https://www.warresisters.org/sites/default/files/FY2014piechart-english-color.pdf">here</a>.)  This is ridiculous and unacceptable.  I would gladly pay more taxes to finance roads, schools, or public health care.  But I will no longer pay other people to kill America&#8217;s enemies on my behalf.<span id="more-2102"></span></p>
<p>I deeply regret the need for tax resistance because it contradicts a number of Biblical commands.  For example, in Romans 13:7 Paul tells us that &#8220;if you owe taxes, pay taxes&#8221; and in Matthew 22:21 Jesus commands us to &#8220;give unto Caesar the things that are Caesar&#8217;s.&#8221;  I wish I could obey these commands at face value.  But <strong>obeying the commands to pay taxes would result in me breaking the greatest commandment of them all: to love my neighbor as myself</strong>.  Jesus calls everyone my neighbor, even my enemies.  Even people who kill Americans, like Osama bin Laden.  I&#8217;m deeply ashamed that my tax dollars helped finance his assassination.  Not to mention the near-daily drone strikes that continue to happen, the torture at Gitmo, and the DOD&#8217;s research into newer and deadlier weapons systems.  I paid for it all.</p>
<p>I could say a lot more about why I feel morally compelled to not pay war taxes, but I won&#8217;t.  I&#8217;ll skip right to the part where <strong>I&#8217;m making a public statement that I will not finance war, and I will accept whatever consequences that entails</strong>.  I also acknowledge that by taking this stand, I am sinning.  But this is the least sinful option my limited wisdom can find.  So I will continue on, &#8220;sinning boldly&#8221; as ever.</p>
<p>Below I describe the exact mechanics of how I&#8217;m refusing to pay war taxes.  I&#8217;m following advice provided mainly by the War Resisters League&#8217;s <a href="https://www.warresisters.org/wartaxresistanceguide">War Tax Resistance</a> book.</p>
<h3>How I&#8217;m Resisting</h3>
<p><a href="https://www.warresisters.org/wartaxresistanceguide"><img class="alignright  wp-image-2118" alt="wartaxresistance" src="http://izbicki.me/blog/wp-content/uploads/2013/04/wartaxresistance.png" width="150" height="200" /></a></p>
<p>Today I filed my taxes just like everyone else.  I filled out my form 1040, and found out that I owed 48 dollars.  It&#8217;s not very much, but it&#8217;s something.  I did my best to be as honest and complete as possible in the paperwork.  But instead of including a check, I wrote them the following letter:</p>
<blockquote><p>To whom it may concern:</p>
<p>After careful consideration, I have decided not to pay my 2012 taxes to the Federal government. I cannot in good conscience provide any financial support for our ongoing wars and excessive military spending.</p>
<p align="JUSTIFY">I do, however, want to be a good citizen and contribute my fair share to society. Therefore, I am paying the taxes I owe to the federal government ($48) to my local state government (CA) instead. I have scanned a copy of my contribution check below.</p>
<p align="JUSTIFY">Sincerely,</p>
<p align="JUSTIFY">Michael Izbicki</p>
</blockquote>
<p>I was happy to do my California taxes in addition to giving them this extra money.  It&#8217;s only <em>war</em> that I&#8217;m against, not taxes in general.  Here&#8217;s a copy of the actual check I wrote:</p>
<p><img class="aligncenter size-full wp-image-2104" alt="check-cropped-censored-700" src="http://izbicki.me/blog/wp-content/uploads/2013/04/check-cropped-censored-700.jpeg" width="700" height="326" /></p>
<p>Also, for anyone interested, I&#8217;ve posted my form 1040:</p>
<table align="center">
<tbody>
<tr>
<td><a href="http://izbicki.me/blog/wp-content/uploads/2013/04/f1040pg1.png"><img alt="f1040pg1" src="http://izbicki.me/blog/wp-content/uploads/2013/04/f1040pg1-220x300.png" width="220" height="300" /></a></td>
<td><a href="http://izbicki.me/blog/wp-content/uploads/2013/04/f1040pg2.png"><img class="size-medium wp-image-2141" alt="f1040pg2" src="http://izbicki.me/blog/wp-content/uploads/2013/04/f1040pg2-220x300.png" width="220" height="300" /></a></td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>Finally, just before mailing my envelope, I said the St Francis prayer:</p>
<blockquote><p>Lord, make me an instrument of your peace.  Where there is hatred, let me sow love; where there is injury, pardon; where there is doubt, faith; where there is despair, hope; where there is darkness, light; and where there is sadness, joy.</p>
<p>O Divine Master, grant that I may not so much seek to be consoled, as to console; to be understood, as to understand; to be loved, as to love.  For it is in giving that we receive; it is in pardoning that we are pardoned, and it is in dying that we are born to Eternal Life.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/why-and-how-im-refusing-to-pay-war-taxes/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The categorical distribution&#8217;s algebraic structure</title>
		<link>http://izbicki.me/blog/the-categorical-distributions-algebraic-structure?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-categorical-distributions-algebraic-structure</link>
		<comments>http://izbicki.me/blog/the-categorical-distributions-algebraic-structure#comments</comments>
		<pubDate>Tue, 08 Jan 2013 14:43:15 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=1932</guid>
		<description><![CDATA[ The categorical distribution is the main distribution for handling discrete data. I like to think of it as a histogram.  For example, let&#8217;s say Simon has a bag full of marbles.  There are four &#8220;categories&#8221; of marbles&#8212;red, green, blue, and white.  Now, if Simon reaches into the bag and randomly selects a marble, what&#8217;s the probability [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright" alt="histogram of simonDist" src="http://izbicki.me/blog/wp-content/uploads/2013/01/histogram-of-simonDist.png" width="247" height="173" /> The <a href="https://en.wikipedia.org/wiki/Categorical_distribution">categorical distribution</a> is the main distribution for handling discrete data. I like to think of it as a <strong>histogram</strong>.  For example, let&#8217;s say Simon has a bag full of marbles.  There are four &#8220;categories&#8221; of marbles&#8212;red, green, blue, and white.  Now, if Simon reaches into the bag and randomly selects a marble, what&#8217;s the probability it will be green?  We would use the categorical distribution to find out.</p>
<p>In this article, we&#8217;ll go over the math behind the categorical distribution, the algebraic structure of the distribution, and how to manipulate it within Haskell&#8217;s <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn</a> library.  We&#8217;ll also see some examples of how this focus on algebra makes HLearn&#8217;s interface more powerful than other common statistical packages.  Everything that we&#8217;re going to see is in a certain sense very &#8220;obvious&#8221; to a statistician, but this algebraic framework also makes it <strong>convenient</strong>.  And since programmers are inherently lazy, this is a Very Good Thing.</p>
<p>Before delving into the &#8220;cool stuff,&#8221; we have to look at some of the mechanics of the HLearn library.</p>
<p><span id="more-1932"></span></p>
<h3>Preliminaries</h3>
<p>The <a href="http://hackage.haskell.org/package/HLearn-distributions">HLearn-distributions</a> package contains all the functions we need to manipulate categorical distributions. Let&#8217;s install it:</p>
<pre>$ cabal install HLearn-distributions</pre>
<p>We import our libraries:</p>
<pre>&gt;import Control.DeepSeq
&gt;import HLearn.Algebra
&gt;import HLearn.Gnuplot.Distributions
&gt;import HLearn.Models.Distributions</pre>
<p>We create a data type for Simon&#8217;s marbles:</p>
<pre>&gt;data Marble = Red | Green | Blue | White
&gt;    deriving (Read,Show,Eq,Ord)</pre>
<p><img class="aligncenter size-full wp-image-2058" alt="marbles" src="http://izbicki.me/blog/wp-content/uploads/2013/01/marbles.png" width="400" height="100" /></p>
<p>The easiest way to represent Simon&#8217;s bag of marbles is with a list:</p>
<pre>&gt;simonBag :: [Marble]
&gt;simonBag = [Red, Red, Red, Green, Blue, Green, Red, Blue, Green, Green, Red, Red, Blue, Red, Red, Red, White]</pre>
<p>And now we&#8217;re ready to train a categorical distribution of the marbles in Simon&#8217;s bag:</p>
<pre>&gt;simonDist = train simonBag :: Categorical Marble Double</pre>
<p>We can load up ghci and plot the distribution with the conveniently named function <a href="http://hackage.haskell.org/packages/archive/HLearn-distributions/0.1.0.1/doc/html/HLearn-Gnuplot-Distributions.html">plotDistribution</a>:</p>
<pre>ghci&gt; plotDistribution (plotFile "simonDist") simonDist</pre>
<p>This gives us a histogram of probabilities:</p>
<p><img class="aligncenter size-full wp-image-1965" alt="marbles trained into categorical" src="http://izbicki.me/blog/wp-content/uploads/2013/01/marbles-trained-into-categorical.png" width="700" height="210" /></p>
<p>In the HLearn library, every statistical model is generated from data using either <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.0.1/doc/html/HLearn-Algebra-Models.html">train</a> or <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.0.1/doc/html/HLearn-Algebra-Models.html">train&#8217;</a>.  Because these functions are overloaded, we must specify the type of simonDist so that the compiler knows which model to generate. <a href="http://hackage.haskell.org/packages/archive/HLearn-distributions/0.1.0.1/doc/html/HLearn-Models-Distributions-Categorical.html">Categorical</a> takes two parameters. The first is the type of the discrete data (Marble). The second is the type of the probability (Double). We could easily create Categorical distributions with different types depending on the requirements for our application. For example:</p>
<pre>&gt;stringDist = train (map show simonBag) :: Categorical String Float</pre>
<p>This is the first &#8220;cool thing&#8221; about Categorical:  <strong>We can make distributions over any user-defined type</strong>.  This makes programming with probabilities easier, more intuitive, and more convenient.  Most other statistical libraries would require you to assign numbers corresponding to each color of marble, and then create a distribution over those numbers.</p>
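<p>Now that we have a distribution, we can find some probabilities. If Simon pulls a marble from the bag, what&#8217;s the probability that it would be red?</p>
<p style="text-align: center;"><span id='tex_8202'>\(P(\text{Red}) = \frac{\text{number of red marbles}}{\text{total number of marbles}} = \frac{9}{17} \approx 0.529\)</span></p>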
<p>We can use the pdf function to do this calculation for us:</p>
<pre>ghci&gt; pdf simonDist Red
0.5294
ghci&gt; pdf simonDist Blue
0.1765
ghci&gt; pdf simonDist Green
0.2353
ghci&gt; pdf simonDist White
5.882e-2</pre>
<p>If we sum all the probabilities, as expected we would get 1:</p>
<pre>ghci&gt; sum $ map (pdf simonDist) [Red,Green,Blue,White]
1.0</pre>
<p>Due to rounding errors, you may not always get 1. If you absolutely, positively, have to avoid rounding errors, you should use Rational probabilities:</p>
<pre>&gt;simonDistRational = train simonBag :: Categorical Marble Rational</pre>
<p>Rationals are slower, but won&#8217;t be subject to floating point errors.</p>
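<p>To make the bookkeeping concrete, here is a minimal standalone sketch of what training and pdf amount to for this example.  It does not use HLearn: the Counts type and the functions trainCounts and pdfExact are hypothetical stand-ins, and a plain table of counts is only assumed to resemble what Categorical stores.  Using Rational keeps every probability exact:</p>
<pre>import qualified Data.Map.Strict as Map
import Data.Ratio ((%))

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical: a table of counts per category.
newtype Counts = Counts (Map.Map Marble Integer)

-- Training is counting how often each marble occurs.
trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- The probability of a label is its count divided by the total count.
-- (This is undefined for a model trained on no data.)
pdfExact :: Counts -&gt; Marble -&gt; Rational
pdfExact (Counts m) x = Map.findWithDefault 0 x m % sum (Map.elems m)

simonBag :: [Marble]
simonBag = [Red, Red, Red, Green, Blue, Green, Red, Blue, Green, Green, Red, Red, Blue, Red, Red, Red, White]</pre>
<p>With the bag as listed, pdfExact (trainCounts simonBag) Red evaluates to 9 % 17, and summing pdfExact over all four colors gives exactly 1 % 1, with no rounding involved.</p>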
<p>This is just about all the functionality you would get in a &#8220;normal&#8221; stats package like R or NumPy. But using Haskell&#8217;s nice support for algebra, we can get some extra cool features.</p>
<h3>Semigroup</h3>
<p>First, let&#8217;s talk about semigroups. A <a href="https://en.wikipedia.org/wiki/Semigroup">semigroup</a> is any data structure that has an associative binary operation (<strong>&lt;&gt;</strong>) that joins two of those data structures together. Associativity means that (a &lt;&gt; b) &lt;&gt; c == a &lt;&gt; (b &lt;&gt; c), so the grouping does not matter. The categorical distribution is a semigroup.</p>
<p>Don wants to play marbles with Simon, and he has his own bag. Don&#8217;s bag contains only red and blue marbles:</p>
<pre>&gt;donBag = [Red,Blue,Red,Blue,Red,Blue,Blue,Red,Blue,Blue]</pre>
<p>We can train a categorical distribution on Don&#8217;s bag in the same way we did earlier:</p>
<pre>&gt;donDist = train donBag :: Categorical Marble Double</pre>
<p>In order to play marbles together, Don and Simon will have to add their bags together.</p>
<pre>&gt;bothBag = simonBag ++ donBag</pre>
<p>Now, we have two options for training our distribution. The first is the naive way, training the distribution directly on the combined bag:</p>
<pre>&gt;bothDist = train bothBag :: Categorical Marble Double</pre>
<p>This is the way we would have to approach this problem in most statistical libraries. But with HLearn, we have a more efficient alternative. We can combine the trained distributions using the semigroup operation:</p>
<pre>&gt;bothDist' = simonDist &lt;&gt; donDist</pre>
<p>Under the hood, the categorical distribution stores the number of times each possibility occurred in the training data.  The &lt;&gt; operator just adds the corresponding counts from each distribution together:</p>
<p><img class="aligncenter size-full wp-image-1994" alt="semigroup and bothDist" src="http://izbicki.me/blog/wp-content/uploads/2013/01/semigroup-and-bothDist.png" width="700" height="260" /></p>
<p>This method is more efficient because it avoids repeating work we&#8217;ve already done. Categorical&#8217;s semigroup operation runs in time <strong>O(1)</strong> with respect to the number of marbles (its cost depends only on the number of categories), so no matter how big the bags are, we can calculate the distribution very quickly. The naive method, in contrast, requires time <strong>O(n)</strong>, where n is the total number of marbles. If our bags had millions or billions of marbles inside them, this would be a considerable savings!</p>
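<p>The count-adding behaviour described above can be sketched in a few lines of plain Haskell.  This is an illustration under assumptions, not HLearn&#8217;s actual implementation: Counts, trainCounts, and mergeAgrees are hypothetical names, and the model is reduced to a bare table of counts:</p>
<pre>import qualified Data.Map.Strict as Map

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical: a table of counts per category.
newtype Counts = Counts (Map.Map Marble Int)
    deriving (Show, Eq)

-- Combining two models adds the counts category by category.
-- The cost depends on the number of categories, not on the number of marbles.
instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- Training on a concatenated bag agrees with merging the two trained models.
mergeAgrees :: [Marble] -&gt; [Marble] -&gt; Bool
mergeAgrees xs ys = trainCounts (xs ++ ys) == trainCounts xs &lt;&gt; trainCounts ys</pre>
<p>For any two bags, mergeAgrees returns True, which is the property that makes the shortcut safe to use.</p>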
<p>We get another cool performance trick &#8220;for free&#8221; based on the fact that Categorical is a semigroup: The function train can be <strong>automatically parallelized</strong> using the higher order function <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.1.0/doc/html/HLearn-Algebra-Functions.html">parallel</a>. I won&#8217;t go into the details about how this works, but here&#8217;s how you do it in practice.</p>
<p>First, we must show the compiler how to resolve the Marble data type down to &#8220;<a href="http://stackoverflow.com/questions/6872898/haskell-what-is-weak-head-normal-form">normal form</a>.&#8221; This basically means we must show the compiler how to fully compute the data type. (We only have to do this because Marble is a type we created.  If we were using a built-in type, like a String, we could skip this step.) This is fairly easy for a type as simple as Marble:</p>
<pre>&gt;instance NFData Marble where
&gt;    rnf Red   = ()
&gt;    rnf Blue  = ()
&gt;    rnf Green = ()
&gt;    rnf White = ()</pre>
<p>Then, we can perform the parallel computation by:</p>
<pre>&gt;simonDist_par = parallel train simonBag :: Categorical Marble Double</pre>
<p>Other languages require a programmer to manually create parallel versions of their functions. But in Haskell with the HLearn library, we get these parallel versions for free! All we have to do is ask for it!</p>
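<p>The reason a semigroup makes this possible can be shown without any parallel machinery.  The sketch below is not how HLearn&#8217;s parallel function is written; it only assumes a count-table model (the hypothetical Counts, trainCounts, chunks, and trainChunked) and shows that training on independent pieces and merging the results gives the same model as one sequential pass:</p>
<pre>import qualified Data.Map.Strict as Map

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical: a table of counts per category.
newtype Counts = Counts (Map.Map Marble Int)
    deriving (Show, Eq)

instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- Split a list into pieces of at most n elements (n is clamped to at least 1).
chunks :: Int -&gt; [a] -&gt; [[a]]
chunks n xs
    | null xs   = []
    | otherwise = let (piece, rest) = splitAt (max 1 n) xs in piece : chunks n rest

-- Train on each piece independently, then merge the partial models.
-- Each call to trainCounts here could be handed to a separate core.
trainChunked :: Int -&gt; [Marble] -&gt; Counts
trainChunked n xs = foldr (&lt;&gt;) (trainCounts []) (map trainCounts (chunks n xs))</pre>
<p>Because &lt;&gt; is associative, the order in which the partial models are merged does not change the result, so trainChunked n bag equals trainCounts bag for every chunk size.</p>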
<h3>Monoid</h3>
<p>A monoid is a semigroup with an empty element, which is called <strong>mempty</strong> in Haskell. It obeys the law that:</p>
<pre>M &lt;&gt; mempty == mempty &lt;&gt; M == M</pre>
<p>And it is easy to show that Categorical is also a monoid. We get this empty element by training on an empty data set:</p>
<pre>mempty = train ([] :: [Marble]) :: Categorical Marble Double</pre>
<p>The <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.1.0/doc/html/HLearn-Algebra-Models.html">HomTrainer</a> type class requires that all its instances also be instances of Monoid. This lets the compiler automatically derive &#8220;<a href="http://en.wikipedia.org/wiki/Online_machine_learning">online trainers</a>&#8221; for us. An online trainer can add new data points to our statistical model without retraining it from scratch.</p>
<p>For example, we could use the function <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.1.0/doc/html/HLearn-Algebra-Models.html">add1dp</a> (stands for: add one data point) to add another white marble into Simon&#8217;s bag:</p>
<pre>&gt;simonDistWhite = add1dp simonDist White</pre>
<p>This also gives us another approach for our earlier problem of combining Simon and Don&#8217;s bags. We could use the function <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.1.0/doc/html/HLearn-Algebra-Models.html">addBatch</a>:</p>
<pre>&gt;bothDist'' = addBatch simonDist donBag</pre>
<p>Because Categorical is a monoid, we maintain the property that:</p>
<pre>bothDist == bothDist' == bothDist''</pre>
<p>Again, statisticians have always known that you could add new points into a categorical distribution without training from scratch.  The cool thing here is that <strong>the compiler is deriving all of these functions for us</strong>, and it&#8217;s giving us a <strong>consistent interface</strong> for use with different data structures.  All we had to do to get these benefits was tell the compiler that Categorical is a monoid.  This makes designing and programming libraries much <strong>easier, quicker, and less error prone</strong>.</p>
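<p>One plausible way such online trainers can be defined generically is sketched below.  The names Counts, trainCounts, addOne, and addMany are hypothetical and stand in for Categorical, train, add1dp, and addBatch; the real HLearn definitions may differ.  The only ingredients are a batch trainer and the monoid operations:</p>
<pre>import qualified Data.Map.Strict as Map

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical: a table of counts per category.
newtype Counts = Counts (Map.Map Marble Int)
    deriving (Show, Eq)

instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

-- The empty element is the model trained on no data.
instance Monoid Counts where
    mempty = Counts Map.empty

trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- Online update: train a model on the single new point and merge it in.
addOne :: Counts -&gt; Marble -&gt; Counts
addOne model x = model &lt;&gt; trainCounts [x]

-- Batch update: train on the new data only, then merge.
addMany :: Counts -&gt; [Marble] -&gt; Counts
addMany model xs = model &lt;&gt; trainCounts xs</pre>
<p>Nothing in addOne or addMany is specific to marbles or to counting, which is why the same two definitions can serve any model that has a batch trainer and a monoid instance.</p>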
<h3>Group</h3>
<p>A <a href="http://en.wikipedia.org/wiki/Group_(mathematics)">group</a> is a monoid with the additional property that all elements have an <strong>inverse</strong>. This lets us perform subtraction on groups.  And Categorical is a group.</p>
<p>Ed wants to play marbles too, but he doesn&#8217;t have any of his own. So Simon offers to give Ed some from his own bag. He gives Ed one of each color:</p>
<pre>&gt;edBag = [Red,Green,Blue,White]</pre>
<p>Now, if Simon draws a marble from his bag, what&#8217;s the probability it will be blue?</p>
<p>To answer this question without algebra, we&#8217;d have to go back to the original data set, remove the marbles Simon gave Ed, then retrain the distribution. This is awkward and computationally expensive. But if we take advantage of Categorical&#8217;s group structure, we can just subtract directly from the distribution itself. This makes more sense intuitively and is easier computationally.</p>
<pre>&gt;simonDist2 = subBatch simonDist edBag</pre>
<p>This is a shorthand notation for using the group operations directly:</p>
<pre>&gt;edDist = train edBag :: Categorical Marble Double
&gt;simonDist2' = simonDist &lt;&gt; (inverse edDist)</pre>
<p>The way the inverse operation works is it multiplies the counts for each category by -1. In picture form, this flips the distribution upside down:</p>
<p><img class="aligncenter size-full wp-image-2013" alt="edDist inversification" src="http://izbicki.me/blog/wp-content/uploads/2013/01/edDist-inversification.png" width="620" height="200" /></p>
<p>Then, adding an upside down distribution to a normal one is just subtracting the histogram columns and renormalizing:</p>
<p><img class="aligncenter size-full wp-image-2014" alt="Simon substraction edDist" src="http://izbicki.me/blog/wp-content/uploads/2013/01/Simon-substraction-edDist.png" width="700" height="200" /></p>
<p>Notice that the green bar in edDist looks really big&#8212;much bigger than the green bar in simonDist.  But when we subtract it away from simonDist, we still have some green marbles left over in simonDist2.  This is because the histogram is only showing the <em>probability</em> of a green marble, and not the <em>actual number</em> of marbles.</p>
<p>Finally, there&#8217;s one more crazy trick we can perform with the Categorical group.  It&#8217;s perfectly okay to have both positive and negative marbles in the same distribution.  For example:</p>
<pre>ghci&gt; plotDistribution (plotFile "mixedDist") (edDist &lt;&gt; (inverse donDist))</pre>
<p>results in:</p>
<p><img class="aligncenter size-full wp-image-2073" alt="mixedDist-300" src="http://izbicki.me/blog/wp-content/uploads/2013/01/mixedDist-300.png" width="300" height="209" /></p>
<p>Most statisticians would probably say that these upside down Categoricals are not &#8220;real distributions.&#8221; But at the very least, they are a convenient mathematical trick that makes <strong>working with distributions much more pleasant</strong>.</p>
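<p>The subtraction can be written out by hand for a bare table of counts.  As before, this is a sketch under assumptions rather than HLearn&#8217;s code: Counts, inverseCounts, subtractBag, and countOf are hypothetical names, and negative counts are simply allowed:</p>
<pre>import qualified Data.Map.Strict as Map

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical: a table of (possibly negative) counts.
newtype Counts = Counts (Map.Map Marble Int)
    deriving (Show, Eq)

instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- The inverse multiplies every count by -1.
inverseCounts :: Counts -&gt; Counts
inverseCounts (Counts m) = Counts (Map.map negate m)

-- Removing a bag is merging with the inverse of the model trained on that bag.
subtractBag :: Counts -&gt; [Marble] -&gt; Counts
subtractBag model xs = model &lt;&gt; inverseCounts (trainCounts xs)

-- Look up the count for one color (0 if the color never occurred).
countOf :: Counts -&gt; Marble -&gt; Int
countOf (Counts m) x = Map.findWithDefault 0 x m</pre>
<p>Applied to Simon&#8217;s bag and Ed&#8217;s bag from above, the remaining counts are 8 red, 3 green, 2 blue, and 0 white, so the probability of drawing a blue marble is 2/13.</p>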
<h3>Module</h3>
<p>Finally, an <a href="http://en.wikipedia.org/wiki/R-module">R-Module</a> is a group with two additional properties. First, it is <a href="http://en.wikipedia.org/wiki/Abelian_groups">abelian</a>. That means &lt;&gt; is commutative. So, for all a, b:</p>
<pre>a &lt;&gt; b == b &lt;&gt; a</pre>
<p>Second, the data type supports <strong>multiplication by any element in the </strong><a href="http://en.wikipedia.org/wiki/Ring_(mathematics)"><strong>ring</strong></a><strong> R</strong>. In Haskell, you can think of a ring as any member of the <a href="http://www.haskell.org/tutorial/numbers.html">Num</a> type class.</p>
<p>How is this useful?  It lets us &#8220;retrain&#8221; our distribution on the data points it has already seen.  Back to the example&#8230;</p>
<p>Well, Ed&#8212;being the clever guy that he is&#8212;recently developed a marble copying machine. That&#8217;s right! You just stick some marbles in on one end, and on the other end out pop 10 exact duplicates. Ed&#8217;s not just clever, but pretty nice too. He duplicates his new marbles and gives all of them back to Simon. What&#8217;s Simon&#8217;s new distribution look like?</p>
<p>Again, the naive way to answer this question would be to retrain from scratch:</p>
<pre>&gt;-- simonBag still contains the marbles given to Ed, so only 9 more copies are added
&gt;duplicateBag = simonBag ++ (concat $ replicate 9 edBag)
&gt;duplicateDist = train duplicateBag :: Categorical Marble Double</pre>
<p>Slightly better is to take advantage of the Semigroup property, and just apply that over and over again:</p>
<pre>&gt;duplicateDist' = simonDist2 &lt;&gt; (foldl1 (&lt;&gt;) $ replicate 10 edDist)</pre>
<p>But even better is to take advantage of the fact that Categorical is a module and the (<strong>.*</strong>) operator:</p>
<pre>&gt;duplicateDist'' = simonDist2 &lt;&gt; 10 .* edDist</pre>
<p>In picture form:</p>
<p><img class="aligncenter size-full wp-image-2066" alt="module example" src="http://izbicki.me/blog/wp-content/uploads/2013/01/module-example.png" width="700" height="200" /></p>
<p>Also notice that without the scalar multiplication, we would get back our original distribution:</p>
<p><img class="aligncenter size-full wp-image-2067" alt="module example-mod" src="http://izbicki.me/blog/wp-content/uploads/2013/01/module-example-mod.png" width="700" height="200" /></p>
<p>Another way to think about the module&#8217;s scalar multiplication is that it allows us to <strong>weight our distributions</strong>.</p>
<p>Ed just realized that he still needs a marble, and has decided to take one.  Someone has left their marble bag sitting nearby, but he&#8217;s not sure whose it is.  He thinks that Simon is more forgetful than Don is, so he assigns a 60% probability that the bag is Simon&#8217;s and a 40% probability that it is Don&#8217;s.  When he takes a marble, what&#8217;s the probability that it is red?</p>
<p>We create a weighted distribution using module multiplication:</p>
<pre>&gt;weightedDist = 0.6 .* simonDist &lt;&gt; 0.4 .* donDist</pre>
<p>Then in ghci:</p>
<pre>ghci&gt; pdf weightedDist Red
0.4929577464788732</pre>
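<p>The number above can be reproduced by hand.  The following standalone sketch assumes that scalar multiplication rescales every stored count; Counts, scale, pdfCounts, and weighted are hypothetical names and not part of HLearn:</p>
<pre>import qualified Data.Map.Strict as Map

data Marble = Red | Green | Blue | White
    deriving (Show, Eq, Ord)

-- Hypothetical stand-in for Categorical; counts are Doubles so that
-- fractional weights are allowed.
newtype Counts = Counts (Map.Map Marble Double)

instance Semigroup Counts where
    Counts a &lt;&gt; Counts b = Counts (Map.unionWith (+) a b)

trainCounts :: [Marble] -&gt; Counts
trainCounts xs = Counts (Map.fromListWith (+) [(x, 1) | x &lt;- xs])

-- Scalar multiplication: rescale every count by the same factor.
scale :: Double -&gt; Counts -&gt; Counts
scale r (Counts m) = Counts (Map.map (r *) m)

-- Probability of a label: its count divided by the total count.
pdfCounts :: Counts -&gt; Marble -&gt; Double
pdfCounts (Counts m) x = Map.findWithDefault 0 x m / sum (Map.elems m)

simonBag, donBag :: [Marble]
simonBag = [Red, Red, Red, Green, Blue, Green, Red, Blue, Green, Green, Red, Red, Blue, Red, Red, Red, White]
donBag   = [Red, Blue, Red, Blue, Red, Blue, Blue, Red, Blue, Blue]

-- 60% weight on Simon's counts, 40% weight on Don's counts.
weighted :: Counts
weighted = scale 0.6 (trainCounts simonBag) &lt;&gt; scale 0.4 (trainCounts donBag)</pre>
<p>Here the weighted red count is 0.6 &times; 9 + 0.4 &times; 4 = 7.0 and the weighted total is 0.6 &times; 17 + 0.4 &times; 10 = 14.2, so pdfCounts weighted Red is about 0.493, in agreement with the ghci result.  Note that this weights the raw counts, so the larger bag contributes more; averaging the two normalized probabilities instead would give 0.6 &times; 9/17 + 0.4 &times; 4/10, which is about 0.478.</p>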
<p>We can also train directly on weighted data:</p>
<pre>&gt;weightedDataDist = train [(0.4,Red),(0.5,Green),(0.2,Green),(3.7,White)] :: Categorical Marble Double</pre>
<p>which gives us:</p>
<p><img class="aligncenter size-full wp-image-2068" alt="weightedDataDist-300" src="http://izbicki.me/blog/wp-content/uploads/2013/01/weightedDataDist-300.png" width="300" height="209" /></p>
<h3>The Takeaway and next posts</h3>
<p>Talking about the categorical distribution in algebraic terms lets us do some cool new stuff with our distributions that we can&#8217;t easily do in other libraries.  None of this is statistically groundbreaking. The cool thing is that <strong>algebra just makes everything so convenient to work with</strong>.</p>
<p>I think I&#8217;ll do another post on some cool tricks with the kernel density estimator that are not possible at all in other libraries, then do a post about the category (formal category-theoretic sense) of statistical training methods.  At that point, we&#8217;ll be ready to jump into machine learning tasks.  Depending on my mood we might take a pit stop to discuss the computational aspects of free groups and modules and how these relate to machine learning applications.</p>
<p><a href="http://izbicki.me/blog/feed">Sign up for the RSS feed</a> to stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/the-categorical-distributions-algebraic-structure/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nuclear weapon statistics using monoids, groups, and modules in Haskell</title>
		<link>http://izbicki.me/blog/nuclear-weapon-statistics-using-monoids-groups-and-modules-in-haskell?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nuclear-weapon-statistics-using-monoids-groups-and-modules-in-haskell</link>
		<comments>http://izbicki.me/blog/nuclear-weapon-statistics-using-monoids-groups-and-modules-in-haskell#comments</comments>
		<pubDate>Fri, 04 Jan 2013 14:47:40 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=1766</guid>
		<description><![CDATA[The Bulletin of the Atomic Scientists tracks the nuclear capabilities of every country. We&#8217;re going to use their data to demonstrate Haskell&#8217;s HLearn library and the usefulness of abstract algebra to statistics. Specifically, we&#8217;ll see that the categorical distribution and kernel density estimates have monoid, group, and module algebraic structures.  We&#8217;ll explain what this crazy [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright" alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/79/Operation_Upshot-Knothole_-_Badger_001.jpg/282px-Operation_Upshot-Knothole_-_Badger_001.jpg" width="197" height="168" />The <a href="http://www.thebulletin.org/">Bulletin of the Atomic Scientists</a> tracks the nuclear capabilities of every country. We&#8217;re going to use their data to demonstrate Haskell&#8217;s <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn</a> library and the usefulness of abstract algebra to statistics. Specifically, we&#8217;ll see that the <a href="https://en.wikipedia.org/wiki/Categorical_distribution">categorical distribution</a> and <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimates</a> have <a href="https://en.wikipedia.org/wiki/Monoid">monoid</a>, <a href="https://en.wikipedia.org/wiki/Group_(mathematics)">group</a>, and <a href="https://en.wikipedia.org/wiki/Module_(mathematics)">module</a> algebraic structures.  We&#8217;ll explain what this crazy lingo even means, then take advantage of these structures to <strong>efficiently</strong> <strong>answer real-world statistical questions about nuclear war</strong>. It&#8217;ll be a <a href="https://en.wikipedia.org/wiki/WOPR">WOPR</a>!</p>
<p><span id="more-1766"></span></p>
<p>Before we get into the math, we&#8217;ll need to review the basics of nuclear politics.</p>
<p>The nuclear <a href="https://en.wikipedia.org/wiki/Nuclear_Non-Proliferation_Treaty">Non-Proliferation Treaty</a> (<strong>NPT</strong>) is the main treaty governing nuclear weapons. Basically, it says that there are five countries that are &#8220;allowed&#8221; to have nukes: the <strong>USA</strong>, <strong>UK</strong>, <strong>France</strong>, <strong>Russia</strong>, and <strong>China</strong>. &#8220;Allowed&#8221; is in quotes because the treaty specifies that these countries must eventually get rid of their nuclear weapons at some future, unspecified date. When another country, for example Iran, signs the NPT, they are agreeing to not develop nuclear weapons. What they get in exchange is help from the 5 nuclear weapons states in developing their own civilian nuclear power programs. (Iran has the legitimate complaint that Western countries are actively trying to stop its civilian nuclear program when they&#8217;re supposed to be helping it, but that&#8217;s a <a href="http://www.csmonitor.com/Commentary/Opinion/2010/0917/Reality-check-Iran-is-not-a-nuclear-threat">whole &#8216;nother can of worms</a>.)</p>
<p>The <a href="http://bos.sagepub.com/">Nuclear Notebook</a> tracks the nuclear capabilities of all these countries.  The most-current estimates are from mid-2012.  Here&#8217;s a summary (click the warhead type for more info):</p>
<table border="1" align="center">
<tbody>
<tr>
<td align="LEFT"><strong>Country</strong></td>
<td align="LEFT"><strong>Delivery Method</strong></td>
<td align="LEFT"><strong>Warhead</strong></td>
<td align="LEFT"><strong>Yield (kt)</strong></td>
<td align="LEFT"><strong># Deployed</strong></td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W78.html">W78</a></td>
<td align="RIGHT">335</td>
<td align="RIGHT">250</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W87.html">W87</a></td>
<td align="RIGHT">300</td>
<td align="RIGHT">250</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W76.html">W76</a></td>
<td align="RIGHT">100</td>
<td align="RIGHT">468</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W76.html">W76-1</a></td>
<td align="RIGHT">100</td>
<td align="RIGHT">300</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W88.html">W88</a></td>
<td align="RIGHT">455</td>
<td align="RIGHT">384</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W80.html">W80</a></td>
<td align="RIGHT">150</td>
<td align="RIGHT">200</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/B61_nuclear_bomb">B61</a></td>
<td align="RIGHT">340</td>
<td align="RIGHT">50</td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/B83_nuclear_bomb">B83</a></td>
<td align="RIGHT">1200</td>
<td align="RIGHT">50</td>
</tr>
<tr>
<td align="LEFT" height="17">UK</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="http://nuclearweaponarchive.org/Usa/Weapons/W76.html">W76</a></td>
<td align="RIGHT">100</td>
<td align="RIGHT">225</td>
</tr>
<tr>
<td align="LEFT" height="17">France</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/TN_75">TN75</a></td>
<td align="RIGHT">100</td>
<td align="RIGHT">150</td>
</tr>
<tr>
<td align="LEFT" height="17">France</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/TN_81">TN81</a></td>
<td align="RIGHT">300</td>
<td align="RIGHT">150</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-20V</td>
<td align="RIGHT">800</td>
<td align="RIGHT">500</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-18</td>
<td align="RIGHT">400</td>
<td align="RIGHT">288</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-12M</td>
<td align="RIGHT">800</td>
<td align="RIGHT">135</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-12M2</td>
<td align="RIGHT">800</td>
<td align="RIGHT">56</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-12M1</td>
<td align="RIGHT">800</td>
<td align="RIGHT">18</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">ICBM</td>
<td align="LEFT">RS-24</td>
<td align="RIGHT">100</td>
<td align="RIGHT">90</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/R-29_Vysota">RSM-50</a></td>
<td align="RIGHT">50</td>
<td align="RIGHT">144</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">SLBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/R-29RM_Shtil">RSM-54</a></td>
<td align="RIGHT">100</td>
<td align="RIGHT">384</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/Kh-55_(missile_family)">AS-15</a></td>
<td align="RIGHT">200</td>
<td align="RIGHT">820</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://www.fas.org/nuke/guide/china/theater/df-3a.htm">DF-3A</a></td>
<td align="RIGHT">3300</td>
<td align="RIGHT">16</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://www.fas.org/nuke/guide/china/theater/df-4.htm">DF-4</a></td>
<td align="RIGHT">3300</td>
<td align="RIGHT">12</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/DF-5">DF-5A</a></td>
<td align="RIGHT">5000</td>
<td align="RIGHT">20</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/DF-21">DF-21</a></td>
<td align="RIGHT">300</td>
<td align="RIGHT">60</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/DF-31">DF-31</a></td>
<td align="RIGHT">300</td>
<td align="RIGHT">20</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">ICBM</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/DF-31">DF-31A</a></td>
<td align="RIGHT">300</td>
<td align="RIGHT">20</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="LEFT">Bomber</td>
<td align="LEFT"><a href="https://en.wikipedia.org/wiki/Xian_H-6">H-6</a></td>
<td align="RIGHT">3100</td>
<td align="RIGHT">20</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>I&#8217;ve consolidated all this data into the file <a href="http://izbicki.me/public/datasets/nukes-list.csv">nukes-list.csv</a>, which we will analyze in this post.  If you want to try out this code for yourself (or the homework question at the end), you&#8217;ll need to download it.  Every line in the file corresponds to a single nuclear warhead, not delivery method.  Warheads are the parts that go boom!  Bombers, <a href="https://en.wikipedia.org/wiki/Icbm">ICBMs</a>, and <a href="https://en.wikipedia.org/wiki/SSBN">SSBN</a>/<a href="https://en.wikipedia.org/wiki/SLBMs">SLBMs</a> are the delivery method.</p>
<p><img class="aligncenter" alt="nuclear-triad" src="http://izbicki.me/blog/wp-content/uploads/2013/01/nuclear-triad.jpg" width="700" height="149" /></p>
<p>There are three things to note about this data.  First, these are <strong>only estimates</strong> based on public sources.  In particular, they probably overestimate the Russian nuclear forces. <a href="http://russianforces.org/missiles/">Other estimates are considerably lower</a>.  Second, we will only be considering <strong>deployed, strategic warheads</strong>.  Basically, this means the &#8220;really big nukes that are currently aimed at another country.&#8221;  There are thousands more tactical warheads, and warheads in reserve stockpiles waiting to be disassembled.  For simplicity&#8212;and because these nukes don&#8217;t significantly affect strategic planning&#8212;we won&#8217;t be considering them here.   Finally, there are four countries that are not members of the NPT but have nuclear weapons: <strong>Israel</strong>, <strong>Pakistan</strong>, <strong>India</strong>, and <strong>North Korea</strong>.  We will be ignoring them here because their inventories are relatively small, and most of their weapons would not be considered strategic.</p>
<h3>Programming preliminaries</h3>
<p>Now we&#8217;re ready to start programming. First, let&#8217;s import our libraries:</p>
<pre>&gt;import Control.Lens
&gt;import Data.Csv
&gt;import qualified Data.Vector as V
&gt;import qualified Data.ByteString.Lazy.Char8  as BS
&gt; 
&gt;import HLearn.Algebra
&gt;import HLearn.Models.Distributions
&gt;import HLearn.Gnuplot.Distributions</pre>
<p>Next, we load our data using the <a href="http://hackage.haskell.org/package/cassava">Cassava</a> package.  (You don&#8217;t need to understand how this works.)</p>
<pre>&gt;main = do
&gt;   Right rawdata &lt;- fmap (fmap V.toList . decode True) $ BS.readFile "nukes-list.csv"
&gt;        :: IO (Either String [(String, String, String, Int)])</pre>
<p>And we&#8217;ll use the <a href="http://hackage.haskell.org/package/lens">Lens</a> package to parse the CSV file into a series of variables containing just the values we want.  (You also don&#8217;t need to understand this.)</p>
<pre>&gt;   let list_usa    = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._1)=="USA"   ) rawdata
&gt;   let list_uk     = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._1)=="UK"    ) rawdata 
&gt;   let list_france = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._1)=="France") rawdata 
&gt;   let list_russia = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._1)=="Russia") rawdata 
&gt;   let list_china  = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._1)=="China" ) rawdata</pre>
<p><strong>NOTE:</strong> All you need to understand about the above code is what these list_country variables look like. So let&#8217;s print one:</p>
<pre>&gt;   putStrLn $ "List of American nuclear weapon sizes = " ++ show list_usa</pre>
<p>gives us the output:</p>
<pre>List of American nuclear weapon sizes = fromList [335,335,335,335,335,335,335,335,335,335  ...  1200,1200,1200,1200,1200]</pre>
<p>If we want to know how many weapons are in the American arsenal, we can take the length of the list:</p>
<pre>&gt;   putStrLn $ "Number of American weapons = " ++ show (length list_usa)</pre>
<p>We get that there are <strong>1951 American deployed, strategic nuclear weapons</strong>.  If we want to know the total &#8220;blowing up&#8221; power, we take the sum of the list:</p>
<pre>&gt;   putStrLn $ "Explosive power of American weapons = " ++ show (sum list_usa)</pre>
<p>We get that the US has  <strong>516 megatons of deployed, strategic nuclear weapons</strong>.  That&#8217;s the equivalent of <strong>1,033,870,000,000 pounds of TNT</strong>.</p>
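<p>As a quick check on that conversion, here is a minimal standalone sketch (it is not part of the literate program in this post).  It assumes short tons of 2,000 pounds, which is what the figure above implies, and the name kilotonsToPounds is made up for this example.</p>
<pre>-- Convert a yield in kilotons of TNT to pounds of TNT.
-- Assumption: 1 kt = 1,000 short tons and 1 short ton = 2,000 lb.
kilotonsToPounds :: Integer -&gt; Integer
kilotonsToPounds kt = kt * 1000 * 2000

main :: IO ()
main = print (kilotonsToPounds 516935)  -- prints 1033870000000</pre>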
<p>To get the total number of weapons in the world, we concatenate every country&#8217;s list of weapons and find the length:</p>
<pre>&gt;   let list_all = list_usa ++ list_uk ++ list_france ++ list_russia ++ list_china
&gt;   putStrLn $ "Number of nukes in the whole world = " ++ show (length list_all)</pre>
<p>Doing this for every country gives us the table:</p>
<table border="1" align="center">
<tbody>
<tr>
<td align="LEFT"><strong>Country</strong></td>
<td align="LEFT"><strong>Warheads</strong></td>
<td align="LEFT"><strong>Total explosive power (kt)</strong></td>
</tr>
<tr>
<td align="LEFT" height="17">USA</td>
<td align="RIGHT">1,951</td>
<td align="RIGHT">516,935</td>
</tr>
<tr>
<td align="LEFT" height="17">UK</td>
<td align="RIGHT">225</td>
<td align="RIGHT">22,500</td>
</tr>
<tr>
<td align="LEFT" height="17">France</td>
<td align="RIGHT">300</td>
<td align="RIGHT">60,000</td>
</tr>
<tr>
<td align="LEFT" height="17">Russia</td>
<td align="RIGHT">2,435</td>
<td align="RIGHT">901,000</td>
</tr>
<tr>
<td align="LEFT" height="17">China</td>
<td align="RIGHT">168</td>
<td align="RIGHT">284,400</td>
</tr>
<tr>
<td align="LEFT" height="17"><strong>Total</strong></td>
<td align="RIGHT">5,079</td>
<td align="RIGHT">1,784,835</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>Now let&#8217;s do some algebra!</p>
<h3>Monoids and groups</h3>
<p>In a previous post, we saw that the <a href="http://izbicki.me/blog/gausian-distributions-are-monoids">Gaussian distribution forms a group</a>. This means that it has all the properties of a monoid&#8212;an empty element (<strong>mempty</strong>) that represents the distribution trained on no data, and a binary operation (<strong>mappend</strong>) that merges two distributions together&#8212;plus an <strong>inverse</strong>. This inverse lets us &#8220;subtract&#8221; two Gaussians from each other.</p>
<p>It turns out that many other distributions also have this group property. One example is the <strong>categorical distribution</strong>, which models discrete data. Essentially, it assigns some probability to each &#8220;label.&#8221;  In our case, the labels are the sizes of the nuclear weapons, and the probability is the chance that a randomly chosen nuke will be exactly that destructive.  We train our categorical distribution using the <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.0.1/doc/html/HLearn-Algebra-Models.html">train</a> function:</p>
<pre>&gt;   let cat_usa = train list_usa :: Categorical Int Double</pre>
<p>If we plot this distribution, we&#8217;ll get a graph that looks something like:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1808" alt="catigorical distribution of american nuclear weapons" src="http://izbicki.me/blog/wp-content/uploads/2013/01/catigorical-distribution-of-american-nuclear-weapons1.png" width="572" height="402" /></p>
<p>A distribution like this is useful to war planners from other countries.  It can help them estimate how much damage their infrastructure would take in a nuclear exchange.</p>
<p>Now, let&#8217;s train equivalent distributions for our other countries.</p>
<pre>&gt;   let cat_uk = train list_uk :: Categorical Int Double
&gt;   let cat_france = train list_france :: Categorical Int Double
&gt;   let cat_russia = train list_russia :: Categorical Int Double
&gt;   let cat_china = train list_china :: Categorical Int Double</pre>
<p>Because training the categorical distribution is a group <strong>homomorphism</strong>, we can train a distribution over all nukes by either training directly on the data:</p>
<pre>&gt;   let cat_allA = train list_all :: Categorical Int Double</pre>
<p>or we can merge the already generated categorical distributions:</p>
<pre>&gt;   let cat_allB = cat_usa &lt;&gt; cat_uk &lt;&gt; cat_france &lt;&gt; cat_russia &lt;&gt; cat_china</pre>
<p>Because of the homomorphism property, we will get the same result both ways. Since we&#8217;ve already done the calculations for each of the countries, method B will be more efficient because it won&#8217;t have to repeat that work.  If we plot either of these distributions, we get:</p>
<p><img class="aligncenter size-full wp-image-1811" alt="catigorical distribution of all nuclear weapons" src="http://izbicki.me/blog/wp-content/uploads/2013/01/catigorical-distribution-of-all-nuclear-weapons.png" width="572" height="401" /></p>
<p>The thing to notice in this plot is that most countries have a nuclear arsenal that is distributed similarly to that of the United States.  The exception is China, whose ICBM warheads are much larger.  These Chinese ICBMs will become much more important when we discuss nuclear strategy in the last section.</p>
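<p>The homomorphism property used for cat_allA and cat_allB can be checked on a toy model.  The sketch below is standalone and does not use HLearn: it assumes a categorical distribution can be approximated by a map from label to count, and the names Counts and trainCounts are made up for this example.</p>
<pre>import qualified Data.Map.Strict as M

-- Toy stand-in for a categorical distribution: one count per label.
newtype Counts = Counts (M.Map Int Int) deriving (Eq, Show)

-- Merging two distributions adds the counts label by label.
instance Semigroup Counts where
  Counts a &lt;&gt; Counts b = Counts (M.unionWith (+) a b)

-- The distribution trained on no data has no counts at all.
instance Monoid Counts where
  mempty = Counts M.empty

-- "Training" just counts how often each label occurs.
trainCounts :: [Int] -&gt; Counts
trainCounts xs = Counts (M.fromListWith (+) [(x, 1) | x &lt;- xs])

main :: IO ()
main = do
  let xs = [100, 100, 335]
      ys = [335, 1200]
  -- Homomorphism: training on the concatenation equals merging the parts.
  print (trainCounts (xs ++ ys) == trainCounts xs &lt;&gt; trainCounts ys)  -- prints True</pre>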
<p>But nuclear war planners don&#8217;t particularly care about this complete list of nuclear weapons.  What war planners care about is the <strong>survivable nuclear weapons</strong>&#8212;that is, weapons that won&#8217;t be blown up by a surprise nuclear attack.  Our distributions above contain nukes dropped from bombers, but these are not survivable.  They are easy to destroy.  For our purposes, we&#8217;ll call anything that&#8217;s not a bomber a survivable weapon.</p>
<p><img class="aligncenter size-full wp-image-1803" alt="nuclear-triad-no-bomber" src="http://izbicki.me/blog/wp-content/uploads/2013/01/nuclear-triad-no-bomber.jpg" width="700" height="149" /></p>
<p>We&#8217;ll use the group property of the categorical distribution to calculate the survivable weapons.  First, we create a distribution of just the <em>un</em>survivable bombers:</p>
<pre>&gt;   let list_bomber = fmap (\row -&gt; row^._4) $ filter (\row -&gt; (row^._2)=="Bomber") rawdata
&gt;   let cat_bomber = train list_bomber :: Categorical Int Double</pre>
<p>Then, we use our group inverse to subtract these unsurvivable weapons away:</p>
<pre>&gt;   let cat_survivable = cat_allB &lt;&gt; (inverse cat_bomber)</pre>
<p>Notice that we calculated this distribution indirectly&#8212;there was no possible way to combine our variables above to generate this value without using the inverse! This is the power of groups in statistics.</p>
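<p>To see what the inverse is doing, here is a minimal standalone sketch on the same kind of toy count map.  It is only an illustration under that assumption, not HLearn&#8217;s implementation, and the names trainCounts, mergeCounts, and inverseCounts are hypothetical.</p>
<pre>import qualified Data.Map.Strict as M

-- Toy model: yield in kt mapped to the number of warheads with that yield.
type Counts = M.Map Int Int

trainCounts :: [Int] -&gt; Counts
trainCounts xs = M.fromListWith (+) [(x, 1) | x &lt;- xs]

-- The group operation adds counts and drops labels whose count reaches zero.
mergeCounts :: Counts -&gt; Counts -&gt; Counts
mergeCounts a b = M.filter (/= 0) (M.unionWith (+) a b)

-- The group inverse negates every count.
inverseCounts :: Counts -&gt; Counts
inverseCounts = M.map negate

main :: IO ()
main = do
  let everything = trainCounts [100, 100, 335, 1200]
      bombers    = trainCounts [100, 1200]
  -- "Subtract" the bombers by merging with their inverse.
  print (M.toList (mergeCounts everything (inverseCounts bombers)))
  -- prints [(100,1),(335,1)]</pre>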
<h3>More distributions</h3>
<p>The categorical distribution is not sufficient to accurately describe the distribution of nuclear weapons. This is because we don&#8217;t actually know the exact yield of a given warhead. Like all things, it has some manufacturing tolerances that we must consider. For example, if we detonate a 300 kt warhead, the actual explosion might be 275 kt, 350 kt, or the bomb might even &#8220;fizzle out&#8221; and produce an almost 0 kt explosion.</p>
<p>We&#8217;ll model this by using a <strong>kernel density estimator</strong> (KDE).  The KDE basically takes all our data points, assigns each one a probability distribution called a &#8220;kernel,&#8221; then sums these kernels together.  It is a very powerful and general technique for modelling distributions&#8230; and it also happens to form a group!</p>
<p>First, let&#8217;s create the parameters for our KDE.  The bandwidth controls how wide each of the kernels is.  Bigger means wider.  I selected 20 because it made a reasonable looking density function.  The sample points are exactly what they sound like: they are where we will sample the density from.  We can generate them using the function <a href="http://hackage.haskell.org/packages/archive/HLearn-distributions/0.1.0.1/doc/html/HLearn-Models-Distributions-KernelDensityEstimator.html#g:3">genSamplePoints</a>.  Finally, the kernel is the shape of the distributions we will be summing up.  There are many <a href="http://hackage.haskell.org/packages/archive/HLearn-distributions/0.1.0.1/doc/html/HLearn-Models-Distributions-KernelDensityEstimator-Kernels.html">supported kernels</a>.</p>
<pre>&gt;   let kdeparams = KDEParams
&gt;        { bandwidth    = Constant 20
&gt;        , samplePoints = genSamplePoints
&gt;               0       -- minimum
&gt;               4000    -- maximum
&gt;               4000    -- number of samples
&gt;        , kernel       = KernelBox Gaussian
&gt;        } :: KDEParams Double</pre>
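<p>To make the roles of the bandwidth and the kernel concrete, here is a minimal standalone sketch of a textbook Gaussian KDE evaluated at a single point.  It does not use HLearn; gaussianKernel and kdeAt are made-up names, and dividing by the number of data points is the usual normalisation rather than something taken from the library.</p>
<pre>-- Gaussian kernel with bandwidth h, centred on the data point xi.
gaussianKernel :: Double -&gt; Double -&gt; Double -&gt; Double
gaussianKernel h xi x = exp (negate (u * u) / 2) / (h * sqrt (2 * pi))
  where u = (x - xi) / h

-- Density estimate at x: the average of the kernels of all data points.
kdeAt :: Double -&gt; [Double] -&gt; Double -&gt; Double
kdeAt h xs x = sum [gaussianKernel h xi x | xi &lt;- xs] / fromIntegral (length xs)

main :: IO ()
main =
  -- Evaluate the estimate for three toy data points at three sample points.
  mapM_ (print . kdeAt 20 [100, 100, 335]) [0, 100, 335]</pre>
<p>A larger bandwidth spreads each data point&#8217;s contribution over a wider range of yields, which is why it smooths the plot.</p>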
<p>Now, we&#8217;ll train kernel density estimates on our data.  Notice that because the KDE takes parameters, we must use the <strong>train&#8217;</strong> function instead of just train.</p>
<pre>&gt;   let kde_usa     = train' kdeparams list_usa      :: KDE Double</pre>
<p>Again, plotting just the American weapons gives:</p>
<p><img class="aligncenter size-full wp-image-1816" alt="kernel density estimate of american nuclear weapons" src="http://izbicki.me/blog/wp-content/uploads/2013/01/kernel-density-estimate-of-american-nuclear-weapons.png" width="572" height="400" /></p>
<p>And we train the corresponding distributions for the other countries.</p>
<pre>&gt;   let kde_uk      = train' kdeparams list_uk       :: KDE Double
&gt;   let kde_france  = train' kdeparams list_france   :: KDE Double
&gt;   let kde_russia  = train' kdeparams list_russia   :: KDE Double
&gt;   let kde_china   = train' kdeparams list_china    :: KDE Double
&gt;
&gt;   let kde_all = kde_usa &lt;&gt; kde_uk &lt;&gt; kde_france &lt;&gt; kde_russia &lt;&gt; kde_china</pre>
<p>The KDE is a powerful technique, but the drawback is that it is computationally expensive, especially when a large number of sample points is used. Fortunately, all computations in the HLearn library are <strong>easily parallelizable</strong> by applying the higher order function <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.0.1/doc/html/HLearn-Algebra-Functions.html">parallel</a>.</p>
<p>We can calculate the full KDE from scratch in parallel like this:</p>
<pre>&gt;   let list_double_all = map fromIntegral list_all :: [Double]
&gt;   let kde_all_parA = (parallel (train' kdeparams)) list_double_all :: KDE Double</pre>
<p>or we can perform a parallel reduction on the KDEs for each country like this:</p>
<pre>&gt;   let kde_all_parB = (parallel reduce) [kde_usa, kde_uk, kde_france, kde_russia, kde_china]</pre>
<p>And because training the KDE is a homomorphism, we get the exact same thing either way.  Let&#8217;s plot the parallel version:</p>
<pre>&gt;   plotDistribution (genPlotParams "kde_all" kde_all_parA) kde_all_parA</pre>
<p><img class="aligncenter size-full wp-image-1817" alt="kernel density estimate of all nuclear weapons globally-mod" src="http://izbicki.me/blog/wp-content/uploads/2013/01/kernel-density-estimate-of-all-nuclear-weapons-globally-mod.png" width="573" height="401" /></p>
<p>The parallel computation takes about 16 seconds on my Core2 Duo laptop running on 2 processors, whereas the serial computation takes about 28 seconds.</p>
<p>This is a considerable speedup, but we can still do better. It turns out that there is a homomorphism from the Categorical distribution to the KDE:</p>
<pre>&gt;   let kde_fromcat_all = cat_allB $&gt; kdeparams
&gt;   plotDistribution (genPlotParams "kde_fromcat_all" kde_fromcat_all) kde_fromcat_all</pre>
<p>(For more information about the morphism chaining operator <strong>$&gt;</strong>, see the <a href="http://hackage.haskell.org/packages/archive/HLearn-algebra/0.1.0.1/doc/html/HLearn-Algebra-Morphism.html">HLearn documentation</a>.) This computation takes less than a second and gets the exact same result as the much more expensive computations above.</p>
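<p>One way to see why going through the categorical distribution is so much cheaper: with repeated data, each distinct value only needs its kernel evaluated once, weighted by its count.  The standalone sketch below illustrates that idea under the same textbook Gaussian KDE assumption.  It is not HLearn&#8217;s implementation, and kdeFromList and kdeFromCounts are hypothetical names.</p>
<pre>import qualified Data.Map.Strict as M

gaussianKernel :: Double -&gt; Double -&gt; Double -&gt; Double
gaussianKernel h xi x = exp (negate (u * u) / 2) / (h * sqrt (2 * pi))
  where u = (x - xi) / h

-- KDE from the raw list: one kernel evaluation per data point.
kdeFromList :: Double -&gt; [Double] -&gt; Double -&gt; Double
kdeFromList h xs x = sum [gaussianKernel h xi x | xi &lt;- xs] / fromIntegral (length xs)

-- KDE from counts: one kernel evaluation per distinct value.
kdeFromCounts :: Double -&gt; M.Map Double Int -&gt; Double -&gt; Double
kdeFromCounts h counts x =
  sum [fromIntegral n * gaussianKernel h xi x | (xi, n) &lt;- M.toList counts]
    / fromIntegral (sum (M.elems counts))

main :: IO ()
main = do
  let xs     = replicate 768 100 ++ replicate 249 335 ++ replicate 50 1200
      counts = M.fromListWith (+) [(x, 1) | x &lt;- xs]
      a      = kdeFromList 20 xs 300
      b      = kdeFromCounts 20 counts 300
  -- The two routes agree up to floating point rounding.
  print (abs (a - b) &lt; 1e-12)  -- prints True</pre>
<p>Here the list route evaluates 1,067 kernels per sample point, while the count route evaluates only 3.</p>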
<p>We can express this relationship with a commutative diagram:</p>
<p><img class="aligncenter size-full wp-image-1790" alt="kde-commutative-diagram-small" src="http://izbicki.me/blog/wp-content/uploads/2013/01/kde-commutative-diagram-small.png" width="300" height="400" /></p>
<p>No matter which path we take to get to a KDE, we will get the exact same answer.  So we should always take the path that will be least computationally expensive for the data set we&#8217;re working on.</p>
<p>Why does this work? Well, the categorical distribution is a structure called the &#8220;free module&#8221; in disguise.</p>
<h3>Modules and the Free Module</h3>
<p><strong>R-Modules</strong> (like groups, but unlike monoids) have not seen much love from functional programmers. This is a shame, because they&#8217;re quite handy. It turns out they will increase our performance dramatically in this case.</p>
<p>It&#8217;s not super important to know the formal definition of an R-module, but here it is anyways: An R-module is a group with an additional property: it can be &#8220;multiplied&#8221; by any element of the <a href="https://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a> R. This is a generalization of <a href="https://en.wikipedia.org/wiki/Vector_space">vector spaces</a> because R need only be a ring instead of a <a href="https://en.wikipedia.org/wiki/Field_(mathematics)">field</a>. (Rings do not necessarily have multiplicative inverses.)  It&#8217;s probably easier to see what this means by an example.</p>
<p>Vectors are modules.  Let&#8217;s say I have a vector:</p>
<pre>&gt;   let vec = [1,2,3,4,5] :: [Int]</pre>
<p>I can perform scalar multiplication on that vector like this:</p>
<pre>&gt;   let vec2 = 3 .* vec</pre>
<p>which as you might expect results in:</p>
<pre>[3,6,9,12,15]</pre>
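<p>If you want to try this without HLearn, the following standalone sketch defines a stand-in for scalar multiplication on plain lists.  The operator is only assumed to behave like the one used above, and its fixity here is my own choice.</p>
<pre>infixl 7 .*

-- Hypothetical stand-in for scalar multiplication, defined for lists only.
(.*) :: Num r =&gt; r -&gt; [r] -&gt; [r]
r .* v = map (r *) v

main :: IO ()
main = do
  let vec = [1, 2, 3, 4, 5] :: [Int]
  print (3 .* vec)  -- prints [3,6,9,12,15]
  -- One of the module laws: (r + s) .* v equals r .* v added to s .* v.
  print ((2 + 1) .* vec == zipWith (+) (2 .* vec) (1 .* vec))  -- prints True</pre>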
<p>Our next example is the <strong>free R-module</strong>. A &#8220;free&#8221; structure is one that obeys only the axioms of the structure and nothing else. Functional programmers are very familiar with the free monoid&#8212;it&#8217;s the list data type. The <strong>free Z-module</strong> is like a beefed up list. Instead of just storing the elements in a list, it also stores the number of times that element occurred.  (Z is shorthand for the set of integers, which form a ring but not a field.) This lets us greatly reduce the memory required to store a repetitive data set.</p>
<p>In HLearn, we represent the free module over a ring r with the data type:</p>
<pre>:: FreeMod r a</pre>
<p>where a is the type of elements to be stored in the free module. We can convert our lists into free modules using the function <strong>list2module</strong> like this:</p>
<pre>&gt;   let module_usa = list2module list_usa</pre>
<p>But what does the free module actually look like? Let&#8217;s print it to find out:</p>
<pre>&gt;   print module_usa</pre>
<p>gives us:</p>
<pre>FreeMod (fromList [(100,768),(150,200),(300,250),(335,249),(340,50),(455,384),(1200,50)])</pre>
<p>This is much more compact! So this is the take away: <strong>The free module makes repetitive data sets easier to work with.</strong> Now, let&#8217;s convert all our country data into module form:</p>
<pre>&gt;   let module_uk       = list2module list_uk
&gt;   let module_france   = list2module list_france
&gt;   let module_russia   = list2module list_russia
&gt;   let module_china    = list2module list_china</pre>
<p>Because modules are also groups, we can combine them like so:</p>
<pre>&gt;   let module_allA = module_usa &lt;&gt; module_uk &lt;&gt; module_france &lt;&gt; module_russia &lt;&gt; module_china</pre>
<p>or, we could train them from scratch:</p>
<pre>&gt;   let module_allB = list2module list_all</pre>
<p>Again, because generating a free module is a homomorphism, both methods are equivalent.</p>
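<p>Here is a minimal standalone sketch of the idea, assuming the free Z-module can be modelled as a map from each element to the number of times it occurs.  The names FreeModSketch, listToModule, and mergeModules are made up for this example and are not HLearn&#8217;s.</p>
<pre>import qualified Data.Map.Strict as M

-- Toy free Z-module: each element paired with how many times it occurs.
newtype FreeModSketch a = FreeModSketch (M.Map a Int) deriving (Eq, Show)

listToModule :: Ord a =&gt; [a] -&gt; FreeModSketch a
listToModule xs = FreeModSketch (M.fromListWith (+) [(x, 1) | x &lt;- xs])

-- Combining two modules adds the occurrence counts element by element.
mergeModules :: Ord a =&gt; FreeModSketch a -&gt; FreeModSketch a -&gt; FreeModSketch a
mergeModules (FreeModSketch x) (FreeModSketch y) = FreeModSketch (M.unionWith (+) x y)

main :: IO ()
main = do
  let partA = [335, 335, 100] :: [Int]
      partB = [335, 1200]
  print (listToModule partA)
  -- prints FreeModSketch (fromList [(100,1),(335,2)])
  -- Converting the concatenated list equals merging the converted parts.
  print (listToModule (partA ++ partB) == mergeModules (listToModule partA) (listToModule partB))
  -- prints True</pre>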
<h3>Module distributions</h3>
<p>The categorical distribution and the KDE both have this module structure. This gives us two cool properties for free.</p>
<p>First, <strong>we can train these distributions directly from the free module</strong>.  Because the free module is potentially much more compact than a list is, this can save both memory and time. If we run:</p>
<pre>&gt;   let cat_module_all = train module_allB :: Categorical Int Double
&gt;   let kde_module_all = train' kdeparams module_allB :: KDE Double</pre>
<p>Then we get the properties:</p>
<pre>cat_module_all == cat_allB
kde_module_all == kde_all == kde_fromcat_all</pre>
<p>Extending our commutative diagram above gives:</p>
<p><img class="aligncenter size-full wp-image-1791" alt="kde-commutative-diagram-big" src="http://izbicki.me/blog/wp-content/uploads/2013/01/kde-commutative-diagram-big.png" width="580" height="400" /></p>
<p>Again, no matter which path we take to train our KDE, we still get the same result because each of these arrows is a homomorphism.</p>
<p>Second, <strong>if a distribution is a module, we can weight the importance of our data points</strong>.  Let&#8217;s say we&#8217;re a general from North Korea (DPRK), and we&#8217;re planning our nuclear strategy. The US and North Korea have a very strained relationship in the nuclear department. It is much more likely that the US will try to nuke the DPRK than China will. And modules let us model this!  We can weight each country&#8217;s influence on our &#8220;nuclear threat profile&#8221; distribution like this:</p>
<pre>&gt;   let threats_dprk = 20 .* kde_usa
&gt;                   &lt;&gt; 10 .* kde_uk
&gt;                   &lt;&gt; 5  .* kde_france
&gt;                   &lt;&gt; 2  .* kde_russia
&gt;                   &lt;&gt; 1  .* kde_china
&gt;
&gt;   plotDistribution (genPlotParams "threats_dprk" threats_dprk) threats_dprk</pre>
<p>Basically, we&#8217;re saying that the USA is 20x more likely to attack the DPRK than China is.  Graphically, our threat distribution is:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1797" alt="nuclear-threat-against-dprk" src="http://izbicki.me/blog/wp-content/uploads/2013/01/nuclear-threat-against-dprk1.png" width="570" height="400" /></p>
<p>The maximum threat that we have to worry about is about 1300 kt, so we need to <a href="https://encrypted.google.com/url?sa=t&amp;rct=j&amp;q=nuclear%20blast%20dynamics&amp;source=web&amp;cd=3&amp;ved=0CEQQFjAC&amp;url=http%3A%2F%2Fwww.dtic.mil%2Fdtic%2Ftr%2Ffulltext%2Fu2%2F601139.pdf&amp;ei=sFbmUIPtM8K6igLqpIGICg&amp;usg=AFQjCNE_yhq31Q03YvFfGl1Te_WsQlKOgw&amp;sig2=Q3YNU84DaVTtWffkOlLOCQ&amp;bvm=bv.1355534169,d.cGE">design all our nuclear bunkers to withstand this level of blast</a>.  Nuclear war planners would use the above distribution to figure out how much infrastructure would survive a nuclear exchange.  To see how this is done, you&#8217;ll have to click the link.</p>
<p>On the other hand, if we&#8217;re an <strong>American general</strong>, then we might say that China is our biggest threat&#8230; who knows what they&#8217;ll do when we can&#8217;t pay all the debt we owe them!?</p>
<pre>&gt;   let threats_usa = 1 .* kde_russia 
&gt;                  &lt;&gt; 5 .* kde_china
&gt;
&gt;   plotDistribution (genPlotParams "threats_usa" threats_usa) threats_usa</pre>
<p>Graphically:</p>
<p><img class="aligncenter size-full wp-image-1798" alt="nuclear-threat-against-usa" src="http://izbicki.me/blog/wp-content/uploads/2013/01/nuclear-threat-against-usa.png" width="573" height="400" /></p>
<p>So now Chinese ICBMs are a real threat.  For American infrastructure to be secure, most of it needs to be able to withstand a ~3500 kt blast.  (Actually, Chinese nuclear policy is called the &#8220;minimum means of reprisal&#8221;: these nukes are targeted not at military installations but at major cities.  Unlike the other nuclear powers, China doesn&#8217;t hope to win a nuclear war.  Instead, its nuclear posture is designed to prevent nuclear war in the first place.  This is why China has the fewest weapons of any of these countries.  For a detailed analysis, see the book <a href="http://mitpress.mit.edu/books/minimum-means-reprisal">Minimum Means of Reprisal</a>.  This means that American military infrastructure isn&#8217;t threatened by these large Chinese nukes, and really only needs to be able to withstand an 800 kt explosion to be survivable.)</p>
<p>By the way, since we&#8217;ve already calculated all of the kde_country variables before, <strong>these computations take virtually no time at all</strong>.  Again, this is all made possible thanks to our friend abstract algebra.</p>
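<p>The weighting itself is easy to picture on a toy model.  In the standalone sketch below, each distribution is assumed to be a map from yield to weight, scalar multiplication scales every weight, and merging adds them.  The arsenals and the names scale and combine are invented for illustration and are not real data or HLearn functions.</p>
<pre>import qualified Data.Map.Strict as M

-- Toy weighted distribution: yield in kt mapped to a weight.
type Weighted = M.Map Int Double

-- Scalar multiplication: scale every weight by r.
scale :: Double -&gt; Weighted -&gt; Weighted
scale r = M.map (r *)

-- Merging adds the weights yield by yield.
combine :: Weighted -&gt; Weighted -&gt; Weighted
combine = M.unionWith (+)

main :: IO ()
main = do
  let arsenalA = M.fromList [(100, 2), (335, 1)]   -- made-up counts
      arsenalB = M.fromList [(335, 1), (3300, 1)]  -- made-up counts
      threats  = scale 20 arsenalA `combine` scale 1 arsenalB
  print (M.toList threats)
  -- prints [(100,40.0),(335,21.0),(3300,1.0)]</pre>
<p>Because scaling and merging only touch the stored weights, nothing has to be retrained from the raw data.</p>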
<h3>Homework + next Post</h3>
<p>If you want to try out the <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn library</a> for yourself, here&#8217;s a question you can try to answer: Create the DPRK and US threat distributions above, but only use survivable weapons.  Don&#8217;t include bombers in the analysis.</p>
<p>In our <strong>next post</strong>, we&#8217;ll go into more detail about the <strong>mathematical plumbing</strong> that makes all this possible. Then we&#8217;ll start talking about Bayesian classification and full-on machine learning. <a href="http://izbicki.me/blog/feed">Subscribe to the RSS feed</a> so you don&#8217;t miss out!</p>
<p>Why don&#8217;t you listen to <a href="http://www.youtube.com/watch?v=YDFqoReof6A">Tom Lehrer&#8217;s &#8220;Song for WWIII&#8221;</a> while you wait?</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/nuclear-weapon-statistics-using-monoids-groups-and-modules-in-haskell/feed</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>My 2012 Experiments in Christianity</title>
		<link>http://izbicki.me/blog/my-2012-experiments-in-christianity?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=my-2012-experiments-in-christianity</link>
		<comments>http://izbicki.me/blog/my-2012-experiments-in-christianity#comments</comments>
		<pubDate>Wed, 02 Jan 2013 05:31:19 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Religion]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=1708</guid>
		<description><![CDATA[We don&#8217;t know what God wants, and we wouldn&#8217;t know how to do it even if we did.  Therefore (as Gandhi put it) we must &#8220;experiment with truth.&#8221;  We must discover truth for ourselves, and how to achieve it. These are my experiments from 2012.  I didn&#8217;t try these experiments because they are somehow the [...]]]></description>
				<content:encoded><![CDATA[<div>
<p>We don&#8217;t know what God wants, and we wouldn&#8217;t know how to do it even if we did.  Therefore (as Gandhi put it) we must &#8220;experiment with truth.&#8221;  We must discover truth for ourselves, and how to achieve it.</p>
<p>These are my experiments from 2012.  I didn&#8217;t try these experiments because they are somehow the &#8220;most Christ-like&#8221; thing to do.  I tried them because <strong>I don&#8217;t know what the most Christ-like thing is, but I want to learn</strong>.  I want to train myself to do it at all times.  Some of these experiments <span style="color: #008000;">succeeded</span> and some <span style="color: #800000;">failed</span>.  But all of them made me a better Christian.</p>
<p><span id="more-1708"></span></p>
</div>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong>1. Fasting: <span style="color: #008000;">SUCCESS</span></strong></span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong><img class="alignright" alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/Fasting_4-Fasting-a-glass-of-water-on-an-empty-plate.jpg/550px-Fasting_4-Fasting-a-glass-of-water-on-an-empty-plate.jpg" width="231" height="202" /></strong>This year, I decided not to eat any food on Mondays. During Lent, I didn&#8217;t eat on either Monday or Thursday. I had never fasted before this. Growing up, I thought fasting was a &#8220;stupid Catholic thing.&#8221; But now I see it is immensely valuable.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">From an earthly standpoint, fasting helped me exercise my willpower and discipline. </span>From a spiritual standpoint, being hungry was a constant reminder of people in need around the world&#8212;I need to help them just like Jesus would, and I need to give thanks to God for what food and blessings I do have. Fasting also helped me grow closer to God. Every hunger pang was a reminder to say a prayer to God asking for His endurance and grace.</p>
<p><strong>2. Vegetarianism: <span style="color: #008000;">SUCCESS</span></strong></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong><img class="alignright size-medium wp-image-1715" alt="IMAG0175" src="http://izbicki.me/blog/wp-content/uploads/2013/01/IMAG0175-300x179.jpg" width="300" height="179" /></strong>From January 1st until Easter Sunday (~3 months), I didn&#8217;t eat any meat. I did this experiment for two reasons. First, I feel bad about factory farming and my involvement in the abuse of animals. Second, I want to stand in solidarity with those who choose vegetarianism for moral reasons. I want to understand what it feels like to go into a restaurant or a party and leave hungry because there&#8217;s no vegetarian food. I want to share their pain.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">This experiment was much easier than I expected. I never had any cravings for meat, and learned to cook quite a variety of different foods. My new favorite food is &#8220;buffalo broccoli,&#8221; a stir fry with buffalo sauce added at the end.  It&#8217;s even better on broccoli than it is on chicken because the broccoli has so much surface area for the sauce to cling to.  It&#8217;s amazing.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">Ultimately, however, I decided not to remain strictly vegetarian because it was harming my relationships. Sharing meals is an important part of friendship, and I was unable to participate in several church and family dinners because of my strict vegetarianism. So now, I try to eat &#8220;mostly vegetarian.&#8221; If I&#8217;m by myself, I won&#8217;t eat meat. If I&#8217;m at a party and there&#8217;s a vegetarian option available, I&#8217;ll eat only that. But, if I&#8217;m with other people and the only food available is meat, then I&#8217;ll eat meat with them. To me, strengthening these relationships with humans seems more important than reducing the suffering of factory farmed animals.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong>3. Food not Bombs: <span style="color: #008000;">SUCCESS</span></strong> <strong>&amp; <span style="color: #800000;">FAILURE</span></strong></span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong><img class="alignright  wp-image-1711" alt="fnblogo" src="http://izbicki.me/blog/wp-content/uploads/2013/01/fnblogo.jpg" width="169" height="190" /></strong>Ever since <a href="http://izbicki.me/blog/co-testimony-the-development-of-my-beliefs">leaving the Navy</a>, I&#8217;ve struggled with how to express my pacifism. I want to do it in a way that <a href="http://izbicki.me/blog/co-testimony-im-not-anti-war-im-pro-peace">promotes peace rather than just condemns war</a>. So I helped start a Food not Bombs group on the UCR campus with a few other students. We served free vegetarian meals every Friday during the spring 2012 quarter. In total, <a href="http://izbicki.me/blog/how-i-serve-150-free-lunches-for-less-than-20-cents-each-using-homebrew-equipment">we served 1000 meals</a>.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">Unfortunately, we were unable to continue serving food after that quarter. Some administrators decided they didn&#8217;t want Food not Bombs on campus, despite the fact that we were following all the proper regulations. All of our requests for food permits were denied for various stupid &#8220;problems,&#8221; and when we fixed those problems, the administrators found new problems.  It became clear that they would never approve a permit with our names on it, so our FNB group gradually dissolved.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">One thing I learned from the experience is that I suck at organizing and motivating people behind a vision. I just don&#8217;t enjoy doing it. So in the future, rather than trying to start my own group, I&#8217;m going to help somebody else start theirs.  Finding a good peacemaking group to be a part of is my top priority for 2013.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong>4. Beer: <span style="color: #008000;">SUCCESS</span></strong></span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong><img class="alignright  wp-image-1712" alt="dogfish-head-sahtea-poured" src="http://izbicki.me/blog/wp-content/uploads/2013/01/dogfish-head-sahtea-poured-171x300.jpg" width="103" height="180" /></strong>I grew up thinking that the way to be a good Christian was to be a good Puritan and Nationalist.  Now, I believe the way to be a good Christian is to be Christ to everyone you meet.  So since Jesus turned water into wine, I decided to try my hand at turning it into beer.  (I personally can&#8217;t stand wine.)</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">This has turned out better than I could ever have imagined.  Not only is making beer a ton of fun, but it really helps build relationships.  Sharing alcohol with strangers is the fastest way I know to turn them into friends.  Plus, it can lead into great conversations about God.  Mostly what I&#8217;ve been brewing is what I affectionately call &#8220;Monk Beer.&#8221;  It&#8217;s based on recipes from Trappist monasteries in Belgium.  In my experience, even the most die-hard anti-theists have good things to say about Christians after we share some monk beer together.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1">Oh, and the beer tastes great too.  We got <a href="http://www.maltosefalcons.com/recipes/food-not-bombs-dubbel">2nd place out of 550</a> at the Mayfaire beer competition.</span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong>5. Memorizing the sermon on the mount: <span style="color: #800000;">FAILURE</span></strong></span></p>
<p><span class="Apple-style-span" style="line-height: 17px;" data-mce-mark="1"><strong><img class="alignright" alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Bloch-SermonOnTheMount.jpg/280px-Bloch-SermonOnTheMount.jpg" width="168" height="188" /></strong>I&#8217;ve read through the sermon more times than I can count. Many verses, like &#8220;turn the other cheek&#8221; and &#8220;go the extra mile&#8221; are burned into my heart. But I feel there is still so much more for me to learn here. I still don&#8217;t fully understand what Jesus is saying, and I&#8217;m certainly not yet living it out.  </span></p>
<p>So I decided to memorize it.  I made this goal for myself at the beginning of summer, and 6 months later I still haven&#8217;t accomplished it. I am normally good at memorizing; I just haven&#8217;t put in the time it takes to accomplish this task.  I am getting closer, however.  Finishing this task is one of my goals for 2013.</p>
<p><strong>6. Respecting the Sabbath: <span style="color: #800000;">FAILURE</span></strong></p>
<p><strong><a href="http://www.reddit.com/r/Sidehugs/comments/14uqyz/your_namer98_macro_is_here/"> <img class="alignright" alt="uMZSq" src="http://izbicki.me/blog/wp-content/uploads/2013/01/uMZSq-300x225.jpg" width="300" height="225" /></a></strong>I made a commitment in 2012 not to open my computer on Sunday. Instead of wasting my time working or browsing reddit, I wanted to spend this same time building relationships with members of my church and family. Growing up, I thought respecting the sabbath was a &#8220;stupid Jewish thing,&#8221; but now I realize it is immensely important.</p>
<p>Sadly, I didn&#8217;t keep the sabbath as well as I would have liked. Often, I would read academic papers on Sunday, grade homework, or do other things that failed to build relationships.  I did this because of poor scheduling throughout the week and misplaced priorities about what is important.</p>
<p>Another goal for 2013 is to change this. I want my every Sunday to be the most holy day of the year. I want to dedicate Sunday to building important relationships with God and men.</p>
<p><strong>7. Walking everywhere: <span style="color: #008000;">SUCCESS</span></strong></p>
<p><strong> <img class="alignright" alt="big1" src="http://izbicki.me/blog/wp-content/uploads/2013/01/big1-300x150.jpg" width="300" height="150" /></strong>Normally I ride my bike everywhere, but when I got a flat tire in early November I decided to try walking everywhere. Bikes are already a much slower lifestyle than cars, but walking is even slower. Every day since my tire popped until today, I did a 40 minute walk to work.  It was amazing.</p>
<p>As I walked to work, I got to enjoy the sights of the snowy San Gabriel mountains to the north, the Box Spring mountains to the east and Saddleback Mountain to the southwest. I got to watch and listen to the birds sing. I got to pick up trash along the side of the road, making the community just a little bit nicer, and I even got to wish the occasional homeless man a &#8220;Merry Christmas&#8221; and talk about how they&#8217;re getting along.</p>
<p><strong>8. Talking with the homeless: <span style="color: #800000;">FAILURE</span></strong></p>
<p>Sadly, I failed to help many of the &#8220;least of these&#8221; in Riverside. Many times this year I drove past a homeless man despite a gut-wrenching conviction that I needed to stop and help this person. I failed to help Christ in his greatest need, and I am ashamed.</p>
<p><strong><img class="aligncenter" alt="Christ in the Breadlines" src="http://izbicki.me/blog/wp-content/uploads/2012/06/Christ-in-the-Breadlines.jpg" width="640" height="349" /></strong></p>
<p>In fall of 2011, I did a lot of talking with the homeless.  I did most of my grocery shopping at a nearby Stater Brothers, and there were always homeless men and women outside begging for money.  I would ask them to come shopping with me, buy them some food, and have a good conversation about their life and problems.  But this year my roommates and I have been shopping exclusively at Costco, and I don&#8217;t get to meet any of these homeless while going about my daily business.  I think this turned out to be a Very Bad Thing for my spiritual health.</p>
<p>I&#8217;m still not sure how to fix this problem for 2013.</p>
<p><strong>Bonus 9th: Sharing my experiments on the internet</strong></p>
<p>I&#8217;m sharing this for two reasons.  First, these experiments have been a tremendous growth to me spiritually.  Maybe they will help you as well if you try them.  But the second reason is much more important to me personally: I want public accountability for my actions.  I want to change the world for the better&#8212;I want to be like Christ&#8212;but I can&#8217;t do that without help.  I need others to lift me up, just as I need to lift others up.</p>
<p>So if you have any suggestions for cool religious experiments I can perform in 2013, <strong>please tell me</strong>!</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/my-2012-experiments-in-christianity/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Gaussian distributions form a monoid</title>
		<link>http://izbicki.me/blog/gausian-distributions-are-monoids?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=gausian-distributions-are-monoids</link>
		<comments>http://izbicki.me/blog/gausian-distributions-are-monoids#comments</comments>
		<pubDate>Sun, 25 Nov 2012 00:43:24 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=1442</guid>
		<description><![CDATA[(And why machine learning experts should care) This is the first in a series of posts about the HLearn library for haskell that I&#8217;ve been working on for the past few months. The idea of the library is to show that abstract algebra&#8212;specifically monoids, groups, and homomorphisms&#8212;are useful not just in esoteric functional programming, but [...]]]></description>
				<content:encoded><![CDATA[<h4>(And why machine learning experts should care)</h4>
<p><img class="alignright" title="gaussian" src="http://izbicki.me/blog/wp-content/uploads/2012/11/gaussian.png" alt="" width="275" height="220" />This is the first in a series of posts about the <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn library</a> for Haskell that I&#8217;ve been working on for the past few months. The idea of the library is to show that abstract algebra (specifically <a href="https://en.wikipedia.org/wiki/Monoid">monoids</a>, <a href="https://en.wikipedia.org/wiki/Group_(mathematics)">groups</a>, and <a href="https://en.wikipedia.org/wiki/Homomorphism">homomorphisms</a>) is useful not just in esoteric functional programming, but also in real world machine learning problems.  In particular, <strong>by framing a learning algorithm according to these algebraic properties, we get three things for free</strong>: (1) an online version of the algorithm; (2) a parallel version of the algorithm; and (3) a procedure for cross-validation that runs asymptotically faster than the standard version.</p>
<p>We&#8217;ll start with the example of a <a href="https://en.wikipedia.org/wiki/Normal_distribution">Gaussian distribution</a>. Gaussians are ubiquitous in learning algorithms because they describe many real world data sets well.  But more importantly, they are easy to work with.  They are fully determined by their mean and variance, and these parameters are easy to calculate.</p>
<p>In this post we&#8217;ll start with examples of why the monoid and group properties of Gaussians are useful in practice, then we&#8217;ll look at the math underlying these examples, and finally we&#8217;ll see that this technique is extremely fast in practice and results in <strong>near perfect parallelization</strong>.</p>
<p><span id="more-1442"></span></p>
<h3>HLearn by Example</h3>
<p>Install the libraries from a shell:</p>
<pre>$ cabal install HLearn-distributions</pre>
<p>Then import the HLearn libraries into a literate Haskell file:</p>
<pre>&gt; import HLearn.Algebra
&gt; import HLearn.Models.Distributions.Gaussian</pre>
<p>And some libraries for comparing our performance:</p>
<pre>&gt; import Criterion.Main
&gt; import Statistics.Distribution.Normal
&gt; import qualified Data.Vector.Unboxed as VU</pre>
<p>Now let&#8217;s create some data to work with. For simplicity&#8217;s sake, we&#8217;ll use a made up data set of how much money people make. Every entry represents one person making that salary. (We use a small data set here for ease of explanation.  When we stress test this library at the end of the post we use much larger data sets.)</p>
<pre>&gt; gradstudents = [15e3,25e3,18e3,17e3,9e3]        :: [Double]
&gt; teachers     = [40e3,35e3,89e3,50e3,52e3,97e3]  :: [Double]
&gt; doctors      = [130e3,105e3,250e3]              :: [Double]</pre>
<p>In order to train a Gaussian distribution from the data, we simply use the <strong>train</strong> function, like so:</p>
<pre>&gt; gradstudents_gaussian = train gradstudents      :: Gaussian Double
&gt; teachers_gaussian     = train teachers          :: Gaussian Double
&gt; doctors_gaussian      = train doctors           :: Gaussian Double</pre>
<p>The train function is a member of the HomTrainer type class, which we&#8217;ll talk more about later.  Also, now that we&#8217;ve trained some Gaussian distributions, we can perform all the normal calculations we might want to do on a distribution.  For example, taking the mean, standard deviation, pdf, and cdf.</p>
<p>Now for the interesting bits. We start by showing that the Gaussian is a semigroup. A <strong>semigroup</strong> is any data structure that has an associative binary operation called (<strong>&lt;&gt;</strong>). Basically, we can think of (&lt;&gt;) as &#8220;adding&#8221; or &#8220;merging&#8221; the two structures together. (A semigroup is like a monoid without the identity element: it has the mappend operation but no mempty.)</p>
<p>So how do we use this? Well, what if we decide we want a Gaussian over everyone&#8217;s salaries? Using the traditional approach, we&#8217;d have to recompute this from scratch.</p>
<pre>&gt; all_salaries = concat [gradstudents,teachers,doctors]
&gt; traditional_all_gaussian = train all_salaries :: Gaussian Double</pre>
<p>But this repeats work we&#8217;ve already done. On a real world data set with millions or billions of samples, this would be very slow. Better would be to merge the Gaussians we&#8217;ve already trained into one final Gaussian. We can do that with the semigroup operation (&lt;&gt;):</p>
<pre>&gt; semigroup_all_gaussian = gradstudents_gaussian &lt;&gt; teachers_gaussian &lt;&gt; doctors_gaussian</pre>
<p>Now,</p>
<pre>traditional_all_gaussian == semigroup_all_gaussian</pre>
<p>The coolest part about this is that <em>the semigroup operation takes time <strong>O(1)</strong>, no matter how much data we&#8217;ve trained the Gaussians on.</em> The naive approach takes time <strong>O(n)</strong>, so we&#8217;ve got a pretty big speed up!</p>
<p>Next, a <strong>monoid</strong> is a semigroup with an identity. The identity for a Gaussian is easy to define&#8212;simply train on the empty data set!</p>
<pre>&gt; gaussian_identity = train ([]::[Double]) :: Gaussian Double</pre>
<p>Now,</p>
<pre>gaussian_identity == mempty</pre>
<p>But we&#8217;ve still got one more trick up our sleeves.  The Gaussian distribution is not just a monoid, but also a group. Groups appear all the time in abstract algebra, but they haven&#8217;t seen much attention in functional programming for some reason. Well, <strong>groups</strong> are simple: they&#8217;re just monoids with an inverse. This inverse lets us do &#8220;subtraction&#8221; on our data structures.</p>
<p>So back to our salary example. Let&#8217;s say we&#8217;ve calculated all our salaries, but we&#8217;ve realized that including grad students in the salary calculations was a mistake. (They&#8217;re not real people after all.) In a normal library, we would have to recalculate everything from scratch again, excluding the grad students:</p>
<pre>&gt; nograds = concat [teachers,doctors]
&gt; traditional_nograds_gaussian = train nograds :: Gaussian Double</pre>
<p>But as we&#8217;ve already discussed, this takes a lot of time. We can use the <strong>inverse</strong> function to do this same operation in constant time:</p>
<pre>&gt; group_nograds_gaussian = semigroup_all_gaussian &lt;&gt; (inverse gradstudents_gaussian)</pre>
<p>And now,</p>
<pre>traditional_nograds_gaussian == group_nograds_gaussian</pre>
<p>Again, we&#8217;ve converted an operation that would have taken time <strong>O(n)</strong> into one that takes time <strong>O(1)</strong>. Can&#8217;t get much better than that!</p>
<h3>The HomTrainer Type Class</h3>
<p>As I&#8217;ve already mentioned, the HomTrainer type class is the basis of the <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn library</a>.  Basically, any learning algorithm that is also a <strong>semigroup homomorphism</strong> can be made an instance of HomTrainer.  This means that if xs and ys are lists of data points, the class obeys the following law:</p>
<pre>train (xs ++ ys) == (train xs) &lt;&gt; (train ys)</pre>
<p>It might be easier to see what this means in picture form:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1635" title="gaussian commutative table" src="http://izbicki.me/blog/wp-content/uploads/2012/11/gaussian-commutative-table22.png" alt="gaussian commutative table" width="525" height="489" /></p>
<p>On the left hand side, we have some data sets, and on the right hand side, we have the corresponding Gaussian distributions and their parameters.  Because training the Gaussian is a homomorphism, it doesn&#8217;t matter whether we follow the orange or green paths to get to our final answer.  We get the exact same answer either way.</p>
<p>Based on this property alone, we get the three &#8220;free&#8221; properties I mentioned in the introduction.  (1) We get an online algorithm for free.  The function <strong>add1dp</strong> can be used to add a single new point to an existing Gaussian distribution.  Let&#8217;s say I forgot about one of the graduate students&#8212;I&#8217;m sure this would never happen in real life&#8212;I can add their salary like this:</p>
<pre>&gt; gradstudents_updated_gaussian = add1dp gradstudents_gaussian (10e3::Double)</pre>
<p>This updated Gaussian is exactly what we would get if we had included the new data point in the original data set.</p>
<p>(2) We get a parallel algorithm.  We can use the higher order function <strong>parallel</strong> to parallelize any application of train.  For example,</p>
<pre>&gt; gradstudents_parallel_gaussian = (parallel train) gradstudents :: Gaussian Double</pre>
<p>The function parallel automatically detects the number of processors your computer has and evenly distributes the work load over them.  As we&#8217;ll see in the performance section, this results in perfect parallelization of the training function.  Parallelization literally could not be any simpler!</p>
<p>(3) We get asymptotically faster cross-validation; but that&#8217;s not really applicable to a Gaussian distribution so we&#8217;ll ignore it here.</p>
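<p>To see why the homomorphism law alone is enough for the online and parallel versions, here is a small standalone sketch that does not use HLearn at all.  Everything in it is made up for illustration: <strong>chunksOf</strong>, <strong>trainChunked</strong>, and the toy <strong>CountSum</strong> model (a sample count and a running sum) are my own hypothetical names, not HLearn&#8217;s API, and the chunks are merged sequentially rather than on separate cores.  It assumes a recent GHC where Semigroup is part of the Prelude.</p>
<pre>import Data.Monoid (Sum(..))

-- Split a list into chunks of at most k elements (k must be positive).
chunksOf :: Int -&gt; [a] -&gt; [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Train each chunk on its own, then merge the partial models.
-- Each chunk is independent, so in principle each could go to its own core.
trainChunked :: Monoid m =&gt; Int -&gt; ([a] -&gt; m) -&gt; [a] -&gt; m
trainChunked k trainFn = mconcat . map trainFn . chunksOf k

-- Toy model (hypothetical): the number of samples and their sum.
type CountSum = (Sum Int, Sum Double)

trainCS :: [Double] -&gt; CountSum
trainCS xs = (Sum (length xs), Sum (sum xs))

main :: IO ()
main = do
    let xs = [15e3,25e3,18e3,17e3,9e3]
    print (trainCS xs)                    -- trained in one pass
    print (trainChunked 2 trainCS xs)     -- trained chunk by chunk, then merged
    print (trainCS xs &lt;&gt; trainCS [10e3])  -- online update: merge in one new point</pre>
<p>The last line is the same idea as add1dp: adding a point is just merging with a model trained on a one-element data set.</p>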
<p>One last note about the HomTrainer class: we never actually have to define the <strong>train</strong> function for our learning algorithm explicitly.  All we have to do is define the semigroup operation, and the compiler will derive our training function for us!  We&#8217;ll save a discussion of why this homomorphism property gives us these results for another post.  Instead, we&#8217;ll just take a look at what the Gaussian distribution&#8217;s semigroup operation looks like.</p>
<h3>The Semigroup operation</h3>
<p>Our Gaussian data type is defined as:</p>
<pre>data Gaussian datapoint = Gaussian
    { n  :: !Int         -- The number of samples trained on
    , m1 :: !datapoint   -- The mean (first moment) of the trained distribution
    , m2 :: !datapoint   -- The variance (second moment) times (n-1)
    , dc :: !Int         -- The number of "dummy points" that have been added
    }</pre>
<p>In order to estimate a Gaussian from a sample, we must find the total number of samples (n), the mean (m1), and the variance (calculated from m2).  (We&#8217;ll explain what dc means a little later.)  Therefore, we must figure out an appropriate definition for our semigroup operation below:</p>
<pre>(Gaussian na m1a m2a dca) &lt;&gt; (Gaussian nb m1b m2b dcb) = Gaussian n' m1' m2' dc'</pre>
<p>First, we calculate the number of samples n&#8217;. The number of samples in the resulting distribution is simply the sum of the number of samples in both the input distributions:</p>
<p style="text-align: center;"><span id='tex_3719'></span></p>
<p>Second, we calculate the new average m1&#8242;. We start with the definition that the final mean is:</p>
<p style="text-align: center;"><span id='tex_360'></span></p>
<p>Then we split the summation according to whether the input element <span id='tex_2025'></span> was from the left Gaussian a or right Gaussian b, and substitute with the definition of the mean above:</p>
<table cellpadding="10" align="center">
<tbody>
<tr>
<td style="text-align: left;"><span id='tex_9898'></span></td>
</tr>
<tr>
<td><span id='tex_8299'></span></td>
</tr>
</tbody>
</table>
<p>Notice that this is simply the weighted average of the two means. This makes intuitive sense. But there is a slight problem with this definition: When implemented on a computer with floating point arithmetic, the division yields infinity or NaN whenever n&#8217; is 0.  We solve this problem by adding a &#8220;dummy&#8221; point into the Gaussian whenever n&#8217; would be zero.  This increases n&#8217; from 0 to 1, preventing the division by 0.  The variable dc counts how many dummy points have been added, so that we can remove them before performing calculations (e.g. finding the pdf) that would be affected by an incorrect number of samples.</p>
<p>Finally, we must calculate the new m2&#8242;. We start with the definition that the variance times (n-1) is:</p>
<p style="text-align: center;"><span id='tex_9248'></span></p>
<p>(Note that the second half of the equation is a property of variance, and <a href="https://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance">its derivation can be found on Wikipedia</a>.)</p>
<p>Then, we do some algebra, split the summations according to which input Gaussian the data point came from, and resubstitute the definition of m2 to get:</p>
<table cellpadding="10" align="center">
<tbody>
<tr>
<td><span id='tex_2041'></span></td>
</tr>
<tr>
<td><span id='tex_7497'></span></td>
</tr>
<tr>
<td><span id='tex_2834'></span></td>
</tr>
<tr>
<td><span id='tex_9509'></span></td>
</tr>
<tr>
<td><span id='tex_6124'></span></td>
</tr>
</tbody>
</table>
<p>Notice that this equation has no divisions in it.  This is why we are storing m2 as the variance times (n-1) rather than simply the variance.  Adding in the extra divisions causes training our Gaussian distribution to run about 4x slower.  I&#8217;d say Haskell is getting pretty fast if the number of floating point divisions we perform is impacting our code&#8217;s performance that much!</p>
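<p>Putting the three pieces together, the merge can be sketched as a standalone program.  This is a simplified illustration and not HLearn&#8217;s actual source: the names <strong>Gauss</strong>, <strong>trainG</strong>, and <strong>inverseG</strong> are hypothetical, the dummy point counter dc is replaced by a plain guard on n&#8217; being zero, and the inverse shown (negating n and m2) is my assumption about one way to make &#8220;subtraction&#8221; work.  It assumes a recent GHC where Semigroup is part of the Prelude.</p>
<pre>import Data.List (foldl')

-- Hypothetical stand-in for HLearn's Gaussian; the dc field is left out.
data Gauss = Gauss
    { gN  :: !Int      -- number of samples
    , gM1 :: !Double   -- mean
    , gM2 :: !Double   -- variance times (n-1)
    } deriving (Show)

instance Semigroup Gauss where
    Gauss na m1a m2a &lt;&gt; Gauss nb m1b m2b = Gauss n' m1' m2'
      where
        n'  = na + nb
        fa  = fromIntegral na
        fb  = fromIntegral nb
        fn  = fromIntegral n'
        -- Weighted average of the two means; guarded instead of using dummy points.
        m1' | n' == 0   = 0
            | otherwise = (fa * m1a + fb * m1b) / fn
        -- Sum of squares from each side, minus n' * m1'^2.  No divisions here.
        m2' = m2a + m2b + fa * m1a * m1a + fb * m1b * m1b - fn * m1' * m1'

instance Monoid Gauss where
    mempty = Gauss 0 0 0

-- Assumed inverse: "negative" samples cancel samples merged earlier.
inverseG :: Gauss -&gt; Gauss
inverseG (Gauss n m1 m2) = Gauss (negate n) m1 (negate m2)

-- Train by merging one-point Gaussians from left to right.
trainG :: [Double] -&gt; Gauss
trainG = foldl' (\g x -&gt; g &lt;&gt; Gauss 1 x 0) mempty

main :: IO ()
main = do
    let grads    = [15e3,25e3,18e3,17e3,9e3]
        teachers = [40e3,35e3,89e3,50e3,52e3,97e3]
        merged   = trainG grads &lt;&gt; trainG teachers
    print (trainG (grads ++ teachers))           -- trained from scratch
    print merged                                 -- merged from parts
    print (merged &lt;&gt; inverseG (trainG grads))    -- grad students "subtracted"
    print (trainG teachers)                      -- compare with the line above</pre>
<p>The first two printed values should agree up to floating point rounding, and so should the last two.  The variance itself is only computed on demand, as gM2 divided by (gN - 1).</p>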
<h3>Performance</h3>
<p>This algebraic interpretation of the Gaussian distribution has excellent time and space performance.  To show this, we&#8217;ll compare performance to the excellent Haskell package called &#8220;<a href="http://hackage.haskell.org/package/statistics">statistics</a>&#8221; that also has support for Gaussian distributions.  We use the criterion package to create three tests:</p>
<pre>&gt; size = 10^8
&gt; main = defaultMain
&gt;     [ bench "statistics-Gaussian" $ whnf (normalFromSample . VU.enumFromN 0) (size)
&gt;     , bench "HLearn-Gaussian" $ whnf
&gt;         (train :: VU.Vector Double -&gt; Gaussian Double)
&gt;         (VU.enumFromN (0::Double) size)
&gt;     , bench "HLearn-Gaussian-Parallel" $ whnf
&gt;         (parallel $ (train :: VU.Vector Double -&gt; Gaussian Double))
&gt;         (VU.enumFromN (0::Double) size)
&gt;     ]</pre>
<p>In these tests, we time three different methods of constructing Gaussian distributions given 100,000,000 data points.  On my laptop with 2 cores, I get these results:</p>
<table border="1" cellspacing="0" cellpadding="5px" align="center">
<tbody>
<tr>
<td>statistics-Gaussian</td>
<td>2.85 sec</td>
</tr>
<tr>
<td>HLearn-Gaussian</td>
<td>1.91 sec</td>
</tr>
<tr>
<td><strong>HLearn-Gaussian-Parallel</strong></td>
<td><strong>0.96 sec</strong></td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>Pretty nice!  The algebraic method managed to outperform the traditional method for training a Gaussian by a handy margin.  Plus, our parallel algorithm runs almost exactly twice as fast on two processors (0.96 sec versus 1.91 sec).  Theoretically, this should scale to an arbitrary number of processors, but I don&#8217;t have a bigger machine to try it out on.</p>
<p>Another interesting advantage of the <a href="http://hackage.haskell.org/package/HLearn-algebra">HLearn library</a> is that <strong>we can trade off time and space performance</strong> by changing which data structures store our data set.  Specifically, we can use the same functions to train on a list or an unboxed vector.  We do this by using the <a href="http://hackage.haskell.org/package/ConstraintKinds">ConstraintKinds</a> package on Hackage that extends the base type classes like Functor and Foldable to work on types that require constraints.  Thus, we have a Functor instance for Vector.Unboxed. This is not possible without ConstraintKinds.</p>
<p>Using this benchmark code:</p>
<pre>main = do
    print $ (train [0..fromIntegral size::Double] :: Gaussian Double)
    print $ (train (VU.enumFromN (0::Double) size) :: Gaussian Double)</pre>
<p style="text-align: left;">We generate the following heap profile:</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-1485" title="spacetests-gaussian" src="http://izbicki.me/blog/wp-content/uploads/2012/11/spacetests-gaussian.png" alt="" width="690" height="462" /></p>
<p style="text-align: left;">Processing the data as a vector requires that we allocate all the memory in advance.  This lets the program run faster, but prevents us from loading data sets larger than the amount of memory we have.  Processing the data as a list, however, allows us to allocate the memory only as we use it.  But because lists are boxed and lazy data structures, we must accept that our program will run about 10x slower.  Lucky for us, <strong>GHC takes care of all the boring details of making this happen seamlessly.  We only have to write our train function once.</strong></p>
<h3 style="text-align: left;">Future Posts</h3>
<p style="text-align: left;">There are still at least four more major topics to cover in the HLearn library:  (1) We can extend this discussion to show how the Naive Bayes learning algorithm has a similar monoid and group structure.  (2) There are many more learning algorithms with group structures we can look into.  (3) We can look at exactly how all these higher order functions, like batch and parallel, work under the hood.  And (4) we can see how the fast cross-validation I briefly mentioned works and why it&#8217;s important.</p>
<p style="text-align: left;"><a href="http://izbicki.me/blog/feed">Subscribe to the RSS feed</a> and stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/gausian-distributions-are-monoids/feed</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>A simple method to radicalize your Christianity.</title>
		<link>http://izbicki.me/blog/a-simple-method-to-radicalize-your-christianity?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-simple-method-to-radicalize-your-christianity</link>
		<comments>http://izbicki.me/blog/a-simple-method-to-radicalize-your-christianity#comments</comments>
		<pubDate>Sat, 17 Nov 2012 20:02:33 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Religion]]></category>
		<category><![CDATA[Thoughts]]></category>

		<guid isPermaLink="false">http://izbicki.me/blog/?p=1444</guid>
		<description><![CDATA[Anyone who wants to be first must be the very last, and the servant of all. &#8211; Mark 9:35 Being the servant of all is a hard task that we often forget to do.  We need to remind ourselves constantly that we are here to serve everyone.  One easy way to do this is to [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignright" src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c0/Bernhard_Strigel_Fu%C3%9Fwaschung.jpg/398px-Bernhard_Strigel_Fu%C3%9Fwaschung.jpg" alt="Jesus washing his disciples' feet" width="239" height="287" /></p>
<blockquote><p>Anyone who wants to be first must be the very last, and the servant of all.</p>
<p>&#8211; Mark 9:35</p></blockquote>
<p>Being the servant of all is a hard task that we often forget to do.  We need to remind ourselves constantly that we are here to serve everyone.  One easy way to do this is to <strong>call everyone else &#8220;Sir&#8221; or &#8220;Ma&#8217;am.&#8221;</strong>  This language serves as a reminder to ourselves, and at the same time uplifts the person we&#8217;re talking to.<span id="more-1444"></span></p>
<p>I discovered this trick when I was an officer in the Navy.  All the enlisted sailors had to call me &#8220;Sir,&#8221; and that frankly made me feel good about myself.  It made me feel important.  I had twenty people I was in charge of, and their sole purpose in life was to do what I told them to do.  To drive this point home, all midshipmen have to memorize this quote in their first year at the Naval Academy:</p>
<blockquote><p>Sir, sir is subservient word surviving from the surly days of old Serbia, when certain serfs, too ignorant to remember their lord&#8217;s names, yet too servile to blaspheme them, circumvented the situation by surrogating the subservient word sir, by which I now belatedly address a certain senior cirroped who correctly surmised that I was syrupy enough to say sir after every word I said, sir.</p></blockquote>
<p>But one of the reasons I left the Navy was the realization that the military&#8217;s power structure looked nothing like what Jesus wanted from his followers.  Jesus came to invert this power structure.  He came to make the first last and the last first&#8212;so we need to do that in our own lives.  As an officer, <em>I</em> should have been calling <em>my sailors</em> sir, because it should have been <em>me</em> serving <em>them</em>.  That&#8217;s the example that Jesus gave.</p>
<p>So now when I&#8217;m walking down the street and see a homeless woman, I greet her with &#8220;Good morning, ma&#8217;am!&#8221;  God put me on that path in order to be a Jesus for her, to be her servant.  I need to remind myself that I was made just for this moment: to accept this woman&#8217;s sins as my own and serve her as Christ would.  And she needs someone to treat her with the dignity of a human created in God&#8217;s image.  Being called ma&#8217;am may just be the only dignity she receives for the rest of her life.</p>
<p><img class="aligncenter" src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Fu%C3%9Fwaschung_Christi_Sizilien_18_Jh.jpg/640px-Fu%C3%9Fwaschung_Christi_Sizilien_18_Jh.jpg" alt="Christ washing his disciples' feet" width="640" height="455" /></p>
]]></content:encoded>
			<wfw:commentRss>http://izbicki.me/blog/a-simple-method-to-radicalize-your-christianity/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>