> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE LiberalTypeSynonyms #-}
> {-# LANGUAGE ImpredicativeTypes #-}
> {-# LANGUAGE ConstraintKinds #-}
>
> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..), Monad(..) )
>
> import Data.Params
> import Data.Params.Functor
> import Data.Params.Applicative
> import Data.Params.Monad

At the value level, tying the knot is a classic technique in Haskell data structures. It lets us build circular data structures using self-reference and lazy evaluation. The classic example is the cycle:

> cycle = x where
>     x = 0 : y
>     y = 1 : x

But there can be no direct analogy between tying the knot at the value level and at the type level. This is because tying the knot requires lazy evaluation, which doesn’t make sense for types.
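For a quick value-level refresher before we move on, here are two more knots tied with nothing but self-reference and laziness (`ones` and `fibs` are standard examples, not code from this post's library):

```haskell
-- Each stream refers back to itself; laziness means the cycle never
-- has to "finish" before we start consuming elements from it.
ones :: [Int]
ones = 1 : ones

-- A two-step knot: fibs is defined in terms of itself, twice.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```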

*(Idea! Maybe we should just start calling Python a lazily typed language!)*

But let’s check out a new type level technique… and if you look at the right angle… and squint just the right amount… then it sorta kinda looks like tying the knot.

Remember all that fun we had with lensified Functors, Applicatives, and Monads? Our new classes are powerful, but we paid a major price for that power: We gave up parametricity. Parametricity is one of the main properties that makes Haskell code so easy to use and fun to write. Giving it up would be a disaster.

So let’s get it back.

First, we’ll modify our type classes. We’ll need to pass a third parameter to each of them. We’ll call it b. This parameter represents the type at the lens position of our Functor/Applicative/Monad.

> class b ~ GetParam lens tb => Functor' lens tb b where
>     fmap' :: TypeLens p lens -> (a -> b) -> SetParam lens a tb -> tb
>
> class Functor' lens tb b => Applicative' lens tb b where
>     pure :: GetParam lens tb -> TypeLens Base lens -> tb
>
>     ap :: ( tf ~ SetParam lens (a -> b) tb
>           , ta ~ SetParam lens a tb
>           , a ~ GetParam lens ta
>           )
>        => TypeLens Base lens -> tf -> ta -> tb
>
> class Applicative' lens tfb b => Monad' lens tfb b where
>     join :: tffb ~ CoJoin lens tfb
>          => TypeLens Base lens -> tffb -> tfb

Now we can guarantee parametricity when we declare instances of the classes. All we have to do is make sure that the b parameter is a variable and not a type constructor. For example, this is parametric:

instance Functor' (Param_a Base) (Either a b) a

but this is not:

instance Functor' (Param_a Base) (Either Int b) Int

Making our instances parametric is not enough for the type checker. We must prove that all instances will always be parametric. These higher-rank type constraints assert this fact:

> type Functor''     p t = forall a t'. ( t' ~ SetParam p a t, Functor'     p t' a )
> type Applicative'' p t = forall a t'. ( t' ~ SetParam p a t, Applicative' p t' a )
> type Monad''       p t = forall a t'. ( t' ~ SetParam p a t, Monad'       p t' a )

These type synonym constraints are what I’m calling “tying the type knot.” The universally quantified a and t' let us represent an “infinite” number of constraints with a finite size, just like tying the knot lets us represent an infinite data structure in finite memory.

This same technique also works for the constrained monad problem:

> type CnstFunctor     p c t = forall a t'. ( t' ~ SetParam p a t, Functor'     p t' a, c a )
> type CnstApplicative p c t = forall a t'. ( t' ~ SetParam p a t, Applicative' p t' a, c a )
> type CnstMonad       p c t = forall a t'. ( t' ~ SetParam p a t, Monad'       p t' a, c a )

So what, exactly, do these new parametric constraints buy us?

Remember how we used the idea of type-level rewrite rules to simplify the type of our applicative sequencing operator (*>)? Now we can simplify it even further.

This is the type that we came up with two posts ago:

(*>) :: ( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
        , Applicative lens ( SetParam lens ( b -> b ) tb )
        , Applicative lens tb
        , tb ~ SetParam lens b tb
        ) => SetParam lens a tb -> tb -> TypeLens Base lens -> tb

It’s pretty clean except for the multiple Applicative constraints. But if we use the type-knotted constraints, we can combine all the Applicatives into one:

(*>) :: ( Applicative'' lens tb
        , tb ~ SetParam lens b tb
        ) => SetParam lens a tb -> tb -> TypeLens Base lens -> tb

Unfortunately, we can’t test these parametric constraints today because of a GHC bug/missing feature.

With a heavy dose of sugar, we can use our type lenses to create a type level record syntax. This will make our (*>) operator’s type even clearer… almost like the original.

The sugaring rules are pretty simple. Just replace any type expression of the form:

t { lens = a }

with a call to the SetParam type function:

SetParam lens a t

And that’s it!

Now, the lensified and standard versions of (*>) are pretty similar looking. Here they are in a side-by-side comparison:

original:

(*>) :: Applicative t => t a -> t b -> t b

lensified:

(*>) :: Applicative lens t => t { lens = a } -> t { lens = b } -> t { lens = b }

Sweet!

Next time, we’ll see how to promote everything we’ve done to the kind level.

…

…

…

…

Just kidding!

I’m actually getting married this weekend! I wanted to share my excitement with all you haskellers, so I put together this bad little tying the knot pun! Thanks for putting up with me! Yay!

*(disclaimer: there’s probably lots of little mistakes floating around in these posts… type theory isn’t really my area… and I just thought of this idea last week… and… now my brain hurts…)*

We’ll be using all the same extensions as before:

> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}

But we’ll be adding some pretty nasty ones today:

> {-# LANGUAGE OverlappingInstances #-}
> {-# LANGUAGE RebindableSyntax #-}

We need RebindableSyntax to get do notation, but OverlappingInstances is just a product of the Monad class’s definition. I’ll give infinite haskell points to anyone who can refactor this code so we don’t need the extension!

We’ll also be needing all of our previous work on Functors and Applicatives. It has been uploaded to hackage and is sitting in the appropriate modules:

> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..), Monad(..) )
> import qualified Prelude as P
> import GHC.Exts
>
> import Data.Params
> import Data.Params.Applicative
> import Data.Params.Functor

And we’re off!

We will define our monads in terms of their join function. In the standard libraries, join has the type:

join :: m (m a) -> m a
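As a value-level reminder of what this signature means, here is the standard `join` from `Control.Monad` collapsing exactly one layer of nesting (illustrative values of my own, not from the post):

```haskell
import Control.Monad (join)

-- Standard join removes one layer of monadic nesting,
-- always at the outermost position.
nestedMaybe :: Maybe (Maybe Int)
nestedMaybe = Just (Just 3)

nestedList :: [[Int]]
nestedList = [[1,2],[3]]
```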

The input has the same type as the output, except that the Monad m is repeated twice. There are two differences in the lensified join function: First, the monad we’re working with might be nested arbitrarily deeply in other data types. Second, the argument it is monadic in might not be the last one. Here is an example of what the join type signature would look like for the Left Either monad sitting within a Maybe Monad:

join :: TypeLens Base (Param_a (Param_a Base))
     -> Maybe (Either (Either String Int) Int)
     -> Maybe (Either String Int)

Since we’re all wannabe category theorists here, we’ll create a CoJoin type family that transforms the output of the join function by duplicating the type at the location specified by the lens:

> type family CoJoin (lens :: * -> Constraint) t
> type instance CoJoin lens t
>     = SetParam'
>         lens
>         ( SetParam'
>             ( Objective lens )
>             ( GetParam lens t )
>             ( GetParam (RemoveObjective lens) t )
>         )
>         t

*(We covered the Objective and RemoveObjective families in a previous post. As a reminder, the Objective family returns the innermost type lens from our input, and the RemoveObjective family returns the lens that results when the innermost lens is taken away.)*

CoJoin only has one instance, so we could have just used a type synonym. That would make debugging harder, however. The advantage of a type family is that when we ask GHCi what the type is, it will perform the substitutions for us. For example:

ghci> :t undefined :: CoJoin (Param_a Base) (Maybe (Either String Int))
  :: Maybe (Maybe (Either String Int))

ghci> :t undefined :: CoJoin (Param_a (Param_a Base)) (Maybe (Either String Int))
  :: Maybe (Either (Either String Int) Int)

Now we’re ready to see our new Monad class:

> class Applicative lens tfb => Monad lens tfb where
>     join ::
>         ( tffb ~ CoJoin lens tfb
>         ) => TypeLens Base lens
>           -> tffb -> tfb

The Left and Right Either instances are:

> instance Monad (Param_a Base) (Either a b) where
>     join lens (Left (Left a))  = Left a
>     join lens (Left (Right b)) = Right b
>     join lens (Right b)        = Right b
>
> instance Monad (Param_b Base) (Either a b) where
>     join lens (Right (Right b)) = Right b
>     join lens (Right (Left a))  = Left a
>     join lens (Left a)          = Left a

And here are some examples of join in action:

ghci> join _b (Right $ Right "monads") :: Either String String
Right "monads"

ghci> join _b (Right $ Left "are") :: Either String String
Left "are"

ghci> join _a (Left $ Left "so") :: Either String String
Left "so"

ghci> join _a (Right "awesome") :: Either String String
Right "awesome"

The instances above don’t consider the case when our lenses point inside of the Either type. We’ll need to define two new recursive instances to handle this case. These instances are the reason we needed the OverlappingInstances language extension:

> instance
>     ( Monad p a
>     , Either (CoJoin p a) b ~ CoJoin (Param_a p) (Either a b) -- follows from the lens laws
>     ) => Monad (Param_a p) (Either a b)
>   where
>
>     join lens (Left a)  = Left $ join (zoom lens) a
>     join lens (Right b) = Right b
>
> instance
>     ( Monad p b
>     , Either a (CoJoin p b) ~ CoJoin (Param_b p) (Either a b) -- follows from the lens laws
>     ) => Monad (Param_b p) (Either a b)
>   where
>
>     join lens (Left a)  = Left a
>     join lens (Right b) = Right $ join (zoom lens) b

The equality constraints in the instances above are implied by the lens laws. As we discussed yesterday, with the type rules language extension, those constraints could be removed completely, making the code a bit nicer.

Here are some examples of using join in the nested case:

ghci> join (_a._b) (Left $ Right $ Right "lenses") :: Either (Either a String) b
Left (Right "lenses")

ghci> join (_a._b) (Left $ Right $ Left "are") :: Either (Either String b) b
Left (Left "are")

ghci> join (_b._b) (Left "neat") :: Either String (Either a String)
Left "neat"

Sometimes we will get the same answer if we join in two separate locations. In the first example below, we join the second two Right constructors, whereas in the second example, we join the first two Right constructors. The results are the same:

ghci> join (_b._b) (Right $ Right $ Right "easy") :: Either a (Either a String)
Right (Right "easy")

ghci> join _b (Right $ Right $ Right "peasy") :: Either a (Either a String)
Right (Right "peasy")

We’ll also be needing a Monad instance for Maybe, so here it is:

> instance Monad (Param_a Base) (Maybe a) where
>     join lens Nothing         = Nothing
>     join lens (Just Nothing)  = Nothing
>     join lens (Just (Just a)) = Just a
>
> instance
>     ( Monad p a
>     , Maybe (CoJoin p a) ~ CoJoin (Param_a p) (Maybe a) -- follows from the lens laws
>     ) => Monad (Param_a p) (Maybe a)
>   where
>     join lens Nothing  = Nothing
>     join lens (Just a) = Just $ join (zoom lens) a

From join and our Applicative instance, we can derive our monadic bind function. We don’t want to use the traditional (>>=) operator for bind just yet. We will need to do something fancy with it to make do notation work out. So instead, we will use the (\\=) operator for bind. Its definition is:

(\\=) :: ( Monad lens tb
         , a ~ GetParam lens tfa
         , {- ... lens laws go here ... -}
         ) => ta -> (a -> tb) -> TypeLens Base lens -> tb

> infixl 1 \\=
> (m \\= f) lens = join lens $ fmap lens f m
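The `join`-after-`fmap` recipe above is the standard way to derive bind. Stripped of all the lens plumbing, it looks like this for the left-biased Either monad (a value-level sketch; `fmapL`, `joinL`, and `bindL` are hypothetical names for illustration):

```haskell
-- fmapL maps over the a ("Left") parameter, like fmap with an _a lens.
fmapL :: (a -> a') -> Either a b -> Either a' b
fmapL f (Left a)  = Left (f a)
fmapL _ (Right b) = Right b

-- joinL collapses a nested Left layer, like join with an _a lens.
joinL :: Either (Either a b) b -> Either a b
joinL (Left (Left a))  = Left a
joinL (Left (Right b)) = Right b
joinL (Right b)        = Right b

-- Bind is join composed with fmap, exactly as in (\\=) above.
bindL :: Either a b -> (a -> Either a' b) -> Either a' b
bindL m f = joinL (fmapL f m)
```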

We will create the “minus bind operators” in the same way we created minus operators for the Applicative class. Remember, the minus sign points to the parameters that will get a lens applied to them because they are “minus a lens”. These minus operators are defined as:

> infixl 1 \\=-
> infixl 1 -\\=-
> infixl 1 -\\=
>
> (m \\=- f)  lens = ( m \\= \a -> f a $ objective lens ) lens
> (m -\\=- f) lens = ( m lens \\= \a -> f a $ objective lens ) lens
> (m -\\= f)  lens = ( m lens \\= \a -> f a ) lens

For our example, we’ll build a simple monadic filter. The filterSmall function below sits in the Either Monad, but we’ll be using Left to represent successes (the input passes through the filter), and Right to represent failure (the input doesn’t pass through).

> filterSmall :: (Show a, Ord a) => a -> a -> Either a String
> filterSmall k x = if x > k
>     then Left x
>     else Right $ show x ++ " is too small"

We can call our function using the monadic bind by:

> chain1 :: Either Int String
> chain1 = at _a $ Left 20 \\= filterSmall 10

ghci> chain1
Left 20

Instead of using the Left constructor, we can make things a little more generic by using the return function. As usual, it is equivalent to pure:

> return :: Monad lens t => GetParam lens t -> TypeLens Base lens -> t
> return = pure

Since pure’s last parameter is a type lens, we must use the left-minus (-\\=) variant of bind to sequence the computation:

> chain2 :: Either Int String
> chain2 = at _a $ return 20 -\\= filterSmall 10

ghci> chain2
Left 20

Similarly, all the bind operators take a type lens as their last parameter. So any future binds must also use left-minus bind:

> chain3 :: Either Int String
> chain3 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15

ghci> chain3
Left 20

And so on:

> chain4 :: Either Int String
> chain4 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25

ghci> chain4
Right "20 is too small"
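For comparison, here is the same filter chain in the ordinary right-biased Either monad, where Right (not Left) marks success. `filterSmall'` is a flipped variant I'm introducing for illustration, not code from the post:

```haskell
-- Flipped convention: Right = passes the filter, Left = error message.
filterSmall' :: (Show a, Ord a) => a -> a -> Either String a
filterSmall' k x =
    if x > k
        then Right x
        else Left (show x ++ " is too small")

-- The standard (>>=) short-circuits on the first Left, just as the
-- post's (-\\=) chain short-circuits on the first Right.
chainStd :: Either String Int
chainStd = return 20 >>= filterSmall' 10 >>= filterSmall' 15 >>= filterSmall' 25
```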

We can easily nest our monads. Let’s put all of the computations above inside a Maybe wrapper. All we have to do is change the type signature and the lens:

> chain2' :: Maybe (Either Int String)
> chain2' = at (_a._a) $ return 20 -\\= filterSmall 10
>
> chain3' :: Maybe (Either Int String)
> chain3' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15
>
> chain4' :: Maybe (Either Int String)
> chain4' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25

We’re using the RebindableSyntax language extension to construct a custom do notation. We do this by defining our own (>>=) operator. The most generic bind operator we have is the double minus bind (-\\=-). Sometimes we will want to feed a lens to both sides of the bind, so that’s what we’ll use:

> infixl 1 >>=
> (m >>= f) lens = (m -\\=- f) lens

Notice that our (>>=) operator and the one from Prelude take different numbers of arguments! GHC is awesome enough that this is not a problem.

RebindableSyntax also requires us to define functions for failed pattern matching and if statements. Our definitions will be pretty simple:

> fail = error
>
> ifThenElse False _ f = f
> ifThenElse True  t _ = t

Now, we can take our chain2' function above and rewrite it in do notation. Here it is again for easy reference:

chain2' :: Maybe (Either Int String)
chain2' = at (_a._a) $ return 20 -\\= filterSmall 10

First, rewrite it to use (-\\=-) instead of (-\\=) by causing the right hand side to take a lens parameter even though it won’t use it:

> chain2'' :: Maybe (Either Int String)
> chain2'' = at (_a._a) $ return 20 -\\=- (\x lens -> filterSmall 10 x)

Then, rewrite it using do notation:

> chain2''' :: Maybe (Either Int String)
> chain2''' = at (_a._a) $ do
>     x <- return 20
>     \lens -> filterSmall 10 x

It looks a little bit nicer if we use const to absorb the lens parameter:

> chain2'''' :: Maybe (Either Int String)
> chain2'''' = at (_a._a) $ do
>     x <- return 20
>     const $ filterSmall 10 x

Here are our other examples converted into do notation using the same technique:

> chain3''' :: Maybe (Either Int String)
> chain3''' = at (_a._a) $ do
>     x <- return 20
>     y <- const $ filterSmall 10 x
>     const $ filterSmall 15 y
>
> chain4'' :: Maybe (Either Int String)
> chain4'' = at (_a._a) $ do
>     x <- return 20
>     y <- const $ filterSmall 10 x
>     z <- const $ filterSmall 15 y
>     const $ filterSmall 25 z

And here is a more complicated expression with a nested do:

> chain5 :: Either a (Either a (Maybe (Either Int String)))
> chain5 = at (_b._b._a._a) $ do
>     x <- return 20
>     y <- do
>         a <- const $ filterSmall x 10
>         b <- const $ filterSmall 1 3
>         return $ a+b
>     z <- const $ filterSmall y x
>     return $ z-x

But there is still a limitation. Due to the way the types work out, the first line of a do block must always be a return statement when using the at function to specify our lens. This is a byproduct of the extra lens parameter our (>>=) operator is passing around. Fortunately, we can automate this construction with the following function:

> atM lens m = at (removeObjective lens) $ do
>     return $ at (objective lens) $ m

This lets us rewrite chain5 as:

> chain5' :: Either a (Either a (Maybe (Either Int String)))
> chain5' = atM (_b._b._a._a) $ do
>     let x = 20
>     y <- do
>         a <- const $ filterSmall x 10
>         b <- const $ filterSmall 1 3
>         return $ a+b
>     z <- const $ filterSmall y x
>     return $ z-x

Now we fully support do notation!

Hooray!!

How do we get rid of those ugly const functions?

Can optimus prime use type lenses to save our purity from the effects of the evil decepticons?

Does any one actually care about lensified arrow-do?

Stay tuned to find out.


We only need these two language extensions for the technique:

> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE UndecidableInstances #-}

But for our motivating example, we’ll also use these extensions and some basic imports:

> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE ConstraintKinds #-}
>
> import Data.Proxy
> import GHC.Exts

Let’s begin.

Consider the classes:

> class Param_a (p :: * -> Constraint) t
> class Param_b (p :: * -> Constraint) t
> class Param_c (p :: * -> Constraint) t
>
> class Base t

These classes can be chained together like so:

> type Telescope_abc = Param_a (Param_b (Param_c Base))

It is easy to write a type family that returns the “head” of this list. On a telescope, the lens closest to you is called the eye piece, so that’s what we’ll call our type family:

> type family EyePiece ( p :: * -> Constraint ) :: * -> Constraint
> type instance EyePiece (Param_a p) = Param_a Base
> type instance EyePiece (Param_b p) = Param_b Base
> type instance EyePiece (Param_c p) = Param_c Base

Again, this type family is “open” because new instances can be defined in any file.

We might use this EyePiece type family as:

ghci> :t Proxy :: Proxy (EyePiece Telescope_abc)
  :: Proxy (Param_a Base)

Now, let’s try to write a type class that does the opposite. Instead of extracting the first element in the chain, it will extract the last. On a telescope the lens farthest away from you is called the objective, so that’s what we’ll call our type family. We’ll also need to define it as a closed type family:

type family Objective (lens :: * -> Constraint) :: * -> Constraint where
    Objective (Param_a Base) = Param_a Base
    Objective (Param_b Base) = Param_b Base
    Objective (Param_c Base) = Param_c Base
    Objective (Param_a p) = Objective p
    Objective (Param_b p) = Objective p
    Objective (Param_c p) = Objective p

We can use the Objective family like:

ghci> :t Proxy :: Proxy (Objective Telescope_abc)
  :: Proxy (Param_c Base)

The Objective family must be closed. Equations in a closed type family are matched top to bottom, and the only way to identify when we are at the end of the telescope is by checking if the p parameter is the Base class. If it is, then we’re done. If not, we must keep moving down the telescope recursively. Without a closed type family, we would have to explicitly list all of the recursive paths. This means writing new type instances whenever we want to add a new Param_xxx class. That’s nasty and error prone.
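The top-to-bottom matching argument is easier to see in a value-level model, where the telescope is an ordinary recursive data type (a sketch for intuition; this `Lens` type and `objective` function are illustrative, not part of the library):

```haskell
-- A value-level model of a telescope of lenses.
data Lens = Param_a Lens | Param_b Lens | Param_c Lens | Base
    deriving (Eq, Show)

-- As in the closed type family, the Base cases must come first:
-- Param_a Base also matches the general Param_a p pattern, so the
-- order of the equations is what makes the recursion terminate.
objective :: Lens -> Lens
objective (Param_a Base) = Param_a Base
objective (Param_b Base) = Param_b Base
objective (Param_c Base) = Param_c Base
objective (Param_a p)    = objective p
objective (Param_b p)    = objective p
objective (Param_c p)    = objective p
objective Base           = Base
```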

Again, the downside of closed type families is that they must be defined all in one place. We can work around this limitation by “factoring” the closed type family into a collection of closed and open type families. In the example above, this works out to be:

> type family Objective (lens :: * -> Constraint) :: * -> Constraint
> type instance Objective (Param_a p) = Objective_Param_a (Param_a p)
> type instance Objective (Param_b p) = Objective_Param_b (Param_b p)
> type instance Objective (Param_c p) = Objective_Param_c (Param_c p)
> type instance Objective Base = Base
>
> type family Objective_Param_a (lens :: * -> Constraint) :: * -> Constraint where
>     Objective_Param_a (Param_a Base) = Param_a Base
>     Objective_Param_a (Param_a p) = Objective p
>
> type family Objective_Param_b (lens :: * -> Constraint) :: * -> Constraint where
>     Objective_Param_b (Param_b Base) = Param_b Base
>     Objective_Param_b (Param_b p) = Objective p
>
> type family Objective_Param_c (lens :: * -> Constraint) :: * -> Constraint where
>     Objective_Param_c (Param_c Base) = Param_c Base
>     Objective_Param_c (Param_c p) = Objective p

ghci> :t Proxy :: Proxy (Objective Telescope_abc)
  :: Proxy (Param_c Base)

With this factoring, we are able to define the Objective instance for each Param_xxx in separate files and retain the benefits of closed type families.

Here is another example. The RemoveObjective family acts like the init function from the Prelude:

> type family RemoveObjective (lens :: * -> Constraint) :: * -> Constraint
> type instance RemoveObjective (Param_a p) = RemoveObjective_Param_a (Param_a p)
> type instance RemoveObjective (Param_b p) = RemoveObjective_Param_b (Param_b p)
> type instance RemoveObjective (Param_c p) = RemoveObjective_Param_c (Param_c p)
>
> type family RemoveObjective_Param_a (lens :: * -> Constraint) :: * -> Constraint where
>     RemoveObjective_Param_a (Param_a Base) = Base
>     RemoveObjective_Param_a (Param_a p) = Param_a (RemoveObjective p)
>
> type family RemoveObjective_Param_b (lens :: * -> Constraint) :: * -> Constraint where
>     RemoveObjective_Param_b (Param_b Base) = Base
>     RemoveObjective_Param_b (Param_b p) = Param_b (RemoveObjective p)
>
> type family RemoveObjective_Param_c (lens :: * -> Constraint) :: * -> Constraint where
>     RemoveObjective_Param_c (Param_c Base) = Base
>     RemoveObjective_Param_c (Param_c p) = Param_c (RemoveObjective p)

ghci> :t Proxy :: Proxy (RemoveObjective Telescope_abc)
  :: Proxy (Param_a (Param_b Base))

Of course, you can’t do this trick with every closed type family. For example, the RemoveObjective_Param_c family above cannot be factored any smaller. But if you find yourself wanting the benefits of both closed and open type families, then your type probably has the needed structure.

We’ve seen how to use the typeparams library to soup up our Functor and Applicative type classes. But we’ve been naughty little haskellers—we’ve been using type lenses without discussing their laws! Today we fix this oversight. **Don’t worry if you didn’t read/understand the previous posts.** This post is much simpler and does not require any background.

First, we’ll translate the standard lens laws to the type level. Then we’ll see how these laws can greatly simplify the type signatures of our functions. Finally, I’ll propose a very simple (yes, I promise!) GHC extension that promotes rewrite rules to the type level. These type level rewrite rules would automatically simplify our type signatures for us. It’s pretty freakin awesome.

Today, we won’t actually import anything from the typeparams library. Instead, we’ll be building up everything from scratch.

> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE PolyKinds #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE ConstraintKinds #-}
> {-# LANGUAGE FlexibleContexts #-}
>
> import Control.Category
> import Prelude hiding ( (.), id )
> import GHC.Exts

Given a data type:

> data Example a b c = Example a b c

We construct the following empty classes:

> class Param_a (p :: * -> Constraint) t -- has kind :: * -> Constraint
> class Param_b (p :: * -> Constraint) t
> class Param_c (p :: * -> Constraint) t

These classes are the type level lenses. Each one uniquely identifies a parameter of the Example data type. To use these lenses, we will need to be able to represent them at the value level. So we create the singleton type:

> data TypeLens p q = TypeLens

Now, we can create three values that uniquely identify the three type parameters:

> _a = TypeLens :: TypeLens p (Param_a p)
> _b = TypeLens :: TypeLens p (Param_b p)
> _c = TypeLens :: TypeLens p (Param_c p)

We’re calling these things lenses, so they must be composable. In fact, they compose really easily. Check out their Category instance:

> instance Category TypeLens where
>     id = TypeLens
>     t1.t2 = TypeLens

When we chain values together using the (.) composition operator, we create a chain of classes at the type level. For example:

ghci> :t _a._b
_a._b :: TypeLens p (Param_a (Param_b p))

ghci> :t _a._b._c
_a._b._c :: TypeLens p (Param_a (Param_b (Param_c p)))

ghci> :t _a._a._b._c._a._b
_a._a._b._c._a._b :: TypeLens p (Param_a (Param_a (Param_b (Param_c (Param_a (Param_b p))))))

These chains of classes correspond to a nesting of data types. For the Example type we created above, _a._b would refer to the type param b1 in the type:

Example (Example a1 b1 c1) b2 c2

_a._b._c would refer to b2 in the type:

Example (Example a1 b1 (Example a2 b2 c2)) b0 c0

and _a._a._b._c._a._b would refer to the parameter b6 in the monster type:

Example
    ( Example
        ( Example a2
            ( Example a3 b3
                ( Example
                    ( Example a5
                        ( Example a6 b6 c6 )
                        c5
                    )
                    b4 c4
                )
            )
            c2
        )
        b1 c1
    )
    b0 c0

The whole point of lenses is they give us an easy way to get and set parameters. At the type level, we do that with these type families:

> type family GetParam (p :: * -> Constraint) (t :: *) :: *
> type family SetParam (p :: * -> Constraint) (a :: *) (t :: *) :: *

For our Example data type, the implementations look like:

> type instance GetParam (Param_a p) (Example a b c) = GetParam p a
> type instance GetParam (Param_b p) (Example a b c) = GetParam p b
> type instance GetParam (Param_c p) (Example a b c) = GetParam p c
>
> type instance SetParam (Param_a p) a' (Example a b c) = Example (SetParam p a' a) b c
> type instance SetParam (Param_b p) b' (Example a b c) = Example a (SetParam p b' b) c
> type instance SetParam (Param_c p) c' (Example a b c) = Example a b (SetParam p c' c)

These definitions are recursive, so we need a base case to halt the recursion:

> class Base t
>
> type instance GetParam Base t = t
> type instance SetParam Base t' t = t'

Here are some example usages of the GetParam family:

ghci> :t undefined :: GetParam (Param_a Base) (Example Int b c)
  :: Int

ghci> :t undefined :: GetParam (Param_b Base) (Example Int Float c)
  :: Float

ghci> :t undefined :: GetParam (Param_a (Param_b Base)) (Example (Example a1 Int c1) b2 Float)
  :: Int

ghci> :t undefined :: GetParam (Param_c Base) (Example (Example a1 Int c1) b2 Float)
  :: Float

And similar uses of the SetParam family:

ghci> :t undefined :: SetParam (Param_a Base) Char (Example Int b Float)
  :: Example Char b Float

ghci> :t undefined :: SetParam (Param_c Base) Char (Example Int b Float)
  :: Example Int b Char

ghci> :t undefined :: SetParam (Param_a (Param_b Base)) Char (Example (Example a1 Int c1) b2 Float)
  :: Example (Example a1 Char c1) b2 Float

ghci> :t undefined :: SetParam (Param_c Base) Char (Example (Example a1 Int c1) b2 Float)
  :: Example (Example a1 Int c1) b2 Char

The first lens law is that if we set a type parameter to its current value, then the overall type does not change. In code, this looks like:

> type LensLaw1 lens t = t ~ SetParam lens (GetParam lens t) t

The second lens law states that if we set a type parameter to a certain value, then get the value at the location of the lens, then we should get back our original type. In code:

> type LensLaw2 lens a t = a ~ GetParam lens (SetParam lens a t)

And lastly, if we set the same parameter twice, then the last setter wins. In code:

> type LensLaw3 lens a b t = a ~ GetParam lens (SetParam lens a (SetParam lens b t))

There are many other laws that can be derived from these three simple laws. For example, we can derive this fourth lens law from laws 1 and 3:

> type LensLaw4 lens a b t = SetParam lens a (SetParam lens b t) ~ SetParam lens a t
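At the value level, the same laws can be checked directly against ordinary get/set functions. Here is a sketch for the first parameter of the Example type (`getA` and `setA` are hypothetical helpers standing in for GetParam (Param_a Base) and SetParam (Param_a Base)):

```haskell
data Example a b c = Example a b c deriving (Eq, Show)

-- Value-level counterpart of GetParam (Param_a Base).
getA :: Example a b c -> a
getA (Example a _ _) = a

-- Value-level counterpart of SetParam (Param_a Base).
setA :: a' -> Example a b c -> Example a' b c
setA a' (Example _ b c) = Example a' b c
```

Law 1 says `setA (getA t) t == t`; law 2 says `getA (setA a t) == a`; laws 3 and 4 say the last `setA` wins.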

We’re glossing over some technicalities involving injective type families here, but we’ll return to this later in the post.

Any time we have laws in Haskell, we’ve got to prove that they hold. Sometimes, parametricity does this for us automatically (as in the case of the Functor laws). But usually, we rely on test frameworks like QuickCheck. Therefore, we need these frameworks at the type level.

This turns out to be straightforward. We can use these functions to verify our laws:

> property_lensLaw1 :: LensLaw1 lens t => TypeLens Base lens -> t -> ()
> property_lensLaw1 _ _ = ()
>
> property_lensLaw2 :: LensLaw2 lens a t => TypeLens Base lens -> a -> t -> ()
> property_lensLaw2 _ _ _ = ()
>
> property_lensLaw3 :: LensLaw3 lens a b t => TypeLens Base lens -> a -> b -> t -> ()
> property_lensLaw3 _ _ _ _ = ()

We test the laws as follows. First, specialize all the type variables in the function. Then, ask GHC if the function type checks. If it does, then the law holds for the type variables we chose.

Here is an example:

ghci> property_lensLaw1 _a (undefined :: Example Int Float String)
()

ghci> property_lensLaw2 _a (undefined :: String) (undefined :: Example Int Float String)
()

ghci> property_lensLaw3 _a (undefined :: String) (undefined :: [a]) (undefined :: Example Int Float String)
()

Now, let’s write some GetParam/SetParam instances that do not obey the laws and see what happens. In the NationalSecurityAgency type below, GetParam works just fine, but SetParam is broken.

> data NationalSecurityAgency x = NationalSecurityAgency
>
> class Param_x (p :: * -> Constraint) t
> _x = TypeLens :: TypeLens p (Param_x p)
>
> type instance GetParam (Param_x p) (NationalSecurityAgency x) = x
> type instance SetParam (Param_x p) x' (NationalSecurityAgency x) = NationalSecurityAgency String

When we test the first lens law using a String, everything works fine:

ghci> property_lensLaw1 _x (undefined :: NationalSecurityAgency String)
()

But when we test it using an Int, the type checker explodes:

ghci> property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)

<interactive>:73:1:
    Couldn't match type ‘[Char]’ with ‘Int’
    Expected type: SetParam
                     (Param_x Base)
                     (GetParam (Param_x Base) (NationalSecurityAgency Int))
                     (NationalSecurityAgency Int)
      Actual type: NationalSecurityAgency Int
    In the expression:
      property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)
    In an equation for ‘it’:
      it = property_lensLaw1 _x (undefined :: NationalSecurityAgency Int)

You can imagine a Template Haskell QuickCheck that calls these property functions many times on random types to give a probabilistic test that our type laws hold.

These laws will greatly simplify inferred types in our programs. We’ll see why using an example.

Consider the beloved Applicative sequencing operator (*>). In the standard libraries, it has the type:

(*>) :: Applicative f => f a -> f b -> f b

Sweet and simple.
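As a reminder of its behavior, (*>) runs both actions but keeps only the second result (illustrative values of my own):

```haskell
-- (*>) sequences two applicative actions, discarding the first result
-- but keeping the first action's effect (failure, errors, ...).
maybeSeq :: Maybe Int
maybeSeq = Just 1 *> Just 2          -- both succeed: keep the second value

maybeFail :: Maybe Int
maybeFail = Nothing *> Just 2        -- first effect fails: whole thing fails

eitherSeq :: Either String Int
eitherSeq = Left "boom" *> Right 2   -- first error wins
```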

In the applicative class we generated yesterday, however, the sequencing operator is pretty nasty looking. GHCi reports it has the type of:

> (*>) ::
>     ( Applicative lens
>         ( SetParam
>             lens
>             (a1 -> GetParam lens (SetParam lens (a -> GetParam lens tb1) tb1))
>             (SetParam lens (a -> GetParam lens tb1) tb1)
>         )
>     , Applicative lens (SetParam lens (a -> GetParam lens tb1) tb1)
>     , Applicative lens tb1
>     , (b1 -> a2 -> a2) ~ GetParam
>         lens
>         (SetParam
>             lens
>             (a1 -> GetParam lens (SetParam lens (a -> GetParam lens tb1) tb1))
>             (SetParam lens (a -> GetParam lens tb1) tb1))
>     , a1 ~ GetParam lens (SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1))
>     , tb0 ~ SetParam lens a tb1
>     , ta ~ SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1)
>     , a ~ GetParam lens (SetParam lens a tb1)
>     ) => ta
>       -> tb0
>       -> TypeLens Base lens
>       -> tb1
> (*>) = undefined
>
> class Applicative lens t

Yikes! What the hell does that beast do?!

Somehow, we need to simplify this type signature, and the type lens laws are what lets us do this. For example, one of the constraints above is:

a1 ~ GetParam lens (SetParam lens a1 (SetParam lens (a -> GetParam lens tb1) tb1))

We can use the third lens law to simplify this to:

a1 ~ GetParam lens (SetParam lens a1 tb1)

If we repeat this process many times, we get a type signature that looks like:

> newop ::
>     ( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
>     , Applicative lens ( SetParam lens ( b -> b ) tb )
>     , Applicative lens tb
>     , tb ~ SetParam lens b tb
>     , LensLaw2 lens (b->b) tb
>     , LensLaw2 lens b tb
>     , LensLaw3 lens (a -> b -> b) (b -> b) tb
>     , LensLaw3 lens a (b->b) tb
>     , LensLaw4 lens (a->b->b) (b->b) tb
>     , LensLaw4 lens a (b->b) tb
>     ) => SetParam lens a tb
>       -> tb
>       -> TypeLens Base lens
>       -> tb
> newop = (*>)

This looks quite a bit better, but is still less than ideal. Actually, this is as far as you can get with the lens laws in GHC 7.8. You need injective type families to go further. (See this mailing list thread and this ghc trac issue for more details about what injective type families are.) Currently, injective type families are slated to enter GHC 7.10, so the rest of this post will be a bit more speculative about what this future GHC can do.

Let’s take another look at the type synonyms for the lens laws:

type LensLaw1 lens t     = t ~ SetParam lens (GetParam lens t) t
type LensLaw2 lens a t   = a ~ GetParam lens (SetParam lens a t)
type LensLaw3 lens a b t = a ~ GetParam lens (SetParam lens a (SetParam lens b t))
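To see that equalities like these can genuinely hold, here is a self-contained toy analogue using ordinary pairs instead of the Data.Params families (the names Get1, Set1, and law2 are invented for this sketch). Law 2 holds definitionally here, which is why the identity function below type-checks:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- a toy getter/setter for the first component of a pair
type family Get1 t where
    Get1 (a, b) = a

type family Set1 a' t where
    Set1 a' (a, b) = (a', b)

-- law 2 instantiated: Int ~ Get1 (Set1 Int (Char, Bool)),
-- so this identity function type-checks
law2 :: Get1 (Set1 Int (Char, Bool)) -> Int
law2 = id
```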

This code only enforces that the laws hold for certain parameters. But that’s not what we want! All types are equal in the eyes of the law, so what we really want is type synonyms that look like:

type LensLaw1' = forall lens t.     t ~ SetParam lens (GetParam lens t) t
type LensLaw2' = forall lens a t.   a ~ GetParam lens (SetParam lens a t)
type LensLaw3' = forall lens a b t. a ~ GetParam lens (SetParam lens a (SetParam lens b t))

Unfortunately, sticking this into GHC yields the dreaded “type families may not be injective” error message. With injective type families, we would be able to write these laws. (This is a somewhat bold claim that I won’t justify here.) Then our code would simplify further to:

newop' ::
    ( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
    , Applicative lens ( SetParam lens ( b -> b ) tb )
    , Applicative lens tb
    , tb ~ SetParam lens b tb
    , LensLaw1'
    , LensLaw2'
    , LensLaw3'
    ) => SetParam lens a tb
      -> tb
      -> TypeLens Base lens
      -> tb
newop' = (*>)

We can still do better. The lens laws are not something that applies only to specific functions. They are global properties of the type families, and they apply everywhere. Therefore, they should be implicitly added as constraints into every type signature.

We could make this happen by adding a new syntax called “type rules”. In the same way that value level rewrite rules simplify our values, these type rules would simplify our types. The syntax could look something like:

type rule LensLaw1' = forall lens t.     t ~ SetParam lens (GetParam lens t) t
type rule LensLaw2' = forall lens a t.   a ~ GetParam lens (SetParam lens a t)
type rule LensLaw3' = forall lens a b t. a ~ GetParam lens (SetParam lens a (SetParam lens b t))

There are two differences between a type rule and a regular type synonym: First, they can take no type parameters. Second, they are implicitly added to every type signature in your program.

The three rules above would allow us to rewrite our function as:

newop'' ::
    ( Applicative lens ( SetParam lens ( a -> b -> b ) tb )
    , Applicative lens ( SetParam lens ( b -> b ) tb )
    , Applicative lens tb
    , tb ~ SetParam lens b tb
    ) => SetParam lens a tb
      -> tb
      -> TypeLens Base lens
      -> tb
newop'' = (*>)

That is soooo much nicer!

We still have some way to go before our newop function’s type signature is as simple as the standard library’s (*>). But I think we’ve got a realistic shot at it. In a coming post I’ll be proposing a way to combine the multiple Applicative constraints into a single constraint, and a nice looking sugar over the SetParam/GetParam type families.

If you didn’t quite follow the previous posts about Functors and Applicatives, they might make a bit more sense now.


Okay… down to business.

> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE OverloadedStrings #-}

We’ve got a few more imports today. Our work from last time has been uploaded to hackage and is in the Data.Params.Functor module. For parsing, we’ll be torturing the attoparsec library.

> import Control.Category
> import Prelude hiding ( (.), id, Functor(..), Applicative(..) )
> import qualified Prelude as P
>
> import Data.Params
> import Data.Params.Functor
>
> import qualified Control.Applicative as Ap
> import qualified Data.Attoparsec.Text as A
> import Data.Attoparsec.Text (parse,Parser,Result)
> import Data.Monoid
> import Data.Text (Text,pack)

As a quick warm up, let’s talk about the infix fmap operator <$>. The fmap function has type:

fmap :: Functor lens tb
     => TypeLens Base lens
     -> (a -> GetParam lens tb)
     -> SetParam lens a tb
     -> tb

All this <$> operator does is move fmap’s lens parameter to the end of the parameter list. This restructuring will help us chain our operators together and will be a common theme throughout the post. The operator is defined as:

> infixl 4 <$>
> (f <$> t) lens = fmap lens f t

We can use the operator like:

ghci> length <$> (Left $ Right "test") $ _a._b
Left (Right 4)

It will also be useful to have an operator just for specifying the type lens. Since a lens specifies the location “at” which we are operating, we call our new operator @@. It is defined as:

> infixr 0 @@
> (@@) :: (TypeLens p q -> b) -> TypeLens p q -> b
> (@@) = id

And used like:

ghci> length <$> (Left $ Right "test") @@ _a._b
Left (Right 4)

The fourth lens law states that we must provide both prefix and infix versions of every combinator. Therefore we also introduce the function:

> at :: TypeLens q p -> (TypeLens q p -> t) -> t
> at lens f = f lens

ghci> at (_a._b) $ length <$> (Left $ Right "test")
Left (Right 4)

We’re ready to see our new Applicative class:

> class Functor lens tb => Applicative lens tb where
>
>     pure :: GetParam lens tb -> TypeLens Base lens -> tb
>
>     ap ::
>         ( tf ~ SetParam lens (a -> b) tb
>         , ta ~ SetParam lens a tb
>         , a ~ GetParam lens ta
>         , b ~ GetParam lens tb
>         )
>         => TypeLens Base lens
>         -> tf
>         -> ta
>         -> tb

The functions pure and ap have the exact same meaning and laws as their counterparts in the standard libraries. The only difference is the addition of the TypeLens parameter and corresponding constraints.

The Left and Right Applicative instances for the Either type are defined as:

> instance Applicative p a => Applicative (Param_a p) (Either a b) where
>     pure a lens = Left $ pure a (zoom lens)
>
>     ap lens (Right a) _         = Right a
>     ap lens (Left f) (Right a)  = Right a
>     ap lens (Left f) (Left b)   = Left $ ap (zoom lens) f b
>
> instance Applicative p b => Applicative (Param_b p) (Either a b) where
>     pure b lens = Right $ pure b (zoom lens)
>
>     ap lens (Left a) _          = Left a
>     ap lens (Right f) (Left a)  = Left a
>     ap lens (Right f) (Right b) = Right $ ap (zoom lens) f b

And just like with Functors, we have to define the base case for our recursive definitions:

> instance Applicative Base t where
>     pure a _ = a
>     ap _ f = f

Now, to get the Applicative notation we all know and love, we redefine the <*> operator. It is just a thin wrapper around the ap function. Like the <$> operator, we just move the lens parameter to the end:

> infixl 4 <*>
> (tf <*> ta) lens = ap lens (tf lens) ta

Easy as cake!

Let’s try it out!

We’ll start with the doubly nested Either. For nested Eithers, the lens we use specifies what the success constructors are. Any other constructors will act as errors.

Here’s an example without an error:

> fact1 :: Either (Either a String) b
> fact1 = (++) <$> Left (Right "haskell") <*> Left (Right " rocks!") @@ _a._b

ghci> fact1
Left (Right "haskell rocks!")

Here we have one possible way of signaling an error:

> fact2 :: Either (Either a String) String
> fact2 = (++) <$> Left (Right "python") <*> Right "error" @@ _a._b

ghci> fact2
Right "error"

And here we have the other way:

> fact3 :: Either (Either String String) b
> fact3 = (++) <$> Left (Right "c++") <*> Left (Left "error") @@ _a._b

ghci> fact3
Left (Left "error")

Of course, Applicatives are much more useful when our functions have many arguments. Let’s create a function that concatenates four strings together into a phrase:

> cat4 :: String -> String -> String -> String -> String
> cat4 a b c d = a ++ " " ++ b ++ " " ++ c ++ " " ++ d
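For comparison, here is the same function used with the ordinary single-parameter Either Applicative from base, with no type lenses involved (stdPhrase is a name invented for this sketch):

```haskell
import Control.Applicative

cat4 :: String -> String -> String -> String -> String
cat4 a b c d = a ++ " " ++ b ++ " " ++ c ++ " " ++ d

-- the standard Either Applicative threads Right values through
stdPhrase :: Either String String
stdPhrase = cat4 <$> Right "haskell" <*> Right "is" <*> Right "super" <*> Right "awesome"
-- Right "haskell is super awesome"
```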

And create a phrase with no errors:

> phrase1 :: Either (Either a String) b
> phrase1 = cat4
>     <$> Left (Right "haskell")
>     <*> Left (Right "is")
>     <*> Left (Right "super")
>     <*> Left (Right "awesome")
>     @@ _a._b

ghci> phrase1
Left (Right "haskell is super awesome")

And a phrase with two errors:

> phrase2 :: Either (Either String String) String
> phrase2 = cat4
>     <$> Left (Right "python")
>     <*> Right "error"
>     <*> Left (Right "is")
>     <*> Left (Left "error")
>     @@ _a._b

ghci> phrase2
Right "error"

Notice that in phrase2 we had two different causes of errors. The error with the fewest number of terms will always win. As a proof by example, let’s shuffle around our previous errors. We still get the same result:

> phrase3 :: Either (Either String String) String
> phrase3 = cat4
>     <$> Left (Right "python")
>     <*> Left (Left "error")
>     <*> Left (Right "is")
>     <*> Right "error"
>     @@ _a._b

ghci> phrase3
Right "error"

This is cool, but it’s not yet very generic. Every time we want a success, we have to manually specify the constructors we want to use. We can avoid this tedium using the pure function. Its type signature is:

pure :: Applicative lens tb => GetParam lens tb -> TypeLens Base lens -> tb

The important thing to notice is that the last parameter takes a TypeLens. This follows our magic formula. We can substitute it into our phrase1 variable like:

> phrase1' :: Either (Either a String) b
> phrase1' = cat4
>     <$> (pure "haskell" @@ _a._b)
>     <*> (pure "is" @@ _a._b)
>     <*> (pure "super" @@ _a._b)
>     <*> (pure "awesome" @@ _a._b)
>     @@ _a._b

But this is nasty! We have to specify the same TypeLens everywhere we want to use the pure function.

Thankfully, we don’t have to do this. The whole point of lenses is to create ridiculous new combinators that reduce boilerplate! So let’s do that! The “ap minus” combinator will automatically apply the lens for us:

> infixl 4 <*>-
> (tf <*>- ta) lens = (tf <*> ta lens) lens

The minus sign signifies that the right side is “minus a lens” and so we should give it one automatically. Using this combinator, we can rewrite our phrase to look like:

> phrase1'' :: Either (Either a String) b
> phrase1'' = cat4
>     <$> (pure "haskell" @@ _a._b)
>     <*>- pure "is"
>     <*>- pure "super"
>     <*>- pure "awesome"
>     @@ _a._b

In order to get rid of the first lens application, we’ll need to perform the same trick to <$>:

> infixl 4 <$>-
> (f <$>- t) lens = (f <$> t lens) lens

And we get the beautiful:

> phrase1''' :: Either (Either a String) b
> phrase1''' = cat4
>     <$>- pure "haskell"
>     <*>- pure "is"
>     <*>- pure "super"
>     <*>- pure "awesome"
>     @@ _a._b

There’s two more Applicative combinators needed for parsing: *> and <* . They use the same definition in the standard libraries, but with a third lens parameter:

> infixl 4 <*
> (u <* v) lens = pure const <*> u <*> v @@ lens
>
> infixl 4 *>
> (u *> v) lens = pure (const id) <*> u <*> v @@ lens

Now we need to create all of the “minus” operators. Remember that the minus sign points to the variable that will have the lens automatically applied for us:

> infixl 4 <*-
> infixl 4 -<*-
> infixl 4 -<*
>
> (u <*- v) lens  = ( u <* v lens ) lens
> (u -<*- v) lens = ( u lens <* v lens ) lens
> (u -<* v) lens  = ( u lens <* v ) lens
>
> infixl 4 *>-
> infixl 4 -*>-
> infixl 4 -*>
>
> (u *>- v) lens  = ( u *> v lens ) lens
> (u -*>- v) lens = ( u lens *> v lens ) lens
> (u -*> v) lens  = ( u lens *> v ) lens

Confused? Just remember: when you master these new combinators, all the n00bs will worship your l33t h4sk311 5ki115.
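If the bookkeeping seems arbitrary, it may help to see the underlying pattern stripped of type lenses: every combinator here is an ordinary function of one final shared argument, and a minus sign simply marks which operand still needs that argument supplied. A minimal sketch with a plain Int standing in for the lens (all names invented for this sketch):

```haskell
-- two "computations" awaiting a shared environment
u, v :: Int -> String
u n = replicate n 'a'
v n = replicate n 'b'

-- analogue of a "-op-" combinator: feed the environment to both
-- operands, then combine the results
(-++-) :: (Int -> String) -> (Int -> String) -> (Int -> String)
(f -++- g) n = f n ++ g n

demo :: String
demo = (u -++- v) 2   -- "aabb"
```

This is just the reader applicative in disguise; the real combinators thread a TypeLens instead of an Int.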

Now that we’ve constructed our torture chamber, it’s time to put attoparsec on the rack. We’ll use the built-in “blind” Functor and Applicative instances to define our lensified ones as:

> mkParams ''Parser
>
> instance Functor p a => Functor (Param_a p) (Parser a) where
>     fmap' lens f parser = P.fmap (fmap' (zoom lens) f) parser
>
> instance Applicative (Param_a Base) (Parser a) where
>     pure a lens = Ap.pure $ pure a (zoom lens)
>     ap lens tf ta = tf Ap.<*> ta

And now we’re ready to start parsing. We’ll start simple. The attoparsec library provides a function called string that matches a specified string. We’ll use it to create a Parser that matches the phrase “haskell rocks”:

> chain1 :: TypeLens Base (Param_a Base) -> Parser Text
> chain1 = A.string "haskell" *> A.string " rocks"

ghci> parse (chain1 @@ _a) "haskell rocks"
Done "" " rocks"

In the above example, we chose to *not* specify the lens in the chain1 variable. This means that if we want to chain it with another parser, we should use the minus then operator like:

> chain2 :: TypeLens Base (Param_a Base) -> Parser Text
> chain2 = chain1 -*> A.string "!"

ghci> parse (chain2 @@ _a) "haskell rocks!"
Done "" "!"

If we choose to compose on the right, then we’ll need to move the minus sign to the right:

> chain3 :: TypeLens Base (Param_a Base) -> Parser Text
> chain3 = A.string "¡" *>- chain2

ghci> parse (chain3 @@ _a) "¡haskell rocks!"
Done "" "!"

We have to use minus operators whenever we chain more than two parsers together. In the example below, the first *> takes three parameters (two parsers and a lens). It gets the lens from the minus of the first -*> operator. That operator also needs a lens, which it gets from the next -*>, and so on.

> chain4 :: TypeLens Base (Param_a Base) -> Parser Text
> chain4 = A.string "do"
>     *> A.string " you"
>     -*> A.string " get"
>     -*> A.string " it"
>     -*> A.string " yet?"

ghci> parse (chain4 @@ _a) "do you get it yet?"
Done "" " yet?"

If we need to apply a lens to both sides, then we use the -*>- operator:

> chain5 :: TypeLens Base (Param_a Base) -> Parser Text
> chain5 = chain3 -*> A.string " ... " -*>- chain4

ghci> parse (chain5 @@ _a) "¡haskell rocks! ... do you get it yet?"
Done "" " yet?"

Everything in the last section we could have done without type lenses. But now we’re going to lift the Parser into an arbitrary data type and work with it.

As a concrete example, we’ll put our Parser inside a Maybe. The Maybe Applicative instance is:

> instance Applicative p a => Applicative (Param_a p) (Maybe a) where
>     pure a lens = Just $ pure a (zoom lens)
>
>     ap lens Nothing  _        = Nothing
>     ap lens (Just f) Nothing  = Nothing
>     ap lens (Just f) (Just b) = Just $ ap (zoom lens) f b

And for convenience we’ll use the following parseMaybe function. It has the same effect as the parse function provided by attoparsec, but does everything from within a Maybe.

> parseMaybe :: Maybe (Parser a) -> Text -> Maybe (Result a)
> parseMaybe parser str = flip parse str <$> parser @@ _a

Next, we lensify our parser combinators. Our new string function lifts the string parser provided by the attoparsec library into an arbitrary parameter specified by our type lens:

> string c lens = pure (A.string c) (zoom lens)

Back to parsing.

Let’s just repeat the same 5 parse chains from above, but now within the Maybe context. Notice two things:

- The A.string function provided by the attoparsec library did not take a type parameter, but our new string function does. This means there’s a lot more minus combinators!
- Instead of specifying our lens to focus on the _a parameter, we must focus on the _a._a parameter to hit the parser.

> chain1' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain1' = string "haskell" -*>- string " rocks"

ghci> parseMaybe (chain1' @@ _a._a) "haskell rocks"
Just Done "" " rocks"

> chain2' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain2' = chain1' -*>- string "!"

ghci> parseMaybe (chain2' @@ _a._a) "haskell rocks!"
Just Done "" "!"

> chain3' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain3' = string "¡" -*>- chain2'

ghci> parseMaybe (chain3' @@ _a._a) "¡haskell rocks!"
Just Done "" "!"

> chain4' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain4' = string "do" -*>- string " you" -*>- string " get" -*>- string " it" -*>- string " yet?"

ghci> parseMaybe (chain4' @@ _a._a) "do you get it yet?"
Just Done "" " yet?"

> chain5' :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain5' = chain3' -*>- string " ... " -*>- chain4'

ghci> parseMaybe (chain5' @@ _a._a) "¡haskell rocks! ... do you get it yet?"
Just Done "" " yet?"

Again, there’s nothing special about being nested inside a Maybe. We could be nested inside any monstrous data type of your choosing. Yay!

But in the example we’ve chosen, what happens if we add a Nothing into the chain? The Nothing takes over and eats the whole Parser. It doesn’t matter whether the Parser was failing or succeeding; the answer is Nothing.

> chain6 :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Parser Text)
> chain6 = string "python" -*> Nothing

ghci> parseMaybe (chain6 @@ _a._a) "python"
Nothing
ghci> parseMaybe (chain6 @@ _a._a) "haskell"
Nothing

Now we’re ready for some super coolness. We’re going to design a parsing circuit that parses three separate parse streams simultaneously!

Here is our Circuit definition:

> data Circuit x y z
>     = Circuit (Maybe x) (Maybe y) (Maybe z)
>     | CircuitFail
>     deriving (Show)
>
> mkParams ''Circuit

The x, y, and z type params will hold the Parsers. These Parsers are wrapped within a Maybe. A value of Nothing means that the parser will not consume any input; a value of (Just parser) means that it will.

The Functor instances are rather interesting because of the Maybe wrapper. We must compose _a with the zoomed lens to make the types work out:

> instance Functor p x => Functor (Param_x p) (Circuit x y z) where
>     fmap' lens f CircuitFail     = CircuitFail
>     fmap' lens f (Circuit x y z) = Circuit (fmap' (_a . zoom lens) f x) y z
>
> instance Functor p y => Functor (Param_y p) (Circuit x y z) where
>     fmap' lens f CircuitFail     = CircuitFail
>     fmap' lens f (Circuit x y z) = Circuit x (fmap' (_a . zoom lens) f y) z
>
> instance Functor p z => Functor (Param_z p) (Circuit x y z) where
>     fmap' lens f CircuitFail     = CircuitFail
>     fmap' lens f (Circuit x y z) = Circuit x y (fmap' (_a . zoom lens) f z)

The Applicative instances are where all the action is at. In each case, the pure function is fairly straightforward. It looks just like the other ones we’ve seen except that it applies the _a to the zoomed lens and gives default values of Nothing to the other parsers. The ap function calls ap on the appropriate parser and uses the First Monoid instance on the other two.

> instance
>     ( Applicative p x
>     , Monoid y
>     , Monoid z
>     ) => Applicative (Param_x p) (Circuit x y z)
>         where
>
>     pure x lens = Circuit (pure x @@ (_a . zoom lens)) Nothing Nothing
>
>     ap lens CircuitFail _ = CircuitFail
>     ap lens _ CircuitFail = CircuitFail
>     ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
>         (ap (_a . zoom lens) x1 x2)
>         (getFirst $ First y1 <> First y2)
>         (getFirst $ First z1 <> First z2)
>
> instance (Monoid x, Applicative p y, Monoid z) => Applicative (Param_y p) (Circuit x y z) where
>     pure a lens = Circuit Nothing (pure a @@ _a . zoom lens) Nothing
>
>     ap lens CircuitFail _ = CircuitFail
>     ap lens _ CircuitFail = CircuitFail
>     ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
>         (getFirst $ First x1 <> First x2)
>         (ap (_a . zoom lens) y1 y2)
>         (getFirst $ First z1 <> First z2)
>
> instance (Monoid x, Monoid y, Applicative p z) => Applicative (Param_z p) (Circuit x y z) where
>     pure a lens = Circuit Nothing Nothing (pure a @@ _a . zoom lens)
>
>     ap lens CircuitFail _ = CircuitFail
>     ap lens _ CircuitFail = CircuitFail
>     ap lens (Circuit x1 y1 z1) (Circuit x2 y2 z2) = Circuit
>         (getFirst $ First x1 <> First x2)
>         (getFirst $ First y1 <> First y2)
>         (ap (_a . zoom lens) z1 z2)
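The First wrapper used in these instances comes from base's Data.Monoid and keeps the leftmost Just; its behavior can be checked in isolation (keepFirst is a helper name invented for this sketch):

```haskell
import Data.Monoid (First(..))

-- mappend on First keeps the leftmost Just value
keepFirst :: Maybe Int -> Maybe Int -> Maybe Int
keepFirst x y = getFirst (First x `mappend` First y)
```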

We write a nice wrapper so we can parse our circuits:

> parseCircuit
>     :: Circuit (Parser x) (Parser y) (Parser z)
>     -> Text
>     -> Text
>     -> Text
>     -> Circuit (Result x) (Result y) (Result z)
> parseCircuit CircuitFail _ _ _ = CircuitFail
> parseCircuit (Circuit x y z) str1 str2 str3 = Circuit
>     ( parseMaybe x str1 )
>     ( parseMaybe y str2 )
>     ( parseMaybe z str3 )

And now here is a simple circuit for us to play with:

> circ1 :: Circuit (Parser Text) (Parser Text) (Parser Text)
> circ1 = Circuit
>     (string (pack "haskell") @@ _a._a)
>     (string (pack "is"     ) @@ _a._a)
>     (string (pack "fun"    ) @@ _a._a)

When we try to parse our circuit, we just match each word in parallel:

ghci> parseCircuit circ1 "haskell" "is" "fun"
Circuit (Just Done "" "haskell") (Just Done "" "is") (Just Done "" "fun")

In this example, we compose our circuit only on the first parameter:

ghci> parseCircuit (circ1 *> circ1 @@ _x._a) "haskell" "is" "fun"
Circuit (Just Partial _) (Just Done "" "is") (Just Done "" "fun")

Notice that (above) we no longer finished after matching the word “haskell”. We’ve got a whole ‘nother haskell to go. Oh Joy!

Here, we match completely:

ghci> parseCircuit (circ1 *> circ1 @@ _x._a) "haskellhaskell" "is" "fun"
Circuit (Just Done "" "haskell") (Just Done "" "is") (Just Done "" "fun")

In our Circuit type, every parser is—at least so far—acting completely independently. That means one parser can fail while the others succeed:

ghci> parseCircuit circ1 "python " "is" "fun"
Circuit (Just Fail "python " [] "Failed reading: takeWith") (Just Done "" "is") (Just Done "" "fun")

Let’s create another simple circuit to play with. In this one, only the first parser performs any actions. The other two are noops:

> circ2 :: Circuit (Parser Text) (Parser y) (Parser z)
> circ2 = Circuit
>     (string (pack " with lenses") @@ _a._a)
>     Nothing
>     Nothing

We can compose circ1 and circ2 exactly as you would suspect. Our original string is now only a partial match:

ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell" "is" "fun"
Circuit (Just Partial _) (Just Done "" "is") (Just Done "" "fun")

But this matches perfectly:

ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell with lenses" "is" "fun"
Circuit (Just Done "" " with lenses") (Just Done "" "is") (Just Done "" "fun")

And this fails:

ghci> parseCircuit (circ1 *> circ2 @@ _x._a) "haskell without lenses" "is" "fun"
Circuit (Just Fail " without lenses" [] "Failed reading: takeWith") (Just Done "" "is") (Just Done "" "fun")

We can simplify the code of circ2 even further (and make it more generic) using the pure function:

> circ3 :: Circuit (Parser Text) (Parser y) (Parser z)
> circ3 = pure (string (pack " with lenses") @@ _a) @@ _x

circ3 behaves exactly like circ2 when sequenced with circ1:

ghci> parseCircuit (circ1 *> circ3 @@ _x._a) "haskell with lenses" "is" "fun"
Circuit (Just Done "" " with lenses") (Just Done "" "is") (Just Done "" "fun")

And that’s enough for today. GHC needs to rest. It’s tired.

We’ve still got so many tantalizing questions to answer:

- What is that CircuitFail gizmo doing?
- How do I use Alternative to branch my parser?
- Can a Circuit’s parser depend on the other parsers in the Circuit?
- Do fiber optic burritos taste good??!?!

Stay tuned to find out!


First, enable some GHC magic:

> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE MultiParamTypeClasses #-}
> {-# LANGUAGE UndecidableInstances #-}
> {-# LANGUAGE FlexibleInstances #-}
> {-# LANGUAGE RankNTypes #-}

And import our libraries:

> import Control.Category
> import Prelude hiding ( (.), id, Functor(..) )
> import Data.Params

We’ll use the Either type as our main example. It’s defined as:

data Either a b = Left a | Right b

The Functor instance is pretty straightforward:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

instance Functor (Either a) where
    fmap f (Left a)  = Left a
    fmap f (Right b) = Right $ f b

But this instance has a key limitation: We can map a function only over the last type parameter.

Bifunctors are the current solution to this problem. A recent, popular proposal suggested adding them to base. But this is an ad hoc solution whose application does not extend far beyond the Either type.
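To make the comparison concrete, here is the bifunctor idea in miniature, as a hand-rolled class in the spirit of the bifunctors package (a sketch, not necessarily that package's exact API):

```haskell
-- a two-parameter functor: map over both type parameters at once
class Bifunctor p where
    bimap :: (a -> b) -> (c -> d) -> p a c -> p b d

instance Bifunctor Either where
    bimap f _ (Left a)  = Left (f a)
    bimap _ g (Right c) = Right (g c)
```

This works for Either, but a type with three parameters would need yet another class, and nested types are not addressed at all.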

Type lenses will (kinda sort of) provide a cleaner solution. That is, they fix the problem about as well as regular old lenses fix the problems of record selectors. As a bonus, we’ll get a convenient mechanism for mapping over nested Functors.

Here is the alternative definition of the Functor class using type lenses:

> class Functor lens t where
>     fmap' :: a ~ GetParam lens t
>         => TypeLens p lens
>         -> (a -> b)
>         -> t
>         -> SetParam lens b t

It’s okay if you don’t understand the type signature at first glance. (That’s how you know you’re using lenses, after all!) Let’s step through it using the Either example.

The first argument is the type lens. This indicates which parameter of the type t we will be mapping over. In the Either data type, we could use the variable _a to map over the Left component or _b to map over the Right.

Next, we encounter two type families, GetParam and SetParam. These act as getters and setters at the type level. In the above example, GetParam is used to extract arbitrary type params from a type. It is defined as:

type family GetParam (p::k1) (t:: *) :: k3
type instance GetParam Param_a (Either a b) = a
type instance GetParam Param_b (Either a b) = b

The SetParam type similarly sets the type of arbitrary params in a type. It is defined as:

type family SetParam (p::k1) (a::k2) (t:: *) :: *
type instance SetParam Param_a a' (Either a b) = Either a' b
type instance SetParam Param_b b' (Either a b) = Either a b'

These instances can be automatically provided for any type by calling the mkParams template haskell function like so:

> mkParams ''Either

Quick aside: With injective type families and a little sugar, we could make this definition of Functor a tad cleaner.

We can replicate the traditional Functor instance with the code:

instance Functor (Param_b Base) (Either a b) where
    fmap' lens f (Left a)  = Left a
    fmap' lens f (Right b) = Right $ f b

and create a “Left” Functor instance as:

instance Functor (Param_a Base) (Either a b) where
    fmap' lens f (Left a)  = Left $ f a
    fmap' lens f (Right b) = Right b

Together, these instances let us run the commands:

ghci> fmap _b length $ Left "Roses are red,"
Left "Roses are red,"
ghci> fmap _b length $ Right "Violets are blue,"
Right 17
ghci> fmap _a length $ Left "Haskell is fun,"
Left 15
ghci> fmap _a length $ Right "Type lenses are cool."
Right "Type lenses are cool."

With the above definitions, we can’t combine our type lenses at all. Enter the funnily named and awkwardly typed zoom combinator:

zoom :: TypeLens a p -> TypeLens a (Zoom p)

This combinator lets us zoom into a composed type lens, removing the outermost layer. For example, given the composed type lens:

ghci> :t _a._b._a._b
_a._b._a._b :: TypeLens a (Param_a (Param_b (Param_a (Param_b a))))

Then zooming in removes the first _a:

ghci> :t zoom (_a._b._a._b)
zoom (_a._b._a._b) :: TypeLens a (Param_b (Param_a (Param_b a)))

We will use this combinator to redefine our Functor instances. The new instances will recursively map over every Functor in our input lens:

> instance Functor p b => Functor (Param_b p) (Either a b) where > fmap' lens f (Left a) = Left a > fmap' lens f (Right b) = Right $ fmap' (zoom lens) f b > > instance Functor p a => Functor (Param_a p) (Either a b) where > fmap' lens f (Left a) = Left $ fmap' (zoom lens) f a > fmap' lens f (Right b) = Right b

The type Base provides the base case of the recursion:

> instance Functor Base t where
>     fmap' _ f a = f a

Now, in order to call fmap’, we must compose our lens with the _base type lens:

_base :: TypeLens Base Base

For example:

ghci> :t _a._b._a._b._base
_a._b._a._b._base :: TypeLens Base (Param_a (Param_b (Param_a (Param_b Base))))

And we call fmap’ like:

ghci> fmap' (_a._b._a._b._base) length $ Left $ Right $ Left $ Right "still simpler than the lens package "
Left (Right (Left (Right 42)))
ghci> fmap' (_a._b._a._b._base) length $ Left $ Right $ Left $ Left "... for now ..."
Left (Right (Left (Left "... for now ...")))

Composing all of our lenses with _base is tedious. So let’s write a function that automates that task:

> fmap ::
>     ( Functor lens t
>     ) => TypeLens Base lens
>       -> (GetParam lens t -> c)
>       -> t
>       -> SetParam lens c t
> fmap lens = fmap' (lens._base)

And we call fmap as:

ghci> fmap (_a._b._a._b) length $ Left $ Right $ Left $ Left "mwahhahahaha"
Left (Right (Left (Left "mwahhahahaha")))

We can easily define more of these new Functor instances. In fact, the procedure is exactly as mechanical for type lens based Functors as it is for the traditional Functors. All you have to do is replace every function application with a recursive Functor call:

f x --> fmap' (zoom lens) f x

Here are some examples using the list and Maybe functors:

> mkParams ''[]
>
> instance Functor p a => Functor (Param_a p) [a] where
>     fmap' lens f []     = []
>     fmap' lens f (a:as) = fmap' (zoom lens) f a : fmap' lens f as

> mkParams ''Maybe
>
> instance Functor p a => Functor (Param_a p) (Maybe a) where
>     fmap' lens f Nothing  = Nothing
>     fmap' lens f (Just a) = Just $ fmap' (zoom lens) f a

Let’s create a variable that uses all of our functors:

> monster =
>     [ Nothing
>     , Just (Left "Hello!")
>     , Just (Right 42)
>     , Just (Left "World!")
>     ]

And go to town:

ghci> fmap (_a._a._a._a) succ monster
[Nothing,Just (Left "Ifmmp\""),Just (Right 42),Just (Left "Xpsme\"")]
ghci> fmap (_a._a._a) length monster
[Nothing,Just (Left 6),Just (Right 42),Just (Left 6)]
ghci> fmap (_a._a) (const 3.4) monster
[Nothing,Just 3.4,Just 3.4,Just 3.4]
ghci> fmap _a show monster
["Nothing","Just (Left \"Hello!\")","Just (Right 42)","Just (Left \"World!\")"]

In our next installment, we’ll tackle Applicative parsing with type lenses. Thought the lens package had too many operators??? You ain’t seen nothin’ yet.

This post focuses on how to use functors and monads in practice with the HLearn library. We won’t talk about their category theoretic foundations; instead, we’ll go through **ten concrete examples** involving the categorical distribution. This distribution is somewhat awkwardly named for our purposes because it has nothing to do with category theory—it is the most general distribution over non-numeric (i.e. categorical) data. Its simplicity should make the examples a little easier to follow. Some more complicated models (e.g. the kernel density estimator and Bayesian classifier) also have functor and monad instances, but we’ll save those for another post.

Before we dive into using functors and monads, we need to set up our code and create some data. Let’s install the packages:

$ cabal install HLearn-distributions-1.1.0.1

Import our modules:

> import Control.ConstraintKinds.Functor
> import Control.ConstraintKinds.Monad
> import Prelude hiding (Functor(..), Monad (..))
>
> import HLearn.Algebra
> import HLearn.Models.Distributions

For efficiency reasons we’ll be using the Functor and Monad instances provided by the ConstraintKinds package and language extension. From the user’s perspective, everything works the same as normal monads.

Now let’s create a simple marble data type, and a small bag of marbles for our data set.

> data Marble = Red | Pink | Green | Blue | White
>     deriving (Read,Show,Eq,Ord)
>
> bagOfMarbles = [ Pink,Green,Red,Blue,Green,Red,Green,Pink,Blue,White ]

This is a very small data set just to make things easy to visualize. Everything we’ll talk about works just as well on arbitrarily large data sets.

We train a categorical distribution on this data set using the **train** function:

> marblesDist = train bagOfMarbles :: Categorical Double Marble

The **Categorical** type takes two parameters. The first is the type of our probabilities, and the second is the type of our data points. If you stick your hand into the bag and draw a random marble, this distribution tells you the probability of drawing each color.

Let’s plot our distribution:

ghci> plotDistribution (plotFile "marblesDist" $ PNG 400 300) marblesDist

Okay. Now we’re ready for the juicy bits. We’ll start by talking about the list functor. This will motivate the advantages of the categorical distribution functor.

A functor is a container that lets us “map” a function onto every element of the container. Lists are a functor, and so we can apply a function to our data set using the **map** function.

map :: (a -> b) -> [a] -> [b]

**Example 1:**

Let’s say instead of a distribution over the marbles’ colors, I want a distribution over the marbles’ weights. I might have a function that associates a weight with each type of marble:

> marbleWeight :: Marble -> Int -- weight in grams
> marbleWeight Red   = 3
> marbleWeight Pink  = 2
> marbleWeight Green = 3
> marbleWeight Blue  = 6
> marbleWeight White = 2

I can generate my new distribution by first transforming my data set, and then training on the result. Notice that the type of our distribution has changed. It is no longer a categorical distribution over marbles; it’s a distribution over ints.

> weightsDist = train $ map marbleWeight bagOfMarbles :: Categorical Double Int

ghci> plotDistribution (plotFile "weightsDist" $ PNG 400 300) weightsDist

This is the standard way of preprocessing data. But we can do better because the categorical distribution is also a functor. Functors have a function called **fmap** that is analogous to calling map on a list. This is its type signature specialized for the Categorical type:

fmap :: (Ord dp0, Ord dp1) => (dp0 -> dp1) -> Categorical prob dp0 -> Categorical prob dp1

We can use fmap to apply the marbleWeights function directly to the distribution:

> weightDist' = fmap marbleWeight marblesDist

This is guaranteed to generate the same exact answer, but it is much faster. **It takes only constant time to call Categorical’s fmap, no matter how much data we have!**
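To see why this can take constant time in the number of data points, here is a minimal sketch of the idea. It assumes a toy representation of a trained categorical distribution as a map from outcomes to counts—an illustration only, not HLearn’s actual internals:

```haskell
import qualified Data.Map as M

-- Toy sketch: a trained categorical distribution as outcome-to-count map.
type Counts dp = M.Map dp Double

trainCounts :: Ord dp => [dp] -> Counts dp
trainCounts = M.fromListWith (+) . map (\dp -> (dp, 1))

-- fmap on the *model*: relabel the (few) distinct outcomes, merging any
-- outcomes that collide. Cost depends only on the number of distinct
-- outcomes m, not on the number of data points n.
fmapCounts :: Ord b => (a -> b) -> Counts a -> Counts b
fmapCounts = M.mapKeysWith (+)
```

Relabeling touches each of the m keys once, so the data set size never enters the run time.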

Let me put that another way. Below is a diagram showing the two possible ways to generate a model on a preprocessed data set. Every arrow represents a function application.

The normal way to preprocess data is to take the bottom left path. But because our model is a functor, the top right path becomes available. This path is better because it has the shorter run time.

Furthermore, let’s say we want to experiment with k different preprocessing functions. The standard method will take O(kn) time, whereas using the categorical functor takes only O(k) time.

*Note: The diagram treats the number of different categories (m) as a constant because it doesn’t depend on the number of data points. In our case, we have 5 types of marbles, so m=5. Every function call in the diagram is really multiplied by m.*

**Example 2:**

For another example, what if we don’t want to differentiate between red and pink marbles? The following function converts all the pink marbles to red.

> pink2red :: Marble -> Marble
> pink2red Pink = Red
> pink2red dp   = dp

Let’s apply it to our distribution, and plot the results:

> nopinkDist = fmap pink2red marblesDist

ghci> plotDistribution (plotFile "nopinkDist" $ PNG 400 300) nopinkDist

That’s about all that a Functor can do by itself. When we call fmap, we can only process individual data points. We can’t change the number of points in the resulting distribution or do other complex processing. Monads give us this power.

Monads are functors with two more functions. The first is called **return**. Its type signature is

return :: (Ord dp) => dp -> Categorical prob dp

We’ve actually seen this function already in previous posts. It’s equivalent to the **train1dp** function found in the **HomTrainer** type class. All it does is train a categorical distribution on a single data point.
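In the toy outcome-to-weight view of a categorical distribution, return would simply build a singleton (a sketch of the concept, not HLearn’s code):

```haskell
import qualified Data.Map as M

-- Toy sketch: return/train1dp builds a one-point distribution
-- (assumed representation: outcome-to-weight map).
return' :: dp -> M.Map dp Double
return' dp = M.singleton dp 1
```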

The next function is called **join.** It’s a little bit trickier, and it’s where all the magic lies. Its type signature is:

join :: (Ord dp) => Categorical prob (Categorical prob dp) -> Categorical prob dp

As input, join takes a categorical distribution whose data points are other categorical distributions. It then “flattens” the distribution into one that does not take other distributions as input.
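A minimal sketch of this flattening, again assuming a toy outcome-to-weight map representation rather than HLearn’s actual implementation: the outer weights scale the inner distributions, which are then summed.

```haskell
import qualified Data.Map as M

-- Toy sketch of join: the outer map assigns a weight to each inner
-- distribution; flattening scales each inner distribution by its
-- weight and sums the results.
joinCounts :: Ord dp => M.Map (M.Map dp Double) Double -> M.Map dp Double
joinCounts outer = M.unionsWith (+)
    [ M.map (* w) inner | (inner, w) <- M.toList outer ]
```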

**Example 3**

Let’s write a function that removes all the pink marbles from our data set. Whenever we encounter a pink marble, we’ll replace it with an empty categorical distribution; if the marble is not pink, we’ll create a singleton distribution from it.

> forgetPink :: (Num prob) => Marble -> Categorical prob Marble
> forgetPink Pink = mempty
> forgetPink dp   = train1dp dp
>
> nopinkDist2 = join $ fmap forgetPink marblesDist

ghci> plotDistribution (plotFile "nopinkDist2" $ PNG 400 300) nopinkDist2

This idiom of **join (fmap ...)** is used a lot. For convenience, the **>>=** operator (called **bind**) combines these steps for us. It is defined as:

(>>=) :: Categorical prob dp0 -> (dp0 -> Categorical prob dp1) -> Categorical prob dp1
dist >>= f = join $ fmap f dist

Under this notation, our new distribution can be defined as:

> nopinkDist2' = marblesDist >>= forgetPink

**Example 4**

Besides removing data points, we can also add new ones. Let’s double the number of pink marbles in our training data:

> doublePink :: (Num prob) => Marble -> Categorical prob Marble
> doublePink Pink = 2 .* train1dp Pink
> doublePink dp   = train1dp dp
>
> doublepinkDist = marblesDist >>= doublePink

ghci> plotDistribution (plotFile "doublepinkDist" $ PNG 400 300) doublepinkDist

**Example 5**

Mistakes are often made when collecting data. One common machine learning task is to preprocess data sets to account for these mistakes. In this example, we’ll assume that our sampling process suffers from uniform noise. Specifically, if one of our data points is red, we will assume there is only a 60% chance that the marble was actually red, and a 10% chance each that it was one of the other colors. We will define a function to add this noise to our data set, increasing the accuracy of our final distribution.

Notice that we are using fractional weights for our noise, and that the weights are carefully adjusted so that each data point still contributes a total weight of one to the distribution. We don’t want to add or remove marbles while adding noise.

> addNoise :: (Fractional prob) => Marble -> Categorical prob Marble
> addNoise dp = 0.5 .* train1dp dp <> 0.1 .* train [ Red,Pink,Green,Blue,White ]
>
> noiseDist = marblesDist >>= addNoise

ghci> plotDistribution (plotFile "noiseDist" $ PNG 400 300) noiseDist

Adding uniform noise just made all our probabilities closer together.

**Example 6**

Of course, the amount of noise we add to each sample doesn’t have to be the same everywhere. If I suffer from red-green color blindness, then I might use this as my noise function:

> rgNoise :: (Fractional prob) => Marble -> Categorical prob Marble
> rgNoise Red   = trainW [(0.7,Red),(0.3,Green)]
> rgNoise Green = trainW [(0.1,Red),(0.9,Green)]
> rgNoise dp    = train1dp dp
>
> rgNoiseDist = marblesDist >>= rgNoise

ghci> plotDistribution (plotFile "rgNoiseDist" $ PNG 400 300) rgNoiseDist

Because of my color blindness, the probability of drawing a red marble from the bag is higher than drawing a green marble. This is despite the fact that we observed more green marbles in our training data.

**Example 7**

In the real world, we can never know exactly how much error we have in the samples. Luckily, we can try to learn it by conducting a second experiment. We’ll first experimentally determine how red-green color blind I am, then we’ll use that to update our already trained distribution.

To determine the true error rate, we need some unbiased source of truth. In this case, we can just use someone with good vision. They will select ten red marbles and ten green marbles, and I will guess what color they are.

Let’s train a distribution on what I think green marbles look like:

> greenMarbles = [Green,Red,Green,Red,Green,Red,Red,Green,Green,Green]
> greenDist = train greenMarbles :: Categorical Double Marble

and what I think red marbles look like:

> redMarbles = [Red,Green,Red,Green,Red,Red,Green,Green,Red,Red]
> redDist = train redMarbles :: Categorical Double Marble

Now we’ll create the noise function based on our empirical data. The **(/.)** function is scalar division, and we can use it because the categorical distribution is a vector space. We’re dividing by the number of data points in the distribution so that the distribution we output has an effective training size of one. This ensures that we’re not accidentally creating new data points when applying our function to another distribution.

> rgNoise2 :: Marble -> Categorical Double Marble
> rgNoise2 Green = greenDist /. numdp greenDist
> rgNoise2 Red   = redDist /. numdp redDist
> rgNoise2 dp    = train1dp dp
>
> rgNoiseDist2 = marblesDist >>= rgNoise2

ghci> plotDistribution (plotFile "rgNoiseDist2" $ PNG 400 300) rgNoiseDist2

**Example 8**

We can chain our preprocessing functions together in arbitrary ways.

> allDist = marblesDist >>= forgetPink >>= addNoise >>= rgNoise

ghci> plotDistribution (plotFile "allDist" $ PNG 400 300) allDist

But wait! Where’d that pink come from? Wasn’t the call to forgetPink supposed to remove it? The answer is that we did remove it, but then we added it back in with our noise functions. When using monadic functions, we must be careful about the order we apply them in. This is just as true when using regular functions.

Here’s another distribution created from those same functions in a different order:

> allDist2 = marblesDist >>= addNoise >>= rgNoise >>= forgetPink

ghci> plotDistribution (plotFile "allDist2" $ PNG 400 300) allDist2

We can also use Haskell’s do notation to accomplish the same exact thing:

> allDist2' :: Categorical Double Marble
> allDist2' = do
>     dp <- train bagOfMarbles
>     dp <- addNoise dp
>     dp <- rgNoise dp
>     dp <- forgetPink dp
>     return dp

(Since we’re using a custom Monad definition, do notation requires the RebindableSyntax extension.)

**Example 9**

Do notation gives us a convenient way to preprocess multiple data sets into a single data set. Let’s create two new data sets and their corresponding distributions for us to work with:

> bag1 = [Red,Pink,Green,Blue,White]
> bag2 = [Red,Blue,White]
>
> bag1dist = train bag1 :: Categorical Double Marble
> bag2dist = train bag2 :: Categorical Double Marble

Now, we’ll create a third data set that is a weighted combination of bag1 and bag2. We will do this by repeated sampling. On every iteration, with a 20% probability we’ll sample from bag1, and with an 80% probability we’ll sample from bag2. Imperative pseudo-code for this algorithm is:

let comboDist be an empty distribution
loop until desired accuracy achieved:
    let r be a random number from 0 to 1
    if r < 0.2:
        sample dp1 from bag1
        add dp1 to comboDist
    else:
        sample dp2 from bag2
        add dp2 to comboDist

This sampling procedure will obviously not give us an exact answer. But since the categorical distribution supports weighted data points, we can use this simpler pseudo-code to generate an exact answer:

let comboDist be an empty distribution
foreach datapoint dp1 in bag1:
    foreach datapoint dp2 in bag2:
        add dp1 with weight 0.2 to comboDist
        add dp2 with weight 0.8 to comboDist

Using do notation, we can express this as:

> comboDist :: Categorical Double Marble
> comboDist = do
>     dp1 <- bag1dist
>     dp2 <- bag2dist
>     trainW [(0.2,dp1),(0.8,dp2)]

ghci> plotDistribution (plotFile "comboDist" $ PNG 400 300) comboDist

And because the Categorical functor takes constant time, constructing comboDist also takes constant time. The naive imperative algorithm would have taken time proportional to the product of the two data set sizes.
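The weighted merge underlying this combination can be sketched directly, assuming a toy representation of a distribution as a map from outcomes to probability mass (the names here are hypothetical, not HLearn’s API):

```haskell
import qualified Data.Map as M

-- Toy representation: a distribution as a map from outcome to probability mass.
type Dist dp = M.Map dp Double

-- Mix two distributions: weight w on the first, (1 - w) on the second.
-- Cost depends only on the number of distinct outcomes, never on data set size.
combine :: Ord dp => Double -> Dist dp -> Dist dp -> Dist dp
combine w d1 d2 = M.unionWith (+) (M.map (* w) d1) (M.map (* (1 - w)) d2)
```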

When combining multiple distributions this way, the number of data points in our final distribution will be the product of the number of data points in the initial distributions:

ghci> numdp comboDist
15

**Example 10**

Finally, arbitrarily complex preprocessing functions can be written using Haskell’s do notation. And remember, no matter how complicated these functions are, their run time never depends on the number of elements in the initial data set.

This function adds uniform sampling noise to our bagOfMarbles, but only on those marbles that are also contained in bag2 above.

> comboDist2 :: Categorical Double Marble
> comboDist2 = do
>     dp1 <- marblesDist
>     dp2 <- bag2dist
>     if dp1==dp2
>         then addNoise dp1
>         else return dp1

ghci> plotDistribution (plotFile "comboDist2" $ PNG 400 300) comboDist2

This application of monads to machine learning generalizes the monad used in probabilistic functional programming. The main difference is that PFP focused on manipulating already known distributions, not training them from data. Also, if you enjoy this kind of thing, you might be interested in the n-category cafe discussion on category theory in machine learning from a few years back.

In future posts, we’ll look at functors and monads for continuous distributions, multivariate distributions, and classifiers.

Subscribe to the RSS feed to stay tuned!

Haskell code is expressive. The HLearn library uses 6 lines of Haskell to define a function for training a Bayesian classifier; the equivalent code in the Weka library uses over 100 lines of Java. That’s a big difference! In this post, we’ll look at the actual code and see why the Haskell is so much more concise.

**But first, a disclaimer:** It is really hard to fairly compare two code bases this way. In both libraries, there is a lot of supporting code that goes into defining each classifier, and it’s not obvious what code to include and not include. For example, both libraries implement interfaces to a number of probability distributions, and this code is not contained in the source count. The Haskell code takes more advantage of this abstraction, so this is one language-agnostic reason why the Haskell code is shorter. If you think I’m not doing a fair comparison, here’s some links to the full repositories so you can do it yourself:

- HLearn’s bayesian classifier source code (74 lines of code)
- Weka’s naive bayes source code (946 lines of code)

HLearn implements training for a bayesian classifier with these six lines of Haskell:

newtype Bayes labelIndex dist = Bayes dist
    deriving (Read,Show,Eq,Ord,Monoid,Abelian,Group)

instance (Monoid dist, HomTrainer dist) => HomTrainer (Bayes labelIndex dist) where
    type Datapoint (Bayes labelIndex dist) = Datapoint dist
    train1dp dp = Bayes $ train1dp dp

This code elegantly captures how to train a Bayesian classifier—just train a probability distribution. Here’s an explanation:

- The first two lines define the Bayes data type as a wrapper around a distribution.
- The fourth line says that we’re implementing the Bayesian classifier using the HomTrainer type class. We do this because **the Haskell compiler automatically generates a parallel batch training function, an online training function, and a fast cross-validation function for all HomTrainer instances.**
- The fifth line says that our data points have the same type as the underlying distribution.
- The sixth line says that in order to train, just train the corresponding distribution.

We only get the benefits of the HomTrainer type class because the bayesian classifier is a monoid. But we didn’t even have to specify what the monoid instance for bayesian classifiers looks like! In this case, it’s automatically derived from the monoid instances for the base distributions using a language extension called GeneralizedNewtypeDeriving. For examples of these monoid structures, check out the algebraic structure of the normal and categorical distributions, or more complex distributions using Markov networks.
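Here is a minimal, self-contained sketch of that deriving trick, using a stand-in counts type in place of a real distribution (CountsDist and Bayes' are hypothetical names for illustration):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import qualified Data.Map as M

-- A stand-in "distribution": counts per outcome, with the obvious Monoid.
newtype CountsDist = CountsDist (M.Map String Int) deriving (Show, Eq)

instance Semigroup CountsDist where
    CountsDist a <> CountsDist b = CountsDist (M.unionWith (+) a b)

instance Monoid CountsDist where
    mempty = CountsDist M.empty

-- The wrapper inherits Semigroup/Monoid for free via newtype deriving,
-- analogous to how the Bayes wrapper above inherits its algebraic structure.
newtype Bayes' dist = Bayes' dist deriving (Show, Eq, Semigroup, Monoid)
```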

Look for these differences between the HLearn and Weka source:

- In Weka we must separately define the online and batch trainers, whereas Haskell derived these for us automatically.
- Weka must perform a variety of error handling that Haskell’s type system takes care of in HLearn.
- The Weka code is tightly coupled to the underlying probability distribution, whereas the Haskell code was generic enough to handle any distribution. This means that while Weka must make the “naive bayes assumption” that all attributes are independent of each other, HLearn can support any dependence structure.
- Weka’s code is made more verbose by for loops and if statements that aren’t necessary for HLearn.
- The Java code requires extensive comments to maintain readability, but the Haskell code is simple enough to be self-documenting (at least once you know how to read Haskell).
- Weka does not have parallel training, fast cross-validation, data point subtraction, or weighted data points, but HLearn does.

/**
 * Generates the classifier.
 *
 * @param instances set of instances serving as training data
 * @exception Exception if the classifier has not been generated
 * successfully
 */
public void buildClassifier(Instances instances) throws Exception {

    // can classifier handle the data?
    getCapabilities().testWithFail(instances);

    // remove instances with missing class
    instances = new Instances(instances);
    instances.deleteWithMissingClass();

    m_NumClasses = instances.numClasses();

    // Copy the instances
    m_Instances = new Instances(instances);

    // Discretize instances if required
    if (m_UseDiscretization) {
        m_Disc = new weka.filters.supervised.attribute.Discretize();
        m_Disc.setInputFormat(m_Instances);
        m_Instances = weka.filters.Filter.useFilter(m_Instances, m_Disc);
    } else {
        m_Disc = null;
    }

    // Reserve space for the distributions
    m_Distributions = new Estimator[m_Instances.numAttributes() - 1]
                                   [m_Instances.numClasses()];
    m_ClassDistribution = new DiscreteEstimator(m_Instances.numClasses(), true);

    int attIndex = 0;
    Enumeration enu = m_Instances.enumerateAttributes();
    while (enu.hasMoreElements()) {
        Attribute attribute = (Attribute) enu.nextElement();

        // If the attribute is numeric, determine the estimator
        // numeric precision from differences between adjacent values
        double numPrecision = DEFAULT_NUM_PRECISION;
        if (attribute.type() == Attribute.NUMERIC) {
            m_Instances.sort(attribute);
            if ((m_Instances.numInstances() > 0)
                && !m_Instances.instance(0).isMissing(attribute)) {
                double lastVal = m_Instances.instance(0).value(attribute);
                double currentVal, deltaSum = 0;
                int distinct = 0;
                for (int i = 1; i < m_Instances.numInstances(); i++) {
                    Instance currentInst = m_Instances.instance(i);
                    if (currentInst.isMissing(attribute)) {
                        break;
                    }
                    currentVal = currentInst.value(attribute);
                    if (currentVal != lastVal) {
                        deltaSum += currentVal - lastVal;
                        lastVal = currentVal;
                        distinct++;
                    }
                }
                if (distinct > 0) {
                    numPrecision = deltaSum / distinct;
                }
            }
        }

        for (int j = 0; j < m_Instances.numClasses(); j++) {
            switch (attribute.type()) {
                case Attribute.NUMERIC:
                    if (m_UseKernelEstimator) {
                        m_Distributions[attIndex][j] = new KernelEstimator(numPrecision);
                    } else {
                        m_Distributions[attIndex][j] = new NormalEstimator(numPrecision);
                    }
                    break;
                case Attribute.NOMINAL:
                    m_Distributions[attIndex][j] =
                        new DiscreteEstimator(attribute.numValues(), true);
                    break;
                default:
                    throw new Exception("Attribute type unknown to NaiveBayes");
            }
        }
        attIndex++;
    }

    // Compute counts
    Enumeration enumInsts = m_Instances.enumerateInstances();
    while (enumInsts.hasMoreElements()) {
        Instance instance = (Instance) enumInsts.nextElement();
        updateClassifier(instance);
    }

    // Save space
    m_Instances = new Instances(m_Instances, 0);
}

And the code for online learning is:

/**
 * Updates the classifier with the given instance.
 *
 * @param instance the new training instance to include in the model
 * @exception Exception if the instance could not be incorporated in
 * the model.
 */
public void updateClassifier(Instance instance) throws Exception {
    if (!instance.classIsMissing()) {
        Enumeration enumAtts = m_Instances.enumerateAttributes();
        int attIndex = 0;
        while (enumAtts.hasMoreElements()) {
            Attribute attribute = (Attribute) enumAtts.nextElement();
            if (!instance.isMissing(attribute)) {
                m_Distributions[attIndex][(int)instance.classValue()]
                    .addValue(instance.value(attribute), instance.weight());
            }
            attIndex++;
        }
        m_ClassDistribution.addValue(instance.classValue(), instance.weight());
    }
}

Every algorithm implemented in HLearn uses similarly concise code. I invite you to browse the repository and see for yourself. The most complicated algorithm is for Markov chains which use only 6 lines for training, and about 20 for defining the Monoid.

You can expect lots of tutorials on how to incorporate the HLearn library into Haskell programs over the next few months.

Subscribe to the RSS feed to stay tuned!

**Code and instructions for reproducing these experiments are available on github.**

Why is HLearn so much faster?

Well, it turns out that the bayesian classifier has the algebraic structure of a monoid, a group, and a vector space. HLearn uses a new cross-validation algorithm that can exploit these algebraic structures. The standard algorithm runs in time O(kn), where k is the number of “folds” and n is the number of data points. The algebraic algorithms, however, run in time O(n). In other words, it doesn’t matter how many folds we do, the run time stays the same! And not only are we faster, but we get the *exact same answer*. Algebraic cross-validation is not an approximation, it’s just fast.
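The trick can be sketched with a toy model that forms a group. Here the “model” is just the running sum and count needed to compute a mean; subtracting each point’s contribution replaces retraining from scratch. This is an illustration of the algebra, not HLearn’s code:

```haskell
-- Toy "model": the sum and count needed to compute a mean. It stands in for
-- any model with a group structure.
data Mean = Mean { mSum :: Double, mCount :: Double } deriving (Show, Eq)

instance Semigroup Mean where
    Mean s1 n1 <> Mean s2 n2 = Mean (s1 + s2) (n1 + n2)

instance Monoid Mean where
    mempty = Mean 0 0

train1 :: Double -> Mean
train1 x = Mean x 1

-- The group inverse lets us "untrain" a data point.
inverse :: Mean -> Mean
inverse (Mean s n) = Mean (negate s) (negate n)

-- All n leave-one-out models in one pass: train once on everything, then
-- subtract each point, instead of retraining n times from scratch.
leaveOneOut :: [Double] -> [Mean]
leaveOneOut xs = [ full <> inverse (train1 x) | x <- xs ]
  where full = mconcat (map train1 xs)
```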

Here’s some run times for k-fold cross-validation on the census income data set. Notice that HLearn’s run time is constant as we add more folds.

And when we set k=n, we have leave-one-out cross-validation. Notice that Weka’s cross-validation has quadratic run time, whereas HLearn has linear run time.

HLearn certainly isn’t going to replace Weka any time soon, but it’s got a number of cool tricks like this going on inside. If you want to read more, you should check out these two recent papers:

I’ll continue to write more about these tricks in future blog posts.

Subscribe to the RSS feed to stay tuned.


As usual, this post is a literate Haskell file. To run this code, you’ll need to install the hlearn-distributions package. This package requires GHC version 7.6 or later.

bash> cabal install hlearn-distributions-1.1

Now for some code. We start with our language extensions and imports:

> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TemplateHaskell #-}
>
> import HLearn.Algebra
> import HLearn.Models.Distributions

Next, we’ll create a data type to represent Futurama characters. There are a lot of characters, so we’ll need to keep things pretty organized. The data type will have a record for everything we might want to know about a character. Each of these records will be one of the variables in our multivariate distribution, and all of our data points will have this type.

> data Character = Character
>     { _name    :: String
>     , _species :: String
>     , _job     :: Job
>     , _isGood  :: Maybe Bool
>     , _age     :: Double -- in years
>     , _height  :: Double -- in feet
>     , _weight  :: Double -- in pounds
>     }
>     deriving (Read,Show,Eq,Ord)
>
> data Job = Manager | Crew | Henchman | Other
>     deriving (Read,Show,Eq,Ord)

Now, in order for our library to be able to interpret the Character type, we call the template haskell function:

> makeTypeLenses ''Character

This function creates a bunch of data types and type classes for us. These “type lenses” give us a type-safe way to reference the different variables in our multivariate distribution. We’ll see how to use these type level lenses a bit later. There’s no need to understand what’s going on under the hood, but if you’re curious then check out the hackage documentation or source code.

Now, we’re ready to create a data set and start training. Here’s a list of the employees of Planet Express provided by the resident bureaucrat Hermes Conrad. This list will be our first data set.

> planetExpress =
>     [ Character "Philip J. Fry"        "human" Crew    (Just True) 1026 5.8 195
>     , Character "Turanga Leela"        "alien" Crew    (Just True)   43 5.9 170
>     , Character "Professor Farnsworth" "human" Manager (Just True)   85 5.5 160
>     , Character "Hermes Conrad"        "human" Manager (Just True)   36 5.3 210
>     , Character "Amy Wong"             "human" Other   (Just True)   21 5.4 140
>     , Character "Zoidberg"             "alien" Other   (Just True)  212 5.8 225
>     , Character "Cubert Farnsworth"    "human" Other   (Just True)    8 4.3 135
>     ]

Let’s train a distribution from this data. Here’s how we would train a distribution where every variable is independent of every other variable:

> dist1 = train planetExpress :: Multivariate Character
>    '[ Independent Categorical '[String,String,Job,Maybe Bool]
>     , Independent Normal '[Double,Double,Double]
>     ]
>     Double

In the HLearn library, we always use the function **train** to train a model from data points. We specify which model to train in the type signature.

As you can see, the Multivariate distribution takes three type parameters. The first parameter is the type of our data point, in this case Character. The second parameter describes the dependency structure of our distribution. We’ll go over the syntax for the dependency structure in a bit. For now, just notice that it’s a type-level list of distributions. Finally, the third parameter is the type we will use to store our probabilities.

What can we do with this distribution? One simple task we can do is to find marginal distributions. The marginal distribution is the distribution of a certain variable ignoring all the other variables. For example, let’s say I want a distribution of the species that work at planet express. I can get this by:

> dist1a = getMargin TH_species dist1

Notice that we specified which variable we’re taking the marginal of by using the type level lens TH_species. This data constructor was automatically created for us by our template haskell function makeTypeLenses. Every one of the records in our data type has its own unique type lens. Its name is the name of the record, prefixed by TH. These lenses let us infer the types of our marginal distributions at compile time, rather than at run time. For example, the type of the marginal distribution of species is:

ghci> :t dist1a
dist1a :: Categorical String Double

That is, a categorical distribution whose data points are Strings and which stores probabilities as a Double. Now, if I wanted a distribution of the weights of the employees, I can get that by:

> dist1b = getMargin TH_weight dist1

And the type of this distribution is:

ghci> :t dist1b
dist1b :: Normal Double
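Conceptually, taking a marginal just sums out the other variables. A minimal sketch with a toy joint distribution stored as a map keyed by pairs (an assumed representation for illustration, not HLearn’s):

```haskell
import qualified Data.Map as M

-- Toy joint distribution over (species, weight) pairs: taking the marginal
-- of the first variable sums out the second.
marginalFst :: Ord a => M.Map (a, b) Double -> M.Map a Double
marginalFst = M.mapKeysWith (+) fst
```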

Now, I can easily plot these marginal distributions with the **plotDistribution** function:

ghci> plotDistribution (plotFile "dist1a" $ PNG 250 250) dist1a
ghci> plotDistribution (plotFile "dist1b" $ PNG 250 250) dist1b

But wait! I accidentally forgot to include Bender in the planetExpress data set! What can I do?

In a traditional statistics library, we would have to retrain our data from scratch. If we had billions of elements in our data set, this would be an expensive mistake. But in our HLearn library, we can take advantage of the model’s monoid structure. In particular, the compiler used this structure to automatically derive a function called **add1dp** for us. Let’s look at its type:

ghci> :t add1dp
add1dp :: HomTrainer model => model -> Datapoint model -> model

It’s pretty simple. The function takes a model and a data point, and adds the point to the model. It returns the model we would have gotten if the data point had been in our original data set. This is called online training.

Again, because our distributions form monoids, the compiler derived an efficient and exact online training algorithm for us automatically.
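A minimal sketch of how online training falls out of the algebra, using a hypothetical stand-in model (a map of counts, not HLearn’s representation):

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in model: counts of each data point seen so far.
type Hist dp = M.Map dp Int

train1dpH :: Ord dp => dp -> Hist dp
train1dpH dp = M.singleton dp 1

-- add1dp is just the monoid operation applied to the existing model and a
-- model trained on the single new point.
add1dpH :: Ord dp => Hist dp -> dp -> Hist dp
add1dpH model dp = M.unionWith (+) model (train1dpH dp)
```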

So let’s create a new distribution that considers Bender:

> bender = Character "Bender Rodriguez" "robot" Crew (Just True) 44 6.1 612
> dist1' = add1dp dist1 bender

And plot our new marginals:

ghci> plotDistribution (plotFile "dist1-withbender-species" $ PNG 250 250) $ getMargin TH_species dist1' ghci> plotDistribution (plotFile "dist1-withbender-weight" $ PNG 250 250) $ getMargin TH_weight dist1'

Notice that our categorical marginal has clearly changed, but our normal marginal doesn’t seem to have changed at all. This is because the plotting routines automatically scale the distribution, and the normal distribution, when scaled, always looks the same. We can double check that we actually did change the weight distribution by comparing the means:

ghci> mean dist1b
176.42857142857142
ghci> mean $ getMargin TH_weight dist1'
230.875

Bender’s weight really changed the distribution after all!

That’s cool, but our original distribution isn’t very interesting. What makes multivariate distributions interesting is when the variables affect each other. This is true in our case, so we’d like to be able to model it. For example, we’ve already seen that robots are much heavier than organic lifeforms, and are throwing off our statistics. The HLearn library supports a small subset of Markov Networks for expressing these dependencies.

We represent Markov Networks as graphs with undirected edges. Every attribute in our distribution is a node, and every dependence between attributes is an edge. We can draw this graph with the **plotNetwork** command:

ghci> plotNetwork "dist1-network" dist1

As expected, there are no edges in our graph because everything is independent. Let’s create a more interesting distribution and plot its Markov network.

> dist2 = train planetExpress :: Multivariate Character
>    '[ Ignore '[String]
>     , MultiCategorical '[String]
>     , Independent Categorical '[Job,Maybe Bool]
>     , Independent Normal '[Double,Double,Double]
>     ]
>     Double

ghci> plotNetwork "dist2-network" dist2

Okay, so what just happened?

The syntax for representing the dependence structure is a little confusing, so let’s go step by step. We represent the dependence information in the graph as a list of types. Each element in the list describes both the marginal distribution and the dependence structure for one or more records in our data type. We must list these elements in the same order as the original data type.

Notice that we’ve made two changes to the list. First, our list now starts with the type Ignore '[String]. This means that the first string in our data type—the name—will be ignored. Notice that TH_name is no longer in the Markov Network. This makes sense because we expect that a character’s name should not tell us too much about any of their other attributes.

Second, we’ve added a dependence. The MultiCategorical distribution makes everything afterward in the list dependent on that item, but not the things before it. This means that the exact types of dependencies it can specify are dependent on the order of the records in our data type. Let’s see what happens if we change the location of the MultiCategorical:

> dist3 = train planetExpress :: Multivariate Character
>    '[ Ignore '[String]
>     , Independent Categorical '[String]
>     , MultiCategorical '[Job]
>     , Independent Categorical '[Maybe Bool]
>     , Independent Normal '[Double,Double,Double]
>     ]
>     Double

ghci> plotNetwork "dist3-network" dist3

As you can see, our species no longer have any relation to anything else. Unfortunately, using this syntax, the order of list elements is important, and so the order we specify our data records is important.

Finally, we can substitute any valid univariate distribution for our Normal and Categorical distributions. The HLearn library currently supports Binomial, Exponential, Geometric, LogNormal, and Poisson distributions. These just don’t make much sense for modelling Futurama characters, so we’re not using them.

Now, we might be tempted to specify that every variable is fully dependent on every other variable. In order to do this, we have to introduce the “Dependent” type. Any valid multivariate distribution can follow Dependent, but only those records specified in the type-list will actually be dependent on each other. For example:

>dist4 = train planetExpress :: Multivariate Character
>   '[ Ignore '[String]
>   , MultiCategorical '[String,Job,Maybe Bool]
>   , Dependent MultiNormal '[Double,Double,Double]
>   ]
>   Double

ghci> plotNetwork "dist4-network" dist4

Undoubtedly, this is always going to be the case to some extent—everything has at least a slight influence on everything else. Unfortunately, it is not easy in practice to model these fully dependent distributions. The number of data points we need to accurately train a distribution grows quickly with the number of edges e in our network, and a fully connected network over n nodes has the maximum possible number of edges. Thus, by specifying that two attributes are independent of each other, we can greatly reduce the amount of data we need to train an accurate distribution.

I realize that this syntax is a little awkward. I chose it because it was relatively easy to implement. Future versions of the library should support a more intuitive syntax. I also plan to use copulas to greatly expand the expressiveness of these distributions. In the meantime, the best way to figure out the dependencies in a Markov Network is just to plot it and inspect it visually.

Okay. So what distribution makes the most sense for Futurama characters? We’ll say that everything depends on both the characters’ species and job, and that their weight depends on their height.

>planetExpressDist = train planetExpress :: Multivariate Character
>   '[ Ignore '[String]
>   , MultiCategorical '[String,Job]
>   , Independent Categorical '[Maybe Bool]
>   , Independent Normal '[Double]
>   , Dependent MultiNormal '[Double,Double]
>   ]
>   Double

ghci> plotNetwork "planetExpress-network" planetExpressDist

We still don’t have enough data to train this network, so let’s create some more. We start by creating a type for our Markov network called FuturamaDist. This is just for convenience so we don’t have to retype the dependence structure many times.

>type FuturamaDist = Multivariate Character
>   '[ Ignore '[String]
>   , MultiCategorical '[String,Job]
>   , Independent Categorical '[Maybe Bool]
>   , Independent Normal '[Double]
>   , Dependent MultiNormal '[Double,Double]
>   ]
>   Double

Next, we train some more distributions of this type on some of the characters. We’ll start with Mom Corporation and the brave Space Forces.

>momCorporation =
>   [ Character "Mom"   "human" Manager  (Just False) 100 5.5 130
>   , Character "Walt"  "human" Henchman (Just False)  22 6.1 170
>   , Character "Larry" "human" Henchman (Just False)  18 5.9 180
>   , Character "Igner" "human" Henchman (Just False)  15 5.8 175
>   ]
>
>momDist = train momCorporation :: FuturamaDist

>spaceForce =
>   [ Character "Zapp Brannigan" "human" Manager (Nothing)     45 6.0 230
>   , Character "Kif Kroker"     "alien" Crew    (Just True)  113 4.5 120
>   ]
>
>spaceDist = train spaceForce :: FuturamaDist

And now some more robots:

>robots =
>   [ bender
>   , Character "Calculon"        "robot" Other    (Nothing)    123 6.8  650
>   , Character "The Crushinator" "robot" Other    (Nothing)     45 8.0 4500
>   , Character "Clamps"          "robot" Henchman (Just False) 134 5.8  330
>   , Character "DonBot"          "robot" Manager  (Just False) 178 5.8  520
>   , Character "Hedonismbot"     "robot" Other    (Just False)  69 4.3 1200
>   , Character "Preacherbot"     "robot" Manager  (Nothing)     45 5.8  350
>   , Character "Roberto"         "robot" Other    (Just False)  77 5.9  250
>   , Character "Robot Devil"     "robot" Other    (Just False) 895 6.0  280
>   , Character "Robot Santa"     "robot" Other    (Just False) 488 6.3  950
>   ]
>
>robotDist = train robots :: FuturamaDist

Now we’re going to take advantage of the monoid structure of our multivariate distributions to combine all of these distributions into one.

> futuramaDist = planetExpressDist <> momDist <> spaceDist <> robotDist

The resulting distribution is equivalent to having trained a distribution from scratch on all of the data points:

train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist
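This equivalence—merging trained models gives the same result as training on the concatenated data—is easy to see on a simple distribution. Below is a minimal, self-contained sketch (not HLearn’s actual code) using a hypothetical NormalStats type that stores the sufficient statistics of a normal distribution; because the statistics just add, training is a monoid homomorphism:

```haskell
-- Sufficient statistics of a normal distribution: count, sum, sum of squares.
-- NormalStats and trainNormal are hypothetical stand-ins for HLearn's types.
data NormalStats = NormalStats Double Double Double
  deriving (Show, Eq)

instance Semigroup NormalStats where
  NormalStats n1 s1 q1 <> NormalStats n2 s2 q2 =
    NormalStats (n1 + n2) (s1 + s2) (q1 + q2)

instance Monoid NormalStats where
  mempty = NormalStats 0 0 0

-- "train" a model by folding each data point into the sufficient statistics
trainNormal :: [Double] -> NormalStats
trainNormal = foldMap (\x -> NormalStats 1 x (x * x))

main :: IO ()
main = do
  let xs = [1, 2, 3]
      ys = [4, 5]
  -- merging two trained models == training on the concatenated data
  print (trainNormal xs <> trainNormal ys == trainNormal (xs ++ ys))
```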

We can take advantage of this property any time we use the train function to automatically parallelize our code. The higher order function **parallel** will split the training task evenly over each of your available processors, then merge them together with the monoid operation. This results in “theoretically perfect” parallel training of these models.

parallel train (planetExpress++momCorporation++spaceForce++robots) :: FuturamaDist

Again, this is only possible because the distributions have a monoid structure.
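The shape of such a parallel trainer can be sketched for any monoidal model: split the data into one chunk per core, train each chunk independently, then merge the results with mconcat. The names below (parallelTrain, Count) are hypothetical illustrations, and the chunks are trained sequentially here for simplicity—a real implementation would evaluate them on separate cores:

```haskell
-- Split the data into chunks, train each independently, then merge with
-- the monoid operation. Sequential here; a real implementation would run
-- each chunk on its own core.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

parallelTrain :: Monoid model => ([a] -> model) -> Int -> [a] -> model
parallelTrain trainFn nCores xs = mconcat (map trainFn (chunksOf size xs))
  where size = max 1 ((length xs + nCores - 1) `div` nCores)

-- a toy monoidal "model" that just counts its training points
newtype Count = Count Int deriving (Show, Eq)
instance Semigroup Count where Count a <> Count b = Count (a + b)
instance Monoid Count where mempty = Count 0

main :: IO ()
main = print (parallelTrain (Count . length) 4 xs == Count (length xs))
  where xs = [1 .. 10 :: Int]
```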

Now, let’s ask some questions of our distribution. If I pick a character at random, what’s the probability that they’re a good guy? Let’s plot the marginal.

ghci> plotDistribution (plotFile "goodguy" $ PNG 250 250) $ getMargin TH_isGood futuramaDist

But what if I only want to pick from those characters that are humans, or those characters that are robots? Statisticians call this conditioning. We can do that with the condition function:

ghci> plotDistribution (plotFile "goodguy-human" $ PNG 250 250) $ getMargin TH_isGood $ condition TH_species "human" futuramaDist
ghci> plotDistribution (plotFile "goodguy-robot" $ PNG 250 250) $ getMargin TH_isGood $ condition TH_species "robot" futuramaDist

Now let’s ask: What’s the average age of an evil robot?

ghci> mean $ getMargin TH_age $ condition TH_isGood (Just False) $ condition TH_species "robot" futuramaDist
273.0769230769231

Notice that conditioning a distribution is a commutative operation. That means we can condition in any order and still get the exact same results. Let’s try it:

ghci> mean $ getMargin TH_age $ condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist
273.0769230769231
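Why conditioning commutes is easy to see in a toy model. The sketch below represents a joint distribution as a probability table over (species, isGood, age); condition' is a hypothetical stand-in for HLearn’s condition that filters matching rows and renormalizes, and both orders of conditioning give the same mean:

```haskell
type Row = (String, Bool, Double)   -- (species, isGood, age)

-- a toy joint distribution as a probability table
-- (probabilities chosen to be exactly representable doubles)
joint :: [(Row, Double)]
joint =
  [ (("robot", False, 200), 0.375)
  , (("robot", True,  100), 0.125)
  , (("human", False,  40), 0.125)
  , (("human", True,   30), 0.375)
  ]

-- keep only the rows matching a predicate, then renormalize
condition' :: (Row -> Bool) -> [(Row, Double)] -> [(Row, Double)]
condition' keep rows = [ (r, p / total) | (r, p) <- kept ]
  where kept  = filter (keep . fst) rows
        total = sum (map snd kept)

meanAge :: [(Row, Double)] -> Double
meanAge rows = sum [ age * p | ((_, _, age), p) <- rows ]

main :: IO ()
main = do
  let evilThenRobot = condition' (\(s,_,_) -> s == "robot")
                    . condition' (\(_,g,_) -> not g)
      robotThenEvil = condition' (\(_,g,_) -> not g)
                    . condition' (\(s,_,_) -> s == "robot")
  -- conditioning in either order yields the same distribution
  print (meanAge (evilThenRobot joint) == meanAge (robotThenEvil joint))
```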

There’s one last thing for us to consider. What does our Markov network look like after conditioning? Let’s find out!

ghci> plotNetwork "condition-species-isGood" $ condition TH_species "robot" $ condition TH_isGood (Just False) futuramaDist

Notice that conditioning on these variables removed them from our Markov Network.

Finally, there’s another similar process to conditioning called “marginalizing out.” This lets us ignore the effects of a single attribute without specifically saying what that attribute must be. When we marginalize out on our Markov network, we get the same dependence structure as if we conditioned.

ghci> plotNetwork "marginalizeOut-species-isGood" $ marginalizeOut TH_species $ marginalizeOut TH_isGood futuramaDist

Effectively, what the marginalizeOut function does is “forget” the extra dependencies, whereas the condition function “applies” those dependencies. In the end, the resulting Markov network has the same structure, but different values.
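The difference shows up clearly on a toy two-variable table. In the sketch below, marginalizeOutSpecies and conditionSpecies are hypothetical stand-ins, not HLearn’s functions: marginalizing sums the species variable away, while conditioning filters on it and renormalizes—same remaining structure, different values:

```haskell
import qualified Data.Map.Strict as M

-- toy joint table over (species, isGood); probabilities are exact doubles
joint :: M.Map (String, Bool) Double
joint = M.fromList
  [ (("robot", False), 0.375), (("robot", True), 0.125)
  , (("human", False), 0.125), (("human", True), 0.375)
  ]

-- "forget" species: sum it away
marginalizeOutSpecies :: M.Map (String, Bool) Double -> M.Map Bool Double
marginalizeOutSpecies = M.mapKeysWith (+) snd

-- "apply" species: keep matching rows and renormalize
conditionSpecies :: String -> M.Map (String, Bool) Double -> M.Map Bool Double
conditionSpecies s t = M.map (/ total) kept
  where kept  = M.mapKeysWith (+) snd (M.filterWithKey (\(s', _) _ -> s' == s) t)
        total = sum (M.elems kept)

main :: IO ()
main = do
  print (marginalizeOutSpecies joint)       -- species forgotten
  print (conditionSpecies "robot" joint)    -- species applied
```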

Finally, at the start of the post, I mentioned that our multivariate distributions have group and vector space structure. This gives us two more operations we can use: the inverse and scalar multiplication. You can find more posts on how to take advantage of these structures here and here.

The best part of all of this is still coming. Next, we’ll take a look at full-on Bayesian classification and why it forms a monoid. Besides online and parallel trainers, this also gives us a fast cross-validation method.

There’ll also be posts about the monoid structure of Markov *chains*, the Free HomTrainer, and how this whole algebraic framework applies to NP-approximation algorithms as well.

Subscribe to the RSS feed to stay tuned.