Mike Izbicki's Research Page

last updated on Jul 1st, 2019

About me

I'm an assistant professor at Claremont McKenna College. My research develops new machine learning algorithms for large scale social media problems. I particularly like developing algorithms with strong theoretic guarantees, and using these guarantees to achieve better real-world performance. The main theoretical tools I use are from high dimensional probability, computational geometry, and numerical optimization. Outside of the machine learning realm, I do a lot of work with Haskell and open source software.

You can contact me at mike@izbicki.me, and you can download my full cv here.

Selected Projects

Geolocating Social Media

Social media messages are sent from all over the world, in hundreds of languages. Unfortunately, most research on social media only studies English-language messages sent from Western countries, ignoring billions of people. To understand how these people use social media, I've developed techniques for understanding the content of images and text written in any language, and used these techniques to predict which region of the world the content was created in. These techniques have applications such as understanding how political opinions differ throughout the world and measuring the effects of climate change.


Source code for these projects is available on the github page.

Robust stochastic gradient descent

Stochastic gradient descent (SGD) is the main algorithm used to train large scale, modern machine learning algorithms like deep neural networks. Unfortunately, it can fail when a malicious adversary has corrupted the data. I showed that a simple technique call gradient clipping makes SGD robust to these attacks both in theory and practice.


Distributed Learning

I'm interested in map-reduce style distributed learning algorithms. These algorithms are easy to implement, fast, and can provide privacy guarantees. My latest work introduces a new algorithm that improves accuracy without losing speed. My previous work shows that map-reduce style algorithms come with cross-validation performance estimates essentially for free.


My disseration contains an expanded description of this work that's a lot more approachable.

Faster Nearest Neighbor Queries

The cover tree data structure is used to speed up nearest neighbor queries. I simplified the definition of the cover tree and proposed a number of modifications that improve the data structure's real world performance. I'm currently working on ways to further improve the theoretical bounds.


You can find code for experiments available on the project's github page.

Protecting the Power Grid

Modern smartgrids can be damaged by malicious loads. Identifying these malicious loads in large power networks is a difficult challenge. I helped design a new algorithm for identifying malicious loads that is more scalable than previous algorithms. The method uses a low rank modification to the unscented kalman filter.


Haskell Projects

I like exploring the ways that functional programming can make writing numerical code (and especially machine learning code) easier. Some of my open source projects are:

  1. HerbiePlugin automatically rewrites mathematical expressions in Haskell code to be numerically stable.
  2. Subhask is an alternative standard library for Haskell. It provides type safe linear algebra operations, and the interface is considerably more flexible than standard tools like Matlab and NumPy.
  3. HLearn is a machine learning library for Haskell. Unfortunately, Haskell's existing type system is too limited for some of the features I need, so this project is on temporary hiatus. Once Haskell gets proper dependent types, I hope to resume working on this project. An old version of the project is described in this paper presented at TFP2013, but the library has advanced considerably since then and the paper is now out of date.

I also have a number of projects that explore the limits of the Haskell type system:

  1. homoiconic automatically constructs FAlgebras that are isomorphic to a type class. The FAlgebras can then be manipulated as ordinary Haskell objects. The end result is the addition of a Lisp-like homoiconicity feature for Haskell.
  2. ifcxt proposes an extension to Haskell's type system to allow constraints to include if statements. This lets us write faster, more generic code.
  3. typeparams provides "lenses" at the type level to facilitate working with complex types.
  4. parsed is a rewrite of the classic Haskell parsec library in bash. See also the associated SIGBOVIK paper Bashing Haskell.

Improving Contributions to Open Source Software

At UC Riverside, I designed and taught a course on open source software construction. The course won 3rd place in the 2015 SIGCSE Student Research Competition. I also presented the course at the ICOPUST conference in the DPRK (North Korea). I helped DPR Korean students create the first open source contributions from their country.


Some open source projects I've helped undergrads start: