Converting images into time series for data mining

The first step in data mining images is to create a distance measure for two images.  In the intro to data mining images, we called this distance measure the “black box.”  This post will cover how to create distance measures based on time series analysis.  This technique is great for comparing objects with a constant, rigid shape.  For example, it will work well on classifying images of skulls, but not on images of people.  Skulls always have the same shape, whereas a person might be walking, standing, sitting, or curled into a ball.  By the end of this post, you should understand how to compare these hominid skulls from UC Riverside [1] using radial scanning and dynamic time warping.

But first, we must start from the beginning.  What exactly is a time series?  Anything that can be plotted on a line graph.  For example, the price of Google stock is a time series:

As you can imagine, time series have been studied extensively.  Most scientists use them at some point in their careers.  Unsurprisingly, they have developed many techniques for analyzing them.  If we can convert our images into time series, then all these tools become available to us.  Therefore, the time series distance measure has two steps:

STEP 1: Convert the images into a time series

STEP 2: Find the distance between two images by finding the distance between their time series

We have our choice of several algorithms for each step.  In the rest of this post, we will look at two algorithms for converting images into time series: radial scanning and linear scanning.  Then, we will look at two algorithms for measuring the distance between time series: Euclidean distance and dynamic time warping.  We will conclude by looking at the types of problems time series analysis handles best and worst.

STEP 1A: Creating a time series by radial scanning

Radial scanning is tricky to explain, but once it clicks you’ll realize that it is both simple and elegant.  Here’s an example from a human skull:

First we find the skull’s outline.  Then we find the distance from the center of the skull to each point on the skull’s outline (B).  Finally, we plot those distances as a time series (C).  The lines connecting the skull to the graph show where that point on the skull maps to the time series below.  In this case, we started at the skull’s mouth and went clockwise.
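
To make the idea concrete, here is a minimal Python sketch of radial scanning.  It assumes the outline has already been extracted as an ordered list of (x, y) points (finding the outline itself, for example with an edge detector, is a separate preprocessing step), and it uses the centroid of those points as the center:

```python
import math

def radial_scan(outline):
    """Convert a closed outline (an ordered list of (x, y) points around the
    shape) into a time series of distances from the shape's center."""
    # Use the centroid of the outline points as the "center" of the shape.
    cx = sum(x for x, _ in outline) / len(outline)
    cy = sum(y for _, y in outline) / len(outline)
    # Walk around the outline and record the distance to the center at each
    # point; the resulting list of distances is the time series.
    return [math.hypot(x - cx, y - cy) for x, y in outline]
```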

Skulls from different species produce different time series:

Take a careful look at these skulls and their time series.  Make sure you can spot the differences in the time series between each grouping. Don’t worry yet about how the groupings were made.  Right now, just get a feel for how a shape can be converted into a time series.

Another example of radial scanning comes from Korea University.  Here we are trying to determine a tree’s species based on its leaf shapes [2]:

The labeled points on the leaf at left correspond to the labeled positions on the time series at right.  Radial scanning is a popular technique for leaf classification because every species of plant has a characteristic leaf shape.  Each leaf will be unique, but the pattern of peaks and valleys in the resulting time series should be similar if the species of plant is the same.

We can already tell that the graphs created by the skulls and the leaf look very different to the human eye.  This is a good sign that radial scanning captures important information about the object’s shape that we will be able to use in the comparison step.

STEP 1B: Creating a time series by linear scanning

Some objects just aren’t circular, so radial scanning makes no sense.  One example is handwritten words.  The University of Massachusetts has analyzed a large collection of George Washington’s letters using the linear scanning method [3][4].  The first image shows the word “Alexandria” as Washington actually wrote it:

Then, we remove the tilt from the image.  All of Washington’s writing has a fairly constant tilt, so this process is easy to automate.

Finally, we create a time series from the word:

To create this time series, we start at the left of the image and consider each column of pixels in turn.  The value at each “time” is just the number of dark pixels in that column.  If you look closely at the time series, you should be able to tell which bump corresponds to which letter.  Some letters, like the “d”, get two bumps in the time series because they have two areas with a high concentration of dark pixels.
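
As a rough illustration (not the exact procedure from the paper), here is a small Python sketch of this column-counting idea, assuming the word has already been cropped, de-tilted, and loaded as a grayscale numpy array where 0 is black ink and 255 is white paper:

```python
import numpy as np

def linear_scan(image, threshold=128):
    """Convert a grayscale word image (2-D numpy array, 0 = black ink,
    255 = white paper) into a time series with one value per pixel column."""
    dark = image < threshold      # boolean mask of "ink" pixels
    return dark.sum(axis=0)       # number of dark pixels in each column
```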

We could have constructed the time series in other ways as well.  For example, we could have counted the number of pixels from the top of the column to the first dark pixel.  This would have created an outline of the top of the word.  We simply have to consider our application carefully and decide which method will work the best.
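
Continuing the sketch above (same grayscale-array convention), the alternative “outline of the top of the word” series could look something like this:

```python
def top_profile(image, threshold=128):
    """Alternative time series: for each column, the number of pixels from
    the top of the image down to the first dark pixel."""
    dark = image < threshold
    # argmax over a boolean column returns the index of its first True value.
    heights = dark.argmax(axis=0)
    # Columns with no ink at all fall back to the full image height.
    heights[~dark.any(axis=0)] = image.shape[0]
    return heights
```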

We now have two simple methods for creating time series from images.  These are the simplest and most common methods, but not the only ones.  WARP [5] and Beam Angle Statistics [6] are two examples of other methods.  Which is best depends, as always, on the specific application.  Now that we can create the time series, let’s figure out how to compare them.

STEP 2: Comparing the distances

The whole purpose of creating the time series was to create a distance measure that uses them.  The easiest way to do this is the Euclidean distance.  (This is the normal notion of distance that we are used to.)  Consider the two time series below [7]:

To calculate the overall distance, we calculate the distance between each corresponding point in the time series.  Corresponding points are connected by black lines.  Notice that the first blue hump corresponds to a flat red area, so this causes the black lines to be shorter.  The second red hump corresponds to a flat blue area, so the black lines are longer.  Everywhere else, the two time series line up fairly well, so the black lines have a mostly constant height.  (Normally, we would start the two time series at the same height, so the first black line would be zero; however, the time series have been moved apart to make the black lines easy to see.)

More formally,

$$d(R, B) = \sqrt{\sum_{i=1}^{n} (r_i - b_i)^2}$$

where $r_i$ is the height of the red series at “time” $i$, $b_i$ is the height of the blue series at “time” $i$, and $n$ is the length of the time series.  This is a simple and fast calculation, running in $O(n)$ time.
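
In code, the Euclidean distance between two equal-length series is a one-pass loop; this minimal Python version is just a direct transcription of the formula above:

```python
import math

def euclidean_distance(r, b):
    """Euclidean distance between two time series of equal length."""
    if len(r) != len(b):
        raise ValueError("the two time series must have the same length")
    return math.sqrt(sum((ri - bi) ** 2 for ri, bi in zip(r, b)))
```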

A more sophisticated way to compare time series is called Dynamic Time Warping (DTW).  DTW tries to compare similar areas in each time series with each other.  Here are the same two time series compared with DTW:

In this case, each of the humps in the blue series is matched with a hump in the red series, and all the flat areas are paired together.  Notice that a single point in one time series can align with multiple points in the other.  DTW gives a distance of nearly zero because it finds a nearly perfect match; Euclidean distance found a much worse match and would give a large distance.

For most applications, dynamic time warping outperforms straight Euclidean distance.  Take a look at this dendrogram clustering:

The orange series contain three humps, the green four, and the blue five.  But the humps do not line up, so this is a difficult problem for straight Euclidean distance.  In contrast, DTW successfully clustered the time series based on the number of humps they have.

That’s great, but how did DTW decide which points in the red and blue time series should align?

Exhaustive search.  We try every possible alignment and pick the one that works best.  This will be easier to see with a simpler example:

To perform the search, we create an $n \times m$ matrix, where $n$ and $m$ are the lengths of the two series.  Each row corresponds to a time along the red series, and each column corresponds to a time along the blue series.  The value of each cell is the distance between the red value $r_i$ and the blue value $b_j$.  This effectively compares every time in the red series with every time in the blue series.  Then, we select the path through the matrix that minimizes the total distance:

The colored boxes correspond to the colored lines connecting the two time series in the first image.  For example, the four light blue squares in the top right are on a single row, so they map one point on the red series to four points on the blue one.

Using dynamic programming, DTW is an $O(n^2)$ algorithm, which is much slower than Euclidean distance’s $O(n)$.  This is a serious problem if we want to use the algorithm to search a large database.
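
Here is a minimal Python sketch of that dynamic program.  It fills the full matrix, so it runs in quadratic time; squared point-to-point differences are used as the cell cost, which is one common convention:

```python
def dtw_distance(r, b):
    """Dynamic time warping distance between time series r and b, computed
    with the standard dynamic program over the full alignment matrix."""
    n, m = len(r), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning r[:i] with b[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (r[i - 1] - b[j - 1]) ** 2
            # Each step may advance along r, along b, or along both at once.
            D[i][j] = cost + min(D[i - 1][j],      # advance along r
                                 D[i][j - 1],      # advance along b
                                 D[i - 1][j - 1])  # advance along both
    return D[n][m] ** 0.5
```

Comparing two skulls would then look something like `dtw_distance(radial_scan(a), radial_scan(b))`, with the earlier radial-scanning sketch providing the time series.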

The easiest way to speed up the algorithm is to calculate only a small fraction of the matrix.  Intuitively, we want our warping path to stay relatively close to the diagonal.  If it stays exactly on the diagonal, then each red time corresponds exactly to one blue time; this is the same as the Euclidean distance.  At the opposite extreme is a path that follows the left-most edge and then the top-most edge.  In that case we are comparing the first blue value to all red values and the last red value to all blue values, which seems unlikely to make a good match.

There are two common ways to limit the number of calculations.  The first is the Sakoe-Chiba band:

The second method is the Itakura parallelogram:

The basic ideas behind these restrictions are pretty straightforward from their pictures.  What isn’t straightforward, however, is that these techniques also increase DTW’s accuracy [8].  DTW was introduced to the data mining community in 1994 [9].  For over a decade, researchers tried to find ways to increase the amount of the matrix they could search because they falsely believed that this would lead to more accurate results.
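
For example, the Sakoe-Chiba band amounts to one extra restriction in the inner loop of the DTW sketch above: only cells within a fixed distance w of the diagonal are ever filled in.  (Here w is a tuning parameter, and it must be at least the difference in series lengths for a valid path to exist.)

```python
def dtw_distance_banded(r, b, w):
    """DTW restricted to a Sakoe-Chiba band of half-width w: matrix cells
    with |i - j| > w are skipped, which limits off-diagonal warping."""
    n, m = len(r), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the band around the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (r[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5
```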

We can also speed up the calculation using an approximation function called a lower bound.  A lower bound is computationally much cheaper than the full DTW function (a good one might run 1000 times faster than the full DTW) and is always less than or equal to the real DTW distance.  We can run the lower bound on millions of images and then run the full DTW algorithm on only the potentially closest matches.  Two good lower bounds are LB_Improved [10] and LB_Keogh [11].
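
As a rough idea of how such a bound works, here is a simplified sketch in the spirit of LB_Keogh (not the authors’ exact formulation): build an envelope around the query using the warping window w, and charge a candidate only for the parts that fall outside that envelope.  It assumes both series have the same length:

```python
def lb_keogh(q, c, w):
    """Simplified LB_Keogh-style lower bound on the band-constrained DTW
    distance between a query q and a candidate c, using warping window w."""
    total = 0.0
    for i, ci in enumerate(c):
        window = q[max(0, i - w): i + w + 1]
        lo, hi = min(window), max(window)   # envelope of q around position i
        # Only points of c that fall outside the envelope contribute.
        if ci > hi:
            total += (ci - hi) ** 2
        elif ci < lo:
            total += (ci - lo) ** 2
    return total ** 0.5
```

A search would first rank every candidate by this cheap bound and run the full DTW only on the candidates whose bound is small enough.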

Finally, there are other methods for comparing time series.  The most common is called Longest Common Sub-Sequence (LCSS).  It is useful for matching images suffering from occlusion [12].

When to use Time Series Analysis

Time series analysis is only sensitive to an object’s shape.  It is invariant to colors and internal features.  These properties make time series analysis good for comparing rigid objects, such as skulls, leaves, and handwriting.  These shapes do not change over time, so they will have similar time series no matter when they are measured.

Time series analysis will not work on objects that can change their shapes over time.  People are good examples of this, because we have many different postures.  We can walk, sit, or curl into a ball.  Another distance measure called “shock graphs” is better for comparing the shapes of objects that can move.  We’ll cover shock graphs in a later post.

Footnotes
  1. Eamonn Keogh, Li Wei, Xiaopeng Xi, Sang-Hee Lee and Michail Vlachos. “LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures.” VLDB 2006. (PDF)
  2. Yoon-Sik Tak and Eenjun Hwang. “A Leaf Image Retrieval Scheme Based on Partial Dynamic Warping and Two-Level Filtering.” 7th International Conference on Computer and Information Technology, 2007. (Access on IEEE)
  3. Rath, Kane, Lehman, Partridge, and Manmatha. “Indexing for a Digital Library of George Washington’s Manuscripts: A Study of Word Matching Techniques.” CIIR Technical Report. (PDF)
  4. Rath and Manmatha. “Word Image Matching Using Dynamic Time Warping.” Proceedings of CVPR 2003, vol. 2, pp. 521–527. (PDF)
  5. Bartolini, Ciaccia, and Patella. “WARP: Accurate Retrieval of Shapes Using Phase of Fourier Descriptors and Time Warping Distance.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 1, January 2005. (PDF)
  6. Arica and Yarman-Vural. “BAS: A Perceptual Shape Descriptor Based on the Beam Angle Statistics.” Pattern Recognition Letters, 2003. (PDF)
  7. Keogh. “Exact Indexing of Dynamic Time Warping.” (PDF)
  8. Ratanamahatana and Keogh. “Three Myths about Dynamic Time Warping.” SDM 2005. (PDF)
  9. Berndt and Clifford. “Using Dynamic Time Warping to Find Patterns in Time Series.” KDD 1994. (PDF)
  10. Lemire. “Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound.” Pattern Recognition, 2008. (PDF)
  11. Keogh and Ratanamahatana. “Exact Indexing of Dynamic Time Warping.” Knowledge and Information Systems, 2002. (PDF)
  12. Yazdani and Özsoyoglu. “Sequence Matching of Images.” Proc. 8th Int. Conf. on Scientific and Statistical Database Management, 1996, pp. 53–62.
Comments

  1. Jeremy:

     This is great.

     This might be out of the scope of the blog, but how simple is it to implement these kind of things in R or MatLab?

     Mike (in reply):

     Thanks!

     It seems like most people use MATLAB to implement the algorithms. MATLAB is nice because most steps can be accomplished in just one or two lines. If you follow the links to the papers, the authors usually put code in the paper or link to the project’s webpage where code will be provided.

  2. Andy:

     I’m a biologist who’s interested in this stuff, but know nothing about how it works. Just wanted to say that I thought it was an excellent, clear intro.

  3. Kenny:

     Excellent post. I’ve used radial scanning in different forms, but I never knew other people used this method in the same manner. A few months ago I created a time series algorithm for image recognition, which at the time I dubbed “radial frequency”. I was working on a machine learning algorithm to do supervised learning on a variety of similar Google image results.

     During the development I used several different photos of apples and oranges, with a goal of teaching the learning algorithm how to accurately and precisely recognize whether or not a photo was featuring an apple or an orange.

     My method used a k-means cluster algorithm to first organize 2d points on a brightness map of the original image. I would then overlay the 2d points back on the original image and transform them to 3d by converting the underlying pixel’s color to a value based on the greatest distance between any two points in the collection.

     If that distance was 50 pixels, then the z value for the new 3d point would be a -25 to 25 value based on the position of the pixel’s color value on the visible color spectrum.

     With a collection of 3d points ready for time series analysis, I perform a spherical radial burst at the arbitrary origin coordinate and plot the collisions as a time series.

     To improve the time series comparison, perform a radial scan on the 2d points to be used in conjunction with the data captured on the 3d radial scan.

     Mike (in reply):

     Very cool. I’ve never seen anyone create a 3rd dimension off of the image based on pixel values. I’ll bet there’s a lot of room for making cool algorithms with that. Got a link?

  4. Chris:

     Thanks for writing this! I’ve been learning haskell and came across your blog while looking at the HLearn library.

     I’ve heard of DTW before, but never took the time to understand it. After reading your beautiful explanation, I’ve realized it’s the perfect solution to a problem I’ve had at work of comparing high frequency financial market data.

     Mike (in reply):

     Awesome! Glad it helped!

  5. Sam:

     Really awesome tutorial.. It did help me A LOT.. thanks Mike. But can you point me some clues or code to convert handwriting into time series data. I’ve googled and found nothing. maybe it’s just perfect if you can provide some code..

  6. dan:

     Great idea.

     How would you deal with unequal time series lengths?

     Mike (in reply):

     DTW should handle that automatically without you having to do anything!

  7. Sj:

     Thanks for this great post! I have few queries though. I am trying to implement this in MATLAB. I believe I have the “distance matrix”. What I wish to attempt is to obtain the warped signal. So basically, I have a reference signal and another signal and I apply DTW to obtain a kind of a generalized signal. Do you get the idea? I hope I am not misinterpreting the concept.

     Also, what if I have suppose 5 signals that I need to apply this to at the same time. What would you recommend in that scenario?