The Machine Stares Back by Douglas L. Smith
     
 



Above: Two examples of Munich's signature (top). If you track the pen's vertical motion over time (center), you get this plot. Dynamic time warping (bottom) lines the curves up by squishing or stretching the time axis as needed at each instant to get the best match. The system then measures the vertical displacement between the two traces, point by point, to decide if they are the same. (In practice, a reference signature is derived from compositing several examples.) Below: The same applies to the pen's horizontal movements.

Says Munich, "Ours is the only camera-based system I know of that looks at writing as it's being generated. You could write on ordinary paper while the camera watches, and then throw the paper away. And cameras can be really small. You could have a tiny camera on a wire connected to a credit-card-sized computer. It would be great for airplanes—you'd clip the camera onto the seat-back in front of you, and use the tray table for a desk. It allows for full pen-based interaction with the computer, just as you would with a mouse and keyboard." While collaborators at Bielefeld University in Germany are working on actually reading free-form penmanship (palmtops are still in kindergarten; they can't read cursive script), Munich is working on the underlying problem of seeing the writing.

The basic idea is simple. You poise the pen over a predesignated point on the paper for a second or two, to give the computer a chance to find the pen tip. (It's kind of like going to the inkwell before beginning to write with a quill. In fact, a future version of the system will project an inkwell icon onto the paper, and you'll "dip" into the inkwell to start.) The machine beeps when it's ready, and off you go. The computer subtracts out the background paper to create an internal model of what the tip looks like, which it uses to hunt for the tip in subsequent frames. Munich wrote software to measure the tip's position, velocity, and acceleration, and uses another Kalman filter to predict where the tip will turn up next. Again, the system only processes the part of the image it knows the tip will be in, allowing it to run in real time. The computer takes a second look once the pen has moved on, to see if it left a mark. If so, the computer records a "pen-down" stroke (the pen was touching the paper); if not, it's a "pen-up" stroke that the reading program can ignore.
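
For readers who want to see the predict-then-search loop spelled out, here is a minimal Python sketch. It is not Munich's code. It assumes a simplified constant-velocity model (the tracker described above also uses acceleration), one independent filter per image axis, and made-up noise values. The names AxisKalman, search_window, and predict_next_tip are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """1-D constant-velocity Kalman filter for one image axis (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    # Entries of the 2x2 state covariance P = [[p00, p01], [p10, p11]].
    p00: float = 100.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 100.0
    q: float = 0.01  # process noise added per frame (assumed value)
    r: float = 1.0   # measurement noise variance in pixels^2 (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Advance the state by one frame and return the predicted position."""
        self.pos += self.vel * dt
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q).
        self.p00 = a + dt * (b + c) + dt * dt * d + self.q
        self.p01 = b + dt * d
        self.p10 = c + dt * d
        self.p11 = d + self.q
        return self.pos

    def update(self, measured: float) -> None:
        """Correct the prediction with a measured tip coordinate."""
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s   # Kalman gain for pos and vel
        residual = measured - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        # P <- (I - K H) P with H = [1, 0].
        self.p00, self.p01 = (1 - k0) * a, (1 - k0) * b
        self.p10, self.p11 = c - k1 * a, d - k1 * b


def search_window(cx, cy, half_size, width, height):
    """Clamp a square region of interest around the predicted tip to the image."""
    left = max(0, int(cx) - half_size)
    top = max(0, int(cy) - half_size)
    right = min(width, int(cx) + half_size + 1)
    bottom = min(height, int(cy) + half_size + 1)
    return left, top, right, bottom


def predict_next_tip(measurements):
    """Feed measured (x, y) tip positions frame by frame; return the next predicted (x, y)."""
    x0, y0 = measurements[0]
    kx, ky = AxisKalman(pos=x0), AxisKalman(pos=y0)
    for x, y in measurements[1:]:
        kx.predict()
        ky.predict()
        kx.update(x)
        ky.update(y)
    return kx.predict(), ky.predict()
```

In a real tracker, each measured position would come from matching the stored tip model inside the window that search_window returns, so only a small patch of every frame needs to be examined.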

The pen-tip position, velocity, and acceleration data is a mathematical description of a curve, which can be matched against other curves, and Munich realized that he had an ideal system for automatic signature verification—a hot technology although not, as we have seen, a mature one. A machine match isn't yet legal in court, for example; but then, DNA evidence has had a pretty rocky road, too. So he modified a popular signal-matching algorithm called dynamic time warping to compensate for the data being offset in time, meaning that the points from one signature usually lie between the points from the other—for example, the first set might catch a cursive "l" at the top and bottom of the loop, while the second set might catch the midpoints of the ascending and descending strokes. (The system runs at 60 frames per second, so the gaps between the points aren't that big, but you get the idea.) He then wrote software to decide if the aligned signatures were close enough to constitute a match, developing more mathematical improvements en route. "The hardest part was actually collecting enough examples to train the system," says Munich. "Normally, you'd like to have dozens of signatures per person, but there's a limit to how many times you can get your labmates, or someone applying for a credit card, to sign their names for you. I only got maybe 10 signatures each." But he noticed that no two of them were quite the same size, or at quite the same angle, so he was able to generate more by slightly rotating or resizing the ones he had. He could even squash them sideways a bit, as if turning a rectangle into a parallelogram. He used the same strategy to evaluate the system's performance, bulking up the number of real signatures and forgeries until there were enough different samples to be statistically meaningful.
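
Two ideas in this paragraph can be sketched in a few lines of Python. The first function is textbook dynamic time warping on a single coordinate trace, not Munich's modified version, which the article does not spell out. The second produces extra training samples by rotating, resizing, and shearing a signature. The function names and the parameter values in the usage note are illustrative assumptions.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D traces.

    The time axis of either trace may be stretched or squeezed; the result
    is the summed point-by-point displacement along the best alignment.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i points of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def affine_variant(points, angle_deg=0.0, scale=1.0, shear=0.0):
    """Return a sheared, rotated, and resized copy of a list of (x, y) points."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        xs = x + shear * y             # sideways squash: rectangle -> parallelogram
        xr = cos_t * xs - sin_t * y    # rotate about the origin
        yr = sin_t * xs + cos_t * y
        out.append((scale * xr, scale * yr))
    return out
```

For example, dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]) is zero because the second trace is just a slower rendition of the first, and a call such as affine_variant(signature, angle_deg=3, scale=1.05, shear=0.1) yields one slightly distorted copy of a recorded signature. A verifier would still need a threshold on the warped distance to call a match; that decision rule is not shown here.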

It turns out that for signature verification, it doesn't matter whether the pen is touching the paper. We sign our names so often that it's automatic—a single gesture from start to flourish, what a biomechanician would call a ballistic movement. Half the time we're not even looking. Consequently, the pen-up strokes are just as consistent as the pen-down strokes and a lot harder to counterfeit. Says Munich, "You can sit and practice a signature from an example, drawing it over and over slowly and carefully, but how are you going to practice the strokes that aren't recorded?" Leaving aside such obvious gaffes as dotting the wrong "i" first, there's the question of rhythm. Since the computer is recording the pen's speed as well as its path, the forger would have to perform in sync with the victim. (Imagine a pair of ice dancers en duet in separate TV studios, to be composited on videotape later.) "Many other systems use only the pen-down strokes, so we showed that the full trajectory had a comparable, if not better, performance," says Munich.

But the simplest ID-verification system might be staring you in the face—can a computer know you by sight? Actually, this is really the second of two questions, with the first being, can a computer figure out for itself that it's looking at a face? Consider a security camera scanning a crowded department store at Christmas. Can a computer pull the faces out of the milling crowd, the shifting piles of merchandise, the flashing lights, the gently swaying swags of tinsel, and so on? Only then does it make sense to ask if the computer can say, "Hey! That guy's a known shoplifter!" Volumes have been written about face recognition, but in its most general form it remains an unsolved problem. Besides the usual lighting and perspective troubles that any object-recognition system is heir to, faces are infinitely variable—not only from person to person, but from minute to minute. (Watch a two-year-old making faces in the mirror some time.) So some systems look for very low-level features—the < at the corner of the eye, for example—and measure the distances to other such features. A set of readings that matches average distances on real faces is declared to be a face. Other systems take a high-level approach by looking at all the pixels at once and matching them against a stored gallery of faces. Mike Burl (BS '87, MS '92, PhD '97), now at JPL; Thomas Leung (BS '94), now at UC Berkeley working with Perona's thesis advisor, Jitendra Malik; and grad student Markus Weber have developed a system that combines the best of both approaches.
