The Machine Stares Back by Douglas L. Smith | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
     
 



Above: In this shot of Perona's face, the circles mark all the features the computer thinks could be eyes, the +'es are nostrils, and the X'es are nose tips. The computer picks a pair of eye candidates (the correct one, as it happens), and searches the central ellipse for a nose tip and the side ellipses for nostrils.


Above: Four out of five ain't bad. The computer can still find Burl's face, even with one eye hidden.

Their system has a set of high-level feature detectors that independently hunt for such things as the eyes, or the tip of the nose, or the corners of the mouth. Each detector marks all the spots that it thinks could be its feature, and the candidates are then combined in groups to see how they fit. "It starts by looking at the features a pair at a time," Burl explains. "Given a pair of features, it knows where to expect the other ones. So given a potential right eye and a potential left eye, it searches an ellipse between and below them for a potential nose, and so on." If everything falls into place, it's probably found a face; if not, it probably hasn't. That word "probably" is the key. Other systems make "hard" detections: either something is an eye corner, or it isn't. This system gives soft detections, saying, "Gee, this looks pretty eye-like; I'll say 80 percent odds." This is a lot more error-tolerant, as a set of features that didn't score well individually but are correctly positioned can outscore one really good eye that doesn't go with anything else. And if the machine finds a few features it likes really well, it will forgive the absence of the others. Thus when Burl covered his mouth with his hand, or tilted a bicycle helmet over one eye, it still picked him out amid the lab's background clutter.
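The idea of soft detections can be sketched in a few lines of Python. This is only an illustration, not the lab's code: the article does not say how the scores are combined, so multiplying the confidences, the ellipse proportions, the `missing_score` penalty, and every function name here are assumptions made for the example.

```python
from itertools import product
from math import hypot


def expected_nose(left_eye, right_eye):
    """Guess where a nose tip should be, given a pair of eye positions.

    Assumption: the nose tip lies below the midpoint of the eyes by about
    0.6 eye distances (image y grows downward). The real geometry is unknown.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_dist = hypot(rx - lx, ry - ly)
    center = ((lx + rx) / 2, (ly + ry) / 2 + 0.6 * eye_dist)
    return center, eye_dist


def in_ellipse(point, center, radius_x, radius_y):
    """True if point lies inside the axis-aligned ellipse around center."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (dx / radius_x) ** 2 + (dy / radius_y) ** 2 <= 1.0


def best_face(left_eyes, right_eyes, noses, missing_score=0.2):
    """Return (score, left_eye, right_eye) for the best-scoring hypothesis.

    Each candidate is ((x, y), confidence), with confidence between 0 and 1.
    A missing nose is forgiven: it contributes missing_score instead of
    vetoing the hypothesis outright, as a hard detector would.
    """
    best = None
    for (le, le_conf), (re, re_conf) in product(left_eyes, right_eyes):
        if re[0] <= le[0]:
            continue  # the second eye must lie to the right in the image
        center, dist = expected_nose(le, re)
        nose_confs = [conf for pos, conf in noses
                      if in_ellipse(pos, center, 0.4 * dist, 0.5 * dist)]
        score = le_conf * re_conf * max(nose_confs, default=missing_score)
        if best is None or score > best[0]:
            best = (score, le, re)
    return best


# Three mediocre but well-placed features, plus one excellent stray "eye".
left_eyes = [((40, 50), 0.6), ((10, 120), 0.95)]
right_eyes = [((80, 50), 0.6)]
noses = [((60, 74), 0.6)]
score, left, right = best_face(left_eyes, right_eyes, noses)
```

In this toy scene the consistent trio wins even though the stray candidate has the highest individual confidence, because no nose turns up where that pairing predicts one.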

The current version runs on a PC at five frames per second, says Weber. "So every one-fifth of a second, it will find your face. At that rate, it can follow you around. If the system took half a minute to find you, you might be long gone before it decided you were there." This is important not only for security applications but also for fancier notions still to come. If somebody does build an emotion recognizer, for example, it will probably be a computation hog. But if the face recognizer found the face first, and then presented to the emotion recognizer just that part of the screen containing the face (which might only be 10 percent of the image), the emotion recognizer could run much faster because it wouldn't be wasting processing time on extraneous pixels.
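The saving from handing over only the face region is easy to see in a small Python sketch. The frame size and the face box below are made-up numbers chosen so that the face covers 10 percent of the image; the function names are hypothetical as well.

```python
def crop(image, box):
    """Cut a rectangle out of an image stored as a list of pixel rows.

    box is (top, left, height, width), all in pixels.
    """
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def pixel_count(image):
    """Number of pixels a downstream recognizer would have to process."""
    return sum(len(row) for row in image)


frame = [[0] * 100 for _ in range(100)]  # assumed 100 x 100 frame
face_box = (20, 30, 40, 25)              # hypothetical face-finder output
face = crop(frame, face_box)

fraction = pixel_count(face) / pixel_count(frame)
seconds_per_frame = 1 / 5                # five frames per second
```

A recognizer whose cost grows with the number of pixels would do roughly a tenth of the work on `face` that it would on `frame`.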

Weber and postdoc Max Welling are now moving on to more general issues. Rather than showing a feature detector 100 eyes, and saying, "Look for these," Weber is showing the computer whole faces and letting it decide what's important, using a statistical method of estimating probability densities. The computer's choices may not be what we humans perceive as essential to "faceness," but by discovering what the computer looks for on its own, Weber hopes to create generic detectors that could be used by anybody to find anything. "You don't want to have eye detectors and wheel detectors programmed in," he says, "just for the possibility that you might be asked to recognize faces or cars, because then you would have to have millions of detectors." The latest work in the Perona lab goes straight into the curriculum: Weber is the teaching assistant for EE/CNS 148, Topics in Computational Vision, which this year is covering visual recognition.


Above: Magellan's radar images of Venus cover 98 percent of the planet's surface and contain an estimated one million volcanoes that are 20 kilometers or less in diameter. It would take a human geologist some 20 years and an iron constitution to find them all, so Burl and colleagues created JARtool (for JPL Adaptive Recognition tool) to help with the hunt. In this test image, JARtool has marked prospective volcanoes for human verification. Although initial results are promising, the system is still in development.

At JPL, Burl is developing software to look for and catalog geologic features, such as craters and volcanoes, on Venus, Mars, and elsewhere. At the moment, the software is like an intelligent assistant that can help a human geologist comb through archived images, but Burl would like it to mature to where it could actually fly on a spacecraft, picking targets for other instruments. "Eventually, we'd like to go beyond 'recognizers' attuned to specific objects to 'discoverers' that can decide on their own when something looks interesting," he says. "For example, we might be able to find localized features that are distinct from the rest of the image in some way. When Voyager flew by Neptune's moon Triton, it took human interpreters to discover the ice geysers, something never before seen in the solar system. But it took four hours for the images to reach Earth, and it would have taken another four to send a command back to Voyager. Triton would have been a speck in the rearview mirror by then.


Above: Comet Halley's nucleus, as seen by the Giotto spacecraft. This is the best, closest view we've ever gotten of a comet.

So an algorithm that could automatically discover such features and refocus the spacecraft's attention on them would open up all sorts of scientific opportunities. The discovery idea ties back in with the issue of what features are important. If you looked at a lot of faces, you might decide that eyes are interesting, because they are distinctive, localized, and recur in many images. If you looked at a lot of planets, you might decide the same thing about craters."
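One very simple reading of "localized features that are distinct from the rest of the image" is to cut the picture into small patches and flag any patch whose average brightness is a statistical outlier. The Python sketch below does just that. It is a stand-in chosen for illustration, not Burl's algorithm, and the patch size, the z-score threshold, and the function names are all assumptions.

```python
from statistics import mean, pstdev


def patch_means(image, size):
    """Average brightness of each non-overlapping size x size patch.

    image is a list of equal-length rows of numbers. Returns a dict that
    maps the (top, left) corner of each patch to its mean value.
    """
    means = {}
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            values = [image[r][c]
                      for r in range(top, top + size)
                      for c in range(left, left + size)]
            means[(top, left)] = mean(values)
    return means


def interesting_patches(image, size=4, z_threshold=3.0):
    """Corners of patches whose mean lies far from the typical patch mean."""
    means = patch_means(image, size)
    mu = mean(means.values())
    sigma = pstdev(means.values())
    if sigma == 0:
        return []  # a perfectly uniform image has nothing to point at
    return [corner for corner, m in means.items()
            if abs(m - mu) / sigma > z_threshold]


# A dull 32 x 32 scene with one bright 4 x 4 spot at rows 8-11, columns 12-15.
image = [[10] * 32 for _ in range(32)]
for r in range(8, 12):
    for c in range(12, 16):
        image[r][c] = 100

found = interesting_patches(image)
```

A real discoverer would need far richer descriptions than mean brightness, but the structure is the same: describe every local region, model what is typical, and report what is not.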

A spacecraft searching for interesting features on alien worlds also has to figure out where in the world those features are, so that they can be found again on the next orbit. Stefano Soatto (MS '93, PhD '96) started the project in collaboration with Ruggero Frezza of the University of Padua, and grad students Jean-Yves Bouguet (MS '94) and Xiaolin Feng (MS '96) are carrying it on, working with JPL's Larry Matthies and Andrew Johnson. Their software package is slated to fly on JPL's Deep Space 4/Champollion mission, which is to launch in 2003 and deploy a sample-drilling lander on a comet named Tempel 1 in 2006. In order to steer to a soft landing on a distant comet, says Bouguet, "the response time has to be truly fast. We need an autonomous navigation system, because we cannot rely on control from Earth. And we need a lot of dynamic information: how fast we're going, how fast the comet is rotating, where the landmarks are, and the landing sites."

So the question is, if you shoot a movie as you fly by a rock (in their experiments), can you reconstruct its three-dimensional shape using only the information in those pictures?
