Their system has a set of high-level feature detectors that independently hunt for such things as the eyes, the tip of the nose, or the corners of the mouth. Each detector marks all the spots that it thinks could be its feature, and the candidates are then combined in groups to see how they fit. "It starts by looking at the features a pair at a time," Burl explains. "Given a pair of features, it knows where to expect the other ones. So given a potential right eye and a potential left eye, it searches an ellipse between and below them for a potential nose, and so on." If everything falls into place, it's probably found a face; if not, it probably hasn't.

That word "probably" is the key. Other systems make "hard" detections: either something is an eye corner, or it isn't. This system gives soft detections, saying, "Gee, this looks pretty eye-like; I'll say 80 percent odds." This is a lot more error-tolerant, as a set of features that didn't score well individually but are correctly positioned can outscore one really good eye that doesn't go with anything else. And if the machine finds a few features it likes really well, it will forgive the absence of the others. Thus when Burl covered his mouth with his hand, or tilted a bicycle helmet over one eye, it still picked him out amid the lab's background clutter.

The current version runs on a PC at five frames per second, says Weber. "So every one-fifth of a second, it will find your face. At that rate, it can follow you around. If the system took half a minute to find you, you might be long gone before it decided you were there." This is important not only for security applications, but also for fancier notions still to come: if somebody does build an emotion recognizer, for example, it will probably be a computation hog.
But if the face recognizer found the face first, and then presented to the emotion recognizer just that part of the screen containing the face (which might only be 10 percent of the image), the emotion recognizer could run much faster because it wouldn't be wasting processing time on extraneous pixels.

Weber and postdoc Max Welling are now moving on to more general issues. Rather than
showing a feature detector 100 eyes, and saying, "Look for these,"
Weber is showing the computer whole faces and letting it decide what's
important, using a statistical method of estimating probability densities.
The computer's choices may not be what we humans perceive as essential
to "faceness," but by discovering what the computer looks for
on its own, Weber hopes to create generic detectors that could be used
by anybody to find anything. "You don't want to have eye-detectors
and wheel-detectors
programmed in," he says, "just for the possibility that you
might be asked to recognize faces or cars, because then you would have
to have millions of detectors." The latest work in the Perona lab
goes straight into the curriculum: Weber is the teaching assistant
for EE/CNS 148, Topics in Computational Vision, which this year is covering
visual recognition.
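
The difference between "hard" and "soft" detection described earlier can be made concrete with a toy score. The sketch below is not the Perona lab's code: the feature layout, the positional tolerance `SIGMA`, the `MISSING_PENALTY`, and the `HARD_THRESHOLD` are all made-up values chosen for illustration. It simply adds up each candidate's log-odds, subtracts a cost for being out of place, and charges a fixed penalty for any feature that was not found.

```python
import math

# Illustrative values only: expected feature positions in a face-normalized
# frame (origin midway between the eyes, unit = distance between the eyes).
EXPECTED = {
    "left_eye": (-0.5, 0.0),
    "right_eye": (0.5, 0.0),
    "nose": (0.0, 0.6),
    "mouth": (0.0, 1.1),
}
SIGMA = 0.15            # assumed tolerance on feature position
MISSING_PENALTY = -1.0  # assumed cost for a feature with no candidate
HARD_THRESHOLD = 0.8    # assumed cutoff used by a "hard" detector


def log_odds(p):
    """Convert a detector confidence in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def soft_score(hypothesis):
    """Score a face hypothesis given as {feature: (confidence, (x, y))}."""
    total = 0.0
    for name, (ex, ey) in EXPECTED.items():
        if name not in hypothesis:
            total += MISSING_PENALTY  # forgive, but charge for, the absence
            continue
        confidence, (x, y) = hypothesis[name]
        squared_error = (x - ex) ** 2 + (y - ey) ** 2
        total += log_odds(confidence) - squared_error / (2 * SIGMA ** 2)
    return total


def hard_count(hypothesis):
    """Count the features a hard detector would accept outright."""
    return sum(1 for confidence, _ in hypothesis.values()
               if confidence >= HARD_THRESHOLD)


# Four so-so candidates that sit almost exactly where a face predicts.
modest_group = {
    "left_eye": (0.6, (-0.48, 0.02)),
    "right_eye": (0.6, (0.5, 0.0)),
    "nose": (0.6, (0.02, 0.58)),
    "mouth": (0.6, (0.0, 1.12)),
}
# One excellent eye with nothing else to go with it.
lone_eye = {"right_eye": (0.95, (0.5, 0.0))}
# Strong eyes and nose, mouth hidden behind a hand.
occluded = {
    "left_eye": (0.9, (-0.5, 0.0)),
    "right_eye": (0.9, (0.5, 0.0)),
    "nose": (0.9, (0.0, 0.6)),
}

for label, hyp in [("modest group", modest_group),
                   ("lone eye", lone_eye),
                   ("occluded", occluded)]:
    print(f"{label}: soft={soft_score(hyp):.2f} hard={hard_count(hyp)}")
```

With these made-up numbers, a hard detector rejects every feature in the modest group and accepts the lone eye, while the soft score ranks the group above the lone eye and still rates the occluded face highly.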
At JPL, Burl is developing software to look for and catalog geologic features, such as craters and volcanoes, on Venus, Mars, and elsewhere. At the moment, the software is like an intelligent assistant that can help a human geologist comb through archived images, but Burl would like it to mature to where it could actually fly on a spacecraft, picking targets for other instruments. "Eventually, we'd like to go beyond 'recognizers' attuned to specific objects to 'discoverers' that can decide on their own when something looks interesting," he says. "For example, we might be able to find localized features that are distinct from the rest of the image in some way. When Voyager flew by Neptune's moon Triton, it took human interpreters to discover the ice geysers, something never before seen in the solar system. But it took four hours for the images to reach Earth, and it would have taken another four to send a command back to Voyager. Triton would have been a speck in the rearview mirror by then.
So an algorithm that could automatically discover such features and refocus the spacecraft's attention on them would open up all sorts of scientific opportunities. The discovery idea ties back in with the issue of what features are important. If you looked at a lot of faces, you might decide that eyes are interesting, because they are distinctive, localized, and recur in many images. If you looked at a lot of planets, you might decide the same thing about craters."

A spacecraft searching for interesting features on alien worlds also has to figure out where in the world those features are, so that they can be found again on the next orbit. Stefano Soatto (MS '93, PhD '96) started the project in collaboration with Ruggero Frezza of the University of Padua, and grad students Jean-Yves Bouguet (MS '94) and Xiaolin Feng (MS '96) are carrying it on, working with JPL's Larry Matthies and Andrew Johnson. Their software package is slated to fly on JPL's Deep Space 4/Champollion mission, which is to launch in 2003 and deploy a sample-drilling lander on a comet named Tempel 1 in 2006.

In order to steer to a soft landing on a distant comet, says Bouguet, "the response time has to be truly fast. We need an autonomous navigation system, because we cannot rely on control from Earth. And we need a lot of dynamic information: how fast we're going, how fast the comet is rotating, where the landmarks are, and the landing sites." So the question is, if you shoot a movie as you fly by a rock (in their experiments), can you reconstruct its three-dimensional shape using only the information in those pictures?
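
One ingredient of any answer to that question is triangulation: a landmark seen in two frames lies where the two lines of sight cross. The sketch below is a generic textbook construction, not the software described here, and it assumes something the real system cannot, namely that the camera positions and viewing directions are already known. The function name `triangulate_midpoint` and the sample coordinates are hypothetical. Because measured rays rarely intersect exactly, it returns the midpoint of the shortest segment joining them.

```python
def dot(u, v):
    """Dot product of two 3-vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3-D point from two viewing rays.

    c1, c2: camera centers; d1, d2: ray directions toward the landmark.
    Returns the midpoint of the closest points on the two rays.
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information.
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Hypothetical example: a landmark at (1, 2, 10) viewed from two camera
# positions three units apart along the x axis.
landmark = (1.0, 2.0, 10.0)
cam_a, cam_b = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)
ray_a = tuple(p - c for p, c in zip(landmark, cam_a))
ray_b = tuple(p - c for p, c in zip(landmark, cam_b))

estimate = triangulate_midpoint(cam_a, ray_a, cam_b, ray_b)
print(estimate)
```

The harder part of the problem posed above is that the spacecraft's own motion is also unknown and has to be recovered from the same images; this sketch does not attempt that.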