The Machine Stares Back
by Douglas L. Smith    
 

Think how handy it would be to have a computer that could see what you mean. It could read your scrawled notes, or pull complex mathematical formulae off a blackboard from the back of the lecture hall, or interpret a new valve design as you sketch it. If it could follow gestures, you'd be able to manipulate virtual objects without clunky gloves, and walk around in virtual environments without body-sensing suits. You might even be able to make a sign of displeasure and elicit a computer-generated apology, relieving your frustration without the risk of personal injury or hardware damage inherent in smacking your stupid machine upside the monitor when it desperately needs it.

Pietro Perona, professor of electrical engineering and director of Caltech's Center for Neuromorphic Systems Engineering (a National Science Foundation Engineering Research Center), is working on various aspects of machine vision that might lead to such things. His lab is exploiting the ready availability of cheap video cameras and frame grabbers, which convert video footage into digital stills, and of souped-up PCs that have the horsepower to process those images on the fly. Much of the lab's work would have been prohibitively expensive just a few years ago.

Their research revolves around figuring out what computational processes will impart vision to a computer. "An image is just a matrix of numbers encoding color and brightness as a function of x and y," Perona explains. "How do you extract useful information from that mumbo-jumbo? It's not easy. Think of a TV channel that's been scrambled: the information is all there, but you don't see anything." Everything looks like that to a computer, he says. Cameras are cheap and ubiquitous, from automatic bank tellers to freeway traffic monitors to your desktop PC, and images flood the Internet, but they're "consumed" only by humans because, with a few exceptions, nobody knows how to write software that will do something really useful with them.

And there are other reasons to design sensory systems for our silicon sidekicks. Computer chips are shrinking but keyboards aren't (at least, not much), so until humans can grow really pointy fingers, computers can't get really small. "And in order to type, or click your mouse, you have to walk up to a computer and touch it. I'd like to be able to deal with it from across the room, or wherever I am, as we do with people." (We also deal with people by speaking to them, and there are Caltech people working on computers that can hear, but that's another article.) "So the key to developing truly portable computers that we can interact with like humans is to replace large, clunky keyboards and mice with tiny cameras and microphones. Given this general long-term vision, if you'll pardon the pun, one needs to start somewhere, and that's where we are."
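Perona's "matrix of numbers" and his scrambled-TV analogy can be made concrete with a toy example. The sketch below is only an illustration, not anything from the lab: the tiny grayscale image, the `scramble` function (a plain pixel shuffle, one crude stand-in for scrambling), and the `horizontal_edges` helper are all invented here.

```python
import random

# A tiny 4x6 grayscale "image": each entry is a brightness from 0 (black)
# to 255 (white), indexed as image[y][x]. The values are made up: a dark
# left half next to a bright right half.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]


def scramble(img, seed=0):
    """Return a copy of img with the same pixel values in shuffled positions."""
    width = len(img[0])
    flat = [value for row in img for value in row]
    random.Random(seed).shuffle(flat)
    return [flat[y * width:(y + 1) * width] for y in range(len(img))]


def horizontal_edges(img):
    """Count side-by-side pixel pairs whose brightness differs."""
    return sum(1 for row in img for a, b in zip(row, row[1:]) if a != b)


scrambled = scramble(image)

for row in scrambled:
    print(row)
print("edges in original: ", horizontal_edges(image))
print("edges in scrambled:", horizontal_edges(scrambled))
```

Every number in the original survives the shuffle, so in one sense the information is all there, yet the single clean boundary between dark and bright is gone. A program handed either matrix sees only a grid of numbers until someone writes software that knows what structure to look for.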

Back in 1995, postdoc Enrico Di Bernardo, grad student Luis Gonçalves (MS '92), and Enrico Ursella, who was visiting from the University of Padua in Italy, built the first one-camera system capable of tracking the unrestricted three-dimensional movement of a jointed body part (an arm) in real time. (They figured that if they could do an arm, a whole-body tracker would follow fairly easily.) Commercial 3-D motion-capture systems, says Gonçalves, "use multiple cameras, which is a lot easier. The best systems cost about $150,000 and use 16 cameras, and the subject has to wear reflective markers. Also, we deal with a case where the subject is very close to the camera." As you reach toward the camera, perspective causes your hand and forearm to occupy more pixels than your upper arm. Computers don't like it when different parts of the same object keep changing size in relation to one another; other systems work from farther away, where the perspective isn't so pronounced.

There are motion-capture systems that don't rely on vision, but you still have to wear something: either magnetic sensors, or an exoskeleton (a fancy knee brace for your whole body, if you will) that measures the angles of your joints. Any system that requires you to strap on anything is invasive, but the Caltech system is noninvasive: no markers are required. "When we started this," Di Bernardo recalls, "there were only three other labs in the world working on noninvasive systems, and they all used multiple cameras. And now lots of people are developing markerless multicamera systems. But we wanted a user with no special equipment to be able to interact with a PC, which we assumed would be sold with just one camera."
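The perspective problem Gonçalves describes can be seen in a few lines of arithmetic. The sketch below assumes an idealized pinhole camera, in which an object's size on the image is proportional to its real size divided by its distance from the camera. It is not the lab's tracker: the focal length, segment length, and distances are invented for illustration, and each arm segment is treated as lying parallel to the image plane.

```python
def projected_length_px(length_m, depth_m, focal_px=500.0):
    """Apparent length in pixels of a segment under a pinhole camera model.

    length_m: real length of the segment in meters
    depth_m:  distance from the camera in meters
    focal_px: focal length in pixels (an assumed value)
    """
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * length_m / depth_m


def size_ratio(near_depth_m, far_depth_m, length_m=0.30):
    """How many times larger the nearer of two equal segments appears."""
    near = projected_length_px(length_m, near_depth_m)
    far = projected_length_px(length_m, far_depth_m)
    return near / far


# Close to the camera: forearm at 0.4 m, upper arm at 1.0 m.
close_up = size_ratio(near_depth_m=0.4, far_depth_m=1.0)

# Same reach seen from across the room: forearm at 4.4 m, upper arm at 5.0 m.
far_away = size_ratio(near_depth_m=4.4, far_depth_m=5.0)

print(f"close to the camera, the forearm looks {close_up:.2f}x as long")
print(f"from across the room, it looks {far_away:.2f}x as long")
```

With these made-up numbers, two segments of equal length differ by a factor of about 2.5 when the arm is close to the camera, but by only about 1.14 from five meters away. Systems that work at a distance can largely ignore the effect; a desktop camera watching a nearby user cannot.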
