Blurring the lines: Virtual human research promises real-world impacts
Halfway through my interview with Louis-Philippe Morency, I suddenly felt incredibly self-conscious. Every nod, every movement of pen to paper, every glance into his eyes made me wonder what I might have been saying without saying anything. Would he catch my eyes straying to his bookshelves or the traffic on the street below, notice my (rare) moments of boredom and feel insulted? Would he detect a hurried, enthusiastic nod and hammer a point home? Would he latch onto my fascination and try to spin me?
Nonverbal cues drive human conversation. They signal a speaker to come to a point with an expectant glance or urge a listener to grasp the significance of a message with a well-timed raise of the eyebrows. Beneath the surface of our words we steep our conversations in texture and fill our discussions with broader meaning when we move our hands to the rhythm of our voices, shift our weight nervously, affix our gaze on listeners and alter the pitch of our voice with excitement or trepidation. These “backchannels” direct the flow of social interactions, but they aren't universal.
Morency completed his Ph.D. at the Massachusetts Institute of Technology and joined a team at the University of Southern California's Institute for Creative Technologies studying how virtual humans — artificially created but independent characters residing in a computer environment and meant to look, move, behave and communicate like real humans — can be taught to interact more plausibly with real humans, and even with each other, using both spoken language and these nonverbal backchannels. Since coming on board at ICT, Morency and his colleagues have won a series of awards and other recognition for research into how computers make sense of the visual data they collect.
Backchannels evolve through time, and they are differentiated by culture. They frame our words. But while these backchannels come to us almost as easily as breathing and are as much a product of thousands of years of history as art and music and religion, they're foreign to computers. Scientists could program the whole of the Oxford English Dictionary and countless combinations of “heuristics” — problem-solving formulas — for proper grammar, and machines would still have trouble learning this natural language.
The notion that virtual humans might have unscripted conversations with humans and one another may seem like science fiction. Real humans themselves often struggle to communicate with one another; whether we're participating in complex international negotiations or wooing a mate we weave a quilt of words and body language meant to express our needs and desires. Computers communicate in strings of ones and zeros, a vocabulary of closed and open circuits determining how they “decide” to run programs. They have no other culture, no thousands of years of history to determine their identity.
Dialogue is like a dance
As I sit in Morency's sixth-floor office overlooking Marina del Rey, picking his brain about the challenges of giving computers an identity and a language, his speech speeds up as he describes how sustained eye gazes and lowered voices might shape a conversation. Some head nods suggest encouragement, others affirmation. Some even suggest boredom. Morency's own eyes widen as he explains this. He speaks excitedly and I imagine him bouncing about, but he's not.
“It's not like you talking and me talking and you talking,” Morency said. “We are in this conversation together, and so it's kind of a dance. Dialogue is like a dance.”
His voice rises in waves as he explains how computers measure the length of time a gaze is focused on a particular spot, the number of decibels a voice lowers and for how long, and the position and angle of a head cocked in confusion. If I nod as Morency speaks, it might suggest to him I want to hear more. If I nod when he stops and he hasn't asked me a yes-or-no question, though, it might suggest I'm not paying attention. Replace me with a virtual human nodding at the wrong time and it might mean the system hasn't been programmed to understand when a gesture makes sense.
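The timing rules Morency describes can be caricatured in a few lines of code. This is a hypothetical sketch, not ICT's system; the context flags and the labels it returns are invented purely for illustration.

```python
# Illustrative sketch: the same head nod "means" different things
# depending on conversational context. All names and labels here are
# invented for this example, not taken from ICT's software.

def interpret_nod(speaker_talking: bool, awaiting_yes_no: bool,
                  nod_detected: bool) -> str:
    """Classify what a listener's head nod likely signals, given context."""
    if not nod_detected:
        return "no signal"
    if speaker_talking:
        # Nodding while the speaker talks: "go on, tell me more."
        return "encouragement"
    if awaiting_yes_no:
        # Nodding right after a yes-or-no question: an answer of "yes."
        return "affirmation"
    # Nodding when no answer was called for: the timing doesn't fit.
    return "possible inattention"

print(interpret_nod(speaker_talking=True, awaiting_yes_no=False,
                    nod_detected=True))
```

The point of the toy rule is the one Morency makes: the gesture itself is cheap to detect; deciding what it means requires tracking where the conversation is.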
“This is not magic,” Morency says. “I say head nod, but you can do the same thing with an 'uh huh.' You could do the same thing with a smile.”
Some of ICT's other virtual human team members, like Paul Debevec, are focused on creating better graphical representations of nonverbal behaviors. Others, like Jonathan Gratch, the virtual human team's leader, are exploring models for emotional responses.
Having graphics specialists like Debevec on hand to improve how realistically these virtual humans move isn't meant just to wow outsiders, but to create more believable nonverbal behaviors. Graphics technologies have to be so intricately developed that everything from bone structures to skin textures move in a lifelike manner. Skin on an attentive face must look taut; muscles along a relaxed posture must loosen.
Researchers also need to reach out to such fields as social psychology, anthropology, linguistics, and even economics to help explain how a wink or a handshake or other gestures might carry different meanings in different places. Morency calls the synthesis of the disparate fields he has studied “computational psychology.”
“These five areas together — sociology, psychology, linguistics, machine learning and computer vision — are kinda the core for me,” he says, “but if you want to study nonverbal behavior you need to have all of them together.”
Since the dawn of diplomacy, political leaders have found value in understanding how to bridge cultural gaps. In the 21st century, as the world becomes increasingly globalized, institutions are ramping up investment in tools that can improve understanding of foreign cultures. A negotiation might take place sitting down over tea in one culture; another might value terse, to-the-point discourse ended with handshakes. Where eye contact shows respect in one place, it might stir discomfort in another. A raised middle finger might be an insult in one country; a jettisoned shoe might be a more powerful statement somewhere else.
The Army gets an "agent"
One institution trying desperately to bridge the cultural chasms it encounters is the U.S. Army, a primary sponsor of the ICT. Virtual humans that can be taught to speak and act like citizens of any given culture can be used to prepare soldiers for foreign entanglements. As the Army slogs through its seventh year in Iraq and the American presence in Afghanistan deepens, military officials are beginning to recognize that communicating their intentions and those of their troops requires more than Arabic, Pashto and Dari translators.
This fall, Patrick Kenny, an ICT computer scientist, showed visitors how the institute's virtual human research uses computer simulations programmed to look, talk and move just as an Iraqi might to help the Army train soldiers. In front of an audience of USC undergraduates, military representatives and even the principal of a local Catholic school, Kenny donned a headset microphone and gripped a wireless trackball as he prepared for a conversation with two “live” Iraqis in a virtual training simulation in development at ICT. As the audience soon saw, and as Kenny had warned, kinks were still being worked out of the demonstration, such as characters unable to find words to express the “needs” and “goals” the computer programs wanted them to insist upon. But even the glitches offered a chance to glimpse under the hood, or perhaps the “skull,” of the virtual humans.
As the main lights went down in the cozy virtual reality theater at ICT, a screen wrapping around nearly half the room filled with the image of a café in Baghdad. Two men, rendered with graphics not quite up to par with the latest video games, appeared on the screen facing the audience. One, a young Iraqi doctor, stood in scrubs, while the other, a tribal elder, was dressed in traditional garb. Kenny assumed the role of a U.S. Army captain whose goal was to negotiate with the two Iraqis about moving a clinic from outside the café to a safer setting downtown.
Each character represented a visual manifestation of a unique nonhuman “agent,” a complex computer model embodying a set of goals, communication capabilities and behavioral standards defined by a programmer. Kenny played a soldier tasked with trying to learn how to communicate with them. While he knows what goals the virtual humans were programmed with, a soldier training with the system wouldn't. He or she would need to negotiate with the virtual Iraqis to learn how they behave differently than an American might in a similar situation.
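The notion of an agent as a bundle of goals and communication capabilities can be sketched in miniature. Everything below, the class, the goal weights, the dialogue lines, is invented for illustration; ICT's actual agents are far more elaborate.

```python
# Toy sketch of a goal-driven conversational "agent." The structure
# (goals plus a table of utterances) is illustrative only; names and
# dialogue lines are invented, not ICT's.

class Agent:
    def __init__(self, name, goals, surface_text):
        self.name = name
        self.goals = goals                # intent -> how much the agent cares
        self.surface_text = surface_text  # intent -> an utterance it can say

    def respond(self, intent):
        # An agent may "know" what it wants (the intent is among its
        # goals) yet lack the vocabulary to say it aloud.
        if intent not in self.goals:
            return "I do not understand you."
        return self.surface_text.get(
            intent, "I cannot express what I want to say.")

doctor = Agent(
    "doctor",
    goals={"patient_safety": 0.9, "move_clinic": 0.4},
    surface_text={"move_clinic": "We should move the clinic downtown."},
)
print(doctor.respond("move_clinic"))     # a goal it has the words for
print(doctor.respond("patient_safety"))  # a goal it cannot yet verbalize
```

The gap between a goal and its “surface text” is the failure the café demonstration would later expose: an agent that knows what it wants but cannot find the words.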
If the virtual humans can be taught to act and communicate as real Iraqis would in a similar situation, then soldiers training with them before deployment might be better prepared for negotiations with actual Iraqis. Soldiers deployed to other places, such as Afghanistan, wouldn't train with the same virtual characters; they'd train with virtual characters programmed to act appropriately for that country.
These different situations require different areas of expertise. In addition to cultural specialists, or “domain experts,” ICT needs creative minds from the film and video game industries to devise the scenarios soldiers — or anyone interacting with virtual characters — might encounter.
At first glance, the involvement of Hollywood seems cosmetic. On the first floor of ICT's squat office building, a small conference room sits just behind the glass wall of the tiny reception area. A gray replica of a transporter from the Star Trek: The Next Generation television series hangs from the room's ceiling, making the wait for an appointment feel more like the line for a theme park ride. The show's set designer, Herman Zimmerman, designed most of the ICT's interior.
Film industry visionaries and leaders in video game design develop scenarios for virtual human simulations, contribute to complex graphics and physics simulations shaping the worlds these characters inhabit, and share ideas about how trainees interact with the characters.
“One of the things we've really tried to do, and I think we've been really successful at, is integrate together a lot of different threads of research,” says Bill Swartout, ICT's director of technology and its first employee. “If you think about it, that's kinda the opposite of the way science usually works.”
Swartout says scientists normally take big problems and isolate them into smaller and smaller problems. As they solve the smaller ones, they move on incrementally to larger challenges.
"Sometimes there are synergies between the different areas that actually allow us to solve problems that were more difficult if we attack them by themselves,” he says.
Even though it may seem like ICT's virtual human team is trying to completely recreate the human mind in computers in one dramatic effort, Swartout says the characters aren't quite as independent as they may seem at first glance. They can't make decisions without being programmed with goals such as a specific combination of words recognized at a specific moment in a conversation, or motions such as firmly crossed arms or a slumping posture recognized at certain times by attached cameras. But it's a massively time-consuming and resource-intensive process to program all the words and physical behaviors from all the world's cultures. For virtual humans to be useful but still realistic as training tools, they are embedded into unique stories and scenarios. The virtual humans in Kenny's café scene don't need to be taught how to answer a question about how the Dodgers fared in a recent game or what they thought of the most recent episode of The Simpsons because those aren't questions likely to arise in a negotiation between American soldiers and Iraqi citizens.
Sometimes nobody can mask the virtual humans' limitations, though.
At the ICT's virtual human demonstration set in the Baghdad café, the Iraqi doctor had certain goals related to his profession; the tribal elder placed more stock in tradition and culture. Kenny, as a soldier trying to convince them to move a clinic to downtown Baghdad, had to satisfy these goals, but even when he tried, the virtual characters sometimes had trouble developing responses on the fly.
“I cannot express what I want to say,” the doctor character told Kenny. The “agent” — or computer model — guiding the doctor character knew what it wanted to accomplish, and it calculated an appropriate response based on Kenny's questions, but it didn't have the proper “surface text.” That is, it lacked the vocabulary to communicate its message. It didn't know how to tell Kenny it needed an assurance that its patients would be safe if the clinic were moved.
Kenny had the advantage of knowing the constraints programmed into each virtual character. Without explaining what he changed in detail, Kenny paused the simulation and moved graphical sliders representing each character's goals on the screen and restarted the demonstration. This time he was able to convince each character to agree to moving the clinic, but they began negotiating independently with one another.
“We should move the clinic downtown,” the doctor told the elder.
“I think we should move the clinic downtown,” the elder then told the doctor. The characters ignored Kenny and tried to get each other to agree to move the clinic; even though they had the same goal, they weren't programmed to be able to negotiate independent of him. Instead they just repeated statements like “We should move the clinic downtown” and “I think it would be a good idea to move the clinic downtown,” responding to each other with “I cannot understand you” or “I do not have the words to express what I want to say.” They looped around their agreement but couldn't understand one another. The situation became so absurd that the doctor even spontaneously switched tongues and told the elder, “No comprende.”
Even with the glitches, outside observers say the ICT is coming closer to creating virtual characters who look and act like real humans. Jeremy Bailenson, who directs Stanford University's Virtual Human Interaction Lab, studies how humans interact through avatars — digital versions of themselves — in immersive virtual environments. These avatars can be characters in complex online video games or just voices heard through cellular phones.
Gratch, ICT's virtual human team leader, says Bailenson's research is crucial because understanding human behavior is necessary for building virtual humans. Bailenson described how individuals have been shown to be more receptive to avatars that resemble themselves. What implications does that have for advertising? For politics?
“For the first time as a species things that look human and seem human and sound human are not necessarily human anymore,” Bailenson says.
Beyond the battlefield
Despite the blurring lines, ICT's virtual human experts are beginning to visualize applications for their research beyond the battlefield. Morency dreams of “companion” robots, or virtual characters that could help people throughout life, interacting with individuals based on evolving understanding of their personalities. The Museum of Science in Boston announced late last year it plans to use virtual human technology from ICT to develop “digital docents” who tailor their tours to each museum visitor.
Every ICT researcher I spoke with buzzes with excitement about budding work on “virtual patients” spearheaded by Kenny. Medical schools now use actors to portray individuals suffering from various afflictions in order to train students. Actors, however, have their limits. Children aren't well enough trained to act out serious conditions like autism, and it's not easy for actors to emulate conditions like facial muscles paralyzed by stroke.
ICT's advanced graphics modeling techniques and its understanding of the nonverbal aspects of communication could be used to create virtual humans able to supplement actors in medical training and illustrate the effects of disease.
Despite the frustration with current technical limitations, Kenny says critics of ICT's virtual human research probably don't understand it.
"There is so much stuff going on inside the brain that we don't understand. Trying to model that onto computers is very complex,” Kenny says. “It's like making a 747.”
As he speaks, sunlight pours into his office and over action figures from popular cartoons and comic books scattered on shelves and desks, leaving little hint of the stiff military bureaucracy one might expect from an Army-funded research institute. Video game cases sit on bookshelves containing volumes about games, robotics, psychology and a number of other widely varied subjects, echoing the playful but diverse atmosphere surrounding the ICT. Kenny shrugs off the glitches at his earlier demonstration, leans back in his chair and stares wistfully out the window.
"Some day I'd like to put on a play with a cast of virtual humans,” he says.