“Scientists at the Massachusetts Institute of Technology have created the first realistic videos of people saying things they never said — a scientific leap that raises unsettling questions about falsifying the moving image,” The Boston Globe reports.
“In one demonstration, the researchers taped a woman speaking into a camera, and then reprocessed the footage into a new video that showed her speaking entirely new sentences, and even mouthing words to a song in Japanese, a language she does not speak. The results were enough to fool viewers consistently, the researchers report.”
The technique’s inventors say it could be used in video games and movie special effects, perhaps reanimating Marilyn Monroe or other dead film stars with new lines. It could also improve dubbed movies, a lucrative global industry.
But scientists warn the technology will also provide a powerful new tool for fraud and propaganda — and will eventually cast doubt on everything from video surveillance to presidential addresses.
[…] Previous work has focused on creating a virtual model of a person’s mouth, then using a computer to render digital images of it as it moves. But the new software relies on an ingenious application of artificial intelligence to teach a machine what a person looks like when talking.
Starting with between two and four minutes of video — the minimum needed for the effect to work — the computer captures images which represent the full range of motion of the mouth and surrounding areas, Ezzat said.
The computer can then express any face as a combination of these basis faces (46 in one example), the same way that any color can be represented by a combination of red, green, and blue. The computer then goes through the video, learning how the person's mouth expresses each sound, and how it moves from one sound to the next.
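The color analogy above can be sketched in a few lines of code: if each mouth image is flattened into a vector of pixels, finding the mix of basis faces that reproduces a new image is an ordinary least-squares problem. This is an illustrative sketch only — the toy basis, the image sizes, and the use of plain least squares are my assumptions, not the researchers' actual model.

```python
import numpy as np

# Hypothetical sketch: a tiny "basis" of 4 mouth images, each flattened to a
# 64-pixel vector. The article describes 46 basis faces; the idea is the same
# at any size.
rng = np.random.default_rng(0)
basis = rng.random((4, 64))            # 4 basis images, 64 pixels each

# A new mouth image that happens to be an exact mix of the basis images.
true_weights = np.array([0.5, 0.2, 0.2, 0.1])
new_image = true_weights @ basis

# Recover the mixing weights by least squares, the same way a color can be
# decomposed into its red, green, and blue components.
weights, *_ = np.linalg.lstsq(basis.T, new_image, rcond=None)

# Mixing the basis images with the recovered weights reproduces the image.
reconstruction = weights @ basis
print(np.allclose(reconstruction, new_image))
```

Once the weights for every frame of the training video are known, the video reduces to a short list of numbers per frame, which is what makes the learning step tractable.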
Given a new sound, the computer can then generate an accurate picture of the mouth area and virtually superimpose it on the person’s face, according to a paper describing the work. The researchers are scheduled to present the paper in July at Siggraph, the world’s top computer graphics conference.
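The generation step described above can also be sketched: if each sound has an associated set of learned mixing weights, a new mouth image is produced by blending those weights and mixing the basis images. The phoneme names, the per-sound weight table, and the linear blend between sounds here are all my assumptions for illustration — the paper's actual model is more sophisticated.

```python
import numpy as np

# Hypothetical sketch of the generation step. Basis mouth images as before.
rng = np.random.default_rng(1)
basis = rng.random((4, 64))            # 4 basis images, 64 pixels each

# Mixing weights "learned" from the training video, one vector per sound.
# (Illustrative values; the real system learns these from the footage.)
sound_weights = {
    "ah": np.array([0.7, 0.1, 0.1, 0.1]),
    "oo": np.array([0.1, 0.1, 0.7, 0.1]),
}

def mouth_frame(sound_a, sound_b, t):
    """Blend from sound_a toward sound_b (t in [0, 1]) and render the mouth."""
    w = (1 - t) * sound_weights[sound_a] + t * sound_weights[sound_b]
    return w @ basis               # mix the basis images with the weights

# Halfway through the transition from "ah" to "oo":
frame = mouth_frame("ah", "oo", 0.5)
print(frame.shape)                 # one synthesized 64-pixel mouth image
```

The synthesized mouth region would then be composited back onto the original face footage, which is the "virtually superimpose" step the paper describes.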
The effect is significantly more convincing than a previous effort, called Video Rewrite, which recorded a huge number of small snippets of video and then recombined them. Still, the new method seems lifelike for only a sentence or two at a time, because over longer stretches the speaker seems to lack emotion.
This is fascinating stuff — the next step toward super-realistic “virtual reality” projects, I’d imagine. The drawbacks are obvious, but in the short run the technology might have the positive effect of making the soundbite obsolete.
Imagine! A world where what passes for knowledge and information can only be served up in portions larger than a sentence or two… Why, it’s like we’d be living in the dark ages all over again!
[link via Philip Murphy]
—–
