The next time you get on a Zoom call, you might want to ask the person you’re speaking with to push their finger into the side of their nose. Or maybe turn in complete profile to the camera for a minute.
Those are just some of the methods experts have recommended as ways to provide assurance that you are seeing a real image of the person you are speaking to and not an impersonation created with deepfake technology.
It sounds like a strange precaution, but we live in strange times.
Last month, a top executive of the cryptocurrency exchange Binance said that fraudsters had used a sophisticated deepfake “hologram” of him to scam several cryptocurrency projects. Patrick Hillmann, Binance’s chief communications officer, says criminals had used the deepfake to impersonate him on Zoom calls. (Hillmann has not provided evidence to support his claim and some experts are skeptical a deepfake was used. Nonetheless, security researchers say that such incidents are now plausible.) In July, the FBI warned that people could use deepfakes in job interviews conducted over video conferencing software. A month earlier, several European mayors said they were initially fooled by a deepfake video call purporting to be with Ukrainian President Volodymyr Zelensky. Meanwhile, a startup called Metaphysic that develops deepfake software has made it to the finals of “America’s Got Talent,” by creating remarkably good deepfakes of Simon Cowell and the other celebrity judges, transforming other singers into the celebs in real-time, right before the audience’s eyes.
Deepfakes are extremely convincing fake images and videos created through the use of artificial intelligence. It once required a lot of images of someone, a lot of time, and a fair-degree of both coding skill and special effects know-how to create a believable deepfake. And even once created, the AI model couldn’t be run fast enough to produce a deepfake in real-time on a live video transmission.
That’s no longer the case, as both the Binance story and Metaphysics “America’s Got Talent” act highlight. In fact, it’s becoming increasingly easy for people to use deepfake software to impersonate others in live video transmissions. Software allowing someone to do this is now readily available, for free, and requires relatively little technical skill to use. And as the Binance story also shows, this opens the possibility for all kinds of fraud—and political disinformation.
“I am surprised by how fast live deepfakes have come and how good they are,” Hany Farid, a computer scientist at the University of California at Berkeley who is an expert in video analysis and authentication, says. He says there are at least three different open source programs that allow people to create live deepfakes.
Farid is among those who worry that live deepfakes could supercharge fraud. “This is going to be like phishing scams on steroids,” he says.
The “pencil test” and other tricks to catch an AI impostor
Luckily, experts say there are still a number of techniques a person can use to give themselves a reasonable assurance that they are not communicating with a deepfake impersonation. One of the most reliable is simply to ask a person to turn so that the camera is capturing her in complete profile. Deepfakes struggle with profiles for a number of reasons. For most people, there aren’t enough profile images available to train a deepfake model to reliably reproduce the angle. And while there are ways to use computer software to estimate a profile view from a front-facing image, using this software adds complexity to the process of creating the deepfake.
Deepfake software also uses “anchor points” on a person’s face to properly position the deepfake “mask” on top of it. Turning 90 degrees eliminates half of the anchor points, which often results in the software warping, blurring, or distorting the profile image in strange ways that are very noticeable.
Yisroel Mirsky, a researcher who heads the Offensive AI Lab at Israel’s Ben-Gurion University, has experimented with a number of other methods for detecting live deepfakes that he has compared to the CAPTCHA system used by many websites to detect software bots (you know, the one that asks you to pick out all the images of traffic lights in a photo broken up into squares). His techniques include asking people on a video call to pick up a random object and move it across their face, to bounce an object, to lift up and fold their shirt, to stroke their hair, or to mask part of their face with their hand . In each case, either the deepfake will fail to depict the object being passed in front of the face or the method will cause serious distortion to the facial image. For audio deepfakes, Mirsky suggests asking the person to whistle, or to try to speak with an unusual accent, or to hum or sing a tune chosen at random.
Image courtesy of Yisroel Mirsky
“All of today’s existing deepfake technologies follow a very similar protocol,” Mirsky says. “They are trained on lots and lots of data and that data has to have a particular pattern you are teaching the model.” Most AI software is taught to just reliably mimic a person’s face seen from the front and can’t handle oblique angles or objects that occlude the face well.
Meanwhile, Farid has shown that another way to detect possible deepfakes is to use a simple software program that causes the other person’s computer screen to flicker in a certain pattern or causes it to project a light pattern onto the face of the person using the computer. Either the deepfake will fail to transfer the lighting effect to the impersonation or it will be too slow to do so. A similar detection might be possible just by asking someone to use another light source, such as a smartphone flashlight, to illuminate their face from a different angle, Farid says.
To realistically impersonate someone doing something unusual, Mirsky says that the AI software needs to have seen thousands of examples of people doing that thing. But collecting a data set like that is difficult. And even if you could train the AI to reliably impersonate someone doing one of these challenging tasks—like picking up a pencil and passing it in front of their face—the deepfake is still likely to fail if you ask the person to use a very different kind of object, like a mug. And attackers using deepfakes are also unlikely to have been able to train a deepfake to overcome multiple challenges, like both the pencil test and the profile test. Each different task, Mirsky says, increases the complexity of the training the AI requires. “You are limited in the aspects you want the deepfake software to perfect,” he says.
Deepfakes are getting better all the time
For now, few security experts are suggesting that people will need to use these CAPTCHA-like challenges for every Zoom meeting they take. But Mirsky and Farid both said that people might be wise to use them in high-stakes situations, such as a call between political leaders, or a meeting that might result in a high-value financial transaction. And both Farid and Mirsky urged people to be attuned to other possible red flags, such as audio calls from unfamiliar numbers or people behaving strangely or making unusual requests (would President Biden really want you to buy a bunch of Apple giftcards for him?).
Farid says that for very important calls, people might use some kind simple two-factor authentication, such as sending a text message to a mobile number you know to be the correct one for that person, asking if they are on a video call right now with you.
The researchers also emphasized that deepfakes are getting better all the time and that there is no guarantee that it won’t become much easier for them to evade any particular challenge—or even combinations of them—in the future.
That’s also why many researchers are trying to address the problem of live deepfakes from the opposite perspective—creating some sort of digital signature or watermark that would prove that a video call is authentic, rather than trying to uncover a deepfake.
One group that might work on a protocol for verifying live video calls is the Coalition for Content Provenance and Authentication (C2PA)—a foundation dedicated to digital media authentication standards that’s backed by companies including Microsoft, Adobe, Sony, Twitter. “I think the C2PA should pick this up because they have built specification for recorded video and extending it for live video is a natural thing,” Farid says. But Farid admits that trying to authenticate data that is being streamed in real-time is not an easy technological challenge. “I don’t see immediately how to do it, but it will be interesting to think about,” he says.
In the meantime, remind the guests on your next Zoom call to bring a pencil to the meeting.
Sign up for the Fortune Features email list so you don’t miss our biggest features, exclusive interviews, and investigations.