How worried should we be about deepfakes? What sort of threat do they pose to digital ID verification and the biometric technology that we are becoming so reliant on, and are there ways to combat the threat?
The deepfake threat
Deepfakes refer to manipulated videos or other digital representations produced by sophisticated artificial intelligence (AI), which yield fabricated images and sounds that appear to be real. While video deepfakes are arguably the most common, audio deepfakes are also growing in popularity.
You’ve probably seen some of the more prominent deepfakes that are out in the public domain right now – notably those manipulating existing footage of Obama and Tom Cruise. However, while this technology may seem playful on the surface, we mustn’t overlook its potential darker side.
A couple of years ago, criminals used this technology to impersonate a chief executive’s voice and demand a fraudulent transfer of €220,000. This is just one example, but high-quality fraudulent deepfakes are now being used far more regularly, and the technology is improving all the time, particularly as images, online videos and social media provide more source material to exploit.
Impact on biometric identities
Now, think about this threat in the context of the growing popularity of biometric technology and digital ID verification. Some government agencies today are using voice recognition for proof of identity, while banks use voice and facial recognition to register new users and facilitate online banking.
For example, HSBC recently revealed that telephone banking fraud has been reduced by 50% since the introduction of a biometric security system that authenticates customers through their voices. It estimates that this extra layer of security has prevented £249m of UK customers’ money from falling into the hands of criminals in the last year.
However, as shown in the unfortunate deepfake scam above, cybercriminals have started to use the technology for perpetrating fraud and there is now concern that the technology can and will be used to develop fake biometric identifiers to bypass biometric-based fraud prevention solutions.
So, an obvious question is whether deepfakes are powerful enough to fool the biometric-based solutions on which institutions such as banks and governments are becoming so dependent.
The current limits to deepfake technology
The answer is: not currently, but we should still take steps to protect ourselves. I know that’s not a very satisfying answer, but it’s probably the dose of reality that this debate needs.
Firstly, we must think about how biometric authentication works. Take voice biometrics as an example: a good fake voice (even just a good impersonator) can be enough to fool a human. However, voice biometric software is much better at identifying differences that the human ear either doesn’t discern or chooses to ignore, which means that voice biometric ID can help prevent fraud if identity is checked against the voice. Even so-called deepfakes create a poor copy of someone’s voice when analyzed at the digital level: they make quite convincing cameos, especially when combined with video, but they remain poor imitations once the signal itself is examined.
Beyond this, the ability of deepfakes to bypass biometrics-based solutions will ultimately depend on the type of liveness detection that is integrated into the solution. Liveness detection identifies whether the user is a real person. The most basic forms require the user to blink, move their eyes, open their mouth, or nod their head.
However, these simple forms of detecting liveness can be spoofed with deepfakes, as has recently been seen in China where cybercriminals purchased high-quality facial images on the black market and used an app to manipulate the images and create deepfake videos that made it seem as if the faces were blinking, nodding, or opening their mouths.
They then used a special phone to hijack the mobile camera typically used to perform facial recognition checks, which allowed them to feed the premade deepfake videos to the tax invoice system. The videos were good enough to beat the liveness detection check, even though no one was standing in front of the camera.
Fortunately, there are currently no known deepfake-based systems available that can generate a synthetic response that looks like the user and says random words or performs random movements correctly with exact audio-visual synchronization within the limited timeframe available. Even if it were possible to construct such a deepfake, it would be hugely labor-intensive for each application, making large-scale fraud impossible.
However, this doesn’t necessarily mean that the technology won’t mature, and this leads us to the solution that will allow us to future-proof our biometric identities: multiple factors.
Combating deepfakes: Multiple factors
In any situation that uses a biometric-based solution, particularly to prove identity, multiple factors must be used. A combination of, for example, voice, face, and a PIN code is highly secure because, while any single factor may be possible to fake, faking all three in the same instance is virtually impossible. Therefore, to secure our biometric identities against the deepfake threat, we need the agility to add more or different factors as the threats change and become more sophisticated or more widely available.
An additional factor that is very hard to fake is time (e.g., having to provide an answer to a dynamic question that is unique to the moment it is asked). This may involve speaking a unique server-side generated word or number which cannot be predicted, as well as making a specific movement or facial expression on demand at the time of checking.
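To make the idea of a time-bound factor more concrete, here is a minimal Python sketch of a server-side challenge that is unpredictable, short-lived, and usable only once. Everything in it is an illustrative assumption rather than a description of any real product: the word list, the list of actions, the ten-second window, and the function names are all hypothetical, and the spoken word, digits, and observed action are assumed to come from separate speech and video analysis that is not shown.

```python
import secrets
import time

# Illustrative assumptions: the response window and the vocabularies below
# are made up for this sketch and would be chosen by the service operator.
CHALLENGE_TTL_SECONDS = 10
WORDS = ["amber", "falcon", "harbor", "meadow", "orbit", "pepper", "quartz", "violet"]
ACTIONS = ["blink twice", "turn head left", "turn head right", "smile"]

# Challenges that have been issued but not yet answered, keyed by challenge ID.
_pending = {}


def issue_challenge(now=None):
    """Create an unpredictable challenge that expires shortly after it is issued."""
    now = time.time() if now is None else now
    challenge_id = secrets.token_hex(8)
    challenge = {
        "word": secrets.choice(WORDS),
        "digits": f"{secrets.randbelow(10_000):04d}",
        "action": secrets.choice(ACTIONS),
        "expires_at": now + CHALLENGE_TTL_SECONDS,
    }
    _pending[challenge_id] = challenge
    return challenge_id, challenge


def check_response(challenge_id, spoken_word, spoken_digits, observed_action, now=None):
    """Accept a response only if it is on time, unused, and matches the challenge.

    The three response values are assumed to be produced upstream by speech
    recognition and video analysis, alongside the biometric match itself.
    """
    now = time.time() if now is None else now
    challenge = _pending.pop(challenge_id, None)  # single use: removed on first attempt
    if challenge is None or now > challenge["expires_at"]:
        return False
    return (
        spoken_word == challenge["word"]
        and spoken_digits == challenge["digits"]
        and observed_action == challenge["action"]
    )
```

Because the challenge is drawn from a cryptographically secure random source at the moment of checking and is discarded after one attempt, a recording prepared in advance, or replayed from an earlier session, has nothing valid to answer.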
An action-based factor (on top of voice and facial biometrics) of “what you do” and – more importantly – “what you are told to do” is incredibly difficult to fake. A deepfake attack that passes the biometric check is unlikely to be able to replicate a required action, as such pre-determination is impossible with the processing power available today. But what would this look like in action?
Imagine you’re a bank and a customer calls to transfer a large amount of money, a situation where it’s vital to authenticate whoever is on the other end of the phone and bind them to the transaction. The customer is asked to read out a unique alphanumeric sequence, generated in that instant from the transaction and its related metadata. The result is a combination of liveness detection, a biometric voice check, and proof that binds that person to a specific time and event.
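One way such a transaction-bound sequence could be derived is sketched below in Python. This is an assumption-laden illustration, not a description of how any bank actually does it: the use of HMAC-SHA256, the field names, the eight-character length, and the reduced alphabet (chosen here to avoid easily confused characters when read aloud) are all hypothetical choices. The transcript is assumed to come from a separate speech recognition step, and the voice biometric check itself is not shown.

```python
import hashlib
import hmac
import secrets

# Hypothetical 32-character alphabet that omits lookalikes such as I, O, 0 and 1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def transaction_code(server_key, account, payee, amount_pence, nonce, length=8):
    """Derive a short read-aloud code bound to one specific transaction.

    The fresh per-request nonce makes the code unpredictable even if the same
    transfer is requested twice; the secret key stays on the server.
    """
    message = "|".join([account, payee, str(amount_pence)]).encode() + b"|" + nonce
    digest = hmac.new(server_key, message, hashlib.sha256).digest()
    # 256 is a multiple of 32, so each byte maps evenly onto the alphabet.
    return "".join(ALPHABET[byte % len(ALPHABET)] for byte in digest[:length])


def spoken_code_matches(expected, transcript):
    """Compare the expected code with a transcript of what the caller read out."""
    normalised = "".join(ch for ch in transcript.upper() if ch.isalnum())
    return hmac.compare_digest(expected.encode(), normalised.encode())


if __name__ == "__main__":
    key = secrets.token_bytes(32)    # long-lived server secret (assumed)
    nonce = secrets.token_bytes(16)  # generated fresh for this request
    code = transaction_code(key, "12345678", "ACME LTD", 2_500_000, nonce)
    print("Please read aloud:", " ".join(code))
```

Changing the payee, the amount, or the nonce yields a different code, so a response recorded for one transaction cannot simply be reused to authorize another.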
Crucially, even if there were a high-quality spoofing attempt through a deepfake, the unique server-side generated instruction means a pre-prepared deepfake wouldn’t be useful or able to adapt quickly enough.