Blog Date: 3/7/2018
Author: Ray Coulombe
Micro-articulometry. Ever heard of it? It's the term for the technology used to deduce human profile parameters, employing Artificial Intelligence (AI) to discover micro-patterns (or micro-signatures) contained in speech. An article in the January 1, 2018 issue of Fortune magazine, "The Gift of Gab" by Jennifer Alsever, discussed the use of voice for secure authentication. Could this challenge other forms of biometric identification? Alsever's article cites work in AI being conducted at Carnegie Mellon University's Language Technologies Institute under Professor Rita Singh, with whom I had the chance to speak.
An academic paper co-authored by Dr. Singh notes that the different sounds that make up strings of spoken words are produced in rapid succession by moving the "articulators" (tongue, lips, jaw, etc.) to modify the shape of the vocal tract. The basic science is that different shapes result in different resonance patterns, which humans hear and interpret as meaningful speech. Dr. Singh says it's a complex task that differs for everyone. "As a result, it is almost impossible for the voices of two different individuals to be exactly the same at all levels. This, and the fact that many of these influences are beyond the voluntary control of the speaker, make exact and complete mimicry impossible." Her work suggests that the human voice can definitively identify a person, along with certain mental and emotional characteristics.
Micro-Articulometry in Action
The U.S. Coast Guard has worked successfully with Professor Singh since 2014 to combat fake distress calls, where the cost of response can run from $5,000 to $15,000 per hour. Advanced voice analysis can provide information not only about the physical characteristics of the caller, but also about the environment they're calling from. It can identify serial callers and work with snippets of voice communication kept intentionally short to convey urgency.
This idea could also be applied to combat "swatting," in which someone makes a bogus call to law enforcement reporting dangerous activity in order to prompt a strong deployment response (like a SWAT team). In late December 2017, Tyler Barriss of Los Angeles called authorities in Wichita, Kansas to report a made-up impending event involving a weapon and the threat of fire. Police arrived on the scene and fatally shot an unsuspecting Andrew Finch as he reached for his waistband. [Note: Barriss was sent to the Sedgwick County, Kansas correctional facility to be held for arraignment. This facility received the Elliott A. Boxerbaum security design project of the year award for 2017, profiled in the December issue of SD&I.]
Can the bar be set high enough so that it's virtually impossible for someone to impersonate another's voice to the satisfaction of the AI-based system? The research suggests that this is likely the case. Will this supplant other biometrics? It appears to me that multi-factor authentication will remain a requirement or, at the very least, procedural approaches would be needed to diminish the chances of a false positive.
A false positive could be produced if a system prompted, "Speak your name." In this case, a valid voice recording might accurately reproduce the name. But if the system then issued a random passphrase to be spoken back, it is unlikely the imposter would be able to cue up a second recording of that passphrase. What if, however, the person were present, under duress, and forced to speak the passphrase? Depending on the analytic, duress might be sensed and that information used in a way similar to current duress codes and audio analytics. Also, a second form of credential, biometric or otherwise, could be employed.
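To make the random passphrase idea concrete, here is a minimal Python sketch of the challenge-and-response logic. It is an illustration under stated assumptions, not a description of any actual product or of Professor Singh's work: the word list, the function names, the 10-second response window, and the 0.9 score threshold are all hypothetical, and the transcript and voice-match score are assumed to come from separate speech recognition and voice recognition components that are not shown.

```python
import secrets

# Hypothetical word list; a real system would draw from a far larger vocabulary.
WORDS = ["amber", "river", "seven", "planet", "copper", "window", "garden", "silver"]


def make_challenge(num_words=4):
    """Pick a random passphrase that the speaker must say aloud."""
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))


def normalize(text):
    """Lowercase and collapse whitespace so transcripts compare cleanly."""
    return " ".join(text.lower().split())


def verify_response(challenge, transcript, voice_score,
                    issued_at, responded_at,
                    max_delay_s=10.0, min_voice_score=0.9):
    """Accept only if the right words were spoken, promptly, by a matching voice.

    transcript  -- text from an assumed speech recognizer (not shown)
    voice_score -- similarity in [0, 1] from an assumed voiceprint matcher (not shown)
    issued_at / responded_at -- timestamps in seconds
    """
    words_match = normalize(transcript) == normalize(challenge)
    on_time = 0 <= responded_at - issued_at <= max_delay_s
    voice_ok = voice_score >= min_voice_score
    return words_match and on_time and voice_ok
```

A prerecorded "Speak your name" clip fails the word check because the passphrase changes on every attempt, and the short response window leaves little time to assemble a new recording. Detecting duress, as discussed above, would need an additional analytic that this sketch does not attempt.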
It's important to note that voice and speech recognition are two different ideas. According to findbiometrics.com, speech recognition is a user interface technology (think Siri, Alexa, OK Google) while voice recognition (or voiceprint) is the identification and authentication of vocal modalities. Combined, these two ideas allow for authentication and hands-free device use simultaneously.
What's the Cost?
From a hardware standpoint, one needs a microphone that captures speech clearly and roughly the processing power of an Android device or Raspberry Pi. There's no special reader or appliance involved, and this is likely doable in an intercom station.
The science of profiling is in its infancy, but growing rapidly, and commercial biometric deployments already exist. According to findbiometrics.com, USAA was one of the first companies to deploy biometric screening, officially launching its system in 2015. USAA members can use facial, voice, or fingerprint recognition to gain access to their accounts. Of the five million people signed up for the USAA app, over two million access it using multi-factor biometric authentication.
Until all aspects of this newer technology are fully researched, voice recognition faces several challenges: maintaining profiling accuracy in the presence of noise, handling voice disguise, designing and engineering the right AI techniques to discover information at micro-levels, designing accurate measurement techniques, and improving elements of the AI algorithm. Today, the accuracy of voice recognition is comparable to that of fingerprint readers on smartphones.
I do believe the technology will eventually meet, and in fact exceed, the requirements for accurate identification, even as researchers continue attempts to characterize an entire persona.