This blog post is a roundup of voice emotion analytics companies. It is the first in a series that aims to provide a good overview of the voice technology landscape as it stands. Through a combination of online searches, industry reports and face-to-face conversations, I’ve assembled a long list of companies in the voice space and divided them into categories based on their apparent primary function.
The first of these categories is voice emotion analytics. These are companies that can process an audio file containing human speech, extract its paralinguistic features, interpret them as human emotions, and then provide an analysis report or other service based on this information.
audEERING is an audio analysis company based just outside of Munich, Germany, that specialises in emotional artificial intelligence. Their team are experts in voice emotion analytics, machine learning and signal processing, and many of their founders have PhDs. Since 2012, they have carried out projects for major brands in many industry verticals, including market research, call centers, social robotics, health and many more.
Their product portfolio comprises software systems for automatic emotion and speaker state recognition from speech signals and methods for music signal analysis. They offer a range of commercial web-APIs, mobile SDKs, and embedded Linux and Windows SDKs.
A very research-oriented company, audEERING are also the developers of openSMILE, an open-source research toolkit for audio feature extraction. It is the most widely used tool for emotion recognition tasks in research and industry, and is considered the state of the art in affective computing for audio. audEERING are also responsible for creating the GeMAPS standard acoustic parameter recommendation, a research project that aimed to identify the most effective audio features for use in emotion recognition tasks. The feature sets defined in GeMAPS are easily imported within openSMILE, which standardises their implementation across research projects.
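To give a feel for the kind of frame-level acoustic feature extraction a toolkit like openSMILE performs, here is a toy sketch computing per-frame RMS energy, one of the basic low-level descriptors used in feature sets such as GeMAPS. This is purely illustrative and is not audEERING’s implementation; the frame and hop sizes are conventional 25 ms / 10 ms values at a 16 kHz sample rate.

```python
import math

def frame_rms(samples, frame_len=400, hop=160):
    """Compute per-frame RMS energy, a basic low-level descriptor.

    frame_len/hop correspond to 25 ms frames with a 10 ms hop
    at a 16 kHz sample rate.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        features.append(rms)
    return features

# A one-second synthetic 440 Hz tone at 16 kHz stands in for speech.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
energies = frame_rms(tone)
# A pure sine wave has RMS 1/sqrt(2) ~= 0.707 in every frame.
```

A real extractor computes dozens of such descriptors (pitch, jitter, spectral slope, and so on) per frame and then summarises them with statistical functionals, which is exactly what GeMAPS standardises.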
audEERING produce a number of packaged products too, including:
Audiary – a voice enabled diary that allows patients with chronic diseases to record the state of their health, and log their medical adherence. The sensAI technology it incorporates offers a complete analysis of the user’s emotional state.
CallAIser – a call centre speech analysis software that reports the parameters of telephone conversations such as duration and relative share of the dialogue, along with the speakers’ mood and the atmosphere of the conversation. This can detect and prevent escalations before they happen, allowing a more experienced call centre agent to take over and calm the situation down.
sensAI-Music – software that automatically detects tempo, meter, tune and vocals, and calculates the genre of a track, as well as its emotional setting. sensAI-Music helps DJs with planning set lists and dealing with large music databases, and allows music tracks to be synchronised with videos, lighting effects, and animated avatars.
“When the interaction is frictionless and seamless, you’re actually more happy with it, you’re less stressed, because it just feels natural.” Florian Eyben, CTO of audEERING
Beyond Verbal was founded in 2012 in Tel Aviv, Israel by Yuval Mor. Their patented voice emotion analytics technology extracts various acoustic features from a speaker’s voice in real time, giving insights into personal health, wellbeing and emotional state. The technology does not analyze the linguistic context or content of conversations, nor does it record a speaker’s statements. It detects changes in vocal range that indicate emotions such as anger, anxiety, happiness or satisfaction, and covers nuances in mood, attitude, and decision-making characteristics.
Beyond Verbal’s voice emotion analysis is used in various use cases by clients in a range of industries. These include HMOs, life insurance and pharma companies, as well as call centres, robotics and wearable manufacturers, and research institutions. An example use case would be helping customer service representatives improve their own performance by monitoring the call audio in real time. An alert can be sent to an agent who starts to lose their temper with the customer on the phone, making them aware of their change in mood and affording them the opportunity to correct their tone.
The technology is offered as an API-style, cloud-based licensed service that can be integrated into bigger projects. It measures:
Valence – a variable which ranges from negativity to positivity. When listening to a person talk, it is possible to understand how “positive” or “negative” the person feels about the subject, object or event under discussion.
Arousal – a variable that ranges from tranquility/boredom to alertness/excitement. It corresponds to similar concepts such as level of activation and stimulation.
Temper – an emotional measure that covers a speaker’s entire mood range. Low temper describes depressive and gloomy moods. Medium temper describes friendly, warm and embracive moods. High temper values describe confrontational, domineering and aggressive moods.
Mood groups – an indicator of the speaker’s emotional state during the analyzed voice segment. The API produces a total of 11 mood groups, which range from anger, loneliness and self-control to happiness and excitement.
Emotion combinations – a combination of various basic emotions, as expressed by the user’s voice during an analyzed voice section.
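The valence and arousal dimensions above are standard axes in affective computing, and emotion researchers often combine them into a two-dimensional “circumplex” to derive coarse mood labels. The sketch below is my own illustration of that general idea, not Beyond Verbal’s actual mapping or API; the quadrant labels are assumptions chosen for the example.

```python
def mood_quadrant(valence, arousal):
    """Map valence/arousal scores in [-1, 1] to a coarse mood label.

    Follows the classic circumplex model of affect: each quadrant
    of the valence/arousal plane gets one representative label.
    """
    if valence >= 0 and arousal >= 0:
        return "excited"   # positive and activated
    if valence >= 0:
        return "content"   # positive and calm
    if arousal >= 0:
        return "angry"     # negative and activated
    return "sad"           # negative and calm

print(mood_quadrant(0.8, 0.6))   # excited
print(mood_quadrant(-0.5, 0.7))  # angry
```

A production system like Beyond Verbal’s clearly goes well beyond four quadrants (11 mood groups plus emotion combinations), but the same two underlying dimensions do most of the work.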
“We envision a world in which personal devices understand our emotions and wellbeing, enabling us to become more in tune with ourselves and the messages we communicate to our peers. Understanding emotions can assist us in finding new friends, unlocking new experiences and ultimately, helping us understand better what makes us truly happy.” Yuval Mor, CEO
Affectiva was spun out of MIT Media Lab in 2009 in Boston, United States by Rana el Kaliouby and Rosalind Picard. The company are specialists in both face and voice emotion analytics, capable of identifying 7 emotions, 20 expressions and 13 emojis, and of classifying age, gender and ethnicity.
While the initial market focus for Affectiva was advertisers, brands, and retail establishments, their services are used in a wide range of other markets. These include political pollsters, game producers, education apps, health apps (e.g. AR for autism), legal (e.g. video depositions), web media products (e.g. Giphy), and even robots and IoT devices (offline, through their SDK). They are now used by a third of the Fortune Global 100 and 1,400+ brands and market research firms.
They have a range of products on offer:
Emotion as a Service – cloud-based solution that analyzes images, videos and audio of humans expressing emotion. Returns facial and vocal emotion metrics on demand, with no coding or integration required.
Emotion SDK – emotion-enables apps, devices and digital experiences, so they can sense and adapt to expressions of emotion, all without the need for an internet connection.
Affectiva Automotive AI – a multi-modal in-cabin sensing AI that identifies, from face and voice, complex and nuanced emotional and cognitive states of drivers and passengers.
Affdex for Market Research – cloud-based facial coding and emotion analytics for advertisers, allowing them to remotely measure consumer emotional responses to digital content.
In-lab Biometric Solution – provides researchers with a holistic view of human behavior, integrating emotion recognition technology and biometric sensors in one place.
Affectiva supports desktop, mobile, IoT/embedded, SaaS, and automotive, across multiple platforms including iOS, Android, Web, Windows, Linux, macOS, Unity and Raspberry Pi.
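A cloud “Emotion as a Service” offering is typically consumed as an HTTP API: the client uploads media and gets emotion metrics back as JSON. The endpoint URL, field names and response shape below are entirely hypothetical, invented to illustrate the integration pattern; they are not Affectiva’s actual API.

```python
import json
import urllib.request

# Hypothetical endpoint -- illustrative only, not a real service.
API_URL = "https://api.example.com/v1/emotion"

def build_request(audio_bytes, api_key):
    """Prepare an authenticated POST uploading raw audio for analysis."""
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )

# A response from such a service might look like this (made-up values):
sample_response = json.loads("""{
    "emotions": {"joy": 0.72, "anger": 0.05, "sadness": 0.1},
    "dominant": "joy"
}""")

req = build_request(b"\x00\x01", "demo-key")
```

The appeal of this model, as Affectiva’s “no coding or integration required” pitch suggests, is that the client only needs to ship bytes and parse JSON; all model inference stays server-side.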
Their long term strategy is to put an emotion chip in everything.
Nemesysco is a developer of advanced voice analysis technologies for emotion detection, personality and risk assessment. Founded in 2000 in Netanya, Israel, they provide advanced and non-invasive investigation and security tools, fraud prevention solutions, CRM applications, consumer products and psychological diagnostic tools. All Nemesysco’s voice emotion analytics products and services are based on Layered Voice Analysis (LVA), their proprietary and patent protected voice analysis technology.
Nemesysco’s clients are typically call centres, insurance companies, financial institutions and law enforcement agencies.
Call centres use Nemesysco for quality monitoring of their calls, either in real-time or immediately after, to identify the ones that are mistreated by agents. Their system also collects emotional profiles of customers and agents in the CRM system, allowing the most suitable agent for the customer’s emotional profile to be matched, and the most suitable products to be offered.
Insurance companies use Nemesysco’s products to conduct risk assessment and detect fraud in insurance claim calls in real-time. The technology analyses the unique vocal characteristics that may indicate a high probability of fraud or concealment of information.
Banks and financial institutions use Nemesysco to perform credit risk assessment, for immediate fact verification and fraud intention detection. The voice analysis platform improves risk scoring models and reduces uncertainty for lenders, allowing them to verify past events and current information, and identify potential sensitivities.
Law enforcement agencies use the Criminal Investigation Focus Tool to detect and measure psychophysiological reactions in suspects. This allows them to perform real-time analysis during investigations (either face to face or over the phone), and analyse recorded audio and video material offline. Many different emotional reactions can be detected, with the system displaying a label for each segment of audio, e.g. ‘low risk’, ‘high risk’, ‘excited’ or ‘stressed’. In addition to criminal investigations, these techniques can also be used during the recruitment process for sensitive job roles.
Audio Analytic was founded in 2010 in Cambridge, United Kingdom. Its sound recognition software framework has the ability to understand context through sound, allowing for detection of not only emotion, such as aggressive voices, but also many other specific sounds such as gunshots, smoke alarms, windows breaking, babies crying and dogs barking. Audio Analytic’s software has been embedded into a wide range of consumer technology devices for use in the connected home, outdoors, and in the car.
The technology runs exclusively on the ‘edge’ (i.e. on-device), using cloudless AI. All sound identification, analysis and decision making is done locally, which uses minimal resources and ensures total privacy.
In order to train their detection algorithms, Audio Analytic have built the world’s largest dedicated real-world audio data set, recorded in their dedicated sound labs and through data gathering initiatives. The sounds are labelled, organised and analysed in Alexandria, their proprietary data platform.
“We envision a world where the consumer devices around us are more helpful, more intelligent, more…human. We envision a future where omnipresent, intelligent, context-aware computing is able to better help people by responding to the sounds around us, no matter where we are, and taking appropriate action on our behalf. Our mission is to map the world of sounds and give machines a sense of hearing, whether that is in the home, out and about, or in the car.” Audio Analytic
Aurablue Labs is a relatively new company, founded in 2016 in India. It leverages the power of deep learning to recognize emotions from speech signals. Despite their small size, they have developed state-of-the-art voice emotion analytics technology that is able to identify emotions by continuously listening to normal day-to-day conversations. This allows users to discover and track anger and happiness through the tone of their voice, irrespective of the language they speak.
While there are very few details of the product on their website, Aurablue Labs claims it can be used in a variety of interesting use cases. In call centres it could be used to analyse voice data and measure the quality of service delivered by agents. It can also be used by taxi firms to rate drivers based on aggression in their tone. Consumers can apparently use Aurablue technology to continuously monitor their stress levels during the day, and receive alerts when they get too high. They have also integrated their technology into the Beatz Smart Jukebox, a music player on Android that automatically adapts the playlist according to your detected mood.
VoiceSense are another voice emotion analytics company based in Israel. They use Big Data predictive analytics, rather than demographic and historical information, to predict the behaviour tendencies of individual customers. The company founders specialise in psychology, signal processing and speech analysis.
VoiceSense have developed an emotion detection analytic engine, which provides real-time indications of the four basic emotions: happiness, anger, sadness and contentment. The analysis is fully language independent, speaker independent, and has a short response time of 5-10 seconds. It reflects emotional changes in the speech over the last 30 seconds.
Given real-time audio data, the analytic engine provides a description of the person’s attitude and behavior in the current moment. VoiceSense also offers a personality classification feature, which works by exploring the typical speech patterns of an individual over the long term, and identifying the characteristic behavioral tendencies of the person. VoiceSense validates its personality classification using well-known personality systems, such as the Big Five personality inventory.
A profile for the user is created, specifying the detected levels of risk affinity or aversion, tendencies for impulsive behavior and rule abidance, personal integrity, sociability, conscientiousness and well-being. Prediction scores are then calculated for specific consumer behaviors, and automatically incorporated into decision-making processes and stored in the CRM.
Their flagship product, Speech Enterprise Analytics Leverage (SEAL), assembles these technologies into a speech-based solution that can accurately predict future consumer behavior. It does this through the analysis of prosodic (non-content) speech parameters such as intonation, pace, and stress levels, in both recorded voice files and live audio streams. The system is backed by research and patents, and supports cloud, mobile and local environments.
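One of the prosodic parameters SEAL analyses is pace. A common textbook proxy for pace is the rate of energy peaks in the speech signal, which roughly tracks syllable rate. The sketch below illustrates that general technique on a toy energy contour; it is my own simplification, not VoiceSense’s patented method.

```python
def speech_pace(energies, threshold=0.5):
    """Count energy peaks above a threshold -- a crude proxy for
    syllable rate, one ingredient of a prosodic "pace" measure.

    `energies` is a list of per-frame energy values (e.g. one per
    10 ms frame); a peak is a local maximum above `threshold`.
    """
    peaks = 0
    for i in range(1, len(energies) - 1):
        if (energies[i] > threshold
                and energies[i] > energies[i - 1]
                and energies[i] >= energies[i + 1]):
            peaks += 1
    return peaks

# Two bursts of energy -> two "syllables" in this toy contour.
contour = [0.1, 0.9, 0.2, 0.1, 0.8, 0.1]
print(speech_pace(contour))  # 2
```

Dividing the peak count by the clip duration gives syllables per second; crucially, nothing here looks at words or meaning, which is what makes prosodic analysis language-independent.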
SEAL can be applied to numerous use cases, such as:
Customer Analytics e.g. churn prediction
Fintech Analytics e.g. loan default prediction
Healthcare e.g. PTSD tracking
HR e.g. staff retention prediction
Personal Assistant e.g. content recommendations
Call Center Interaction Analytics e.g. customer dissatisfaction monitoring
“Turning predictive speech analysis into a common practice of every interaction.” VoiceSense
That’s it for this blog post. You should now have a good idea of the range of voice emotion analytics services on the market today. I hope this will inspire you to try integrating one of them into your product, or even building something similar from scratch.
The next blog post will be a roundup of another segment of the voice technology landscape. If you want to be notified when new blog posts and podcast episodes are published, subscribe to the newsletter!