As regulations like GDPR and CCPA tighten restrictions on third-party consumer data, making sure you have access to actionable first-party data is more important than ever. You probably already have data like customer purchase history and digital journey information such as website visits and paid search interactions, but there are limits to how much insight marketers can gather from these sources.
Conversational analytics represents one of the last bastions of precise first-party insights into how your customers interact with your brand, how they think of your product or service, and perhaps most importantly, exactly how they talk about it.
What is Conversational Analytics?
Conversational analytics is the process of extracting usable data from human speech and conversation using natural language processing (NLP) to allow computers to “understand” speech and artificial intelligence (AI) to extract and organize data from it. I know, that’s a lot of alphabet soup to deal with, but what conversational AI boils down to is giving machines the ability to process speech and allowing people to gain insights from massive numbers of conversations at scale — both of which were daunting if not impossible tasks just a few years ago.
Conversational analytics is used to extract and process data from both spoken language (e.g. phone calls and voice assistants) and typed text (e.g. customer service chatbots). The applications are myriad, so we will stick with our area of expertise and show you how it’s used in call tracking software that allows marketers to understand call context, predict outcomes, and apply the data to optimize marketing campaigns and improve customer experience.
How Does Speech Analytics Work?
Speech analytics, or speech recognition, is used everywhere. When you talk to Alexa, chat with Siri and, of course, when you use conversation intelligence software, speech analytics is at the core of what makes it all work.
The key challenge in making speech recognition work is that words are not the fundamental element of speech. Words are made up of smaller components called phonemes, which are basically the building blocks of speech. English, for example, has hundreds of thousands of words, but they're built from only about 42 phonemes. So, the first step in speech recognition is to break the audio stream up into these phonemes, and doing that requires some linguistic knowledge of how long a phoneme actually lasts.
If you're going to divide an audio stream into phonemes, you need to be working on the right time scale. So, as the audio stream comes in, you chunk it into overlapping segments, each short enough to contain just one phoneme but long enough to capture that phoneme's acoustic signature.
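The chunking step above can be sketched in a few lines. The frame and hop sizes here are illustrative defaults, not values from the article; recognizers commonly use frames of roughly 25 ms, stepped every 10 ms or so, so that consecutive frames overlap:

```python
# Sketch: slicing an audio stream into short, overlapping segments (frames).
# Frame and hop lengths are illustrative assumptions, not Invoca's settings.

def frame_audio(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a list of audio samples into overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # step between frame starts
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of 16 kHz audio -> 25 ms frames taken every 10 ms
audio = [0.0] * 16000
frames = frame_audio(audio, 16000)
print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Because the hop is shorter than the frame, each frame overlaps the next, which is what keeps a phoneme boundary from being split awkwardly between two windows.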
If you were to do this on a piece of music, for example, it would tell you what notes made up that music. Each segment's frequency content gives you an acoustic signature for that small time window, and that signature is what you try to match against a library of phonemes.
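The "notes" analogy corresponds to taking a frequency-domain signature of each frame. A minimal sketch, using a naive plain-Python discrete Fourier transform (real systems use FFTs plus mel-scale filter banks such as MFCCs, not this direct computation):

```python
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes: the frame's 'acoustic signature'.
    Illustrative only; production systems use FFTs and mel filter banks."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):  # keep only the non-redundant half of the bins
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum.append(math.hypot(re, im))
    return spectrum

# A pure 440 Hz tone sampled at 8 kHz: the signature peaks at a single bin
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(200)]
spec = magnitude_spectrum(frame)
print(spec.index(max(spec)))  # bin 11, i.e. 440 Hz / (8000 Hz / 200 samples)
```

A phoneme produces a characteristic pattern across these frequency bins rather than a single peak, but the matching idea is the same: compare each frame's spectrum against stored reference signatures.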
Once you can identify the phonemes, you can match them against a database of the known phonemes of English.
Distinguishing words takes a few more steps. You take the audio stream, break it up into these frequency signatures, and match those against phonemes. Then, to get from phonemes to a word, you use a pronunciation dictionary. But the same set of phonemes can make up multiple different words, so on top of that you have a language model. For example, “eight,” the number, and “ate,” the verb, sound the same, but the words that come before or after can indicate which one the speaker intended.
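A toy version of that language-model step might look like this. The bigram probabilities below are invented for illustration; real language models are trained on large text corpora:

```python
# Sketch: a toy bigram language model choosing between the homophones
# "eight" and "ate". All probabilities are made up for illustration.

bigram_prob = {
    ("i", "ate"): 0.9, ("i", "eight"): 0.1,
    ("chapter", "eight"): 0.95, ("chapter", "ate"): 0.05,
}

def pick_word(previous_word, candidates):
    """Return the homophone most likely to follow previous_word."""
    return max(candidates, key=lambda w: bigram_prob.get((previous_word, w), 0.0))

print(pick_word("i", ["ate", "eight"]))        # ate
print(pick_word("chapter", ["ate", "eight"]))  # eight
```

The acoustic model hears the same phonemes either way; it's the surrounding words that break the tie.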
So, you have a statistical mapping from the audio to the phoneme, and it's never certain, because different people speak differently: they have different accents, might have a bad audio connection, and so on. The mapping of pronunciations to words is also statistical, since a word might be pronounced in different ways. And the word choice itself is statistical, because some words sound the same and you have to determine the most likely one from context. That gives you three probabilistic connections, and at the end of the day, you multiply those probabilities together and hope the result peaks at a particular word.
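Multiplying those three probabilities together can be sketched directly. The per-hypothesis numbers here are invented for illustration; in practice the scores come from the acoustic model, the pronunciation dictionary, and the language model described above:

```python
# Sketch: scoring word hypotheses by multiplying the three statistical links:
# audio -> phonemes, phonemes -> word, and word-in-context.
# All probability values are invented for illustration.

hypotheses = [
    # (word, P(phonemes | audio), P(word | phonemes), P(word | context))
    ("eight", 0.8, 0.5, 0.1),
    ("ate",   0.8, 0.5, 0.7),
]

def best_word(hyps):
    """Pick the hypothesis whose combined probability is highest."""
    scored = [(p_ac * p_pron * p_lm, word) for word, p_ac, p_pron, p_lm in hyps]
    return max(scored)[1]

print(best_word(hypotheses))  # ate
```

Here the acoustic and pronunciation scores are identical for both homophones, so the context probability alone decides the outcome, which is exactly the "eight" versus "ate" situation described above.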
Want to dig deeper into how speech analytics works? Listen to this Software Engineering Radio podcast with Invoca data scientist Mike McCourt.
Why Conversational AI is the Future of First-Party Data
One of the reasons that conversational AI is seen as a great first-party data source is that today’s consumers expect more than the purely digital “point-click-buy” transactions and are demanding blended experiences that bring conversations into the mix. In fact, 70% of consumers feel frustrated or angry when they don’t have the choice of contacting a human representative.
Whether it’s a phone call with a human representative, chatting with a chatbot, or getting help on the go with text messaging, consumers want to talk. This means that businesses need better ways to listen. In order to hear and understand what your customers are saying at any sort of scale, you need conversational AI to make sense of and take action on the data.
Companies that frequently have conversations with their customers on the phone are sitting on a goldmine of customer data. They may have thousands of hours of customer phone calls every year and have tens of thousands of call recordings banked — just imagine the kind of customer data that is in all of those calls. You’ll learn why they are calling, what makes a purchase happen, whether they are calling more often for service or sales, whether they are happy or mad — the possibilities are nearly endless. Then imagine manually wading through all those call recordings to gain insights on those calls. It’s just not possible. And as it turns out, it was pretty challenging to get computers to do it, too.
The Challenges of Using AI to Understand Human Speech
On the computer side, much of the difficulty in analyzing conversations lies in the many nuances of human speech. Unlike the formulaic equations and coded strings of commands computers usually deal with, human speech follows only a loose pattern and logic. Even if we are only talking about analyzing the English language, there are hundreds of different accents, inflections, phrase patterns, varying word usage, slang, and colloquialisms that even other people have a hard time understanding. New research shows that some elements of speech are hardwired in the human brain, but what really makes people different from machines is our ability to instantaneously process all of this variance of language. Creating a machine learning algorithm that can “learn” how to process human language is a whole different ball of wax.
When it comes to processing conversations in phone calls, which is what Invoca Signal AI conversational analytics does, things get even hairier. “Phone calls are idiosyncratic in the world of natural language processing,” said Invoca data scientist Mike McCourt. “They can be repetitive, can contain both recorded messages and human speech, and often suffer from bad connections.” On top of that, phone calls also contain both full conversations and sequences of simple yes/no answers, hold music, keypresses, silence and many other variables that you don’t see in textual communications. This makes it difficult to design AI software that can juggle so many competing needs.
Most well-known AI models for language are designed for either long, carefully edited texts like news articles, or for short, spontaneous speech like a Tweet or chatbot conversation. In our experience, none of these well-known models work on phone calls, which are both long and spontaneous. Since analyzing phone calls differs so starkly from the problems the rest of the research community focuses on, we did our own research and development to build our conversational AI algorithms.
How Does Invoca’s AI-Powered Conversational Analytics Work?
Invoca’s conversational analytics platform is designed to help marketers get a new view into conversation data from high-intent consumers — such as purchases made or promotion inquiries. Marketers can quickly gain new insights and get attribution from phone calls and take action on them in real time. It is used to drive more revenue-generating calls, boost conversion rates, and optimize the buying experience. Signal AI conversational analytics is most frequently used to:
- Optimize Ad Spend: Automatically adjust keyword bidding strategies and suppress ads in systems like Google Ads and Search Ads 360 for callers who convert over the phone
- Seed Audiences: Create new audiences using offline conversion data to expand your reach of potential customers through native integrations with Facebook and Adobe Experience Cloud
- Personalize Content: Update content management tools like Adobe Target to personalize content for each subsequent consumer visit based on call conversations
Compared to other conversational analytics tools, Signal AI offers marketers deeper insight into the unique conversations happening between a business’s buyers and agents — often uncovering conversation patterns and behaviors that marketers didn’t know existed — with consumer-level data that can be made actionable across marketing platforms.
Here’s how conversational AI works in Invoca’s call tracking platform:
Step 1: Call data flows into the Invoca platform during each conversation.
Step 2: The spoken data is transcribed into text so it can be analyzed by the algorithm.
Step 3: The predictive model analyzes the conversation and identifies key patterns, phrases, and actions, then identifies call outcomes such as ‘application submitted’ or ‘quote received’.
Step 4: Those outcomes and insights are pushed into your marketing stack so you can use this valuable conversation data to optimize marketing spend and personalize the customer’s next interaction — all in real time.
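The four steps above can be sketched as a minimal pipeline. Everything here is a hypothetical stand-in, not Invoca's actual API: `transcribe`, `classify_outcomes`, and the `Stack` class are invented so the flow runs end to end.

```python
# Sketch of the four-step flow: call data in, transcription, outcome
# detection, push to the marketing stack. All names are hypothetical
# stand-ins, not Invoca's real interfaces.

def transcribe(audio):
    """Step 2 stand-in: pretend the audio is already a text transcript."""
    return audio

def classify_outcomes(transcript):
    """Step 3 stand-in: flag a call outcome from the transcript text."""
    return {"quote_received": "quote" in transcript}

class Stack:
    """Step 4 stand-in for a marketing stack integration."""
    def __init__(self):
        self.events = []

    def push(self, data):
        self.events.append(data)

def handle_call(call_audio, marketing_stack):
    transcript = transcribe(call_audio)       # Step 2: speech -> text
    outcomes = classify_outcomes(transcript)  # Step 3: predictive model
    marketing_stack.push(outcomes)            # Step 4: act on the data
    return outcomes

stack = Stack()
print(handle_call("caller confirmed a quote was received", stack))
```

The real work, of course, lives inside the transcription and predictive-model steps; the point of the sketch is only that each call flows through the same transcribe, classify, and push sequence in real time.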
The latest addition to the Signal AI conversational analytics suite is Signal Discovery. Powered by unsupervised machine learning, Signal Discovery gives you an unprecedented view of phone conversations with your customers. Signal Discovery automatically groups customer conversations into topics based on similarities in speech patterns. This allows marketers to quickly gain new insights from tens of thousands of conversations and take action on them.
The reason we developed Signal Discovery is that over 56% of marketers have no idea what’s said during the calls that they drive or what the outcomes of those calls are. Signal Discovery shows you real conversation topics with real consumers in a full-color map, eliminating guesswork and assumptions about caller behavior by providing you with hard data. “Signal Discovery shines a whole new light on conversations happening in our contact center, giving us the data we need to enact changes across the organization,” said Noah Brooks, Manager of Digital Engagement and Analytics, University Hospitals. “As a marketer, I’m excited to apply this data in real time to improve the patient experience.”
The beauty of Signal Discovery is that it not only validates what you may think you know about caller behavior, but it also uncovers caller behaviors that you may not know exist.
There is no better way to get to know someone than having a conversation, and there’s no better way to get to know your customers than by making conversational analytics part of your marketing tech stack.
Download the State of First-Party Marketing Data report to learn more about how your peers are using conversational analytics and call tracking to take action on first-party data.