Automated Speech Recognition (ASR), also known as 'Voice Recognition' or 'speech-to-text', is the technology that translates spoken words into text, a machine-readable format.
VoxiAI ASR technology uses uniquely generated acoustic models that predict how words sound in a given environment, such as when talking on a mobile phone. These acoustic models are combined with language models and pronunciations for exceptional accuracy. VoxiAI ASR is adaptable to specific domains, environments, and languages. ASR takes the spoken word and turns it into text, the starting point for it to be acted on.
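To make the ASR step concrete, here is a minimal sketch of the kind of text-plus-confidence result an engine might hand to downstream components. The names, fields, and values are illustrative assumptions, not VoxiAI's actual API.

```python
from dataclasses import dataclass

# Hypothetical shape of an ASR result; the field names are illustrative,
# not part of any documented VoxiAI SDK.
@dataclass
class AsrResult:
    text: str          # the spoken words, now in machine-readable form
    confidence: float  # the engine's confidence in the transcription, 0.0-1.0

def transcribe(audio: bytes) -> AsrResult:
    """Stand-in for the ASR step: audio in, text out.
    A production engine would combine acoustic, language, and pronunciation
    models; this sketch just returns a canned example."""
    return AsrResult(text="i want to check my order status", confidence=0.94)

result = transcribe(b"raw audio bytes would go here")
print(result.text)  # the text that downstream NLP components act on
```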
Natural Language Processing (NLP) is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language.
NLP comprises Natural Language Understanding (NLU), Natural Language Generation (NLG), and Dialog Management technologies. NLU helps understand the meaning behind the words: it deciphers the intents (what the user wants to do) and entities (names of products, locations, etc.) from the text and feeds them to the dialog management engine to find the best possible response. NLG is then used to convert that response into language humans can understand.
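The toy pipeline below sketches that NLU, dialog management, and NLG flow. All function names and rules are hypothetical illustrations, not VoxiAI's API.

```python
# Illustrative sketch of the NLU -> dialog management -> NLG flow described
# above; the rules here are deliberately simplistic.

def understand(text: str) -> dict:
    """NLU: decipher the intent and entities from the user's text (toy rules)."""
    intent = "order_status" if "order" in text else "unknown"
    entities = {"product": "laptop"} if "laptop" in text else {}
    return {"intent": intent, "entities": entities}

def decide(intent: str, entities: dict) -> str:
    """Dialog management: choose the best possible response action."""
    return "lookup_order" if intent == "order_status" else "escalate"

def generate(action: str) -> str:
    """NLG: turn the chosen action back into human-readable language."""
    responses = {
        "lookup_order": "Sure, let me pull up your order for you.",
        "escalate": "Let me connect you with someone who can help.",
    }
    return responses[action]

nlu = understand("where is my laptop order")
print(generate(decide(nlu["intent"], nlu["entities"])))
```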
VoxiAI believes AI should adapt to human conversation, not the other way around. Powered by the company's proprietary Tandem™ technology, VoxiAI IVA combines the latest in Conversational AI, including Automated Speech Recognition (ASR), Natural Language Processing (NLP), machine learning, and Deep Neural Networks, with human understanding in real time.
How does Tandem work?
VoxiAI is known for its unique approach of blending AI and humans, or keeping a 'human in the loop'. Adaptive Understanding is at the core of everything we do. Irrespective of the channel, every customer interaction that comes to an Intelligent Virtual Assistant is sent to the Conversational AI engine, a component of Adaptive Understanding. If the AI has a high confidence score for the accuracy of the answer or response, the IVA responds to the customer using the response generated by the AI. On the rare occasions when the AI doesn't have a high enough confidence score, due to multiple speakers, background noise, an unrecognized language or dialect, a caller's accent, or simply a complex intent, VoxiAI invokes the 'Human Assisted Understanding (HAU)' component in real time. These humans, called Intent Analysts (IAs), listen to the brief audio recording where the AI had a low confidence score and help the AI understand it. This human engagement happens in a fraction of a second, so the end customer never feels any delay or lag in the response.
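As a rough illustration of that hand-off, the sketch below routes on a confidence score: high-confidence results are answered by the AI on its own, low-confidence ones go to an Intent Analyst. The threshold value and function names are assumptions for the example, not VoxiAI's actual implementation.

```python
# Illustrative confidence-based routing; threshold and names are assumptions.
CONFIDENCE_THRESHOLD = 0.85

def ask_intent_analyst(audio_snippet: bytes) -> str:
    """Stand-in for Human Assisted Understanding: an Intent Analyst listens
    to the short low-confidence snippet and returns the intent they heard."""
    return "billing_question"  # canned answer for the sketch

def resolve_intent(ai_intent: str, ai_confidence: float, audio_snippet: bytes) -> str:
    if ai_confidence >= CONFIDENCE_THRESHOLD:
        return ai_intent                      # AI answers on its own
    return ask_intent_analyst(audio_snippet)  # human helps in real time
```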
The IAs never interact with the customer directly or listen to the entire call. They simply act as an additional recognition resource for the Conversational AI engine. When the correct response is sent to the customer, the IAs also help tag and label the data to complete the machine learning loop and ensure that the solution gets smarter with every interaction.
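Continuing the sketch above, that labeling step might look something like this; the data structure is an assumption used only to show how a resolved snippet could feed back into training.

```python
# Hypothetical feedback step: store the Intent Analyst's label alongside the
# audio snippet so it can serve as training data for future model updates.
training_examples: list[dict] = []

def record_label(audio_snippet: bytes, intent_label: str) -> None:
    training_examples.append({"audio": audio_snippet, "intent": intent_label})
```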
The end result is an incredibly sophisticated system, capable of a rich understanding of customer commands, requests, and intents that enables customers to engage in natural, open-ended conversations with brands, just as they would with other humans. Unlike most solutions that require customers to engage in restrictive 'robot-speak' or to choose from a limited menu of options, VoxiAI IVA is capable of understanding whatever a customer says, no matter how they express it, fostering effortless and productive conversations at every touchpoint. This eliminates the frustration of ineffective, simplistic solutions and provides unprecedented convenience and ease of use for today's customers.