Asterisk and Speech Recognition
Before we dive into Asterisk, we need to select a speech engine.
There are two main types of speech engines:
- Text-to-Speech (TTS)
- Automatic Speech Recognition (ASR)
Generally speaking, your choices for TTS engines are more plentiful. There are more vendors in the TTS market and they cover more languages.
ASR vendors are fewer and cover fewer languages, though coverage for North American languages is good.
Several vendors have offered speech engines over the years: LumenVox, NeoSpeech, Nuance, Cepstral, and AT&T Watson, to name a few. All of these companies provide TTS voices.
Only LumenVox, Nuance, and AT&T Watson provide any ASR.
MRCP in Asterisk
Speech engines are typically controlled over MRCP (Media Resource Control Protocol). The best way to connect Asterisk to an MRCP server is to use the UniMRCP package. UniMRCP consists of a library that provides MRCP support, as well as a suite of native Asterisk applications for interfacing with MRCP servers from the Dialplan.
Installation instructions for UniMRCP on Asterisk can be found on the UniMRCP site.
Once you have UniMRCP installed and loaded in Asterisk, you will have three new Asterisk Dialplan applications:
- MRCPSynth: for text-to-speech
- MRCPRecog: for speech recognition
- SynthAndRecog: for combined TTS + ASR
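As a rough sketch of how the first two applications might be called on their own: the context name, prompt text, and builtin grammar below are placeholders, and the option keys (t for the recognition timeout in milliseconds, ct for the confidence threshold, spl for the speech language) and result variables are as we understand them from the UniMRCP module documentation, so verify them against your installed version.

```ini
[mrcp-demo]
; Hypothetical context. Assumes default TTS and ASR profiles are
; already configured for the UniMRCP modules (typically in mrcp.conf).
exten => s,1,Answer()
; Speak a prompt through the TTS engine.
exten => s,n,MRCPSynth(Welcome to the demo.)
; SYNTHSTATUS is assumed to be set by MRCPSynth on exit.
exten => s,n,Verbose(1,Synthesis status: ${SYNTHSTATUS})
; Listen for digits using a builtin grammar (server support is assumed).
exten => s,n,MRCPRecog(builtin:grammar/digits,t=5000&ct=0.4&spl=en-US)
; RECOGSTATUS and RECOG_RESULT are assumed to be set by MRCPRecog on exit.
exten => s,n,Verbose(1,Recognition status: ${RECOGSTATUS} result: ${RECOG_RESULT})
exten => s,n,Hangup()
```

Keeping synthesis and recognition as separate steps like this is the simplest arrangement, but it does not let the caller interrupt the spoken prompt, which is what the combined application is for.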
Examples
Suppose we are building an IVR menu in the Asterisk Dialplan with requirements like these:
- We want to play the audio file ‘/srv/app/testivr.wav’
- We want to allow the callers to speak various responses, like “Sales”, “Support”, or “Operator”
- We also want to allow callers to press buttons, like 1 for Sales, 2 for Support, or 0 for Operator
- We also want to allow the caller to “barge,” or interrupt the prompt, rather than forcing them to wait until it finishes playing
- We want to reject anything with a speech recognition confidence lower than 40%
- We’ll also assume this is in US English only
To do this, we need to pass three arguments to SynthAndRecog:
- The first argument is the audio prompt to play:
file:///srv/app/testivr.wav
- The second argument is the list of grammar URLs, one each for speech and DTMF, separated by commas:
"http://127.0.0.1/documents/corporate_ivr.main_menu_voice,http://127.0.0.1/documents/corporate_ivr.main_menu_dtmf"
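- The third argument is the list of options, written as key=value pairs separated by ampersands (the UniMRCP documentation describes the full set of available options). Here, b=1 enables barge-in, spl=en-US sets the speech language, and ct=0.4 sets the confidence threshold:
b=1&spl=en-US&ct=0.4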
Here’s our completed example:
exten => s,1,SynthAndRecog("file:///srv/app/testivr.wav","http://127.0.0.1/documents/corporate_ivr.main_menu_voice,http://127.0.0.1/documents/corporate_ivr.main_menu_dtmf",b=1&spl=en-US&ct=0.4)
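After SynthAndRecog returns, the Dialplan still has to act on the result. A minimal sketch of that follow-up is given here, under several assumptions that should be checked against the UniMRCP module documentation for your version: that SynthAndRecog sets RECOG_STATUS and RECOG_COMPLETION_CAUSE, that a completion cause of "000" means a successful match, and that the RECOG_INPUT() and RECOG_CONFIDENCE() functions return the recognized input and its confidence. The retry and failed extensions are hypothetical names used only for illustration.

```ini
; Continues from priority 1 above, in the same context.
; Bail out if recognition could not be started or the caller hung up.
exten => s,n,GotoIf($["${RECOG_STATUS}" != "OK"]?failed,1)
; Anything other than "000" (assumed: success) is treated as no-match or no-input.
exten => s,n,GotoIf($["${RECOG_COMPLETION_CAUSE}" != "000"]?retry,1)
; Store what the caller said or pressed (assumed function behavior).
exten => s,n,Set(CHOICE=${RECOG_INPUT(0)})
exten => s,n,Verbose(1,Caller chose ${CHOICE} with confidence ${RECOG_CONFIDENCE(0)})
; Route on ${CHOICE} here, for example to a sales, support, or operator queue.
exten => s,n,Hangup()

; Hypothetical handlers. A real menu would cap the number of retries.
exten => retry,1,Goto(s,1)
exten => failed,1,Hangup()
```

What ends up in CHOICE depends on how the voice and DTMF grammars tag their results, so the routing step is left as a comment rather than guessed at.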