Eric Bolo is the CTO of Batvoice Technologies, a speech analytics startup based in Paris, France. Eric talks about building a custom speech-to-text system for their flagship product, Call Watch.
He introduces us to speech analytics and audio mining, and describes some typical applications. We go into detail about speech-to-text (STT) technologies, and discuss the pros and cons of using a cloud STT service such as Google Speech versus building a custom STT system yourself.
Eric tells us about the latest open-source tools and frameworks for building STT systems, and how to get that precious voice data to train our models. We learn how to build and annotate a custom voice dataset ourselves, and hear his advice on starting a voice-first company.
This is a great first episode to kick off the series! Eric is super smart, with excellent technical skills and a real passion for voice technology. We already know each other quite well, so I couldn’t think of anyone I’d rather have as my first guest on the show. I know you’re gonna enjoy hearing what he had to say!
Links from the show:
- Batvoice / Callwatch: http://www.batvoice.com
- Google speech API: https://cloud.google.com/speech-to-text/
- Microsoft Translator Speech API: https://www.microsoft.com/en-us/translator/speech.aspx
- IBM Watson: https://www.ibm.com/watson/services/text-to-speech/
- Kaldi toolkit for speech recognition: http://kaldi-asr.org/doc/about.html
- EESEN speech recognition framework: https://github.com/srvk/eesen
- European Language Resources Association: http://www.elra.info/en/
- Mozilla Common Voice Project: https://voice.mozilla.org/en
- TED-LIUM English speech recognition training corpus from TED talks: http://www.openslr.org/7/
- VoxForge GPL Transcribed Speech corpus: http://www.voxforge.org/
- Amazon Mechanical Turk: https://www.mturk.com/
Find us here: