Vivatech 2018 Voice Startup Summary – Voice Tech Podcast ep.003

Episode description

This episode covers 8 of the most interesting voice startups that I found at the Vivatech technology conference in Paris, France.

Included in this episode are a voice transcription and synthesis mobile app for the hard of hearing, a voice-enabled smart alarm clock that can monitor your sleep quality, a robot behaviour system that delivers CMS content in person, and a comprehensive voice assistant platform that can handle multiple requests in a single query.

Episode transcript

Powered by Google Cloud Speech-to-Text

Hello, I’m Carl Robinson, and this is the Voice Tech Podcast, episode 3. There’s no guest today; we’re going to do something a little bit different, because I just spent two days at the Vivatech technology conference here in sunny Paris, France. It was held in the huge exhibition centre at Porte de Versailles. It was really well organised and I had a great time. There were over 8,000 startups present, apparently, and also every big-name tech company that you can think of, and even celebrities like Facebook CEO Mark Zuckerberg and the French president himself, Emmanuel Macron, attended. I spent two days out of a possible three at the event, talking to as many voice-first startups as I could, so I thought I’d bring you a quick episode covering my discussions with the most interesting startups that I found.

Before we start, a little update on the competition I ran last episode. The experiment itself was a success, in that the competition was a failure: I didn’t receive a single tweet in response to my call for free promotion. Perhaps the prize of an eBook wasn’t tantalising enough, but I suspect it’s just because my audience isn’t large enough to support a competition like this yet, so I’ll try again when listener numbers hit the next order of magnitude, which I hope will be any day now.

OK, so the first startup that I spoke to was RogerVoice. They are the first app worldwide that subtitles phone calls in real time, bridging the communication between hard-of-hearing people and hearing people. The problem they solve is that being deaf or hard of hearing makes making phone calls impossible or very stressful, which in turn reduces the freedom and independence of the person. Simple things like booking appointments become a real obstacle, and this can often mean the person who is hard of hearing has to rely on a hearing person to make the call for them, or actually go and make an in-person visit to book the appointment themselves, which is of course inconvenient and time-consuming.

The solution RogerVoice have come up with is pretty simple. It’s just a mobile app that uses voice over IP, a bit like Skype, that you can call regular telephones with, but it has a chat interface in the app that you type into. It then uses text-to-speech to speak whatever you type over the phone line to the person that you’re calling on the other end, and the other person hears a synthesised voice, of course. Then, when that person replies using their own voice, the software uses speech-to-text to transcribe what they said and display it to you in the app. In this way you can have a regular conversation without ever having to hear the person’s voice.

I spoke to Amir, who’s a quality assurance engineer on the team, and I was really impressed with the demo, considering that we were in a noisy environment at the Vivatech conference with hundreds of people all talking around us. The system worked flawlessly. Perhaps that’s thanks to the fact that they use Google’s text-to-speech and speech-to-text engines, but it had really low latency as well and got very, very accurate results. It even correctly detected the word “Vivatech” in the speech, so I was really impressed, and I could absolutely see that being used for real. It really worked. It’s a cloud-based solution, so it does need an internet connection, and Amir tells me they are considering developing their own speech-to-text and text-to-speech systems so they don’t have to rely on Google.

But then Amir actually surprised me by revealing that he’s hard of hearing himself, which in hindsight made sense, given that he’s a quality assurance engineer for the product. I honestly had no idea until he told me that he could only hear 7% of what I was saying, and that was with the use of a hearing aid, which he was wearing and which I had failed to notice. He said that he was actually relying on lip-reading for comprehension, which I think is the first time I’ve spoken to anyone who is hard of hearing and who was actually reading my lips.

Then we started talking about that, and he noted that hearable devices are becoming more popular, with the simple earphones that we use for listening to music transforming from single-function devices into multifunction computer interfaces and sensors. As they do, and people, normal people, wear these hearable devices, the stigma of wearing a hearing aid will decrease, as obviously their use will become commonplace. I could also see a service such as RogerVoice becoming integrated with these hearables, so that the caller’s words are not only shown as text in the app but also resynthesised into a form that’s more easily heard by the hard of hearing, as not everyone who is hard of hearing is completely deaf. They can hear, but perhaps with a bit of assistance they can understand.

The RogerVoice app is available now for iPhone and Android, and it’s free if you make calls between RogerVoice app users, but if you want to call a real telephone then it costs €6 a month, or €5.99 in fact, on a subscription basis. So check it out.

The next startup I found was Holi, or “Holy” if you like, spelt H-O-L-I. I met Damien Colas, the chief science officer at Holi. Damien has a PhD in neuroscience and is a former researcher at Stanford University and CNRS. Holi is a pretty well-established company. They make a range of connected devices and services with a focus on the health industry, as well as the hotel, entertainment and hospitality industries.

Their newest device, which is about to hit the market, is called Bonjour, a smart alarm clock with voice control. I had a play with it. It’s an attractive-looking device: it’s got an all-white surround with a black face and an analogue clock set into the front, which also doubles as a display for other alerts. It’s designed specifically for the bedroom, which means it’s going to fit into your bedtime, sleeping and getting-up routine. It does many of the things that you’d expect from a smart speaker, such as getting the weather and traffic updates, and you can play music from Deezer or Spotify. You can also control smart home devices such as smart bulbs, smart thermostats and security devices. It can also act based upon conditionals, so for instance it can wake you up at 8:00 if it’s sunny, or let you sleep in if it’s not.

But its main benefit is that it’s connected to the Vocal platform. Vocal is a platform that the Holi team built that combines a voice assistant with an IoT hub. The voice assistant platform is ready to use and is built to streamline the communication between patients and caregivers, such as doctors and service providers. It does this by providing a voice assistant to monitor and support a patient at home. Vocal is also a hub that connects to every connected health device in the patient’s home, for instance blood pressure monitors, oximeters, glucometers and the like, and collects data, which it then makes available to the patient using voice. The objective is to reduce human error in the administration of medication, enhancing the level of adherence, which means how often the patient takes the medication, and to ensure they take it in the right quantities and in the right way.

An example of the combination of IoT and voice assistance would be that the Bonjour alarm clock can detect when a connected device has a problem, such as a sleep apnea mask that some users wear at night to monitor their sleep apnea. It could then inform the user by voice that there’s a problem with that device, and provide instructions on how to fix the problem.

Holi also provides a range of other services. One of them is sleep analysis, so they can provide a service that diagnoses sleep apnea and provides sleep quality analysis, such as the level of snoring, and also pre-screening and monitoring of other sleep disorders. They use signal processing and AI to help physicians and patients.

They also produce a variety of other connected devices that help users fall asleep faster, track their sleep quality and wake up more easily. I didn’t ask what voice assistant technology they use in Bonjour for the voice interface, but I’d be surprised if they built it all themselves, and I would love to have them on the show to find out more about the tech that’s gone into building the smart alarm clock. I do know it’s based on Android 6 and there’s no API available at present, so all the smart home integrations that are possible with the device are available out of the box already. The Bonjour alarm clock costs $249 and will start shipping in June, and I see that it’s available for order right now at holi.io.

The next startup I found on my travels was askhub. It’s an analytics and problem identification service for voice interface systems. I spoke with [name unclear], the co-founder and CEO, who told me that askhub helps you train your bot by understanding what your bot can’t do at the moment. So it’s for people who are actually building voice interface systems and conversational interfaces, to help them diagnose problems.

The askhub team use unsupervised clustering algorithms to analyse thousands of queries that are not understood by your bot at the moment, and then give you actionable insights. For instance, they can identify user intentions to be improved, they can detect new services and content which interest users, and they can help you save time and money by analysing thousands of logs. They offer real-time tracking of unparsed requests, with a historical and real-time graph showing the unparsed requests experienced and the corresponding semantic categories. So users get a web dashboard displaying a real-time analysis of the NLU bottlenecks in their live system, which allows them to quickly react to massive failures, but also to identify daily trending topics.

Having identified the troublesome queries, a data set of similar requests can be generated and then used to train an improved speech recognition model using supervised algorithms. To aid in this process, askhub offer a test tool for analysing the impact of proposed changes to the NLP model. The test tool runs a simulation over historical queries by replaying the clustered failures on the updated bot, and visualising and measuring the improvement. This allows the user to train the new model faster and more efficiently, and to avoid pushing changes to a live system that would break the system or reduce performance in any way.

askhub is the kind of product that you don’t think about until you actually start building a product or conversational interface, and then you realise it’s completely invaluable to the process. You don’t know how you did without it.
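The clustering of not-understood queries described above can be illustrated with a minimal sketch. This is not askhub’s method, which the episode does not detail: the greedy grouping, the Jaccard word-overlap measure, the `threshold` value and every function name here are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Set


def tokens(query: str) -> Set[str]:
    """Lowercase a query and split it into a set of word tokens."""
    return set(query.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_queries(queries: List[str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy single-pass clustering (an illustrative stand-in, not askhub's algorithm).

    Each query joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for query in queries:
        for cluster in clusters:
            if jaccard(tokens(query), tokens(cluster[0])) >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    # Largest clusters first: these are the most common failure themes.
    return sorted(clusters, key=len, reverse=True)


# Hypothetical log of requests the bot failed to parse.
unparsed = [
    "book a taxi to the airport",
    "what is the weather tomorrow",
    "book a taxi to the station",
    "order a taxi to the airport",
    "what is the weather today",
]

clusters = cluster_queries(unparsed)
for cluster in clusters:
    print(len(cluster), cluster)
```

Each resulting group is a candidate missing intent (here, taxi booking and weather), and the grouped queries are the kind of data set that could then be labelled and replayed against an updated bot.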

It makes me think of successful analytics products such as Mixpanel and [unclear], which are so useful for iterating during product development, as well as for monitoring the status of functionality in live products. There’s no pricing information that I can see on the website, but you can request a demo by dropping an email to contact@askhub.io.

I also met [name unclear], the senior sales manager from the Acapela Group. Now, Acapela are very well-known text-to-speech specialists, established way back in 2003. They really are experts at this. Their speech solutions help to vocalise a company’s content, basically. They have more than 100 voices available in 34 languages and accents, and you can even have Acapela create a custom voice for you. Their text-to-speech solutions are used in a wide range of products, from toys and robots, cars and trains, screen readers and smartphones, to IoT and many more.

They’re also actively involved in research and development, and I found out about their latest project, Chanter, which stands for “chant numérique temps réel” in French, which means digital real-time singing. It started in early 2014 and is a three-year programme on French singing text-to-speech. It’s a collaborative project involving a number of different partners, including IRCAM, which is the Institut de Recherche et Coordination Acoustique/Musique, and which is actually the laboratory where I’m doing my internship at the moment.

The goal of the project is to create a high-quality system for synthesising songs that can be used by the general public. So the system will sing the words of a song, and the synthesiser works in two modes. One is a song-from-text mode, where the user enters a text to be sung along with a score, which includes the timings and the pitches, and the machine transforms it into sound. The other is the virtual singer mode, where the user controls the song synthesis in real time through gesture-based interfaces. So it’s just like playing an instrument, but it’s actually synthesising a song. So Acapela really are on the cutting edge of things.

The next company I came across purely by chance, although you couldn’t really miss them, because they were part of the huge robot park that they had there at Vivatech. They are called humanizing.com. I spoke with the account manager, Sebastian. Their company is based in Germany and Austria, and they’ve created an interface that lets you program the Pepper robot, which was created by Aldebaran Robotics, which is now owned by SoftBank Robotics. If you haven’t seen it already, the Pepper robot is an all-white, female, friendly-looking robot about the size of a child, with a tablet computer attached to the body.

And it provides various services for businesses, in that it attracts people or provides entertainment. It can recommend products, it can greet people at a company headquarters or in a shop, for instance, or showcase some kind of product. It is often used in retail environments and tourism, corporate offices use it for events and at their headquarters, and sometimes it’s used in public places. It’s also good for social robotics, so it’s used in health and elder care settings as well, such as retirement homes and things.

As I was walking past the stand, obviously the robot caught my eye, so I approached Pepper and she looked up at me, and her posture and eye contact were really convincing. She looked up and made a gesture with her body and her shoulders, and I had a fleeting moment where I could tell we were both looking at each other. It was just for a moment at first, but I could definitely see the benefit a physical robot has over a virtual avatar, or just something that’s a disembodied voice interface.

Anyway, Sebastian introduced me to some of Humanizing’s products. They’re the makers of a content management system that allows you to create, manage and deploy your content to one or multiple robots. Through the CMS you can create product descriptions and then get Pepper the robot to say exactly what you want and present the things that you’re selling. There’s also a proactive mode, where you can get Pepper to actually approach the customers, not just stand there and wait to be approached, and then offer the latest products or deals, or just give them a compliment to brighten their day.

Pepper can be used as a welcoming agent as well. With the welcoming host app you can actually manage guest check-ins, for instance in a hospitality setting, and then Pepper will notify you with a phone call when a guest is waiting. So she can actually man the front desk automatically, which is pretty impressive. There’s an entertainment package as well, which includes interactive games, dances and animations that will engage the audience. You can imagine that being used in a variety of settings. There’s also a chatbot integration. Pepper’s speech recognition abilities are pretty impressive, but if you want your customers to have a deeper interactive experience, then the chatbot integration is the solution to that.

The last feature that they offer, which is really more of a demonstration feature than something that has a practical use, although perhaps you can think of some, is to actually be Pepper, and it was actually happening while I walked past the stand. It allows you to control Pepper through a telepresence system. So you put on a VR headset and some gloves, and then you see through the eyes of Pepper, and when you move your hands, that moves Pepper the robot’s hands in the same way.

Her fingers and hands can move pretty well, but as I learnt, she can’t really grip things. The guys there were trying to get her to grab a bottle and she just couldn’t do it. Her shoulders are pretty rigid as well, so she can’t move her hands very far laterally, so she can’t hug, for instance. When I was there, there was another guest, another show attendee, who was wearing the VR headset and controlling Pepper, so I put my hand out to see whether she could grab my hand. She could kind of reach for it, though because of her limited mobility she couldn’t really grab it and give me a convincing handshake. So it’s a pretty fun demo.

Then I spoke to Sebastian about the speech integration, the speech recognition capabilities and specifically the speech synthesis, and the voices Pepper can speak in. He told me that SoftBank don’t allow Pepper’s voice to be changed, for the moment at least. I assume this is because they want to maintain a consistent personality across all the Pepper robots, which all look identical, at least the ones at the show did, so that they can protect their brand image, and so you know that whenever you meet a Pepper robot, it’s the Pepper robot and it’s not been modified by the user in any way.

I can understand why they want to do this, but I think in the long term they’re going to need to provide a wider variety of voices and personalities for Pepper to use, because the whole point of a social robot is that it interacts with people. Either they provide the voices and personalities, or they allow integration with third-party TTS systems, as otherwise the possible uses for Pepper, I feel, are restricted somewhat. For instance, if you wanted to put Pepper in front of your nightclub as a robot security guard, you probably wouldn’t want her to have the same voice as the one you’d put in a children’s nursery.

Another stated use on their site, and I think this is for a different robot actually, although you could probably do it through Pepper as well, is remote-work telepresence robots, which allow you to be in two places at once. I thought that was quite interesting. The robot that they’ve got doing this is Skype on wheels, or more specifically it’s a tablet computer on a stick on wheels. You can Skype into a real-world event and control where you move around. You can basically wheel your face around and speak to the guests at the event while you’re in the comfort of your own office.

So that’s all well and good. It’s a bit weird, and I’m not sure it’s really going to take off, but I can see that the logical progression of this tech is a future where robots are used with something like Google Duplex, so that we can actually task physical robots to go out into the world and conduct interactions with people or other robots, maybe to complete real tasks that can’t be handled over the phone. We saw how Google’s assistant can call and make an appointment, but it’d be great if you could just tell it to go pop down the shops and, you know, pick up something. Anyway, you can find out more about the company at humanizing.com, that’s humanizing with a Z.

Cochlear.ai is a startup that I was looking for on the first day I was there, and I couldn’t find them, so I was really pleased to discover that they were there on the second day. They are developing a system that understands the semantics of audio. I spoke with Yoonchang Han, the co-founder and CEO of Cochlear.ai, and he told me that the team, which is based in Seoul in South Korea, is currently six people strong, and they have either all got a PhD or are about to graduate with one. So it’s a pretty impressive team, and as a result, of course, they are a highly research-focused company.

The team have used their extensive knowledge to create a product that extracts non-verbal information from music, speech and acoustic events. At the moment, because they are a relatively young company, they can detect six distinct acoustic events, for instance a baby crying or an alarm going off, but their product is still in development. It’s available as a cloud-based API or an on-premise installation. I have actually come across a similar, long-established company called Audio Analytic, so it seems like acoustic event detection is a space that’s really hotting up. If you want to find out more about Cochlear.ai you can go to, unsurprisingly, cochlear.ai.

And I saved the best for last. SoundHound is best known for its music recognition service.

In my mind at least, it was just a direct competitor to Shazam, which was later acquired by Apple, in that you could press a button when music was playing and it would tell you the artist and the name of the track, and I thought that’s all they did. However, I met with Charlotte from SoundHound, who showed me that they’re focused on much more than just that simple use case. They actually have a virtual assistant, a pretty comprehensive one as well, called Houndify. Houndify has actually been in the works for a few years now, and the results really testify to the effort that has gone into it.

Charlotte demoed the voice understanding capabilities on her mobile device, and it was really impressive. She was keen to state that the Houndify system doesn’t transcribe the speech into words using speech-to-text and then try to understand the semantics of the text. Rather, it translates directly from sound to meaning: speech-to-meaning.

And the system that they’ve developed has the power to handle very complex requests containing multiple queries, all in a fraction of the time a typical speech-to-text system takes. I was sceptical of these big claims, especially given we were at a noisy conference, but she did a demo where she made a single voice request asking for restaurants nearby that serve Asian food but not Chinese food, and the app very quickly returned the correct results. Then she asked to sort by price, and it understood the context, that the list had already been displayed. She didn’t have to ask the whole query again; she just said “sort by price”. Then she asked another follow-up question, “reserve me a table at the first one”, and it understood the context again and opened the booking form for the first restaurant in the list. She said she could even go further and ask to book an Uber to the restaurant, and that it would understand.
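The context carry-over in that demo (an include-but-exclude filter, then “sort by price”, then “the first one”) can be sketched as a small piece of conversation state. This is a toy illustration only: it does not use the Houndify API or do any speech understanding, and the `Restaurant` and `Conversation` classes, the cuisine set and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Restaurant:
    name: str
    cuisine: str
    price: int  # hypothetical price level: 1 (cheap) to 4 (expensive)


@dataclass
class Conversation:
    """Keeps the previous turn's results so follow-ups can refer back to them."""
    catalogue: List[Restaurant]
    results: List[Restaurant] = field(default_factory=list)

    def search(self, include: Set[str], exclude: Set[str]) -> List[Restaurant]:
        # "Asian food but not Chinese food": take a set of cuisines, subtract one.
        wanted = include - exclude
        self.results = [r for r in self.catalogue if r.cuisine in wanted]
        return self.results

    def sort_by_price(self) -> List[Restaurant]:
        # Follow-up turn: works on the stored list, so the query is not repeated.
        self.results = sorted(self.results, key=lambda r: r.price)
        return self.results

    def reserve(self, position: int) -> Optional[str]:
        # "The first one" resolves to position 0 of the list currently shown.
        if 0 <= position < len(self.results):
            return f"Booking form opened for {self.results[position].name}"
        return None


ASIAN = {"chinese", "japanese", "thai", "vietnamese"}  # illustrative only

chat = Conversation(catalogue=[
    Restaurant("Golden Dragon", "chinese", 2),
    Restaurant("Sakura", "japanese", 3),
    Restaurant("Bangkok Street", "thai", 1),
    Restaurant("Chez Paul", "french", 4),
])

found = chat.search(include=ASIAN, exclude={"chinese"})  # turn 1
cheapest_first = chat.sort_by_price()                    # turn 2: "sort by price"
booking = chat.reserve(0)                                # turn 3: "the first one"
print(booking)
```

The design point is simply that each follow-up operates on what the previous turn left behind, which is why the user in the demo never had to restate the original restaurant query.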

I was really, really impressed by the speed and the accuracy of the system, and the difference the contextual awareness made. Anyone who wasn’t convinced that voice-first interfaces were possible or practical would have changed their mind after this demo. I really can’t wait to see more devices integrate the Houndify system, as it will really push voice-first forward. For more use cases and developer information you can go to houndify.com.

OK, so that should have given you a little taste of what was on show at Vivatech. As always, you can find the show notes, with links to the resources mentioned in this episode, at voicetechpodcast.com, and you can also follow on Twitter at [handle unclear]. One last ask: I need a five-star review on iTunes. I currently have zero reviews. Is there one of my new listeners who’s willing to take two minutes to rectify the situation? I’ll give a shout-out on the show to anyone who leaves a review.

OK, that’ll do for begging for reviews. I hope you’ve enjoyed episode 3 and got something out of it. I’ll be back soon with another instalment. As always, I’ve been your host, Carl Robinson. Thanks for listening to the Voice Tech Podcast.
