Voice AI for eCommerce – John Fitzpatrick, Voysis – Voice Tech Podcast ep.002

John Fitzpatrick Voysis

Episode description

John Fitzpatrick is the VP of Product & Engineering at Voysis, a leading voice technology company that builds custom Voice AI solutions for businesses. They are currently focused on the ecommerce vertical, helping to voice-enable mobile apps and websites to augment the shopping experience.

In our conversation we cover a range of topics including the major components of the Voysis system, the technologies and tools John’s team used to build it, and the challenges they faced. We also discuss how Voysis protects a user’s privacy and the implications of the GDPR regulation that comes into force soon.

It was a pleasure to talk with John and learn from such an experienced engineer. Whether you’re currently building voice AI systems or are just interested in learning about how they work, this episode will have plenty for you.

Links from the show

Episode transcript


Powered by Google Cloud Speech-to-Text

Welcome to the Voice Tech Podcast, episode number two. First of all, thank you to everyone who listened to episode number one. I really appreciate all the tweets and emails, and the feedback you've provided has been especially helpful to me. I've tried to incorporate your comments into today's show, and I hope you'll continue to send me your feedback to help shape the podcast. Before we get started, I'd like to try something new and host a little competition. I've got three Kindle copies of a great book to give away, and to enter, all you have to do is send a tweet that will help me grow my audience. The book is entitled Bot Business 101: How to Start, Run and Grow Your Bot Business. It was published in 2017 and written by Ekim Kaya, the founder of Botego, a tech company in New York that develops intelligent virtual agents. Ekim is the most viewed writer in the chatbots and conversational agents categories on Quora and Medium, and the book has great reviews on Amazon, so I'm sure it'll be really useful to many of you who are thinking of starting, or are currently running, a business in the voice space. To win a copy, all you have to do is send a tweet with a link to voicetechpodcast.com and include the hashtag #voicetechpodcast, all one word. You can do it as many times as you like, of course. After a week I'll select three winners at random and contact you via Twitter. OK, so with that, let's get on with the show.

My guest today is John Fitzpatrick, VP of Product and Engineering at Voysis. From their offices in Dublin, Edinburgh and Boston in the United States, Voysis helps companies create custom voice AI solutions that understand the vocabulary of their products and the services they offer. In our conversation we cover the major components of the Voysis system, the technologies and tools the team used to build it, and some of the challenges they faced. We also discuss how Voysis protects the user's privacy, and the implications of the forthcoming GDPR regulation. So with that, I bring you John Fitzpatrick.

So I'm here with John Fitzpatrick, the VP of Product and Engineering at Voysis. John has a PhD in computer networking and telecommunications from University College Dublin, and in his previous roles he's been a technical director and a startup founder, responsible for technology strategy and managing teams of developers. Hello John, welcome to the podcast. Perhaps you could give us a bit of background on the path that led you to where you are now?

Sure. I did my undergrad in electronic and telecommunications engineering at Dublin City University, and as always with these courses I gravitated towards the telecommunications side of things, in particular radio, digital signal processing and voice. After finishing my undergrad I received a scholarship for a PhD in computer science at University College Dublin, where I focused on voice over IP. I did a lot of work around wireless networks and the challenges they entail when transmitting voice over them, things like latency, packet loss and codec adaptation. My focus was always on new technology problems, so voice was quite a natural fit for me. After graduating from my PhD I spent some time as a visiting researcher at the University of Oklahoma, working on transport layer protocols for satellite communications, and I worked on a number of large European research projects. After that I was awarded a Marie Curie research fellowship for postdoctoral work, and as part of that I worked at NEC Labs in Germany, primarily on wireless and Wi-Fi mesh network technologies, in particular around quality of service in those networks. When I finished that I moved back to Ireland and started working on startup ideas based on my PhD work. That startup was building solutions to let mobile devices seamlessly and automatically move between Wi-Fi and cellular networks. We were quite early stage, we had some seed funding, and we ended up getting acquired quite early, pre-revenue, by a large telecommunications software company, Openet. That was my first startup, I was one of the first five people there at the time, and as part of the acquisition I joined Openet. There I became responsible for early-stage new product development, running a number of R&D teams working on the research aspects of new technologies,

and then really building those out into full products. I headed up engineering there for a few years, then left Openet and joined another startup, a company called Logentries, where I went in to run the engineering teams. Logentries was then acquired by a Boston-based security company called Rapid7, and I worked there for a few years before leaving to come to Voysis. So you took the job at Logentries even though it wasn't directly linked to the telecommunications and voice-over-IP background you had previously? Yes. They were a well-known startup in Dublin at the time, and they were solving really interesting, hard problems, so it was a great opportunity for me to move into a new area. Quite a lot of the skills transferred: what I'd learned about building teams within Openet I could take and apply within Logentries.

Could you tell us a bit about what Voysis does, the value proposition, who your target customers are, and the problem it solves for them? Of course. Voysis was founded by our CEO, Peter Cahill. Peter had been an academic at UCD, working on voice technology for around 15 years. He then left UCD and started Voysis, he was later joined by the rest of the executive team, and that's when it really started to scale. I joined the team about a year and a half ago; at the time we were six people, and we're now 34, so we're growing very quickly.

We're headquartered in Dublin, where we have our product, engineering and modelling teams, and we also have part of the modelling team in Edinburgh, headed by Google's former head of text-to-speech. All of our go-to-market activities are run out of Boston.

The Voysis product provides three core pieces of functionality. The first is search: it enables customers to do any type of search by voice, whether to find a specific product or simply to describe the types of products they're looking for. Then we have what we call refinement, which enables a customer to follow on from that initial search to further refine it, or to contradict previous criteria they've given, and really hone in on that perfect product. So it's more like a conversation with the app or the website, where you can say "no, not that one, the other one"? Absolutely. And as part of that, what we provide is very much a multimodal experience, because customers don't want to shop purely by talking to a device with no screen. That's fine for things like grocery shopping, where you're ordering milk and bread and the same things week in, week out, but most purchases are more considered: you want to see the products you're searching for, you want to see the results coming back, and it's very much a visual browsing experience. What we find is that, as with most interfaces, customers will use whatever is most convenient for them. Because this is integrated into the retailer's existing applications and websites, the end user will use touch when it's convenient and voice when it's convenient; a big part of our product is that multimodal experience. So it augments what's already available rather than replacing it? Exactly. And we can take into account where the customer is in the journey. A very simple example: the customer opens the application and, using touch, navigates to a particular product category, maybe even to a particular product, but then they tap the microphone icon and say, "actually, only show me it in red". Because we know where they are in the journey, and we have the context of where they are in the application, we take that into account when we process their query, and we send back to the client application a data structure which essentially updates the application state based on what the user asked for. It's a very powerful, intelligent system working on behalf of the customer, to get them the right results, help them find the perfect product and get it into the cart.

That's a great example. John, could you describe how the product is built, the major components and how they fit together? Sure. There are a number of different components in our platform, both during the training phase and at runtime, and it's probably best covered by describing the journey of a user query as it goes through our system. The first thing that happens, obviously, is that the customer will have integrated our APIs or our SDKs into their application. The user taps the microphone icon and we start streaming audio up into our system. The audio first goes through a DSP process, then through a voice activity detection model, which decides whether the user has started speaking, stopped speaking, and so on; that's what gives the very natural experience where, as soon as the customer stops speaking, the results they see get updated. Once it's through voice activity detection, it goes through acoustic modelling and language modelling. The acoustic models essentially translate the user's audio into phonemes, and from there it goes into the language modelling phase. It's really the language model that is customer-specific; that's the part trained off the customer's product catalogue. Typically what a language model does is speech-to-text, and we sometimes do that, but we can also do what we call speech-to-meaning: embedded within the language model we may have other information, so we may know that this is a product category, or that this particular keyword is associated with particular products, and so on. From the language modelling it goes out into intent classifiers, so based on what the user has said we understand whether they're doing a new search, trying to add something to the cart, or refining an existing search. Then it goes through things like taggers and a good few other models that are used to improve accuracy, and finally into what we call our downstream microservices. That's where all of the business logic exists to translate the user's query into a data structure that is sent back to the client application and is directly compatible with their existing search services. That's the interface between you and the customer? Kind of, yes. The microservices are what handle constructing that data structure, specific to what the customer has made available through their API. That's a lot of moving parts; it sounds like it could take a long time to process. Is performance an issue, something you have to deal with? No, because what we do is what we call on-the-fly decoding: as the user speaks, the audio is streaming through our system and being processed in real time, so there's no offline processing.
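To make the stages listed above easier to picture, here is a highly simplified sketch of such a query pipeline, with every stage reduced to a stub. The function names and return values are assumptions for illustration only, not Voysis's implementation.

```python
# Simplified sketch of the stages described above: DSP -> voice activity detection ->
# acoustic model -> customer-specific language model -> intent classification ->
# downstream business logic that builds the client-facing data structure.

def dsp(frames: list[bytes]) -> list[bytes]:
    return frames  # filtering / normalisation would happen here

def voice_activity(frames: list[bytes]) -> bool:
    return len(frames) > 0  # has the user started or stopped speaking?

def acoustic_model(frames: list[bytes]) -> list[str]:
    return ["sh", "ow", "m", "iy", "r", "eh", "d"]  # audio -> phoneme-like units

def language_model(phonemes: list[str]) -> dict:
    # "Speech to meaning": decode words and attach catalogue-aware annotations.
    return {"text": "show me red", "entities": {"colour": "red"}}

def intent_classifier(decoded: dict) -> str:
    return "refine_search"  # new search vs. add to cart vs. refinement, etc.

def business_logic(intent: str, decoded: dict) -> dict:
    return {"intent": intent, "add_filters": decoded["entities"]}

def handle_query(frames: list[bytes]) -> dict:
    frames = dsp(frames)
    if not voice_activity(frames):
        return {}
    decoded = language_model(acoustic_model(frames))
    return business_logic(intent_classifier(decoded), decoded)

print(handle_query([b"\x00\x01"]))
```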

As the audio is streamed into our system, after we get the first few frames we immediately start processing the query. We're working before you've finished, so we don't have to wait until you've completely stopped speaking to handle the whole query; we can start that processing early. A bit like Google predicting what you're typing in the search bar before you finish the sentence? Yes, except in this case we're not continually sending those updates back: we do wait until the user has completely stopped speaking before we send back the data structure, but within our system the on-the-fly processing is happening. And is that processing happening on the device, or is the audio sent back and processed in the cloud? The audio is streamed from the customer's device up to our system and processed on the fly, in real time, in the cloud. So there's little or no audio processing on the customer's device? Very limited. Sometimes there may be a little bit of transcoding done, but usually not. My internet connection on mobile cuts out quite a lot as I'm moving around; is that a problem, and how do you handle it? You're only at about 16 kilobits per second, so the bandwidth required is actually pretty low, and the APIs and SDKs we provide handle a lot of buffering and the like on the client side, so if there are small network blips or disconnects it can still be buffered. You'd still have a problem if your internet connection is completely down, but we're processing in real time, and a slight latency of around 50 milliseconds doesn't impact the quality of the audio data processed or the computation.
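A minimal sketch of the client-side behaviour described here: streaming small audio chunks as they are captured and buffering them across brief network blips. The chunk handling and the send function are assumptions, not the actual Voysis SDK.

```python
# Illustrative client-side streaming with a simple buffer that rides out short
# network blips. The network call is a stub; real SDKs handle this internally.
import collections
import time

def send_chunk(chunk: bytes) -> bool:
    """Stub for the network send; return False to simulate a transient blip."""
    return True

def stream_audio(chunks) -> None:
    buffer = collections.deque()
    for chunk in chunks:
        buffer.append(chunk)            # keep capturing even if the network stalls
        while buffer and send_chunk(buffer[0]):
            buffer.popleft()            # delivered, drop it from the buffer
        if buffer:
            time.sleep(0.05)            # brief blip: retry on the next iteration

# At roughly 16 kbit/s, one second of audio is only about 2 KB, so buffering is cheap.
stream_audio([b"frame-1", b"frame-2", b"frame-3"])
```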

I’d like to talk about about the your text Jack and the tools that you used to to build although the components that you’ve just described and could you describe some of the main the main ones that you guys using your team and whether they’re open source or whether you’ve development in house so I suppose two main parts of the sleep there is one time and then there is training during runtime variety of different technologies we obviously he’s coming for us to do that but motor tax that is written in C + + Java and Python runtime components that’s all operated in Cloud environments with things like docker and kubernetes and so on so that we can deploy on an EE cloud infrastructures that means we can seamlessly move between AWS Google Cloud platform is there an alternative customers wanted

tinder on premises that can be provided for as well apart from the client use case of them wanted to deploy on premises why would you want to do it to easily move between servers as I just a disaster recovery type situation Ora and supplier selection problem or it’s from a supplier selection actually so we done a lot of work and no play on AWS and more we see now were using Google Cloud as well and it really comes down to you in the camera space in particular a lot of customers and just only use Amazon and we need we need the ability to be able to seamlessly move across the sky departments because the way we constructed a runtime architecture. That’s pretty easy for us to the customers are connected to your system over an API and white why does it matter to them whether you you’re running on Amazon or not it is just because they don’t want Amazon to have their data

basically if you yesterday on Amazon to have access to the data and you know we do have we can Store audio and we can store all the interactions and so on so they just don’t want that within an address of destruction some cases since yesterday nice big face and is it raining so we we’ve done quite a bit of work in the space and we actually built her own decorating engine and that was in the earlier days of Voices because a lot of the open source training just wasn’t capable of what we needed to speak to it to be able to do and you decided that it optimises heavy for the hardware that we use a proprietary tools of variety of different problems that we found it just current source Open Source Project ill and can’t do a course we don’t reinvent the wheel would you use a variety of open source routing even even during training or increasingly starting to use tensorflow

Some of the other toolkits we use include Kaldi, which is well known for speech recognition pipelines, OpenFst, and increasingly spaCy as well, which is a pretty powerful natural language processing toolkit. We also use a lot of automation tooling, because we provide a voice AI per customer: every customer gets their own dedicated endpoint and their own dedicated Voysis AI, and automation is key for us to be able to scale and maintain all of these different endpoints for these different customers, and to be able to generate and update the models. We need a lot of automation to let us regenerate them quickly. Can you give me some examples of the tools you use for that? Yes, another one we use is Luigi, an open-source pipeline project originally from Spotify, and we use it primarily for automating the building of all the different models and being able to reproduce identical models at any time.
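As a rough picture of this kind of pipeline automation, the sketch below chains two Luigi tasks so a customer-specific model build can be reproduced on demand. The task names, file paths and training steps are hypothetical placeholders, not Voysis's actual pipeline.

```python
import luigi

class PrepareCatalogue(luigi.Task):
    """Normalise a customer's product catalogue into training text (placeholder)."""
    customer = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"data/{self.customer}/catalogue.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("placeholder: cleaned catalogue text\n")

class BuildLanguageModel(luigi.Task):
    """Build the customer-specific language model from the prepared catalogue."""
    customer = luigi.Parameter()

    def requires(self):
        return PrepareCatalogue(customer=self.customer)

    def output(self):
        return luigi.LocalTarget(f"models/{self.customer}/lm.txt")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(f"LM built from {len(src.read())} characters of catalogue text\n")

if __name__ == "__main__":
    # e.g. python pipeline.py BuildLanguageModel --customer acme --local-scheduler
    luigi.run()
```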

I wanted to ask you about the major challenges you faced when building the system; it sounds like there would be many. Could you give us some examples of what worked to overcome them, what didn't, and what you learned from that? From my perspective, I think one of the main ones was simply the complexity of these systems. They require a large set of different skills from across our team, and a lot of the pieces have massive interdependencies. It's very easy to make something work, but the jump from getting something functional to getting it working really, really well, and delivering a really good user experience, is huge. So what we learned was not to treat the different components and work items as siloed, separate projects. The work has to be cross-functional, because there are so many interdependencies all along our pipelines, and in the end it came down to the way we structure our projects and research teams; that's what I think had the biggest impact. What kind of restructuring did you do? For all of our projects we effectively created cross-functional teams. Even if a piece of work is very modelling- or AI-focused, we make sure we have engineers involved, because ultimately it's going to go from being a research piece of work today to something we start rolling out in the product in three months' time, so we need to make sure it's engineered correctly. Likewise, if we're doing engineering work on some of our training tools or our core platform, we need to make sure it aligns really well with what the modelling teams are going to deliver. There are other aspects too: we have linguists on our teams, and we have a psychologist looking at how customers interact with and talk to these systems, and having all of those people involved in the different parts of these projects was key. Interesting, because communication is something that huge organisations struggle with, trying to connect the various departments, but it's important even in a company 30 or 40 strong like Voysis? Absolutely, communication is super important. Depending on the piece of work, it could be the core engineering and modelling teams, requirements coming in from project management, or sales pushing for something for a customer; the really important part is that everyone is in the same room, or continuously communicating about the projects they're working on.

One of the things that just came to mind was documentation. I don't know how much you decide to invest in documentation when things are moving so fast, and how far that goes towards solving the problem of communication between departments and onboarding new employees? Obviously, as you said, it can be quite difficult when things are moving really fast, particularly on the research side, because it's changing so frequently. But we do maintain very good documentation on the processes once we've defined exactly how something works, so we keep documentation on all the different tooling we have and on all of our development processes.

The work to develop that documentation is spec'd and estimated alongside the actual work to code the functionality, so those processes don't get left as an afterthought to be tacked on at the end. When we're working on different stories, one of the acceptance criteria for most stories is that the documentation gets updated, and for new projects it's to create documentation, even if it's just a minimal set, so that if somebody comes to a repository they at least know where to look.

How can developers actually use Voysis in their products right now? Is it currently open for developers to go on, read the docs and start playing with the code, or is it more of a custom B2B sales process to integrate Voysis for a customer? Are there tools and guides that developers can use today to get to grips with things? It is closed for the moment, so we don't have any generic web APIs that users can hit today. The best way is to go onto our website, voysis.com, and request a demo with our sales team, because the types of solutions we build do require a little bit of customisation. What we do is go deep with those customers: we really want to understand their product catalogue and their use cases. For that type of engagement we typically go through an ideation phase, where we try to understand the requirements, and we run user studies as well for the particular types of products they're selling, because we really want to understand the types of interactions their customers will have with those products. Then we do a deep dive on the data, to make sure it's of sufficient quality and has all the information we need to build the system. Most of that work happens on our side; there's very little required on the customer's side other than providing a product catalogue. We do the rest of the work, we spin up an endpoint for them, and then they can access it. And there is documentation: at developers.voysis.com we have documentation on what our APIs look like and how to use them. Will that continue to be the case as the company moves forward, or will there always be some custom work on your end, because the product is so personalised to each client?

Or are there plans to automate more and more of it, so that onboarding becomes smoother and more automatic as you gain more customers? We spend a huge amount of energy automating a lot of our pipelines so that we can have very fast turnaround times for these initial integrations, and what we're working on is exposing that functionality to developers as well. We have a lot of tooling that we use internally; we just haven't exposed it externally yet, because it still requires knowledge of things like natural language processing and some linguistic skills. But we are working on exposing it, so that developers can do things like add new functionality, configure it, and activate and deactivate certain types of features. Would you have capabilities for customers to regenerate models themselves? Yes. We provide a way for our customers to upload their product catalogue, and that can run as a nightly job or happen in real time, so every time they update their product catalogue it's updated on our side as well, making sure we always have the latest and greatest information. At any point in time we can kick off jobs to regenerate all of their models, so the system understands when a customer adds a new product category, for example. That's essential for you as well; you don't want to have to do the work every time a customer updates their products, you want to give them the tools and the power to rerun it themselves, so everything reflects their latest products. Exactly, and all of that is automated today. Fantastic.
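A hedged sketch of what such a catalogue upload and model-regeneration trigger could look like from the customer's side. The endpoint, field names and authentication scheme are invented for illustration and are not the real Voysis API.

```python
# Hypothetical sketch of a catalogue upload that kicks off a model rebuild.
import json
import requests

API_BASE = "https://api.example-voice-provider.com/v1"  # placeholder host
API_KEY = "YOUR_API_KEY"

catalogue = [
    {"sku": "SKU-1042", "title": "Trail running shoe", "category": "footwear",
     "attributes": {"colour": "red", "size_range": "6-12", "waterproof": True}},
    {"sku": "SKU-2260", "title": "Merino base layer", "category": "clothing",
     "attributes": {"colour": "navy", "material": "merino wool"}},
]

resp = requests.post(
    f"{API_BASE}/catalogue",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"customer_id": "acme-outdoors", "products": catalogue},
    timeout=30,
)
resp.raise_for_status()

# A provider could return a job handle for the model regeneration triggered by the upload.
print(json.dumps(resp.json(), indent=2))
```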

So what are the main considerations for a developer, then? It sounds like you don't need to be an expert in natural language processing and understanding to work with an API such as the Voysis one, but there must be some best practices and advice you can give for preparing the data and integrating these things into websites and mobile apps. The main thing, I think, is the quality of the data. As with all of these systems, the better the quality of the data, the better the system performs. The way our system works, we don't want to bypass the customer's search pipeline, so they do need reasonably capable search on their side, but even if it's just simple keyword search we can work with that, and we can actually make their existing keyword search more powerful, because customers can then use natural language. What's really important is having good attributes in the product catalogue, so that you understand what categories the particular products belong to, and the various attributes and measurement attributes associated with the different products; that's what really makes these systems powerful. And does a typical customer already have that, or do they usually require some guidance and help from you to clean up their data and put it in the correct format? There's always a bit of work, but in most cases the data is actually of pretty good quality.

it’s always a question in my mind because I’m starting at any kind of startup is a daunting experience and the idea that you’d be building something that would directly compete with what the likes of Google and Amazon doing those me was even more dread so yeah especially when they are these guys are at the cutting edge of natural language processing and they’re releasing one brakes after another how does that make you feel as a Development Team working that Phil does it inspire does it help you or do they end up taking all the best people in is it open tomorrow rising experience so I’m just be interested to hear your take on things it’s a question I think ultimately it inspires us and it helps us I mean companies like Google Amazon apple talking to devices normal right now fairly widespread acceptance and adoption of these Technologies

amongst consumers riding that way and we are offering is quite different to Ockendon Amazon provide they tend to be much more focused on your own ecosystems in the domains relevant to them from PE today go to you and Amazon or Google and ask them to train customer please take language models and all the business logic required to deliver this type of a service for you the best you get if you get access to their generic apis and would you generic transcription but there are we going to have a high error rates in particular domains which higher than you book at using a system like ours because it’s trying to pacifically off back customer service for car ok that’s that seems to be a common theme it’s not something we discussed in the last podcast actually was using using a cloud server to speech to text is never going to give you that the accuracy that you can get when when training on their

on specific data domain specific language Pacific cetera and you guys are doing exactly the same thing but with specific product catalogues tailored for individual customers yeah that’s really how you made this isn’t very powerful and work really really well if you have to go domain-specific you know I think we’re quite a while away from having generic systems that will perform very well in in very specific domains do I always think about it is essential what we provide is it’s almost like a salesperson that inherently understands and all of the details of all of your products

that’s really what differentiate us from samba de Janeiro transcription systems the final biggest shoes that I wanted to talk to you about was privacy and so so so hot topics in the news for a number of reasons and I’ll be interested to hear your your take on how how you designed the product to protect our users privacy I’ll see those early days invoice and people are going to be a little bit and cautious about speaking to microphones and wondering whether the data goes you could you describe are you know what steps to take to protect customers data how the data stored etc m e f Benson the outset we were out designer platform we were quite paranoid about a data protection because it is quite sensitive information so I guess one of the advanced

we have is a voice data which we can store the data we don’t have to so when one of our customers integrates when they send up an API request to our system as an optional user ID field and within within their own that’s just a random uuid if they provide that uuid we will store the audio but if they don’t provide that you I do we just discuss we don’t actually stores and the main reason for that is to be able to deal with deletion requests if we if there’s no way to car life and back to a specific user then we wouldn’t be able to do some questions are we just dance for the audio through each of your clients each of your clients has a configuration setting that allows them to I’m pretty to their clients to send her a unique custom identifier to your servers and be stored or not so either all of their customers are sending unique ideas or all of them are not as I crack

she have it both ways so I am the one employees can be done per request so for example they may want if it’s a guest user they may not want to sort out what the other was a logged in user they may want to sort that out so you can actually don’t it’s basically done by if they send the user ID data stored if they don’t it’s not there is a characters it’s not actually their user ID well we have absolutely no way of correlating that back to a specific get individual because he’s ready to just random numbers to us so all of that data around to our customers and we don’t have any access to that I say but either way whether they send an idea or not the actual voice data is stored on voices servers for a period of time
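Here is a small sketch of the per-request opt-in described above: a pseudonymous UUID is attached only for logged-in users, so the provider can store (and later delete) their audio, while guest queries carry no identifier. The endpoint and field names are illustrative assumptions, not the real Voysis API.

```python
import json
from typing import Optional
import requests

def send_voice_query(audio_wav: bytes, user_uuid: Optional[str]) -> dict:
    metadata = {"context": {"screen": "product_listing"}}
    if user_uuid is not None:
        metadata["user_id"] = user_uuid  # random UUID known only to the retailer
    resp = requests.post(
        "https://api.example-voice-provider.com/v1/query",  # placeholder host
        files={"audio": ("query.wav", audio_wav, "audio/wav")},
        data={"metadata": json.dumps(metadata)},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# Logged-in user: audio may be stored and later deleted on request via the UUID.
# send_voice_query(audio_bytes, user_uuid="4f9c2c3a-0b51-4d9e-9a1c-2f6f1d2e8c77")
# Guest user: no ID sent, so the audio is not retained.
# send_voice_query(audio_bytes, user_uuid=None)
```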

But either way, whether they send an ID or not, is the actual voice data stored on Voysis servers for a period of time? Yes; if they send the user ID, the audio data is stored, and if they don't send it, it's not. Only in the case where there's an ID, which then allows them to delete it on request? Exactly. And of course everything is encrypted over the wire, and everything we store is isolated per customer. We have a query store, and again, unlike using Google or Amazon Alexa, where your data sits alongside everyone else's, we isolate it per customer, with encryption, so there's no leakage across the different customers we integrate with. We can then provide them with aggregated analytics on that data, and we can also provide anonymised query information, but we're not providing anything that would uniquely identify a particular person. An important point, actually: if you're dealing with one of the big players like Google, as an end user I know that my voice data is going to be used to train a model that's used for all of Google's customers, but that's not the case with Voysis? No, exactly. Where our customers do opt to have us store their data, it's hugely beneficial for them, because they can learn a lot from it: what types of interactions customers are having with the system, what customers are asking for that the system doesn't yet support, and of course we use that audio data to make the system better over time. But again, it all belongs to our customers: they can simply tell us to delete all of it, or they can choose never to provide user IDs, in which case the data is never stored. I really like that, because as an end user, when you're using an app or a website your relationship is with the brand you're shopping with, and not with some anonymous third-party natural language processing company you've never had to think about. That's a really important point.

OK, so Voysis is a cloud-based service, and I can think of at least one startup that processes voice on-device; they call it edge computing, edge-based processing. Why did you choose a cloud-based architecture, and what are the limitations or considerations with that approach? We have looked at on-device, edge-type processing, and actually the types of models you can build to support it are better suited to very simple commands, things like hotword detection, where you do that processing on the device.

hey Google it’s not that type of ice processing can work really well you can also add in in a little bit of work on this in terms of adding in three simple commands so it was a music use case for example things like play cards that wouldn’t need to leave device but to do any real natural language processing the hardware that the mobile devices have just simply isn’t sufficient to be able to process those types of careers and to support the size of models that are required to do that process I have suspected as much as a bit of a big climb to expect him and Tony are on board embedded device to go to perform all the magical understanding you just described with giving how many steps are in the process pretty well actually what I’m just means you for a carvery in very constrained and because of models you can have a large amount of flexibility in the way that customers can speak to the system so one of the things

medium for example as we know we take all the customers day that the product catalogue data and we we have what we call user simulator and it generates billions of interactions that customers may have with that service based on all of that data and that’s all committee were cats used to train our systems and so obviously because of that we want to be able to support the customer speaking to shoot to these systems using language whatever way they want to speak because of that the models end up being pretty large to have that flexibility and if you’re doing on the voice you’re very limited in the size of model you can happen therefore if a limit in the types of Aqaba you can you can see what you mention gdpr before I saw it actually becomes enforceable in a week on 25th of May says quite timely how are things are affected you know your team in the design and see me you saw it coming a long way after you design from it at the Beginning but as it made things more difficult
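A toy sketch of the template-expansion idea behind a user simulator like the one described here. The templates and catalogue fields are invented for illustration; a production system would use far richer generation and sampling strategies.

```python
# Generate synthetic training queries from catalogue data by filling templates.
import itertools
import random

catalogue = [
    {"name": "trail running shoe", "category": "footwear", "colour": "red"},
    {"name": "merino base layer", "category": "clothing", "colour": "navy"},
]

templates = [
    "show me {colour} {name}s",
    "do you have any {category} in {colour}",
    "I'm looking for a {name}",
    "actually, only show me it in {colour}",
]

def simulate_queries(n: int) -> list[str]:
    """Sample n synthetic user queries by pairing catalogue entries with templates."""
    pairs = ((product, random.choice(templates)) for product in itertools.cycle(catalogue))
    return [template.format(**product) for product, template in itertools.islice(pairs, n)]

for query in simulate_queries(6):
    print(query)
```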

You mentioned GDPR earlier; it actually becomes enforceable in a week, on the 25th of May, so this is quite timely. How has it affected you and your team in the design? It seems you saw it coming a long way off, so you designed for it from the beginning, but has it made things more difficult, or is it something you've welcomed and would have designed that way anyway? It hasn't been a problem. We were fortunate in that we knew it was coming as we were designing these parts of the platform, so we were able to design with it in mind. As I mentioned earlier, just in the way we store and handle the data, out of the box it's GDPR compliant. We touched on this before, but on the product side of things the voice detection is only active once the user presses the microphone icon, which means there's no continuous monitoring of the user's voice and no hotword detection, and therefore there's no need for a mute button or anything like that; it sounds pretty secure as it is. It is; it's very much an on-demand service. When the system is integrated onto a customer's website or application, it's only activated when the customer taps that microphone icon. We can support hotwords, and we're working on adding that capability, in which case, yes, there would be an option to mute, but right now it's only when the user taps, and we make it very clear when the system is listening and when it is not.

What's the share of responsibility between Voysis and your retail clients when it comes to providing privacy controls? I imagine that from the user's perspective they only think about the brand, and if there's any data breach or misuse of data, the responsibility in the user's eyes will fall on the brand, not on any third-party service it chooses to use. That's changing with GDPR, which puts a lot more onus not only on the data controller but on the data processor as well. For the Voysis service we're a data processor rather than a data controller, but the regulation still puts much more onus on us, so it's very much a shared responsibility. Again, because we maintain no way to uniquely identify the customer, and all of that identifying information stays with the data our customers already hold, it's pretty tightly controlled, I would say.

That's great. The final few questions I wanted to ask are really aimed at giving our listeners some advice for working in the field, specifically people who are interested in voice technology development, contributing to the ecosystem, building some of the components you described earlier, or working in a company such as Voysis. What advice would you give to someone who's just at the beginning of their career, and what have you found helpful in your own personal development on the way to where you are? As regards developments in the space, when I started getting in and looking at this area, there was just a ton of fantastic material online.

There are courses and resources available on topics like automatic speech recognition and introductions to neural networks, including fantastic university courses, many of which are now available online. For me, though, I try to focus very heavily on the user experience and the customer's needs. It's all well and good optimising for low word error rates or low latencies, but those can end up mattering far less than what the end customer experience is like as the technology is applied, so I tend to focus on that customer experience. OK, so John, if I were considering starting a career in voice technology development: there are so many options to choose from, so many different websites and books I could read, and machine learning and AI is itself an enormous topic, so it's hard to know where to start. If I wanted to work on something that I knew would still be in demand in a few years' time and useful to me from a career point of view, what would be some of your recommendations? What areas should I focus on, and what are the main skills I should try to acquire? I think the hardest people for us to get on board have been core machine learning and AI experts, not just data scientists, but people who can really build out the architectures of these networks and deeply understand how they work. As always with building out these networks, it's a combination of an art and a science: someone can understand a lot of the fundamentals and still never build models that perform really well. So I think you need a very detailed knowledge of how these networks are put together, but also of how they're being applied, so the linguistic and natural language processing aspects of what we're building matter as well. Some of the other people on our team are linguists, ideally linguists with very good technical skills who can apply this technology. Really interesting, so theory and practice. And infrastructure engineering as well, to be able to build these systems, because they're highly computationally complex; you need high-performance compute infrastructure, and the expertise to build the systems in a way that scales. Alright, fantastic. What's on the horizon at Voysis? What will you be focusing your energies on over the coming six to twelve months, and are there any new projects you're working on right now that you can talk about?

new projects you’re working on right now that you can talk about sure yet so we continue to evolve a, suffering, working with our current and only by the customers are coming on board of course the never-ending research and engineering work in order to improve continually improve the accuracy the performance of the scalability of our systems that was one piece of work that were particularly excited about at the moment as I work on weight loss and four for Texas beach so wave that is radically new approach for generating radio and when using these methods are these techniques and output based stuff and much much more natural Daniel the standard of ur traditional parametric tests pieces and we’ve made amazing progress there the challenge with the mystic

generating them is highly computationally complex the big Challenge for us has been analysing this so we can get to work and real-time not Focus right now in the next couple of Korra it’s really interesting topic close to my heart actually understanding of waves that requires a huge amount of data and then compute in order to be able to generate they are incredibly lifelike voices the Google of demonstrated and your solution to that uses is parallelizing it because you’re generating you know what is it 16 km to generating you know that number of data points every 72 of those generated independently in the model so that the data compilations to do this often real time until paralyzing is still be so much where can people find out more about you
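Some back-of-the-envelope arithmetic on why real-time neural audio generation is hard, assuming a 16 kHz autoregressive model that produces one sample per forward pass (the usual WaveNet setup; the exact Voysis configuration isn't specified in the episode).

```python
# Rough budget for real-time autoregressive audio generation at 16 kHz.
SAMPLE_RATE_HZ = 16_000

samples_per_second = SAMPLE_RATE_HZ                # 16,000 sequential forward passes
budget_per_sample_us = 1_000_000 / samples_per_second

print(f"{samples_per_second} forward passes per second of audio")
print(f"~{budget_per_sample_us:.1f} microseconds per forward pass to stay real-time")
# Parallelising generation (or distilling into a non-autoregressive model) is what
# relaxes this per-sample sequential budget enough to run in real time.
```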

So where can people find out more about you, your work, and Voysis in general? Voysis.com, and we're on Twitter as well. Fantastic. That's it for episode number two. John, thank you very much, I really appreciate it, and I wish you the best of luck with the year ahead.

You just heard from John Fitzpatrick, the VP of Product and Engineering at Voysis. John and his team have done an amazing job building a full end-to-end voice AI system that uses the latest technologies and research. The Voysis product can be used across a range of verticals, but they're currently focused on e-commerce as their beachhead market, for both business and technical reasons. As we heard in our previous episode, by focusing on a specific use case such as e-commerce, the size of the vocabulary and the range of possible user requests to the voice interface are dramatically reduced, making it possible to return very accurate responses. John also explained that the decision to build a cloud-based system rather than an edge-based one came down to being able to handle a much wider range of voice search requests and refinements. He also talked about the challenges involved in building a high-performance system, and how important it is to have software engineers on the team who are experienced in putting these technologies into production. For those interested in a technical role in the voice space, we heard that it's just as important to get hands-on practical experience building the software as it is to understand the theory behind the algorithms. Lastly, it was nice to hear John say that having offices in Europe may make it easier to recruit talent that might otherwise be snapped up by the likes of Google.

Don't forget to enter the competition to win an electronic copy of the book Bot Business 101: How to Start, Run and Grow Your Bot Business. Just send a tweet with a link to voicetechpodcast.com and include the hashtag #voicetechpodcast. As always, you can find the show notes, with links to the resources mentioned in the episode, at voicetechpodcast.com, and you can also follow me on Twitter @voicetechpodcast. Please tell a friend or colleague about this episode, and if you haven't done so already, please head over to iTunes or Stitcher and leave us a quick five-star review; it really helps. Also, if you'd like to become a patron, you can support the show for as little as $2 a month at patreon.com/voicetechpodcast. I hope you've enjoyed episode 2, and I'll be back soon with another instalment. I've been your host, Carl Robinson. Thank you for listening to the Voice Tech Podcast.

