Syed Ahmed is a Developer Advocate at PubNub, a global Data Stream Network and real-time infrastructure-as-a-service company based in San Francisco, California. PubNub’s products allow developers to build real-time web, mobile, voice, and IoT applications. Low network latency is especially important for voice-enabled applications that integrate with IoT devices, as any delay during a conversational interaction can affect usability and frustrate users.
In this episode, Syed explains how PubNub solves the latency problem using the serverless architecture of PubNub Blocks, and walks us through a simple real-world example of a voice-enabled doorbell. We learn about the publisher-subscriber pattern that underpins this technology, and why building voice apps with PubNub is quicker, easier, and much more scalable than other methods.
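For readers unfamiliar with the publisher-subscriber pattern mentioned above, here is a minimal in-memory sketch in Python. It is illustrative only: the `Broker` class and the "doorbell" channel are my own inventions for this example, not PubNub's SDK, which additionally handles networking, fan-out, and delivery at scale.

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory publish-subscribe broker (illustration only,
    not the PubNub API)."""

    def __init__(self):
        # Map each channel name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a callback to receive messages on a channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver a message to every subscriber of the channel."""
        for callback in self._subscribers[channel]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("doorbell", received.append)
broker.publish("doorbell", {"event": "ring", "unit": "front-door"})
```

Because publishers and subscribers share only a channel name, either side can be added, removed, or scaled independently, which is the decoupling that makes the pattern attractive for real-time voice and IoT messaging.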
A nice surprise during this conversation was that using PubNub Blocks not only improves the network performance of voice apps, but can be used to add advanced features such as contextual memory between phrases, and even between voice devices on different platforms, allowing developers to build more advanced interactions. It’s a great episode, so check it out!
Sebastian Hanfland is the CEO of the audio branding consultancy, Hanfland And Friends. His team helps companies define their brand in audio form, in order to promote brand recognition, increase the perceived quality of products, and influence customers.
Sebastian explains what audio branding is, and how he selects the right voices and sounds for a brand. We explore many interesting examples of audio branding for products, customer service, workplaces and more, and hear a demo of their latest audio branding project for Humanizing Technologies. We also cover the typical challenges faced on audio branding projects, and what brands can do to prepare for the voice technology revolution.
This fantastic episode tells you all you need to know about sound design and audio branding. It is essential listening, not only for people working with voice technologies, but also for designers, marketers and content producers from every industry.
Bryan Colligan is the co-founder of AlphaVoice, the easiest way to get your podcast and audio content onto Amazon Alexa and Google Assistant. In an especially fun conversation, Bryan shares his vision of how the podcasting and voice technology worlds are set to collide, and how the implications will be felt by platforms, content producers and consumers alike.
He reveals the 4 main business models of content producers, explains why voice is the tech wave that will dominate, and foretells the impending convergence of media. We then explore the AlphaVoice product, and examine his plans to build the ultimate Q&A voice interface using content transcription and indexing.
We also touch on how Google entering the podcasting space has influenced his product vision, and tackle the important subject of voice search, including the new paradigms of voice to voice, and voice to video search. Whether you’re a publisher, producer or consumer of digital content, this is an unmissable episode!
Dogac Basaran is a post-doctoral researcher at CNRS, the French national scientific research centre. Today, in part 2 of 2, we explore Dogac’s research into audio fingerprinting, alignment, and melody extraction. By analysing the magnitude of frequency peaks and their relative spacing, Dogac shows us how it’s possible to create audio fingerprints that can be used to detect and match audio recordings, even if they contain noise or are incomplete. These fingerprints have a variety of uses, including aligning multiple recordings of a single speaker/performance, and identifying a particular recording.
We also discuss query by humming, the state-of-the-art technique that takes an audio fingerprint of a person humming a melody, and matches it to a database of music recordings. Dogac also explains why learning how to build neural networks has become an essential skill in this field.
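The peak-pair idea behind the fingerprinting Dogac describes can be sketched in a few lines. In this toy Python version (the function names, the fan-out parameter, and the hand-written peak lists are my own illustrative choices; real systems first run an STFT and pick spectral peaks, which is omitted here), each hash records two peak frequencies and the time gap between them, so matching depends only on relative spacing:

```python
def fingerprint(peaks, fan_out=3):
    """Hash pairs of spectral peaks as (freq_a, freq_b, frame_gap).
    `peaks` is a list of (frame_index, frequency_bin) tuples,
    e.g. the strongest bins of successive STFT frames."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            hashes.add((f1, f2, t2 - t1))
    return hashes

def match_score(query_peaks, reference_peaks):
    """Fraction of the query's hashes found in the reference."""
    q = fingerprint(query_peaks)
    r = fingerprint(reference_peaks)
    return len(q & r) / max(len(q), 1)

reference = [(0, 100), (1, 150), (2, 90), (3, 200)]
query = [(10, 100), (11, 150), (12, 90)]  # same peaks, shifted and truncated
score = match_score(query, reference)  # relative spacing makes this 1.0
```

Hashing the gap between peaks rather than their absolute times is what lets a short, shifted, or incomplete query still match the right recording.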
Dogac Basaran is a post-doctoral researcher at CNRS, the French national scientific research centre. Today, in part 1 of 2, Dogac gives us a crash course in signal processing, where we learn what signal processing is and discover some of its many applications.
Leveraging his teaching experience, Dogac uses simple language and real-world examples to explain the fundamental signal processing concepts that are used in voice technology today. He defines frequency, period, and stationarity, and describes how sound cards use sampling and the Nyquist theorem to convert analogue signals into digital.
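As a concrete illustration of why the Nyquist theorem matters when digitising analogue signals, the standard-library Python sketch below (the 7 Hz tone and 10 Hz sampling rate are arbitrary toy values of mine) shows aliasing: a tone sampled at less than twice its frequency becomes indistinguishable from a lower-frequency tone.

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Return n samples of sin(2*pi*f*t), taken rate_hz times per second."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

# Nyquist: capturing a 7 Hz tone faithfully needs a rate above 2 * 7 = 14 Hz.
# At only 10 Hz, the 7 Hz tone aliases onto 10 - 7 = 3 Hz: its samples are
# identical (up to a sign flip) to those of a 3 Hz tone.
rate_hz = 10
hi = sample_sine(7, rate_hz, 20)
lo = sample_sine(3, rate_hz, 20)
aliased = all(abs(a + b) < 1e-9 for a, b in zip(hi, lo))  # True
```

This is also why sound cards commonly sample at 44.1 kHz: it comfortably exceeds twice the roughly 20 kHz upper limit of human hearing.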
Pablo Arias is a final-year PhD student in perception and cognitive science at the audio research lab, IRCAM, in Paris. We discuss Pablo’s work on how smiling changes the voice, and how people perceive smiling and non-smiling voices.
First Pablo explains what cognitive science, neuroscience and perception are, and why research into these areas is so important. He then takes us through the aims, methods, and results of his latest research paper into smiling in the voice, and we discuss the academic and technological implications of his work.
Benjamin Etienne is a data scientist at Rogervoice, a mobile app that allows deaf and hard-of-hearing people to use the telephone. Ben shares his inspirational story about how he taught himself data science and machine learning in the evenings, so he could work in a more technical role. He tells us why he’s not keen on Kaggle competitions, and why getting a job in data science is the best way to master it.
Greg Beller is the Head of the Interfaces Research and Creation team at the leading audio research laboratory IRCAM in France. He is also the founder of SYNEKINE, a live entertainment company which mixes art and science in the spirit of research.
We explore the relationship between sound and physical space, and the link between our voices and our gestures. Greg explains what prosody is and its importance in speech and communication.
In this episode I talk with Charles Cadbury, owner of the London-based technology consultancy, Champers Advisory, about his experience building voice applications, and the fascinating future of voice technology. He was great fun to talk to, and had plenty of surprising facts and interesting stories to share. You’re going to really enjoy listening to this episode!
Charles shares his extensive knowledge and experience on a wide range of topics related to voice. We cover the challenges when working with client data, how payment transactions can and will be handled over voice, how voice assistants will change the landscape of consumer sales and marketing, and much more.
This episode covers 8 of the most interesting voice startups that I found at the Vivatech technology conference in Paris, France.
Included in this episode is a voice transcription and synthesis mobile app for the hard of hearing, a voice-enabled smart alarm clock that can monitor your sleep quality, a robot behaviour system that delivers CMS content in person, and a comprehensive voice assistant platform that can handle multiple requests in a single query.