[I write blogs on two websites. This one started in December 2003; the other began in 2012. Across the two sites, this is my 7,000th post.]
New companies with new ideas continue to drift into my field of vision. This news is for developers and managers who see the potential for their products if reliable voice AI could be added. This technology whets my old product-development appetite for new products or enhanced applications. The company is called Deepgram, and it has some pretty impressive credentials. If you’re an innovative thinker looking for a way to jump-start your application, take a look at this.
Deepgram’s marketing team sent a quick summary of the company’s 2024.
AI Company Ends 2024 Cash-flow Positive with 400+ Enterprise Customers, 3.3x Annual Usage Growth Across the Past Four Years, Over 50,000 Years of Audio Processed, and Over One Trillion Words Transcribed.
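To put those headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "3.3x annual usage growth across the past four years" means usage multiplied by 3.3 in each of four consecutive years, which is my reading and not something Deepgram spelled out. It also treats the "over" figures as exact, so the results are only ballpark values. The function names are mine, for illustration.

```python
# Back-of-the-envelope arithmetic on the headline figures.
# Assumption: 3.3x growth compounds every year for four years.
ANNUAL_GROWTH = 3.3
YEARS = 4

# Headline figures, treated as exact although the release says "over".
AUDIO_YEARS = 50_000
WORDS_TRANSCRIBED = 1_000_000_000_000
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def cumulative_growth(annual_factor: float, years: int) -> float:
    """Total growth multiple if the annual factor compounds each year."""
    return annual_factor ** years


def words_per_audio_minute(words: int, audio_years: int) -> float:
    """Average transcribed words per minute of processed audio."""
    return words / (audio_years * MINUTES_PER_YEAR)


if __name__ == "__main__":
    total = cumulative_growth(ANNUAL_GROWTH, YEARS)
    rate = words_per_audio_minute(WORDS_TRANSCRIBED, AUDIO_YEARS)
    print(f"Cumulative usage growth over {YEARS} years: about {total:.1f}x")
    print(f"Average words per minute of audio: about {rate:.0f}")
```

Under those assumptions, compounding works out to roughly 118.6x over the four years, and the two volume figures imply an average of about 38 transcribed words per minute of audio. Both are only as good as the rounded inputs.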
The company’s tools include speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) offerings.
“2024 was a stellar year for Deepgram, as our traction is accelerating and our long-term vision of empowering developers to build voice AI with human-like accuracy, human-like expressivity, and human-like latency is materializing,” said Scott Stephenson, CEO of Deepgram. “Our product strategy from founding has been to focus on deep-tech first, and the work we have done in building 3-factor automated model adaptation, extreme compression on latent space models (LSMs), hosting models with efficient and unrestricted hot-swapping, and symmetrical delivery across public cloud, private cloud, or on-premises, uniquely positions us to succeed in the $50B market for voice AI agents in demanding environments requiring exceptional accuracy, lowest COGS, highest model adaptability, and lowest latency.”
Deepgram expects to end 2025 as the industry’s only end-to-end speech-to-speech solution built to solve the four critical challenges of enterprise-ready voice AI:
- Accuracy / audio perception: Enterprise use cases require high recognition, understanding, and generation of specialized vocabulary in often challenging audio conditions. Deepgram solves this through novel, non-lossy compressions of these spaces for rapid processing paired with generation, training, and evaluation on synthetic data that precisely matches Deepgram customers’ real-world conditions.
- COGS at scale: Deepgram customers need to profitably build and scale voice AI solutions. Deepgram delivers this through its unique latent audio model with extreme compression combined with deep expertise in high-performance computing.
- Latency: Real-time conversation requires near-instantaneous responses. Deepgram achieves this using streaming state space model architectures, optimized specifically for the underlying hardware to deliver minimal processing delays.
- Context: Effective conversations are deeply contextualized. Deepgram will pass the speech Turing test thanks to its ability to train on vast bodies of data that thoroughly represent its customers’ use cases and pass that context through the entire system and interaction.