A3 Expands Event Lineup with FOCUS: Intelligent Vision & Industrial AI Conference

Just in from The Association for Advancing Automation (A3): a timely new conference. I’m not sure I can make it to Seattle, but it looks like a good venue for exploring these topics.

The Association for Advancing Automation (A3), the leading voice in automation and robotics, today announced the launch of a new industry event, FOCUS: Intelligent Vision & Industrial AI Conference. Set to take place September 24-25, 2025, in Seattle, this conference will provide an in-depth look at the latest advancements in machine vision, imaging technologies, AI, and smart automation applications. Attendees will explore cutting-edge innovations in vision systems and imaging while also diving into real-world case studies on AI-driven automation across industries, including manufacturing, aerospace, agriculture, defense, energy, logistics and medical devices.

With AI-powered automation and vision systems rapidly improving quality control, predictive maintenance, and robotics capabilities, industrial leaders need actionable insights to stay ahead of the curve. Unlike broader industry conferences, the FOCUS: Intelligent Vision & Industrial AI conference will center specifically on the real-world applications of AI and vision technology, featuring expert-led sessions, in-depth case studies, and hands-on technology showcases.

Registration opens soon! Stay ahead of the curve—visit the FOCUS 2025 page and subscribe for updates to be among the first to know when registration goes live.

Honeywell Unveils AI Assistant For Industrial Operators

AI assistants have become the new entry ticket for software developers. A tech writer I’ve followed for years recently posted a worry that Microsoft might bring back “Clippy” in AI guise. I’m glad to see Honeywell joining the trend. The company has been quiet for some time across its various divisions, and the announcement that it will break itself into parts makes this news both poignant and essential.

These generative-AI-based assistants are becoming an essential ingredient of the user interface as a new generation of operators and engineers enters the workforce.

Honeywell announced the latest release of Honeywell Forge Production Intelligence, which seamlessly integrates performance monitoring with a new generative AI assistant to help operators and production managers automate tasks and troubleshoot problems.

By leveraging advanced generative AI models, the platform’s new Intelligent Assistant is designed to enhance user experience by allowing engineers, plant managers and business leaders to access key insights through simple, natural language prompts. The tool will also enable industrials to visualize, trend, and troubleshoot production issues from Key Performance Indicator (KPI) deviation contributors and asset relationships.

The cloud-native platform merges performance monitoring with advanced analytics, enabling rapid root cause analysis of production issues. With the addition of the Intelligent Assistant, users can now summarize deviations and overall insights quicker and more effectively. The capability not only enhances AI insights with greater explainability and usability but also supports closed-loop collaboration workflows with case management integration.
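Honeywell has not published the assistant’s interfaces, but the pattern described here, natural-language questions answered against KPI time series, is easy to sketch. Everything in the snippet below, from the KPI names to the stubbed LLM call, is a hypothetical illustration rather than Honeywell Forge code.

```python
# Conceptual sketch of a natural-language KPI assistant.
# Every name here is a hypothetical illustration, NOT the Honeywell Forge API.
from dataclasses import dataclass

@dataclass
class KpiSeries:
    name: str
    values: list[float]   # recent samples, most recent last
    target: float         # the plan value the KPI should track

def deviation_summary(kpi: KpiSeries) -> str:
    """Describe how far the latest reading drifts from target."""
    latest = kpi.values[-1]
    drift_pct = (latest - kpi.target) / kpi.target * 100
    return f"{kpi.name}: latest {latest:.1f} vs target {kpi.target:.1f} ({drift_pct:+.1f}%)"

def ask_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call so the sketch runs offline."""
    return "LLM would answer based on:\n" + prompt

def answer(question: str, kpis: list[KpiSeries]) -> str:
    """Assemble KPI context, then hand the operator's question to an LLM."""
    context = "\n".join(deviation_summary(k) for k in kpis)
    prompt = f"Plant KPI status:\n{context}\n\nOperator question: {question}"
    return ask_llm(prompt)

# Example: ask about a drifting first-pass-yield KPI.
fpy = KpiSeries("First-pass yield", [96.1, 95.4, 92.8], target=96.0)
print(answer("What is driving the yield deviation this shift?", [fpy]))
```

The case-management integration Honeywell mentions would sit downstream of a function like answer(), turning a flagged deviation into a tracked work item.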

Honeywell Forge Production Intelligence is part of Honeywell’s recently announced suite of AI-enabled solutions for industrials which also includes Experion Operations Assistant and Field Process Knowledge System.

Deepgram Achieves Key Milestone Toward Delivering Next-Gen, Enterprise-Grade Speech-to-Speech Architecture

While the tech giants are wrestling with their speech AI products, Deepgram seems to be delivering useful products for developers in a variety of applications. Following my last post about the company comes this news about speech-to-speech technology without intermediate text.

Deepgram announced a significant technical achievement in speech-to-speech (STS) technology for enterprise use cases. The company has successfully developed a speech-to-speech model that operates without relying on text conversion at any stage, marking a pivotal step toward the development of contextualized end-to-end speech AI systems. This milestone will enable fully natural and responsive voice interactions that preserve nuances, intonation, and emotional tone throughout real-time communication. When fully operationalized, this architecture will be delivered to customers via a simple upgrade from its existing industry-leading architecture. By adopting this technology alongside Deepgram’s full-featured voice AI platform, companies will gain a strategic advantage, positioning themselves to deliver cutting-edge, scalable voice AI solutions that evolve with the market and outpace competitors.

Existing speech-to-speech (STS) systems are based on architectures that process speech through sequential stages, such as speech-to-text, text-to-text, and text-to-speech. These architectures have become the standard for production deployments for their modularity and maturity, but eliminating text as an intermediary offers opportunities to improve latency and better preserve emotional and contextual nuances.
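To make the contrast concrete, here is a minimal sketch of that conventional cascade. The three stage functions are generic stubs, not any specific vendor’s API; the point is that plain text appears twice as an intermediate, and prosody is discarded at each hop.

```python
# The conventional cascaded STS pipeline: each hop passes plain text,
# so intonation, pacing, and emotion in the input audio are discarded.
# All three functions are generic stubs, not a specific vendor API.

def speech_to_text(audio: bytes) -> str:
    """Stage 1: transcribe. Prosody and tone are lost here."""
    return "transcript of the caller's words"   # stub

def generate_reply(text: str) -> str:
    """Stage 2: an LLM drafts a textual response."""
    return "a reply composed from text alone"   # stub

def text_to_speech(text: str) -> bytes:
    """Stage 3: synthesize audio; expressiveness must be reinvented."""
    return text.encode()                        # stub

def cascaded_sts(audio_in: bytes) -> bytes:
    transcript = speech_to_text(audio_in)   # text bottleneck no. 1
    reply = generate_reply(transcript)      # operates on text only
    return text_to_speech(reply)            # text bottleneck no. 2
```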

Meanwhile, multimodal LLMs like Gemini, GPT-4o, and Llama have evolved beyond text-only capabilities to accept additional inputs such as images, videos, and audio. However, despite these advancements, they struggle to capture the fluidity and nuance of human-like conversation. These models still rely on a turn-based framework, where audio input is tokenized and processed within a textual domain, restricting real-time interactivity and expressiveness.

To advance the frontier of speech AI, Deepgram is setting the stage for end-to-end STS models, which offer a more direct approach by converting speech to speech without relying on text. Recent research on speech-to-speech models, such as Hertz and Moshi, has highlighted the significant challenges in developing models that are robust and reliable enough for enterprise use cases. These difficulties stem from the inherent complexities of modeling conversational speech and the substantial computational resources required. Overcoming these hurdles demands innovations in data collection, model architecture, and training methodologies.

Deepgram is transforming speech-to-speech modeling with a new architecture that fuses the latent spaces of specialized components, eliminating the need for text conversion between them. By embedding speech directly into a latent space, Deepgram ensures that important characteristics such as intonation, pacing, and situational and emotional context are preserved throughout the entire processing pipeline. What sets Deepgram apart is its approach to fusing the hidden states (the internal representations that capture meaning, context, and structure) of each individual function: Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS). This fusion is the first step toward training a single, controllable, true end-to-end speech model, enabling seamless processing while retaining the strengths of each best-in-class component. This breakthrough has significant implications for enterprise applications, facilitating more natural conversations while maintaining the control and reliability businesses require.

One of the requirements in enterprise speech-to-speech modeling is the ability to understand and troubleshoot each step of the process. This is particularly challenging when text conversion between steps isn’t involved, as verifying both the accuracy of the initial perception and the alignment of the spoken output with the intended response is not straightforward. Deepgram recognized this need and addressed it by designing a new architecture that enables debuggability throughout the entire process.
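Deepgram hasn’t published the internals, but the fusion idea can be sketched in a few lines of PyTorch. In this conceptual sketch, a small projection maps one component’s hidden states into the next component’s latent space, and a stubbed debug_decode hook marks where an inspection point of the kind just described could sit. Every module, dimension, and name below is an assumption for illustration, not Deepgram’s design.

```python
# Conceptual PyTorch sketch of latent-space fusion across STT -> LLM -> TTS.
# Dimensions, modules, and the debug hook are illustrative assumptions,
# not Deepgram's published architecture.
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Projects one component's hidden states into the next component's
    latent space, replacing the text handoff of a cascaded pipeline."""
    def __init__(self, src_dim: int, dst_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(src_dim, dst_dim),
            nn.GELU(),
            nn.Linear(dst_dim, dst_dim),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden)

stt_to_llm = LatentBridge(src_dim=512, dst_dim=1024)   # assumed sizes
llm_to_tts = LatentBridge(src_dim=1024, dst_dim=256)

def debug_decode(latent: torch.Tensor) -> str:
    """Inspection hook: a lightweight decoder could map a latent back to
    an approximate transcript so each stage can be audited. Stubbed here."""
    return f"<latent shape={tuple(latent.shape)}>"

# One step of the fused pipeline (hidden states stand in for text):
stt_hidden = torch.randn(1, 50, 512)        # pretend STT encoder output
llm_input = stt_to_llm(stt_hidden)          # prosody survives in the latent
print(debug_decode(llm_input))              # auditability without text hops
```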

AI-Powered Asset Performance Management

AI now figures in at least half of the news in my area of interest. No surprise, then, that Yokogawa has harnessed some AI expertise for its Asset Performance Management.

Yokogawa Electric and UptimeAI announced a strategic agreement aimed at enhancing asset performance management in industrial plants. The agreement is underscored by a capital investment in UptimeAI by Yokogawa.

Under the agreement, the companies will integrate UptimeAI’s AI-powered platform into Yokogawa’s OpreX Asset Health Insights service. The combined solution will provide customers in the oil and gas, chemicals, cement, power, and renewable energy industries with a seamless and powerful approach to optimize plant operations, reliability, and maintenance.

Specifically, the bundled offering will merge the capabilities of OpreX Asset Health Insights as an OT/IT data enablement engine with UptimeAI’s flagship modules, “AI Expert: Generative AI” and “AI Expert: Reliability & Process,” bringing advanced LLM-based AI agents, subject matter knowledge, self-learning workflows, maintenance analysis, and industrial asset library models into a comprehensive AI assistant for plant operators. This solution will enable users to achieve a significant positive return on investment in a short period of time by reducing maintenance and operational costs with predictive insights, root cause analysis, and recommendations driven by automated learning processes.

Deepgram Empowers Developers From Startups to Global Enterprises to Build Voice AI

[I have two websites where I write blogs. This one started in December 2003; my other began in 2012. Across the two sites, this is my 7,000th post.]

New companies with new ideas continue to drift into my field of vision. This news relates to developers and managers who see the potential for their products if reliable voice AI could be added. The technology whets my old product-development appetite for new products or enhanced applications. The company is called Deepgram, and it has some pretty impressive credentials. If you’re an innovative thinker looking for a way to jump-start your application, take a look at this.

Deepgram marketing sent a quick summary of their 2024.

AI Company Ends 2024 Cash-flow Positive with 400+ Enterprise Customers, 3.3x Annual Usage Growth Across the Past Four Years, Over 50,000 Years of Audio Processed, and Over One Trillion Words Transcribed.

The company’s tools include speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) offerings.
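For developers who want to kick the tires, the speech-to-text side is the easiest entry point. The sketch below follows the Deepgram Python SDK’s v3-era surface as I understand it (a DeepgramClient with a prerecorded transcribe_url call); method paths have shifted between SDK versions, so treat the exact calls as assumptions and check the current docs. The API key and audio URL are placeholders.

```python
# Minimal Deepgram speech-to-text sketch (Python SDK, v3-era surface).
# The exact call path has changed across SDK versions; verify against
# current Deepgram docs. API key and audio URL are placeholders.
from deepgram import DeepgramClient, PrerecordedOptions

deepgram = DeepgramClient("YOUR_DEEPGRAM_API_KEY")

options = PrerecordedOptions(
    model="nova-2",        # assumed general-purpose model name
    smart_format=True,     # punctuation and formatting
)

response = deepgram.listen.rest.v("1").transcribe_url(
    {"url": "https://example.com/sample-call.wav"},  # placeholder audio
    options,
)

# The transcript lives under channels -> alternatives in the response.
print(response.results.channels[0].alternatives[0].transcript)
```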

“2024 was a stellar year for Deepgram, as our traction is accelerating and our long-term vision of empowering developers to build voice AI with human-like accuracy, human-like expressivity, and human-like latency is materializing,” said Scott Stephenson, CEO of Deepgram. “Our product strategy from founding has been to focus on deep-tech first, and the work we have done in building 3-factor automated model adaptation, extreme compression on latent space models (LSMs), hosting models with efficient and unrestricted hot-swapping, and symmetrical delivery across public cloud, private cloud, or on-premises, uniquely positions us to succeed in the $50B market for voice AI agents in demanding environments requiring exceptional accuracy, lowest COGS, highest model adaptability, and lowest latency.” 

Deepgram expects to end 2025 with the industry’s only end-to-end speech-to-speech solution, built to solve the four critical challenges of enterprise-ready voice AI:

  • Accuracy / audio perception: Enterprise use cases require high recognition, understanding, and generation of specialized vocabulary in often challenging audio conditions. Deepgram solves this through novel, non-lossy compressions of these spaces for rapid processing paired with generation, training, and evaluation on synthetic data that precisely matches Deepgram customers’ real-world conditions.
  • COGS at scale: Deepgram customers need to profitably build and scale voice AI solutions. Deepgram delivers this through its unique latent audio model with extreme compression combined with deep expertise in high-performance computing.
  • Latency: Real-time conversation requires near-instantaneous responses. Deepgram achieves this using streaming state space model architectures, optimized specifically for the underlying hardware to deliver minimal processing delays. (A rough way to measure this appears in the sketch after this list.)
  • Context: Effective conversations are deeply contextualized. Deepgram will pass the speech Turing test thanks to its ability to train on vast bodies of data that thoroughly represent its customers’ use cases and pass that context through the entire system and interaction.
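On the latency point, here is a rough way to measure time-to-first-transcript against a streaming endpoint. The sketch targets Deepgram’s documented streaming websocket as I understand it; the message shape and the header keyword (which differs across versions of the websockets library) are assumptions to verify, and the API key is a placeholder.

```python
# Rough time-to-first-transcript measurement against a streaming STT
# websocket. Endpoint and message shape follow Deepgram's documented
# streaming API as I understand it; the header kwarg name differs across
# `websockets` library versions, so treat the details as assumptions.
import asyncio
import json
import time

import websockets  # pip install websockets

URL = "wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000"
HEADERS = {"Authorization": "Token YOUR_DEEPGRAM_API_KEY"}  # placeholder

async def time_to_first_transcript(pcm_chunks: list[bytes]) -> float:
    """Stream raw 16 kHz PCM frames, return seconds until the first
    non-empty hypothesis arrives."""
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        start = time.perf_counter()
        for chunk in pcm_chunks:
            await ws.send(chunk)
        async for message in ws:
            result = json.loads(message)
            alt = result.get("channel", {}).get("alternatives", [{}])[0]
            if alt.get("transcript"):          # first non-empty hypothesis
                return time.perf_counter() - start
    return float("inf")

# asyncio.run(time_to_first_transcript(chunks)) with real audio frames.
```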

Universal Robots Unveils AI Accelerator

Augmented (Artificial) Intelligence (AI) must have a hidden marketing hype person somewhere in the metaverse. We must remind ourselves that the AI revolution-to-be has not yet really arrived.

Companies have begun releasing AI-augmented products. These will become part of the learning curve we must all endure. Then we’ll figure out that they are just another tool in the kit.

This news comes from Universal Robots, the Danish collaborative robot (cobot) developer. It has introduced the UR AI Accelerator, a ready-to-use hardware and software toolkit created to enable development of AI-powered cobot applications.

Designed for commercial and research applications, the UR AI Accelerator provides developers with an extensible platform to build applications, accelerate research and reduce time to market of AI products.

The toolkit brings AI acceleration to Universal Robots’ (UR) next-generation software platform PolyScope X and is powered by NVIDIA Isaac accelerated libraries and AI models, running on the NVIDIA Jetson AGX Orin system-on-module. Specifically, NVIDIA Isaac Manipulator gives developers the ability to bring accelerated performance and state-of-the-art AI technologies to their robotics solutions. The toolkit also includes the high-quality, newly developed Orbbec Gemini 335Lg 3D camera.

Through in-built demo programs, the AI Accelerator leverages UR’s platform to enable features like pose estimation, tracking, object detection, path planning, image classification, quality inspection, state detection and more. Enabled by PolyScope X, the UR AI Accelerator also gives developers the freedom to choose exactly what toolsets, programming languages and libraries they want to use and the flexibility to create their own programs.
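I haven’t worked with PolyScope X, so the sketch below stands in the open-source ur_rtde Python bindings for the real toolkit, just to show the shape of a vision-guided pick. The detector stub, robot IP, and pose values are placeholders; on the actual product, pose estimation would come from the Isaac-accelerated pipeline and the Orbbec camera, and the program would live inside PolyScope X rather than an external script.

```python
# Vision-guided pick sketch using the open-source ur_rtde bindings as a
# stand-in for the UR AI Accelerator / PolyScope X toolkit (which has its
# own, different APIs). Robot IP, poses, and the detector are placeholders.
from rtde_control import RTDEControlInterface  # pip install ur_rtde

def detect_object_pose() -> list[float]:
    """Placeholder for the Isaac/Orbbec pose-estimation pipeline.
    Returns a TCP pose [x, y, z, rx, ry, rz] in meters/radians."""
    return [0.40, -0.15, 0.05, 0.0, 3.14, 0.0]  # fabricated example pose

def pick(robot_ip: str = "192.168.1.100") -> None:
    rtde_c = RTDEControlInterface(robot_ip)
    target = detect_object_pose()
    hover = target.copy()
    hover[2] += 0.10                      # approach 10 cm above the part
    rtde_c.moveL(hover, 0.25, 1.2)        # linear move: pose, speed, accel
    rtde_c.moveL(target, 0.05, 0.5)       # slow descent to grasp pose
    # ... close gripper here (gripper APIs vary by vendor) ...
    rtde_c.moveL(hover, 0.25, 1.2)        # retract
    rtde_c.stopScript()
```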

The PolyScope X platform is globally available and can be used for all cobot automation applications across industries. With a small hardware upgrade, the software is compatible with UR’s e-Series cobots and the new-generation cobots UR20 and UR30.
