GigaOm Report Highlights Data Management Tools' Role in Meeting Unstructured Data Challenges

You have been bombarded with analyst predictions about the gigantic quantities of data generated by the Industrial Internet of Things. Most of this is beyond the capacity and/or utility of traditional historians, perhaps even SQL databases. This news item covers a report and analysis on building a data management strategy. If you aren’t there yet, you will be soon, even if you are not a Fortune 500 company.

Datadobi, the global leader in unstructured data management software, today released a new report titled, “Building a Modern Data Management Strategy” compiled by GigaOm, a technology research and analysis company and leading global voice on enterprise technologies. The GigaOm analysis reveals the challenges enterprises focused on hybrid and multi-cloud infrastructure are facing today, the increasing need for unstructured data management, and the role played by Datadobi solutions in addressing these requirements.

Opening with an examination of how data management has become key to modern IT strategies, the GigaOm research goes on to explain why meeting the growing demand for IT infrastructure flexibility, while keeping control over data and making it quickly available, more secure, and reusable, is becoming the only viable way to keep budgets under control and create additional opportunities for the entire organization.

According to the report, Datadobi provides “a complete set of tools that will help users address today and tomorrow’s infrastructure-driven, data management challenges.” As a result, its solutions are able to quickly impact infrastructure TCO with immediate results, offering customers a core data management foundation that is growing with additional options for users in every type of industry.

GigaOm report author Enrico Signoretti commented, “Because of data growth, data management is now a necessity in order to understand the data and know what to do with it. The first step towards a sustainable long-term data storage strategy is to understand what, how, and why we save in our storage systems and then take actions depending on the business and operational needs. From this point of view, Datadobi is uniquely positioned to offer a core data management foundation that is growing with additional options for users in every type of industry.”

“This report clearly explains the importance of approaching data as a resource and not a liability, and as a result, why effectively managing unstructured data is key to success,” said Carl D’Halluin, CTO of Datadobi. “Our experience and track record of innovation means we have solutions that help large enterprises address these needs to build modern data management strategies that meet their evolving needs.”

PAS Releases Sensor and Data Integrity

New capability ensures configuration data integrity and signal tracing to improve process safety, reduce cyber risk and support digital transformation

PAS (now PAS Global, part of Hexagon) has long provided valuable and interesting solutions for process automation. Its Integrity series of configuration management tools, now integrated with its cybersecurity work, offers many benefits. This announcement was highlighted at our meetings (virtual, of course) at the 2021 ARC Industry Forum.

PAS Global announced Sensor Data Integrity, a new Automation Integrity module, which enables industrial organizations to ensure configuration data integrity for smart and traditional sensors with signal tracing and validation. This addition to Automation Integrity helps reduce both process safety and cyber risk in support of digital transformation and Industrie 4.0 initiatives.

As industrial organizations expand their deployment of smart sensors, it is becoming increasingly complex to manage configuration consistency across field device management, distributed control systems (DCS), programmable logic controllers (PLC), safety instrumented systems (SIS), historians and other operational technology (OT). Managing the complex configuration of millions of multi-vendor sensors consistently has become a major challenge for industrial companies. The lack of effective sensor management also puts digital transformation initiatives at risk of falling short of their intended benefits, potentially wasting multimillion-dollar investments.

The new Sensor Data Integrity module provides multi-vendor:

  • Discovery of smart, industrial IoT, and traditional analog sensors
  • Visibility to the complete inventory and potential cyber vulnerability for sensors
  • Creation of templates to define approved configuration for each sensor type
  • Automated detection of configuration errors
  • Automated identification of devices that don’t match assigned templates
  • Cross-checking of parameters (ranges, units, etc.)
  • Support for large-scale, multi-site sensor deployments
  • Sensor signal tracing, validation and visualization
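The template idea in the list above can be made concrete with a small sketch. This is not PAS code and says nothing about how Sensor Data Integrity is actually built; the `Template` fields, the sensor dictionaries, and the tag names are all hypothetical, chosen only to show what "approved configuration per sensor type" and "devices that don't match assigned templates" could look like in practice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Approved configuration for one sensor type (hypothetical schema)."""
    unit: str
    range_min: float
    range_max: float


def find_mismatches(sensor: dict, template: Template) -> list[str]:
    """Cross-check a sensor's unit and range against its template."""
    problems = []
    if sensor["unit"] != template.unit:
        problems.append(f"unit is {sensor['unit']}, approved unit is {template.unit}")
    actual = (sensor["range_min"], sensor["range_max"])
    approved = (template.range_min, template.range_max)
    if actual != approved:
        problems.append(f"range is {actual}, approved range is {approved}")
    return problems


def audit(sensors: list[dict], templates: dict[str, Template]) -> dict[str, list[str]]:
    """Map each non-conforming sensor tag to the list of problems found."""
    report = {}
    for sensor in sensors:
        template = templates.get(sensor["type"])
        if template is None:
            report[sensor["tag"]] = ["no template assigned"]
            continue
        problems = find_mismatches(sensor, template)
        if problems:
            report[sensor["tag"]] = problems
    return report


# Hypothetical inventory: one good sensor, one with the wrong unit,
# and one whose type has no approved template yet.
templates = {"pressure": Template(unit="bar", range_min=0.0, range_max=16.0)}
sensors = [
    {"tag": "PT-101", "type": "pressure", "unit": "bar", "range_min": 0.0, "range_max": 16.0},
    {"tag": "PT-102", "type": "pressure", "unit": "psi", "range_min": 0.0, "range_max": 16.0},
    {"tag": "TT-201", "type": "temperature", "unit": "degC", "range_min": -50.0, "range_max": 150.0},
]
report = audit(sensors, templates)
for tag, problems in report.items():
    print(tag, problems)
```

Only the sensors that deviate show up in the report, which is the point: the team's time goes to correcting issues rather than hunting for them.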

The information provided by Sensor Data Integrity can also be leveraged by sensor asset management systems (AMS) to support instrument calibration and can feed PAS Cyber Integrity to support cybersecurity vulnerability assessments.

“PAS has a strong history of customer-led innovation and the development of Sensor Data Integrity builds on that tradition,” said Eddie Habibi, PAS Founder. “The expansion of smart sensors is making it increasingly difficult for operations teams to monitor for configuration drift and inconsistencies. This means teams are spending more time trying to find issues instead of correcting them, which increases the risk of poor plant performance and cyber vulnerabilities. PAS, now part of Hexagon, is the first technology provider addressing this challenge with a multi-vendor solution that works across OT systems.”

With Sensor Data Integrity, industrial organizations will:

  • Reduce manual effort in reconciling sensor and field device configurations
  • Improve plant performance and reduce safety risk (e.g., fewer unit trips due to bad configurations)
  • Reduce sensor configuration drift and errors by more than 40%
  • Enhance decision-making with higher-quality sensor diagnostics
  • Leverage sensor data for vulnerability assessment and obsolescence planning
  • Reduce sensor-related cost overruns before startup (e.g., accelerated loop check out)

“Multi-vendor sensor configuration management is a long-standing challenge in the industrial sector and the problem is only getting worse with the proliferation of smart sensors,” said Larry O’Brien, Research VP, ARC Advisory Group. “In a 2017 study, ARC estimated the process industries lose as much as $1 trillion per year due to unplanned downtime. Misconfigured or inconsistent sensor configurations are key contributors to these events. We are pleased to see PAS, with support from key customers, has introduced sensor data integrity to address this pervasive and growing problem.”

Collaborative Information Server for integrated management of plant operations

Here is the Yokogawa announcement from this year’s ARC Industry Forum. This Collaborative Information Server fits firmly in the trend of providing plant data for better decision-making at both the operations and management level.

Yokogawa Electric Corporation has developed Collaborative Information Server (CI Server) as part of the OpreX Control and Safety System family. The solution will integrate the handling of all kinds of data from plant facilities and systems to enable the optimized management of production activities across an entire enterprise, and provide the environment needed to remotely monitor and control operations from any location. By reducing the need for travel, this also helps to lessen the risk of infection with COVID-19.

As supply chains stretch across the globe and customer needs grow ever more diverse, many companies today are having to deal with increasing complexity in the supply of raw materials and in their operations. At the same time, they are experiencing labor shortages as their most experienced operators age and retire. Under these circumstances, companies must pursue efficiency in their operations and make decisions quickly in response to market changes if they are to remain profitable. And to streamline operations and ensure the safety of their workforce, there is a rapidly growing need for remote solutions that will enable personnel to work together without actually having to be on site. 

To meet these needs, Yokogawa has developed CI Server, a solution that automatically aggregates the data that has been acquired from plant facilities and systems so that personnel in any location can monitor and operate them and have access to all the information needed to make swift and effective decisions.


1. The streamlining of operations and assurance of safety through an operating environment that can be accessed anywhere 

A plant operates most effectively when there is full collaboration between plant operators, experts in areas such as maintenance and quality management, and decision makers at headquarters, as well as with other plants. CI Server provides a remote operation environment that supports wide-area communications and allows plant operations to be monitored and controlled from remote locations such as integrated operations centers. CI Server can be used from any PC or mobile device with a web browser to monitor and control a plant’s operations. Efficient operation from any location is facilitated by creating a suitable dashboard for an organization and granting the necessary access permissions. 

As well as helping to facilitate smooth collaboration among decision makers and other experts who are not at plant sites, CI Server also aids in the efficient operation and management of power plants that are often spread out over a wide area and situated in harsh environments, offshore installations, and other facilities. Furthermore, by eliminating the need for travel to plant sites, CI Server reduces the need for interpersonal contact and thereby leads to a lower risk of COVID-19 transmission for individuals, companies, and the community. 

2. Smooth data integration and centralized management of the information needed for swift decision making 

Facilities and systems often differ in the data formats and the communications protocols that they employ, and this complicates the aggregation of information and the management of data in a unified format. CI Server supports a range of communications protocols, and can not only acquire process data from control systems, but also aggregate data such as the operational status of facilities and equipment, raw material and finished product inventory, and energy consumption. Data on equipment maintenance, product quality, and other items are all gathered automatically in real time, converted to a unified format, and linked and associated. Data from a wide variety of systems and devices made by different vendors can also be gathered and integrated, both on an individual and multiple plant basis. 

The collection and organization of the required data using a unified format previously had to be done manually. The automation of these tasks by CI Server saves time and ensures that the right information is delivered in real time to the right persons, for swift and effective decision making. CI Server enables a quick response to market changes and aids in the optimization of costs and enhancement of operational efficiency not only at individual plants but across an entire company’s manufacturing operations.
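To picture what "converted to a unified format" means, here is a minimal sketch. It is not Yokogawa's implementation and does not use any CI Server API; the two source record shapes, the field names, and the tags are invented for illustration. Each source gets a small adapter, and everything downstream sees one record shape.

```python
from datetime import datetime, timezone


def from_dcs(rec: dict) -> dict:
    """Adapter for a hypothetical control-system export (epoch seconds, numeric value)."""
    return {
        "source": "dcs",
        "tag": rec["tag"],
        "value": float(rec["val"]),
        "time": datetime.fromtimestamp(rec["ts"], tz=timezone.utc),
    }


def from_lab(rec: dict) -> dict:
    """Adapter for a hypothetical lab system row (ISO 8601 time, value stored as text)."""
    return {
        "source": "lab",
        "tag": rec["sample_point"],
        "value": float(rec["result"]),
        "time": datetime.fromisoformat(rec["sampled_at"]),
    }


def unify(dcs_records: list, lab_records: list) -> list:
    """Convert both feeds to the common shape and order them by time."""
    merged = [from_dcs(r) for r in dcs_records] + [from_lab(r) for r in lab_records]
    return sorted(merged, key=lambda r: r["time"])


# Hypothetical inputs: 1612137600 is 2021-02-01T00:00:00Z.
dcs_records = [{"tag": "FIC-100.PV", "val": 12.5, "ts": 1612137600 + 600}]
lab_records = [
    {"sample_point": "QA-7", "result": "98.7", "sampled_at": "2021-02-01T00:05:00+00:00"}
]
unified = unify(dcs_records, lab_records)
for row in unified:
    print(row["time"].isoformat(), row["source"], row["tag"], row["value"])
```

Adding another vendor's system then means writing one more adapter, not reworking every consumer of the data.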

3. Use of data in operational improvement activities and in applications

Not only is CI Server’s integration of data useful in managing plants, it also helps to improve production efficiency and quality. CI Server enables the linkage of data in a unified format so that it can be used across the board in information systems, quality improvement systems, data analysis applications, and other such systems, and the data collected over long time periods can be automatically incorporated into and utilized by such systems and applications. 

For example, in process industries where the quality of raw materials and the soundness of production equipment is closely linked to the quality of finished products, the analysis of data with the assistance of artificial intelligence (AI) software can identify new correlations and important key performance indicators that can help to reduce failures and improve overall operations. Furthermore, by constructing a digital twin for a plant using data that has been integrated by CI Server, it is possible to verify new solutions, new parameter settings, and other such operational improvements in advance. 

Shigeyoshi Uehara, a Yokogawa vice president and head of the IA Systems and Service Business Headquarters, says, “The new Collaborative Information Server solution provides the requisite data management infrastructure for customers to carry out their digital transformation (DX). Yokogawa calls the future of the manufacturing industry IA2IA, industrial automation to industrial autonomy, and we will help industries with this migration from being automated to being autonomous. Yokogawa possesses solutions to improve operational efficiency, energy efficiency, quality, and other aspects through data utilization, and with CI Server will provide support to customers for improved production activities and for sustainable business growth.” 

Data Platform Brings Order to Data Lake Query Acceleration Chaos

New standard in data virtualization enables organizations to support interactive analytics on the data lake by leveraging Varada ‘dynamic indexing’ technology that automatically accelerates and optimizes analytics workloads with ‘zero data ops’

Data Ops is hot right now. We have our data lakes and ponds and clouds and probably rain, but finding the data, breaking down silos, and manipulating all that stuff takes work. This company just crossed my horizon. Varada has built and released a Data Platform to help you out. Check out its press release.

Varada unveiled its data virtualization platform which helps organizations instantly monetize all of their available data with a predictable and controlled budget. Using a dynamic indexing technology, the Varada Data Platform enables data teams to balance performance and cost of queries at massive scale, without ceding control of their data to third-party vendors.

The Varada Data Platform, available today, offers advantages compared with other data virtualization tools:

  1. Embraces the data lake architecture, allowing organizations to retain full control of their data and avoid vendor lock-in. Because the Varada Data Platform sits atop a customer’s existing data lake, there is no need to move data or budget for additional ETLs and storage, which reduces both cost and complexity while enabling data teams to keep data secure under consistent policies.
  2. Offers “glass box” visibility into how workloads perform. Data teams get deep visibility into workload performance and cluster utilization. They can easily define workload priorities, business requirements and budget. Varada automatically optimizes workloads to meet those performance and budget requirements. Even without the input of data architects, Varada continuously monitors workloads to identify heavy users, hotspots, bottlenecks and other issues and, using machine learning, elastically adjusts the compute and storage cluster. Alternatively, data teams have the option to exercise fine-grained control of budgets and business requirements, so they can gain full control and flexibility.
  3. Applies unique “adaptive indexing” technology to effectively accelerate queries. The Varada Data Platform drastically reduces query execution time and the required compute resources. The key is Varada’s proprietary indexing technology, which breaks data across any column into nano blocks and automatically chooses the most effective index for each nano block based on the data content and structure. This unique indexing technology is what makes queries extremely fast without the need to model data or move it to optimized data platforms.
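The indexing claim in the third item is easier to follow with a toy example. Varada's technology is proprietary and the release gives no details, so what follows is only a sketch of the general idea of picking an index per block from the block's content. The block size, the three index names, and the selection heuristic are all my assumptions, not Varada's.

```python
def split_blocks(values: list, block_size: int) -> list:
    """Cut one column into fixed-size blocks (the last block may be shorter)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def choose_index(block: list) -> str:
    """Pick an index type from the block's content (illustrative heuristic only)."""
    distinct = len(set(block))
    # Few distinct values: a bitmap per value is compact and fast to intersect.
    if distinct <= max(1, len(block) // 4):
        return "bitmap"
    # Already ordered: keeping min and max is enough to skip the block on range filters.
    if all(a <= b for a, b in zip(block, block[1:])):
        return "min_max"
    # Otherwise fall back to a hash lookup for point queries.
    return "hash"


# Hypothetical column whose character changes from block to block.
column = (
    [1, 1, 1, 1, 2, 2, 2, 2]
    + [10, 20, 30, 40, 50, 60, 70, 80]
    + [7, 3, 9, 1, 8, 2, 6, 4]
)
plan = [choose_index(block) for block in split_blocks(column, block_size=8)]
print(plan)
```

The takeaway is that a single column does not have to live under one index type; each block can get whatever suits its data.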

“The beta period for this product has proven two things,” said Eran Vanounou, CEO of Varada. “First, that organizations are desperate for a way to simplify data ops management while getting the cost of query acceleration under control. Second, the path we’ve chosen is striking a chord: Varada is a ‘zero data ops’ approach that eliminates data silos by serving many workloads from one platform. And because all queries will run atop the data lake, there is a single source of truth that eliminates the need to move or model data. With several dozen early users on the platform, it’s time to bring this innovative approach to a market that’s ready for it.”

Pricing and Supported Data Sources 

The Varada Data Platform currently runs on AWS and supports reserved, on-demand and spot instances. Pricing is per-node, based on a predefined scaling group. The Varada Data Platform is available on AWS Marketplace with integrated billing through AWS, or via AMI (Amazon Machine Image). Enterprise support is also available from Varada.

The platform supports a wide range of data sources and formats, including:

  • Data Formats: ORC, Parquet, JSON, CSV and more
  • Data Catalogs: Hive Metastore, AWS Glue
  • Additional Data Sources: PostgreSQL, MySQL and more

Support for GCP and Azure is coming soon.

About Varada 

The Varada mission is to enable data practitioners to go beyond the traditional limitations imposed by data infrastructure and instead zero in on the data and answers they need—with complete control over performance, cost and flexibility. In Varada’s world of big data, every query can find its optimal plan, with no prior preparation and no bottlenecks, providing consistent performance at a petabyte scale. Varada was founded by veterans of the Dell EMC XtremIO core team and is dedicated to leveraging the data lake architecture to take on the challenge of data and business agility. Varada has been recognized in the Cool Vendors in Data Management report by Gartner Inc.

Shape your future with data and analytics

Microsoft Azure had its day on Dec. 3 just as I was digesting the news from rival Amazon Web Services (AWS). The theme was “all about data and analytics.” The focus was on applications Microsoft has added to its Azure services. Anyone who ever thought that these services stopped at being convenient hosts for your cloud missed the entire business model.

Industrial software developers have been busily aligning with Microsoft Azure. Maybe that is why there was no direct assault on their businesses like there was with the AWS announcements. But… Microsoft’s themes of breaking silos of information and combining advanced analytics have the possibility of rendering moot some of the developers’ own tools—unless they just repackage those from Microsoft.

The heart of yesterday’s virtual event was summed up by Julia White, Corporate Vice President, Microsoft Azure, in a blog post.

Over the years, we have had a front-row seat to digital transformation occurring across all industries and regions around the world. And in 2020, we’ve seen that digitally transformed organizations have successfully adapted to sudden disruptions. What lies at the heart of digital transformation is also the underpinning of organizations who’ve proven most resilient during turbulent times—and that is data. Data is what enables both analytical power—analyzing the past and gaining new insights, and predictive power—predicting the future and planning ahead.

To harness the power of data, first we need to break down data silos. While not a new concept, achieving this has been a constant challenge in the history of data and analytics as its ecosystem continues to be complex and heterogeneous. We must expand beyond the traditional view that data silos are the core of the problem. The truth is, too many businesses also have silos of skills and silos of technologies, not just silos of data. And, this must be addressed holistically.

For decades, specialized technologies like data warehouses and data lakes have helped us collect and analyze data of all sizes and formats. But in doing so, they often created niches of expertise and specialized technology in the process. This is the paradox of analytics: the more we apply new technology to integrate and analyze data, the more silos we can create.

To break this cycle, a new approach is needed. Organizations must break down all silos to achieve analytical power and predictive power, in a unified, secure, and compliant manner. Your organizational success over the next decade will increasingly depend on your ability to accomplish this goal.

This is why we stepped back and took a new approach to analytics in Azure. We rearchitected our operational and analytics data stores to take full advantage of a new, cloud-native architecture. This fundamental shift, while maintaining consistent tools and languages, is what enables the long-held silos to be eliminated across skills, technology, and data. At the core of this is Azure Synapse Analytics, a limitless analytics service that brings together data integration, enterprise data warehousing, and Big Data analytics into a single service offering unmatched time to insights. With Azure Synapse, organizations can run the full gamut of analytics projects and put data to work much more quickly, productively, and securely, generating insights from all data sources. And, importantly, Azure Synapse combines capabilities spanning the needs of data engineering, machine learning, and BI without creating silos in processes and tools. Customers such as Walgreens, Myntra, and P&G have achieved tremendous success with Azure Synapse, and today we move to global general availability, so every customer can now get access.

But, just breaking down silos is not sufficient. A comprehensive data governance solution is needed to know where all data resides across an organization. An organization that does not know where its data is, does not know what its future will be. To empower this solution, we are proud to deliver Azure Purview—a unified data governance service that helps organizations achieve a complete understanding of their data. 

Azure Purview helps discover all data across your organization, track lineage of data, and create a business glossary wherever it is stored: on-premises, across clouds, in SaaS applications, and in Microsoft Power BI. It also helps you understand your data exposures by using over 100 AI classifiers that automatically look for personally identifiable information (PII), sensitive data, and pinpoint out-of-compliance data. Azure Purview is integrated with Microsoft Information Protection which means you can apply the same sensitivity labels defined in Microsoft 365 Compliance Center. With Azure Purview, you can view your data estate pivoting on classifications and labeling and drill into assets containing sensitive data across on-premises, multi-cloud, and multi-edge locations.


Yesterday, Microsoft announced that the latest version of Azure Synapse is generally available, and the company also unveiled a new data governance solution, Azure Purview.

In the year since Azure Synapse was announced, Microsoft says the number of Azure customers running petabyte-scale workloads – or the equivalent of 500 billion pages of standard printed text – has increased fivefold.

Azure Purview, now available in public preview, will initially enable customers to understand exactly what data they have, manage the data’s compliance with privacy regulations and derive valuable insights more quickly.

Just as Azure Synapse represented the evolution of the traditional data warehouse, Azure Purview is the next generation of the data catalog, Microsoft says. It builds on the existing data search capabilities, adding enhancements to help customers comply with data handling laws and incorporate security controls.

The service includes three main components:

  • Data discovery, classification and mapping: Azure Purview will automatically find all of an organization’s data on premises or in the cloud and evaluate the characteristics and sensitivity of the data. Beginning in February, the capability will also be available for data managed by other storage providers.
  • Data catalog: Azure Purview enables all users to search for trusted data using a simple web-based experience. Visual graphs let users quickly see if data of interest is from a trusted source.
  • Data governance: Azure Purview provides a bird’s-eye view of a company’s data landscape, enabling data officers to efficiently govern data use. This enables key insights such as the distribution of data across environments, how data is moving and where sensitive data is stored.
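For readers who have not worked with a data catalog, the discovery and classification step can be pictured with a very small sketch. This does not use Azure Purview or any Microsoft API, and plain regular expressions stand in for the AI classifiers mentioned above. The catalog layout, the classifier names, and the sample values are all hypothetical.

```python
import re

# Two simple pattern-based classifiers (stand-ins for real ones).
CLASSIFIERS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_column(values: list) -> list:
    """Return the sorted names of classifiers that match any sampled value."""
    labels = set()
    for value in values:
        for name, pattern in CLASSIFIERS.items():
            if pattern.search(str(value)):
                labels.add(name)
    return sorted(labels)


def scan(catalog: dict) -> dict:
    """Map each asset to its sensitive columns; assets with no findings are omitted."""
    findings = {}
    for asset, columns in catalog.items():
        flagged = {}
        for column, sample in columns.items():
            labels = classify_column(sample)
            if labels:
                flagged[column] = labels
        if flagged:
            findings[asset] = flagged
    return findings


# Hypothetical catalog: asset name -> column name -> sampled values.
catalog = {
    "crm.customers": {
        "contact": ["ann@example.com"],
        "tax_id": ["123-45-6789"],
        "city": ["Oslo"],
    },
    "plant.readings": {"value": ["12.5", "13.1"]},
}
findings = scan(catalog)
print(findings)
```

The result is a map of where sensitive data lives, which is the raw material for the governance view described in the third bullet.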

Microsoft says these improvements will help break down the internal barriers that have traditionally complicated and slowed data governance.

Do You Need A Data Scientist or Data Engineer?

You will find references to data often in this blog. Perhaps I’ve even been guilty of a phrase, “It’s all about the data.” Back in 2016, I wrote a post where the title included both “data” and “engineering.”

Marketing managers have been pinging me this year, evidently after doing web searches for keywords. They get a match on one of my blog posts and write trying to get a link added or an article published. They usually don’t know my focus or even what type of media this is. Many think I’m traditional trade press.

It must have been in such a manner that the marketing manager for Jelvix, which looks to be a Ukrainian software development and IT services company, wrote to me referencing this post I did in 2016. She referenced an article on the company web site by Python developer Vitaliy Naumenko regarding whether or when you need a Data Scientist or a Data Engineer.

That is an interesting question–one which I have not run across in either my IT or my OT travels.

According to IBM’s CTO report, 87% of data science projects are never really executed. 80% of all data science projects end up failing. Mainly, this happens due to the market’s inability to distinguish data scientists and engineers. 

Even now, it’s surprisingly common to find articles online about data scientists’ responsibilities when some of them belong to the data engineer job description. A lack of understanding of what data scientists can and cannot do leads to a high failure percentage and common burn-out. 

The thing is, neither data scientists nor engineers can act on their own. Scientists hugely depend on engineers to provide infrastructure. If it’s not set up correctly, even the most skilled scientists with excellent knowledge of complex computational formulas will not execute the project properly.

The data development and management field includes many specialties. Data engineers and scientists are only some of the roles necessary in the field. These positions, however, are intertwined – team members can step in and perform tasks that technically belong to another role.

Check out this image, for example. I like the addition of business as well as technology.

Check out his entire article if you are involved with doing something with all the data you are collecting. He suggests team structures for small, medium, and larger organizations. Unfortunately for me, industrial or manufacturing markets are not listed as specialties of the company. But the company has some good ideas to share.