Lessons from Subsea Oil & Gas on Managing Data Pipelines

In the oil and gas industry, pipelines are an essential component of the business, transporting product from its source to a destination where it can be processed to meet client requirements. As new product sources are found and exploited, they are integrated into existing pipelines for transport to their ultimate destinations. In the subsea oil and gas market, this practice is called a tieback, and it greatly reduces overall project costs because it leverages existing infrastructure to support new fields. Technological advances have allowed operators to extend tieback distances by offering in-field processing that ensures consistent flow within their pipelines. Subsea processing close to the edge allows operators to add new fields to existing pipelines in a very cost-effective manner.

A parallel can be drawn between the subsea tieback industry and the data management market. Data pipelines transport and manage data from a variety of business sources, including systems such as ERP, CRM, and the various business units within a company. When new data sources are “found,” companies integrate them into their existing data pipelines, which transport the data to end users at either a departmental or enterprise level. As many companies begin to tap into their operational data sources, they are finding that their existing data pipelines and processes are becoming strained by significant increases in the volume and diversity of data being collected. Just as new technologies were developed in the early 2000s to support subsea tiebacks, similar advances are now required to support these new data sources, including storage, processing, and overall data management. One key component of this strategy is distributed and edge databases, which can process and filter data at its source while delivering value to consumers at various points along the pipeline.

At many of the companies where we are implementing HarperDB, the data pipelines have long been in place and often consist of traditional twisted-pair Ethernet cables traversing the plant, augmented with Wi-Fi in more remote locations. As these companies begin to adopt machine learning, condition-based maintenance, and real-time dashboards, their data volumes have begun to tax these existing data pipes. In one example, a client attempted to collect high-frequency data from vibration sensors, but the data volumes overwhelmed their network, effectively disabling a majority of their operations. To keep pace with the business need for more real-time sensor data, companies can either invest in new network infrastructure or follow the lead of the subsea tieback industry: utilize existing pipelines by moving processing capabilities closer to the edge.

In traditional IIoT architectures, edge devices collect and forward data to centralized computing environments. This store-and-forward method can be expensive and complex to maintain, as thousands of data streams must be transmitted across existing data pipes and then processed on server farms typically housed in a data center or in the cloud. As the processes become more complex and varied, the costs can increase exponentially. Beyond the increased load on the network pipes, managing the centralized computing infrastructure can be prohibitive for many companies.

Distributed computing offers clients the ability to move their analytics and processing from a centralized environment to the edge. In architectures built on edge databases, IoT sensor data can be written locally on ARM-based microprocessors and processed and filtered immediately. Analytics can occur in real time, and the resulting data reduction allows companies to leverage their existing network investments, their data pipes. Distributed databases such as HarperDB provide a platform for implementing interconnected database nodes across the network, which can aggregate data for specific users while transmitting only the most necessary data to the enterprise for modeling and business intelligence.

In a distributed data pipeline, data would be collected and processed on the edge and distributed only to nodes close to the end users and their applications. In the vibration example, an edge device running a HarperDB IoT database with a filtering and processing tool would be deployed to collect the high-frequency data but distribute only the data that has relevancy to the business. When a dataset meets specific criteria, it is replicated to a node in the control room, where it can be displayed on a real-time HMI (human machine interface) or dashboard. Additional filtering and processing can be completed on the control-room or edge node for use in an enterprise historian (data warehouse). In this scenario, the business can take advantage of its sensor data while maintaining its existing infrastructure, by distributing data management and computing closer to the edge.
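The criteria-based replication step described above can be sketched as follows. This is a hypothetical illustration, not HarperDB's replication mechanism: every reading is stored locally on the edge node, but only readings exceeding an assumed alarm threshold are pushed to the control-room node that feeds the HMI.

```python
from dataclasses import dataclass, field
from typing import List

THRESHOLD_G = 2.5  # hypothetical alarm level, in g, chosen for illustration

@dataclass
class Node:
    """A stand-in for a database node in the distributed pipeline."""
    name: str
    records: List[dict] = field(default_factory=list)

def ingest(edge: Node, control_room: Node, reading: dict):
    """Store every reading locally on the edge node; replicate only
    readings that meet the business criteria (amplitude above threshold)."""
    edge.records.append(reading)
    if reading["amplitude_g"] >= THRESHOLD_G:
        control_room.records.append(reading)

edge = Node("edge")
hmi = Node("control_room")
for i, amp in enumerate([0.4, 1.1, 3.2, 0.7, 2.9]):
    ingest(edge, hmi, {"sensor": "pump-7", "seq": i, "amplitude_g": amp})

print(len(edge.records), len(hmi.records))  # 5 2
```

The edge node retains the full history for local analytics, while the control room receives only the two readings worth a human's attention, exactly the split between local processing and enterprise-bound data the scenario describes.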

Just as subsea oil and gas operators added new product sources into existing pipelines, companies can add new data sources that leverage their existing data pipeline infrastructure by moving processing and data management directly to the source.