Organizations around the world are investing enormous resources in pushing computing to edge devices, with use cases across most industries: self-driving cars, smart grids, healthcare, and many more. These solutions are beginning to take on similar architectural patterns as they evolve from concept to reality.
Most architectures include a lightweight micro-computing edge device like a smart watch, pressure sensor, or camera. Those devices capture data and perhaps cache it locally, but then push it directly to cloud services like AWS for processing. At HarperDB we are focused on what we call the “Data Value Chain”. This is a multi-step process that traditionally looks like this (a rough sketch of these stages follows the list):
- Ingestion/Collection
- Transformation
- Analysis
- Action/Reaction
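To make the chain concrete, here is a minimal Python sketch of the four stages wired together as a single pipeline. The shelf readings, the grams-per-unit conversion, and the reorder threshold are all made up for illustration; this is not HarperDB code.

```python
# Minimal sketch of the data value chain: each stage is a plain function,
# and readings flow through them in order. All names and values below are
# illustrative assumptions, not a specific product API.

def ingest():
    # Ingestion/Collection: raw readings as they arrive from the sensors
    return [{"shelf_id": "A1", "weight_grams": 1240},
            {"shelf_id": "A2", "weight_grams": 80}]

def transform(readings, grams_per_unit=40):
    # Transformation: convert raw shelf weight into an inventory count
    return [{"shelf_id": r["shelf_id"],
             "units": r["weight_grams"] // grams_per_unit}
            for r in readings]

def analyze(inventory, reorder_point=5):
    # Analysis: flag shelves that have fallen below the reorder point
    return [item for item in inventory if item["units"] < reorder_point]

def act(low_stock):
    # Action/Reaction: trigger a restock (here, just print the order)
    for item in low_stock:
        print(f"Restock shelf {item['shelf_id']} ({item['units']} units left)")

act(analyze(transform(ingest())))
```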
In the architecture mentioned above, which is quickly becoming standard, Ingestion/Collection occurs on the edge; however, steps 2 through 4 occur in the cloud. In some cases this is fine and desired. That said, when you want real-time actionability from your data value chain, this can create issues, because it takes time and significant infrastructure to move the data from the edge into the cloud. In Kyle’s latest blog about Hybrid Clouds he covers some of the other pitfalls of this all-in cloud strategy.
The main issue, in my opinion, is speed. Take this scenario: a CPG company wants to do just-in-time manufacturing for its products using real-time updates from sensors on store shelves in several hundred thousand retail locations across the world. It puts simple IoT sensors on store shelves that measure weight to calculate inventory. Those sensors could be sending several billion inventory updates a day.
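As a rough back-of-the-envelope check on that volume (every figure below is an assumption for illustration, not a number from the scenario):

```python
# Back-of-the-envelope estimate of daily update volume.
# All inputs are illustrative assumptions.
stores = 300_000          # "several hundred thousand retail locations"
shelves_per_store = 500   # assumed shelf count per store
readings_per_shelf = 48   # assumed: one weight reading every 30 minutes
updates_per_day = stores * shelves_per_store * readings_per_shelf
print(f"{updates_per_day:,} inventory updates per day")  # 7,200,000,000
```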
Pushing this volume of data to the cloud for transformation and analysis is going to create a significant bottleneck, and it is going to be incredibly expensive. One project we worked on with similar throughput going into AWS cost us roughly $40,000/month for the throughput alone.
The other issue is time to analysis. Because this data is being moved into a single repository in the cloud, it takes significant time, and massive vertical scale, to run the analysis. In that same project we ended up buying a Cray supercomputer to handle our analysis and processing. For some companies, transforming and analyzing can take hours; in other cases it can take days.
Instead, imagine you were able to perform Transformation, Analysis, and Action/Reaction directly on the edge. Rather than a simple caching mechanism on your IoT sensor in the store, you run a full enterprise-class database directly on that device, giving you an intelligent edge.
You could leverage the distributed computing power of those IoT devices at a significantly lower price point than vertically scaled hardware. Furthermore, because you already purchased the devices to meet your collection requirements, and all the data already lives on the edge, it is now possible to perform distributed querying across all the devices in real time. This allows the CPG company in the scenario above to analyze its inventory in real time rather than waiting hours or days, and at a fraction of the cost of cloud services. A sketch of what such a fan-out query might look like follows.
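To illustrate the idea (and only the idea): the snippet below fans a simple SQL query out to a handful of edge nodes in parallel and merges the partial results. The node URLs, the `/query` endpoint, and the request/response shapes are all assumptions made for the sake of the sketch, not a specific HarperDB API.

```python
# Hedged sketch of a real-time fan-out query across edge nodes.
# Endpoint, payload, and URLs are hypothetical placeholders.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

EDGE_NODES = ["http://edge-node-1:9925", "http://edge-node-2:9925"]  # hypothetical

def query_node(base_url, sql):
    # Ask a single edge device to run the query against its local data
    body = json.dumps({"sql": sql}).encode()
    req = urllib.request.Request(f"{base_url}/query", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)  # assumed to be a list of rows

def fan_out(sql):
    # Query every device in parallel and merge the partial results
    with ThreadPoolExecutor(max_workers=len(EDGE_NODES)) as pool:
        results = pool.map(lambda url: query_node(url, sql), EDGE_NODES)
    return [row for partial in results for row in partial]

low_stock = fan_out("SELECT shelf_id, units FROM store.inventory WHERE units < 5")
print(f"{len(low_stock)} shelves need restocking")
```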
At HarperDB it is our strong belief that the data value chain will move to the edge over the next few years, as more and more organizations see the need for real-time analytics on an intelligent edge.