February 19, 2018

The Beowulf ARMy and the Advantage of HTAP for IoT

Stephen Goldberg
CEO & Co-Founder of HarperDB

One of the things we have been talking about a lot in the IoT space is taking advantage of existing capital investments for HTAP (hybrid transactional/analytical processing) workloads. For example, many companies are purchasing large numbers of IoT devices; however, in a lot of cases those devices are being used for data collection only.

Something that really interests me is the idea of using those devices for pushing data processing workloads onto the edge and maximizing those capital investments.  

Steven J. Vaughan-Nichols wrote a really great article, "Build your own supercomputer out of Raspberry Pi boards." This concept is slightly different from what I am suggesting; however, it demonstrates fairly nicely that with commodity hardware and a Beowulf cluster you can achieve compute power comparable to a rather expensive machine for less than $2000.

Here is a comparison of the supercomputer mentioned in the article to other commercially available hardware with similar compute that I found via a quick Google search.

| Machine | Raspberry Pi Supercomputer | HPE ProLiant DL380 G9 Server | Dell PowerEdge R815 |
| --- | --- | --- | --- |
| RAM | 256GB | 256GB | 256GB |
| CPU | 32 x 3.1GHz Intel Xeon E3-1225 quad-core processor | 2 x Intel Xeon 12 Core E5-2678V3 2.50GHz 30MB Intel SmartCache 120W TDP | 4 x AMD Opteron 12 Core Processors 6174 2.2GHz 12MB L3 Cache |
| Cores | 128 | 24 | 48 |
| Cost | $1500 | $11,904 | $3196 |

Keep in mind that the ZDNet article is 5 years old, so Raspberry Pis have improved and the cost has gone down. That said, even from a simple comparison you can see that, from a compute/cost perspective, you are getting a lot more bang for your buck with the Raspberry Pi Beowulf cluster.
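To make the compute/cost comparison concrete, here is a minimal sketch that divides each machine's price by its core count, using only the figures from the table above. Cost per core is a crude proxy that I am assuming for illustration: cores on different architectures are not equivalent, and the helper name `cost_per_core` is hypothetical.

```python
# Figures are copied from the comparison table above; nothing here is measured.
machines = {
    "Raspberry Pi Supercomputer": {"cores": 128, "cost_usd": 1500},
    "HPE ProLiant DL380 G9 Server": {"cores": 24, "cost_usd": 11904},
    "Dell PowerEdge R815": {"cores": 48, "cost_usd": 3196},
}


def cost_per_core(cores: int, cost_usd: float) -> float:
    """Return the upfront cost in dollars per core (a rough proxy only)."""
    return cost_usd / cores


for name, spec in machines.items():
    print(f"{name}: ${cost_per_core(**spec):.2f} per core")
```

By this rough measure the Pi cluster comes out at under $12 per core, against roughly $67 and $496 for the two servers.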

As I mentioned above, this is slightly different from the use case that interests me; however, it does strongly demonstrate the point that commodity hardware can be used to achieve greater performance at a lower cost. That said, configuring a 32-node system obviously involves a lot of complexity, and maintaining it is harder than maintaining a single vertically scaled server. Additionally, commodity hardware is not necessarily built to the standard of these servers, so failures may occur more frequently, and using the hardware in this tightly coupled configuration could cause outages due to those failures.

That is why I am more interested in the concept of harnessing that compute power as a distributed cluster. Let’s say you are trying to build the next major social network and you are expecting 100,000 visitors an hour.

Imagine you took those same 32 nodes in the Raspberry Pi supercomputer and deployed them individually, potentially even in different geos, behind a load balancer like nginx. If your application/platform supports horizontal scale, and most do today, this should be pretty straightforward. Now imagine that on each of those Raspberry Pis or DragonBoards you deployed an instance of HarperDB, or another IoT database that can be installed directly on an ARM build and supports clustering and replication. In the case of using HarperDB with our clustering capability, you could replicate the data and the entire schema across all 32 nodes. This allows each of those nodes to be independent of the others, rather than dependent on them as in the supercomputer use case above.

If one of the nodes fails, which is bound to happen, it doesn’t affect the cluster. Because of the clustering and replication capability in HarperDB, all of the data is synced to the other 31 nodes in real time, so nothing is lost. Load can simply be transferred from the failed node to the 31 active nodes. If your load balancer is configured to round-robin traffic across the nodes, each node should get roughly 3,125 visitors per hour (100,000 / 32). That is about 52 visitors per minute. Each one of those Pis should easily be able to handle that traffic. If a node fails, the remaining nodes each pick up only about 100 extra visitors per hour, and they should easily be able to handle even double their normal traffic. You now have a geographically redundant infrastructure that could support irs.gov for $1500 in upfront capital cost.
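The load arithmetic can be checked with a small simulation. This is only a sketch of round-robin distribution under the assumptions in the text (100,000 visitors per hour, 32 nodes, evenly spread traffic). It is not how nginx or HarperDB is implemented, and the node names and the `round_robin` helper are hypothetical.

```python
from collections import Counter
from itertools import cycle


def round_robin(visitors: int, nodes: list[str]) -> Counter:
    """Assign each visitor to the next node in turn and count hits per node."""
    counts = Counter({node: 0 for node in nodes})
    for _, node in zip(range(visitors), cycle(nodes)):
        counts[node] += 1
    return counts


VISITORS_PER_HOUR = 100_000
nodes = [f"pi-{i:02d}" for i in range(32)]  # hypothetical node names

# All 32 nodes healthy.
healthy = round_robin(VISITORS_PER_HOUR, nodes)
print("busiest node, 32 up:", max(healthy.values()))

# One node has failed, so the balancer rotates over the remaining 31.
degraded = round_robin(VISITORS_PER_HOUR, nodes[1:])
print("busiest node, 31 up:", max(degraded.values()))
```

With all nodes up, each node serves 3,125 visitors per hour. With one node down, the busiest node serves 3,226, an increase of about 3%, which is well short of doubling.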

To me, this is one of the most promising possibilities surrounding micro-computing. It enables new companies to get up and running for a fraction of the cost. Furthermore, it enables existing companies to rapidly prototype and innovate in new ways at a much lower price point. This will increase their margins, allowing them to build new and exciting technologies and potentially reduce their costs, which will benefit consumers globally.
