HTAP Database: Breaking Down the Basics
At HarperDB we have been using the term “HTAP” a lot and I thought it would be helpful to break down this newish term that was coined a few years back by Gartner. HTAP stands for Hybrid Transactional/Analytical Processing and is a database model that can ingest high volumes of data while also enabling analytical processing concurrently. Existing architectures usually enable one or the other. NoSQL is excellent at horizontally scaling and enabling ingests of high volumes of data but is severely limited in deep analytics.
You can add indices to obtain your data in different ways, but this duplicates data and takes up more resources and increases cost. Even with these added indices you will never keep ahead of the data needs of your business. Traditional relational databases are excellent at deep analytics but sending high volumes of data will ultimately cause it to fall over as they vertically scale. On high volume loads traditional relational databases will start to incur row locking, collisions, connection pool issues, indexing issues, and on and on…
With an HTAP database’s single architecture, you are cutting out the need for multi-model databases or complex multi-tiered database architectures. Many times having more might seem like the better option which means multiple databases where each one fulfills a specific workload or one database that transforms the unstructured to structured to allow for analytics and ingest. But as Big Data becomes more complex, higher in volume, and with velocity faster than “real time”, the need for a single-model architecture that does it all starts making more sense. HTAP fulfills the holy grail of a single source of data truth without the overhead of transforms or pushing data through a Rube Goldberg machine of databases and ETL.
Flexing Full NoSQL CRUD with the Power of Schema-less SQL
Multi-tiered database architectures seemingly give flexibility that can handle data needs to organizations from modern enterprises to startups but there is a cost. That cost is literal in the sense of cloud fees and inflated infrastructure costs on top of maintaining a large team to keep it all running. Every step of the way there are potential failure points as well as administration and knowledge that are specific to each system. This limits an organization’s ability to adapt quickly to their changing data needs.
HTAP creates a straight path through this database maze. Offering a single model that has horizontal scaling and flexibility of NoSQL while providing the analytical power of SQL. The horizontal scaling enables massive ingest capacity, while concurrently allowing for reads. This is all available via a robust REST API that gives options to execute all of your operations via full NoSQL CRUD or full SQL CRUD. Rather than multiple databases each with their own indexing schemas, you have a fully indexed database allowing for robust analytics with no additional storage or memory overhead. Schemaless SQL gives your team freedom from adding columns or constraining to data types, because the HTAP database manages it all internally. Wrapping all of this functionality under a REST API creates a unique architecture of database as a micro-service.
HTAP can provide power to IoT
Now that we understand more about an HTAP architecture we can put that together with an IoT project. Utilizing an HTAP IoT model that can run on edge devices, sensors can transact locally in a cluster while at the same time, real-time analytics can execute on an individual device or across the entire network of devices. The devices can also replicate to a central server where more resource intensive analytics can be performed. The benefits from this are cost and time. Utilizing existing devices rather than relying on a server or cloud cluster keeps costs down as the IoT project is leveraging the network of devices. Time is also saved as the data does not need to sync to external services and is immediately available to report on via SQL.
In a real life example, a chain of convenience stores could maintain an IoT network where each store on the edge is recording data on its milk inventory as well as sensor data on refrigerators cooling temperature, etc. The central office can continuously report on inventory levels across stores to determine where they need to ship further products on a true per need basis. Meanwhile the store can receive alerts that a cooling system is having issues in real-time. Both scenarios keep the customer experience consistent as well as keep costs low with understanding what is happening at every store rather than relying on estimates or total equipment failure.
Clustering, the Glue to your HTAP/IoT Database
Database scale was historically achieved via vertically scaling a server. This creates a single point of failure and ultimately creates hard limits to how large of a machine can actually be created or can be budgeted. HTAP avoids this pitfall with clustering. Clustering is horizontal scaling across a network of devices and/or servers and allows for replicating data across devices which adds fail over to the data an organization relies on. Clustering your IoT devices with an HTAP database with a native REST API gives the ability to network across regions, warehouses, or continents allowing a business to get a seamless view of their organization while maintaining transactional uptime and robust reporting from within a single database.