Data is the foundation of all modern software. Without data, what’s the point? Now, what happens when we need to move that data from one application and/or database to another? That’s where ETL tools come into play. ETL stands from Extract, Transform, Load and essentially means data migration from one system to another. Sometimes these tools are in constant operation, sitting between various applications to manage continuous data migration. Many times these tools can be over-utilized and implemented in places they really don’t belong. My vision for ETL has always been a one-time thing, used exclusively to migrate data from one system to the other, not as an integration tool. However, there are plenty of use cases when ETL is the best option. This blog addresses some existing options out there and some suggestions of how to improve your landscape.
Unless we’re talking about a one time data migration, committing to an ETL tool is a big deal. You’re going to want to choose a tool that can be hooked into anything and everything within your existing architectures. Since this tool is effectively ushering data from one system to another, it needs to work with everything, even if you don’t intend on integrating everything all at once. As soon as the tool is integrated into your environment it’s not going anywhere anytime soon. Let’s take a look at some of the existing use cases…
Data Migration
This is the foundation of all ETL. You’re upgrading/switching systems and you need to move data from one place to another. Whether that’s a database or software is insignificant. A developer needs to define the necessary transformations, accounting for variations in data types and encoding. Once this is done we basically hit go and hope for the best. In my experience, it never works the first time. So we iterate over and over again until the data finally moves as expected. While this is never a fun job, it is the perfect use case for ETL and ultimately what ETL was designed for in the first place.
Data Merge
This is a far more common occurrence than you might think. Whether it’s different divisions in a company trying to combine systems or corporate mergers and acquisitions, there are plenty of cases where data needs to be combined. This is very similar to the data migration use case, but in this case the data may be merging into one of the existing systems or a new system altogether. It can also be coming from multiple sources at the same time. In either case, the data merge is a one-time job.
Data Warehousing
The typical data warehousing case is when day to day transactional data is rolled up and used for data analytics. ETL tools are used to move data from the run-the-business software like CRM and ERP tools, to the data warehouse. A data warehouse is just a fancy term for a database designed with analytics in mind; these can be SQL or NoSQL databases. Of course, HarperDB can act as a data warehouse as well, but it can also handle the transactional steps that come before. The ETL operations that move data from the run-the-business software are in constant operation. They are usually time interval based or based on triggered events. ETL in constant operation, is an all too common design pattern of which I’m personally not very comfortable. It’s an added middleware layer that is simply complicating the process.
All of this is great, but what if there was a better way? This is one of the founding principles of HarperDB. We created our database solution because we think layers and layers of different software are unnecessary. HarperDB removes the need for ETL between many systems as it acts as an HTAP database, handling both high volume ingests and queries. In fact, HarperDB even offers additional SQL query functions to achieve transformations not possible in standard database solutions. Instead of adding complexity with ETL tools, why not give HarperDB a try for your data warehousing platform?