Zachary and I had the pleasure of attending #CES2019 in Las Vegas. We saw a lot of incredible things, from flying vehicles to haptic VR suits combined with gerbil-like balls for humans, plus every imaginable type of AI, 3D printer, image recognition software, and smart home device, including a cocktail maker and a smart diaper.
After wandering around for a while and watching sympathetically as countless teams huddled around robots that were malfunctioning for lack of cloud connectivity, we began asking every presenter a single question: What happens when your solution is offline? The responses ranged from a cute, awkward smile to outright hostility. 100% of them were some variety of “Our solution only works with cloud connectivity.”
Honestly, this astounded Zach and me. No one we spoke to at CES had a persistent data strategy on the edge or any edge decision-making capability. Was this because we are focused on Industrial IoT, and CES was more focused on commercial IoT, where things are less mission critical? We even spoke to folks in industrial verticals who were working on, for example, mobile products for preventative maintenance on commercial properties. Their products didn’t work at all offline. What happens when you’re in the basement or the boiler room during a power outage?
I tried to talk to an interactive robot with AI embedded in it, and because it had no connectivity, all it could say to me was, “Sorry, what was that?” Repeatedly.
One could argue, though, that even consumer-grade IoT has reached the point where it needs to work regardless of network connectivity or bandwidth. Shouldn’t your doorbell work without the internet? It did ten years ago, so why not today? Shouldn’t your fridge, microwave, washing machine, lights, TV, and everything else in your house that is now “smart” keep working too?
Take Amazon Alexa, for example. What happens when it is offline? It doesn’t work at all. Wouldn’t it be great if, when Alexa was offline, you could still do basic tasks like setting reminders and timers, doing math, and spelling words: things that only require you to interact with Alexa, rather than Alexa to interact with the internet? I don’t expect Alexa to be able to tell me how tall Mount Everest is without connectivity, but I do think it should have a persistent data strategy that allows it to interact, store data, and make decisions offline. Alexa can’t do that today because it’s 100% dependent on AWS.
*DISCLAIMER: The rest of this post is going to be a somewhat unabashed promotion of HarperDB, because while I try not to do that normally, I objectively believe it’s the best solution to this problem.
If, instead of being entirely dependent on AWS, each Amazon Alexa had HarperDB embedded inside, it could operate entirely offline, answer any questions that didn’t require a Google search or web service calls to AWS, store tasks and alarms locally, and automatically sync all that data to the cloud when it came back online through HarperDB’s clustering and replication.
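To make the store-locally, sync-later idea concrete, here is a minimal sketch in plain Python using the standard-library `sqlite3` module. It is not HarperDB's API or its replication protocol; the `OfflineStore` class, the `tasks` table, and the `upload` callback are all hypothetical names chosen for illustration. The device writes every task locally first, and a sync pass pushes unsynced rows whenever a connection happens to be available.

```python
import sqlite3
import time


class OfflineStore:
    """Hypothetical store-and-forward buffer (illustration only, not HarperDB)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "kind TEXT NOT NULL, "
            "payload TEXT NOT NULL, "
            "created REAL NOT NULL, "
            "synced INTEGER NOT NULL DEFAULT 0)"
        )

    def add(self, kind, payload):
        """Record a task locally; this works with or without a network."""
        cur = self.db.execute(
            "INSERT INTO tasks (kind, payload, created) VALUES (?, ?, ?)",
            (kind, payload, time.time()),
        )
        self.db.commit()
        return cur.lastrowid

    def pending(self):
        """Return rows that have not yet reached the cloud, oldest first."""
        return self.db.execute(
            "SELECT id, kind, payload FROM tasks WHERE synced = 0 ORDER BY id"
        ).fetchall()

    def sync(self, upload):
        """Push pending rows through upload(row) -> bool.

        Stops at the first failure so the remaining rows are retried
        on the next pass. Returns the number of rows marked as synced.
        """
        sent = 0
        for row in self.pending():
            if not upload(row):
                break
            self.db.execute("UPDATE tasks SET synced = 1 WHERE id = ?", (row[0],))
            sent += 1
        self.db.commit()
        return sent
```

While the device is offline, `upload` simply returns `False` and nothing is lost; once connectivity returns, the same `sync` call drains the backlog in order.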
You could apply this same paradigm to most things at CES, providing a wealth of edge-independent capability by utilizing HarperDB for local data storage and edge decision making. Furthermore, with features like time to live, one-way clustering, and table-level replication, you can improve security and reduce costs while doing it.
How does that work, you ask? Easy.
You can store the data you need on your edge device: row-level data, user data, device data, and so on. Take a smart thermometer, for example, and say it records a data point every second. If you have 45,000 customers with one thermometer each, that’s 45,000 data points a second and 2.7 million a minute.
Do you want to move every single one of those readings to the cloud? Probably not. Do you want to move the aggregate time series data, maybe the 1-minute average, to the cloud? Sure. How about events, like the temperature falling outside an expected range? Those would be ideal to have in your cloud application.
The above graphic shows a data paradigm that allows for keeping data where it makes sense. With Time-To-Live in HarperDB, you can delete data at a table level after a configurable period of time, ensuring your local storage doesn’t overflow.
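As a rough sketch of that split, the Python below keeps raw readings on the device, reduces each minute to one average plus any out-of-range events for the cloud, and purges old raw rows after a retention window. It uses the standard-library `sqlite3` and `statistics` modules rather than HarperDB, and the function names, the `readings` table, and the one-hour `TTL_SECONDS` value are assumptions made up for this example.

```python
import sqlite3
from statistics import mean

TTL_SECONDS = 3600  # hypothetical retention window for raw readings


def summarize_minute(readings, low, high):
    """Reduce one minute of (timestamp, temp) readings for upload.

    Returns (average, events), where events are the readings whose
    temperature falls outside the inclusive range [low, high].
    """
    average = mean(temp for _, temp in readings)
    events = [(ts, temp) for ts, temp in readings if not low <= temp <= high]
    return average, events


def purge_expired(db, ttl_seconds, now):
    """Delete raw readings older than ttl_seconds; return rows removed."""
    cur = db.execute(
        "DELETE FROM readings WHERE recorded_at < ?", (now - ttl_seconds,)
    )
    db.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (recorded_at REAL, temp REAL)")

    # One reading per second for a minute, with a single spike.
    minute = [(float(t), 20.0) for t in range(60)]
    minute[30] = (30.0, 35.0)
    db.executemany("INSERT INTO readings VALUES (?, ?)", minute)

    average, events = summarize_minute(minute, low=15.0, high=30.0)
    print("to cloud:", average, events)
    print("purged:", purge_expired(db, TTL_SECONDS, now=TTL_SECONDS + 30.0))
```

With one thermometer per customer, each device sends one average per minute instead of 60 raw rows, so the fleet-wide upload drops from 2.7 million rows a minute to 45,000 plus whatever events occur.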
You can also replicate data from the edge to the cloud, cloud to the edge, or edge to edge at a table level.
As a result, you could have an IoT architecture that looks like the below:
As you can see in the above diagram, the cloud app and the on-device app talk to the same database through the same endpoints, which really simplifies code. Replication from device to cloud is done via native HarperDB protocols. Data storage on device, in the cloud, for analysis, for transactions, and for BI is all handled by a single application. IT applications and Business Intelligence tools can be easily integrated with HarperDB via ODBC or JDBC.
The goal of HarperDB is to give companies a single interface for interacting with their data from the edge to the cloud. They can manage their entire data pipeline in one easily configurable application, without having to maintain a complex and expensive set of tools.