HarperDB provides real-time access to data via several protocols, including MQTT, WebSockets, and Server-Sent Events (SSE). Clients can monitor changes to this data and stream them in real time by subscribing to topics. Topics are based around database tables, similar to how Debezium works for the typical change data capture workflow.
In this three-part series, we’ll walk through how to stream data from external databases and subscribe to those changes using the protocols mentioned above. In Part I, we will cover AWS DynamoDB, with later parts detailing MongoDB and DataStax.
The end goal is to replicate this demo from Jaxon.
Part 1: Streaming Data from AWS DynamoDB
AWS Setup
First, we need to create an AWS account and a DynamoDB table. We’ll use DynamoDB Streams to capture changes and pull that data into HarperDB. Using the AWS Management Console, open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.
Follow the directions from the AWS documentation on how to enable a stream. Any of the stream types (i.e. key attributes, new image, old image, new and old images) will work, but in this demo we used the “New image” option. Take note of the stream ARN, as we will need it later.
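If you would rather enable the stream from code than from the console, the same choice is expressed as a `StreamSpecification` on an `UpdateTable` request. The following is a minimal sketch that only builds the request input; the table name `realtime-demo` and the helper `buildEnableStreamParams` are hypothetical, and the console labels in the map are written from memory and may differ slightly from what your console shows.

```javascript
// Console stream types mapped to the StreamViewType values used by the DynamoDB API.
const STREAM_VIEW_TYPES = {
  'Key attributes only': 'KEYS_ONLY',
  'New image': 'NEW_IMAGE',
  'Old image': 'OLD_IMAGE',
  'New and old images': 'NEW_AND_OLD_IMAGES',
};

// Hypothetical helper: builds the input for an UpdateTable call that enables a stream.
function buildEnableStreamParams(tableName, consoleLabel = 'New image') {
  const viewType = STREAM_VIEW_TYPES[consoleLabel];
  if (!viewType) throw new Error(`Unknown stream type: ${consoleLabel}`);
  return {
    TableName: tableName,
    StreamSpecification: { StreamEnabled: true, StreamViewType: viewType },
  };
}

// 'realtime-demo' is a placeholder table name, not one used by the demo repo.
const params = buildEnableStreamParams('realtime-demo');
console.log(JSON.stringify(params, null, 2));
```

With the AWS SDK for JavaScript v3, an object like this can be passed to `UpdateTableCommand` from `@aws-sdk/client-dynamodb`; the response's `TableDescription.LatestStreamArn` is the stream ARN mentioned above.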
Now, we’re ready to push some data into DynamoDB and subscribe to DynamoDB Streams to process them in HarperDB. Since the HarperDB team has published a demo, we will utilize that repo without having to write custom code ourselves.
HarperDB Setup
The rest of this guide assumes that you are familiar with how to set up HarperDB. If not, make sure to check out these guides to get started.
The demo code we will use is located at https://github.com/HarperDB-Add-Ons/hdb-component-realtimedemo. This demo utilizes HarperDB’s component functionality to run applications as part of the HarperDB deployment.
Clone this repository into the component directory of your HarperDB instance (e.g. ~/hdb/components).
Then install its dependencies via `yarn` or `npm`.
Note that the latest Docker image does not support `linux/arm64` (e.g. M1 Macs) yet, so if you get compilation errors, try following along on an Intel or AMD VM.
Finally, we need to rename the `dbs/credentials/credentials.example.js` file to `dbs/credentials/credentials.js` and update the DynamoDB lines accordingly:
We also need to provide AWS credentials via the `aws.credentials` file in the same directory. Make sure the IAM credentials have access to DynamoDB operations.
Sending Simulated Data
Now that we have everything set up, we are ready to send some data. The example code actually has a demonstration UI that lets you publish to DynamoDB (it uses fake data underneath) and then pull that data into HarperDB.
You can follow the UI setup portion of the README to install the UI. Note that if you are running locally, you may run into CORS issues and will have to allow CORS in your browser. Alternatively, you can run the commands under `dbs/dynamodb` manually. The `ingest.js` file sends records with a random UUID and random lorem ipsum content. The `cdc.js` file handles Change Data Capture by subscribing to DynamoDB Streams and publishing the records to HarperDB.
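To make the Change Data Capture step more concrete: DynamoDB Streams delivers each change as a record whose `dynamodb.NewImage` holds the item in DynamoDB's typed attribute format (`{ S: ... }`, `{ N: ... }`, and so on). Before a record can be stored as an ordinary row, it has to be flattened into a plain object. The sketch below illustrates that idea only and is not the code in `cdc.js`; the helper names and the sample record are hypothetical, and it handles just a few attribute types.

```javascript
// Convert one DynamoDB attribute value ({ S: 'x' }, { N: '1' }, ...) to a plain JS value.
function fromAttributeValue(av) {
  if ('S' in av) return av.S;
  if ('N' in av) return Number(av.N); // note: very large numbers can lose precision
  if ('BOOL' in av) return av.BOOL;
  if ('NULL' in av) return null;
  if ('L' in av) return av.L.map(fromAttributeValue);
  if ('M' in av) return fromImage(av.M);
  throw new Error(`Unsupported attribute type: ${Object.keys(av)[0]}`);
}

// Convert a whole image (attribute name -> attribute value) to a plain object.
function fromImage(image) {
  return Object.fromEntries(
    Object.entries(image).map(([name, av]) => [name, fromAttributeValue(av)])
  );
}

// Returns the new row for INSERT/MODIFY events, or null when there is no new image.
function toPlainRecord(streamRecord) {
  const { eventName, dynamodb } = streamRecord;
  if (eventName === 'REMOVE' || !dynamodb.NewImage) return null;
  return fromImage(dynamodb.NewImage);
}

// Hypothetical sample shaped like a stream record with the "New image" view type.
const sampleRecord = {
  eventName: 'INSERT',
  dynamodb: {
    Keys: { id: { S: '3f2b8c1e-0000-4000-8000-000000000001' } },
    NewImage: {
      id: { S: '3f2b8c1e-0000-4000-8000-000000000001' },
      content: { S: 'lorem ipsum dolor sit amet' },
      words: { N: '5' },
    },
  },
};

const row = toPlainRecord(sampleRecord);
console.log(row);
```

In real code, the `unmarshall` function from `@aws-sdk/util-dynamodb` covers the full set of attribute types; the hand-rolled version here is only meant to keep the example self-contained.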
Using either the UI or the manual commands, try sending some data. You should see the fake data populate in HarperDB Studio, as in the demo video.
Part 2: Setting up MQTT, WebSockets, and Server-Sent Events Subscriptions
Subscriptions are configured via the `harperdb-config.yaml` file. Let’s go through each one in more detail.
For MQTT, we will set the following config:
In the UI demo, locate the `MQTTWS.js` file. Here we can see subscriptions to MQTT using the configuration file. It parses the messages and updates React state to show the data.
WebSockets utilize the REST interface and use the `connect(incomingMessages)` method on resources. In the `WS.js` file, you can see a new WebSocket connection. It uses `addEventListener` to listen for new messages and parses them just as the MQTT connection does.
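The listen-and-parse pattern described above can be sketched as follows. This is an illustration rather than the contents of `WS.js`: the function names, the 50-row cap, and the message payloads are assumptions, and a small stand-in object replaces a real `WebSocket` so the example runs anywhere.

```javascript
// Parse one incoming frame and return the next list of rows (newest first), capped at `limit`.
function applyMessage(rows, rawData, limit = 50) {
  let parsed;
  try {
    parsed = JSON.parse(rawData);
  } catch {
    return rows; // ignore frames that are not valid JSON
  }
  return [parsed, ...rows].slice(0, limit);
}

// Attach a 'message' listener to anything with addEventListener (e.g. a WebSocket).
function listen(socket, onRows) {
  let rows = [];
  socket.addEventListener('message', (event) => {
    rows = applyMessage(rows, event.data);
    onRows(rows); // in a React component this would be a state setter
  });
}

// Stand-in for a WebSocket so the sketch runs without a server.
const fakeSocket = {
  handlers: [],
  addEventListener(type, handler) {
    if (type === 'message') this.handlers.push(handler);
  },
  emit(data) {
    this.handlers.forEach((handler) => handler({ data }));
  },
};

let latest = [];
listen(fakeSocket, (rows) => { latest = rows; });
fakeSocket.emit('{"id":"a1","content":"lorem"}');
fakeSocket.emit('not json');
fakeSocket.emit('{"id":"b2","content":"ipsum"}');
console.log(latest.map((r) => r.id)); // prints [ 'b2', 'a1' ]
```

In a browser, `fakeSocket` would be replaced by `new WebSocket(url)`, where `url` points at the table's endpoint on your HarperDB instance.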
Finally, for Server-Sent Events, a new `EventSource` is added via config. Note that Server-Sent Events actually use the REST server interface under the hood.
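Because Server-Sent Events travel over a plain HTTP response, it can help to see what the browser's `EventSource` decodes for you. An event stream is text: `data:` lines carry the payload, an optional `event:` line names the event, lines starting with `:` are comments, and a blank line ends each event. The sketch below parses a chunk of that format; it assumes the chunk contains only complete events, and the `put` event name and payloads are made up for illustration.

```javascript
// Parse a chunk of text/event-stream data into { type, data } objects.
function parseSseChunk(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let type = 'message'; // default event type when no "event:" field is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) continue; // blank line or comment
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
      if (field === 'event') type = value;
      else if (field === 'data') dataLines.push(value);
    }
    // Events without any data lines are not dispatched.
    if (dataLines.length > 0) events.push({ type, data: dataLines.join('\n') });
  }
  return events;
}

// Hypothetical stream contents: a keep-alive comment followed by two events.
const sample =
  ': keep-alive\n\n' +
  'data: {"id":"a1","content":"lorem"}\n\n' +
  'event: put\ndata: {"id":"b2"}\n\n';

const events = parseSseChunk(sample);
console.log(events);
```

In the browser none of this parsing is needed: `new EventSource(url)` handles it and delivers each event's `data` string to your `message` listener, where it can be passed to `JSON.parse`.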
Wrapping Up
We were able to stream AWS DynamoDB updates to HarperDB and then use MQTT, WebSockets, and Server-Sent Events to deliver those updates to downstream consumers. If you were following along, you can see the AWS DynamoDB portion working from the demo. In our next article, we’ll demonstrate how to enable this for MongoDB Atlas.