Intro: Real-Time Message Capabilities
As a distributed database, HarperDB has long had the intrinsic ability to publish and deliver data in real time as it is written to the database. This functionality is essential for replicating data across a distributed cluster of HarperDB nodes. HarperDB replication is built on a publish/subscribe mechanism that delivers transactions to nodes as they happen, keeping data synchronized. This functionality is at the heart of HarperDB and is a robust, battle-hardened, and well-tested message delivery system.
Historically, this has been treated as an internal mechanism in HarperDB. However, we have increasingly seen the opportunity to leverage these capabilities, and provide convenient interfaces for users to tap into real-time data delivery.
Traditionally, databases have functioned as passive responders to queries. As a result, building real-time functionality into applications often means integrating several complex and distinct servers, combining databases with message queue brokers and applications, and dealing with all the difficulties of spanning these integrations, including authentication, authorization, message translation, notifications, and more.
However, we believe the future of databases is real-time. Building real-time applications shouldn't require an extensive integration effort with multiple products. By exposing HarperDB's notification capabilities, applications can be built on HarperDB, easily perform traditional queries, and subscribe to real-time data notifications. Adding APIs to subscribe to our data opens up an entirely new realm of possibilities: applications that can retrieve data instantly and monitor it in real time. These interfaces then provide the backbone for additional message routing in combination with data notification delivery.
HarperDB is built with real-time message capabilities. In this post, we want to lay out what you can do with these capabilities today, and the new interfaces and functionality coming in the future to simplify, expand, and accelerate real-time communication.
HarperDB Custom Function with Stream Subscriptions
HarperDB uses the NATS distributed messaging broker to deliver replication messages. HarperDB’s custom functions allow us to write an application that interfaces with NATS messaging and delivers messages to interested clients in real time using WebSockets.
WebSockets is a powerful transport protocol that facilitates access from web applications and distributed devices, and can traverse proxies and layer 7 gateways. Our custom function template demonstrates a simple JSON-based protocol, but numerous application protocols can be layered on top of WebSockets and built from our custom function template, including MQTT, AMQP, and more.
Overview
With this Custom Function, HarperDB will be connected to a WebSocket server to facilitate the publication of events via a data subscription.
HarperDB uses NATS to communicate between instances in a database cluster. In this example, a JetStream consumer is created to relay messages from the NATS stream out to WebSocket clients.
Setup
To set up this Custom Function, follow the steps at the top of the README. In short: clone the code, set the config.json values, and restart.
How It Works
- Use a plugin to create a WebSocket server (@fastify/websocket)
- Use a consumer on the HarperDB leaf stream (https://docs.harperdb.io/docs/clustering)
- Allow clients to connect and subscribe to topics (schemas, tables, records)
- Pipe messages out to clients based on their subscriptions
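The last two steps (tracking subscriptions and fanning messages out) can be sketched without any server or NATS dependency. This is a minimal illustration under stated assumptions, not the code from the custom function template: the `SubscriptionRegistry` class, the `Client` and `ChangeMessage` interfaces, and the `schema/table/record` topic format are all hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: the names and topic format here are assumptions,
// not taken from the HarperDB custom function template.

// Anything that can receive a serialized message (for example, a WebSocket).
interface Client {
  send(payload: string): void;
}

// A change event as it might arrive from the replication stream.
interface ChangeMessage {
  schema: string;
  table: string;
  id: string;
  record: unknown;
}

class SubscriptionRegistry {
  // topic -> set of clients subscribed to that topic
  private readonly topics = new Map<string, Set<Client>>();

  subscribe(client: Client, topic: string): void {
    const existing = this.topics.get(topic);
    const clients = existing ?? new Set<Client>();
    if (!existing) this.topics.set(topic, clients);
    clients.add(client);
  }

  unsubscribe(client: Client, topic: string): void {
    const clients = this.topics.get(topic);
    if (!clients) return;
    clients.delete(client);
    if (clients.size === 0) this.topics.delete(topic);
  }

  // Deliver a message to every client subscribed to the record, its table,
  // or its schema. Each client receives the message at most once.
  // Returns the number of clients that were sent the message.
  publish(message: ChangeMessage): number {
    const { schema, table, id } = message;
    const candidates = [schema, `${schema}/${table}`, `${schema}/${table}/${id}`];
    const recipients = new Set<Client>();
    for (const topic of candidates) {
      const clients = this.topics.get(topic);
      if (!clients) continue;
      clients.forEach((client) => recipients.add(client));
    }
    const payload = JSON.stringify(message);
    recipients.forEach((client) => client.send(payload));
    return recipients.size;
  }
}
```

In a real deployment, `Client` would be backed by a WebSocket connection and `publish` would be called from the stream consumer's message handler. Collecting recipients in a set first means a client subscribed to both a table and one of its records is not sent the same change twice.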
Doing the Same with MQTT
WebSockets is a transport protocol commonly used for web applications, while MQTT is an application protocol for real-time messaging that is often used in Internet of Things (IoT) devices. MQTT is often layered on top of WebSockets to facilitate pub/sub across web applications and HTTP gateways. The Custom Function design above can be used to build a real-time MQTT service that behaves in a similar fan-out fashion. MQTT uses topic subscriptions in the same way that subscriptions are made with this custom function, so MQTT subscriptions can be mapped directly to its functionality. Likewise, MQTT publish commands can be mapped to record insertions or to publishing messages directly on the connected NATS streams (topics can be mapped to NATS subjects).
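As a rough illustration of the topic-to-subject mapping, the sketch below translates an MQTT topic or topic filter into a NATS subject. It relies only on the two protocols' published conventions: MQTT separates levels with `/` and uses `+` and `#` as single-level and multi-level wildcards, while NATS separates tokens with `.` and uses `*` and `>`. The function name `mqttTopicToNatsSubject` is hypothetical, and rejecting levels that contain NATS-reserved characters is a simplifying assumption rather than a description of how HarperDB handles them.

```typescript
// Sketch only: a simplified mapping, not HarperDB's implementation.
function mqttTopicToNatsSubject(topic: string): string {
  if (topic.length === 0) throw new Error("topic must not be empty");
  const levels = topic.split("/");
  return levels
    .map((level, index) => {
      if (level === "+") return "*"; // single-level wildcard
      if (level === "#") {
        // MQTT only allows '#' as the final level of a filter.
        if (index !== levels.length - 1) {
          throw new Error("'#' must be the last level");
        }
        return ">"; // multi-level wildcard
      }
      // Assumption: refuse empty levels and characters that are reserved
      // in NATS subjects or are wildcards embedded inside a level.
      if (level === "" || /[.*>+#\s]/.test(level)) {
        throw new Error(`level "${level}" cannot be used as a NATS token`);
      }
      return level;
    })
    .join(".");
}

mqttTopicToNatsSubject("dev/dog/1"); // "dev.dog.1"
mqttTopicToNatsSubject("dev/+/1");   // "dev.*.1"
mqttTopicToNatsSubject("dev/#");     // "dev.>"
```

One difference worth keeping in mind: an MQTT filter such as `dev/#` also matches the parent topic `dev`, whereas the NATS subject `dev.>` requires at least one further token, so a complete bridge would need an extra subscription to cover that case.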
Coming Capabilities: Real-Time Resource API
For our next release of HarperDB (4.2), we have built a new interface for interacting directly with databases in real time. We call this the Resource API: a JavaScript interface that provides a standard way to perform CRUD operations and to subscribe to records, tables, and queries.
One of the significant benefits of the Resource API is that it provides a consistent path convention for addressing and locating resources and records across both the RESTful interface and pub/sub topics. This means that pub/sub topics are directly aligned with database tables and records, and pub/sub protocols can be used to interface directly with the data in the database. There is no extra overhead for routing data between a database and a message queue.
The Resource API also builds on HarperDB's custom functions to provide a highly extensible framework for defining application-specific logic for CRUD operations and for handling subscription and publishing requests. This makes it possible to add business logic for fine-grained access control, data aggregation, transformation, and data structures.
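To make the idea of a shared path convention concrete, here is a small sketch in which a REST URL path and a pub/sub topic resolve to the same table and record. The `table/id` layout, the `ResourceAddress` type, and the `parseResourceAddress` function are assumptions made for illustration; they are not the Resource API itself.

```typescript
// Hypothetical addressing scheme: "<table>" or "<table>/<primary key>".
interface ResourceAddress {
  table: string;
  id?: string;
}

// Accepts either a URL path ("/dog/1") or a pub/sub topic ("dog/1").
function parseResourceAddress(pathOrTopic: string): ResourceAddress {
  // Dropping empty segments makes a leading or trailing slash irrelevant.
  const segments = pathOrTopic.split("/").filter((segment) => segment.length > 0);
  if (segments.length < 1 || segments.length > 2) {
    throw new Error(`cannot address a resource with "${pathOrTopic}"`);
  }
  const [table, id] = segments;
  return id === undefined ? { table } : { table, id };
}
```

With this convention, `parseResourceAddress("/dog/1")` (a path taken from an HTTP request) and `parseResourceAddress("dog/1")` (a topic taken from a subscription) both yield the table `dog` and the record `1`, so the same lookup and access-control logic can serve both interfaces.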
MQTT Server Integration
The Resource API also provides the foundation for optimally integrated real-time protocols. We have built an MQTT server that will be included in the release, and other protocols have a direct path to implementation including AMQP, Server-Sent Events, and custom WebSocket protocols.
With the path and topic alignment, by using MQTT, you can easily choose topics that directly map to tables and records. Publishing to a topic publishes to the corresponding table record by primary key, and subscribing allows you to listen for any data changes or publishes on a given record, regardless of whether it was initiated through MQTT or through other database operations like SQL or custom function actions.
MQTT “retain” messages are particularly well aligned with database backing, since publishing a retained message maps directly to updating a database record, and subscribing to a record uses the current record state as the retained message. This is the best mechanism for using MQTT in HarperDB, since it aligns so naturally with database access: clients can connect, subscribe to data, and automatically receive the current state of that data, with maximum integration with other database interactions. This greatly simplifies reliable access without requiring quality of service acknowledgements and session tracking, and it optimizes for fast access to the latest data (over old data).
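The retained-message behavior described here can be modeled in a few lines. The sketch below is an in-memory stand-in, assuming a hypothetical `RetainedTable` class: publishing is an upsert of the record, and a new subscriber is immediately handed the current record state before receiving later changes. It illustrates the semantics only and says nothing about how HarperDB stores or delivers data internally.

```typescript
// Hypothetical in-memory model of retained messages backed by records.
type Listener<T> = (record: T) => void;

class RetainedTable<T> {
  private readonly records = new Map<string, T>();
  private readonly listeners = new Map<string, Set<Listener<T>>>();

  // Publishing with "retain" is modeled as an upsert: the record is stored
  // and then pushed to everyone currently subscribed to that record.
  publish(id: string, record: T): void {
    this.records.set(id, record);
    this.listeners.get(id)?.forEach((listener) => listener(record));
  }

  // Subscribing delivers the current record state right away (the retained
  // message), then any later changes. Returns a function that unsubscribes.
  subscribe(id: string, listener: Listener<T>): () => void {
    const existing = this.listeners.get(id);
    const set = existing ?? new Set<Listener<T>>();
    if (!existing) this.listeners.set(id, set);
    set.add(listener);
    const current = this.records.get(id);
    if (current !== undefined) listener(current);
    return () => {
      set.delete(listener);
    };
  }
}
```

Because the latest state always lives in the record, a client that connects late does not need session tracking or message replay to catch up; it simply receives the current value on subscribe.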
Structured Data
HarperDB is a database, not just a generic broker, and therefore highly adept at handling structured data. Data can be published and subscribed in all supported structured/object formats, including JSON, CBOR, and MessagePack, and the data will be stored and handled as structured data. This means that different clients can individually choose which format they prefer, both for inbound and outbound messages. One client could publish in JSON, and another client could choose to receive messages in CBOR.
Working with messages as structured data in a database greatly expands the way data can be processed. Messages that come into the database can easily be indexed, transformed, aggregated and joined with other data. Working with data at a structured level is just as powerful for message handling as it is for traditional database interactions.
Extensible Protocol Support
HarperDB’s upcoming release has an extensible plugin architecture designed to support additional protocols, including other real-time protocols that can be built to meet more application-specific needs. HarperDB 4.2 includes the following protocols:
- HTTP - Ubiquitous protocol for web applications, and with REST conventions has powerful and flexible CRUD capabilities (but not a direct real-time protocol itself)
- WebSockets - A bi-directional transport protocol that uses the HTTP upgrade mechanism to work within HTTP/web constraints. As a transport protocol, it can be used as an alternative to direct TCP for application protocols like MQTT.
- MQTT - A highly efficient pub/sub protocol designed for extremely lightweight clients with minimal client overhead (designed for IoT). MQTT supports different QoS levels that can employ server tracking to ensure message delivery.
Additional protocols can be built with the plugin system:
- AMQP - A highly robust and powerful messaging and pub/sub protocol that is designed for optimal delivery across a broad range of messages and applications
- Server-Sent Events - A very simple protocol that is easy to use in web applications, with direct native browser support. This is a unidirectional push protocol, and it is extremely efficient for single subscriptions in web applications.
We are currently load testing our MQTT server, where it is showing excellent performance and throughput characteristics. HarperDB is a highly concurrent, multi-threaded server, and the subscription and message delivery system is built on the fastest JavaScript serialization/deserialization library available along with the shared memory mapping capabilities of LMDB, facilitating exceptional speed as records and messages move across threads and are delivered to clients. We will post results as we finish our load testing efforts on more expansive networks.
The Future is Real-Time!
HarperDB is a distributed database built for efficient, fast real-time distribution of data through NATS messaging technology. This can be leveraged for real-time access to data right now, and we are building flexible and powerful new ways to access real-time data and messaging through extensible APIs and standards-based protocols including MQTT.