Clustering

HarperDB’s clustering engine replicates data between instances of HarperDB using a highly performant, bi-directional pub/sub model on a per-table basis.

 

A common use case is an edge application that collects and analyzes sensor data and creates an alert if a sensor value exceeds a given threshold. Such an application typically has these constraints:

  • We don’t want our edge application making outbound HTTP requests, for security reasons.
  • We don’t have a reliable network connection.
  • We don’t want all the sensor data to be sent to the cloud, either because of our unreliable network connection or because storing all of it is a burden.
  • We don’t want our edge node to be accessible from outside the firewall.
  • We do want to send the alerts to the cloud with a snippet of sensor data containing the offending sensor readings.

 

HarperDB simplifies the architecture of such an application with its bi-directional, table-level replication:

  • The edge instance subscribes to a “thresholds” table on the cloud instance, so your application only makes localhost calls to get the thresholds.
  • Your application continually pushes sensor data into a “sensor_data” table via the localhost API, comparing it to the threshold values as it does so.
  • When a threshold violation occurs, your application adds a record to the “alerts” table.
  • Your application appends to that record an array of “sensor_data” entries covering the 60 seconds (or minutes, or days) leading up to the threshold violation.
  • The edge instance publishes the “alerts” table up to the cloud instance.
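
The application-side logic in the steps above can be sketched roughly as follows. This is a minimal illustration, not HarperDB code: the three tables are modeled as plain in-memory Python structures standing in for calls to the localhost API, and the names record_reading and WINDOW_SECONDS, the record fields, and the "temp-1" sensor with its threshold value are all hypothetical. Replication itself (thresholds coming down from the cloud, alerts going up) is assumed to be handled by the clustering configuration and does not appear in the code.

```python
WINDOW_SECONDS = 60  # hypothetical look-back window attached to each alert

# In-memory stand-ins for the three tables. A real application would read
# and write these through the local HarperDB API instead.
thresholds = {"temp-1": 75.0}  # edge subscribes to this table from the cloud
sensor_data = []               # stays on the edge instance
alerts = []                    # edge publishes this table up to the cloud


def record_reading(sensor_id, value, timestamp):
    """Store a reading locally; return a new alert record if it exceeds the threshold."""
    sensor_data.append(
        {"sensor_id": sensor_id, "value": value, "timestamp": timestamp}
    )

    limit = thresholds.get(sensor_id)
    if limit is None or value <= limit:
        return None  # no threshold configured, or reading is within bounds

    # Attach the readings for this sensor from the window leading up to
    # (and including) the violation.
    window_start = timestamp - WINDOW_SECONDS
    snippet = [
        r for r in sensor_data
        if r["sensor_id"] == sensor_id
        and window_start <= r["timestamp"] <= timestamp
    ]
    alert = {
        "sensor_id": sensor_id,
        "threshold": limit,
        "timestamp": timestamp,
        "sensor_data": snippet,
    }
    alerts.append(alert)
    return alert
```

In this sketch, only the records appended to alerts would ever need to leave the edge instance; the full sensor_data history stays local, and the application never makes a call beyond localhost.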

 

By letting HarperDB focus on the fault-tolerant logistics of transporting your data, you get to write less code. By moving data only when and where it’s needed, you lower storage and bandwidth costs. And by restricting your app to only making local calls to HarperDB, you reduce the overall exposure of your application to outside forces.