Deep Relational Querying for Object-Oriented Data
HarperDB 4.3 introduces native support for relationships, complete with the ability to define relationships in schemas, join data across tables in queries, and access related data through property accessors in JavaScript applications. Combining the capabilities of a distributed object/document database with relational models and join querying, provides a uniquely powerful combination for high-speed low-latency access to sophisticated data models.
Relational data is central to most applications. Even without formal relationship models or flat data storage, most non-trivial data models intrinsically have relational concepts, because our data describes a reality of highly interconnected concepts, ideas, and information. Adding relational capabilities to HarperDB greatly improves the ability to model and query data with relational information.
Efficient, Direct From Client URL Queries
While relational database management systems (RDBMS) are often synonymous with SQL, HarperDB’s object-oriented model for relationships provides significant advantages over SQL. First, HarperDB’s fundamental design is not just a database, but an application database server. Traditional architecture involves an application server that receives HTTP requests and translates them to any necessary SQL queries to fulfill the response. However, with HarperDB, we can “collapse the stack” and directly request HTTP requests that specify queries against the database. These requests can be executed directly and retrieve data from the database without the extra overhead of translating queries to another language and forwarding them to another server. With this model, HarperDB actually uses a query language that is probably even more pervasive than SQL (believe it or not!): URLs. While SQL and URLs are quite different languages, there are intuitive ways to encode queries in URLs that come with significant advantages. HarperDB allows for complex join queries directly encoded as query parameters in URLs, making it easy to build queries directly from clients without intermediate steps. URL queries are succinct, efficient, and highly intuitive.
Let’s take a classic example of querying, we might do a JOIN in SQL with:
And in a typical application server, the department name might have been passed in from an HTTP query parameter.
On the other hand, this query can be directly encoded into a GET HTTP request itself:
And with a simple compact query, we can perform a join query directly from HTTP.
Since HarperDB’s query system can return hierarchical data structures, joined records from tables can be returned with an intuitive and don’t-repeat-yourself (DRY) data structure that more efficiently represents the underlying data. For example, with the example above, the SQL flattens data results, so even if there are many employees and only a few departments, the department data is duplicated many times across the returned rows:
On the other hand, the results of the query above are structured in an efficient object-oriented structure, ready to be directly consumed by client code:
Relational Data Modeling
Part of the brevity of HarperDB relational queries is derived from the ability to define relationships and foreign keys as part of the data model. (Ironically, one of the shortcomings of SQL is that queries are not actually “aware” of the relationships of the data model, and the foreign key connections must be re-specified in the join of every join query). These relationships are defined as part of the data schema. For example, a one-to-many department -> employee relationship could be defined as:
Querying
With the schema in place, queries can be naturally formed with the employees as a property with sub-properties that can be compared in conditions or selected/included in the results. Queries can also include conditions that match properties in related tables. For example, to query for employees with over five years of experience, structured under department, we could write:
This query also demonstrates the use of the FIQL operators that allow for various comparison operators to be used, such as a “greater than” comparison.
HarperDB includes several additional query enhancements, including the ability to define a sort order and nested select capabilities. If we want to sort employees by experience (in decrementing order) and return their age, we could query with:
Access to relational data extends far beyond just RESTful queries. Once relationships are defined, we can reference and search by related/joined attributes with the search_by_value and search_by_value operations API. The relationships also define property getters in the JavaScript data model, making accessing related data in application code easy. For example, if we had retrieved an employee record instance, we could easily access department information through the department property, which automatically resolves to the correct department record from the employee’s departmentId:
GraphQL
Schema-defined relationship properties also provide a natural mechanism for leveraging joined/relationship data through GraphQL. With our new Apollo adapter component, GraphQL queries can be performed directly on HarperDB using Apollo. GraphQL doesn’t have any native mechanism to define join queries; GraphQL is more about specifying the graph of properties to return than complex queries. However, by using relationship properties, GraphQL can easily reference related data through the available predefined properties. For example, the join query above can also be executed with GraphQL (with Apollo component set up to connect to these tables):
Streaming
Performance is always at the center of HarperDB, and querying is no exception. One of the most innovative and uniquely powerful aspects of HarperDB querying is its query streaming capabilities. Traditionally, when a request is made with a query, there are numerous sequential steps:
- Receive request from a client for data and create SQL query
- Use or wait for an open connection in the connection pool
- Send SQL to the database
- Execute query
- Gather all the results
- Serialize and send the data back to the application server
- Gather all the results from the database server and deserialize
- Process, serialize, and send data to the client
This process involves a lot of expensive sequential steps (each requiring the completion of the previous before the next can start) before even a single byte can be sent to a client.
However, HarperDB uses streaming querying whenever possible. This means that when a query is parsed, planned, and starts to execute, once the first record is retrieved, that record can immediately be sent to the client, and the remaining querying can literally execute while the query results are being transferred and downloaded to the client. While benchmarks often just look at completion times with ideal connections, a more realistic real-world perspective considers the whole transfer process, which can often involve slower connections and clients that can begin consuming data before completion. And with HarperDB’s streaming, the time-to-first-byte (TTFB) can be vastly faster than traditional architectures for accelerating the critical bottlenecks in the data loading process. And the parallelization of query and transfer can dramatically improve real-world querying performance (even if this isn’t always fully realized in benchmarks that are only measuring an isolated query execution).
Integrity
Traditional databases have often relied on foreign key constraints and cascading deletes to maintain relational integrity across tables. While this approach has worked fine for single-server databases, foreign key constraints and cascading deletes do not tend to scale well horizontally as they require significant locking that doesn’t work well with fault-tolerant distributed systems. HarperDB instead relies on foreign key “tolerance” for integrity of resolving references. HarperDB querying is built to tolerate referencing delete records and handles the virtual property references in a predictable and deterministic way (can resolve to nulls in JavaScript, properly tolerates missing records in conditions for queries, etc.). This relationship data model is designed for HarperDB’s highly fault-tolerant, robust, and distributed data architecture, capable of horizontally scaling and still providing reliable relationship handling.
Summary
HarperDB’s new relationship and query engine capabilities in 4.3 have dramatically expanded what can be built and how data can be accessed with a modern distributed database. Take a look at our documentation for more information about the powerful query capabilities, and imagine what you can build with HarperDB.