Strict Schema Enforcement vs. Schemaless vs. Dynamic Schema





The debate over whether to use a schema has passionate supporters on both sides. One side values data integrity constraints and predictability, while the other prefers flexibility (or “agility”) and faster development. Which is “better” most likely depends on the specific project, the data involved, and the team’s skill set.
In this post I will cover strict schema enforcement, schemaless, and dynamic schema, including the pros and cons of each one.

Strict Schema

A schema is a blueprint of how a database is constructed. It doesn’t hold the data itself, but instead describes the shape of the data and how tables in the database relate to one another. Schemas contain information on the objects in a database, such as tables, attributes, data types, and relationships, and can also include triggers, views, indexes, and so on. Some common databases that use strict schemas are Oracle, MS SQL Server, and PostgreSQL.

Pros: 

  • Gives a high-level view of the structure and relationships of the tables in your database, which makes it easier to keep track of what information is and is not in the database.
  • Enforces data integrity constraints, a set of rules that keep all entries consistently formatted.
  • More predictable, which can allow a more efficient storage and indexing structure.

Cons:

  • Takes time to design and build when starting a new project. Modifying the schema can be tricky, and it can be a lot of work to maintain.
  • Rigid: data that doesn’t fit the defined structure is rejected until the schema is changed.
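
To make the integrity-constraint point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. SQLite is used only because it ships with Python, not because it is one of the databases named above, and the users table, its columns, and the try_insert helper are hypothetical names chosen for illustration. SQLite is lenient about column types by default, so the sketch relies on NOT NULL, UNIQUE, and CHECK constraints to show rows being rejected.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT    NOT NULL UNIQUE,
        age   INTEGER NOT NULL CHECK (age >= 0)
    )
    """
)


def try_insert(email, age):
    """Return True if the row satisfies the schema, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, age) VALUES (?, ?)", (email, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("ada@example.com", 36),    # valid row
    try_insert("ada@example.com", 41),    # duplicate email violates UNIQUE
    try_insert("bob@example.com", None),  # missing age violates NOT NULL
    try_insert("eve@example.com", -5),    # negative age violates CHECK
]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(results, row_count)
```

Only the first row is stored. The database itself turns away the other three, which is the predictability the pros list describes, and also the rigidity the cons list warns about.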

Schemaless

As the name implies, a schemaless database does not use a schema, so it has no fixed structure. It does not enforce data type restrictions and can store both structured and unstructured data. Some common schemaless databases are MongoDB, CouchDB, and Google Cloud Datastore.

Pros:

  • Quick and easy to set up because there is no schema to model and no additional layers are required, so complexity is greatly reduced. With just a few clicks a developer can have a working database.
  • Updates can be made on the fly without changing a schema or shutting the database down.
  • More flexibility when storing data. You don’t need to decide up front what you’re going to store, how it’s structured or related to other information in the database.
  • Less overhead, which can lead to better performance and scalability.

Cons:

  • Without fixed columns, the application may have to parse each document to find the requested data.
  • No unified metadata: you end up reading the application code to understand the data, rather than having that information in the database.
  • No control over the data: you may be receiving garbage, but with no filters in the database, bad data gets loaded either way. Data validation is pushed out to the application layer.
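
The last point can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the API of any database named above: a list stands in for a schemaless collection, and insert and is_valid_user are hypothetical helpers.

```python
# A plain list stands in for a schemaless collection (hypothetical, no real database).
collection = []


def insert(document):
    """Store any dict as-is, the way a store with no schema would."""
    collection.append(document)


def is_valid_user(document):
    """Application-layer filter: the rules live in code, not in the database."""
    age = document.get("age")
    return (
        isinstance(document.get("email"), str)
        and isinstance(age, int)
        and not isinstance(age, bool)  # bool is a subclass of int in Python
        and age >= 0
    )


insert({"email": "ada@example.com", "age": 36})
insert({"email": "bob@example.com", "age": "unknown"})  # wrong type, still stored
insert({"mail": "eve@example.com"})                      # misspelled key, still stored

stored = len(collection)
clean = [doc for doc in collection if is_valid_user(doc)]
print(stored, len(clean))
```

All three documents are stored, and only one passes the check. Nothing stopped the bad records at write time, so every reader of the data has to apply (and agree on) the same filter.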

Dynamic Schema

Often described as the best of both worlds, a dynamic schema is one that changes as you add data. There is no need to define the schema beforehand. When data is inserted, updated, or removed, the database builds the schema dynamically. Popular dynamic schema databases include HarperDB and MongoDB.

Pros:

  • Easy to set up and requires no input from the user.
  • Provides the structure that comes with a schema, which allows a more efficient storage and indexing model.
  • Doesn’t force data constraints, so it can ingest unstructured data.
  • Flexible to develop with as the data model can easily evolve over time.
  • Can handle semistructured data.

Cons:

  • No data enforcement means developers must ensure data adheres to the data model.
  • Data model can get messy if proper processes are not followed.
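
As a rough illustration of the idea, the sketch below builds a schema from the records it receives. It is a toy model and makes no claim about how HarperDB or MongoDB work internally; DynamicTable and its methods are hypothetical names.

```python
from collections import defaultdict


class DynamicTable:
    """Toy table that derives its schema from the records it receives."""

    def __init__(self):
        self.rows = []
        # attribute name -> set of Python type names seen for that attribute
        self.schema = defaultdict(set)

    def insert(self, record):
        self.rows.append(record)
        for attribute, value in record.items():
            self.schema[attribute].add(type(value).__name__)

    def describe(self):
        """Return the inferred schema with type names sorted for stable output."""
        return {name: sorted(types) for name, types in self.schema.items()}


table = DynamicTable()
table.insert({"id": 1, "name": "Ada"})
table.insert({"id": 2, "name": "Bob", "age": 41})  # new attribute appears
table.insert({"id": "3", "name": "Eve"})           # a string id: nothing rejects it

schema = table.describe()
print(schema)
```

The age attribute shows up in the schema as soon as one record carries it, with no migration step. The id attribute ends up recorded as both int and str, which is the kind of mess the cons list refers to when nothing enforces the data model.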

As you can see, there are valid points on each side of the argument and numerous factors to consider when choosing which is right for your specific project. At the end of the day, this decision has a lot to do with the preference of the user and long-term project goals. For example, at HarperDB, we are big fans of the dynamic schema, which enables us to ingest any type of data at scale. HarperDB frees you from the hassle of defining data types, providing unlimited flexibility as your applications evolve and scale over time. Which type of schema do you prefer?

While it may not be top of mind, it’s important to get your schema right up front to avoid unnecessary headaches, time, and costs later on. Foundation is key, and it’s much more difficult to go back and change that foundation once you’ve built on top of it. Take the time to weigh the pros and cons of strict schema enforcement vs. schemaless vs. dynamic schema before you start building. You won’t regret it.
