Data drives our lives. Our apps, our spending, our homes: it’s all driven by a datastore somewhere. Given how constantly we ask questions of our devices, we might wonder just how these systems keep up with the sheer volume of data requests. NoSQL databases are built to scale horizontally, as are some relational databases. Monolithic databases running on a single large server are, of course, also common. Both options can work, but the cost of scaling databases this way can be staggering, and keeping it all alive can make a seasoned DevOps engineer shudder. The answer to this question of scale is memory and caching.
In many applications, data retrieved from data stores does not change rapidly, so it makes sense to cache a database response for later use. This improves performance and keeps infrastructure costs down. Or it could be that your app queries a data warehouse that is only refreshed nightly. In that case the data will not change throughout the day, and it is safe to cache it for easy retrieval. Another application-side caching mechanism is memoization, which caches the results of expensive function calls so that repeated calls with the same arguments return the stored result.
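To make the two ideas above concrete, here is a minimal Python sketch. It assumes a single-process app and uses invented names throughout: `TTLCache`, `fetch_report_from_warehouse`, and `shipping_cost` are hypothetical stand-ins, not part of any real library. The first half caches query results for a fixed time window; the second half memoizes a function with the standard library's `functools.lru_cache`.

```python
import time
from functools import lru_cache


class TTLCache:
    """Tiny time-based cache (hypothetical helper): entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit that is still fresh
        value = loader(key)      # miss or expired entry: go to the data store
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Hypothetical stand-in for an expensive data warehouse query.
query_count = 0


def fetch_report_from_warehouse(report_name):
    global query_count
    query_count += 1
    return {"report": report_name, "rows": 1000}


# Memoization: results are stored per unique argument combination.
@lru_cache(maxsize=128)
def shipping_cost(weight_kg, distance_km):
    # Made-up formula standing in for a costly calculation.
    return round(weight_kg * 0.5 + distance_km * 0.01, 2)


if __name__ == "__main__":
    # The warehouse refreshes nightly, so a 24-hour lifetime is assumed here.
    cache = TTLCache(ttl_seconds=24 * 60 * 60)
    cache.get_or_load("daily_sales", fetch_report_from_warehouse)
    cache.get_or_load("daily_sales", fetch_report_from_warehouse)
    print("warehouse queries:", query_count)

    shipping_cost(2.0, 100)
    shipping_cost(2.0, 100)
    print("memoized hits:", shipping_cost.cache_info().hits)
```

The expiry window is the main design choice: it should be no longer than the period over which the underlying data is known to stay unchanged, otherwise the cache will serve stale results.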
Your Database Remembers
Many databases have internalized these caching needs by being natively in-memory, by offering an in-memory cache, or by simulating in-memory operation, potentially with limited features. The range of solutions is broad: there are simple key-value stores aplenty in this space, as well as complex relational databases that offer in-memory caching. The benefit is that this removes the need to design a caching mechanism on the app side. The downside is that the data can outgrow the memory available to the database server, causing out-of-memory errors on a critical part of your infrastructure.
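As one small illustration of the in-memory end of that spectrum, SQLite can hold an entire database in RAM when you connect to the special name `:memory:`. The sketch below uses Python's standard `sqlite3` module; the `products` table, its columns, and the helper function names are invented for the example, and a production system would more likely use a dedicated in-memory store.

```python
import sqlite3


def build_in_memory_store(rows):
    """Create a RAM-only SQLite database and load (sku, price) rows into it."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO products (sku, price) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup_price(conn, sku):
    """Return the price for a SKU, or None if it is not in the store."""
    row = conn.execute(
        "SELECT price FROM products WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = build_in_memory_store([("A1", 9.99), ("B2", 24.5)])
    print(lookup_price(store, "A1"))
    print(lookup_price(store, "ZZ"))
    store.close()  # the data is gone once the connection closes
```

The trade-off described above shows up directly here: everything lives in the process's memory, so the dataset has to fit in RAM, and it disappears when the connection is closed.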
Caching the Internet
A few months ago, Stephen discussed the potential end of net neutrality. In the scenario of a peer-to-peer internet, what would facilitate the transfer of data is an Information-centric Network (ICN). There would no longer be a host-centric network, but rather a series of nodes with caches of data, with the client routed based on the information required. Currently, when a request is sent across the internet, you are ultimately routed to an IP address. In an ICN architecture, the information needed points you to the nodes that house data related to the request. Distributing caches in this framework allows for failover and replication of information, enabling high scalability and reliability.
Simply hitting data stores over and over again is effective but inefficient. Bludgeoning a database for frequently polled data can cause row locks and connection issues, and can send costs soaring just to keep up with an inefficient design pattern. As always, systems at scale require finesse and solutions that maximize the potential of the hardware. Leveraging memory caching, whether through your own application or an in-memory database, lets you use resources already available to you and do more with less.