What We're Building Towards (For Now…)

Posted by Matt Hagy on May 01, 2020.

I imagine that many of you joining me on this journey through database development would like to know where we're headed. While I don't have a formal plan, nor do I have a final destination in mind, I do have an intermediate goal. I want us to push key/value stores to their limits and in doing so understand the motivation for other types of databases; e.g., relational databases and columnar stores.

Initially, we'll be covering some standard aspects of a general key/value store. For example, we've already explored the following topics.

Next, I plan for us to explore and develop these additional standard components of a key/value store.

Partitioning (i.e., sharding) for enhanced concurrent performance
Write-ahead logs for data durability
Caching frequently used keys
A proper RESTful HTTP interface
Structured values (i.e., documents) with support for partial reads and partial writes
Composite keys that include both a partitioning and sorting key
Secondary indexes
Table scan queries
Providing isolation and atomicity through transactions with multi-version concurrency control
Replication in a leader/follower configuration

Yet, I also want us to experiment with support for features not commonly associated with a key/value store, including:

Aggregation queries. E.g., computing the sum of some attribute for documents that match certain criteria.
Join queries, which allow users to write queries that use more than one table. This includes joins on non-indexed columns, which will require us to develop some basic map-reduce-like computing capabilities.
Constraints, including uniqueness of values and also the preservation of relations due to foreign keys when entries are updated or deleted.
Basic triggers and stored procedures, for which we'll develop a mini Turing-complete language.
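
To make the first item concrete, here is a minimal sketch of an aggregation query over an in-memory key/value store. It assumes documents are plain Python dictionaries; the names `sum_attribute` and `store`, and the sample order data, are hypothetical and not taken from the project's code.

```python
from typing import Any, Callable, Dict

Document = Dict[str, Any]


def sum_attribute(
    store: Dict[str, Document],
    attribute: str,
    predicate: Callable[[Document], bool],
) -> float:
    """Sum `attribute` over every document that satisfies `predicate`.

    With no secondary index to narrow the search, this visits every
    value in the store once, i.e. it is a full table scan.
    """
    total = 0.0
    for document in store.values():
        # Skip documents that fail the criteria or lack the attribute.
        if predicate(document) and attribute in document:
            total += document[attribute]
    return total


# Hypothetical data: order documents keyed by an order id.
store = {
    "order:1": {"customer": "alice", "amount": 30.0},
    "order:2": {"customer": "bob", "amount": 12.5},
    "order:3": {"customer": "alice", "amount": 7.5},
}

alice_total = sum_attribute(store, "amount", lambda d: d["customer"] == "alice")
print(alice_total)
```

Even this toy version hints at the deficiency we want to explore: the store has no way to answer the query without reading every document, which is one reason relational databases maintain indexes and query planners.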

Yes, such features are rarely supported by key/value stores and are instead more commonly associated with relational databases. In developing support for these features, I believe we'll start to understand some of the deficiencies of a key/value store and the motivation for different types of databases. Further, we'll get to explore some of the algorithms and components used by relational databases and see how they can be adapted to meet the needs of our novel and increasingly sophisticated key/value stores.

From there, we can decide whether we want to build a proper relational database or possibly go in a different direction on our journey through database development.

Additionally, I hope we can regularly extend and refine our benchmarking techniques so as to best quantify the different aspects of databases and the workloads they support. For example, I'm currently figuring out how to best simulate the workload experienced by a key/value store that serves as the data store for a website. Measurements can include the distribution of latency, to see whether any of our hypothetical users would have a particularly bad experience in terms of long wait times due to certain design and configuration decisions of the key/value store.
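
As a rough illustration of what measuring a latency distribution could look like, the sketch below times individual operations and reports the median alongside tail percentiles. It is only a sketch under stated assumptions: the function names `measure_latencies` and `summarize` are hypothetical, and the "store" is a plain dictionary standing in for a real key/value store.

```python
import random
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(operation: Callable[[], None], requests: int) -> List[float]:
    """Time each call to `operation` and return latencies in milliseconds."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Report the median and tail percentiles of the measured latencies."""
    # quantiles(n=100) returns 99 cut points; index 49 is the 50th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(latencies),
    }


# Hypothetical stand-in for a key/value store: a dict with 1,000 keys.
store = {f"key{i}": i for i in range(1000)}
rng = random.Random(42)


def read_random_key() -> None:
    """Simulate one website request that reads a single random key."""
    store[f"key{rng.randrange(1000)}"]


summary = summarize(measure_latencies(read_random_key, 10_000))
for name, value in summary.items():
    print(f"{name}: {value:.4f} ms")
```

The tail percentiles are the interesting part here: an average can look healthy while the p99 reveals that a small share of users wait far longer than everyone else.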

We can also take diversions from developing databases to instead study and benchmark existing databases. I've personally spent some time reading through PostgreSQL and LevelDB source code and I think it would be interesting for us to learn how robust databases address certain challenges, including how they implement their solutions in code.

So that's the plan for now. Let me know if you have suggestions about projects that we should explore or feedback on the current work at matthew.hagy@gmail.com.