Appendix: Databases in more detail
Read additional details and insights that didn't fit into the main chapter on databases.
Flat file database
The simplest way to store data is a flat file database, even if you call it "just organized files".
Serverless systems don't have persistent drives to store files, so flat file databases aren't a popular choice there. You'd have to use S3 or a similar storage service, which negates some of the built-in advantages.
We mention them here because they're often a great choice and most engineers forget they exist.
Advantages of flat files
Compared to other databases, flat files have zero overhead. Your data goes straight to storage without your database adding any logic on top.
This gives flat files amazing write performance, as long as you're adding data to the end of a file.
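Here's a minimal sketch of what append-only writes can look like, assuming you store one JSON record per line. The file name `events.jsonl` and the helpers `append_record` and `read_records` are made up for illustration.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a JSON line; existing bytes are never touched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path):
    """Read every record back, in the order it was written."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical log file in a temporary directory
log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_record(log_path, {"user": "alice", "action": "login"})
append_record(log_path, {"user": "bob", "action": "signup"})
records = read_records(log_path)
```

Opening the file in `"a"` mode means each write lands at the end of the file, which is the case where flat files shine.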
Optimizing file access for speed comes down to your operating system. Performance mostly has to do with memory paging, filesystem configuration, direct memory access, and hardware-level optimizations.
The end result is:
- fast append performance, because you stream data from memory to drive with minimal processor involvement
- fast read performance for common reads, because computers use their memory as a read-through cache. Once you read a file, subsequent reads come from much faster memory
- fast search for predicted lookups, because you can structure your files in a way that makes common lookups instant
- great scalability, because you can spread your data over as many servers as you'd like
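One way to get those instant lookups is to encode the lookup key in the file path. This sketch assumes you always fetch users by id. The directory layout and the names `user_path`, `save_user`, and `load_user` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def user_path(root, user_id):
    # Assumed layout: users/<first two chars of id>/<id>.json
    # The extra directory level keeps any single directory from growing huge.
    return Path(root) / "users" / user_id[:2] / f"{user_id}.json"


def save_user(root, user):
    path = user_path(root, user["id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(user), encoding="utf-8")


def load_user(root, user_id):
    """Look up a user by id without scanning any other file."""
    path = user_path(root, user_id)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


root = tempfile.mkdtemp()
save_user(root, {"id": "ab12", "name": "Alice"})
found = load_user(root, "ab12")
missing = load_user(root, "zz99")
```

The lookup cost is one path computation and one file read, no matter how many users you store. The tradeoff is that any lookup you didn't plan for, like finding users by name, gets no help from this layout.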
Disadvantages of flat files
Where flat files struggle is data updates.
To add a line at the beginning of a file, you have to move the whole thing. To change a line in the middle, you have to update everything that comes after.
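To see the cost of an update, here's a rough sketch that changes one line in a 1000-line file. For simplicity it rewrites the whole file rather than only the part after the change. The helper `replace_line` and the file `data.txt` are made up for this example.

```python
import os
import tempfile


def replace_line(path, line_number, new_text):
    """Replace one line (0-based) and return how many bytes had to be written."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    lines[line_number] = new_text + "\n"
    data = "".join(lines).encode("utf-8")
    # The new line has a different length, so everything after it shifts.
    # This sketch takes the simple route and writes the entire file again.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.writelines(f"row {i}\n" for i in range(1000))

bytes_written = replace_line(path, 500, "row 500 (edited)")
file_size = os.path.getsize(path)
```

Changing a single line meant writing as many bytes as the whole file contains. Compare that with an append, which writes only the new data.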
Another problem is lack of a query interface.
You have to read all your files to compare, analyze, and search. If you didn't think of a use-case beforehand, you're left with a slow search through everything.
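Without a query interface, an unplanned question turns into a full scan. This sketch assumes your data lives in JSON-lines files, one per month. The file names and the `scan` helper are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def scan(root, predicate):
    """Yield every record that matches, reading every line of every file."""
    for path in sorted(Path(root).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    if predicate(record):
                        yield record


root = Path(tempfile.mkdtemp())
(root / "2024-01.jsonl").write_text(
    '{"user": "alice", "total": 30}\n{"user": "bob", "total": 5}\n',
    encoding="utf-8",
)
(root / "2024-02.jsonl").write_text(
    '{"user": "alice", "total": 12}\n', encoding="utf-8"
)

# The files are organized by month, so a lookup by user has to read them all
alice_orders = list(scan(root, lambda r: r["user"] == "alice"))
```

The work grows with the total amount of data you store, not with the number of matching records.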
The end result is:
- slow data updates, because you often have to rewrite more than just the changes
- no data shape guarantees, because you can store individual data items any way you like. If you change your schema, you have to deal with inconsistencies, or rewrite your data
- slow broad reads, because you lose the benefits of the read-through cache if you read data across your entire database with random patterns
- no ACID compliance unless you build it yourself at the application layer
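As a taste of what building it yourself involves, here's a minimal sketch of an all-or-nothing write for a single file. It writes to a temporary file, flushes it to the drive, then renames it over the original. The helper `atomic_write_json` and the file `settings.json` are made up for illustration. The sketch relies on `os.replace` being atomic, which holds on POSIX systems when both paths are on the same filesystem. It covers atomicity for one file only, not isolation between concurrent writers or transactions across files.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Replace the file's contents so readers see the old or new version, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to the drive
        os.replace(tmp_path, path)  # swap the new file in under the final name
    except BaseException:
        # Clean up the partial temp file; the original stays untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


db_path = os.path.join(tempfile.mkdtemp(), "settings.json")
atomic_write_json(db_path, {"theme": "dark"})
atomic_write_json(db_path, {"theme": "light"})

with open(db_path, encoding="utf-8") as f:
    current = json.load(f)
leftovers = [n for n in os.listdir(os.path.dirname(db_path)) if n.endswith(".tmp")]
```

If the process dies before the rename, the original file is still intact. That's one letter of ACID for one file, and you'd still have to handle locking and multi-file changes yourself.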