December’s Book: Designing Data-Intensive Applications by Martin Kleppmann
Why this book: I want a deep dive into databases, and to think more along the lines of a software/systems engineer.
Takeaway: this is a huge, detailed textbook. I used a highlighter liberally. For this post, I want to create a quick list of common, critical questions & topics (basically, an outline!).
- Reliability, Scalability, Maintainability
- What is the load on the system (think bottleneck)?
- Relational Model vs. Document Model (RDBMS vs. NoSQL)
- What to index?
- How to log?
- Compatibility – Backward, Forward
- Encoding… JSON, XML, Protocol Buffers, Avro, etc.
- Distributed Data – Scalability, Fault Tolerance/Availability, Latency
- Replication: single-leader, multi-leader, leaderless
- Replication – synchronous vs. async
- Partitioning (Sharding) combined with Replication
- ACID: Atomicity, Consistency, Isolation, Durability
- Dirty Reads, Dirty Writes
- Serializability
- Distributed Systems problems: unreliable networks, faults/partial failures, timeouts, unreliable clocks, process pauses
- Consistency through Linearizability
- Systems of Record vs. Derived Data Systems
- Batch Processing vs. Stream Processing
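
To make one of the items above concrete, here is a minimal sketch of partitioning combined with replication. It is my own illustration, not code from the book: the node names, partition count, replication factor, and the function names `partition_for` and `replicas_for` are all made up, and real systems use more careful placement schemes.

```python
import hashlib

# All of these values are hypothetical, chosen only for illustration.
NUM_PARTITIONS = 4
NODES = ["node-a", "node-b", "node-c"]
REPLICATION_FACTOR = 2


def partition_for(key):
    """Map a key to a partition using a stable hash of its bytes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def replicas_for(partition):
    """Pick REPLICATION_FACTOR distinct nodes for a partition.

    The first node in the list plays the role of the leader in a
    single-leader setup; the remaining nodes are followers.
    """
    return [NODES[(partition + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def placement(key):
    """Return (partition, [leader, followers...]) for a key."""
    p = partition_for(key)
    return p, replicas_for(p)


if __name__ == "__main__":
    for k in ["user:1", "user:2", "order:99"]:
        print(k, placement(k))
```

Each key lands in exactly one partition, and each partition lives on more than one node, so losing a single node does not lose any partition's data.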