
Book: Designing Data-Intensive Applications

December’s Book: Designing Data-Intensive Applications by Martin Kleppmann

Why this book: I want a deep dive into databases, and I want to think more like a software/systems engineer.

Takeaway: this is a huge, detailed textbook. I used a highlighter liberally. For this post, I want to create a quick list of common, critical questions and topics (basically, an outline!).

  1. Reliability, Scalability, Maintainability
  2. What is the load on the system (think bottleneck)?
  3. Relational Model vs. Document Model (RDBMS vs. NoSQL)
  4. What to index?
  5. How to log?
  6. Compatibility – Backward, Forward
  7. Encoding… JSON, XML, Protocol Buffers, Avro, etc.
  8. Distributed Data – Scalability, Fault Tolerance/Availability, Latency
  9. Replication: single-leader, multi-leader, leaderless
  10. Replication – synchronous vs. async
  11. Partitioning (Sharding) combined with Replication
  12. ACID: Atomicity, Consistency, Isolation, Durability
  13. Dirty Reads, Dirty Writes
  14. Serializability
  15. Distributed Systems problems: unreliable networks, faults/partial failures, timeouts, unreliable clocks, process pauses
  16. Consistency through Linearizability
  17. Systems of Record vs. Derived Data Systems
  18. Batch Processing vs. Stream Processing
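
To make items 6 and 7 a little more concrete, here is a minimal sketch of backward and forward compatibility using JSON. It is my own illustration, not code from the book. The function names (`decode_v1`, `decode_v2`) and the `name` and `email` fields are made up for the example. The idea is that an old reader should tolerate data written by newer code, and a new reader should tolerate data written by older code.

```python
import json


def decode_v1(payload: str) -> dict:
    """Old reader (hypothetical): only knows about 'name'.

    It ignores any fields it does not recognize, so it can still read
    records written by newer code (forward compatibility).
    """
    record = json.loads(payload)
    return {"name": record["name"]}


def decode_v2(payload: str) -> dict:
    """New reader (hypothetical): knows about 'name' and 'email'.

    'email' is treated as optional and defaults to None, so it can still
    read records written by older code (backward compatibility).
    """
    record = json.loads(payload)
    return {"name": record["name"], "email": record.get("email")}


# A record written by old code, and one written by new code.
old_data = json.dumps({"name": "Ada"})
new_data = json.dumps({"name": "Ada", "email": "ada@example.com"})

# Old code reading new data: the unknown 'email' field is dropped.
forward = decode_v1(new_data)

# New code reading old data: the missing 'email' field gets a default.
backward = decode_v2(old_data)
```

The same two rules (ignore unknown fields, give new fields a default or make them optional) are roughly what schema-based formats such as Protocol Buffers and Avro formalize, each in its own way.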