data-intensive

Book: Designing Data-Intensive Applications

December’s Book: Designing Data-Intensive Applications by Martin Kleppmann

Why this book: I want a deep dive into databases, and to think more along the lines of a software/systems engineer. 

Takeaway: this a huge, detailed textbook. I used a highlighter liberally. For this post, I want to create a quick list of common, critical questions & topics (basically, an outline!).

  1. Reliability, Scalability, Maintainability
  2. What is the load on the system (think bottleneck)?
  3. Relational Model vs. Document Model (RDMS vs. NoSQL)
  4.  What to index? 
  5. How to log?
  6. Compatibility – Backward, Forward
  7. Encoding… JSON, XML, Protocol Buffers, Avro, etc.
  8. Distributed Data – Scalability, Fault Tolerance/Availability, Latency
  9. Replication: single-leader, multi-leader, leaderless
  10. Replication – synchronous vs. async
  11. Partitioning (Sharding) combined with Replication
  12. ACID: Atomicity, Consistency, Isolation, Durability
  13. Dirty Reads, Dirty Writes
  14. Serialization 
  15. Distributed Systems problems: unreliable networks, faults/partial failures, timeouts, unreliable clocks, process pauses
  16. Consistency through Linearization
  17. Systems of Record vs. Derived Data Systems
  18. Batch Processing vs. Stream Processing
db

What kind of SQL … ??

Tech: Relational Database Management Systems

Challenge: I wanted to take a few moments to read, research, and write down the differences in relational databases. I have experienced this many times – when asking a colleague a high-level conceptual database question, the initial response is “what kind of database is it?”… I don’t always think that is relevant considering the question, but nonetheless I’d like to know the key advantages & disadvantages among these popular systems.

SQLite

  • Advantages
    • light-weight, easy to embed in software
    • very fast read/write
    • no install/configure
    • minimal bugs
    • serverless
    • open-source
  • Disadvantages
    • restricted to 2GB in most cases
    • can only handle low volume transactions
    • no concurrent writes (bad for write-intensive)
    • not multi-user
    • no data type checking (can insert string into integer field)

 
MySQL (side note: LAMP-stack… Linux, Apache, MySQL, PHP)

  • Advantages
    • open source
    • most secure & reliable – used by WordPress, Facebook, Twitter
    • great data security & support for transactional processing (eCommerce)
    • configurable for flawless performance
    • optimized for web applications
    • can run on all major platforms, and supports apps in all popular languages
    • easy to learn
  • Disadvantages
    • might not be great for high concurrency levels
    • helmed by Oracle now… progress has slowed/halted
    • key features are applications and add-ons (text search, ACID compliance)
    • can be limited in areas such as warehousing, fault tolerance, performance diagnostics

 
PostgreSQL

  • Advantages
    • open source
    • exhaustive library & framework support
    • superior query optimizer (great for complex data models)
    • built-in NoSQL key-value store
    • can practically be used for any data problem situation
    • integration with Heroku
    • very reliable
  • Disadvantages
    • so expansive, can be tough to learn
    • slower than MySQL

 
Oracle

  • Advantages
    • all instances backward compatible
    • high functionality for large data sets – many large international banks utilize Oracle
    • high data integrity (aces ACID test)
    • efficient data-recovery tech (Flashback)
    • supports cursors (making programming easier)
  • Disadvantages
    • expensive
    • complex
    • high level of expertise typically needed to properly administer

 
Microsoft SQL Server (MS SQL)

  • Advantages
    • enterprise level 
    • excellent data recovery
    • easy to install
    • great security features (security audits, events can be automatically written to log)
    • great scalability
    • integration w/ .NET framework
  • Disadvantages
    • expensive
    • limited compatibility (Windows-based)
    • uses custom core language

 

Reference:

Wikipedia: RDMSs