computer_science_distilled

Book: Computer Science Distilled

November’s Book: Computer Science Distilled by Ferreira Filho

Why this book: I do not have a degree in computer science and this looks to be a good primer to get my mind grapes flowing. My college major, Operations & Information Systems Management, did explore some of these concepts but not in depth. I think it is time for a refresh and a deeper dive.

Final, Final Takeaways:

  • Each chapter ends with a reference list of books to further explore the chapter’s topics. 
  • The colophon states the cover image is from an 1845 schematic by Charles Babbage; the first programmable computer!

Notes/Thoughts:

Chapter 1: Basics

  • Flowchart
    • states & instructions = rectangle
    • decision step = triangle
  • Factorials get BIG FAST. 10! = 3,628,800
  • Fun fact: human DNA has about 3 million base pairs, replicated in each of the 3 trillion cells of the human body
  • Final Takeaways
    • Great refresher on mathematical terms/concepts: truth tables, permutations (with/without identical items), combinations, probabilities: counting and independent, complementary, and mutually exclusive events. Definitely can use this chapter as a reference!
    • XKCD is a great comic
    • The Zebra Puzzle is fun

Chapter 2: Complexity

  • Two main components when considering what an algorithm will “cost”: time and memory (ongoing calculations take up space)
  • Exponential algorithms (2^n) are much more prohibitively expensive than quadratic algorithms (n^2)
  • Final Takeaways
    • I want to learn more about Big O notation – I recognize the term & idea, and feel it is the most vital content of the chapter 
    • I have a book in mind for December (Designing Data-Intensive Applications), but Big O would be a good topic for another month. Eventually, I’d like to be a software or systems engineer and obviously knowing more in this realm will be key!

Chapter 3: Strategy

  • merge algorithm O(n)
    • fixed number of operations
    • think 2 lists of fish, sorting alphabetically
  • power set algorithm O(2^n)
    • double the operations when input size increases by 1
    • power set = truth table
    • think fragrance combinations from a set of flowers
  • recursion
    • remember the base case
    • think palindrome or fibonacci function
    • recursion vs. iteration => it’s a trade-off
      • more expensive: recursion
      • faster speed: iteration
      • higher complexity: iteration
  • brute force (exhaustive search) O(n^2)
    • number of pairs in an interval increases quadratically as interval increases
    • think best trade: in one month, find the day where buying & selling nets the most profit. OR find the optimal buy/sell timeframe
  • backtracing
    • think 8 Queens Puzzle
    • use recursion… once you get a false value, rollback to the last true value and try again
    • “fail early, fail often”
  • heuristics
    • method that leads to a solution that is good enough
    • think chess (man vs. computer): after your first 4 moves, 288 billion possible positions, wow! find a move that is good enough.
    • Greed Approach
      • make best choice at each step, and don’t look back
      • think burglar filling a knapsack. No time to remove or consider items already in the knapsack
  • divide and conquer
    • look for optimal substructure and use recursion
  • memoization
    • think about knapsack… some calculations done repeatedly
    • store the result of the repeated calc
  • branch and bound
    • divide into subproblems
    • find upper & lower bounds of each subproblem
    • compare subproblem bounds w/ all branches
  • Final Takeaway
    • will be very helpful to return to this chapter when tackling a data-parsing problem

Chapter 4: Data

  • Abstract Data Types: how variables of a given data type are operated
    • stack – LI,FO
    • queue – FI,FO
    • list – more flexible than stack or queue; many available data operations
    • sorted list – fewer operators than list, but items always ordered
    • map – stores mappings with a key and value (kinda like a Hash?)
    • set – unordered group of unique items
  • Structures: how variables/data organized & accessed
    • array – instant access, sequential memory space, but can be impractical with large data sets
    • linked list – each item has a pointer to memory location of next item
    • double linked list – each item has pointers in both directions
      • for either linked list… can not find specific nth item in list
    • array vs. list => it’s a trade-off
      • faster insert/delete: list
      • insert/delete at random points: list
      • random, unordered data access: array
      • extreme performance access: array
    • tree
      • think about traversing HTML
      • nodes & edges
      • root node – no parent
      • leaf node – no children
      • height – level of deepest node
    • binary search tree
      • at most, each node can have only 2 children
      • left node < parent, right node > parent
      • binary heap – tree where parent must be greater (or smaller) than both child nodes
    • graph – no child, parent, or root node! most flexible structure!
    • hash – each item is given a memory position; still needs a large chuck of memory set aside
  • Final Takeaway
    • How is a hash different from a map? I know one falls under how data is operated, and the other organized/accessed, but the definitions feel very similar.

Chapter 5: Algorithms

  • Most important, an efficient algorithm likely exists already to solve your issue!
  • Lots of sorting algos: Selection, Insertion, Merge, Quick
  • Searching: Sequential, Binary, or use a Hash!
  • Graphs: Depth First Search vs. Breath First Search => it’s a trade-off!
    • DFS – down, using a Stack
    • BFS – across, using a Queue
    • Simple, less memory: DFS
    • DFS if need to explore all graph nodes
    • BFS if expected location is close to the root
  • classic problem: find the shortest path between nodes
    • try Djkistra Algorithm
    • uses a priority queue (as opposed to BFS, which is an auxiliary queue)
    • huge area? try Bidirectional Search
  • Google, PageRank Algorithm
    • modeled the web as a graph
    • web page = node, links = edges
    • the more edges a page has, the higher the rank! 
  • network, workflow, cost issues? Linear optimization problems are likely best solved with Simplex Method
  • Final Takeaways
    • I think the key here is first to see how your data is modeled, and then look for a algorithm that matches the problem at hand
    • Beware of choice! Successful algorithms can have pitfalls, drawbacks

Chapter 6: Databases

  • The definition of Normalize, Normalization is not what I have used/heard
    • book: The process of transforming a database with replicated data to one without
    • familiar: prepare and/or flatten data for ease of consumption
    • I think I have been wrong here (the very reason I am writing this blog & reading these books lol)!
  • An index is basically a self-balancing binary search tree
  • NoSQL (btw, it can be pronounced either way)
    • most widely known type: document store
      • a data entry contains all info app needs 
      • data entry: document
      • group of documents: collection
  • Graph Databases
    • data entry = node, relationship = edge
    • the most flexible type of DB!
    • data modeled like a network? Graph is likely best
  • Big Data: Volume, Velocity, Variety, (Variability, Veracity)
  • SQL vs. NoSQL => it’s a trade-off!
    • data-centered: SQL
    • maximize structure & eliminate duplication: SQL
    • application-centered: NoSQL
    • faster development: NoSQL
    • time spent on consistency: SQL
  • Distributed
    • Single-Master Replication
      • master receives all queries
      • forwards to slave (which contains a replica of DB)
    • Multi-Master Replication
      • load balancer distributes queries
      • all masters connected
      • write queries propagated amongst all masters
    • Sharding
      • each computer has portion of DB
      • query router send query to correct computer
      • use with replication to avoid one shard failing and having portion of DB unavailable
    • be wary of Data Consistency; Eventual Consistency (many writes among distribution take time to catch up & sync) might not be good enough
  • Serialization: SQL, XML, JSON, CSV

Chapter 7: Computers

  • bus: group of wires for transmitting data (think individual RAM component)
    • address bus: transmit address/location data (unidirectional)
    • data bus: transmit data to and from
  • register: internal memory cell in CPU
  • instruction set: collection of all operations
  • at core of never-ending CPU cycle is Program Counter (PC or BIOS)
    • stores the memory address of next instruction to be executed
    • special register
    • will hold immutable core logic for computer startup
  • CPU clock: number of basic operations per second
    • 2 MHz = two million operations/second
    • quad-core 2 GHz = close to a billion operations/second
  • bit architecture
    • 4 bit = processing binary number instructions up to 4 digits 
    • 8 bit = up to 8, 32 bit = 32 digits, etc
    • thus a 64-bit program can’t run on 32-bit
    • 64-bit register = 2^64 = over 17 billion gigabytes
  • endian
    • little-endian = store numbers left-to-right
    • big-endian = right-to-left
  • compiler
    • converts programming language into machine instructions
    • turing-complete: read/write data, performs conditional branching
    • scripting languages (JS, Ruby, Python) use an interpreter to skip compiling (much slower! but immediate code execution)
    • once compiled, original code is impossible to recover
    • BUT is possible to decode the binary (disassembly). this is reverse-engineering, and frequently how programs are hacked (think pirated software that bypasses auth/download code)
  • memory hierarchy
    • Processor-Memory Gap: RAM is slower
    • the following help bridge the gap
      • Temporal Locality: if used once, likely to be used again
      • Spatial Locality: when address used, near-by addresses likely to be used shortly
      • L1/L2/L3 Cache: contents of memory with high probability of being accessed
    • Main Memory (RAM) – primary
    • Main Storage (DISK) – secondary, could be tape or SSD
  • Final Takeaway

Chapter 8: Programming

  • Values are first-class citizens. I never thought of it this way; I always associated the term “first-class” with functions & JavaScript
  • Paradigms
    • Imperative
      • first!
      • exact instructions at each step with specific commands
      • Machine Code: mnemonic instructions (ex – CP, MOV using Assembly ASM)
      • StructuredGOTO, JUMP to control execution flow, eventually conditionals (for much better control)
      • Procedural: an advancement, allowing dry code & reusability
    • Declarative
      • what you want, not how
      • Functional:
        • functions are first-class, & thus higher-order functions
        • closures 
          • can “remember” a var, and access it in future 
          • allow for clean approach to global var
    • Logic: best for AI, natural language processing
  • Final Takeaway
    • Once, I failed an interview code challenge. I now know why because I did not understand or know about closures!