Lecture 1: Introduction

Overview

  • Infrastructure
    • Storage
    • Communication
    • Computation
  • Implementations
    • RPC
    • threads
    • concurrency

  • Performance and scalability
    • goal: doubling the number of computers should double the throughput
  • Fault Tolerance
    • Availability
    • Recoverability
    • Non-volatile Storage
    • Replication
  • Consistency

    • example: a key/value service
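
The notes only name the key/value service as the consistency example. A minimal sketch of what that example might look like follows, assuming two replicas and an update that has reached only one of them. The class `KVServer` and the names `primary` and `backup` are hypothetical, not taken from the lecture.

```python
class KVServer:
    """Hypothetical in-memory key/value server with put and get."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Returns None if the key has never been written.
        return self.data.get(key)


# Two replicas of the same service (replication for fault tolerance).
primary = KVServer()
backup = KVServer()

# The first write reaches both replicas.
primary.put("x", 1)
backup.put("x", 1)

# Assumption: the second write reaches only the primary,
# for example because the update to the backup was lost.
primary.put("x", 2)

fresh = primary.get("x")  # the latest write
stale = backup.get("x")   # an older value for the same key
```

Which of the two values a client is allowed to see is the question that a consistency model has to answer.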

Google MapReduce

  • makes it simple for a programmer to write a parallel, distributed computation
    • the programmer supplies a map function and a reduce function
    • the MapReduce framework distributes the work across Google's servers
      • the programmer does not need to worry about the distributed system itself!

  • word count is the classic example

  • the programmer does not need to worry about fault tolerance!
    • if a single worker fails, the work it was doing is simply rerun
  • GFS (Google File System): the distributed file system that holds the input and output

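  • reduce needs all the values for a key, so the map output (one row per map task) has to be regrouped into columns (one per key)
    • this is the shuffle, and it is expensive because the data has to move between machines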
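
As a rough illustration of the word count example and the shuffle, here is a minimal single-process sketch. It is not Google's implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are assumptions made for this sketch, and everything runs sequentially in one program rather than on many workers.

```python
from collections import defaultdict


def map_fn(filename, contents):
    """Map: emit a (word, 1) pair for every word in one input file."""
    return [(word, 1) for word in contents.split()]


def reduce_fn(word, counts):
    """Reduce: add up all the counts emitted for one word."""
    return sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: one list of (key, value) pairs per input file.
    # In a real deployment each of these would run on a different worker.
    map_outputs = [map_fn(name, text) for name, text in inputs.items()]

    # Shuffle: regroup the per-file map outputs by key, so that all
    # values for one key end up together. On a cluster this step moves
    # data between machines, which is why it is costly.
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


inputs = {"doc1": "a b a", "doc2": "b c"}
counts = run_mapreduce(inputs, map_fn, reduce_fn)
print(counts)
```

The programmer writes only `map_fn` and `reduce_fn`. In this sketch `map_fn` is deterministic and has no side effects, so rerunning a failed map task on the same input would produce the same output.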