Lecture 1: Introduction
Overview
- Infrastructure
- Storage
- Communication
- Computation
- Implementations
- RPC
- threads
- concurrency
- Performance and scalability
- ideally, doubling the number of computers doubles the throughput
- Fault Tolerance
- Availability
- Recoverability
- Non-volatile Storage
- Replication
- Consistency
- example key value service
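A minimal sketch of why consistency is hard for a replicated key/value service. The names `Replica` and `put_all`, and the `fail_after` crash switch, are hypothetical and exist only for this illustration. The writer updates replicas one at a time, and a crash partway through leaves the copies disagreeing.

```python
class Replica:
    """One copy of the key/value table (hypothetical name)."""

    def __init__(self):
        self.table = {}

    def put(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)


def put_all(replicas, key, value, fail_after=None):
    """Write to each replica in order.

    fail_after is an assumption of this sketch: it simulates the writer
    crashing after it has updated that many replicas.
    """
    for i, replica in enumerate(replicas):
        if fail_after is not None and i >= fail_after:
            break
        replica.put(key, value)


r1, r2 = Replica(), Replica()
put_all([r1, r2], "x", 1)                # both replicas store x=1
put_all([r1, r2], "x", 2, fail_after=1)  # crash after updating r1 only

print(r1.get("x"), r2.get("x"))  # the two replicas now return different values
```

A client that reads from `r2` still sees the old value, so what a `get` returns depends on which replica answers. Pinning down what reads are allowed to return in cases like this is the consistency question.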
Google MapReduce
- simple for a programmer to write a parallel distributed computation
- programmer gives a map function and a reduce function
- the MapReduce framework distributes the work across Google's servers
- the programmer does not need to worry about the distributed system itself!
- word count is the classic example
- the programmer does not need to worry about fault tolerance either!
- if a single worker fails, the framework reruns its task!
- GFS - the distributed file system that stores the input and output files
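- map output is grouped by map task (rows), but each reduce needs every value for its key (a column)
- this regrouping is the shuffle, and it is expensive because it moves the data across the network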
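The word count example above can be sketched on a single machine as follows. This is an illustration under stated assumptions, not Google's implementation: `run_mapreduce` and `run_task` are hypothetical names, everything runs in one process, and a "worker failure" is modelled as a `RuntimeError`. The programmer writes only `map_fn` and `reduce_fn`; the framework part does the map phase, the shuffle, the reduce phase, and the reruns.

```python
from collections import defaultdict


def map_fn(filename, contents):
    """Programmer-supplied: emit (word, 1) for every word in one input file."""
    return [(word, 1) for word in contents.split()]


def reduce_fn(key, values):
    """Programmer-supplied: add up the counts collected for one word."""
    return sum(values)


def run_task(task, max_attempts=3):
    """Rerun a failed task (hypothetical helper).

    Rerunning is safe here because map_fn and reduce_fn are deterministic.
    """
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError:
            continue
    raise RuntimeError("task failed on every attempt")


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: one task per input file, each producing its own output list.
    intermediate = [
        run_task(lambda n=name, t=text: map_fn(n, t))
        for name, text in inputs.items()
    ]

    # Shuffle: regroup the per-task outputs by key. In a real cluster this
    # step moves data between machines, which is what makes it expensive.
    groups = defaultdict(list)
    for task_output in intermediate:
        for key, value in task_output:
            groups[key].append(value)

    # Reduce phase: one call per key, over all values for that key.
    return {
        key: run_task(lambda k=key, v=values: reduce_fn(k, v))
        for key, values in groups.items()
    }


inputs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_mapreduce(inputs, map_fn, reduce_fn)
print(counts["the"])  # 3
```

Neither `map_fn` nor `reduce_fn` mentions machines, networking, or failures, which is the point of the programming model.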