Introduction
MIT 6.824: Distributed Systems
Lectures
-
- GFS
-
- Fault Tolerance Raft (2)
-
- Zookeeper
-
- Cloud Replicated DB, Aurora
-
- Distributed Transactions
-
- Spanner
-
- Big Data: Spark
Lecture 1: Introduction
Overview
- Infrastructure
- Storage
- Communication
- Computation
- Implementations
- RPC
- threads
- concurrency
-
performance and scalability
- double computers, double throughputs
- Fault Tolerance
- Availability
- Recoverability
- Non-volatile Storage
- Replication
-
Consistency
- example key value service
Google MapReduce
- simple for a programmer to write a parallel distributed computation
- programmer gives a map function and a reduce function
- the MapReduce program distributes the workload to Google servers
- the programmer does not need to worry about the distributed system itself!
- word count is the classic example
- no need to worry about fault-tolerance!
- if a single worker fails, rerun it!
- GFS - file system that is distributed
- have to do column store
- This is a shuffle, but expensive to do
Lecture 3: GFS
- GFS published in 2003, MapReduce in 2004, Hadoop/HDFS in 2006
Big Storage
- performance - sharding
- fault - tolerance
- tolerance - replication
- replication - inconsistency
- consistency - low performance
- strong vs weak conssitency
- last write corrupt
GFS
- big, fast
- global
- sharding
- automatic recovery
- single data center (really :O)
- internal use
- big sequential accces (not random)
- single master!
- map reduce has as single master too, but failure is so unlikely its fine to rerun all operations
- master data
- file name
- __ handles__
- list of chunk server (cs)
- primary version number (v)
- lease expiration
- LOG, CHECKPOINT, DISK
- append to log efficiently
- append to log efficiently
- READ
- name of master
- master ot list of servers
- gets chunk server which sends data back
- WRITE
- no primary - on master
- find up to date replicas
- pick p, s
- increment version #
- problem of split brain
- network partition
- give a primary a lease (has a timer)
- primary know who has the lease and can wait for it to expire
- these are secondaries
- mostly appends
- ask if they can do it
- only write if they promise they can
- what if primary crashes
Lecture 6: Fault Tolerance: Raft (1)
- RAFT
- state machine replication correct
- MapReduce and GFS require single master to choose
- simple, but single point of failure :/
- Split Brain
- two clients and two servers
- what if a client crash?
- what if a server crash?
- split brain c1-s1, c2-s2, the data: 1 can be inconsistent
- this is a network partition
- solution: majority vote with odd number of servers
- example 2 out of 3 majority
- quorum
- with 2f+1, you can have f failures, because f+1 agree to a majority
- Paxos
- VSR
- solution: majority vote with odd number of servers
- this is a network partition
Raft
- K/V server
- table
- Raft layer
- log of operations
- replicated
- Clients: C1, C2
put(k, v)
get(k)
- for an operation
- enters master server, enters RAFT layer
- when have majority vote coordination
- send up to the key/value state
- when have majority vote coordination
- enters master server, enters RAFT layer
- AE: append entry
- with 1 master and two other servers, only need one to reply
- go interface
- applyChannel, applyMsg with index
- election timer -> start election
- leader election
- term++, request votes
- choosing random times
- newly elected leader, but has divergent logs
- handling server crashes
- S1 lost 3 on log
- can become leader and crash
Visualization
- green client sends value to blue server
- distributed consensus is agreeing on a value with multiple nodes
- nodes can be in 3 states
- follower state
- candidate state
- leader state
- all nodes start in follower state
- it sends a request vote to the other nodes
- they reply with their vote
- if it gets a majority it becomes leader
- this is leader election
- all changes now go through the leader
- each change is written to a log (uncommitted)
- to commit the change is replicated to followers
- after the leader commits, it notifies the followers the entry is committed
- they commit in the process, called log replication
Leader Election
- first election timeout
- amount of time follower waits until becoming candidate
- randomized between 150ms and 300ms
- and votes for itself
- the nodes also reset election timeout
- like a server crashes
- no heartbeat sent. one node starts and election because its timeout was randomly faster
- example of a split vote
- when this occurs a revote happens and Node D is leader
Log Replication
- done using append entries messages
- a response is sent to the client
- network partitions
- the log entry stays uncommitted!
- once the partition is healed, B steps down
Lecture 7: Fault Tolerance: Raft (2)
- Why not longest log as leader?
- example vot yes only if high log entry
- AE Reply
- Persistent
- log
- correct Term
- voted For
- synchronize disk update - persisting
write(fd, _)
fsync(fd)
- storing a log
- snapshot every once in a while for efficiency
- linearizability
- linear execution history
- operations that finish first, and reads must come after start of writes
- reads and writes in lineariable order
- not linearizable - has a cycle
Lecture 8: Zookeeper
- The history is linearizable
- not linearizable - cycle in the graph
- Zookeeper layer
- ZAB layer
- many clients
- more servers -> slower
- allow client to send any read only request to any replica
- log
- guarantees
- linearizable writes
- cheap read
- want fresh data
- can send a sync operation
- write order
- read order?
- exist("ready")
- if it ever exists, PLEASE send me a notification
- at the correct point in the log to the client's reads!
Lecture 10: Cloud Replicated DB, Aurora
- Amazon Web Services
- Custom Backend! - Not AWS Services
- EC2 - general server/vm
- S3 - store snapshots
- persist to EBS
- can load a new EC2
- Availability Zones
- Transaction (XACTION)
- crash recovery
- to log
- RDS
- replicated to another AZ
- replicas with quorum
- just log entries
- only need 4 out of 6!
- doing this instead of having to replicate the page table (like RDS) 35x performance!
- goals of survival of failure of datacenter (AZ)
- transient slow replicas (?)
- quorum replication
- n replicas
- read and write quorum must overlap
- W and R
- databases vm/servers can crash
- spin up new server log says redo these in the transactions
- many many pages
- copy of A, B, etc on different servers
- replacement, pick 100 segment services
- pick 1 replica from each
- reads more common than writes
- atomicly
Lecture 12: Distributed Transactions
- Distributed Transactions
- Concurrency Control
- Atomic Commit
- Above done in an abstraction call a transaction
- BEGIN_X
- END_X
- example of a read only transaction
- ACID
- Atomic - all or nothing vs failure
- Consistent
- Isolated - Serializable
- Durable
- serial order of execution of transactions
- Concurrency Control
- pessimistic - locking
- optimistic - occ
- concurrency control with 2 phase locking
- distributed transactions
- want atomic all or nothing - but what if failure
- two-phase commit!
- transaction coordinator
- sends prepare messages to everybody
- yes/no
- commit
- ack
- lock/unlock - but can crash
- use a log
- that is why we have yes/no and ack response
- individually replicated - raft/group
- sharding?
Lecture 13: Spanner
- globally replicated database
-
each data center
- copies for redundancy and to be near clients
- sharded - throughput
- has a leader/follower
-
R/W transactions
- BEGIN
- x = x + 1
- y = y + 1
- END
- BEGIN
- 2 Phase Commit and 2 Phase Locking in Paxos Groups
- do a read transaction
- to master of paxos group - replicate to others
- 2PC with other master of paxos group
- commits and leaders of paxos group can replicate to copies
- simple wishes - wanted constraints
- serializable
- external consistency
- can't just read latest value rip
- Snapshot Isolation
- imagine all time is synchronized
- can now do timestamp
- need clock synchronization
- gps master
- Google has TrueTime
- returns (earliest, latest)
- get latest
- start
- commit
- delay time stamp such that the earliest it could possibly be is greater that the latest another transaction could be
- the latest for the T1 timestamp is 10
- so read after 12
Lecture 15: Big Data: Spark
- PageRank in Spark
- scala
- graph
- simulation in parallel iteratively
- run example
- page1 has highest rank
val lines = spark.read.textFile("in").rdd
// MapPartitionRDD[5]
// not actually executed! lazy lineage graph!
// get an array rep
lines.collect()
val links1 = lines.map{ s => val parts = s.split("\\s+"); (parts(0), parts(1)) }
val links2 = links1.distinct()
val links3 = links2.groupByKey()
// persist to disk ?
val links4 = links3.cache()
var ranks = links4.mapValues(v => 1.0)
val jj = links4.join(ranks)
val contribs = jj.values.flatMap{ case (usls, rank) => urls.map(url => (url, rank / urls.size)) }
ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
val output = rank.collect()
output.foreach(tup => println(s"${tup._1} has rank: ${tup._2} ."))
- the lineage graph
- store in HDFS (like GFS)
- parallel RDD
read
map
- shuffle
- expensive
distinct
- wide dependencies
- complete
- failed worker
- wide dependencies problem!
- store on HDFS - fault tolerant storage
- checkpoint
- if a worker fail
- have the lineage graph so we can reconstruct
Lecture 16: Cache Consistency: Memcached at Facebook
- Facebook's goal is to making features and selling ads
- Only scaling if they need to
1.
- Simple, single server
2.
- Multiple Front End Servers
- Still a Single backend database
- Don't have to worry about things like distributed transactions
3.
- Multiple Front End Servers, Multiple Databases Shared
- a-g for database 1, g-q for database 2, q-z for database 3
4.
- MEMCACHE Servers
get(key, value)
put(value)
- a cache for data that is hit often "hot" data
- but if a MEMCACHE fails, the database will fail
- the database usually only sees a few percent that the caching layers sees!!!
5.
- Stale data doesn't really matter
- just some pictures from friends from last hour, etc.
- Two datacenters
- Try to avoid cross nation communication
Read and Write Lookaside Caching
- delete for overwriting, because the data in the cache is stale
- performance by:
- partition
- RAM efficient
- not efficient for hot keys
- replication
- good for hot keys
- few TCP connections
- partition
- performance by:
- replication between datacenters
In One Region
- sharded database
- multiple clusters
- FE to MC connections is \( O(N^2) \) so have to limit it and make it in clusters
- Regional Pool Cache
Thundering Herd
- a problem where a miss causes other front end servers to overload the database
- instead use a lease
- a number
- invalidate the lease
- when trying to write note lease is invalid, write is invalid
- when deleting, use gutter memcache
Consistency
- we get into a data race
- problem - data is out of date forever!
- use lease again
- grant a lease (a number)
Extras
- CS 162: Key-Value Stores, and Chord
- Dynamo: Amazon’s Highly Available Key-value Store
- CockroachDB, Spanner, MongoDB
- Fast and Secure Global Payments with Stellar
- Scalability! But at What COST?
- Cluster Management with Borg and Kubernetes
- Serverless, Coordination-free Distributed Computing, and the CALM Theorem
- An Illustrated Proof of the CAP Theorem
- Ray: A Distributed Execution Framework for AI
Extra 1: Scalability Harvard CS 75
- VPS Service
- for hosting
- Amazon Web Services :)
Vertical Scaling
- Just throw a bigger computers
- more RAM, more cores, more storage space
- SAS and SSD drives for write heavy operations like databases
Horizontal Scaling
- a bunch of cheaper slower machines
Load Balancer
- load balancer has its own IP address
- from client to load balancer to server 1
- could do a copy of everything on each server
- could do instead images.host.com and have a server for each part
- could load balance with our DNS server
- Round Robin: to server 1, to server 2, to server 3, then back to server 1
- What if server 1 gets heavy weight users?
- Let Load Balancer decide instead of round robin
- Can use AWS Load Balancer,
- Application Load Balancer (ALB), Network Load Balancer (NLB)
Sticky Sessions
- session cookies
RAID
- RAID 0: 2 hard drives, stripe to drive 1 then drive 2
- RAID 1: 2 hard drives, mirror data, if one dies, you still have a copy of your data
- RAID 5: 5 drives, stripe to 4, and have 1 for redundancy
- RAID 6: 2 hard drives can die
- NFS file system, distributed file system
- What if you trip over a power cord?
Caching
- static page instead of dynamic page from PHP, HTML file is static
- problem: changing the style would have to change hundreds of thousands of HTML files
- MySQL cache: for identically executed queries
- memcached: memory cache, stores in RAM
- use extensively by Facebook
Data Replication
- avoid single points of failure
- Master-Master
- High avaliability with Heartbeat Master-Master
- Load Balancing + Replication
- heartbeat
- Active/Passive
- if server dies, heartbeat ends, Passive becomes Active
- Partition and Server slaves for replications
Layers of Replication and Redundancy
- Multiple Load Balancers
- Multiple Backend Servers
- Multiple Load Balancers to Databases
- Multiple Databases
- Cross Connected
- Multiple Network Swtiches
- In a single Datacenter
- AWS Avalibility Zone
- Then different regions, East, West, Asia, Europe
- AWS Avalibility Zone
- To do Geography based Load Balancing by doing a Load Balancer with DNS
- Avoid Single Points of Failure!
Security
- want TCP, SSL to Load Balancer
- Now everything is in HTTP instead
Extra 2: Chord Algorithm Berkeley CS 162
- Consistent Hashing
- ring space of \(0\) to \( 2^m - 1 \)
- take an id like ip, hash it and put it on the ring
- in our key value store, where to store keys?
- node 20 knows key-values for [16, 20] since the previous node is at 15
- correctness
- because it is a ring!
- always a node that hold they keys
- performance
- \(O(log(n))\)
- routing
- start at any node which will contain where the next node is
- worst case is \(O(n)\) - not great
- correctness with stabilization algorithm
- updates predecestor and successor
- a new node gets one node from DNS
- it calls another node, which calls another node, etc.
- the final node connects to the new node and updates its successor as the new node
- it calls another node, which calls another node, etc.
- finger table
- a table that knows probabilisticly approximately halfway, quarter, eighth, etc.
- now instead of having to hope each node
- you can jump at most half way, then another quarter, etc.
- \(\text{key}+2^i ~ \text{mod} ~ \text{ringsize}\)
- fault tolerance if the node fails, successor and precessor has a copy!
- Dynamo used Chord's Consistent Hashing but using a SLA agreement
Extra 3: Dynamo Amazon’s Highly Available Key-value Store
- problem: reads scale with read replicates but writes too much overhead
- another problem was it doesn't have auto-scaling
- wanted to do data modeling of shopping cart
- as a key-value operation
- app at top
- key value operations
- shared (A-F, G-K, etc)
- problem: wanted partitioning
- solution/technique: consistent hashing
- prob: high availability for writes
- sol: vector clocks with conflict resolution on read
- prob: handling failures (temporary)
- sol: hinted handoff
- prob: recovering from failures (permanent)
- sol: anti-entropy with merkle-trees
- prob: membership and failure dectection
- sol: gossip-bsaed protocol
- partitioning with consistent hashing
- instead of just choosing a server from N server with hash(k) mod N
- have a ring [0, L]
- hash(k) mod L
- distribute N servers randomly among the L ring and handle all key calls from ring section of previous node to current node
- instead of just choosing a server from N server with hash(k) mod N
- need a quorum to consider a read or write successful
- vector clocks
- conflict if servers think have different items
- consistenty by sending messages as vectors
- there is a property to timely order operations and get servers to be consistent
- handling failures
- temporary: hinted handoff, gives it to the next node in the consistent hash
- permanent: sync using merkle tree
- periodicly exchange key ranges
- just pass root of tree instead of whole ring
- if inconsistent, pass the children and repeat until find when inconsistent
- pass only a logorithmic size of data and only the root in best case
- membership and failure decection
- could send heart beats
- could have single node to handle listening like zookeeper
- single point of failure
- instead just gossip it and in logorithmic time, realize can identify node is dead
- could have single node to handle listening like zookeeper
- could send heart beats
Extra 4: CAP Theorem
- A distributed system coan have at msot two out of the three properties
- Consistency
- Availability
- Partition Tolerance
- BigTable, MongoDB, and Redis are CP
- Spanner is CP but highly available
- Dynamo, Cassandra are AP
- Amazon Aurora? Physalia in AWS EBS (Looks to be CP but highly available)?
- servers \(G_1\), \(G_2\) and client with initial value \(v_0\)
- write to any server. Only \(G_1\) is updated in this case
Consistency
- any read operation that begins after a write operation completes must return that value, or the result of a later writer operation
- we write to \(G_1\) but read from \(G_2\) and get a different result
- we write to \(G_1\) which replicates to \(G_2\) and sends an acknowledgement which \(G_1\) sends back to ghe client
- now the client can read from \(G_2\) and get a consistent result
Availability
- every request received by a non-failing node in the system must result in a response
- server cannot ignore client's requests
Partition Tolerance
- the network will be allowed to lose arbitarily many messages sent from one node to another
Proof
- proof by contradiction. Assume a system can be CAP
- begin by partitioning our system.
- with a partition, the data is inconsistent
Extra 5: Serverless, Coordination-free Distributed Computing, and the CALM Theorem
- 4 parts
- Serverless
- Avoiding Coordination
- CALM Theorem
- Hydro
- new platform, new language!
- super computers with new programming paradigms!
- how will people program the cloud!?
- building a program modeling is hard
- distributed systems, consistent, and partial failure
- more popular than Map Reduce
Serverless
- fine grain resource usage and efficiency
- new economy models for cloud providers and users
- auto scaling!
- not unbounded distributed computing
- pay per IO
- can't do the "batch" to disks
- No inbound network communcation
- makes distributed comp difficult
- embarrassingly parallel like Map work
- Reduce doesn't :(
- communication heavy, like shuffles in Spark :'(
- makes distributed comp difficult
Avoiding Coordination
- How do you embrace state
- data gravity
- consistency - hard :(
- consistency over long distances is hard
- split brain problem
- coordination based consistency is bad!
- make consistency as small as possible
- coordination has really bad tail latency
- slowdown cascades
- instead reason about application semantics
- rich application logic to READS and WRITES
- formalize semantics!
CALM Theorem
- Consistency as Logical Monotonicity
- if they are logically monotonic, it's consistent!
- programming confluence
- only care about outcomes
- distribtued deadlock detection
- checks cycles
- there exists
- garbage collection
- reference between objects on different machines
- 05 to 06 are garbage
- but machine 2 can't say its garbage until machine 3
- it requires coordination
- you can get crazy parallelism!
- share nothing!
- how to write in logic language instead of declarative language?
- like SQL
- maybe it'll be internal language IR for compiler
- like databases too!
- have our cake and eat it too!
Hydro: Stateful Serverless and Beyond
- Anna autoscaling multi-tier KVS
- Cloudburst Stateful FaaS with caches
- HydroLogic, an IR (doesn't depend on order)
- Hydrolysis, compiler
- Anna be like Redis and S3
- CALM consistency of simple lattices
- autoscaling
- best-of-conference
- multi-tiered!
- can be in fast memory
- or slow persistent disk
- shared nothing at all scales and threads!!!
- under contention, cache thrashing problems
- auto scaling
- cost 350x performance
- robot motion planning
- serverless jupyter
- each cell is running a cloudburst lambda! :O
- can handle all the memroy pursue
- sharing model state!
- motion planning with lambdas
- run compute, then share state
- need a coordination EC2, bottleneck
- much quicker cost
Extra 6: Stellar
- We take for granted currency that's stable
- good investments
- want equitable access to financial systems
- open membership
- issuer-enforced finality
- security
- cross-issuer atomicity
- good market to trade
- ACH
- requires national regulations
- can't trade across systems paypal -> venmo?
Stellar
- Replicated State Machine
- keep ledger safe
- public key authorize operations
- accounts can issue assets
- Transaction guarantee atomicity
- only works if everyone agrees on ledge double-spend attack
- think of the internet
- transitive connections
- China -> Stanford
- China -/-> Google
- Google -> Stanford
- Stellar Consensus Protocol SCP
- Byztentime Hypothesis
- quorum slice - majority
- quorum slice
- is a quorum slice but not quorum
- quorum tier
- like the internet
- citi bank sends an attack
- ACLU doesn't agree because it has no quorum from other non-profits
- both nodes depend on V7
- liveness?
- sufficiency problem
- must be of an intact set (non-evil)
- sufficiency problem
- cascade theorem
- federated voting
- can get stuck
- balloting
- nomination
- quorum intersecting
- cascade until it includes all the nodes
- crappy consensus
- intertwined nodes cannot confirm contradictory statement
- optimal safety
- differ vote
- green is in a quorum
- will eventually cascade
- balloting
- federated voting can get stuck
- invariant: all stuck and decided ballots must chose same value
prepare(n, x)
commit(n, x)
- synchronize with timer
- cascades
- 1,000 operations/ledger
- 133 nodes, 74 validators, 17 "tier-one" by 5 organizations
- hash signing
Extra 7: CockroachDB, Spanner, MongoDB
- timestamp 18:44
- Amazon
- fork of MySQL or PostgreSQL
- custom storage on EBS
- transaction level on EBS
- Redis
- In memory
- single thread engine
CockroachDB
- Database Don't die
- Open Source like Spanner
- Distributed Database
- Decentralized Shared Nothing
- Log Structured Storage Architecture at individual nodes (RocksDB)
- Concurrency Control Model: Multi-Version OCC
- Serializable Isolation
- multi-layer
- RocksDB storage manager (low level)
- Raft - replication and consensus
- key-value api
- for pages to fetch data on
- hybrid clock
- order transaction globally
- transaction stage intent check conflict commits (?)
- global keyspace
- key -> data
- leader
- raft replication
- instead of buffer manager -> disk (your classic single node database)
- get from distributed key value store system
- FoundationDB does the same thing (distributed database)
- engineering! done well, fault tolerant and all the edge cases!
Spanner
- Cloud Spanner
- Google wrote BigTable in 2006
- give up SQL
- give up joins
- column-based database
- Adwords ran on sharded MySQL
- needed transactions
- Megastore
- 2011
- geo-replicated
- schematized, semi-relational ?
- log structured on disk
- strict 2PC MVCC Multi-Paxos 2PC
- Paxos groups
- External Consistency
- global write_transaction synchronous replication
- lock-free read only transactions
- joins are slow :'(
- have to go to another node/table
- interleave
- single page efficient physical denormalization
- wound-wait deadlock prevention
- don't need deadlock detection
- ordering throuhg unique timestamps from atomic clocks and GPS devices
- tablets (shards)
- paxos - elect leader in tablet group
- 2PC for txn spanning tablets
- completely physical wall-clock time
- necessary for linearizability
- paxos group decide order transaction commit
- internal Google API
- time range
- wait long enough
- then can commit
- at commit + release locks
- then can commit
- F1 has OCC
- now SQL
- Built for the ad system
- Spanner SQL
- for everyone else
- good benchmarks but expensive!
MongoDB
- shared nothing document database (2007)
- json document
- now can do multi-action transactions
- don't join
- instead embed single document
- pre join
- the json example
- no query optimizer
- just heuristics
- instead generate a bunch of query plans
- execute all of them!
- whichever is fastest return
- execute all of them!
- shared nothing architecture
- master slave replication
- auto-sharding
- partition by hash or range
- automaticly split if a shard is too big
- startups liked this, oh if I grow, it'll shard automaticly
mmap
- OS storage manage- changed
- storage backend is WiredTiger but could replace with RocksDB
- avoid premature optimizations!
- based on months in the future
- MySQL and Postgres should be fine for most cases
- If need to ACTUALLY scale, you have money - instead find and pay smart people to help
Extra 8: Scalability! But at What COST?
- GraphX paper
- lots of cores 128
- repeated look at each label in the graph
- flows through
- ran a single core in Rust and it was fast!
- with better algorithms! - Even faster
- scalability fetishised by community
- at 16 cores it is the same performance at a single core
- 1, 2 cores
- very very fast! micro-seconds
- still high performance on a single laptop!
- When you are focued - high performance compiled languages, caches, not JVM
- step 1. get 100x on your computer
- step 2. then 100x from 100x computers
- but that's expensive!
Extra 9: Ray: A Distributed Execution Framework for AI
- Task Parallel and Actors Abstraction
- compatible with TensorFlow, PyTorch etc!
- million of tasks per seconds
- stateful compution
- fault tolerance
Ray API
- take python functions
- add
@ray.remote
- returns a future
ray.get(id3)
- add
- task scheduled
- think about tasks and futures
- parameter server basically key-value
import ray
ray.init()
- now a ray.remote
- get with ray.get(ps.get_params.remote())
- fake update
Ray Architecture
- workers, object manager and scheduler
- global control store
Libraries
- Apache Arrow
- serializing, like Python C++
-
pip install ray
-
linear based fault tolerance
- rerun
- can rerun
Extra 10: Cluster Management with Borg and Kubernetes
- Borg
- Google Data Centers
- Even Bigger
- program to run
- cell/custesr
- cmdline args
- 10,000 copies of hello world
- Borg
- BorgMaster
- Borglet
- all managed by Config file
- And Blaze?
- things can go wrong
- a cell
- usage - cpu/vm
- lots of waste
- overhead
- bigger cells are better, less partitioning overhead
- bucket to next order of 2
- some people run very little resource
- some people have nice round numbers
- used best fit scheduling algorithm
- but it could be gamed
- people overallocate
- be more aggressive
- yellow safety margin
- smaller on week 2
- then set it back to original settings
- configs
- monitor
- binaries
- you only need to focus on the app
Kubernetes
- open source Borg
- it schedules and allocates
- you don't need to car about the machine
- replication
- Google Kubernetes Service
- resiliency
- efficiency
- containers