Lecture 6: Fault Tolerance: Raft (1)

  • Raft
    • makes state machine replication correct
  • MapReduce and GFS rely on a single master to make decisions
    • simple, but a single point of failure :/
  • Split Brain

  • two clients and two servers
    • what if a client crashes?
    • what if a server crashes?

  • split brain: C1 talks only to S1 and C2 only to S2, so the data (e.g., the value 1) can become inconsistent between the two servers
    • this is a network partition
      • solution: majority vote with an odd number of servers
        • example: 2 out of 3 is a majority
        • quorum
        • with 2f+1 servers you can tolerate f failures, because the surviving f+1 still form a majority
      • Paxos
      • VSR (Viewstamped Replication)
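
A quick sketch in Go, just to make the 2f+1 arithmetic concrete (not from the lecture):

```go
package main

import "fmt"

// majority returns the quorum size for n servers; any two
// quorums of this size must overlap in at least one server.
func majority(n int) int {
	return n/2 + 1
}

func main() {
	for f := 1; f <= 3; f++ {
		n := 2*f + 1 // 2f+1 servers tolerate f failures
		fmt.Printf("n=%d: quorum=%d, tolerates %d failures\n", n, majority(n), f)
	}
}
```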

Raft

  • K/V server
    • table
    • Raft layer
      • log of operations
    • replicated
  • Clients: C1, C2
    • put(k, v)
    • get(k)

  • for an operation
    • it enters the leader server, then goes down into the Raft layer
      • once a majority of the Raft peers have accepted it
        • it is sent up to the key/value state machine and applied (see the sketch below)
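
A rough sketch of that flow in Go; the names (Op, KVServer, Start) follow the 6.824 lab conventions, but this is an illustration under those assumptions, not the lecture's code:

```go
package kvraft

import "sync"

type Op struct{ Key, Value string } // one client operation

// Raft is the interface the K/V layer needs from the Raft layer.
type Raft interface {
	// Start hands an operation to the Raft log; it returns right away,
	// and the op is applied only after a majority has replicated it.
	Start(cmd interface{}) (index int, term int, isLeader bool)
}

type KVServer struct {
	mu    sync.Mutex
	table map[string]string // the replicated key/value state
	rf    Raft              // the Raft layer underneath
}

func (kv *KVServer) Put(key, value string) {
	// Hand the op to Raft; once committed it comes back up
	// from the Raft layer and only then is kv.table updated.
	kv.rf.Start(Op{Key: key, Value: value})
}

// applyOne is called for each committed op delivered by Raft.
func (kv *KVServer) applyOne(op Op) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.table[op.Key] = op.Value // send the op "up" to the K/V state
}
```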

  • AE: AppendEntries
    • with one leader and two other servers, only one follower needs to reply: together with the leader itself, that is 2 of 3, a majority

  • Go interface
  • applyCh: Raft delivers each committed ApplyMsg, tagged with its log index (see the sketch after this list)
  • election timer -> start election
  • leader election
    • term++, send RequestVote to the other servers
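
From memory, the lab's Go interface looks roughly like this (field and method names follow the 6.824 handout as I recall it, so treat them as approximate):

```go
package raft

// ApplyMsg is what Raft sends up on the apply channel each time
// an entry is committed, tagged with its position in the log.
type ApplyMsg struct {
	CommandValid bool        // true if this carries a committed command
	Command      interface{} // the operation itself
	CommandIndex int         // the log index it was committed at
}

type Raft struct {
	applyCh chan ApplyMsg // the service reads committed ops from here
	// ... the usual Raft state (log, currentTerm, votedFor, ...)
}

// Start asks Raft to append a command to the log. It returns
// immediately; if this peer is the leader, the command later
// appears on applyCh once a majority has replicated it.
func (rf *Raft) Start(command interface{}) (index int, term int, isLeader bool) {
	// body elided in this sketch
	return 0, 0, false
}
```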

  • choosing randomized election timeouts (to avoid repeated split votes)
  • a newly elected leader may have a log that diverges from its followers' logs
  • handling server crashes

  • example: S1 is missing entry 3 in its log

  • a server can become leader and then crash; the remaining servers must elect a new leader and continue

Visualization

  • green client sends value to blue server
  • distributed consensus is agreeing on a value with multiple nodes
  • nodes can be in 3 states

  • follower state
  • candidate state
  • leader state
  • all nodes start in follower state
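
In Go the three states are naturally a small enum (a sketch, not the lecture's code):

```go
package raft

// The three node states from the visualization.
type State int

const (
	Follower  State = iota // all nodes start here
	Candidate              // election timeout fired, asking for votes
	Leader                 // won a majority, sends heartbeats
)
```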

  • when a follower times out, it becomes a candidate and sends a request vote to the other nodes
    • they reply with their vote

  • if it gets a majority it becomes leader
    • this is leader election
    • all changes now go through the leader

  • each change is first written to the leader's log (uncommitted)

  • to commit it, the change is first replicated to the followers

  • after the leader commits, it notifies the followers that the entry is committed
    • the followers then commit it too; this whole process is called log replication (see the sketch below)
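
A sketch of the leader-side commit decision (field names are illustrative; real Raft additionally checks that the entry is from the current term):

```go
package raft

type Raft struct {
	me          int
	peers       []int
	matchIndex  []int // highest log index known replicated on each peer
	commitIndex int   // highest log index known committed
}

// maybeCommit advances commitIndex once a majority holds the entry.
func (rf *Raft) maybeCommit(index int) {
	count := 1 // the leader itself has the entry
	for peer, match := range rf.matchIndex {
		if peer != rf.me && match >= index {
			count++
		}
	}
	// Committed once a majority has it; followers learn the new
	// commit index from the next AppendEntries (heartbeat).
	if count > len(rf.peers)/2 && index > rf.commitIndex {
		rf.commitIndex = index
	}
}
```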

Leader Election

  • first, the election timeout
    • the amount of time a follower waits before becoming a candidate
    • randomized between 150ms and 300ms (sketch below)
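
In Go this is one line of rand (assuming the 150–300ms range from the visualization):

```go
package raft

import (
	"math/rand"
	"time"
)

// electionTimeout picks a fresh randomized timeout in [150ms, 300ms)
// so that the servers' timers rarely expire at the same moment.
func electionTimeout() time.Duration {
	return 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
}
```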

  • when its timeout fires, a follower becomes a candidate, starts a new term, and votes for itself

  • nodes that grant a vote also reset their election timeout

  • e.g., suppose the leader crashes

  • no heartbeats arrive, so a node starts an election; one node starts it first because its randomized timeout happened to be shorter

  • example of a split vote

  • when this occurs, the candidates time out and a revote happens; in the example, Node D becomes leader (see the sketch below)
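
A sketch of one election round, including the split-vote case (illustrative names; RPC plumbing elided):

```go
package raft

// Minimal candidate state for this sketch.
type Raft struct {
	me          int
	peers       []int
	currentTerm int
	votedFor    int
	isLeader    bool
}

// requestVote stands in for the real RequestVote RPC.
func (rf *Raft) requestVote(peer int) bool { return false /* elided */ }

func (rf *Raft) runElection() {
	rf.currentTerm++    // new term
	rf.votedFor = rf.me // vote for itself
	votes := 1
	for _, peer := range rf.peers {
		if peer != rf.me && rf.requestVote(peer) {
			votes++
		}
	}
	if votes > len(rf.peers)/2 {
		rf.isLeader = true // won the election
		return
	}
	// Split vote: nobody won. Do nothing here; a fresh randomized
	// election timeout will trigger the revote.
}
```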

Log Replication

  • done using AppendEntries messages, the same messages used as heartbeats

  • once the entry commits, a response is sent to the client

  • network partitions

  • a write sent to the minority-side leader cannot reach a majority, so the log entry stays uncommitted!

  • once the partition is healed, B sees the higher term and steps down, and its uncommitted entries are rolled back (sketch below)
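
A sketch of the term check that forces the step-down (illustrative names, continuing the sketches above):

```go
package raft

type AppendEntriesArgs struct {
	Term         int // leader's term
	LeaderCommit int // leader's commitIndex
	// prevLogIndex/prevLogTerm and the entries themselves elided
}

type Raft struct {
	currentTerm int
	isLeader    bool
}

// AppendEntries: any server (including a stale leader like B) that
// sees a higher term steps down to follower; its uncommitted entries
// are later overwritten by the new leader's log.
func (rf *Raft) AppendEntries(args *AppendEntriesArgs) {
	if args.Term > rf.currentTerm {
		rf.currentTerm = args.Term
		rf.isLeader = false // step down
	}
	// ... consistency check, append new entries, advance commitIndex
}
```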