Crashing and Recovery
The Tendermint consensus algorithm guarantees safety and liveness with up to F adversaries that can behave arbitrarily, including becoming unresponsive. However, these are theoretical properties, guaranteed only by assuming the existence of 2F+1 honest participants. These honest participants will avoid certain behaviours (e.g. prevoting for different blocks in the same height/round), and will be online (e.g. sending messages and responding to messages).
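As a small worked example of the arithmetic above, the sketch below assumes the usual setting of 3F+1 participants in total (F adversaries plus 2F+1 honest participants). The function names are hypothetical and are not part of Hyperdrive.

```python
def honest_required(f: int) -> int:
    """Honest participants assumed to exist when tolerating f adversaries: 2F+1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def total_participants(f: int) -> int:
    """Adversaries plus honest participants: F + (2F+1) = 3F+1."""
    return f + honest_required(f)


def max_faulty(total: int) -> int:
    """Largest F such that total >= 3F+1."""
    if total < 1:
        raise ValueError("need at least one participant")
    return (total - 1) // 3


if __name__ == "__main__":
    for f in (1, 2, 3):
        print(f, honest_required(f), total_participants(f))
```

For instance, tolerating one adversary requires three honest participants, for four participants in total.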
In reality, software crashes, hardware failures, and network partitions are inevitable and can corrupt the state of an honest participant. Hyperdrive is designed (and tested) to ensure that honest participants will still behave correctly even in the face of arbitrary crashes.
Every n messages, Hyperdrive saves its entire process state to persistent storage. This includes all of the messages it has sent and received that are still relevant. This guarantees that, in the event of an unexpected crash, the recovered process state will be the same as if the process had "missed" at most n messages. As n increases, performance improves (less marshaling and less disk IO), but the risk of crashes causing liveness faults also increases.
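The checkpointing idea can be illustrated with a minimal sketch. It is not Hyperdrive's implementation: the `CheckpointingProcess` class, the JSON file format, and the `simulate_crash` helper are all assumptions made for illustration. The state is saved after every n handled messages, so a restored process misses fewer than n messages, which is within the bound described above.

```python
import json
import os
import tempfile


class CheckpointingProcess:
    """Hypothetical process that persists its state every `n` messages."""

    def __init__(self, path: str, n: int):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.path = path
        self.n = n
        self.messages = []         # relevant messages sent and received
        self.since_checkpoint = 0  # messages handled since the last save

    def handle(self, message: dict) -> None:
        self.messages.append(message)
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.n:
            self.save()

    def save(self) -> None:
        # Write to a temporary file, then rename it over the checkpoint, so
        # a crash mid-write does not leave a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"messages": self.messages}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.since_checkpoint = 0

    @classmethod
    def restore(cls, path: str, n: int) -> "CheckpointingProcess":
        process = cls(path, n)
        if os.path.exists(path):
            with open(path) as f:
                process.messages = json.load(f)["messages"]
        return process


def simulate_crash(total_messages: int, n: int) -> int:
    """Handle messages, 'crash', restore, and return how many were missed."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "state.json")
        process = CheckpointingProcess(path, n)
        for i in range(total_messages):
            process.handle({"type": "prevote", "height": i})
        # Simulated crash: the in-memory state is discarded and only the
        # checkpoint file survives.
        restored = CheckpointingProcess.restore(path, n)
        return total_messages - len(restored.messages)
```

Calling `simulate_crash(7, 3)` handles seven messages with a checkpoint every three, so the restored process holds the first six. A larger n means fewer writes to disk but more messages lost per crash, which is the trade-off described above.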
During boot, Hyperdrive will look at its persistent storage for the messages it has most recently sent. It will then re-broadcast the most recent proposal, prevote, and precommit over the network interface. This is important, because an unexpected crash could occur between Hyperdrive broadcasting a message and the network interface actually delivering it.
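A rough sketch of this boot-time step follows. It assumes a hypothetical log of sent messages, ordered oldest to newest, where each message is a dictionary with a `type` field. It also assumes receivers tolerate duplicate messages. None of these names come from Hyperdrive itself.

```python
from typing import Callable, Dict, List

# Message types that are re-sent on boot, in the order they are re-broadcast.
RESENT_TYPES = ("proposal", "prevote", "precommit")


def latest_sent(sent: List[dict]) -> Dict[str, dict]:
    """Return the most recent sent message of each type, by log order."""
    latest: Dict[str, dict] = {}
    for message in sent:  # later entries overwrite earlier ones
        if message["type"] in RESENT_TYPES:
            latest[message["type"]] = message
    return latest


def rebroadcast_on_boot(sent: List[dict], broadcast: Callable[[dict], None]) -> int:
    """Re-broadcast the latest proposal, prevote, and precommit, if present.

    Returns the number of messages that were re-broadcast.
    """
    latest = latest_sent(sent)
    count = 0
    for kind in RESENT_TYPES:
        if kind in latest:
            broadcast(latest[kind])
            count += 1
    return count


if __name__ == "__main__":
    log = [
        {"type": "proposal", "height": 1, "round": 0},
        {"type": "prevote", "height": 1, "round": 0},
        {"type": "proposal", "height": 2, "round": 0},
    ]
    rebroadcast_on_boot(log, print)
```

Because the process cannot know whether its last broadcast was delivered before the crash, the sketch re-sends unconditionally rather than trying to detect what was lost.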