Crashing and Recovery

The Tendermint consensus algorithm guarantees safety and liveness with up to F adversaries that can behave arbitrarily, including becoming unresponsive. However, these are theoretical properties, guaranteed only by assuming the existence of 2F+1 honest participants. These honest participants avoid certain behaviours (e.g. prevoting for different blocks in the same height/round) and stay online (e.g. they send messages and respond to messages).

In reality, software crashes, hardware failures, and network partitions are inevitable and can corrupt the state of an honest participant. Hyperdrive is designed (and tested) to ensure that honest participants still behave correctly even in the face of arbitrary crashes.

State persistence

Every n messages, Hyperdrive saves its entire process state to persistent storage. This includes all of the messages it has sent and received that are still relevant. This guarantees that, after an unexpected crash, the restored process state will be the same as if the process had "missed" at most n messages. As n increases, performance improves (less marshaling and less disk IO), but the risk of crashes causing liveness faults also increases.
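
The checkpointing idea above can be sketched roughly as follows. This is a minimal illustration, not Hyperdrive's actual implementation: the `CheckpointedProcess` class, its JSON file format, and the `handle` method are all hypothetical names chosen for this example.

```python
import json
import os
import tempfile


class CheckpointedProcess:
    """Hypothetical process that saves its whole state every n messages."""

    def __init__(self, path, n):
        self.path = path  # where the checkpoint file lives (assumed layout)
        self.n = n        # checkpoint interval, in messages
        self.state = {"handled": 0, "messages": []}
        self.restore()

    def restore(self):
        # On boot, resume from the last checkpoint if one exists.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.state = json.load(f)

    def save(self):
        # Write to a temporary file first, then atomically replace the old
        # checkpoint, so a crash mid-write cannot leave a corrupt file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def handle(self, message):
        # Record the message, and checkpoint on every n-th message.
        self.state["messages"].append(message)
        self.state["handled"] += 1
        if self.state["handled"] % self.n == 0:
            self.save()
```

In this sketch, a process that crashes after handling 7 messages with n = 3 restarts from the checkpoint taken at message 6, so it behaves as if it had missed one message. A larger n means fewer calls to `save`, at the cost of a larger window of lost messages.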

Networking persistence

During boot, Hyperdrive will look at its persistent storage for the messages it has most recently sent. It will then re-broadcast the most recent proposal, prevote, and precommit over the network interface. This is important because an unexpected crash could occur between Hyperdrive broadcasting a message and the network interface actually delivering it.
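
The boot-time re-broadcast described above might look something like the sketch below. The message shape (a dict with a `"type"` field), the oldest-first ordering of the stored messages, and the `broadcast` callback are assumptions made for illustration. They are not taken from Hyperdrive's code.

```python
# Message types that are re-broadcast on boot, in protocol order.
STEPS = ("proposal", "prevote", "precommit")


def latest_sent_by_step(sent_messages):
    """Return the most recently sent message of each type.

    Assumes sent_messages is ordered oldest-first, so later entries
    overwrite earlier ones of the same type.
    """
    latest = {}
    for msg in sent_messages:
        if msg["type"] in STEPS:
            latest[msg["type"]] = msg
    return latest


def rebroadcast_on_boot(sent_messages, broadcast):
    """Re-send the latest proposal, prevote, and precommit, if any exist."""
    latest = latest_sent_by_step(sent_messages)
    for step in STEPS:
        if step in latest:
            broadcast(latest[step])
```

Here `broadcast` stands in for whatever function hands a message to the network interface. The sketch also assumes that peers tolerate receiving the same message twice, since a message that was delivered before the crash will be delivered again.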
