Fault Detection and Recovery




Fault detection originates in the pvmd-pvmd protocol (see the section on that protocol). When a pvmd times out while communicating with another pvmd, it calls hostfailentry(), which scans waitlist and terminates any operations waiting on the failed host.
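The scan described above can be pictured with a small sketch. This is not the actual pvmd source: the struct wait_ctx type, its fields, and the fail_host() function are hypothetical names chosen for illustration, and the real wait contexts carry more state. The sketch only shows the general idea of walking a list of pending operations and terminating those that wait on the failed host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified wait context; not the real pvmd structure. */
struct wait_ctx {
    int wait_id;            /* identifier of the pending operation */
    int on_host;            /* index of the host this operation waits on */
    int cancelled;          /* set to 1 once the operation is terminated */
    struct wait_ctx *next;  /* next entry in the singly linked wait list */
};

/*
 * Walk the wait list and terminate every still-pending operation that is
 * waiting on down_host. Returns the number of operations terminated.
 */
int fail_host(struct wait_ctx *list, int down_host)
{
    int terminated = 0;

    assert(down_host >= 0);  /* assumption: host indices are non-negative */

    for (struct wait_ctx *w = list; w != NULL; w = w->next) {
        if (!w->cancelled && w->on_host == down_host) {
            w->cancelled = 1;  /* stand-in for the real cleanup work */
            terminated++;
        }
    }
    return terminated;
}
```

Operations waiting on other hosts are left untouched, so a single failure only disturbs the work that actually depended on the lost host.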

A pvmd can recover from the loss of any foreign pvmd except the master. If a slave loses the master, the slave shuts itself down. This algorithm ensures that the virtual machine doesn't become partitioned and run as two partial machines. It does, however, decrease the fault tolerance of the virtual machine, because the master must never crash. There is currently no way for the master to hand off its status to another pvmd, so it always remains part of the configuration. (Even so, this is an improvement over PVM 2, in which the failure of any pvmd would shut down the entire system.)
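The recovery rule can be condensed into a single decision, sketched below. The enum recovery_action type and the on_peer_lost() function are hypothetical names, not part of PVM; the sketch assumes each pvmd knows which host index is the master.

```c
/* Hypothetical outcome of losing contact with another pvmd. */
enum recovery_action {
    CONTINUE_WITHOUT_HOST,  /* drop the lost host and keep running */
    SHUT_DOWN               /* the master is gone: this pvmd exits */
};

/*
 * Decide what a pvmd does when it loses contact with lost_host.
 * Losing the master is fatal; losing any other pvmd is recoverable.
 */
enum recovery_action on_peer_lost(int lost_host, int master_host)
{
    if (lost_host == master_host) {
        return SHUT_DOWN;
    }
    return CONTINUE_WITHOUT_HOST;
}
```

Because every slave applies the same rule, a network split leaves only the side containing the master running, which is how the partitioning described above is avoided.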