

Fault Tolerance

Fault tolerance is an important issue in any loosely connected distributed system such as NetSolve. The failure of one or more components should not cause a catastrophic failure of the system as a whole. Moreover, such a failure should generate as few side effects as possible and keep the resulting drop in performance to a minimum. Fault tolerance in NetSolve operates at several levels. Here we justify some of our implementation choices.





Joint Institute for Computational Science
Mon Apr 29 13:00:40 EDT 1996