Institute for Computing Systems Architecture

SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery

Mark D. Hill, Computer Sciences Department, University of Wisconsin-Madison

Availability is increasingly important for shared-memory multiprocessors, but the commercial server market prefers that availability not come at the cost of appreciably more hardware or significantly degraded performance. Implementation trends toward less reliable deep-submicron transistors necessitate architectural techniques that increase availability.

We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. SafetyNet efficiently coordinates checkpoints across the system in logical time and minimizes runtime overhead by pipelining checkpoint validation with subsequent parallel execution.

Results reported at ISCA show that SafetyNet can tolerate dropped coherence messages or the loss of an interconnection network switch (a) without adding statistically significant overhead during fault-free execution and (b) without crashing when tolerated faults occur. The talk will also touch upon future directions toward tolerating hardware and software design errors.

Biography:

Mark D. Hill is Professor and Romnes Fellow in both the Computer Sciences Department and the Electrical and Computer Engineering Department at the University of Wisconsin-Madison. He currently co-directs the Wisconsin Multifacet project with Prof. David Wood. Hill is visiting the Universitat Politècnica de Catalunya (UPC) for the 2002-2003 academic year.

Hill has made contributions to cache design, cache simulation, translation buffers, memory consistency models, parallel simulation, and parallel computer design. He won an NSF Presidential Young Investigator award in 1989, was named an IEEE Fellow in 2000 for "contributions to cache memory design and analysis," and co-won the best paper award at VLDB 2001.

Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh.