CIFTS: Coordinated Infrastructure for Fault Tolerant Systems
Providing the connections that allow component-based applications to take an active role in responding to faults in HPC systems.
Collaboration Status: Active
TASCS Contact: David E. Bernholdt, ORNL, bernholdtde@ornl.gov
Collaboration Summary: The CIFTS project is developing a software infrastructure and set of conventions, the Fault Tolerance Backplane (FTB), that makes fault-related events visible across every level of an HPC system: hardware, operating system, system software, and applications. The goal of this collaboration is to integrate the FTB into the CCA environment so that component applications can participate as both consumers and producers of fault information.
Collaboration Notes: We are working with the SWIM plasma physics project to prototype a fault-aware, component-based application. This initial demonstration will not involve CCA-compliant components. Instead, it will use an Event Service modeled on the CCA Event Service, together with the lightweight, custom-built Integrated Plasma Simulator framework. Based on the results of this demonstration, we plan to develop Babel bindings for the FTB and a bridge between the FTB and the CCA Event Service.
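The planned FTB-to-Event-Service bridge can be pictured as two publish/subscribe buses joined by a forwarder. The sketch below is a minimal illustration under stated assumptions: it does NOT use the real FTB or CCA Event Service APIs, and every name in it (FaultEvent, EventBus, bridge, the "fault." topic prefix, the event fields) is hypothetical. It only shows a component acting as both a consumer of fault events arriving from the system side and a producer of its own response events.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FaultEvent:
    """Hypothetical fault event record (not the actual FTB event format)."""
    space: str     # originating layer, e.g. "hardware" or "app.ips"
    name: str      # event name, e.g. "node_failure"
    severity: str  # e.g. "info", "warning", "fatal"
    payload: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[FaultEvent], None]


class EventBus:
    """Minimal in-process publish/subscribe bus, standing in for either the
    backplane or a component-level event service (illustrative only)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: FaultEvent) -> None:
        # Deliver synchronously to every handler registered for this topic.
        for handler in list(self._subs[topic]):
            handler(event)


def bridge(backplane: EventBus, components: EventBus, topic: str) -> None:
    """Forward events on `topic` from the backplane to the component bus,
    renamed under a hypothetical "fault." prefix."""
    backplane.subscribe(topic, lambda ev: components.publish("fault." + topic, ev))


if __name__ == "__main__":
    backplane, components = EventBus(), EventBus()
    bridge(backplane, components, "node_failure")

    # Component as consumer: react to a node failure reported by the system.
    # Component as producer: announce its response back on the backplane.
    components.subscribe(
        "fault.node_failure",
        lambda ev: backplane.publish(
            "app_response", FaultEvent("app.ips", "checkpoint_started", "info")
        ),
    )
    backplane.subscribe("app_response", lambda ev: print("response:", ev.name))
    backplane.publish(
        "node_failure", FaultEvent("hardware", "node_failure", "fatal", {"node": "n17"})
    )
```

Delivery here is synchronous and in-process for clarity; a real backplane would carry events between processes and nodes, which this sketch does not attempt to model.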
Collaboration Image URLs: n/a
Software Involved: Fault Tolerance Backplane (FTB), CCA Event Service
Collaboration URL: n/a
Partner Project URL: http://www.mcs.anl.gov/research/cifts/
Partner Project Sponsorship: DOE non-SciDAC (OASCR)