The Common Component Architecture Forum
Computational Quality of Service (CQoS)
The component-based approach to scientific computing offers two main advantages for the development of high-performance scientific software: (1) it facilitates the development of multiphysics applications by leveraging components that encapsulate the knowledge of many investigators, who typically differ in their expertise and in their preferred programming styles and languages; and (2) it enables the automated assembly of those components so that the runtime and accuracy of the resulting application can be optimized. Computer and computational scientists at ANL, ORNL, SNL, and the University of Oregon, who are members of the SciDAC Center for Component Technology for Terascale Simulation Software, have recently defined a prototype software architecture for Computational Quality of Service (CQoS). CQoS focuses on the automatic selection and configuration of components to suit the computational conditions imposed by a simulation and its operating environment. Because numerical conditions typically change as a simulation evolves, and because the state of a massively parallel machine may degrade or improve over time, a CQoS-enabled environment provides dynamic adaptation support throughout the application's execution.
A mathematical model of a component's non-functional properties, such as computational cost, accuracy, and failure rate, can be matched against a description of another component's requirements, which makes it possible to automate application assembly. Such performance models can be obtained through user-provided or automated instrumentation of the source code, or through source code analysis. We have developed a global performance model evaluator that, given the individual performance models of an ensemble of components, estimates the performance model of the entire ensemble. Optimizing the overall performance expressed in this aggregate model enables the automatic selection of components and thereby improves overall application performance.
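To make the idea of composing per-component models and optimizing the aggregate more concrete, here is a minimal sketch in Python. It is not the CQoS evaluator itself: the `ComponentModel` class, the helper functions, the component names, and every number are hypothetical. It assumes that components run one after another (so estimated costs add) and that failures are independent (so the ensemble fails if any member fails); accuracy is omitted for brevity. Selection is an exhaustive search over one implementation per role, which is clear but grows exponentially with the number of roles.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ComponentModel:
    """Hypothetical performance model for one component implementation."""
    name: str
    cost: Callable[[int], float]  # estimated run time (s) for problem size n
    failure_rate: float           # probability that the component fails on a run


def ensemble_cost(choice: Tuple[ComponentModel, ...], n: int) -> float:
    # Assumption: components execute sequentially, so their costs add.
    return sum(c.cost(n) for c in choice)


def ensemble_failure_rate(choice: Tuple[ComponentModel, ...]) -> float:
    # Assumption: independent failures; the ensemble fails if any member fails.
    ok = 1.0
    for c in choice:
        ok *= 1.0 - c.failure_rate
    return 1.0 - ok


def select_ensemble(
    candidates: Dict[str, List[ComponentModel]],
    n: int,
    max_failure_rate: float,
) -> Optional[Tuple[ComponentModel, ...]]:
    """Pick one implementation per role that minimizes the aggregate cost
    while meeting the failure-rate requirement; None if nothing qualifies."""
    best, best_cost = None, float("inf")
    for choice in product(*candidates.values()):
        if ensemble_failure_rate(choice) > max_failure_rate:
            continue  # violates the quality requirement
        cost = ensemble_cost(choice, n)
        if cost < best_cost:
            best, best_cost = choice, cost
    return best


# Hypothetical candidates: two solvers and two preconditioners.
candidates = {
    "linear_solver": [
        ComponentModel("cg", lambda n: 2e-6 * n ** 1.5, 0.05),
        ComponentModel("gmres", lambda n: 3e-6 * n ** 1.5, 0.01),
    ],
    "preconditioner": [
        ComponentModel("jacobi", lambda n: 1e-7 * n, 0.0),
        ComponentModel("ilu", lambda n: 5e-7 * n, 0.02),
    ],
}

best = select_ensemble(candidates, n=10_000, max_failure_rate=0.04)
print([c.name for c in best])  # -> ['gmres', 'jacobi']
```

Here the cheaper `cg` solver is rejected because its assumed failure rate alone exceeds the requirement, so the evaluator settles on the cheapest combination that still satisfies it.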
Furthermore, the quality metrics and the corresponding performance models can be used at runtime to replace a component instance that no longer satisfies certain quality requirements with a functionally equivalent component that implements a different algorithm. Researchers in the SciDAC Computational Facility for Reacting Flow Science have used the CQoS infrastructure to analyze the computational and message-passing costs of their simulations with a view to scaling up to thousands of processors. Minimizing the ratio of message-passing time to computation time is crucial for this scaling. Because manual optimization at this scale is prohibitively complex, the automated application assembly and adaptation approach leads to far more efficient use of parallel resources.
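Runtime substitution can be sketched in the same spirit. The minimal Python example below tracks the ratio of message-passing time to computation time for the active implementation and switches to the next functionally equivalent one when the average over a short window exceeds a threshold. The `Measurement` and `AdaptiveSlot` classes, the threshold, the window length, and the implementation names are hypothetical, and the actual CQoS infrastructure's interfaces and switching policy may differ; in a real component framework the switch would rebind a port to a different component instance rather than change a string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    compute_s: float  # time spent computing during one step (s)
    comm_s: float     # time spent in message passing during the same step (s)

    @property
    def comm_ratio(self) -> float:
        # Ratio of message-passing time to computation time (lower is better).
        return self.comm_s / self.compute_s if self.compute_s > 0 else float("inf")


@dataclass
class AdaptiveSlot:
    """Hypothetical holder for functionally equivalent implementations."""
    implementations: List[str]
    max_comm_ratio: float
    window: int = 3
    active: int = 0
    history: List[Measurement] = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.implementations[self.active]

    def record(self, m: Measurement) -> None:
        self.history.append(m)
        recent = self.history[-self.window:]
        avg_ratio = sum(x.comm_ratio for x in recent) / len(recent)
        # Substitute only once a full window violates the requirement,
        # so that a single noisy sample does not trigger a switch.
        if (len(recent) == self.window
                and avg_ratio > self.max_comm_ratio
                and self.active + 1 < len(self.implementations)):
            self.active += 1
            self.history.clear()  # measure the new instance from scratch


slot = AdaptiveSlot(["solver_a", "solver_b"], max_comm_ratio=0.5)
for compute, comm in [(1.0, 0.2), (1.0, 0.7), (1.0, 0.8), (1.0, 0.9)]:
    slot.record(Measurement(compute, comm))
    print(slot.current)  # prints solver_a, solver_a, solver_b, solver_b
```

Averaging over a window trades reaction speed for stability; a shorter window adapts faster to a degrading machine but risks switching back and forth on transient noise.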
© Copyright 2002-2004