CQoS Infrastructure Update

Li Li's work note

Motivation

Our CQoS infrastructure design is partially motivated by SciDAC applications in:

parallel mesh partitioning in combustion (CFRFS project, BES ... more details coming from Jaideep et al. ... link here)
managing resources in quantum chemistry (BES ... details coming from Joe, Meng-Shiou, Masha ... link here)
efficient solution of linear systems in fusion and accelerator simulations (FACETS project, FES, and COMPASS project, HEP/NP/BES)

Adaptive linear solvers: The solution of linear algebraic systems of equations often dominates the overall runtime of large-scale PDE-based simulations, including some phases of modeling in the COMPASS accelerator and FACETS fusion projects. CQoS work here focuses on determining appropriate choices for algorithms and parameters of TOPS linear and nonlinear solver components. Because the properties of linear systems in time-dependent and/or nonlinear applications may significantly change during the coarse of a given simulation, CQoS-enabled adaptive multimethod solvers have promise to improve robustness and reduce overall time to solution. The design goal is to help select Krylov methods and preconditioners dynamically to match the attributes of linear systems as they change during the course of long-running simulations.

The basic idea is during runtime to monitor the overall nonlinear, time-dependent solution process as well as changes in linear systems through computing matrix metadata that describe various system properties. Exhaustive analyses of properties of a number of matrices are stored in a historical database for subsequent analysis, together with performance results, for instance from solving linear systems with these matrices. New matrices, as described by their metadata, can then be matched up against this database for recommendations as to preferred solution methods.

In the initial investigation, we simply use the criterion of having a symmetric vs. nonsymmetric linear operator to select an appropriate Krylov method and preconditioner for use within Newton’s method:

Driven cavity operator is nonsymmetric.
Krylov solvers for nonsymmetric systems include GMRES, TFQMR.
Preconditioners for nonsymmetric systems include ILU, SOR.

The adaptive strategy will gradually increase in its sophistication.

CQoS Infrastructure Design Principles

info coming here ...

Metadata and Performance Database Query and Management Capabilities

Matrix metadata format

We use the Anamod metadata standard (V. Eijkhout, Univ. of Texas) to represent properties of matrices and linear systems. Anamod provides eight categories:

Estimates for the departure from normality
- Example elements are:
  - trace-asquared: an auxiliary quantity
  - commutator-normF: the Frobenius norm of the commutator

Simple (normlike) quantities
- Contains simple, normlike quantities, In general, these are properties that can be calculated in time proportional to the number of nonzeros of the matrix. Example elements are:
  - trace: Trace, sum of diagonal elements;
  - normF: Frobenius-norm, sqrt of sum of elements squared;

Spectral properties
- Provides various estimates that are derived from running the system through a GMRES iteration for a few dozen iterations. Example elements are:
  - ritz-values-c: complex parts of stored Ritz values
  - ellipse-ax: size of -axis of the enclosing ellipse

Structural properties
- Computes matrix properties that are only a function of the nonzero structure. Thus, these properties will likely stay invariant during a nonlinear solve process, or while time-stepping a system of equations. Example elements are:
  - nrows: number of rows in the matrix
  - symmetry: 1 for symmetric matrix, 0 otherwise

Measurements of element variance
- Contains various estimates of how different elements in a matrix are. Example elements are:
  - diagonal-average: average value of absolute diagonal elements,
  - diagonal-variance: standard deviation of diagonal average

ICMK distribution
- ICMK (`Inverse Cuthill-McKee') is a heuristic that, based on the sparsity structure of a matrix, tries to find a meaningful block structure. This is done without permuting the matrix. This block structure can be used in Block Jacobi preconditioners: distributing the matrix over parallel processors according to the block structure often improves convergence. Example elements are:
  - nsplits: number of split points found in the matrix
  - splits: the sequence of split points, including 0 and N

Intelligent Preconditioner Recommendation System
- Structural properties used in the University of Kentucky `Intelligent Preconditioner Recommendation System'. Example elements are:
  - nnzup: number of nonzeros in upper triangle
  - relsymm: relative symmetry, ratio of structural symmetric elements to total number of nonzeros

Jones Plassmann multi-colouring
- Example elements are:
  - n-colours: the number of colours computed
  - colour-set-sizes: an array containing in location i the number of points of colour i.

Metadata and performance database

We generate and store the matrix metadata in a database, together with metadata at other levels of abstraction, and performance analysis results from applying different linear solvers to these matrices. The metadata at different levels of abstraction includes system parameters with respect to hardware and compiler configuration, algorithmic parameters and problem-specific parameters. Together the metadata should provide enough information to reproduce a particular experiment instance.

As a new matrix or linear system arise during the course of simulation, the adapter consults the history database of previous runs in judging the match between the system and linear solvers and the associated solution parameters. It look up “the best” performed method for solving the system, using the associated metadata as search indices. The recommendations as to preferred solution methods are in the form of, for instance, Krylov method, preconditioner, algorithmic parameters configuring the method. After the system has been solved, data reflecting this run can be added to the database as a new case.

Proposed performance database store and query API

This initial approach is motivated by linear solvers in fusion and accelerator applications; we will extend/generalize this API based on needs from quantum chemistry and combustion applications.

loadSchema Load schema specifying matrix and linear system metadata format.

loadMetadatafromXML Load matrix metadata, for instance, in a XML file format into database.

loadExperimentData Load performance data from an experiment run to the database, associate the data to corresponding metadata items in the database.

determineMethod(metaName, metaValue, metric) Match metadata name and value with respect to a matrix property to best-performed (in terms of the specified performance evaluation metric, e.g., convergence rate, execution time) linear solver in history database.

determineMethodParameter(method, paramName, metaName, metaValue, metric) Given the linear solver selected, determine an ‘optimal’ parameter configuration by searching history performance information in the database.

loadAnalysisData After the system has been solved, analysis data reflecting this run is added to the database as a new case. The analysis data consists of brief summary of performance characteristic of the run, such as, number of iterations, convergence rate, execution time per iteration, which will be used as a key to expedite the search for best solution in the database.

Metadata at other levels of abstraction, for instance, system and algorithm parameters, will add complexity to database queries. An adaptive method may require questions like ‘which linear solver fits in when the linear system to be solved is becoming sparse, considering the memory limitation of the machine the program is running on? ’ be answered. The query APIs can be organized in a multiple-levels structure, with each level reflecting a specific metadata abstraction level.

Return to CQoS meeting area

Created by: likli last modification: Monday 27 of August, 2007 [15:09:01 UTC] by likli

The original document is available at http://www.cca-forum.org/wiki/tiki-index.php?page=CQoS%20Infrastructure%20Update