Use Cases for MCMD Parallelism that Coop Parallelism Anticipates
(POC: John May, LLNL)

1. Subletting of work to improve load balance.

An application that does a multiphysics or multiscale simulation may have problems with load balance because some processes have extra work to do at certain scales or in certain physics regimes. An MCMD model can alleviate the load balance problem by dividing the work into a main, well-balanced portion that runs as a single, standard data parallel job, and one or more other portions that are not well balanced over the domain of the computation. A pool of server components can be created to complete requests on behalf of the busier tasks in the main job. A job can submit a request to do a particular calculation, along with the input parameters, through a remote method invocation. An intermediate proxy can receive this request and assign the calculation to an available server, which passes back a result when it has completed the calculation. The requesting task can block while waiting for the response or, preferably, continue with its work and generate more further nonblocking requests while awaiting the results. As an alternative to using nonblocking requests, the main job can be scheduled so that multiple tasks are running on each processor. Then when one task is waiting for a response from a server, another task can do its computation for a different part of the domain. The server components can themselves be parallel jobs, so that they can complete a computation faster than the requesting task could have done on its own. This parallelism, along with the potential for having a processor in the main computation issue multiple concurrent requests, allows the busier processors to complete their work sooner than they could otherwise, thus improving load balance. An extension of this idea would allow a pool of servers to support multiple calculations. A single component could have methods for each calculation,or a separate component could be written for each computation, and one instance of each type of component could be launched onto a processors (or a group of processors if the components were internally parallel).

This scenario is designed to require minimal changes to the main model. It also assumes that components can have internal parallelism. The main application, the server code, and the proxy all run separate executables and share no data except through function call and return values. One version of this scenario requires the runtime system to allow multiple components to run concurrently on the same processor or processors.

2. Parameter studies using multiple instances of a parallel model.

Developers of a model often want to understand how changes to the input parameters of a model affect its output. One way to get this information is to run the model many times while systematically changing the inputs. If the complete set of inputs is known in advance, a simple script can submit a series of runs and gather the results. However, it may be useful to use results of initial runs to guide the choice of parameters for later runs. For example, if the model is found to be especially sensitive to changes in a certain range of parameter values, a study could include more tests in that range. An MCMD implementation of this technique could use a single master component that launched multiple instances of a parallel component executing the model being studied. As each instance completed, the master would evaluate the results seen so far and choose parameters for the next instantiation of the model. It would continue in this way until all the parameters of interest had been tested.

This scenario is designed to require little or no change to the model being tested. It also assumes that components can have internal parallelism. The master and the model would run separate executables. The controlling software would need to be able to launch new components into a processor pool as needed and detect when they had finished running (or terminated abnormally).

3. Federated multiphysics simulations.

A multiphysics simulation could be build from existing codes to model complex scenarios. A classic example is the idea of coupling models of the ocean, atmosphere, and sea ice to form a complete climate model. Another example would be combining a structure mechanics model with a fluid dynamics model to simulate airflow over a wing. In these examples, each component is itself a parallel code, a collection of tasks within a component may need communicate date to multiple tasks in another component. This many-to-many communication entails some complex, but tractable problems. Cooperative parallelism does not yet support many-to-many remote method invocation, but we are aware of applications for it.

This scenario would require some changes to existing codes to support intercommunication. In some cases, parallel components might be created or destroyed on the fly; for example if a physical object separated into multiple pieces, a separate component might each new piece. Each component would run one of several different executables.

4. Smarter I/O processing.

This is a newer idea that we've just started tossing around with a few people. A symponent could be designed to look like a smart I/O object. When a task (or group of tasks) wanted to write a big piece of data, that data could be sent through an RMI to a symponent that did some further processing on the data before writing it to disk. This could be simple compression, or it could include analysis, filtering, or merging with other data. Once the data had been massaged, it could be stored on disk. The advantage of this approach over current I/O models is that it gives the app a chance to do some CPU-intensive work on the data concurrently with the ongoing work of the application. Of course, the same thing could be done with threads, but this might be a cleaner and more portable interface: you could develop a postprocessing pacakge independently of its customer application(s). Of course, you could do the same thing in reverse to do preprocessing on data read from disk or other sources. One potential problem is extra cost of moving data between symponents before writing it to disk. (But this is really no worse than writing it directly to disk as far as the requesting process is concerned.

It has to pay the cost of moving the data out of its address space, and sending it to another symponent is similar to writing it to an internal buffer. The I/O symponent has to write the processed data to disk, but that step is invisible to the part of the application that generated the data.) Another issue is handling collective parallel I/O, where multiple tasks want to send data for postprocessing in a coordinated way. This is an instance of the many-to-many communication problem described in Item 3.

Created by: manoj77 last modification: Friday 13 of April, 2007 [18:39:45 UTC] by manoj77

similar

Online users

16 online users