A PSE for Air Quality Models Using the 3D+T+Mk
Archetype
Donald Dabdub(1) and K. Mani Chandy(2)
(1) Department of Mechanical and Aerospace Engineering
University of California, Irvine
Irvine, CA 92697-3975
(2) Department of Computer Sciences
California Institute of Technology 256-80
Pasadena, CA 91125
ABSTRACT
The development of a problem-solving environment (PSE) for air quality
models is presented. The focus of the work is on the integration of a variety
of parallel and sequential computers into a unified workbench accessible
to scientists and engineers who concentrate on the science and not on the
parallel programming aspects of air pollution modeling. In addition, the
problem-solving environment is designed to serve as a tool to be used in
education and public awareness efforts. The central idea of this work is
that of model abstraction in physical simulations. The abstraction of the
PSEs for physical simulations deals with space, time, and a collection
of model data. We call this problem domain the 3D+T+Mk domain.
The 3D+T+Mk abstraction is used to navigate through input and
output data, manage I/O, and specify modules of parallel programs. The
first problem in which these ideas are implemented is air pollution modeling
in the South Coast Air Basin of California. We present results of developments
of a PSE for atmospheric chemical dynamic models that describe mathematically
the transport and transformation of pollutants using a three-dimensional
Eulerian approach. This work uses the California Institute of Technology
(CIT) model as the underlying air quality model to drive the PSE.
INTRODUCTION
Computers in general, and parallel computers in particular, can be powerful tools for scientists, engineers and managers if the users can focus attention on their specific problems rather than on general computational issues. One way to package a computational solution to a problem is to enclose it within a problem-solving environment (PSE) that is specific to the problem of interest. A PSE is a set of tools and methodologies designed to formulate the problem, to solve the problem and to analyze the results, all in a user-friendly environment that is natural to the problem domain.
As stated by Gallopoulos et al. (1) the ideal PSEs provide "a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science."
In particular, a PSE designed specifically for air quality modeling
allows environmental scientists, public policy planners, interested citizens
and students to explore the dynamics of photochemical smog. A PSE allows
them to focus their attention on issues of interest to them such as sources
of pollution, chemistry, and wind and temperature patterns, while paying
little attention to the machine (parallel architecture, message-passing
or shared-memory system, programming languages, graphics packages) on which
their programs execute.
RESEARCH GOALS
Diversity of Interests
The people who use a computer as a simulation engine may want to exploit
different levels of detail of the simulator. Undergraduate students who
use an airshed model are primarily interested in the interactions between
types and quantities of pollutants on the one hand and domain factors such
as wind, topography and temperature on the other; they are not interested
in manipulating the simulation program itself. Environmental engineers
are interested in the issues in which undergraduate students are interested,
and in addition, want to investigate algorithmic issues and study different
simulation scenarios. Others may be interested in porting the simulation
to a different parallel computer. A question that we are investigating
is as follows: Is it possible to design a problem solving environment (PSE)
that helps users exploit a simulation engine to the level of detail
of concern to them?
Specificity
A related issue is that of specificity versus generality of the PSE.
A PSE that is a precise fit for a person's needs is easier to use than
a more general one that has to be tailored. An air quality model (AQM)
of Southern California is one member of a family of fluid dynamics models. The
application has a high degree of specificity. The problem with high degrees
of specificity is that large numbers of PSEs are required to fit the many
different specific requirements precisely. Our challenge is to create PSE-development
methods that help in producing PSEs that can be tailored to a high degree
of specificity with relatively little effort.
Composability of PSEs
PSEs can support collaborations between people with different interests
in solving a common problem. For instance, chemists, environmental engineers,
health professionals, business leaders and public policy experts may collaborate
on making decisions about an emission-control strategy in Los Angeles County.
These people may use different tools corresponding to their specific interests;
all these tools have to be linked into a collaborative PSE. A question
that we are working on is: How can different kinds of PSE tools (air quality
models, spreadsheets, automobile traffic models) be composed to form an
integrated environment?
Performance Tuning for Target Computer Architectures
High-performance computing requires that simulations be tuned to obtain
good performance on target machines. Some users do not want to have anything
to do with tuning simulations, and they accept the best performance that
automatic tuning can provide. Others are willing to provide some information
to the runtime mechanisms to help them tune the application to fit the
target machine, especially when the machine is parallel. Our challenge
is to design a PSE through which users can provide information for
tuning the performance of their simulations on specific machines.
Exploiting Internet Technology
Internet technology can be exploited in several ways, including (i) enabling
users anywhere with Internet access to use the PSE remotely, or to download
the PSE, and (ii) helping users get access to appropriate machines, particularly
supercomputers, to execute their simulations. The focus of our project,
so far, has been on the first issue. We are specifically interested in
giving concerned citizens all over the country access to models that they
can use to understand their natural environment.
Summary of Scientific Issues
In summary, the scientific questions addressed by our research are:
THE EXPERIMENT
Our experiment consists of building PSEs for specific problems and then getting users to evaluate our PSEs. Next, we give an overview of the experiment and discuss the experiment in terms of the questions raised in the last section.
The problem domains from the most specific to the least specific are
as follows:
Our experiment is designed to test the extensibility of the PSE infrastructure.
The classes of users that we have targeted are as follows. First, citizens who are concerned about their environment, but who may not be scientists. An even more specific group within this class is high-school students in Southern California who are interested in the environment. These people would use the PSE from remote sites via the Internet. Second, environmental scientists and researchers at the graduate school level and up, and, as a more specific group within this class, researchers who specialize in computer simulations of the environment.
CURRENT STATUS OF AIR QUALITY MODELING
Development of Air Quality Models
Mathematical models used to study the dynamics of photochemical air
pollution were first developed in the early 1970s. There have been comprehensive
research efforts in the identification, formulation and numerical solution
of the main physical and chemical processes associated with ozone production.
Tesche (2) and Seinfeld (3) describe the development and applications of
urban air quality models. A topic of current research in model development
is that of incorporating the aerosol phase into existing air quality models.
The heavy computational demands imposed by the aerosol computations provide
one of the driving forces motivating the use of parallel computers.
Parallelization of Air Quality Models
Considerable research effort has been devoted to domain-decomposition
strategies that implement air-quality models on parallel supercomputers.
Results from previous research indicate that parallel implementations of
the chemistry operator, transport operator, and I/O routines are required
to obtain the highest speed-ups. A typical 24-hour run to simulate gas-phase
pollutant dynamics on the South Coast Air Basin of California using the
California Institute of Technology (CIT) model requires less than 7 minutes
on the Intel Paragon with 128 nodes (4). Figure 1 shows the performance
of the parallel implementation of the CIT model on different architectures.
Differences in performance are due to the different processor and network
speeds of each parallel computer. The PSE developed in this work can use
both the sequential and parallel implementations of the CIT model.
Problem Solving Environments for Air Quality Modeling
Much of the research on PSEs has dealt with environments for computational
mathematics in differential equations and linear algebra. For instance,
Langtangen (5) and Weerawarana et al. (6) present a PSE for differential
equations and a set of tools to develop PSEs respectively. There has been
some research to develop problem solving environments that are tailored
to specific applications. For instance, Fraga and McKinnon (7) have developed
a PSE for the automated synthesis of chemical process flowsheets. However,
there has been little research and development of a PSE designed for air
quality modeling.
DESIGN TOOLS AND METHODOLOGIES
The PSE uses the CIT air-quality model (AQM) as the first environmental
application. Other typical urban- and regional-scale AQMs have a structure
similar to that of the CIT model. Thus there is no loss of generality in
considering this model as a test case. The tool used to develop the PSE
is Tcl/Tk. Tcl, pronounced "tickle," stands for "tool command language."
Tcl is actually two things: a language and a library (8). First, Tcl is
a simple textual language intended primarily for issuing commands to interactive
programs. Second, Tcl is a library package that we embed in our model.
The Tcl library consists of a parser for the Tcl language, routines to
implement the Tcl built-in commands, and procedures that allow each application
to extend Tcl with additional commands specific to that application.
There are various advantages to using Tcl to develop the PSE. First,
Tcl provides a standard syntax: once users know Tcl, they will be able
to issue commands easily to any Tcl-based application. Second, Tcl provides
programmability. All a Tcl application needs to do is to implement a few
application-specific low-level commands. Tcl provides many utility commands
plus a general programming interface for building up complex command procedures.
By using Tcl, applications need not re-implement these features. Third,
extensions to Tcl, such as the Tk toolkit, provide mechanisms for communicating
between applications by sending Tcl commands back and forth. The common
Tcl language framework makes it easier for applications to communicate
with one another. Fourth, Tcl is available free of charge. Fifth, it runs
on a wide variety of platforms. Sixth, Tcl provides the capability to interact
with popular World Wide Web browsers to exploit Internet technology.
The central abstraction of the PSE for air quality models deals with
space, time and a collection of model data. We call this problem domain
the 3D+T+Mk domain, where there are three dimensions of space
(3D), one dimension of time (T) and k dimensions inherent to the model.
The 3D+T+Mk abstraction is used to construct parallel program
archetypes, navigate through input and output data, and manage I/O. For
example, the time series plot for ozone concentration at a given monitoring
station or an animated display of the isopleth of a given pollutant are
different projections of the 3D+T+Mk space. By using a higher
level of abstraction in the design of the PSE, the main structure of the
code can be reused to develop a PSE for other air quality models or other
environmental applications. A focus of our research is the evaluation of
reuse of the 3D+T+Mk archetype.
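To make the idea of projections concrete, the following Python sketch stores values indexed by time, three space coordinates and a model variable, and extracts a time series and a horizontal slice from the same store. It is an illustration only and not code from the PSE (which is built with Tcl/Tk); the class name Field4DK, its methods, and the toy ozone values are all assumptions made for this example.

```python
from itertools import product


class Field4DK:
    """Values indexed by (t, x, y, z, k): time, three space axes, model variable."""

    def __init__(self, shape, species):
        self.nt, self.nx, self.ny, self.nz = shape
        self.species = list(species)   # names of the k model variables
        self.data = {}                 # sparse store: (t, x, y, z, k) -> value

    def put(self, t, x, y, z, name, value):
        self.data[(t, x, y, z, self.species.index(name))] = value

    def get(self, t, x, y, z, name):
        # Unset points read as 0.0 in this toy store.
        return self.data.get((t, x, y, z, self.species.index(name)), 0.0)

    def time_series(self, x, y, z, name):
        """Fix space and species, vary time: one station's time series."""
        return [self.get(t, x, y, z, name) for t in range(self.nt)]

    def horizontal_slice(self, t, z, name):
        """Fix time, level and species, vary x and y: input to a contour plot."""
        return [[self.get(t, x, y, z, name) for x in range(self.nx)]
                for y in range(self.ny)]


# Hypothetical toy data: ozone grows with time and with x.
field = Field4DK(shape=(3, 2, 2, 1), species=["O3", "NO2"])
for t, x, y in product(range(3), range(2), range(2)):
    field.put(t, x, y, 0, "O3", 10.0 * t + x)

series = field.time_series(x=1, y=0, z=0, name="O3")
snapshot = field.horizontal_slice(t=2, z=0, name="O3")
```

Both views read the same underlying data; only the set of fixed and free indices differs, which is what allows one navigation component to serve several kinds of display.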
COMPONENTS OF THE PROBLEM SOLVING ENVIRONMENT
The main components of the PSE are shown in Figure 2. This section discusses
each of the modules already incorporated and comments on the need for parallel
program archetypes that have not yet been implemented. We designed the
following components to be reusable.
Database Management
Air quality models require large sets of input data: time, space, or
time/space dependent variables. In addition, the type of data used might
be scalar (like temperature fields) or vector (like wind fields). We developed
a module within the PSE that describes, extracts, and integrates the large
meteorological and/or emission inventory databases required by environmental
models. The standardization of data models is crucial for the development
of reusable object-model-based libraries such as parallel archetypes or
visualization routines. In addition, the conceptual data model eases the
design of communication channels among all objects within the PSE.
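One way to picture such a standardized data model is as a catalog of field descriptors that other components can query. The Python sketch below is a hypothetical illustration, not the data model used in the PSE; the descriptor attributes, the catalog entries and the select helper are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDescriptor:
    name: str
    units: str
    components: int        # 1 for a scalar field, 2 or 3 for a vector field
    time_dependent: bool
    space_dependent: bool

    @property
    def kind(self):
        return "scalar" if self.components == 1 else "vector"


# Hypothetical catalog entries for illustration.
CATALOG = [
    FieldDescriptor("temperature", "K", 1, True, True),
    FieldDescriptor("wind", "m/s", 3, True, True),
    FieldDescriptor("terrain_height", "m", 1, False, True),
]


def select(catalog, *, kind=None, time_dependent=None):
    """Return descriptors matching every criterion that is not None."""
    return [
        f for f in catalog
        if (kind is None or f.kind == kind)
        and (time_dependent is None or f.time_dependent == time_dependent)
    ]


vector_fields = [f.name for f in select(CATALOG, kind="vector")]
static_fields = [f.name for f in select(CATALOG, time_dependent=False)]
```

A visualization routine or a parallel archetype can then decide how to treat a field from its descriptor alone, without knowing which database the values came from.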
Algorithm Modularity
The computational needs of air quality models are often concentrated
on the solution of a few operators. For example, the most challenging numerical
aspects of solving the atmospheric diffusion equation are the chemistry
operator and the advection operator. The chemistry operator consists of
solving a system of stiff nonlinear, coupled ordinary differential equations.
It is well known that the main challenge presented by a chemistry solver
is that of performance and robustness. The advection operator consists
of solving a hyperbolic partial differential equation to account for the
transport processes of pollutants in the atmosphere. The main challenge
presented by the advection solver is that of accuracy. The PSE packages
a number of algorithms in a modular fashion to allow the rapid prototyping
of the numerical techniques to be used by the chemistry, transport and
filtering algorithms of the air quality model.
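The modular packaging described above can be sketched as a registry of interchangeable solvers behind a single time-step routine. The Python code below is a toy illustration under stated assumptions: the chemistry is a single linear decay reaction, the transport is a first-order upwind scheme on a periodic one-dimensional grid, and all function and registry names are hypothetical. None of these are the algorithms of the CIT model.

```python
def chem_forward_euler(c, k, dt):
    """Explicit step for dc/dt = -k*c; unstable when k*dt > 2."""
    return c * (1.0 - k * dt)


def chem_backward_euler(c, k, dt):
    """Implicit step for dc/dt = -k*c; stable for any k*dt >= 0."""
    return c / (1.0 + k * dt)


def advect_upwind(cells, courant):
    """First-order upwind step, periodic 1D grid, wind in +x, 0 <= courant <= 1."""
    return [cells[i] - courant * (cells[i] - cells[i - 1])
            for i in range(len(cells))]


# Registries let a user swap algorithms by name without touching the driver.
CHEMISTRY = {"forward_euler": chem_forward_euler,
             "backward_euler": chem_backward_euler}
TRANSPORT = {"upwind": advect_upwind}


def step(cells, *, chemistry, transport, k, dt, courant):
    """One operator-split step: transport first, then chemistry in every cell."""
    moved = TRANSPORT[transport](cells, courant)
    react = CHEMISTRY[chemistry]
    return [react(c, k, dt) for c in moved]


cells = [0.0, 1.0, 0.0, 0.0]
stiff = dict(k=100.0, dt=0.1, courant=1.0)   # k*dt = 10: a stiff toy case
stable = step(cells, chemistry="backward_euler", transport="upwind", **stiff)
unstable = step(cells, chemistry="forward_euler", transport="upwind", **stiff)
```

With the stiff toy parameters, the explicit chemistry step produces a negative concentration while the implicit step stays between 0 and 1, which illustrates why robustness is a concern for the chemistry solver and why it is useful to be able to exchange one scheme for another quickly.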
Analysis and Visualization
Air quality models produce large output data files. When the model output has been generated, the engineering and scientific analysis component of the work has only begun. We developed a visualization component within the PSE that manipulates both the input data and the data generated by the model. The user can interactively explore a variety of data representations. The options for analysis include displaying fixed-space, variable-time data (time series plots), displaying variable-space, fixed-time data (contour plots) using a color-coded approach, and the animated display of model predictions (see Figure 3).
I/O Management
While air quality models are executing, results are written to disk.
The data sets generated are rather large and, in most cases, unformatted.
The objective of the I/O manager is to have a standard representation of
the output data as it is moved into a heterogeneous network containing
sequential, distributed and parallel architectures. The I/O manager also
provides a common representation of data regardless of its location (or
locations, in the case of a parallel file system).
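A machine-independent representation can be obtained, for example, by fixing the byte order and field widths of every record. The following Python sketch uses the standard struct module to do so. The record layout, the "AQM1" tag and the function names are assumptions made for illustration; they do not describe the format used by the PSE or by the CIT model.

```python
import struct

MAGIC = b"AQM1"                     # hypothetical format tag
HEADER = struct.Struct(">4sIIII")   # big-endian: tag, nt, nx, ny, nz


def pack_field(shape, values):
    """Serialize a flat list of doubles with an explicit big-endian layout."""
    nt, nx, ny, nz = shape
    count = nt * nx * ny * nz
    if len(values) != count:
        raise ValueError("value count does not match shape")
    return HEADER.pack(MAGIC, nt, nx, ny, nz) + struct.pack(f">{count}d", *values)


def unpack_field(blob):
    """Recover (shape, values) on any machine, whatever its native byte order."""
    tag, nt, nx, ny, nz = HEADER.unpack_from(blob, 0)
    if tag != MAGIC:
        raise ValueError("not an AQM1 record")
    count = nt * nx * ny * nz
    values = list(struct.unpack_from(f">{count}d", blob, HEADER.size))
    return (nt, nx, ny, nz), values


blob = pack_field((1, 2, 2, 1), [0.04, 0.05, 0.06, 0.07])
shape, values = unpack_field(blob)
```

Because the layout is stated explicitly rather than inherited from the writing machine, the same bytes can be read back unchanged on a sequential workstation or on the nodes of a parallel computer.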
Initial and Boundary Conditions Manager
The initial conditions and boundary conditions of models are stored as objects in the PSE. In this manner, they are isolated from the kernel physics and chemistry of the air quality model. A researcher is able to quickly and interactively reconfigure and retest the simulations with different initial and/or boundary conditions without the need to rewrite code.
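The separation of conditions from the kernel can be sketched as follows. In this hypothetical Python example, a Scenario object carries the initial and boundary values, and a deliberately trivial kernel (a one-cell-per-step shift, standing in for the real physics and chemistry) is rerun with a modified scenario without any change to the kernel code. All names and numbers are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    initial: dict   # species -> uniform initial concentration (ppm)
    inflow: dict    # species -> concentration at the upwind boundary (ppm)


def run_kernel(scenario, species, n_cells=4, n_steps=3):
    """Toy kernel: shift the field one cell downwind per step, refilling from the boundary."""
    cells = [scenario.initial[species]] * n_cells
    for _ in range(n_steps):
        cells = [scenario.inflow[species]] + cells[:-1]
    return cells


base = Scenario(initial={"O3": 0.04}, inflow={"O3": 0.04})
cleaner = replace(base, inflow={"O3": 0.02})   # same kernel, new boundary condition

base_run = run_kernel(base, "O3")
cleaner_run = run_kernel(cleaner, "O3")
```

Only the scenario object differs between the two runs, which is the property that lets a researcher retest a simulation interactively.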
Parallel Program Archetypes
Parallel computers have begun to enjoy wide usage in environmental applications. In many areas of environmental modeling, the use of parallel computers is no longer a luxury but a necessity. Nevertheless, porting serial models to parallel architectures has been regarded as more challenging than developing correct sequential software, primarily because modelers may have to deal with nondeterminacy and multiple threads of execution. We have developed parallel program archetypes that abstract the parallel structure of the program into a skeleton; the user fleshes out the skeleton by providing sequential programs for the slots of the skeleton. Thus, users can (if they so choose) focus primarily on sequential programming issues, allowing the parallel archetype to take care of parallel features. For instance, the mesh-spectral archetype (9,10) used to parallelize the CIT air quality model can be used to parallelize any air-quality model or, for that matter, any model that follows a similar data flow dependency. The air-quality model was designed before we had completed development of the mesh-spectral archetype, but it fits the structure of the archetype.
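The skeleton-and-slots idea can be illustrated with a small Python sketch. This is not the mesh-spectral archetype of references 9 and 10; it is a minimal stand-in, assuming a two-dimensional grid split into row blocks, with two user-supplied sequential slots: a pointwise operation (playing the role of chemistry) and a row operation (playing the role of transport along one axis). The names run_skeleton, decay and shift_right are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_skeleton(grid, point_op, row_op, n_workers=2):
    """Parallel skeleton: the caller supplies only the sequential slots."""

    def block_task(block):
        # Each worker applies the two sequential slots to its own rows.
        return [row_op([point_op(c) for c in row]) for row in block]

    size = max(1, -(-len(grid) // n_workers))   # ceiling division
    blocks = [grid[i:i + size] for i in range(0, len(grid), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(block_task, blocks))   # map preserves block order
    return [row for block in results for row in block]


# User-supplied sequential slots (toy stand-ins for chemistry and transport).
def decay(c):
    return 0.5 * c


def shift_right(row):
    return row[-1:] + row[:-1]


grid = [[4.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 2.0], [6.0, 0.0, 0.0]]
parallel = run_skeleton(grid, decay, shift_right, n_workers=2)
sequential = run_skeleton(grid, decay, shift_right, n_workers=1)
```

The two slot functions contain no parallel constructs, and the result does not depend on the number of workers, so the person writing them never confronts nondeterminacy. The sketch shows structure only: Python threads are used for brevity and are not expected to give the speed-ups of a message-passing implementation.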
Some of the key questions that must be addressed in order to develop
parallel archetypes are: What is the best way to map model processes to
the computer processors? How can the best load balancing be achieved? When
should one follow a task-parallel versus a data-parallel paradigm? How should
parallel archetypes be integrated with the workbench? The use of archetypes may
not provide the performance obtained by optimizing message passing, but
it simplifies the task of parallelization. This is analogous to the use
of high-level languages: one may be able to obtain higher performance using
assembly code, but the ease of programming justifies the reduced performance.
In addition to the benefit of reducing the effort required to produce efficient
and accurate concurrent environmental applications, parallel archetypes
help in code portability between different runtime systems and machines.
CONCLUSIONS
This paper describes an experiment on the development of a "machine tool" for PSEs. The hypothesis is that a robust, easy-to-use abstract PSE structure can easily be tailored to obtain PSEs for specific problems. The effort required to develop a robust problem-specific PSE is very high because the PSE has to help users explore the problem space no matter what they do; our hypothesis is that this effort can be amortized over several problem-specific PSEs by employing an abstraction. The problem-space abstraction that we use is the 3D+T+Mk archetype for 3D space, time and k parameters at each point. We expect to be able to reuse our abstraction to develop PSEs for problems that fit this structure.
Our experiment to test our hypothesis consists of developing a PSE archetypal structure for the 3D+T+Mk problem space and applying this structure to a specific problem. The first problem we chose was a simulation of an airshed. The abstract PSE structure was developed concurrently with the airshed PSE, with the airshed problem suggesting the design of the abstraction, and the design of the abstraction suggesting ideas for the airshed PSE. The first version of a PSE for simulating the airshed for Southern California is now complete and will be used for courses in environmental engineering. The back-end of the PSE can be a sequential or parallel computer.
The first part of the experiment consists of evaluating the completeness of the PSE for the Southern California airshed; there is little point using an abstraction that cannot be used even for one specific problem. This part of the experiment is well under way and the results are promising. The target classes of users are (i) environmentally-concerned citizens with little knowledge of computation or chemistry, (ii) environmental scientists who are primarily concerned about environmental science and engineering but only secondarily concerned about computational details, and (iii) computational scientists including those primarily interested in parallel computation. The promising results have encouraged us to continue with the experiment. Of course, one of the most important benefits of the experiment is the Southern California airshed PSE itself. We encourage readers to evaluate the airshed PSE.
The remaining parts of the experiment are to formalize the 3D+T+Mk
archetypal PSE, and to then evaluate the ease of applying the archetype
to different problems, some similar to the airshed PSE and others that
are very different. We have evaluated a sequence of such problems, and
expect to develop problem-specific PSEs for other problems in the coming
year.
ACKNOWLEDGEMENTS
The authors would like to thank the division of Advanced Scientific
Computing of the National Science Foundation for providing support for
the work reported here under grant CCR-9527130 and the National Science
Foundation Center for Research in Parallel Computation. This research was
performed in part using the Intel Paragon System at the Caltech Center
for Advanced Computing Research.
REFERENCES
1 Gallopoulos, E.; Houstis, E.N.; Rice, J.R. IEEE Comp. Sci. Engr. 1994 1, 11-23.
2 Tesche, T.W. Environ. Int. 1983 9, 465-490.
3 Seinfeld, J.H. J. Air Pollut. Control Assoc. 1988 38, 616-645.
4 Dabdub, D.; Seinfeld, J.H. Parallel Computing 1996 22, 111-130.
5 Langtangen, H.P. Diffpack: Software for partial differential equations http://www.oslo.sintef.no/diffpack/dplibrary.html
6 Weerawarana, S., Houstis, E.N., Rice, J.R., Catlin, A.C., Crabill, C.L., Chui, C.C. and Marcus, S.; PDELab: An Object-Oriented Framework for Building Problem Solving Environments for PDE Based Applications, Technical Report CSD-TR-94-021, Department of Computer Sciences, Purdue University, 1994.
7 Fraga, E.S. and McKinnon, K.I.M.; CHiPS: A Process Synthesis Package, Technical Report 1993-06, Department of Chemical Engineering, Edinburgh University, 1993.
8 Ousterhout, J.K.; Tcl and the Tk Toolkit; Addison-Wesley Professional Computing: Massachusetts, 1994.
9 Chandy, K.M., Manohar R., Massingill B.L., and Meiron D.I.; Integrating Task and Data Parallelism with the Group Communication Archetype, International Parallel Processing Symposium, 1995.
10 Chandy, K.M.; Concurrent Program Archetypes, Proceedings of the Scalable Parallel Library Conference, 1994.
Figure 1: CPU time for a 24-hour simulation of the South Coast
Air Basin using a parallel version of the CIT model on various parallel
architectures.
Figure 2: Modules of the Problem Solving Environment for air quality
models.
Figure 3: Snapshot from the visualization module of the Problem
Solving Environment for the CIT air quality model. This view shows the
concentration of ozone in the South Coast Air Basin of California at 15:00
hours for August 27, 1987.