A PSE for Air Quality Models Using the 3D+T+Mk Archetype

Donald Dabdub(1) and K. Mani Chandy(2)

(1) Department of Mechanical and Aerospace Engineering
University of California, Irvine
Irvine, CA 92697-3975

(2) Department of Computer Sciences
California Institute of Technology 256-80
Pasadena, CA 91125

ABSTRACT

The development of a problem-solving environment (PSE) for air quality models is presented. The focus of the work is on the integration of a variety of parallel and sequential computers into a unified workbench accessible to scientists and engineers who concentrate on the science and not on the parallel programming aspects of air pollution modeling. In addition, the problem-solving environment is designed to serve as a tool to be used in education and public awareness efforts. The central idea of this work is that of model abstraction in physical simulations. The abstraction of the PSEs for physical simulations deals with space, time, and a collection of model data. We call this problem domain the 3D+T+Mk domain. The 3D+T+Mk abstraction is used to navigate through input and output data, manage I/O, and specify modules of parallel programs. The first problem in which these ideas are implemented is air pollution modeling in the South Coast Air Basin of California. We present results of developments of a PSE for atmospheric chemical dynamic models that describe mathematically the transport and transformation of pollutants using a three-dimensional Eulerian approach. This work uses the California Institute of Technology (CIT) model as the underlying air quality model to drive the PSE.

INTRODUCTION

Computers in general, and parallel computers in particular, can be powerful tools for scientists, engineers and managers, if the users can focus attention on their specific problems rather than on general computational issues. One way to package a computational solution to a problem is to enclose it within a problem-solving environment (PSE) that is specific to the problem of interest. A PSE is a set of tools and methodologies designed to formulate the problem, to solve the problem and to analyze the results, all in a user-friendly environment that is natural to the problem domain.

As stated by Gallopoulos et al. (1), ideal PSEs provide "a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science."

In particular, a PSE designed specifically for air quality modeling allows environmental scientists, public policy planners, interested citizens and students to explore the dynamics of photochemical smog. A PSE allows them to focus their attention on issues of interest to them such as sources of pollution, chemistry, and wind and temperature patterns, while paying little attention to the machine (parallel architecture, message-passing or shared-memory system, programming languages, graphics packages) on which their programs execute.

RESEARCH GOALS

Diversity of Interests

The people who use a computer as a simulation engine may want to exploit different levels of detail of the simulator. Undergraduate students who use an airshed model are primarily interested in the interactions between the types and quantities of pollutants on the one hand and domain factors such as wind, topography and temperature on the other; they are not interested in manipulating the simulation program itself. Environmental engineers share the interests of undergraduate students and, in addition, want to investigate algorithmic issues and study different simulation scenarios. Others may be interested in porting the simulation to a different parallel computer. A question that we are investigating is as follows: Is it possible to design a problem-solving environment (PSE) that helps users exploit a simulation engine to the level of detail of concern to them?

Specificity

A related issue is that of specificity versus generality of the PSE. A PSE that is a precise fit for a person's needs is easier to use than a more general one that has to be tailored. An air quality model (AQM) of Southern California is a subset of a family of fluid dynamics models. The application has a high degree of specificity. The problem with high degrees of specificity is that large numbers of PSEs are required to fit the many different specific requirements precisely. Our challenge is to create PSE-development methods that help in producing PSEs that can be tailored to a high degree of specificity with relatively little effort.

Composability of PSEs

PSEs can support collaborations between people with different interests in solving a common problem. For instance, chemists, environmental engineers, health professionals, business leaders and public policy experts may collaborate on making decisions about an emission-control strategy in Los Angeles County. These people may use different tools corresponding to their specific interests; all these tools have to be linked into a collaborative PSE. A question that we are working on is: How can different kinds of PSE tools (air quality models, spreadsheets, automobile traffic models) be composed to form an integrated environment?

Performance Tuning for Target Computer Architectures

High-performance simulations require that simulations be tuned to obtain performance on target machines. Some users do not want to have anything to do with tuning simulations, and they accept the best performance that automatic tuning can provide. Others are willing to provide some information to the runtime mechanisms to help them tune the application to fit the target machine, especially when the machine is parallel. Our challenge is to design a PSE through which the user can provide information about performance tuning their simulations for specific machines.

Exploiting Internet Technology

Internet technology can be exploited in several ways including (i) supporting users anywhere with Internet access to use the PSE remotely, or to download the PSE, and (ii) helping users get access to appropriate machines, particularly supercomputers, to execute their simulations. The focus of our project, so far, has been on the first issue. We are specifically interested in giving concerned citizens all over the country access to models that they can use to understand their natural environment.

Summary of Scientific Issues

In summary, the scientific questions addressed by our research are:

THE EXPERIMENT

Our experiment consists of building PSEs for specific problems and then getting users to evaluate our PSEs. Next, we give an overview of the experiment and discuss the experiment in terms of the questions raised in the last section.

The problem domains from the most specific to the least specific are as follows:

Our experiment is designed to test the extensibility of the PSE infrastructure.

The classes of users that we have targeted are as follows. First, citizens who are concerned about their environment but who may not be scientists; an even more specific group within this class is high-school students in Southern California interested in the environment. These people would use the PSE from remote sites via the Internet. Second, environmental scientists and researchers at the graduate-school level and up; a more specific group within this class is researchers who specialize in computer simulations of the environment.

CURRENT STATUS OF AIR QUALITY MODELING

Development of Air Quality Models

Mathematical models used to study the dynamics of photochemical air pollution were first developed in the early 1970s. There have been comprehensive research efforts in the identification, formulation and numerical solution of the main physical and chemical processes associated with ozone production. Tesche (2) and Seinfeld (3) describe the development and applications of urban air quality models. A topic of current research in model development is that of incorporating the aerosol phase into existing air quality models. The heavy computational demands imposed by the aerosol computations provide one of the driving forces motivating the use of parallel computers.

Parallelization of Air Quality Models

Considerable research effort has been devoted to domain-decomposition strategies that implement air-quality models on parallel supercomputers. Results from previous research indicate that parallel implementations of the chemistry operator, the transport operator, and the I/O routines are all required to obtain the highest speed-ups. A typical 24-hour run to simulate gas-phase pollutant dynamics in the South Coast Air Basin of California using the California Institute of Technology (CIT) model requires less than 7 minutes on the Intel Paragon with 128 nodes (4). Figure 1 shows the performance of the parallel implementation of the CIT model on different architectures. Differences in performance are due to the different processor and network speeds of each parallel computer. The PSE developed in this work can use both the sequential and parallel implementations of the CIT model.

Problem Solving Environments for Air Quality Modeling

Much of the research on PSEs has dealt with environments for computational mathematics in differential equations and linear algebra. For instance, Langtangen (5) and Weerawarana et al. (6) present a PSE for differential equations and a set of tools to develop PSEs respectively. There has been some research to develop problem solving environments that are tailored to specific applications. For instance, Fraga and McKinnon (7) have developed a PSE for the automated synthesis of chemical process flowsheets. However, there has been little research and development of a PSE designed for air quality modeling.

DESIGN TOOLS AND METHODOLOGIES

The PSE uses the CIT air-quality model (AQM) as the first environmental application. Other typical urban- and regional-scale AQMs have a structure similar to that of the CIT model. Thus there is no loss of generality in considering this model as a test case. The tool used to develop the PSE is Tcl/Tk. Tcl, pronounced "tickle," stands for "tool command language." Tcl is actually two things: a language and a library (8). First, Tcl is a simple textual language intended primarily for issuing commands to interactive programs. Second, Tcl is a library package that we embed in our model. The Tcl library consists of a parser for the Tcl language, routines to implement the Tcl built-in commands, and procedures that allow each application to extend Tcl with additional commands specific to that application.

There are various advantages to using Tcl to develop the PSE. First, Tcl provides a standard syntax: once users know Tcl, they will be able to issue commands easily to any Tcl-based application. Second, Tcl provides programmability. All a Tcl application needs to do is to implement a few application-specific low-level commands. Tcl provides many utility commands plus a general programming interface for building up complex command procedures. By using Tcl, applications need not re-implement these features. Third, extensions to Tcl, such as the Tk toolkit, provide mechanisms for communicating between applications by sending Tcl commands back and forth. The common Tcl language framework makes it easier for applications to communicate with one another. Fourth, Tcl is available free of charge. Fifth, it runs on a wide variety of platforms. Sixth, Tcl provides the capability to interact with popular World Wide Web browsers to exploit Internet technology.

The central abstraction of the PSE for air quality models deals with space, time and a collection of model data. We call this problem domain the 3D+T+Mk domain, where there are three dimensions of space (3D), one dimension of time (T) and k dimensions inherent to the model. The 3D+T+Mk abstraction is used to construct parallel program archetypes, navigate through input and output data, and manage I/O. For example, the time series plot for ozone concentration at a given monitoring station or an animated display of the isopleth of a given pollutant are different projections of the 3D+T+Mk space. By using a higher level of abstraction in the design of the PSE, the main structure of the code can be reused to develop a PSE for other air quality models or other environmental applications. A focus of our research is the evaluation of reuse of the 3D+T+Mk archetype.
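The idea of projecting the 3D+T+Mk space can be illustrated with a small Python sketch. It is not the PSE's Tcl/Tk code; the class name ModelSpace, its methods, and the choice of a species name as the single model dimension (k = 1) are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical index layout: (x, y, z, t, species). The species name plays
# the role of the model dimension "Mk" with k = 1.
Index = Tuple[int, int, int, int, str]


@dataclass
class ModelSpace:
    """Toy 3D+T+Mk container holding one value per index tuple."""
    values: Dict[Index, float]

    def time_series(self, x: int, y: int, z: int,
                    species: str) -> List[Tuple[int, float]]:
        """Fix space and species, vary time (data for a time-series plot)."""
        points = [(t, v) for (xi, yi, zi, t, s), v in self.values.items()
                  if (xi, yi, zi, s) == (x, y, z, species)]
        return sorted(points)

    def horizontal_slice(self, z: int, t: int,
                         species: str) -> Dict[Tuple[int, int], float]:
        """Fix level, time and species, vary x and y (data for a contour plot)."""
        return {(xi, yi): v for (xi, yi, zi, ti, s), v in self.values.items()
                if (zi, ti, s) == (z, t, species)}


def demo_space() -> ModelSpace:
    """Build a tiny synthetic 2 x 2 x 1 grid with three time steps."""
    values: Dict[Index, float] = {}
    for x in range(2):
        for y in range(2):
            for t in range(3):
                values[(x, y, 0, t, "O3")] = 10.0 * t + x + y
                values[(x, y, 0, t, "NO2")] = 5.0 + t
    return ModelSpace(values)
```

Calling demo_space().time_series(1, 0, 0, "O3") returns the ozone values at one cell over time, while horizontal_slice(0, 2, "O3") returns the ground-level field at one instant. Both are projections of the same underlying space, and an animation is simply a sequence of such slices over t.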

COMPONENTS OF THE PROBLEM SOLVING ENVIRONMENT

The main components of the PSE are shown in Figure 2. This section discusses each of the modules already incorporated and comments on the need for parallel program archetypes that have not yet been implemented. We designed the following components to be reusable.

Database Management

Air quality models require large sets of input data: time, space, or time/space dependent variables. In addition, the type of data used might be scalar (like temperature fields) or vector (like wind fields). We developed a module within the PSE that describes, extracts, and integrates the large meteorological and/or emission inventory databases required by environmental models. The standardization of data models is crucial for the development of reusable object-model-based libraries such as parallel archetypes or visualization routines. In addition, the conceptual data model eases the design of communication channels among all objects within the PSE.
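A standardized data model of this kind might be sketched in Python as follows. The names FieldDescriptor, Dependence and values_per_record are hypothetical and do not come from the PSE; the sketch only shows how one descriptor can cover scalar and vector fields with time, space, or time/space dependence.

```python
from dataclasses import dataclass
from enum import Enum


class Dependence(Enum):
    """How an input variable varies (categories taken from the text)."""
    TIME = "time"
    SPACE = "space"
    TIME_SPACE = "time/space"


@dataclass(frozen=True)
class FieldDescriptor:
    """Hypothetical description of one input field."""
    name: str
    units: str
    dependence: Dependence
    components: int  # 1 for a scalar field, more than 1 for a vector field

    @property
    def is_vector(self) -> bool:
        return self.components > 1


def values_per_record(desc: FieldDescriptor,
                      nx: int, ny: int, nz: int, nt: int) -> int:
    """Count the numbers needed to store the field on an nx*ny*nz grid
    over nt time steps, given its dependence and component count."""
    cells = 1 if desc.dependence is Dependence.TIME else nx * ny * nz
    steps = 1 if desc.dependence is Dependence.SPACE else nt
    return cells * steps * desc.components
```

With such a descriptor, a temperature field and a three-component wind field on the same grid differ only in the components attribute, so visualization or I/O routines can treat them uniformly.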

Algorithm Modularity

The computational needs of air quality models are often concentrated on the solution of a few operators. For example, the most challenging numerical aspects of solving the atmospheric diffusion equation are the chemistry operator and the advection operator. The chemistry operator consists of solving a system of stiff nonlinear, coupled ordinary differential equations. It is well known that the main challenge presented by a chemistry solver is that of performance and robustness. The advection operator consists of solving a hyperbolic partial differential equation to account for the transport processes of pollutants in the atmosphere. The main challenge presented by the advection solver is that of accuracy. The PSE packages a number of algorithms in a modular fashion to allow the rapid prototyping of the numerical techniques to be used by the chemistry, transport and filtering algorithms of the air quality model.
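To make the modular arrangement concrete, the following Python sketch splits one time step into an advection operator and a chemistry operator, with the chemistry solver chosen by name. It is a deliberately simplified stand-in: the advection scheme is first-order upwind on a periodic one-dimensional grid, and the "chemistry" is a single first-order decay rather than a stiff coupled system. All function names and the CHEMISTRY_SOLVERS registry are assumptions for illustration, not the algorithms packaged in the PSE.

```python
import math
from typing import Callable, Dict, List

Profile = List[float]


def upwind_advection(c: Profile, courant: float) -> Profile:
    """One first-order upwind step for a positive wind on a periodic grid.
    The scheme is stable for 0 <= courant <= 1."""
    # c[i - 1] with i == 0 wraps to the last cell, giving periodicity.
    return [c[i] - courant * (c[i] - c[i - 1]) for i in range(len(c))]


def decay_backward_euler(c: Profile, rate: float, dt: float) -> Profile:
    """Implicit step for dc/dt = -rate * c; stays positive for any dt."""
    return [ci / (1.0 + rate * dt) for ci in c]


def decay_exact(c: Profile, rate: float, dt: float) -> Profile:
    """Exact solution of the same toy decay equation."""
    return [ci * math.exp(-rate * dt) for ci in c]


# Hypothetical registry that lets a user swap chemistry solvers by name.
CHEMISTRY_SOLVERS: Dict[str, Callable[[Profile, float, float], Profile]] = {
    "backward_euler": decay_backward_euler,
    "exact": decay_exact,
}


def split_step(c: Profile, courant: float, rate: float, dt: float,
               chemistry: str = "backward_euler") -> Profile:
    """Operator splitting: transport first, then chemistry."""
    solver = CHEMISTRY_SOLVERS[chemistry]
    return solver(upwind_advection(c, courant), rate, dt)
```

Swapping chemistry="exact" for chemistry="backward_euler" changes only the chemistry slot, which is the kind of rapid prototyping the modular packaging is meant to support.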

Analysis and Visualization

Air quality models produce large output data files. When the model output has been generated, the engineering and scientific analysis component of the work has only begun. We developed a visualization component within the PSE that manipulates both the input data and the data generated by the model. The user is allowed to interactively explore a variety of data representations. These options for analysis include the capability of displaying fixed-space and variable-time data (time series plots), variable-space and fixed-time data (contour plots) using a color-coded approach, as well as the animated display of model predictions (see Figure 3).

I/O Management

While air quality models are executing, results are written to disk. The data sets generated are rather large and, in most cases, unformatted. The objective of the I/O manager is to have a standard representation of the output data as it is moved into a heterogeneous network containing sequential, distributed and parallel architectures. The I/O manager also provides a common representation of data regardless of its location (or locations, in the case of a parallel file system).
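One common way to obtain a machine-independent representation is to fix the byte order and field widths explicitly. The Python sketch below does this with the standard struct module. The record layout (a four-byte tag followed by three grid dimensions and big-endian doubles) and the names MAGIC, encode_field and decode_field are invented for this example and are not the format used by the PSE's I/O manager.

```python
import struct
from typing import List, Tuple

# Hypothetical header: 4-byte tag, then nx, ny, nz as unsigned 32-bit ints.
# The ">" prefix forces big-endian order and disables native padding, so
# the bytes are identical on every architecture.
HEADER = struct.Struct(">4sIII")
MAGIC = b"AQM1"  # illustrative tag, not a real file format


def encode_field(nx: int, ny: int, nz: int, values: List[float]) -> bytes:
    """Pack one 3D field into a portable byte string."""
    if len(values) != nx * ny * nz:
        raise ValueError("value count does not match grid dimensions")
    body = struct.pack(f">{len(values)}d", *values)
    return HEADER.pack(MAGIC, nx, ny, nz) + body


def decode_field(blob: bytes) -> Tuple[Tuple[int, int, int], List[float]]:
    """Recover the grid dimensions and values from encode_field output."""
    magic, nx, ny, nz = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("unrecognized record tag")
    count = nx * ny * nz
    values = list(struct.unpack_from(f">{count}d", blob, HEADER.size))
    return (nx, ny, nz), values
```

Because the layout is fixed in the format strings rather than inherited from the host machine, a record written on one node can be read unchanged on another with a different native byte order.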

Initial and Boundary Conditions Manager

The initial conditions and boundary conditions of models are stored as objects in the PSE. In this manner, they are isolated from the kernel physics and chemistry of the air quality model. A researcher is able to quickly and interactively reconfigure and retest the simulations with different initial and/or boundary conditions without the need to rewrite code.
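The separation between condition objects and the numerical kernel can be sketched in Python as follows. The classes InitialCondition and InflowBoundary and the functions advect and run are hypothetical, and the kernel is a simple one-dimensional upwind scheme chosen for brevity rather than the CIT model's transport algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InitialCondition:
    """Hypothetical object mapping a cell index to a starting concentration."""
    profile: Callable[[int], float]

    def build(self, n_cells: int) -> List[float]:
        return [self.profile(i) for i in range(n_cells)]


@dataclass
class InflowBoundary:
    """Hypothetical object: fixed concentration entering at the upwind edge."""
    concentration: float


def advect(c: List[float], courant: float,
           inflow: InflowBoundary) -> List[float]:
    """Upwind step for a positive wind. The kernel only asks the boundary
    object for the value upstream of the first cell."""
    upstream = [inflow.concentration] + c[:-1]
    return [ci - courant * (ci - ui) for ci, ui in zip(c, upstream)]


def run(ic: InitialCondition, bc: InflowBoundary,
        n_cells: int, steps: int, courant: float) -> List[float]:
    """Run the same kernel with whichever condition objects are supplied."""
    c = ic.build(n_cells)
    for _ in range(steps):
        c = advect(c, courant, bc)
    return c
```

Rerunning a scenario with polluted inflow then amounts to passing InflowBoundary(1.0) instead of InflowBoundary(0.0); the advect kernel itself is untouched.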


Parallel Program Archetypes

Parallel computers have begun to enjoy wide usage in environmental applications. In many areas of environmental modeling, the use of parallel computers is no longer a luxury but a necessity. Nevertheless, porting serial models to parallel architectures has been regarded as more challenging than developing correct sequential software, due primarily to the fact that modelers may have to deal with nondeterminacy and multiple threads of execution. We have developed parallel program archetypes that abstract the parallel structure of the program into a skeleton; the user fleshes out the skeleton by providing sequential programs for the slots of the skeleton. Thus, users can (if they so choose) focus primarily on sequential programming issues, allowing the parallel archetype to take care of parallel features. For instance, the mesh-spectral archetype (9,10) used to parallelize the CIT air quality model can be used to parallelize any air-quality model or, for that matter, any model that follows a similar data flow dependency. The air-quality model was designed before we had completed development of the mesh-spectral archetype, but it fits the structure of the archetype.
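The skeleton-and-slot idea can be illustrated with a short Python sketch. It is not the mesh-spectral archetype of (9,10); it is a much simpler stand-in in which the skeleton decomposes a set of independent grid columns into blocks, applies a user-supplied sequential function to each, and reassembles the result. The names block_ranges, column_archetype and halve are hypothetical, and a thread pool is assumed here in place of the message-passing runtime of a real parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Column = List[float]


def block_ranges(n_items: int, n_workers: int) -> List[range]:
    """Split range(n_items) into contiguous blocks whose sizes differ
    by at most one, a simple static load-balancing rule."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def column_archetype(columns: List[Column],
                     column_op: Callable[[Column], Column],
                     n_workers: int = 4) -> List[Column]:
    """Skeleton: decompose, run the sequential slot on each block,
    and reassemble the columns in their original order."""
    def work(block: range) -> List[Column]:
        return [column_op(columns[i]) for i in block]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(work, block_ranges(len(columns), n_workers)))
    return [col for part in parts for col in part]


def halve(column: Column) -> Column:
    """Example slot: a purely sequential per-column update."""
    return [0.5 * v for v in column]
```

The author of halve writes ordinary sequential code and never sees the decomposition or the pool; replacing the thread pool with another runtime would change only column_archetype.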

Some of the key questions that must be addressed in order to develop parallel archetypes are: What is the best way to map model processes to the computer processors? How can the best load balancing be achieved? When should one follow a task-parallel versus a data-parallel paradigm? How should parallel archetypes be integrated with the workbench? The use of archetypes may not provide the performance obtained by optimizing message passing, but it simplifies the task of parallelization. This is analogous to the use of high-level languages: one may be able to obtain higher performance using assembly code, but the ease of programming justifies the reduced performance. In addition to the benefit of reducing the effort required to produce efficient and accurate concurrent environmental applications, parallel archetypes help in code portability between different runtime systems and machines.

CONCLUSIONS

This paper describes an experiment on the development of a "machine tool" for PSEs. The hypothesis is that a robust, easy-to-use PSE abstract structure can be tailored to obtain PSEs for specific problems simply. The effort required to develop a robust problem-specific PSE is very high because the PSE has to help users explore the problem space no matter what they do; our hypothesis is that this effort can be amortized over several problem-specific PSEs by employing an abstraction. The problem-space abstraction that we use is the 3D+T+Mk archetype for 3D space, time and k parameters at each point. We expect to be able to reuse our abstraction to develop PSEs for problems that fit this structure.

Our experiment to test our hypothesis consists of developing a PSE archetypal structure for the 3D+T+Mk problem space and applying this structure to a specific problem. The first problem we chose was a simulation of an airshed. The abstract PSE structure was developed concurrently with the airshed PSE, with the airshed problem suggesting the design of the abstraction, and the design of the abstraction suggesting ideas for the airshed PSE. The first version of a PSE for simulating the airshed for Southern California is now complete and will be used for courses in environmental engineering. The back-end of the PSE can be a sequential or parallel computer.

The first part of the experiment consists of evaluating the completeness of the PSE for the Southern California airshed; there is little point using an abstraction that cannot be used even for one specific problem. This part of the experiment is well under way and the results are promising. The target classes of users are (i) environmentally-concerned citizens with little knowledge of computation or chemistry, (ii) environmental scientists who are primarily concerned about environmental science and engineering but only secondarily concerned about computational details, and (iii) computational scientists including those primarily interested in parallel computation. The promising results have encouraged us to continue with the experiment. Of course, one of the most important benefits of the experiment is the Southern California airshed PSE itself. We encourage readers to evaluate the airshed PSE.

The remaining parts of the experiment are to formalize the 3D+T+Mk archetypal PSE, and to then evaluate the ease of applying the archetype to different problems, some similar to the airshed PSE and others that are very different. We have evaluated a sequence of such problems, and expect to develop problem-specific PSEs for other problems in the coming year.

ACKNOWLEDGEMENTS

The authors would like to thank the division of Advanced Scientific Computing of the National Science Foundation for providing support for the work reported here under grant CCR-9527130 and the National Science Foundation Center for Research in Parallel Computation. This research was performed in part using the Intel Paragon System at the Caltech Center for Advanced Computing Research.

REFERENCES

Figure 1: CPU time for a 24-hour simulation of the South Coast Air Basin using a parallel version of the CIT model on various parallel architectures.

Figure 2: Modules of the Problem Solving Environment for air quality models.

Figure 3: Snapshot from the visualization module of the Problem Solving Environment for the CIT air quality model. This view shows the concentration of ozone in the South Coast Air Basin of California at 15:00 hours for August 27, 1987.
