A PSE for Airshed Models Using the 3D+T+M^k Archetype

MANAGING SPECIFICITY AND GENERALITY:

TAILORING GENERAL ARCHETYPAL PSEs TO SPECIFIC USERS.

Donald Dabdub⁽¹⁾ and K. Mani Chandy⁽²⁾

⁽¹⁾ Department of Mechanical and Aerospace Engineering

University of California, Irvine

Irvine, CA 92697-3975

⁽²⁾ Department of Computer Sciences

California Institute of Technology 256-80

Pasadena, CA 91125

ABSTRACT

PSEs encapsulate computational details such as parallel machine architecture, programming language and algorithms leaving the user free to focus attention on computational experiments. PSEs that are tailored to a specific class of users encapsulate more of the computational details germane to those users. Narrowly-tailored PSEs are more useful to the target class of user but developing PSEs for many different classes of users is expensive. This paper reports on an experiment to deal with the specificity-generality dimension of PSEs by using archetypal PSEs. The experiment consists of developing a collection of PSEs for a class of problems all of which deal with 3 dimensions, time, and a collection of attributes of points in the space-time domain. We start with PSEs that deal with air-quality models and then progress to application areas that get increasingly remote from air quality and the environment. Another dimension we explore is that of user sophistication: we start with experts in air-quality models and then progress to college students, experts in areas such as public policy who are not environmental engineers, and then to high-school students. Our experiments suggest that developing collections of related PSEs, or archetypal PSEs, can be helpful in reducing some of the effort required to develop and maintain user-specific PSEs.

INTRODUCTION

Computers in general, and parallel computers in particular, can be powerful tools for scientists, engineers and managers, if the users can focus attention on their specific problems rather than on general computational issues. One way to package a computational solution to a problem is to enclose it within a problem-solving environment (PSE) that is specific to the problem of interest. A PSE is a set of tools and methodologies designed to formulate the problem, to solve the problem and to analyze the results; all in a user friendly environment that is natural to the problem domain.

As stated by Gallopoulos et al. (1) the ideal PSEs provide "a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science."

In particular, a PSE designed specifically for air quality modeling allows environmental scientists, public policy planners, interested citizens and students to explore the dynamics of photochemical smog. A PSE allows them to focus their attention on issues of interest to them such as sources of pollution, chemistry, and wind and temperature patterns, while paying little attention to the machine (parallel architecture, message-passing or shared-memory system, programming languages, graphics packages) on which their programs execute.

RESEARCH GOALS

Diversity of Interests

The people who use a computer as a simulation engine may want to exploit different levels of detail of the simulator. Undergraduate students who use an airshed model are primarily interested in the interactions between types and quantity of pollutants on the one hand and domain factors such as wind, topology and temperature on the other; they are not interested in manipulating the simulation program itself. Environmental engineers are interested in the issues in which undergraduate students are interested, and in addition, want to investigate algorithmic issues and study different simulation scenarios. Others may want to port simulations to different parallel computers. A question that we are investigating is as follows: Is it possible to design a problem solving environment (PSE) that helps users exploit a simulation engine to the level of detail of concern to them?

Specificity

A related issue is that of specificity versus generality of the PSE. A PSE that is a precise fit for a person's needs is easier to use than a more general one that has to be tailored. An air quality model (AQM) of Southern California is one member of a family of fluid dynamics models, so the application has a high degree of specificity. The problem with high degrees of specificity is that large numbers of PSEs are required to fit the many different specific requirements precisely. Our challenge is to create PSE-development methods that help in producing PSEs that can be tailored to a high degree of specificity with relatively little effort.

Composability of PSEs

PSEs can support collaborations between people with different interests in solving a common problem. For instance, chemists, environmental engineers, health professionals, business leaders and public policy experts may collaborate on making decisions about an emission-control strategy in Los Angeles County. These people may use different tools corresponding to their specific interests; all these tools have to be linked into a collaborative PSE. A question that we are working on is: How can different kinds of PSE tools (air quality models, spreadsheets, automobile traffic models) be composed to form an integrated environment?

Performance Tuning for Target Computer Architectures

High-performance simulations require that simulations be tuned to obtain performance on target machines. Some users do not want to have anything to do with tuning simulations, and they accept the best performance that automatic tuning can provide. Others are willing to provide some information to the runtime mechanisms to help them tune the application to fit the target machine, especially when the machine is parallel. Our challenge is to design a PSE through which users can provide information that helps tune the performance of their simulations for specific machines.

Exploiting Internet Technology

Internet technology can be exploited in several ways including (i) supporting users anywhere with Internet access to use the PSE remotely, or to download the PSE, and (ii) helping users get access to appropriate machines, particularly supercomputers, to execute their simulations. The focus of our project, so far, has been on the first issue. We are specifically interested in giving concerned citizens all over the country access to models that they can use to understand their natural environment.

Summary of Scientific Issues

In summary, the scientific questions addressed by our research are:

How can PSEs help users exploit a simulation engine to the level of detail of concern to them?
How can we make PSE "machine tools" that can be used to tailor PSEs to a high degree of specificity?
How can different kinds of PSEs be composed into an integrated environment?
How can PSEs help in tailoring a simulation to target machines?
How can network technology be exploited to get a PSE to every citizen who wants one and has Internet access?

THE EXPERIMENT

Our experiment consists of (i) planning a PSE archetype for a related class of problems, (ii) tailoring the archetype for specific applications and users with a specific level of sophistication in the application area, (iii) getting feedback from users, and (iv) evaluating the costs and benefits of using PSE archetypes for developing a class of related PSEs. Next, we give an overview of the experiment and discuss the experiment in terms of the questions raised in the last section.

The problem domains from the most specific to the least specific are as follows:

AQMs (Air Quality Models) of Southern California using the CIT model.
Other AQMs for any region.
Environmental models in addition to AQMs.
Computational fluid dynamics with chemistry.
Computing optimal flight paths for airplanes that minimize risk to the airplanes in a hostile environment. Though this problem is not a fluid dynamics problem, the problem space is 3D space, time, and several parameters such as risk associated with each edge.
Atomic physics. (This tests the limits of reusability of specific PSEs because the subject matter is very different from airsheds.)

Our experiment is designed to test the extensibility of the PSE infrastructure.

The classes of users that we have targeted are as follows. First, citizens who are concerned about their environment but who may not be scientists; a more specific group within this class is high-school students in Southern California interested in the environment. These people would use the PSE from remote sites via the Internet. Second, environmental scientists and researchers at the graduate school level and up; a more specific group within this class is researchers who specialize in computer simulations of the environment.

CURRENT STATUS OF AIR QUALITY MODELING

Development of Air Quality Models

Mathematical models used to study the dynamics of photochemical air pollution were first developed in the early 1970s. There have been comprehensive research efforts in the identification, formulation and numerical solution of the main physical and chemical processes associated with ozone production. Tesche (2) and Seinfeld (3) describe the development and applications of urban air quality models. A topic of current research in model development is that of incorporating the aerosol phase into existing air quality models. The heavy computational demands imposed by the aerosol computations provide one of the driving forces motivating the use of parallel computers.

Parallelization of Air Quality Models

Considerable research effort has been devoted to domain-decomposition strategies that implement air-quality models on parallel supercomputers. Results from previous research indicate that parallel implementations of the chemistry operator, transport operator, and I/O routines are required to obtain the highest speed-ups. A typical 24-hour run to simulate gas-phase pollutant dynamics on the South Coast Air Basin of California using the California Institute of Technology (CIT) model requires less than 7 minutes on the Intel Paragon with 128 nodes (4). Figure 1 shows the performance of the parallel implementation of the CIT model on different architectures. Differences in performance are due to the different processor and network speeds of each parallel computer. The PSE developed in this work can use both the sequential and parallel implementations of the CIT model.

Problem Solving Environments for Air Quality Modeling

Much of the research on PSEs has dealt with environments for computational mathematics in differential equations and linear algebra. For instance, Langtangen (5) and Weerawarana et al. (6) present a PSE for differential equations and a set of tools to develop PSEs respectively. There has been some research to develop problem solving environments that are tailored to specific applications. For instance, Fraga and McKinnon (7) have developed a PSE for the automated synthesis of chemical process flowsheets. However, there has been little research and development of a PSE designed for air quality modeling.

DESIGN TOOLS AND METHODOLOGIES

The PSE uses the CIT air-quality model (AQM) as the first environmental application. Other typical urban- and regional-scale AQMs have a structure similar to that of the CIT model. Thus there is no loss of generality in considering this model as a test case. The tool used to develop the PSE is Tcl/Tk. Tcl, pronounced tickle, stands for "tool command language." Tcl is actually two things: a language and a library (8). First, Tcl is a simple textual language intended primarily for issuing commands to interactive programs. Second, Tcl is a library package that we embed in our model. The Tcl library consists of a parser for the Tcl language, routines to implement the Tcl built-in commands, and procedures that allow each application to extend Tcl with additional commands specific to that application.

There are various advantages to using Tcl to develop the PSE. First, Tcl provides a standard syntax: once users know Tcl, they will be able to issue commands easily to any Tcl-based application. Second, Tcl provides programmability. All a Tcl application needs to do is to implement a few application-specific low-level commands. Tcl provides many utility commands plus a general programming interface for building up complex command procedures. By using Tcl, applications need not re-implement these features. Third, extensions to Tcl, such as the Tk toolkit, provide mechanisms for communicating between applications by sending Tcl commands back and forth. The common Tcl language framework makes it easier for applications to communicate with one another. Fourth, Tcl is available free of charge. Fifth, it runs on a wide variety of platforms. Sixth, Tcl provides the capability to interact with popular World Wide Web browsers to exploit Internet technology.

The central abstraction of the PSE for air quality models deals with space, time and a collection of model data. We call this problem domain the 3D+T+M^k domain, where there are three dimensions of space (3D), one dimension of time (T) and k dimensions inherent to the model. The 3D+T+M^k abstraction is used to construct parallel program archetypes, navigate through input and output data, and manage I/O. For example, the time series plot for ozone concentration at a given monitoring station or an animated display of the isopleth of a given pollutant are different projections of the 3D+T+M^k space. By using a higher level of abstraction in the design of the PSE, the main structure of the code can be reused to develop a PSE for other air quality models or other environmental applications. A focus of our research is the evaluation of reuse of the 3D+T+M^k archetype.
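The idea that plots are projections of the 3D+T+M^k space can be illustrated with a minimal sketch. It is not the PSE's actual data structure; the class name `SpaceTimeField`, the index order (time, attribute, z, y, x) and the ozone values are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical index order: (time step, attribute name, z level, y row, x column).
Key = Tuple[int, str, int, int, int]


@dataclass
class SpaceTimeField:
    """Sparse store for values over 3D space, time, and k named attributes."""
    values: Dict[Key, float] = field(default_factory=dict)

    def put(self, t: int, attr: str, z: int, y: int, x: int, value: float) -> None:
        self.values[(t, attr, z, y, x)] = value

    def time_series(self, attr: str, z: int, y: int, x: int) -> List[Tuple[int, float]]:
        """Fix the point and the attribute; let time vary."""
        points = [(t, v) for (t, a, zz, yy, xx), v in self.values.items()
                  if (a, zz, yy, xx) == (attr, z, y, x)]
        return sorted(points)

    def horizontal_slice(self, attr: str, t: int, z: int) -> Dict[Tuple[int, int], float]:
        """Fix time, attribute, and level; let (y, x) vary (input to a contour plot)."""
        return {(yy, xx): v for (tt, a, zz, yy, xx), v in self.values.items()
                if (tt, a, zz) == (t, attr, z)}


field_data = SpaceTimeField()
for t in range(3):
    for y in range(2):
        for x in range(2):
            # Made-up ozone values, for illustration only.
            field_data.put(t, "O3", 0, y, x, 10.0 * t + y + x)

series = field_data.time_series("O3", z=0, y=1, x=1)      # one station over time
snapshot = field_data.horizontal_slice("O3", t=2, z=0)   # one level at one instant
```

Both queries read the same store and differ only in which coordinates are held fixed, which is what makes the projection view reusable across models.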

COMPONENTS OF THE PROBLEM SOLVING ENVIRONMENT

The main components of the PSE are shown in Figure 2. This section discusses each of the modules already incorporated and comments on the need for parallel program archetypes that have not yet been implemented. We designed the following components to be reusable.

Database Management

Air quality models require large sets of input data: time, space, or time/space dependent variables. In addition, the type of data used might be scalar (like temperature fields) or vector (like wind fields). We developed a module within the PSE that describes, extracts, and integrates the large meteorological and/or emission inventory databases required by environmental models. The standardization of data models is crucial for the development of reusable object-model-based libraries such as parallel archetypes or visualization routines. In addition, the conceptual data model eases the design of communication channels among all objects within the PSE.
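One way to picture a standardized data model is a small catalog that records, for each variable, whether it is scalar or vector and whether it depends on time and space. The sketch below is illustrative only; `Kind`, `VariableSpec`, `Catalog` and the example variables are hypothetical names, not part of the PSE described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    SCALAR = 1   # one value per point, e.g. temperature
    VECTOR = 3   # three components per point, e.g. wind


@dataclass(frozen=True)
class VariableSpec:
    name: str
    kind: Kind
    varies_in_time: bool
    varies_in_space: bool
    units: str

    def values_per_record(self, n_cells: int) -> int:
        """Number of floats stored for one time level of this variable."""
        cells = n_cells if self.varies_in_space else 1
        return cells * self.kind.value


class Catalog:
    """Registry that lets tools discover variables without knowing file layouts."""

    def __init__(self) -> None:
        self._specs: Dict[str, VariableSpec] = {}

    def register(self, spec: VariableSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"duplicate variable: {spec.name}")
        self._specs[spec.name] = spec

    def describe(self, name: str) -> VariableSpec:
        return self._specs[name]

    def names(self, kind: Kind) -> List[str]:
        return sorted(n for n, s in self._specs.items() if s.kind is kind)


catalog = Catalog()
catalog.register(VariableSpec("temperature", Kind.SCALAR, True, True, "K"))
catalog.register(VariableSpec("wind", Kind.VECTOR, True, True, "m/s"))
catalog.register(VariableSpec("terrain_height", Kind.SCALAR, False, True, "m"))
```

A visualization routine or a parallel archetype could then ask the catalog how a variable is laid out instead of hard-coding that knowledge.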

Algorithm Modularity

The computational needs of air quality models are often concentrated on the solution of a few operators. For example, the most challenging numerical aspects of solving the atmospheric diffusion equation are the chemistry operator and the advection operator. The chemistry operator consists of solving a system of stiff nonlinear, coupled ordinary differential equations. It is well known that the main challenge presented by a chemistry solver is that of performance and robustness. The advection operator consists of solving a hyperbolic partial differential equation to account for the transport processes of pollutants in the atmosphere. The main challenge presented by the advection solver is that of accuracy. The PSE packages a number of algorithms in a modular fashion to allow the rapid prototyping of the numerical techniques to be used by the chemistry, transport and filtering algorithms of the air quality model.
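The modular packaging described above can be sketched as interchangeable operators applied in sequence over a time step. The example is a toy under stated assumptions: a one-dimensional periodic grid, a first-order upwind scheme standing in for the advection solver, and backward Euler on a single linear decay reaction standing in for the stiff chemistry solver. The function names are hypothetical and none of this is the CIT model's numerics.

```python
from typing import Callable, List

Operator = Callable[[List[float], float], List[float]]


def upwind_advection(speed: float, dx: float) -> Operator:
    """First-order upwind scheme for dc/dt + u dc/dx = 0, u > 0, periodic domain."""
    def step(c: List[float], dt: float) -> List[float]:
        cfl = speed * dt / dx
        # c[i - 1] with i = 0 wraps to the last cell, giving periodic boundaries.
        return [c[i] - cfl * (c[i] - c[i - 1]) for i in range(len(c))]
    return step


def implicit_decay(rate: float) -> Operator:
    """Backward Euler for dc/dt = -rate * c; stable for any dt, so usable when stiff."""
    def step(c: List[float], dt: float) -> List[float]:
        return [ci / (1.0 + rate * dt) for ci in c]
    return step


def advance(c: List[float], dt: float, operators: List[Operator]) -> List[float]:
    """Apply each operator in turn over one time step (sequential splitting)."""
    for op in operators:
        c = op(c, dt)
    return c


state = [0.0, 1.0, 0.0, 0.0]
pipeline = [upwind_advection(speed=1.0, dx=1.0), implicit_decay(rate=1.0)]
state = advance(state, dt=1.0, operators=pipeline)
```

Because every operator has the same signature, swapping in a more accurate advection scheme or a different chemistry integrator only changes one entry in `pipeline`.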

Analysis and Visualization

Air quality models produce large output data files. When the model output has been generated, the engineering and scientific analysis component of the work has only begun. We developed a visualization component within the PSE that manipulates both the input data and the data generated by the model. The user is allowed to interactively explore a variety of data representations. These options for analysis include the capability of displaying fixed-space and variable-time data (time series plots), variable-space and fixed-time data (contour plots) using a color-coded approach, as well as the animated display of model predictions (see Figure 3).

I/O Management

While air quality models are executing, results are written to disk. The data sets generated are rather large and, in most cases, unformatted. The objective of the I/O manager is to have a standard representation of the output data as it is moved into a heterogeneous network containing sequential, distributed and parallel architectures. The I/O manager also provides a common representation of data regardless of its location (or locations, in the case of a parallel file system).
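A standard representation that survives movement between machines usually means fixing the byte order and field sizes explicitly rather than dumping native memory. The following sketch shows one such record layout; the `AQM1` tag, the header fields and the function names are invented for illustration and do not describe the I/O manager's real format.

```python
import io
import struct
from typing import BinaryIO, List, Tuple

MAGIC = b"AQM1"  # hypothetical file tag, not a real format


def write_record(out: BinaryIO, time_step: int, shape: Tuple[int, int, int],
                 data: List[float]) -> None:
    nz, ny, nx = shape
    if len(data) != nz * ny * nx:
        raise ValueError("data length does not match shape")
    # ">" forces big-endian with standard sizes, independent of the host CPU.
    out.write(MAGIC)
    out.write(struct.pack(">4i", time_step, nz, ny, nx))
    out.write(struct.pack(f">{len(data)}d", *data))


def read_record(src: BinaryIO) -> Tuple[int, Tuple[int, int, int], List[float]]:
    if src.read(4) != MAGIC:
        raise ValueError("not a recognized record")
    time_step, nz, ny, nx = struct.unpack(">4i", src.read(16))
    count = nz * ny * nx
    data = list(struct.unpack(f">{count}d", src.read(8 * count)))
    return time_step, (nz, ny, nx), data


buffer = io.BytesIO()
write_record(buffer, 15, (1, 2, 2), [0.1, 0.2, 0.3, 0.4])
buffer.seek(0)
step, shape, values = read_record(buffer)
```

The reader and writer take any binary stream, so the same code would serve a local file, a network socket, or one stripe of a parallel file system.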

Initial and Boundary Conditions Manager

The initial conditions and boundary conditions of models are stored as objects in the PSE. In this manner, they are isolated from the kernel physics and chemistry of the air quality model. A researcher is able to quickly and interactively reconfigure and retest the simulations with different initial and/or boundary conditions without the need to rewrite code.
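The separation between conditions and kernel can be sketched with a toy one-dimensional diffusion step. The classes `InitialCondition` and `FixedBoundary`, the kernel `diffuse` and the coefficient 0.25 are all assumptions chosen for brevity; the point is only that the kernel never needs editing when the conditions change.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InitialCondition:
    profile: Callable[[int], float]   # cell index -> starting concentration

    def build(self, n_cells: int) -> List[float]:
        return [self.profile(i) for i in range(n_cells)]


@dataclass
class FixedBoundary:
    left: float
    right: float

    def apply(self, c: List[float]) -> None:
        c[0], c[-1] = self.left, self.right


def diffuse(c: List[float], alpha: float) -> List[float]:
    """Kernel: one explicit diffusion step on interior cells; knows nothing of IC/BC."""
    new = c[:]
    for i in range(1, len(c) - 1):
        new[i] = c[i] + alpha * (c[i - 1] - 2.0 * c[i] + c[i + 1])
    return new


def run(ic: InitialCondition, bc: FixedBoundary, n_cells: int, steps: int) -> List[float]:
    c = ic.build(n_cells)
    bc.apply(c)
    for _ in range(steps):
        c = diffuse(c, alpha=0.25)
        bc.apply(c)
    return c


clean_start = InitialCondition(profile=lambda i: 0.0)
polluted_inflow = run(clean_start, FixedBoundary(left=1.0, right=0.0), n_cells=5, steps=1)
clean_inflow = run(clean_start, FixedBoundary(left=0.0, right=0.0), n_cells=5, steps=1)
```

The two runs differ only in the boundary object passed in, mirroring how a researcher would retest a scenario without rewriting model code.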

Parallel Program Archetypes

Parallel computers have begun to enjoy wide usage in environmental applications. In many areas of environmental modeling, the use of parallel computers is no longer a luxury but a necessity. Nevertheless, porting serial models to parallel architectures has been regarded as more challenging than developing correct sequential software, due primarily to the fact that modelers may have to deal with nondeterminacy and multiple threads of execution. We have developed parallel program archetypes that abstract the parallel structure of the program into a skeleton; the user fleshes out the skeleton by providing sequential programs for the slots of the skeleton. Thus, users can (if they so choose) focus primarily on sequential programming issues, allowing the parallel archetype to take care of parallel features. For instance, the mesh-spectral archetype (9,10) used to parallelize the CIT air quality model can be used to parallelize any air-quality model or, for that matter, any model that follows a similar data flow dependency. The air-quality model was designed before we had completed development of the mesh-spectral archetype, but it fits the structure of the archetype.
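The skeleton-and-slot idea can be sketched as follows. This is not the mesh-spectral archetype or its interface; `grid_archetype`, `split` and `chemistry_slot` are hypothetical names, the 50% decay is made up, and Python threads merely stand in for the processors of a parallel machine. The skeleton owns decomposition, concurrent execution and gathering, while the user supplies purely sequential code for one block.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Block = List[float]


def split(cells: Sequence[float], n_blocks: int) -> List[Block]:
    """Partition cells into contiguous blocks of nearly equal size."""
    base, extra = divmod(len(cells), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(cells[start:start + size]))
        start += size
    return blocks


def grid_archetype(cells: Sequence[float], n_blocks: int,
                   local_step: Callable[[Block], Block]) -> List[float]:
    """Skeleton: decompose, run the user's sequential slot on each block, gather."""
    blocks = split(cells, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        # map preserves block order, so the gathered result is deterministic.
        results = list(pool.map(local_step, blocks))
    return [value for block in results for value in block]


def chemistry_slot(block: Block) -> Block:
    """User-supplied sequential code: here, a made-up 50% decay per cell."""
    return [0.5 * c for c in block]


updated = grid_archetype([10.0, 20.0, 30.0, 40.0, 50.0], n_blocks=2,
                         local_step=chemistry_slot)
```

A cell-by-cell operator such as chemistry fits this skeleton directly; an operator with neighbor dependencies, such as transport, would additionally need the skeleton to exchange boundary values between blocks.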

Some of the key questions that must be addressed in order to develop parallel archetypes are: What is the best way to map model processes to the computer processors? How can the best load balancing be achieved? When should one follow a task versus a data parallel paradigm? How should parallel archetypes be integrated with the work bench? The use of archetypes may not provide the performance obtained by optimizing message passing, but it simplifies the task of parallelization. This is analogous to the use of high-level languages: one may be able to obtain higher performance using assembly code, but the ease of programming justifies the reduced performance. In addition to the benefit of reducing the effort required to produce efficient and accurate concurrent environmental applications, parallel archetypes help in code portability between different runtime systems and machines.

THE PROBLEM SOLVING ENVIRONMENT IN THE CLASSROOM

The PSE for airshed modeling enables teachers, students, and concerned citizens to use parallel computing to study air pollution patterns and understand the consequences of public policy on pollution control. The PSE has been used in university courses this year and will be accessible to the K-12 classroom in the near future. The airshed model PSE is helping students understand and appreciate the relevance of science in their daily lives while taking advantage of parallel computation. PSEs allow students to deal with scientific problems without having to be concerned about parallelism.

The PSE has different goals for K-12 and college students. The primary goal for K-12 students is motivating the importance of science in their daily lives in an immediate and direct way: science and technology have an impact on every breath they take. Science can come alive with the help of tools that help students drive scenarios where they can understand the impact of temperature, wind patterns, automobiles, electric vehicles, and factories on the air they breathe. K-12 students will be guided by the PSE as they explore a variety of scenarios. The scenarios can be developed easily by instructors to guide the students through the material that is considered appropriate.

School children are probably going to use commodity uniprocessors, whereas college students are more likely to have access to parallel supercomputers. To provide the fast response needed to hold students' attention, the PSE for K-12 will use pre-calculated scenarios tailored to their needs. The data sets of the South Coast Air Basin of California, for example, require a rather large amount of disk space. Furthermore, since some schools might have lower-bandwidth access to the Internet than colleges, we are exploring putting the PSE on CD-ROMs for K-12. We started with a focus on college students, with a plan to extend the PSE to K-12, because: (i) we understood college students better, (ii) we could get feedback from our students, and (iii) the PSE is currently designed for research scientists who are closer to college students than to school students.

The goal for college students is to introduce them to environmental computational modeling and air pollution control. The PSE has been used to teach undergraduate and graduate students at the University of California, Irvine. Other colleges are considering its use for next year. A formative evaluation review form for college students has been developed to provide feedback about the use of the PSE in the classroom. Results indicate that, on average, 80% of the features of the PSE were useful in gaining a greater understanding of the dynamics of atmospheric pollutants. In particular, the single feature considered most instructive by the students is the ability to generate animated color contour plots of pollutant mixing ratios.

CONCLUSIONS

This paper describes an experiment on the development of a "machine tool" for PSEs. The hypothesis is that a robust, easy-to-use PSE abstract structure can be tailored to obtain PSEs for specific problems simply. The effort required to develop a robust problem-specific PSE is very high because the PSE has to help users explore the problem space no matter what they do; our hypothesis is that this effort can be amortized over several problem-specific PSEs by employing an abstraction. The problem-space abstraction that we use is the 3D+T+M^k archetype for 3D space, time and k parameters at each point. We expect to be able to reuse our abstraction to develop PSEs for problems that fit this structure.

Our experiment to test our hypothesis consists of developing a PSE archetypal structure for the 3D+T+M^k problem space and applying this structure to a specific problem. The first problem we chose was a simulation of an airshed. The abstract PSE structure was developed concurrently with the development of the airshed PSE with the airshed problem suggesting the design of the abstraction, and the design of the abstraction suggesting ideas for the airshed PSEs. The first version of a PSE for simulating the airshed for Southern California is now complete and will be used for courses in environmental engineering. The back-end of the PSE can be a sequential or parallel computer.

The first part of the experiment consists of evaluating the completeness of the PSE for the Southern California airshed; there is little point using an abstraction that cannot be used even for one specific problem. This part of the experiment is well under way and the results are promising. The target classes of users are (i) environmentally-concerned citizens with little knowledge of computation or chemistry, (ii) environmental scientists who are primarily concerned about environmental science and engineering but only secondarily concerned about computational details, and (iii) computational scientists including those primarily interested in parallel computation. The promising results have encouraged us to continue with the experiment. Of course, one of the most important benefits of the experiment is the Southern California airshed PSE itself. We encourage readers to evaluate the airshed PSE.

The remaining parts of the experiment are to formalize the 3D+T+M^k archetypal PSE, and to then evaluate the ease of applying the archetype to different problems, some similar to the airshed PSE and others that are very different. We have evaluated a sequence of such problems, and expect to develop problem-specific PSEs for other problems in the coming year.

ACKNOWLEDGEMENTS

We wish to thank Prof. Tom Hewett for making the evaluation review form available to us and for his valuable assistance in the project. The authors would like to thank the division of Advanced Scientific Computing of the National Science Foundation for providing support for the work reported here under grant CCR-9527130 and the National Science Foundation Center for Research in Parallel Computation. This research was performed in part using the Intel Paragon System at the Caltech Center for Advanced Computing Research.

REFERENCES

(1) Gallopoulos, E.; Houstis, E.N.; Rice, J.R. IEEE Comp. Sci. Engr. 1994 1, 11-23.
(2) Tesche, T.W. Environ. Int. 1983 9, 465-490.
(3) Seinfeld, J.H. J. Air Pollut. Control Assoc. 1988 38, 616-645.
(4) Dabdub, D.; Seinfeld, J.H. Parallel Computing 1996 22, 111-130.
(5) Langtangen, H.P. Diffpack: Software for partial differential equations http://www.oslo.sintef.no/diffpack/dplibrary.html
(6) Weerawarana, S.; Houstis, E.N.; Rice, J.R.; Catlin, A.C.; Crabill, C.L.; Chui, C.C.; Marcus, S. PDELab: An Object-Oriented Framework for Building Problem Solving Environments for PDE Based Applications, Technical Report CSD-TR-94-021, Department of Computer Sciences, Purdue University, 1994.
(7) Fraga, E.S.; McKinnon, K.I.M. CHiPS: A Process Synthesis Package, Technical Report 1993-06, Department of Chemical Engineering, Edinburgh University, 1993.
(8) Ousterhout, J.K. Tcl and the Tk Toolkit; Addison-Wesley Professional Computing: Massachusetts, 1994.
(9) Chandy, K.M.; Manohar, R.; Massingill, B.L.; Meiron, D.I. Integrating Task and Data Parallelism with the Group Communication Archetype, International Parallel Processing Symposium, 1995.
(10) Chandy, K.M. Concurrent Program Archetypes, Proceedings of the Scalable Parallel Library Conference, 1994.

Figure 1: CPU time for a 24-hour simulation of the South Coast Air Basin using a parallel version of the CIT model on various parallel architectures.

Figure 2: Modules of the Problem Solving Environment for air quality models.

Figure 3: Snapshot from the visualization module of the Problem Solving Environment for the CIT air quality model. This view shows the concentration of ozone in the South Coast Air Basin of California at 15:00 hours for August 27, 1987.