PSEs encapsulate computational details such as parallel machine
architecture, programming language and algorithms, leaving the
user free to focus attention on computational experiments. PSEs
that are tailored to a specific class of users encapsulate more
of the computational details germane to those users. Narrowly tailored
PSEs are more useful to the target class of user, but developing
PSEs for many different classes of users is expensive. This paper
reports on an experiment to deal with the specificity-generality
dimension of PSEs by using archetypal PSEs. The experiment consists
of developing a collection of PSEs for a class of problems all
of which deal with 3 dimensions, time, and a collection of attributes
of points in the space-time domain. We start with PSEs that deal
with air-quality models and then progress to application areas
that get increasingly remote from air quality and the environment.
Another dimension we explore is that of user sophistication: we
start with experts in air-quality models and then progress to
college students, experts in areas such as public policy who are
not environmental engineers, and then to high-school students.
Our experiments suggest that developing collections of related
PSEs, or archetypal PSEs, can be helpful in reducing some of the
effort required to develop and maintain user-specific PSEs.
Computers in general, and parallel computers in particular, can be powerful tools for scientists, engineers and managers if the users can focus attention on their specific problems rather than on general computational issues. One way to package a computational solution to a problem is to enclose it within a problem-solving environment (PSE) that is specific to the problem of interest. A PSE is a set of tools and methodologies designed to formulate the problem, to solve the problem and to analyze the results, all in a user-friendly environment that is natural to the problem domain.
As stated by Gallopoulos et al. (1), ideal PSEs provide "a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science."
In particular, a PSE designed specifically for air quality modeling
allows environmental scientists, public policy planners, interested
citizens and students to explore the dynamics of photochemical
smog. A PSE allows them to focus their attention on issues of
interest to them such as sources of pollution, chemistry, and
wind and temperature patterns, while paying little attention to
the machine (parallel architecture, message-passing or shared-memory
system, programming languages, graphics packages) on which their
Diversity of Interests
The people who use a computer as a simulation engine may want
to exploit different levels of detail of the simulator. Undergraduate
students who use an airshed model are primarily interested in
the interactions between types and quantity of pollutants on the
one hand and domain factors such as wind, topography and temperature
on the other; they are not interested in manipulating the simulation
program itself. Environmental engineers are interested in the
issues in which undergraduate students are interested, and in
addition, want to investigate algorithmic issues and study different
simulation scenarios. Others may want to port simulations to
different parallel computers. A question that we are investigating
is as follows: Is it possible to design a problem solving environment
(PSE) that helps users exploit a simulation engine to the level
of detail of concern to them?
A related issue is that of specificity versus generality of the
PSE. A PSE that is a precise fit for a person's needs is easier
to use than a more general one that has to be tailored. An air
quality model (AQM) of Southern California is a member of a family
of fluid dynamics models. The application has a high degree of
specificity. The problem with high degrees of specificity is
that large numbers of PSEs are required to fit the many different
specific requirements precisely. Our challenge is to create PSE-development
methods that help in producing PSEs that can be tailored to a
high degree of specificity with relatively little effort.
Composability of PSEs
PSEs can support collaborations between people with different
interests in solving a common problem. For instance, chemists,
environmental engineers, health professionals, business leaders
and public policy experts may collaborate on making decisions
about an emission-control strategy in Los Angeles County. These
people may use different tools corresponding to their specific
interests; all these tools have to be linked into a collaborative
PSE. A question that we are working on is: How can different kinds
of PSE tools (air quality models, spreadsheets, automobile traffic
models) be composed to form an integrated environment?
Performance Tuning for Target Computer Architectures
High-performance simulations must be tuned to perform well on
target machines. Some users do not want
to have anything to do with tuning simulations, and they accept
the best performance that automatic tuning can provide. Others
are willing to provide some information to the runtime mechanisms
to help them tune the application to fit the target machine, especially
when the machine is parallel. Our challenge is to design a PSE
through which the user can provide information for tuning the
performance of their simulations on specific machines.
Exploiting Internet Technology
Internet technology can be exploited in several ways including
(i) enabling users anywhere with Internet access to use the
PSE remotely, or to download the PSE, and (ii) helping users get
access to appropriate machines, particularly supercomputers, to
execute their simulations. The focus of our project, so far, has
been on the first issue. We are specifically interested in giving
concerned citizens all over the country access to models that
they can use to understand their natural environment.
Summary of Scientific Issues
In summary, the scientific questions addressed by our research are:
Our experiment consists of (i) planning a PSE archetype for a related class of problems, (ii) tailoring the archetype for specific applications and users with a specific level of sophistication in the application area, (iii) getting feedback from users, and (iv) evaluating the costs and benefits of using PSE archetypes for developing a class of related PSEs. Next, we give an overview of the experiment and discuss the experiment in terms of the questions raised in the last section.
The problem domains from the most specific to the least specific
are as follows:
Our experiment is designed to test the extensibility of the PSE
The classes of users that we have targeted are as follows. First, citizens who are concerned about their environment but who may not be scientists; an even more specific group within this class is high-school students in Southern California interested in the environment. These people would use the PSE from remote sites via the Internet. Second, environmental scientists and researchers at the graduate school level and up; a more specific group within this class is researchers who specialize in computer simulations of the environment.
CURRENT STATUS OF AIR QUALITY MODELING
Development of Air Quality Models
Mathematical models used to study the dynamics of photochemical
air pollution were first developed in the early 1970s. There
have been comprehensive research efforts in the identification,
formulation and numerical solution of the main physical and chemical
processes associated with ozone production. Tesche (2) and Seinfeld
(3) describe the development and applications of urban air quality
models. A topic of current research in model development is that
of incorporating the aerosol phase into existing air quality models.
The heavy computational demands imposed by the aerosol computations
provide one of the driving forces motivating the use of parallel computers.
Parallelization of Air Quality Models
Considerable research effort has been devoted to domain-decomposition
strategies that implement air-quality models on parallel supercomputers.
Results from previous research indicate that parallel implementations
of the chemistry operator, transport operator, and I/O routines
are all required to obtain the highest speed-ups. A typical 24-hour
run to simulate gas-phase pollutant dynamics on the South Coast
Air Basin of California using the California Institute of Technology
(CIT) model requires less than 7 minutes on the Intel Paragon
with 128 nodes (4). Figure 1 shows the performance of the parallel
implementation of the CIT model on different architectures. Differences
in performance are due to the different processor and network
speeds of each parallel computer. The PSE developed in this work
can use both the sequential and parallel implementations of the CIT model.
Problem Solving Environments for Air Quality Modeling
Much of the research on PSEs has dealt with environments for computational
mathematics in differential equations and linear algebra. For
instance, Langtangen (5) and Weerawarana et al. (6) present
a PSE for differential equations and a set of tools to develop
PSEs, respectively. There has been some research to develop problem
solving environments that are tailored to specific applications.
For instance, Fraga and McKinnon (7) have developed a PSE for
the automated synthesis of chemical process flowsheets. However,
there has been little research on, or development of, PSEs designed
for air quality modeling.
DESIGN TOOLS AND METHODOLOGIES
The PSE uses the CIT air-quality model (AQM) as the first
environmental application. Other typical urban- and regional-scale
AQMs have a structure similar to that of the CIT model. Thus
there is no loss of generality in considering this model as a
test case. The tool used to develop the PSE is Tcl/Tk. Tcl,
pronounced tickle, stands for "tool command language."
Tcl is actually two things: a language and a library (8).
First, Tcl is a simple textual language intended primarily for
issuing commands to interactive programs. Second, Tcl is a library
package that we embed in our model. The Tcl library consists
of a parser for the Tcl language, routines to implement the Tcl
built-in commands, and procedures that allow each application
to extend Tcl with additional commands specific to that application.
There are various advantages to using Tcl to develop the PSE.
First, Tcl provides a standard syntax: once users know Tcl,
they will be able to issue commands easily to any Tcl-based application.
Second, Tcl provides programmability. All a Tcl application
needs to do is to implement a few application-specific low-level
commands. Tcl provides many utility commands plus a general programming
interface for building up complex command procedures. By using
Tcl, applications need not re-implement these features. Third,
extensions to Tcl, such as the Tk toolkit, provide mechanisms
for communicating between applications by sending Tcl commands
back and forth. The common Tcl language framework makes it easier
for applications to communicate with one another. Fourth, Tcl
is available free of charge. Fifth, it runs on a wide variety
of platforms. Sixth, Tcl provides the capability to interact
with popular World Wide Web browsers to exploit Internet technology.
The central abstraction of the PSE for air quality models deals
with space, time and a collection of model data. We call this
problem domain the 3D+T+Mk domain, where there are
three dimensions of space (3D), one dimension of time (T) and
k dimensions inherent to the model. The 3D+T+Mk abstraction
is used to construct parallel program archetypes, navigate through
input and output data, and manage I/O. For example, the time
series plot for ozone concentration at a given monitoring station
or an animated display of the isopleth of a given pollutant are
different projections of the 3D+T+Mk space. By using
a higher level of abstraction in the design of the PSE, the main
structure of the code can be reused to develop a PSE for other
air quality models or other environmental applications. A focus
of our research is the evaluation of reuse of the 3D+T+Mk abstraction.
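The idea that a time series and a contour slice are two projections of the same space can be sketched in a few lines. The following is a minimal illustration only, not the PSE's actual data structure: the class name `SpaceTimeField`, its methods, and the synthetic ozone values are all assumptions introduced here.

```python
from itertools import product


class SpaceTimeField:
    """Values of k named attributes at every point of a 3D grid over time.

    Hypothetical container for illustration; not the PSE's real data model.
    """

    def __init__(self, nx, ny, nz, nt, attributes):
        self.shape = (nx, ny, nz, nt)
        self.attributes = tuple(attributes)
        size = nx * ny * nz * nt
        # One flat list per attribute, addressed by (x, y, z, t).
        self._data = {name: [0.0] * size for name in self.attributes}

    def _index(self, x, y, z, t):
        nx, ny, nz, nt = self.shape
        return ((x * ny + y) * nz + z) * nt + t

    def set(self, name, x, y, z, t, value):
        self._data[name][self._index(x, y, z, t)] = value

    def get(self, name, x, y, z, t):
        return self._data[name][self._index(x, y, z, t)]

    def time_series(self, name, x, y, z):
        """Fix a point in space and vary time (e.g. ozone at one station)."""
        nt = self.shape[3]
        return [self.get(name, x, y, z, t) for t in range(nt)]

    def horizontal_slice(self, name, z, t):
        """Fix level and time and vary x, y (the data behind a contour plot)."""
        nx, ny = self.shape[0], self.shape[1]
        return [[self.get(name, x, y, z, t) for y in range(ny)]
                for x in range(nx)]


field = SpaceTimeField(nx=3, ny=2, nz=1, nt=4,
                       attributes=["ozone", "temperature"])
for x, y, z, t in product(range(3), range(2), range(1), range(4)):
    field.set("ozone", x, y, z, t, 10.0 * t + x + y)  # synthetic values only

series = field.time_series("ozone", x=1, y=1, z=0)
contour_data = field.horizontal_slice("ozone", z=0, t=2)
```

Both views read the same stored values; only the choice of which indices are held fixed differs, which is why code written against such an interface does not depend on the particular model that produced the data.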
COMPONENTS OF THE PROBLEM SOLVING ENVIRONMENT
The main components of the PSE are shown in Figure 2.
This section discusses each of the modules already incorporated
and comments on the need for parallel program archetypes that have
not yet been implemented. We designed the following components
to be reusable.
Air quality models require large sets of input data: variables that
depend on time, on space, or on both. In addition, the data
may be scalar (like temperature fields) or vector (like
wind fields). We developed a module within the PSE that describes,
extracts, and integrates the large meteorological and/or emission
inventory databases required by environmental models. The standardization
of data models is crucial for the development of reusable object-model-based
libraries such as parallel archetypes or visualization routines.
In addition, the conceptual data model eases the design of communication
channels among all objects within the PSE.
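One way to picture such a standardized data model is a small descriptor that records, for each input variable, whether it is scalar or vector and whether it varies in space and in time. This is a sketch under stated assumptions: `VariableSpec`, its fields, and the example variables (including terrain height) are hypothetical and are not taken from the PSE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical description of one model input variable."""
    name: str
    units: str
    components: int        # 1 for a scalar field, 2 or 3 for a vector field
    varies_in_space: bool
    varies_in_time: bool

    @property
    def kind(self):
        return "scalar" if self.components == 1 else "vector"

    def values_required(self, n_cells, n_steps):
        """Number of stored values needed for a grid and a time range."""
        cells = n_cells if self.varies_in_space else 1
        steps = n_steps if self.varies_in_time else 1
        return cells * steps * self.components


temperature = VariableSpec("temperature", "K", 1, True, True)
wind = VariableSpec("wind", "m/s", 3, True, True)
terrain = VariableSpec("terrain_height", "m", 1, True, False)

sizes = {v.name: v.values_required(n_cells=1000, n_steps=24)
         for v in (temperature, wind, terrain)}
```

Because every variable is described in the same terms, a visualization routine or a parallel archetype can ask the descriptor how the data is laid out instead of hard-coding knowledge about a particular database.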
The computational needs of air quality models are often concentrated
on the solution of a few operators. For example, the most challenging
numerical aspects of solving the atmospheric diffusion equation
are the chemistry operator and the advection operator. The chemistry
operator consists of solving a system of stiff nonlinear, coupled
ordinary differential equations. The main challenges presented
by a chemistry solver are performance and robustness.
The advection operator consists of solving a
hyperbolic partial differential equation to account for the transport
processes of pollutants in the atmosphere. The main challenge
presented by the advection solver is that of accuracy. The PSE
packages a number of algorithms in a modular fashion to allow
the rapid prototyping of the numerical techniques to be used by
the chemistry, transport and filtering algorithms of the air quality model.
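A toy example may help show why robustness matters for the chemistry step and what modular packaging buys. The sketch below is an assumption-laden illustration, not the CIT model's numerics: chemistry is reduced to a single linear decay term (the real operator is nonlinear and coupled), advection uses a first-order upwind step on a periodic 1D grid, and the registry `CHEMISTRY_SOLVERS` and all function names are hypothetical.

```python
def explicit_euler_decay(c, rate, dt):
    """Forward Euler for dc/dt = -rate * c; unstable when rate * dt > 2."""
    return c * (1.0 - rate * dt)


def implicit_euler_decay(c, rate, dt):
    """Backward Euler for dc/dt = -rate * c; stable for any positive dt."""
    return c / (1.0 + rate * dt)


def upwind_advection(cells, courant):
    """First-order upwind step, wind in the +x direction, periodic grid.

    Requires 0 <= courant <= 1. cells[i - 1] wraps around at i == 0.
    """
    return [cells[i] - courant * (cells[i] - cells[i - 1])
            for i in range(len(cells))]


# Swappable chemistry solvers, selected by name (hypothetical registry).
CHEMISTRY_SOLVERS = {
    "explicit": explicit_euler_decay,
    "implicit": implicit_euler_decay,
}


def run(cells, steps, chemistry, rate, dt, courant):
    """Operator splitting: one transport step, then one chemistry step."""
    solve = CHEMISTRY_SOLVERS[chemistry]
    for _ in range(steps):
        cells = upwind_advection(cells, courant)
        cells = [solve(c, rate, dt) for c in cells]
    return cells


initial = [0.0, 1.0, 0.0, 0.0]
# rate * dt = 5, so the decay is stiff relative to the time step.
stable = run(initial, steps=20, chemistry="implicit",
             rate=50.0, dt=0.1, courant=0.5)
unstable = run(initial, steps=20, chemistry="explicit",
               rate=50.0, dt=0.1, courant=0.5)
```

With the same time step, the implicit variant keeps concentrations bounded while the explicit one grows without limit, and switching between them changes only the `chemistry` argument. The first-order upwind step is deliberately simple; it smears sharp gradients, which is the accuracy problem the text attributes to advection solvers.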
Analysis and Visualization
Air quality models produce large output data files. When the
model output has been generated, the engineering and scientific
analysis component of the work has only begun. We developed a
visualization component within the PSE that manipulates both the
input data and the data generated by the model. The user can
interactively explore a variety of data representations. These
options for analysis include the capability of displaying fixed-space
and variable-time data (time series plots), variable-space and
fixed-time data (contour plots) using a color-coded approach,
as well as the animated display of model predictions (see Figure 3).
While air quality models are executing, results are written to
disk. The data sets generated are rather large and, in most cases,
unformatted. The objective of the I/O manager is to have a standard
representation of the output data as it is moved into a heterogeneous
network containing sequential, distributed and parallel architectures.
The I/O manager also provides a common representation of data
regardless of its location (or locations, in the case of a parallel
Initial and Boundary Conditions Manager
The initial conditions and boundary conditions of models are stored as objects in the PSE.
In this manner, they are isolated from the kernel physics and
chemistry of the air quality model. A researcher is able to quickly
and interactively reconfigure and retest the simulations with
different initial and/or boundary conditions without the need
to rewrite code.
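The separation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PSE's implementation: `InitialCondition`, `InflowBoundary`, and the toy one-dimensional kernel `simulate` are hypothetical names, and the concentrations are synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialCondition:
    """Hypothetical object: uniform concentration at t = 0."""
    background: float

    def apply(self, n_cells):
        return [self.background] * n_cells


@dataclass(frozen=True)
class InflowBoundary:
    """Hypothetical object: concentration entering at the upwind edge."""
    upwind_value: float


def simulate(initial, boundary, n_cells, steps, courant=1.0):
    """Toy 1D transport kernel that knows nothing about any scenario."""
    cells = initial.apply(n_cells)
    for _ in range(steps):
        # Upwind neighbor of each cell; the boundary feeds the first cell.
        previous = [boundary.upwind_value] + cells[:-1]
        cells = [c - courant * (c - p) for c, p in zip(cells, previous)]
    return cells


clean_inflow = simulate(InitialCondition(40.0), InflowBoundary(0.0),
                        n_cells=4, steps=2)
polluted_inflow = simulate(InitialCondition(40.0), InflowBoundary(80.0),
                           n_cells=4, steps=2)
```

The two runs differ only in the objects passed in; the kernel itself is untouched, which is the property that lets a researcher retest a simulation under different conditions without rewriting code.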
Parallel Program Archetypes
Parallel computers have begun to enjoy wide usage in environmental applications. In many areas of environmental modeling, the use of parallel computers is no longer a luxury but a necessity. Nevertheless, porting serial models to parallel architectures has been regarded as more challenging than developing correct sequential software, primarily because modelers may have to deal with nondeterminacy and multiple threads of execution. We have developed parallel program archetypes that abstract the parallel structure of the program into a skeleton; the user fleshes out the skeleton by providing sequential programs for the slots of the skeleton. Thus, users can (if they so choose) focus primarily on sequential programming issues, allowing the parallel archetype to take care of parallel features. For instance, the mesh-spectral archetype (9,10) used to parallelize the CIT air-quality model can be used to parallelize any air-quality model or, for that matter, any model that follows a similar data-flow dependency. The air-quality model was designed before we had completed development of the mesh-spectral archetype, but it fits the structure of the archetype.
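The skeleton-and-slot idea can be illustrated with a deliberately small sketch. It is not the mesh-spectral archetype of (9,10): it covers only a grid operation with no dependence between cells, it omits boundary exchange and reductions, and it uses a Python thread pool as a stand-in for the message-passing or shared-memory runtime a real archetype would target. The names `mesh_archetype` and `halve_concentrations` are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def mesh_archetype(cells, local_step, n_workers=4):
    """Skeleton: partition the grid, run the user's slot on each block,
    and reassemble the results in the original order."""
    if not cells:
        return []
    n_workers = max(1, min(n_workers, len(cells)))
    chunk = -(-len(cells) // n_workers)  # ceiling division
    blocks = [cells[i:i + chunk] for i in range(0, len(cells), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the order of its inputs.
        results = list(pool.map(local_step, blocks))
    return [value for block in results for value in block]


def halve_concentrations(block):
    """User-supplied slot: purely sequential code over one block of cells."""
    return [0.5 * c for c in block]


grid = [float(i) for i in range(10)]
updated = mesh_archetype(grid, halve_concentrations, n_workers=3)
```

The user writes only `halve_concentrations`, which contains no parallel constructs; partitioning, scheduling, and reassembly live in the skeleton and could be replaced with a different runtime without touching the user's code.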
Some of the key questions that must be addressed in order to develop
parallel archetypes are: What is the best way to map model processes
to the computer processors? How can the best load balancing be
achieved? When should one follow a task-parallel versus a data-parallel
paradigm? How should parallel archetypes be integrated with the
workbench? The use of archetypes may not provide the performance
obtained by optimizing message passing, but it simplifies the
task of parallelization. This is analogous to the use of high-level
languages: one may be able to obtain higher performance using
assembly code, but the ease of programming justifies the reduced
performance. In addition to the benefit of reducing the effort
required to produce efficient and accurate concurrent environmental
applications, parallel archetypes help in code portability between
different runtime systems and machines.
THE PROBLEM SOLVING ENVIRONMENT IN THE CLASSROOM
The PSE for airshed modeling enables teachers, students, and concerned citizens to use parallel computing to study air pollution patterns and understand the consequences of public policy on pollution control. The PSE has been used in university courses this year and will be accessible to the K-12 classroom in the near future. The airshed model PSE is helping students understand and appreciate the relevance of science in their daily lives while taking advantage of parallel computation. PSEs allow students to deal with scientific problems without having to be concerned about parallelism.
The PSE has different goals for K-12 and college students. The primary goal for K-12 students is to convey the importance of science in their daily lives in an immediate and direct way: science and technology have an impact on every breath they take. Science can come alive with the help of tools that let students drive scenarios in which they can understand the impact of temperature, wind patterns, automobiles, electric vehicles, and factories on the air they breathe. The PSE will guide K-12 students step by step as they explore a variety of scenarios. The scenarios can be developed easily by instructors to guide the students through the material that is considered appropriate.
School children are probably going to use commodity uniprocessors, whereas college students are more likely to have access to parallel supercomputers. To provide the fast response needed to hold students' attention, the PSE for K-12 will use pre-calculated scenarios tailored to their needs. The data sets of the South Coast Air Basin of California, for example, require a rather large amount of disk space. Furthermore, since some schools might have lower-bandwidth access to the Internet than colleges, we are exploring putting the PSE on CD-ROMs for K-12. We started with a focus on college students, with a plan to extend the PSE to K-12, because: (i) we understood college students better, (ii) we could get feedback from our students, and (iii) the PSE is currently designed for research scientists, who are closer to college students than to school students.
The goal for college students is to introduce them to environmental
computational modeling and air pollution control. The PSE has
been used to teach undergraduate and graduate students at the
University of California, Irvine. Other colleges are considering
its use for next year. A formative evaluation review form for
college students has been developed to provide feedback about
the use of the PSE in the classroom. Results indicate that, on
average, 80% of the features of the PSE were useful in gaining a greater
understanding of the dynamics of atmospheric pollutants. In particular,
the single feature considered most instructive by the
students is the ability to generate animated color contour plots
of pollutant mixing ratios.
This paper describes an experiment on the development of a "machine tool" for PSEs. The hypothesis is that a robust, easy-to-use PSE abstract structure can be tailored to obtain PSEs for specific problems simply. The effort required to develop a robust problem-specific PSE is very high because the PSE has to help users explore the problem space no matter what they do; our hypothesis is that this effort can be amortized over several problem-specific PSEs by employing an abstraction. The problem-space abstraction that we use is the 3D+T+Mk archetype for 3D space, time and k parameters at each point. We expect to be able to reuse our abstraction to develop PSEs for problems that fit this structure.
Our experiment to test our hypothesis consists of developing a PSE archetypal structure for the 3D+T+Mk problem space and applying this structure to a specific problem. The first problem we chose was a simulation of an airshed. The abstract PSE structure was developed concurrently with the airshed PSE, with the airshed problem suggesting the design of the abstraction, and the design of the abstraction suggesting ideas for the airshed PSE. The first version of a PSE for simulating the airshed for Southern California is now complete and will be used for courses in environmental engineering. The back-end of the PSE can be a sequential or parallel computer.
The first part of the experiment consists of evaluating the completeness of the PSE for the Southern California airshed; there is little point in using an abstraction that cannot be used even for one specific problem. This part of the experiment is well under way and the results are promising. The target classes of users are (i) environmentally concerned citizens with little knowledge of computation or chemistry, (ii) environmental scientists who are primarily concerned about environmental science and engineering but only secondarily concerned about computational details, and (iii) computational scientists including those primarily interested in parallel computation. The promising results have encouraged us to continue with the experiment. Of course, one of the most important benefits of the experiment is the Southern California airshed PSE itself. We encourage readers to evaluate the airshed PSE.
The remaining parts of the experiment are to formalize the 3D+T+Mk archetypal PSE, and to then evaluate the ease of applying the archetype to different problems, some similar to the airshed PSE and others that are very different. We have evaluated a sequence of such problems, and expect to develop problem-specific PSEs for other problems in the coming year.
We wish to thank Prof. Tom Hewett for making the evaluation review
form available to us and for his valuable assistance in the project.
The authors would like to thank the division of Advanced Scientific
Computing of the National Science Foundation for providing support
for the work reported here under grant CCR-9527130 and the National
Science Foundation Center for Research in Parallel Computation.
This research was performed in part using the Intel Paragon System
at the Caltech Center for Advanced Computing Research.
Figure 1: CPU time for a 24-hour simulation of the South Coast Air Basin using a parallel version of
the CIT model on various parallel architectures.
Figure 2: Modules of the Problem Solving Environment for
air quality models.
Figure 3: Snapshot from the visualization module of the Problem Solving Environment for the CIT
air quality model. This view shows the concentration of ozone
in the South Coast Air Basin of California at 15:00 hours for
August 27, 1987.