Development of an Earth System Model (ESM):
Atmosphere/Ocean Dynamics and Tracers Chemistry
CAN-NCCS5-149 Ocean General Circulation Model (OGCM)

Yi Chao, Benny N. Cheng and Ping Wang
Jet Propulsion Laboratory, Pasadena, CA

Abstract

    The ocean general circulation model (OGCM) is one of the four component modules of the Earth System Model (ESM) being developed as part of the National Aeronautics and Space Administration (NASA)'s High-Performance Computing and Communications (HPCC) grand challenge applications. Here we discuss progress in the parallel implementation and optimization of the OGCM. This document serves as a combined users' guide and reference manual, and is submitted in compliance with the milestone requirements of contract CAN-NCCS5-149. It includes descriptions of the model equations and solution algorithm, together with instructions on how to obtain the OGCM, how to compile the code with various options, how to run the code, and how to perform diagnostic tests and timings.

Introduction

    Ocean modeling plays an important role both in understanding current climatic conditions and in predicting future climate change. In situ oceanographic instruments provide only sparse measurements over the world ocean. Remotely sensed satellite data cover the world ocean, but they provide information only at the ocean surface. Information below the ocean surface has to be obtained from 3-dimensional ocean general circulation models (OGCMs).

Ocean General Circulation Model

    The OGCM is based on the Parallel Ocean Program (POP) developed at Los Alamos National Laboratory (Smith et al., 1992). This ocean model evolved from the Bryan-Cox 3-dimensional primitive-equation ocean model (Bryan, 1969; Cox, 1984) developed at the NOAA Geophysical Fluid Dynamics Laboratory (GFDL), later known as the Semtner and Chervin (1988; 1992) model or the Modular Ocean Model (MOM; Pacanowski et al., 1992). The model solves the 3-dimensional primitive equations using a finite-difference technique.
    The equations are separated into barotropic (the vertical mean) and baroclinic (departures from the vertical mean) components. The barotropic equations isolate the fast surface gravity waves contained in the system, while the baroclinic equations contain the slower internal gravity waves. The baroclinic component is 3-dimensional, uses explicit leapfrog time stepping, and parallelizes very well on massively parallel computers. The barotropic component is 2-dimensional and is solved implicitly. It differs from the original Bryan-Cox formulation in that it removes the rigid-lid approximation and treats the sea surface height as a prognostic variable (i.e., a free surface). The free-surface model is superior to the rigid-lid model because it provides a more accurate solution to the governing equations; more importantly, it tremendously reduces the global communication otherwise required by the rigid-lid model.
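    As a sketch of this splitting (written in standard notation rather than POP's internal variables; see Smith et al., 1992, for the full formulation), the barotropic component of the horizontal velocity $\mathbf{u}$ over an ocean of depth $H$ is the vertical mean, and the baroclinic component is the departure from it:

$$\mathbf{U} = \frac{1}{H}\int_{-H}^{0}\mathbf{u}\,dz, \qquad \mathbf{u}' = \mathbf{u} - \mathbf{U}.$$

The free surface enters through the sea surface height $\eta$, carried as a prognostic variable by the vertically integrated continuity equation:

$$\frac{\partial \eta}{\partial t} + \nabla\cdot\left(H\,\mathbf{U}\right) = 0.$$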

Optimization

    The original POP code was developed in FORTRAN 90 on the Los Alamos CM-2 Connection Machine. During the first half of 1994, the code was ported to the T3D by Cray Research using SHMEM-based message passing. Since the code on the T3D was still time-consuming for large problems, improving its performance was required. In order to significantly reduce the wallclock time, the code was optimized using single-PE optimization techniques and other strategies documented in our previous technical reports I and II.

Performance

    A test problem was chosen with a local grid size of 37 x 34 x 60 cells. Timings were run for machine sizes from 1 to 256 processors, corresponding to a global grid of up to 592 x 544 x 60. The POP code decomposes the grid into blocks in both x and y, and all z data for a given (x, y) column are local to one processor. All results shown in this section refer to scaled-size problems, where the problem size per processor is fixed and the number of processors is varied.
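To make the decomposition concrete, the following sketch (with hypothetical names; the actual POP decomposition routines differ) maps each processor in a 16 x 16 logical processor grid to the global indices of its 37 x 34 block, with all 60 depth levels held locally:

      program decomp_sketch
c     Sketch of POP's 2-D block decomposition for the timing runs:
c     a 37 x 34 (x 60) local block on a 16 x 16 processor grid
c     yields the 592 x 544 x 60 global grid on 256 processors.
      integer imt_loc, jmt_loc, npx, npy
      parameter (imt_loc = 37, jmt_loc = 34, npx = 16, npy = 16)
      integer ipe, jpe, ig0, jg0
      do jpe = 0, npy - 1
         do ipe = 0, npx - 1
c           First global (x,y) indices owned by processor (ipe,jpe);
c           every depth level k = 1,...,60 of these columns is local.
            ig0 = ipe*imt_loc + 1
            jg0 = jpe*jmt_loc + 1
            write(*,*) ipe, jpe, ig0, jg0
         enddo
      enddo
      end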
Figure 1 shows the run time (in wall clock time) per time step versus the number of processors, for both the original code and the optimized code. The single-processor code has been improved by 43%, and the improvement due to optimization grows as the number of processors increases, reaching 59% on 256 processors.
    It is clear from Figure 1 that the code's scaling has also been improved. This improvement is quantified in Figure 2, which shows the speed-up achieved versus the number of processors used in the calculation; for an ideal parallel code, these two numbers would be equal. The optimized code performs quite well, running 250 times faster than the single-processor code on 256 processors. The original code, however, clearly had scaling problems: its speed-up on 256 processors is only a factor of 182.
    Figure 3 shows the actual performance achieved in terms of computation rate. The T3D processor has a clock period of 6.6 nanoseconds, corresponding to a clock rate of 151.5 MHz. Since the processor can complete one floating-point result per clock period, this is equivalent to a peak floating-point performance of 151.5 MFLOP/s per processor; for 256 processors, the peak performance is 38.8 GFLOP/s.

Software Package

    POP (Parallel Ocean Program) is a software package for modeling the global ocean circulation on Cray T3D massively parallel computers. The software is freely and publicly available from the authors. We request that any images produced by the software contain the following acknowledgement: "This work was conducted at the Jet Propulsion Laboratory (JPL), California Institute of Technology, under contract with the National Aeronautics and Space Administration (NASA)." For further information about the POP code, see the POP web pages at http://lochness.jpl.nasa.gov. Any questions about this code distribution can be addressed to:

Yi Chao or Benny Cheng
M/S 300-323
NASA-JPL
4800 Oak Grove Dr.
Pasadena, CA 91109

Obtaining the OGCM source code

    The OGCM code can be found at the anonymous ftp site at yosemite.atmos.ucla.edu (enter anonymous as your username). Once you are in, cd to /pub/hpcc. The compressed tar file is POP.tar.Z. For further information, please contact Yi Chao at yc@pacific.jpl.nasa.gov.
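A typical anonymous-ftp session looks like the following (be sure to transfer the file in binary mode):

    ftp yosemite.atmos.ucla.edu
    Name: anonymous
    Password: (your e-mail address)
    ftp> cd /pub/hpcc
    ftp> binary
    ftp> get POP.tar.Z
    ftp> quit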

About the source code

    POP has been compiled successfully for the Cray T3D, running UNICOS 9.0.2.4, on up to 1024 processors in parallel. The following compilers must be present on the system: cft77, cc, and cam. The input data for initializing the model are in the file initial.F, which can be modified to suit your own purposes. Other initialization data are in the set_*.F files, and the size of the grid can be altered in params.H.
The basic structure of the program, showing its most important subroutines, is outlined below:

main:
*call initial ! initialization routine
*do itt = 1,nlast
call step ! one ocean timestep
enddo
step:
*call time_manager
*set up for leapfrog or matsuno step
*call baroclinic ! explicit solution of baroclinic eqns.
*call barotropic ! implicit solution of barotropic eqns.
*update variables for next step
baroclinic:
*do k = 1,km ! loop over depth levels
call adv_flux ! advection flux velocities
call vmix ! vertical mixing coefficients
call clinic ! r.h.s. of momentum eqns.
call tracer ! tracer transport eqns.
enddo
*implicit treatment of coriolis and diffusive terms
*update velocities and tracers at new time (UA,VA,TA)
*convective adjustment
barotropic:
*set up operator and r.h.s. of elliptic equation
*call (pcg,cgr,jacobi) ! elliptic solvers
*update surface pressure and barotropic velocities
clinic:
*call advu ! momentum advection
*coriolis terms
*call gradp ! hydrostatic pressure gradient
*call hdiffu ! horizontal diffusion of momentum
*call vdiffu ! vertical diffusion of momentum
tracer:
*call advt ! tracer advection
*call hdifft ! horizontal diffusion of tracers
*call vdifft ! vertical diffusion of tracers
*call tsource ! source terms
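
For concreteness, the explicit leapfrog update applied to one tracer at a single depth level looks schematically like the following (TB, F, and c2dtt are hypothetical names introduced here for illustration; TA matches the new-time arrays mentioned above, but the actual POP loops differ):

      subroutine leapfrog_sketch(ta, tb, f, imt, jmt, c2dtt)
c     Schematic leapfrog step: advance the tracer TA at the new
c     time from TB at the old time, using the tendency F assembled
c     at the current time by adv_flux/hdifft/vdifft/tsource.
      integer imt, jmt, i, j
      real ta(imt,jmt), tb(imt,jmt), f(imt,jmt)
c     c2dtt is twice the tracer timestep (2*dt)
      real c2dtt
      do j = 1, jmt
         do i = 1, imt
            ta(i,j) = tb(i,j) + c2dtt*f(i,j)
         enddo
      enddo
      return
      end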

Running the OGCM

    Uncompress and untar the package POP.tar.Z into your directory. In the POP subdirectory, edit the Makefile to set the paths to your compilers and any other options you wish to compile with.
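For example (assuming the tar file unpacks into a POP subdirectory):

    uncompress POP.tar.Z
    tar xf POP.tar
    cd POP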
    This version contains a Makefile, include files (*.H), ordinary Fortran 90 source code (*.F files), and assembler code (*.s files); the i/o filenames are set in the input file 'in_files.dat'.
Type make. A menu of possible POP versions will appear; select the one you desire. Currently, we have the following options:

  • Original means the code as originally ported from the CM5 POP program written by Rick Smith at LANL.
  • Mask on means the code contains routines to mask off the land portion of the (idealized) topography.
  • Mask+BLAS means the code also includes BLAS routines to speed up the array computations.
  • Mask+BLAS+compiler optimization means the code is compiled with the options -o aggress -o unroll -o noieeedivide, and then linked with -D rdahead=on.
  • Full optimization means that, in addition to the above, some loops were manually unrolled and optimized.
    See CRAY's Scientific Libraries Reference Manual, publication SR-2081, for more information on BLAS routines.
    The code is set up to run a simple test problem with idealized geometry. This test problem can be run with arbitrary resolution. To set the horizontal resolution, change the parameters imt and jmt in 'params.H'. To set the vertical resolution, change the parameter km in 'params.H', and make sure that the input files 'in_depths.dat' and 'in_ts.dat' contain data for km levels.
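For example, for the default 64 x 32 x 20 local grid, the relevant lines of 'params.H' would look something like the following sketch (the actual file contains additional parameters):

c     local (per-processor) grid dimensions
      parameter (imt = 64, jmt = 32, km = 20)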
    The convergence criterion for the elliptic solver is set to an adequately small value. For purposes of timing the code, it is suggested that the number of iterations be hardwired to 100. To do this, edit the file input.dat and set err = 1.0e-25 and mxscan = 100.
    The makefile uses the C preprocessor to make .h include files from .H files, and .f files from .F files, with various precompiler options.
    If the build is successful, the compiled code is named "pop". You can then run the code by typing ./pop -npes #, where # is the desired number of PEs.

Diagnostic routines

    The POP distribution comes with diagnostic routines to check the accuracy of the output. These can be turned on by setting -diagnostic=1 in the Makefile and then recompiling the program; make sure to type "make clean" to clean out any old object code before recompiling. A sample diagnostic output (from stdout) for 256 PEs is given in the file SAMPLE included in the distribution. The diagnostic routines also produce a numerical output file, IO_out, containing the final values of the ocean variables. This is a Cray binary output file, which can be read by the program rd_2d_hpcc.f; compile it with "cf77 rd_2d_hpcc.f" and then execute it with "./a.out -npes 1". As a check, we have integrated the POP code for 10 days and analyzed the sea level at the end of the 10-day integration; results from the optimized code matched the original code within error bounds.
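If you wish to read IO_out from your own Fortran program instead, a minimal reader for a single 2-dimensional field would look like the sketch below (the dimensions and record layout here are assumptions; rd_2d_hpcc.f is the authoritative reference for the actual format):

      program read_sketch
c     Minimal sketch: read one imt x jmt field from the Cray
c     binary (unformatted) file IO_out.  The record layout is
c     an assumption; see rd_2d_hpcc.f for the real format.
      integer imt, jmt
      parameter (imt = 64, jmt = 32)
      real field(imt,jmt)
      open(10, file='IO_out', form='unformatted', status='old')
      read(10) field
      close(10)
      write(*,*) 'field(1,1) = ', field(1,1)
      end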

Timing the code

    To estimate the rate of the program in GFLOP/s, run the program without the -Ta option (the default) and note the number of seconds reported by Timer 1. Then turn on the Apprentice option -Ta in the Makefile, recompile the code, rerun the program, and count the floating-point operations as specified on our webpage. The estimated speed is given by the ratio of these two numbers. Note that the current default local grid size is 64 x 32 x 20. For example, based on 10 time steps and 50 iterations (scans) on 64 processors, we obtained the following table:
  Total GFLOP   Time (sec)   Speed (GFLOP/s)   Version
  1904.77       6779.26      0.281             Original
  2305.77       6046.90      0.381             Mask on
  2305.77       4517.81      0.510             Mask+BLAS
  2305.77       3337.32      0.690             Mask+BLAS+compiler optimization
  2305.77       2858.56      0.807             Full optimization
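
Each speed is simply the ratio of the first two columns; for example, the fully optimized code achieves 2305.77 GFLOP / 2858.56 s ≈ 0.807 GFLOP/s.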

Setting up a new model configuration

    To set up a particular model configuration for your own application, you need to obtain a number of data files, including the bottom topography, the surface forcing (wind stress, heat and fresh-water fluxes), and the side boundary conditions (historical 3-dimensional profiles of temperature and salinity). We have collected data files that can be used for a variety of applications. If you are interested in obtaining these data, or a subset of these data files, please contact Yi Chao.

Summary

    This document serves as a combined users' guide and reference manual for the OGCM. It includes descriptions of the model equations and solution algorithm, together with instructions on how to obtain the OGCM, how to compile the code with various options, how to run the code, and how to perform diagnostic tests and timings. There is also a discussion of the procedure for constructing a new application of the user's choice.
    Optimization methods applied to the POP code included unrolling loops, both manually and through the compiler; rewriting code to equivalence multi-dimensional arrays as one-dimensional arrays; simplifying algebra; reducing the number of divide operations; using mask arrays to trade slow comparisons and branches for fast floating-point work; explicitly rewriting statements in array notation as loops; and using optimized libraries. All of these contributed to the large decrease in the time required to solve ocean modeling problems.
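    As an illustration of the mask-array technique (a schematic sketch, not the actual POP source), a land/ocean mask of zeros and ones multiplies the update so that the loop contains no comparisons or branches:

      subroutine mask_sketch(ta, tend, rmask, n, c2dt)
c     Branch-free update: rmask(i) is 1.0 over ocean and 0.0 over
c     land, so land points are left unchanged without any IF test,
c     trading slow comparisons and branches for fast multiplies.
      integer n, i
      real ta(n), tend(n), rmask(n), c2dt
      do i = 1, n
         ta(i) = ta(i) + c2dt*rmask(i)*tend(i)
      enddo
      return
      end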
    A 1/6-degree Atlantic OGCM has been constructed on the 256-PE Cray T3D (Chao et al., 1996), and it is quite feasible to construct a 1/16-degree Atlantic OGCM on the 384-processor T3E. In addition to the OGCM integrations, we are coupling this OGCM with the atmospheric general circulation model (AGCM) developed at UCLA on the T3D/E (Drummond et al., 1997). We are also in the process of implementing a biogeochemical component in the OGCM, so that the carbon cycle associated with increasing CO2 and global warming can be addressed.

References

  • K. Bryan, A Numerical Method for the Study of the Circulation of the World Ocean, J. Comp. Phys., 4, 347-376, 1969.
  • Y. Chao, A. Gangopadhyay, F.O. Bryan, W.R. Holland, Modeling the Gulf Stream System: How Far From Reality?, Geophys. Res. Letts., 23, 3155-3158, 1996.
  • M.D. Cox, A Primitive Equation, 3-Dimensional Model of the Ocean, GFDL Ocean Group Tech. Report 1, GFDL/NOAA, Princeton, NJ, 1984.
  • L.A. Drummond, J.D. Farrara, C.R. Mechoso, J.A. Spahr, Y. Chao, D.S. Katz, J.Z. Lou, and P. Wang, Parallel Optimization of An Earth System Model (100 GIGAFLOPS and Beyond?), 1997 Society for Computer Simulation (SCS) International Conference, Atlanta, Georgia, April, 1997.
  • R. Pacanowski, R.K. Dixon, and A. Rosati, Modular Ocean Model User's Guide, GFDL Ocean Group Tech. Report 2, GFDL/NOAA, Princeton, NJ, 1992.
  • A.J. Semtner and R.M. Chervin, Ocean General Circulation from a Global Eddy-Resolving Model, J. Geophys. Res., 97, 5493-5550, 1992.
  • R.D. Smith, J.K. Dukowicz, and R.C. Malone, Parallel Ocean General Circulation Modeling, Physica D, 60, 38-61, 1992.