Requirements
POP has been compiled successfully for the Cray T3D running
UNICOS 9.0.2.4, for up to 1024 processors in parallel. The
following compilers must be present on the system: cft77, cc,
and cam. The input data for initializing the model are in the
file initial.F, which can be modified to suit your own purposes.
Other initialization data are in the set_*.F files, and the size
of the grid can be altered in params.H.
Running the Software
- Uncompress and untar the package POP.tar.Z into your directory.
  In the POP subdirectory, edit the Makefile to set the paths to
  your compilers and any other options you wish to compile with.
- Type make. A selection of possible POP versions will be shown;
  select the one you want.
- The compiled code, if the build succeeds, is named "pop". You
  can then run the code by typing: ./pop -npes # where # is the
  number of PEs desired.
Timing the Code
To estimate the speed of the program in Gflops/s, run the program
without the -Ta option (the default) and note the number of
seconds reported by Timer 1. Then turn on the Apprentice option,
-Ta, in the Makefile, recompile the code, run the program, and
count the floating-point operations as specified on our webpage.
The estimated speed is the operation count (in Gflops) divided by
the Timer 1 time (in seconds). Note that the current default
local grid size is 64x32x20. For example, based on 10 time steps
and 50 iterations (scans) with 64 processors, we obtained the
following table:
Gflops   Time(sec)   Speed(Gflops/s)   Comments
14.79    42.63       0.347             Original
18.24    37.56       0.486             Mask on
18.24    28.52       0.640             Mask+BLAS
18.24    20.65       0.883             Mask+BLAS+compiler optimization
18.24    20.00       0.912             Full optimization

Explanation of the comments:
- Original means the original code as ported from the POP program
  for CM5 by Rick Smith at LANL.
- Mask on means the code contains routines to mask off the land
  portion of the (idealized) topography.
- Mask+BLAS means the code now includes BLAS routines for speeding
  up the array computations.
- Mask+BLAS+compiler optimization means the code is compiled with
  the following options: -o aggress -o unroll -o noieeedivide, and
  then linked with -D rdahead=on.
- Full optimization means some loops were manually unrolled and
  optimized in addition to the above.
See CRAY's Scientific Libraries Reference Manual, publication
SR-2081, for more information on BLAS routines.
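
The Speed column in the table above is simply the operation count
divided by the Timer 1 time. The following Python sketch reproduces
that arithmetic from the table's own figures; the function name
speed_gflops_per_sec is a hypothetical helper chosen for this
illustration and is not part of the POP distribution.

```python
def speed_gflops_per_sec(gflop_count, seconds):
    """Estimated speed: operation count (Gflops) divided by elapsed time (s)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return gflop_count / seconds


# (comment, operation count in Gflops, Timer 1 seconds), copied from the table
runs = [
    ("Original", 14.79, 42.63),
    ("Mask on", 18.24, 37.56),
    ("Mask+BLAS", 18.24, 28.52),
    ("Mask+BLAS+compiler optimization", 18.24, 20.65),
    ("Full optimization", 18.24, 20.00),
]

for comment, gflop_count, seconds in runs:
    speed = speed_gflops_per_sec(gflop_count, seconds)
    print(f"{comment:33s} {speed:.3f} Gflops/s")
```

Rounded to three decimals, each computed value agrees with the
Speed column of the table.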
Diagnostic Requirements
POP comes with diagnostic routines to check the accuracy of the
output. These can be turned on by setting -diagnostic=1 in the
Makefile and then recompiling the program. Make sure to type
"make clean" to clean out any old code before recompiling. A
sample diagnostic output (from stdout) for 256 PEs is given in the
file SAMPLE included in the distribution. The diagnostic routines
also produce a numerical output file, called IO_out, containing
the final values of the ocean variables. This is a Cray binary
output file, which can be read by rd_2d_hpcc.f. This program can
be compiled with "cf77 rd_2d_hpcc.f" and then executed with
./a.out -npes 1.

We have integrated the POP code for 10 days and analyzed the sea
level at the end of the 10-day integration. Results from the
optimized code matched those from the original code within error
bounds.
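
As an illustration of what a "within error bounds" comparison
could look like, the Python sketch below computes the largest
relative difference between two fields, for example sea-level
values from the original and optimized runs once they have been
read into plain lists. The function names and the default
tolerance are assumptions made for this example; the text above
does not state the actual comparison procedure or error bounds.

```python
def max_relative_difference(reference, candidate):
    """Largest |c - r| / max(|r|, |c|) over paired values of two fields."""
    if len(reference) != len(candidate):
        raise ValueError("fields must have the same number of points")
    worst = 0.0
    for r, c in zip(reference, candidate):
        scale = max(abs(r), abs(c))
        if scale == 0.0:
            continue  # both values are exactly zero, so they agree
        worst = max(worst, abs(c - r) / scale)
    return worst


def fields_match(reference, candidate, rel_tol=1e-6):
    """True if every point agrees to within rel_tol (hypothetical bound)."""
    return max_relative_difference(reference, candidate) <= rel_tol
```

A relative measure is used here so that the same tolerance applies
to large and small values; an absolute bound may be more suitable
where the field is close to zero.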