Hi all-
I have experimented with running ROMS 2.2 on an iMac with a 2 GHz Intel
Core Duo processor and 667 MHz DDR2 RAM. I used free compilers installed
with fink. As you will see, I suspect that I would do better with the
Intel compiler, but I have not tried it, since for large runs I use my
Linux cluster.
As a quick conclusion, I find that MPI works best. I suspect my
results differ from Alexander Schepetkin's because he used an Intel
compiler, which does a _much_ better job with openMP.
My test model is a 349x219x20 grid, running with NDTFAST=20 for 20
timesteps. The relative size of the grid, and the size of NDTFAST,
will affect these results, as they control the patterns of
communication between threads and processes.
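For reference, these sizes are set in the external input file (the
ocean_whatever.in used below); the relevant lines look roughly like
this, though exactly how the 349x219 point count maps onto the interior
Lm and Mm values depends on your grid:

     Lm == 349        ! interior points in the xi (I) direction
     Mm == 219        ! interior points in the eta (J) direction
      N == 20         ! vertical levels
 NTIMES == 20         ! baroclinic timesteps for the test
NDTFAST == 20         ! barotropic steps per baroclinic step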
I used both the g95 and gfortran compilers, and I parallelized the
code with both openMP and MPI. These results are by no means
exhaustive, but should get you up and running.
I installed most of the software with fink, from the unstable/main
branch for OS X 10.4. It should not be hard to install your own
software without fink, or to use the precompiled versions of these
compilers from the HPC project.
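For the record, the fink installs described below amount to something
like this (exact package names may differ in your fink tree):

  sudo fink install g95
  sudo fink install netcdf
  sudo fink install gcc4       # provides gfortran
  sudo fink install openmpi    # only needed for the MPI runs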
G95: (version 0.90, compiled with gcc 4.0.3).
1) install g95 and netcdf with fink.
2) make g95 the fortran compiler in the ROMS makefile
3) cp External/Linux-g95.mk to External/Darwin-g95.mk
4) change netcdf paths in Darwin-g95.mk to your appropriate
location, likely
NETCDF_INCDIR ?= /sw/include
NETCDF_LIBDIR ?= /sw/lib
5) set FFLAGS += -O3 -ffast-math (a consolidated .mk fragment is
sketched after step 8)
6) compile with "make -j2". The -j2 launches two compile jobs at
once, one for each core, and is much faster.
7) it will fail at mod_strings.f90. You have two choices:
1) add quotes to the part of the code which should
read:
character (len=80) :: my_os = 'Darwin'
character (len=80) :: my_cpu = '1'
character (len=80) :: my_fort = 'g95'
character (len=80) :: my_fc = 'g95'
character (len=160) :: my_fflags =' '
2) fix the makefile so this does not happen, and does
not break the Linux compiles with the same makefile. We will
praise you with great praise! I actually have not put
much effort into it.
8) run "make -j2" again
It ran in 198.967 seconds. Playing with the tiling does not help.
GFORTRAN (part of GCC 4.2)
1) install with fink. "sudo fink install gcc4"
2) change the compiler to gfortran in the makefile
3) make netcdf for this compiler (the build is also sketched after
step 6).
a) snag the netcdf sources,
b) untar it, and run ./configure in the src directory
c) change -Df2cFortran in macros.make to -DpgiFortran
d) make
4) cp Linux-gfortran.mk to Darwin-gfortran.mk. Set the netcdf
variables to point to the libraries you compiled in step 3.
5) set FFLAGS += -O3 -ftree-vectorize -msse3
6) do steps (6)-(8) of the G95 description above.
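The netcdf build in step 3 goes roughly like this; the tarball name is
a placeholder for whatever version you grab, and setting FC in the
environment is just one way to make sure configure picks up gfortran
rather than g95:

  tar xzf netcdf-3.6.x.tar.gz
  cd netcdf-3.6.x/src
  FC=gfortran ./configure
  # edit macros.make: change -Df2cFortran to -DpgiFortran
  make
  # then point NETCDF_INCDIR and NETCDF_LIBDIR in Darwin-gfortran.mk
  # at wherever the new include and lib files end up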
It runs in 181 seconds. This is not as fast as I would
expect given the SSE3 optimizations. I suspect this is
because it fails to vectorize most of the loops. If you set
the verbose flag for vectorization, you will find that it has
trouble with the vast majority of the loops. I have not yet
figured out how to fix this. The Intel compiler does much
better on my Linux AMD-Opteron codes.
Playing with the tiling does not help much.
GFORTRAN with openMP.
1) as above, but add -fopenmp to the FFLAGS, turn on openMP in
the top level makefile, and change NtileI and NtileJ in
the external file to take advantage of the multiple
threads. Set OMP_NUM_THREADS in the environment to use as
many threads as you would like; I used 2. (The whole setup
is sketched at the end of this section.)
It fails -- gfortran can't handle the directive "OMP THREADPRIVATE
(/process/)" in mod_parallel.F. If I fix this with a hack, it
runs slower than without openMP. This is contrary to
Alexander Schepetkin's results with ifort, so I am inclined to
blame the compiler.
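For reference, the openMP attempt amounts to something like this (bash
syntax for the environment variable; the 2x1 tile split is just one
obvious choice for 2 threads):

  In Darwin-gfortran.mk:
    FFLAGS += -fopenmp
  In the external input file:
    NtileI == 2
    NtileJ == 1
  In the shell, before running:
    export OMP_NUM_THREADS=2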
GFORTRAN with MPI
1) as with GFORTRAN above but:
2) use fink to install openmpi
3) change makefile to turn on MPI compiling.
4) change "FC := gfortran" in Darwin-gfortran.mk to "FC :=
mpif90" (this is not quite kosher, but since I sometimes run
MPI with different compilers and libraries on the same
machine, it is how I do it).
5) change NtileI and NtileJ so that one is 1 and the other is 2;
experiment to see which works best (both variants are sketched
after step 6).
6) change your command to run the model to look something
like:
om-mpirun -np 2 ${PWD}/oceanM ${PWD}/external/ocean_whatever.in
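The two tilings to try in step 5 look like this in ocean_whatever.in
(with 2 processes, NtileI times NtileJ must equal 2):

  NtileI == 1
  NtileJ == 2
or
  NtileI == 2
  NtileJ == 1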
It runs, on my machine, in 121 seconds.
I hope this helps someone.
Jamie Pringle