I have got ROMS running in parallel on a Linux platform (Fedora 2) using the ifort 8.1 compiler. To do so, I had to increase the stack size of each thread (I used 2 processors, each running 2 threads), which I did with the command: export KMP_STACKSIZE=50m (giving each thread a 50 MB stack). The compilation flags I used in my Makefile were:
CFT = ifort -pc80
CPPFLAGS = -P -DLINUX -D_OPENMP -I$(NETCDF_INCDIR)
CPP = /lib/cpp -traditional
FFLAGS = -i_dynamic -openmp -fpp -O
MDEPFLAGS = --cpp --fext=f90 --file=-
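For reference, the run-time environment described above can be set up along these lines. This is a minimal bash sketch. `OMP_NUM_THREADS` is the standard OpenMP variable for the thread count and is an addition here, not something from the original setup. The executable and input file names in the last line are hypothetical placeholders.

```shell
#!/bin/bash
# Run-time environment for an OpenMP build compiled with ifort (sketch).

# Number of OpenMP threads (assumed: 2 processors x 2 threads = 4).
export OMP_NUM_THREADS=4

# Stack size for each worker thread created by the Intel OpenMP runtime (50 MB).
# The initial (master) thread's stack is governed by the shell's `ulimit -s` instead.
export KMP_STACKSIZE=50m

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_STACKSIZE=$KMP_STACKSIZE"

# Hypothetical executable and input file names; adjust to your build:
# ./oceanO < ocean.in > ocean.log
```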
I found that the -fpp flag was absolutely necessary. The resulting executable gave only a modest speed-up over the corresponding serial run (about 30% faster at best, on 4 threads), whereas I was expecting a factor of 2 (100% speed-up) or better. I also tried the flags from Makefile.OMP_ifort (the relevant Makefile that comes with the ROMS code package), FFLAGS = -openmp -fpp -ip -O3 -tpp7 -xW (I had to add -fpp here as well). With these flags the code took a very long time to compile, but the run times showed no appreciable further speed-up.
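To put these numbers in common terms, here is a short worked calculation. It assumes "30% faster" means a speed-up factor of 1.3, which matches the post's use of "100% speed-up" for a factor of 2. The per-processor line additionally assumes the 4 threads share 2 physical processors.

```latex
% Speed-up and parallel efficiency on p workers
S_p = \frac{T_{\text{serial}}}{T_p}, \qquad E_p = \frac{S_p}{p}

% Observed, counted per thread (p = 4)
S_4 \approx 1.3 \;\Rightarrow\; T_4 \approx \frac{T_{\text{serial}}}{1.3} \approx 0.77\,T_{\text{serial}}, \qquad E_4 \approx \frac{1.3}{4} \approx 0.33

% Observed, counted per physical processor (p = 2)
E_2 \approx \frac{1.3}{2} = 0.65

% Expected factor of 2 on 4 threads
S_4 = 2 \;\Rightarrow\; E_4 = 0.5
```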
Does anybody know what additional or alternative compilation flags I could use to get better speed-ups with OpenMP on a Linux platform with ifort?
P.S. I have verified that my serial and parallel runs give essentially identical numerical solutions by examining time histories of the differences in all of the flow variables.