ifort i7 optimization flags issue

balbin

ifort i7 optimization flags issue

#1 Unread post by balbin »

Compiler flags are not optimal: -heap-arrays ---> -no-heap-arrays, but you have to adjust the stack size limit.
Instruction set: -msse2 --> -xSSE4.1, because your processor is a Xeon 5400-series, not a Pentium 4 "Northwood"
or Opteron 248.
OK. At the HP I have tried the new compiler options:
Operating system : Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /opt/intel/composerxe-2011.0.084/bin/intel64/ifort
Compiler flags : -no-heap-arrays -fp-model precise -openmp -fpp -ip -O3 -xSSE4.1 -free
and it runs until it has to write the avg file. It stops without any explanation or complaint just BEFORE the output
WRT_HIS - wrote history fields (Index=1,1) into time record = 0000030
WRT_AVG - wrote averaged fields into time record = 0000001
WRT_RST - wrote re-start fields (Index=1,1) into time record = 0000001
but it had written to the his and rst files previously. The his file contains 30 records and the avg file contains none.

I have tried limiting the stack size to 16 MB and setting it to unlimited. In case the problem was in netcdf, I recompiled the netcdf libraries following the Intel instructions, replacing -xT with -xSSE4.1:
$ export CC=icc
$ export CXX=icpc
$ export CFLAGS='-O3 -xSSE4.1 -ip -no-prec-div -static'
$ export CXXFLAGS='-O3 -xSSE4.1 -ip -no-prec-div -static'

$ export F77=ifort
$ export FC=ifort
$ export F90=ifort
$ export FFLAGS='-O3 -xSSE4.1 -ip -no-prec-div -static'

$ export CPP='icc -E'
$ export CXXCPP='icpc -E'
It always stops at the same point but does not give any reason, and it writes nothing more to the output file. I attach the redirected output file log3.txt

Using "Compiler flags : -heap-arrays -fp-model precise -openmp -fpp -ip -O3 -xSSE4.1 -free" runs fine as usual. I have restarted the run from the last saved rst after recompiling with the -heap-arrays option and log3_rst.txt is the output that is still running.
Sorry, I have no experience with this. What am I doing wrong? Thanks for your help.
Attachments
log3_rst.txt
log3.txt

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Intel’s new i7 980x CPU gives disappointing speedup

#2 Unread post by kate »

If you redirected the standard output, did you also redirect the standard error? It might have written something there, say from the netcdf library.
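The difference can be sketched with a stand-in shell function. Here `run_model` is hypothetical and only imitates a program that prints progress to standard output and an error to standard error; with the real executable the redirections would be written the same way.

```shell
# Hypothetical stand-in for the model executable: progress goes to
# standard output, the error message goes to standard error.
run_model() {
  echo "WRT_HIS - wrote history fields"
  echo "error: something went wrong" >&2
}

logdir=$(mktemp -d)

# Only stdout is redirected: the error message is not captured in the log.
run_model > "$logdir/stdout_only.txt"

# Both streams are redirected: the error ends up next to the model output.
run_model > "$logdir/both.txt" 2>&1

echo "logs written to $logdir"
```

With `2>&1` placed after the `>` redirection, both streams go to the same file, so the last lines of the log show whatever was reported when the run stopped.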

Did the restarted run write to the avg file? I'm guessing no. For debugging, you can ask it to only average the first few steps, but you still need a way to see what is going wrong.

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#3 Unread post by balbin »

No, I did not redirect the standard error! I forgot it. Of course I will, tomorrow. Thanks.
The restarted run with the -heap-arrays option does write into the averaged file. I still have to check it, but I know from previous experience restarting after a break that the first record of averaged values after a restart seems to be incorrect and unrelated to its assigned time.
The restarted run with -no-heap-arrays option stops at the very same point and does not write into the avg file.

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#4 Unread post by balbin »

Standard error says "Segmentation fault"
I asked the program to write to avg file every day instead of every month and it stops just before writing to the avg file after calculating the first day.

I checked the run previously restarted with the -heap-arrays option, and it writes to the avg file at day 30 (30th Jan) (360 days/year climatological run). In the nc file it says the time is 15th of Feb!? OK, this is another thing I don't understand, but it is not the point here.

Thanks

kate

Re: Intel’s new i7 980x CPU gives disappointing speedup

#5 Unread post by kate »

Try again with -O2 instead of -O3 and see if that runs. Compiler bugs are often in the optimizer phase of the compile. You want to find the fastest options that give the correct answer. You can always report compiler bugs if you have a current license, but they'll want a short (<100 lines) program that demonstrates the error. This can be unbelievably hard to obtain, probably not worth the trouble.

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#6 Unread post by balbin »

Thanks, Kate.
-O2 stops in the same way. It only runs fine with the -heap-arrays.
There seems to be something inside set_avg_tile. I modified set_avg.F

Code: Select all

       write(*,*)'entra en set_avg_tile',ng,tile
      CALL set_avg_tile (ng, tile,                                      &
     &                   LBi, UBi, LBj, UBj,                            &
     &                   IminS, ImaxS, JminS, JmaxS,                    &
# ifdef SOLVE3D
     &                   NOUT,                                          &
# endif
     &                   KOUT)
      write(*,*)'sale de set_avg_tile',ng,tile
Now I ask the program to write avgs every 10 time steps. I attach the output "log6.txt"
If I try to do the same inside set_avg_tile, for instance asking for iic(ng) or just to say "hello" it stops when it reaches the write statement.

The compiler complains about some things, and I don't know whether they are important. Just in case:
WS> build.sh -j > build.log
makefile:241: INCLUDING FILE /home/balbin/make_macros.mk WHICH CONTAINS APPLICATION-DEPENDENT MAKE DEFINITIONS
makefile:237: INCLUDING FILE /media/Data/projects/medsea/build/make_macros.mk WHICH CONTAINS APPLICATION-DEPENDENT MAKE DEFINITIONS
set_weights.f90(199): remark #8290: Recommended relationship between field width 'W' and the number of fractional digits 'D' in this edit descriptor is 'W>=D+3'.
30 FORMAT (/,1x,'ndtfast, nfast = ',2i4,3x,'nfast/ndtfast = ',f7.5)
------------------------------------------------------------------^
ar: creating /media/Data/projects/medsea/build/libNLM_bio.a
ar: creating /media/Data/projects/medsea/build/libNLM_sed.a
ar: creating /media/Data/projects/medsea/build/libMODS.a
ar: creating /media/Data/projects/medsea/build/libANA.a
ar: creating /media/Data/projects/medsea/build/libNLM.a
ar: creating /media/Data/projects/medsea/build/libUTIL.a
ifort: command line remark #10010: option '-Vaxlib' is deprecated and will be removed in a future release. See '-help deprecated'
I also include build.log
Attachments
log6.txt
build.log

shchepet
Posts: 188
Joined: Fri Nov 14, 2003 4:57 pm

Re: Intel’s new i7 980x CPU gives disappointing speedup

#7 Unread post by shchepet »

It appears that you are looking at a very basic segmentation fault, which may be
associated either with ROMS itself or with the particular netCDF version you are
using. set_avg_tile is a long routine. Most likely the breaking point occurs
inside

Code: Select all

!  Convert accumulated sums into time-averages, if appropriate.
!-----------------------------------------------------------------------
!
      IF ((iic(ng).gt.ntsAVG(ng)).and.                                  &
     &    (MOD(iic(ng)-1,nAVG(ng)).eq.0) .....

part of the code, starting at line 2170 and ending at line 2888, since this
segment of the code is executed only when MOD(iic-1,nAVG)==0, which is
the final stage of averaging. Still, too much code to pinpoint the problem easily.
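
To see at which time steps that branch is entered, here is a small sketch that evaluates only the two conditions visible in the snippet above; whatever follows in the elided part of the IF statement is ignored. The values nAVG=10 and ntsAVG=1 are assumptions chosen for illustration, not taken from the actual input file.

```shell
# Illustrative values only (assumptions, not read from the run's input file).
nAVG=10     # averaging window in time steps
ntsAVG=1    # time step at which averaging starts

steps=""
iic=1
while [ "$iic" -le 31 ]; do
  # Same test as in the snippet: iic > ntsAVG and MOD(iic-1, nAVG) == 0
  if [ "$iic" -gt "$ntsAVG" ] && [ $(( (iic - 1) % nAVG )) -eq 0 ]; then
    steps="$steps $iic"
  fi
  iic=$((iic + 1))
done

echo "averaging finalized at steps:$steps"
# prints: averaging finalized at steps: 11 21 31
```

So with a window of 10 steps the finalizing code runs only once every 10 steps, which fits a run that goes through many steps and stops only when the first average is due.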

1. Is there any way to recompile and execute the code with the compiler flags
extended by

Code: Select all

-g -check all
or

Code: Select all

-g -check arg_temp_created,bounds,pointers,uninit,format,output_conversion
while keeping -openmp -fpp -no-heap-arrays -xSSE4.1 -free in place (i.e. run
it in parallel, with the proper instruction set and using the stack instead of the heap)?

This may pinpoint the breaking point.

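Suppress the -O3 flag (with ifort, -g only lowers the default optimization level to -O0 when no -O option is given explicitly, so an explicit -O3 would stay in effect);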

Suppress the -ip flag (interprocedural analysis; I observed at least once that this
flag caused a problem because of a compiler bug, but that was related to arithmetic
precision and not to a memory issue, so here it is probably irrelevant);

2. What version of netCDF do you use and how did you compile it? [General
advice here is to use either netcdf-3.6.3 (the final release of the version 3 generation)
or netcdf-4.1.1, but to stay away from anything in between, i.e., 4.0.x should be
avoided. Also, 4.1.1 can be compiled with or without HDF support. Do you use it?]
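
The debugging build from item 1 can be sketched as a small flag-rewriting step, starting from the flag set reported in the first post. The variable names are hypothetical, and -traceback is an addition not mentioned above: it is an ifort option that asks the runtime to report the source file and line when a severe error occurs.

```shell
# Flags as reported in the first post.
ORIG_FLAGS="-no-heap-arrays -fp-model precise -openmp -fpp -ip -O3 -xSSE4.1 -free"

# Copy every flag except the two to be suppressed for the debugging build.
DEBUG_FLAGS=""
for f in $ORIG_FLAGS; do
  case "$f" in
    -O3|-ip) ;;                          # dropped
    *) DEBUG_FLAGS="$DEBUG_FLAGS $f" ;;  # kept as is
  esac
done

# Strip the leading blank and append the checking options.
DEBUG_FLAGS="${DEBUG_FLAGS# } -g -check all -traceback"

echo "$DEBUG_FLAGS"
```

The resulting string is what would go into the compiler flags of the build configuration in place of the original set.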

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#8 Unread post by balbin »

Q.2:
I used netcdf v 3.6.2 compiled following intel recipe for linux http://software.intel.com/en-us/article ... compilers/
and including a line, "#include <cstring>", at ncvalues.cpp and sfc_pres_temp_rd.cpp as explained at http://www.unidata.ucar.edu/support/hel ... 09331.html. But still there was a warning:
netcdf.cpp(1267): warning #68: integer conversion resulted in a change of sign
t[5] = -1;
^

netcdf.cpp(1270): warning #68: integer conversion resulted in a change of sign
if (t[j] == -1) {
^
I did not care but this morning I fixed it by copying netcdf.cpp from version 4.1.1 (maybe not a good idea). I got the same error running roms.

After that I compiled v 3.6.3 following intel instructions and there was no need to modify anything. log.1,2,3,and 4 are the outputs of ./configure, make, make check and sudo make install respectively. err.2 and 3 are the related error messages.
My program runs exactly the same way as yesterday. It stops when reaching set_avg_tile.

Q.1:
I compiled with -g -check all and no -ip
I included a line into set_avg_tile

Code: Select all

# include "set_bounds.h"

            print*,'hola'
!
!-----------------------------------------------------------------------
!  Return if time-averaging window is zero.
!-----------------------------------------------------------------------
!
      IF (nAVG(ng).eq.0) RETURN
And it stops as soon as it reaches there.
The output shows some errors related to TIME_REF = -1 (the output format seems to complain when writing day -1) and to the reading of the T/F flags for writing output fields. I include the complete output log8.txt

Thanks again
Attachments
log8.txt
log.1.txt
log.2.txt
log.3.txt
log.4.txt
err.2.txt
err.3.txt

shchepet

Re: Intel’s new i7 980x CPU gives disappointing speedup

#9 Unread post by shchepet »

Now it looks like your problem is different from before (when the code was compiled
without -g -extra_flags): now the code terminates immediately when it attempts to call
set_avg for the very first time, not when finalizing the averaging.

Based on the fact that word 'hola' never gets printed in log8.txt,

Code: Select all

            print*,'hola'
....
      IF (nAVG(ng).eq.0) RETURN
It appears that the segmentation fault occurs at the very moment when set_avg_tile is called
by its driver, CALL set_avg_tile (ng, tile,....), starting at line 60 of set_avg.F. This means either that
some of the arguments of the routine being called are not valid pointers, i.e., an allocatable array
was not properly allocated (this is unlikely because, after all, the code runs with the -heap-arrays
compiler flag), or that the compiler decided to create a temporary copy-in/copy-out array for
one of the arguments, and there is not enough space on the stack to allocate it if it must go on the stack
(i.e., when the code is compiled with the -no-heap-arrays flag).
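
If stack space is the issue, one thing that may be worth checking is which stack is running out. In an OpenMP run, `ulimit -s` governs the stack of the main thread, while the stacks of the worker threads are sized separately through the OMP_STACKSIZE environment variable (standard OpenMP) or Intel's KMP_STACKSIZE. The sketch below assumes a Bourne-type shell; the 512M value is purely illustrative and not a recommendation.

```shell
# Raise the soft stack limit of the main thread up to the hard limit
# (the hard limit is whatever the system allows; it may be "unlimited").
ulimit -S -s "$(ulimit -H -s)"
echo "main-thread stack limit: $(ulimit -S -s)"

# Stack size for each OpenMP worker thread (illustrative value).
export OMP_STACKSIZE=512M
# Intel-specific variable with the same purpose.
export KMP_STACKSIZE=512m
echo "OMP_STACKSIZE=$OMP_STACKSIZE KMP_STACKSIZE=$KMP_STACKSIZE"
```

These settings have to be in effect in the same shell that launches the executable. If the run still stops at the same place with a generous per-thread stack, the stack explanation becomes less likely.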

To verify that this is the case, place another print*,'hola 1' statement just before the
CALL set_avg_tile (ng, tile,... line, and see whether this message shows up while the one
from the inside does not.

What about undefining CPP-switch AVERAGES completely? Would it still terminate?

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#10 Unread post by balbin »

I am checking all these things under the UPWELLING case.
I reproduce the problem using 8 threads and:

Code: Select all

! C-preprocessing Flag.

    MyAppCPP = UPWELLING

! Input variable information file name.  This file needs to be processed
! first so all information arrays can be initialized properly.

     VARNAME = /home/balbin/roms/ROMS/External/varinfo.dat

! Grid dimension parameters. See notes below in the Glossary for how to set
! these parameters correctly.
          Lm == 160           ! Number of I-direction INTERIOR RHO-points
          Mm == 320           ! Number of J-direction INTERIOR RHO-points
           N == 30            ! Number of vertical levels
...
      NtileI == 1                               ! I-direction partition
      NtileJ == 16                               ! J-direction partition
I also played with the number of threads and NtileJ. With the original 41x80x16 grid the code runs fine.
If I undef AVERAGES the code runs fine.

1.- Checking how it reaches set_avg_tile with -g -extra_flags

Code: Select all

       print*,'going into set_avg_tile',tile
      CALL set_avg_tile (ng, tile,                                      &
     &                   LBi, UBi, LBj, UBj,                            &
     &                   IminS, ImaxS, JminS, JmaxS,                    &
# ifdef SOLVE3D
     &                   NOUT,                                          &
# endif
     &                   KOUT)
       print*,'out of set_avg_tile',tile
it goes into and out of set_avg_tile until it has to compute mean values. See log1.txt


2.- Checking inside set_avg_tile with -g -extra_flags

Code: Select all

!-----------------------------------------------------------------------
!  Return if time-averaging window is zero.
!-----------------------------------------------------------------------
!
      print*,'hola'
      IF (nAVG(ng).eq.0) RETURN
!
!-----------------------------------------------------------------------
!  Compute vorticity fields.
!-----------------------------------------------------------------------
Then it does not get as far as printing anything. See log2.txt
Attachments
log1.txt
log2.txt

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#11 Unread post by balbin »

My MacBook Pro reproduces the UPWELLING problem

MacBookPro6.2, Intel Core i7, 2.66GHz, 2x2GB DDR3@1067 MHz

netcdf library version "3.6.3" of Feb 22 2011 16:31:32
MacBook> uname -a
Darwin Mac-Book-Pro-de-Rosa-Balbin.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:11:58 PST 2010; root:xnu-1504.9.26~3/RELEASE_X86_64 x86_64
MacBook> ifort -v
Version 12.0.2

See attached output log3.txt

I will check the MacPro and I will tell you.
Attachments
log3.txt

balbin

Re: Intel’s new i7 980x CPU gives disappointing speedup

#12 Unread post by balbin »

Yes, the MacPro also reproduces the UPWELLING problem.
Playing around with compilation options, it sometimes says "Illegal instruction" instead of "Segmentation fault".

Should I try to recompile netcdf with different options?
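
One possible source of an "Illegal instruction" message is a binary built for an instruction set that the CPU running it does not implement, so it may be worth confirming that the -x option in use matches the machine. A minimal sketch for Linux follows; the helper name `has_cpu_flag` is hypothetical, and it only looks for a flag name in a cpuinfo-style file. On the Macs there is no /proc/cpuinfo, and I believe `sysctl -n machdep.cpu.features` gives the corresponding list there.

```shell
# Report whether a CPU feature flag (e.g. sse4_1) is listed in a
# cpuinfo-style file; defaults to /proc/cpuinfo (Linux only).
has_cpu_flag() {
  flag="$1"
  file="${2:-/proc/cpuinfo}"
  # -w matches whole words only, so "sse4_1" does not match a longer name.
  grep -qw -- "$flag" "$file" 2>/dev/null
}

if has_cpu_flag sse4_1; then
  echo "sse4_1: listed"
else
  echo "sse4_1: not listed (or /proc/cpuinfo unavailable)"
fi
```

If the flag matching the chosen -x option is not listed, the illegal-instruction crash is expected and unrelated to the segmentation fault.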
