dogbone composite nesting: contact()%Kdg in nesting.f90 not assigned?

Bug reports, work arounds and fixes

blaupole
Posts: 5
Joined: Fri Sep 24, 2010 12:12 am
Location: Antarctic Research Centre

dogbone composite nesting: contact()%Kdg in nesting.f90 not assigned?

#1 Post by blaupole

Hi,

I'm trying to run the dogbone composite test case.
It fails right after reading the initial conditions, apparently because the vertical (k) index of the donor grid does not contain a sensible value.

It must be a problem on my end, since dogbone seems to run successfully for many others.

OS Ubuntu 18.04
gfortran, gcc version 7.5.0
netCDF-Fortran 4.4.4
mpirun (Open MPI) 2.1.1
ROMS checkout April 1, 2021, SVN revision 1053

The tile configuration for this test, in roms_dogbone_composite.in:

Code:

      NtileI == 1  1                             ! I-direction partition
      NtileJ == 2  2                             ! J-direction partition
To get more information, I ran it in debug mode:

Code:

mpirun -np 2 romsG roms_dogbone_composite.in 
One possibly related thing is not obvious to me: the two grids are each subdivided into 2 tiles. In theory, shouldn't that require 4 MPI ranks? ROMS won't let me assign 4 CPUs to the problem; it only accepts 2.

Running it with 2 CPUs, however, produces the following error (the full log is attached as a .txt file):

Code:

...
NLM: GET_STATE - Reading state initial conditions,                       0001-01-01 00:00:00.00
                   (Grid 02, t = 0.0000, File: dogbone_ini_right.nc, Rec=0001, Index=1)
                - free-surface
                   (Min =  0.00000000E+00 Max =  0.00000000E+00)
                - vertically integrated u-momentum component
                   (Min =  0.00000000E+00 Max =  0.00000000E+00)
                - vertically integrated v-momentum component
                   (Min =  0.00000000E+00 Max =  0.00000000E+00)
                - u-momentum component
                   (Min =  0.00000000E+00 Max =  0.00000000E+00)
                - v-momentum component
                   (Min =  0.00000000E+00 Max =  0.00000000E+00)
                - potential temperature
                   (Min =  1.00000000E+01 Max =  1.00000000E+01)
                - salinity
                   (Min =  3.50000000E+01 Max =  3.50000000E+01)
At line 3986 of file nesting.f90
Fortran runtime error: Index '-1094795586' of dimension 2 of array 'ac' below lower bound of 1

Error termination. Backtrace:
At line 3986 of file nesting.f90
Fortran runtime error: Index '-1094795586' of dimension 2 of array 'ac' below lower bound of 1

Error termination. Backtrace:
#0  0x7faf7d9d52ed in ???
#1  0x7faf7d9d5ed5 in ???
#2  0x7faf7d9d62a7 in ???
#0  0x7fe816e322ed in ???
#1  0x7fe816e32ed5 in ???
#2  0x7fe816e332a7 in ???
#3  0x55a6f7583ee5 in __nesting_mod_MOD_put_contact3d
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:3986
#4  0x55a6f76c51fc in put_composite
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:1801
#5  0x55a6f77723bc in __nesting_mod_MOD_nesting
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:300
#3  0x56404f097ee5 in __nesting_mod_MOD_put_contact3d
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:3986
#4  0x56404f1d91fc in put_composite
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:1801
#5  0x56404f2863bc in __nesting_mod_MOD_nesting
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:300
#6  0x55a6f6cc24be in initial_
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/initial.f90:289
#7  0x55a6f6a537e2 in __ocean_control_mod_MOD_roms_initialize
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/ocean_control.f90:142
#6  0x56404e7d64be in initial_
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/initial.f90:289
#8  0x55a6f6a4e966 in ocean
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:96
#9  0x55a6f6a4f0d6 in main
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:51
#7  0x56404e5677e2 in __ocean_control_mod_MOD_roms_initialize
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/ocean_control.f90:142
#8  0x56404e562966 in ocean
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:96
#9  0x56404e5630d6 in main
	at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:51

=================================================================

=================================================================
==9387==ERROR: LeakSanitizer: detected memory leaks
==9388==ERROR: LeakSanitizer: detected memory leaks


Direct leak of 2240 byte(s) in 2 object(s) allocated from:
Direct leak of 2240 byte(s) in 2 object(s) allocated from:
    #0 0x7faf7e343b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #0 0x7fe8177a0b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7faf5f40c6b6 in mca_coll_sync_comm_query (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_coll_sync.so+0x16b6)
    #1 0x7fe8006d96b6 in mca_coll_sync_comm_query (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_coll_sync.so+0x16b6)


Direct leak of 68 byte(s) in 12 object(s) allocated from:
Direct leak of 68 byte(s) in 12 object(s) allocated from:
    #0 0x7faf7e2dc538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #0 0x7fe817739538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #1 0x7faf60075087  (<unknown module>)
    #1 0x7fe801342087  (<unknown module>)


Direct leak of 29 byte(s) in 1 object(s) allocated from:
Direct leak of 29 byte(s) in 1 object(s) allocated from:
    #0 0x7faf7e2dc538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #0 0x7fe817739538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #1 0x7faf60726ed0  (<unknown module>)
    #1 0x7fe8019f3ed0  (<unknown module>)
    #2 0x75762e6f65672e34  (<unknown module>)
    #2 0x75762e6f65672e34  (<unknown module>)


Direct leak of 11 byte(s) in 1 object(s) allocated from:
Direct leak of 11 byte(s) in 1 object(s) allocated from:
    #0 0x7faf7e343b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #0 0x7fe8177a0b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
    #1 0x7faf70ca1dad  (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pmix_pmix112.so+0x1ddad)
    #1 0x7fe80a160dad  (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pmix_pmix112.so+0x1ddad)


SUMMARY: AddressSanitizer: 2348 byte(s) leaked in 16 allocation(s).
SUMMARY: AddressSanitizer: 2348 byte(s) leaked in 16 allocation(s).
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[17814,1],1]
  Exit code:    1
--------------------------------------------------------------------------
It blows up at the first use of kdg, at line 3986 of nesting.f90 (in put_contact3d):

Code:

!
!  Interpolate.
!
      DO k=LBk,UBk
        DO m=1,Npoints
          i=contact(cr)%Irg(m)
          j=contact(cr)%Jrg(m)
          kdg=contact(cr)%Kdg(k,m)        ! <---- Kdg read here
          kdgm1=MAX(kdg-1,Kmin)
          IF (((Istr.le.i).and.(i.le.Iend)).and.                        &
     &        ((Jstr.le.j).and.(j.le.Jend))) THEN
            cff(1)=contact(cr)%Lweight(1,m)*contact(cr)%Vweight(1,k,m)
            cff(2)=contact(cr)%Lweight(2,m)*contact(cr)%Vweight(1,k,m)
            cff(3)=contact(cr)%Lweight(3,m)*contact(cr)%Vweight(1,k,m)
            cff(4)=contact(cr)%Lweight(4,m)*contact(cr)%Vweight(1,k,m)
            cff(5)=contact(cr)%Lweight(1,m)*contact(cr)%Vweight(2,k,m)
            cff(6)=contact(cr)%Lweight(2,m)*contact(cr)%Vweight(2,k,m)
            cff(7)=contact(cr)%Lweight(3,m)*contact(cr)%Vweight(2,k,m)
            cff(8)=contact(cr)%Lweight(4,m)*contact(cr)%Vweight(2,k,m)
            Ar(i,j,k)=cff(1)*Ac(1,kdgm1,m)+                             &
     &                cff(2)*Ac(2,kdgm1,m)+                             &
     &                cff(3)*Ac(3,kdgm1,m)+                             &
     &                cff(4)*Ac(4,kdgm1,m)+                             &
     &                cff(5)*Ac(1,kdg  ,m)+                             &  ! <---- first Ac index using kdg
     &                cff(6)*Ac(2,kdg  ,m)+                             &
     &                cff(7)*Ac(3,kdg  ,m)+                             &
     &                cff(8)*Ac(4,kdg  ,m)
            Ar(i,j,k)=Ar(i,j,k)*Amask(i,j)
          END IF
        END DO
      END DO
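In this loop each receiver value Ar(i,j,k) is a weighted sum of eight donor values: the four horizontal weights Lweight(1:4,m) times the two vertical weights Vweight(1:2,k,m), which pair with donor levels kdgm1 and kdg. Because kdg is used directly as an index into Ac, an unset Kdg shows up as an out-of-bounds index rather than as a merely wrong value. The module below is a minimal, hypothetical single-point version of that weighting (contact_interp_sketch and interp_point are illustrative names, not ROMS code), assuming the donor values for one receiver point are stored as Ac(1:4,Kmin:Kmax); unlike the original loop, it checks kdg against the donor level range before indexing.

```fortran
module contact_interp_sketch
  ! Hypothetical single-point version of the 8-weight interpolation
  ! in put_contact3d (nesting.f90). Illustration only, not ROMS code.
  implicit none
  private
  public :: dp, interp_point

  integer, parameter :: dp = kind(1.0d0)

contains

  function interp_point(Kmin, kdg, Lweight, Vweight, Ac) result(Ar)
    integer,  intent(in) :: Kmin          ! lowest donor level index
    integer,  intent(in) :: kdg           ! upper bracketing donor level
    real(dp), intent(in) :: Lweight(4)    ! horizontal weights
    real(dp), intent(in) :: Vweight(2)    ! weights for levels kdgm1, kdg
    real(dp), intent(in) :: Ac(:, Kmin:)  ! Ac(1:4, Kmin:Kmax) donor values
    real(dp) :: Ar
    integer :: kdgm1, p

    ! Guard against an unset Kdg: this is the condition the
    ! bounds-checked debug build trips over in put_contact3d.
    if (kdg < Kmin .or. kdg > ubound(Ac, 2)) then
      error stop 'interp_point: kdg outside donor levels'
    end if

    kdgm1 = max(kdg - 1, Kmin)
    Ar = 0.0_dp
    do p = 1, 4
      ! Terms cff(p) and cff(p+4) of the original loop.
      Ar = Ar + Lweight(p) * Vweight(1) * Ac(p, kdgm1) &
              + Lweight(p) * Vweight(2) * Ac(p, kdg)
    end do
  end function interp_point

end module contact_interp_sketch
```

For example, with all four horizontal weights equal to 0.25 and Vweight = (0.5, 0.5), a donor column holding 10 at level 1 and 20 at level 2 gives 15 for kdg = 2.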
Printing the indices inside the IF block with

Code:

 write(*,*) kdg,i,j,kdgm1
results in

Code:

-1094795586          22           9           1
 -1094795586          22           0           1
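The kdg value printed here is not arbitrary: -1094795586 is the bit pattern 0xBEBEBEBE read as a signed 32-bit integer. The LeakSanitizer report in the log shows that this debug build runs under AddressSanitizer, which by default fills the first 4 KiB of each new heap allocation with the byte 0xbe (its malloc_fill_byte and max_malloc_fill_size options). That is consistent with contact(cr)%Kdg having been allocated but never assigned. The hypothetical helpers below (not part of ROMS) rebuild that value from four 0xBE bytes and count Kdg entries outside a valid level range, the kind of check that could be run on Kdg before the interpolation loop.

```fortran
module kdg_sentinel_sketch
  ! Hypothetical diagnostics for an unassigned Kdg array; not ROMS code.
  use, intrinsic :: iso_fortran_env, only: int8, int32
  implicit none
  private
  public :: asan_fill_int32, count_bad_kdg

contains

  ! Four 0xBE bytes (assumed AddressSanitizer default fill byte)
  ! reinterpreted as a 32-bit integer. 0xBE as a signed 8-bit
  ! integer is -66; all bytes are equal, so byte order is irrelevant.
  function asan_fill_int32() result(val)
    integer(int32) :: val
    integer(int8), parameter :: fill(4) = -66_int8
    val = transfer(fill, 0_int32)
  end function asan_fill_int32

  ! Number of Kdg(k,m) entries outside the donor levels Kmin..Kmax.
  function count_bad_kdg(Kdg, Kmin, Kmax) result(nbad)
    integer, intent(in) :: Kdg(:,:)
    integer, intent(in) :: Kmin, Kmax
    integer :: nbad
    nbad = count(Kdg < Kmin .or. Kdg > Kmax)
  end function count_bad_kdg

end module kdg_sentinel_sketch
```

A nonzero count_bad_kdg(contact(cr)%Kdg, 1, N(ng)) at that point would flag the problem with a clear message instead of a bounds error deep in put_contact3d; the call shown is illustrative.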
My first question: where and how is contact(cr)%Kdg populated? I can see it being allocated in mod_nesting.f90, and I can see its siblings Idg and Jdg being set in set_contact.f90, but I can't figure out where Kdg is assigned.

Any suggestions would be much appreciated.
Stefan
Attachments
set_contact.f90 (36.78 KiB)
mod_nesting.f90 (31.05 KiB)
nesting.f90 (230.47 KiB)
dogbone_ngc_refined.nc (425.89 KiB)
out_log.txt (40.05 KiB)
dogbone.h (1.08 KiB)

jcwarner
Posts: 1200
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

Re: dogbone composite nesting: contact()%Kdg in nesting.f90 not assigned?

#2 Post by jcwarner

I think there is an issue with the updated calls to z_weights. There is a check in set_contact that has:

      IF (.not.ANY(Lcoincident).and.ANY(Lcomposite)) THEN
        get_Vweights=.TRUE.
      ELSE
        get_Vweights=.FALSE.
      END IF

I think this check needs to allow get_Vweights to be true for this case.
Then, at the top of nesting, we have:

      IF ((isection.eq.nzwgt).and.get_Vweights) THEN
        DO tile=last_tile(ng),first_tile(ng),-1
          CALL z_weights (ng, model, tile)
        END DO
!       (rest of the block not shown)
      END IF

z_weights then sets the missing Kdg values.
Let me dig in a bit and ask Hernan about this one.
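As described above, z_weights is only called when get_Vweights is true, and it is where Kdg (and, presumably, the Vweight array used alongside it in put_contact3d) gets filled in. For each receiver level, such a routine has to find the donor level kdg that brackets the receiver depth from above, plus the two linear weights for levels kdgm1 and kdg. The module below is a hypothetical, single-column sketch of that step, assuming donor depths increase with k (k = 1 at the bottom, as in ROMS) and clamping to the nearest donor level outside the donor column; it is not the actual z_weights code.

```fortran
module vertical_weights_sketch
  ! Hypothetical vertical index search and linear weights, showing the
  ! kind of values Kdg and Vweight must hold. Not the ROMS z_weights.
  implicit none
  private
  public :: dp, bracket_level

  integer, parameter :: dp = kind(1.0d0)

contains

  ! zd(1:N): donor depths at one point, increasing with k (k=1 bottom).
  ! zr     : receiver depth.
  ! kdg    : donor level with zd(kdg-1) < zr <= zd(kdg) inside the column.
  ! w(1)   : weight for level max(kdg-1,1); w(2): weight for level kdg.
  subroutine bracket_level(zd, zr, kdg, w)
    real(dp), intent(in)  :: zd(:)
    real(dp), intent(in)  :: zr
    integer,  intent(out) :: kdg
    real(dp), intent(out) :: w(2)
    integer :: k, N

    N = size(zd)
    if (zr <= zd(1)) then
      ! Below the deepest donor level: use level 1 only.
      kdg = 1
      w = [0.0_dp, 1.0_dp]
    else if (zr >= zd(N)) then
      ! Above the shallowest donor level: use level N only.
      kdg = N
      w = [0.0_dp, 1.0_dp]
    else
      ! First donor level at or above the receiver depth.
      kdg = 2
      do k = 2, N
        if (zr <= zd(k)) then
          kdg = k
          exit
        end if
      end do
      ! Linear interpolation weights between levels kdg-1 and kdg.
      w(2) = (zr - zd(kdg-1)) / (zd(kdg) - zd(kdg-1))
      w(1) = 1.0_dp - w(2)
    end if
  end subroutine bracket_level

end module vertical_weights_sketch
```

For donor depths (-30, -20, -10) m and a receiver at -22 m, it returns kdg = 2 with weights (0.2, 0.8) for levels 1 and 2. With kdg = 1 the weights (0, 1) stay consistent with kdgm1 = MAX(kdg-1,Kmin) in put_contact3d, since both terms then refer to level 1.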

arango
Site Admin
Posts: 1367
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: dogbone composite nesting: contact()%Kdg in nesting.f90 not assigned?

#3 Post by arango

Yes, in set_contact.F we need to have instead:

Code:

!
!  Set the switch to compute vertical interpolation weights. Currently,
!  they are only needed in non-coincident composite grids.
!
      IF (.not.ANY(Lcoincident).or.ANY(Lcomposite)) THEN
        get_Vweights=.TRUE.
      ELSE
        get_Vweights=.FALSE.
      END IF
We need .or. instead of .and. in the IF statement. After I made that change, the dogbone test case runs fine. Thank you for bringing this issue to our attention.
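The difference between the two conditions is easiest to see with both written out. The dogbone case is composite, and with the original .and. test Kdg was evidently never set, so presumably at least one contact region is flagged as coincident; .not.ANY(Lcoincident) is then false, and the .and. form switches the vertical weights off. The .or. form turns them on whenever any contact region is composite. The functions below (hypothetical names, not ROMS code) evaluate both forms on the Lcoincident and Lcomposite flags.

```fortran
module vweights_switch_sketch
  ! Original and corrected get_Vweights tests from set_contact,
  ! wrapped as functions. Names are hypothetical.
  implicit none
  private
  public :: switch_original, switch_fixed

contains

  ! Original: true only if NO contact region is coincident
  ! AND at least one contact region is composite.
  function switch_original(Lcoincident, Lcomposite) result(on)
    logical, intent(in) :: Lcoincident(:), Lcomposite(:)
    logical :: on
    on = .not.ANY(Lcoincident) .and. ANY(Lcomposite)
  end function switch_original

  ! Corrected: true if no contact region is coincident
  ! OR at least one contact region is composite.
  function switch_fixed(Lcoincident, Lcomposite) result(on)
    logical, intent(in) :: Lcoincident(:), Lcomposite(:)
    logical :: on
    on = .not.ANY(Lcoincident) .or. ANY(Lcomposite)
  end function switch_fixed

end module vweights_switch_sketch
```

With two contact regions flagged both coincident and composite (an assumed, dogbone-like setting), switch_original returns .false. and switch_fixed returns .true.; for coincident, non-composite contacts both return .false., so vertical weights are still skipped there.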
