I'm trying to run the dogbone composite test case.
It fails right after reading the initial conditions, apparently because the k index of the donor grid (kdg) does not contain a sensible value.
It must be a problem at my end, since dogbone seems to run successfully for many others.
OS Ubuntu 18.04
gfortran, gcc version 7.5.0
netCDF-Fortran 4.4.4
mpirun (Open MPI) 2.1.1
ROMS checkout April 1 2021, SVN Revision : 1053
The tile configuration for this test run, set in roms_dogbone_composite.in, is:
NtileI == 1 1 ! I-direction partition
NtileJ == 2 2 ! J-direction partition
I launch the run with:
mpirun -np 2 romsG roms_dogbone_composite.in
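For what it's worth, the tiling itself looks consistent with the process count: ROMS needs NtileI*NtileJ to equal the number of MPI processes for each grid, and here that is 1 x 2 = 2 on both grids. A small Python sketch of that check follows; the helper name tiles_per_grid is my own and not part of ROMS, and the parsing assumes the simple "KEY == values ! comment" layout shown above.

```python
import re

def tiles_per_grid(text):
    """Return NtileI*NtileJ for each nested grid listed in the input text."""
    values = {}
    for key in ("NtileI", "NtileJ"):
        # Capture everything between '==' and the '!' comment (or end of line).
        match = re.search(rf"^\s*{key}\s*==\s*([^!\n]*)", text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{key} not found")
        values[key] = [int(tok) for tok in match.group(1).split()]
    return [i * j for i, j in zip(values["NtileI"], values["NtileJ"])]

config = """
NtileI == 1 1 ! I-direction partition
NtileJ == 2 2 ! J-direction partition
"""
nprocs = 2  # value passed to mpirun -np

products = tiles_per_grid(config)
print(products)                             # tiles per grid
print(all(p == nprocs for p in products))   # does every grid match -np?
```

So the partition does not look like the cause.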
However, running it on 2 CPUs produces the following error message (full log attached as a .txt file):
...
NLM: GET_STATE - Reading state initial conditions, 0001-01-01 00:00:00.00
(Grid 02, t = 0.0000, File: dogbone_ini_right.nc, Rec=0001, Index=1)
- free-surface
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated u-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- vertically integrated v-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- u-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- v-momentum component
(Min = 0.00000000E+00 Max = 0.00000000E+00)
- potential temperature
(Min = 1.00000000E+01 Max = 1.00000000E+01)
- salinity
(Min = 3.50000000E+01 Max = 3.50000000E+01)
At line 3986 of file nesting.f90
Fortran runtime error: Index '-1094795586' of dimension 2 of array 'ac' below lower bound of 1
Error termination. Backtrace:
At line 3986 of file nesting.f90
Fortran runtime error: Index '-1094795586' of dimension 2 of array 'ac' below lower bound of 1
Error termination. Backtrace:
#0 0x7faf7d9d52ed in ???
#1 0x7faf7d9d5ed5 in ???
#2 0x7faf7d9d62a7 in ???
#0 0x7fe816e322ed in ???
#1 0x7fe816e32ed5 in ???
#2 0x7fe816e332a7 in ???
#3 0x55a6f7583ee5 in __nesting_mod_MOD_put_contact3d
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:3986
#4 0x55a6f76c51fc in put_composite
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:1801
#5 0x55a6f77723bc in __nesting_mod_MOD_nesting
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:300
#3 0x56404f097ee5 in __nesting_mod_MOD_put_contact3d
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:3986
#4 0x56404f1d91fc in put_composite
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:1801
#5 0x56404f2863bc in __nesting_mod_MOD_nesting
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/nesting.f90:300
#6 0x55a6f6cc24be in initial_
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/initial.f90:289
#7 0x55a6f6a537e2 in __ocean_control_mod_MOD_roms_initialize
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/ocean_control.f90:142
#6 0x56404e7d64be in initial_
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/initial.f90:289
#8 0x55a6f6a4e966 in ocean
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:96
#9 0x55a6f6a4f0d6 in main
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:51
#7 0x56404e5677e2 in __ocean_control_mod_MOD_roms_initialize
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/ocean_control.f90:142
#8 0x56404e562966 in ocean
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:96
#9 0x56404e5630d6 in main
at /home/jes/03_science/07_ROMS_models/023/simulations/dogbone_byr/Build_dogbone_byr/master.f90:51
=================================================================
=================================================================
==9387==ERROR: LeakSanitizer: detected memory leaks
==9388==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 2240 byte(s) in 2 object(s) allocated from:
Direct leak of 2240 byte(s) in 2 object(s) allocated from:
#0 0x7faf7e343b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#0 0x7fe8177a0b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7faf5f40c6b6 in mca_coll_sync_comm_query (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_coll_sync.so+0x16b6)
#1 0x7fe8006d96b6 in mca_coll_sync_comm_query (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_coll_sync.so+0x16b6)
Direct leak of 68 byte(s) in 12 object(s) allocated from:
Direct leak of 68 byte(s) in 12 object(s) allocated from:
#0 0x7faf7e2dc538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
#0 0x7fe817739538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
#1 0x7faf60075087 (<unknown module>)
#1 0x7fe801342087 (<unknown module>)
Direct leak of 29 byte(s) in 1 object(s) allocated from:
Direct leak of 29 byte(s) in 1 object(s) allocated from:
#0 0x7faf7e2dc538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
#0 0x7fe817739538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
#1 0x7faf60726ed0 (<unknown module>)
#1 0x7fe8019f3ed0 (<unknown module>)
#2 0x75762e6f65672e34 (<unknown module>)
#2 0x75762e6f65672e34 (<unknown module>)
Direct leak of 11 byte(s) in 1 object(s) allocated from:
Direct leak of 11 byte(s) in 1 object(s) allocated from:
#0 0x7faf7e343b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#0 0x7fe8177a0b40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x7faf70ca1dad (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pmix_pmix112.so+0x1ddad)
#1 0x7fe80a160dad (/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pmix_pmix112.so+0x1ddad)
SUMMARY: AddressSanitizer: 2348 byte(s) leaked in 16 allocation(s).
SUMMARY: AddressSanitizer: 2348 byte(s) leaked in 16 allocation(s).
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[17814,1],1]
Exit code: 1
--------------------------------------------------------------------------
The loop in put_contact3d (nesting.f90, around line 3986) where it fails, with the lines involved marked by arrows:
!
! Interpolate.
!
DO k=LBk,UBk
DO m=1,Npoints
i=contact(cr)%Irg(m)
j=contact(cr)%Jrg(m)
kdg=contact(cr)%Kdg(k,m) ! <----
kdgm1=MAX(kdg-1,Kmin)
IF (((Istr.le.i).and.(i.le.Iend)).and. &
& ((Jstr.le.j).and.(j.le.Jend))) THEN
cff(1)=contact(cr)%Lweight(1,m)*contact(cr)%Vweight(1,k,m)
cff(2)=contact(cr)%Lweight(2,m)*contact(cr)%Vweight(1,k,m)
cff(3)=contact(cr)%Lweight(3,m)*contact(cr)%Vweight(1,k,m)
cff(4)=contact(cr)%Lweight(4,m)*contact(cr)%Vweight(1,k,m)
cff(5)=contact(cr)%Lweight(1,m)*contact(cr)%Vweight(2,k,m)
cff(6)=contact(cr)%Lweight(2,m)*contact(cr)%Vweight(2,k,m)
cff(7)=contact(cr)%Lweight(3,m)*contact(cr)%Vweight(2,k,m)
cff(8)=contact(cr)%Lweight(4,m)*contact(cr)%Vweight(2,k,m)
Ar(i,j,k)=cff(1)*Ac(1,kdgm1,m)+ &
& cff(2)*Ac(2,kdgm1,m)+ &
& cff(3)*Ac(3,kdgm1,m)+ &
& cff(4)*Ac(4,kdgm1,m)+ &
& cff(5)*Ac(1,kdg ,m)+ & ! <----
& cff(6)*Ac(2,kdg ,m)+ &
& cff(7)*Ac(3,kdg ,m)+ &
& cff(8)*Ac(4,kdg ,m)
Ar(i,j,k)=Ar(i,j,k)*Amask(i,j)
END IF
END DO
END DO
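To narrow down which contact points carry a bad donor level, the same range test can be applied to the whole Kdg table before it is used as an index. This is only a sketch in Python with a made-up table: find_bad_kdg, kmin and kmax are hypothetical names, and the nested list stands in for contact(cr)%Kdg(k,m).

```python
def find_bad_kdg(kdg, kmin, kmax):
    """Return (k, m, value) for every donor level outside [kmin, kmax]."""
    bad = []
    for k, row in enumerate(kdg, start=1):        # k: receiver level
        for m, value in enumerate(row, start=1):  # m: contact point
            if not kmin <= value <= kmax:
                bad.append((k, m, value))
    return bad

# Hypothetical 2-level, 3-point table with one garbage entry.
kdg = [[1, 2, 2],
       [2, -1094795586, 3]]
print(find_bad_kdg(kdg, kmin=1, kmax=3))
```

An equivalent IF test in the Fortran loop would show whether all points are affected or only some of them.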
Printing the indices inside the loop with
write(*,*) kdg,i,j,kdgm1
gives
-1094795586 22 9 1
-1094795586 22 0 1
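One thing I noticed: the bad value is the same byte repeated four times. The executable is linked against libasan (see the LeakSanitizer output above), and as far as I know AddressSanitizer fills freshly allocated memory with 0xBE by default, so this may mean Kdg is being read before it is ever assigned rather than being computed wrongly. That interpretation is an assumption on my part; the byte pattern itself can be checked with a few lines of Python:

```python
import struct

bad_index = -1094795586  # value printed for kdg above

# Reinterpret the signed 32-bit integer as its raw little-endian bytes.
raw = struct.pack("<i", bad_index)
print(raw.hex())

# True when all four bytes are identical, typical of pattern-filled memory.
print(len(set(raw)) == 1)
```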
Any suggestions would be much appreciated.
Stefan