Hi all,
I recently ran into a strange problem after migrating to ROMS 3.0. My application works fine in a serial run but blows up in a parallel run (tiling 1 x 4). After the blow-up, one of the compute nodes died. This has happened repeatedly. Any suggestions are greatly appreciated. Thanks,
Shih-Nan
My configuration:
Operating system : Suse Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /usr/local/mvapich/bin/mpif90
Compiler flags : -ip -O3 -xW -free -free
Resolution, Grid 01: 0192x0101x020, Parallel Nodes: 4, Tiling: 001x004
USE_MPI ?= on
USE_MPIF90 ?= on
My cpp :
#define UV_LOGDRAG
#define UV_ADV
#define UV_PSOURCE
#define DJ_GRADPS
#define TS_MPDATA
#define MIX_GEO_TS
#define TS_PSOURCE
#define NONLIN_EOS
#define SALINITY
#define MASKING
#define SOLVE3D
#define SPLINES
#define RADIATION_2D
#define TCLM_NUDGING /* Nudging of tracer climatology */
#define TCLIMATOLOGY /* Processing of tracer climatology */
#define SOUTH_TNUDGING
#define NORTH_TNUDGING
#define WEST_TNUDGING
#define EASTERN_WALL
#define NORTH_FSCHAPMAN
#define NORTH_M2FLATHER
#define NORTH_M3RADIATION
#define NORTH_TRADIATION
#define WEST_FSCHAPMAN
#define WEST_M2FLATHER
#define WEST_M3RADIATION
#define WEST_TRADIATION
#define SOUTH_FSCHAPMAN
#define SOUTH_M2FLATHER
#define SOUTH_M3RADIATION
#define SOUTH_TRADIATION
#undef SOUTH_FSGRADIENT
#undef SOUTH_M2GRADIENT
#define ANA_INITIAL
#define ANA_TCLIMA
#define ANA_PSOURCE
#define ANA_SMFLUX
#define ANA_SRFLUX
#define ANA_SSFLUX
#define ANA_STFLUX
#define ANA_BSFLUX
#define ANA_BTFLUX
#define ANA_FSOBC
#define ANA_M2OBC
#define ANA_TOBC
#define ANA_SEDIMENT
#define ANA_SPFLUX
#define ANA_BPFLUX
#define GLS_MIXING
#ifdef GLS_MIXING
# define N2S2_HORAVG
# define CANUTO_A
# undef KANTHA_CLAYSON
# undef CRAIG_BANNER
# undef CHARNOK
# undef ZOS_HSIG
# undef TKE_WAVEDISS
#endif
#define SEDIMENT
#ifdef SEDIMENT
# define SUSPLOAD
# undef BEDLOAD_SOULSBY
# undef BEDLOAD_MPM
# undef SED_DENS
# undef SED_MORPH
# undef SED_BIODIFF
#endif
cluster node died after model blow up (3.0 version)
- m.hadfield
Try re-making the model with USE_DEBUG=on. This will enable bounds checking, which might well reveal problems with the code when you run it. If not, you need to learn more about the nature of the crash: where it occurs and when. Run the model until just before the crash, save the model history fields, and examine them for clues.
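As a rough sketch of that suggestion, the debug switch would sit next to the MPI switches already quoted in the original post. This assumes the makefile exposes USE_DEBUG in the same "?=" style as USE_MPI and USE_MPIF90; check your own makefile for the exact spelling, and do a clean rebuild afterwards so that every file is recompiled with the debug flags.

```
# Assumed placement: alongside the existing switches in the ROMS makefile
USE_DEBUG ?= on
USE_MPI ?= on
USE_MPIF90 ?= on
```

A debug build will probably run much more slowly than the -O3 build, so a short run that just reaches the blow-up is likely enough.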
By the way, what does the model print to stdout?