problems with increasing the number of processors


pavel_fayman
Posts: 7
Joined: Thu Feb 24, 2011 2:40 pm
Location: FERHRI

#1 by pavel_fayman

Dear colleagues, can anyone suggest what the problem might be?

When I use 60 processors or fewer, for example:
mpirun -np 60 romsM jes.in
it works well.

When I increase the number of processors, for example:
mpirun -np 70 romsM jes.in
the result is:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

adnelson
Posts: 4
Joined: Thu Jun 18, 2020 7:49 pm
Location: University of Rhode Island

#2 by adnelson

Did you set NtileI and NtileJ in the .in file correctly (NtileI x NtileJ = nproc)?
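
As a quick sanity check of that condition, here is a minimal Python sketch. It assumes the usual ROMS input convention of `NtileI == <value>` lines with `!` starting a comment, and a single (non-nested) grid. The function names `read_tiles` and `check_tiles` and the sample tile values are illustrative; they are not taken from the attached jes.in and are not part of ROMS.

```python
import re

def read_tiles(text):
    """Return (NtileI, NtileJ) parsed from the text of a ROMS .in file.

    Assumes lines such as 'NtileI == 6   ! I-direction partition'.
    """
    tiles = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0]  # drop any trailing comment
        m = re.match(r"\s*(NtileI|NtileJ)\s*=+\s*(\d+)", line)
        if m:
            tiles[m.group(1)] = int(m.group(2))
    return tiles["NtileI"], tiles["NtileJ"]

def check_tiles(text, nproc):
    """True when NtileI * NtileJ equals the -np value given to mpirun."""
    ntile_i, ntile_j = read_tiles(text)
    return ntile_i * ntile_j == nproc

# Hypothetical excerpt of an input file, for illustration only.
sample = """
NtileI == 6     ! I-direction partition
NtileJ == 10    ! J-direction partition
"""
print(check_tiles(sample, 60))  # True
print(check_tiles(sample, 70))  # False
```

With these sample values, going from `-np 60` to `-np 70` without editing the tile counts would fail the check, which is the kind of mismatch worth ruling out first.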

Does the system you're using have that many processors, or allow you to use that many processors?
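
One way to see how many processors the operating system actually exposes is a short Python check, sketched here. `os.sched_getaffinity` exists only on Linux, so the sketch falls back to `os.cpu_count()` elsewhere. The helper name `usable_cpus` is illustrative, and the value 70 simply mirrors the failing `-np` setting.

```python
import os

def usable_cpus():
    """Number of logical CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))  # Linux: respects affinity masks
    return os.cpu_count() or 1  # portable fallback

requested = 70  # the -np value passed to mpirun
available = usable_cpus()
if requested > available:
    print(f"requested {requested} ranks but only {available} CPUs are visible")
else:
    print(f"{available} CPUs visible, enough for {requested} ranks")
```

This only reports the machine it runs on, so on a multi-node setup it would need to be run on each node.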

I've also run into a problem before when a core on a node on a supercomputer was down (16 cores per node, but one didn't work). Are you able to see which nodes/cores you're using?

If none of these, could you provide some model setup info (resolution, etc.) and system info (number of available processors, RAM, etc.) that could help diagnose the problem?

pavel_fayman
Posts: 7
Joined: Thu Feb 24, 2011 2:40 pm
Location: FERHRI

#3 by pavel_fayman

*.h, *.in and *.bash files are attached.

system info:
Supermicro SYS-6029P-WTRT
Intel® Xeon Gold 6152
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
stepping : 4
microcode : 0x200005e
cpu MHz : 1000.651
cache size : 30976 KB
physical id : 0
siblings : 44
core id : 0
cpu cores : 22
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

768 GB of memory in 64 GB DDR4 2666 MHz (PC4-21300) LR ECC Registered DIMM modules.
Only 3 nodes.
Interconnect: Eltex MES5324 10G.

Computations are performed on a 512 GB virtual machine with 80 cores.
Attachments:
jes77.h (9.27 KiB)
jes.in (130.76 KiB)
build_jes.bash (17.23 KiB)
