"Basic Groundwork"(MPI-choice/ etc) -PGI_Compiled

Discussion on computers, ROMS installation and compiling

Moderators: arango, robertson


"Basic Groundwork"(MPI-choice/ etc) -PGI_Compiled

#1 Post by timchipman »

Hi all,

Sorry if this is "obvious" stuff, but I've trawled the forums and am having a hard time putting the pieces together. If someone can answer a few things more or less clearly, it would help me tremendously.

Basic context: I'm a sysadmin setting up a cluster to run ROMS 2.2 (possibly 3.0 at a later time). The cluster is two-way dual-core Opteron, gigabit-Ethernet interconnect, Rocks (RedHat/CentOS) platform, using the PGI compiler to build (Intel/ifort seemed too hellish for MPI after first attempts).

I've got a "successful" (i.e., it works but performance stinks) build of ROMS as follows:

- PGI compiler suite (latest and greatest version)
- MPICH for MPI, default config/install, compiled with PGI "by hand"

"performance stinks" means that as we add more CPUs the overall runtime *increases*. Thus, a 4-cpu job (single node) takes 30minutes with a test data set; then the same data set on 8-CPU run takes approx 60 minutes, and then 16 CPU it takes about 80-90minutes. In all cases they run as straight MPI-only job though, launched in identical manner. (Brief review of output suggests that "Halo exchange" is punishing us with the MPI scale-up? and also that 2d analysis phase in particular is suffering .. ? but alas I'm not really familiar with this, being a "sysadmin-type", not a "modeller-type")

I'm curious,

* What is the recommended MPI for best performance with ROMS 2.2? With ROMS 3.0? (One posting I've seen suggests that LAM is much better than MPICH; it now seems LAM has been replaced by OpenMPI, though, and I see no mention of it anywhere... I also gather PGI may have an integrated "tuned" MPI of some kind available too, not just an MPICH rebuild?)

* Is there any option (now? future?) for "hybrid" builds, i.e., OpenMP for SMP operation within a single SMP cluster node, but an MPI job spanning multiple nodes in the cluster? I've seen on-and-off discussion of this topic in the forum but haven't exactly seen a clear consensus...

* If anyone feels so inclined, pointers or specific build hints for a recommended, known-to-work-well MPI/PGI build setup would certainly be **VERY** welcome also. For that matter, comments on the possible benefits of migrating from ROMS 2.2 to 3.x would also not be unwelcome :-)

(I've tried, for example, to build my ROMS using not just MPICH but also LAM and OpenMPI. The OpenMPI build attempts have simply failed so far - odd link/library issues (?) - and my attempt to build with LAM has been semi-successful, in that I believe I now have a compiled binary that launches, but I'm not certain it actually works. More testing is needed on calling/launching it properly; a slight hassle since the cluster uses passwordless SSH rather than RSH, which is LAM's default - see the sketch below.)
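For reference, the launch recipe I'm experimenting with looks roughly like this (a sketch only - LAMRSH is LAM's remote-shell override, and the host file, process count, and executable/input names are placeholders for our setup):

Code: Select all

export LAMRSH="ssh -x"            # start remote LAM daemons via ssh, not rsh
lamboot -v my_hostfile            # boot the LAM runtime across the cluster
mpirun -np 8 ./oceanM ocean.in    # launch the MPI ROMS executable
lamhalt                           # shut the runtime down afterwards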

If anyone actually recommends it, I can also do builds using ifort, not just PGI, but I gather that ifort is a bit messy for building MPI ROMS... (?)

And - of course - I will summarize and post back to this thread any findings / progress I have with this topic, in case it is of use / interest to others.

Many thanks,

--Tim Chipman


#2 Post by RubenDiez-Lazaro »

The poor performance could be caused by network overload...

When you run an application on only one physical node (4 processes, one per virtual processor), is there traffic on the network? If there is network traffic while using only one physical node (4 processes in your configuration), something is wrong...

Communication among processes on the SAME physical machine must use memory, not the network...
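One quick way to check (a sketch, assuming a standard Linux node and that eth0 is the cluster interface) is to watch the kernel's interface counters while the single-node job runs:

Code: Select all

# Sample the eth0 byte counters once a second while the 4-process,
# single-node job runs; steadily climbing numbers mean MPI traffic is
# going over the wire instead of through shared memory.
while true; do grep eth0 /proc/net/dev; sleep 1; done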

I tested several MPI implementations and finally settled on MPICH2.

In order to use memory for communication between processes on the same physical machine, you must compile the MPICH2 package, passing the configure script the option:

Code: Select all

--with-device=ch3:ssm
You can read page 12 ("Choose the communication device") of the "MPICH2 Installer's Guide" from the official MPICH site for more info about this option...
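A full build sequence might look like this (a sketch only; the install prefix and PGI compiler names are assumptions for your site):

Code: Select all

# Build MPICH2 with the PGI compilers and the ch3:ssm device
# (shared memory within a node, TCP sockets between nodes).
export CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90
./configure --prefix=/opt/mpich2-pgi --with-device=ch3:ssm
make
make install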

Regards


#3 Post by timchipman »

Hi,

Many thanks for the reply. I'm not certain I compiled MPICH2 with the flag you specified. I will double-check this and see.

Also worth mentioning (since it was absent from my original post): network traffic appears very low in these cases.


Tim


Re: "Basic Groundwork"(MPI-choice/ etc) -PGI_Compiled

#4 Post by lefevre »

Some clues:

With the Tyan S2885 (two-way, dual-core Opteron 285), ROMS scales very well, even using all cores per node. But I use OpenMPI + ifort (same result with PGI + OpenMPI). This has been true since I upgraded my gigabit Ethernet to InfiniBand; before, with gigabit Ethernet, the scaling was very poor past 2 nodes.

Attention: processor and memory affinity are an issue, and if your processes ping-pong from one processor to another (check with "taskset"), that is bad. OpenMPI does a very good job with CPU affinity and with MPI in general. Please look at the OpenMPI forum and FAQ; a quick check is sketched below.
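For example (a sketch; the PID is a placeholder, and mpi_paffinity_alone is the run-time affinity parameter in OpenMPI of that era):

Code: Select all

# Show which CPUs a running ROMS process may be scheduled on; a mask
# covering all CPUs means the kernel may bounce it between processors.
taskset -p <pid-of-oceanM>

# Pin each MPI rank to its own processor at launch time.
mpirun --mca mpi_paffinity_alone 1 -np 8 ./oceanM ocean.in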

Best regards,
Jerome Lefevre
