problems on NETCDF4/PARALLEL_IO

General scientific issues regarding ROMS

Moderators: arango, robertson

lmkli
Posts: 24
Joined: Wed Aug 02, 2006 1:21 pm
Location: TAMU

problems on NETCDF4/PARALLEL_IO

#1 Post by lmkli

Although NetCDF4/PARALLEL_IO has been available for a long time, I only recently managed to compile it,
and I still have some problems when running it.
I used the UPWELLING case as a test and compiled two versions.
One was compiled with the NetCDF4/HDF5 libraries and PARALLEL_IO turned on; the other was compiled without PARALLEL_IO.
Both compiled successfully.
After running the two versions of the UPWELLING case, I got some results that confuse me.
(I set Lm and Mm in the ocean_upwelling.in file to 400 and 800.)

First, the two versions take very different times to finish the job: the PARALLEL_IO version takes 2 hours and 36 minutes,
while the NO_PARALLEL_IO version takes only 18 minutes.

Second, the sizes of the output NC files are very different. The PARALLEL_IO version's history file is 999M, while
the NO_PARALLEL_IO version's is 2.6G. The same is true of the avg/dia files.
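
To put the two observations above side by side, here is a small Python sketch that turns the reported run times and history-file sizes into ratios. The helper name `to_minutes` is made up for illustration, and the sizes are read as decimal megabytes and gigabytes, which is an assumption because the post does not say which units the file listing used.

```python
def to_minutes(hours, minutes):
    """Convert a wall-clock duration to whole minutes."""
    return hours * 60 + minutes

# Run times as reported in the post.
parallel_min = to_minutes(2, 36)   # PARALLEL_IO build
serial_min = to_minutes(0, 18)     # build without PARALLEL_IO
slowdown = parallel_min / serial_min

# History-file sizes as reported; decimal units are an assumption.
parallel_his_mb = 999.0
serial_his_mb = 2.6 * 1000.0
size_ratio = serial_his_mb / parallel_his_mb

print(f"PARALLEL_IO run took {slowdown:.1f}x as long as the serial-I/O run")
print(f"serial-I/O history file is {size_ratio:.1f}x the size of the PARALLEL_IO one")
```

So the run that was almost nine times slower also produced the smaller file, which is why the two symptoms are worth looking at separately.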

Third, the output log file reports that the PARALLEL_IO version has two more CPP flags than the NO_PARALLEL_IO
version: NETCDF4 and PARALLEL_IO. However, the file header of the output NC files shows that the PARALLEL_IO version
has different additional CPP flags: NETCDF4 and PERFECT_RESTART.
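
A mismatch like this is easier to see with a set difference. The sketch below is only an illustration: `parse_cpp_options` is a hypothetical helper, and the two flag strings are placeholders typed in by hand (`MPI` and `UPWELLING` stand in for whatever flags both lists share), not output copied from the actual log or file header.

```python
def parse_cpp_options(text):
    """Split a comma- or space-separated list of CPP flags into a set."""
    return set(text.replace(",", " ").split())

# Placeholder flag lists; only the extra flags come from the post.
log_flags = parse_cpp_options("MPI, UPWELLING, NETCDF4, PARALLEL_IO")
header_flags = parse_cpp_options("MPI, UPWELLING, NETCDF4, PERFECT_RESTART")

only_in_log = sorted(log_flags - header_flags)
only_in_header = sorted(header_flags - log_flags)

print("only in log file:   ", only_in_log)
print("only in file header:", only_in_header)
```

With the real strings pasted in, any flag that appears in one place but not the other shows up directly.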

BTW, the compiler is Intel Fortran and the MPI is openmpi-intel.

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: problems on NETCDF4/PARALLEL_IO

#2 Post by kate

Can you restart from the restart file? PERFECT_RESTART could explain the different file sizes you see, since it has to save more data. Did you ask for PERFECT_RESTART?

Do the output files look reasonable? How many processes did you run with?

lmkli
Posts: 24
Joined: Wed Aug 02, 2006 1:21 pm
Location: TAMU

Re: problems on NETCDF4/PARALLEL_IO

#3 Post by lmkli

I haven't tried restarting yet; I'll do that ASAP and post the result here.
I didn't define PERFECT_RESTART in either case.
I used NtileI = 8 and NtileJ = 16, for a total of 128 processors.
The output looks reasonable.
Even if PERFECT_RESTART can explain the file size of ocean_rst.nc, can it explain the size of the ocean_his.nc file?
I hoped PARALLEL_IO would speed up the computation for big problems (say 1000x1000x40 or more),
but the test didn't give me the result I wanted. PARALLEL_IO seems much slower.
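
For reference, the decomposition described in this thread (Lm = 400, Mm = 800 on 8 x 16 tiles) can be worked out per tile. This is a minimal sketch that assumes the interior points divide evenly among the tiles and ignores halo points; `tile_shape` is an illustrative name, not a ROMS routine.

```python
def tile_shape(lm, mm, ntile_i, ntile_j):
    """Interior points per tile in I and J, assuming an even split."""
    if lm % ntile_i or mm % ntile_j:
        raise ValueError("grid does not divide evenly among the tiles")
    return lm // ntile_i, mm // ntile_j

lm, mm = 400, 800          # interior grid size from ocean_upwelling.in
ntile_i, ntile_j = 8, 16   # tile partition in I and J

nprocs = ntile_i * ntile_j
points_i, points_j = tile_shape(lm, mm, ntile_i, ntile_j)

print(f"{nprocs} tiles, each {points_i} x {points_j} interior points")
```

Each of the 128 tiles therefore covers only 50 x 50 interior points in the horizontal, so every process contributes a fairly small piece to each write. Whether that explains the slowdown seen here is not something the thread confirms.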

arango
Site Admin
Posts: 1367
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: problems on NETCDF4/PARALLEL_IO

#4 Post by arango

Make sure that you use a recent version of the code. Last May, I made some corrections to improve parallel I/O efficiency; see the ticket.

Parallel I/O needs special computer architecture and communications. If data are written to disk over network cables, serial I/O is usually more efficient. See my simple tests in the above ticket. As you increase the number of nodes, you will be penalized substantially for the communications involving parallel I/O. In my opinion, parallel I/O is not for cluster computers that write data frequently to an external disk.
