Parallel I/O via the NetCDF-4/HDF5 libraries released

#1 Post by arango »

Finally, I am done with phase III of the parallel I/O implementation in ROMS. This was very tricky because it required a complete re-design of the I/O structure of ROMS to get a generic parallel interface (mod_netcdf.F) for all input and output files. It is highly recommended to use the routines in this module for any non-tiled NetCDF I/O processing. Notice that most of the calls to the NetCDF library functions (nf90_***) are made from the ROMS module (mod_netcdf). The exceptions are def_dim.F, def_info.F, def_var.F, and reading (nf_fread*d.F) and writing (nf_fwrite*d.F) routines.
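
For reference, the call sequence below is a minimal, stand-alone sketch of the kind of non-tiled read that the mod_netcdf routines wrap. It uses only standard NetCDF Fortran-90 library calls (nf90_open, nf90_inq_varid, nf90_get_var, nf90_close); the file and variable names are made up for illustration and do not come from ROMS.

Code: Select all

      PROGRAM read_scalar
!
!  Minimal example of a non-tiled NetCDF read using the Fortran-90
!  library directly.  In ROMS this kind of call sequence is wrapped
!  by the routines in mod_netcdf.F.  The file and variable names are
!  hypothetical.
!
      USE netcdf
      implicit none
      integer :: ncid, varid, status
      real :: theta_s

      status=nf90_open('ocean_his.nc', NF90_NOWRITE, ncid)
      IF (status.ne.nf90_noerr) STOP 'cannot open file'
      status=nf90_inq_varid(ncid, 'theta_s', varid)
      IF (status.ne.nf90_noerr) STOP 'cannot find variable'
      status=nf90_get_var(ncid, varid, theta_s)
      IF (status.ne.nf90_noerr) STOP 'cannot read variable'
      status=nf90_close(ncid)
      PRINT *, 'theta_s = ', theta_s
      END PROGRAM read_scalar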

During the update of the input (nf_fread*d) routines, I noticed that we were processing the ghost points during reading in distributed-memory applications by setting the local variable Nghost:

Code: Select all

      IF (model.eq.iADM) THEN
        Nghost=0
      ELSE
        Nghost=GHOST_POINTS
      END IF
This logic is now completely removed and Nghost=0. That is, the tile ghost points are not filled during reading. These points are processed elsewhere with the appropriate calls to the distributed-memory exchange routines in mp_exchange.F.
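
The split between reading and ghost-point filling can be illustrated with a toy example, shown here as a conceptual sketch only: a one-dimensional array is decomposed among MPI processes, the interior points stand in for the data obtained from the NetCDF read, and the ghost cells are filled afterwards with plain MPI_Sendrecv calls. This is not the mp_exchange.F interface; all names and sizes are made up.

Code: Select all

      PROGRAM halo_exchange
!
!  Conceptual sketch: each rank owns the interior of a 1-D array plus
!  one ghost cell on each side.  The interior values stand in for the
!  data obtained from the NetCDF read; the ghost cells are filled
!  afterwards by exchanging with the neighboring ranks, which is the
!  role that mp_exchange.F plays for the ROMS tiled arrays.
!
      USE mpi
      implicit none
      integer, parameter :: Nint=4          ! interior points per rank
      integer :: rank, nprocs, left, right, ierr
      real :: A(0:Nint+1)                   ! interior plus 2 ghost cells

      CALL mpi_init (ierr)
      CALL mpi_comm_rank (MPI_COMM_WORLD, rank, ierr)
      CALL mpi_comm_size (MPI_COMM_WORLD, nprocs, ierr)
!
!  Fill interior points only (this is what the reading routines do).
!
      A=0.0
      A(1:Nint)=REAL(rank)
!
!  Neighbor ranks (non-periodic: MPI_PROC_NULL at the boundaries).
!
      left =MERGE(rank-1, MPI_PROC_NULL, rank.gt.0)
      right=MERGE(rank+1, MPI_PROC_NULL, rank.lt.nprocs-1)
!
!  Fill the ghost cells: pass the last interior point to the right
!  neighbor and receive the left ghost cell, then the reverse.
!
      CALL mpi_sendrecv (A(Nint), 1, MPI_REAL, right, 1, A(0), 1,       &
     &                   MPI_REAL, left, 1, MPI_COMM_WORLD,             &
     &                   MPI_STATUS_IGNORE, ierr)
      CALL mpi_sendrecv (A(1), 1, MPI_REAL, left, 2, A(Nint+1), 1,      &
     &                   MPI_REAL, right, 2, MPI_COMM_WORLD,            &
     &                   MPI_STATUS_IGNORE, ierr)

      CALL mpi_finalize (ierr)
      END PROGRAM halo_exchange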

:idea: Warning: With parallel I/O there are many combinations of the NetCDF and HDF5 libraries that may need to be built. The build scripts build.sh and build.bash have changed accordingly. You may build the NetCDF-3 and NetCDF-4 libraries in serial. If you create files in the new NetCDF-4 format (NETCDF4 C-preprocessing option), you need serial versions of the NetCDF-4/HDF5 libraries; this is the case in serial and shared-memory ROMS applications. If you want parallel I/O, you need to compile NetCDF-4/HDF5 with the MPI library as explained below.

Parallel I/O in ROMS:
  • To activate parallel I/O in ROMS you need to turn on both PARALLEL_IO and NETCDF4 C-preprocessing options. You also need to compile ROMS with the MPI library. That is, the macro USE_MPI must be activated in the makefile, build.sh or build.bash scripts.
  • Parallel I/O is only possible with the NetCDF-4 and HDF5 libraries.
  • The NetCDF-4 and HDF5 libraries must be built with the same compiler and compiler options.
  • The HDF5 library (version 1.8.1) must be built with the --enable-parallel flag. The NetCDF configure script will detect the parallel capability of HDF5 and build the NetCDF parallel I/O features automatically.
  • Parallel I/O is only possible in distributed-memory applications and requires an implementation of MPI-2. Use, for example, the :arrow: MPICH2 implementation. Check also this :arrow: link. It does not work yet with the :arrow: OpenMPI library because the variable MPI_COMM_WORLD is always zero in calls to mpi_init, and we need a non-zero value for the HDF5 library to work. We reported this problem to the OpenMPI developers.
  • Parallel I/O requires the MPI-IO layer which is part of MPI-2. You need to be sure that the MPI-IO layer is activated when building the implementation of MPI-2.
  • In MPI parallel I/O applications, the processing can be Collective or Independent. In the NetCDF-4/HDF5 library the parallel access is set by calling the function nf90_var_par_access(ncid, varid, access) for each I/O variable. Calling this function affects only the open file; this information is not written to the data file. The default is to treat all variables as Collective. The parallel access status lasts as long as the file is open or until it is changed, and the function can be called as often as desired. Independent access means that the processing of a variable does not depend on, and is not affected by, other parallel processes (nodes); this is the case for ROMS non-tiled variables. Conversely, Collective access implies that all parallel processes participate in the processing; this is the case for ROMS tiled variables: each node in the group reads/writes its own tile data when PARALLEL_IO is activated. A stand-alone sketch is given after this list.
  • File compression (DEFLATE C-preprocessing option) is not possible when writing data with parallel I/O because compression makes it impossible for the HDF5 library to map the data exactly to its location on disk. However, deflated data can be read with parallel I/O.
  • Parallel I/O performance gains are mostly seen on multi-core computer architectures. However, parallel I/O can still be exercised on a Linux workstation, where multiple processes can emulate multiple processors.
  • The MPICH2 library generates a lot of messages to standard output. Therefore, use the following command when running ROMS to get rid of all those annoying messages:

    Code: Select all

    mpirun -np 4 oceanM ocean.in >& log < /dev/null &
:idea: :idea: Warning: The mpirun script is different for MPICH, MPICH2, and OpenMPI implementations of MPI :!: You always need to use the appropriate script :!:
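
To make the Collective/Independent access discussed in the list above more concrete, here is a small stand-alone sketch (not ROMS code) in which every MPI process writes its own slab of a variable to a NetCDF-4 file created for parallel access. The file and variable names are invented; the relevant library calls are nf90_create with the comm and info arguments and nf90_var_par_access.

Code: Select all

      PROGRAM parallel_write
!
!  Stand-alone sketch of NetCDF-4/HDF5 parallel output: every MPI
!  process writes its own slab of the variable "field" collectively.
!  File and variable names are hypothetical, not ROMS conventions.
!
      USE mpi
      USE netcdf
      implicit none
      integer, parameter :: Nx=16            ! points per process
      integer :: rank, nprocs, ierr
      integer :: ncid, dimid, varid, status
      real :: field(Nx)

      CALL mpi_init (ierr)
      CALL mpi_comm_rank (MPI_COMM_WORLD, rank, ierr)
      CALL mpi_comm_size (MPI_COMM_WORLD, nprocs, ierr)
      field=REAL(rank)
!
!  Create a NetCDF-4 file for parallel access.  This requires HDF5
!  built with --enable-parallel and NetCDF-4 built on top of it.
!
      status=nf90_create('parallel_test.nc',                            &
     &                   IOR(NF90_NETCDF4, NF90_MPIIO), ncid,           &
     &                   comm=MPI_COMM_WORLD, info=MPI_INFO_NULL)
      status=nf90_def_dim(ncid, 'x', Nx*nprocs, dimid)
      status=nf90_def_var(ncid, 'field', NF90_REAL, (/dimid/), varid)
      status=nf90_enddef(ncid)
!
!  Request Collective access for this variable (the default); use
!  nf90_independent instead for Independent access.
!
      status=nf90_var_par_access(ncid, varid, nf90_collective)
!
!  Each process writes its own slab, analogous to a ROMS tile.
!
      status=nf90_put_var(ncid, varid, field,                           &
     &                    start=(/rank*Nx+1/), count=(/Nx/))
      status=nf90_close(ncid)
      CALL mpi_finalize (ierr)
      END PROGRAM parallel_write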

HDF5 and NetCDF-4 compiling notes:
  • As mentioned above, we have been unable to get the Fortran 90 interface of HDF5/NetCDF4 to work with OpenMPI. We have even tried the latest OpenMPI 1.3, released on January 19, 2009. Unfortunately, all the parallel tests that come with NetCDF4 are C codes and thus do not catch this error.
  • At this time we suggest you use MPICH2 to compile HDF5, NetCDF4 and ROMS. If you discover other parallel compilers that work, please let us know.
  • You will need separate HDF5 and NetCDF4 libraries for each compiler/MPI combination you want to use. For example, if you want to be able to run in both serial and parallel, you will need separate serial versions of HDF5 and NetCDF4. If you have two different MPI implementations with the same compiler (e.g., MPICH2 and MVAPICH2 for your PGI compiler), you will need to compile a separate HDF5 and NetCDF4 library for each MPI implementation.
  • When configuring HDF5, parallel libraries will be built automatically if your CC environment variable is set to mpicc. If your parallel compiler has a non-standard name, you will probably need to use the --enable-parallel flag when configuring. NetCDF4's configure script will recognize that HDF5 was built for parallel I/O and turn on NetCDF4's parallel I/O features.
  • When configuring NetCDF4 you MUST include the --enable-netcdf4 option to build the version 4 interface (including parallel I/O) for NetCDF.
  • We had a lot of problems compiling NetCDF 4.0 and NetCDF 4.0.1-beta2 releases and ended up using the daily snapshot instead. We are using the snapshot from January 12th 2009.
  • If building with gfortran, g95, pgi, or ifort (and possibly others), it is important to set the CPPFLAG -DpgiFortran for HDF5 and NetCDF4, or you will get name mismatches within the resulting libraries. A quick link test is sketched below.
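
As a quick sanity check for the name-mismatch issue in the last item, the trivial program below (my suggestion, not part of the NetCDF distribution) can be compiled and linked against the freshly built libraries. If the Fortran/C name mangling is inconsistent, the link step fails; on success it simply prints the NetCDF library version.

Code: Select all

      PROGRAM link_test
!
!  Trivial link test: if HDF5/NetCDF-4 were built with inconsistent
!  Fortran/C name mangling, this program fails to link.  On success
!  it prints the NetCDF library version string.
!
      USE netcdf
      implicit none
      PRINT *, 'NetCDF library version: ', TRIM(nf90_inq_libvers())
      END PROGRAM link_test
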
:idea: :idea: :idea: Warning: Please use ROMS parallel I/O wisely. Do not overdo it by running relatively small grids with a lot of CPU processes; this will decrease efficiency because of the overhead of MPI communications. Recall that there is always an optimal number of processes for a particular application :!: Let's start testing and documenting the gains of using parallel I/O on different architectures. Please post your results in this forum.

There is an overhead at the beginning when the output NetCDF files are created. This is due to the several scalars and application parameters that are defined by def_info.F and written by wrt_info.F. I am not too worried about this right now because it only happens when the file is created. I am investigating writing these into a structure or group; notice that structures and groups are part of HDF5. Processing structures is easy in C but more complicated in Fortran-90. My problem with structures is that we need to know their size in bytes in advance. The groups look promising; check the NetCDF-4 documentation on groups, compound types, and user-derived types for more details. By the way, the NetCDF-4 file format is actually an HDF5 file with the NetCDF self-describing metadata design.
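
To give a flavor of what writing the scalar parameters into a group might look like, here is a hedged sketch using the NetCDF-4 group interface (nf90_def_grp). The file, group, and variable names are invented, and this is only an idea being explored, not the current def_info.F and wrt_info.F code.

Code: Select all

      PROGRAM group_sketch
!
!  Sketch of writing scalar application parameters into a NetCDF-4
!  group.  The file, group, and variable names are hypothetical;
!  this is an idea being explored, not the current def_info.F and
!  wrt_info.F code.
!
      USE netcdf
      implicit none
      integer :: ncid, grpid, varid, status
      real :: theta_s = 7.0

      status=nf90_create('ocean_his.nc', NF90_NETCDF4, ncid)
      status=nf90_def_grp(ncid, 'scalar_parameters', grpid)
      status=nf90_def_var(grpid, 'theta_s', NF90_REAL, varid)
      status=nf90_enddef(ncid)
      status=nf90_put_var(grpid, varid, theta_s)
      status=nf90_close(ncid)
      END PROGRAM group_sketch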
