[Smeagol-discuss] compiling siesta in parallel

Jimmy Tang jtang at tchpc.tcd.ie
Mon Mar 10 19:45:55 GMT 2008


Hi Sushil,

On Mon, Mar 10, 2008 at 09:40:02AM +0530, Sushil Auluck wrote:

>>>
>>>>                  -L/opt/intel/cmkl/10.0.011/lib/em64t 
>>>> -lmkl_scalapack_ilp64 -lmkl_scalapack -lmkl_blacs_intelmpi20_ilp64 
>>>> -lmkl_lapack -lmkl_em64t -lguide -lpthread -lrt -lsvml -lmkl_core
>>>> [slakxncf at master Src]$ ls -l siesta
>>>> -rwxrwxr-x  1 slakxncf slakxncf 14643324 Mar  9 14:16 siesta
>>>> [slakxncf at master Src]$ cd ../Examples/SiH/
>>>> [slakxncf at master SiH]$ mpdtrace
>>>> master
>>>> [slakxncf at master SiH]$ mpirun -n 2 ../../Src/siesta < sih.fdf
>>>> [cli_0]: aborting job:
>>>> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
>>>> MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x11ca864) failed
>>>> MPI_Comm_rank(65).: Invalid communicator
>>>> [cli_1]: aborting job:
>>>> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
>>>> MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x11ca864) failed
>>>> MPI_Comm_rank(65).: Invalid communicator
>>>> rank 0 in job 7  master.iitk.com_48605   caused collective abort of all 
>>>> ranks
>>>>   exit status of rank 0: return code 1
>>>> [slakxncf at master SiH]$
>>>

there seems to be a mismatch of MPI interfaces in your arch.make file:
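One quick way to confirm which MPI each piece actually uses (a sketch; the binary path is taken from your session above, and output will of course vary on your system):

```shell
# Show the underlying compiler plus the MPI include/library paths
# the wrapper injects (-show is supported by the MPICH wrappers).
mpif90 -show

# List the shared libraries the siesta binary was linked against;
# any libmpi* line tells you which MPI runtime it expects.
# (MPICH 1.2.x is often linked statically, so no libmpi line may appear.)
ldd ../../Src/siesta | grep -i mpi
```

If the wrapper points at one MPI tree while the binary pulls in libraries from another, that confirms the mixed linkage.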

>>>
>> ------------------------------------------------------------------------
>>
>> SIESTA_ARCH=intel9-cmkl8-mpi
>> #
>> # arch.make created by Lucas Fernandez Seivane, quevedin at gmail.com
>> # You may need to change the name of the compiler, location of libraries...
>> # Modified by Alberto Garcia to suit cryst at the UPV.
>> #
>> # Note: The -mp1 option is necessary to recover IEEE floating point precision,
>> #       but it sometimes leads to bad code. Use -mp instead.
>> #       In this released .make file, we do not use the highest optimization.
>> #
>> LANG=
>> #FC=mpiifort
>> FC=mpif90
>> FC_ASIS=$(FC)
>> #
>> FFLAGS=-O1
>> FFLAGS_DEBUG=-g -O0
>> RANLIB=echo
>> #MPI_INCLUDE=/opt/intel/mpi/2.0/include
>> MPI_INCLUDE=/opt/mpich-1.2.6/include
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> MPI_INTERFACE=libmpi_f90.a
here you are pointing at mpich 1.2.6, which in itself seems fine (note
that the smeagol devs have tested with, and use, mpich).
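For reference, a consistent arch.make points the compiler wrapper and the headers at the same MPI installation. A minimal sketch, assuming MPICH lives under /opt/mpich-1.2.6 as in your file (the bin/ path for the wrapper is an assumption; adjust it to where your mpif90 actually lives):

```make
# All MPI-related settings refer to the same MPICH tree.
FC=/opt/mpich-1.2.6/bin/mpif90        # hypothetical path: wrapper from the same MPICH
FC_ASIS=$(FC)
MPI_INCLUDE=/opt/mpich-1.2.6/include  # headers from that same MPICH (as in your file)
MPI_INTERFACE=libmpi_f90.a
DEFS_MPI=-DMPI
```

The libraries on the LIBS line then need to come from a BLACS/ScaLAPACK built against that same MPICH, which is where your current file goes wrong.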

>> DEFS_MPI=-DMPI
>> #
>> #LIBS=-L/opt/intel/cmkl/8.0/lib/32  -lmkl_scalapacktesting_intel80  \
>> #      -lmkl_scalapack -lmkl_blacs_intelmpi20  \
>> #      -lmkl_lapack -lmkl_ia32 -lguide -lpthread -lrt -lsvml
>> LIBS=-L/opt/intel/cmkl/10.0.011/lib/em64t -lmkl_scalapack_ilp64  \
>>      -lmkl_scalapack -lmkl_blacs_intelmpi20_ilp64  \
>>      -lmkl_lapack -lmkl_em64t -lguide -lpthread -lrt -lsvml -lmkl_core

the problem may be that you are linking the mkl_blacs and mkl_scalapack
libraries from Intel, which (judging by the names, e.g.
-lmkl_blacs_intelmpi20_ilp64) are built against Intel MPI 2.0, while the
rest of the code is compiled against mpich 1.2.6. BLACS is tied to a
specific MPI implementation, so this mismatch may well be what causes
the "Invalid communicator" abort when you run in parallel.

I would suggest you try a BLACS and ScaLAPACK built against mpich 1.2.x
(the same version your MPI_INCLUDE points at above).
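Concretely, that would mean replacing the MKL BLACS/ScaLAPACK entries on the LIBS line with libraries built against your MPICH. A sketch, assuming you build the netlib BLACS and ScaLAPACK yourself against /opt/mpich-1.2.6 and install them under /opt/scalapack-mpich (that path and the exact library names are assumptions; adjust them to your build):

```make
# ScaLAPACK and BLACS built against mpich 1.2.6, not Intel MPI.
# The BLACS init library is repeated around -lblacs to resolve the
# circular symbol references some netlib BLACS builds have.
LIBS=-L/opt/scalapack-mpich/lib -lscalapack \
     -lblacsF77init -lblacs -lblacsF77init \
     -L/opt/intel/cmkl/10.0.011/lib/em64t \
     -lmkl_lapack -lmkl_em64t -lmkl_core -lguide -lpthread -lrt
```

Keeping MKL for the serial LAPACK/BLAS part is fine; it is only BLACS (and the ScaLAPACK built on top of it) that must match the MPI you compile and run with.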



Jimmy.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | http://www.tchpc.tcd.ie/~jtang

