[Smeagol-discuss] problem about smeagol parallel efficiency
张广平
284107217 at qq.com
Tue Jan 5 12:50:13 GMT 2010
Hi, every SMEAGOL user,
I have encountered a problem: when I run a task on one core it takes 35 minutes, while the same task takes 49 minutes on 8 cores. The more cores, the more time? Another example: one core takes 42 minutes while 6 cores take 26 minutes, so the parallel efficiency is very poor. Our OS is as follows:
-------------------------------------------------------
[test at localhost LIB]$ uname -a
Linux localhost 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
----------------------------------------------------------
There are 8 cores per node, and for now we only run in parallel within a single node. The information for one core is:
-----------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 6
cpu MHz : 2666.844
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5333.68
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
------------------------------------------------------------------
All 8 cores in the node are identical.
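To put numbers on the timings above: parallel speedup is t1/tN and efficiency is (t1/tN)/N. A quick sketch with the times I quoted, using awk only as a calculator:

```shell
# Speedup and efficiency for the two runs quoted above.
# speedup = t1 / tN, efficiency = speedup / N
awk -v t1=35 -v tn=49 -v n=8 \
    'BEGIN { s = t1/tn; printf "8 cores: speedup=%.2f efficiency=%.1f%%\n", s, 100*s/n }'
# 8 cores: speedup=0.71 efficiency=8.9%
awk -v t1=42 -v tn=26 -v n=6 \
    'BEGIN { s = t1/tn; printf "6 cores: speedup=%.2f efficiency=%.1f%%\n", s, 100*s/n }'
# 6 cores: speedup=1.62 efficiency=26.9%
```

So even in the better case, 6 cores reach only about 27% efficiency.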
When I compile the code in parallel mode, I use MKL 10.0.2.018 and take all the math libraries from it. The Fortran compiler is PGI 7.0, and for MPI I use MPICH 1.2.7. Could anything in this software stack cause the bad efficiency?
My arch.make for parallel is:
-----------------------------------------------------------------
SIESTA_ARCH=pgf90
FC=mpif90
FC_ASIS=$(FC)
FFLAGS= -tp p7-64 -OPT:Ofast -O2
LDFLAGS= -tp p7-64 -OPT:Ofast -O2
COMP_LIBS =
FFLAGS_DEBUG=
TRANSPORTFLAGS = -tp p7-64 -OPT:Ofast -O2 -c
SOURCE_DIR=/home/test/software/smeagol-1.3.7
EXEC = smeagolpara
#NETCDF_LIBS=/usr/local/netcdf-3.5/lib/pgi/libnetcdf.a
#NETCDF_INTERFACE=libnetcdf_f90.a
#DEFS_CDF=-DCDF
MPI_INTERFACE=libmpi_f90.a
MPI_INCLUDE=/home/test/software/mpich-1.2.7/include
DEFS_MPI=-DMPI
BLAS_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_solver -lmkl_em64t -lguide -lpthread
LAPACK_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl_core
BLACS_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_blacs_lp64
SCALAPACK_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_scalapack_lp64
LIBS= $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS)
RANLIB=echo
SYS=bsd
DEFS= $(DEFS_CDF) $(DEFS_MPI)
#
.F.o:
$(FC) -c $(FFLAGS) $(DEFS) $<
.f.o:
$(FC) -c $(FFLAGS) $<
.F90.o:
$(FC) -c $(FFLAGS) $(DEFS) $<
.f90.o:
$(FC) -c $(FFLAGS) $<
#
------------------------------------------------------------------
The whole mkl lib is :
------------------------------------------------------------------
libguide.a libmkl_intel_lp64.a
libguide.so libmkl_intel_lp64.so
libiomp5.a libmkl_intel_sp2dp.a
libiomp5.so libmkl_intel_sp2dp.so
libmkl_blacs_ilp64.a libmkl_intel_thread.a
libmkl_blacs_intelmpi20_ilp64.a libmkl_intel_thread.so
libmkl_blacs_intelmpi20_lp64.a libmkl_lapack.a
libmkl_blacs_intelmpi_ilp64.a libmkl_lapack.so
libmkl_blacs_intelmpi_lp64.a libmkl_mc.so
libmkl_blacs_lp64.a libmkl_p4n.so
libmkl_blacs_openmpi_ilp64.a libmkl_scalapack.a
libmkl_blacs_openmpi_lp64.a libmkl_scalapack_ilp64.a
libmkl_cdft.a libmkl_scalapack_lp64.a
libmkl_cdft_core.a libmkl_sequential.a
libmkl_core.a libmkl_sequential.so
libmkl_core.so libmkl.so
libmkl_def.so libmkl_solver.a
libmkl_em64t.a libmkl_solver_ilp64.a
libmkl_gf_ilp64.a libmkl_solver_ilp64_sequential.a
libmkl_gf_ilp64.so libmkl_solver_lp64.a
libmkl_gf_lp64.a libmkl_solver_lp64_sequential.a
libmkl_gf_lp64.so libmkl_vml_def.so
libmkl_gnu_thread.a libmkl_vml_mc2.so
libmkl_gnu_thread.so libmkl_vml_mc.so
libmkl_intel_ilp64.a libmkl_vml_p4n.so
libmkl_intel_ilp64.so
---------------------------------------------------------------
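One thing I am not sure about (my own guess, not something from the manual): the link line pulls in -lguide, the threaded OpenMP runtime, so each MPI process may spawn its own MKL threads and oversubscribe the 8 cores. Forcing MKL to run sequentially before mpirun would rule this out:

```shell
# Assumption: MKL honours these standard environment variables;
# one thread per MPI rank avoids oversubscribing the 8 cores.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
```

With these set, one would then launch mpirun as before and compare the timings.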
When I run the task, I first copy the compiled executable smeagolpara from the /Src directory to my work directory, and then, after the lead calculation, I run the command: mpirun -np 8 smeagolpara < Auwire.fdf > mx.log &
By the way, can the lead calculation itself be run in parallel? It seems not, in my experience.
I have put the input files in the attachments.
Any advice is welcome!
Best regards,
Yours,
Guangping Zhang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arch.make
Type: application/octet-stream
Size: 1112 bytes
Desc: not available
Url : http://lists.tchpc.tcd.ie/pipermail/smeagol-discuss/attachments/20100105/06995a64/attachment-0004.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Au.fdf
Type: application/octet-stream
Size: 2125 bytes
Desc: not available
Url : http://lists.tchpc.tcd.ie/pipermail/smeagol-discuss/attachments/20100105/06995a64/attachment-0005.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Au.psf
Type: application/octet-stream
Size: 147698 bytes
Desc: not available
Url : http://lists.tchpc.tcd.ie/pipermail/smeagol-discuss/attachments/20100105/06995a64/attachment-0006.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Auwire.fdf
Type: application/octet-stream
Size: 3364 bytes
Desc: not available
Url : http://lists.tchpc.tcd.ie/pipermail/smeagol-discuss/attachments/20100105/06995a64/attachment-0007.obj