[Smeagol-discuss] problem about smeagol parallel efficiency
Barraza-lopez, Salvador
sbl3 at mail.gatech.edu
Tue Jan 5 13:52:41 GMT 2010
Hi Guangping.
The problem comes down to these input lines:
NEnergReal 1000
NEnergImCircle 200
NEnergImLine 50
NPoles 20
Neither 6 nor 8 divides these counts. You have NEnergImCircle + NEnergImLine + 2*NPoles = 290 integration points for the equilibrium part and 1000 integration points for the non-equilibrium part of the energy integration. Try to make both counts divisible by the number of processors you are using (240 integration points in the equilibrium part and 960 in the non-equilibrium part will work fine, and will scale properly, on both 6 and 8 processors). Alternatively, run on 5 processors with the input file as-is and you will also see scalable performance.
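To make the arithmetic explicit, here is a minimal sketch that reproduces the counts above and checks whether they divide evenly among a given number of MPI processes. The parameter names follow the fdf input lines quoted above; the load-balance rule (each group of points must be divisible by the core count) is as stated in this reply, not taken from the Smeagol source, and the function names are illustrative only.

```python
def integration_points(n_real, n_im_circle, n_im_line, n_poles):
    """Return (equilibrium, non-equilibrium) integration-point counts.

    Equilibrium part: circle + line contour points plus 2 points per pole.
    Non-equilibrium part: the real-axis points (NEnergReal).
    """
    equilibrium = n_im_circle + n_im_line + 2 * n_poles
    return equilibrium, n_real

def balanced(points, cores):
    """True if every core receives the same number of points."""
    return points % cores == 0

# Values from the input file quoted above.
eq, neq = integration_points(n_real=1000, n_im_circle=200,
                             n_im_line=50, n_poles=20)
print(eq, neq)  # 290 1000

for cores in (5, 6, 8):
    print(cores, balanced(eq, cores), balanced(neq, cores))
# 5 divides both 290 and 1000; 6 and 8 do not divide 290,
# which is why those runs lose efficiency.
```

With the suggested values (240 equilibrium, 960 non-equilibrium), both counts are divisible by 6 and by 8, so the work splits evenly.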
Best regards,
-Salvador.
----- Original Message -----
From: "张广平" (Guangping Zhang) <284107217 at qq.com>
To: "smeagol-discuss" <smeagol-discuss at lists.tchpc.tcd.ie>
Sent: Tuesday, January 5, 2010 7:50:13 AM GMT -05:00 US/Canada Eastern
Subject: [Smeagol-discuss] problem about smeagol parallel efficiency
Hi, Smeagol users,
I have run into a problem: when I use one core, a task takes 35 minutes, but the same task takes 49 minutes on 8 cores. The more cores, the more time? Another example: one core takes 42 minutes while 6 cores take 26 minutes, so the parallel efficiency is very poor.
Our OS is as follows:
-------------------------------------------------------
[test at localhost LIB]$ uname -a
Linux localhost 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
----------------------------------------------------------
Each node has 8 cores, and for now we run in parallel only within a single node. The information for one core is:
-----------------------------------------------------------
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 6
cpu MHz : 2666.844
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5333.68
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
------------------------------------------------------------------
All 8 cores in a node are identical.
When I compile the code for parallel mode, I use MKL 10.0.2.018 and take all the math libraries from it. The Fortran compiler is PGI 7.0, and the MPI implementation is MPICH 1.2.7. Could any problem with the software I used lead to this poor efficiency?
My arch.make for the parallel build is:
-----------------------------------------------------------------
SIESTA_ARCH=pgf90
FC=mpif90
FC_ASIS=$(FC)
FFLAGS= -tp p7-64 -OPT:Ofast -O2
LDFLAGS= -tp p7-64 -OPT:Ofast -O2
COMP_LIBS =
FFLAGS_DEBUG=
TRANSPORTFLAGS = -tp p7-64 -OPT:Ofast -O2 -c
SOURCE_DIR=/home/test/software/smeagol-1.3.7
EXEC = smeagolpara
#NETCDF_LIBS=/usr/local/netcdf-3.5/lib/pgi/libnetcdf.a
#NETCDF_INTERFACE=libnetcdf_f90.a
#DEFS_CDF=-DCDF
MPI_INTERFACE=libmpi_f90.a
MPI_INCLUDE=/home/test/software/mpich-1.2.7/include
DEFS_MPI=-DMPI
BLAS_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_solver -lmkl_em64t -lguide -lpthread
LAPACK_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl_core
BLACS_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_blacs_lp64
SCALAPACK_LIBS= -L/home/test/intel/mkl/10.0.2.018/lib/em64t -lmkl_scalapack_lp64
LIBS= $(SCALAPACK_LIBS) $(BLACS_LIBS) $(LAPACK_LIBS) $(BLAS_LIBS)
RANLIB=echo
SYS=bsd
DEFS= $(DEFS_CDF) $(DEFS_MPI)
#
.F.o:
$(FC) -c $(FFLAGS) $(DEFS) $<
.f.o:
$(FC) -c $(FFLAGS) $<
.F90.o:
$(FC) -c $(FFLAGS) $(DEFS) $<
.f90.o:
$(FC) -c $(FFLAGS) $<
#
------------------------------------------------------------------
The whole mkl lib is :
------------------------------------------------------------------
libguide.a libmkl_intel_lp64.a
libguide.so libmkl_intel_lp64.so
libiomp5.a libmkl_intel_sp2dp.a
libiomp5.so libmkl_intel_sp2dp.so
libmkl_blacs_ilp64.a libmkl_intel_thread.a
libmkl_blacs_intelmpi20_ilp64.a libmkl_intel_thread.so
libmkl_blacs_intelmpi20_lp64.a libmkl_lapack.a
libmkl_blacs_intelmpi_ilp64.a libmkl_lapack.so
libmkl_blacs_intelmpi_lp64.a libmkl_mc.so
libmkl_blacs_lp64.a libmkl_p4n.so
libmkl_blacs_openmpi_ilp64.a libmkl_scalapack.a
libmkl_blacs_openmpi_lp64.a libmkl_scalapack_ilp64.a
libmkl_cdft.a libmkl_scalapack_lp64.a
libmkl_cdft_core.a libmkl_sequential.a
libmkl_core.a libmkl_sequential.so
libmkl_core.so libmkl.so
libmkl_def.so libmkl_solver.a
libmkl_em64t.a libmkl_solver_ilp64.a
libmkl_gf_ilp64.a libmkl_solver_ilp64_sequential.a
libmkl_gf_ilp64.so libmkl_solver_lp64.a
libmkl_gf_lp64.a libmkl_solver_lp64_sequential.a
libmkl_gf_lp64.so libmkl_vml_def.so
libmkl_gnu_thread.a libmkl_vml_mc2.so
libmkl_gnu_thread.so libmkl_vml_mc.so
libmkl_intel_ilp64.a libmkl_vml_p4n.so
libmkl_intel_ilp64.so
---------------------------------------------------------------
When I run a task, I first copy the compiled executable smeagolpara from the Src directory to my work directory; then, after the lead calculation, I use the command: mpirun -np 8 smeagolpara < Auwire.fdf > mx.log &
By the way, can the lead calculation itself run in parallel? It seems not, as far as I can tell.
I have put the input files in the attachment.
Any advice is welcome!
Best regards,
Yours,
Guangping Zhang
_______________________________________________ Smeagol-discuss mailing list Smeagol-discuss at lists.tchpc.tcd.ie http://lists.tchpc.tcd.ie/listinfo/smeagol-discuss
--
Salvador Barraza-Lopez
Postdoctoral Fellow
School of Physics
The Georgia Institute of Technology
Office N205
837 State Street Atlanta, Georgia 30332-0430 U.S.A
Tel: (404) 894-0892 Fax: (404) 894-9958