[Smeagol-discuss] Error on IBM SP4

Ivan Rungger runggeri at tcd.ie
Thu Apr 3 10:53:57 IST 2008


Dear Xiaobing,

  We have a similar problem on a Blue Gene machine that also uses the
xlf compiler. It is definitely a memory problem: it seems that when
compiled with xlf the code does not free its allocated memory properly.
One option you might add is "-qxlf90=autodealloc", which automatically
deallocates all allocated objects at the end of a subroutine
(except pointers). However, this did not solve our problem, and in the
distributed code (smeagol-1.0b) there should be no memory leak. Beyond
that we don't know, and we will look into the problem ourselves at
some point. With proper memory-profiling tools you might be able to
find the problem yourself; if you find the solution, please post it to
the list.
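
One way to narrow this down without a profiler is to add STAT= to the
failing ALLOCATE, so that the program reports the failure itself instead
of being stopped by the runtime. The sketch below is not taken from the
Smeagol source: the array names and N1 = 328 come from the report quoted
below, while the element type (double-precision complex), the loop, and
the program name are assumptions for illustration. Under that assumption
the three arrays together need only about 16 KB, so a failure at this
line would suggest that memory was exhausted earlier (for example by
arrays that are never deallocated) rather than by this request.

```fortran
! Minimal free-form sketch, NOT the Smeagol code.
! From the report: array names al, ar, alr and N1 = 328.
! Assumed for illustration: complex(dp) elements, the loop, program name.
program alloc_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp), allocatable :: al(:), ar(:), alr(:)
  integer :: n1, ierr, istep

  n1 = 328

  ! The loop stands in for repeated calls to the routine that allocates.
  do istep = 1, 3
    allocate(al(n1), ar(n1), alr(n1), stat=ierr)
    if (ierr /= 0) then
      ! With stat= present, a failed allocation returns control here
      ! instead of terminating the program inside the runtime.
      write(*,*) 'allocation failed: step=', istep, ' n1=', n1, &
                 ' stat=', ierr
      stop 1
    end if

    ! Touch the arrays so the allocation is actually used.
    al  = (1.0_dp, 0.0_dp)
    ar  = (0.0_dp, 1.0_dp)
    alr = al + ar

    ! Release the memory explicitly before the next pass.
    deallocate(al, ar, alr, stat=ierr)
    if (ierr /= 0) then
      write(*,*) 'deallocation failed: step=', istep, ' stat=', ierr
      stop 2
    end if
  end do

  write(*,*) 'all allocations and deallocations succeeded'
end program alloc_check
```

If the same check is added at line 535 of negfk.F together with a call
counter, the printed values would show whether the failure appears only
after many calls, which would point towards a leak.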

Cheers,

 Ivan

xiaobing.feng at ipcms.u-strasbg.fr wrote:
> Dear everyone,
>
> I'm trying to run Smeagol on an IBM SP4 supercomputer.
> The code compiled successfully. However, I ran into a problem with
> transport calculations.
>
> Smeagol stopped due to the following error:
>
> "negfk.F", line 535: 1525-108 Error encountered while attempting to allocate a
> data object.  The program will stop.
>
> The last few lines of screen output were normal:
> gensvd: Leads decimation
> gensvd: Dim of H1 and S1 :     60
> gensvd: Rank of H1:            14
> gensvd: Rank of (H1,S1):       37
> gensvd: Decimated states:       9
> gensvd: Decimation from the left
>
>
> I used mpxlf_r and mpxlf90_r to compile the code; the compiler flags are:
>    -qzerosize -O0  -qarch=auto -qtune=auto -qcache=auto -qnolm -q64
>
> I thought this error was caused by insufficient memory, so I added
> the following line to the job submission script:
>    # @ resources = ConsumableCpus(1) ConsumableMemory(2000mb)
> but the problem remains. I even tried setting a small number of points
> in the energy integration, e.g. NEnergReal 10, and SaveMemtranspK T,
> without success.
>
> Line 535 in negfk.F, which caused the problem, is:
>    ALLOCATE(al(N1),ar(N1),alr(N1))
> I printed the value of N1: it is 328.
>
> Now I have no idea what caused the problem. The same input runs well
> on a Linux cluster. I'd be very grateful if you could help me diagnose
> the problem.
>
> Many thanks in advance.
> Yours,
>
> Xiaobing
>


-- 
=================================================
Ivan Rungger,

School of Physics,
Trinity College Dublin,
Dublin 2,  IRELAND
Phone: +353-1-6088454
Email: runggeri at tcd.ie

=================================================



