
Re: developer release 3.1.2


R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> >1) It looks as though the prefetch assisted double precision level2
> >   will max out at about 50% + standard atlas.  Transpose: 94 -> 140,
> >   Notrans: 67 -> 97.  dger remains to be completed.  Basically, I
> >   just looked at the atlas compiled assembler, and added prefetch.
> >
> >   So the rule of thumb appears to be SIMD +50%, prefetch +50%, 
> >   both +100%.
> 50% from prefetch alone is quite nice; do you have a way of packaging
> the assembler in a C file, or will I need to modify the makefiles to
> support assembler?  I wouldn't be surprised to see a greater gain
> for double precision complex . . .

I have a header/c-file setup like the single and complex just about
ready.  dger only shows about +25%, due to the extreme cache
pollution, I suppose.
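
For readers following along: the assembler itself is not included in this
message. Purely as an illustration of the idea, here is a minimal C sketch of a
transposed double precision matrix-vector product with a software prefetch
hint added to the inner loop. The function name dgemv_t_prefetch and the
prefetch distance PF_DIST are hypothetical and would need tuning per machine;
__builtin_prefetch is a GCC/Clang builtin, so it is guarded and the loop falls
back to plain C elsewhere. Column-major storage is assumed.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; a tuning parameter, not a measured value. */
enum { PF_DIST = 64 };

/*
 * y = A^T * x for a column-major m x n matrix A with leading dimension lda.
 * Each y[j] is the dot product of column j of A (contiguous in memory) with x.
 */
void dgemv_t_prefetch(size_t m, size_t n, const double *a, size_t lda,
                      const double *x, double *y)
{
    for (size_t j = 0; j < n; ++j) {
        const double *col = a + j * lda;
        double sum = 0.0;
        for (size_t i = 0; i < m; ++i) {
#if defined(__GNUC__)
            /* Roughly once per 64-byte line, hint that data PF_DIST
               elements ahead will be read soon (read access, low reuse).
               The bound check keeps the address inside the column. */
            if ((i % 8) == 0 && i + PF_DIST < m)
                __builtin_prefetch(col + i + PF_DIST, 0, 0);
#endif
            sum += col[i] * x[i];
        }
        y[j] = sum;
    }
}
```

The hint only changes timing, never results, so the routine can be checked
against an unprefetched loop on any small matrix.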

> >3) I do hope we can find a solution for distributed atlas binaries.  I
> >   know the idea is for the user to build atlas on each platform they
> >   will use, and that the current tree will skip any routines which
> >   fail to compile on a given platform, (i.e. if there is no SIMD
> >   support).  Serious users will do this no doubt.  
> I don't know much about the .deb format, but I thought I read once that
> it could run scripts.  No chance you can run a simple example SIMD program,
> and install SIMD-enabled lib when it works, and the PII-style when it does
> not, I guess?

Good idea. Thanks!

> >4) Do we have an idea as to when we might want to release a
> >   SIMD-enhanced atlas, say in Debian?  
> You guys can of course release any time you wish.  Antoine and I are trying
> to get the next official release of ATLAS ready by the end of the summer,
> but we'll see if we get it rolling or not.  There are two main additions
> for this next big release:  the opening up of the kernels for outside
> contribution, and the addition of SMP support via pthreads.  As soon as we
> get these guys in and tested, we'll have a release.  
> For the first phase of the work, Antoine worked separately on threading while
> I worked on the infrastructure necessary to open up the kernels.  We have
> just started the process of bringing the work back together so it all is
> in one package.  When we get something working at all reliably, we'll have
> developer releases that include threading, so you should be able to follow,
> at least roughly, the progress to the next ATLAS release.
> After the release, I hope to formalize a bit more the developer/regular
> releases.  I certainly plan to keep both around: having a developer release
> with the newest stuff that we have is certainly a boon to people working on
> the package, and allows everyone to get stuff used much quicker than we
> can give out with the "stable" releases . . .
> >Is there any word on the most important level3 front?
> I haven't heard from the emmerald guys since they said that giving GEMM to
> us as a kernel provided too poor of a performance.  Since it apparently
> beat our current kernel, I disagree, but you can't release code you don't
> have :)  Last I heard they were working on a complete GEMM instead . . . 
> As a general rule, if I hear anything important on the developer front,
> I'll CC to the list . . .

This sounds somewhat like the blocking issues we were discussing with
the level2 sometime back.  In that case, while there certainly is a
hit, it appears to be small for reasonable routines.  I suppose I'm
persuaded of the virtue of an all-purpose kernel, though I don't
exactly know why :-).  Seriously, though, I'm persuaded by the virtue
of the quality of atlas as a whole.

> I agree that the Level 3 is the most important for performance reasons, but
> to me the main thing is to have the ability to contribute in the stable
> package;  I think particular contributions will come later.  So far, I have your
> stuff, and Goto's gemm: these are already significant proof-of-concept,
> and once people see the power of this building block approach, I hope
> that people will fill in the pieces we don't have . . .

You had mentioned trying a gemv based gemm for the complex in an
earlier message.  As a lark, I just tried that for the single
precision.  I seem to get about as good as the standard atlas gemm
(~350 MFLOPS, sgemv was ~ 250 MFLOPS), but the mmsearch did not pick
my routine. You had also indicated that this strategy was not the best
way to go, most likely.  Could you elaborate a bit on what would
likely be needed beyond a loop over gemv?  It seems as though one
cannot count on longer contiguous vectors than kb no matter what one

> That's why we will have an ATLAS release as soon as Antoine and I get our
> stuff together, regardless of what outside contribution we have in place
> at that time: the quicker we get this stuff in front of all our users
> (remember, right now the only people who know about the developer release
> and kernel contribution are a few people I sent mail to, and those who
> have stumbled over the web page somehow) the quicker the holes in our
> coverage will fill up . . .
> Cheers,
> Clint

Take care,

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah