
Re: SSE Level 3 drop in gemm

>OK.  Thanks!  But even with the sMMRES above, doing the make install
>made a lib without my kernel.  bin/arch/xsl3blastst gives 370 MFLOPS. :-(

I suspect there is an error in the install process of the current developer
release. I used it to install Peter's ultrasparc gemm the other day, and while
it detected that kernel as the fastest, the install did not actually use it.
That should all be fixed when I revamp things, which I'll do ASAP (right now
I'm drudging through some required "groveling for dollars" in the hopes of
getting some funding).

>Just a reminder, one other outstanding issue is the data alignment.
>Setting Atlas_Cachelen did not seem to affect the mmtst.c or fc.c
>programs.  Am I missing something here?

Neither the timer nor the tester uses ATL_Cachelen. The tester doesn't measure
performance, and the kernels *should* work with any alignment legal for the
type, so leaving it out there seems like the right thing to me. Not using
ATL_Cachelen is less defensible in fc.c, but it already aligns all operands
to 128-byte boundaries. Do you need more than that?
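
For reference, forcing an operand onto a 128-byte boundary is usually done by
over-allocating and rounding the pointer up. The sketch below only illustrates
that general technique; it is not the code in fc.c, and align_up is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of ATLAS): round p up to the next multiple
 * of align. align must be a nonzero power of two.
 */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + align - 1) & ~(uintptr_t)(align - 1));
}
```

A caller would malloc the needed bytes plus 127, pass the result through
align_up(ptr, 128), and keep the original pointer around for free().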

Now, as far as ATLAS itself is concerned, it only aligns the *mallocs* to
ATL_Cachelen; multiple blocks are then stored contiguously, which means that
you'll do best if NB*sizeof(TYPE) is a multiple of your ATL_Cachelen . . .
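
To make the arithmetic concrete, here is a small sketch of that check. The
function names are hypothetical, and the 32-byte cacheline used in the note
afterwards is only an assumed value standing in for your ATL_Cachelen. It
assumes NB x NB blocks packed back to back starting from an aligned malloc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if NB*sizeof(TYPE) is a multiple of the
 * cacheline length, so every column of every contiguous block starts on a
 * cacheline boundary whenever the base allocation does.
 */
static int nb_keeps_alignment(size_t nb, size_t elt_size, size_t cachelen)
{
    assert(cachelen > 0);
    return (nb * elt_size) % cachelen == 0;
}

/*
 * Byte offset, from the aligned base, of column col within block number
 * block, for NB x NB blocks stored contiguously.
 */
static size_t column_offset_bytes(size_t block, size_t col,
                                  size_t nb, size_t elt_size)
{
    return (block * nb * nb + col * nb) * elt_size;
}
```

With 4-byte floats and a 32-byte cacheline, NB=40 gives 160 bytes per column
(a multiple of 32, so every column stays aligned), while NB=36 gives 144
bytes, which leaves the second column 16 bytes past a boundary.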