
Re: SSE Level 3 drop in gemm



Hello again!  Sorry to write so frequently, but this is almost wrapped
up, and I'd like to get it out quickly if possible, while it's fresh in
my mind.

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> 
> >Now I have a different issue.  My kernel likes nb=56 the best.  Atlas
> >standard likes nb=64.  And this is what I get in sMMRES:
> >
> >intech20:/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic$ cat res/sMMRES
> >MULADD  LAT  NB  MU  NU  KU  FFTCH  IFTCH  NFTCH    MFLOP
> >     0    2  64   5   1  64      0      5      1   371.46
> >16
> >ATL_sgemm_SSE.c "CM"
> >     1    1  64   2   2  64      0      4      1   617.61
> 
> The best solution is probably to add another case to scases.dsc.  I guess
> you have a line that looks something like this right now:
> 0 1 1 1 1 1 2 2 64 ATL_sgemm_SSE.c "CM"
> add a second line, with the explicit 56 blocking factor:
> 0 -56 -56 -56 1 1 2 2 64 ATL_sgemm_SSE.c "CM"
> 

OK.  Thanks!  But even with the sMMRES above, doing the make install
made a lib without my kernel.  bin/arch/xsl3blastst gives 370 MFLOPS. :-(

> And that should force ATLAS to time using your best blocking factor, which should
> then be selected . . .  In the long run, I think it makes sense to have ATLAS
> do this: if a user-contributed kernel is found to beat the generated kernel,
> run through the list of possible NBs again, to find the best.  I'll look at
> adding that while I'm doing the cleanup stuff . . .
> 

Sounds good!

> >Also, any suggestions on the unrolling issue?  If I put in macros to
> >unroll k at differing levels depending on KB, would that confuse the
> >search engine?  Should install faster than having 3 different k unroll
> >kernels to time.
> 
> I don't think it would confuse the script, but you'd have to force the
> different NB's at present via a technique like above.  Should work like
> a charm if I add the NB search, though . . .
> 

OK.
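
To make the unrolling idea concrete, here is a minimal sketch of what I
have in mind, not the actual kernel.  It assumes KB arrives as a
compile-time define (e.g. -DKB=56) and falls back to 56 otherwise.  The
names KUNROLL, STEP, KBODY and dot_kb are hypothetical, and a plain dot
product stands in for the real SSE inner loop.

```c
#ifndef KB
#define KB 56   /* assumption: blocking factor normally given via -DKB=... */
#endif

/* Choose the deepest unroll (hypothetical levels 8/4/2/1) that divides KB. */
#if (KB % 8) == 0
#define KUNROLL 8
#elif (KB % 4) == 0
#define KUNROLL 4
#elif (KB % 2) == 0
#define KUNROLL 2
#else
#define KUNROLL 1
#endif

/* One multiply-add step; relies on a, b and sum in the enclosing scope. */
#define STEP(k_) sum += a[(k_)] * b[(k_)];

/* KBODY expands to KUNROLL consecutive steps starting at index k_. */
#if KUNROLL == 8
#define KBODY(k_) STEP(k_) STEP((k_)+1) STEP((k_)+2) STEP((k_)+3) \
                  STEP((k_)+4) STEP((k_)+5) STEP((k_)+6) STEP((k_)+7)
#elif KUNROLL == 4
#define KBODY(k_) STEP(k_) STEP((k_)+1) STEP((k_)+2) STEP((k_)+3)
#elif KUNROLL == 2
#define KBODY(k_) STEP(k_) STEP((k_)+1)
#else
#define KBODY(k_) STEP(k_)
#endif

/* Dot product over exactly KB elements, unrolled KUNROLL times per pass. */
static float dot_kb(const float *a, const float *b)
{
    float sum = 0.0f;
    int k;

    for (k = 0; k < KB; k += KUNROLL) {
        KBODY(k)
    }
    return sum;
}
```

With this layout a single source file covers every KB, so there is only
one kernel to time per blocking factor rather than three.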

> Cheers,
> Clint
> 
> 

Just a reminder, one other outstanding issue is the data alignment.
Setting Atlas_Cachelen did not seem to affect the mmtst.c or fc.c
programs.  Am I missing something here?

Take care,


-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah