Re: FW: newbie Athlon optimization question...


>Reading over the documentation online, it seems like what I need to do is
>basically just write some faster kernel implementation and stick it in
>ATLAS/tune/blas/gemm/CASES, as well as create one of those description
>files.  It also mentioned that the main bulk of time was taken up by the
>matrix-matrix multiply algorithm.  I'm trying to base a faster Athlon
>implementation off of some existing ATLAS code.  Of the files listed
>below, which one actually is the matrix-matrix multiply source for the
>kernel?  BTW this was taken from my tracing of HPL as mentioned in the
>earlier email:

After you've finished the install, the main kernel is in

To use this bad boy as a contributed kernel, as described in atlas_contrib.ps,
don't forget to change the name of the routine and to make it handle
multiple BETAs.
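
To make the BETA point concrete, here is a minimal sketch, assuming that
"multiple BETAs" means the usual beta = 0, beta = 1, and arbitrary-beta cases
of C = A*B + beta*C.  The routine name, the enum, and the row-major layout are
all made up for illustration; this is not ATLAS's actual kernel interface, so
follow atlas_contrib.ps for the real one.

```c
/* Hypothetical names throughout; NOT the real ATLAS kernel interface. */
enum beta_case { BETA_ZERO, BETA_ONE, BETA_ANY };

/* C (MxN) = A (MxK) * B (KxN) + beta * C, all matrices row-major. */
static void my_gemm_kernel(enum beta_case bc, int M, int N, int K,
                           const double *A, const double *B,
                           double beta, double *C)
{
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < K; k++)
                acc += A[i*K + k] * B[k*N + j];

            switch (bc) {
            case BETA_ZERO:   /* overwrite C; its old contents are never read */
                C[i*N + j] = acc;
                break;
            case BETA_ONE:    /* plain accumulate, no multiply by beta */
                C[i*N + j] += acc;
                break;
            case BETA_ANY:    /* general case: scale old C, then add */
                C[i*N + j] = beta * C[i*N + j] + acc;
                break;
            }
        }
    }
}
```

The reason to keep beta = 0 separate is that C may hold uninitialized data
in that case, so the kernel should store to it without ever loading it.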

I've tried to improve it by hand myself, with no particular luck.  The x87
ISA, with its 8-register stack, leaves little room for elegance or
intelligence.  With only 8 registers, register blocking is pretty much
impotent, meaning you are hitting the L1 all the time.  This constant L1
traffic seems to pretty much rule out improvements from adding prefetch, as
far as I can tell.
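
For anyone unfamiliar with register blocking, here is a rough C sketch of the
idea (a hypothetical routine, not the ATLAS kernel).  It assumes row-major
storage and even M and N.  A 2x2 block of C is held in four local
accumulators, so each pass of the inner loop does four multiply-adds for four
loads, where an unblocked loop does one multiply-add for two loads.

```c
/* Hypothetical sketch: C (MxN) += A (MxK) * B (KxN), row-major.
 * Assumes M and N are even. */
static void gemm_2x2_block(int M, int N, int K,
                           const double *A, const double *B, double *C)
{
    for (int i = 0; i < M; i += 2) {
        for (int j = 0; j < N; j += 2) {
            /* 2x2 block of C kept in locals (ideally registers) */
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;

            for (int k = 0; k < K; k++) {
                /* each loaded value of A and B is reused twice */
                double a0 = A[i*K + k];
                double a1 = A[(i + 1)*K + k];
                double b0 = B[k*N + j];
                double b1 = B[k*N + j + 1];

                c00 += a0 * b0;  c01 += a0 * b1;
                c10 += a1 * b0;  c11 += a1 * b1;
            }

            /* write the finished block back to memory once */
            C[i*N + j]           += c00;
            C[i*N + j + 1]       += c01;
            C[(i + 1)*N + j]     += c10;
            C[(i + 1)*N + j + 1] += c11;
        }
    }
}
```

Even this small block keeps four accumulators and four operands live at once,
which is eight values in total, so an 8-register stack has nothing left over
for larger blocks.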

>Lastly, is there an easy way to find out if ATLAS really did use
>my source file/function?  Or do I have to trace the execution of a program
>to find out which function was called?  Thanks for your help.

After you follow the instructions on page 22 to get it to use your kernel,
scoping ATLAS/tune/blas/gemm/<arch>/res/dMMRES will tell you what it used.