Re: FW: newbie Athlon optimization question...
>Reading over the documentation online, it seems like what I need to do is
>basically just write some faster kernel implementation and stick it in
>ATLAS/tune/blas/gemm/CASES, as well as create one of those description
>files. It also mentioned that the main bulk of time was taken up by the
>matrix-matrix multiply algorithm. I'm trying to base a faster Athlon
>implementation off of some existing ATLAS code. Of the files listed
>below, which one actually is the matrix-matrix multiply source for the
>kernel? BTW this was taken from my tracing of HPL as mentioned in the
After you've finished the install, the main kernel is in
To use this bad boy as a contributed kernel, as described in atlas_contrib.ps,
don't forget to change the name of the routine and make it able to handle
multiple BETAs; that document explains both steps.
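
A minimal sketch of what handling multiple BETAs can look like, assuming the
kernel computes C = A^T*B + beta*C on column-major blocks and that beta = 0,
beta = 1 and arbitrary beta are the cases needing separate treatment. The
name toy_mm, the enum and the argument list are hypothetical and are not the
interface ATLAS expects; atlas_contrib.ps remains the reference for that.

```c
#include <assert.h>

/* Hypothetical toy kernel, not ATLAS code.
 * A is K x M and B is K x N, both column-major, so column i of A is
 * row i of A^T. C is M x N, column-major. */
enum beta_case { BETA_ZERO, BETA_ONE, BETA_ANY };

static void toy_mm(enum beta_case bc, int M, int N, int K,
                   const double *A, int lda, const double *B, int ldb,
                   double beta, double *C, int ldc)
{
    assert(lda >= K && ldb >= K && ldc >= M);
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) {
            double acc = 0.0;
            for (int k = 0; k < K; k++)
                acc += A[k + i * lda] * B[k + j * ldb];
            double *c = &C[i + j * ldc];
            switch (bc) {
            case BETA_ZERO: *c = acc;              break; /* never reads C */
            case BETA_ONE:  *c += acc;             break; /* no multiply   */
            case BETA_ANY:  *c = beta * *c + acc;  break; /* general case  */
            }
        }
    }
}
```

The beta = 0 path overwrites C without reading it, so stale or uninitialized
values in C cannot leak into the result. A tuned kernel would more likely
pick the case at compile time than with a switch inside the loop.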
I've tried to improve it by hand myself, with no particular luck. The x87
ISA with its 8-register stack leaves little room for elegance or
intelligence. With only 8 registers, register blocking is pretty much
impotent, meaning you are hitting the L1 all the time. This traffic seems
to pretty much rule out improvements from adding prefetch, as far as I can
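
To make the register-blocking point concrete, here is a rough C sketch of a
2x2 register block, assuming C += A^T*B on column-major blocks with even M
and N. The name mm_2x2 and its argument list are hypothetical, and this is
not the ATLAS kernel. If every value were held in a register, the inner loop
would need 4 accumulators plus 2 elements of A and 2 of B, which is already
all 8 x87 stack slots, so anything larger than 2x2 has to go back to the L1.

```c
#include <assert.h>

/* Hypothetical illustration, not ATLAS code.
 * A is K x M and B is K x N, column-major; C is M x N, column-major. */
static void mm_2x2(int M, int N, int K,
                   const double *A, int lda, const double *B, int ldb,
                   double *C, int ldc)
{
    assert(M % 2 == 0 && N % 2 == 0);
    for (int j = 0; j < N; j += 2) {
        for (int i = 0; i < M; i += 2) {
            const double *a0 = A + i * lda, *a1 = a0 + lda;
            const double *b0 = B + j * ldb, *b1 = b0 + ldb;
            /* 4 accumulators: the 2x2 block of C stays out of memory */
            double c00 = 0.0, c10 = 0.0, c01 = 0.0, c11 = 0.0;
            for (int k = 0; k < K; k++) {
                /* 4 operand loads feed 4 multiply-adds */
                double ra0 = a0[k], ra1 = a1[k];
                double rb0 = b0[k], rb1 = b1[k];
                c00 += ra0 * rb0;  c10 += ra1 * rb0;
                c01 += ra0 * rb1;  c11 += ra1 * rb1;
            }
            C[i     +  j      * ldc] += c00;
            C[i + 1 +  j      * ldc] += c10;
            C[i     + (j + 1) * ldc] += c01;
            C[i + 1 + (j + 1) * ldc] += c11;
        }
    }
}
```

Whether a compiler actually keeps these 8 values on the x87 stack is not
guaranteed; the sketch only shows how quickly the register budget runs out.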
>Lastly, is there an easy way to find out if ATLAS really did use
>my source file/function? Or do I have to trace the execution of a program
>to find out which function was called? Thanks for your help.
After you follow the instructions on page 22 to get it to use your kernel,
scoping ATLAS/tune/blas/gemm/<arch>/res/dMMRES will tell you what it used.