This is the first release based on the SourceForge CVS. Its main
reason for being is the IA64. I've got Itanium prefetch rolling:
I wrote a prefetched matmul kernel for Level 3, and the Level 1 and 2
routines get a speedup from the previously existing prefetch-enabled kernels.
There is no need to get it if you are not on an IA64; there are no general speedups.
On a machine for which I can't publish results, this kernel gets about 75%
of peak, while the old kernel got roughly 70%. On the Compaq testdrive
machine (800MHz IA64), the new full dgemm clocks in at 2.245 Gflops for *very*
large problems. I don't have the old version installed on this machine, so I'm
not sure how this compares, but it is only 70% of peak, so maybe the faster
clock speed is bringing down % of peak . . .
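
For anyone who wants to check the % of peak arithmetic above, here is a
minimal sketch. It assumes the Itanium retires 4 double-precision flops per
cycle (two fused multiply-adds), which gives a 3.2 Gflops peak at 800MHz and
agrees with the 70% figure quoted above. The function names are hypothetical
and are not part of ATLAS; the 2*N^3 flop count is the usual convention for
an NxN dgemm.

```python
def percent_of_peak(gflops, clock_ghz, flops_per_cycle):
    """Achieved rate as a percentage of theoretical peak."""
    peak_gflops = clock_ghz * flops_per_cycle
    return 100.0 * gflops / peak_gflops


def dgemm_gflops(n, seconds):
    """Gflops for an n x n x n dgemm, counting 2*n^3 flops."""
    return 2.0 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    # Assumption: 4 flops/cycle on the 800MHz IA64 (two FMAs per cycle).
    pct = percent_of_peak(2.245, 0.8, 4)
    print("dgemm at 2.245 Gflops is %.1f%% of peak" % pct)
```

If you post timings, giving N and the wall time lets others recompute the
rate the same way with dgemm_gflops(N, seconds).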
I don't have access to an MKL that works, and if I did, I'd be under NDA,
so I don't know how it compares. If anyone has such information and is
allowed to divulge it, I'd love to see a comparison.
As an odd note, using a gcc 3.0 installed by me on our NDA'd IA64 prototype,
everything works fine. On Compaq's machine, using a gcc 3.0 installed by them,
the ATLAS-generated cleanup seg faults when compiled with -O2, but not when
compiled with -O. Dropping to -O costs you performance, of course . . . Anyway,
if you try to install and get seg faults during install, lower MMFLAGS to
-O . . .
If anyone gets performance results with the new stuff, please post. I'd
be particularly interested in a comparison with MKL or previous ATLAS on