ATLAS 3.3.8: strong enough for a man, but made for your workstation


ATLAS 3.3.8 is finally available, get yours at:

The _big_ news is a 25% increase in double precision performance
on Athlons, due to Julian Ruhe's excellent assembly kernel.  This bad boy
gets something like 78% of peak on the Athlons I have tried it on
(1.2GHz tbird & 600MHz Athlon classic).  Remember, the peak of an Athlon
using the x87 FPU is 2*MHz, not the 1*MHz enjoyed by Pentiums.  If you've got
an Athlon, you've got a reason to scope 3.3.8 . . .
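
To put rough numbers on that, here is a small sketch of the arithmetic.  It
assumes "peak" means theoretical MFLOPS (flops per cycle times clock in MHz)
and takes the 78% figure at face value; the function names are made up for
illustration and are not part of ATLAS.

```python
# Theoretical x87 peak in MFLOPS, using the rule of thumb from the text:
# 2 flops per cycle on an Athlon, 1 flop per cycle on a Pentium.
def peak_mflops(clock_mhz, flops_per_cycle):
    return clock_mhz * flops_per_cycle


# Estimated sustained rate, given a fraction of peak (assumed, e.g. 0.78).
def achieved_mflops(clock_mhz, flops_per_cycle, fraction_of_peak):
    return peak_mflops(clock_mhz, flops_per_cycle) * fraction_of_peak


if __name__ == "__main__":
    for name, mhz in [("1.2GHz tbird", 1200), ("600MHz Athlon classic", 600)]:
        peak = peak_mflops(mhz, 2)
        got = achieved_mflops(mhz, 2, 0.78)
        print(f"{name}: peak {peak} MFLOPS, ~{got:.0f} MFLOPS at 78% of peak")
```

By the same rule, a Pentium at the same clock would top out at half the
Athlon's theoretical peak.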

In order to support Julian's kernel, I added support for accepting gemm
kernels as precompiled .o files, rather than always expecting .c source.
This adds quite a bit of flexibility to the kernel submission.

I have tested the new stuff on Linux; I don't think FreeBSD uses ELF format,
anyone know?  I have special code in for Windows, but it's pretty lame and
I doubt it will work out of the gate.  If anyone tries on FreeBSD or Windows,
let me know how bad the carnage is . . .

The source for the kernel is included as well, don't worry.  If you are
interested in seeing what hoops he jumped through to produce this excellent
code, you can find it in ATLAS/tune/blas/gemm/CASES/objs.

Another key feature is the addition of a sanity test for post-install.  It
takes a few minutes to run, and can be called from the ATLAS/ directory by:
   make sanity_test arch=<arch>
This guy will run all BLAS interface testers, as well as some quick LAPACK
testers from ATLAS/bin.  Together, these tests give a pretty good idea of
whether your install went OK.  I highly recommend all users run this after
the install completes . . .

We also now have all the LAPACK routines relating to inversion supported by
ATLAS.  See ATLAS/interfaces/lapack for details.

This release allows for the tuning of the Level 1 routine _ROT.  This completes
the Level 1 routines I intend to support for the next release.
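
For anyone who doesn't have the Level 1 BLAS memorized: _ROT (srot, drot)
applies a plane rotation to a pair of vectors.  The sketch below shows only
the reference semantics, assuming unit strides (the real routine also takes
incx/incy); it is plain Python for illustration and is not the tuned ATLAS
kernel.

```python
# Reference semantics of the Level 1 BLAS plane rotation, unit stride only.
# For each i:  x[i] <- c*x[i] + s*y[i]
#              y[i] <- c*y[i] - s*x[i]   (using the old x[i])
def rot(x, y, c, s):
    n = min(len(x), len(y))
    for i in range(n):
        xi, yi = x[i], y[i]  # save old values before overwriting
        x[i] = c * xi + s * yi
        y[i] = c * yi - s * xi


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [4.0, 5.0, 6.0]
    rot(x, y, 0.6, 0.8)  # c*c + s*s == 1, so this is a true rotation
    print(x, y)
```

The loop is nothing but multiplies and adds over two streams, so what there
is to tune is mostly a matter of how the loop is scheduled for the machine.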

Finally, this release should fix some bugs from the last, including some
Level 1 errors that can cause seg faults, and an error in detecting AltiVec
for the G4.  PIII users should see their Level 1 performance go up
considerably for certain routines; the arch defaults were bad before . . .

This release was supposed to be the feature-frozen release for starting the
path to the next stable release.  However, I've still got some B.S. to put
in, so that will be delayed until 3.3.9.