ATLAS Developer release 3.3.0 available
I have finally got a new developer release out the door. The main thing
is that it has Camm & Peter's SSE2 stuff in it, so we can clock in at
around 2 Gflop/s on the P4 for DGEMM. Peter may note that I am using NB=80
for single precision, rather than the more optimal NB of 112. NB=112 is
asymptotically better, but even at N=3000 it is only 3% better than NB=80,
while NB=80 gets twice the performance for N=20-200 . . .
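
To make the NB tradeoff a bit more concrete, here is a rough sketch (not ATLAS code) of one likely reason a smaller blocking factor helps small problems: the share of a square N x N GEMM that falls into full NB-sized blocks, as opposed to leftover edge cases. The function name full_block_fraction and the assumption that edge cases run slower than full blocks are illustrative only; the measured numbers above are what actually drove the choice.

```python
def full_block_fraction(n: int, nb: int) -> float:
    """Fraction of the 2*n^3 flops of a square GEMM that land in full
    nb x nb x nb blocks (hypothetical helper, for illustration only)."""
    covered = (n // nb) * nb      # largest multiple of nb that fits in n
    return (covered / n) ** 3     # all three loop dimensions are blocked


if __name__ == "__main__":
    # Compare the two blocking factors over small and large problem sizes.
    for n in (20, 100, 200, 3000):
        print(f"N={n:5d}  NB=80: {full_block_fraction(n, 80):.3f}"
              f"  NB=112: {full_block_fraction(n, 112):.3f}")
```

Under this simple model, any N below 112 has no full block at all with NB=112, while NB=80 already covers about half the work at N=100. At N=3000 both blocking factors cover over 90% of the work, so the per-block speed dominates.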
The new developer release represents everything that has been submitted,
with the exception of the parallel make functionality, which should be
in the next one.
This release also includes some code from me, speeding up small-case real
LU and Cholesky, and some improvements to complex TRSM. All this goodness
is available at:
By the way, my mail is presently not working, so don't be too surprised if
you send me mail and don't hear back immediately: I will have to scope the
developer release to make sure this email goes out . . .