
An Intel SIMD drop in kernel



Hello everyone,

As Clint knows, I've been working on a SIMD SGEMM for PIIIs. I've
been attempting to write a drop-in kernel for ATLAS, but I've found
the restriction of N=M=K<64 to be pretty incompatible with the
fundamental algorithm I've been employing (maximal length dot
products).
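
For anyone unfamiliar with the dot-product formulation, here is a minimal
plain-C sketch of the general idea. It is not Emmerald's code: the names
dot and sgemm_dot are made up for illustration, there is no SSE in it, and
it assumes B has already been copied into transposed form so that every
dot product runs over contiguous memory. Each element of C is one dot
product of length K, so a cap on K directly caps how long those dot
products can get.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two contiguous float vectors
 * of length k. A SIMD version would vectorise this loop. */
static float dot(const float *x, const float *y, size_t k)
{
    float acc = 0.0f;
    for (size_t p = 0; p < k; ++p)
        acc += x[p] * y[p];
    return acc;
}

/* C += A * B, computed one dot product per element of C.
 *   a  : m x k, row-major
 *   bt : B transposed, n x k, row-major (assumed pre-copied so that
 *        each column of B is contiguous)
 *   c  : m x n, row-major
 */
static void sgemm_dot(size_t m, size_t n, size_t k,
                      const float *a, const float *bt, float *c)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * n + j] += dot(a + i * k, bt + j * k, k);
}
```

The longer the inner loop in dot runs, the more the per-element overhead
(setting up the accumulator, storing the result) is amortised, which is
roughly why I'd expect a small fixed K to hurt this style of kernel.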

For our current research I've been tinkering with my old
implementation of Emmerald (http://csl.anu.edu.au/~daa/research.html)
in an attempt to boost performance. The result is a new version of
Emmerald roughly 1.1 times faster than the old, with peak Mflops
1.86 times the clock rate. It uses ATLAS for some of its smaller
cleanup cases.

Getting to the point, I'd like to offer it as a drop-in SGEMM for
ATLAS. I am reasonably confident that it is not possible to
write a user kernel that achieves similar performance (having tried
and failed!).

I'd like to know the best way to package up Emmerald to make it easy
for Clint to integrate as he sees fit.

-- 
-Doug  -- http://csl.anu.edu.au/~daa, Ph:(02) 6279-8608, Fax:(02) 6279-8651
Real programmers can tell email has arrived just by the distinctive sound 
the hard disk makes.