Re: sgemm questions

Hi Peter!  Great work on your generator!  I'd love to see a sample of
what it's producing!

Peter Soendergaard <soender@cs.utk.edu> writes:

> I have included a file with the macros I currently use, but I only use
> very basic instructions. pfadd, pfmul, pfacc and some of the mmx
> instructions to move 32 bits in and out of the vectors.
> Which added instructions were you thinking of? prefetch{nta,t0,t1,t2},
> flip-the-vector?

These, and movntq.  But I've played a bit, and found that (apparently):
1) prefetchnta gives no gain over plain prefetch.
2) movntq seems to barely improve things for complex, and for single
   with beta=0, but dramatically kills performance for single with
   beta=1,X.

So I think the facts justify a single AMD implementation.
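
To make the beta cases in (2) concrete, here is a rough C sketch of the
write-back step of C = A*B + beta*C.  The helper name store_c_row and
its arguments are made up for illustration; this is not ATLAS code and
uses no 3DNow!.  It only shows which cases read C before writing it.
One possible reading of the timings, offered as a guess: with beta=0
the old C is never loaded, so a cache-bypassing store like movntq costs
nothing extra, while with beta=1 or beta=X the C line has just been
read and is already in cache.

```c
#include <stddef.h>

/* Hypothetical helper (not ATLAS code): write one row of finished
 * dot products, acc[0..n-1], back into C for the three beta cases. */
void store_c_row(float *c, const float *acc, size_t n, float beta)
{
    size_t j;

    if (beta == 0.0f) {
        /* beta=0: C is overwritten and never read.  This is the case
         * where a non-temporal store would be a candidate. */
        for (j = 0; j < n; j++)
            c[j] = acc[j];
    } else if (beta == 1.0f) {
        /* beta=1: C is loaded first, then updated in place. */
        for (j = 0; j < n; j++)
            c[j] += acc[j];
    } else {
        /* beta=X: C is loaded, scaled, then updated in place. */
        for (j = 0; j < n; j++)
            c[j] = beta * c[j] + acc[j];
    }
}
```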

> I have not done any real tests for the SSE, I have more or less just
> confirmed that I got working code, so I cant remember the exact
> performance I got, but it was reasonable.

Great!  Please let me know if you have this under control and no
longer need the kernels I've been submitting.  Then I'd have more time
for things like chasing down ATLAS compile errors on odd platforms.

Take care,

