Re: sgemm questions

Hello again!  Just looked at this stuff again today, and made a rather
simple change which makes the kernel work on the Athlon, where it
appears to reach a slightly higher percentage of peak than on the Intel.

My question: AMD added a few extra instructions for the Athlon vs the
K6.  Only two of these could be useful here, as far as I can tell.
Should we try to make one kernel without these to run on all AMD K6+
hardware, or a separate one for the Athlon with the extra
instructions?  I don't yet know how much these would help; how large a
speedup would be worth the extra load of yet another subarchitecture?

Take care,

