
Wrapping of Julian's code more or less completed.




Hi Clint,

I have finished wrapping Julian's Athlon kernel into a .c file using gcc
inline assembly. It provides all four precisions and does N cleanup. N is
always read at runtime.
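
For anyone unfamiliar with the term, "N cleanup" can be sketched in plain C
roughly as below. This is an illustration only, not the wrapped Athlon
kernel: the function name gemm_n_cleanup, the unroll factor of two, and the
column-major layout are all assumptions made for the example. The point is
just that N arrives as an ordinary runtime argument and any columns left over
after the unrolled loop are handled separately.

```c
#include <stddef.h>

/* Hypothetical sketch: C (M x N) += A (M x K) * B (K x N), column-major,
 * with leading dimensions equal to the row counts.
 * Columns of C are processed two at a time; if N is odd, the cleanup
 * loop at the end handles the leftover column. */
void gemm_n_cleanup(size_t M, size_t N, size_t K,
                    const double *A, const double *B, double *C)
{
    size_t j = 0;

    /* Main loop: two columns of C per iteration (assumed unroll factor). */
    for (; j + 2 <= N; j += 2) {
        for (size_t i = 0; i < M; i++) {
            double c0 = 0.0, c1 = 0.0;
            for (size_t k = 0; k < K; k++) {
                double a = A[i + k * M];
                c0 += a * B[k + j * K];
                c1 += a * B[k + (j + 1) * K];
            }
            C[i + j * M]       += c0;
            C[i + (j + 1) * M] += c1;
        }
    }

    /* N cleanup: columns that did not fit into the unrolled loop. */
    for (; j < N; j++) {
        for (size_t i = 0; i < M; i++) {
            double c0 = 0.0;
            for (size_t k = 0; k < K; k++)
                c0 += A[i + k * M] * B[k + j * K];
            C[i + j * M] += c0;
        }
    }
}
```

With N = 3, for instance, the first two columns go through the unrolled loop
and the third through the cleanup loop.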

I have not looked at the prefetching, so it is still tuned only for
30x30 dgemm, but hopefully that does not make too much of a
difference.

Please test it thoroughly for speed, since I have a hard time testing it
properly over my 56k modem.

Cheers,

Peter.