Wrapping of Julian's code more or less completed.
Hi Clint,
I have finished wrapping Julian's Athlon kernel into a .c file using gcc
inline assembly. It provides all four precisions and does N cleanup; N is
always read at runtime.
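The sketch below is a minimal plain-C illustration of what "N cleanup with N read at runtime" usually means. The main loop handles N in fixed-width blocks, and a cleanup loop handles the leftover columns. This is not Julian's kernel, which is Athlon-specific inline assembly. The names gemm_ncleanup and gemm_col, the column-major layout, and the unroll width of 2 are all hypothetical choices made only for this illustration.

```c
#include <stddef.h>

/* Hypothetical illustration only: C (M x N) += A (M x K) * B (K x N),
 * all column-major with leading dimensions M, K and M respectively. */

/* c += A * b for a single column b of B and column c of C */
static void gemm_col(size_t M, size_t K, const double *A,
                     const double *b, double *c)
{
    for (size_t k = 0; k < K; ++k) {
        double bk = b[k];
        for (size_t i = 0; i < M; ++i)
            c[i] += A[i + k * M] * bk;
    }
}

void gemm_ncleanup(size_t M, size_t N, size_t K,
                   const double *A, const double *B, double *C)
{
    size_t j = 0;
    size_t n_main = N - N % 2;   /* largest even number <= N (assumed unroll of 2) */

    /* Main loop: two columns of C per pass, so each load of A is reused twice */
    for (; j < n_main; j += 2) {
        const double *b0 = B + j * K, *b1 = B + (j + 1) * K;
        double *c0 = C + j * M, *c1 = C + (j + 1) * M;
        for (size_t k = 0; k < K; ++k) {
            double b0k = b0[k], b1k = b1[k];
            for (size_t i = 0; i < M; ++i) {
                double a = A[i + k * M];
                c0[i] += a * b0k;
                c1[i] += a * b1k;
            }
        }
    }

    /* N cleanup: the single leftover column when the runtime N is odd */
    for (; j < N; ++j)
        gemm_col(M, K, A, B + j * K, C + j * M);
}
```

Because N is only known at runtime, the split between the unrolled part and the cleanup part is computed on entry rather than fixed at compile time.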
I have not looked at the prefetching, so that part is still only
optimized for 30x30 dgemm, but hopefully it does not make too much of a
difference.
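Below is a small, hedged illustration of why prefetching tuned for one problem size may not carry over to others. It shows software prefetching in plain C with the GCC/Clang builtin __builtin_prefetch(addr, rw, locality). It does not reproduce the kernel's actual prefetch scheme. The prefetch distance PF_DIST, the 64-byte cache line (8 doubles), and the function name ddot_prefetch are assumptions for illustration. A distance that suits a 30x30 block may fetch too early or too late for other sizes.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; a real kernel tunes this
 * for one problem size and it may be suboptimal for others. */
#define PF_DIST 16
/* Assumed 64-byte cache lines, i.e. 8 doubles per line */
#define DOUBLES_PER_LINE 8

/* Hypothetical dot product with software prefetch, one prefetch per line */
double ddot_prefetch(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Only prefetch addresses inside the arrays, once per cache line */
        if (i % DOUBLES_PER_LINE == 0 && i + PF_DIST < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&x[i + PF_DIST], 0, 3);
            __builtin_prefetch(&y[i + PF_DIST], 0, 3);
        }
        sum += x[i] * y[i];
    }
    return sum;
}
```

The prefetches only change timing, never the result, so speed testing across several sizes (not just 30x30) is the only way to see whether a fixed distance still helps.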
Please test it thoroughly for speed, since I have a hard time testing it
properly over my 56k modem.
Cheers,
Peter.