Re: sgemm questions
Got 'em. I'll let you know when I have results.
>BTW, why 713, down from earlier 760?
The 760 was what I got using the kernel timer, and 713 is what I got
timing the full SGEMM built on top of it. I'm not sure I can explain the
full (roughly 50 Mflop) difference; the data copy is only O(N^2) work
against O(N^3) flops, so it shouldn't continue to kill us as matrices get
large. However, the kernel timer rarely predicts full-GEMM performance
exactly (it over- or underestimates, depending on the arch) . . .
CGEMM also peaks around 711 Mflop for a 1120x1120 problem.
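For anyone comparing these numbers, here is a minimal sketch of how a GEMM Mflop rate relates to problem size and elapsed time. It assumes the common flop-count convention of 2*M*N*K for real GEMM (one multiply and one add per inner-product term) and 8 real flops per complex multiply-add for CGEMM; that is an assumption, not necessarily what the ATLAS timers count. The helper names `gemm_flops`, `mflops` and `seconds_at` are hypothetical.

```python
def gemm_flops(m, n, k, complex_=False):
    """Flop count for C <- alpha*A*B + beta*C, with A m x k and B k x n.

    Assumed convention: 2*m*n*k for real GEMM; a complex multiply-add is
    counted as 8 real flops, so complex GEMM is 4x the real count.
    """
    flops = 2 * m * n * k
    return 4 * flops if complex_ else flops


def mflops(flops, seconds):
    """Convert a flop count and elapsed seconds into Mflop/s."""
    return flops / (seconds * 1e6)


def seconds_at(flops, rate_mflops):
    """Elapsed time implied by running `flops` at `rate_mflops` Mflop/s."""
    return flops / (rate_mflops * 1e6)


if __name__ == "__main__":
    n = 1120
    c = gemm_flops(n, n, n, complex_=True)
    # Wall time a 1120x1120 CGEMM would need to reach the quoted 711 Mflop/s.
    print(f"CGEMM {n}x{n}: {c:.4g} flops, {seconds_at(c, 711):.2f} s at 711 Mflop/s")
    # Relative gap between the kernel-timer rate (760) and full SGEMM (713).
    print(f"full SGEMM vs kernel timer: {100 * (760 - 713) / 760:.1f}% slower")
```

Because the flop count grows as N^3 while a matrix copy grows as N^2, the copy's share of the total time should shrink as N grows, which is why a persistent gap at large sizes is surprising.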
>Great! Please let me know if you have this under control and no
>longer need the kernels I've been submitting. Then I'd have more time
>for things like chasing down atlas compile errors on odd platforms
Peter has only given me one SSE kernel, and it was roughly the same
speed as yours (maybe slightly slower, but not appreciably different).
So, at the moment, my plan is to use your stuff for SSE and his for
3DNow!, as you both originally signed up for; it wasn't until you
had each created your own kernel that you both apparently decided to
produce the complement . . . My thought on duplicate submissions is that
first one wins unless performance or some other mitigating factor
intervenes . . .
I've been using your kernel to help debug my new install procedure, and
now that you've sent in the cleanup, it should be doubly useful. I can use
the UltraSparc kernel similarly, but due to NFS and a slower processor, the
UltraSparc install takes almost 10 times as long as the PIII one on my laptop . . .