
SSE numbers



Guys,

I've just finished incorporating Camm's newest submissions, and the time
seemed ripe to finally compare against Doug's submitted full emmerald
SGEMM.  All numbers are for my 500 MHz Coppermine PIII laptop, using
the ATLAS timers (which flush the cache, in case my numbers seem low to you).

The short version is that Doug's code is a little better for some smaller
problem sizes because of the cost of our kernel cleanup, but the kernel code
is the clear winner for moderate or large problems.  SGEMM peaks at 920 Mflop
(46% of SSE peak, 184% of x87 peak), and CGEMM is about the same.  Emmerald
SGEMM appears to peak around 880.  Single-precision LU peaks (for the problem
sizes I ran) around 665 Mflop for the kernel approach and 611 for emmerald
(yow! LU timings exceeding x87 peak).
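The percentages above follow from simple peak-rate arithmetic. A minimal Python sketch, assuming 4 single-precision flops/cycle for SSE and 1 flop/cycle for x87 on this 500 MHz part (these rates are inferred from the quoted percentages, not taken from Intel documentation):

```python
# Peak-rate arithmetic for a 500 MHz PIII.
# ASSUMPTION: 4 SP flops/cycle with SSE, 1 flop/cycle with x87
# (inferred from the 46% / 184% figures, not measured).
CLOCK_MHZ = 500.0
SSE_FLOPS_PER_CYCLE = 4
X87_FLOPS_PER_CYCLE = 1


def percent_of_peak(mflops, flops_per_cycle, clock_mhz=CLOCK_MHZ):
    """Return achieved Mflop/s as a percentage of theoretical peak."""
    peak = clock_mhz * flops_per_cycle  # theoretical peak in Mflop/s
    return 100.0 * mflops / peak


sse_pct = percent_of_peak(920.0, SSE_FLOPS_PER_CYCLE)  # SGEMM vs SSE peak
x87_pct = percent_of_peak(920.0, X87_FLOPS_PER_CYCLE)  # SGEMM vs x87 peak
lu_pct = percent_of_peak(665.0, X87_FLOPS_PER_CYCLE)   # LU vs x87 peak
print(f"SGEMM: {sse_pct:.0f}% of SSE peak, {x87_pct:.0f}% of x87 peak")
print(f"LU: {lu_pct:.0f}% of x87 peak")
```

Under the same assumptions, the 665 Mflop LU peak works out to 133% of x87 peak, which is what earns the "yow".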

All in all, I think the kernel option is quite adequate for performance
here.  A kind of cool feature of this is that we get to mix and match on the
build.  The present code uses Camm's kernel, Camm's N-cleanup code, and
Peter's M-cleanup code.  It further uses Camm's K-cleanup for K = 4 or 8,
Peter's for K = 16, 20, 24, ..., 64 (every multiple of 4 in that range),
and the generated code for all other K.
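The K-cleanup selection just described amounts to a small lookup. A hypothetical Python sketch (the function name and source labels are illustrative only, not ATLAS's actual build machinery):

```python
# HYPOTHETICAL sketch of the per-K cleanup selection; names are illustrative.
CAMM_K = {4, 8}                   # Camm's hand-tuned K-cleanup
PETER_K = set(range(16, 65, 4))   # Peter's: 16, 20, ..., 64


def k_cleanup_source(k):
    """Return which contributor's K-cleanup code handles a given K."""
    if k in CAMM_K:
        return "Camm"
    if k in PETER_K:
        return "Peter"
    return "generated"  # everything else falls back to generated code


for k in (4, 8, 12, 16, 63, 64):
    print(k, k_cleanup_source(k))
```

Note that K = 12 appears in neither hand-tuned list, so under this reading it falls through to the generated code.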

Peter's still cranking on the cleanup, so it may actually wind up improved
before release, which would help with the small problem sizes . . .
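Why cleanup matters so much at small sizes: with the SGEMM blocking factor of 64 given in the NOTE in this message, a dimension of 100 is one full block plus a 36-wide leftover strip. A rough Python sketch, assuming square problems (M = N = K = n) and counting only the region where all three dimensions fall in full blocks as kernel work (a simplification for illustration):

```python
NB = 64  # SGEMM blocking factor from the NOTE in this message


def split_dim(n, nb=NB):
    """Split a dimension into full nb-blocks plus a leftover cleanup strip."""
    full_blocks, leftover = divmod(n, nb)
    return full_blocks, leftover


def kernel_fraction(n, nb=NB):
    """Rough fraction of an n x n x n GEMM's flops done in full nb blocks.

    ASSUMPTION: M = N = K = n; anything outside the full-block region
    is counted as cleanup work.
    """
    covered = (n // nb) * nb
    return (covered / n) ** 3


for n in (100, 512, 1000):
    print(n, split_dim(n), f"{kernel_fraction(n):.0%}")
```

Under these assumptions only about 26% of the flops at n = 100 run through full kernel blocks, versus about 88% at n = 1000 and 100% at n = 512, so cleanup speed matters most for the small sizes.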

Cheers,
Clint

GHBlas : Greg Henry's BLAS (some old version; may be better now)
Peter  : Timings I sent a while back, using only Peter's kernels
emmeral: SGEMM built from Doug's emmerald full GEMM
New    : New mixed kernels described above

NOTE: NB = 64 for SGEMM, 56 for CGEMM.  All timings are in Mflop/s;
      column headings give the problem size.

                       100   200   300   400   500   600   700   800   900  1000
                     ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
GHBlas  SGEMM        400.0 417.4 405.0 412.9 416.7 419.4 426.1 428.5 426.3 428.3
Peter   SGEMM        500.0 662.1 736.4 800.0 757.6 815.1 826.5 853.3 857.6 840.3

emmeral SGEMM        714.3 784.3 741.2 783.7 862.1 800.0 797.7 867.8 796.7 873.4
New     SGEMM        597.0 769.2 804.3 872.7 833.3 900.0 902.6 906.2 917.0 909.1

emmeral STRMM        303.0 519.5 540.0 609.5 625.0 635.3 591.4 664.9 662.7 671.1
New     STRMM        330.6 625.0 663.2 738.5 721.2 800.0 762.2 800.0 810.0 819.7

emmeral SLU          223.6 331.0 413.0 462.9 526.6 532.7 525.1 563.7 599.5 611.2
New     SLU          207.4 354.2 413.0 489.5 520.0 567.7 571.1 614.4 630.6 640.5

emmeral SLLt         135.7 241.1 308.4 380.2 426.4 492.1 487.6 488.5 523.5 542.8
New     SLLt         134.3 235.4 318.0 383.8 439.9 481.2 498.2 540.0 566.1 580.6

New     CGEMM         56.9 817.0 864.0 867.8 862.1 886.2 890.9 898.2 891.7 894.9
New     CLU          246.1 381.7 468.9 544.2 559.8 605.9 643.8 662.4 691.5 696.0


                       128   256   384   512   640   768   896  1024  1152  1280
                     ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
emmeral SGEMM        751.8 820.2 617.7 745.7 680.9 748.7 672.3 732.9 693.3 733.3
New     SGEMM        830.1 838.9 894.0 958.7 919.8 906.0 910.5 910.5 926.6 919.8

emmeral SLU          255.0 368.3 440.9 438.0 523.7 482.7 584.3 470.6 595.7 547.9
New     SLU          268.7 368.3 476.4 490.9 569.2 580.2 622.3 533.8 661.4 665.4


                       112   224   336   448   560   672   784   896  1008  1120
                     ===== ===== ===== ===== ===== ===== ===== ===== ===== =====
New     CGEMM        969.7 899.2 892.5 899.2 912.3 909.3 909.2 912.0 916.5 916.0
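The first table's SGEMM rows can be compared directly. A short Python sketch that computes the per-size ratio of the New build to emmerald, using the numbers copied from that table:

```python
# Mflop/s values copied from the first table above.
SIZES = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
EMMERALD_SGEMM = [714.3, 784.3, 741.2, 783.7, 862.1,
                  800.0, 797.7, 867.8, 796.7, 873.4]
NEW_SGEMM = [597.0, 769.2, 804.3, 872.7, 833.3,
             900.0, 902.6, 906.2, 917.0, 909.1]


def ratios(new, old):
    """Per-size speed ratio new/old; values above 1.0 mean New wins."""
    return [n / o for n, o in zip(new, old)]


r = ratios(NEW_SGEMM, EMMERALD_SGEMM)
for n, x in zip(SIZES, r):
    print(f"N={n:5d}  new/emmerald = {x:.3f}")
```

Ratios below 1.0 occur at N = 100, 200, and 500, consistent with emmerald being a little better at some of the smaller sizes.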