
Timing roundup (P4, IA64, Athlon)



Guys,

Below are some interesting timings comparing the following three systems,
all running Linux with a default ATLAS 3.2.0 install:

ATH : 1Ghz Athlon, SDRAM                           $1269
P4  : 1.5Ghz Pentium 4, Rambus                     $2109
IA64: 666Mhz Itanium, no idea on mem               ?????

So, the first thing to note is that the Athlon is using the old memory
type (SDRAM, not the newer DDR SDRAM), and the P4 is using Rambus.  I have
no idea what the Itanium has.  The prices above are not what we paid for
the machines (I have no idea what that was); they are what Gateway tells me
those machines cost with 256MB of memory.

All the numbers here are using the P4's normal (x87) FPU.  This machine will
need SSE2 to really shine (that will pump its theoretical peak to 2*Mhz).
However, the normal FPU is what you get just using gcc on Linux, so it's what
Linux people will be getting for a while, as well as MSVC++ people (Intel has a
compiler for Windows that apparently generates SSE2 code automatically,
which is what MKL is apparently already using to get dmatmul > Mhz).

So, the good news is that the P4 looks a lot like a PIII at a higher
clock speed, even when using the normal FPU (I had heard rumors that the
P4 FPU was crippled), since you get roughly 72% of peak with dgemm (the
same number the PII gets; PIIIs typically get more like 76%).  Here are
some peak numbers (extracted from the detailed timings below):


                  Theo   dMatmul       dLU    dMM %     dLU %
            Mhz   peak   (MFLOP)   (MFLOP)   of Mhz    of Mhz
           ====   ====   =======   =======   ======    ======
ATH :      1000   2000    1192.6    1003.1    119.3     100.3
P4  :      1500   1500    1073.9     986.1     71.6      65.7
IA64:       666   2664    1866.3    1336.0    280.2     200.6

           Theoretical    dMatmul      dLU   dLU %
           peak (Mflop)    % peak   % peak   of dMM
           ============   =======   ======   ======
ATH :             2000       59.6     50.2     84.1
P4  :             1500       71.6     65.7     91.8
IA64:             2664       70.1     50.2     71.6
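
For anyone who wants to re-derive the percentage columns, here is a minimal
Python sketch.  The measured numbers are copied from the two tables above; the
flops-per-cycle values are an assumption inferred from "Theo peak / Mhz" in the
first table, and the names SYSTEMS and derived() are made up for illustration.

```python
# Peak MFLOPS copied from the tables above (best value over the N timed).
# flops_per_cycle is inferred from "Theo peak / Mhz", not measured.
SYSTEMS = {
    "ATH":  {"mhz": 1000, "flops_per_cycle": 2, "dmm": 1192.6, "dlu": 1003.1},
    "P4":   {"mhz": 1500, "flops_per_cycle": 1, "dmm": 1073.9, "dlu": 986.1},
    "IA64": {"mhz": 666,  "flops_per_cycle": 4, "dmm": 1866.3, "dlu": 1336.0},
}


def derived(s):
    """Return the derived columns of both tables for one system."""
    peak = s["mhz"] * s["flops_per_cycle"]  # theoretical peak in MFLOPS
    return {
        "peak": peak,
        "dmm_pct_mhz": 100.0 * s["dmm"] / s["mhz"],
        "dlu_pct_mhz": 100.0 * s["dlu"] / s["mhz"],
        "dmm_pct_peak": 100.0 * s["dmm"] / peak,
        "dlu_pct_peak": 100.0 * s["dlu"] / peak,
        "dlu_pct_dmm": 100.0 * s["dlu"] / s["dmm"],
    }


if __name__ == "__main__":
    for name, s in SYSTEMS.items():
        d = derived(s)
        print(f"{name:5s} peak={d['peak']:5d} "
              f"dMM%peak={d['dmm_pct_peak']:5.1f} "
              f"dLU%peak={d['dlu_pct_peak']:5.1f} "
              f"dLU%dMM={d['dlu_pct_dmm']:5.1f}")
```

Rounded to one decimal, these should agree with the percentages in the tables.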


OK, so peak performance-wise (N=3000 is the largest size I timed; both the
Athlon and IA64 LU numbers were still improving there, as you would expect from
their LU % of MM numbers), without SSE2, it looks like the P4 will need to be
clocked about 1.66 times faster than an Athlon to match its GEMM peak, and
about 1.53 times faster to match its LU peak.  Since the LU peak should be
perked up quite a bit by faster memory, it may look more like the MM numbers
soon.  So, under these conditions, the Athlon is the fp king of the two.  The
Athlon is also far and away the flops/$ champion, and as far as I know, this
is true against any machine on the market.
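
The clock-ratio and flops/$ arithmetic can be checked with a short Python
sketch.  The figures are copied from this message; the helper names are made
up for illustration, and the ratio assumes (as the paragraph above implicitly
does) that the P4's MFLOPS scale linearly with its clock rate.

```python
# Peak MFLOPS, clock (Mhz) and Gateway list price ($) copied from this message.
ATH = {"mhz": 1000, "dmm": 1192.6, "dlu": 1003.1, "price": 1269}
P4 = {"mhz": 1500, "dmm": 1073.9, "dlu": 986.1, "price": 2109}


def clock_ratio(a, b, kernel):
    """Factor by which b's clock must exceed a's for b to match a on `kernel`.

    Assumes b's MFLOPS scale linearly with its clock rate.
    """
    flops_per_mhz_a = a[kernel] / a["mhz"]
    flops_per_mhz_b = b[kernel] / b["mhz"]
    return flops_per_mhz_a / flops_per_mhz_b


def mflops_per_dollar(s, kernel="dmm"):
    """Peak MFLOPS on `kernel` per dollar of list price."""
    return s[kernel] / s["price"]


if __name__ == "__main__":
    print("GEMM clock ratio:", round(clock_ratio(ATH, P4, "dmm"), 2))
    print("LU clock ratio:  ", round(clock_ratio(ATH, P4, "dlu"), 2))
    print("ATH MFLOPS/$:    ", round(mflops_per_dollar(ATH), 2))
    print("P4  MFLOPS/$:    ", round(mflops_per_dollar(P4), 2))
```

The GEMM ratio comes out just under 1.67 and the LU ratio near 1.53, and the
Athlon delivers roughly 0.94 dgemm MFLOPS per dollar against roughly 0.51 for
the P4.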

Anyway, the full timings are given below.  You'll see that the P4 does well
at small problem sizes (probably due to superior memory), with the IA64 doing
really poorly for small problems (memory again).

Cheers,
Clint


             100    200    300    400    500    600    700    800    900   1000
           =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
ATH  dMM   909.1 1010.5 1080.0 1163.6 1087.0 1136.8 1143.3 1190.7 1205.0 1156.1
P4   dMM   952.4 1010.5 1080.0  984.6 1041.7 1080.0 1055.4 1077.9 1088.1 1075.3
IA64 dMM   866.3 1247.9 1472.4 1566.6 1570.6 1708.0 1645.1 1730.3 1710.2 1741.5

ATH  dLU   477.4  611.8  695.0  709.8  780.1  777.4  815.8  793.1  823.0  865.2
P4   dLU   435.8  611.8  718.2  788.6  805.2  821.8  878.5  874.4  882.9  888.2
IA64 dLU   241.2  419.4  554.3  652.8  754.2  800.4  832.4  873.0  926.0  937.0

            1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
           =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
ATH  dMM  1183.6 1172.6 1175.3 1192.6 1179.9 1175.3 1189.7 1191.2 1190.1 1187.3
P4   dMM  1066.7 1067.7 1066.7 1065.3 1071.0 1073.9 1073.3 1072.7 1073.4  852.3
IA64 dMM  1789.2 1809.9 1820.4 1858.1 1840.1 1823.3 1832.5 1810.5 1862.2 1866.3

ATH  dLU   878.8  923.4  925.2  943.2  950.3  965.5  974.9  983.5  994.6 1003.1
P4   dLU   906.5  932.8  937.9  950.2  955.4  965.5  969.8  977.8  975.4  986.1
IA64 dLU   990.7 1047.1 1077.9 1149.2 1179.4 1208.7 1240.7 1272.7 1305.7 1336.0

                          GEMM   SYMM   SYRK  SYR2K   TRMM   TRSM
                         =====  =====  =====  =====  =====  =====
ATH-1     500           1136.4 1000.0  835.0 1087.0  961.5  961.5
P4-1.5    500           1041.7  961.5  835.0 1000.0  892.9 1041.7
IA64-666  500           1610.1 1201.9 1462.9 1462.9 1082.5  816.6