MKL5.0 v. ATLAS 3.2 v. ATLAS 3.3 on the P4



Guys,

Here are some timings comparing MKL5.0, ATLAS 3.2.1, and the ATLAS developer
release 3.3.0.  All timings are on our 1.5GHz P4 (256K L2).  Note that the
developer release requires an experimental "as" to assemble the new SSE2
instructions; in the 5 minutes I spent on it, I was not able to get this
working under cygwin/Windows 2000.  Again, I'm timing MKL under Win2K 'cause
the Linux version of MKL is under NDA.  So, the MKL5.0 and ATLAS 3.2.1 timings
were obtained under Win2K, while the ATLAS 3.3.0 timings were taken under
Linux **on the same machine**.

As with the PIII, MKL5.0 seg faults for 500x500 double complex HERK and HER2K
(the z500 row of the last table), so that's why there are no timings for that
case.

The quickest summary I could give would be: just use ATLAS.
The main difference between ATLAS 3.2 and 3.3 is, of course, support for
SSE2 using Camm and Peter's excellent kernels.  I have not done full timings;
I settled for what I had time for, so MKL may be better on other cases,
but I think its P4 support is just too preliminary for that to be likely . . .

Cheers,
Clint
*******************************************************************************
*                         1.5GHz P4, 256K L2                                  *
*******************************************************************************
M50 : MKL5.0, Win2K
A32 : ATLAS 3.2.1, Win2K
A33 : ATLAS 3.3.0, Linux

            1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
           =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
M50  dLU   676.4  681.2  677.9  685.7  690.5  690.0  691.7  694.6  695.3  698.1
A32  dLU  1045.7 1073.6 1077.1 1108.8 1109.1 1124.8 1130.2 1138.9 1137.9 1161.6
A33  dLU  1514.8 1562.7 1568.6 1619.3 1645.5 1677.6 1690.4 1720.1 1715.2 1722.1
M50  sLU  1741.7 1790.7 1840.5 1883.8 1861.5 1915.3 1999.8 1995.9 1977.1 1994.4
A32  sLU  2449.5 2571.5 2699.7 2812.1 2878.7 2977.9 3036.6 3094.0 3142.3 3191.8
A33  sLU  2449.5 2504.6 2624.4 2756.3 2851.0 2944.5 3060.8 3090.8 3126.2 3173.8

             100    200    300    400    500    600    700    800    900   1000
           =====  =====  =====  =====  =====  =====  =====  =====  =====  =====
M50  sLU   556.9 1121.7 1346.6 1245.2 1468.4 1513.9 1522.8 1696.6 1728.1 1748.5
A32  sLU   527.6  917.8 1134.0 1520.9 1560.2 1917.6 1903.5 1994.2 2207.3 2220.6
A33  sLU   514.1  917.8 1134.0 1521.0 1664.2 1917.6 1903.5 2131.3 2207.3 2220.6
M50  dLU   384.8  531.3  567.0  606.6  622.5  639.2  650.8  642.2  664.3  672.2
A32  dLU   425.7  673.0  766.8  815.8  734.2  924.9  993.1  974.3  988.9 1023.3
A33  dLU   435.8  696.2  936.8 1120.7 1134.7 1150.6 1269.0 1311.6 1387.4 1448.2
M50  cLU   769.8  998.1 1024.4 1033.4 1037.6 1086.1 1100.1 1098.8 1114.9 1109.3
A32  cLU   554.9  912.6 1239.8 1543.0 1665.4 1856.9 2077.6 2162.7 2310.6 2377.9
A33  cLU   631.0 1064.7 1438.2 1623.9 1850.5 2055.9 2229.7 2313.0 2369.7 2445.6
M50  zLU   696.2  848.3  859.5  919.1  897.8  898.0  895.4  896.6  928.4  917.9
A32  zLU   504.8  709.8  826.6  897.4  945.0  992.5 1026.0 1040.2 1071.8 1077.5
A33  zLU   438.9  734.3  980.6 1136.7 1189.6 1338.7 1451.1 1436.5 1518.1 1532.0

M50  sMM  2500.0 2380.0 2454.5 3200.0 3125.0 2880.0 2982.6 3200.0 3095.5 2980.6
A32  sMM  2142.9 2880.0 3200.0 3605.6 4166.7 3570.2 3591.6 3923.4 3940.5 3913.9
A33  sMM  2631.6 2917.6 3240.0 4266.7 3846.2 3600.0 4035.3 4096.0 3940.5 3921.6
M50  dMM   952.4 1361.7 1350.0 1600.0 1562.5 1728.0 1591.6 1762.5 1672.0 1752.8
A32  dMM   952.4 1066.7 1148.9 1163.6 1184.8 1196.7 1222.8 1217.6 1245.1 1240.7
A33  dMM  1515.2 1600.0 1675.9 1920.0 2000.0 1878.3 1854.1 1969.2 1997.3 1941.7
M50  cMM    19.4 1113.0 1136.8 1216.2 1189.1 1190.1 1191.5 1202.9 1198.3 1199.4
A32  cMM   112.0 2115.7 2700.0 3938.5 4000.0 3918.4 3859.4 4129.0 4072.6 3994.0
A33  cMM   666.7 3200.0 3085.7 3938.5 4000.0 3840.0 3920.0 4137.4 4050.0 4060.9
M50  zMM  1000.0 1010.5  981.8 1064.4 1039.5 1071.3 1045.7 1067.8 1051.2 1052.5
A32  zMM  1047.1 1010.5 1200.0 1187.9 1161.4 1223.8 1212.0 1217.2 1208.5 1199.4
A33  zMM  1538.5 1920.0 1963.6 1897.3 1923.1 2057.1 2017.6 2133.3 2046.3 2088.8

                                               HEMM   HERK  HER2K
                                        GEMM   SYMM   SYRK  SYR2K   TRMM   TRSM
                                      ====== ====== ====== ====== ====== ======
M50 s500                              3125.0 1785.7 1926.9 1243.8 2500.0 1785.7
A32 s500                              3571.4 3571.4 2783.3 3571.4 3125.0 2272.7
A33 s500                              3846.2 3571.4 2890.4 3571.4 3125.0 2381.0
M50 d500                              1087.0  803.9 1138.6  889.7 1126.1  690.6
A32 d500                              1000.0 1136.4  963.5 1136.4 1041.7 1041.7
A33 d500                              1923.1 1923.1 1565.6 1785.7 1666.7 1470.6
M50 c500                              1062.7 1040.6 1000.0  891.3 1019.3 1085.7
A32 c500                              2849.0 3436.4 2947.1 3846.2 3128.1 1467.7
A33 c500                              4000.0 3703.7 2783.3 3783.3 3336.7 2085.4
M50 z500                              1019.4  960.6 **SEG FAULT**  925.1  892.2
A32 z500                              1203.4 1189.1  961.6 1189.1 1040.5 1062.6
A33 z500                              1851.9 1785.7 1565.6 1923.1 1725.9 1352.7
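
To put the "just use ATLAS" summary in numbers, the ratios between rows can
be computed directly from the tables.  The following is only a minimal
sketch in Python: the values are copied from the dLU rows (sizes 1200-3000)
and from the 500x500 GEMM column above, while the names DLU, GEMM500,
ratios and gemm_gain are made up for this illustration and are not part of
ATLAS or MKL.  Since SSE2 is what adds double-precision SIMD, one would
expect the 3.3 vs. 3.2 gain to show up mostly in the d and z rows, and the
second loop lets you check that against the GEMM numbers.

```python
# Values copied from the tables above, in the units reported there.
SIZES = [1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800, 3000]

# Double precision LU rows for the large problem sizes.
DLU = {
    "M50": [676.4, 681.2, 677.9, 685.7, 690.5, 690.0, 691.7, 694.6, 695.3, 698.1],
    "A32": [1045.7, 1073.6, 1077.1, 1108.8, 1109.1, 1124.8, 1130.2, 1138.9, 1137.9, 1161.6],
    "A33": [1514.8, 1562.7, 1568.6, 1619.3, 1645.5, 1677.6, 1690.4, 1720.1, 1715.2, 1722.1],
}

# GEMM column of the 500x500 table, ATLAS rows only, keyed by precision.
GEMM500 = {
    "s": {"A32": 3571.4, "A33": 3846.2},
    "d": {"A32": 1000.0, "A33": 1923.1},
    "c": {"A32": 2849.0, "A33": 4000.0},
    "z": {"A32": 1203.4, "A33": 1851.9},
}


def ratios(numer, denom):
    """Element-wise ratio of two equally long rows of timings."""
    return [n / d for n, d in zip(numer, denom)]


def gemm_gain(precision):
    """Ratio of ATLAS 3.3.0 to ATLAS 3.2.1 for 500x500 GEMM."""
    row = GEMM500[precision]
    return row["A33"] / row["A32"]


def main():
    # ATLAS 3.3.0 relative to MKL5.0 for double precision LU.
    for n, r in zip(SIZES, ratios(DLU["A33"], DLU["M50"])):
        print(f"dLU N={n}: A33/M50 = {r:.2f}")
    # ATLAS 3.3.0 relative to ATLAS 3.2.1 for each precision.
    for p in "sdcz":
        print(f"{p}GEMM 500: A33/A32 = {gemm_gain(p):.2f}")


if __name__ == "__main__":
    main()
```

The same two helpers work for any other pair of rows; only the lists need
to be swapped for the rows of interest.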