Re: Altivec and ATLAS



Nick,

>My machine is a dual 533 MHz G4 with 133 MHz SDRAM, 64K of L1 cache, and 
>1 MB of L2 cache running at 233 MHz.
>ATL_mm4x4x2_1_pref.c makes a 670 Mflops SGEMM.

My mistake, anyway.  I was thinking of dgemm numbers, not sgemm . . .

>Using Altivec, I get a 1280 Mflops SGEMM;  2 Gflops with both processors 
>using pthreads.
>The NB is 80.  I do think that I can make this better; after all, the 
>Altivec unit can do 4 single-precision muladds per cycle!
>I'm actually not much of an Altivec programmer. This is one of my first 
>efforts.

Memory costs will, of course, prevent you from getting the full speed.
I have a 500 MHz PIII using SSE, which should get a peak of 2 Gflop.
However, both MKL (Intel's BLAS) and ATLAS's SGEMM peak out around
920 Mflop.  Scaled linearly to 533 MHz, that would be about 980 Mflop.

So, if your 1.3Gflop holds up, I will be very impressed already.  However,
it is often the case that the kernel timer is not completely accurate.
With SSE, it overpredicts performance by a fair amount.  So, once the full
SGEMM is built, I'll be interested to see what you get.  If you have
difficulty building the full gemm from the directions in the contrib
doc, let me know.  Of course, if you can't get it to build, once you are
satisfied with your kernel you can send it to me, and as soon as I
can get PPC access, I'll build it into the full gemm and let you know
what happens.

And by the way, on your machine, you do have Make.<arch>'s L2SIZE set
to >= 2097152 (2 MB), right?

Thanks,
Clint