Re: Altivec and ATLAS

Unaligned C is okay: I've written unaligned load and store code for C, 
and it costs about a 5-10% performance penalty.  My Altivec-based 
single-precision L1 matmul is getting roughly 1.2-1.3 Gflops on my 
533 MHz G4, and I can probably improve on that (scalar code gets about 
670 Mflops).
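
For illustration, here is a minimal sketch of one common way to cope with 
an unaligned C. It uses alignment peeling in portable C, which is a 
different technique from the Altivec unaligned load and store code 
mentioned above. The names floats_to_alignment and add_into_c are 
hypothetical and are not part of ATLAS; the sketch assumes C is at least 
4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading floats must be handled one at a
 * time before c reaches a 16-byte (128-bit) boundary.  Assumes c is at
 * least 4-byte aligned. */
static size_t floats_to_alignment(const float *c)
{
    uintptr_t misalign = (uintptr_t)c & 15u;
    return misalign ? (size_t)(16u - misalign) / sizeof(float) : 0;
}

/* Hypothetical update c[i] += t[i] for i in [0, n), split into a scalar
 * head (until c is aligned), an aligned body processed four floats at a
 * time, and a scalar tail. */
static void add_into_c(float *c, const float *t, size_t n)
{
    size_t i = 0;
    size_t head = floats_to_alignment(c);

    if (head > n)
        head = n;
    for (; i < head; i++)
        c[i] += t[i];

    for (; i + 4 <= n; i += 4) {
        /* c + i is 16-byte aligned here, so a vector unit could use one
         * aligned load and one aligned store for these four floats. */
        assert(((uintptr_t)(c + i) & 15u) == 0);
        c[i]     += t[i];
        c[i + 1] += t[i + 1];
        c[i + 2] += t[i + 2];
        c[i + 3] += t[i + 3];
    }

    for (; i < n; i++)
        c[i] += t[i];
}
```

Only the head and tail pay the scalar cost; the body is where aligned 
vector loads and stores would go. Note that t is not guaranteed to be 
aligned at the same offsets as c, so a real kernel would still need to 
deal with that side.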

The G4 has prefetch instructions as well, which may improve the copy 
performance.  Right now, though, I have no idea where in ATLAS these 
instructions should go!
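
As a rough illustration of what prefetching during a copy could look 
like, here is a minimal sketch in portable C. It is not ATLAS code and 
does not answer where the instructions belong in ATLAS. copy_block, 
PREFETCH_READ and PREFETCH_AHEAD are hypothetical names, the prefetch 
distance is an untuned guess, and __builtin_prefetch is the GCC/Clang 
builtin (a no-op fallback is used with other compilers).

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in floats.  A real value would have to
 * be found by timing on the target machine. */
#define PREFETCH_AHEAD 16

#if defined(__GNUC__) || defined(__clang__)
/* Read prefetch with low temporal locality: the data is copied once. */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* Copy an m x n column-major block of a (leading dimension lda) into the
 * contiguous buffer buf, hinting at upcoming source data as we go. */
static void copy_block(const float *a, size_t lda, size_t m, size_t n,
                       float *buf)
{
    for (size_t j = 0; j < n; j++) {
        const float *col = a + j * lda;
        for (size_t i = 0; i < m; i++) {
            /* Issue one hint per PREFETCH_AHEAD elements, and only for
             * addresses that stay inside the current column. */
            if (i % PREFETCH_AHEAD == 0 && i + PREFETCH_AHEAD < m)
                PREFETCH_READ(col + i + PREFETCH_AHEAD);
            buf[j * m + i] = col[i];
        }
    }
}
```

Whether the hints help at all depends on the block size and on how much 
of the source is already in cache, so this would need to be measured 
rather than assumed.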


On Thursday, June 7, 2001, at 11:21 AM, R Clint Whaley wrote:

>> I have a question about the L1 copy matmul.  Altivec code generally
>> requires data to be aligned on 128-bit boundaries.  One can work with
>> unaligned data but it requires extra work.  Is it possible to guarantee
>> that the copied version of the matrices in ATLAS are 128-bit aligned
>> even if the original matrices aren't?  Which portion of the code should
>> I look at?  The Altivec extensions include 128-bit aligned versions of
>> malloc and calloc, so perhaps I can just do a one or two line
>> replacement.
> ATLAS already guarantees 128 bit alignment for everything except
> K-cleanup.  This was put in during the last release for SSE and 3DNow!
> support.  Here is the relevant thread (note that 16 byte == 128 bit for
> discussion):
>    http://www.netlib.org/atlas/atlas-comm/msg00144.html
> Note that this is the alignment of the input matrices A and B *ONLY*, C
> has no guaranteed alignment (C is often passed in by the user and not
> buffered by ATLAS).  Is A and B enough, or do you believe you will need
> C aligned as well
> (http://www.netlib.org/atlas/atlas-comm/msg00274.html gives a brief
> overview of why copying C can be too costly)?
> Cheers,
> Clint

Nicholas Coult, Ph.D.,  web: http://melby.augsburg.edu/~coult
Assistant Professor, Department of Mathematics, Augsburg College
coult@augsburg.edu, phone:  (612) 330-1064 office: Science Hall 137B