
*To*: camm@enhanced.com
*Subject*: Alignment of cleanup routines
*From*: R Clint Whaley <rwhaley@cs.utk.edu>
*Date*: Wed, 22 Nov 2000 18:07:11 -0500 (EST)
*Cc*: atlas-comm@cs.utk.edu

Camm,

>I may be misgrokking something, but it seems to me that all this tells
>us is the relative alignment between subsequent rows of a and b. The
>starting a and b themselves need not have the same alignment, right?
>(what I mean by this is a%16 =/!= b%16)

OK, so my understanding is that A%16 == B%16 == 0 for all routines except the K-cleanup cases. It's clear why the K-cleanup cases don't have the required alignment, but I guess the M & N cleanup is less so. Here is how I think it works:

ATLAS now has a macro called ATL_MinMMAlign, which is the minimum alignment A and B must have in order to use the copy-kernel. ATLAS also dictates that (NB*sizeof()) % ATL_MinMMAlign == 0.

When ATLAS copies the matrices, the inner matrix is always copied into a panel of size NB * K, and the outer matrix is in a workspace varying between NB*K <= worksize <= [M,N]*K, depending on whether the outer matrix is A (M) or B (N). So, the inner panel is malloced with the correct alignment, and the outer workspace has the correct alignment because NB*sizeof() is a multiple of ATL_MinMMAlign, and so any multiple of NB (in this case, K*NB) is aligned as well.

For non-K cleanup, all blocks inside the panels are separated by NB*something, so alignment between blocks is maintained. For everything except K-cleanup, each column is of length NB, so all columns are aligned as well . . .

All these tricks break down for K-cleanup because of internal block alignment: each block begins on a multiple of ATL_MinMMAlign (since m=NB for A and n=NB for B), but the internal columns are misaligned according to k.

The cases where more than one dimension is less than NB are not handled by user-supplied cleanup anyway, so user-contributed cleanup never sees a case where blocks of A or B do not start on a multiple of ATL_MinMMAlign (in this case, 16).

>So I guess the best strategy is to
>1) use a 1x4

Hmm, don't know.

>2) give up on b alignment
>3) increment by 1 until a is aligned
>4) stride a by multiples of 4,2, or 1 depending on KB.
>Is this your understanding? I was wondering basically if I could get
>b alignment as well.

Only for K-cleanup. M and N cleanup have the same alignment as the full kernel. For K-cleanup, A and B start with the same alignment, but this doesn't help you, since every column of A is applied to every column of B, and they are all possibly misaligned, so concentrating on getting one aligned, as you say, is probably the trick. Essentially, you can then use aligned loads of A, and unaligned loads of B.

What Peter has done, and what I think makes a lot of sense, is having cleanup for the KB%4==0 cases which assumes alignment (and thus gets good performance), and the non-aligned approach you discuss above for the KB%4 != 0 cases . . .

Cheers,
Clint
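The alignment arithmetic discussed in this message can be sketched in C. This is only an illustration under stated assumptions, not ATLAS code: it assumes single-precision elements (so 16 bytes is 4 floats), and `MIN_MM_ALIGN`, `all_columns_aligned`, `a_stride`, and `peel_count` are hypothetical names standing in for ATL_MinMMAlign and for steps 3 and 4 of the quoted strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ATL_MinMMAlign (16 bytes in this thread). */
#define MIN_MM_ALIGN 16

/* Given a block whose first element is MIN_MM_ALIGN-aligned, return 1 if
 * every one of its ncols columns (each kb floats long, stored contiguously)
 * also starts on a MIN_MM_ALIGN boundary, else 0. */
static int all_columns_aligned(size_t ncols, size_t kb)
{
    for (size_t j = 0; j < ncols; j++) {
        size_t offset = j * kb * sizeof(float);  /* byte offset of column j */
        if (offset % MIN_MM_ALIGN != 0)
            return 0;
    }
    return 1;
}

/* One reading of "stride a by multiples of 4, 2, or 1 depending on KB":
 * the largest of those strides that divides kb evenly. */
static size_t a_stride(size_t kb)
{
    if (kb % 4 == 0) return 4;
    if (kb % 2 == 0) return 2;
    return 1;
}

/* "Increment by 1 until a is aligned": number of leading floats to handle
 * one at a time before p reaches a MIN_MM_ALIGN boundary. */
static size_t peel_count(const float *p)
{
    uintptr_t addr = (uintptr_t)p;
    assert(addr % sizeof(float) == 0);  /* p must at least be float-aligned */
    size_t mis = (size_t)(addr % MIN_MM_ALIGN);
    return mis ? (MIN_MM_ALIGN - mis) / sizeof(float) : 0;
}
```

With a hypothetical NB of 40, `all_columns_aligned(40, 40)` holds for the full kernel, and it still holds for a K-cleanup with kb = 36 (the KB%4==0 case), but fails for kb = 38, where column 1 starts 152 bytes in. A cleanup routine might dispatch on that result: aligned loads throughout when it holds, otherwise aligned loads of A after peeling and unaligned loads of B.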

**Follow-Ups**: **Re: Alignment of cleanup routines**, *From:* Camm Maguire <camm@enhanced.com>
