Re: [Math-atlas-results] SSE warnings, Band matrix request feature



Camm,

First off, since this message doesn't have any timings in it, it is probably
not wise to post it to the results list :)

>1) In trying to clean up the warnings on the l2 SSE kernels, I'm
>   finding that many of them only appear when using the 2.96 (broken)
>   gcc version on torc.  2.95.x and 3.0.2 don't appear to show these
>   warnings, which refer to macro redefinitions, but I have only
>   tested 3.0.2 on non-i386 machines.  In any case, my code includes
>   the same header multiple times, between each of which a few key
>   macros are changed.  And certain of the macros in the header file
>   thus multiply included give the redefinition warning with 2.96,
>   while others adjacently defined do not.  No apparent rhyme or
>   reason.  I can certainly work around with undef's, or some moderate
>   rewriting, but I'd like to get a minimal fix in first, so I'm
>   wondering whether 2.96 is faulty in this respect and should be
>   ignored.  As long as I've used these macros, redefining the same
>   macro to the same value never produces a warning, but maybe I've
>   been relying on non-standard cpp all this time.

Redefining the same macro name with the same definition is allowed by ANSI/
ISO C, but several compilers nonetheless issue warnings about it.  Elsewhere
in ATLAS, we never do it.  If any macro is going to be redefined, #undef
is first applied.  I think that would be best for this as well, even though
your stuff is only meant to be compiled by gcc (most atlas routines can be
compiled by any compiler).  This will guarantee that all present/future gccs
don't issue the pages of warnings . . .
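
A minimal sketch of the #undef-first convention described above.  The macro
and function names (UNROLL, KERNEL_TAG, unroll_first, unroll_second) are
hypothetical, chosen only for illustration; they are not taken from ATLAS or
from the SSE kernels under discussion.  In the real case the two sections
would be separate inclusions of the same header.

```c
/* First "inclusion": set the key macros for this variant. */
#define UNROLL 4
#define KERNEL_TAG "u4"

/* Code generated under the first set of definitions. */
int unroll_first(void) { return UNROLL; }

/* Before the next inclusion, remove every macro that will be defined
 * again.  #undef on a name that is not currently defined is harmless,
 * so the same lines can safely sit at the top of the shared header. */
#undef UNROLL
#undef KERNEL_TAG

/* Second "inclusion": the names are now fresh, so no compiler has
 * anything to warn about, whether or not the new text is identical. */
#define UNROLL 8
#define KERNEL_TAG "u8"

/* Code generated under the second set of definitions. */
int unroll_second(void) { return UNROLL; }
```

Each function captures the value of UNROLL that was in effect where it was
compiled, so the first still sees 4 and the second sees 8.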

>2) I've gotten interested in band matrices recently, and am wondering
>   how atlas handles these.  Take the extreme case of a diagonal
>   matrix, 'band packed' so that the diagonal elements are contiguous in
>   memory.   For s{tsg}bmv, there seems to be no way the basic atlas
>   code can hand this off to a kernel without moving the memory
>   around.  But this would be an easily vectorizable operation.
>   Should we have a 4th l2 kernel to deal with band matrices?

ATLAS handles banded and packed as, essentially, reference BLAS.  In our NSF
proposal (rejected), Antoine and I laid out how to handle these guys,
including extending them to Level 3 operations, giving you order-of-magnitude
performance improvements.  You can indeed base them on kernels, but not, as
you point out, the very narrow band cases.
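
To make the extreme case concrete: for a symmetric band matrix with zero
off-diagonals (a diagonal matrix), y := alpha*A*x + beta*y reduces to an
elementwise update over the stored diagonal.  A rough sketch follows;
diag_sbmv is a hypothetical name, not an ATLAS or BLAS routine, and it
assumes unit strides for x and y, with ldd as the spacing between successive
diagonal elements (ldd == 1 when the diagonal is contiguous in memory).

```c
#include <stddef.h>

/* Hypothetical helper for the zero-bandwidth case of a symmetric band
 * matrix-vector product: y := alpha*A*x + beta*y, where A is diagonal.
 *   d   : first stored diagonal element
 *   ldd : distance in floats between successive diagonal elements
 * Assumes x and y have unit stride and do not overlap d. */
void diag_sbmv(size_t n, float alpha, const float *d, size_t ldd,
               const float *x, float beta, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * d[i * ldd] * x[i] + beta * y[i];
}
```

With ldd == 1 the loop body is a straight elementwise multiply-add over
three contiguous arrays, which is the easily vectorized form mentioned in
the question.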

It was going to be a year or two of work by our full-time team to do this very
thorough solution we proposed, so it's pretty clear it won't happen now.  As
far as things that are within the realm of the possible, if you examine 
Antoine's Level 2 packed and banded routines, you will see they are like our
dense Level 2: mixed recursive/kernel-based solutions.  This means that if
someone were to write efficient versions of Antoine's reference kernel, you
would speed up the entire Level 2 packed/banded, just as with dense.  

However, there are no packed/banded kernel testers/timers as there are with
dense, so this is more problematic.  I will not have time to produce such
tester/timers myself . . .

Cheers,
Clint