There are several assumptions that need to hold true for a user-supplied GEMV
primitive. First, the loop ordering must be that implied by the
<flag>
setting
the user supplies in the primitive description file, as discussed in
Section 6.1.1.
Each primitive makes assumptions about the arguments it handles,
and these assumptions are reflected in the routine name. The function name of
a GEMV primitive is:
ATL_<pre>gemv<Trans>_a1_x1_<betanam>_y1where:
<pre>
is replaced by the precision prefix:
s, d, c, or z.
<Trans>
is replaced by transpose specifier:
<betanam>
is replaced by the beta specifier this kernel
supplies. All GEMV kernels must supply the following beta specifiers and
names:
For a given gemv primitive (either NoTranspose or Transpose), if the cpp macro
Conj_
is defined we want the conjugate form of that transpose setting
(i.e., Nc or C).
Each file is further compiled with differing cpp settings to generate the various beta cases. The beta macro settings and their meanings are:
CPP MACRO | MEANING |
BETA0 | Primitive should provide |
BETA1 | Primitive should provide |
BETAX | Primitive should provide |
BETAXI0 | For complex only, primitive should provide , |
where the imaginary component of beta is zero. |
In terms of the BLAS API, the GEMV kernels additionally assume
incX = 1
incY = 1
Higher level ATLAS routines ensure these assumptions are true before calling the primitive.
Therefore, the routine:
ATL_dgemvN_a1_x1_b0_y1supplies a primitive doing notranspose gemv, on a column-major array with , , incX = 1 and incY = 1. while:
ATL_cgemvNc_a1_x1_bXi0_y1:supplies a primitive doing notranspose gemv, on a column-major array whose elements should be conjugated before the multiplication, with , incX = 1, incY = 1, and whose real component is unknown, but whose imaginary component is known to be zero.
For greater understanding of how these CPP macros are used to compile multiple primitives from one file, examine the provided CASES files.
The API of the primitive is:
ATL_<pre>gemv<Trans>_a1_x1_<betanam>_y1 ( const int M, /* length of Y vector */ const int N, /* length of X vector */ const SCALAR alpha,/* ignored, assumed to be one */ const TYPE *A, /* pointer to column-major matrix */ const int lda, /* leading dimension of A, or row-stride */ const TYPE *X, /* vector to multiply A by */ const int incX, /* ignored, assumed to be one */ const SCALAR beta, /* value of beta */ TYPE *Y, /* output vector */ const int incY /* ignored, assumed to be one */ );
where,
<pre> : |
s | d | c | z |
SCALAR | float | double | float* | float* |
TYPE | float | double | float | float |
Note that the meaning of M and N are slightly different than that used by the Fortran77 API, in that they give the vector lengths, not array dimensions.