
Re: SSE Level 3 drop in gemm



Greetings!

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> 
> No knowledge/understanding of the register reservation, unfortunately . . .
> 
> >Otherwise, the kernel is working fine.  Performance fluctuates on the
> >short timer runs, but is somewhere between 670 and 700 MFLOPS for the
> >beta=0 case, and about 670 for arbitrary beta.
> 
> Great, that represents something like a 1.9 speedup over ATLAS's kernel,
> doesn't it?
> 
> >On another front -- Do you have any word on the complex compilation
> >procedure, Clint?  The deal is that all beta cases seem to be
> >referenced by the same timer (fc.c) program, regardless of beta= flag.
> 
> Yep, ATLAS/doc/atlas_contrib.ps explains this in the section on complex
> matmul: it's done with 4 calls to essentially a real matmul.  Even the
> case of beta=1 requires a real beta=X, 'cause you need the -1.0 case
> to subtract the product of the two imaginary elements that contribute to
> the real component (notice the negatives in steps 1 and 3 on page 14).
> The timer compiles your
> complex code 3 times to get the b1, b0, and bX cases.  What exactly is
> the problem you are having with it?
> 
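For reference, here is a minimal C sketch of the "4 calls to a real matmul" scheme. It assumes split real/imaginary storage, alpha = 1, and a real beta. The names rgemm, cgemm_split and cgemm_split_check are hypothetical, and the step order is only one arrangement consistent with steps 1 and 3 being negative. It is not copied from atlas_contrib.ps, and the real ATLAS kernels work on copied NBxNB blocks rather than on arbitrary M, N, K.

```c
#include <assert.h>

/* Hypothetical real kernel: C = A*B + beta*C, alpha fixed at 1.
 * Column-major: A is M x K, B is K x N, C is M x N. */
static void rgemm(int M, int N, int K, const float *A, const float *B,
                  float beta, float *C)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++) {
            float t = 0.0f;
            for (int k = 0; k < K; k++)
                t += A[i + k * M] * B[k + j * K];
            /* beta == 0 must not read C, mirroring a b0 kernel */
            C[i + j * M] = (beta == 0.0f) ? t : t + beta * C[i + j * M];
        }
}

/* Complex C = A*B + beta*C (real beta assumed) built from four real
 * calls on split real/imaginary arrays. Steps 1 and 3 use a negative beta. */
static void cgemm_split(int M, int N, int K,
                        const float *Ar, const float *Ai,
                        const float *Br, const float *Bi,
                        float beta, float *Cr, float *Ci)
{
    rgemm(M, N, K, Ai, Bi, -beta, Cr); /* 1: Cr = Ai*Bi - beta*Cr         */
    rgemm(M, N, K, Ai, Br, beta, Ci);  /* 2: Ci = Ai*Br + beta*Ci         */
    rgemm(M, N, K, Ar, Br, -1.0f, Cr); /* 3: Cr = Ar*Br - Ai*Bi + beta*Cr */
    rgemm(M, N, K, Ar, Bi, 1.0f, Ci);  /* 4: Ci = Ar*Bi + Ai*Br + beta*Ci */
}

/* Compare against direct complex arithmetic; returns the max abs error. */
static float cgemm_split_check(void)
{
    enum { M = 2, N = 2, K = 3 };
    float Ar[M * K], Ai[M * K], Br[K * N], Bi[K * N];
    float Cr[M * N], Ci[M * N], Rr[M * N], Ri[M * N];
    const float beta = 0.5f;
    float maxerr = 0.0f;

    for (int i = 0; i < M * K; i++) { Ar[i] = (float)(i + 1); Ai[i] = (float)(2 - i); }
    for (int i = 0; i < K * N; i++) { Br[i] = (float)(3 - i); Bi[i] = (float)(i % 2); }
    for (int i = 0; i < M * N; i++) { Cr[i] = Rr[i] = (float)i; Ci[i] = Ri[i] = (float)(1 - i); }

    /* Reference result: (ar + i*ai)(br + i*bi) summed over k, plus beta*C */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++) {
            float sr = 0.0f, si = 0.0f;
            for (int k = 0; k < K; k++) {
                float ar = Ar[i + k * M], ai = Ai[i + k * M];
                float br = Br[k + j * K], bi = Bi[k + j * K];
                sr += ar * br - ai * bi;
                si += ar * bi + ai * br;
            }
            Rr[i + j * M] = sr + beta * Rr[i + j * M];
            Ri[i + j * M] = si + beta * Ri[i + j * M];
        }

    cgemm_split(M, N, K, Ar, Ai, Br, Bi, beta, Cr, Ci);

    for (int i = 0; i < M * N; i++) {
        float dr = Cr[i] - Rr[i], di = Ci[i] - Ri[i];
        if (dr < 0) dr = -dr;
        if (di < 0) di = -di;
        if (dr > maxerr) maxerr = dr;
        if (di > maxerr) maxerr = di;
    }
    return maxerr;
}
```

Under these assumptions, the point Clint makes falls out directly. Step 3 always calls the real kernel with beta = -1, so a bX variant is required even when the user's beta is 1.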

I guess my problem is that make mmutstcase pre=c nb=?? mmrout=... only
compiles the kernel once (as the beta=1 case), so xcmmtst fails to link
with an undefined reference to the bX routine.

=============================================================================
</atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic$ make mmutstcase pre=s nb=56 mmrout=../CASES/ATL_sgemm_SSE.c
rm -f smm.c smm.[o,c]
./xemit_mm -p s -b 1 -M 56 -N 56 -K 56 -R -3 \
                   > smm.c
pre=s, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=56, n=56, k=56, lda=0, ldb=0, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_sgemm_SSE.c >> smm.c
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O -fPIC -c smm.c
make mmtstcase0 pre=s ta=t tb=n muladd=1 lat=4 loopO=JIK M=56 N=56 K=56 mb=56 nb=56 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 mmobjs=smm.o
make[1]: Entering directory `/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic'
rm -f smmtst.o
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O3 -funroll-all-loops -fPIC -DsREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=56 -DNB0=56 -DKB0=56 \
              -DMB=56 -DNB=56 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=1 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o smmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O3 -funroll-all-loops -fPIC -o xsmmtst smmtst.o smm.o
/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/bin/Linux_fpic/ATLrun.sh /mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic xsmmtst
PASSED TEST
make[1]: Leaving directory `/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic'
</atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic$ make mmutstcase pre=c nb=56 mmrout=../CASES/ATL_sgemm_SSE.c
rm -f cmm.c cmm.[o,c]
./xemit_mm -p c -b 1 -M 56 -N 56 -K 56 -R -3 \
                   > cmm.c
pre=c, CU=0, ma=0, ff=0, if=-1, nf=-1, lo=1, ta=112, tb=111, lat=4, mu=4, nu=4, ku=1, m=56, n=56, k=56, lda=0, ldb=0, ldc=0, csA=1, csB=1, csC=1, alpha=1, beta=1

cat ../CASES/ATL_sgemm_SSE.c >> cmm.c
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O -fPIC -c cmm.c
make mmtstcase0 pre=c ta=t tb=n muladd=1 lat=4 loopO=JIK M=56 N=56 K=56 mb=56 nb=56 kb=56 mu=4 nu=4 ku=1 lda=56 ldb=56 ldc=0 csA=1 csB=1 csC=1 alpha=1 beta=1 moves="-DMoveA -DMoveB" cleanup=0 mmobjs=cmm.o
make[1]: Entering directory `/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic'
rm -f cmmtst.o
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O3 -funroll-all-loops -fPIC -DcREAL -DtranAt -DtranBn \
              -DMULADD=1 -DLAT=4 -DJIK \
              -DMB0=56 -DNB0=56 -DKB0=56 \
              -DMB=56 -DNB=56 -DKB=56 \
              -DKU=1 -DNU=4 -DMU=4 \
              -DLDA=56 -DLDB=56 -DLDC=0 \
              -DcsA=1 -DcsB=1 -DcsC=1 \
              -DALPHA=1 -DBETA=1 -DMoveA -DMoveB \
              -DCLEANUP=0 \
              -o cmmtst.o -c ../mmtst.c
/usr/bin/gcc  -DL2SIZE=524288 -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/Linux_fpic -I/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/include/contrib  -DAdd__ -DStringSunStyle -fomit-frame-pointer -O3 -funroll-all-loops -fPIC -o xcmmtst cmmtst.o cmm.o
cmmtst.o: In function `mmtst':
cmmtst.o(.text+0xbce): undefined reference to `ATL_cJIK56x56x56TN56x56x0_a1_bX'
cmmtst.o(.text+0xc41): undefined reference to `ATL_cJIK56x56x56TN56x56x0_a1_bX'
collect2: ld returned 1 exit status
make[1]: *** [mmtstcase0] Error 1
make[1]: Leaving directory `/mnt/i19/f/debian/mm/atlas/tmp/atlas-3.1.2D/tune/blas/gemm/Linux_fpic'
make: *** [mmutstcase] Error 2
=============================================================================
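To make the link failure above concrete, here is a hypothetical C sketch of the b0/b1/bX split. One kernel body is stamped out three times, with a macro standing in for the timer's three separate compiles, and a dispatcher picks a variant by beta. The names kern_b0, kern_b1, kern_bX, pick_kernel and complex_needs_bX are illustrative only. The sketch also assumes that the four real calls of one complex update use betas of -beta, beta, -1 and 1. Because -1 is always among them, bX is needed for every user beta. That would explain why a single beta=1 compile leaves ATL_cJIK56x56x56TN56x56x0_a1_bX undefined.

```c
#include <assert.h>

/* Common signature for all three variants: c = ab + beta*c, elementwise
 * (ab stands in for an already-computed A*B block). */
typedef void (*kern_t)(int n, const float *ab, float beta, float *c);

/* Hypothetical stand-in for compiling one kernel source three times with
 * different beta settings: the macro stamps out one variant per suffix. */
#define DEFINE_KERNEL(SUFFIX, UPDATE)                             \
    static void kern_##SUFFIX(int n, const float *ab, float beta, \
                              float *c)                           \
    {                                                             \
        assert(n >= 0);                                           \
        (void)beta; /* only the bX variant actually reads beta */ \
        for (int i = 0; i < n; i++)                               \
            c[i] = (UPDATE);                                      \
    }

DEFINE_KERNEL(b0, ab[i])               /* c = ab          */
DEFINE_KERNEL(b1, ab[i] + c[i])        /* c = ab + c      */
DEFINE_KERNEL(bX, ab[i] + beta * c[i]) /* c = ab + beta*c */

/* Hypothetical dispatcher: exact 0 and 1 get specialised code,
 * any other value (including -1) falls through to bX. */
static kern_t pick_kernel(float beta)
{
    if (beta == 0.0f) return kern_b0;
    if (beta == 1.0f) return kern_b1;
    return kern_bX;
}

/* Assumed betas for the four real calls of one complex update;
 * returns 1 if any of them needs the bX variant. */
static int complex_needs_bX(float user_beta)
{
    const float betas[4] = { -user_beta, user_beta, -1.0f, 1.0f };
    for (int s = 0; s < 4; s++)
        if (pick_kernel(betas[s]) == kern_bX)
            return 1;
    return 0;
}
```

Under these assumptions, complex_needs_bX() returns 1 for every input because the third call always uses -1. A complex test build therefore has to supply the bX object alongside the b1 one.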

Take care,

> Cheers,
> Clint
> 
> 

-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah