
Re: 3.3.10

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,

> >   the p4 timing issue we discussed previously still exists, then I
> >   think we need to add a line hardcoding values for n,m,k for the SSE
> >   dgemm I provided.  
> I can't remember this discussion at the moment, what p4 timing issue are we
> talking about?

The issue is that the gemm kernel I submitted, with the following
dcases.dsc line:
200   8   4   1   4 1 1 4  1  4 ATL_gemm_SSE.c          "Camm Maguire"

is never timed for nb's as high as 80 on torc19, for example.  Timing
stops at much lower nb's, which, if I recall, you said was due to the
unreliable cache detection on the P4.

If I put in the following line:
202   8 -80 -80 -80 0 3 1  1  4 ATL_gemm_SSE.c          "Camm Maguire"

the kernel ends up being selected on torc19, but not by much.  It's not
critical for ATLAS, obviously, but there's no sense in having a kernel in
there which won't be timed in competitive parameter regions.

BTW, the docs don't specify how you define the muladd and latency
parameters.  Could you please explain briefly?  Some of the code I've
submitted may have non-conventional values.

> >Also, have several s and d level1 kernels I'd like to upload.
> One option is to CVS add them into your kernel directory, and then send
> me the descriptor file lines for me to try, I guess . . .


More stuff:

1) On machines with little L2 cache (the K6 with 64KB, for example),
   supplying make config with a cache size of 128 or 256 results in a
   build failure, as ATLAS can't get the timing parameters to within
   tolerance.  Does this make sense, or does this have nothing to do
   with cache size, but rather indicates some external machine load
   during the timing?  I tried to leave the machine quiet.

2) A request: could config.c output some simple line, readable by a
   script, indicating which ISA extensions are going to be used?  My
   understanding is that the current possibilities are
   sse,sse2,3dnow,ev5,sparc64. (If it's easier, I could post this to

3) I can't close the sourceforge issue assigned to me, which was fixed
   with last night's cvs commit.  There appears to be no item on the
   web page allowing me to do so.  Have I missed something?

4) I've got a few l1 kernels, but the cases lines seem a bit messy.
   For example:
 4  2  1  scal_44_SSE.c     "C. Maguire" 
 5  2  1  scal_45_SSE.c     "C. Maguire" 
 6  2  1  scal_46_SSE.c     "C. Maguire" 
 7  2  1  scal_47_SSE.c     "C. Maguire" 
 8  2  1  scal_48_SSE.c     "C. Maguire" 
 9  2  1  scal_54_SSE.c     "C. Maguire" 
10  2  1  scal_55_SSE.c     "C. Maguire" 
11  2  1  scal_56_SSE.c     "C. Maguire" 
12  2  1  scal_57_SSE.c     "C. Maguire" 
13  2  1  scal_58_SSE.c     "C. Maguire" 
14  2  1  scal_64_SSE.c     "C. Maguire" 
15  2  1  scal_65_SSE.c     "C. Maguire" 
16  2  1  scal_66_SSE.c     "C. Maguire" 
17  2  1  scal_67_SSE.c     "C. Maguire" 
18  2  1  scal_68_SSE.c     "C. Maguire" 
19  2  1  scal_74_SSE.c     "C. Maguire" 
20  2  1  scal_75_SSE.c     "C. Maguire" 
21  2  1  scal_76_SSE.c     "C. Maguire" 
22  2  1  scal_77_SSE.c     "C. Maguire" 
23  2  1  scal_78_SSE.c     "C. Maguire" 
24  2  1  scal_84_SSE.c     "C. Maguire" 
25  2  1  scal_85_SSE.c     "C. Maguire" 
26  2  1  scal_86_SSE.c     "C. Maguire" 
27  2  1  scal_87_SSE.c     "C. Maguire" 
28  2  1  scal_88_SSE.c     "C. Maguire" 

	These files are identical save the definition of two CPP
	macros indicating how far ahead to prefetch, and how far to
	unroll the loop.  Is there any cleaner way of telling the
	timer to simply take the file and time with ranges -DKB={beg
	to end} -DPF={beg to end}?
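
	To make the question concrete, here is a rough sketch of what a
	single such file could look like, with the two macros supplied
	on the compile line.  This is not one of the submitted kernels:
	the name scal_kb_pf, the default values, the reading of PF as a
	count of unrolled blocks, and the use of GCC's
	__builtin_prefetch in place of hand-written SSE prefetch
	instructions are all assumptions made for illustration.

```c
#include <stddef.h>

/* Both macros are meant to be overridden with -DKB=... -DPF=...;
 * the defaults below are hypothetical. */
#ifndef KB
#define KB 4   /* unroll factor: elements handled per main-loop pass */
#endif
#ifndef PF
#define PF 4   /* prefetch distance, assumed here to be in units of KB elements */
#endif

/* Scale x[0..n-1] by alpha in place. */
void scal_kb_pf(size_t n, float alpha, float *x)
{
    const size_t ahead = (size_t)(PF) * (KB);
    size_t i = 0;

    /* Main loop: KB elements per pass; KB is a compile-time constant,
     * so the compiler is free to unroll the inner loop fully. */
    for (; i + KB <= n; i += KB) {
#if defined(__GNUC__)
        /* Only prefetch addresses that lie inside the vector. */
        if (i + ahead < n)
            __builtin_prefetch(x + i + ahead, 1, 0);
#endif
        for (size_t j = 0; j < KB; j++)
            x[i + j] *= alpha;
    }

    /* Cleanup loop for the last n % KB elements. */
    for (; i < n; i++)
        x[i] *= alpha;
}
```

	Building this once per (KB, PF) pair, e.g. with -DKB=8 -DPF=6,
	would give the same family of variants as the 25 separate
	files listed above.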

5) You had indicated that you'd like all prefetch stuff to use your
   macros.  This can be done if you would supply an assembler version
   in addition to the __asm__ wrapped one, e.g.
#define a_prefetch(a_,b_,c_) "prefetch" str(a_) " " str(b_) "(" str(c_) ")\n\t"
   where a_ is the prefetch flavour, b_ is the offset, and c_ is the
   register containing the base address.  Or if you'd like to have
   macros with different names for the different flavours, that's
   obviously ok too.  I'm not sure if much mileage will be gained
   here, as all the stuff I'm writing these days is just sse code,
   which won't run anyway on athlon, etc.
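
   As a small illustration of how a string-producing macro of this
   kind expands, here is a sketch.  The one-level str() helper, the
   function name example_prefetch_line, and the particular arguments
   (nta, 64, %esi) are my own assumptions for the example and are not
   taken from the existing ATLAS macros.

```c
/* Turn a macro argument into a string literal. */
#define str(x_) #x_

/* Assembler-text form of a prefetch: flavour a_, offset b_, base register c_.
 * Adjacent string literals are concatenated by the compiler. */
#define a_prefetch(a_, b_, c_) "prefetch" str(a_) " " str(b_) "(" str(c_) ")\n\t"

/* Return the text generated for one example instruction
 * (hypothetical arguments: non-temporal flavour, offset 64, base in %esi). */
const char *example_prefetch_line(void)
{
    return a_prefetch(nta, 64, %esi);
}
```

   The returned string is "prefetchnta 64(%esi)\n\t", which can be
   pasted between the other string literals of a larger assembler
   body.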

More later!

> Thanks,
> Clint

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah