Re: sgemm questions
> 2) Peter's ideas of a) unrolling fully with KB ~ 56, b) 1x4 strategy
> c) loading C at the beginning rather than at the end and
> (shockingly) d) doing no pipelining at all all seem to be wins. I
> couldn't believe d) when I saw it, but it's apparently true -- the
> PIII likes code like load(a) mul(b,a) add(a,c) best. Apparently,
> the parallelism between muls and adds mentioned by Doug Aberdeen in
> his earlier email only appears fully when the intermediary register
> is the same. Doug, maybe you can try this and see if you can get
> better than 0.75 clock? Or maybe I misunderstand you?
The following sequence gets 0.84 IPC, an improvement over 0.75, and
the best performance to date:
MULPS(0, 1);
ADDPS(1, 2);
MULPS(3, 4);
ADDPS(4, 5);
MULPS(6, 7);
ADDPS(7, 0);
MULPS(3, 4);
ADDPS(4, 5);
Note that each MULPS/ADDPS pair has no dependencies on adjacent
pairs. If it does, the IPC drops to 0.29, as in the following
code:
MULPS(0, 1);
ADDPS(1, 2);
MULPS(2, 3);
ADDPS(3, 4);
MULPS(4, 5);
ADDPS(5, 6);
MULPS(6, 7);
ADDPS(7, 0);
There are pipeline stalls all over this one. I don't quite understand
how the first sequence does so well, since there should be a stall
between each MUL and ADD. The hardware must do something funky, or
perhaps the instruction re-ordering works really well in this case.
--
-Doug -- http://beaker.anu.edu.au, Ph:(02) 6279-8608, Fax:(02) 6279-8651
A pessimist is just a realist who has not been proved right... yet.