Re: 3DNow! Mflop = 4 * Mhz



>Yes, but there is no place to store results and to resolve dependencies.
>Maybe it is possible if the load/store pipe is kept busy all the time and
>some memory locations are made to count as additional registers.

The latency for a 3DNow! add or mul is 2 cycles, so you should only need 2
registers to hold mul results before passing them back to the accumulator
registers, which leaves you with enough registers for a 2x1 or 1x2 3DNow!
register block. Since each 3DNow! register holds 2 single-precision elements,
this should already be quite efficient . . .
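
To make the register budget concrete, here is a rough sketch in portable C of
a 2x1 block with the multiply results held one step before they are added.
It is only an illustration, not actual 3DNow! code: the v2f type stands in for
one two-element register, and the names (v2f, dot2x1, m0, m1, acc0, acc1) are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one 64-bit register holding 2 floats. */
typedef struct { float e[2]; } v2f;

static v2f v2f_load(const float *p) { v2f r = {{p[0], p[1]}}; return r; }
static v2f v2f_mul(v2f a, v2f b) { v2f r = {{a.e[0] * b.e[0], a.e[1] * b.e[1]}}; return r; }
static v2f v2f_add(v2f a, v2f b) { v2f r = {{a.e[0] + b.e[0], a.e[1] + b.e[1]}}; return r; }

/* 2x1 register block: two rows of A (a0, a1) against one column of B (b).
 * m0/m1 hold the multiply results; they are added to the accumulators
 * one iteration later, so each add has a finished multiply to consume.
 * k must be even and at least 2. */
static void dot2x1(const float *a0, const float *a1, const float *b,
                   size_t k, float *c0, float *c1)
{
    assert(k >= 2 && k % 2 == 0);

    v2f acc0 = {{0.0f, 0.0f}}, acc1 = {{0.0f, 0.0f}};
    v2f rb = v2f_load(b);
    v2f m0 = v2f_mul(v2f_load(a0), rb);   /* prologue: first multiplies */
    v2f m1 = v2f_mul(v2f_load(a1), rb);

    for (size_t i = 2; i < k; i += 2) {
        acc0 = v2f_add(acc0, m0);         /* add the previous products */
        acc1 = v2f_add(acc1, m1);
        rb = v2f_load(b + i);
        m0 = v2f_mul(v2f_load(a0 + i), rb);  /* start the next products */
        m1 = v2f_mul(v2f_load(a1 + i), rb);
    }

    acc0 = v2f_add(acc0, m0);             /* epilogue: drain the last products */
    acc1 = v2f_add(acc1, m1);

    *c0 = acc0.e[0] + acc0.e[1];          /* sum the two lanes */
    *c1 = acc1.e[0] + acc1.e[1];
}
```

The live values are two accumulators, two multiply temporaries, and one B
register; the A elements can be loaded straight into the temporaries.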

Clint

> We were discussing how 3DNow! could get 4*MHz even though it only does
> two ops per vector, rather than 4 as for SSE.  The trick is that it can
> do an add and a multiply in the same clock cycle, just as with normal flops.
> So this means that separate multiply/add instructions will be key . . .
> 
> Cheers,
> Clint
>
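
The arithmetic behind the 4*MHz figure in the quoted message can be written
out as a small C helper. This is only a back-of-the-envelope sketch: the
function name and parameters are made up for illustration, and it assumes one
packed add and one packed multiply really do issue every cycle.

```c
#include <assert.h>

/* Peak Mflops = MHz * (elements per vector) * (vector ops issued per cycle).
 * For 3DNow! as described above: 2 elements, 1 add + 1 multiply per cycle. */
static double peak_mflops(double mhz, int elems_per_vector,
                          int vector_ops_per_cycle)
{
    assert(elems_per_vector > 0 && vector_ops_per_cycle > 0);
    return mhz * elems_per_vector * vector_ops_per_cycle;
}
```

For a hypothetical 300 MHz part, peak_mflops(300.0, 2, 2) gives 1200, i.e.
4 * MHz. Without the paired add and multiply it would be only 2 * MHz.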