3DNow! MFLOPS = 4 * MHz

We were discussing how 3DNow! could get a peak of 4 * MHz MFLOPS even
though it only does two ops per vector, rather than four as for SSE.
The trick is that it can do an add and a multiply in the same clock
cycle, just as with normal flops: two elements per vector times two
instructions per cycle gives four flops per cycle.  So separate
multiply and add instructions will be key to reaching that peak.
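
The arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the peak-rate calculation, not a measurement:
the function name peak_mflops, its parameters, and the 500 MHz clock
are all hypothetical values chosen for the example.

```python
def peak_mflops(clock_mhz, elems_per_vector, vector_ops_per_cycle):
    """Theoretical peak MFLOPS = clock (MHz) * flops completed per cycle."""
    return clock_mhz * elems_per_vector * vector_ops_per_cycle


CLOCK_MHZ = 500  # hypothetical clock rate, for illustration only

# 3DNow!: two elements per vector, with an add and a multiply
# issued in the same cycle (two vector instructions per cycle).
paired = peak_mflops(CLOCK_MHZ, elems_per_vector=2, vector_ops_per_cycle=2)

# Same vector width, but only one vector instruction per cycle
# (for example, a stream made up entirely of adds).
single = peak_mflops(CLOCK_MHZ, elems_per_vector=2, vector_ops_per_cycle=1)

print(paired)  # 2000, i.e. 4 * MHz
print(single)  # 1000, i.e. 2 * MHz
```

Under these assumptions, code that cannot pair an add with a multiply
in each cycle tops out at half the peak rate, which is why the mix of
multiply and add instructions matters.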