Interesting discovery for the P4

Hi Peter!  Very interesting!  So does this mean that pipelining the
code is irrelevant?

Peter Soendergaard writes:

> Hi everyone.
> I was just testing my fastest result for the P4, which is something like
> 2300-2500 MFLOPS for double precision, and I discovered that the layout
> of the code seemed to be totally irrelevant! I tried 4 different ways of
> scheduling the instructions, and all of them ran at exactly the same
> speed. This seems to indicate that the trace cache of the P4 actually
> works quite well. It could also mean that another factor (bandwidth) is
> limiting the speed, but still I would expect bigger variations for the
> different ways of scheduling.
> So the P4 might be quite a good chip after all if code layout does not
> have to be heavily optimized.
