Re: Altivec matmul kernel (attachment)

Here's a question for the group:

AltiVec floating-point instructions execute in one of two modes:
In "Java" mode, denormalized results are handled correctly, and 
multiply-add instructions have a 5-cycle latency.

In "non-Java" mode, denormalized results may not be handled correctly 
(they can be flushed to zero), and multiply-add instructions have a 
4-cycle latency.  All other computations are IEEE compliant.  My matmul 
kernel gets a speed bump of about 150-200 Mflops (roughly 1650 to 1850) 
when going from Java mode to non-Java mode.

Should I let the user handle Java vs. non-Java mode, or should I turn 
off Java mode explicitly inside the kernel?  (The submitted version 
doesn't touch the Java mode bit.)
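
For concreteness, here is a rough sketch of what "turning off Java mode 
explicitly" could look like if the kernel did it itself. It is not part 
of the submitted version. It assumes a big-endian AltiVec target where 
the NJ bit is 0x00010000 in the VSCR word, and it uses the standard 
vec_mfvscr/vec_mtvscr intrinsics. The helper names (vscr_read, 
vscr_write, java_mode_off, java_mode_restore) and the fake_vscr 
fallback are made up for illustration; the fallback only exists so the 
sketch compiles on machines without AltiVec.

```c
#include <stdint.h>

/* NJ ("non-Java") bit of the 32-bit VSCR; setting it selects non-Java mode. */
#define VSCR_NJ_MASK 0x00010000u

#if defined(__ALTIVEC__) && defined(__BIG_ENDIAN__)
#include <altivec.h>

/* Assumption: big-endian target, so the VSCR lands in the last 32-bit word. */
static uint32_t vscr_read(void)
{
    union { __vector unsigned short v; uint32_t w[4]; } u;
    u.v = vec_mfvscr();
    return u.w[3];
}

static void vscr_write(uint32_t value)
{
    union { __vector unsigned int v; uint32_t w[4]; } u;
    u.w[0] = u.w[1] = u.w[2] = 0;
    u.w[3] = value;
    vec_mtvscr(u.v);
}
#else
/* Hypothetical stand-in for hosts without AltiVec; not real hardware state. */
static uint32_t fake_vscr = 0;
static uint32_t vscr_read(void) { return fake_vscr; }
static void vscr_write(uint32_t value) { fake_vscr = value; }
#endif

/* Switch to non-Java mode and return the previous VSCR value. */
static uint32_t java_mode_off(void)
{
    uint32_t old = vscr_read();
    vscr_write(old | VSCR_NJ_MASK);
    return old;
}

/* Put back whatever VSCR value the caller had before java_mode_off(). */
static void java_mode_restore(uint32_t old)
{
    vscr_write(old);
}
```

The kernel would call java_mode_off() before the multiply-add loop and 
java_mode_restore() with the saved value afterwards, so the caller's 
setting is left as it was found. Leaving the bit alone, as the 
submitted version does, keeps the choice entirely with the user.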

