Adaptive Approach for Level 3 BLAS
Run a one-time parameter study of the operation on the target machine.
The only generated code is the on-chip multiply
BLAS operation written in terms of generated on-chip multiply
All transpose cases coerced through data copy to 1 case of on-chip multiply
- Only 1 case generated per platform
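
The points above can be illustrated with a minimal sketch. It is not the generated code itself: the function names (`onchip_multiply`, `gemm`, `parameter_study`), the choice of C += A^T B as the single kernel case, and the candidate block sizes are all assumptions made for illustration. A copy step coerces every transpose combination into the one layout the kernel accepts, the full multiply is written only in terms of that kernel, and a one-time timing loop picks the block size.

```python
import time

def transpose(X):
    # Copy X into a new matrix with rows and columns swapped.
    return [list(col) for col in zip(*X)]

def copy_rows(X):
    # Plain copy, used when the operand is already in kernel layout.
    return [list(row) for row in X]

def onchip_multiply(At, Bk, C, i0, j0, k0, nb):
    # The single kernel case (assumed form): C += At^T * Bk on one block.
    # At is K x M and Bk is K x N, so both are walked along k.
    M, N, K = len(C), len(C[0]), len(Bk)
    for i in range(i0, min(i0 + nb, M)):
        for j in range(j0, min(j0 + nb, N)):
            acc = 0.0
            for k in range(k0, min(k0 + nb, K)):
                acc += At[k][i] * Bk[k][j]
            C[i][j] += acc

def gemm(transa, transb, A, B, nb=2):
    # Compute op(A) * op(B), where op(X) is X for "N" and X^T for "T".
    # The data copy turns all four transpose cases into the kernel's case.
    At = copy_rows(A) if transa == "T" else transpose(A)
    Bk = transpose(B) if transb == "T" else copy_rows(B)
    K, M, N = len(At), len(At[0]), len(Bk[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, nb):
        for j0 in range(0, N, nb):
            for k0 in range(0, K, nb):
                onchip_multiply(At, Bk, C, i0, j0, k0, nb)
    return C

def parameter_study(candidates=(2, 4, 8), n=16):
    # One-time search: time each candidate block size, keep the fastest.
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    best_nb, best_t = None, float("inf")
    for nb in candidates:
        t0 = time.perf_counter()
        gemm("N", "N", A, A, nb)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best_nb, best_t = nb, dt
    return best_nb
```

A call such as `gemm("T", "N", A, B, nb=parameter_study())` would then use the block size found on the current machine. In a real generator the study would vary more than the block size, and the kernel would be emitted as compiled code rather than interpreted loops.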