Algorithmic approach for Level 3 BLAS
The only generated code is the on-chip multiply
BLAS operations are written in terms of the generated on-chip multiply
All transpose cases coerced, through a data copy, to 1 case of the on-chip multiply
Only 1 case generated per platform
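
The points above can be illustrated with a minimal C sketch. It is not the generated code itself: the names onchip_mm, copy_block and gemm_sketch, the block size NB, the row-major layout, and the choice of "A times B-transposed" as the single kernel case are all assumptions made for illustration. The idea shown is that one fixed kernel serves every transpose combination, because a copy step rewrites each block into the one layout the kernel expects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block size; a real value would be tuned per platform. */
#define NB ((size_t)4)

/* Stand-in for the generated on-chip multiply (assumed single case):
   c += a * bt^T on NB x NB row-major blocks, so the inner loop is a
   dot product of two contiguous rows. */
static void onchip_mm(const double *a, const double *bt, double *c)
{
    for (size_t i = 0; i < NB; i++)
        for (size_t j = 0; j < NB; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < NB; k++)
                sum += a[i * NB + k] * bt[j * NB + k];
            c[i * NB + j] += sum;
        }
}

/* Copy the NB x NB block of op(X) whose top-left corner is (r0, c0)
   into blk, zero-padding past the matrix edge. X is row-major with
   leading dimension ld; op(X) is rows x cols and equals X^T when
   trans is nonzero. If flip is nonzero the block is stored transposed. */
static void copy_block(const double *x, size_t ld, int trans,
                       size_t rows, size_t cols,
                       size_t r0, size_t c0, int flip, double *blk)
{
    for (size_t i = 0; i < NB; i++)
        for (size_t j = 0; j < NB; j++) {
            size_t r = r0 + i, c = c0 + j;
            double v = 0.0;
            if (r < rows && c < cols)
                v = trans ? x[c * ld + r] : x[r * ld + c];
            blk[flip ? j * NB + i : i * NB + j] = v;
        }
}

/* C (m x n) += op(A) (m x k) * op(B) (k x n), all row-major.
   Every transpose case goes through the same copy + kernel path. */
void gemm_sketch(int transa, int transb, size_t m, size_t n, size_t k,
                 const double *a, size_t lda,
                 const double *b, size_t ldb,
                 double *c, size_t ldc)
{
    double ab[NB * NB], bb[NB * NB], cb[NB * NB];

    for (size_t i0 = 0; i0 < m; i0 += NB)
        for (size_t j0 = 0; j0 < n; j0 += NB) {
            memset(cb, 0, sizeof cb);
            for (size_t k0 = 0; k0 < k; k0 += NB) {
                /* Coerce both operands into the kernel's one layout. */
                copy_block(a, lda, transa, m, k, i0, k0, 0, ab);
                copy_block(b, ldb, transb, k, n, k0, j0, 1, bb);
                onchip_mm(ab, bb, cb);
            }
            /* Add the valid part of the result block back into C. */
            for (size_t i = 0; i < NB && i0 + i < m; i++)
                for (size_t j = 0; j < NB && j0 + j < n; j++)
                    c[(i0 + i) * ldc + (j0 + j)] += cb[i * NB + j];
        }
}
```

In this sketch the transpose flags only affect copy_block; onchip_mm never sees them, which is why a single kernel case per platform would suffice under these assumptions.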