Algorithmic approach for Level 3 BLAS
Recur down to L1 cache block size
Need kernel at bottom of recursion
Use GEMM-based kernel for portability
Recursive TRMM
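
As a rough illustration of the points above, here is a minimal sketch of a recursive TRMM computing B := L*B in place for a lower-triangular L. It is not taken from the slides: the function names (`recursive_trmm`, `trmm_kernel`, `gemm_update`), the `block` parameter standing in for the L1 cache block size, and the plain-loop kernels are all assumptions made for illustration. A real implementation would call a tuned GEMM at these points.

```python
def gemm_update(L, B, r0, r1, k0, k1):
    """B[r0:r1, :] += L[r0:r1, k0:k1] * B[k0:k1, :] (stand-in for a tuned GEMM)."""
    ncols = len(B[0])
    for i in range(r0, r1):
        for k in range(k0, k1):
            lik = L[i][k]
            for j in range(ncols):
                B[i][j] += lik * B[k][j]


def trmm_kernel(L, B, lo, hi):
    """Bottom-of-recursion kernel: B[lo:hi, :] := L[lo:hi, lo:hi] * B[lo:hi, :]."""
    ncols = len(B[0])
    # Work from the last row upward so each row reads only rows
    # that have not been overwritten yet.
    for i in range(hi - 1, lo - 1, -1):
        for j in range(ncols):
            B[i][j] = sum(L[i][k] * B[k][j] for k in range(lo, i + 1))


def recursive_trmm(L, B, lo=0, hi=None, block=2):
    """In-place B := L * B for lower-triangular L, by recursive halving.

    `block` is a hypothetical stand-in for the L1 cache block size.
    """
    if hi is None:
        hi = len(L)
    n = hi - lo
    if n <= block:
        trmm_kernel(L, B, lo, hi)  # small enough: use the kernel
        return
    mid = lo + n // 2
    # With L = [[L11, 0], [L21, L22]] and B = [B1; B2]:
    recursive_trmm(L, B, mid, hi, block)  # B2 := L22 * B2
    gemm_update(L, B, mid, hi, lo, mid)   # B2 += L21 * B1 (B1 still unmodified)
    recursive_trmm(L, B, lo, mid, block)  # B1 := L11 * B1
```

The order of the three steps matters: B2 is finished first because it needs the original B1, and only then is B1 overwritten. In this arrangement most of the arithmetic lands in the `gemm_update` calls, which is where a portable, tuned GEMM would pay off.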