ScaLAPACK Project

Code Generation Strategy

• Code is iteratively generated and timed until the optimal case is found. We try:

  ◦ Differing NBs (blocking factors)

  ◦ Breaking false dependencies

  ◦ M, N and K loop unrolling
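
The generate-and-time loop can be sketched as follows. This is a minimal illustration, not the project's actual generator: the names `generate_dot_kernel`, `build`, `time_kernel` and `search` are hypothetical, a dot product stands in for the matrix multiply, and Python is used only to show the structure (a real generator would presumably emit compiled code). Each unrolled lane gets its own accumulator, which is one way to avoid every add depending on a single running sum.

```python
import time


def generate_dot_kernel(unroll):
    """Return Python source for a dot product unrolled `unroll` times.

    Hypothetical generator: one accumulator per unrolled lane, plus a
    cleanup loop for the elements left over when n is not a multiple
    of the unroll factor.
    """
    lines = ["def kernel(x, y):", "    n = len(x)"]
    lines += [f"    acc{u} = 0.0" for u in range(unroll)]
    lines.append(f"    main = n - n % {unroll}")
    lines.append(f"    for i in range(0, main, {unroll}):")
    lines += [f"        acc{u} += x[i + {u}] * y[i + {u}]" for u in range(unroll)]
    lines.append("    total = " + " + ".join(f"acc{u}" for u in range(unroll)))
    lines.append("    for i in range(main, n):")
    lines.append("        total += x[i] * y[i]")
    lines.append("    return total")
    return "\n".join(lines)


def build(source):
    """Compile generated source and return the `kernel` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["kernel"]


def time_kernel(kernel, x, y, repeats=5):
    """Return the best wall-clock time over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(x, y)
        best = min(best, time.perf_counter() - start)
    return best


def search(unrolls, n=10_000):
    """Generate, build and time one kernel per unroll factor; keep the fastest."""
    x = [float(i % 7) for i in range(n)]
    y = [float(i % 5) for i in range(n)]
    timings = {}
    for u in unrolls:
        kernel = build(generate_dot_kernel(u))
        timings[u] = time_kernel(kernel, x, y)
    best = min(timings, key=timings.get)
    return best, timings
```

Calling `search([1, 2, 4, 8])` returns the fastest unroll factor on the current machine along with all measured timings; the winner depends on the hardware, which is the point of searching empirically.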

• On-chip multiply optimizes for:

  ◦ TLB access

  ◦ L1 cache reuse

  ◦ FP unit usage

  ◦ Memory fetch

  ◦ Register reuse

  ◦ Loop overhead minimization
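
As a rough illustration of how a blocking factor relates to cache and register reuse, the sketch below multiplies matrices in NB x NB blocks. It assumes matrices stored as lists of rows and a hypothetical function name `blocked_matmul`; it shows only the loop structure and does not reproduce the tuned kernel itself.

```python
def blocked_matmul(a, b, nb):
    """Return C = A * B computed in nb x nb blocks.

    a is M x K and b is K x N, both given as lists of rows.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, nb):
        for j0 in range(0, n, nb):
            for k0 in range(0, k, nb):
                # One block multiply: the three blocks touched here form the
                # working set that a well-chosen nb keeps resident in L1 cache.
                for i in range(i0, min(i0 + nb, m)):
                    a_row = a[i]
                    c_row = c[i]
                    for j in range(j0, min(j0 + nb, n)):
                        # Keep the running sum in a local variable, the
                        # analogue of holding it in a register.
                        acc = c_row[j]
                        for p in range(k0, min(k0 + nb, k)):
                            acc += a_row[p] * b[p][j]
                        # Store back once per (i, j) per block.
                        c_row[j] = acc
    return c
```

The result is the same for any `nb >= 1`; only the order of memory accesses changes, which is why the blocking factor can be chosen purely by timing.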

• Takes 30 minutes to an hour to run.