• Today’s processors can achieve high performance, but only with extensive machine-specific hand tuning.
• Hardware and software have a large design space with many parameters, and performance is sensitive to them:
  - Blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
  - Complicated interactions with the increasingly sophisticated microarchitectures of new microprocessors.
  - Performance instability.
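To make the size of that design space concrete, here is a small Python sketch. It runs the same matrix multiply under each of the six loop nesting permutations. All six give the same result, and only the memory access order changes. It also enumerates a grid of block sizes and unrolling depths. The candidate values in `BLOCK_SIZES` and `UNROLL_DEPTHS` and the helper name `matmul_with_order` are hypothetical and chosen only for illustration. They are not taken from any real tuner.

```python
from itertools import permutations, product

# All six nestings of the i, j, k loops of C[i][j] += A[i][k] * B[k][j].
ORDERS = ["".join(p) for p in permutations("ijk")]

# Hypothetical candidate values, chosen only for illustration.
BLOCK_SIZES = [16, 32, 48, 64]
UNROLL_DEPTHS = [1, 2, 4, 8]

# Every (loop order, block size, unroll depth) combination a tuner would have to consider.
DESIGN_SPACE = list(product(ORDERS, BLOCK_SIZES, UNROLL_DEPTHS))


def matmul_with_order(A, B, order):
    """Compute C = A * B with the three loops nested in the given order, e.g. "ikj"."""
    n, m, p = len(A), len(B), len(B[0])
    # i runs over rows of A, k over the shared dimension, j over columns of B.
    extent = {"i": n, "k": m, "j": p}
    C = [[0.0] * p for _ in range(n)]
    outer, middle, inner = order
    for x in range(extent[outer]):
        for y in range(extent[middle]):
            for z in range(extent[inner]):
                # Map the loop counters back to the (i, j, k) indices they stand for.
                idx = {outer: x, middle: y, inner: z}
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    results = {order: matmul_with_order(A, B, order) for order in ORDERS}
    print(len(set(map(str, results.values()))))  # 1: every order gives the same C
    print(len(DESIGN_SPACE))  # 96 variants
```

This toy grid already yields 96 variants, before register allocation or instruction scheduling are even considered. That growth is why hand tuning every new machine does not scale.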
• About a year ago, there was no tuned BLAS for the Pentium under Linux.
• Need for quick, dynamic deployment of optimized routines.
• ATLAS - Automatically Tuned Linear Algebra Software. Related self-tuning projects:
  - PHiPAC from Berkeley
  - FFTW from MIT (http://www.fftw.org)
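The core idea behind this kind of automatic tuning is empirical search. You generate candidate variants of a kernel, time each one on the target machine, and keep the fastest.

The Python sketch below applies that idea to a single parameter: the block size of a blocked matrix multiply. It is a toy illustration with several assumptions: pure Python, square matrices, made-up test data, and hypothetical function names `blocked_matmul` and `tune_block_size`. It is not ATLAS's actual search procedure, which tunes many more parameters.

```python
import time


def blocked_matmul(A, B, nb):
    """Return C = A * B for square n x n lists, computed in nb x nb tiles."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    # Walk over tiles; min() handles a block size that does not divide n.
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += aik * Bk[j]
    return C


def tune_block_size(n, candidates, repeats=3):
    """Time blocked_matmul for each candidate block size; return (best_nb, timings)."""
    # Hypothetical test matrices; only their size matters for timing.
    A = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    B = [[float((i * j) % 5) for j in range(n)] for i in range(n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_matmul(A, B, nb)
            # Keep the fastest run to reduce noise from other activity on the machine.
            best = min(best, time.perf_counter() - t0)
        timings[nb] = best
    best_nb = min(timings, key=timings.get)
    return best_nb, timings


if __name__ == "__main__":
    best_nb, timings = tune_block_size(48, [8, 16, 24, 48])
    for nb, t in sorted(timings.items()):
        print(f"nb={nb:3d}  {t * 1e3:.2f} ms")
    print("selected block size:", best_nb)
```

In pure Python, interpreter overhead dwarfs cache effects, so the winning block size here says little about real hardware. The point is the structure of the search loop, which carries over unchanged when the candidates are compiled kernels.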