Numerical Libraries
20 years ago
- 1 Mflop/s - Scalar based
- LINPACK, Level 1 BLAS, loop unrolling
10 years ago
- 1 Gflop/s - Vector & SMP computing, cache aware
- LAPACK, Level 2 & 3 BLAS, block partitioned, latency tolerant
Today
- 1 Tflop/s - Highly parallel, network based, message passing
- ScaLAPACK, data decomposition, communication/computation
10 years away
- 1 Pflop/s - Many more levels of memory hierarchy, combination of grids & HPC
- More adaptive, latency tolerant and bandwidth aware, fault tolerant, extended precision, attention to SMP nodes