Jack Dongarra : Papers

High Performance Computing at a Crossroads, Ewa Deelman, Jack Dongarra, Bruce Hendrickson, Amanda Randles, Daniel Reed, Edward Seidel, and Katherine Yelick, Science, February 2025, Vol. 387, issue 6736, pp. 829-831.

Hardware Trends Impacting Floating-Point Computations in Scientific Applications, Jack Dongarra, John Gunnels, Harun Bayraktar, Azzam Haidar, Dan Ernst, November 2024, https://doi.org/10.48550/arXiv.2411.12090

The Co-Evolution of Computational Physics and High-Performance Computing, Dongarra, J., Keyes, D., Nat Rev Phys (2024). https://doi.org/10.1038/s42254-024-00750-z

XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing, Hoefler, Torsten; Copik, Marcin; Beckman, Pete; Jones, Andrew; Foster, Ian; Parashar, Manish; Reed, Daniel; Troyer, Matthias; Schulthess, Thomas; Ernst, Dan; Dongarra, Jack, accepted in Computing in Science and Engineering, IEEE, March 2024. 10.1109/MCSE.2024.3382154

MAGMA: Enabling Exascale Performance with Accelerated BLAS and LAPACK for Diverse GPU Architectures, Abdelfattah, Ahmad; Beams, Natalie; Carson, Robert; Ghysels, Pieter; Kolev, Tzanio; Stitt, Thomas; Vargas, Arturo; Tomov, Stanimire; Dongarra, Jack, submitted to International Journal of High Performance Computing Applications, February 2024. https://doi.org/10.1177/10943420241261960

Trends in computational science: natural language processing and network analysis of 23 years of ICCS publications, Lijing Luo, Sergey Kovalchuk, Valeria Krzhizhanovskaya, Maciej Paszynski and Jack Dongarra, ICCS 2024 conference, https://doi.org/10.1007/978-3-031-63751-3_2

HPC Forecast: Cloudy and Uncertain, Reed, D., Gannon, D., Dongarra, J., Communications of the ACM, February 2023, Vol. 66 No. 2, Pages 82-90, 10.1145/3552309 Also https://vimeo.com/784558423

Special Issue on Clusters, Clouds, and Data for Scientific Computing, Jack Dongarra and Bernard Tourancheau, The International Journal of High Performance Computing Applications, https://doi.org/10.1177/10943420231180188

GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure, Ahmad Abdelfattah, Stanimire Tomov, Piotr Luszczek, Hartwig Anzt, Jack Dongarra, SC-W 2023: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. https://doi.org/10.1145/3624062.3624247

Using Additive Modifications in LU Factorization Instead of Pivoting, Neil Lindquist, Piotr Luszczek, Jack Dongarra, The ACM International Conference on Supercomputing 2023 (ACM ICS'23). 10.1145/3577193.3593731

PAQR: Pivoting Avoiding QR factorization, Sid-Lakhdar, W.M., Cayrols, S., Bielich, D., Abdelfattah, A., Luszczek, P., Gates, M., Tomov, S., Johansen, H., Williams-Young, D., Davis, T.A., Dongarra, J., 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS54959.2023.00040

Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators, Dalal Sukkari, Mark Gates, Mohammed Al Farhan, Hartwig Anzt, Jack Dongarra, Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis- SC23, 10.1145/3624062.3624248

Combining Multitask and Transfer Learning with Deep Gaussian Processes for Autotuning-Based Performance Engineering, Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, The International Journal of High Performance Computing Applications, March 2023. doi.org/10.1177/1094342023116636

An Introduction to High Performance Computing and its Intersection with Advances in Modeling REEs and Actinides, Deborah A. Penchoff, Edward Valeev, Heike Jagode, Piotr Luszczek, Anthony Danalis, George Bosilca, Robert J. Harrison, Jack Dongarra, Theresa L. Windus, American Chemical Society, "Computational Science for Lanthanides and Actinides," DOI: 10.1021/bk-2021-1388.ch001

Using Additive Modifications in LU Factorization Instead of Pivoting, Lindquist, N., P. Luszczek, and J. Dongarra, 37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023. https://doi.org/10.1145/3577193.3593731

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers, Abdelfattah, A., P. Ghysels, W. Boukaram, S. Tomov, X. Sherry Li, and J. Dongarra, 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.

Randomized numerical linear algebra: A perspective on the field with an eye to software, Riley Murray, James Demmel, Michael W Mahoney, N Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra, 2023/2/22, arXiv preprint arXiv:2302.11474

Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications, Cao, Qinglei and Abdulah, Sameh and Alomairy, Rabab and Pei, Yu and Nag, Pratik and Bosilca, George and Dongarra, Jack and Genton, Marc G. and Keyes, David E. and Ltaief, Hatem and Sun, Yin, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, 2022, pp. 1-12, doi: 10.1109/SC41404.2022.00007.

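Proposed Consistent Exception Handling for the BLAS and LAPACK, Demmel, James and Dongarra, Jack and Gates, Mark and Henry, Greg and Langou, Julien and Li, Xiaoye and Luszczek, Piotr and Pereira, Weslley and Riedy, Jason and Rubio-González, Cindy, 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness), Dallas, TX, USA, 2022, pp. 1-9, doi: 10.1109/Correctness56720.2022.00006.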

Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices, Y. M. Tsai, P. Luszczek and J. Dongarra, 2022 IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH), Dallas, TX, USA, 2022, pp. 43-50, doi: 10.1109/ScalAH56622.2022.00011.

Can the United States Maintain Its Leadership in High-Performance Computing? Dongarra, Deelman, et al, A report from the ASCAC Subcommittee on American Competitiveness and Innovation to the ASCR office, https://doi.org/10.2172/1989107

Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements, Barry, D., H. Jagode, A. Danalis, and J. Dongarra, 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, 2023. DOI:10.1109/IPDPSW59300.2023.00070

A Not So Simple Matter of Software; The Evolution of Mathematical Software: Software and Algorithms Follow the Hardware, Jack Dongarra, Accepted in CACM, August 2, 2022.

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs, Cayrols, S., Li, J., Bosilca, G., Tomov, S., Ayala, A., Dongarra, J., IEEE Cluster 2022 DOI:10.1109/CLUSTER51413.2022.00029

Sequential Task Flow Runtime Model Improvements and Limitations, Pei, Y., Bosilca, G., Dongarra, J., ROSS Workshop SC22. DOI: 10.1109/ROSS56639.2022.00009

Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices, Tsai, Y. M., Luszczek, P., Dongarra, J., the 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH22) DOI: 10.1109/ScalAH56622.2022.00011

Proposed Consistent Exception Handling for the BLAS and LAPACK, Demmel, J., Dongarra, J., Gates, M., Henry, G., Langou, J., Li, X., Luszczek, P., Pereira, W., Riedy, J., Rubio-González, C., 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications 10.1109/Correctness56720.2022.00006

Threshold Pivoting for Dense LU Factorization, Lindquist, N., Gates, M., Luszczek, P., Dongarra, J., 2022 IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH). 10.1109/ScalAH56622.2022.00010

Performance Framework and Runtime Analysis of Parallel FFT on Large Multi-GPU Systems, Ayala, A., Tomov, S., Stoyanov, M., Haidar, A., Dongarra J., 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 10.1109/IPDPSW55747.2022.00072

HPC Forecast: Cloudy and Uncertain, Daniel Reed, Dennis Gannon, and Jack Dongarra, CACM accepted July 27, 2022.

A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization, Q. Cao, et al., in 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 2022 pp. 414-424. doi: 10.1109/IPDPS53621.2022.00047

Batch QR Factorization on GPUs: Design, Optimization, and Tuning, Abdelfattah, A., S. Tomov, and J. Dongarra, ICCS 2022, Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022. DOI: 10.1007/978-3-031-08751-6_5

Evaluating Data Redistribution in PaRSEC, Cao, Bosilca, Losada, Wu, Zhong, Dongarra, Transactions on Parallel and Distributed Systems, DOI: 10.1109/TPDS.2021.3131657

Using Long Vector Extensions for MPI Reductions, Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra, Parallel Computing, Volume 109, March 2022. https://doi.org/10.1016/j.parco.2021.102871

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC, Abdulah, S., Q. Cao, Y. Pei, G. Bosilca, J. Dongarra, M. G. Genton, D. E. Keyes, H. Ltaief, and Y. Sun, IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022. DOI: 10.1109/TPDS.2021.3084071

Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms, Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, International Journal of Networking and Computing, vol. 12, issue 1, pp. 26 - 46, January 2022. DOI: 10.15803/ijnc.12.1_26

Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin, (2020). Integrating Deep Learning in Domain Sciences at Exascale. In: Nichols, J., Verastegui, B., Maccabe, A., Hernandez, O., Parete-Koon, S., Ahearn, T. (eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_3
A pdf version is available.

A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms, D. Sharp, M. Stoyanov, S. Tomov and J. Dongarra, 2021 IEEE High Performance Extreme Computing Conference (HPEC), 2021, pp. 1-5, doi: 10.1109/HPEC49654.2021.9622811.

Scalability Issues in FFT Computation, Ayala, A., Tomov, S., Stoyanov, M., Dongarra, J. (2021). In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_21

Revisiting Credit Distribution Algorithms for Distributed Termination Detection, G. Bosilca, A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert and J. Dongarra, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021, pp. 611-620, doi: 10.1109/IPDPSW52791.2021.00095.

Accelerating Multi-Process Communication for Parallel 3-D FFT, Ayala, A., S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 21), St. Louis, MO, Nov 2021.

Accelerating FFT towards Exascale Computing, Ayala, A., S. Tomov, A. Haidar, M. Stoyanov, S. Cayrols, J. Li, G. Bosilca, and J. Dongarra, NVIDIA GPU Technology Conference (GTC2021), Digital, March 2021.
A pdf version is available.

Scalability Issues in FFT Computation, Ayala A., Tomov S., Stoyanov M., Dongarra J, PaCT 2021. Lecture Notes in Computer Science, Volume 12942, September 2021

Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems, Iqbal, Z., S. Nooshabadi, I. Yamazaki, S. Tomov, and J. Dongarra, IEEE Access, August 2021.
A pdf version is available.

20 years of computational science: Selected papers from 2020 International Conference on Computational Science, Kovalchuk, Sergey V., Valeria V. Krzhizhanovskaya, P. M. A. Sloot, Gábor Závodszky, Michael H Lees, M. Paszynski, and J. Dongarra, Journal of Computational Science, July 2021.
A pdf version is available.

Accelerating Restarted GMRES with Mixed Precision Arithmetic, Neil Lindquist, Piotr Luszczek, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, June 2021.
A pdf version is available.

A Set of Batched Basic Linear Algebra Subprograms, Abdelfattah, A., T. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Kurzak, P. Luszczek, S. Tomov, et al., ACM Transactions on Mathematical Software, accepted October 2020.
A pdf version is available.

Efficient Exascale Discretizations: High-Order Finite Element Methods, Kolev, Tzanio; Fischer, Paul; Min, Misun; Dongarra, Jack; Brown, Jed; Dobrev, Veselin; Warburton, Timothy; Tomov, Stanimire; Shephard, Mark; Abdelfattah, Ahmad; Barra, Valeria; Beams, Natalie; Camier, Jean-Sylvain; Chalmers, Noel; Dudouit, Yohann; Karakus, Ali; Karlin, Ian; Kerkemeier, Stefan; Lan, Yu-Hsiang; Medina, David; Merzari, Elia; Obabko, Aleksandr; Pazner, Will; Rathnayake, Thilina; Smith, Cameron; Spies, Lukas; Świrydowicz, Kasia; Thompson, Jeremy; Tomboulides, Ananias; Tomov, Vladimir, accepted in International Journal of High Performance Computing Applications, May 2021.

A Survey of Numerical Linear Algebra Methods Utilizing Mixed Precision Arithmetic, Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Alyson Fox, Mark Gates, Nicholas J. Higham, Xiaoye S. Li, Jennifer Loe, Piotr Luszczek, Srikara Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry F. Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, and Ulrike Meier Yang, International Journal of High Performance Computing Applications, February 2021. https://journals.sagepub.com/doi/10.1177/10943420211003313

Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure, Herault, T., Y. Robert, G. Bosilca, R. Harrison, C. Lewis, E. Valeev, and J. Dongarra, 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021, accepted December 2020.

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems, Cao, Q., Y. Pei, K. Akbudak, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021, accepted December 2020.

Revisiting Credit Distribution Algorithms for Distributed Termination Detection, George Bosilca, Aurelien Bouteiller, Thomas Herault, Valentin Le Fèvre, Yves Robert and Jack Dongarra, IPDPS-APDCM2021 (Workshop on Advances in Parallel and Distributed Computational Models), accepted March 2021.

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach with PaRSEC, Sameh Abdulah, George Bosilca, Qinglei Cao, Jack Dongarra, Marc Genton, David Keyes, Hatem Ltaief, Yu Pei, Ying Sun, accepted in IEEE Transactions on Parallel and Distributed Systems, May 2021.

Harnessing the Computing Continuum for Programming Our World, Beckman, P., J. Dongarra, N. Ferrier, G. Fox, T. Moore, D. Reed, and M. Beck, Fog Computing: Theory and Practice, John Wiley & Sons, Inc., 2020. DOI: 10.1002/9781119551713.ch7
A PDF version is available.

MAGMA Templates for Scalable Linear Algebra on Emerging Architectures, Farhan, M. Al, A. Abdelfattah, S. Tomov, M. Gates, D. Sukkari, A. Haidar, R. Rosenberg, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020. DOI: https://doi.org/10.1177/1094342020938421
A PDF version is available.

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs, Brown, C., A. Abdelfattah, S. Tomov, and J. Dongarra, 2020 IEEE High Performance Extreme Computing Virtual Conference: IEEE, September 2020.

A PDF version is available.

HAN: A Hierarchical AutotuNed Collective Communication Framework, Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.

A PDF version is available.

Flexible Data Redistribution in a Task-Based Runtime System, Cao, Q., G. Bosilca, W. Wu, D. Zhong, A. Bouteiller, and J. Dongarra, IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020. DOI: https://doi.org/10.1109/CLUSTER49012.2020.00032
A PDF version is available.

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations, Anzt, H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J. Dongarra, 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.

A PDF version is available.

High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs, Beams, N., A. Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y. Dudouit, 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.

A PDF version is available.

Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques, Lindquist, N., P. Luszczek, and J. Dongarra, 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
A PDF version is available.

Using Advanced Vector Extensions AVX-512 for MPI Reduction, Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020. DOI: https://doi.org/10.1145/3416315.3416316

A PDF version is available.

Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems, Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020. DOI: https://doi.org/10.1098/rspa.2020.0110
A PDF version is available.

Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions, Abdelfattah, A., J. Dongarra, and S. Tomov, Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020. DOI: https://doi.org/10.1016/j.jpdc.2020.07.001
A PDF version is available.

Translational Process: Mathematical Software Perspective, Dongarra, J., M. Gates, P. Luszczek, and S. Tomov, Journal of Computational Science, August 2020. DOI: https://doi.org/10.1016/j.jocs.2020.101216
A PDF version is available.

Scalable Data Generation for Evaluating Mixed-Precision Solvers, Luszczek, P., Y. Tsai, N. Lindquist, H. Anzt, and J. Dongarra, 2020 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2020.
A PDF version is available.

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications, Cao, Q., Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, The Platform for Advanced Scientific Computing (PASC) Conference (PASC20), 2:1-2:11 DOI: https://doi.org/10.1145/3394277.3401846

A PDF version is available.

Numerical Algorithms for High-Performance Computational Science, Dongarra, J., L. Grigori, and N. J. Higham, Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020. DOI: 10.1098/rsta.2019.0066
A PDF version is available.

Improving the Performance of the GMRES method using Mixed-Precision Techniques, Lindquist, N., P. Luszczek, and J. Dongarra, Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
A pdf version is available.

SLATE Users' Guide, Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, July 2020.

A PDF version is available.

Performance Tuning SLATE, Gates, M., A. Charara, A. YarKhan, D. Sukkari, M. Al Farhan, and J. Dongarra, SLATE Working Notes, no. 14, ICL-UT-20-01: Innovative Computing Laboratory, University of Tennessee, January 2020.

A PDF version is available.

Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC), Demmel, J., J. Dongarra, J. Langou, J. Langou, P. Luszczek, and M. Mahoney, LAPACK Working Notes, no. 297, ICL-UT-20-07: University of Tennessee.

A PDF version is available.

Integrating Deep Learning in Domain Sciences at Exascale, Archibald, R., E. Chow, E. D'Azevedo, J. Dongarra, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, S. Tomov, K. Wong, and J. Yin, Innovative Computing Laboratory Technical Report no. ICL-UT-20-10: University of Tennessee, August 2020.
A PDF version is available.

SLATE Performance Report: Updates to Cholesky and LU Factorizations, YarKhan, A., M. Al Farhan, D. Sukkari, M. Gates, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-14: University of Tennessee, October 2020.

A PDF version is available.




	Years 2023 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1979	Papers: See my Google Scholar page for a close-to-comprehensive list of publications and my ResearchGate page for the full text of many. (My DBLP page provides a good list too, organized nicely by year, and with a co-author index.) - 2025- High Performance Computing at a Crossroads, Ewa Deelman, Jack Dongarra, Bruce Hendrickson, Amanda Randles, Daniel Reed, Edward Seigel, and Katherine Yelick, Science, February 2025, Vol. 387, issue 6736, pp. 829-831. A pdf version is available. - 2024- Hardware Trends Impacting Floating-Point Computations in Scientific Applications, Jack Dongarra, John Gunnels, Harun Bayraktar, Azzam Haidar, Dan Ernst, November 2024, https://doi.org/10.48550/arXiv.2411.12090 A pdf version is available. The Co-Evolution of Computational Physics and High-Performance Computing, Dongarra, J., Keyes, D., Nat Rev Phys (2024). https://doi.org/10.1038/s42254-024-00750-z A pdf version is available. XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing, Hoefler, Torsten; Copik, Marcin; Beckman, Pete; Jones, Andrew; Foster, Ian; Parashar, Manish; Reed, Daniel; Troyer, Matthias; Schulthess, Thomas; Ernst, Dan; Dongarra, Jack, accepted in Computing in Science and Engineering, IEEE, March 2024. 10.1109/MCSE.2024.3382154 A pdf version is available. MAGMA: Enabling Exascale Performance with Accelerated BLAS and LAPACK for Diverse GPU Architectures, Abdelfattah, Ahmad; Beams, Natalie; Carson, Robert; Ghysels, Pieter; Kolev, Tzanio; Stitt, Thomas; Vargas, Arturo; Tomov, Stanimire; Dongarra, Jack, submitted to International Journal of High Performance Computing Applications, February 2024. https://doi.org/10.1177/10943420241261960 A pdf version is available. 
Trends in computational science: natural language processing and network analysis of 23 years of ICCS publications, Lijing Luo, Sergey Kovalchuk, Valeria Krzhizhanovskaya, Maciej Paszynski and Jack Dongarra, ICCS 2024 conference, https://doi.org/10.1007/978-3-031-63751-3_2 A pdf version is available. - 2023- HPC Forecast: Cloudy and Uncertain, Reed, D., Gannon, D., Dongarra, J., Communications of the ACM, February 2023, Vol. 66 No. 2, Pages 82-90, 10.1145/3552309 Also https://vimeo.com/784558423 A pdf version is available. Special Issue on Clusters, Clouds, and Data for Scientific Computing, Jack Dongarra and Bernard Tourancheau, The International Journal of High Performance Computing Applications, https://doi.org/10.1177/10943420231180188 A pdf version is available. GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure, Ahmad Abdelfattah, Stanimire Tomov, Piotr Luszczek, Hartwig Anzt, Jack Dongarra, SC-W 2023: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis. https://doi.org/10.1145/3624062.3624247 A pdf version is available. Using Additive Modifications in LU Factorization Instead of Pivoting, Neil Lindquist, Piotr Luszczek, Jack Dongarra, The ACM International Conference on Supercomputing 2023 (ACM ICS'23). 10.1145/3577193.3593731 A pdf version is available. PAQR: Pivoting Avoiding QR factorization, Sid-Lakhdar, W.M., Cayrols, S., Bielich, D., Abdelfattah, A., Luszczek, P., Gates, M., Tomov, S., Johansen, H., Williams-Young, D., Davis, T.A. Dongarra, J., 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS54959.2023.00040 A pdf version is available. 
Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators, Dalal Sukkari, Mark Gates, Mohammed Al Farhan, Hartwig Anzt, Jack Dongarra, Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis- SC23, 10.1145/3624062.3624248 A pdf version is available. Charting a Path in a Shifting Technical and Geopolitical Landscape: Post-Exascale Computing for the National Nuclear Security Administration (2023), National Academy Press, Washington, DC, ISBN 978-0-309-70108-2 \| DOI https://doi.org/10.17226/26916 https://www.dropbox.com/scl/fi/70u8t0xp8onk7ne0yvr8x/NASEM_AdvCopyPrepub_PostExascale.pdf?rlkey=8t8vzsgnwyxvfajaekunz022k&dl=0 A pdf version is available. Combining Multitask and Transfer Learning with Deep Gaussian Processes for Autotuning-Based Performance Engineering, Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, The International Journal of High Performance Computing Applications, March 2023. doi.org/10.1177/1094342023116636 A pdf version is available. An Introduction to High Performance Computing and its Intersection with Advances in Modeling REEs and Actinides, Deborah A. Penchoff,Edward Valeev,Heike Jagode,Piotr Luszczek,Anthony Danalis, George Bosilca, Robert J. Harrison, Jack Dongarra,Theresa L. Windus American Chemical Society. �Computational Science for Lanthanides and Actinides� DOI: 10.1021/bk-2021-1388.ch001 A pdf version is available. Using Additive Modifications in LU Factorization Instead of Pivoting, Lindquist, N., P. Luszczek, and J. Dongarra, 37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023. https://doi.org/10.1145/3577193.3593731 A pdf version is available. Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers, Abdelfattah, A., P. Ghysels, W. Boukaram, S. Tomov, X. Sherry Li, and J. 
Dongarra, 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022. A pdf version is available. Randomized numerical linear algebra: A perspective on the field with an eye to software, Riley Murray, James Demmel, Michael W Mahoney, N Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra, 2023/2/22, arXiv preprint arXiv:2302.11474 A pdf version is available. Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications, Cao, Qinglei and Abdulah, Sameh and Alomairy, Rabab and Pei, Yu and Nag, Pratik and Bosilca, George and Dongarra, Jack and Genton, Marc G. and Keyes, David E. and Ltaief, Hatem and Sun, Yin, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, 2022, pp. 1-12, doi: 10.1109/SC41404.2022.00007. A pdf version is available. Proposed Consistent Exception Handling for the BLAS and LAPACK, Demmel, James and Dongarra, Jack and Gates, Mark and Henry, Greg and Langou, Julien and Li, Xiaoye and Luszczek, Piotr and Pereira, Weslley and Riedy, Jason and Rubio-Gonz�lez, Cindy 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness), Dallas, TX, USA, 2022, pp. 1-9, doi: 10.1109/Correctness56720.2022.00006. A pdf version is available. Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices, Y. M. Tsai, P. Luszczek and J. Dongarra, ," 2022 IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH), Dallas, TX, USA, 2022, pp. 43-50, doi: 10.1109/ScalAH56622.2022.00011. A pdf version is available. 
Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era, Gates, Mark and YarKhan, Asim and Sukkari, Dalal and Akbudak, Kadir and Cayrols, Sebastien and Bielich, Daniel and Abdelfattah, Ahmad and Farhan, Mohammed Al and Dongarra, Jack, 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Dallas, TX, USA, 2022, pp. 36-46, doi: 10.1109/P3HPC56579.2022.00009. A pdf version is available. Can the United States Maintain Its Leadership in High-Performance Computing? Dongarra, Deelman, et al, A report from the ASCAC Subcommittee on American Competitiveness and Innovation to the ASCR office, https://doi.org/10.2172/1989107 A pdf version is available. Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements, Barry, D., H. Jagode, A. Danalis, and J. Dongarra, 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, 2023. DOI:10.1109/IPDPSW59300.2023.00070 A pdf version is available. - 2022 - http://www.netlib.org/utk/people/JackDongarra/PAPERS/Addressing_Irregular-sc22.pdfA Not So Simple Matter of Software; The Evolution of Mathematical Software: Software and Algorithms Follow the Hardware, Jack Dongarra, Accepted in CACM, August 2, 2022. A pdf version is available. Addressing Irregular Patterns of Matrix Computations on GPUs and their Impact on Applications Powered by Sparse Direct Solvers, Abdelfattah, A., Ghysels, P., Boukaram, W., Tomov, S., Xiaoye, S., Dongarra, J. SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022, Article No.: 26, Pages 1�14. 10.1109/SC41404.2022.00031 A pdf version is available. 
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs, Cayrols, S., Li, J., Bosilca, G., Tomov, S., Ayala, A., Dongarra, J., IEEE Cluster 2022 DOI:10.1109/CLUSTER51413.2022.00029 A pdf version is available. Sequential Task Flow Runtime Model Improvements and Limitations, Pei, Y., Bosilca, G., Dongarra, J., ROSS Worksop SC22. DOI: 10.1109/ROSS56639.2022.00009 A pdf version is available. Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices, Tsai, Y. M., Luszczek, P., J. Dongarra, J., the 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH22) DOI: 10.1109/ScalAH56622.2022.00011 A pdf version is available. Proposed Consistent Exception Handling for the BLAS and LAPACK, Demmel, J., Dongarra, J., Gates, M., Henry, G., Langou, J., Li, X., Luszczek, P., Pereira, W., Riedy, J. Rubio-Gonz�lez, C., 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications 10.1109/Correctness56720.2022.00006 A pdf version is available. Threshold Pivoting for Dense LU Factorization, Lindquist, N., Gates, M., Luszczek, P., Dongarra, J., 2022 IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems (ScalAH). 10.1109/ScalAH56622.2022.00010 A pdf version is available. Performance Framework and Runtime Analysis of Parallel FFT on Large Multi-GPU Systems, Ayala, A., Tomov, S., Stoyanov, M., Haidar, A., Dongarra J., 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 10.1109/IPDPSW55747.2022.00072 A pdf version is available. HPC Forecast: Cloudy and Uncertain, Daniel Reed, Dennis Gannon, and Jack Dongarra, CACM accepted July 27, 2022. A pdf version is available. A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization, Q. 
Cao, et al., in 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 2022 pp. 414-424. doi: 10.1109/IPDPS53621.2022.00047 A pdf version is available.

Batch QR Factorization on GPUs: Design, Optimization, and Tuning, Abdelfattah, A., S. Tomov, and J. Dongarra, ICCS 2022, Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022. DOI: 10.1007/978-3-031-08751-6_5 A pdf version is available.

Evaluating Data Redistribution in PaRSEC, Cao, Bosilca, Losada, Wu, Zhong, Dongarra, Transactions on Parallel and Distributed Systems, DOI: 10.1109/TPDS.2021.3131657 A pdf version is available.

Using Long Vector Extensions for MPI Reductions, Dong Zhong, Qinglei Cao, George Bosilca, Jack Dongarra, Parallel Computing, Volume 109, March 2022. https://doi.org/10.1016/j.parco.2021.102871 A pdf version is available.

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC, Abdulah, S., Q. Cao, Y. Pei, G. Bosilca, J. Dongarra, M. G. Genton, D. E. Keyes, H. Ltaief, and Y. Sun, IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964-976, April 2022. DOI: 10.1109/TPDS.2021.3084071 A pdf version is available.

Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms, Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, International Journal of Networking and Computing, vol. 12, issue 1, pp. 26-46, January 2022. DOI: 10.15803/ijnc.12.1_26 A pdf version is available.

- 2021 -

Integrating Deep Learning in Domain Sciences at Exascale, Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin (2020). In: Nichols, J., Verastegui, B., Maccabe, A., Hernandez, O., Parete-Koon, S., Ahearn, T. 
(eds) Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI. SMC 2020. Communications in Computer and Information Science, vol 1315. Springer, Cham. https://doi.org/10.1007/978-3-030-63393-6_3 A pdf version is available.

A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms, D. Sharp, M. Stoyanov, S. Tomov and J. Dongarra, 2021 IEEE High Performance Extreme Computing Conference (HPEC), 2021, pp. 1-5, doi: 10.1109/HPEC49654.2021.9622811. A pdf version is available.

Scalability Issues in FFT Computation, Ayala, A., Tomov, S., Stoyanov, M., Dongarra, J. (2021). In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2021. Lecture Notes in Computer Science, vol 12942. Springer, Cham. https://doi.org/10.1007/978-3-030-86359-3_21 A pdf version is available.

Revisiting Credit Distribution Algorithms for Distributed Termination Detection, G. Bosilca, A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert and J. Dongarra, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021, pp. 611-620, doi: 10.1109/IPDPSW52791.2021.00095. A pdf version is available.

Accelerating Multi-Process Communication for Parallel 3-D FFT, Ayala, A., S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 21), St. Louis, MO, Nov 2021. A pdf version is available.

Accelerating FFT towards Exascale Computing, Ayala, A., S. Tomov, A. Haidar, M. Stoyanov, S. Cayrols, J. Li, G. Bosilca, and J. Dongarra, NVIDIA GPU Technology Conference (GTC2021), Digital, March 2021. A pdf version is available.

An Introduction to High Performance Computing and its Intersection with Advances in Modeling REEs and Actinides, Deborah A. Penchoff, Edward Valeev, Heike Jagode, Piotr Luszczek, Anthony Danalis, George Bosilca, Robert J. Harrison, Jack Dongarra, Theresa L. Windus, American Chemical Society. 
"Computational Science for Lanthanides and Actinides" DOI: 10.1021/bk-2021-1388.ch001 A pdf version is available.

Scalability Issues in FFT Computation, Ayala A., Tomov S., Stoyanov M., Dongarra J, PaCT 2021. Lecture Notes in Computer Science, Volume 12942, September 2021 A pdf version is available.

Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems, Iqbal, Z., S. Nooshabadi, I. Yamazaki, S. Tomov, and J. Dongarra, IEEE Access, August 2021. A pdf version is available.

20 years of computational science: Selected papers from 2020 International Conference on Computational Science, Kovalchuk, Sergey V., Valeria V. Krzhizhanovskaya, P. M. A. Sloot, Gábor Závodszky, Michael H Lees, M. Paszynski, and J. Dongarra, Journal of Computational Science, July 2021. A pdf version is available.

Accelerating Restarted GMRES with Mixed Precision Arithmetic, Neil Lindquist, Piotr Luszczek, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, June 2021. A pdf version is available.

A Set of Batched Basic Linear Algebra Subprograms, Abdelfattah, A., T. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Kurzak, P. Luszczek, S. Tomov, et al., ACM Transactions on Mathematical Software, accepted October 2020. A pdf version is available.
Efficient Exascale Discretizations: High-Order Finite Element Methods, Kolev, Tzanio; Fischer, Paul; Min, Misun; Dongarra, Jack; Brown, Jed; Dobrev, Veselin; Warburton, Timothy; Tomov, Stanimire; Shephard, Mark; Abdelfattah, Ahmad; Barra, Valeria; Beams, Natalie; Camier, Jean-Sylvain; Chalmers, Noel; Dudouit, Yohann; Karakus, Ali; Karlin, Ian; Kerkemeier, Stefan; Lan, Yu-Hsiang; Medina, David; Merzari, Elia; Obabko, Aleksandr; Pazner, Will; Rathnayake, Thilina; Smith, Cameron; Spies, Lukas; Świrydowicz, Kasia; Thompson, Jeremy; Tomboulides, Ananias; Tomov, Vladimir, accepted in International Journal of High Performance Computing Applications, May 2021. A PDF version is available.

A Survey of Numerical Linear Algebra Methods Utilizing Mixed Precision Arithmetic, Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Alyson Fox, Mark Gates, Nicholas J. Higham, Xiaoye S. Li, Jennifer Loe, Piotr Luszczek, Srikara Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry F. Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, and Ulrike Meier Yang, International Journal of High Performance Computing Applications, February 2021. https://journals.sagepub.com/doi/10.1177/10943420211003313 A PDF version is available.

Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure, Herault, T., Y. Robert, G. Bosilca, R. Harrison, C. Lewis, E. Valeev, and J. Dongarra, 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021, accepted December 2020. A PDF version is available.

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems, Cao, Q., Y. Pei, K. Akbudak, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021, accepted December 2020. A PDF version is available.
Revisiting Credit Distribution Algorithms for Distributed Termination Detection, George Bosilca, Aurelien Bouteiller, Thomas Herault, Valentin Le Fèvre, Yves Robert and Jack Dongarra, IPDPS-APDCM2021 (Workshop on Advances in Parallel and Distributed Computational Models), accepted March 2021. A PDF version is available.

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach with PaRSEC, Sameh Abdulah, George Bosilca, Qinglei Cao, Jack Dongarra, Marc Genton, David Keyes, Hatem Ltaief, Yu Pei, Ying Sun, accepted in IEEE Transactions on Parallel and Distributed Systems, May 2021. A PDF version is available.

- 2020 -

Harnessing the Computing Continuum for Programming Our World, Beckman, P., J. Dongarra, N. Ferrier, G. Fox, T. Moore, D. Reed, and M. Beck, Fog Computing: Theory and Practice, John Wiley & Sons, Inc., 2020. DOI: 10.1002/9781119551713.ch7 A PDF version is available.

MAGMA Templates for Scalable Linear Algebra on Emerging Architectures, Farhan, M. Al, A. Abdelfattah, S. Tomov, M. Gates, D. Sukkari, A. Haidar, R. Rosenberg, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020. DOI: https://doi.org/10.1177/1094342020938421 A PDF version is available.

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs, Brown, C., A. Abdelfattah, S. Tomov, and J. Dongarra, 2020 IEEE High Performance Extreme Computing Virtual Conference: IEEE, September 2020. A PDF version is available.

HAN: A Hierarchical AutotuNed Collective Communication Framework, Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020. A PDF version is available.

Flexible Data Redistribution in a Task-Based Runtime System, Cao, Q., G. Bosilca, W. Wu, D. Zhong, A. Bouteiller, and J. 
Dongarra, IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020. DOI: https://doi.org/10.1109/CLUSTER49012.2020.00032 A PDF version is available.

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations, Anzt, H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J. Dongarra, 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020. A PDF version is available.

High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs, Beams, N., A. Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y. Dudouit, 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020. A PDF version is available.

Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques, Lindquist, N., P. Luszczek, and J. Dongarra, 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020. A PDF version is available.

Using Advanced Vector Extensions AVX-512 for MPI Reduction, Zhong, D., Q. Cao, G. Bosilca, and J. Dongarra, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020. DOI: https://doi.org/10.1145/3416315.3416316 A PDF version is available.

Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems, Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020. DOI: https://doi.org/10.1098/rspa.2020.0110 A PDF version is available.

Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions, Abdelfattah, A., J. Dongarra, and S. Tomov, Journal of Parallel and Distributed Computing, vol. 145, pp. 188–201, November 2020. DOI: https://doi.org/10.1016/j.jpdc.2020.07.001 A PDF version is available.
Translational Process: Mathematical Software Perspective, Dongarra, J., M. Gates, P. Luszczek, and S. Tomov, Journal of Computational Science, August 2020. DOI: https://doi.org/10.1016/j.jocs.2020.101216 A PDF version is available.

Scalable Data Generation for Evaluating Mixed-Precision Solvers, Luszczek, P., Y. Tsai, N. Lindquist, H. Anzt, and J. Dongarra, 2020 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2020. A PDF version is available.

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications, Cao, Q., Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, The Platform for Advanced Scientific Computing (PASC) Conference (PASC20), 2:1-2:11 DOI: https://doi.org/10.1145/3394277.3401846 A PDF version is available.

Numerical Algorithms for High-Performance Computational Science, Dongarra, J., L. Grigori, and N. J. Higham, Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020. DOI: 10.1098/rsta.2019.0066 A PDF version is available.

FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs, Tomov, S., A. Ayala, A. Haidar, and J. Dongarra, no. FFT-ECP STML13-27, Innovative Computing Laboratory, University of Tennessee, January 2020. A PDF version is available.

Formulation of Requirements for new PAPI++ Software Package: Part I: Survey Results, Jagode, H., A. Danalis, and J. Dongarra, PAPI++ Working Notes, no. 1, ICL-UT-20-02, Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020. A PDF version is available.

Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning, Wong, K., S. Tomov, and J. Dongarra, The Journal of Computational Science Education, vol. 11, issue 1, 36-44, January 2020. DOI: 10.22369/issn.2153-4136/11/1/7 A PDF version is available.

Performance Tuning SLATE, Gates, M., A. Charara, A. 
YarKhan, D. Sukkari, M. Al Farhan, and J. Dongarra, SLATE Working Notes, no. 14, ICL-UT-20-01, Innovative Computing Laboratory, University of Tennessee, January 2020. A PDF version is available.

Load-balancing Sparse Matrix Vector Product Kernels on GPUs, Anzt, H., Y-C. Chen, T. Cojean, J. Dongarra, G. Flegar, R. Nayak, E. S. Quintana-Orti, Y. Tsai, and W. Wang, ACM Transactions on Parallel Computing, issue 2, March 2020. DOI: 10.1145/3380930 A PDF version is available.

Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures, Lopez, F., E. Chow, S. Tomov, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04, University of Tennessee, Knoxville, March 2020. A PDF version is available.

Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD, Lu, Y., I. Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754 A PDF version is available.

Using Arm Scalable Vector Extension to optimize Open MPI, Zhong, D., P. Shamis, Q. Cao, G. Bosilca, and J. Dongarra, 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020. A pdf version is available.

Asynchronous SGD for DNN training on Shared-memory Parallel Architectures, Lopez, F., E. Chow, S. Tomov, and J. Dongarra, Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020. A PDF version is available.

Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing, Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05, University of Tennessee, May 2020. A PDF version is available.

Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime, Pei, Y., Q. Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J. 
Dongarra, 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020), New Orleans, LA, IEEE, May 2020. A PDF version is available.

Twenty Years of Computational Science, Krzhizhanovskaya, V., G. Závodszky, M. Lees, J. Dongarra, P. Sloot, S. Brissos, and J. Teixeira, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.

heFFTe: Highly Efficient FFT for Exascale, Ayala, A., S. Tomov, A. Haidar, and J. Dongarra, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020. A pdf version is available.

Investigating the Benefit of FP16-enabled Mixed-precision Solvers for Symmetric Positive Definite Matrices using GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, Elsevier, June 2020. A pdf version is available.

Report on the Fujitsu Fugaku System, Dongarra, J., Innovative Computing Laboratory Technical Report, no. ICL-UT-20-06, University of Tennessee, June 2020. A PDF version is available.

Improving the Performance of the GMRES method using Mixed-Precision Techniques, Lindquist, N., P. Luszczek, and J. Dongarra, Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020. A pdf version is available.

SLATE Users' Guide, Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al Farhan, D. Sukkari, and J. Dongarra, SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, July 2020. A PDF version is available.
Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC), Demmel, J., J. Dongarra, J. Langou, J. Langou, P. Luszczek, and M. Mahoney, LAPACK Working Notes, no. 297, ICL-UT-20-07: University of Tennessee. A PDF version is available.

Integrating Deep Learning in Domain Sciences at Exascale, Archibald, R., E. Chow, E. D'Azevedo, J. Dongarra, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, S. Tomov, K. Wong, and J. Yin, Innovative Computing Laboratory Technical Report no. ICL-UT-20-10: University of Tennessee, August 2020. A PDF version is available.

SLATE Performance Report: Updates to Cholesky and LU Factorizations, YarKhan, A., M. Al Farhan, D. Sukkari, M. Gates, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-14: University of Tennessee, October 2020. A PDF version is available.

- 2019 -

Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers, Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019. DOI: 10.1002/cpe.4460 A PDF version is available.

Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices, Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, Parallel Computing, vol. 81, pp. 1-21, January 2019. DOI: 10.1016/j.parco.2018.10.003 A PDF version is available.

CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps: Zenodo, Tomov, S., A. Abdelfattah, V. Barra, N. Beams, J. Brown, J-S. Camier, V. Dobrev, J. Dongarra, Y. Dudouit, P. Fischer, et al., October 2019. DOI: 10.5281/zenodo.3477618 A PDF version is available.

Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI, Davis, J., T. Gao, S. Chandrasekaran, H. 
Jagode, A. Danalis, P. Balaji, J. Dongarra, and M. Taufer, 2019 International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, September 2019.

Checkpointing Strategies for Shared High-Performance Computing Platforms, Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, International Journal of Networking and Computing, vol. 9, no. 1, pp. 28-52, 2019. A PDF version is available.

Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms, Le Fevre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, Parallel Computing, vol. 85, pp. 1-12, July 2019. DOI: 10.1016/j.parco.2019.02.002 A PDF version is available.

Counter Inspection Toolkit: Making Sense out of Hardware Performance Events, Danalis, A., H. Jagode, H. Hanumantharayappa, S. Ragate, and J. Dongarra, 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2 A PDF version is available.

Design and Implementation for FFT-ECP on Distributed Accelerated Systems, Tomov, S., A. Haidar, A. Ayala, D. Schultz, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019. A PDF version is available.

Distributed-Memory Lattice H-Matrix Factorization, Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046-1063, August 2019. DOI: 10.1177/1094342019861139 A PDF version is available.

An Empirical View of SLATE Algorithms on Scalable Hybrid System, YarKhan, A., J. Kurzak, A. Abdelfattah, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019. A PDF version is available.

Evaluation of Directive-Based Performance Portable Programming Models, Lopez, M. 
G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165-182. DOI: http://dx.doi.org/10.1504/IJHPCN.2017.10009064 A PDF version is available.

Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization, Pei, Y., G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019. A PDF version is available.

Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. A PDF version is available.

FFT-ECP Implementation Optimizations and Features Phase, Tomov, S., A. Haidar, A. Ayala, H. Shaiek, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019. A PDF version is available.

Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC, Herault, T., Y. Robert, G. Bosilca, and J. Dongarra, ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019. A PDF version is available.

GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems, Shaiek, H., S. Tomov, A. Ayala, A. Haidar, and J. Dongarra, EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019. A PDF version is available.

Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments, Wong, K., S. Tomov, and J. Dongarra, ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. A PDF version is available.
Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation, Ayala, A., S. Tomov, X. Luo, H. Shaiek, A. Haidar, G. Bosilca, and J. Dongarra, Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019. A PDF version is available.

Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators, Luszczek, P., I. Yamazaki, and J. Dongarra, IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019. A PDF version is available.

Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators, Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019. DOI: 10.1145/3324989.3325719 A PDF version is available.

Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators, Kurzak, J., M. Gates, A. Charara, A. YarKhan, I. Yamazaki, and J. Dongarra, Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019. DOI: 10.1007/978-3-030-29400-7_35

MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing, Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. A PDF version is available.

Massively Parallel Automated Software Tuning, Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019. DOI: 10.1145/3337821.3337908 A PDF version is available.

Solving Linear Diophantine Systems on Parallel Architectures, Zaitsev, D., S. Tomov, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158-1169, May 2019. DOI: http://dx.doi.org/10.1109/TPDS.2018.2873354 A PDF version is available.
Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation, Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. A PDF version is available.

PAPI Software-Defined Events for in-Depth Performance Analysis, Jagode, H., A. Danalis, H. Anzt, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019. A PDF version is available.

ParILUT - A Parallel Threshold ILU for GPUs, Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPS.2019.00033 A PDF version is available.

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019. A PDF version is available.

Performance of Asynchronous Optimized Schwarz with One-sided Communication, Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, Parallel Computing, vol. 86, pp. 66-81, August 2019. DOI: 10.1016/j.parco.2019.05.004 A PDF version is available.

PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP, Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, et al., ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019. DOI: 10.1145/3264491 A PDF version is available.

Progressive Optimization of Batched LU Factorization on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019. A PDF version is available.

Race to Exascale, Dongarra, J., S. Gottlieb, and W. T. 
Kramer, Computing in Science and Engineering, vol. 21, issue 1, pp. 4-5, March 2019. DOI: 10.1109/MCSE.2018.2882574 A PDF version is available.

SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, Gates, M., J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019. DOI: 10.1145/3295500.3356223 A PDF version is available.

SLATE Developers' Guide, Charara, A., M. Gates, J. Kurzak, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 11, ICL-UT-19-02: Innovative Computing Laboratory, University of Tennessee, December 2019. A PDF version is available.

SLATE Mixed Precision Performance Report, Charara, A., J. Dongarra, M. Gates, J. Kurzak, and A. YarKhan, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-03: University of Tennessee, April 2019. A PDF version is available.

SLATE Users' Guide, Gates, M., A. Charara, J. Kurzak, and J. Dongarra, SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, January 2019.

SLATE Working Note 12: Implementing Matrix Inversions, Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019. A PDF version is available.

SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers, Gates, M., M. Al Farhan, A. Charara, J. Kurzak, D. Sukkari, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019. A PDF version is available.

Software-Defined Events through PAPI, Danalis, A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. 
DOI: 10.1109/IPDPSW.2019.00069 A PDF version is available.

Towards Continuous Benchmarking, Anzt, H., Y. Chen Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak, E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019. DOI: 10.1145/3324989.3325719 A PDF version is available.

Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, ScalA19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019. A PDF version is available.

What it Takes to keep PAPI Instrumental for the HPC Community, Jagode, H., A. Danalis, and J. Dongarra, 1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019. A PDF version is available.

- 2018 -

Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters, Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018. https://www.semanticscholar.org/paper/Analyzing-Performance-of-BiCGStab-with-Hierarchical-Yamazaki-Abdelfattah/9f5a5449a04f09fb8b27a106f363dc5a5035a1b9 A pdf version is available.

Do moldable applications perform better on failure-prone HPC platforms? Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018. https://link.springer.com/chapter/10.1007/978-3-030-10549-5_61 A pdf version is available.

ADAPT: An Event-Based Adaptive Collective Communication Framework, Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. 
Dongarra, The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018, http://dx.doi.org/10.1145/3208040.3208054. A pdf version is available.

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms, Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8425494 A pdf version is available.

Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers, Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018. DOI: 10.1109/SC.2018.00050 https://dl.acm.org/citation.cfm?id=3291719 A pdf version is available.

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques, Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018, https://doi.org/10.1007/978-3-319-93698-7_45 A pdf version is available.

Variable-Size Batched Condition Number Calculation on GPUs, Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, SBAC-PAD, Lyon, France, September 2018. https://ieeexplore.ieee.org/document/8645907 A pdf version is available.

A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs, Anzt, H. and J. Dongarra, SBAC-PAD, Lyon, France, September 2018. https://ieeexplore.ieee.org/document/8645946 A pdf version is available.

Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures, Yamazaki, I., J. Kurzak, P. 
Wu, M. Zounon, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018, http://dx.doi.org/10.1109/TPDS.2018.2808964. A pdf version is available.

Computational Benefit of GPU Optimization for Atmospheric Chemistry Modeling, Sun, J., J. Fu, J. Drake, Q. Zhu, A. Haidar, M. Gates, S. Tomov, and J. Dongarra, Journal of Advances in Modeling Earth Systems, vol. 10, issue 8, pp. 1952–1969, August 2018, https://doi.org/10.1029/2018MS001276. A pdf version is available.

Evaluation of Dataflow Programming Models for Electronic Structure Theory, Jagode, H., A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018, https://doi.org/10.1002/cpe.4490. A pdf version is available.

Accelerating NWChem Coupled Cluster through Dataflow-Based Execution, Jagode, H., A. Danalis, and J. Dongarra, International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018, https://doi.org/10.1007/978-3-319-32149-3_35. A pdf version is available.

Investigating Power Capping toward Energy-Efficient Scientific Applications, Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1–14, April 2018, http://dx.doi.org/10.1002/cpe.4485. A pdf version is available.

A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations, Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018, https://doi.org/10.1109/TPDS.2017.2783929. A pdf version is available.

Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs, Gates, M., S. Tomov, and J. Dongarra, Parallel Computing, vol.
74, pp. 3–18, May 2018, http://dx.doi.org/10.1016/j.parco.2017.10.004. A pdf version is available.

The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer - Supercomputing History and the Immortality of Now, Dongarra, J., V. Getov, and K. Walsh, Computer, vol. 51, issue 10, pp. 74–85, November 2018, http://dx.doi.org/10.1109/MC.2018.3971352. A pdf version is available.

Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators, Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018, http://dx.doi.org/10.1109/JPROC.2018.2868961. A pdf version is available.

A Failure Detection for HPC Platforms, G. Bosilca, A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, International Journal of High Performance Computing Applications, Volume 32 Issue 1, January 2018, pp 139-158, http://journals.sagepub.com/doi/10.1177/1094342017711505 A pdf version is available.

Accelerating the SVD Bidiagonalization of a Batch of Small Matrices using GPUs, Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Journal of Computational Science, January 2018, https://doi.org/doi:10.1016/j.jocs.2018.01.007 A pdf version is available.

Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers, H. Anzt, J. Dongarra, G. Flegar, N. Higham, E. Quintana-Orti, Concurrency and Computation: Practice and Experience, http://dx.doi.org/10.1002/cpe.4460, January 2018. A pdf version is available.

A Guide for Achieving High Performance With Very Small Matrices On GPU: A Case Study of Batched LU and Cholesky Factorizations, Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Stanimire Tomov, Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Vol. 29, No. 5, May 2018, DOI: 10.1109/TPDS.2017.2783929 A pdf version is available.
Investigating Power Capping toward Energy-Efficient Scientific Applications, A. Haidar, H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, J. Dongarra, Concurrency and Computation: Practice and Experience, February 2018, DOI: 10.1002/cpe.4485. A pdf version is available.

Evaluation of Dataflow Programming Models for Electronic Structure Theory, H. Jagode, A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 2018, issue e4490, pp. 1-20, May 2018. https://doi.org/10.1002/cpe.4490 A pdf version is available.

Big Data and Extreme-Scale Computing: Pathways to Convergence-Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry, M. Asch, et al., International Journal of High Performance Computing Applications, Volume 32 Issue 4, Fall 2018, pp 435-479. doi.org/10.1177/1094342018778123 A pdf version is available.

PARILUT - A New Parallel Threshold ILU Factorization, H. Anzt, E. Chow, J. Dongarra, SIAM SISC, Vol 40 No 4, pp C503-C519. https://doi.org/10.1137/16M1079506 A pdf version is available.

Autotuning in High-Performance Computing Applications, Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, IEEE Proceedings, August 2018. DOI:10.1109/JPROC.2018.2841200 A pdf version is available.

Evaluation of Directive-based Performance Portable Programming Models, M. Graham Lopez, Wayne Joubert, Veronica Vergara Larrea, Oscar Hernandez, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Int. J. Signal and Imaging Systems Engineering, Vol. x, No. x, 2017. DOI: 10.1504/IJHPCN.2017.10009064 A pdf version is available.

Accelerating the SVD Two Stage Reduction and Divide-and-Conquer Using GPUs, Mark Gates, Stanimire Tomov, Jack Dongarra, Parallel Computing, accepted November 2017. A pdf version is available.
Batched One-sided Factorizations of Tiny Matrices using GPUs: Challenges and Countermeasure, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, Journal of Computational Science, Volume 26, May 2018, pp 226-236. https://doi.org/10.1016/j.jocs.2018.01.005 A pdf version is available.

Symmetric Indefinite Linear Solver using OpenMP Task on Manycore Architecture, I. Yamazaki, J. Kurzak, P. Wu, M. Zounon, J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, Volume: 29, Issue: 8, Aug. 1 2018. 10.1109/TPDS.2018.2808964 A pdf version is available.

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Exascale, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018, https://doi.org/10.1137/17M1117732. A pdf version is available.

PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, S. Hammarling, J. Sistek, Accepted in ACM TOMS July 2018. A pdf version is available.

Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning, Edmond Chow, Hartwig Anzt, Jennifer Scott, Jack Dongarra, Journal of Parallel and Distributed Computing, 119, pp 219-230, 2018. https://doi.org/10.1016/j.jpdc.2018.04.017 A pdf version is available.

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs, IEEE Transactions on Parallel and Distributed Systems, Accepted May 2018. 10.1109/TPDS.2018.2842785 A pdf version is available.

Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling, Jian Sun, Joshua Fu, John Drake, Qingzhao Zhu, Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra, Journal of Advances in Modeling Earth Systems, https://doi.org/10.1029/2018MS001276 A pdf version is available.
Analyzing Performance of BiCGStab with Hierarchical-matrix on GPU clusters, Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota and Jack Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, IEEE, May 2018. A pdf version is available.

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms, T. Herault, Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, J. Dongarra, APDCM Workshop at IPDPS 2018, Best paper award. A pdf version is available.

ADAPT: An Event-Based Adaptive Collective Communication Framework, Wu, W., G. Bosilca, X. Luo, T. Patinyasakdikul, L. Wang, and J. Dongarra, Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18, Tempe, Arizona, ACM Press, June 2018. 10.1145/3208040.3208054 A pdf version is available.

Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers, Tomov, Haidar, Dongarra, Higham, submitted to SC18. A pdf version is available.

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, accepted in IEEE HPEC September 2018, Waltham, MA. A pdf version is available.

Do moldable applications perform better on failure-prone HPC platforms?, Euro-Par 2018, Resilience Workshop, Turin Italy, Accepted June 2018. A pdf version is available.

Incomplete Sparse Approximate Inverses for Parallel Preconditioning, H. Anzt, T. Huckle, J. Brackle, J. Dongarra, Parallel Computing, Volume 71, January 2018, Pages 1-22, doi.org/10.1016/j.parco.2017.10.003. A pdf version is available.

- 2017 -

A Look Back on 30 Years of the Gordon Bell Prize, Gordon Bell, David Bailey, Alan H. Karp, Jack Dongarra, Kevin Walsh, International Journal of High Performance Computing Applications, 2017, Vol.
31(6) 469–484, DOI: 10.1177/1094342017738610 A pdf version is available.

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems, Jack Dongarra, Sven Hammarling, Nick Higham, Samuel Relton, Pedro Valero-Lara, and Mawussi Zounon, ICCS'17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 495-504, DOI:10.1016/j.procs.2017.05.138 A pdf version is available.

Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices, Mark Gates, Jakub Kurzak, Piotr Luszczek, Yu Pei and Jack Dongarra, iWAPT 2017 at IPDPS 2017. A pdf version is available.

Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti, 2017 46th International Conference on Parallel Processing (ICPP), August 2017, pp 91-100, DOI: 10.1109/ICPP.2017.18 A pdf version is available.

Out of Memory SVD Solver for Big Data, Azzam Haidar, Khairul Kabir, Diana Fayad, Stanimire Tomov, Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference. September 2017, pp 1-7, DOI: 10.1109/HPEC.2017.8091029 A pdf version is available.

Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki and Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference, Boston, 2017, DOI: 10.1109/HPEC.2017.8091031 A pdf version is available.

Sampling Algorithms to Update Truncated SVD, Ichitaro Yamazaki, Stanimire Tomov and Jack Dongarra, accepted at the IEEE Big Data 2017 Conference, Boston MA, December 11-16, 2017. A pdf version is available.

Flexible Batched Sparse Matrix-Vector Product on GPUs, H. Anzt, G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, The 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, Colorado, ACM Press, November 2017. A pdf version is available.
Power-Aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi, Azzam Haidar, Heike Jagode, Asim Yarkhan, Phil Vaccaro, Stanimire Tomov and Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference (HPEC), September 2017, DOI: 10.1109/HPEC.2017.8091085 A pdf version is available.

Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki, David Keffer, and Jack Dongarra, accepted in IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), December 2017, Boston, MA. A pdf version is available.

Optimized Batched Linear Algebra for Modern Architectures, J. Dongarra, S. Hammarling, N. J. Higham, S.D. Relton, and M. Zounon, In Euro-Par 2017: Parallel Processing, F.F. Rivera, T.F. Pena, and J.C. Cabaleiro, editors, volume 10417 of Lecture Notes in Computer Science, Springer-Verlag, Cham, 2017, pages 511-522. DOI: 10.1007/978-3-319-64203-1_37. A pdf version is available.

Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning, Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí, Andrés E. Tomás, Procedia Computer Science, Volume 108, pp 1783 - 1792, 2017, International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland, ISSN 1877-0509, DOI:10.1016/j.procs.2017.05.186. A pdf version is available.

Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation, Mathieu Faverge, Julien Langou, Yves Robert, Jack J. Dongarra, 2017 IPDPS Conference, DOI:10.1109/IPDPS.2017.46 A pdf version is available.
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, ICCS'17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 606-615, DOI:10.1016/j.procs.2017.05.250. A pdf version is available.

Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives, I. Yamazaki, M. Hoemmen, P. Luszczek, J. Dongarra, IPDPS Workshop PDSEC-2017, Workshop Best Paper Award, 2017. DOI: 10.1109/IPDPSW.2017.65 A pdf version is available.

Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, ICS'17, Frankfurt, ISBN: 978-1-4503-5020-4, DOI:10.1145/3079079.3079103. A pdf version is available.

Bringing High Performance Computing to Big Data Algorithms, H. Anzt, J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki in Handbook of Big Data Technologies, Editors: Albert Y. Zomaya, Sherif Sakr, ISBN: 978-3-319-49339-8 (Print) 978-3-319-49340-4 (Online), DOI:10.1007/978-3-319-49340-4, Springer, 2017. A pdf version is available.

Preconditioned Krylov solvers on GPUs, Hartwig Anzt, Mark Gates, Jack Dongarra, Moritz Kreutzer, Gerhard Wellein, Martin Köhler, Parallel Computing, DOI:10.1016/j.parco.2017.05.006, June 2017. A pdf version is available.

Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki, David Keffer, and Jack Dongarra, accepted in IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), December 2017, Boston, MA. DOI: 10.1109/BigData.2017.8258258 A pdf version is available.
Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors, H. Anzt, G. Flegar, J. Dongarra, E. Quintana-Orti, Parallel Computing, doi.org/10.1016/j.parco.2017.12.006. A pdf version is available.

Evaluation of Directive-based Performance Portable Programming Models, M. Graham Lopez, Wayne Joubert, Veronica Vergara Larrea, Oscar Hernandez, Azzam Haidar, Stanimire Tomov, Jack Dongarra, International Journal of High Performance Computing and Networking, accepted May 2017. A pdf version is available.

A Framework for Out of Memory Algorithms, K. Kabir, A. Haidar, S. Tomov, A. Bouteiller, J. Dongarra, in Kunkel J., Yokota R., Balaji P., Keyes D. (eds) High Performance Computing, ISC 2017. Lecture Notes in Computer Science, vol 10266. Springer, Frankfurt, Germany, June 19-21, 2017, DOI:10.1007/978-3-319-58667-0_9 A pdf version is available.

Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti, Proceeding PMAM'17 Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, Pages 1-10, Austin, TX, USA – February 04 - 08, 2017, ISBN: 978-1-4503-4883-6 DOI:10.1145/3026937.3026940 A pdf version is available.

High-Performance Cholesky Factorization for GPU-Only Execution, Azzam Haidar, Ahmad Abdelfattah, Stanimire Tomov and Jack Dongarra, Proceeding GPGPU-10 Proceedings of the General Purpose GPUs, Pages 42-52 Austin, TX, USA – February 04 - 08, 2017, DOI:10.1145/3038228.3038237 A pdf version is available.
Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, Numerical Algorithms, November 2016, Volume 73, Issue 3, pp 611–630, DOI:10.1007/s11075-016-0110-2 A pdf version is available.

Accelerating NWChem Coupled Cluster through dataflow-based Execution, A. Danalis, H. Jagode, and J. Dongarra, The International Journal of High Performance Computing Applications, 2017, DOI:10.1177/1094342016672543 A pdf version is available.

On the Performance and Energy Efficiency of Sparse Linear Algebra on GPU, Hartwig Anzt, Stanimire Tomov, and Jack Dongarra, International Journal of High Performance Computing, 2017, DOI:10.1177/1094342016672081 A pdf version is available.

Solving Dense Symmetric Indefinite Systems using GPUs, M. Baboulin, J. Dongarra, A. Remy, S. Tomov, I. Yamazaki, Concurrency and Computation: Practice and Experience, 2017, DOI:10.1002/cpe.4055 A pdf version is available.

Fine-grained Bit-Flip Protection for Relaxation Methods, H. Anzt, J. Dongarra, and E. Quintana-Orti, the Journal of Computational Science, 2017, DOI:10.1016/j.jocs.2016.11.013 A pdf version is available.

Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, Journal of Computational Science, Volume 20, May 2017, Pages 85–93 DOI:10.1016/j.jocs.2016.12.009 A pdf version is available.

With Extreme Computing, the Rules Have Changed, Jack Dongarra, Stanimire Tomov, Piotr Luszczek, Jakub Kurzak, Mark Gates, Ichitaro Yamazaki, Hartwig Anzt, Azzam Haidar, and Ahmad Abdelfattah, IEEE CISE, April 2017, DOI:10.1109/MCSE.2017.48 A pdf version is available.

Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems, I. Yamazaki, S. Tomov, J. Dongarra, IEEE Embedded Systems Letters, May 2017, DOI: 10.1109/LES.2017.2700401 A pdf version is available.
Design and Implementation of the PULSAR Programming System for Large Scale Computing, J. Kurzak, P. Luszczek, I. Yamazaki, Y. Robert, J. Dongarra, Supercomputing Frontiers and Innovations, 2017, DOI:10.14529/jsfi170101 A pdf version is available.

Bringing High Performance Computing to Big Data Algorithms, H. Anzt, J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki in Handbook of Big Data Technologies, Editors: Albert Y. Zomaya, Sherif Sakr, ISBN: 978-3-319-49339-8 (Print) 978-3-319-49340-4 (Online), DOI:10.1007/978-3-319-49340-4, Springer, 2017. A pdf version is available.

Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices, Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra, ICCS'17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 1008–1018, DOI:10.1016/j.procs.2017.05.237 A pdf version is available.

Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, ICCS'17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 606-615, DOI:10.1016/j.procs.2017.05.250 A pdf version is available.

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems, Jack Dongarra, Sven Hammarling, Nick Higham, Samuel Relton, Pedro Valero-Lara and Mawussi Zounon, ICCS'17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 495-504, DOI:10.1016/j.procs.2017.05.138 A pdf version is available.

Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov and Jack Dongarra, ICS 2017 Chicago, June 14 2017, DOI:10.1145/3079079.3079103 A pdf version is available.

Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation, Mathieu Faverge, Julien Langou, Yves Robert, Jack J.
Dongarra, 2017 IPDPS Conference, DOI:10.1109/IPDPS.2017.46 A pdf version is available.

Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning, Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Ortí, Andrés E. Tomás, Procedia Computer Science, Volume 108, pp 1783 - 1792, 2017, International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland, ISSN 1877-0509, DOI:10.1016/j.procs.2017.05.186. A pdf version is available.

Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti, accepted PMAM 2017, December 2016. A pdf version is available.

- 2016 -

Linear algebra software for large-scale accelerated multicore computing, A. Abdelfattah, H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki and A. YarKhan, Acta Numerica / Volume 25 / May 2016, pp 1 - 160, DOI: 10.1017/S0962492916000015. A pdf version is available.

Report on the Sunway TaihuLight System, Jack Dongarra, University of Tennessee, Department of Electrical Engineering and Computer Science Tech Report UT-EECS-16-742, June 2016. A pdf version is available.

Sunway TaihuLight Supercomputer Makes Its Appearance, Jack Dongarra, The National Science Review 2016 3: 265-266, September 2016, DOI: 10.1093/nsr/nww044. A pdf version is available.

Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU, I. Yamazaki, S. Tomov, and J. Dongarra, ACM Transactions on Mathematical Software (TOMS), Volume 43 Issue 2, September 2016 DOI: 10.1145/2898347 A pdf version is available.

On the Performance and Energy Efficiency of Sparse Linear Algebra on GPUs, H. Anzt, S. Tomov, and J. Dongarra, The International Journal of High Performance Computing Applications, DOI: 10.1177/1094342016672081. A pdf version is available.
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs, A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016. A pdf version is available.

High-Performance Tensor Contractions for GPUs, A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016 A pdf version is available.

Efficiency of General Krylov Methods on GPUs – An Experimental Study, Hartwig Anzt, Jack Dongarra, Moritz Kreutzer, Gerhard Wellein, Martin Köhler, AsHES Workshop, IPDPS, 2016. A pdf version is available.

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016. A pdf version is available.

GPU-Aware Non-contiguous Data Movement In Open MPI, W. Wu, G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, The 25th International Symposium on High Performance Distributed Computing (HPDC2016). A pdf version is available.

Creating a Standardised Set of Batched BLAS Routines, Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, Pedro Valero-Lara and Mawussi Zounon, in the Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016), Gabrielle Allen, Jeffrey Carver et al, volume 1686, CEUR Workshop Proceedings, http://ceur-ws.org/Vol-1686/WSSSPE4_paper_3.pdf. A pdf version is available.

Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures, Y. Jia, P. Luszczek, and J.
Dongarra, The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2016, May 2016, Chicago. DOI: 10.1109/IPDPSW.2016.34 A pdf version is available.

Non-GPU-resident Dense Symmetric Indefinite Factorization, I. Yamazaki, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, DOI: 10.1002/cpe.4012, November 2016. A pdf version is available.

A New Metric for Ranking High Performance Computing Systems, Jack Dongarra, Michael A. Heroux, and Piotr Luszczek, National Science Review, Volume 3, Issue 1, March 2016, pp 30-35, DOI: 10.1093/nsr/nwv084. A pdf version is available.

Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results, Julien Herrmann, George Bosilca, Thomas Hérault, Loris Marchal, Yves Robert, and Jack Dongarra, Parallel Computing, Volume 52, February 2016, pp. 22–41, DOI: 10.1016/j.parco.2015.09.005. A pdf version is available.

Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs, Hartwig Anzt, Moritz Kreutzer, Eduardo Ponce, Gregory D. Peterson, Gerhard Wellein, Jack Dongarra, The International Journal of High Performance Computing Applications, 1–11, 2016, DOI: 10.1177/1094342016646844 A pdf version is available.

Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, DOI: 10.1002/cpe.3516. A pdf version is available.

High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems, J. Dongarra, M. Heroux, P. Luszczek, The International Journal of High Performance Computing Applications, Volume 30 Issue 1, Spring 2016. DOI: 10.1177/1094342015593158. A pdf version is available.

Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results, Herrmann, J., G.
Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, Parallel Computing, vol. 52, pp. 22-41, February 2016. DOI: 10.1016/j.parco.2015.09.005. A pdf version is available.

Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, accepted in Numerical Algorithms, January 2016. A pdf version is available.

Stability and Performance of Various Singular Value QR Implementations and Case-studies with Adaptive Mixed Precision on Multicore CPU with GPUs, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, Accepted TOMS, February 2016. A pdf version is available.

Performance Optimization of Sparse Matrix-Vector Multiplication for Multi-component PDE-based Applications using GPUs, Ahmad Abdelfattah, Hatem Ltaief, David Keyes, and Jack Dongarra, accepted Concurrency and Computation: Practice and Experience, April 2016. A pdf version is available.

Porting the PLASMA Numerical Library to the OpenMP Standard, Asim YarKhan, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in International Journal of Parallel Programming, May 2016. A pdf version is available.

Domain Overlap for Iterative Sparse Triangular Solves on GPUs, Hartwig Anzt, Edmond Chow, Daniel Szyld, and Jack Dongarra, Software for Exascale Computing, Leibniz Supercomputing Centre, Munich, Germany, Volume 113 of the series Lecture Notes in Computational Science and Engineering pp 527-545, Jan 25–27, 2016. DOI: 10.1007/978-3-319-40528-5_24 A pdf version is available.

Performance, Design, and Autotuning of Batched GEMM for GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, High Performance Computing, Volume 9697 of the series Lecture Notes in Computer Science pp 21-38, 2016, DOI: 10.1007/978-3-319-41321-1_2 A pdf version is available.
Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations, Hartwig Anzt, Marc Baboulin, Jack Dongarra, Yvan Fournier, Frank Hulsemann, Amal Khabou and Yushan Wang, VECPAR 2016. A pdf version is available.

Task-Based Cholesky Decomposition on Knights Corner using OpenMP, Joseph Dorris, Jakub Kurzak, Piotr Luszczek, Asim Yarkhan, Jack Dongarra, Awarded the Best Paper Award at the P^3MA workshop co-located with ISC, High Performance Computing, Volume 9945 of the series Lecture Notes in Computer Science pp 544-562, DOI: 10.1007/978-3-319-46079-6_37 A pdf version is available.

LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi, Azzam Haidar, Stanimire Tomov, Konstantin Arturov, Murat Guney, Shane Story, Jack Dongarra, 2016 IEEE High Performance Extreme Computing Conference (HPEC '16) Twentieth Annual HPEC Conference 13 - 15 September 2016, Waltham, MA USA. A pdf version is available.

Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations, A. Haidar, B. Brock, S. Tomov, M. Guidry, J. Billings, D. Shyles, J. Dongarra, 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), September 13-15, 2016. A pdf version is available.

Failure Detection and Propagation in HPC systems, George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Herault, Yves Robert, Pierre Sens, Jack Dongarra, Nominated for Best Paper, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016. A pdf version is available.

Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks, Yaohung Tsai, Piotr Luszczek, Jakub Kurzak and Jack Dongarra, in the Machine Learning and HPC Environments Workshop associated with SC16, November 2016.
A pdf version is available.

Batched Generation of Incomplete Sparse Approximate Inverses on GPUs, H. Anzt, E. Chow, T. Huckle, J. Dongarra, Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016. A pdf version is available.

Towards Achieving Performance Portability Using Directives for Accelerators, M. Lopez, V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016. A pdf version is available.

Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations, A. Haidar, B. Brock, S. Tomov, M. Guidry, J. Billings, D. Shyles, and J. Dongarra, 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), Waltham, MA, IEEE, September 2016. A pdf version is available.

Power Management and Event Verification in PAPI, H. Jagode, A. YarKhan, A. Danalis, and J. Dongarra, Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 41-51, 2016. A pdf version is available.

Search Space Generation and Pruning System for Autotuners, Piotr Luszczek, Mark Gates, Jakub Kurzak, Anthony Danalis, and Jack Dongarra, the 30th IEEE International Parallel & Distributed Processing Symposium, Chicago, IL, IEEE, May 2016. A pdf version is available.

High-performance Matrix-Matrix Multiplications of Very Small Matrices, I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J.
Dongarra, 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016. A pdf version is available.

Heterogeneous Streaming, C. Newburn, et al., The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016. A pdf version is available.

CUDA-aware non-contiguous data movement in Open MPI, Wei Wu, George Bosilca, Rolf vandeVaart, and Jack Dongarra, 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016. A pdf version is available.

- 2015 -

Exascale Computing and Big Data: The Next Frontier, Daniel A. Reed and Jack Dongarra, Communications of the ACM, Vol. 58 No. 7, Pages 56-68, DOI: 10.1145/2699414. A pdf version is available.

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures, M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki, the Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Volume 9573 of the series Lecture Notes in Computer Science pp 86-95, DOI: 10.1007/978-3-319-32149-3_9 A pdf version is available.

Accelerating Collaborative Filtering Using Concepts from High Performance Computing, Mark Gates, Hartwig Anzt, Jakub Kurzak, and Jack Dongarra, 2015 IEEE International Conference on Big Data (IEEE BigData, November 2015). DOI: 10.1109/BigData.2015.7363811 A pdf version is available.

Strengthening compute and data intensive capacities of Armenia, H. Astsatryan, V. Sahakyan, Y. Shoukourian, P.H. Cros, M. Dayde, J. Dongarra, P. Oster, in RoEduNet International Conference - Networking in Education and Research (RoEduNet NER), 2015 14th, pp. 28-33, 24-26 Sept. 2015, DOI: 10.1109/RoEduNet.2015.7311823 A pdf version is available.

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems, M. Abalenkovs, A. Abdelfattah, J.
Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, A. YarKhan, Supercomputing Frontiers and Innovations, Volume 2, Number 4, pages 67-86, 2015, DOI: 10.14529/jsfi1504 A pdf version is available.

The TOP500 List of Supercomputers and Progress in High Performance Computing, Erich Strohmaier, Hans W. Meuer, Jack Dongarra, Horst D. Simon, IEEE Computer, No.11 - Nov. (2015 vol.48), pp. 42-49, http://doi.ieeecomputersociety.org/10.1109/MC.2015.338. A pdf version is available.

Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs, Jakub Kurzak, Hartwig Anzt, Mark Gates, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015. A pdf version is available.

Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, Journal on Parallel and Distributed Computing, Volume 85, November 2015, pp. 32-46, http://dx.doi.org/10.1016/j.jpdc.201. A pdf version is available.

A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems, Fengguang Song and Jack Dongarra, Concurrency and Computation: Practice and Experience, Volume 27, Issue 14, 25 September 2015, pp. 3702-3723, DOI: 10.1002/cpe.3403. A pdf version is available.

A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination, Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, and Ichitaro Yamazaki, Concurrency and Computation: Practice and Experience Volume 27, Issue 5, pp. 1292-1309, 10 April 2015, http://dx.doi.org/10.1002/cpe.3306. A pdf version is available.

Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Hartwig Anzt, Blake Haugen, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, Concurrency and Computing: Practice and Experience, Volume 27, Issue 17, December 2015, pp. 5096-5113, http://dx.doi.org/10.1109/IPDPSW.2014.107. A pdf version is available.

Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPUS with Multiple GPUs, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, SIAM J. Sci. Comput. 37-3 (2015), pp. C307-C330, http://dx.doi.org/10.1137/14M0973773. A pdf version is available.

A New Metric for Ranking High Performance Computing Systems, Jack Dongarra, Michael A. Heroux, and Piotr Luszczek, National Science Review, January 2016, DOI: 10.1093/nsr/nwv084. A pdf version is available.

Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations, Ichitaro Yamazaki, Stanimire Tomov and Jack Dongarra, Scientific Programming, vol. 2015, Article ID 246019, 17 pages, 2015, http://dx.doi.org/10.1155/2015/246019. A pdf version is available.

Batched Matrix Computations on Hardware Accelerators Based on GPUs, Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, The International Journal of High Performance Computing Applications, May 2015 29: 193-208, first published on February 9, 2015, http://dx.doi.org/1177/1094342014567546. A pdf version is available.

PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution, Anthony Danalis, Heike Jagode, George Bosilca and Jack Dongarra, to appear IEEE Cluster 2015, Chicago, Illinois, USA, Sept. 8-11, 2015. A pdf version is available.

Random Sampling to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Jack Dongarra, to appear SC15, November 2015.
A pdf version is available.

Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs, Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack Dongarra, to appear SC15, November 2015. A pdf version is available.

Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems, Thomas Herault, Aurelien Bouteiller, George Bosilca, Marc Gamell, Keita Teranishi, Manish Parashar, Jack Dongarra, to appear SC15, November 2015. A pdf version is available.

Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems, Raffaele Solcà, Anton Kozhevnikov, Azzam Haidar, Stanimire Tomov, Thomas C. Schulthess, Jack Dongarra, to appear SC15, finalist for the Best Paper Award, November 2015. A pdf version is available.

Dense Symmetric Indefinite Factorization on GPU Accelerated Architecture, Marc Baboulin, Jack Dongarra, Adrien Remy, Stanimire Tomov, and Ichitaro Yamazaki, to appear PPAM 2015, Krakow Poland, 2015. A pdf version is available.

Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery, Aurelien Bouteiller, George Bosilca and Jack Dongarra, to appear EUROMPI Conference, September 2015. A pdf version is available.

Flexible Linear Algebra Development and Scheduling with Cholesky Factorization, Azzam Haidar, Asim YarKhan, Chongxiao Cao, Piotr Luszczek, Stanimire Tomov, Jack Dongarra, 17th IEEE International Conference on High Performance Computing and Communications, New York, New York, August 2015. A pdf version is available.

Iterative Sparse Triangular Solves for Preconditioning, Hartwig Anzt, Edmond Chow and Jack Dongarra, to appear in EuroPar 2015, Vienna Austria, August 2015. A pdf version is available.

Design for a Soft Error Resilient Dynamic Task-based Runtime, Chongxiao Cao, George Bosilca, Thomas Herault, and Jack Dongarra, 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad, INDIA, May 2015. A pdf version is available.

Hierarchical DAG Scheduling for Hybrid Distributed Systems, Wei Wu, George Bosilca, Aurelien Bouteiller, Mathieu Faverge, and Jack Dongarra, 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad, INDIA, May 2015. A pdf version is available.

Performance Analysis and Optimisation of Two-Sided Factorization Algorithms for Heterogeneous Platform, International Conference on Computational Science 2015, ICCS 2015, Computational Science at the Gates of Nature Edited By Slawomir Koziel, Leifur Leifsson, Michael Lees, Valeria V. Krzhizhanovskaya, Jack Dongarra and Peter M.A. Sloot. doi:10.1016/j.procs.2015.05.222 A pdf version is available.

Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product, H. Anzt, S. Tomov, and J. Dongarra, In Spring Simulation Multi-Conference 2015 (SpringSim15), 2015. A pdf version is available.

Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architecture, Khairul Kabir, Azzam Haidar, Stanimire Tomov, Jack Dongarra. Best Paper Award at 2015 Spring Simulation Multiconference, 23rd High Performance Computing Symposium (HPC 2015). A pdf version is available.

Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers, Hartwig Anzt, Stan Tomov, and Jack Dongarra, PMAM '15 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, ACM New York, NY, USA 2015, doi:10.1145/2712386.2712387 A pdf version is available.

Towards Batched Linear Solvers on Accelerated Hardware Platforms, Azzam Haidar, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, February 7-11, 2015. 10.1145/2688500.2688534 A pdf version is available.

Optimization for Performance and Energy for Batched Matrix Computations on GPUs, Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, 8th Workshop on General Purpose Processing Using GPUs, (GPGPU 8), San Francisco, February 7, 2015. 10.1145/2716282.2716288 A pdf version is available.

Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, pp 941-949, DOI: 10.1109/IPDPSW.2014.107 A pdf version is available.

Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Hartwig Anzt, Blake Haugen, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in Concurrency and Computing: Practice and Experience, March 2015. DOI: 10.1002/cpe.3516 A pdf version is available.

Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, accepted in Journal on Parallel and Distributed Computing, March 2015. http://dx.doi.org/10.1016/j.jpdc.201 A pdf version is available.

Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPUS with Multiple GPUs, I. Yamazaki, S. Tomov, and J. Dongarra, SIAM J. Sci. Comput., Volume 37, Issue 3, DOI:10.1137/14M0973773 A pdf version is available.

Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, To appear in Parallel Computing. A pdf version is available.

A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination, Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, Ichitaro Yamazaki, Submitted to Concurrency and Computation: Practice and Experience, Volume 27, Issue 5, pages 1292-1309, April 2015.
DOI: 10.1002/cpe.3306 A pdf version is available.

Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations, Ichitaro Yamazaki, Stanimire Tomov and Jack Dongarra, Scientific Programming, vol. 2015, Article ID 246019, 17 pages, 2015. http://dx.doi.org/10.1155/2015/246019. A pdf version is available.

Acceleration of GPU-based Krylov Solvers via Data Transfer Reduction, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, William Sawyer and Jack Dongarra, The International Journal of High Performance Computing Applications, accepted April 2015, http://dx.doi.org/10.1177/1094342015580139. A pdf version is available.

Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy, Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra, ACM Transactions on Parallel Computing, Volume 1 Issue 2, January 2015, http://dx.doi.org/10.1145/2686892. A pdf version is available.

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, and Stanimire Tomov, Scientific Programming, Volume 2015 (2015), Article ID 502593, 11 pages http://dx.doi.org/10.1155/2015/502593. A pdf version is available.

Batched Matrix Computations on Hardware Accelerators Based on GPUs, Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, The International Journal of High Performance Computing Applications, May 2015 29: 193-208, first published on February 9, 2015, http://dx.doi.org/1177/1094342014567546. A pdf version is available.

Composing Resilience Techniques: ABFT, Periodic and Incremental Checkpointing, George Bosilca, Aurelien Bouteiller, Thomas Herault, Yves Robert, and Jack Dongarra, International Journal of Networking and Computing, Volume 5, Number 1, pages 2-25, January 2015. A pdf version is available.

Exascale Computing and Big Data: The Next Frontier, Daniel A. Reed and Jack Dongarra, accepted in Communications of the ACM, Vol. 58 No. 7, Pages 56-68, DOI: 10.1145/2699414. A pdf version is available.

- 2014 -

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, Concurrency and Computation: Practice and Experience, Volume 26, Issue 17, pp. 2772-2791, 10 December 2014, DOI: 10.1002/cpe.3173. A pdf version is available.

Performance of Various Computers Using Standard Linear Equations Software, (Linpack Benchmark Report), Jack J. Dongarra, University of Tennessee Computer Science Technical Report, CS-89-85, 2014. A postscript version is available.

Parallel Simulation of Superscalar Scheduling, Blake Haugen, Piotr Luszczek, Jakub Kurzak, Asim YarKhan, and Jack Dongarra, ICPP'14: International Conference on Parallel Processing, Minneapolis, MN, 2014, DOI: 10.1109/ICPP.2014.21 A pdf version is available.

Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors, Azzam Haidar, Chongxiao Cao, Ichitaro Yamazaki, Jack Dongarra, Mark Gates, Piotr Luszczek, and Stan Tomov, ScalA 2014, ACM, New Orleans, LA, November 17, 2014, DOI: 10.1109/ScalA.2014.8 A pdf version is available.

Access-averse Framework for Computing Low-rank Matrix Approximations, Ichitaro Yamazaki, Theo Mary, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra, First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining (in Conjunction with IEEE BigData'14), October 27, 2014, Bethesda, MD, Pages: 70-77, DOI: 10.1109/BigData.2014.7004374 A pdf version is available.

PTG: An Abstraction for Unhindered Parallelism, Anthony Danalis, George Bosilca, Aurelien Bouteiller, Thomas Herault, and Jack Dongarra, WOLFHPC '14 Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing Pages 21-30, SC14 Workshop, New Orleans, LA, November 17, 2014, DOI:10.1109/WOLFHPC.2014.8 A pdf version is available.

Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, ScalA2014, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), New Orleans, LA, November 17, 2014. DOI:10.1109/ScalA.2014.6 A pdf version is available.

Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models, McCraw, Heike, Ralph, James, Danalis, Anthony, Dongarra, Jack, Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA 2014), IEEE Cluster 2014, IEEE, Madrid, Spain, September, 2014. DOI: 10.1109/CLUSTER.2014.6968672 A pdf version is available.

LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU, Tingxing Dong, Azzam Haidar, Piotr Luszczek, James Austin Harris, Stanimire Tomov, and Jack Dongarra, High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), Paris, France, 2014, DOI:10.1109/HPCC.2014.30 A pdf version is available.

A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU, Tingxing Dong, Veselin Dobrev, Tzanio Kolev, Robert Rieben, Stanimire Tomov, and Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium, 2014, DOI: 10.1109/IPDPS.2014.103 A pdf version is available.

clMAGMA: High Performance Dense Linear Algebra with OpenCL, Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, IWOCL '14, May 12-13, 2014, Bristol, United Kingdom. A pdf version is available.

A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems, Fengguang Song and Jack Dongarra, accepted in Concurrency and Computation: Practice and Experience, August 2014. DOI: 10.1002/cpe.3403 A pdf version is available.

DOE: Assessment of Workforce Development Needs in Office of Science Research Disciplines, DOE ASCAC Subcommittee Report, B. Chapman, et al., July 2014. A pdf version is available.

Top Ten Exascale Research Challenges, DOE ASCAC Subcommittee Report, 2014, R. Lucas, et al. A pdf version is available.

Applied Mathematics Research for Exascale Computing, Jack Dongarra (co-chair, Oak Ridge National Laboratory) and Jeffrey Hittinger (co-chair, Lawrence Livermore National Laboratory), et al., DOE Report for the Office of Science, Advanced Scientific Computing Research, 2014. A pdf version is available.

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, accepted in Concurrency and Computation: Practice and Experience, Volume 26, Issue 17, pages 2772-2791, 10 December 2014, DOI: 10.1002/cpe.3173. A pdf version is available.

Accelerating Numerical Dense Linear Algebra Calculations with GPUs, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki, pp. 3-28, in Numerical Computations with GPUs, edited by Volodymyr Kindratenko, Springer, 2014, DOI:10.1007/978-3-319-06548-9_1. A pdf version is available.

Looking Back at Dense Linear Algebra Software, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, Journal of Parallel and Distributed Computing, pp 2548-2560, 2014.
http://dx.doi.org/10.1016/j.jpdc.2013.10.005 A pdf version is available.

A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcà, Thomas Schulthess, International Journal of High Performance Computing Applications, volume 28, number 2 pp 196-209, 2014. DOI: 10.1177/1094342013502097 A pdf version is available.

Update Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization, J. Dongarra, M. Faverge, P. Luszczek, Concurrency and Computation: Practice and Experience, Volume 26, Issue 7, pp 1408-1431, DOI: 10.1002/cpe.3110, 2014. A pdf version is available.

Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems, Jack Dongarra, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Asim YarKhan, Supercomputing Frontiers and Innovations, volume 1, number 1, 2014. A pdf version is available.

Performance and Reliability Trade-offs for the Double Checkpointing Algorithm, Jack Dongarra, Thomas Herault and Yves Robert, The International Journal of Networking and Computing, Vol 4 No 1, p. 23-41, 2014. A pdf version is available.

An Efficient Distributed Randomized Algorithm For Solving Large Dense Symmetric Indefinite Linear Systems, Marc Baboulin, Dulceneia Becker, George Bosilca, Anthony Danalis, and Jack Dongarra, Parallel Computing, Volume 40 Issue 7, July 2014, pp 213-223. DOI: 10.1016/j.parco.2013.12.003 A pdf version is available.

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, and Stanimire Tomov, Volume 2015 (2015), Article ID 502593, Scientific Programming. DOI: 10.1155/2015/502593 A pdf version is available.

Exascale Computing and Big Data: The Next Frontier, Daniel A. Reed and Jack Dongarra, DOI: 10.1145/2699414, Communications of the ACM, Vol. 58 No.
7, Pages 56-68, July 2015. A pdf version is available.

Communication-Avoiding Symmetric-Indefinite Factorization, G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki, DOI:10.1137/130929060, SIAM J. Matrix Anal. Appl. 35(4): 1364-1460 (2014). A pdf version is available.

Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy, Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra, DOI: 10.1145/2686892, ACM Transactions on Parallel Computing, Volume 1 Issue 2, January 2015. A pdf version is available.

Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results, Julien Herrmann, George Bosilca, Thomas Herault, Loris Marchal, Yves Robert, Jack Dongarra, submitted to Parallel Computing May 2014. A pdf version is available.

Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, submitted to International Journal of High Performance Computing Applications 2014. A pdf version is available.

A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems, Fengguang Song and Jack Dongarra, DOI: 10.1002/cpe.3403, Concurrency and Computation: Practice and Experience, October 2014. A pdf version is available.

LAPACK, CRC Handbook on Linear Algebra, Second Edition, Zhaojun Bai, James Demmel, Jack Dongarra, Julien Langou, and Jenny Wang, Editor Leslie Hogben, CRC Press, ISBN 9781466507289, 2014. A pdf version is available.

Accelerating Numerical Dense Linear Algebra Calculations with GPUs, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki, to appear in Numerical Computations with GPUs, edited by Volodymyr Kindratenko, Springer, 2014. A pdf version is available.

Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems, M. Baboulin and J. Dongarra and R. Lacroix, Proceedings for the Applied Mathematics, Modeling and Computational Science (AMMCS) conference, Vol. 117 (2015). A pdf version is available.

New Multi-Stage Algorithm for Symmetric Eigenvalues and Eigenvectors Achieves Two-Fold Speedup, A. Haidar, P. Luszczek, J. Dongarra, Best Paper Award, Workshop on Parallel and Distributed Scientific and Engineering Computing, Phoenix, AZ, May, 2014. A pdf version is available.

Designing LU-QR Hybrid Solvers for Performance and Stability, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium, 2014. A pdf version is available.

Redesigning A Hydrodynamic Application on CPU-GPU, Tingxing Dong, Veselin Dobrev, Tzanio Kolev, Robert Rieben, Stanimire Tomov, Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available.

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs, I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available.

Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment, A. Haidar, C. Cao, J. Dongarra, P. Luszczek, S. Tomov, A. YarKhan, K. Kabir, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available.

Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for CA-GMRES on GPUs, Best Paper Award, Ichitaro Yamazaki, Stanimire Tomov, Tingxing Dong and Jack Dongarra, VECPAR 2014, June 30 - July 3, 2014, Eugene, Oregon. A pdf version is available.

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem, Mark Gates, Azzam Haidar and Jack Dongarra, VECPAR 2014, June 30 - July 3, 2014, Eugene, Oregon.
A pdf version is available.

Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures, Hartwig Anzt, Dimitar Lukarski, Stan Tomov and Jack Dongarra, VECPAR 2014, June 30 - July 3, 2014, Eugene, Oregon. A pdf version is available.

Hybrid Multi-Elimination ILU Preconditioners on GPUs, Dimitar Lukarski, Hartwig Anzt, Stanimire Tomov, and Jack Dongarra, 23rd Heterogeneity in Computing Workshop (HCW 2014), in Proc. of IPDPS 2014, Phoenix, Arizona, May 19-23, 2014. A pdf version is available.

Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 19, 2014, Phoenix, AZ, part of IPDPS Conference. A pdf version is available.

MIAMI: A Framework for Application Performance Diagnosis, G. Marin, J. Dongarra, and D. Terpstra, ISPASS-2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software, March 23-25, 2014, Hyatt Regency Hotel in Monterey, CA. A pdf version is available.

Assessing the Impact of ABFT and Checkpoint Composite Strategies, Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., Dongarra, J. IPDPSW, APDCM 2014, Phoenix, AZ, May, 2014. A pdf version is available.

Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs, Simplice Donfack, Stanimire Tomov and Jack Dongarra, Fourth International Workshop on Accelerators and Hybrid Exascale Systems, May 19, 2014. A pdf version is available.

Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Jack Dongarra, Parallel Processing Letters, Volume 24, Number 4, December 2014, doi: 10.1142/S0129626414420043. A pdf version is available.

Scaling Up Matrix Computations on Shared-Memory Manycore Systems with 1000 CPU Cores, Fengguang Song and Jack Dongarra, Proceeding ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, pp 333-342, ACM New York, NY, USA, ISBN: 978-1-4503-2642-1, doi: 10.1145/2597652.2597670 A pdf version is available.

Heterogenous Acceleration for Linear Algebra in Multi-Coprocessor Environments, Azzam Haidar, Piotr Luszczek, Stanimire Tomov and Jack Dongarra, VECPAR 2014, June 30 - July 3, 2014, Eugene, Oregon, accepted March 2014. A pdf version is available.

A Fast Batched Cholesky Factorization on a GPU, Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra, 43rd International Conference on Parallel Processing (ICPP-2014), Minneapolis, USA, September 9-12, 2014. A pdf version is available.

clMAGMA: High Performance Dense Linear Algebra with OpenCL, Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, The International Workshop on OpenCL, Bristol University, England, May 12-13, 2014. A pdf version is available.

Utilizing Dataflow-based Execution for Coupled Cluster Methods, Heike McCraw, Anthony Danalis, Thomas Herault, George Bosilca, Jack Dongarra, Karol Kowalski, Theresa L. Windus, Poster at Clusters 2014. A pdf version is available.

- 2013 -

Trip Report to Changsha and the Tianhe-2 Supercomputer, J. Dongarra, June 3, 2013. A pdf version is available.

Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J. Dongarra, Concurrency and Computing: Practice and Experience, Volume 25, Issue 17, pp. 2381-2393, DOI: 10.1002/cpe.3100. A pdf version is available.

Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J.
Dongarra, Concurrency and Computing: Practice and Experience, Volume 25, Issue 17, pages 2381-2393, 2013, DOI: 10.1002/cpe.3100. A pdf version is available.

Toward a New Metric for Ranking High Performance Computing Systems, M. Heroux and J. Dongarra, UTK EECS Tech Report and Sandia National Labs Report SAND2013-4744, June 2013. A pdf version is available.

A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcà, Thomas Schulthess, International Journal of High Performance Computing Applications, accepted July 2013. A pdf version is available.

PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, Jack J. Dongarra, accepted in IEEE Computing in Science and Engineering, September 2013. A pdf version is available.

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, accepted in Concurrency and Computation: Practice and Experience, October 2013. A pdf version is available.

Tridiagonalization of a Dense Symmetric Matrix On Multiple GPUs and Its Application to Symmetric Eigenvalue Problems, Ichitaro Yamazaki, Tingxing Dong, Raffaele Solcà, Stanimire Tomov, Jack Dongarra, Thomas Schulthess, Concurrency and Computation: Practice and Experience, published online, October 2013, DOI: 10.1002/cpe.3152 A pdf version is available.

Post-Failure Recovery of MPI Communication Capability: Design and Rationale, Wesley Bland, Aurelien Bouteiller, Thomas Herault, George Bosilca and Jack J. Dongarra, International Journal of High Performance Computing Applications, Volume 27, Issue 3, Fall 2013, pp 244-254, DOI: 10.1177/1094342013488238.
A pdf version is available.

Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices, Azzam Haidar, Hatem Ltaief, and Jack Dongarra, SIAM SISC, Vol. 34, No. 6, pp. C249-C274. A pdf version is available.

Accelerating Linear System Solutions Using Randomization Techniques, Marc Baboulin, Jack Dongarra, Julien Herrmann, and Stanimire Tomov, ACM TOMS, Vol. 39, No 2 (2013). A pdf version is available.

Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms, Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, J. Herrero, and J. Langou, ACM Transactions on Mathematical Software (TOMS), Vol. 39, No 2 (2013). A pdf version is available.

High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures, H. Ltaief, P. Luszczek, and J. Dongarra, ACM Transactions on Mathematical Software, Volume 39, Issue 3, April 2013. A pdf version is available.

An Evaluation of User-Level Failure Mitigation support in MPI, Aurelien Bouteiller, Wesley Bland, Thomas Herault, Joshua Hursey, George Bosilca and Jack Dongarra, Recent Advances in the Message Passing Interface, Lecture Notes in Computer Science Volume 7490, 2012, pp 193-203, ISSN: 0010-485X, April 2013. A pdf version is available.

Kernel-Assisted and Topology-Aware Collective Communications on Multi-core/Many-core Platforms, Teng Ma, George Bosilca, Aurelien Bouteiller, Jack Dongarra, Journal of Parallel and Distributed Computing, Volume 73, Issue 7, pp. 1000-1010, July 2013. (Best paper award IPDPS 2013 Conference) A pdf version is available.

BlackjackBench: Portable Hardware Characterization with Automated Results Analysis, Anthony Danalis, Piotr Luszczek, Gabriel Marin, Jeffrey S. Vetter and Jack Dongarra, Computer Journal, 2013; doi: 10.1093/comjnl/bxt057. A pdf version is available.
Enabling Workflows in GridSolve: Request Sequencing and Service Trading, Yinan Li, Asim YarKhan, Jack Dongarra, Keith Seymour, and Aurélie Hurault, The Journal of Supercomputing, June 2013, Volume 64, Issue 3, pp 1133-1152. A pdf version is available. Correlated Set Coordination in Fault Tolerant Message Logging Protocols, A. Bouteiller, T. Herault, G. Bosilca, J. Dongarra, Concurrency and Computation: Practice and Experience, Volume 25, Issue 4, pages 572-585, 2013. A pdf version is available. LU Factorization with Partial Pivoting for a Multicore System with Accelerators, J. Kurzak, P. Luszczek, and J. Dongarra, IEEE Transactions on Parallel and Distributed Computing, August 2013 (vol. 24 no. 8), pp. 1613-1621. A pdf version is available. Soft Error Resilient QR Factorization for Hybrid System with GPGPU, P. Du, P. Luszczek, S. Tomov, and J. Dongarra, accepted in Journal of Computational Science, January 2013. A pdf version is available. Hierarchical QR factorization algorithms for multi-core cluster systems, Jack Dongarra, Mathieu Faverge, Thomas Herault, Mathias Jacquelin, Julien Langou, Yves Robert, Parallel Computing, Volume 39, Issues 4-5, April-May 2013, Pages 212-232. A pdf version is available. A Block-Asynchronous Relaxation Method for Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Jack Dongarra, Vincent Heuveline, Journal of Parallel and Distributed Computing, Online June 6, 2013, http://dx.doi.org/10.1016/j.bbr.2011.03.031 A pdf version is available. Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, Jack J. Dongarra, accepted in Concurrency and Computation: Practice and Experience, June 2013. A pdf version is available. Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization, J. Dongarra, M. Faverge, P. 
Luszczek, accepted in Concurrency and Computation: Practice and Experience, July 2013. A pdf version is available. Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators, A. Abdelfattah, J. Dongarra, D. Keyes, and H. Ltaief, 10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012), Lecture Notes in Computer Science 7851, pp 72-79, 2013. A pdf version is available. Programming the LU Factorization for a Multicore System with Accelerators, Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, and Jack Dongarra, 10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012), Lecture Notes in Computer Science 7851, pp 28-35, 2013. A pdf version is available. Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Piotr Luszczek, and Jack J. Dongarra, in the book Scalable Computing and Communications: Theory and Practice, edited by Samee U. Khan, Lizhe Wang, and Albert Y. Zomaya, Publisher John Wiley & Sons, ISBN: 978-1-1181-6265-1, 2013. A pdf version is available. Keeneland: Computational Science Using Heterogeneous GPU Computing, J. Vetter, R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Meredith, P. Roth, K. Spafford, S. Tomov, J. Wynkoop, Ed. Jeffrey S. Vetter, Contemporary High Performance Computing: From Petascale Toward Exascale, Taylor and Francis, Boca Raton, CRC Computational Science Series, 2013. A pdf version is available. HPC Challenge: Design, History, and Implementation Highlights, J. Dongarra and P. Luszczek, Ed. Jeffrey S. Vetter, Contemporary High Performance Computing: From Petascale Toward Exascale, Taylor and Francis, Boca Raton, CRC Computational Science Series, 2013, ISBN: 978-1-4665-6834-1. A pdf version is available.
Multithreading in the PLASMA Library, Jakub Kurzak, Piotr Luszczek, Asim YarKhan, Mathieu Faverge, Julien Langou, Henricus Bouwmeester, and Jack Dongarra in Multi- and Many-Core Processing: Architecture, Programming, Algorithms, & Applications, Edited by Mohamed Ahmed, Reda A. Ammar, Sanguthevar Rajasekaran Series: Chapman & Hall/CRC Computer & Information Science Series, published by Taylor & Francis, 2013. A pdf version is available. Looking Back at Dense Linear Algebra Software, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, submitted to Journal of Parallel and Distributed Computing, August 2013. A pdf version is available. Scalable Dense Linear Algebra on Heterogeneous Hardware, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Jakub Kurzak, Piotr Luszczek, Stan Tomov, Jack Dongarra, to appear in the book HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, IOS Press. A pdf version is available. LAPACK, CRC Handbook on Linear Algebra, Second Edition, Zhaojun Bai, James Demmel, Jack Dongarra, Julien Langou, and Jenny Wang, Editor Leslie Hogben, CRC Press, to appear 2013. A pdf version is available. Revisiting the Double Checkpointing Algorithm, Jack Dongarra, Thomas Herault and Yves Robert, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, Boston MA, January 2013. A pdf version is available. Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures, Ichitaro Yamazaki, Dulceneia Becker, Jack Dongarra, Alex Druinsky, Inon Peled, and Sivan Toledo, Grey Ballard, James Demmel, and Oded Schwartz, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, (Best Paper Award), Boston MA, January 2013. A pdf version is available.
Virtual Systolic Array for QR Decomposition, Jakub Kurzak, Piotr Luszczek, Mark Gates, Ichitaro Yamazaki, and Jack Dongarra, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, Boston MA, January 2013. A pdf version is available. clMAGMA: High Performance Dense Linear Algebra with OpenCL, C. Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, International Workshop on OpenCL (IWOCL), GATech, May 13-14, 2013. A pdf version is available. A Parallel Solver for Incompressible Fluid Flows, Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y. Fraigneau, and O. Le Maitre, International Conference on Computational Science, ICCS 2013, Barcelona, Spain, May 2013. A pdf version is available. Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations, Azzam Haidar, Raffaele Solca, Mark Gates, Stanimire Tomov, Thomas Schulthess, and Jack Dongarra, International Supercomputing Conference ISC, Germany, Lecture Notes in Computer Science, Volume 7905, 2013, pp 67-80. A pdf version is available. Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q, Dan Terpstra, Kris Davis, Heike McCraw, Jack Dongarra, International Supercomputing Conference ISC, Germany, Lecture Notes in Computer Science, Volume 7905, 2013, pp 213-225. A pdf version is available. Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication, Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra, ICS '13 Proceedings of the 27th international ACM conference on International conference on supercomputing, Pages 223-232, ACM New York, NY, USA, June 2013, Eugene Oregon. A pdf version is available.
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek and Stan Tomov, To appear in the PPAM Conference 2013, Warsaw, Poland, September 2013. A pdf version is available. Standards for Graph Algorithm Primitives, Tim Mattson et al., to appear HPEC 2013, Boston, September 10, 2013. A pdf version is available. Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC, Guillaume Aupy, Mathieu Faverge, Yves Robert, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in the 6th Workshop on Productivity and Performance held in conjunction with Euro-Par 2013, Aachen, Germany August 26 or 27, 2013. A pdf version is available. Parallel Reduction to Hessenberg Form with Algorithm-based Fault Tolerance, Yulu Jia, George Bosilca, Piotr Luszczek, and Jack J. Dongarra, accepted in SC2013, July 2013. A pdf version is available. - 2012 - Autotuning GEMMs for Fermi, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012, pp 2045-2057. A pdf version is available. Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture, Jack Dongarra, Hatem Ltaief, Piotr Luszczek, and Vince M. Weaver, The 2nd International Conference on Cloud and Green Computing (CGC 2012), pp 274-281, ISBN: 978-1-4673-3027-5, November 1-3, 2012, Xiangtan, Hunan, China. A pdf version is available. A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Raffaele Solcà, Azzam Haidar, Stanimire Tomov, Jack Dongarra, and Thomas C. Schulthess, Proceeding SC '12 Proceedings of the 2012, High Performance Computing, Networking Storage and Analysis, Pages 1338-1339 IEEE Computer Society Washington, DC, USA. A pdf version is available.
Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures, A. Haidar, H. Ltaief, A. YarKhan, J. Dongarra, Concurrency and Computation: Practice and Experience, Volume 24, Issue 3, pages 305-321, 10 March 2012. A pdf version is available. From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming, P. Du, R. Weber, P. Luszczek, S. Tomov, G. Peterson, and J. Dongarra, Parallel Computing, Volume 38, Issue 8, August 2012, pp. 391-407. A pdf version is available. High-performance computing systems: Status and Outlook, Jack Dongarra and A. J. van der Steen, Acta Numerica (2012), pp. 1-96. A pdf version is available. An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs, Jakub Kurzak, Rajib Nath, Peng Du, and Jack Dongarra, in Applied Parallel and Scientific Computing, PARA 2010, Editor Kristjan Jonasson, Springer, LNCS, Volume 7133, pp 248-257, 2012. A pdf version is available. DAGuE: A generic distributed DAG engine for high performance computing, G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, J. Dongarra, Parallel Computing, Volume 38, Issue 1-2, pp. 37-51, 2012. A pdf version is available. Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems, Christof Vömel, Stanimire Tomov, and Jack Dongarra, SIAM J. Sci. Comput. Volume 34, pp. C70-C82, 2012. A pdf version is available. A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction, A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012. A pdf version is available.
A Tiled Parallel Solver For Symmetric Indefinite Systems On Multicore Architectures, Marc Baboulin, D. Becker, and J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012. A pdf version is available. Algorithm-Based Fault Tolerance for Dense Matrix Factorization, Peng Du, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 25-29, 2012, New Orleans, LA. A pdf version is available. Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems, Hartwig Anzt, Stan Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline, Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Volume 9, 2012, Pages 7-16. A pdf version is available. From Serial Loops to Parallel Execution on Distributed Systems, Anthony Danalis, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, submitted to PPoPP 2012. A pdf version is available. HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters (Best Paper), Teng Ma, G. Bosilca, A. Bouteiller, J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012. A pdf version is available. Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems, Hartwig Anzt, Jack Dongarra, and Vincent Heuveline, submitted to SIAM Journal on Computing March 2012. A pdf version is available. Dense Linear Algebra on Accelerated Multicore Hardware, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, in High Performance Scientific Computing: Algorithms and Applications, Editors Michael W. Berry, Kyle A. Gallivan, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Yousef Saad and Faisal Saied, Springer, 2012. A pdf version is available.
Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction, H. Ltaief, P. Luszczek, and J. Dongarra, in Lecture Notes in Computer Science, Volume 7203, 2012, Parallel Processing and Applied Mathematics 9th International Conference, PPAM 2011, Torun, Poland, September 11-14, 2011, Part I, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski and Jerzy Wasniewski, pp 661-670, 2012. A pdf version is available. Reducing the Amount of Pivoting in Symmetric Indefinite Systems, D. Becker, M. Baboulin, J. Dongarra, in Lecture Notes in Computer Science, Volume 7203, 2012, Parallel Processing and Applied Mathematics 9th International Conference, PPAM 2011, Torun, Poland, September 11-14, 2011, Part I, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski and Jerzy Wasniewski, pp 133-142, 2012. A pdf version is available. Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems, Hartwig Anzt, Stan Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline, International Conference on Computational Science (ICCS) 2012, May 2012, Omaha NE. A pdf version is available. One-sided dense matrix factorizations on a multicore with multiple GPU accelerators in MAGMA, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available. A Class of Communication-Avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, Marc Baboulin, Simplice Donfack, Jack Dongarra, Laura Grigori, Adrien Rémy, Stanimire Tomov, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available. High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors, P. Du, P. Luszczek, and J. Dongarra, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available.
Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems, Fengguang Song and Jack Dongarra, ICS 2012 Conference, 26th International Conference on Supercomputing, 25-29 June 2012, San Servolo Island, Venice, Italy. A pdf version is available. A Scalable Framework for Heterogeneous GPU-Based Clusters, F. Song and J. Dongarra, ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12), Pittsburgh, USA, January 2012. A pdf version is available. A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J. Dongarra, Euro-Par 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 477-488 as a distinguished paper. A pdf version is available. From Serial Loops to Parallel Execution on Distributed Systems, Anthony Danalis, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, Euro-Par 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 246-257. A pdf version is available. Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems, George Bosilca, Jack Dongarra, and Hatem Ltaief, accepted at EnA-HPC 2012: Third International Conference on Energy-Aware High Performance Computing, September 12-14, 2012. A pdf version is available. Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture, Jack Dongarra, Hatem Ltaief, Piotr Luszczek, and Vince M. Weaver, submitted to The 2nd International Conference on Cloud and Green Computing (CGC 2012) November 1-3, 2012, Xiangtan, Hunan, China. A pdf version is available. Anatomy of a Globally Recursive Embedded LINPACK Benchmark, Piotr Luszczek and Jack Dongarra, accepted in 2012 IEEE High Performance Extreme Computing Conference, Waltham, Massachusetts, September 2012. 
A pdf version is available. Weights for Block-Asynchronous Iteration on GPU-Accelerated Systems, Hartwig Anzt, Stanimire Tomov, Jack Dongarra, and Vincent Heuveline, To appear in the 10th HeteroPar'2012 (Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms), Rhodes Island, Greece, August 2012. A pdf version is available. GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement, H. Anzt, P. Luszczek, J. Dongarra, V. Heuveline, Euro-Par 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 908-919, Rhodes Island, Greece, August 2012. A pdf version is available. - 2011 - High-Performance High-Resolution Semi-Lagrangian Tracer Transport on a Sphere, T. White and J. Dongarra, Journal of Computational Physics, Volume 230 Issue 17, July, 2011, pp 6778-6799. A pdf version is available. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, M. Horton, S. Tomov, and J. Dongarra, to appear 2011 Symposium on Application Accelerators in High Performance Computing, 19-21 July, 2011, Knoxville TN. A pdf version is available. Algorithm-based Fault Tolerance Method for Soft Error Resilience in High-Performance Linpack, Peng Du, Piotr Luszczek, and Jack Dongarra, IEEE Cluster 2011, September 26-30, Austin, TX. A pdf version is available. Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures, Azzam Haidar, Hatem Ltaief, Asim YarKhan and Jack Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. BLAS for GPUs, R. Nath, S. Tomov, and J. Dongarra, pp 57-80, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-2536-5, 2011. A pdf version is available.
Changes in Dense Linear Algebra Kernels, Decades-long perspective, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, pp 313-342, in Solving the Schrödinger equation: has everything been tried? Editor Paul Popelier, Imperial College Press, 2011, ISBN-13 978-1-84816-724-7. A pdf version is available. DAGuE: A generic distributed DAG engine for high performance computing, G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, J. Dongarra, Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pp. 1151-1158, 16-20 May 2011, ISSN: 1530-2075. A pdf version is available. Dense Linear Algebra for Hybrid GPU-Based Systems, S. Tomov and J. Dongarra, pp 37-56, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-2536-5, 2011. A pdf version is available. Evaluation of the HPC Challenge Benchmarks in Virtualized Environments, P. Luszczek, E. Meek, S. Moore, D. Terpstra, J. Dongarra, 6th Workshop on Virtualization in High-Performance Cloud Computing (VHPC '11) as part of Euro-Par 2011, Bordeaux, France. A pdf version is available. Exploiting Fine-Grain Parallelism in Recursive LU Factorization, Jack Dongarra, Mathieu Faverge, Hatem Ltaief, Piotr Luszczek, International Conference on Parallel Computing, 30 August - 2 September 2011, Ghent, Belgium. A pdf version is available. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Herault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, Jack Dongarra, 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-11), May 16-20, 2011, Anchorage, Alaska, USA. A pdf version is available.
Fully Empirical Autotuned Dense QR Factorization For Multicore Architectures, E. Agullo, J. Dongarra, R. Nath, S. Tomov, EuroPar 2011. A pdf version is available. High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures, J. Dongarra, M. Faverge, H. Ltaief, P. Luszczek, 4th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS) 2011, Co-located with Supercomputing/SC 2011, Seattle Washington, November 14th, 2011. A pdf version is available. Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW, T. Ma, A. Bouteiller, G. Bosilca, J. Dongarra, EuroMPI-2011, September 19-21, 2011, Santorini Greece. A pdf version is available. Implementing Matrix Factorization on the Cell B.E., J. Kurzak, and J. Dongarra, pp. 21-35, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-2536-5, 2011. A pdf version is available. Implementing Matrix Multiplication on the Cell B.E., W. Alvaro, J. Kurzak, and J. Dongarra, pp 3-20, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-2536-5, 2011. A pdf version is available. Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI, Volodymyr Turchenko, Lucio Grandinetti, George Bosilca and Jack J. Dongarra, International Conference on Computational Science, ICCS 2010, Amsterdam The Netherlands, June 2010. A pdf version is available. Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community, J.S. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. 
Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili, IEEE Computing in Science and Engineering, 13(5):90-5, 2011, ISSN: 1521-9615. A pdf version is available. Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms, Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, J. Herrero, and J. Langou, accepted in ACM TOMS, June 2011. A pdf version is available. LU Factorization for Accelerator-based Systems, Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Stanimire Tomov, The 9th ACS/IEEE International Conference on Computer Systems and Applications AICCSA 2011, June 27th - June 30th 2011, Sharm El-Sheikh, Egypt. A pdf version is available. Multithreading in the PLASMA Library, Jakub Kurzak, Piotr Luszczek, Asim YarKhan, Mathieu Faverge, Julien Langou, Henricus Bouwmeester, and Jack Dongarra in Multi- and Many-Core Technologies: Architecture, Programming, Algorithms, & Applications, published by Taylor & Francis, 2011. A pdf version is available. OMPIO: A Modular Software Architecture for MPI I/O, Mohamad Chaarawi, Edgar Gabriel, Rainer Keller, Richard Graham, George Bosilca and Jack Dongarra, EuroMPI-2011, September 19-21, 2011, Santorini Greece. A pdf version is available. On Scalability for MPI Runtime Systems, George Bosilca, Thomas Herault, Ala Rezmerita and Jack Dongarra, The International Workshop on Runtime and Operating Systems for Supercomputers, May 31, 2011. A pdf version is available. Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs, Jakub Kurzak, Jack Dongarra, and Rajib Nath, IEEE/ACM SC11 Conference, Seattle WA, November 2011. A pdf version is available. Overlapping Computation and Communication for Advection on Hybrid Parallel Computers, J. White and J. Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available.
Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels, Hatem Ltaief, Azzam Haidar, and Jack Dongarra, IEEE/ACM SC11 Conference, Seattle WA, November 2011. A pdf version is available. Performance Portability of a GPU Enabled Factorization with the DAGuE Framework, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, Pierre Lemarinier, Stanimire Tomov and Narapat Ohm Saengpatsa, IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 24, 2011. A pdf version is available. Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency, Hatem Ltaief, Piotr Luszczek and Jack Dongarra, the International Conference on Energy-Aware High Performance Computing, September 07-09, 2011, Hamburg, Germany. A pdf version is available. QCG-OMPI: MPI Applications on Grids, Emmanuel Agullo, Camille Coti, Thomas Herault, Julien Langou, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello, Jack Dongarra, Future Generation Computer Systems, Volume 27, Issue 4, pp 357-369, April 2011. A pdf version is available. QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment, Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, and Julien Langou, UT-CS-10-651, January 6, 2010. A pdf version is available. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. Recent Advances in the Message Passing Interface: 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011, Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack Dongarra (Eds.) Springer, LNCS, Volume 6960, 2011, ISSN 0302-9743, ISBN 978-3-642-24448-3.
Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution, and Inverse. Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, and J. Langou, ACM TOMS, Volume 37, Number 2, 2011, pp. 18-1:18-21, 2011, ISSN 0098-3500. A pdf version is available. Reducing the Amount of Pivoting in Symmetric Indefinite Systems, D. Becker, M. Baboulin, J. Dongarra, to appear PPAM, October 2011. A pdf version is available. Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure, G. Bosilca, T. Herault, P. Lemarinier, A. Rezmerita, and J. Dongarra, EuroMPI-2011, September 19-21, 2011, Santorini Greece. A pdf version is available. Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-2536-5, 2011. Soft Error Resilient QR Factorization for Hybrid System with GPGPU, P. Du, P. Luszczek, S. Tomov, and J. Dongarra, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA) held in conjunction with the 24th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2011, November 14, 2011, Seattle, WA, USA. A pdf version is available. Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures, Hatem Ltaief, Piotr Luszczek, and Jack Dongarra, International Conference on Parallel Computing, 30 August - 2 September 2011, Ghent, Belgium. A pdf version is available. The International Exascale Software Roadmap, J. Dongarra, P. Beckman, et al., International Journal of High Performance Computing, Volume 25, Number 1, pp. 3-60, 2011, ISSN 1094-3420. A pdf version is available. Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices, Azzam Haidar, Hatem Ltaief, and Jack Dongarra, submitted to SIAM SISC, February 2011. A pdf version is available.
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures, Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., and Rosenberg, L., In Proceedings of the 9th International Meeting on High Performance Computing for Computational Science, VECPAR'10, Berkeley, CA, June 22-25 2011. A pdf version is available. Trace-based Performance Analysis for the Petascale Simulation Code FLASH, Heike Jagode, Jack Dongarra, Andreas Knupfer, Matthias Jurenz, Matthias S. Muller, and Wolfgang E. Nagel, International Journal of High Performance Computing, Volume 25, Number 4, Winter 2011, pp. 428-439, ISSN 1094-3420. A pdf version is available. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures, Piotr Luszczek, Hatem Ltaief, and Jack Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. - 2010 - Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing, S. Tomov, R. Nath, and J. Dongarra, Parallel Computing, Volume 36, Number 12, 2010, pp. 645-654. A pdf version is available. An Improved MAGMA GEMM for Fermi GPUs, Rajib Nath, Stanimire Tomov, and Jack Dongarra, International Journal of High Performance Computing Applications, Volume 24, number 4, 2010, pp 511-515, ISSN 1094-3420. A pdf version is available. Dense Linear Algebra Solvers for Multicore with GPU Accelerators, Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra, Proceedings of IPDPS 2010: 24th IEEE International Parallel and Distributed Processing Symposium, Atlanta, GA, April 2010. A pdf version is available. Empirical Performance Tuning of Dense Linear Algebra Software, Jack Dongarra and Shirley Moore, pp 255-272, in Performance Tuning of Scientific Applications, David H. Bailey, Robert F. Lucas, Samuel W. Williams, Editors, Chapman & Hall/CRC Computational Science Series, ISBN 978-1-4398-1569-4, 2010. 
A pdf version is available. Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov, Nvidia GPU Gems, Morgan Kaufmann (Ed.), 2010. A pdf version is available. Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators, H. Ltaief, S. Tomov, R. Nath, and J. Dongarra, Submitted to IEEE Transactions on Parallel and Distributed Computing, March 2010. A pdf version is available. Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures, Hatem Ltaief, Jakub Kurzak, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, April 2010, pp 417-423. A pdf version is available. Redesigning the Message Logging Model for High Performance, A. Bouteiller, G. Bosilca, and J. Dongarra, Concurrency and Computation: Practice and Experience, Volume 22, Number 15, November 2010, pp 2196-2212, ISSN 1532-0626. A pdf version is available. Scheduling Linear Algebra Operations on Multicore Processors, Jakub Kurzak, Hatem Ltaief, Jack Dongarra, and Rosa M. Badia, Concurrency and Computation: Practice and Experience, Vol. 22, no. 1, pp. 15-44, January, 2010. A pdf version is available. Scheduling Two-sided Transformations using Algorithms-by-Tiles on Multicore Architectures, H. Ltaief, J. Kurzak, J. Dongarra, and R. Badia, Scientific Programming, Volume 18, Number 1, pp 35-50, 2010, ISSN 1058-9244. A pdf version is available. Self-Healing Network for Scalable Fault-Tolerant Runtime Environments, T. Angskun, G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, Future Generation Computer Systems, Volume 26, Issue 3, pp 479-485, March 2010, ISSN 0167-739X, 2010. A pdf version is available. SmartGridRPC: The new RPC model for high performance Grid computing and its implementation in SmartGridSolve, T. Brady, A. Lastovetsky, K. Seymour, M. Guidolin, and J. 
Dongarra, Concurrency Practice and Experience, pp 2467-2487, Volume 22 Number 18, ISSN 1532-0626, 2010. A pdf version is available.

Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems, Parallel Computing, Volume 36, Issues 5-6, pp 232-240, 2010, ISSN 0167-8191. A pdf version is available.

Reliability and Performance Modeling and Analysis for Grid Computing, Yuan-Shun Dai, Jack Dongarra, in Handbook of Research on Scalable Computing Technologies, Editors Kuan-Ching Li, Ching-Hsien Hsu, Laurence Tianruo Yang, Jack Dongarra, Hans Zima, IGI Global, 2010. A pdf version is available.

Transparent Cross-Platform Access to Software Services using GridSolve and GridRPC, Keith Seymour, Asim YarKhan, and Jack Dongarra, to appear in Cloud Computing and Software Services: Theory and Techniques, editors Syed Ahson and Mohammad Ilyas, 2010, CRC Press. A pdf version is available.

- 2009 -

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra, Parallel Computing, Volume 35, Issue 1, pp 38-53, 2009, ISSN 0167-8191. A pdf version is available.

Accelerating Scientific Computations with Mixed Precision Algorithms, Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, and Stanimire Tomov, Computer Physics Communications 180 (2009) 2526-2533. A pdf version is available.

Accelerating Time-To-Solution for Computational Science and Engineering, J. Demmel, J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, SciDAC Review, Winter 2009, pp 46-57. A pdf version is available.

Algorithmic Based Fault Tolerance Applied to High Performance Computing, Jack J. Dongarra, George Bosilca, Remi Delmas, and Julien Langou, Journal of Parallel and Distributed Computing, Volume 69, pp 410-416, 2009. A pdf version is available.
Computing the Conditioning of the Components of a Linear Least Squares Solution, Marc Baboulin, Jack Dongarra, and Julien Langou, Numerical Linear Algebra with Applications, July 2009, Volume 16 Issue 7, pp 517-533. A pdf version is available.

Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing, Zizhong Chen and Dongarra, J., IEEE Transactions on Computers, Volume 58, Number 11, November 2009, pp 1512-1524, ISSN 0018-9340. A pdf version is available.

Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture -- CELL Processor, Wesley Alvaro, Jakub Kurzak, and Jack Dongarra, Parallel Computing, Volume 35, pp 138-150, 2009. A pdf version is available.

Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software, Lamia Youseff, Keith Seymour, Haihang You, Dmitrii Zagorodnov, Jack Dongarra, and Rich Wolski, Cluster Computing Journal, Volume 12, Number 2, June 2009, pp 101-122, ISSN 1386-7857. A pdf version is available.

QR Factorization for the CELL Processor, Jakub Kurzak and Jack Dongarra, Scientific Programming, Volume 17, Issue 1-2, January 2009, pp 31-42, ISSN 1058-9244. A pdf version is available.

Scheduling Linear Algebra Operations on Multicore Processors, Jakub Kurzak, Hatem Ltaief, Jack Dongarra, and Rosa Badia, to appear in Trends in High Performance and Large Scale Computing, editors L. Grandinetti, G. Joubert, and W. Gentzsch, IOP Press, to be published in 2009. A pdf version is available.

The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community, Jack Dongarra, Pete Beckman, Patrick Aerts, Frank Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Terry Moore, Rick Stevens, Anne Trefethen, Mateo Valero, International Journal of High Performance Computing Applications, Volume 23, Number 4, Winter 2009, pp 309-322, ISSN 1094-3420. A pdf version is available.
The Problem with the Linpack Benchmark Matrix Generator, Julien Langou and Jack Dongarra, International Journal of High Performance Computing Applications, Volume 23, Number 1, Spring 2009, pp 5-14. A pdf version is available.

- 2008 -

A Comparison of Search Techniques for Empirical Code Optimization, Keith Seymour, Haihang You, and Jack Dongarra, submitted to The Third International Workshop on Automatic Performance Tuning, October 1st, 2008, Tsukuba International Congress Center, Epochal Tsukuba, Japan. A pdf version is available.

A Tribute to Gene Golub, Jack Dongarra, Computing in Science and Engineering, IEEE, March/April 2008, pp 5. A pdf version is available.

Algorithm-Based Fault Tolerance for Fail-Stop Failures, Zizhong Chen and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 12, December, 2008. A pdf version is available.

Interactive Grid-Access Using Gridsolve and Giggle, M. Hardt, K. Seymour, J. Dongarra, M. Zapf, and N.V. Ruiter, Computing and Informatics, Vol. 27, No. 2, pp 233-248, 2008, ISSN 1335-9150. A pdf version is available.

Interior State Computation of Nano Structures, Andrew Canning, Jack Dongarra, Julien Langou, Osni Marques, Stanimire Tomov, Christof Voemel, and Lin-Wang Wang, PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, May 13-16, 2008, Trondheim Norway. A pdf version is available.

Netlib and NA-Net: Building a Scientific Computing Community, J. Dongarra, G. Golub, E. Grosse, C. Moler, K. Moore, IEEE Annals of the History of Computing, Volume 30 Number 2, April - June 2008, pp 30-41. A pdf version is available.

Parallel Tiled QR Factorization for Multicore Architectures, Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra, Concurrency and Computation: Practice and Experience, 2008; 20:1573-1590. A pdf version is available.
Revisiting Matrix Product on Master-Worker Platforms, Jack Dongarra, Jean-François Pineau, Yves Robert, Zhiao Shi and Frédéric Vivien, International Journal of Foundations of Computer Science (IJFCS), Volume 19, Number 6, December 2008, pp 1317-1336. A pdf version is available.

Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, Jakub Kurzak, Alfredo Buttari, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Volume 19, Number 9, September 2008, pp 1 - 11. A pdf version is available.

Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, Marc Baboulin, Stan Tomov and Jack Dongarra, PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, EECS Tech Report UT-CS-08-615, LAWN #200, May 13-16, 2008, Trondheim Norway. A pdf version is available.

State-of-the-Art Eigensolvers for Electronic Structure Calculations of Large Scale Nano-Systems, Christof Vomel, Stanimire Z. Tomov, Osni A. Marques, A. Canning, Lin-Wang Wang, and Jack J. Dongarra, Journal of Computational Physics, Volume 227, Issue 15 (July 2008), pages 7113-7124. A pdf version is available.

The PlayStation 3 for High Performance Scientific Computing, Jakub Kurzak, Alfredo Buttari, Piotr Luszczek, and Jack Dongarra, Computing in Science and Engineering, IEEE, May/June 2008, pp 80-83. A pdf version is available.

Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, ACM Transactions on Mathematical Software, Volume 34 Number 4, July 2008, pp 1 - 22. A pdf version is available.

- 2007 -

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems, A. Buttari, J. Dongarra, J. Langou, J. Langou, P. Luszczek, J. Kurzak, International Journal of High Performance Computing Applications, 21(4):457-466, 2007.
A pdf version is available.

Automatic Analysis of Inefficiency Patterns in Parallel Applications, Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore, Concurrency and Computation: Practice and Experience, Volume 19, Issue 11, pp 1481-1496, August 2007. A pdf version is available.

Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor, Jakub Kurzak, Jack Dongarra, Concurrency and Computation: Practice and Experience, Volume 19, Issue 10, pp 1371-1385, July 2007. A pdf version is available.

Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware, Emmanuel Jeannot, Keith Seymour, Asim YarKhan, and Jack J. Dongarra, Parallel Processing Letters, March 2007, Volume 17, Number 1, pp 47-59, ISSN 0129-6264. A pdf version is available.

Performance Analysis of MPI Collective Operations, Jelena Pjesivac-Grbović, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, and Jack J. Dongarra, Cluster Computing Journal, Volume 10, pp 127-143, 2007. A pdf version is available.

Recovery Patterns for Iterative Methods in a Parallel Unstable Environment, G. Bosilca, Z. Chen, J. Dongarra, and J. Langou, SIAM Journal on Scientific Computing, pp 102-116, Volume 30, Number 1, 2007. A pdf version is available.

Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors, K. Fuerlinger, M. Gerndt, J. Dongarra, in Lecture Notes in Computer Science, Volumes 4487-4490, Computational Science - ICCS 2007, 7th International Conference Beijing, China, May 27 - 30, 2007, Editors Yong Shi, Geert Dick van Albada, Jack Dongarra, and Peter M.A. Sloot, ISBN-10 3-540-72589-X, ISSN 0302-9743, Springer Berlin / Heidelberg, 2007. A pdf version is available.

The Impact of Multicore on Computational Science Software, Jack Dongarra, Dennis Gannon, Geoffrey Fox, and Ken Kennedy, CTWatch Quarterly, Volume 3 Number 1, February 2007, (Unreviewed). A pdf version is available.
The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot, Christof Vomel, Stanimire Z. Tomov, Lin-Wang Wang, Osni A. Marques, and Jack J. Dongarra, Journal of Computational Physics, Volume 223, Number 2, pp 774-782, ISSN 0021-9991, 2007. A pdf version is available.

- 2006 -

Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems), Julie Langou, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari and Jack Dongarra, Proceedings of the ACM/IEEE SC2006 Conference on High Performance, Networking, and Computing, November 11-17, 2006, Tampa, FL, https://doi.org/10.1145/1188455.1188573. A pdf version is available.

Conjugate-Gradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures, Stanimire Tomov, Julien Langou, Andrew Canning, Lin-Wang Wang, and Jack Dongarra, The International Journal of Computational Science and Engineering, Volume 2, Number 3/4, pp 205-212, 2006, ISSN 1742-7185. A pdf version is available.

Design and Implementation of the HPC Challenge Benchmark Suite, Piotr Luszczek, Jack Dongarra, Jeremy Kepner, CTWatch Quarterly, November 2006, Volume 2, Number 4A, http://www.ctwatch.org/quarterly/archives/november-2006/ (Unreviewed). A pdf version is available.

NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures, W. B. Jones, G. Bester, A. Canning, A. Franceschetti, P. A. Graf, K. Kim, J. Langou, L.W. Wang, J. Dongarra, and A. Zunger, in "the Proceedings of Science Discovery through Advanced Computing (SciDAC 2005)", Journal of Physics: Conference Series 16, 277-282, 2005. A pdf version is available.

Predicting the Electronic Properties of 3D, Million-Atom Semiconductor Nanostructure Architectures, A. Zunger, A. Franceschetti, G. Bester, W.B. Jones, Kwiseon Kim, P. A. Graf, L-W. Wang, A. Canning, O. Marques, C. Voemel, J.
Dongarra, J. Langou and S. Tomov, Journal of Physics: Conference Series 46 (2006) 292-298. A pdf version is available.

Scheduling Workflow Applications on Processors with Different Capabilities, Zhiao Shi and Jack Dongarra, Future Generation Computing Systems, Volume 22, pp 665-675, 2006. A pdf version is available.

Recent Developments in GridSolve, Asim YarKhan, Keith Seymour, Kiran Sagi, Zhiao Shi, and Jack Dongarra, International Journal of High Performance Applications and Supercomputing, Volume 20 Number 1 Spring 2006, ISSN 1094-3420, pp 131-132. A pdf version is available.

Self Adapting Numerical Software (SANS) Effort, George Bosilca, Zizhong Chen, Jack Dongarra, Victor Eijkhout, Graham E. Fagg, Erika Fuentes, Julien Langou, Piotr Luszczek, Jelena Pjesivac-Grbovic, Keith Seymour, Haihang You, and Sathish S. Vadhiyar, IBM Journal of Research and Development, pp. 223-238, Volume 50, Number 2/3, 2006. A pdf version is available.

Trends in High-Performance Computing, Jack Dongarra, January/February 2006, IEEE Circuits & Devices Magazine, pp 22-27, ISSN 8755-3996. A pdf version is available.

Twenty-Plus Years of Netlib and NA-Net, Part 1 and 2, SIAM News, pp 1-3, Volume 39, Number 3&4, April & May 2006 (Unreviewed news article). A pdf version is available.

- 2005 -

A Not So Simple Matter of Software, Jack Dongarra, NCSA Access, Summer 2005 (non-refereed magazine publication). A pdf version is available.

A Scalable Approach to MPI Application Performance Analysis, Shirley Moore, Felix Wolf, Jack Dongarra, Sameer Shende, Patricia Teller, and Bernd Mohr, Volume 3666, Recent Advances in Parallel Virtual Machine and Messaging Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 309-316, Springer Heidelberg, 2005, ISSN: 0302-9743. A pdf version is available.

An Asynchronous Algorithm on NetSolve Global Computing System, Jack Dongarra, Nahid Emad, S. A. Shahzadeh Fazeli, Future Generation Computing Systems, Vol. 22, No. 3, pp 279-290, 2005. A pdf version is available.
Biological Sequence Alignment on the Computational Grid using the GrADS Framework, Asim YarKhan and Jack Dongarra, Future Generation Computer Systems, Volume 21, Issue 6, pp 980-986, June 2005. A pdf version is available.

Condition Numbers of Gaussian Random Matrices, Zizhong Chen and Jack Dongarra, SIAM Matrix Analysis and Applications, Volume 27, Number 3, pp 603-620, 2005. A pdf version is available.

Evaluating Dynamic Communicators and One-Sided Operations for Current MPI Libraries, Edgar Gabriel, Graham E. Fagg, and Jack J. Dongarra, International Journal of High Performance Computing Applications, Volume 19, Number 1, pp 67-81, Spring 2005, ISSN 1094-3420. A pdf version is available.

Hash Functions for Datatype Signatures in MPI, George Bosilca, Jack Dongarra, Graham Fagg, and Julien Langou, Lecture Notes in Computer Science, Volume 3666, Recent Advances in Parallel Virtual Machine and Messaging Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 76-83, Springer Heidelberg, 2005, ISSN: 0302-9743. A pdf version is available.

High Performance Computing: Clusters, Constellations, MPPs, and Future Directions, Jack Dongarra, Thomas Sterling, Horst Simon, and Erich Strohmaier, Computing in Science and Engineering, Volume 7, Number 2, March/April 2005, pp. 51-59, ISSN 1521-9615. A pdf version is available.

New Grid Scheduling and Rescheduling Methods in the GrADS Project, F. Berman, H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. Mellor-Crummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia, and A. YarKhan, International Journal of Parallel Programming, Vol. 33, No. 2, June 2005. A pdf version is available.

Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing, Graham E. Fagg, Edgar Gabriel, Zizhong Chen, Thara Angskun, George Bosilca, Jelena Pjesivac-Grbovic, and Jack J.
Dongarra, International Journal for High Performance Applications and Supercomputing, Vol. 19, No. 4, pp 465-478, 2005. A pdf version is available.

Recent Trends in the Marketplace of High Performance Computing, Erich Strohmaier, Jack J. Dongarra, Hans W. Meuer, and Horst D. Simon, Parallel Computing, Volume 31, Issues 3-4, pp 261-273, March-April 2005. A pdf version is available.

Scalable Fault Tolerant MPI: Extending the Recovery Algorithm, Graham E. Fagg, Thara Angskun, George Bosilca, Jelena Pjesivac-Grbovic, and Jack J. Dongarra, Lecture Notes in Computer Science, Volume 3666, Recent Advances in Parallel Virtual Machine and Messaging Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 67-75, Springer Heidelberg, 2005, ISSN: 0302-9743. A pdf version is available.

Scanning the Special Issue on Program Generation Optimization and Platform Adaptation, J.M.F. Moura, M. Puschel, D. Padua, and J. Dongarra, Proceedings of the IEEE, Volume 93, Number 2, February 2005, pp 211-215, ISSN 0018-9219. A pdf version is available.

Self Adapting Linear Algebra Algorithms and Software, Jim Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Rich Vuduc, R. Clint Whaley, Katherine Yelick, Proceedings of the IEEE, Volume 93, Number 2, February 2005, pp 293-312, ISSN 0018-9219. A pdf version is available.

Self Adaptivity in Grid Computing, S. Vadhiyar and J. Dongarra, Concurrency and Computation: Practice and Experience, Volume 17, Issue 2-4, 2005, pp. 235-257. A pdf version is available.

The Component Structure of a Self-Adapting Numerical Software System, Victor Eijkhout, Erika Fuentes, Thomas Eidson, and Jack Dongarra, International Journal of Parallel Programming, Vol. 33, No. 2, June 2005. A pdf version is available.

The Top500 and Computational Science, A not so simple matter of software, Jack Dongarra, Scientific Computing, pp 14-16, August 2005 (non-refereed magazine publication). A pdf version is available.
- 2004 -

Simplified Grid Computing through Spreadsheets and NetSolve, David Abramson, Jack Dongarra, Eric Meek, Paul Roe, Zhiao Shi, High Performance Computing and Grid in Asia Pacific Region, 2004. Proceedings. Seventh International Conference, 22-22 July 2004, DOI: 10.1109/HPCASIA.2004.1324012. A pdf version is available.

Building and Using a Fault Tolerant MPI Implementation, Graham E. Fagg and Jack J. Dongarra, International Journal of High Performance Applications and Supercomputing, Volume 18, Number 3, Fall 2004, pp 353-362, ISSN 1094-3420. A pdf version is available.

GrADSolve - A Grid-based RPC system for Remote Invocation of Parallel Software, Sathish Vadhiyar and Jack Dongarra, Journal of Parallel and Distributed Computing, 64(6):774-783, June 2004, ISSN 0743-7315. A pdf version is available.

Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters, Z. Chen, J. Dongarra, P. Luszczek, and K. Roche, Parallel Computing 29(11-12):1723-1743, November/December 2003, ISSN 0167-8191. A pdf version is available.

The Virtual Instrument: Support for Grid-enabled MCell Simulations, Henri Casanova, Thomas Bartol, Francine Berman, Erhan Gokcay, Adam Birnbaum, Jack Dongarra, Mark Ellisman, Marcio Faerman, Michelle Miller, Graziano Obertelli, Stuart Pomerantz, Terry Sejnowski, Joel Stiles, Rich Wolski, International Journal of High Performance Computing Applications, Volume 18, Number 1, Spring 2004, pp 3-18, ISSN 1094-3420. A pdf version is available.

Toward an Accurate Model for Collective Communications, Sathish Vadhiyar, Graham Fagg, Jack Dongarra, International Journal of High Performance Computing Applications, Volume 18, Number 1, Spring 2004, pp 159-166, ISSN 1094-3420. A pdf version is available.

Trends in High Performance Computing, Jack Dongarra, The Computer Journal, 47(4):399-403, The British Computer Society, 2004. A pdf version is available.

- 2003 -

Self Adaptability in Grid Computing, S. Vadhiyar and J.
Dongarra, Concurrency and Computation: Practice and Experience, January 2003, ISSN 1532-0634. A pdf version is available.

Self-adapting Numerical Algorithm for Next Generation Applications, J. Dongarra and V. Eijkhout, International Journal of High Performance Computing Applications 17(2):125-132, Summer 2003, ISSN 1094-3420. A pdf version is available.

Self-adapting Numerical Software and Automatic Tuning of Heuristics, Jack Dongarra and Victor Eijkhout, Lecture Notes in Computer Science, Volume 2660, Springer-Verlag Heidelberg, pp 759 - 770, ISSN: 0302-9743, June 2003. A pdf version is available.

SRS: A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems, S. S. Vadhiyar and J. J. Dongarra, Parallel Processing Letters 13(2):291-312, June 2003, ISSN 0129-6264. A pdf version is available.

The LINPACK Benchmark: Past, Present, and Future, J. J. Dongarra, P. Luszczek, and A. Petitet, Concurrency and Computation: Practice and Experience 15(9):803-820, August 2003, ISSN 1532-0634. A pdf version is available.

- 2002 -

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures, G. Henry, D. Watkins, and J. Dongarra, SIAM Journal on Scientific Computing 24(1):284-311, January 2003, ISSN 1064-8275. A pdf version is available.

An Updated Set of Basic Linear Algebra Subprograms (BLAS), L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, ACM Transactions on Mathematical Software 28(2):135-151, June 2002, ISSN 0098-3500. A pdf version is available.

Automatic Translation of Fortran to JVM Bytecode, K. Seymour and J. Dongarra, Concurrency and Computation: Practice and Experience 15(3-5):207-222, March/April 2003, ISSN 1532-0626 (print), 1532-0634 (electronic). A pdf version is available.
Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, Special Issue - Part I, International Journal of High Performance Computing Applications 16(1):1-111, Spring 2002, ISSN 1094-3420. A pdf version is available.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, Special Issue - Part II, International Journal of High Performance Computing Applications 16(2):115-199, Spring 2002, ISSN 1094-3420. A pdf version is available.

HARNESS Fault Tolerant MPI Design, Usage and Performance Issues, G. E. Fagg and J. J. Dongarra, Future Generation Computer Systems 18(8):1127-1142, October 2002, ISSN 0167-739X. A pdf version is available.

Innovations of the NetSolve Grid Computing System, D. C. Arnold, H. Casanova, and J. Dongarra, Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments 14(13-15):1457-1479, November/December 2002, ISSN 1532-0626 (print), 1532-0634 (electronic). A pdf version is available.

Middleware for the Use of Storage in Communication, M. Beck, D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, S. Vadhiyar, and R. Wolski, Parallel Computing 28(12):1773-1788, December 2002, ISSN 0167-8191. A pdf version is available.

NetBuild: Transparent Cross-Platform Access to Computational Software Libraries, K. Moore and J. Dongarra, Concurrency and Computation: Practice and Experience 14(13-15):1445-1456, November/December 2002, ISSN 1532-0626 (print), 1532-0634 (electronic). A pdf version is available.

- 2001 -

A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems, P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland, Parallel and Distributed Computing Practices, Special Issue: Parallel Numerical Linear Algebra 2(4):385-400, November 1999, ISSN 1097-2803. A pdf version is available.

Automated Empirical Optimization of Software and the ATLAS Project, R. Whaley, A. Petitet, and J.
Dongarra, Parallel Computing 27(1-2):3-25, January 2001, ISSN 0167-8191. A pdf version is available.

Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing, J. Dongarra, H. Meuer, H. Simon, and E. Strohmaier, SIAM News 34(9), November 2001, ISSN 0036-1445. A pdf version is available.

HARNESS and Fault Tolerant MPI, G. Fagg, A. Bukovsky, and J. Dongarra, Parallel Computing 27(11):1479-1496, October 2001, ISSN 0167-8191. A pdf version is available.

High Performance Computing Trends, J. J. Dongarra, H. W. Meuer, H. D. Simon, and E. Strohmaier, HERMIS 2:155-163, November 2001, ISSN 1108-7609. A pdf version is available.

Iterative Solver Benchmark, J. Dongarra, V. Eijkhout, and H. van der Vorst, Scientific Programming 9(4):223-231, 2001, ISSN 1058-9244. A pdf version is available.

Measuring Computer Performance: A Practitioner's Guide, Book Review by D. Lilja, Cambridge University Press (ISBN 0-521-64105-5), SIAM Review 43(2):383-384, 2001, ISSN 0036-1445. A pdf version is available.

Network-Enabled Solvers: A Step Toward Grid-Based Computing, J. Dongarra, SIAM News 34(10), December 2001, ISSN 0036-1445. A pdf version is available.

Numerical Libraries and the Grid, A. Petitet, S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche, and S. Vadhiyar, International Journal of High Performance Computing Applications 15(4):359-374, Winter 2001, ISSN 1094-3420. A pdf version is available.

Numerical Libraries and Tools for Scalable Parallel Cluster Computing, J. Dongarra, S. Moore, and A. Trefethen, International Journal of High Performance Computing Applications 15(2):175-180, Summer 2001, ISSN 1094-3420. A pdf version is available.

On the Convergence of Computational and Data Grids, D. C. Arnold, S. S. Vadhiyar, and J. J. Dongarra, Parallel Processing Letters 11(2-3):187-202, June/September 2001, ISSN 0129-6264. A pdf version is available.

Recursive Approach in Sparse Matrix LU Factorization, J. Dongarra, V. Eijkhout, and P.
Luszczek, Scientific Programming 9(1):51-60, 2001, ISSN 1058-9244. A pdf version is available.

Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries, K. Kennedy, B. Broom, K. Cooper, J. Dongarra, R. Fowler, D. Gannon, L. Johnsson, J. Mellor-Crummey, and L. Torczon, Journal of Parallel and Distributed Computing 61(12):1803-1826, December 2001, ISSN 0743-7315. A pdf version is available.

The GrADS Project: Software Support for High-Level Grid Application Development, F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, D. Reed, L. Torczon, and R. Wolski, International Journal of High Performance Computing Applications 15(4):327-344, Winter 2001, ISSN 1094-3420. A pdf version is available.

The Quest for Petascale Computing, J. Dongarra and D. Walker, Computing in Science and Engineering 3(3):32-39, May/June 2001, ISSN 1521-9615. A pdf version is available.

- 2000 -

A Portable Programming Interface for Performance Evaluation on Modern Processors, S. Browne, J. Dongarra, N. Garner, G. Ho, and P. Mucci, International Journal of High Performance Computing Applications 14(3):189-204, Fall 2000, ISSN 1094-3420. A pdf version is available.

The Design And Implementation Of The Parallel Out-Of-Core Scalapack LU, QR, And Cholesky Factorization Routines, E. D'Azevedo and J. Dongarra, Concurrency: Practice and Experience 12(15):1481-1493, 2000, ISSN 1040-3108. A pdf version is available.

- 1999 -

A Comparison Of Parallel Solvers For General Narrow Banded Linear Systems, P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland, Parallel and Distributed Computing Practices 2(4):385-400, December 1999, ISSN 1097-2803. A pdf version is available.

A Parallel Divide and Conquer algorithm for the Symmetric Eigenvalue Problem, F. Tisseur and J. Dongarra, SIAM Journal on Scientific Computing 6(20):2223-2236, 1999, ISSN 1064-8275. A pdf version is available.
Adaptive Scheduling for Task Farming with Grid Middleware, H. Casanova, M. Kim, J. Plank, and J. Dongarra, International Journal of High Performance Computing Applications 13(3):231-240, Fall 1999, ISSN 1094-3420. A pdf version is available.

Algorithmic Issues on Heterogeneous Computing Platforms, Pierre Boulet, J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, Parallel Processing Letters 9(2):197-213, 1999, ISSN 0129-6264. A pdf version is available.

Algorithmic Redistribution Methods for Block-Cyclic Decompositions, A. P. Petitet and J. J. Dongarra, IEEE Transactions on Parallel and Distributed Systems 10(12):201-220, 1999, ISSN 1045-9219. A pdf version is available.

Atlanta Organizers Put Mathematics to Work For the Math Sciences Community, M. Berry and J. Dongarra, SIAM News 32(6), July/August 1999, ISSN 0036-1445. A pdf version is available.

Deploying Fault Tolerance and Task Migration with NetSolve, J. S. Plank, H. Casanova, M. Beck, and J. J. Dongarra, Future Generation Computer Systems 15(5-6):745-755, October 1999, ISSN 0167-739X. A pdf version is available.

Experiences with Windows NT as a Cluster Computing Platform for Parallel Computing, M. Fischer and J. Dongarra, Parallel and Distributed Computing Practices, Special Issue: Cluster Computing 2(2):119-128, June 1999, ISSN 1097-2803. A pdf version is available.

HARNESS: A Next Generation Distributed Virtual Machine, M. Beck, J. J. Dongarra, G. E. Fagg, G. A. Geist, P. Gray, J. Kohl, M. Migliardi, K. Moore, T. Moore, P. Papadopoulous, S. L. Scott, and V. Sunderam, Future Generation Computer Systems 15(5-6):571-582, October 1999, ISSN 0167-739X. A pdf version is available.

JLAPACK - Compiling LAPACK Fortran to Java, D. Doolin, J. Dongarra, and K. Seymour, Scientific Programming 7(2):111-138, 1999, ISSN 1058-9244. A pdf version is available.

Logistical Quality of Service in NetSolve, M. Beck, H. Casanova, J. Dongarra, T. Moore, J. Plank, F. Berman, and R.
Wolski, Computer Communications 22(11):1034-1044, 1999, ISSN 0140-3664. A pdf version is available.

Numerical Linear Algebra Algorithms and Software, J. Dongarra and V. Eijkhout, Journal of Computational and Applied Mathematics 123(1-2):489-514, November 1, 2000, ISSN 0377-0427. A pdf version is available.

Scalable Networked Information Processing Environment (SNIPE), G. E. Fagg, K. Moore, and J. J. Dongarra, Future Generation Computer Systems 15(5-6):595-605, October 1999, ISSN 0167-739X. A pdf version is available.

Static Tiling For Heterogeneous Computing Platforms, P. Boulet, J. Dongarra, Y. Robert, and F. Vivien, Parallel Computing 25(5):547-568, 1999, ISSN 0167-8191. A pdf version is available.

Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments, H. Casanova, M. Thomason, and J. Dongarra, Journal of Parallel and Distributed Computing 58(1):68-91, July 1999, ISSN 0743-7315. A pdf version is available.

The Marketplace for High-Performance Computers, E. Strohmaier, J. Dongarra, H. Meuer, and H. Simon, Parallel Computing 25(13-14):1517-1545, December 1999, ISSN 0167-8191. A pdf version is available.

Tiling On Systems with Communication/Computation Overlap, P.-Y. Calland, J. Dongarra, and Y. Robert, Concurrency: Practice and Experience 11(3):139-153, 1999, ISSN 1040-3108. A pdf version is available.

- 1998 -

Applying NetSolve's Network Enabled Server, H. Casanova and J. Dongarra, IEEE Computational Science and Engineering 5(3):57-67, July/September 1998, ISSN 1070-9924. A pdf version is available.

Determining the Idle Time of a Tiling: New Results, F. Desprez, J. Dongarra, F. Rastello, and Yves Robert, Journal of Computing and Information Science in Engineering (Special Issue on Compiler Techniques for High-Performance Computing) 14(1):167-190, March 1998, ISSN 1530-9827. A pdf version is available.

Developing Numerical Libraries in Java, R. F. Boisvert, J. J. Dongarra, R. Pozo, K. A. Remington, and G. W.
Stewart, Concurrency: Practice and Experience 10(11-13):1117-1129, 1998, ISSN 1040-3108. A pdf version is available.

National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community, S. Browne, J. Dongarra, J. Horner, P. McMahan, S. Wells, D-Lib Magazine (Electronic), May 1998, ISSN 1082-9873. A pdf version is available.

Programming Tools and Environments, J. Saltz, A. Sussman, S. Graham, J. Demmel, S. Baden, and J. Dongarra, Communications of the ACM 41(11):64-73, November 1998, ISSN 0001-0782. A pdf version is available.

Scheduling Block-Cyclic Array Redistribution, F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro, and Y. Robert, IEEE Transactions on Parallel and Distributed Systems 9(2):192-205, February 1998, ISSN 1045-9219. A pdf version is available.

Using Agent-based Software for Scientific Computing in the NetSolve System, H. Casanova and J. Dongarra, Parallel Computing 24(12-13):1777-1790, November 1998, ISSN 0167-8191. A pdf version is available.

- 1997 -

Changing Technologies of HPC, J. J. Dongarra, H. W. Meuer, H. D. Simon, and E. Strohmaier, Future Generation Computer Systems 12(5):461-474, April 1997, ISSN 0167-739X. A pdf version is available.

Fault Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing, J. Plank, Y. Kim, and J. Dongarra, Journal of Parallel and Distributed Computing 43(2):125-138, 1997, ISSN 0743-7315. A pdf version is available.

Java Access to Numerical Libraries, H. Casanova, J. Dongarra, and D. Doolin, Concurrency: Practice and Experience 9(11):1279-1291, 1997, ISSN 1040-3108. A pdf version is available.

Key Concepts for Parallel Out of Core LU Factorization, J. Dongarra, S. Hammarling, and D. Walker, Parallel Computing 23(1-2):49-70, April 1997, ISSN 0167-8191. A pdf version is available.

Message-Passing Performance of Various Computers, J. Dongarra and T. Dunigan, Concurrency: Practice and Experience 9(10):915-926, 1997, ISSN 1040-3108.
A pdf version is available.

NetSolve: A Network-Enabled Server for Solving Computational Science Problems, H. Casanova and J. Dongarra, The International Journal of Supercomputer Applications and High Performance Computing 11(3):212-223, Fall 1997, ISSN 1078-3482. A pdf version is available.

Practical Experience in the Numerical Dangers of Heterogeneous Computing, L. S. Blackford, A. Cleary, J. Demmel, J. Dongarra, I. Dhillon, S. Hammarling, A. Petitet, H. Ren, K. Stanley, and R. C. Whaley, ACM Transactions on Mathematical Software 23(2):133-147, June 1997, ISSN 0098-3500. A pdf version is available.

The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Computers, Z. Bai, J. Demmel, J. Dongarra, A. Petitet, H. Robinson, and K. Stanley, SIAM Journal on Scientific Computing 18(5):1446-1461, 1997, ISSN 0196-5204. A pdf version is available.

Top500 Supercomputer Sites, J. Dongarra, H. W. Meuer, and E. Strohmaier, Supercomputer 67:89-120, 1997, ISSN 0168-7875. A pdf version is available.

- 1996 -

A Message Passing Standard for MPP and Workstations, J. Dongarra, S. W. Otto, M. Snir, and D. Walker, Communications of the ACM 39(7):84-90, July 1996, ISSN 0001-0782. A pdf version is available.

Algorithmic Bombardment for the Iterative Solution of Linear Systems: A Poly-Iterative Approach, R. Barrett, M. Berry, J. Dongarra, V. Eijkhout, and C. Romine, Journal of Computational and Applied Mathematics 74(1-2):91-110, November 1996, ISSN 0377-0427. A pdf version is available.

Chebyshev tau - QZ Algorithm Methods for Calculating Spectra of Hydrodynamic Stability Problems, J. Dongarra, B. Straughan, and D. W. Walker, Applied Numerical Mathematics 22(4):399-435, 1996, ISSN 0168-9274. A pdf version is available.

Future Linear Algebra Libraries, J. Dongarra, IEEE Computational Science and Engineering 3(2):38-40, Summer 1996, ISSN 1070-9924. A pdf version is available.

LAPACK for Fortran90, J. Dongarra, J. Du Croz, S. Hammarling, J. Wasniewski, A.
Zemla, Applied Mathematics and Computer Science 6(2):101-109, 1996, ISSN 1641-876X. A pdf version is available.

MPI: A Standard Message Passing Interface, J. Dongarra and D. Walker, Supercomputer 12(1):56-68, January 1996, ISSN 0168-7875.

Overview of High-Performance Computers, A. van der Steen and J. Dongarra, Electronic Journal of the NHSE Review 1(1), 1996, HTML.

PB-BLAS: A Set of Parallel Block Basic Linear Algebra Subroutines, J. Choi, J. Dongarra, and D. Walker, Concurrency: Practice and Experience 8(7):517-535, September 1996, ISSN 1040-3108. A pdf version is available.

PVMPI: An Integration of PVM and MPI Systems, G. Fagg and J. Dongarra, Calculateurs Parallèles 8(2):151-166, 1996, Hermes, ISSN 1260-3198. A pdf version is available.

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, J. Choi, J. Demmel, J. Dongarra, I. Dhillon, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, Computer Physics Communications 97(1-2):1-15, August 1996, ISSN 0010-4655. A pdf version is available.

The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker and R. C. Whaley, Scientific Programming 5(3):173-184, Fall 1996, ISSN 1058-9244. A pdf version is available.

- 1995 -

A Highly Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper-Hessenberg Form, M. W. Berry, J. Dongarra, and Y. Kim, Parallel Computing 21(8):1189-1212, August 1995, ISSN 0167-8191. A pdf version is available.

Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, J. Choi, J. Dongarra, and D. Walker, Parallel Computing 21(9):1387-1405, 1995, ISSN 0167-8191. A pdf version is available.

Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors, F. Desprez, J. Dongarra, and B.
Tourancheau, Parallel Processing Letters 5(2):157-169, June 1995, ISSN 0129-6264. A pdf version is available.

Recent Enhancements to PVM, A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, International Journal of Supercomputer Applications and High Performance Computing 9(2):108-127, Summer 1995, ISSN 1078-3482. A pdf version is available.

Software Distribution Using XNETLIB, J. Dongarra, T. Rowan, and R. Wade, ACM Transactions on Mathematical Software 21(1):79-88, March 1995, ISSN 0098-3500. A pdf version is available.

Software Libraries for Linear Algebra Computations on High Performance Computers, J. Dongarra and D. Walker, SIAM Review 37(2):151-180, June 1995, ISSN 0036-1445. A pdf version is available.

The Design of a Parallel, Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form, J. Choi, J. Dongarra, and D. Walker, Numerical Algorithms 10(3-4):379-400, 1995, ISSN 1017-1398. A pdf version is available.

The National HPCC Software Exchange, S. Browne, J. Dongarra, S. Green, K. Moore, T. Rowan, R. Wade, G. Fox, K. Hawick, K. Kennedy, J. Pool, R. Stevens, B. Olsen, and T. Disz, IEEE Computational Science and Engineering 2(2):62-69, Summer 1995, ISSN 1070-9924. A pdf version is available.

The Netlib Mathematical Software Repository, S. Browne, J. Dongarra, E. Grosse, and T. Rowan, D-Lib Magazine, Electronic Journal, September 1995, ISSN 1082-9873, http://www.dlib.org/dlib/september95/netlib/09browne.html. A pdf version is available.

The ParkBench Benchmark Collection, J. Dongarra and T. Hey, Supercomputer 11(2-3):94-115, June 1995, ISSN 0168-7875.

Top500 Supercomputer Sites, J. Dongarra, H. Meuer, and E. Strohmaier, Supercomputer 11(2-3):133-194, June 1995, ISSN 0168-7875. A pdf version is available.

- 1994 -

CRPC Research into Linear Algebra Software for High-Performance Computers, J. Choi, J. J. Dongarra, R. Pozo, D. C. Sorensen, and D. W.
Walker, International Journal of Supercomputer Applications 8(2):99-118, Summer 1994, ISSN 0890-2720. A pdf version is available.

Experiences with CODE and HeNCE in Visual Programming for Parallel Computing, J. C. Browne, J. Dongarra, S. I. Hyder, K. Moore, and P. Newton, IEEE Parallel and Distributed Technology 3(1):75-83, Spring 1994, ISSN 1063-6552. A pdf version is available.

HeNCE: A Heterogeneous Network Computing Environment, A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, and K. Moore, Scientific Programming 3(1):49-60, Spring 1994, ISSN 1058-9244. A pdf version is available.

MPI: A Message Passing Interface Standard, Special Issue, International Journal of Supercomputer Applications 8(3-4):159-416, Fall/Winter 1994, ISSN 0890-2720. A pdf version is available.

PARKBENCH Report - 1: Public International Benchmarks for Parallel Computers, PARKBENCH Committee (assembled by R. Hockney and M. Berry, with contributions from D. Bailey, M. Berry, J. Dongarra, V. Getov, T. Haupt, T. Hey, R. Hockney, and D. Walker), Scientific Programming 3(2):101-146, 1994, ISSN 1058-9244. A pdf version is available.

PDS: A Performance Database Server, M. W. Berry, J. Dongarra, B. H. LaRose, and T. Letsche, Scientific Programming 3(2):147-156, 1994, ISSN 1058-9244. A pdf version is available.

PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers, J. Choi, J. J. Dongarra, and D. W. Walker, Concurrency: Practice and Experience 6(7):543-570, October 1994, ISSN 1040-3108. A pdf version is available.

Scalability Issues in the Design of a Library for Dense Linear Algebra, J. J. Dongarra, R. A. van de Geijn, and D. W. Walker, Journal of Parallel and Distributed Computing 22(3):523-537, September 1994, ISSN 0743-7315. A pdf version is available.

The PVM Concurrent Computing System: Evolution, Experiences, and Trends, V. S. Sunderam, J. Dongarra, G. A.
Geist, and R. Manchek, Parallel Computing 20(4):531-545, March 31, 1994, ISSN 0167-8191. A pdf version is available.

- 1993 -

A Parallel Algorithm for the Non-Symmetric Eigenvalue Problem, J. J. Dongarra and M. Sidani, SIAM Journal on Scientific Computing 14(3):542-569, May 1993, ISSN 1064-8275. A pdf version is available.

Integrated PVM Framework Supports Heterogeneous Network Computing, J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam, Computers in Physics 7(2):166-175, April 1993, ISSN 0895-6111. A pdf version is available.

Linear Algebra Libraries for High-Performance Computers: A Personal Perspective, J. Dongarra, IEEE Parallel and Distributed Technology: Systems and Applications 1(1):17-24, February 1993, ISSN 1063-6552. A pdf version is available.

Performance of LAPACK: A Portable Library of Numerical Linear Algebra Routines, E. C. Anderson and J. Dongarra, Proceedings of the IEEE 81(8):1094-1102, August 1993, ISSN 0018-9219. A pdf version is available.

Supporting Heterogeneous Network Computing: PVM, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, Chemical Design Automation News 8(9-10):36-42, September/October 1993, ISSN 0886-6716. A pdf version is available.

Visualization and Debugging in a Heterogeneous Environment, A. Beguelin, J. Dongarra, A. Geist, and V. Sunderam, IEEE Computer 26(6):88-95, June 1993, ISSN 0018-9162. A pdf version is available.

- 1992 -

Algorithm 710: FORTRAN Subroutines for Computing the Eigenvalues and Eigenvectors of a General Matrix by Reduction to General Tridiagonal Form, J. J. Dongarra, G. A. Geist, and C. H. Romine, ACM Transactions on Mathematical Software 18(4):392-400, December 1992, ISSN 0098-3500. A pdf version is available.

Generalized QR Factorization and Its Applications, E. Anderson, Z. Bai, and J. Dongarra, Linear Algebra and Its Applications 162-164:243-271, February 1992, ISSN 0024-3795. A pdf version is available.

Numerical Considerations in Computing Invariant Subspaces, J. J. Dongarra, S.
Hammarling and J. H. Wilkinson, SIAM Journal on Matrix Analysis and Applications 13(1):145-161, January 1992, ISSN 0895-4798. A pdf version is available.

Performance of Various Computers Using Standard Sparse Linear Equations Solving Techniques, J. J. Dongarra and H. A. van der Vorst, Supercomputer 9(5):17-29, September 1992, ISSN 0168-7875. A pdf version is available.

Reduction to Condensed Form for the Eigenvalue Problem on Distributed Memory Architectures, J. J. Dongarra and R. A. van de Geijn, Parallel Computing 18(9):973-982, September 1992, ISSN 0167-8191. A pdf version is available.

- 1991 -

A Comparative Study of Automatic Vectorizing Compilers, D. Levine, D. Callahan, and J. Dongarra, Parallel Computing 17(10-11):1223-1244, December 1991, ISSN 0167-8191. A pdf version is available.

Opening the Door to Heterogeneous Network Supercomputing, A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, Supercomputing Review 4(9):44-45, September 1991, ISSN 1048-6836. A pdf version is available.

Parallel Loops - A Test Suite for Parallelizing Compilers: Description and Example Results, J. Dongarra, M. Furtney, S. Reinhardt, and J. Russell, Parallel Computing 17(10-11):1247-1257, December 1991, ISSN 0167-8191. A pdf version is available.

Special Report: 1990 Gordon Bell Prize Winners, J. Dongarra, A. H. Karp, K. Miura, and H. Simon, IEEE Software 8(3):92-97, 102, May/June 1991, ISSN 0740-7459. A pdf version is available.

The IBM RISC System/6000 and Linear Algebra Operations, J. Dongarra, P. Mayes, and G. Radicati di Brozolo, Supercomputer 8(4):15-30, July 1991, ISSN 0168-7875. A pdf version is available.

- 1990 -

A Set of Level 3 Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, ACM Transactions on Mathematical Software 16(1):1-17, March 1990, ISSN 0098-3500. A pdf version is available.

Evolution of Numerical Software for Dense Linear Algebra, Jack Dongarra and Sven Hammarling, In M. G. Cox and S.
Hammarling, editors, Reliable Numerical Computation, pages 297-327. Oxford University Press, Oxford, UK, 1990. A pdf version is available.

Automatic Blocking of Nested Loops, R. Schreiber and J. Dongarra, University of Tennessee Technical Report CS-90-108, Knoxville, TN 37996, USA, 1990. A pdf version is available.

A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors, J. Dongarra, O. Brewer, J. A. Kohl, and S. Fineberg, Journal of Parallel and Distributed Computing 9(2):185-202, June 1990, ISSN 0743-7315. A pdf version is available.

Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs, J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, ACM Transactions on Mathematical Software 16(1):18-28, March 1990, ISSN 0098-3500. A pdf version is available.

- 1989 -

Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations, J. J. Dongarra, S. J. Hammarling, and D. C. Sorensen, Journal of Computational and Applied Mathematics 27(1-2):215-227, September 1989, ISSN 0377-0427. A pdf version is available.

Shopping for Mathematical Software Electronically, J. Dongarra and E. Grosse, IEEE Potentials 8(1):37-38, February 1989, ISSN 0278-6648. A pdf version is available.

- 1988 -

Algorithm 656: An Extended Set of Basic Linear Algebra Subprograms: Model Implementation and Test Programs, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM Transactions on Mathematical Software 14(1):18-32, March 1988, ISSN 0098-3500. A pdf version is available.

An Extended Set of Fortran Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM Transactions on Mathematical Software 14(1):1-17, March 1988, ISSN 0098-3500. A pdf version is available.

Programming Methodology and Performance Issues for Advanced Computer Architectures, J. J. Dongarra, D. C. Sorensen, K. Connolly, and J.
Patterson, Parallel Computing 8(1-3):41-58, October 1988, ISSN 0167-8191. A pdf version is available.

Tools to Aid in the Analysis of Memory Access Patterns for FORTRAN Programs, O. Brewer, J. Dongarra, and D. Sorensen, Parallel Computing 9(1):25-35, December 1988, ISSN 0167-8191. A pdf version is available.

- 1987 -

A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem, J. J. Dongarra and D. C. Sorensen, SIAM Journal on Scientific and Statistical Computing 8(2):139-154, March 1987, ISSN 0196-5204. A pdf version is available.

A Portable Environment for Developing Parallel FORTRAN Programs, J. J. Dongarra and D. C. Sorensen, Parallel Computing 5(1-2):175-186, July 1987, ISSN 0167-8191. A pdf version is available.

Computer Benchmarking: Paths and Pitfalls, J. Dongarra, J. Martin, and J. Worlton, IEEE Spectrum 24(7):38-43, June 1987, ISSN 0018-9235. A pdf version is available.

Distribution of Mathematical Software via Electronic Mail, J. J. Dongarra and E. Grosse, Communications of the ACM 30(5):403-407, May 1987, ISSN 0001-0782. A pdf version is available.

Solving Banded Systems on a Parallel Processor, J. J. Dongarra and L. Johnsson, Parallel Computing 5(1-2):219-246, July 1987, ISSN 0167-8191. A pdf version is available.

- 1986 -

How Do the "Minisupers" Stack Up?, J. J. Dongarra, IEEE Computer 19(3):93, 100, March 1986, ISSN 0018-9162. A pdf version is available.

Implementing Dense Linear Algebra Algorithms Using Multitasking on the CRAY X-MP-4 (Or Approaching the Gigaflop), J. J. Dongarra and T. Hewitt, SIAM Journal on Scientific and Statistical Computing 7(1):347-350, January 1986, ISSN 0196-5204. A pdf version is available.

Implementation of Some Concurrent Algorithms for Matrix Factorization, J. J. Dongarra, A. H. Sameh, and D. C. Sorensen, Parallel Computing 3(1):25-34, March 1986, ISSN 0167-8191. A pdf version is available.

Linear Algebra on High-Performance Computers, J. Dongarra and D.
Sorensen, Applied Mathematics and Computation 20(1-2):57-88, September 1986, ISSN 0096-3003. A pdf version is available.

Squeezing the Most out of High Performance Computers for Finding the Eigenvalues, J. Dongarra, L. Kaufman, and S. Hammarling, Linear Algebra and Its Applications 77:113-136, May 1986, ISSN 0024-3795. A pdf version is available.

- 1985 -

A Proposal for an Extended Set of Fortran Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM SIGNUM Newsletter 20(1):2-18, January 1985, ISSN 0163-5778. A pdf version is available.

Algorithm Design for Different Computer Architectures, J. J. Dongarra, B. T. Smith, and D. Sorensen, IEEE Software 2(4):79-80, July 1985. A pdf version is available.

- 1984 -

A Collection of Parallel Linear Equations Routines for the Denelcor HEP, J. J. Dongarra and R. E. Hiromoto, Parallel Computing 1(2):133-142, December 1984, ISSN 0167-8191. A pdf version is available.

EISPACK - A Collection for Solving Eigenvalue Problems, J. Dongarra and C. Moler, in Sources and Development of Mathematical Software, W. R. Cowell, ed., pp. 68-87, Prentice-Hall: Upper Saddle River, NJ, 1984, ISBN 0-13-823501-5. A pdf version is available.

Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine, J. J. Dongarra, F. G. Gustavson, and A. Karp, SIAM Review 26(1):91-112, January 1984, ISSN 0036-1445. A pdf version is available.

Multiprocessing Linear Algebra Algorithms on the CRAY X-MP-2: Experiences with Small Granularity, S. S. Chen, J. J. Dongarra, and C. C. Hsiung, Journal of Parallel and Distributed Computing 1(1):22-31, August 1984, ISSN 0743-7315. A pdf version is available.

On Some Parallel Banded System Solvers, J. J. Dongarra and A. H. Sameh, Parallel Computing 1(3):223-235, December 1984. A pdf version is available.

Performances comparés de 80 ordinateurs sur des programmes Fortran, J. J. Dongarra, Technique et Science Informatiques 3(5):355-360, 1984, ISSN 0752-4072.
A pdf version is available.

Solving the Secular Equation Including Spin Orbit Coupling for Systems with Inversion and Time Reversal Symmetry, J. J. Dongarra, J. R. Gabriel, D. D. Koelling, and J. H. Wilkinson, Journal of Computational Physics 54(2):278-288, May 1984, ISSN 0021-9991. A pdf version is available.

Squeezing the Most out of an Algorithm in CRAY FORTRAN, J. J. Dongarra and S. C. Eisenstat, ACM Transactions on Mathematical Software 10(3):219-230, September 1984, ISSN 0098-3500. A pdf version is available.

The Eigenvalue Problem for Hermitian Matrices with Time Reversal Symmetry, J. J. Dongarra, J. R. Gabriel, D. D. Koelling, and J. H. Wilkinson, Linear Algebra and Its Applications 60:27-42, August 1984, ISSN 0024-3795. A pdf version is available.

- 1983 -

Improving the Accuracy of Computed Eigenvalues and Eigenvectors, J. J. Dongarra, C. B. Moler, and J. H. Wilkinson, SIAM Journal on Numerical Analysis 20(1):23-45, February 1983, ISSN 0036-1429. A pdf version is available.

Improving the Accuracy of Computed Singular Values, J. J. Dongarra, SIAM Journal on Scientific and Statistical Computing 4(4):712-719, December 1983, ISSN 0196-5204. A pdf version is available.

Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment, J. J. Dongarra, ACM SIGARCH Computer Architecture News 11(5):22-27, December 1983, ISSN 0163-5964. A pdf version is available.

- 1982 -

Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues, J. J. Dongarra, ACM Transactions on Mathematical Software 8(4):371-375, December 1982, ISSN 0098-3500. A pdf version is available.

- 1979 -

Unrolling Loops in Fortran, J. Dongarra and A. R. Hinds, Software: Practice and Experience 9(3):219-226, March 1979, ISSN 0038-0644. A pdf version is available.



dongarra@icl.utk.edu
