Papers:
See my Google Scholar page for a close-to-comprehensive list of publications and my ResearchGate page for the full text of many. (My DBLP page provides a good list too, organized nicely by year, and with a co-author index.)
2024-
The Co-Evolution of Computational Physics and High-Performance Computing, Dongarra, J., Keyes, D., Nat Rev Phys (2024). https://doi.org/10.1038/s42254-024-00750-z
MAGMA: Enabling Exascale Performance with
Accelerated BLAS and LAPACK for Diverse GPU
Architectures, Abdelfattah, Ahmad;
Beams, Natalie; Carson, Robert; Ghysels, Pieter; Kolev,
Tzanio; Stitt, Thomas; Vargas, Arturo; Tomov, Stanimire;
Dongarra, Jack, submitted to International Journal
of High Performance Computing Applications, February
2024. https://doi.org/10.1177/10943420241261960
Trends in computational science: natural
language processing and network analysis of 23 years
of ICCS publications, Lijing Luo,
Sergey Kovalchuk, Valeria Krzhizhanovskaya, Maciej
Paszynski and Jack Dongarra, ICCS 2024 conference, https://doi.org/10.1007/978-3-031-63751-3_2
HPC Forecast: Cloudy and Uncertain, Reed, D.,
Gannon, D., Dongarra, J., Communications of the
ACM, February 2023, Vol. 66, No. 2, pp. 82-90,
DOI: 10.1145/3552309. Also https://vimeo.com/784558423
Special
Issue on Clusters, Clouds, and Data for Scientific
Computing, Jack Dongarra
and Bernard Tourancheau, The International Journal of
High Performance Computing Applications,
https://doi.org/10.1177/10943420231180188
GPU-based LU
Factorization and Solve on Batches of Matrices with
Band Structure, Ahmad
Abdelfattah, Stanimire Tomov, Piotr Luszczek, Hartwig
Anzt, Jack Dongarra, SC-W 2023: Proceedings of the SC
'23 Workshops of The International Conference on High
Performance Computing, Network, Storage, and Analysis.
Task-Based
Polar Decomposition Using SLATE on Massively Parallel
Systems with Hardware Accelerators, Dalal
Sukkari, Mark Gates, Mohammed Al Farhan, Hartwig Anzt,
Jack Dongarra, Workshops of The International Conference
on High Performance Computing, Network, Storage, and
Analysis (SC23), DOI: 10.1145/3624062.3624248
Combining Multitask and Transfer Learning with Deep
Gaussian Processes for Autotuning-Based Performance
Engineering, Luszczek, P., W. M.
Sid-Lakhdar, and J. Dongarra, The International Journal
of High Performance Computing Applications, March 2023.
doi.org/10.1177/1094342023116636
Using
Additive Modifications in LU Factorization Instead of
Pivoting, Lindquist,
N., P. Luszczek, and J. Dongarra, 37th ACM
International Conference on Supercomputing (ICS'23),
Orlando, FL, ACM, June 2023. https://doi.org/10.1145/3577193.3593731
Addressing
Irregular Patterns of Matrix Computations on GPUs and
Their Impact on Applications Powered by Sparse Direct
Solvers,
Abdelfattah,
A., P. Ghysels, W. Boukaram, S. Tomov, X. Sherry Li, and J. Dongarra, 2022
International Conference for High Performance
Computing, Networking, Storage and Analysis (SC22),
Dallas, TX, IEEE Computer Society, pp. 354-367, November
2022.
Randomized
numerical linear algebra: A perspective on the field
with an eye to software,
Riley Murray, James Demmel, Michael W Mahoney, N
Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik,
Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles
E Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra,
arXiv preprint arXiv:2302.11474, February 22, 2023.
Reshaping
Geostatistical
Modeling and Prediction for Extreme-Scale
Environmental Applications,
Cao,
Qinglei and Abdulah, Sameh and Alomairy, Rabab and Pei,
Yu and Nag, Pratik and Bosilca, George and Dongarra,
Jack and Genton, Marc G. and Keyes, David E. and Ltaief,
Hatem and Sun, Yin,
SC22:
International Conference for High Performance
Computing, Networking, Storage and Analysis,
Dallas, TX, USA, 2022, pp. 1-12, doi:
10.1109/SC41404.2022.00007.
Proposed Consistent Exception Handling for the BLAS and LAPACK, Demmel, James and Dongarra, Jack and Gates, Mark and Henry, Greg and Langou, Julien and Li, Xiaoye and Luszczek, Piotr and Pereira, Weslley and Riedy, Jason and Rubio-González, Cindy,
2022
IEEE/ACM Sixth International Workshop on Software
Correctness for HPC Applications (Correctness),
Dallas, TX, USA, 2022, pp. 1-9, doi:
10.1109/Correctness56720.2022.00006.
Mixed-Precision
Algorithm
for Finding Selected Eigenvalues and Eigenvectors of
Symmetric and Hermitian Matrices,
Y. M. Tsai, P. Luszczek, and J. Dongarra, 2022
IEEE/ACM Workshop on Latest Advances in Scalable
Algorithms for Large-Scale Heterogeneous Systems
(ScalAH), Dallas, TX, USA, 2022, pp.
43-50, doi: 10.1109/ScalAH56622.2022.00011.
Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era, Gates, Mark and YarKhan, Asim and Sukkari, Dalal and Akbudak, Kadir and Cayrols, Sebastien and Bielich, Daniel and Abdelfattah, Ahmad and Farhan, Mohammed Al and Dongarra, Jack, 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Dallas, TX, USA, 2022, pp. 36-46, doi: 10.1109/P3HPC56579.2022.00009.
Can
the
United States Maintain Its Leadership in
High-Performance Computing?
Dongarra, Deelman, et al., a report from the ASCAC
Subcommittee on American Competitiveness and Innovation
to the ASCR office, https://doi.org/10.2172/1989107
Memory Traffic and
Complete Application Profiling with PAPI
Multi-Component Measurements, Barry, D., H.
Jagode,
A. Danalis, and J.
Dongarra,
2023 IEEE
International Parallel and Distributed Processing
Symposium Workshops (IPDPSW), St.
Petersburg, Florida, IEEE, 2023. DOI:10.1109/IPDPSW59300.2023.00070
A Not So Simple Matter of Software; The Evolution of
Mathematical Software: Software and Algorithms Follow
the Hardware, Jack Dongarra, accepted in CACM, August 2,
2022.
Threshold
Pivoting for Dense LU Factorization, Lindquist, N.,
Gates, M., Luszczek, P., Dongarra, J., 2022 IEEE/ACM
Workshop on Latest Advances in Scalable Algorithms for
Large-Scale Heterogeneous Systems (ScalAH). 10.1109/ScalAH56622.2022.00010
Performance
Framework and Runtime Analysis of Parallel FFT on
Large Multi-GPU Systems, Ayala, A.,
Tomov, S., Stoyanov, M., Haidar, A., Dongarra J., 2022
IEEE International Parallel and Distributed Processing
Symposium Workshops (IPDPSW), 10.1109/IPDPSW55747.2022.00072
A Framework to Exploit Data
Sparsity in Tile Low-Rank Cholesky Factorization, Q. Cao, et al., in 2022 IEEE
International Parallel and Distributed Processing
Symposium (IPDPS), Lyon, France, 2022 pp. 414-424.
doi: 10.1109/IPDPS53621.2022.00047
Batch QR
Factorization on GPUs: Design, Optimization, and
Tuning, Abdelfattah,
A., S.
Tomov, and J. Dongarra, ICCS 2022, Lecture
Notes in Computer Science, vol. 13350, Cham,
Springer International Publishing, June 2022. DOI:
10.1007/978-3-031-08751-6_5
Evaluating Data
Redistribution in PaRSEC, Cao, Bosilca, Losada, Wu, Zhong,
Dongarra, IEEE Transactions on Parallel and
Distributed Systems, DOI: 10.1109/TPDS.2021.3131657
Using Long
Vector Extensions for MPI Reductions, Dong Zhong, Qinglei Cao, George Bosilca,
Jack Dongarra, Parallel Computing, Volume 109, March 2022. https://doi.org/10.1016/j.parco.2021.102871
Accelerating
Geostatistical Modeling and Prediction With
Mixed-Precision Computations: A High-Productivity
Approach With PaRSEC, Abdulah,
S., Q. Cao, Y. Pei, G. Bosilca, J. Dongarra,
M. G. Genton, D. E. Keyes, H. Ltaief,
and Y. Sun, IEEE Transactions on Parallel
and Distributed Systems, vol. 33,
issue 4, pp. 964 - 976, April 2022. DOI:10.1109/TPDS.2021.3084071
Comparing
Distributed Termination Detection Algorithms for
Modern HPC Platforms,
Bosilca,
G., A. Bouteiller, T. Herault, V. Le Fèvre,
Y. Robert, and J. Dongarra, International Journal of
Networking and Computing, vol. 12, issue 1, pp. 26 - 46, January 2022. DOI:10.15803/ijnc.12.1_26
Integrating Deep
Learning in Domain Sciences at Exascale,
Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus
Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols,
Stanimire Tomov, Kwai Wong, Junqi Yin, in: Nichols,
J., Verastegui, B., Maccabe, A., Hernandez, O.,
Parete-Koon, S., Ahearn, T. (eds), Driving Scientific and
Engineering Discoveries Through the Convergence of HPC,
Big Data and AI, SMC 2020, Communications in Computer
and Information Science, vol. 1315, Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-63393-6_3 A
pdf
version is available.
A More Portable HeFFTe:
Implementing a Fallback Algorithm for Scalable Fourier
Transforms,
D. Sharp, M. Stoyanov, S. Tomov and J. Dongarra,
2021 IEEE High Performance Extreme Computing
Conference (HPEC), 2021, pp. 1-5, doi:
10.1109/HPEC49654.2021.9622811.
Revisiting Credit
Distribution Algorithms for Distributed Termination
Detection, G. Bosilca, A. Bouteiller, T.
Herault, V. Le Fèvre, Y. Robert and J. Dongarra, 2021
IEEE International Parallel and Distributed Processing
Symposium Workshops (IPDPSW), 2021, pp. 611-620,
doi: 10.1109/IPDPSW52791.2021.00095.
Accelerating Multi-Process Communication for
Parallel 3-D FFT, Ayala, A., S. Tomov, M.
Stoyanov, A. Haidar, J. Dongarra, The
International Conference for High Performance
Computing, Networking, Storage, and Analysis (SC 21),
St. Louis, MO, Nov 2021.
Accelerating FFT
towards Exascale Computing, Ayala, A., S. Tomov, A.
Haidar, M. Stoyanov, S. Cayrols, J. Li, G. Bosilca, and
J. Dongarra, NVIDIA GPU Technology Conference
(GTC2021), Digital, March 2021.
A pdf
version is available.
An Introduction to High Performance Computing and
its Intersection with Advances in Modeling REEs and
Actinides, Deborah A. Penchoff, Edward
Valeev, Heike Jagode, Piotr
Luszczek, Anthony Danalis, George Bosilca,
Robert J. Harrison, Jack Dongarra, Theresa L.
Windus, in “Computational Science for Lanthanides and Actinides”,
American Chemical Society. DOI:
10.1021/bk-2021-1388.ch001 A
pdf
version is available.
Scalability
Issues in FFT Computation, Ayala A., Tomov S.,
Stoyanov M., Dongarra J., PaCT 2021, Lecture Notes in
Computer Science, Volume 12942, September
2021.
Exploiting
Block Structures of KKT Matrices for Efficient
Solution of Convex Optimization Problems, Iqbal,
Z., S. Nooshabadi, I. Yamazaki, S. Tomov, and J.
Dongarra, IEEE Access, August 2021.
A pdf
version is available.
20
years of computational science: Selected papers from
2020 International Conference on Computational Science,
Kovalchuk, Sergey V., Valeria V. Krzhizhanovskaya, P. M.
A. Sloot, Gábor Závodszky, Michael H Lees, M. Paszynski,
and J. Dongarra, Journal of Computational Science,
July 2021. A
pdf
version is available.
Accelerating
Restarted GMRES with Mixed Precision Arithmetic,
Neil Lindquist, Piotr Luszczek, and Jack Dongarra, IEEE
Transactions on Parallel and Distributed Systems,
June 2021.
A pdf
version is available.
A Set of Batched
Basic Linear Algebra Subprograms,
Abdelfattah, A., T. Costa, J. Dongarra, M. Gates, A.
Haidar, S. Hammarling, N. J. Higham, J. Kurzak, P.
Luszczek, S. Tomov, et al., ACM
Transactions on Mathematical Software, accepted
October 2020.
A pdf
version is available.
A
Survey of Numerical Linear Algebra Methods Utilizing
Mixed Precision Arithmetic, Ahmad Abdelfattah,
Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean,
Jack Dongarra, Alyson Fox, Mark Gates, Nicholas J.
Higham, Xiaoye S. Li, Jennifer Loe, Piotr Luszczek,
Srikara Pranesh, Siva Rajamanickam, Tobias Ribizel,
Barry F. Smith, Kasia Swirydowicz, Stephen Thomas,
Stanimire Tomov, Yaohung M. Tsai, and Ulrike Meier Yang,
International Journal of High Performance Computing
Applications, February 2021. https://journals.sagepub.com/doi/10.1177/10943420211003313
Distributed-Memory Multi-GPU Block-Sparse Tensor
Contraction for Electronic Structure,
Herault, T., Y. Robert, G. Bosilca, R. Harrison, C.
Lewis, E. Valeev, and J. Dongarra, 35th IEEE
International Parallel & Distributed Processing
Symposium (IPDPS 2021), Portland, OR, IEEE, May
2021, accepted December 2020.
Leveraging PaRSEC Runtime Support
to Tackle Challenging 3D Data-Sparse Matrix Problems,
Cao, Q., Y. Pei, K. Akbudak, G. Bosilca, H. Ltaief, D.
Keyes, and J. Dongarra, 35th IEEE International
Parallel & Distributed Processing Symposium (IPDPS
2021), Portland, OR, IEEE, May 2021, accepted
December 2020.
Harnessing the
Computing Continuum for Programming Our World,
Beckman, P., J. Dongarra, N. Ferrier, G. Fox, T.
Moore, D. Reed, and M. Beck, Fog Computing:
Theory and Practice, John Wiley & Sons,
Inc., 2020. DOI: 10.1002/9781119551713.ch7
A PDF version is available.
MAGMA Templates for
Scalable Linear Algebra on Emerging Architectures,
Farhan, M. Al, A. Abdelfattah, S. Tomov, M. Gates,
D. Sukkari, A. Haidar, R. Rosenberg, and J.
Dongarra, The International Journal of High
Performance Computing Applications, vol. 34,
issue 6, pp. 645-658, November 2020. DOI: https://doi.org/10.1177/1094342020938421 A PDF version is
available.
Design,
Optimization, and
Benchmarking of Dense Linear Algebra
Algorithms on AMD GPUs,
Brown, C., A. Abdelfattah, S. Tomov, and J.
Dongarra, 2020 IEEE
High Performance Extreme Computing Virtual
Conference, IEEE, September 2020.
HAN: A
Hierarchical AutotuNed Collective Communication
Framework,
Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T.
Patinyasakdikul, D. Zhong, and J. Dongarra, IEEE
Cluster Conference, Kobe, Japan, Best Paper
Award, IEEE Computer Society Press, September 2020.
Flexible Data
Redistribution in a Task-Based Runtime System, Cao, Q., G. Bosilca, W.
Wu, D. Zhong, A. Bouteiller, and J. Dongarra, IEEE
International Conference on Cluster Computing
(Cluster 2020), Kobe, Japan, IEEE, September
2020. DOI: https://doi.org/10.1109/CLUSTER49012.2020.00032
A PDF
version is available.
Evaluating the
Performance of NVIDIA’s A100 Ampere GPU for
Sparse and Batched Computations, Anzt,
H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J.
Dongarra, 2020 IEEE/ACM Performance Modeling,
Benchmarking and Simulation of High Performance
Computer Systems (PMBS): IEEE, November 2020.
High-Order Finite
Element Method using Standard and Device-Level
Batch GEMM on GPUs, Beams, N., A.
Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y.
Dudouit, 2020 IEEE/ACM 11th Workshop on Latest
Advances in Scalable Algorithms for Large-Scale
Systems (ScalA): IEEE, November 2020.
Replacing
Pivoting in Distributed Gaussian Elimination
with Randomized Techniques,
Lindquist, N., P. Luszczek, and J. Dongarra, 2020
IEEE/ACM 11th Workshop on Latest Advances in
Scalable Algorithms for Large-Scale Systems
(ScalA): IEEE, November 2020. A PDF
version is available.
Using Advanced Vector
Extensions AVX-512 for MPI Reduction, Zhong, D., Q. Cao, G.
Bosilca, and J. Dongarra, EuroMPI/USA '20: 27th
European MPI Users' Group Meeting, Austin, TX,
September 2020. DOI: https://doi.org/10.1145/3416315.3416316
Mixed-Precision
Iterative Refinement using Tensor Cores on GPUs to
Accelerate Solution of Linear Systems, Haidar,
A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J.
Higham, Proceedings
of the Royal Society A, vol. 476, issue 2243,
November 2020. DOI: https://doi.org/10.1098/rspa.2020.0110
A
PDF version is
available.
Matrix Multiplication on
Batches of Small Matrices in Half and Half-Complex
Precisions, Abdelfattah, A., J. Dongarra, and S.
Tomov, Journal
of Parallel and Distributed Computing, vol.
145, pp. 188–201, November 2020. DOI: https://doi.org/10.1016/j.jpdc.2020.07.001
A PDF version
is available.
Translational Process: Mathematical
Software Perspective, Dongarra, J., M. Gates, P.
Luszczek, and S. Tomov, Journal of Computational
Science, August 2020. DOI: https://doi.org/10.1016/j.jocs.2020.101216
A PDF
version is available.
Scalable
Data Generation for Evaluating Mixed-Precision
Solvers, Luszczek, P., Y. Tsai, N. Lindquist, H.
Anzt, and J. Dongarra, 2020 IEEE High Performance
Extreme Computing Conference (HPEC): IEEE, September
2020. A PDF
version is available.
Extreme-Scale
Task-Based Cholesky Factorization Toward Climate and
Weather Prediction Applications, Cao, Q., Y.
Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief,
D. Keyes, and J. Dongarra, The Platform for
Advanced Scientific Computing (PASC) Conference
(PASC20), 2:1-2:11 DOI: https://doi.org/10.1145/3394277.3401846
Numerical Algorithms for High-Performance
Computational Science, Dongarra, J., L.
Grigori, and N. J. Higham, Philosophical Transactions
of the Royal Society A, vol. 378, issue 2166, 2020.
DOI: 10.1098/rsta.2019.0066
A PDF
version is available.
FFT-ECP API and High-Performance Library Prototype
for 2-D and 3-D FFTs on Large-Scale Heterogeneous
Systems with GPUs, Tomov, S., A. Ayala, A.
Haidar, and J. Dongarra, no. FFT-ECP STML13-27, Innovative
Computing Laboratory, University of Tennessee, January
2020.
A PDF
version is available.
Formulation of Requirements for new PAPI++
Software Package: Part I: Survey Results, Jagode,
H., A. Danalis, and J. Dongarra, PAPI++ Working
Notes, no. 1, ICL-UT-20-02, Innovative
Computing Laboratory, University of Tennessee Knoxville,
January 2020.
A PDF
version is available.
Project-Based Research and Training in High
Performance Data Sciences, Data Analytics, and Machine
Learning, Wong, K., S. Tomov, and J.
Dongarra, The Journal of Computational Science
Education, vol. 11, issue 1, pp. 36-44, January 2020.
DOI: 10.22369/issn.2153-4136/11/1/7
A PDF
version is available.
Performance Tuning SLATE, Gates, M., A.
Charara, A. YarKhan, D. Sukkari, M. Al Farhan, and J.
Dongarra, SLATE Working Notes, no. 14,
ICL-UT-20-01, Innovative Computing Laboratory, University
of Tennessee, January 2020.
A PDF
version is available.
Load-balancing Sparse Matrix Vector Product
Kernels on GPUs, Anzt, H., Y-C. Chen, T.
Cojean, J. Dongarra, G. Flegar, R. Nayak, E. S.
Quintana-Orti, Y. Tsai, and W. Wang, ACM Transactions
on Parallel Computing, issue 2, March 2020. DOI: 10.1145/3380930
A PDF
version is available.
Asynchronous SGD for DNN Training on Shared-Memory
Parallel Architectures, Lopez, F., E. Chow,
S. Tomov, and J. Dongarra, Innovative Computing
Laboratory Technical Report, no. ICL-UT-20-04,
University of Tennessee, Knoxville, March 2020.
A PDF
version is available.
Reducing the Amount of out-of-core Data Access for
GPU-Accelerated Randomized SVD, Lu, Y., I.
Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J.
Dongarra, Concurrency and Computation: Practice and
Experience, April 2020. DOI:
10.1002/cpe.5754
A PDF
version is available.
Using Arm Scalable Vector Extension to optimize
Open MPI, Zhong, D., P. Shamis, Q. Cao, G.
Bosilca, and J. Dongarra, 20th IEEE/ACM International
Symposium on Cluster, Cloud and Internet Computing
(CCGRID 2020), Melbourne, Australia, IEEE/ACM, May
2020.
A pdf
version is available.
Asynchronous SGD for DNN training on Shared-memory
Parallel Architectures, Lopez, F., E. Chow,
S. Tomov, and J. Dongarra, Workshop on Scalable Deep
Learning over Parallel And Distributed Infrastructures
(ScaDL 2020), May 2020.
A PDF
version is available.
Mixed-Precision Solution of Linear Systems Using
Accelerator-Based Computing, Haidar, A., H.
Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, Innovative
Computing Laboratory Technical Report, no.
ICL-UT-20-05, University of Tennessee, May 2020.
A PDF
version is available.
Communication Avoiding 2D Stencil Implementations
over PaRSEC Task-Based Runtime, Pei, Y., Q.
Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J.
Dongarra, 21st IEEE International Workshop on
Parallel and Distributed Scientific and Engineering
Computing (PDSEC 2020), New Orleans, LA, IEEE, May
2020.
A PDF
version is available.
Twenty Years of Computational Science,
Krzhizhanovskaya, V., G. Závodszky, M. Lees, J. Dongarra,
P. Sloot, S. Brissos, and J. Teixeira, International
Conference on Computational Science (ICCS 2020),
Amsterdam, Netherlands, June 2020.
heFFTe: Highly Efficient FFT for Exascale,
Ayala, A., S. Tomov, A. Haidar, and J. Dongarra, International
Conference on Computational Science (ICCS 2020),
Amsterdam, Netherlands, June 2020.
A pdf
version is available.
Investigating the Benefit of FP16-enabled
Mixed-precision Solvers for Symmetric Positive Definite
Matrices using GPUs, Abdelfattah, A., S.
Tomov, and J. Dongarra, International Conference on
Computational Science (ICCS 2020), Amsterdam,
Netherlands, Elsevier, June 2020.
A pdf
version is available.
Report on the Fujitsu Fugaku System,
Dongarra, J., Innovative Computing Laboratory
Technical Report, no. ICL-UT-20-06, University of
Tennessee, June 2020.
A PDF
version is available.
Improving the Performance of the GMRES method
using Mixed-Precision Techniques, Lindquist,
N., P. Luszczek, and J. Dongarra, Smoky
Mountains Computational Sciences & Engineering
Conference (SMC2020), August 2020.
A pdf
version is available.
SLATE Users' Guide,
Gates, M., A. Charara, J. Kurzak, A. YarKhan, M. Al
Farhan, D. Sukkari, and J. Dongarra, SLATE Working
Notes, no. 10, ICL-UT-19-01:
Innovative Computing Laboratory, University of
Tennessee, July 2020.
Prospectus for the
Next LAPACK and ScaLAPACK Libraries: Basic ALgebra
LIbraries for Sustainable Technology with
Interdisciplinary Collaboration (BALLISTIC),
Demmel, J., J. Dongarra, J. Langou, J. Langou, P.
Luszczek, and M. Mahoney, LAPACK Working Notes, no.
297, ICL-UT-20-07: University of Tennessee.
Integrating
Deep
Learning in Domain Sciences at Exascale,
Archibald, R., E. Chow, E. D’Azevedo, J. Dongarra, M.
Eisenbach, R. Febbo, F. Lopez, D. Nichols, S. Tomov, K.
Wong, and J. Yin, Innovative
Computing Laboratory Technical Report no. ICL-UT-20-10:
University of Tennessee, August 2020.
A PDF version is available.
SLATE Performance
Report: Updates to Cholesky and LU Factorizations, YarKhan, A., M. Al
Farhan, D. Sukkari, M. Gates, and J. Dongarra,
Innovative Computing Laboratory Technical Report, no.
ICL-UT-20-14: University of Tennessee, October 2020.
Adaptive Precision in Block-Jacobi Preconditioning
for Iterative Sparse Linear System Solvers, Anzt,
H., J. Dongarra, G. Flegar, N. J. Higham, and E. S.
Quintana-Orti, Concurrency and Computation: Practice and
Experience, vol. 31, no. 6, pp. e4460, March 2019. DOI:
10.1002/cpe.4460
A PDF
version is available.
Algorithms and Optimization Techniques for
High-Performance Matrix-Matrix Multiplications of Very
Small Matrices, Masliah, I., A. Abdelfattah, A.
Haidar, S. Tomov, M. Baboulin, J. Falcou, and J.
Dongarra, Parallel Computing, vol. 81, pp. 1-21, January
2019. DOI: 10.1016/j.parco.2018.10.003
A PDF
version is available.
CEED ECP Milestone Report: Performance Tuning of CEED
Software and 1st and 2nd Wave Apps: Zenodo, Tomov,
S., A. Abdelfattah, V. Barra, N. Beams, J. Brown, J-S.
Camier, V. Dobrev, J. Dongarra, Y. Dudouit, P. Fischer,
et al., October 2019. DOI: 10.5281/zenodo.3477618
A PDF
version is available.
Characterization of Power Usage and Performance in
Data-Intensive Applications using MapReduce over MPI,
Davis, J., T. Gao, S. Chandrasekaran, H. Jagode, A.
Danalis, P. Balaji, J. Dongarra, and M. Taufer, 2019
International Conference on Parallel Computing
(ParCo2019), Prague, Czech Republic, September 2019.
Checkpointing Strategies for Shared High-Performance
Computing Platforms, Herault, T., Y. Robert, A.
Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J.
Dongarra, International Journal of Networking and
Computing, vol. 9, no. 1, pp. 28-52, 2019.
A PDF
version is available.
Comparing the Performance of Rigid, Moldable, and
Grid-Shaped Applications on Failure-Prone HPC
Platforms, Le Fevre, V., T. Herault, Y. Robert, A.
Bouteiller, A. Hori, G. Bosilca, and J. Dongarra,
Parallel Computing, vol. 85, pp. 1-12, July 2019. DOI:
10.1016/j.parco.2019.02.002
A PDF
version is available.
Counter Inspection Toolkit: Making Sense out of
Hardware Performance Events, Danalis, A., H.
Jagode, H. Hanumantharayappa, S. Ragate, and J.
Dongarra, 11th International Workshop on Parallel Tools
for High Performance Computing, Dresden, Germany, Cham,
Switzerland: Springer, February 2019. DOI:
10.1007/978-3-030-11987-4_2
A PDF
version is available.
Design and Implementation for FFT-ECP on Distributed
Accelerated Systems, Tomov, S., A. Haidar, A.
Ayala, D. Schultz, and J. Dongarra, Innovative Computing
Laboratory Technical Report, no. ICL-UT-19-05:
University of Tennessee, April 2019.
A PDF
version is available.
Distributed-Memory Lattice H-Matrix Factorization,
Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, The
International Journal of High Performance Computing
Applications, vol. 33, issue 5, pp. 1046-1063, August
2019. DOI: 10.1177/1094342019861139
A PDF
version is available.
An Empirical View of SLATE Algorithms on Scalable
Hybrid System, YarKhan, A., J. Kurzak, A.
Abdelfattah, and J. Dongarra, Innovative Computing
Laboratory Technical Report, no. ICL-UT-19-08:
University of Tennessee, Knoxville, September 2019.
A PDF
version is available.
Evaluation of Directive-Based Performance Portable
Programming Models, Lopez, M. G., W. Joubert, V.
Larrea, O. Hernandez, A. Haidar, S. Tomov, and J.
Dongarra, International Journal of High Performance
Computing and Networking, vol. 14, issue 2, pp. 165-182.
DOI: http://dx.doi.org/10.1504/IJHPCN.2017.10009064
A PDF
version is available.
Evaluation of Programming Models to Address Load
Imbalance on Distributed Multi-Core CPUs: A Case Study
with Block Low-Rank Factorization, Pei, Y., G.
Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, PAW-ATM
Workshop at SC19, Denver, CO, ACM, November 2019.
A PDF
version is available.
Fast Batched Matrix Multiplication for Small Sizes
using Half Precision Arithmetic on GPUs,
Abdelfattah, A., S. Tomov, and J. Dongarra, 33rd IEEE
International Parallel and Distributed Processing
Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May
2019.
A PDF
version is available.
FFT-ECP Implementation Optimizations and Features
Phase, Tomov, S., A. Haidar, A. Ayala, H. Shaiek,
and J. Dongarra, Innovative Computing Laboratory
Technical Report, no. ICL-UT-19-12: University of
Tennessee, October 2019.
A PDF
version is available.
Generic Matrix Multiplication for Multi-GPU
Accelerated Distributed-Memory Platforms over PaRSEC,
Herault, T., Y. Robert, G. Bosilca, and J. Dongarra,
ScalA'19: 10th Workshop on Latest Advances in Scalable
Algorithms for Large-Scale Systems, Denver, CO, IEEE,
November 2019.
A PDF
version is available.
GPUDirect MPI Communications and Optimizations to
Accelerate FFTs on Exascale Systems, Shaiek, H.,
S. Tomov, A. Ayala, A. Haidar, and J. Dongarra,
EuroMPI'19 Posters, Zurich, Switzerland, no.
icl-ut-19-06: ICL, September 2019.
A PDF
version is available.
Hands-on Research and Training in High-Performance
Data Sciences, Data Analytics, and Machine Learning
for Emerging Environments, Wong, K., S. Tomov, and
J. Dongarra, ISC High Performance, Frankfurt, Germany,
Springer International Publishing, June 2019.
A PDF
version is available.
Impacts of Multi-GPU MPI Collective Communications on
Large FFT Computation, Ayala, A., S. Tomov, X.
Luo, H. Shaiek, A. Haidar, G. Bosilca, and J. Dongarra,
Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO,
November 2019.
A PDF
version is available.
Increasing Accuracy of Iterative Refinement in
Limited Floating-Point Arithmetic on Half-Precision
Accelerators, Luszczek, P., I. Yamazaki, and J.
Dongarra, IEEE High Performance Extreme Computing
Conference (HPEC 2019), Best Paper Finalist, Waltham,
MA, IEEE, September 2019.
A PDF
version is available.
Least Squares Solvers for Distributed-Memory Machines
with GPU Accelerators, Kurzak, J., M. Gates, A.
Charara, A. YarKhan, and J. Dongarra, ACM International
Conference on Supercomputing (ICS '19), Phoenix,
Arizona, ACM, pp. 117–126, June 2019. DOI:
10.1145/3324989.3325719
A PDF
version is available.
Linear Systems Solvers for Distributed-Memory
Machines with GPU Accelerators, Kurzak, J., M.
Gates, A. Charara, A. YarKhan, I. Yamazaki, and J.
Dongarra, Euro-Par 2019: Parallel Processing, vol.
11725: Springer, pp. 495–506, August 2019. DOI:
10.1007/978-3-030-29400-7_35
MagmaDNN: Towards High-Performance Data Analytics and
Machine Learning for Data-Driven Scientific Computing,
Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K.
Wong, and J. Dongarra, ISC High Performance, Frankfurt,
Germany, Springer International Publishing, June 2019.
A PDF
version is available.
Massively Parallel Automated Software Tuning,
Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J.
Dongarra, 48th International Conference on Parallel
Processing (ICPP 2019), Kyoto, Japan, ACM Press, August
2019. DOI: 10.1145/3337821.3337908
A PDF
version is available.
Solving Linear Diophantine Systems on Parallel
Architectures, Zaitsev, D., S. Tomov, and J.
Dongarra, IEEE Transactions on Parallel and Distributed
Systems, vol. 30, issue 5, pp. 1158-1169, May 2019,
2018. DOI: http://dx.doi.org/10.1109/TPDS.2018.2873354
A PDF
version is available.
Matrix Powers Kernels for Thick-Restart Lanczos with
Explicit External Deflation, Bai, Z., J. Dongarra,
D. Lu, and I. Yamazaki, International Parallel and
Distributed Processing Symposium (IPDPS), Rio de
Janeiro, Brazil, IEEE, May 2019.
A PDF
version is available.
PAPI Software-Defined Events for in-Depth Performance
Analysis, Jagode, H., A. Danalis, H. Anzt, and J.
Dongarra, The International Journal of High Performance
Computing Applications, vol. 33, issue 6, pp. 1113-1127,
November 2019.
A PDF
version is available.
ParILUT - A Parallel Threshold ILU for GPUs,
Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J.
Dongarra, IEEE International Parallel and Distributed
Processing Symposium (IPDPS), Rio de Janeiro, Brazil,
IEEE, May 2019. DOI: 10.1109/IPDPS.2019.00033
A PDF
version is available.
Performance Analysis of Tile Low-Rank Cholesky
Factorization Using PaRSEC Instrumentation Tools,
Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G.
Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Workshop
on Programming and Performance Visualization Tools
(ProTools 19) at SC19, Denver, CO, ACM, November 2019.
A PDF
version is available.
Performance of Asynchronous Optimized Schwarz with
One-sided Communication, Yamazaki, I., E. Chow, A.
Bouteiller, and J. Dongarra, Parallel Computing, vol.
86, pp. 66-81, August 2019. DOI:
10.1016/j.parco.2019.05.004
A PDF
version is available.
PLASMA: Parallel Linear Algebra Software for
Multicore Using OpenMP, Dongarra, J., M. Gates, A.
Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A.
YarKhan, M. Abalenkovs, N. Bagherpour, et al., ACM
Transactions on Mathematical Software, vol. 45, issue 2,
June 2019. DOI: 10.1145/3264491
A PDF
version is available.
Progressive Optimization of Batched LU Factorization
on GPUs, Abdelfattah, A., S. Tomov, and J.
Dongarra, IEEE High Performance Extreme Computing
Conference (HPEC’19), Waltham, MA, IEEE, September
2019.
A PDF
version is available.
Race to Exascale, Dongarra, J., S. Gottlieb, and
W. T. Kramer, Computing in Science and Engineering, vol.
21, issue 1, pp. 4-5, March 2019. DOI:
10.1109/MCSE.2018.2882574
A PDF
version is available.
SLATE: Design of a Modern Distributed and Accelerated
Linear Algebra Library, Gates, M., J. Kurzak, A.
Charara, A. YarKhan, and J. Dongarra, International
Conference for High Performance Computing, Networking,
Storage and Analysis (SC19), Denver, CO, ACM, November
2019. DOI: 10.1145/3295500.3356223
A PDF
version is available.
SLATE Developers' Guide, Charara, A., M. Gates,
J. Kurzak, A. YarKhan, and J. Dongarra, SLATE Working
Notes, no. 11, ICL-UT-19-02: Innovative Computing
Laboratory, University of Tennessee, December 2019.
A PDF
version is available.
SLATE Mixed Precision Performance Report,
Charara, A., J. Dongarra, M. Gates, J. Kurzak, and A.
YarKhan, Innovative Computing Laboratory Technical
Report, no. ICL-UT-19-03: University of Tennessee, April
2019.
A PDF
version is available.
SLATE Users' Guide, Gates, M., A. Charara, J.
Kurzak, and J. Dongarra, SLATE Working Notes, no. 10,
ICL-UT-19-01: Innovative Computing Laboratory,
University of Tennessee, January 2019.
SLATE Working Note 12: Implementing Matrix Inversions,
Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J.
Dongarra, SLATE Working Notes, no. 12, ICL-UT-19-04:
Innovative Computing Laboratory, University of
Tennessee, June 2019.
A PDF
version is available.
SLATE Working Note 13: Implementing Singular Value
and Symmetric/Hermitian Eigenvalue Solvers, Gates,
M., M. Al Farhan, A. Charara, J. Kurzak, D. Sukkari, A.
YarKhan, and J. Dongarra, SLATE Working Notes, no. 13,
ICL-UT-19-07: Innovative Computing Laboratory,
University of Tennessee, September 2019.
A PDF
version is available.
Software-Defined Events through PAPI, Danalis,
A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra,
2019 IEEE International Parallel and Distributed
Processing Symposium Workshops (IPDPSW), Rio de Janeiro,
Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069
A PDF
version is available.
Towards Continuous Benchmarking, Anzt, H., Y.
Chen Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak,
E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, Platform
for Advanced Scientific Computing Conference (PASC
2019), Zurich, Switzerland, ACM Press, June 2019. DOI:
10.1145/3324989.3325719
A PDF
version is available.
Towards Half-Precision Computation for Complex
Matrices: A Case Study for Mixed Precision Solvers on
GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra,
ScalA19: 10th Workshop on Latest Advances in Scalable
Algorithms for Large-Scale Systems, Denver, CO, IEEE,
November 2019.
A PDF
version is available.
What it Takes to keep PAPI Instrumental for the HPC
Community, Jagode, H., A. Danalis, and J.
Dongarra, 1st Workshop on Sustainable Scientific
Software (CW3S19), Collegeville, Minnesota, July 2019.
A PDF
version is available.
- 2018 -
Analyzing
Performance of BiCGStab with Hierarchical Matrix
on GPU Clusters, Yamazaki, I., A. Abdelfattah, A. Ida, S.
Ohshima, S. Tomov, R. Yokota, and J. Dongarra,
IEEE International Parallel
and Distributed Processing Symposium (IPDPS),
Vancouver, BC, Canada, IEEE, May 2018.
https://www.semanticscholar.org/paper/Analyzing-Performance-of-BiCGStab-with-Hierarchical-Yamazaki-Abdelfattah/9f5a5449a04f09fb8b27a106f363dc5a5035a1b9
A pdf
version is available.
Do moldable applications perform better on
failure-prone HPC platforms?, Le Fèvre, V., G. Bosilca, A. Bouteiller,
T. Herault, A. Hori, Y. Robert, and J.
Dongarra, 11th
Workshop on Resiliency in High Performance Computing
in Clusters, Clouds, and Grids, Turin, Italy,
Springer Verlag, August 2018.
https://link.springer.com/chapter/10.1007/978-3-030-10549-5_61 A pdf
version is available.
ADAPT: An Event-Based Adaptive Collective
Communication Framework, Luo, X., W. Wu, G. Bosilca, T.
Patinyasakdikul, L. Wang, and J. Dongarra, The 27th International Symposium on
High-Performance Parallel and Distributed Computing
(HPDC '18), Tempe, Arizona, ACM Press, June 2018,
http://dx.doi.org/10.1145/3208040.3208054.
A pdf
version is available.
Optimal Cooperative Checkpointing for Shared
High-Performance Computing Platforms, Herault, T., Y. Robert, A. Bouteiller, D.
Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, 2018 IEEE International Parallel and
Distributed Processing Symposium Workshops (IPDPSW),
Best Paper Award, Vancouver, BC, Canada, IEEE, May
2018.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8425494
A pdf
version is available.
Harnessing GPU Tensor Cores for Fast FP16
Arithmetic to Speed up Mixed-Precision Iterative
Refinement Solvers, Haidar,
A., S. Tomov, J. Dongarra, and N. J. Higham, International Conference for High
Performance Computing, Networking, Storage, and
Analysis (SC18), Dallas, TX, IEEE, November 2018.
DOI: 10.1109/SC.2018.00050
https://dl.acm.org/citation.cfm?id=3291719 A pdf
version is available.
The Design of Fast and Energy-Efficient Linear
Solvers: On the Potential of Half-Precision
Arithmetic and Iterative Refinement Techniques, Haidar, A., A. Abdelfattah, M. Zounon, P.
Wu, S. Pranesh, S. Tomov, and J. Dongarra, International Conference on Computational
Science (ICCS 2018), vol. 10860, Wuxi, China,
Springer, pp. 586–600, June 2018,
https://doi.org/10.1007/978-3-319-93698-7_45 A pdf
version is available.
Variable-Size Batched
Condition Number Calculation on GPUs, Anzt, H., J. Dongarra, G. Flegar, and T.
Gruetzmacher, SBAC-PAD, Lyon, France, September 2018.
https://ieeexplore.ieee.org/document/8645907 A pdf
version is available.
A Jaccard Weights Kernel Leveraging Independent
Thread Scheduling on GPUs, Anzt, H. and J. Dongarra, SBAC-PAD, Lyon, France, September 2018.
https://ieeexplore.ieee.org/document/8645946
A pdf
version is available.
Symmetric Indefinite Linear Solver using OpenMP
Task on Multicore Architectures, Yamazaki, I., J. Kurzak, P. Wu, M. Zounon,
and J. Dongarra, IEEE
Transactions on Parallel and Distributed Systems,
vol. 29, issue 8, pp. 1879–1892, August 2018,
http://dx.doi.org/10.1109/TPDS.2018.2808964. A pdf
version is available.
Computational Benefit of GPU Optimization for
Atmospheric Chemistry Modeling, Sun, J., J. Fu, J. Drake, Q. Zhu, A.
Haidar, M. Gates, S. Tomov, and J. Dongarra, Journal of Advances in Modeling Earth
Systems, vol. 10, issue 8, pp. 1952–1969, August
2018, https://doi.org/10.1029/2018MS001276. A pdf
version is available.
Evaluation of Dataflow Programming Models for
Electronic Structure Theory, Jagode, H., A. Danalis, R. Hoque, M.
Faverge, and J. Dongarra, Concurrency and Computation: Practice and
Experience: Special Issue on Parallel and
Distributed Algorithms, vol. 2018, issue e4490, pp.
1–20, May 2018, https://doi.org/10.1002/cpe.4490.
A pdf
version is available.
Accelerating NWChem Coupled Cluster through
Dataflow-Based Execution, Jagode, H., A. Danalis, and J. Dongarra, International Journal of High Performance
Computing Applications, vol. 32, issue 4, pp.
540–551, July 2018,
https://doi.org/10.1177/1094342016672543.
A pdf
version is available.
Investigating Power Capping toward
Energy-Efficient Scientific Applications, Haidar, A., H. Jagode, P. Vaccaro, A.
YarKhan, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and
Experience, vol. 2018, issue e4485, pp. 1–14, April
2018, http://dx.doi.org/10.1002/cpe.4485.
A pdf
version is available.
A Guide for Achieving High Performance with Very
Small Matrices on GPUs: A Case Study of Batched LU
and Cholesky Factorizations, Haidar, A., A. Abdelfattah, M. Zounon, S.
Tomov, and J. Dongarra, IEEE Transactions on Parallel and
Distributed Systems, vol. 29, issue 5, pp. 973–984,
May 2018, https://doi.org/10.1109/TPDS.2017.2783929.
A pdf
version is available.
Accelerating the SVD Two Stage Bidiagonal
Reduction and Divide and Conquer Using GPUs, Gates, M., S. Tomov, and J. Dongarra, Parallel Computing, vol. 74, pp. 3–18, May
2018, http://dx.doi.org/10.1016/j.parco.2017.10.004. A pdf
version is available.
The 30th Anniversary of the Supercomputing
Conference: Bringing the Future
Closer—Supercomputing History and the Immortality
of Now, Dongarra,
J., V. Getov, and K. Walsh, Computer, vol. 51, issue 10, pp. 74–85,
November 2018,
http://dx.doi.org/10.1109/MC.2018.3971352.
A pdf
version is available.
Autotuning Numerical Dense Linear Algebra for
Batched Computation With GPU Hardware
Accelerators, Dongarra, J., M. Gates, J. Kurzak, P.
Luszczek, and Y. Tsai, Proceedings of the IEEE, vol. 106, issue
11, pp. 2040–2055, November 2018,
http://dx.doi.org/10.1109/JPROC.2018.2868961.
A pdf
version is available.
A Failure Detector for HPC Platforms, G.
Bosilca, A. Bouteiller, A. Guermouche, T. Herault,
Y. Robert, P. Sens, and J. Dongarra, International
Journal of High Performance Computing Applications,
Volume 32 Issue 1, January 2018, pp 139-158,
http://journals.sagepub.com/doi/10.1177/1094342017711505
A pdf
version is available.
Accelerating the
SVD Bidiagonalization of a Batch of Small Matrices
using GPUs, Tingxing Dong, Azzam Haidar,
Stanimire Tomov, and Jack Dongarra, Journal of
Computational Science, January 2018,
https://doi.org/10.1016/j.jocs.2018.01.007
A pdf version
is available.
Adaptive Precision
in Block-Jacobi Preconditioning for Iterative
Sparse Linear System Solvers, H. Anzt, J.
Dongarra, G. Flegar, N. Higham, E. Quintana-Orti,
Concurrency and Computation: Practice and
Experience, http://dx.doi.org/10.1002/cpe.4460,
January, 2018.
A pdf
version is available.
A Guide for
Achieving High Performance With Very Small
Matrices on GPU: A Case Study of Batched LU and
Cholesky Factorizations, Azzam Haidar,
Ahmad Abdelfattah, Mawussi Zounon, Stanimire Tomov,
Jack Dongarra, IEEE Transactions on Parallel
and Distributed Systems, Vol. 29, No. 5, May 2018,
DOI: 10.1109/TPDS.2017.2783929
A pdf
version is available.
Investigating Power
Capping toward Energy-Efficient Scientific
Applications, A. Haidar, H. Jagode, A.
YarKhan, P. Vaccaro, S. Tomov, J. Dongarra,
Concurrency and Computation: Practice and
Experience, February 2018, DOI: 10.1002/cpe.4485.
A pdf
version is available.
Evaluation of
Dataflow Programming Models for Electronic
Structure Theory, H. Jagode, A. Danalis, R.
Hoque, M. Faverge, and J. Dongarra, Concurrency and
Computation: Practice and Experience, vol. 2018,
issue e4490, pp. 1–20, May 2018.
https://doi.org/10.1002/cpe.4490
A pdf
version is available.
Big Data and
Extreme-Scale Computing: Pathways to
Convergence-Toward a Shaping Strategy for a Future
Software and Data Ecosystem for Scientific Inquiry,
M. Asch, et al., International Journal of High
Performance Computing Applications, Volume 32 Issue
4, Fall 2018, pp 435-479.
doi.org/10.1177/1094342018778123
A pdf
version is available.
PARILUT - A New
Parallel Threshold ILU Factorization, H.
Anzt, E. Chow, J. Dongarra, SIAM SISC, Vol 40 No 4,
pp C503-C519. https://doi.org/10.1137/16M1079506
A pdf
version is available.
Autotuning in
High-Performance Computing Applications,
Prasanna Balaprakash, Jack Dongarra, Todd Gamblin,
Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris,
and Richard Vuduc, Proceedings of the IEEE, August 2018.
DOI:10.1109/JPROC.2018.2841200
A pdf
version is available.
Evaluation of
Directive-based Performance Portable Programming
Models, M. Graham Lopez, Wayne Joubert,
Veronica Vergara Larrea, Oscar Hernandez, Azzam
Haidar, Stanimire Tomov, Jack Dongarra, International
Journal of High Performance Computing and Networking,
2017. DOI: 10.1504/IJHPCN.2017.10009064
A pdf
version is available.
Accelerating the
SVD Two Stage Reduction and Divide-and-Conquer
Using GPUs, Mark Gates, Stanimire Tomov,
Jack Dongarra, Parallel Computing, accepted November
2017.
A pdf
version is available.
Batched One-sided
Factorizations of Tiny Matrices using GPUs:
Challenges and Countermeasures, Ahmad
Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack
Dongarra, Journal of Computational Science, Volume
26, May 2018, pp 226-236.
https://doi.org/10.1016/j.jocs.2018.01.005
A pdf
version is available.
Symmetric
Indefinite Linear Solver using OpenMP Task on
Manycore Architecture, I. Yamazaki, J.
Kurzak, P. Wu, M. Zounon, J. Dongarra, IEEE
Transactions on Parallel and Distributed Systems,
Volume: 29, Issue: 8, Aug. 1 2018.
10.1109/TPDS.2018.2808964
A pdf
version is available.
The Singular Value
Decomposition: Anatomy of Optimizing an Algorithm
for Exascale, J. Dongarra, M. Gates, A.
Haidar, J. Kurzak, P. Luszczek, S. Tomov, I.
Yamazaki, SIAM Review, vol. 60, issue 4, pp.
808–865, November 2018,
https://doi.org/10.1137/17M1117732.
A pdf
version is available.
PLASMA: Parallel
Linear Algebra Software for Multicore Using OpenMP,
J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P.
Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M.
Abalenkovs, N. Bagherpour, S. Hammarling, J. Sistek,
Accepted in ACM TOMS July 2018.
A pdf
version is available.
Using Jacobi
Iterations and Blocking for Solving Sparse
Triangular Systems in Incomplete Factorization
Preconditioning, Edmond Chow, Hartwig Anzt,
Jennifer Scott, Jack Dongarra, Journal of Parallel
and Distributed Computing, 119, pp 219-230, 2018.
https://doi.org/10.1016/j.jpdc.2018.04.017
A pdf
version is available.
Analysis and Design
Techniques towards High-Performance and
Energy-Efficient Dense Linear Solvers on GPUs,
IEEE Transactions on Parallel and Distributed
Systems, Accepted May 2018.
10.1109/TPDS.2018.2842785
A pdf
version is available.
Computational
Benefit of GPU Optimization for the Atmospheric
Chemistry Modeling, Jian Sun, Joshua Fu,
John Drake, Qingzhao Zhu, Azzam Haidar, Mark Gates,
Stanimire Tomov, Jack Dongarra, Journal of Advances
in Modeling Earth Systems,
https://doi.org/10.1029/2018MS001276
A pdf
version is available.
Analyzing
Performance of BiCGStab with Hierarchical-matrix on GPU
clusters, Ichitaro Yamazaki, Ahmad
Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire
Tomov, Rio Yokota and Jack Dongarra, IEEE
International Parallel and Distributed Processing
Symposium (IPDPS), Vancouver, British Columbia,
Canada, IEEE, May 2018.
A pdf
version is available.
Optimal Cooperative
Checkpointing for Shared High-Performance
Computing Platforms, T. Herault, Y. Robert,
A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca,
J. Dongarra, APDCM Workshop at IPDPS 2018, Best
paper award.
A pdf
version is available.
ADAPT: An
Event-Based Adaptive Collective Communication
Framework, Wu, W., G. Bosilca, X.
Luo, T. Patinyasakdikul, L. Wang, and J. Dongarra,
Proceedings of the 27th International Symposium on
High-Performance Parallel and Distributed Computing
- HPDC '18, Tempe, Arizona, ACM Press, June 2018.
10.1145/3208040.3208054
A pdf
version is available.
Harnessing GPU's
Tensor Cores Fast FP16 Arithmetic to Speedup
Mixed-Precision Iterative Refinement Solvers,
A. Haidar, S. Tomov, J. Dongarra, and N. J. Higham, submitted to SC18.
A pdf
version is available.
Optimizing GPU
Kernels for Irregular Batch Workloads: A Case
Study for Cholesky Factorization, Ahmad
Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack
Dongarra, accepted in IEEE HPEC September 2018,
Waltham, MA.
A pdf
version is available.
Do moldable
applications perform better on failure-prone HPC
platforms?, Euro-Par 2018, Resilience
Workshop, Turin, Italy, accepted June 2018.
A pdf
version is available.
Incomplete Sparse
Approximate Inverses for Parallel Preconditioning,
H. Anzt, T. Huckle, J. Bräckle, J. Dongarra,
Parallel Computing, Volume 71, January 2018, Pages
1-22, doi.org/10.1016/j.parco.2017.10.003.
A pdf
version is available.
- 2017 -
A Look Back on 30 Years of the
Gordon Bell Prize, Gordon Bell, David
Bailey, Alan H. Karp, Jack Dongarra, Kevin
Walsh, International Journal of High
Performance Computing Applications, 2017,
Vol. 31(6) 469–484, DOI:
10.1177/1094342017738610
A pdf
version is available.
The Design
and Performance of Batched BLAS on Modern
High-Performance Computing Systems, Jack
Dongarra, Sven Hammarling, Nick Higham,
Samuel Relton, Pedro Valero-Lara, and
Mawussi Zounon, ICCS’17, ETH Zurich,
Procedia Computer Science, Volume 108, 2017,
Pages 495-504,
DOI:10.1016/j.procs.2017.05.138 A
pdf
version is available.
Autotuning Batch Cholesky Factorization in
CUDA with Interleaved Layout of Matrices,
Mark Gates, Jakub Kurzak, Piotr
Luszczek, Yu Pei and Jack Dongarra, iWAPT
2017 at IPDPS 2017. A
pdf
version is available.
Variable-Size Batched LU for Small
Matrices and its Integration into
Block-Jacobi Preconditioning, Hartwig
Anzt,
Jack Dongarra, Goran Flegar and Enrique S.
Quintana-Orti, 2017 46th International
Conference on Parallel Processing (ICPP),
August 2017, pp 91-100, DOI:
10.1109/ICPP.2017.18 A
pdf
version is available.
Out of Memory SVD Solver for Big Data, Azzam
Haidar, Khairul Kabir, Diana Fayad,
Stanimire Tomov, Jack Dongarra, 2017 IEEE
High Performance Extreme Computing
Conference, September 2017, pp 1-7, DOI:
10.1109/HPEC.2017.8091029 A
pdf
version is available.
Towards
Numerical Benchmark for Half-Precision
Floating Point Arithmetic, Piotr
Luszczek, Jakub Kurzak, Ichitaro Yamazaki
and Jack Dongarra, 2017 IEEE High
Performance Extreme Computing Conference,
Boston, 2017, DOI: 10.1109/HPEC.2017.8091031
A pdf
version is available.
Sampling
Algorithms to Update Truncated SVD,
Ichitaro Yamazaki, Stanimire Tomov and Jack
Dongarra, accepted at the IEEE Big Data 2017
Conference, Boston MA, December 11-16, 2017.
A pdf
version is available.
Flexible
Batched Sparse Matrix-Vector Product on
GPUs, H. Anzt, G. Collins, J.
Dongarra, G. Flegar, and E. S.
Quintana-Orti, The 8th Workshop on Latest
Advances in Scalable Algorithms for
Large-Scale Systems (ScalA '17), Denver,
Colorado, ACM Press, November 2017.
A pdf
version is available.
Power-Aware
Computing: Measurement, Control, and
Performance Analysis for Intel Xeon Phi,
Azzam Haidar, Heike Jagode, Asim Yarkhan,
Phil Vaccaro, Stanimire Tomov and Jack
Dongarra, 2017 IEEE High Performance Extreme
Computing Conference (HPEC), September 2017,
DOI: 10.1109/HPEC.2017.8091085
A pdf
version is available.
Scaling Point
Set Registration in 3D Across Thread
Counts on Multicore and Hardware
Accelerator Platforms through Autotuning
for Large Scale Analysis of Scientific
Point Clouds, Piotr Luszczek, Jakub
Kurzak, Ichitaro Yamazaki, David Keffer, and
Jack Dongarra, accepted in IEEE
International Workshop on Benchmarking,
Performance Tuning and Optimization for Big
Data Applications (BPOD 2017), December
2017, Boston, MA.
A pdf
version is available.
Optimized Batched Linear Algebra for
Modern Architectures, J. Dongarra,
S. Hammarling, N. J. Higham, S.D. Relton,
and M. Zounon, In Euro-Par 2017: Parallel
Processing, F.F. Rivera, T.F. Pena, and J.C.
Cabaleiro, editors, volume 10417 of Lecture
Notes in Computer Science, Springer-Verlag,
Cham, 2017, pages 511–522. DOI:
10.1007/978-3-319-64203-1_37.
A pdf
version is available.
Variable-Size Batched Gauss-Huard for
Block-Jacobi Preconditioning, Hartwig
Anzt,
Jack Dongarra, Goran Flegar, Enrique S.
Quintana-Ortí, Andrés E. Tomás,
Procedia Computer Science, Volume 108, pp
1783 - 1792, 2017, International Conference
on Computational Science, ICCS 2017, 12-14
June 2017, Zurich, Switzerland, ISSN
1877-0509, DOI:10.1016/j.procs.2017.05.186. A
pdf
version is available.
Bidiagonalization and R-Bidiagonalization:
Parallel Tiled Algorithms, Critical Paths
and Distributed-Memory Implementation, Mathieu
Faverge, Julien Langou, Yves Robert, Jack J.
Dongarra, 2017 IPDPS Conference,
DOI:10.1109/IPDPS.2017.46 A
pdf
version is available.
Factorization and Inversion of a Million
Matrices using GPUs: Challenges and
Countermeasures, Ahmad
Abdelfattah, Azzam Haidar, Stanimire Tomov,
and Jack Dongarra, ICCS’17, ETH Zurich,
Procedia Computer Science, Volume 108, 2017,
Pages 606-615,
DOI:10.1016/j.procs.2017.05.250. A
pdf
version is available.
Improving Performance of GMRES by Reducing
Communication and Pipelining Global
Collectives, I. Yamazaki, M.
Hoemmen, P. Luszczek, J. Dongarra, IPDPS
Workshop PDSEC-2017, Workshop Best Paper
Award, 2017. DOI: 10.1109/IPDPSW.2017.65 A
pdf
version is available.
Novel HPC Techniques to Batch Execution of
Many Variable Size BLAS Computations on
GPUs, Ahmad Abdelfattah, Azzam
Haidar, Stanimire Tomov, and Jack Dongarra,
ICS’17, Chicago, ISBN:
978-1-4503-5020-4,
DOI:10.1145/3079079.3079103. A
pdf
version is available.
Bringing High
Performance Computing to Big Data
Algorithms, H. Anzt, J. Dongarra,
M. Gates, J. Kurzak, P. Luszczek, S. Tomov,
I. Yamazaki in Handbook of Big Data
Technologies, Editors: Albert Y. Zomaya,
Sherif Sakr, ISBN: 978-3-319-49339-8 (Print)
978-3-319-49340-4 (Online),
DOI:10.1007/978-3-319-49340-4, Springer,
2017.
A pdf
version is available.
Preconditioned Krylov solvers on GPUs, Hartwig
Anzt, Mark Gates, Jack Dongarra, Moritz
Kreutzer, Gerhard Wellein, Martin
Köhler, Parallel Computing,
DOI:10.1016/j.parco.2017.05.006, June 2017.
A pdf
version is available.
Scaling Point Set Registration in 3D
Across Thread Counts on Multicore and
Hardware Accelerator Platforms through
Autotuning for Large Scale Analysis of
Scientific Point Clouds, Piotr
Luszczek, Jakub Kurzak, Ichitaro Yamazaki,
David Keffer, and Jack Dongarra, accepted in
IEEE International Workshop on Benchmarking,
Performance Tuning and Optimization for Big
Data Applications (BPOD 2017), December
2017, Boston, MA. DOI:
10.1109/BigData.2017.8258258
A pdf
version is available.
Variable-Size Batched Gauss-Jordan
Elimination for Block-Jacobi
Preconditioning on Graphics Processors,
H. Anzt, G. Flegar, J. Dongarra, and
E. S. Quintana-Ortí,
Parallel Computing,
doi.org/10.1016/j.parco.2017.12.006.
A pdf
version is available.
Evaluation
of Directive-based Performance Portable
Programming Models, M. Graham
Lopez, Wayne Joubert, Veronica Vergara
Larrea, Oscar Hernandez, Azzam Haidar,
Stanimire Tomov, Jack Dongarra,
International Journal of High Performance
Computing and Networking, accepted May 2017.
A pdf
version is available.
A Framework
for Out of Memory Algorithms,
K. Kabir, A. Haidar, S. Tomov, A.
Bouteiller, J. Dongarra, in Kunkel J.,
Yokota R., Balaji P., Keyes D. (eds) High
Performance Computing, ISC 2017.
Lecture Notes in Computer Science, vol
10266. Springer, Frankfurt, Germany, June
19-21, 2017, DOI:10.1007/978-3-319-58667-0_9
A pdf
version is available.
Batched
Gauss-Jordan Elimination for Block-Jacobi
Preconditioner Generation on GPUs,
Hartwig Anzt, Jack Dongarra, Goran Flegar
and Enrique S. Quintana-Orti, PMAM'17:
Proceedings of the 8th International
Workshop on Programming Models and
Applications for Multicores and Manycores,
pages 1-10, Austin, TX, USA, February 4-8,
2017, ISBN: 978-1-4503-4883-6
DOI:10.1145/3026937.3026940
A pdf
version is available.
High-Performance
Cholesky Factorization for GPU-Only
Execution, Azzam Haidar, Ahmad
Abdelfattah, Stanimire Tomov and Jack
Dongarra, GPGPU-10: Proceedings of
the General Purpose GPUs, pages 42-52,
Austin, TX, USA, February 4-8, 2017,
DOI:10.1145/3038228.3038237
A pdf
version is available.
Updating
Incomplete Factorization Preconditioners
for Model Order Reduction, Hartwig
Anzt, Edmond Chow, Jens Saak, and Jack
Dongarra, Numerical Algorithms, November
2016, Volume 73, Issue 3, pp 611–630,
DOI:10.1007/s11075-016-0110-2 A pdf
version is available.
Accelerating
NWChem Coupled Cluster through
dataflow-based Execution, A.
Danalis, H. Jagode, and J. Dongarra, The
International Journal of High Performance
Computing Applications, 2017,
DOI:10.1177/1094342016672543 A pdf
version is available.
On the
Performance and Energy Efficiency of
Sparse Linear Algebra on GPU,
Hartwig Anzt, Stanimire Tomov, and Jack
Dongarra, International Journal of High
Performance Computing Applications, 2017,
DOI:10.1177/1094342016672081 A pdf
version is available.
Solving
Dense Symmetric Indefinite Systems using
GPUs, M. Baboulin, J. Dongarra, A.
Remy, S. Tomov, I. Yamazaki, Concurrency and
Computation: Practice and Experience, 2017,
DOI:10.1002/cpe.4055 A pdf
version is available.
Fine-grained
Bit-Flip Protection for Relaxation
Methods, H. Anzt, J. Dongarra, and
E. S. Quintana-Orti, Journal of
Computational Science, 2017,
DOI:10.1016/j.jocs.2016.11.013 A pdf
version is available.
Fast
Cholesky Factorization on GPUs for Batch
and Native Modes in MAGMA, Ahmad
Abdelfattah, Azzam Haidar, Stanimire Tomov,
and Jack Dongarra, Journal of Computational
Science, Volume 20, May 2017, Pages 85–93,
DOI:10.1016/j.jocs.2016.12.009 A pdf
version is available.
With
Extreme Computing, the Rules Have Changed,
Jack Dongarra, Stanimire Tomov, Piotr
Luszczek, Jakub Kurzak, Mark Gates, Ichitaro
Yamazaki, Hartwig Anzt, Azzam Haidar, and
Ahmad Abdelfattah, IEEE CISE, April 2017,
DOI:10.1109/MCSE.2017.48 A pdf
version is available.
Structure-aware
Linear Solver for Realtime Convex
Optimization for Embedded Systems,
I. Yamazaki, S. Tomov, J. Dongarra, IEEE
Embedded Systems Letters, May 2017, DOI:
10.1109/LES.2017.2700401 A pdf
version is available.
Design and
Implementation of the PULSAR Programming
System for Large Scale Computing,
J. Kurzak, P. Luszczek, I. Yamazaki, Y.
Robert, J. Dongarra, Supercomputing
Frontiers and Innovations, 2017,
DOI:10.14529/jsfi170101 A pdf
version is available.
Optimizing
the SVD Bidiagonalization Process for a
Batch of Small Matrices, Tingxing
Dong, Azzam Haidar, Stanimire Tomov and Jack
Dongarra, ICCS’17, ETH Zurich, Procedia
Computer Science, Volume 108, 2017, Pages
1008–1018, DOI:10.1016/j.procs.2017.05.237 A pdf
version is available.
Novel HPC
Techniques to Batch Execution of Many
Variable Size BLAS Computations on GPUs,
Ahmad Abdelfattah, Azzam Haidar, Stanimire
Tomov and Jack Dongarra, ICS 2017 Chicago,
June 14 2017, DOI:10.1145/3079079.3079103 A pdf
version is available.
Batched
Gauss-Jordan Elimination for Block-Jacobi
Preconditioner Generation on GPUs,
Hartwig Anzt, Jack Dongarra, Goran Flegar
and Enrique S. Quintana-Orti, accepted PMAM
2017, December 2016.
A pdf
version is available.
- 2016 -
Linear algebra
software for large-scale accelerated
multicore computing, A.
Abdelfattah, H. Anzt, J. Dongarra, M. Gates,
A. Haidar, J. Kurzak, P. Luszczek, S. Tomov,
I. Yamazaki, and A. YarKhan, Acta Numerica,
Volume 25, May 2016, pp 1-160, DOI:
10.1017/S0962492916000015.
A pdf
version is available.
Report on the Sunway TaihuLight System, Jack
Dongarra, University of Tennessee,
Department of Electrical Engineering
and Computer Science Tech Report
UT-EECS-16-742, June 2016. A pdf
version is available.
Sunway
TaihuLight Supercomputer Makes Its
Appearance, Jack Dongarra, The
National Science Review 2016 3: 265-266,
September 2016, DOI: 10.1093/nsr/nww044.
A pdf
version is available.
Stability
and Performance of Various Singular Value
QR Implementations on Multicore CPU with a
GPU, I. Yamazaki, S. Tomov, and J.
Dongarra, ACM Transactions on Mathematical
Software (TOMS), Volume 43 Issue 2,
September 2016, DOI: 10.1145/2898347 A
pdf
version is available.
On the Performance and Energy Efficiency
of Sparse Linear Algebra on GPUs,
H. Anzt, S. Tomov, and J. Dongarra, The
International Journal of High Performance
Computing Applications, DOI:
10.1177/1094342016672081.
A pdf
version is available.
Performance
Tuning and Optimization Techniques of
Fixed and Variable Size Batched Cholesky
Factorization on GPUs, A.
Abdelfattah, A. Haidar, S. Tomov, and J.
Dongarra, International Conference on
Computational Science (ICCS'16), San Diego,
CA, June 2016.
A pdf
version is available.
High-Performance
Tensor Contractions for GPUs, A.
Abdelfattah, M. Baboulin, V. Dobrev, J.
Dongarra, C. Earl, J. Falcou, A. Haidar,
I. Karlin, T. Kolev, I. Masliah,
International Conference on Computational
Science (ICCS'16), San Diego, CA, June 2016.
A pdf
version is available.
Efficiency
of General Krylov Methods on GPUs – An
Experimental Study, Hartwig Anzt,
Jack Dongarra, Moritz Kreutzer, Gerhard
Wellein, Martin Köhler, AsHES Workshop,
IPDPS, 2016.
A pdf
version is available.
On the
Development of Variable Size Batched
Computation for Heterogeneous Parallel
Architectures, Ahmad Abdelfattah,
Azzam Haidar, Stanimire Tomov, Jack
Dongarra, The 17th IEEE International
Workshop on Parallel and Distributed
Scientific and Engineering Computing (PDSEC
2016), IPDPS 2016, Chicago, IL, IEEE, May
2016.
A pdf
version is available.
GPU-Aware
Non-contiguous Data Movement In Open MPI,
W. Wu, G. Bosilca, R. vandeVaart, S.
Jeaugey, and J. Dongarra, The 25th
International Symposium on High Performance
Distributed Computing (HPDC2016).
A pdf
version is available.
Creating a
Standardised Set of Batched BLAS Routines,
Jack Dongarra, Sven Hammarling, Nicholas J.
Higham, Samuel D. Relton, Pedro Valero-Lara
and Mawussi Zounon, in the Proceedings of
the Fourth Workshop on Sustainable Software
for Science: Practice and Experiences
(WSSSPE4, 2016), Gabrielle Allen, Jeffrey
Carver, et al., volume 1686, CEUR Workshop
Proceedings,
http://ceur-ws.org/Vol-1686/WSSSPE4_paper_3.pdf.
A pdf
version is available.
Hessenberg
Reduction with Transient Error Resilience
on GPU-Based Hybrid Architectures,
Y. Jia, P. Luszczek, and J. Dongarra, The
Sixth International Workshop on Accelerators
and Hybrid Exascale Systems (AsHES) 2016,
May 2016, Chicago. DOI:
10.1109/IPDPSW.2016.34
A pdf
version is available.
Non-GPU-resident
Dense Symmetric Indefinite Factorization,
I. Yamazaki, S. Tomov, and J. Dongarra,
Concurrency and Computation: Practice and
Experience, DOI: 10.1002/cpe.4012, November
2016.
A pdf
version is available.
A New
Metric for Ranking High Performance
Computing Systems, Jack Dongarra,
Michael A. Heroux, and Piotr Luszczek,
National Science Review, Volume 3, Issue 1,
March 2016, pp 30-35, DOI:
10.1093/nsr/nwv084.
A pdf
version is available.
Assessing the Cost of Redistribution
followed by a Computational Kernel:
Complexity and Performance Results, Julien
Herrmann,
George Bosilca, Thomas Hérault, Loris
Marchal, Yves Robert, and Jack Dongarra,
Parallel Computing, Volume 52, February
2016, pp. 22–41, DOI:
10.1016/j.parco.2015.09.005.
A pdf
version is available.
Optimization and Performance Evaluation of
the IDR Iterative Krylov Solver on GPUs,
Hartwig Anzt, Moritz Kreutzer, Eduardo
Ponce, Gregory D. Peterson, Gerhard Wellein,
Jack Dongarra, The International Journal of
High Performance Computing Applications,
1–11, 2016, DOI: 10.1177/1094342016646844
A pdf
version is available.
Experiences in Autotuning Matrix
Multiplication for Energy Minimization on
GPUs, Anzt, H., B. Haugen, J.
Kurzak, P. Luszczek, and J. Dongarra,
Concurrency and Computation: Practice and
Experience, vol. 27, issue 17, pp.
5096-5113, DOI: 10.1002/cpe.3516. A pdf
version is available.
High
Performance Conjugate Gradient Benchmark:
A new Metric for Ranking High Performance
Computing Systems, J. Dongarra, M.
Heroux, P. Luszczek, The International
Journal of High Performance Computing
Applications, Volume 30, Issue 1, Spring
2016. DOI: 10.1177/1094342015593158.
A pdf
version is available.
Assessing
the Cost of Redistribution followed by a
Computational Kernel: Complexity and
Performance Results, Herrmann, J.,
G. Bosilca, T. Herault, L. Marchal, Y.
Robert, and J. Dongarra, Parallel Computing,
vol. 52, pp. 22-41, February 2016.
DOI: 10.1016/j.parco.2015.09.005.
A pdf
version is available.
Updating
Incomplete Factorization Preconditioners
for Model Order Reduction, Hartwig
Anzt, Edmond Chow, Jens Saak, and Jack
Dongarra, accepted in Numerical Algorithms,
January 2016.
A pdf
version is available.
Stability
and Performance of Various Singular Value
QR Implementations and Case-studies with
Adaptive Mixed Precision on Multicore CPU
with GPUs, Ichitaro Yamazaki,
Stanimire Tomov, and Jack Dongarra, accepted in
ACM Transactions on Mathematical Software (TOMS), February 2016.
A pdf
version is available.
Performance
Optimization of Sparse Matrix-Vector
Multiplication for Multi-component
PDE-based Applications using GPUs,
Ahmad Abdelfattah, Hatem Ltaief, David Keyes, and
Jack Dongarra, accepted in Concurrency and
Computation: Practice and Experience, April
2016.
A pdf
version is available.
Porting the
PLASMA Numerical Library to the OpenMP
Standard, Asim YarKhan, Jakub
Kurzak, Piotr Luszczek, and Jack Dongarra,
accepted in International Journal of
Parallel Programming, May 2016.
A pdf
version is available.
Domain
Overlap for Iterative Sparse Triangular
Solves on GPUs, Hartwig Anzt,
Edmond Chow, Daniel Szyld, and Jack
Dongarra, Software for Exascale Computing,
Leibniz Supercomputing Centre, Munich,
Germany, Volume 113 of the series Lecture
Notes in Computational Science and
Engineering pp 527-545, Jan 25–27, 2016.
DOI: 10.1007/978-3-319-40528-5_24
A pdf
version is available.
Performance,
Design, and Autotuning of Batched GEMM for
GPUs, Ahmad Abdelfattah, Azzam
Haidar, Stanimire Tomov, Jack Dongarra, High
Performance Computing, Volume 9697 of the
series Lecture Notes in Computer Science pp
21-38, 2016, DOI:
10.1007/978-3-319-41321-1_2
A pdf
version is available.
Accelerating
the Conjugate Gradient Algorithm with GPU
in CFD Simulations, Hartwig Anzt,
Marc Baboulin, Jack Dongarra, Yvan Fournier,
Frank Hulsemann, Amal Khabou and Yushan
Wang, VECPAR 2016.
A pdf
version is available.
Task-Based
Cholesky Decomposition on Knights Corner
using OpenMP, Joseph Dorris, Jakub
Kurzak, Piotr Luszczek, Asim Yarkhan, Jack
Dongarra, Best Paper Award at
the P^3MA workshop co-located with ISC, High
Performance Computing, Volume 9945 of the
series Lecture Notes in Computer Science pp
544-562, DOI: 10.1007/978-3-319-46079-6_37
A pdf
version is available.
LU, QR, and
Cholesky Factorizations: Programming
Model, Performance Analysis and
Optimization Techniques for the Intel
Knights Landing Xeon Phi, Azzam
Haidar, Stanimire Tomov, Konstantin Arturov,
Murat Guney, Shane Story, Jack Dongarra,
2016 IEEE High Performance Extreme Computing
Conference (HPEC ‘16), Twentieth Annual HPEC
Conference, 13-15 September 2016, Waltham,
MA, USA.
A pdf
version is available.
Performance
Analysis and Acceleration of Explicit
Integration for Large Kinetic Networks
using Batched GPU Computations, A.
Haidar, B. Brock, S. Tomov, M. Guidry, J.
Billings, D. Shyles, J. Dongarra, 2016 IEEE
High Performance Extreme Computing
Conference (HPEC ‘16), September 13-15,
2016.
A pdf
version is available.
Failure
Detection and Propagation in HPC systems,
George Bosilca, Aurelien Bouteiller, Amina
Guermouche, Thomas Herault, Yves Robert,
Pierre Sens, Jack Dongarra, Nominated for
Best Paper, Proceedings of the
International Conference for High
Performance Computing, Networking, Storage
and Analysis (SC'16), Salt Lake City, Utah,
IEEE Press, pp. 27:1-27:11, November 2016.
A pdf
version is available.
Performance-Portable
Autotuning of OpenCL Kernels for
Convolutional Layers of Deep Neural
Networks, Yaohung Tsai, Piotr
Luszczek, Jakub Kurzak and Jack Dongarra, in
the Machine Learning and HPC Environments
Workshop associated with SC16,
November 2016.
A pdf
version is available.
Batched Generation of Incomplete Sparse
Approximate Inverses on GPUs, H.
Anzt, E. Chow, T. Huckle, J. Dongarra,
Proceedings of the 7th Workshop on Latest
Advances in Scalable Algorithms for
Large-Scale Systems, pp. 49–56, November
2016.
A pdf
version is available.
Towards
Achieving Performance Portability Using
Directives for Accelerators, M.
Lopez, V. Larrea, W. Joubert, O. Hernandez,
A. Haidar, S. Tomov, and J. Dongarra, The
International Conference for High
Performance Computing, Networking, Storage
and Analysis (SC'16), Third Workshop on
Accelerator Programming Using Directives
(WACCPD), Salt Lake City, Utah, Innovative
Computing Laboratory, University of
Tennessee, November 2016.
A pdf
version is available.
Performance
Analysis and Acceleration of Explicit
Integration for Large Kinetic Networks
using Batched GPU Computations, A.
Haidar, B. Brock, S. Tomov, M. Guidry, J.
Billings, D. Shyles, and J. Dongarra, 2016
IEEE High Performance Extreme Computing
Conference (HPEC ‘16), Waltham, MA, IEEE,
September 2016.
A pdf
version is available.
Power
Management and Event Verification in PAPI,
H. Jagode, A. YarKhan, A. Danalis, and J.
Dongarra, Tools for High Performance
Computing 2015: Proceedings of the 9th
International Workshop on Parallel Tools for
High Performance Computing, September 2015,
Dresden, Germany, Springer
International Publishing, pp. 41-51,
2016.
A pdf
version is available.
Search
Space Generation and Pruning System for
Autotuners, Piotr Luszczek, Mark
Gates, Jakub Kurzak, Anthony Danalis, and
Jack Dongarra, the 30th IEEE International
Parallel & Distributed Processing
Symposium, Chicago, IL, IEEE, May 2016.
A pdf
version is available.
High-performance
Matrix-Matrix Multiplications of Very
Small Matrices, I. Masliah, A.
Abdelfattah, A. Haidar, S. Tomov, M.
Baboulin, J. Falcou, and J. Dongarra, 22nd
International European Conference on
Parallel and Distributed Computing
(Euro-Par'16), Grenoble, France, Springer
International Publishing, August 2016.
A pdf
version is available.
Heterogeneous
Streaming, C. Newburn, et al., The
Sixth International Workshop on Accelerators
and Hybrid Exascale Systems (AsHES), IPDPS
2016, Chicago, IL, IEEE, May 2016.
A pdf
version is available.
CUDA-aware
non-contiguous data movement in Open MPI,
Wei Wu, George Bosilca, Rolf vandeVaart, and
Jack Dongarra, 25th International Symposium
on High-Performance Parallel and Distributed
Computing (HPDC'16), Kyoto, Japan, ACM, June
2016.
A pdf
version is available.
- 2015 -
Exascale
Computing and Big Data: The Next Frontier,
Daniel A. Reed and Jack Dongarra,
Communications of the ACM, Vol. 58 No. 7,
Pages 56-68, DOI: 10.1145/2699414. A pdf
version is available.
Dense
Symmetric Indefinite Factorization on GPU
Accelerated Architectures, M.
Baboulin, J. Dongarra, A. Rémy, S. Tomov,
I. Yamazaki, the Proceedings of the 11th
International Conference on Parallel
Processing and Applied Mathematics (PPAM
2015), Volume 9573 of the series Lecture
Notes in Computer Science pp 86-95, DOI:
10.1007/978-3-319-32149-3_9
A pdf
version is available.
Accelerating
Collaborative Filtering Using Concepts
from High Performance Computing,
Mark Gates, Hartwig Anzt, Jakub Kurzak, and
Jack Dongarra, 2015 IEEE International
Conference on Big Data (IEEE BigData,
November 2015). DOI:
10.1109/BigData.2015.7363811
A pdf
version is available.
Strengthening
compute and data intensive capacities of
Armenia, H. Astsatryan, V.
Sahakyan, Y. Shoukourian, P.H. Cros, M.
Dayde, J. Dongarra, P. Oster, in 2015 14th RoEduNet
International Conference - Networking in
Education and Research (RoEduNet NER),
pp. 28-33, 24-26 Sept. 2015,
DOI: 10.1109/RoEduNet.2015.7311823
A pdf
version is available.
Parallel
Programming Models for Dense Linear
Algebra on Heterogeneous Systems,
M. Abalenkovs, A. Abdelfattah, J. Dongarra,
M. Gates, A. Haidar, J. Kurzak, P. Luszczek,
S. Tomov, I. Yamazaki, A. YarKhan,
Supercomputing Frontiers and Innovations,
Volume 2, Number 4, pages 67-86, 2015, DOI:
10.14529/jsfi1504
A pdf
version is available.
The TOP500
List of Supercomputers and Progress in
High Performance Computing, Erich
Strohmaier, Hans W. Meuer, Jack Dongarra,
Horst D. Simon, IEEE Computer, vol. 48, no. 11,
November 2015, pp. 42–49,
http://doi.ieeecomputersociety.org/10.1109/MC.2015.338. A pdf
version is available.
Implementation and Tuning of Batched
Cholesky Factorization and Solve for
NVIDIA GPUs, Jakub Kurzak, Hartwig
Anzt, Mark Gates, and Jack Dongarra, IEEE
Transactions on Parallel and Distributed
Systems, ISSN 1045-9219, November 2015. A pdf
version is available.
Mixing LU-QR Factorization Algorithms to
Design High-Performance Dense Linear
Algebra Solvers, Mathieu Faverge,
Julien Herrmann, Julien Langou, Bradley
Lowery, Yves Robert, and Jack Dongarra,
Journal of Parallel and Distributed
Computing, Volume 85, November 2015, pp.
32–46, http://dx.doi.org/10.1016/j.jpdc.201.
A pdf
version is available.
A Scalable Approach to Solving Dense
Linear Algebra Problems on Hybrid CPU-GPU
Systems, Fengguang Song and Jack
Dongarra, Concurrency and Computation:
Practice and Experience, Volume 27, Issue
14, 25 September 2015, pp. 3702–3723, DOI:
10.1002/cpe.3403.
A pdf
version is available.
A Survey of
Recent Developments in Parallel
Implementations of Gaussian Elimination,
Simplice Donfack, Jack Dongarra, Mathieu
Faverge, Mark Gates, Jakub Kurzak, Piotr
Luszczek, and Ichitaro Yamazaki, Concurrency
and Computation: Practice and Experience
Volume 27, Issue 5, pp. 1292–1309, 10 April
2015, http://dx.doi.org/10.1002/cpe.3306.
A pdf
version is available.
Experiences
in Autotuning Matrix Multiplication for
Energy Minimization on GPUs,
Hartwig Anzt, Blake Haugen, Jakub Kurzak,
Piotr Luszczek, and Jack Dongarra,
Concurrency and Computation: Practice and
Experience, Volume 27, Issue 17, December
2015, pp. 5096–5113,
http://dx.doi.org/10.1002/cpe.3516.
A pdf
version is available.
Mixed-Precision
Cholesky QR Factorization and its Case
Studies on Multicore CPUs with Multiple
GPUs, Ichitaro Yamazaki, Stanimire
Tomov, and Jack Dongarra, SIAM J. Sci.
Comput. 37(3) (2015), pp. C307-C330,
http://dx.doi.org/10.1137/14M0973773.
A pdf
version is available.
A New Metric for Ranking High Performance
Computing Systems, Jack Dongarra,
Michael A. Heroux, and Piotr Luszczek,
National Science Review, January 2016, DOI:
10.1093/nsr/nwv084. A pdf
version is available.
Computing
Low-rank Approximation of a Dense Matrix
on Multicore CPUs with a GPU and its
Application to Solving a Hierarchically
Semiseparable Linear System of Equations,
Ichitaro Yamazaki, Stanimire Tomov and Jack
Dongarra, Scientific Programming, vol. 2015,
Article ID 246019, 17 pages, 2015,
http://dx.doi.org/10.1155/2015/246019.
A pdf
version is available.
Batched Matrix Computations on Hardware
Accelerators Based on GPUs, Azzam
Haidar, Tingxing Dong, Piotr Luszczek,
Stanimire Tomov, and Jack Dongarra, The
International Journal of High Performance
Computing Applications, vol. 29, May 2015,
pp. 193-208, first published on February 9,
2015,
http://dx.doi.org/10.1177/1094342014567546.
A pdf
version is available.
PaRSEC in Practice: Optimizing a Legacy
Chemistry Application through Distributed
Task-Based Execution, Anthony
Danalis, Heike Jagode, George Bosilca and
Jack Dongarra, to appear in IEEE Cluster 2015,
Chicago, Illinois, USA, Sept. 8-11, 2015.
A pdf
version is available.
Random Sampling to Update Partial Singular
Value Decomposition on a Hybrid CPU/GPU
Cluster, Ichitaro Yamazaki, Jakub
Kurzak, Piotr Luszczek, Jack Dongarra, to
appear in SC15, November 2015.
A pdf
version is available.
Practical Scalable Consensus for
Pseudo-Synchronous Distributed Systems,
Thomas Herault, Aurelien Bouteiller, George
Bosilca, Marc Gamell, Keita Teranishi,
Manish Parashar, Jack Dongarra, to appear
in SC15, November 2015.
A pdf
version is available.
Efficient Implementation Of Quantum
Materials Simulations On Distributed
CPU-GPU Systems, Raffaele
Solcà, Anton Kozhevnikov, Azzam
Haidar, Stanimire Tomov, Thomas C.
Schulthess, Jack Dongarra, to appear in SC15,
finalist for the Best Paper Award, November
2015.
A pdf
version is available.
Dense Symmetric Indefinite Factorization
on GPU Accelerated Architecture,
Marc Baboulin, Jack Dongarra, Adrien Remy,
Stanimire Tomov, and Ichitaro Yamazaki, to
appear in PPAM 2015, Krakow, Poland, 2015.
A pdf
version is available.
Plan B: Interruption of Ongoing MPI
Operations to Support Failure Recovery,
Aurelien Bouteiller, George Bosilca and Jack
Dongarra, to appear in EuroMPI Conference,
September 2015.
A pdf
version is available.
Flexible Linear Algebra Development and
Scheduling with Cholesky Factorization,
Azzam Haidar, Asim YarKhan, Chongxiao Cao,
Piotr Luszczek, Stanimire Tomov, Jack
Dongarra, 17th IEEE International Conference
on High Performance Computing and
Communications, New York, New York, August
2015.
A pdf
version is available.
Iterative Sparse Triangular Solves for
Preconditioning, Hartwig Anzt,
Edmond Chow and Jack Dongarra, to appear in
Euro-Par 2015, Vienna, Austria, August 2015.
A pdf
version is available.
Design for a Soft Error Resilient Dynamic
Task-based Runtime, Chongxiao Cao,
George Bosilca, Thomas Herault, and Jack
Dongarra, 29th IEEE International Parallel
& Distributed Processing Symposium,
Hyderabad, India, May 2015.
A pdf
version is available.
Hierarchical DAG Scheduling for Hybrid
Distributed Systems, Wei Wu, George
Bosilca, Aurelien Bouteiller, Mathieu
Faverge, and Jack Dongarra, 29th IEEE
International Parallel & Distributed
Processing Symposium, Hyderabad, India, May
2015.
A pdf
version is available.
Performance Analysis and Optimisation of
Two-Sided Factorization Algorithms for
Heterogeneous Platform, International
Conference
on Computational Science 2015, ICCS
2015, Computational Science at the
Gates of Nature, edited by Slawomir Koziel,
Leifur Leifsson, Michael Lees, Valeria V.
Krzhizhanovskaya, Jack Dongarra and Peter
M.A. Sloot. doi:10.1016/j.procs.2015.05.222
A pdf
version is available.
Accelerating the LOBPCG method on GPUs
using a blocked sparse matrix vector
product, H. Anzt, S. Tomov, and J.
Dongarra, in Spring Simulation
Multi-Conference 2015 (SpringSim15), 2015.
A pdf
version is available.
Performance Analysis and Design of a
Hessenberg Reduction using Stabilized
Blocked Elementary Transformations for New
Architecture, Khairul Kabir, Azzam
Haidar, Stanimire Tomov, Jack Dongarra, Best
Paper Award at 2015 Spring Simulation
Multiconference, 23rd High Performance
Computing Symposium (HPC 2015).
A pdf
version is available.
Energy Efficiency and Performance
Frontiers for Sparse Computations on GPU
Supercomputers, Hartwig Anzt, Stan
Tomov, and Jack Dongarra, PMAM '15
Proceedings of the Sixth International
Workshop on Programming Models and
Applications for Multicores and Manycores,
ACM, New York, NY, USA, 2015,
doi:10.1145/2712386.2712387
A pdf
version is available.
Towards Batched Linear Solvers on
Accelerated Hardware Platforms,
Azzam Haidar, Piotr Luszczek, Stanimire
Tomov, and Jack Dongarra, in Proceedings of
the 20th ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming, PPoPP
2015, San Francisco, CA, February 7-11,
2015, DOI: 10.1145/2688500.2688534
A pdf
version is available.
Optimization for Performance and Energy
for Batched Matrix Computations on GPUs,
Azzam Haidar, Tingxing Dong, Piotr Luszczek,
Stanimire Tomov, and Jack Dongarra, 8th
Workshop on General Purpose Processing Using
GPUs, (GPGPU 8), San Francisco, February 7,
2015, DOI: 10.1145/2716282.2716288
A pdf
version is available.
Optimizing Krylov Subspace Solvers on
Graphics Processing Units, Hartwig
Anzt, Stanimire Tomov, Piotr Luszczek,
Ichitaro Yamazaki, Jack Dongarra, and
William Sawyer, Parallel & Distributed
Processing Symposium Workshops (IPDPSW),
2014 IEEE International, pp. 941-949, DOI:
10.1109/IPDPSW.2014.107
A pdf
version is available.
Experiences in Autotuning Matrix
Multiplication for Energy Minimization on
GPUs, Hartwig Anzt, Blake Haugen,
Jakub Kurzak, Piotr Luszczek, and Jack
Dongarra, accepted in Concurrency and
Computation: Practice and Experience, March
2015. DOI: 10.1002/cpe.3516. A pdf
version is available.
Mixing LU-QR Factorization Algorithms to
Design High-Performance Dense Linear
Algebra Solvers, Mathieu Faverge,
Julien Herrmann, Julien Langou, Bradley
Lowery, Yves Robert, and Jack Dongarra,
accepted in Journal of Parallel and
Distributed Computing, March 2015.
http://dx.doi.org/10.1016/j.jpdc.201
A pdf
version is available.
Mixed-Precision Cholesky QR Factorization
and its Case Studies on Multicore CPUs
with Multiple GPUs, I. Yamazaki, S.
Tomov, and J. Dongarra, SIAM J. Sci.
Comput., Volume 37, Issue 3,
DOI:10.1137/14M0973773
A pdf
version is available.
Updating Incomplete Factorization
Preconditioners for Model Order Reduction,
Hartwig Anzt, Edmond Chow, Jens Saak,
and Jack Dongarra, To appear in Parallel
Computing.
A pdf version is available.
A Survey of Recent Developments in
Parallel Implementations of Gaussian
Elimination, Simplice Donfack,
Jack Dongarra, Mathieu Faverge, Mark Gates,
Jakub Kurzak, Piotr Luszczek, Ichitaro
Yamazaki, Concurrency and
Computation: Practice and Experience, Volume
27, Issue 5, pages 1292-1309, April 2015.
DOI: 10.1002/cpe.3306
A pdf version is available.
Acceleration
of GPU-based Krylov Solvers via Data
Transfer Reduction, Hartwig Anzt,
Stanimire Tomov, Piotr Luszczek, William
Sawyer and Jack Dongarra, The International
Journal of High Performance Computing
Applications, accepted April 2015,
http://dx.doi.org/10.1177/1094342015580139.
A pdf
version is available.
Algorithm-based
Fault Tolerance for Dense Matrix
Factorizations, Multiple Failures and
Accuracy, Aurelien Bouteiller,
Thomas Herault, George Bosilca, Peng Du, and
Jack Dongarra, ACM Transactions on Parallel
Computing, Volume 1 Issue 2, January 2015,
http://dx.doi.org/10.1145/2686892.
A pdf
version is available.
HPC
Programming on Intel Many-Integrated-Core
Hardware with MAGMA Port to Xeon Phi, Jack
Dongarra, Mark Gates, Azzam Haidar, Yulu
Jia, Khairul Kabir, Piotr Luszczek, and
Stanimire Tomov, Scientific Programming,
Volume 2015 (2015), Article ID 502593, 11
pages
http://dx.doi.org/10.1155/2015/502593.
A pdf
version is available.
Composing
Resilience Techniques: ABFT, Periodic and
Incremental Checkpointing, George
Bosilca, Aurelien Bouteiller, Thomas
Herault, Yves Robert, and Jack Dongarra,
International Journal of Networking and
Computing, Volume 5, Number 1, pages 2-25,
January 2015.
A pdf
version is available.
Exascale
Computing and Big Data: The Next Frontier,
Daniel A. Reed and Jack Dongarra, accepted
in Communications of the ACM, Vol. 58 No. 7,
Pages 56-68, DOI: 10.1145/2699414.
A pdf
version is available.
- 2014 -
Unified Model for Assessing
Checkpointing Protocols at
Extreme-Scale, George
Bosilca, Aurelien Bouteiller,
Elisabeth Brunet, Franck Cappello,
Jack Dongarra, Amina Guermouche,
Thomas Herault, Yves Robert,
Frederic Vivien, and Dounia
Zaidouni, Concurrency and
Computation: Practice and
Experience, Volume 26, Issue 17, pp.
2772–2791, 10 December 2014, DOI:
10.1002/cpe.3173. A pdf
version is available.
Performance of Various Computers
Using Standard Linear Equations
Software (Linpack Benchmark
Report), Jack J. Dongarra,
University of Tennessee Computer
Science Technical Report, CS-89-85,
2014.
A postscript
version is available.
Parallel Simulation of
Superscalar Scheduling,
Blake Haugen, Piotr Luszczek,
Jakub Kurzak, Asim YarKhan, and
Jack Dongarra, ICPP'14:
International Conference on
Parallel Processing, Minneapolis,
MN, 2014, DOI:
10.1109/ICPP.2014.21
A pdf
version is available.
Performance and Portability with
OpenCL for Throughput-Oriented
HPC Workloads Across
Accelerators, Coprocessors, and
Multicore Processors,
Azzam Haidar, Chongxiao Cao,
Ichitaro Yamazaki, Jack Dongarra,
Mark Gates, Piotr Luszczek, and
Stan Tomov, ScalA 2014, ACM, New
Orleans, LA, November 17, 2014,
DOI: 10.1109/ScalA.2014.8
A pdf
version is available.
Access-averse Framework for
Computing Low-rank Matrix
Approximations, Ichitaro
Yamazaki, Theo Mary, Jakub Kurzak,
Stanimire Tomov, and Jack
Dongarra, First International
Workshop on High Performance Big
Graph Data Management, Analysis,
and Mining (in Conjunction with
IEEE BigData'14), October 27,
2014, Bethesda, MD, pp. 70-77,
DOI:
10.1109/BigData.2014.7004374
A pdf
version is available.
PTG: An Abstraction for
Unhindered Parallelism, Anthony
Danalis, George Bosilca, Aurelien
Bouteiller, Thomas Herault, and
Jack Dongarra, WOLFHPC '14
Proceedings of the Fourth
International Workshop on
Domain-Specific Languages and
High-Level Frameworks for High
Performance Computing Pages 21-30,
SC14 Workshop, New Orleans, LA,
November 17, 2014,
DOI:10.1109/WOLFHPC.2014.8
A pdf
version is available.
Deflation Strategies to Improve
the Convergence of
Communication-Avoiding GMRES,
Ichitaro Yamazaki, Stanimire
Tomov, and Jack Dongarra,
ScalA 2014, Workshop on Latest
Advances in Scalable Algorithms
for Large-Scale Systems (ScalA),
New Orleans, LA, November 17,
2014. DOI:10.1109/ScalA.2014.6
A pdf
version is available.
Power Monitoring with PAPI for
Extreme Scale Architectures and
Dataflow-based Programming
Models, Heike McCraw,
James Ralph, Anthony Danalis, and
Jack Dongarra, Workshop on
Monitoring and Analysis for High
Performance Computing Systems Plus
Applications (HPCMASPA 2014), IEEE
Cluster 2014, IEEE, Madrid, Spain,
September 2014. DOI:
10.1109/CLUSTER.2014.6968672
A pdf
version is available.
LU Factorization of
Small Matrices: Accelerating
Batched DGETRF on the GPU,
Tingxing Dong, Azzam Haidar, Piotr
Luszczek, James Austin Harris,
Stanimire Tomov, and Jack
Dongarra, High Performance
Computing and Communications, 2014
IEEE 6th International Symposium on Cyberspace
Safety and Security, 2014 IEEE
11th International Conference on Embedded
Software and Systems
(HPCC, CSS, ICESS), Paris, France,
2014, DOI:10.1109/HPCC.2014.30
A pdf
version is available.
A Step towards Energy Efficient
Computing: Redesigning A
Hydrodynamic Application on
CPU-GPU, Tingxing Dong,
Veselin Dobrev, Tzanio Kolev,
Robert Rieben, Stanimire Tomov,
and Jack Dongarra, 28th IEEE
International Parallel &
Distributed Processing Symposium,
2014, DOI: 10.1109/IPDPS.2014.103
A pdf
version is available.
clMAGMA: High
Performance Dense Linear Algebra
with OpenCL, Chongxiao
Cao, Jack Dongarra, Peng Du, Mark
Gates, Piotr Luszczek, Stanimire
Tomov, IWOCL '14, May 12-13,
2014, Bristol, United Kingdom.
A pdf
version is available.
A Scalable Approach to
Solving Dense Linear Algebra
Problems on Hybrid CPU-GPU
Systems, Fengguang Song
and Jack Dongarra, accepted in
Concurrency and Computation:
Practice and Experience, August
2014. DOI: 10.1002/cpe.3403
A pdf
version is available.
DOE:
Assessment of Workforce
Development Needs in Office of
Science Research Disciplines, DOE
ASCAC Subcommittee Report,
B. Chapman et al., July 2014.
A pdf
version is available.
Top
Ten Exascale Research Challenges,
DOE ASCAC Subcommittee Report, 2014,
R. Lucas et al.
A pdf
version is available.
Applied
Mathematics Research for Exascale
Computing, Jack Dongarra
(co-chair, Oak Ridge National
Laboratory) and Jeffrey Hittinger
(co-chair, Lawrence Livermore
National Laboratory) et al., DOE
Report for the Office of Science,
Advanced Scientific Computing
Research, 2014.
A pdf
version is available.
Unified
Model for Assessing Checkpointing
Protocols at Extreme-Scale,
George Bosilca, Aurelien Bouteiller,
Elisabeth Brunet, Franck Cappello,
Jack Dongarra, Amina Guermouche,
Thomas Herault, Yves Robert,
Frederic Vivien, and Dounia
Zaidouni, accepted in Concurrency
and Computation: Practice and
Experience, Volume 26, Issue 17,
pages 2772-2791, 10 December 2014,
DOI: 10.1002/cpe.3173.
A pdf
version is available.
Accelerating
Numerical Dense Linear Algebra
Calculations with GPUs,
Jack Dongarra, Mark Gates, Azzam
Haidar, Jakub Kurzak, Piotr
Luszczek, Stanimire Tomov, and
Ichitaro Yamazaki, pp. 3-28, in
Numerical Computations with GPUs,
edited by Volodymyr Kindratenko,
Springer, 2014,
DOI:10.1007/978-3-319-06548-9_1.
A pdf
version is available.
Looking Back at Dense Linear
Algebra Software, Piotr
Luszczek, Jakub Kurzak, and Jack
Dongarra, Journal of Parallel and
Distributed Computing, pp. 2548-2560,
2014.
http://dx.doi.org/10.1016/j.jpdc.2013.10.005
A pdf
version is available.
A
Novel Hybrid CPU-GPU Generalized
Eigensolver for Electronic Structure
Calculations Based on Fine Grained
Memory Aware Tasks, Azzam
Haidar, Stanimire Tomov, Jack
Dongarra, Raffaele Solcà, Thomas
Schulthess, International Journal of
High Performance Computing
Applications, volume 28, number 2, pp.
196-209, 2014. DOI:
10.1177/1094342013502097
A pdf
version is available.
Achieving Numerical Accuracy and
High Performance using Recursive
Tile LU Factorization, J.
Dongarra, M. Faverge, P. Luszczek,
Concurrency and Computation:
Practice and Experience, Volume 26,
Issue 7, pp. 1408-1431, DOI:
10.1002/cpe.3110, 2014.
A pdf
version is available.
Model-Driven
One-Sided Factorizations on
Multicore, Accelerated Systems,
Jack Dongarra, Azzam Haidar, Jakub
Kurzak, Piotr Luszczek, Stanimire
Tomov, Asim YarKhan, Supercomputing
Frontiers and Innovations, volume 1,
number 1, 2014.
A pdf
version is available.
Performance
and Reliability Trade-offs for the
Double Checkpointing Algorithm,
Jack Dongarra, Thomas Herault and
Yves Robert, The International
Journal of Networking and Computing,
Vol. 4, No. 1, pp. 23-41, 2014.
A pdf
version is available.
An
Efficient Distributed Randomized
Algorithm For Solving Large Dense
Symmetric Indefinite Linear
Systems, Marc Baboulin,
Dulceneia Becker, George Bosilca,
Anthony Danalis, and Jack Dongarra,
Parallel Computing, Volume 40 Issue
7, July 2014, pp. 213-223. DOI:
10.1016/j.parco.2013.12.003
A pdf
version is available.
HPC
Programming on Intel
Many-Integrated-Core Hardware with
MAGMA Port to Xeon Phi,
Jack Dongarra, Mark Gates, Azzam
Haidar, Yulu Jia, Khairul Kabir,
Piotr Luszczek, and Stanimire Tomov,
Volume 2015 (2015), Article ID
502593, Scientific Programming. DOI:
10.1155/2015/502593
A pdf
version is available.
Exascale Computing and Big Data:
The Next Frontier, Daniel A.
Reed and Jack Dongarra, DOI:
10.1145/2699414, Communications of
the ACM, Vol. 58 No. 7, Pages 56-68,
July 2015.
A pdf
version is available.
Communication-Avoiding
Symmetric-Indefinite Factorization,
G. Ballard, D. Becker, J. Demmel, J.
Dongarra, A. Druinsky, I.
Peled, O. Schwartz, S. Toledo, and
I. Yamazaki, DOI:10.1137/130929060,
SIAM J. Matrix Anal. Appl. 35(4):
1364-1460 (2014).
A pdf
version is available.
Algorithm-based
Fault Tolerance for Dense Matrix
Factorizations, Multiple Failures
and Accuracy, Aurelien
Bouteiller, Thomas Herault, George
Bosilca, Peng Du, and Jack Dongarra,
DOI: 10.1145/2686892, ACM
Transactions on Parallel Computing,
Volume 1 Issue 2, January 2015.
A pdf
version is available.
Assessing
the Cost of Redistribution
followed by a Computational
Kernel: Complexity and Performance
Results, Julien Herrmann,
George Bosilca, Thomas Herault,
Loris Marchal, Yves Robert, Jack
Dongarra, submitted to Parallel
Computing, May 2014.
A pdf
version is available.
Optimizing
Krylov Subspace Solvers on
Graphics Processing Units,
Hartwig Anzt, Stanimire Tomov, Piotr
Luszczek, Ichitaro Yamazaki, Jack
Dongarra, and William Sawyer,
submitted to International Journal
of High Performance Computing
Applications, 2014.
A pdf
version is available.
A
Scalable Approach to Solving Dense
Linear Algebra Problems on Hybrid
CPU-GPU Systems, Fengguang
Song and Jack Dongarra, DOI:
10.1002/cpe.3403, Concurrency and
Computation: Practice and
Experience, October 2014.
A pdf
version is available.
LAPACK,
CRC Handbook on Linear Algebra,
Second Edition, Zhaojun
Bai, James Demmel, Jack Dongarra,
Julien Langou, and Jenny Wang,
Editor Leslie Hogben, CRC Press,
ISBN 9781466507289, 2014.
A pdf
version is available.
Accelerating
Numerical Dense Linear Algebra
Calculations with GPUs,
Jack Dongarra, Mark Gates, Azzam
Haidar, Jakub Kurzak, Piotr
Luszczek, Stanimire Tomov, and
Ichitaro Yamazaki, to appear in
Numerical Computations with GPUs,
edited by Volodymyr Kindratenko,
Springer, 2014.
A pdf
version is available.
Computing
Least Squares Condition Numbers on
Hybrid Multicore/GPU Systems,
M. Baboulin, J. Dongarra, and R.
Lacroix, Proceedings for the Applied
Mathematics, Modeling and
Computational Science (AMMCS)
conference, Vol. 117 (2015).
A pdf
version is available.
New
Multi-Stage Algorithm for
Symmetric Eigenvalues and
Eigenvectors Achieves Two-Fold
Speedup, A. Haidar,
P. Luszczek, J. Dongarra, Best Paper
Award, Workshop on Parallel and
Distributed Scientific and
Engineering Computing, Phoenix, AZ,
May, 2014.
A pdf
version is available.
Designing
LU-QR Hybrid Solvers for
Performance and Stability,
Mathieu Faverge, Julien Herrmann,
Julien Langou, Bradley Lowery, Yves
Robert, and Jack Dongarra, 28th IEEE
International Parallel &
Distributed Processing Symposium,
2014.
A pdf
version is available.
Redesigning
A Hydrodynamic Application on
CPU-GPU, Tingxing Dong,
Veselin Dobrev, Tzanio Kolev, Robert
Rieben, Stanimire Tomov, Jack
Dongarra, 28th IEEE International
Parallel & Distributed
Processing Symposium.
A pdf
version is available.
Improving
the Performance of CA-GMRES on
Multicores with Multiple GPUs,
I. Yamazaki, H. Anzt, S. Tomov, M.
Hoemmen, and J. Dongarra, 28th IEEE
International Parallel &
Distributed Processing Symposium.
A pdf
version is available.
Unified
Development for Mixed Multi-GPU
and Multi-Coprocessor Environments
using a Lightweight Runtime
Environment, A. Haidar, C.
Cao, J. Dongarra, P. Luszczek, S.
Tomov, A. YarKhan, K. Kabir, 28th
IEEE International Parallel &
Distributed Processing Symposium.
A pdf
version is available.
Mixed-Precision
Orthogonalization Scheme and
Adaptive Step Size for CA-GMRES on
GPUs, Best Paper Award,
Ichitaro Yamazaki, Stanimire Tomov,
Tingxing Dong, and Jack Dongarra,
VECPAR 2014, June 30 - July 3, 2014,
Eugene, Oregon.
A pdf
version is available.
Accelerating
computation of eigenvectors in the
nonsymmetric eigenvalue problem,
Mark Gates, Azzam Haidar and Jack
Dongarra, VECPAR 2014, June 30 - July
3, 2014, Eugene, Oregon.
A pdf
version is available.
Self-Adaptive
Multiprecision Preconditioners on
Multicore and Manycore
Architectures, Hartwig
Anzt, Dimitar Lukarski, Stan Tomov
and Jack Dongarra, VECPAR 2014, June
30 - July 3, 2014, Eugene, Oregon.
A pdf
version is available.
Hybrid
Multi-Elimination ILU
Preconditioners on GPUs,
Dimitar Lukarski, Hartwig Anzt,
Stanimire Tomov, and Jack Dongarra,
23rd Heterogeneity in Computing
Workshop (HCW 2014), in Proc. of
IPDPS 2014, Phoenix, Arizona, May
19-23, 2014.
A pdf
version is available.
Optimizing
Krylov Subspace Solvers on
Graphics Processing Units,
Hartwig Anzt, Stanimire Tomov, Piotr
Luszczek, Ichitaro Yamazaki, Jack
Dongarra, and William Sawyer, The
Third International Workshop on
Accelerators and Hybrid Exascale
Systems (AsHES), May 19, 2014,
Phoenix, AZ, part of IPDPS
Conference.
A pdf
version is available.
MIAMI:
A Framework for Application
Performance Diagnosis, G.
Marin, J. Dongarra, and D. Terpstra,
ISPASS 2014: 2014 IEEE International
Symposium on Performance Analysis of
Systems and Software, March 23-25,
2014, Hyatt Regency Hotel,
Monterey, CA.
A pdf
version is available.
Assessing
the Impact of ABFT and Checkpoint
Composite Strategies,
Bosilca, G., Bouteiller, A.,
Herault, T., Robert, Y., Dongarra,
J., IPDPSW, APDCM 2014, Phoenix, AZ,
May, 2014.
A pdf
version is available.
Dynamically
balanced synchronization-avoiding
LU factorization with multicore
and GPUs, Simplice Donfack,
Stanimire Tomov and Jack Dongarra,
Fourth International Workshop on
Accelerators and Hybrid Exascale
Systems, May 19, 2014.
A pdf
version is available.
Design
and Implementation of a Large
Scale Tree-Based QR Decomposition
Using a 3D Virtual Systolic Array
and a Lightweight Runtime,
Ichitaro Yamazaki, Jakub Kurzak,
Piotr Luszczek, Jack Dongarra,
Parallel Processing Letters, Volume
24, Number 4, December 2014, doi:
10.1142/S0129626414420043.
A pdf
version is available.
Scaling
Up Matrix Computations on
Shared-Memory Manycore Systems
with 1000 CPU Cores,
Fengguang Song and Jack Dongarra,
Proceedings of the 28th ACM
International Conference on
Supercomputing (ICS '14), pp.
333-342, ACM, New York, NY, USA,
ISBN: 978-1-4503-2642-1,
doi:10.1145/2597652.2597670.
A pdf
version is available.
Heterogenous
Acceleration for Linear Algebra in
Multi-Coprocessor Environments,
Azzam Haidar, Piotr Luszczek,
Stanimire Tomov and Jack Dongarra,
VECPAR 2014, June 30 - July 3, 2014,
Eugene, Oregon, accepted March 2014.
A pdf
version is available.
A
Fast Batched Cholesky
Factorization on a GPU,
Tingxing Dong, Azzam Haidar,
Stanimire Tomov and Jack Dongarra,
43rd International Conference on
Parallel Processing (ICPP-2014),
Minneapolis, USA, September
9-12, 2014.
A pdf
version is available.
clMAGMA:
High Performance Dense Linear
Algebra with OpenCL,
Chongxiao Cao, Jack Dongarra, Peng
Du, Mark Gates, Piotr Luszczek,
Stanimire Tomov, The International
Workshop on OpenCL, Bristol
University, England, May 12-13,
2014.
A pdf
version is available.
Utilizing
Dataflow-based Execution for
Coupled Cluster Methods,
Heike McCraw, Anthony Danalis,
Thomas Herault, George Bosilca,
Jack Dongarra, Karol Kowalski,
Theresa L. Windus, Poster at
Clusters 2014.
A pdf
version is available.
- 2013 -
Trip
Report to Changsha and the
Tianhe-2 Supercomputer,
J. Dongarra, June 3, 2013.
A pdf
version is available.
Extending
the Scope of the
Checkpoint-on-Failure Protocol
for Forward Recovery in
Standard MPI, Wesley
Bland, Peng Du, Aurelien
Bouteiller, Thomas Herault,
George Bosilca, and Jack J.
Dongarra, Concurrency and
Computation: Practice and
Experience, Volume 25, Issue 17,
pp. 2381–2393, DOI:
10.1002/cpe.3100.
A pdf
version is available.
Toward
a New Metric for Ranking High
Performance Computing Systems,
M. Heroux and J. Dongarra, UTK
EECS Tech Report and Sandia
National Labs Report
SAND2013-4744, June 2013.
A pdf
version is available.
A
Novel Hybrid CPU-GPU
Generalized Eigensolver for
Electronic Structure
Calculations Based on Fine
Grained Memory Aware Tasks, Azzam
Haidar, Stanimire Tomov, Jack
Dongarra, Raffaele Solcà,
Thomas Schulthess, International
Journal of High Performance
Computing Applications, accepted
July 2013.
A pdf
version is available.
PaRSEC:
A programming paradigm
exploiting heterogeneity for
enhancing scalability,
George Bosilca, Aurelien
Bouteiller, Anthony Danalis,
Mathieu Faverge, Thomas Herault,
Jack J. Dongarra, accepted in
IEEE Computing in Science and
Engineering, September 2013.
A pdf
version is available.
Unified
Model for Assessing
Checkpointing Protocols at
Extreme-Scale, George
Bosilca, Aurelien Bouteiller,
Elisabeth Brunet, Franck
Cappello, Jack Dongarra, Amina
Guermouche, Thomas Herault, Yves
Robert, Frederic Vivien, and
Dounia Zaidouni, accepted in
Concurrency and Computation:
Practice and Experience, October
2013.
A pdf
version is available.
Tridiagonalization of a Dense
Symmetric Matrix On Multiple
GPUs and Its Application to
Symmetric Eigenvalue Problems,
Ichitaro Yamazaki,
Tingxing Dong, Raffaele Solcà,
Stanimire Tomov, Jack Dongarra,
Thomas Schulthess, Concurrency
and Computation: Practice and
Experience, published online,
October 2013, DOI:
10.1002/cpe.3152
A pdf
version is available.
Post-Failure Recovery of MPI
Communication Capability:
Design and Rationale, Wesley
Bland, Aurelien Bouteiller,
Thomas Herault, George Bosilca
and Jack J. Dongarra,
International Journal of High
Performance Computing
Applications, Volume 27, Issue
3, Fall 2013, pp. 244-254, DOI:
10.1177/1094342013488238.
A pdf
version is available.
Toward High Performance Divide
and Conquer Eigensolver for
Dense Symmetric Matrices,
Azzam Haidar, Hatem Ltaief, and
Jack Dongarra, SIAM SISC, Vol.
34, No. 6, pp. C249-C274.
A pdf
version is available.
Accelerating Linear
System Solutions Using
Randomization Techniques,
Marc Baboulin, Jack Dongarra,
Julien Herrmann, and Stanimire
Tomov, ACM TOMS, Vol. 39,
No 2 (2013).
A pdf
version is available.
Level-3
Cholesky Factorization
Routines Improve Performance
of Many Cholesky Algorithms,
Fred G. Gustavson, Jerzy
Wasniewski, Jack J. Dongarra, J.
Herrero, and J. Langou, ACM
Transactions on Mathematical
Software (TOMS), Vol. 39, No 2
(2013).
A pdf
version is available.
High Performance
Bidiagonal Reduction using
Tile Algorithms on Homogeneous
Multicore Architectures,
H. Ltaief, P. Luszczek, and J.
Dongarra, ACM Transactions on
Mathematical Software, Volume
39, Issue 3, April 2013.
A pdf
version is available.
An
Evaluation of User-Level
Failure Mitigation support in
MPI, Aurelien
Bouteiller, Wesley Bland, Thomas
Herault, Joshua Hursey, George
Bosilca and Jack Dongarra,
Recent Advances in the Message
Passing Interface, Lecture Notes
in Computer Science Volume 7490,
2012, pp 193-203, ISSN:
0010-485X, April 2013.
A pdf
version is available.
Kernel-Assisted
and Topology-Aware Collective
Communications on
Multi-core/Many-core Platforms,
Teng Ma, George Bosilca,
Aurelien Bouteiller, Jack
Dongarra, Journal of Parallel
and Distributed Computing,
Volume 73, Issue 7, pp.
1000-1010, July 2013. (Best
paper award IPDPS 2013
Conference)
A pdf
version is available.
BlackjackBench:
Portable Hardware
Characterization with
Automated Results Analysis,
Anthony Danalis, Piotr Luszczek,
Gabriel Marin, Jeffrey S. Vetter
and Jack Dongarra, Computer
Journal, 2013; doi:
10.1093/comjnl/bxt057.
A pdf
version is available.
Enabling
Workflows in GridSolve:
Request Sequencing and Service
Trading, Yinan Li, Asim
YarKhan, Jack Dongarra, Keith
Seymour, and Aurélie Hurault, The
Journal of Supercomputing, June
2013, Volume 64, Issue 3, pp
1133-1152.
A pdf
version is available.
Correlated
Set Coordination in Fault Tolerant
Message Logging Protocols,
A. Bouteiller, T. Herault, G.
Bosilca, J. Dongarra, Concurrency
and Computation: Practice and
Experience, Volume 25, Issue 4,
pages 572-585, 2013.
A pdf
version is available.
LU
Factorization with Partial
Pivoting for a Multicore
System with Accelerators,
J. Kurzak, P. Luszczek, and J.
Dongarra, IEEE Transactions on
Parallel and Distributed
Computing, August 2013 (vol. 24
no. 8), pp. 1613-1621.
A pdf
version is available.
Soft Error Resilient
QR Factorization for Hybrid
System with GPGPU, P. Du,
P. Luszczek, S. Tomov, and J.
Dongarra, accepted in Journal of
Computational Science, January
2013.
A pdf
version is available.
Hierarchical
QR factorization algorithms
for multi-core cluster systems,
Jack Dongarra, Mathieu Faverge,
Thomas Herault, Mathias
Jacquelin, Julien Langou, Yves
Robert, Parallel Computing,
Volume 39, Issues 4-5, April-May
2013, pp. 212-232.
A pdf
version is available.
A
Block-Asynchronous Relaxation
Method for Graphics Processing
Units, Hartwig Anzt,
Stanimire Tomov, Jack Dongarra,
Vincent Heuveline, Journal of
Parallel and Distributed
Computing, Online June 6, 2013,
http://dx.doi.org/10.1016/j.bbr.2011.03.031
A pdf
version is available.
Extending
the Scope of the
Checkpoint-on-Failure Protocol
for Forward Recovery in
Standard MPI, Wesley
Bland, Peng Du, Aurelien
Bouteiller, Thomas Herault,
George Bosilca, Jack J.
Dongarra, accepted in
Concurrency and Computation:
Practice and Experience, June
2013.
A pdf
version is available.
Achieving Numerical
Accuracy and High Performance
using Recursive Tile LU
Factorization, J.
Dongarra, M. Faverge, P.
Luszczek, accepted in Concurrency
and Computation: Practice and
Experience, July 2013.
A pdf
version is available.
Optimizing
Memory-Bound Numerical Kernels
on GPU Hardware Accelerators,
A. Abdelfattah, J. Dongarra, D.
Keyes, and H. Ltaief, 10th
International Meeting on
High-Performance Computing for
Computational Science (VECPAR
2012), Lecture Notes in Computer
Science 7851, pp 72-79, 2013.
A pdf
version is available.
Programming
the LU Factorization for a
Multicore System with
Accelerators, Jakub
Kurzak, Piotr Luszczek, Mathieu
Faverge, and Jack Dongarra, 10th
International Meeting on
High-Performance Computing for
Computational Science (VECPAR
2012), Lecture Notes in Computer
Science 7851, pp 28-35, 2013.
A pdf
version is available.
Dense
Linear Algebra on Distributed
Heterogeneous Hardware with a
Symbolic DAG Approach,
George Bosilca, Aurelien
Bouteiller, Anthony Danalis,
Thomas Herault, Piotr Luszczek,
and Jack J. Dongarra, in the book
Scalable Computing and
Communications: Theory and
Practice, edited by Samee U.
Khan, Lizhe Wang, and Albert Y.
Zomaya, Publisher John Wiley
& Sons, ISBN:
978-1-1181-6265-1, 2013.
A pdf
version is available.
Keeneland:
Computational Science Using
Heterogeneous GPU Computing,
J. Vetter, R. Glassbrook, K.
Schwan, S. Yalamanchili, M.
Horton, A. Gavrilovska, M.
Slawinska, J. Meredith, P. Roth,
K. Spafford, S. Tomov, J.
Wynkoop, Ed. Jeffrey S. Vetter,
Contemporary High Performance
Computing: From Petascale Toward
Exascale, Taylor and Francis,
Boca Raton, CRC Computational
Science Series, 2013.
A pdf
version is available.
HPC
Challenge: Design, History,
and Implementation Highlights,
J. Dongarra and P. Luszczek, Ed.
Jeffrey S. Vetter, Contemporary
High Performance Computing: From
Petascale Toward Exascale,
Taylor and Francis, Boca Raton,
CRC Computational Science
Series, 2013, ISBN:
978-1-4665-6834-1.
A pdf
version is available.
Multithreading
in the PLASMA Library,
Jakub Kurzak, Piotr Luszczek,
Asim YarKhan, Mathieu Faverge,
Julien Langou, Henricus
Bouwmeester, and Jack Dongarra,
in Multi- and Many-Core
Processing: Architecture,
Programming, Algorithms, &
Applications, edited by Mohamed
Ahmed, Reda A. Ammar, and
Sanguthevar Rajasekaran, Series:
Chapman & Hall/CRC Computer
& Information Science
Series, published by Taylor
& Francis, 2013.
A pdf
version is available.
Looking
Back at Dense Linear Algebra
Software, Piotr
Luszczek, Jakub Kurzak, and Jack
Dongarra, submitted to Journal
of Parallel and Distributed
Computing, August 2013.
A pdf
version is available.
Scalable
Dense Linear Algebra on
Heterogeneous Hardware,
George Bosilca, Aurelien
Bouteiller, Anthony Danalis,
Thomas Herault, Jakub Kurzak,
Piotr Luszczek, Stan Tomov, Jack
Dongarra, to appear in the book
HPC: Transition Towards Exascale
Processing, in the series
Advances in Parallel Computing,
IOS Press.
A pdf
version is available.
LAPACK,
CRC Handbook on Linear
Algebra, Second Edition,
Zhaojun Bai, James Demmel, Jack
Dongarra, Julien Langou, and
Jenny Wang, Editor Leslie
Hogben, CRC Press, to appear
2013.
A pdf
version is available.
Revisiting
the Double Checkpointing
Algorithm, Jack
Dongarra, Thomas Herault and
Yves Robert, 15th Workshop on
Advances in Parallel and
Distributed Computational
Models, at the IEEE
International Parallel &
Distributed Processing Symposium
2013, Boston MA, January 2013.
A pdf
version is available.
Implementing
a Blocked Aasen's Algorithm
with a Dynamic Scheduler on
Multicore Architectures,
Ichitaro Yamazaki, Dulceneia
Becker, Jack Dongarra, Alex
Druinsky, Inon Peled, Sivan
Toledo, Grey Ballard, James
Demmel, and Oded Schwartz, 15th
Workshop on Advances in Parallel
and Distributed Computational
Models, at the IEEE
International Parallel &
Distributed Processing Symposium
2013 (Best Paper Award), Boston
MA, January 2013.
A pdf
version is available.
Virtual
Systolic Array for QR
Decomposition, Jakub
Kurzak, Piotr Luszczek, Mark
Gates, Ichitaro Yamazaki, and
Jack Dongarra, 15th
Workshop on Advances in Parallel
and Distributed Computational
Models, at the IEEE
International Parallel &
Distributed Processing Symposium
2013, Boston MA, January 2013.
A pdf
version is available.
clMAGMA:
High Performance Dense Linear
Algebra with OpenCL, C.
Cao, Jack Dongarra, Peng Du,
Mark Gates, Piotr Luszczek,
Stanimire Tomov, International
Workshop on OpenCL (IWOCL),
GATech, May 13-14, 2013.
A pdf
version is available.
A
Parallel solver for
Incompressible Fluid Flows,
Y. Wang, M. Baboulin, J.
Dongarra, J. Falcou, Y.
Fraigneau, and O. Le Maitre,
International Conference on
Computational Science, ICCS
2013, Barcelona, Spain, May,
2013.
A pdf
version is available.
Leading
Edge Hybrid Multi-GPU
Algorithms for Generalized
Eigenproblems in Electronic
Structure Calculations,
Azzam Haidar, Raffaele Solca,
Mark Gates, Stanimire Tomov,
Thomas Schulthess, and Jack
Dongarra, International
Supercomputing Conference ISC,
Germany, Lecture Notes in
Computer Science, Volume 7905,
2013, pp 67-80.
A pdf
version is available.
Beyond
the CPU: Hardware Performance
Counter Monitoring on Blue
Gene/Q, Dan Terpstra,
Kris Davis, Heike McCraw, Jack
Dongarra, International
Supercomputing Conference ISC,
Germany, Lecture Notes in
Computer Science, Volume
7905, 2013, pp 213-225.
A pdf
version is available.
Toward
a scalable multi-GPU
eigensolver via
compute-intensive kernels and
efficient communication,
Azzam Haidar, Mark Gates,
Stanimire Tomov, Jack Dongarra,
Proceedings of the 27th
International ACM Conference on
Supercomputing (ICS '13), pp.
223-232, ACM, New York, NY, USA,
June 2013, Eugene, Oregon.
A pdf
version is available.
Portable
HPC Programming on Intel
Many-Integrated-Core Hardware
with MAGMA Port to Xeon Phi,
Jack Dongarra, Mark Gates, Azzam
Haidar, Yulu Jia, Khairul Kabir,
Piotr Luszczek and Stan Tomov,
To appear in the PPAM Conference
2013, Warsaw, Poland, September
2013.
A pdf
version is available.
Standards
for Graph Algorithm Primitives,
Tim Mattson et al., to appear in
HPEC 2013, Boston, September
10, 2013.
A pdf
version is available.
Implementing
a Systolic Algorithm for QR
Factorization on Multicore
Clusters with PaRSEC,
Guillaume Aupy, Mathieu Faverge,
Yves Robert, Jakub Kurzak, Piotr
Luszczek, and Jack Dongarra,
accepted in the 6th Workshop on
Productivity and Performance
held in conjunction with
Euro-Par 2013, Aachen, Germany
August 26 or 27, 2013.
A pdf
version is available.
Parallel
Reduction to Hessenberg Form
with Algorithm-based Fault
Tolerance, Yulu Jia,
George Bosilca, Piotr Luszczek,
and Jack J. Dongarra, accepted
in SC2013, July 2013.
A pdf
version is available.
- 2012 -
Autotuning GEMMs
for Fermi, Jakub
Kurzak, Stanimire Tomov, and
Jack Dongarra, IEEE
Transactions on Parallel and
Distributed Systems, vol.
23, no. 11, November 2012,
pp 2045-2057.
A pdf
version is available.
Energy Footprint of
Advanced Dense Numerical
Linear Algebra using Tile
Algorithms on Multicore
Architecture, Jack
Dongarra, Hatem Ltaief,
Piotr Luszczek, and Vince M.
Weaver, The 2nd
International Conference on
Cloud and Green
Computing (CGC 2012), pp.
274-281, ISBN:
978-1-4673-3027-5, November
1-3, 2012, Xiangtan, Hunan,
China.
A pdf
version is available.
A Novel Hybrid CPU-GPU
Generalized Eigensolver
for Electronic Structure
Calculations Based on Fine
Grained Memory Aware
Tasks, Raffaele
Solcà, Azzam Haidar,
Stanimire Tomov, Jack
Dongarra, and Thomas C.
Schulthess, Proceedings of
SC '12, High Performance
Computing, Networking,
Storage and Analysis,
pp. 1338-1339, IEEE Computer
Society, Washington, DC, USA.
A pdf
version is available.
Analysis
of Dynamically Scheduled
Tile Algorithms for Dense
Linear Algebra on
Multicore Architectures, A.
Haidar, H. Ltaief, A.
YarKhan, J. Dongarra,
Concurrency and
Computation: Practice and
Experience, Volume 24,
Issue 3, pp. 305-321,
10 March 2012.
A pdf
version is available.
From CUDA to OpenCL:
Towards a
Performance-portable
Solution for
Multi-platform GPU
Programming, P. Du,
R. Weber, P. Luszczek, S.
Tomov, G. Peterson, and J.
Dongarra, Parallel
Computing, Volume 38, Issue
8, August 2012, pp. 391-407.
A pdf
version is available.
High-performance
computing systems: Status
and Outlook,
Jack Dongarra and A. J. van
der Steen, Acta Numerica
(2012), pp. 1-96.
A pdf
version is available.
An Implementation
of the Tile QR
Factorization for a GPU
and Multiple CPUs,
Jakub Kurzak, Rajib Nath,
Peng Du, and Jack Dongarra,
in Applied Parallel and
Scientific Computing, PARA
2010, edited by Kristjan
Jonasson, Springer, LNCS,
Volume 7133, pp 248-257,
2012.
A pdf
version is available.
DAGuE: A generic
distributed DAG engine for
high performance
computing, G. Bosilca,
A. Bouteiller, A. Danalis,
T. Herault, P. Lemarinier,
J. Dongarra, Parallel
Computing, Volume 38, Issue
1-2, pp. 37-51, 2012.
A pdf
version is available.
Divide and Conquer
on Hybrid GPU-Accelerated
Multicore Systems,
Christof Vömel, Stanimire
Tomov, and Jack Dongarra,
SIAM J. Sci. Comput., Volume
34, pp. C70-C82, 2012.
A pdf
version is available.
A Comprehensive
Study of Task Coalescing
for Selecting Parallelism
Granularity in a Two-Stage
Bidiagonal Reduction,
A. Haidar, H. Ltaief, P.
Luszczek, and J. Dongarra,
26th IEEE International
Parallel & Distributed
Processing Symposium
(IPDPS), Shanghai, China,
May 2012.
A pdf
version is available.
A Tiled Parallel Solver
For Symmetric Indefinite
Systems On Multicore
Architectures, Marc
Baboulin, D. Becker, and J.
Dongarra, 26th IEEE
International Parallel &
Distributed Processing
Symposium (IPDPS), Shanghai,
China, May 2012.
A pdf
version is available.
Algorithm-Based Fault
Tolerance for Dense Matrix
Factorization, Peng Du,
Aurelien Bouteiller, George
Bosilca, Jack J. Dongarra,
Thomas Herault, 17th ACM
SIGPLAN Symposium on
Principles and Practice of
Parallel Programming
(PPoPP), February 25-29,
2012, New Orleans, LA.
A pdf
version is available.
Block-asynchronous
Multigrid Smoothers for
GPU-accelerated Systems, Hartwig
Anzt, Stan Tomov, Mark
Gates, Jack Dongarra, and
Vincent Heuveline, Procedia
Computer Science,
Proceedings of the
International Conference on
Computational Science, ICCS
2012, Volume 9, pp.
7-16, 2012.
A pdf
version is available.
From Serial Loops to
Parallel Execution on
Distributed Systems,
Anthony Danalis, Aurelien
Bouteiller, George Bosilca,
Jack J. Dongarra, Thomas
Herault, submitted to PPoPP
2012.
A pdf
version is available.
HierKNEM: An Adaptive
Framework for
Kernel-Assisted and
Topology-Aware Collective
Communications on
Many-core Clusters (Best
Paper),
Teng Ma, G. Bosilca, A.
Bouteiller, J. Dongarra,
26th IEEE International
Parallel & Distributed
Processing Symposium
(IPDPS), Shanghai, China,
May 2012.
A pdf
version is available.
Weighted
Block-Asynchronous
Relaxation for
GPU-Accelerated Systems,
Hartwig Anzt, Jack Dongarra,
and Vincent Heuveline,
submitted to SIAM Journal on
Computing, March 2012.
A pdf
version is available.
Dense Linear Algebra on
Accelerated Multicore
Hardware, Jack
Dongarra, Jakub Kurzak,
Piotr Luszczek, and
Stanimire Tomov, in High
Performance Scientific
Computing: Algorithms and
Applications, Editors
Michael W. Berry, Kyle A.
Gallivan, Efstratios
Gallopoulos, Ananth Grama,
Bernard Philippe, Yousef
Saad and Faisal Saied,
Springer, 2012.
A pdf
version is available.
Enhancing
Parallelism of Tile
Bidiagonal Transformation
on Multicore Architectures
using Tree Reduction,
H. Ltaief, P. Luszczek, and
J. Dongarra, in Lecture Notes in
Computer Science, Volume
7203, 2012, Parallel
Processing and Applied
Mathematics 9th
International Conference,
PPAM 2011, Torun, Poland,
September 11-14, 2011, Part
I, edited by Roman Wyrzykowski, Jack
Dongarra, Konrad Karczewski
and Jerzy Wasniewski, pp
661-670, 2012.
A pdf
version is available.
Reducing the Amount
of Pivoting in Symmetric
Indefinite Systems,
D. Becker, M. Baboulin, J.
Dongarra, in Lecture Notes
in Computer Science, Volume
7203, 2012, Parallel
Processing and Applied
Mathematics 9th
International Conference,
PPAM 2011, Torun, Poland,
September 11-14, 2011, Part
I, edited by Roman Wyrzykowski, Jack
Dongarra, Konrad Karczewski
and Jerzy Wasniewski, pp
133-142, 2012.
A pdf
version is available.
Block-asynchronous
Multigrid Smoothers for
GPU-accelerated Systems, Hartwig
Anzt,
Stan Tomov, Mark Gates, Jack
Dongarra, and Vincent
Heuveline, International
Conference on Computational
Science (ICCS) 2012, May
2012, Omaha NE.
A pdf
version is available.
One-sided
dense matrix
factorizations on a
multicore with multiple
GPU accelerators in MAGMA,
Ichitaro Yamazaki,
Stanimire Tomov, and Jack
Dongarra, International
Conference on Computational
Science, ICCS 2012, Omaha
NE.
A pdf
version is available.
A
Class of
Communication-Avoiding
Algorithms for Solving
General Dense Linear
Systems on CPU/GPU
Parallel Machines, Marc
Baboulin,
Simplice Donfack, Jack
Dongarra, Laura Grigori,
Adrien Rémy, Stanimire
Tomov, International
Conference on Computational
Science, ICCS 2012, Omaha
NE.
A pdf
version is available.
High
Performance Dense Linear
System Solver with
Resilience to Multiple
Soft Errors, P.
Du, P. Luszczek, and J.
Dongarra, International
Conference on Computational
Science, ICCS 2012, Omaha
NE.
A pdf
version is available.
Enabling and
Scaling Matrix
Computations on
Heterogeneous Multi-Core
and Multi-GPU Systems,
Fengguang Song and Jack
Dongarra, ICS 2012
Conference, 26th
International Conference on
Supercomputing, 25-29 June
2012, San Servolo Island,
Venice, Italy.
A pdf
version is available.
A
Scalable Framework for
Heterogeneous GPU-Based
Clusters, F. Song
and J. Dongarra, ACM
Symposium on Parallelism in
Algorithms and Architectures
(SPAA '12), Pittsburgh, USA,
January 2012.
A pdf
version is available.
A
Checkpoint-on-Failure
Protocol for
Algorithm-Based Recovery
in Standard MPI, Wesley
Bland, Peng Du, Aurelien
Bouteiller, Thomas Herault,
George Bosilca, and Jack J.
Dongarra, Euro-Par 2012
Parallel Processing, Lecture
Notes in Computer Science
Volume 7484, 2012, pp
477-488 (distinguished
paper).
A pdf
version is available.
From
Serial Loops to Parallel
Execution on Distributed
Systems, Anthony
Danalis, Aurelien
Bouteiller, George Bosilca,
Jack J. Dongarra, Thomas
Herault, Euro-Par 2012
Parallel Processing, Lecture
Notes in Computer Science
Volume 7484, 2012, pp
246-257.
A pdf
version is available.
Power Profiling of
Cholesky and QR
Factorizations on
Distributed Memory Systems,
George Bosilca, Jack
Dongarra, and Hatem Ltaief,
accepted at EnA-HPC 2012:
Third International
Conference on Energy-Aware
High Performance Computing,
September 12-14, 2012.
A pdf
version is available.
Energy
Footprint of Advanced
Dense Numerical Linear
Algebra using Tile
Algorithms on Multicore
Architecture, Jack
Dongarra, Hatem Ltaief,
Piotr Luszczek, and Vince M.
Weaver, submitted to The 2nd
International Conference on
Cloud and Green
Computing (CGC 2012), November
1-3, 2012, Xiangtan, Hunan,
China.
A pdf
version is available.
Anatomy of a
Globally Recursive
Embedded LINPACK Benchmark,
Piotr Luszczek and Jack
Dongarra, accepted in 2012
IEEE High Performance
Extreme Computing
Conference, Waltham,
Massachusetts, September
2012.
A pdf
version is available.
Weights for
Block-Asynchronous
Iteration on
GPU-Accelerated Systems,
Hartwig Anzt, Stanimire
Tomov, Jack Dongarra, and
Vincent Heuveline, To appear
in the 10th HeteroPar'2012
(Tenth International
Workshop on Algorithms,
Models and Tools for
Parallel Computing on
Heterogeneous Platforms),
Rhodes Island, Greece,
August 2012.
A pdf
version is available.
GPU-Accelerated
Asynchronous Error
Correction for Mixed
Precision Iterative
Refinement, H.
Anzt, P. Luszczek, J.
Dongarra, V. Heuveline,
Euro-Par 2012 Parallel
Processing, Lecture Notes in
Computer Science Volume
7484, 2012, pp 908-919, Rhodes Island,
Greece, August 2012.
A pdf
version is available.
- 2011 -
High-Performance
High-Resolution
Semi-Lagrangian Tracer
Transport on a Sphere, T.
White and J. Dongarra,
Journal of Computational
Physics, Volume 230 Issue
17, July, 2011, pp
6778-6799.
A pdf
version is available.
A Class of Hybrid LAPACK
Algorithms for Multicore
and GPU Architectures,
M. Horton, S. Tomov, and J.
Dongarra, to appear in 2011
Symposium on Application
Accelerators in High
Performance Computing, 19-21
July, 2011, Knoxville TN.
A pdf
version is available.
Algorithm-based Fault
Tolerance Method for Soft
Error Resilience in
High-Performance Linpack,
Peng Du, Piotr Luszczek, and
Jack Dongarra, IEEE Cluster
2011, September 26-30,
Austin, TX.
A pdf
version is available.
Analysis of Dynamically
Scheduled Tile Algorithms
for Dense Linear Algebra
on Multicore
Architectures, Azzam Haidar,
Hatem Ltaief, Asim YarKhan
and Jack Dongarra, IPDPS
2011, Anchorage, AK, May
2011.
A pdf
version is available.
BLAS for GPUs, R.
Nath, S. Tomov, and J.
Dongarra, pp 57-80, in
Scientific Computing with
Multicore and Accelerators,
Edited by Jakub Kurzak,
David Bader, and Jack
Dongarra, Chapman &
Hall/CRC Computational
Science Series, ISBN
978-1-4398-2536-5, 2011.
A pdf
version is available.
Changes in Dense Linear
Algebra Kernels,
Decades-long perspective,
Piotr Luszczek, Jakub
Kurzak, and Jack Dongarra,
pp 313-342, in Solving the
Schrödinger equation: has
everything been tried?
edited by Paul Popelier,
Imperial College Press,
2011, ISBN-13
978-1-84816-724-7.
A pdf
version is available.
DAGuE: A generic
distributed DAG engine for
high performance
computing, G. Bosilca,
A. Bouteiller, A. Danalis,
T. Herault, P. Lemarinier,
J. Dongarra, Parallel and
Distributed Processing
Workshops and Phd Forum
(IPDPSW), 2011 IEEE
International Symposium on,
pp. 1151-1158, 16-20 May
2011, ISSN: 1530-2075.
A pdf
version is available.
Dense Linear Algebra for
Hybrid GPU-Based Systems,
S. Tomov and J. Dongarra, pp
37-56, in Scientific
Computing with Multicore and
Accelerators, Edited by
Jakub Kurzak, David Bader,
and Jack Dongarra, Chapman
& Hall/CRC Computational
Science Series, ISBN
978-1-4398-2536-5, 2011.
A pdf
version is available.
Evaluation of the HPC
Challenge Benchmarks in
Virtualized Environments,
P. Luszczek, E. Meek, S.
Moore, D. Terpstra, J.
Dongarra, 6th Workshop on
Virtualization in
High-Performance Cloud
Computing (VHPC '11) as part
of Euro-Par 2011, Bordeaux,
France.
A pdf
version is available.
Exploiting Fine-Grain
Parallelism in Recursive
LU Factorization, Jack
Dongarra, Mathieu Faverge,
Hatem Ltaief, Piotr
Luszczek, International
Conference on Parallel
Computing, 30 August - 2
September 2011, Ghent,
Belgium.
A pdf
version is available.
Flexible Development of
Dense Linear Algebra
Algorithms on Massively
Parallel Architectures
with DPLASMA, George
Bosilca, Aurelien
Bouteiller, Anthony Danalis,
Mathieu Faverge, Azzam
Haidar, Thomas Herault,
Jakub Kurzak, Julien Langou,
Pierre Lemarinier, Hatem
Ltaief, Piotr Luszczek, Asim
YarKhan, Jack Dongarra, 12th
IEEE International Workshop
on Parallel and Distributed
Scientific and Engineering
Computing (PDSEC-11), May
16-20, 2011, Anchorage,
Alaska, USA.
A pdf
version is available.
Fully Empirical Autotuned
Dense QR Factorization For
Multicore Architectures,
E. Agullo, J. Dongarra, R.
Nath, S. Tomov, EuroPar
2011.
A pdf
version is available.
High Performance Matrix
Inversion Based on LU
Factorization for
Multicore Architectures, J.
Dongarra, M. Faverge, H.
Ltaief, P. Luszczek, 4th
Workshop on Many-Task
Computing on Grids and
Supercomputers (MTAGS) 2011,
Co-located with
Supercomputing/SC 2011,
Seattle Washington, November
14th, 2011.
A pdf
version is available.
Impact of Kernel-Assisted
MPI Communication over
Scientific Applications:
CPMD and FFTW, T. Ma,
A. Bouteiller, G. Bosilca,
J. Dongarra, EuroMPI-2011,
September 19-21, 2011,
Santorini Greece.
A pdf
version is available.
Implementing Matrix
Factorization on the Cell
B.E., J. Kurzak, and
J. Dongarra, pp. 21-35, in
Scientific Computing with
Multicore and Accelerators,
Edited by Jakub Kurzak,
David Bader, and Jack
Dongarra, Chapman &
Hall/CRC Computational
Science Series, ISBN
978-1-4398-2536-5, 2011.
A pdf
version is available.
Implementing Matrix
Multiplication on the Cell
B.E., W. Alvaro, J.
Kurzak, and J. Dongarra, pp
3-20, in Scientific
Computing with Multicore and
Accelerators, Edited by
Jakub Kurzak, David Bader,
and Jack Dongarra, Chapman
& Hall/CRC Computational
Science Series, ISBN
978-1-4398-2536-5, 2011.
A pdf
version is available.
Improvement of
parallelization efficiency
of batch pattern BP
training algorithm using
Open MPI, Volodymyr
Turchenko, Lucio
Grandinetti, George Bosilca
and Jack J. Dongarra,
International Conference on
Computational Science, ICCS
2010, Amsterdam The
Netherlands, June 2010.
A pdf
version is available.
Keeneland: Bringing
Heterogeneous GPU
Computing to the
Computational Science
Community, J.S.
Vetter, R. Glassbrook, J.
Dongarra, K. Schwan, B.
Loftis, S. McNally, J.
Meredith, J. Rogers, P.
Roth, K. Spafford, and S.
Yalamanchili, IEEE Computing
in Science and Engineering,
13(5):90-5, 2011, ISSN:
1521-9615.
A pdf
version is available.
Level-3 Cholesky
Factorization Routines
Improve Performance of
Many Cholesky Algorithms,
Fred G. Gustavson, Jerzy
Wasniewski, Jack J.
Dongarra, J. Herrero, and J.
Langou, accepted in ACM
TOMS, June 2011.
A pdf
version is available.
LU Factorization for
Accelerator-based Systems,
Emmanuel Agullo, Cédric
Augonnet, Jack Dongarra,
Mathieu Faverge, Julien
Langou, Hatem Ltaief,
Stanimire Tomov, The 9th
ACS/IEEE International
Conference on Computer
Systems and Applications
AICCSA 2011, June 27th -
June 30th 2011, Sharm
El-Sheikh, Egypt.
A pdf
version is available.
Multithreading in the
PLASMA Library, Jakub
Kurzak, Piotr Luszczek, Asim
YarKhan, Mathieu Faverge,
Julien Langou, Henricus
Bouwmeester, and Jack
Dongarra, in Multi- and
Many-Core Technologies:
Architecture, Programming,
Algorithms, &
Applications, published by
Taylor & Francis, 2011.
A pdf
version is available.
OMPIO: A Modular Software
Architecture for MPI I/O,
Mohamad Chaarawi, Edgar
Gabriel, Rainer Keller,
Richard Graham, George
Bosilca and Jack Dongarra,
EuroMPI-2011, September
19-21, 2011, Santorini,
Greece.
A pdf
version is available.
On Scalability for MPI
Runtime Systems,
George Bosilca, Thomas
Herault, Ala Rezmerita and
Jack Dongarra, The
International Workshop on
Runtime and Operating
Systems for Supercomputers,
May 31, 2011.
A pdf
version is available.
Optimizing Symmetric
Dense Matrix-Vector
Multiplication on GPUs,
Jakub Kurzak, Jack Dongarra,
and Rajib Nath, IEEE/ACM
SC11 Conference, Seattle, WA,
November 2011.
A pdf
version is available.
Overlapping Computation
and Communication for
Advection on Hybrid
Parallel Computers, J.
White and J. Dongarra, IPDPS
2011, Anchorage, AK, May
2011.
A pdf
version is available.
Parallel Reduction to
Condensed Forms for
Symmetric Eigenvalue
Problems using Aggregated
Fine-Grained and
Memory-Aware Kernels,
Hatem Ltaief, Azzam Haidar, and
Jack Dongarra, IEEE/ACM SC11
Conference, Seattle, WA,
November 2011.
A pdf
version is available.
Performance Portability
of a GPU Enabled
Factorization with the
DAGuE Framework, Aurelien
Bouteiller, George Bosilca,
Jack J. Dongarra, Thomas
Herault, Pierre Lemarinier,
Stanimire Tomov and Narapat
Ohm Saengpatsa, IEEE
Cluster: workshop on
Parallel Programming on
Accelerator Clusters (PPAC),
June 24, 2011.
A pdf
version is available.
Profiling High
Performance Dense Linear
Algebra Algorithms on
Multicore Architectures
for Power and Energy
Efficiency, Hatem
Ltaief, Piotr Luszczek and
Jack Dongarra, the
International Conference on
Energy-Aware High
Performance Computing,
September 07-09, 2011,
Hamburg, Germany.
A pdf
version is available.
QCG-OMPI: MPI
Applications on Grids,
Emmanuel Agullo, Camille
Coti, Thomas Herault, Julien
Langou, Sylvain Peyronnet,
Ala Rezmerita, Franck
Cappello, Jack Dongarra,
Future Generation Computer
Systems, Volume 27, Issue 4,
pp 357-369, April 2011.
A pdf
version is available.
QR Factorization of Tall
and Skinny Matrices in a
Grid Computing
Environment, Emmanuel
Agullo, Camille Coti, Jack
Dongarra, Thomas Herault,
and Julien Langou,
UT-CS-10-651, January 6,
2010.
A pdf
version is available.
QR Factorization on a
Multicore Node Enhanced
with Multiple GPU
Accelerators, E.
Agullo, C. Augonnet, J.
Dongarra, M. Faverge, H.
Ltaief, S. Thibault, S.
Tomov, IPDPS 2011,
Anchorage, AK, May 2011.
A pdf
version is available.
Recent Advances in the
Message Passing Interface:
18th European MPI Users'
Group Meeting, EuroMPI
2011, Santorini, Greece,
September 18-21, 2011,
Yiannis Cotronis, Anthony
Danalis, Dimitrios S.
Nikolopoulos, and Jack
Dongarra (Eds.) Springer,
LNCS, Volume 6960, 2011,
ISSN 0302-9743, ISBN
978-3-642-24448-3.
Rectangular Full Packed
Format for Cholesky's
Algorithm: Factorization,
Solution, and Inverse.
Fred G. Gustavson, Jerzy
Wasniewski, Jack J.
Dongarra, and J. Langou, ACM
TOMS, Volume 37, Number 2,
2011, pp. 18:1-18:21,
ISSN 0098-3500.
A pdf
version is available.
Reducing the Amount of
Pivoting in Symmetric
Indefinite Systems, D.
Becker, M. Baboulin, J.
Dongarra, to appear in PPAM,
October 2011.
A pdf
version is available.
Scalable Runtime for MPI:
Efficiently Building the
Communication
Infrastructure, G.
Bosilca, T. Herault, P.
Lemarinier, A. Rezmerita,
and J. Dongarra,
EuroMPI-2011, September
19-21, 2011, Santorini,
Greece.
A pdf
version is available.
Scientific Computing with
Multicore and
Accelerators, Edited
by Jakub Kurzak, David
Bader, and Jack Dongarra,
Chapman & Hall/CRC
Computational Science
Series, ISBN
978-1-4398-2536-5, 2011.
Soft Error Resilient QR
Factorization for Hybrid
System with GPGPU, P.
Du, P. Luszczek, S. Tomov,
and J. Dongarra, Workshop on
Latest Advances in Scalable
Algorithms for Large-Scale
Systems (ScalA) held in
conjunction with the 24th
IEEE/ACM International
Conference on High
Performance Computing,
Networking, Storage and
Analysis (SC) 2011, November
14, 2011, Seattle, WA, USA.
A pdf
version is available.
Solving the Generalized
Symmetric Eigenvalue
Problem using Tile
Algorithms on Multicore
Architectures, Hatem
Ltaief, Piotr Luszczek, and
Jack Dongarra, International
Conference on Parallel
Computing, 30 August - 2
September 2011, Ghent,
Belgium.
A pdf
version is available.
The International
Exascale Software Roadmap,
J. Dongarra, P. Beckman, et
al., International Journal of
High Performance Computing,
Volume 25, Number 1, pp.
3-60, 2011, ISSN 1094-3420.
A pdf
version is available.
Toward High
Performance Divide and
Conquer Eigensolver for
Dense Symmetric Matrices,
Azzam Haidar, Hatem Ltaief,
and Jack Dongarra, submitted
to SIAM SISC, February 2011.
A pdf
version is available.
Towards an efficient tile
matrix inversion of
symmetric positive
definite matrices on
multicore architectures, Agullo,
E.,
Bouwmeester, H., Dongarra,
J., Kurzak, J., Langou, J.,
and Rosenberg, L., In
Proceedings of the 9th
International Meeting on
High Performance Computing
for Computational Science,
VECPAR'10, Berkeley, CA,
June 22-25, 2010.
A pdf
version is available.
Trace-based Performance
Analysis for the Petascale
Simulation Code FLASH,
Heike Jagode, Jack Dongarra,
Andreas Knupfer, Matthias
Jurenz, Matthias S. Muller,
and Wolfgang E. Nagel,
International Journal of
High Performance Computing,
Volume 25, Number 4, Winter
2011, pp. 428-439, ISSN
1094-3420.
A pdf
version is available.
Two-Stage Tridiagonal
Reduction for Dense
Symmetric Matrices using
Tile Algorithms on
Multicore Architectures,
Piotr Luszczek, Hatem
Ltaief, and Jack Dongarra,
IPDPS 2011, Anchorage, AK,
May 2011.
A pdf
version is available.
- 2010 -
Accelerating
the Reduction to Upper
Hessenberg, Tridiagonal,
and Bidiagonal Forms
Through Hybrid GPU-Based
Computing, S.
Tomov, R. Nath, and J.
Dongarra, Parallel
Computing, Volume 36, Number
12, 2010, pp. 645-654.
A pdf
version is available.
An Improved MAGMA GEMM
for Fermi GPUs, Rajib
Nath, Stanimire Tomov, and
Jack Dongarra, International
Journal of High Performance
Computing Applications,
Volume 24, number 4, 2010,
pp 511-515, ISSN 1094-3420.
A pdf
version is available.
Dense Linear Algebra
Solvers for Multicore with
GPU Accelerators,
Stanimire Tomov, Rajib Nath,
Hatem Ltaief, and Jack
Dongarra, Proceedings of
IPDPS 2010: 24th IEEE
International Parallel and
Distributed Processing
Symposium, Atlanta, GA,
April 2010.
A pdf
version is available.
Empirical Performance
Tuning of Dense Linear
Algebra Software, Jack
Dongarra and Shirley Moore,
pp 255-272, in Performance
Tuning of Scientific
Applications, David H.
Bailey, Robert F. Lucas,
Samuel W. Williams, Editors,
Chapman & Hall/CRC
Computational Science
Series, ISBN
978-1-4398-1569-4, 2010.
A pdf
version is available.
Faster, Cheaper, Better -
a Hybridization
Methodology to Develop
Linear Algebra Software
for GPUs, Emmanuel
Agullo, Cedric Augonnet,
Jack Dongarra, Hatem Ltaief,
Raymond Namyst, Samuel
Thibault, and Stanimire
Tomov, Nvidia GPU Gems,
Morgan Kaufmann (Ed.), 2010.
A pdf
version is available.
Hybrid Multicore Cholesky
Factorization with
Multiple GPU Accelerators,
H. Ltaief, S. Tomov, R.
Nath, and J. Dongarra,
Submitted to IEEE
Transactions on Parallel and
Distributed Systems, March
2010.
A pdf
version is available.
Parallel Band Two-Sided
Matrix Bidiagonalization
for Multicore
Architectures, Hatem
Ltaief, Jakub Kurzak, and
Jack Dongarra, IEEE
Transactions on Parallel and
Distributed Systems, April
2010, pp 417-423.
A pdf
version is available.
Redesigning the Message
Logging Model for High
Performance, A.
Bouteiller, G. Bosilca, and
J. Dongarra, Concurrency and
Computation: Practice and
Experience, Volume 22,
Number 15, November 2010, pp
2196-2212, ISSN 1532-0626.
A pdf
version is available.
Scheduling Linear Algebra
Operations on Multicore
Processors, Jakub
Kurzak, Hatem Ltaief, Jack
Dongarra, and Rosa M. Badia,
Concurrency and Computation:
Practice and Experience,
Vol. 22, no. 1, pp. 15-44,
January, 2010.
A pdf
version is available.
Scheduling Two-sided
Transformations using
Algorithms-by-Tiles on
Multicore Architectures,
H. Ltaief, J. Kurzak, J.
Dongarra, and R. Badia,
Scientific Programming,
Volume 18, Number 1, pp
35-50, 2010, ISSN 1058-9244.
A pdf
version is available.
Self-Healing Network for
Scalable Fault-Tolerant
Runtime Environments,
T. Angskun, G. Fagg, G.
Bosilca, J.
Pjesivac-Grbovic, and J.
Dongarra, Future Generation
Computer Systems, Volume 26,
Issue 3, pp 479-485, March
2010, ISSN 0167-739X.
A pdf
version is available.
SmartGridRPC: The new RPC
model for high performance
Grid computing and its
implementation in
SmartGridSolve, T.
Brady, A. Lastovetsky, K.
Seymour, M. Guidolin, and J.
Dongarra, Concurrency and Computation:
Practice and Experience, pp
2467-2487, Volume 22 Number
18, ISSN 1532-0626, 2010.
A pdf
version is available.
Towards Dense Linear
Algebra for Hybrid GPU
Accelerated Manycore
Systems, Parallel
Computing, Volume 36, Issues
5-6, pp 232-240, 2010, ISSN
0167-8191.
A pdf
version is available.
Reliability and
Performance Modeling and
Analysis for Grid
Computing, Yuan-Shun
Dai, Jack Dongarra, in
Handbook of Research on
Scalable Computing
Technologies, Editors
Kuan-Ching Li, Ching-Hsien
Hsu, Laurence Tianruo Yang,
Jack Dongarra, Hans Zima,
IGI Global, 2010.
A pdf
version is available.
Transparent
Cross-Platform Access to
Software Services using
GridSolve and GridRPC,
Keith Seymour, Asim YarKhan,
and Jack Dongarra, to appear
in Cloud Computing and
Software Services: Theory
and Techniques, editors Syed
Ahson and Mohammad Ilyas,
2010, CRC Press.
A pdf
version is available.
- 2009 -
A Class of Parallel Tiled
Linear Algebra Algorithms
for Multicore
Architectures, Alfredo
Buttari, Julien Langou,
Jakub Kurzak, and Jack
Dongarra, Parallel
Computing, Volume 35, Issue
1, pp 38-53, 2009,
ISSN 0167-8191.
A pdf
version is available.
Accelerating Scientific
Computations with Mixed
Precision Algorithms,
Marc Baboulin, Alfredo
Buttari, Jack Dongarra,
Jakub Kurzak, Julie Langou,
Julien Langou, Piotr
Luszczek, and Stanimire
Tomov, Computer Physics
Communications 180 (2009)
2526-2533.
A pdf
version is available.
Accelerating
Time-To-Solution for
Computational Science and
Engineering, J.
Demmel, J. Dongarra, A. Fox,
S. Williams, V. Volkov, and
K. Yelick, SciDAC Review,
Winter 2009, pp 46-57.
A pdf
version is available.
Algorithmic Based Fault
Tolerance Applied to High
Performance Computing,
Jack J. Dongarra, George
Bosilca, Remi Delmas, and
Julien Langou, Journal of
Parallel and Distributed
Computing, Volume 69, pp
410-416, 2009.
A pdf
version is available.
Computing the
Conditioning of the
Components of a Linear
Least Squares Solution,
Marc Baboulin, Jack
Dongarra, and Julien
Langou, Numerical Linear
Algebra with Applications,
July 2009, Volume 16 Issue
7, pp 517-533.
A pdf
version is available.
Highly Scalable
Self-Healing Algorithms
for High Performance
Scientific Computing,
Zizhong Chen and Jack Dongarra,
IEEE Transactions on
Computers, Volume 58, Number
11, November 2009, pp
1512-1524, ISSN 0018-9340.
A pdf
version is available.
Optimizing Matrix
Multiplication for a
Short-Vector SIMD
Architecture -- CELL
Processor, Wesley
Alvaro, Jakub Kurzak, and
Jack Dongarra, Parallel
Computing, Volume 35, pp
138-150, 2009.
A pdf
version is available.
Paravirtualization Effect
on Single- and
Multi-threaded
Memory-Intensive Linear
Algebra Software,
Lamia Youseff, Keith
Seymour, Haihang You,
Dmitrii Zagorodnov, Jack
Dongarra, and Rich Wolski,
Cluster Computing Journal,
Volume 12, Number 2, June
2009, pp 101-122, ISSN
1386-7857.
A pdf
version is available.
QR Factorization for the
CELL Processor, Jakub
Kurzak and Jack Dongarra,
Scientific Programming,
Volume 17, Issue 1-2,
January 2009, pp 31-42,
ISSN 1058-9244.
A pdf
version is available.
Scheduling Linear Algebra
Operations on Multicore
Processors, Jakub
Kurzak, Hatem Ltaief, Jack
Dongarra, and Rosa Badia, to
appear in Trends in High
Performance and Large Scale
Computing, editors L.
Grandinetti, G. Joubert, and
W. Gentzsch, IOP Press, to
be published in 2009.
A pdf
version is available.
The International
Exascale Software Project:
A Call to Cooperative
Action by the Global High
Performance Community,
Jack Dongarra, Pete Beckman,
Patrick Aerts, Frank
Cappello, Thomas Lippert,
Satoshi Matsuoka, Paul
Messina, Terry Moore, Rick
Stevens, Anne Trefethen,
Mateo Valero, Volume 23,
Number 4, Winter 2009,
International Journal of
High Performance Computer
Applications, pp 309-322,
ISSN 1094-3420.
A pdf
version is available.
The Problem with the
Linpack Benchmark Matrix
Generator, Julien
Langou and Jack Dongarra,
International Journal of
High Performance Computer
Applications, Volume 23,
Number 1, Spring 2009, pp 5
- 14.
A pdf
version is available.
- 2008 -
A Comparison of Search
Techniques for Empirical
Code Optimization,
Keith Seymour, Haihang You,
and Jack Dongarra, submitted
to The Third International
Workshop on Automatic
Performance Tuning, October
1st, 2008, Tsukuba
International Congress
Center, Epochal Tsukuba,
Japan.
A pdf
version is available.
A Tribute to Gene Golub,
Jack Dongarra,
Computing in Science and
Engineering, IEEE,
March/April 2008, pp 5.
A pdf
version is available.
Algorithm-Based Fault
Tolerance for Fail-Stop
Failures, Zizhong Chen
and Jack Dongarra, IEEE
Transactions on Parallel and
Distributed Systems, Vol.
19, No. 12, December, 2008.
A pdf
version is available.
Interactive Grid-Access
Using Gridsolve and Giggle,
M. Hardt, K. Seymour, J.
Dongarra, M. Zapf, and N.V.
Ruiter, Computing and
Informatics, Vol. 27, No. 2,
pp 233-248, 2008, ISSN
1335-9150.
A pdf
version is available.
Interior State
Computation of Nano
Structures, Andrew
Canning, Jack Dongarra,
Julien Langou, Osni Marques,
Stanimire Tomov, Christof
Voemel, and Lin-Wang Wang,
PARA 2008, 9th International
Workshop on State-of-the-Art
in Scientific and Parallel
Computing, May 13-16, 2008,
Trondheim, Norway.
A pdf
version is available.
Netlib and NA-Net:
Building a Scientific
Computing Community,
J. Dongarra, G. Golub, E.
Grosse, C. Moler, K. Moore,
IEEE Annals of the History
of Computing, Volume 30
Number 2, April - June 2008,
pp 30 - 41.
A pdf
version is available.
Parallel Tiled QR
Factorization for
Multicore Architectures,
Alfredo Buttari, Julien
Langou, Jakub Kurzak, and
Jack Dongarra, Concurrency
and Computation: Practice
and Experience, 2008; 20:1573-1590.
A pdf
version is available.
Solving Systems of Linear
Equations on the CELL
Processor Using Cholesky
Factorization, Jakub
Kurzak, Alfredo Buttari, and
Jack Dongarra, IEEE
Transactions on Parallel and
Distributed Systems, Volume
19, Number 9, September
2008, pp 1 - 11.
A pdf
version is available.
Some Issues in Dense
Linear Algebra for
Multicore and Special
Purpose Architectures,
Marc Baboulin, Stan Tomov
and Jack Dongarra, PARA
2008, 9th International
Workshop on State-of-the-Art
in Scientific and Parallel
Computing, EECS Tech Report
UT-CS-08-615, LAWN #200, May
13-16, 2008, Trondheim,
Norway.
A pdf
version is available.
State-of-the-Art
Eigensolvers for
Electronic Structure
Calculations of Large
Scale Nano-Systems,
Christof Vomel, Stanimire Z.
Tomov, Osni A. Marques, A.
Canning, Lin-Wang Wang, and
Jack J. Dongarra, Journal of
Computational Physics,
Volume 227, Issue 15 (July
2008), pages 7113-7124.
A pdf
version is available.
The PlayStation 3 for
High Performance
Scientific Computing,
Jakub Kurzak, Alfredo
Buttari, Piotr Luszczek, and
Jack Dongarra, Computing in
Science and Engineering,
IEEE, May/June 2008, pp
80-83.
A pdf
version is available.
Using Mixed Precision for
Sparse Matrix Computations
to Enhance the Performance
while Achieving 64-bit
Accuracy, Alfredo
Buttari, Jack Dongarra,
Jakub Kurzak, Piotr
Luszczek, and Stanimire
Tomov, ACM Transactions on
Mathematical Software,
Volume 34 Number 4, July
2008, pp 1 - 22.
A pdf
version is available.
- 2007 -
Mixed Precision Iterative
Refinement Techniques for
the Solution of Dense
Linear Systems, A.
Buttari, J. Dongarra, J.
Langou, J. Langou, P.
Luszczek, J. Kurzak, International
Journal of High
Performance Computing
Applications,
21(4):457-466, 2007.
A pdf
version is available.
Automatic Analysis of
Inefficiency Patterns in
Parallel Applications,
Felix Wolf, Bernd Mohr, Jack
Dongarra, Shirley Moore,
Concurrency and Computation:
Practice and Experience,
Volume 19, Issue 11, pp
1481-1496, August 2007.
A pdf
version is available.
Implementation of Mixed
Precision in Solving
Systems of Linear
Equations on the Cell
Processor, Jakub
Kurzak, Jack Dongarra,
Concurrency and Computation:
Practice and Experience,
Volume 19, Issue 10, pp
1371-1385, July 2007.
A pdf
version is available.
Improved Runtime and
Transfer Time Prediction
Mechanisms in a Network
Enabled Servers Middleware,
Emmanuel Jeannot, Keith
Seymour, Asim YarKhan, and
Jack J. Dongarra, Parallel
Processing Letters, March
2007, Volume 17, Number 1,
pp 47-59, ISSN 0129-6264.
A pdf
version is available.
Performance Analysis of
MPI Collective Operations,
Jelena Pjesivac-Grbovic,
Thara Angskun, George
Bosilca, Graham E. Fagg,
Edgar Gabriel, and Jack J.
Dongarra, Cluster Computing
Journal, Volume 10, pp
127-143, 2007.
A pdf
version is available.
Recovery Patterns for
Iterative Methods in a
Parallel Unstable
Environment, G.
Bosilca, Z. Chen, J.
Dongarra, and J. Langou,
SIAM Journal on Scientific
Computing, pp 102-116,
Volume 30, Number 1, 2007.
A pdf
version is available.
Scalability Analysis of
the SPEC OpenMP Benchmarks
on Large-Scale Shared
Memory Multiprocessors,
K. Fuerlinger, M. Gerndt, J.
Dongarra, in Lecture Notes
in Computer Science, Volumes
4487-4490, Computational
Science - ICCS 2007, 7th
International Conference
Beijing, China, May 27 - 30,
2007, Editors Yong Shi,
Geert Dick van Albada, Jack
Dongarra, and Peter M.A.
Sloot, ISBN-10
3-540-72589-X, ISSN
0302-9743, Springer Berlin /
Heidelberg, 2007.
A pdf
version is available.
The Impact of Multicore
on Computational Science
Software, Jack
Dongarra, Dennis Gannon,
Geoffrey Fox, and Ken
Kennedy, CTWatch Quarterly,
Volume 3 Number 1, February
2007, (Unreviewed).
A pdf
version is available.
The Use of Bulk States to
Accelerate the Band Edge
State Calculation of a
Semiconductor Quantum Dot,
Christof Vomel, Stanimire Z.
Tomov, Lin-Wang Wang, Osni
A. Marques, and Jack J.
Dongarra, Journal of
Computational Physics,
Volume 223, Number 2, pp
774-782, ISSN 0021-9991,
2007.
A pdf
version is available.
- 2006 -
Exploiting
the
Performance of
32 bit
Floating Point
Arithmetic in
Obtaining 64
bit Accuracy
(Revisiting
Iterative
Refinement for
Linear
Systems),
Julie Langou,
Julien Langou,
Piotr
Luszczek,
Jakub Kurzak,
Alfredo
Buttari and
Jack Dongarra,
Proceedings of
the ACM/IEEE
SC2006
Conference on
High
Performance,
Networking,
and Computing,
November
11-17, 2006,
Tampa, FL,
https://doi.org/10.1145/1188455.1188573.
A pdf
version is
available.
Conjugate-Gradient
Eigenvalue Solvers in
Computing Electronic
Properties of
Nanostructure
Architectures,
Stanimire Tomov, Julien
Langou, Andrew Canning,
Lin-Wang Wang, and Jack
Dongarra, The International
Journal of Computational
Science and Engineering,
Volume 2, Number 3/4, pp
205-212, 2006, ISSN
1742-7185.
A pdf
version is available.
Design and Implementation
of the HPC Challenge
Benchmark Suite, Piotr
Luszczek, Jack Dongarra,
Jeremy Kepner, CTWatch
Quarterly, November 2006,
Volume 2, Number 4A,
http://www.ctwatch.org/quarterly/archives/november-2006/
(Unreviewed).
A pdf
version is available.
NanoPSE: A Nanoscience
Problem Solving
Environment for Atomistic
Electronic Structure of
Semiconductor
Nanostructures, W. B.
Jones, G. Bester, A.
Canning, A. Franceschetti,
P. A. Graf, K. Kim, J.
Langou, L.W. Wang, J.
Dongarra, and A. Zunger,
in "the Proceedings of
Science Discovery through
Advanced Computing (SciDAC
2005)", Journal of Physics:
Conference Series 16,
277-282, 2005.
A pdf
version is available.
Predicting the Electronic
Properties of 3D,
Million-Atom Semiconductor
Nanostructure
Architectures, A.
Zunger, A. Franceschetti, G.
Bester, W.B. Jones, Kwiseon
Kim, P. A. Graf, L-W. Wang,
A. Canning, O. Marques, C.
Voemel, J. Dongarra, J.
Langou and S. Tomov, Journal
of Physics: Conference Series 46 (2006)
292-298.
A pdf
version is available.
Scheduling Workflow
Applications on Processors
with Different
Capabilities, Zhiao
Shi and Jack Dongarra,
Future Generation Computer
Systems, Volume 22, pp
665-675, 2006.
A pdf
version is available.
Recent Developments in
GridSolve, Asim
YarKhan, Keith Seymour,
Kiran Sagi, Zhiao Shi, and
Jack Dongarra, International
Journal of High Performance
Applications and
Supercomputing, Volume 20
Number 1 Spring 2006, ISSN
1094-3420, pp 131-132.
A pdf
version is available.
Self Adapting Numerical
Software (SANS) Effort,
George Bosilca, Zizhong
Chen, Jack Dongarra, Victor
Eijkhout, Graham E. Fagg,
Erika Fuentes, Julien
Langou, Piotr Luszczek,
Jelena Pjesivac-Grbovic,
Keith Seymour, Haihang You,
and Sathish S. Vadhiyar, IBM
Journal of Research and
Development, pp. 223-238,
Volume 50, Number 2/3, 2006.
A pdf
version is available.
Trends in
High-Performance Computing,
Jack Dongarra,
January/February 2006, IEEE
Circuits & Devices
Magazine, pp 22-27, ISSN
8755-3996.
A pdf
version is available.
Twenty-Plus Years of
Netlib and NA-Net, Part 1
and 2, SIAM News, pp
1-3, Volume 39, Number
3&4, April & May
2006 (Unreviewed news
article).
A pdf
version is available.
- 2005 -
A Not So Simple Matter of
Software, Jack
Dongarra, NCSA Access,
Summer 2005 (non-refereed
magazine publication).
A pdf
version is available.
A Scalable Approach to
MPI Application
Performance Analysis,
Shirley Moore, Felix Wolf,
Jack Dongarra, Sameer
Shende, Patricia Teller, and
Bernd Mohr, Volume 3666,
Recent Advances in Parallel
Virtual Machine and
Message Passing Interface
Users' Group Meeting Euro
PVM/MPI 2005, pp 309-316,
Springer Heidelberg, 2005,
ISSN: 0302-9743. A pdf
version is available.
An Asynchronous Algorithm
on NetSolve Global
Computing System, Jack
Dongarra, Nahid Emad, S. A.
Shahzadeh Fazeli, Future
Generation Computer Systems,
Vol. 22, No. 3, pp
279-290, 2005.
A pdf
version is available.
Biological Sequence
Alignment on the
Computational Grid using
the GrADS Framework,
Asim YarKhan and Jack
Dongarra, Future Generation
Computer Systems, Volume 21,
Issue 6, pp 980-986, June
2005.
A pdf
version is available.
Condition Numbers of
Gaussian Random Matrices,
Zizhong Chen and Jack
Dongarra, SIAM Journal on Matrix
Analysis and Applications,
Volume 27, Number 3, pp
603-620, 2005.
A pdf
version is available.
Evaluating Dynamic
Communicators and
One-Sided Operations for
Current MPI Libraries,
Edgar Gabriel, Graham E.
Fagg, and Jack J. Dongarra,
International Journal of
High Performance Computing
Applications, Volume 19,
Number 1, pp 67-81, Spring
2005, ISSN 1094-3420.
A pdf
version is available.
Hash Functions for
Datatype Signatures in
MPI, George Bosilca,
Jack Dongarra, Graham Fagg,
and Julien Langou, Lecture
Notes in Computer Science,
Volume 3666, Recent Advances
in Parallel Virtual Machine
and Message Passing
Interface Users' Group
Meeting Euro PVM/MPI 2005, pp
76-83, Springer Heidelberg,
2005, ISSN: 0302-9743.
A pdf
version is available.
High Performance
Computing: Clusters,
Constellations, MPPs, and
Future Directions,
Jack Dongarra, Thomas
Sterling, Horst Simon, and
Erich Strohmaier, Computing
in Science and Engineering,
Volume 7, Number 2,
March/April 2005, pp. 51-59,
ISSN 1521-9615.
A pdf
version is available.
New Grid Scheduling and
Rescheduling Methods in
the GrADS Project, F.
Berman, H. Casanova, A.
Chien, K. Cooper, H. Dail,
A. Dasgupta, W. Deng, J.
Dongarra, L. Johnsson, K.
Kennedy, C. Koelbel, B. Liu,
X. Liu, A. Mandal, G. Marin,
M. Mazina, J.
Mellor-Crummey, C. Mendes,
A. Olugbile, M. Patel, D.
Reed, Z. Shi, O. Sievert, H.
Xia, and A. YarKhan,
International Journal of
Parallel Programming, Vol.
33, No. 2, June 2005.
A pdf
version is available.
Process Fault-Tolerance:
Semantics, Design and
Applications for High
Performance Computing,
Graham E. Fagg, Edgar
Gabriel, Zizhong Chen, Thara
Angskun, George Bosilca,
Jelena Pjesivac-Grbovic, and
Jack J. Dongarra,
International Journal for
High Performance
Applications and
Supercomputing, Vol. 19, No.
4, pp 465-478, 2005.
A pdf
version is available.
Recent Trends in the
Marketplace of High
Performance Computing,
Erich Strohmaier, Jack J.
Dongarra, Hans W. Meuer, and
Horst D. Simon, Parallel
Computing, Volume 31, Issues
3-4, pp 261-273,
March-April 2005.
A pdf
version is available.
Scalable Fault Tolerant
MPI: Extending the
Recovery Algorithm,
Graham E. Fagg, Thara
Angskun, George Bosilca,
Jelena Pjesivac-Grbovic, and
Jack J. Dongarra, Lecture
Notes in Computer Science,
Volume 3666, Recent Advances
in Parallel Virtual Machine
and Message Passing
Interface Users' Group
Meeting Euro PVM/MPI 2005, pp
67-75, Springer Heidelberg,
2005, ISSN: 0302-9743.
A pdf
version is available.
Scanning the Special
Issue on Program
Generation Optimization
and Platform Adaptation,
J.M.F. Moura, M. Puschel, D.
Padua, and J. Dongarra,
Proceedings of the IEEE,
Volume 93, Number 2,
February 2005, pp 211-215,
ISSN 0018-9219.
A pdf
version is available.
Self Adapting Linear
Algebra Algorithms and
Software, Jim Demmel,
Jack Dongarra, Victor
Eijkhout, Erika Fuentes,
Antoine Petitet, Rich Vuduc,
R. Clint Whaley, Katherine
Yelick, Proceedings of the
IEEE, Volume 93, Number 2,
February 2005, pp 293-312,
ISSN 0018-9219.
A pdf
version is available.
Self Adaptivity in Grid
Computing, S. Vadhiyar
and J. Dongarra, Concurrency
and Computation: Practice
and Experience, Volume 17,
Issue 2-4, 2005, pp.
235-257.
A pdf
version is available.
The Component Structure
of a Self-Adapting
Numerical Software System,
Victor Eijkhout, Erika
Fuentes, Thomas Eidson, and
Jack Dongarra, International
Journal of Parallel
Programming, Vol. 33, No. 2,
June 2005.
A pdf
version is available.
The Top500 and
Computational Science, A
not so simple matter of
software, Jack
Dongarra, Scientific
Computing, pp 14-16, August
2005 (non-refereed magazine
publication).
A pdf
version is available.
- 2004 -
Simplified Grid
Computing through
Spreadsheets and NetSolve,
David Abramson, Jack
Dongarra, Eric Meek, Paul
Roe, Zhiao Shi, High
Performance Computing and
Grid in Asia Pacific Region,
2004. Proceedings. Seventh
International Conference,
22-22 July 2004 DOI:
10.1109/HPCASIA.2004.1324012
A pdf
version is available.
Building and Using a
Fault Tolerant MPI
Implementation,
Graham E. Fagg and Jack J.
Dongarra, International
Journal of High
Performance Applications
and Supercomputing,
Volume 18, number 3,
Fall 2004, pp 353-362,
ISSN 1094-3420.
A pdf
version is available.
GrADSolve - A
Grid-based RPC system
for Remote Invocation
of Parallel Software,
Sathish Vadhiyar
and Jack Dongarra,
Journal of Parallel and
Distributed Computing,
64(6):774-783, June
2004, ISSN 0743-7315.
A pdf
version is available.
Self Adapting
Software for Numerical
Linear Algebra and
LAPACK for Clusters,
Z. Chen, J. Dongarra, P.
Luszczek, and K. Roche,
Parallel Computing
29(11-12):1723-1743,
November/December 2003,
ISSN 0167-8191.
A pdf
version is available.
The Virtual
Instrument: Support
for Grid-enabled MCell
Simulations, Henri
Casanova, Thomas Bartol,
Francine Berman, Erhan
Gokcay, Adam Birnbaum,
Jack Dongarra, Mark
Ellisman, Marcio
Faerman, Michelle
Miller, Graziano
Obertelli, Stuart
Pomerantz, Terry
Sejnowski, Joel Stiles,
Rich Wolski,
International Journal of
High Performance
Computing Applications,
Volume 18, Number 1,
Spring 2004, pp 3-18,
ISSN 1094-3420.
A pdf
version is available.
Toward an Accurate
Model for Collective
Communications,
Sathish Vadhiyar, Graham
Fagg, Jack Dongarra,
International Journal of
High Performance
Computing Applications,
Volume 18, Number 1,
Spring 2004, pp 159-166,
ISSN 1094-3420.
A pdf
version is available.
Trends in High
Performance Computing,
Jack Dongarra, The
Computer Journal,
47(4):399-403, The
British Computer
Society, 2004.
A pdf
version is available.
- 2003 -
Self Adaptability in
Grid Computing,
S. Vadhiyar and J.
Dongarra, Concurrency and
Computation: Practice
and Experience, January
2003, ISSN 1532-0634.
A pdf
version is available.
Self-adapting
Numerical Algorithm
for Next Generation
Applications, J.
Dongarra and V.
Eijkhout, International
Journal of High
Performance Computing
Applications
17(2):125-132, Summer
2003, ISSN 1094-3420. A
pdf
version is available.
Self-adapting
Numerical Software and
Automatic Tuning of
Heuristics, Jack
Dongarra and Victor
Eijkhout, Lecture Notes
in Computer Science,
Volume 2660,
Springer-Verlag
Heidelberg, pp 759 -
770, ISSN: 0302-9743,
June 2003.
A pdf
version is available.
SRS: A Framework for
Developing Malleable
and Migratable
Parallel Applications
for Distributed
Systems, S. S.
Vadhiyar and J. J.
Dongarra, Parallel
Processing Letters
13(2):291-312, June
2003, ISSN 0129-6264.
A pdf
version is available.
The LINPACK
Benchmark: Past,
Present, and Future,
J. J. Dongarra, P.
Luszczek, and A.
Petitet, Concurrency and
Computation: Practice
and Experience
15(9):803-820, August
2003, ISSN 1532-0634.
A pdf
version is available.
- 2002 -
A Parallel
Implementation of the
Nonsymmetric QR
Algorithm for
Distributed Memory
Architectures, G.
Henry, D. Watkins, and
J. Dongarra, SIAM
Journal on Scientific
Computing 24(1):284-311,
January 2003, ISSN
1064-8275.
A pdf
version is available.
An Updated Set of
Basic Linear Algebra
Subprograms (BLAS),
L. S. Blackford, J.
Demmel, J. Dongarra, I.
Duff, S. Hammarling, G.
Henry, M. Heroux, L.
Kaufman, A. Lumsdaine,
A. Petitet, R. Pozo, K.
Remington, and R. C.
Whaley, ACM Transactions
on Mathematical Software
28(2):135-151, June
2002, ISSN 0098-3500.
A pdf
version is available.
Automatic Translation
of Fortran to JVM
Bytecode, K.
Seymour and J. Dongarra,
Concurrency and
Computation: Practice
and Experience
15(3-5):207-222,
March/April 2003, ISSN
1532-0626 (print),
1532-0634 (electronic).
A pdf
version is available.
Basic Linear Algebra
Subprograms Technical
(BLAST) Forum Standard,
Special Issue - Part I,
International Journal of
High Performance
Computing Applications
16(1):1-111, Spring
2002, ISSN 1094-3420.
A pdf
version is available.
Basic Linear Algebra
Subprograms Technical
(BLAST) Forum Standard,
Special Issue - Part II,
International Journal of
High Performance
Computing Applications
16(2):115-199, Spring
2002, ISSN 1094-3420.
A pdf
version is available.
HARNESS Fault
Tolerant MPI Design,
Usage and Performance
Issues, G. E. Fagg
and J. J. Dongarra,
Future Generation
Computer Systems
18(8):1127-1142, October
2002, ISSN 0167-739X.
A pdf
version is available.
Innovations of the
NetSolve Grid
Computing System,
D. C. Arnold, H.
Casanova, and J.
Dongarra, Concurrency
and Computation:
Practice and Experience,
Special Issue: Grid
Computing Environments
14(13-15):1457-1479,
November/December 2002,
ISSN 1532-0626 (print),
1532-0634 (electronic).
A pdf
version is available.
Middleware for the
Use of Storage in
Communication, M.
Beck, D. Arnold, A.
Bassi, F. Berman, H.
Casanova, J. Dongarra,
T. Moore, G. Obertelli,
J. Plank, M. Swany, S.
Vadhiyar, and R. Wolski,
Parallel Computing
28(12):1773-1788,
December 2002, ISSN
0167-8191.
A pdf
version is available.
NetBuild: Transparent
Cross-Platform Access
to Computational
Software Libraries,
K. Moore and J.
Dongarra, Concurrency
and Computation:
Practice and Experience
14(13-15):1445-1456,
November/December 2002,
ISSN 1532-0626 (print),
1532-0634 (electronic).
A pdf
version is available.
- 2001 -
A Comparison of
Parallel Solvers for
Diagonally Dominant
and General
Narrow-Banded Linear
Systems, P.
Arbenz, A. Cleary, J.
Dongarra, and M.
Hegland, Parallel and
Distributed Computing
Practices, Special
Issue: Parallel
Numerical Linear Algebra
2(4):385-400, November
1999, ISSN 1097-2803.
A pdf
version is available.
Automated Empirical
Optimization of
Software and the ATLAS
Project, R.
Whaley, A. Petitet, and
J. Dongarra, Parallel
Computing 27(1-2):3-25,
January 2001, ISSN
0167-8191.
A pdf
version is available.
Biannual Top-500
Computer Lists Track
Changing Environments
for Scientific
Computing, J.
Dongarra, H. Meuer, H.
Simon, and E.
Strohmaier, SIAM News
34(9), November 2001,
ISSN 0036-1445.
A pdf
version is available.
HARNESS and Fault
Tolerant MPI, G.
Fagg, A. Bukovsky, and
J. Dongarra, Parallel
Computing
27(11):1479-1496,
October 2001, ISSN
0167-8191.
A pdf
version is available.
High Performance
Computing Trends,
J. J. Dongarra, H. W.
Meuer, H. D. Simon, and
E. Strohmaier, HERMIS
2:155-163, November
2001, ISSN 1108-7609.
A pdf
version is available.
Iterative Solver
Benchmark, J.
Dongarra, V. Eijkhout,
and H. van der Vorst,
Scientific Programming
9(4):223-231, 2001, ISSN
1058-9244.
A pdf
version is available.
Measuring Computer
Performance: A
Practitioner's
Guide, Book Review
by D. Lilja, Cambridge
University Press (ISBN
0-521-64105-5), SIAM
Review 43(2):383-384,
2001, ISSN 0036-1445.
A pdf
version is available.
Network-Enabled
Solvers: A Step Toward
Grid-Based Computing,
J. Dongarra, SIAM News
34(10), December 2001,
ISSN 0036-1445.
A pdf
version is available.
Numerical Libraries
and the Grid, A.
Petitet, S. Blackford,
J. Dongarra, B. Ellis,
G. Fagg, K. Roche, and
S. Vadhiyar,
International Journal of
High Performance
Computing Applications
15(4):359-374, Winter
2001, ISSN 1094-3420.
A pdf
version is available.
Numerical Libraries
and Tools for Scalable
Parallel Cluster
Computing, J.
Dongarra, S. Moore, and
A. Trefethen,
International Journal of
High Performance
Computing Applications
15(2):175-180, Summer
2001, ISSN 1094-3420.
A pdf
version is available.
On the Convergence of
Computational and Data
Grids, D. C.
Arnold, S. S. Vadhiyar,
and J. J. Dongarra,
Parallel Processing
Letters 11(2-3):187-202,
June/September 2001,
ISSN 0129-6264.
A pdf
version is available.
Recursive Approach in
Sparse Matrix LU
Factorization, J.
Dongarra, V. Eijkhout,
and P. Luszczek,
Scientific Programming
9(1):51-60, 2001, ISSN
1058-9244.
A pdf
version is available.
Telescoping
Languages: A Strategy
for Automatic
Generation of
Scientific
Problem-Solving
Systems from Annotated
Libraries, K.
Kennedy, B. Broom, K.
Cooper, J. Dongarra, R.
Fowler, D. Gannon, L.
Johnsson, J.
Mellor-Crummey, and L.
Torczon, Journal of
Parallel and Distributed
Computing
61(12):1803-1826,
December 2001, ISSN
0743-7315.
A pdf
version is available.
The GrADS Project:
Software Support for
High-Level Grid
Application
Development, F.
Berman, A. Chien, K.
Cooper, J. Dongarra, I.
Foster, D. Gannon, L.
Johnsson, K. Kennedy, C.
Kesselman, J.
Mellor-Crummey, D. Reed,
L. Torczon, and R.
Wolski, International
Journal of High
Performance Computing
Applications
15(4):327-344, Winter
2001, ISSN 1094-3420.
A pdf
version is available.
The Quest for
Petascale Computing,
J. Dongarra and D.
Walker, Computing in
Science and Engineering
3(3):32-39, May/June
2001, ISSN 1521-9615.
A pdf
version is available.
- 2000 -
A Portable
Programming Interface
for Performance
Evaluation on Modern
Processors, S.
Browne, J. Dongarra, N.
Garner, G. Ho, and P.
Mucci, International
Journal of High
Performance Computing
Applications
14(3):189-204, Fall
2000, ISSN 1094-3420.
A pdf
version is available.
The Design and
Implementation of the
Parallel Out-of-Core
ScaLAPACK LU, QR, and
Cholesky Factorization
Routines, E.
D'Azevedo and J.
Dongarra, Concurrency:
Practice and Experience
12(15):1481-1493, 2000,
ISSN 1040-3108.
A pdf
version is available.
- 1999 -
A Comparison of
Parallel Solvers for
General Narrow Banded
Linear Systems, P.
Arbenz, A. Cleary, J.
Dongarra, and M.
Hegland, Parallel and
Distributed Computing
Practices 2(4):385-400,
December 1999, ISSN
1097-2803.
A pdf
version is available.
A Parallel Divide and
Conquer Algorithm for
the Symmetric
Eigenvalue Problem,
F. Tisseur and J.
Dongarra, SIAM Journal
on Scientific Computing
20(6):2223-2236, 1999,
ISSN 1064-8275.
A pdf
version is available.
Adaptive Scheduling
for Task Farming with
Grid Middleware,
H. Casanova, M. Kim, J.
Plank, and J. Dongarra,
International Journal of
High Performance
Computing Applications
13(3):231-240, Fall
1999, ISSN 1094-3420.
A pdf
version is available.
Algorithmic Issues on
Heterogeneous
Computing Platforms,
Pierre Boulet, J.
Dongarra, F. Rastello,
Y. Robert, and F.
Vivien, Parallel
Processing Letters
9(2):197-213, 1999, ISSN
0129-6264.
A pdf
version is available.
Algorithmic
Redistribution Methods
for Block-Cyclic
Decompositions, A.
P. Petitet and J. J.
Dongarra, IEEE
Transactions on Parallel
and Distributed Systems
10(12):201-220, 1999,
ISSN 1045-9219.
A pdf
version is available.
Atlanta Organizers
Put Mathematics to
Work For the Math
Sciences Community,
M. Berry and J.
Dongarra, SIAM News
32(6), July/August 1999,
ISSN 0036-1445.
A pdf
version is available.
Deploying Fault
Tolerance and Task
Migration with
NetSolve, J. S.
Plank, H. Casanova, M.
Beck, and J. J.
Dongarra, Future
Generation Computer
Systems 15(5-6):745-755,
October 1999, ISSN
0167-739X.
A pdf
version is available.
Experiences with
Windows NT as a
Cluster Computing
Platform for Parallel
Computing, M.
Fischer and J. Dongarra,
Parallel and Distributed
Computing Practices,
Special Issue: Cluster
Computing 2(2):119-128,
June 1999, ISSN
1097-2803.
A pdf
version is available.
HARNESS: A Next
Generation Distributed
Virtual Machine,
M. Beck, J. J. Dongarra,
G. E. Fagg, G. A. Geist,
P. Gray, J. Kohl, M.
Migliardi, K. Moore, T.
Moore, P. Papadopoulos,
S. L. Scott, and V.
Sunderam, Future
Generation Computer
Systems 15(5-6):571-582,
October 1999, ISSN
0167-739X.
A pdf
version is available.
JLAPACK - Compiling
LAPACK Fortran to Java,
D. Doolin, J. Dongarra,
and K. Seymour,
Scientific Programming
7(2):111-138, 1999, ISSN
1058-9244.
A pdf
version is available.
Logistical Quality of
Service in NetSolve,
M. Beck, H. Casanova, J.
Dongarra, T. Moore, J.
Plank, F. Berman, and R.
Wolski, Computer
Communications
22(11):1034-1044, 1999,
ISSN 0140-3664.
A pdf
version is available.
Numerical Linear
Algebra Algorithms and
Software, J.
Dongarra and V.
Eijkhout, Journal of
Computational and
Applied Mathematics
123(1-2):489-514,
November 1, 2000, ISSN
0377-0427.
A pdf
version is available.
Scalable Networked
Information Processing
Environment (SNIPE),
G. E. Fagg, K. Moore,
and J. J. Dongarra,
Future Generation
Computer Systems
15(5-6):595-605, October
1999, ISSN 0167-739X.
A pdf
version is available.
Static Tiling for
Heterogeneous
Computing Platforms,
P. Boulet, J. Dongarra,
Y. Robert, and F.
Vivien, Parallel
Computing 25(5):547-568,
1999, ISSN 0167-8191.
A pdf
version is available.
Stochastic
Performance Prediction
for Iterative
Algorithms in
Distributed
Environments, H.
Casanova, M. Thomason,
and J. Dongarra, Journal
of Parallel and
Distributed Computing
58(1):68-91, July 1999,
ISSN 0743-7315.
A pdf
version is available.
The Marketplace for
High-Performance
Computers, E.
Strohmaier, J. Dongarra,
H. Meuer, and H. Simon,
Parallel Computing
25(13-14):1517-1545,
December 1999, ISSN
0167-8191.
A pdf
version is available.
Tiling on Systems
with
Communication/Computation
Overlap, P.-Y.
Calland, J. Dongarra,
and Y. Robert,
Concurrency: Practice
and Experience
11(3):139-153, 1999,
ISSN 1040-3108.
A pdf
version is available.
- 1998 -
Applying NetSolve's
Network Enabled Server,
H. Casanova and J.
Dongarra, IEEE
Computational Science
and Engineering
5(3):57-67,
July/September 1998,
ISSN 1070-9924.
A pdf
version is available.
Determining the Idle
Time of a Tiling: New
Results, F.
Desprez, J. Dongarra, F.
Rastello, and Yves
Robert, Journal of
Computing and
Information Science in
Engineering (Special
Issue on Compiler
Techniques for
High-Performance
Computing)
14(1):167-190, March
1998, ISSN 1530-9827.
A pdf
version is available.
Developing Numerical
Libraries in Java,
R. F. Boisvert, J. J.
Dongarra, R. Pozo, K. A.
Remington, and G. W.
Stewart, Concurrency:
Practice and Experience
10(11-13):1117-1129,
1998, ISSN 1040-3108.
A pdf
version is available.
National HPCC
Software Exchange
(NHSE): Uniting the
High Performance
Computing and
Communications
Community, S.
Browne, J. Dongarra, J.
Horner, P. McMahan, and S.
Wells, D-Lib Magazine
(Electronic), May 1998,
ISSN 1082-9873.
A pdf
version is available.
Programming Tools and
Environments, J.
Saltz, A. Sussman, S.
Graham, J. Demmel, S.
Baden, and J. Dongarra,
Communications of the
ACM 41(11):64-73,
November 1998, ISSN
0001-0782.
A pdf
version is available.
Scheduling
Block-Cyclic Array
Redistribution,
F. Desprez, J. Dongarra,
A. Petitet, C.
Randriamaro, and Y.
Robert, IEEE
Transactions on Parallel
and Distributed Systems
9(2):192-205, February
1998, ISSN 1045-9219.
A pdf
version is available.
Using Agent-based
Software for
Scientific Computing
in the NetSolve System,
H. Casanova and J.
Dongarra, Parallel
Computing
24(12-13):1777-1790,
November 1998, ISSN
0167-8191.
A pdf
version is available.
- 1997 -
Changing Technologies
of HPC, J. J.
Dongarra, H. W. Meuer,
H. D. Simon, and E.
Strohmaier, Future
Generation Computer
Systems 12(5):461-474,
April 1997, ISSN
0167-739X.
A pdf
version is available.
Fault Tolerant Matrix
Operations for
Networks of
Workstations Using
Diskless
Checkpointing, J.
Plank, Y. Kim, and J.
Dongarra, Journal of
Parallel and Distributed
Computing 43(2):125-138,
1997, ISSN 0743-7315.
A pdf
version is available.
Java Access to
Numerical Libraries,
H. Casanova, J.
Dongarra, and D. Doolin,
Concurrency: Practice
and Experience
9(11):1279-1291, 1997,
ISSN 1040-3108.
A pdf
version is available.
Key Concepts for
Parallel Out of Core
LU Factorization,
J. Dongarra, S.
Hammarling, and D.
Walker, Parallel
Computing 23(1-2):49-70,
April 1997, ISSN
0167-8191.
A pdf
version is available.
Message-Passing
Performance of Various
Computers, J.
Dongarra and T. Dunigan,
Concurrency: Practice
and Experience
9(10):915-926, 1997,
ISSN 1040-3108.
A pdf
version is available.
NetSolve: A
Network-Enabled Server
for Solving
Computational Science
Problems, H.
Casanova and J.
Dongarra, The
International Journal of
Supercomputer
Applications and High
Performance Computing
11(3):212-223, Fall
1997, ISSN 1078-3482.
A pdf
version is available.
Practical Experience
in the Numerical
Dangers of
Heterogeneous
Computing, L. S.
Blackford, A. Cleary, J.
Demmel, J. Dongarra, I.
Dhillon, S. Hammarling,
A. Petitet, H. Ren, K.
Stanley, and R. C.
Whaley, ACM Transactions
on Mathematical Software
23(2):133-147, June
1997, ISSN 0098-3500.
A pdf
version is available.
The Spectral
Decomposition of
Nonsymmetric Matrices
on Distributed Memory
Computers, Z. Bai,
J. Demmel, J. Dongarra,
A. Petitet, H. Robinson,
and K. Stanley, SIAM
Journal on Scientific
Computing
18(5):1446-1461, 1997,
ISSN 0196-5204.
A pdf
version is available.
Top500 Supercomputer
Sites, J.
Dongarra, H. W. Meuer
and E. Strohmaier,
Supercomputer 67:89-120,
1997, ISSN 0168-7875.
A pdf
version is available.
- 1996 -
A Message Passing
Standard for MPP and
Workstations, J.
Dongarra, S. W. Otto, M.
Snir, and D. Walker,
Communications of the
ACM 39(7):84-90, July
1996, ISSN 0001-0782.
A pdf
version is available.
Algorithmic
Bombardment for the
Iterative Solution of
Linear Systems: A
Poly-Iterative
Approach, R.
Barrett, M. Berry, J.
Dongarra, V. Eijkhout,
and C. Romine, Journal
of Computational and
Applied Mathematics
74(1-2):91-110, November
1996, ISSN 0377-0427.
A pdf
version is available.
Chebyshev tau - QZ
Algorithm Methods for
Calculating Spectra of
Hydrodynamic Stability
Problems, J.
Dongarra, B. Straughan
and D. W. Walker,
Applied Numerical
Mathematics
22(4):399-435, 1996,
ISSN 0168-9274.
A pdf
version is available.
Future Linear Algebra
Libraries, J.
Dongarra, IEEE
Computational Science
and Engineering
3(2):38-40, Summer 1996,
ISSN 1070-9924.
A pdf
version is available.
LAPACK for Fortran90,
J. Dongarra, J. Du Croz,
S. Hammarling, J.
Wasniewski, and A. Zemla,
Applied Mathematics and
Computer Science
6(2):101-109, 1996, ISSN
1641-876X.
A pdf
version is available.
MPI: A Standard
Message Passing
Interface, J.
Dongarra and D. Walker,
Supercomputer
12(1):56-68, January
1996, ISSN 0168-7875.
Overview of
High-Performance
Computers, A. van
der Steen and J.
Dongarra, Electronic
Journal of the NHSE
Review 1(1), 1996, HTML.
PB-BLAS: A Set of
Parallel Block Basic
Linear Algebra
Subroutines, J.
Choi, J. Dongarra, and
D. Walker, Concurrency:
Practice and Experience
8(7):517-535, September
1996, ISSN 1040-3108.
A pdf
version is available.
PVMPI: An Integration
of PVM and MPI Systems,
G. Fagg and J. Dongarra,
Calculateurs Parallèles
8(2):151-166, 1996,
Hermes, ISSN 1260-3198.
A pdf
version is available.
ScaLAPACK: A Portable
Linear Algebra Library
for Distributed Memory
Computers - Design
Issues and Performance,
J. Choi, J. Demmel, J.
Dongarra, I. Dhillon, S.
Ostrouchov, A. Petitet,
K. Stanley, D. Walker,
and R. C. Whaley,
Computer Physics
Communications
97(1-2):1-15, August
1996, ISSN 0010-4655.
A pdf
version is available.
The Design and
Implementation of the
ScaLAPACK LU, QR, and
Cholesky Factorization
Routines, J. Choi,
J. J. Dongarra, L. S.
Ostrouchov, A. P.
Petitet, D. W. Walker
and R. C. Whaley,
Scientific Programming
5(3):173-184, Fall 1996,
ISSN 1058-9244.
A pdf
version is available.
- 1995 -
A Highly Parallel
Algorithm for the
Reduction of a
Nonsymmetric Matrix to
Block Upper-Hessenberg
Form, M. W. Berry,
J. Dongarra, and Y. Kim,
Parallel Computing
21(8):1189-1212, August
1995, ISSN 0167-8191.
A pdf
version is available.
Parallel Matrix
Transpose Algorithms
on Distributed Memory
Concurrent Computers,
J. Choi, J. Dongarra,
and D. Walker, Parallel
Computing
21(9):1387-1405, 1995,
ISSN 0167-8191.
A pdf
version is available.
Performance Study of
LU Factorization with
Low Communication
Overhead on
Multiprocessors,
F. Desprez, J. Dongarra,
and B. Tourancheau,
Parallel Processing
Letters 5(2):157-169,
June 1995, ISSN
0129-6264.
A pdf
version is available.
Recent Enhancements
to PVM, A.
Beguelin, J. Dongarra,
A. Geist, R. Manchek,
and V. Sunderam,
International Journal of
Supercomputer
Applications and High
Performance Computing
9(2):108-127, Summer
1995, ISSN 1078-3482.
A pdf
version is available.
Software Distribution
Using XNETLIB, J.
Dongarra, T. Rowan and
R. Wade, ACM
Transactions on
Mathematical Software
21(1):79-88, March 1995,
ISSN 0098-3500.
A pdf
version is available.
Software Libraries
for Linear Algebra
Computations on High
Performance Computers,
J. Dongarra and D.
Walker, SIAM Review
37(2):151-180, June
1995, ISSN 0036-1445.
A pdf
version is available.
The Design of a
Parallel, Dense Linear
Algebra Software
Library: Reduction to
Hessenberg,
Tridiagonal, and
Bidiagonal Form,
J. Choi, J. Dongarra,
and D. Walker, Numerical
Algorithms
10(3-4):379-400, 1995,
ISSN 1017-1398.
A pdf
version is available.
The National HPCC
Software Exchange,
S. Browne, J. Dongarra,
S. Green, K. Moore, T.
Rowan, R. Wade, G. Fox,
K. Hawick, K. Kennedy, J.
Pool, R. Stevens, B.
Olsen, and T. Disz, IEEE
Computational Science
and Engineering
2(2):62-69, Summer 1995,
ISSN 1070-9924.
A pdf
version is available.
The Netlib
Mathematical Software
Repository, S.
Browne, J. Dongarra, E.
Grosse, and T. Rowan,
D-Lib Magazine,
Electronic Journal,
September 1995, ISSN
1082-9873,
http://www.dlib.org/dlib/september95/netlib/09browne.html.
A pdf
version is available.
The ParkBench
Benchmark Collection,
J. Dongarra and T. Hey,
Supercomputer
11(2-3):94-115, June
1995, ISSN 0168-7875.
Top500 Supercomputer
Sites, J.
Dongarra, H. Meuer and
E. Strohmaier,
Supercomputer
11(2-3):133-194, June
1995, ISSN 0168-7875.
A pdf
version is available.
- 1994 -
CRPC Research into
Linear Algebra
Software for
High-Performance
Computers, J.
Choi, J. J. Dongarra, R.
Pozo, D. C. Sorensen,
and D. W. Walker,
International Journal of
Supercomputing
Applications
8(2):99-118, Summer
1994, ISSN 0890-2720.
A pdf
version is available.
Experiences with CODE
and HeNCE in Visual
Programming for
Parallel Computing,
J. C. Browne, J.
Dongarra, S. I. Hyder,
K. Moore, and P. Newton,
IEEE Parallel and
Distributed Technology
3(1):75-83, Spring 1994,
ISSN 1063-6552.
A pdf
version is available.
HeNCE: A
Heterogeneous Network
Computing Environment,
A. Beguelin, J. J.
Dongarra, G. A. Geist,
R. Manchek, and K.
Moore, Scientific
Programming 3(1):49-60,
Spring 1994, ISSN
1058-9244.
A pdf
version is available.
MPI: A Message
Passing Interface
Standard, Special
Issue, International
Journal of Supercomputer
Applications
8(3-4):159-416,
Fall/Winter 1994, ISSN
0890-2720.
A pdf
version is available.
PARKBENCH Report - 1:
Public International
Benchmarks for
Parallel Computers,
PARKBENCH Committee
(assembled by R. Hockney
and M. Berry, with
contributions from D.
Bailey, M. Berry, J.
Dongarra, V. Getov, T.
Haupt, T. Hey, R.
Hockney, and D. Walker),
Scientific Programming
3(2):101-146, 1994, ISSN
1058-9244.
A pdf
version is available.
PDS: A Performance
Database Server,
M. W. Berry, J.
Dongarra, B. H. LaRose,
and T. Letsche,
Scientific Programming
3(2):147-156, 1994, ISSN
1058-9244.
A pdf
version is available.
PUMMA: Parallel
Universal Matrix
Multiplication
Algorithms on
Distributed Memory
Concurrent Computers,
J. Choi, J. J. Dongarra,
and D. W. Walker,
Concurrency: Practice
and Experience
6(7):543-570, October
1994, ISSN 1040-3108.
A pdf
version is available.
Scalability Issues in
the Design of a
Library for Dense
Linear Algebra, J.
J. Dongarra, R. A. van
de Geijn, and D. W.
Walker, Journal of
Parallel and Distributed
Computing 22(3):523-537,
September 1994, ISSN
0743-7315.
A pdf
version is available.
The PVM Concurrent
Computing System:
Evolution,
Experiences, and
Trends, V. S.
Sunderam, J. Dongarra,
G. A. Geist, and R.
Manchek, Parallel
Computing 20(4):531-545,
March 31, 1994, ISSN
0167-8191.
A pdf
version is available.
- 1993 -
A Parallel Algorithm
for the Non-Symmetric
Eigenvalue Problem,
J. J. Dongarra and M.
Sidani, SIAM Journal on
Scientific Computing
14(3):542-569, May 1993,
ISSN 1064-8275.
A pdf
version is available.
Integrated PVM
Framework Supports
Heterogeneous Network
Computing, J.
Dongarra, G. A. Geist,
R. Manchek, and V. S.
Sunderam, Computers in
Physics 7(2):166-175,
April 1993, ISSN
0895-6111.
A pdf
version is available.
Linear Algebra
Libraries for
High-Performance
Computers: A Personal
Perspective, J.
Dongarra, IEEE Parallel
and Distributed
Technology: Systems and
Applications 1(1):17-24,
February 1993, ISSN
1063-6552.
A pdf
version is available.
Performance of
LAPACK: A Portable
Library of Numerical
Linear Algebra
Routines, E. C.
Anderson and J.
Dongarra, Proceedings of
the IEEE
81(8):1094-1102, August
1993, ISSN 0018-9219.
A pdf
version is available.
Supporting
Heterogeneous Network
Computing: PVM, J.
Dongarra, A. Geist, R.
Manchek, and V.
Sunderam, Chemical
Design Automation News
8(9-10):36-42,
September/October 1993,
ISSN 0886-6716.
A pdf
version is available.
Visualization and
Debugging in a
Heterogeneous
Environment, A.
Beguelin, J. Dongarra,
A. Geist, and V.
Sunderam, IEEE Computer
26(6):88-95, June 1993,
ISSN 0018-9162.
A pdf
version is available.
- 1992 -
Algorithm 710:
FORTRAN Subroutines
for Computing the
Eigenvalues and
Eigenvectors of a
General Matrix by
Reduction to General
Tridiagonal Form,
J. J. Dongarra, G. A.
Geist, and C. H. Romine,
ACM Transactions on
Mathematical Software
18(4):392-400, December
1992, ISSN 0098-3500.
A pdf
version is available.
Generalized QR
Factorization and Its
Applications, E.
Anderson, Z. Bai, and J.
Dongarra, Linear Algebra
and Its Applications
162-164:243-271,
February 1992, ISSN
0024-3795.
A pdf
version is available.
Numerical
Considerations in
Computing Invariant
Subspaces, J. J.
Dongarra, S. Hammarling
and J. H. Wilkinson,
SIAM Journal on Matrix
Analysis and
Applications
13(1):145-161, January
1992, ISSN 0895-4798.
A pdf
version is available.
Performance of
Various Computers
Using Standard Sparse
Linear Equations
Solving Techniques,
J. J. Dongarra and H. A.
van der Vorst,
Supercomputer
9(5):17-29, September
1992, ISSN 0168-7875.
A pdf
version is available.
Reduction to
Condensed Form for the
Eigenvalue Problem on
Distributed Memory
Architectures, J.
J. Dongarra and R. A.
van de Geijn, Parallel
Computing 18(9):973-982,
September 1992, ISSN
0167-8191.
A pdf
version is available.
- 1991 -
A Comparative Study
of Automatic
Vectorizing Compilers,
D. Levine, D. Callahan,
and J. Dongarra,
Parallel Computing
17(10-11):1223-1244,
December 1991, ISSN
0167-8191.
A pdf
version is available.
Opening the Door to
Heterogeneous Network
Supercomputing, A.
Beguelin, J. Dongarra,
A. Geist, R. Manchek,
and V. Sunderam,
Supercomputing Review
4(9):44-45, September
1991, ISSN 1048-6836.
A pdf
version is available.
Parallel Loops - A
Test Suite for
Parallelizing
Compilers: Description
and Example Results,
J. Dongarra, M. Furtney,
S. Reinhardt and J.
Russell, Parallel
Computing
17(10-11):1247-1257,
December 1991, ISSN
0167-8191.
A pdf
version is available.
Special Report: 1990
Gordon Bell Prize
Winners, J.
Dongarra, A. H. Karp, K.
Miura, and H. Simon,
IEEE Software
8(3):92-97, 102,
May/June 1991, ISSN
0740-7459.
A pdf
version is available.
The IBM RISC
System/6000 and Linear
Algebra Operations,
J. Dongarra, P. Mayes
and G. Radicati di
Brozolo, Supercomputer
8(4):15-30, July 1991,
ISSN 0168-7875.
A pdf
version is available.
- 1990 -
A Set of Level 3
Basic Linear Algebra
Subprograms, J. J.
Dongarra, J. Du Croz, S.
Hammarling, and I. S.
Duff, ACM Transactions
on Mathematical Software
16(1):1-17, March 1990,
ISSN 0098-3500.
A pdf
version is available.
Evolution of
Numerical Software for
Dense Linear Algebra,
Jack Dongarra and Sven
Hammarling, in M. G. Cox
and S. Hammarling,
editors, Reliable
Numerical Computation,
pages 297-327. Oxford
University Press,
Oxford, UK, 1990.
A pdf
version is available.
Automatic
Blocking of Nested
Loops, R.
Schreiber and J.
Dongarra, University of
Tennessee Technical
Report CS-90-108,
Knoxville, TN 37996,
USA, 1990.
A pdf
version is available.
A Tool to Aid in the
Design,
Implementation, and
Understanding of
Matrix Algorithms for
Parallel Processors,
J. Dongarra, O. Brewer,
J. A. Kohl, and S.
Fineberg, Journal of
Parallel and Distributed
Computing 9(2):185-202,
June 1990, ISSN
0743-7315.
A pdf
version is available.
Algorithm 679: A Set
of Level 3 Basic
Linear Algebra
Subprograms: Model
Implementation and
Test Programs, J.
J. Dongarra, J. Du Croz,
S. Hammarling, and I. S.
Duff, ACM Transactions
on Mathematical Software
16(1):18-28, March 1990,
ISSN 0098-3500.
A pdf
version is available.
- 1989 -
Block Reduction of
Matrices to Condensed
Forms for Eigenvalue
Computations, J.
J. Dongarra, S. J.
Hammarling, and D. C.
Sorensen, Journal of
Computational and
Applied Mathematics
27(1-2):215-227,
September 1989, ISSN
0377-0427.
A pdf
version is available.
Shopping for
Mathematical Software
Electronically, J.
Dongarra and E. Grosse,
IEEE Potentials
8(1):37-38, February
1989, ISSN 0278-6648.
A pdf
version is available.
- 1988 -
Algorithm 656: An
Extended Set of Basic
Linear Algebra
Subprograms: Model
Implementation and
Test Programs, J.
J. Dongarra, J. Du Croz,
S. Hammarling, R. J.
Hanson, ACM Transactions
on Mathematical Software
14(1):18-32, March 1988,
ISSN 0098-3500.
A pdf
version is available.
An Extended Set of
Fortran Basic Linear
Algebra Subprograms,
J. J. Dongarra, J. Du
Croz, S. Hammarling, and
R. J. Hanson, ACM
Transactions on
Mathematical Software
14(1):1-17, March 1988,
ISSN 0098-3500.
A pdf
version is available.
Programming
Methodology and
Performance Issues for
Advanced Computer
Architectures, J.
J. Dongarra, D. C.
Sorensen, K. Connolly,
and J. Patterson,
Parallel Computing
8(1-3):41-58, October
1988, ISSN 0167-8191.
A pdf
version is available.
Tools to Aid in the
Analysis of Memory
Access Patterns for
FORTRAN Programs,
O. Brewer, J. Dongarra,
and D. Sorensen,
Parallel Computing
9(1):25-35, December
1988, ISSN 0167-8191.
A pdf
version is available.
- 1987 -
A Fully Parallel
Algorithm for the
Symmetric Eigenvalue
Problem, J. J.
Dongarra and D. C.
Sorensen, SIAM Journal
on Scientific and
Statistical Computing
8(2):139-154, March
1987, ISSN 0196-5204.
A pdf
version is available.
A Portable
Environment for
Developing Parallel
FORTRAN Programs,
J. J. Dongarra and D. C.
Sorensen, Parallel
Computing
5(1-2):175-186, July
1987, ISSN 0167-8191.
A pdf
version is available.
Computer
Benchmarking: Paths
and Pitfalls, J.
Dongarra, J. Martin, and
J. Worlton, IEEE
Spectrum 24(7):38-43,
June 1987, ISSN
0018-9235.
A pdf
version is available.
Distribution of
Mathematical Software
via Electronic Mail,
J. J. Dongarra and E.
Grosse, Communications
of the ACM
30(5):403-407, May 1987,
ISSN 0001-0782.
A pdf
version is available.
Solving Banded
Systems on a Parallel
Processor, J. J.
Dongarra and L.
Johnsson, Parallel
Computing
5(1-2):219-246, July
1987, ISSN 0167-8191.
A pdf
version is available.
- 1986 -
How Do the
"Minisupers" Stack Up?,
J. J. Dongarra, IEEE
Computer 19(3):93, 100,
March 1986, ISSN
0018-9162.
A pdf
version is available.
Implementing Dense
Linear Algebra
Algorithms Using
Multitasking on the
CRAY X-MP-4 (Or
Approaching the
Gigaflop), J. J.
Dongarra and T. Hewitt,
SIAM Journal on
Scientific and
Statistical Computing
7(1):347-350, January
1986, ISSN 0196-5204.
A pdf
version is available.
Implementation of
Some Concurrent
Algorithms for Matrix
Factorization, J.
J. Dongarra, A. H.
Sameh, and D. C.
Sorensen, Parallel
Computing 3(1):25-34,
March 1986, ISSN
0167-8191.
A pdf
version is available.
Linear Algebra on
High-Performance
Computers, J.
Dongarra and D.
Sorensen, Applied
Mathematics and
Computation
20(1-2):57-88, September
1986, ISSN 0096-3003.
A pdf
version is available.
Squeezing the Most
out of High
Performance Computers
for Finding the
Eigenvalues, J.
Dongarra, L. Kaufman,
and S. Hammarling,
Linear Algebra and Its
Applications 77:113-136,
May 1986, ISSN
0024-3795.
A pdf
version is available.
- 1985 -
A Proposal for an
Extended Set of
Fortran Basic Linear
Algebra Subprograms,
J. J. Dongarra, J. Du
Croz, S. Hammarling, and
R. J. Hanson, ACM SIGNUM
Newsletter 20(1):2-18,
January 1985, ISSN
0163-5778.
A pdf
version is available.
Algorithm Design for
Different Computer
Architectures, J.
J. Dongarra, B. T.
Smith, and D. Sorensen,
IEEE Software
2(4):79-80, July 1985.
A pdf
version is available.
- 1984 -
A Collection of
Parallel Linear
Equations Routines for
the Denelcor HEP,
J. J. Dongarra and R. E.
Hiromoto, Parallel
Computing 1(2):133-142,
December 1984, ISSN
0167-8191.
A pdf
version is available.
EISPACK - A
Collection for Solving
Eigenvalue Problems,
J. Dongarra and C.
Moler, in Sources and
Development of
Mathematical Software,
W. R. Cowell, ed., pp.
68-87, Prentice-Hall:
Upper Saddle River, NJ,
1984, ISBN
0-13-823501-5.
A pdf
version is available.
Implementing Linear
Algebra Algorithms for
Dense Matrices on a
Vector Pipeline
Machine, J. J.
Dongarra, F. G.
Gustavson and A. Karp,
SIAM Review
26(1):91-112, January
1984, ISSN 0036-1445.
A pdf
version is available.
Multiprocessing
Linear Algebra
Algorithms on the CRAY
X-MP-2: Experiences
with Small Granularity,
S. S. Chen, J. J.
Dongarra, and C. C.
Hsiung, Journal of
Parallel and Distributed
Computing 1(1):22-31,
August 1984, ISSN
0743-7315.
A pdf
version is available.
On Some Parallel
Banded System Solvers,
J. J. Dongarra and A. H.
Sameh, Parallel
Computing 1(3):223-235,
December 1984.
A pdf
version is available.
Solving the Secular
Equation Including
Spin Orbit Coupling
for Systems with
Inversion and Time
Reversal Symmetry,
J. J. Dongarra, J. R.
Gabriel, D. D. Koelling,
and J. H. Wilkinson,
Journal of Computational
Physics 54(2):278-288,
May 1984, ISSN
0021-9991.
A pdf
version is available.
Squeezing the Most
out of an Algorithm in
CRAY FORTRAN, J.
J. Dongarra and S. C.
Eisenstat, ACM
Transactions on
Mathematical Software
10(3):219-230, September
1984, ISSN 0098-3500.
A pdf
version is available.
The Eigenvalue
Problem for Hermitian
Matrices with Time
Reversal Symmetry,
J. J. Dongarra, J. R.
Gabriel, D. D. Koelling,
and J. H. Wilkinson,
Linear Algebra and Its
Applications 60:27-42,
August 1984, ISSN
0024-3795.
A pdf
version is available.
- 1983 -
Improving the
Accuracy of Computed
Eigenvalues and
Eigenvectors, J.
J. Dongarra, C. B. Moler
and J. H. Wilkinson,
SIAM Journal on
Numerical Analysis
20(1):23-45, February
1983, ISSN 0036-1429.
A pdf
version is available.
Improving the
Accuracy of Computed
Singular Values,
J. J. Dongarra, SIAM
Journal on Scientific
and Statistical
Computing 4(4):712-719,
December 1983, ISSN
0196-5204.
A pdf
version is available.
Performance of
Various Computers
Using Standard Linear
Equations Software in
a Fortran Environment,
J. J. Dongarra, ACM
SIGARCH Computer
Architecture News
11(5):22-27, December
1983, ISSN 0163-5964.
A pdf
version is available.
- 1982 -
Algorithm 589:
SICEDR: A FORTRAN
Subroutine for
Improving the Accuracy
of Computed Matrix
Eigenvalues, J. J.
Dongarra, ACM
Transactions on
Mathematical Software
8(4):371-375, December
1982, ISSN 0098-3500.
A pdf
version is available.
- 1979 -
Unrolling Loops in
Fortran, J.
Dongarra and A. R.
Hinds, Software-Practice
and Experience,
9(3):219-226, March
1979, ISSN 0038-0644.
A pdf
version is available.