Workshop on Clusters, Clouds,
and Data for Scientific Computing
CCDSC 2016
(last update 4/5/17 3:31 AM)
427 Chemin de Chanzé, France
Sponsored by:
NSF, AMD, PGI,
Nvidia, Intel, Mellanox, GENCI, CGG-Veritas, ICL/UTK, Grenoble Alps University.
Clusters, Clouds, and Data for Scientific Computing
2016
427 Chemin de Chanzé, France
October 3rd – 6th, 2016
CCDSC 2016 will be held at a resort outside of Lyon, France, called La Maison des Contes: http://www.chateauform.com/en/chateauform/maison/17/chateau-la-maison-des-contes
The address of the Chateau
is:
Châteauform' La Maison des Contes
427 chemin de Chanzé
69490 Dareizé
Telephone: +33 1 30 28 69 69
1 hr 30 min from the Saint Exupéry Airport
45 minutes from Lyon
GPS Coordinates: North latitude 45° 54' 20", East longitude 4° 30' 41"
Go to http://maps.google.com and type in: "427 chemin de Chanzé 69490 Dareizé".
These proceedings gather information
about the participants of the Workshop on Clusters, Clouds, and Data for
Scientific Computing that will be held at La Maison des Contes, 427 Chemin de
Chanzé, France, on October 3–6, 2016.
This workshop is a continuation of a series of workshops started in 1992
entitled Workshop on Environments and Tools for Parallel Scientific Computing.
These workshops have been held every two years and alternate between the U.S.
and France. The purpose of the workshop, which is by invitation only, is
to evaluate the state of the art and future trends for cluster computing and
the use of computational clouds for scientific computing.
This workshop addresses a
number of themes for developing and using both clusters and computational clouds.
In particular, the talks will cover the following:
• Survey and analyze the key
deployment, operational, and usage issues for clusters, clouds, and grids,
especially focusing on discontinuities produced by multicore and hybrid
architectures, data-intensive science, and the increasing need for wide-area/local-area
interaction.
• Document the current state of the art in each of these areas,
identifying interesting questions and limitations, including experiences with
clusters, clouds, and grids relative to the science research communities and
science domains that are benefitting from the technology.
• Explore interoperability among
disparate clouds, as well as interoperability between various clouds and grids,
and the impact on the domain sciences.
• Explore directions for future
research and development against the background of disruptive trends and
technologies and the recognized gaps in the current state of the art.
Speakers will present their
research and interact with all the participants about the future software
technologies that will make parallel computers easier to use.
This
workshop was made possible thanks to sponsorship from NSF, AMD, PGI, Nvidia,
Intel, Mellanox, GENCI, CGG-Veritas, ICL/UTK, and Grenoble Alps University.
Thanks!
Jack Dongarra, Knoxville,
Tennessee, USA.
Bernard Tourancheau, Grenoble,
France
Monday, October 3rd |
Jack Dongarra, U of Tennessee; Bernard Tourancheau, U Grenoble |
|
6:30 – 8:00 |
Session Chair: Jack Dongarra |
(6 talks – 15 minutes each) |
6:30 |
Doug Miles, PGI/Nvidia |
On the Role of Compiler Directives in
High-Performance Computing |
6:45 |
Patrick Demichel, HP |
|
7:00 |
Rich Graham, Mellanox |
|
7:15 |
Joe Curley, Intel |
|
7:30 |
Bill Brantley, AMD |
|
7:45 |
Gunter Roeth, Nvidia |
|
8:00 pm – 9:00 pm |
Dinner |
|
Tuesday,
October 4th |
|
|
7:30 - 8:30 |
Breakfast |
|
8:30 - 10:30 |
Session Chair: Bernard Tourancheau, U Grenoble |
(6 talks – 20 minutes each) |
8:30 |
Pete Beckman |
|
8:50 |
Rosa Badia |
Task-based programming in COMPSs to
converge from HPC to Big Data |
9:10 |
Ian Foster |
New Directions in Globus: Collections,
Responsive Storage, and Safe Data |
9:30 |
Geoffrey Fox |
Distinguishing
Parallel and Distributed Computing Performance |
9:50 |
Ewa Deelman |
|
10:10 |
Laurent Lefevre |
|
10:30 – 11:00 |
Coffee |
|
11:00 - 1:00 |
Session Chair: Patrick Demichel |
(6 talks – 20 minutes each) |
11:00 |
Alok Choudhary |
|
11:20 |
Vaidy Sunderam |
Cost and Utility
Tradeoffs on IaaS Clouds, Grids, and On-Premise Resources |
11:40 |
Anne Benoit |
Resilient application co-scheduling with processor redistribution |
12:00 |
Frank Mueller |
|
12:20 |
Fran Berman |
|
12:40 |
Emmanuel Jeannot |
|
1:00 - 2:00 |
Lunch - break |
|
2:30 – 3:00 |
Coffee |
|
3:00 - 5:20 |
Session Chair: Fran Berman |
(6 talks – 20 minutes each) |
3:00 |
Barbara Chapman |
|
3:20 |
Rusty Lusk |
From Automated Theorem Proving to Nuclear Structure Analysis with Self-Scheduled Task Parallelism |
3:40 |
Tony Hey |
|
4:00 |
Franck Cappello |
Lossy Compression of scientific data: from Stone Age to Renaissance |
4:20 |
Al Geist |
|
4:40 |
Yves Robert |
Some recent results about resilience and/or co-scheduling for large-scale systems |
6:30 – 7:30 |
Organic wine tasting in the « salon des contes », where we had the first welcome gathering |
|
8:00 – 9:00 |
Dinner |
|
Wednesday, October 5th |
|
|
7:30 - 8:30 |
Breakfast |
|
8:30 - 10:10 |
Session Chair: Emmanuel Jeannot |
(5 talks – 20 minutes each) |
8:30 |
Bill Gropp |
|
8:50 |
Andrew Grimshaw |
PCubeS/IT - A Type Architecture and
Portable Parallel Language for Hierarchical Parallel Machines |
9:10 |
Ron Brightwell |
Embracing
Diversity: OS Support for Integrating High-Performance Computing and Data
Analytics |
9:30 |
Joel Saltz |
Convergence of Data and Computation:
Integration of Sensors and Simulation |
9:50 |
Laura Grigori |
|
10:10 – 10:40 |
Coffee |
|
10:40 - 12:40 |
Session Chair: Laurent Lefevre |
(6 talks – 20 minutes each) |
10:40 |
Samuel Thibault |
|
11:00 |
Michela Taufer |
|
11:20 |
Jeff Hollingsworth |
|
11:40 |
Haohuan Fu |
|
12:00 |
David Walker |
Morton Ordering of 2D
Arrays for Parallelism and Efficient Access to Hierarchical Memory |
12:20 - 2:00 |
Lunch |
|
2:00 – 4:00 |
Session Chair: Michela Taufer |
(4 talks – 20 minutes each) |
2:00 |
Dimitrios Nikolopoulos |
|
2:20 |
Rob Ross |
From
File Systems to Services: Changing the Data Management Model in HPC |
2:40 |
Torsten Hoefler |
Automatic GPU compilation and why you want to run MPI on your GPU |
3:00 |
Manish Parashar |
|
3:20 – 4:00 |
Coffee |
|
4:00 – 6:00 |
Session Chair: Rosa Badia |
(4 talks – 20 minutes each) |
4:00 |
Jeff Vetter |
Performance
Portability for Extreme Scale High Performance Computing |
4:20 |
Bernd Mohr |
POP -- Parallel Performance Analysis and Tuning as a Service |
4:40 |
Padma Raghavan |
|
5:00 |
Carl Kesselman |
|
8:00 – 9:00 |
Dinner |
|
9:00 pm - |
|
|
Thursday, October 6th |
|
|
7:30 - 8:30 |
Breakfast |
|
8:30 - 10:10 |
Session Chair: Padma Raghavan |
(5 talks – 20 minutes each) |
8:30 |
Ilkay Altintas |
Workflows as an Operation Tool for
Scientific Computing using Data Science |
8:50 |
Christian Obrecht |
On
a Novel Method for High Performance Computational Fluid Dynamics |
9:10 |
Anthony Danalis |
|
9:30 |
Pavan Balaji |
|
9:50 |
Martin Swany |
|
10:10 – 10:40 |
Coffee |
|
10:40 - 12:20 |
Session Chair: Jack Dongarra |
(3 talks – 20 minutes each) |
10:40 |
Frederic Desprez |
BOAST:
Performance Portability Using Meta-Programming and Auto-Tuning |
11:00 |
Frederic Vivien |
|
11:20 |
Minh Quan Ho |
LBM
3D stencil memory bound improvement with many-core processors
asynchronous transfers |
11:40 |
|
|
12:00 - 1:30 |
Lunch |
|
1:30 |
Depart |
|
Here is some information on
the meeting in Lyon. We have
updated the workshop webpage http://bit.ly/ccdsc-2016 with the workshop agenda.
On Monday, October 3rd,
there will be a bus to pick up participants at Lyon's Saint Exupéry (formerly
Satolas) Airport at 3:00. (Note that the Saint Exupéry airport has its own
train station with direct TGV connections to Paris via Charles de Gaulle.) If
you arrive by train at Saint Exupéry airport, please go to the airport meeting
point (point-rencontre), on the second floor, next to the shuttles, near the
hallway between the two terminals (see http://www.lyonaeroports.com/en/practicals-informations/information-points ).
The TGV station is reached via a long corridor from the airport terminal; the
bus stop is near the station entrance, in the parking lot called "dépose
minute".
The bus will then travel to
pick up people at the Lyon Part Dieu railway station at 4:45. (There are two
train stations in Lyon; you want the Part Dieu station, not the Perrache
station.) There will be someone with a sign at the "Meeting Point/point de
rencontre" of the station to direct you to the bus.
The bus is expected to arrive
at La Maison des Contes around 5:30. We would like to hold the first session on
Monday evening from 6:30 pm to 8:00 pm, with dinner following the session. La
Maison des Contes is about 43 km from Lyon. For a map, go to
http://maps.google.com and type in: "427 chemin de Chanzé 69490 Dareizé".
VERY IMPORTANT: Please send
your arrival and departure times to Jack so we can arrange the appropriate size
bus for transportation. VERY, VERY IMPORTANT: If your flight is such that you
will miss the bus on Monday, October 3rd at 3:00, send Bernard your flight
arrival information so he can arrange transportation to pick you up at the
train station or the airport in Lyon. It turns out that a taxi from Lyon to the
Chateau can cost as much as 100 euros, and the Chateau may be hard to find at
night if you rent a car and are not a French driver :-).
At the end of the meeting on Thursday
afternoon, we will arrange for a bus to transport people to the train station
and airport. If you are catching an early flight on the morning of Friday,
October 7th, you may want to stay at the hotel located at Lyon's Saint Exupéry
Airport; see http://www.lyonaeroports.com/eng/Shops-facilities/Hotels
for details.
There are also many hotels in
the Lyon area; see: http://www.en.lyon-france.com/
Due to room constraints at
La Maison des Contes, we ask that you not bring a guest. Dress at the
workshop is informal. Please tell us if you have special requirements
(vegetarian food, etc.). We are expecting to have internet and wireless
connections at the meeting, but you know, this is France.
Please send this information
to Jack (dongarra@icl.utk.edu) by
August 5th.
Name:
Institute:
Title:
Abstract:
Participant's brief biography:
Arrival / Departure Details:
|
|
Arrival Times in Lyon |
Departure Times in Lyon |
Special |
Ilkay |
Altintas |
10/3 10:25 UA 8024 |
10/7 12:00 UA 8067 |
|
Rosa |
Badia |
10/3 13:55 U2 4417 |
10/6 19:00 VY 1223 |
|
Pavan |
Balaji |
10/3 14:10 UA 8916 |
10/7 7:20 UA 8881 |
|
Pete |
Beckman |
10/3 14:10 UA 8916 |
10/6 Train |
|
Anne |
Benoit |
Drive |
Drive |
|
Fran |
Berman |
10/3 11:00 BR 3587 |
10/7 9:00 LH 2247 |
|
Bill |
Brantley |
10/3 10:55 TA 1807 |
10/6 18:20 TA 1810 |
|
Ron |
Brightwell |
10/3 13:20 DL 9515 |
10/7 6:40 DL 8611 |
|
Franck |
Cappello |
10/3 14:00 Part Dieu |
10/7 16:57 Part Dieu |
|
Barbara |
Chapman |
10/3 Part Dieu |
10/4 21:00 |
|
Alok |
Choudhary |
10/3 13:20 KL 1417 |
10/6 19:05 LH 1079 |
|
Joe |
Curley |
10/3 13:00 Part Dieu |
10/6 |
|
Anthony |
Danalis |
10/3 11:15 DL 8320 |
10/6 16:25 |
|
Ewa |
Deelman |
Part Dieu |
10/6 19:30 to LHR |
|
Patrick |
Demichel |
Drive |
Drive |
|
Frederic |
Desprez |
Drive 10/5 |
Drive 10/6 |
|
Jack |
Dongarra |
10/3 11:15 DL 8320 |
10/7 6:40 DL 8611 |
|
Ian |
Foster |
10/3 14:10 UA 8916 |
10/7 9:00 UA 9486 depart 10/6 10:00 |
|
Geoffrey |
Fox |
10/3 13:20 KL 1417 |
10/7 10:05 KL 1414 |
|
Haohuan |
Fu |
10/3 10:20 LH 1074 |
10/6 19:05 LH 1079 |
|
Al |
Geist |
10/3 11:15 DL 8320 |
10/7 6:40 DL 8611 |
|
Rich |
Graham |
10/3 10:20 UN 8914 |
10/7 9:00 UA 9486 |
|
Laura |
Grigori |
Part Dieu |
Departs Wednesday 10/4 |
|
Andrew |
Grimshaw |
10/3 12:15 LH 2248 |
10/6 17:05 LH 2251 |
|
Bill |
Gropp |
10/3 train Part Dieu |
10/7 flights |
|
Tony |
Hey |
10/3 18:35 BA 362; pickup by Roland Taxi (cell +33 6 08 25 55 46) |
10/5 19:30 BA 363 |
|
Minh Quan |
Ho |
Part Dieu |
Part Dieu |
|
Torsten |
Hoefler |
10/3 Part Dieu 13:30 from Geneva |
Part Dieu |
|
Jeff |
Hollingsworth |
10/4 12:15 UA 9487 |
10/7 9:00 UA 9486 |
|
Emmanuel |
Jeannot |
10/3 14:30 AF 5372 |
10/6 15:15 AS 4126 depart 10/6 12:00 |
|
Carl |
Kesselman |
10/3 14:10 LH 1076 |
10/6 13:55 AF 8285 depart 10/6 10:00 |
|
Laurent |
Lefevre |
Drive 10/3 20:00 |
Drive 10/6 5:00 |
vegetarian (no meat, but fish, milk, eggs OK) |
Rusty |
Lusk |
10/3 14:10 UA 8916 |
10/6 Train |
|
Doug |
Miles |
10/3 bus from airport |
10/6 bus to airport |
|
Bernd |
Mohr |
10/3 Part Dieu TGV 9826 14:00 |
10/6 Part Dieu TGV 6622 15:00 |
|
Frank |
Mueller |
10/3 14:10 LH 1076 |
10/6 17:05 LH 2251 |
|
Dimitrios |
Nikolopoulos |
10/3 15:05 EI 552 |
10/6 15:45 EI 1553 depart 10/6 12:00 |
|
Christian |
Obrecht |
Drive |
Drive |
|
Manish |
Parashar |
10/3 12:15 UA 9487 |
10/7 7:20 UA 8881 |
Vegetarian |
Padma |
Raghavan |
10/3 11:00 UA 9959 |
10/7 10:00 UA 973 |
|
Yves |
Robert |
Drive |
Drive |
|
Rob |
Ross |
10/3 14:00 UA 8916 |
10/7 7:20 UA 8881 |
|
Gunter |
Roeth |
10/3 Drive |
10/5 Drive |
|
Joel |
Saltz |
10/3 10:20 LH 1074 |
10/6 14:25 LH 1077 depart 10/6 10:00 |
|
Vaidy |
Sunderam |
10/3 13:20 DL 9515 |
10/7 10:05 DL 9468 |
|
Martin |
Swany |
10/3 13:20 DL 9515 |
10/7 10:05 DL 9468 |
|
Michela |
Taufer |
Part Dieu |
10/6 19:30 |
|
Samuel |
Thibault |
10/3 14:30 AF 5372 |
10/6 20:20 AF 5383 |
|
Bernard |
Tourancheau |
Make the airport bus at 15:00 |
10/6 16:00 |
|
Jeff |
Vetter |
10/3 9:45 AF 7640 |
10/7 6:40 AF 7651 |
|
Frederic |
Vivien |
Drive |
Drive |
|
David |
Walker |
10/3 13:20 KL 1417 |
10/6 18:20 KL 1416 |
|
Ilkay Altintas, UCSD
Workflows as an Operation Tool for Scientific
Computing using Data Science
Workflows are used by many scientific communities
to capture, automate and standardize computational and data practices in
science. In addition to the earlier use of workflows in HPC and HTC
applications, they present an opportunity to operationalize dynamic data-driven
solutions in which big data systems can be merged with existing big data and
cloud solutions, especially in scenarios where a scalable and reusable
integration of streaming data, analytical tools and computational
infrastructure is needed. This talk will focus on using workflows as a scalable
and reproducible programming model for data streaming and steering within
dynamic data-driven applications, e.g., wildfire prediction, smart
manufacturing, smart grids, and traffic control. A summary of our ongoing
research efforts on using data science techniques for end-to-end performance
prediction and dynamic steering of workflow-driven applications will also be
presented.
Rosa M Badia, Barcelona Supercomputing Center
Task-based
programming in COMPSs to converge from HPC to Big Data
Task-based programming has proven to be a suitable
model for HPC applications. The different instances of StarSs have been good
demonstrators of this and have promoted the acceptance of task-based
programming in the OpenMP standard.
Big Data programming models have been dominated by approaches like
MapReduce/Hadoop or Spark, which define a set of operators to be used by the
applications. Since COMPSs is the
StarSs instance that tackles distributed computing (including Clouds), it can
be considered in order to provide a task-based programming model for Big Data
applications.
The talk will describe why we consider that
task-based programming models are a good approach for Big Data and will compare
examples between COMPSs and Apache Spark, including performance results.
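As a rough illustration of the task-based style the abstract describes (using Python's standard concurrent.futures rather than the actual COMPSs/PyCOMPSs API, so all names here are illustrative), independent tasks are submitted to a runtime, and synchronization happens only when results are requested:

```python
from concurrent.futures import ThreadPoolExecutor

def increment(block):
    # One "task": works on a single data block, independently of the others.
    return [x + 1 for x in block]

blocks = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor() as pool:
    # Submitting tasks builds an implicit dataflow; the runtime decides
    # when each one executes.
    futures = [pool.submit(increment, b) for b in blocks]
    # Requesting the results is the synchronization point.
    results = [f.result() for f in futures]
print(results)  # [[2, 3], [4, 5], [6, 7]]
```

In COMPSs the decomposition into tasks is expressed by annotating ordinary methods and the runtime extracts the dataflow automatically; the sketch above only mimics the submit/synchronize pattern.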
Pavan Balaji, ANL
How I Learned to Stop Worrying
about Exascale and Love MPI
Pete Beckman, ANL
WaggleVision
Sensors and embedded computing devices are being
woven into buildings, roads, household appliances, and light bulbs. Most
sensors and actuators are designed to be as simple as possible, with low-power
microprocessors that just push sensor values up to the cloud. However,
another class of powerful, programmable sensor node is emerging. The
Waggle (www.wa8.gl) platform supports parallel computing, machine learning, and computer
vision for advanced intelligent sensing applications. Waggle is an open source
and open hardware project at Argonne National Laboratory that has developed a
novel wireless sensor system to enable a new breed of smart city research and
sensor-driven environmental science. Leveraging machine learning tools such as
Google's TensorFlow and Berkeley's Caffe, and computer vision packages such as
OpenCV, Waggle sensors can understand their surroundings while also measuring
air quality and environmental conditions. Waggle is the core technology for
the Chicago ArrayOfThings (AoT) project (https://arrayofthings.github.io). The AoT will deploy 500
Waggle-based nodes on the streets of Chicago beginning in 2016. Prototype
versions are already deployed on a couple of campuses. The presentation will
outline the current progress of designing and deploying the current platform,
and our progress on research topics in computer science, including parallel
computing, operating system resilience, data aggregation, and HPC modeling and
simulation.
Anne Benoit, ENS Lyon, France
Resilient
application co-scheduling with processor redistribution
Recently, the benefits of co-scheduling several
applications have been demonstrated in a fault-free context, both in terms of
performance and energy savings.
However, large-scale computer systems are confronted with frequent
failures, and resilience techniques must be employed to ensure the completion
of large applications. Indeed, failures may create severe imbalance between
applications and significantly degrade performance. We propose to redistribute
the resources assigned to each application when failures strike, in order to
minimize the expected completion time of a set of co-scheduled applications.
First, we introduce a formal model and
establish complexity results. When no redistribution is allowed, we can
minimize the expected completion time in polynomial time, while the problem
becomes NP-complete with redistributions, even in a fault-free context.
Therefore, we design polynomial-time heuristics that perform redistributions
and account for processor failures. A fault simulator is used to perform
extensive simulations that demonstrate the usefulness of redistribution and the
performance of the proposed heuristics.
Fran Berman, RPI
Sustaining the Data Ecosystem
Innovation in a digital world presupposes that the data will be there
when we need it, but will it? Without enabling technical infrastructure,
supporting social infrastructure, and sufficient attention to the stewardship
and long-term preservation of digital data, data may become inaccessible or
lost. This is particularly critical for data generated by sponsored research
projects where the focus is typically on innovation rather than infrastructure,
and support for stewardship and preservation may be short-term. In
this talk, we provide a holistic perspective on the opportunities and
challenges involved in creating a sustainable data ecosystem to drive
data-driven innovation for current and future applications.
Ron Brightwell, Sandia Labs
Embracing
Diversity: OS Support for Integrating High-Performance Computing and Data
Analytics
It is unlikely that one operating system or a
single software stack will support the emerging and future needs of
high-performance computing and high-performance data analytics applications.
There are many
technical and non-technical reasons why functional partitioning through
customized software stacks will continue to persist. Rather than pursuing
approaches that constrain the ability to provide a system software environment
that satisfies a diverse and competing set of requirements, methods and
interfaces that enable the use and integration of multiple software stacks
should be pursued. This talk will describe the challenges that motivate the
need to support multiple concurrent software stacks for enabling application
composition, more complex application workflows, and a potentially richer set
of usage models for extreme-scale high-performance computing systems. The Hobbes
project led by Sandia National Laboratories has been exploring operating system
infrastructure for supporting multiple concurrent software stacks. This talk
will describe this infrastructure, relevant interfaces, and highlight issues
that motivate future exploration.
Franck Cappello, ANL
Lossy
Compression of scientific data: from Stone Age to Renaissance
Data-reduction is already necessary for many
scientific simulations and experiments.
Exascale simulations and updates of large scale instruments will require
significantly more reduction. Compression is one of the fundamental techniques
that can help address this challenge. Can we compress floating-point datasets
better than with Gzip or JPEG (Stone Age)? The answer is yes, with the best
state-of-the-art compressors (Renaissance). But it's not easy. Good compressors
are intricate machineries optimizing multiple objectives: compression factor,
respect of error bounds, compression speed, decompression speed, etc. In this
talk, I will present the best-in-class lossy compressors for floating-point
datasets, strictly respecting user-set error bounds. Can we declare victory?
No: some datasets are "hard to compress". I will present our understanding of
them and techniques to improve their compression. Are users ready to use lossy
compressors? Well, there is no consensus here: one key to adoption is the
understanding of the controls that users need on the compression errors.
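To make the notion of a strict, user-set error bound concrete, here is a toy uniform quantizer (a generic sketch, not the compressor presented in the talk): snapping each value to the nearest multiple of 2·eps guarantees a pointwise reconstruction error of at most eps, while producing small, repetitive integer codes that a lossless entropy coder could then shrink.

```python
import math

def quantize(data, eps):
    # Snap each value to the nearest multiple of 2*eps: the integer codes
    # are small and repetitive (hence compressible), and the reconstruction
    # error is guaranteed to be at most eps.
    return [round(x / (2.0 * eps)) for x in data]

def dequantize(codes, eps):
    return [c * (2.0 * eps) for c in codes]

eps = 1e-3
data = [math.sin(0.01 * i) for i in range(1000)]
recon = dequantize(quantize(data, eps), eps)
max_err = max(abs(r - x) for r, x in zip(recon, data))
assert max_err <= eps  # strict pointwise error bound
```

Real compressors such as those discussed in the talk add far more machinery (prediction, adaptive quantization, entropy coding), but they offer the same kind of guarantee on each reconstructed value.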
Alok N. Choudhary, Northwestern University
Scaling Resiliency via Machine Learning and Compression
Checkpoint-restart is the most common fault-tolerance
technique in High Performance Computing (HPC) systems: it writes the full
state of the machine to stable storage and restarts from the last checkpoint.
As HPC systems move towards exascale, the external storage space and the time
and power costs of moving data off the system with traditional checkpointing
threaten to overwhelm not only the simulations but also the post-simulation
data analysis. One conventional practice to address this problem is to apply
data compression in order to achieve data reduction. However, most lossless
compression techniques that look for repeated patterns are ineffective for
scientific data: when high-precision data is used, common patterns are rare to
find.
In this talk, we present machine learning
techniques that learn the distribution of changes in state values of
simulations, and an algorithm that significantly compresses data with
guaranteed point-wise error bounds. Capturing the distribution of relative
changes in pair-wise data elements instead of storing the data itself provides
an opportunity to incorporate the temporal dimension of the data and learn the
distribution evolution of the changes. The algorithm consists of the following
steps. (1) Similar to forward predictive coding in video compression, it first
computes the relative change in data values between two consecutive timestamps
or iterations. (2) A machine learning-based data approximation algorithm will
be designed to transform the change distribution into groups (an index domain
with a much smaller space required), and encode data points by their group
indices. (3) As each data point is approximated by its corresponding group
index value, our approach allows a program restart or post-simulation analysis
to use the compressed data with a controlled approximation.
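A minimal sketch of step (1), forward predictive coding with a guaranteed pointwise error bound; a plain uniform quantizer stands in for the machine-learning grouping of steps (2)-(3), and all names are illustrative:

```python
def compress_series(frames, eps):
    # Step (1): forward predictive coding -- quantize each frame's change
    # relative to the *reconstructed* previous frame, so quantization
    # errors never accumulate across timestamps.
    codes, prev = [], [0.0] * len(frames[0])
    for frame in frames:
        q = [round((x - p) / (2.0 * eps)) for x, p in zip(frame, prev)]
        codes.append(q)
        # Track exactly what the decompressor will reconstruct.
        prev = [p + c * (2.0 * eps) for p, c in zip(prev, q)]
    return codes

def decompress_series(codes, width, eps):
    frames, state = [], [0.0] * width
    for q in codes:
        state = [s + c * (2.0 * eps) for s, c in zip(state, q)]
        frames.append(state)
    return frames

eps = 1e-3
frames = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.30]]
recon = decompress_series(compress_series(frames, eps), 2, eps)
worst = max(abs(r - x) for f, g in zip(recon, frames)
            for r, x in zip(f, g))
assert worst <= eps  # guaranteed pointwise error bound
```

Because each frame is encoded relative to the reconstructed previous frame rather than the original one, quantization errors cannot accumulate across timestamps, which is what makes the point-wise guarantee possible.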
Joe Curley, Intel
Tales of Parallel
App Enabling on the Path to Exascale
Anthony Danalis, UTK
Dataflow programming: Do we need it for
exascale?
Task-based execution has been growing in popularity in the last few
years. Several new runtime systems are being actively developed, and some more
traditional ones are adding tasking support. However, task execution and
dataflow programming are not the same thing. This talk will discuss the
differences between the two and examine the pros and cons of the different ways
of supporting task-based parallelism. The presentation will also attempt to
extrapolate into the future of high-performance computing and examine the
possibility of tighter integration between the different layers of the stack,
namely the runtime, the compiler, and the human developer.
Ewa Deelman, ISI
What is missing in workflow
technologies
This talk will look at the current state of workflow technologies. It
will give examples from the Pegasus Workflow Management System and describe
current capabilities. Examples will be drawn from a variety of applications in
astronomy, gravitational-wave physics, earthquake science and bioinformatics.
These workflows are executing on heterogeneous environments including clouds,
HPC, and HTC resources. Finally, the talk will lay out the missing
capabilities that are necessary to broaden the use of workflows in science.
Patrick Demichel, HP
The future of IT
technologies
Our industry is in perpetual technological
transformation and even acceleration, with an immense appetite for always more
compute, storage, network, applications, etc. We now face an immense
opportunity with the Internet of Things world and all its potential to solve
many challenges of our society; but, for the first time, our industry is
showing symptoms of reaching some fundamental limits of scaling. We therefore
need a more drastic transformation; we will see which technologies will help us
continue on the road towards the massively distributed exascale systems we
envision, and which technologies we need to develop.
Frederic Desprez,
INRIA
BOAST: Performance Portability Using
Meta-Programming and Auto-Tuning
Performance
portability of HPC applications is of paramount importance, but achieving it
is tedious and costly in terms of human resources. Unfortunately, those
efforts are often lost when migrating to new architectures, as optimizations
are not generally applicable. In the Mont-Blanc European project we tackle
this problem from several angles. One of them is using a task-based runtime
(OmpSs) to obtain adaptive scientific applications. Another is promoting
scientific application auto-tuning. Unfortunately, the investment to set up a
dedicated auto-tuning framework is usually too expensive for a single
application. Source-to-source transformations or compiler-based solutions
exist, but sometimes prove too restrictive to cover all use cases.
We thus propose
BOAST, a meta-programming framework aimed at generating parametrized source
code. The aim is for the programmer to be able to orthogonally express
optimizations on a computing kernel, enabling a thorough search of the
optimization space. This also allows a lot of code factorization and thus
code-base reduction. We will demonstrate the use of BOAST on a classical
Laplace kernel, showing how our embedded DSL allows the description of
non-trivial optimizations. We will also show how the BOAST framework enables
performance and non-regression tests to be performed on the generated code
versions, resulting in proven and efficient computing kernels on several
architectures.
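BOAST itself is a Ruby-embedded DSL; purely as a loose Python analogue of the idea (meta-programming generates many parametrized variants of one kernel, and an auto-tuner times them to pick the best), one might write:

```python
import time

def make_kernel(unroll):
    # Meta-programming: emit the source of a summation kernel specialized
    # for a given unroll factor, then compile it with exec().
    body = "\n".join(f"        acc += data[i + {j}]" for j in range(unroll))
    src = (
        f"def kernel(data):\n"
        f"    acc = 0.0\n"
        f"    for i in range(0, len(data) - {unroll - 1}, {unroll}):\n"
        f"{body}\n"
        f"    return acc\n"
    )
    namespace = {}
    exec(src, namespace)
    return namespace["kernel"]

def autotune(data, factors=(1, 2, 4, 8)):
    # Thorough (here: exhaustive) search of the small optimization space.
    best, best_time = None, float("inf")
    for unroll in factors:
        kernel = make_kernel(unroll)
        start = time.perf_counter()
        kernel(data)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = unroll, elapsed
    return best

data = [1.0] * 4096  # length divisible by every unroll factor
assert make_kernel(8)(data) == 4096.0  # all variants compute the same sum
```

A real framework such as BOAST generates C or Fortran, exposes many more tuning knobs (vectorization, blocking, loop ordering), and runs non-regression tests on every generated variant.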
Ian Foster, U of Chicago and ANL
New Directions
in Globus: Collections, Responsive Storage, and Safe Data
The Globus team has spent the past five years
developing new cloud software-as-a-service approaches to research data
management. This work has produced powerful cloud services and a widely
deployed network of more than 10,000 Globus endpoints that together enable
ubiquitous, secure, and efficient access to large quantities of scientific
data. We are now investigating how we can build on this distributed
infrastructure to automate further research data management tasks, such as
mapping distributed data collections, detecting and responding to storage
system events, and collaborative analysis of sensitive information. I present
here some of the use cases that motivate this work, the ideas that we are
exploring, and early results.
Geoffrey Fox, Indiana University
Distinguishing
Parallel and Distributed Computing Performance
We are pursuing the concept of HPC-ABDS
-- High Performance Computing Enhanced Apache Big Data Stack -- where we
try to blend the usability and functionality of the community big data stack
with the performance of HPC. Here we examine major Apache Programming
environments including Spark, Flink, Hadoop, Storm, Heron and Beam. We suggest
that parallel and distributed computing often implement similar concepts (such
as reduction, communication or dataflow) but that these need to be implemented
differently to respect the different performance, fault-tolerance, synchronization,
and execution flexibility requirements of parallel and distributed programs. We
present early results on the HPC-ABDS strategy of implementing best-practice
runtimes for both these computing paradigms in major Apache environments.
Haohuan Fu, Tsinghua University
Refactoring and Optimizing the
Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer
This talk reports our efforts on refactoring and optimizing the Community
Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer, which uses a
many-core processor that consists of management processing elements (MPEs) and
clusters of computing processing elements (CPEs). To map the large code base of
CAM to the millions of cores on the Sunway system, we take OpenACC-based
refactoring as the major approach, and apply source-to-source translator tools
to exploit the most suitable parallelism for the CPE cluster, and to fit the
intermediate variable into the limited on-chip fast buffer. For individual
kernels, when comparing the original ported version using only MPEs and the
refactored version using both the MPE and CPE clusters, we achieve up to 22x
speedup for the compute-intensive kernels. For the 25km resolution CAM global
model, we manage to scale to 24,000 MPEs, and 1,536,000 CPEs, and achieve a
simulation speed of 2.81 model years per day.
Al Geist, ORNL
Are Killer Apps
Killing Exascale?
In 2009 the goal was to get to exascale by 2018. In
2013 the goal was slipped to 2020. Today the U.S. Exascale Computing project is
targeting 2023 for the first U.S.
exascale computer. Is it technology, politics, or the lack of any
compelling killer apps that is pushing out the target date for exascale? This
talk examines all three of these reasons and shows that while technology and
politics play a role, the lack of even a single killer exascale app is killing
exascale.
Rich Graham, Mellanox
The Active Network
Laura Grigori, INRIA
Low rank approximation
and write avoiding algorithms
In this talk we will discuss algorithms for
computing a low rank approximation of a matrix. This problem has numerous and diverse
applications ranging from scientific computing problems such as fast solvers
for integral equations to data analytics problems such as principal component
analysis (PCA) or image processing.
We discuss various approaches for computing such an approximation that
show a trade-off between speed versus deterministic/probabilistic accuracy.
We also discuss the need to avoid writes, suggested
by emerging memory technologies such as nonvolatile memories. Writes can be much more expensive than
reads in some current and emerging technologies, in terms of both time and
energy. This motivates us to study
algorithms that reduce the number of writes in addition to more generally
reducing communication between processors and between different levels of the
memory hierarchy.
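As one concrete member of the family of methods the talk surveys, here is a basic randomized range-finder for low-rank approximation (in the style of Halko, Martinsson, and Tropp; a sketch for illustration, not the algorithms of the talk):

```python
import numpy as np

def randomized_low_rank(A, k, oversample=10, seed=0):
    # Sample the range of A with a Gaussian test matrix, orthonormalize,
    # and solve a small dense problem instead of a full SVD of A.
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ omega)   # orthonormal basis for ~range(A)
    B = Q.T @ A                      # small (k + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k]

# A 200 x 100 matrix of exact rank 8 is recovered to machine precision.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))
U, s, Vt = randomized_low_rank(A, k=8)
rel_err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
```

Note that the large matrix A is only read (once per product), while all writes go to the much smaller sketch matrices, which is the flavor of trade-off that write-avoiding algorithms formalize.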
William Gropp, University of Illinois at
Urbana-Champaign
Do You Know What
Your I/O is Doing?
Even though supercomputers are typically described
in terms of their floating point performance, science applications also need
significant I/O performance for all parts of the science workflow. This ranges from reading input data, to
writing simulation output, to conducting analysis across years of simulation
data. This talk presents recent
data on the use of I/O at several supercomputing centers and what that suggests
about the challenges and open problems in I/O on HPC systems. The talk concludes with an example of
how I/O performance can be improved in an application.
Andrew Grimshaw, U of Virginia
PCubeS/IT - A Type Architecture and Portable
Parallel Language for Hierarchical Parallel Machines
Writing portable
parallel applications remains a challenge, particularly in the presence of
increasingly heterogeneous and deep node architectures. Achieving good
performance is especially challenging for rookie parallel programmers who lack
the experience of optimizing performance on different types of hardware.
Snyder, in his seminal work "Type Architectures, Shared Memory, and the
Corollary of Modest Potential," argues that a reflection of the salient
features of an architecture in the programming language is necessary. The
PCubeS type architecture represents a parallel machine as a finite hierarchy of
parallel processing spaces each having fixed, possibly zero, compute and memory
capacities and containing a finite set of uniform, independent sub-spaces.
The IT language
is a PCubeS language in which computations are defined to take place in a
corresponding hierarchy of logical processing spaces, each of which may impose
a different partitioning of data structures. The programmer is responsible for
decomposing the problem into multiple spaces, selecting the best decomposition
of variables in each space, and for mapping the logical IT spaces to the
physical spaces of the target machine. Only the last step, mapping
logical to physical spaces, differs between target machines; the rest of
the IT program remains the same across them.
The IT compiler
and run-time system are responsible for breaking up and executing the code for
each logical space on the specified hardware space and managing all
communication and synchronization between partitions within a space and between
different logical spaces. Two IT compilers have been completed: a multicore
compiler and a distributed memory/multicore compiler that uses MPI for
inter-host communication. A third compiler, adding GPGPU support to the distributed
memory/multicore compiler, is in development.
In this talk I
will briefly review what makes efficient parallel programming difficult for
programmers, and why the problem is only getting worse as machines are
developed with deeper and deeper memory hierarchies and more and more node
heterogeneity. I will then introduce the PCubeS type architecture and IT
programming language as a mechanism for addressing the efficient portable
parallel programming problem via a series of sample programs. Finally I will
present the performance results for several application kernels on multicore,
distributed memory/multicore, and distributed memory/GPGPU.
Tony
Hey, Science and Technology Facilities Council, UK
The Revolution in Experimental
and Observational Science: The Convergence of Data-Intensive and
Compute-Intensive Infrastructure
The revolution in Experimental and Observational
Science (EOS) is being driven by the new generation of facilities and
instruments, and by dramatic advances in detector technology. In addition, the
experiments now being performed at large-scale facilities, such as the Diamond
Light Source in the UK and Argonne Advanced Photon Source in the US, are
becoming increasingly complex, often requiring advanced computational modelling
to interpret the results. There is also an increasing requirement for the
facilities to provide near real-time feedback on the progress of an experiment
as the data is being collected. A final complexity comes from the need to
understand multi-modal data which combines data from several different
experiments on the same instrument or data from several different instruments.
All of these trends are requiring a closer coupling between data and compute
resources.
Minh Quan HO, Laboratoire Informatique de Grenoble
(LIG)
LBM 3D stencil
memory bound improvement with many-core processors asynchronous transfers
State-of-the-art academic and industrial many-core
processors present an alternative to mainstream CPU and GPU processors. In
particular, the 93-Petaflops Sunway supercomputer, built from NoC-based
many-core processors, has opened a new era for high performance computing that
does not rely on GPU acceleration. However, memory bandwidth remains the main
challenge for these architectures. This motivates our approach for optimizing
3D Lattice Boltzmann Method (LBM) applications, one of the most data-intensive
kinds of stencil computations, on many-core processors by using local memory
and asynchronous software-prefetching. A representative 3D LBM solver is taken
as an example. We achieve a 33% performance gain on the Kalray MPPA-256
processor by actively prefetching data, compared to a "passive" programming
model (OpenCL). We also introduce two-wall, a new LBM propagation algorithm
which performs in-place lattice updates. This method cuts the memory
requirement in half, reduces bandwidth losses from copying halo cells, and offers
another 5% improvement, delivering an overall 38% performance gain.
Torsten Hoefler, ETH Zurich
Automatic GPU
compilation and why you want to run MPI on your GPU
Auto-parallelization
of programs that have not been developed with parallelism in mind is one of the
holy grails in computer science. It
requires understanding the source code's data flow to automatically distribute
the data, parallelize the computations, and infer synchronizations where
necessary. We will discuss our new LLVM-based research compiler Polly-ACC that
enables automatic compilation to accelerator devices such as GPUs.
Unfortunately, its applicability is limited to codes for which the iteration
space and all accesses can be described as affine functions. In the second part
of the talk, we will discuss dCUDA, a way to express parallel codes in MPI-RMA,
a well-known communication library, to map them automatically to GPU clusters.
The dCUDA approach enables simple and portable programming across heterogeneous
devices due to programmer-specified locality. Furthermore, dCUDA enables
hardware-supported overlap of computation and communication and is applicable
to next-generation technologies such as NVLINK. We will demonstrate encouraging
initial results and show limitations of current devices in order to start a
discussion.
Jeff
Hollingsworth, U Maryland
Handling Phase Behavior of Parallel Programs
The execution of
high-performance computing applications can often be classified into phases with
distinct behavior. These periods of time are relatively short and create
problems for many tools. One type of tool that can be confused by short
phases is the auto-tuner. In particular, optimal parameters for one phase may not
be optimal for another. I will describe how run-time phase annotations
can be combined with an online auto-tuner like Active Harmony. This
allows an auto-tuner to reassemble the disjoint (and possibly interleaved)
execution phases of an application, and optimize each as if they were whole
uninterrupted tuning targets. I will present some results of phase analysis and
talk about future challenges as auto-tuning is scaled up to larger systems.
Emmanuel Jeannot, INRIA Bordeaux
Towards
System-Scale Optimization of HPC Applications
TADaaM
(Topology Aware DAtA Management) is a new Inria project team targeting the
optimization of HPC applications taking into account the topology, the
affinity, the memory hierarchy, the network contention, the input data and
other factors impacting performance. In this talk, we will give an overview of
the problems we want to address, examples of concrete research issues we are
looking at, and the set of existing software and results that will form the
basis of this project for the coming years. Comments and feedback about this
project are welcome, as are collaborations and use cases.
Carl Kesselman,
ISI
Scientific Data Asset Management
In his seminal
1960 paper "Man-Computer Symbiosis," J. C. R. Licklider observed:
"my choices of what to attempt and what not to attempt [are] determined to an
embarrassingly great extent by considerations of clerical feasibility, not
intellectual capability." Today, with the advances in data-driven discovery
enabled by big data, the internet of things, and the associated data deluge,
the situation is, if anything, worse rather than better over fifty years later. One
could argue that the dismal rates of scientific repeatability are attributable
at least in part to this situation. Issues associated with the costs and
complexity of data management are rife across all scientific disciplines, yet
surprisingly, almost no tools exist to address the problem. In my talk, I will
introduce the idea of scientific data asset management as a missing element of
the scientific software ecosystem and illustrate how common concepts and tools
can be developed that can be applied across many diverse use cases to
significantly streamline the process of data driven discovery.
Laurent Lefevre, Inria
GreenFactory:
orchestrating power capabilities and leverages at large scale for energy
efficient infrastructures
With
hardware improvements, several levers for controlling power and energy are
now available (shutdown, slowdown, ...). Dealing with such capabilities at large
scale remains a real challenge, as some of them can impact performance or
cooling, or pursue contradictory objectives. We will present several
models of such levers and show how orchestrating these capabilities can
improve the energy efficiency of large-scale infrastructures and applications.
Rusty Lusk,
Argonne National Laboratory
From Automated Theorem Proving to Nuclear
Structure Analysis with Self-Scheduled Task Parallelism
Long ago, we tried,
with some success, to parallelize the automated theorem proving system we were
working on at the time. Theorem proving is a challenging application for
parallelism because of its unpredictable paths forward and irregular subtask
sizes. The technique we used then, before there was such a thing as a
taxonomy of parallel programming models, is now perhaps worthy of a name; we
propose "self-scheduled task parallelism," which is different from many current
"task parallel" models and systems. In this talk I will introduce the
central idea in the context of automated theorem proving and then show how it
has been modernized and adapted for the exascale age, with particular
application to large-scale Monte Carlo calculations of nuclear structure.
Doug Miles,
PGI/Nvidia
On the Role of Compiler Directives in
High-Performance Computing
Compiler
directives were originally conceived and used as hints to enable better
optimization and more efficient code generation. The advent of
directive-based parallelization extended the use of directives into the realm
of de facto language extensions. What role should directives play in HPC
programming, and how should they interact with and impact parallel constructs
in current and future standardized programming languages? This brief talk
will explore the issue and provide a few thoughts on a logical path forward for compiler
directives in HPC.
Bernd Mohr,
Juelich Supercomputing Centre, Germany
POP -- Parallel Performance Analysis and
Tuning as a Service
Developers of HPC applications can now count on free advice from
European experts to analyse the performance of their scientific codes. The
Performance Optimization and Productivity (POP) Centre of Excellence, funded by
the European Commission under H2020, started operating at the end of 2015. The
POP Centre of Excellence gathers together experts from BSC, JSC, HLRS, RWTH
Aachen University, NAG and Ter@tec. The objective of POP is to provide
performance measurement and analysis services to the industrial and academic
HPC community, help them to better understand the performance behaviour of
their codes and suggest improvements to increase their efficiency. Training and
user education regarding application tuning is also provided. Further
information can be found at http://www.pop-coe.eu/. The talk will give an overview of
the POP Centre of Excellence and describe the common performance assessment
strategy and metrics developed and defined by the project partners. The
presentation will close with some success stories and reports from performance
assessments already performed in the first year of operation by POP personnel.
Frank Mueller, NC State University
Mini-Ckpts:
Surviving OS Failures in Persistent Memory
Current resilience efforts in HPC have focused on
application fault-tolerance rather than the operating system (OS), despite the
fact that recent studies have suggested that failures in OS memory may be more
likely. The OS is critical to the correct and efficient operation of the
node and the processes it governs, and the tightly coupled communication of HPC
applications means any single node failure generally forces all processes of the
application to terminate. Therefore, in a robust system the OS itself must be
capable of tolerating failures.
We contribute mini-ckpts, a framework which enables
application survival despite the occurrence of a fatal OS failure or crash.
Mini-ckpts achieves this tolerance by ensuring that the critical data
describing a process is preserved in persistent memory prior to the failure.
Following the failure, the OS is rejuvenated via a warm reboot and the
application continues execution, effectively making the failure and restart
transparent. The mini-ckpts rejuvenation and recovery process is measured to
take between three and six seconds and has a failure-free overhead of
3-5% for a number of key HPC workloads. In contrast to current fault-tolerance
methods, this work ensures that the operating and runtime systems can continue
in the presence of faults. This is a much finer-grained and dynamic method of
fault-tolerance than the current coarse-grained application-centric methods.
Handling faults at this level has the potential to greatly reduce overheads and
enables mitigation of additional faults.
Dimitrios
Nikolopoulos, Queen's University Belfast
Computational Significance and its
Implications for HPC
This talk explores the scope for relaxing the accuracy of computation
and storage in HPC systems and applications by leveraging abstractions and
metrics of computational significance. While this effort is largely motivated
by a desire to reduce the energy footprint of HPC systems, we explore broader
implications of the approach on resilience, performance, and the design of the
system software stack.
Christian
Obrecht, National Institute of Applied Sciences in Lyon
On a novel method for high performance
computational fluid dynamics
In engineering
applications of computational fluid dynamics, using unstructured meshes has been
the obvious choice for decades. However, building an appropriate unstructured
mesh is often a time-consuming task. In recent years, much attention has been
drawn to alternative methods operating on regular Cartesian meshes. Besides
trivial meshing, this kind of approach is usually well-suited to massively
parallel processors. The lattice Boltzmann method (LBM) is the most popular of
these alternatives. However, it has the disadvantage of involving significantly
more data per fluid cell than classic Navier-Stokes solvers, which impinges upon
performance in a memory-bound context.
In this
contribution, we will introduce the link-wise artificial compressibility method
(LW-ACM), a recently proposed approach which combines the advantages of the LBM
with the lower memory requirements of classic Navier-Stokes solvers. For
three-dimensional simulations, memory consumption is reduced by a factor of 5
and performance on GPU increases by a factor of 2, with respect to LBM. Several
implementations of the LW-ACM, using either CUDA or OpenCL, will be presented.
Performance and optimisation issues on both CPU and GPU will be discussed as
well.
Manish Parashar,
Rutgers University
Experiments with Software-Defined
Environments for Science
Software-defined
platforms, such as those enabled by Cloud services, provide new levels of
flexibility, which combined with autonomic capabilities can lead to very
dynamic infrastructures that can adapt themselves to application and user
needs. Such platforms can enable new formulations in science and engineering by
opportunistically leveraging heterogeneous and loosely connected data and
computing resources. In this talk I will explore how elastic software-defined
execution based on autonomic federation of resources and management of applications
can support such dynamic and data-driven workflows. I will also explore how
such abstractions can potentially lead to new paradigms and practices in
science and engineering. This talk is based on research that is part of the
CometCloud project at the Cloud and Autonomic Computing Center at Rutgers and
at the Rutgers Discovery Informatics Institute.
Padma Raghavan,
Vanderbilt University
Synchronization, Load-Balancing and
Redundant Calculations: Finding the Sweet Spot of High Performance
Computing
Parallel processing at scale as well as workflows that operate on
"big data" are becoming increasingly pervasive. This is
changing the space of trade-offs between synchronization levels, load-balancing
and redundant calculations to achieve high performance. We will explore this
matter with some limit studies that illustrate the value of finding the
sweet spots, in order to start a discussion on how these trade-offs could inform the
development of algorithms, programming models, and software environments.
Guther Roeth, NVIDIA
Deep Learning – Impact on Modern Life
An introduction to the new computing paradigm that is deep learning, a
subset of machine learning: "teaching computers to think!"
Accelerated by NVIDIA hardware and software, whether your application involves
natural language processing, robotics, bioinformatics, speech, video, search
engines, online advertising or finance, this overview will highlight the
incredible ability of the algorithms currently helping us address some of our
grandest challenges.
Rob Ross, ANL
From File
Systems to Services: Changing the Data Management Model in HPC
HPC applications are composed from software
components that provide only the communication, concurrency, and
synchronization needed for the task at hand. In contrast, parallel file systems
are kernel-resident, fully consistent services with semantic obligations
developed on single-core machines 50 years ago; parallel file systems are
old-fashioned system services forced to scale as fast as the HPC system. Rather
than the monolithic storage services seen today, we envision an ecosystem of
services being composed to meet the specific needs of science activities at
extreme scale. In fact, a nascent ecosystem of services is present today. In
this talk we will discuss drivers leading to this development, some examples in
existence today, and work we are undertaking to accelerate the rate at which
these services are developed and mature to meet application needs.
Joel Saltz, SUNY Stony Brook
Convergence of Data and Computation:
Integration of Sensors and Simulation
A great variety of
biomedical and physical application areas involve the need to make predictions
that require 1) multi-scale sensor data, 2) computations aimed at
creating quantitative multi-scale characterizations of
material/chemical/biological properties and 3) use of
material/chemical/biological characterizations to make predictions.
I will describe this paradigm and discuss ways in which this plays out in the
rapidly emerging area of exascale Cancer research. I will also describe
ideas for and prototypes of applicable tools and methods.
Vaidy Sunderam,
Emory University
Cost and Utility Tradeoffs on IaaS Clouds,
Grids, and On-Premise Resources
Cloud computing
is now a mainstream technology in many application domains and user
constituencies. For scientific high-performance applications in academic and
research settings, however,
the trade-offs between cost and elasticity on the one hand, and
performance and access on the other, are not always clear. We discuss our
experiences with comparing cost and utility for a class of numerical codes on
three typical platform-types available to researchers: IaaS clouds, grids, and
on-premise local resources. To rank the tested platforms, we introduce a simple
utility function describing the value of a completed computational task to the
user as a function of the wait time and the cost of the computation. Our
results suggest that each platform has situational value, providing tradeoffs
between cost and turnaround.
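The concrete utility function from the talk is not reproduced here; as a purely hypothetical sketch of the idea, one might model the value of a completed task as decaying with wait time, with the monetary cost subtracted, and rank platforms accordingly (the functional form, parameters, and platform numbers below are all illustrative assumptions):

```python
import math

def utility(wait_s, cost_usd, base_value=100.0, half_life_s=3600.0):
    """Hypothetical utility of a completed task: its value to the user
    decays exponentially with wait time, and the monetary cost of the
    computation is subtracted. Form and parameters are illustrative."""
    return base_value * math.exp(-math.log(2) * wait_s / half_life_s) - cost_usd

# Hypothetical (wait seconds, cost dollars) per platform type.
platforms = {
    "iaas_cloud": (600, 2.0),    # fast turnaround, pay per use
    "grid":       (7200, 0.0),   # free but long queue wait
    "on_premise": (1800, 0.5),   # moderate wait, amortized cost
}
ranked = sorted(platforms, key=lambda p: utility(*platforms[p]), reverse=True)
```

With these made-up numbers the cloud wins on turnaround despite its cost, while the free grid loses most of its value to queue wait, illustrating how such a function makes the cost/turnaround trade-off explicit.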
Martin Swany, Indiana University
Offloading
Collective Operations to Programmable Logic
This talk describes our architecture and
implementation for offloading collective operations to programmable logic in
the communication substrate. Collective operations -- operations that
involve communication between groups of cooperating processes -- are widely
used in parallel processing. The design and implementation strategies of
collective operations play a significant role in their performance and thus affect
the performance of the many high performance computing applications that utilize
them. The programmable logic provided by FPGAs is a powerful option for
creating task-specific logic to aid applications.
Leveraging FPGAs to improve collective operation performance stands to
offer significant capability and performance benefits.
Michela Taufer, University of Delaware
In Situ Data
Analysis of Protein Trajectories
The transition towards exascale computing will be
accompanied by a performance dichotomy. Computational peak performance will
rapidly increase; I/O performance will either grow slowly or be completely
stagnant. Essentially, the rate at which data are generated will grow much
faster than the rate at which data can be read from and written to the disk. Molecular
Dynamics (MD) simulations will soon face the I/O problem of efficiently writing
to and reading from disk on the next generation of supercomputers.
This talk targets MD simulations at the exascale
and proposes a novel technique for in situ data analysis of MD trajectories.
Our technique maps individual trajectories' substructures (i.e., alpha-helices
and beta-strands) to metadata frame by frame. The metadata captures the
conformational properties of the substructures. The ensemble of metadata can be
used for automatic, strategic analysis within a trajectory or across
trajectories, without manually identifying those portions of trajectories in which
critical changes take place. We demonstrate our technique's effectiveness by
applying it to 26.3k helices and 31.2k strands from 9,917 PDB proteins and by
providing three empirical case studies.
Samuel Thibault, Université Bordeaux, INRIA
Task-graph-based
applications, from theory to exascale?
Expressing parallelism through task graphs (DAG),
well studied theoretically for the past decades, has recently gained a lot of
attention in HPC: a flurry of task-based runtime systems have appeared, and
task graphs have been standardized in OpenMP. Will they be the holy grail of
parallelism, making it easy to scale to exascale? The StarPU runtime system has
previously proved able to seamlessly exploit heterogeneous systems thanks
to task graphs and state-of-the-art scheduling heuristics. We present here our
work on HPC over clusters, which shows promising results with very reasonable
effort from the application programmer, both on real platforms and through
task-accurate simulation.
Jeff Vetter, ORNL
Performance
Portability for Extreme Scale High Performance Computing
Concerns about energy-efficiency and reliability have forced our community to reexamine the full spectrum of architectures, software, and algorithms that constitute the HPC ecosystem. While architectures have remained relatively stable for almost two decades, new architectural features, such as heterogeneous processing, non-volatile memory, and optical interconnection networks, have emerged as possible solutions to these constraints. In turn, these architectural changes will force the community to redesign software systems and applications to exploit these new capabilities. However, these dramatic architectural changes are leading to the new challenge of performance portability, where very few applications can make productive use of these very complex systems. In fact, we believe this is the most critical challenge facing HPC. To this end, our group has designed a number of novel methods and tools to help scientists predict and program these increasingly complex systems. In this talk, I will describe a few of these efforts, with a specific focus on emerging non-volatile memory systems and performance prediction.
Frédéric Vivien, INRIA
Multi-level
checkpointing and silent error detection
In this talk we will survey a set of recent results in fault-tolerance for HPC applications. Multi-level checkpointing makes it possible to take advantage of different trade-offs between checkpointing cost and fault resilience. Silent data corruptions (SDCs) pose a potentially important threat to next-generation systems, and several mechanisms to detect them have been proposed. We will see how one can design an efficient multi-level checkpointing protocol, how to choose which SDC detectors to use and when to look for data corruptions, and how to do everything at the same time (or almost).
Morton Ordering of 2D
Arrays for Parallelism and Efficient Access to Hierarchical Memory
This talk describes the
recursive Morton ordering that supports efficient access to hierarchical memory
across a range of heterogeneous computer platforms, from many-core
devices and multi-core processors to clusters and distributed environments.
Programmer-level control of the memory hierarchy is also considered. A brief
overview of previous research in this area is given, and algorithms that make
use of recursive blocking are described. These are then used to demonstrate the
efficiency of the Morton ordering approach by performance experiments on
different processors. In particular, timing results are presented for matrix
multiplication, Cholesky factorisation, and fast Fourier transform algorithms.
The use of the Morton ordering approach leads naturally to algorithms that are
recursive, and exposes parallelism at each level of recursion. Thus, the
approach advocated in this talk not only provides convenient and efficient access
to hierarchical memory, but also provides a basis for exploiting parallelism.
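The 2D Morton (Z-order) index described above is constructed by interleaving the bits of the row and column indices; the short sketch below is a generic illustration of that construction (the function name and bit width are assumptions, not code from the talk):

```python
def morton_index(row, col, bits=16):
    """Morton (Z-order) index of a 2D array element: interleave the
    bits of the row and column indices so that recursively blocked
    sub-arrays occupy contiguous index ranges."""
    z = 0
    for i in range(bits):
        z |= ((col >> i) & 1) << (2 * i)        # column bits -> even positions
        z |= ((row >> i) & 1) << (2 * i + 1)    # row bits -> odd positions
    return z

# The top-left 2x2 block maps to indices 0..3, each larger power-of-two
# block is likewise contiguous at its own level of the recursion.
order = [morton_index(r, c) for r in range(2) for c in range(2)]
```

Storing a matrix in this order makes every power-of-two block contiguous in memory at every level of recursion, which is what gives the recursive matrix multiplication and Cholesky algorithms discussed in the talk their locality in a hierarchical memory.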
Ilkay Altintas is the chief data science
officer at the San Diego Supercomputer Center (SDSC), UC San Diego, where she
is also the founder and director for the Workflows for Data Science Center of
Excellence. Since joining SDSC in 2001, she has worked on different aspects of
dataflow-based computing and workflows as a principal investigator and in other
leadership roles across a wide range of cross-disciplinary NSF, DOE, NIH, and
Moore Foundation projects. Ilkay is a co-initiator of and an active contributor
to the popular open-source Kepler Scientific Workflow System and the co-author
of publications related to computational data science and e-Sciences at the
intersection of scientific workflows, provenance, distributed computing,
bioinformatics, observatory systems, conceptual data querying, and software
modeling.
Rosa M. Badia holds a PhD on Computer
Science (1994) from the Technical University of Catalonia (UPC). She is the
manager of the Workflows and Distributed Computing research group at the
Barcelona Supercomputing Center (BSC). She is also a Scientific Researcher from
the Consejo Superior de Investigaciones Cientificas (CSIC). She was involved in teaching and
research activities at the UPC from 1989 to 2008, where she was an Associate
Professor from 1997. From 1999 to 2005 she was involved in research and
development activities at the European Center of Parallelism of Barcelona (CEPBA).
Her current research interests are programming models for complex platforms
(from multicore and GPUs to Grid/Cloud).
The group led by Dr. Badia has been developing the StarSs programming model
for more than 10 years, with great success in adoption by application
developers. Currently the group focuses its efforts on two instances of StarSs:
OmpSs for heterogeneous platforms and COMPSs/PyCOMPSs for distributed computing
including the Cloud. Dr. Badia has
published more than 150 papers in
international conferences and journals on the topics of her research.
She is currently participating in the following European-funded projects:
ASCETIC, Euroserver, The Human Brain Project, EU-Brazil CloudConnect, the
BioExcel CoE, NEXTGenIO, MUG, EUBra BIGSEA, and TANGO, and she is a member of the HiPEAC2
NoE.
Anne Benoit received the PhD degree from
Institut National Polytechnique de Grenoble in 2003, and the Habilitation à
Diriger des Recherches (HDR) from École Normale Supérieure de Lyon (ENS Lyon)
in 2009. She is currently an associate professor in the Computer Science
Laboratory LIP at ENS Lyon, France. She is the author of 37 papers published in
international journals, and 77 papers published in international conferences.
She is the advisor of 7 PhD theses. Her research interests include algorithm
design and scheduling techniques for parallel and distributed platforms, and
also the performance evaluation of parallel systems and applications, with a
focus on energy awareness and resilience. She is Associate Editor of IEEE TPDS,
JPDC, and SUSCOM. She is the
program chair of several workshops and conferences; in particular she is the
program chair for HiPC'2016, and the program chair for the Algorithms track of
SC'2016. She is a senior member of
the IEEE, and she was elected a Junior Member of the Institut Universitaire de
France in 2009.
Francine Berman is the Edward P. Hamilton Distinguished
Professor in Computer Science at Rensselaer Polytechnic Institute. She
currently serves as U.S. Chair of the Research Data Alliance (RDA) and Co-Chair
of RDA's international leadership Council. Previously, she served as the
Vice President for Research at RPI, and the High Performance Computing Endowed
Chair and Director of the San Diego Supercomputer Center at UC San Diego.
Berman currently serves as Chair of the Anita Borg Institute Board of Trustees,
as co-Chair of the National Science Foundation CISE Advisory Committee, and as
a member of the Board of Trustees of the Sloan Foundation. Her research
interests include data cyberinfrastructure and the policy and practice of
digital stewardship and preservation. In 2009, Dr. Berman was the
inaugural recipient of the ACM/IEEE-CS Ken Kennedy Award for "influential
leadership in the design, development, and deployment of national-scale
cyberinfrastructure" and in 2015, Berman was nominated by President Obama
and confirmed by the U.S. Senate to become a member of the National Council on
the Humanities.
Bill Brantley is a Fellow Design Engineer in
the Research Division of Advanced Micro Devices leading parts of FastForward
and DesignForward research contracts as well as other efforts. Prior to
AMD he was at IBM T.J. Watson Research Center where he was one of the
architects and implementers of the 64 CPU RP3 (a DARPA supported HPC system
development in the mid-80s) including a hardware performance monitor. In
IBM Austin he held a number of roles including the analysis of server
performance in the Linux Technology Center. Prior to joining IBM, he
completed his Ph.D. at Carnegie Mellon University in ECE after work for 3 years
at Los Alamos National Laboratory.
Ron Brightwell currently manages the Scalable
System Software Department at Sandia National Laboratories. He joined Sandia in
1995 after receiving his BS in mathematics and his MS in computer science from
Mississippi State University. While at Sandia, he has designed and developed
software for lightweight compute node operating systems and high-performance
networks on several large-scale massively parallel systems, including the Intel
Paragon and TeraFLOPS, and the Cray T3 and XT series of machines. He has
authored more than 100 peer-reviewed journal, conference, and workshop
publications. He is a Senior Member of the IEEE and the ACM.
Franck Cappello leads the resilience
group at ANL, developing research on fault modeling and tolerance. In the
past three years he has focused particularly on silent data corruption
detection and checkpoint/restart environments. He recently started an effort on lossy
compression of floating-point data, in particular for reducing checkpoint size.
Alok Choudhary is the Henry & Isabelle
Dever Professor of Electrical Engineering and Computer Science and a professor
at the Kellogg School of Management. He is also the founder, chairman, and chief
scientist (and served as CEO during 2011-2013) of 4C Insights (formerly Voxsup
Inc.), a big data analytics and social media marketing company. He received
the National Science Foundation's Young Investigator Award in 1993. He is a
fellow of the IEEE, ACM, and AAAS. His research interests are in high-performance
computing, data-intensive computing, scalable data mining, computer
architecture, high-performance I/O systems and software, and their applications in
science, medicine, and business. Alok Choudhary has published more than 400
papers in various journals and conferences and has graduated 33 PhD students.
Techniques developed by his group can be found on every modern processor, and
scalable software developed by his group can be found on many supercomputers.
His work and interviews have appeared in many traditional media outlets,
including the New York Times, Chicago Tribune, The Telegraph, ABC, PBS, NPR,
AdExchange, Business Daily, and many international media outlets all over the
world.
Joe Curley
serves Intel® Corporation as Senior Director, HPC Platform and Ecosystem
Enablement in the High Performance Computing Platform Group (HPG). His primary
responsibilities include supporting global ecosystem partners to develop their
own powerful and energy-efficient HPC computing solutions utilizing Intel
hardware and software products. Mr. Curley joined Intel Corporation in 2007,
and has served in multiple other planning and business leadership roles.
Prior to joining Intel, Joe worked at Dell, Inc., leading the global workstation product line and consumer and small business desktops, and serving in a series of engineering roles. He began his career at computer graphics pioneer Tseng Labs.
Anthony Danalis is currently a Research Scientist II with
the Innovative Computing Laboratory at the University of Tennessee, Knoxville.
His research interests lie in the areas of High Performance Computing and
Performance Analysis. Recently, his work has focused on
Compiler Analysis and Optimization, System Benchmarking, MPI, and Accelerators.
He received his Ph.D. in Computer Science from the
University of Delaware on Compiler Optimizations for HPC. Previously, he
received an M.Sc. from the University of Delaware and an M.Sc. from the University
of Crete, both on Computer Networks, and a B.Sc. in Physics from the University
of Crete.
Ewa Deelman is a Research Professor at the
USC Computer Science Department and a Research Director at the USC Information
Sciences Institute (ISI). Dr. Deelman's research interests include the design
and exploration of collaborative, distributed scientific environments, with
particular emphasis on automation of scientific workflow and management of
computing resources, as well as the management of scientific data. In 1997 Dr.
Deelman received her PhD in Computer Science from the Rensselaer Polytechnic
Institute.
Frédéric Desprez is a Chief Senior Research Scientist at
Inria (Corse team, Grenoble). He received his PhD in C.S. from the Institut National
Polytechnique de Grenoble, France, in 1994 and his MS in C.S. from ENS Lyon in
1990. At Inria, he holds the position of Deputy Scientific Director in charge of
High Performance Computing, Distributed Systems, networks, and software
engineering. In 2008, he received an IBM Faculty Award for his work on data
distribution and scheduling for grid and Cloud platforms.
Frederic's
current activities include parallel algorithms, scheduling for large scale
distributed platforms (clusters, grids, and Clouds), data management, and grid
and cloud computing. He leads the Grid'5000 project, which offers a platform to
evaluate large scale algorithms, applications, and middleware systems.
See http://graal.ens-lyon.fr/~desprez/ for further information.
Patrick Demichel has worked for HPE for 36
years on computer technologies, with a focus on scientific domains. He is now a
Distinguished Technologist in EMEA for HPC, Big Data, and IoT, helping with the
development and integration of IT innovations for extreme-scale systems. He works on
The Machine program with HPE Laboratories, Moonshot, and other emerging
technologies. His past experience includes Itanium development in the USA.
Jack Dongarra holds an appointment at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. He was awarded the IEEE Sid Fernbach Award in 2004; in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing; in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing's award for Career Achievement; in 2011 he was the recipient of the IEEE IPDPS Charles Babbage Award; and in 2013 he received the ACM/IEEE Ken Kennedy Award. He is a Fellow of the AAAS, ACM, IEEE, and SIAM and a member of the National Academy of Engineering.
Ian Foster is a Professor of Computer Science at the
University of Chicago, a Distinguished Fellow at Argonne National Laboratory,
and Director of the Computation Institute. He is also a fellow of the American
Association for the Advancement of Science, the Association for Computing
Machinery, and the British Computer Society. His awards include the British
Computer Society's Lovelace Medal, honorary doctorates from the University of
Canterbury, New Zealand, and CINVESTAV, Mexico, and the IEEE Tsutomu Kanai
award.
Geoffrey Fox
received a Ph.D. in Theoretical Physics from Cambridge University and is now a Distinguished Professor of
Computing, Engineering and Physics at Indiana University, where he is director of the Digital Science
Center and Chair of the Department of Intelligent Systems Engineering at the
School of Informatics and Computing.
He previously held positions at Caltech, Syracuse University, and Florida
State University after being a postdoc at the Institute for Advanced Study at
Princeton, Lawrence Berkeley Laboratory, and Peterhouse, Cambridge. He
has supervised the PhDs of 67 students and published around 1200 papers in
physics and computer science, with an h-index of 72 and over 28000 citations.
He currently works in applying computer science from infrastructure to analytics in Biology, Pathology, Sensor Clouds, Earthquake and Ice-sheet Science, Image processing, Deep Learning, Network Science, Financial Systems and Particle Physics. The infrastructure work is built around Software Defined Systems on Clouds and Clusters. The analytics focuses on scalable parallelism. He is involved in several projects to enhance the capabilities of Minority Serving Institutions. He has experience in online education and its use in MOOCs for areas like Data and Computational Science. He is a Fellow of APS (Physics) and ACM (Computing).
Haohuan Fu is an associate professor in the Ministry of Education Key Laboratory
for Earth System Modeling and the Center for Earth System Science at Tsinghua
University. He is also the deputy director of the National Supercomputing
Center in Wuxi. His research interests include design methodologies for highly
efficient and highly scalable simulation applications that can take advantage
of emerging multi-core, many-core, and reconfigurable architectures and make
full use of current peta-flops and future exa-flops supercomputers; and
intelligent data management, analysis, and data mining platforms that combine
statistical methods and machine learning technologies. Fu has a PhD in
computing from Imperial College London. He is a member of the IEEE.
Al Geist is a Corporate Research
Fellow at Oak Ridge National Laboratory. He is on the Leadership Team of the
U.S. Exascale Computing Project and wrote most of the planning documents. He is the Chief Technology Officer of
ORNL's Leadership Computing Facility and Chief Scientist for the Computer
Science and Mathematics Division. His recent research is on Exascale computing
and resilience needs of the hardware and software. He leads the U.S. Department
of Energy technical Council on Resilience.
Laura Grigori is a senior research
scientist at INRIA in France, where she leads the Alpines group, a joint group
between INRIA and the J.L. Lions Laboratory, UPMC, in Paris. Her field of expertise is high-performance
scientific computing and numerical linear algebra. In recent years she has co-developed
communication-avoiding algorithms.
She has given several invited plenary talks on this topic, including at
the SIAM Conference on Parallel Processing 2012 and the IEEE/ACM Supercomputing 2015
Conference. With her co-authors, she received the first SIAM SIAG on
Supercomputing Best Paper Prize in 2016.
Andrew Grimshaw received his Ph.D. from the University of
Illinois at Urbana-Champaign in 1988. He joined the University of Virginia as an
Assistant Professor of Computer Science, becoming Associate Professor in 1994
and Professor in 1999. He is the chief designer and architect of Mentat,
Legion, Genesis II, and the co-architect for XSEDE. In 1999 he co-founded Avaki
Corporation, and served as its Chairman and Chief Technical Officer until 2003.
In 2003 he won the Frost and Sullivan Technology Innovation Award. In 2008 he
became the founding director of the University of Virginia Alliance for
Computational Science and Engineering (UVACSE). The mission of UVACSE is to
change the culture of computation at the University of Virginia and to
accelerate computationally oriented research.
Andrew is the chairman of the Open Grid Forum (OGF), having served both
as a member of the OGF's Board of Directors and as Architecture Area Director.
Andrew is the author or co-author of over 100 publications and book
chapters. His current projects are IT, Genesis II, and XSEDE. IT is a next
generation portable parallel language based on the PCubeS type architecture.
Genesis II is an open-source, standards-based Grid system that focuses on
making Grids easy to use and accessible to non-computer-scientists. XSEDE
(eXtreme Science and Engineering Discovery Environment) is the NSF follow-on to
the TeraGrid project.
William Gropp
Acting Director and Chief Scientist, NCSA
Director, Parallel Computing Institute
Thomas M. Siebel Chair in Computer Science
University of Illinois Urbana-Champaign
Tony Hey began his career as a
theoretical physicist with a doctorate in particle physics from the University
of Oxford in the UK. After a career in physics that included research positions
at Caltech and CERN, and a professorship at the University of Southampton in
England, he became interested in parallel computing and moved into computer
science. In the 1980Õs he was one of the pioneers of distributed memory
message-passing computing and co-wrote the first draft of the successful MPI
message-passing standard.
After being both Head of Department and Dean of Engineering at Southampton, Tony Hey escaped to lead the U.K.'s ground-breaking 'eScience' initiative in 2001. He recognized the importance of Big Data for science and wrote one of the first papers on the 'Data Deluge' in 2003. He joined Microsoft in 2005 as a Vice President and was responsible for Microsoft's global university research engagements. He worked with Jim Gray and his multidisciplinary eScience research group and edited a tribute to Jim called 'The Fourth Paradigm: Data-Intensive Scientific Discovery.' Hey left Microsoft in 2014 and spent a year as a Senior Data Science Fellow at the eScience Institute at the University of Washington. He returned to the UK in November 2015 and is now Chief Data Scientist at the Science and Technology Facilities Council.
In 1987 Tony Hey was asked by Caltech Nobel physicist Richard Feynman to write up his 'Lectures on Computation'. This covered such unconventional topics as the thermodynamics of computing as well as an outline for a quantum computer. Feynman's introduction to the workings of a computer in terms of the actions of a 'dumb file clerk' was the inspiration for his new book 'The Computing Universe', his attempt to write a popular book about computer science.
Tony Hey is a fellow of the AAAS and of the UK's Royal Academy of Engineering. In 2005, he was awarded a CBE by Prince Charles for his 'services to science.'
Minh Quan Ho is currently a PhD student at
the Université Grenoble Alpes. His
main research topics are optimizing 3D stencil codes and dense linear
algebra libraries on many-core processors (Kalray MPPA). Prior to that, Minh Quan did his
graduate internship porting the HPL benchmark to the MPPA processor by
implementing a lightweight subset of MPI tuned for the MPPA. He also participated in parallelizing
PSi, a Markov-systems simulator project developed by the Grenoble Informatics
Laboratory (LIG) and INRIA. Minh
Quan received his master's degree from the École Polytechnique de Grenoble.
Torsten Hoefler is an Assistant Professor of
Computer Science at ETH Zürich, Switzerland. Before joining ETH, he led the
performance modeling and simulation efforts for parallel petascale applications
in the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message
Passing Interface (MPI) Forum, where he chairs the "Collective Operations
and Topologies" working group.
Torsten won best paper awards at the ACM/IEEE Supercomputing Conference
SC10, SC13, SC14, EuroMPI'13, HPDC'15, HPDC'16, IPDPS'15, and other
conferences. He has published numerous
peer-reviewed scientific conference and journal articles and authored chapters
of the MPI-2.2 and MPI-3.0 standards. He received the Latsis Prize of ETH
Zurich as well as an ERC Starting Grant in 2015. His research interests revolve
around the central topic of "Performance-centric System Design" and
include scalable networks, parallel programming techniques, and performance
modeling. Additional information
about Torsten can be found on his homepage at htor.inf.ethz.ch.
Jeffrey K. Hollingsworth is a Professor in the Computer Science
Department at the University of Maryland, College Park. He also has
appointments in the University of Maryland Institute for Advanced Computer
Studies and the Electrical and Computer Engineering Department. He received his
PhD and MS degrees in computer sciences from the University of Wisconsin and a
B.S. in Electrical Engineering from the University of California at
Berkeley. Dr. Hollingsworth's research seeks to develop a unified
framework to understand the performance of large systems, focusing on
performance measurement and auto-tuning. He is Editor-in-Chief of the journal
Parallel Computing, was general chair of the SC12 conference, and is Chair of
ACM SIGHPC.
Emmanuel Jeannot is a Senior Research Scientist
at Inria and has been doing his research at Inria Bordeaux Sud-Ouest and the LaBRI
laboratory since 2009. Before that he held the same position at INRIA Nancy
Grand-Est. In 2006, he was a visiting researcher at the University of
Tennessee's ICL laboratory. From 1999 to 2005 he was an assistant professor at the
Université Henri Poincaré, Nancy 1, and from 2000 to 2009 he did
his research at the LORIA laboratory. He received his Master's and PhD degrees in
computer science (in 1996 and 1999, respectively) from the École Normale Supérieure de
Lyon, at the LIP laboratory. After his PhD he spent one year as a postdoc at
the LaBRI laboratory in Bordeaux. He is currently leading the TADaaM Inria
team. His main research interests lie in parallel and high-performance computing,
and more precisely: process placement, topology-aware algorithms, scheduling
for heterogeneous environments, data redistribution, algorithms and models for
parallel machines, distributed computing software, adaptive online compression,
and programming models.
Laurent Lefevre is a permanent researcher in
computer science at Inria (the French Institute for Research in Computer
Science and Control). He is a member of the Avalon team (Algorithms and
Software Architectures for Distributed and HPC Platforms) from the LIP
laboratory at the École Normale Supérieure de Lyon, France. He has organized several conferences in
high performance networking and computing
and he is a member of several program committees. He has co-authored
more than 100 papers published in refereed journals and conference proceedings.
His interests include: energy efficiency in large scale distributed systems,
high performance computing, distributed computing and networking, high
performance networks protocols and services.
See http://perso.ens-lyon.fr/laurent.lefevre for further information.
Ewing "Rusty" Lusk is currently Argonne Distinguished Fellow
Emeritus at Argonne National Laboratory. After obtaining his degree in
pure mathematics at the University of Maryland in 1970, he spent 12 years as
professor of mathematics and computer science at Northern Illinois University
before moving to Argonne in 1982. There he worked on automated reasoning,
logic programming, and parallel computing software. His primary contributions
have been in standardizing the message-passing model (MPI, with many others)
and implementing it (MPICH, with Bill Gropp and others). Most recently,
he has been an active member of the UNEDF and NUCLEI DOE SciDAC projects in
computational nuclear physics. He doesn't know any physics, but the
physicists don't know any computer science, so it works out.
Doug Miles is Director of PGI compilers & tools at NVIDIA. He has worked
in HPC for 30 years in math library development, benchmarking, programming
model development, technical marketing and SW engineering management at
Floating Point Systems, Cray Research Superservers, The Portland Group,
STMicroelectronics and NVIDIA.
Bernd Mohr began designing and developing tools for performance analysis of
parallel programs in 1987 at the University of Erlangen in Germany. During
a three-year postdoc position at the University of Oregon, he designed and
implemented the original TAU performance analysis framework. Since 1996 he has
been a senior scientist at Forschungszentrum Jülich, and since 2000 he has been the team
leader of the group "Programming Environments and Performance
Optimization". Besides being responsible for user support and training with
regard to performance tools at the Jülich Supercomputing Centre (JSC), he
leads the Scalasca and Score-P performance tools efforts in Jülich. Since
2007, he has also served as deputy head of the JSC division "Application
Support". He serves on the Steering Committee of the SC and ISC conference
series. He is the author of many conference and journal articles about
performance analysis and tuning of parallel programs.
Frank Mueller is a Professor in Computer
Science and a member of multiple research centers at North Carolina State
University. Previously, he held positions at Lawrence Livermore National
Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from
Florida State University in 1994.
He has published papers in the areas of parallel and distributed systems,
embedded and real-time systems and compilers. He is a member of ACM SIGPLAN, ACM
SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an
IEEE Fellow and an ACM Distinguished Scientist. He is a recipient of an NSF Career
Award, an IBM Faculty Award, a Google Research Award and two Fellowships from
the Humboldt Foundation.
Dimitrios Nikolopoulos is Professor and Head of the School of
Electronics, Electrical Engineering and Computer Science, and Acting Director
of the Centre for Data Science and Scalable Computing at Queen's University
Belfast. His current research focuses on low-latency analytics and new
computing paradigms that push the boundaries of performance and
energy efficiency. His accolades include a Royal Society Wolfson Research
Fellowship, NSF and DOE Career Awards, and an IBM Faculty Award. He directs a
research group of 30 staff and a research grant portfolio of approximately £30
million.
Christian Obrecht is an associate professor of Applied Physics
at the department of Civil Engineering and Urban Planning of the National
Institute of Applied Sciences in Lyon (INSA Lyon). Dr Obrecht first graduated
in Mathematics from UniversitŽ de Strasbourg in 1990. From 1993 to 2008, he
served as a teacher of mathematics in French secondary education. He obtained a
masterÕs degree in Computer Science from UniversitŽ Lyon 1 in 2009 and a
doctoral degree in Civil Engineering from INSA Lyon in 2012. He was appointed
associate professor in 2015 and joined both the Thermal Energy Storage and the
Building Physics research groups at the Centre of Energy and Thermal Sciences
of Lyon (CETHIL). His research work
focuses on innovative approaches in computational fluid dynamics suited to
massively parallel processors with applications to high performance simulations
of heat storage processes and of urban microclimatic conditions.
Manish Parashar is Distinguished Professor of Computer
Science at Rutgers University. He is also the founding Director of the Rutgers
Discovery Informatics Institute (RDI2). His research interests are in the broad
areas of Parallel and Distributed Computing and Computational and Data-Enabled
Science and Engineering. Manish serves on the editorial boards and organizing
committees of a large number of journals and international conferences and
workshops, and has deployed several software systems that are widely used.
He has also received a number of awards and is a Fellow of AAAS, a Fellow of
the IEEE/IEEE Computer Society, and an ACM Distinguished Scientist. For more
information please visit http://parashar.rutgers.edu/.
Padma Raghavan specializes in high-performance computing
and its applications with a particular focus on sparse graph and matrix
problems. Her contributions are in the areas of scalable parallel
computing; energy-aware supercomputing; and computational modeling, simulation
and knowledge extraction. Prior to joining Vanderbilt in Feb
2016, Raghavan served as the Associate Vice President for Research
and Director of Strategic Initiatives at Penn State, where she was also a
Distinguished Professor of Computer Science and Engineering and the founding
director of the university-wide Institute for CyberScience. Raghavan is
now a Professor of Computer Science and Computer Engineering and the Vice
Provost for Research at Vanderbilt University.
Gunter Roeth joined NVIDIA as a Solution Architect in
October last year, having previously worked at Cray, HP, Sun Microsystems, and
most recently BULL. He has a master's degree in geophysics from the Institut de Physique
du Globe (IPG) in Paris and completed a PhD in seismology on the use of
neural networks (artificial intelligence) for interpreting geophysical data.
Robert Ross is Interim Director of the
Mathematics and Computer Science (MCS) Division at Argonne National Laboratory.
Rob is a senior fellow in the Northwestern-Argonne Institute for Science and
Engineering and in the University of Chicago and Argonne Computation Institute.
He also serves as an adjunct assistant professor in the Department of Electrical
and Computer Engineering at Clemson University. His research interests include
design, implementation, and deployment of complex distributed systems and data
and communication system software for high-performance computing. Rob received
his Ph.D. in computer engineering from Clemson University in 2000. He currently
holds several leadership positions at Argonne and in the U.S. Department of
Energy (DOE) computing community, including serving as deputy director of the
Scientific Data Management, Analysis and Visualization Institute and as lead of
the Data Management and Workflow Software component of the DOE Office of
Science Exascale Computing activity.
Vaidy Sunderam is a faculty member at Emory University.
His research interests are in parallel and distributed systems,
infrastructures for collaborative computing, and data security. His prior and
recent research efforts have focused on system architectures and
implementations for heterogeneous metacomputing, collaborative resource
sharing, and data management systems. Vaidy teaches computer science at the
beginning, advanced, and graduate levels, and advises graduate theses in the
area of computer systems.
Martin Swany is Associate Chair and Professor in the
Intelligent Systems Engineering Department in the School of Informatics and
Computing at Indiana University, and the Deputy Director of the Center for
Research in Extreme Scale Technologies (CREST). His research interests
include high-performance parallel and distributed computing and networking.
Michela Taufer is an associate professor at the
University of Delaware. She earned her master's degree
in Computer Engineering from the University of Padova (Italy) and her doctoral
degree in Computer Science from the Swiss Federal Institute of Technology
(Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science
Training Program (LJIS) Postdoctoral Fellow at the University of California San
Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on
interdisciplinary projects in computer systems and computational chemistry.
From 2005 to 2007, she was an Assistant Professor at the Computer Science
Department of the University of Texas at El Paso (UTEP). She joined the
University of Delaware in 2007 as an Assistant Professor and was promoted to
Associate Professor with tenure in 2012.
Taufer's research interests include scientific applications and their advanced programmability in heterogeneous computing (i.e., multi-core and many-core platforms, GPUs); performance analysis, modeling, and optimization of multi-scale applications on heterogeneous computing, cloud computing, and volunteer computing; numerical reproducibility and stability of large-scale simulations on multi-core platforms; big data analytics and MapReduce.
Samuel Thibault has been an Assistant Professor at the University of Bordeaux since 2008 and is
part of the Inria STORM team. His research revolves around thread, task, and
data transfer scheduling in parallel and distributed runtime systems. He is
currently focused on the design of the StarPU runtime, and more particularly
its scheduling heuristics for heterogeneous architectures and for distributed
systems.
Bernard Tourancheau received an M.Sc. in Applied Mathematics from Grenoble University in 1986 and an M.Sc. in Renewable Energy Science and Technology from Loughborough University in 2007. He was awarded the best Computer Science PhD prize by the Institut National Polytechnique de Grenoble in 1989 for his work on parallel computing for distributed memory architectures.
He was appointed assistant professor at the École Normale Supérieure de Lyon LIP lab in 1989 before joining CNRS as a junior researcher. After initiating a CNRS-NSF collaboration, he worked on leave at the University of Tennessee in a senior researcher position with the US Center for Research in Parallel Computation at the ICL laboratory.
He then took a Professor position at University of Lyon in 1995 where he created a research laboratory and the INRIA RESO team, specialized in High Speed Networking and HPC.
In 2001, he joined Sun Microsystems Laboratories for a six-year sabbatical as a Principal Investigator in the DARPA HPCS project, where he led the backplane networking group.
Back in academia he oriented his research on wireless sensor networks for building energy efficiency at ENS LIP and INSA CITI labs.
He was appointed Professor at University Joseph Fourier of Grenoble in 2012. Since then, in the LIG lab Drakkar team, he has been developing research on protocols and architectures for the Internet of Things. He also pursues research on optimizing communication algorithms for HPC multicore and GPGPU systems. He is also a scientific promoter of renewable energy transition, relocalization, and low tech as answers to peak oil and global warming.
He has authored more than 140 peer-reviewed publications and filed 10 patents.
Jeffrey Vetter, Ph.D., is a Distinguished R&D Staff Member at Oak Ridge National Laboratory (ORNL). At ORNL, Vetter is the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division. Vetter also holds joint appointments at the Georgia Institute of Technology and the University of Tennessee-Knoxville. Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Vetter is a Senior Member of the IEEE, and a Distinguished Scientist Member of the ACM. In 2010, Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was awarded the ACM Gordon Bell Prize. Also, his work has won awards at major conferences including Best Paper Awards at the International Parallel and Distributed Processing Symposium (IPDPS) and EuroPar, Best Student Paper Finalist at SC14, and Best Presentation at EASC 2015. In 2015, Vetter served as the SC15 Technical Program Chair. His recent books, entitled "Contemporary High Performance Computing: From Petascale toward Exascale (Vols. 1 and 2)," survey the international landscape of HPC. See his website for more information: http://ft.ornl.gov/~vetter/.
Frédéric Vivien received his Ph.D. degree from the École Normale Supérieure de Lyon in 1997. From 1998 to 2002, he was an associate professor at the Louis Pasteur University in Strasbourg, France. He spent the year 2000 working with the Computer Architecture Group of the MIT Laboratory for Computer Science. He is currently a senior researcher at INRIA, working at ENS Lyon, France. He leads the INRIA project-team Roma, which focuses on designing models, algorithms, and scheduling strategies to optimize the execution of scientific applications. He is the author of two books, more than 35 papers published in international journals, and more than 50 papers published in international conferences. His main research interests are scheduling techniques and parallel algorithms for distributed and/or heterogeneous systems.
David Walker is Professor of High Performance Computing in the School of Computer Science and Informatics at Cardiff University, where he heads the Distributed Collaborative Computing group. From 2002 to 2010 he was also Director of the Welsh e-Science Centre. He received a B.A. (Hons) in Mathematics from Jesus College, Cambridge in 1976, an M.Sc. in Astrophysics from Queen Mary College, London, in 1979, and a Ph.D. in Physics from the same institution in 1983. Professor Walker has conducted research into parallel and distributed algorithms and applications for the past 25 years in the UK and USA, and has published over 140 papers on these subjects. Professor Walker was instrumental in initiating and guiding the development of the MPI specification for message-passing, and has co-authored a book on MPI. He also contributed to the ScaLAPACK library for parallel numerical linear algebra computations. Professor Walker's research interests include software environments for distributed scientific computing, problem-solving environments and portals, and parallel applications and algorithms. Professor Walker is a Principal Editor of Computer Physics Communications, the co-editor of Concurrency and Computation: Practice and Experience, and serves on the editorial boards of the International Journal of High Performance Computing Applications and the Journal of Computational Science.
CCGSC 1992, Participants (Some)
CCGSC 1994 Participants (Some),
Blackberry Farm, Tennessee
Missing CCGSC 1996 - Anyone have a picture?
CCGSC 1998 Participants,
Blackberry Farm, Tennessee
CCGSC 2000 Participants,
Faverges, France
CCGSC 2002 Participants,
Faverges, France
CCGSC 2004 Participants, Faverges, France
CCGSC 2006 Participants, Flat
Rock North Carolina
Some additional
pictures can be found here.
http://web.eecs.utk.edu/~dongarra/ccgsc2006/
CCGSC 2008 Participants, Flat
Rock North Carolina
http://web.eecs.utk.edu/~dongarra/ccgsc2008/
CCGSC 2010 Participants, Flat
Rock North Carolina
http://web.eecs.utk.edu/~dongarra/ccgsc2010/
CCDSC 2012 Participants,
Dareize, France
http://web.eecs.utk.edu/~dongarra/CCDSC-2012/index.htm
CCDSC 2014 Participants,
Dareize, France
http://web.eecs.utk.edu/~dongarra/CCDSC-2014/index.htm