Professional Overview
Research Leadership Profile
Key Achievements
- IEEE Fellow
- IEEE Charles Babbage Award
- 5 Achievement/Service/Honor Awards
- 2 R&D 100 Awards for innovative software
- 12 Best Paper Awards/Finalists
- 300+ publications with 19,500+ citations
- 113 invited talks including dozens of keynotes
- 22 Ph.D. students advised
Current Focus Areas
LLM Evaluation / LLMs for Science
Resilient High-Performance Parallel/Distributed Computing
Compression of Scientific Data
Education
HDR (Habilitation à Diriger des Recherches), Ph.D. + 7 years diploma
University of Paris XI, October 2001
Jury: Michel Cosnard, Brigitte Plateau, Marc Snir, Ian Foster, Mitsuhisa Sato, Joffroy Beauquier
Ph.D. in Computer Science
University of Paris XI, January 1994
Highest honors with the jury's congratulations (Mention très honorable avec les félicitations du jury)
Jury: Michel Cosnard, Brigitte Plateau, William Jalby, Zvonko Vranesic, Daniel Etiemble
DEA (Pre-doctoral degree)
University of Paris XI, July 1989
Master of Science
University of Paris VIII, July 1988
Professional Positions
Project Manager and Senior Computer Scientist
Argonne National Laboratory | April 2013 - Present
Leading research on resilience and compression in high-performance computing, directing multiple Exascale Computing Project (ECP) projects, and developing new solutions for extreme-scale computing challenges.
Adjunct Research Professor
University of Illinois at Urbana-Champaign | April 2013 - Present
Visiting Research Professor
University of Illinois at Urbana-Champaign | July 2009 - March 2013
Senior Researcher
INRIA | September 2003 - December 2013
Junior Researcher
CNRS | February 1994 - August 2003
Awards & Honors
Scientific & Leadership Recognitions
- 2024 IEEE Charles Babbage Award
- 2024 Secretary of Energy's Honor Award
- 2024 Europar Achievement Award
- 2022 ACM HPDC Achievement Award
- 2021 IEEE TC Award for Editorial Service and Excellence
- 2018 IEEE TCPP Outstanding Service and Contribution Award
- 2017 IEEE Fellow, Class of 2017
Technical Software Recognitions
- 2021 R&D 100 Award - SZ: A Lossy Compression Framework for Scientific Data
- 2019 R&D 100 Award - Scalable Checkpoint/Restart (SCR) Framework
Best Papers & Special Recognitions
Best Paper Awards by Year
- 2025 IEEE IPDPS - "Enabling Efficient Error-controlled Lossy Compression" Best Paper
- 2025 ACM HPDC - "IPComp: Interpolation Based Progressive Lossy Compression" Best Paper Candidate
- 2025 ACM ICS - "Pushing the Limits of GPU Lossy Compression" Best Paper Candidate
- 2024 ACM HPDC - "DataStates-LLM: Lazy Asynchronous Checkpointing for LLMs" Best Paper
- 2024 IEEE/ACM SC - "hZCC: Accelerating Collective Communication" Best Paper Candidate
- 2023 IEEE Transactions on Big Data - Best Paper (among 123 papers published in 2023)
- 2023 ACM ICS - "FZ: A flexible auto-tuned modular framework" Best Paper Candidate
- 2023 IEEE Cluster - Best Student Poster Finalist
- 2022 HiPC - "Towards Efficient Cache Allocation" Best Paper
- 2022 DRBSD workshop (IEEE/ACM SC) - "Understanding Effects of Modern Compressors" Best Paper
- 2022 IEEE/ACM SC - "Mitigating Silent Data Corruptions" Best Paper & Best Student Paper Finalist
- 2018 IEEE Cluster - Overall Best Paper + 3 area Best Papers
- 2011 IEEE/ACM SC - FTI Paper Perfect Score Award
- 2007 Europar - "Characterizing Result Errors" Best Paper
- 2001 IEEE CCGRID - "OVM: Out-of-Order Execution" Best Paper
Student Awards:
- 2023: Graduate student 1st place ACM SRC award
- 2022: Undergraduate 1st place ACM SRC award
- 2022: Graduate student 2nd place ACM SRC award
Current Responsibilities
Lead of AuroraGPT Evaluation Group
Since January 2024
Responsible for overseeing the design and development of evaluation methods, including benchmarks (multiple-choice questions, open responses, chain-of-thought, etc.) and other measurement techniques, to assess the performance of large language models (LLMs) as research assistants.
Leader of Resilience and Compression Topics at Argonne/MCS
Since April 2013
Developing research strategy for resilience and compression within MCS, coordinating with DOE program managers, submitting research proposals, participating in ECP-related projects, developing new research topics, and disseminating research results and software.
ANL Executive Director of JLESC
Initiated in 2014
INRIA-Illinois-ANL-BSC-JSC-RIKEN-UTK Joint Laboratory on Extreme Scale Computing. Responsible for developing JLESC activities, including workshops and visits, and for day-to-day management. Initiated this joint laboratory in 2014 and directed it until 2022.
Publications
Publication Statistics
- 6 books and proceedings
- 76 peer-reviewed journal articles
- 237 peer-reviewed conference and workshop papers
- 113 invited keynotes, plenaries, and talks
Most Influential Publications
The International Exascale Software Project Roadmap
966 citations
IJHPCA, 2011 - J. Dongarra et al.
Grid'5000: A Large-scale Platform
711 citations
IEEE/ACM GRID 2005 - R. Bolze, F. Cappello et al.
XtremWeb: Generic Global Computing System
600 citations
IEEE/ACM CCGRID 2001 - G. Fedak et al.
Fast Error-bounded Lossy HPC Data Compression with SZ
599 citations
IEEE IPDPS 2016 - S. Di, F. Cappello
Cost-benefit Analysis of Cloud Computing
539 citations
IEEE IPDPS 2009 - D. Kondo et al.
Toward Exascale Resilience
493 citations
IJHPCA, 2009 - F. Cappello et al.
MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes
483 citations
IEEE/ACM SC 2002 - G. Bosilca et al.
FTI: High performance fault tolerance interface for hybrid systems
453 citations
IEEE/ACM SC 2011 - L. Bautista-Gomez et al.
MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks
390 citations
IEEE/ACM SC 2000 - F. Cappello et al.
All Invited Keynotes/Plenaries and Talks
- [I113] AuroraGPT: Exploring AI Assistants for Science, ORAP Forum, Invited talk, Nov., 2025
- [I112] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Exa-DoST 2025 Annual Meeting, Invited talk, Nov., 2025
- [I111] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Clemson's inaugural HPC Day, Invited Keynote, Sept., 2025
- [I110] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, ECMWF’s 50th anniversary celebrations, Invited Keynote, Sept., 2025
- [I109] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Trillion Parameters Consortium annual meeting (TPC25), Invited plenary, Aug., 2025
- [I108] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, AI workshop on spectroscopy, JLAB (Jefferson Lab), Invited talk, June, 2025
- [I107] AuroraGPT and the 1000 Scientists AI JAM, Exascale Round Table Committee, Invited talk, May, 2025
- [I106] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, HPC&AI Workshop at Stony Brook University, Theoretical Physics Symposium, Invited talk, May, 2025
- [I105] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Theoretical Physics Symposium, Perimeter Institute, Toronto, Keynote, April, 2025
- [I104] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, TPC (Trillion Parameters Consortium), Invited talk, virtual, April, 2025
- [I103] EAIRA: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Invited seminar, Virginia University, March 2025
- [I102] AuroraGPT/Eval: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, ETP4HPC, Keynote, February 2025
- [I101] AuroraGPT/Eval: Establishing a Methodology to Evaluate LLMs/LRMs as Research Assistants, Riken RCCS Symposium, Invited Plenary, Jan 2025
- [I100] AuroraGPT/Eval: Establishing a methodology to evaluate LLMs/FMs as Research Assistants, University of Virginia, AI for science workshop, Keynote, October 2024
- [I99] How much can we reduce scientific data without losing science, Invited Seminar CS department, Northwestern U., Evanston, Jan 2024
- [I98] AuroraGPT: Exploring AI Assistants for Science, Keynote, IPDPS24, San Francisco, June 2024
- [I97] AuroraGPT, Evaluation of AI Assistants for Science: Critical and non-Trivial, Invited Plenary, TPC 2024 Barcelona, August 2024
- [I96] AuroraGPT: Rationale, Challenges and Development of an AI Research Assistant, Keynote, Europar24, August 2024
- [I95] Establishing a Methodology to Evaluate AI Models as Research Assistants, Invited Plenary, CCDSC 2024 Lyon, September 2024
- [I94] Frontier AI for Science Security and Technology: FASST, Invited Plenary, NSF HDR Ecosystem Conference 2024, Champaign, September 2024
- [I93] AuroraGPT: Rationale, Data Challenges and Development of an AI Research Assistant, NYSDS 2024, Invited Plenary, New York (remote), September 2024
- [I92] Toward AI-augmented SWARM based resilience for Integrated Research Infrastructures, Keynote, SuperCheck workshop at SC23, November 2023
- [I91] How much can we really compress scientific data without losing science?, Keynote, LIG (Laboratoire Informatique de Grenoble) Keynote Speeches, 2023
- [I90] How much can we really compress scientific data, Invited talk, Workshop on Clusters, Clouds, and Data for Scientific Computing, CCDSC, 2022
- [I89] A Reflection on Methodologies, Algorithms, and Software for HPDC, Keynote, ACM HPDC 2022
- [I88] Fault Tolerance/Resilience at Extreme Scale, Keynote, IEEE DSN, 2022
- [I87] Compression techniques in the US Exascale Program (ECP), Invited talk, Workshop on data compression for weather and climate data, 2022
- [I86] High Ratio, Speed and Accuracy Customizable Scientific Data Compression with SZ, Keynote, The Second International Workshop on Big Data Reduction, IEEE International Conference on Big Data, 2021
- [I85] Lossy compression for scientific data, APS seminar, Dec. 2021
- [I84] Scientific data reduction, from renaissance to modern age (CCDSC), Invited talk, Lyon, Sept. 2021. Canceled because of COVID-19
- [I83] Cooking the perfect reduction or how to shrink science data while keeping its substance, Invited seminar, Clemson University, Computer Science, February 2021
- [I82] International Forum on Detectors for Photon Science (IFDEPS 2020), Mar. 2020. Canceled because of COVID-19
- [I81] Fulfilling the promises of Lossy compression for scientific applications, (CCDSC), Invited talk, Lyon, Sept. 2020. Canceled because of COVID-19
- [I80] Compression for scientific data, invited seminar, Inria Grenoble, IMAG building, Feb. 2020
- [I79] HPC-BigData Convergence: What to do when scientific data becomes too big?, Keynote, Scheduling workshop, Bordeaux, June 2019
- [I78] Trends in HPC Resilience From Extreme Homogeneity to Extreme Heterogeneity, HPDC PC meeting workshop, Arlington, March 2019
- [I77] The ECP EZ project, ECP Video Interview, Dec. 2018
- [I76] Keeping-up with the flood of scientific data, Keynote, HiPEAC CSW, Oct. 2018
- [I75] Three frontiers of lossy compression for scientific data, HPC and Data Science for Scientific Discovery, Invited talk, UCLA, Oct. 2018
- [I74] Keeping-up with the flood of scientific data, Invited talk, Co-design Workshop, China, Oct. 2018
- [I73] Three frontiers of lossy compression for scientific data, Workshop on Clusters, Clouds, and Data for Scientific Computing (CCDSC), Invited talk, Lyon, Sept. 2018
- [I72] Keeping-up with the flood of scientific data, Keynote, IEEE ISPDC 2018, Switzerland, June 2018
- [I71] Lossy compression of scientific simulation data: from visualization to checkpoint/restart, International workshop on the Convergence of Extreme Scale Computing and Big Data Analysis, collocated with IEEE IPDPS 2018, Invited talk, May 2018
- [I70] Keeping-up with the Flood of Data in Extreme Scale Simulations, Colloquium of Center for Computational Sciences, University of Tsukuba, May 2018.
- [I69] Progress toward transparent asynchronous multi-level checkpointing with VeloC, SIAM-PP - Resilience mini-symposium, Invited talk, Tokyo, Japan, Mar. 2018, replaced by Bogdan Nicolae
- [I68] Addressing Fault Tolerance and Data Compression at Exascale, ECP Podcast, published in Inside HPC, https://insidehpc.com/2018/02/podcast-addressing-fault-tolerance-data-compression-exascale/
- [I67] Lossy Compression of Scientific Datasets, Keynote, PPAM 2017 conference, Poland, Sept. 2017
- [I66] Reconfigurable Computing for Beyond Moore Computing, Invited Panelist, Smoky Mountain Conference (SMC), Sept. 2017
- [I65] Checkpoint/Restart: Why You Should Delegate it to a Specialized Library, Invited talk, SIAM Annual Meeting, Pittsburgh, July 2017
- [I64] From General Purpose-Exact computing to Tailored-Lossy computing (scientific computing), Invited talk, Greater Chicago Area System Research Workshop, IIT, Chicago, April 2017
- [I63] FPGA for Scientific Computing and Data Analytics, Invited talk, International workshop on Co-design, Xian, China, Oct. 2016
- [I62] The Exascale Computing Project and Argonne software activity, Keynote, CREST workshop, Tokyo, Dec. 2016
- [I61] Scientific Computing and Data Analytics: How to Deal with the Flood of Data, Distinguished Lecture, Northeastern University, Boston, Nov. 2016
- [I60] Lossy Compression of Scientific Data: From Stone Age to Renaissance, Workshop on Clusters, Clouds, and Data for Scientific Computing (CCDSC), Invited talk, Lyon, Oct. 2016
- [I59] FPGA for Scientific Computing and Data Analytics, Invited talk, International workshop on Co-design, Xian, China, Oct. 2016
- [I58] Reconfigurable Computing: An Ingredient of Post-Moore Scientific Computing?, Invited dinner talk, Argonne Training Program on Extreme-Scale Computing (ATPESC), St Charles, Aug. 2016
- [I57] On-Demand Data Analytics and Storage for Extreme-Scale Simulations and Experiments, Invited short talk, BDEC meeting, Frankfurt, June 2016
- [I56] Trust in Results of Numerical Simulation: the New Challenging Scientific Problem in Reliability, Invited Plenary, Conference on Data Analysis: CODA 2016, Santa Fé, March 2016
- [I55] Grid'5000 Origin and Some Suggestions for the Next 10 Years, Invited talk, Grid5000 school, Feb. 2016
- [I54] The Joint Laboratory for Extreme Scale Computing: Investigating the challenges of post petascale scientific computing, Invited talk, 6th AICS International Symposium, Feb. 2016
- [I53] Taking on Exascale Challenges: Key Lessons and International Collaboration Opportunities Delivered by European Cutting-Edge HPC Initiatives, Invited Panelist, SC15 BOF on European HPC Technology Projects, Nov. 2015
- [I52] Re-form: Approaching Reconfigurable Computing for HPC and Data Analytics, Invited talk, International Workshop on Co-design, Wuxi, China, Nov. 2015
- [I51] Let's Forget about "Fault Tolerance" and "Resilience" for HPC; Trust is the New Challenging Scientific Problem in Reliability, Keynote, FTS workshop as part of IEEE Cluster 2015, Chicago, Sept. 2015
- [I50] Toward Exascale Resilience, Keynote, HiPEAC, thematic session on “reliability for exascale platforms and its impact on performance, from the point of view of programming models,” Oslo, May 5-7, 2015 (cancelled)
- [I49] Advances in Climate Simulations at Extreme Scale, Invited Plenary, International workshop on Co-design, Guangzhou, China, 2014 (cancelled)
- [I48] Toward Approximate Detection of Silent Data Corruptions, CCDSC, Invited Plenary, France, 2014
- [I47] Resilient Algorithms and Computing Models, SIAM Annual Meeting, Invited talk, USA, 2014
- [I46] Climate Modeling at Extreme Scale, Invited plenary, International workshop on Co-design, Guilin, China, 2013
- [I45] High Performance Fault Tolerance / Resilience at Extreme Scale, Keynote, HPCS 2013, Helsinki, July 2013
- [I44] Advanced Fault Tolerance Techniques for Postpetascale Systems, Invited plenary, AICS Symposium, Kobe, 2013
- [I43] Fault Tolerance at Exascale: Recent Progresses and Open Questions, Keynote, IEEE Cluster, Beijing, 2012
- [I42] Fault Tolerance for HPC at Extreme Scale: The Disruptive Way, Keynote, SPAC-PAD, New York, 2012
- [I41] Toward Exascale Climate Simulation: Exploring Limits of Current Codes, Invited talk at “Weather and Climate Prediction on Next Generation Supercomputers: Numerical and Computational Aspects,” Met Office, Exeter, UK, 2012
- [I40] Failure Prediction: Current Situation and Open Questions, CCDSC, Invited plenary, France, 2012
- [I39] Redesigning Fault Tolerance for High Performance Computing, Distinguished Speaker seminar, I2PC, UIUC, 2012
- [I38] A Holistic Approach for Exascale (Scalable) Resilience, Keynote talk, IEEE/ACM SC11/ScalA workshop, 2011
- [I37] Fault Tolerance for High Performance Computing Applications in Hostile Environments: Exascale and Cloud, Keynote talk, IEEE IPDPS/DPDNS11, Anchorage, 2011
- [I36] Exascale: The Great Disruption, Keynote talk, PDP 2011, Cyprus, 2011
- [I35] EESI: the European Exascale Software Initiative, Keynote talk, Intel Exascale Leadership Conference, 2011
- [I34] Toward Exascale Resilience, Invited talk, HiPC workshop “Reaching Exascale in This Decade,” 2010
- [I33] From Grid to Cloud: A View from the Experimental Platform Side, Invited Plenary talk, IEEE Grid 2008
- [I32] Fault Tolerance & PetaScale Systems: Current Knowledge, Challenges and Opportunities, Keynote talk, Europar, Spain, 2008
- [I31] Fault Tolerance & PetaScale Systems: Current Knowledge, Challenges and Opportunities, Keynote talk, EuroPVM/MPI, Dublin, 2008
- [I30] French National Grid Testbed: Grid 5000, Keynote talk, DCABES 2008, Dalian, China
- [I29] Towards an International Computer Science Grid, Keynote talk, IEEE/ACM CCGRID'2007, Rio, Brazil, 2007
- [I28] Towards an International Computer Science Grid, Keynote talk, GCP'2007, Paris, 2007
- [I27] Towards an International Computer Science Grid, Keynote talk, Symposium on Grid, Delft, 2007
- [I26] Towards an International Computer Science Grid, Keynote talk, IEEE WETICE, Paris, 2007
- [I25] Fault Tolerance & PetaScale Systems: Current Knowledge, Challenges and Opportunities, Clusters and Computational Grids for Scientific Computing 2008, Highland Lake Inn, Asheville, September 10–13, 2008
- [I24] Fault Tolerance & PetaScale Systems: Current Knowledge, Challenges and Opportunities, HPC Conference, Cetraro, July 2008
- [I23] Grid'5000, Motivations, Status and Early Results, Grid@Asia workshop, Seoul, Korea, December 13, 2006
- [I22] When Scale Reactivates Research in Distributed Computing: Grid'5000, Instant Grid and MPI-V, STIC-Amsud meeting, Santiago, Chile, October 18-20, 2006
- [I21] Grid'5000, Motivations, Status and Early Results, Workshop of the Grille Academic Tunisienne pour la Recherche Scientifique, Tunis, Tunisia, Oct. 2006
- [I20] An Update of Grid'5000 and a Focus on a Fault Tolerant MPI Experiment, Clusters and Computational Grids for Scientific Computing 2006, Highland Lake Inn, Asheville, USA, September 10-13, 2006
- [I19] Grid'5000, Motivations, Status and Early Results, HPC Conference, Cetraro, July 2006
- [I18] Grid and Utility Computing: Do they really mean Pervasive Services?, ICPS 2006 panel session, June 2006
- [I17] Grid 5000: The Need for Experimental Platform for Grid Research, ExpGrid Workshop Panel, Paris, June 2006
- [I16] Grid'5000, Motivations, Status and Early Results, Workshop new trends in HPDC, Amsterdam, March 2006
- [I15] Grid Projects in France and Europe, Colloquium on "25 years of collaboration between Instituto de Informatica de l'UFRGS and France,” Porto Alegre, November 2005
- [I14] Grid'5000, Workshop Grid@large, in conjunction with Europar 2005, Lisboa, August 2005
- [I13] Dependability in Grids, Workshop of the IFIP WG10.4 on Dependable Computing and Fault Tolerance, Yokohama, July 2005
- [I12] Grid Research Tools and Grid'5000, workshop "P2P: concept, outils et applications" (P2P: concepts, tools and applications), Geneva, May 2005
- [I11] Dependability in Grids, panel "Dependability Challenges and Education Perspectives", Fifth European Dependable Computing Conference, Budapest, April 2005
- [I10] Desktop Grid, Global Computing and P2P Distributed Systems, workshop on Advanced Grid Technologies, Systems & Services, Session: Grid Foundations for Business & Industry, IST Call 5, Brussels, February 2005
- [I9] The MPICH-V Project, ENS/NSF Workshop, Lyon, September 2004
- [I8] Hybrid Preemptive Scheduling of MPI Applications on the Grids, Scheduling Workshop, Modane, August 2004
- [I7] P2P Computing: From Expectations to Feedback, Trans-European-Research and Education Networking Association, Zagreb, Croatia, May 2003
- [I6] Desktop Grids with XtremWeb: Experiences and Feedback, panel “Desktop Grids: 10,000 fold parallelism for the masses” SuperComputing 2002 (SC2002), November 2002, Baltimore
- [I5] XtremWeb: Toward High Performance Computing on P2P systems, Advanced Research Workshop on High Performance Computing, Cetraro, June 2002
- [I4] Système de Calcul Global Pair à Pair (Peer-to-Peer Global Computing System), Journée de l'ORAP, Saclay, March 2002
- [I3] OVM: High Performance Computing with RPC Programming Style, Score Users Group Meeting, Oxford, UK, September 2000
- [I2] Understanding Performance of SMP Clusters for the NAS Benchmark, Workshop on Grid and Cluster Computing, Tsukuba, Japan, March 2000
- [I1] Comparing Performance of MPI and MPI+OpenMP for NAS Benchmark on IBM SP3, IBM Watson ACTC European Workshop, Paris, France, May 2000
Software & Methods
SZ Lossy Compressor
Started in 2015 | R&D 100 Award 2021 | Part of the E4S software stack deployed on Exascale Systems
Lossy compressor strictly respecting user-set error bounds. Integrated into HDF5, ADIOS, and NetCDF I/O libraries as part of the ECP project. Helps several Exascale Computing Project applications reduce dataset sizes significantly.
GitHub Repository →
VeloC Checkpoint-Restart Framework
Started in 2016 | R&D 100 Award 2019 (as part of SCR2) | Part of the E4S software stack deployed on Exascale Systems
Multilevel checkpoint-restart framework helping Exascale Computing Project applications reduce checkpoint/restart overhead with minimal code modification.
GitHub Repository →
Grid'5000
Started in 2003 | 6000+ users | 2500+ publications and 300+ Ph.D. theses used Grid'5000 for their experiments
Initiated and directed this experimental platform for parallel and distributed computing from 2003 to 2008. Transformed clusters distributed across France into a fully reconfigurable experimental platform. Inspired the NSF FutureGrid and Chameleon projects.
Grid'5000 Official Website →
Grid'5000 Official History →
MPICH-V
Started in 2002 | served or inspired tens of papers (see list) totaling 7000+ citations
Experimental platform for fault tolerant protocols. Origin of a decade of research producing tens of publications in fault tolerance.
XtremWeb
Started in 1999
Experimental platform for Desktop Grid. Adopted as the foundation of the iExec platform.
iExec Platform →
Research Grants
Grant Portfolio Summary
60+ Grants as Main PI or Co-PI:
- 20+ French research projects
- 6 European projects (2 STREPs, 1 NoE, 1 Infrastructure, 2 support actions)
- 20+ USA grants (DOE ECP, DOE ASCR, NSF, Sandia, ANL LDRD)
- 1 International project (G8)
Recent Grants (2020-2024)
- [60] 2024 DOE ASCR AI for Science - co-PI
- [59] 2024 DOE ASCR ZF Reduction project - lead PI
- [58] 2024 Argonne LDRD, AuroraGPT - group lead
- [57] 2024 NSF CSSI FZ project supplemental funding - Lead PI
- [56] 2023-2028 DOE ASCR Xscope - X-ray & Neutron Scientific Center
- [55] 2023-2028 DOE ASCR Illumine - Intelligent Learning for Light Source
- [54] 2023-2027 NSF CSSI FZ: Cyberinfrastructure for lossy compression - Lead PI
- [53] 2023-2028 DOE ASCR Distributed Intelligence for Resilient Workflows - Lead Resilience
- [52] 2023-2026 DOE FAIR DTIO: Computational Storage - co-PI
- [51] 2022-2025 DOE ASCR Actionable Intelligent Visual Analytics - co-PI
- [50] 2022-2025 DOE ASCR support for JLESC - PI
- [49] 2021-2023 NSF CSSI ROCCI: In Situ Lossy Compression - Co-PI
- [48] 2020-2022 SPP with oil company - lead PI
- [46] 2020-2023 DOE ECP VeloC-SZ - lead PI
DOE ECP and Major US Grants (2014-2020)
- [47] 2018-2019 DOE support for JLESC - lead PI
- [45] 2018-2020 NSF Ephemeral Coherence Cohort - co-PI (Marc Snir, PI)
- [44] 2017-2018 DOE support for JLESC - lead PI
- [43] 2017-2020 DOE ECP VeloC - lead PI
- [42] 2017-2020 DOE ECP EZ - lead PI
- [41] 2017-2023 DOE ECP CODAR - Lead for data reduction (Ian Foster, PI)
- [40] 2017-2023 DOE ECP Computing the Sky - co-lead Data Analysis (Salman Habib, PI)
- [39] 2016-2019 NSF ALETHEIA: Automatic detection framework - co-PI
- [38] 2016-2019 DOE ASCR Catalog - co-PI
- [37] 2015-2018 EDF Grant to support JLESC - PI
- [36] 2015-2018 DARPA BRASS - senior personnel
- [35] 2015-2017 ANL LDRD Re-form: FPGA reconfigurability - co-PI
- [34] 2014-2017 DOE ASC DECAF: High-Performance Decoupling - co-PI
- [33] 2014-2017 PUF NextGN: Next Generation Simulation Platforms - PI
European and International Projects (2006-2016)
- [32] 2013-2015 Anomaly@Exascale: INRIA International Associate Team - co-PI
- [31] 2013-2016 ANL LDRD Paris: Data Knowledge-Based Resilience - PI
- [30] 2013-2016 MontBlanc 2: European FP7 IP - Co-PI (as Inria)
- [29] 2013-2016 Scorpio: Significance-Based Computing - European FP7 FET
- [28] 2012 EESI2: European Exascale Software Initiative 2 - Leader Resilience
- [27] 2012 DOE Fault Tolerance Framework for Cray XE6 - PI
- [26] 2012 AMFT: Advanced Multilevel Fault Tolerance - STRATOS prototype
- [25] 2011 G8 ECS: Towards Exascale Climate Simulation - initiator & director
- [24] 2011 DOE XStack: Event Log Analysis - main PI
- [23] 2010 ANR-JST FP3C: Framework for Post Petascale - co-PI
- [22] 2010 EESI: European Exascale Software Initiative - co-initiator
- [18] 2007 EDGeS: Infrastructure European Project FP7 - co-PI
- [17] 2006 Grid4All: European Project STREP FP6 - co-PI
- [16] 2006 QosCos Grid: European Project STREP FP6 - PI for INRIA
- [11] 2004 CoreGRID: European Network of Excellence - senior personnel
French National Projects and Early Grants (1999-2010)
- [21] 2010 RESCUE, ANR White - co-PI
- [20] MAP-REDUCE, ANR ARPEGE - co-PI
- [19] 2008 Aladdin: Project after Grid'5000 - Scientific Director
- [15] 2006 HIPcal: ANR Calcul Intensif et Simulation - co-PI
- [14] 2005 CARRIOCAS: Competitivity pole System@tic - PI INRIA
- [13] 2005 Grid eXplorer: Large-Scale Emulation Platform - PI
- [12] 2004 Large-Scale Evaluation of HP Networks, ACI Grid'5000 - PI
- [10] 2003 CNRS action: National experimental platform - PI
- [9] 2003 Data GRID Explorer, Data Mass ACI - PI
- [8] 2003 CNRS-Urbana collaboration - co-PI with Marc Snir
- [7] 2002 DataGraal, ACI GRID - co-PI
- [6] 2002 cASPer: Community based Application Service Provider - co-PI
- [5] 2002 Augernome XtremWeb, PPF Paris XI University - co-PI
- [4] 2001 CGP2P: P2P Global Computing, ACI GRID - PI
- [3] 2001 GRID2, ACI GRID coordination action - senior personnel
- [2] 2000 XtremWeb Desktop Grid, Ministry of research - co-PI
- [1] 1999 RNRT ROM Project: Multi-service Optical network - senior personnel
Student Advising
Advising Statistics
- 22 Ph.D. students advised (most now in research or professor positions)
- 58 Ph.D. defense juries
- 8 tenure track examinations (French Habilitation)
- Multiple postdoctoral researchers supervised
Alumni Success Stories
Olivier Richard
Ph.D. 1999 - Hybrid Parallel Programming, Cluster scheduler
Current: Assistant Professor IMAG
George Bosilca
Ph.D. 2003 - OVM Project
Current: Software Architect, NVIDIA
Gilles Fedak
Ph.D. 2004 - XtremWeb
Current: INRIA Researcher & Founder and CEO of iExec
Leonardo Bautista Gomez
Ph.D. Co-advisor 2012 - FTI Software
Current: Founder and team leader of MigaLabs
Ana Gainaru
Ph.D. Co-advisor 2015 - Failure Prediction
Current: Computer Scientist at ORNL
Dingwen Tao
Ph.D. Co-advisor - Lossy Compression
Current: Full Professor · Institute of Computing Technology, Chinese Academy of Sciences
Geraud Krawezik
Ph.D. Advisor 2005 - Advanced programming with OpenMP
Current: Software Engineer, Flatiron Institute, Simons Foundation
Aurelien Bouteiller
Ph.D. Co-Advisor 2006 - MPICH-V1 environment and protocols
Current: Research Assistant Professor, Innovative Computing Laboratory, UTK
Oleg Lodygensky
Ph.D. Co-Advisor 2006 - XtremWeb-Auger: HEP desktop grid
Current: Expert, CryptoNext Security
Pierre Lemarinier
Ph.D. Co-Advisor 2006 - MPICH-V2 environment and protocol
Current: Product Owner at Atos BDS R&D
Benjamin Quetier
Ph.D. Co-Advisor 2008 - very large scale distributed system emulator
Current: co-founder, CTO Invenis
Services
Journal editorial boards and conference steering committees
- Editorial Board Elsevier Parallel Computing, 2021
- Editorial Board IEEE Transactions on Computers, since 2019
- Editorial Board IEEE Transactions on Parallel and Distributed Systems until 2018
- Editorial Board International Journal of Grid Computing, Kluwer Academic Publishers since 2003
- Editorial Board International Journal of Cluster Computing since 2008
- Steering Committee IEEE and ACM HPDC 2014-2020 and previously 2006-2010
- Steering Committee IEEE/ACM CCGRID
Conference, workshop, session (co-)organization
- Tech paper area co-chair IEEE IPDPS 2025
- Tutorial co-chair IEEE/ACM SC 2023
- Award chair IEEE/ACM SC 2022
- Award deputy chair IEEE/ACM SC 2021
- Virtual Logistics Liaison - Tech papers IEEE/ACM SC 2021
- Tech Paper Chair IEEE/ACM SC 2020
- Program Chair IEEE Cluster 2020
- Deputy Tech Paper Chair IEEE/ACM SC 2019
- System software track chair IEEE IPDPS 2018
- Poster Chair IEEE/ACM SC 2018
- Program co-chair IEEE CCGrid 2017
- Emerging Technology chair ACM/IEEE SC 2017
- Program vice-chair: Security, Privacy, and Reliability track IEEE CCGrid 2016
- Award chair ACM/IEEE SC 2015
- Scientific visualization showcase chair IEEE/ACM SC 2014
- Program co-chair ACM CAC 2014
- Program co-chair ACM HPDC 2014
- Test of Time Award chair ACM IEEE SC 2013
- Birds of a feather: G8 Exascale projects, ACM/IEEE SC13
- Panel: "Fault Tolerance/resilience at Petascale/Exascale: Is it really critical? Are solutions necessarily disruptive?" IEEE/ACM SC13
- Session: "system software challenges" at ISC 2013
- Tutorial co-chair ACM IEEE SC 2012
- Technical Paper co-chair ACM IEEE SC 2011
- Program chair HiPC 2010
- Program chair IEEE NCA 2010
- Technical Paper Area co-chair IEEE/ACM SC 2009
- Program co-chair IEEE/ACM CCGrid 2009
- Dagstuhl Seminar on Fault tolerance for HPC, 2009
- General co-chair Grid and Pervasive Computing
- General co-chair PCCGrid'2007, First Workshop on Large-Scale and Volatile Desktop Grids (PCCGrid) in conjunction with the IEEE International Parallel & Distributed Processing Symposium, 2007
- Program co-chair EuroPVM/MPI, Paris, Sept. 2007, http://www.pvmmpi07.org/
- Program co-chair HotP2P'2007, Fourth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P) in conjunction with the IEEE International Parallel & Distributed Processing Symposium, 2007
- Workshop PC-Grid at IEEE IPDPS 2006
- General Chair IEEE HPDC 2006
- Workshop GP2PC'2005, "Global and Peer to Peer Computing", CCGRID'2005, Cardiff, 9 May 2005
- Workshop GP2PC'2004, "Global and Peer to Peer Computing", CCGRID'2004, Chicago, 19 April 2004
- Workshop "Global and Peer-to-Peer Computing" (GP2PC) Workshop co-located with IEEE/ACM CCGrid’2003, Tokyo, Japan, 2003
- Winter school GRID 2002, Aussois, Dec. 2002
- GRID summer school co-located with RenPar 2002, Hammamet, May 2002
- Workshop "Global and Peer-to-Peer Computing" (GP2PC) co-located with IEEE/ACM CCGrid’2002, Berlin, Germany, 2002 (www.lri.fr/~fci/GP2PC.htm)
- Workshop "Global Computing on Personal Devices" (GP2PC) co-located with the international conference IEEE/ACM CCGrid’2001, Brisbane, Australia, 2001
- IEEE HPCA6, Toulouse, France, 2000
Contact Information
Email:
cappello@anl.gov
Phone:
+1 217 417 8557
Website:
ANL Profile
Google Scholar:
View Publications
JLESC Website:
Joint Laboratory
Mathematics and Computer Science Division
Argonne National Laboratory
9700 S. Cass Avenue
Lemont, IL 60439, USA