Data Work Group

Research requires data, so this group meets to make data from our own institutions available and to discuss data sets found elsewhere.

The Data Work Group meets on the third Friday of every month at 9 AM EST (15:00 CET). Please fill out the interest form, and we will contact you with details.

Known Data Sets
Antici, F., et al. F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems. 1.0, Zenodo, 5 June 2024, doi:10.5281/zenodo.11467483. Data available on Zenodo.

Antici, F., et al. 2023. PM100: A Job Power Consumption Dataset of a Large-scale Production HPC System. In Proceedings of the SC ’23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W ’23). Association for Computing Machinery, New York, NY, USA, 1812–1819. DOI: https://doi.org/10.1145/3624062.3624263. Data available on Zenodo.

Borghesi, A., Di Santi, C., Molan, M. et al. M100 ExaData: a data collection campaign on the CINECA’s Marconi100 Tier-0 supercomputer. Sci Data 10, 288 (2023). https://doi.org/10.1038/s41597-023-02174-3. Data available on Zenodo.

Samsi, Siddharth, Weiss, Matthew, Bestor, David, et al. The MIT Supercloud Dataset. 2021 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2021. Data available at MIT.

Teto, Bryce, Hawkins, Max. “Georgia Data Center Water Usage”, GitHub, 2025. Data available on GitHub.

Work in Progress/Unpublished
Wells, J., Edmon, P., Singh, R. Harvard’s HPC Sustainability Dataset. Data available on GitHub.