The placement of the data needed during a computation plays a central role in increasing the efficiency of future HPC systems. While efficient methods for keeping data available at the processor level already exist, access to the parallel file system remains a bottleneck. HPC file systems are usually a shared medium used by many users in parallel, and their performance is further limited by the interface between the central file system and the compute nodes. As a result, an application currently cannot predict the actual load on the file system infrastructure or optimize its use of the I/O subsystem.
The project aims to improve I/O performance for highly parallel applications by means of distributed ad-hoc overlay file systems. To this end, it investigates how job-specific temporary file systems can be provided efficiently in HPC environments. These file systems are created from the resources of the compute nodes involved in a job. Through integration with the supercomputer's scheduling system, the temporary file system is populated with the required data before the job starts; after the job completes, the data is migrated back into the global parallel file system. The research approach covers both the design of the file system itself and the question of a suitable scheduling strategy for planning the necessary I/O transfers.
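The sketch below illustrates this stage-in / compute / stage-out lifecycle under stated assumptions: the paths, commands, and helper names are illustrative and do not reflect the actual ADA-FS or GekkoFS interfaces.

```python
# Minimal sketch of the stage-in / compute / stage-out lifecycle described above.
# All paths, commands, and helper names are illustrative assumptions and do not
# reflect the actual ADA-FS or GekkoFS interfaces.
import shutil
import subprocess
from pathlib import Path

GLOBAL_FS = Path("/lustre/project/input")   # global parallel file system (assumed path)
ADHOC_MOUNT = Path("/tmp/adhoc_fs")         # job-local ad-hoc file system mount point

def stage_in(files):
    """Copy the input data a job needs from the global file system into the ad-hoc FS."""
    ADHOC_MOUNT.mkdir(parents=True, exist_ok=True)
    for name in files:
        shutil.copy2(GLOBAL_FS / name, ADHOC_MOUNT / name)

def run_job():
    """Run the application against the node-local ad-hoc file system."""
    subprocess.run(["./simulation", "--workdir", str(ADHOC_MOUNT)], check=True)

def stage_out(results_dir):
    """Migrate the results back into the global parallel file system after the job."""
    results_dir.mkdir(parents=True, exist_ok=True)
    for path in ADHOC_MOUNT.glob("*.out"):
        shutil.copy2(path, results_dir / path.name)

if __name__ == "__main__":
    stage_in(["mesh.h5", "config.yaml"])      # triggered by the scheduler before job start
    run_job()
    stage_out(GLOBAL_FS.parent / "results")   # triggered after job completion
```

In the project's setting, the stage-in and stage-out steps are not issued by the application itself but are planned and triggered by the scheduling system around the job's runtime.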
Project Partners
- Center for Information Services and High Performance Computing, TU Dresden
- Karlsruhe Institute of Technology, Steinbuch Center for Computing (SCC)
- Johannes Gutenberg University Mainz, Zentrum für Datenverarbeitung
Funding Period
02/2016 - 01/2019
Publications
2021
- Tim Süß, Lars Nagel, Marc-André Vef, André Brinkmann, Dustin Feld, and Thomas Soddemann. 2021. Pure Functions in C: A Small Keyword for Automatic Parallelization. International Journal of Parallel Programming 49: 1–24.
2020
- André Brinkmann, Kathryn Mohror, Weikuan Yu, Philip Carns, Toni Cortes, Scott A. Klasky, Alberto Miranda, Franz-Josef Pfreundt, Robert B. Ross, and Marc-André Vef. 2020. Ad Hoc File Systems for High-Performance Computing. Journal of Computer Science and Technology (JCST) 35, 1: 4–26.
- Marc-André Vef, Nafiseh Moti, Tim Süß, Markus Tacke, Tommaso Tocci, Ramon Nou, Alberto Miranda, Toni Cortes, and André Brinkmann. 2020. GekkoFS - A Temporary Burst Buffer File System for HPC Applications. Journal of Computer Science and Technology (JCST) 35, 1: 72–91.
- Sebastian Oeste, Marc-André Vef, Mehmet Soysal, Wolfgang E. Nagel, André Brinkmann, and Achim Streit. 2020. ADA-FS - Advanced Data Placement via Ad hoc File Systems at Extreme Scales. In Software for Exascale Computing - SPPEXA 2016-2019. Springer International Publishing, 29–59.
2019
- Mehmet Soysal, Marco Berghoff, Thorsten Zirwes, Marc-André Vef, Sebastian Oeste, André Brinkmann, Wolfgang E. Nagel, and Achim Streit. 2019. Using On-Demand File Systems in HPC Environments. In 17th International Conference on High Performance Computing and Simulation (HPCS), Dublin, Ireland, July 15-19.
2018
- Marc-André Vef, Nafiseh Moti, Tim Süß, Tommaso Tocci, Ramon Nou, Alberto Miranda, Toni Cortes, and André Brinkmann. 2018. GekkoFS - A Temporary Distributed File System for HPC Applications. In IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK, September 10-13, 319–324.
- Marc-André Vef, Vasily Tarasov, Dean Hildebrand, and André Brinkmann. 2018. Challenges and Solutions for Tracing Storage Systems: A Case Study with Spectrum Scale. ACM Transactions on Storage (ToS) 14: 18:1–18:24.
2017
- Tim Süß, Lars Nagel, Marc-André Vef, André Brinkmann, Dustin Feld, and Thomas Soddemann. 2017. Pure Functions in C: A Small Keyword for Automatic Parallelization. In 2017 IEEE International Conference on Cluster Computing (CLUSTER), 552–556.