The location of the data needed during a computation plays a central role in increasing the efficiency of future HPC systems. While efficient methods for keeping data close to where it is needed already exist at the processor level, access to the parallel file system remains a bottleneck. HPC file systems are usually a shared medium used by many users in parallel. Furthermore, performance is limited by the interface between the central file system and the compute nodes. As a result, an application currently cannot predict the actual load on the file system infrastructure, and therefore cannot optimize its use of the I/O subsystem.
The project aims to improve the I/O performance of highly parallel applications by means of distributed ad-hoc overlay file systems. To this end, it examines how job-specific temporary file systems can be provided efficiently in HPC environments. These file systems are to be built from the resources of the compute nodes allocated to the job. Before the job starts, the temporary file systems are populated with the required data through integration with the supercomputer's scheduling system. After the job completes, the data is migrated back to the global parallel file system. The research covers both the design of the file system itself and the question of which scheduling strategy is appropriate for planning the necessary I/O transfers.
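The job lifecycle described above (create a temporary file system, stage data in before the job, run, stage results out, tear down) can be illustrated with a minimal single-machine sketch. It assumes a local temporary directory stands in for the distributed ad-hoc file system built from the compute nodes' local storage, and it reduces the scheduler integration to a plain function call. All function names are hypothetical and do not represent the project's actual interfaces. The stage-in timing helper assumes a constant, uncontended transfer bandwidth, which is a deliberate simplification.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def stage_in(global_fs: Path, adhoc_fs: Path, inputs: Iterable[str]) -> None:
    """Copy the job's input files from the global file system into the ad-hoc one."""
    for name in inputs:
        shutil.copy2(global_fs / name, adhoc_fs / name)


def stage_out(adhoc_fs: Path, global_fs: Path, outputs: Iterable[str]) -> None:
    """Migrate the job's result files back to the global file system."""
    for name in outputs:
        shutil.copy2(adhoc_fs / name, global_fs / name)


def run_job_with_adhoc_fs(
    global_fs: Path,
    inputs: Iterable[str],
    outputs: Iterable[str],
    job: Callable[[Path], None],
) -> None:
    """Hypothetical job lifecycle: create, stage in, run, stage out, tear down."""
    # Assumption: a temporary directory stands in for the ad-hoc file system
    # that would be assembled from the node-local storage of the job's nodes.
    with tempfile.TemporaryDirectory() as tmp:
        adhoc_fs = Path(tmp)
        stage_in(global_fs, adhoc_fs, inputs)    # before the job starts
        job(adhoc_fs)                            # job I/O goes to the ad-hoc FS
        stage_out(adhoc_fs, global_fs, outputs)  # after the job completes
    # Leaving the with-block deletes the temporary file system and its contents.


def latest_stage_in_start(
    job_start_s: float, input_bytes: int, bandwidth_bytes_per_s: float
) -> float:
    """Latest time (in seconds) at which stage-in must begin so that it finishes
    before the job starts, assuming a constant, uncontended bandwidth."""
    return job_start_s - input_bytes / bandwidth_bytes_per_s
```

For example, staging 200 GB at 1 GB/s for a job scheduled at t = 3600 s must begin no later than t = 3400 s. On a real system the global file system is shared and contended, which is exactly why planning these transfers is treated as a scheduling problem rather than a simple subtraction.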
- Center for Information Services and High Performance Computing, TU Dresden
- Karlsruhe Institute of Technology, Steinbuch Center for Computing (SCC)
- Johannes Gutenberg University Mainz, Zentrum für Datenverarbeitung
02/2016 - 01/2019
- Vef, Marc-Andre ; Moti, Nafiseh ; Süß, Tim ; Tocci, Tommaso ; Nou, Ramon ; Miranda, Alberto ; Cortes, Toni ; Brinkmann, Andre:
GekkoFS - A Temporary Distributed File System for HPC Applications
2018 IEEE International Conference on Cluster Computing (CLUSTER). 2018. P. 319 - 324 (conference paper)
- Vef, Marc-Andre ; Tarasov, Vasily ; Hildebrand, Dean ; Brinkmann, Andre:
Challenges and Solutions for Tracing Storage Systems: A Case Study with Spectrum Scale
ACM Transactions on Storage. Vol. 14, Issue . 2018. P. 1 - 24
- Süß, Tim ; Nagel, Lars ; Vef, Marc-Andre ; Brinkmann, Andre ; Feld, Dustin ; Soddemann, Thomas:
Pure Functions in C: A Small Keyword for Automatic Parallelization
2017 IEEE International Conference on Cluster Computing (CLUSTER). 2017. P. 552 - 556 (conference paper)