High-performance computing (HPC) is currently reaching the exascale level, and compute clusters will soon combine the performance of millions of cores to perform more than 10^18 operations per second. In addition to the challenges that come with exascale computing, HPC is in transition from being compute-centric to being data-centric. The management and handling of data are becoming more and more important, and it will be crucial to scale both data capacity and data bandwidth.
Our group, Efficient Computing and Storage at the Johannes Gutenberg University Mainz, focuses on storage systems and scalable computing.
In storage, we work on both block-level and file-level storage. We develop protocols and architectures that use the underlying storage media efficiently, and we integrate these architectures into scalable environments. New storage technologies, such as solid-state disks (SSDs) and non-volatile main memory (NVMM), are integrated into these environments and help us deliver optimized storage systems, e.g., in the context of parallel file systems, data deduplication, and backup.
In scalable computing, we investigate how to combine the performance of accelerators and processors in the context of different HPC codes. Our extensions to these codes and to HPC middleware environments help applications make better use of CPU and GPU computing resources. We also investigate improving energy efficiency in the context of cloud computing, which helps to simplify access to scientific applications.
Most recent publications
- Wen Cheng, Mi Luo, Lingfang Zeng, Yang Wang, and André Brinkmann. 2022. Lifespan-based garbage collection to improve SSD’s reliability and performance. Journal of Parallel and Distributed Computing 164: 28–39.
- Alvaro Frank, Manuel Baumgartner, Reza Salkhordeh, and André Brinkmann. 2021. Improving checkpointing intervals by considering individual job failure probabilities. In 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 209–309.
- Frederic Schimmelpfennig, Marc-André Vef, Reza Salkhordeh, Alberto Miranda, Ramon Nou, and André Brinkmann. 2021. Streamlining distributed Deep Learning I/O with ad hoc file systems. In 2021 IEEE International Conference on Cluster Computing (CLUSTER), 169–180.
- Nicolas Krauter, Patrick Raaf, Peter Braam, Reza Salkhordeh, Sebastian Erdweg, and André Brinkmann. 2021. Persistent Software Transactional Memory in Haskell. Proc. ACM Program. Lang. 5.
- Petra Berenbrink, André Brinkmann, Robert Elsäßer, Tom Friedetzky, and Lars Nagel. 2021. Randomized renaming in shared memory systems. Journal of Parallel and Distributed Computing 150: 112–120.