The ENVELOPE project covers a set of topics related to proactive failure prediction and tolerance in HPC clusters. Its goals include analyzing monitoring data from HPC centers, developing automated methods for failure prediction, and creating systems that proactively handle predicted failures using job migration techniques at the system and user level.

The ZDV will design and conduct a survey of German HPC centers to gather information about their available monitoring infrastructures and methods. The survey will serve as the foundation for collecting and analyzing monitoring data from these centers. Machine learning methods will then be used to predict component failures in them.

Project Partners

Funding Period

01/2017 - 12/2019

External Links

Project Website




