Hpc4u
| HPC4U in brief | |
|---|---|
| Name (short) | HPC4U |
| Name (long) | Highly Predictable Clusters for Internet Grids |
| Contract | IST-511531 |
| Project type | STREP |
| Start | 01.06.2004 |
| Duration | 36 months |
| Contribution | 1.7M EUR |
| Partners | CETIC BE, http://www.cetic.be |
| | Dolphin NO, http://www.dolphin.no |
| | Fujitsu Europe UK, http://fujitsu.co.uk |
| | IBM France FR, http://www.ibm.fr |
| | Scali NO, http://www.scali.no |
| | University of Linköping SE, http://www.liu.se |
| | University of Paderborn DE, http://www.upb.de/pc2 |
| Project Management | |
| Coordinator | Géry Schneider, IBM France |
| Technical Manager | Matthias Hovestadt, University of Paderborn |
| Exploitation and Dissemination Manager | Simon Alexandre, CETIC |
| Get in contact | |
| Website | [link] |
HPC4U is an EU-funded project[link] that aims to provide a software-only solution for transparent and reliable cluster middleware. Its main scientific and technological objectives are:
- predictable and reliable middleware for clusters, built from COTS components
- provision of fault tolerance, based on checkpointing, snapshotting and migration
- realization of SLA-aware resource management, using the fault tolerance building blocks on network, storage, and process
- support of negotiation of SLAs for Grid middleware
- cluster middleware as active Grid component, migrating jobs over the Grid
- application-transparent support of arbitrary single-node and multi-node jobs
Grids for Complex Problem Solving
The HPC4U results will give [Next Generation Grids] the ability to guarantee the completion of Grid jobs and will encourage a wider uptake of Grid environments. The HPC4U software will be customisable and interoperable with other Grids, and it will open new perspectives for using Grids to deliver the additional services that industry strongly demands today. HPC4U will extend well-accepted technologies and integrate them with innovative features (such as Grid-embedded Fault Tolerance) for all the components required for a dependable Grid (storage, communication, resource management, application environment).
The goal of the HPC4U project (Highly Predictable Clusters for Internet Grids) is to provide an application-transparent, software-only solution for a reliable Resource Management System. It will allow the Grid to negotiate [Service Level Agreements], and it will also feature mechanisms such as process and storage checkpointing to realize Fault Tolerance and to ensure adherence to the agreed SLAs. The HPC4U solution will act as an active Grid component, using available Grid resources to further improve its level of Fault Tolerance.
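The checkpoint-and-restart idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: HPC4U targets application-transparent, operating system-level checkpointing, whereas the example below checkpoints at the application level with Python's `pickle` module, and the names `NodeFailure`, `run_on_node`, and `run_with_migration` are hypothetical and not part of any HPC4U interface.

```python
import pickle


class NodeFailure(Exception):
    """Simulated node failure; carries the last checkpoint as its argument."""


def run_on_node(checkpoint, total_steps, checkpoint_every, fail_at=None):
    """Resume a job from a serialized checkpoint and run it to completion.

    The "job" simply sums the step numbers 0..total_steps-1. If fail_at is
    set, the node fails just before executing that step.
    """
    state = pickle.loads(checkpoint)
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            # Work done since the last checkpoint is lost.
            raise NodeFailure(checkpoint)
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Periodically persist the full job state.
            checkpoint = pickle.dumps(state)
    return state


def run_with_migration(total_steps, checkpoint_every, fail_at):
    """Run on a first node; on failure, restart on a second (simulated) node."""
    initial = pickle.dumps({"step": 0, "acc": 0})
    try:
        return run_on_node(initial, total_steps, checkpoint_every, fail_at)
    except NodeFailure as failure:
        # "Migrate": resume from the last checkpoint on a healthy node.
        return run_on_node(failure.args[0], total_steps, checkpoint_every)


result = run_with_migration(total_steps=10, checkpoint_every=3, fail_at=7)
```

In this run the first node fails before step 7, the job resumes from the checkpoint taken after step 5, and the final result matches an uninterrupted run. The failure costs only the work done since the last checkpoint, which is what makes it possible to keep a deadline despite a failure.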
The results of HPC4U will be a mix of open source and proprietary software, delivered as two outcomes. The first is an SLA-aware and Grid-enabled Resource Management System that includes SLA negotiation, multi-site SLA-aware scheduling, security, and interfaces for storage, checkpointing, and networking support. It will be multi-platform in nature and available as open source. The second outcome will be a vertically integrated commercial product with proprietary Linux-specific developments for storage, networking, and checkpointing. This outcome will demonstrate the entire, ready-to-use HPC4U functionality (job checkpointing, migration, and restart) for Grids based on Linux architectures.
Grid reliability
The attractiveness of current Grid Computing for commercial users and for a broader community suffers from the fact that reliable, predictable, and deadline-bound job execution on remote Grid sites cannot be guaranteed. As long as Grid middleware provides no mechanism that lets users obtain hard, contractually assured guarantees, users cannot be confident that their remote jobs will be treated in the same way as local jobs, that deadlines will be met even in case of failures, or that their jobs will be executed on the desired resources. Accordingly, current Grid environments are suitable only for low-priority jobs, where best-effort service is sufficient and no guarantees are demanded. Commercial users in particular therefore cannot yet harness the potential of Grid Computing and still rely on job processing at their local site, where responsibilities and priorities are well defined and aligned solely with their own commercial interests. The outcome of the project will be a reliable, predictable, SLA-aware Grid middleware, which:
- Assigns start times and ensures resource availability within a user-defined time interval in the Grid environment. This is mandatory for the execution of multi-site jobs and the planning of workflows. Moreover, thanks to the agreed SLAs, users can be sure that their jobs at the remote site will be treated with the same priority as on the local site. By providing Fault Tolerance mechanisms, HPC4U also ensures adherence to deadlines for job completion.
- Provides secure and efficient job check-pointing and job migration across multiple administrative domains in order to increase fault tolerance. Temporary hardware, software, or network failures will be compensated for by moving the jobs to other suitable, SLA-aware resources in the Grid, thus hiding the failure from the user.
- Includes operating system-driven check-pointing, which can be applied to any application and can be used transparently. In connection with novel storage and communication technologies based on commodity components, Fault Tolerance will become affordable for a huge number of Grid participants, who will not need to purchase expensive proprietary solutions.
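The first point above, assigning a start time so that resources are available within a user-defined interval, can be sketched as a simple admission check. This is a minimal illustration, not the HPC4U scheduler: the `Reservation` and `SlaRequest` types, the `find_start` function, and the use of integer time slots are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: int  # first occupied time slot (inclusive)
    end: int    # first free time slot again (exclusive)
    nodes: int  # number of nodes held during [start, end)


@dataclass(frozen=True)
class SlaRequest:
    earliest_start: int  # job may not start before this slot
    deadline: int        # job must have finished by this slot
    runtime: int         # estimated runtime in slots
    nodes: int           # number of nodes required


def nodes_in_use(reservations, t):
    """Count the nodes already reserved during time slot t."""
    return sum(r.nodes for r in reservations if r.start <= t < r.end)


def find_start(request, reservations, total_nodes):
    """Return the earliest start slot that satisfies the request, or None.

    A start is feasible if enough nodes are free for the whole runtime and
    the job still finishes by the deadline.
    """
    latest_start = request.deadline - request.runtime
    for start in range(request.earliest_start, latest_start + 1):
        slots = range(start, start + request.runtime)
        if all(nodes_in_use(reservations, t) + request.nodes <= total_nodes
               for t in slots):
            return start
    return None


# A 4-node cluster with 3 nodes reserved during slots 0..4.
booked = [Reservation(start=0, end=5, nodes=3)]
accepted = find_start(SlaRequest(0, 10, 4, 2), booked, total_nodes=4)
rejected = find_start(SlaRequest(0, 8, 4, 2), booked, total_nodes=4)
```

Here the first request can be guaranteed a start at slot 5, while the second cannot finish before its deadline and would have to be declined or renegotiated. A site that accepts a request only when such a slot exists can commit to the deadline in the SLA.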
This reliable, predictable, SLA-aware Grid middleware will be delivered in two forms:
- First outcome: HPC4U Grid middleware, which includes the developed SLA-aware resource management system, the module for SLA negotiation, the SLA-aware resource scheduler, mechanisms for job migration across multiple administrative domains, a security infrastructure, and the interfaces that connect the check-pointing feature with networking and storage. The source code, documentation, interface description, and reference architecture will be released as open source under the LGPL (GNU Lesser General Public License). Thus, the HPC4U middleware can also be used in commercial products without forcing these products to become open source. Within this framework, users can integrate any check-pointing mechanism and any storage or networking module. At the top layer, interfaces for interaction with OGSA-compliant Grid middleware will be provided.
- Second outcome: Vertically integrated, ready-to-use HPC4U middleware, which contains all necessary fault-tolerance mechanisms, such as the check-pointing feature, storage, and networking. Because it builds on pre-existing, partly commercial products of the HPC4U partners, this product will not be available as open source. It will demonstrate the functionality and the performance of the HPC4U work in real-world scenarios and applications.
