GASPI — HPCN Braunschweig, 9 May 2012
Project Partners

- Fraunhofer-Gesellschaft e.V.: Fraunhofer ITWM, Fraunhofer SCAI
- T-Systems Solutions for Research GmbH
- Forschungszentrum Jülich
- Karlsruher Institut für Technologie
- Deutsches Zentrum für Luft- und Raumfahrt e.V.: Institut für Aerodynamik und Strömungstechnik, Institut für Antriebstechnik
- Technische Universität Dresden: Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
- Deutscher Wetterdienst
- scapos AG
The PGAS Approach

The PGAS approach offers the developer:
- an abstract shared address space
- data locality
- asynchronous communication models

The PGAS API of Fraunhofer ITWM (GPI):
- has been under development since 2005
- has been used exclusively in the ITWM's industrial HPC projects since 2007
- offers MPI developers an easy migration path to a PGAS programming model
The Partitioned Global Address Space

[Figure: a global memory layer spanning the local memories of all nodes]
Key Features of GASPI

In a partitioned global address space, every thread can read and write the entire global memory of an application.

- Scalability: from bulk-synchronous two-sided communication patterns to asynchronous one-sided communication.
- Versatility: beyond the message-passing model.
- Fault tolerance: timeouts in non-local operations, dynamic node sets.
- Flexibility: PGAS as an API, support for multiple memory models.
Project Activities (Summary)

- Definition of the GASPI standard, a PGAS API; ensuring interoperability with MPI.
- Development of a high-performance library for one-sided, asynchronous communication based on the Fraunhofer PGAS API.
- Provision of a highly portable, open-source GASPI implementation (based on GASNet).
- Adaptation and extension of the Vampir performance-analysis suite to the GASPI standard.
Project Activities (Summary, continued)

- GASPI-based efficient numerical libraries for sparse and dense linear algebra and high-level solvers.
- Verification by porting complex, industry-grade applications.
- Evaluation, benchmarking and performance analysis.
- Outreach to the HPC and scientific-computing community: dissemination, formation of user groups, training and workshops.
Performance and Scalability

- RDMA queues for one-sided read and write operations, including support for arbitrarily strided data (offset lists for sender and receiver). GASPI hence allows, for example, RDMA-based halo exchanges in unstructured meshes (zero copy).
- GASPI is thread-safe. Multithreaded communication is the default rather than the exception.
- write, notify, write_notify: relaxed synchronization with double buffering, while traditional (asynchronous) handshake mechanisms remain possible.
- No buffered communication (zero copy).
Performance and Scalability

- No polling for outstanding receives, zero CPU communication overhead, true asynchronous read/write.
- Fast synchronous collectives with time-based blocking and timeouts. Support for asynchronous collectives in the core API.
- Passive receives: two-sided semantics without busy-waiting. Allows for distributed updates and non-time-critical asynchronous collectives (logs, convergence limits, error handling). Passive receives accept messages with arbitrary tags from any sender and act upon the tag: passive active messages, so to speak.
- Global atomic operations: fetch_add, compare_swap.
- Extensive profiling support.
Scalability

[Benchmark figure]
Segments and Versatility

- Tight coupling of multi-physics solvers.
- Runtime evaluation of applications.
- Support for heterogeneous memory architectures (NVRAM, GPGPU, SSD).
- Multiple memory models: symmetric data-parallel (OpenSHMEM-style), symmetric stack-based memory management, master/slave, irregular.
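A segment is the unit in which such memory models are realized. The sketch below uses the function names of the published GASPI specification (as implemented in GPI-2); it requires a GASPI runtime to build and run and is not self-contained here, and the segment size is chosen purely for illustration.

```c
#include <GASPI.h>

/* Sketch: create one RDMA-capable segment per process and obtain its
 * local base pointer. Different segments can back different memory
 * models (symmetric, stack-based, master/slave) or different devices
 * (e.g. NVRAM, GPU memory). */
int setup_segment(void)
{
  const gaspi_segment_id_t seg_id   = 0;
  const gaspi_size_t       seg_size = 1 << 20;   /* 1 MiB, for illustration */

  /* Allocate, register and connect the segment on all ranks of the group. */
  gaspi_segment_create(seg_id, seg_size, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  gaspi_pointer_t ptr;
  gaspi_segment_ptr(seg_id, &ptr);   /* local base address of the segment */
  return ptr != NULL;
}
```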
Versatility

[Figure: segment layout across processes — a CSM segment, CFD segments, runtime-evaluation segments and IO segments (NVRAM) mapped onto the local memories of the nodes]
Versatility

- GRT: ~1-30 TB today, >100 TB in the future.
- alltoallv-style, weakly regular, in-place data sorting.
Versatility

- Goal: achieve a good distribution of computational load.
- Work stealing.
- GPI: best-in-class performance.
Fault Tolerance

- Timeouts in all collective operations.
- Timeouts for read, write, wait, segment creation and passive communication.
- Dynamic growth and shrinking of the node set.
- Fast checkpoint/restart to NVRAM.
- State vectors for GASPI processes.
Flexibility

Interoperability and compatibility:
- Compatibility with most programming languages; should even work as an extension to, e.g., CAF.
- Interoperability with MPI.
- Compatibility with the memory model of OpenSHMEM.
- Support for all threading models (OpenMP, Pthreads, ...): similar to MPI, GASPI is orthogonal to threads.
Flexibility

- Allows for shrinking and growth of the node set.
- Callback mechanisms for global reductions.
- Offset lists for RDMA read/write (write_list, write_list_notify).
- Groups (communicators).
- Node-, socket- or even process-based PGAS instances.
- Advanced resource handling: configurable setup at startup, explicit connection management.
Hello World Examples
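A hello-world program along the lines of the published GASPI specification might look as follows (a sketch using GPI-2-style names; it requires a GASPI runtime and launcher, and exact header and error-handling conventions may differ from the version shown in the talk).

```c
#include <GASPI.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  gaspi_rank_t rank, num;

  /* Initialize the GASPI runtime (blocks until all processes have joined). */
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_proc_rank(&rank);   /* this process's rank */
  gaspi_proc_num(&num);     /* total number of GASPI processes */

  printf("Hello world from rank %d of %d\n", rank, num);

  /* Synchronize all ranks before shutting down. */
  gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);
  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}
```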
Allreduce Examples
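A synchronous allreduce in the style of the GASPI specification might be sketched as below (again a hedged sketch requiring a GASPI runtime; the contributed values are chosen purely for illustration). The GASPI_BLOCK timeout makes the call block until the result is available; a finite timeout would return control to the caller instead.

```c
#include <GASPI.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_rank_t rank;
  gaspi_proc_rank(&rank);

  double local  = (double)rank;  /* each rank contributes its rank id */
  double global = 0.0;

  /* Sum one double across all ranks of the group. */
  gaspi_allreduce(&local, &global, 1,
                  GASPI_OP_SUM, GASPI_TYPE_DOUBLE,
                  GASPI_GROUP_ALL, GASPI_BLOCK);

  printf("rank %d: global sum = %f\n", rank, global);

  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}
```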
Key Features of GASPI

- Scalability
- Versatility
- Fault tolerance
- Flexibility

www.gaspi.de
Questions?