GASPI. HPCN Braunschweig

GASPI HPCN Braunschweig 9.5.2012

Project Partners
- Fraunhofer Gesellschaft e.V.
  - Fraunhofer ITWM
  - Fraunhofer SCAI
- T-Systems Solutions for Research GmbH
- Forschungszentrum Jülich
- Karlsruher Institut für Technologie
- Deutsches Zentrum für Luft- und Raumfahrt e.V.
  - Institut für Aerodynamik und Strömungstechnik
  - Institut für Antriebstechnik
- Technische Universität Dresden, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
- Deutscher Wetterdienst
- scapos AG

The PGAS Approach
The PGAS approach offers the developer:
- an abstract shared address space
- data locality
- asynchronous communication models
The PGAS API of Fraunhofer ITWM (GPI):
- has been under development since 2005
- has been used exclusively in ITWM's industrial HPC projects since 2007
- offers MPI developers an easy transition to a PGAS programming model

The Partitioned Global Address Space
[Figure: one Global Memory region spanning four Local Memory regions]

Key Features of GASPI
In a Partitioned Global Address Space, every thread can read and write the entire global memory of an application.
- Scalability: from bulk-synchronous two-sided communication patterns to asynchronous one-sided communication.
- Versatility: beyond the message-passing model.
- Fault tolerance: timeouts in non-local operations, dynamic node sets.
- Flexibility: PGAS as an API, support for multiple memory models.
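To make the idea of a partitioned global address space concrete, here is a minimal Python sketch. It is a toy model and not the GASPI API: the class `PartitionedGlobalArray` and its methods are hypothetical names chosen for illustration, and a simple block distribution of indices over ranks is assumed.

```python
class PartitionedGlobalArray:
    """Toy model of a partitioned global address space (illustration only)."""

    def __init__(self, num_ranks, local_size):
        self.num_ranks = num_ranks
        self.local_size = local_size
        # One partition per rank; together they form the global memory.
        self.partitions = [[0] * local_size for _ in range(num_ranks)]

    def locate(self, global_index):
        """Map a global index to (owning rank, local offset)."""
        if not 0 <= global_index < self.num_ranks * self.local_size:
            raise IndexError("global index out of range")
        return divmod(global_index, self.local_size)

    def write(self, global_index, value):
        rank, offset = self.locate(global_index)
        self.partitions[rank][offset] = value

    def read(self, global_index):
        rank, offset = self.locate(global_index)
        return self.partitions[rank][offset]


pgas = PartitionedGlobalArray(num_ranks=4, local_size=8)
pgas.write(19, 42)        # lands in rank 2's partition at offset 3
owner = pgas.locate(19)   # (2, 3)
value = pgas.read(19)     # 42
```

Any caller can address the whole array, but `locate` keeps the owner of each element visible, which is what preserves data locality in the model.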

Summary of Project Activities
- Definition of the GASPI standard for a PGAS API; ensuring interoperability with MPI.
- Development of a high-performance library for one-sided, asynchronous communication based on the Fraunhofer PGAS API.
- Provision of a highly portable, open-source GASPI implementation (GASNet).
- Adaptation and further development of the Vampir performance analysis suite for the GASPI standard.

Summary of Project Activities (continued)
- Efficient GASPI-based numerical libraries for sparse and dense linear algebra and high-level solvers.
- Verification by porting complex, industry-relevant applications.
- Evaluation, benchmarking, and performance analysis.
- Outreach to the HPC and scientific computing community: dissemination, formation of user groups, training, and workshops.

Performance and Scalability
- RDMA queues for one-sided read and write operations, including support for arbitrary strided data (offset lists for sender and receiver). GASPI will hence allow, for example, RDMA-based halo exchanges in unstructured meshes (zero copy).
- GASPI is thread-safe. Multithreaded communication is the default rather than the exception.
- Write, Notify, Write_Notify: relaxed synchronization with double buffering; traditional (asynchronous) handshake mechanisms remain possible.
- No buffered communication (zero copy).
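The write-plus-notify pattern with double buffering can be sketched as a small single-process Python model. This is an illustration under stated assumptions, not GASPI code: `ToySegment`, `write_notify`, `notify_waitsome`, and `notify_reset` are hypothetical stand-ins, and the sender's check for a free buffer stands in for whatever acknowledgement a real application would use.

```python
class ToySegment:
    """Receiver-side memory with notification slots (hypothetical model)."""

    def __init__(self, num_buffers):
        self.buffers = [None] * num_buffers
        self.notifications = [0] * num_buffers

    def write_notify(self, buffer_id, payload, value=1):
        # Data first, then the flag: a set flag implies the data is in place.
        self.buffers[buffer_id] = list(payload)
        self.notifications[buffer_id] = value

    def notify_waitsome(self, first, count):
        """Return the first notified slot in [first, first + count), else None."""
        for slot in range(first, first + count):
            if self.notifications[slot] != 0:
                return slot
        return None  # stands in for a timeout

    def notify_reset(self, slot):
        """Clear the flag and return its previous value."""
        old, self.notifications[slot] = self.notifications[slot], 0
        return old


def run_pipeline(chunks):
    """Stream chunks through two buffers; return the sum of each chunk."""
    segment = ToySegment(num_buffers=2)
    results, sent, consumed = [], 0, 0
    while consumed < len(chunks):
        # Sender: fill every free buffer, so up to two chunks are in flight.
        while sent < len(chunks) and segment.notifications[sent % 2] == 0:
            segment.write_notify(sent % 2, chunks[sent], value=sent + 1)
            sent += 1
        # Receiver: wait on the next buffer in order, then release it.
        ready = segment.notify_waitsome(consumed % 2, 1)
        if ready is not None and segment.notify_reset(ready) != 0:
            results.append(sum(segment.buffers[ready]))
            consumed += 1
    return results


sums = run_pipeline([[1, 2], [3, 4], [5, 6]])
```

While the receiver works on one buffer, the sender is already free to fill the other, which is the relaxed synchronization the slide refers to.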

Performance and Scalability (continued)
- No polling for outstanding receives, zero CPU communication overhead, truly asynchronous read/write.
- Fast synchronous collectives with time-based blocking and timeouts. Support for asynchronous collectives in the core API.
- Passive receives: two-sided semantics, no busy-waiting. They allow for distributed updates and non-time-critical asynchronous collectives (logs, convergence limits, error handling).
- Passive receives accept messages with arbitrary tags from any sender and act upon the tag: passive active messages, so to speak.
- Global atomic operations: FetchAdd, cmpswap.
- Extensive profiling support.
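The semantics of the two global atomic operations can be illustrated with a thread-based Python sketch. It only models the behaviour on a single machine: `ToyAtomicCounter` and `claim_tickets` are hypothetical names, and a lock stands in for the atomicity that the network hardware would provide.

```python
import threading


class ToyAtomicCounter:
    """Shared integer with fetch-add and compare-swap (illustration only)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, increment):
        """Add increment and return the value seen before the addition."""
        with self._lock:
            old = self._value
            self._value += increment
            return old

    def compare_swap(self, expected, new):
        """Store new only if the current value equals expected; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def claim_tickets(num_workers, tickets_per_worker):
    """Each worker draws unique ticket numbers from one shared counter."""
    counter = ToyAtomicCounter()
    claimed = [[] for _ in range(num_workers)]

    def worker(worker_id):
        for _ in range(tickets_per_worker):
            claimed[worker_id].append(counter.fetch_add(1))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return claimed


tickets = claim_tickets(num_workers=4, tickets_per_worker=25)
```

Because every `fetch_add` returns a distinct old value, no two workers ever hold the same ticket, which makes the operation a natural building block for global work counters.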

Scalability

Versatility: Segments
- Tight coupling of multi-physics solvers.
- Runtime evaluation of applications.
- Support for heterogeneous memory architectures (NVRAM, GPGPU, SSD).
- Multiple memory models: symmetric data parallel (OpenShmem), symmetric stack-based memory management, master/slave, irregular.

Versatility
[Figure: segment layout. Labels: Global Memory, CSM Segment; Global Memory, CFD Segment 1; Global Memory, CFD Segment 2; Runtime Evaluation Process; IO Segment (NVRAM); Local Memory]

Versatility
- GRT: ~1-30 TB; future: > 100 TB.
- alltoallv, weakly regular, in-place data sorting.

Versatility
Goal: achieve a good distribution of computational load.
- Work stealing
- GPI: best-in-class performance
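As a rough illustration of how work stealing evens out computational load, the following Python sketch simulates workers in rounds. It is a simplified model and not the GPI implementation: the function `run_work_stealing`, the round-based scheduling, and the choice of the fullest queue as victim are all assumptions made for the example.

```python
from collections import deque


def run_work_stealing(task_lists):
    """Simulate work stealing in rounds.

    task_lists[i] holds the initial task costs of worker i.
    Returns the total cost each worker ended up executing.
    """
    queues = [deque(tasks) for tasks in task_lists]
    done = [0] * len(queues)
    while any(queues):
        for me, own_queue in enumerate(queues):
            if own_queue:
                # The owner takes its newest task from the back.
                done[me] += own_queue.pop()
            else:
                # An idle worker steals the oldest task of the fullest queue.
                victim = max(range(len(queues)), key=lambda i: len(queues[i]))
                if queues[victim]:
                    done[me] += queues[victim].popleft()
    return done


# All eight unit-cost tasks start on worker 0; the other three workers are idle.
load = run_work_stealing([[1] * 8, [], [], []])
```

Owners and thieves take tasks from opposite ends of a queue, a common design choice that keeps them from competing for the same task.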

Fault Tolerance
- Timeouts in all collective operations.
- Timeouts for read, write, wait, segment creation, and passive communication.
- Dynamic growth and shrinking of the node set.
- Fast checkpoint/restart to NVRAM.
- State vectors for GASPI processes.
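The role of timeouts in non-local operations can be sketched in Python with a local queue standing in for the network. This is a minimal sketch under stated assumptions: `wait_for_message`, `receive_with_retries`, and the status strings are hypothetical, and the `on_timeout` callback marks the place where an application might inspect the state of its peers.

```python
import queue

SUCCESS, TIMEOUT = "SUCCESS", "TIMEOUT"


def wait_for_message(inbox, timeout_s):
    """Blocking wait that hands control back after timeout_s seconds."""
    try:
        return SUCCESS, inbox.get(timeout=timeout_s)
    except queue.Empty:
        return TIMEOUT, None


def receive_with_retries(inbox, timeout_s, max_attempts, on_timeout):
    """Retry a timed wait; call on_timeout(attempt) after every timeout."""
    for attempt in range(1, max_attempts + 1):
        status, message = wait_for_message(inbox, timeout_s)
        if status == SUCCESS:
            return status, message, attempt
        on_timeout(attempt)  # e.g. check which peers are still alive
    return TIMEOUT, None, max_attempts


inbox = queue.Queue()
timeouts_seen = []


def on_timeout(attempt):
    timeouts_seen.append(attempt)
    if attempt == 2:
        inbox.put("halo data")  # the late message arrives after two timeouts


result = receive_with_retries(inbox, timeout_s=0.01, max_attempts=5, on_timeout=on_timeout)
```

The caller never blocks indefinitely on a failed peer: every wait returns either data or a timeout status that the application can act upon.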

Flexibility: Interoperability and Compatibility
- Compatibility with most programming languages; should even work as an extension to, e.g., CAF.
- Interoperability with MPI.
- Compatibility with the memory model of OpenShmem.
- Support for all threading models (OpenMP, Pthreads, ...); as with MPI, GASPI is orthogonal to threads.

Flexibility
- Allows for shrinking and growth of the node set.
- Callback mechanisms for global reductions.
- Offset lists for RDMA read/write (write_list, write_list_notify).
- Groups (communicators).
- Node-, socket-, or even process-based PGAS instances.
- Advanced resource handling, configurable setup at startup.
- Explicit connection management.
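How offset lists describe strided data can be shown with a small Python model. It is a sketch and not the signature of the real call: this `write_list` is a hypothetical local function that copies between two byte arrays, where an actual implementation would hand the lists to the RDMA hardware.

```python
def write_list(local_mem, local_offsets, remote_mem, remote_offsets, sizes):
    """Copy several blocks in one call, each with its own source and target offset."""
    if not (len(local_offsets) == len(remote_offsets) == len(sizes)):
        raise ValueError("offset and size lists must have equal length")
    for src, dst, size in zip(local_offsets, remote_offsets, sizes):
        if src + size > len(local_mem) or dst + size > len(remote_mem):
            raise IndexError("block exceeds segment bounds")
        remote_mem[dst:dst + size] = local_mem[src:src + size]


# A 4x4 row-major matrix of bytes 0..15; column 1 is strided in memory.
local_segment = bytearray(range(16))
remote_halo = bytearray(4)

# Send column 1 (offsets 1, 5, 9, 13) into a contiguous remote halo buffer.
write_list(local_segment, [1, 5, 9, 13], remote_halo, [0, 1, 2, 3], [1, 1, 1, 1])
```

The sender needs no intermediate packing buffer: the strided column is described by its offsets and lands contiguously on the receiving side.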

Hello World Examples

Allreduce Examples
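As a rough illustration of what an allreduce computes, the following Python sketch simulates a sum over all ranks with recursive doubling. This is one common textbook scheme, shown here under stated assumptions; it is not claimed to be the algorithm GASPI uses, and `allreduce_sum` is a hypothetical name.

```python
def allreduce_sum(values):
    """Simulate a sum-allreduce; values[r] is the contribution of rank r.

    Returns a list with the result as seen by every rank.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    current = list(values)
    distance = 1
    while distance < n:
        # Every rank exchanges its partial sum with the partner rank ^ distance.
        current = [current[r] + current[r ^ distance] for r in range(n)]
        distance *= 2
    return current


totals = allreduce_sum([1, 2, 3, 4])
```

After log2(n) exchange steps every rank holds the same global sum, so no separate broadcast is needed.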

Key Features of GASPI
- Scalability
- Versatility
- Fault tolerance
- Flexibility
www.gaspi.de

Questions?