Brainware als Faktor für energieeffizientes HPC

Ähnliche Dokumente
Fluid-Particle Multiphase Flow Simulations for the Study of Sand Infiltration into Immobile Gravel-Beds

Brainware für Green IT

ISEA RWTH Aachen Electric Bus Simulation

Das HPC Ökosystem in Darmstadt und Hessen

Power-Efficient Server Utilization in Compute Clouds

Walter Buchmayr Ges.m.b.H.

GRIPS - GIS basiertes Risikoanalyse-, Informations- und Planungssystem

Mitglied der Leibniz-Gemeinschaft

Exercise (Part XI) Anastasia Mochalova, Lehrstuhl für ABWL und Wirtschaftsinformatik, Kath. Universität Eichstätt-Ingolstadt 1

Exercise (Part II) Anastasia Mochalova, Lehrstuhl für ABWL und Wirtschaftsinformatik, Kath. Universität Eichstätt-Ingolstadt 1

MULTI PHYSICS SIMULATION IN MANUFACTURING

SARA 1. Project Meeting

prorm Budget Planning promx GmbH Nordring Nuremberg

Tube Analyzer LogViewer 2.3

Dynamic Hybrid Simulation

Handbuch der therapeutischen Seelsorge: Die Seelsorge-Praxis / Gesprächsführung in der Seelsorge (German Edition)

Customer-specific software for autonomous driving and driver assistance (ADAS)

Turning Data Into Insights Into Value. Process Mining. Introduction to KPMG Process Mining

p^db=`oj===pìééçêíáåñçêã~íáçå=

Wie man heute die Liebe fürs Leben findet

GAUSS towards a common certification process for GNSS applications using the European Satellite System Galileo

SCHLAUE LÖSUNGEN FÜR IHRE ANWENDUNGEN SMART SOLUTIONS FOR YOUR APPLICATIONS

MODELLING AND CONTROLLING THE STEERING FORCE FEEDBACK USING SIMULINK AND xpc TARGET

Climate change and availability of water resources for Lima

Exercise (Part V) Anastasia Mochalova, Lehrstuhl für ABWL und Wirtschaftsinformatik, Kath. Universität Eichstätt-Ingolstadt 1

Word-CRM-Upload-Button. User manual

Geometrie und Bedeutung: Kap 5

Technische Universität Kaiserslautern Lehrstuhl für Virtuelle Produktentwicklung

Die deutsche Windows HPC Benutzergruppe

There are 10 weeks this summer vacation the weeks beginning: June 23, June 30, July 7, July 14, July 21, Jul 28, Aug 4, Aug 11, Aug 18, Aug 25

Introducing PAThWay. Structured and methodical performance engineering. Isaías A. Comprés Ureña Ventsislav Petkov Michael Firbach Michael Gerndt

Ein Stern in dunkler Nacht Die schoensten Weihnachtsgeschichten. Click here if your download doesn"t start automatically

Product Lifecycle Manager

Registration of residence at Citizens Office (Bürgerbüro)

Evaluation of an Augmented-Realitybased 3D User Interface to Enhance the 3D-Understanding of Molecular Chemistry

Simulation of a Battery Electric Vehicle

Handwerk Trades. Arbeitswelten / Working Environments. Green Technology for the Blue Planet Clean Energy from Solar and Windows

Creating OpenSocial Gadgets. Bastian Hofmann

En:Tool EnEff BIM Introduction to the Project and Research Association

Electrical testing of Bosch common rail solenoid valve (MV) injectors

AS Path-Prepending in the Internet And Its Impact on Routing Decisions

Im Fluss der Zeit: Gedanken beim Älterwerden (HERDER spektrum) (German Edition)

Die einfachste Diät der Welt: Das Plus-Minus- Prinzip (GU Reihe Einzeltitel)

New X-ray optics for biomedical diagnostics

HIR Method & Tools for Fit Gap analysis

-weishaupt- November 12, Wolf Point Ballroom 350 W. Mart Center Drive, Chicago, IL. Christoph Petri. Technical Sales, Chicago-Area

Die Bedeutung neurowissenschaftlicher Erkenntnisse für die Werbung (German Edition)

FEM Isoparametric Concept

Reparaturen kompakt - Küche + Bad: Waschbecken, Fliesen, Spüle, Armaturen, Dunstabzugshaube... (German Edition)

IATUL SIG-LOQUM Group

Fußballtraining für jeden Tag: Die 365 besten Übungen (German Edition)

p^db=`oj===pìééçêíáåñçêã~íáçå=

Critical Chain and Scrum

Potentials for Economic Improvement of Die Casting Cells

Online Learning in Management

Karlsruhe Institute of Technology Die Kooperation von Forschungszentrum Karlsruhe GmbH und Universität Karlsruhe (TH)

NEWSLETTER. FileDirector Version 2.5 Novelties. Filing system designer. Filing system in WinClient

Electrical testing of Bosch common rail piezo injectors

Software development with continuous integration

Level 1 German, 2014

Max und Moritz: Eine Bubengeschichte in Sieben Streichen (German Edition)

Wer bin ich - und wenn ja wie viele?: Eine philosophische Reise. Click here if your download doesn"t start automatically

GridMate The Grid Matlab Extension

Inequality Utilitarian and Capabilities Perspectives (and what they may imply for public health)

Mehr Service, weniger Ausfälle im Rechenzentrum

Die besten Chuck Norris Witze: Alle Fakten über den härtesten Mann der Welt (German Edition)

Dienstleistungen Services

Aluminium-air batteries: new materials and perspectives

Kursbuch Naturheilverfahren: Curriculum der Weiterbildung zur Erlangung der Zusatzbezeichnung Naturheilverfahren (German Edition)

Asynchronous Generators

Funktion der Mindestreserve im Bezug auf die Schlüsselzinssätze der EZB (German Edition)

Pluralisierung von Lebensformen - Veränderung familiärer Strukturen und innergesellschaftlicher Wandel (German Edition)

Preisliste für The Unscrambler X

Cycling and (or?) Trams

Konkret - der Ratgeber: Die besten Tipps zu Internet, Handy und Co. (German Edition)

Die UN-Kinderrechtskonvention. Darstellung der Bedeutung (German Edition)

FACHKUNDE FüR KAUFLEUTE IM GESUNDHEITSWESEN FROM THIEME GEORG VERLAG

WE SHAPE INDUSTRY 4.0 BOSCH CONNECTED INDUSTRY DR.-ING. STEFAN AßMANN

Martin Luther. Click here if your download doesn"t start automatically

Die Intrige: Historischer Roman (German Edition)

VGM. VGM information. HAMBURG SÜD VGM WEB PORTAL - USER GUIDE June 2016

Introduction FEM, 1D-Example

Cycling. and / or Trams

Benjamin Whorf, Die Sumerer Und Der Einfluss Der Sprache Auf Das Denken (Philippika) (German Edition)

Introduction FEM, 1D-Example

Kopfsache schlank: Wie wir über unser Gehirn unser Gewicht steuern (German Edition)

PONS DIE DREI??? FRAGEZEICHEN, ARCTIC ADVENTURE: ENGLISCH LERNEN MIT JUSTUS, PETER UND BOB

VGM. VGM information. HAMBURG SÜD VGM WEB PORTAL USER GUIDE June 2016

Komplexität beherrschen mit Contract Based Design

FAHRZEUGENTWICKLUNG IM AUTOMOBILBAU FROM HANSER FACHBUCHVERLAG DOWNLOAD EBOOK : FAHRZEUGENTWICKLUNG IM AUTOMOBILBAU FROM HANSER FACHBUCHVERLAG PDF

Duell auf offener Straße: Wenn sich Hunde an der Leine aggressiv verhalten (Cadmos Hundebuch) (German Edition)

Vasco Tonack Network and Communication, ZEDAT. Cisco UC Licensing. Und die Nachteile von Extension Mobility

Microsoft Azure Fundamentals MOC 10979

Volksgenossinnen: Frauen in der NS- Volksgemeinschaft (Beiträge zur Geschichte des Nationalsozialismus) (German Edition)

Wirtschaftlichkeit. VeloCity Vienna. Speakers: Bernhard Fink, MVV, Markus Schildhauer, ADFC. ADFC-MVV folding bike

Intel Cluster Studio. Michael Burger FG Scientific Computing TU Darmstadt

Open Source und Workflow im Unternehmen: Eine Untersuchung von Processmaker, Joget, Bonita Open Solution, uengine und Activiti (German Edition)

Was heißt Denken?: Vorlesung Wintersemester 1951/52. [Was bedeutet das alles?] (Reclams Universal-Bibliothek) (German Edition)

Transkript:

Brainware als Faktor für energieeffizientes HPC Christian Bischof, Dieter an Mey, Christian Terboven christian.bischof@tu-darmstadt.de - HRZ, TU Darmstadt {anmey, terboven}@rz.rwth-aachen.de - RZ, RWTH Aachen 20.09.2012, ZKI AK SC, Universität Düsseldorf Rechen- und Kommunikationszentrum (RZ)

Motivation Definition of Green HPC from insidehpc.com: Design and management techniques that contribute to the responsible, effective use of energy in the operation of high performance computing centers and equipment. But: The current situation hardly allows for the Economical Optimization of the Total Budget Different budgets for Staff Hardware (mainly through applications every X years), Maintenance Power Building (mainly through applications once per decade?!) User s in general don t pay for compute resources 2

Agenda Cost of Brainware versus Hardware Tuning Opportunities Success Stories Summary 3

Cost of Brainware versus Hardware 4

Understanding the Total Cost of Ownership Assumptions 2 Mio HW investment per year 5 years lifetime with 4 years maintenance through vendor Power: 850 KW, PUE=1.5, 0.14 per kwh => 1.5 Mio per year ISV software provided by users Commercial batch system Free Linux distribution costs per year percentage Building ( 5Mio / 25y) 200.000 3,72% HPC software 50.000 0,93% ISV software 0 0,00% Batch system 100.000 1,86% Linux 0 0,00% power 1.500.000 27,93% office space 0 0,00% Staff 12 FTE 720.000 13,41% hardware maintenance 800.000 14,90% investment compute servers 2.000.000 37,24% sum costs 5.370.000 100,00% 5

1 18 35 52 69 86 103 120 137 154 171 188 Does it pay off to hire more HPC Experts? Start tuning top user projects first 15 projects account for 50% of the load 64 projects account for 80% of the load Assumptions It takes 2 months to tune one project One analyst can handle 5 projects per year A projects profits for 2 years As a consequence one HPC expert can take care of 10 projects at a time One FTE costs 60,000 100,00% 90,00% 80,00% 70,00% 60,00% 50,00% 40,00% 30,00% 20,00% 10,00% 0,00% Accumulated usage of top accounts (excl. JARA-HPC) Accumulated usage of top accounts 6 Tuning can improve the code by 5, 10 or 20 percent

Does it pay off? Yes! 600.000 400.000 200.000 For example: Break even point: 7.5 HPC Analysts improve top 75 projects by 10% (TCO is 5.3 Mio /y) ROI [ ] 0-200.000 #Projects 70 60 50 40 30 20 10 Savings with 5% improvement 80 90 100 110 120 130 140 150 # of tuned projects 160-400.000-600.000 7 Savings with 10% improvement Savings with 20% improvement 10 projects handled by one FTE (60.000 /y)

Tuning Opportunities 8

Opportunities for Tuning wo/ Code Access Sanity Check Use HW Counters to Measure Performance To check for Performance Anomalies IO behavior System call statistics Hardware Choose the optimal hardware platform File system, IO parameters Parameterization Choose optimal number of threads / MPI processes Thread / Process Placement (NUMA) Mapping MPI topology to hardware topology MPI parameterization (buffers, protocols) Optimal libraries (MKL ) 9

Opportunities for Tuning w/ Code Changes Cache Tuning padding, blocking, loop based optimization techniques Inlining/outlining, Help compiler to perform optimizations MPI optimization avoid global synchronization Hide / reduce communication overhead, Unblocking communication Coalesce communications OpenMP optimization Extend parallel regions Check for false sharing NUMA optimization: first touch, migration In vogue: Add OpenMP to an MPI code to improve scalability Of Course: Choosing the optimal Algorithm is crucial 10 To be handled by or with the domain expert

Opportunities for Tuning wo/ Code Changes Development Environment Choose the optimal compiler Choose optimal compiler options Autoparallelization Compiler Profile / Feedback Adapt dataset: Partitioning / Blocking Load Balancing This list is not intended to be exhaustive, but rather to illustrate that the skill set of an HPC tuning expert is very different from that of an application scientist who develops a program, but both skill sets are needed. SimLabs 11 Interdisciplinary Collaboration: HPC, Domain Expert, Numerical Expert,

Teaching via Courses and Workshop Selected courses and workshops in Aachen, since 2011: Date Event 03/2011 Parallel Programming in CES (en), 1 week, 75 part. 05/2011 Visual Studio 2010 + Windows HPC Server workshop, 25 selected part. 10/2011 AIXcelerate Tuning Workshop (en) (with Intel), 25 selected part. 12/2011 Parallel Programming with MATLAB, 35 part. 03/2012 Parallel Programming in CES (en), 1 week, 55 part. 08-09/2012 Parallel Programming Summer Courses (en): MPI, OpenMP, Tools, 10/2012 Planned: Tuning for bigsmp HPC Workshop (en) 10/2012 Planned: OpenACC Workshop (en) 11/2012 Planned: Technical Cloud Computing with Microsoft Azure (en) 12

Success Stories 13

HPC Consulting may save real money: Combustion of Biofuels Primary Breakup for Diesel Sprays: Hybrid CDPLit Adding a single OpenMPparallelized kernel improves efficiency by 10% approx. Turns into cost reduction of an equivalent of one FTE/yr. Human effort ~7 weeks 512 s 256 s 128 s 64 s 32 s Runtime for Small Test Data Set 8 PPN 1TPP 4 PPN 1 TPP 4 PPN 2 TPP 16 s 1 2 4 16 32 48 64 Nodes Cluster of Excellence Tailor-Made Fuels from Biomass, Inst. F. Combustion Technology, RWTH Aachen University 14

Green IT = Using Resources more efficiently Code Impact Partner FLOWer Matlab Genehunter Dynmatt 15 1.8 x higher efficiency of hybrid Navier Stokes Solver simulating the landing of a space launch vehicle By adding autopar to MPI, carefully assigning threads to processes to adjust load imbalances 150x Speed-up of numerical Solution of the Diffusion Equation by extracting compute intense kernel, transforming it to a Fortran code + careful cache tuning 14x Speed-up through Cache Optimization plus scalable MPI Parallelization of Linkage Analysis to identify genes which may cause diseases. 33x Speed-up through I/O-Optimization by implementing appropriate Buffering and reducing meta data operations Code Impact Partner FIRE NestedCP TFS ~100x Speed-up of Image Recognition Software on large SMP by nested parallelization which saves a lot of IO 10-50 x Speed-up for Critical Point Extraction in flow simulation output data through nested parallelization with OpenMP even with highly imbalanced work chunks. 20x Speed-up for Simulation of Human Nasal Flow for Computer Aided Surgery through nested parallelization with OpenMP RWTH Laboratory of Mechanics RWTH Institute of Physical Chemistry Inst. F. Medical Biometry, Computer Science and Epidemiology (IMBIE) Bonn RWTH Institute of Steel and Light Alloy Building Higher sophistication of parallelization leads to higher scalability, but does not save resources RWTH Chair of Computer Science 6 Virtual Reality Center Aachen RWTH Aerodynamic Institute, Parallel Software Products

HECToR: Distributed CSE Success Stories 16 Code Domain Effect Effort Saving CASTEP Key Materials Science 4x Speed and 4x Scalability 8 PMs 320k - 480k (p.a.) NEMO Oceanography Speed and I/O- Performance CASINO Quantum Monte- Carlo 4x Performance and 4x Scalability 6 PMs 95 k (p.a.) 12 PMs 760 k (p.a.) CP2K Materials Science 12 % Speed and Scalability 12 PMs 1500 k (in total) GLOMAP/ TOMCAT CITCOM Atmospheric Chemistry Geodynamic Thermal Convection 15 % Performance? 30% Performance? significant EBL Fluid Turbulence 4x Scalability 12 PMs ChemShell Catalytic Chemistry 8x Performance 9 PMs Fluidity- ICOM Ocean Modelling Scalability? DL_POLY_3 Molecular Dynamics 20x Performance 6 PMs CARP Heart Modelling 20x Performance 8 PMs

HPC Consulting may save real money: Hydro-Dynamics with XNS XNS (M. Behr, CATS, RWTH) Simulation of Hydro-Dynamic forces of the Ohio Dam Additional OpenMP Parallelization: 9 parallel regions Parallelized with MPI Human effort: ~ 6 weeks Scales very well for larger case 20-40 % improvement Improvement of execution time in percentage best effort MPI only versus best effort hybrid 50,00% 40,00% 30,00% 20,00% 10,00% Higher is better Nehalem EP Cluster (2 processor chips, 4 cores each) with InfiniBand-QDR 17 0,00% -10,00% -20,00% 1 2 4 6 8 16 32 48 64 # Compute Nodes

Execution time [sec] XNS: How much efficiency do we want to sacrifice? Execution Time 1280 640 320 160 Lower is better PPN1 TPP1 PPN1 TPP2 PPN1 TPP4 PPN1 TPP8 PPN2 TPP1 PPN2 TPP2 PPN2 TPP4 PPN4 TPP1 PPN4 TPP2 PPN8 TPP1 80 40 18 20 PPN = processes per node TPP = threads per process 1 2 4 6 8 16 32 48 64 # Compute Nodes

Efficiency [%] XNS: How much efficiency do we want to sacrifice? - Efficiency 100,0% 90,0% Higher is better PPN = processes per node TPP = threads per process 80,0% 70,0% 60,0% 50,0% 40,0% 30,0% PPN1 TPP1 PPN1 TPP2 PPN1 TPP4 PPN1 TPP8 PPN2 TPP1 PPN2 TPP2 PPN2 TPP4 PPN4 TPP1 PPN4 TPP2 PPN8 TPP1 20,0% 10,0% 19 0,0% 1 2 4 6 8 16 32 48 64

XNS: How much efficiency to we want to sacrifice? Improvements versus Efficiency 100,00% 80,00% 100,00% 93,40% Parallelization Efficiency (best effort) 74,10% 60,00% 40,00% 20,00% 0,00% -20,00% Relative Improvement of Hybrid Version (best efforts) 15,52% 59,35% 33,21% 49,53% 38,93% 32,95% 19,79% 20,98% 20,53% 17,46% 12,67% 1 2 4 6 8 16 32 48 64-16,19% -14,01% # Compute Nodes 15,95% 8,57% 20

Summary 21

Summary Higher investment in brainware pays-off, if it is only for tuning A University can save real money by investing in Brainware rather than Electricity for inefficiently used hardware HPC Performance Analysts are a rare species Needs HW knowledge, Tools, programming languages, compiler technologies + paradigms, (algorithms), OS effects It takes some time to hire anyone and get him/her up to speed Team work: more different brains create more synergy And now there are GPUs They have much higher head room for tuning 22

The End and an Invitation German Heterogeneous Computing Group (GHCG) Unabhängige Interessengruppe rund um das Hochleistungsrechnen mit Beschleunigern im deutschsprachigen Raum Ziel: Intensivierung des technischen und wissenschaftlichen Austauschs über Projekte, Hardware und Algorithmen Nutzergruppen-Treffen Datum: 1. + 2. Oktober 2012 Ort: Braunschweig (Haus der Wissenschaft) Anmelden und mitmachen (kostenfrei)! www.ghc-group.org (Anmeldung & weitere Infos) Jeder ist herzlich willkommen! Themen (u.a.) Neuerungen im Hardund Softwarebereich CFD auf heterogenen Architekturen 23