IBM Workshop
Hands-on workshop on the IBM Big Data Platform and BigInsights
Harald Gröger, Gerhard Wenzel, Martin Clement
Client Technical Specialists, Big Data
Contents
Big Data solutions make it possible to extract business-relevant information from huge volumes of data. In this guided workshop you will use IBM InfoSphere BigInsights to administer unstructured bulk data from the internet that is stored in files, and analyze it with spreadsheets.
Goal - You gain hands-on experience with IBM's Big Data solutions and can assess what benefits this highly topical subject can bring to your company.
Participants - The workshop is aimed at all Big Data experts and those who want to become one. No prior knowledge of IBM's Big Data solutions is required.
Agenda
Introduction: IBM Big Data Platform and BigInsights (IBM's Hadoop distribution)
Exercise 1: Convenient administration of systems and applications
Exercise 2: Analyzing data from social networks with spreadsheets
Live demo: Text analytics for extracting relevant business information
What is Big Data?
Volume - Data at Scale: terabytes to petabytes of data
Variety - Data in Many Forms: structured, unstructured, text, multimedia
Velocity - Data in Motion: analysis of streaming data to enable decisions within fractions of a second
Veracity - Data Uncertainty: managing the reliability and predictability of inherently imprecise data types
The IBM Big Data Zone Architecture
[Diagram] Data types: Data in Motion, Data in Many Forms, Data at Rest
Components: Streams; Hadoop and MapReduce; Warehouse / Marts; ETL, Quality, MDM
Zones and capabilities: Ingestion and Integration; Real-time Analytics; Landing, Analytics and Archive; Integrated Exploration; Navigation and Discovery; BI and Predictive Analytics; Intelligence Analysis; Decision Management
Cross-cutting: Information Governance, Security and Business Continuity
What is Hadoop?
Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.
MapReduce - The framework that divides work into tasks and assigns them to the nodes in a cluster.
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
Scalable - Add nodes without changing data formats, how data is loaded, how jobs are written, or the applications on top.
Cost effective - Massively parallel computing on commodity servers with a substantial decrease in storage cost, which makes it affordable to model all your data.
Flexible - Schema-less: can absorb any type of data, and data from multiple sources can be joined and aggregated in arbitrary ways, enabling deep analyses.
Fault tolerant - When a node is lost, its work is redirected to another node holding a copy of the data, and processing continues.
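To illustrate how MapReduce splits a job into a map phase, a shuffle that groups by key, and a reduce phase, here is a minimal single-process sketch of the classic word-count job in plain Python. It only simulates the phases in memory and does not use Hadoop's APIs (a real job would typically use Hadoop's Java MapReduce API or Hadoop Streaming); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum the partial counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, mappers would run in parallel on the nodes holding the
    # HDFS blocks; in this sketch everything runs in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data on hadoop", "hadoop stores big files"]))
```

Because each map call only sees one line and each reduce call only sees one key, both phases can be distributed across nodes independently; only the shuffle moves data between them.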
Scope of the IBM BigInsights Hadoop Distribution
[Chart: enterprise class vs. breadth of capabilities]
Apache Hadoop
Basic Edition - Free download; Jaql; integrated install
Quick Start Edition - New in V2.1; free; non-production use only
Enterprise Edition - Sold by number of terabytes managed
  Enterprise ready: integrated web console; administrative tools, security; RDBMS and warehouse connectivity; enterprise integration; performance optimization; pre-built applications
  Analytics included: visualization capabilities; spreadsheet-style tool; Big SQL; text analytics; Eclipse development; accelerators
PureData for Hadoop - Appliance simplicity: brings BigInsights to the market in an appliance form factor
General Information
VM hostname: bivm
Login user: biadmin
Password: biadmin
Tutorial - Managing your Big Data environment
Duration: approx. 10 minutes
Start the BigInsights Web Console via the desktop icon, then continue with Chapter 2 / Lesson 1 / Step 3 (page 4).
Tutorial - Analyzing Big Data with BigSheets
Duration: approx. 40 minutes
All prerequisites are already met: the data has been downloaded and imported.
Start in the Files tab of the BigInsights Web Console with Chapter 4 / Lesson 1 / Step 3 (page 14), using hdfs/biginsights/sheets/watson_data_preloaded.
Stop after Lesson 6 / Step 3 (page 21).
Console Demo
BigSheets Demo
From unstructured text to formatted spreadsheets and charts
[Figure: blog and news text → spreadsheet format → chart]
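As a rough analogy for the step from raw records to a spreadsheet and a chart (this is not the BigSheets API, which is operated through the web console), the sketch below parses hypothetical one-JSON-record-per-line post data, projects it onto fixed columns, renders it as CSV, and counts rows per value of one column as input for a bar chart. The input records and the column names `type`, `language` and `text` are assumptions for illustration only.

```python
import csv
import io
import json
from collections import Counter
from typing import Dict, List

# Hypothetical raw input: one JSON record per line, e.g. crawled blog and news posts.
RAW = """{"type": "blog", "language": "English", "text": "Watson wins Jeopardy"}
{"type": "news", "language": "German", "text": "Watson in the news"}
{"type": "blog", "language": "English", "text": "What Watson means for analytics"}"""

# Hypothetical sheet columns.
COLUMNS = ["type", "language", "text"]


def to_rows(raw: str) -> List[Dict[str, str]]:
    # Parse each non-empty line and keep only the chosen columns (missing fields become "").
    rows = []
    for line in raw.splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append({col: record.get(col, "") for col in COLUMNS})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    # Render the rows as a spreadsheet-style CSV table with a header row.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def chart_data(rows: List[Dict[str, str]], column: str) -> Counter:
    # Count rows per distinct value of one column: the data behind a simple bar chart.
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    sheet = to_rows(RAW)
    print(to_csv(sheet))
    print(chart_data(sheet, "type"))
```

Keeping parsing, tabulation and aggregation as separate functions mirrors the demo's flow from text to sheet to chart, so each step can be inspected on its own.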
Text Analytics Demo
From unstructured text documents to a text analytics result table
1. Label examples: highlight text in the unstructured documents
2. Generate AQL: regular expressions and dictionaries
3. Create AQL candidates: combinations of regexes and dictionaries, plus conditions such as distance, case,...
4. Filter with AQL: remove duplicates, irrelevant candidates,...
5. Result table
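The extractor workflow above (dictionary and regex matches, combined into candidates under distance and case conditions, then filtered) can be sketched in plain Python with the standard `re` module. This only mimics the logic; it is not AQL, which expresses these steps declaratively. The name dictionary, the phone-number pattern, the sample text and the `max_gap` distance of 20 characters are all hypothetical choices for illustration.

```python
import re
from typing import List, Set, Tuple

# Hypothetical dictionary of first names (stored lower-case) and a simple phone-number regex.
FIRST_NAMES = {"anna", "peter", "maria"}
PHONE = re.compile(r"\+?\d[\d\- ]{6,}\d")
WORD = re.compile(r"[A-Za-z]+")

Span = Tuple[int, int, str]  # (start offset, end offset, matched text)


def dictionary_matches(text: str) -> List[Span]:
    # Dictionary step: case-insensitive lookup of each word token.
    return [(m.start(), m.end(), m.group()) for m in WORD.finditer(text)
            if m.group().lower() in FIRST_NAMES]


def regex_matches(text: str) -> List[Span]:
    # Regex step: every non-overlapping phone-number match.
    return [(m.start(), m.end(), m.group()) for m in PHONE.finditer(text)]


def candidates(text: str, max_gap: int = 20) -> List[Tuple[str, str]]:
    # Combination step: a name followed by a phone number within max_gap characters.
    result = []
    for _, name_end, name in dictionary_matches(text):
        for phone_start, _, phone in regex_matches(text):
            if 0 <= phone_start - name_end <= max_gap:
                result.append((name, phone))
    return result


def filter_candidates(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Filter step: drop duplicates (name compared case-insensitively), keep first occurrence.
    seen: Set[Tuple[str, str]] = set()
    kept = []
    for name, phone in pairs:
        key = (name.lower(), phone)
        if key not in seen:
            seen.add(key)
            kept.append((name, phone))
    return kept


if __name__ == "__main__":
    sample = ("Call Anna at +49 711 123456 or anna: +49 711 123456. "
              "Peter is reachable via 0170-5551234, Maria has no number.")
    print(filter_candidates(candidates(sample)))
```

In the sample, "Anna" and "anna" both produce a candidate with the same number, and the filter keeps only the first; "Maria" yields no candidate because no phone number follows her name.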
Thank You!