Seminar P2P, Summer Semester 2010


Participants of the Seminar P2P in the Summer Semester, September 2010
Organizers: research groups P2P, KOM, TK and DVS, Department of Computer Science, TU Darmstadt

Contents
1 NAT Traversal Techniques, by Stephan Arneth
2 Traffic Models for Multiplayer Games, by Maxim Babarinov
3 Sybil Resistant DHT, by Niklas Büscher
4 AI in Modern Computer Games, by Damian A. Czarny
5 Reputation Systems for P2P Networks, by Alexander Gebhardt
6 P2P Gaming Overlays, by Christian Glaser
7 General Overview on Benchmarking Techniques and Their Applicability for P2P Systems, by Tomasz Grubba
8 Safebook Key Management, by Felix Günther
9 Mobile P2P, by Yann Karl
10 Survey of Possible Tasks for Artificial Life in Large-scale Networks, by Denis Lapiner
11 Survey and Definition of Distributed Information Management Systems, by Niklas Lochschmidt
12 Peer-to-Peer Business Models, by Oliver May
13 Analyzing Network Coding for Security Threats and Attacks, by Benjamin Milde
14 Current Research on Twitter, by Philipp Neubrand
15 Large-scale Multiplayer Games and Networked Virtual Environments, by Leonhard Nobach
16 P2P Online Social Networks, by Sascha Nordquist
17 User Behavior Modeling in P2P Systems, by Malcolm Parsons
18 A Forgetting Internet, by Olga Petrova
19 User Behavior in Online Social Networks, by Daniel Puscher
20 General Overview on Underlay Modelling for P2P Simulators, by Christian Rosskopf
21 Peer-to-Peer Botnets: A Systematic Overview, by André Schaller
22 Plausible Deniability, by Martin Soemer
23 Cloud Computing, by Martin Stopczynski
24 The Impact of Network Properties on Multiplayer Games, by Dimitri Wulert
25 P2P Quality Management, by Mingmin Xu

NAT Traversal Techniques
Stephan Arneth

Abstract: This seminar paper addresses a problem that P2P applications face when they encounter different NAT implementations. Because these implementations differ, P2P applications have difficulty reaching their communication partners. The paper examines the fundamentals of NAT, discusses the trend in current routers and outlines possible ways of traversing NAT. It examines behavior-based techniques such as STUN, TURN, ALG, Reversal and Hole Punching. It also presents control-based techniques that manipulate the NAT directly. The conclusion is that solving this problem requires a combination of different techniques, and that further development is needed.

I. INTRODUCTION
In most cases, an Internet Service Provider (ISP) assigns a customer only a single IP address, which is called the public address. If several computers need Internet access, they already form a small local network of private machines. These machines do not receive public IP addresses of their own, because the supply of public addresses is limited. The result is two networks, the Local Area Network (LAN) and the Wide Area Network (WAN), and traffic must be forwarded between them. The router performs this task and uses Network Address Translation (NAT) to mediate between the two networks. NAT is used not only in home networks but also in other subnets, to limit the number of public IP addresses in use.

NAT was designed with the client/server architecture in mind, and this is exactly where the problem for P2P networks lies. Peer-to-peer communication differs considerably from the traditional client/server model, because every peer can act both as a client and as a server.
This conflicts with the way NAT works. A peer inside the local network can open outgoing connections without difficulty. If its communication partner is also behind a router, however, the peer cannot reach that partner without knowing which NAT behavior the router uses. In general, many of the routers in use cannot be traversed from the outside, in business as well as home environments (cf. [6]). Muller et al. [10] also point out that private and corporate networks will keep using NAT even if IPv6 resolves the address shortage. The reasons are users' privacy and security. Companies, in particular, do not want to expose the topology of their networks. NAT therefore also serves to hide internal hosts, which makes it harder to locate communication partners.

According to Liu and Pan [8], a user behind a NAT router must expect lower speeds with any P2P service, for two reasons. First, other peers cannot open connections to the affected peer that could be used for parallel transfers. The peer therefore has considerably fewer neighbors than a peer without NAT and obtains less bandwidth. Second, upload speed is lower than download speed, and BitTorrent's tit-for-tat strategy slows peers behind NAT down even further as a result. It follows from the assessment of Muller et al. [10] that NAT will continue to hamper P2P. The NAT traversal problem will therefore persist in the long term, which makes investment in solving it worthwhile. In the remainder of this paper, Section II deals with the NAT mechanism.
After the basic NAT implementations have been discussed, Section III examines and compares traversal techniques. It considers both behavior-based and control-based approaches. After the techniques have been presented, the paper closes with a summary and an assessment of the problem.

II. WHAT IS NAT?
In the 1990s the shortage of addresses in Internet Protocol version 4 became increasingly apparent, and Network Address Translation (NAT) was intended as a short-term solution. Its purpose was to relieve the pressure on the dwindling pool of public IP addresses. NAT allows a group of computers to share a single public IP address. Many home routers do exactly this and use the well-known private Class C addresses (such as 192.168.x.x) inside the local network (see [4]). Toward the Internet, the router appears as a single computer. It therefore acts as an intermediary that assigns connections between the private and the public network. On the first outgoing connection, the router stores the internal computer's IP address and source port in its NAT table. In every packet, it then replaces the internal IP address with its own public IP address and the source port with a source port of its own. It then forwards the rewritten packet to the Internet. When the response arrives from the Internet, the router receives it and translates it back to the internal computer's private IP address (cf. [10]).

As RFC 3489 [12] shows, however, there are four different types of NAT implementation, described in more detail below. In all of them, a sender in the internal network can always reach every destination without restriction. The types differ in how far the internal network can be reached from the external network. For illustration, the participants are denoted as follows:
Peer A (LAN): L(address:port)
Router NAT: R(address:port)
Peer B / C (WAN): W(address:port)

a) Full cone: A full cone NAT maps outgoing Internet access to a specific external port. Suppose, for example, that Peer A requests a web page from source port 31400, denoted L:31400. The router creates a binding for Peer A and port 31400, but uses a different source port toward the outside (R:61300). Peer B therefore receives Peer A's request from the router's public address and port 61300. The router forwards Peer B's subsequent response to the client on port 31400. With this NAT type, any other external computer can also reach the client (see Figure 1). For example, if Peer C sends to the router's port 61300, Peer C reaches the client as well.

Figure 1. NAT type: full cone [2].

b) Restricted cone: This type filters access from outside by source IP address. Peer B can reach Peer A only if Peer A has previously sent to Peer B. The port Peer B uses does not matter. Peer A may have contacted Peer B on port X, and Peer B may reply from port Y (see Figure 2).
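The translation table and the filtering rules of the first two types can be sketched as a small in-memory model. It is a toy illustration, not router code. The class name ConeNat, the sequential port allocation and the starting external port 61300 (taken from the example above) are assumptions. Timeouts, protocols and the remaining NAT types are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class ConeNat:
    """Toy model of a full cone or restricted cone NAT (hypothetical class)."""

    public_ip: str
    restricted: bool = False  # False: full cone, True: restricted cone
    next_port: int = 61300    # first external port handed out (assumption)
    # internal endpoint -> external port, and the reverse lookup
    _out: Dict[Endpoint, int] = field(default_factory=dict, init=False)
    _in: Dict[int, Endpoint] = field(default_factory=dict, init=False)
    # external port -> IP addresses this binding has already sent to
    _contacted: Dict[int, Set[str]] = field(default_factory=dict, init=False)

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, creating a binding on first use."""
        if src not in self._out:
            port = self.next_port
            self.next_port += 1
            self._out[src] = port
            self._in[port] = src
            self._contacted[port] = set()
        port = self._out[src]
        # The binding depends only on the internal endpoint (cone behavior),
        # but the destination IP is remembered for restricted-cone filtering.
        self._contacted[port].add(dst[0])
        return (self.public_ip, port)

    def inbound(self, src: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Return the internal endpoint for an incoming packet, or None if dropped."""
        internal = self._in.get(public_port)
        if internal is None:
            return None  # no binding exists for this external port
        if self.restricted and src[0] not in self._contacted[public_port]:
            return None  # restricted cone: the sender's IP was never contacted
        return internal  # full cone: any sender; restricted: source port ignored
```

Take a ConeNat created with the documentation address 203.0.113.7. After `outbound(("192.168.1.10", 31400), ("198.51.100.2", 80))` has returned `("203.0.113.7", 61300)`, the full cone variant accepts packets to port 61300 from any host. The restricted variant accepts them only from 198.51.100.2, from any source port.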
Figure 2. NAT type: restricted cone [2].

c) Port restricted cone: This type adds a further restriction to the restricted cone: the external host's source port is also checked. The external host's reply may use as its source port only the port that Peer A addressed (DPort). The NAT router rejects all other source ports. Request and reply must therefore match in both address and port (see Figure 3).

Figure 3. NAT type: port restricted cone [2].

d) Symmetric: The symmetric type is somewhat more complex (see Figure 4). Peer A's first request (L:31500) is assigned a binding (R:45000), which remains in place as long as Peer A communicates over this connection. Suppose Peer A then opens another connection with the same source and destination port, but to another peer with a different IP address. In that case the router uses a new binding with a different source port (R:X); that is, the NAT assigns a new external port to the new binding. How the port numbers are incremented differs from router to router. One group of routers increments them deterministically, while the other uses jumps that are hard to follow. The NAT router essentially identifies each connection by a 5-tuple: protocol, local IP, local port, destination IP, destination port. For external peers, symmetric NAT is even harder to traverse than port restricted cone, because they cannot predict the router's dynamic port assignment. Port-prediction approaches therefore exist. Port prediction means guessing an existing NAT binding and the port it uses. Once a valid port of the NAT router has been found, an external peer can reach the peer behind the NAT. These approaches do not succeed in all cases, however, because they cannot anticipate the erratic, almost random port increments of some symmetric NAT implementations (cf. [8]).

Figure 4. NAT type: symmetric [2].

In general, NAT is tailored to the client/server architecture, and the different types show that there is no single standard for NAT. RFC 5389 [11] points out that newer routers can no longer be assigned to the classifications described above. Instead, they behave more dynamically and switch between these classifications depending on network load. This trend makes traversing NAT routers considerably more difficult.

III. NAT TRAVERSAL TECHNIQUES
How can the NAT barrier be overcome? There are behavior-based and control-based approaches. Behavior-based techniques have the advantage that they do not modify the NAT. Control-based techniques, in contrast, actively modify the NAT without observing it, in order to set up a port explicitly for communication.

A. Behavior-Based Techniques
This subsection examines the behavior-based approaches. They have in common that the types of the NAT implementations between the peers must be determined first. In a second step, the detected NAT type is traversed accordingly.

1) Relay: A very simple way to overcome NAT is to use a third party, a relay server, which must be present for the entire communication between two peers (see Figure 5). All communication passes through this relay server, and the peers have no direct connection. The relay server must therefore be reachable at all times. In addition, the peers' bandwidth to the relay server is critical, because otherwise the relay becomes a bottleneck. Moreover, a relay is a fallback to the traditional client/server architecture, which limits the potential of P2P.

Figure 5. NAT traversal via relaying [3].

2) Reversal: If one peer is connected directly to the Internet without NAT, the reversal method can be used (see Figure 6). Both communication partners must first have registered with a rendezvous server S, so that the server can reach the peers behind NAT. Client B then sends a request via server S to Client A, asking it to open a connection to Client B. After Client A has received the request, it opens the connection to Client B, over which Client B can then communicate with Client A. Reversal is similar to hole punching, which is described later (cf. [3]).

Figure 6. NAT traversal with reversal [3].

3) STUN: STUN is described in RFC 3489 [12] and RFC 5389 [11]. It stood for "Simple Traversal of User Datagram Protocol (UDP) Through NATs" in RFC 3489 and was renamed "Session Traversal Utilities for NAT" in RFC 5389. RFC 3489 [12] describes the classic STUN protocol, which aims to handle the different NAT types in general. A basic prerequisite is a server located outside the NAT router. This external third party determines the public address of the peers behind the NAT routers, because the peers cannot reliably determine their public IP address on their own. Figure 7 illustrates the classic STUN principle.

Figure 7. STUN protocol flow: the STUN client sends a request through the NAT to the STUN server, whose response contains the client's public IP address.

To determine its own public IP address, the STUN client contacts a STUN server. In classic STUN, this communication runs exclusively over the User Datagram Protocol (UDP). Because the router has rewritten the packet, the request reaches the STUN server carrying the client's public IP address. The server sends this public IP address back to the client in its response. The client now knows its public IP address. If the Binding Request passes through several NATed networks, STUN can therefore detect only the public IP address of the last (outermost) NAT router.

STUN can be used with all NAT types presented in Section II except symmetric NAT. With symmetric NAT, the NAT router's port and address determined via STUN cannot be used by other peers. The symmetric NAT router would reject the incoming connection from a new peer. Information about a successful connection to Peer X is therefore of no use to a new peer. A symmetric NAT environment can be detected as follows. The client sends two successive Binding Requests to different server addresses and checks whether the NAT router changes the source port each time. A change is a telltale sign of symmetric NAT.

4) TURN: Because STUN does not work with symmetric NAT, Traversal Using Relays around NAT (TURN) was developed in 2005 as an extension of STUN. In TURN, the Allocation messages exchanged between TURN client and TURN server carry additional information. These Allocate requests also contain connection information, so the TURN server receives information about the private network from the TURN client. With this information, the TURN server can assess the conditions in the NAT environments. In certain cases, it can also handle a symmetric NAT. With symmetric NAT, the TURN server ensures that the target has previously opened a connection to the TURN server. Another peer's connection can then be redirected onto that existing connection. This places enormous demands on the TURN server's resources, because it acts as a relay. It is therefore sensible to use TURN only as a last resort for NAT traversal (cf. [9]).

5) Application Level Gateway: An Application Level Gateway (ALG), also called a proxy, has a task similar to that of NAT. It also acts as an intermediary and distributes packets to the attached computers. NAT, by comparison, operates at OSI layer 3 and therefore has no insight into the contents of packets. The gateway, in contrast, is implemented at higher layers and can analyze packets. Unlike the router (NAT), it can interpret a packet's payload (user data), which allows it to store and retrieve additional information. In the layered model, higher-layer protocol headers and data are nested inside the packet and appear to the lower layers as payload. Because of this layering, routers operating at layer 3 have difficulty extracting more detailed information from a packet. The ALG, in contrast, can interpret the payload. It can therefore translate between public and private IP addresses based on what the packet contents mean. The ALG is an additional component that must also be housed in the router. RFC 2993 [5] classifies this combination as NAT/ALG. This hardware modification to the router is, however, exactly what makes the approach hard to deploy. According to Hu, it involves effort and cost and is therefore unattractive (cf. [6], Section 2.4).

6) Hole Punching: Like STUN, this technique uses an external server to help two peers establish a connection. The difference is that hole punching can keep the connection open for a long time by means of keep-alive messages. These messages generate continuous outbound traffic through the firewall, so the inbound path from outside does not time out. Hole punching does not change the configuration of the firewall or router. Instead, it creates a tunnel through the firewall with these simulated packets. In most cases, hole punching works only over UDP, because the TCP handshake between the peers often cannot be completed. With so-called legacy NATs in particular, the TCP handshake fails, because these implementations deviate strongly from common NAT behavior. Here, the server is used only to determine the public IP addresses of the participating peers. Once these addresses are known, hole punching can be performed (see Figure 8). First, Client A opens a connection to Client B. This attempt initially fails at Client B's router, but Client A's NAT router now keeps the connection open on A's side. Client B can now reach Client A. The same procedure is carried out simultaneously in the opposite direction. A drawback of hole punching, however, is that it cannot traverse symmetric NAT (cf. [3]).

Figure 8. Hole punching flow, from [3].

7) Combined techniques: Wacker et al. [13] describe a method that uses STUN to determine the peers' NAT types. Besides the public STUN servers on the conventional Internet, so-called superpeers also act as STUN servers within the P2P network. This makes the P2P network less dependent on conventional STUN servers. Once the NAT types have been determined via STUN, the appropriate traversal method is selected according to Table I.

Table I. Traversal selection [13]
NAT                  | Full Cone | Restricted Cone | Port Restricted Cone | Symmetric
Full Cone            | Direct    | Reversal        | Reversal             | Reversal
Restricted Cone      | Direct    | Hole Punching   | Hole Punching        | Hole Punching
Port Restricted Cone | Direct    | Hole Punching   | Hole Punching        | Relaying
Symmetric            | Direct    | Hole Punching   | Relaying             | Relaying

Besides the proposal by Wacker et al. [13], there is a second combined technique, NatTrav (cf. [1]). It is built on TCP connections and uses hole punching for NAT traversal. An external server acts as a connection broker to manage connections. The authors aim to meet the following requirements, which they consider necessary: scalability, standardized interfaces, standardized stacks (specifically TCP), security and mobility. They use TCP exclusively instead of UDP because many networks do not permit UDP. SSL is used to ensure that the peers can authenticate each other.

Wang et al. [14] study symmetric NAT in detail. They distinguish progressive symmetric NAT, which selects port numbers deterministically and is considered easier to handle, from random symmetric NAT, which uses random port numbers. Finally, they extend STUN with port-prediction methods, resulting in so-called PS-STUN.

B. Control-Based Techniques
1) Port forwarding: Port forwarding is a direct, manual manipulation of the router.
The user must, however, manually select a port for incoming connections and configure it in the router. Besides that port, the router must also be given the destination address in the local network. The port can therefore be used by only one computer in the local network, which becomes reachable from the Internet via that port (cf. [1] and [7]). Configuring the router, however, requires knowledge that most users cannot be expected to have.

2) UPnP: A more convenient method for the user is for the P2P application to communicate directly with the router via Universal Plug and Play (UPnP). The application then opens a port for incoming connections automatically. In effect, this performs the port forwarding described in the previous subsection. However, both the application and the router must support UPnP, and it must be enabled on the router. Moreover, any computer in the local network can manipulate the router via UPnP. UPnP was not developed for NAT traversal; the protocol is merely used for automation. Via UPnP, the router can also give the internal computer further information. If the public IP address can be queried directly from the router, STUN and other discovery techniques are unnecessary (cf. [1] and [7]).

IV. SUMMARY AND ASSESSMENT
The behavior-based techniques have the advantage that they do not modify the NAT itself. They get around the NAT barrier by creating an outgoing connection that is then used for incoming traffic. STUN itself only determines the public IP address of the client concerned. On its own, STUN is therefore purely informative.
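The following sketch shows how little information STUN actually provides. It encodes a Binding Request and decodes the XOR-MAPPED-ADDRESS attribute of a success response in the message format of RFC 5389 [11]. The sketch handles IPv4 only. It omits authentication, the FINGERPRINT attribute and the classic MAPPED-ADDRESS attribute of RFC 3489. The function names are hypothetical. build_binding_response merely stands in for a server, so that the example runs offline.

```python
import socket
import struct
from typing import Tuple

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txid: bytes) -> bytes:
    """20-byte STUN header without attributes (message length 0)."""
    if len(txid) != 12:
        raise ValueError("transaction ID must be 96 bits")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def build_binding_response(txid: bytes, ip: str, port: int) -> bytes:
    """Stand-in for a server: report the observed IPv4 source, XOR-obfuscated."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)  # reserved, family IPv4
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + txid
    return header + attr


def parse_binding_response(data: bytes, txid: bytes) -> Tuple[str, int]:
    """Return the public (IP, port) the server observed for this client."""
    if len(data) < 20:
        raise ValueError("message shorter than a STUN header")
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or data[8:20] != txid:
        raise ValueError("not a matching Binding success response")
    pos, end = 20, 20 + msg_len
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

A real client would generate the transaction ID with `os.urandom(12)`, send the request with `sock.sendto` to a STUN server (default port 3478) and pass the reply to `parse_binding_response`. The result is only the public endpoint of the outermost NAT. Actually traversing the NAT still requires a further technique.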
In combination with hole punching, however, STUN allows a P2P application to traverse NAT, as long as symmetric NAT can be ruled out. The simple relay discussed above, in contrast, works with all NAT types. However, the relay server has to forward all traffic, and the peers communicate only indirectly. A relay server thus runs completely counter to the idea of P2P. In this respect, the relay server resembles the TURN server. Both are expensive solutions compared with the other techniques.

The Application Level Gateway (ALG) is an extension for NAT routers that supports NAT. It should be understood as a hardware extension, because the router itself cannot provide this functionality. With an ALG, translation at the router can take the whole connection into account, since the connection can be examined together with its contents. A router at OSI layer 3 can interpret only individual packets and can only roughly estimate how a connection might behave. The ALG, by contrast, interprets the payload of several packets, can see their meaning and can therefore adjust the NAT router more precisely. Because it has insight into the router configuration, it can also better assess symmetric NAT behavior.

Hole punching is the simplest method for P2P applications and fits a P2P environment best. Once the public IP addresses have been determined via STUN, the peers can connect to each other via hole punching without relying on third parties. As mentioned, however, it cannot traverse symmetric NATs.
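These observations imply a simple decision procedure: first check for symmetric NAT with two Binding Requests, then choose a method from Table I of Wacker et al. [13]. The sketch below makes two assumptions, stated in the code as well. First, the two Binding Requests are sent from the same local socket to two different server addresses, as in the classic RFC 3489 tests. Second, the rows of Table I give the NAT type of the peer that initiates the connection, and the columns give that of the target. The function names are hypothetical.

```python
from typing import Tuple

Endpoint = Tuple[str, int]  # public (IP, port) reported by a STUN server

NAT_TYPES = ("full cone", "restricted cone", "port restricted cone", "symmetric")

# Table I after Wacker et al. [13]. Assumption: row = initiator's NAT type,
# tuple position (in NAT_TYPES order) = target's NAT type.
TRAVERSAL_TABLE = {
    "full cone": ("direct", "reversal", "reversal", "reversal"),
    "restricted cone": ("direct", "hole punching", "hole punching", "hole punching"),
    "port restricted cone": ("direct", "hole punching", "hole punching", "relaying"),
    "symmetric": ("direct", "hole punching", "relaying", "relaying"),
}


def looks_symmetric(mapped_via_server_1: Endpoint, mapped_via_server_2: Endpoint) -> bool:
    """Assumes both requests left the same local socket for different servers.
    A symmetric NAT creates a new binding per destination, so the two
    public endpoints reported back differ."""
    return mapped_via_server_1 != mapped_via_server_2


def select_method(initiator: str, target: str) -> str:
    """Look up the traversal technique for a pair of NAT types in Table I."""
    if initiator not in TRAVERSAL_TABLE or target not in NAT_TYPES:
        raise ValueError("unknown NAT type")
    return TRAVERSAL_TABLE[initiator][NAT_TYPES.index(target)]
```

For example, `select_method("port restricted cone", "symmetric")` yields `"relaying"`, while `select_method("restricted cone", "symmetric")` yields `"hole punching"`. The second result is plausible because a restricted cone filters only on the sender's IP address, not on its port.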

The control-based techniques intervene directly in the router configuration. Instead of first observing the router's behavior, they create the port they need. Because of the knowledge required, only certain groups of users can set up manual port forwarding. UPnP is questionable from a security perspective. Every computer and every application can open a port for incoming traffic at will, and the user is not necessarily aware of this change. There is no global control over the assigned ports. Different computers in the internal network can therefore interfere with each other if application A wants the same incoming port as application B. Neither manual nor UPnP-based port forwarding can be used with cascaded NATs (cf. [8]). The P2P application can influence only the router in its own local network. It cannot adjust routers outside that network.

V. CONCLUSION
Because NAT implementations are so diverse, no single NAT traversal technique can handle all of them. The underlying problem is that manufacturers do not follow a precise NAT standard. Routers behave more dynamically and alternate between the NAT types discussed. NAT implementations differ not only from manufacturer to manufacturer but also from model to model (cf. [10]). Unfortunately, NAT is unlikely to disappear in the foreseeable future. Even after IPv6 is introduced, NAT can be expected to remain in use to separate local networks. What was originally seen as a short-term stopgap for the IP address shortage has become a permanent part of the everyday Internet. Addressing this problem therefore requires a combination of several NAT traversal techniques. TURN can handle most of the cases that arise, but it requires enormous resources. The recommendation is therefore to use STUN first to traverse the simplest NAT implementations, and only then to fall back on more elaborate techniques such as TURN.

REFERENCES
[1] J. Eppinger. TCP Connections for P2P Apps: A Software Approach to Solving the NAT Problem. Technical Report CMU-ISRI.
[2] Knocke et al. Wie funktioniert Network Address Translation (NAT)? - Kategorien.
[3] B. Ford, P. Srisuresh, and D. Kegel. Peer-to-Peer Communication Across Network Address Translators. In USENIX Annual Technical Conference, 2005.
[4] V. Fuller and T. Li. Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan. RFC 4632 (Best Current Practice), August 2006.
[5] T. Hain. Architectural Implications of NAT. RFC 2993 (Informational), November 2000.
[6] Z. Hu. NAT Traversal Techniques and Peer-to-Peer Applications. Telecommunications Software and Multimedia Laboratory, Helsinki University of Technology.
[7] Knobloch. NAT Traversal. Seminar P2P WS 2005, Humboldt-Universität.
[8] Y. Liu and J. Pan. The Impact of NAT on BitTorrent-like P2P Systems.
[9] R. Mahy, P. Matthews, and J. Rosenberg. Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN). RFC 5766 (Proposed Standard), April 2010.
[10] A. Muller, G. Carle, and A. Klenk. Behavior and Classification of NAT Devices and Implications for NAT Traversal. IEEE Network, 22(5):14-19, September 2008.
[11] J. Rosenberg, R. Mahy, P. Matthews, and D. Wing. Session Traversal Utilities for NAT (STUN). RFC 5389 (Proposed Standard), October 2008.
[12] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy. STUN: Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs). RFC 3489 (Proposed Standard), March 2003. Obsoleted by RFC 5389.
[13] A. Wacker, G. Schiele, S. Holzapfel, and T. Weis. A NAT Traversal Mechanism for Peer-to-Peer Networks. In Eighth International Conference on Peer-to-Peer Computing, pages 81-83, September 2008.
[14] Y. Wang, Z. Lu, and J. Gu. Research on Symmetric NAT Traversal in P2P Applications. In International Multi-Conference on Computing in the Global Information Technology (ICCGI'06), page 59, 2006.

Traffic Models for Multiplayer Games
Maxim Babarinow

Abstract: Recently, a number of studies have been conducted in the field of traffic modeling for network multiplayer games. At the same time, new hardware, software and architectures have been proposed for network games. For example, peer-to-peer technology has been proposed to replace the client-server architecture in order to reduce the impact of a single point of failure. On the one hand, benchmarks that generate realistic workloads are needed to validate these techniques. On the other hand, Internet Service Providers (ISPs) need to be able to estimate the amount of traffic caused by network games in order to improve their architectures. Following a brief introduction and a presentation of measurement methodologies, this work presents a classification of traffic models. State-of-the-art low-level traffic models are categorized and the most important results are presented. The goal of this work is to help researchers and ISPs develop game traffic generators for evaluating promising approaches to low-latency gaming environments.

I. INTRODUCTION
Internet multimedia traffic is increasing rapidly due to the growing popularity and prevalence of interactive applications such as video conferencing, streaming media and network games [12]. With the introduction of multiplayer games on the Internet, network games received a strong boost in popularity. They attract millions of users who play simultaneously in virtual worlds. Interactive network games now generate a significant part of today's Internet traffic and constitute one of the most profitable businesses on the Internet. Analyzing this kind of traffic is therefore of great interest to ISPs, manufacturers and researchers. Interactive gaming has stricter real-time requirements than other real-time applications.
The quality of service (QoS) requirements for such traffic are stricter than those for voice or video applications, which require round-trip delays of less than about 300 ms [17]. Players have found that the difference between a delay of 50 ms and one of 150 ms can determine who wins or loses a game [1]. This leads players to choose the ISP that offers the best performance. ISPs in turn respond to this demand by supporting high-quality gaming environments, which may lead to better customer retention or even new revenue streams [18]. To improve their infrastructure for this kind of traffic, ISPs need detailed knowledge of the network load generated by games. Motivated by these factors, a number of traffic models have been created by analyzing various popular games. These models permit the evaluation of hardware and software for low-latency gaming environments.
II. GAME AND TRAFFIC MODEL CLASSIFICATION
There is a very large number of commercial network games in different genres and a variety of papers on modeling game traffic. For classification and comparison, we use the game classification and results presented in [9]. The traffic of various game genres varies depending on client hardware, game architecture and other factors [3], [9]. One of the main conclusions of the study by Lakkakorpi et al. [9] is that, in general, action games, simulators, and real-time strategy (RTS) games produce real-time traffic, while turn-based strategy games produce non-real-time traffic. Based on this, the traffic models presented for action, simulator and RTS games can be grouped into a single category. Further research [2], [7], [15] analyzed the traffic of Massively Multiplayer Online Role-Playing Games (MMORPGs) and showed that there are key differences between MMORPG traffic and the traffic of other game genres.
We therefore introduce a second category, MMORPGs.
Action, simulator, and real-time strategy (RTS) games. This class of games requires short reaction times from the player. Such games are therefore especially sensitive to network characteristics and generate real-time traffic [9]. Games in this category generally contain virtual characters or avatars moving in real-time virtual environments. Examples: Quake III, Halo 2, Half-Life, Need for Speed, or Starcraft.
Massively Multiplayer Online Role-Playing Games (MMORPGs). MMORPGs are a fast-growing game genre and among the most popular games among network gamers. They attract millions of people who play simultaneously and populate virtual worlds. MMORPGs are a role-playing extension of MMOGs (Massively Multiplayer Online Games). Both MMORPGs and FPS (First Person Shooter) games generate small packets and require low bandwidth [2]. However, the traffic characteristics of MMORPGs differ from those of the well-established FPS games [15]. MMORPGs such as World of Warcraft (WoW), Lineage I & II or Shen Zhou Online use a different type of protocol and impose other limitations and requirements on the network.
III. METHODOLOGY
A. Measurement methodology
Different multiplayer games offer different connectivity options. Internetwork Packet Exchange (IPX) games are played over LANs, while TCP/IP games can be played either over LANs or over the Internet [1]. In some of the analyzed games, the game server can be run in dedicated or non-dedicated mode [13]. Most games are built as client-server applications [5]. In this setting, clients communicate and coordinate with a central server. The server keeps track of the global game state and passes game state information to each client. Clients synchronize their local state with the server using the packets they receive, and they return packets containing player movement and status information [4]. Tools such as Tcpdump, Ethereal (a network protocol analyzer), pkthisto (a packet traffic analysis program) and other network packet sniffers like CommView can be used to capture and analyze game network traffic. When the traffic of a game server cannot be measured directly (e.g., in World of Warcraft, a popular MMORPG), packets received at the client are assumed to be server traffic and transmitted packets are assumed to be client traffic [13].
B. Characterization and validation methodology
The goal in developing traffic models is to achieve a reasonable match with the observed data while balancing modeling accuracy against performance (simulation execution speed) [11]. Borella [1] found that traditional goodness-of-fit tests such as Chi-square (χ²) and Kolmogorov-Smirnov (KS) often fail and do not lead to acceptable results. This finding was confirmed in other publications [9], [11]. These tests are ill-suited to large or "messy" data sets, as well as to data that exhibit significant autocorrelation [1]. Instead of testing whether a data set fits a specific model, a common approach is to determine the discrepancy between the empirical data and the mathematical model. The discrepancy is measured using a metric λ² that returns a non-negative magnitude [14]. A value of 0 means there is no discrepancy at all; the discrepancy is considered significant if λ² > 1.0. The smaller the discrepancy, the better the model [9].
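The discrepancy idea can be made concrete for binned data such as a packet-size histogram. The sketch below assumes the λ̂² estimator in the form commonly used in the traffic modeling literature: λ̂² = max(0, (X² − K − df)/(n − 1)), with X² = Σ(Yᵢ − n·pᵢ)²/(n·pᵢ) and K = Σ(Yᵢ − n·pᵢ)/(n·pᵢ). Here Yᵢ are the observed counts per bin, pᵢ the model probabilities, n the number of observations and df the degrees of freedom. This form is an assumption and should be checked against [14] before use. The function name `lambda2_hat` is hypothetical.

```python
def lambda2_hat(observed, model_probs, n_estimated_params=0):
    """Estimate the lambda^2 discrepancy between binned data and a model.

    Assumed estimator (verify against Pederson and Johnson [14]):
        X2 = sum((Y_i - n*p_i)**2 / (n*p_i))
        K  = sum((Y_i - n*p_i) / (n*p_i))
        lambda2_hat = max(0, (X2 - K - df) / (n - 1))
    """
    if len(observed) != len(model_probs):
        raise ValueError("one model probability per bin is required")
    n = sum(observed)  # total number of observations
    if n < 2:
        raise ValueError("need at least two observations")
    x2 = 0.0
    k = 0.0
    for y, p in zip(observed, model_probs):
        expected = n * p  # expected count in this bin under the model
        if expected <= 0:
            raise ValueError("every bin needs a positive model probability")
        x2 += (y - expected) ** 2 / expected
        k += (y - expected) / expected
    # Degrees of freedom: number of bins - 1 - parameters estimated from the data.
    df = len(observed) - 1 - n_estimated_params
    return max(0.0, (x2 - k - df) / (n - 1))


# A model that matches the histogram exactly shows no discrepancy.
print(lambda2_hat([500, 300, 200], [0.5, 0.3, 0.2]))  # 0.0
# A badly fitting model yields a clearly positive value (about 0.64).
print(lambda2_hat([900, 50, 50], [0.5, 0.3, 0.2]))
```

The `max(0, …)` clamp keeps the estimate non-negative, matching the description of λ² as a non-negative magnitude. Without the clamp, sampling noise can push the raw estimate slightly below zero for well-fitting models.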
Various distribution functions are used to model packet length and packet inter-arrival time. A promising approach that yields lower discrepancy is to model different parts of a data set with a split distribution [9]. This approach also provides an estimator λ̂² for measuring the discrepancy when the data are grouped. Quantile-quantile (Q-Q) plots are commonly used to visually assess and display the quality of a model. Deploying a large number of network game clients under variable network conditions is expensive and therefore difficult [3]. However, the traffic models presented here can be combined with network simulation tools such as NS-2 to develop traffic generators [6], [10], [11], [18].
1) Distribution functions: For a mathematical description, a distribution function and its parameters have to be chosen to fit the empirical data. Commonly used distributions include the normal, extreme value, deterministic, exponential, power lognormal, and even complex gamma distributions. This section briefly presents two of the most commonly used distributions. Goodness-of-fit tests such as Chi-square (χ²) and Kolmogorov-Smirnov (KS) are not introduced here since, as mentioned before, they do not lead to acceptable results.
Extreme Value distribution:
$$f(x) = \frac{1}{b}\, e^{-\frac{x-a}{b}}\, e^{-e^{-\frac{x-a}{b}}}, \qquad b > 0$$
This distribution is commonly used for modeling the smallest or largest value in a large set of identically distributed values. Here, a is a location parameter and b is a scale parameter. The cumulative distribution function of the Extreme Value distribution is given by:
$$F(x) = e^{-e^{-\frac{x-a}{b}}}$$
Power Lognormal distribution:
$$f(x; p, \sigma) = \frac{p}{x\sigma}\, \phi\!\left(\frac{\log x}{\sigma}\right) \left(\Phi\!\left(-\frac{\log x}{\sigma}\right)\right)^{p-1}, \qquad x, p, \sigma > 0$$
where φ and Φ denote the probability density function and the cumulative distribution function of the standard normal distribution. With the parameters provided in the cited works, this equation can be used to match the packet size distributions of various games.
The cumulative distribution function of the Power Lognormal distribution is defined as follows:
$$F(x; p, \sigma) = 1 - \left(\Phi\!\left(-\frac{\log x}{\sigma}\right)\right)^{p}, \qquad x, p, \sigma > 0$$
IV. TRAFFIC MODELS
Traffic models for games can be categorized into low-level and high-level models. Low-level traffic models are based on attributes such as packet sizes, packet inter-arrival times (the time between the arrival of one packet and the arrival of the next) and data rates (in packets and bits per second) [1], [3], [6]. High-level approaches generate workload by modeling user behavior. Tan et al. [16] present a Network Game Mobility Model (NGMM) for First-Person Shooter (FPS) games. They state that there is a growing need for models that are more realistic than low-level approaches, since low-level traffic models fail to address application-level aspects of network game traffic. However, the NGMM cannot be applied or generalized to other game genres such as MMOGs. The behavior of users in this genre can vary drastically, leading to irregular traffic that is unique to MMORPGs [2]. Since high-level models are significantly more complex than low-level approaches, a detailed discussion of them is beyond the scope of this paper. An alternative approach, presented in [13], is to incorporate in-game behavior into a low-level traffic model. This work primarily discusses low-level traffic models but also presents this promising approach [13] of combining in-game behavior with low-level traffic modeling.
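The two distributions from Section III-B can be evaluated with Python's standard library alone, since `statistics.NormalDist` supplies φ (`pdf`) and Φ (`cdf`). The sketch below implements the four formulas. As a hypothetical low-level model in the sense of Section IV, it also draws extreme value packet sizes by inverse-transform sampling and pairs them with a constant inter-arrival time. The function names and the parameter values (a = 80 bytes, b = 15 bytes, one packet every 50 ms) are illustrative assumptions, not fitted values from the cited studies.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def extreme_value_pdf(x, a, b):
    """f(x) = (1/b) exp(-(x-a)/b) exp(-exp(-(x-a)/b)), with b > 0."""
    z = (x - a) / b
    return math.exp(-z - math.exp(-z)) / b


def extreme_value_cdf(x, a, b):
    """F(x) = exp(-exp(-(x-a)/b))."""
    return math.exp(-math.exp(-(x - a) / b))


def extreme_value_sample(a, b, rng):
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, where log(0) is undefined
        u = rng.random()
    return a - b * math.log(-math.log(u))


def power_lognormal_pdf(x, p, sigma):
    """f(x; p, sigma) = (p/(x sigma)) phi(log x/sigma) Phi(-log x/sigma)^(p-1)."""
    t = math.log(x) / sigma
    return (p / (x * sigma)) * STD_NORMAL.pdf(t) * STD_NORMAL.cdf(-t) ** (p - 1)


def power_lognormal_cdf(x, p, sigma):
    """F(x; p, sigma) = 1 - Phi(-log x/sigma)^p."""
    return 1.0 - STD_NORMAL.cdf(-math.log(x) / sigma) ** p


def synthetic_trace(n_packets, interval_s, a, b, seed=0):
    """Hypothetical low-level model: constant inter-arrival time and
    extreme value packet sizes in bytes (rounded, at least 1 byte)."""
    rng = random.Random(seed)
    return [
        (i * interval_s, max(1, round(extreme_value_sample(a, b, rng))))
        for i in range(n_packets)
    ]


# Hypothetical parameters: a = 80 bytes, b = 15 bytes, one packet every 50 ms.
trace = synthetic_trace(1000, 0.05, a=80.0, b=15.0)
rate_kbit_s = 8 * sum(size for _, size in trace) / (len(trace) * 0.05) / 1000
print(f"{rate_kbit_s:.1f} Kbit/s")
```

Inverse-transform sampling works here because the extreme value CDF inverts in closed form: x = a − b·ln(−ln u). The power lognormal distribution can be sampled the same way via x = exp(−σ·Φ⁻¹((1 − u)^{1/p})), with Φ⁻¹ available as `NormalDist().inv_cdf`.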

V. ACTION, SIMULATORS AND REAL TIME STRATEGY (RTS) GAMES
Borella [1] presented a traffic model for Quake I and Quake II. Other traffic models for Quake III are presented in [11] and [13]. The authors of [4], [5], [6] present traffic models for Half-Life. A traffic model for Counter-Strike and Starcraft is presented in [3]. In [10], a traffic model and characterization is presented for the Xbox console game Halo 1, and in [18] for its successor Halo 2. More traffic models for different game genres are presented in [9].
A. Packet size
Server to Client. The size of the packets sent from the server to different clients strongly depends on the number of players participating in the game [11], [18]. Figure 1 shows how the packet size corresponds to the number of players in Quake III. For every player, a certain amount of data has to be sent from the server to the client to synchronize their gameplay. Therefore, the packet size depends directly on the number of players [5]. Different maps can also influence the packet length [6], but to a lesser extent than the number of players [11]. In Half-Life, the packet length is widely spread and ranges from 60 to 300 Bytes; most packets are between 140 and 180 Bytes [6]. The mean packet size for Counter-Strike is 127 Bytes according to [4] and 160 Bytes according to [3]. The Quake III mean packet length is 77 Bytes [1], and in the case of Starcraft it amounts to 120 Bytes [3]. Overall, the average packet sizes are much smaller than the typical Internet traffic packet size of more than 400 Bytes. The packets are usually very small because they mainly contain movement and status information [4]. Lognormal distributions with different parameters can be used to fit the packet length. In [6], the parameters presented for Half-Life take different maps into account. The packet length in Quake III with a two-player game can be modeled with a Lognormal distribution plus a negative exponential function for every additional player [11]. This model is, however, not useful for packet lengths above 200 Bytes. Other studies [1], [4], [18] use the extreme value distribution, which fits Half-Life, Quake II and Halo 2 server packet lengths best. In [18], different parameters are presented depending on the number of players.
Fig. 1. Quake III - packet length vs. number of players [11]
Client to Server. Packet lengths in Quake III and Half-Life are almost independent of all observed parameters (number of players, map type, client hardware) [6], [11]. Each client sends packets of nearly the same size. Packets in Quake III have a narrower range, from 50 to 70 Bytes [11], whereas the packet size in Half-Life ranges from 60 to 90 Bytes [6]. The Half-Life packet size presented in [4] varies around a mean of 82 Bytes, with 99 percent of all packets between 60 and 110 Bytes. Figure 2 shows an aggregated packet length plot. As noted above, the packet length is independent of all observed parameters. The peaks in the packet length log that occur at random time intervals may represent short idle times [6].
Fig. 2. Half-Life - all clients to server packet lengths [6]
Quake's mean packet size is about 23 Bytes and can be well modeled as deterministic [1]. Median packet sizes are about 120 Bytes for Starcraft and about 160 Bytes for Counter-Strike [3]. For Halo 2, the packet size depends on the number of players on a single client and can be modeled as a linear function [18]. All other traffic models found no dependency on the number of players. We assume that this finding is related to the Xbox architecture of Halo 2, which differs from common FPS game architectures, since many other studies confirmed that the client-to-server packet size is mostly independent of all observed parameters.
B. Packets per Second (PPS) and data rates
Server to Client.
The PPS rate in Quake III varies slightly between 19 and 20 packets per second, and the packet transmission rate is almost constant: one update packet per client is sent approximately every 50 ms [11]. Almost the same value was observed in the traffic model of Half-Life [6], where the PPS rate varies between 16 and 16.5 packets per second. Close to its capacity of 22 players, a Counter-Strike server sends around 800 packets per second at around 900 Kbits per second [5]. The data rate (Kbits per second) depends on the PPS rate and the packet length [6].
Client to Server. The author of [4] assumed that slower client machines have a lower packet rate because they require more processing time for rendering. This assumption has been confirmed by [1], [4], [11]. The packet rate of individual clients depends on the graphics card and the map played [11]. The PPS rate in Half-Life varies from 22 to 24 packets per second, and the mean data rate is about 13 Kbits per second [6]. The bandwidth used by clients in a two-player Starcraft game is around 600 Bytes per second. Each additional pair of players adds about 1500 Bytes per second, so the bandwidth can be modeled as a linear function [3]. The average bandwidth in Counter-Strike is around 2800 Bytes per second [3].
C. Packet inter-arrival times
Server to Client. Inter-arrival times of Counter-Strike, Half-Life and Quake III servers are very steady [4], [6], [11]. Lang et al. [6] show that they depend only slightly on the map played. In general, neither the map nor the number of clients connected to a server has a notable influence on inter-arrival times, as shown in Figure 3 [11], [13]. One update packet is sent to the clients every 60 ms in Half-Life [6], every 50 ms in Quake III [11], [13], and every 62 ms in Counter-Strike [4]. Packet inter-arrival times can be modeled with fixed time intervals depending on the map played [6]. Other approaches ignore the map type and model inter-arrival times with a complex gamma distribution [11] or an extreme value distribution [1], [4], use mixtures of various distributions (normal, extreme value, deterministic, exponential) [9], or approximate them with a simple spike at a constant value [11].
Client to Server.
The client inter-arrival time is independent of the number of players: as the number of clients increases, the mean packet inter-arrival times of all clients remain similar [13]. Packet transmission from the different clients strongly depends on the graphics card and rendering software used [6], [11]. Clients' graphics cards and the map played influence the traffic characteristics. Modern graphics systems with more memory and faster graphics processing chips send more packets per second than machines with older graphics cards [11]. Various parameters and distribution functions for multiple game genres can be extracted from the study by Lakkakorpi et al. [9].
Fig. 3. Quake III - Probability Density Function (PDF) of packet inter-arrival times for server and client considering number of players and in-game behavior [13]
VI. MASSIVELY MULTIPLAYER ONLINE ROLE-PLAYING GAMES (MMORPGS)
In [7], the authors present a traffic model for Lineage, a popular MMORPG. The authors of [2] analyzed an extensive traffic trace of a popular Korean MMORPG called ShenZhou Online. Both traffic models were developed for 2D games. Interestingly, essentially the same game in 2D and in full-3D graphics (Lineage I and Lineage II) shows different bandwidth consumption [8]. Motivated by this, [8] presents a traffic characterization and model for Lineage II, the full-3D successor of Lineage I. Papers focusing on traffic patterns generated by the popular MMORPG World of Warcraft (WoW) are presented in [13] and [15]. A key characteristic of the game traffic discussed so far is that it consists of small, highly periodic UDP packets. However, most MMORPGs exchange messages over TCP because of their client-server structure and the convenience of connection management [7].
A. Bandwidth Consumption
The trend in downlink and uplink bandwidth consumption can be analyzed over a traffic measurement period.
It was observed that on weekdays the bandwidth in Lineage II oscillates between 20 Mbps and 100 Mbps [8]. The highest consumption was found on holidays [15], with more than 140 Mbps in Lineage II [7]. The low packet rate and small payloads of MMORPGs such as Shen Zhou Online [2] lead to low bandwidth requirements for this type of game. Nearly all connections consume less than 8 Kbps when TCP ACKs are taken into account. This usage is much lower than the average of 40 Kbps needed by Counter-Strike [2]. This surprising result might stem from player behavior in these two different types of games: player behavior in MMORPGs is relatively slow-paced compared to FPS games, and the bandwidth consumption is comparable to that of the online RTS game Warcraft III [2]. The downstream bandwidth in Lineage II [8] is substantially larger than the upstream bandwidth, and this asymmetry is much more pronounced than in Lineage I. It is caused by the different graphical representation of Lineage II compared with Lineage I. Svoboda et al. [15] compared their traces with a video recording and concluded that high peaks in the downlink correlate with intense interaction with the environment. This raises the question of how the number of users affects bandwidth. The relationship between the number of users and bandwidth was found to be linear (see Fig. 4) [7], [8]. This results from the structure of common MMOGs: not every client's message to the server is broadcast; instead, messages are forwarded only to a small group of users who are near the sender or in the same area [7].
Fig. 4. Correlation between number of users and bandwidth [7]
B. Packet lengths
Empirical analysis of Lineage I showed that the smallest packets carry no data at all, only 40 Bytes of header. These are pure control packets such as SYN, ACK and FIN [7]. The average client-to-server packet size is about 49 Bytes, and the average server-to-client packet size is about 76 Bytes [7]. The client packet size in Shen Zhou Online without the TCP/IP header is about 40 Bytes. Server packet sizes are much more widely distributed, with an average payload size of 114 Bytes [2]. As discussed previously, server packets contain data about multiple players. In MMOGs in particular, the server does not send the data of all players but only of the players in the player's close environment. Lineage I packet size is best modeled with a Power Lognormal distribution [7].
C. Packet Inter-arrival Times
When a player is not actively playing, the inter-arrival time of the server is more than 200 ms. When the player is in a battle, server and client transmit more packets to each other and the inter-arrival time is shortest [13]. Lineage II traffic analysis shows that aggregate packet inter-arrival times depend on the number of concurrent users [8]. Most packets (99 percent) arrive within 270 microseconds, which is shorter than the 2 ms of Lineage I. This difference is caused by the number of concurrent users, the game structure and the full-3D graphics of Lineage II [8] compared to Lineage I. The inter-arrival model for WoW does not exactly correspond to the empirical distribution, but both server and client are best modeled with normal distributions [13]. Similar problems in matching the observed data were encountered for Lineage I traffic, for which the best-matching model was the Extreme Value distribution.
D. Comparison of MMOG traffic with other game genres
The game pace of MMORPGs is slower than that of FPS games, so the server and client inter-arrival times of World of Warcraft are longer than those of Quake III [13]. In games such as FPS, RTS, and FTG (fighting games), players must always be active during a game or they will be defeated. In MMORPGs and other adventure-oriented games, players do not need to "play" all the time [2], so they spend a lot of time idle or communicating with other players via the in-game chat. Non-RPG games are mostly round-based; hence, their session times are very short compared to MMOGs. The analysis of session durations in Lineage II shows the magnitude of this effect [8]: the observed average session duration is 183 minutes [8]. The distribution of session durations is heavy-tailed, meaning that the average is mostly driven by addicted users who play more than 80 hours during the measurement period. The average playing time in Lineage II is much higher than the average time of about minutes reported in [7].
VII. CONCLUSION AND FUTURE WORK
Network games attract millions of people and generate a significant part of today's Internet traffic. The increasing popularity of such interactive applications drives the introduction and development of new hardware, software and architectures. These techniques have to be validated to provide low-latency gaming environments. A promising approach is to develop benchmarks for validating and evaluating quality of service (QoS) aspects with respect to games. However, the question arises of how realistic workloads can be generated. This work presented an overview of recent traffic measurement methods, traffic characterization and low-level traffic modeling of popular multiplayer games, where low-level traffic models are used to generate workload. The main finding is that the traffic of various game genres can depend on client hardware, game architecture and other factors. In general, traffic can be classified by the real-time requirements of the different game genres, but a separate category has to be introduced for MMORPGs because of their special traffic characteristics. On the one hand, action, simulator and RTS games require short reaction times; they generate real-time traffic and are especially sensitive to network characteristics. On the other hand, MMORPGs have low bandwidth consumption; they generate small packets and have very long-tailed session durations. The most significant effect of 3D-based MMORPGs (Lineage II) compared to 2D MMORPGs (Lineage I) is the asymmetry between upstream and downstream traffic. Different measurement methods, various setups and different analyzed parameters make it difficult to compare the presented studies. Consequently, a standard for measuring game traffic will need to be developed. Most of the presented studies mention in-game behavior but ignore it in their traffic models. Future experiments on other game genres or types of network games should take up the challenge of including in-game behavior in low-level traffic models. The presented results and studies can serve as a basis for future benchmark development.
REFERENCES
[1] M. S. Borella. Source models of network game traffic. Computer Communications, 23(4).
[2] Kuan-Ta Chen, Polly Huang, and Chin-Laung Lei. Game traffic analysis: An mmorpg perspective. Computer Networks, Nov.
[3] Mark Claypool, David LaPoint, and Josh Winslow, editors. Network Analysis of Counter-strike and Starcraft. In Proceedings of the 22nd IEEE International Performance, Computing, and Communications Conference (IPCCC), Phoenix, Arizona, USA.
[4] Johannes Färber. Network game traffic modelling. ACM, New York, NY, USA.
[5] Wu-chang Feng, Francis Chang, Wu-chi Feng, and Jonathan Walpole. Provisioning on-line games: a traffic analysis of a busy counter-strike server. In IMW '02: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurement, New York, NY, USA, ACM.
[6] Tanja Lang, Grenville Armitage, Phillip Branch, and Hwan-yi Choo. A synthetic traffic model for half-life. In Australian Telecommunications, Networks and Applications Conference (ATNAC).
[7] J. Kim, E. Hong, and Y. Choi. Measurement and analysis of a massively multiplayer online role playing game traffic, August.
[8] Jaecheol Kim, Jaeyoung Choi, Dukhyun Chang, Taekyoung Kwon, Yanghee Choi, and Eungsu Yuk. Traffic characteristics of a massively multi-player online role playing game. In NetGames '05: Proceedings of 4th ACM SIGCOMM workshop on Network and system support for games, pages 1-8, New York, NY, USA, ACM.
[9] J. Lakkakorpi, A. Heiner, and J. Ruutu. Measurement and characterization of internet gaming traffic. Research Seminar on Networking, February.
[10] T. Lang and G. Armitage. A ns-2 model for the system link game halo. Australian Telecommunications Networks and Applications Conference (ATNAC).
[11] Tanja Lang, Philip Branch, and Grenville Armitage. A synthetic traffic model for Quake3. ACM, New York, NY, USA.
[12] Mingzhe Li, Mark Claypool, Robert Kinicki, and James Nichols. Characteristics of streaming media stored on the web. ACM Trans. Internet Technol., 5(4).
[13] HyoJoo Park, TaeYong Kim, and SaJoong Kim. Network traffic analysis and modeling for games. Internet and Network Economics.
[14] S.P. Pederson and M.E. Johnson. Estimating model discrepancy. In Technometrics 32.
[15] P. Svoboda, W. Karner, and M. Rupp. Traffic analysis and modeling for world of warcraft, June.
[16] Swee Ann Tan, William Lau, and Allan Loh. Networked game mobility model for first-person-shooter games. In NetGames '05: Proceedings of 4th ACM SIGCOMM workshop on Network and system support for games, pages 1-9, New York, NY, USA, ACM.
[17] International Telecommunication Union. One way transmission time. Technical report, Recommendation G.114, February.
[18] Sebastian Zander and Grenville Armitage. A traffic model for the xbox game halo 2. In NOSSDAV '05: Proceedings of the international workshop on Network and operating systems support for digital audio and video, pages 13-18, New York, NY, USA, ACM.

Sybil resistant DHT
Niklas Büscher
Abstract
Distributed Hash Tables (DHTs) are often exposed to an attack, called the Sybil attack, in which an adversary introduces many false identities into the DHT network. These malicious nodes increase the influence of the attacker and undermine the peer-to-peer structure, which leads to a malfunctioning system. We give a short overview of the different types of solutions for resisting this attack and then concentrate on two of them. We discuss their theoretical potential and promises in contrast to their usability and practical feasibility.
I. INTRODUCTION
Many systems on the Internet are vulnerable to an attack, called the Sybil attack, in which an adversary creates many malicious identities. The attacker, a single entity, creates many identities to interact with the attacked system. Probably the most common victims are voting and reputation systems: if a single attacker submits a huge number of votes to a poll, he influences the result. In 2009, Time magazine started an online poll asking who the most influential people were. The most-voted people were unknown persons, led by someone with the pseudonym moot. A few Internet users from 4chan, an image board, found a way to submit more than one vote and thus submitted many millions of votes with just a few physical entities. Time magazine's online service could not determine which physical entity had already voted, and so it accepted many millions of virtual identities belonging to the same users. Many reputation systems can also be undermined by creating many false identities that influence the reputation process in the adversary's interest. The most common examples are Google's PageRank and the reputation systems of online stores such as eBay or Amazon, where an attacker lowers the ratings of his competitors' products or increases the ratings of his own products.
The reliability of any distributed network depends on the reliability of each single node. If an attacker gains access to many false nodes or creates numerous malicious nodes, he gains influence over the network's behavior. There are many ways to influence and disturb peer-to-peer systems using a huge number of controlled nodes. For example, an attacker could flood the whole network by sending many nonsense requests from his many malicious identities. This stops or slows the routing and information exchange process, because many honest nodes have to work for the false identities and have few resources left for honest nodes. Another major problem is the vulnerability of the routing process. Peer-to-peer systems often use multi-hop routing, so a large number of malicious nodes increases the probability that a malicious node lies on the path between two honest nodes. Our main focus in this paper is how to resist this routing problem. The following sections provide an overview of solutions to the Sybil attack. In Section II, we give an introduction to DHTs and online social networks. Section III follows with a description of the Sybil attack and its possible impact on DHTs. In Section IV, we give a classification of solutions to this attack, and Sections V and VI discuss these solutions in more detail. In Section VII, we conclude our overview of the Sybil attack and give an outlook on further research.
II. BACKGROUND
A. Distributed Hash Tables
Distributed hash tables (DHTs) are part of many structured peer-to-peer systems. DHTs provide a lookup structure similar to that of a hash table. From a black-box view, a DHT system provides two functions: one to store (put) a (key, data) pair in the network and one to retrieve the data associated with a given key. Every node is responsible for a range of key values and stores the associated data.
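The put/get interface and the key-range responsibility described above can be illustrated with a single-process toy model. This is a minimal sketch. It assumes a Chord-style successor rule, under which a key is stored on the first node whose ring ID is equal to or follows the key's ID, wrapping around the ring. IDs are SHA-1 hashes truncated to a hypothetical 16-bit ring. `ToyDHT`, `ring_id` and `M_BITS` are illustrative names. There is no networking, replication or multi-hop routing; every node is looked up in one global sorted list.

```python
import bisect
import hashlib

M_BITS = 16  # hypothetical ring size for the sketch (Chord derives IDs from SHA-1)
RING_SIZE = 2 ** M_BITS


def ring_id(name: str) -> int:
    """Map a node name or key onto the identifier ring (ID = H(u) mod 2^m)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class ToyDHT:
    """Single-process model of a DHT's put/get interface (no network, no routing)."""

    def __init__(self, node_names):
        # Sorted, de-duplicated node IDs make the successor search a binary search.
        self.node_ids = sorted({ring_id(n) for n in node_names})
        # One local key/value store per node.
        self.store = {nid: {} for nid in self.node_ids}

    def responsible_node(self, key: str) -> int:
        """Successor rule: first node ID >= key ID, wrapping around the ring."""
        i = bisect.bisect_left(self.node_ids, ring_id(key))
        return self.node_ids[i % len(self.node_ids)]

    def put(self, key: str, data) -> None:
        self.store[self.responsible_node(key)][key] = data

    def get(self, key: str):
        # Returns None if no data was stored under this key.
        return self.store[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("song.mp3", "peer 10.0.0.7")
print(dht.get("song.mp3"))  # prints: peer 10.0.0.7
```

Because node IDs are hash values, the key ranges are only approximately uniform. A real DHT also cannot consult a global node list as this sketch does, which is why Chord relies on successor links and finger tables instead.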
The goal is to divide the keyspace into uniform parts so that every participating node has the same amount of work to do. The responsibility for maintaining the mapping from keys to values is distributed among the nodes, which allows DHTs to scale to extremely large numbers of nodes. An example of a DHT-based peer-to-peer system is the Chord network [8]. It has the characteristic features of a DHT and serves as the basis for our later discussion of solutions to the Sybil attack. We first look at the structure of the Chord network and its routing process. Chord nodes organize themselves in a ring structure (Fig. 1). In a well-functioning network, each joining node looks up its own position in the ring based on its own ID (e.g., ID = H(u)) and then contacts or creates links to its successors. This should guarantee that all nodes are reachable. To increase the efficiency of this structure, each node keeps a table of fingers to other nodes. These fingers are carefully chosen around the ring so that lookups can be performed quickly without many hops. The goal is an exponential distribution of fingers around the ring, which enables lookups in O(log n) hops: short jumps can be made at fine granularity, while long distances can be covered quickly. As mentioned before, stored data is associated with a key derived from the hash function H. Each node is responsible for the keys (and the associated data) next to its own ID.
Fig. 1. The routing structure of the Chord network.
The typical lookup strategy of a node starts with a lookup in its own successor and finger tables for the node closest to the target. The node then sends a request to this chosen finger, which either replies with the requested data or forwards the request recursively until the queried node is found. This standard lookup procedure is called multi-hop closeness routing [1]. In typical DHTs, there are two ways to perform a multi-hop lookup: iteratively or recursively. An iterative lookup requires more effort from the querying node but offers more control over the routing process: the querying node asks its fingers for the next possible hops and then queries the next hop itself. In contrast, in a recursive lookup the fingers themselves are responsible for forwarding the request.
B. Social Networks - Bootstrap Graph
An online social network service enables persons and institutions to build a digital representation of parts of their real-world relationships. These virtual representations form a well-connected network graph of virtual identities and the links between them. A bootstrap graph is the initial situation for the routing process in peer-to-peer systems [1], [7]. An online social network can be used as a bootstrap graph. This bootstrap approach makes it possible to find a probably honest route from node to node, because online social networks (should) reflect trusted real-world relationships.
III. SYBIL ATTACK
The name Sybil attack was introduced by Douceur [3], [4], based on a novel about the multiple identities of its main character, Sybil.
There is no formal definition of the Sybil attack, but the literature [1], [3], [5], [10] consistently describes it as one physical entity creating many (virtual) identities to increase its influence. As described in the introduction, the Sybil attack occurs in a large number of domains; we focus on the Sybil attack on DHTs. The reliability of a DHT network depends on the reliability of each single node. If an attacker gains control of many nodes or creates numerous malicious nodes, he gains influence over the behavior of the DHT network. There are at least two targets for an attacker: on the one hand the data stored in the DHT, which can be manipulated or deleted, and on the other hand the routing process, which can be disturbed. The distribution of the Sybil nodes determines which attacks are possible. In Chord, a region of the ring full of Sybil nodes has a large influence on the data stored in that region, but its impact on the rest of the ring and on the routing process differs from that of Sybil nodes distributed evenly around the ring. Regarding the routing process, a Sybil attacker can flood the whole network with nonsense requests from his malicious identities to stop or slow down routing and information exchange in the peer-to-peer network. Another attack with many malicious Sybil nodes targets the routing of honest nodes. When an honest node searches for another node or for content in the overlay, it looks up the next hop in its routing table (depending on the routing protocol of the DHT). If the next hop is a false node, this node has at least two ways to disturb the routing of the honest node. First, it can simply stop forwarding the request to the next node closer to the target.
Depending on the routing protocol, the honest node then has to wait until a timeout is reached before it can continue with the next node. Depending on the fraction of false nodes in the whole network, routing can take a long time, because honest nodes hit many false nodes. Second, every malicious node can answer a routing request with a link to another malicious node controlled by the adversary, so that the honest node is sent from malicious node to malicious node. This behavior is known as the "wild goose chase", and the attacking nodes are called Byzantine adversaries. In the next section we classify known solutions to the Sybil attack against the routing process in DHT networks.

IV. OVERVIEW OF SOLUTIONS
There are many ways to classify solutions to the Sybil attack on DHTs [6]. The variety already starts with the different types of DHT systems, since many DHTs implement different routing protocols and organizational structures. Other possible classification criteria are the system type and environment (structured, unstructured, routing protocol, ...), the means used, the costs, and possibly others. Furthermore, the solutions can be divided into two types: detecting the attack, and preventing or reacting to it. Our main focus is on prevention and on the robustness of the routing process in a structured DHT peer-to-peer network.

We therefore distinguish between two main classes of solutions.

A. Architecture-keeping solutions
The first group uses existing structures and does not need any architectural redesign, in contrast to the second group, which requires architectural changes and new structures. The solutions of the architecture-keeping group can be used in many existing DHT networks without changing the DHT network or the bootstrap approach. The solutions in this group are flooding and resource testing. Flooding is a theoretical rather than a practical strategy for protecting the routing process of large DHTs against the Sybil attack. Flooding the whole network with routing requests is too expensive in terms of the number of messages, but it guarantees that a querying node reaches all connected honest nodes. Flooding is therefore only useful in small networks, and we will not examine it further. Resource testing has been mentioned as a useful approach to fortify against the Sybil attack [6]. Resource testing can be described as a stress test of testable and bounded resources. The main idea behind it is the assumption that one physical entity has only limited access to resources such as IP addresses or CPU time. Resource testing can be further divided into two classes: one-time resource testing and continuous resource testing, called recurring costs. However, Douceur [3] has proven that resource testing is not feasible under usual conditions. The next section offers more details on the feasibility of resource testing.

B. Architecture-extending solutions
The second group of solutions requires architectural changes or additional means to provide a Sybil-resistant DHT network. Probably one of the most frequently mentioned solutions in this group is a central authority that controls access to the DHT. Every node has to register with the central authority or central server, so that there is some check between the central authority and each node.
This can be done with well-known means such as verification or identification via identity cards. The central authority certifies the presented identities, and every node has to present its own signed identity to other nodes. This solution appears to give perfect results in the presence of a strong central authority [6]. The main problem of this solution, however, is the need for a central server, which is uncommon in peer-to-peer systems and reduces the advantages of distributed systems. Another problem is that the strength of this solution depends on the strength of the authorization process of the central server. Many CAPTCHAs, for example, do not withstand computational attacks for long, and there is also the possibility of obtaining many accounts. Hence the Sybil problem is shifted from the peer-to-peer system to the strength of a single authorization process.

Another class of architecture-extending approaches makes use of social networks or bootstrap graphs. Section VI gives a detailed description of two solutions based on a social network. Both solutions were (co-)authored by Chris Lesniewski-Laas [1], [5] but follow different ideas. The first solution builds a one-hop DHT with random walks through the social network and derives a probabilistic model that ensures a small lookup failure rate. The second solution follows the opposite, multi-hop approach. Its main idea is to avoid placing too much trust in any single node; to this end, it records the usage of every finger and maintains so-called trust profiles. Both solutions show good simulation results, but make critical assumptions about the social network, which are discussed in Section VI together with a detailed description of both solutions. Another solution mentioned in a survey of solutions to the Sybil attack is trusted devices [6]. This is a hardware-based solution which ensures, for example via cryptographic functions, that the using entity has only a single identity.
We will not consider this approach further, because such hardware devices are not common in consumer computers. All solutions of both groups have advantages and disadvantages depending on the requirements of the user. Section VII summarizes these advantages and disadvantages; the next section takes a closer look at the proof that resource testing is infeasible, and Section VI follows with a detailed description of the most promising distributed approaches based on social networks.

V. USELESS RESOURCE TESTING?
As described, resource testing tries to determine whether every new node introduced to the peer-to-peer network is a separate physical entity with its own computational or bandwidth resources. This can be done by checking the IP address of each node and disallowing duplicates. Another way of resource testing is to check the computational power of each node with small computational puzzles such as brute-forcing hashes. Resource testing has drawbacks, such as the power consumed by CPU puzzles and the exclusion of users behind a NAT³, but these are not its main problem. John R. Douceur [3] has proven that resource testing is a very inefficient way to prevent the Sybil attack. He introduced a very simple formal model and used it to state and prove four lemmas showing that resource testing is of little use. His model of a peer-to-peer network consists of three parts (Fig. 2):
1) a set of infrastructural entities,
2) a cloud that offers broadcast communication,
3) a pipe connecting each entity to the cloud.
Nodes send messages through their pipe to the broadcasting cloud to communicate with other nodes. Douceur distinguishes between entities and identities: an entity is a physical unit connected to the network, which can present one or more identities to the peer-to-peer network. In a network without any central server, each entity accepts only identities that it has validated directly itself, or identities that are accepted by other identities who vouch for the new identity.
We do not go into every detail, but show by example how the proof works.

Fig. 2. Formal model of a distributed environment [3].

The first lemma shows that even under severe resource constraints, a faulty entity can register a constant number of different identities in the direct validation process:

Lemma 1: If $\rho$ is the ratio of the resources of a faulty entity f to the resources of a minimally capable entity, then f can present $g = \lfloor \rho \rfloor$ distinct identities to local entity l. [3]

The proof is quite obvious, but this first lemma already shows one problem of resource testing. If the amount of the tested resource differs from entity to entity, the test must be calibrated to a minimal lower bound of resource capacity, and every entity with more resources can then present $g = \lfloor \rho \rfloor$ identities to other entities. The other three lemmas show that each entity has to validate all identities presented to it simultaneously; otherwise an attacker can register a large or even unbounded number of false identities. They also show that a sufficiently large set of faulty entities can register an unbounded number of identities by vouching for each other. Finally, they show that all nodes must perform their identity validations at the same time to resist the registration of multiple identities. These lemmas and proofs lead Douceur to the statement that a Sybil-resistant resource testing approach has to fulfill the following requirements [3]:
1) All entities operate under nearly identical resource constraints.
2) All presented identities are validated simultaneously by all entities, coordinated across the system.
3) When accepting identities that are not directly validated, the required number of vouchers exceeds the number of system-wide failures.
These requirements probably cannot be fulfilled in any practical distributed system. Douceur's model is kept very simple, but it can easily be transferred to other peer-to-peer systems and demonstrates the weakness of resource testing.

³ Network Address Translation: computers behind a router share one public IP address.

VI. BOOTSTRAP-BASED DHT ROUTING
Both solutions in the following subsections are based on a bootstrap graph, as described in Section II. Both assume that an attacker can register new identities in the social network and build up contacts between these identities. But they also assume that the attacker convinces only a small number of honest nodes to create a social link between an attacker node and an honest node, so there are only a few links from honest nodes to malicious nodes. These links between honest and false nodes are called attack edges. The attacker can, however, introduce many more malicious nodes behind these attachment points.

A. One-Hop DHT, probabilistic approach
The first solution solves the Byzantine Dissidents Problem in a DHT network with a probabilistic one-hop routing protocol. The Byzantine Dissidents Problem was mentioned at the beginning as the wild goose chase. The solution was introduced by Chris Lesniewski-Laas [5], and its basic idea is to use the assumptions about the social network to decrease the lookup failure rate. The approach builds its own one-hop DHT, although it is based on the SybilLimit system [9]; a description of SybilLimit is not necessary to understand this solution. The social network is used to construct a finger table for every node. The underlying social network has n honest nodes connected by m undirected edges, and it should be fast mixing. The mixing time measures how many random-walk steps are needed before the end point of a walk becomes (nearly) uniformly distributed: every random walk of length d starting from a node k should end at a (nearly) uniformly distributed honest node. The network is called fast mixing if $d = O(\log n)$. Every node constructs its routing table with $r = \Theta(\sqrt{m}\,\log m)$ entries.
Each entry is the end point of a random walk of length d that starts along one of the node's social links. These end points are called fingers, so every node constructs its routing table of r fingers by initiating r random walks. Lesniewski-Laas estimates the probability that a random walk of length d escapes into the Sybil region (i.e. hits a false node), called the escape probability, as $O(g \log n / n)$, where g is the number of attack edges in the social network. He assumes that g is bounded by $o(n / \log n)$, which limits the escape probability to $o(1)$. His work is thus based on the assumption that the attacker cannot create more than $o(n / \log n)$ attack edges; we come back to this assumption later. The routing algorithm itself is simple. A source node s searching for a target node broadcasts the target identifier t to all its fingers $f_i$. Lesniewski-Laas assumes that at least $\Omega(r)$ fingers of s are honest and that at least one of these honest fingers has the target t in its routing table. The probability that the target is in the table of a given honest finger is $\Omega(r)/m$, where m is the number of edges between honest nodes. From this, the probability that a request fails is $p_{\mathrm{fail}} = O(1/m)$, so the protocol offers a high probability $p_{\mathrm{hit}} = 1 - p_{\mathrm{fail}}$ of finding the target without being misled by a false node.

TABLE I
SUMMARY OF APPROACHES TO FORTIFY AGAINST THE SYBIL ATTACK
Name | Requirements | Success | Disadvantages
Central server | Central server, authorization process, certification | A good authorization process results in good resistance. | The strength against any Sybil attack is reduced to the strength of the central server. Single point of failure. A central server is not common in peer-to-peer systems.
Resource testing | All entities have identical resources; all presented identities are validated simultaneously by all entities. | Works under perfect conditions. | It is nearly impossible to reach perfect conditions.
One-hop Sybil-proof DHT | Initial bootstrap graph | Fortifies very well against $o(n/\log n)$ attack edges; effectiveness decreases very fast above $o(n/\log n)$ attack edges. | Requires large finger tables and even more routing messages. Requires an uncommon bootstrap graph model. Protects only against $o(n/\log n)$ attack edges.
Multi-hop trust-based routing | Continuous social network | Routing is still possible with many attack edges. | Requires a bootstrap graph. Requires more bandwidth in the absence of any adversary. Problems with many bottlenecks or a non-uniform distribution of the most connected nodes. The assumption about the attacker may not hold.

Lesniewski-Laas also describes improvements to his protocol that reduce the number of messages sent. As described, the protocol begins with a broadcast from node s to all its fingers to look up the target, which results in at least $r = \Theta(\sqrt{m}\,\log m)$ messages. Lesniewski-Laas therefore suggests using features of existing peer-to-peer networks such as Chord. Every node constructs its finger table as described before, but additionally creates a second table of successor nodes. To obtain these successors, each node u asks every node f in its finger table for the $k = \log m$ nodes that follow the hash value of u, ID = H(u). The advantage of this approach is that there are at least $\Omega(r)$ honest nodes in the finger table, and at least these $\Omega(r)$ nodes will reply with valid successors of u. Using these techniques, Lesniewski-Laas shows that $O(1)$ messages are enough to reach the next hop.
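The finger construction and broadcast lookup described above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not Lesniewski-Laas's implementation [5]: a random graph stands in for a fast-mixing social network, the constant hidden in r = Θ(√m log m) is taken as 1 (natural logarithm), the walk length is d = ⌈log₂ n⌉, the successor tables and message optimizations are omitted, and a Sybil finger is modeled as one that simply stays silent. All function names are hypothetical.

```python
import math
import random


def random_social_graph(n, degree, rng):
    """Hypothetical stand-in for a fast-mixing social graph: random undirected links."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        while len(adj[v]) < degree:
            u = rng.randrange(n)
            if u != v:
                adj[v].add(u)
                adj[u].add(v)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}   # sorted: reproducible walks


def random_walk(graph, start, length, rng):
    """Return the end point of a random walk of `length` steps."""
    node = start
    for _ in range(length):
        node = rng.choice(graph[node])
    return node


def build_tables(graph, r, d, rng):
    """Each node's table holds the end points (fingers) of r random walks of length d."""
    return {v: [random_walk(graph, v, d, rng) for _ in range(r)] for v in graph}


def one_hop_lookup(tables, source, target, sybils=frozenset()):
    """Source broadcasts `target` to all its fingers; an honest finger answers if it
    is the target or has the target in its own table. Sybil fingers stay silent.
    Returns (found, messages_sent)."""
    fingers = tables[source]
    found = any(f not in sybils and (f == target or target in tables[f])
                for f in fingers)
    return found, len(fingers)


def demo(n=200, degree=6, trials=100, seed=7):
    """Return (fraction of successful lookups, table size r) for a random instance."""
    rng = random.Random(seed)
    graph = random_social_graph(n, degree, rng)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2   # number of honest edges
    r = math.ceil(math.sqrt(m) * math.log(m))   # assumption: constant in Theta(.) is 1
    d = math.ceil(math.log2(n))                 # assumption: fast mixing, d = O(log n)
    tables = build_tables(graph, r, d, rng)
    hits = 0
    for _ in range(trials):
        s, t = rng.randrange(n), rng.randrange(n)
        found, _ = one_hop_lookup(tables, s, t)
        hits += found
    return hits / trials, r
```

Calling demo() returns the fraction of successful lookups and the table size r; passing a set of node IDs as the sybils argument of one_hop_lookup shows how a lookup fails once every finger that knows the target is controlled by the attacker.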
In the worst case, node u has to send $\log m$ messages to find the target node t. The paper is the first approach to a Sybil-resistant routing protocol with a sublinear number of messages (the simple solution is to broadcast $O(n)$ messages, described as flooding in Section IV). Lesniewski-Laas supports his proposal with simulation results. His proof-of-concept simulation uses a graph with … nodes and … edges, without separated clusters and without nodes of degree smaller than 5. In the simulation, … randomly chosen nodes were switched from honest to false, and all edges to these nodes were marked as attack edges. As estimated, with $g = o(n/\log n)$ attack edges the fraction of failed lookups was very low (< 5%). Even with somewhat more attack edges the routing protocol successfully resists malicious nodes, but as the number of attack edges rises, the number of failed lookups increases dramatically; for example, with g = … the lookup failure rate is reported as 15.8%. Besides the assumption about the attacker, one problem of this approach is the size of the routing table. Lesniewski-Laas uses a finger table size of $r = \Theta(\sqrt{m}\,\log m)$ to prove the Sybil resistance of his routing. A larger peer-to-peer network with approximately … edges would require maintaining a routing table with more than … entries per node. Since nodes normally join and leave frequently, maintaining such tables is even harder.

B. Multi-Hop DHT, uniform trust approach
Danezis, Lesniewski-Laas, Kaashoek and Anderson presented another solution to the Sybil attack based on a bootstrap graph in their paper Sybil-resistant DHT routing [1]. In contrast to the first approach, this solution is based on a multi-hop routing protocol implemented in Chord, but the Sybil-resistant features they developed could be implemented in other networks, too. The main idea is to use all nodes in the network uniformly in order to avoid bottleneck nodes.
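The querier-side bookkeeping of the diversity routing described below (steps 1 to 3), together with the closeness, mixed and zig-zag selection rules, can be sketched as follows. It assumes that each candidate next hop is delivered with its social-network path, that a trust profile is the per-node usage histogram sorted in descending order and compared lexicographically, and that ranks start at 0 for the best candidate; the names are hypothetical and this is not the authors' code [1].

```python
from collections import Counter

RING = 2 ** 16   # assumed identifier space of a small Chord ring


def ring_distance(a: int, b: int) -> int:
    """Clockwise distance from identifier a to identifier b."""
    return (b - a) % RING


def trust_profile(history: Counter, path) -> list:
    """Usage counts of all nodes if `path` were added to the history, largest first."""
    return sorted((history + Counter(path)).values(), reverse=True)


def closeness_choice(candidates: dict, target: int) -> int:
    """Standard Chord behavior: the candidate closest to the target."""
    return min(candidates, key=lambda c: (ring_distance(c, target), c))


def diversity_choice(candidates: dict, history: Counter) -> int:
    """Candidate whose resulting trust profile is lexicographically smallest."""
    return min(candidates, key=lambda c: (trust_profile(history, candidates[c]), c))


def mixed_choice(candidates: dict, history: Counter, target: int, b: float) -> int:
    """Rank-based mix r_i = b*c_i + (1 - b)*d_i, rank 0 being the best."""
    by_close = sorted(candidates, key=lambda c: (ring_distance(c, target), c))
    by_div = sorted(candidates, key=lambda c: (trust_profile(history, candidates[c]), c))
    c_rank = {c: i for i, c in enumerate(by_close)}
    d_rank = {c: i for i, c in enumerate(by_div)}
    return min(candidates, key=lambda c: (b * c_rank[c] + (1 - b) * d_rank[c], c))


def zigzag_choice(step: int, candidates: dict, history: Counter, target: int) -> int:
    """Alternate per hop: even steps by closeness, odd steps by trust."""
    if step % 2 == 0:
        return closeness_choice(candidates, target)
    return diversity_choice(candidates, history)


def record(history: Counter, path) -> None:
    """Step 1: count every node on the social path to the node just queried."""
    history.update(path)
```

Here `candidates` maps a candidate's ring ID to the social path leading to it. With a history in which node "A" has already been used three times, diversity routing prefers a candidate reachable without "A", even if another candidate is closer to the target; zig-zag routing alternates between the two preferences from hop to hop.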
Given the assumption that the attacker is attached at only a few points in the network, the goal of this solution is not to place too much trust in any single node. The common routing strategy in Chord is closeness routing; as described before, it is a fast lookup strategy but susceptible to the wild goose chase. The authors therefore added a routing strategy called diversity routing, which, as the name suggests, tries to spread routing requests over many nodes. The implementation of diversity routing begins with two changes to the iterative strategy of Chord. First, every newly joining node has to discover its successor, predecessor and finger nodes as usual, but additionally the paths in the social network to these nodes. Second, a queried node has to return all nodes it knows about together with their connections in the social network, unlike closeness routing, where every queried node replies with just the next node closest to the target. Every querying node s can thus choose its own way to the target t using the information obtained from other nodes. The routing works as follows:
1) Each initiating node keeps track of the queried nodes by ID. It computes a histogram of how often every node in the network has appeared so far on the social-network path to a queried node. The authors call this histogram a trust profile.
2) A node that wants to continue a lookup calculates a trust profile for every candidate next hop. This profile contains the number of times each node is used on the way to the queried node.
3) These candidate profiles are compared to find the choice that adds the least trust to any single node or small set of nodes. The candidate with the smallest profile in lexicographic order is chosen.

Fig. 3. Illustrating a step of the diversity routing node selection [1].

Diversity routing on its own does not make progress in any particular direction, not even towards the queried target t. The authors therefore present two strategies to combine closeness and diversity routing. The first strategy, mixed routing, balances the choice between diversity and closeness by a factor $b \in [0, 1]$. Given the ranks $c_i$ for closeness routing and $d_i$ for diversity routing, the combined rank $r_i$ is calculated as

$r_i = b\,c_i + (1 - b)\,d_i$

The next hop is thus a weighted choice between safety and speed. However, this intuitive approach did not show good simulation results: even with an optimized b, mixed routing performs as badly as closeness routing in the presence of malicious nodes. The authors therefore propose another approach, called zig-zag routing. As the name suggests, the strategy changes from hop to hop: one step is chosen by closeness, the next by trust. In the authors' simulations, zig-zag routing shows the best results of the strategies mentioned in the presence of false nodes; the results are shown in Table II.

TABLE II
NUMBER OF QUERIES TO SATISFY 100 LOOKUPS IN CLOSENESS, MIXED AND ZIG-ZAG ROUTING (100 GOOD NODES) [1]
Number of bad nodes | Closeness | Mixed (b = 0.2) | Zig-Zag | Good entries in finger table (%)

The practicability of this approach suffers from the same problem as the previous bootstrap-graph approach: a lot of data about many other nodes has to be maintained in a network that probably changes quickly. Moreover, zig-zag routing doubles the length of each walk, so the probability of hitting false nodes increases.

C. Advantages
Both solutions show very good simulation results. Even with a large number of Sybil nodes, both approaches keep working and outperform standard routing such as closeness routing. Under the stated assumptions, the Byzantine Dissidents Problem is solved by both solutions.

D. Disadvantages
Both solutions have disadvantages that cannot be disregarded. First, both need an initial bootstrap graph, which is a major architectural change for many DHT systems. The even harder problem, however, is the assumption about the attacker and the number of attachment points in the social network. Social engineering techniques are sophisticated enough to introduce many Sybil contacts into a typical open social network. In summary, both approaches are promising under their assumptions, but further research under other assumptions is needed.

VII. CONCLUSION
We described solutions for resisting the Sybil attack; they are summarized in Table I. None of the given methods provides perfectly Sybil-resistant routing. Every method either requires means that are somewhat against the spirit of peer-to-peer networks (central server, social network), or does not successfully resist the Sybil attack (resource testing). All methods can be used in environments that fulfill their requirements. The Sybil attack still appears to be a problem for many peer-to-peer systems that cannot use a bootstrap graph model or a central server because of their architecture. In summary, the Sybil problem has to be solved in a domain-specific way. There is still a large open research field for methods that protect peer-to-peer networks against the Sybil attack with different means and in different domains. We also believe there is a trade-off between anonymity and Sybil attack prevention.

REFERENCES
[1] G. Danezis, C. Lesniewski-Laas, and M. F. Kaashoek.
Sybil-resistant DHT routing. Computer Security...
[2] J. Dinger and H. Hartenstein. Defending the sybil attack in p2p networks: Taxonomy, challenges, and a proposal for self-registration.
[3] J. Douceur. The sybil attack. Peer-to-Peer Systems.
[4] J. Douceur. Peer-to-Peer Systems, volume 2429 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, October.
[5] C. Lesniewski-Laas. A Sybil-proof one-hop DHT. Proceedings of the 1st workshop on Social...
[6] B. N. Levine, C. Shields, and N. B. Margolin. A survey of solutions to the sybil attack. University of Massachusetts Amherst...
[7] E. Sit and R. Morris. Security considerations for peer-to-peer distributed hash tables. Peer-to-Peer Systems.
[8] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1):17-32.
[9] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao. SybilLimit: A near-optimal social network defense against sybil attacks. In IEEE Symposium on Security and Privacy, SP 2008, pages 3-17.
[10] Q. Zhang, P. Wang, D. S. Reeves, and P. Ning. Defending against sybil attacks in sensor networks. In 25th IEEE International Conference on Distributed Computing Systems Workshops, 2005.

AI IN MODERN COMPUTER GAMES
AI in Modern Computer Games
Damian A. Czarny

Abstract: Game AI is an integral part of modern computer games, which today are quite capable of creating highly realistic and complex game worlds. This paper examines which challenges this creates for game AI and how they can be solved appropriately. The paper is divided into two parts. Part one aims to give an overview of the topic that is as beginner-friendly as possible, whereas the second part contains an exemplary study of game AI in first-person shooters. The concepts and implementation techniques discussed are presented using a model developed specifically for this paper, which is based on the rational agent model of S. Russell and an FPS bot architecture by P. Tozour.

I. INTRODUCTION
Since the success of Grand Theft Auto 4 (GTA 4), which according to its publisher Take-Two had already earned more than 770 million dollars by March 2009, computer games¹ can no longer be called niche products.² These figures are all the more impressive considering that computer games only became a mass phenomenon with the introduction of home and personal computers (PCs) in the 1980s, a lag of roughly 80 years behind a film industry that celebrated its first heyday in the silent film era as early as the end of the 19th century. But what exactly is a computer game, and how is it defined? The following definition describes the essential nature of a computer game quite aptly: A computer game is best understood as a self-contained microworld with its own rules and its own story.
Its central element is interactivity, i.e. a game allows the player a rich, interesting interaction with a game world created specifically for this purpose. [6, p. 32] In addition, every computer game has a number of typical components, which can be roughly divided into two areas:
Technical components: graphics, audio and physics
Gameplay components: game logic, story and artificial intelligence (AI)

¹ A distinction is often made between computer games (game software for PCs) and video games (game software for video game consoles connected to a TV set).
² aller_zeiten For comparison: to enter the list of the ten most successful film productions of all time, slightly more than 900 million dollars in revenue are currently required.

The remainder of this paper analyzes the AI component of computer games, also called game AI, in detail. On the one hand, the next part of the paper aims to provide an entry point into this complex subject that is as accessible as possible. On the other hand, by listing numerous elementary definitions, such as those of a computer game and of artificial intelligence, it establishes a consistent nomenclature for the third and most extensive part of the paper. That part examines the game AI of first-person shooters in order to make the requirements for AI in computer games more concrete, and presents various techniques to show how some of these requirements can be met. The paper closes with a conclusion and a brief but critical look at the current state of game AI development in the industry.

II. GAME AI: AN OVERVIEW
A. Artificial Intelligence and Game AI
Artificial intelligence (AI) primarily denotes a branch of computer science.
Its central concern is the (automated) creation of intelligence. Over time, traditional AI has examined the notion of intelligence from many different directions, and depending on its interpretation of the term, AI has pursued different paths to research and build intelligent systems. Because what intelligence means has been interpreted differently over time, a concrete definition of AI is quite problematic. Stuart Russell nevertheless attempts in [14, p. 17f.] to do justice to the various currents in AI and derives four overarching views from them, which define when an artificial system can be called intelligent. According to these, a system is intelligent if and only if it can think humanly, think rationally, act humanly or act rationally. In this context, rationality means the ability to do the right thing, or the best possible thing, with respect to a given guideline, namely a utility function to be maximized. Under the view of rationality, AI therefore concentrates on the goal of solving a problem optimally. The decisive point of this view is that it disregards the process by which the solution is reached. The solution only has to be optimal with respect to the utility function; how it was reached, whether by computation, logical inference or in a human-like way, is secondary. According to Russell, this view of intelligence, acting rationally, is the most appropriate one and therefore, in his opinion, currently the most worthwhile direction of AI research. Russell justifies this by pointing out that rational thinking always rests on correct logical inferences, and that these alone do not capture all of rationality, because there are often situations in which nothing provably correct can be done, yet something must be done. Whenever this paper refers to traditional AI in the following, the academic view of acting rationally is meant (cf. [14, p. 58ff.]).

Game AI is a collective term for the practical application of some subfields of traditional AI in computer games. Games are thus to be understood as an application scenario for the intelligent systems developed by AI. Game AI development, however, is not primarily an academic activity but an industrial one, which is why the term game AI has also become established to denote the different goals of the two camps in developing AI systems. The biggest difference between industry-driven game AI and traditional AI lies in their different approaches to problem solving. For game AI, the focus is not on solving an AI problem optimally but on a solution that maximizes entertainment value. Such a solution must therefore be oriented primarily towards fun, the hardware performance of the target customers and the goals of the game design. In the extreme case, a game AI is not interested in optimal results at all.
For example, the current difficulty model might specify that non-optimal actions be chosen in certain situations so as not to overwhelm the player. Choosing non-optimal results can also make sense for the sake of variety. The academic world, by contrast, ideally wants provably reproducible optimal results. Based on exactly this definition of goals, [11] concludes that the essential role of game AI is to produce behavior of non-player characters (NPCs) that the player finds comprehensible, i.e., believable, and that appears autonomous. Intelligent NPCs are meant to give the game world a certain liveliness and thus greater credibility, which has a positive effect on how much fun a game is.

B. Requirements and AI Types

To summarize, the main task of game AI is to create realistic, or above all believable, behavior for non-player characters such as opponents or teammates. But what requirements are placed on such a game AI, and do all computer games have the same requirements? The following examines these two questions more closely. In general, the following requirements for a game AI can be derived. They largely follow from the requirements for a human-level AI, mainly because most computer games are populated predominantly by human-like beings (cf. [9]). The simulated behavior should be believable, autonomous, robust, coordinated, communicative (internally and externally), planned and creative. Furthermore, it should react in real time, be capable of learning, perceive its environment and be able to interact with it.
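The idea from Section II-A of deliberately choosing non-optimal actions to match the difficulty level, which also serves the requirement of variety, can be illustrated with a minimal Python sketch. It assumes that the bot already has a utility value for each candidate action and that the game exposes a difficulty setting between 0 and 1; the function name `choose_action` and the example actions are hypothetical, and real games typically use more elaborate difficulty models.

```python
import random
from typing import Dict


def choose_action(utilities: Dict[str, float], difficulty: float,
                  rng: random.Random) -> str:
    """Return the best action, but deliberately return a worse one with a
    probability that grows as the (hypothetical) difficulty setting drops.

    difficulty = 1.0 -> always optimal
    difficulty = 0.0 -> never optimal (as long as an alternative exists)
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    best = max(utilities, key=utilities.get)
    alternatives = [a for a in utilities if a != best]
    if alternatives and rng.random() < 1.0 - difficulty:
        return rng.choice(alternatives)  # intentionally suboptimal choice
    return best


rng = random.Random(42)
utilities = {"headshot": 0.9, "body_shot": 0.6, "take_cover": 0.4}
print(choose_action(utilities, difficulty=1.0, rng=rng))  # headshot
```

Drawing the alternative uniformly is the simplest choice; a variant could prefer near-optimal alternatives so that the bot's mistakes still look plausible.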
The computer games industry places further requirements of its own on game AI, which arise essentially from economic considerations:
- adaptability, especially with regard to the difficulty level
- maximizing the entertainment value of the game (fun)
- variety, to keep the player engaged with the product for as long as possible
- good performance
- low cost

The concrete requirements for a game AI, and especially the degree to which they are realized, whether rudimentary or at the level of a human-level AI, depend largely on the game in which it is to be used. To sell well, computer games need a unique selling point, which is why the production of a computer game, and thus of its game AI, always involves a degree of uniqueness. In [9], J. Laird describes the requirements for a game AI in terms of so-called AI types. An AI type can be thought of as a role description that recurs across computer games. The first two AI types are relevant for the third part of this paper, because their requirements are examined more closely there. The remaining types are mentioned here only for completeness.

Tactical Enemies represent hostile units that the player has to fight. Their defining trait is intelligent, autonomous action. This includes path finding, exploring the environment and interacting with it. The latter refers to actions such as picking up a weapon from a table or opening doors. More than any other AI type, opponents have to live up to the player's expectations, because they are a central part of the game's challenges and therefore of its fun. Both the strengths and the weaknesses of the depicted opponent must be reproduced: if, for example, the enemy is a human guard, he should on the one hand have basic combat experience and on the other hand see less well in poorly lit rooms. Where appropriate, Tactical Enemies must also exhibit intelligent group behavior, which requires coordination and cooperation skills.

Partners are meant to support the player in the fight against Tactical Enemies. They closely resemble Tactical Enemies because they, too, must behave autonomously and believably in combat. Computer games, however, are almost always centered on the player, who accordingly often takes on a leadership role. The focus of this AI type is therefore not primarily autonomous action but following the player's orders. This requires human-to-machine communication and coordination, which, because of its high complexity, is currently implemented only with very limited means of expression, if at all. Such a simple command set might consist of the following predefined commands: defend, attack and follow me.
Essential for communication and cooperation with the player is the challenge of giving the AI an understanding of teamwork, which includes understanding the player's intentions.

Support Characters represent what at first glance is the simplest group of AI types. Their main purpose is to support the player outside of combat, for example as shopkeepers or quest givers. Human-to-machine communication is therefore their main requirement. However, machine-to-machine communication is also receiving more attention in current games. In Assassin's Creed³, for example, Support Characters are used to populate the world with believable extras. Each extra follows, in the background, an individual daily routine that need not be disjoint from those of the others. This example hints at the potential hidden here: this AI type can be covered minimally with little effort, but even small improvements can add noticeably to the atmosphere of a realistic game world.

Strategic Opponents are usually the player's adversaries³ in strategy games (see Section II-C). Their focus is resource management, which covers collecting, managing and using resources. The acquired resources are used to build structures or produce units, which are then deployed according to the opponent's own goals. Particularly important here are developing and adhering to an overarching strategy, so that the units can work toward one or more goals. In this context, certain limitations must also be imposed on the AI. With mouse and keyboard, the player has only a limited input speed for interacting with the game world. A game AI must therefore be adapted to the player's limitations so as not to give the player the impression that the game is unfair.
Units are the units under the player's command, used mainly in strategy games to achieve certain game objectives. Like Partners, their defining trait is that they primarily receive orders, which they then carry out on their own. They are expected to show a certain degree of independence so that the player does not have to specify every single step for every unit. This independence, however, forces the developer into a delicate balancing act. On the one hand, one wants intelligent and independent units; on the other hand, care must be taken that the challenge does not drop to zero, to the point where, in the extreme case, the player is no longer needed at all. Furthermore, hardware limits must be respected, as they would inevitably be reached in games with large numbers of units, each running its own AI routines.

Commentators and Story Directors are AI types needed by a relatively small number of games. Commentators are meant to analyze and interpret the game events, ideally conveying that interpretation to the player in natural language. Their main field of application is sports games, such as football or tennis, which are also commented on professionally in reality. Story Directors, by contrast, are the directors of a computer game. They monitor and steer a game with actions that drive the game forward. These actions may include, among other things, information and instructions for other AI types. An example would be instructing an allied faction that the player has betrayed to respond only with hostility toward that player from then on.

C. Game Genres

Following the film industry, computer games can be divided into so-called game genres.
However, the boundaries between individual genres are more fluid in the computer games industry than in film. This is further reinforced by the current trend of deliberately mixing genre elements, known as genre mixes. Nevertheless, the core elements of a computer game can be mapped onto the genres presented below. The genres are mentioned here only for completeness, which is why this overview is limited to the four most important genres, most of which can be characterized by the AI types introduced in the previous section. Further information and examples, as well as a more differentiated treatment of game genres, can be found in [9] and especially in [19].

Action Games: here the player takes direct control of a character or vehicle in the game world. The basic challenge of these games is to complete certain tasks that can only be mastered with sufficient skill, quick reactions and timing. Using weapons or martial-arts tactics, the player can defend against hostile units (Tactical Enemies). In addition, developers increasingly give the player computer-controlled companions (Partners), either temporarily or for the entire game. The first-person shooter genre used as the application scenario in Section III belongs to the action games.

Adventure Games belong to the oldest genre, the one said to come closest to interactive fiction. "Adventure" does not denote a particular setting, such as a western, outer space or the present day, but is an umbrella term for games focused on puzzles and tasks that require interaction with the environment. A sophisticated, cinematically staged plot is also central. Although adventures can certainly contain a lot of action, combat, and in particular direct confrontation with the opponent, is not the focus of the game.
Action in these games usually takes the form of mini-games, such as pressing certain key sequences under time pressure. Adventure games stand or fall with their Support Characters. The player must be able to interact with them and perceive them as realistic, goal-driven individuals. Furthermore, the Story Director is of great importance because of the emphasis on the plot and the effects on the game world that follow from it.

Strategy Games require fast and forward-looking strategic or tactical planning. The player controls a large number of combat units, which, depending on the setting, may represent tanks, knights, orcs or other figures. The player usually sees the game from a distant top-down view, also called the god view, which is primarily meant to provide the overview needed to command the units. The player's main task is to carry out the tasks listed for Strategic Opponents. Two kinds of strategy games are distinguished: turn-based strategy (TBS) and real-time strategy (RTS).

Sport Games primarily present virtual versions of well-known real sports, such as team sports, racing, athletics or extreme sports. The demands on the AI are diverse and challenging, because in most cases the player must be assumed to have extensive knowledge of the sport, so even the smallest deviations from reality could noticeably spoil the fun. The AI is generally used in three roles. The first role, especially in team sports games, is similar to that in strategy games: unit control is used to steer the individual athletes and deploy them according to an overarching tactic. This is necessary because the player can usually control only one athlete directly at any given time.
The strategic component is reinforced when the player also takes on management tasks as coach. The second role corresponds to that of Strategic Opponents, for example to lead the opposing units or the opposing team. The last role corresponds to the use of Commentators.

III. GAME AI IN FIRST-PERSON SHOOTERS: AN APPLICATION EXAMPLE

Having given an overview of the requirements for AI in different games in Section II, this part of the paper examines, as an example, the game AI of first-person shooters (FPS). FPS have by now become a subgenre of action games in their own right, in which the player, figuratively speaking, slips into the skin of the game character. The aim is to create the impression of perceiving the virtual environment through the senses of the game character, which are usually limited to sight and hearing. This is referred to as the first-person perspective, which is also used in other media such as books. The aim of this examination is to take the requirements of a Tactical Enemy from Section II-B and present some concrete techniques, concepts and mechanisms that can be used to meet some of them. Restricting the discussion to one AI type and one specific subgenre should make it possible to paint a self-contained picture of a game AI and its components despite the relatively small number of techniques covered. The examination begins with the concept of rationality mentioned in Section II-A, where rational action was presented as one possible view of intelligence. Viewing game AI in terms of agents as an overarching concept also provides a central point of access to AI as a scientific discipline and its proven techniques.
Moreover, agents can be regarded as the design basis for developing the units controlled by a game AI.

A. Rational Agents

In [14], Stuart Russell describes an agent as anything that can perceive its environment through sensors and act upon that environment through actuators. Applied to game AI, this simple concept means that an agent receives filtered information about its environment, the game world (referred to below as percepts), and uses this information to decide what its next action in the environment should be. It is also usually assumed that an agent can perceive its own actions, but not always their effects. Figure 1 illustrates this relationship. The function f shown in the middle of the figure is called the agent function; it maps a percept sequence P to the action to be executed.

Figure 1. Schematic structure of an agent. [14]

In this sense, a rational agent is an agent that does the right thing, or the best possible thing. To decide what is best, the agent needs a performance measure that allows its actions to be evaluated. What this measure looks like depends on the application. It is important, however, not to use a purely outcome-oriented measure but one that rewards the path to the goal desired by the programmer. A rational agent is characterized by maximizing this performance measure, taking into account its percepts, any prior knowledge and the likely effects of its possible next actions. In the presence of incomplete or uncertain knowledge, rationality should not be confused with perfection: perfection always maximizes the actual performance, whereas rationality maximizes the expected performance given the current knowledge.

In general, at least five agent types are distinguished which, with the exception of the learning agent, can be regarded as principles that build on one another:
- Simple reflex agents: action selection is based only on the current percept and is modeled by condition-action rules.
- Model-based reflex agents: an internal state and a world model are maintained in order to model aspects that cannot be observed, for example opponents' actions outside the agent's perceptual range.
- Goal-based agents: based on the world model and the internal state, a goal function influences action selection. Search and planning are important means of finding action sequences that achieve the goals.
- Utility-based agents: goal achievement is evaluated in terms of utility in order to distinguish between actions or action sequences that all achieve the goals. A utility function maps a state to a real number representing its utility.
- Learning agents: learning allows the agent to build up, in unknown environments, greater competence than its initial knowledge provides.

In principle, the utility-based agent, with or without a learning component, can be described as the agent that comes closest to a rational, and thus intelligent, agent. In the following, "agent" refers to exactly this type. Figure 2 summarizes the most important components of this kind of agent.

Figure 2. The utility-based agent and its most important components. [14]

B. AI in First-Person Shooters

In Section III-A, the agent was presented, among other things, as a design model for a unit controlled by the game AI. In FPS, such units are called bots. Usually, one agent instance controls one bot instance, in other words one concrete Tactical Enemy, such as a hostile guard or a soldier in the game world.
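To make the difference between the agent types of Section III-A concrete, the following Python sketch contrasts a simple reflex agent (condition-action rules on the current percept only) with a utility-based agent that keeps an internal state, predicts outcomes with a world model and picks the action with the highest expected utility. Representing percepts as dictionaries of boolean facts, the action names and the outcome probabilities in `duel_model` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]  # hypothetical percept/state format: named boolean facts


def simple_reflex_agent(percept: Facts) -> str:
    """Simple reflex agent: condition-action rules on the current percept only."""
    if percept.get("enemy_visible"):
        return "shoot"
    if percept.get("heard_noise"):
        return "investigate"
    return "patrol"


@dataclass
class UtilityAgent:
    """Utility-based agent: maximizes expected utility over its actions."""
    actions: List[str]
    model: Callable[[Facts, str], List[Tuple[float, Facts]]]  # (probability, outcome)
    utility: Callable[[Facts], float]                         # state -> real number
    belief: Facts = field(default_factory=dict)               # internal state

    def expected_utility(self, action: str) -> float:
        return sum(p * self.utility(outcome)
                   for p, outcome in self.model(self.belief, action))

    def __call__(self, percept: Facts) -> str:
        # Agent function f: fold the new percept into the internal state, so
        # facts stay known even when they are no longer perceived.
        self.belief.update(percept)
        return max(self.actions, key=self.expected_utility)


def duel_model(state: Facts, action: str) -> List[Tuple[float, Facts]]:
    # Hypothetical outcome probabilities for a bot facing one opponent.
    if action == "attack":
        p_win = 0.7 if state.get("has_rocket_launcher") else 0.3
        return [(p_win, {"alive": True, "enemy_dead": True}),
                (1 - p_win, {"alive": False})]
    return [(0.9, {"alive": True}), (0.1, {"alive": False})]  # "flee"


def duel_utility(state: Facts) -> float:
    if state.get("enemy_dead"):
        return 1.0                              # duel won
    return 0.5 if state.get("alive") else 0.0   # escaped alive / duel lost


agent = UtilityAgent(["attack", "flee"], duel_model, duel_utility)
print(agent({"enemy_visible": True, "has_rocket_launcher": True}))  # attack
print(agent({"enemy_visible": True}))  # attack: the launcher is still remembered
```

With the launcher, attacking has an expected utility of 0.7 against 0.45 for fleeing; a fresh agent without that fact in its internal state would compute 0.3 for attacking and therefore flee.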
Figure 1 showed that the actual decision process can be represented abstractly as an action selection function. The refinement of the agent built on it, shown in Figure 2, reveals the abstractions hidden behind the initially simple model: a state model, a performance measure, and a goal and utility function. FPS, and computer games in general, are now able to create extremely realistic game worlds, so bots must be enabled to behave rationally in highly complex virtual environments. For this reason, Paul Tozour proposes in [18] an AI architecture for FPS that divides the decision process into four hierarchically arranged components: behavior, movement, animation and combat. The behavior component is the top-level component responsible for decision making at the highest level. Its role, level of abstraction and structure largely correspond to the agent design of Figure 2. In other words, the behavior component is meant to interact with the environment, maintain an internal state, define and select goals, determine the utility of actions and ultimately decide on the next action to execute. The refinement of the agent consists in offloading certain expensive decision processes into subcomponents. For example, the behavior component may decide on an escape movement rather than a combat action as its next action. How this escape is to be carried out, i.e., first of all which escape route to take, is delegated to the movement component. The animation component differs somewhat from the other components: although Paul Tozour does not prescribe an exact hierarchical arrangement of his components in his article, it becomes clear that the combat and movement components, in addition to the behavior component, need direct or indirect access to the animation component.
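The delegation described above can be sketched as follows: a behavior component decides what to do and hands how to do it to movement and combat subcomponents, all of which share one animation component. The class names, the health threshold of 30 and the "farthest exit" escape rule are hypothetical simplifications for illustration, not Tozour's actual design.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # hypothetical grid position


class AnimationComponent:
    """Shared by the other components; here it only records which clip was started."""

    def __init__(self) -> None:
        self.played: List[str] = []

    def play(self, clip: str) -> None:
        self.played.append(clip)


class MovementComponent:
    def __init__(self, animation: AnimationComponent) -> None:
        self.animation = animation

    def flee(self, threat: Pos, exits: List[Pos]) -> Pos:
        # Simplified escape-route choice: the exit farthest (Manhattan distance)
        # from the threat; a real bot would run path finding here.
        target = max(exits, key=lambda e: abs(e[0] - threat[0]) + abs(e[1] - threat[1]))
        self.animation.play("run")
        return target


class CombatComponent:
    def __init__(self, animation: AnimationComponent) -> None:
        self.animation = animation

    def engage(self, target: Pos) -> str:
        self.animation.play("aim_and_fire")
        return f"firing at {target}"


class BehaviorComponent:
    """Top-level component: decides what to do and delegates how to do it."""

    FLEE_BELOW_HEALTH = 30  # hypothetical threshold

    def __init__(self) -> None:
        self.animation = AnimationComponent()
        self.movement = MovementComponent(self.animation)
        self.combat = CombatComponent(self.animation)

    def decide(self, health: int, threat: Pos, exits: List[Pos]) -> Tuple[str, object]:
        if health < self.FLEE_BELOW_HEALTH:
            return "flee", self.movement.flee(threat, exits)
        return "fight", self.combat.engage(threat)


bot = BehaviorComponent()
print(bot.decide(health=20, threat=(5, 5), exits=[(1, 1), (-4, -3), (6, 4)]))
# ('flee', (-4, -3))
```

Passing the single animation instance to both subcomponents mirrors the observation that movement and combat, not only behavior, need access to animation.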
The animation component is mainly responsible for playing suitable animation sequences. Transferred to the agent design, it could be regarded as part of the actuator component. Figure 3 summarizes a possible extension of Russell's agent design (Figure 2) with the FPS components proposed by Tozour. The following sections present possible concrete implementation techniques for the behavior, movement and combat components in detail. The animation component is not considered further, apart from the following brief remark. Among other things, the animation component maintains a skeleton model of the bot so that animation sequences can be played in parallel if they affect disjoint sets of body regions. Prioritizing animation sequences also plays an important role here. For example, the bot's virtual death causes all current animations to be aborted, because a death animation must be played immediately and, so to speak, suppresses all other sequences. AI techniques such as coordination methods are certainly used for these and other tasks, but they play a much smaller role than in the other components. For this reason, one could also take the view that animating the bot's actions is not part of the bot itself but rather a task, and thus a part, of the game engine.

Figure 3. Extended agent model for FPS. The behavior component is to be understood as the complete agent. Based on [14] and [18].

C. Behavior Component

There are many ways to implement the decision process. Three of them are discussed in more detail below: game trees, planners and finite state machines.
Game Trees

A game tree models the decision space as a tree. Each node represents a possible game state, with the root describing the current game state from the bot's point of view. The edges are labeled with the actions allowed in a state and lead to the successor states that result from executing the corresponding action. Future actions of opponents play a fundamental role in decision making, which is why they, and those of teammates, are also integrated into the tree. The tree therefore alternately contains the possible actions of the bot and those of the other players. In theory, all combinations are expanded until every path in the tree leads to a leaf. A leaf describes a terminal state, i.e., a state in which no further actions can be taken. A terminal state is the only place in the tree where a reliable utility evaluation can take place. The simplest conceivable final state evaluation (utility evaluation) would be a numerical statement about victory or defeat, such as the following, which evaluates the outcome of a duel with another bot:

$$f(\text{duel won}) = 1, \qquad f(\text{duel lost}) = 0$$

Once such a game tree has been built, the decision problem of finding the next best action can be treated as a search problem, so that common search algorithms such as minimax or alpha-beta search can be applied. The utility values of the leaves make it possible to define the search problem as the maximization of utility: in the simple example above, minimax selects the action whose subtree has the highest value under the assumption that the opponent also plays optimally, i.e., an action that guarantees a leaf with f(duel won) = 1 whenever such a forced win exists. This approach is used very successfully in classic games such as chess or card games, but is of limited use for more complex computer games, because the search space can grow very large very quickly, which makes an effective exhaustive search practically impossible. There are several extensions, such as the use of heuristics, and variations, such as the use of simulations (Monte Carlo simulation [2] or UCT search [4]), that approximate the result of an exhaustive search. However, these cannot adequately solve the performance problem for complex games, so their use, if at all, is only considered for small partial decisions. For further information on game trees and concrete examples, see [14, Ch. 6].

Planning

Instead of searching the decision space, a planning algorithm can be used. A plan is defined as a sequence of actions that has to be determined; planning thus describes the process of getting from an initial situation to a desired goal situation. In general, a distinction is made between forward and backward planners. Planning is easiest to imagine as a route-finding or navigation problem in a graph, where forward search proceeds from start to goal and backward search from goal to start. Planning is therefore a special kind of search, extended by an expressive description of actions in a logical representation. Planners allow effective search in large state spaces because, unlike in plain search, enough information to evaluate utility is available not only in the leaves but, thanks to the logical action description, at every planning step, which makes it possible to focus quickly on promising actions. The biggest drawback of planning, however, is the difficulty of capturing the game world as a logical representation, which, as described, is the basis of planning.

Planning is used successfully in several games. The FPS Killzone [13], for example, uses so-called HTN planning (Hierarchical Task Network). HTN planning uses methods to decompose the goal into many smaller subgoals, also called tasks. Unlike classical planning algorithms such as STRIPS, task decomposition incorporates a good deal of domain-specific expert knowledge into planning, which gives HTN planning great practical relevance. At the same time, this makes reuse for other computer games considerably more difficult. Planning is an advanced and complex AI technique whose understanding requires solid AI knowledge. For this reason, it is not the planning approach but the third option for decision making covered here that is discussed in more detail: the use of finite state machines (FSM). Finite state machines are the most widespread approach in industry practice.
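The minimax search mentioned for game trees can be sketched in a few lines of Python for the duel example, using the leaf utilities f(duel won) = 1 and f(duel lost) = 0. The tree, its action names and the tuple-based representation are invented for illustration. The example is chosen so that merely counting winning leaves would favor "shoot", while minimax, which assumes an optimal opponent, picks "take_cover".

```python
from typing import Dict, Tuple, Union

# A node is either a leaf utility (1 = duel won, 0 = duel lost) or a pair
# (player to move, {action: subtree}).
Tree = Union[int, Tuple[str, Dict[str, "Tree"]]]


def minimax(node: Tree) -> int:
    if isinstance(node, int):
        return node  # terminal state: f(duel won) = 1, f(duel lost) = 0
    player, children = node
    values = [minimax(child) for child in children.values()]
    # The bot maximizes utility; the opponent is assumed to minimize it.
    return max(values) if player == "bot" else min(values)


def best_action(root: Tuple[str, Dict[str, Tree]]) -> str:
    _, children = root  # root is a decision node of the bot
    return max(children, key=lambda action: minimax(children[action]))


duel = ("bot", {
    "shoot": ("opponent", {"dodge": 0, "shoot_back": 1, "reload": 1}),
    "take_cover": ("opponent", {"advance": 1}),
})

print(best_action(duel))  # take_cover
```

Alpha-beta search returns the same value as this plain minimax but skips subtrees that cannot change the result, which is why it is the usual choice when the tree is larger.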
For general information on planning, the interested reader is referred to [14, Ch. 11]; more detailed treatments of HTN planning can be found in [10, p. 2ff.].

Finite State Machines

Finite state machines (finite automata) are probably the best-known class of automata in automata theory. Essentially, a finite state machine is a directed graph with a finite number of states and transitions. It thus resembles a tree in which edges may point in any direction and occur in any, but finite, number. Analogous to the root node of a tree, every finite state machine must designate a start state and, analogous to the leaves of a tree, at least one end state. In general, states describe the bot's behavior in an abstract way. Transitions enable state changes, i.e., in the context of the bot, changes in behavior. The currently active state is retained across several action decisions. Usually, however, only one state can be active at any given time. Finite state machines are divided into deterministic and nondeterministic ones. In a deterministic finite state machine, the successor state is uniquely determined by the current state, the current input and, if available, information about the current state of the world model; in other words, the bot always behaves the same way in a given situation. Otherwise, the machine is nondeterministic. Nondeterminism is desirable when one wants to model ignorance about the environment or to achieve a certain unpredictability. Using the powerset construction, however, every nondeterministic automaton can be transformed into an equivalent, though possibly much more complex, deterministic automaton.
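The powerset (subset) construction mentioned above can be sketched as follows. Each state of the resulting deterministic automaton is a set of states of the nondeterministic one, which is why the deterministic version can grow exponentially. The bot states and inputs below are hypothetical, and epsilon transitions are left out to keep the sketch short.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

NFA = Dict[Tuple[str, str], Set[str]]  # (state, input) -> possible next states
DState = FrozenSet[str]                 # a DFA state is a set of NFA states


def powerset_construction(nfa: NFA, start: str, alphabet: Iterable[str]
                          ) -> Tuple[Dict[Tuple[DState, str], DState], Set[DState]]:
    """Build the reachable part of the equivalent DFA (no epsilon transitions)."""
    symbols = list(alphabet)
    start_set: DState = frozenset({start})
    transitions: Dict[Tuple[DState, str], DState] = {}
    states: Set[DState] = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for symbol in symbols:
            # Union of all NFA successors of all states in the current set.
            nxt = frozenset(s for q in current for s in nfa.get((q, symbol), set()))
            transitions[(current, symbol)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return transitions, states


# Hypothetical nondeterministic bot: on hearing a noise while patrolling it
# may either keep patrolling or start searching.
bot_nfa: NFA = {
    ("Patrolling", "noise"): {"Patrolling", "Searching"},
    ("Patrolling", "enemy"): {"Combat"},
    ("Searching", "noise"): {"Searching"},
    ("Searching", "enemy"): {"Combat"},
    ("Combat", "noise"): {"Combat"},
    ("Combat", "enemy"): {"Combat"},
}

dfa, dfa_states = powerset_construction(bot_nfa, "Patrolling", ["noise", "enemy"])
for s in sorted(dfa_states, key=sorted):
    print(sorted(s))
# ['Combat'], ['Patrolling'], ['Patrolling', 'Searching'] (one per line)
```

Only the reachable subsets are built, so in practice the result is often much smaller than the full powerset of the NFA states.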

Whether deterministic or nondeterministic, every finite state machine, like every other decision method based on rationality, must perform a utility evaluation in order to select a transition and thus the bot's next action. There are several ways to achieve such an evaluation. One option is to attach logical conditions to the state transitions. Whether the combined conditions of a transition leaving the current state are satisfied depends on the current input (the bot's percepts) and on the state of the world. A simple evaluation might, for example, select an action according to the number of satisfied (partial) conditions. An example follows; for a similar but more detailed step-by-step example of a finite state machine for computer games, see [1].

Consider the finite state machine in Figure 4, which shows one possible model of an example behavior control using the components proposed by Paul Tozour [18]. The deterministic finite state machine has the following states:
- Idle: start state; the bot stands guard and is ready to act.
- Patrolling: the bot follows a patrol route specified by the designer to look out for new opponents.
- Combat: the bot switches to combat mode; the exact combat behavior is delegated to the combat component.
- Fleeing: the bot flees from opponents or other sources of danger.
- Searching: the bot searches for an opponent who has fled.
- Die: end state, the bot's virtual death.

Figure 4. Finite state machine of a simple FPS bot, based on [18].

For clarity, the end state is not shown in the figure. However, every state would lead to it via a high-priority transition ishealthzero = true.
The back edges from Fleeing and Searching to Combat are likewise not shown. A utility evaluation could then look as follows. Suppose the bot is in the Combat state and all available information yields the following literals:

{isopponent = true}, {ishealthcritical = false}, {isopponentfleeing = false}, {isopponenthealthcritical = false}, {ismygunpowerful = false}

The simple evaluation described above might then assign a utility of three to fleeing and a utility of one each to pursuing and to patrolling. With these values, the action leading to the Fleeing state would be selected. If the literal is changed to isopponentfleeing = true, fleeing and pursuing each obtain a utility of two. A decision could then be made purely at random, or with a probability that grows with the utility of the transition. This variant is described in the literature as fuzzy logic; strictly speaking, however, it is a utility-proportional random selection, whereas fuzzy logic works with degrees of truth rather than probabilities. A priority list specifying that fleeing is preferred at equal utility would also be conceivable. Other variants, such as weighting satisfied or unsatisfied conditions, are possible as well. As this shows, there is a great deal of room for variation. For this reason, and because of their simplicity, which requires only limited AI knowledge, and the availability of well-known efficient implementations, FSMs are very popular; they are used in successful games such as Quake and Halo. FSMs also have drawbacks, however. Unlike game trees, the basic concept does allow dynamic behavior to be modeled, but the model can quickly grow to a large number of transitions, states and cycles that are anything but easy to manage.
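The counting-based utility evaluation of the Combat example can be sketched as follows. This is a hypothetical reconstruction: the text does not list the conditions attached to the transitions in Figure 4, so the condition sets below are assumptions chosen only so that the sketch reproduces the utilities stated above (3/1/1, and 2/2/1 after setting isopponentfleeing = true). It shows both tie-breaking variants mentioned: a fixed priority list and a utility-proportional random choice.

```python
import random

# Hypothetical conditions on the transitions leaving the Combat state.
# The source does not list the conditions in Figure 4; these were chosen
# only so that the sketch reproduces the utilities given in the text.
TRANSITIONS = {
    "Fleeing": {"ishealthcritical": True, "isopponent": True,
                "isopponentfleeing": False, "ismygunpowerful": False},
    "Searching": {"isopponentfleeing": True, "ishealthcritical": False},
    "Patrolling": {"isopponent": False, "ishealthcritical": False},
}

# Tie-break order for equal utilities (fleeing preferred, as in the text).
PRIORITY = ["Fleeing", "Searching", "Patrolling"]


def utilities(percepts):
    """Utility of a transition = number of its satisfied (sub)conditions."""
    return {
        target: sum(percepts.get(name) == value for name, value in conds.items())
        for target, conds in TRANSITIONS.items()
    }


def choose_by_priority(percepts):
    """Pick the transition with the highest utility; ties go by PRIORITY."""
    if percepts.get("ishealthzero"):
        return "Die"  # high-priority transition to the final state
    util = utilities(percepts)
    best = max(util.values())
    return next(t for t in PRIORITY if util[t] == best)


def choose_proportionally(percepts, rng=random):
    """Pick a transition with probability proportional to its utility."""
    if percepts.get("ishealthzero"):
        return "Die"
    util = utilities(percepts)
    targets = list(util)
    weights = [util[t] for t in targets]
    if sum(weights) == 0:
        return rng.choice(targets)  # nothing satisfied: uniform choice
    return rng.choices(targets, weights=weights, k=1)[0]


percepts = {"isopponent": True, "ishealthcritical": False,
            "isopponentfleeing": False, "isopponenthealthcritical": False,
            "ismygunpowerful": False}
print(utilities(percepts))           # {'Fleeing': 3, 'Searching': 1, 'Patrolling': 1}
print(choose_by_priority(percepts))  # Fleeing
```

The priority list makes the bot fully deterministic; the proportional variant trades that predictability for some variation in behavior.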
There are techniques to counteract this problem, but they cannot truly solve it. One example is the introduction of hierarchies (hierarchical FSMs, HFSMs), in which a state can itself be an FSM or an HFSM. A detailed example is given by Damian Isla in [7]. Among other things, he argues that FSMs would benefit greatly from an improved, i.e. more formal, knowledge representation, because this would considerably increase the information content and expressiveness of a perception, which could ultimately allow FSMs to be modeled more effectively. Learning techniques are currently used, among other things, to extract exactly this information from perceptions at runtime. Techniques such as inductive learning or neural networks would, for example, enable an automatic, experience-based assessment of perceptions or perception sources. This could allow the bot to notice during a match that sighting a nearby opponent consistently had a greater influence on the course of the game than other aspects of perception, and should therefore carry more weight in the decision process. In my opinion, combining formal methods and learning techniques would have a synergistic effect and thus great potential. With formal modeling, the expert could feed the information he considers relevant from the perceptions into the model in an expressive and computable form. Learning techniques building on this could then let the FSM acquire insights that lie outside the developer's knowledge base.

D. Movement Component

Although, as described in III-C, the behavior component determines the goal, how exactly that goal is reached falls within the responsibility of the movement component. It must therefore be able to find paths in complex environments, move along the paths found, and detect and avoid obstacles along the way. The first step is to develop a suitable representation of the game world, the search space representation, on whose data structures appropriate pathfinding algorithms can operate. The choice of search space representation depends strongly on the specific environment. In FPS games, waypoint graphs and navigation meshes have become established; they are examined in more detail below.

Waypoint graphs represent the game world as a connected directed graph in which a waypoint, i.e. a node, represents a location that a bot can move to. The edges typically connect individual waypoints and represent the paths along which a bot can move. In this sense, waypoint graphs act as a greatly simplified model of the game world.
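A waypoint graph as defined above can be represented as a directed graph whose nodes carry world positions. The following minimal sketch assumes 2-D coordinates and edge costs equal to the Euclidean distance between waypoints, and runs A*, one of the graph algorithms used on waypoint graphs, with the straight-line distance to the goal as heuristic. The waypoint names, coordinates and edges are hypothetical.

```python
import heapq
import math

# Hypothetical waypoint graph: 2-D positions and directed edges.
POSITIONS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (6, 4)}
EDGES = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}


def dist(u, v):
    """Euclidean distance between two waypoints (edge cost and heuristic)."""
    (x1, y1), (x2, y2) = POSITIONS[u], POSITIONS[v]
    return math.hypot(x2 - x1, y2 - y1)


def a_star(start, goal):
    """Return (path, cost) of a shortest path, or (None, inf) if unreachable."""
    open_heap = [(dist(start, goal), 0.0, start)]  # entries: (f = g + h, g, node)
    came_from = {start: None}
    g = {start: 0.0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:  # walk back along the predecessors
                path.append(node)
                node = came_from[node]
            return path[::-1], cost
        if cost > g[node]:
            continue  # stale heap entry, a cheaper route was found later
        for nxt in EDGES[node]:
            new_cost = cost + dist(node, nxt)
            if new_cost < g.get(nxt, math.inf):
                g[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_cost + dist(nxt, goal), new_cost, nxt))
    return None, math.inf


path, cost = a_star("A", "D")
print(path, cost)  # ['A', 'C', 'D'] 8.0
```

Because the straight-line distance never overestimates the remaining cost when edges are weighted by Euclidean length, A* returns a shortest path here. Waypoint attributes such as a sniper-position flag could be kept in a separate dict and used to adjust edge costs or goal selection.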
This automatically restricts a bot's freedom of movement, since it has no information about anything outside the graph and therefore must never leave it. On the other hand, simplifying the world to a plain graph model allows paths to be computed efficiently with well-understood graph algorithms such as hill climbing, A*, depth-first or breadth-first search; of these, A* with an admissible heuristic (and breadth-first search on unweighted graphs) is guaranteed to find a shortest path. Waypoint graphs are popular because of their simplicity and because expert knowledge can be integrated into them easily. Through careful placement of waypoints and the addition of waypoint attributes, the level designer can achieve quite effective and differentiated bot navigation. For instance, waypoints can give the bot information about its immediate surroundings, such as the reachability and visibility of distant waypoints. Information relevant to goal selection in the behavior component can also be attached to waypoints, for example marking a waypoint as a sniper position, which becomes more attractive as a movement target when the bot owns a suitable sniper rifle. Edge annotations can furthermore contain animation commands such as jump or crouch instructions. At the same time, the reliance on expert knowledge is regarded as the greatest drawback of this concept: on the one hand, creating the graph requires considerable effort from the level designer; on the other, the intelligence of the bot's navigation essentially depends on the designer's skill. To counteract this somewhat, learning techniques are increasingly used that, by means of sophisticated statistics, can for example enable the bot to avoid certain waypoints because they have regularly led to its virtual death and thus to failing to reach its movement goal. For waypoint graphs to handle obstacles, the simple graph structure must be extended.
This is primarily because a simple graph cannot distinguish between waypoints and obstacles. Introducing radii that specify the free space around nodes and edges makes it possible to exclude static obstacles, such as a pillar or similar immovable and indestructible objects, from the graph. Figure 5 illustrates the use of radii to exclude obstacles. A welcome side effect of radii is that, because they explicitly define obstacle-free space, they considerably increase a bot's freedom of movement, as can be seen in Figure 6. The bot no longer has to move only along the edges between nodes but, as in Figure 5, can use the entire space spanned by the radii between two nodes (the gray area in the figure). This also mitigates another problem of waypoint graphs, the path smoothing problem.

Figure 5. A graph of four nodes from which the obstacle node is excluded by means of radii and the free space they define. Based on [5].

Bots that navigate with waypoint graphs are said to move as if on rails, since they always move along straight lines.

Figure 6. A waypoint graph in which a path from A to B is highlighted. [16]

Figure 7. A navigation mesh that requires far fewer nodes and edges than the waypoint graph. The free space between the nodes, shown as a white area, is not unknown territory but can be used for path optimization. [16]

This problem is countered with techniques such as the Catmull-Rom spline, which rounds off the corners of a path. However, as mentioned, a bot has no information outside the graph, so rounding, and thus deviating from the graph, can lead to a collision. Radii enlarge the known space, so the room for rounding within the graph increases. To also avoid dynamic obstacles, i.e. movable or destructible ones, further information or techniques such as graph annotation or bounding boxes or bounding spheres are required (cf. primarily [17], but also [5]).

According to Paul Tozour [17], navigation meshes represent a set of convex polygons that describe the walkable surface of a 3D environment. A polygon here is a flat, many-sided shape of the kind used to represent the surfaces of three-dimensional objects. Figure 7 shows a navigation mesh for the example given in Figure 6. The figure shows that every corner of a polygon can be interpreted as a node and every connection between two corners as an edge within the graph. Two properties are crucial for navigation meshes. First, the walkable surface should be covered by the polygons as completely and realistically as possible; it therefore makes little sense to use a navigation mesh merely to replicate a waypoint graph. Second, the polygons must be convex.
Convex in this context means that every line segment between any two points inside the polygon also lies entirely within that polygon. This property therefore guarantees that an agent moving in a straight line between two points of the current polygon never leaves it. According to [16], navigation meshes have several significant advantages over waypoint graphs. The path smoothing problem is much less pronounced because the navigation mesh covers the space far more accurately, so that path correction, and thus obstacle avoidance, is readily possible. In waypoint graphs without radii, by contrast, a sudden obstacle is a disaster, since the bot may not leave the narrow path defined by the edge. Furthermore, comparing Figures 7 and 6 shows that large areas can be covered far more effectively and efficiently than with a waypoint graph, even one with radii. The comparison also shows that the zigzag movement of bots based on waypoint graphs is avoided. That movement inevitably arises because the spaces between the nodes of a waypoint graph are unknown territory and therefore must not be entered. A navigation mesh also contains free space, but it is reduced to a minimum. The only disadvantages of navigation meshes are considered to be their higher memory cost compared with waypoint graphs and their rather complex and laborious generation.

Finally, one of the graph algorithms mentioned above can be applied to either search space representation. A general distinction is made between global and local search, for each of which different search algorithms have become established. In [12], A. Nareyek defines global search as long-term pathfinding, meaning a high-level search that finds a shortest path from A to B while neglecting some details, such as dynamic obstacles. The A* algorithm is cited there as the most popular representative of this class; its workings can be looked up in [12] or [14] if needed. Local search, by contrast, is mainly used to find the best path between two neighboring nodes while taking all details into account, or to refine the path found by the global search. In [12] and [15], local search is called short-term steering; the term steering originates in robotics and denotes a set of techniques for precisely controlling an agent's movement from a restricted, local perspective. One possible steering technique is vector-based control: for example, a new movement vector that leads around the obstacles is computed by adding the agent's movement vector and vector representations of the obstacles. Clearly, such a computation can only be carried out for the immediate surroundings.

E. Combat Component

As with the movement component, the behavior component only decides when to fight; the combat component then decides how exactly the bot conducts the fight. Its responsibilities include opponent and weapon selection as well as the final decisions on aiming and firing. How demanding the last two tasks are depends strongly on the environment in which the game AI is used. Depending on how fine-grained the interactions with the environment are, the decision process either targets a coordinate in three-dimensional space, so that any individual point on the opponent can be hit, or, at the other extreme, merely the moment of firing at which the opponent will be hit at all. Like the behavior component, the combat component usually relies on FSMs or similar constructs built on them for its decisions.
An essential aspect is that the decision process is usually split into at least two stages. First, a combat tactic is chosen; it strongly influences the subsequent individual decisions and is checked for validity at regular intervals. The greatest challenge in choosing a tactic is to capture and interpret the current game situation as well as possible. A tactic is usually implemented as a predefined set of rules based on expert knowledge that, when active, influences the decision process in a particular way. One possible technique for tactic selection is to enrich the waypoints introduced in III-D with tactically relevant information and to incorporate this information into decision-making during the game (cf. [18]). Another important element of tactic selection is opponent modeling. An accurate model allows well-founded predictions about the opponent's future behavior, so that the bot can adapt its own tactic accordingly. Opponent modeling is also essential for aiming and firing: to hit a moving opponent, the bot must predict his behavior, in particular his future movement in terms of direction and speed. A so-called tracker [3] helps here. In the simplest case, it takes into account the projectile speed of the current weapon as well as the speed and direction of the target, and thus aims not at the opponent's current position but at his probable future position, as shown in Figure 8. Weapon properties are crucial for these calculations: the simple calculation just described would be unusable for weapons that fire multiple projectiles, such as a machine gun or a shotgun, which would require further aspects to be taken into account.

Figure 8. A simple tracker for hitting a moving target with a single-shot weapon. [3]

IV. SUMMARY AND CONCLUSION

This paper has shown that the range of requirements for a game AI is broad. They extend from seemingly simple abilities such as moving and fighting to aspects of human-like behavior (human-level AI), such as the ability to communicate and/or coordinate in groups. By analyzing a tactical enemy in first-person shooters, techniques and considerations relevant to meeting some of these requirements were presented and analyzed. A key feature of this study was the combination of expert knowledge from AI developers working in industry, such as P. Tozour, with established concepts of traditional AI, such as S. Russell's model of the rational agent. Based on this model, a range of techniques and alternatives in the areas of behavior, movement and combat were presented, and the strengths and weaknesses of the different approaches were discussed.

One aspect of this analysis that may not immediately strike every reader is the influence of game AI on game design. Looking at the combat component, for example, one inevitably notices that, more than any other AI component, it has a major influence on game balance. The combat component must take care to model human weaknesses: a bot that fires instantly and aims perfectly would be unbeatable for most players. If one imagines a mission design that, without taking the AI into account, pits the player against five or six such opponents in a particular situation, one would probably conclude that even the world's best player would never see the game's end credits. On the other hand, a strategy game in which an intelligent, autonomous unit AI wins missions without needing the player, or needing him only to a very limited extent, would not serve its purpose either. If such an AI is intended, it would have to be explicitly accounted for in the game design from the outset, in order to confront the player with other challenges. For Heiko Klinge [8], this influence is one of the main reasons why AI in computer games has stagnated at a low level for years. Despite its great importance, game AI is usually planned relatively late and, in the case of delays, is not infrequently programmed into an almost finished game after the fact. The reasons given are a wrong prioritization of game AI, which marketing departments do not see as a selling point, and an underestimation of the development effort. Recently, Assassin's Creed in particular has shown that things can be done differently.
Developing lively cities with hundreds of passers-by, each with an individual daily routine, was a central game element from the very beginning, for the publisher, for marketing, and thus for the developer itself. The future therefore gives reason for hope, especially considering that computer games cannot afford, in the long run, to pair photorealistic worlds with a dumb AI. It must be said, however, that although companies such as Artificial and Xaitment are slowly bringing professional tools to market, these are still very far from matching the power and maturity of tools from other areas, such as Maya or 3D Studio Max for graphics. One way to make progress, as shown in this paper, is to draw on the successes and concepts of traditional AI and use them as a foundation for further development.

REFERENCES

[1] Jason Brownlee. Finite state machines (FSM). Website, com/finitestatemachines/.
[2] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. Computers and Games, pages 72-83.
[3] Daniel Sanchez Crespo Dalmau and Daniel Sanchez Crespo. Game programming: Action oriented AI. Website, articles/article.aspx?p=102090&seqnum=8.
[4] Sylvain Gelly and David Silver. Combining online and offline knowledge in UCT. In ICML '07: Proceedings of the 24th International Conference on Machine Learning. ACM.
[5] Sebastian Hammes. Hindernisnavigation. Website, fh-wedel.de/fileadmin/mitarbeiter/iw/lehrveranstaltungen/2007ss/SeminarSpieleKI/Ausarbeitung6HindernisnavigationHammes.pdf.
[6] Marc Hassenzahl. Attraktive Software: Was Gestalter von Computerspielen lernen können. In User Interface Tuning: Benutzungsschnittstellen menschlich gestalten.
[7] Damian Isla. Handling complexity in the Halo 2 AI. Website.
[8] Heiko Klinge. Künstliche Dummheit statt Künstliche Intelligenz: Warum Künstliche Intelligenz (KI) in Spielen stagniert. GameStar, issue 02/2008.
[9] John E. Laird and Michael van Lent. Human-level AI's killer application. AI Magazine, Volume 22, Number 2.
[10] M. Lekavy and P. Návrat. Expressivity of STRIPS-like and HTN-like planning. In Agent and Multi-Agent Systems: Technologies and Applications.
[11] Michael Mateas. Expressive AI: Games and artificial intelligence. In Proceedings of the International DiGRA Conference.
[12] Alexander Nareyek. AI in computer games. Queue, 1(10):58-65.
[13] T. Verweij, R. Straatman, and A. Champandard. Killzone 2 multiplayer bots. Paris Game AI Conference 2009.
[14] S. J. Russell and P. Norvig. Künstliche Intelligenz: Ein moderner Ansatz. Pearson Studium, 2nd edition.
[15] Simon L. Tomlinson. The long and short of steering in computer games.
[16] Paul Tozour. Fixing pathfinding once and for all. Website, ai-blog.net/archives/ html.
[17] Paul Tozour. Building a near-optimal navigation mesh. In AI Game Programming Wisdom. Charles River Media.
[18] Paul Tozour. First-person shooter AI architecture. In AI Game Programming Wisdom. Charles River Media.
[19] Wikipedia. Video game genres. Website, Video_game_genres.

Reputation Systems for P2P Networks

Alexander Gebhardt

Abstract

In P2P systems, as in economics, knowing a transaction partner's reputation can be a major factor in deciding for or against a transaction. The research community has made many efforts to model and implement mechanisms that produce reliable reputation information about peers in a network in order to support decision-making. This paper introduces a general architecture of reputation systems and details its main facets, referring to some of the best-known approaches in the field. Topologies, identity models, reputation policies and metrics, as well as inherent security issues and their solutions, are part of the discussion.

I. INTRODUCTION

In business transactions throughout the economy, trust is a fundamental principle for assessing their value and risk. If all parties to such a transaction trust each other, they rely on one another to exhibit the behavior specified in a contract between them. In Internet commerce, the market connects huge numbers of users, and the parties interacting with each other are mostly unknown strangers hidden behind pseudonyms. The Internet auction platform eBay operates a large system that tries to establish a basis for trust relationships within its scope. eBay users rate each other after successful or unsuccessful transactions and display these ratings publicly. Malicious behavior such as fraud can thus be exposed, and further transactions with the malicious party become rather improbable. Since the rise of peer-to-peer networks, especially for file sharing, malicious behavior has posed real threats to nearly all security goals of participating computer systems, ranging from integrity and authenticity to privacy and confidentiality. The need for a notion of trust in peer-to-peer networks has given rise to a vast amount of research on reputation systems for decentralized networks.
In centralized networks, the well-known concept of public key infrastructures and trust anchors can be applied, with one or more central trusted entities deciding on trust and distrust within the system; however, this approach does not scale without major financial investments in bandwidth and computing power. Since 2001, a large number of different systems have been proposed that collect user opinions from the peer-to-peer network, expressed much like the ratings on eBay. From this data, they derive local and global reputation measures that users of the network can retrieve in preparation for a transaction. In fact, the number of approaches is so large that there is even a considerable number of surveys that try to categorize them and find common denominators, so as to steer research toward extending existing models [5] instead of further broadening the base of available mechanisms; Jøsang et al. note a certain lack of coherence in the area of reputation systems. In this paper, several surveys are examined for common classifications of reputation systems, which are condensed into a short, comprehensible scheme. This scheme is related both to approaches analyzed in the surveys and to approaches outside their scope. These approaches were chosen to cover a wide range of the properties that can be encountered in reputation systems. Section II puts the overall motivation for deploying a reputation system into perspective. Section III explains the notion of reputation and how it can be used to establish trust in decentralized networks. It is subdivided according to the main components and properties of reputation systems, namely their architectures (III-A), the policies they implement (III-B), their identity models (III-C), their reputation metrics and processing methods (III-D), and finally their security problems (III-E).
Section IV summarizes the main conclusions of this paper.

II. TRUST

The first thing to understand about trust in the context of reputation systems is that an entity's reputation may indicate whether to trust it, but a high reputation value is not equivalent to a high level of trust. Conversely, even a trustworthy entity may have a low reputation, for example because it has never interacted with anyone. This section gives a short introduction to trust and its facets in peer-to-peer networks. The notion of trust in the context of computer systems, or of human interaction via computer systems, would be quite simple if the everyday notion of trust were simple. In fact, the everyday notion of trust is rather complex and carries different meanings in different contexts, although it is used under a common name. To give a clear understanding of what trust means in a peer-to-peer environment, the notion has to be broken down. The sentence "I trust peer X" expresses unconditional trust in a peer as a whole, which might imply trust in the peer's current behavior, in the consistency of its behavior, in the authenticity of its identity, and so on. This situation is very rare in everyday life and even rarer in a more or less anonymous network of millions of users one has never met in person. In general, trust is limited to a certain aspect of an entity, under specified conditions, in a specified context. An example in a peer-to-peer environment might be: "Given my trust in my communication channel, I trust the association between a peer's identifier and its real identity in the form of its IP address." In [5], Jøsang et al. select several definitions of trust from the literature, which can be summarized as follows. Trust in the reliability of a peer with regard to a given action that influences the truster's environment is called reliability trust.
"The extent to which one party is willing to depend on something or somebody in a given situation with a feeling of relative security, even though negative consequences are possible" [5, p. 4] is called decision trust. Both forms of trust involve the truster's dependence on the trustee or on its behavior. Unlike most other notions of trust, decision trust also incorporates risk, which makes it a much broader notion, since trust may then also exist in situations where a negative outcome is possible. Provision trust is the trust of one entity that another entity's behavior matches its asserted behavior as specified in a contract. Identity trust is the trust an entity places in the association between another entity and the identifier it provides, commonly known as entity authenticity. In the following, the term ID refers to an identifier or name in the identity model of a peer-to-peer system, whereas identity or real identity refers to a credential linked to a user of the system, e.g. the IP address.

More generally, trust relationships can be classified by scope and perspective. The perspective of a trust relationship may be either subjective or objective. A person's own experience with another person falls into the subjective category, as do other people's experiences, whereas an objective perspective can be obtained through standardized tests, e.g. of a service provider's reliability. Scope is another very important aspect of common statements about trust and can be either general or specific. While a general notion of trust makes reasoning very simple, its coarse granularity omits many details. A specific statement of trust covers only a very small set of an entity's aspects but states very precisely the circumstances under which the trust holds.

III. REPUTATION

In all kinds of networks where mutually unknown users interact with one another by exchanging valuable goods, there is a need for metrics indicating whether a planned interaction is likely to result in the expected trade or in some other, undesirable state. In economics, such metrics are partly derived from product prices: a conspicuously low price leads to the assumption that a product is of very low quality, and a high price to the assumption of superior quality. These assumptions are independent of the entity offering the products. A metric that delivers more reliable information about the outcome of a transaction in this sense is reputation. An entity offering a service that has earned a good reputation among earlier users, e.g. satisfied customers of a store, is very likely to make a fair trade with a new customer as well. Likewise, a bad reputation suggests an unfair outcome. In peer-to-peer networks, there are usually no prices to pay for accessing services, so price gives no indication of the outcome of a transaction. In general, there are also many service providers that a prospective user has never seen or heard of before, which leaves the outcome of a transaction uncertain. As in the real world, a peer has to ask other peers that know about the reliability of the service provider, i.e. their provision trust in it, in order to decide whether to interact with the provider. Reputation systems automate this process by encouraging users to rate past transactions and to make their ratings publicly available to other users. Peer-to-peer applications rely heavily on the principles of reciprocity and cooperation. The service offered by a peer-to-peer application improves with each active peer that contributes to the system. As long as every peer is willing to contribute at least as much service as it consumes, participating in the network has no disadvantage.
Since the best local strategy when using such an application is to consume without contributing, the application has to implement a policy that forces, or at least incentivizes, users to contribute. Users of such an application have to see an immediate gain from contributing instead of the hardly visible gain in overall service quality. The reliability of peers also contributes to the overall service quality. If a peer provides poor service, e.g. a low-quality file in a file-sharing network, the service consumer has to make a second attempt to obtain a file of appropriate quality. If this attempt also fails, the user may decide to leave the network and thereby reduce the global service quality. These two examples describe problems whose solutions are covered by reputation systems for peer-to-peer networks. Reputation systems can incentivize users to contribute to the system and make it possible to gather information on a service provider or service before using it. This leads to a higher level of confidence in the system, which in turn incentivizes more users to join the network. "Reputation is what is generally said or believed about a person's or thing's character or standing." [5] In social environments, reputation is mostly unquantified information that is only human-comprehensible and either needs a lot of interpretation to support reasoning about an entity's trustworthiness or is already expressed in natural language using terms that are at least partially ordered, such as terrible, bad, neutral and good. Generally, these descriptions have to be modeled so that a computer program can reason about reputation itself; to reduce overhead, most approaches use numeric values to quantify reputation and complex algorithms to determine which behavior is rated with which value. The quantification allows fast processing and compact display in user interfaces, but also leaves room for, e.g., game-theoretic approaches to peer selection based on these values. Reputation systems have been widely studied in the literature [5], [8], [10], [11] and exist in hundreds of more or less different implementations and models, which reflects a known lack of coherence in research on this topic. In this section, reputation systems are described with regard to several aspects: their architecture in III-A, the policies they implement in III-B, their identity models in III-C and their security issues in III-E. Following Ruohomaa et al. [11], the remaining part of a reputation system can be investigated by looking at three topics: the creation and content of a recommendation, the selection and use of recommenders, and the interpretation and reasoning applied to the gathered information. These are discussed in detail in section III-D.

A. Topologies

Reputation systems for peer-to-peer networks can be placed in the application layer next to the peer-to-peer application they support, as what can be called an extension. Since the reputation system is not bound to the overlay of its host application, it may or may not introduce out-of-band communication in order to fulfill its purpose. In most cases where out-of-band communication takes place, it uses direct TCP/IP connections or a custom overlay. There are mainly three types of topologies for reputation systems. Most reputation systems rely on a central server that stores reputation information for its clients. Such architectures often handle bottlenecks by adding redundancy at the server level and provide a central identity management, which, apart from the topology, is the basis of every reputation system. A well-known example of this topology is the eBay platform. Decentralized systems, corresponding to peer-to-peer topologies, often use the identity management of their host applications, which in most cases is distributed. Other systems form an individual overlay for reputation management in order to be independent of the supported peer-to-peer overlay. Reputation values are stored either on the peers the reputation is attached to or on other peers acting as agents, as in [2]. Some approaches even use distributed storage systems to store and replicate the reputation data transparently [1]. While distributed approaches attempt to eliminate the central server as a bottleneck, they lack, for example, mechanisms to counter Sybil attacks (see section III-E). The third type of topology, often referred to as hybrid, is a solution where most of the network acts like a decentralized network, but parts of the functionality follow a classic client/server approach.
These networks either exploit the heterogeneity of peers by assigning demanding tasks to more capable ones (ultrapeers known from Gnutella v0.6, or super peers known from FastTrack, the protocol used by KaZaA) or rely on a central trusted authority that, for example, provides access control mechanisms, global variables or an identity generator service [3].

B. Policies

Policies are widely used to force users of any system into a certain behavior. Incentive mechanisms implement incentive policies that modify the game-theoretic environment of a system such that users who try to maximize their own profit have to follow certain rules, resulting in behavior that satisfies the policy. Users have an incentive to behave exactly as the policy states, provided that the user's goal is one that the incentive mechanism or the policy addresses. Reputation systems can be seen as an implementation of an incentive policy, as they provide a way to maximize a user's own profit by following certain rules, i.e. by participating in a reputation protocol. Reputation systems promise a higher chance of successful service use and a basis for reliability trust and provision trust, which is essential, e.g., in business transactions between two peers. The incentive policy implemented in reputation systems incentivizes users to offer valuable service and to behave in a reasonable and transparent way. Since following an incentive is optional, reputation systems also need to handle peers that do not follow it, commonly called malicious peers. Malicious peers are similar to regular peers except for their source of profit, i.e. their goal. While a regular peer's goals in a network may range from successfully using a service to altruistically providing service for the global welfare, the goals of a malicious peer partially conflict with them. A malicious peer may, e.g., disrupt service usage and provision by running denial-of-service attacks on other peers.
Even a regular user may be categorized as malicious if he does not follow the general principle of reciprocity, i.e. giving back what has been taken, which is vital for peer-to-peer networks to achieve a certain level of service quality. Peers that use services without providing services themselves are called freeriders and are unwelcome guests in many reputation systems. These systems try to incentivize service provision by limiting a peer's capability to use services in the network according to its past contribution rate. A common problem these reputation systems have to cope with is that newcomers have no past in the network (or at least appear not to) and therefore look like freeriders, yet they need to be able to gain reputation (which freeriders should not) in order to provide good service. This is referred to as the newcomer problem. According to S. Marti and H. Garcia-Molina [8], there is a conflict of objectives between incentive schemes encouraging cooperation in the network and incentive schemes that try to punish malicious peers, implying that both goals cannot be satisfied at the same time by one system. A similar conflict exists between the cooperation incentive and an incentive that punishes freeriders, as described in the preceding passage. The systems described in [2] and [6], for example, focus on punishing malware-spreading peers while taking the freeriding issue into account in their source selection algorithms (see section III-D4). With the ReGreT system [12], Sabater et al. try to incentivize cooperation in social environments, or so-called multi-agent environments [11]. Due to a sophisticated formal model, their approach is also adaptable to a wide range of other policies, thus supporting arbitrary incentives such as an incentive to deliver on time, as stated in [12]. M. Gupta et al.
[3] even relativize the notion of freeriding by assigning reputation to peers that participate in the routing of queries, which are practically all peers in the system except perhaps some malicious peers that drop messages. On the other hand, freeriders are still penalized in their approach, since the metric is based on message sizes, which are minimal for routing compared to, e.g., application-level service provision. Their incentive is strictly focused on peer cooperation. An example of the requirements or goals of a typical reputation system is the list given for EigenTrust [6]:

- a self-policing system, in which shared ethics are defined by the users
- maintenance of anonymity
- no profit for newcomers
- minimal overhead
- robustness in case of attacks

Clearly, the policy this system implements is not one that punishes freeriders. It explicitly addresses the wish for anonymity in a rating system in order to prevent acts of vengeance. It addresses the cost of participating in the protocol via the requirement of minimal overhead. The no-profit requirement for newcomers targets whitewashing attacks and incentivizes the maintenance of a steady identity, which supports the reputation system, as will be stressed in the following section.

C. Identity Model

Identity models are a key concept of reputation systems. In everyday life, reputation is associated with an entity such as a person, or with collectives such as a company or a brand. In collectives, each entity inherits some basic reputation from the fact that it is a member of the collective and, in the other direction, adds to the reputation of the collective, making a collective's reputation a pool of the reputations of its members. Single entities can be identified by identity cards, which bind their appearance to their name, and by membership contracts, which bind them to their collective's identifier. In computer reputation systems, these rules hold for immutable identifiers, which can be seen as names for a collective of users or entities behind that name. Every regular user that follows the rules of a peer-to-peer network while using such an identifier adds the reputation gained via interactions with other peers to the collective name. If the user acts maliciously, the group reputation suffers from this peer's behavior. In peer-to-peer file-sharing networks, the common case is that a group contains only one single user loosely coupled to the group name (i.e. there are no contracts). In some systems, the identifiers change with every start of the application, which leads to very short-lived and ineffective reputation scores.
Hence, the basis of every reputation system is a long-lasting identifier, preferably one that binds the entity behind a name to that name. A reputation score can be assigned to this identifier, providing a basis for trust. Mechanisms to achieve such a binding range from a central server that generates identifiers for identities in the network and asks the requesting entity to provide an out-of-band identity, to the use of secrets, e.g. public/private key pairs [2], [3], [7]. Such a binding is desirable for liability reasons but conflicts with an important goal in open peer-to-peer networks: anonymity. Especially in the presence of whitewashing or the Sybil attack (see section III-E), more complex identity models have to be used that impose some cost on identity creation, such as a difficult computation or a simple fee. The major problem with such models is the central authority they rely on, which introduces bottlenecks and thus poor scalability. Further solutions fall into the scope of the literature dedicated to identity models and are not discussed in this paper.

D. Reputation Metric

Besides important environmental factors such as the identity model, the reputation policy and the reputation system's topology, the algorithmic core of the system can be investigated. According to Ruohomaa et al. [11], it can be divided into:

- creation and content of a recommendation,
- selection and use of recommenders, and
- interpretation and reasoning applied to the gathered information.

Aberer et al. [1] divide the core into:

- a global trust model, which will in general depend on a notion of reputation and describes when an agent is trustworthy,
- a local algorithm to determine trust, and
- data and communication management.

The global trust model of Aberer et al. is a formal requirement for the reasoning part in Ruohomaa's list: it specifies exactly how the reasoning can be done and how the results are to be interpreted.
The local algorithm in Aberer's list contains both the selection and use of recommenders and the creation and content of a recommendation. The data and communication management of Aberer et al. mainly addresses the storage and access mechanisms used for persistence and retrieval of reputation records. Ruohomaa et al. only touch on this subject in the reasoning part, as indicated by the words "gathered information". Since Aberer et al. take a layer-model perspective, the reverse order compared to Ruohomaa et al. is no surprise. Note that the algorithm for selecting a service provider is part of the reasoning and interpretation phase. In [4], Hoffman et al. make a much more detailed categorization of reputation systems, which in most cases is covered by the discussion in this section even if not explicitly stated. Their analysis framework is a more formal approach that divides a reputation system into formulation, calculation and dissemination. The formulation subsystem specifies the creation and content of a recommendation, as in Ruohomaa's list. The calculation subsystem specifies how a reputation value is calculated from recommendations, resembling the local algorithm of Aberer et al. The dissemination subsystem corresponds to what Aberer et al. call data and communication management. As can be concluded, all the investigated categorizations are similar. The categorization proposed in this paper is the following:

- Creation of recommendations
- Recommender selection
- Trustworthiness decision
- Source selection

For clarity, the discussion of the interpretation and reasoning phase is split into the trustworthiness decision, which filters out certain peers from the set of obtained sources, and the source selection process. The storage of reputation records is addressed in the section on recommendation creation (III-D1).
The foundation of a useful reputation metric is the ability to aggregate a large set of user experiences with other users and transactions into a very small set of values that nevertheless indicate the contents of the large set. The notion of a compression function may be an adequate description of what produces this metric.

The eBay reputation system [9], for example, composes each transaction rating of a recommender, a recommendee, a timestamp, a rating of -1 for a negative, 0 for a neutral or 1 for a positive transaction outcome, and finally a text comment detailing the reasons for the rating. All ratings for an entity in the eBay system are collected centrally and aggregated into a simple general reputation value, namely the number of positive ratings by distinct users minus the number of negative ratings by distinct users. This number only serves as a simple metric for fast comparison and can be seen as a recommendation. The complete database of reputation records can be viewed for each entity, and due to the timestamps in the transaction records, some predictability of behavior may be assessed. While the eBay system is widely used and therefore widely studied in current and past research, it has two drawbacks that make it a bad choice for peer-to-peer networks. The first is, obviously, the centralized architecture. The second is the inclusion of human-machine interaction in the process of reasoning about trustworthiness [11]. In some current peer-to-peer networks, service provider selection is hidden from the user, as there are, for example, many source selection steps in file-sharing systems like BitTorrent. Making these steps visible in order to let the user decide which service provider to use would decrease the usability of such systems. Therefore, the goal of the following reputation mechanisms for peer-to-peer systems is to make trust machine-decidable based on reputation data stored in the network.

1) Creation of Recommendations: All computations in reputation systems are based on transaction records giving either a positive or a negative rating. Aberer et al. propose in [1] a binary transaction value, i.e. a transaction either succeeds or fails.
Based on the transactions with a negative outcome, called complaints, they derive a reputation value that is the product of the number of complaints filed by the node concerned and the number of complaints filed by other nodes about it. This value increases with the untrustworthiness of an entity, as high values indicate a high rate of mutual complaints. In [2], Damiani et al. add a second source of data to their reputation system for file-sharing networks. As explained in section II, several types of trust can be distinguished. The distinction made in this approach is between provision trust on the one hand and a kind of identity trust in data, commonly called data authenticity, on the other. Each peer holds its own opinion on available data in the form of a short binary record (+ or -) and, for provision trust, a list containing, for each past transaction partner, the number of positive and the number of negative experiences. In recommendation records assembled during votes, the available data is aggregated into a single rating and sent to the poller; the authors omit the details of this aggregation. With EigenTrust [6], Kamvar et al. present a similar concept of binary experiences. In EigenTrust, experiences are aggregated by subtracting the number of negative experiences from the number of positive ones; unlike in [1], negative values are then set to zero and the results are normalized to real numbers in the interval [0, 1]. In a peer-to-peer network with n nodes, a global n × n reputation matrix is assumed that contains the opinions (assigned reputation) of each peer about each other peer. This matrix is distributed among all peers in the network, each storing its own row, i.e. its vector of opinions on all other peers. As a restriction on the range of reputation values, the values in each row have to sum to 1.
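The two aggregation rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming that complaints are given as (filer, accused) pairs and that a peer's experiences are plain per-partner counters of satisfactory and unsatisfactory transactions; the function names and the data layout are hypothetical and not taken from [1] or [6].

```python
def complaint_score(complaints: list[tuple[str, str]], peer: str) -> int:
    """Complaint-based score in the spirit of Aberer et al. [1].

    `complaints` holds (filer, accused) pairs. The score is the number of
    complaints filed by `peer` times the number of complaints about `peer`;
    higher values indicate a less trustworthy peer.
    """
    filed = sum(1 for filer, _ in complaints if filer == peer)
    received = sum(1 for _, accused in complaints if accused == peer)
    return filed * received


def local_trust_row(sat: dict[str, int], unsat: dict[str, int]) -> dict[str, float]:
    """EigenTrust-style row of normalized local trust values for one peer.

    `sat[j]` / `unsat[j]` count satisfactory / unsatisfactory transactions
    with partner j. Negative differences are clipped to zero and the row is
    normalized so that its values sum to 1.
    """
    partners = sat.keys() | unsat.keys()
    raw = {j: max(sat.get(j, 0) - unsat.get(j, 0), 0) for j in partners}
    total = sum(raw.values())
    if total == 0:
        # No positive experience yet: the row cannot be normalized.
        # EigenTrust treats this case separately; it is not modeled here.
        return {}
    return {j: value / total for j, value in raw.items()}


if __name__ == "__main__":
    complaints = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a"), ("d", "a")]
    print(complaint_score(complaints, "a"))  # 2 filed x 3 received -> 6
    row = local_trust_row({"b": 3, "c": 1}, {"c": 2})
    print(row["b"], row["c"])  # 1.0 0.0
```

Every non-empty row returned by `local_trust_row` sums to 1, which is the row restriction stated above.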
Because each row sums to 1, EigenTrust may be categorized as a flow model [5], an important property against some attacks on reputation systems, as explained in section III-E. Besides this local reputation vector, each peer also tries to estimate a global reputation value for every other peer by querying selected peers for their opinions. These opinions add to the local knowledge of the global matrix and are used in a transitive trust calculation. In this calculation, all possible transitive trust chains are constructed from the executing peer to all reachable peers, where the reputation resulting from a chain is the product of the reputations assigned along the chain. Kamvar et al. show that the global reputation vector resulting from this procedure converges to the left principal eigenvector of the global trust matrix. This means that after several iterations over a small set of gathered opinions, all peers arrive at nearly the same global reputation vector. Gupta et al. propose two different approaches in [3], with two reputation systems called debit-credit reputation computation (DCRC) and credit-only reputation computation (CORC). Both target the freeriding problem in peer-to-peer file-sharing networks, which is clearly a different goal from that of the preceding approaches. The CORC mechanism is the opposite of [1], as it omits negative experiences and hence can only provide a metric for the general participation of an entity in the network. Like EigenTrust [6], the DCRC mechanism is a flow model, with the difference that the reputation given to a service provider is taken from the service consumer and not from other service providers. Since Gupta et al. distinguish several services in a peer-to-peer file-sharing environment, a cumulated reputation score is built from the sum of transactions in the role of a service provider minus the sum of transactions in the role of a service consumer.
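The transitive calculation of EigenTrust can be approximated by repeatedly multiplying a trust vector with the transposed local trust matrix (power iteration), and the two scores of Gupta et al. reduce to simple sums. The sketch below assumes a small, complete local trust matrix whose rows sum to 1, and it omits the pre-trusted peers and the distributed score management of the full EigenTrust algorithm. The transaction amounts passed to the hypothetical `corc_score` and `dcrc_score` helpers are abstract units (e.g. transferred bytes).

```python
def global_trust(c: list[list[float]], iterations: int = 100, tol: float = 1e-12) -> list[float]:
    """Power iteration t <- C^T t on a row-stochastic local trust matrix `c`.

    For an irreducible, aperiodic matrix this converges to the left principal
    eigenvector of `c`, i.e. the global trust vector of EigenTrust [6].
    """
    n = len(c)
    if n == 0:
        return []
    t = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # new[j] = sum_i c[i][j] * t[i], i.e. one multiplication with C^T
        new = [sum(c[i][j] * t[i] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, t)) < tol:
            return new
        t = new
    return t


def corc_score(provided: list[float]) -> float:
    """Credit only (CORC): only service provision counts."""
    return sum(provided, 0.0)


def dcrc_score(provided: list[float], consumed: list[float]) -> float:
    """Debit/credit (DCRC): provision minus consumption; freeriders go negative."""
    return sum(provided, 0.0) - sum(consumed, 0.0)


if __name__ == "__main__":
    c = [[0.0, 0.8, 0.2],
         [0.6, 0.0, 0.4],
         [0.5, 0.5, 0.0]]
    print([round(x, 4) for x in global_trust(c)])  # [0.3604, 0.4054, 0.2342]
    print(dcrc_score([], []), dcrc_score([], [10.0]))  # 0.0 -10.0
```

When every transfer is credited to the provider and debited from the consumer by the same amount, the DCRC scores of all peers sum to zero, which is the flow-model property.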
Like the CORC score, the DCRC score indicates a level of participation, but it helps to distinguish between newcomers and freeriders, because freeriders are assigned a negative score in DCRC while newcomers have a score of zero. Significantly different from the preceding approaches, Sabater et al. propose with ReGreT [12] a mechanism for social networks that rates single transactions with real numbers in the interval [-1, 1] according to the difference between a contract and the actual behavior of the transaction partner with respect to a set of statements in the contract. Thus, reputation can be divided into different aspects, e.g. a reputation for lying about service quality or a reputation for delivering on time. A cumulated reputation value, if necessary, can be derived from reputation values of finer granularity by applying ontological rules, which form the third dimension of reputation next to the individual dimension (a peer's own experiences) and the social dimension (other witnesses' experiences). Each dimension is

processed in an individual step. The first step computes the weighted mean of the peer's own experiences, giving more relevance to recent experiences. The second step collects third-party opinions weighted by their reliability and aggregates them in a convex combination with the peer's own reputation estimate. The third dimension may need further processing steps depending on the depth of the ontology graph. The outcome is a single reputation value from the same range as the experience ratings. The notion of reliability, which is important in transitive trust chains, is computed in the same three-dimensional structure and uses the deviation of values as a metric.

2) Recommender Selection: In voting-based algorithms, a set of voters is queried for their opinions on a set of resource providers. These voters are selected randomly from the network, taken from the set of peers the initiator of the vote has interacted with, as in [2], or in some cases found via hash functions, as in [6]. In hybrid reputation systems, the set of voters can be replaced by a central trusted authority that signs reputation values [3]. In the decentralized cases, the notion of transitive trust plays an important role in the voting process. The voters' opinions on a service provider are weighted by the trust that the initiator puts in each voter, so a voter with a poor reputation has little impact on the result of the vote.

3) Trustworthiness Decision: Reasoning about the obtained reputation values is mostly done via thresholds that separate trustworthy peers from untrustworthy ones. In [3], a select-best approach is implemented that selects peers as service providers in descending order of reputation until the desired service can be used successfully. Marti et al. [7] regard peers as trustworthy as long as their reputation value is above a certain threshold. If no service provider passes this threshold, the service selection process is regarded as failed.
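The first two ReGreT steps and the trust-weighted vote can be illustrated as follows. This is only a sketch: ReGreT defines its own time-dependent weighting and reliability measures, whereas the exponential half-life weighting, the fixed mixing weight `alpha` and all function names below are assumptions chosen for illustration.

```python
def individual_reputation(ratings: list[tuple[float, float]], now: float,
                          half_life: float = 30.0) -> float:
    """Recency-weighted mean of a peer's own ratings in [-1, 1].

    `ratings` holds (timestamp, rating) pairs. A rating's weight halves every
    `half_life` time units (an assumed weighting, not ReGreT's exact function).
    """
    weights = [0.5 ** ((now - ts) / half_life) for ts, _ in ratings]
    total = sum(weights)
    if total == 0:
        return 0.0  # no experiences: neutral value
    return sum(w * r for w, (_, r) in zip(weights, ratings)) / total


def weighted_opinion(opinions: list[tuple[float, float]]) -> float:
    """Mean of witness opinions, each weighted by the trust placed in the witness.

    `opinions` holds (trust_in_witness, opinion) pairs with non-negative trust,
    so a witness with low trust has little impact on the result.
    """
    total = sum(w for w, _ in opinions)
    if total == 0:
        return 0.0
    return sum(w * o for w, o in opinions) / total


def combined_reputation(own: float, social: float, alpha: float) -> float:
    """Convex combination of the own estimate and the aggregated social opinion."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * own + (1.0 - alpha) * social


if __name__ == "__main__":
    own = individual_reputation([(0.0, -1.0), (30.0, 1.0)], now=30.0)
    social = weighted_opinion([(0.9, 1.0), (0.1, -1.0)])
    print(round(own, 3), round(social, 3))  # 0.333 0.8
    print(round(combined_reputation(own, social, alpha=0.5), 3))  # 0.567
```

The recent positive rating outweighs the older negative one, and the highly trusted witness dominates the less trusted one.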
Damiani et al. [2] filter the outcome of their opinion poll with regard to data reputation, IP clusters of voters and callback confirmation: nodes from known malicious IP clusters are excluded, as are peers that do not respond to a test callback. The remaining set of peers is regarded as trustworthy. In EigenTrust [6], each peer responding to a query is a potential download source; no filtering is applied, in order to give newcomers a chance of being selected. Aberer et al. [1] provide a function that determines from the reputation data whether or not a peer is trustworthy.

4) Source Selection: The various approaches all use different methods to select a service provider from the list of trustworthy providers. Problems arising in this part of reputation management include the newcomer problem and a load balancing issue. If a reputation mechanism always selects the most reliable peer out of a preselection of peers with good promised service quality, this leads to an explosion of reputation for this peer: its reputation rises after each transaction, which makes it even more likely that the next peers decide in its favor. In [3], this seems to be the case. EigenTrust [6] interprets its normalized reputation values as probabilities of being selected, which distributes load instead of focusing it on the most reputable peer. The mechanism also includes a small probability that a peer with no reputation is selected. The disadvantage of not selecting the most reputable peer is an increased chance of selecting a malicious peer as service provider. Damiani et al. [2] use a voting mechanism that takes place after the receipt of all QueryHit messages in Gnutella. The voting mechanism delivers other peers' opinions on the peers in the set of available sources and thus forms the basis for the decision algorithm.
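The threshold decision of Marti et al. [7], the select-best ordering of [3] and the probabilistic choice of EigenTrust [6] differ mainly in how they turn reputation values into a choice. The sketch below contrasts the three; the function names are hypothetical, reputation values are assumed to be non-negative normalized scores, and the 10 % exploration probability for peers without reputation is an assumed default.

```python
import random


def above_threshold(reputation: dict[str, float], threshold: float) -> list[str]:
    """Threshold decision: keep only peers whose reputation exceeds `threshold`."""
    return [peer for peer, value in reputation.items() if value > threshold]


def select_best_order(reputation: dict[str, float]) -> list[str]:
    """Select-best: try peers in descending order of reputation until one succeeds."""
    return sorted(reputation, key=lambda peer: reputation[peer], reverse=True)


def select_probabilistic(reputation: dict[str, float], rng: random.Random,
                         explore: float = 0.1) -> str:
    """Pick a peer with probability proportional to its reputation.

    With probability `explore`, a peer without reputation is picked instead,
    giving newcomers a chance to be selected.
    """
    if not reputation:
        raise ValueError("no candidate peers")
    newcomers = [p for p, v in reputation.items() if v <= 0]
    rated = [p for p, v in reputation.items() if v > 0]
    if newcomers and (not rated or rng.random() < explore):
        return rng.choice(newcomers)
    return rng.choices(rated, weights=[reputation[p] for p in rated], k=1)[0]


if __name__ == "__main__":
    rep = {"a": 0.6, "b": 0.3, "c": 0.1, "n": 0.0}
    print(above_threshold(rep, 0.2))  # ['a', 'b']
    print(select_best_order(rep))     # ['a', 'b', 'c', 'n']
    rng = random.Random(42)
    picks = [select_probabilistic(rep, rng) for _ in range(1000)]
    print({p: picks.count(p) for p in rep})  # most picks go to 'a', some to 'n'
```

The select-best order always starts with the same peer, which is what causes the reputation explosion described above, while the probabilistic variant spreads load at the price of occasionally picking a worse peer.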
Although Damiani et al. select the most reputable peer in the set as service provider, it is contacted before the transaction and asked whether the necessary bandwidth is available; if not, the second most reputable peer is selected, and so forth. This approach enforces some load distribution while still suffering from reputation explosion for the most reputable peers.

E. Problems and Security Issues

While many reputation systems are installed to solve security issues concerning malware, they introduce a broad range of vulnerabilities and problems themselves. Jøsang et al. list some of them in [5]; they are discussed in this section. Since reputation systems associate user identities with a real value, and long-term identities strongly increase the time an identity is present in the network, a user's identity may become the target of various attacks. Malicious peers may try to steal an identity in order to use its high reputation to give other malicious peers a good reputation, or to behave badly in someone else's name. Competitors may try to launch denial-of-service attacks on the entity behind an identity to disturb the entity's transactions, which results in a bad reputation. Malicious peers may also lie about transactions with a certain other peer; this is called badmouthing or slandering and is discussed in [4] together with lying and discrimination. Rating a peer negatively even though it provided good service is called badmouthing. Rating a peer inversely to its actual behavior is called lying, which includes giving a peer a positive rating even if it acts maliciously or does not act at all. If a peer rates the majority of other peers according to their behavior but always rates a small group negatively, the peer discriminates against this set of victims.
In all cases, the victim of the attack loses reputation and with it the corresponding service quality: the network's value decreases for the victim, as does the victim's value for the network.

1) Whitewashing: In the case of a bad reputation, which is by assumption the common case for malicious peers, the easiest way to keep using the network is to acquire a new identity without an incriminating past. This is known as whitewashing and is a major problem in all fully decentralized systems, since they lack a central instance that could use common heuristics or requirements to determine whether a new user really is as new as he claims to be. Another approach to this problem is to penalize new users with a more slowly rising reputation.
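One simple way to realize such a penalty is to scale a peer's positive reputation by the age of its identity, so that a freshly whitewashed identity starts again with no effective reputation. The hypothetical `effective_reputation` helper below, its linear ramp and its 30-unit default are assumptions for illustration only and do not come from the cited systems.

```python
def effective_reputation(raw: float, identity_age: float, ramp: float = 30.0) -> float:
    """Scale positive `raw` reputation linearly with identity age up to `ramp`.

    A freshly created (or whitewashed) identity has age 0 and therefore no
    effective positive reputation; after `ramp` time units the full value
    applies. Negative reputation is never softened.
    """
    if ramp <= 0:
        raise ValueError("ramp must be positive")
    if raw <= 0:
        return raw
    return raw * min(1.0, max(identity_age, 0.0) / ramp)


if __name__ == "__main__":
    for age in (0, 15, 30, 60):
        print(age, effective_reputation(0.8, age))
```

Because whitewashing resets the identity age, a malicious peer gains nothing by discarding an identity, while an honest newcomer still builds up reputation over time.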

2) Sybil Attack: A further attack on the identity model is the Sybil attack. During a Sybil attack, a malicious user attempts to acquire a large number of identities. Each identity is used to give other malicious entities a good reputation by lying, resulting in a malicious cluster of arbitrarily high reputation that is very likely to be selected by some peers for future transactions. In [5], the solution to this behavior is called a flow model. In a flow model, the global reputation of the collective of all peers is constant, and with every transaction the honest, hard-working service provider gains reputation while the service consumer loses the same amount. Google's PageRank algorithm [9] is named as a well-known member of this group of models. A basic assumption for this model to protect against Sybil attacks is that the initial reputation of a newcomer is so low that its opinion has no weight in transitive trust chains. If this were not the case, a malicious attacker could easily obtain hundreds of newcomer IDs, each rating one malicious target peer positively, without suffering from the reputation loss that the flow imposes on the rating peers.

3) Discrimination: Regarding discrimination in reputation systems, another issue arises: if a peer is discriminated against by another peer with a high reputation, other peers are very likely to identify the victim as the malicious node that lies about the transaction. Only systems that weight the victim's opinion according to its general trustworthiness may handle this problem correctly.

4) Dynamic Behavior: If an entity's service quality drops after reaching a high level, the subsequent loss of reputation may not be reflected properly in systems that calculate their reputation value as a quotient of successful and unsuccessful transactions, because such values become inert over time.
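The inertia of a plain success quotient, and the effect of the age weighting discussed next, can be shown with a small example. The sketch assumes a history of boolean transaction outcomes ordered from oldest to newest and an exponential fading factor of 0.9 per transaction; both the representation and the factor are illustrative assumptions.

```python
def success_ratio(outcomes: list[bool]) -> float:
    """Plain quotient of successful transactions over all transactions."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def faded_ratio(outcomes: list[bool], fading: float = 0.9) -> float:
    """Success quotient in which a rating's weight shrinks by `fading` per newer transaction.

    `outcomes` is ordered from oldest to newest; the newest outcome has weight 1.
    """
    n = len(outcomes)
    weights = [fading ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for w, ok in zip(weights, outcomes) if ok) / total


if __name__ == "__main__":
    # 100 good transactions, then the provider's quality drops for 10 transactions.
    history = [True] * 100 + [False] * 10
    print(round(success_ratio(history), 2))  # 0.91: barely reacts
    print(round(faded_ratio(history), 2))    # 0.35: reflects the recent drop
```

With `fading=1.0` both functions return the same value, so the fading factor directly controls how quickly old behavior is forgotten.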
Therefore, many authors implement a countermeasure against this inertia in the form of a fading or forgetting factor that weights transaction ratings by their age: recent transactions are weighted normally, while older transactions lose more and more weight in the reputation value. On the one hand, this mechanism makes it easy for badly behaving peers to have others forget their bad behavior; on the other hand, service providers have to keep their quality high in order to keep a good reputation.

5) Expectation Problem: One issue inherent to all current reputation systems is the requirement of human participation in the rating process. As pointed out in section III-D, such a requirement reduces the usability of the system, as users are forced to perform additional actions whose cost may outweigh the benefits derived from the system. The design of source selection algorithms automates part of the system, while human participation remains necessary for the rating. This is a formalization problem, because an application cannot compare the expected and actual outcome of a transaction if the expected outcome cannot be formalized. In the example of P2P file-sharing, the application cannot check whether the downloaded copyright-free music file is a binary representation of the song the user wanted to download, since the user cannot tell the application exactly what he or she expects. This problem is explicitly addressed by the ReGreT system [12], which assumes a contract that states what the expected outcome is. It remains open whether the cost of formulating precise contracts for service use in peer-to-peer overlays is lower than the cost of repeated evaluation in the case of a bad outcome.

IV. CONCLUSION

This paper has summarized common categories of reputation systems described in the literature and has placed some selected reputation systems in this context.
Section II fostered a common understanding of the notions of trust in order to reason about reputation systems and their purpose. The classification of trust in [5] helps to distinguish between the amount of trust that can be put into a person's behavior in the context of risk, and trust in a person's behavior according to its effects on one's own welfare. It has been pointed out that trust can be built on reputation, but that in real economics reputation is not the only metric on which trust is built. Thus, reputation systems can provide only a small basis for establishing trust in distributed environments. Section III on reputation has guided through the main components and attributes of a reputation system, for example its architecture, which may be centralized like the eBay reputation system, decentralized like EigenTrust [6], or hybrid like [3]. It has been shown that reputation systems provide incentives according to their policies, which might be to encourage cooperation, to stop the spread of malware in file-sharing networks, or even to prevent freeriding. Section III-C introduced the different identity models of reputation systems and emphasized that a steady identity is a main requirement for implementing a reputation system. Without steady identities, reputation would only last as long as the identity is held, which malicious peers could easily exploit in a whitewashing attack. The core of a reputation system has been detailed in Section III-D, where a common denominator for the available classifications from [1], [5], [11] has been derived. The investigation focused on four parts of the core, namely the creation and content of recommendations, the selection of recommenders, the trustworthiness decision, and finally the source selection.
Different approaches could be distinguished by:
- their dimension of rating information (positive or negative ratings)
- the recommendation type (real number, binary, categorized, normalized)
- their storage structure (centralized, agents, peer itself)
- the selection of recommenders (voting, trust agent, random set, neighbors)
- the aggregation methods (convex combination, transitive trust chains, sum)
- the selection of service providers (deterministic, probabilistic)
The examined reputation systems show a wide range of these properties and also cover many of the cases in the surveys considered. Finally, security issues have been discussed in Section III-E. The results showed that considerable effort has to be put into preventing lying, whitewashing, sybil attacks, discrimination and unfair ratings in order to gain value from reputation usage. They also showed that the different approaches all have their weaknesses. It has been pointed out that the presence of weaknesses and the absence of an ultimate solution derive from the approaches' different goals and incentives and from the conflict of objectives described in III-B. Reputation systems can greatly enhance the transparency of marketplaces in peer-to-peer networks, as in economics, where, for example, rating agencies publicly rate companies for the reliability of their financial solvency, which serves potential co-contractors as a basis for risk assessment. In peer-to-peer networks, the risk of being exposed to malware can analogously be mitigated by risk assessment based on the recommendation data of specialized reputation systems. Since the notions of computer reputation systems originate from the social environment and economics, some of the systems, and especially the surveys on reputation systems, may offer useful ideas for situations outside of computer systems that deal with problems where solutions with a central management are infeasible. A likely example might be an open source engineering project such as an open source car (footnote 4). Nevertheless, for peer-to-peer networks, and perhaps for the growing number of online social networks, reputation systems are becoming an essential requirement, providing methods known from social life that help human users decide whom to trust.
REFERENCES
[1] Karl Aberer and Zoran Despotovic. Managing trust in a peer-2-peer information system. In CIKM '01: Proceedings of the tenth international conference on Information and knowledge management, New York, NY, USA, ACM. III-A, III-D, III-D1, III-D3, IV
[2] Ernesto Damiani, De Capitani di Vimercati, Stefano Paraboschi, Pierangela Samarati, and Fabio Violante. A reputation-based approach for choosing reliable resources in peer-to-peer networks. In CCS '02: Proceedings of the 9th ACM conference on Computer and communications security, New York, NY, USA, ACM. III-A, III-B, III-C, III-D1, III-D2, III-D3, III-D4
[3] Minaxi Gupta, Paul Judge, and Mostafa Ammar. A reputation system for peer-to-peer networks. In NOSSDAV '03: Proceedings of the 13th international workshop on Network and operating systems support for digital audio and video, New York, NY, USA, ACM. III-A, III-B, III-C, III-D1, III-D2, III-D3, III-D4, IV
[4] Kevin Hoffman, David Zage, and Cristina Nita-Rotaru. A survey of attack and defense techniques for reputation systems. ACM Comput. Surv., 42(1):1-31. III-D, III-E
[5] Audun Jøsang, Roslan Ismail, and Colin Boyd. A survey of trust and reputation systems for online service provision. Decis. Support Syst., 43(2). I, II, III, III-D1, III-E, III-E2, IV
[6] Sepandar D. Kamvar, Mario T. Schlosser, and Hector Garcia-Molina. The EigenTrust algorithm for reputation management in P2P networks. In WWW '03: Proceedings of the 12th international conference on World Wide Web, New York, NY, USA, ACM. III-B, III-D1, III-D2, III-D3, III-D4, IV
[7] Sergio Marti and Hector Garcia-Molina. Limited reputation sharing in P2P systems. In EC '04: Proceedings of the 5th ACM conference on Electronic commerce, New York, NY, USA, ACM. III-C, III-D3
[8] Sergio Marti and Hector Garcia-Molina. Taxonomy of trust: Categorizing P2P reputation systems. Comput. Netw., 50(4). III, III-B
[9] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford InfoLab, November. Previous number = SIDL-WP. III-D, III-E2
[10] Paul Resnick, Ko Kuwabara, Richard Zeckhauser, and Eric Friedman. Reputation systems. Commun. ACM, 43(12):45-48. III
[11] Sini Ruohomaa, Lea Kutvonen, and Eleni Koutrouli. Reputation management survey. In ARES '07: Proceedings of the Second International Conference on Availability, Reliability and Security, Washington, DC, USA, IEEE Computer Society. III, III-B, III-D, IV
[12] Jordi Sabater and Carles Sierra. Reputation and social network analysis in multi-agent systems. In AAMAS '02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems, New York, NY, USA, ACM. III-B, III-D1, III-E5

43 P2P GAMING OVERLAYS P2P Gaming Overlays Christian Glaser Abstract Current client/server approaches for realizing games with a large number of users place high demands on the servers and incur high operating costs. These demands and costs make it difficult to keep such a game running: if the game does not sell well enough, or if there are too few paying players on average, the operating costs cannot be covered. This paper presents several peer-to-peer (P2P) approaches that depart from the classic client/server model in order to shift the load onto the peers. Commercial use is still some way off, but the first approaches are very promising. I. INTRODUCTION In current client/server architectures, players send their movement information and other actions to the server. The server must validate these actions and then forwards the information to the players for whom it is relevant. The server has to do this for every single player, which quickly leads to an extreme load. This becomes a problem especially for games with a very large number of players, such as Massively Multiplayer Online Games (MMOGs) or Massively Multiplayer Online Role-Playing Games (MMORPGs). Fig. 1 shows the number of active players of the best-known MMOGs. A single server quickly becomes insufficient, and entire server clusters have to be set up. These clusters must be maintained and cooled, and enough bandwidth must be available to handle the data transfer. All of this generates enormous costs, which can only be covered if enough players pay for the game in the long run. If this is not the case, there is a high risk that the costs exceed the revenue, or at least come so close to it that running the game is no longer profitable.
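A rough worked example shows why this load grows so quickly. Assume, purely for illustration, that each of the N connected players sends one position update per tick and that, in the worst case, every update is relevant to all N - 1 other players. The server then receives M_in and has to send M_out messages per tick:

```latex
M_{\mathrm{in}} = N, \qquad M_{\mathrm{out}} = N\,(N-1) = N^2 - N,
\qquad N = 1000 \;\Rightarrow\; M_{\mathrm{out}} = 1000 \cdot 999 = 999\,000
```

In this worst case, the outgoing load therefore grows quadratically with the number of players, while the incoming load grows only linearly.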
It is therefore hard, especially for smaller companies, to develop an MMOG, because the economic risk is too great, all the more so because World of Warcraft already dominates the market and has most of the players on its servers. Fig. 1 shows quite clearly how World of Warcraft, with more than 10 million active players, leaves hardly any players for other MMOGs [11].
Fig. 1. Active players of the largest MMOGs. World of Warcraft stands out with more than 10 million active players. Source: MMOGCHART, MMOG Active Subscriptions: 200,000+, mmogchart.com/chart1.html
P2P systems could offer a solution to these problems, since the server is relieved or made entirely unnecessary. This is possible by distributing the load among the individual players (peers), so that every new client contributes resources to the system. Each peer must send its updates directly to the peers for which the information is relevant; the main task of such a P2P system is therefore finding the relevant peers. This paper presents several alternative approaches for realizing such a P2P system. The main focus is on transmitting position information: since players spend most of their time moving their avatars through the game world, this is the most important form of communication in MMOGs. First, multicast is presented, an important technique for P2P systems because the bandwidth is often insufficient to send all messages via unicast. This is followed by some classic approaches to interest management, which forms the basis of such a P2P system, and finally by the Donnybrook system, which performs interest management at a higher level in order to enable fast action games (i.e., games that require low delay) with many players. II.
MULTICAST Since in a P2P system of the kind needed for MMOGs the same message often has to be sent to many players, it is often sensible not to send it to each recipient individually via unicast, but only once into the network, which then forwards it to all recipients. Ideally, as in DIVE [7], one would have a LAN that supports multicast, i.e., the ability to join a group so that a message sent to the group is delivered to all of its members, handled entirely by the network. Unfortunately, such multicast is not supported by many networks, above all the Internet [5]. Application Layer Multicast (ALM) offers a remedy. It is not as efficient as network-level multicast, but with a large number of peers it is still better than sending every message to all peers one after another,
since the bandwidth is often insufficient for that. Instead, multicast trees are built: a packet is sent once (or a few times), and each recipient forwards it to further peers. There are many ways to build such trees; for example, the peers could be divided into fixed groups, or the trees could be built dynamically based on the available bandwidth. Multicast trees are a well-researched field, so many implementations already exist; some are listed in [11]. III. INTEREST MANAGEMENT Lu Fan et al. [11] describe interest management as follows: without the support of a server, it is important to limit the data a peer has to send or receive, since it would otherwise exceed its bandwidth. Interest management (IM) is a classic research field that addresses this topic; it was originally established by Macedonia et al. in the 1990s (according to [12], as cited by [11]). The idea of interest management rests on two assumptions: 1. A player does not need to know everything in the game world as long as it does not concern them. 2. A player's avatar has only limited movement speed and perception. A player's view can therefore be restricted to a small area. This smaller and comparatively static area is called the area of interest (AOI). Simplifying, three classes can be distinguished: a region-based publish/subscribe system, a spatial system, and a hybrid of these two with the client/server system described in the introduction. In the following, peers located within the AOI are called neighbors.
Fig. 2. Model of a publish/subscribe system. Source: Peer-to-Peer Support for Massively Multiplayer Games [4]
A. Region-based publish/subscribe In a region-based publish/subscribe system, the world is divided into fixed regions when the virtual world is designed. Peers whose avatars are in the same region form an interest group and distribute their messages only among themselves via multicast or, if the game design allows it, also to neighboring regions [4]. This concept is very simple to implement, and computing the area of subscription is easier than computing AOI collisions. However, the system suffers from several problems. It is difficult to choose the right region size: regions must be large enough to ensure that peers distribute their messages before leaving the region again, but not so large that a peer is flooded with messages. Finding the right size is particularly difficult when peers move at very different speeds, e.g., an airplane and a foot soldier [8]. In addition, a static partitioning into regions does not always work when peers are unevenly distributed [11].
B. Spatial A spatial system is basically driven by the distance between a player's avatar and the other avatars in the game world. This can be done, for example, by constructing so-called Voronoi diagrams [13] or by working only with distances and sensor nodes [2].
Fig. 3. Voronoi diagram. The red circle is the peer's AOI. Squares are enclosing neighbors, triangles are boundary neighbors, circles are ordinary neighbors, and crosses are not neighbors. Source: VON [13]
1) VON (Voronoi-based Overlay Network): Given a set of points in space, called centers, a Voronoi diagram partitions the space into regions such that every point in a region is closer to that region's center than to any other center. In a game, the centers are the positions of the individual players for whom a Voronoi diagram is built.
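The definition can be made concrete with a brute-force Python sketch that decides which Voronoi region a point of the game world belongs to, simply by finding the nearest center. It only illustrates the definition: in VON every client keeps a complete Voronoi diagram up to date, which an implementation would more plausibly construct with a dedicated algorithm (Fortune's sweep line, for example) than with nearest-center queries. The function name, player names and coordinates are hypothetical.

```python
# Sketch: Voronoi region membership by nearest centre (brute force).
# Centres are player positions; names and coordinates are hypothetical.
from math import dist


def voronoi_region_owner(point, centres):
    """Return the name of the centre whose Voronoi region contains `point`,
    i.e. the centre closest to it (ties go to the first centre listed)."""
    return min(centres, key=lambda name: dist(point, centres[name]))


players = {"alice": (0.0, 0.0), "bob": (10.0, 0.0), "carol": (5.0, 8.0)}

for p in [(2.0, 1.0), (9.0, 2.0), (5.0, 6.0)]:
    print(p, "->", voronoi_region_owner(p, players))
```

Because every point is assigned to exactly one nearest center, the regions tile the whole game world, which is the property VON relies on when each player's region is used to find the surrounding neighbors.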
The player's AOI is then laid over this diagram as a circle, which determines the enclosing neighbors, which directly surround the player, and the boundary neighbors, which lie at the edge of the AOI. Fig. 3 visualizes this. A direct connection is maintained to the neighbors
(no multicast) in order to guarantee minimal latency. A connection to the enclosing neighbors is maintained in any case, even if they do not lie entirely within the AOI; this is necessary so that the P2P network does not become disconnected. Whenever a player's avatar moves, the client sends position updates to all neighbors. New nodes are discovered via the boundary neighbors, since these know nodes outside the AOI: if the recipient of a position update is a boundary neighbor, it checks whether any of its own enclosing neighbors have now become visible to the moving avatar and informs the peer that owns this avatar. The peers concerned then establish direct connections, exchange neighbor information and update their Voronoi diagrams. A new player who wants to join must send the coordinates of their avatar to an arbitrary node. That node forwards the request to a node closer to these coordinates, and so on, until a node is found that lies within the new player's AOI. When a peer leaves, whether because its connection breaks or through a deliberate disconnect, the remaining nodes update themselves via its enclosing neighbors. A drawback of this system is the large computational overhead on each machine, since every client must keep a Voronoi diagram up to date. Moreover, communication is not minimal: consider a player who has a great many enclosing neighbors, none of which lies within the AOI. Fig. 4 visualizes this problem.
Fig. 4. Voronoi worst case, created with CGAL. Source: CGAL 2D Voronoi diagram
Fig. 5. pSense. Source: [2]
2) pSense: Like VON, pSense [2] works with the distance between a player's avatar and the others. Unlike VON, however, pSense avoids the costly computation of Voronoi diagrams.
Avatars within the AOI are simply kept in a near-node list. To prevent the network from becoming partitioned, pSense uses sensor nodes, which are stored in a sensor-node list. If connections were maintained only to peers in the AOI, the network would partition very quickly as soon as some space opened up between groups of players, and since there is no server with global knowledge, the network could not be merged again. Therefore, in each compass direction of the game world, pSense maintains a connection to the peer whose avatar is closest in that direction; these are the sensor nodes. Fig. 5 a) and b) show this. The sensor nodes are periodically asked whether they know a better sensor, and as soon as an avatar moves closer to the AOI, the sensor node notices this and suggests it as a better sensor node (Fig. 5 c). Apart from this, the network can now only partition when a sensor node goes offline. If this happens, peers from the near-node list that lie in that sector, or sensor nodes from neighboring sectors, are queried. Only in the rare case that all avatars are positioned on a straight line can the network still partition. The sensor nodes are also
responsible for noticing avatars that approach the AOI.
Fig. 6. pSense. Source: [2]
To join the network, a player only needs to know the address of any peer, e.g., obtained from a login server that knows such a peer. If that peer's avatar is already within the AOI, the new player can start sending position updates immediately. If the peer's avatar is not yet within vision range, the peer names a sensor node that is closer to the new player's position, and the player is forwarded in this way until a peer whose avatar lies in the AOI is found. Fig. 5 a) and b) show how the still unknown neighbors are then found. Position updates are forwarded to all nodes that are already known (as far as the bandwidth allows); p is still unknown. q receives the position update, already knows p, and forwards the update to p. p then registers as a new neighbor and is known from then on. In Fig. 5 b), p is not known to any peer in the AOI; instead, sensor S knows an avatar that is closer and therefore forwards the position update to q, which knows p and in turn forwards the update to it. C. Hybrid In the hybrid approach, the world is divided into regions as in the region-based publish/subscribe model, and a super-peer is chosen for each region. When a peer connects, it finds out the super-peer of its region and sends it occasional position updates. The super-peer thus has a global view of the world and can predict AOI collisions in advance. An AOI collision occurs when the AOIs of two avatars overlap, so that the avatars could see each other. The super-peer informs avatars whose AOIs are likely to intersect soon, so that they establish direct P2P connections. MOPAR [1], for example, is such a hybrid system.
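The super-peer's prediction step can be sketched in Python under several assumptions that the text does not fix: AOIs are circles, each avatar reports its position and velocity, and the super-peer extrapolates positions linearly over a short look-ahead window sampled at fixed time steps. `predict_collisions` returns the pairs whose AOIs overlap now or at a sampled time within the window; in the hybrid scheme these are the pairs the super-peer would tell to open direct P2P connections. All names and numbers are hypothetical, and MOPAR's actual prediction method may differ.

```python
# Sketch of a region super-peer predicting AOI collisions (hybrid model).
# Assumptions (hypothetical): circular AOIs, avatars report position and
# velocity, and positions are extrapolated linearly over a short horizon.
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass
class Avatar:
    name: str
    x: float
    y: float
    vx: float
    vy: float
    aoi_radius: float

    def position_at(self, t):
        """Linearly extrapolated position t seconds from now."""
        return (self.x + self.vx * t, self.y + self.vy * t)


def aoi_overlap(a, b, t):
    """True if the circular AOIs of a and b overlap at time t."""
    return dist(a.position_at(t), b.position_at(t)) < a.aoi_radius + b.aoi_radius


def predict_collisions(avatars, horizon=2.0, step=0.5):
    """Pairs whose AOIs overlap now or at a sampled time within `horizon`.

    The super-peer would notify these pairs so they open direct P2P links.
    """
    steps = int(horizon / step)
    pairs = []
    for a, b in combinations(avatars, 2):
        if any(aoi_overlap(a, b, k * step) for k in range(steps + 1)):
            pairs.append((a.name, b.name))
    return pairs


region = [
    Avatar("alice", 0, 0, 15, 0, 10),   # moving towards bob
    Avatar("bob", 40, 0, 0, 0, 10),     # standing still
    Avatar("carol", 0, 200, 0, 0, 10),  # far away, standing still
]
print(predict_collisions(region))
```

Checking all pairs is quadratic in the number of avatars in the region, which illustrates why the text warns that the super-peers can become heavily loaded.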
This system offers the fine-grained partitioning of the regional approach without the computational overhead on the peers caused by, for example, Voronoi diagrams, and it is easier to implement than a regional approach. However, it can place a high load on the super-peers and makes them a single point of failure (Lu Fan et al. [11]). IV. HANDLING HIGH PLAYER DENSITY In MMOGs, player density often becomes very high at certain places or times, e.g., in cities or during events that many players want to attend. Here, too, there are different ways to deal with this. A. Adjusting the AOI dynamically VON [13] and Velvet [10], for example, adjust their AOI dynamically, shrinking or enlarging it depending on how many avatars are nearby. Figures 7 and 8 illustrate this. The advantage is that no lag arises from peers having to wait for position updates; however, the player can no longer see as far, which may restrict the player's options (e.g., when the AOI becomes smaller than the player's range of action).
Fig. 7. Shrinking the AOI. Source: Velvet [10]
B. Multicast pSense [2], by contrast, does something similar to multicast: it simply sends position updates to as many neighbors as the bandwidth allows. When a peer receives a position update, it forwards it again, since some peers may not have received it yet. This
is repeated until a fixed hop count is exceeded. The drawback is that all peers that do not receive the message in the first hop experience a larger delay. The performance of the overall system thus decreases, but in return the AOI range does not have to be restricted, so the player does not lose any options.
Fig. 8. Enlarging the AOI. Source: Velvet [10]
C. Better interest management Another possibility, which could be adopted from Donnybrook [3], is to replace avatars to which the player is currently not paying attention with guided AI bots (doppelgängers). This is described in detail in the next section. It combines the advantages of the two systems described above without their disadvantages. V. DONNYBROOK The previous sections focused on MMOGs, which allow a very large number of players, but only with large delays. Dodging or similar fast actions are usually not possible there; instead, hits are usually computed using probabilities (e.g., the WoW combat rating system, footnote 1). Fast action games, by contrast, require low delay and are based on fast actions. Until now, however, this has only been possible with small numbers of players: Battlefield 1942, for example, supports up to 64 players (footnote 2), provided that the server is really powerful. Donnybrook [3] takes an entirely new approach to enable such action games with very many players: it replaces avatars on which the player cannot currently concentrate, due to the limited perceptual capacity of humans, with guided AI bots. Only the peers whose avatars currently have the player's attention are represented accurately.
The difference from classic interest management systems is therefore that, within the AOI, a further distinction is made between interesting and less interesting avatars, whereas classic interest management treats all avatars in the AOI as interesting. A. Interest Sets Donnybrook assumes, based on [6] and [9], that humans can only concentrate on a constant number of objects and notice very little of the remaining ones. To model this, each player maintains an interest set listing the 5 other avatars to which the player pays the most attention. Only from these 5 avatars does the player receive exact position information. To determine which avatars receive the most attention, factors such as the following are analyzed:
- the distance to the avatar: avatars that are close receive more attention
- the player's aiming direction: where is the player aiming?
- which avatars received attention most recently, since humans typically pursue a specific target
Depending on the context, these factors can be weighted differently. If the player is aiming with a sniper rifle, for example, the point being aimed at is the strongest indicator of attention, whereas with a melee weapon the player tends to concentrate on targets nearby. B. Doppelgängers All other avatars, on which the player is not concentrating, are controlled by guided AI bots (so-called doppelgängers) and receive movement impulses from the real players only occasionally (roughly once per second). Between these impulses a doppelgänger must take control; otherwise the avatar would jump from one position to the next without smooth movement and thereby attract the observing player's attention. VI.
OUTLOOK It has been shown that very mature techniques for transmitting positions over a P2P network already exist. Region-based publish/subscribe techniques are a good choice if the game world can be partitioned very well. VAST (footnote 3) is an already implemented spatial variant that uses VON to transmit position updates over a P2P network. pSense offers an approach that requires little computation and thus makes P2P MMOGs possible on weaker computers as well. The same holds for the hybrid approach, which could additionally offer more security, since a super-peer always monitors each area. These super-peers could be servers run by the developer, which would still be much less costly than in a classic client/server system. Before these systems can be used in a real MMOG, however, much more work is needed, since only position transmission has reached this level of maturity. For other features such as persistence and security, some approaches exist, but they are not yet mature enough. For persistence, for example, mature distributed database systems already exist, but it remains to be investigated to what extent these systems are suitable for MMOGs or how they would have to be modified,
since they were originally developed for file sharing and do not meet the performance requirements and security criteria of an MMOG. Security is very hard to guarantee in MMOGs because there is no central authority that could, for example, verify position updates. This argues for the hybrid approach, which at least has a regional super-peer that could take over this task [11]. The field thus still faces many open problems, but it also shows great potential, which invites speculation about future MMOGs. MMOGs without monthly fees are conceivable, since server costs would disappear and monthly fees could hardly be justified anymore. An entirely new kind of game would also be conceivable: action MMOGs that combine interest management techniques with Donnybrook and offer a huge world for millions of players together with fast, action-packed battles.
REFERENCES
[1] Anthony Yu, Son T. Vuong. MOPAR: A Mobile Peer-to-Peer Overlay Architecture for Interest Management of Massively Multiplayer Online Games. Proceedings of the international workshop on Network and operating systems support for digital audio and video, III-C
[2] Arne Schmieg, Michael Stieler, Sebastian Jeckel, Patric Kabus, Bettina Kemme, Alejandro Buchmann. pSense - Maintaining a dynamic localized peer-to-peer structure for position based multicast in games. Eighth International Conference on Peer-to-Peer Computing, III-B, 5, III-B2, 6, IV-B
[3] Ashwin Bharambe, John R. Douceur, Jacob R. Lorch, Thomas Moscibroda, Jeffrey Pang, Srinivasan Seshan, Xinyu Zhuang. Donnybrook: Enabling Large-Scale, High-Speed, Peer-to-Peer Games. ACM SIGCOMM Computer Communication Review, IV-C, V
[4] Björn Knutsson, Honghui Lu, Wei Xu, Bryan Hopkins. Peer-to-Peer Support for Massively Multiplayer Games. IEEE INFOCOM, III-A, 2
[5] Christophe Diot, Brian Neil Levine, Bryan Lyles, Hassan Kassem, Doug Balensiefen.
Deployment Issues for the IP Multicast Service and Architecture. IEEE Network, II
[6] Nelson Cowan. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, V-A
[7] Emmanuel Frécon, Mårten Stenius. DIVE: a scaleable network architecture for distributed virtual environments. Distributed Systems Engineering, II
[8] Graham Morgan, Fengyun Lu, Kier Storey. Interest Management Middleware for Networked Games. Proceedings of the 2005 symposium on Interactive 3D graphics and games, III-A
[9] J. G. Robson, Norma Graham. Probability summation and regional variation in contrast sensitivity across the visual field. Vision Research, V-A
[10] Jauvane C. de Oliveira, Nicolas D. Georganas. VELVET: An Adaptive Hybrid Architecture for VEry Large Virtual EnvironmenTs. Presence, IV-A, 7, 8
[11] Lu Fan, Phil Trinder, Hamish Taylor. Design Issues for Peer-to-Peer Massively Multiplayer Online Games. International Journal of Advanced Media and Communication, I, II, III, III-A, III-C, VI
[12] Michael R. Macedonia. NPSNET: A Network Software Architecture for Large Scale Virtual Environments. Citeseer, III
[13] Shun-Yun Hu, Jui-Fa Chen, Tsu-Han Chen. VON: A Scalable Peer-to-Peer Network for Virtual Environments. IEEE Network, III-B, 3, IV-A

49 GENERAL OVERVIEW ON BENCHMARKING TECHNIQUES AND THEIR APPLICABILITY FOR P2P SYSTEMS General Overview on Benchmarking Techniques and Their Applicability for P2P Systems Tomasz Grubba Abstract During the last ten years, the number and variety of peer-to-peer systems has risen. Although different benchmarking methods have existed since the 1970s, little effort has been put into developing a standardized benchmarking system in this field. Considering that the number of (newly developed) peer-to-peer systems is still growing, a tool for measuring performance may become helpful. In this paper the term benchmarking is defined, and a few benchmarks from other domains are introduced and discussed with respect to peer-to-peer systems. Finally, the only existing peer-to-peer benchmark, P2PTester, is presented, and based on the results of the discussion a list of requirements for future benchmarks or performance-measuring tools in this field is provided. I. INTRODUCTION At a time when Google Inc., originally an Internet search company, holds the number one spot among the most valuable brands in the world [19], it is not surprising that distributed systems play an increasingly important role. One of these technologies, peer-to-peer, first became known to the public when Napster [3] was released in the late 1990s. Since then, the idea of peer-to-peer has continued to evolve and has led to "success stories" such as Skype [1] and BitTorrent [10]. The idea behind this model is communication between several participants, each of which can act as both a server and a client.
Furthermore, such a system can be characterized by three properties [18]: autonomy (each peer can join or leave the network at any time), heterogeneity (the connected peers can run on different operating systems or platforms), and most traffic happening between peers (peers can establish direct connections with each other and exchange data directly). Applications built on the peer-to-peer model are said to offer better scalability, availability and robustness. The scalability argument is easily illustrated with a common file-sharing example: the more peers own a certain file (seeds), the more sources another peer can contact, so, in theory, the available bandwidth grows with each additional peer. And since the files are distributed among many peers, such a system usually lacks a single point of failure, which makes it more robust against attacks or data loss than a single server.

Even though many peer-to-peer projects exist, there is still no standardized way to test the performance of such systems. In fact, there is only one approach to benchmarking them [6], and it has not yet been deployed in many scenarios. This paper defines the term benchmarking, both in its original meaning and in its meaning in computer science. A few benchmarking tools, Linpack, TPC-E, Magpie and P2PTester, are introduced and discussed with respect to their applicability to peer-to-peer systems. Finally, a short overview of the required functionality and of important design decisions for peer-to-peer benchmarking tools is provided.

II. BENCHMARKING

Today it is hard to give a precise definition of the term benchmarking. Since it was first introduced by Xerox Corporation in the 1970s, it seems appropriate to quote David T. Kearns, former CEO of Xerox: "Benchmarking is the continuous process of measuring products, services and practices against the toughest competitors or those companies recognized as industry leaders" [5].
This quote contains an important point that is often overlooked when talking about benchmarking: every benchmark needs at least two parties that can be compared with each other. Without such a reference point, many metrics cannot be interpreted meaningfully. As with the term itself, there are many definitions of the benchmarking process. Usually, all of them can be roughly reduced to two main steps [5]: benchmarking performance, i.e., quantifying one's own performance and comparing it to that of competitors; and changing practices, i.e., making changes inside the company in order to improve its performance. As step two shows, the term originates from business administration, but today many of its characteristics can be transferred to computer science, as described in the following section.

A. Benchmarking in Computer Science

In computer science, the first benchmarking tools emerged in the 1980s. They usually run on several systems (the term system is made more precise below) and try to quantify and compare performance as well as price (usually as a five-year cost of ownership). This paper focuses on performance only. Basically, one can distinguish four forms of benchmarking [11]: the same software runs on different (hardware) machines (e.g., running the Linpack benchmark on different processors); different software runs on the same machine (e.g., evaluating the efficiency of different algorithms); different releases of a software product run on the same machine (e.g., comparing different versions of the Microsoft Windows operating system); and different software and hardware systems are compared with each other (e.g., comparing "Wintel" systems with the former Mac/PowerPC systems).

The basic process of running a benchmark is comparable to that of software testing. First, the System Under Test (SUT) is specified, along with the performance aspects to be measured. In step two, a metric is defined that expresses performance and allows systems to be compared with each other. Once this output is defined, a workload is provided; depending on the type of benchmark, the workload can consist of equations to be solved or of data to be inserted into a database. Finally, the benchmark is executed (usually several times) and the results are evaluated.

One of the first approaches was to measure performance in MIPS (millions of instructions per second). As Gray states [11], this approach is too simple. For instance, a MIPS value means different things on different architectures, because a single instruction may do more or less work depending on the instruction set. In addition, the metric scales poorly (e.g., it is not clear how it should be interpreted in a multiprocessor environment). Therefore, each domain, or even each benchmark vendor, uses its own metric in order to make results comparable across platforms.

B. Domain-specific Benchmarks

Over the years, it became clear that it would not be possible to find and define one standard benchmark for comparing computer systems. The reason can be explained by a simple example.
Supercomputers [2], for example, are usually designed purely for fast CPU execution but often lack database functionality, which would make them useless in the database domain. In the following, four benchmarks of varying popularity from different domains are briefly described, ordered by their applicability to peer-to-peer systems. One rather popular field of benchmarking, that of graphics cards, is left out because it is irrelevant to peer-to-peer benchmarking.

1) Linpack: The Linpack benchmark [9] grew out of LINPACK, a collection of Fortran subroutines for solving systems of linear equations that was developed in the 1970s on top of the BLAS (Basic Linear Algebra Subprograms). Over the years, more performance data was added; today, Linpack is one of the standard tools for measuring computational power and is also used to rank computers in the Top500 list [2]. The main idea behind the benchmark is to solve a general dense system of linear equations Ax = b. Because of memory limitations, matrices of size 100 × 100 were used initially; as memory capacity increased, this limitation was dropped. Fig. 1 lists the four benchmarks in the Linpack suite.

Fig. 1: Matrix sizes in Linpack benchmarks
Benchmark name      Matrix dimension
LINPACK 100         100
LINPACK 1000        1000
LINPACK Parallel    1000
HPLinpack           arbitrary

The number of operations varies between benchmarks; for example, the DGEFA routine of Linpack 100 counts close to floating-point operations. The results are reported in FLOPS (floating-point operations per second). Current supercomputers achieve up to TeraFLOPS [2], while a common current personal computer (e.g., with an Intel i5 at 2.26 GHz) may compute close to MegaFLOPS.

Linpack recently became somewhat better known to the public when Google released Android 2.2, which introduced a new JIT compiler.
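To make the FLOPS metric concrete, the following Python sketch mimics the structure of a Linpack run; it is not the official benchmark code. It generates a random dense system Ax = b, solves it by Gaussian elimination with partial pivoting (the method behind Linpack's factor-and-solve routines), checks the residual, and divides the operation count conventionally used for Linpack, 2/3·n³ + 2·n², by the fastest of several runs. The function names, the matrix size of 100 and the number of repetitions are illustrative assumptions, and interpreted Python will report far lower rates than tuned Fortran or C code on the same machine.

```python
import random
import time
from typing import List, Tuple


def linpack_flop_count(n: int) -> float:
    """Operation count conventionally used by Linpack for factoring and solving an n x n system."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    m = [row[:] for row in a]
    x = b[:]
    for k in range(n):
        # Partial pivoting: move the row with the largest |m[i][k]| into position k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
            x[i] -= f * x[k]
    # Back substitution on the resulting upper-triangular system.
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / m[i][i]
    return x


def run_benchmark(n: int = 100, repetitions: int = 3, seed: int = 1) -> Tuple[float, float]:
    """Return (MFLOPS of the fastest run, max-norm residual of the computed solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        x = solve(a, b)
        best = min(best, time.perf_counter() - start)
    # A rate is only meaningful if the answer is correct, so check ||Ax - b||.
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return linpack_flop_count(n) / best / 1e6, residual


if __name__ == "__main__":
    mflops, residual = run_benchmark()
    print(f"n=100: {mflops:.1f} MFLOPS (residual {residual:.1e})")
```

Taking the fastest of several runs and rejecting results with a large residual follow the general process outlined in Section II-A: a fixed workload, repeated execution, and a metric that is only reported for a correct result.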
Early speed tests with Linpack for Android showed a performance increase of up to 450% compared with Android 2.1 (see, e.g., [20]).

Linpack has also been subject to criticism. First, as mentioned before, it only quantifies processing power (and, to some extent, memory speed), which is not necessarily representative of the overall performance of a system. Furthermore, many manufacturers tried to optimize their machines for the Linpack benchmark, for example by recognizing the benchmark's computations and replacing them with faster algorithms. In addition, such a benchmark is not representative of the many computational problems that do not rely heavily on floating-point operations. As for peer-to-peer systems, Linpack is of almost no use, because it executes locally on a single machine. Moreover, the processing power needed to run a peer-to-peer client should be sufficient on most, if not all, of today's computers (including mobile devices).

2) Transaction Processing Performance Council: Of the several consortia that try to standardize domain-specific benchmarks, one of the better known is the Transaction Processing Performance Council (TPC), which consists of several hardware and software manufacturers and defines standards for transaction processing and database benchmarks. At the time of writing, it has released nine different benchmarks [17], of which only TPC-C, TPC-E and TPC-H are still used and maintained. Of those, TPC-E, a benchmark for On-Line Transaction Processing developed in 2007 [16], is briefly introduced here. It simulates the workload of a brokerage firm with three distributed entities (Customers, Brokerage House and Stock Exchange), which can roughly be described as a three-tier architecture (see Fig. 2).

Fig. 2: TPC-E business model [16]

The whole system consists of 33 tables with 188 columns and is designed to be vendor-neutral, so that the benchmark can run on different platforms. Fig. 3 shows an ER diagram that demonstrates the complexity of the system. While the individual tables are not of interest in this context, the colors indicate the "tiers" of the architecture (red: broker, blue: customer, green: market, orange: dimension tables used by more than one tier).

Fig. 3: TPC-E database model [16]

A data loader is also provided; it does not impose a maximum number of entries. Customers are loaded in fixed-size blocks, and along with them, companies, securities and a few other entities are loaded into the database. After the data has been loaded, a predefined set of transactions is executed. The benchmark contains 12 different transaction types, such as "Enter a stock trade" or "Lookup historical trade info". Results are reported as tpsE (transactions per second, with the suffix E distinguishing it from the metrics of the other TPC benchmarks). Compared with earlier benchmarks of the consortium (e.g., TPC-A or TPC-B), TPC-E has several benefits.
Since it partly uses real data (based on the 2000 US and Canada census data), it is easier for humans to understand and represents a much more realistic scenario than abstract queries (e.g., those of the Wisconsin Query Set [8]). Furthermore, because the size of the dataset is variable, it scales easily and can simulate a small brokerage firm as well as a large one with millions of customers. In contrast to the previously introduced Linpack benchmark, TPC-E can also be run on a distributed system. Unfortunately, the results only show transactions per second and are not broken down into the parts of the underlying architecture. For a database built on the peer-to-peer model, the TPC benchmarks could be a first step towards measuring performance. However, since the tools basically run on the client side (i.e., on the customer and brokerage-house side), the results would mainly depend on the machines they are tested on. Moreover, observations based on averaged results from different clients hide information about the peers at both ends of the spectrum (i.e., the extremely slow and the extremely fast ones), which may be of interest when developing such systems.

3) Magpie: Another interesting approach to measuring the performance of distributed systems is Magpie [4], [13], developed by Microsoft Research in the UK. Unfortunately, even though the project has existed since at least 2002, no tools have been made available to the public to date. Therefore, this section only outlines the basic idea behind the project, without any examples of real-world tests. Magpie is designed as an online modelling infrastructure and is based on two key design principles. First, because most Microsoft products, especially Windows, are proprietary, it needs to be a black box, i.e., software tested with it does not need to be modified.
Second, end-to-end tracing enables every request between any two peers to be analyzed individually. Using further techniques, such as behavioural clustering (rather than, for example, URL-based clustering) and a stochastic context-free grammar with a probabilistic state machine, Magpie can model the system and benchmark it in order to optimize its performance. This seems to be a new way of benchmarking systems, but, as mentioned before, too little information has been shared publicly, even though the authors were planning to profile a peer-to-peer messaging framework in 2002 [13]. Since no details about the work have been published in recent years, the project may have been discontinued.

4) P2PTester: The only project that focuses solely on peer-to-peer benchmarking is P2PTester, led by the PRiSM Laboratory in France [6]. Its authors state three main goals for their performance measurement tool:
1) Genericity: the tool should be applicable to a wide range of peer-to-peer platforms.
2) Scalability: the tool should be deployable on a large scale and produce only little overhead itself.
3) Modularity: detailed, fine-grained measurements should be possible in order to analyze the components of the underlying system.

P2PTester is structured into four layers, which are shown schematically in Fig. 4 and explained in the following.

Fig. 4: P2PTester Architecture [6]

The Communication (or Network) Layer offers an infrastructure for exchanging messages between peers; the current version uses a basic socket approach. The Application (or Protocol Peer) Layer consists of application-specific modules (i.e., it is the layer where the application itself is located), which can, for example, be responsible for routing or query processing. The Test(er) Layer is responsible for enabling modular measurements of the components.
Its main part is the distributed logging module, which records the details of each event by intercepting every action performed by the application. The Master Tester (or Test Generation) Layer helps the user create and control tests; it contains a Java graphical user interface and additionally provides its own web interface for presenting the results. P2PTester provides an API for writing test scenarios. Through this API, all actions of the peer-to-peer system are intercepted by P2PTester and logged. From these log files, the tool can measure the following parameters: the number and size of messages needed for standard P2P actions (joining, leaving, querying); the size of the data stored on each peer; query result sizes; and query processing time, partly broken down into finer-grained information.

Among the benchmarking tools presented here, P2PTester is the most applicable one in the peer-to-peer world. It has the potential to work on a wide variety of systems (provided they can be adapted, see below) and to deliver comparable results. Even though it is a very promising project, the approach also has a few downsides. First, because P2PTester relies on interfaces that the peer-to-peer system has to use, it is harder to apply to many existing architectures; in contrast to Magpie, it is not a black-box solution. Especially for proprietary software, the tool might be useless. Second, it is not known whether the system has been deployed on a large number of peers. The largest known deployment is a test on roughly 100 computers at three locations in France. Considering that many systems are used by several million peers (e.g., Skype), it is hard to predict how applicable the tool is at that scale. Third, compared with the other benchmarks presented in this paper (especially Linpack and TPC-E), it lacks an easy-to-understand metric.
On the one hand, this makes it easier to benchmark components and individual requests (which is important for developers); on the other hand, ordinary end users will have trouble understanding the results because of their complexity. Nevertheless, at the time of writing, P2PTester is the best and most promising approach to benchmarking peer-to-peer systems.

5) Other approaches: Several other approaches for simulating distributed systems or measuring their performance exist, but most of them have limitations that make them hard to apply in the real world. PeerSim [14] is a tool for simulating peer-to-peer networks, but it cannot be applied to existing systems with their own infrastructure in order to measure their performance. GnutellaSim [12] simulates a peer-to-peer network by instantiating a framework with the Gnutella network. In theory, it should be possible to extend it to other systems, but no simulations of other systems can be found. Antoniu et al. [15] performed a large-scale experimental evaluation of the JXTA framework on peers located across France. They set up their own peer-to-peer network in order to evaluate the limits of the framework (the number of peers that can be kept in one peerset). Since the evaluation is tied to JXTA, this approach is hardly transferable to other systems, which makes comparisons difficult. A few other simulators of specific P2P networks exist, but all of them focus on a single network; as stated at the beginning, benchmarking requires at least one alternative to be present and tested as well.

III. REQUIREMENTS FOR A PEER-TO-PEER BENCHMARKING TOOL

Even though P2PTester seems to be heading in the right direction, it will probably not be suitable for every system and every scenario. Therefore, this section defines a few requirements for a peer-to-peer benchmark and discusses the needs of several parties. Before developing a new benchmark, one needs to decide on its main goal, since the point of view of the end user may differ from that of the developer. While the developer wants to build a fast and stable overall system and to tune individual components of the product, a single user is only interested in their own performance and, as far as benchmarks are concerned, in an easy-to-understand metric. This paper therefore recommends gathering results that are as detailed as possible (as described under modularity in the discussion of P2PTester); these results can later be condensed into a metric that is easier to read for non-technical users.
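Following this guidance, a benchmark can keep raw measurements per peer and per action and condense them into a single figure only at the end. The Python sketch below illustrates the idea; all names (Event, Recorder, intercept, ToyPeer) are hypothetical, and it does not reproduce P2PTester's actual API. A decorator intercepts peer actions, roughly as P2PTester's logging module intercepts application events, and the log is reported per peer (count, minimum, median, maximum, bytes sent) for developers and as one median latency for end users. The per-peer view also avoids the weakness noted for TPC-E, where averaging across clients hides the slowest and fastest peers. The sketch assumes that each intercepted method returns the message it sends.

```python
import statistics
import time
from collections import defaultdict
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Event:
    peer: str
    action: str
    seconds: float
    bytes_sent: int


@dataclass
class Recorder:
    """Hypothetical event log: one entry per intercepted peer action."""
    events: list = field(default_factory=list)

    def intercept(self, action: str):
        """Decorator that times a peer method and logs the size of the message it returns."""
        def decorator(method):
            @wraps(method)
            def wrapper(peer, *args, **kwargs):
                start = time.perf_counter()
                message = method(peer, *args, **kwargs)
                elapsed = time.perf_counter() - start
                self.events.append(Event(peer.name, action, elapsed, len(message)))
                return message
            return wrapper
        return decorator

    def per_peer(self, action: str) -> dict:
        """Detailed view for developers: statistics per peer instead of one global average."""
        grouped = defaultdict(list)
        for e in self.events:
            if e.action == action:
                grouped[e.peer].append(e)
        report = {}
        for peer, evs in grouped.items():
            times = [e.seconds for e in evs]
            report[peer] = {
                "count": len(evs),
                "min": min(times),
                "median": statistics.median(times),
                "max": max(times),
                "bytes": sum(e.bytes_sent for e in evs),
            }
        return report

    def summary(self, action: str) -> float:
        """Condensed view for end users: one median latency over all peers."""
        return statistics.median(e.seconds for e in self.events if e.action == action)


recorder = Recorder()


class ToyPeer:
    """Stand-in for a real peer; the sleep models routing and query processing."""

    def __init__(self, name: str, delay: float):
        self.name = name
        self.delay = delay

    @recorder.intercept("query")
    def query(self, key: str) -> bytes:
        time.sleep(self.delay)
        return f"{self.name}:{key}".encode()


if __name__ == "__main__":
    for peer in (ToyPeer("fast", 0.001), ToyPeer("slow", 0.01)):
        for k in range(3):
            peer.query(f"k{k}")
    print(recorder.per_peer("query"))
    print(f"median query time: {recorder.summary('query') * 1000:.1f} ms")
```

Keeping the raw events means that further metrics can be derived later without rerunning the benchmark; the price is storage and logging overhead, which has to be kept low for the reasons discussed next.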
In order not to conflict with the idea of peer-to-peer, one needs to keep the properties of this approach in mind (as stated in the introduction). Ideally, the system should not even be aware of a benchmark running at the same time. Peers should still be able to join or leave the network at any time without causing trouble for either the network or the benchmark running on top of it. Scalability, one of the main reasons for the success of peer-to-peer systems, should not be hindered by any tests; an important issue here is the overhead produced by the benchmark, which should be kept as low as possible. Scalability also raises another problem: because such networks are decentralized, a way has to be found to generate the workload and to simulate realistic behaviour of the participating peers. Furthermore, the tool should be platform-independent and support the heterogeneity of the network (i.e., peers running on different platforms). Since, ideally, every peer should be intercepted in one way or another, this requires some care. Probably the hardest goal to achieve is universality. Because of the wide range of applications that use the peer-to-peer model, e.g., file sharing (BitTorrent [10]), instant messaging (Skype [1]) or social networking (Safebook [7]), each type has different performance requirements. For example, in instant messaging or phone calls any delay decreases the quality of service, whereas in a file-sharing scenario even a delay of 2-3 seconds is acceptable, but a high bandwidth/download rate is much more important than in other scenarios.
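For the workload problem raised above, one common simplification, used here as an assumption rather than something prescribed by the works cited in this paper, is a synthetic churn trace: peers join according to a Poisson process and stay online for exponentially distributed session times. The Python sketch below generates such a trace and replays it to obtain the number of online peers over time; the function names and parameter values are hypothetical. Real session lengths need not be exponential, so a benchmark for a specific system would fit these distributions to measurements of that system, and the models could differ per application type.

```python
import random


def churn_trace(duration: float, arrival_rate: float, mean_session: float, seed: int = 0) -> list:
    """Generate a time-ordered list of (time, event, peer_id) tuples.

    Joins form a Poisson process with `arrival_rate` joins per second; each peer
    stays online for an exponentially distributed session with mean `mean_session`
    seconds. Leave events after `duration` are omitted (the peer is still online).
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    peer_id = 0
    while True:
        t += rng.expovariate(arrival_rate)  # exponential inter-arrival times
        if t >= duration:
            break
        events.append((t, "join", peer_id))
        leave = t + rng.expovariate(1.0 / mean_session)
        if leave < duration:
            events.append((leave, "leave", peer_id))
        peer_id += 1
    events.sort()
    return events


def online_over_time(events: list) -> list:
    """Replay the trace and return (time, number of online peers) after each event."""
    online = set()
    samples = []
    for t, kind, peer_id in events:
        if kind == "join":
            online.add(peer_id)
        else:
            online.discard(peer_id)
        samples.append((t, len(online)))
    return samples


if __name__ == "__main__":
    trace = churn_trace(duration=3600.0, arrival_rate=0.5, mean_session=600.0)
    samples = online_over_time(trace)
    print(len(trace), "events; peers online at the end:", samples[-1][1])
```

By Little's law, the expected number of peers online in steady state is arrival_rate × mean_session, here 0.5 × 600 = 300, which gives a quick sanity check for a generated trace before it is fed to the peers under test.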
Since some of these requirements seem to contradict each other, peer-to-peer benchmarking will probably evolve in a way similar to benchmarking in computer science in general at its beginnings: just as different requirements led to domain-specific benchmarks, specific tools will emerge to benchmark different applications.

IV. CONCLUSION

As mentioned in the introduction, little research on benchmarking peer-to-peer systems has been conducted so far. While older approaches have little or no applicability to today's peer-to-peer systems, P2PTester is the only tool that focuses solely on peer-to-peer. Even though it has not yet been tested on a large number of peers and has its drawbacks, it could mark the beginning of a new research field. Further research will need to deal with a number of decisions and problems, such as finding a way to provide a workload in a decentralized scenario, maintaining scalability and producing as little overhead as possible, to name just a few. Furthermore, because of the complexity and variety of these systems, it will be hard to develop a single standard benchmark for peer-to-peer infrastructures; rather, many more specialized approaches will eventually emerge. This situation is comparable to that of the first benchmarks in computer science, when domain-specific tools had to be developed in order to obtain usable results.

REFERENCES

[1] Skype. I, III
[2] Top500 list - June. II-B, II-B1, II-B1
[3] K. Aberer, M. Punceva, M. Hauswirth, and R. Schmidt. P2P systems. In Practical Handbook of Internet Computing. CRC Press. I
[4] P. Barham, R. Isaacs, R. Mortier, and D. Narayanan. Magpie: Online modelling and performance-aware systems. In Proceedings of the Ninth Workshop on Hot Topics in Operating Systems (HotOS IX). USENIX Association. II-B3
[5] W. Booth, G. G. Colomb, and J. M. Williams. The Benchmarking Book: Best Practice for Quality Managers and Practitioners. Butterworth-Heinemann. II
[6] B. Butnaru, F. Dragan, G. Gardarin, I. Manolescu, B. Nguyen, R. Pop, N. Preda, and L. Yeh. P2PTester: a tool for measuring P2P platform performance. In International Conference on Data Engineering. I, II-B4, 4
[7] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving online social network leveraging on real-life trust. Consumer Communications and Networking. III
[8] D. J. DeWitt. The Wisconsin benchmark: Past, present, and future. In The Benchmark Handbook for Database and Transaction Processing Systems. II-B2
[9] J. J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience, 15. II-B1
[10] D. Erman, D. Ilie, and A. Popescu. BitTorrent session characteristics and models. In Proceedings of HETNETS05. I, III
[11] J. Gray. Database and Transaction Processing Performance Handbook. II-A
[12] Q. He, M. Ammar, G. Riley, H. Raj, and R. Fujimoto. Mapping peer behavior to packet-level details: A framework for packet-level simulation of peer-to-peer systems. gnutella/simulator.ps. II-B5
[13] R. Isaacs and P. Barham. Performance analysis in loosely-coupled distributed systems. In 7th CaberNet Radicals Workshop. II-B3
[14] A. Montresor and M. Jelasity. PeerSim: A scalable P2P simulator. II-B5
[15] G. Antoniu, L. Cudennec, M. Duigou, and M. Jan. Performance scalability of the JXTA P2P framework. In Proc. 21st IEEE International Parallel and Distributed Processing Symposium. II-B5
[16] T.-P. Subcommittee. TPC-E Benchmark Overview. Transaction Processing Performance Council. 2, 3, II-B2
[17] TPC-PR. Transaction Processing Performance Council. II-B2
[18] T. T. A. Dinh, M. Lees, G. Theodoropoulos, and R. Minson. Large scale distributed simulation of P2P networks. I
[19] L. Whitney. Google, IBM, Apple world's most valuable brands. I
[20] T. Wimberly. JIT performance boost coming with Android 2.2. jit-performance-boost-coming-with-android-2-2/. II-B1

55 SAFEBOOK KEY MANAGEMENT

Safebook Key Management

Felix Günther

Abstract: Safebook is a peer-to-peer based online social network (OSN) that enables users to create profiles and share data as in other OSNs such as Facebook. Since the decentralized architecture of Safebook does not contain a central authority that is able to perform access control, encryption is needed to ensure the confidentiality of published data. This paper outlines strict requirements and weaker constraints for the encryption of data attributes in Safebook. Subsequently, an overview of possible cryptographic solutions is given, and their suitability with respect to these requirements is analyzed. As a result, the differences and trade-offs between and within the given approaches are set out. The outcome of this paper can serve as a foundation for further investigations on this topic.

I. INTRODUCTION

Safebook [2] is a peer-to-peer based, decentralized OSN that builds links between peers based on real-life relations. The system provides operations known from existing OSNs, such as publishing data in profiles, retrieving data from other users' profiles, managing contacts and exchanging messages. Due to its decentralized nature, Safebook lacks a central authority that enforces access control on all user profiles, as is possible in centralized OSNs like Facebook or LinkedIn, which are accessed through a single web interface. In contrast to these systems, a Safebook user's profile is stored on her own system, and all user systems together form a peer-to-peer network. Since the profile is also replicated on the systems of her friends (i.e., her contacts) for availability reasons, she cannot enforce live access control on her profile either. These constraints make encryption of user profile data in Safebook necessary to guarantee its confidentiality. In addition, Safebook users should be able to restrict access to their profiles in a fine-grained manner, at the level of atomic attributes.
Thus, the encryption scheme has to make it possible to encrypt a multitude of individual attributes, each for a single user or a group of users. Besides these hard requirements, the architecture of Safebook imposes some weaker constraints. Since all operations need to be executable on mobile devices and while affected users are offline, the storage space required by a particular encryption scheme and the interaction it requires between group members have to be taken into account.

After presenting architectural structures of Safebook that are beneficial for key management, this paper points out the essential requirements for the encryption scheme and outlines weaker constraints that lead to different priorities among the investigated approaches. Based on these requirements and constraints, an overview of different encryption schemes is given, reflecting different approaches to distributing and managing group keys. These schemes are then analyzed with regard to the weaker constraints by developing abstract formulas for several relevant properties of the presented approaches, which are later evaluated using reasonable system parameters. As a result, it is shown that the examined schemes differ considerably in terms of the given constraints, and that an approach based on broadcast encryption performs best with respect to the outlined properties.

The rest of this paper is organized as follows. First, beneficial architectural structures of Safebook are pointed out in Section II. In Section III, requirements for suitable encryption schemes are discussed and weaker constraints are identified. A survey of applicable approaches follows in Section IV, and an evaluation of these approaches in Section V. In Section VI, approaches related to the described ones are discussed and their drawbacks with respect to the given requirements are pointed out. Section VII concludes the paper with a short summary and an outlook on future work.

II. BENEFICIAL ARCHITECTURAL STRUCTURES OF SAFEBOOK

The architecture of Safebook includes a Trusted Identification Service (TIS) that provides each user joining the network with an unambiguous node identifier and a pseudonym. In addition, each user generates two public/private key pairs, one for the peer-to-peer level and one for the OSN level, for which she receives certificates from the TIS. Furthermore, the peer-to-peer substrate of Safebook makes it possible to resolve the public key belonging to a user or to a user's pseudonym. In this way, encryption schemes can exploit the global availability of keying material (i.e., distributed public/private key pairs). The existence of keying material simplifies the communication and key distribution needed for key management and makes more complex approaches unnecessary (such as those needed in ad-hoc networks, where no pre-established keying material is available). Moreover, in contrast to, e.g., members of ad-hoc networks or interest groups in Facebook, the user groups that are allowed to access certain attributes are relatively stable and more likely to grow than to shrink. Thus, member exclusions will occur only rarely, which can be advantageous for some encryption schemes.

III. REQUIREMENTS AND CONSTRAINTS

In this section, the mandatory requirements for encryption schemes solving the given problem are defined first, followed by weaker constraints whose degree of fulfillment can be quantified for each approach.

A. Mandatory Requirements

The following requirements have to be met for an approach to be suitable at all:

1) Confidentiality: If an attribute $a$ is encrypted for a certain group of users $U_a = \{U_{a,1}, U_{a,2}, \dots, U_{a,n}\}$, it has to be computationally infeasible for any user $U \notin U_a$ to decrypt $a$.

2) Access Control: Only the owner of a profile can change the access rules for its attributes, i.e., define who is allowed to access a certain attribute and who is not. In particular, the mirroring peers (i.e., all peers in the owner's contact list) must not be able to manipulate the access rules, neither for attributes they are allowed to decrypt nor for those they are not allowed to decrypt.

3) Privacy: It has to be infeasible for any user to discover the identity of an authorized user of an attribute (other than herself), or to decide whether any other user is authorized to access an attribute.

4) Key Independence: If an encryption scheme uses group keys $\mathcal{K} = \{K_0, K_1, \dots, K_n\}$ (i.e., a shared secret is published to, and known by, all users having access to a given attribute), it has to be guaranteed that a passive adversary who knows an arbitrary subset $\hat{\mathcal{K}} \subseteq \mathcal{K}$ of group keys cannot discover any other group key $K \in \mathcal{K} \setminus \hat{\mathcal{K}}$ (cf. [10]). Key independence implies forward and backward secrecy, i.e., an attacker who knows a contiguous subset of group keys cannot discover subsequent or preceding keys.

B. Weaker Constraints

Besides the hard requirements given above, the architecture of Safebook poses weaker constraints, which allow different approaches that satisfy the requirements to be compared. These are:

1) Storage Space: The keys used by encryption schemes in the given setting are duplicated in two ways. On the one hand, the keys the owner has to store in her profile (e.g., encrypted shared keys for authorized users) are replicated on the systems of all her contacts.
On the other hand, the keys needed for accessing an attribute have to be stored by the client user (i.e., the user accessing a profile) for every attribute she has access to, and this for every contact's profile in her contact list. Client-side storage needs in particular can therefore become very large. Storing keys on clients should thus be avoided if possible, or at least kept to a limited scale. Keeping the replication of user profiles in mind, the storage overhead imposed in the profile should also be kept as low as possible. Note that not all data stored at the owner of a profile necessarily has to be stored in the profile itself; e.g., the private key of the owner clearly has to be stored on her system, but just as clearly not in her profile, which is replicated on other systems. Encryption schemes may introduce similar keys or other data that have to be stored on the owner's system only, not in the profile. 2) Interaction with group users: Since Safebook is based on a peer-to-peer system, its users' systems are not permanently online, which makes direct communication difficult. Live interaction between the owner of a profile and the users in the access group of a certain attribute should therefore be reduced to a minimum. Otherwise, key establishment, for example, would slow down dramatically, since delayed channels would have to be used. 3) Expenditure of resources needed for computations: As Safebook clients should be able to run on mobile devices with limited computing power, access control management also has to be feasible on these clients. Key computation must therefore not be too expensive in terms of the resources needed.
IV. ENCRYPTION SCHEMES
In this section, different approaches satisfying the requirements are described. First, a simple and intuitive scheme is described in two variants.
Thereafter, a more complex approach based on the One-way Function Tree (OFT) scheme [1], [11], [13] is outlined, which itself builds on the Logical Key Hierarchy (LKH) approach [17], [18]. The third scheme, presented in [7], uses bilinear pairings for broadcast encryption (BE) to achieve adaptive security (it is subsequently referred to as Gentry-Waters BE).
A. Simple Shared Key
The intuitive approach to encrypting attributes for a group of users is the following: The profile owner creates a new attribute $a$ and defines the group $U_a = \{U_{a,1}, U_{a,2}, \dots, U_{a,n}\}$ of users authorized to access it. She then chooses a random secret key $K_a$ for this attribute, encrypts the attribute as $\mathrm{Enc}_{K_a}(a)$ and adds the encrypted attribute to her profile. Finally, the key $K_a$ has to be distributed to all users in $U_a$. Given the architecture of Safebook, there are two ways to distribute $K_a$, which form the two variants of this approach: 1) Client-side Key Storage: The first variant is to send every user $U_{a,i}$ the secret key $K_a$ for the new attribute, encrypted under the respective public key $pk_i$; i.e., to send $\mathrm{Enc}_{pk_i}(K_a)$ to each $U_{a,i} \in U_a$. In this case, the owner of the profile only has to store the current attribute key $K_a$ (which does not have to, and should not, be stored in the profile), but this key also needs to be stored on the system of every user in $U_a$. When an attribute is created, $K_a$ has to be sent to each user $U_{a,i}$, resulting in $n$ messages. If a new user $U_{a,n+1}$ is added to the group of authorized users $U_a$, the attribute key has to be changed, and the new key $K'_a$ has to be transmitted to all users $U_{a,i} \in U'_a = U_a \cup \{U_{a,n+1}\}$ as $\mathrm{Enc}_{pk_i}(K'_a)$, resulting in $n + 1$ messages and encryptions. The owner of the profile then encrypts the attribute with the new key as $\mathrm{Enc}_{K'_a}(a)$, replacing the old encryption in the profile.
When a user $U_{a,j}$ is excluded from $U_a$, the owner of the profile also has to choose a new secret key $K'_a$ and replace the encrypted attribute in the profile. The key has to be distributed to all users in the new group of authorized users $U'_a = U_a \setminus \{U_{a,j}\}$, resulting in $n - 1$ messages and encryptions. Note that this approach also supports adding or excluding a subgroup of multiple users $\bar{U}_a = \{\bar{U}_{a,j_1}, \dots, \bar{U}_{a,j_m}\}$ at once: this is carried out like the addition or exclusion of a single user, publishing the new key $K'_a$ to $U'_a = U_a \cup \bar{U}_a$ in case of addition and to $U'_a = U_a \setminus \bar{U}_a$ in case of exclusion, resulting in $n + m$ and $n - m$ messages, respectively.
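The rekeying rules of the Simple Shared Key scheme can be illustrated with a short bookkeeping sketch that counts encryptions and messages per operation. It is a minimal sketch under loudly stated assumptions: `wrap_for` is a hypothetical placeholder for $\mathrm{Enc}_{pk_i}(K_a)$ and performs no real encryption, and the class and method names are illustrative only. The `profile_side` flag switches from the client-side variant to storing the wrapped keys in the profile, as in the Profile-side Key Storage variant.

```python
"""Operation counts of the Simple Shared Key scheme (Section IV-A).

Placeholder crypto: wrap_for() only tags a key with its recipient and stands
in for Enc_{pk_i}(K_a); no real encryption is performed.
"""
import secrets
from dataclasses import dataclass, field


def wrap_for(user: str, key: bytes) -> tuple[str, bytes]:
    # Hypothetical stand-in for Enc_{pk_i}(K_a); this is NOT encryption.
    return (user, key)


@dataclass
class SimpleSharedKeyOwner:
    members: set[str]
    profile_side: bool = False  # False: variant 1 (client-side), True: variant 2
    key: bytes = b""
    sent: list[tuple[str, tuple[str, bytes]]] = field(default_factory=list)
    profile: dict[str, tuple[str, bytes]] = field(default_factory=dict)

    def _rekey(self) -> tuple[int, int]:
        """Choose a fresh K_a and wrap it for every current member.

        Returns (encryptions, messages) caused by this operation.
        """
        self.key = secrets.token_bytes(24)  # b_s = 24 bytes (192 bits)
        wrapped = {u: wrap_for(u, self.key) for u in sorted(self.members)}
        if self.profile_side:
            self.profile = wrapped  # old wrapped keys are replaced in the profile
            return len(wrapped), 0
        self.sent.extend(wrapped.items())  # one message per member
        return len(wrapped), len(wrapped)

    def init(self) -> tuple[int, int]:
        return self._rekey()

    def add(self, users: set[str]) -> tuple[int, int]:
        self.members |= users  # a fresh key keeps newcomers from old data
        return self._rekey()

    def remove(self, users: set[str]) -> tuple[int, int]:
        self.members -= users  # a fresh key locks removed users out
        return self._rekey()


owner = SimpleSharedKeyOwner({f"U{i}" for i in range(1, 6)})
print(owner.init())          # (5, 5): n encryptions and n messages
print(owner.add({"U6"}))     # (6, 6): n + 1
print(owner.remove({"U2"}))  # (5, 5): back to 5 members
```

With `profile_side=True` the same operations report zero messages, matching the observation that the profile-side variant needs no group interaction.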

[Fig. 1. OFT key tree]
2) Profile-side Key Storage: The second variant of this approach is to store the secret key in the profile rather than distributing it to all authorized users. For this purpose, the owner of the profile computes $\mathrm{Enc}_{pk_i}(K_a)$ for each $U_{a,i}$; i.e., she encrypts the secret key $K_a$ for each authorized user $U_{a,i}$ under the respective public key $pk_i$. These encryptions of $K_a$ are then stored in the owner's profile, which is accessible to all Safebook users, including the authorized ones. This way, the authorized users do not need to store anything: to access $a$, an authorized user $U_{a,i}$ decrypts the encryption of $K_a$ destined for her using her private key $sk_i$, obtains $K_a$, and can then decrypt the attribute. This zero storage on the client side is traded for greater storage needs on the profile side, since in this variant the owner of the profile has to store not only $K_a$ (outside the profile) but also $n$ encryptions of $K_a$ in the profile, which are replicated on the systems of her contacts. Since no information has to be transferred to the authorized users in this variant, there is no group interaction at all; i.e., no messages need to be sent. Member addition and exclusion (also of multiple users) are done analogously to the first variant, storing the new encrypted keys in the profile rather than sending them to the users.
B. OFT-based Approach
The approach based on the One-way Function Tree (OFT) uses a binary tree that holds the shared secret key $K_a$ for the attribute $a$ at its root and whose leaves are associated with the $n$ authorized users $U_{a,1}, \dots, U_{a,n}$ (see Figure 1). The key tree has height $\lceil \log n \rceil$ and is initialized as follows (cf.
[1]): The profile owner associates every node $v$ with a randomly chosen key $K_{a,v}$ and sends each user all keys associated with nodes on the path from the user to the root, encrypted with the respective user's public key. In the tree of Figure 1, for example, $U_{a,1}$ would receive $K_{a,00}$, $K_{a,0}$ and $K_a$. Thus, each user receives at most $\lceil \log n \rceil + 1$ keys, transmitted in $n$ messages. As all users know $K_a$, the encrypted attribute $\mathrm{Enc}_{K_a}(a)$ can be stored in the profile, and all authorized users are able to decrypt it. On user removal, all keys associated with nodes on the path from the removed user $\bar{U}$ to the root have to be changed to assure forward secrecy.
[Fig. 2. OFT user removal or addition]
As an enhancement of the LKH approach, OFT does not choose all new keys on the path to the root at random, but only assigns a randomly chosen value $r$ to the parent node $p(\bar{U})$ of the removed user $\bar{U}$. Then, a pseudorandom generator $G$ [8] that doubles the size of its input ($L(x)$ and $R(x)$ denoting the left and right halves of the output of $G(x)$) is used to determine the new keys on the path to the root. Every node $v$ on the path to the root is assigned the value $r_{p(v)} = R(r_v) = R^{|\bar{U}| - |v|}(r)$, where $p(v)$ denotes the parent of $v$ and $|v|$ the depth of $v$ (the length of its binary label, e.g., $|01| = 2$). Based on these values, the new key of a node $v$ is defined as $K_{a,v} = L(r_v) = L(R^{|\bar{U}| - |v| - 1}(r))$. Finally, each value $r_{p(v)}$ is encrypted with the key $K_{a,s(v)}$ ($s(v)$ denoting the sibling of $v$) and sent to the users in the subtree of $s(v)$, which enables all remaining users to compute the new attribute key $K_a$. For example, if user $U_{a,2}$ is removed from the tree of Figure 2, $\mathrm{Enc}_{K_{a,011}}(r)$ has to be sent to $U_{a,3}$, $\mathrm{Enc}_{K_{a,00}}(R(r))$ to $U_{a,1}$ and $\mathrm{Enc}_{K_{a,1}}(R(R(r)))$ to $U_{a,4}$ and $U_{a,5}$; thus $\lceil \log n \rceil$ encryptions are needed and $n - 1$ messages are sent. Each remaining user is now able to compute the new keys $K_{a,01} = L(r)$, $K_{a,0} = L(R(r))$ and $K_a = L(R(R(r)))$. User addition is accomplished similarly to user removal.
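The removal rule just described can be traced in a few lines of Python. This is a minimal sketch under stated assumptions: the length-doubling generator $G$ is instantiated with SHA-256 purely for illustration (it is not the construction of [8]), node labels are binary strings with `""` as the root as in Figure 1, the encryption of each value under $K_{a,s(v)}$ is left out, and the function names `rekey_on_removal` and `derive_upward` are hypothetical.

```python
"""OFT-style rekeying on user removal, following Section IV-B.

Assumptions: the length-doubling PRG G is instantiated with SHA-256 purely
for illustration (not the construction of [8]); node labels are binary
strings with "" as the root; encryption under K_{a,s(v)} is omitted.
"""
import hashlib
import secrets

HALF = 32  # byte length of r, L(x) and R(x)


def G(x: bytes) -> bytes:
    """Expand a 32-byte seed into 64 bytes (illustrative, SHA-256 based)."""
    return hashlib.sha256(b"\x00" + x).digest() + hashlib.sha256(b"\x01" + x).digest()


def L(x: bytes) -> bytes:
    return G(x)[:HALF]  # left half of G(x)


def R(x: bytes) -> bytes:
    return G(x)[HALF:]  # right half of G(x)


def sibling(v: str) -> str:
    return v[:-1] + ("1" if v[-1] == "0" else "0")


def rekey_on_removal(removed: str, r: bytes) -> tuple[dict[str, bytes], dict[str, bytes]]:
    """Owner side: new keys on the path and the values to distribute.

    new_keys[p] = K_{a,p} = L(r_p) for every ancestor p of the removed leaf.
    to_send[s] = r_{p(v)}, to be encrypted under K_{a,s} with s = s(v).
    """
    new_keys: dict[str, bytes] = {}
    to_send: dict[str, bytes] = {}
    v, r_p = removed, r  # r_{p(removed)} = r
    while v:  # stop once v is the root ""
        new_keys[v[:-1]] = L(r_p)
        to_send[sibling(v)] = r_p
        v, r_p = v[:-1], R(r_p)  # r_{p(p(v))} = R(r_{p(v)})
    return new_keys, to_send


def derive_upward(node: str, value: bytes) -> dict[str, bytes]:
    """User side: from r_node, compute K_{a,node} and all keys above it."""
    keys: dict[str, bytes] = {}
    while True:
        keys[node] = L(value)
        if not node:
            return keys
        node, value = node[:-1], R(value)


# Example of Figure 2: remove U_{a,2} at leaf "010" (U_{a,3} sits at "011").
r = secrets.token_bytes(HALF)
new_keys, to_send = rekey_on_removal("010", r)
print(sorted(to_send))  # ['00', '011', '1']
```

Each remaining user decrypts the single value addressed to her subtree $s$ and calls `derive_upward` on the parent of $s$; all of them arrive at the same new root key $K_a$ as the owner, while three values ($\lceil \log 5 \rceil$) are distributed.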
To guarantee backward secrecy, all keys on the path from the new user to the root have to be changed in the same way as if the new user had been removed. Thus, adding $U_{a,2}$ to the tree of Figure 2 results in the same encryptions and $n + 1$ messages sent (of course, $r$ is chosen anew at each addition or removal). If user $U_{a,3}$ is moved down in the tree in order to add $U_{a,2}$, she keeps her old key $K_{a,01}$ as her new key $K_{a,011}$. The owner of the profile has to store the whole key tree (not in the profile itself), whereas the authorized users have to retain at most $\lceil \log n \rceil + 1$ keys each.
C. Gentry-Waters Broadcast Encryption (BE) Approach
The Gentry-Waters BE scheme presented in [7], a very recent approach in the field of broadcast encryption [3], [4], [6], [12] (especially regarding adaptive security),

can be used to encrypt multiple attributes at the same time with low overhead in our setting. Due to its complexity, we only sketch the approach here (see [7], Section 3.1, for details). The construction consists of the four algorithms Setup, KeyGen, Enc and Dec, which we outline in the following: Setup generates the groups $G$ and $G_T$ of prime order $p$ and the bilinear map $e$. Moreover, it chooses $\alpha \in \mathbb{Z}_p$ and $g, h_1, \dots, h_n \in G$ at random and computes a public key $PK$ and a private key $SK$. Using the private key, KeyGen is called for each of the $n$ users (where $n$ is the upper bound on the number of users that can be granted access to an attribute), resulting in a secret key $d_i$ for each user. These secret keys can be stored in the profile, each encrypted with the public key of the respective user. This concludes the initialization. Thereafter, a secret share for each group of authorized users can be computed by providing $SK$ and the group of authorized users to Enc, which outputs a header $Hdr$ and the secret key $K$. Using this secret key, the owner of the profile can encrypt the attribute the group shall have access to and store it in the profile together with the header $Hdr$. Each authorized user is then able to recover the shared key $K$ using Dec with her secret key $d_i$,¹ which she obtains by decrypting its stored encryption with her private key $sk_i$. User addition and removal require a new execution of Enc, since the group of authorized users has changed; the secret keys $d_i$ do not need to be regenerated. The authorized users have to store nothing in this approach. The owner of the profile has to retain the private key $SK$, whereas the public key $PK$ and the encrypted secret keys $d_i$ of all users, as well as the header $Hdr$ and the symmetric encryption $\mathrm{Enc}_K(a)$ of each attribute, have to be stored in the profile.
¹ The Dec algorithm also has to be provided with the indices of the users that are allowed to access a certain attribute. It is debatable how much information about the users behind these indices can be deduced. We assume that meaningful linking becomes (statistically) impossible if the attribute is additionally encrypted for a certain percentage of dummy indices not related to any user.
V. EVALUATION
We will now evaluate the encryption schemes described in the previous section according to the requirements and weaker constraints presented in Section III. First, the relevant metrics for the analysis (such as storage space, number of encryptions needed, etc.) are defined. Then, abstract formulas are determined for all properties and encryption schemes. In a third step, we apply concrete values for the parameters of the properties to explore the trade-offs imposed by the approaches. Finally, the differences between the approaches are discussed.
A. Property Definitions
We study the following abstract properties for each scheme:
1) Storage requirements at the profile owner outside the profile: The amount of storage in bytes needed for a single attribute on the system of the profile owner that is not stored in the profile is denoted by $S_o$.
2) Storage requirements in the profile and its replications: The amount of storage in bytes needed for a single attribute in the profile is denoted by $S_p$. This value includes the storage needed on the systems of the profile owner's contacts (recall that Safebook profiles are replicated at their owner's contacts for accessibility reasons). It does not include the storage needed for the symmetric encryption of the attribute itself.
3) Storage requirements at the authorized users: The amount of storage in bytes needed for a single attribute on the systems of all users authorized to access the attribute is denoted by $S_u$.
4) Number of encryptions on initialization: The number of encryptions needed on initialization of the encryption scheme is denoted by $E_i$, not including the encryption of the attribute itself.
5) Number of encryptions on user addition: The number of encryptions needed when a user is added to the group of authorized users is denoted by $E_a$, not including the encryption of the attribute itself.
6) Number of encryptions on user removal: The number of encryptions needed when a user is excluded from the group of authorized users is denoted by $E_r$, not including the encryption of the attribute itself.
7) Number of messages on initialization: The overall number of messages sent on initialization of the encryption scheme is denoted by $M_i$.
8) Number of messages on user addition: The overall number of messages sent when a user is added to the group of authorized users is denoted by $M_a$.
9) Number of messages on user removal: The overall number of messages sent when a user is excluded from the group of authorized users is denoted by $M_r$.
B. Abstract Property Formulas
Table I shows the abstract formulas for all properties and encryption schemes, using the notation shown in Table II. We now derive these formulas:
1) Simple Shared Key: In both variants, the owner has to store the shared symmetric key $K_a$, thus $S_o = b_s$. $S_p = 0$ for the client-side variant (since the profile is not used for storage here) and $S_p = (C + 1) \cdot N \cdot b_a$ for the profile-side variant, as the (asymmetrically) encrypted $K_a$ has to be stored for $N$ users ($N$ being the number of users authorized to access attribute $a$, cf. Table II) and this storage is replicated at the $C$ contacts of the profile's owner. Each authorized user has to store $K_a$ in the client-side variant, thus $S_u = N \cdot b_s$ there. In the profile-side variant, the users have to store nothing.
The number of encryptions needed is identical for both variants: On initialization, the shared key $K_a$ has to be encrypted for each user, resulting in $N$ encryptions. When a user is added or removed, $K_a$ has to be encrypted for all members of the new group of authorized users, thus $N + 1$ and $N - 1$ encryptions, respectively, have to be performed. Whereas the profile-side variant does not need any messages to be sent, in the client-side variant each encrypted key has to be transmitted to the appropriate user.

TABLE I
ABSTRACT PROPERTY FORMULAS (storage: $S_o$, $S_p$, $S_u$; encryptions: $E_i$, $E_a$, $E_r$; messages: $M_i$, $M_a$, $M_r$)

Scheme | $S_o$ | $S_p$ | $S_u$ | $E_i$ | $E_a$ | $E_r$ | $M_i$ | $M_a$ | $M_r$
S. Shared Key (1) | $b_s$ | $0$ | $N b_s$ | $N$ | $N+1$ | $N-1$ | $N$ | $N+1$ | $N-1$
S. Shared Key (2) | $b_s$ | $(C+1) N b_a$ | $0$ | $N$ | $N+1$ | $N-1$ | $0$ | $0$ | $0$
OFT | $(2N-1) b_s$ | $0$ | $N(\lceil \log N \rceil + 1) b_s$ | $N$ | $\lceil \log N \rceil$ | $\lceil \log N \rceil$ | $N$ | $N+1$ | $N-1$
Gentry-Waters BE | $b_g / A$ | $\left(\frac{\lvert PK \rvert + C b_a}{A} + 2 b_g\right)(C+1)$ | $0$ | $1$ | $1$ | $1$ | $0$ | $0$ | $0$

TABLE II
NOTATION USED IN THE PROPERTY FORMULAS
$A$: the number of attributes in the profile
$N$: the number of users authorized to access an attribute
$C$: the number of contacts of the profile's owner
$n_{BE}$: the maximum number of users chosen for the Setup algorithm
$b_s$: the size of a symmetric key in bytes
$b_a$: the size of an asymmetric key in bytes
$b_g$: the size of a pairing-based group element ($g \in G$) in bytes
$b_p$: the size of a pairing ($e(g, g)$) in bytes

2) OFT-based Approach: In the OFT-based scheme, the owner of the profile has to store the key tree outside the profile; it contains $2N - 1$ nodes, each associated with a symmetric key of size $b_s$. The $N$ users in the tree have to retain the keys on the path to the root (at most $\lceil \log N \rceil + 1$ symmetric keys of size $b_s$), thus $S_u = N (\lceil \log N \rceil + 1)\, b_s$. Nothing has to be stored in the profile. During initialization, all keys on the path from a user's leaf node to the root have to be sent, encrypted, to the respective user, resulting in $N$ encryptions and $N$ messages. Since user addition and removal are quite similar, they both need the same number of encryptions: for each subtree under a sibling of a node on the path from the removed or added node to the root, a value $r_v$ is encrypted, resulting in at most $\lceil \log N \rceil$ (and at least $\lfloor \log N \rfloor$) encryptions. Every user in the tree has an ancestor whose associated key changes, thus all $N + 1$ and $N - 1$ users, respectively, have to receive a key update message. 3) Gentry-Waters BE scheme: In the Gentry-Waters BE scheme, the owner of the profile has to store the private key $SK$ of size $b_g$.
As this key has to be stored only once, its size is distributed over the number of attributes, i.e., divided by the number of attributes $A$, resulting in $S_o = b_g / A$. The authorized users need not retain anything. The profile has to provide the public key $PK$ (consisting of $n_{BE} + 1$ elements of the group $G$ and one pairing $e(g, g)^\alpha$) of size $|PK| = (n_{BE} + 1)\, b_g + b_p$, and the encrypted secret keys $d_i$ for all of the profile owner's contacts ($C \cdot b_a$). Like the private key, this storage is required only once and is therefore divided by $A$. Further, the header $Hdr$ of size $2 b_g$ has to be stored in the profile for each attribute. Finally, all of this is replicated on the $C$ systems of the user's contacts. In summary,
$$S_p = \left( \frac{|PK| + C\, b_a}{A} + 2\, b_g \right) (C + 1).$$
Concerning the number of encryptions, we only consider the required executions of the algorithm Enc, since Setup and KeyGen have to be run only at the initialization of the scheme, not at each initialization of an attribute. Thus, a single encryption is needed for initialization as well as for user addition and removal. No active messaging is needed in this approach.
C. Analysis of the Proposed Schemes
Figure 3 shows the plots of all properties over the number of authorized users for an attribute. For $A$, $C$, $n_{BE}$, $b_a$, $b_s$, $b_g$ and $b_p$, we have used the following values, which we consider reasonable: we assume a high average of $C = 250$ contacts and a medium average of $A = 300$ attributes in a profile. A maximum of $n_{BE} = 250$ users in a group of authorized users, used in the Setup algorithm of the BE approach, seems reasonable. Furthermore, we have chosen a bit length of 1024 for asymmetric keys and pairings each ($b_a = 128$, $b_p = 128$) and a bit length of 192 for symmetric keys and pairing-based group elements each ($b_s = 24$, $b_g = 192$).
D. Discussion
The two variants of the Simple Shared Key approach require a considerable amount of storage either in the profile or on the client side, which is rather problematic, since storage in the range of megabytes on the profile side and kilobytes on the client side, respectively, is needed for a single attribute alone. Keeping in mind that a user may have thousands of attributes and that clients have to store attribute keys for each attribute of all their contacts, the Simple Shared Key approach rapidly becomes infeasible. The great advantage of the approach based on the One-way Function Tree scheme is without doubt the ease of access revocation, i.e., user removal in our case. However, this scheme requires a comparatively high amount of group interaction in the form of messages to the authorized users and relies on an amount of client-side key storage that is even higher than that of the Simple Shared Key approach. As stated before, the amount of client-side storage is multiplied by the number of attributes the user has access to in the profiles of all her contacts. Thus, this approach results in large overall storage needs. The Gentry-Waters BE approach is clearly the most suitable one with respect to the given constraints. In particular, the important storage requirements are met, as the Gentry-Waters BE scheme performs very well on each of the three storage-related properties. Even considering that pairing-based cryptography is more expensive than symmetric or conventional public-key cryptography, the Gentry-Waters BE approach requires a tolerable amount of computing power. Moreover, the avoidance of group interaction in the form of messages is an advantage of this scheme. Given the presented evaluation, the Gentry-Waters BE approach turns out to be the best fitting approach and is worthy of further investigation.
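The storage columns of Table I can be evaluated numerically to check the orders of magnitude mentioned in the discussion. This is a minimal sketch: it takes the parameter values of Section V-C as listed there (including $b_g = 192$), assumes $\log$ means $\log_2$ for the binary key tree, and uses hypothetical scheme identifiers (`simple1`, `simple2`, `oft`, `gw`) chosen only for this illustration.

```python
"""Evaluate the storage formulas of Table I (all sizes in bytes).

Parameter values follow Section V-C; log is taken as log2 (binary key tree).
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    A: int = 300     # attributes per profile
    C: int = 250     # contacts of the profile owner
    n_be: int = 250  # maximum group size chosen for Setup (n_BE)
    b_s: int = 24    # symmetric key size
    b_a: int = 128   # asymmetric key size
    b_g: int = 192   # pairing-based group element size, as listed in V-C
    b_p: int = 128   # pairing size


def storage(scheme: str, N: int, p: Params = Params()) -> tuple[float, float, float]:
    """Return (S_o, S_p, S_u) for one attribute with N authorized users."""
    if scheme == "simple1":  # client-side key storage
        return p.b_s, 0, N * p.b_s
    if scheme == "simple2":  # profile-side key storage
        return p.b_s, (p.C + 1) * N * p.b_a, 0
    if scheme == "oft":
        return (2 * N - 1) * p.b_s, 0, N * (math.ceil(math.log2(N)) + 1) * p.b_s
    if scheme == "gw":
        pk = (p.n_be + 1) * p.b_g + p.b_p  # |PK|
        s_p = ((pk + p.C * p.b_a) / p.A + 2 * p.b_g) * (p.C + 1)
        return p.b_g / p.A, s_p, 0
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("simple1", "simple2", "oft", "gw"):
    s_o, s_p, s_u = storage(scheme, N=100)
    print(f"{scheme:8s} S_o={s_o:10.2f}  S_p={s_p:12.2f}  S_u={s_u:10.2f}")
```

For $N = 100$, the profile-side Simple Shared Key variant needs 3,212,800 bytes in the profile (replication included), whereas Gentry-Waters BE needs about 163,600 bytes, a value that does not depend on $N$.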

[Fig. 3. Plots of all properties over the number of authorized users N (y-axis plotted with log scale; properties that are constantly 0 are marked with (= 0) in the legend). Panels: (a) storage at the owner $S_o$ [KBytes], (b) storage in the profile $S_p$ [KBytes], (c) storage at the users $S_u$ [KBytes], (d) encryptions during initialization $E_i$, (e) encryptions on user addition $E_a$, (f) encryptions on user removal $E_r$, (g) messages during initialization $M_i$, (h) messages on user addition $M_a$, (i) messages on user removal $M_r$; each panel compares Simple Shared Key (1), Simple Shared Key (2), OFT and Gentry-Waters BE.]
VI. RELATED APPROACHES
Steiner, Tsudik and Waidner proposed an approach that extends Diffie-Hellman key exchange to groups [14], [15]. This scheme provides cheap member addition but relies on the contribution of all group members to compute the group key, which poses two problems regarding the requirements and constraints: First, it requires direct interaction between the authorized users, which is infeasible in the peer-to-peer architecture. Second, the key exchange happens directly between the members of the authorized user group, which violates the privacy requirement, as the authorized users have to know each other. A recent encryption scheme proposed by Eskeland and Oleshchuk [5] uses fractional public keys to compute a shared group key.
If Safebook users could be provided with a private key based on this scheme, neither they nor the owner of the profile would have to store any further keys. However, the private keys are generated by a central trusted authority such as Safebook's Trusted Identification Service, which would then be able to decrypt any communication in Safebook. The authorized users also need to know the other authorized users, which conflicts with the privacy requirement. In [9], Jin and Lotspiech present a broadcast encryption scheme for differently privileged users. They introduce security classes to provide different data sets for differently privileged user groups. Even though Safebook attribute user groups could be arranged in a hierarchy as in role-based systems, the proposed approach, being designed for a linear hierarchy (i.e., group A > group B > group C, etc.), leads to conflicts regarding overlapping attribute user groups and key distribution. Multiple approaches exist to establish group keys in (mobile) ad hoc networks (cf. [16]) that could in principle be used to set up attribute group keys in Safebook. However, these approaches assume that no preliminary keying material is available and therefore impose unnecessary overhead in a system like Safebook, which has public/private key pairs at hand for bilateral communication.
VII. CONCLUSION AND FUTURE WORK
The architecture of Safebook calls for encryption schemes different from those used in centralized online

social networks. Private data has to be encrypted in user profiles to guarantee its confidentiality. The peer-to-peer structure of Safebook raises specific requirements and constraints, which we have presented in this paper, subdivided into strict and weaker ones. Afterwards, different encryption schemes based on distinct approaches have been described. To evaluate these schemes, relevant properties stemming from the outlined requirements and constraints were defined first. As a second step, abstract formulas for these properties were developed for each encryption scheme in terms of varying system parameters. All properties were then plotted with reasonable values for the abstract parameters, giving a basic overview of the trade-offs between and within the different approaches. Although there is evidence that the broadcast-encryption-based approach performs best in the present setting, further investigations and simulations, especially simulations based on real data sets, are needed to confirm this observation. Clearly, the evaluation carried out in this paper can only give a first insight into the characteristics of the presented approaches. A precise elaboration of the schemes outlined in this paper remains future work.
REFERENCES
[1] R. Canetti, J. A. Garay, G. Itkis, D. Micciancio, M. Naor, and B. Pinkas. Multicast security: A taxonomy and some efficient constructions. In INFOCOM.
[2] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving online social network leveraging on real-life trust. IEEE Communications Magazine, Vol. 47, Issue 12, Consumer Communications and Networking Series, December 2009.
[3] Y. Dodis and N. Fazio. Public key broadcast encryption for stateless receivers. In Digital Rights Management Workshop, pages 61–80.
[4] Y. Dodis and N. Fazio. Public key trace and revoke scheme secure against adaptive chosen ciphertext attack. In Public Key Cryptography.
[5] S. Eskeland and V. A. Oleshchuk. Secure group communication using fractional public keys. In ARES.
[6] A. Fiat and M. Naor. Broadcast encryption. In CRYPTO.
[7] C. Gentry and B. Waters. Adaptive security in broadcast encryption systems (with short ciphertexts). In EUROCRYPT.
[8] O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. J. ACM, 33(4).
[9] H. Jin and J. Lotspiech. Broadcast encryption for differently privileged. In SEC.
[10] Y. Kim, A. Perrig, and G. Tsudik. Simple and fault-tolerant key agreement for dynamic collaborative groups. In ACM Conference on Computer and Communications Security.
[11] D. A. McGrew and A. T. Sherman. Key establishment in large dynamic groups using one-way function trees. Technical Report No. 0755, TIS Labs at Network Associates, Inc., Glenwood, MD, May.
[12] D. Naor, M. Naor, and J. Lotspiech. Revocation and tracing schemes for stateless receivers. In CRYPTO, pages 41–62.
[13] A. T. Sherman and D. A. McGrew. Key establishment in large dynamic groups using one-way function trees. IEEE Trans. Software Eng., 29(5).
[14] M. Steiner, G. Tsudik, and M. Waidner. Diffie-Hellman key distribution extended to group communication. In ACM Conference on Computer and Communications Security, pages 31–37.
[15] M. Steiner, G. Tsudik, and M. Waidner. Cliques: A new approach to group key agreement. In ICDCS.
[16] J. van der Merwe, D. S. Dawoud, and S. McDonald. A survey on peer-to-peer key management for mobile ad hoc networks. ACM Comput. Surv., 39(1).
[17] D. Wallner, E. Harder, and R. Agee. Key management for multicast: Issues and architectures. RFC 2627 (Informational).
[18] C. K. Wong, M. G. Gouda, and S. S. Lam. Secure group communications using key graphs. In SIGCOMM, pages 68–79.

62 MOBILE P2P
Mobile P2P
Yann Karl
Abstract: After Napster, Gnutella, eDonkey and many more successful examples of peer-to-peer networking emerged in the global communication networks, the use of these concepts in mobile environments is now taking its first steps. This paper first defines some requirements and challenges of mobile peer-to-peer networking and then looks at possible solutions for some of the challenges that mobile peer-to-peer systems face. We give examples of applications based on mobile ad hoc networks and study the architectural problems and possible solutions.
I. INTRODUCTION
Since mobile ad hoc networks first appeared as DARPA packet radio networks in the early 1970s, they have been an important research topic in the domain of communication networks. Their main goal is to create and maintain a network lacking central control entities. Peer-to-peer networks realize a quite similar approach within fixed network infrastructures. Since mobile ad hoc networks suffer from rapid changes in the operating environment, weak signal strength and heterogeneous combinations of hardware and software, the challenges researchers and implementers face are quite different from those of wired peer-to-peer solutions. In the following, we give an overview of these challenges and of current developments in mobile peer-to-peer environments.
II. CHALLENGES OF AD HOC MOBILE INFORMATION SYSTEMS
An ad hoc mobile information system is a decentralized and highly dynamic network consisting of autonomous mobile devices that interact as peers. The network itself is self-organizing, changing its topology depending on the relative distance of its members. Compared to wired network systems, mobile systems are limited in terms of QoS mechanisms, bandwidth capacity and the quality of the signal transport medium, which poses many technical challenges.
A. Ad hoc Networks
A wireless ad hoc network is a self-organizing network consisting of different wireless nodes that communicate over dynamically established connections. Ad hoc networks have a number of advantages compared to traditional wireless cellular networks [1]:
• No infrastructure required: Ad hoc wireless networks do not rely on any central control unit such as wired base stations. They can therefore be established on demand, with little configuration effort, in places without any existing infrastructure.
• Self-organization: In contrast to wired networks, the topology of an ad hoc network depends on the physical proximity of the nodes and is thus subject to change at any time. The topology in fact reflects the relative distance of the nodes and reconfigures as nodes move, join the network or disappear.
• Fault tolerance: Since there are no central control units and the network reorganizes itself, it tolerates node failures, reorganizing itself to prevent malfunction.
B. Mobile Peer-to-Peer Systems
Bringing peer-to-peer systems to mobile devices does not correspond to the traditional idea on which the design of these devices was based. Initially, mobile hardware was designed as thin clients in client-server systems. Because of their lack of computational power, their main purpose was to access resources such as Internet data or computing power provided by larger server systems. Since mobile CPUs are gaining speed while consuming less power, and wireless network connections are becoming more stable and able to utilize greater bandwidth, it is becoming possible to design mobile systems as peer-to-peer systems. Bolcer et al. describe peer-to-peer as any relationship in which multiple, autonomous hosts interact as equals. An autonomous host is useful in its own right, even in the absence of others.
The peering relationship implies that additional functions become available to peers collectively as a consequence of their collaboration with other hosts. Known as the network effect, the value and extent of these added capabilities increase dramatically as the number and variety of peers grows [2]. A mobile peer-to-peer system inherits many of the features of ad hoc networks: Self-organizing: since mobile devices are usually on the move and the network structure is based on proximity, the topology of a mobile peer-to-peer system changes constantly as the distances between its peers change. Fully decentralized: a peer-to-peer system lacks central information nodes, so each peer in a mobile peer-to-peer system is equally important to it. Highly dynamic: because of the nature of wireless connections, communication end-points move frequently, which makes the peer-to-peer network highly dynamic. III. CHALLENGES The unique character of mobile peer-to-peer systems presents significant challenges for the designer. Gerd Kortuem et al. [1] describe the most important ones as follows: A. Networking In general, wireless data networks suffer from severe limitations in power, available spectrum, mobility, bandwidth, latency, availability and connection stability. Communication links can be interrupted spontaneously, so network failures have to be handled gracefully by the mobile peers. In addition, peer applications should support disconnected operation, so that a peer remains operational even without a network connection. Furthermore, communication between arbitrary peers in a mobile peer-to-peer network requires routing over multi-hop wireless paths. The main difficulty is that, without a fixed infrastructure, these paths consist of wireless links whose end-points are likely to move independently of one another. Consequently, node mobility causes frequent failure and activation of links, leading to increased network congestion while the routing algorithm reacts to topology changes [3]. The instability of multi-hop paths and the limited lifetime of routes in ad hoc networks have a negative impact on the performance of peer-to-peer routing. B. Mobile Device Limitations As noted in the networking discussion above, the main limitation of mobile devices is power and therefore battery life. The limited energy capacity also restricts uplink bandwidth and makes sending data a more expensive operation than receiving it. Mobile devices also lack the computational power, memory and storage space of modern desktop computers or server systems. Less powerful CPUs also limit the use of computationally intensive cryptographic security and signing measures. C. Naming Since peer-to-peer systems lack central authorities, they often operate outside the DNS system. As characterized above, a mobile peer-to-peer system, being decentralized and autonomous, should not rely on central naming. Additional reasons for not relying on DNS are: In mobile ad hoc networks without central units, access to a DNS server cannot be relied on. IP networking is not always available, which makes DNS resolution impossible.
Impromptu collaboration requires identifying not only peers but also their users, for access control or privacy reasons; it is therefore necessary not only to determine the name of a device but also to identify its current owner. D. Resource Discovery For mobile peer-to-peer applications it is necessary not only to discover the devices that are part of the network, but also to gather meta-information about neighboring devices in order to take advantage of their available and shared resources such as storage, media and services. Because of the unpredictable nature of mobile ad hoc networks, discovering resources is quite a challenge. Algorithms are needed through which a peer can not only detect the availability of a neighboring device but also share information about its configuration and supported services. Resource discovery must be timely (in order to detect moving devices) and efficient (so as not to overload the network) [4]. As decentralization is a necessary feature of mobile peer-to-peer networks, they cannot rely on central servers, as wired peer-to-peer systems like Gnutella sometimes do (e.g. a central host cache). Other ways of discovering surrounding peers therefore have to be found. E. Data Sharing and Synchronization Because network connections are weak, data sharing and synchronization is a challenge that is not easily solved. To cooperate in a consistent and reliable way, peers need to be able to share and synchronize data. The unpredictable behavior of wireless networks and the fact that peers usually communicate pair-wise lead to the following, partly conflicting, requirements: High availability: For peers to be able to perform tasks independently of their connections to other peers, it is necessary to employ a replicated object scheme in which every peer keeps a local copy of every shared object.
Consistency: Replicating data and using and altering the copies independently creates the problem of synchronizing independently updated versions of the same shared object, so that peers do not work with inconsistent data. Timeliness: Solutions for maintaining consistency have to deal with the problem that data might be shared between groups of peers that never meet at the same time. This may result in slow update propagation through the whole group. F. Security Securing mobile peer-to-peer systems is much more complicated than securing wire-based systems. Without proper countermeasures it is possible to track the movement of mobile devices and even to observe and keep track of their activities. In ad hoc networks, where every capable device may be a communication partner, users may not be aware of whom they are connected to or which devices are trying to connect to them. Someone in the next room who spoofs the identity of a known person in line of sight can gain unauthorized access to private, sensitive or confidential data. In this case, not only strong encryption, which is unfortunately limited by the mobile device's CPU power, but also robust authentication methods are needed for connection establishment. This is similar to the naming issue: in a system lacking central authorities, the question of how, and whom, to trust on the network cannot easily be answered. There is no central certification authority (CA) to be trusted, and thus no public key infrastructure. Solving this issue therefore requires efficient distributed authentication protocols that can run in a totally decentralized environment. One possible solution, proposed in [5], is the use of reputation. To ensure a fully secure system it might also become necessary to include biometric security checks, not only to ensure that the device itself has the right certificates installed but also that it has not been accidentally taken or stolen by another user.
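The text leaves the choice of authentication protocol open. As one lightweight possibility, and as our own assumption rather than the reputation scheme of [5], two devices whose owners already share a secret key (exchanged, say, when they met in person) can authenticate each other with an HMAC challenge-response. A spoofer in the next room cannot answer a fresh challenge without the key, and the check costs only a hash computation instead of public-key operations. The sketch uses Python's standard `hmac`, `hashlib` and `secrets` modules; the function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical challenge-response authentication with a pre-shared key (PSK).
# Assumption: both devices already hold the same key, exchanged out of band
# (e.g. when their owners met); no certification authority is involved.


def make_challenge() -> bytes:
    """Verifier draws a fresh random nonce so old responses cannot be replayed."""
    return secrets.token_bytes(16)


def respond(shared_key: bytes, challenge: bytes, peer_id: str) -> bytes:
    """Prover shows knowledge of the key by MACing the nonce plus its claimed identity."""
    return hmac.new(shared_key, challenge + peer_id.encode(), hashlib.sha256).digest()


def verify(shared_key: bytes, challenge: bytes, peer_id: str, response: bytes) -> bool:
    """Verifier recomputes the MAC and compares it in constant time."""
    expected = respond(shared_key, challenge, peer_id)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    key = secrets.token_bytes(32)            # shared by Alice's and Bob's devices
    nonce = make_challenge()                 # Bob challenges the device claiming to be Alice
    genuine = respond(key, nonce, "alice")
    impostor = respond(secrets.token_bytes(32), nonce, "alice")  # spoofer lacks the key
    print(verify(key, nonce, "alice", genuine))   # True
    print(verify(key, nonce, "alice", impostor))  # False
```

This sidesteps rather than solves the missing-CA problem: the sketch simply assumes the key was distributed beforehand, which is exactly what a decentralized trust scheme would have to provide.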

G. Privacy Privacy is a challenge that mobile peer-to-peer applications must address carefully. Privacy is the right of every individual to control the data and information collected about their behavior. In contrast to security, which is concerned with keeping information and data safe from malicious access attempts, privacy concerns the amount of information known about an individual. It is therefore necessary not only to protect data from unauthorized access but also to give users control over what information is disclosed, to whom, and when. Moreover, it should be possible for any user to stay anonymous if necessary or desired. IV. THE MPP PROTOCOL Ruediger Schollmeier and Ingo Gruber [6] exploit the fact that mobile ad hoc networks (MANETs) and peer-to-peer networks have a lot in common to develop a protocol combining both. In general, peer-to-peer protocols are not aware of their underlying network layer and assume wired connections, thus creating a lot of overhead traffic that is unsuitable for mobile systems. They also generally rely on TCP and assume stable, reliable data delivery between peers. MANETs, on the other hand, are not aware of the peer-to-peer application layer and spend much effort recreating broken links instead of choosing an alternative data source. The MPP protocol suite consists of three protocols: the MPP protocol as the application layer protocol, the Mobile Peer Control Protocol (MPCP) as the inter-layer communication protocol, and EDSR as the network routing protocol. HTTP over TCP is used for data transport. Fig. 1. MPP Network layers and protocols [6] A. Structure The EDSR protocol is an enhanced version of the existing Dynamic Source Routing (DSR) protocol [7], which finds routes between hosts in a mobile ad hoc network. EDSR does not change the behavior of DSR, so EDSR nodes can be integrated into any DSR network.
EDSR extends the request and reply types so that they not only find routes to a peer but also carry search information for file requests and can answer those requests. This leads to the following advantages [6]: The MANET keeps control of the network organization, so changes in the topology are taken into account by the overlaying peer-to-peer network. Routing is performed by the network layer and not redundantly by the application layer as well, which minimizes the overall implementation effort. Any security implemented on the MANET layer also improves the security of the peer-to-peer network. Integrating both networks avoids redundant information that would create unnecessary traffic and consume bandwidth. Figure 1 shows the layer model of the mobile peer-to-peer network. MPCP serves as the communication channel between the application (MPP) and the network layer (EDSR) that makes these advantages possible. Fig. 2. MPP message sequence chart for data search and download process [6] Figure 2 shows the sequence of announcing an application to the EDSR layer, searching for files and transmitting data. MPCP works as a registration service: it enables EDSR to notify applications of incoming search requests, transmits the user's search queries to the network, and handles incoming requests and responses from EDSR. Using MPCP, the peer-to-peer application registers itself with the EDSR layer. When a user starts a search, the EDSR layer handles it by flooding requests throughout the MANET. Registered nodes on the EDSR layer receive this request and forward it to their overlaying peer-to-peer application, which then decides whether to reply, depending on whether the user's query matches. If it matches, an EDSR reply is sent back to the requesting node, containing not only information about the matching file but also the complete route between both hosts.
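The search flow just described can be simulated in a few lines. This is a sketch under simplifying assumptions of our own, not the MPP or EDSR implementation: the MANET is a static neighbour table, flooding is modelled as a breadth-first traversal in which each node handles a request only once (DSR-style duplicate suppression), and the application's match rule is a case-insensitive substring test. Each reply carries the route recorded while the request travelled, mirroring the complete routing information that the EDSR reply returns.

```python
from collections import deque

# Hypothetical simulation of an MPP search carried inside EDSR route requests.
# The topology, file lists and keyword match rule are illustrative assumptions.


def flood_search(topology, shared_files, source, keyword):
    """Flood a search request from `source` and collect the replies.

    topology:     dict node -> list of neighbours (current radio links)
    shared_files: dict node -> list of file names; only nodes whose MPP
                  application registered with EDSR (via MPCP) appear here
    Returns (node, file_name, route) tuples; `route` is the source route
    from `source` to the answering node, recorded as the request travelled.
    """
    replies = []
    seen = {source}                      # each node handles a request only once
    queue = deque([(source, [source])])  # (node, route recorded so far)
    while queue:
        node, route = queue.popleft()
        if node != source:
            # EDSR hands the request to the registered application,
            # which decides whether the user's query matches a shared file.
            for name in shared_files.get(node, []):
                if keyword.lower() in name.lower():
                    replies.append((node, name, route))
        for neighbour in topology.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, route + [neighbour]))
    return replies


topology = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
files = {"C": ["holiday.jpg", "notes.txt"], "D": ["Holiday-video.mp4"]}
for node, name, route in flood_search(topology, files, "A", "holiday"):
    print(node, name, " -> ".join(route))
# C holiday.jpg A -> B -> C
# D Holiday-video.mp4 A -> B -> D
```

Because the route travels with each match, the requester already knows a path to every provider before the transfer starts; no separate route discovery is needed in this simplified model.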
As stated above, MPP then uses HTTP for file transfer, implementing not only GET requests for downloading but also PUT requests for uploading; PUT is part of the HTTP specification but is often not supported by, e.g., web servers.
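Because MPP transfers files over plain HTTP, a downloader can use standard byte-range requests to fetch parts of a file from different peers or to resume an interrupted transfer: requests carry a `Range` header and partial responses a `Content-Range` header. The helpers below only compute and parse these header values; the round-robin assignment, chunk size and peer names are assumptions for illustration, not MPP's documented behaviour.

```python
import re

# Hypothetical helpers for chunked and resumable transfers over HTTP byte
# ranges. Header syntax follows the HTTP standard (RFC 9110); peer names,
# chunk size and the round-robin assignment are illustrative assumptions.


def plan_chunks(file_size, peers, chunk_size):
    """Assign inclusive byte ranges to peers round-robin.

    Returns (peer, Range header value) pairs, e.g. ("peerB", "bytes=0-99").
    """
    plan = []
    for i, start in enumerate(range(0, file_size, chunk_size)):
        end = min(start + chunk_size, file_size) - 1  # HTTP ranges are inclusive
        plan.append((peers[i % len(peers)], f"bytes={start}-{end}"))
    return plan


def resume_range(bytes_received):
    """Range header value that requests everything after the bytes already stored."""
    return f"bytes={bytes_received}-"


_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Parse a Content-Range response header into (start, end, total or None)."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return int(match.group(1)), int(match.group(2)), total


print(plan_chunks(250, ["peerB", "peerC"], 100))
# [('peerB', 'bytes=0-99'), ('peerC', 'bytes=100-199'), ('peerB', 'bytes=200-249')]
print(resume_range(4096))                        # bytes=4096-
print(parse_content_range("bytes 100-199/250"))  # (100, 199, 250)
```

Parsing `Content-Range` lets the downloader check that each peer returned exactly the slice it asked for before writing it at the right file offset.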

To download file chunks from different peers or to resume interrupted uploads and downloads, MPP uses the HTTP Content-Range header. V. THE PROEM MIDDLEWARE PLATFORM Proem is described as "an open computing platform targeted at ad hoc mobile information systems. It provides a complete solution for developing and deploying collaborative peer-to-peer applications for mobile ad hoc networks and personal area networks." [8] Proem was developed by the Wearable Computing Laboratory at the Department of Computer Science, University of Oregon, USA, based on the group's extensive experience in mobile peer-to-peer application development [8], [9]. The main objectives of Proem are stated as follows: Adaptability: Proem is designed to respond rapidly to changes in the operating environment. Universality: Proem, in contrast to other concepts, is an infrastructure for building diverse mobile communities, with applications ranging from file sharing to instant messaging. Interoperability: Proem's goal is to provide interoperability between heterogeneous hardware and software platforms. Platform Independence: Proem is designed independently of any particular programming language, using open web standards and technologies such as HTTP, XML and MIME. Extensibility: Proem's core components may be modified by developers. High-level development support: Proem was designed to provide a simple but powerful development platform for implementing mobile peer-to-peer applications. A. Architecture The basic building block of the Proem architecture is the peer, an autonomous host or device in a peer-to-peer relationship. Deployed on each peer is the Peerlet Engine, a runtime environment for peerlets. A peerlet is a peer-to-peer application based on an event-based programming model. From an abstract point of view, Proem consists of a set of communication protocols defining the syntax and semantics of the messages that peers can exchange. The Proem platform itself is a collection of tools, APIs and runtime structures for developing peer-to-peer applications. It currently exists as a proof-of-concept Java implementation. Proem defines four protocols for peer communication: Proem Transport Protocol: the low-level basis of all Proem protocols, a connectionless, asynchronous communication protocol. Presence Protocol: a protocol for presence announcement. Data Protocol: designed for peers to share and synchronize data. Community Protocol: designed to manage community membership. Additionally, Proem can be extended with application-specific protocols. The Proem Transport Protocol builds on XML messages and may be implemented on top of many existing protocols such as TCP/IP, UDP or HTTP. Fig. 3. Proem Middleware Architecture [9] Figure 3 shows the components of the Proem Middleware Architecture. Besides the protocols for presence, community, data and transport, it includes the following services useful for application design and development: Discovery manager: provides presence announcement and discovery of peers. Context manager: responsible for information about currently visible peers. Peer database: data storage in which peerlets can keep custom meta-information on peers; it also keeps a log of peer encounters. Resource manager: provides peerlets with the ability to share resources with other peers and allows access control on these resources. Event bus: since Proem is event-based and follows a publish-subscribe model, the event bus handles the communication between the components, notifying peerlets of changes in contextual data. Profile manager: stores information about the peer's user and their relations to others. Protocol manager: gives information about the application-specific protocols supported by the current peer.
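The publish-subscribe coupling between the discovery manager, the event bus and peerlets can be sketched as follows. The sketch is written in Python for brevity even though Proem's prototype is in Java; all class names, the `peer.appeared` topic and the XML presence format are hypothetical, since the text does not specify Proem's API or message schema. It only illustrates the pattern: a component publishes a context change once, and every subscribed peerlet is notified without the components knowing about each other.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sketch of Proem-style event handling. Class names, topic
# names and the XML presence format are assumptions; Proem's real Java
# API and message schema are not given in the text.


class EventBus:
    """Minimal publish-subscribe bus connecting components and peerlets."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self._subscribers[topic]):
            callback(payload)


class DiscoveryManager:
    """Turns incoming XML presence announcements into bus events."""

    def __init__(self, bus):
        self.bus = bus
        self.visible = set()  # currently visible peers (context information)

    def on_message(self, xml_text):
        msg = ET.fromstring(xml_text)
        if msg.tag != "presence":
            return  # messages of other protocols are not handled here
        peer = msg.get("peer")
        if peer not in self.visible:  # only a newly seen peer changes the context
            self.visible.add(peer)
            self.bus.publish("peer.appeared", {"peer": peer, "user": msg.findtext("user")})


class NearbyPeerlet:
    """Example peerlet that reacts when a new peer comes into range."""

    def __init__(self, bus):
        self.log = []
        bus.subscribe("peer.appeared", self.on_peer_appeared)

    def on_peer_appeared(self, event):
        self.log.append(f"{event['user']} is nearby ({event['peer']})")


bus = EventBus()
discovery = DiscoveryManager(bus)
peerlet = NearbyPeerlet(bus)
discovery.on_message('<presence peer="p42"><user>Alice</user></presence>')
discovery.on_message('<presence peer="p42"><user>Alice</user></presence>')  # no new event
print(peerlet.log)  # ['Alice is nearby (p42)']
```

Routing everything through topics means a new peerlet can be added without modifying the discovery manager, which fits Proem's extensibility objective.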
The use of the Proem platform as an application platform in software engineering courses has shown that not only experienced users but also novice developers were able to build a complete application design in a short time. Future work on integration with the underlying ad hoc networks and on the specification of a security architecture still needs to be done. VI. MORE EXAMPLES FOR MOBILE PEER-TO-PEER SOLUTIONS A. The Mobile Agent P2P Architecture (MAP2P) Tim Hsing-ting Hu et al. [10] propose an architecture that supports mobile devices as an enhancement to the Gnutella file sharing network. The main problems they identify with using mobile devices with the Gnutella protocol are the following: Periodic sending of heartbeat messages drains too much power.

The limited bandwidth of wireless networks is often a bottleneck. Wireless networks are unpredictable because of sudden disconnections. Mobility awareness for migration between networks is lacking. Their proposed solution is the use of a mobile agent. Mobile agents have the characteristics of being mobile, autonomous and persistent [11]. Instead of the mobile device attaching to the peer-to-peer network itself, a mobile agent acting on behalf of the device is used. The mobile agent communicates with the mobile device through a lightweight protocol to reduce the amount of traffic. If a peer wants to download a file from the mobile peer, the mobile agent works like an HTTP proxy, denying access if the mobile peer is not available or forwarding the request if it is present. Downloading a file to the mobile peer can either be done directly by the mobile device via HTTP, if it is in an area with WiFi connectivity or any other accessible network, or the mobile agent can be directed to download the file on behalf of the mobile peer (see figure 6). Fig. 6. Options for Download Operation [10] Fig. 4. The Mobile Agent P2P Architecture [10] The proposed Mobile Agent P2P architecture is shown in figure 4. A necessary extension to the existing peer-to-peer network is a set of hosts that contain execution environments for mobile agents. These execution environments not only serve as a place for mobile agents to execute their program code, but also provide resource and security control for the host. The process of joining the network for a mobile client is illustrated in figure 5. Starting the file-sharing client creates a mobile agent containing all information about its mobile peer; the agent is then migrated to an execution host and reports its location back to the mobile peer. One of the main challenges of mobile peer-to-peer, network migration, is also handled by this solution.
Since the mobile peer knows the location of its agent, it can update its IP address at any time. Mobile agents are also allowed to migrate for different reasons: they can move to an execution environment closer to the peer, or be forced to migrate when their current execution environment runs out of resources (see figure 7). Fig. 7. Options for Download Operation [10] Fig. 5. Mobile Devices Joining the Gnutella Network [10] Searching for a file does not differ from the usual Gnutella search method. If the user enters a search on the mobile device, it is handed to the mobile agent for processing. If only limited bandwidth is available, the agent returns only the top results of the list, which should be the most reliable matches. Since the mobile agent holds the list of files shared by the user on the mobile device, it can answer any search request from the network directly, without needing to communicate with its peer. B. Distributed Search Service for Peer-to-Peer File Sharing In [12], Christoph Lindemann et al. present passive distributed indexing (PDI), a general-purpose distributed search service for mobile file sharing applications based on peer-to-peer technology. The service enables resource-efficient searching for files distributed across mobile devices based on simple queries. The basic idea behind PDI is that each mobile device stores a local repository containing a set of available files. Each document is uniquely identified by its path in the local storage and the IP or MAC address of the device. Each device also maintains a local index cache. Queries are defined by a query string consisting of a list of keywords connected by a Boolean AND operation. Query messages are transmitted between peers using local broadcasts. To spread the messages further, queries may be forwarded for a predefined number of hops. To fill the local index cache, all devices listen to broadcast responses and add all references to matching documents to their cache. PDI itself does not specify how documents match a query; this is left to the application implementing it. To cope with the short transmission range of wireless devices, PDI can forward query messages as shown in figure 8. As the set of documents in the result grows with each hop, every peer along the way updates its cache with the information available. To avoid unnecessary traffic, duplicate entries are removed from the reply. Fig. 8. Message forwarding in PDI [12] VII. CONCLUSION AND FUTURE WORK As the concepts and example implementations above show, much work has been done on merging the ideas of MANETs and peer-to-peer from different angles: new protocols and application development platforms have been proposed, and mobile clients have been integrated into existing peer-to-peer networks like Gnutella. Much work remains to be done, but since the computational power and battery life of mobile devices keep advancing, there are many interesting opportunities for developing peer-to-peer based applications for mobile devices. REFERENCES [1] G. Kortuem, J. Schneider et al., "When peer-to-peer comes face-to-face: Collaborative peer-to-peer computing in mobile ad-hoc networks," in Proceedings of the International Conference on Peer-to-Peer Computing (P2P2001), Linköping, Sweden, August. [2] G. A. Bolcer, M. Gorlick et al., "Peer-to-peer architectures and the Magi™ open-source." [3] J. J. Kistler and M. Satyanarayanan, "Disconnected operation in the Coda file system," ACM Transactions on Computer Systems, vol. 10, no. 1, February. [4] M. Nidd, "Timeliness of service discovery in DEAPspace," in Proceedings of the 2000 International Workshop on Parallel Processing. [5] J. Schneider, G. Kortuem et al., "Disseminating trust information in wearable communities," in 2nd International Symposium on Handheld and Ubiquitous Computing (HUC2K), Bristol, England, September. [6] R. Schollmeier, I. Gruber, "Protocol for peer-to-peer networking in mobile environments." [7] D. Johnson, D. Maltz, "Dynamic source routing in ad hoc wireless networks," Mobile Computing. [8] G. Kortuem, "Proem: A peer-to-peer computing platform for mobile ad-hoc networks." [9] G. Kortuem, "Proem: A middleware platform for mobile peer-to-peer computing," Mobile Computing and Communications Review, vol. 6, no. 4. [10] T. H. Hu, B. Thai et al., "Supporting mobile devices in Gnutella file sharing networks." [11] V. A. Pham, A. Karmouch, "Mobile software agents: An overview," IEEE Communications Magazine, vol. 1, July. [12] C. Lindemann, O. P. Waldhorst, "A distributed search service for peer-to-peer file sharing in mobile applications."

68 SURVEY OF POSSIBLE TASKS FOR ARTIFICIAL LIFE IN LARGE-SCALE NETWORKS Survey of possible tasks for Artificial Life in large-scale Networks Denis Lapiner Abstract This paper deals with the question: "If we create artificial life, what could its task be?". Artificial life and learning algorithms are essential today: they help to manage the complexity of software, which is growing rapidly, especially in the domain of distributed networks. The article gives a rough overview of the history of artificial life (AL) and artificial intelligence (AI). Additionally, it briefly introduces the methodology used in AL and AI applications. Furthermore, some practical and theoretical applications of AL and AI are presented. Finally, the article gives some suggestions on what artificial life could do inside a decentralized, intelligent and autonomous network. I. INTRODUCTION The most common misapprehension about artificial intelligence (AI) and artificial life (AL), probably caused by science-fiction films, is that AI is capable of thinking and understanding what it does. In fact, AI mostly consists of heuristic algorithms. If a machine decides that an article is about sports, it does so not because it understands the text, but because the word frequencies are similar to those of the sports articles the machine has seen before. The roots of artificial intelligence and artificial life go back almost 70 years. They have evolved from toy applications into highly valuable applications that save large amounts of money and simplify everyday life. Both are used in fields like the Internet, computer games and other entertainment, computer graphics, robotics, security, economics, medicine and linguistics. The methodology of AL and AI comprises various techniques such as evolutionary computation, agent-based modelling, cellular automata, swarm intelligence, Lindenmayer systems, neural networks and many more.
These algorithms have many applications, ranging from programs that play chess better than any human ever could to spacecraft that are too complex to be controlled solely by human beings. This article gives an overview of the techniques used in AL and AI applications and names and explains some interesting applications. The methodologies of AL are predominantly massively parallel and computationally intensive, which makes large-scale networks very suitable for them. This paper introduces a decentralized, intelligent and autonomous peer-to-peer network called SkyNet, which is planned to use artificial life for self-organisation and for improving its services. Because SkyNet does not yet have a specific task, the main intention of this article is to present three suggestions for possible tasks that artificial life inside SkyNet could perform. II. VISION The scientists of the Multimedia Communications Lab (KOM) are researching SkyNet, a peer-to-peer network that uses any number of resources on ordinary desktop personal computers. The network can monitor itself and determine the performance targets it has to satisfy, for example latency, average per-node processor load or availability. The system autonomously decides which division of resources is optimal. The current aim of the scientists is to make the network adapt to the needs of its users: if users complain about long waiting times, latency needs to be improved; if the local load is too high, it needs to be reduced. These are only the current aims, however. From a visionary perspective, this is merely low-level self-organization, as the network could eventually follow its own autonomous aims. Large-scale networks like SkyNet are particularly suitable for AL applications because of their massively parallel computational abilities.
Furthermore, SkyNet's aggregate computational power and memory capacity are very high and surpass those of any single high-end machine. In addition, since it runs on existing desktop computers, no dedicated investment is needed to operate SkyNet. In order to find a suitable and meaningful task for SkyNet, I will introduce some applications from the history of AI and AL and show how expectations of intelligent systems have evolved. Finally, I will suggest some tasks that SkyNet could take on, based on existing applications. III. HISTORY This section introduces the history of the ideas of artificial intelligence and artificial life and gives examples of past AI and AL applications. It also shows that no AI can define its own task; defining the machine's task remains a human's job. According to the book "Artificial Intelligence, A Modern Approach" by Stuart J. Russell and Peter Norvig, "the first work that is now generally recognized as AI was done by Warren McCulloch and Walter Pitts" [4]. These two scientists created a neural network in which each neuron is either active ("on") or inactive ("off"); a neuron switches on when a certain number of its direct neighbours are active. McCulloch and Pitts showed that any computable function can be computed by such a network and that all logical connectives such as "and", "or" and "not" can be implemented with it. They also had some theoretical thoughts on how their network could learn, which were later demonstrated in practice by Donald Hebb. The learning rule Hebb defined is called "Hebbian learning" in his honour. From this we can infer that around 1950 the expectations of artificial intelligence were limited to solving simple logical problems and learning easy logical (computable) functions. This conclusion is supported by the first neural network computer, SNARC, built in 1951 by Marvin Minsky and Dean Edmonds, which had difficulty even with mathematical tasks. Stuart J. Russell and Peter Norvig describe the development of artificial intelligence in the 1950s as "full of success-in a limited way" [4]. Programs appeared that solved problems with a "thinking humanly" approach. Learning algorithms showed that a computer allowed to learn a game by itself could play it better than the people who programmed it; Arthur Samuel's checkers program is the classic example. This development shows that artificial intelligence faced new tasks at that time: programs had to learn and solve problems that are not related to mathematics. Even harder, they had to solve mathematical problems without using mathematics! The phrase "in a limited way" above refers partly to the limited availability of computers at that time. The book says of Samuel: "Working at night, he used machines that were still on the testing floor at IBM's manufacturing plant." [4]. In 1968 a program called ANALOGY was able to solve geometric analogy problems of the kind that appear in IQ tests. At that time AI researchers became very confident of the capabilities of artificial intelligence. One prediction made then was that within 10 years a computer would become chess champion and prove a challenging mathematical theorem [4].
Indeed, both predictions eventually came true, although not within those 10 years. Still, the algorithms developed so far did not scale to large or difficult problems. The next generation of AI algorithms were the "knowledge-based systems", whose advantage was the background knowledge they had about the problem at hand. The Dendral program (1969) is an example of this approach. It was designed to infer the structure of molecules from their mass spectrum. The important difference from earlier algorithms is that Dendral had a large database of rules provided by its developers. These rules allowed it to restrict the number of possible results to a manageable set, within which the program could then search for the best-fitting result. Kim and Cho also report that evolutionary computation with genetic algorithms has been used to determine the structure of proteins [5]. As the number of AI applications grew quickly, knowledge-based languages appeared. An example is Prolog, which is still in use today. In the 1980s artificial intelligence became an industry: in 1980 the AI industry was worth only a few million dollars, and by 1988 it had grown to billions of dollars [4]. One example of commercial AI software is R1, used by the Digital Equipment Corporation to simplify the configuration of orders for new computer systems. However, many companies made promises about complicated AI technology that they could not keep, and the period that followed is known as the "AI Winter". This development shows that our expectations of AI systems changed: AI was no longer limited to small theoretical problems but was used to help people at work, as the expert system R1 did. Another practical example of a useful AI system is the diagnostic expert system used in Windows to troubleshoot problems.
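The "generate candidates, then prune them with expert rules" idea behind Dendral can be illustrated with a deliberately tiny toy problem: list every C/H/O composition whose nominal mass matches an observed value, then discard the candidates that violate two textbook chemistry rules. The atoms, masses, rules and function names are our own simplifications for illustration; Dendral's actual rule base and search were far more elaborate, and a real system would go on to rank the survivors against the full spectrum.

```python
from itertools import product

# Toy illustration of knowledge-based pruning (not Dendral's real algorithm).
# Atoms, nominal masses and rules are simplified textbook chemistry.

MASS = {"C": 12, "H": 1, "O": 16}  # nominal (integer) atomic masses


def candidates(total_mass):
    """Generate every C/H/O composition (at least one carbon) with the given mass."""
    for c, o in product(range(1, total_mass // MASS["C"] + 1),
                        range(0, total_mass // MASS["O"] + 1)):
        h = total_mass - c * MASS["C"] - o * MASS["O"]
        if h >= 0:
            yield (c, h, o)


# Expert-supplied rules; each must hold for a plausible neutral molecule.
RULES = [
    lambda c, h, o: h <= 2 * c + 2,  # no more hydrogen than a saturated chain allows
    lambda c, h, o: h % 2 == 0,      # with only C, H and O the hydrogen count is even
]


def plausible(total_mass):
    """Keep only the candidates that satisfy every rule."""
    return [cand for cand in candidates(total_mass)
            if all(rule(*cand) for rule in RULES)]


def formula(c, h, o):
    """Render a composition as a molecular formula, e.g. (2, 6, 1) -> 'C2H6O'."""
    parts = [("C", c), ("H", h), ("O", o)]
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in parts if n > 0)


print(len(list(candidates(46))), [formula(*x) for x in plausible(46)])
# 6 ['CH2O2', 'C2H6O']
```

For mass 46 the rules cut six raw candidates down to two, which is the kind of narrowing the text attributes to Dendral's rule database before its search for the best fit.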
Furthermore, Stephen Wolfram explains in his book "A New Kind of Science" [10] that research in artificial life was most active from the mid-1980s to the mid-1990s. In these years AL showed that "... computer programs could be made to emulate various features of biological systems." [10]. At that time Wolfram himself was working on cellular automata, which are explained in more detail in the next section. "The first conference on artificial life, in 1989, where the term "artificial life" was coined, gave recognition to ALife as a field in its own right" [5]. In the 1990s many agent systems emerged in the field of artificial intelligence; their popularity is shown by the adoption of the word "bot" into everyday language. The rapidly growing Internet became a large application area for AI, and a new research area called "Web Mining" was formed. Search engines, spam filters and recommendation systems use AI technologies; indeed, the Internet would be far less usable without the AI programs that help users every day. Medicine was also a fast-growing application area for AI: in 1991 the first "Artificial Intelligence in Medicine Europe" conference was held [8]. "Artificial Intelligence, A Modern Approach" [4] gives some examples of state-of-the-art AI tasks at the time of its writing, which are briefly listed below. Autonomous planning and scheduling, like the planning program on board a NASA spacecraft. Game playing: IBM's Deep Blue won a chess match against Garry Kasparov. Autonomous control: a program steered a minivan for 2850 miles across the U.S. after learning from some training runs. Diagnosis: medical programs suggest possible diseases. Logistics planning, such as DART, a transportation planner used by U.S. forces during the Gulf crisis. Robotics, for example in surgery. Language understanding and problem solving, like Proverb, a crossword puzzle solver.

IV. PRESENT
This section introduces some papers that deal with the aims of artificial life today. First, AL's methodologies and their application areas are explained. Then one of AL's application areas is explained in more detail. Finally, the section gives some information on SkyNet's self-organisation and the state of the art in this research area.
Artificial Life
The first paper discussed in this section is "A Comprehensive Overview of the Applications of Artificial Life" [5] by Kyung-Joong Kim and Sung-Bae Cho. This paper names some aims that artificial life can have today. First of all, I will explain the difference between the traditional design of AI and that of AL. There are two approaches to modelling AI: on the one hand, the "top-down approach (involving a complicated, centralized controller that makes decisions based on access to all aspects of the global state)" [5], and on the other hand, the "bottom-up approach, which is based on parallel, distributed networks of relatively simple, low-level agents that simultaneously interact with each other" [5]. The top-down approach is the one traditionally used for AI, while the bottom-up approach is characteristic of AL. Moreover, AL's bottom-up approach can exploit the massively parallel computing capacity of large-scale networks much more efficiently than AI's top-down approach.
Artificial life has different methodologies for solving problems; they are listed below.
Evolutionary Computation is inspired by the process of biological evolution in nature. Its algorithms use evolution strategies to compute results, and all of them are population-based search algorithms. Evolutionary computation is used for learning, adaptation, and searching. There are three main types of these algorithms: genetic algorithms, genetic programming and evolutionary programming.
Genetic algorithms are the most popular of these and deserve a more detailed description. They use crossover and mutation as search operators. In a genetic algorithm, a population of individuals undergoes an evolutionary process: the individuals compete for resources in the environment, and the stronger ones pass their genetic material on to the next generation. Normally, evolutionary computation uses an automatic fitness function to select the stronger individuals, but for some applications, such as graphics, such fitness functions are hard to find. In these cases fitness must be judged subjectively by a human user; this method is called interactive evolutionary computation. However, the human is always a bottleneck for the computation, which limits the number of generations.
Agent-Based Modelling: the multi-agent simulations explained in the next section are also an example of artificial life.
Cellular Automata are used for computation; although they have no real autonomy, they still belong to the field of artificial life. A grid of cells, in which each cell has an internal state, a program, and knowledge of its neighbours, is used to represent different complex behaviours. The program defines the cell's state transitions according to the states of its neighbours. This method is used for modelling ecological systems, image processing and neural network construction.
Swarm Intelligence uses algorithms that imitate insect swarms. Kim and Cho give two examples of such algorithms. The first is the popular ant colony optimization, which solves complex combinatorial optimization problems using artificial ants. This algorithm has many applications in fields like electronics, industrial design, chemical process design, and data mining; it is also used for control and routing in telecommunications and networking. The second is particle swarm optimization, based on the behaviour of bees.
Lindenmayer Systems use simple grammar rules to generate complex sentences from primitive components. They are used in computer graphics to model plants such as trees and flowers, or other regular shapes such as feathers.
Neural Networks are used to classify their inputs: they map inputs to the outputs they learned in a previous training phase. This method is often used in optimization, regression, and prediction problems.
Kim and Cho also give many detailed examples of application fields that use artificial life; some of them are listed below. Since we are only interested in possible tasks of AL, technical information such as implementations is not discussed in this paper; for more information, see [5].
Robotics: the main task for AL in robotics is designing controllers for robots. To solve this problem, methods from evolutionary computation are combined with neural networks. Besides designing controllers, there are other issues such as map building, planning and human-robot interaction.
Computer Graphics: here AL helps to design virtual characters and to generate images and animation. Virtual creatures and characters play an important role in computer graphics today, and AL helps to shape these creatures and design their behaviour. "Cho and Lee have developed an image retrieval system based on human preference and emotion by using an interactive GA. It searches the images not only with explicit queries, but also with implicit queries such as "cheerful expression" and "gloomy expression"" [5]. Interactive evolutionary computation is mostly used for these tasks, since their results cannot be rated objectively.
Natural Phenomenon Modelling: this is a good application field for cellular automata and Lindenmayer systems. A popular task in this field is modelling flock behaviour; for example, the behaviour of fish swarms or plant-eating insects can be simulated. Flocking algorithms were used in movies like Batman Returns, where animated flocks of bats were simulated. Flock behaviour can even be used to explain the evolution of fossils.
Entertainment and Games: traditional AI in games is modelled with a top-down approach, which makes it follow simple predefined rules. Finding and tuning these rules requires expensive expert knowledge, and this kind of AI cannot produce unexpected behaviour. AL's bottom-up approach allows game AI to be created without expert knowledge and, more importantly, its behaviour is unpredictable, which makes the gameplay much more interesting. The popular games Half-Life and Unreal both use flocking to control the movement of groups of fish, birds, and monsters to create a more realistic and natural environment. Even music can be composed by AL using evolutionary computation; various AL music composition systems exist, such as GenJam, SBEAT3 and GP-MUSIC.
Economics: agent-based simulation software allows business activities to be better scheduled, executed, monitored, and coordinated (see also the Multi Agent Simulations section). Kim and Cho describe one more popular application: "Agent-based design and implementation philosophy has been used as a prototype for a business process management system for British Telecom (BT)" [5].
Internet and Information Processing: one possible task for AL on the Internet is creating a recommendation system that sorts the Internet for a particular user. A kind of immune system, adapted to each user, could then remove unimportant information from the data found by filtering agents. Since the main strength of evolutionary computation is searching, it is highly suitable for data mining. For example, the colony-based algorithm Ant-Miner can find classification rules in databases.
Industrial Design: AL applications solve problems in industrial design such as better planning of traffic systems in big cities, optimizing the design of batch chemical processes, and even design aid systems for women's clothes. AL's search algorithms are also used to design the layout of electrical circuits, which have become very large and complex.
Security: as described in the "Autonomic Computing Systems" section, self-healing and self-protection are tasks that can be solved with adaptive, distributed, agent-based systems. As Kim and Cho say: "Recent security incidents and analysis have demonstrated that manual response to such attacks is no longer feasible" [5]. Consequently, much research has been done on immune systems for intrusion detection software, some of which is named in [5]. There is even research on immune systems against computer viruses.
The illustration above from [5] shows AL's fields of application and the relations between its methodologies.
Multi Agent Simulations
The next work I want to present is "Multi-agent simulations and ecosystem management: a review" [2] by F. Bousquet and C. Le Page. It introduces the main aspects of multi-agent simulations (MAS) and gives some example MAS projects. In a MAS, many independent agents (programs) interact with each other and with their virtual environment. To put it simply, think of the Game of Life: each cell on the grid would be an agent that only interacts with its direct neighbours.
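To make the Game of Life analogy concrete, here is a minimal sketch of one synchronous update step. It assumes Conway's standard rules (a dead cell with exactly three live neighbours is born, a live cell with two or three live neighbours survives, all other cells are dead afterwards) and an unbounded grid stored as a set of live-cell coordinates; both choices are illustrative and not taken from [2]. Each cell behaves like a very simple agent whose next state depends only on its own state and its eight direct neighbours, which is also the structure of the cellular automata described earlier.

```python
from collections import Counter

def step(live):
    """One synchronous Game of Life update; `live` is a set of (x, y) cells."""
    # Every live cell contributes one count to each of its 8 neighbours.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Local rule: birth with exactly 3 live neighbours, survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 1), (1, 1), (2, 1)}  # horizontal line of three live cells
print(sorted(step(blinker)))        # vertical line: [(1, 0), (1, 1), (1, 2)]
```

Applying `step` twice to the blinker returns the original pattern: a global behaviour (oscillation) that emerges from purely local rules, which is the kind of effect multi-agent simulations are built to study.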
MAS are explained in more detail because one of my suggestions for SkyNet's tasks is a generic MAS framework; this idea is explained further in the last section. There are two possible types of multi-agent simulation. On the one hand, a simulation can reveal "critical coefficients that characterize the transitions" [2]. For this approach, a model with possible transitions needs to be specified. There is a target behaviour of the simulation that can be achieved with the right combination of coefficients, and it is exactly this combination that the simulation has to find. According to the authors, this kind of simulation is used in the physical sciences, but unfortunately the paper does not give any examples of real projects of this type. On the other hand, a "like reality" simulation can answer "and what if...?" [2] questions. Scientists from the life and social sciences are typical users of this kind of simulation. Usually this method is used to understand the real-world relations described by the model, rather than to predict the real-world behaviour resulting from the chosen coefficients [2]. F. Bousquet and C. Le Page give many examples of the second kind of MAS; explaining all of them would make this paper too long, so only some interesting projects are mentioned. Dean et al. simulated the population movements of the Anasazi Indians in response to environmental crises in order to reveal historical information. Another example, by Lansing and Kremer, is the simulation of water management in Bali, which revealed information on how water management is coordinated. MAS are also often used to help understand the management of renewable resources and agricultural practices. Simulating the activity on the roads of a natural park, in order to keep different user groups such as cyclists, walkers and vehicle drivers from crossing paths, also seems to be an interesting application for MAS. The paper also mentions that multi-agent simulations can use existing models; for example, they can simulate the behaviour of human users in existing frameworks, such as the workload of a telecommunication system, in order to optimize the structure of a network. Furthermore, Bonabeau and Theraulaz mention in their paper [1] that many parallel swarm and genetic algorithms exist. These algorithms use multi-agent-like artificial life approaches to solve their problems.
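The genetic-algorithm loop summarised earlier in this section (a population in which stronger individuals pass on their genetic material, with crossover and mutation as search operators) can be sketched as follows. This is a minimal single-process illustration, not the method of [1] or [5]: the toy "OneMax" problem (maximise the number of 1-bits), tournament selection, elitism and all parameter values are assumptions chosen for readability.

```python
import random

# Illustrative parameters (assumptions, not from the cited papers)
GENOME_LEN = 20
POP_SIZE = 30
MUTATION_RATE = 0.02
GENERATIONS = 60

def fitness(genome):
    """OneMax toy problem: the more 1-bits, the 'stronger' the individual."""
    return sum(genome)

def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals wins."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover: the child takes a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit for bit in genome]

def evolve(seed=0):
    """Run the GA; return the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    history = [max(map(fitness, pop))]
    for _ in range(GENERATIONS):
        children = [max(pop, key=fitness)]  # elitism: the best individual survives unchanged
        while len(children) < POP_SIZE:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

best, history = evolve()
print("best fitness:", history[0], "->", history[-1], "of", GENOME_LEN)
```

In interactive evolutionary computation, `fitness` would be replaced by a rating entered by a human user, which is why the number of generations then stays small. The fitness evaluations within one generation are independent of each other, which is what makes such algorithms attractive for parallel execution in a large network.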
Bonabeau and Theraulaz explain the importance of artificial life as follows: "In particular, AL builds bridges between natural sciences and the sciences of the artificial: This makes it unique and indispensable" [1].
Autonomic Computing Systems
This section briefly presents the state of the art in Autonomic Computing Systems (ACSs). In "A Survey of Autonomic Computing Systems" [7], Mohammad Reza Nami and Koen Bertels describe the use of Autonomic Computing Systems to solve the problem of self-organisation mentioned in the Vision section. In brief, the task of ACSs is to improve services and reduce complexity: "The increasing complexity, cost and heterogeneity of distributed computing systems have motivated researchers to investigate new ideas to cope with the management of this complexity." [7]. The paper also defines characteristics of ACSs, which are summarised below; afterwards, some ACS projects are discussed with regard to these characteristics.
Self-configuration is the system's ability to configure itself dynamically and adapt to changing conditions.
Self-healing describes failure prevention, but also the recovery or replacement of failed components.
Self-optimization deals with issues like resource utilization and workload management in a way that suits the user's requests.
Self-protection is the capability of the ACS to detect security threats and handle them.
Self-awareness means that the system knows its own status and the status of the available resources.
Context-awareness is the ACS's knowledge of its environment, which allows it to react to changes such as new policies.
Openness means that the system is a cross-platform application that can operate in a heterogeneous environment.
The authors of the survey mentioned above refer to a paper by Mazeiar Salehie and Ladan Tahvildari called "Autonomic Computing: Emerging Trends and Open Problems" [9].
This paper gives some examples of existing ACSs. Among the commercial products, IBM seems to have some interesting projects for large intelligent networks, such as Oceano, an infrastructure for a large-scale computing utility power plant. Oceano is the first scalable and manageable system of its kind. IBM's Optimal Grid also seems interesting: it is a prototype designed to help create and manage large-scale, connected, parallel grid applications. Optimal Grid optimizes performance and includes autonomic grid functionality. Nevertheless, the industry-oriented projects seem to focus only on self-configuration and self-optimization; Oceano, with its capability of self-awareness, seems to be an exception. Among the academic projects, the two most valuable in our context are AntHill, developed at the University of Bologna, and ebiquity, created at the University of Maryland, Baltimore County. AntHill is a tool that supports the implementation and evaluation of P2P applications based on multi-agent systems and evolutionary programming; it simplifies the design, implementation and evaluation of such systems. ebiquity "explores the interactions between mobile, pervasive computing, multi-agent systems and artificial intelligence techniques" [9]. Both tools could be very useful for the multi-agent simulations mentioned above. Salehie and Tahvildari point out that the autonomic characteristics academic projects focus on differ from those of industry-oriented projects, with a stronger focus on openness. Furthermore, ebiquity is even capable of context-awareness, in contrast to almost all projects described in the paper. Nevertheless, there seems to be no ACS that satisfies all characteristics, so further research in this area is necessary.
V. CONCLUSION
Artificial intelligence has become an essential part of today's information technology.
The complexity of computer systems has increased to a degree where we cannot handle all of their layers and need systems that let us interact at a higher level of abstraction without knowing the exact details below it. AI is present in complicated and expensive systems like spacecraft, whose sensitive controls are too complex to be operated one by one, but it is also present in small everyday applications like spam filters and web search engines. Artificial life, too, has a wide application area. Simulations of complex environments, like the coordination of water management in Bali, make it possible to find problems and solutions before expensive projects are realized outside the virtual world. As with AI, we also encounter AL applications in everyday life, in fields such as computer graphics, entertainment, the Internet and security.
VI. FUTURE WORK
Let us return to the question of what the autonomous aims of SkyNet's peer-to-peer network could be. SkyNet does not have an underlying large-scale network yet. For this reason, many application fields of AL are not suitable for SkyNet. For example, in robotics AL is used to design controllers, but this task provides no direct benefit to a large group of people, so it would be impossible to find participating nodes for a large-scale network in this particular field. I will therefore suggest only tasks that are attractive to many users. My first suggestion is to use SkyNet as a medical diagnosis system. Its task would be to propose possible diseases for given symptoms. The system could use SkyNet's p2p network as a distributed database that stores data about diseases and the probabilities of their symptoms. There have been similar applications in the past, such as the Quick Medical Reference [6], possibly one of the most effective medical expert systems. Many papers with technical ideas on building such systems exist, for example "A Tractable Inference Algorithm for Diagnosing Multiple Diseases" [3].
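A minimal sketch of how such a system might rank candidate diseases from stored probabilities, assuming a naive-Bayes model with one disease per patient and conditionally independent symptoms. This is a deliberate simplification and not the method of QMR [6] or of Heckerman's algorithm [3], which addresses multiple simultaneous diseases; the disease names, probability values and the default likelihood below are invented placeholders, not medical data.

```python
import math

# Hypothetical knowledge base: prior P(disease) and P(symptom | disease).
# In the envisioned system these tables would be stored in the p2p network.
PRIORS = {"disease_a": 0.01, "disease_b": 0.05, "disease_c": 0.002}
LIKELIHOODS = {
    "disease_a": {"fever": 0.9, "cough": 0.8, "rash": 0.05},
    "disease_b": {"fever": 0.3, "cough": 0.6, "rash": 0.01},
    "disease_c": {"fever": 0.7, "cough": 0.1, "rash": 0.9},
}
DEFAULT_LIKELIHOOD = 0.01  # assumed probability for symptoms missing from a table

def rank_diseases(symptoms):
    """Return (disease, posterior) pairs for the observed symptoms, most probable first."""
    log_scores = {}
    for disease, prior in PRIORS.items():
        table = LIKELIHOODS[disease]
        # log P(d) + sum of log P(s | d); logs avoid underflow with many symptoms
        score = math.log(prior)
        for s in symptoms:
            score += math.log(table.get(s, DEFAULT_LIKELIHOOD))
        log_scores[disease] = score
    # Normalise over the known diseases (softmax of the log scores)
    top = max(log_scores.values())
    weights = {d: math.exp(v - top) for d, v in log_scores.items()}
    total = sum(weights.values())
    return sorted(((d, w / total) for d, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)

print(rank_diseases(["fever", "cough"]))  # disease_b ranks first, then disease_a
```

Only the symptoms that are present are scored here; a fuller model would also account for absent symptoms. Feedback from doctors, as proposed in the following paragraph, could be incorporated by re-estimating `PRIORS` and `LIKELIHOODS` from confirmed cases.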
Nevertheless, no such system is available in every hospital in the world. SkyNet runs on the Internet and is therefore reachable from everywhere. Moreover, SkyNet would be accessible to ordinary people, allowing them to inform themselves about possible diseases matching their symptoms, which would make their questions to the doctor much more precise. Furthermore, an exclusive user group such as doctors and experts could be allowed to teach SkyNet and expand its database. This would yield an always up-to-date medical database that does not need to be replaced after decades but instead becomes more and more precise. In addition, this special user group could give the system feedback on its proposals, and the system could then learn and adjust the probabilities of the diseases according to this feedback. The query interface could also be improved with the help of AI, for example to find synonyms for diseases and symptoms. Finally, a medical diagnosis system would likely be used by a large group of people, which would make it easier to find participants for the p2p network. Another possibility is creating Web 3.0. Since SkyNet has access to many desktop computers through the Internet, it can use their computational power for this purpose. Because SkyNet is a large distributed network, it is well suited to highly parallel algorithms, so artificial life algorithms could be executed with high performance. One possible task, as proposed in Kim and Cho's paper, could be creating a recommendation system [5]. This system could be individually adapted to each Internet user in the world. SkyNet could use its computational power to run filtering agents permanently in order to index and classify the Internet. With the immune system agents mentioned above, individual to each user, SkyNet could adapt the retrieved information to the surfing habits of its users.
Furthermore, SkyNet could improve the query interface for Internet searches, like the image retrieval system designed by Cho and Lee mentioned above, which searches images not only with explicit queries, but also with implicit queries such as "cheerful expression" and "gloomy expression" [5]. SkyNet would then learn to answer questions the way humans like to ask them, so people would not need long experience with search engines like Google to find what they are looking for on the Internet. Another feature SkyNet could provide for Web 3.0 is cleaning it of viruses using immune system agents that search for viruses and eliminate them. SkyNet could offer a service that cleans viruses from a desktop computer and keeps it up to date if it joins the network. Moreover, the time between the discovery of a new virus and the distribution of a database update to all users could be reduced drastically. If successful, this application would also attract many users, making nodes for the network easy to find. Another use of SkyNet could be a generic multi-agent simulation system. SkyNet could have computing power that cannot be achieved with one machine, or even with a group of high-performance computers. Therefore, parallel, large-scale and CPU-intensive simulations could be run, and SkyNet's definition of artificial life could be exchanged for each simulation. Such simulations could lower the costs of big projects that require expensive planning and improve their quality. The user group for simulation software will be limited; however, these users will usually come from big companies, which could offer powerful computers for the network. The exclusive user group for this application will make it more difficult to create a large-scale network; nonetheless, a network between companies is not impossible, since all of them would benefit from SkyNet.
REFERENCES
[1] Eric W. Bonabeau and G. Theraulaz. Why do we need artificial life? MIT Press.
[2] F. Bousquet and C. Le Page. Multi-agent simulations and ecosystem management: a review. Ecological Modelling.
[3] David Heckerman. A Tractable Inference Algorithm for Diagnosing Multiple Diseases. Medical Computer Science Group, Knowledge Systems Laboratory, Departments of Computer Science and Medicine, Stanford.
[4] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (Second Edition). Pearson Education, Inc.
[5] Kyung-Joong Kim and Sung-Bae Cho. A Comprehensive Overview of the Applications of Artificial Life. Department of Computer Science, Yonsei University, 134 Shinchon-dong, Sudaemoon-ku, Seoul, Korea.
[6] Randolph A. Miller and Fred E. Masarie Jr. Quick Medical Reference (QMR): An evolving, microcomputer-based diagnostic decision-support program for general internal medicine.
[7] Mohammad Reza Nami and Koen Bertels. A Survey of Autonomic Computing Systems. Computer Engineering Laboratory, Delft University of Technology.
[8] Vimla L. Patel, Edward H. Shortliffe, Mario Stefanelli, Peter Szolovits, Michael R. Berthold, Riccardo Bellazzi, and Ameen Abu-Hanna. The Coming of Age of Artificial Intelligence in Medicine. Proceedings of Artificial Intelligence in Medicine (AIME'07).
[9] Mazeiar Salehie and Ladan Tahvildari. Autonomic Computing: Emerging Trends and Open Problems. Dept. of Electrical and Computer Engineering, University of Waterloo.
[10] Stephen Wolfram. A New Kind of Science. Wolfram Media.

Survey and Definition of Distributed Information Management Systems
Niklas Lochschmidt
Abstract
In recent years, peer-to-peer applications have spread into fields where it is essential to monitor the system's state in order to perform configuration and optimization at runtime. Distributed Information Management Systems (DIMS) have emerged that gather system information efficiently and provide an interface for issuing complex queries on this information. This paper surveys the approaches to distributed information management, lists common core functionalities and properties of DIMS and states a definition of the term DIMS.
I. INTRODUCTION
Over the last 15 years, peer-to-peer (P2P) systems have emerged as a viable and scalable alternative to conventional client-server architectures. The first generation of P2P systems concentrated solely on file sharing, and while their performance was good at first, they turned out to be flawed in some respects. For example, Napster had a central server cluster for indexing the available files. This centralized architecture scaled only because of significant financial investment in the indexing servers, which were a single point of failure and served as legal targets in the lawsuit that brought Napster to a halt [1]. However, in Napster it was at least possible to see how many users were online and how many files were shared, even if these numbers were unlikely to be accurate [2]. The next contender, Gnutella, was originally based on a gossiping protocol and gave no guarantee of finding files even if they were available somewhere in the system. As a result, counting the exact number of peers or files in the system is not possible [2]. The second and third generations of hybrid and structured P2P systems reintroduced the possibility of aggregating basic system state information.
They have been studied to the extent that they are now regarded as robust and scalable alternatives for many applications like file sharing [3] and publish-subscribe architectures [4]. In hybrid P2P systems like FastTrack [5], aggregating system state information is possible because of the presence of coordinator nodes called super-nodes, which communicate with other super-nodes and aggregate the state of several ordinary nodes. Structured P2P systems like Chord [6] or Kademlia [7] can be monitored by deriving system information from the structure or by querying specific nodes for their state. However, these strategies for acquiring system state information depend directly on the inner workings of the monitored system, offer no guarantees of correctness and are often not efficient. In recent years several new applications for P2P systems, such as Application Layer Multicast [8], [9], Distributed Computation [10] and Voice-over-IP (VoIP) [11], have been proposed. These new applications have significantly stronger Quality-of-Service requirements and must often be explicitly designed to handle heterogeneous peers, for example smartphones in a VoIP system. Such complex applications require P2P systems to become autonomous to a larger extent. In "The Vision of Autonomic Computing" [12], Jeffrey O. Kephart and David M. Chess state that a system can only be self-managing if it has a means of acquiring live system information. According to their vision, a system should at first "merely collect and aggregate information to support decisions by human administrators", while the information will later serve as the foundation for automatic advice and ultimately for autonomic decision making, allowing the system to configure, heal, protect and optimize itself. Current P2P systems are usually designed to optimize based only on information that a node can obtain directly; information on the system-wide status is not available.
Distributed Information Management Systems (DIMS) have been proposed to meet this demand for system information. Zhang et al. have described a DIMS as a system "to gather from and distribute to entities comprising the system whatever system metadata of concern" [13]. The metadata in question is usually low-level machine information such as the number of CPUs, the available bandwidth or the available memory. In large-scale distributed systems it would be infeasible to store and forward every bit of information to every peer, as the amount of information is practically unbounded. DIMS therefore calculate aggregates from the low-level information. The desired effect is that the size of each aggregate is bounded, but this also brings the potentially undesired effect of information loss. To counter this, DIMS calculate aggregates for different levels of a hierarchy, which allows both overviews and details of the system's state to be presented. With this information, an individual peer, a peer acting as coordinator or the developer of a P2P system can decide what steps should be taken to meet the requirements of the new applications. Most of the scientific work on the foundations of DIMS was done around 2003, and systems like Astrolabe [14], SOMO [13] and SDIMS [15] introduced the basic ideas of what functionality and properties a DIMS should provide. Newer contributions such as SkyEye.KOM [16] refined these approaches to deal with emerging trends towards mobile computing and greater heterogeneity of nodes. The main contribution of this paper is a definition of the term "Distributed Information Management System", because even though the term is commonly used, no common definition is available so far. The proposed definition is based on the approaches that already use the term. Therefore, a survey of DIMS is necessary before a definition can be found that captures what these approaches have in common. The next chapter describes the approaches to distributed information management found in the literature. Based on these, chapter III describes the common core functionality and core properties of a DIMS and gives the definition of the term DIMS. Chapter IV covers currently unresolved problems that should be addressed in the future, and chapter V concludes.
II. APPROACHES TO DISTRIBUTED INFORMATION MANAGEMENT
This chapter presents an overview of existing approaches to distributed information management for P2P systems. The implementations described here were selected because they are either representative in terms of the offered functionality or exhibit rare but interesting properties. They are presented in the order in which the papers describing them reference each other. At the end of this chapter, some alternative approaches to distributed information management are described as additional information; their implementation makes them hardly usable on the Internet, so they are not taken into account for the definition presented in chapter III.
A. Astrolabe
Astrolabe [14], developed by Van Renesse et al., is a hierarchical P2P system based on a randomized epidemic protocol, also called a gossiping protocol. An epidemic protocol was chosen because it is naturally resistant to the failure of nodes or whole network subsystems. The hierarchy of nodes in Astrolabe is similar to a DNS hierarchy, with top-level domains and administratively isolated subdomains. Astrolabe is based on the decentralized storage of so-called Managed Information Bases (MIB) and aggregations thereof.
Each node has an Astrolabe agent installed that has access to a MIB containing the attribute values of the local machine (number of CPUs, amount of bandwidth, ...). Rendezvous of nodes in the same subdomain is done via IP multicast, and each agent stores a list of reachable agents for each subdomain it is part of. All agents gossip their local MIB to other agents in the subdomain and receive the other agents' local MIBs in the same way. Aggregations of the remote and local MIBs are calculated via SQL aggregation queries (AVG, SUM, ...), and the result is a new MIB for use in the subdomain one level up the hierarchy. This process takes place periodically for each level of the hierarchy until, in the end, each node stores a set of MIBs for each subdomain it belongs to. The size of the aggregation record and the general performance correlate with the number of active aggregation queries. These queries can be installed on the agents remotely at runtime, and their results can be probed repeatedly later on. Due to the administrative isolation of the individual domains, programs can be installed in specific subdomains only, and a partition of the system due to network failure affects only parts of the system. One problem with Astrolabe is that it is not self-configuring: the hierarchy must be built and the nodes named by hand by administrators, who are therefore responsible for finding a trade-off between delay (depth of the tree) and network load (fan-out). In addition, any gossiping scheme by design requires aggressive replication of messages and aggregate information (see [15]), and Van Renesse et al. therefore assume that "the number of aggregating queries active within any given scope is assumed to be reasonably small".
B. SOMO
In contrast to Astrolabe, the Self Organized Metadata Overlay (SOMO) by Zhang et al.
[13] does not use gossiping but instead builds on top of a Distributed Hash Table (DHT) such as Pastry [17] or Kademlia [7]. It leverages features of P2P DHTs, especially self-organization and self-healing, and builds an overlay on top of the DHT. Unlike Astrolabe's, SOMO's algorithms contain placeholders for functions that have to be implemented before deployment to the nodes. For example, to build the hierarchy needed for aggregation, SOMO relies on a supplied function that takes an interval of the key space (a zone) and calculates the key used to locate the node responsible for this zone. The overlay on top of the DHT is thus a tree in which the inner nodes are the coordinators of the zones. Each node periodically executes a routine to determine its place in the tree. Since the tree can change in each round, the hierarchy is not static as in Astrolabe. Aggregated information in SOMO is called a report. To calculate a report, each node pulls the reports from its children and calculates a new report from them. Each child node periodically calculates its report in the same way and sends it back; a leaf node simply returns its local machine information. The new reports are calculated using a set of functions that, again, have to be defined by the developer before deployment and therefore cannot be changed at runtime. When a node has calculated a full report, consisting of the results of all functions in the set, it sends the report back down to its children. Because aggregation proceeds all the way up to the root of the tree, it can be viewed as a convergecast. The dissemination of the reports back down the tree is essentially a multicast. In addition to the periodic aggregation of reports, Zhang et al. also envisioned implementing capacity search (e.g. "search for 5 nodes with bandwidth > x") and publish-subscribe in SOMO.
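The following Python sketch shows the SOMO mechanics described above:

- a developer-supplied function maps each zone to a key;
- the nodes responsible for those keys form a tree of zone coordinators;
- reports are aggregated upward by convergecast and multicast back down.

Several choices are hypothetical and made only for illustration: the midpoint `zone_key`, the successor rule in `responsible`, the binary fan-out and the three report functions. SOMO leaves these to the developer, and a real deployment would route through the DHT instead of scanning a sorted list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

KEY_SPACE = 2 ** 16  # hypothetical, deliberately small key space


def zone_key(lo: int, hi: int) -> int:
    """Hypothetical developer-supplied function: maps the zone [lo, hi) to the key
    whose responsible node coordinates the zone (here simply the midpoint)."""
    return (lo + hi) // 2


def responsible(node_ids: List[int], key: int) -> int:
    """Assumed successor rule: first node id >= key, wrapping around the key space."""
    ids = sorted(node_ids)
    return next((nid for nid in ids if nid >= key), ids[0])


# Report functions fixed before deployment; each combines a list of child reports.
REPORT_FUNCS: Dict[str, Callable[[List[dict]], int]] = {
    "count": lambda reps: sum(r["count"] for r in reps),
    "total_bw": lambda reps: sum(r["total_bw"] for r in reps),
    "max_bw": lambda reps: max(r["max_bw"] for r in reps),
}


@dataclass
class Zone:
    lo: int
    hi: int
    coordinator: int
    children: List["Zone"] = field(default_factory=list)


def build_tree(node_ids, lo=0, hi=KEY_SPACE, fanout=2, min_size=KEY_SPACE // 8):
    """Recursively split the key space; each zone's coordinator is found via zone_key()."""
    zone = Zone(lo, hi, responsible(node_ids, zone_key(lo, hi)))
    if hi - lo > min_size:
        bounds = [lo + i * (hi - lo) // fanout for i in range(fanout)] + [hi]
        for a, b in zip(bounds, bounds[1:]):
            zone.children.append(build_tree(node_ids, a, b, fanout, min_size))
    return zone


def convergecast(zone: Zone, bandwidth: Dict[int, int]) -> dict:
    """Aggregate reports bottom-up; a leaf zone reports the nodes whose ids fall into it."""
    if not zone.children:
        local = [bw for nid, bw in bandwidth.items() if zone.lo <= nid < zone.hi]
        return {"count": len(local), "total_bw": sum(local), "max_bw": max(local, default=0)}
    reports = [convergecast(child, bandwidth) for child in zone.children]
    return {name: fn(reports) for name, fn in REPORT_FUNCS.items()}


def disseminate(zone: Zone, report: dict, deliver: Callable[[int, dict], None]) -> None:
    """Push the final report back down the tree (essentially a multicast)."""
    deliver(zone.coordinator, report)
    for child in zone.children:
        disseminate(child, report, deliver)


if __name__ == "__main__":
    bandwidth = {1000: 10, 20000: 50, 40000: 30, 60000: 20}  # node id -> bandwidth
    root = build_tree(list(bandwidth))
    report = convergecast(root, bandwidth)
    received = []
    disseminate(root, report, lambda node, rep: received.append(node))
    print(report, len(received))  # {'count': 4, 'total_bw': 110, 'max_bw': 50} 15
```

Because the report functions are a fixed dictionary, changing what a report contains means redeploying the code. This mirrors the limitation of SOMO discussed next.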
One problem with SOMO is that there is no control over how the tree is built. This becomes a problem when peers are heterogeneous and nodes with low bandwidth or low processing capability are part of the system. In this case, a slow peer can become an inner node or even the root of the SOMO tree and significantly slow down the dissemination of the reports. Due to the lack of (administrative) subtree isolation, the tree structure changes rapidly, and the temporary disconnection of individual nodes has a negative effect on all nodes in the system. In contrast to Astrolabe, the type of information included in the reports cannot be changed at runtime. SOMO can therefore only be used for a predetermined task and has to be reinstalled or updated once the need for different information arises. C. Willow DHT As a follow-up to Astrolabe, the Willow DHT [18] borrows from Astrolabe's design model but is implemented as a DHT similar to Kademlia. Its P2P aggregation protocol combines SQL programs installed on the nodes with multicast functionality along the routing tree. Each node in the Willow DHT has a 128-bit identifier, and nodes with the same prefix share the same domain; each node can therefore be in as many as 128 domains. As in Astrolabe, each node stores the aggregates of all domains in which it is located. Each domain promotes a contact member that acts as coordinator for the domain. When a new SQL program is to be installed, it is sent to the coordinator of the root domain (shared prefix of length zero). That coordinator forwards it to the next two coordinators in the tree, those of the domains whose shared prefix is 0 and 1, respectively. The results of the SQL program are sent from the leaves to the coordinators and up to the root coordinator. The Willow DHT also supports publish-subscribe by implementing multicast with filtering. A multicast message can be augmented with an SQL statement that is evaluated against the attributes of each child domain. If the query evaluates to true, the message is forwarded to the coordinator of that subdomain. This way, messages could be sent, for example, to all nodes with a CPU load below 50%. Like most P2P systems implementing Application Layer Multicast, the Willow DHT adapts the tree to optimize for low link latencies. D.
DASIS The Distributed Approximate System Information Service (DASIS) [19] is designed to integrate directly into an existing structured P2P system in order to improve the join operation for new peers. In DASIS, a peer with a given key is expected to be an "expert" on the state of every domain formed by the peers that share a prefix with that key. This results in one domain for each possible prefix and in a binary tree (shown in Fig. 1) that represents the hierarchy of these domains. Each peer is then responsible for exchanging messages with all peers in its routing table to gather information about available bandwidth, the number of peers in each subtree, etc. As in SOMO, the attributes to be collected have to be chosen by the developer at development time and cannot be changed at runtime. After a peer has initially requested some information from a remote peer in its routing table, the remote peer is responsible for sending updates to the requesting peer whenever the value of an attribute changes. As a result, the number of attributes has to be limited, and attributes that are likely to change frequently should not be selected.
Fig. 1. Hierarchy of domains in DASIS: each inner node in the tree is a domain, and each peer with the same prefix is part of that domain.
E. SDIMS The Scalable Distributed Information Management System (SDIMS) by Praveen Yalagandula and Mike Dahlin [15] combines features of Astrolabe, such as runtime configuration and administrative isolation, with the scalability and efficiency that a DHT substrate makes possible. The authors state that hierarchical aggregation, as used by the previous systems, is the key to making a DIMS scalable. Path locality, on the other hand, ensures that a subdomain can continue to work even if other subdomains cannot be reached [20].
To achieve this, the DHT underlying SDIMS, called Autonomous DHT (A-DHT), is a modified version of Pastry that maintains a different leaf set for each domain the node is in. As in Astrolabe, SDIMS aggregates information by installing aggregation functions for specific attributes. Each aggregation function has an expiration time, so it must be renewed periodically or it will time out. Each function can also be restricted to specific domains (administrative isolation). When a value at a local node changes, the change is sent to the node in the same domain that is closest to the hash of the attribute name and type. As a result, one tree is built per attribute, so with a large number of attributes the load is balanced across all nodes. The node nearest to the key calculates the aggregate for the domain from the received data. It sends the result to all nodes in the same domain and to the node nearest to the key at the next higher hierarchy level, and so on. Installing an aggregation function takes two arguments that specify how far up each update is sent and how far down each aggregate is sent. This allows the amount of replication in the system to be traded off against the time that a probe for the result of an aggregation function needs. The probe operation is used to retrieve the results of an aggregation function. It can be issued once (one-shot) or continuously. The latter means that the result of an aggregation is delivered to the issuer of the probe whenever it changes, which essentially implements a publish-subscribe infrastructure. Since SDIMS supports administrative isolation, a probe operation can also be restricted to individual domains. With SDIMS, Yalagandula and Dahlin presented a very complete DIMS in 2004. Since its evaluation also looked promising, this is likely why little work on DIMS appeared in the following three to four years.
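This minimal Python sketch covers two SDIMS ideas described above:

- hashing an attribute's type and name onto the key space, so that each attribute gets its own aggregation tree;
- a simplified model of the up/down install arguments.

Several details are illustrative assumptions, not the A-DHT implementation: SHA-1, the 16-bit ring, the numerically-closest rule and the `replication_levels` model.

```python
import hashlib
from typing import List, Tuple

KEY_BITS = 16  # illustrative; Pastry-style systems use much longer identifiers
RING = 2 ** KEY_BITS


def attribute_key(attr_type: str, attr_name: str) -> int:
    """Map (type, name) of an attribute onto the ring (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(f"{attr_type}:{attr_name}".encode()).digest()
    return int.from_bytes(digest, "big") % RING


def ring_distance(a: int, b: int) -> int:
    """Numeric distance on the circular key space."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def aggregation_root(domain_nodes: List[int], attr_type: str, attr_name: str) -> int:
    """The node numerically closest to the attribute key aggregates for this domain;
    ties are broken by the smaller node id."""
    key = attribute_key(attr_type, attr_name)
    return min(domain_nodes, key=lambda n: (ring_distance(n, key), n))


def replication_levels(height: int, up: int, down: int) -> Tuple[List[int], List[int]]:
    """Hypothetical model of the up/down arguments for a hierarchy of the given height
    (level 0 = leaf, level `height` = root): updates travel up `up` levels, and the
    aggregate computed at the highest reached level is cached `down` levels below it."""
    update_levels = list(range(1, min(up, height) + 1))
    top = update_levels[-1] if update_levels else 0
    cached_levels = list(range(max(0, top - down), top + 1))
    return update_levels, cached_levels


if __name__ == "__main__":
    nodes = [1200, 17000, 33000, 49000, 64000]  # hypothetical node ids of one domain
    for name in ("load", "free_disk", "bandwidth"):
        print(name, "->", aggregation_root(nodes, "sys", name))
    print(replication_levels(height=4, up=4, down=1))
```

Calling `aggregation_root` with only the nodes of one domain yields that domain's aggregation point. A large `down` value makes probes cheap but replicates every aggregate widely; a small one does the opposite. This is the trade-off the two arguments expose.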
Recently, the topic of DIMS has been reopened by the growing importance of systems that can cope with heterogeneous nodes, ranging from smartphones to desktop PCs and servers, while still computing complex aggregates over many attributes. F. SkyEye.KOM SDIMS builds a separate tree for each attribute. Calculating complex queries that combine multiple attributes therefore becomes time-consuming and produces significant overhead. In addition, balancing the workload equally is not the right choice in a heterogeneous environment. There, the workload should be shifted towards nodes with more available computation power and bandwidth. SkyEye.KOM [16] solves this problem by reintroducing a domain coordinator as seen in SOMO. The main difference between SOMO and SkyEye.KOM lies in two parameters. A system-wide parameter T_Min specifies the minimum number of peers that must be attached to a coordinator. A peer-specific parameter T_Max, calculated from the peer's system specification and network connectivity, specifies the maximum number of peers that a specific coordinator can serve. When the number of peers in a coordinator's domain falls below T_Min, the domain is closed and all of its peers are attached to the coordinator one level up the hierarchy. If, on the other hand, the number of connected peers at a coordinator exceeds T_Max, the coordinator selects a group of support peers and temporarily delegates the handling of the additional peers to them, thereby shifting load to more capable peers. This is a prime example of how system information gathered by a DIMS can be used to optimize the structure of a P2P network. G. Other approaches Some generic distributed databases can be used to monitor system state, but they are not optimized for this task. This is especially true for distributed databases that conform (or try to conform) to the ACID principle [21]. These can only be deployed in a controlled environment, because they cannot handle high churn rates.
Most distributed databases are also designed for relatively low update rates, which does not hold for all attributes that could be of interest. One distributed database that has been proposed for monitoring purposes is PIER [22]. However, PIER is not designed to efficiently store and disseminate system state information for many clients; instead, it is optimized to serve generic SQL queries for each client individually. DIMS are designed to scale to a large number of nodes and to be robust against node failures and network latencies. These properties make DIMS well suited for use over the Internet, but DIMS are usable neither for real-time applications nor for systems designed for low power consumption. For such applications, other monitoring systems like the ones below could be used instead. The Ganglia distributed monitoring system [23] is designed for monitoring high-performance computing systems such as clusters and grids. It also incorporates the idea of a hierarchy in conjunction with information aggregation. This system operates on the assumption that the attached clusters are highly reliable, which stands in contrast to the natural churn in P2P systems. In addition, Ganglia makes extensive use of IP multicast and heartbeat messages, which are not an option when designing for the Internet. Systems like TAG [24], on the other hand, are designed for sensor networks consisting of many battery-powered wireless sensor nodes. TAG relies heavily on the characteristics of the underlying wireless network and uses stream processing techniques to aggregate information from the readings of many sensor nodes. This can usually be done without exchanging any additional messages, which reduces power consumption to a minimum. However, a technique like this cannot be deployed on machines communicating over the Internet. III. DEFINITION An appropriate definition of DIMS should be the greatest common denominator of the properties and functionality of the surveyed approaches.
In this chapter, the common aspects are summarized and then a definition is stated. A. Configuration All DIMS can be configured for the task at hand; however, they can be deployed in different ways. Astrolabe, SDIMS and SkyEye.KOM are designed as stand-alone systems or Aggregation Management Layers, while DASIS and SOMO are designed to be integrated directly into a P2P system. The main difference is that stand-alone DIMS are meant to be used by several applications at once, so reconfiguration must be possible at runtime. The integrated systems, by contrast, are configured before runtime. B. Data DIMS operate on data that is supplied by the nodes in the system. Often this data represents the low-level system state of the host computer, but in principle every kind of data is allowed as long as aggregates can be calculated from it (see below). C. Aggregation Functions Aggregation is the process of combining multiple data points into a single bounded value that is still of use to the interested party. To aggregate information, one or more aggregation functions must be defined. In DIMS that are configured before runtime, the developer has total control over what the function does. In systems where aggregation functions can be installed at runtime, however, the installable functions are typically semantically equivalent to the Structured Query Language (SQL) aggregation functions count, sum, avg, min and max [25], [26] (see Table I). D. Hierarchical Aggregation In DIMS, the aggregated information is always structured hierarchically, since "hierarchical aggregation is a fundamental abstraction for scalability" [15]. Nodes can request aggregations for a specific subtree of the hierarchy (a domain). With such a hierarchy, both a bird's-eye view of and detailed insight into the state of the distributed system can be calculated efficiently.
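The SQL-style functions named above compose well with hierarchical aggregation if each domain passes up a small mergeable partial state rather than a finished value. In this Python sketch, the `Partial` and `Domain` types and the example values are hypothetical. AVG is derived from SUM and COUNT, because averaging the averages of unequally sized subdomains would give a wrong result. Any subtree (domain) can be queried on its own.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Partial:
    """Mergeable partial aggregate: enough state to derive COUNT, SUM, AVG, MIN and MAX."""
    count: int
    total: float
    low: float
    high: float

    @staticmethod
    def of(value: float) -> "Partial":
        return Partial(1, value, value, value)

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.low, other.low), max(self.high, other.high))

    def result(self) -> Dict[str, float]:
        return {"count": self.count, "sum": self.total,
                "avg": self.total / self.count, "min": self.low, "max": self.high}


@dataclass
class Domain:
    name: str
    values: List[float] = field(default_factory=list)  # readings of directly attached nodes
    children: List["Domain"] = field(default_factory=list)


def aggregate(domain: Domain) -> Optional[Partial]:
    """Combine local readings with the partial aggregates of all subdomains."""
    parts = [Partial.of(v) for v in domain.values]
    parts += [p for p in map(aggregate, domain.children) if p is not None]
    return reduce(Partial.merge, parts) if parts else None


def find(domain: Domain, name: str) -> Optional[Domain]:
    """Locate a subdomain by name so that a query can be restricted to it."""
    if domain.name == name:
        return domain
    for child in domain.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


if __name__ == "__main__":
    root = Domain("root", children=[Domain("eu", [10, 30]), Domain("us", [20, 60])])
    print(aggregate(root).result())  # {'count': 4, 'sum': 120, 'avg': 30.0, 'min': 10, 'max': 60}
    print(aggregate(find(root, "eu")).result())
```

The partial state has a fixed size no matter how many nodes a domain contains. This is what keeps the "single bounded value" property from Section C intact at every level of the hierarchy.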

TABLE I
COMMON SQL AGGREGATION FUNCTIONS IN THE CONTEXT OF DISTRIBUTED INFORMATION MANAGEMENT
Function   Example in DIMS
count      Count the number of nodes in the system
sum        Sum up the total amount of disk space available
avg        Calculate the average bandwidth of the nodes
min        Find the minimum CPU frequency of all nodes
max        Find the maximum amount of free memory that one of the nodes has

In integrated DIMS, the aggregation hierarchy usually has the same structure as the hierarchy implied by the underlying DHT, while stand-alone systems use a separate hierarchy. How this separate hierarchy is constructed is usually left to a mechanism outside the DIMS. Astrolabe, like the Domain Name System (DNS), relies on human administrators to assign nodes to domains before connecting the nodes to the system. SkyEye.KOM and SDIMS, on the other hand, rely on a function that calculates the coordinator for a peer based on the peer's key. All peers that connect to this coordinator thus reside in the same domain. Such a function can be deterministic, but it could also be designed to leverage the geographic position or connectivity of the peer. The aggregates returned by a DIMS are not guaranteed to reflect the actual current status of the system. Likewise, the same request issued on two different nodes at the same time is not guaranteed to yield the same result. However, provided that the arguments of the aggregation function and the P2P system remain stable for some time, the results eventually become consistent (eventual consistency [14]). E. Administrative Isolation When a DIMS is deployed as a stand-alone system, it is also important that all operations it supports (query, capacity search and publish-subscribe) can be restricted, logically and physically, to a specific domain in the hierarchy. This property is called administrative isolation, and it is important for three reasons [14], [15]: (1) security: nodes outside the queried domain do not receive messages with potentially sensitive information; (2) availability: sibling domains of a domain can fail without influencing domain-internal operations; (3) efficiency: messages are only exchanged among the nodes of a domain. Except for queries at the root or top-level domain, these are significantly fewer than all nodes in the system. This also means that domains can be formed according to locality to improve responsiveness. F. Capacity Search The aggregation functions above are useful for deciding whether a system is in good health, but they are not suitable for helping with configuration, repair and optimization. Therefore, DIMS commonly implement capacity search, publish-subscribe mechanisms, or both. Capacity search is similar to an SQL select request with a limit argument. This ensures that the message size is bounded and that the request can be served efficiently. An example of such a request would be to retrieve the three nodes in a domain that have the most available bandwidth (e.g. to use them as router nodes in an application layer multicast). This way, a system utilizing the DIMS can use it to configure, repair and optimize itself. G. Publish-Subscribe Publish-subscribe is equally important for self-repair and self-optimization as well as for self-protection. For example, one could subscribe to an event designed to identify faulty or malicious nodes. As soon as a node detects such a faulty node, it could publish a notification, and the subscribers can then try to deal with the problem. H. Soft-State When a DIMS is configurable at runtime, precautions must be taken to ensure that installed aggregation functions and subscriptions are cleared when the application that installed them is no longer active.
The common approach is to make installed functions and subscriptions soft state, so that they must be refreshed or will eventually expire. Astrolabe is an exception: there, the installation request is cryptographically signed, and the issuer of the request is responsible for uninstalling it when it is no longer needed. I. The Definition Based on the survey of existing DIMS and the analysis above, I propose the following definition: A Distributed Information Management System is a reduced Distributed Database in which the stored information is gathered automatically and represents the status of the distributed system in a hierarchical way. IV. DISCUSSION AND FUTURE WORK The definition above does not include a concrete statement about which functionality must always be provided. This is reasonable because a DIMS serves either a specific application or multiple applications at once. Which functionality a concrete DIMS should provide thus depends on the specific purpose for which it is used and therefore cannot be part of the definition. To avoid over-specification, the definition also does not describe how the information is gathered. That said, a DIMS should offer the ability to fine-tune the rate at which updates of attribute values and continuous query results are sent in the system. This is important because some local attributes do not change (number of CPUs), some change slowly (available disk space) and some change with high frequency (free CPU cycles). One client of a DIMS might need the amount of free CPU cycles of all nodes every 10 seconds, while another might need the available disk space of all nodes in a domain every minute. A DIMS with a fixed refresh rate per attribute might either generate more traffic than needed or produce fewer updates than necessary. The up and down parameters of SDIMS are one way to cope with this. However, SDIMS neither helps with choosing these parameters nor optimizes them itself; instead, it requests them from the client. Apart from that, SDIMS calculates aggregation functions for single attributes only. In a system that allows multiple attributes per query, the attributes can have different frequencies of change, which should be taken into consideration. An aspect that has been largely ignored since Astrolabe is security in DIMS. In most DIMS, the only security is offered by administrative isolation, which as such is very weak. For example, anyone could intercept messages on their way to a peer and inject wrong values. Of course, a security mechanism such as a Public Key Infrastructure could be used in conjunction with any DIMS. Especially in the presence of peers with low computational power, however, possible alternatives should be investigated, and a statement on security could perhaps be included in the definition of DIMS. Currently, all DIMS rely on the nodes in the system to report their local machine information faithfully. A node that reports fake information can greatly influence the aggregation results. Such values cannot easily be distinguished from erroneous values caused by churn (the joining and leaving of nodes). One solution to this problem is to explicitly detect and remove obvious outliers from the calculation, or to apply smoothing mechanisms [27]. While this has been shown to be effective for dealing with churn-related errors, it is not guaranteed to reduce the effect of malicious nodes. Another solution would be to create the local DIMS instance as part of a trust chain rooted in a Trusted Platform Module¹ (TPM) chip, but TPM chips are currently not built into every device.
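The outlier removal and smoothing mentioned above can be sketched as follows. The median-absolute-deviation (MAD) filter, the threshold `k = 3.0` and the exponentially weighted moving average are illustrative choices, not necessarily the mechanisms evaluated in [27]. As the text notes, such filtering targets churn-related errors. A large enough group of colluding nodes, or fake values that stay close to the median, would pass this filter.

```python
from statistics import median
from typing import List, Optional


def filter_outliers(values: List[float], k: float = 3.0) -> List[float]:
    """Keep only values within k median absolute deviations of the median."""
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) <= k * mad]


def smooth(previous: Optional[float], current: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average over successive aggregate results."""
    return current if previous is None else alpha * current + (1 - alpha) * previous


if __name__ == "__main__":
    reports = [0.41, 0.38, 0.45, 0.40, 0.43, 0.39, 9.99]  # hypothetical CPU-load reports
    kept = filter_outliers(reports)
    estimate = smooth(None, sum(kept) / len(kept))
    print(kept, round(estimate, 3))
```

The median and MAD are used instead of mean and standard deviation because a single extreme value shifts the latter enough to hide itself.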
The definition in this paper is based on work from academic research. However, there are currently many widely deployed proprietary P2P systems, such as Skype² or Zattoo³, that are both heterogeneous and efficient: Skype, for example, connects desktop computers, stand-alone telephones and mobile phones. These systems are bound to include some kind of distributed information management to optimize themselves. A real-world data analysis of these systems could provide more relevant results than simulations. However, since they are proprietary and encrypted, such an analysis is difficult [28] and out of scope for this paper. In addition to smartphones, even smaller devices such as sensor nodes have been proposed as possible participants in a DIMS. However, sensor nodes normally work with a different kind of physical network (see ZigBee [29]). An integrated DIMS, or an entirely different approach to distributed information management that is not based on a DHT, is therefore probably more suitable [30]. In general, it should be studied in which situations a generic stand-alone DIMS should be favored over a specialized or low-level information management system.
¹ Information on TPM, see:
² Skype: popular chat, webcam and VoIP software,
³ Zattoo: P2P-based video streaming software, see:
V. CONCLUSION With the increasing complexity of current P2P systems, designers and developers of such systems must be able to obtain live system information to improve the design, protocols and implementation of their applications. Apart from that, enough information on the system's state has to be aggregated to serve as a solid foundation for automatic decision making in autonomous P2P systems. In this paper, a definition of the term Distributed Information Management System has been stated. Core functionalities and properties of DIMS have also been identified that incorporate the requirements of developers and autonomous systems.
The definition represents the greatest common denominator of current approaches to distributed information management.
REFERENCES
[1] T. Ryan, Infringement.com: RIAA v. Napster and the War Against Online Music Piracy, Ariz. L. Rev., vol. 44, no. C, p. 495,
[2] S. Saroiu, K. Gummadi, and S. Gribble, Measuring and Analyzing the Characteristics of Napster and Gnutella Hosts, Multimedia Systems, vol. 9, no. 2, pp ,
[3] E. Biersack, P. Rodriguez, and P. Felber, Performance Analysis of Peer-to-Peer Networks for File Distribution, LNCS: Quality of Service in the Emerging Networking Panorama, pp. 1-10,
[4] A. Rowstron, A.-M. Kermarrec, M. Castro, and P. Druschel, SCRIBE: The design of a large-scale event notification infrastructure, Proceedings of 3rd International Workshop on Networked Group Communication (NGC2001), pp ,
[5] J. Liang, R. Kumar, and K. Ross, The FastTrack Overlay: A measurement study, Computer Networks, vol. 50, no. 6, pp , Apr
[6] I. Stoica, R. Morris, D. Karger, and M. Kaashoek, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 Conference on Applications, Technologies, Architectures and Protocols for Computer Communications, vol. 11, no. 1, pp , Feb
[7] P. Maymounkov and D. Mazieres, Kademlia: A peer-to-peer information system based on the xor metric, Peer-to-Peer Systems, pp ,
[8] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, Scalable application layer multicast, in SIGCOMM '02: Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, 2002, pp
[9] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, SplitStream, in ACM SIGOPS Operating Systems Review, vol. 37, no. 5. ACM, Dec. 2003, p
[10] D. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, B. Richard, S. Rollins, and Z. Xu, Peer-to-Peer Computing, Oct
[11] K. Singh and H.
Schulzrinne, Peer-to-Peer Internet Telephony Using SIP, Proceedings of the International Workshop on Network and Operating Systems Support for Digital Audio and Video - NOSSDAV '05, p. 63,
[12] J. Kephart and D. Chess, The Vision of Autonomic Computing, Computer, vol. 36, no. 1, pp ,
[13] Z. Zhang, S. Shi, and J. Zhu, SOMO: Self-organized metadata overlay for resource management in P2P DHT, Peer-to-Peer Systems II, pp ,
[14] R. V. Renesse, K. Birman, and W. Vogels, Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining, ACM Transactions on Computer Systems, vol. 21, no. 2, pp , May
[15] P. Yalagandula and M. Dahlin, A scalable distributed information management system, ACM SIGCOMM Computer Communication Review, vol. 34, no. 4, p. 379, Oct
[16] K. Graffi, A. Kovacevic, S. Xiao, and R. Steinmetz, SkyEye.KOM: An information management over-overlay for getting the oracle view on structured P2P systems, 14th IEEE International Conference on Parallel and Distributed Systems, pp , Dec

[17] A. Rowstron and P. Druschel, Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems, in IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), vol. 11. Citeseer, 2001, pp
[18] R. van Renesse and A. Bozdog, Willow: DHT, aggregation, and publish/subscribe in one protocol, Peer-to-Peer Systems III, pp ,
[19] K. Albrecht, R. Arnold, and R. Wattenhofer, Join and Leave in Peer-to-Peer Systems: The DASIS approach, Technical report, CS, ETH Zurich,
[20] N. Harvey, M. Jones, S. Saroiu, M. Theimer, and A. Wolman, Skipnet: A scalable overlay network with practical locality properties, Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems, vol. 5, p. 9,
[21] T. Haerder and A. Reuter, Principles of transaction-oriented database recovery, ACM Computing Surveys, vol. 15, no. 4, pp , Dec
[22] R. Huebsch, J. Hellerstein, N. Lanham, B. Loo, S. Shenker, and I. Stoica, Querying the Internet with PIER, in Proceedings of the 29th international conference on Very large data bases-volume 29. VLDB Endowment, 2003, p
[23] M. Massie, B. Chun, and D. Culler, The Ganglia Distributed Monitoring System: Design, implementation, and experience, Parallel Computing, vol. 30, no. 7, pp , Jul
[24] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, Tag: a tiny aggregation service for ad-hoc sensor networks, ACM SIGOPS Operating Systems Review, vol. 36, no. SI, p. 146,
[25] M. Bawa, H. Garcia-Molina, A. Gionis, and R. Motwani, Estimating aggregates on a peer-to-peer network, Computer Science Department, Stanford University, Tech. Rep.,
[26] G. Saake and A. Heuer, Datenbanken-Konzepte und Sprachen. Mitp-Verlag, Heidelberg,
[27] K. Graffi, D. Stingl, J. Rueckert, A. Kovacevic, and R.
Steinmetz, Monitoring and Management of Structured Peer-to-Peer Systems, IEEE Ninth International Conference on Peer-to-Peer Computing, pp , Sep
[28] S. Baset and H. Schulzrinne, An analysis of the skype peer-to-peer internet telephony protocol, Arxiv preprint cs/ , pp. 1-11, Apr
[29] P. Kinney and Others, ZigBee Technology: Wireless control that simply works, in Communications design conference, vol. 2, no. October, 2003, pp
[30] D. Kempe, A. Dobra, and J. Gehrke, Gossip-Based Computation of Aggregate Information, 44th Annual IEEE Symposium on Foundations of Computer Science, Proceedings., pp ,

82 PEER-TO-PEER BUSINESS MODELS Peer-to-Peer Business Models: An Investigation of the Commercial Exploitability of P2P Technology Oliver May Abstract Over the last ten years, and irrespective of the illegality involved, peer-to-peer (P2P) technology has significantly influenced the business of music publishers and film studios as well as the usage behavior of Internet users. But can economically viable business models based on a P2P network be established outside the domain of illegal file sharing? To answer this question, the theoretical areas of application are first worked out with regard to their productive exploitability. A general answer is then derived from the examination of four implemented business models. After evaluating the areas of application and the case studies, this paper concludes that a P2P system cannot provide a sufficient basis for a practicable business model. Nevertheless, a controlled use of this technology to extend existing services or to reduce their operating costs can be identified as an indirect source of revenue. I. INTRODUCTION In the late 1970s, Prof. Dieter Seitzer developed the idea of transmitting music signals over telephone lines. Its further development at the Fraunhofer Institute led to a radical change in how audio content is transmitted and consumed. The compression technique, known as MP3 since 1995, began to spread widely with the first portable players introduced in 1998, such as the Diamond Rio. The growing popularity of the medium prompted numerous other manufacturers to develop similar devices [15]. As handling the MP3 codec became even simpler, downloading audio files spread, particularly among tech-savvy students using the fast Internet connections of their dormitories.
To search for specific titles, users turned to services such as Scour.com and Lycos, which provided searchable indices of MP3 titles stored on private websites. However, this index data became invalid after a few days. The web space providers deleted the referenced MP3s because of high traffic costs or as a result of cease-and-desist letters [42]. In 1999, Shawn Fanning, an 18-year-old computer science student, tried to solve this problem with an altruistic concept in which every individual helps the others [26]. Users were to be able to access each other's hard disks directly in order to exchange music files. The competing services periodically ran search programs (robots) to find or update music content. Fanning, by contrast, wanted the users themselves to take over this task by uploading the list of music titles they shared to a server [27]. Although a central server was used, the collaboration network formed by the participants (peers) was associated with the basic P2P concept. He named the program, and the company he founded with his uncle [42], after his nickname, Napster [27]. Napster quickly became popular [27], owing to the enormous growth of Internet bandwidth, computing power and storage capacity as well as flat-rate Internet access. Although Napster had to shut down as early as 2001 because of a court ruling [51], this technology fundamentally changed the use of the Internet. Figure 1 shows the results of the 2006 CacheLogic study [14]. It shows that, despite the shutdown of the Napster server, alternative P2P applications exhibited enormous growth rates.
P2P Seminar, TU Darmstadt, Summer 2010, written on 30 June 2010
An analysis of connection data from 2002 showed that, at peak times, P2P systems accounted for up to 65 % of private users' downstream traffic and up to 90 % of their upstream traffic.

Figure 1. Types of Internet usage (…) [14]

The software was originally conceived with an altruistic idea in mind. Napster was nevertheless the first business model built on a P2P application, because staff and server costs had to be financed. The business concept envisaged a cooperation with music publishers, for whom Napster was to serve as a promotion and distribution platform. Don Dodge, Napster's former head of product development, described the idea as follows: Napster had 50 million users, many of whom were willing to pay $5 per month or $1 per download. This was expected to generate revenue of $3 billion, of which Napster would retain 10 % for its service.
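Dodge's estimate can be reproduced with simple arithmetic. The sketch assumes, as the $3 billion figure implies, that all 50 million users pay the $5 monthly fee for a full year. It ignores the per-download option.

```python
# Back-of-the-envelope check of Don Dodge's Napster revenue estimate [11].
# Assumption: all 50 million users pay the $5 monthly subscription for a year.
users = 50_000_000
monthly_fee_usd = 5
napster_share = 0.10  # Napster keeps 10 % for its service

annual_revenue = users * monthly_fee_usd * 12
napster_income = annual_revenue * napster_share
label_income = annual_revenue - napster_income  # passed on to the rights holders

print(f"annual revenue:    ${annual_revenue / 1e9:.1f} bn")  # $3.0 bn
print(f"Napster share:     ${napster_income / 1e6:.0f} m")   # $300 m
print(f"to rights holders: ${label_income / 1e9:.1f} bn")    # $2.7 bn
```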

The remaining revenue was to be passed on to the music publishers as the rights holders. According to Dodge, this concept would have made economic sense for the music industry. Until then, the industry had spent over 90 % of its income on promotion and distribution and earned a profit of less than $1 per album. Napster's annual costs of $10 million were initially borne by investors, in order to build up the market presence needed for negotiations. This investment was to be recouped through a later initial public offering [11]. "We changed the world but failed to achieve business success" [11] is Dodge's summary of the failed attempt to establish the business model.

This paper examines whether P2P technology can be exploited commercially despite Napster's failure. A P2P network can be used for more than exchanging music tracks, so the next section first presents its characteristics and the resulting fields of application. The basis for establishing a P2P-based application is the exchange or provision of resources at the participant level. The third section considers which fundamental incentives a company can offer participants to share their resources. The following section then examines selected application scenarios for their commercial viability. To this end, it first establishes assessment criteria and then discusses the viability of the fields of application on the basis of implemented business models.

II. P2P TECHNOLOGY

Peer-to-peer is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet.
Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers [44]. Following this rough definition by Clay Shirky, a P2P system can be regarded as a kind of application-level Internet layered on top of the Internet [12].

A. Characteristics

According to HAUSWIRTH and DUSTDAR [16], a P2P network can be characterized by the following properties:

Role symmetry: Every peer can act as client and server (servent) at the same time.
Decentralization: There is no central coordinating instance (controlling the interaction of the peers) and no central database (storing data and making it available to all). In addition, no peer has a global view of the system (a peer knows only its direct neighbors).
Self-organization: The overall network behavior emerges from the local interactions of the individual peers.
Autonomy: Every peer is independent in its decisions and behavior.
Reliability: Appropriate mechanisms (replication, reputation management, etc.) ensure that the system works reliably despite its loose structure.
Availability: The required resources must be available at all times and to every peer. This must hold regardless of the distributed storage and of the unknown trustworthiness and collaboration rate of the peers.

Few P2P systems fulfil all of these properties. Either the properties are not required in the intended application scenario, or implementing them would lead to an enormous increase in complexity¹. Hybrid P2P approaches² have been developed mainly to reduce this complexity.

B. Application Scenarios

The P2P concept is not restricted to the Internet; it also covers intranets and other ad-hoc networks.
Likewise, applications on any kind of computer-like platform, such as smartphones, netbooks or network-connected televisions, can use a P2P system. Following MILOJICIC ET AL. [47], P2P applications can be divided into the three categories shown in Figure 2.

Figure 2. Taxonomy of P2P applications (adapted from [34])

Data and storage management: This category covers the storage and processing of the peers' information in the network.

¹ The Gnutella system, another well-known P2P network, does fulfil the above properties, but it requires relatively high bandwidth for its search and network management strategy.
² E.g. the Napster system from Section I, which uses a central server for search and node management.

This category includes data exchange systems, such as file-sharing programs and streaming applications. It also includes P2P storage systems, which focus on the security and availability of stored data. A third group is systems for data filtering and data mining, which focus not on exchanging information but on collaborative techniques that build searchable indexes within the network.

Parallel processing: Parallelizable P2P systems decompose a large problem into small subproblems, which many independent computers (peer nodes) solve in parallel. Two types can be distinguished. Compute-intensive applications distribute the solution of one task across many peers. In modularized applications, different components are executed on different peers.

Direct collaboration: Collaborative applications allow direct, real-time cooperation. Information can be collected and passed on without a central server. The classic field of application is direct communication via text or voice. Shared apps allow simultaneous editing of the same information. There are also multiplayer games on various platforms that use P2P technology.

Finally, Table I lists selected concrete application examples for the categories explained above.

III. INCENTIVES FOR COLLABORATION

As indicated in Section II-A, a P2P network is based on the collaboration of its individual participants, who make their own resources available to others. Business ideas based on P2P technology are assessed later in this paper. To prepare that assessment, this section first addresses a fundamental question: which incentives can a business model offer participants to take part in the network?
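The compute-intensive variant of parallel processing works in three steps: decompose a problem, solve the parts independently, and aggregate the results. These steps can be sketched in a few lines of Python. Threads from the standard library stand in for independent peer nodes. The sum-of-squares task and the function names are hypothetical placeholders for a real workload.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(data: list[int], unit_size: int) -> list[list[int]]:
    """Decompose one large problem into independent subproblems."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def solve_work_unit(unit: list[int]) -> int:
    """Hypothetical subtask a volunteer peer would compute (here: a sum of squares)."""
    return sum(x * x for x in unit)


def run_grid_job(data: list[int], unit_size: int, peers: int) -> int:
    units = split_into_work_units(data, unit_size)
    # Threads stand in for independent peer nodes. In a real P2P grid each
    # unit would be shipped over the network and the result sent back to
    # a coordinating server.
    with ThreadPoolExecutor(max_workers=peers) as pool:
        partial_results = list(pool.map(solve_work_unit, units))
    # Aggregate the partial results into the overall answer.
    return sum(partial_results)


print(run_grid_job(list(range(1, 101)), unit_size=10, peers=4))  # 338350
```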
According to KOLLOCK [21], every form of online collaboration is based on either self-interested or altruistic motives. The behavior that follows from self-interested incentives can be explained with the inducement-contribution theory from business organization theory. This theory states that an organization, understood as a coalition, continues to exist as long as the relationship between inducements and contributions is at least balanced for its members [43]. Applied to employees as members of an organization, the wage (the inducement) must be at least as high as the work performed (the contribution). Applied to P2P, a business model is justified as long as the benefit a participant derives from the network is at least as large as the participant's contribution, i.e. the cost of providing resources.

Table I
CONCRETE APPLICATION EXAMPLES³

DATA AND STORAGE MANAGEMENT
  Data exchange: file sharing (BitTorrent [5], LimeWire [29], in2movies […]), video streaming (Kontiki [22])
  Storage systems: online hard disk (Wuala [7])
  Data filtering and data mining: web crawler (YaCy [8]), collaborative data mining [10]
PARALLEL PROCESSING
  Compute-intensive: search for extraterrestrial signals (SETI@home [52]), medical research (Compute against Cancer […])
  Modularized: workflow management systems (P2E2 project [2]), general web services (JXTA [48])
DIRECT COLLABORATION
  Communication: Voice over IP (Skype [46]), instant messaging (ICQ [18])
  Project work: groupware (MS Office Groove [33])
  Games: smartphone games (ipaintball [1]), console games (Halo 3 [Xbox 360] [32]), PC games (Call of Duty: Modern Warfare 2 [19])

³ Underlined examples are discussed in Section IV-B.
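The participation condition derived from the inducement-contribution theory can be written compactly for each participant. The symbols are introduced here only for illustration and do not appear in the cited sources.

```latex
% U_i: benefit participant i derives from the P2P network (inducement)
% C_i: cost participant i incurs by providing resources (contribution)
% The business model is sustainable only while the condition holds per participant.
$$
  U_i \;\geq\; C_i \qquad \text{for every participant } i
$$
```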
The simplest form of incentive is direct or indirect financial compensation. Direct payment could take the form of money, usage rights or claims to services, granted per unit of time a participant is connected to the network. An indirect reward would be, for instance, a donation to a charitable organization. Self-interested, non-monetary incentives can be based on social or ideological motives. The social-science model of social incentives rests on obligations to fulfil social conventions or on the expression of social relationships [31]. Creating social incentives, however, requires that participants give up their anonymity. Only then could group obligations from the real world compel participation in the network, with participants joining to avoid social sanctions. It is doubtful, however, that a sense of social connection can arise in large-scale networks with very many participants. Ideologically motivated action is based on the idea of changing the world for the better through individual action [31]. This category includes, among others, projects that aim to gain insight into central scientific questions. Despite the charitable objective, participation still serves the participant's self-interest through intrinsic satisfaction (e.g. the good feeling of having done a good deed).

Unlike financial incentives, social or ideological incentives can be stimulated by the company only indirectly. For social motives, the business model could include operating social platforms that build social ties and carry them over into the network. A company can hardly establish new ideological values on its own, so pursuing this goal requires building on existing ideologies. When network participants do not act out of self-interest, altruistic motives can be assumed. These motives cannot be stimulated by incentives and are therefore not considered further.

IV. P2P BUSINESS MODELS

The previous sections outlined the basic fields of application and the incentives for participation. This section explains and assesses implemented business models, drawing on selected projects from legal file sharing, video streaming, grid computing and Voice over IP.

A. Key Factors

Assessing the potential of a business idea first requires criteria that allow a uniform evaluation. Numerous researchers (e.g. [50], [36]) have studied the success factors of e-business models. MACINNES ET AL. [30] and HUGHES ET AL. [17] specifically examined the key factors of P2P business models. The framework developed by HUGHES ET AL. assesses a P2P business model from a technological, economic, structural, legal, political, cultural and cognitive perspective. Instead of these comprehensive, analytically oriented criteria, this paper uses the following economic success factors, adapted from MACINNES ET AL.
Revenue source: How the company covers the costs of its business activities.
Potential benefit to actors: All returns that the members of a P2P network receive. This factor corresponds to the incentive motives from Section III: financial, ideological or social compensation for providing resources.
Technology: The underlying technology (network architecture, information transfer, participant interaction) and the systems supporting the network (e.g. the compensation system, support).
Security: The safeguards that protect data from unauthorized access, the trustworthiness of the infrastructure, and secure communication.

B. Business Models: Case Studies

The following case studies present selected business models and assess their economic viability. Unlawful business practices can continue only until a government authority prohibits them, so they can never receive a positive economic assessment. Such models are therefore treated only in passing.

(Legal) file-sharing services
Smaller music labels can benefit from file-sharing networks, because these networks lower their market-entry costs for promotion and advertising [3]. In the USA, even well-known artists speak out against the strict prosecution of participants [24]. Nevertheless, the American major labels, jointly organized as the Recording Industry Association of America (RIAA) [39], fight the unlicensed digital copying (digital piracy) of their copyrighted content [40]. The record companies try to offset lost revenue through damage claims [45].
The first prominent case the RIAA won was against Napster [51]. Because its search service was centralized, the service could easily be shut down. In response to this ruling, decentralized network structures emerged that have no central provider and therefore no central party that could be held liable for damages. Instead, individual P2P users who offered more than … titles for exchange were sued to deter others [37]. So far, offering services or producing software for a file-sharing network that contains illegal content has had no legal consequences, so such business models might still be economically viable. This could change, however, if a ruling issued by a US federal court on 12 May 2010 is upheld. In the case against the providers of the popular P2P service LimeWire, the court ruled that the software induces piracy and is illegal [38]. The decision holds a provider liable for indirect copyright infringement merely for building a network without safeguards for copyrights. Under this ruling, every such business model would be exposed to potential damage claims. This would affect any service connected to P2P networks such as BitTorrent, Gnutella or private darknets⁴. This paper therefore does not pursue such models further.

File sharing to support distribution (in2movies)
In …, a joint venture of Warner Bros. Entertainment GmbH and a technical service provider belonging to the Bertelsmann group tried to establish a business model based on a P2P system with restricted autonomy and decentralization.

⁴ A non-public P2P network that participants can join only by invitation [4].

They planned to launch the film portal in2movies in the German market. Its core business was selling videos (feature films, series) as digital downloads in near-DVD quality (encoded at 1.5 Mbit/s) [23]. Revenue came from selling, and later from 24-hour rentals of, Warner Bros. feature films. Current films were to cost €14.99, older films €6.99 and TV series €1.99. Unlike a retail DVD, this offer came without alternative audio tracks or extras. The business was built on a hybrid P2P network. In principle, customers downloaded the video file from the in2movies server after buying a film. Many simultaneous requests could severely reduce the download speed of the central servers, so peers were to help out under high load. Load distribution was controlled by GNAB, a download system from the Bertelsmann subsidiary Arvato. GNAB offloaded fragments of the videos being downloaded onto the peers and managed their distribution. For buyers, the benefit was meant to be fast and secure downloads plus the 24-hour rental option. Customers were encouraged to act as distributing servents through a virtual reward. For the upload capacity they provided, participants received MoviePoints that could be exchanged for film downloads (roughly 40 GB of upload per film). To prevent purchased films from being passed on freely, the security concept relied on a Digital Rights Management (DRM) system. It restricted playback to at most three platforms and prevented burning the videos onto video DVDs.
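The MoviePoints incentive is essentially a ledger that converts uploaded volume into download credit. The sketch below assumes a linear conversion calibrated to the roughly 40 GB per film stated above. The point values (`POINTS_PER_FILM`) and the class name are hypothetical, since the source does not give the actual in2movies rates.

```python
from dataclasses import dataclass

# Assumption: the source only states that roughly 40 GB of upload earned
# one film; the point values below are hypothetical.
GB_PER_FILM = 40
POINTS_PER_FILM = 100
POINTS_PER_GB = POINTS_PER_FILM / GB_PER_FILM  # 2.5 points per GB uploaded


@dataclass
class MoviePointsAccount:
    """Hypothetical ledger crediting a peer for the upload capacity it provides."""
    uploaded_gb: float = 0.0
    points: float = 0.0

    def credit_upload(self, gb: float) -> None:
        # Peers are rewarded per gigabyte served to other customers.
        self.uploaded_gb += gb
        self.points += gb * POINTS_PER_GB

    def redeem_film(self) -> bool:
        """Exchange points for one film download if the balance suffices."""
        if self.points < POINTS_PER_FILM:
            return False
        self.points -= POINTS_PER_FILM
        return True


account = MoviePointsAccount()
account.credit_upload(25)
print(account.redeem_film())  # False: 25 GB is not yet enough
account.credit_upload(15)
print(account.redeem_film())  # True: 40 GB uploaded in total
```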
Despite numerous partnerships (including with AOL, Amango, the Media Markt chain and Yahoo), the service was discontinued in mid-2008 for lack of success [20]. The in2movies portal shows that a business model can succeed only if customers perceive real added value. A service that competes with simply buying a film must offer customers specific advantages before they change their usual buying behavior. The advantage of constant access did not outweigh the drawbacks: DRM-restricted usage rights, at DVD-like prices but without DVD packaging and quality. In particular, the savings in IT infrastructure costs from using P2P technology were not passed on in the pricing. Customers rejected the service, and it never became established in the market.

Video streaming (Kontiki)
Instead of transferring videos as complete files, as in the approach above, providers can offer moving images as a video stream. The video content is passed on continuously, so the receiver can start playback almost immediately. Both video-on-demand variants and the transmission of TV channels are conceivable. Live TV streaming over the Internet in particular is known under the marketing term IPTV. The simplest form of transmission is a client/server technology called unicast. In this approach, shown schematically in Figure 3a, the central streaming server sends a separate data stream to each user. The required server capacity, in particular its Internet connection, therefore grows linearly with the number of viewers. The British public broadcaster BBC was one of the first television stations in Europe to distribute its programs over the Internet. In a case study of the BBC [28], LIEBAU ET AL. calculated that unicast would require … servers⁵ if all of England used IPTV, which would be almost impossible to finance.
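The linear growth of server load under unicast, and the effect of peer assistance or multicast, can be made concrete with a rough capacity model. This sketch rests on simplifying assumptions: one stream, the same bitrate for all viewers, uniform peer upload and no protocol overhead. For the peer-assisted case, it uses a common fluid-model bound. The server must send the full stream at least once, and beyond that it covers only the gap between total demand and total peer upload. The 1.5 Mbit/s rate is borrowed from the in2movies encoding purely as an example, and the 1 Mbit/s peer upload is hypothetical.

```python
def server_upload_mbps(viewers: int, bitrate_mbps: float, mode: str,
                       peer_upload_mbps: float = 0.0) -> float:
    """Rough server-side upload needed to serve one live stream to `viewers`.

    Simplifying assumptions: every viewer watches the same stream at
    `bitrate_mbps`, every peer contributes `peer_upload_mbps`, and
    protocol overhead is ignored.
    """
    if mode == "unicast":
        # One separate stream per viewer: grows linearly with the audience.
        return viewers * bitrate_mbps
    if mode == "peer-assisted":
        # Peers cover part of the demand; the server fills the shortfall
        # but must always send at least one full copy of the stream.
        shortfall = viewers * (bitrate_mbps - peer_upload_mbps)
        return max(bitrate_mbps, shortfall)
    if mode == "multicast":
        # Routers replicate packets, so the server sends one copy per stream.
        return bitrate_mbps
    raise ValueError(f"unknown mode: {mode}")


for mode in ("unicast", "peer-assisted", "multicast"):
    print(mode, server_upload_mbps(100_000, 1.5, mode, peer_upload_mbps=1.0))
```

With 100,000 viewers, unicast needs 150,000 Mbit/s of server upload, the peer-assisted case needs a third of that, and multicast needs only a single stream.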
Figure 3. IPTV transmission paths: (a) unicast (one-to-one), (b) hybrid P2P (Kontiki concept), (c) multicast (one-to-many)

An alternative to unicast is to use P2P technology and avoid the effort of a complex server farm. The BBC took this approach, using the hybrid P2P transmission concept developed by Kontiki in its streaming program.

⁵ This assumes that such an IPTV platform would generate … TB of traffic per month and that each server would have a 10 KBps Internet connection.

The BBC iPlayer⁶ first went into trial operation in October 2005 and was developed continuously after that. Kontiki earns its revenue by licensing its video streaming technology and adapting it for customers. Its target groups are companies that want to distribute video content internally, to partners or to customers. The technology can also be used for IPTV. For clients, the benefit is a stable data stream that minimizes playback interruptions at times of high load. For video content providers, the method reduces the costs of server infrastructure and bandwidth. Figure 3b shows the peer-assisted network technology in simplified form. It combines partial downloads from peers and from central servers. Some peers temporarily store content on their computers that they are not entitled to play, so the security concept relies on a DRM system. Under this system, content can be used only with a matching key, which a DRM server issues for each playback. In December 2007, with the release of the third version of the BBC iPlayer, the BBC abandoned P2P technology. The BBC switched its video service to central server streaming for the following three reasons [41], which also partly explain why users rejected the video offer:

Unwanted resource consumption: Users disliked P2P because the technology occupied both CPU and upload resources for a service they had paid for. In their perception, their computers slowed down noticeably.
ISPs capping bandwidth by traffic volume: Some offers from British Internet service providers (ISPs) count both upload and download as traffic on capped connections. As a result, the additional P2P upload caused some participants to reach their traffic limit early.
Falling data transmission costs (bandwidth costs): As Figure 4 illustrates, broadband costs fell by almost 90 % between the first release and the switch (2004 to 2008). This fundamental economic and technical development made direct HTTP downloads a feasible alternative.

To reduce server load for live streams, the routers of the Internet service providers were also involved, acting as multipliers for the data packets. This multicast approach, sketched in Figure 3c, relies on IP multicast, which delivers data packets to many receivers at the same time. The sender transmits each packet only once, and the routers replicate it toward all receivers that have joined the multicast group. Receivers join groups via IGMP, and routers build the distribution trees with multicast routing protocols such as PIM. Despite the BBC's move away from P2P technology, this business model can still succeed economically. Unlike the BBC, for example, the British pay-TV broadcaster BSkyB continues to use Kontiki's technology [6]. BSkyB offers IPTV only as an additional option alongside its usual cable or satellite broadcasts. Beyond TV streaming, this video distribution solution suits smaller companies that avoid costly server infrastructure, because they send video messages only occasionally and can accept the drawbacks.

P2P in medical research (Parabon)
Economic motives play little role in most P2P networks that pool computing power. These networks address a research problem, and users take part for idealistic reasons (e.g.
in the search for extraterrestrial signals in the SETI@home project). Nevertheless, the following paragraphs present an indirect business model of the company Parabon, which runs the Compute against Cancer initiative. This P2P grid computing project aims to accelerate cancer research by simulating how cancer cells react to various drugs. The service has no direct revenue source. However, Parabon uses the project to promote its technology, and the participating researchers can earn income from later patents. Participants benefit from the intrinsic, ideologically motivated satisfaction of helping with cancer research. Until 2001, however, monetary incentives were also used to publicize the project: $100 was raffled off every day and $… every month [49]. The participating computers are connected using technology developed by Parabon. Each peer runs the dedicated Parabon Pioneer software, which downloads the required data and engines and sends the results back to the results server. Parabon has no explicit security concept. It relies on participants trusting that the software causes no harm and that results are not manipulated. Above all, participants who provide resources trust that their systems will not be used to build a botnet for illegal computer attacks. Otherwise, their systems could be used, among other things, for DDoS attacks⁷ or for distributing spam⁸. A botnet could admittedly be built legally through a clause in the terms of use, and renting out and selling botnets would then be a business model. Such botnets would, however, very likely be used for unlawful purposes and would therefore be illegitimate. The Kaspersky study confirms that a market for such networks exists. It estimates that renting a …-bot network that sends about … mails per minute (with 100 zombie computers online) costs "[...] about $… per month" [35]. Selling zombie networks "with [...] a few hundred bots" brings in "between $200 and $700" [35].

Figure 4. Broadband Internet transit costs (…) [25]

⁶ Called Integrated Media Player (iMP) during the field trial.

Voice over IP (Skype)
With over 520 million users [13], the telephony software Skype, which allows free calls over the Internet, is probably the most popular legal P2P service. Besides free PC-to-PC calls, Skype offers paid calls from the Skype network into the ordinary telephone network. The company's revenue comes from advertising and from calls to landline numbers. Measured by call volume, Skype is successful: eight percent of all international calls worldwide already go through the Internet telephony service [54]. Skype's technological foundation, illustrated in Figure 5, is a hybrid P2P infrastructure consisting of supernodes, ordinary nodes and login servers. Once downloaded, the Skype software checks the computer's processing power and Internet connection and decides whether the computer acts as a supernode. Ordinary nodes connect to supernodes, which in turn connect to other supernodes. Together, supernodes and ordinary nodes form the P2P overlay network through which calls are relayed.
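The role assignment described above can be sketched as follows. Skype's real promotion criteria are proprietary and not given in the source. The thresholds, the `publicly_reachable` flag and the round-robin attachment of ordinary nodes are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds: Skype's actual promotion criteria are proprietary.
MIN_CPU_SCORE = 1000
MIN_UPLOAD_KBPS = 256


@dataclass
class Node:
    name: str
    cpu_score: int
    upload_kbps: int
    publicly_reachable: bool  # assumption: not behind a restrictive NAT/firewall
    # Excluded from repr/eq so that cyclic neighbor links do not cause recursion.
    neighbours: list["Node"] = field(default_factory=list, repr=False, compare=False)

    def qualifies_as_supernode(self) -> bool:
        """Local decision based on processing power and connectivity."""
        return (self.publicly_reachable
                and self.cpu_score >= MIN_CPU_SCORE
                and self.upload_kbps >= MIN_UPLOAD_KBPS)


def build_overlay(nodes: list[Node]) -> tuple[list[Node], list[Node]]:
    """Split nodes into supernodes and ordinary nodes and wire the overlay."""
    supernodes = [n for n in nodes if n.qualifies_as_supernode()]
    ordinary = [n for n in nodes if not n.qualifies_as_supernode()]
    # Supernodes connect to each other ...
    for sn in supernodes:
        sn.neighbours.extend(other for other in supernodes if other is not sn)
    # ... and each ordinary node attaches to one supernode (round-robin here).
    for i, node in enumerate(ordinary):
        if supernodes:
            node.neighbours.append(supernodes[i % len(supernodes)])
    return supernodes, ordinary


nodes = [
    Node("a", 1500, 1024, True),
    Node("b", 800, 2048, True),    # too little processing power
    Node("c", 2000, 512, False),   # not publicly reachable
    Node("d", 1200, 300, True),
]
supers, ordinary = build_overlay(nodes)
print([n.name for n in supers], [n.name for n in ordinary])  # ['a', 'd'] ['b', 'c']
```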
The login servers are the only centralized part of this network [28]. Skype's security concept includes certificates that establish digital identity and communication that is always encrypted (AES-256). It also includes numerous mechanisms that protect against attacks such as identity spoofing, eavesdropping, man-in-the-middle attacks and data modification during transmission [46]. Participants benefit from ease of use, free usage and additional features (e.g. video conferencing, games, chats).

Figure 5. Structure of the Skype overlay network [28]

⁷ Distributed denial-of-service attacks (DDoS attacks) overload a computer system so that it can no longer handle legitimate requests. Criminals can use DDoS attacks as leverage to extort ransom from the companies they attack. According to expert estimates [35], there were about … DDoS attacks in 2008, which yielded ransom payments of around $20 million.
⁸ Spam mails are unsolicited e-mail messages, usually advertising Viagra, online casinos or counterfeit luxury goods. Mass mailing is prosecuted, so botnets are used for this purpose. According to the Kaspersky study [35], sending … mails earns about $70.

Given these properties, the business model seems capable of generating substantial revenue in the long term. eBay's management came to the same conclusion and acquired Skype in 2005 for the economically questionable sum of $3.1 billion [54]. In the following years, however, eBay failed to make Skype profitable, because the service, despite its enormous popularity, yields hardly any profit. Dividing the 2008 revenue of $551 million by the number of users shows that each participant generates less than 11 cents per month for the company [54].
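Both key figures of this case study can be checked directly against the numbers quoted in the text. The first is the 2008 revenue of $551 million spread over 520 million users. The second is the profit of $116 million [9] set against the purchase price of $3.1 billion.

```python
# Figures as quoted in the text: 2008 revenue [54], users [13],
# profit [9] and eBay's purchase price [54].
revenue_2008_usd = 551e6
users = 520e6
profit_usd = 116e6
purchase_price_usd = 3.1e9

# Average revenue per user and month.
revenue_per_user_month = revenue_2008_usd / users / 12
# Profit relative to the acquisition price.
return_on_investment = profit_usd / purchase_price_usd

print(f"revenue per user and month: ${revenue_per_user_month:.3f}")  # $0.088
print(f"return on investment: {return_on_investment:.1%}")          # 3.7%
```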
After all costs are deducted, this does leave a profit of $116 million [9]. Measured against the investment, however, it amounts to a return on investment of only 3.7 %. After taxes and depreciation, this profit is too low from a business perspective, so eBay sold …% of Skype to a group of investors for $1.9 billion [53]. Skype shows that even large technology companies fail to build a profitable business model around a free P2P service. Unless the new owners find further sources of revenue, Skype will therefore probably be resold, financed through donations or shut down in the near future.

V. CONCLUSION AND OUTLOOK

P2P technology offers many advantages, some of them unique, such as decentralization, lower network costs, anonymity, scalability and self-organization. As a result, it spread rapidly and became the basis of a wide range of applications. P2P found its way into concepts for data and storage management, parallel processing and direct collaboration. However, its dominant use was the free exchange, or unlicensed copying, of protected content. This gave the technology an image of illegality, so companies tended to fight P2P systems rather than use them. The main obstacle to a P2P business model is the culture of free use that is deeply rooted in the technology. In file-sharing networks, users developed an illusory sense of paying indirectly for the content, because running their computers and the load on them caused noticeable costs.

89 PEER-TO-PEER BUSINESS MODELS als Anreiz höherwertig als die Kosten eingestuft, so dass sich die Nutzer an den Netzwerken beteiligten. Neben den individuellen Fehlern der Geschäftsgestaltung war das Versagen in der Schaffung attraktiver Anreize, die Ursache des Scheiterns der meisten Versuche der kommerziellen Verwertung. Erreichte das Geschäftsmodell eine hohe Kundenanzahl für einen kostenlosen Dienst (als Anreiz), dann wurde zu wenig an den kostenpflichtigen Zusatzleistungen verdient (z.b. Skype). Alternativ wurde der Anreiz für die Beteiligung an dem kommerziellen Netzwerk als nicht attraktiv genug bewertet, so dass keine angemessene Anzahl von Kunden gewonnen werden konnte (z.b. in2movies). Dem Einsatz von P2P als Mittel der Kostenersparnis steht die enorme technische Weiterentwicklung und die exponentielle Degression der Übertragungskosten entgegen. Denn vergleicht man die Kosten für den Breitbandzugang aus dem Jahr 1998, der Veröffentlichung von Napster, mit denen 10 Jahre später, haben sich die Kosten um den Teiler 100 reduziert. Die kommerzielle Verwertung der P2P-Technologie könnte ihren Durchbruch in zukünftigen mobilen Netzwerken haben. Aus der zunehmenden Verbreitung und technischen Weiterentwicklung von mobilen Geräten wie Smartphones, Netbooks oder Tablet-Computer die verschiedenste Netzzugänge, wie GSM, WLAN und Bluetooth unterstützen, könnte ein Markt für P2P-Software oder -Spiele entstehen. Da die Funkfrequenzbereiche stark reglementiert sind, verbleibt die Bandbreite für die Datenübertragung innerhalb eines Netzes auf einem relativ geringen Niveau. Hier könnten nun parallel aus mehreren in Reichweite befindlichen Geräten ad-hoc P2P- Systeme über andere Netzzugänge aufgebaut und dadurch die einzelne Geräte-Bandbreite erhöht werden. Doch stellt sich in diesem Anwendungsszenario genauso das grundsätzliche Problem der profitablen Umsatzgenerierung. 
In conclusion, P2P technology as the sole foundation leaves little room for establishing a substantial business model. However, a controlled use can open up an indirect source of revenue when P2P technology is used to extend existing services or to reduce their operating costs.

REFERENCES
[1] Apple Inc. ipaintball p2p multiplayer. ipaintball-p2p-multiplayer/id
[2] Matthias Bender, Steffen Kraus, Florian Kupsch, et al. Peer-to-Peer-Technologie für unternehmensweites und organisationsübergreifendes Workflow-Management. In Peter Dadam and Manfred Reichert, editors, Informatik … - Informatik verbindet, volume 51 of Lecture Notes in Informatics (LNI). Gesellschaft für Informatik (GI).
[3] Sudip Bhattacharjee, Ram D. Gopal, Kaveepan Lertwachara, James R. Marsden, and Rahul Telang. The Effect of Digital Sharing Technologies on Music Markets: A Survival Analysis of Albums on Ranking Charts. Management Science, 53(9).
[4] Peter Biddle, Paul England, Marcus Peinado, and Bryan Willman. The Darknet and the Future of Content Distribution. Levine's Working Paper Archive.
[5] BitTorrent Inc. BitTorrent.
[6] BSkyB Ltd. How does Sky use Kontiki's secure peer-to-peer technology to deliver shows to my PC?
[7] Caleido AG. Wuala - der sichere Online-Speicher.
[8] Michael Christen. Dezentrale Web-Suche mit YaCy.
[9] Michael Corkery. Did EBay Make a Profit on Skype Or Not? The Wall Street Journal.
[10] Souptik Datta, Kanishka Bhaduri, Chris Giannella, Ran Wolff, and Hillol Kargupta. Distributed Data Mining in Peer-to-Peer Networks. IEEE Internet Computing, 10(4):18-26.
[11] Don Dodge. How Napster changed the world - A look back 7 years later.
[12] Schahram Dustdar and Harald Gall. Software-Architekturen für verteilte Systeme: Prinzipien, Bausteine und Standardarchitekturen für moderne Software. Xpert.press. Springer, Berlin.
[13] eBay Inc. Reports Third Quarter 2009 Results.
[14] David Ferguson. Trends and Statistics in Peer-to-Peer: Vice President of Engineering, CacheLogic.
[15] Fraunhofer-Institut für Integrierte Schaltungen IIS (ed.). 20 Jahre Audiocodierung am Fraunhofer IIS.
[16] Manfred Hauswirth and Schahram Dustdar. Peer-to-Peer: Grundlagen und Architektur. Datenbank-Spektrum, 13:5-13.
[17] Jerald Hughes, Karl R. Lang, and Roumen Vragov. An analytical framework for evaluating peer-to-peer business models. Electronic Commerce Research and Applications, 7(1).
[18] ICQ LLC. Communicate and find new friends on icq.com.
[19] Infinity Ward Inc. Call of Duty: Modern Warfare 2. modernwarfare2.infinityward.com.
[20] Nico Jurran. Filmportal in2movies stellt seinen Betrieb ein.
[21] Peter Kollock. The Economies of Online Cooperation: Gifts and Public Goods in Cyberspace. In Marc A. Smith and Peter Kollock, editors, Communities in Cyberspace. Routledge.
[22] Kontiki Inc. Kontiki enterprise video solutions.
[23] Stefan Krempl. Portal in2movies zielt auf Filmfreunde und Power-Sauger.
[24] Jonathan Krim. Artists Break With Industry on File Sharing: Some Musicians Say Web Services Can Be Valuable Means of Distribution. Washington Post, page E05.
[25] Craig Labovitz, Danny McPherson, Scott Iekel-Johnson, Jon Oberheide, Farnam Jahanian, and Manish Karir. ATLAS - Internet Observatory 2009 Annual Report (NANOG47).
[26] Damian Fernandez Lamela, Kwan Hong Lee, and Mihai Lupu. Peer-to-Peer Business Models: Term Paper, The Software Business.
[27] Jin Li. On peer-to-peer (P2P) content delivery. Peer-to-Peer Networking and Applications, 1(1):45-63.
[28] Nicolas Liebau, Konstantin Pussep, Kalman Graffi, Sebastian Kaune, Eric Jahn, André Beyer, and Ralf Steinmetz. The Impact Of The P2P Paradigm. In 13th Americas Conference on Information Systems (AMCIS 2007). AIS Electronic Library (AISeL).
[29] Lime Wire LLC. LimeWire.
[30] Ian MacInnes and Junseok Hwang. Business Models for Peer to Peer Initiatives: Research paper. In Joze Gricar and Andreja Pucihar, editors, 16th Bled Electronic Commerce Conference, pages 44-58.
[31] Kevin McGee and Jörgen Skågeby. Gifting Technologies. In Andy Clarke, editor, 4th Conference on Computational Semiotics for Games and New Media, pages 87-96.
[32] Microsoft Corporation. Halo 3.
[33] Microsoft Corporation. Microsoft Office Groove. microsoft.com/groove.
[34] Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, and Zhichen Xu. Peer-to-Peer Computing. HPL (R.1).
[35] Yury Namestnikov. Schattenwirtschaft Botnetz - ein Milliongeschäft für Cybercriminelle. Kaspersky Lab.
[36] Alexander Osterwalder and Yves Pigneur. An e-Business Model Ontology for Modeling e-Business. In Joze Gricar and Uros Hribar, editors, 15th Bled Electronic Commerce Conference.
[37] Markus Pilzweger. RIAA einigt sich außergerichtlich mit 64 P2P-Nutzern. PC Welt.
[38] Joseph Plambeck. Court Rules That File-Sharing Service Infringed Copyrights. The New York Times, page B10.

[39] Recording Industry Association of America. RIAA - FAQ. faq.php.
[40] BPI Research & Information. The Impact of Illegal Downloading on Music Purchasing.
[41] Anthony Rose. Introducing BBC iPlayer Desktop for Mac, Linux and PC. BBC Internet Blog.
[42] Janko Röttgers. Mix, Burn & R.I.P. Das Ende der Musikindustrie. Heinz Heise, Hannover.
[43] Gerhard Schewe. Gabler Wirtschaftslexikon, Stichwort: Anreiz-Beitrags-Theorie (6. Version).
[44] Clay Shirky. What Is P2P... And What Isn't.
[45] Stephen E. Siwek. The True Cost of Sound Recording Piracy to the U.S. Economy. IPI Policy Report.
[46] Skype Limited. Skype.
[47] Ralph H. Sprague, editor. Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS '08), Washington. IEEE Computer Society.
[48] Sun Microsystems Inc. JXTA technology to create peer-to-peer (P2P) applications.
[49] Eugen Thome. Parabon - Compute Against Cancer. Science@home.de.
[50] Paul Timmers. Business Models for Electronic Markets. Electronic Markets, 8(2):3-8.
[51] United States Court of Appeals, Ninth Circuit. A&M Records v. Napster, Inc.
[52] Universität Berkeley. SETI@home.
[53] Oliver Voß. Ebay verkauft Skype für knapp zwei Milliarden Dollar. Wirtschaftswoche.
[54] Oliver Voß. Skype hat kein Geschäftsmodell: Erfolglose Ebay-Tochter. Wirtschaftswoche.

Analyzing Network Coding for Security Threats and Attacks
Benjamin Milde

Abstract: Network coding is a promising alternative to traditional content distribution approaches in P2P networks. Its key advantages are more efficient data delivery and its ability to maintain resiliency in the presence of impatient and selfish agents. One of its main problems, however, is Byzantine attacks, in which malicious peers corrupt data to perform a DDoS attack. This work gives an introduction to network coding theory, an overview of the performance gains and resiliency that arise when network coding is applied to P2P systems, and contrasts different ideas for solving the problem of Byzantine attacks that have emerged recently.

I. INTRODUCTION

Network coding was originally proposed in information theory [1], [19], and there is an ongoing effort to apply it to P2P network systems. The idea is to use network coding to improve resilience to peer dynamics in P2P systems [22] and to reduce download times [6].

In traditional networks, data is sent in the form of packets that are forwarded from node to node. Different data packets may share a resource and use the same path, but the information contained in the packets stays separate. This is where network coding introduces a new paradigm: data packets may be mixed into one or more packets, so that information is coded instead of just forwarded. Nodes can combine packets in a way that makes it easier to maximize the flow in a multicast network. In fact, Li et al. [21] proved that linear coding suffices to achieve the maximum rate in a multicast setting. According to the max-flow min-cut theorem [23], the maximum flow from a source to a sink equals the minimum total capacity that must be removed to stop all flow from the source to the sink. The proof of Li et al. [21] shows that with linear network coding, a single source can multicast to a group of sinks at a rate equal to this min-cut capacity (more precisely, the smallest min-cut between the source and any individual sink). By the max-flow min-cut theorem, this is therefore the maximum achievable rate for a multicast network.

There are some results from practical attempts to apply network coding to P2P systems, namely Microsoft's Avalanche project. In [6], [7], and [9], its authors presented some of the practical benefits of a P2P network with network coding. Unfortunately, to date Microsoft has not published any software that could be used to verify Avalanche's performance.

One of the major drawbacks of network coding in P2P networks is that special care has to be taken of malicious nodes. A Byzantine [20] adversary may pretend to forward packets but instead modify them so that they are corrupted. Since packets are not only forwarded but mixed, a single corrupted packet can "infect" all the information that is being passed from the source to the sinks [13]. To understand the impact of this type of attack, a deeper understanding of network coding in general and in P2P systems is needed.

The rest of the paper is organized as follows: Section II describes network coding and briefly introduces linear network coding. Section III presents the advantages of network coding in peer-to-peer systems. Section IV describes Byzantine attacks in network coding, and Section V shows different countermeasures against this type of attack.

II. NETWORK CODING

The canonical example for network coding is a butterfly network.

Fig. 1. The canonical simple example for network coding, a butterfly network. (a) Both $R_1$ and $R_2$ want to receive the packets A and B in this network. (b) With network coding, this can be done in one time frame. (c) With normal routing, only one of the end nodes can get both packets in one time frame.
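The min-cut bound from the introduction can be checked on the butterfly topology of Fig. 1 with a plain Edmonds-Karp max-flow computation. This is a minimal sketch under stated assumptions: the super-source `s` feeding $S_1$ and $S_2$, and the split of T into `T_in` and `T_out` (a unit-capacity internal link that lets T forward only one packet per time frame), are hypothetical modelling choices, not part of the original figure. Every link has capacity 1, i.e. one packet per time frame.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest paths in the residual graph."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edge, initially empty
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                  # no augmenting path left: flow equals the min-cut
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Fig. 1 as a unit-capacity graph (hypothetical modelling, see lead-in).
BUTTERFLY = {
    "s":     {"S1": 1, "S2": 1},
    "S1":    {"R1": 1, "T_in": 1},
    "S2":    {"R2": 1, "T_in": 1},
    "T_in":  {"T_out": 1},               # T can forward a single packet per time frame
    "T_out": {"R1": 1, "R2": 1},
}

for sink in ("R1", "R2"):
    print(sink, max_flow(BUTTERFLY, "s", sink))   # prints "R1 2", then "R2 2"
```

Each sink on its own has a min-cut of 2, so by the theorem both could receive A and B in one time frame. With routing alone, however, the single `T_in -> T_out` link carries either A or B, so only one sink reaches that rate; coding at T is what lets both sinks reach it simultaneously.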
Two nodes, $S_1$ and $S_2$, want to send different packets across the network shown in Fig. 1(a), and both $R_1$ and $R_2$ want to receive both packets. Assume that in one time frame, $S_1$ and $S_2$ can each send a packet towards $R_1$ or $R_2$, but every link can only be used once per time frame. It is easy to see that there is a bottleneck in the middle of the network. With traditional routing, where packets cannot be altered but only forwarded, two time frames are needed, as shown in Fig. 1(c).

If we drop the restriction that packets cannot be altered, the packets can be sent more efficiently. Node T can combine the information of A and B in such a way that each end node, which already holds one of the two packets, can compute the other one from the coded packet sent by T. In this simple example, the XOR operation can be used: T computes $A \oplus B$ and forwards the coded packet. Both $R_1$ and $R_2$ receive the same packet $A \oplus B$, as seen in Fig. 1(b); in addition, $R_1$ receives A from $S_1$ and $R_2$ receives B from $S_2$. $R_1$ and $R_2$ now have to decode the coded packet. Because a simple XOR was used, $R_1$ can easily decode B as $(A \oplus B) \oplus A$, and $R_2$ can decode A in the same manner. Sent this way, the packets need only one time frame, so in this special case network coding is twice as fast as normal routing.

A. Encoding

Linear network coding [21], which is used in practice, generalizes this simple scheme. Linear codes over finite fields are used because the algorithms for encoding and decoding are well understood. A group of s bits can be interpreted as a symbol of the finite field $\mathbb{F}_{2^s}$ (also called a Galois field). Let L be the length of a packet. Each packet is divided into m symbols of s bits, and the last symbol is padded with zero bits if L is not a multiple of s. The data is a group of n packets $M^1, \dots, M^n$, and the k-th symbol of packet $M^i$ is $M^i_k$.
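The XOR scheme of the butterfly example is the special case of this linear encoding in which both coefficients are 1: XOR of two s-bit symbols is exactly addition in $\mathbb{F}_{2^s}$. A minimal sketch, assuming byte-sized symbols (s = 8) and made-up packet contents, replays Fig. 1(b):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Symbol-wise XOR, i.e. addition in F_{2^8} for every byte."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

A = b"packet-A"   # sent by S1 (hypothetical content)
B = b"packet-B"   # sent by S2 (hypothetical content)

# T receives A and B and forwards a single coded packet A xor B.
coded = xor_bytes(A, B)

# R1 also gets A directly from S1, R2 gets B directly from S2.
r1_b = xor_bytes(coded, A)   # (A xor B) xor A = B
r2_a = xor_bytes(coded, B)   # (A xor B) xor B = A

print(r1_b, r2_a)   # b'packet-B' b'packet-A'
```

Both sinks end up with both packets after one time frame, which is the factor-of-two gain described above.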
To clarify the following equations, the original data can be written as $(M^1_1, \dots, M^1_m), \dots, (M^n_1, \dots, M^n_m)$, where $(M^i_1, \dots, M^i_m) = M^i$ is the i-th packet of the data. To keep things simple, assume that each encoded packet carries a coefficient vector, called the encoding vector $g = (g_1, \dots, g_n)$, together with the information vector $X = (X_1, \dots, X_m)$ [3], [5]. The information vector

$$X = \sum_{i=1}^{n} g_i M^i$$

is computed symbol by symbol, i.e. each of its symbols is

$$X_k = \sum_{i=1}^{n} g_i M^i_k, \qquad k = 1, \dots, m.$$

All additions and multiplications are carried out in the chosen field $\mathbb{F}_{2^s}$.

Encoding can also be done recursively, so that new packets are generated from a set of already received coded packets. Suppose a node has stored the t packets it received so far as $(g^1, X^1), \dots, (g^t, X^t)$, where $g^j$ and $X^j$ are the encoding vector and the information vector of the j-th received packet. To compute a new packet $(g', X')$, the node picks a new set of coefficients $h = (h_1, \dots, h_t)$ and computes [5]

$$X' = \sum_{j=1}^{t} h_j X^j, \qquad g'_i = \sum_{j=1}^{t} h_j g^j_i, \quad i = 1, \dots, n.$$

B. Decoding

A node receives coded packets of the form $(g^1, X^1), \dots, (g^t, X^t)$ with $t \ge n$. It can build the system of linear equations

$$X^j = \sum_{i=1}^{n} g^j_i M^i, \qquad j = 1, \dots, t,$$

in which the $M^i$ are the unknowns. As soon as the node has received n linearly independent packets, i.e. at least n independent equations, all original packets $M^1, \dots, M^n$ can be decoded uniquely by solving the system.

C. Choosing coefficients

Two strategies exist. There is a deterministic polynomial-time algorithm for multicasting [24] that chooses the coefficients for the nodes such that no collisions in the form of linear dependencies occur. Because the coefficients are deterministic, no encoding vectors need to be transmitted: every sink can compute them itself and can therefore decode packets that consist of the information vectors only.
This has the advantage that it really achieves the maximum throughput of the network, because only the information vectors have to be transferred. The price is that nodes cannot easily join or leave the network. To retain full flexibility to accommodate changes in the network topology, a second scheme for choosing coefficients exists, called random linear coding [12], [10]. This flexibility is crucial for a practical P2P system with network coding, because nodes in such a system behave very dynamically and are not reliable entities.

1) Random linear coding: In random linear coding, the coefficients are chosen at random in a decentralized manner. There is a proven lower bound on the probability of recovering the original data once n coded packets have been received. For example, for a random linear network with 100 nodes over $\mathbb{F}_{2^8}$, this lower bound is … [10]. The bound depends on the number of nodes in the system and on the chosen finite field: the bigger the field, or the fewer nodes in the network, the less likely a linear dependency becomes in the decoding phase. If one uses $\mathbb{F}_{2^{16}}$ instead in this example, the lower bound on the success probability increases to …
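The encoding, recursive encoding, and decoding steps of Sections II-A to II-C can be combined into one small random linear network coding sketch. It assumes s = 8, i.e. symbols are bytes in $\mathbb{F}_{2^8}$ reduced by the AES polynomial 0x11B, and the helper names (`gf_mul`, `encode`, `recode`, `decode`) are hypothetical; it illustrates the equations above and is not the implementation used in [5] or in Avalanche. A source emits randomly coded packets to a relay, the relay recodes whatever it has buffered without decoding anything, and the sink collects recoded packets until Gaussian elimination succeeds.

```python
import random

POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1 (assumed; any irreducible degree-8 polynomial works)

def gf_mul(a, b):
    """Multiply two elements of F_{2^8} (shift-and-add, reducing by POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in F_{2^8} is XOR
        a <<= 1
        if a & 0x100:
            a ^= POLY            # reduce back to 8 bits
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of a non-zero element: a^(2^8 - 2) = a^254."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def combine(coeffs, vectors):
    """sum_j coeffs[j] * vectors[j], computed symbol by symbol."""
    out = [0] * len(vectors[0])
    for c, vec in zip(coeffs, vectors):
        for k, v in enumerate(vec):
            out[k] ^= gf_mul(c, v)
    return out

def encode(packets, rng):
    """Source: random encoding vector g and information vector X = sum_i g_i M^i."""
    g = [rng.randrange(256) for _ in packets]
    return g, combine(g, packets)

def recode(received, rng):
    """Relay: X' = sum_j h_j X^j and g'_i = sum_j h_j g^j_i, without decoding."""
    h = [rng.randrange(256) for _ in received]
    return combine(h, [g for g, _ in received]), combine(h, [x for _, x in received])

def decode(coded, n):
    """Gauss-Jordan elimination on rows [g^j | X^j]; None until n independent packets."""
    rows = [list(g) + list(x) for g, x in coded]
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None          # still linearly dependent / too few packets
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = gf_inv(rows[rank][col])
        rows[rank] = [gf_mul(inv, v) for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, p) for v, p in zip(rows[r], rows[rank])]
        rank += 1
    return [row[n:] for row in rows[:n]]   # identity on the left, M^i on the right

rng = random.Random(2010)
original = [list(b"netw"), list(b"ork "), list(b"codi"), list(b"ng!!")]  # n = 4 packets, m = 4 symbols
n = len(original)

relay_buffer, sink_buffer, decoded = [], [], None
while decoded is None:
    relay_buffer.append(encode(original, rng))      # source -> relay
    sink_buffer.append(recode(relay_buffer, rng))   # relay -> sink, recoded
    decoded = decode(sink_buffer, n)

print(bytes(sum(decoded, [])), "after", len(sink_buffer), "packets")
```

Because coefficients are drawn from a field with 256 elements, the sink typically needs only n or slightly more packets; an occasional linearly dependent packet costs one extra round, which is the field-size trade-off described in this subsection.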

Choosing $\mathbb{F}_{2^{16}}$ instead of $\mathbb{F}_{2^8}$ comes at the cost of more expensive operations. Another way to improve the success probability is to send more than n packets, which compensates for collisions that may occur. In random linear network coding, choosing the finite field $\mathbb{F}_{2^s}$ is therefore a trade-off between computational complexity and throughput.

III. ADVANTAGES OF NETWORK CODING IN P2P SYSTEMS

BitTorrent, one of the best-known traditional peer-to-peer systems, uses a modified tit-for-tat algorithm with so-called optimistic unchoking [4]. In short, it rewards cooperation and gives users an incentive to upload with a higher bandwidth. In addition, rare packets are downloaded with priority to improve overall availability. When a peer's upload capacity is exhausted, peers that do not cooperate are "choked", and the client tries to upload to a (hopefully) more cooperative peer instead. The goal is Pareto efficiency: a node that does not cooperate consumes upload capacity but gives no download bandwidth back to the client. If two nodes cooperate, they make a Pareto-improving move: the upload capacity each assigns to the other helps both attain a higher download speed.

In [8], Tian et al. modeled and analyzed BitTorrent-like peer-to-peer networks and how the tit-for-tat algorithm influences the download time of each peer. Figure 2 shows the state of a stable system, in which no node aborts its download and all nodes keep uploading as seeds after their download has finished. The authors concluded that the tit-for-tat algorithm gives such a system a U-like shape: there are many more peers trying to begin their download and many more peers trying to finish it than peers in the middle of the file transfer. By varying the simulation parameters, the severity of the U-shape can be influenced, but the overall situation remains the same: a big group of nodes struggles to start or end a download in a BitTorrent-like file-sharing system.

Fig. 2. "Peer distribution in the stable state, without the seeds departure and the download peers aborting" from [8]. [Plot: number of peers vs. download completedness (%).]

They attributed this observation to the tit-for-tat algorithm and to the way a BitTorrent-like system tries to maximize availability. When starting a download, a node has to search for rare packets first and has poor chances for cooperation, because it has nothing to share yet. Over time it has more and more packets to share and can form good cooperations that help it download faster. At the end, it must find nodes that hold exactly the missing file parts, so useful cooperations are harder to form and maintain, and the download speed decreases. A lot of the overall time is spent at the beginning and at the end of a download.

In a P2P network with network coding, there is no need to increase availability by looking for rare parts of the data. Nearly every new packet is unique and valuable to everyone. Nodes can therefore form cooperations more easily, because nearly every packet they hold can be traded. A node does not need to search for rare parts at the beginning, and when it is about to complete its file, it does not need to search for the missing parts [6]. Figure 3 illustrates this: note the absence of a U-like shape, so overall performance can be improved compared to a traditional peer-to-peer system [6].

Fig. 3. Amount of time spent in each stage of the download when using a network coding peer-to-peer system (simulated). Taken from [6]

Because every new packet is unique, the availability of the data increases. As long as there are enough packets (n or more), the data stays available.
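A back-of-the-envelope model makes the "nearly every packet is useful" argument concrete. It rests on a simplifying assumption that the text does not make: each packet a node receives is either a uniformly random one of the n original pieces (no coding) or a coded packet with a uniformly random encoding vector over $\mathbb{F}_q$ (random linear coding). The first case is the coupon collector problem, with expected cost $\sum_{k=0}^{n-1} n/(n-k) = n H_n$ packets; in the second, a node holding rank k receives an innovative packet with probability $1 - q^{k-n}$.

```python
def expected_packets_uncoded(n):
    """Coupon collector: holding k distinct pieces, a random piece is new with prob (n - k) / n."""
    return sum(n / (n - k) for k in range(n))

def expected_packets_coded(n, q):
    """Holding rank k, a uniform random vector of F_q^n is innovative with prob 1 - q^(k - n)."""
    return sum(1.0 / (1.0 - float(q) ** (k - n)) for k in range(n))

for n in (10, 100, 1000):
    print(f"n={n:4d}  uncoded ~ {expected_packets_uncoded(n):8.1f}  "
          f"coded over F_256 ~ {expected_packets_coded(n, 256):8.3f}")
```

For n = 100, the uncoded model needs about 519 packets on average, while the coded one needs barely more than 100. Most of the uncoded overhead accrues while the last few missing pieces are being hunted down, which is in line with the end-of-download slowdown described above.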
The network can even recover in extreme situations in which no node has the complete file, but only fractions of it. In [9], a very extreme scenario was simulated to show how network coding can make a P2P network robust and keep data available when nodes are unreliable and leave early. A stable system, in which all nodes stay forever, was compared to a system in which the server leaves after serving the full file and every node leaves immediately after downloading the whole file. Figure 4 shows the results: even if nodes are very impatient and selfish and leave early, as opposed to an optimal state in which nodes stay forever, performance and availability are on par with the stable state. This is one of the key advantages of network coding.

Fig. 4. "Finish times for 200 nodes using network coding when a) the server stays for ever and b) when the server leaves after serving the full file. Nodes arrive in batches of 40 nodes every 20 rounds. Nodes leave immediately after downloading the file." Taken from [9]

IV. BYZANTINE ATTACKS IN NETWORK CODING P2P SYSTEMS

In a Byzantine attack, a malicious node is allowed to interfere with the data. It can change or completely discard data it should forward. A Byzantine adversary can also introduce new packets that are not part of the encoded data. Since packets are not only forwarded but mixed and encoded, a single corrupted packet can "infect" all the information that is being passed from the source to the sinks [13].

Fig. 5. An illustrating example for Byzantine attacks in network coding, taken from [13]. Alice transmits to Bob. Calvin injects corrupted packets into their communication, corrupting all the information that is being sent. [Diagram: Alice sends $X_1, X_2, X_3$ through relays $R_1, R_2, R_3$; after Calvin injects Z, the relays forward $\alpha_i X_i + \beta_i Z$ to Bob.]

Figure 5 illustrates this with a simple example. Note that even though Calvin did not exchange any packets with Alice or Bob directly, he managed to corrupt the data sent from Alice to Bob: the intermediate nodes between them were tricked into treating the vector Z as part of the communication and performed network coding with it. This is no threat to traditional P2P systems, in which a corrupted packet can easily be identified using traditional hash functions. But because information is coded and mixed in network coding, simple hash functions cannot be applied to individual packets. They can still be applied to a whole set of packets (for example, the complete data to be transmitted), but then the receiving nodes can only decide at the end of the decoding process whether the data is corrupted. If a corruption occurred, the node must download all the data again, because it does not know which individual packets were corrupted.

In [17], Kim et al. proved that for a P2P system with network coding, even for a "small probability of attack the system fails with overwhelming probability in the presence of Byzantine adversaries". To be of practical use, the issue of Byzantine adversaries must therefore be solved; otherwise the advantages of network coding are outweighed by the fact that traditional P2P systems are not prone to this type of attack. It is thus not surprising that network coding in the presence of Byzantine adversaries is an emerging field of research, and numerous recent papers present different ideas for solving this issue.

V. COUNTERMEASURES FOR BYZANTINE ATTACKS IN NETWORK CODING

Quite different ideas have emerged to solve the issue of Byzantine attacks in network coding. The proposals seem to fall into three broad categories [16]:

A. End-to-end error correction schemes

Network error correction for network coding was introduced theoretically in [25] and [2]. The idea is to generalize existing point-to-point error correction codes, like Hamming codes, to the multicast case by defining other distance measures that are useful for network coding. Additional coded information is transmitted that helps the end nodes decide which data was corrupted. There are proven bounds on the maximum achievable rates when a Byzantine adversary is present. Jaggi et al. introduced the first practical scheme [14]. Their network codes are distributed and run in polynomial time; furthermore, they are rate-optimal because they achieve the theoretical maximum bounds. Let $z_o$ be the rate at which a Byzantine adversary injects data into the system, and let C be the network capacity, i.e. the maximum flow (min-cut) of the network.
The maximum rate C is only achievable without any interference. If the nodes that perform the end-to-end error correction share a secret channel, an optimal rate of $C - z_o$ is achievable [14]. This means that an adversary who wants to block all communication has to inject corrupted data at the full capacity ($z_o = C$), i.e. as much corrupted data as there is legitimate data. This is very good and comparable to a traditional packet routing system with error correction. But it is impractical for peer-to-peer systems, because a secret channel between every pair of end points in the system is hard to attain. Jaggi et al. also suggested a different scheme for a setup without a secret channel between the end points; then the optimal rate becomes $C - 2z_o$ [14]. This means that as long as the majority of the capacity stays honest ($z_o < C/2$), network error correction is still useful and can be applied to peer-to-peer systems with network coding.

B. Generation-based Byzantine detection schemes

Generations are parts of the data on which network coding is performed: a large file can be split into many independent parts, and the encoding and decoding operations are done separately on each of them. These parts are called generations. The most naive approach against Byzantine adversaries is to create many small generations and use traditional hash functions to detect changes in the data once a complete generation can be decoded. If a node then detects corrupted data, it only has to re-download the affected generations, not the whole file. This is not optimal, but it can be done with little computational overhead. It does, however, sacrifice some of the advantageous properties of network coding: the better availability only holds within one generation.

Ho et al. [11] introduced an information-theoretic approach for detecting Byzantine adversaries in random linear network coding, based on the assumption that the adversary does not know all the linear combinations received by all sinks of the network. A polynomial hash is added to each packet in the generation.
Once a node has received all necessary packets for a generation, it can detect errors with a certain probability. The detection probability can be tuned via the length of the hash, the field size, and the amount of information (the linear combinations) that is unknown to the adversary. The advantage of this scheme is that detection probability can be traded against parameters that also affect the computational power needed for network coding (for example, the field size). But it only makes sense in combination with some other scheme, because errors could still be present in the decoding phase of the generation and this scheme only detects them probabilistically.

C. Packet-based Byzantine detection schemes

Several ideas have been proposed to solve the issue of Byzantine adversaries in network coding on a per-packet basis. The advantage is that every node can examine packets and detect corrupted or malicious ones, so that corrupted packets are not forwarded. To be of practical use, such a scheme must be computationally fast enough not to become a performance bottleneck in the network. This is especially true for the verification operation, since a practical scheme would verify each packet at every intermediate node.

In [15], Charles et al. propose a signature scheme for network coding that is based on elliptic curves. The scheme is quite complex, and signatures on elliptic curves are generally computationally intensive [18]. So this scheme works in theory, but when applied to peer-to-peer networks, where massive data volumes may be exchanged, it remains impractical. Experimental results in [26] clearly show that it is very time-consuming.

Homomorphic hash functions [8] are an interesting approach, because they are siblings of normal hash functions and could be used like hash functions in traditional peer-to-peer networks.
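The homomorphic property that makes these hashes attractive can be tried out on a toy instance of the construction from [8] that is described next. This sketch assumes tiny parameters that offer no security whatsoever (q = 1019, r = 2039 = 2q + 1, four symbols per block, generators built from x = 2, 3, 5, 7), and its helper names are hypothetical. It shows that a verifier who knows only the published hashes H(F) and the coefficients of a coded block can check that block, and that changing a single symbol makes the check fail.

```python
# Toy parameters (assumption): [8] suggests ~1024-bit r and ~257-bit q.
Q = 1019            # prime; block symbols and coefficients live in F_q
R = 2 * Q + 1       # 2039 is prime and Q divides R - 1
# g_k = x^((r-1)/q) mod r; none equals 1, so each g_k has order q
G = [pow(x, (R - 1) // Q, R) for x in (2, 3, 5, 7)]
M = len(G)          # symbols per block

def h(block):
    """h(b) = prod_k g_k^(b_k) mod r."""
    out = 1
    for g_k, b_k in zip(G, block):
        out = out * pow(g_k, b_k, R) % R
    return out

def encode_block(coeffs, blocks):
    """e = sum_i c_i b_i, computed symbol-wise in F_q."""
    return [sum(c * b[k] for c, b in zip(coeffs, blocks)) % Q for k in range(M)]

def expected_hash(coeffs, hashes):
    """prod_i h(b_i)^(c_i) mod r, computed only from the published hashes H(F)."""
    out = 1
    for c, hv in zip(coeffs, hashes):
        out = out * pow(hv, c, R) % R
    return out

blocks = [[12, 500, 7, 1000], [3, 3, 900, 41], [777, 0, 15, 256]]
H_F = [h(b) for b in blocks]                       # published by the source

coeffs = [5, 1013, 42]
e = encode_block(coeffs, blocks)
print(h(e) == expected_hash(coeffs, H_F))          # True: a correctly coded block verifies

tampered = e[:]
tampered[0] = (tampered[0] + 1) % Q
print(h(tampered) == expected_hash(coeffs, H_F))   # False: one changed symbol is caught
```

All arithmetic here happens in $\mathbb{F}_q$ for a prime q rather than in $\mathbb{F}_{2^s}$, and with realistic moduli every `pow` call is a costly modular exponentiation; both points correspond to the drawbacks of this scheme.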
A hash function is a function $h(x)$ that maps a usually large input $x$ to a much smaller output. To be useful in peer-to-peer networks, a hash function should at least be second-preimage resistant: for any given $x$, it should be hard to find an $x' \neq x$ such that $h(x) = h(x')$. A homomorphic hash function has an additional property: for original blocks $b_i$ with $i \in [1, n]$, the hash value of any linear combination $b = c_1 b_1 + c_2 b_2 + \dots + c_n b_n$ of the blocks is given by $h(b) = h(b_1)^{c_1} \cdot h(b_2)^{c_2} \cdots h(b_n)^{c_n}$ [8]. This allows nodes to compute new hash values from already existing ones. A node that generates new coded packets out of already coded packets (recall the discussion of recursive encoding in the section "Encoding") can thus easily generate new packets with valid hash values, and every node can check the validity of the packets it receives.

The authors give a concrete homomorphic hash function. Let $G = (r, q, g)$ be the set of parameters needed for the hash function. $r$ and $q$ are random primes such that $q \mid (r - 1)$. The lengths of $r$ and $q$ are the security parameters of the system, for example 1024 bits for $r$ and 257 bits for $q$. The parameter $g$ is a vector of $m$ numbers, each of which can be written as $x^{(r-1)/q}$, where $x \in \mathbb{F}_q$ and $x \neq 1$ [8]. Then, for a block $b_i = [b_{1,i}, b_{2,i}, \dots, b_{m,i}]$, the hash $h(b_i)$ is defined as

$$h(b_i) = \prod_{k=1}^{m} g_k^{b_{k,i}} \bmod r.$$

The hash value of an entire file $F$ is then the vector of the hashes of all its blocks: $H(F) = (h(b_1), h(b_2), \dots, h(b_n))$ [8]. Now the homomorphic property comes into play: for an encoded block $e = \sum_{i=1}^{n} c_i b_i$ (computed in $\mathbb{F}_q$), the hash value can be computed from the hashes of the original blocks via

$$h(e) \equiv \prod_{i=1}^{n} h(b_i)^{c_i} \pmod{r}.$$

This scheme is very simple, but has several drawbacks. First, all computation must be done in $\mathbb{F}_q$ (where $q$ is a large prime) rather than in a finite field $\mathbb{F}_{2^s}$ (as discussed in the section "Encoding"). This alone makes decoding and encoding in the network slower. Second, computing and checking the hash values is a very slow process: checking rates are approximately 300 Kbps on a 3 GHz Pentium processor, whereas a conventional hash function such as SHA-1 can be checked at 560 Mbps on the same computer [8]. Thus, for now, homomorphic hash functions are also of a theoretical nature and cannot be used to solve the issue of Byzantine attacks in network coding.

In [16], a new signature scheme is proposed that is based on the Diffie-Hellman problem and also works on a packet basis. The authors prove that their scheme is the most bandwidth-efficient and state that it is therefore superior to other schemes. No concrete numbers are given on how computationally intensive it is, but since it also relies on modular arithmetic and exponentiation, it is likely too expensive for most of today's computers to be of practical use.

VI. CONCLUSION AND FUTURE WORK

This work has shown that network coding in peer-to-peer networks has some interesting and advantageous properties that could make peer-to-peer networks more robust with regard to file availability and could increase download performance for all participants. However, network coding introduces the threat of Byzantine adversaries. Many countermeasures have been proposed, but each has its own shortcomings, and most of them are simply too slow to be used in practice and would annihilate the performance advantages of network coding. Further research should focus on solving the Byzantine issue in network coding; desirable would be a simple, non-probabilistic scheme that is also practical and fast on today's computers. Eventually, this could give network coding a foothold in today's practical peer-to-peer networks. As of today, network coding remains an interesting concept in theory.

REFERENCES

[1] R Ahlswede, N Cai, SYR Li, and RW Yeung. Network information flow. IEEE Transactions on Information Theory.
[2] N Cai and RW Yeung. Network error correction, part II: Lower bounds. Communications in Information and Systems.
[3] PA Chou, Y Wu, and K Jain. Practical network coding. Annual Allerton Conference on Communication, Control and Computing.
[4] Bram Cohen. Incentives Build Robustness in BitTorrent.
[5] C Fragouli, J Le Boudec, and J Widmer. Network coding: An instant primer. Computer Communication Review.
[6] C Gkantsidis, J Miller, and P Rodriguez. Anatomy of a P2P content distribution system with network coding. The 5th International Workshop on Peer-to-Peer Systems.
[7] C Gkantsidis, J Miller, and P Rodriguez. Comprehensive view of a live network coding P2P system. Proceedings of the 6th ACM SIGCOMM on Internet Measurement.
[8] C Gkantsidis and P Rodriguez. Cooperative security for network coding file distribution. IEEE INFOCOM.
[9] C Gkantsidis and PR Rodriguez. Network coding for large scale content distribution. Proceedings IEEE INFOCOM 2005.
[10] T Ho, R Koetter, M Medard, and DR Karger. The benefits of coding over routing in a randomized setting. International Symposium on Information Theory.
[11] T Ho, B Leong, R Koetter, M Médard, and M Effros. Byzantine modification detection in multicast networks with random network coding. IEEE Transactions on Information Theory.
[12] T Ho, M Médard, J Shi, M Effros, and DR Karger. On randomized network coding. In Proceedings of 41st Annual Allerton Conference on Communication, Control, and Computing.
[13] S Jaggi, M Langberg, S Katti, T Ho, D Katabi, and M Medard. Resilient network coding in the presence of byzantine adversaries. IEEE INFOCOM 2007.
[14] S Jaggi, M Langberg, S Katti, T Ho, D Katabi, and M Medard. Resilient network coding in the presence of byzantine adversaries. IEEE INFOCOM 2007.
[15] D Charles, K Jain, and K Lauter. Signatures for network coding. In Proceedings of the fortieth...
[16] Minji Kim, Luisa Lima, Fang Zhao, and Muriel Médard. On Counteracting Byzantine Attacks in Network Coded Peer-to-Peer Networks. Information Sciences, pages 1-11.
[17] MJ Kim, L Lima, F Zhao, J Barros, and M Médard. On Counteracting Byzantine Attacks in Network Coded Peer-to-Peer Networks. arXiv preprint.
[18] N Koblitz, A Menezes, and S Vanstone. The state of elliptic curve cryptography. Designs, Codes and Cryptography.
[19] R Koetter and M Médard. Beyond routing: An algebraic approach to network coding. IEEE INFOCOM.
[20] L Lamport, R Shostak, and M Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems.
[21] SYR Li, RW Yeung, and N Cai. Linear network coding. IEEE Transactions on Information Theory.
[22] D Niu. On the Resilience of Network Coding in Peer-to-Peer Networks and its Applications.
[23] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial optimization: algorithms and complexity. The Max-Flow, Min-Cut Theorem. Courier Dover Publications.
[24] P Sanders, S Egner, and L Tolhuizen. Polynomial time algorithms for network information flow. Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures.
[25] RW Yeung and N Cai. Network error correction, part I: Basic concepts and upper bounds. Communications in Information and Systems.
[26] Z Yu, Y Wei, B Ramkumar, and Y Guan. An efficient signature-based scheme for securing network coding against pollution attacks. Proceedings of IEEE INFOCOM.

Current Research on Twitter
Philipp Neubrand

Abstract: This paper gives an overview of the current research on Twitter. While all Online Social Networks (OSNs) have grown in recent years, Twitter is of special importance due to its unique characteristics: fast-paced tweeting limited to 140 characters, and its large number of users. As Facebook has similar features and is very popular in the US, some papers on Facebook that cover these features are also included. The papers in this work are categorized based on the attributes of OSNs that they assess, forming three major groups: 1) influence, 2) social interaction between users, and 3) use of OSNs during emergencies or disasters.

I. INTRODUCTION

Online Social Networks (OSNs) have become more and more interesting as the number of internet users has exploded. With the number of potential users growing by a factor of 5 since the beginning of the century, from about 360 million in 2000 to over 1,800 million in April, the scope of any internet project has widened. In 2000, when only a few percent of the population were online, internet users were mainly computer-savvy people who used it for work and play. By now, over 25% of the world population use the internet, across all social levels. As the user base widened not only in size but also in variety, OSNs received a lot of attention from the scientific community across the world [5] [24] [4] [8] [18] [22]. Among the most studied attributes of OSNs are the social networks they create and how people interact within them. This is particularly interesting for marketing, and understanding the network and the motives for OSN usage is an obvious concern of most OSN founders. Lately, the use of OSNs during emergency situations has received a lot of attention, as it offers the possibility of integrating information from many people.
In addition, programs that build on top of OSNs rely on various, sometimes merely assumed, attributes of the OSNs they are based on. Examples are SybilGuard [27] and Reliable [6]. Proving or disproving these attributes, or providing means of changing them, can change the functionality of such applications. In this paper, the current work on Twitter is presented and categorized. In addition, some papers on Facebook are included if they cover related topics or topics relevant to Twitter. Twitter was picked as the main focus of this work because it is one of the biggest OSNs currently deployed, with over 100 million registered users (see footnote 2), and because it provides a unique way of communication, the so-called microblogging or tweeting.

Footnote 2: Twitter-snags-over-100-million-users-eyes-money-making/articleshow/ cms

The proposed categories are based on the aspects of Twitter that are analyzed:

Influence and Ranking contains all papers analyzing various forms of influence within Twitter and techniques to rank users based on their influence. This category was introduced because measuring influence is a very popular field of research. In addition, ranking users is a common task in most advanced research, be it to single out the most influential user to spread an advertisement or to find ways of communicating with parts of the population in emergency situations.

Social Interaction covers all papers that take a closer look at the social interaction between users inside OSNs. Papers investigating the motives for using OSNs are also listed here, as the why can reveal a lot about the how.

Emergency Use covers papers analyzing the use of Twitter in mass emergency situations. Both past use of Twitter during such situations and prospects for future use are discussed.

The rest of the paper is structured as follows: Section II introduces the technical vocabulary. Section III contains the papers in the category Influence and Ranking, while Section IV contains all those in the category Social Interaction.
After that, Section V covers all papers of the category Emergency Use. In Section VI, we conclude this paper with a brief forecast of future areas of research surrounding Twitter.

II. TWITTER'S TECHNICAL VOCABULARY

Microblogging on Twitter comes with some conventions and a specific technical vocabulary. The parts of the vocabulary relevant for this work are listed below.

A tweet is a short message on Twitter, containing a maximum of 140 characters.

If user A follows another user B, A automatically receives all tweets of B; A is then a follower of B.

A followee (or followed user) is a user that is being followed by another user. The followees of a user are the users he or she is following.

A hashtag is a word in a tweet that is meant as information about the content of that tweet. This meta-information is marked by a single "#" in front of the word, either inline or appended. For example, "#TUD" would indicate a connection of the tweet to TU Darmstadt.

A reference is a link to another Twitter user, signaled by an "@" in front of his or her name. For example, "@Flipp" is a reference to the user named Flipp.

A retweet is a message that was not created by the sender but is being forwarded. Usually, the original author is credited with a reference, and the message itself is not altered but only shortened if needed. However, a variety of notations are used to signal retweets, ranging from "RT" without a username to simply repeating the tweet without appending the "RT" flag. For further details see [2].
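The vocabulary above maps directly onto simple text processing. The following Python sketch extracts hashtags and references from a tweet and detects a retweet marker. The regular expressions are simplified assumptions (real retweet notations vary, as noted above, and real entity extraction rules are more involved), and parse_tweet is a hypothetical helper, not part of any Twitter API.

```python
import re

# Simplified patterns (assumption): real Twitter entity rules are more involved.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
# A few common retweet markers ("RT @user", "RT: @user", "via @user"); the list is an assumption.
RETWEET = re.compile(r"\b(?:RT|via)\s*:?\s*@(\w+)", re.IGNORECASE)
MAX_LENGTH = 140


def parse_tweet(text):
    """Split a tweet into hashtags, references and the credited original author."""
    retweet = RETWEET.search(text)
    return {
        "hashtags": HASHTAG.findall(text),
        "references": MENTION.findall(text),
        "retweet_of": retweet.group(1) if retweet else None,
        "length_ok": len(text) <= MAX_LENGTH,
    }


print(parse_tweet("RT @Flipp: Seminar talk on P2P today #TUD"))
# {'hashtags': ['TUD'], 'references': ['Flipp'], 'retweet_of': 'Flipp', 'length_ok': True}
```

Because retweet conventions are inconsistent, a pattern-based detector like this will miss retweets that use unusual notations or none at all.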

III. INFLUENCE AND RANKING

This section contains all papers that analyze influence on Twitter or Facebook. Most papers that analyze how influence can be measured introduce some way of ranking users according to their metric. However, the definition of influence, while not a new concept, is quite controversial in the scientific community. This is reflected in the broad range of ranking techniques currently under discussion: unlike Google's PageRank for websites, there is no established ranking algorithm for Twitter users as of yet. Identifying influential users is a very important task, as it is the basis for most advanced research, whether you are advertising or trying to save lives in an emergency.

WHAT IS TWITTER, A SOCIAL NETWORK OR A NEWS MEDIA? [16]

Haewoon Kwak et al. analyze the follower network of Twitter. As following is a one-way action, the Twitter network is a directed graph. Its structure differs from what is expected of a social network. While the follower distribution for users with fewer than 1000 followers follows a power law, the news character of Twitter leads to a lower average path length than expected. In addition, for these normal users with fewer than 1000 followers, most followers are geographically close and the connections are homophilous. However, this pattern is broken by accounts with far more than 1000 followers, namely celebrities and news accounts, which have followers from all around the world. The authors then take a closer look at various ranking techniques. They compare PageRank, ranking by number of followers and ranking by number of retweets with each other without identifying a best or worst approach. The authors do point out that ranking by retweets in particular shows the rise of alternative media on Twitter. Finally, they take a closer look at conversations within Twitter.
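Kwak et al. compare ranking by number of followers with PageRank. The Python sketch below computes both on a hypothetical five-user follow graph. The damping factor of 0.85 and the uniform redistribution of rank from users who follow nobody are common PageRank conventions assumed here, not settings reported in [16].

```python
def follower_counts(follows):
    """In-degree: how many users follow each account. follows maps user -> followees."""
    users = set(follows) | {v for vs in follows.values() for v in vs}
    counts = {u: 0 for u in users}
    for followees in follows.values():
        for v in followees:
            counts[v] += 1
    return counts


def pagerank(follows, damping=0.85, iterations=100):
    """Power iteration; a follow edge u -> v passes part of u's rank to v."""
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in users}
        for u in users:
            followees = follows.get(u, [])
            if followees:
                for v in followees:
                    new[v] += damping * rank[u] / len(followees)
            else:  # a user following nobody spreads its rank uniformly
                for v in users:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


# Hypothetical toy follow graph (user -> accounts the user follows).
follows = {
    "alice": ["celebrity", "bob"],
    "bob": ["celebrity", "news"],
    "carol": ["news"],
    "news": ["celebrity"],
    "celebrity": [],
}
counts = follower_counts(follows)
ranks = pagerank(follows)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{user:10s} followers={counts[user]} pagerank={ranks[user]:.3f}")
```

On this tiny graph both orderings agree at the top; on real follower graphs they can diverge, because PageRank also weighs who the followers are, which is what makes comparing the two rankings informative.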
Matching the contents of tweets to topics of major news websites, the authors show that there is a high overlap between them, but topics usually stay active on Twitter even after they have been dropped from the news. The second aspect of conversations on Twitter, retweets, is examined as well. It is shown that once a tweet starts spreading via retweets, it reaches a similar number of people no matter how many followers the original author has. While not offering a detailed explanation for this phenomenon, the authors imply that selected individuals have the power to dictate what is retweeted and what is not.

TWITTERRANK: FINDING TOPIC-SENSITIVE INFLUENTIALS IN TWITTER [25]

Weng et al. analyze how influential Twitter users can be identified. First, they inspect the usual way of determining a user's influence, based on the number of followers. As this approach would be meaningless if following relations were random, they first show that users who follow each other are more likely to share the same interests, which would not be the case if the relations were random. Next, the authors introduce a more refined approach to ranking users, called TwitterRank. TwitterRank is based on PageRank, but the probability that the random surfer traverses an edge is topic-specific. Thereby a topic-specific relationship network is constructed, which is a subset of the complete Twitter network. This extension to PageRank means that the influence of one user on another depends on the number of tweets the first user sends and on the topic similarity of both. By aggregating the topic-specific TwitterRanks, the overall influence of a user can be measured. In the last part, the authors compare the most influential users found by their algorithm to those found by the traditional in-degree approach and by an unaltered PageRank algorithm, and show that their list differs from the other two.
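A minimal single-topic sketch of the TwitterRank idea follows. It assumes that the transition probability from a follower i to a followee j is j's share of the tweets posted by all of i's followees, scaled by a topic similarity 1 - |s_i - s_j| of the users' interest shares s; teleportation is proportional to each user's approximate number of on-topic tweets, and probability mass not passed along edges is teleported as well. These choices only approximate the formulation in [25], and all names and data are hypothetical.

```python
def twitterrank(follows, tweets, topic_share, gamma=0.85, iterations=100):
    """Single-topic TwitterRank-style ranking (simplified sketch).

    follows: user -> list of followees
    tweets: user -> number of tweets the user has published
    topic_share: user -> share (0..1) of the user's interest in this topic
    """
    users = sorted(topic_share)
    n = len(users)
    # Teleport in proportion to the approximate number of on-topic tweets.
    weight = {u: tweets[u] * topic_share[u] for u in users}
    z = sum(weight.values())
    teleport = {u: weight[u] / z if z else 1.0 / n for u in users}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {u: 0.0 for u in users}
        leftover = 0.0  # mass that is not passed along any follow edge
        for i in users:
            followees = follows.get(i, [])
            total = sum(tweets[j] for j in followees)
            passed = 0.0
            for j in followees:
                # more tweets and more similar interests -> higher transition probability
                p = tweets[j] / total * (1.0 - abs(topic_share[i] - topic_share[j])) if total else 0.0
                new[j] += gamma * rank[i] * p
                passed += p
            leftover += gamma * rank[i] * (1.0 - passed)
        for u in users:
            new[u] += ((1.0 - gamma) + leftover) * teleport[u]
        rank = new
    return rank


def overall_influence(topic_ranks, topic_weights):
    """Combine per-topic ranks with given topic weights (the weighting is an assumption)."""
    users = next(iter(topic_ranks.values()))
    return {u: sum(w * topic_ranks[t][u] for t, w in topic_weights.items()) for u in users}


follows = {"fan": ["sportsnews", "technews"], "sportsnews": [], "technews": []}
tweets = {"fan": 10, "sportsnews": 50, "technews": 50}
sports = twitterrank(follows, tweets, {"fan": 0.9, "sportsnews": 0.95, "technews": 0.05})
tech = twitterrank(follows, tweets, {"fan": 0.1, "sportsnews": 0.05, "technews": 0.95})
print(overall_influence({"sports": sports, "tech": tech}, {"sports": 0.5, "tech": 0.5}))
```

The same follow graph yields different leaders per topic, which is the property that plain in-degree and unaltered PageRank cannot express.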
In a performance test, they then apply their ranking to Twitter's recommendation task (selecting users to be recommended for following) and show that it yields consistently better results, although the difference is only marginal. This is to be expected, as neither in-degree nor the traditional PageRank takes similar interests into account.

MEASURING USER INFLUENCE IN TWITTER: THE MILLION FOLLOWER FALLACY [3]

Cha et al. take a closer look at the way people are influenced on Twitter. They distinguish between the popularity (the in-degree), the content value (number of retweets) and the name value (number of mentions) of a user. These are vastly different, as a popular user (like a celebrity) does not necessarily produce the most valued content. In addition, the most retweeted users are often news broadcasters whose interesting or important news gets retweeted, while mentions usually emerge through continuous conversations. This explains why popularity maps to neither the content value nor the name value of a user, meaning that these are in fact three different types of influence. After establishing these different definitions of influence, the authors analyze whether influence is limited to one topic. They show that news authorities in particular have influence on a variety of topics, but that even ordinary users can achieve this. In the end, the authors suggest a way of gaining influence: post interesting and creative tweets on selected topics that get retweeted, and engage in conversations.

FINDING INFLUENTIALS BASED ON TEMPORAL ORDER OF ADOPTION IN TWITTER [17]

Lee et al. look into the task of finding the most influential individuals within a Twitter conversation.
While the obvious approach would be to simply check the number of followers of a user, the authors propose a more sophisticated ranking based on the chronological order of tweets, the idea being that early tweets have much more influence on the conversation than later ones. To quantify this definition of influence, the authors introduce the effective readers of every tweet, defined as the number of people who hear about the specific topic for the first time via this very tweet. While this is in some way linked to the number of followers, the later a tweet is posted, the lower its number of effective readers will be, no matter how many followers its author has.
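One way to read the effective-readers definition is sketched below in Python: tweets about a topic are processed in chronological order, and each tweet is credited with those of its author's followers who have not yet seen the topic. Treating a follower as informed as soon as any followed account tweets about the topic, and treating authors as already informed, are assumptions of this sketch rather than details taken from [17]; the data is hypothetical.

```python
def effective_readers(tweets, followers):
    """Credit each author with the followers who first learn about the topic from them.

    tweets: list of (timestamp, author) pairs about one topic
    followers: author -> set of follower ids
    """
    informed = set()
    score = {}
    for _, author in sorted(tweets):
        informed.add(author)  # assumption: an author already knows the topic
        new_readers = followers.get(author, set()) - informed
        score[author] = score.get(author, 0) + len(new_readers)
        informed |= new_readers
    return score


followers = {
    "early_blogger": {"u1", "u2", "u3"},
    "celebrity": {"u1", "u2", "u3", "u4", "u5"},
}
tweets = [(2, "celebrity"), (1, "early_blogger")]
print(effective_readers(tweets, followers))  # {'early_blogger': 3, 'celebrity': 2}
```

The celebrity has more followers but tweets later, so it reaches fewer new readers, which is exactly the effect the ranking is designed to capture.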

This ranking technique is unique in that it identifies different people as influential than ranking by number of followers or by PageRank. The authors state that while the ranking does work, there is some room for improvement, and that they will add further considerations in the future. While celebrity Ashton Kutcher leads both traditional rankings (number of followers and PageRank), Pete Cashmore, a news writer covering social media, leads the effective-readers ranking.

NEPOTISTIC RELATIONSHIPS IN TWITTER AND THEIR IMPACT ON RANK PRESTIGE ALGORITHMS [7]

Avello et al. explore the impact of spammers on various ranking algorithms and introduce means to desensitize ranking algorithms to their presence: an algorithm altered in this way would rank spammers low, eliminating the need to detect and remove them before ranking. First, the compared algorithms are introduced, namely PageRank [19], Hyperlink-Induced Topic Search (HITS) [14], NodeRank [20], TunkRank and TwitterRank [25], and known weaknesses of both PageRank and HITS [1] are mentioned.

HITS is not intended to be applied to the whole internet but to the result of a search query. It is based on the assumption that there are two different types of documents: pages with many incoming links (authorities) and pages with many outgoing links (hubs). Accordingly, two different scores are awarded, the authority score and the hub score. The authority score of a page is the sum of the hub scores of all pages linking to it, and its hub score is the sum of the authority scores of all pages it links to. If the scores are normalized between passes, the ranking converges after a few iterations.

NodeRank uses the PageRank approach to ranking web pages but was modified to work on weighted graphs. In addition, the damping/teleportation parameter is not fixed for the whole algorithm and network.
This parameter slightly reduces the ranking, accounting for the fact that the random surfer, which is the main idea of PageRank, will at some point stop browsing. In NodeRank, the factor depends on the number of outgoing connections of the current node, so the algorithm can adapt to different topologies within a network.

In contrast to the algorithms above, TunkRank was designed specifically to rank users on Twitter. It defines influence as the number of people that will read a tweet. The calculation is performed recursively based on two assumptions: 1) every user pays the same attention to all the people they are following, i.e. the chance of a tweet being read is 1 / (number of followees); 2) if a user reads a tweet, there is a constant probability that he or she retweets it. The influence is then calculated as the probability of a read by the user's followers, plus the probability that a retweet occurs and is read, and so on. The drawback of TunkRank is that the retweet probability needs to be rather low, as otherwise the cycles in the follower graph may keep the recursion from converging.

Next, a way to desensitize a ranking algorithm to spammers is introduced: a paradoxical discounted ratio is proposed that can be used as a weight in any given algorithm. For a user, this ratio is either followers/followees (if the user has more followers than followees) or the same ratio computed with all reciprocal (two-way) links removed. Taking PageRank as an example, the authors show that this metric can hardly be manipulated and that using it would in fact improve a ranking. Then, 50 spammer accounts (identified by checking their bio information, their average follower/followee counts and their usual conversations) and about the same number of aggressive marketing accounts (legitimate but "spammy" marketing accounts) are identified by hand, and their positions in the rankings are determined.
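One common way to write the TunkRank recursion described above is Influence(X) = sum over all followers Y of X of (1 + p * Influence(Y)) / |Following(Y)|, where p is the constant retweet probability. The Python sketch below evaluates this formula by fixed-point iteration on a hypothetical three-user graph; the function name, the tolerance and the iteration cap are assumptions, and the cap simply stops the loop if it fails to settle.

```python
def tunkrank(follows, p=0.05, tol=1e-12, max_iter=1000):
    """Influence(X) = sum over followers Y of X of (1 + p * Influence(Y)) / |Following(Y)|.

    follows: user -> set of accounts the user follows
    p: constant probability that a read tweet is retweeted
    """
    users = set(follows) | {x for fs in follows.values() for x in fs}
    followers = {u: [] for u in users}
    for y, followees in follows.items():
        for x in followees:
            followers[x].append(y)
    influence = {u: 0.0 for u in users}
    for _ in range(max_iter):
        # each follower Y reads a given tweet with probability 1 / |Following(Y)|
        new = {
            x: sum((1.0 + p * influence[y]) / len(follows[y]) for y in followers[x])
            for x in users
        }
        if max(abs(new[u] - influence[u]) for u in users) < tol:
            return new
        influence = new
    raise RuntimeError("no convergence; try a smaller retweet probability p")


follows = {"a": {"b"}, "b": {"a", "c"}, "c": {"a"}}
print(sorted(tunkrank(follows, p=0.0).items()))  # [('a', 1.5), ('b', 1.0), ('c', 0.5)]
print(sorted(tunkrank(follows, p=0.05).items()))
```

With p = 0 only direct reads count; larger values of p give long retweet chains around cycles more weight, so the iteration needs more rounds to settle, which relates to the caution about the retweet probability above.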
PageRank, HITS and NodeRank all rank spammers and marketers about equally high (50% of them in the top 10% to 20% of the ranks), while TunkRank rates only spammers high and marketers average. TwitterRank assigns only low prestige values to both spammers and marketers while at the same time ranking spammers very high (90% in the top 30% of users). The authors explain that prestige is distributed mainly among the top 25 accounts, which receive more than 95% of the available prestige. The tweaked PageRank gives very mixed results: it puts about 40% of the spammers in a bin with nearly zero prestige but at the same time ranks the remaining 60% in the top 10% of users. The authors conclude that the currently used PageRank can be tampered with, for example by link spamming. NodeRank has the same problem, as it is closely related to PageRank. TwitterRank is probably not suited for global ranking, as it was initially designed for local analysis and is computationally intensive on a global scale. TunkRank, on the other hand, outperforms all other algorithms and is therefore recommended. Finally, the authors try to explain the complex behavior of the tweaked PageRank. First, most users receive virtually no prestige under this ranking. Considering that most users do not contribute, this is expected (see participation inequality [9]). Second, the top positions of the tweaked PageRank differ considerably from the top positions of most other rankings. This is explained by how the weights are introduced into the PageRank algorithm: a user is affected not by his or her own ratio but by the ratio of his or her followers. Being linked to famous users with a high ratio leads to a high score and thus a high position. The authors use the term giant shoulder phenomenon, as less popular users are carried to higher positions by popular ones.

IV. SOCIAL INTERACTION

This section collects all papers analyzing various aspects of social interaction in OSNs, including the motives for using OSNs. This is in fact a very broad area of research that could be broken down further in future work, as papers analyzing the motivations and habits of OSN users are mixed with papers analyzing pure social interaction.

"LOOKING AT", "LOOKING UP" OR "KEEPING UP WITH" PEOPLE? MOTIVES AND USES OF FACEBOOK [13]

Joinson looks into the motives for using Facebook. Even though the author does not analyze Twitter, the motives for using different OSNs are most likely quite similar. The conclusions and findings of this paper about Facebook are therefore, at least to some extent, applicable to Twitter as well. The author conducted a study among Facebook users to identify the main reasons, the so-called factors, for using Facebook and lists 7 of them: Keeping in touch, Passive contact, social surveillance, Re-acquiring lost contacts, Communication, Photographs, Design related (using Facebook because of its ease of use), Perpetual contact (seeing other users' status) and Making new contacts. In addition, he introduces up to 8 items (loadings) per factor that describe it further. For example, the first factor, Keeping in touch, has the item "Finding out what old friends are doing now". Next, he matches the factors and items to the demographics of the participants and establishes a model that predicts the average frequency of Facebook use as well as the duration of each visit based on these factors. He finds that gender and age are the most determining factors for both frequency and time spent, and that the different factors have different implications for frequency and time: women visit more often, while younger people spend more time. All in all, a user's demographics and motivation for using Facebook allow predictions of both how often and how long he or she will use Facebook. Finally, the author matches the motives against the users' privacy settings and concludes that wanting to meet new people is the main reason for lowering the privacy level.

WHY WE TWITTER: UNDERSTANDING MICROBLOGGING USAGE AND COMMUNITIES [12]

Java et al. try to answer two questions: "Why do we Twitter?" and "How do we Twitter?". In addition, the authors give a brief overview of who uses Twitter and explain that while it was originally launched in the US, Twitter became fairly popular in Europe and Asia as well. Nevertheless, friendships are more likely to be intra-continental and between geographically close people.
They then introduce three categories of user relationships: information sharing (having many followers but following only a few), information seeking (following many but being followed by only a few) and friendship (having roughly equal numbers of followers and followees). Considering only the last category, the authors identify certain communities that form on Twitter. These communities show a dense network of friendship relations but only sparse relations with other people, and they mostly form around similar interests. Finally, the authors analyze what people talk about on Twitter throughout the week. Not surprisingly, the biggest topic is Daily Chatter, followed by Conversations.

A FEW CHIRPS ABOUT TWITTER [15]

Krishnamurthy et al. describe the Twitter network as well as its users. They identify three categories of Twitter users, namely broadcasters with a large number of followers, acquaintances with roughly equal numbers of followers and followees, and miscreants/evangelists who follow a much larger number of people than follow them. The first category mostly consists of news agencies and celebrities; the second is the biggest, as it contains most normal users; the last is populated by spammers. They then analyze tweeting behavior and find that broadcasters indeed tweet a lot, acquaintances have conversations and spammers do not chat much. In addition, most tweets are entered through the web interface, followed by mobile devices and instant messengers. Finally, the authors take a quick look at the popularity of Twitter in different geographical regions, identified by time zones. They show that Twitter has the most users in the US, followed by Europe and Japan, and that growth in all regions slows down after an initial surge.

USER INTERACTION IN SOCIAL NETWORKS AND THEIR IMPLICATIONS [26]

Wilson et al.
explore whether ordinary links on Facebook actually represent friendships and whether all links are equal. As this question is relevant for all OSNs (be it Facebook, Twitter or Flickr), the conclusions drawn here are probably applicable to other OSNs as well. To answer it, they take a closer look at the interaction between friends and show that most of the interaction (>70%) happens within a small subset (<20%) of declared friends. Further, about 40% of declared friends do not interact at all. They conclude that not every link is equal and that only a subset of links actually represents a real relationship. To further support their point, they examine wall posts and photo comments on Facebook and show that only 1% of the users are responsible for 20% of the wall posts, and another 1% of the users are responsible for 40% of the photo comments. Thus, a small group of very active users stands opposite a rather large group of less active ones, and therefore the links have to be different. Correlating the number of connections with the percentage of interaction, the authors show that 50% of all interactions are performed by only 10% of the users, and nearly 100% of interactions by about 50% of the users. After showing that the social graph does not represent active social relationships, the authors introduce their own way of representing user relationships: the interaction graph. In this graph, users are connected only if they actively interact with each other rather than merely being linked. More formally, the graph specifies an interaction-rate threshold (n interactions within time t) that has to be surpassed for two users to be connected. After evaluating different values for n and t, the authors chose n >= 1 and a varying t of 2 months, 6 months, 12 months and the whole lifetime. They then compare these five graphs (one social graph and four interaction graphs with different t) using various graph metrics.
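The interaction-graph construction can be sketched as a filter over timestamped interaction events. In the Python sketch below, the event format, the 182-day approximation of six months, the example data and the helper name interaction_graph are assumptions for illustration; in [26] the interactions are Facebook wall posts and photo comments.

```python
from datetime import datetime, timedelta


def interaction_graph(interactions, n=1, window=None, now=None):
    """Connect two users if they interacted at least n times within the window.

    interactions: iterable of (user_a, user_b, timestamp) events, e.g. wall posts
    window: timedelta, or None for the whole lifetime of the network
    """
    if window is not None and now is None:
        raise ValueError("a time window needs a reference time 'now'")
    counts = {}
    for a, b, ts in interactions:
        if a == b or (window is not None and ts < now - window):
            continue
        edge = frozenset((a, b))  # undirected edge
        counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, c in counts.items() if c >= n}


now = datetime(2010, 9, 1)
declared_friends = {frozenset(p) for p in [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]}
events = [
    ("alice", "bob", now - timedelta(days=10)),
    ("bob", "alice", now - timedelta(days=40)),
    ("alice", "carol", now - timedelta(days=700)),
]
lifetime = interaction_graph(events)
six_months = interaction_graph(events, window=timedelta(days=182), now=now)
print(len(lifetime), "of", len(declared_friends), "declared links are active over the lifetime")
print(len(six_months), "of", len(declared_friends), "declared links are active within about six months")
```

Shrinking the window thins the graph further, which mirrors how Wilson et al. compare interaction graphs for several values of t against the declared social graph.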
The interaction graphs actually follow power-law scaling more precisely (having a lower fitting error) but have lower clustering and longer paths; they remain small-world graphs. In the last chapter, the authors evaluate what their interaction graph means for applications that use OSNs (or graphs generated from OSNs), namely Reliable (RE, an OSN-based anti-spam system) and SybilGuard (which guards distributed applications against Sybil attacks, in which an attacker introduces a number of controlled nodes to influence the distributed application). They show that while RE would work significantly better with an interaction graph, SybilGuard would lose efficiency.

SOCIAL NETWORKS THAT MATTER: TWITTER UNDER THE MICROSCOPE [10]

Huberman et al. take a closer look at the way people connect on Twitter: following and being followed. As you do not need to accept anyone who wants to follow you, the cost of maintaining this relationship is very low. They then ask whether the relationship means anything. To answer this question, they compare the number of followers to the number of friends (by their definition, anyone a user has mentioned at least twice), and they show that even with this weak definition of friendship, the dense network of followers is thinned out considerably: no one has more friends than followers, and most people have fewer than 10% of their followers as friends. The authors conclude that a link between two people does not necessarily mean that they are friends and should therefore not be interpreted as such.

TWEET, TWEET, RETWEET: CONVERSATIONAL ASPECTS OF RETWEETING ON TWITTER [2]

Boyd et al. analyze the use of retweeting on Twitter, in particular how, why and what people retweet. First, they note that while there are conventions for retweeting, these are interpreted in different ways by different users. People may or may not credit the original author, or may even introduce their own convention to signal a retweet. In addition, commenting on the original tweet muddles the conventions even more, and nested retweets are not supported by any convention so far. Next, the shortening of long tweets is examined. As the retweeting convention inserts additional characters into the tweet, long tweets need to be shortened to fit the 140-character limit. The authors identify two types of shortening behavior: Preservers only shorten words and remove punctuation, preserving the content and most of the message, whereas Adapters remove whole words or even rephrase the tweet.
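The length arithmetic behind this shortening is easy to illustrate: prefixing "RT @author: " costs len(author) + 6 characters, so any tweet longer than 140 minus that overhead has to shrink. The Python sketch below applies a Preserver-style pass (drop punctuation, abbreviate words, keep every word) only when needed. The abbreviation table, the exact prefix format and the example tweet are illustrative assumptions, not rules from [2].

```python
import re

LIMIT = 140
# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"you": "u", "are": "r", "and": "&", "to": "2", "for": "4", "with": "w/"}


def preserver_shorten(text):
    """Keep every word, but drop punctuation and abbreviate common words."""
    text = re.sub(r"[.,;:!?]", "", text)  # naive: would also damage URLs
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())


def make_retweet(author, text):
    """Return the retweet and whether it fits into the 140-character limit."""
    prefix = f"RT @{author}: "  # costs len(author) + 6 characters
    if len(prefix) + len(text) > LIMIT:
        text = preserver_shorten(text)
    retweet = prefix + text
    return retweet, len(retweet) <= LIMIT


original = ("Shelters are open for you and your family at the school gym on Main Street. "
            "Please bring water, blankets and medicine with you. Stay safe!")
print(make_retweet("redcross", original))
```

When a Preserver-style pass is not enough, only Adapter-style edits (dropping or rephrasing words) can make the retweet fit.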
They conclude that even though messages are altered and stripped down to the point where they can barely be read and understood, the original meaning is preserved in retweets. Next, the motivations behind retweeting are examined. After evaluating the answers of regular Twitter users who were asked why they retweet, the authors list 10 motivations: spreading tweets, informing others, commenting, showing that you are listening, agreeing, validating, showing friendship, loyalty or homage, helping less popular people/topics, gaining followers or reciprocity, and saving tweets (as they show up on your own profile after being retweeted rather than only on someone else's). The authors then look at what people actually retweet. The reason for retweeting obviously influences what is retweeted, but common elements can be found. Breaking news is retweeted, as well as other tweets that are interesting to someone's followers. Interestingly, various calls for social action are retweeted as well: calls to raise money, but also calls to get a message out (as happened when George Tiller was murdered: people retweeted the message until it showed up as a current trend).

V. EMERGENCY USE

Papers in this category deal with Twitter in emergency situations. A great variety of information is spread via Twitter in emergencies, ranging from personal status updates ("I am fine...") to circumstantial information ("I can see the fires burning three miles east"). This leads to rather different tasks: not only identifying reliable sources but also recognizing important information and parsing it. In addition, Twitter can be used to get a message out to people, as important information spreads rather fast via Twitter. This is where Twitter is unique: it provides fast communication and supplies a back channel for information flowing from the field to those in command.

PASS IT ON?: RETWEETING IN MASS EMERGENCY [21]

Starbird and Palen examine the phenomenon of retweeting on Twitter.
This convention allows Twitterers to forward information they receive via Twitter to their followers. Usually the original author is credited by mentioning his name, and the tweet is only tweaked to fit the length limit. This form of influence depends only loosely on the social network and allows a more precise classification of tweets: in an emergency, important tweets are retweeted more often than unimportant ones. Furthermore, tweets by official accounts (such as radio stations or newspapers) or by accounts created just for the situation are more likely to be retweeted. This implies that, independent of the author's number of followers, a message will be broadcast if it is important enough. A user's locality to an emergency allows a further distinction: a user who is affected by it, e.g. because he lives in an affected region, forwards tweets containing information that is important for other local users, such as the locations of evacuation centers. A user who is not local forwards more news-like information, for example pictures and overviews. The authors introduce the number of retweets as a filter mechanism for important information: a tweet that is forwarded by different people is more important than one that simply vanishes.
MICROBLOGGING DURING TWO NATURAL HAZARDS EVENTS: WHAT TWITTER MAY CONTRIBUTE TO SITUATIONAL AWARENESS [23]
Vieweg et al. examine the use of Twitter during two natural hazard situations in the USA with regard to how those tweets can enhance the situational awareness of first responders. The term situational awareness describes the ability to see all aspects of a situation rather than only a part of it. The authors propose that situational awareness can be improved by including the information available via Twitter.
In the two analyzed situations, namely the Red River floods and the Oklahoma grassfires (both in spring 2009), many local people broadcast so-called situational updates: various information about the current situation, from water levels to shelter locations. The authors read through all of the messages and categorized them by content: if a tweet contained the name of a highway, it was tagged as Road Condition; if it contained information about injuries or some type of damage, it was labeled as Damage/Injuries. Ten basic categories were identified: Preparation, Warning, Response to Warning, Hazards Location, Other Environmental Conditions, Advice, Evacuation, Sheltering, Animal Management, and Damage and Injury Reports. The authors hope that these categories can be assigned automatically in the future, perhaps as a basis for automated content extraction.
TWITTER ADOPTION AND USE IN MASS CONVERGENCE AND EMERGENCY EVENTS [11]
Hughes and Palen attempt a descriptive analysis of Twitter usage during two emergencies, namely hurricanes Gustav and Ike, and two national security events, namely the Democratic and Republican National Conventions (all in 2008). The tweets regarding these events are compared to all tweets of that time to detect patterns in non-routine Twitter usage. The authors point out that the percentage of users with a given number of tweets in the conversation stays constant even though the number of involved users varies. In addition, the tweets surrounding the events contain fewer replies and more URLs than average, making them less private and more informative. The authors postulate that around special events information brokers emerge, while the majority of people just consume the information these brokers spread.
VI. CONCLUSION AND FUTURE WORK
This paper has given an overview of the current research on Twitter and has categorized this research. The current focus of research is social interaction on Twitter and ways to rank and formalize this interaction. While many ranking algorithms have been introduced, there is as yet no established ranking of Twitter users comparable to Google's PageRank for websites. Complicating matters, there is no easy way of determining the quality of a ranking algorithm for Twitter users, as the best possible ranking is a controversial topic in itself. Finding and establishing such a ranking will be a challenge.
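For reference, the PageRank baseline mentioned above [19] can be computed by power iteration. The sketch below applies it to a small hypothetical follower graph, treating "u follows v" as a link from u to v so that rank flows to the followed user. The damping factor 0.85, the fixed iteration count and the uniform redistribution for users who follow nobody are common textbook choices, not settings taken from any of the surveyed papers.

```python
def pagerank(follows, damping=0.85, iterations=100):
    """Power-iteration PageRank on a follower graph.

    follows maps each user to the users they follow; the edge u -> v passes
    part of u's rank to v, so users followed by highly ranked users rank high.
    """
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    users = sorted(users)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A user who follows nobody spreads their rank uniformly (common convention).
                for v in users:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical follower graph: alice and bob follow carol, carol follows alice.
    follows = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
    ranks = pagerank(follows)
    print(max(ranks, key=ranks.get))  # carol
```

The sketch also shows why the quality question raised above is hard: the result depends entirely on what a follow edge is taken to mean, which is exactly what Huberman et al. call into question.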
In addition, another field of research is emerging: the use of Twitter in emergency or non-routine situations. While most of this work is still descriptive analysis of past use, some efforts are being made to show how Twitter can be used in future emergencies to help both the government and the affected population. This includes early drafts of automated data mining from Twitter as well as exploration of how people can be informed via Twitter. All research in this field is at a very early stage, and while mostly ideas are elaborated so far, using Twitter in a real emergency situation seems to be the next logical step. While this survey was limited to research about Twitter and related topics, a great variety of other OSNs are being researched: Facebook [24], Flickr [8], even YouTube [4], each with its own interesting features. In addition, new research about Twitter is being published even as this paper is being finished.
REFERENCES
[1] Krishna Bharat and Monika R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In SIGIR 98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, New York, NY, USA, ACM. III
[2] Danah Boyd, Scott Golder, and Gilad Lotan. Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In HICSS 10: Proceedings of the rd Hawaii International Conference on System Sciences, pages 1-10, Washington, DC, USA, IEEE Computer Society. II, IV
[3] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna P. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media (ICWSM), Washington DC, USA, May. III
[4] Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In IMC 07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 1-14, New York, NY, USA, ACM. I, VI
[5] Meeyoung Cha, Alan Mislove, and Krishna P. Gummadi. A measurement-driven analysis of information propagation in the flickr social network. In WWW 09: Proceedings of the 18th international conference on World wide web, New York, NY, USA, ACM. I
[6] Scott Garriss, Michael Kaminsky, Michael J. Freedman, Brad Karp, David Mazières, and Haifeng Yu. Re: reliable . In NSDI 06: Proceedings of the 3rd conference on Networked Systems Design & Implementation, pages 22-22, Berkeley, CA, USA, USENIX Association. I
[7] Daniel Gayo-Avello. Nepotistic relationships in twitter and their impact on rank prestige algorithms. CoRR, abs/. III
[8] Amit Goyal, Francesco Bonchi, and Laks V.S. Lakshmanan. Learning influence probabilities in social networks. In WSDM 10: Proceedings of the third ACM international conference on Web search and data mining, New York, NY, USA, ACM. I, VI
[9] B. Heil and M. Piskorski. New twitter research: Men follow men and nobody tweets. Available at: twitter_research_men_follo.html. III
[10] Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. Social networks that matter: Twitter under the microscope. CoRR, abs/. IV
[11] Amanda Lee Hughes and Leysia Palen. Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management, 6: (13), 11 February. V
[12] Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng. Why we twitter: understanding microblogging usage and communities. In WebKDD/SNA-KDD 07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56-65, New York, NY, USA, ACM. IV
[13] Adam N. Joinson. Looking at, looking up or keeping up with people?: motives and use of facebook. In CHI 08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, New York, NY, USA, ACM. IV
[14] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5). III
[15] Balachander Krishnamurthy, Phillipa Gill, and Martin Arlitt. A few chirps about twitter. In WOSP 08: Proceedings of the first workshop on Online social networks, pages 19-24, New York, NY, USA, ACM. IV
[16] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In WWW 10: Proceedings of the 19th international conference on World wide web, New York, NY, USA, ACM. III
[17] Changhyun Lee, Haewoon Kwak, Hosung Park, and Sue Moon. Finding influentials based on the temporal order of information adoption in twitter. In WWW 10: Proceedings of the 19th international conference on World wide web, New York, NY, USA, ACM. III
[18] Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, and Peter Druschel. You are who you know: inferring user profiles in online social networks. In WSDM 10: Proceedings of the third ACM international conference on Web search and data mining, New York, NY, USA, ACM. I
[19] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report, Stanford InfoLab, November. Previous number = SIDL-WP. III
[20] Josep M. Pujol, Ramon Sangüesa, and Jordi Delgado. Extracting reputation in multi agent systems by means of social network topology. In AAMAS 02: Proceedings of the first international joint conference on Autonomous agents and multiagent systems, New York, NY, USA, ACM. III
[21] Kate Starbird and Leysia Palen. Pass it on?: Retweeting in mass emergencies. Information Systems for Crisis Response and Management Conference. V
[22] T. Strufe. Safebook: A Privacy-Preserving Online Social Network Leveraging on Real-Life Trust. IEEE Communications Magazine, page 3. I
[23] Sarah Vieweg, Amanda L. Hughes, Kate Starbird, and Leysia Palen. Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In CHI 10: Proceedings of the 28th international conference on Human factors in computing systems, New York, NY, USA, ACM. V
[24] Bimal Viswanath, Alan Mislove, Meeyoung Cha, and Krishna P. Gummadi. On the evolution of user interaction in facebook. In WOSN 09: Proceedings of the 2nd ACM workshop on Online social networks, pages 37-42, New York, NY, USA, ACM. I, VI
[25] Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. Twitterrank: finding topic-sensitive influential twitterers. In WSDM 10: Proceedings of the third ACM international conference on Web search and data mining, New York, NY, USA, ACM. III
[26] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P.N. Puttaswamy, and Ben Y. Zhao. User interactions in social networks and their implications. In EuroSys 09: Proceedings of the 4th ACM European conference on Computer systems, New York, NY, USA, ACM. IV
[27] Haifeng Yu, Michael Kaminsky, Phillip B. Gibbons, and Abraham Flaxman. Sybilguard: defending against sybil attacks via social networks. In SIGCOMM 06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, New York, NY, USA, ACM. I

104 LARGE-SCALE MULTIPLAYER GAMES AND NETWORKED VIRTUAL ENVIRONMENTS
Large-scale Multiplayer Games and Networked Virtual Environments
Leonhard Nobach
Abstract
How can game infrastructures be built that cope with ever more users? How can world-scale networks be created without players experiencing lag or waiting times? This paper focuses on the problems of today's large-scale network games and virtual environments. In detail, it gives insight into the software architecture, network architecture and network communication paradigms of these games, always aiming at the quality attributes scalability, adaptability and fault tolerance, as well as at aspects of the player's perception such as game experience, especially freedom from lag. The paper proposes a generic network architecture for this kind of game. After that, three popular multiplayer online games are introduced, and it is shown how their vendors address the common problems of backend infrastructure.
I. INTRODUCTION
A. History of Multiplayer Games
In computer history, great efforts have been made to create virtual reality. Algorithms evolved that generate three-dimensional projections of world models stored in the computer, with detailed effects such as transparency, reflection and blur. The market grew, and the semiconductor industry even supplied game developers with dedicated hardware components such as graphics adapters, allowing them to render world models that come close to reality. While graphics evolved, it is easy to forget that progress was made in other directions, too. The common player did not want to play only against the artificial intelligence of virtual characters; the player rather wanted to challenge other human beings. The first multiplayer games were made to run on a single computer. Here, two or more players each had a controller pad (or even had to share a keyboard) and the screen was split, so that the players could challenge each other.
As the personal computer gained a larger audience and network interfaces were standardized, multiplayer local-area network gaming gained popularity. Gamers organized LAN parties at regular intervals and connected their computers to networks, ranging from small sizes to over participants. In the early years of the Internet, dial-up connections had low bandwidth and high latency and were too expensive to stay online for a whole game session. With the advent of broadband Internet access, Internet gaming portals gained popularity. At first, players met in forums to organize game sessions; each group still had a small player count. But the demand for interacting with more and more players reached a level that confronted network architects with problems that are still unresolved. How can the whole world be brought together in one virtual world without servers being overloaded and people getting disconnected?
B. Massively Multiplayer Online Games (MMOGs)
The pioneers of MMO gaming were Daimonin 1 and Neverwinter Nights 2. Daimonin, for example, is a fantasy role-playing game set in a medieval context. The key concept of role-playing is solving quests for a reward. Other elements are attaining skills and knowledge by fighting computer-controlled characters (Player-versus-Environment, PvE) as well as by fighting characters controlled by other humans (Player-versus-Player, PvP). In most role-playing games, you reach certain levels that give you additional skills, the ability to use advanced weapons and glory among other players. Another game of this genre is the famous World of Warcraft, created by Blizzard Entertainment. While the open-source Daimonin has an isometric third-person view with simple graphics, World of Warcraft uses a more advanced 3D environment. Another current role-playing MMOG is Eve Online, created by the Icelandic company CCP Games.
The Eve Online universe is set in a science-fiction context, in contrast to the often medieval fantasy games. The player commands starships and trades, fights and travels between multiple solar systems. The last game addressed in this paper is Linden Lab's Second Life, which is not a role-playing game and differs greatly from the games named before. Second Life specifies neither a genre nor a particular environment. When you register for the game, you create an avatar: a human-looking person whose appearance you can modify in many respects. After logging in, you do not need to follow a game story created by the vendor; rather, you create the world together with other players. For example, you can create objects from primitive shapes (such as a sphere, a torus or a cylinder), group them into very detailed models and place them in the environment, if you have the permissions to do so. Going further, you can attach scripts to objects and thus give them a behavior. Conceptually, this makes it possible to create even custom games and experiences within this virtual world. In practice, all the user-generated content leads to scalability and efficiency problems in Second Life, limiting its potential.
C. What this paper is about
The Key Features of a Virtual World by Kumar et al. [1] gives an overview of the aspects a virtual world should fulfill. In particular, the features Server Scalability, Network Constraints and Object Encoding intersect with the aspects discussed in this paper.

Chapter II suggests a generic client-server architecture for multiplayer online games. Chapter III extends this architecture to achieve scalability by distributing load onto multiple servers. Chapter IV goes into detail: the infrastructures of three popular MMOGs are explained, especially their conceptual differences from the generic architecture. The short final chapter concludes the paper with a brief comment on current infrastructure and an outlook on what may come in the future.
II. GENERIC ARCHITECTURE AND COMMUNICATION OF CURRENT MULTIPLAYER NETWORK GAMES
This chapter describes a generic architecture of multiplayer network games as used in practice. The following sections do not describe the architecture of one particular network game; rather, they give the reader an overview of state-of-the-art concepts, show alternatives and give clues for increasing the number of players that can interact with a single server.
A. Client-Server architecture
When multiple instances communicate, the first and simplest network architecture model usually referred to is the client-server concept. The server is an instance that defines the state of a world model, while clients send requests to the server to influence the model state or to retrieve the part of the model they need for representation.
Fig. 1: The client-server topology and a distributed approach using n copies of the world state.
Despite the client-server concept's simplicity, some people prefer a decentralized approach in which a copy of the current model is kept on every client [28]. Changes to the model have to be multicast to all other clients to keep every copy up to date (Figure 1). The advantages of the distributed approach are that 1) there is no single point of failure (such as the server) and 2) latency is likely to be lower, since only one hop is necessary for a state change, unlike in the client-server concept. However, other obstacles arise when implementing a game in practice. These problems do not appear when using a client-server model:
1) Game world consistency: The game state is defined at the server.
2) Occlusion: The server may filter the representation sent to the clients, based on the events the game wants the players to perceive.
3) Large parts of the game code remain in the custody of the server provider (which is sometimes the game provider).
4) Easier connection setup, e.g. there are no problems with NAT traversal.
Since all games discussed in this paper use the client-server concept (though with multiple servers), we focus on this strategy from now on.
B. From the Client to the Server: Actions
Whenever a player attempts to influence the game world, the player creates actions. Different types of player actions have different requirements, which are important for their communication to the server. Requirements on player actions can be expressed using the values deadline and precision [19]. The deadline is the time after which the action no longer fulfills its original purpose in the game world. In contrast, the precision is the accuracy of (parts of) the game state that the player must know to decide whether to initiate an action. For example, the deadline is very short when shooting at a game character with a sniper rifle, since the character may only be visible for a split second. Here the precision requirement is very high, too, since you need to know the exact position of the character you are shooting at. The player's actions need to be propagated to the server to influence the game world. For this, messages containing the actions are sent to the server.
C. Duties of the Server
The dominant purpose of the server is to maintain a model of the game world; this includes handling player actions.
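An action message as it might arrive at the server can carry the deadline and precision requirements described in Section B. The following minimal sketch is hypothetical: the field names, the use of seconds, the 0-to-1 precision scale and the example values are assumptions. It only operationalizes the definition of the deadline, i.e. it tells whether an action still fulfills its purpose at the moment the server gets to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str         # hypothetical action type, e.g. "shoot" or "dig"
    issued_at: float  # time (seconds) at which the player initiated the action
    deadline: float   # seconds after which the action no longer serves its purpose
    precision: float  # assumed 0..1 scale for the required accuracy of the known game state

    def still_useful(self, now):
        """True while the action's deadline has not yet passed."""
        return now - self.issued_at <= self.deadline


if __name__ == "__main__":
    snipe = Action("shoot", issued_at=10.0, deadline=0.2, precision=0.95)
    dig = Action("dig", issued_at=10.0, deadline=5.0, precision=0.1)
    print(snipe.still_useful(10.5), dig.still_useful(10.5))  # False True
```

Half a second of server-side delay is enough to make the sniper shot pointless, while digging a hole is unaffected, which is why the two kinds of action benefit from different treatment in the server's queues.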
Whenever an action message arrives at the server, the server applies the changes to the game world that the action induces. This non-trivial step is explained below; it may take considerable processing power and thus time to finish. So while the server is busy maintaining the game universe, further action messages must be enqueued. Here, an advanced scheduling algorithm such as Start-time Fair Queuing [21] may be used. This algorithm allows different action message types to be enqueued in different queues, with the queues prioritized inversely to their deadlines, so that urgent events (like shots) are handled earlier than, e.g., moving or digging a hole. Despite the prioritization, it avoids starvation: a low-priority message is eventually handled even when the queue of urgent messages is never empty. Once the server is free to handle an action, it performs several steps. These include, roughly in the following order, but are not limited to:
Validating the action: Is the player allowed to perform such an action? For example, if the region does not allow fighting, the sword cannot be taken out of its sheath. Good validation of actions on the server side prevents cheating, e.g. by modifying the client's executable code.
Inducing changes in the game universe: Based on an action message, objects and players may move, appear and disappear, their appearance may change, and so on. For the later update procedure, it is often useful not only to change the state of the universe but also to keep a change log.
Physics: Objects that move, appear and disappear often affect their environment, or their motion is affected by the environment. Possible tasks of physics engines are collision detection and handling object motion while respecting collisions with other objects, adhesion and rolling resistance. To reduce complexity, some physics should definitely not be handled by the server. For instance, detailed physics that do not affect surrounding objects but are important for the rendering process at the client belong on the client. Examples are a fluttering cape or a curtain that is pushed aside when the character moves under it.
Artificial Intelligence: Non-Player Characters (NPCs) or bots need to react to player actions and world changes. The Artificial Intelligence (AI) is processed at the server.
The steps above can be strongly interdependent, since an object movement handled by the physics engine may in turn induce a change of the model and vice versa, and large parts of the validation of an action may be done by the physics engine (like an avatar running against a wall). The level of interdependence depends on the complexity of the game.
D. From the Server to the Client: Updates
As the game world changes, these changes have to be propagated to the clients. The clients must receive these state updates on time, because the player always needs accurate information about the avatar's environment to initiate further actions.
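The Start-time Fair Queuing scheduler mentioned in Section C [21] can be sketched in a few lines. In SFQ, each queue has a weight; a message receives a start tag equal to the larger of the current virtual time and the finish tag of the previous message in its queue, and messages are served in increasing order of start tag. The sketch below is a simplified, hypothetical illustration, not the server code of any game discussed here: the class name, the queue names and the choice of weights inversely proportional to a message type's deadline are assumptions, the cost parameter stands in for the handling cost of a message, and the virtual time is simply the start tag of the last message taken out (the idle-period rule of full SFQ is omitted).

```python
import heapq


class StartTimeFairQueue:
    """Simplified Start-time Fair Queuing (SFQ) over several message queues.

    Each queue has a weight; a larger weight gives the queue a larger share of
    the server. Messages are served in increasing order of their start tags, so
    a low-weight queue is delayed but never starved.
    """

    def __init__(self, weights):
        self.weights = dict(weights)                        # queue name -> weight
        self.last_finish = {q: 0.0 for q in self.weights}   # finish tag of each queue's last message
        self.virtual_time = 0.0                             # start tag of the message last taken out
        self._heap = []                                     # (start tag, sequence no., queue, message)
        self._seq = 0                                       # tie-breaker: FIFO among equal start tags

    def enqueue(self, queue, message, cost=1.0):
        start = max(self.virtual_time, self.last_finish[queue])
        self.last_finish[queue] = start + cost / self.weights[queue]
        heapq.heappush(self._heap, (start, self._seq, queue, message))
        self._seq += 1

    def dequeue(self):
        start, _, queue, message = heapq.heappop(self._heap)
        self.virtual_time = start
        return queue, message

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    # Hypothetical weights, inversely proportional to assumed deadlines (shots: 4x more urgent).
    sfq = StartTimeFairQueue({"shot": 4.0, "move": 1.0})
    for i in range(8):
        sfq.enqueue("shot", f"shot-{i}")
    for i in range(2):
        sfq.enqueue("move", f"move-{i}")
    print([sfq.dequeue()[0] for _ in range(len(sfq))])
    # ['shot', 'move', 'shot', 'shot', 'shot', 'shot', 'move', 'shot', 'shot', 'shot']
```

Even though all eight shots were queued before the moves, each move is interleaved after a bounded number of shots instead of waiting for the shot queue to drain, which is the starvation avoidance described in Section C.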
Updates can either be sent to the clients whenever a state change occurs, or in a framed way [13], i.e. state updates are accumulated and sent at fixed time intervals. Typical update rates lie between 10 updates/second [30] and 45 updates/second (Second Life). Status updates offer a great opportunity to save bandwidth, because the server only needs to send updates to clients that are able to receive them and interested in them; this problem is called Interest Management [27]. In virtual environments, Interest Management may be based on the visibility of objects to the player (culling). Visibility can be calculated from the Euclidean distance between the avatar and the respective objects (sphere); this method can be extended by taking the client's heading into account (cone or frustum) (Figure 2). When the frustum settings change, e.g. when a spyglass is used in the game, the client must always tell the server its new viewing settings for the culling calculations.
Fig. 2: Culling is not only done by the client's graphics adapter, but also by the server, which sends only the necessary updates. Three options are shown: sphere, cone and a spyglass cone.
To save even more bandwidth for update messages on very crowded servers, a crowd dynamics model can be applied (Parker and Sorenson [30]). Here, important parameters of the movement behavior of a whole crowd of players are modeled and transferred to the client as a statistical function instead of every single position of distant entities. The client can then visualize this movement by displaying random characters that follow the transferred crowd behavior model. However, crowd dynamics should only be used for players that are very far away and hardly visible, since it hides details of a co-player's appearance the player may be looking for.
A good example is a player searching for a friend in a crowd, information that the statistical crowd function ignores. Additionally, this technique cannot be applied to arbitrary objects that differ in shape. Updates to the client can not only be culled but also prioritized, so that the client receives updates to nearby objects faster. This is especially useful when the server reaches the limits of its upstream bandwidth.
E. Latency and Lag
The most common definition of latency in a network game is the time between an action initiated by the player and a reaction of the environment that the player, in reality, would expect to occur immediately. The most important parts of the communication process that cause latency are:
1) network-layer delay of action and update messages, i.e. packets waiting in the queues of intermediate routers, and propagation delay in copper and fibre cables,
2) action messages waiting in the server's queue,
3) execution time of world modifications at the server,
4) retransmissions, which are discussed in the next section.
Lag occurs when latency reaches a level that negatively affects the game experience. Some authors [11] use lag as a synonym for latency. The following table gives an overview of lag limits [20] according to the definition given above.
Genre                   Lag limit
Real-time Strategy      > 1 s
(Amer.) Football        500 ms
Racing                  100 ms
First-Person Shooter    ms
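To relate the latency components listed above to the lag limits in the table, the following sketch adds up a latency budget and compares it with a genre's limit. All component values are hypothetical, the "> 1 s" limit for real-time strategy is treated as a 1000 ms threshold, and the first-person shooter entry is left out because its value is not given here.

```python
# Lag limits in milliseconds, taken from the table above.
# "> 1 s" for real-time strategy is treated as a 1000 ms threshold (assumption);
# the first-person shooter limit is missing in the table and therefore left out.
LAG_LIMIT_MS = {
    "Real-time Strategy": 1000,
    "(Amer.) Football": 500,
    "Racing": 100,
}


def total_latency_ms(components):
    """Sum the latency components (all values in milliseconds)."""
    return sum(components.values())


def causes_lag(components, genre):
    """Lag: the total latency exceeds the genre's lag limit."""
    return total_latency_ms(components) > LAG_LIMIT_MS[genre]


# Hypothetical budget for one action and the resulting update.
budget = {
    "action network delay": 40,
    "server queue": 10,
    "world modification": 5,
    "update network delay": 40,
}

if __name__ == "__main__":
    print(total_latency_ms(budget))                                # 95
    print(causes_lag(budget, "Racing"))                            # False
    print(causes_lag({**budget, "retransmission": 60}, "Racing"))  # True
```

In this made-up budget a single retransmission pushes a racing game over its limit, while the same delay stays well within the limit for American football, which is why the retransmission strategies discussed next matter most for fast-paced genres.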

F. Lag, Omissions and Recovery
A common problem of packet-switching networks is packet omissions, which occasionally occur, either singly or in bursts 3. Esbensen [11] assumes a very simple communication protocol that is supposed to resemble the state of the art in current MMOGs. Here, an action message is very large and has to be split into multiple packets. The packets are reassembled at the server and handled by it, and an update is sent back to the client, split into multiple packets as well, which serves as an acknowledgement. If not all packets that were sent are received by the server (omissions), the server times out and sends a message to the client asking for retransmission. If all packets were transmitted, the client gets its action message acknowledged via the server's update message.
First, most urgent action messages are not so large that they need to be split into multiple packets or IP fragments. Consider an example: an update of the absolute position of an object is triggered. Assume that a short header (byte), an object identifier (long) and the 3D coordinates of the object (one long/double per dimension) are transferred. This gives 1 + 8 + 3 · 8 = 33 bytes of payload, which is much smaller than even a pessimistic MTU 4 of 500 bytes in the Internet.
Second, when omissions occur, there are more advanced retransmission mechanisms than timing out and retransmitting the whole action message, for example TCP's Fast Retransmit: when a single packet is lost and the next one arrives, the receiver repeats the acknowledgement for the last in-order packet (a duplicate ACK, Dup-Ack), and after three duplicate ACKs the sender retransmits the missing packet without waiting for a timeout. The retransmission therefore takes only approx. one RTT 5 longer than a correct transmission. This may still result in lag, but it is much faster than waiting for a timeout, asking for retransmission and retransmitting the whole action message.
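The duplicate-ACK mechanism described above can be simulated in a few lines. The sketch works with whole packet numbers rather than TCP byte sequence numbers (a simplification): the receiver always acknowledges the next packet number it expects (cumulative ACK), and the sender counts repeated ACKs and fast-retransmits once the count reaches three, the threshold specified for TCP in RFC 5681. The function names are hypothetical.

```python
DUP_ACK_THRESHOLD = 3  # RFC 5681: fast retransmit after three duplicate ACKs


def receiver_acks(arrivals):
    """Cumulative ACKs: after each arriving packet, acknowledge the next packet number expected."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks, threshold=DUP_ACK_THRESHOLD):
    """Packet numbers the sender retransmits immediately, without waiting for a timeout."""
    retransmit = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmit.append(ack)  # the packet the receiver is still waiting for
        else:
            last_ack, duplicates = ack, 0
    return retransmit


if __name__ == "__main__":
    # Packets 0..5 are sent; packet 2 is lost, while 3, 4 and 5 still arrive.
    acks = receiver_acks([0, 1, 3, 4, 5])
    print(acks)                    # [1, 2, 2, 2, 2]
    print(fast_retransmits(acks))  # [2]
    # Once the retransmitted packet 2 arrives, the cumulative ACK jumps to 6.
    print(receiver_acks([0, 1, 3, 4, 5, 2])[-1])  # 6
```

Only the lost packet is resent, and the sender learns about the loss as soon as three later packets have arrived, instead of after a retransmission timeout.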
TCP's Fast Retransmit is part of TCP Reno, which is widely used in the network stacks of today's operating systems.
Esbensen [11] further suggests that most retransmissions can be avoided by sending each message in a single packet that carries its own payload and, redundantly, the payload of the messages sent before it. With a certain probability, this allows lost messages to be reconstructed without any further retransmission. When omission bursts occur, this strategy is likely to fail, since the first packets omitted in the burst cannot be reconstructed from the following packets, which are omitted as well. Additionally, this technique is inappropriate when the bandwidth of the transmission is limited: the redundant payload dramatically increases the required bandwidth and can cause intermediate routers to drop additional packets due to network congestion. Esbensen claims that protocol packet transmissions can be reduced by 90% and lag by 98%, but neither presents evidence for this nor describes his measurement approach.
Furthermore, many action and update messages are self-healing. This means that when occasional omissions occur, a small state deviation at the client side or a lost action is not crucial for the game experience and will eventually be made obsolete by further state updates. In this case, a reliable communication protocol may not be necessary. For example, Second Life updates are based on UDP packets, a large portion of which are not retransmitted in case of omissions [24].
3 Many packet omissions in succession
4 Maximum Transmission Unit
5 Round-Trip Time
III. ADDING THE MASSIVELY TO NETWORK MULTIPLAYER GAMES
In the previous chapter, we have shown several paradigms for client-server communication in online environments, focusing on reducing bandwidth and complexity and thus increasing the player capacity per server.
This chapter now focuses on concepts for distributing online environments onto multiple servers.
A. Dominant: Spatial Distribution
In this context, scalability is achieved when adding processing cores that act as game servers always yields a roughly proportional gain in player or world-size capacity. All games analyzed use spatial distribution: the game world, handled by a single authoritative server in the previous chapter, is split into multiple regions 6. Each region runs in a region process. Since a player's avatar always interacts heavily with its nearby environment, which is part of a region, the player's client is connected to the corresponding region server. The region is influenced by the player's actions, which are sent as action messages to the region server process for handling. Additionally, the region's current state changes are periodically sent from the region server process to the player's client for representation. When splitting a game universe into regions that allow seamless interaction across regions (next section), each region must have defined borders and neighbors. When using homogeneous region bounds, we have to define a region border shape that allows either a 2D tessellation (planar worlds with impenetrable sky and ground) or a 3D tessellation, such as a hexagon, a square or a cube. Voronoi regions are an example of an inhomogeneous split.
B. Handing Over between Regions
If we just split the game universe into regions, they would be sealed off from one another. Regions have to communicate with each other to ensure a typical MMORPG experience. The most important type of interaction is avatar crossing; in realistic MMOGs, object crossing is desirable, too. Let us suggest a very simple mechanism that makes avatar movement between regions possible. A region contains portals to other regions.
A portal is a data structure consisting of
1) a unique identifier inside the region,
2) a geometric shape in the region world (e.g. a rectangle), typically at the border of regions or in the ground,
3) a point and direction in the region world (spawn point),
4) the transport address of another region server process, and
5) the identifier of another portal in that region server process.

6 The term used for parts of the game world may vary between games.

Whenever the physics engine of a process m detects a collision or intersection of an avatar A with the geometric shape P(2) of a portal P, the handover is initiated: m tells the process n addressed by P(4) that it has to expect a handover of avatar A. m tells the client of avatar A that the avatar has to be handed over to process n. The client of avatar A, formerly connected to m, connects to n. n tells m that the avatar shall be handed over, along with all of its properties. m removes avatar A. n's region spawns the avatar at the spawn point (3) of the portal P(5). To reduce the lag that may occur here, region processes may keep connections to portal-referenced neighbor processes alive.

Portal references should be symmetric: for every portal q, the portal p addressed by q(4) and q(5) should in turn address portal q via p(4) and p(5). Whenever the player walks into a portal, he enters the region behind the portal from his perspective; symmetric relations allow the avatar to go back to the former region as well.

Until now, the transfer between regions is not seamless. A player walking into another region may suddenly jump to the spawn point of the target region's portal, stand still and face another direction. For example, The Legend of Zelda: Ocarina of Time had this spawn behavior when loading another region (although it was not a distributed game). To extend the above model to portals with seamless physics, we have to transfer several physical properties along with the avatar handover, such as the avatar's position, direction and velocity. With this information, the target process can calculate the position, direction and velocity at the reentry point, making the spawn point obsolete.
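The portal record and the handover sequence described above can be sketched as a small single-process simulation in Python. This is a hypothetical illustration, not the design of any particular game: portal shapes are assumed to be axis-aligned rectangles, processes are plain dictionary keys instead of transport addresses, the names `Portal`, `is_symmetric` and `hand_over` are invented for this sketch, and network messages are only recorded in a log rather than sent.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass(frozen=True)
class Portal:
    """Hypothetical portal record mirroring fields 1)-5) of the definition above."""
    portal_id: str                            # 1) unique identifier inside the region
    rect: tuple[float, float, float, float]   # 2) shape as (x0, y0, x1, y1), an assumption
    spawn: Point                              # 3) spawn point ...
    spawn_heading: float                      # 3) ... and direction in degrees
    target_process: str                       # 4) address of the other region process
    target_portal: str                        # 5) portal identifier in that process

    def intersects(self, pos: Point) -> bool:
        x0, y0, x1, y1 = self.rect
        return x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1


def is_symmetric(world: dict[str, dict[str, Portal]]) -> bool:
    """True if, for every portal q, the portal p addressed by q(4), q(5) addresses q again."""
    for process, portals in world.items():
        for q in portals.values():
            p = world.get(q.target_process, {}).get(q.target_portal)
            if p is None or (p.target_process, p.target_portal) != (process, q.portal_id):
                return False
    return True


def hand_over(world, regions, clients, m: str, avatar: str):
    """Hand `avatar` out of process m if it touches one of m's portals.

    `regions` maps process -> {avatar: state}, `clients` maps avatar -> process.
    Returns (process now responsible for the avatar, list of logged messages).
    """
    pos = regions[m][avatar]["pos"]
    portal = next((p for p in world[m].values() if p.intersects(pos)), None)
    if portal is None:
        return m, []                                  # no portal collision, nothing to do
    n = portal.target_process                         # P(4)
    log = [(m, n, "expect handover", avatar),         # m announces the handover to n
           (m, avatar, "switch to", n)]               # m tells the client to move to n
    clients[avatar] = n                               # the client connects to n
    state = regions[m].pop(avatar)                    # properties go to n, m removes A
    target = world[n][portal.target_portal]           # portal P(5) inside process n
    state["pos"] = target.spawn                       # non-seamless: spawn point of P(5)
    state["heading"] = target.spawn_heading
    regions[n][avatar] = state
    return n, log
```

The spawn point of each portal should lie outside that portal's shape; otherwise the avatar would be handed straight back. A seamless-physics variant would replace the two spawn assignments with a position, heading and velocity computed from the transferred state; the sketch deliberately keeps the simpler spawn-point behavior.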
In a networked virtual environment, it is desirable to enter other regions not only through small portals, but by crossing any point of a region's bounds. For seamless border handovers, a region needs defined borders and neighbors (see Section III-A). Applying our model from above, every line/face of the border is a portal pointing to the corresponding border in the adjacent region. A summary of region handover strategies is depicted in Figure 3.

Fig. 3: Different strategies for region handover, shown on a two-dimensional map. a) Portals, b) Portals with seamless physics, and c) Seamless Borders. The yellow area is the region where players may reside.

Handing over objects to another region process is important for seamlessness in environments where (non-avatar) objects are allowed to cross regions. The procedure for object handover can be derived from the seamless-portal procedure for avatar movement, but without any client-server communication taking part.
C. Watching into other regions
For a seamless border behavior, players should often be able to watch into neighboring regions without interacting with them. To achieve this, we can distinguish between two options (Figure 4):
Proxy: The region a client's avatar is currently in keeps connections to neighboring regions, receives updates from them and propagates them to its clients (if necessary, depending on culling strategies). This strategy is easier in terms of connection management, but may result in more latency and does not inherently enable a quick handover like the following strategy.
Direct: Clients connect to neighboring regions as well and receive status updates from them if their avatars are in range. This approach has the lowest status-update delay and also supports a quick handover without lag, since the connection is already established before entering a neighboring region. Further advantages are low latency of update messages and low payload usage. The problem is that many connections have to be maintained, depending on the number of neighbors. In a 2D square-tiled world, the client has to maintain 9 connections (its own region plus all 8 neighbors, including the diagonal ones that only share a corner), and the servers have to maintain 9 connections per client on average.

Fig. 4: Proxy-based and direct neighbor region state updates. In the latter case, the dotted connections are kept open, but at the current avatar position no state updates are propagated, because the culling strategy forbids the propagation of updates to the client.

D. Adaptability
The spatial distribution of game universes along with region handovers ensures scalability of player and object capacities with a growing game world and seamless movement between regions, assuming that players are equally distributed throughout the game world. In reality, however, players are attracted by hot spots that become overcrowded very fast, while many regions are empty. We can distinguish different levels of adaptability:
Static, homogeneous. The region processes, running regions with homogeneous bounds, are distributed over several hardware units 7. Some servers run multiple, only occasionally used regions, while a process running a usually crowded region may run on a dedicated machine. Processes that get too crowded may be shut down manually, save their state and be recreated on processing units with less load. Players currently in the region will need to disconnect.
Static, heterogeneous. Like above, but the bounds of a region may be dynamically established, which results in regions with heterogeneous sizes and neighbor counts. Game operators try to predict the number of concurrent users in an area of the game world and use smaller regions in crowded areas than in less crowded ones, thus reducing the number of players per region process. The problem is that the smaller the regions in an area of the world are, the higher the IPC 8 per unit of area/space will be, caused by many handovers. A player moving through a crowded area will experience considerably more lag caused by the handovers between small regions.
Hot-Swapping Regions [6], homogeneous / heterogeneous. If a region's process is highly utilized, the process can be serialized and transferred to a hardware unit with more dedicated processing power and bandwidth. In turn, processes that tend to have only a small crowd can be moved to machines that host many low-utilized regions. During the swapping, players may notice considerable lag, but will not get disconnected.
Challenges are connection management, load measurement and the selection of an algorithm that decides when to swap. Since a region swap itself may need a lot of processing power and bandwidth, it is important that high throughput between the servers is possible and that a large hysteresis is used in the swapping decision algorithm.
Region Splitting, heterogeneous. Here, regions can not only be transferred between servers, but can even be split into smaller regions to distribute load onto many hardware units, if necessary. Regions with low utilization can, in turn, be merged into a larger region to avoid unnecessary handovers/IPC between them. This is the most advanced grid-based adaptation mechanism discussed here.
Other strategies might be to provide incentives for users to move to less crowded regions, like taxes on transactions in highly crowded ones.

7 In this context, hardware units are units with one or more processor cores, their own memory and network interface. A good example of a hardware unit is a dedicated server or a server blade.
8 Inter-Process Communication

E. Appearance, Inventory and Skills
Large-scale network games are well received among players, not least due to the possibility to customize an avatar in many aspects and to gain virtual property, skills and strength. These aspects have not been considered so far. In a game where these properties are of importance, they strongly influence the possible actions of a player and the outcome for the environment. In games where the properties of a player, especially the skills, are important for the player's esteem, it is desirable to ensure that these properties can only be altered in the ways the rules of the game allow. The common approach is to establish communication between the region server and a central or distributed database management system that maintains the player properties.
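The exchange between a region server and the player-property database can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `PlayerStore` is an in-memory stand-in for the central database, its set of trusted region-server names models the trust the database places in the game servers, and `train_skill` with its cap of 300 is a hypothetical game rule that the region server checks before a change is committed.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerRecord:
    """Player properties kept by the database (hypothetical fields)."""
    skills: dict[str, int] = field(default_factory=dict)
    inventory: list[str] = field(default_factory=list)


class PlayerStore:
    """In-memory stand-in for a central player-property database."""

    def __init__(self, trusted_servers: set[str]):
        self._records: dict[str, PlayerRecord] = {}
        self._trusted = set(trusted_servers)

    def load(self, player: str) -> PlayerRecord:
        # Hand out a copy so a region server cannot modify the stored record directly.
        rec = self._records.setdefault(player, PlayerRecord())
        return PlayerRecord(dict(rec.skills), list(rec.inventory))

    def commit(self, server: str, player: str, record: PlayerRecord) -> None:
        # The database only accepts writes from region servers it trusts.
        if server not in self._trusted:
            raise PermissionError(f"{server} is not a trusted region server")
        self._records[player] = PlayerRecord(dict(record.skills), list(record.inventory))


MAX_SKILL = 300  # hypothetical rule: skills are capped


def train_skill(record: PlayerRecord, skill: str, amount: int) -> None:
    """Rule check on the region-server side before a change may be committed."""
    new_value = record.skills.get(skill, 0) + amount
    if amount <= 0 or new_value > MAX_SKILL:
        raise ValueError("change violates the game rules")
    record.skills[skill] = new_value
```

A region server would `load` a record, apply rule-checked changes such as `train_skill(rec, "mining", 5)` and `commit` the result under its own server name; a commit from an unknown server raises `PermissionError`. The sketch only covers the database trusting the region server, not the reverse direction.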
Trust is an important aspect that is required in both directions to make cheating as hard as possible. The region process has to trust the database to return the player's authentic properties. Conversely, the database has to trust the game server to change the player's properties only in ways that conform to the game's rules. If all of the infrastructure is in the hands of the game provider, this trust is easily ensured. All popular MMO games are currently maintained by a central, authoritative instance. In all of them, avatar skills are crucial for the game experience. Second Life even has a currency that can be exchanged for real-world money. Although player properties differ greatly among MMOs and the boundaries overlap, we try to introduce a certain nomenclature and its meaning for infrastructure decisions:
Transient: Player properties are not important for the development of a character. A loss of the properties will not mean a great loss for the player, nor will it influence the game perception of other players. This is typical for games for occasional players, for example a wide-scale car racing game where you just select a car with a certain appearance and gain stats while you are online. This makes things easy from the database management's point of view, since little or no communication is necessary between the database and the region servers.
Evolving: Player properties are developed over frequent game sessions. Typically, skills are attained, and items are created, taken into an inventory, and bought from and sold to other residents. The properties attained have a strong effect on the game experience for the player and possible opponents. This usually requires moderately more communication between the database and the region servers, since the region server needs to know more about the player's properties for calculating interaction with the environment.
Nevertheless, the main problem will be the higher trust demands between the region server and the game database, since the motivation for cheating and hacking the game will be correspondingly higher.
Vendor-generated: This relates especially to items and appearance. The finite set of items is defined and implemented by the vendor (or another game authority). The advantage for client-server communication is that the client can store the representational state of these items (i.e. the 3D shape and textures) locally and load it on demand; the server only needs to update the client about a current item/appearance identifier, if needed. Databases storing the player properties do not suffer from storing and transmitting the whole item representation.
User-generated: Items and appearance are not defined by the vendor, but rather created by the players themselves. The possible set of items is infinite: if you can think of it, you can create it. The impacts on infrastructure and requirements are obvious: game databases have to store more information for the representation (for use at the client) and the behavior (for use at the region servers) of such items. Additionally, this results in much more traffic between database, regions and clients.
A good example of a borderless transition between vendor-generated and user-generated player properties is the appearance creation in Second Life: here, you can adjust properties of your body with many sliders, like chin length, face width, hair at the front, waist and leg size and many more. By adjusting them randomly, you will likely create a character unique among all players, although the function that derives a geometric shape from the slider positions is created by Linden Labs, the publisher of Second Life.

IV. BACKEND STRATEGIES IN PRACTICE: POPULAR MMO-GAMES
This chapter presents three popular large-scale multiplayer online games: Second Life, World of Warcraft and Eve Online. It explains how they implement the generic client-server and grid model constructed in the previous chapters, and shows how they conceptually differ from it. In some cases, detailed specifications of the backend communication are of strategic importance for the vendors, leading to non-disclosure [22] and even legal action against projects trying to reverse-engineer the backend infrastructure [4].
A. Second Life
In Second Life, the world map is divided into square regions of 256 m × 256 m. Every Second Life player runs a viewer that authenticates at a central authentication server. Viewers are spatially distributed on processes called simulators. Every simulator handles one such 256 m × 256 m region: it keeps track of the region's state, sends updates to the clients that are currently connected to the region and acts as a proxy to further, generally central servers for the client. The simulator's purpose is focused on the following jobs:
Keeping track of the avatars currently in the region: The simulator keeps track of the avatars' positions. An avatar can enter a region either by teleporting or by walking/flying into it from another region. The viewer the avatar is using tells the server about keystrokes, chat messages and other commands the player has entered. This way, the server can update the avatar's state and calculate interactions with the world.
Storing objects in the region: If permitted by the land owner, an avatar connected to a simulator can place an asset at a location of choice (called rezzing in Second Life nomenclature). There are two ways to rez an item, with different backend communication paradigms. Second Life allows building custom objects from primitive shapes (e.g. a cube, a cylinder or a torus). To do so, action messages to create the primitive shape and further ones are sent to the simulator you are currently on. The other way is to take an object from your inventory for rezzing. For this, the simulator calls an asset server, where your inventory is stored, loads the complete object from your inventory and places it where you selected it to appear.
Communication with adjacent regions: We already mentioned that the avatar may leave a region and enter another one.
In order to do this properly, adjacent regions are connected via a circuit (UDP connection) and must at least transfer the current state, the entrance position and the motion vector of the avatar. Since the avatar may carry objects with it (e.g. when sitting in a car and driving into another region), more complex objects have to be transferred between regions as well. As a Second Life user, you will experience a short freeze while moving between two regions, especially when one of them is crowded and thus overloaded.
Physics: The server processes object interaction using a physics engine. This includes, but is not limited to, collision detection. The Second Life simulators run the physics engine Havok 4. Non-interactive physics (like the fluttering of capes in the wind) is done at the viewer side [24].
Execution of Scripts: In Second Life, scripts can be applied to objects. They are written in the Linden Scripting Language (LSL), a language created solely for Second Life, and are executed in a virtual machine on the region server. LSL scripts strongly interact with the environment and the physics engine.
Periodic updating of the viewers: If objects or avatars in the region change, these changes are propagated to the viewers. To reduce the bandwidth needed for updates, the simulator performs visibility computation for each viewer and only sends it objects that may be visible. The algorithms used for visibility computation are currently closed-source, but may be improvable [24].
Running one simulator per 256 m × 256 m square does not mean that the simulator has to run on a dedicated server. A server may run multiple simulators at once if they are not overcrowded. By far the most traffic-consuming property of Second Life, action and environment reaction, is distributed in a location-based way, but there are still central instances for many transactions:
A login server authenticates the viewer and performs a handshake with the region to connect to.
The authentication procedure is done via an XML remote procedure call.
Second Life organizes its square regions in an xy coordinate space. A space server knows the location of every region in the grid and provides the server associated with a given region. E.g. a server can ask the space server for its neighbors.
Asset servers and further central database servers. For these, Linden Labs uses storage clusters from Isilon Systems [1].

Fig. 5: User-generated content in OpenSimulator, a reimplementation of the Second Life backend: a catamaran. Later, scripts can be applied that set the sails according to the wind and make it able to move.

While testing Second Life, we noticed that nearly every interaction which depends on the asset storage cluster is laggy. When rezzing a medium-detailed object, even as the only viewer in an uncrowded region, the object takes up to 10 seconds to appear. Of the games introduced here, Second Life has by far the most user-generated content (see Figure 5). Second Life's main bottleneck is the asset clusters, which often reach their capacity limits [1].
B. World of Warcraft
World of Warcraft currently has 11 million subscribers [31]. These subscribers are most likely not stale accounts, since the players pay a monthly fee and can therefore be expected to play regularly. Thus, Blizzard, Inc. has the financial power to invest in a large-scale infrastructure. Blizzard is the company that discloses the least about its infrastructure [4], [22], but it is still possible to determine key concepts related to our reference architecture. The main concepts are the environment generated solely by Blizzard's staff, the creation of hundreds of copies of the world called realms, and instancing, a method to create regions on demand that are accessible only to certain groups of players.
1) Vendor-Generated Environment: Blizzard has a team of 51 artists [10] dedicated to creating the environment, appearances and 1.5 million [10] unique items. The shapes, textures, animations and behavior of these assets are not placed on the (region) servers to be sent to the clients on demand.
Instead, these items are shipped with the game and installed at the client as compressed files. This is presumably why World of Warcraft needs up to 15 gigabytes of disk space [16]. Compressed files are unpacked and loaded into memory whenever they are needed. This concept relieves the servers of a lot of client update traffic: a region server only has to supply clients with updates regarding the following parts of the environment:
Avatars of other players (like movement, gestures etc.)
Non-player characters (NPCs)
Mobile objects (called MOBs)
Updates of skills and stats, chat messages
Whenever the vendor-generated content of World of Warcraft changes, patches are released. These patches can be installed not only via Blizzard's patch servers; a recommended tracker-based peer-to-peer network can be used as well. This additionally saves Blizzard bandwidth for media distribution and gives users faster download access.
2) Realms: The World of Warcraft is not unique; rather, there are hundreds of copies of the world. These copies are called realms. Currently, there are 200 realms in North America alone. Realms do not interact: an avatar can only be created for a particular realm, and avatars are transferred between realms only restrictively, either when a realm gets overloaded or on a paid basis, and the transfer can then take several days. Creating separate realms has two advantages:
Lower infrastructure cohesion. The inter-process communication required is reduced to a per-realm basis. Every realm only needs one separate database and storage cluster. As the audience for World of Warcraft increases, a well-proven infrastructure only needs to be copied to create additional realms, without concern for scalability.
Lower world size.
The size of the world can be smaller than in a unique-world model, since we have to assume that the world size has to increase with the number of players the world contains. This results in lower design effort for the staff and reduces the disk space needed by the clients and the bandwidth needed for patching.
In the World of Warcraft community, it is common to confuse realms with servers. But a realm always consists of multiple region and database servers. Currently, World of Warcraft employs more than servers around the world [31]. If we assume a total of 400 realms worldwide, we can estimate that about 30 servers work for a realm on average.
3) Instancing: Some regions in World of Warcraft, especially dungeons, are created solely on a per-group basis. This means that whenever a group of players enters such a dungeon, a separate instance of this dungeon is created solely for these players. Other players cannot enter this instance; instead, when they enter the dungeon too, a new one is created for them in turn. Additionally, the game experience in such instanced regions is different: it is based on a sequence of obstacles the player has to face rather than on the typical networked virtual environment behavior where the player can enter and leave whenever he or she wants. An instance region presumably runs in a dedicated process for a group of players of limited size, which reduces lag due to server congestion in areas with hard and complicated tasks, where such lag would have the most undesirable impact on the game experience.
4) Transport Protocol: World of Warcraft, like many client-server online games, uses TCP as its transport protocol.
C. Eve Online
Although the game is fully proprietary, CCP, the vendor of Eve Online, in contrast to Blizzard, allows a rough insight into its backend infrastructure [6], [7]. Like Second Life, but in contrast to World of Warcraft, Eve Online uses a single-world approach rather than creating several copies of the world.
1) Distribution Strategy: In terms of our reference model, regions 9 in Eve Online are solar systems, where every solar system is a process that can be distributed among Eve's more than 5000 IBM blade servers. Eve uses a trick offered by its sci-fi genre that makes inter-region communication like handovers easy from an infrastructure designer's view: in the Eve universe, travel between solar systems is only possible using jump drives or jump gates [2].
Travel time: Travelling between solar systems takes some time until arrival. This time can be used for the handover between the two region processes without any haste. Lag may occur here without an impact on the game experience.
Invisible adjacency: Because of the large distances, events in other solar systems are not visible to the players in a particular system. This makes it unnecessary to update clients across regions.
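How travel time can hide a handover may be sketched with Python's `asyncio`. This is an illustration under assumptions, not CCP's implementation: `transfer_state` is a hypothetical stand-in for the handover between two solar-system processes, durations are in seconds, and the handover is started as soon as the jump begins so that it runs concurrently with the travel. The player only notices an extra wait if the handover takes longer than the travel itself.

```python
import asyncio


async def transfer_state(events: list[str], duration: float) -> None:
    """Hypothetical stand-in for handing a ship over to the target system's process."""
    await asyncio.sleep(duration)
    events.append("handover complete")


async def jump(travel_time: float, handover_time: float) -> tuple[list[str], float]:
    """Start the handover together with the jump and let it run during the travel.

    Returns the order of events and the extra waiting time the player would notice.
    """
    events: list[str] = []
    handover = asyncio.create_task(transfer_state(events, handover_time))
    await asyncio.sleep(travel_time)      # ship is in transit, nothing to synchronize
    events.append("travel over")
    await handover                        # only blocks if the handover is still running
    extra_wait = max(0.0, handover_time - travel_time)
    return events, extra_wait


if __name__ == "__main__":
    print(asyncio.run(jump(travel_time=0.05, handover_time=0.01)))
```

With a 0.05 s jump and a 0.01 s handover, the handover finishes before the travel ends and no extra wait occurs; swapping the two durations makes the player wait for the remaining handover time.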
Eve's database is centralized and runs on an SQL server.
2) Adaptability: Eve is a good example of MMOGs suffering from mass-attraction phenomena, which constantly overload several regions. As explained in our reference architecture, Eve runs crowded regions on dedicated machines, while other regions share a server. Eve plans to extend its adaptability to hot-swapping regions between servers, and even to splitting up transactions inside a region process onto multiple processing cores, using InfiniBand, a high-bandwidth, low-latency interconnect [6].
3) Stackless Python [14]: To support highly concurrent environments, Eve exploits language-specific tools of Stackless Python. Instead of keeping the state of a called subroutine (like a procedure) on the C call stack, Stackless Python stores this information in its own data structures. This allows for fast switching between tasklets (microthreads) on a single core, since context switching is done by its own virtual machine. For example, if hundreds of player-versus-player fights are handled by one server, we may have hundreds of threads calculating the outcomes of the fights. Stackless Python handles such numbers of concurrent threads better than conventional stack-based threading. It does not provide an inherent efficiency improvement for multi-processor threading [15]. Stackless Python is used not only at the server side, but also at the client side.

9 Note that Eve uses the term region in a different context than our reference model. In Eve, regions are groups of solar systems. In this paper, region always refers to the term of our generic architecture.

4) Proxies: Another characteristic of Eve are proxies, which hide the backend connections of the clients to the region servers. An Eve player stays connected to a proxy server throughout the game session, no matter which solar system the client is in.
No further connections have to be set up by the client when changing the region. Additionally, the proxy server concept makes it possible to supply the region server with trusted information about the player, like skills and items, protected against cheating, consistent and with mutually exclusive access. This may help to relieve the central database.
V. CONCLUSION
In Eve and World of Warcraft, the world is created mainly by the vendor. Of all games introduced here, Second Life is the only multiplayer environment that really allows the creation of user-generated content in all of its aspects. Coming back to Kumar's Key Features of a Virtual World [1], it is the only virtual world that really fulfills most of these aspects in terms of current architecture, licensing and the vendor's motivation. But, as we said, Second Life currently has to cope with heavy infrastructure problems caused by its asset storage cluster. World of Warcraft and Eve Online perform better by providing vendor-generated content, at the cost of limiting the opportunities to customize the world.
VI. FUTURE PROSPECTS
Linden Labs, Inc. recently released the viewer under an open-source license. Soon after, the community launched the open-source projects OpenSimulator (.NET/Mono) and the Hippo OpenSim Viewer. OpenSimulator is an open implementation of the Second Life grid. Based on OpenSimulator, OSGrid was launched, a grid where everybody can set up their own region servers and connect them to the grid. OpenSimulator also lets new ideas emerge, like the Hypergrid, which uses a Web-based concept to interconnect heterogeneous region servers. The original Second Life idea has already reached protocol status 10. In the future, we will probably have a virtual universe with defined, open-source protocols and interfaces, which game designers will use to build their own games on top of it. But until then, problems with efficiency, scalability, adaptability and lag have to be solved.
Peer-to-peer networked virtual environment infrastructure is a current research topic. The work formerly done by servers is distributed among the players' hosts themselves, making a backend infrastructure hosted by the vendor obsolete. This would allow vendors to publish online games without significant infrastructure costs.

10 Metaverse Exchange Protocol, see

113 LARGE-SCALE MULTIPLAYER GAMES AND NETWORKED VIRTUAL ENVIRONMENTS REFERENCES [1] Second life grid update from fj linden. blogs.secondlife.com/community/features/blog/2009/01/12/ second-life-grid-update-from-fj-linden, January I-C, IV-A, V [2] About eve online. Online, May IV-C1 [3] A beginner s guide to creating a mmorpg. articles/building-mmorpg/, May [4] Blizzard v. bnetd. May IV, IV-B [5] Configuration of wow backend servers. configuration-of-wow-backend-servers html, May [6] Eve evolved: Eve online s server model /09/28/eve-evolved-eve-onlines-server-model/, May III-D, IV-C, IV-C2 [7] Eve online architecture. eve-online-architecture.html, May IV-C [8] Gdc austin: An inside look at the universe of warcraft. gamasutra.com/php-bin/news_index.php?story=25307, May [9] Immense scale and interconnection in eve-online. edu/gamegeog/2010/02/12/109/comment-page-1/, May [10] Interviews: World of warcraft - lead designer rob pardo. gamespy.com/pc/world-of-warcraft/568494p2.html, May IV-B1 [11] Online game architecture: Back-end strategies. com/gdc2005/features/ /esbensen_01.shtml, May II-E, II-F [12] Opensim load balancing and region splitting. wiki/opensim_load_balancing_and_region_splitting, May [13] Second life wiki. May II-D [14] Stackless python project size. May IV-C3 [15] [stackless] stackless python in a multicore environment. stackless.com/pipermail/stackless/2007-august/ html, May IV-C3 [16] World of warcraft community site. May IV-B1 [17] Ahmed Abdelkhalek, Angelos Bilas, and Andreas Moshovos. Behavior and performance of interactive multi-player game servers. In Cluster Computing, [18] Susanne Busse, Ralf-Detlef Kutsche, Ulf Leser, and Herbert Weber. Federated information systems: Concepts, terminology and architectures. Technical report, [19] Mark Claypool and Kajal Claypool. Latency and player actions in online games. Commun. ACM, 49(11):40 45, II-B [20] Matthias Dick, Oliver Wellnitz, and Lars Wolf. 
Analysis of factors affecting players performance and perception in multiplayer games. In NetGames 05: Proceedings of 4th ACM SIGCOMM workshop on Network and system support for games, pages 1 7, New York, NY, USA, ACM. II-E [21] Pawan Goyal, Harrick M. Vin, and Haichen Cheng. Start-time fair queuing: A scheduling algorithm for integrated services packet switching networks. In In Proceedings of ACM SIGCOMM 96, pages , II-C [22] Bruce Hack, Mike Morhaime, Jean-Francois Grollemund, and Nichol Bradford. Introduction to vivendi games - investor presentation /y22210exv99w1.htm, May IV, IV-B [23] Wolfgang Karner, Markus Rupp, and Philipp Svoboda. Traffic analysis and modeling for world of warcraft. Technical report, [24] Sanjeev Kumar, Jatin Chhugani, Changkyu Kim, Daehyun Kim, Anthony Nguyen, Pradeep Dubey, Christian Bienia, and Youngmin Kim. Second life and the new generation of virtual worlds. Computer, 41(9):46 53, II-F, IV-A [25] Jay Lee. Relational database guidelines for mmogs. gamasutra.com/resource_guide/ /lee_01.shtml, May [26] Emmanuel Léty, Thierry Turletti, and François Baccelli. Score: a scalable communication protocol for large-scale virtual environments. IEEE/ACM Trans. Netw., 12(2): , [27] Michael R. Macedonia, Michael J. Zyda, David R. Pratt, Paul T. Barham, and Steven Zeswitz. Npsnet: A network software architecture for large scale virtual environments, II-D [28] Martin Mauve. How to keep a dead man from shooting. In In Proceedings of the 7 th International Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS) 2000, pages , II-A [29] Lothar Pantel and Lars C. Wolf. On the impact of delay on realtime multiplayer games. In NOSSDAV 02: Proceedings of the 12th international workshop on Network and operating systems support for digital audio and video, pages 23 29, New York, NY, USA, ACM. [30] J. R. Parker and Nathan Sorenson. A novel network architecture for crowded online environments. 
In Sandbox '08: Proceedings of the 2008 ACM SIGGRAPH symposium on Video games, New York, NY, USA, ACM. [31] Brendan Sinclair. Blizzard outlines massive effort behind World of Warcraft. May [32] Daniel Terdiman. "Second Life": Don't worry, we can scale. http: // May [33] Tracy V. Wilson. How World of Warcraft works. howstuffworks.com/world-of-warcraft.htm, May

114 P2P ONLINE SOCIAL NETWORKS P2P Online Social Networks Sascha Nordquist Abstract This paper discusses P2P online social networks. Existing online social networks (OSNs) are based on a client-server architecture. A P2P approach can help solve several of their problems, such as security, privacy, performance, and cost. To this end, this paper presents the goals, problems, advantages, and disadvantages of P2P OSNs. As representatives, the P2P online social networks PeerSoN, Safebook, and LifeSocial are introduced and compared with respect to the main functions of an OSN. I. INTRODUCTION AND MOTIVATION Online social networks are becoming increasingly popular, and for some Internet users Facebook even is "the Internet". There are hundreds of social networks 1. According to Facebook, one of the largest social networks, Facebook alone has more than 400 million active users. On average, a user there has 130 contacts, and Facebook's users spend a combined total of more than 500 billion minutes per month on the site. This makes it all the more important to pay attention to security and privacy in these social networks. It has repeatedly turned out that user data is spied on through security holes in these systems or sold to third parties. OSNs are used to exchange data with other Internet users (for example family, friends, or colleagues). For this purpose, each user can normally create a profile describing themselves. Another core function of an OSN is contact management: users can exchange data (messages, pictures, and so on) with their contacts via the OSN. Many users of Facebook, StudiVZ, or other networks do not even know who looks at their private pictures or guestbook entries, or what this data is used for.
As a result, many users do not think about what kind of personal information, for example photos from the last party, they publish in their profile. This can become a problem for their later career, since many companies nowadays look at the profiles of prospective employees before hiring them 2. Even after a user deletes their profile, some OSN providers keep the data stored, and the user no longer has any control over it. The problems of an OSN with a central server include security, privacy, cost, and server load. To solve some of the problems of conventional client-server OSNs, several decentralized peer-to-peer (P2P) approaches exist. One of the main advantages of a P2P online social network is that there is no central server. Communication between two users of the OSN therefore does not pass through a server; it takes place directly between the two users or via other peers in the network. The remainder of this paper first covers some fundamentals and definitions (Section II). Section III discusses the advantages and disadvantages of a decentralized architecture. Then the goals (Section IV-A) and problems (Section IV-B) of peer-to-peer online social networks are explained. After that, the approaches found are assigned to the respective goals (Section V). Section VI describes the main functions of a social network and compares the three P2P approaches PeerSoN, Safebook, and LifeSocial. Section VII then covers privacy, Section VIII security, Section IX availability, and Section X data integrity. Section XI concludes the paper. 1 Top 100 social networks in Germany: top-100-soziale-netzwerke-deutschland/ II. FUNDAMENTALS AND DEFINITIONS A. DHT A DHT (distributed hash table) is a special kind of P2P network whose basic function is storing data in the P2P network. Data is stored as (key, value) pairs, and the key is used to retrieve the value. OpenDHT, KAD, and FreePastry are some of the many DHT implementations. KAD (Kademlia) has the advantage that each peer can be assigned an identity. B. OSN: Online Social Network In general terms, an online social network is an Internet service that enables communication and friendship relations between people. Examples of large social networks are Facebook, MySpace, StudiVZ, and many more. C. P2P OSN A social network based on a peer-to-peer architecture. D. Peer of a P2P OSN A peer of a P2P OSN is, for example, a computer, laptop, smartphone, or similar device that participates in the P2P OSN.
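Section II-A describes a DHT as a (key, value) store whose entries are spread over the peers. The following is a minimal single-process sketch of that idea, not of OpenDHT, KAD, or FreePastry: it assumes Chord-style successor placement on a SHA-1 identifier ring (Kademlia, for instance, assigns keys by XOR distance instead) and omits routing, replication, and churn. The names `ToyDHT`, `put`, and `get` are hypothetical.

```python
import bisect
import hashlib
from typing import Optional


def ring_id(name: str) -> int:
    """Map a peer name or a data key to a 160-bit position on the identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Single-process model: every key lives on the first peer at or after hash(key) on the ring."""

    def __init__(self, peers: list[str]) -> None:
        self.ring = sorted((ring_id(p), p) for p in peers)
        self.positions = [pos for pos, _ in self.ring]
        self.store: dict[str, dict[str, str]] = {p: {} for p in peers}

    def responsible_peer(self, key: str) -> str:
        # First peer whose position is >= hash(key); wrap around past the largest ID.
        idx = bisect.bisect_left(self.positions, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: str, value: str) -> None:
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.store[self.responsible_peer(key)].get(key)


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("alice/profile", "<encrypted profile>")
print(dht.responsible_peer("alice/profile"), dht.get("alice/profile"))
```

Because every peer computes the same hash, any peer can locate the responsible node for a key without a central index; this is the property the P2P OSNs below rely on.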

E. Man-in-the-Middle Attack Figure 1: Man in the middle 3 In this attack, Mallory (Figure 1) can eavesdrop on, manipulate, or suppress the data exchanged between Alice and Bob without Alice and Bob noticing. F. Sybil Attack A Sybil attack occurs when an attacker populates a P2P network with many forged identities. The attacker thereby gains control over part of the network and can, among other things, suppress or misroute data. G. Eclipse Attack An eclipse attack can be seen as a continuation of a Sybil attack: the attacker has so much control over a P2P network that they can, for example, split the network into several parts. H. Impersonation Attack An impersonation attack is mainly about faking a false identity. In a P2P OSN this would be the case if one user could pose as another. I. Matryoshka Matryoshkas are small wooden dolls that nest inside one another. Safebook, for example, uses this principle as a model for storing data in the P2P network. 3 Source of Figure 1: Figure 2: Structure of a matryoshka [9] In Safebook, matryoshkas are nested rings. The innermost node is a person, or rather that person's peer. The innermost ring contains this person's contacts. The second ring contains the contacts of the users in the first ring (friends of friends). The following rings are built in the same way: friends of friends of friends, and so on. However, the users from the second ring outward are not necessarily related to one another. The links between the nodes in Figure 2 are the contact relationships between the individual persons. J. TIS Trusted Identification Service. Safebook uses this service to give each user a unique identity. [10] III. DECENTRALIZED ARCHITECTURE A. Advantages Since conventional OSNs run on a central server, all traffic is concentrated on that one server, and the more users such an OSN has, the more expensive the server becomes. In a P2P approach there is no central server, and the load is distributed across the individual peers. A peer in a P2P system is a participant in the network (for example a PC, notebook, PDA, or smartphone). This reduces costs for the operator of a P2P OSN. Moreover, the operator of a growing OSN does not have to expand the infrastructure, because in a P2P network the infrastructure grows by itself with every peer that joins. The larger such a P2P network becomes, the more robust it becomes. B. Disadvantages Since P2P OSNs have no central service provider, nobody deletes malicious or illegal content or blocks malicious users. [3]
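The nested rings of Section II-I can be illustrated by grouping users by their hop distance from the core user in the contact graph. A minimal sketch, assuming the whole contact graph is available locally as a dictionary; Safebook builds and maintains its rings in a distributed way, and its actual protocol is not modelled here. `matryoshka_rings` is a hypothetical helper.

```python
def matryoshka_rings(contacts: dict[str, set[str]], core: str, depth: int) -> list[set[str]]:
    """Group users by contact distance from `core`: ring 1 = contacts, ring 2 = friends of friends, ..."""
    rings: list[set[str]] = []
    seen = {core}
    frontier = {core}
    for _ in range(depth):
        # Everyone reachable in one more hop who is not already in an inner ring.
        next_ring = {
            friend
            for user in frontier
            for friend in contacts.get(user, set())
            if friend not in seen
        }
        if not next_ring:
            break  # the graph is exhausted before `depth` rings are filled
        rings.append(next_ring)
        seen |= next_ring
        frontier = next_ring
    return rings


contacts = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob", "frank"},
    "erin": {"carol"},
    "frank": {"dave"},
}
print(matryoshka_rings(contacts, "alice", depth=3))
```

Here dave and erin end up together in Alice's second ring although they are not contacts of each other, matching the remark that users from the second ring outward need not be related.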

IV. GOALS AND PROBLEMS OF P2P OSNs A. Goals Privacy can be improved, since each peer controls its own data and the data is not stored on a central server. Security can be improved because, for example, there is no central server that could become the target of an attack. Performance can be improved and costs reduced, since the computational load can be distributed across the peers. File sharing: users can, for example, exchange data with their friends. B. Problems Where is the data stored? What happens when a peer goes offline? How and where can a user register with the network? How and where can a user log in? How is privacy ensured? How is manipulation of data prevented? How can users search for contacts? What happens when many peers go offline at the same time? V. CLASSIFICATION OF THE APPROACHES FOUND A. PeerSoN PeerSoN 4, which is a kind of predecessor of Safebook, focuses on privacy and security. PeerSoN is a prototype of a P2P OSN [3], [4] and is based on a DHT (OpenDHT). B. Safebook Safebook 5 is a P2P online social network that focuses on privacy and security [9], [11]. It combines a DHT with a ring of trust (matryoshka) in which data is stored only on trusted peers. Safebook consists of the following three main components [11]: 1) a set of matryoshkas (see II-I), 2) a P2P substrate (for example a DHT, see II-A), and 3) a trusted identification service (TIS, see II-J). C. LifeSocial LifeSocial 6 focuses on the benefits a P2P OSN offers in privacy, security, performance, and cost [5], [7]. D. RetroShare, Tribler, Maze RetroShare 7, Tribler 8 [2], [13], and Maze [8] target file sharing. These systems are mainly file-sharing programs that implement social networking functions only as a side feature.
For this reason, they are not discussed further in the rest of this paper. 4 PeerSoN: 5 Safebook: 6 LifeSocial: 7 Retroshare: 8 Tribler: E. Likir Likir is a P2P framework [12] on which an OSN can be built. Its main goal is protection against attacks (security) such as Sybil and eclipse attacks by means of an identity-based DHT. For this purpose, Likir defines a new kind of DHT based on KAD. Likir consists of three modules: a Certification Service (CS), which assigns a unique user ID to a user during registration; an interaction protocol, which uses asymmetric cryptography to exchange signed IDs; and a Reputation System (RS), which lets peers rate other peers so that untrustworthy peers can be identified. F. diaspora "The privacy aware, personally controlled, do-it-all, open source social network." 9 As this self-description says, this P2P OSN focuses on privacy. It is not considered further here, because no papers or implementation details have been published about it. G. HelloWorld HelloWorld is an open-source, distributed social network in which each user decides on which server their data is stored. Among other things, it offers a P2P client for sending and receiving messages directly. It also places great emphasis on privacy [1] 10: if users store their data on a local server, they can decide themselves which data to show, change, or delete. This network is not a classic P2P system; it is nevertheless mentioned here because it is an approach to decentralizing a social network. VI. BASIC FUNCTIONS OF AN OSN A. Creating an Account To participate in a social network, a user first has to create an account. There are different strategies for this.
On the one hand, there are social networks in which a user must be invited by another user before being able to create an account. This has the advantage that a new user already has a contact right after registration. Other networks, in contrast, allow registration without a prior invitation. Some of the larger social networks, such as Facebook and wer-kennt-wen, switched their strategy after some time from invitation-only to open registration. 1) PeerSoN: PeerSoN allows open registration. For identification it uses a hash of the user's email address or public key. 9 diaspora: 10 HelloWorld:
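The identifier scheme of Section VI-A1 (a hash of the email address or public key) can be sketched as follows. This is a minimal illustration, assuming SHA-1 as the hash and lower-case normalization of the address; the paper does not say which hash or normalization PeerSoN uses, and both function names are hypothetical. The last lines also show why Section VIII-B considers Sybil attacks unavoidable with this scheme.

```python
import hashlib


def guid_from_email(email: str) -> str:
    """Hypothetical PeerSoN-style GUID: hex SHA-1 digest of a normalized email address."""
    normalized = email.strip().lower()  # assumption: addresses compared case-insensitively
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def guid_from_public_key(public_key: bytes) -> str:
    """Hypothetical alternative: hash the encoded public-key bytes unchanged."""
    return hashlib.sha1(public_key).hexdigest()


# The same address always maps to the same 40-hex-digit identifier...
print(guid_from_email("Alice@Example.org") == guid_from_email("alice@example.org"))  # prints True
# ...but nothing stops one person from registering many addresses, each with its own GUID.
sybil_ids = {guid_from_email(f"alice+{i}@example.org") for i in range(100)}
print(len(sybil_ids))  # prints 100
```

Hashing gives every peer a deterministic DHT key for a user without a central registry, but the GUID proves nothing about who controls it; that is the gap a certifying service such as Safebook's TIS is meant to close.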

2) Safebook: Safebook allows registration only by prior invitation [9]. This has the advantage that the availability of the user's own profile is better ensured (see Section IX). After a user has registered, their matryoshka is built. It is created once when the account is set up and kept up to date afterwards. 3) LifeSocial: LifeSocial offers open registration by entering a user name and a password. A minimal profile and a private and public key pair are then created for this user. B. Logging In Once an account exists, the user must be able to log in to the OSN with it. In OSNs such as Facebook or MySpace, this is done by entering a user name or email address and a password; the user logs in to the central server. In P2P OSNs, there are two ways to implement this: logging in at an arbitrary peer, or logging in at specific peers (e.g., the user's own contacts). 1) PeerSoN: In PeerSoN, the login takes place at an arbitrary peer. After a successful login, PeerSoN notifies the DHT that the peer is now online. 2) Safebook: In Safebook, users log in only at peers within their own matryoshka; the login process starts at the outermost ring of the matryoshka [9]. 3) LifeSocial: LifeSocial [5] allows users to log in at any peer in the network. Login works similarly to registration: if the registration procedure detects that the user already exists, the user is logged in. To do this, the peer used for the login sends a request to other peers to look up the user's unique nodeid. C. Managing Contacts A social network can be understood as a graph of contacts.
To extend this graph, a social network must let users manage their contacts. Social networks implement this in two different ways. Either every user can add contacts without confirmation from the other user, which leads to a graph whose edges (contact relationships) point in one or in both directions (a directed graph), or a contact request must first be accepted by its recipient before the contact is added to the contact list, which leads to an undirected graph. In all of the P2P OSNs presented here, a contact request must be accepted by the recipient for privacy reasons. D. Exchanging Data with Contacts Another basic function is exchanging data with one's contacts in the form of messages, status updates, pictures, blog posts, guestbook entries, and the like. The P2P OSNs presented here (PeerSoN, LifeSocial, Safebook) place great emphasis on privacy, so all data exchanged between two communication partners is encrypted (see Section VII). E. Asynchronous Messages In OSNs, users can also write messages to users who are currently offline; the recipient should receive the message as soon as they come back online. In client-server systems, such messages can simply be stored in a database and read from it at the next login. P2P networks need a different solution. 1) PeerSoN: When peer A sends a message to peer B in PeerSoN, OpenDHT is queried to check whether peer B is online. If so, the message is sent directly to peer B. If peer B is offline, the message is stored temporarily in OpenDHT. However, messages are kept for only one week and must not exceed 800 characters.
(These are limitations of OpenDHT.) [4] 2) Safebook: Safebook also supports offline messages, which are handled in the same way as profile information. 3) LifeSocial: LifeSocial uses a DHT overlay to store offline messages temporarily. F. Searching for Contacts To build a social network, users first have to find their contacts in it, so OSNs offer a contact search. This is harder in P2P systems, because there is no central database in which all users are managed. Moreover, in the P2P systems presented here the problem is hard to solve: to find a person on the Internet, information about that user, such as their name, place of residence, or interests, must be publicly available. If, however, as in the systems presented here, all information is hidden from other users by default and essentially only the existence of a user can be determined, search requests will often fail. Instead, users could, for example, be invited directly or find contacts in the contact lists of their own contacts. A search function could be implemented in a P2P OSN by compromising on the privacy restrictions or by allowing users to publish content that every peer in the network can read. In that case, a search request with a certain TTL (time to live) could be sent to all contacts, which forward it until the TTL expires.
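The TTL-limited search suggested at the end of Section VI-F can be sketched as breadth-first flooding over the contact graph. This is a single-process simulation under stated assumptions: each peer exposes a publicly readable attribute (the hypothetical `public_city` mapping, i.e. the privacy compromise the text mentions), a peer that receives the query with TTL 0 does not forward it, and duplicates are suppressed with a `seen` set (a real protocol would need query IDs for this). None of the presented systems is claimed to work exactly this way.

```python
from collections import deque
from typing import Callable


def ttl_search(
    contacts: dict[str, set[str]],
    origin: str,
    matches: Callable[[str], bool],
    ttl: int,
) -> list[str]:
    """Flood a query from `origin` through the contact graph; it travels at most `ttl` hops."""
    hits: list[str] = []
    seen = {origin}  # duplicate suppression: each peer handles the query once
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and matches(peer):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL used up: this peer does not forward the query
        for friend in sorted(contacts.get(peer, set())):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, remaining - 1))
    return hits


contacts = {"alice": {"bob", "carol"}, "bob": {"dave"}, "carol": {"erin"}, "dave": {"frank"}}
# Hypothetical publicly readable profile attribute.
public_city = {"dave": "Darmstadt", "erin": "Berlin", "frank": "Darmstadt"}


def in_darmstadt(peer: str) -> bool:
    return public_city.get(peer) == "Darmstadt"


print(ttl_search(contacts, "alice", in_darmstadt, ttl=2))  # ['dave']
print(ttl_search(contacts, "alice", in_darmstadt, ttl=3))  # ['dave', 'frank']
```

The TTL bounds how far a query spreads, trading search reach (frank is only found with TTL 3) against the number of peers that see the request.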

VII. PRIVACY Privacy in OSNs concerns not only the protection of the profile information a user makes available to their contacts, but also the data sent between the user and their contacts, such as messages or pictures, and ensuring that these exchanges cannot be traced. A. PeerSoN PeerSoN achieves privacy by encrypting data, using private and public keys to give contacts access to personal information. In addition to encryption, PeerSoN uses access control, with which users can specify who may see which data [4]. B. Safebook Safebook relies on trust in the user's own contacts [9]. By default, all personal information about a user is hidden; only selected contacts have special rights and can query information about the user. Safebook also offers detailed access control that covers each attribute of the profile. When data is published, the user selects which class it belongs to: private (not published), protected (published and encrypted), or public (published and unencrypted) [9]. All communication between two users is confidential, so that nobody other than the two communication partners can obtain information from requests and responses. To achieve this, Safebook uses multi-hop routing [11] from the outside in, through the rings of the matryoshka. C. LifeSocial Like PeerSoN, LifeSocial uses encryption and fine-grained access control [5]. When a user creates new data, it is encrypted with a symmetric key.
The user can then choose which other users may access this data. The symmetric key is encrypted with the public key of each authorized user, so that each of them can decrypt it with their own private key. This ensures that only users holding a matching private key can decrypt the data. All of the P2P OSNs presented here additionally benefit from decentralization, since, unlike in a conventional OSN, the data is not stored on a central server. Through this decentralization, the OSN is independent of an operator, and misuse of the data by the OSN provider is prevented. VIII. SECURITY In P2P systems, encrypting user data is even more important than in web applications, because the data is stored not on one server but on many peers. Unencrypted data could be read or manipulated by unauthorized users. In the P2P OSNs presented here, all data is encrypted. A central server carries the risk of an attack that could lead to misuse of user data. Such an attack on a central data store is not possible in a decentralized network, because the data is not stored on a central server; if an attack on a single peer succeeds, the consequences are much smaller than for a central server, since only the data of a single OSN user is affected. In addition, all messages between users are encrypted, signed by the sender, and verified by the recipient, which prevents man-in-the-middle attacks. The following attackers are distinguished [9]: a malicious user, a malicious provider, and an attacker who can eavesdrop on or manipulate network traffic. A.
Safebook Safebook uses a trusted identification service (TIS, see II-J) for authentication, which prevents Sybil and impersonation attacks [10] by assigning each user their own pseudonym, a NodeID, and a corresponding certificate. The pseudonym is used as the key for the DHT. B. PeerSoN PeerSoN uses an untrusted external service (OpenDHT, see II-A), which makes its security and privacy weaker than Safebook's [9]. For identification, users in PeerSoN have a GUID (globally unique ID), which is a hash of the user's email address or of their public key. However, PeerSoN is not protected against impersonation and Sybil attacks. A possible countermeasure against impersonation attacks based on a challenge/response protocol is mentioned, but it has not yet been implemented in PeerSoN. Sybil attacks cannot be prevented, because it is easy to own many email addresses. C. LifeSocial Like PeerSoN, LifeSocial uses an external DHT, FreePastry [6]. After registration, the user is

provided with a unique userid and authentication information, with which they can log in at any peer. This rules out impersonation attacks. Because registration is open, a Sybil attack is possible, as in PeerSoN. IX. AVAILABILITY One problem of P2P OSNs is the availability of a peer's information when that peer goes offline. In addition, it must be possible to send data (e.g., messages) to peers that are offline. The problem in P2P is that the data has to be stored somewhere in the meantime; if it were stored only by the user, it would be unavailable as soon as the user goes offline. Another problem is keeping data up to date: if data exists redundantly on several peers and the owner updates it, all peers holding the old data must be updated. An important question here is where data is stored. Possible storage locations include [3]: a distributed hash table (DHT), random peers, or the user's contacts. A. PeerSoN PeerSoN, for example, uses a distributed hash table (DHT) to keep users' data available. B. LifeSocial For this, LifeSocial uses a storage and replication layer that implements the functionality of a DHT with ID-based and key-based routing in order to store, for example, friend lists, group memberships, album lists, and photo lists [6]. C. Safebook Safebook solves this problem with matryoshkas: a peer's data is also stored at the contacts of this peer (in the innermost ring of the matryoshka), so the more contacts a user has, the better the availability. Tests have shown that three to four matryoshka rings and about 23 contacts are sufficient to reach 90% availability.
[10] However, if a peer has only a few contacts, the availability of its data suffers. The user's own data is stored encrypted at their contacts [9]. One advantage of this approach is that updates are simpler [3]. X. INTEGRITY Integrity means that personal data must not be modified by unauthorized users. This can be achieved, for example, by having the owner sign the data. A. Safebook In addition to signatures, Safebook uses the TIS (definition: II-J) to make sure that the data was signed by the right person. B. LifeSocial In LifeSocial, each user has a unique userid and authentication information. XI. CONCLUSION This paper presented the core functions of an OSN and the approaches of the three most important P2P OSNs. Social networks are becoming increasingly popular on the Internet, and data theft or the sale of data to third parties happens again and again in these networks. To protect users' data, a decentralized solution would be better suited than a client-server solution: decentralization would protect the data against attacks by the provider, and the provider could not sell user data to third parties. Unlike in client-server solutions, however, more attention must be paid to encryption, because the data is usually stored redundantly on different peers. Whether such a network will catch on at a larger scale in the near future is questionable. Social networks thrive more on a wide range of features than on data security, since the majority of social network users do not think about what happens to their own data. In addition, a wide range of features keeps a social network from becoming boring quickly and keeps users logged in longer.
Furthermore, much more money is invested in commercial social networks than in networks from which nobody earns anything. REFERENCES [1] HelloWorld: An Open Source, Distributed and Secure Social Network. [2] S.M.A. Abbas, J.A. Pouwelse, D.H.J. Epema, and H.J. Sips. A gossip-based distributed social networking system. In Proceedings WETICE 2009, IEEE CS Press. [3] Sonja Buchegger and Anwitaman Datta. A Case for P2P Infrastructure for Social Networks - Opportunities and Challenges. Snowbird, Utah, USA, February 2-4. [4] Sonja Buchegger, Doris Schiöberg, Le Hung Vu, and Anwitaman Datta. PeerSoN: P2P Social Networking - Early Experiences and Insights. Nürnberg, Germany, March 31. [5] Kalman Graffi, Patrick Mukherjee, Burkhard Menges, Daniel Hartung, Aleksandra Kovacevic, and Ralf Steinmetz. Practical security in P2P-based social networks. In IEEE Society, editor, The 34th Annual IEEE Conference on Local Computer Networks (LCN), Piscataway, NJ, USA, Oct. IEEE Computing Society, IEEE. [6] Kalman Graffi, Sergey Podrajanski, Patrick Mukherjee, Aleksandra Kovacevic, and Ralf Steinmetz. A distributed platform for multimedia communities. In IEEE International Symposium on Multimedia (ISM '08), page 6, Berkeley, USA, Dec. IEEE, IEEE Computer Society Press. [7] Kalman Graffi, Dominik Stingl, Julius Rückert, Aleksandra Kovacevic, and Ralf Steinmetz. Monitoring and management of structured peer-to-peer systems. In 9th International Conference on Peer-to-Peer Computing 2009, IEEE Computer Society, Sep.

[8] Jinqiang Han Hua Chen, Xiaoming Li. Maze: a Social Peer-to-peer Networking. IEEE International Conference on E-Commerce Technology for Dynamic E-Business. [9] Refik Molva Leucio Antonio Cutillo and Thorsten Strufe. Safebook: A Privacy-Preserving Online Social Network Leveraging on Real-Life Trust. IEEE Communications Magazine. [10] Refik Molva Leucio Antonio Cutillo and Thorsten Strufe. Safebook: Feasibility of transitive cooperation for privacy on a decentralized social network. World of Wireless, Mobile and Multimedia Networks (WoWMoM). [11] Thorsten Strufe Leucio Antonio Cutillo, Refik Molva and Sophia Antipolis. Privacy Preserving Social Networking Through Decentralization. [12] Giancarlo Ruffois Luca Maria Aiello. Secure and Flexible Framework for Decentralized Social Network Services. [13] J.A. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D.H.J. Epema, M. Reinders, M. van Steen, and H.J. Sips. Tribler: A social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 20, February.

121 USER BEHAVIOR MODELING IN P2P SYSTEMS User Behavior Modeling in P2P Systems Malcolm Parsons Abstract Simulation is an important method for evaluating the performance of P2P applications. Realistic simulations require a realistic user model. Studies show that unrealistic user models lead to inadequate simulation results, with the consequence that a released product may show performance weaknesses in practical use. This paper introduces the process of building user models and presents work that collects and analyzes user behavior and builds user models, and thus workload models, in order to evaluate P2P applications in realistic situations. Models for different P2P applications in the areas of file sharing, video on demand, and online gaming are presented. Modeling user behavior has developed into an important research area in which many research opportunities remain. I. INTRODUCTION A. Motivation User models are models of how users behave when using a system or an application. Such a model can be used to generate workload. According to Calzarossa et al. [2], workload is defined as the requests or commands processed by a system. Some works use the terms user model and workload model interchangeably. User models are essential for simulating and evaluating applications. There are two ways to test an application such as a P2P application, a computer system, a network, or a server architecture. The first is to build a real test environment with physical machines and networks. The second is to simulate a test environment.
The first approach is usually suitable for smaller systems with fewer participants and a limited task; classic server architectures, for example, were tested this way. In [1], for example, a distributed file server at a university was evaluated. As the size of the system under test grows, however, for example when the load and performance of a new P2P file-sharing application with more than 1000 participants are to be evaluated, a simulation-based evaluation makes sense: it is usually much easier to simulate 1000 participants than to use 1000 real computers, including their users, for a test. To evaluate a P2P application with regard to performance, design, and so on, a model for simulating user behavior is required. The user behavior model used in the simulation has a major influence on the simulation and evaluation results. An extensive simulation of a P2P application may produce results that suggest it will perform well in practice; if the real behavior of users differs from the behavior assumed in the simulation, however, all simulation results are meaningless. Pussep et al. [12] investigate the effects of different user models on a system under test. According to their results, different user behavior models affect the load on a system in different ways. Another reason for needing sound knowledge about user behavior in P2P applications, and at the same time a challenge, lies in the specific properties of P2P: in almost every case, each participant in a P2P application takes part in distributing the application's content. Because of the decentralized structure of the network, there is no central point at which measurements can be taken.
It is therefore harder to obtain findings, measurements and, above all, estimates of user behavior than it was in classic client-server architectures. This makes organized, scientific studies of the topic all the more important. Realistic behavior models are thus indispensable for gaining accurate insight into the load and performance of an application. Only then can conclusions be drawn about whether the system is fit for real deployment, and only then can its dynamics be understood. Economic interests also play an important role, since suitable simulations help avoid developing over-provisioned products on both the software and the hardware side. Comparison with competing products (benchmarking) is another field of application.
B. Goal
An ideal behavior model would be one that could be applied to any P2P application (e.g. file sharing, online gaming, video on demand) by means of parameters. The parameters could be refined step by step and the load on the system increased until the load and performance limit of the respective application is found. This would yield a benchmarking model usable for all applications. Creating such a universally applicable model is very difficult, however. User behavior varies greatly between applications; users of file-sharing applications and players of online games, for instance, follow completely different strategies. So far, user behavior has mostly been studied and modeled per application.
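The load ramp described under B can be illustrated with a minimal sketch. It rests on an assumption made purely for illustration: the system under test is replaced by an M/M/1 queue, whose mean response time is 1/(mu - lambda) for an arrival rate lambda below the service rate mu. The function names, the step size and the 0.1 s target are hypothetical; a real benchmark would run the simulated P2P application at each load level instead.

```python
# Minimal sketch of a load ramp: raise the offered load step by step until a
# performance target is violated. The "system under test" is a hypothetical
# stand-in (an M/M/1 queue), not a real P2P application.


def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue, 1 / (mu - lambda).

    Returns infinity once the arrival rate reaches the service rate,
    because the queue then grows without bound.
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def find_load_limit(service_rate: float, max_response_time: float,
                    step: int = 5, max_load: int = 100_000) -> int:
    """Return the highest tested load (requests/s) that still meets the target.

    Loads are tried in increments of `step`; 0 means even the first step failed.
    `max_load` only guarantees that the loop terminates.
    """
    last_ok = 0
    load = step
    while load <= max_load and mean_response_time(load, service_rate) <= max_response_time:
        last_ok = load
        load += step
    return last_ok


if __name__ == "__main__":
    # Capacity of 100 requests/s and a 0.1 s target: the ramp stops at 90 requests/s.
    print(find_load_limit(service_rate=100.0, max_response_time=0.1))
```

Replacing mean_response_time with a measurement taken from the simulator turns the same loop into the search for the performance limit described above.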

Even within a single application there is already a large number of parameters with which a model can be adapted. All models found in the course of this work deal with specific P2P applications or application domains.
C. Prerequisites
The prerequisite for developing a realistic model is sound knowledge of how users behave. The first step is therefore to analyze user behavior in a real system. One possible approach is to add a logging function to an application that writes user behavior to log files, which are then analyzed. A more detailed description is given in Section II-C. In the remainder of this paper, Section II deals with the characterization, description and modeling of user models, and hence also of workload models. Section III presents user behavior in specific applications as analyzed in various studies. Section IV summarizes the paper.
II. USER/WORKLOAD MODELS
This section covers the properties and requirements of user models and how they are built. User models represent the typical behavior of users (the probabilities of the actions taken in a given situation), so that workload can be generated for a system. Besides a concrete user model, there are other ways of generating workload for a system under test.
A. Requirements
This section gives an overview of the different phases of model construction, for which various approaches and methods exist. It must also be established what the properties (criteria) of a good model are and how they can be achieved. Bodnarchuk et al. [1] summarized three requirements (criteria) that determine the quality of systems or models for generating workload:
Accuracy: the accuracy of a model can be determined by comparison with a real system. If the load the model generates on a system under test is similar to the load in real operation, the model meets this criterion.
Reproducibility: a model is reproducible if it can generate two workloads whose effects on a system under test cannot be distinguished.
Flexibility: a model is flexible if it can generate loads with different characteristics, i.e. if it can be varied by means of parameters.
B. Model types
There are several kinds of workload models:
Live load: here a system is tested in real operation, and the user actions occurring at the time of the test are used as the test load. This approach has several drawbacks. First, nothing can be said about how representative the workload is, since only the actions of a limited time span are used. The workload is highly accurate, but only for the period of the test; how representative it is for the remaining, untested time has to be examined carefully. An example illustrates this: if a file-sharing server at a university were tested for one month, say in May, the system would be tested and its load known for May. External circumstances that occur in other months, such as semester breaks or special events, would not be covered by the live load.
Workload traces: these are recorded requests to the system; unlike live loads, they offer reproducibility as well as accuracy. One of the biggest problems is storing the large amounts of data that traces comprise.
Meeting the flexibility criterion is also impractical: if workloads with different characteristics are needed, a separate trace has to be recorded for each of them.
Synthetic workload model: a synthetic workload model is built from traces. It needs less storage space and allows different characteristics to be configured through parameters. The low storage requirement follows from how such a model works: the model has internal states, and in each state it makes decisions that take various dependencies into account. Such decisions include which action to choose (e.g. action A may be selected with probability x). Figure 1 is an example of how such a model can be represented. According to Calzarossa et al. [2], a synthetic model is also portable. In addition, it satisfies accuracy, reproducibility and flexibility, the latter because the model is parameterized. Lo et al. [9] investigated to what extent workload traces and synthetic workload models affect the performance of a system differently. They found that the choice between the two alone makes hardly any difference.
C. Building a model
Building a workload model requires several steps, which are described below. First, data about the actions users perform must be collected. These data are then analyzed, and finally a model is built from the insights gained.
a) Logging: The first step is to collect the actions users perform in a P2P application. It is important to know which actions a user performs at what time. One way to obtain such information would be to modify a file-sharing client: a few added lines of code could store every action, together with the time it was performed, in log files. This should be done for as many users and over as long a period as possible. Very large amounts of data may accumulate in this step. If not enough storage space is available, the peak load periods of the system can be determined instead and data collected only during those periods. The workload traces described in Section II-B are also well suited here, since they likewise constitute a collection of user actions.
b) Analysis: Once data about user actions have been collected, they must be analyzed. Given the large amount of data, it is important to keep an overview and extract only the most important information. First, the user actions and demands that have the greatest impact on the system must be identified. In a DHT-based P2P application, for example, this could be a very high churn rate. High churn causes many control messages to be sent (e.g. for building the routing tables), files to be re-sent frequently, or a large number of requests for different files. Churn is a well-researched area in P2P networks (see [6]).
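Steps a) and b) can be illustrated with a minimal sketch. It assumes a hypothetical log format with one line per user action ("<timestamp> <user> <action>", timestamps in seconds) and hypothetical action names such as join, leave and query; it is not the instrumentation used by any of the studies cited here. From such a log it derives action frequencies, session durations (a starting point for churn statistics) and the inter-arrival times of a given action.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    ts: float      # seconds since logging started
    user: str
    action: str    # hypothetical action names: "join", "leave", "query", ...


def parse_log(lines: Iterable[str]) -> list[Event]:
    """Parse lines of the form '<timestamp> <user> <action>'; malformed lines are skipped."""
    events = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        ts, user, action = parts
        events.append(Event(float(ts), user, action))
    return sorted(events)  # chronological order (tuples compare by ts first)


def action_counts(events: list[Event]) -> Counter:
    """How often each action occurs, e.g. to spot the most frequent (costly) ones."""
    return Counter(e.action for e in events)


def session_durations(events: list[Event]) -> list[float]:
    """Pair each 'join' with the same user's next 'leave' (a basis for churn statistics)."""
    open_since = {}
    durations = []
    for e in events:
        if e.action == "join":
            open_since[e.user] = e.ts
        elif e.action == "leave" and e.user in open_since:
            durations.append(e.ts - open_since.pop(e.user))
    return durations


def inter_arrival_times(events: list[Event], action: str) -> list[float]:
    """Gaps between consecutive occurrences of one action, across all users."""
    times = [e.ts for e in events if e.action == action]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    log = ["0 alice join", "5 alice query", "12 bob join",
           "20 alice query", "30 alice leave", "50 bob leave"]
    events = parse_log(log)
    print(dict(action_counts(events)))           # {'join': 2, 'query': 2, 'leave': 2}
    print(session_durations(events))             # [30.0, 38.0]
    print(inter_arrival_times(events, "query"))  # [15.0]
```

Distributions fitted to these lists (or clusters found in them) are the kind of extracted information that the modeling step c) builds on.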
It also matters when and how often certain actions are performed in succession. Numerical analysis techniques and stochastic processes can be used to analyze the collected data and reveal dynamic behavior. Besides identifying expensive actions, recognizing patterns is an important task. Numerical pattern recognition such as clustering, which tries to find structures or patterns in large data sets, can be used for this (see [2]). Visual analysis and a visual representation of the results are also helpful (see [2]); the latter in particular can reveal specific relationships and patterns in the workload. For example, it may uncover user behavior that occurs on particular days, in particular geographic regions (Europe, Asia, North America) or only at certain times of day. Another finding might be of the following kind: 97% of the queries issued by users in North America are not issued by users in Europe [7].
c) Modeling: The last step is to build a model that uses the extracted information to represent user behavior and generate synthetic workload. One option is to look for probability distributions that describe the extracted information. Frequencies can also be stored in tables, arrays or similar structures that an algorithm can access. The algorithm can then determine in which situation, and under which dependencies, a user performs a given action or exhibits a given property, and with what probability. One way of modeling user behavior is as a user behavior graph [2]; Figures 1 and 2 are examples.
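One way to turn such a behavior graph into a workload generator is sketched below, under the assumption that the graph is a first-order Markov chain, i.e. the next action depends only on the current one. The action names and the observed sessions are hypothetical. Transition probabilities are estimated as relative frequencies from logged sessions, and new sessions are drawn by a random walk from login to logout. Seeding the random number generator makes the generated workload reproducible, and editing the probabilities provides the flexibility discussed in Section II-A.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Estimate a first-order behavior graph from observed action sequences:
    for each action, the relative frequency of each directly following action."""
    counts = defaultdict(Counter)
    for actions in sessions:
        for current, following in zip(actions, actions[1:]):
            counts[current][following] += 1
    graph = {}
    for state, row in counts.items():
        total = sum(row.values())
        graph[state] = {nxt: n / total for nxt, n in row.items()}
    return graph


def generate_session(graph, start="login", end="logout", seed=None, max_steps=1000):
    """Walk the graph from `start` to `end`, drawing each next action with the
    estimated probabilities. The same seed yields the same session, which is
    what makes the synthetic workload reproducible."""
    rng = random.Random(seed)
    state, trace = start, [start]
    for _ in range(max_steps):
        if state == end or state not in graph:
            break
        successors = list(graph[state])
        weights = [graph[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
        trace.append(state)
    return trace


if __name__ == "__main__":
    observed = [
        ["login", "search", "download", "logout"],
        ["login", "search", "search", "logout"],
        ["login", "logout"],
    ]
    graph = estimate_transitions(observed)
    print(graph["login"])  # {'search': 0.6666666666666666, 'logout': 0.3333333333333333}
    print(generate_session(graph, seed=1))
```

Per-action parameters such as file sizes or request counts would have to be attached to each generated action on top of this.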
Another is an algorithm that makes decisions step by step and generates workload. A finished model is built from all the necessary information and key factors. Such a synthetic model could then be used as follows:
1) The system is in a test phase.
2) The user/workload model selects an action (e.g. a request for a resource, leaving a network or joining a network).
3) The selected action can be extended with additional parameters, for example the number of requested files, the size of the requested files or the number of requests sent.
4) The action is executed.
The model can be evaluated by comparing the effects of the generated workload with those of a real workload.
III. USER BEHAVIOR MODELS
This section presents results from several studies of user behavior in specific applications.
A. P2P file-sharing applications
1) MAZE: In 2009, the authors of [4] built a model of user behavior in a P2P file-sharing application. They studied how users behave when a download fails for reasons outside their control (retry behavior), and how long it takes until a completely downloaded file is removed from the program (retention time). Both behavioral traits can strongly influence the performance of P2P applications, so accurate data on them are important for realistic simulations. For this purpose, the download behavior of users of the file-sharing program MAZE [13] was recorded over two years, and the resulting log files were then analyzed with respect to these goals.
MAZE is used primarily for distributing video files. According to the authors, their studies showed that the behavior of MAZE users is very similar to that of users of other file-sharing programs.

Figure 1. Model of user behavior for downloads.
Table I
THE MOST IMPORTANT PARAMETERS OF THE MODEL FOR FAILED DOWNLOADS AND FOR RETRYING SUCH FAILED DOWNLOADS. THE MODEL PARAMETERS ARE GIVEN FOR FIVE REPRESENTATIVE FILES.
File | u_1 | P_us | P_s | P_f | P_r
F? | ? | ?% | 27.1% | 72.9% | 39.3%
F? | ? | ?.4% | 25.6% | 74.4% | 47.1%
F? | ? | ?% | 26.4% | 73.6% | 46.1%
F? | ? | ?.5% | 59% | 41% | 27.6%
F? | ? | ?.7% | 58.2% | 41.8% | 20.4%
Table II
PARAMETERS OF THE FILE REMOVAL MODEL. P_fr IS THE PROBABILITY THAT A USER REMOVES THE FILE FROM THE PROGRAM, R_c THAT A FILE IS CHECKED (VIEWED), P_d THAT A FILE IS REMOVED DIRECTLY AFTER BEING CHECKED, AND R_r THAT A FILE IS REMOVED AFTER IT HAS BEEN CHECKED, BUT NOT DIRECTLY AFTERWARDS.
File | P_fr | R_c | P_d | R_r
F? | ?% | 93.6% | 77.8% | 16.1%
F? | ?% | 82.2% | 90.7% | 15.0%
F? | ?% | 93.4% | 73.9% | 6.0%
F? | ?% | 83.3% | 92.9% | 12.6%
F? | ?% | 95.5% | 81.6% | 5.6%
a) Retry behavior: Figure 1 shows a behavior graph for users whose download of a file fails. The challenge here is not building the model but identifying concrete values for its parameters. Values are given for five representative files, all of them movies: the first three are popular videos and the last two are adult films.
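Figure 1 itself is not reproduced in this text, so the following is only a sketch under an assumed, simplified reading of the retry model, using the symbols of Table I: every download attempt succeeds independently with probability P_s, after every failure (P_f = 1 - P_s) the user retries with probability P_r, and there is no limit on the number of retries. Under these assumptions the fraction of users who eventually succeed is a geometric series, P_us = P_s / (1 - P_f P_r). The code computes this closed form and checks it with a small Monte Carlo simulation; the function names are hypothetical, and this is not necessarily the exact model of [4].

```python
import random


def eventual_success_probability(p_s: float, p_r: float) -> float:
    """Fraction of users who eventually complete the download (P_us), assuming
    independent attempts with success probability p_s and a retry with
    probability p_r after every failure, with no limit on retries."""
    if p_s == 0.0:
        return 0.0
    p_f = 1.0 - p_s
    # p_s * (1 + p_f*p_r + (p_f*p_r)**2 + ...) = p_s / (1 - p_f*p_r)
    return p_s / (1.0 - p_f * p_r)


def simulate(p_s: float, p_r: float, users: int, seed: int = 0) -> float:
    """Monte Carlo check of the closed form under the same assumptions (needs p_s > 0)."""
    rng = random.Random(seed)
    succeeded = 0
    for _ in range(users):
        while True:
            if rng.random() < p_s:    # attempt succeeds
                succeeded += 1
                break
            if rng.random() >= p_r:   # attempt failed and the user gives up
                break
    return succeeded / users


if __name__ == "__main__":
    # P_s = 27.1 % and P_r = 39.3 % (first row of Table I)
    print(round(eventual_success_probability(0.271, 0.393), 3))  # 0.38
    print(round(simulate(0.271, 0.393, users=100_000), 3))
```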
The model has the following parameters:
P_s: probability of a successful download
P_f = 1 - P_s: probability of a failed download
P_r: probability of retrying a failed download
1 - P_r: probability of giving up entirely and not trying to download the failed file again
U(t): number of potential users at time t
C(t): probability of a download
P_us: fraction of users who eventually complete the download successfully
u_1: number of users who try to download the file
The data (see Table I) suggest that MAZE users have little trouble re-downloading a failed download: they show a high tolerance in this respect, given that the success rate (P_s) is not high. Between 20.4% and 47.1% of users retry a failed download. It was also found that for most of the monitored files, C(t) reaches its maximum within the first three days after the file is released, and for F_0 within the first two days.
b) Retention time: The time after which a fully downloaded file is removed from the P2P application is called the retention time. Free-riding, the time between completing a download and checking (viewing) it, and the time between checking and removing the file all play a role. Table II shows the model parameters, which are explained in the table caption. They show how differently users handle the removal of completed downloads. About half of the users are free-riders, i.e. users who are only interested in obtaining the file and do not want to take part in distributing it. 90% check the file within one day.
About 80% of users remove a file after using it once (in this case, after watching the movie once), and the remaining 20% remove the file at a rate of 10% per day.
2) Query behavior in Gnutella: In 2004, the authors of [7] studied the query behavior of users of the P2P file-sharing application Gnutella [5]. Over 40 days they recorded all queries arriving at a modified client running as a superpeer. The main goal of the project was to provide enough measurements to develop a synthetic model from them. The analysis focused on the following points:
the fraction of passive clients, i.e. clients that issue no queries during their entire online time (session)
the duration of sessions
for each active session: the number of queries issued, the time between two queries, the time until the first query, and the time after the last query
the most popular queries
The findings of this work include:

To capture users' query behavior, automatic re-queries that the client software generates to fulfill protocol-specific tasks must be filtered out; otherwise they would distort the statistics.
The 100 most popular queries change from day to day.
97% of the queries issued from North America are not issued by users in Europe.
The number of queries issued per session depends on the users' geographic location: users in Europe issue more queries than users in North America. Sessions themselves last longest on average in Europe.
A significant correlation was found between the duration of a session and the number of queries issued during it. No correlation was found between the query interval and the number of queries.
B. Multiplayer online games
In 2005, the authors of [8] studied user behavior in a multiplayer online game (MOG). User behavior in MOGs can depend on the genre of the game: a first-person shooter makes different demands (e.g. shooting accuracy) than a role-playing game, which is characterized more by exploring the virtual world. The aim of the work was to develop a model. For this purpose, the virtual world is divided into regions (rooms).
Table III
OVERVIEW OF THE PLAYER CLASSES, WHICH ARE DEFINED BY THE NUMBER OF MOVES. CLASS 1 REPRESENTS PLAYERS WHO PLAY VERY RARELY, CLASS 5 PLAYERS WHO PLAY VERY OFTEN.
Class | # players | # moves
Figure 2. Behavior graph for regions (rooms) and the possible actions in an MOG. At login, the player is assigned a room. From there, the player can move to a neighboring room or log out of the game.
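Combining the room graph of Figure 2 with the distribution types reported for RockyMud in the following paragraph gives a small session generator. This sketch makes several assumptions: the room map is invented, dwell times are drawn from an exponential distribution as a placeholder for the Pearson distribution found in the study, the mean dwell time is hypothetical, and the time unit of the Pareto parameters (a = 4.56, b = 0.6) is left abstract; only login times are in seconds. Play time is drawn from the Pareto distribution of the second kind (Lomax) by inverse transform sampling.

```python
import random

# Hypothetical room graph: room -> neighboring rooms (RockyMud's real map is not given).
NEIGHBORS = {
    "square": ["forest", "tavern"],
    "forest": ["square", "cave"],
    "tavern": ["square"],
    "cave": ["forest"],
}


def lomax(rng: random.Random, a: float, b: float) -> float:
    """Pareto distribution of the second kind (Lomax) by inverse transform sampling:
    F(x) = 1 - (1 + x/b)**(-a)  =>  x = b * ((1 - u)**(-1/a) - 1)."""
    u = rng.random()
    return b * ((1.0 - u) ** (-1.0 / a) - 1.0)


def simulate_session(rng: random.Random, a: float = 4.56, b: float = 0.6,
                     mean_dwell: float = 0.1) -> list:
    """One session following Figure 2: the player is assigned a room at login,
    moves to random neighboring rooms, and logs out when the sampled play time
    is used up. Visit times are relative to login, in the (unstated) unit of b.
    Exponential dwell times are a placeholder for the Pearson distribution
    reported in the study; mean_dwell is hypothetical."""
    play_time = lomax(rng, a, b)
    t, room = 0.0, rng.choice(sorted(NEIGHBORS))
    visits = [(t, room)]
    while True:
        t += rng.expovariate(1.0 / mean_dwell)  # expovariate takes the rate
        if t >= play_time:
            return visits  # log out
        room = rng.choice(NEIGHBORS[room])
        visits.append((t, room))


def simulate_logins(n_players: int, mean_interarrival: float, seed: int = 0) -> list:
    """Login times (seconds) with exponential inter-arrival times, one session each."""
    rng = random.Random(seed)
    t, sessions = 0.0, []
    for _ in range(n_players):
        t += rng.expovariate(1.0 / mean_interarrival)
        sessions.append((t, simulate_session(rng)))
    return sessions


if __name__ == "__main__":
    for login_time, visits in simulate_logins(3, mean_interarrival=70.3, seed=42):
        print(round(login_time, 1), [room for _, room in visits])
```

Per-class behavior (Table III) could be added by giving each player class its own room-leaving probability.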
To obtain data on how users play, the MOG RockyMud was observed for one year; in total, the behavior of 556 different players was studied. A typical user enters the game, stays in regions, performs actions (fights with monsters), changes regions and leaves the game. A user behavior model in the form of a graph is shown in Figure 2. The following actions and behaviors were selected as the most interesting parameters for a model:
the length of the intervals at which users log into the game
the probability that users move from one region of the game to another (transition probability)
how long players stay in a region before moving on
how long players stay in the game, i.e. the time between logging in and logging out
According to the authors of [8], these parameters apply to all game genres and are therefore of particular interest. Interaction between users in the MOG was left out for reasons of time and effort. The results, i.e. the findings on user behavior, are as follows:
The inter-arrival times follow an exponential distribution $f(x) = \lambda e^{-\lambda x}$, where the mean inter-arrival time $1/\lambda$ lies between 70.3 and ?.7 seconds, depending on the region in which the player starts.
Players are divided into classes (e.g. based on the number of moves they make); Table III gives an overview of the classes and their players. Each class has a different probability of leaving the current region.
The time spent in a region follows a Pearson distribution.
Play time follows a Pareto distribution of the second kind, $f(x) = \frac{a b^a}{(x+b)^{a+1}}$ with $a > 0$, $b > 0$; for the model, the values $a = 4.56$ and $b = 0.6$ were identified.
Furthermore, behavior patterns can be identified for specific regions.
Depending on their class, players behave differently in different regions; for example, users leave the game after having been in a particular region.
C. P2P-based video-on-demand application
In [10], user behavior in the P2P-based video-on-demand (VoD) application CCTV.com [3], which uses hard caches, was studied over a period of 100 days. The aim of the study was to determine whether hard caching is suitable. A hard cache means that videos are stored on the participants' hard disks. Unlike with a soft cache, these videos are not removed, e.g., after logging out or restarting. The point of hard caching is that users can continue to distribute videos after leaving and rejoining. One result of the study concerns the popularity of music and news videos. Figure 3 shows the popularity, i.e. the number of requests over a 30-day period. The red lines represent news videos and the blue dotted lines music videos. News videos turn out to be of interest only for a few days after release, but during these first days they receive considerably more requests than music videos. Music videos, in contrast, see almost constant demand over the 30 days after release. This is an important finding, because it means that operators of a VoD application should remove news videos from the hard cache after a few days and replace them with newer ones.
Figure 3. A result of the user behavior study: popularity (= the number of requests over 30 days) of music and news videos. Red lines represent news videos and blue lines music videos.
D. DHT: impact of workloads
In 2008, the authors of [12] studied the impact of user behavior models on the performance of P2P-based distributed hash tables (DHTs), using Kademlia [11]. The study focused on the lifetime of a user. This lifetime begins when the user first joins an application (arrival) and ends when the user leaves the application for the last time and never joins again (departure). During this lifetime there are online cycles, in which the user is logged into the application, and offline cycles, in which the user is logged out. The nature and frequency of these online and offline cycles is called churn. For the evaluation, different parameters were used for the lifetime model (deterministic and Poisson-distributed arrival process) and for the online/offline model. The work of [12] clearly shows that user behavior models have a significant effect on the performance and load of P2P applications: different models lead to different measurement results. Only a realistic behavior model therefore allows a correct evaluation of applications.
IV. CONCLUSION
This paper addressed the modeling of user behavior for P2P systems. It showed that realistic user and workload models are essential for realistic simulation, and thus for realistic evaluation, of P2P systems and applications. Furthermore, results of recent studies (the most recent from 2009) on analyzing and modeling user behavior were summarized. Analyses of user behavior, and hence the basis for synthetic models, exist for various aspects of P2P file-sharing, P2P-based video-on-demand and online gaming applications. Of the model types discussed for generating workload, the synthetic model is preferable to the others in most cases: it needs considerably less storage space than traces and is parameterized. Thanks to its internal states and the internal decisions it makes, e.g. selecting actions by means of probability distributions, it can generate synthetic workload. Workload traces do have an advantage, however, when relationships, interactions and dependencies between different users of a P2P application have to be taken into account.
With synthetic models this would only be feasible with considerable effort, possibly extending to modifications of the simulation program used. Synthetic modeling is a research area that is far from exhausted. With one exception, all models presented here were created for specific applications. It was striking that the measurement periods of the studies presented differed greatly, ranging from just over a month to two years. The number of monitored users also varies widely. It would be interesting to compare the results in light of the different measurement periods, or to examine how measurement periods of different lengths for the same application affect the results. It is also striking that measuring and studying user behavior in P2P applications currently seems to be of interest almost exclusively in academia. This is evident from the models and behavior studies presented in this paper: the P2P applications examined, Gnutella, RockyMud and MAZE, are not mainstream applications. Gnutella has not found wide adoption in the file-sharing community, MAZE is a research project, and RockyMud was no longer reachable on the Internet at the time of writing. No studies of commercial products or widely used applications (apart from the VoD system presented) were found. The overarching goal will be to develop user models that can be used for any kind of application.
REFERENCES
[1] R. Bodnarchuk and R. Bunt. A Synthetic Workload Model for a Distributed System File Server. ACM SIGMETRICS Performance Evaluation Review, 19(1):59, June.
[2] M. Calzarossa, L. Massari, and D. Tessera. Workload Characterization Issues and Methodologies. Performance Evaluation: Origins and Directions.
[3] CCTV.com. China Central Television.
[4] Q. Feng and Y. Dai. User Behavior Modeling in Peer-to-Peer File Sharing Networks: Dissecting Download and Removal Actions. IEEE International Conference on Acoustics, Speech and Signal Processing, April.
[5] Gnutella Developer Forum. Gnutella Protocol Development.
[6] O. Herrera and T. Znati. Modeling Churn in P2P Networks. 40th Annual Simulation Symposium (ANSS '07), pages 33-40, March.
[7] A. Klemm, C. Lindemann, M. K. Vernon, and O. Waldhorst. Characterizing the Query Behavior in Peer-to-Peer File Sharing Systems. Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement (IMC '04), page 55.
[8] M. Kwok and G. Yeung. Characterization of User Behavior in a Multi-Player Online Game. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, volume 54, page 74. ACM.
[9] V. Lo, J. Mache, and K. Windisch. A Comparative Study of Real Workload Traces and Synthetic Workload Models for Parallel Job Scheduling. In Job Scheduling Strategies for Parallel Processing. Springer.
[10] J. G. Luo, Y. Tang, M. Zhang, and S. Q. Yang. Characterizing User Behavior Model to Evaluate Hard Cache in Peer-to-Peer Based Video-on-Demand Service. Advances in Multimedia Modeling.
[11] P. Maymounkov and D. Mazières. Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. Peer-to-Peer Systems, pages 53-65.
[12] K. Pussep, S. Kaune, C. Leng, A. Kovacevic, and R. Steinmetz. Impact of User Behavior Modeling on Evaluation of Peer-to-Peer Systems. Computer.
[13] M. Yang, H. Chen, B. Zhao, Y. Dai, and Z. Zhang. Deployment of a Large-scale Peer-to-Peer Social Network. In Proc. of WORLDS.

128 A FORGETTING INTERNET
A forgetting Internet
Olga Petrova
Abstract
The Internet presents new challenges to the task of protecting private data. Data sent over the Internet are constantly duplicated and stored by providers, which opens the door to attacks on privacy. The original data owners cannot counter these attacks, because they have no direct control over the copies. Recently, two new systems that address this security problem, Vanish and EphCOM, were developed and underwent initial tests. Their approach is based on the notion of disappearing keys for encrypted data. To store the bits of the encryption keys temporarily, both Vanish and EphCOM use large networks that provide openly available Internet services, and they exploit certain features of these services to destroy the keys automatically. We describe the basic principles that drive the design of both systems, the details of the prototype implementations, the results of the tests performed by the developers and initial trials by independent groups of researchers. We also discuss how the two systems perform with respect to potential security threats.
I. INTRODUCTION
The Internet and open networks give their users more and more ways to communicate, and at the same time they expose these communications to potential eavesdropping by other parties. Once private data become available on the Internet, the person who created them essentially loses control over them. Internet providers, mail servers, website owners and hosts of social networks, such as Facebook, Flickr or other data-sharing community services, constantly duplicate and archive the data that were posted. The parties involved often keep copies for a very long time, sometimes for years.
Sometimes a person who sent or posted the original data would like to protect them against misuse but can no longer do so, because he or she does not even know how many copies exist or where they are stored. Even confidential messages intended for a particular recipient are routinely duplicated by mail servers, Web servers and Internet providers and kept long after the purpose of sending them has been achieved. These copies may then receive no special care or protection. Such stored copies of private data or outdated Web communications can become the subject of security attacks or unwanted investigations. Users would therefore often prefer that their data not be available at all after a certain expiration time. On their own computer, users can delete sensitive old data manually or automate this with a cron job. The copies archived by receivers or providers, however, cannot be deleted by the original data owners, either manually or automatically. At the first level of protection, security is addressed by encrypting sensitive data or messages. This measure is sufficient, however, only if the encryption keys are stored separately from the encrypted data, so that an attacker or investigator who has access to the encrypted data cannot obtain the corresponding key at the same time. One possible approach in this direction is to give the encryption keys to a trusted third party. The key would be available to the sender and receiver of the encrypted message for some time and would afterwards be irreversibly destroyed by the trusted party (the so-called Ephemerizer family of solutions). This approach has limitations: the third party may not be fully trustworthy, the keys may be stolen or copied by an attacker, or the trusted party may be forced to disclose them.
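To make the trusted-third-party idea concrete, the following toy model shows a key store that hands out a key until its expiration time and deletes it afterwards. The class and method names are hypothetical, and this is not the protocol of any particular Ephemerizer system. The sketch also illustrates the limitation mentioned above: deletion only affects the store's own copy, so anyone who read the key earlier, or who controls the store, can still keep it; in Python, deleting a dictionary entry also does not guarantee that the bytes are wiped from memory.

```python
import secrets
import time


class EphemeralKeyStore:
    """Toy model of an Ephemerizer-style trusted party: it hands out a key
    until its expiration time and afterwards deletes it. Only this store's
    own copy is deleted; copies someone made earlier are not affected."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, e.g. for tests
        self._keys = {}       # key_id -> (key bytes, expiration time)

    def create_key(self, lifetime_s: float) -> str:
        key_id = secrets.token_hex(8)
        self._keys[key_id] = (secrets.token_bytes(32), self._clock() + lifetime_s)
        return key_id

    def purge_expired(self) -> int:
        """Delete every expired key; returns how many were removed."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._keys.items() if now >= exp]
        for key_id in expired:
            del self._keys[key_id]
        return len(expired)

    def get_key(self, key_id: str):
        """Return the key while it is valid, otherwise None."""
        self.purge_expired()
        entry = self._keys.get(key_id)
        return entry[0] if entry else None


if __name__ == "__main__":
    now = [0.0]                                   # simulated time in seconds
    store = EphemeralKeyStore(clock=lambda: now[0])
    key_id = store.create_key(lifetime_s=3600)
    print(store.get_key(key_id) is not None)      # True: key still available
    now[0] = 3600.0
    print(store.get_key(key_id))                  # None: key destroyed
```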
It is therefore desirable to devise a technique that stores encryption keys temporarily without involving any third party and destroys them automatically after a certain expiration time, without special effort from the original data owners. Two recently developed systems, Vanish [2] and EphCOM [1], explore exactly this possibility. Both systems were built for the same purpose and share common design points, but there are also fundamental differences between them. Above all, Vanish and EphCOM differ in the choice of the network used to store the encryption keys, and each system uses its own algorithm to construct the keys. The structure of the following sections reflects these similarities and differences: wherever possible, common points are discussed for both systems together, with minor variations noted afterwards, while the more substantial parts specific to one system are described in separate subsections. It is also important to note that Vanish was the first project to implement the new approach. EphCOM was created later, and its design was influenced by the experience gained with the first Vanish implementation. Therefore, the sections on security give special consideration to the attacks that the Vanish developers initially underestimated; EphCOM is designed and expected to withstand attacks of this type better than Vanish. II. BASIC REQUIREMENTS AND OPERATION SCHEME The developers of Vanish and EphCOM set very high security requirements for communication and retained data. The basic condition is that even if the encrypted data are stored unmodified and are fully available to the attacker after the expiration time, it must not be possible to decrypt them.
The attacker might still hold a fraction of the key after the expiration time, but this fraction must not be sufficient to decrypt the data. Furthermore, the encryption keys should be destroyed automatically, without explicit intervention by the data owners or by the providers who store either the encrypted data or the keys. In addition, the developers wanted a system that does not require users to have any specific secure hardware. Finally, the system was intended to use only common network services openly available on the Internet. In general, the technique of disappearing keys is envisioned for various types of data, such as messages, web posts and even files stored locally by only one party. For simplicity, however, we use e-mail exchange as the example for explaining the basic principles of Vanish and EphCOM. Message transmission between sender and receiver is organized as follows. After the data have been created in unencrypted form, say as plain text, the sender triggers the encapsulation stage, during which the encryption key is constructed and the data are encrypted with it. The system must already know the intended length of the expiration period at this stage, because this length can influence how the encryption key is constructed. After encryption, the data and the key are separated: the system automatically stores the key parts in an open network, while the encrypted message is sent to the receiver. Besides the encrypted text, the object transmitted to the receiver contains the information the receiver needs to reconstruct the encryption key. The Vanish and EphCOM groups call this encapsulated object the Vanishing Data Object (VDO) and the EphCOM object (ECO), respectively; in the rest of this paper it is referred to as the encapsulated object. After receiving the encapsulated object, the receiver triggers the decapsulation process. Using the information attached to the encrypted message, the system first checks that the expiration time has not yet been reached, so that decrypting the message makes sense. It then retrieves the parts of the encryption key from the storage network, reconstructs the key and decrypts the message.
If the lifetime of the key has already been exceeded and the receiver nevertheless tries to retrieve the encryption key, the information obtained from the storage network is corrupted to such a degree that the reconstructed key is incorrect and the message can no longer be decrypted. III. DISAPPEARING KEYS The main idea of the new approach is that encryption keys can be stored using existing, commonly available open network services: a Distributed Hash Table (DHT) in the case of Vanish and the Domain Name System (DNS) in the case of EphCOM. Vanish and EphCOM use these services as temporary storage, and this storage is provided by such a huge number of participating nodes that a potential attacker would find it too difficult to guess where the parts of an encryption key are located before they expire, and too expensive to collect information from these services proactively. Because the state of the DHT nodes and DNS servers used to compose and reconstruct the encryption keys changes after a certain time and cannot be reproduced afterwards, it becomes impossible to reconstruct the keys after the expiration time. A. Construction of the keys in Vanish Vanish operates on top of one of the existing DHTs. Several large-scale DHTs exist on the Internet and are openly available for common use. A DHT is a storage network consisting of many participating peer-to-peer nodes. Data are stored in a DHT as (index, value) pairs: one computer (the sender) can store a value in the DHT under a certain index, and another computer (the receiver) can retrieve this value using the same index. For a potential Vanish implementation, one of the most important properties of a DHT is a large number of participating nodes distributed worldwide.
For example, the first Vanish prototype was based on the million-node Vuze DHT, whose primary service is to store keys for decentralized torrent tracking. Another essential property needed by Vanish is that a DHT reliably stores data (here, encryption keys) for a predetermined period of time, say several hours, but that shortly afterwards these data are, with high probability, replaced by new information. This replacement happens automatically and naturally, because a DHT constantly refreshes the contents stored on its nodes and gives priority to newer data. In the extreme case, an encryption key generated by Vanish could be saved as a whole in a single (index, value) pair, but this would present two obvious problems: the whole key would be easier for an attacker to extract, and it could accidentally be erased by the DHT node or stored longer than intended. Therefore, when Vanish composes an encryption key $K$, it divides the key into $N$ shares $K_1, K_2, \ldots, K_N$. Vanish assigns each share $K_i$ an index $I_i$ and then stores the shares as $N$ (index, value) pairs in the DHT, using the set of indices $I = (I_1, I_2, \ldots, I_N)$. Because the state of the DHT changes constantly, there is no guarantee that all shares of the key can be retrieved without errors. To deal with this, Vanish uses threshold secret sharing, in which the threshold $T$ determines how many of the shares are required to reconstruct the original key. The value of this threshold is also included in the encapsulated object transmitted from the sender to the receiver. In its simplest form, Vanish could include in the encapsulated object the encrypted message $EM$, the set of indices $I$, the total number of shares $N$ and the threshold $T$. However, the Vanish developers decided to add more complexity and flexibility to the mechanism that builds the set of indices $I$.
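The share-and-threshold mechanism can be illustrated with Shamir's secret sharing, the classic threshold scheme, used here as a stand-in since the text above does not fix the exact construction. The sketch below is a minimal Python model under several assumptions: a plain dictionary plays the role of the DHT, the hypothetical helper `share_index` derives each index $I_i$ by hashing a random access key with the share number (ignoring any time dependence), the parameters $N = 50$, $T = 45$ are illustrative, and the actual encryption of the message is omitted.

```python
import hashlib
import secrets

# Field modulus: 2**521 - 1 is a Mersenne prime, larger than any 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any t of which recover it (Shamir)."""
    if not 0 < t <= n:
        raise ValueError("need 0 < t <= n")
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the field GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def share_index(access_key: bytes, i: int) -> str:
    """Hypothetical index derivation: SHA-256 of the access key and share number."""
    return hashlib.sha256(access_key + i.to_bytes(4, "big")).hexdigest()


# --- Usage: toy store-and-retrieve round trip --------------------------
N, T = 50, 45                          # illustrative parameters
key = secrets.randbits(256)            # stand-in for the encryption key K
access_key = secrets.token_bytes(16)   # stand-in for an access key

# A dict plays the role of the DHT: index -> share.
dht = {share_index(access_key, x): (x, y) for x, y in split_secret(key, N, T)}

# Churn: the "DHT" silently drops three shares.
for lost in (3, 17, 42):
    del dht[share_index(access_key, lost)]

# The receiver recomputes all N indices and keeps whatever is still stored.
indices = [share_index(access_key, x) for x in range(1, N + 1)]
found = [dht[idx] for idx in indices if idx in dht]
assert len(found) == N - 3
assert recover_secret(found[:T]) == key
```

Dropping three of the fifty shares leaves 47, still at least $T = 45$, so the key is recovered. If churn removed more than $N - T = 5$ shares, interpolation through the remaining points would yield an unrelated value, which is the expiry behavior the disappearing-key design relies on.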
The set of indices $I$ is built by a special algorithm from an access key $L$ and the time provided by a weakly synchronized clock, so the encapsulated object finally transmitted by Vanish is $\mathrm{VDO} = (EM, L, N, T)$. At the decapsulation stage, the receiver uses the access key $L$ and the time provided by his clock to reconstruct the set of indices $I$, and then contacts the DHT with this set of indices to retrieve the shares of the encryption key. Because the set of indices $I$ is defined not only by the access key $L$ but may also depend on time, Vanish can repost the same encryption key in the DHT under a different set of indices, which is more secure than reposting the same set of (index, value) pairs several times. This extension of the key lifetime can be used if the default timeout of the DHT is not suitable.
B. Construction of the keys in EphCOM Like Vanish, EphCOM uses an openly available service to store encryption keys, namely caching DNS servers. The basic function of DNS servers is to resolve domain names into IP addresses. To serve incoming requests more efficiently, DNS servers remember recently resolved domain names and IP addresses and keep these data in their cache. EphCOM exploits certain features of caching DNS servers. A DNS server answers name resolution queries either from its own database (non-recursive requests) or, if there is no entry in the local database, by forwarding the request to other DNS servers (recursive requests). Every DNS server should support non-recursive queries, whereas support for recursive queries is optional. If a domain name was resolved recently and is stored in the server's cache, a subsequent request for the same name is answered quickly, provided it arrives within a certain time period. This period is set by the DNS server and is called the Time To Live (TTL). Also essential for EphCOM is that the requesting node can specify whether a DNS request is recursive or non-recursive. EphCOM implements disappearing encryption keys as follows. First, a key $K = (K_1, K_2, \ldots, K_N)$ consisting of $N$ bits is generated. Then two sets of $N$ names are built, a set of server names $SN = (SN_1, SN_2, \ldots, SN_N)$ and a set of domain names $DN = (DN_1, DN_2, \ldots, DN_N)$, which associate every key bit $K_i$ with a pair $(SN_i, DN_i)$.
After that, if $K_i = 1$, the sender sends a recursive DNS request to the cache server $SN_i$ asking it to resolve the domain name $DN_i$, which in effect stores the domain name in that server's cache; if $K_i = 0$, no request is made. The sets $SN$ and $DN$ are encapsulated with the encrypted message and sent to the receiver. To reconstruct the $N$ bits of the encryption key, the receiver sends $N$ non-recursive DNS requests, asking each cache server $SN_i$ to resolve the corresponding domain name $DN_i$. Ideally, all servers associated with bits $K_i = 1$ can answer the non-recursive request, because the corresponding domain names are in their cache as a result of the sender's earlier request. Conversely, for $K_i = 0$ the server cannot serve the non-recursive query, because its cache holds no record of the domain name. So, if server $SN_i$ replies with a valid IP record, the receiver's EphCOM client reconstructs bit $K_i$ as 1, and otherwise as 0. The last element of the encapsulated object transmitted by EphCOM is the end time $ET$, after which the key is expected to be unrecoverable: $\mathrm{ECO} = (EM, SN, DN, ET)$. Of course, DNS replies can contain errors. By accident, a server associated with $K_i = 0$ might have a record for domain name $DN_i$ in its cache, or, for $K_i = 1$, the record created in server $SN_i$ by the sender's request might be deleted early, or the server might become unavailable before the receiver's query. To cope with such errors, the EphCOM developers use error-correcting codes. In its first implementation, EphCOM adopts a convolutional code, which is sufficient for a prototype and can be replaced by a more sophisticated algorithm in future optimizations of the system.
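The bit-per-cache-entry encoding can be simulated without touching real DNS. The toy model below is a sketch, not EphCOM's code: the hypothetical class `CacheServer` only mimics the behavior described above (a recursive query fills the cache for one TTL, a non-recursive query answers only on an unexpired cache hit), time is passed in explicitly, and the domain names are made up. For error correction it uses the 3-bit mapping reported for the first prototype in Section V (1 becomes 111, 0 becomes 000) with majority decoding, so each key bit occupies three (server, name) pairs.

```python
import secrets


class CacheServer:
    """Toy caching DNS resolver; the current time is passed in explicitly."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.cache: dict[str, float] = {}  # domain name -> expiry time

    def query(self, name: str, recursive: bool, now: float) -> bool:
        """Return True if the server answers the query with a record."""
        expiry = self.cache.get(name)
        if expiry is not None and expiry > now:
            return True                        # unexpired cached record
        if recursive:
            self.cache[name] = now + self.ttl  # resolve upstream, then cache
            return True
        return False                           # non-recursive miss: no answer


def encode(bits):
    """3-bit repetition code: 1 -> 111, 0 -> 000."""
    return [b for b in bits for _ in range(3)]


def decode(coded):
    """Majority vote over each 3-bit symbol."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]


def store_key(bits, pairs, now):
    """Sender: recursive query for every coded 1, nothing for a coded 0."""
    for bit, (server, name) in zip(encode(bits), pairs):
        if bit:
            server.query(name, recursive=True, now=now)


def read_key(pairs, now):
    """Receiver: non-recursive probes; an answer reads as 1, a miss as 0."""
    return decode([int(server.query(name, recursive=False, now=now))
                   for server, name in pairs])


# --- Usage ----------------------------------------------------------------
key_bits = [secrets.randbelow(2) for _ in range(16)]
ttl = 3600.0  # illustrative one-hour TTL
pairs = [(CacheServer(ttl), f"name{i}.example") for i in range(3 * len(key_bits))]
store_key(key_bits, pairs, now=0.0)

# One cache entry is lost early; majority decoding still recovers the key.
ones = [i for i, b in enumerate(encode(key_bits)) if b]
if ones:
    server, name = pairs[ones[0]]
    del server.cache[name]

assert read_key(pairs, now=60.0) == key_bits                 # before the TTL
assert read_key(pairs, now=2 * ttl) == [0] * len(key_bits)   # after expiry
```

The lost cache entry flips only one of three coded bits, which majority voting absorbs. After the TTL every probe misses, so the receiver reads an all-zero and therefore useless key.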
Just as built-in features of DHTs guarantee the destruction of keys generated by Vanish clients, the basic functionality of the DNS service used by EphCOM ensures automatic key destruction after the expiration time. DNS servers keep data in their cache only for a predetermined time, which ranges from several minutes to several days depending on the server. Even if the encapsulated message is fully available after the expiration time, the list of DNS servers and domain names used to compose the encryption key is no longer sufficient to reconstruct it, because the information stored in the caches of the DNS servers has changed. To be able to assign suitable expiration times to disappearing data, EphCOM proactively builds a database of DNS servers that satisfy the timing requirements. There is currently a wide choice of DNS servers with suitable TTL values, ranging from several minutes to several days, with peaks at 1, 2, 8, 24 and 48 hours and 7 days. This allows the EphCOM client to choose SN sets whose TTL equals the expected lifetime of the encryption key. Unlike the Vanish developers, the EphCOM developers decided, at least for the first implementation, not to build in any feature for dynamically extending the lifetime of an encryption key; on the contrary, the expiration time is even included in the ECO. IV. ARCHITECTURE OF THE PROTOTYPE SYSTEMS The developers of Vanish and EphCOM created prototype applications mainly to test the basic functionality of the systems and to provide a proof of concept. The Vanish prototype applications included a modified Firefox plug-in for the Gmail service and a Firefox extension that encrypts and decrypts text in an input box of a web page. Another Vanish application lets users wrap sensitive files into VDOs and set an expiration time for these objects.
Similarly, the EphCOM prototype applications included a Firefox extension and a command-line tool. For simplicity, we describe only the applications built as Firefox extensions below, because they best illustrate the potential uses of both systems. The Vanish and EphCOM Firefox extensions have a similar layered architecture: the core modules, which generate the encryption keys and communicate with the network servers to store them, run in the background. The front-end user-interface layer was developed by the Vanish group and is based on the FireGPG plug-in for Gmail. The Vanish group had released it separately by the time the EphCOM group started building their prototype, so it was adapted for the EphCOM Firefox plug-in as well. For the user, working with the Vanish and EphCOM Firefox extensions is very similar and rather straightforward. Suppose a Firefox user has the Vanish plug-in installed on her computer and wants to make a text she has just typed into an input box of some web page "disappearing". She simply selects the sensitive text and right-clicks on the selection. From the pop-up options she picks "Encapsulate Text", and the Vanish client running in the background replaces the text in the window with the encrypted message, while the encryption key is stored in the DHT. A reader of this encrypted message also needs a running Vanish client; he or she right-clicks on the encrypted text, chooses "Decapsulate Text", and the Vanish client does the rest: the text is decrypted and replaced, provided the encryption key has not yet expired. The layers containing the core software elements, which compose and decompose the encryption keys, naturally differ between Vanish and EphCOM. The Vanish prototype was tested with two DHTs, the Vuze DHT and OpenDHT, and the most extensive experiments were performed with the Vuze DHT. To integrate Vanish with the Vuze DHT, the developers had to modify the Vuze client installed on the sender and receiver nodes; these modifications were local to the client nodes, however, and no changes to the Vuze DHT servers were necessary. Installing the EphCOM prototype is even less demanding: it does not require any additional background software. V. TESTS OF BASIC FUNCTIONALITY AND PERFORMANCE The initial tests of both systems confirmed the functionality of the overall scheme and demonstrated the general feasibility of the disappearing-key approach.
These tests also helped to identify potential performance limitations and possible ways to overcome them. For example, Vanish operated robustly and securely when the number of shares per encryption key was chosen between 20 and 50, and storing 50 key shares in the Vuze DHT took about 30 seconds. If the key is composed only when the sender triggers encapsulation, this 30-second latency is inconveniently large. If, however, Vanish generates encryption keys proactively and stores their shares in the DHT in advance, encapsulation takes about 80 milliseconds, so the encryption latency is barely noticeable to the user. The number of shares and the secret-sharing threshold are two crucial Vanish parameters that affect both availability and security, and thus require a trade-off and careful optimization. The tests demonstrated that with $N = 50$ shares and a threshold of 90 … Experience with the Vanish Firefox extension also showed that certain types of communication require a timeout longer than 8 hours (the default timeout of the Vuze DHT), so such communications would need special treatment. As mentioned earlier, Vanish can prolong a key's lifetime, but this requires at least weak synchronization of the sender's and receiver's clocks. Like Vanish, the EphCOM prototype showed noticeable latency when the information about the encryption key was distributed to the DNS cache servers: this distribution took from 20 to 50 seconds, depending on how many parallel threads executed the recursive DNS queries. Key retrieval latency at the receiver's side was smaller, varying from 3 to 10 seconds. This stage is faster than key distribution because only non-recursive queries are sent to the DNS servers for key reconstruction.
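The trade-off between the number of shares $N$ and the threshold $T$ can be made concrete with a simple model that is an assumption of this sketch, not part of the reported evaluation: each share independently remains retrievable with probability $p$. The key is then available with probability $P_{\text{avail}}(N, T, p) = \sum_{k=T}^{N} \binom{N}{k} p^{k} (1-p)^{N-k}$. Before expiry $p$ should be high; after expiry, churn drives $p$ down and availability should collapse. A lower $T$ tolerates more churn, but it also means an attacker needs fewer shares.

```python
from math import comb


def availability(n: int, t: int, p: float) -> float:
    """P[at least t of n shares can still be retrieved].

    Assumes (for illustration only) that each share survives independently
    with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


# --- Usage: N = 50 shares, two thresholds, high vs. low survival rate -----
for t in (25, 45):
    for p in (0.95, 0.5):
        print(f"N=50, T={t}, p={p}: {availability(50, t, p):.6f}")
```

Real DHT churn is not independent across shares, so the numbers this prints are only a qualitative guide to how raising $T$ trades availability for resistance to partial key capture.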
As mentioned above, the error-correcting code used in the first EphCOM prototype was not optimal: every bit 1 of the encryption key was encoded as the 3-bit symbol 111 and every bit 0 as the 3-bit symbol 000. This simple code (effectively a rate-1/3 repetition code) was suitable for a test release but required more DNS requests than necessary, so the latency of key storage and retrieval can be expected to be shorter in future versions of EphCOM. To speed up key generation, the EphCOM prototype also uses a background job that pregenerates sets of valid DNS cache servers and domain names; this background job does not, however, reduce the duration of the subsequent DNS queries. It is also worth noting that neither Vanish nor EphCOM imposes any substantial overhead on the network services they use to store the disappearing keys. The Vuze DHT can in principle store a number of (index, value) pairs that far exceeds any potential load from Vanish users, and the EphCOM developers point out that the additional DNS traffic created by EphCOM queries is negligible compared to the DNS queries made during regular web browsing. VI. SECURITY OF THE SYSTEMS While pursuing their main objective of providing additional protection against retroactive security attacks, both Vanish and EphCOM try not to introduce any new threats to privacy before the end of the lifetime of the stored encryption keys. Both groups achieved their main goal: thanks to key destruction, their systems make a retroactive attack on security extremely difficult to organize. Even if the attacker has access to the DHT machines in the case of Vanish, or to the DNS servers in the case of EphCOM, the nature of both services ensures that the temporary data have become completely unrecoverable.
Moreover, the machines participating in the DHT or serving as DNS servers are scattered around the globe, and gaining access to them is very difficult. Both groups therefore pay special attention to attackers who try to retrieve encryption keys BEFORE the keys are destroyed. Protection from direct copying. To prevent a direct attack that copies the encrypted data together with the corresponding key information, both Vanish and EphCOM are designed so that the encapsulated objects (VDO or ECO) can themselves be additionally encrypted with traditional encryption software, for example PGP or GPG. Then, even if the sender's permanent PGP private keys become known to an attacker after the Vanish or EphCOM encryption keys have expired, the attacker can only recover the encapsulated object, while decrypting the original message is no longer possible. Protecting Internet exchange. Another kind of attacker may try to intercept the Internet communication between the sender or receiver nodes and the network used to store the encryption key (DHT or DNS). This traffic does not contain the encrypted messages, but the attacker can use it to obtain and store encryption keys. Even indirect information, such as the addresses of the contacted nodes (especially the addresses of the DNS servers in the case of EphCOM), can reveal the structure or contents of the encryption keys if accumulated over a long time. The attacker could build a database of encryption keys in the hope that they will be useful for future decryptions. To defend against such attackers, Vanish can limit its connections to DHTs that encrypt the communication between participating nodes by default. Another protection is to use Tor together with Vanish to shield the interactions between the sender or receiver machines and the DHT. Similarly, the EphCOM developers envisage shielding the network communication with the DNS servers with Tor or a similar system. Integration into the storage network. Attackers of this type might infiltrate the network used for key storage and own a significant fraction of the nodes in the DHT used by Vanish or of the DNS servers used by EphCOM, which naturally lets them collect information about encryption keys. To defend against this threat, the number of nodes participating in the storage network should be very large.
For example, the Vanish developers estimated that an attacker would need to own approximately 10% of the network nodes to reconstruct a significant fraction of the encryption keys. Because the number of nodes participating in the storage network is very large, this attack was expected to be very expensive. However, as later studies by two groups of researchers demonstrated [3], the first Vanish prototype was vulnerable to Sybil attacks, and these attacks did not require any expensive hardware or services. Two features of DHTs, overlooked by the Vanish group, made these attacks possible. First, a DHT allows one node to act under many virtual names and thus to take part in the peer-to-peer exchange with many identities. In fact, each IP address owned by the attacker allows him to participate in the Vuze DHT with up to 65,535 node IDs, so with just a few machines one can gain a significant fraction of the DHT. Second, a DHT node does not have to stay online for a long time to collect data from the DHT effectively. The low-cost attacks on Vanish used hopping strategies: each Sybil node lived for only 3 minutes, collected most of the data in one part of the DHT, and then jumped to another part of the Vuze DHT with a new identity. Given that Vanish keys are stored in the Vuze DHT for 8 hours, each such Sybil node can cover 160 locations (8 hours divided by 3 minutes). In future versions of Vanish the developers may try to overcome the shortcomings of the first prototype, but all such security improvements would require modifications not to Vanish itself but to the DHT operations. For example, Vanish could choose another, more secure DHT such as OpenDHT, which is privately hosted and runs only on a separate set of nodes. Such a DHT, however, would essentially act as a trusted third party rather than as an open primary service.
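The two figures in the Sybil discussion follow from simple arithmetic, sketched below. Both functions are illustrative helpers, not part of any attack tool: `hop_locations` divides the key storage time by the Sybil node lifetime, and `sybil_ids` applies the stated bound of 65,535 node IDs per IP address.

```python
def hop_locations(storage_minutes: int, node_lifetime_minutes: int) -> int:
    """DHT regions one hopping Sybil node can visit while a key is stored."""
    return storage_minutes // node_lifetime_minutes


def sybil_ids(num_ips: int, ids_per_ip: int = 65_535) -> int:
    """Upper bound on node IDs obtainable from num_ips addresses."""
    return num_ips * ids_per_ip


# --- Usage: the figures quoted in the text ---------------------------------
print(hop_locations(8 * 60, 3))  # 8-hour storage, 3-minute Sybil lifetime -> 160
print(sybil_ids(16))             # 16 addresses -> 1048560 IDs
```

Under this bound, 16 addresses would in principle yield slightly more identities than the roughly one million nodes of the Vuze DHT; this is only an upper bound on identities, not a statement about how many the published attacks actually used.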
Alternatively, Vanish could ask for modifications to the Vuze DHT that significantly reduce the ability of short-lived identities to take part in the DHT data exchange. In addition, restricting each IP address to a smaller number of node IDs in Vuze would not significantly reduce the functionality of the DHT for its primary service, because the majority of Vuze nodes already satisfy this condition. Such countermeasures would not eliminate the threat of Sybil attacks completely, but they would raise their cost. The EphCOM approach, which uses the DNS infrastructure, is expected to defend against attacks of this type better. To launch a successful Sybil attack within the DNS, the attacker would need to own a very large number of IP addresses, and operating that many distinct hosts as DNS servers would be prohibitively expensive. In addition, under the current public IPv4 policy every account is limited to 5 IP addresses, so a large number of public IP addresses cannot be obtained with just a few hosts. Moreover, EphCOM preselects only relatively stable DNS servers for its operation. For example, to protect against short-lived Sybil identities, an EphCOM implementation can store keys only on DNS servers that have been running for at least one year, and the set of selected servers will still be sufficiently large. One always has to keep in mind, however, that when a disappearing-key system such as Vanish or EphCOM relies on features of a primary network service, future developments of that service may move in a direction unfavorable to those features. EphCOM currently relies on the ability to send different types of requests, namely recursive and non-recursive ones, to DNS servers, and this feature may not be maintained in the future.
For example, future progress might allow all DNS queries to be served quickly as recursive ones, and the decision whether a request is treated as recursive or non-recursive might move from the client to the DNS servers themselves. One therefore has to expect that future disappearing-key systems will always need to adapt to new conditions in the underlying primary network services. VII. CONCLUSION The widespread use of the Internet has made protecting data privacy both more important and more difficult. Private data such as messages are constantly stored by third parties and can become a threat to their original owners in the future. To address this problem, two projects, Vanish and EphCOM, explored a new approach in which the keys used to encrypt the data are stored temporarily in existing primary network services in such a way that they are destroyed automatically after the expiration time. The new systems demonstrated that built-in features of DHTs and caching DNS servers make it possible to destroy encryption keys on schedule. Two prototype applications were developed and underwent initial tests, in which the keys self-destructed at the predefined time without direct user participation. However, this first experience also revealed potential limitations of these systems: the Vanish system was shown to be vulnerable to low-cost Sybil attacks. Despite these difficulties, both systems demonstrated the potential usefulness of the new approach.
REFERENCES
[1] EphCOM: Practical Ephemeral Communications.
[2] Roxana Geambasu, Tadayoshi Kohno, Amit A. Levy, and Henry M. Levy. Vanish: Increasing Data Privacy with Self-Destructing Data. University of Washington.
[3] Scott Wolchok, Owen S. Hofmann, Nadia Heninger, Edward W. Felten, J. Alex Halderman, Christopher J. Rossbach, Brent Waters, and Emmett Witchel. Defeating Vanish with Low-Cost Sybil Attacks Against Large DHTs.

134 USER BEHAVIOR IN ONLINE SOCIAL NETWORKS User Behavior in Online Social Networks Daniel Puscher Abstract Social networks are currently more popular than ever. To improve existing services and make them more user-friendly, it is necessary to understand how users interact with them. Several studies have already been conducted on this topic; they record user behavior in great detail and try to analyze and generalize it. Examining their results leads to interesting insights that can be used to improve existing social networks and to optimize the design of future platforms. I. INTRODUCTION The term "social network" denotes a particular class of web services on which users can create an online profile about themselves that is either public or only partially accessible to strangers. Most providers let users publish a picture of themselves, various personal details and their interests. In addition, users can mark other users as "friends". This establishes a connection between the two users, and each appears in the other's "friend list", where other users can see the connection. These connections form a network of social contacts within which participants can exchange data (for example private messages, photos, videos and links). The interaction and communication functions of a social network have long existed on the Internet in the form of chats, forums and e-mail. The history of social networks, however, only begins in the mid-1990s, when platforms emerged that went beyond plain Internet forums and chats.
One of the first such services was the school-friends community classmates.com, founded in the USA in 1995. At first, however, such services were not particularly popular; then user numbers rose rapidly. Interesting milestones in the history of social networks, at least as far as the "value" of these sites is concerned, were the sale of MySpace to News Corporation for US$580 million in 2005 [6], the sale of a 1.6% stake in Facebook to Microsoft for US$240 million in 2007 [8], and the sale of Bebo (popular in the UK) to AOL for US$850 million in 2008 [20]. So what is special about social networks, and why are they so successful? As with many Web 2.0 services, their success lies in the fact that users can actively shape the Internet: the platform makes it easy for them to publish information on the web themselves. The user is no longer just a consumer but a producer and consumer at the same time (the "participatory web", in German "Mitmach-Web"). The functions of these services are geared toward "social needs"; the main functions are social communication and self-presentation [16]. Social networks are extremely popular at present: [18] counted 149 social networks from Germany alone, and this number has probably risen further since. Besides general networks for the mass market, there are also platforms for more specific target groups, such as torfreunde.de for football players, mamiweb.de for mothers, and even services for senior citizens, although the latter are used only very rarely [13]. The most important portals internationally are MySpace, the VZ networks and Facebook.
According to MySpace itself, more than 220 million users are registered there, the VZ networks have 16.9 million users in total², and Facebook, the clear leader, has more than 400 million active users worldwide³.

But what do people actually use social networks for? To find out why and how students use social networks, C. Lampe et al. conducted a study on Facebook with more than 1000 students of Michigan State University. Among other things, the study asked for what purpose the social network is used. The reason rated most important by the students was "Keep in touch with old friends or with someone I know from high school", with an average importance of 4.63/5. Another very important reason was "Check out the profile of someone I met socially" (4.51/5). Getting to know new people, by contrast, was rarely given as a reason: "Meet in person someone I got to know through Facebook" (2.41/5) and "Find someone to go out with" (1.99/5) [9]. Social networks are thus mainly used to stay in touch with one's friends, not to get to know new people.

According to a 2009 publication by Nielsen, social networks are now the fourth most popular service on the Internet worldwide⁴. 67% of Internet users use such sites, which makes social networks even more popular than personal e-mail [12]. Social networks also show above-average growth: within one year, the number of social network users grew by 27.8%, whereas the number of search engine users grew by only 4.6% [11].

¹ StudiVZ, SchülerVZ and MeinVZ
² Source: …, as of June …
³ Source: …, as of July …
⁴ The underlying data were collected by Nielsen in Australia, Brazil, Germany, France, Italy, Spain, Switzerland, the United Kingdom and the USA.

The rapid spread of these services is reason enough to examine exactly how users use them. This paper first answers the question of why such an analysis makes sense at all. It then presents and analyzes the results of two studies on this topic, grouped as follows:

• Session data: how long and how often a user is online, and how activity depends on time.
• Number of friends: how many friends do users have?
• User activities: which activities do users pursue most?
• Transitions between activities: in which order do users perform actions?
• Interaction with other users: how do users communicate with each other?

In addition, the paper describes what role these findings play in the development of new kinds of social networks, in particular Safebook. Safebook is a decentralized social network, i.e., it does without a central server on which all profile data are stored. Instead, each user stores the profiles of his or her friends. A friend request therefore does not necessarily reach the target user directly; it may first arrive at, for example, a friend of a friend of that user. The data are sent along encrypted detours, which is designed to prevent the communication from being traced back. Safebook is currently being developed by, among others, Thorsten Strufe at TU Darmstadt.
The motivation for its development was a study by the developers themselves, in which randomly chosen existing Facebook profiles were duplicated with the same name and photo. Through friend requests sent to the people in the original profiles' friend lists, more than 80 percent of them could be persuaded to befriend the fake profile. Data leaks that have occurred at various services (such as in May 2010 at Facebook [14] or in 2009 at SchülerVZ [15]) also show that it is time to think about alternatives to the current technology.

II. WHY ANALYZE USER BEHAVIOR?

The reasons for analyzing user behavior in social networks are manifold:

• The efficiency of the existing implementation can be tested. This makes it possible to improve the design of the site and, at the same time, to place advertising better [2]. This point is particularly important because social networks have considerable expenses and, apart from investors, advertising is their only source of income. Facebook, for example, spends more than 20 million US dollars per month on electricity, bandwidth, hardware, rent and staff. These costs keep rising, because the number of users and the amount of content that must be stored keep growing [4]. The first time Facebook made money since its founding in early 2004 was in mid-…; until then, the company's expenses had always exceeded its income [17].
• Models that describe user behavior are of great importance in social studies and in viral marketing. Companies can use this information to spread content or campaigns (in general: advertising) quickly and, above all, effectively [10].
• Because of their popularity, social networks account for a considerable share of Internet traffic, so the information gained from analyzing user behavior can be used to develop the structure of the "next-generation Internet" [1].

III. DESCRIPTION OF THE UNDERLYING DATA

There are a number of scientific studies on user behavior in social networks. A Google Scholar search for "user behavio(u)r" "social network(s)" alone yields almost 4000 results. Many of these studies, however, hardly differ in their results, because the same platforms are often examined and, moreover, user behavior is similar across platforms. This paper draws on two well-known studies: the very detailed publication "Characterizing user behavior in online social networks" by F. Benevenuto et al. from 2009 [1], and "Rhythms of social interaction: messaging within a massive online network" by S. Golder et al. from 2007 [5].

The analysis in the first paper is based on detailed clickstream data. A clickstream records exactly when a user clicked where on a website, i.e., which functions he or she executed. Over a period of 12 days, data were collected from users of the four popular networks Orkut, MySpace, Hi5 and LinkedIn. Since the data cannot be collected directly through the portals, this was done via a Brazilian aggregation service that lets users log into several networks at once with a single authentication. The evaluation considered both the load on the service (how often users log in and how long they stay online) and the activities the users perform.

The second paper examines how messages are exchanged between the users of a social network. For this purpose, the social network Facebook was observed over a period of 26 months. In total, the analysis is based on 362 million messages exchanged by 4.2 million users. The data were collected at 496 North American universities and, for privacy reasons, consist only of fully anonymized headers (each user was assigned a random ID) without message content.

IV. ANALYSIS OF SESSION DATA

Two important properties of user activity are the number of visits per unit of time and the length of a session. According to F. Benevenuto et al., the majority of users (63%) visited the site only once during the 12-day period, although there were also visitors who came more often: the user who logged in most frequently did so 4.1 times per day on average. Session lengths also varied widely. More than half of the users (51%) were logged in for no more than 10 minutes in total over the whole test period, 14% were logged in for more than one hour, and 2% even for more than 12 hours (on average one hour per day), as Fig. 1 shows.

Fig. 1: Length of sessions per user [1]

According to the authors, however, the time spent on the platform and the number of visits are not strongly related. Some users log in only a few times and still spend a long time on the site in total, while others log in frequently but spend little time on the site overall.

Furthermore, activity on the platforms depends strongly on the time of day and the day of the week. As Fig. 2 shows, there are peak times at which very many users (> 700) are online (around 3 p.m.) and times at which fewer users are online (early in the morning and at night). Nevertheless, the service is used by a certain minimum number of people (about 50) at all times. Note, however, that the service studied is used mainly by users from Brazil. For a site used simultaneously by many international users, such a pronounced curve would not appear, because the users are in different time zones.

Fig. 2: Number of connected users by time of day [1]

The measurements also show that the use of social networks follows a clear pattern over the week: at the weekends (in Fig. 2, 28/29 March and 4/5 April), noticeably fewer users are active than on weekdays.

V. NUMBER OF FRIENDS

According to S. Golder et al., the 4.2 million users analyzed have 144 friends on average. Remarkably, some users deviate extremely from this mean; there were, for example, 11 users with more than … friends [5] (see Fig. 3).

Fig. 3: Number of friends per user [5]

For Safebook, where each user stores the data of all of his or her friends, this finding would mean that the average user has to store data for 144 other users. Assuming that these data consist not only of text but also of videos and pictures (as is common today), each user would face a considerable amount of data to store.
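The storage burden can be estimated with a few lines of arithmetic. The sketch below combines the mean of 144 friends from [5] with the illustrative per-user assumptions used in the estimate of this section (about 10 photos of 100 KB each and two videos of about 10 MB each per user). These media figures are the author's assumptions, not measurements, and the conversion uses binary units (1 MB = 1024 KB, 1 GB = 1024 MB).

```python
# Illustrative assumptions from the estimate in this section (not measured values)
FRIENDS = 144          # mean number of friends reported by Golder et al. [5]
PHOTOS_PER_USER = 10
PHOTO_KB = 100         # scaled and compressed photo
VIDEOS_PER_USER = 2
VIDEO_MB = 10          # about 3.5 minutes in YouTube quality


def mirrored_media_mb(friends=FRIENDS):
    """Return (photo MB, video MB) a node must store to mirror all friends' media.

    Uses binary units: 1 MB = 1024 KB.
    """
    photos_mb = friends * PHOTOS_PER_USER * PHOTO_KB / 1024
    videos_mb = friends * VIDEOS_PER_USER * VIDEO_MB
    return photos_mb, videos_mb


photos_mb, videos_mb = mirrored_media_mb()
total_gb = (photos_mb + videos_mb) / 1024
print(f"photos: {photos_mb:.1f} MB, photos + videos: {total_gb:.2f} GB")
```

With these inputs the photos alone come to 144,000 KB (about 141 MB), and photos plus videos to just under 3 GB, which matches the estimate in the text. Changing `friends` shows how quickly the burden grows for users well above the mean.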

Assuming that the uploaded photos, scaled and compressed, still have a size of 100 KB and that each user has uploaded about 10 photos, the photos alone would amount to more than 140 MB. If each user also uploaded 2 videos in YouTube quality (about 10 MB for 3.5 minutes) that would have to be stored as well, the total would come to almost 3 GB. Even in the age of broadband Internet connections and terabyte hard disks, this is a considerable amount.

VI. ANALYSIS OF USER ACTIVITIES

To examine what exactly users do in a social network, a profile was first created for each user from the clickstream data of the study by F. Benevenuto et al. The recorded clicks were then divided into 41 different activities and sorted into 7 groups:

• Searching for groups or other users
• Leaving public short messages for another user
• Sending private messages
• Commenting on a friend's activity
• Viewing/uploading photos and videos
• Activities related to profiles and friends, such as answering friend requests, reading profile pages, etc.
• Activity in groups

A. User activities

One clear conclusion from the results is that browsing is the most frequent activity: 92% of all requests belonged to it. For example, the number of users who browsed messages was 13 times as high as the number of users who sent messages. Activities that require more engagement from the user, such as editing one's own profile, writing comments or composing messages, are also not particularly popular compared with passive activities.
It can thus be said that most users spend the majority of their time in the social network passively, reading friends' new posts, looking at photos or browsing profiles, and take an active part only occasionally.

B. Probability of activity over time

To check whether the kind of activity depends on whether users stay online for a long or a short time, users were divided into 4 classes according to their session duration. For each of these classes, it was then determined which activities its members preferably pursue. Fig. 4 and Fig. 5 show how the time spent on different actions changes with session duration.

Fig. 4: Time that a user with a given session length spends on popular activities [1]
Fig. 5: Time that a user with a given session length spends on less popular activities [1]

It quickly becomes apparent that users, regardless of session duration, spend most of their time on their own and other people's profiles and on public short messages ("What are you doing right now?"). In very short sessions (< 1 minute), 90% of the time is spent on these two activities; even in very long sessions (> 20 minutes), 75% of the time is still spent on them. The graphs also show that, as session duration increases, the share of time spent on activities that require more engagement (that is, non-passive activities) rises steadily. The probability that a user actively takes part therefore increases with session length. Likewise, the share of time spent viewing videos and photos is twice as high in long sessions as in very short ones.

VII. TRANSITIONS BETWEEN ACTIVITIES

The study also evaluated which action users perform next after finishing an activity.
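Next-action statistics of this kind can be obtained by counting consecutive pairs of activities in each user's clickstream, i.e., by estimating a first-order transition matrix. The sketch below is a minimal illustration of that idea, not the procedure used in [1]: the activity names, the activity-to-category mapping and the two toy clickstreams are hypothetical. It reports the share of transitions that repeat the same activity and the share that stay within one category, the two kinds of figures discussed next.

```python
from collections import Counter, defaultdict

# Hypothetical activity -> category mapping (illustrative, not the 41 activities of [1])
CATEGORY = {
    "view_photo": "photos",
    "browse_albums": "photos",
    "read_scrap": "scraps",
    "write_scrap": "scraps",
    "view_profile": "profiles",
}


def transition_stats(clickstreams):
    """Return (share of repeated activities, share of same-category transitions,
    first-order transition probabilities P(next | current))."""
    pairs = Counter()
    for activities in clickstreams.values():
        # consecutive (current, next) pairs within one user's clickstream
        pairs.update(zip(activities, activities[1:]))
    total = sum(pairs.values())
    repeated = sum(n for (a, b), n in pairs.items() if a == b)
    same_category = sum(n for (a, b), n in pairs.items() if CATEGORY[a] == CATEGORY[b])
    outgoing = defaultdict(int)  # number of transitions leaving each activity
    for (a, _), n in pairs.items():
        outgoing[a] += n
    probs = {(a, b): n / outgoing[a] for (a, b), n in pairs.items()}
    return repeated / total, same_category / total, probs


# Hypothetical toy clickstreams for two users
streams = {
    "user_a": ["view_photo", "view_photo", "browse_albums", "view_photo"],
    "user_b": ["view_profile", "read_scrap", "write_scrap", "write_scrap"],
}
repeat_share, category_share, probs = transition_stats(streams)
print(f"repeated: {repeat_share:.0%}, same category: {category_share:.0%}")
```

In the toy data, 2 of the 6 transitions repeat the previous activity and 5 of the 6 stay within one category. Row-normalizing the counts (`probs`) yields the per-activity next-step probabilities that a site designer would use to decide which functions to link from where.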
These transition data give designers valuable hints for the design of the website: functions that are called particularly often from a certain point should also be easy to find from there. The results show that, after finishing an activity, a user is very likely to perform the same activity again. After viewing a photo, for example, the user is very likely to view another photo. Overall, 67% of user activities were repeated. Moreover, there were more transitions between functions within the same category (77%) than across categories (23%). Users thus tend to perform several thematically related tasks in succession. Directly after browsing a list of photo albums, for example, a user is more likely to look at photos than to use a function from another category, such as writing a message.

The results also show that an individual function is called disproportionately often when it is very easy to reach, for example displaying the home page or the profile page, which is possible from every other page. When developing future social networks, it is therefore important to make functions that are meant to be used often, or are expected to be used often, as easy to access as possible: the more clicks away a function is, the less often it is used.

VIII. INTERACTION WITH OTHER USERS

A. Invocation of functions

Fig. 6 shows on which users' profiles functions were invoked, distinguishing between the user's own profile, the profiles of friends and the profiles of indirect friends.

Fig. 6: Invocation of pages within the social network [1]

The graph shows that, overall, more than 80% of all functions are invoked on the user's own profile or on the profile of a direct friend. Users visit their own profile frequently when writing comments and short messages, and the short messages on the profile pages of direct friends are read almost as often. Profiles of direct friends are visited more often (59%) than the user's own profile, and profiles of indirect friends are also visited in 22% of cases. When viewing photos, it is striking that a very large share of the photos viewed come from direct friends.

For Safebook, these findings are interesting because no additional data exchange is needed for 78% of profile visits, since the user's own profile and those of direct friends are stored locally. Only in 22% of cases is the profile of an indirect friend accessed, which has to be retrieved from another user.

B. Direct information exchange with other users

The social networks studied offer three ways of exchanging information directly:

• Writing short messages
• Writing private messages
• Writing comments

Short messages are mostly written on friends' profile pages; posting on one's own profile is very rare (0.5%). Surprisingly, most users do not communicate with their friends via private messages: instead, 76% of the private messages sent go to indirect friends. In the social network studied, comments could only be written for direct friends, which is why Fig. 7 shows a rate of 100% for this group.

Fig. 7: Direct information exchange within the social network (categories: friend, 2+ hops) [1]

1) Private messages: In their Facebook study, S. Golder et al. found that private messages are written rather rarely. According to their data, a user sends 0.97 messages per week on average. Note that a few users send very many messages, while a large number of participants send none at all.
That so few messages are sent is surprising. Since communication by e-mail is very important to students [7], a much higher value would have been expected here.

2) Poking, gruscheln, etc.: Another feature of some social networks is "poking" (on Facebook) or "gruscheln" (on the VZ networks). If a user pokes or "gruschelt" another user, the latter receives a notification the next time he or she logs in, together with a direct option to poke or "gruscheln" back.

Gruscheln or poking can mean different things to different pairs of users; the users decide this for themselves. A relatively widespread phenomenon is the so-called "poke war" (poke = Anstupsen), in which a pair of users poke each other repeatedly over several hours or days. According to S. Golder et al., messages and pokes are mainly exchanged between friends (90.6% of the messages and 87.5% of the pokes). Although messages are mainly sent between friends, only a small proportion of users send messages at all on Facebook, too (as already observed above for other networks). The analysis of the data showed that of 378 million friend pairs, only 57 million (15.1%) exchanged any messages at all [5].

C. What prompts users to visit a profile

The analysis of the collected data showed that most accesses to friends' profiles started from the user's own profile (68%). From there, another friend's profile was called up in 25% of cases, and an indirect friend's profile with a probability of 7%. Most users call up profiles of friends and indirect friends disproportionately often directly from their home page. This may be because the home page shows news about friends (for example their activities with their own friends, such as comments), which in turn contain links to the people involved. There are thus links not only to friends but also to indirect friends. Another notable finding is that a high share of accesses starts from friends' profiles, namely 25% of the accesses to friends' profiles and 30% of the accesses to indirect friends' profiles.
This confirms the observation that users of social networks become aware of other users and content mainly through their friends [3].

D. Number of users contacted

Here, the number of people a user contacted was evaluated, comparing all activities with those activities that are also visible to third parties. The degree of interaction found is very low. During the 12-day test period, a user communicated with only 3.2 other participants of the network on average; considering only the activities visible to third parties, the figure drops to just 0.2 participants. Such low values have also been found in other studies. According to Christo Wilson et al. [19], almost 60% of the users of the social network Facebook are inactive: they have shown no activity whatsoever for more than a year.

Another trend is that the degree of interaction does not increase with the degree of connectivity (number of friends). Users with a low degree communicate with about as many friends as users with a high degree. This suggests that it is easier to gain friends in a social network than to communicate with them.

Overall, 55% of users communicated with at least one other user during the test period: 8% had at least one visible activity, and 47% showed only invisible activities. These values should be taken into account when analyzing user behavior on the basis of observed activities; otherwise, 47% of the users would be overlooked, because their activities are invisible.

IX. CONCLUSION

Many of the findings obtained from the studies, or worked out by their authors, sound plausible and will already have been noticed by people who use social networks themselves. From the results, a picture of the average social network user emerges.
This user visits the social network once within 12 days, probably on a weekday in the early afternoon, and stays logged in for no longer than 10 minutes. The average user has 144 friends and spends most of the time not writing messages but browsing other users' profile pages. However, since social networks are constantly changing and user behavior can change with every new function or design, it is important to repeat such studies at regular intervals and to compare the findings.

REFERENCES

[1] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing User Behavior in Online Social Networks. In ACM SIGCOMM Conference on Internet Measurement, pages 49-62.
[2] Moira Burke, C. Marlow, and T. Lento. Feed me: motivating newcomer contribution in social network sites. In Proceedings of the 27th International Conference on Human Factors in Computing Systems. ACM.
[3] Meeyoung Cha, Alan Mislove, and Krishna P. Gummadi. A measurement-driven analysis of information propagation in the Flickr social network. Proceedings of the 18th International Conference on World Wide Web (WWW '09), page 721.
[4] Andreas Dengler. Geldvernichtungsmaschine Facebook? [Internet]. geldvernichtungsmaschine-facebook-jetzt-mit-zahlen/, Oct.
[5] S. A. Golder, D. M. Wilkinson, and B. A. Huberman. Rhythms of social interaction: Messaging within a massive online network. In Communities and Technologies 2007: Proceedings of the Third Communities and Technologies Conference, Michigan State University 2007, page 41. Springer-Verlag New York Inc.

[6] Handelsblatt. News Corporation kauft Internetfirma Intermix Media [Internet]. news-corporation-kauft-internetfirma-intermix-media;929653, Jul.
[7] S. Jones and M. Madden. The Internet goes to college. Pew Internet and American Life Project.
[8] Jürgen Kuri. Microsoft kauft sich bei Social-Networking-Site Facebook ein [Internet]. Microsoft-kauft-sich-bei-Social-Networking-Site-Facebook-ein html, Oct.
[9] Cliff Lampe, Nicole Ellison, and Charles Steinfield. A Face(book) in the Crowd: Social Searching vs. Social Browsing. Human Factors.
[10] Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1):5-es, May.
[11] Alan Long. The Rise and Rise of the Social Network [Internet]. of_the_socia.html, Nov.
[12] Nielsen Online Report. Social networks & blogs now 4th most popular online activity, September.
[13] Wilfried Schock. 50plus im Netz Senioren Communitys nicht gefragt [Internet]. senioren-communitys-nicht-gefragt/, Jul.
[14] Maximilian Schönherr. Lauschangriff im Facebook-Chat [Internet]. http: //, May.
[15] Julia Seeliger. Daten-Leck bei Schüler-VZ [Internet]. daten-leck-bei-schueler-vz/, Oct.
[16] Stoxn. Online-Communities und Social Networking: Wichtiger Bestandteil des Web 2.0 [Internet]. Version k/stoxn/online-communities-und-social-networking/2o6u2g6xfx3th/62, Sep.
[17] Oliver Voß. Facebook verdient erstmals Geld [Internet]. facebook-verdient-erstmals-geld /, Sep.
[18] Martin Weigert. Aktuelles Ranking: 149 Social Networks aus Deutschland [Internet]. zn-aktuelles-ranking-149-social-networks-aus-deutschland/, Apr.
[19] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Y. Zhao. User interactions in social networks and their implications. Proceedings of the Fourth ACM European Conference on Computer Systems (EuroSys '09), page 205.
[20] Peter-Michael Ziegler. Time Warner zahlt 850 Millionen US-Dollar für britische Online-Community [Internet]. Time-Warner-zahlt-850-Millionen-US-Dollar-fuer-britische-Online-Community html, Mar.

141 GENERAL OVERVIEW ON UNDERLAY MODELLING FOR P2P SIMULATORS

General Overview on Underlay Modelling for P2P Simulators

Christian Rosskopf

Abstract: Developing and analyzing P2P applications requires knowing the underlying network structure and being able to simulate it. This paper therefore examines and analyzes several approaches for generating topologies, assessing how well each of them succeeds in creating a topology that resembles the Internet as closely as possible. A realistic simulation of an Internet-like topology also has to take latencies into account. For this reason, several models for measuring and computing latencies are presented.

I. INTRODUCTION

Meaningful testing of peer-to-peer (P2P) applications requires a large number of participating peers. This paper deals with how a large P2P network can be simulated realistically. On the one hand, the heterogeneity of the peers must be taken into account, since they can differ greatly, for example in computing power and in their connection to the Internet. On the other hand, and above all, the structure of the simulated network must closely resemble the Internet, so that the results obtained can be transferred to a later deployment. This paper therefore examines how such a network structure can be created for a simulation. A realistic simulation additionally requires a latency model. For this purpose, several models are presented that make it possible to measure latency in large networks; the sample data collected in this way could then be integrated into a simulation model.

The paper is structured as follows. Chapter II first defines and explains the terms that are fundamental to this work and gives a short introduction to the structure of peer-to-peer networks. Chapter III briefly considers the structure of the Internet.
To this end, measured results are presented that can be regarded as characteristic values for the Internet. Chapter IV considers various approaches for generating a topology that resembles the Internet as closely as possible. In Chapter V, the results of these approaches are compared with the values measured in Chapter III, which shows how well each approach performs. Chapter VI then deals with how latencies can be represented on a suitable model during a simulation, presenting several models concerned with measuring latency as accurately as possible. In Chapter VII, the latency models are evaluated and their memory and computing requirements are considered. Chapter VIII then briefly presents two existing P2P simulators. Finally, Chapter IX summarizes the paper, considering how well topologies can be generated and which kinds of latency simulation are particularly recommendable.

II. FUNDAMENTALS

First, important terms from graph theory and other required vocabulary are briefly explained. Then it is defined what is to be understood by a P2P network and what characterizes it.

A. Graph-theoretic terms

The following subsections define important terms from graph theory; the more complex ones are explained using the example graph in Figure 1. These terms are taken up repeatedly in the following chapters in order to compare the Internet with artificially created topologies.

1) Average Path Length: This denotes the average path length [21] in a graph. For each node, the average distance to all other nodes is computed, and these per-node values are then averaged.
Depending on the definition, the distance can be the number of hops on the shortest path or the sum of the weighted edges along it.

2) Degree Distribution: This denotes the distribution of node degrees in the graph. It indicates, for example, what fraction of the nodes in the graph have a low degree.

3) Diameter: In [21], the mean diameter of a graph is defined under the name "average eccentricity". The eccentricity of a node is the length of the shortest path between it and the node farthest away from it. The average eccentricity is the mean of the eccentricities of all nodes in the graph, and the diameter of the graph is the largest eccentricity of any node.

4) Clustering Coefficient: The local clustering coefficient [21] describes how densely the neighbours of a node are connected to each other. It is the number of links that actually exist between the node's neighbours divided by the number of links that could exist between them, which is k(k-1)/2 for a node with k neighbours. A value of 1 means that every neighbour is also directly connected to every other neighbour; a value of 0 means that no connections exist between the neighbours. For nodes with fewer than two neighbours, the coefficient is undefined. The global clustering coefficient is the average of the local values.
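The definitions above can be checked on a small example. The sketch below (standard library only) computes the average path length, the average eccentricity and diameter, the degree distribution, and the local and global clustering coefficients. The edge list is an assumption: it is reconstructed from the statements in the caption of Fig. 1 and in the vertex-cover example, since the figure itself is not reproduced here. Distances are hop counts found by breadth-first search, and the global clustering coefficient averages only the nodes whose local value is defined.

```python
from collections import Counter, deque
from fractions import Fraction

# Assumed edge list, reconstructed from the caption of Fig. 1 (not the original figure data)
EDGES = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5), (4, 6)]


def adjacency(edges):
    """Build an undirected adjacency map node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_metrics(adj):
    """Average path length, average eccentricity and diameter (connected graph assumed)."""
    n = len(adj)
    node_means, ecc = [], []
    for v in adj:
        d = hop_distances(adj, v)
        others = [d[w] for w in adj if w != v]
        node_means.append(Fraction(sum(others), n - 1))  # mean distance of v to all others
        ecc.append(max(others))                          # eccentricity of v
    return sum(node_means) / n, Fraction(sum(ecc), n), max(ecc)


def degree_distribution(adj):
    """Share of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: Fraction(c, len(adj)) for k, c in sorted(counts.items())}


def local_clustering(adj, v):
    """Existing links among v's neighbours divided by the k(k-1)/2 possible ones."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return None  # undefined, as for node 6 in Fig. 1
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return Fraction(links, k * (k - 1) // 2)


adj = adjacency(EDGES)
apl, avg_ecc, diameter = path_metrics(adj)
local = {v: local_clustering(adj, v) for v in sorted(adj)}
defined = [c for c in local.values() if c is not None]
global_cc = sum(defined) / len(defined)
print(apl, avg_ecc, diameter, degree_distribution(adj), local, global_cc)
```

For this reconstructed graph, the local clustering coefficients reproduce the caption values (1 for node 1, 1/3 for nodes 2 and 5, 0 for nodes 3 and 4, undefined for node 6). The average path length is 5/3 hops, the average eccentricity 5/2 and the diameter 3. Exact fractions are used only to make the small example easy to compare with the caption; floats would be the usual choice for large topologies.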

142 GENERAL OVERVIEW ON UNDERLAY MODELLING FOR P2P SIMULATORS
Fig. 1. The clustering coefficient (CC) of node 1 is 1, since its two neighbors are connected to each other. For nodes 2 and 5, CC = 1/3, since only one of the three possible connections among their neighbors exists. Nodes 3 and 4 have CC = 0, since their neighbors are not connected to each other. For node 6 it is undefined, since it has fewer than two neighbors.
5) Vertex Cover: The vertex cover [21] denotes the ratio between the size of a selection of nodes and the number of nodes in the graph. The selection must satisfy the following: 1) If all nodes reachable via one of their edges are added to the selected nodes, the result is the set of all nodes in the graph. 2) The selection must contain a minimal number of nodes. In Figure 1, at least 2 nodes are needed, so the vertex cover is 1/3; it could be formed by any of the following pairs: {1,4}, {2,4} or {4,5}.
6) Maximum Clique Size: In graph theory, a clique is a subset of a graph's nodes in which every pair of nodes is directly connected, i.e., a completely interconnected subset. For topologies, this would be called a "fully connected mesh". The maximum clique size is accordingly the size of the largest clique in the graph. In Figure 1, the maximum clique has size 3 and consists of the node set {1,2,5}.
B. Further Terms
1) Latency: Latency denotes the delay in communication between nodes. One cause can be geographic distance: transmissions between a peer in Europe and a peer in North America usually experience higher latency than those between two geographically adjacent nodes.
Latency can also arise from different Internet connections: two geographically adjacent nodes may use different service providers and therefore experience much higher latency than their distance would suggest.
2) Jitter: This denotes a noise value that explains fluctuations in the measured latency between two nodes. Because of jitter, a response can arrive either earlier or later than expected.
3) Churn: This term describes the constant fluctuation of nodes in a P2P network. New nodes keep joining, while others leave the network, either by signing off or without notice.
4) Power law: The general form of a power law is shown in Formula 1, where b and c are real constants.
$$f(x) = b \cdot x^{c} \tag{1}$$
Power laws have been observed in many fields; for example, some economic phenomena as well as population growth follow a power law. Relevant for this paper is that several power laws also exist in the Internet: for example, the frequency with which a given node degree occurs, and thus the node degree distribution, follows a power law.
C. Peer-to-Peer Networks
In contrast to a client-server model, all peers in a P2P network are equal according to the original definition [6]. A typical feature of P2P networks is that there is no central control, so the peers are independent. A P2P network is very dynamic, with nodes constantly joining and leaving. Further general properties of P2P networks are scalability and robustness against attacks: a network can operate well with a large number of nodes, since they do not have to be managed by a central entity. This also makes it difficult for attackers to find vulnerable points within the P2P network.
In practice, structured and unstructured P2P networks are distinguished. In a structured network such as Chord [20], the individual peers have fixed areas of responsibility for objects; in this case, a client is assigned a fixed position in the overlay, for example by a hash function. In the unstructured approach, which is often found in file-sharing applications, a client can choose for itself whom to connect to and thus influence its position in the overlay.
III. CHARACTERISTIC PROPERTIES OF THE INTERNET
The values presented here were published in [21] and are based on the processing of two studies [16], [17], which collected routing tables of 51 autonomous systems (AS) between November 1997 and February 2002. The individual domains or autonomous systems range in size from about 6000 to … nodes.
A. Node Degree
[21] showed that roughly 1.5 to 2% of the nodes account for more than one third of the sum of all node degrees. On the other hand, more than 60% of the nodes have a degree of only 1, i.e., they are connected to just one other node, and roughly another 25% of the nodes have degree 2. [7] showed that the node degree distribution follows a power law.
B. Clustering Coefficient
Because very many nodes have only a very small number of neighbors, higher clustering coefficients can be observed than in many random models. For the observed systems, values between 0.35 and 0.45 were measured, with a slight increase over the study period. This relatively high value can also be attributed to the power law that governs the node degree distribution.
C. Vertex Cover
In the observations cited in [11], vertex cover values of 0.15 to 0.2 were found. Note that the values decreased continuously over the course of the experiment and reached 0.15 at the end. This means that, with the right selection, 15% of the nodes suffice to cover the complete graph.
D. Diameter
For the autonomous systems studied, a mean diameter of 6.5 to 7.5 was determined. On average, it is therefore possible to get from one node to any other node in about 7 hops.
E. Maximum Clique Size
Over the course of the experiment, values between 8 and 16 nodes were measured. [11] assumed that these cliques consist of the nodes with the highest node degree. Transferred to the Internet as a whole, the maximum clique is presumably formed by the 10 to 15 largest autonomous systems.
IV. MODELLING INTERNET TOPOLOGIES
This section presents several models and generators used to create graphs and briefly describes the procedure each of them uses to build a graph. Chapter V then evaluates how successful the models are at generating a topology similar to the values measured in Chapter III.
A. Gilbert Model
In this very simple model [10], [19], an edge exists between any two nodes with probability p. A generator therefore needs only the number of nodes and the probability p as input.
B. Barabási-Albert Model
This model [19] attempts to reproduce the power-law property of the node degree described in Chapter III.
To this end, the graph is built up incrementally. A node newly attached to the graph tries to connect to m different nodes, preferring nodes with a high degree. An existing node j is selected with probability Π(j), computed according to Formula 2, where k_t(j) denotes the degree of node j at time t:
$$\Pi(j) = \frac{k_t(j)}{\sum_{v \in V} k_t(v)} = \frac{k_t(j)}{2mt} \tag{2}$$
The second equality holds because each of the t nodes added so far contributed m edges, and every edge adds 2 to the sum of the degrees (ignoring the initial seed graph). Thus, the larger a node's share of the total degree sum, the more likely it is to be chosen by a new node. The parameter m specifies how many connections a new node establishes.
Fig. 2. The figure from [22] shows the two ways in which a new node can be connected to an existing network and which new connections are created in the process.
C. Positive-Feedback Preference Model (PFP Model)
This model [22] is also oriented towards the power-law property, but nodes with a high degree are favored even more strongly. Figure 2 distinguishes two ways in which a new node k1 can connect to the current graph: 1) With probability 1 - p, k1 connects to one existing node, which in turn connects to two further nodes of the graph to which it has no direct connection yet. 2) With probability p, the new node k1 is connected to two nodes of the existing graph; one of these connects to a further node of the graph to which it has no direct connection yet. The nodes to which connections are created are chosen according to Formula 3, where Π(i) denotes the probability that a new connection is made to node i with degree k_i. When started, the generator receives the number of nodes as well as fixed values for δ and p.
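The attachment rules of subsections A to C can be sketched as follows. This is a minimal illustration under stated assumptions, not the generators used in [19] or [22]: the function names are hypothetical, the Barabási-Albert seed graph is assumed to be a complete graph on m + 1 nodes, and for the PFP model only the preference function of Formula 3 is shown, not its two connection modes from Figure 2. The value δ = 0.5 in the example is arbitrary.

```python
import math
import random
from itertools import combinations


def gilbert_graph(n, p, rng):
    """Subsection A: every node pair gets an edge independently with probability p."""
    adj = {v: set() for v in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def ba_probabilities(degrees):
    """Formula (2): a node's share of the total degree sum."""
    total = sum(degrees.values())
    return {j: k / total for j, k in degrees.items()}


def pfp_probabilities(degrees, delta):
    """Formula (3): weights k^(1 + delta * ln k), normalized; degrees must be >= 1."""
    weights = {i: k ** (1 + delta * math.log(k)) for i, k in degrees.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def pick_distinct(probabilities, m, rng):
    """Draw m different nodes, each draw weighted by the given probabilities."""
    remaining = dict(probabilities)
    chosen = []
    for _ in range(m):
        nodes = list(remaining)
        node = rng.choices(nodes, weights=[remaining[v] for v in nodes], k=1)[0]
        chosen.append(node)
        del remaining[node]
    return chosen


def grow_ba(n, m, rng):
    """Subsection B: each new node links to m existing nodes chosen by Formula (2).
    Seed graph (assumption): a complete graph on m + 1 nodes."""
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    for new in range(m + 1, n):
        degrees = {v: len(nbrs) for v, nbrs in adj.items()}
        targets = pick_distinct(ba_probabilities(degrees), m, rng)
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj


rng = random.Random(42)
g = grow_ba(200, 3, rng)
print(min(len(nbrs) for nbrs in g.values()))  # 3: no node ends up with degree 1
degrees = {"low": 1, "high": 10}
# PFP gives the high-degree node a larger share than the linear BA rule.
print(ba_probabilities(degrees)["high"], pfp_probabilities(degrees, delta=0.5)["high"])
```

The first print illustrates the weakness discussed in Chapter V: with m = 3, every node has at least three neighbors, whereas most real AS nodes have only one. With δ = 0 the PFP weights reduce to the linear preference of Formula 2.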
$$\Pi(i) = \frac{k_i^{\,1+\delta \ln k_i}}{\sum_j k_j^{\,1+\delta \ln k_j}}, \quad \delta > 0 \tag{3}$$
According to the evaluations in [22], the best results are achieved with δ = … and p = 0.4. Here, δ is an experimental parameter whose ideal value was determined through several simulation runs.
D. I-Net 3.0
The generator I-Net 3.0 [21] is a further development of I-Net 2.2 [12]. Since I-Net 2.2 still deviated too much from real measured values in some respects, I-Net 3.0 was specifically improved in these areas, above all degree distribution and vertex cover. In I-Net 2.2, the vertex cover was 50 to 100% larger than in the measurements. This is because too many nodes with a small degree were connected to other nodes with a small degree. These nodes then also had to be included in the vertex cover set, but in turn covered hardly any other nodes via their edges. For example, 35% of the nodes with degree 2 were connected to nodes with a degree of at most 3, whereas in the real data this was only just under 5%. Something similar was found for the most highly connected nodes: about 45% of the neighbors of the node with the second-highest degree themselves have a degree of at most 3, whereas a value of 75% was expected. It follows that nodes with a low degree must be connected to nodes with a high degree more often. Consequently, the connection algorithm was revised in I-Net 3.0: the probability that two nodes connect now increases the more their degrees differ. A graph is generated according to the following scheme:
1) A spanning tree is built incrementally from all nodes whose degree is to be greater than 1. Each new node selects, according to Formulas 4 and 5, a node that still has free degree (unused connection slots).
2) All nodes with degree 1 are then attached to the graph one by one; they also choose their connection partner via Formulas 4 and 5.
3) The graph now contains all nodes, but many nodes still have free degree. New connections are therefore created within the graph, starting with the node that has the most free degree. Here, too, the partner is selected via Formulas 4 and 5.
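The partner selection used in all three steps (Formulas 4 and 5) can be sketched as follows. This is a minimal illustration, not the I-Net 3.0 source code: the function names are hypothetical, f(d) is taken to be the number of nodes with intended degree d, and the natural logarithm is assumed (the base changes where the max(1, ·) threshold takes effect). The degree sequence in the example is made up.

```python
import math
from collections import Counter


def inet_weight(d_i, d_j, freq):
    """Formula (4): weight with which node i (degree d_i) favors node j (degree d_j).
    freq[d] is the frequency f(d) of degree d; natural log is an assumption."""
    distance = math.sqrt(math.log(d_i / d_j) ** 2 + math.log(freq[d_i] / freq[d_j]) ** 2)
    # Similar degrees give distance < 1, so the weight falls back to d_j (linear).
    return max(1.0, distance) * d_j


def connection_probabilities(i, target_degree, candidates):
    """Formula (5): normalize the weights of node i over the candidate nodes.
    target_degree maps node -> intended degree; candidates are nodes with free degree."""
    freq = Counter(target_degree.values())
    weights = {j: inet_weight(target_degree[i], target_degree[j], freq) for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Hypothetical degree sequence: 100 nodes of degree 2 and one hub of degree 50.
target = {f"n{k}": 2 for k in range(100)}
target["hub"] = 50
probs = connection_probabilities("n0", target, ["n1", "hub"])
print(round(probs["hub"], 3))  # 0.993
```

The example shows the intended effect: a degree-2 node is far more likely to attach to the rare high-degree hub than to another degree-2 node, which counteracts the low-to-low degree connections criticized in I-Net 2.2.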
$$w_j^i = \max\left(1,\ \sqrt{\left(\log \frac{d_i}{d_j}\right)^2 + \left(\log \frac{f(d_i)}{f(d_j)}\right)^2}\right) \cdot d_j \tag{4}$$
$$P(i, j) = \frac{w_j^i}{\sum_{k \in G} w_k^i} \tag{5}$$
In Formula 5, P(i, j) is the probability that a node i with degree d_i connects to a node j with degree d_j. For this, Formula 4 computes a weight for the connection from the two node degrees and their frequencies f(d). Combinations of very different degrees are strongly preferred, while identical or very similar degrees are weighted only linearly (by d_j). The most important generator settings are the desired number of nodes and the fraction of nodes with degree 1.
V. EVALUATION OF THE TOPOLOGY MODELS
This section considers the topologies generated by the models and generators presented in Chapter IV and compares them with the values characteristic of the Internet from Chapter III.
A. Gilbert Model
The generated topology deviates very strongly from the one presented in Chapter III. The way the nodes are connected results in a completely different node degree distribution; the model yields a well-defined expected number of connections per node, namely p(n - 1) for n nodes. Since all nodes have roughly the same number of neighbors and the connections are created at random, the neighbors of a node are not strongly connected to each other. This results in a very low clustering coefficient, which therefore deviates strongly from the target values.
B. Barabási-Albert Model
The model yields the best values for the setting m = 3. However, every node then has at least m connections, which deviates strongly from the values in Chapter III and [21], where it was shown that most nodes have only one neighbor.
C. PFP Model
The topology generated by the model was compared in [22] with a Chinese AS.
In terms of node degree, values very close to the reference were achieved, and very similar values were also obtained for path length and maximum clique.
D. I-Net 3.0
The optimizations over the previous version led to improvements in degree distribution and vertex cover. However, I-Net 3.0 still deviates strongly from the values measured in Chapter III for clique size and clustering coefficient. The results for the maximum clique are especially sobering, since I-Net 2.2 delivered better values here. The clustering coefficient was not really improved over the previous version either: for the early comparison data the results were even slightly below those of I-Net 2.2, while for the later data a small improvement over the predecessor was achieved. Since the node degree distribution of I-Net 3.0 is almost identical to the measured reference values, the inaccuracy in individual areas is probably due to the way the nodes are connected to each other. The connection algorithm should therefore be refined further based on these new results.
VI. MODELLING LATENCIES IN SIMULATIONS
To run a realistic simulation, a realistic representation of the latency between the individual nodes is needed in addition to a suitable topology model. One option, for example, is to use constant or random latency values. Another is to assign previously measured latencies to the nodes using an analytical distribution function. The following presents several further approaches intended to yield realistic latency values for a simulation. These models are evaluated in Chapter VII.
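The two simplest options mentioned above, constant delays and delays drawn from an analytical distribution, can be sketched as small latency providers. This is a minimal illustration with hypothetical class names; the lognormal distribution and its parameters are arbitrary choices, not values from the cited papers. The optional per-pair cache anticipates a point from Chapter VII: re-sampling on every lookup introduces unwanted jitter, while caching fixes each pair's value at the price of one stored entry per pair.

```python
import random


class ConstantLatency:
    """Every pair of nodes gets the same delay (in ms)."""

    def __init__(self, ms):
        self.ms = ms

    def latency(self, a, b):
        return self.ms


class DistributionLatency:
    """Delays drawn from an analytical distribution (lognormal here, an arbitrary
    illustrative choice). With cache=True each pair keeps its first value, which
    avoids unwanted jitter but stores one entry per pair (O(n^2) in the worst case)."""

    def __init__(self, mu, sigma, cache=True, seed=0):
        self.mu, self.sigma, self.cache = mu, sigma, cache
        self.rng = random.Random(seed)
        self.table = {}

    def latency(self, a, b):
        key = frozenset((a, b))  # symmetric: latency(a, b) == latency(b, a)
        if self.cache and key in self.table:
            return self.table[key]
        value = self.rng.lognormvariate(self.mu, self.sigma)
        if self.cache:
            self.table[key] = value
        return value


cached = DistributionLatency(mu=4.0, sigma=0.5)  # median exp(4), roughly 55 ms
print(cached.latency(1, 2) == cached.latency(2, 1))  # True
```

Neither provider knows anything about node positions, which is exactly the limitation that the coordinate-based models below address.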

Fig. 3. To determine the latency between A and B, KING sends a DNS query to Name-Server-A asking for host B; the query is forwarded to Name-Server-B. From the response time and the latency between KING and Name-Server-A, the latency between the servers can be computed. Figure taken from [11].
A. KING
KING [11] estimates the latency between two arbitrary end hosts using DNS queries. This method has the advantage that the end hosts do not need to be under the control of the KING network. KING is based on two observations: 1) An end host is usually located close to its DNS server. 2) The latency between two DNS servers can be determined via recursive DNS queries. To determine the latency between hosts A and B in Figure 3, the KING client sends a query for Host-B to Name-Server-A, which forwards it to Name-Server-B. The latency between the client and Name-Server-A, which can be determined with an ICMP ping or an iterative DNS query, is subtracted from the processing time of the query. Since, by observation 1, the end hosts are located close to their name servers, the resulting value can be regarded as the latency between the end hosts.
B. Global Network Positioning (GNP)
[14] discusses the weaknesses of several approaches to simulating latencies. Determining latencies via analytical distribution functions is simple and requires few resources, but the geographic distribution of the nodes cannot be taken into account. If a latency is stored for every pair of nodes, the memory requirement grows quadratically with the number of nodes. Therefore, an approach with linear complexity based on network coordinates is proposed.
To keep memory and computational requirements low during the simulation, complex computations must be carried out beforehand; their results, however, can be reused for further simulations. The model is based on the observation that distant nodes can often communicate with lower delay than geographically adjacent nodes, for instance because they use the same ISP. The model requires so-called monitors: nodes whose mutual round-trip times are known. The monitors should be chosen to lie geographically as far apart as possible. Based on these data, the monitors are then placed in a d-dimensional coordinate system, where d must be smaller than the number of monitors. To determine its position in this space, a new node must ping at least d + 1 monitors; from the latencies obtained, the node can compute its own position. To obtain the latency between any two nodes placed in the coordinate system, only the Euclidean distance between their coordinates has to be computed.
C. Dynamic Latency
Like GNP [14], [13] attempts to derive a location from the IP address and to use it to estimate latency; here, too, the addresses are mapped to points in a multi-dimensional space. In addition, this approach takes into account that latency is not a static but a dynamic value: a measured latency corresponds not only to the time needed for the path, but also includes a processing time at the destination. This approach therefore tries to provide a latency estimate that also accounts for fluctuations caused by processing. In the first step of the model, the minimum delays for all node pairs are computed from the round-trip times in the CAIDA data [1].
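The coordinate idea behind GNP (subsection B), which this model also uses for the static part of the latency, can be sketched as follows. This is a minimal illustration under explicit assumptions, not the GNP implementation: the monitors' coordinates are taken as given (the first phase, embedding the monitors from their mutual RTTs, is skipped), and instead of GNP's simplex-downhill search on relative error, plain gradient descent on the squared absolute error is swapped in. Function names, monitor positions and host positions are hypothetical.

```python
import math


def embed_host(monitor_coords, rtts, steps=2000, lr=0.1):
    """Estimate a host's coordinates from its measured RTTs to the monitors.

    Minimizes sum_m (||x - c_m|| - rtt_m)^2 with plain gradient descent,
    starting from the centroid of the monitors (an assumption, not GNP's method)."""
    dim = len(monitor_coords[0])
    x = [sum(c[k] for c in monitor_coords) / len(monitor_coords) for k in range(dim)]
    for _ in range(steps):
        grad = [0.0] * dim
        for c, rtt in zip(monitor_coords, rtts):
            dist = math.dist(x, c)
            if dist == 0.0:
                continue  # gradient undefined exactly at a monitor position
            scale = 2.0 * (dist - rtt) / dist
            for k in range(dim):
                grad[k] += scale * (x[k] - c[k])
        x = [x[k] - lr * grad[k] for k in range(dim)]
    return x


def predicted_latency(coord_a, coord_b):
    """Latency estimate: Euclidean distance between two embedded hosts."""
    return math.dist(coord_a, coord_b)


# Hypothetical setup: d = 2, so d + 1 = 3 monitors with known coordinates.
# The hosts' true positions are used only to fabricate consistent RTTs.
monitors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_a, true_b = (30.0, 40.0), (60.0, 20.0)
host_a = embed_host(monitors, [math.dist(true_a, c) for c in monitors])
host_b = embed_host(monitors, [math.dist(true_b, c) for c in monitors])
print(round(predicted_latency(host_a, host_b), 2))  # true distance is sqrt(1300), about 36.06
```

Each host stores only d coordinates, so memory grows as O(n · d) instead of O(n^2) for a full latency table; real RTTs are not exactly Euclidean, so the fit then only minimizes the error rather than eliminating it.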
This computes the static part of the latencies, and the nodes can be positioned in a multi-dimensional space according to their mutual latencies. The next step computes the jitter: time intervals are defined that describe by how much the latency may deviate from the computed value. This noise value thus serves to generate dynamic latencies. The data for computing the jitter come from the PingER project [5], which clearly shows that jitter depends strongly on the regions in which the respective nodes are located. For example, jitter is particularly high for a connection to Africa, whereas a connection between North America and Europe has almost the same value as one within North America. This is attributed to the very different levels of development of the Internet infrastructure in different regions.
VII. EVALUATION OF THE LATENCY MODELS
This chapter first considers the accuracy of the computed latency compared with the real latency values. It then gives an overview of the memory and computational requirements of the models.
A. KING
In tests determining the latency of web servers, two thirds of the results were within 10% of the actual values, and three quarters were still within 20%. In an experiment determining the latency of Napster end clients, however, larger deviations occurred: the estimated latency was too low. According to [11], this is due to the poorer connectivity of end users compared with web servers. If the last hop to the end client is omitted from the reference values, however, KING achieves the same results as in the previous experiment.
B. Global Network Positioning
For GNP, 200… measurements were taken, with the following published results: 1) 81% of the computed latencies deviated by at most 50%. 2) 50% of the measurements deviated by at most 12.3%. The largest deviations occurred for latencies whose actual value was at least 350 ms; these were in some cases strongly underestimated, but they made up just under 7% of the values. The evaluation of the latencies obtained in this way showed that the distribution curve of the average latency matches the curve of the values observed in the CAIDA [1] study.
C. Dynamic Latencies
Since this model is based on the GNP latency values, see the previous subsection.
D. General Evaluation
Figure 4 gives a general overview of the computational cost and memory requirements of different types of models. Using a distribution function based on previously measured values takes little effort and has only minimal system requirements. However, such latencies do not take the geographic positions of the nodes into account. Moreover, the latency values for a node can vary widely if the function is sampled anew each time, which can produce unwanted jitter. To use the KING model, large amounts of data would have to be collected beforehand and stored for the entire duration of the simulation, since a latency value would have to be stored for every pair of nodes. This corresponds to O(n^2) memory and is not negligible for node counts of n > ….
In return, very accurate values would be available. This approach is comparable to lookup tables. The GNP and dynamic-latency approaches have a very high computational cost for the node coordinates. This cost, however, is incurred only once, before the simulation, and the computed values can then be reused for many simulations. In the case of GNP, there is also hardly any memory overhead, and the computational cost during the simulation is very low. With dynamic latency, the same holds for the computational cost, but the memory requirement is higher, since the jitter must be stored for each combination of regions. This would in principle correspond to O(n^2), but it can be neglected since, unlike in the KING model, n is very small here: n denotes large regions that share a common jitter value.
VIII. P2P SIMULATORS
Finally, this chapter presents two different P2P simulators. It briefly describes which underlay models can be used and how latencies are simulated, and it lists which kinds of overlays the simulation supports and which additional features each simulator provides.
A. OverSim
OverSim [2], [3] offers three underlay models of differing complexity:
1) Simple: In this mode, up to … nodes can be simulated. Latencies can be used in the simulation, either as constant values or computed from positions in an n-dimensional space, similar to GNP. Individual nodes can also be assigned values for bandwidth, propagation delay, jitter and packet loss in order to simulate a heterogeneous network.
2) Inet: In this mode, the simulation uses a complete IP protocol implementation.
Furthermore, … can also be simulated, and the various implemented MAC protocols also allow access by wireless devices to be simulated.
3) SingleHost: This underlay serves as a connection to real networks, allowing overlay protocols developed for OverSim to be deployed in real networks.
Both structured and unstructured overlay protocols can be used in the simulation; OverSim implements, among others, Chord [20], Kademlia [15], Pastry [18], Gia [4] and Vast. It is also possible to simulate churn or to represent mobile nodes by changing their coordinates.
B. ProtoPeer
The goal of ProtoPeer [8], [9] is to bridge the gap between simulation and the real deployment of a P2P application. Simulating P2P applications normally requires a lot of code that is no longer used in the released application; ProtoPeer aims to allow the same code to be used for both simulation and release. The framework is implemented in Java and uses TCP and UDP to transmit messages. A simulation can be run either in steps or as a live test. For representing delay, the KING and GNP models presented in Chapter VI as well as an analytical distribution function can be used. As in OverSim, churn and nodes with different bandwidths can also be simulated. For testing the application, a simulation log can be created at different levels of detail.
Fig. 4. Overview of the different latency models.
IX. CONCLUSION
The observations in Chapter V make clear that there are still differences between the results of the models and generators and the actually measured structures. Both the Gilbert and the Barabási-Albert models deviate strongly from the desired node degree distribution and are therefore not a good choice for a realistic simulation of the Internet. The distribution produced by the PFP model is better, but not as convincing as that of I-Net 3.0. With regard to the clustering coefficient, however, the PFP model would be preferable, since I-Net 3.0 still deviates strongly here. This means that while I-Net 3.0 does a very good job regarding the number of neighbors a node has, it does not yet choose the right neighbors. It should be noted, however, that the reference data date from 1997 to 2002 and that the I-Net project apparently has not been developed further since 2002. The work on the PFP model, by contrast, dates from 2006 and compares the model with data from 2002 and 2005. For this reason, the PFP model could be rated better in comparison, since it is more recent. The results from Chapter VI show that latencies can be estimated very accurately. Depending on the priorities of the project in which latency is to be used, different solutions can be chosen.
Analytical distribution functions require the least effort to obtain and implement, but they also deliver the least accurate results. If memory requirements are irrelevant, a lookup model or the KING approach yields the most accurate results. The main advantage of GNP and the similar dynamic-latency model from VI-C is that they can be integrated into a simulation quite easily: they require some preparation to compute their positions once, but can then be integrated without great demands on memory or computing power. With them, the latency between any two nodes can be computed quickly and fairly accurately. The dynamic-latency model additionally offers the interesting possibility of simulating jitter. Unfortunately, the papers on the two simulators presented did not describe in detail how the underlay was modelled in each case. Both simulators, however, can simulate a very large number of nodes, with OverSim offering more choices regarding both underlay and overlay. Both can take latencies into account and draw on the latency models presented here.
REFERENCES
[1] CAIDA. Macroscopic Topology Project.
[2] I. Baumgart, B. Heep, and S. Krause. OverSim: Ein skalierbares und flexibles Overlay-Framework für Simulation und reale Anwendungen.
[3] I. Baumgart, B. Heep, and S. Krause. OverSim: A scalable and flexible overlay framework for simulation and real network applications. In Ninth International Conference on Peer-to-Peer Computing (IEEE P2P 09), pages 87-88.
[4] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker. Making Gnutella-like P2P systems scalable. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, page 418. ACM.
[5] L. Cottrell. PingER project at Stanford.
[6] J. Eberspächer and R. Schollmeier. Chapter 5: First and Second Generation of Peer-to-Peer Systems. In Peer-to-Peer Systems and Applications, pages 35-56.
[7] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, page 262. ACM.
[8] W. Galuba, K. Aberer, Z. Despotovic, and W. Kellerer. ProtoPeer: From Simulation to Live Deployment in One Step. In Eighth International Conference on Peer-to-Peer Computing (P2P 08). IEEE.
[9] W. Galuba, K. Aberer, Z. Despotovic, and W. Kellerer. ProtoPeer: A P2P toolkit bridging the gap between simulation and live deployment. In Proceedings of the 2nd International Conference on Simulation Tools and Techniques, pages 1-9. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).
[10] E. N. Gilbert. Random graphs. The Annals of Mathematical Statistics.
[11] K. P. Gummadi, S. Saroiu, and S. D. Gribble. King: Estimating latency between arbitrary Internet end hosts. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement. ACM.
[12] C. Jin, Q. Chen, and S. Jamin. Inet: Internet topology generator.
[13] S. Kaune, K. Pussep, C. Leng, A. Kovacevic, G. Tyson, and R. Steinmetz. Modelling the Internet delay space based on geographical locations. In 17th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2009).
[14] G. Kunzmann, R. Nagel, T. Hossfeld, A. Binzenhofer, and K. Eger. Efficient simulation of large-scale P2P networks: Modeling network transmission times. In 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP '07).
[15] P. Maymounkov and D. Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. Peer-to-Peer Systems, pages 53-65.
[16] T. McGregor, H.-W. Braun, and J. Brown. The NLANR network analysis infrastructure. IEEE Communications Magazine, 38(5).
[17] D. Meyer. University of Oregon Route Views Archive Project. Available at routeviews.org.
[18] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), volume 11.
[19] R. Steinmetz and K. Wehrle. Peer-to-Peer Systems and Applications. Springer.
[20] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, page 160. ACM.
[21] J. Winick and S. Jamin. Inet-3.0: Internet topology generator.
[22] S. Zhou. Characterising and modelling the Internet topology: The rich-club phenomenon and the PFP model. BT Technology Journal, 24(3).

149 PEER-TO-PEER BOTNETS: A SYSTEMATIC OVERVIEW

Peer-to-Peer Botnets: A Systematic Overview
André Schaller

Abstract: For more than ten years, botnets have threatened legitimate Internet users as distributed attack tools. A botnet is a network of computers controlled by an attacker. A few years ago this technology took a further step: P2P techniques replaced its formerly centralized design. This new generation of peer-to-peer (P2P) botnets is more complex in structure, more effective in its functions and harder to shut down. Members of this new generation have already been discovered and are responsible for some of the largest mass infections of the past three years. This paper systematically compiles current knowledge about P2P botnets. It presents representative botnets and illustrates how widely they have spread. It then classifies P2P botnets so that current malware can be categorized systematically. Finally, it shows that P2P botnets which are based on the publish/subscribe method and communicate unencrypted can be brought to a standstill by an index-poisoning attack.

I. INTRODUCTION

Botnets are among the greatest threats to today's IT security. They consist of networks of computers under the control of an attacker. Botnets are considerably more dangerous than other malware for many reasons, and two key aspects illustrate this. First, botnets can reach high growth rates in a short time. Second, even small botnets have enough resources to successfully attack and paralyze well-protected computers or networks. Their rapid growth in a comparatively short time is due to the way they spread.
The attacker has various attack vectors for spreading the botnet's malware (see Section III-B). Above all, however, worms serve as carriers that ensure efficient propagation. The botnet's effectiveness comes from the accumulated resources of the individual bots in the network. Most of these machines are desktop PCs, web servers and other hosts such as laptops. None of these hosts, taken alone, can process or send enough data to affect other participants on the Internet. Often the Internet connections of the affected systems are too slow to cause damage to other hosts. In a simultaneous, targeted attack, however, the botnet's combined resources can generate enough traffic to make even heavily secured systems collapse under the load. This type of attack is called Distributed Denial of Service (DDoS) and is an important function of botnets.

Botnets are also the primary driver of mass spam e-mail today. Initially, these unsolicited mails were sent from a few computers, and organizations fighting spam were able to block them successfully. Distributed sending via botnets makes this approach to combating spam much harder. In addition, thousands of bots sending in parallel achieve far higher spam rates. The botnet concept is subject to a very dynamic development process that has repeatedly changed its fundamental structures and made them more efficient.
The first generation of botnets grew out of EggDrop, one of the most popular and oldest bots. EggDrop contained no malicious functions; it was meant to automate routine tasks in administering Internet Relay Chat (IRC) servers. The prospect of controlling many bots over an established protocol led to the first malicious bots. These are controlled via a central IRC server: they join a special channel on it and wait there for commands. Well-known representatives of this kind of bot are GTBot and Agobot. This architecture, with the IRC server as the central point of control (command-and-control; C2), is simple and efficient for the attacker to implement. At the same time, as a single point of failure, it is also the architecture's greatest weakness: once the channel or the entire server is shut down, the bots can no longer receive instructions. Botnets therefore evolved towards a largely decentralized architecture, achieved by adapting existing developments from the field of peer-to-peer (P2P) systems.

Figure 1. Schematic of a centralized IRC botnet [6]

The remainder of this paper is structured as follows. Section II presents general information about well-known P2P botnets. Section III classifies botnets by the following aspects:
- type of botnet,
- infection vectors,
- goals,
- bootstrapping approaches, and
- mechanisms of the command-and-control channels.
Section IV maps the botnets presented in Section II to the characteristics from Section III and explains them in detail. Section V lists possible countermeasures against P2P botnets. Finally, Section VI summarizes the paper.

Figure 2. Schematic of a peer-to-peer botnet [6]

P2P networks are networks whose members, also called peers or nodes, have equal roles. Each peer provides services to the others and can use the services of the others, and no single peer performs centralized functions. The launch of Napster can be regarded as the beginning of P2P development. With the Napster client, a PC could join Napster's network as a peer and exchange music files with other participants. Searches for songs, however, went through an index distributed over a few central servers, so Napster cannot be regarded as a full P2P network. After Napster was shut down because of illegal use, Gnutella was developed as a completely decentralized P2P system. In contrast to Napster, Gnutella has no centralized roles or functions within the network. Search queries are flooded into the network, i.e., they are passed on from node to node. This inefficient search approach is necessary because Gnutella is an unstructured P2P system: nodes have no fixed areas of responsibility, and there is no defined place where offered resources are stored.
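The flooding search just described can be illustrated with a small simulation. This is a minimal sketch, loosely modelled on Gnutella-style query flooding rather than on the actual protocol. The function name `flood_search`, the dictionary layout of the overlay and the example data are all hypothetical. The sketch assumes a hop limit (TTL) that is decremented at each forward. It also simplifies duplicate detection to one global `seen` set, whereas real clients track query IDs per node.

```python
from collections import deque

# Hypothetical toy overlay: node -> (set of neighbours, set of shared items).
EXAMPLE_OVERLAY = {
    "A": ({"B", "C"}, set()),
    "B": ({"A", "D"}, set()),
    "C": ({"A", "D"}, set()),
    "D": ({"B", "C", "E"}, set()),
    "E": ({"D"}, {"song.mp3"}),
}


def flood_search(overlay, start, wanted, ttl):
    """Flood a query from `start` with hop limit `ttl`.

    Returns (set of reached nodes holding `wanted`, number of messages sent).
    """
    seen = {start}        # simplification: global duplicate suppression
    hits = set()
    messages = 0
    queue = deque([(start, ttl, None)])   # (node, remaining hops, sender)
    while queue:
        node, remaining, sender = queue.popleft()
        neighbours, items = overlay[node]
        if wanted in items:
            hits.add(node)
        if remaining == 0:
            continue      # hop limit reached: do not forward any further
        for nb in sorted(neighbours):
            if nb == sender:
                continue  # never send the query back to where it came from
            messages += 1            # every forwarded copy costs one message
            if nb not in seen:       # duplicates are counted but not re-forwarded
                seen.add(nb)
                queue.append((nb, remaining - 1, node))
    return hits, messages
```

With a hop limit of 2, the query from A never reaches E. With a hop limit of 3 it finds E, but it already needs six messages in this five-node overlay, two of which are duplicates. This is the inefficiency of unstructured search that structured systems avoid.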
Structured P2P networks use distributed algorithms to solve the problems described above. Distributed hash tables (DHTs) determine the positions of the nodes and of the offered files in the network, and these positions are known to all participants. Well-known structured P2P systems are Chord and Kademlia. A chronological list of important developments in the fields of botnets and P2P is given in [9]. Table I gives a schematic overview.

Table I
SYSTEMATIZATION OF P2P NETWORKS BY STRUCTURE

          | centralized | distributed, unstructured | distributed, structured
Example   | Napster     | Gnutella (before 2000)    | Chord, Kademlia

II. CURRENT P2P BOTNETS

This section lists three prominent P2P botnets and describes them in general terms. They were selected by estimated botnet size, the resulting damage potential and their presence in the media. Besides the botnets listed here, there are a number of other representatives.

A. Stormworm

The most prominent P2P botnet to date is Stormworm (also known as Nuwar, Zhelatin or Peacomm). It follows a hybrid P2P approach built on top of Overnet as its P2P protocol. The e-mails through which it spread reported on a storm over Europe, which gave the worm its name. It was first discovered in January 2007 and remained active until early 2009. Stormworm's primary function is sending spam and malware. Between 2007 and 2009, Peacomm was responsible for around 17% of worldwide spam [11]. At one point, the Stormnet was estimated at about 1 million infected PCs. Stormworm is characterized by its large number of variants: several thousand different variants could be observed daily, but each remained effective for only a few hours.
With this tactic, the malware's authors aimed to limit the effectiveness of signature-based antivirus products. By early 2009, activity had dropped drastically; no spam sent by Stormworm has been registered since September 2008. Further analyses showed that requests to individual peers of the Stormnet were answered with the message "Go away, we're not home". On 28 April 2010, a modified version of Stormworm was discovered. It builds on most of its predecessor's code base, but its P2P infrastructure has been removed completely. This version, called Stormworm 2, communicates exclusively via HTTP.

B. Nugache

Nugache, which can be regarded as a precursor of Stormworm, was discovered in early April 2006. It marked a turning point in the quality of botnets: it was one of the first malware families to rely on a pure P2P infrastructure to implement its command-and-control channel. Nugache's main functions were carrying out DDoS attacks and sending spam. Its use of asymmetric encryption in various parts of the communication between bots made the malware harder to analyze.

C. Waledac

The most significant P2P bot of 2009 was undoubtedly Waledac. It was first discovered in December 2008. Some experts describe Waledac as the successor of Stormworm, and the two share several similarities in their technical approaches and their goals. Waledac, however, follows a purer P2P approach than Stormworm and is thus more decentralized. Like Stormworm, Waledac's main function is sending spam e-mail. According to estimates, Waledac has the potential to send around 1.5 million mails per day. In mid-2009, the size of the botnet was estimated at approximately … computers. In February 2010, Microsoft attempted to deactivate around 280 domains of potential command-and-control servers.

III. CLASSIFICATION

A. Botnet Types

P2P botnets are themselves a special type of botnet. Even so, a finer classification of the different types of P2P botnets is useful. The following adopts the classification from [18], which is the only one available at the time of writing. It also captures the most important structural differences and makes them explicit.
With respect to their environment and members, P2P botnets can be divided into three classes: parasite P2P botnets, leeching P2P botnets and bot-only botnets.

Parasite botnets grow within an already existing P2P network. They use that network's P2P protocol to implement the necessary C2 mechanisms. Their potential victims are legitimate users of the P2P network. This severely limits the possible size of the botnet, since it is bounded by the number of members of the underlying P2P network. The authors of [18] see this as the main reason why most current P2P botnets are not implemented this way. This drawback is offset by a number of advantages. First, all communication can use an existing and often stable protocol. Second, bot commands sent over this in-band channel attract less attention because they are embedded in legitimate P2P messages, which helps avoid detection. A further advantage concerns the admission of a vulnerable victim into the botnet (bootstrapping). Since the victims are already members of the network, no further actions are needed to admit a new member into the botnet. Possible variants of bootstrapping are discussed in Section III-D.

The class of leeching botnets can be regarded as an extension of parasite botnets. In principle the two are very similar, since leeching botnets also exploit an existing P2P protocol or P2P network to implement the command-and-control channel. Leeching botnets, however, are not restricted to recruiting vulnerable hosts from the underlying network: they can turn any vulnerable computer on the Internet into a bot.
This increases both the number of potential members and the flexibility of the whole botnet. The advantage comes at a price, however: computers from outside the P2P network must somehow be admitted into the botnet. Leeching botnets therefore cannot do without a bootstrapping process.

The most flexible kind of P2P botnet is the bot-only botnet. Its members are recruited from the entire Internet. The network itself usually does not build on an existing P2P protocol. Instead, it uses an independent protocol developed specifically for the botnet, although an existing protocol can still be used. In the first case, all computers in the network are bots, whereas in the other botnet types legitimate P2P users are also part of the network. Whether bootstrapping is needed then depends on the protocol's architecture, so it is optional. The main drawback of this type is that the protocol is often deployed without having been tested.

B. Infection Vectors

To turn a vulnerable host into a member of the botnet, the attacker must exploit the host's vulnerability to inject malicious code. As a rule, infection happens in two stages. First, the primary malicious code (payload) that exploits the actual vulnerability is injected. Then secondary malicious code is downloaded to provide extended functionality. Among other things, this two-stage process keeps the primary payload small, which simplifies infection and makes detection harder. The ability to download secondary code also allows new functions to be rolled out across the botnet. The actual infection vectors, and the locations from which the secondary code is downloaded, depend primarily on the class of P2P botnet.

Parasite botnets are part of a legitimate P2P network, so P2P malware is the natural infection vector for them. As with viruses and worms, passive and active variants can be distinguished. Passive P2P malware usually consists of payloads disguised as popular content offered for download within the network. Any media exchanged among the peers can serve this purpose (supposed video, audio, image or PDF files, etc.). Such payloads sit passively in the P2P client's download directory and wait to be downloaded and executed by another peer. Active P2P malware behaves like a worm, since it spreads itself continually. It actively searches for potential victims, for example by trying to infect the most recently contacted peers or peers that answer a search query.

Leeching and bot-only botnets can use additional means of propagation. In practice, they usually combine common infection vectors. The primary infection mostly occurs through spam e-mails, worms or so-called drive-by downloads when the victim visits a manipulated website. Only rarely are exploits used manually and in a targeted way. Social engineering is an important element: forged content in e-mails, instant messages or on websites persuades Internet users to download supposedly legitimate tools, which then run the malicious payloads. The authors of [3] report: "Social engineering attacks represent a significant source of malware infections. Worms that spread through email, peer-to-peer networks, and instant messaging clients account for 35% of the computers [...]." The secondary malicious code, which provides extended functionality and communication with the rest of the botnet, is often downloaded from computers outside the network.
In some cases, however, these program parts are obtained from other peers within the network.

C. Goals

One of the main tasks of botnets is to propagate their own malicious code. The authors of [8] claim: "in most cases, botnets are used to spread new bots." Beyond that, botnets serve a number of purposes that fall roughly into three classes: information gathering, information processing and information dissemination.

Information gathering includes collecting passwords, serial numbers and license data of installed software, as well as personal data such as harvested e-mail addresses, account data and the like. Among the P2P botnets examined so far, however, this function plays only a minor role.

Another purpose of botnets is information processing, which mainly means cracking protected passwords. These are usually available as MD5 hashes that form part of larger, illegally acquired databases. The hashes are usually cracked by brute force, i.e., by trying all possible combinations. This requires large computing capacity, which the botnet can provide.

One of the most important functions of botnets, however, is information dissemination. This includes activities such as carrying out a distributed denial-of-service (DDoS) attack. The mass sending of spam e-mail also falls into this category. In some cases, botnets have also been observed propagating malware that does not belong directly to the botnet. What makes this area of use so attractive is the ways it can be monetized. Parts of the botnet can be rented out to interested parties, giving them temporary control over one of the functions mentioned. These customers in turn launch spam campaigns to generate revenue.
Alternatively, they put pressure on competitors with DDoS attacks or make the competitors' offerings unusable in order to cut their revenue. Information dissemination is therefore attractive because it has an economic component. A systematic overview of the modular function blocks of botnets is given in [12].

D. Bootstrapping

For a vulnerable host to become a usable resource of the botnet, it must be admitted to the network. This process is called bootstrapping. To connect to the rest of the network and receive commands from the operator, every bot must be supplied with sufficient information about the network. This information makes bootstrapping a critical part of the overall architecture, since it can turn out to be a single point of failure. If a bot sample is analyzed, this information can help uncover large parts of the botnet and thus help shut it down. As explained in Section III-A, parasite botnets need no bootstrapping, since their peers are already part of the underlying P2P network. For leeching botnets, however, the process is mandatory. The botnets analyzed so far use different strategies.

An intuitive way to admit a new bot into the network is to hard-code a list of IP addresses and port numbers of other peers into every bot. The peers on this list should be chosen carefully, above all peers with a low churn rate, so that they remain reachable. However, this approach is very vulnerable to the analysis of captured bot instances. Blocking the connections to the hosts that serve as entry points of the bootstrapping process can at least stop the botnet from growing.
Furthermore, the information stored in the list can become outdated, which disrupts the bootstrapping process.

A second method moves the actual bootstrapping information, i.e., the IP addresses and port numbers of peers, to external web servers. Through so-called web caches, which are also used in the Gnutella P2P network, new bots obtain enough information to join the botnet. These web caches are usually located outside the network and can hold dynamic entries. In this case, however, the web cache's address is hard-coded into the bot, which leads to the same drawback as the approach above.

A third approach exchanges peer lists, so that critical information about the network's structure need not be stored statically in the bots. After bot A has infected host B, A passes a subset of the peers it knows to B. This avoids the need for a separate bootstrapping infrastructure. The method can be extended by exchanging lists of known peers whenever two peers communicate. However, it can only be applied in bot-only botnets, because the underlying protocol must be designed specifically for it. In practice, no P2P botnet has yet been observed that builds itself by exchanging peer lists. However, the hybrid botnet presented in [17] is a prototype implementation with this capability, and [14] developed another system that follows this approach.

E. Command-and-Control Mechanisms

Another important aspect of botnet architecture is how commands are propagated to all bots. Traditional, centralized IRC botnets use a simple form of command distribution. All bots regularly connect to one or more central IRC servers, and the operator issues commands to the bots in a fixed IRC channel as simple chat messages. P2P botnets have no such central instance, so they must take other approaches. These approaches to communication between bots fall into two groups: push and pull. In the push method, also called command forwarding, bots play a passive role in receiving commands.
Other bots deliver the commands to them, and after receiving a command they pass it on to further bots. This raises the problem of deciding which bots a command should be forwarded to. The authors of [18] propose two alternatives. In the first, a bot uses its list of neighboring peers. In the second, the bot starts a search query for a special title and treats the peers that answer the search as forwarding targets. In Rambot [13], the push mechanism is realized by simply flooding the commands into the network.

The push method is contrasted with the pull method, also called command publishing or command subscribing. Here, bots actively try to obtain new commands. At regular intervals, they check defined locations in the network (rendezvous points) where the botmaster deposits new commands. In practice, peers regularly search for specific hashes in the distributed hash table (DHT). The botmaster only needs to store the commands, or information leading to them, at the peers responsible for the search query. These are the peers whose IDs are closest to the hash ID of the search term. The algorithm that computes the hash for the search query is usually hard-coded in the bot. The systems from [17] and [14] mentioned above, as well as [13], combine both methods. In Rambot, for instance, announcements of new updates, commands and the like are spread via push, and the individual bots then fetch these changes via pull.

IV. MAPPING

This section maps the characteristics of P2P botnets introduced in Section III to current P2P botnets. The choice of representatives is not limited to the botnets from Section II.
For each criterion, we present the malware that best illustrates the different values of the classification features.

A. Botnet Types

1) Parasite Botnets - Phatbot: The first P2P botnets were restricted to existing P2P networks and thus belonged to the class of parasite P2P botnets. One example is Phatbot, which builds on the WASTE protocol. WASTE implements encrypted and anonymous communication within a P2P network and was released by Nullsoft in 2003 under the GPL. Most of today's P2P botnets, however, belong to the leeching or bot-only class, and new malware restricted to existing P2P networks is no longer observed. Presumably this is due to the drawbacks of parasite P2P bots described in Section III-A.

2) Leeching Botnets - Stormworm: The first generation of Stormworm built on Overnet, which organized the overlay of the P2P network. This version of Storm belongs to the class of leeching botnets. Overnet was shut down in 2006, but peers remained online afterwards, so at that time the network contained benign peers as well as bots. Storm's characteristics changed in October, however. As an alternative to Overnet, Stormworm established its own network, which was at first available as an option. Later versions used only this dedicated network, called the Stormnet. This generation of Storm belongs to the class of bot-only P2P botnets. The following discussion refers to the first, leeching version of Stormworm, which builds on Overnet. Overnet is a P2P hash table based on Kademlia, but it differs from Kademlia in some respects. Overnet uses 128-bit hashes as the key space for peer and key IDs.
Kademlia, in contrast, uses 160-bit hashes. When the client starts, it generates a random hash value as its ID. Like Kademlia, Overnet is based on the XOR metric. This metric defines the distance between two hash values $x$ and $y$ as $d(x, y) = x \oplus y$, computed bitwise. The XOR metric is unidirectional: for every key $x$ and every distance $\varepsilon > 0$ there is exactly one key $y$ with $d(x, y) = \varepsilon$.

A value is stored by computing a hash from its name (the object ID). The value is then stored at the peer whose ID has the smallest distance to the object ID. To retrieve a file or data item, a peer searches for the node whose ID is closest to the item's key. Overnet, too, routes by prefix. Each peer keeps several neighbor lists with information about neighboring peers. In Kademlia there are 160 such lists, and each entry is a triple of node ID, IP address and UDP port number. List $i$, with $0 \le i < 160$, stores the nodes whose distance $d$ satisfies $2^i \le d < 2^{i+1}$. A list holds at most $k$ entries. This parameter is configurable, which is why the lists are also called k-buckets. In Overnet, a bucket holds at most 20 entries. A peer forwards a data item to another peer that shares a prefix with the item's key. The recipient's ID must match the item's key in at least one more digit than the forwarding peer's own ID does.

Overnet was not released as open source. There are, however, indications that the Overnet module resembles the Kademlia derivative KadC.
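The XOR metric and k-bucket layout described above can be made concrete with a small Python model. This is a toy sketch, not code from Overnet, Kademlia or KadC. The names `xor_distance`, `bucket_index`, `responsible_node`, `object_id` and `RoutingTable` are hypothetical. Deriving object IDs from SHA-256 truncated to 128 bits is an assumption for illustration, since the text does not say which hash function Overnet uses. Bucket eviction is also simplified.

```python
import hashlib
import secrets

ID_BITS = 128  # Overnet key space (Kademlia would use 160)
K = 20         # maximum entries per bucket, as stated for Overnet


def random_id(bits: int = ID_BITS) -> int:
    """Random node ID, as generated when a client starts."""
    return secrets.randbits(bits)


def object_id(name: str, bits: int = ID_BITS) -> int:
    """Object ID derived from a name.
    ASSUMPTION: SHA-256 truncated to `bits` bits, chosen only for illustration."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def xor_distance(x: int, y: int) -> int:
    """d(x, y) = x XOR y, computed bitwise."""
    return x ^ y


def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the k-bucket with 2**i <= d(own_id, other_id) < 2**(i+1)."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not keep itself in its buckets")
    return d.bit_length() - 1


def responsible_node(node_ids, target: int) -> int:
    """The node whose ID is XOR-closest to `target` stores the value."""
    return min(node_ids, key=lambda n: xor_distance(n, target))


class RoutingTable:
    """One k-bucket per distance range; entries are (ID, IP, UDP port) triples."""

    def __init__(self, own_id: int, bits: int = ID_BITS, k: int = K):
        self.own_id = own_id
        self.k = k
        self.buckets = [[] for _ in range(bits)]

    def add(self, node_id: int, ip: str, udp_port: int) -> bool:
        bucket = self.buckets[bucket_index(self.own_id, node_id)]
        if any(entry[0] == node_id for entry in bucket):
            return True   # already known
        if len(bucket) >= self.k:
            # Simplification: a full bucket rejects the newcomer
            # (Kademlia would first ping its least recently seen entry).
            return False
        bucket.append((node_id, ip, udp_port))
        return True
```

For instance, `responsible_node([1, 6, 12], 7)` returns 6, because 6 XOR 7 = 1 is the smallest distance. For any `x` and `eps > 0`, `xor_distance(x, x ^ eps) == eps` holds, which is the unidirectionality property stated above. A bucket index is simply the position of the highest differing bit. This is why each bucket covers one range $[2^i, 2^{i+1})$ and why prefix-based routing gains at least one matching bit per hop.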
In [5], this connection is established by analyzing a file that Stormworm uses to start its bootstrapping process. The file consists of hash strings that encode information triples made up of the peer ID, the IP address and the UDP port: "What is striking is the great similarity of its structure to the file that the C library KadC [...] uses to manage its contacts" [5].

3) Bot-only Botnets - Waledac: Waledac represents the class of bot-only botnets. Its C2 structures and all of its communication are implemented via the Hypertext Transfer Protocol (HTTP). Some security experts have dubbed this HTTP-based P2P communication "HTTP2P". When bots exchange commands, the HTTP requests themselves are not encrypted. The actual commands, which are transported as HTTP payload, are, however, encrypted with an asymmetric encryption scheme. In addition to HTTP, Waledac uses a number of other open protocols and standards for communication between bots, including XML-based message formats and Base64 encoding. Since Waledac belongs to the class of bot-only P2P botnets, it recruits potential victims from the entire Internet.

B. Infection Vectors

1) Spam E-Mails - Stormworm: In the first phase of its existence, Stormworm spread only as an e-mail attachment. These mails mostly contained English text on a wide range of topics, organized into so-called campaigns. They used various social-engineering techniques to get recipients to run the malicious attachment. For example, infected mails were sent on certain holidays with content tailored to the occasion, and the attachment was presented as an electronic greeting card (e-card).
According to Dahl [5], the propagation strategy changed from the end of June onward. From that point, the spam mails no longer carried Storm worm samples. The attackers kept their social-engineering tactics, however, and instead embedded links to other infected hosts that presented the user with a seemingly harmless website. These websites used further social manipulation to persuade the victim to download a Storm worm sample. In parallel, the attackers used an automated method. If the user could not be persuaded to download the malware, the manipulated website automatically tried a series of browser exploits to install the Storm worm without any further user interaction. From August 2007, the Storm worm also spread through various blog software packages, with content resembling the mail campaigns. [5] provides a timeline of the propagation mechanisms and campaigns.

2) Manipulated websites - Waledac: Waledac's propagation mechanisms resemble Storm's, although they are far less varied. Waledac spreads only through infected websites, which use similar social-engineering tactics. Users are led to believe that these sites offer updates for codecs and video players, mobile-phone tools and other programs, which turn out to be Waledac samples. The websites are tailored in theme and content to the supposed program, so an inexperienced user cannot tell them apart from a legitimate site. Unlike Storm, however, Waledac requires the user to actively download the executable, because the websites do not automatically exploit browser vulnerabilities.
3) Instant messaging - Nugache: Nugache is an example of a P2P bot that uses a whole range of infection vectors. Its propagation mechanisms cannot be delineated precisely, but Nugache became known above all for reaching new victims through instant messenger (IM) messages. [4] lists further propagation channels, including phishing websites, spam and mail attachments, modified versions of LimeWire clients, browser exploits and further exploits in network services.

C. Goals

1) Spam delivery - Stormworm & Waledac:

a) Stormworm: The Storm worm's main function is sending spam mails as part of so-called campaigns, which focus on current events. Among others, spam mails were sent about the aftermath of the windstorm Kyrill. Other campaigns targeted holidays such as Valentine's Day, as well as sporting events. Holz et al. [10] counted various spam campaigns between December 2006 and January by analyzing mails that Storm sent to a spamtrap set up for this purpose. As part of this function, the Storm worm can search the infected computer for e-mail addresses to use in further spam campaigns.

Besides spam delivery, Storm implements further functions typical of botnets, including the ability to launch a distributed denial-of-service (DDoS) attack. Notably, Storm also uses its DDoS capability for self-defense. Storm evaluates various statistics on activity within the botnet, including the number of parallel downloads of the primary payload per IP address. If this number is unusually high for a normal infection process, Storm assumes an analysis attempt and attacks the IP address with an ICMP DDoS attack. This happened, among others, to the author of [5] while studying the Storm worm's behavior.

b) Waledac: Waledac strongly resembles Storm in structure and system behavior, so their functionality also overlaps. Waledac's main function is sending spam mail. As with Storm, the mails are template-based and organized in so-called campaigns.
The authors of [1] report that Waledac sends spam mails both to propagate itself and to harvest legitimate e-mail addresses. In the latter case, victims are offered very cheap watches and, if interested, are asked to contact a supposed contact person by e-mail. Like Storm, Waledac organizes its spam in campaigns. These are themed around holidays and contain links to websites that offer disguised Waledac samples for download.

2) DDoS attacks - Nugache: Nugache offers a broader range of functionality and in this respect resembles a trojan. Its main function was identified as carrying out DDoS attacks. Nugache also gives the botmaster control over infected computers and allows data to be downloaded and served via the File Transfer Protocol (FTP) and HTTP. A further function is keylogging, the recording of keystrokes. It allows sensitive data such as passwords and transaction authentication numbers to be stolen.

D. Bootstrapping - Stormworm, Nugache & Waledac

Storm, Nugache and Waledac behave similarly during bootstrapping: all of them use hard-coded information to connect to the network initially.

1) Stormworm: When Storm infects a new computer, it creates a file (usually spooldr.ini) containing contact information for predefined peers. This list comprises several hundred entries [5], [15]. Each entry consists of two fields separated by "=". The first field is a hash value that represents the peer ID and uniquely identifies the peer. The second field contains the peer's IP address, port number and peer type, hex-encoded.
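The spooldr.ini entry format can be illustrated with a small parser. The text only states that the second field hex-encodes the IP address, port and peer type. The exact field widths used here are a hypothetical assumption for illustration, not Storm's verified layout: 8 hex digits for an IPv4 address, 4 for the port and 2 for the type. The sample line is invented.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PeerEntry:
    peer_id: int    # hash value from the first field
    ip: str
    port: int
    peer_type: int


def parse_peer_line(line: str) -> PeerEntry:
    """Parse one 'ID=INFO' line. HYPOTHETICAL layout of INFO:
    8 hex digits IPv4 address, 4 hex digits port, 2 hex digits peer type."""
    id_field, info = line.strip().split("=", 1)
    if len(info) != 14:
        raise ValueError(f"unexpected info field length: {len(info)}")
    ip = str(ipaddress.IPv4Address(int(info[0:8], 16)))
    port = int(info[8:12], 16)
    peer_type = int(info[12:14], 16)
    return PeerEntry(int(id_field, 16), ip, port, peer_type)


def parse_peer_file(text: str) -> list[PeerEntry]:
    """Parse every non-empty line of a spooldr.ini-style contact list."""
    return [parse_peer_line(row) for row in text.splitlines() if row.strip()]


sample = "00112233445566778899AABBCCDDEEFF=C0A800011F9000\n"
entry = parse_peer_file(sample)[0]
print(entry.ip, entry.port, entry.peer_type)  # 192.168.0.1 8080 0
```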
This information can be used to perform the actual bootstrapping. The first Storm worm variants, which were built on Overnet, followed Kademlia's network-join procedure. First, a random peer ID is generated with a hash function. This ID is persistent and is reused for later activity in the network. The hard-coded peers are sorted into the appropriate buckets according to their distance. The node then sends a NODE_LOOKUP message with its own peer ID as the parameter. As a result, the node learns about its neighborhood, and its neighbors in turn add the node to their buckets.

2) Nugache: As already mentioned, Nugache takes a similar approach to managing the information needed to join the botnet. Through analysis of the malware, the authors of [15] found that every sample contains a built-in list of 22 IP addresses to which a newly infected computer connects. The new bot then receives an up-to-date list of infected computers in binary form. Nugache stores additional contact information received from the bootstrapping peers in the Windows registry under HKCU_Software_Gnu. It also records successful connections to other peers, as well as the number of connection attempts, in the registry.

3) Waledac: Compared with the other malware, Waledac's approach to integrating newly infected bots into the botnet is more dynamic and more robust against attacks. Borup [2] has compiled a comprehensive overview of Waledac's behavior. Initially, bots likewise fall back on a list of 100 so-called proxy bots. Waledac's particular network architecture causes proxy bots to exhibit high churn, however, so such a list can quickly become outdated.
Therefore, every new Waledac sample is generated with a dynamically created list of reachable proxy bots. After the first start, the proxy bots send the requesting bot information about further peers. Waledac also makes use of its fast-flux component: a bot sends further requests for peer information to a domain in the fast-flux network.
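The Kademlia-style join followed by the first Overnet-based Storm variants (see 1) above) can be sketched under several assumptions. IDs are 128-bit integers, derived here from MD5 over random bytes; the text does not specify the actual hash function. Buckets hold at most k = 20 entries, as in Overnet. Only the local part of NODE_LOOKUP is modelled: choosing the known contacts closest to the node's own ID. The parallelism parameter alpha = 3 is Kademlia's usual choice and an assumption here. No network messages are sent.

```python
import hashlib
import os

K = 20  # Overnet bucket capacity stated in the text


def random_peer_id() -> int:
    """Random 128-bit peer ID from hashing random bytes (hash choice is an assumption)."""
    return int.from_bytes(hashlib.md5(os.urandom(16)).digest(), "big")


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket i such that 2**i <= own_id XOR other_id < 2**(i+1)."""
    return (own_id ^ other_id).bit_length() - 1


def build_buckets(own_id: int, contacts: list[int], k: int = K) -> dict[int, list[int]]:
    """Sort hard-coded contacts into k-buckets by XOR distance; each bucket keeps at most k."""
    buckets: dict[int, list[int]] = {}
    for peer in contacts:
        if peer == own_id:
            continue  # never store ourselves
        bucket = buckets.setdefault(bucket_index(own_id, peer), [])
        if len(bucket) < k:
            bucket.append(peer)
    return buckets


def node_lookup_targets(own_id: int, buckets: dict[int, list[int]], alpha: int = 3) -> list[int]:
    """First step of NODE_LOOKUP(own_id): the alpha known contacts closest to our own ID."""
    known = [peer for bucket in buckets.values() for peer in bucket]
    return sorted(known, key=lambda peer: peer ^ own_id)[:alpha]


own_id = random_peer_id()                            # kept persistent by the bot
contacts = [random_peer_id() for _ in range(300)]    # stands in for the hard-coded list
buckets = build_buckets(own_id, contacts)
print([hex(p) for p in node_lookup_targets(own_id, buckets)])
```

Most random contacts land in the highest buckets, which are capped at k. Peers close to the node's own ID are rare, so the NODE_LOOKUP replies are what actually fill the lower buckets.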

E. Command-and-control mechanisms

1) Three-tier architecture, pull - Stormworm: The first version of the Storm worm ran as a leeching botnet on top of Overnet. Later versions replaced Overnet with a custom P2P protocol called Stormnet. Stormnet differs from Overnet only in how messages are encoded: they are XOR-encoded with a 40-byte key, while the messages themselves were left unchanged. As mentioned above, Storm's network architecture shares features with Waledac's, and the Storm worm is likewise organized into three tiers. Depending on whether a newly infected computer is reachable, it takes on the role of either a spammer or a gateway. A spammer cannot be reached directly from the Internet and therefore actively polls gateways for new peers and spam templates. Gateways answer requests both from spammer bots and from the control nodes, which deposit contact information for spammer bots at the gateways.

Spammers find content and new peers by key lookup. A hard-coded algorithm generates the different keys that spammer bots search for. The IP address and port number of a control node have previously been stored on the peers responsible for each key. A spammer bot connects to the corresponding control node via TCP and asks for new instructions. The keys thus serve as rendezvous points, a characteristic feature of publish/subscribe-based architectures. Through their analysis, Holz et al. [10] were able to characterize the algorithm more precisely. It is a function $f(d, i)$ that takes the current date $d$ and a random number $i$ between 0 and 31 as parameters. This yields 32 different keys per day to search for.
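This subsection describes two mechanisms that can be sketched briefly: Stormnet's XOR encoding with a 40-byte key, and the daily rendezvous keys produced by $f(d, i)$. The real $f$ analysed by Holz et al. [10] is not given here. `rendezvous_key` is therefore a hypothetical stand-in (MD5 over the ISO date and $i$) that only reproduces the interface: one date and $i \in \{0, \dots, 31\}$, hence 32 keys per day.

```python
import datetime
import hashlib
import os

KEY_LEN = 40  # Stormnet XOR key length stated in the text


def xor_codec(message: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the message."""
    if len(key) != KEY_LEN:
        raise ValueError("expected a 40-byte key")
    return bytes(b ^ key[i % KEY_LEN] for i, b in enumerate(message))


def rendezvous_key(day: datetime.date, i: int) -> int:
    """HYPOTHETICAL stand-in for Storm's f(d, i): a 128-bit hash of date and i.
    The actual function identified in [10] is not reproduced here."""
    if not 0 <= i <= 31:
        raise ValueError("i must be in 0..31")
    data = f"{day.isoformat()}|{i}".encode()
    return int.from_bytes(hashlib.md5(data).digest(), "big")


def keys_for_day(day: datetime.date) -> list[int]:
    """All 32 keys a spammer bot may search for on a given day."""
    return [rendezvous_key(day, i) for i in range(32)]


key = os.urandom(KEY_LEN)
packet = b"example Overnet message"
assert xor_codec(xor_codec(packet, key), key) == packet
print(len(set(keys_for_day(datetime.date(2007, 10, 1)))))  # 32
```

Such a function depends only on public inputs. Anyone who has extracted it by reverse engineering can therefore compute the keys for any future date in advance, which is the weakness exploited by the index poisoning attack in Section V. Repeating-key XOR is obfuscation rather than encryption: the same function both encodes and decodes.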
Key lookup is also used to find other infected bots. This is necessary because the first version, which still ran on Overnet, had to distinguish infected bots from legitimate nodes of the Overnet network.

2) Three-tier architecture, push & pull - Waledac: Waledac's infrastructure for communication between bots is divided into three tiers. Bots are assigned different functions to establish the command-and-control channels. They use both push and pull techniques to obtain information about other peers in the network and new spam instructions. Depending on whether an infected computer is reachable, a bot acts either as a spammer (also called worker) or as a proxy. A computer that cannot be reached directly from the Internet, for example because it sits behind a NAT device, becomes a spammer. Spammers form the bottom tier of the network. The rest of the network cannot contact them directly, so they actively poll for information about new spam campaigns. Proxy bots forward these requests and form the middle tier. Because they are publicly reachable, they can relay spammers' requests to the top tier. The top tier consists of five backend servers that have stable IP addresses and whose network status does not change over long periods. Proxies forward requests from worker bots to backend servers and offer this relaying service to other proxies as well, so no bot ever communicates directly with a backend server. Every bot must therefore have at least one proxy bot in its list to take part in C2 communication. To keep this list as current as possible, proxies answer workers' requests for new peer information.
At the same time, proxies send requests to one another. The requesting proxy bots get added to the contacted bots' lists and thus actively advertise themselves. Communication therefore supports both push and pull methods. Waledac also exhibits peer-to-peer characteristics in its lower two tiers, while the backend servers give it a central component. The division of C2 communication into layers resembles the architecture of the Storm worm.

3) Decentralized, pull - Nugache: Nugache's C2 infrastructure differs from those of Stormworm and Waledac presented above: it is much simpler and fully decentralized. Apart from the purist P2P character of this botnet, later versions introduced a division of roles. For greater efficiency, these versions also admit computers that cannot be reached directly from the Internet. According to [15], a newly infected bot's function is determined by its reachability. A bot behind a NAT device can only act as a client, whereas publicly reachable peers act as servents that answer clients' requests. All communication takes place in an encrypted P2P network that implements only pull methods. Besides the P2P component, Nugache bots can access centralized IRC channels.

V. COUNTERMEASURES

There are basically two approaches to combating a botnet effectively. The first is to clean the malware from all infected computers using antivirus software. The second is to attack the network's command-and-control channels so that the botmaster and the bots can no longer communicate. This would ultimately render the botnet useless to the botmaster.
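The role assignment and relaying described for Waledac can be modelled in a few lines; the same reachability rule applies to Nugache's clients and servents. This is a hypothetical toy model. The `upstream` wiring between bots and the `backend` label are illustrative, and the push-style advertisement among proxies is not modelled. It shows two properties stated above: reachability determines the role, and a worker's pull request reaches the backend tier only through at least one proxy.

```python
from dataclasses import dataclass
from typing import Optional

BACKEND = "backend"  # label for the top tier; never contacted by workers directly


@dataclass
class Bot:
    name: str
    publicly_reachable: bool
    upstream: Optional["Bot"] = None  # next proxy towards the backend (hypothetical wiring)

    @property
    def role(self) -> str:
        # Hosts behind NAT become spammers (workers); reachable hosts become proxies.
        return "proxy" if self.publicly_reachable else "worker"


def relay_path(worker: Bot, max_hops: int = 10) -> list[str]:
    """Hops a worker's pull request takes until it reaches the backend tier."""
    if worker.role != "worker":
        raise ValueError("only workers originate pull requests in this sketch")
    if worker.upstream is None:
        raise ValueError("a worker needs at least one proxy in its list")
    path = [worker.name]
    hop = worker.upstream
    while True:
        if hop.role != "proxy":
            raise ValueError("requests are only relayed by proxies")
        path.append(hop.name)
        if hop.upstream is None:  # this proxy forwards straight to a backend server
            path.append(BACKEND)
            return path
        if len(path) > max_hops:
            raise RuntimeError("too many proxy hops")
        hop = hop.upstream


backend_proxy = Bot("proxy-2", publicly_reachable=True)
entry_proxy = Bot("proxy-1", publicly_reachable=True, upstream=backend_proxy)
worker = Bot("worker-1", publicly_reachable=False, upstream=entry_proxy)
print(relay_path(worker))  # ['worker-1', 'proxy-1', 'proxy-2', 'backend']
```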
In centralized botnets, this goal can be achieved by shutting down the central IRC servers or by filtering all connections to them. P2P botnets have no physical central point, so other approaches are needed. The most important ones are presented in this section.

A. Naive approaches

Naive approaches include signature-based antivirus programs and physically taking identified bots offline. Neither method produces satisfactory results, mainly because of the P2P character of the botnets. As already mentioned, Waledac and Stormworm, among others, exhibit very strong polymorphic activity. Both created replicas of themselves with a slightly modified code base. As a result, vendors had to create a new virus signature for every new derivative, and in Waledac's case the derivatives lived for only a few hours. Moreover, this approach is reactive by nature, because a signature can only be created after a variant has been observed. During their active lifetime, botnets exhibit characteristic and consistent propagation patterns. Their malware therefore continues to spread successfully despite antivirus software.

Physically taking down all computers of a botnet is impossible for practical reasons alone, because of the botnet's size. Because of the P2P character, there are in principle no nodes of outstanding importance to the botnet. Furthermore, botnets are usually widely distributed geographically, and legal barriers often hamper access to computers in other countries.

Table II: Overview: classification of P2P botnets

 | Stormworm | Waledac | Nugache | Phatbot
Type | Leeching (first version) | Bot-only | Bot-only | Centralized (IRC)
Goals | Primary: spam delivery | Primary: spam delivery | Primary: DDoS attacks | Primary: DDoS attacks
Infection | Spam mails | Websites | Instant messaging | Worms
Bootstrapping | Hard-coded lists (static) | Hard-coded lists (dynamic) & list exchange | Hard-coded lists (dynamic) | Hard-coded (static)
C&C structure | Three-tier architecture, pull | Three-tier architecture, push & pull | Decentralized, pull | Centralized, pull

B. Index poisoning attack

The content industry first introduced this attack to prevent or suppress the distribution of copyrighted works in P2P networks.
The attack can be applied to any P2P network whose C2 mechanisms follow the publish/subscribe approach and that meets further criteria. An index poisoning attack can only succeed if the network does not encrypt its communication. In addition, the bots must not require authentication from one another to exchange information. In such networks, content is placed and queried by means of indices; they are also called index-based networks. Storm is a prime example of such a system, since its architecture is built on the pull mechanism and implements no optional push methods for communication (see Section IV-E1).

In the first step, the hard-coded function that computes the rendezvous keys must be identified by reverse engineering a bot sample. Once the defender (or security specialist) knows how the keys are computed, he can predict them. He therefore knows which nodes will be responsible for storing the content. The actual poisoning consists of the defender publishing false information under the same key IDs. If he does so in sufficiently large numbers, the probability increases that bots searching for these keys receive the false information instead of the genuine bot instructions. In Kademlia, however, content is not stored only at the node whose ID is closest to the object ID of the stored value. To ensure redundancy despite the churn inherent in P2P networks, the content is stored at the k closest nodes. A security specialist would therefore also have to supply the k nodes closest to the key with false content.
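The effect of index poisoning on a Kademlia-style replica set can be estimated with a small simulation. It rests on several simplifying assumptions. The defender has already predicted the rendezvous key. Each of the k = 20 closest nodes stores every entry published under that key; the value of k is borrowed from the Overnet bucket size. A lookup stops at one random replica and returns one of its entries. This is not a model of the evaluation in [18].

```python
import random


def k_closest(node_ids: list[int], key: int, k: int) -> list[int]:
    """The k node IDs with the smallest XOR distance to key (the replica set)."""
    return sorted(node_ids, key=lambda n: n ^ key)[:k]


def publish(store: dict[int, list[str]], replicas: list[int], entries: list[str]) -> None:
    """Append the given entries to every replica node's list for this key."""
    for node in replicas:
        store.setdefault(node, []).extend(entries)


def lookup(store: dict[int, list[str]], replicas: list[int], rng: random.Random) -> str:
    """Simplified lookup: the query ends at the first replica it meets
    (here a random one of the k closest) and returns one of its entries."""
    node = rng.choice(replicas)
    return rng.choice(store[node])


def poisoned_fraction(fake_entries: int, trials: int = 5000, seed: int = 1) -> float:
    """Share of lookups that return false instructions."""
    rng = random.Random(seed)
    node_ids = [rng.getrandbits(128) for _ in range(1000)]
    key = rng.getrandbits(128)                 # rendezvous key predicted by the defender
    replicas = k_closest(node_ids, key, k=20)
    store: dict[int, list[str]] = {}
    publish(store, replicas, ["real instructions"])                        # botmaster
    publish(store, replicas, [f"fake {j}" for j in range(fake_entries)])   # defender
    hits = sum(lookup(store, replicas, rng).startswith("fake") for _ in range(trials))
    return hits / trials


for n in (0, 1, 9):
    print(n, poisoned_fraction(n))  # expected fraction is about n / (n + 1)
```

Under these assumptions, the chance that a bot receives false instructions is roughly n / (n + 1) for n false entries per genuine one. This explains why the defender must publish in large numbers.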
A bot's query may encounter, on its way to the actual target node, one of the nodes that store this value for redundancy. That node then returns the false content and the lookup ends. The authors of [18] evaluated the effectiveness of this attack.

There are, however, ways to defend against index poisoning. The weaknesses that make such an attack possible are essentially the following. First, index poisoning works in networks in which content is published without prior authentication. Second, a hard-coded key-generation algorithm allows rendezvous points to be computed in advance. Third, the limited number of keys creates a logical centrality, even though the network itself is physically decentralized. The Overbot botnet presented in [16] avoids some of these problems by encrypting its communication with strong RSA encryption. Furthermore, each node in Overbot computes its own key. In addition, a node checks before publishing whether a command comes from the botmaster and whether its content was modified on the way. It does so by authentication, for example with asymmetric cryptography using hashes and signatures.

C. Sybil attack

Like index poisoning, the Sybil attack relies on knowing how the rendezvous points, i.e. the keys, are computed. It also works against a P2P botnet, or a P2P network in general, whenever information from nodes can be accepted and published without authentication. In general terms, the Sybil attack consists of creating multiple fake nodes. These nodes are spread across the network in as large a number as possible to gain disproportionate influence on the network's overall behavior. This is easy to achieve, since an entity in a P2P network is not necessarily bound to a single physical identity. A single computer can simulate a large number of fake identities (up to a thousand on an average PC), each of which exists as a node in the P2P botnet.

Of course, these node IDs must not be chosen at random; they must lie close to the object IDs whose distribution is to be suppressed. When the botmaster then publishes content under such an object ID, in the best case it is also published on all fake nodes, since their IDs are close to the object ID. This approach is referred to as an active Sybil attack. Since the fake nodes are under the defender's control, they can implement behavior that deviates from the P2P protocol. For example, they can drop all incoming connections, so that the content cannot be found. In an early phase of fighting a P2P botnet, however, they might act only passively and record the traffic for analysis of the network.

A special form of the Sybil attack, the eclipse attack presented in [7], is briefly introduced here. Direct communication between peers relies primarily on the information stored in so-called neighbor lists (the k-buckets in Kad). The integrity of these lists is therefore essential for any communication within the network. In essence, in an eclipse attack a moderate number of fake entities in the network induce another bot node to add them as peers to its neighbor lists.
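The ID placement behind a Sybil attack can be sketched under two assumptions: IDs are 128 bits, and the replica set consists of the k = 20 nodes closest to the target object ID. Sybil IDs are generated by copying the first `prefix_bits` bits of the target and randomizing the rest. With 10,000 honest nodes, the closest honest ID typically shares only about 13 leading bits with the target. Fifty Sybil IDs sharing 40 bits therefore occupy the whole replica set. The network size and parameters are illustrative assumptions.

```python
import random

ID_BITS = 128  # assumed ID width


def sybil_ids(target: int, count: int, prefix_bits: int, rng: random.Random) -> list[int]:
    """Fake IDs sharing the first prefix_bits bits with target; the remaining bits are random."""
    low_bits = ID_BITS - prefix_bits
    prefix = (target >> low_bits) << low_bits
    return [prefix | rng.getrandbits(low_bits) for _ in range(count)]


def sybil_share_of_replicas(honest: list[int], sybils: list[int], key: int, k: int = 20) -> float:
    """Fraction of the k nodes closest to key that are Sybil nodes."""
    closest = sorted(honest + sybils, key=lambda n: n ^ key)[:k]
    sybil_set = set(sybils)
    return sum(n in sybil_set for n in closest) / k


rng = random.Random(7)
honest = [rng.getrandbits(ID_BITS) for _ in range(10_000)]
target = rng.getrandbits(ID_BITS)  # object ID whose distribution is to be suppressed
sybils = sybil_ids(target, count=50, prefix_bits=40, rng=rng)
print(sybil_share_of_replicas(honest, sybils, target))
```

The same placement serves both variants described above. Sybil nodes that drop requests suppress the content, while passive ones merely observe who publishes and searches under the targeted key.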
The goal is to control the neighbor lists of as many bot nodes as possible. However, this attack cannot be applied to the Storm worm running on Overnet, because of how Overnet's key space is organized. To attack a key ID k with an eclipse attack, fake peers must be introduced into the network as close to k as possible. In Overnet, however, content is not replicated for redundancy on the n peers closest to k. Instead, it is distributed across the entire key space, so no such attack zone exists.

VI. SUMMARY

As shown in this paper, P2P botnets are an evolution of centralized botnets. A selection of current representatives of this kind of malware was presented, including Stormworm, Nugache and Waledac. P2P botnets can be classified along several dimensions. The most important categorization divides them into three types: parasite, leeching and bot-only botnets. An essential part of how botnets operate is admitting a newly infected computer into the existing network, the bootstrapping process. Depending on how bootstrapping is implemented, a new node must already know information about other peers. This information can serve as a starting point for countermeasures. Because of the decentralized structure of current botnets, new defensive measures are required. Index poisoning can successfully infiltrate many botnets whose bots exchange content without authentication over unencrypted publish/subscribe mechanisms. However, researchers have already developed hybrid P2P botnets that implement push as well as pull mechanisms to resist such attacks. The development of P2P-based botnets has only just begun.
Further technical advances on the attackers' side are to be expected. To counter these threats, research must also study new botnet designs so that suitable countermeasures can be derived.

REFERENCES

[1] Ben Stock, Jan Göbel, Markus Engelberth, Felix C. Freiling, and Thorsten Holz. Walowdac - analysis of a peer-to-peer botnet. Technical report, Laboratory for Dependable Distributed Systems, University of Mannheim; Secure Systems Lab, Technical University Vienna.
[2] Lasse Trolle Borup. Peer-to-peer botnets: A case study on Waledac. Technical report, Technical University of Denmark.
[3] Matthew Braverman. MSRT - progress made, trends observed. Technical report, Microsoft Antimalware Team.
[4] Supreeth Burji. Reverse engineering of a malware - eyeing the future of security. Master's thesis, The Graduate Faculty of The University of Akron.
[5] Frédéric Dahl. Der Storm-Worm. Master's thesis, Universität Mannheim.
[6] D. Dittrich and S. Dietrich. Command and control structures in malware. In USENIX ;login:, volume 32, pages 8-17.
[7] John Douceur and Judith S. Donath. The Sybil attack.
[8] Thorsten Holz et al. Know your enemy: Tracking botnets. Technical report, The Honeynet Project and Research Alliance.
[9] Julian B. Grizzard et al. Peer-to-peer botnets: Overview and case study. In USENIX Workshop on Hot Topics in Understanding Botnets (HotBots 2007).
[10] Thorsten Holz, Moritz Steiner, Frederic Dahl, Ernst Biersack, and Felix Freiling. Measurements and mitigation of peer-to-peer-based botnets: a case study on Storm worm. In LEET '08: Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats, pages 1-9, Berkeley, CA, USA. USENIX Association.
[11] Commtouch Software Ltd. Malware outbreak trend report: Storm-Worm. Technical report, Commtouch Software Ltd., available at http://
[12] Jeet Morparia. Peer-to-peer botnets: Analysis and detection. Master's thesis, San Jose State University.
[13] Ralf Hund, Matthias Hamann, and Thorsten Holz. Towards next-generation botnets. In European Conference on Computer Network Defense, pages 33-40.
[14] Ryan Vogt, John Aycock, and Michael J. Jacobson. Army of botnets. In Proceedings of NDSS 2007.
[15] Sam Stover, Dave Dittrich, John Hernandez, and Sven Dietrich. Analysis of the Storm and Nugache trojans: P2P is here. ;login: The USENIX Magazine, volume 32.
[16] Guenther Starnberger, Christopher Kruegel, and Engin Kirda. Overbot - a botnet protocol based on Kademlia. In 4th International Conference on Security and Privacy in Communication Networks (SecureComm).
[17] Ping Wang, Sherri Sparks, and Cliff C. Zou. An advanced hybrid peer-to-peer botnet. In HotBots '07: Proceedings of the First Workshop on Hot Topics in Understanding Botnets, page 2.
[18] Ping Wang, Lei Wu, Baber Aslam, and Cliff C. Zou. A systematic study on peer-to-peer botnets. In ICCCN '09: Proceedings of the 18th International Conference on Computer Communications and Networks, pages 1-8, Washington, DC, USA. IEEE Computer Society.

160 PLAUSIBLE DENIABILITY

Plausible Deniability

Martin Soemer

Abstract: This paper examines the concept of plausible deniability. Some groups of people, such as informants, have a strong interest in not being linked to certain data or to their disclosure. The paper gives an overview of several methods that implement plausible deniability in electronic media, for example Off-the-Record and TrueCrypt Hidden Volumes. Finally, it briefly discusses which problems remain in existing solutions and which methods are already well integrated.

I. MOTIVATION

Everybody communicates in one way or another: by telephone, by letter, in person, or electronically via e-mail or instant messaging. In face-to-face communication, it is fairly easy to keep the exchanged information confidential by choosing a non-public place for the conversation. Telephone calls and letters also reach one specific addressee, and these channels are legally protected against interception. However, more and more information is exchanged over the Internet. Here it is easier to record a conversation unnoticed, because the communication path through the Internet is not visible to the user. On wireless LANs, the information is also transmitted over the air, so anyone nearby can receive it without difficulty (Fig. 1). The legal situation is problematic as well. The terms of use of many instant messengers and some e-mail providers state that the provider has the right to exploit all data transmitted through its service. Yet some groups of people want the information they pass on to someone else to be impossible to attribute unambiguously to them afterwards. Intelligence services, journalists, informants or dissidents, for example, do not always want leaked information to be traceable to them.

Of course, data can be transmitted in encrypted form so that the information is not directly readable. However, the long-term security of cryptographic methods is not always guaranteed. Encryption also does not help if, for example, the recipient's computer is compromised or spied upon. The property desired here is plausible deniability, meaning that no unambiguous attribution to the sender is possible even after the fact. Encryption can even be counterproductive here, as shown in Section IV. Storing data can also be critical. The mere existence of data on a system, even in encrypted form, can have negative consequences.

II. OUTLINE

Section III briefly shows which kinds of plausible deniability exist. Section IV then describes some difficulties that must be considered when implementing it. The following sections examine individual methods in more detail. Section V presents a possible implementation of plausible deniability for e-mail. Sections VI and VII cover special methods for instant messaging protocols: Section VI describes a method for one-to-one communication, and Section VII addresses communication with multiple participants. Section VIII describes how to make the communication path deniable, not only the information held by the recipient of a message. Section IX presents an application of plausible deniability outside communication: a method for concealing data on a storage medium. Finally, Section X gives a conclusion and an outlook on possible research areas.

Fig. 1. Unknown communication path

III. TYPES OF PLAUSIBLE DENIABILITY IN COMMUNICATION

There are different ways of keeping something deniable in electronic communication.

A. Content deniability

Sometimes, after transmitting a message, one wants to be able to deny having sent exactly this text. Mathematical approaches exist for this, for example [2]. It describes methods for cryptographic functions that, for a transmitted ciphertext, yield a possible plaintext that differs from the text actually transmitted. A further distinction is whether this decoy plaintext can be chosen in advance, or whether it is merely computable by mathematical methods and therefore need not make sense, e.g. as readable language.

B. Sender/recipient deniability

It is often more useful to be able to deny having sent a message at all, rather than merely its content. For an informant, for example, it is more important that no link to, say, a journalist can be established in the first place than to deny the content once such contact has been proven. To achieve this, the set of people who could have sent the message is enlarged, so that no unambiguous attribution is possible any more.

IV. DIFFICULTIES

Many of the usual encryption and authentication methods pose difficulties when implementing plausible deniability. Pretty Good Privacy (PGP), developed by Philip R. Zimmermann in 1991, is one of the most widely used encryption schemes for e-mail exchange [10]. It is based on a key pair consisting of a public and a private key. A message signed with the private key can be verified for authenticity with the public key. To send someone an encrypted message, the sender uses that person's public key, and only the matching private key can decrypt it again. For plausible deniability, however, this scheme is entirely counterproductive, since it links a person unambiguously to a message: the sender in the case of a signature, and the recipient in the case of encryption.

V. E-MAIL

E-mails pose a particular difficulty for plausible deniability, because communication times are relatively long.
To ensure that the sender can authenticate themselves to the recipient, a method is needed that still allows this after, say, a week, without any third party being able to verify the authentication. If one assumes that each mail is addressed to only one recipient, this circumstance can be exploited. As described in [8], ring signatures can be used to create a signature that could have been produced by any member of a group of possible signers. Returning to the assumption that the communication has only two participants, [1] shows how a ring signature for two participants can be used. If Alice sends a mail carrying such a signature, Bob can be sure that it comes from Alice, but a third party (called Eve in the following) cannot prove this. Nor can Bob prove it to Eve, since he could have signed the message himself. Only Bob's knowledge that he did not write the mail himself proves to him that it must come from Alice. The method described in [8] has the advantage that only the recipient's public key needs to be known to create the ring signature. No infrastructure beyond the established one for exchanging public keys, as used by PGP, is therefore needed. However, the method cannot be used with several recipients, because then the knowledge of not having sent the message oneself no longer identifies a unique sender; instead, a whole group of possible senders comes into question. This situation offers a different kind of plausible deniability, in which one can deny having written a mail, but the recipient does not know who wrote it either.
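The two-party ring signature idea can be made concrete with a small sketch. It does not reproduce the RSA-based construction of [8]; it swaps in a discrete-logarithm ring signature in the style of Abe, Ohkubo and Suzuki, which follows the same principle: the real signer closes the ring with their own secret key, while every other member's part is simulated with random values. The group (arithmetic modulo the Mersenne prime 2^127 - 1 with base 3) and all function names are illustrative assumptions, and these parameters are far too weak for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): arithmetic in Z_P^* for the
# Mersenne prime P = 2**127 - 1; exponents are reduced modulo P - 1.
P = 2**127 - 1
ORDER = P - 1
G = 3


def keygen():
    """Return a key pair (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(ORDER - 1) + 1
    return x, pow(G, x, P)


def _challenge(ring, msg, commitment):
    """Hash ring, message and commitment into an exponent."""
    data = repr((tuple(ring), msg, commitment)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def ring_sign(msg, ring, signer, secret):
    """Sign msg so that any member of `ring` could have produced the signature."""
    n = len(ring)
    c = [0] * n
    r = [0] * n
    alpha = secrets.randbelow(ORDER)
    # Start the challenge chain right after the real signer.
    i = (signer + 1) % n
    c[i] = _challenge(ring, msg, pow(G, alpha, P))
    # Simulate every other member with a random response.
    while i != signer:
        r[i] = secrets.randbelow(ORDER)
        commitment = pow(G, r[i], P) * pow(ring[i], c[i], P) % P
        i = (i + 1) % n
        c[i] = _challenge(ring, msg, commitment)
    # Close the ring: this is the only step that needs a secret key.
    r[signer] = (alpha - secret * c[signer]) % ORDER
    return c[0], r


def ring_verify(msg, ring, signature):
    """Recompute the challenge chain; it must return to its starting value."""
    c0, r = signature
    c = c0
    for y, r_i in zip(ring, r):
        c = _challenge(ring, msg, pow(G, r_i, P) * pow(y, c, P) % P)
    return c == c0
```

Because Bob can produce an equally valid signature over the same ring with his own key (signer index 1), a valid signature convinces Bob, who knows he did not sign, but proves nothing to Eve.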
As an example application, [8] describes a case in which a senior White House official wants to leak information without being identifiable as the informant. To do so, the official only has to create a ring signature over the public keys of all senior White House officials. The official then cannot be identified, but the source can be verified as authentic, since it must in any case be one of the senior officials.
VI. ONE-TO-ONE INSTANT MESSAGING
In instant messaging, communication times are much shorter than with e-mail. This allows methods in which authentication is more ephemeral and can, for example, be discarded after a session. An established method for this is Off-the-Record (OTR) [1], which provides encryption, authentication, deniability and perfect forward secrecy. Perfect forward secrecy means that if private keys are lost, past conversations cannot be compromised. To achieve these goals, OTR builds on and extends the ideas described in Section V. Each participant has a private key. At the start of the communication, it is used to agree on a shared encryption key with the Diffie-Hellman algorithm [3], as shown in Fig. 2.
Fig. 2. Sequence of the Diffie-Hellman key exchange
A → B: A, g, p, with $A = g^{x} \bmod p$
B → A: B, with $B = g^{y} \bmod p$
In this figure, x and y are the private keys of Alice and Bob, p is a prime and g is a primitive root modulo p with $2 \le g \le p-2$. Alice then computes the key as $K = B^{x} \bmod p$ and Bob as $K = A^{y} \bmod p$; both values equal $g^{xy} \bmod p$. If one of the participants is compromised, it can be established that both Alice and Bob took part in the key exchange, since both of their private keys were used. The rest of the conversation, however, could have come from either Alice or Bob, since both now hold the same key for the remaining communication. To illustrate this, Fig. 3 shows the OTR protocol in detail.
Fig. 3. OTR protocol [1]
A → B: $g^{x_1}$
B → A: $g^{y_1}$
A → B: $g^{x_2}$, $E(M_1, k_{11})$
B → A: $g^{y_2}$, $E(M_2, k_{21})$
A → B: $g^{x_3}$, $E(M_3, k_{22})$
The key is changed with every message. In the figure, a key share is written $g^{k_l}$, denoting the l-th key of participant k. $E(M_i, k_{ab})$ denotes the encryption of the i-th message with the shared key $k_{ab}$, which is derived from the shares $g^{x_a}$ and $g^{y_b}$. As in Section V, mutual authentication relies on the fact that both Alice and Bob could have encrypted the message, but each recipient knows that they did not write it, so it must come from their communication partner. The known OTR implementations [6], [5] distinguish four states a conversation can be in. The first state is Not private: the text is transmitted in plaintext, with no authentication or encryption at all. The second state is Private: the communication is encrypted and the counterpart's public key is authentic. This is the desired state in an OTR conversation. The third state is Unverified.
Here the communication is encrypted, but the counterpart's public key is not considered authentic. This state is insecure insofar as, although the data are encrypted, it cannot be guaranteed that no third party is relaying the messages between the two participants (man-in-the-middle attack). The last state is Finished. It is entered when an existing OTR session ends, either because one of the two participants deliberately terminates it or because one of them closes their client. As with e-mail, however, OTR has the problem that it is designed for only two participants. To extend it securely yet deniably to chat rooms such as IRC, more complex methods are required.
VII. MULTI-PARTY INSTANT MESSAGING
One example of such a method is mpOTR (multi-party OTR) [7]. The protocol can be divided into six steps, shown in Fig. 4:
1) Initialization
2) Determining the session ID
3) Authenticating the protocol parameters
4) Authenticating the participants
5) Communication (sending and receiving)
6) Shutdown
Fig. 4. Sequence of mpOTR
In the example of an IRC channel, this method is used when one wants to communicate only with known participants. These participants generate ephemeral signing keys that are deleted after a session. The keys are exchanged among all parties, with each exchange taking place privately between two participants: every participant establishes a separate connection with every other participant to perform a key exchange. As a result, every participant can unambiguously assign an ephemeral signing key to every other participant, and everyone has authenticated themselves to everyone else.
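The pairwise setup just described can be sketched with the Diffie-Hellman exchange of Fig. 2: every unordered pair of participants runs its own exchange, so n participants need n(n-1)/2 exchanges. This is a simplified illustration only, not the actual mpOTR key agreement of [7], which additionally authenticates each exchange and distributes the ephemeral signing keys. The toy parameters (modulus 2^127 - 1, base 3) and the function names are hypothetical.

```python
import secrets
from itertools import combinations

# Toy Diffie-Hellman parameters (assumption, far too weak for real use).
P = 2**127 - 1
G = 3


def dh_exchange():
    """One exchange as in Fig. 2; returns the key computed by each side."""
    x = secrets.randbelow(P - 3) + 2   # first participant's secret
    y = secrets.randbelow(P - 3) + 2   # second participant's secret
    a, b = pow(G, x, P), pow(G, y, P)  # the values sent over the wire
    return pow(b, x, P), pow(a, y, P)  # B^x mod P and A^y mod P


def pairwise_setup(participants):
    """Run a separate exchange for every unordered pair of participants."""
    keys = {}
    for first, second in combinations(participants, 2):
        k_first, k_second = dh_exchange()
        if k_first != k_second:  # cannot happen: both equal G^(x*y) mod P
            raise RuntimeError("key agreement failed")
        keys[(first, second)] = k_first
    return keys
```

The number of exchanges grows quadratically with the group size: four participants need six exchanges, ten already need 45.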
When a session ends, the ephemeral signing keys are published in the channel, so that afterwards anyone could have written anything, while during the session the attribution is unambiguous. If the list of participants changes, the current session is ended and a new one is started. Each participant computes a checksum over the list of participants they have accepted; if this checksum differs from those of the other participants, an unauthorized participant has joined. With this method, too, the only point at which a particular participant's involvement can be proven afterwards is the initial key exchange. This exchange is also not solved efficiently: since every pair of participants performs its own exchange, its complexity of $O(n^2)$ does not scale well to large numbers of users.
VIII. PROTOCOL LEVEL
While the methods described so far make it possible to deny being the author of a message, a connection can still be proven at the network level. For many purposes it would be desirable to be able to deny not only the content but even the connection to a system. For this purpose, there are several ways to obscure the path a packet takes through the network. One of the best-known is Tor [4]. With this application, TCP traffic can be rerouted anonymously. Services can also be offered anonymously by means of so-called rendezvous points.

Fig. 5. Building a two-hop circuit and starting a web page request [4]
Tor forms a network of Tor nodes between which data is transmitted only in encrypted form, and on a path through these nodes each node knows only its direct predecessor and its direct successor. This protects the anonymity of the individual user, since no node ever knows for certain from whom a packet originally came. If a participant wants to connect to another participant through the network, the process looks, for example, as shown in Fig. 5. A TLS connection is established to a known Tor node, and the first half of a Diffie-Hellman key exchange, as shown in Fig. 2, is transmitted. This node responds according to the Diffie-Hellman protocol, so that a confidential connection to the first node is established. To add another hop, an extend command encrypted for the second node is sent via the first node. This command in turn causes a TLS connection to node 2 to be established. The process can be repeated as often as desired to add more hops on the way to the destination. To ensure that no node along the path can learn more than its predecessor and successor, packets are encrypted in reverse order, i.e., in the given example first for node 2 and then for node 1. If a normal TCP request is now sent, an unencrypted packet arrives at the last node, which forwards it unencrypted to the destination. After a fixed time, the circuit is discarded and a new one is built to make potential tracing harder. To provide an anonymous meeting point between two parties, or to offer a TCP service such as a web server anonymously, Tor uses rendezvous points.
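The layered encryption of such a circuit can be sketched as follows. Each hop key comes from a Diffie-Hellman exchange as in Fig. 2; the client adds one layer per hop, starting with the last hop, and each relay removes exactly one layer. This is a simplified illustration, not Tor's actual cell format or cipher suite: the toy parameters, the SHA-256 counter-mode keystream and all function names are assumptions, and the TLS links between hops are omitted.

```python
import hashlib
import secrets

# Toy parameters (assumption): DH modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1
G = 3


def handshake():
    """DH exchange between the client and one relay; returns (client_key, relay_key)."""
    x = secrets.randbelow(P - 3) + 2  # client's secret
    y = secrets.randbelow(P - 3) + 2  # relay's secret
    return pow(pow(G, y, P), x, P), pow(pow(G, x, P), y, P)


def keystream(key, length):
    """Toy stream cipher: SHA-256 in counter mode (not what Tor uses)."""
    seed = key.to_bytes(16, "big")
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def layer(key, data):
    """Add or remove one encryption layer (XOR is its own inverse)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def build_circuit(hops):
    """Perform one handshake per hop and collect both sides' keys."""
    client_keys, relay_keys = [], []
    for _ in range(hops):
        client_key, relay_key = handshake()
        client_keys.append(client_key)
        relay_keys.append(relay_key)
    return client_keys, relay_keys


def wrap(payload, client_keys):
    """Encrypt for the last hop first, then for the earlier hops."""
    for key in reversed(client_keys):
        payload = layer(key, payload)
    return payload


def forward(cell, relay_keys):
    """Each relay peels one layer; returns what every hop holds afterwards."""
    views = []
    for key in relay_keys:
        cell = layer(key, cell)
        views.append(cell)
    return views
```

Only the last hop ends up with the plaintext request, while the first hop still holds a layer it cannot remove. With this XOR-based toy cipher the order of the layers happens not to matter mathematically; the reverse order simply mirrors the construction in the text.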
In this method, a participant connects to the Tor network and leaves information about a service at several nodes. Another participant can then also connect through the Tor network to these information nodes and request the service. In this way, an anonymous connection is established without the parties knowing each other directly; they only need, for example, a reference to an information node that was published on a web page.
IX. EXISTENCE OF DATA
To protect data on a system against access by third parties, the simplest method is to encrypt it. But if one wants not only to protect the content but to be able to plausibly deny that the data exists at all, one runs into difficulties. Even if an area of a storage medium is completely filled with random values before it is used, this only protects the data as long as the system that uses it cannot be accessed. Anyone who gains access to the system can simply use its own mechanisms to get at the data. A possible approach to achieving deniability even in such a case is described in [9] using the tool TrueCrypt (see Fig. 6). TrueCrypt creates a container file that is first filled completely with random values. To access a file inside it, one has to use the program, which can read the file allocation information from the container. However, the very existence of such an area filled with random values suggests that data is stored there. To make the existence of this data deniable, the program uses a trick. While a normal system keeps a list of all existing files in order to access them, TrueCrypt can create a further container inside a container that is not listed anywhere.
Inside a container full of random values there is thus a second container that cannot be seen from outside. To access the data it contains, one must explicitly specify where this container begins, since it has no entry of its own. If one is then forced to grant access to the encrypted area of the storage medium, only decoy files are found there, and the existence of the second container cannot be detected.
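The idea of a hidden container inside a random-filled container can be illustrated with a toy model that follows the simplified description above: the hidden data lives at an explicitly supplied offset and, without the matching password, is indistinguishable from the surrounding random bytes. This is not TrueCrypt's on-disk format or cipher; the password-derived SHA-256 keystream, the container size, the offsets and the function names are assumptions made for illustration.

```python
import hashlib
import os


def keystream(password, length):
    """Toy keystream derived from a password (assumption; real tools use a KDF and a block cipher)."""
    seed = hashlib.sha256(password.encode()).digest()
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def new_container(size=4096):
    """Fill the whole container with random bytes before any use."""
    return bytearray(os.urandom(size))


def write(container, offset, password, data):
    """Store data encrypted at the given offset; nothing records that it is there."""
    enc = bytes(d ^ k for d, k in zip(data, keystream(password, len(data))))
    container[offset:offset + len(enc)] = enc


def read(container, offset, password, length):
    """Decrypt length bytes at offset; a wrong password just yields noise."""
    raw = bytes(container[offset:offset + length])
    return bytes(d ^ k for d, k in zip(raw, keystream(password, length)))
```

Handing over the password for the decoy data at offset 0 reveals nothing about offset 2048. Note that in this toy, writing more decoy data past offset 2048 would silently overwrite the hidden data, since the outer layer does not know the hidden region exists.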

Fig. 6. Option for creating an encrypted, hidden volume
To guarantee deniability, the surrounding system must not log any actions that take place in the protected container. Many modern operating systems log accesses to data, which must be prevented.
X. CONCLUSION AND OUTLOOK
In conclusion, plausible deniability can be achieved with varying degrees of effort, but the methods in common use so far work efficiently only for two participants. In environments with several participants, much remains to be improved before efficient methods are available. One shortcoming of the communication methods described is that the key exchange still reveals participation in the communication. A deniable key exchange method would have to be developed to make even this level of participation deniable. For e-mail, there is no method for sending a mail to several recipients with a deniable signature such that each recipient knows for certain that the mail comes from the sender. For deniable connections and anonymous service provision, Tor is a good tool, although current research repeatedly reveals weaknesses. If many users take part in the network, however, it offers a good solution to the problem of making a data path deniable. Plausible deniability of stored data is an area in which much remains to be done: while modern browsers allow the history to be deleted so that visits to certain web pages can be denied, operating systems log more and more.
REFERENCES
[1] Nikita Borisov, Ian Goldberg, and Eric Brewer. Off-the-record communication, or, why not to use PGP. In WPES '04: Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society, pages 77-84, New York, NY, USA, 2004. ACM.
[2] Ran Canetti, Cynthia Dwork, Moni Naor, and Rafail Ostrovsky. Deniable encryption. In CRYPTO '97: Proceedings of the 17th Annual International Cryptology Conference on Advances in Cryptology, London, UK, 1997. Springer-Verlag.
[3] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6), November 1976.
[4] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The second-generation onion router. In Proceedings of the 13th USENIX Security Symposium, 2004.
[5] Scott Ellis. Off the record plugin for Miranda IM. June.
[6] Ian Goldberg, Rob Smits, Chris Alexander, and Nikita Borisov. Off the record plugin for Pidgin. June.
[7] Ian Goldberg, Berkant Ustaoğlu, Matthew D. Van Gundy, and Hao Chen. Multi-party off-the-record messaging. In CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security, New York, NY, USA, 2009. ACM.
[8] Ronald L. Rivest, Adi Shamir, and Yael Tauman. How to leak a secret. In ASIACRYPT '01: Proceedings of the 7th International Conference on the Theory and Application of Cryptology and Information Security, London, UK, 2001. Springer-Verlag.
[9] Robin Snyder. Some security alternatives for encrypting information on storage devices. In InfoSecCD '06: Proceedings of the 3rd Annual Conference on Information Security Curriculum Development, pages 79-84, New York, NY, USA, 2006. ACM.
[10] Philip R. Zimmermann. The Official PGP User's Guide. MIT Press, Cambridge, MA, USA, 1995.

165 CLOUD COMPUTING
Cloud Computing
Martin Stopczynski
Abstract: Cloud computing denotes a huge cloud of distinct services that hides the technology and infrastructure behind it. Furthermore, through constant technological development and the maturing of the business models that provide these services, the cloud keeps expanding and its characterization becomes broader and fuzzier. This makes it difficult to give a comprehensive definition of the term cloud computing. Nevertheless, to shed light on the cloud of services and the associated technology, this paper provides an insight into the particular services the cloud offers, its key concepts and the underlying architecture. In addition, we present different approaches to achieving cloud computing features by employing the concepts of peer-to-peer and grid computing.
Fig. 1. General understanding of Cloud Computing [6]
I. INTRODUCTION
When talking about cloud computing, we think of a cloudy group of technologies and services provided over the internet. Some would say the cloud is a metaphor for the internet that abstracts the complex infrastructure we do not want to think or worry about. This very abstract and basic statement describes the general public understanding of cloud computing and is illustrated in Figure 1. Going a step further, one can say that cloud computing is yet another buzzword [17] for the overall IT processes and services around next-generation web technologies. As with the buzz phrase Web 2.0, there is a vast number of characterizations of the term cloud computing, which we discuss in Section III-B. Moreover, the hype around the cloud combined with computing further muddies the message. To resolve this confusion and to give an insight into the variety of aspects of cloud computing, this paper provides an overview of both the technical and the business-model understanding [7].
We will see that today's notion of cloud computing was formed by the convergence of three major IT trends:
Virtualization: applications are separated from the infrastructure.
Utility computing: computing capacity is accessed over a grid as variably shared services.
Application service provision: applications are available on demand on a subscription basis.
To show what led to the paradigm of cloud computing, Section II outlines the technical and business preconditions as well as the trends behind its rise, including utility computing, grid computing, virtualization and cluster computing. Section III presents the abstract cloud computing architecture with its specific actors (vendor, developer, end-user) and the applications in this segment; Section III-B then describes the key aspects of the cloud computing environment and shows why cloud computing is so successful. Section IV presents a few popular cloud computing providers and their services. Finally, we show two concepts for using peer-to-peer and grid computing as a cloud computing service.
II. UNFOLDING CLOUD COMPUTING
To understand the fundamental idea of cloud computing, we first list some basic technical and business preconditions that shaped the term (see Figure 2). The success of today's cloud computing is not based solely on the development of supercomputers, server farms and clusters. It results primarily from the combination of cluster computing with the business model of utility computing, which, in brief, is the concept of renting expensive hardware, software or computing power rather than buying it.
A. Utility computing
Utility computing is not a new concept; it has been known since the early 1960s. It denotes the technology and business model of providing IT services such as computing power, storage and applications on demand, with customers paying only for what they use.
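The pay-per-use argument comes down to simple arithmetic. The sketch below compares buying a server with renting capacity by the hour; all prices are made-up illustrative figures in cents (integers avoid floating-point rounding), not actual provider prices, and the function names are hypothetical.

```python
def total_cost(upfront, monthly, hourly, hours_per_month, months):
    """Total cost in cents over a period for a given usage profile."""
    return upfront + months * (monthly + hourly * hours_per_month)


def break_even_hours(buy_upfront, buy_monthly, rent_hourly, months):
    """Smallest monthly usage (hours) at which buying costs no more than renting."""
    buy_total = buy_upfront + months * buy_monthly
    rent_per_hour_over_period = months * rent_hourly
    return -(-buy_total // rent_per_hour_over_period)  # ceiling division


# Hypothetical figures: a 3,000.00 server with 50.00/month running costs
# versus an on-demand instance at 0.40/hour, compared over three years.
BUY = dict(upfront=300_000, monthly=5_000, hourly=0)
RENT = dict(upfront=0, monthly=0, hourly=40)
```

With these hypothetical figures, 200 hours of use per month cost 4,800.00 when buying but only 2,880.00 when renting over three years, and buying pays off only above about 334 hours per month, roughly half of all hours in a month.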
Utility computing eliminates the need to acquire and manage computing resources and removes the start-up costs of acquiring capital, configuring machines and performing basic system management and administration, allowing customers to focus simply on running the applications they need [15]. It also embodies the idea of paying for services per use, which we will encounter again later. The term is derived from traditional utility services such as electricity, water, gas and the telephone network. The concept is simple: why should a company acquire expensive hardware that exceeds its current liquidity when, first, it is unknown which resources are really needed and, second, it is still unclear whether the business will be successful? Besides, as an end-user you neither have to worry about managing those resources nor, generally, about how to access them; you simply tap into them through some standardized means. However, the breakthrough of this model has only come recently, because earlier several technical preconditions were not available or not yet mature. Among these preconditions were an extremely flexible and efficient dynamic IT infrastructure with complete cost control, cost allocation and active Service Level Agreement (SLA) management. Grid computing, a concept often compared with cloud computing that arose in the mid-1990s, also could not match the success of cloud computing. Above all, it was the establishment of virtualization, and with it the capability to run any software or operating system on raw hardware, that enabled the popularity and success of cloud computing.
Fig. 2. Development and Accommodation of Cloud Computing
B. Grid computing
Grid computing is a coordinated form of distributed computing in which a virtual supercomputer is composed of a cluster of networked, loosely coupled computers acting in collaboration to perform very large tasks (mostly a single problem at a time). The idea started with the vision of making computing resources sharable and broadly accessible, ultimately providing a form of utility computing as described above. Grid computing grew out of computing research that has been heavily backed by access to large computing clusters. Most grid-related projects are deep research efforts built on cluster-aware programs whose developers are willing to use a grid-aware library such as the Globus Toolkit [10]. Such middleware supports computation-intensive as well as data-intensive applications that typically require high levels of inter-machine communication. These toolkits and applications provide the ability to locate services within the grid, sustain communication between processes, simplify partitioning a workload into a grid-aware application, and provide the application with secure access to data [15]. At present, multi-organizational grid computing is mainly used by academic institutions with large computational or storage demands. Commercial adoption has been slow, mainly due to security concerns created by the shared use of resources in an environment with unknown users [22]. What distinguishes grid computing from conventional high-performance computing systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous and geographically dispersed, and may be used for many different purposes. They are often built with the aid of general-purpose grid software libraries called middleware. The difference from cloud computing is visualized in Figure 3 and described as follows:
Fig. 3. Difference between Cloud and Grid Computing [18]
Cloud computing: typical configuration when consumers use an application served by the central cloud, which is housed in one or more data centers. Green symbolizes resource consumption by many end-users (tasks), and yellow symbolizes resource provision (data center), which is hidden behind the central cloud. The coordinator role for resource provision is shown in red and is centrally controlled.
Grid computing: typical configuration in which resource provision is managed by a group of distributed nodes. Green symbolizes resource consumption (end-users), usually a single batch-oriented task, and yellow symbolizes resource provision (servers), which is distributed over the network. The coordinator role for resource provision is shown in red and is centrally controlled (middleware).
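Partitioning a workload and farming the pieces out to nodes is the core of the grid model described above. The sketch below splits a prime-counting job into contiguous chunks and runs them concurrently, then gathers the partial results. A local thread pool stands in for the grid nodes and for middleware such as the Globus Toolkit; no real grid API is used, and all names are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partition(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of nearly equal size."""
    size, rest = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < rest else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def count_primes(chunk):
    """The per-node task: count primes in one chunk by trial division."""
    lo, hi = chunk
    return sum(
        1
        for n in range(max(lo, 2), hi)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    )


def run_job(start, stop, nodes):
    """Scatter the chunks to the workers and gather (sum) the partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(count_primes, partition(start, stop, nodes)))
```

Threads do not speed up CPU-bound Python code because of the global interpreter lock; the point of the sketch is the scatter/gather structure of a single batch-oriented job, not performance.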
In short, grid computing focuses on using multiple distributed but collaborating resources to run single batch-oriented jobs, whereas cloud computing provides dynamic, virtualized resources that are centrally organized in the cloud for many users.
C. Virtualization
Virtualization denotes the abstraction of logical systems from their physical implementation. Simulating IT resources makes it possible to run several systems on one physical machine, so that processes can run in parallel and share resources without being aware of each other. This abstraction layer between hardware and software simplifies the IT infrastructure and especially its maintenance. Advantages of virtualization include the standardization and automation of processes such as the deployment of patches or security updates, monitoring,

backup and prioritization of resources. Furthermore, virtualization makes applications independent of the underlying hardware. The goal of virtualization is to reduce the running costs of the physical systems and their administration while increasing resource utilization. Servers are typically only 10 to 15% busy, so much of their acquisition cost, as well as the running costs for electricity and cooling, is wasted [3]. Deploying multiple virtual servers on one physical machine brings several benefits: besides saving space in the data center, increased reliability, energy savings and higher server utilization are major advantages of virtualization. With all these features, virtualization is a major enabler of cloud computing and provides the agility and flexibility to deploy any required machine instance on demand. Broadband connections also played a major role in making cloud computing a leading technology in which data, software applications and computing power are accessed from the cloud of online resources without worrying about maintenance or support. Cloud computing allows individuals to access data or use applications from anywhere, at any time and from any device, and allows organizations to reduce capital costs by purchasing software and hardware as a utility service. Although it is quite difficult at this point to give a broad definition of cloud computing, Jakob Rehof, director of Fraunhofer ISST, summarizes the different perspectives as follows [12]:
Service providers focus on the model of on-demand services.
Software vendors understand the concept mainly as the hosting of business applications, mostly in cooperation with service providers.
Providers of virtualization solutions interpret the cloud as virtualization software that also includes dynamic cluster computing.
Users interpret the term very flexibly, depending on their problem and approach.
III. CLOUD COMPUTING - ARCHITECTURE AND CONCEPTS
This section distinguishes the kinds of systems in which clouds are used and characterizes the actors involved in this domain. Because the concept of cloud computing is driven by many market participants, each emphasizing different priorities, we outline the key architecture. As shown in the lower part of Figure 2, cloud computing has three abstraction levels aimed at different market segments.
A. Key Architecture
Infrastructure as a Service (IaaS): At the bottom layer of cloud computing, service/infrastructure providers offer their large pool of computing resources, such as storage and processing capacity. Through virtualization, the vendor can provide whatever machine instances the customer requires (CPU, RAM, disk space, bandwidth). These instances (virtual machines) behave like dedicated servers and are controlled by the developer (customer). By splitting, assigning and dynamically resizing the resources, the vendor can build ad-hoc systems as demanded by customers. IaaS providers such as Amazon EC2 or Rackspace Cloud offer user-defined disk images and software stacks to run on their hardware. A non-proprietary example of IaaS is the open-source system Eucalyptus [19].
Platform as a Service (PaaS): One layer of abstraction higher, instead of supplying a virtualized infrastructure, this cloud service offers a software platform on which developers (customers) can build their own applications that run on the provider's infrastructure. It provides an easy-to-use API, so developers need not be concerned with resource allocation or other technical details. Examples of PaaS are Google's App Engine and Microsoft Azure.
Related services in this field are Storage-as-a-Service (such as Amazon S3 or Rackspace Cloud Files) and Database-as-a-Service (such as Microsoft SQL Azure or Amazon SimpleDB).
Software as a Service (SaaS): The end-user-facing level contains the most popular examples of cloud computing. This type of cloud computing delivers a single application through the browser to thousands of customers using a versatile architecture. With web-based applications such as Google Mail, Calendar or Docs, the vendor offers specific services, resources or storage. On the customer side, this means no upfront investment in servers, software licensing or software implementation; on the provider side, with just one application to maintain, costs are low compared to conventional hosting. Another provider of this kind of service is Salesforce.com, with a large portfolio of applications.
Figure 4 visualizes the roles of the various actors in the cloud environment. Besides the vendor providing a particular service, developers use the resources to build applications or services for the end-user. Actors can also take on multiple roles: vendors may develop services for end-users themselves, and developers may use the services of others to build their own.
B. Key Aspects and Concepts
At this point we gather most of the available cloud characterizations to arrive at an integrative definition as a lowest common denominator. Taking into account the statements of a variety of experts in Table I, we can list the following key aspects of cloud computing:

Fig. 4. Actors in Cloud Computing [18]
Easy to use: Do not care what's in the cloud and do not worry about maintenance.
User friendly: Do not get bogged down in the underlying infrastructure; just use the services provided.
Agile: Rapidly and inexpensively scale resources up or down to avoid peaks or idle time.
Flexible: Choose on demand which instances and how much computing power are required.
Pay-as-you-go: Pay only for the services you actually use and do not bother with software licenses.
Risk: Reduce capital costs by purchasing software or hardware as a utility service.
Instance availability: Dynamically increase or decrease services nearly in real time.
Independent: Access the services from any device and any location.
Reliable: Use multiple redundant sites with huge infrastructure services to improve reliability.
SLA (Service Level Agreement): Get the service you signed up for.
This leads to an encompassing definition of the cloud: Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale) and are exploited by a pay-per-use model.
IV. CLOUD COMPUTING - PROVIDER
There are many cloud computing providers [9] offering different cloud services. To give an insight into the discussed segments, we list a few popular vendors and their general portfolios.
Amazon Web Services (IaaS): AWS is a comprehensive cloud services platform, offering compute power, storage, content delivery and other functionality. Some of the major services provided are:
Amazon EC2 (Elastic Compute Cloud) is a web service that provides resizable compute capacity in the cloud. It offers different sets of virtual machines, operating systems and software stacks.
TABLE I CLOUD DEFINITIONS BY EXPERTS
Author: Definition
J. Cross: I don't care what's up there, as long as it works.
I get a way to plug in to a huge network of resources and don't have to mess with the stuff in the middle. [4]
K. Marks: The idea of cloud computing comes from the days when we drew the network as a cloud and didn't care where the mess went; the cloud hid it for us. [4]
D. Farber: Use the Cloud to store a lot of data, applications and other things. In the future everything will move to the cloud, and providing data or applications will be like serving electricity. [4]
M. Fox: ... all activity you want to do should take place on a remote server and you need only a network connection. [4]
S. Gillmor: ... startups being able to implement ideas at a very early stage without the concern of scaling up. [4]
T. Doerksen: Cloud computing is ... the user-friendly version of Grid computing ... [11]
S. Lawrence: Cloud Computing is a bit like liquid paper - it covers up mistakes. If you want to get something up and running, you don't have to worry about installing software or infrastructure ... [4]
G. Gruman: Cloud computing comes into focus only when you think about what IT always needs: a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities. [13]
M. Klems: ... you can scale your infrastructure on demand within minutes or even seconds, instead of days or weeks, thereby avoiding under-utilization (idle servers) and over-utilization of in-house resources ... [11]
B. Martin: Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities. [11]
I. W. Berger: ... the key thing we want to virtualize or hide from the user is complexity ...
all that software will be virtualized or hidden from us and taken care of by systems and/or professionals that are somewhere else - out there in the Cloud ... [11]
J. Kaplan: ... a broad array of web-based services aimed at allowing users to obtain a wide range of functional capabilities on a pay-as-you-go basis that previously required tremendous hardware/software investments and professional skills to acquire. Cloud computing is the realization of the earlier ideals of utility computing without the technical complexities or complicated deployment worries ... [11]
K. Hartig: ... really is accessing resources and services needed to perform functions with dynamically changing needs ... is a virtualization of resources that maintains and manages itself. [11]
K. Sheynkman: Clouds focused on making the hardware layer consumable as on-demand compute and storage capacity. This is an important first step, but for companies to harness the power of the Cloud, complete application infrastructure needs to be easily configured, deployed, dynamically scaled and managed in these virtualized hardware environments. [11]
R. Buyya: A Cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one unified computing resource based on service-level agreements established through negotiation between the service provider and consumers. [8]
R. Cohen: Cloud computing is one of those catch-all buzz words (like Web 2.0) that tries to encompass a variety of aspects ranging from deployment, load balancing, provisioning, business model and architecture. It's the next logical step in software. For me the simplest explanation for Cloud Computing is describing it as internet-centric software. [11]
R. Bragg: The key concept behind the Cloud is web application ... a more developed and reliable Cloud. Many find it's now cheaper to migrate to the web cloud than invest in their own server farm ...
it is a desktop for people without a computer ... [5]

Amazon S3 (Simple Storage Service) provides a simple web service interface that can be used to store and retrieve data at any time, from anywhere on the web.
Amazon SimpleDB provides the core database functions of data indexing and querying in the cloud.
Google App Engine (PaaS): Google App Engine is a platform for developing and hosting web applications on Google's infrastructure. App Engine requires developers to use only its supported languages, APIs and frameworks; currently, the supported programming languages are Java and Python. App Engine costs nothing to get started: all applications can use up to 1 GB of storage and enough CPU and bandwidth to support an efficient app serving around 20 million page views a month, absolutely free. Once billing is enabled for an application, the free limits are raised, and only resources used above the free levels have to be paid for.
Microsoft Azure (PaaS): The Windows Azure platform is a set of cloud computing services that can be used together or independently and that comprise:
Windows Azure, providing a scalable environment with compute, storage, hosting and management capabilities.
SQL Azure, a relational database for the cloud.
AppFabric, which helps developers connect applications and services in the cloud or on-premises. This includes applications running on Windows Azure, Windows Server and a number of other platforms, including Java, Ruby, PHP and others.
Salesforce (SaaS, PaaS): Salesforce (Force.com) offers a software-as-a-service platform for building and deploying applications on a subscription basis. These applications are built using Apex (a proprietary Java-like programming language for the Force.com platform) and Visualforce (an XML-like syntax for building user interfaces in HTML, AJAX or Flex).
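The free-quota billing described for App Engine (usage within the free limits costs nothing; once billing is enabled, only usage above those limits is charged) can be sketched as a small calculation. The resource names, quota sizes and unit prices below are hypothetical placeholders chosen for illustration, not Google's actual quotas or price list.

```python
"""Pay-per-use billing with a free quota (hypothetical quotas and prices)."""

# Hypothetical free quotas per billing period; NOT real App Engine figures.
FREE_QUOTA = {"cpu_hours": 6.5, "outgoing_gb": 1.0, "stored_gb": 1.0}
# Hypothetical price per unit above the free quota.
PRICE_PER_UNIT = {"cpu_hours": 0.10, "outgoing_gb": 0.12, "stored_gb": 0.005}


def billable_cost(usage: dict[str, float], billing_enabled: bool = True) -> float:
    """Return the charge for one billing period.

    Only usage above the free quota is charged. Without billing enabled the
    application is assumed to be capped at the free quota, so it costs nothing.
    """
    if not billing_enabled:
        return 0.0
    total = 0.0
    for resource, used in usage.items():
        excess = max(0.0, used - FREE_QUOTA.get(resource, 0.0))
        total += excess * PRICE_PER_UNIT[resource]  # unknown resources raise KeyError
    return round(total, 2)


if __name__ == "__main__":
    period = {"cpu_hours": 10.5, "outgoing_gb": 3.0, "stored_gb": 0.5}
    print(billable_cost(period))
```

In this example only the 4 CPU hours and 2 GB of outgoing traffic above the free quota are billed; the storage stays within the free level and costs nothing.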
Open Source Provider: Eucalyptus: This acronym for Elastic Utility Computing Architecture Linking Your Programs To Useful Systems denotes an open-source software infrastructure for implementing cloud computing on one's own infrastructure. The current interface to Eucalyptus is compatible with Amazon's EC2 and S3 interfaces, but the infrastructure is designed to support multiple client-side interfaces. Eucalyptus is implemented using commonly available Linux tools and basic web-service technologies, which makes it easy to install and maintain.
V. CLOUD COMPUTING BASED ON P2P AND GRID
In this final section we give examples of how to build a cloud on an underlying peer-to-peer/grid structure.
Peer-to-peer: Peer-to-peer, commonly abbreviated to P2P, denotes any distributed network architecture composed of participants that make a portion of their resources (such as processing power, disk storage or network bandwidth) directly available to other network participants, without the need for central coordination instances (such as servers or stable hosts) [23]. Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model, where only servers supply and clients consume. Taking these considerations, as well as those discussed in Section II, into account, we now present two different concepts that combine P2P, grid and cloud computing. To understand the need for a P2P-based cloud concept, we first look at the requirements and motivations.
A. Motivation for Using P2P/Grid
The increasing demand for large-scale instances and data- and computation-intensive applications, together with the support of high-performance computing infrastructures, incurs huge costs when external resources must be recruited to address resource shortages during peak periods. Conversely, during off-peak periods the system cannot offer its idle resources as services to others and thus cannot make full use of the investment. Resource scalability is therefore becoming a critical problem for current workflow systems.
Here, cloud computing can provide scalable resources on demand according to the system requirements. Besides scalable resources, another principal issue for large-scale workflow applications is decentralized management. For successful execution, effective coordination of the system participants is required for many management tasks, such as resource management (load management, workflow scheduling), QoS (Quality of Service) management, data management, security management and others [16]. One conventional way to solve the coordination problem is centralized management, where coordination services are set up on a central machine. All communication, such as data and control messages, is transmitted only between the central node and the other resource nodes, never among the resource nodes themselves. However, centralized management depends heavily on the central node and can thus easily become a performance bottleneck. Other common disadvantages include a single point of failure, a lack of scalability and the considerable computing power required for the coordination services [16]. To overcome the problems of centralized management, the typical decentralized peer-to-peer architecture comes into consideration. However, without any centralized coordination, pure (unstructured decentralized) P2P, in which all peer nodes communicate with each other through complete broadcasting, suffers from low efficiency and high network load. Evidently, neither centralized nor unstructured decentralized management is suitable for large-scale workflow applications, since they require massive communication and coordination services. Therefore, in practice, a structured P2P architecture is often applied, in which a super node acts as the coordinator peer for a group of peers.
Through these super nodes, which maintain all the necessary information about their neighboring nodes, workflow management tasks can be executed effectively, with data and control messages transmitted in a limited broadcasting manner [16].
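The trade-off between centralized coordination, complete broadcasting and super-node coordination can be made concrete by counting the messages needed to distribute a single coordination update. The cost model below is a deliberately simplified, hypothetical one (one message per sender-receiver pair, groups of equal size, the originating super node informing all other super nodes directly); it is not taken from [16].

```python
"""Message counts for three coordination styles (simplified, hypothetical model)."""
import math
from dataclasses import dataclass


@dataclass
class Cost:
    total_messages: int   # messages sent in the whole network for one update
    peak_node_load: int   # messages sent by the busiest single node


def centralized(n: int) -> Cost:
    # The central coordinator sends the update to each of the other n - 1 nodes.
    return Cost(total_messages=n - 1, peak_node_load=n - 1)


def full_broadcast(n: int) -> Cost:
    # Unstructured P2P with complete broadcasting: every peer sends the
    # update to every other peer, giving n * (n - 1) messages.
    return Cost(total_messages=n * (n - 1), peak_node_load=n - 1)


def super_nodes(n: int, group_size: int) -> Cost:
    # Peers are split into groups, each coordinated by one super node.
    k = math.ceil(n / group_size)   # number of super nodes
    between_super = k - 1           # originating super node -> other super nodes
    within_groups = n - k           # each super node -> its ordinary members
    # Busiest node: the originating super node (other super nodes + own members).
    peak = between_super + (min(group_size, n) - 1)
    return Cost(total_messages=between_super + within_groups, peak_node_load=peak)


if __name__ == "__main__":
    n = 1000
    for name, cost in [("centralized", centralized(n)),
                       ("full broadcast", full_broadcast(n)),
                       ("super nodes (50/group)", super_nodes(n, 50))]:
        print(f"{name:24s} total={cost.total_messages:7d} peak={cost.peak_node_load:4d}")
```

Under this model, the super-node scheme keeps the total message count of the centralized scheme (999 messages for 1000 nodes) while reducing the load on the busiest node from 999 to 68 messages, whereas complete broadcasting needs 999,000 messages.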

B. P2P/Grid Based Cloud Workflow System
Based on the analysis above, cloud computing is a promising solution to the requirement of scalable resources, and a structured decentralized architecture such as structured P2P is an effective solution to the requirement of decentralized management. Therefore, in this section we present SwinDeW-C (Swinburne Decentralized Workflow for Cloud) [16], a peer-to-peer based cloud workflow system for managing large-scale workflow applications. Workflow systems are designed to support the process automation of large-scale business and scientific applications, such as insurance claim processing in business or pulsar searching in science. Generally they are deployed on high-performance computing infrastructures such as clusters, clouds and grids. SwinDeW-C is based on the existing SwinDeW-G [25], a peer-to-peer based grid workflow system (see Figure 5).
a) SwinDeW-G: contains many grid nodes distributed in different places. Each grid node contains many computers, including high-performance PCs and/or supercomputers composed of significant numbers of computing units. In SwinDeW-G, a scientific workflow is executed by different peers that may be distributed over different grid nodes. As shown in Figure 5, each grid node can host a number of peers, and each peer can simply be viewed as a grid service. The top plane of Figure 5 shows an example of how a scientific workflow can be executed in the grid computing environment.
Fig. 5. SwinDeW-G Environment
b) SwinDeW-C: is built on a cloud computing infrastructure called SwinCloud [16]. In SwinDeW-C, a peer is deployed on a virtual machine (platform layer in Figure 6), which can scale its computing power dynamically according to the requested tasks.
The architecture of SwinDeW-C is depicted in Figure 6 and includes the general cloud architecture layers, from top to bottom: the application layer (user applications), the platform layer (middleware cloud services that facilitate the development and deployment of user applications), the unified resource layer (resources abstracted and encapsulated by virtualization) and the fabric layer (physical hardware resources). The system architecture focuses on cloud management services such as pricing (pay-per-use) and virtual machine management (scalability). Furthermore, usability is improved because SwinDeW-C can be accessed via a web portal from any device anywhere in the world, whereas SwinDeW-G can only be accessed through a SwinDeW-G peer with pre-installed software.
Fig. 6. Architecture of SwinDeW-C
C. Cloud Computing Platform Based on P2P
In this subsection, we present a peer-to-peer file system, Peeraid [24], which tackles the storage and data management problems of open cloud systems. Based on a distributed hash table (DHT), Peeraid is fully decentralized in order to guarantee scalability and avoid a single point of failure. As discussed before, cloud computing platforms are sets of scalable, large-scale data server clusters that provide computing and storage services to customers. The architecture of current cloud computing systems is usually centralized, so all data nodes must be indexed by a master server, which may become a bottleneck of the system. To resolve this problem, Peeraid proposes a P2P-based cloud computing architecture that provides a purely distributed data storage environment without any central entity. A cloud based on the proposed architecture, shown in Figure 7, is self-organized and self-managed and should offer better scalability and fault tolerance. The distributed P2P network is indexed by DHT algorithms such as Chord [20] or Pastry [21].
The key components in this system are the gateway and the chunk server, defined as follows:
Gateway: the entity that transfers requests and responses between the client application and the network and forwards each request to the nearest node in the network.
Chunk Server: the entity that serves as both data resource node and P2P node. As shown in Figure 7, it has three function modules with separate interfaces:
Index Module: takes charge of the part of the global resource index that the DHT algorithm assigns to it.
Route Module: forwards a lookup request using a next-hop routing table, which is also assigned by the DHT.
Data Module: provides the data resources stored on the local machine.
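The division of labour between the Index Module (which part of the global index a chunk server holds) and the Route Module (the next-hop routing table) can be illustrated with a simplified Chord-style ring. This is a sketch of generic consistent hashing with finger tables, not Peeraid's actual implementation; the node names, the 32-bit identifier space and SHA-1 hashing are assumptions, and the ring is computed from a global node list instead of being maintained by a join/stabilization protocol.

```python
"""Chord-style key placement and finger-table routing (simplified sketch)."""
import hashlib
from bisect import bisect_left

M = 32                 # bits in the identifier space (assumption)
RING = 2 ** M          # identifiers live on a ring 0 .. 2**M - 1


def ring_id(name: str) -> int:
    """Hash a chunk-server name or data key onto the ring with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING


class Ring:
    """Global view of the ring; a real DHT builds this state by a join protocol."""

    def __init__(self, node_names: list[str]) -> None:
        self.ids = sorted(ring_id(n) for n in node_names)

    def successor(self, ident: int) -> int:
        """Index assignment: a key is held by the first node at or after its id."""
        i = bisect_left(self.ids, ident % RING)
        return self.ids[i % len(self.ids)]   # wrap around past the largest id

    def finger_table(self, node: int) -> list[int]:
        """Next-hop table: entry j is the successor of node + 2**j."""
        return [self.successor(node + 2 ** j) for j in range(M)]

    def lookup(self, start: int, key: str) -> tuple[int, int]:
        """Route a lookup for `key` from node `start`; return (owner, hops)."""
        target = ring_id(key)
        owner = self.successor(target)       # used only to detect arrival
        node, hops = start, 0
        while node != owner:
            dist = (target - node) % RING
            # Fingers lying strictly between the current node and the target.
            ahead = [f for f in self.finger_table(node) if 0 < (f - node) % RING < dist]
            if ahead:
                current = node
                # Jump to the farthest such finger (closest preceding node).
                node = max(ahead, key=lambda f: (f - current) % RING)
            else:
                # No node precedes the target: the immediate successor owns it.
                node = self.successor(node + 1)
            hops += 1
        return node, hops


if __name__ == "__main__":
    ring = Ring([f"chunk-server-{i}" for i in range(32)])
    owner, hops = ring.lookup(ring.ids[0], "results/run-42.dat")
    print(f"owner={owner} reached in {hops} hop(s)")
```

Each hop moves the request strictly closer to the key's position on the ring, so no central index server is needed; only the node that the DHT assigns to a key (its successor) stores the corresponding index entry.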

Fig. 7. Cloud Storage Architecture Based on P2P
VI. CONCLUSION
In this paper, we proposed a detailed ontology for the cloud in an attempt to establish the knowledge domain of cloud computing and its relevant components. As shown, IT trends such as virtualization, cluster and grid computing and the vision of XaaS (everything as a service) led to what is now known as cloud computing. However, with the maturing of IT technologies and the expansion of business concepts in this area, such as Google Chrome OS (a cloud-based operating system) [1] or Google Cloud Print [2], its characterization will keep changing. What we can learn from this is that cloud computing does not simply offer a way to carefree IT. It is still a complex resource that requires knowledge and hard work to be used profitably [14].
REFERENCES
[1] Google Chrome OS, Google Inc., Tech. Rep., [Online]. Available: introducing-google-chrome-os.html
[2] Google Cloud Print, Google Inc., Tech. Rep., [Online]. Available: docs/overview.html
[3] BITKOM, Cloud computing - Evolution in der Technik, Revolution im Business, BITKOM, Tech. Rep., [Online].
[4] R. Boothby, What is cloud computing, Joyent, Tech. Rep., [Online].
[5] R. Bragg, Cloud computing: When computers really rule, Tech News World, Tech. Rep., [Online]. Available: technewsworld.com/story/63954.html
[6] R. Buest, Cloud Architektur, CloudUser Expert, Tech. Rep., [Online].
[7] R. Buest, Was ist Utility Computing, CloudUser Expert, Tech. Rep., [Online]. Available: was-ist-utility-computing
[8] R. Buyya, C. S. Yeo, and S. Venugopal, Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities, CoRR, Tech. Rep., [Online].
[9] Cloud Slam '10. [Online].
[10] I. Foster, The Globus toolkit for grid computing, in CCGRID '01: Proceedings of the 1st International Symposium on Cluster Computing and the Grid. Washington, DC, USA: IEEE Computer Society, 2001, p. 2.
[11] J. Geelan, Twenty-one experts define cloud computing, Virtualization Journal, Tech. Rep., [Online]. Available: sys-con.com/node/
[12] W. Grohmann, Cloud, IaaS, PaaS, SaaS, XaaS, S+S - was ist das? Computerwoche, Tech. Rep., [Online]. Available: http://
[13] G. Gruman and E. Knorr, What cloud computing really means, InfoWorld, Tech. Rep., [Online]. Available: com/article/08/04/07/15fe-cloud-computing-reality1.html
[14] W. Herrmann, Neun Mythen um Cloud Computing, Computerwoche, Tech. Rep., [Online]. Available: hardware/data-center-server/ /index11.html
[15] G. Huizenga, Cloud computing: Coming out of the fog.
[16] X. Liu, D. Yuan, G. Zhang, J. Chen, and Y. Yang, SwinDeW-C: A peer-to-peer based cloud workflow system.
[17] J. Maguire, Cloud computing: The ever expanding buzzword, [Online]. Available: php/ /cloud-computing-the-ever-expanding-buzzword.htm
[18] A. Marinos and G. Briscoe, Community cloud computing, in CloudCom '09: Proceedings of the 1st International Conference on Cloud Computing, 2009.
[19] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, The Eucalyptus open-source cloud-computing system, in CCGRID '09: Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009.
[20] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, A scalable content-addressable network, in SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001.
[21] A. I. T. Rowstron and P. Druschel, Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems, in Middleware '01: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg. London, UK: Springer-Verlag, 2001.
[22] M. Schmidt, N. Fallenbeck, M. Smith, and B. Freisleben, Secure service-oriented grid computing with public virtual worker nodes, in SEAA '09: Proceedings of the Euromicro Conference on Software Engineering and Advanced Applications, 2009.
[23] R. Schollmeier, A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications, in P2P '01: Proceedings of the First International Conference on Peer-to-Peer Computing. Washington, DC, USA: IEEE Computer Society, 2001.
[24] K. Xu, M. Song, X. Zhang, and J. Song, A cloud computing platform based on P2P, Beijing University of Posts and Telecommunications.
[25] Y. Yang, K. Liu, J. Chen, J. Lignier, and H. Jin, Peer-to-peer based grid workflow runtime environment of SwinDeW-G, in E-SCIENCE '07: Proceedings of the Third IEEE International Conference on e-Science and Grid Computing, 2007.

172 THE IMPACT OF NETWORK PROPERTIES ON MULTIPLAYER GAMES
The Impact of Network Properties on Multiplayer Games
Dimitri Wulffert
Abstract
Online multiplayer games are very sensitive to common Internet problems such as latency and packet loss. To set up an efficient network and provide better service for multiplayer games, it is necessary to investigate which games are affected by network properties, and to what degree. This paper presents relevant studies on several kinds of games and summarizes the impact of network properties on different types of games as well as on the end-user experience.
I. INTRODUCTION
Online multiplayer games are very popular among gamers. In most cases the multiplayer component is preferred because the opponent is human: players find it more challenging and rewarding to play against another human than against a computer bot. For this reason, many companies include a sophisticated multiplayer component in their games, which gives the game a longer replay value. This multiplayer experience is affected by multiple factors:
How many people play the game.
How well or badly the other players play.
How much fun the game is when played with others.
How good the network is and how stable the game is.
This paper concentrates on the last point: how network properties affect the gameplay and the players' performance. The investigation covers several kinds of games (first person shooters (FPS), real time strategy (RTS), sports and racing games) and examines how they are affected by different network properties. The information provided in this paper can be useful to users, game developers and network developers. Users get a general idea of how the properties of the network affect the game experience depending on the game type.
Game developers can see how different latency ranges affect different kinds of games, and network developers can ensure quality of service by building networks better suited to gaming [2]. This paper is organized as follows: Section II describes the approach used in this investigation to measure the effects of latency and packet loss. Section III explains, for each kind of game, how latency and packet loss affect the game and the players' performance. Section IV presents the conclusions and future work.
II. APPROACH
To test the effect of network properties on different kinds of games, this investigation relies on experiments and tests from other studies for each type of game: first person shooters [2], real time strategy [1], racing [5], [4] and sports games [3]. When measuring the players' performance, it is important to take into account that the players' skills influence the test results. For this reason, players are classified into three groups in this investigation:
Pro gamer: player with very good playing skills.
Average gamer: player with average playing skills.
Inexperienced gamer: player with no experience in playing games.
Not all tests include this player classification. Where a test does not make this distinction, this investigation assumes average gamers.
III. INVESTIGATION
Online multiplayer games are affected by the condition of the network: if the network is slow, latency will be high; if the network is congested, packet loss may occur. These two problems are the most common ones that players encounter in online multiplayer games, and they make the game experience annoying or even frustrating. Before going further, we define latency and packet loss.
Latency
Latency is the time a message transmitted by station A needs to reach station B.
The higher the latency, the longer messages between A and B take to reach their destination.
Packet loss
Packet loss, as its name suggests, occurs when one or more transmitted packets fail to reach their destination. This failure can have several causes, e.g. collisions in the network. As a consequence, information can be lost or corrupted.
These two problems have different effects depending on the kind of game being played: some games are more resistant to latency or packet loss than others. To measure which actions are more or less affected by latency and packet loss, two terms are needed [2]: precision and deadline.
Precision
Precision [2] describes how accurately the player must act in order to execute an action successfully.

Fig. 1. Precision vs. Deadline [2]
For instance, in FPS games firing a sniper rifle is a precision action: the precision required to hit the target is relatively high, so a delay (or a faulty synchronization caused by lost packets) can be fatal for the player. Such precision actions are therefore highly affected by delay and packet loss.
Deadline
Deadline [2] is the time within which an action must be executed. If an action is critical and must take effect at once, such as firing a sniper rifle in an FPS game, it has a very short deadline. Other actions have longer deadlines: in an RTS game, for example, a command that tells an army to move may take 500 ms to execute without affecting either the player's performance or the game experience. Fig. 1 displays the relation between precision and deadline.
Taking these concepts and definitions as a basis, this investigation shows how different kinds of games are affected by packet loss and delay:
A. First person shooters
First person shooter games are characterized by their first-person view: the player sees what the avatar sees. The player also controls the movement and view of the avatar and in many games can shoot targets and interact with the environment. Players of this kind of game need good reflexes, precision and timing in order to hit their targets. Although FPS games differ widely in weapons, characters and objectives, two concepts are common to all of them: movement and shooting.
Movement in FPS
Actions like moving and jumping are affected by delays, but how long must the delay be to affect the player's performance? To answer this question, [6] conducted several tests in Unreal Tournament 2003 (UT2003), dividing movement into two different actions:
Simple movement
Complex movement
For the simple movement test, two players start running in the same direction from the same starting point. The objective is to see how the two players perform when the network is manipulated to increase the delay. The results showed that no matter how high the delay and packet loss were, both players had the same speed and time. According to [6], this outcome is explained by the fact that the players' locations require minimal interaction with the server and that UT2003 uses client-side prediction to compensate for latency and packet loss.
For the complex movement test, a special track with multiple obstacles was built, and players try to reach the finish line in the shortest possible time. As shown in Figs. 2 and 3 [6], the players' performance during complex movement was affected significantly neither by latencies below 400 ms nor by packet loss.
Fig. 2. Movement performance vs. latency in FPS [6]
Shooting in FPS
FPS games offer many different types of weapons, which can be grouped into two classes [6]:
Precision weapons (sniper rifle)
Normal weapons (machine guns, Uzis, etc.)
Depending on the weapon used, latency may affect the gameplay differently. A sniper rifle, for example, requires such a high degree of precision that a successful shot demands almost perfect timing. This kind of weapon therefore has a very short deadline and needs high precision, as shown in Fig. 1; in FPS games, these weapons are the most affected by latency. Normal weapons, on the other hand, require less precision than precision weapons, but their deadline is just as short, as shown in Fig. 1. Because the required precision is lower, they are more resistant to latency.
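The precision/deadline framework of [2] (Fig. 1) can be expressed as a simple classification: the more precise an action must be and the shorter its deadline, the more sensitive it is to latency. The numeric precision scale, the two thresholds and the ratings of the example actions below are hypothetical illustrations, not measurements from [2] or [6].

```python
"""Classify game actions by precision and deadline (hypothetical thresholds)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    precision: float     # 0.0 (coarse) .. 1.0 (pixel-exact); hypothetical scale
    deadline_ms: float   # time within which the action must take effect


def latency_sensitivity(action: Action, high_precision: float = 0.7,
                        short_deadline_ms: float = 200.0) -> str:
    """Map an action onto a quadrant of the precision/deadline plane."""
    precise = action.precision >= high_precision
    urgent = action.deadline_ms <= short_deadline_ms
    if precise and urgent:
        return "high"      # e.g. sniper shot: any delay can make the shot miss
    if precise or urgent:
        return "medium"    # e.g. machine-gun fire: urgent but forgiving
    return "low"           # e.g. RTS move order: coarse and not time-critical


if __name__ == "__main__":
    for a in (Action("sniper rifle shot", 0.95, 100),
              Action("machine gun burst", 0.4, 100),
              Action("RTS army move order", 0.2, 1000)):
        print(f"{a.name:22s} -> {latency_sensitivity(a)}")
```

With these example values the sniper shot is rated most sensitive, the machine-gun burst intermediate and the RTS move order least sensitive, which matches the qualitative ordering described in the text.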
The player's performance under various degrees of latency and packet loss is shown in Figs. 4 and 5. As the figures show, performance below 50 ms of latency is not altered significantly, but as soon as the latency exceeds 50 ms, performance deteriorates: in this example the number of kills drops from 41 to 30, a performance loss of roughly 27 percent, and the number of deaths for this player increases from 10 to 15, a 50 percent degradation.
Fig. 3. Movement performance vs. packet loss in FPS [6]
Fig. 4. Shooting performance vs. latency in FPS [6]
Fig. 5. Shooting performance vs. packet loss in FPS [6]
Packet loss affected the players less than higher latency did: as displayed in Fig. 5, the kill rate stays almost intact and the death rate is not affected significantly. In conclusion, FPS games are best played below 50 ms of latency, where network problems do not alter the player's performance. Another important point is that higher latency affects FPS gameplay more than packet loss does.
B. Real time strategy
RTS games are known for their omnipresent perspective: players see the battlefield from a very high position. They also allow the player to control several objects at the same time. To test RTS games, Mark Claypool [1] splits the game into three parts:
Exploring: the time during which the player explores her/his surroundings.
Building: the time during which the player creates buildings, conducts research and builds up her/his army.
Combat: the part of the game in which two or more armies fight for supremacy.
The tests were based on three games, Warcraft III, Age of Mythology and Command and Conquer Generals (CCG), for which special maps were created to test the players' performance under different latency conditions. Latency ranged from 0 ms to 3000 ms, but most of the tests focused on a maximum latency of 1000 ms.
Exploring
In this test, the time the player needs to explore the entire map is measured. The test was repeated under different latencies, as displayed in Table I.
TABLE I REQUIRED EXPLORING TIME UNDER DIFFERENT LATENCY [1] (exploring time in s for Warcraft III, Age of Mythology and CCG at different latencies)
As shown in Table I, for latency between ms, the time difference for CCG and Age of Mythology was lower than 10s whereas for Warcraft III it was 35s. For a latency range between 500 and 1000 ms, the same first two games mentioned above increased the time difference in 5s while the increase for Warcraft III was of 20s. This means that in Warcraft III every 100 ms of latency, increased in 6s the exploring time; this time difference is insignificant for a RTS game. The other two games have shown that even less time change over higher latency. In conclusion, exploring with higher latency does not affect significantly the players performance. Building Building tests had similar results as the exploring tests. In this test time was also measured after certain amount of buildings and research was made. Table II shows that under higher latency, the required building time was shorter [1] at least in Age of Mythology and CCG, and in the case of Warcraft III there is a delay of 0.6 seconds per 100 ms of latency, which is insignificant in RTS games. In conclusion, these three games are not affected by latency in the building phase. Combat Combat in RTS depends considerably on the game being played, for instance if the micro management (control of single units or small groups in the game) in the game is very important, then the game will be more susceptible to latency. For the test, each player has an army with a determinate number of health points (Health points for Warcraft III and Age of Mythology, number of units in Command and Conquer Generals), these two armies combat through different latency 174

and the difference in health points indicates how well each player performed. TABLE II: Required building time under different latency [1] (rows: Warcraft III, Age of Mythology, CCG; building time in s per latency range). Fig. 6. Collision delay [4]. TABLE III: Combat performance under different latency [1]; unit score difference for 0 to 500 ms / 500 to 1000 ms: Warcraft III -500 to … / … to -200; Age of Mythology 190 to … / … to 380; CCG 15 to … / … to 13. Fig. 7. Racing performance vs latency [4]. Table III shows how players performed in combat at different latencies. With a latency between 500 and 1000 ms, players lost only one or two units more than with a latency between 0 and 500 ms, so latency has little influence on the outcome of a battle. Across all three phases of RTS games (exploring, building and combat), latency has little effect on player performance. This is explained by the fact that the strategy a player needs to win the game is far more important than the player's reaction time; in other words, good strategy beats fast reactions in RTS. C. Racing games In racing games, players control a vehicle (e.g. a car, a spaceship or a submarine). In most of these games, players have to complete a circuit in the shortest possible time. There are many variations of racing games, e.g. simulation racing games such as Forza 3 or arcade racing games such as Burnout Paradise. Racing games can be divided into games without and with collisions. Without collision Games without collision are common (e.g. Trackmania). In this kind of game, player performance is not affected even under high delays, provided that the game does not synchronize the positions of the other players.
This is because interaction with the other players is minimal; such games are often played like single-player games, except that players can compare their scores with those of other players. With collision Compared to games without collisions, a game with collisions that frequently synchronizes the game state is more exposed to latency and packet loss, because the game tries to keep the same state on all clients: if one client lags, the other clients are affected as well. To examine this, Lothar Pantel and Lars C. Wolf [4] carried out several tests based on two games, Need for Speed and Re-Volt, using five different types of test. The first four types tested how well the games were synchronized when two players played together. In these tests, the following problems occurred: Both players were shown in first place at the same time, each on his or her own monitor. In most cases, the player on the server was in first place at the start of the race (even when both cars were identical and one person pressed forward on both computers at the same time). If a collision occurred on player B's screen, as shown in Fig. 6, two situations were possible: either the collision also happened on screen A, leaving player A wondering how it occurred, or the collision disappeared from screen B, revealing an inconsistency in the gameplay. In both cases, one of the players experienced unexpected game behaviour. Fig. 7 shows how latency affects racing games. Here, three players with different skill levels played a game called RC-Car Simulation under different levels of latency, and their best lap times were recorded. Fig. 7 shows that with more than 50 ms of latency, the performance of all players started to deteriorate. However, the degradation was not the same for each player.
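The first synchronization problem, both players shown in first place, can be reproduced with a deliberately simple model. The sketch below illustrates the effect under stated assumptions. It is not the synchronization scheme used by Need for Speed or Re-Volt. It assumes that both cars start together and drive at the same constant speed. Each client draws its own car at its current position but draws the opponent's car at the last received position, which is one one-way delay old. The speed value and the function names are hypothetical.

```python
# Toy model (assumption, not the tested games' netcode): two cars start at the
# same moment and drive at the same constant speed. Each client renders its own
# car at its current position, but the opponent's car at the last received
# position, which is one one-way network delay old.

SPEED_M_PER_S = 50.0  # hypothetical speed, identical for both cars
PLAYERS = ("A", "B")


def position(t_s: float) -> float:
    """Distance in metres covered after t_s seconds (0 before the start)."""
    return SPEED_M_PER_S * max(t_s, 0.0)


def rendered_positions(local: str, t_s: float, delay_s: float) -> dict:
    """Positions of both cars as drawn on the screen of player `local`."""
    return {
        p: position(t_s) if p == local else position(t_s - delay_s)
        for p in PLAYERS
    }


def leader_on_screen(local: str, t_s: float, delay_s: float) -> str:
    """Player shown in first place on `local`'s screen, or 'tie'."""
    shown = rendered_positions(local, t_s, delay_s)
    best = max(shown.values())
    leaders = [p for p, x in shown.items() if x == best]
    return leaders[0] if len(leaders) == 1 else "tie"


if __name__ == "__main__":
    for delay_ms in (0, 50, 100):
        d = delay_ms / 1000
        shown_a = rendered_positions("A", 2.0, d)
        print(f"{delay_ms} ms:",
              {p: leader_on_screen(p, 2.0, d) for p in PLAYERS},
              f"gap on A's screen: {shown_a['A'] - shown_a['B']:.1f} m")
```

With any non-zero delay, each screen shows its own player ahead by roughly speed × delay. This is consistent with the first problem reported in [4].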
Depending on the player's skill, high latency degraded performance to a different extent. Players also stated that at 50 ms of latency the delay was hardly noticeable, and even at 100 ms the game was playable, but above 200 ms the game experience was unacceptable. In conclusion, based on the test results, racing games are best played with a latency of up to 50 ms. For a simulation game, a latency of 100 ms has to be avoided, and for every kind of racing game a latency higher than 200 ms makes the game experience unacceptable. D. Sports games Another very popular genre in the games industry is sports. These games are based on real sports such as basketball, soccer,

football, tennis, etc. For this kind of game, multiplayer is a very important part, since it lets people test their skills against other people online. This investigation draws on a study by James Nichols and Mark Claypool [3] of Madden NFL Football. The study used PlayStation 2 consoles, because sports games are more popular on consoles than on PCs. The test environment was set up as follows: two consoles, Alpha and Beta, play against each other, with Beta connecting to Alpha. Three latency configurations were used: console Alpha with 1500 ms latency and console Beta with low latency; console Beta with 1500 ms latency and console Alpha with low latency; and both consoles with 750 ms latency. In the first test, Alpha had the high latency, and as described in [3], the game handles this situation with a "Dumb Client Model": the client sends an "I want to move" request to the server; the server receives the request, validates the move and sends an acknowledgement (ACK) to the client; only after the client receives this response does the client's console render the move. In other words, the player has to wait for the response of the other console before his or her action is rendered. The player may therefore become very frustrated if he or she is making an important move and loses the ball because of the slow network response. In the second test, Beta had a latency of 1500 ms. Here the game used "Client-side Prediction", as explained in [3]: when client Alpha makes a move, the move is rendered at the same time as the message is transmitted to Beta; Beta verifies and validates the move and sends a message back to Alpha, which Alpha then uses to correct the movement. As the first two tests show, the game behaves differently depending on which console is the host and which player has the higher latency.
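A useful way to contrast the dumb client model with client-side prediction is to ask when the local player first sees a move. The following sketch is a simplification. It assumes a single move and a fixed round-trip time, and it assumes that the 1500 ms of induced latency corresponds to the full request/response round trip. The class and function names are hypothetical and do not describe Madden NFL's actual implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MoveTimeline:
    rendered_at_ms: float   # when the local player first sees the move
    confirmed_at_ms: float  # when the other console's validation arrives


def dumb_client(rtt_ms: float) -> MoveTimeline:
    """Request -> validation -> ACK; nothing is drawn before the ACK arrives."""
    return MoveTimeline(rendered_at_ms=rtt_ms, confirmed_at_ms=rtt_ms)


def client_side_prediction(rtt_ms: float) -> MoveTimeline:
    """Move is drawn while the message is sent; the reply may correct it later."""
    return MoveTimeline(rendered_at_ms=0.0, confirmed_at_ms=rtt_ms)


def reconcile(predicted_pos: float, validated_pos: float) -> Tuple[float, float]:
    """Adopt the validated position and report how far the player visibly jumps."""
    return validated_pos, abs(validated_pos - predicted_pos)


if __name__ == "__main__":
    rtt = 1500.0  # assumed round trip for the 1500 ms configuration
    for name, model in (("dumb client", dumb_client),
                        ("client-side prediction", client_side_prediction)):
        t = model(rtt)
        print(f"{name}: visible after {t.rendered_at_ms:.0f} ms, "
              f"confirmed after {t.confirmed_at_ms:.0f} ms")
```

Under these assumptions, the dumb client delays every action by a full round trip. Prediction shows the action immediately but may require a later correction, which appears as the visible jump returned by `reconcile`.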
In the third test, both players had the same latency. If Alpha sends a message to Beta, Alpha waits for half of the estimated round-trip time (RTT) before rendering the move, i.e. Alpha renders the move 750 ms after sending the message. Beta receives the message after 750 ms and renders the move as well. Both consoles thus render the move at the same time: both players play with a delay, but they are synchronized. In another test, only one player was subjected to different levels of latency. As shown in Fig. 8, player performance was measured by the number of yards gained. Yards can be gained when the player is on offense, either by passing or by running with the ball; the more yards gained, the better the performance. In this test, players performed well below 500 ms of latency, gaining between 4.5 and 5 yards. But as soon as the latency exceeded 500 ms, player performance dropped drastically. Some examples of how latency affected players: The player has the ball and is running toward the left side of the field; he or she wants to change direction, but because of the high latency, the player ends up running out of bounds. The player wants to pass the ball, which requires precise timing, but because of latency the pass is delayed and therefore intercepted. Fig. 8. Performance vs latency in NFL [3]. In conclusion, player performance was not affected at less than 500 ms of latency, but at 750 ms or more the game starts to lag, and both the game experience and player performance suffer. It is also important to note that, depending on the current state of the network, the game uses different models to handle latency (dumb client model, client-side prediction or symmetric latencies). IV.
CONCLUSION AND FUTURE WORK In this investigation, four kinds of games (FPS, RTS, sports and racing games) were examined under two network problems: latency and packet loss. The objective was to see how player performance and gameplay were affected by these two problems. Based on the tests and their results, racing games are the most sensitive to latency, because the tested games showed inconsistent gameplay with more than 50 ms of latency, e.g. collisions that never happened or two players in first place at the same time. Player performance also deteriorated with more than 50 ms of latency. FPS games are also sensitive to latency, but how susceptible they are depends on the player's actions. Player movements are not strongly affected by latency, even if the movement is complex. Shooting, however, is strongly affected, even more so when the player uses precision weapons. Player performance also suffers with more than 50 ms of latency: the number of deaths increases and the number of kills decreases. Sports games proved to be more resistant to latency than racing and FPS games: even with a latency of 500 ms, player performance was not significantly affected. Some actions, such as passing and running, were affected by latency, but only above 500 ms. The reason is that tactics matter more than fast reactions.

RTS games proved to be the most resistant to latency. In this kind of game, player performance was not significantly affected even with more than 1000 ms of latency. The outcome of battles was not affected by latency either, because strategy plays the primary role, more important even than the player's reaction time. In all of these kinds of games, packet loss affected neither the gameplay nor player performance. As for next steps, further research on newer games would be beneficial, because most of the tests so far were done with games that are five or six years old or older, and some newer games handle network properties better than older ones. Two other interesting areas for future work are how players adapt to latency in different kinds of games, and how multiplayer games are affected by network settings in emerging mobile communication technologies such as 4G. REFERENCES [1] Mark Claypool. The effect of latency on user performance in Real-Time Strategy games. Elsevier North-Holland, Inc., New York, NY, USA. [2] Mark Claypool and Kajal Claypool. Latency and player actions in online games. ACM, New York, NY, USA. [3] James Nichols and Mark Claypool. The Effects of Latency on Online Madden NFL Football. ACM, New York, NY, USA. [4] Lothar Pantel and Lars C. Wolf. On the Impact of Delay on Real-Time Multiplayer Games. ACM, New York, NY, USA. [5] Takahiro Yasui, Yutaka Ishibashi and Tomohito Ikedo. Influences of Network Latency and Packet Loss on Consistency in Networked Racing Games. ACM, New York, NY, USA. [6] Tom Beigbeder, Rory Coughlan, Corey Lusher, John Plunkett, Emmanuel Agu and Mark Claypool. The Effects of Loss and Latency on User Performance in Unreal Tournament. ACM, New York, NY, USA.

178 P2P QUALITY MANAGEMENT P2P Quality Management Mingmin Xu Abstract P2P applications are widely used nowadays, but monitoring and managing the quality of a P2P system (e.g. lookup times, communication delays and underlay efficiency) is still difficult. Researchers have proposed various approaches to overcome the difficulty of P2P quality management. This paper gives an overview of current approaches for controlling the service quality provided by P2P systems. Three different approaches are discussed in detail and compared with each other. These quality management approaches are useful, but integrated and systematic approaches that can be used widely across different P2P systems are currently rare. I. INTRODUCTION In 1999, the first peer-to-peer (P2P) system, the music-sharing application Napster [7], was introduced on the Internet. The popularity of P2P networks has grown dramatically ever since. Since at least 2003, Internet traffic appears to have been dominated by P2P applications [12]. Today, P2P applications are not only used for file sharing: with the emergence of applications such as Skype, Edutella, PPLive and SETI@home, P2P systems are now widely used in many other areas. Different P2P applications have their own specific quality requirements. Appropriate quality management is one of the key factors for the success of P2P systems, especially in commercial scenarios. The quality of service provided by a P2P system must be measurable and controllable, which is important both for users and for P2P system providers. Compared to client/server systems, quality management of a P2P system has its own specific benefits and challenges. On the one hand, the quality of a P2P system can be monitored and managed at much lower cost, because the costs are typically shared among the participating nodes.
On the other hand, the lack of a central server and the decentralized storage of information about the running system make administration more difficult. Because of the dynamic and heterogeneous nature of P2P systems, it is quite difficult to reach every node to retrieve precise and fresh information. Furthermore, the uncertainty and unreliability of links and nodes, as well as the large scale of such systems, add to the complexity problem. Researchers have proposed various approaches to overcome the difficulty of P2P quality management and to take advantage of the P2P paradigm. This paper gives an overview of current approaches for controlling the service quality provided by P2P systems. The remainder of the paper is structured as follows: Section 2 describes the quality attributes of a P2P system and the metrics used to measure them. Section 3 then introduces related concepts that are useful for quality management, such as performance management, network management and autonomic computing. Section 4 discusses three quality management approaches one by one: the Simple Network Management Protocol (SNMP), the Distributed Network Agents (DNA) framework and the SkyEye.KOM solution. Section 5 compares these three approaches according to several criteria, and Section 6 concludes the paper. II. QUALITY ATTRIBUTES AND METRICS Different researchers use different attributes to describe the quality of P2P systems. Heckmann et al. [5] divide all important quality attributes into four groups (see Fig. 1): adaptivity, efficiency, validity and trust. Fig. 1. Quality attributes in four groups. These attributes cover all important aspects of P2P systems and can be used to evaluate and compare them. Each attribute can only be measured through a set of quality metrics.
For example, in the efficiency group, the attribute performance can be measured through metrics such as response time, data availability and hop count per lookup, while the attribute costs can be measured through metrics such as bandwidth consumption, load distribution and local storage consumption. Quality metrics play a central role in P2P quality management. Suitable metrics must be selected according to the specific quality requirements, and appropriate thresholds must be determined for each metric so that exceeding a threshold indicates a quality problem worthy of attention. In line with the general concept of quality management, P2P quality management focuses not only on the quality of the P2P network but also on the means to achieve it. In general, quality management approaches have three tasks: to monitor

the quality of the network system by gathering the corresponding metric information, to trigger an alarm when a metric exceeds its predefined threshold, and to start a reconfiguration automatically to meet the given quality goals. The first task is the foundation of the other two, so monitoring is the basic and most important function, while only some current quality management approaches realize the third task. By monitoring and controlling a certain set of quality metrics, specific quality requirements of P2P systems can be guaranteed. III. QUALITY MANAGEMENT, NETWORK MANAGEMENT AND AUTONOMIC COMPUTING Before describing any specific quality management approach, some relevant concepts, such as network management and autonomic computing, need to be introduced. A. Network Management and FCAPS The main goal of network management is to ensure that the users of a network receive information technology services with the quality they expect [2]. To achieve this, network managers must monitor, control and secure the computing assets connected to the network. A common way of characterizing network management functions is FCAPS, an acronym for fault, configuration, accounting, performance and security, the categories into which the ISO model divides network management tasks. Network management thus has five functional areas [8] [14]: Fault management A fault is an event with negative significance. Faults can cause downtime or unacceptable network degradation. The goal of fault management is to recognize, isolate, correct and log faults in order to improve the availability of the network. Furthermore, it uses trend analysis to predict errors so that the network remains available. Fault management is perhaps the most widely implemented of the ISO network management elements.
Configuration management The goal of configuration management is to monitor network and system configuration information so that the effects of different versions of hardware and software elements on network operation can be tracked and managed. Accounting management The purpose of accounting management is to measure network utilization parameters. For non-billed networks, administration management replaces accounting management. Performance management The goal of performance management is to measure network performance and keep it at an acceptable level. Network performance covers throughput, percentage utilization, error rates and response times. The network can be monitored to collect performance data, and performance thresholds can be set in order to trigger an alarm. Performance data, such as throughput and response time, are important quality metrics as well, so performance management has tasks similar to those of quality management. Security management The purpose of security management is to control access to network resources. Performance management and, in part, fault management are closely related to quality management, so the architectures, applications and approaches of network management (especially performance management) can be used in quality management, too. B. Autonomic Computing and 4-Steps Model of Quality Management As mentioned in the introduction, the dynamics and large scale of P2P systems often lead to the complexity problem. In 2001, IBM proposed the concept of autonomic computing [6] to overcome the complexity problem in IT systems. This concept is also useful in P2P quality management. It is inspired by the human body's autonomic nervous system, which effectively monitors, controls and regulates the human body without external intervention. An autonomic system is thus a system that manages itself.
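The three tasks of quality management named in Section II form such a self-managing loop: monitoring metrics, raising an alarm when a predefined threshold is exceeded, and starting a reconfiguration automatically. The sketch below shows one round of this loop under simple assumptions. Metrics are plain numbers, and each threshold is expressed as an allowed interval with inclusive bounds. `monitor`, `alarm` and `reconfigure` are hypothetical callbacks that stand in for system-specific mechanisms. The example bounds (response time up to 100 ms, 7 to 11 overlay hops) are illustrative.

```python
from typing import Callable, Dict, Tuple

# Allowed interval (inclusive) per metric; values are illustrative only.
Policy = Dict[str, Tuple[float, float]]
POLICY: Policy = {
    "response_time_ms": (0.0, 100.0),
    "overlay_hops": (7.0, 11.0),
}


def find_violations(metrics: Dict[str, float], policy: Policy) -> Dict[str, str]:
    """Map each out-of-interval metric to the direction it must move."""
    violations: Dict[str, str] = {}
    for name, (low, high) in policy.items():
        value = metrics.get(name)
        if value is None:          # metric not reported in this round
            continue
        if value > high:
            violations[name] = "decrease"
        elif value < low:
            violations[name] = "increase"
    return violations


def manage_once(
    monitor: Callable[[], Dict[str, float]],
    policy: Policy,
    alarm: Callable[[str, float, str], None],
    reconfigure: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """One round: monitor, alarm on every violation, then reconfigure once."""
    metrics = monitor()
    violations = find_violations(metrics, policy)
    for name, direction in violations.items():
        alarm(name, metrics[name], direction)
    if violations:
        reconfigure(violations)
    return violations


if __name__ == "__main__":
    sample = {"response_time_ms": 140.0, "overlay_hops": 9.0}
    manage_once(
        monitor=lambda: sample,
        policy=POLICY,
        alarm=lambda n, v, d: print(f"ALARM {n}={v} -> {d}"),
        reconfigure=lambda v: print("reconfigure for", v),
    )
```

Keeping detection (`find_violations`) separate from the reaction (`alarm`, `reconfigure`) means that a system can implement only the first two tasks. The text notes that many current approaches do exactly that.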
According to Paul Horn's definition, an autonomic system has eight characteristics [9]: self-configuration, self-healing, self-optimization, self-protection, self-awareness, context-awareness, openness and anticipation. Some of these characteristics are particularly meaningful for quality management. For example, self-awareness is the capability to know the current status, which is similar to the monitoring task in quality management, while self-configuration and self-optimization enable the system to meet quality requirements automatically. The autonomic element architecture is described by a loop of four steps. Following this approach, a 4-steps model of quality management [10] [4] (see Fig. 2) can be established. This model describes the steps needed to reach and keep a preset quality state. Policies are a set of predefined rules used to manage and control changes to, and/or maintenance of, the quality state [9]. They govern how the four steps are accomplished, indicate which resources are to be monitored and how changes need to be propagated in the system. In quality management, policies can be defined as preset quality intervals for a set of metrics [4]. For example, the response time of a streaming application must be below 100 ms, or the number of overlay hops must be between 7 and 11. Such quality intervals are scenario- and application-specific. Monitor In the monitor step, detailed information on the relevant quality metrics of a P2P system is collected, filtered and reported [1]. Monitoring provides a live view of the quality of a running P2P system. This step can be seen as a knowledge-building tool that delivers the facts on which later decisions in the analyze step are based. Analyze The information gathered in the monitor step is analyzed in this step.
Following the preset policies (a set of quality intervals), the current quality state is compared to the preset quality requirements to determine whether a change is needed and which metric needs to be decreased or increased. If a deviation from the preset quality intervals is detected, a change request is passed to the plan step.

Fig. 2. The 4-steps model of quality management. Fig. 3. SNMP manager and managed devices. Plan Quality metrics cannot be decreased or increased directly; they can only be changed through a reconfiguration of the system. This step decides which configurable parameters need to be changed, and how, in order to affect the invalid quality metric. Various mechanisms can be used in this step to determine the interdependence between metrics and configurable parameters; they help to find out which configuration parameters to change in order to lower or raise a specific metric. Execute The execute step provides the mechanism to schedule and perform the necessary system reconfiguration. The change plan should spread quickly to all peers in the network, and the new settings should be applied on each node in a coordinated fashion. Once the new configuration has been adopted system-wide, the values of the quality metrics need to be updated, and the 4-steps cycle restarts. The last three steps can also be regarded as a control step that builds on the monitor step. Applying the concept of autonomic computing leads to a self-organized, efficient quality management system. IV. APPROACHES OF QUALITY MANAGEMENT Having introduced these concepts, this section discusses some available quality management approaches one by one. A. SNMP Protocol The Simple Network Management Protocol (SNMP) is a UDP-based network protocol that operates in the application layer of the Internet protocol suite as defined by the Internet Engineering Task Force (IETF) [11]. SNMP is based on a manager/agent model and allows an SNMP manager (the controller) to control an SNMP agent (the controlee) by exchanging SNMP messages (see Fig. 3). Fig. 4. Three kinds of SNMP messages.
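The manager/agent model can be illustrated with an in-memory toy agent that supports three kinds of interaction. The manager polls values (get), changes values (set) and is notified asynchronously when a preset condition holds (trap). The toy is not an SNMP implementation: it has no UDP transport, no MIB and no PDU encoding. The variable names, the threshold rule and the `update` hook that simulates a change inside the device are all hypothetical.

```python
from typing import Callable, Dict, List


class ToyAgent:
    """In-memory stand-in for an SNMP agent (illustration only)."""

    def __init__(self, variables: Dict[str, int],
                 notify: Callable[[str, int], None]) -> None:
        self._vars = dict(variables)
        self._notify = notify                 # manager callback for traps
        self._trap_thresholds: Dict[str, int] = {}

    def get(self, names: List[str]) -> Dict[str, int]:
        """Get request -> get response with the current values."""
        return {n: self._vars[n] for n in names}

    def set(self, bindings: Dict[str, int]) -> Dict[str, int]:
        """Set request: apply all bindings or none, return the new values."""
        unknown = [n for n in bindings if n not in self._vars]
        if unknown:                           # reject before changing anything
            raise KeyError(f"unknown variables: {unknown}")
        for name, value in bindings.items():
            self._write(name, value)
        return self.get(list(bindings))

    def add_trap(self, name: str, threshold: int) -> None:
        """Preset condition: send a trap when `name` exceeds `threshold`."""
        self._trap_thresholds[name] = threshold

    def update(self, name: str, value: int) -> None:
        """Simulates a change inside the managed device (e.g. an error counter)."""
        self._write(name, value)

    def _write(self, name: str, value: int) -> None:
        self._vars[name] = value
        limit = self._trap_thresholds.get(name)
        if limit is not None and value > limit:
            self._notify(name, value)         # asynchronous in real SNMP


if __name__ == "__main__":
    traps = []
    agent = ToyAgent({"error_count": 0, "rate_limit": 10},
                     notify=lambda n, v: traps.append((n, v)))
    agent.add_trap("error_count", 5)
    print(agent.get(["rate_limit"]))          # active monitoring (get)
    print(agent.set({"rate_limit": 20}))      # control (set)
    agent.update("error_count", 7)            # device-side change -> trap
    print(traps)                              # passive monitoring (trap)
```

The `set` method validates every binding before writing any of them. This is a simple way to obtain the all-or-nothing behaviour that SNMP requires for set requests.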
The managed devices, sometimes called network elements, can be any type of device, such as routers, access servers, switches, bridges, hubs, IP telephones, IP video cameras, computer hosts and printers. An SNMP agent is a software component that runs on a managed device. In general, three kinds of SNMP messages are exchanged between an SNMP manager and an SNMP agent: get, set and trap (see Fig. 4). The first kind consists of get requests and get responses. The SNMP manager uses a get request to ask for the value of a variable or a list of variables; after receiving such a request, the agent sends back a response with the requested information. The second kind is the trap, an asynchronous notification from the agent to the manager when some preset condition occurs; for example, the agent can send a notification when a defect occurs. These two kinds of messages allow the manager to monitor parameters on an SNMP agent both actively (get) and passively (trap). The third kind consists of set requests and set responses, which are used to change the value of a variable or a list of variables. The variable bindings are specified in the body of the request, and the agent must apply the changes to all specified variables as an atomic operation. A response with the new values of the variables is returned. SNMP itself does not define which information (which