Burkhard Stiller, Karoly Farkas, Fabio Victora Hecht, Guilherme Sperb Machado, Andrei Vancea, Martin Waldburger (Eds.) Communication Systems V



Burkhard Stiller, Karoly Farkas, Fabio Victora Hecht, Guilherme Sperb Machado, Andrei Vancea, Martin Waldburger (Eds.)
Communication Systems V

TECHNICAL REPORT No. IFI, August 2012

University of Zurich
Department of Informatics (IFI)
Binzmühlestrasse 14, CH-8050 Zürich, Switzerland

B. Stiller et al. (Eds.): Communication Systems V
Technical Report No. IFI, August 2012
Communication Systems Group (CSG)
Department of Informatics (IFI)
University of Zurich
Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
URL:

Introduction

The Department of Informatics (IFI) of the University of Zurich, Switzerland, works on research and teaching in the area of computer networks and communication systems. Communication systems include a wide range of topics and drive many research and development activities. Therefore, during the spring term FS 2012, a new instance of the Communication Systems seminar was prepared, and students as well as supervisors worked on this topic.

The areas of communication systems include, among others, wired and wireless network technologies, various network protocols, network management, Quality-of-Service (QoS) provisioning, mobility, security aspects, peer-to-peer systems, multimedia communication, and manifold applications, which determine important parts of future networks. Therefore, this year's seminar addressed such areas in more depth. Problems were clearly identified and understood in technical and organizational terms, and challenges as well as weaknesses of existing approaches were addressed. All talks in this seminar provide a systematic approach to judging dedicated pieces of systems or proposals and their suitability.

Content

This new edition of the seminar, entitled Communication Systems V, discusses a number of selected topics in the area of computer networks and communication systems. The first talk, on Anonymous Communication Systems, focuses on communication in networks with respect to anonymity. Pros and cons of anonymity are discussed, and methods for and against anonymity are explained. The second talk, on Protocols for a Faster Web, discusses novel protocol approaches that intend to make the Web faster. Talk three, on Content Distribution Networks, gives an overview of common mechanisms and techniques of Content Distribution Networks (CDN), looks at economic aspects such as the relationship between a CDN provider and ISPs, and outlines the current market situation in content delivery.
Talk four, on Wireless Indoor Positioning Techniques, looks at wireless indoor positioning systems in use today. The respective positioning algorithms and measurement techniques are described and compared among the different systems presented. Finally, talk five, on Where no Bit has Gone Before: Data Communication in Space, presents challenges and technical difficulties of interplanetary communication and describes currently adopted architectures as well as protocols for communication in space.

Seminar Operation

Based on well-developed experiences of former seminars, held in different academic environments, all interested students worked on an initially offered set of papers and book chapters. These relate to the topic titles as presented in the Table of Contents below. The students prepared a written essay as a clearly focused presentation, an evaluation, and a summary of those topics. Each of these essays is included in this technical report as a separate section and allows for an overview of important areas of concern, technology architectures and functionality, sometimes business models in operation, and problems encountered.

In addition, every student prepared a slide presentation of approximately 45 minutes to present findings and summaries to the audience of students attending the seminar and other interested students, research assistants, and professors. Following a general question and answer phase, a student-moderated discussion debated open issues and critical statements with the audience.

Local IFI support for the students' preparation of talks and reports was provided by Karoly Farkas, Fabio Victora Hecht, Guilherme Sperb Machado, Andrei Vancea, Martin Waldburger, and Burkhard Stiller. In particular, many thanks are addressed to Martin Waldburger for his strong commitment to getting this technical report ready and quickly published. A large number of pre-presentation discussions provided valuable insights into the emerging and moving field of communication systems, for both students and supervisors. Many thanks to all people contributing to the success of this event, which took place in a lively group of highly motivated and technically qualified students and people.

Zürich, August 2012

Contents

1 Anonymous Communication Systems (Michael Bloechlinger)
2 Protocols for a Faster Web (Cyrill Pedol)
3 Content Distribution Networks (Sebastian Golaszewski)
4 Wireless Indoor Positioning Techniques (Rilind Ballazhi)
5 Where no bit has gone before - Data Communication in Space (Denise Hoschek)


Chapter 1

Anonymous Communication Systems

Michael Bloechlinger

What was once naturally anonymous, like buying a magazine at a newsstand with cash, looks different today. Online activities such as shopping online with a credit card leave digital traces. Gathering this information for profiling is problematic, and how much personal information is revealed is often unknown. This paper gives an overview of anonymous communication in networks. Pros and cons of anonymity are discussed, and methods for and against anonymity are explained.

Contents

1.1 Introduction and Motivation
    CIA Triad
    Anonymity and Pseudonymity on the Internet
1.2 Pros and Cons of Anonymity
    Anonymity and the Law
1.3 History of Anonymous Communication Systems
    David Chaum MIX
    Onion Routing
1.4 Current Services
    Tor Project
    JAP Project AN.ON
    Freenet
1.5 Countermeasures
    Traffic Analysis
1.6 Summary and Conclusion

1.1 Introduction and Motivation

There once was a time when people paid for goods with cash. Information was transmitted over broadcast networks like radio. People bought their newspapers and magazines at the newsstand. Letters were dropped into yellow mailboxes, and phone calls were made from phone booths. Documents were processed by employees without the help of computers. It is easily forgotten that retrieving documents used to require considerable effort, especially when the documents were distributed over different locations. So what do these descriptions have in common? Many daily activities were once naturally anonymous. To obtain information on someone's actions or behavior, one literally had to hire a private detective to follow that person throughout the day. However, the introduction of information technology has steadily replaced this natural anonymity [8]. The use of credit cards, digital television, online journals, online shopping, e-mail, digital dossiers, and other online services (even web browsing) leaves digital traces. Analysing and storing these digital traces is called profiling. Nowadays, profiling has become a matter of course: who is browsing, buying, visiting, or communicating what, where, and when. Connecting these different, often distributed, pieces of information has also become possible. Since movies like Enemy of the State (1998), this is nothing new. But what can be done about anonymity, and especially about anonymity on the Internet? Anonymity means that the real author of a message is not revealed [10]. This report gives an overview of online anonymity. It presents the history of anonymity in computer networks and discusses some pros and cons of anonymity. The report explains different methods and services for establishing anonymous communication in networks. Finally, it presents countermeasures against anonymity.

CIA Triad

In the context of information security, the term CIA Triad comes up quite often. CIA stands for Confidentiality, Integrity, and Availability.
This section describes the goal of each component of the CIA triad. Confidentiality means that nobody who is not authorized has access to the data. Integrity ensures that the data is not modified without notice. Availability ensures continuous access to the data. The question is how this is connected to anonymity. The answer lies in communication. On the one hand, integrity is important for accountability; in law enforcement, for example, it is important that the sender and recipient of messages are documented. On the other hand, confidentiality has a crucial impact on communication. If person A sends a message to person B, the payload of the message has to be kept confidential, meaning the data has to be protected from unauthorized access. However, in some cases not only the data of a message but also its author needs protection: besides the transferred data, the communication relations of users have to be considered. If person A uses an online service for a medical consultation, the identity of person A has to be protected. In other words, only the doctor of the medical service should know the identity of person A. Consequently, person A has to be able to communicate anonymously with respect to all other users of the network [1].

Anonymity and Pseudonymity on the Internet

Implementing anonymous communication in a network means that it is impossible or very difficult to find out the real author of a message. Pseudonymity, on the other hand, means that a cover name (pseudonym) is used instead of the real author's name. Pseudonymity is a common variant of anonymity. Whether the pseudonym is kept secret or it is openly known who is behind it depends on the author. One person can even choose multiple pseudonyms for different types of communication. Compared to complete anonymity, pseudonyms make it possible to recognize different messages from the same author. Many Internet users think that they are anonymous because they are one among thousands of users. Without any protective measures, however, anonymity on the Internet is an illusion. There are many potential adversaries who could abuse communication over the Internet: competitors in the business world, secret services of other countries, system administrators, or even a neighbor [1]. Encrypting the data helps only partially, since an attacker could still gather information about who is communicating with whom. Therefore, one important property of anonymity on the Internet is unobservability. Unobservability means that everybody can be the originator of an event with equal likelihood [2]. To be anonymous, the communication relations between sender and recipient have to be hidden. Figure 1.1 shows users and the messages created by these users. A user of this group is only anonymous if a specific message cannot be traced back to that specific user. Hence, the number of users is important for anonymity: if all messages are created by only one user, it is clear who sent them.

Figure 1.1: The relation between a user of the anonymity group and an event must be untraceable [2].

Although anonymity and pseudonymity existed before the Internet, large computer networks facilitate the sending of anonymous messages. However, such messages are mostly not truly anonymous, meaning that the protection for the anonymous user is low. A person might set up an e-mail account with a pseudonym and think that he or she can now send e-mails anonymously. However, most service providers log the IP address and sometimes even the hostname of their users, and Internet service providers (ISPs) log which customer was assigned which IP address at any given time.
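
The correlation of these two logs can be illustrated with a minimal Python sketch. It assumes two invented log formats: a service provider log of (IP address, access time, account) entries and an ISP log recording which subscriber held an IP address during which period. All addresses (taken from the 198.51.100.0/24 documentation range), times, account names, and subscriber names are hypothetical; real log formats differ, and access to the ISP's records requires the ISP's cooperation.

```python
from datetime import datetime

# Hypothetical service provider log: (ip, access time, pseudonymous account).
service_log = [
    ("198.51.100.7", datetime(2012, 3, 1, 14, 5), "anon_writer"),
    ("198.51.100.7", datetime(2012, 3, 1, 21, 30), "night_owl"),
]

# Hypothetical ISP log: (ip, assigned from, assigned until, subscriber).
isp_assignments = [
    ("198.51.100.7", datetime(2012, 3, 1, 8, 0), datetime(2012, 3, 1, 20, 0), "Subscriber A"),
    ("198.51.100.7", datetime(2012, 3, 1, 20, 0), datetime(2012, 3, 2, 8, 0), "Subscriber B"),
]


def correlate(service_log, isp_assignments):
    """Attribute each access to the subscriber who held the IP address at that time."""
    matches = []
    for ip, accessed_at, account in service_log:
        for assigned_ip, start, end, subscriber in isp_assignments:
            # Address AND time must match: dynamic IP addresses are reassigned,
            # so the same address can belong to different people over a day.
            if ip == assigned_ip and start <= accessed_at < end:
                matches.append((account, accessed_at, subscriber))
    return matches


for account, accessed_at, subscriber in correlate(service_log, isp_assignments):
    print(f"{account} at {accessed_at:%H:%M} -> {subscriber}")
```

Both accesses come from the same IP address but are attributed to different subscribers, which is why the timestamp in each log matters as much as the address itself.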
Combining this information, it is possible to find out which person or Internet connection was accessing, for example, an account at a specific time. This requires the cooperation of the ISP [10]. Finally, an e-commerce site may use price discrimination based on a user's country of origin; anonymity could protect users of a service against such discrimination.

1.2 Pros and Cons of Anonymity

Anonymity is a widely discussed topic, and opinions on the subject vary widely. As a consequence, there are many arguments for and against anonymity. Anonymity itself is neither bad nor good; it greatly depends on the purpose it is used for. The following section shows advantages and disadvantages of anonymity. People are often dependent on a system, institution, or organisation. An affected person may therefore be afraid of revenge and refrain from disclosing information about irregularities. Anonymous tips can be an important way to make such irregularities public, and such information can be used by newspapers or law enforcement [10]. On the other hand, an employee could abuse an anonymous message board to vent frustration with the boss. Countries ruled by repressive political regimes may persecute citizens who criticize the regime, and Internet-based anonymity servers may be crucial for them to communicate their political opinions. In this case, anonymity protects political speech, or even freedom of speech. But not all political opinions are legally permitted; in most democratic countries, racial agitation is forbidden. Some personal matters are embarrassing to discuss openly, but people may discuss, for example, sexual problems anonymously. Research in this field has shown that anonymous participants disclose significantly more information about themselves [5]. Furthermore, anonymity can be interesting for authors of new ideas or theories. An author may get more objective feedback on an idea by not revealing his or her identity, since factors like status, gender, appearance, or race will not influence the evaluation of the idea. Unfortunately, anonymity can be misused as well. Anonymity can be used to protect a person committing a crime. This may include the distribution of child pornography, illegal threats, racial agitation, fraud, or intentional damage such as the distribution of computer viruses [10]. What counts as a crime clearly depends on the law of the corresponding country and may even include high treason. Furthermore, anonymity makes it possible to seek contacts for performing illegal acts.
For example, a pedophile could use an anonymous chat to search for contacts with children. Finally, even if an act is not illegal, a person could still use anonymity to say offensive things about another person. Depending on the law, the border between offensive and illegal is not always sharp. A study [10] from September 1995 by Mikael Berglund shows how anonymity was used. The study was based on a scan of publicly available newsgroups on a Swedish Usenet news server. Table 1.1 shows a classification of the most commonly discussed topics on the server. At that time, sexual topics were discussed often. It can be argued that anonymity on the Internet should not be restricted only because a few abuse it. On the other hand, one could argue that the statistics show anonymity on the Internet is mainly used for sexual topics. This does not necessarily constitute misuse of anonymity, but it still points in that direction. However, the statistics are from 1995, and the composition of topics is certainly different nowadays. One good example is the Arab Spring, where the anonymity software Tor was a tool used by some bloggers, journalists, and online activists to protect their identity and to practice free speech [4]. Since governments persecuted Facebook activists, people had to protect their identity on the Internet. Anonymity therefore helped in overcoming regimes in Arab countries. This example shows the importance of anonymous network communication.

Table 1.1: Classification of the contents of the newsgroup messages [10].

Percentage  Topic
18.8        Sex
18.5        Partner search ad
 9.4        Testing anonymity
 8.7        Software
 5.8        Hobby, work
 4.7        Unclassified
 4.3        Computer hardware
 4.0        Religion
 3.6        Picture
 2.5        Races, racism
 2.5        Politics
 2.2        Internet etiquette (people complaining about other people's misuse of the net sometimes wrote anonymously)
 1.4        Personal criticism of identified person
 1.4        Internet reference
 1.4        Ads selling something
 1.4        Psychology
 1.1        War, violence
 1.1        Drugs (except pharmaceutical drugs)
 1.1        Ethics
 1.1        Contact ad which was not partner ad
 0.7        Poetry
 0.7        Celebrity gossip
 0.7        Pharmaceutical drugs
 0.4        Fiction
 0.4        Censorship

Anonymity and the Law

On 17 December 2010, Luc Barthassat submitted a motion to extend the period for which records must be preserved. Currently, Swiss law requires Internet service providers to preserve records of IP address assignments for six months [13]. Therefore, Swiss providers such as Cablecom currently store these records for exactly six months. Barthassat considers this insufficient. He argues that crimes involving child pornography are often committed over the Internet, and Swiss law enforcement teams depend on the logfiles of the ISPs to track down pedophiles. Since these logfiles are sometimes the only way to prosecute criminals, Barthassat wants ISPs to keep their records for at least one year. The Swiss parliament has yet to decide whether to accept the motion. Barthassat submitted another motion stating that the anonymity of online chat participants must be abolished and that the responsible federal agencies must be given the opportunity to monitor compliance with this provision [12]. This motion was not accepted, though. Barthassat argues that the Internet supports and even encourages the prostitution of minors, which makes anonymity in chatrooms a real problem. However, the Swiss Federal Council argued against the motion because it is unlikely that monitoring could be ensured across all chat service providers. These examples show that anonymous communication in networks is certainly an issue in Swiss politics as well. Anonymity is discussed controversially and can be seen from different points of view.

1.3 History of Anonymous Communication Systems

Network security focuses on preventing eavesdropping. This means that only the authorised communication partners have access to the payload of the messages transferred. However, a potential attacker could gather information even without reading the content of the message.
Who communicated with whom is information in itself, and traffic analysis can be used to retrieve such information. The following section presents some methods for establishing anonymous communication over networks. Table 1.2 gives a historical overview of research and development in the field of anonymous communication. The overview also shows which previous work current anonymity services are based on. This section presents David Chaum's mixes, which even today are the theoretical basis of many anonymity services such as onion routing.

Table 1.2: Overview of anonymous communication theories and applications [1].

Year  Theory or application
1978  Public-key encryption
1981  MIX, Pseudonyms
1983  Blind signature schemes
1985  Credentials
1988  DC network
1990  Privacy preserving value exchange
1991  ISDN-Mixes
1995  Blind message service
1995  Mixmaster
1996  MIXes in mobile communications
1996  Onion Routing
1997  Crowds Anonymizer
1998  Stop-and-Go (SG) Mixes
1999  Zeroknowledge Freedom Anonymizer
2000  AN.ON/JAP Anonymizer
2004  TOR

David Chaum MIX

The first steps towards anonymity on the Internet were taken only in theory. At the end of the seventies, researchers recognized that data encryption is not enough for anonymous communication. In 1977, Karger discussed the impact of traffic analysis for the first time [9]; traffic analysis is discussed further in the section on countermeasures. In 1981, David Chaum presented a solution to the problem. He applied his initial idea of mixes to e-mail communication, but the idea can easily be transferred to other areas of IT. A mix is essentially a router that prevents outgoing messages from being linked to incoming ones: it is software that stores and forwards e-mails with the goal that the receiver cannot trace the sender of a message. Padlipsky, Snow, and Karger (1978) recognized three important aspects of tracking: the size of the message, the timestamps of sending and receiving, and the address of the receiver [9]. The first problem can be solved by dividing the message into equally sized parts. If the last part of the divided message is too small, the algorithm simply fills up the rest of the packet. This is called padding.
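
A minimal Python sketch of this padding step follows. The block size of 64 bytes and the 4-byte length prefix (needed so that the receiver can strip the padding again) are illustrative assumptions, not details taken from Chaum's scheme; random rather than constant filler bytes are used so that the padding itself carries no pattern.

```python
import os

BLOCK_SIZE = 64  # hypothetical fixed packet size in bytes


def pad_and_split(message: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Prefix the message with its length, split it into equally sized blocks,
    and fill up the last block with random padding bytes."""
    data = len(message).to_bytes(4, "big") + message
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    last = blocks[-1]
    blocks[-1] = last + os.urandom(block_size - len(last))
    return blocks


def reassemble(blocks: list[bytes]) -> bytes:
    """Join the blocks and use the length prefix to strip the padding again."""
    data = b"".join(blocks)
    length = int.from_bytes(data[:4], "big")
    return data[4:4 + length]


msg = b"Meet me at the station at noon."
blocks = pad_and_split(msg)
print([len(b) for b in blocks])  # every block has the same size
assert reassemble(blocks) == msg
```

Because every block leaving the mix has the same length, an observer can no longer use message size to match an outgoing block to an incoming message.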
The second problem is addressed by store-and-forward batch processing: the mix stores a defined number of messages and forwards them all together, so that a single message cannot be traced through the mix. Finally, there is the issue of the receiver address. This information cannot be stripped from the message, because the mix would then not know where to send it. Chaum therefore suggests using several mixes in a cascade. This idea underlies onion routing, which is discussed in the following section.

Onion Routing

Onion routing is a network protocol for establishing anonymous communication. The protocol is based on a network of Chaum mixes, called routers or relays. An onion is a cryptographically layered data structure that defines the route through the onion routing network [6]. Figure 1.2 shows these encryption layers. Onion routing makes use of public-key encryption. The layers of the onion are created before the onion is sent: a random route through the network is selected, and the message is encrypted with the public key of each router along the selected route. While the onion travels along the route, each router decrypts (peels) one layer of the onion and retrieves the next hop. Hence, only the last router knows the real destination host. Each onion router is essentially a Chaum mix. Without access to the router, it is hard to trace a message that passes through a mix, and chaining these mixes further improves the security of the system. To track a message along its route by traffic analysis, an attacker needs access to all the routers through which the message was routed. However, the system can be compromised even with access to only some routers of the network, since some information can be extracted from only part of the packet stream. Therefore, it is crucial to have many trustworthy routers in the network.

Figure 1.2: An onion is a layered data structure. Each layer of the onion is encrypted with the public key of a router [6].

1.4 Current Services

Today, there are several services that help protect the user's identity. This section gives an overview of some well-known anonymity services.

Tor Project

The onion routing (Tor) project is a free anonymity service. Originally, Tor was designed, implemented, and deployed as a third-generation onion routing project of the U.S. Naval Research Laboratory. Its primary purpose initially was the protection of government communications. Today, Tor is used for a wide variety of purposes by normal people, the military, journalists, law enforcement officers, activists, and many others.
The Tor software protects the identity of the user by forwarding the communication through a distributed network of relays run by volunteers all around the world [14]. Based on the onion routing protocol, Tor protects its users against a common form of Internet surveillance known as traffic analysis.
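
The layered encryption on which Tor is based can be sketched in Python. This is a toy under stated assumptions: a SHA-256-based XOR keystream with one shared secret per router stands in for the public-key encryption described in the onion routing section, the router names and keys are hypothetical, and the construction is not secure. It only shows the structure: the sender wraps one layer per router, innermost first, and each router peels exactly one layer and learns only the next hop.

```python
import hashlib

HOP_FIELD = 16  # fixed-size header field holding the next hop's name


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || counter). NOT secure; stand-in only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same key restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(route: list[str], keys: dict[str, bytes], destination: str, payload: bytes) -> bytes:
    """Wrap the payload in one layer per router, starting with the last router."""
    next_hops = route[1:] + [destination]
    onion = payload
    for router, next_hop in reversed(list(zip(route, next_hops))):
        header = next_hop.encode().ljust(HOP_FIELD, b"\0")
        onion = xor_crypt(keys[router], header + onion)
    return onion


def peel(router: str, keys: dict[str, bytes], onion: bytes) -> tuple[str, bytes]:
    """Remove this router's layer; reveal only the next hop and the inner onion."""
    layer = xor_crypt(keys[router], onion)
    return layer[:HOP_FIELD].rstrip(b"\0").decode(), layer[HOP_FIELD:]


keys = {"R1": b"secret-1", "R2": b"secret-2", "R3": b"secret-3"}  # hypothetical
onion = build_onion(["R1", "R2", "R3"], keys, "example.org", b"GET /")
hop, data = "R1", onion
while hop in keys:  # relay until the last router has peeled its layer
    hop, data = peel(hop, keys, data)
    print("next hop:", hop)
print(data)  # the original payload, delivered to example.org
```

Unlike real onion routing, this sketch does not pad the layers, so the onion shrinks by one header at every hop, a size pattern that an observer could exploit (compare the padding discussion in the section on Chaum mixes).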

Location Hidden Services

Tor enables users to anonymously offer services such as publishing a website or setting up an instant messaging server. The Tor hidden service protocol makes use of so-called rendezvous points to let users connect to hidden services in the Tor network. The following example shows how hidden services work. Bob wants to set up a hidden service. A hidden service needs to advertise its existence in the Tor network before clients are able to contact it [15]. Bob's service randomly picks some routers from the Tor network, builds circuits to them, and asks them to act as introduction points by telling them its public key. Next, a hidden service descriptor containing the public key of the service and a summary of each introduction point is signed with the service's private key and uploaded to a distributed hash table. Clients can find the descriptor by requesting XYZ.onion, where XYZ is a 16-character name that can be uniquely derived from the service's public key [15]. This concludes the setup process of the service.

The following describes the client side of the protocol; further details of Tor hidden services can be found on the Tor project website [15]. If Alice, as a client, wants to use Bob's hidden service, she first has to obtain its onion address. She is then able to download the service descriptor from the distributed hash table. As shown before, the descriptor contains the set of introduction points and the corresponding public key to use. Alice creates a circuit to another randomly picked router and asks it to act as a rendezvous point by telling it a one-time secret. Alice then sends the information about the rendezvous point, in an introduce message, to one of the introduction points. All communication is done via the Tor network, which protects the identities of all participants.
Bob's hidden service then decrypts the client's introduce message and finds the address of the rendezvous point and the one-time secret in it. The service connects to the rendezvous point and sends the one-time secret to it in a rendezvous message. The rendezvous point then notifies the client that the connection has been established successfully. Finally, both client and hidden service can use their circuits to the rendezvous point to communicate with each other [15]. Figure 1.3 shows the final step of the process: Alice and Bob communicate through the rendezvous point.

Figure 1.3: Final step of the setup process for a Tor hidden service [15].
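
The message flow described above can be simulated in a single Python process. This is a hypothetical model, not Tor's implementation: there are no circuits, no encryption, and no descriptor signature; the `dht` dictionary stands in for the distributed hash table, and all class and method names are invented for illustration. The `onion_address` helper follows the older (version 2) address scheme, in which the 16-character name is the base32 encoding of the first 80 bits of the SHA-1 hash of the service's DER-encoded public key; current Tor versions use longer version 3 addresses.

```python
import base64
import hashlib
import secrets


def onion_address(public_key_der: bytes) -> str:
    """Version 2 style address: base32 of the first 80 bits of SHA-1(public key)."""
    digest = hashlib.sha1(public_key_der).digest()[:10]  # 10 bytes = 80 bits
    return base64.b32encode(digest).decode("ascii").lower() + ".onion"  # 16 chars


class IntroductionPoint:
    """Relay that forwards introduce messages to the hidden service."""
    def __init__(self, service):
        self.service = service

    def introduce(self, rendezvous_point, secret):
        self.service.on_introduce(rendezvous_point, secret)


class RendezvousPoint:
    """Relay chosen by the client; joins client and service on a matching secret."""
    def __init__(self):
        self.pending = {}

    def expect(self, secret, client):
        self.pending[secret] = client

    def rendezvous(self, secret, service):
        client = self.pending.pop(secret, None)
        if client is None:
            raise ValueError("unknown one-time secret")
        # Notify the client: from now on both sides talk through this relay.
        client.peer, service.peer = service.name, client.name


class HiddenService:
    def __init__(self, name, public_key_der, n_intro_points=3):
        self.name = name
        self.public_key_der = public_key_der
        self.address = onion_address(public_key_der)
        self.intro_points = [IntroductionPoint(self) for _ in range(n_intro_points)]
        self.peer = None

    def publish(self, dht):
        # Descriptor: public key plus introduction points (signature omitted).
        dht[self.address] = {"public_key": self.public_key_der,
                             "intro_points": self.intro_points}

    def on_introduce(self, rendezvous_point, secret):
        # The text describes this message as encrypted for the service;
        # encryption is omitted in this sketch.
        rendezvous_point.rendezvous(secret, self)


class Client:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def connect(self, address, dht, rendezvous_point):
        descriptor = dht[address]                  # download the descriptor
        secret = secrets.token_bytes(20)           # one-time secret
        rendezvous_point.expect(secret, self)      # set up the rendezvous point
        intro = secrets.choice(descriptor["intro_points"])
        intro.introduce(rendezvous_point, secret)  # introduce message
        return self.peer


dht = {}  # stand-in for the distributed hash table
bob = HiddenService("Bob", b"hypothetical DER-encoded public key of Bob")
bob.publish(dht)
alice = Client("Alice")
print(bob.address)
print(alice.connect(bob.address, dht, RendezvousPoint()), bob.peer)  # Bob Alice
```

Neither side ever learns the other's network location in the real protocol; in this simplified model that property is reflected only in the fact that client and service interact exclusively through the introduction and rendezvous relays.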

JAP Project AN.ON

In 2000, Prof. Dr. Andreas Pfitzmann started the project AN.ON - Anonymity.Online as a collaboration between the Technical University of Dresden and the Independent Centre for Privacy Protection Schleswig-Holstein. Later, the University of Regensburg joined the project. The Java Anon Proxy (JAP), which is now operated by JonDonym, is one of the most actively used anonymity programs [9]. Figure 1.4 shows the architecture of the JAP system. JAP works like a local proxy for a browser. When a user tries to access a website, JAP receives the request from the browser and forwards it to the anonymity service. The anonymity service is essentially a mix cascade in which the order of the mixes is predefined. JAP receives the web server's response from the anonymity service and forwards it to the browser [7]. The AN.ON mixes are mostly operated by independent institutions [11]. These institutions declare a commitment that they neither save log files on the connected links nor exchange such data with other mix providers. The identity and number of organizations in each mix cascade can be viewed in detail on the JAP project site and can be verified using cryptographic methods. Based on this information, JAP enables its users to select trustworthy mix cascades themselves.

Figure 1.4: Architecture of the JAP project [7].

Freenet

Freenet is free software that lets its users anonymously share files and browse and publish websites. These so-called freesites are accessible only through Freenet. Furthermore, Freenet enables users to chat on forums without fear of censorship. Freenet is decentralised in order to make it less vulnerable to attacks [3].

1.5 Countermeasures

Most service providers, such as Hotmail, log every transaction with the corresponding IP address. Combined with the ISP logs, an IP address can be traced back to identify the person owning the Internet connection from which the message originated.
If law enforcement agencies want to trace back an IP address, this always requires the cooperation of the ISP. For the ISP, the decision whether to disclose or withhold information can be tricky. Therefore, some ISPs settle this question directly in their terms and conditions, where they state that they will always assist in searches for anonymous users. But there are many more well-known methods to compromise anonymity. For example, hidden elements can be included in the source code of a website, which transmit sensitive data without the knowledge of the user [10]. E-mails are mostly not anonymous either. The header of an e-mail contains a trace of the route the message took. This information can of course be manipulated, but by default it is included in the e-mail. Figure 1.5 shows the header of an e-mail including the trace of the route (parts lost in extraction are marked [...]).

    [...]sentto[...]jpalme=dsv.su.s[...]returns.groups.yahoo.com
    Received: from n12.groups.yahoo.com (n12.groups.yahoo.com [...]) by unni.dsv.su.se ([...]) with SMTP id CAA21903 for <[...]su.se>; Wed, 12 Dec [...] (MET)
    X-egroups-Return: sentto[...]jpalme=dsv.su.s[...]returns.groups.yahoo.com
    Received: from [...] by n12.groups.yahoo.com with NNFMP; 12 Dec [...]
    Received: (qmail invoked from network); 12 Dec [...]
    Received: from unknown ([...]) by m8.grp.snv.yahoo.com with QMQP; 12 Dec [...]
    Received: from unknown (HELO n26.groups.yahoo.com) ([...]) by mta1.grp.snv.yahoo.com with SMTP; 12 Dec [...]
    X-egroups-Return: lizar[...]mrlizard.com
    Received: from [...] by n26.groups.yahoo.com with NNFMP; 12 Dec [...]
    X-egroups-Approved-By: simparl[...]com> via web; 12 Dec [...]
    X-Sender: lizar[...]mrlizard.com
    X-Apparently-To: web[...]com
    Received: (EGP: mail [...]); 11 Dec [...]
    Received: (qmail invoked from network); 11 Dec [...]
    Received: from unknown ([...]) by m12.grp.snv.yahoo.com with QMQP; 11 Dec [...]
    Received: from unknown (HELO micexchange.loanperformance.com) ([...]) by mta2.grp.snv.yahoo.com with SMTP; 11 Dec [...]
    Received: from mrlizard.com (IAN2 [...]) by micexchange.loanperformance.com with SMTP (Microsoft Exchange Internet Mail Service Version [...]) id W11PL97B; Tue, 11 Dec [...]

Figure 1.5: E-mail header showing the trace of the route. The route is read from the bottom up [2].

Traffic Analysis

Traffic analysis focuses on information besides the payload of messages. Network security often focuses on encryption of the message.
However, by analyzing communication patterns, an attacker may be able to draw conclusions about the content of the messages transferred [6]. Sophisticated statistical techniques exist to track these communication patterns. Not only who communicates with whom is relevant, but also what the pattern of the packets looks like. In 1978, Padlipsky, Snow, and Karger drew attention to the importance of the packet size, timestamp, and destination address of an encrypted packet [9]. On the network level, a video stream looks different from, for example, the packet stream generated by browsing a website. Hence, even if the packets are encrypted, an attacker may still guess the type of service used.

1.6 Summary and Conclusion

This report shows the importance of anonymity in the modern world. Activities that were once naturally anonymous are no longer anonymous today, owing to the establishment of IT and especially the Internet. People tend to think that they are able to act anonymously in computer networks because they are one among thousands of users. Others rely on encryption and follow the illusion that encrypted communication will fully protect their identity. As discussed in this report, this is not true: while encryption can help protect the transferred data, in most cases it will not protect the identity of the sender. Services such as Tor, Freenet, or Project AN.ON fill the need for anonymous communication on the Internet. Anonymity by itself is neither good nor bad; what matters is the purpose it is used for. Finally, anonymous communication networks are also a current topic in Swiss politics.

Bibliography

[1] Federrath Hannes: Schutz der Privatsphäre im Internet, Universität Regensburg, last visited
[2] Federrath Hannes: Anonymity in the Internet, Universität Regensburg, zisc.ethz.ch/events/ /isc2006slides/federrathzisctalk.pdf, last visited
[3] Freenet website, https://freenetproject.org/whatis.html, last visited
[4] Ingmar Zahorsky: Tor, Anonymity, and the Arab Spring, An Interview with Jacob Appelbaum, last visited
[5] Joinson: Self-disclosure in computer-mediated communication: The role of self-awareness and visual anonymity, European Journal of Social Psychology, 31, (2001), Joinson2002_0.pdf, last visited
[6] Kaviya K.: Network Security Implementation by Onion Routing, IEEE 2009
[7] Koepsell Stefan: AnonDienst - Design und Implementierung, inf.tu-dresden.de/develop/dokument.pdf, last visited
[8] Koepsell Stefan, Pfitzmann Andreas: Wie viel Anonymitaet vertraegt unsere Gesellschaft, TU Dresden pdf, last visited
[9] Kubieziel Jens: Anonymitaet im Internet, Magdeburger Journal zur Sicherheitsforschung, 2011, ISSN: article/view/116/114, last visited
[10] Palme Jacob, Berglund Mikael: Anonymity in the Internet, se/~jpalme/society/anonymity.html, last visited
[11] Projekt: AN.ON - Anonymität.Online Website, last visited
[12] Schweizer Parlament: Schluss mit der Anonymitaet in Internet-Diskussionsforen, last visited
[13] Schweizer Parlament: Verlaengerung der Aufbewahrungspflicht für Protokolle ueber die Zuteilung von IP-Adressen, geschaefte.aspx?gesch_id=, last visited

[14] TOR Project website, https://www.torproject.org/docs/faq.html.en#Torisdifferent, last visited
[15] TOR Hidden Service Protocol, https://www.torproject.org/docs/hidden-services.html.en, last visited
[16] Wikipedia: Onion Routing, last visited


Chapter 2

Protocols for a Faster Web

Cyrill Pedol

Web pages are getting more and more complex and use many more resources than in the early days of the Web. These increased requirements place higher demands on the involved components and negatively affect the latency of Web pages. This report discusses novel protocol approaches that intend to make the Web faster and finally argues which of these protocols are realistic approaches.

Contents

2.1 Introduction
2.2 Motivation
2.3 Variables to Be Improved
2.4 Transport-Layer Protocols
  SCTP
  SST
  Discussion
2.5 Application-Layer Protocols
  HTTP-NG
  SPDY
  Discussion
2.6 Combining Protocols
  HTTP over SCTP
2.7 Related Work
2.8 Conclusion

2.1 Introduction

The World Wide Web, originally created by Tim Berners-Lee at CERN, is now more than 20 years old. The original idea was to create a network-based system that allows scientists all over the world to share information [18]. The Web, however, did not stick to this purpose and has grown to massive proportions beyond it. The way the Web is used has changed a lot in recent years, and Web pages are becoming more and more complex in order to meet contemporary requirements. Not surprisingly, the average size of a Web page today is almost 70 times what it was in 1995, and the average number of requests performed by a browser has increased by a factor of nearly 40 [20, 19]. Such an evolution from simple text-based pages to multi-functional, media-oriented platforms demands much more from the underlying technology than it used to and eventually affects the latency of Web pages directly. Latency is a very important variable for Web page providers, particularly today. Consumer reaction studies show that a third of online shoppers will not wait longer than 4 seconds for a page to be rendered and would rather leave the page [21]. While continuous improvements to hardware, software algorithms and the physical network itself have helped to counter higher page loading times, they are slowly reaching the limits of their efficiency. Sometimes these improvements do not even apply, especially in environments that typically have narrow bandwidth and limited CPU power, such as mobile devices. The protocols, however, which define how communication has to be processed, have not changed for a long time. The last update to the Web protocol stack, which essentially consists of the Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol (TCP), was made more than 10 years ago: the move to the HTTP/1.1 standard, published in 1999 [3].
These two protocols are often said to be obsolete, because they face new requirements they were not designed for and therefore have clear weaknesses [11]. For this reason, research efforts have led to new protocol approaches and initiatives with the goal of making the Web faster. This paper discusses novel approaches to Web-related protocols, organized by their position in the protocol stack. After pointing out some variables that need to be improved, the essentials of each protocol are introduced, followed by a comparison of the related protocols. First, the next section explains in more detail why the Web could strongly benefit from better protocols.

2.2 Motivation

Alongside the great improvements in hardware and software, increases in bandwidth have served very well to reduce page loading times in recent years. So why not simply continue on that path and let bandwidth do all the work while leaving the protocols as they are? This would be suitable for large content downloads, but since Web-style traffic consists of many short-lived connections, it does not work. The main factors that influence page latency are the bandwidth and the round trip time (RTT). The latter is defined as the time a data entity needs to travel from one end to the other and back. As it turns out, RTT matters much more than bandwidth. Figure 2.1 shows the page loading time as a function of either bandwidth or RTT. It illustrates that the benefit of additional bandwidth grows only logarithmically, with strongly diminishing returns, whereas page loading time decreases linearly as the RTT is reduced. This is why increasing the bandwidth only helps up to a certain level: upgrading from 5 Mbps to 10 Mbps would reduce page loading times by only about 5% [5]. Hence, we have reached a point where more bandwidth alone is not going to help much further.
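The mechanism behind this can be made concrete with a deliberately simplified model of fetching one object over a fresh TCP connection: one RTT for the handshake, then one RTT per flight of data, where slow start doubles the congestion window after each flight but a flight can never exceed the bandwidth-delay product. The segment size (1460 bytes), the initial window of 10 segments, the 100 KB object and the helper name `fetch_time` are assumptions for illustration; loss and server processing are ignored, and the model is far simpler than the measurements behind Figure 2.1.

```python
MSS = 1460  # bytes per TCP segment (assumed typical Ethernet value)


def fetch_time(size_bytes, rtt_s, bandwidth_bps, init_cwnd=10):
    """Approximate time to fetch one object over a new TCP connection.

    Simplified model (assumption): 1 RTT for the 3-way handshake, then one
    RTT per flight of data. Slow start doubles the congestion window after
    each flight, but a flight never exceeds the bandwidth-delay product.
    """
    bdp_segments = max(1, int(bandwidth_bps * rtt_s / 8 / MSS))
    cwnd = init_cwnd
    remaining = size_bytes
    rounds = 1  # the handshake costs one round trip
    while remaining > 0:
        remaining -= min(cwnd, bdp_segments) * MSS
        cwnd *= 2  # slow start: window doubles every round trip
        rounds += 1
    return rounds * rtt_s


for bw_mbps, rtt_ms in [(5, 100), (10, 100), (5, 50)]:
    t = fetch_time(100_000, rtt_ms / 1000, bw_mbps * 1e6)
    print(f"{bw_mbps:>2} Mbps, {rtt_ms:>3} ms RTT -> {t:.2f} s")
```

Under these assumptions the 100 KB fetch takes four round trips (0.40 s) at both 5 and 10 Mbps, because slow start never fills even the smaller pipe, whereas halving the RTT to 50 ms brings it down to 0.25 s.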

Figure 2.1: Latency per Bandwidth/RTT [5]

Furthermore, the protocols currently in use do not help either. Neither TCP nor HTTP was designed for low-latency transactions [1]. HTTP was even designed without a full understanding of the underlying protocol [11]. But what exactly are these protocols suffering from? People often find both TCP and the User Datagram Protocol (UDP) too limiting: either they do not provide exactly the required functionality or they provide too much. This is due to their one-sided behaviour: TCP only offers ordered and reliable delivery, whereas UDP only offers the opposite. In many cases, a combination of these properties would be needed [7]. Moreover, TCP does not support multiplexed or multistreamed connections, which is a serious drawback, particularly for Web transactions. Web-style traffic, as mentioned before, typically consists of many small requests that should preferably be processed in parallel. However, TCP by its very nature requires that each request is either carried in a separate connection or serialized over a single connection. The former incurs a 3-way handshake and an unnecessary warm-up phase, known as slow start, for every connection. The latter introduces the problem of head-of-line (HOL) blocking. Either way, TCP causes a loss of performance here. TCP also has some conceptual problems regarding connection setup and teardown. On the one hand, the 3-way handshake requires the passive side of the connection, typically the server, to store connection information [7]. This makes the server vulnerable to SYN attacks. On the other hand, closing a connection may leave it in a half-closed state [8]. But TCP is not the only protocol with shortcomings. HTTP has always struggled with the connection problem described above.
HTTP/1.0 only allowed a single request per TCP connection. This was, of course, very inefficient in terms of transport and at the same time unfair to other applications. HTTP therefore started to move away from many connections towards a few. HTTP/1.1 introduced persistent connections, allowing multiple requests within the same TCP connection. It went even a step further and introduced pipelining. However, this feature never really caught on, because many servers did not support it correctly and because it is limited to idempotent requests [4]. As a consequence, HTTP currently serializes requests over a few connections, which again causes HOL blocking [13]. Another drawback of HTTP is its one-way communication style. A server is meant to be a purely passive instance that can only respond to requests. In many cases, unnecessary round trips could be avoided if the server were able to send resources on its own initiative rather than waiting for a client request, because the server often knows in advance which resources the client is going to ask for. Such a feature is typically called server push. Besides these issues, the HTTP protocol itself also creates data overhead, because HTTP headers contain considerable redundancy. For instance, the user-agent header is quite long and is included in every single client request, even though the user-agent is very unlikely to change during an HTTP session. Furthermore, whereas the HTTP payload can optionally be compressed, the header cannot be compressed at all. This is a real performance bottleneck, especially in combination with cookies, which are part of the HTTP header [4]. In addition, the common misconception that HTTP is intended to be a human-readable protocol has led to headers being sent as plain text, even though a binary format could save overhead [11].

This section has shown why more bandwidth is not always the answer and that a lot of potential for improvement lies in novel protocols. It has pointed out some of the most serious shortcomings of TCP and HTTP. Although TCP lacks some really important functionality for Web transactions, HTTP has never introduced features of its own that eliminate these problems once and for all. Instead, HTTP sticks to the TCP model and bends it to avoid too much performance loss [3]. Before covering the new approaches, the next section identifies the most important variables that affect latency and thus provides a foundation for better protocols to build on.

2.3 Variables to Be Improved

The last section stated the main disadvantages of the protocols currently in use. From these, the relevant variables are extracted in the following: which variables does a new protocol have to deal with and should eventually improve, particularly with regard to latency? Because a browser typically has to perform many short-lived and parallel requests, a primary goal should be to reduce the cost of opening a new connection as far as possible, so that each request can be processed immediately. However, the setup of a real connection can hardly be improved, since some kind of handshake and a slow start for congestion avoidance are typically required.
Hence, a possible solution could be to have lightweight connections within connections. This is often called multiplexing or multistreaming. In this paper, multiplexing denotes the interleaving of requests within a single shared channel, whereas multistreaming denotes the mapping of requests to multiple channels with partial ordering, i.e. ordering is only enforced within each channel. Multiplexing and multistreaming improve another variable at the same time. TCP and HTTP/1.1 both suffer from HOL blocking, which typically occurs when multiple data entities are serialized over a single connection. In TCP, HOL blocking is caused by packet loss: if a packet is lost on its way to the receiver, the receiver may not deliver subsequent packets to the application until the lost packet has arrived after retransmission. Packets received after the lost one are therefore blocked in the meantime. In HTTP, a very large transfer may be in progress and consequently block smaller requests queued behind it. In both cases, serialization is the root of the problem. While multistreaming can solve both kinds of blocking, multiplexed connections can only overcome the latter, because multiplexed requests still flow through a single shared channel without any possibility of partial ordering [7, 4]. A third variable is the number of round trips needed for requests. As stated earlier, RTT is probably the most influential variable in a warmed-up connection. It is therefore highly desirable to reduce the total number of packets sent. Again, multiplexing and multistreaming provide possible solutions, since they allow for more densely packed packets and thus fewer packets. Compression techniques represent a second way to reduce the number of round trips.
Applying compression to packets and removing redundancy eventually leads to fewer packets, which means that the bandwidth is used more efficiently [1]. Another general approach is to stabilize a connection and thereby avoid excessive packet loss. Packet loss not only causes TCP HOL blocking but also requires retransmissions of lost packets, which in turn leads to more round trips per request. A TCP connection is bound to exactly two Internet Protocol (IP) addresses, one belonging to the client and one to the server. If this network path is lossy or simply not very stable, the connection suffers a lot of packet loss. To overcome this problem, a connection should be able to use several network paths in parallel, which is referred to as multihoming.

The number of round trips could also be reduced by eliminating unnecessary communication steps. If a server were able to push resources to a client before the client asks for them, this would save many transmissions [1]. This ability to anticipate requests and send the corresponding resources directly alongside a response is known as server push.

Finally, a special variable is the user perceived latency. It is certainly influenced by the variables introduced above, but it can also be improved without reducing the actual system latency. In other words, mechanisms that improve this variable might not shorten the real page loading time as the others do, but only make the page appear faster. A possible approach is the introduction of priorities. On Web pages with many resources, some parts are typically hidden and have to be reached by scrolling down. Using priorities, a client could, for instance, prioritize resources that are visible at first sight, thus changing the order in which the page is rendered [10].

Table 2.1: Variables to be improved

Variable                | Possible solutions
------------------------+------------------------------
connection setup costs  | multiplexing, multistreaming
blockers                | multiplexing, multistreaming
number of round trips   | multiplexing, multistreaming
                        | multihoming, server push
                        | compression techniques
user perceived latency  | priority handling
In addition, new protocols still have to meet some constraints that do not directly influence page loading times. A protocol must still provide a certain level of security as well as fairness to other network participants, i.e. it must not be too aggressive in claiming resources [10]. Additionally, a protocol should be open to extensions and easy to use. Ease of use means that the protocol should be implementable and usable from a programmer's point of view. Another very important issue is deployment: a protocol that is practically impossible to deploy is of little use. Table 2.1 summarizes the affected variables and how they could potentially be influenced by newer protocols. This table is not meant to list all possible variables and influences, but rather to give an overview of the most important ones. The task of a novel protocol is thus to come up with an approach that positively affects these variables in order to make the Web faster. The next section introduces such approaches, beginning with the transport layer.

2.4 Transport-Layer Protocols

Since TCP is a transport-layer protocol, one possibility is to get to the root of the problem and solve the issues directly on this layer. Some initiatives have chosen this path and tried to design a new protocol that would ultimately replace TCP. Two approaches turned out to be the most promising, particularly regarding efficiency and latency. They are known as the Stream Control Transmission Protocol (SCTP) and Structured Stream Transport (SST). Below, these approaches are introduced by pointing out their features, followed by a discussion of how they affect the variables defined in the previous section.

SCTP

In October 2000, the Internet Engineering Task Force (IETF) first standardized SCTP in an RFC, which was later updated by a newer RFC. SCTP is thus an official standard that aims at providing an alternative for communication between peers in IP networks, replacing TCP (and UDP). SCTP, which operates atop IP, was originally intended for telephone network signalling and has since become a general-purpose transport protocol [8].

Approach

First of all, SCTP introduces multistreaming. Data is no longer sent over a single connection stream but is assigned to various streams, which are then sent over an SCTP connection. The number of inbound and outbound streams that can be used has to be negotiated at connection startup. These streams are treated independently with respect to ordering, so SCTP provides partial ordering [6]. SCTP replaces the 3-way handshake of TCP with a 4-way handshake. The 4-way handshake helps with the security constraint, since it overcomes the problem of SYN attacks: instead of storing the connection information itself, the server hands it to the initiator in a cookie. If the initiator is still interested in opening the connection, it has to reply with the same cookie. This creates equal conditions for client and server when establishing a connection. The extra step in the handshake might suggest a longer setup time, but this is not the case, because the third packet can already piggy-back data [6]. Moreover, SCTP supports multihoming.
Two endpoints communicating through SCTP can connect their various interfaces, which may lead to multiple network paths making up the connection. For this reason, an SCTP connection is referred to as an association. One path or interface is designated as the primary one. While the endpoints communicate through their primary interfaces, a heartbeat mechanism concurrently checks the other interfaces for reachability. In case of a failure, SCTP can fall back to a secondary interface without losing the connection, provided that the secondary interface is reachable. Since each path may have very different network conditions, congestion control is applied to each path individually. Hence, all streams flowing through the same path share the same congestion information. This is in contrast to flow control, which is shared among all paths [6]. Unlike TCP, SCTP does not send data as a byte stream but encapsulated in messages. Consequently, the receiver can read data with message boundaries preserved. Whereas TCP-based applications need to implement their own framing, SCTP takes care of this and relieves the programmer of that burden [6]. SCTP is basically meant to be an ordered and reliable protocol, as TCP is, but it is possible to indicate that no ordered delivery is needed for a given stream. By setting a header flag, data is delivered in an unordered manner. In contrast to UDP, the communication remains reliable [6]. This feature might be useful for applications that do not need packets to arrive in order, but it is generally not suited for Web-style traffic.
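The effect of SCTP's per-stream (partial) ordering on head-of-line blocking can be illustrated with a small receiver model. Each arriving chunk carries a stream identifier and a per-stream sequence number, and a chunk is handed to the application as soon as all earlier chunks of the same stream have arrived; putting everything on a single stream reproduces TCP-like behaviour. The `(stream_id, seq, payload)` tuples and the helper `deliver` are a simplified assumption for illustration, not SCTP's actual chunk format.

```python
from collections import defaultdict


def deliver(arrivals):
    """Simulate a receiver that keeps data in order only within each stream.

    `arrivals` lists (stream_id, seq, payload) in network arrival order.
    Returns (step, payload) pairs, where `step` is the index of the arrival
    that allowed the payload to be released to the application.
    """
    expected = defaultdict(int)    # next sequence number per stream
    buffered = defaultdict(dict)   # stream_id -> {seq: payload}
    released = []
    for step, (stream, seq, payload) in enumerate(arrivals):
        buffered[stream][seq] = payload
        # Release as many consecutive chunks of this stream as possible.
        while expected[stream] in buffered[stream]:
            released.append((step, buffered[stream].pop(expected[stream])))
            expected[stream] += 1
    return released


# Resource A has two chunks; its first chunk is lost and only arrives
# (retransmitted) at the end. B and C are small single-chunk resources.
multi = [(0, 1, "A1"), (1, 0, "B0"), (2, 0, "C0"), (0, 0, "A0")]
single = [(0, 1, "A1"), (0, 2, "B0"), (0, 3, "C0"), (0, 0, "A0")]
print("one stream per resource:", deliver(multi))
# -> [(1, 'B0'), (2, 'C0'), (3, 'A0'), (3, 'A1')]
print("single ordered stream:  ", deliver(single))
# -> [(3, 'A0'), (3, 'A1'), (3, 'B0'), (3, 'C0')]
```

With one stream per resource, B and C reach the application as soon as they arrive; on the single ordered stream they wait for the retransmitted A0, which is the HOL blocking described for TCP.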

In detail, SCTP provides even more functionality, but the features mentioned above are probably the most significant ones.

SST

Another approach to overcome the drawbacks of TCP is SST. SST was published in 2007 and is part of the UIA project at the Parallel and Distributed Operating Systems Group in the Computer Science and Artificial Intelligence Laboratory at MIT. SST either operates directly atop IP or runs on top of UDP as the underlying transport protocol. The idea of SST is discussed in the following.

Approach

As its name suggests, SST pursues an approach of structured streams, which introduces a completely new abstraction. Streams are no longer organized flatly next to each other, but in a hierarchy with parent-child relations. An SST session is transparently assigned a root stream, which then serves as the parent for other streams. Streams can be created on the fly at any time during a connection. Hence, SST allows for multistreaming using dynamically managed hierarchies of streams, which are carried over a single SST session. Streams are designed to be very lightweight and require neither handshakes nor slow starts. In addition, every stream is independent and has its own flow control. This concept allows the stream structure to reflect the logical structure of related requests [10]. SST also pays attention to message boundaries. It provides a simple facility for inserting record marks into a stream, so that the receiver can recover the individual messages [10]. Furthermore, SST is able to prioritize streams, which could be used, for instance, to favour streams needed for the visible part of a browser window. A browser could decide to first load all the objects of a Web site that have to be rendered in an immediately visible place, such as specific parts of a page or even complete tabs [10].
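Record marking, as SST offers it, addresses the framing problem that TCP-based applications otherwise solve themselves: on a plain byte stream, the network may split or merge messages arbitrarily. A minimal sketch of one common scheme, a 4-byte big-endian length prefix per record, shows how a receiver recovers the original messages. The helper `frame` and the class `RecordReader` are hypothetical, and this layout is an assumption for illustration, not SST's wire format.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


class RecordReader:
    """Reassemble length-prefixed records from arbitrarily split byte chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every record completed so far."""
        self._buf.extend(chunk)
        records = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("!I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # record not complete yet; wait for more bytes
            records.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return records


# Two messages sent back to back, then cut into 5-byte pieces by the "network".
wire = frame(b"GET /a") + frame(b"GET /bc")
reader = RecordReader()
out = []
for i in range(0, len(wire), 5):
    out.extend(reader.feed(wire[i:i + 5]))
print(out)
# -> [b'GET /a', b'GET /bc']
```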
An important feature of SST is its flexibility regarding the underlying network protocol. Since SST is a transport protocol, it can of course run directly atop IP. However, SST normally runs over UDP. This facilitates deployment, because it can more easily traverse intermediary nodes such as routers and Network Address Translation (NAT) devices. In addition, it can be used by applications without special privileges, because SST over UDP is designed as a user-space implementation [10]. While SST supports a TCP-like reliable and ordered style, it also provides an unreliable communication style. The latter is represented by so-called ephemeral substreams. Ephemeral substreams are basically ordinary substreams that permit low-overhead datagram delivery based on a best-effort algorithm. A stream marked as ephemeral is sent in a truly datagram-oriented fashion, as with UDP. However, depending on the best-effort algorithm, SST may also decide not to do so: if a datagram cannot be sent without excessive packet loss, SST simply falls back to reliable delivery [10]. SST streams are attached to one or more SST channels. A channel can be thought of as a controller that manages tasks applying to all streams attached to it. These tasks include built-in security mechanisms, sequencing and congestion control [10]. SST thus provides security itself instead of delegating it to SSL/TLS. A rather exotic but still helpful feature is SST's support for NAT hole punching. This allows for setting up a peer-to-peer SST connection, similar to how Skype works.

Table 2.2: Comparison of transport layer protocols

Variable                | SCTP             | SST
------------------------+------------------+---------------------------------
connection setup costs  | +multistreaming  | +multistreaming
blockers                | +multistreaming  | +multistreaming
number of round trips   | +multistreaming  | +multistreaming
                        | +multihoming     | *indirect support of multihoming
                        | -header overhead | +small header
user perceived latency  | -no priorities   | +priorities

The following subsection discusses SCTP and SST in terms of how their features relate to the defined variables.

Discussion

Table 2.2 shows that both SCTP and SST provide features that clearly improve some of the variables affecting latency. Both introduce multistreaming, leading to lightweight streams that avoid opening an expensive TCP connection for each request and at the same time solve the problem of serialized requests. Having multiple streams eliminates TCP-style HOL blocking as well as HTTP request blocking, provided that each transaction is mapped to its own stream. Whereas SCTP comes with excellent support for multihoming, which reduces the number of round trips in lossy network environments, SST does not support it directly. However, since SST streams can be attached to multiple SST channels, this concept could be extended to support multihoming [9]. At the moment, though, SST does not specify a mechanism for checking the heartbeat of a channel, so multihomed connections as in SCTP would probably have to be implemented in the application logic. Regarding the per-packet header overhead, SST clearly outclasses SCTP. Wire efficiency evaluations in [9] have demonstrated that SST in stream mode does not need more overhead than TCP itself. SCTP, in contrast, has a per-packet overhead about 72% larger than that of SST. SCTP thus suffers somewhat from unnecessary overhead, which eventually leads to more round trips. Moreover, SCTP does not allow for stream prioritization.
Consequently, SCTP cannot improve user perceived latency through selective rendering, in contrast to SST. For a protocol to be of practical use, it must be deployable. With respect to this constraint, SCTP is at a disadvantage, because it is meant to run directly on top of IP. Even though some implementations exist, rolling out the protocol is really hard, because many intermediary nodes that only understand TCP or UDP would need an upgrade. The ability of SST to run atop UDP solves this problem, although at a certain performance penalty [10]. Due to its telephony background, SCTP is not as flexible as SST in managing streams. One potential problem of SCTP is that it negotiates the number of inbound and outbound streams during connection setup [10]. If, for some reason, an application needs more streams than SCTP has negotiated, the application will probably have to fall back to serialization or pipelining within a stream (cf. HTTP over SCTP). If that happens, some of the benefits of SCTP are likely to be lost. SST differs here, since it can create streams dynamically within an existing session.

Additionally, SCTP uses a single receiver window for all streams together. This does not allow a receiver to apply back pressure on one stream while accepting data on another [10].

2.5 Application-Layer Protocols

As mentioned earlier in this paper, both TCP and HTTP have their disadvantages. Changing the transport layer might help to overcome some problems, but it is no silver bullet: the shortcomings of HTTP still remain. This section focuses on approaches that try to eliminate those problems at the application layer.

HTTP-NG

In mid-1997, the World Wide Web Consortium (W3C) started an activity aimed at a next-generation protocol for HTTP, called HTTP Next Generation (HTTP-NG). The overall goals of this approach comprise simplicity, extensibility, scalability, network efficiency and transport flexibility. This activity thus addressed far more issues than just performance and proposed a completely new architecture for HTTP. HTTP-NG follows an object-oriented approach, similar to technologies like Java Remote Method Invocation (RMI), the Common Object Request Broker Architecture (CORBA) or the Distributed Component Object Model (DCOM). To achieve these goals, an architecture consisting of three layers was chosen. At the very bottom, a message transport layer forms the interface to an underlying transport protocol, such as TCP. Atop the message transport, HTTP-NG places a remote invocation layer, representing a generic request/response interface. At the top, an application layer provides services and methods that make use of the layers below [13]. Each layer thus represents a particular communication abstraction, and the lower two layers were aimed at improving performance [15].

Approach

The HTTP-NG message transport layer is designed to include multiple so-called filters.
It thus employs a filter stack through which data flows and in which it may be transformed, comparable to the concept of Unix pipes. The filter stack can include any type of filter, for example filters responsible for securing a connection (SSL) or for compressing the payload. A special filter is WebMUX. It was specified as the MUX/SMUX protocol, which was later renamed WebMUX to avoid a name conflict with the SNMP Multiplex Protocol. In principle, it could run on top of any transport protocol, but it typically uses TCP. WebMUX allows for multiplexing requests over a single connection. In contrast to SST and SCTP, multiplexing is therefore not done in the transport layer but one layer above. Moreover, WebMUX introduces record marking, an efficient way to indicate the length of a message, so that message boundaries can easily be detected. Additionally, it assigns separate flow control to each multiplexed stream, but uses the congestion control mechanisms of the underlying protocol, typically TCP. Due to the proximity of HTTP-NG to systems like CORBA, RMI or DCOM, the transport layer introduced a feature that allows for callback functions via endpoint identification. Hence, connections can be established in either direction, from client to server or vice versa [15]. The remote invocation layer focuses on request-response messaging, similar to HTTP. However, it improves performance by applying caching mechanisms and transforming messages into a binary format. This reduces the overall number of bytes actually transferred. Furthermore, the remote invocation layer uses additional control messages for communication, which again reduces traffic [14].

Table 2.3: Comparison of application layer protocols

Variable                | HTTP-NG          | SPDY
------------------------+------------------+-----------------------------
connection setup costs  | +multiplexing    | +multiplexing
blockers                | +multiplexing    | +multiplexing
                        |                  | +priorities
number of round trips   | +multiplexing    | +multiplexing
                        | *callbacks       | +Server Push / Server Hint
                        | +binary format   | +binary format
                        | +compression     | +compression
user perceived latency  | -no priorities   | +priorities

SPDY

A rather new approach is SPDY. This protocol was invented by Google and is currently in its third draft. It is not standardized yet, but Google has recently submitted this third draft to the IETF. SPDY is specifically designed for minimal latency [2]. Its basic approach is described in the following.

Approach

SPDY is meant to be tied to SSL: it sits between the HTTP layer and the SSL layer and thus supports secured connections [1]. It provides multiplexing of streams over a single TCP connection. Just like the WebMUX protocol of HTTP-NG, SPDY solves the multiplexing problem by itself rather than proposing a new transport-layer protocol [1]. In addition, SPDY can assign priorities to streams, so that more important requests are processed first. While this can be used to improve user perceived latency, it also reduces the blocking of important resources in congested networks [1]. SPDY also specifies two rather advanced features. The first one is a mechanism called server push, which allows for transfers initiated by the server. If a server knows in advance that a client is likely to ask for some resource, it can push this resource alongside another response. This is comparable to the server hint feature.
The intention of server hint is not to push a resource directly, but rather to send back hints about resources the client might need. Unlike server push, this still requires the client to request the resources as before, but it can reduce the time a client needs to discover which other resources are associated with the page [1]. Moreover, SPDY aims to reduce the redundancy of HTTP as far as possible. It introduces HTTP header compression and even removes unnecessary headers. SPDY itself is a binary-format protocol, which again saves overhead. However, SPDY is not meant to replace HTTP but to enhance it: the request/response protocol of the application layer remains the same, and SPDY merely encapsulates HTTP, overriding some parts of it [1].
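The shared idea behind WebMUX and SPDY multiplexing can be sketched in Python: several request streams share one connection as length-prefixed frames (record marking), and a scheduler sends frames of more important streams first. This is a minimal sketch under stated assumptions: the 6-byte frame header, the chunk size and the convention that a lower number means higher priority are chosen for illustration only, and the format is not the actual WebMUX or SPDY wire format.

```python
import heapq
import struct
from collections import defaultdict

# Hypothetical frame header (record marking): 4-byte stream id, 2-byte payload length.
# Illustrative only; this is not the real WebMUX or SPDY frame layout.
HEADER = struct.Struct("!IH")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its stream id and length so the receiver can find its boundary."""
    return HEADER.pack(stream_id, len(payload)) + payload


def multiplex(streams: dict, chunk_size: int = 4) -> tuple:
    """Interleave several streams over one byte sequence.

    streams maps stream_id -> (priority, data); a lower priority value is served first
    (assumed convention). Streams with equal priority take turns chunk by chunk.
    Returns the wire bytes and the order in which stream ids were sent.
    """
    queue = []
    seq = 0  # tie-breaker: keeps round-robin order among equal priorities
    for stream_id, (priority, data) in streams.items():
        heapq.heappush(queue, (priority, seq, stream_id, data))
        seq += 1
    wire = bytearray()
    order = []
    while queue:
        priority, _, stream_id, data = heapq.heappop(queue)
        chunk, rest = data[:chunk_size], data[chunk_size:]
        wire += encode_frame(stream_id, chunk)
        order.append(stream_id)
        if rest:  # re-queue the remainder behind other streams of the same priority
            heapq.heappush(queue, (priority, seq, stream_id, rest))
            seq += 1
    return bytes(wire), order


def demultiplex(wire: bytes) -> dict:
    """Split the byte sequence back into per-stream payloads using the length prefixes."""
    streams = defaultdict(bytes)
    offset = 0
    while offset < len(wire):
        stream_id, length = HEADER.unpack_from(wire, offset)
        offset += HEADER.size
        streams[stream_id] += wire[offset:offset + length]
        offset += length
    return dict(streams)


# Usage: the HTML document (priority 0) is sent before a large image (priority 3).
wire, order = multiplex({1: (3, b"image-bytes"), 3: (0, b"<html>")})
print(order)              # [3, 3, 1, 1, 1]
print(demultiplex(wire))  # {3: b'<html>', 1: b'image-bytes'}
```

All frames in this sketch still travel as one ordered byte sequence, so it does nothing about TCP-level head-of-line blocking; it only removes blocking between HTTP requests.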

32 32 Protocols for a Faster Web

Discussion

SPDY and HTTP-NG try to solve the problems of the current protocols above the transport layer. Both introduce a multiplexing mechanism that allows requests to be interleaved within a single TCP connection. As discussed before, multiplexing helps to overcome connection setup penalties, and it solves the problem of blocking HTTP requests, because requests can be sent in parallel. Moreover, multiplexing allows for more efficient use of bandwidth, which reduces the number of round trips [1]. In contrast to HTTP-NG, SPDY also supports assigning priorities. Although the extensible and flexible nature of HTTP-NG might make such a feature easy to add [15], HTTP-NG does not currently support it. SPDY can benefit from priorities on a congested network, since the channel is then not occupied with non-critical resources [1]. The server push feature of SPDY is often misinterpreted. The name server push could suggest an alternative to technologies like AJAX long-polling or WebSocket pushes, but this is not what SPDY server push is. Even though the specification might be extended to support this kind of push, SPDY server push currently only means pushing data into the browser's cache [3]. A comparable feature in HTTP-NG is the callback functionality, which is, however, a broader concept than in SPDY and reflects the object-oriented focus of HTTP-NG. Using these features alongside compression and the binary format, many redundant requests could be avoided, eventually leading to fewer round trips. Both protocols also comply with the primary constraints. Whereas HTTP-NG adds optional support for secured connections by inserting an SSL filter in its message transport stack, SPDY makes security an essential part of the protocol.
In other words, Google expects secured connections to be an inevitable requirement in the long term and thus attaches more importance to them [1]. Moreover, the multiplexing feature allows for fair resource usage, and both protocols are relatively easy to deploy because they operate at the application layer. HTTP-NG is slightly more flexible thanks to its three-layer architecture. Overall, SPDY as well as HTTP-NG contribute a set of features that help to improve the variables defined at the beginning. These improvements are summarized in Table 2.3. However, both still rely on TCP. Some drawbacks of TCP are eliminated, but others remain. For instance, TCP head-of-line (HOL) blocking does not disappear just because of multiplexing: from TCP's point of view, the multiplexed messages still form a single byte stream, so a single lost segment delays the delivery of all streams until it has been retransmitted.

2.6 Combining Protocols

Previously, approaches to both transport layer and application layer protocols have been discussed. An interesting question, however, is how HTTP behaves when it is combined with a different underlying transport protocol. The next section specifically explores HTTP over SCTP, which has been described in its own Internet-Draft [12].

HTTP over SCTP

HTTP over SCTP proposes a new design of HTTP in combination with SCTP by pointing out services of SCTP that match the needs of HTTP better than TCP does. The premise is not to make any changes to the HTTP standard, but rather to show how a browser or server implementation should map HTTP onto SCTP [12]. First, the draft states that every HTTP transaction should employ its own stream: rather than being pipelined on one stream, transactions should be spread over multiple independent streams in order to avoid inter-transaction HOL blocking. Since the number of streams has to be negotiated at handshake time, the draft specifies that an equal number of inbound and outbound streams be negotiated. A request and its response should always be processed on the same stream (inbound and outbound). However, if for some reason the number of transactions exceeds the number of streams available, the draft proposes falling back to intra-stream pipelining, using a round-robin scheduling algorithm [12]. Multihoming improves the general fault tolerance, from which clients and servers automatically benefit, whereas the 4-way handshake protects against SYN flooding. HTTP over SCTP also recommends that HTTP requests be split into multiple smaller messages in order to stay within MTU size limits, since larger messages would otherwise be fragmented by SCTP itself [12]. Despite some interesting notions, HTTP over SCTP still has some open issues. For instance, it is still unclear how SCTP's message boundaries could be put to good use in this concept. Furthermore, the draft does not definitively answer the question of how clients decide between TCP and SCTP [12]. As pointed out in the previous sections, SCTP is able to improve Web transactions. However, HTTP over SCTP does not eliminate the drawbacks of HTTP itself. Consequently, HTTP over SCTP is only a partial solution.

2.7 Related Work

The protocols discussed above are not the only approaches that target a faster Web. The Blocks Extensible Exchange Protocol (BEEP), for instance, which typically runs atop TCP, shares many of the features mentioned above. However, it does not focus enough on page loading times and thus lacks some functionality that is important for improving latency [1]. Another interesting approach is called ASAP. It is a transport layer protocol that is specifically intended to provide low latency.
ASAP tries to cut the latency of a transaction down as far as possible, to a value close to a single RTT. It allows transport packets to be piggybacked within DNS requests, thus using the bandwidth more efficiently. Furthermore, it eliminates the 3-way handshake and replaces it with a certificate-based verification, thus avoiding the need for a handshake on every connection setup [23]. Even though these features are quite novel compared to the protocols discussed before, ASAP does not support features like multistreaming or priority assignment. Thus, ASAP does not include enough features to make it a powerful general-purpose transport protocol; in other words, it does not go far enough.

2.8 Conclusion

This paper discussed various potential approaches for protocols that aim to enhance the performance of Web transactions. This raises the question of which solution might be the most appropriate one for the near future. The big picture shows that all protocols actually provide a real improvement compared to HTTP and TCP. Test results also substantiate this statement. Even though most tests were performed using immature prototype implementations of the novel protocols, a significant performance gain could be observed. In the case of SPDY, which has been tested against the top 25 Web sites, the page loading time could be reduced by between 27% and 63% [1], which is a substantial difference. HTTP-NG has been observed to be around 17% faster than HTTP/1.1 in a W3C Microscape benchmark [14]. Additionally, both SCTP and SST have been tested against TCP. SST performed about 2.1% faster than a specially designed user-space TCP implementation [9]. Compared to native TCP, SCTP achieved a performance boost of about 20% on average when the tests were simulated under 1-2% packet loss. Surprisingly, SCTP performed about 13% slower under 0% packet loss [7]. However, with a real-world packet loss rate, which is typically said to be around 1-2% [1], the general improvement of the protocols becomes visible. Nevertheless, it is important to mention that all these reported values resulted from different environment setups and testing conditions. Thus, the results should be examined in isolation, and they do not allow any conclusions to be drawn about how the novel protocols compare with each other.

However, a faster protocol is of no use if it can hardly be deployed. As mentioned before, deploying a transport layer protocol is typically a tough piece of work. Hence, it is highly questionable whether a protocol like SCTP has a realistic chance of being widely adopted in the real world. SCTP would require an all-or-nothing deployment strategy, whereas SST might be introduced gradually, first using the UDP-based version to let applications get accustomed to it before switching to SST atop IP. Application layer protocols do not have this problem; however, the impression arises that these protocols try to solve too much, taking over tasks that are the responsibility of the lower layers. This leads to the drawback that, for instance, the large set of SPDY features does not benefit other application layer protocols, since they still use plain TCP. Besides this, it is also noticeable that SPDY is actually the only protocol that still shows a certain level of activity. Google recently submitted its draft to the IETF for standardization, whereas the other protocols have not shown any progress for years.
This leads to a two-sided answer to the question posed at the beginning of this section. An ideal but rather unrealistic approach would be to replace TCP with SST, thus providing a more powerful transport that can be used by all protocols above it. This might make some features of the discussed protocols superfluous, such as the multiplexing of SPDY. Combining this with a slightly slimmed-down version of an HTTP alternative would lead to a well-matched set of protocols, each with its own area of responsibility. A more practical, but less ideal, solution would leave the transport as it is and let the upper layers do all the work. Given that, SPDY is probably the most promising candidate with a realistic chance of establishing itself in the near future.

Bibliography

[1] SPDY Protocol Whitepaper: SPDY: An experimental protocol for a faster web; URL, Available at: last visited: March 19,
[2] Mike Belshe, Roberto Peon: SPDY Protocol; URL, Available at: https://tools.ietf.org/html/draft-mbelshe-httpbis-spdy-00, last visited: March 19,
[3] Chris Strom: The SPDY Book; E-Book, Available at: com, last visited: March 19, 2012
[4] Roberto Peon, William Chan: Google Tech Talk SPDY Essentials; December Screencast, Available at: PLE0E03DF19D90B5F4&index=2&feature=plpp_video, last visited: March 19,
[5] Mike Belshe: More Bandwidth Doesn't Matter (much); URL, August 2010, Available at: Y2hyb21pdW0ub3JnfGRldnxneDoxMzcyOWI1N2I4YzI3NzE2, last visited: March 19,
[6] RFC 4960: Stream Control Transmission Protocol; URL, Available at: ietf.org/rfc/rfc4960.txt, last visited: March 19,
[7] Rajesh Rajamani, Sumit Kumar, Nikhil Gupta: SCTP versus TCP: Comparing the Performance of Transport Protocols for Web Traffic; Computer Sciences Department, University of Wisconsin-Madison, July 2002, Available at: http://pages.cs.wisc.edu/~sumit/extlinks/sctp.pdf, last visited: March 19,
[8] Randall Stewart, Michael Tüxen, Peter Lei: SCTP: What is it, and how to use it?; BSDCan 2008, The Technical BSD Conference. URL, Available at: last visited: March 19,
[9] Bryan Ford: Structured Streams: a New Transport Abstraction; Massachusetts Institute of Technology. URL, Available at: http://www.brynosaurus.com/pub/net/sst.pdf, last visited: March 19,
[10] Bryan Ford: Structured Stream Transport: Preliminary Protocol Specification; Massachusetts Institute of Technology, November URL, Available at: http://pdos.csail.mit.edu/uia/sst/spec.pdf, last visited: March 19,
[11] W3C Web site, Working Draft: SMUX Protocol Specification; URL, Available at: http://www.w3.org/tr/wdd-mux, last visited: March 19,
[12] IETF Draft: Using SCTP as a Transport Layer Protocol for HTTP; URL, Available at: http://tools.ietf.org/html/draft-natarajan-http-over-sctp-00, last visited: March 19, 2012.
[13] David Gourley et al.: HTTP: The Definitive Guide; O'Reilly,
[14] William C. Janssen: A next Generation Architecture For HTTP; Xerox Palo Alto Research Center, February URL, Available at: http://www.cs.gmu.edu/~setia/inft803/http-ng.pdf, last visited: March 19,
[15] Henrik Frystyk Nielsen et al.: HTTP-NG Overview; URL, Available at: w3.org/protocols/http-ng/1998/11/draft-frystyk-httpng-overview-00, last visited: March 19,
[16] Mike Spreitzer, Bill Janssen: HTTP Next Generation; Xerox Palo Alto Research Center. URL, Available at: http://www.parc.com/content/attachments/janssen-www9-http-next-generation.pdf, last visited: March 19,
[17] Jonathan T. Leighton: A Comparison of SPDY-over-SCTP and SPDY-over-TCP; Protocol Engineering Lab, University of Delaware. URL, Available at: eecis.udel.edu/~leighton/spdy.html, last visited: March 19,
[18] CERN Web site: Where the web was born; URL, Available at: http://public.web.cern.ch/public/en/about/web-en.html, last visited: March 19,
[19] httparchive.org: Trends; URL, Available at: last visited: March 19,
[20] Josep Domenech et al.: A user-focused evaluation of web prefetching algorithms; Department of Computer Engineering, Universitat Politecnica de Valencia, Valencia, Spain,
[21] JupiterResearch: Retail Web Site Performance; URL, Available at: last visited: March 19,
[22] RFC 3080: Blocks Extensible Exchange Protocol; URL, Available at: http://www.rfc-editor.org/rfc/rfc3080.txt, last visited: March 19,
[23] Wenxuan Zhou et al.: ASAP: A Low-Latency Transport Layer; University of Illinois at Urbana-Champaign, 2011.

Chapter 3

Content Distribution Networks

Sebastian Golaszewski

In spite of constant improvements in bandwidth and increasing server capacities, the rapid growth of content volumes causes longer latencies for end users and higher workloads in the networks. Content Distribution Networks (CDN) replicate requested content to several geographically distributed replica servers and redirect requesting clients to nearby caches, resulting in lower download times and a better quality of experience for the user. This seminar report offers an overview of the common mechanisms and techniques of a CDN and also takes a look at some economic aspects, like the relationship between a CDN provider and ISPs, as well as at the current market situation in content delivery.

Contents
3.1 Introduction
3.2 Overview
Mechanisms of a CDN
Architecture of a CDN
3.3 Technical Aspects
Replica Server Placement
Content Management
Request Routing
Performance Measurement
3.4 Economic Aspects
ISPs and CDN Providers
Current Market Situation
Akamai
Summary

3.1 Introduction

The Internet is becoming more and more popular worldwide. Bandwidth-intensive content in particular is growing rapidly, resulting in longer download times and heavy network workloads. The Internet operates without central coordination, which is certainly one reason for its wide success, but there is no guarantee of service quality or load balancing. Even constant improvements in bandwidth or increasing server capacities cannot eliminate the access delay problems. An effective approach is the content distribution network (CDN) [12]. Other sources refer to content delivery networks, which has the same meaning [17]. A CDN moves Web content, like texts, images or videos, to geographically widely distributed replica servers. Because requesting clients are served from nearby copies, they achieve lower load times, and the networks carry less traffic and are better balanced. There are several challenges, such as selecting the content to host at a given replica server, keeping the content consistent, providing a request routing system, and placing the CDN servers [2]. This report gives an overview of the idea of distributing content over the Web and of the corresponding approaches. Section 3.2 describes some general mechanisms and a general architecture of a CDN. Section 3.3 takes a look at some challenges in the design of a CDN and at approaches to them, covering replica server placement, content management, request routing and performance measurement. The last section, 3.4, describes the relationships between CDNs and Internet Service Providers (ISP). It also contains a brief presentation of the current market situation for content delivery and of the commercial CDN provider Akamai, the current market leader.

3.2 Overview

A CDN distributes content from origin servers to replica servers that are close to the end clients. The notion of "close" includes geographical aspects, but also topological or latency considerations.
A CDN usually serves multiple content providers. A content provider, like a company acting worldwide, points the CDN to the Web content that has to be spread worldwide. That is why a CDN replicates and delivers a very selective set of content to the replica servers. A nearby replica server answers a request only for cached content [6].

Mechanisms of a CDN

A CDN requires a certain set of supporting mechanisms to guarantee efficiency for a significant number of users at widely spread locations [11][2]:

A replica placement mechanism is needed to select the replica server sites and to fill them adaptively with content. The servers do not cache like traditional proxy caches: content is updated proactively, once, rather than fetched from the origin server on every access. A CDN thus does not have the pull behaviour of traditional caching.

A content update mechanism is needed to check the origin host site automatically and regularly for changes and to retrieve content for delivery to the replica servers, so that the CDN can guarantee content consistency.

An active measurement mechanism must be available to monitor the whole CDN and the network it uses, in order to recognize the fastest route between the requesting user and the replica server in any traffic situation, such as flash crowds (sudden heavy demand for a single site, expected or not). This monitoring is the basis for every replica selection mechanism.

A replica selection mechanism is the key feature of a CDN, allowing end users to retrieve the required content quickly from the closest and most available edge server. The service must keep its servers from getting overloaded by means of load balancing.

A re-routing mechanism must be provided to quickly re-route content requests in case of congestion and traffic bursts detected by the measurement system.

Architecture of a CDN

Figure 3.1: System Architecture Components of a CDN [12]

The described architecture is based on the pathbreaking work of Gang Peng in his paper CDN: Content Distribution Network [12]. A general architecture of a CDN system can be described by seven components: client, origin server, replica servers, request routing system, distribution system, accounting system and billing organization. The relationships between the components are shown in Figure 3.1 and are represented by the numbered lines, which are described as follows:

1. The request routing system receives from the origin server the URI name space of the Web objects that should be distributed and delivered by the CDN.
2. The distribution system receives from the origin server the content that the CDN publishes by distributing and delivering it.
3. The distribution system distributes the content from the origin server to the replica servers. Additionally, the system gives feedback to the request routing system, which assists the replica server selection process for client requests.
4. The client requests Web objects from what it considers to be the origin. Through URI name space delegation, the request is directed to the request routing system.
5. The request routing system receives the request from the client and routes it to a suitable replica server in the CDN.
6. The client receives the requested content delivered by the selected replica server.

7. Accounting information for delivered content is sent from the replica servers to the accounting system. The accounting system aggregates this information into statistics and detail records. These are used by the origin server and the billing organization, and also as feedback to the request routing system.
8. The content detail records are used by the billing organization to settle accounts with all parties involved in the content distribution and delivery process.

3.3 Technical Aspects

The design of a CDN involves several challenges. The first subsection describes approaches to place a certain number of replica servers in a suitable manner. The second discusses the challenges of selecting the right content, outsourcing it and keeping it updated. The third describes some approaches to redirect a client's request to an appropriate replica server. The last subsection covers some performance measurement techniques and metrics used by CDNs.

Replica Server Placement

One important challenge in building a CDN is distributing the replica servers and placing them at smart locations. The replica server placement problem deals with determining the best network locations for the CDN replica servers. Intuitively, replica servers should be placed close to the clients, with the goal of reducing latency and bandwidth consumption. Several approaches have been proposed [12][8]. The placement problem can be seen as a center placement problem: the goal is to place a given number of centers in such a manner that the maximum distance between a node and its nearest center is minimized. One theoretical approach, based on graph theory, uses k-hierarchically well-separated trees (k-HST). The algorithm builds a tree by recursively subdividing the graph into partitions. The probability of cutting short links by partitioning decreases exponentially as one climbs the tree. That is why nodes close together are more likely to be partitioned lower down the tree.
Taking advantage of this characteristic, a greedy strategy can be used to find the number of centers needed in the resulting tree when the maximum distance between a node and its center is bounded by a given value. Another theoretical approach is the minimum K-center problem. Given a graph $G = (V, E)$ with all its edges arranged in non-decreasing order of edge cost, $c(e_1) \le c(e_2) \le \dots \le c(e_m)$, let $G_i = (V, \{e_1, \dots, e_i\})$, and let $G_i^2$ denote the square graph of $G_i$, which connects two nodes whenever they are at most two hops apart in $G_i$. Then:

1. Construct the square graphs $G_1^2, G_2^2, \dots, G_m^2$.
2. Compute a maximal independent set $M_i$ for each $G_i^2$. An independent set of $G_i^2$ is a set of nodes that are at least three hops apart in $G_i$; it is maximal if it is an independent set $V'$ such that all nodes in $V - V'$ are at most one hop away in $G_i^2$ (i.e. at most two hops away in $G_i$) from nodes in $V'$.
3. Find the smallest $i$ such that $|M_i| \le K$, and call it $j$.
4. $M_j$ is the set of $K$ centers.

The theoretical approaches described above have a high computational complexity. Another disadvantage is that they do not consider the characteristics of the network and the workload. Hence, they are less suitable for real CDNs [14].

The hot spot approach puts the replica servers near the clients that generate the greatest load. The potential sites are sorted by the amount of traffic generated by their surrounding nodes, and the replica servers are placed at the top sites that generate the most traffic [11]. Such heuristic approaches are suboptimal, but they take existing information from the CDN into account, such as network topology and workload.

The greedy algorithm, proposed by P. Krishnan et al. [7] for the cache location problem and adapted by Qiu et al. [13] for the replica placement problem in CDNs, is one such approach. The algorithm chooses M servers among N potential locations. In the first phase, the cost associated with each location is computed (e.g. the bandwidth consumption), assuming that the accesses from all clients converge to the location under consideration; the location with the lowest cost is chosen. In the second phase, the algorithm searches for a second location that yields the lowest cost in conjunction with the location already selected. The algorithm repeats this iteration until M server locations have been chosen. It assumes that clients direct their requests to the nearest server, i.e. to the server that can be reached with the lowest cost. The disadvantage is that knowledge about the clients' locations and the pair-wise inter-node costs is required.

The topology-informed placement strategy places the replica servers on candidate nodes in descending order of outdegree, where the outdegree of a node is the number of other nodes it is connected to. This transit node heuristic assumes that nodes with the highest outdegree can reach more nodes with smaller latency. Due to the lack of detailed network topology, the strategy uses Autonomous System (AS) topologies, where each node represents a single AS. Radoslavov et al. [14] improved the strategy by using router-level instead of AS-level topology, where each LAN associated with a router can be a potential location for a replica server. Experiments showed that the transit node heuristic performs as well as the greedy algorithm, and that using router-level topologies yields better results than using AS-level topology information [12].

To determine the optimal number of replica servers, there are two approaches. In the single-ISP approach, a CDN provider places at least 40 replica servers around the network edge. The idea is to put one or two replica servers in each major city within the ISP's coverage. An Internet service provider with a global network can thereby achieve widespread coverage without relying on other ISPs. Performance estimates showed that this approach is useful for low-to-medium traffic.
The disadvantage is that clients of the CDN provider may be distant from the placed replica servers. In the multi-ISP approach, replica servers are placed at as many global ISP Points of Presence (POPs) as possible. This approach reduces the probability that a requesting client has no replica server nearby, and it works better for locations with high traffic volumes. Besides the cost and complexity of setup and maintenance, the main disadvantage is that some replica servers may receive fewer requests than in the single-ISP approach, which results in poorly used capacities and poor CDN performance [11].

Content Management

To achieve efficient content delivery, the right selection of content is crucial, resulting in a reduction of client download time and origin server load. The simplest approach is full-site content selection and delivery. The origin content provider outsources its entire set of origin content to the distribution system. The content provider configures the DNS in such a way that all client requests for its Web site are directed to a CDN server, which resolves the requests and delivers the full Web site. This approach is impractical given the trend towards ever larger Web objects. Another problem is the huge and expensive storage space required. A further disadvantage is that a content provider almost never has purely static content, which makes updating huge collections of Web objects unmanageable [11]. A better solution is the partial-site content selection and delivery approach, where partial replication is performed. The content provider selects single Web objects, like embedded page objects or videos, which are placed in the CDN. These objects have host names in a domain for which the CDN provider is responsible. A requesting client receives, for example, the base HTML page from the origin, while embedded objects are linked to the CDN provider. This approach generally shows better performance than the full-site approach.
Besides the desired reduction of load on the origin server and infrastructure, irregular changes of embedded content are easier to manage [11]. Once the content providers have selected the Web objects to be delivered by the CDN, the next issue is choosing an efficient outsourcing practice. The cooperative push-based approach is based on pre-fetching the content to the replica servers. A content provider pushes the content to the replica servers. Additionally, the replica servers cooperate with each other to reduce replication and update costs. If a replica server does not have a copy of the requested content, the client is directed to the origin server. This approach is still only theoretical, and no CDN provider supports this scheme [8]. The non-cooperative pull-based approach directs client requests to the closest replica server using URL rewriting or DNS redirection. If some content is missing, the replica server pulls and caches the requested content. This approach is used by the most popular CDN providers. Its issue is that an optimal replica server is not always selected [11]. The cooperative pull-based approach also pulls content in case of a cache miss, but not directly from the origin servers: the replica servers cooperate with each other to get the requested content. Client requests are directed through DNS redirection to the closest replica server. The CDN maintains a distributed index, like a distributed hash table (DHT), so that a replica server can find nearby copies of the requested content and store them in its cache [11]. This reactive approach is implemented, for example, by the academic CDN Coral [3]. The optimal placement of outsourced content is still a research topic, and several strategies have been developed. Kangasharju et al. [5] propose four heuristics for the replication of outsourced content: random, popularity, greedy-single and greedy-global. Another set of greedy approaches is presented in [16], where the placement considers the balancing of load and the size of the replica servers. Further, an algorithm by Pallis et al. [10] called lat-cdn uses object latencies for replication decisions. The same authors also proposed an improved algorithm called il2p [9], which makes placement decisions with respect to the latency and load of Web objects.
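The difference between the two pull-based outsourcing variants can be sketched in Python. The classes below are hypothetical and heavily simplified: the distributed index is a plain dictionary standing in for a real DHT, caches never expire, and the notion of a "nearby" peer is not modelled. The sketch only shows where content comes from on a cache miss: directly from the origin in the non-cooperative case, or from a peer replica in the cooperative case.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A replica server with a simple in-memory cache (url -> content)."""
    name: str
    cache: dict = field(default_factory=dict)


class Origin:
    """Origin server; counts how often replicas have to fetch from it."""

    def __init__(self, objects: dict):
        self.objects = objects
        self.fetches = 0

    def fetch(self, url: str) -> bytes:
        self.fetches += 1
        return self.objects[url]


class PullCDN:
    """Pull-based outsourcing: replicas fill their caches only on a cache miss."""

    def __init__(self, replicas, origin: Origin, cooperative: bool):
        self.replicas = {r.name: r for r in replicas}
        self.origin = origin
        self.cooperative = cooperative
        # url -> names of replicas holding a copy; a plain dict standing in for a DHT index
        self.index = {}

    def get(self, replica_name: str, url: str):
        """Serve url at the given replica; returns (content, where it came from)."""
        replica = self.replicas[replica_name]
        if url in replica.cache:
            return replica.cache[url], "hit"
        content, source = None, "origin"
        if self.cooperative:
            # Cooperative variant: ask a peer that already holds a copy.
            # A real system would pick the nearest peer; here the first name in sorted order.
            peers = sorted(self.index.get(url, set()))
            if peers:
                content = self.replicas[peers[0]].cache[url]
                source = f"peer:{peers[0]}"
        if content is None:
            # Non-cooperative variant (or no peer copy yet): pull from the origin server.
            content = self.origin.fetch(url)
        replica.cache[url] = content
        self.index.setdefault(url, set()).add(replica_name)
        return content, source


# Usage: two clients near different replicas request the same object.
origin = Origin({"/video.mp4": b"..."})
cdn = PullCDN([Replica("zurich"), Replica("geneva")], origin, cooperative=True)
print(cdn.get("zurich", "/video.mp4")[1])  # origin
print(cdn.get("geneva", "/video.mp4")[1])  # peer:zurich
print(origin.fetches)                      # 1
```

With `cooperative=False`, the second request would also be pulled from the origin, which is exactly the extra origin load the cooperative variant tries to avoid.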
Another issue in cache management is maintaining cache consistency, especially for Web objects that change dynamically. Cached content typically has an associated expiration time, called time to live (TTL). Once the TTL has expired and a client requests the object, the replica server has to validate the cached object with a remote server (the origin or another cache). It can happen that most of the cached objects turn out to be still current. The messages needed for validation are small, but they can nevertheless cause latency. Thus, the functionality of a CDN depends not only on content availability but also on its freshness [2]. One approach to keep caches consistent is pre-populating, i.e. pushing content to the caches before requests arrive. New or updated Web objects are automatically pushed to the caches, which guarantees content consistency, so no validity checks are necessary. The drawback is that this technique causes a large amount of traffic. Another approach is invalidation: when an origin object has changed, an invalidation message is sent to all replicas, and each replica has to retrieve an updated version individually at a later time. This approach does not make full use of the distribution network for content delivery, and inefficiencies in managing consistency can occur. A better solution is the hybrid approach, which generates less traffic than either propagation or invalidation alone. The content provider chooses between propagation and invalidation for each Web object, based on statistics about the update frequency at the origin servers and the requests collected by the replicas [11].

Request Routing

A key challenge in a CDN is building and maintaining an efficient request routing system. A request routing system is responsible for rerouting a client's request for distributed content to a suitable replica server.
This process, called server selection, uses criteria such as network topology (the proximity between the client and the selected replica server), replica server availability, and server load. To select a suitable replica for a requesting client, the request routing system usually has to handle two issues. First, it has to determine the distance between the client and a replica server. To measure this distance, tools like ping and traceroute are used to
determine hop counts and round-trip times. The drawback of these two metrics is that they do not account for the highly variable network load. As a result, they are neither sufficient nor accurate enough to indicate the proximity between replica servers and clients. Second, the request routing system has to determine the availability and load of a replica server. Two widely used techniques are server push and client probe. In the first approach, the replica servers send their load information to some agents. In the second approach, the agents periodically probe the load status of the replica servers. More frequent probing gives more accurate load measurements but also causes more probing traffic [12].

Several request routing mechanisms have been used to redirect clients to a suitable server among a set of replica servers. They can be classified into five categories according to how they process requests [12][11][2]:

Client multiplexing: The client itself, or a proxy server close to it, receives the addresses of a set of candidate replica servers and chooses one of them to send the final request to. The decision is based on latencies measured by probing the replica servers. This scheme generally generates additional overhead, because a whole set of candidate replica servers has to be sent. Another drawback is that the client, lacking information, may choose a heavily loaded server. This can overload servers and increase access latency.

HTTP redirection: HTTP allows a server to respond to a client request with a special reply message in the HTTP header (a 3xx redirect status code with a Location header). This reply tells the client to resubmit its request to another server. In this way, a server chooses a suitable replica server for the requesting client and redirects the client to it. The mechanism provides both full-site and partial-site content selection and delivery. Its main advantages are simplicity and flexibility.
On the other hand, it is probably the least efficient approach, due to its lack of transparency. Besides the overhead generated by the extra messages, the origin servers are the only point responsible for redirecting requests, which can result in bottlenecks.

DNS-based request routing: This scheme relies on a modified Domain Name System (DNS). When a client queries it for a server's domain name, it returns the IP address of one of a set of replica servers. A domain name has multiple IP addresses associated with it, so the client's DNS resolver can choose a suitable replica server among them. DNS-based request routing provides full- and partial-site selection and delivery and generally shows the best performance and effectiveness. Another advantage is its transparency to the clients. The approach is widely used because of its simplicity, and because it is built into the DNS, any Internet application can use it. Its main disadvantage is the increased DNS lookup time. To address this, CDN providers typically split the CDN DNS into two levels (low-level and high-level DNS).

Anycasting: Request routing in a CDN can be seen as the task of locating nearby copies of replicated Internet servers. The anycasting technique was originally developed for server location but can also be used for request routing in CDNs. In this approach, an anycast address is assigned to a set of hosts. This address can be an IP anycast address or a content URL (domain name). These hosts provide the same service, and each router keeps a path in its routing table to the host that is usually closest to it. A client sends packets with an anycast address in the destination address field and expects a response from one of the servers. Anycast-aware routers then direct the packets to one of the replica servers identified by the anycast address. This anycast routing approach
can be integrated into the existing Internet routing infrastructure, which makes the request routing service available to all CDNs. Another big benefit of this scheme is that it scales well with the growth of the Internet.

Peer-to-Peer Routing: Peered CDNs are built from symmetrical connections between hosts and form an ad-hoc network. Peer-to-peer content networks provide and deliver content on each other's behalf. A CDN interacts with other partnered CDNs and their nearby forward proxies, which gives it a wide reach. A content provider typically has a contract with only one CDN, which contacts other peer CDNs on its behalf. Peered CDNs are more fault-tolerant, and request routing can be implemented by the members of the peer-to-peer network themselves. The disadvantages are that the network changes constantly and that nodes never have complete global information about it. Routing requests efficiently through peer-to-peer networks in a distributed manner, without causing high overhead, is still a major research concern.

Performance Measurement
Customers of a CDN, typically content providers, need feedback to predict, observe and ensure the end-to-end performance of the CDN. A combination of software- and hardware-based probes distributed over a CDN usually measures five key metrics [11]:
Cache hit ratio: the number of documents served from the cache divided by the total number of requested documents. The higher the hit ratio, the more efficient the CDN's cache management.
Reserved bandwidth: the bandwidth used by the origin server.
Latency: the user-perceived response time.
Replica server utilization: the fraction of time during which the server is busy. This metric can be used to derive the CPU load, the number of requests served and the storage I/O usage.
Reliability: determined by packet-loss measurements. Low packet loss indicates high reliability and availability to the clients.
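Most of these metrics are simple ratios or averages over measurement data. The Python sketch below computes them from a hypothetical per-request log. It approximates reserved bandwidth as the average bandwidth drawn from the origin server over an interval. Several parts are assumptions for illustration: the `RequestRecord` type, its fields, and the choice to express reliability as the share of packets not lost. They are not definitions from [11] or from any CDN provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
    """One request as recorded by a (hypothetical) measurement probe."""
    cache_hit: bool          # served from the replica's cache?
    latency_ms: float        # user-perceived response time
    bytes_from_origin: int   # bytes fetched from the origin (0 on a cache hit)


def cache_hit_ratio(log: List[RequestRecord]) -> float:
    """Documents served from cache divided by all requested documents."""
    return sum(r.cache_hit for r in log) / len(log)


def mean_latency_ms(log: List[RequestRecord]) -> float:
    """Average user-perceived response time."""
    return sum(r.latency_ms for r in log) / len(log)


def origin_bandwidth_bps(log: List[RequestRecord], interval_s: float) -> float:
    """Average bandwidth drawn from the origin server over the interval."""
    return 8 * sum(r.bytes_from_origin for r in log) / interval_s


def utilization(busy_s: float, interval_s: float) -> float:
    """Fraction of the interval during which the replica server was busy."""
    return busy_s / interval_s


def reliability(packets_sent: int, packets_lost: int) -> float:
    """Assumed reliability proxy: the share of packets that were not lost."""
    return 1 - packets_lost / packets_sent
```

For example, a log of four requests with three cache hits yields a cache hit ratio of 0.75.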
Some CDN providers perform internal measurements by collecting and analysing logs from their servers, which are equipped with hardware and/or software probes. Such measurements can be deceptive: a particular Web site can perform well for some users but badly for others. Independent and reliable performance measurement therefore requires external measurements. Third-party companies inform CDN customers about the verified and guaranteed performance. These companies operate benchmarking networks of strategically distributed measurement computers, so their measurements reflect the end user's perspective [8].

Internal and external performance measurement use different network statistics acquisition techniques. One such technique is network probing. Possible requesting entities are probed in order to determine one or more metrics from each replica server or from a set of replica servers. An example of this technique is an ICMP ECHO message. Its disadvantages are additional network latency and intrusion-detection alerts, which can be triggered after several probes have been sent to an entity. Some firewalls may interpret such probes as a Distributed Denial of Service (DDoS) attack, so ICMP traffic may be ignored or reprioritized [11] [8]. Another measurement technique is traffic monitoring, where the traffic between a client and a replica server is monitored to determine the current performance metrics. After
the connection between a client and a replica server has been established, the actual performance of the transfer is measured. This information is used as feedback for the request-routing system. Examples of such traffic monitoring are observing the packet loss between a client and a replica server, or observing the user-perceived latency. Latency is a common distance metric. It can be estimated by monitoring the number of packets travelling along the route between the client and the replica servers [11].

An intuitive measurement technique is obtaining feedback from the replica servers by periodically probing them. Agents deployed in the replica servers collect this feedback and can report a variety of metrics about their site. Methods for obtaining feedback can be static or dynamic. Static methods select a route that minimizes the number of hops or optimizes other static parameters. Dynamic methods compute round-trip time or other quality-of-service (QoS) parameters in real time [2].

The CDN status information determined by network statistics acquisition methods relies on several metrics. Geographical proximity identifies a user's location within a certain area. It is typically derived by probing Border Gateway Protocol (BGP) routing tables and is often used to redirect all users within a certain region to the same POP. Latency, the response time perceived by the end user, is a useful metric for redirecting a user's request to a suitable replica server. Packet loss is used to select the path with the lowest error rate. Other metrics used to select the best path for streaming media delivery are average bandwidth, startup time and frame rate. The server load state is computed from the CPU load, network interface load, number of active connections and storage I/O load. It is used to select the server with the least load [11].
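The selection step that ties these metrics together can be sketched as follows. The Python function below is a hypothetical illustration of what a DNS-based request router might do when it answers a lookup. It assumes the router already holds two kinds of data. The first is a latency estimate for each pair of client region and replica, obtained by probing or traffic monitoring. The second is a normalized load value per replica, obtained by server push or client probe. The function discards unavailable or overloaded replicas. Among the rest, it answers with the least-loaded replica whose latency is within a tolerance of the closest one. The names, the 0.9 load cut-off and the 10 ms tolerance are assumptions, not values from [11] or [12].

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    ip: str
    load: float       # normalized load in [0, 1], from server push or client probe
    available: bool   # outcome of the latest availability check


def select_replica(region: str, replicas: List[Replica],
                   latency_ms: Dict[Tuple[str, str], float],
                   max_load: float = 0.9,
                   tolerance_ms: float = 10.0) -> Optional[str]:
    """Return the IP address a DNS-style request router would answer with."""
    usable = [r for r in replicas if r.available and r.load < max_load]
    if not usable:
        return None   # caller could fall back to the origin server
    best_rtt = min(latency_ms[(region, r.ip)] for r in usable)
    # Keep only replicas that are almost as close as the closest one ...
    near = [r for r in usable
            if latency_ms[(region, r.ip)] <= best_rtt + tolerance_ms]
    # ... and among those, answer with the least-loaded one.
    return min(near, key=lambda r: r.load).ip
```

A production router would also have to consider resolver-side DNS caching (record TTLs) and refresh its latency and load tables periodically. This is the probing-frequency trade-off described earlier in this section.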
3.4 Economic Aspects
Besides the technical aspects of CDNs, there are also economic ones. The advent of CDNs and the increasing data volumes brought significant changes to the business of content delivery.

ISPs and CDN Providers
The Internet is a network of interconnected local, national and global networks. These networks are operated by Internet service providers (ISPs), which physically interconnect them. This typically happens in specialized computing centers called Internet Exchange Points, which are commonly operated by third-party providers. Some CDN providers place their servers at these major Internet Exchange Points (see Section 3.3.1), but most CDN providers choose more local sites controlled by ISPs. This physical interconnection is regulated by contractual agreements between ISPs and CDN providers. The connections between the CDN servers themselves run partly over the public Internet and partly over parallel private infrastructures [4].

CDN providers and ISPs have a strong interest in mutual cooperation. A CDN provider can offer its customers faster delivery of their content. Conversely, customers of an ISP that hosts local CDN servers in its network get faster access to the content providers' content. The ISPs also profit from lower connection costs to other ISPs, because most of the traffic stays within their own networks. From an economic point of view, the classical Internet market for data termination is split into two separate parts. On one side, peering agreements between ISPs prevent terminating ISPs from charging fees to content providers. On the other side, the terminating ISPs control access to the clients. In addition, there is no controlling entity. The advent of CDNs has brought both parties, ISPs and content providers, together onto one platform with contractual
relationships with both sides. With CDNs, the business objective of ISPs changes from single-sided to two-sided profit maximization [4].

Figure 3.2: Classical Internet connectivity [4]
Figure 3.3: Content delivery value chain [4]

Figure 3.2 shows the standard Internet way of delivering content and the corresponding ad-financed business model. An advertiser pays advertising fees to a content provider. The content provider controls some origin servers and acts as a platform enabling interaction between advertisers and clients. The origin servers reach the end users' clients by delivering content across several ISPs' networks. A single ISP usually does not have both the origin server and the client on its network. Comparing this with Figure 3.3 shows the transit role of a CDN. There are no transit and source ISPs any more, and the CDN provider has business relations with both content providers and terminating ISPs. In Figure 3.2, an ISP earns money from one endpoint of a traffic flow, either the source or the sink. It is never, or only very rarely, connected to both the origin server and the client. A CDN gives terminating ISPs contact with the content providers' origin servers, which allows a two-sided optimization of the ISPs' profit functions [4].

Current Market Situation
In past years, the market for content delivery has shown very high growth. In 2006, YouTube, Flash and other industry trends caused a huge surge in volume. This trend abated somewhat, but growth rates remained high, at about ten to thirty percent in the last three years. Figure 3.4 shows the expected trend for growth rates and revenues. Analysts still expect strong growth in volume and in market size, which is a multi-million-dollar business [15]. In past years, video streams, or more generally media streams, acted as a catalyst and driver of market growth. Unfortunately, no current technique or trend shows the potential to cause growth rates like those triggered by the advent of video streams.
Nevertheless, upcoming trends like HD video, e-books, Internet television and mobile content acceleration will contribute to the future development of the content delivery market. However, predictions have to be treated with caution, especially in the highly dynamic Internet world [15]. Tables 3.1 and 3.2 show a rough estimate of the pricing in the fourth quarter of 2011. The data relies on surveys of some customers of the major commercial CDN providers Akamai, Limelight, Level 3, Amazon, EdgeCast, AT&T, Verizon and Highwinds. Table 3.1 shows volume-based pricing. An interesting point is that the
bigger CDN providers are clearly targeting customers with a volume of 250 Terabytes (TB) or more. They also take customers on a 100 TB per month deal, but such customers play a minor role.

Figure 3.4: Projected size of the CDN market [15]

Table 3.1: Video delivery pricing for the fourth quarter of 2011 (per GB) [15]. Columns: Volume, Highest Price ($ per GB), Lowest Price ($ per GB); rows range from 250 TB up into the petabyte (PB) range.

Prices for content delivery have dropped over recent years: -45% in 2009, -25% in 2010 and -20% in 2011 [15]. The main reasons are new market participants with aggressive pricing models, and business acquisitions that create pricing pressure. The forecasts for 2012 and 2013 show fairly stable pricing. One reason is that content owners typically choose quality first and price second: for most customers, performance is more crucial than low prices. Another, and probably bigger, reason for stable prices is the current trend toward pricing per Mbps per month, as shown in Table 3.2.

Table 3.2: Video delivery pricing for the fourth quarter of 2011 (per Mbps per month) [15]. Columns: Mbps per month, Highest Price (in $), Lowest Price (in $); the first row covers small deals below 100 Mbps.
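The quoted price drops are year-over-year changes, so they compound rather than add up. Assume each percentage is relative to the previous year's price. Then the 2011 price level relative to 2008 is given below, where P_2008 and P_2011 are symbols introduced only for this calculation:

```latex
\frac{P_{2011}}{P_{2008}} = (1 - 0.45)(1 - 0.25)(1 - 0.20)
  = 0.55 \times 0.75 \times 0.80 = 0.33
```

This is a cumulative decline of about 67% over the three years, not the 90% that simply summing the percentages would suggest.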

Akamai
Akamai Technologies, Inc. was founded in 1998. With a market share of 85% (2008), it is the current market leader in content delivery [11]. The name Akamai is a Hawaiian word for smart or intelligent, with connotations of insightful, wise or skilful. The company evolved out of an MIT research effort. Its content distribution platform controls approximately [...] servers in 75 different countries, placed in about [...] networks, so Akamai is able to monitor about 20% of Web traffic [1]. In terms of the technical aspects discussed in Section 3.3, Akamai uses the following solutions:
Akamai handles static as well as dynamic content, with a main focus on streaming media.
The replica servers are placed according to the multi-ISP approach.
The platform supports full- and partial-site selection and delivery.
The outsourcing practice is non-cooperative pull-based.
Request routing is based on a modified DNS.
Internal measurement consists of network probing and proactive traffic monitoring.
External measurement is performed by a third-party organization, the Giga Information Group [11].

3.5 Summary
As content volumes keep increasing, replication techniques such as CDNs are becoming more and more important. CDNs offer an effective approach to coping with current and future traffic volumes, enabling faster download times and better load balancing across the network. This report discussed some key challenges in designing and maintaining a CDN. Several approaches have been proposed and have shown good results in terms of usability and performance. However, several research topics and theoretical approaches still require deeper examination to obtain better optimized CDNs. A further topic was the business relationship between CDN providers and ISPs.
CDNs bring all parties involved in content delivery together on a single platform and allow two-sided profit optimization when handling large content volumes. This seminar report also took a brief look at the current market situation, with some pricing examples, and at the market leader in content delivery, Akamai. It is difficult to predict how the Internet will develop and what role CDNs will play in the future. Even so, the idea of autonomic CDNs with self-managing features, driven by the common average user, will certainly bring exciting improvements in content delivery and the corresponding next-generation CDN techniques.

Bibliography
[1] Akamai; March.
[2] N. Bartolini, E. Casalicchio, S. Tucci: A Walk through Content Delivery Networks; Universita di Roma La Sapienza and Tor Vergata, Roma, Italy; In Proceedings of MASCOTS 2003, LNCS 2965, pp. 1-25, April.
[3] Coral; March.
[4] T. Hau, W. Brenner: Vertical Platform Interaction on the Internet: How ISPs and CDNs Interact; Research Paper, 18th European Conference on Information Systems, Pretoria, South Africa.
[5] J. Kangasharju, J. Roberts, K. W. Ross: Object Replication Strategies in Content Distribution Networks; Computer Communications, Vol. 25, No. 4, March.
[6] B. Krishnamurthy, C. Wills, Y. Zhang: On the Use and Performance of Content Distribution Networks; In Proceedings of the 1st International Internet Measurement Workshop, ACM Press, San Francisco, CA, USA, September.
[7] P. Krishnan, D. Raz, Y. Shavitt: The Cache Location Problem; IEEE/ACM Transactions on Networking.
[8] V. M. Manikandan: Content Delivery Networks; Seminar Report, Department of Computer Science, Cochin University of Science and Technology, India.
[9] G. Pallis, K. Stamos, A. Vakali, A. Sidiropoulos, D. Katsaros, Y. Manolopoulos: Replication Based on Objects Load under a Content Distribution Network; In Proceedings of the 2nd International Workshop on Challenges in Web Information Retrieval and Integration (WIRI), Atlanta, Georgia, USA, April.
[10] G. Pallis, A. Vakali, K. Stamos, A. Sidiropoulos, D. Katsaros, Y. Manolopoulos: A Latency-Based Object Placement Approach in Content Distribution Networks; In Proceedings of the 3rd Latin American Web Congress (LA-Web 2005), IEEE Press, Buenos Aires, Argentina, October.
[11] A. K. Pathan, R. Buyya: A Taxonomy and Survey of Content Delivery Networks; Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, University of Melbourne, Parkville, Australia.
[12] G. Peng: CDN: Content Distribution Network; Technical Report TR-125, Experimental Computer System Lab, Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY, February 2008.
[13] L. Qiu, V. N. Padmanabhan, G. M. Voelker: On the Placement of Web Server Replicas; In Proceedings of the IEEE INFOCOM 2001 Conference, Anchorage, Alaska, USA, April.
[14] P. Radoslavov, R. Govindan, D. Estrin: Topology-Informed Internet Replica Placement; In Proceedings of the Sixth International Workshop on Web Caching and Content Distribution, Boston, Massachusetts, June.
[15] D. Rayburn: CDN Pricing Stable In Q4, Down About 20% For The Year, Market Size $675M; November.
[16] S. S. H. Tse: Approximate Algorithms for Document Placement in Distributed Web Servers; IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 6, June.
[17] Wikipedia: Content Delivery Network; delivery_network, March 2012.


Chapter 4
Wireless Indoor Positioning Techniques
Rilind Ballazhi

Wireless indoor positioning systems have been in high demand in recent years. These systems track and locate people or objects within buildings for many purposes. Examples include locating products stored in a warehouse, positioning the first responder of a rescue team in a building, and detecting the location of medical personnel in a hospital. This work provides an overview of current wireless indoor positioning systems. Before this overview, the positioning algorithms and measurement techniques are described. The overview is followed by a comparison of the different systems. At the end of this work, a summary and a conclusion are given.

Contents
4.1 Introduction
4.2 Positioning Techniques
4.2.1 Positioning Algorithms
4.2.2 Measurement Techniques
4.3 Positioning Systems
4.3.1 Fixed Indoor Positioning Systems
4.3.2 Pedestrian Indoor Positioning Systems
4.3.3 Comparison of Positioning Systems
4.4 Summary and Conclusions

4.1 Introduction
Wireless indoor positioning systems (IPSs) are used to locate people and objects within an indoor environment, and the demand for them has increased in recent years. Application areas include locating products stored in a warehouse, positioning the first responder of a rescue team in a building, and detecting the location of medical personnel in a hospital. Since wireless technology is widely used nowadays, there is a high demand for positioning objects using wireless networks. Wireless IPSs can be classified in different ways. In this work they are divided into fixed and pedestrian IPSs. Fixed IPSs rely on an infrastructure installed in the indoor environment. Pedestrian positioning systems have no installed infrastructure; instead, their users are equipped with, e.g., wearable sensors. Based on the main medium used to estimate the object location, fixed IPSs can be grouped into Infrared-, Ultrasound-, Radio Frequency-, UWB- or Optical-based indoor positioning systems.

Figure 4.1 shows the building blocks of a fixed wireless IPS. It mainly consists of a mobile terminal, reference points, base stations and a display system. The mobile terminal, carried by the user or the targeted object, sends signals to the reference points, e.g. sensors. The reference points forward the signals to the base station. The base station determines the position of the object using location measurements and positioning algorithms, and the resulting position is presented on a display system.
Figure 4.1: Fixed Indoor Positioning System.

This report investigates the different algorithms and measurement techniques used for wireless IPSs. Various fixed and pedestrian systems are described and, at the end, compared based on different performance metrics. This work is organized as follows.
Section 4.2 describes the different positioning techniques and is divided into two parts. Section 4.2.1 presents the positioning algorithms, and Section 4.2.2 explains the measurement techniques used by the different systems. Section 4.3, the main part of this work, then presents the different wireless positioning systems. Finally, Section 4.4 concludes the work.

4.2 Positioning Techniques
Various wireless techniques are used for indoor positioning. Section 4.2.1 briefly describes the algorithms mainly used in positioning systems, whereas Section 4.2.2 presents the different measurement techniques used by these systems.

4.2.1 Positioning Algorithms
Four approaches are used in building positioning systems [10][9]:

1. Triangulation
2. Trilateration
3. Scene Analysis
4. Proximity

Triangulation
Triangulation uses the geometric properties of triangles to estimate the location of the targeted object. To find the object location, this technique determines the angles at which signals arrive at two known reference points, such as sensors installed in the indoor area. The targeted object is then the third point of a triangle with one known side and two known angles. Triangulation uses the Angle of Arrival (described in Section 4.2.2) as its measuring technique.

Trilateration
This approach is similar to triangulation, but it works with distances instead of angles. It estimates the coordinates of the target object from its distances to at least three base stations with known coordinates. Each distance is computed by multiplying the radio signal velocity by the travel time of the signal from the mobile target to the reference point.

Scene Analysis
Scene analysis refers to algorithms that first collect fingerprints of a scene and then estimate the location of an object by matching the current observation against them. The approach has two phases. During the offline phase, the location coordinates and the signal strengths from the nearby base stations are collected. In the online phase, a positioning technique combines the currently observed signal strengths with the information collected offline to estimate the target location. The following location-fingerprinting-based algorithms are used in scene analysis:
probabilistic methods
k-nearest-neighbor (kNN)
neural networks
smallest M-vertex polygon

Probabilistic Methods
Assume there are n location candidates L_1, L_2, L_3, ..., L_n, and that s is the signal strength observed during the online phase.
Given the signal strength s, the location candidate with the highest probability of containing the mobile node is chosen as the object's position. That is, L_i is chosen if

$$ P(L_i \mid s) > P(L_j \mid s) \quad \text{for all } j \neq i \qquad (4.1) $$

This decision rule is based on the a posteriori probability.

k-nearest-neighbor (kNN)
This algorithm takes the online Received Signal Strength (RSS) measurements (see Section 4.2.2) and searches for the k closest matches among the signal measurements recorded at known locations in the offline phase. The search is based on the root-mean-square error principle. The parameter k is tuned for better performance. The location with the best match is taken as the object position.

Neural Networks
This algorithm first trains a neural network during the offline phase, with the RSS values as inputs and the corresponding location coordinates as training targets. Training yields the appropriate weights. This kind of positioning algorithm usually uses a multi-layer perceptron (MLP) network with one hidden layer, placed between the input and the output layer. The input vector of RSS values is multiplied by the trained input weight matrix, and the result is passed to the transfer function of the hidden layer. The output of the transfer function is multiplied by the trained hidden-layer weight matrix. The result is a two- or three-element vector that holds the 2-D or 3-D location estimate.

Smallest M-vertex Polygon (SMP)
For each signal transmitter, SMP searches for candidate locations in signal space based on the online RSS values. Each signal transmitter has at least one location candidate. Taking one candidate per transmitter forms an M-vertex polygon (for M transmitters). The average of the vertex coordinates of the smallest such polygon determines the object location.

Proximity
Proximity algorithms provide relative location information: the location of a targeted object is determined with respect to a known position or area. This technique needs one or more fixed reference points (detectors). When a detector detects a targeted object, the object is considered to be in the proximity area of that detector. If more than one detector detects the object, it is assumed to be in the proximity area of the detector with the highest received signal strength. Proximity techniques cannot provide an absolute position estimate of an object.
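Two of the algorithms above, trilateration and kNN fingerprinting, can be illustrated with a short Python sketch. It assumes ideal, noise-free 2-D inputs and uses hypothetical function names.

`trilaterate` first converts TOA travel times into distances (signal velocity times travel time). It then intersects the three circles by subtracting the first circle equation from the other two, which leaves a 2x2 linear system. This linearization is one common way to implement trilateration, not necessarily the one used by the systems surveyed in [10][9].

`knn_locate` compares an online RSS vector with an offline fingerprint database using the root-mean-square difference. It then averages the coordinates of the k closest fingerprints. The averaging step is an assumption; with k = 1 it reduces to taking the single best match.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
SPEED_OF_LIGHT = 299_792_458.0  # radio signal velocity in m/s


def trilaterate(anchors: Sequence[Point], travel_times_s: Sequence[float],
                velocity: float = SPEED_OF_LIGHT) -> Point:
    """Estimate a 2-D position from three reference points (TOA inputs)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    # Distance to each reference point = signal velocity * travel time.
    d1, d2, d3 = (velocity * t for t in travel_times_s)
    # Subtracting circle 1, (x-x1)^2 + (y-y1)^2 = d1^2, from circles 2 and 3
    # cancels x^2 and y^2 and leaves the linear system A @ (x, y) = b.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference points must not be collinear")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


def knn_locate(online_rss: Sequence[float],
               fingerprints: Dict[Point, Sequence[float]], k: int = 3) -> Point:
    """Average the positions of the k offline fingerprints closest in signal space."""

    def rms(offline: Sequence[float]) -> float:
        # Root-mean-square difference between offline and online RSS vectors.
        return math.sqrt(sum((o - s) ** 2 for o, s in zip(offline, online_rss))
                         / len(online_rss))

    best: List[Point] = sorted(fingerprints, key=lambda p: rms(fingerprints[p]))[:k]
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))
```

With real measurements the circles rarely meet in a single point. Practical systems therefore typically use more than three reference points and a least-squares fit instead of an exact solve.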
Infrared and Radio Frequency ID (RFID) systems are based on proximity (see Section 4.3).

4.2.2 Measurement Techniques
Different measurement techniques are used in positioning systems today [10][2]:
AOA: Angle of Arrival
TOA: Time of Arrival
TDOA: Time Difference of Arrival
RSS: Received Signal Strength
Each of these techniques is briefly explained below. The base station uses them to calculate the coordinates of the target object's position.

Angle of Arrival (AOA)
The AOA measurement technique locates the targeted object by finding the angles at which the received signal arrives at the reference points. It uses at least two reference points and the two measured angles to determine the position of an object. In Figure 4.2, reference points A and B each define a direction line from their measured angle, and the intersection of these two lines locates the targeted object P. The measured angles in this figure are the angles at which the signal from the targeted object arrives at points A and B. An advantage of this technique is that it needs few measuring units: at least two for 2D positioning and at least three for 3D positioning. It also does not need synchronization between the measuring units. Disadvantages of AOA are the relatively large hardware requirements and the low accuracy in the presence of signal reflections.

Time of Arrival (TOA)
Figure 4.2: Positioning based on AOA measurement [10].
TOA is the time taken by the signal to travel from the mobile target (the source) to the measuring unit. For 2D positioning, TOA needs signals from at least three reference points, as shown in Figure 4.3. For each of the reference points A, B and C in this figure, the distance to the target object P is the radio signal velocity multiplied by the travel time of the received signal. Each distance defines a TOA circle around its reference point, and the intersection of these circles locates the targeted object P.
Figure 4.3: Positioning based on TOA measurement [10].
Unlike AOA, TOA requires all signal transmitters and receivers to be synchronized.

Time Difference of Arrival (TDOA)
Figure 4.4: Positioning based on TDOA measurement [10].
TDOA estimates the relative position of the mobile transmitter from the differences between the arrival times of its signal at multiple measuring units, as shown in Figure 4.4. For the TDOA between two reference points (e.g. A and B), the signal transmitter (P in Figure 4.4) must lie on a hyperboloid with a constant range difference to the two measuring units. Locating the object therefore needs at least two TDOA measurements. The intersection of the two hyperbolas formed from the TDOA measurements at reference points A, B and C estimates the location of target P.

Received Signal Strength (RSS)
RSS is the strength of the received signal. This technique estimates the distance of the mobile target from a set of reference points using the attenuation of the emitted

59 Rilind Ballazhi 59 Figure 4.5: Positioning based on RSS measurement [10]. signal strength. Knowing the attenuation of the emitted signal the signal path loss due to propagation is determined. In Figure 4.5 the signal path losses between mobile target P and reference points A, B and C are marked as LS1, LS2 and LS3. There are various models used to translate the signal path losses into a range estimate, which then can determine the estimated location. 4.3 Positioning Systems This Section describes two types of positioning systems. They are classified into fixed and pedestrian indoor positioning systems Fixed Indoor Positioning Systems A fixed indoor positioning system is positioning an object if the building has an installed positioning system in it and users are carrying mobile devices or tags [7][2]. These tools provide a unique identification for each object. In the following, some of the wireless fixed IPS will be presented. The systems classified as fixed indoor positioning systems are: Infrared Positioning Systems Ultra-sound Positioning Systems RF Positioning Systems Optical Indoor Positioning Systems Ultra Wideband Systems Infrared Positioning Systems Infrared (IR) positioning systems [24][7] use IR signals to transmit signals from mobile target to sensor nodes. The object which has to be positioned by this kind of systems carries an ID card equipped with infrared LED. The infrared LED transmits a unique code to the Base Station every fifteen seconds. IR positioning systems have a simple structure, low cost and relatively high accuracy indoor. The transmission distance and the short sight lines are two disadvantages of these systems. One of the IR positioning systems is the Active Badge System [21] developed by AT & T. It is one of the earliest indoor positioning systems. This system is not offered commercially. 
The badge, shown on the right side of Figure 4.6, is carried by the objects to be tracked and transmits its ID to one or more fixed sensors in the building. The sensors forward the signal information to a central server, which estimates the location of the target object from this information. OPTOTRAK PROseries [15] is another IR-based IPS, designed by Northern Digital Inc. for congested shops and workspaces. The system is quick and easy to set up. It uses three cameras (see Figure 4.7) to track the 3D positions of markers mounted on different parts of the tracked object.
Figure 4.6: Base station (left) and transmitter (right) of Active Badge system [24].
Figure 4.7: OPTOTRAK PROseries System [7].
The mounted markers emit IR light, which is detected by the cameras; the location is then estimated by triangulation. Figure 4.7 shows three markers A, B and C mounted on the surface of a car, and a marker E fixed on the car door. When the door is opened, the position of emitter E can be measured from its position changes relative to the reference points A, B and C. The system provides a high accuracy of 0.1 mm to 0.5 mm with 95% success probability. As it is IR-based, it requires line of sight. The cameras used in OPTOTRAK can cover a volume of 20 m³.

Ultra-sound Positioning Systems
Ultrasound positioning systems are used more often than IR and Radio Frequency (RF) based systems because they position objects more accurately. They use ultrasonic beacons to transmit signals from the mobile target to the reference points.
Figure 4.8: Active Bat System [1].
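The TOA principle from the measurement techniques above carries over directly to ultrasound ranging, where the propagation speed is the speed of sound instead of the radio signal velocity. The following minimal sketch assumes synchronized clocks, noise-free one-way travel times, a 2D layout and a speed of sound of about 343 m/s (dry air at roughly 20 °C). It converts travel times into distances and intersects the resulting range circles by linear least squares. The function names and the receiver coordinates are hypothetical and not taken from any of the systems described here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for dry air at about 20 degrees C


def tof_to_distance(tof_s, speed=SPEED_OF_SOUND):
    """TOA ranging: distance = propagation speed * one-way travel time."""
    return [speed * t for t in tof_s]


def trilaterate(anchors, distances):
    """Least-squares 2D position from >= 3 reference points and their ranges.

    Each range defines a circle (x - xi)^2 + (y - yi)^2 = di^2. Subtracting the
    first circle from the others cancels x^2 + y^2 and leaves linear equations
    2(xi - x0) x + 2(yi - y0) y = d0^2 - di^2 + xi^2 + yi^2 - x0^2 - y0^2,
    which are solved via the 2x2 normal equations.
    """
    (x0, y0), d0 = anchors[0], distances[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 + yi**2 - x0**2 - y0**2)
    # Normal equations (A^T A) p = A^T b for the two unknowns x and y.
    s_aa = sum(a * a for a, _ in rows)
    s_ab = sum(a * b for a, b in rows)
    s_bb = sum(b * b for _, b in rows)
    t_a = sum(a * r for (a, _), r in zip(rows, rhs))
    t_b = sum(b * r for (_, b), r in zip(rows, rhs))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear; position is ambiguous")
    x = (t_a * s_bb - s_ab * t_b) / det
    y = (s_aa * t_b - s_ab * t_a) / det
    return x, y


# Hypothetical layout: three fixed receivers (m) and a transmitter at (1, 1).
receivers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_pos = (1.0, 1.0)
tofs = [math.dist(true_pos, r) / SPEED_OF_SOUND for r in receivers]
print(trilaterate(receivers, tof_to_distance(tofs)))  # close to (1.0, 1.0)
```

Subtracting one circle equation from the others is a common way to turn the circle intersection into a linear problem; with more than three reference points the same code spreads measurement noise over all ranges in the least-squares sense.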

Positioning systems using ultrasound technology include the Active Bats [1][22] developed by AT&T and the Cricket system [13][16] developed by the Massachusetts Institute of Technology (MIT). The Active Bats system (Figure 4.8) is similar to the Active Badges (IR system). The signal sender (a Bat) is carried by the object to be located and transmits short pulses of ultrasound to receivers mounted at fixed places on the ceiling. As the speed of sound in air is known, the distance from the transmitter to each receiver can be determined from the pulse travel time. This provides the information required to locate the object carrying the Bat, whose position is then computed by trilateration.
The Cricket system consists of cricket nodes, shown in Figure 4.9. A cricket node consists of a Radio Frequency (RF) transceiver, a microcontroller, and associated hardware for generating and receiving ultrasonic signals and for interfacing with a host device. Cricket nodes can be used either as beacons or as listeners. Beacons are mounted on the ceiling and thus have fixed locations, while listeners are attached to the target objects. Beacons periodically send information containing their ID, the range of coverage or physical space associated with them, and their coordinates. A listener receives these transmissions and determines its distances to the nearby beacons.
Figure 4.9: Cricket Nodes [16].

Radio Frequency (RF) Positioning Systems
RF positioning systems use radio frequency signals for transmission; RF is the most widely used wireless technology for indoor geolocation. In the following, the LANDMARC [14], RADAR [18][3] and SpotON [8] positioning systems, all based on RF technology, are described. The LANDMARC system is a positioning system based on Radio Frequency and RFID tags.
The LANDMARC system uses RFID tags and readers, which are shown in Figure 4.10. The RFID readers are placed at well-known locations and divide the area into sub-regions. Each reader has 8 power levels for detecting the signal strength received from tags. For this reason, LANDMARC needs approximately one minute to scan the 8 power levels and to determine the signal strength received from each tag in terms of the reader's power levels. To decrease costs, the system uses extra reference tags at fixed locations to help calibrate the location; these extra tags serve as landmarks in the system. The RFID tags carried by the target objects are associated with the sub-regions in which the objects move. The system is based on the RSS measurement technique.
Figure 4.10: The RFID reader and tag used in LANDMARC [14].
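LANDMARC's reference tags can be used as in the k-nearest-neighbour scheme of the original LANDMARC work [14]: for a tracking tag, compute the Euclidean distance E_j between its vector of readings (one value per reader) and the vector of each reference tag j, select the k reference tags with the smallest E_j, and average their known coordinates with weights proportional to 1/E_j². The sketch below treats the readings as plain numbers (in LANDMARC they are the reader power levels), uses k = 4 as an assumed default, and relies on hypothetical tag positions and readings.

```python
import math


def landmarc_locate(tracking_rss, reference_tags, k=4):
    """Estimate a tag position from the k nearest reference tags in signal space.

    tracking_rss: readings of the tracking tag, one value per reader.
    reference_tags: list of ((x, y), readings) for tags at known positions.
    """
    if k < 1 or k > len(reference_tags):
        raise ValueError("k must be between 1 and the number of reference tags")
    # E_j: Euclidean distance between tracking-tag and reference-tag readings.
    scored = []
    for pos, rss in reference_tags:
        e = math.sqrt(sum((s - t) ** 2 for s, t in zip(rss, tracking_rss)))
        scored.append((e, pos))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # An exact signal match means the tag sits at that reference position.
    for e, pos in nearest:
        if e == 0.0:
            return pos
    # Weights proportional to 1 / E_j^2, normalised to sum to one.
    weights = [1.0 / e**2 for e, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical example: two readers, four reference tags on a 2 m grid.
refs = [((0.0, 0.0), [10, 5]), ((2.0, 0.0), [0, 5]),
        ((0.0, 2.0), [5, 10]), ((2.0, 2.0), [5, 0])]
print(landmarc_locate([5, 5], refs))  # equal distances: about the grid centre (1, 1)
```

Because the tracking tag is compared with reference tags rather than converted into a range, this scheme needs no explicit path-loss model; the choice of k trades smoothing against locality.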

The RADAR system works with a radio map that holds previously measured signal strengths together with the building locations where the measurements were taken. To locate an object, its wireless device measures the signal strength (RSS) from the reference points within range and then searches the radio map for the entry that best matches this measurement. The accuracy of the RADAR system is about 2-3 m.
Figure 4.11: SpotON architecture [8].
SpotON is another well-known positioning system based on RFID technology. Its algorithm is also based on RSS measurements (see Section 4.2.2). Figure 4.11 shows the architecture of this system, consisting of object tags, base stations and a central server. The Hydra Microwebserver is embedded hardware in the base stations; it has both an Ethernet and an RS232 port and is used for internetworking tasks. The object tags emit RF signals to the base stations, which measure the RSS and send the values through the RS232 port and via the Internet to the server. The server matches the received RSS values against previously measured RSS data and determines the position of the target objects. The base stations are serially connected.

Optical Indoor Positioning Systems
In contrast to the other systems, optical indoor positioning systems [2][12] combine a system installed in the building with a camera carried by the user. These systems are mostly used for locating robots in indoor environments.
Figure 4.12: Laser spots in CLIPS [11].
CLIPS (Camera and Laser based Indoor Positioning System) [11] is an optical indoor positioning system. It consists of a mobile camera, carried by the target object, and a projector. The projector projects a pattern of laser spots onto any surface in the building, and the 3D directions of these laser beams are known. When the mobile camera captures the laser spots, the relative orientation of the camera to the projector is determined. The system requires neither high-precision mechanics nor sophisticated set-ups.

Ultra Wideband Systems
Compared to other systems, an ultra wideband (UWB) system transmits signals of much shorter duration over multiple frequency bands simultaneously. UWB can be used for precise indoor positioning.
Figure 4.13: Ubisense System [20].
A well-known UWB system is the Ubisense system [19][20]. It uses triangulation to find the object location and consists of tracked tags, sensors placed in the building, and the Ubisense location management platform (see Figure 4.13). The tags are carried by the tracked objects. They send signals to the sensors in the building, which forward the data to the location management platform, where the position of the object is determined.

Pedestrian Indoor Positioning Systems
Pedestrian Indoor Positioning Systems are positioning systems that do not require a system installed in the building where users or objects are to be detected. Instead, the people or objects to be localized carry the localization sensors. Pedestrian Dead Reckoning (PDR) [2] is the most widely used approach in pedestrian positioning systems. The PDR technique starts from a known location and adds every movement of the target object to the coordinates of the starting point. For this, PDR estimates the speed and the heading (direction) of the object's movement. The following pedestrian IPSs are described:
Beauregard's System
FootSLAM
Fischer's System
Bat System

Beauregard's System
Beauregard's System [4] uses the PDR approach to locate objects or people. Figure 4.14 shows the helmet-mounted sensor used by this system. The orange box in this figure is the XSens motion sensor, which has to be oriented in the user's direction of movement, and the smaller device on top is the GPS antenna.
The GPS antenna is used to calibrate and validate the PDR approach. Data is logged on a laptop computer, and the GPS receiver is carried in a small backpack.
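The PDR principle described above, starting from a known location and adding each movement, can be sketched as a step-and-heading update. This minimal sketch assumes that the step detection and heading estimation stages already deliver a step length in metres and a heading in radians (counter-clockwise from the x-axis) for each step; the function name and the sample walk are hypothetical, and the code is not Beauregard's implementation.

```python
import math


def dead_reckon(start, steps):
    """Pedestrian dead reckoning: accumulate (step_length, heading) pairs.

    start: known (x, y) starting point in metres.
    steps: iterable of (length_m, heading_rad), heading counter-clockwise
           from the x-axis. Returns the track including the start point.
    """
    x, y = start
    track = [(x, y)]
    for length, heading in steps:
        # Add this step's displacement to the previous position.
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        track.append((x, y))
    return track


# Hypothetical walk: 4 steps east, then 3 steps north, 0.7 m per step.
walk = [(0.7, 0.0)] * 4 + [(0.7, math.pi / 2)] * 3
print(dead_reckon((0.0, 0.0), walk)[-1])  # close to (2.8, 2.1)
```

Since each position builds on the previous one, step-length and heading errors accumulate over time, which is why the systems in this section add external references such as GPS, ultrasound landmarks or building maps.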

Figure 4.14: Helmet-mounted Sensor [4].
The PDR approach used in this system is decomposed into a step detection and estimation part and a heading estimation part. In step detection and estimation, the speed of movement and the step length are determined. In heading estimation, the direction of movement is estimated from the yaw output of the motion sensor. The sensors are mounted in a fixed orientation relative to the user's body, so the person must keep the helmet pointed in the direction of motion. This is a limitation of the system that remains to be addressed.

FootSLAM
Figure 4.15: 2D map learnt from the IMU data [17].
FootSLAM [17] is another pedestrian positioning system. It uses a Bayesian estimation approach for simultaneous localization and mapping (SLAM) of pedestrians, based on odometry from a foot-mounted Inertial Measurement Unit (IMU). The foot-mounted IMU is the only sensor used by this system. From the IMU data, a 2D map of the building (see Figure 4.15) is built without prior knowledge of the building structure. This map improves as locations in the building are visited more often, and it is used to estimate the pedestrians' future paths.

Fischer's System
Fischer's system [5] is a pedestrian positioning system which, unlike FootSLAM and Beauregard's System, combines ultrasound beacons with foot-mounted inertial sensors (see Figure 4.16) to provide better accuracy. The system guides a person along a defined path under poor visibility conditions and is intended mainly for first responders such as rescue teams.

The PDR algorithm applied to the foot-mounted IMU data is used to obtain the location of the pedestrian. It distinguishes two phases within each step: a stance phase, when the foot is in contact with the ground, and a swing phase. During the stance phase the velocity is reset to zero, while during the swing phase the acceleration is integrated twice. Fischer distinguishes two types of error in the PDR approach, the heading error and the distance error. To reduce the heading error, users deploy ultrasound sensors (see the black box in Figure 4.16) along their way as they proceed through the building; these serve as landmarks.
Figure 4.16: Fischer's System [6].

Bat System
The Bat system [23] provides absolute positioning of the pedestrian by combining a foot-mounted Inertial Measurement Unit (IMU) for the PDR algorithm, a detailed building model and a particle filter. The particle filter, a sample-based Bayesian filter, corrects the drift in the inertial measurements. To reduce the computational overhead at the start of the localization process, WiFi signal strength is used to initialize the localization algorithm.
Figure 4.17: Map used in Bat System [23].
As Figure 4.17 shows, the absolute location of a pedestrian can be determined from a path of step events describing their relative movement together with a map of the environment. For every step, the horizontal step length, the change in height over the step and the change in heading between the previous and the current step are measured. The new heading and position of the pedestrian are then calculated from the previous state and these step parameters.
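The stance/swing scheme of Fischer's PDR algorithm can be illustrated along a single axis: during the swing phase the acceleration is integrated to velocity and the velocity to position, while during the stance phase the velocity is reset to zero (a zero-velocity update). The sketch assumes gravity-compensated forward acceleration samples at a fixed interval and takes the stance flags as input from a hypothetical stance detector; real foot-mounted systems work in 3D and also track sensor orientation. The step profile and the 0.1 m/s² sensor bias are made-up values.

```python
def zupt_distance(accels, stance, dt):
    """Foot-mounted PDR along one axis with zero-velocity updates (ZUPT).

    accels: gravity-compensated forward acceleration samples in m/s^2.
    stance: booleans, True where a (hypothetical) stance detector reports
            that the foot rests on the ground.
    dt: sampling interval in seconds.
    Returns the distance travelled in metres.
    """
    velocity = 0.0
    position = 0.0
    for a, on_ground in zip(accels, stance):
        if on_ground:
            velocity = 0.0          # stance phase: velocity reset to zero
        else:
            velocity += a * dt      # swing phase: first integration
        position += velocity * dt   # second integration (adds nothing in stance)
    return position


# Hypothetical single step sampled at 100 Hz: 0.5 s stance, 0.5 s swing
# (accelerate, then decelerate), 0.5 s stance, plus a constant sensor bias.
dt, bias = 0.01, 0.1
profile = [0.0] * 50 + [2.0] * 25 + [-2.0] * 25 + [0.0] * 50
stance = [True] * 50 + [False] * 50 + [True] * 50
measured = [a + bias for a in profile]
print(zupt_distance(profile, stance, dt))                  # bias-free, about 0.125 m
print(zupt_distance(measured, stance, dt))                 # with ZUPT: small bias error
print(zupt_distance(measured, [False] * len(stance), dt))  # without ZUPT: larger drift
```

The comparison shows why the reset helps: without it, the velocity error caused by the bias keeps growing across stance phases, so the position drift grows much faster than with zero-velocity updates.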
