The network model of the Internet explosion (1995-1999)

The explosion of the Internet in 1994 radically changed the shape of the Internet, turning it from a quiet geek utopia into a bustling mass medium. Millions of new people flocked to the Net. This wave represented a new kind of people—ordinary folks who were interested in the Internet as a way to send email, view web pages, and buy things, not computer scientists interested in the details of complex computer networks. The change of the Internet to a mass cultural phenomenon has had a far-reaching impact on the network architecture, an impact that directly affects our ability to create peer-to-peer applications in today’s Internet. These changes are seen in the way we use the network, the breakdown of cooperation on the Net, the increasing deployment of firewalls on the Net, and the growth of asymmetric network links such as ADSL and cable modems.

The switch to client/server

The network model of user applications—not just their consumption of bandwidth, but also their methods of addressing and communicating with other machines—changed significantly with the rise of the commercial Internet and the advent of millions of home users in the 1990s. Modem connection protocols such as SLIP and PPP became more common, typical applications targeted slow-speed analog modems, and corporations began to manage their networks with firewalls and Network Address Translation (NAT). Many of these changes were built around the usage patterns common at the time, most of which involved downloading data, not publishing or uploading information.

The web browser, and many of the other applications that sprang up during the early commercialization of the Internet, were based around a simple client/server protocol: the client initiates a connection to a well-known server, downloads some data, and disconnects. When the user is finished with the data retrieved, the process is repeated. The model is simple and straightforward. It works for everything from browsing the Web to watching streaming video, and developers cram shopping carts, stock transactions, interactive games, and a host of other things into it. The machine running a web client doesn’t need to have a permanent or well-known address. It doesn’t need a continuous connection to the Internet. It doesn’t need to accommodate multiple users. It just needs to know how to ask a question and listen for a response.
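
A minimal sketch of this request/response pattern in Python (the URL is only a placeholder, not an example from the text) might look like this:

    # A minimal sketch of the client/server "download" model: connect to a
    # well-known server, retrieve a document, and disconnect.
    from urllib.request import urlopen

    with urlopen("http://example.com/") as response:    # the client initiates the connection
        page = response.read()                          # download some data
    # the connection is closed here; the client keeps no long-lived state
    print(len(page), "bytes retrieved")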

Not all of the applications used at home fit this model. Email, for instance, requires much more two-way communication between an email client and server. In these cases, though, the client is often talking to a server on the local network (either the ISP’s mail server or a corporate one). Chat systems that achieved widespread usage, such as AOL’s Instant Messenger, have similar “local” properties, and Usenet systems do as well. As a result, the typical ISP configuration instructions give detailed (and often misunderstood) instructions for email, news, and sometimes chat. These were the exceptions that were worth some manual configuration on the user’s part. The “download” model is simpler and works without much configuration; the “two-way” model is used less frequently but perhaps to greater effect.

While early visions of the Web always called it a great equalizer of communications—a system that allowed every user to publish their viewpoints rather than simply consume media—the commercial explosion on the Internet quickly fit the majority of traffic into the downstream paradigm already used by television and newspapers. Architects of the systems that enabled the commercial expansion of the Net often took this model into account, assuming that it was here to stay. Peer-to-peer applications may require these systems to change.

The breakdown of cooperation

The early Internet was designed on principles of cooperation and good engineering. Everyone working on Internet design had the same goal: build a reliable, efficient, powerful network. As the Internet entered its current commercial phase, the incentive structures changed, resulting in a series of stresses that have highlighted the Internet’s susceptibility to the tragedy of the commons. This phenomenon has shown itself in many ways, particularly the rise of spam on the Internet and the challenges of building efficient network protocols that correctly manage the common resource.

Spam: Uncooperative people

Spam, or unsolicited commercial messages, is now an everyday occurrence on the Internet. Back in the pre-commercial network, however, unsolicited advertisements were met with surprise and outrage. The end of innocence occurred on April 12, 1994, the day the infamous Canter and Seigel “green card spam” appeared on the Usenet. Their offense was an advertisement posted individually to every Usenet newsgroup, blanketing the whole world with a message advertising their services. At the time, this kind of action was unprecedented and engendered strong disapproval. Not only were most of the audience uninterested in the service, but many people felt that Canter and Seigel had stolen the Usenet’s resources. The advertisers did not pay for the transmission of the advertisement; instead the costs were borne by the Usenet as a whole.

In the contemporary Internet, spam does not seem surprising; Usenet has largely been given over to it, and ISPs now provide spam filtering services for their users’ email, both to help their users and to defend themselves. Email and Usenet relied on individuals’ cooperation to not flood the commons with junk mail, and that cooperation broke down. Today the Internet generally lacks effective technology to prevent spam.

The problem is the lack of accountability in the Internet architecture. Because any host can connect to any other host, and because connections are nearly anonymous, people can insert spam into the network at any point. There has been an arms race of trying to hold people accountable—closing down open sendmail relays, tracking sources of spam on Usenet, retaliating against spammers—but the battle has been lost, and today we have all learned to live with spam.

The lesson for peer-to-peer designers is that without accountability in a network, it is difficult to enforce rules of social responsibility. Just like Usenet and email, today’s peer-to-peer systems run the risk of being overrun by unsolicited advertisements. It is difficult to design a system where socially inappropriate use is prevented. Technologies for accountability, such as cryptographic identification or reputation systems, can be valuable tools to help manage a peer-to-peer network. There have been proposals to retrofit these capabilities into Usenet and email, but none today are widespread; it is important to build these capabilities into the system from the beginning. Chapter 16 discusses some techniques for controlling spam, but these are still arcane.
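
As an illustration only (this is not a scheme described in the book), a reputation system can start from something as small as a score table: peers gain standing for good behavior, lose it faster for abuse, and are refused service once they fall below a threshold.

    # Toy reputation tracker (illustrative only): peers gain standing for good
    # behavior, lose it faster for abuse, and are cut off below a threshold.
    from collections import defaultdict

    class ReputationTable:
        def __init__(self, threshold=0):
            self.scores = defaultdict(int)    # peer id -> running score
            self.threshold = threshold

        def report(self, peer_id, good):
            self.scores[peer_id] += 1 if good else -5   # punish abuse more heavily

        def allow(self, peer_id):
            return self.scores[peer_id] >= self.threshold

    table = ReputationTable()
    table.report("peer-a", good=True)
    table.report("peer-b", good=False)
    print(table.allow("peer-a"), table.allow("peer-b"))   # True False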

The TCP rate equation: Cooperative protocols

A fundamental design principle of the Internet is best effort packet delivery. “Best effort” means the Internet does not guarantee that a packet will get through, simply that the Net will do its best to get the packet to the destination. Higher-level protocols such as TCP create reliable connections by detecting when a packet gets lost and resending it. A major reason packets do not get delivered on the Internet is congestion: if a router in the network is overwhelmed, it will start dropping packets at random. TCP accounts for this by throttling the speed at which it sends data. When the network is congested, each individual TCP connection independently slows down, seeking to find the optimal rate while not losing too many packets. But not only do individual TCP connections optimize their bandwidth usage, TCP is also designed to make the Internet as a whole operate efficiently. The collective behavior of many individual TCP connections backing off independently results in a lessening of the congestion at the router, in a way that is exquisitely tuned to use the router’s capacity efficiently. In essence, the TCP backoff algorithm is a way for individual peers to manage a shared resource without a central coordinator.
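
The backoff behavior can be caricatured as additive-increase/multiplicative-decrease: probe gently for more bandwidth while packets are getting through, and cut back sharply when loss signals congestion. The sketch below is a simplification of what real TCP implementations do, not a description of any particular stack.

    # Rough sketch of TCP-style additive-increase/multiplicative-decrease (AIMD).
    # Real TCP stacks are far more elaborate; this only shows the cooperative
    # shape of the behavior: probe gently, back off sharply on loss.
    def adjust_window(cwnd, packet_lost):
        if packet_lost:
            return max(1.0, cwnd / 2)   # multiplicative decrease: halve the window
        return cwnd + 1.0               # additive increase: one more segment per round trip

    cwnd = 1.0
    for lost in [False, False, False, True, False, False]:
        cwnd = adjust_window(cwnd, lost)
        print(f"cwnd = {cwnd:.1f} segments")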

The problem is that the efficiency of TCP on the Internet scale fundamentally requires cooperation: each network user has to play by the same rules. The performance of an individual TCP connection is inversely proportional to the square root of the packet loss rate—part of the “TCP rate equation,” a fundamental governing law of the Internet. Protocols that follow this law are known as “TCP-friendly protocols.” It is possible to design other protocols that do not follow the TCP rate equation, ones that rudely try to consume more bandwidth than they should. Such protocols can wreak havoc on the Net, not only using more than their fair share but actually spoiling the common resource for all. This abstract networking problem is a classic example of a tragedy of the commons, and the Internet today is quite vulnerable to it.
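
One widely cited form of this relationship is the Mathis approximation, which estimates a TCP-friendly sending rate from the segment size, the round-trip time, and the loss rate. The numbers below are invented for illustration, not taken from the text.

    # Back-of-the-envelope TCP-friendly rate using the widely cited Mathis
    # approximation: throughput ~ (MSS / RTT) * C / sqrt(loss), with C ~ 1.22.
    # The figures are made-up examples, not measurements.
    from math import sqrt

    def tcp_friendly_rate(mss_bytes, rtt_seconds, loss_rate, c=1.22):
        return (mss_bytes / rtt_seconds) * c / sqrt(loss_rate)   # bytes per second

    # e.g. 1460-byte segments, a 100 ms round trip, and 1% packet loss
    rate = tcp_friendly_rate(1460, 0.1, 0.01)
    print(f"{rate * 8 / 1e6:.2f} Mbit/s")   # roughly 1.4 Mbit/s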

The problem is not only theoretical; it is also quite practical. As protocols have been built in the past few years by companies with commercial demands, there has been growing concern that unfriendly protocols will begin to hurt the Internet.

An early example was a feature added by Netscape to their browser—the ability to download several files at the same time. The Netscape engineers discovered that if you downloaded embedded images in parallel, rather than one at a time, the whole page would load faster and users would be happier. But there was a question: was this usage of bandwidth fair? Not only does it tax the server to have to send out more images simultaneously, but it creates more TCP channels and sidesteps TCP’s congestion algorithms. There was some controversy about this feature when Netscape first introduced it, a debate quelled only after Netscape released the client and people discovered in practice that the parallel download strategy did not unduly harm the Internet. Today this technique is standard in all browsers and goes unquestioned. The questions have reemerged at the new frontier of “download accelerator” programs that download different chunks of the same file simultaneously, again threatening to upset the delicate management of Internet congestion.
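
In today’s terms the technique looks something like the sketch below (the URL is a placeholder): each object is fetched over its own connection, so the page assembles faster at the cost of opening several TCP streams at once.

    # Sketch of the "parallel download" technique: fetch several embedded objects
    # at once, each over its own TCP connection. The URL is a placeholder.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    def fetch(url):
        with urlopen(url) as response:
            return len(response.read())

    urls = ["http://example.com/"] * 4               # stand-ins for four embedded images

    with ThreadPoolExecutor(max_workers=4) as pool:  # four simultaneous connections
        for size in pool.map(fetch, urls):
            print(size, "bytes")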

A more troubling concern about congestion management is the growth of bandwidth-hungry streaming broadband media. Typical streaming media applications do not use TCP, instead favoring custom UDP-based protocols with their own congestion control and failure handling strategies. Many of these protocols are proprietary; network engineers do not even have access to their implementations to examine if they are TCP-friendly. So far there has been no major problem. The streaming media vendors seem to be playing by the rules, and all is well. But fundamentally the system is brittle, and either through a mistake or through greed the Internet’s current delicate cooperation could be toppled.

What do spam and the TCP rate algorithm have in common? They both demonstrate that the proper operation of the Internet is fragile and requires the cooperation of everyone involved. In the case of TCP, the system has mostly worked and the network has been preserved. In the case of spam, however, the battle has been lost and unsocial behavior is with us forever. The lesson for peer-to-peer system designers is to consider the issue of polite behavior up front. Either we must design systems that do not require cooperation to function correctly, or we must create incentives for cooperation by rewarding proper behavior or auditing usage so that misbehavior can be punished.

Firewalls, dynamic IP, NAT: The end of the open network

At the same time that the cooperative nature of the Internet was being threatened, network administrators implemented a variety of management measures that resulted in the Internet being a much less open network. In the early days of the Internet, all hosts were equal participants. The network was symmetric—if a host could reach the Net, everyone on the Net could reach that host. Every computer could equally be a client and a server. This capability began to erode in the mid-1990s with the deployment of firewalls, the rise of dynamic IP addresses, and the popularity of Network Address Translation (NAT).

As the Internet matured there came a need to secure the network, to protect individual hosts from unlimited access. By default, any host that can access the Internet can also be accessed on the Internet. Since average users could not handle the security risks that resulted from a symmetric design, network managers turned to firewalls as a tool to control access to their machines.

Firewalls stand at the gateway between the internal network and the Internet outside. They filter packets, choosing which traffic to let through and which to deny. A firewall changes the fundamental Internet model: some parts of the network cannot fully talk to other parts. Firewalls are a very useful security tool, but they pose a serious obstacle to peer-to-peer communication models.

A typical firewall works by allowing anyone inside the internal network to initiate a connection to anyone on the Internet, but it prevents random hosts on the Internet from initiating connections to hosts in the internal network. This kind of firewall is like a one-way gate: you can go out, but you cannot come in. A host protected in this way cannot easily function as a server; it can only be a client. In addition, outgoing connections may be restricted to certain applications like FTP and the Web by blocking traffic to certain ports at the firewall.
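
The one-way gate can be expressed as a simple stateful rule, sketched below; real firewalls make this decision per packet, in the kernel or in dedicated hardware, but the logic is the same in spirit.

    # Sketch of the "one-way gate": connections opened from the inside are allowed
    # (and remembered); traffic from outside is accepted only if it belongs to a
    # session an inside host already started.
    established = set()   # (inside host, outside host, port) sessions seen going out

    def allow(direction, inside, outside, port):
        key = (inside, outside, port)
        if direction == "out":
            established.add(key)        # remember the session the inside host opened
            return True
        return key in established       # inbound traffic only for known sessions

    print(allow("out", "10.0.0.5", "198.51.100.7", 80))   # True: outgoing request
    print(allow("in",  "10.0.0.5", "198.51.100.7", 80))   # True: reply to that request
    print(allow("in",  "10.0.0.9", "203.0.113.2",  80))   # False: unsolicited connection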

Allowing an Internet host to be only a client, not a server, is a theme that runs through a lot of the changes in the Internet after the consumer explosion. With the rise of modem users connecting to the Internet, the old practice of giving every Internet host a fixed IP address became impractical, because there were not enough IP addresses to go around. Dynamic IP address assignment is now the norm for many hosts on the Internet, where an individual computer’s address may change every single day. Broadband providers are even finding dynamic IP useful for their “always on” services. The end result is that many hosts on the Internet are not easily reachable, because they keep moving around. Peer-to-peer applications such as instant messaging or file sharing have to work hard to circumvent this problem, building dynamic directories of hosts. In the early Internet, where hosts remained static, it was much simpler.
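
Such a directory can be sketched very simply (the names and addresses below are invented): each peer re-registers its current address whenever it connects, and other peers look it up by a stable name rather than by a fixed IP.

    # Toy dynamic host directory (illustrative only): peers whose addresses change
    # re-register on every connection; lookups go by stable name, not by IP.
    directory = {}

    def register(peer_name, current_ip):
        directory[peer_name] = current_ip     # overwrite whatever address we had before

    def locate(peer_name):
        return directory.get(peer_name)       # None if the peer is unknown or offline

    register("alice", "203.0.113.17")         # today's dynamically assigned address
    register("alice", "203.0.113.42")         # tomorrow the ISP hands out a different one
    print(locate("alice"))                    # 203.0.113.42
    print(locate("bob"))                      # None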

A final trend is to not even give a host a valid public Internet address at all, but instead to use NAT to hide the address of a host behind a firewall. NAT combines the problems of firewalls and dynamic IP addresses: not only is the host’s true address unstable, it is not even reachable! All communication has to go through a fairly simple pattern that the NAT router can understand, resulting in a great loss of flexibility in application communications. For example, many cooperative Internet games have trouble with NAT: every player in the game wants to be able to contact every other player, but the packets cannot get through the NAT router. The result is that a central server on the Internet has to act as an application-level message router, emulating the function that TCP/IP itself used to serve.
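
The relaying arrangement can be sketched as follows: because neither NATed peer can accept a connection, each keeps an outbound link open to a server, which forwards messages between them at the application level. The connections are mocked here as in-memory queues to keep the sketch self-contained.

    # Sketch of an application-level message router: NATed peers cannot reach each
    # other directly, so each holds an outbound connection to a central server,
    # which relays messages between them. Connections are mocked as queues here.
    from collections import deque

    class RelayServer:
        def __init__(self):
            self.outboxes = {}                     # peer name -> messages waiting for it

        def connect(self, peer_name):
            self.outboxes[peer_name] = deque()     # the peer's outbound link to the server

        def send(self, sender, recipient, payload):
            self.outboxes[recipient].append((sender, payload))   # the server does the routing

        def poll(self, peer_name):
            box = self.outboxes[peer_name]
            messages = list(box)
            box.clear()
            return messages                        # fetched over the peer's own connection

    server = RelayServer()
    server.connect("player1")
    server.connect("player2")
    server.send("player1", "player2", "your move")
    print(server.poll("player2"))                  # [('player1', 'your move')]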

Firewalls, dynamic IP, and NAT grew out of a clear need in Internet architecture to make scalable, secure systems. They solved the problem of bringing millions of client computers onto the Internet quickly and manageably. But these same technologies have weakened the Internet infrastructure as a whole, relegating most computers to second-class status as clients only. New peer-to-peer applications challenge this architecture, demanding that participants serve resources as well as use them. As peer-to-peer applications become more common, there will be a need for common technical solutions to these problems.

Asymmetric bandwidth

A final Internet trend of the late 1990s that presents a challenge to peer-to-peer applications is the rise in asymmetric network connections such as ADSL and cable modems. In order to get the most efficiency out of available wiring, current broadband providers have chosen to provide asymmetric bandwidth. A typical ADSL or cable modem installation offers three to eight times more bandwidth when getting data from the Internet than when sending data to it, favoring client over server usage.
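
A back-of-the-envelope calculation shows what the asymmetry means in practice; the speeds below are representative of such links at the time, not figures quoted in the text.

    # Illustrative arithmetic for an asymmetric link; the speeds are representative
    # examples of a cable modem or ADSL line, not figures from the text.
    file_bits = 5 * 8 * 1_000_000       # a 5 MB file
    downstream = 1_500_000              # 1.5 Mbit/s toward the user
    upstream = 256_000                  # 256 kbit/s back toward the Internet

    print(f"download: {file_bits / downstream:.0f} s")   # about 27 seconds
    print(f"upload:   {file_bits / upstream:.0f} s")     # about 156 seconds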

The reason this has been tolerated by most users is clear: the Web is the killer app for the Internet, and most users are only clients of the Web, not servers. Even users who publish their own web pages typically do not do so from a home broadband connection, but instead use third-party dedicated servers provided by companies like GeoCities or Exodus. In the early days of the Web it was not clear how this was going to work: could each user have a personal web server? But in the end most Web use is itself asymmetric—many clients, few servers—and most users are well served by asymmetric bandwidth.

The problem today is that peer-to-peer applications are changing the assumption that end users only want to download from the Internet, never upload to it. File-sharing applications such as Napster or Gnutella can reverse the bandwidth usage, making a machine serve many more files than it downloads. The upstream pipe cannot meet demand. Even worse, because of the details of TCP’s rate control, if the upstream path is clogged, the downstream performance suffers as well: downloads are paced by acknowledgments flowing back up the link, so when the upstream pipe is saturated those acknowledgments are delayed and download throughput drops too. So if a computer is serving files on the slow side of a link, it cannot easily download simultaneously on the fast side.

ADSL and cable modems assume asymmetric bandwidth for an individual user. This assumption takes hold even more strongly inside ISP networks, which are engineered for bits to flow to the users, not from them. The end result is a network infrastructure that is optimized for computers that are only clients, not servers. But peer-to-peer technology generally makes every host act both as a client and a server; the asymmetric assumption is incorrect. There is not much an individual peer-to-peer application can do to work around asymmetric bandwidth; as peer-to-peer applications become more widespread, the network architecture is going to have to change to better handle the new traffic patterns.
