While the new breed of peer-to-peer applications can take lessons from earlier models, these applications also introduce novel characteristics of their own. Peer-to-peer allows us to separate the concepts of authoring information and publishing that same information. Peer-to-peer allows for decentralized application design, something that is both an opportunity and a challenge. And peer-to-peer applications place unique strains on firewalls, something well demonstrated by the current trend to use the HTTP port for operations other than web transactions.
One of the promises of the Internet is that people are able to be their own publishers, for example, by using personal web sites to make their views and interests known. Self-publishing has certainly become more common with the commercialization of the Internet. In practice, however, users spend most of their time reading (downloading) information and far less time publishing, and as discussed previously, commercial providers of Internet access have structured their offerings around this asymmetry.
The example of Napster creates an interesting middle ground between the ideal of “everyone publishes” and the seeming reality of “everyone consumes.” Napster particularly (and famously) makes it very easy to publish data you did not author. In effect, your machine is being used as a repeater to retransmit data once it reaches you. A network designer who assumes that there are only so many authors in the world, and therefore that asymmetric broadband is the perfect optimization, is confounded by this development. This is why many networks, such as those on college campuses, have banned Napster.
Napster changes the flow of data. The assumptions that servers would be owned by publishers and that publishers and authors would combine into a single network location have proven untrue. The same observation also applies to Gnutella, Freenet, and others. Users don’t need to create content in order to want to publish it—in fact, the benefits of publication by the “reader” have been demonstrated by the scale some of these systems have been able to reach.
Peer-to-peer systems seem to go hand-in-hand with decentralized systems. In a fully decentralized system, not only is every host an equal participant, but there are no hosts with special facilitating or administrative roles. In practice, building fully decentralized systems can be difficult, and many peer-to-peer applications take hybrid approaches to solving problems. As we have already seen, DNS is peer-to-peer in protocol design but with a built-in sense of hierarchy. There are many other examples of systems that are peer-to-peer at the core and yet have some semi-centralized organization in application, such as Usenet, instant messaging, and Napster.
Usenet is an instructive example of the evolution of a decentralized system. Usenet propagation is symmetric: hosts share traffic. But because of the high cost of keeping a full news feed, in practice there is a backbone of hosts that carry all of the traffic and serve it to a large number of “leaf nodes” whose role is mostly to receive articles. Within Usenet, there was a natural trend toward making traffic propagation hierarchical, even though the underlying protocols do not demand it. This form of “soft centralization” may prove to be economical for many peer-to-peer systems with high-cost data transmission.
Many other current peer-to-peer applications present a decentralized face while relying on a central facilitator to coordinate operations. To a user of an instant messaging system, the application appears peer-to-peer, sending data directly to the friend being messaged. But all major instant messaging systems have some sort of server on the back end that facilitates nodes talking to each other. The server maintains an association between the user’s name and his or her current IP address, buffers messages in case the user is offline, and routes messages to users behind firewalls. Some systems (such as ICQ) allow direct client-to-client communication when possible but have a server as a fallback. A fully decentralized approach to instant messaging would not work on today’s Internet, but there are scaling advantages to allowing client-to-client communication when possible.
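The three server duties described above (tracking addresses, buffering for offline users, and relaying past firewalls) can be illustrated with a minimal sketch. It does not reflect the protocol of ICQ or any other real system; the class name `PresenceServer`, its methods, and the `direct_ok` flag are all hypothetical, and the sketch only decides how a message would travel rather than opening any network connections.

```python
from collections import defaultdict


class PresenceServer:
    """Hypothetical central facilitator for an instant messaging system."""

    def __init__(self):
        self.addresses = {}                 # user name -> current (ip, port)
        self.mailboxes = defaultdict(list)  # user name -> buffered messages

    def login(self, user, address):
        """Record the user's current address and return buffered messages."""
        self.addresses[user] = address
        return self.mailboxes.pop(user, [])

    def logout(self, user):
        """Forget the user's address; later messages will be buffered."""
        self.addresses.pop(user, None)

    def send(self, sender, recipient, text, direct_ok=True):
        """Decide how a message travels; returns (route, address)."""
        address = self.addresses.get(recipient)
        if address is None:
            # Recipient is offline: hold the message until the next login.
            self.mailboxes[recipient].append((sender, text))
            return ("buffered", None)
        if direct_ok:
            # Assumed case: both clients are reachable, so the sender can
            # open a client-to-client connection to this address.
            return ("direct", address)
        # Assumed case: a firewall blocks the direct path, so the server
        # carries the message itself.
        return ("relayed", address)
```

In this sketch the server is always consulted for the lookup, but it only carries message traffic in the buffered and relayed cases, which is where the scaling advantage of client-to-client communication would come from.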
Napster is another example of a hybrid system. Napster’s file sharing is decentralized: one Napster client downloads a file directly from another Napster client’s machine. But the directory of files is centralized, with the Napster servers answering search queries and brokering client connections. This hybrid approach seems to scale well: the directory can be made efficient and uses low bandwidth, and the file sharing can happen on the edges of the network.
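The split between a central directory and edge-to-edge transfer can be sketched as follows. This is an illustration of the hybrid idea only, not Napster's actual protocol; `CentralDirectory`, `Peer`, and `download` are hypothetical names, and an in-memory method call stands in for the direct network transfer between clients.

```python
class CentralDirectory:
    """Hypothetical central index: file name -> peers that hold the file."""

    def __init__(self):
        self.index = {}

    def register(self, peer_address, filenames):
        """A client announces the files it is willing to share."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer_address)

    def unregister(self, peer_address):
        """A client leaves; drop it from every listing."""
        for holders in self.index.values():
            holders.discard(peer_address)

    def search(self, term):
        """Return {file name: sorted holders} for names containing term."""
        term = term.lower()
        return {
            name: sorted(holders)
            for name, holders in self.index.items()
            if term in name.lower() and holders
        }


class Peer:
    """Hypothetical client that stores files and serves them directly."""

    def __init__(self, address, files):
        self.address = address
        self.files = dict(files)  # file name -> content bytes

    def serve(self, name):
        return self.files[name]


def download(directory, peers_by_address, term):
    """Search the central index, then fetch from a holder directly."""
    results = directory.search(term)
    if not results:
        return None
    name = min(results)                    # pick a match deterministically
    holder = results[name][0]
    # The directory only brokered the match; the bytes come from the peer.
    return peers_by_address[holder].serve(name)
```

Only file names and addresses pass through the directory here, which is consistent with the observation that the central component can stay low-bandwidth while the bulk data moves at the edges.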
In practice, some applications might work better with a fully centralized design, not using any peer-to-peer technology at all. One example is a search on a large, relatively static database. Current web search engines are able to serve up to one billion pages all from a single place. Search algorithms have been highly optimized for centralized operation; there appears to be little benefit to spreading the search operation out on a peer-to-peer network (database generation, however, is another matter).
Also, applications that require centralized information sharing for accountability or correctness are hard to spread out on a decentralized network. For example, an auction site needs to guarantee that the best price wins; that can be difficult if the bidding process has been spread across many locations. Decentralization engenders a whole new area of network-related failures: unreliability, incorrect data synchronization, etc. Peer-to-peer designers need to balance the power of peer-to-peer models against the complications and limitations of decentralized systems.
One of the stranger phenomena in the current Internet is the abuse of port 80, the port that HTTP traffic uses when people browse the Web. Firewalls typically filter traffic based on the direction of traffic (incoming or outgoing) and the destination port of the traffic. Because the Web is a primary application of many Internet users, almost all firewalls allow outgoing connections on port 80 even if the firewall policy is otherwise very restrictive.
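A filter of this kind can be sketched in a few lines. The `Rule` and `PortFirewall` names below are hypothetical and do not correspond to any particular firewall product; the sketch assumes a policy that denies everything except what a rule explicitly allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical filter rule keyed on direction and destination port."""
    direction: str   # "in" or "out"
    dest_port: int
    allow: bool


class PortFirewall:
    """Hypothetical firewall that sees only direction and destination port."""

    def __init__(self, rules, default_allow=False):
        self.rules = list(rules)
        self.default_allow = default_allow

    def permits(self, direction, dest_port):
        # The first matching rule wins; otherwise fall back to the default.
        for rule in self.rules:
            if rule.direction == direction and rule.dest_port == dest_port:
                return rule.allow
        return self.default_allow


# A restrictive policy whose only opening is outgoing web traffic.
firewall = PortFirewall([Rule("out", 80, True)])
web_allowed = firewall.permits("out", 80)       # outgoing web connection
other_allowed = firewall.permits("out", 6699)   # arbitrary non-web port
inbound_allowed = firewall.permits("in", 80)    # incoming connection
```

Note that `permits` never examines what is actually sent over the connection: any application that connects outward to port 80 receives the same answer as a web browser.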
In the early days of the Internet, the port number usually indicated which application was using the network; the firewall could count on port 80 being only for Web traffic. But precisely because many firewalls allow connections to port 80, other application authors started routing traffic through that port. Streaming audio, instant messaging, remote method invocations, even whole mobile agents are being sent through port 80. Most current peer-to-peer applications have some way to use port 80 as well in order to circumvent network security policies. Naive firewalls are none the wiser; they are unaware that they are passing the exact sorts of traffic the network administrator intended to block.
The problem is twofold. First, there is no good way for a firewall to identify what applications are running through it. The port number has already been circumvented. Fancier firewalls can analyze the actual traffic going through the firewall and see if it is a legitimate HTTP stream, but that just encourages application designers to masquerade as HTTP, leading to an escalating arms race that benefits no one.
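To make the arms race concrete, here is a minimal sketch of the kind of check a traffic-inspecting firewall might apply: does the start of the stream look like an HTTP/1.x request line? The function name `looks_like_http` and the specific pattern are assumptions for illustration; real inspection is considerably more involved.

```python
import re

# Assumed heuristic: a known method, a target, and an HTTP/1.x version,
# terminated by CRLF, at the very start of the stream.
REQUEST_LINE = re.compile(
    rb"^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE) \S+ HTTP/1\.[01]\r\n"
)


def looks_like_http(first_bytes):
    """Return True if the stream begins with a plausible HTTP request line."""
    return REQUEST_LINE.match(first_bytes) is not None


# An ordinary web request passes the check.
browser = b"GET /index.html HTTP/1.0\r\n\r\n"

# A raw binary protocol sent to port 80 is caught.
raw_protocol = b"\x00\x01\x02 not http at all"

# The same non-web payload wrapped in an HTTP-shaped request passes again.
# The path "/tunnel" and host name are made up for this example.
wrapped = b"POST /tunnel HTTP/1.1\r\nHost: peer.example\r\n\r\n" + raw_protocol
```

Under this heuristic, `looks_like_http` accepts `browser` and `wrapped` but rejects `raw_protocol`. Wrapping the payload costs the application designer only a few header bytes, so each stricter check invites a slightly more convincing disguise.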
The second problem is that even if an application has a legitimate reason to go through the firewall, there is no simple way for the application to request permission. The firewall, as a network security measure, is outmoded. As long as a firewall allows some sort of traffic through, peer-to-peer applications will find a way to slip through that opening.