4BA2

leaflet

Technology Survey

BitTorrent

by Stephen Knox, Diarmaid O'Cearuil, Nicola Scott Holland and Ljiljana Skrba

Introduction

BitTorrent on its own is a protocol for downloading files. When used in conjunction with directory-style websites (the most famous being the now-defunct Suprnova), BitTorrent becomes a powerful tool that lets its users share large files. The users download the files from each other, but they rely on a centralised system in order to do this. The Overhaul protocol, however, is an example of a more decentralised application.

File Sharing (Pre - BitTorrent)

A computer network is a system for communication among two or more computers, allowing traffic to be exchanged back and forth between them. The Internet is an interconnected system of networks that connects computers around the world via the TCP/IP protocol. [1]

One of the most popular uses of the Internet is file downloading. Initially, users typically downloaded files from a central server. This method was restricted by the number of people attempting to download files from the server at once.
Napster brought the P2P method of downloading to the mainstream. In this paradigm, a user would view and download files from other Napster users (generally known as peers) after being connected to them through a central server. P2P file-sharing is a more efficient way of downloading high-bandwidth material like music and video. The increase in efficiency can be attributed to each member of the network using some of their own upload bandwidth to share the files with others, maximizing the speed at which downloads occur.

In order for a P2P application to be successful, the system should adhere to several criteria [2]:

  1. It should have a high availability of different files
  2. The content should be of good quality - i.e. the network should not be polluted with hoax files [3]
  3. Flash crowds should be efficiently dealt with
  4. High download speeds should be available

File sharing applications which followed Napster's lead eventually evolved into decentralised systems with no central server (hence they could not be shut down by the authorities as Napster was). Examples of these applications are Kazaa and Morpheus. However, there were limits to the success of this file-sharing paradigm. These systems were afflicted by "leeching", which is when a user downloads but refuses to upload. Most importantly, a user could only download as fast as the user they were downloading from could upload. This method does not take advantage of broadband, where users have far greater download capacity than upload capacity.

The BitTorrent Protocol

Illustration of the BitTorrent Protocol
The green blocks represent fully downloaded chunks, available for upload. The red blocks are the blocks currently being downloaded. Each peer can upload only two blocks at a time.

In BitTorrent, a file is split into chunks, typically of the order of a thousand chunks per file. To download a complete file, a user downloads different chunks of the desired file from other users. The chunks are not downloaded sequentially; instead, the order is based on the rarity of each chunk at that time. When all the chunks have been downloaded, they are reassembled and, et voila, the user has their file. This method of splitting a file into many pieces greatly facilitates the sharing of large files such as MPEGs and software applications. In fact, one of the original applications of BitTorrent was to share GNU/Linux software.[4]
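
The rarity-based ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary of peers and the tie-breaking rule (lowest index) are ours and are not taken from any BitTorrent client, which would more likely break ties at random.

```python
from collections import Counter


def pick_rarest_chunk(have, peer_chunks):
    """Return the index of the rarest chunk we still need, or None.

    have        -- set of chunk indices we already hold
    peer_chunks -- dict mapping a (hypothetical) peer name to the set of
                   chunk indices that peer holds
    """
    # Count how many peers hold each chunk.
    availability = Counter()
    for chunks in peer_chunks.values():
        availability.update(chunks)

    # Only chunks we lack, and that some peer can supply, are candidates.
    candidates = [c for c in availability if c not in have]
    if not candidates:
        return None

    # Fewest holders first; ties go to the lowest index so that the
    # result is deterministic (an assumption made for this sketch).
    return min(candidates, key=lambda c: (availability[c], c))


peers = {
    "peer_a": {0, 1, 2},
    "peer_b": {0, 1},
    "peer_c": {0, 3},
}
print(pick_rarest_chunk({0}, peers))  # prints 2
```

Chunks 2 and 3 are each held by a single peer, so either is "rarest"; the sketch returns 2 only because of its tie-breaking rule.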

A seed is a computer that has a complete copy of a particular file, whereas a peer is one that has a partial copy. In order to download this file, one must simply procure the .torrent file.
.torrent refers to the metadata available from a web server about the file you wish to download - typically the filename, the size, the hash of each block in the file (which allows users to make sure they are downloading the real thing) and the address of a tracker server. The .torrent file is sent to the downloader's computer when they click on a link, and it can then be used for downloading via BitTorrent. When a client finishes downloading, it will remain open, and so keep sharing the file, until it is closed or the 'finish' button is clicked.[5]
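
How the per-block hashes let a downloader check that it is getting "the real thing" can be sketched as follows. SHA-1 is the hash used by the original protocol; the helper names and the tiny chunk size are our own choices for illustration, and the sketch does not parse a real .torrent file.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    """Cut a byte string into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data, chunk_size):
    """Hashes the publisher would list in the metadata, one per chunk."""
    return [hashlib.sha1(c).digest() for c in split_into_chunks(data, chunk_size)]


def verify_chunk(index, chunk, expected_hashes):
    """True if a received chunk matches the hash published for that index."""
    return hashlib.sha1(chunk).digest() == expected_hashes[index]


original = b"abcdefghij"
hashes = chunk_hashes(original, 4)           # three chunks: abcd, efgh, ij
print(verify_chunk(1, b"efgh", hashes))      # prints True
print(verify_chunk(1, b"efgX", hashes))      # prints False
```

A chunk that fails the check would simply be discarded and requested again from another user.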

The shared file itself is not stored on the website; the .torrent file merely points the client to a tracker server. The tracker stores a global registry of all the downloaders and seeds of the corresponding file. When a user wishes to start downloading, they click on a link from the website to a .torrent file. The tracker server responds to the user's request by sending back a list of other users that have (part of) the file. A direct connection is set up between the users after they have bartered for the file. In general, if a user has high upload rates they will be allowed high download speeds. If a user has completed downloading a file and stays online, they too become a seed for the file.
Should a tracker server go offline while a user is downloading, the process is not affected, as they are downloading from a peer - independent of the server. However, the user will not be able to commence any new downloads.[5]
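
The tracker's role as a registry can be sketched with a small in-memory class. This is a toy model only: the class and method names are hypothetical, and it does not speak the real tracker protocol, which runs over HTTP.

```python
class Tracker:
    """Toy registry of who is sharing which file (not the real protocol)."""

    def __init__(self):
        # torrent id -> {peer address: True if that peer has the whole file}
        self._swarms = {}

    def announce(self, torrent_id, peer_addr, complete=False):
        """Register a peer and return the other peers known for this file."""
        swarm = self._swarms.setdefault(torrent_id, {})
        others = sorted(addr for addr in swarm if addr != peer_addr)
        swarm[peer_addr] = complete
        return others

    def counts(self, torrent_id):
        """Return (number of seeds, number of partial downloaders)."""
        swarm = self._swarms.get(torrent_id, {})
        seeds = sum(1 for complete in swarm.values() if complete)
        return seeds, len(swarm) - seeds


tracker = Tracker()
first = tracker.announce("some-file", "10.0.0.1:6881", complete=True)
second = tracker.announce("some-file", "10.0.0.2:6881")
print(first)    # prints []
print(second)   # prints ['10.0.0.1:6881']
```

Once the second user holds the returned list, the transfer itself runs directly between the two machines, which is why a download already in progress survives the tracker going offline.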

The above graph (our representation of Figure 7 of The Register's study [2]) deals with another aspect of BitTorrent. Here the horizontal axis represents the number of seeds for a file after its injection into mainstream traffic. The vertical axis shows how long the file needs to stay available so that a given number of users can download it.

As the number of seeds increases, the lifetime of the file dramatically declines. This exemplifies the need for users to become seeds, in order to reduce download times. The attraction of becoming a seed is low because all upload capacity is used for distribution of one file. However, as the number of seeds increases, not only does each user need to remain a seed for a shorter time, but the bandwidth each one uses for uploading the file is also reduced. Even though the number of seeds for a particular file decreases as time goes by, the file remains available so long as there is at least one seed.
When no seeds are available, a user with the complete file must come in and act as the new seed. This is known as reseeding.
A swarm is the term given to a group of computers, potentially including both seeds and peers, that are connected for a particular file. If a swarm of peers has a complete copy between them, but none of them possesses it individually, this is said to be a distributed copy. [5]
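
Whether a swarm holds a distributed copy can be checked by combining the chunk sets of its members. The sketch below assumes each member's holdings are known as a set of chunk indices; the function name and the example swarms are ours.

```python
def has_distributed_copy(num_chunks, swarm):
    """True if no single member has every chunk, but together they do.

    num_chunks -- total number of chunks in the file
    swarm      -- dict mapping a (hypothetical) member name to the set of
                  chunk indices it holds
    """
    all_chunks = set(range(num_chunks))

    # If any member is a seed, the copy is not merely "distributed".
    if any(chunks >= all_chunks for chunks in swarm.values()):
        return False

    # Otherwise, every chunk must be held by at least one member.
    combined = set().union(*swarm.values())
    return combined >= all_chunks


print(has_distributed_copy(4, {"a": {0, 1}, "b": {2, 3}}))   # prints True
print(has_distributed_copy(4, {"a": {0, 1}, "b": {2}}))      # prints False
print(has_distributed_copy(4, {"a": {0, 1, 2, 3}}))          # prints False
```

In the second case chunk 3 is missing from the whole swarm, so nobody can finish until a user with the complete file reseeds.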

Choked is a term from the BitTorrent protocol for the state of an uploader that refuses to send anything on a given link. It usually happens when a client has too many simultaneous uploads, i.e. a limit is set on the number of uploads a computer can perform concurrently, and when another is requested it will be denied. Interested is another term, which specifies that a downloader wants something from an uploader. It is used when the choked flag is in use, to let it be known that a connection is wanted whenever possible.
Snubbed is a term from the protocol which indicates that a client has not received anything for a substantial period of time. It is used to improve download speeds, i.e. to let other peers know that the client has been ignored and that it should probably receive some attention.[6]
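
The three flags can be pictured as per-link state. The sketch below is a simplification under our own assumptions: the class, the field names, the 60-second snub timeout and the rule of unchoking the peers that send to us fastest (a simple reading of the bartering described earlier) are illustrative rather than taken from a real client.

```python
from dataclasses import dataclass


@dataclass
class Link:
    choked: bool = True          # we refuse to send to this peer
    interested: bool = False     # this peer wants something from us
    rate_from_peer: float = 0.0  # how fast this peer uploads to us
    last_received: float = 0.0   # time we last received data from this peer

    def can_transfer(self):
        """Data flows only if the peer is interested and not choked."""
        return self.interested and not self.choked

    def is_snubbed(self, now, timeout=60.0):
        """True if nothing has arrived from this peer for too long."""
        return now - self.last_received > timeout


def rechoke(links, max_uploads):
    """Unchoke the max_uploads interested peers that upload to us fastest."""
    interested = [name for name, link in links.items() if link.interested]
    interested.sort(key=lambda name: links[name].rate_from_peer, reverse=True)
    unchoked = set(interested[:max_uploads])
    for name, link in links.items():
        link.choked = name not in unchoked
    return unchoked


links = {
    "a": Link(interested=True, rate_from_peer=10),
    "b": Link(interested=True, rate_from_peer=50),
    "c": Link(interested=False, rate_from_peer=99),
    "d": Link(interested=True, rate_from_peer=30),
}
print(sorted(rechoke(links, max_uploads=2)))  # prints ['b', 'd']
```

Peer "c" uploads fastest but is not interested, so it gets no upload slot; peer "a" is interested but stays choked because the limit of two uploads has been reached.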

The above graph displays measurements taken during a study on the tolerance of BitTorrent to the "flash crowd effect" (our representation of Figure 5 of The Register's study [2]). The blue line shows the number of downloads of the tracker for "Lord of the rings 3" from "FutureZone.TV". The red line signifies the number of seeds for the file.

For the first five days, "FutureZone.TV" was the only seed, causing the sustained high download rates. As the number of seeds increased, the download rate from the website dropped. Notice that a small increase in the number of seeds dramatically decreases the load on the provider. From these results, it can be concluded that BitTorrent is well capable of handling large, sudden crowds.

Decentralisation

Traditionally, the BitTorrent model has relied on a small number of centralised servers to provide the trackers for specific files. There is an inherent problem with this model. If any of these servers go down, this places an extra burden on the remaining few. Also, their public nature makes them susceptible to attack. Although this has been shown to be difficult to do, it is also commonly accepted that a determined hacker usually finds a way.
When a popular new file is introduced by a seed, there can be a large surge of simultaneous downloads from this source. This is termed the "flash crowd effect". A way to avoid or at least alleviate this bottleneck is to decentralise the system. This eliminates the need for a central server, while increasing the load on client machines, as they need to track who they are downloading from and who they are uploading to. The seeding user must then put their .torrent file and tracker on offer, either by e-mail or on a website. In this scenario, a new user must then initiate their download from a different location from where the original file came. (One example of this is "Exeem". It is not as popular among the BitTorrent community, as it is closed source and installs adware on client machines.)
A possible disadvantage of this system is administration. When Suprnova was still in operation, an article was written detailing the performance of BitTorrent. [7] One test the authors carried out was to donate an account for hosting a mirror and to add spyware to the HTML code. This experiment failed: surprisingly, all the corrupted code they provided was filtered out. Unfortunately, this system of moderation relies on global components and is difficult to distribute, because in a decentralised system there are no mediators to administer the file content. Serious security issues are thus raised. A simple case to illustrate this is a client that downloads a file, and so is registered as having that file, and then gives its name to a file containing corrupted data. Since the infrastructure is not there to oversee the file sharing, this file may well infect many machines. Another problem that may arise is obtaining the torrent. There is no longer a "torrent site" from which to download, so a much more difficult search is needed in order to get a torrent.

Pure Decentralisation with the Overhaul Protocol

When a file is large (a gigabyte or more), it is better to divide it into separate chunks. One way of doing this is using Overhaul [8]. Overhaul changes the HTTP behaviour of an overloaded server so that it acts like a peer-to-peer network. It splits up the requested document into n chunks. Each request results in a response that includes the ith chunk and the IP addresses of m other clients accessing the document. A signature for each of the n chunks is also provided in the header. A client supporting Overhaul connects to other clients to retrieve the remaining chunks. This saves the bandwidth that a regular fetch would use. By transferring only a small portion of the document, the Overhaul process frees up the server to satisfy requests from other clients.

The Overhaul Protocol

An example is shown above: four clients request the same document from the server. The server becomes overloaded and goes into Overhaul mode, where it splits the document into chunks and distributes these amongst the clients. The clients collaborate (using the headers of the chunks) to merge the chunks together so that each can form a coherent document. This is different to BitTorrent, which is a specialized tool for distributing large files over existing peer-to-peer networks (the Internet). Also, BitTorrent requires a dedicated tracker and meta-info file for each requested document, resulting in extra traffic to the server.
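
The four-client example can be sketched in Python. This is a rough model of the idea only, not the protocol from [8]: the class names, the round-robin choice of which chunk to hand out, the use of a SHA-1 digest as the "signature" and the rule of returning the m most recent clients are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class OverhaulResponse:
    chunk_index: int   # which of the n chunks this response carries
    chunk: bytes       # the chunk itself
    signatures: list   # one digest per chunk, for checking peers' chunks
    peers: list        # addresses of up to m other clients


class OverhaulServer:
    def __init__(self, document, n, m):
        size = -(-len(document) // n)  # ceiling division
        self.chunks = [document[i * size:(i + 1) * size] for i in range(n)]
        self.signatures = [hashlib.sha1(c).hexdigest() for c in self.chunks]
        self.m = m
        self.clients = []
        self.next_chunk = 0

    def handle_request(self, client_addr):
        """Answer with one chunk plus pointers to other recent clients."""
        i = self.next_chunk
        self.next_chunk = (i + 1) % len(self.chunks)
        start = max(0, len(self.clients) - self.m)
        peers = self.clients[start:]
        self.clients.append(client_addr)
        return OverhaulResponse(i, self.chunks[i], self.signatures, peers)


server = OverhaulServer(b"abcdefgh", n=4, m=2)
responses = [server.handle_request(f"10.0.0.{k}") for k in range(1, 5)]

# Between them, the four clients now hold every chunk of the document.
print(b"".join(r.chunk for r in responses))  # prints b'abcdefgh'
print(responses[3].peers)                    # prints ['10.0.0.2', '10.0.0.3']
```

The server sends each client only a quarter of the document; the clients would fetch the remaining chunks from the listed peers and check each one against the signatures before merging.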

Conclusion

As can be seen, BitTorrent may be ushering in a new paradigm in downloading. At the moment it is the cheapest and one of the fastest ways to share large files over the mainstream medium - the Internet. But, as with everything, there are disadvantages. One of the less attractive features of BitTorrent is that a user in turn has to become a seed and stay online to share their copy of the requested file. This can tie up one's upload bandwidth with unwanted traffic. Whatever disadvantages lie in BitTorrent, if it is used properly, these are quite acceptable.

References

  1. Dictionary.com/Internet
    URL: http://dictionary.reference.com/search?q=internet [10 February 2005]
  2. The BitTorrent P2P file-sharing system
    URL: http://www.theregister.co.uk/2004/12/18/bittorrent_measurements_analysis/ [11 February 2005]
  3. [Liang J., Kumar R., Xi Y., Ross K.] Pollution in P2P file sharing systems.
    In IEEE Infocom, Miami, FL, USA, March 2005.
  4. BitTorrent - Wikipedia
    URL: http://en.wikipedia.org/wiki/BitTorrent [10 February 2005]
  5. Brian's BitTorrent FAQ and Guide
    URL: http://www.dessent.net/btfaq/
  6. Bit Torrent: Protocol Specification
    URL: http://bitconjurer.org/BitTorrent/protocol.html [10 February 2005]
  7. The Bittorrent P2P File-sharing System: Measurements and Analysis
    URL: http://p2pnet.net/story/3292 [14 February 2005]
  8. [Patel J., Gupta I.] Overhaul: Extending HTTP to Combat Flash Crowds
    URL: http://www-courses.cs.uiuc.edu/~cs598ig/wcw2004.pdf