Common file systems allow us to run the Internet on a larger scale. Without them, e-commerce sites such as www.ebay.com would be close to impossible to operate. These sites operate on large scales - thousands upon thousands of users could all be making transactions simultaneously on a variety of platforms, from Windows 98 through MacOS 8 to one of the many flavours of Linux. The remote servers at these sites need to be capable of serving all these transactions without having to tell the user "Sorry Server is Busy - Come Back Later Please". If users on an e-commerce site cannot get what they want when they ask for it, they are likely to go elsewhere - and it is essential that this does not happen.
How do we prevent this? How do we ensure that we can handle all our requests? Essentially, this requires multiple server machines to deal with the requests without the user, the client, having to worry about such details. Let's look at a simple example:
Imagine a user is visiting an auction site. They see an offer that interests them, "Slightly Used Kettle - $5", and decide to bid on it. Their bid of $6 is forwarded to the web site where, due to the size of the site, it is passed to whichever server machine is free to handle the request, and that server records the bid. However, someone else can come along a second after the bid is confirmed, be directed to a different server (the original one perhaps being busy processing some other bid) and not realise that the kettle is now $6, because this server has not been told of the updated price (which is held only on the first server). Two people now have a $6 bid they feel is rightfully theirs - one is valid, the other is not. Both customers will end up angry and take their business elsewhere. A solution is needed to manage these situations that span multiple computers - enter common file systems.
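The kettle scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BidServer` class and the `"kettle"` key are hypothetical names invented for the example, and the sketch ignores real concerns such as networking and concurrent access. It simply contrasts servers that each hold a private copy of the price with servers that share one store.

```python
class BidServer:
    """Hypothetical auction server that records bids in a price store."""

    def __init__(self, prices):
        # 'prices' maps an item name to its current highest bid.
        self.prices = prices

    def place_bid(self, item, amount):
        # Accept the bid only if it beats the price this server knows about.
        if amount > self.prices[item]:
            self.prices[item] = amount
            return True
        return False


# Case 1: each server has its own private copy of the data.
server_a = BidServer({"kettle": 5})
server_b = BidServer({"kettle": 5})
isolated = (server_a.place_bid("kettle", 6), server_b.place_bid("kettle", 6))

# Case 2: both servers read and write the same shared store.
shared_store = {"kettle": 5}
server_c = BidServer(shared_store)
server_d = BidServer(shared_store)
shared = (server_c.place_bid("kettle", 6), server_d.place_bid("kettle", 6))

print(isolated)  # both bids accepted: two users believe they lead at $6
print(shared)    # only the first bid accepted
```

With private copies both $6 bids are accepted; with a shared store the second bid is rejected because the second server sees the updated price.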
Common file systems provide a way for us to spread our files over a number of different machines while, importantly, giving the appearance that they are all on one machine. They allow us to look up files using a single name space. So we would find our kettle, item 65604, at /items/kitchenware/kettles/65604/ no matter where the actual information is stored (in terms of physical location on a hard disk). The path is a logical path: we use it to access the file, and the file system, working in the background, resolves it into the physical address of the actual file using whatever techniques that file system employs. Common file systems are built on the distributed systems concept (information is spread over multiple computers, transparently, so that we do not have to care about this fact), and this allows the physical location of a file to be moved without changing its logical path. Think of how important this could be for an application. If we are accessing remote data and updating a file, we need to be sure we are accessing the correct file. What if the administrator at the remote end has to change the computer on which the information is stored? Traditionally, our client would then be looking in the wrong location. With a logical address, however, the server end of the common file system can simply be adjusted so that the logical address maps to the new physical address, and the client need never worry. The logical path stays the same and only its interpretation changes. This issue is an important one, and common file systems take care of it.
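The separation between logical path and physical location described above can be sketched as a simple lookup table. This is a minimal illustration, not how any particular file system implements it: the `NameSpace` class, the host names `server-a` and `server-b`, and the physical paths are all hypothetical.

```python
class NameSpace:
    """Hypothetical single name space: logical path -> (host, physical path)."""

    def __init__(self):
        self._table = {}

    def bind(self, logical_path, host, physical_path):
        # Create or update the mapping; clients never see this step.
        self._table[logical_path] = (host, physical_path)

    def resolve(self, logical_path):
        # Clients only ever supply the logical path.
        return self._table[logical_path]


ns = NameSpace()
logical = "/items/kitchenware/kettles/65604/"

# Initially the kettle's data lives on one (made-up) machine.
ns.bind(logical, "server-a", "/disk1/data/65604")
before = ns.resolve(logical)

# The administrator moves the data; only the server-side mapping changes.
ns.bind(logical, "server-b", "/vol7/auctions/65604")
after = ns.resolve(logical)

print(before)
print(after)
```

The client uses the same logical path before and after the move; only the table maintained at the server end is updated.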
However, there are other things to consider. What if there are permissions involved? We need a mechanism for conveying these to the user so that the wrong user cannot overwrite an important file. What if specific information changes on the remote server (a message relevant to display details, for example)? By what means do we tell the client this, rather than leaving them to find out by accident or, even worse, not know at all and so make a critical mistake? What about security? Is there a mechanism to ensure that the file we are sending cannot be intercepted halfway through and false information sent under our "name"? We might conceivably think of a way to handle this on the scale of a local area network, but what about the large global scale that the Internet allows? The file systems that deal with such scales are the ones we set out to examine.
There are many examples of common file systems, each providing a different approach to or take on the requirements. For example, we have SFS (Self-Certifying File System), a global file system which concentrates on strong security over untrusted networks. We have the Coda file system (whose roots are in the Andrew File System), which advertises features like support for mobile computing and the fact that it is free and distributed under a liberal license. We even have software, such as TotalNET Advanced Server created by Syntax, which provides a common file system by reconciling the different protocols used by different machines (protocols essentially dealing with how one machine talks to another). For our introduction to common file systems we have concentrated on three file systems in particular:
* AFS - The Andrew File System enables cooperating hosts (both client and server) to share file system resources across the network. It was developed at Carnegie Mellon University. It is a robust system, which helped spawn other systems such as the Coda file system. Due to its incompatibility with UNIX file systems (the most common type of file system for server data) it has not found wide favour in the public domain, but it has achieved more noticeable success among universities (providing, as it does, good security).
* WebNFS - WebNFS is Sun's answer to the common file system problem. It is a descendant of NFS (Network File System) and was designed to allow users to access files stored on the Internet and on intranets. It has seen success through its support in the Netscape suite of Internet tools, as well as through use by companies such as Oracle Corp., Spyglass, IBM, Apple Computer Inc., Auspex Systems and Novell Corporation.
* CIFS - The Common Internet File System is Microsoft's offering to the world of common file systems. CIFS is intended to provide an open, cross-platform mechanism for client systems to request file services from server systems over a network, with features for: file access, file and record locking, safe caching, file change notification, protocol version negotiation, extended attributes (beyond just file permissions), distributed replicated virtual volumes (as per the bidding example above), server name resolution independence, batched results and Unicode file names (thus not limiting us to ASCII names, i.e. we can use character sets beyond the standard European character set). Being built on an existing technology (the SMB protocol) means it could quickly achieve public acceptance among those familiar with the original technology, and this is the principal reason it found support with Novell, Banyan, IBM and Digital among others. CIFS is an important one in terms of e-commerce sites as it is embedded into the highly popular Microsoft suite of software (the Windows line for example, as well as support in Internet Explorer, the leading Web browser at the time of writing). Additionally, CIFS has specific capabilities for dealing with low-speed dial-up connections, which are currently used by a large percentage of e-commerce customers and will be until higher-bandwidth connections become more common.
In addition to looking at these systems, we will also look at examples of famous outages: occasions when common file systems failed, what the repercussions were and, more importantly, what was done to prevent a similar occurrence.
The three chosen file systems will be examined under a number of different criteria. These criteria were chosen as the most important ones for judging a file system's viability for e-commerce. Together they encompass nearly all the features expected of Internet-based file systems.
- Origins: A brief history of the file system and its roots. Who made it, and did they base it on existing concepts and technologies or was it an entirely new form?
- Central Design Strategy: What was the thinking behind this file system? What did it have to do, in technical terms?
- Scalability: How many users can this system handle? Is it built with the idea of 1,000 users accessing it or a potentially unlimited audience? If a system starts to falter at 15,000 users then it may not be suited to the needs of a very large e-commerce site. Additionally, increasing the size of the system should not mean that the old system is made redundant.
- Naming Convention: How do we access files in this system? What is the hierarchy? Are there features such as location-transparent access - does it seem as though the files are stored on our computer when they may actually exist on a computer a thousand miles away? The way the file structure is viewed by the client could be of vital importance - whether everyone has the same view is another question that may need to be asked.
- Security: How trustworthy is this file system? Can we safely transmit sensitive data? What guarantees are in place that our data will not be tampered with, and what notifications do we receive to this end? How do we allow some users access to certain domains or "areas" within the file system while restricting others? The implications for e-commerce are obvious: we could be sending information like credit card numbers and dare not risk compromising the customer's details.
- Caching/Versioning: How is the problem of different versions of files addressed? How do we ensure that the file we get after a request is the correct, up-to-date version if there are multiple copies of the files? How do we ensure that our version just saved will be recognised as the latest, most up-to-date correct version of the file?
- Server Replication: A large site will need server replication to allow it to handle large volumes of requests simultaneously, and possibly multiple requests of exactly the same type. How easy is it to have multiple copies of the data and how easily can we deal with them? When our main server crashes, how will a backup server come into play and how will it notify everyone to transmit their data to it from now on? This links into issues of caching and versioning, to ensure all the information stored is the correct and most recent version. Additionally, issues such as placing some data on the server most suitable to a client's current connection, to speed up the process, could come into play here.
- Administration: Is the file system easy to administer? Can an administrator easily adapt to it from any particular file system background or must they adjust to an entirely new way of thinking about file systems? When making changes, must the whole system be taken off-line or can the administrator make changes while the system stays up? The site www.ebay.com had to go down once a week for maintenance purposes - such events can only have a negative effect on customers.
- Architecture/Platform Support: Is the file system cross-platform? Would it run better and faster on a UNIX system than on a Windows NT Server based system? Is it unable to work with a particular technology at all? Picking a file system that is suited to a business' current technology is a vital decision in terms not only of usability but also of budget - a system that fails to support both UNIX and Windows would automatically be rendered unviable for an e-commerce site.
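To make the Caching/Versioning criterion above more concrete, here is a minimal Python sketch of one possible approach: the server keeps a version number per file, and a client checks that number before trusting its cached copy. This is an assumption made purely for illustration. The `FileServer` and `CachingClient` classes are hypothetical, and the file systems examined here each use their own mechanisms, which may differ considerably.

```python
class FileServer:
    """Hypothetical server that stores a version number with each file."""

    def __init__(self):
        self._files = {}  # path -> (version, data)

    def write(self, path, data):
        # Every write bumps the version so stale copies can be detected.
        version = self._files.get(path, (0, None))[0] + 1
        self._files[path] = (version, data)
        return version

    def version(self, path):
        return self._files[path][0]

    def read(self, path):
        return self._files[path]


class CachingClient:
    """Hypothetical client that revalidates its cache on every read."""

    def __init__(self, server):
        self.server = server
        self._cache = {}   # path -> (version, data)
        self.fetches = 0   # counts full fetches from the server

    def read(self, path):
        cached = self._cache.get(path)
        # Fetch again only if nothing is cached or the version has changed.
        if cached is None or cached[0] != self.server.version(path):
            self._cache[path] = self.server.read(path)
            self.fetches += 1
        return self._cache[path][1]


server = FileServer()
client = CachingClient(server)
path = "/items/kitchenware/kettles/65604/"

server.write(path, "$5")
first = client.read(path)    # fetched from the server
second = client.read(path)   # served from the cache, version unchanged
server.write(path, "$6")     # another user's bid updates the file
third = client.read(path)    # stale cache detected, fetched again
```

The design trades a small version check on every read for the guarantee that a stale cached price is never shown; other designs have the server notify clients of changes instead.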
We feel that with these criteria we can successfully examine file systems and provide a good introduction to some of the file systems behind the World Wide Web.