Many distributed services require or benefit from data replication. Data replication in a distributed system can improve reliability, availability, local autonomy, ease of load balancing, and data access performance. Many data intensive services offered over the network are likely to need to store multiple copies of the data they require across different machines. Much of that data will be in the form of files.
The needs of many higher level services have much in common. They need to assure that all replicas receive the latest version of the data in a timely manner; they need to permit updates from different sources (at least for availability purposes); they need to deal with problems of multiple concurrent updates; they need to provide control over the creation, deletion, and movement of data replicas; and they need a method of accessing particular replicas. This commonality of needs suggests that a common middleware service providing file replication is better than ad hoc solutions for each higher level service.
An ideal middleware file replication service would have several characteristics:
Rumor is a file replication service designed to meet this middleware need. It has the necessary characteristics described above. It is a user-level, rather than kernel-level, service, making distribution, installation, and porting easier. It uses a simple model of replication and operation, yet provides correct results even in tricky cases. It has built-in security features, and will receive further security analysis in the near future. Its operation is relatively easy to explain and understand.
Rumor is an optimistic file replication service that allows its users to maintain file replicas on different machines. Since Rumor is optimistic, updates are permitted at any time on any machine to any user who has permission to write the file. Updates are propagated periodically by a reconciliation service which can propagate updates either on a regular schedule or on user demand. Rumor is implemented at the user level, with no new kernel services required for basic operation. Rumor includes a secure data transport service.
The remainder of this paper describes the design and implementation of Rumor, and provides examples of how Rumor could be used as middleware in some important applications.
Rumor replicates subtrees of a hierarchical file space. Rumor permits multiple replicas of each subtree spread across arbitrary machines. There are no kernel hooks in the basic Rumor service, and no other mechanisms to trap file-related system calls (such as opens and writes), so normal file operations are totally unaffected by Rumor. For example, an open of a file replicated by Rumor uses exactly the same code as an open of a non-replicated file. This design implies that Rumor has no opportunity to trap each open or write of a file, and thus cannot store records for each file modification. It also implies that Rumor adds no overhead to normal requests.
Rumor maintains replica consistency by running a user-level process called reconciliation. This process examines all files in the replicated subtree to determine which of them have changed since the last reconciliation process. Rumor maintains a database on the state of its files that is used in this process. By examining information obtained from each file, such as its modification time, and comparing that information to the previously stored value, Rumor can determine precisely which files were changed. The reconciliation process can package this information for use by another replica, which runs a second process that integrates changes from this replica into its local copies of the files.
Other services such as Laplink and Reconcile have attempted to provide mechanisms to support file replication, but they have tended to either overlook the hard cases or solve them sub-optimally. One such case is concurrent updates to a single file. Despite the number of replicas of the file, and the number of them that have received concurrent updates, Rumor is able to determine precisely which replicas have been concurrently updated, which replicas have already received updates from other replicas, and which resulting conflicts have already been dealt with. Rumor automatically resolves conflicting updates to directories and has a mechanism for invoking user-specified programs to perform application-specific conflict resolution.
Concurrent update is not the only tricky case. Laplink and Reconcile, for example, do not fully solve the generic problem of create/delete ambiguities. File renames, creation of different files with the same name, and unusual patterns of moves, name creations, and deletions can also lead to situations that are tricky to recognize and to deal with. The methods Rumor uses to detect these cases are strongly dependent on what information about files the operating system exports to the user. The existing implementation makes use of very standard information most Unix systems make available, but Windows or Macintosh implementations would use different methods. But the algorithms for handling the cases when detected are platform-independent. These algorithms were largely derived from the Ficus file system.
Rumor processes can be run under the identity of the user replicating the files or under a more privileged identity. If they are run under the user's identity, they can only assign file ownership to the user running the process. In such modes of operation, files replicated under Rumor have each file's replica owned by the local user. For example, file ``foo'' stored at sites 1, 2, and 3 by users A, B, and C, respectively, would be owned by user A on site 1, user B on site 2, and user C on site 3. If the Rumor processes are running as unprivileged users, there is no other option.
If Rumor processes are run by a privileged user, like the Unix superuser, more complex file ownership patterns are possible. Assigning ownership of a file replica to a remote user who does not otherwise have an account on the local machine is a complex problem, however. This issue will be addressed by future Rumor research.
File replication is a generally useful service that is a component of many other higher-level services. Rumor provides a fairly basic file- replication service that others can profitably build on. This section presents three examples of how Rumor can be used as middleware.
The Truffles project at UCLA and Trusted Information Systems investigates providing secure services to cooperating users across networks, such as replication of a set of files. This problem may seem to be no more than slapping a secure data transport layer on a service like Rumor, but the full practical solution of the problem is much more complex. In addition to securely transporting the data, there are significant problems involved in guaranteeing secure setup of the sharing relationship, secure distribution of cryptographic keys, reasonable methods for controlling membership in file sharing relationships, and mechanisms for administering and troubleshooting an ongoing sharing relationship. Truffles' file sharing service is actually a fine example of the difference between a middleware product like Rumor and a full user-level service. Rumor is an important component of the Truffles file system solution, but it is far from the only component.
Another potential use of Rumor for middleware is support of replicated World Wide Web servers. NCSA has implemented a method for replicating Web servers that is relatively independent of the replication service. Their research used Andrew file replication to provide replicated servers. In certain environments, Andrew is a fine solution, but for the more general case it cannot be used, due to its requirements that all participating servers be tightly coupled. Also, the Andrew replication service uses conservative file replication, which limits the sites that can generate updates. If any Web server in the replication set is supposed to be able to update serviced Web pages, and those updates are to be propagated to all sites, an optimistic replication service like Rumor is more appropriate. Rumor would also be a better choice for replicating servers across wide areas.
UCLA, in conjunction with Locus Computing Corporation, is starting an experiment to support the use of mobile computers in medical clinical trials. One need of this experiment is the ability to have consistent data available both on disconnected mobile machines and central computers that maintain the full data for the experiment. Updates can be generated from either type of machine. Other needs include high quality graphical user interfaces, data consistency checking, and automatic connection establishment, so Rumor is not the full solution to the problem, but it can provide the basic replication service that keeps data on the portables and central computers in sync, while still allowing medical workers in the field and experimenters in the lab to make necessary updates.
Data replication, and file replication in particular, is an important middleware service for the distributed applications of the future. A user- level, highly portable, optimistic, peer-to-peer, public domain service is well-suited for filling many of these replication needs. Rumor is one such service. Rumor is being used to fill some important middleware needs in systems under development, and will be used in more such systems.