
A Report on Distributed File Systems, Name Services, and Time

Abstract

Distributed systems, name services, time, and global states are key concepts in modern computing. A distributed file system allows programs to store and access files held on remote machines: the data is stored on a server but is accessed as though it were stored locally, with a remote access protocol governing file sharing and information transfer. In server architectures, name services define control addresses and access protocols; they are used to maintain records of the authorized computers in a network and to consolidate information about resource addresses, domains, and remote object addresses. Time control is a critical factor in distributed systems because it provides consistency across the processes involved in transactions (Coulouris et al., 2005). Where distributed computing is used for transactions over the Internet, a consistent way of recording transaction timestamps is needed to build a reliable account of system access, information retrieval, and synchronization. Global states refer to the capability of distributed systems to manage information about the availability of resources, including but not limited to deadlock handling, garbage collection, and termination of processes. This report examines how these factors affect distributed computing, their impacts, the methods involved in realizing distributed computing, and the advantages and disadvantages of those methods.

Distributed Systems and Unique File Identifiers (UFIDs)

Unique File Identifiers (UFIDs) are used to store information about each file in remote storage, including path variables, that is, file locations and the directories or folders that lead to them. Distributed file systems largely follow a UNIX-like approach to data storage and access. Accordingly, UFIDs record basic information about a file such as access authorization, file location, group type and authorization, cache address, and file representation. As the name suggests, UFIDs must be unique across all file systems, since each file system has its own way of identifying files and their characteristics, including location and user groups, and different server technologies support a variety of user groups that can access the system.

Network File System (NFS)

Sun Microsystems developed the Network File System (NFS) in 1985, providing one of the first widely used remote network storage systems. NFS provides consistency in usage and maintenance, efficiency, fault tolerance, heterogeneity, and transparency. Transparency was the key concern in its design, as it allows files to be accessed remotely from a UNIX client file system through a verifiable process. This establishes a reliable client–server system capable of a symmetrical relationship between client and server, and it lays strong ground for a remote access protocol usable by any machine on the same network (Coulouris et al., 2005). An important aspect of NFS is that both the client and the server can run in a UNIX environment. An application that needs a resource held on an NFS server simply makes standard UNIX calls on the client side. The virtual file system in the UNIX kernel, in turn, sends these requests through the NFS protocol to the remote server over the network connection. On the server side, the NFS server module services these requests by making standard calls through the server's virtual file system to the local file system (Coulouris et al., 2005). The virtual file system layer is what allows the operating system to differentiate calls made to remote file systems from those made locally. The client side of NFS must therefore have the following components. First, a virtual UNIX file system, as mentioned earlier: since many requests arrive from application programs asking the UNIX kernel for resources, the virtual file system determines which of these requests are directed to the local file system and which to the remote NFS server. Second, an NFS client module: this is the component that receives the remote UNIX calls and relays them across the network to the server side of NFS. Relaying is possible because the NFS client can communicate over the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP) (Coulouris et al., 2005). Moreover, these calls are made transparently, without any need to recompile the application programs issuing the requests. A minimal sketch of this local-versus-remote dispatch follows below.
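To make the dispatch concrete, here is a minimal Python sketch of a UFID-like record and a virtual-file-system layer that routes calls either to the local file system or to an NFS client. All class, field, and mount names are hypothetical illustrations for this report, not part of any actual NFS implementation.

```python
from dataclasses import dataclass

# Hypothetical UFID record: the field names are illustrative. A real NFS
# file handle is an opaque token built from a file system identifier and
# a file number within that file system.
@dataclass(frozen=True)
class UFID:
    filesystem_id: int   # identifies the file system holding the file
    file_number: int     # file location within that file system
    generation: int      # distinguishes reuse of the same file number

class VirtualFileSystem:
    """Routes file operations to the local module or to the NFS client,
    mirroring the role of the UNIX virtual file system layer."""

    def __init__(self, local_fs, nfs_client, remote_mounts):
        self.local_fs = local_fs
        self.nfs_client = nfs_client
        # e.g. {"/usr/students": "server1"} -- hypothetical mount table
        self.remote_mounts = remote_mounts

    def read(self, path, offset, count):
        for mount_point, server in self.remote_mounts.items():
            if path.startswith(mount_point):
                # Remote file: relay the call over UDP/TCP to the server.
                return self.nfs_client.read(server, path, offset, count)
        # Local file: handled by the ordinary UNIX file system.
        return self.local_fs.read(path, offset, count)
```

The application program simply calls `read` as it would for any local file; the mount table alone decides whether the request crosses the network, which is the transparency property described above.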
Andrew File System (AFS)

The Andrew File System (AFS) was developed with the aim of minimizing client–server communication. AFS resembles NFS in that it offers transparent transmission and access between the client and the server, without the need to recompile the user programs making the requests. It was initially implemented in networks of workstations and servers running the Mach operating system and BSD UNIX. However, the design and implementation of AFS differ completely from NFS in that AFS uses file caching to achieve scalability, through two design principles: whole-file serving and whole-file caching. Whole-file serving means that files and directory contents are transmitted to the client computers in their entirety. Whole-file caching means that copies of files are stored on the client's local hard disk as a permanent cache (Coulouris et al., 2005). These methods work well together to achieve scalability in a multi-user environment. AFS operates through two software components, Venus and Vice, which run on the clients' and the servers' UNIX systems respectively. Unlike NFS, where UNIX calls are handled by a virtual file system layer, these two components receive and transmit the requests directly, while application program requests are handled by the kernel. Files requested by the client side are stored in shared or local directories on the server side, and a token promising access, the callback promise, is issued to the client. If the files are modified by any other user on the network, Vice sends a signal to Venus notifying the client that the files may have been updated or modified (Coulouris et al., 2005). Volumes are used as units that identify the files requested by different groups of users, lessening the work of locating files and moving them between local or shared directories. This concept eases file and user management, furthering the desired scalability. However, several factors limit the achievement of complete scalability with AFS. The caching method does not provide an efficient concurrency control mechanism: if multiple users access the same files in read, write, or close modes, AFS has no way of informing any of them that their last close action did not complete successfully; instead the processes close unexpectedly. Nevertheless, there have been recent modifications and improvements to distributed file systems aimed at achieving greater scalability. One such technology is Not Quite NFS (NQNFS), which adds more efficient and precise caching to the NFS protocol (Coulouris et al., 2005). Moreover, recent design approaches such as distributing file data across multiple servers have enabled fast access, storage, and retrieval of files by exploiting high-speed networks, offering a suitable system for serving large files. Finally, server-less architectures have evolved in modern times as another approach to distributed file systems; in such systems, storage and processing are shared among all the available nodes in the network. A sketch of the whole-file caching and callback mechanism appears below.
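The following is a minimal Python sketch of whole-file caching with callback promises, under the assumption of in-memory objects standing in for Vice and Venus; the class and method names (`ViceServer`, `VenusClient`, `break_callback`) are illustrative only and are not AFS's actual interfaces.

```python
class ViceServer:
    """Server side: stores files and remembers which clients hold
    cached copies (the callback promises)."""

    def __init__(self):
        self.files = {}       # file id -> contents
        self.callbacks = {}   # file id -> set of Venus clients

    def fetch(self, client, fid):
        # Whole-file serving: the entire file is shipped to the client,
        # together with a callback promise.
        self.callbacks.setdefault(fid, set()).add(client)
        return self.files[fid]

    def store(self, client, fid, contents):
        self.files[fid] = contents
        # Break the callback promise held by every other client, so
        # each one learns that its cached copy may be stale.
        for other in self.callbacks.get(fid, set()) - {client}:
            other.break_callback(fid)

class VenusClient:
    """Client side: keeps whole-file copies in a local cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # fid -> (contents, valid flag)

    def open(self, fid):
        if fid in self.cache and self.cache[fid][1]:
            return self.cache[fid][0]            # valid cached copy
        contents = self.server.fetch(self, fid)  # whole-file fetch
        self.cache[fid] = (contents, True)
        return contents

    def close(self, fid, contents):
        self.cache[fid] = (contents, True)
        self.server.store(self, fid, contents)   # write back on close

    def break_callback(self, fid):
        # Notification from the server: mark the cached copy invalid.
        if fid in self.cache:
            self.cache[fid] = (self.cache[fid][0], False)

server = ViceServer()
server.files["f1"] = "version 1"
c1, c2 = VenusClient(server), VenusClient(server)
c1.open("f1"); c2.open("f1")
c1.close("f1", "version 2")   # c2's cached copy is now marked invalid
```

Note how the sketch also exhibits the concurrency weakness discussed above: when `c1` writes back on close, `c2` is merely told its copy is stale; nothing reports whether either client's close "won".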
Name Services

Caching in name services. Every DNS server stores information about the name servers at the root of the naming hierarchy. However, since a single server may host multiple DNS names, caching is used, which also helps distinguish between the different servers. When a client or server makes a request to a DNS server, the answer is cached in order to minimize the chances of having to establish communication with that server yet again (Coulouris et al., 2005). Each cached answer carries a time to live, and a cache serves requests from an entry only for that designated period. Once the time to live expires, the cached entry is deemed stale and contact must be re-established with the domain name server. A minimal sketch of such a cache appears at the end of this section.

Domain name servers providing multiple answers to a single name lookup. DNS servers may provide multiple answers to a single name lookup, depending on the level of the domains. A query may return two answers naming two name servers, say dns0 and dns1 (Coulouris et al., 2005), and may also return the mail records corresponding to the domain. This is particularly important because it rules out the chance of a domain having conflicting names for its name servers, or names that do not correspond to its mail server records. Once a name has been registered, the same name cannot be reused in the domain name service for servers other than the ones designated.
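Below is a minimal Python sketch of the time-to-live behaviour described above: cached answers are served until their TTL expires, after which a fresh lookup against a name server is required. The names and TTL value are hypothetical, and the sketch reuses the dns0/dns1 example from the text.

```python
import time

class DnsCache:
    """Caches name-to-record answers until their time to live expires."""

    def __init__(self):
        self.entries = {}   # name -> (records, expiry time)

    def put(self, name, records, ttl_seconds):
        self.entries[name] = (records, time.time() + ttl_seconds)

    def lookup(self, name):
        entry = self.entries.get(name)
        if entry is None:
            return None
        records, expiry = entry
        if time.time() > expiry:
            del self.entries[name]   # expired: must re-contact a name server
            return None
        return records

cache = DnsCache()
# A single lookup may yield multiple answers for one name, e.g. two
# name servers plus mail records -- hypothetical values.
cache.put("example.org",
          ["dns0.example.org", "dns1.example.org", "mail.example.org"],
          ttl_seconds=3600)
print(cache.lookup("example.org"))
```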
Time and Computer Clock Synchronization

Need for clock synchronization. In distributed computing there is a dire need to maintain consistent transaction timestamps denoting exactly when a transaction started, completed, or failed. Local times, however, vary around the world with geographical factors such as longitude, while distributed processing is meant to offer processing, storage, and access everywhere. If no sound means exists for synchronizing clocks across server locations, problems are bound to arise in recording transaction times, since a client may access an NFS server from a time region that is either behind or ahead of the server's own. Time is therefore a dimension that must be measured accurately to provide this consistency. Beyond the recording of transaction times, algorithms have also evolved for referencing objects in object-oriented environments (Coulouris et al., 2005); without proper synchronization, such an object could appear either not yet created or already obsolete in a distributed system.

Design requirements for clock synchronization in a distributed system. Several algorithms exist for synchronizing computer clocks, through either physical or logical time synchronization. Physical clock synchronization involves synchronizing the time at one node relative to another node to a certain degree of accuracy, giving the ability to measure the interval between events at the two nodes (Coulouris et al., 2005). Logical clock synchronization offers a more efficient means of synchronization in distributed systems through a partial ordering of events; a minimal logical-clock sketch appears at the end of this section. The following design approaches cover both physical and logical clock synchronization.

Network Time Protocol. The Network Time Protocol (NTP), described in a 1995 publication, allows computers connected to network time servers over the Internet to synchronize their clocks. Moreover, NTP uses several redundant servers to ensure consistency even when there are network connectivity problems.
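As a concrete illustration of physical synchronization, the sketch below computes the standard NTP estimates of clock offset and round-trip delay from the four timestamps exchanged between a client and a time server; the numeric timestamp values are hypothetical.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimation formulas from the four timestamps:
    t1 = client send time, t2 = server receive time,
    t3 = server send time, t4 = client receive time."""
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated client clock error
    delay = (t4 - t1) - (t3 - t2)          # estimated round-trip delay
    return offset, delay

# Hypothetical timestamps (seconds): the client clock runs ~0.4 s behind.
offset, delay = ntp_offset_and_delay(t1=100.0, t2=100.5, t3=100.6, t4=100.3)
print(f"estimated offset {offset:+.3f} s, round-trip delay {delay:.3f} s")
```

The offset estimate is exact only when the outbound and return network delays are equal; NTP therefore combines many such samples, preferring those with the smallest delay.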
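For the logical clock synchronization mentioned under the design requirements above, here is a minimal sketch of a Lamport-style logical clock, which realizes the partial ordering of events: if event a happens before event b, the clock guarantees L(a) < L(b). The class and method names are illustrative.

```python
class LamportClock:
    """Logical clock giving a partial ordering of events."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Timestamp piggybacked on an outgoing message.
        self.time += 1
        return self.time

    def receive(self, message_timestamp):
        # Advance past the sender's timestamp, so the receive event
        # is ordered after the corresponding send event.
        self.time = max(self.time, message_timestamp) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()     # a's clock becomes 1
b.receive(t)     # b's clock jumps to 2, ordered after the send
```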

References

Coulouris, G. F., Dollimore, J., & Kindberg, T. (2005). Distributed systems: Concepts and design. Pearson Education.