Introduction




Well-maintained software repositories are central to software reuse because they make high-quality software widely available and easily accessible. One such repository is Netlib, a collection of high-quality, publicly available mathematical software. Netlib, in operation since 1985, currently processes over 300,000 requests a day. Netlib is serving as a prototype for the development of the National Software Exchange (NSE), which has the goal of encompassing all High Performance Computing Consortium (HPCC) software repositories and of promoting reuse of software components developed by Grand Challenge and other scientific computing researchers.

Many repositories, previously maintained centrally, have evolved into virtual repositories that serve as directories to distributed collections. For example, the GAMS Software Repository, once a central repository [1], is now a virtual repository that catalogs software maintained by other repositories [2]. Another distributed repository, the NASA/GSFC/ESS Software and Information Exchange (accessible at http://farside.gsfc.nasa.gov/ESS/), stores some of the catalogued software packages locally, but for packages at remote sites stores just descriptions with pointers. Growth in the popularity of the Internet and the World Wide Web, as well as the wide availability of WWW client and server software, has accelerated the shift from centrally maintained software repositories to virtual, distributed repositories. A provider site (either the maintainer or a mirror site) need only make the file available from an FTP, Gopher, or HTTP server for it to be accessible from a WWW client.

The main advantage of distributing a repository is to allow the software to be maintained by those in the best position to keep it up-to-date. Also, copies of popular software packages may be mirrored by a number of sites to increase availability (e.g., if one site is unreachable, the software may be retrieved from a different site) and to prevent bottlenecks.

A distributed, virtual repository should appear to the user as a centralized, well-organized repository. An effective search interface for a distributed software repository lets a user search for software without knowing at which site it is located. A mirrored file should appear once in a list of search results, rather than once for each mirrored copy. Yet the searcher should still enjoy the reliability and performance benefits of mirroring, that is, be able to try alternative locations for a search hit or to retrieve the closest copy of the file.
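The requirement that a mirrored file appear once in the results while remaining retrievable from any copy can be illustrated with a minimal sketch. It assumes, hypothetically, that each raw search hit arrives as a `(name, url)` pair in which `name` is a location-independent name shared by all mirrors of the same file; the helpers `collapse_mirrors` and `retrieve_any` and the injected `fetch` callable are illustrative only and are not the authors' implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def collapse_mirrors(hits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group raw (name, url) search hits so each name appears once.

    Every distinct mirror URL is kept, in the order it was first seen,
    so the caller can still fall back to another copy later.
    """
    grouped: Dict[str, List[str]] = {}  # dicts preserve insertion order (3.7+)
    for name, url in hits:
        urls = grouped.setdefault(name, [])
        if url not in urls:
            urls.append(url)
    return grouped

def retrieve_any(urls: List[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in turn and return the first successful download."""
    failures = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:  # e.g. host unreachable or connection refused
            failures.append(f"{url}: {exc}")
    raise OSError("all locations failed: " + "; ".join(failures))
```

Sorting each URL list by some measure of network distance before calling `retrieve_any` would give the "closest copy" behaviour; how that distance is measured is left open in this sketch.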

Distributed maintenance and mirroring of software introduce challenges as well as benefits. Maintaining the quality of the software and of its indexing information, and presenting a uniform searching and browsing interface, become much more difficult. Location-independent naming facilitates the organization, discovery, and retrieval of software in a distributed repository, and can be used to provide consistency, authenticity, and integrity guarantees.

The WWW mechanism of specifying a file by its Uniform Resource Locator (URL) poses several difficulties for a virtual software repository. URLs are inadequate for ensuring the consistency and currency of mirrored copies: a URL for an independently mirrored copy of a software package may point to an out-of-date copy and give no indication that it is stale. Consistency among a set of files that are meant to be used together may also become a problem. For example, the Netlib Software Repository provides dependency checking that allows the user to retrieve a top-level routine plus all routines in its dependency tree (i.e., those routines that are called directly or indirectly by the top-level routine). Another example is a graphical parallel programming environment that relies on an underlying parallel communications support package. The problem becomes more complex when different pieces may be retrieved from different physical repositories. Ideally, a consistent set should be retrieved automatically, without the user having to scan documentation to verify that compatible pieces have been retrieved.
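Retrieving a routine "plus all routines in its dependency tree" amounts to computing the transitive closure of a call graph. The sketch below assumes, hypothetically, that the call graph is available as a dictionary mapping each routine to the routines it calls directly; the routine names are invented for illustration, and this is not a description of how Netlib's dependency checking is actually implemented.

```python
from typing import Dict, List, Set

def dependency_closure(root: str, calls: Dict[str, List[str]]) -> List[str]:
    """Return root plus every routine it calls directly or indirectly.

    Uses an iterative depth-first traversal; each routine is listed once,
    in the order it is first reached, and call cycles are handled safely.
    """
    seen: Set[str] = set()
    order: List[str] = []
    stack = [root]
    while stack:
        routine = stack.pop()
        if routine in seen:
            continue
        seen.add(routine)
        order.append(routine)
        # Push callees in reverse so they are visited in their listed order.
        stack.extend(reversed(calls.get(routine, [])))
    return order

# Hypothetical call graph: solve -> factor, backsub; both may report errors.
calls = {
    "solve": ["factor", "backsub"],
    "factor": ["swap_rows", "report_error"],
    "backsub": ["report_error"],
}
print(dependency_closure("solve", calls))
```

Once the closure is known, every name in it must still resolve to mutually compatible copies, which is the harder problem when the pieces come from different physical repositories.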

Distributing the repository also poses challenges for searching. A centrally maintained repository can easily run an indexing and search engine that provides a search interface to the repository contents. With the current WWW setup, however, the user must choose between searching the various distributed repository sites individually and using a general-purpose WWW search engine such as WebCrawler, Lycos, or the World Wide Web Worm.

Most of these problems can be alleviated by implementing a location-independent naming architecture that includes mechanisms for authenticity and integrity checking. We have designed a naming scheme in which the binding between a name and file contents is unchangeable and verifiable. A name may be resolved to multiple, mirrored copies. When a name represents a set of files, it may be resolved to a list of other names. The record for a resource, which includes the name as well as other descriptive information, may be signed by the publisher so that users can verify the authenticity of a retrieved resource. This paper describes the design of our naming architecture. We also describe our implementation of a prototype name-to-location service and of a modified WWW client that performs name resolution. A glossary of terms used in this paper is included as an appendix.
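One way to make the binding between a name and file contents unchangeable and verifiable is to derive the name from a cryptographic digest of the contents. The sketch below uses that technique with SHA-256 purely as an assumption; the `lifn:sha256:` prefix, the resolver table layout, and the helper functions are hypothetical and are not the scheme or service described in this paper. It shows a name resolving to several mirrors (stale or altered copies are rejected because their digest no longer matches) and a set name resolving to a list of other names. Publisher signatures on resource records are not shown, since they would require a public-key signature library outside Python's standard library.

```python
import hashlib
from typing import Callable, Dict, List, Union

PREFIX = "lifn:sha256:"  # hypothetical name syntax, for illustration only

def content_name(data: bytes) -> str:
    """Derive a name from the file's bytes, so the binding cannot change."""
    return PREFIX + hashlib.sha256(data).hexdigest()

def set_name(members: List[str]) -> str:
    """Name a set of files by hashing its ordered list of member names."""
    return content_name("\n".join(members).encode())

# Hypothetical resolver table: a file name maps to a list of mirror URLs;
# a set name maps to {"members": [member names]}.
Entry = Union[List[str], Dict[str, List[str]]]

def resolve_and_fetch(name: str, table: Dict[str, Entry],
                      fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Resolve a name and return {file name: verified contents}."""
    entry = table[name]
    if isinstance(entry, dict):
        members = entry["members"]
        if set_name(members) != name:
            raise LookupError(f"member list does not match {name}")
        result: Dict[str, bytes] = {}
        for member in members:  # each member is resolved and verified in turn
            result.update(resolve_and_fetch(member, table, fetch))
        return result
    for url in entry:  # a single file: try each mirror in order
        try:
            data = fetch(url)
        except OSError:
            continue  # unreachable mirror: fall through to the next one
        if content_name(data) == name:  # rejects stale or altered copies
            return {name: data}
    raise LookupError(f"no valid copy of {name} at any listed location")
```

Because a set name is computed from its member names, and each member name from its contents, retrieving a set under one name yields either a complete, consistent collection or an error, never a silent mix of versions.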





Jack Dongarra
Mon Jan 30 10:42:57 EST 1995