Virtual access to a set of distributed, autonomously maintained repositories has many advantages but also poses numerous challenges. The main advantage of distributing the repository is to allow the software to be maintained by those in the best position to keep it up to date. Also, copies of popular software packages may be transparently mirrored to increase availability, improve response time, and prevent bottlenecks.
Many issues that are addressed in a centrally maintained repository by administrative procedures must be addressed in a virtual repository by other means. Such issues include assignment of unique identifiers to retrievable assets, collection and merging of cataloging information, and verification of the authenticity and integrity of retrieved assets.
The NHSE's approach to these issues will be to implement a location-independent naming architecture. This architecture unambiguously associates a unique name, called a Location Independent Filename (LIFN), with the byte contents of a published asset, and it includes mechanisms for authentication and integrity checking. Authentication will ensure that the purported author of a published asset is the actual author. Integrity checking will ensure that the contents of a retrieved asset are exactly the same as those published under the asset's unique name. Higher-level names, called Uniform Resource Names (URNs), which are not tied to specific byte contents, may also be assigned to assets.
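The distinction between a LIFN and a URN can be illustrated with a minimal sketch. It assumes, hypothetically, that a LIFN is recorded together with the MD5 fingerprint of the bytes it names. It also assumes that a URN resolves through a simple table to whichever LIFN is current. The `lifn:`/`urn:` strings, `bind_lifn`, `matches`, and `urn_table` are illustrative only and do not reflect the NHSE's actual naming syntax or interfaces.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class LifnBinding:
    """A LIFN together with the MD5 fingerprint of the exact bytes it names."""
    lifn: str
    md5_hex: str

def bind_lifn(lifn: str, content: bytes) -> LifnBinding:
    # At publication time, record the fingerprint of the asset's byte contents.
    return LifnBinding(lifn, hashlib.md5(content).hexdigest())

def matches(binding: LifnBinding, content: bytes) -> bool:
    # Retrieved bytes are acceptable only if they hash to the published value.
    return hashlib.md5(content).hexdigest() == binding.md5_hex

# URNs are not tied to byte contents; in this hypothetical sketch a URN simply
# maps to whichever LIFN the publisher currently designates.
urn_table: dict[str, str] = {}

v1 = bind_lifn("lifn:example.org/solver-1.0.tar", b"solver source v1.0")
v2 = bind_lifn("lifn:example.org/solver-1.1.tar", b"solver source v1.1")
urn_table["urn:example:solver"] = v1.lifn
urn_table["urn:example:solver"] = v2.lifn  # URN moves to the new release; v1's LIFN still names v1's bytes
```

The URN can be rebound to a new release. A LIFN, by contrast, stays permanently attached to one byte sequence, which is what makes integrity checking against it meaningful.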
Figure 3: Publishing and Retrieving Assets Using Location-Independent Naming
Publishing tools will be made available to help publishers name and cryptographically sign published assets, and export asset descriptions to an NHSE search service. A distributed name-to-location lookup service will be provided, along with a means for publisher and mirror sites to register locations for published assets. A client library will also be provided that can be linked with a WWW browser, enabling the browser to resolve location-independent names and to perform authenticity and integrity checks.
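The name-to-location service and the client library's resolution step can be sketched as an in-memory model. This is a sketch under stated assumptions and not the NHSE protocol. It assumes that one name maps to a list of URLs registered by the publisher and by mirrors. It also assumes that the client tries each location in turn and accepts the first copy whose MD5 fingerprint matches the published one. `NameToLocationService`, `retrieve`, and the `fetch` callback are hypothetical names, and the `pages` dictionary stands in for real network access.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Optional

class NameToLocationService:
    """Hypothetical in-memory stand-in for one name-to-location server."""

    def __init__(self) -> None:
        # A LIFN may have several registered locations (publisher plus mirrors).
        self._locations: dict[str, list[str]] = defaultdict(list)

    def register(self, lifn: str, url: str) -> None:
        # Record a location reported by a publisher or mirror; ignore duplicates.
        if url not in self._locations[lifn]:
            self._locations[lifn].append(url)

    def resolve(self, lifn: str) -> list[str]:
        # Return a copy of every known location for this name.
        return list(self._locations.get(lifn, []))


def retrieve(service: NameToLocationService, lifn: str, expected_md5: str,
             fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Client-side sketch: try each location until one passes the integrity check."""
    for url in service.resolve(lifn):
        data = fetch(url)
        if data is not None and hashlib.md5(data).hexdigest() == expected_md5:
            return data
    return None  # no location returned the published bytes


# Usage with a fake "network" in place of real HTTP fetches.
LIFN = "lifn:example.org/solver-1.0.tar"
svc = NameToLocationService()
svc.register(LIFN, "http://www.example.org/pub/solver-1.0.tar")      # publisher
svc.register(LIFN, "http://mirror.example.net/nhse/solver-1.0.tar")  # mirror

good = b"solver source v1.0"
pages = {
    "http://www.example.org/pub/solver-1.0.tar": b"corrupted bytes",
    "http://mirror.example.net/nhse/solver-1.0.tar": good,
}
data = retrieve(svc, LIFN, hashlib.md5(good).hexdigest(), pages.get)
# The publisher's copy fails the check, so the mirror's copy is returned.
```

Returning every registered location lets the client fall back transparently to a mirror. This is the availability benefit that motivates mirroring in the first place.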
The steps involved in publishing an asset are shown as P1, P2, and P3 in Figure 3. A mirror site that maintains an authorized copy of an asset may also register a location for that asset. The steps a user carries out in searching for and retrieving a published asset are shown as U1 through U4 in Figure 3. Unique, verifiable names will allow search services to unambiguously associate descriptions with published assets, including third-party descriptions such as critical reviews. Scientific researchers will be able to refer unambiguously to the software used to produce experimental results. Different users who download copies of a software asset under the same name, or a single user who downloads the same named asset more than once, will be assured that the copies are identical. Unique naming will also facilitate collection management and the tracking of assets by file servers and search services.
Authentication of assets will be handled by an asymmetric (public-key) cryptosystem. A publisher will sign the description of an asset using the publisher's private key, and any client program in possession of the corresponding public key will then be able to authenticate that description. Either the name or the description of an asset will include a fingerprint of the file containing that asset, such as its MD5 message digest. A client program will be able to check the integrity of a retrieved file by computing the file's fingerprint and comparing it with the one known to be associated with that asset. These mechanisms are similar to those proposed in [5].
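The two mechanisms can be made concrete with a small sketch. Here the publisher signs a description that embeds the file's MD5 fingerprint, and the client verifies that signature and then checks the file against the fingerprint. The sketch swaps in textbook RSA with a hard-coded toy key built from the Mersenne primes 2^61 - 1 and 2^89 - 1, and all function names are hypothetical. The key is far too small and unpadded signatures are unsafe, so this only shows the shape of the scheme. A real deployment would use a vetted signature library, and MD5 is no longer considered collision-resistant.

```python
import hashlib

# Toy RSA key from two Mersenne primes; illustration only, NOT secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus (~150 bits)
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def fingerprint(data: bytes) -> str:
    # MD5 fingerprint of a file's bytes, as mentioned in the text.
    return hashlib.md5(data).hexdigest()

def digest_int(message: bytes) -> int:
    # A 128-bit MD5 digest is smaller than N, so it can be signed directly.
    return int.from_bytes(hashlib.md5(message).digest(), "big")

def sign(description: bytes, d: int = D, n: int = N) -> int:
    # Publisher side: sign the asset description with the private key.
    return pow(digest_int(description), d, n)

def verify(description: bytes, signature: int, e: int = E, n: int = N) -> bool:
    # Client side: anyone holding the public key (e, n) can check the signature.
    return pow(signature, e, n) == digest_int(description)

def integrity_ok(retrieved: bytes, published_fingerprint: str) -> bool:
    # Recompute the fingerprint of the retrieved file and compare.
    return fingerprint(retrieved) == published_fingerprint

# Publisher: the signed description carries the file's fingerprint.
asset = b"solver source v1.0"
published = fingerprint(asset)
description = b"name=lifn:example.org/solver-1.0.tar md5=" + published.encode()
sig = sign(description)
```

Because the fingerprint sits inside the signed description, a client that verifies `sig` and then passes `integrity_ok` has authenticated both the publisher's claim and the retrieved bytes.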
To spare clients the overhead of performing authenticity and integrity checks on every file they access, the NHSE plans to use an authentication system for name-to-location servers and for file servers. Name-to-location servers will allow only authorized, trusted file servers to register file locations, and a trusted file server will guarantee that the file it returns for a particular name is correct. Updates from file servers to name-to-location servers, as well as the update protocol between replicated name-to-location servers, will require authentication. Such authentication may be based on public keys, shared secrets, network addresses, or some combination of these. Thus, authentication will be provided at the server level, rather than only at the individual file level.
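The following sketch shows server-level authentication of location updates, using the shared-secret option the text lists. Each authorized file server shares a secret with the name-to-location server and tags its updates with an HMAC-SHA256. Unknown servers and updates with bad tags are rejected. The choice of HMAC-SHA256, the message layout, and the names `TrustedRegistry` and `tag` are assumptions made for illustration; public keys or network addresses would be equally valid bases according to the text.

```python
import hashlib
import hmac

class TrustedRegistry:
    """Hypothetical name-to-location server that accepts only authenticated updates."""

    def __init__(self, server_secrets: dict[str, bytes]) -> None:
        # One shared secret per authorized (trusted) file server.
        self._secrets = server_secrets
        self._locations: dict[str, list[str]] = {}

    @staticmethod
    def tag(secret: bytes, lifn: str, url: str) -> str:
        # HMAC-SHA256 over the update; the file server computes the same value.
        msg = f"{lifn}\n{url}".encode()
        return hmac.new(secret, msg, hashlib.sha256).hexdigest()

    def register(self, server_id: str, lifn: str, url: str, mac: str) -> bool:
        secret = self._secrets.get(server_id)
        if secret is None:
            return False  # not an authorized file server
        if not hmac.compare_digest(mac, self.tag(secret, lifn, url)):
            return False  # update failed authentication
        urls = self._locations.setdefault(lifn, [])
        if url not in urls:
            urls.append(url)
        return True

    def resolve(self, lifn: str) -> list[str]:
        # Only locations from authenticated updates are ever returned.
        return list(self._locations.get(lifn, []))


SECRET = b"shared-secret-for-mirror"
reg = TrustedRegistry({"mirror.example.net": SECRET})
LIFN = "lifn:example.org/solver-1.0.tar"
URL = "http://mirror.example.net/nhse/solver-1.0.tar"

ok = reg.register("mirror.example.net", LIFN, URL, TrustedRegistry.tag(SECRET, LIFN, URL))
forged = reg.register("mirror.example.net", LIFN,
                      "http://untrusted.example.com/solver-1.0.tar", "00" * 32)
unknown = reg.register("rogue.example.com", LIFN, URL, "00" * 32)
```

Since only trusted servers can add locations, a client that trusts the registry can, under this design, skip per-file checks. `hmac.compare_digest` is used so that tag comparison does not leak timing information.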