http://www.netlib.org/utk/projects/esr/
Jack Dongarra
University of Tennessee
Eric Grosse
AT&T Bell Laboratories
Ron Boisvert
National Institute of Standards and Technology
Software repositories have traditionally provided access to software
resources for particular communities of users within specific domains.
For example, our Netlib
and GAMS repositories provide
access to collections of mathematical software, while our
National HPCC Software Exchange (NHSE)
provides access to high performance computing resources.
The growth of the World Wide Web has created new opportunities
for expanding the scope of discipline-oriented repositories,
for reaching a wider community of users, and for expanding the
types of services offered. With these opportunities have come challenges,
however, such as the shift from centralized to decentralized management,
interoperation between different repositories, and increased security
risks. Reaching a wider community of users has created a need
for increased automated assistance in locating appropriate resources
and in understanding and making use of these resources.
We are tackling these challenges with a number of efforts, ranging
from system-level infrastructure for resource management to application-level
and content-oriented tools.
Our research has the following five focus areas:
- Resource Cataloging and Distribution System (RCDS)
- Application-level and content-oriented tools
- Safe execution environments for mobile code
- Repository interoperability
- Distributed, semantic-based searching
In the area of resource management infrastructure, we are developing
the Resource Cataloging and Distribution System (RCDS). RCDS has
the goals of
- facilitating the scalable distribution of resources,
- achieving fault tolerance, high availability, and good response time,
- responding quickly to changes in resources, and
- assuring integrity, authenticity, and consistency of resources and metadata.
RCDS consists of the following components:
- File servers, which provide access to the files themselves.
These can be ordinary HTTP, FTP, or similar servers.
- Catalog info servers, which maintain authenticated information about the
characteristics of network-accessible resources and accept queries
about the characteristics of such resources from clients.
- Location servers, which maintain information about the
locations of network-accessible resources and accept queries for
location data from clients.
- Collection managers, which are responsible for acquiring
and deleting files on a file server and for informing location servers
about file availability.
- Publication tools, which accept new files and descriptions
from content providers and inject them into the system.
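The following minimal in-memory Python sketch makes the division of labor among these components concrete. It is not the RCDS protocol. The class and method names, the `urn:example:` resource names, and the mirror URLs are hypothetical, and real servers would communicate over the network and authenticate their data. The sketch shows three things:

- a collection manager registering replicas with a location server,
- a catalog server holding descriptive metadata,
- a client that falls back to another replica when one file server is unreachable.

```python
from typing import Callable, Dict, List, Optional, Set


class LocationServer:
    """Hypothetical location server: maps a resource name to the URLs of its copies."""

    def __init__(self) -> None:
        self._locations: Dict[str, Set[str]] = {}

    def register(self, name: str, url: str) -> None:
        self._locations.setdefault(name, set()).add(url)

    def unregister(self, name: str, url: str) -> None:
        urls = self._locations.get(name)
        if urls is not None:
            urls.discard(url)
            if not urls:
                del self._locations[name]

    def lookup(self, name: str) -> List[str]:
        return sorted(self._locations.get(name, set()))


class CatalogServer:
    """Hypothetical catalog server: maps a resource name to descriptive metadata."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}

    def publish(self, name: str, metadata: Dict[str, str]) -> None:
        self._records[name] = dict(metadata)

    def describe(self, name: str) -> Optional[Dict[str, str]]:
        return self._records.get(name)


class CollectionManager:
    """Adds and deletes files on one file server and keeps the location server informed."""

    def __init__(self, base_url: str, location_server: LocationServer) -> None:
        self.base_url = base_url
        self.files: Dict[str, bytes] = {}  # stands in for the file server's storage
        self.location_server = location_server

    def acquire(self, name: str, path: str, content: bytes) -> None:
        self.files[path] = content
        self.location_server.register(name, self.base_url + path)

    def delete(self, name: str, path: str) -> None:
        self.files.pop(path, None)
        self.location_server.unregister(name, self.base_url + path)


def fetch(name: str, location_server: LocationServer, read: Callable[[str], bytes]) -> bytes:
    """Try each registered copy in turn; `read` raises OSError if a server is unreachable."""
    failures = []
    for url in location_server.lookup(name):
        try:
            return read(url)
        except OSError as exc:
            failures.append(f"{url}: {exc}")
    raise LookupError(f"no reachable copy of {name}: {failures}")
```

Location data lives in a separate server. Adding or withdrawing a mirror therefore only requires the collection manager to update that server, and clients that resolve the name afterward see the change without any edit to the catalog record.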
Another component needed by RCDS, which is not part of the current
RCDS design but which we are considering,
is a public key infrastructure consisting of key servers
for certifying and revoking public keys. Search servers are also not
part of RCDS: rather than attempting to design a resource discovery
system that would work well for all existing subject areas, we have
chosen to design a cataloging and distribution system that will form
a common substrate for present and future resource discovery tools.
RCDS does not explicitly support protection of intellectual property
rights. However, it is possible to include pricing information and usage
restrictions in the description of a resource.
Software repositories will find the RCDS infrastructure useful for
supporting decentralized management of resources and for providing
users with reliable, efficient access to those resources.
Through the use of digital signatures and cryptographically signed
certificates, RCDS will also provide integrity and authentication guarantees
that, in addition to protecting against malicious modification
or accidental corruption of source code, will enable safe use
of agent and applet technologies for adding interactive content
to repositories. A repository may participate in RCDS as a resource
contributor, by providing descriptions of resources that it holds,
or as a third party that adds value to resources contributed by others
by cataloging and classifying them and/or providing a search service.
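The integrity half of this guarantee can be illustrated with the Python standard library. In the sketch below, a catalog record carries a SHA-256 digest of the file, and the record itself is authenticated.

RCDS envisions public-key digital signatures for that second step. The standard library has no public-key signature API, so the sketch swaps in HMAC with a shared secret key. HMAC provides integrity and authentication between parties that share the key, but not the third-party verifiability of a true signature. The record fields and the key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def _mac(body: Dict[str, str], key: bytes) -> str:
    # Canonical JSON (sorted keys) so the same record always yields the same bytes.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    # HMAC-SHA256 stands in for a public-key signature (assumption, see lead-in).
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_record(name: str, content: bytes, key: bytes) -> Dict[str, str]:
    """Catalog record holding the file's SHA-256 digest, authenticated with an HMAC tag."""
    body = {"name": name, "sha256": hashlib.sha256(content).hexdigest()}
    return dict(body, mac=_mac(body, key))


def verify(record: Dict[str, str], content: bytes, key: bytes) -> bool:
    """True only if the record is authentic and the downloaded content matches it."""
    body = {k: v for k, v in record.items() if k != "mac"}
    if not hmac.compare_digest(_mac(body, key), record.get("mac", "")):
        return False  # record altered, or produced without the key
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(body.get("sha256", ""), actual)
```

Separating the file digest from the record's authentication tag means a mirror can serve the file without being trusted. A client that obtains an authentic record can detect both accidental corruption and deliberate modification of the file it downloads.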
In the area of application-level and content-oriented tools,
we are developing applet and agent programs for assisting users
in finding and using software resources.
- The Numerical Navigator, for which we have developed prototype
versions in Java and Tcl/Tk, allows the user to visualize the contents
of a software collection on a single screen. The user manipulates
the display using buttons, sliders, and pull-down menus. Pointing
and clicking in the display area reveals more detailed information,
including links for immediate downloading of selected software.
- ApproxWizard is an applet, being developed in both Java and Limbo
versions, that helps users select an approximation code. The applet
interacts with the user by performing calculations on sample data sets
that reside on the user's local disk; these calculations run either on
the client or remotely on servers.
- We have developed domain-specific expert extensions to the
GAMS problem classification scheme. An advisory system for a given
problem class helps the user discriminate between problem-solving
software modules for that class. The existing prototype user interface
was programmed as an X Window System client, but Java versions are planned.
- A project in the planning stage is a Program Builder that fetches
the appropriate versions of source code, subroutines, and libraries for
a user's platform from different repositories and compiles and links them.
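The Program Builder is still at the planning stage, so the following Python sketch covers only one piece of it: dependency resolution. It assumes a hypothetical catalog that records, for each module, the modules it calls and the artifact to use on each platform, with an `any` fallback. The module, platform, and file names are all invented for illustration.

The sketch computes the transitive closure of a target's dependencies. It then orders the artifacts so that each module appears before the modules it calls, which is the order traditional Unix linkers expect. Fetching and compiling are left out. `graphlib` requires Python 3.9 or later, and a dependency cycle raises `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical catalog: module -> (modules it calls, {platform: artifact}).
Catalog = Dict[str, Tuple[List[str], Dict[str, str]]]

EXAMPLE_CATALOG: Catalog = {
    "solve": (["lu_factor", "tri_solve"], {"any": "solve.f"}),
    "lu_factor": (["kernels"], {"any": "lu_factor.f"}),
    "tri_solve": (["kernels"], {"any": "tri_solve.f"}),
    "kernels": ([], {"linux-x86": "libkernels_x86.a", "any": "kernels_ref.f"}),
}


def resolve(target: str, platform: str, catalog: Catalog) -> List[str]:
    """Return the artifacts needed to build `target` on `platform`, in link order."""
    graph: Dict[str, Set[str]] = {}
    pending = [target]
    while pending:  # collect the transitive closure of dependencies
        module = pending.pop()
        if module in graph:
            continue
        if module not in catalog:
            raise KeyError(f"no repository in the catalog provides {module!r}")
        deps, _ = catalog[module]
        graph[module] = set(deps)
        pending.extend(deps)

    # static_order() yields every dependency before the modules that need it.
    build_order = list(TopologicalSorter(graph).static_order())

    artifacts = []
    for module in reversed(build_order):  # callers first, then what they call
        variants = catalog[module][1]
        artifact = variants.get(platform) or variants.get("any")
        if artifact is None:
            raise KeyError(f"{module!r} has no version for platform {platform!r}")
        artifacts.append(artifact)
    return artifacts
```

Listing callers before callees matters for static archives such as the hypothetical `libkernels_x86.a`. A traditional Unix linker only pulls members from an archive to satisfy symbols that are already undefined when it reaches that archive.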
We are working with other researchers in the repository
and agent technology communities to define requirements for
safe execution environments for agent and applet programs.
An execution environment provides program interpretation
and run-time support as well as relocation and communication services.
However, the execution environment must also be secure to ensure
that code from untrusted sources does not harm the host system,
gain unauthorized access to files, or usurp resources.
After determining the requirements for such an environment, we plan
to implement a facility for remote execution of user code in
the NetSolve system.
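One small ingredient of such an environment is preventing untrusted code from usurping CPU time and memory. This can be sketched with POSIX resource limits. The Python sketch below is an illustration under stated assumptions, not the NetSolve facility.

It runs a command in a child process whose CPU time and address space are capped with `setrlimit`, plus a wall-clock timeout. It does nothing to restrict file or network access, which a real execution environment would also need. It assumes a POSIX system such as Linux. The function name and default limits are hypothetical.

```python
import resource
import subprocess
import sys
from typing import List, Optional, Tuple


def run_untrusted(
    argv: List[str],
    cpu_seconds: int = 2,
    memory_bytes: int = 1 << 30,
    wall_seconds: float = 10.0,
) -> Tuple[Optional[int], bytes]:
    """Run argv with CPU-time and address-space limits (POSIX only).

    Returns (returncode, stdout); returncode is None if the wall-clock timeout fired.
    A negative returncode means the child was killed by that signal number.
    """

    def apply_limits() -> None:
        # Runs in the child after fork() and before exec(), so only the child is limited.
        for kind, value in ((resource.RLIMIT_CPU, cpu_seconds),
                            (resource.RLIMIT_AS, memory_bytes)):
            _, hard = resource.getrlimit(kind)
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)  # an unprivileged process cannot raise its hard limit
            resource.setrlimit(kind, (value, value))

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=apply_limits,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, b""
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    print(run_untrusted([sys.executable, "-c", "print('hello')"]))
```

A runaway loop in the child exceeds its CPU limit and is terminated by the kernel (via `SIGXCPU`), so the caller sees a nonzero return code rather than a hung host process.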
Although there are a number of software repositories in existence
or under development, these repositories generally have their own
interfaces and require the user to connect to each one separately
to search or browse for software or other resources.
We are involved in efforts to promote sharing of asset metadata
and, where possible, of assets themselves between software repositories.
We are working with other members of the Reuse Library
Interoperability Group to add structure to WWW-based interoperation
and to define labeling standards for asset certification and
intellectual property rights.
To facilitate maximum interoperability, we are developing
a toolkit called Repository in a Box for use by repository managers.
This toolkit will include a publishing tool for creating and
maintaining software catalog records and for exporting these
records to other repositories and search services.
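The following Python sketch illustrates what exporting catalog records can involve. It serializes records to XML, reads them back on the receiving side, and searches records gathered from several repositories in one pass. The element names, record fields, and repository labels are hypothetical. They are not the Reuse Library Interoperability Group's data model or the Repository in a Box format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CatalogRecord:
    # Illustrative fields only; not a standard interoperability schema.
    name: str
    title: str
    keywords: List[str] = field(default_factory=list)
    url: str = ""


def export_records(records: List[CatalogRecord], repository: str) -> str:
    """Serialize catalog records to an XML document labeled with their source repository."""
    root = ET.Element("catalog", repository=repository)
    for rec in records:
        asset = ET.SubElement(root, "asset", name=rec.name)
        ET.SubElement(asset, "title").text = rec.title
        ET.SubElement(asset, "url").text = rec.url
        for kw in rec.keywords:
            ET.SubElement(asset, "keyword").text = kw
    return ET.tostring(root, encoding="unicode")


def import_records(xml_text: str) -> Tuple[str, List[CatalogRecord]]:
    """Parse an exported catalog back into (source repository, records)."""
    root = ET.fromstring(xml_text)
    records = [
        CatalogRecord(
            name=asset.get("name", ""),
            title=asset.findtext("title", ""),
            keywords=[kw.text or "" for kw in asset.findall("keyword")],
            url=asset.findtext("url", ""),
        )
        for asset in root.findall("asset")
    ]
    return root.get("repository", ""), records


def union_search(catalogs: List[Tuple[str, List[CatalogRecord]]],
                 keyword: str) -> List[Tuple[str, str]]:
    """Case-insensitive keyword search over records gathered from several repositories."""
    wanted = keyword.lower()
    return [
        (source, rec.name)
        for source, records in catalogs
        for rec in records
        if any(kw.lower() == wanted for kw in rec.keywords)
    ]
```

Each imported catalog keeps its source label. A single search can therefore report which repository holds each matching asset, instead of requiring the user to visit each repository separately.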
We are combining our experience using the Harvest System with
our research on Latent Semantic Indexing (LSI) to produce
a semantic-based distributed search system.
LSI uses the singular value decomposition of the term-document
matrix to produce a low-rank approximation to this matrix that
can be used for semantic retrieval based on statistical word
co-occurrence. The resulting concept space provides better
retrieval performance than lexical keyword matching and allows
for easy relevance feedback. Our plans are to interface LSI
to the Gatherer and Broker components of Harvest.
The interface to the Gatherer will consist of an interactive tool
for use by an expert in guiding the Gatherer to collect relevant
information. The Broker interface will support searching
and relevance feedback across multiple distributed LSI indexes.
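The LSI computation described above can be sketched in plain Python. This is a toy illustration under simplifying assumptions, not our implementation:

- It uses raw term counts with no term weighting.
- It finds the top k left singular vectors U_k of the term-document matrix A by power iteration with deflation on A A^T, which is adequate only for small matrices.
- It compares queries and documents after projecting both onto U_k, which is one common variant of LSI query matching.

The document collection is invented for the example.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def dot(x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def term_document_matrix(docs: List[str]) -> Tuple[Matrix, Dict[str, int]]:
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            A[index[w]][j] += 1.0
    return A, index


def top_left_singular_vectors(A: Matrix, k: int, iters: int = 500) -> Matrix:
    """Top-k eigenvectors of A A^T (the left singular vectors U_k of A), by power iteration."""
    m = len(A)
    B = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]  # B = A A^T
    rng = random.Random(0)  # fixed seed so results are reproducible
    basis: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        found = True
        for _ in range(iters):
            w = [dot(row, v) for row in B]
            for u in basis:  # deflation: stay orthogonal to vectors already found
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # A has rank < k; no further concepts
                found = False
                break
            v = [wi / norm for wi in w]
        if not found:
            break
        basis.append(v)
    return basis


def cosine(x: List[float], y: List[float]) -> float:
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx > 0 and ny > 0 else 0.0


def lsi_scores(docs: List[str], query: str, k: int = 2) -> List[float]:
    """Cosine similarity between the query and each document in the k-dimensional concept space."""
    A, index = term_document_matrix(docs)
    U = top_left_singular_vectors(A, k)
    q = [0.0] * len(index)
    for w in query.lower().split():
        if w in index:  # words outside the vocabulary are ignored
            q[index[w]] += 1.0
    q_hat = [dot(u, q) for u in U]  # U_k^T q
    scores = []
    for j in range(len(docs)):
        d_hat = [dot(u, [row[j] for row in A]) for u in U]  # U_k^T a_j = Sigma_k V_k^T e_j
        scores.append(cosine(q_hat, d_hat))
    return scores


DOCS = [
    "matrix eigenvalue decomposition",
    "eigenvalue matrix solver",
    "web server http",
    "http web browser client",
]

if __name__ == "__main__":
    print(lsi_scores(DOCS, "decomposition"))
```

On this four-document example, the query `decomposition` scores the second document (`eigenvalue matrix solver`) about as highly as the first, even though the word does not occur in it. Both documents load on the same concept. Lexical keyword matching would give the second document a score of zero.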
dongarra@cs.utk.edu