Progress report on NHSE development through 12-16-94:
(Please send additions, corrections, or comments to browne@cs.utk.edu)

Netlib Development Group, UTK:

We have put up the NHSE home page(s) with pointers to all the HPCC-related information we have been able to find on Internet file servers. We installed the Harvest system from the University of Colorado (see http://harvest.cs.colorado.edu/ for more information about Harvest) and have used it to provide a search interface to the distributed set of NHSE HTML pages. A Gatherer running at UTK collects around 3000 local HTML pages (including all of Netlib) and 5000+ non-local HTML pages, and extracts the title and full text, along with other attributes, from each. The extracted information is imported and indexed by a Broker, which currently uses freeWAIS-sf-1.0 as the search engine. We are considering purchasing commercial WAIS to obtain a more reliable search engine. A CGI gateway to the Broker provides a search interface in which the user may enter search keywords.

We have written up a software review policy for NHSE software contributions. Contributions will be assigned the following review classification levels:

   U - Unreviewed
   P - Partially reviewed (inspected)
   R - Reviewed (tested, peer-reviewed)

Guidelines for software contributors and reviewers concerning scope, documentation, construction, completeness, and testing are included in the policy.

Because we consider both the current hypertext browsing and blind keyword searching modes to be inadequate for efficiently locating information, we are in the planning stages of an integrated browsing/searching interface that will permit contextual, iterative searching. The interface will be based on an HPCC thesaurus and faceted classification scheme. The thesaurus will draw on the glossaries being developed at Syracuse, the current NHSE contents, and existing thesauri and classification schemes.
The idea is that the user will be able to browse an annotated hypertext display of the thesaurus and initiate searches from hypertext nodes. We envisage two types of information being available: 1) the current unstructured HTML pages, and 2) catalogued software contributions. The catalogued software collection may be small and stable enough that it can be indexed (i.e., assigned thesaurus terms) manually. For the HTML pages, the thesaurus will be a searching thesaurus only, with its main purpose being to increase recall by term suggestion and synonym expansion.

It appears to be a requirement of the NHSE that the HPCC software it encompasses be maintained at a number of autonomous discipline-oriented repositories. In order to provide a uniform interface and to reliably catalog and provide access to submissions, the NHSE needs a location-independent naming scheme and authentication and integrity checking mechanisms. Thus, we have begun the design and implementation of a system with the following components:

- tools to assist publishers, also called naming authorities, to assign unique names to files they publish, provide and cryptographically sign descriptions of those files, and register locations for those files with the name-to-location lookup service
- a distributed name-to-location lookup service
- a client library for resolving location-independent names and for performing authenticity and integrity checks
- file mirroring programs that allow file servers to obtain copies of published files and register their locations with the name-to-location lookup service

We envision providing for two types of names: 1) a location-independent filename (LIFN), for which the binding to the particular sequence of bytes it names may not be changed, and 2) a URN that names a conceptual entity. At any given time, a URN resolves to a particular LIFN, but the binding may change (linearly!) over time.
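The two kinds of names can be illustrated with a small sketch. This is not the NHSE design or implementation: the class name, method names, name syntax, and example locations below are all hypothetical, the lookup service is a single in-memory table rather than a distributed service, and a SHA-256 digest stands in for the signed file descriptions as a simple integrity check. It only shows the intended behaviour: a LIFN stays bound to one sequence of bytes that may live at many locations, while a URN may be re-pointed to a newer LIFN.

```python
import hashlib


class NameService:
    """Toy in-memory stand-in for a name-to-location lookup service."""

    def __init__(self):
        self._urn_to_lifn = {}      # URN -> current LIFN (binding may change)
        self._lifn_digest = {}      # LIFN -> digest of the bytes it names (fixed)
        self._lifn_locations = {}   # LIFN -> list of registered locations

    def publish(self, lifn, content, location):
        """Register a location for a LIFN; the named bytes may never change."""
        digest = hashlib.sha256(content).hexdigest()
        known = self._lifn_digest.setdefault(lifn, digest)
        if known != digest:
            raise ValueError("LIFN is already bound to different bytes")
        self._lifn_locations.setdefault(lifn, []).append(location)

    def bind_urn(self, urn, lifn):
        """Point a URN at a published LIFN, replacing any earlier binding."""
        if lifn not in self._lifn_digest:
            raise KeyError("unknown LIFN: " + lifn)
        self._urn_to_lifn[urn] = lifn

    def resolve(self, urn):
        """Return the current LIFN for a URN and its registered locations."""
        lifn = self._urn_to_lifn[urn]
        return lifn, list(self._lifn_locations[lifn])

    def verify(self, lifn, content):
        """Check that retrieved bytes match the digest recorded for the LIFN."""
        return hashlib.sha256(content).hexdigest() == self._lifn_digest[lifn]


# Hypothetical names and mirror locations, for illustration only.
service = NameService()
service.publish("lifn:example:0001", b"release 1.0", "ftp://mirror-a.example/pkg")
service.publish("lifn:example:0001", b"release 1.0", "ftp://mirror-b.example/pkg")
service.bind_urn("urn:example:pkg", "lifn:example:0001")

# A bug fix gets a new LIFN; the URN is re-bound to it.
service.publish("lifn:example:0002", b"release 1.0 fixed", "ftp://mirror-a.example/pkg-fix")
service.bind_urn("urn:example:pkg", "lifn:example:0002")
```

A client that fetched a copy from any mirror would call verify() with the LIFN it resolved before trusting the bytes; hypertext links would carry only the URN.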
The URN would change when the high-level description changes (e.g., for a major new release of a software package) but not for a minor bug fix, which would just require a new LIFN. The idea is that URNs could be used on a fairly long-term basis in hypertext links and in records exported to search services. Users and repository managers (human or automated) could refer to a LIFN when an unambiguous reference to a particular sequence of bytes is needed.

We have put up a small (so far) report server that indexes reports from the BibNet, tennessee, and lapack/lawns directories in Netlib, as well as the University of Manchester and Pete Stewart's ftp directories, and we are accepting contributions. This server is accessible from the NHSE Information Databases page. The subject areas targeted are numerical analysis and high performance computing. The idea is that each contributing site will maintain a machine-parseable file containing the URL (or a filename from which the URL may be determined), title, author, abstract, etc., for each available report. A Harvest Gatherer retrieves these files on a regular basis and streams the extracted information to a Harvest Broker that indexes the information and provides a search interface. This search interface would also benefit from use of an HPCC thesaurus as a searching thesaurus to improve recall.

We have established contacts with the Reuse Library Interoperability Group (RIG) for the purpose of discussing:

1) unique naming and cataloguing standards for software and software-related assets
2) a common software evaluation framework
3) possible interoperation of Netlib and/or NHSE with ASSET, CARDS, DSRS, ELSA, and AdaIC

These five repositories currently interoperate, meaning that a user logged into one of the repositories may access assets exported by another repository without logging into the other repository.
So, for example, if NHSE interoperated with these, a user could retrieve assets exported by, say, ASSET from the NHSE hypertext interface without getting an ASSET account. Shirley Browne will be attending the Jan 24-25 RIG meeting to give a presentation on the LIFN/URN scheme and to discuss cataloguing and naming standards with the TC2 committee.

Shirley Browne / browne@cs.utk.edu / 12-16-94

--------------------------

Report on NHSE Work carried out at Syracuse since inception of NHSE Project.
Geoffrey Fox and Ken Hawick - report to mid Dec 1994.

We have carried out a lot of general hypertextual development work to assess what forms of information can best be organised to provide "roadmaps" to the software technology. In particular we have invented the concept of the HTML glossary as a mini roadmap to a technological area. A substantial glossary on HPCC terminology was developed, and smaller glossaries on special areas like HPF are under development. We have also looked into ways of speeding up the process of glossary development and quality control through use of word stemming (for automated cross referencing), consistency checking, and cross reference verification. We envisage a number of other specialised glossaries that we will develop over the coming months.

We have manually built three major roadmap packages. One was a technology integration system for the US Air Force and looked more broadly at software and hardware technologies. This project also pioneered the idea of multimedia consultant review articles which form a distributed encyclopaedia of technology. We envisage developing this idea further and focusing it on NHSE-type technologies.

We also constructed a roadmap package to HPF as a key technology from the CRPC. The concepts for this have been transferred to others, so that David Walker, for example, has agreed to undertake a similar package for MPI.
Our HPF package includes a number of HPF kernel codes as well as educational code fragments and full applications codes, including financial modeling, Black Hole Grand Challenge simulation code, and data assimilation code for meteorological purposes. Forty kernels are drawn from a broad base of computational science. We are currently analysing our findings from this HPF Applications study and preparing a documentary summary. We are also integrating our package with the applications areas identified by our colleagues at Maryland for the HPFF-2 effort.

We are developing a major Roadmap package on HPCC technology and applications. The applications section was based on the highly successful Fox applications category tables. We are currently reviewing these categories and adopting the categorisation scheme developed during the Pasadena Petaflops workshop. We are also reviewing potential categorisation schemes for HPCC software technologies themselves rather than just the applications, and we envisage a hypertextual cross linking and referencing scheme between these. This will enable both experts and novices to interrogate a multimedia package which will lead them through appropriate contextual links to the actual software packages in the NHSE.

We have also developed a survey of HPCC hardware vendors, both from a systems perspective and a historical perspective. We are currently revising this system in the light of new information from vendors at Supercomputing. This system is also cross-linked to the performance database system developed by our colleagues at Tennessee. We are also "mining" this database of hardware systems and supplier organisations for trends and are analysing this information to provide insights into next generation HPCC systems.

We have been working with InfoMall partners to assess what their preferred online information packages would be and how they would use the NHSE and Roadmaps.
Finally, we are considering how best to integrate search engines and other innovative information technology software with our online material. As well as customised search engine facilities, we also envisage configuring server systems to demonstrate selected software packages on demand. We see this "try before you download" scenario as complementary to the more commercially oriented "try before you buy". In particular we have experimented with hypertextual gateways to commercial database systems such as SYBASE and Oracle, which may be suited to handling very large collections of source-controlled software components.

Most of the above ideas and developments have been written up in the NHSE Document.

KAH, 9 Dec, 1994.