Work at UT on the NHSE, 1/17/94-1/30/94

Statistics Page
---------------

An NHSE Statistics page has been constructed that includes clickable NHSE usage graphs. To get to this page, click on "Statistics on the NHSE" on the NHSE home page. Please send comments, questions, or suggestions about this page to Stan Green / sgreen@cs.utk.edu.

Re-installation of Harvest
--------------------------

The "Search interface to the NHSE" is being provided through use of the Harvest system (http://harvest.cs.colorado.edu). We are currently running the pre-1.0 release; we were in the process of upgrading to the official Harvest-1.0 release when we received an announcement of the beta-1.1 release. We had modified the gatherer in the 1.0 releases as follows:

- modified the HTML summarizer to extract full text in addition to the title and other fields
- modified the httpenum program
  -- to expand RootNodes to a maximum depth, rather than just doing depth-first expansion to a maximum number of nodes. (We had a bug that caused the depth not to be decremented on an error return, which sometimes caused fewer files than intended to be retrieved during a RootNode expansion; this has now been fixed.)
  -- to follow offsite as well as onsite pointers when expanding RootNodes
- modified the gatherer to NOT automatically generate values for the keywords field
- modified the gatherer to go through the local file system rather than the http server when accessing local files

We had previously asked the Harvest developers to make some of these changes, but many months had passed. Then we received the beta-1.1 announcement stating that local gathering, a depth limit for RootNode expansion, and following of offsite pointers during RootNode expansion had all been implemented. So we will now check out the 1.1 release and upgrade to it once it is stable.

The current search interface uses freeWAIS-sf-1.0 as its search engine, but we have purchased commercial WAIS and will be switching over to it soon. The waisindex for the NHSE search interface is now a month out of date.
We have made some recent attempts at updating it, but because of the large number of files we are now gathering, we have run out of disk space on every attempt. The gathering itself is also taking on the order of a few days. We have moved the files to a location with more disk space and hope to have the waisindex updated within another day or two. However, it would help tremendously if some of the other sites serving a large number of NHSE files, such as Argonne and Syracuse, would run a local gatherer. If you are interested in doing this, please send email to Shirley Browne / browne@cs.utk.edu.

Software Survey
---------------

In the last report, we described the survey we are undertaking of the software pointed to by the NHSE. The software descriptions have been input both to the NTTC natural language processing software and to Mike Berry's LSI engine. We are currently manually indexing and abstracting the individual software items, but this is taking a considerable amount of time; only 19 of some 300 items have been processed so far. As we do the indexing, we are also compiling a list of thesaurus terms that will eventually have scope notes and definitions attached to them.

We plan to conduct an experiment comparing the following retrieval methods, using the software survey as a test database:

- NTTC natural language processing alone
- LSI alone
- LSI with NTTC noun phrase extraction as a preprocessing step
- the HPCC thesaurus for both manual indexing and searching, with boolean searches
- the HPCC thesaurus as a searching thesaurus only, with boolean searches
- NTTC assisted by thesaurus scope notes and definitions
- LSI assisted by thesaurus scope notes and definitions

Work Continues on URN/LIFN Publishing System
--------------------------------------------

In the last report, we described our work on this system.
Meetings we have held during the past two weeks have dealt with the following details concerning the URC (meta-information) server:

- how to implement read/write/delete permissions
- Assertion and Certificate data structures
- the syntax and semantics of query and update requests
- request and update handling algorithms
- a strategy for supporting different data models, such as the Reuse Library Interoperability Group (RIG) Basic Interoperability Data Model (BIDM)
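The depth-limited RootNode expansion described in the Harvest section above can be sketched roughly as follows. This is only an illustrative reconstruction, not Harvest's httpenum code: the names `expand_root_node`, `fetch_links`, `max_depth`, and `onsite_only` are hypothetical, and actual link retrieval is left to a caller-supplied function.

```python
def expand_root_node(root, fetch_links, max_depth, onsite_only=False, site=None):
    """Expand a RootNode breadth-first, level by level, to a maximum depth.

    fetch_links(url) should return the URLs linked from url, or raise
    OSError on a retrieval error.  A failed fetch skips that page's
    links but does not disturb the depth accounting for other pages
    (mishandling the depth counter on an error return was the kind of
    bug noted in the report above).
    """
    seen = {root}
    frontier = [root]
    collected = []
    for depth in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            collected.append(url)
            if depth == max_depth:
                continue            # depth limit reached; stop following links
            try:
                links = fetch_links(url)
            except OSError:
                continue            # retrieval error: skip this page's links
            for link in links:
                if onsite_only and site is not None and not link.startswith(site):
                    continue        # ignore offsite pointers unless enabled
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
        if not frontier:
            break
    return collected
```

With `max_depth=2`, for example, the root, its links, and their links are retrieved but nothing deeper, and a page that fails to fetch is skipped without halting the expansion.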
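Since the retrieval experiment in the Software Survey section compares LSI against boolean and natural-language methods, a minimal sketch of what LSI does may be useful. The code below is a generic textbook illustration using a plain SVD, not Mike Berry's LSI engine; the matrix layout, the query fold-in step, and all names (`lsi_rank`, `term_doc`, `k`) are assumptions for illustration only.

```python
import numpy as np

def lsi_rank(term_doc, query_vec, k=2):
    """Rank documents against a query with Latent Semantic Indexing.

    term_doc : (terms x docs) term-frequency matrix
    query_vec: (terms,) term-frequency vector for the query
    k        : number of latent dimensions to keep

    Returns document indices ordered from best to worst match.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # rank-k truncation
    doc_coords = Vk * sk                        # documents in latent space
    q = query_vec @ Uk                          # fold the query into that space
    sims = (doc_coords @ q) / (
        np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))              # cosine similarity, descending
```

In practice k is chosen much smaller than the number of terms, so that documents sharing co-occurring vocabulary match a query even when they do not share its exact words.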