Work at UT on the NHSE, 1/17/94-1/30/94

Statistics Page
---------------

An NHSE Statistics page has been constructed that includes clickable NHSE usage graphs. To get to this page, click on "Statistics on the NHSE" on the NHSE home page. Please send comments, questions, or suggestions about this page to Stan Green / sgreen@cs.utk.edu.

Re-installation of Harvest
--------------------------

The "Search interface to the NHSE" is being provided through use of the Harvest system (http://harvest.cs.colorado.edu). We are currently running the pre-1.0 release; we were in the process of upgrading to the official Harvest-1.0 release when we received an announcement of the beta-1.1 release. We had modified the gatherer in the 1.0 releases as follows:

- modified the HTML summarizer to extract full text in addition to the title and other fields
- modified the httpenum program
  -- to expand RootNodes to a maximum depth, rather than just doing depth-first expansion to a maximum number of nodes. (We had a bug that caused the depth not to be decremented on an error return, which sometimes caused fewer files than intended to be retrieved during a RootNode expansion; this has now been fixed.)
  -- to follow offsite as well as onsite pointers when expanding RootNodes
- modified the gatherer to NOT automatically generate values for the keywords field
- modified the gatherer to go through the local file system rather than the http server when accessing local files

We had previously asked the Harvest developers to make some of these changes, but many months had passed. Then we received the beta-1.1 announcement stating that local gathering, a depth limit for RootNode expansion, and following of offsite pointers during RootNode expansion had all been implemented. So we will now check out the 1.1 release and upgrade to it once it is stable.

The current search interface uses freeWAIS-sf-1.0 as its search engine, but we have purchased commercial WAIS and will be switching over to it soon. The waisindex for the NHSE search interface is now a month out of date.
We have made some recent attempts at updating it, but because of the large number of files we are now gathering, we have run out of disk space on every attempt. The gathering itself is also taking on the order of a few days. We have moved the files to a location with more disk space and hope to have the waisindex updated within another day or two. However, it would help tremendously if some of the other sites serving a large number of NHSE files, such as Argonne and Syracuse, would run a local gatherer. If you are interested in doing this, please send email to Shirley Browne / browne@cs.utk.edu.

Software Survey
---------------

In the last report, we described the survey we are undertaking of the software pointed to by the NHSE. The software descriptions have been input both to the NTTC natural language processing software and to Mike Berry's LSI engine. We are currently manually indexing and abstracting the individual software items, but this is taking a considerable amount of time; only 19 of some 300 items have been processed so far. As we do the indexing, we are also compiling a list of thesaurus terms that will eventually have scope notes and definitions attached to them.

We plan to conduct an experiment comparing the following retrieval methods, using the software survey as a test database:

- NTTC natural language processing alone
- LSI alone
- LSI with NTTC noun phrase extraction as a preprocessing step
- the HPCC thesaurus for both manual indexing and searching, with boolean searches
- the HPCC thesaurus as a searching thesaurus only, with boolean searches
- NTTC assisted by thesaurus scope notes and definitions
- LSI assisted by thesaurus scope notes and definitions

Work Continues on URN/LIFN Publishing System
--------------------------------------------

In the last report, we described our work on this system.
Meetings we have held during the past two weeks have dealt with the following details concerning the URC (meta-information) server:

- how to implement read/write/delete permissions
- Assertion and Certificate data structures
- the syntax and semantics of query and update requests
- request and update handling algorithms
- a strategy for supporting different data models, such as the Reuse Library Interoperability Group (RIG) Basic Interoperability Data Model (BIDM)
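The depth-limited RootNode expansion described in the Harvest section above can be sketched roughly as follows. This is only an illustrative reconstruction, not Harvest's httpenum code: the names `expand_root_node`, `fetch_links`, `max_depth`, and `onsite_only` are hypothetical, and actual link retrieval is left to a caller-supplied function.

```python
def expand_root_node(root, fetch_links, max_depth, onsite_only=False, site=None):
    """Expand a RootNode breadth-first, level by level, to a maximum depth.

    fetch_links(url) should return the URLs linked from url, or raise
    OSError on a retrieval error.  A failed fetch skips that page's
    links but does not disturb the depth accounting for other pages
    (mishandling the depth counter on an error return was the kind of
    bug noted in the report above).
    """
    seen = {root}
    frontier = [root]
    collected = []
    for depth in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            collected.append(url)
            if depth == max_depth:
                continue            # depth limit reached; stop following links
            try:
                links = fetch_links(url)
            except OSError:
                continue            # retrieval error: skip this page's links
            for link in links:
                if onsite_only and site is not None and not link.startswith(site):
                    continue        # ignore offsite pointers unless enabled
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
        if not frontier:
            break
    return collected
```

With `max_depth=2`, for example, the root, its links, and their links are retrieved but nothing deeper, and a page that fails to fetch is skipped without halting the expansion.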
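Since the retrieval experiment in the Software Survey section compares LSI against boolean and natural-language methods, a minimal sketch of what LSI does may be useful. The code below is a generic textbook illustration using a plain SVD, not Mike Berry's LSI engine; the matrix layout, the query fold-in step, and all names (`lsi_rank`, `term_doc`, `k`) are assumptions for illustration only.

```python
import numpy as np

def lsi_rank(term_doc, query_vec, k=2):
    """Rank documents against a query with Latent Semantic Indexing.

    term_doc : (terms x docs) term-frequency matrix
    query_vec: (terms,) term-frequency vector for the query
    k        : number of latent dimensions to keep

    Returns document indices ordered from best to worst match.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # rank-k truncation
    doc_coords = Vk * sk                        # documents in latent space
    q = query_vec @ Uk                          # fold the query into that space
    sims = (doc_coords @ q) / (
        np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))              # cosine similarity, descending
```

In practice k is chosen much smaller than the number of terms, so that documents sharing co-occurring vocabulary match a query even when they do not share its exact words.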