MCNC/CNIDR & A/WWW Enterprises
Introduction to CNIDR’s Isite
Jim Fullton - MCNC/CNIDR
Archie Warnock - A/WWW Enterprises
What is Isite?
  A freely available implementation of the Z39.50 search/retrieval protocol
  It includes a Unix-based server, a WWW gateway, a command-line client and a sophisticated text search engine
  ftp://ftp.cnidr.org/pub/NIDR.tools/Isite
  http://vinca.cnidr.org/software/Isite/Isite.html
What is Isearch?
  Isearch is the successor to freeWAIS
  Isearch is a sophisticated full-text search and retrieval system
  Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
  ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
  http://vinca.cnidr.org/software/Isearch/Isearch.html
System Components - I
  Iindex, the Text Indexer - builds searchable version of the document collection
    Implements fast word-based searching
    Document parser - recognize start/end of individual documents
    Field parser - recognize start/end of fields within individual documents
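The three indexing steps above (document parser, field parser, word index) can be sketched as follows. This is a toy illustration, not Iindex itself: the `----` separator, the `name: value` field layout, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical toy format: documents separated by a line of "----",
# fields written as "name: value" lines. Not the Isite file format.
RAW = """title: Sparse matrix methods
author: Smith
----
title: Patent search engines
author: Jones
"""

def parse_documents(raw):
    """Document parser: find the start/end of each document."""
    return [chunk.strip() for chunk in raw.split("----") if chunk.strip()]

def parse_fields(doc):
    """Field parser: find the start/end of each field in one document."""
    fields = {}
    for line in doc.splitlines():
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields

def build_index(raw):
    """Map each word to the (document number, field name) pairs it occurs in."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(parse_documents(raw)):
        for field, text in parse_fields(doc).items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((doc_id, field))
    return index

index = build_index(RAW)
```

Looking up `index["matrix"]` then returns the documents and fields containing that word without rescanning the text, which is what makes word-based searching fast.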
System Components - II
  Isearch, the Search engine - searches a document collection based on user-supplied query
    Command line search
      Primarily used for testing
    WWW gateway (using CGI)
      End-user interface using forms
    Z39.50 gateway
Isearch Capabilities
  Fast full-text search
    US AIDS Patent Collection - can search ~250,000 patents in < 1 second
  Fielded search
    Can restrict searches to title, author, abstract, other fields
  Relevance ranking
    Search “hits” are assigned scores & sorted
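Fielded search and relevance ranking can be illustrated with a small sketch. The records, the `search` function, and the scoring rule (a plain occurrence count) are assumptions for illustration; the slides do not say how Isearch computes its scores.

```python
import re
from collections import Counter

# Hypothetical records; field names follow the slide (title, abstract).
DOCS = [
    {"title": "Matrix patents", "abstract": "A patent on matrix inversion"},
    {"title": "Search engines", "abstract": "Ranking search results with a term matrix"},
    {"title": "Gardening", "abstract": "Growing tomatoes"},
]

def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(docs, term, field=None):
    """Return (score, document number) pairs, best first.

    If field is given, only that field is searched (fielded search).
    The score is a plain occurrence count, chosen for illustration only.
    """
    term = term.lower()
    hits = []
    for doc_id, doc in enumerate(docs):
        texts = [doc[field]] if field else list(doc.values())
        score = sum(Counter(words(t))[term] for t in texts)
        if score > 0:
            hits.append((score, doc_id))
    # Highest score first; ties broken by document number.
    return sorted(hits, key=lambda h: (-h[0], h[1]))
```

Calling `search(DOCS, "matrix")` ranks the first record above the second because the word occurs in both its title and abstract, while `search(DOCS, "matrix", field="title")` restricts the match to titles.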
Isearch Capabilities
  Word truncation
    search for “matri*” matches “matrix” and “matrices”
  Boolean functions
    AND, OR and ANDNOT combinations of different fields
  Customized presentation of results
  Phrase searching (coming soon)
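Word truncation and the three Boolean functions map naturally onto set operations over a word index. The sketch below assumes a hypothetical `POSTINGS` table of word to document numbers; the helper names are illustrative and are not Isearch functions.

```python
# Hypothetical postings: word -> set of document numbers containing it.
POSTINGS = {
    "matrix": {0, 2},
    "matrices": {1},
    "material": {3},
    "sparse": {0, 1, 3},
}

def lookup(postings, term):
    """Resolve one term; a trailing '*' requests right truncation."""
    if term.endswith("*"):
        prefix = term[:-1]
        result = set()
        for word, docs in postings.items():
            if word.startswith(prefix):
                result |= docs
        return result
    return set(postings.get(term, set()))

def op_and(a, b):
    """Documents present in both result sets."""
    return a & b

def op_or(a, b):
    """Documents present in either result set."""
    return a | b

def op_andnot(a, b):
    """Documents in the first result set but not the second."""
    return a - b
```

Each operand can itself be the result of a lookup restricted to one field, which is how combinations across different fields would be expressed in this model.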
Isearch Customization
  What’s needed to customize Isearch?
    Isearch is written in C++
    Documents are C++ objects - data & procedures
    Already have SGML & HTML, among others
    Object technology allows code reusability, customizing only where differences from existing objects occur
Isearch Customization
  What’s needed to make arbitrary documents searchable?
    Code to parse documents
    Code to parse fields
    Code to build brief and full result records
  Yes, it requires programming
  But, many of these are derived from existing procedures
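The idea of deriving a new document type from an existing one can be sketched with a small class hierarchy. Isearch itself is written in C++; the Python below only illustrates the inheritance pattern, and the class and method names (`DocType`, `TaggedDoc`, `parse_fields`, `brief`, `full`) are hypothetical rather than taken from the Isearch source.

```python
import re

class DocType:
    """Base document type: one document per file, no fields."""

    def parse_documents(self, raw):
        return [raw]

    def parse_fields(self, doc):
        return {}

    def brief(self, doc):
        # Brief record: first line only.
        return doc.strip().splitlines()[0]

    def full(self, doc):
        # Full record: the whole document.
        return doc

class TaggedDoc(DocType):
    """Hypothetical type for documents holding <tag>value</tag> fields.

    Only field parsing and the brief record differ from the base class;
    document splitting and the full record are inherited unchanged.
    """

    def parse_fields(self, doc):
        return dict(re.findall(r"<(\w+)>(.*?)</\1>", doc, flags=re.S))

    def brief(self, doc):
        fields = self.parse_fields(doc)
        if "title" in fields:
            return fields["title"]
        return super().brief(doc)
```

The subclass overrides two methods and inherits the rest, which mirrors the point that only the differences from an existing document type need new code.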
Introduction to Z39.50
  Developed for search and retrieval
  Networked, client/server environment
  Tested by working information scientists (Z39.50 Implementor’s Group)
  Commercial & public domain support (Isite from CNIDR)
  http://www.ds.internic.net/z3950/z3950.html
Attribute Sets
  Attributes define how the query is specified
    Use: field names
    Relation: comparisons
    Position: location in field
    Structure: word/phrase/key/etc
    Truncation: left/right/none/etc
    Completeness: subfield/field
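One way to picture the six attribute types is as numbered slots attached to each search term. The sketch below uses the type numbers 1 to 6 in the order listed above, as in the BIB-1 attribute set; the `QueryTerm` class is hypothetical, and the example values are believed to match BIB-1 but should be checked against the attribute set definition.

```python
from dataclasses import dataclass

# Attribute type numbers, in the order the types are listed above.
ATTRIBUTE_TYPES = {
    "use": 1,
    "relation": 2,
    "position": 3,
    "structure": 4,
    "truncation": 5,
    "completeness": 6,
}

@dataclass
class QueryTerm:
    """One search term plus its (type name -> value) attributes."""
    term: str
    attributes: dict

    def attribute_pairs(self):
        """Return sorted (type number, value) pairs for this term."""
        return sorted(
            (ATTRIBUTE_TYPES[name], value)
            for name, value in self.attributes.items()
        )

# Values below are believed to match BIB-1 (Use 4 = title,
# Structure 2 = word, Truncation 1 = right truncation); verify them
# against the attribute set definition before relying on them.
title_prefix = QueryTerm("matri", {"use": 4, "structure": 2, "truncation": 1})
```

Here `title_prefix.attribute_pairs()` yields the numeric pairs a client would attach to the term "matri" to ask for a right-truncated word search in the title field.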
Attributes & Element Sets
  Supported Attribute Sets
    BIB-1
    GILS
    GEO
    STAS
  Element Sets define retrievable sets of use attributes
    Brief record
    Full record
    Summary record (GEO)
Record Syntaxes
  Z39.50 allows specification of a “Preferred Record Syntax” for results
    SUTRS (unstructured text)
    HTML
    USMARC
    GRS-1 (tagged, generalized syntax)
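Because the record syntax is a preference rather than a demand, a server needs some rule for what to send when it cannot honor it. The sketch below shows one possible rule; the `choose_syntax` function, the server's supported list, and the fallback to SUTRS are assumptions for illustration, not behavior documented for Isite.

```python
# Syntaxes this hypothetical server can produce.
SERVER_SYNTAXES = ["GRS-1", "USMARC", "SUTRS"]

def choose_syntax(preferred, supported=SERVER_SYNTAXES, fallback="SUTRS"):
    """Pick the record syntax for a response.

    The client's preferred syntax wins when the server supports it;
    otherwise this sketch falls back to plain text. A real server may
    instead return a diagnostic; that policy is an assumption here.
    """
    if preferred in supported:
        return preferred
    return fallback
```

With these settings a client asking for USMARC receives USMARC, while a client asking for HTML receives SUTRS.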
Profiles - GEO and Otherwise
  Profiles define allowed attributes and element sets
  Usually domain specific - ATS-1, GILS, WAIS, GEO, Digital Collections, Museum Collections
  Supported by external agreement between client & server (currently)
    i.e., a GEO client talks to a GEO server
FGDC Enhancements
  Search Engine (Iindex/Isearch)
    Field types (text, numeric, date, others)
    Search in nested fields
    Search in numeric fields
    Date & Date Range Searching
    Spatial Searching
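Spatial and date range searching both reduce to simple comparisons once fields carry a type. The sketch below shows a bounding box overlap test and an inclusive date range test; the `BoundingBox` class and `in_date_range` function are hypothetical, and the box test assumes no box crosses the 180-degree meridian.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BoundingBox:
    """Geographic extent in decimal degrees."""
    west: float
    east: float
    south: float
    north: float

    def overlaps(self, other):
        """True when the two boxes share any area or edge.

        Assumes west <= east, i.e. no box crosses the 180-degree meridian.
        """
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

def in_date_range(value, start, end):
    """Date range search on one field value, with inclusive endpoints."""
    return start <= value <= end
```

A record whose extent is `BoundingBox(-84.3, -75.5, 33.8, 36.6)` would match a query box of `BoundingBox(-80, -70, 35, 40)`, since the two ranges overlap in both longitude and latitude.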
FGDC Enhancements
  Z39.50 Implementation (ZDist)
    Support for GEO attributes & element sets
    GRS-1 record syntax
    Support for additional (non-Isearch) search engines
    Syntax to support nested query
Outstanding Issues
  User Interface
    What fields are searchable and how does the user indicate them?
    How complex can the geographic queries be?
      Bounding box only?
      Complex regions?