MCNC/CNIDR & A/WWW Enterprises

Introduction to CNIDR’s Isite

Jim Fullton - MCNC/CNIDR
Archie Warnock - A/WWW Enterprises

What is Isite?

- A freely available implementation of the Z39.50 search/retrieval protocol
- It includes a Unix-based server, a WWW gateway, a command-line client and a sophisticated text search engine
- ftp://ftp.cnidr.org/pub/NIDR.tools/Isite
- http://vinca.cnidr.org/software/Isite/Isite.html

What is Isearch?

- Isearch is the successor to freeWAIS
- Isearch is a sophisticated full-text search and retrieval system
- Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
- ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
- http://vinca.cnidr.org/software/Isearch/Isearch.html

System Components - I

- Iindex, the Text Indexer - builds searchable version of the document collection
  - Implements fast word-based searching
  - Document parser - recognize start/end of individual documents
  - Field parser - recognize start/end of fields within individual documents
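
The indexing steps on this slide can be illustrated with a minimal Python sketch. It is not Iindex's actual code; the record separator, the `Name: value` field layout, and every function name here are assumptions made up for illustration.

```python
from collections import defaultdict
import re

RECORD_SEPARATOR = "\n---\n"  # hypothetical document delimiter


def parse_documents(raw):
    """Document parser: split the raw collection into individual documents."""
    return [doc.strip() for doc in raw.split(RECORD_SEPARATOR) if doc.strip()]


def parse_fields(doc):
    """Field parser: read hypothetical 'Name: value' lines into a dict."""
    fields = {}
    for line in doc.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def build_index(raw):
    """Build an inverted index mapping (field, word) to a set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(parse_documents(raw)):
        for field, text in parse_fields(doc).items():
            for word in re.findall(r"\w+", text.lower()):
                index[(field, word)].add(doc_id)
    return index


collection = (
    "Title: Sparse matrix methods\nAuthor: Smith"
    "\n---\n"
    "Title: Patent search\nAuthor: Jones"
)
index = build_index(collection)
print(sorted(index[("title", "matrix")]))
```

Keying the index on (field, word) pairs is one simple way to make word lookups fast while still allowing a search to be restricted to a single field.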

System Components - II

- Isearch, the Search engine - searches a document collection based on user-supplied query
  - Command line search
    - Primarily used for testing
  - WWW gateway (using CGI)
    - End-user interface using forms
  - Z39.50 gateway

Isearch Capabilities

- Fast full-text search
  - US AIDS Patent Collection - can search ~250,000 patents in < 1 second
- Fielded search
  - Can restrict searches to title, author, abstract, other fields
- Relevance ranking
  - Search “hits” are assigned scores & sorted
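
Fielded search and relevance ranking can be sketched together in a few lines of Python. The slide does not say how Isearch computes scores, so the sketch below uses a plain term-frequency count as a stand-in; the `rank` function and the sample records are hypothetical.

```python
def rank(documents, term, field):
    """Score each document by how often `term` occurs in `field`; best first."""
    hits = []
    for doc_id, fields in documents.items():
        words = fields.get(field, "").lower().split()
        score = words.count(term.lower())
        if score > 0:
            hits.append((doc_id, score))
    # Sort by descending score, breaking ties by document id for stable output.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


documents = {
    "doc1": {"title": "vaccine trial", "abstract": "a vaccine for a virus"},
    "doc2": {"title": "vaccine vaccine delivery", "abstract": "delivery device"},
    "doc3": {"title": "assay method", "abstract": "vaccine assay"},
}
ranked = rank(documents, "vaccine", "title")
print(ranked)
```

Restricting the search to `title` leaves out doc3, which mentions the term only in its abstract.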

Isearch Capabilities

- Word truncation
  - search for “matri*” matches “matrix” and “matrices”
- Boolean functions
  - AND, OR and ANDNOT combinations of different fields
- Customized presentation of results
- Phrase searching (coming soon)
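
A minimal Python sketch of truncation and the three Boolean operators, assuming a toy word-to-document-ids index (the index contents and the helper names `expand` and `lookup` are invented for illustration and do not reflect Isearch internals):

```python
def expand(pattern, vocabulary):
    """Expand a right-truncated pattern such as 'matri*' into matching words."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return {word for word in vocabulary if word.startswith(prefix)}
    return {pattern} & set(vocabulary)


def lookup(pattern, index):
    """Return ids of documents containing any word the pattern expands to."""
    ids = set()
    for word in expand(pattern, index):
        ids |= index[word]
    return ids


index = {
    "matrix": {1, 2},
    "matrices": {3},
    "material": {4},
    "sparse": {2, 3, 4},
}

matri = lookup("matri*", index)
both = matri & lookup("sparse", index)       # AND
either = matri | lookup("material", index)   # OR
without = matri - lookup("sparse", index)    # ANDNOT
print(matri, both, either, without)
```

Each Boolean operator maps onto a set operation over the result sets, so the same three lines would work on result sets drawn from different fields.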

Isearch Customization

- What’s needed to customize Isearch?
  - Isearch is written in C++
  - Documents are C++ objects - data & procedures
  - Already have SGML & HTML, among others
- Object technology allows code reusability, customizing only where differences from existing objects occur

Isearch Customization

- What’s needed to make arbitrary documents searchable?
  - Code to parse documents
  - Code to parse fields
  - Code to build brief and full result records
- Yes, it requires programming
- But, many of these are derived from existing procedures
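
The idea of deriving a new document type from an existing one can be sketched as follows. Isearch itself is written in C++; this Python version only illustrates the pattern, and the class names (`DocType`, `SimpleTagged`), method names, and tagged sample format are all hypothetical rather than Isearch's actual interfaces.

```python
import re


class DocType:
    """Hypothetical base type: plain text, documents separated by blank lines."""

    def parse_documents(self, raw):
        return [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]

    def parse_fields(self, doc):
        return {"text": doc}

    def brief(self, doc):
        # Brief record: the first line of the document.
        return doc.splitlines()[0]

    def full(self, doc):
        # Full record: the whole document.
        return doc


class SimpleTagged(DocType):
    """Hypothetical subclass: overrides only field parsing and the brief record."""

    def parse_fields(self, doc):
        return {m.group(1).lower(): m.group(2).strip()
                for m in re.finditer(r"<(\w+)>(.*?)</\1>", doc, re.S)}

    def brief(self, doc):
        return self.parse_fields(doc).get("title", super().brief(doc))


raw = ("<title>Isite overview</title>\n<author>Example Author</author>"
       "\n\n<title>GEO profile</title>")
doctype = SimpleTagged()
docs = doctype.parse_documents(raw)
print([doctype.brief(d) for d in docs])
```

The subclass reuses document splitting and the full-record builder unchanged, which is the "customize only where the differences occur" point made on the slides.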

Introduction to Z39.50

- Developed for search and retrieval
- Networked, client/server environment
- Tested by working information scientists (Z39.50 Implementor’s Group)
- Commercial & public domain support (Isite from CNIDR)
- http://www.ds.internic.net/z3950/z3950.html

Attribute Sets

- Attributes define how the query is specified
  - Use: field names
  - Relation: comparisons
  - Position: location in field
  - Structure: word/phrase/key/etc
  - Truncation: left/right/none/etc
  - Completeness: subfield/field
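
To make the six attribute types concrete, here is a small Python sketch. In the Bib-1 attribute set the types are numbered 1 to 6 in the order listed above, and the values used below (Use 4 for title, Structure 2 for word, Truncation 1 for right truncation) are Bib-1 values. The `@attr type=value` rendering borrows the prefix query notation used by some Z39.50 toolkits; it is an assumption for readability, not a syntax the slides attribute to Isite, and `AttrType` and `render` are hypothetical names.

```python
from enum import IntEnum


class AttrType(IntEnum):
    """Bib-1 attribute types, numbered in the order listed on the slide."""
    USE = 1
    RELATION = 2
    POSITION = 3
    STRUCTURE = 4
    TRUNCATION = 5
    COMPLETENESS = 6


def render(term, attributes):
    """Render a term and its {AttrType: value} attributes as a prefix-style string."""
    parts = [f"@attr {int(t)}={v}" for t, v in sorted(attributes.items())]
    return " ".join(parts + [term])


# Title search (Use=4) for a single word (Structure=2) with right truncation (Truncation=1).
query = render("matri", {
    AttrType.USE: 4,
    AttrType.STRUCTURE: 2,
    AttrType.TRUNCATION: 1,
})
print(query)
```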

Attributes & Element Sets

- Supported Attribute Sets
  - BIB-1
  - GILS
  - GEO
  - STAS
- Element Sets define retrievable sets of use attributes
  - Brief record
  - Full record
  - Summary record (GEO)

Record Syntaxes

- Z39.50 allows specification of a “Preferred Record Syntax” for results
  - SUTRS (unstructured text)
  - HTML
  - USMARC
  - GRS-1 (tagged, generalized syntax)
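
A rough Python sketch of how a server might honor a preferred record syntax. The object identifiers are the ones registered for these Z39.50 record syntaxes; the fallback-to-SUTRS policy and the `choose_syntax` helper are assumptions for illustration, not behavior the slides describe for Isite.

```python
# Registered Z39.50 record syntax object identifiers.
RECORD_SYNTAX_OIDS = {
    "USMARC": "1.2.840.10003.5.10",
    "SUTRS": "1.2.840.10003.5.101",
    "GRS-1": "1.2.840.10003.5.105",
    "HTML": "1.2.840.10003.5.109.3",
}


def choose_syntax(preferred, available, fallback="SUTRS"):
    """Pick the preferred syntax if the record is available in it, else the fallback."""
    name = preferred if preferred in available else fallback
    return name, RECORD_SYNTAX_OIDS[name]


print(choose_syntax("HTML", {"SUTRS", "HTML"}))
print(choose_syntax("USMARC", {"SUTRS"}))
```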

Profiles - GEO and Otherwise

- Profiles define allowed attributes and element sets
- Usually domain specific - ATS-1, GILS, WAIS, GEO, Digital Collections, Museum Collections
- Supported by external agreement between client & server (currently)
  - i.e., a GEO client talks to a GEO server

FGDC Enhancements

- Search Engine (Iindex/Isearch)
  - Field types (text, numeric, date, others)
  - Search in nested fields
  - Search in numeric fields
  - Date & Date Range Searching
  - Spatial Searching
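
Date-range and spatial searching both reduce to simple comparisons once fields are typed. The Python sketch below shows an inclusive date-range test and a bounding-box overlap test; the `BoundingBox` class, the `in_date_range` helper, and the coordinates are hypothetical, and the overlap test deliberately ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other):
        """True if the two boxes share any area or edge (no antimeridian handling)."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


def in_date_range(value, start, end):
    """Inclusive date-range test."""
    return start <= value <= end


record_box = BoundingBox(west=-84.3, south=33.8, east=-75.4, north=36.6)
query_box = BoundingBox(west=-80.0, south=35.0, east=-78.0, north=36.0)
far_box = BoundingBox(west=10.0, south=40.0, east=12.0, north=42.0)

print(record_box.overlaps(query_box), record_box.overlaps(far_box))
print(in_date_range(date(1995, 6, 1), date(1995, 1, 1), date(1995, 12, 31)))
```

A bounding-box test like this is cheap to evaluate; supporting complex regions would need real polygon geometry instead.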

FGDC Enhancements

- Z39.50 Implementation (ZDist)
  - Support for GEO attributes & element sets
  - GRS-1 record syntax
  - Support for additional (non-Isearch) search engines
  - Syntax to support nested query

Outstanding Issues

- User Interface
  - What fields are searchable and how does the user indicate them?
  - How complex can the geographic queries be?
    - Bounding box only?
    - Complex regions?