Next Generation Z39.50: A Web Services Approach for Search and Retrieve
6th Annual State GILS Conference, March 31 – April 3, 2004, Raleigh, NC
William E. Moen <[email protected]>
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 72603
Moen 6th Annual State GILS Conference -- March 31 – April 3, 2004 -- Raleigh, NC 2
Overview
• Quick description of SRW
• Brief background – historical, political, conceptual
• Non-technical (almost) introduction to SRW
• Common Query Language (CQL), briefly
• Concluding thoughts
What is SRW?
• Search and Retrieve Web Service (SRW)
• An XML-based protocol for searching, retrieving, and other information retrieval transactions
• Cast in the standards/technologies for web services
  • XML
  • SOAP
  • HTTP
• Brings the concepts and experience of Z39.50 into the web environment using web technologies
Why SRW?
• Genesis: several years of soul searching by Z39.50 developers and implementors
• The “web” had become the common implementation environment
• Z39.50 was not perceived as web friendly
• Pivotal moments:
  • December 2000 ZIG meeting
  • July 2001 meeting
Turning point: December 2000
• “Z39.50 Future” discussion
• Perceptions of Z39.50
  • broken
  • heavy-weight
  • difficult and complex
  • old technology
  • not web friendly
• Several options presented
  • Rewrite the protocol from the ground up
  • Rewrite as an XML protocol
  • Separate the Z39.50 protocol from its use of BER as a wire protocol
  • Simplify the protocol specifications to focus on core features
• Recognition of the intellectual contribution of Z39.50
Taking action: June 2001
• Invitational meeting to discuss moving Z39.50 to an XML-based protocol
• Goal
  • Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50, discarding those aspects no longer useful or meaningful.
• Objective
  • Define specifications for a new web service definition based on Z39.50 together with web technologies
  • Separate the Z39.50 abstract and associated semantic model from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
• Initially called Z39.50 Next Generation (ZNG)
  • Intended as proof-of-concept
  • Defining only those protocol specifications that would actually be implemented by participants
ZING – Z39.50 International Next Generation
• Make intellectual/semantic content of Z39.50 more broadly available
• Make Z39.50 more attractive by lowering barriers to implementation
  • Use of XML – to represent and encode data
  • Use of HTTP – for transport
  • Use of SOAP – for interaction between client and server based on Remote Procedure Call (RPC)
• Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
FOR MORE INFORMATION, VISIT THE ZING WEBSITE…
http://www.loc.gov/z3950/agency/zing/
SRW/U, SRW, SRU
• SRW/U: Search and Retrieve for the Web
  • General designation for this initiative
• SRW: Search and Retrieve Web Service
  • HTTP POST
  • Simple Object Access Protocol (SOAP)
  • XML messages
• SRU: Search and Retrieve URL Service
  • HTTP GET
  • Request parameters included in URL syntax
• Development
  • Version 1.0 November 2001
  • Version 1.1 February 2002
FOR MORE INFORMATION, VISIT THE SRW WEBSITE…
http://www.loc.gov/srw
Networked information retrieval
• What’s needed:
  • Identifying a target to search
  • A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
  • Methods to encode the requests and responses from the target
  • Methods to transport the requests and responses across a network
• In other words, a protocol and supporting specifications
Abstract Model of IR
Abstract model of Z39.50
Z39.50 classic & SRW
SRW Overview
• Builds on Z39.50 concepts and web technologies
• Web technologies: XML, SOAP, HTTP
• Uses new, human-readable query language
• Combines several Z39.50 features into several “operation types”
  • searchRetrieve operation
  • scan operation
  • explain operation
searchRetrieve operation
• The core of the protocol
• Expresses the search and additional criteria
• Records are returned in XML
• Request parameters
  • version
  • query
  • Optional parameters
    • sortKeys
    • recordPacking
    • recordSchema
    • recordXPath
    • stylesheet
• Response parameters
  • version
  • numberOfRecords
  • Optional parameters
    • resultSetId
    • resultSetIdleTime
    • records
    • diagnostics
SRW & XML
• XML as foundation for protocol
  • Provides syntax for intelligent markup
  • Defines or references XML schemas
• Example XML schema for SRW specifications
  • searchRetrieveRequest
  • searchRetrieveResponse
searchRetrieveRequest example
• Sent as an HTTP POST
• XML document is sent to the server
• Using SOAP to wrap the request
    <searchRetrieveRequest>
      <version>1.1</version>
      <query>dc.title all "Squirrel Hungry"</query>
      <maximumRecords>1</maximumRecords>
      <startRecord>1</startRecord>
      <recordSchema>dc</recordSchema>
    </searchRetrieveRequest>
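The request above is shown without its SOAP wrapper. A minimal Python sketch of how a client might build the wrapped message follows. The SOAP 1.1 envelope namespace is standard; the SRW namespace URI and the helper name build_search_retrieve_request are assumptions made for illustration and should be checked against the SRW schema before use.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW namespace; verify against the schema


def build_search_retrieve_request(query, start_record=1, maximum_records=1,
                                  record_schema="dc", version="1.1"):
    """Return a SOAP envelope (as a string) wrapping a searchRetrieveRequest."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Same parameters as in the slide example, in the same order.
    for name, value in (
        ("version", version),
        ("query", query),
        ("maximumRecords", maximum_records),
        ("startRecord", start_record),
        ("recordSchema", record_schema),
    ):
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


soap_message = build_search_retrieve_request('dc.title all "Squirrel Hungry"')
print(soap_message)
```

The resulting string would be sent as the body of an HTTP POST to the SRW server; the transport step is left out here because the server address and required headers depend on the deployment.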
searchRetrieveResponse example
    <searchRetrieveResponse>
      <version>1.1</version>
      <numberOfRecords>10</numberOfRecords>
      <records>
        <record>
          <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
          <recordData>
            <dc:record>
              <dc:title>Squirrel is Hungry</dc:title>
            </dc:record>
          </recordData>
        </record>
      </records>
    </searchRetrieveResponse>
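A short Python sketch of how a client might read such a response. The sample document repeats the slide example; the xmlns:dc declaration is added so that the text parses, and the function names are hypothetical. A real response would also carry SOAP and SRW namespaces, which is why the sketch matches elements by local name only.

```python
import xml.etree.ElementTree as ET

# Slide example, with a Dublin Core namespace declaration added so it is well-formed.
SAMPLE_RESPONSE = """\
<searchRetrieveResponse xmlns:dc="http://purl.org/dc/elements/1.1/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_response(xml_text):
    """Return (numberOfRecords, list of titles) from a searchRetrieveResponse."""
    root = ET.fromstring(xml_text)
    total = None
    titles = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "numberOfRecords":
            total = int(element.text)
        elif name == "title":
            titles.append(element.text)
    return total, titles


total, titles = parse_response(SAMPLE_RESPONSE)
print(total, titles)
```

Note that numberOfRecords reports the size of the whole result set (10 here), while the number of records actually returned is limited by maximumRecords in the request.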
searchRetrieve response
• Records returned in response
• All records in XML syntax
• According to one or more XML schemas (semantics)
  • Dublin Core
  • Onix
  • MODS
  • MarcXml
searchRetrieve example
• Retrieval results
• XML view
• Screen shot
    <searchRetrieveRequest>
      <version>1.1</version>
      <query>dc.title computer</query>
      <startRecord>1</startRecord>
      <maximumRecords>10</maximumRecords>
      <recordPacking>xml</recordPacking>
      <recordSchema>dc</recordSchema>
    </searchRetrieveRequest>
SRW results
SRU briefly
• Protocol requests can be carried via HTTP GET
• searchRetrieveRequest parameters expressed in standard URL syntax
• baseURL and search part separated by question mark “?”
• Response is XML document containing records
• The searchRetrieveRequest in SRU:
http://alcme.oclc.org/srw/search/SOAR?operation=searchRetrieve&version=1.1&query=dc.title=%22computer%22&recordSchema=DC&startRecord=1&maximumRecords=10&recordPacking=xml
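A URL like the one above can be assembled with Python's standard library. This is a sketch only: build_sru_url is a hypothetical helper, and the base URL is taken from the slide without any claim that the server is still available.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, query, **params):
    """Join the base URL and the SRU parameters with a '?' separator."""
    all_params = {"operation": "searchRetrieve", "version": "1.1", "query": query}
    all_params.update(params)
    # urlencode percent-encodes reserved characters in the CQL query.
    return base_url + "?" + urlencode(all_params)


sru_url = build_sru_url(
    "http://alcme.oclc.org/srw/search/SOAR",
    'dc.title="computer"',
    recordSchema="DC",
    startRecord=1,
    maximumRecords=10,
    recordPacking="xml",
)
print(sru_url)
```

urlencode also encodes the "=" inside the CQL query (as %3D), which the slide URL leaves unencoded; both forms decode to the same query string on the server side.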
search/Retrieve query
• SRW query consists of one or more query statements linked by Boolean operators
• Five categories of query statements:
  1. single search clause
  2. two or more search clauses linked by Boolean operators
  3. search clauses and result sets linked by Boolean operators
  4. two or more result sets linked by Boolean operators
  5. single result set
• Expressed in the Common Query Language (CQL)
Common Query Language (CQL)
• A formal language for representing queries to information retrieval systems
• Human-readable
• Search clause
  • Always includes a term
    • simple terms consist of one or more words
  • May include index name
    • To limit search to a particular field/element
    • Index name includes base name and may include prefix
      • title, subject
      • dc.title, dc.subject
    • Several index sets have been defined (called Context Sets in SRW)
      • dc
      • bath
      • srw
    • Context set defines the available indexes for a particular application
Other components of CQL
• Relation
  • <, >, <=, >=, =, <>
  • exact: used for string matching
  • all: when term is a list of words, indicates that all of the words must be found
  • any: when term is a list of words, indicates that any of the words must be found
• Boolean operators: and, or, not
• Proximity (prox operator)
  • relation (<, >, <=, >=, =, <>)
  • distance (integer)
  • unit (word, sentence, paragraph, element)
  • ordering (ordered or unordered)
• Masking rules and special characters
  • single asterisk (*) to mask zero or more characters
  • single question mark (?) to mask a single character
  • caret/hat (^) to indicate anchoring, left or right
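The masking rules can be illustrated by translating a CQL term into a regular expression. The Python sketch below is an illustration, not how any particular SRW server works: the function name is hypothetical, matching is assumed to be case-insensitive and word-based, and backslash escaping of special characters is ignored.

```python
import re


def cql_term_to_regex(term):
    """Translate a single masked CQL term into a compiled regular expression."""
    anchored_left = term.startswith("^")
    anchored_right = len(term) > 1 and term.endswith("^")
    core = term
    if anchored_left:
        core = core[1:]
    if anchored_right:
        core = core[:-1]
    pieces = []
    for ch in core:
        if ch == "*":
            pieces.append(r"\w*")  # asterisk: zero or more characters
        elif ch == "?":
            pieces.append(r"\w")   # question mark: exactly one character
        else:
            pieces.append(re.escape(ch))
    # A caret pins the term to the start or end of the field; otherwise
    # the term only has to match a whole word somewhere in the field.
    prefix = "^" if anchored_left else r"\b"
    suffix = "$" if anchored_right else r"\b"
    return re.compile(prefix + "".join(pieces) + suffix, re.IGNORECASE)


title = "The Complete Dinosaur"
print(bool(cql_term_to_regex("dino*").search(title)))
print(bool(cql_term_to_regex("^the").search(title)))
print(bool(cql_term_to_regex("dinosaur^").search(title)))
```

Under these assumptions all three terms match the sample title: "dino*" by truncation, "^the" because the field begins with that word, and "dinosaur^" because the field ends with it.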
CQL examples
• Simple queries:
  • dinosaur
  • "the complete dinosaur"
• Boolean
  • dinosaur and bird or dinobird
  • "feathered dinosaur" and (yixian or jehol)
• Proximity
  • foo prox bar
  • foo prox/>/4/word/ordered bar
• Indexes
  • title = dinosaur
  • bath.title="the complete dinosaur"
  • srw.serverChoice=dinosaur
• Relations
  • year > 1998
  • title all "complete dinosaur"
  • title any "dinosaur bird reptile"
  • title exact "the complete dinosaur"
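The difference between the all, any, and exact relations in the examples above can be shown with a small Python sketch. It is a simplification: clause_matches is a hypothetical helper, fields are split on whitespace, and comparison is assumed to be case-insensitive; real servers decide these details per index.

```python
def clause_matches(field_value, relation, term):
    """Evaluate one search clause (relation + term) against one field value."""
    field_words = field_value.lower().split()
    term_words = term.lower().split()
    if relation == "all":
        # Every word of the term must appear somewhere in the field.
        return all(word in field_words for word in term_words)
    if relation == "any":
        # At least one word of the term must appear in the field.
        return any(word in field_words for word in term_words)
    if relation == "exact":
        # The whole field must equal the term as a string.
        return field_value.lower() == term.lower()
    raise ValueError(f"relation not covered by this sketch: {relation}")


title = "The Complete Dinosaur"
print(clause_matches(title, "all", "complete dinosaur"))
print(clause_matches(title, "any", "dinosaur bird reptile"))
print(clause_matches(title, "exact", "the complete dinosaur"))
print(clause_matches(title, "exact", "complete dinosaur"))
```

With this sample title the first three clauses match and the last does not, because exact compares the whole field rather than individual words.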
SRW & classic Z39.50
• SRW
  • No explicit concept of connection, session, or state
  • Result sets named by server
  • Single record syntax (XML), multiple schemas
  • String (i.e., human-readable) queries: CQL
  • Named indexes
• Classic Z39.50
  • Stateful
  • Result sets named by client
  • Multiple record syntaxes
  • No human-readable query language
  • Type 1 query using attribute sets
  • Use attribute to identify access point
• Z39.50 concepts retained
  • Result sets
  • Abstract access points
  • Abstract record schemas
  • Explain
  • Diagnostics
What problems does SRW solve?
• Addresses need for standards-based searching in the networked environment
• Shows the vitality of the Z39.50 concepts and implements those in a web services & URL access context
• Offers database providers a web-friendly method for offering standards-based searching of resources
• Provides a low-barrier-to-entry solution using commonly available technologies
• XML format of records provides for more reuse, and more interesting use, of resources
Possible implementation venues
• Gateways to existing Z39.50 servers
• Lightweight SRW/U servers to specialized databases
• Cost-effective search access to commercial databases (e.g., citation, full-text)
• Metasearching
• Beyond libraries to many other information communities
References
• Z39.50 International Next Generation – ZING
  http://www.loc.gov/z3950/agency/zing/
• Search and Retrieve for the Web – SRW/U
  http://www.loc.gov/srw
• A Gentle Introduction to SRW
  http://www.loc.gov/z3950/agency/zing/srw/introduction.html
• A Gentle Introduction to CQL
  http://zing.z3950.org/cql/intro.html
• Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, in D-Lib (Feb04)
  http://www.dlib.org/dlib/february04/vanveen/02vanveen.html