CQL
“Common Query Language”
Ray Denenberg, March 2005
CQL’s Goals
Combine the simplicity and intuitiveness of Google searching with the expressive power of XQuery.
Support very simple queries, and arbitrarily complex expressions as necessary.
Example: search on “cat”
cat
(That’s it. The whole query.)
Simple CQL Queries
cat (simplest)
cat and dog (simple boolean)
title = cat (index)
dc.title = cat (index qualified)
Boolean
cat and dog
cat or dog
cat not dog
cat not dog and fish or frog
evaluates to: (((cat not dog) and fish) or frog)
Not: (cat not dog) and (fish or frog)
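The left-to-right grouping can be sketched in a few lines of Python. This is a minimal illustration, not a CQL parser: it assumes single-word terms separated by spaces and no parentheses, and the function name `group_left_to_right` is made up for this example.

```python
BOOLEANS = {"and", "or", "not", "prox"}

def group_left_to_right(query):
    """Parenthesize a flat boolean query, grouping strictly left to right."""
    tokens = query.split()
    if not tokens:
        raise ValueError("empty query")
    grouped = tokens[0]
    # Consume (operator, operand) pairs, folding each into the running group.
    for i in range(1, len(tokens), 2):
        op = tokens[i].lower()
        if op not in BOOLEANS or i + 1 >= len(tokens):
            raise ValueError(f"expected a boolean and an operand at token {i}")
        grouped = f"({grouped} {op} {tokens[i + 1]})"
    return grouped

print(group_left_to_right("cat not dog and fish or frog"))
# (((cat not dog) and fish) or frog)
```

No boolean binds tighter than another in this sketch, which is why the query does not group as (cat not dog) and (fish or frog).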
Index Search
title = cat
Qualified index
title = cat
dc.title = cat
bib.title = cat
Bath.keyTitle
Fielded/index Search
dc.title = cat
bib.title = cat
dc.title: A name given to the resource.
bib.title (fictitious): A word, phrase, character, or group of characters, normally appearing in an item, that names the item or the work contained in it.
Zthes Indexes
zthes.nt=sauropod and zthes.bt=macronaria
Finds terms narrower than sauropod but broader than macronaria.
Relations
The triple:
<index> <relation> <search term>
is called a search clause (e.g. title = cat).
Simple Relations
title = "the complete dinosaur"
title all "complete dinosaur"
title any "dinosaur bird reptile"
title exact "the complete dinosaur"
The = relation
title = "the complete dinosaur"
(find these three words, adjacent and in this order)
matches “a day in the life of the complete dinosaur”
and “the complete dinosaur goes to Paris”
but not “the complete and unabridged dinosaur”
All
title all "complete dinosaur"
matches “the complete and unabridged dinosaur”
does not match “the unabridged dinosaur”
title all "dinosaur bird reptile"
does not match “the complete dinosaur”
Any
title any "dinosaur bird reptile"
does match “the complete dinosaur” and “the unabridged dinosaur”
Exact
title exact "the complete dinosaur"
matches “the complete dinosaur”
Does not match “a day in the life of the complete dinosaur”
or “the complete dinosaur goes to Paris”
or “the complete and unabridged dinosaur”
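The four relations can be compared side by side with a small Python sketch. It assumes that words are whitespace-separated and that matching ignores case. A real server applies its own definition of a word, and all function names here are hypothetical.

```python
def words(text):
    """Lower-case and split on whitespace (an assumed word definition)."""
    return text.lower().split()

def rel_adjacent(field, term):
    """'=' with a multi-word term: all words, adjacent and in order."""
    f, t = words(field), words(term)
    return any(f[i:i + len(t)] == t for i in range(len(f) - len(t) + 1))

def rel_all(field, term):
    """'all': every word of the term occurs somewhere in the field."""
    f = set(words(field))
    return all(w in f for w in words(term))

def rel_any(field, term):
    """'any': at least one word of the term occurs in the field."""
    f = set(words(field))
    return any(w in f for w in words(term))

def rel_exact(field, term):
    """'exact': the field is the term, with nothing before or after it."""
    return words(field) == words(term)

title = "the complete and unabridged dinosaur"
print(rel_adjacent(title, "the complete dinosaur"))  # False
print(rel_all(title, "complete dinosaur"))           # True
```

The sketch reproduces the matches listed on the slides, for example `rel_any("the unabridged dinosaur", "dinosaur bird reptile")` is true while `rel_all` on the same pair is false.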
Relations: observations
Observation 1: Shorthand
title all "old man sea"
is the same as
title = "old" and title = "man" and title = "sea"
Observation 2: Anchoring
^ is the anchor character.
Recall: title = "the complete dinosaur"
matches “a day in the life of the complete dinosaur”
Anchoring
title = "^the complete dinosaur"
would not match “a day in the life of the complete dinosaur”
title = "the complete dinosaur^"
would not match “the complete dinosaur goes to Paris”
Observation 3: Index and relation go together
Valid:
cat
title = cat
Not valid:
title cat
= cat
BNF
searchClause ::= '(' cqlQuery ')' | index relation searchTerm | searchTerm
Basic Relations: summary
title = "the complete dinosaur"
title all "complete dinosaur"
title any "dinosaur bird reptile"
title exact "the complete dinosaur"
A few more relations …
< less
> greater
<= less or equal
>= greater or equal
<> not equal
= (see next)
= relation
= means:
word adjacency, when the term is a list of words.
Equality, otherwise.
Relation Modifiers
stem
relevant
fuzzy
phonetic
Stemming
title =/stem "these completed dinosaurs"
matches “The Complete Dinosaur”.
Relevance
subject any/relevant "fish frog"
would find records whose subject field included words like shark, tuna, coelacanth, toad, amphibian, etc.
Fuzzy
Fuzzy means:
“be liberal in what you count as a match … details left to the server. Might include permutations of character order, off-by-one for numerical terms.”
title =/fuzzy "sharlot simmins" might match
“I am Charlotte Simmons”
telephoneNumber exact/fuzzy "303 441 1319"
Phonetic
Match words that sound the same
e.g. hostel might match “hostile”
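The slides do not say which phonetic algorithm a server would use. As one illustration only, the classic Soundex code gives hostel and hostile the same key. The sketch below is a plain Soundex implementation, with names chosen for this example.

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character Soundex key of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]

def sounds_like(a, b):
    return soundex(a) == soundex(b)

print(soundex("hostel"), soundex("hostile"))  # H234 H234
```

Soundex is coarse and English-oriented, so a server could reasonably choose a different algorithm for the phonetic modifier.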
Booleans
And: cat and dog
Or: cat or dog
Not: cat not dog
Proximity: cat prox dog
roughly: “find cat near dog”
Proximity
(chestnut prox "Cryphonectaria parasitica")
prox
("dutch elm" prox Ceratocystisulmi)
Proximity parameters
relation
distance
unit
ordering
e.g.: “Find cat in the same sentence as dog”
Relation: less or equal
Distance: 0
Unit: sentence
Ordering: unordered
relation ("<", ">", "<=", ">=", "=", "<>"; default "<=")
distance (integer; default: 1 for word, zero otherwise)
unit ("word", "sentence", "paragraph", or "element"; default "word")
ordering ("ordered" or "unordered"; default "unordered")
“Find cat in the same sentence as dog”
cat prox//sentence dog
same as:
cat prox/<=/0/sentence/unordered dog
(chestnut prox//sentence "Cryphonectaria parasitica")
prox//paragraph
("dutch elm" prox//sentence Ceratocystisulmi)
(Find chestnut in the same sentence as “Cryphonectaria parasitica”, and “dutch elm” in the same sentence as Ceratocystisulmi, and both sentences in the same paragraph.)
(chestnut prox//paragraph "Cryphonectaria parasitica")
and
("dutch elm" prox//paragraph Ceratocystisulmi)
(Find chestnut in the same paragraph as “Cryphonectaria parasitica”, and “dutch elm” in the same paragraph as Ceratocystisulmi.)
cat prox/>/2//ordered hat
retrieves “cat in the hat” but not “cat in hat”
nor “hat on the cat”
Pattern Matching
? Matches any single character
• c?t matches cat, cot, cut, but not coat or ct. c??t matches cart, but not cat or crypt.
* Matches any sequence of zero or more characters
• c*t matches cat, coat, crypt and counterargument.
^ Word anchoring
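One way to experiment with the ? and * masking characters is to translate a masked term into a regular expression. The Python sketch below does that for a single word. It treats a backslash as escaping the next character and does not implement the ^ anchor, which it matches as a literal character. The function names are invented for this example.

```python
import re

def mask_to_regex(pattern):
    """Translate a masked term (? and *) into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "?":
            out.append(".")    # exactly one character
        elif ch == "*":
            out.append(".*")   # zero or more characters
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)

def mask_matches(pattern, word):
    """True if the whole word matches the masked pattern."""
    return mask_to_regex(pattern).fullmatch(word) is not None

print([w for w in ["cat", "cot", "coat", "ct"] if mask_matches("c?t", w)])
# ['cat', 'cot']
```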
Word Anchoring
title = "^the complete dinosaur"
matches “the complete dinosaur meets godzilla”
but not “a day in the life of the complete dinosaur”
title = "the complete dinosaur^"
matches “a day in the life of the complete dinosaur”
but not “the complete dinosaur meets godzilla”
Word Anchoring - any
title any "^cat ^dog rat"
• Means title with cat at the beginning, or with dog at the beginning, or with rat anywhere.
matches
• 'cat eats dog'
• 'dog eats hat'
• 'hat eats rat'
but not
• 'hat eats dog'
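The interaction of anchoring with the any relation can be checked with a short Python sketch. It assumes whitespace-separated, case-insensitive words, and that a leading ^ pins a word to the start of the field while a trailing ^ pins it to the end. The function name is hypothetical.

```python
def any_anchored(field, term):
    """'any' relation where individual words of the term may carry a ^ anchor."""
    words = field.lower().split()
    if not words:
        return False
    for item in term.lower().split():
        if item.startswith("^"):
            if words[0] == item[1:]:     # must be the first word of the field
                return True
        elif item.endswith("^"):
            if words[-1] == item[:-1]:   # must be the last word of the field
                return True
        elif item in words:              # unanchored: anywhere in the field
            return True
    return False

for title in ["cat eats dog", "dog eats hat", "hat eats rat", "hat eats dog"]:
    print(title, any_anchored(title, "^cat ^dog rat"))
```

Only the last title fails: dog is present but not at the beginning, and neither cat nor rat appears.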
CQL Syntax
Reserved words: and, or, not, prox
Special characters: space ( ) = < > " /
Tokens:
A string that has no special characters; or
any string at all enclosed by double quotes. (Except the string cannot include a double quote, unless escaped.)
Escape character: \
Backslash (\) escapes '*', '?', '"' and '^', as well as itself.
"\"why not\?\" she said"
results in the following token:
"why not?" she said
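The token rules can be tried out with a small hand-written tokenizer. The Python sketch below is an illustration under simplifying assumptions: it treats <=, >= and <> as single tokens, and it drops the backslash from escaped characters so that the result matches the token shown on the slide. A real parser would also need to remember that an escaped ? or * is literal rather than a wildcard.

```python
SPECIAL = set("()=<>/")

def tokenize(query):
    """Split a CQL query string into tokens (quotes and escapes removed)."""
    tokens, i, n = [], 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            i += 1
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n:
                    i += 1      # drop the backslash, keep the escaped character
                buf.append(query[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            i += 1              # skip the closing quote
            tokens.append("".join(buf))
        elif ch in SPECIAL:
            two = query[i:i + 2]
            if two in ("<=", ">=", "<>"):
                tokens.append(two)
                i += 2
            else:
                tokens.append(ch)
                i += 1
        else:
            j = i
            while (j < n and not query[j].isspace()
                   and query[j] not in SPECIAL and query[j] != '"'):
                j += 1
            tokens.append(query[i:j])
            i = j
    return tokens

print(tokenize(r'title = "\"why not\?\" she said"'))
# ['title', '=', '"why not?" she said']
```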
Context sets
Context sets apply to:
Indexes
Relations
Relation modifiers
Boolean modifiers
subject any/relevant "fish frog"
index: subject
relation: any
relation modifier: relevant
search term: "fish frog"
The index, relation, and relation modifier are subject to context qualification.
dc.subject any/relevant "fish frog"
(dc: context set of the index)
dc.subject any/rel.lr "fish frog"
(rel: context set of the relation modifier; lr: a specific relevance algorithm)
dc.subject cql.any/rel.lr "fish frog"
(cql: context set of the relation)
Example, fictitious relation: “only”
depicts only "cat"
Matching images would depict only a cat and nothing else. The same cat with a person would not match.
image.depicts image.only "cat"
(image.depicts: the index, with its context set; image.only: the relation, with its context set)
Go back to:
subject any/relevant "fish frog"
or
title any/relevant "cat dog"
With a boolean modifier:
subject any/relevant "fish frog"
or/rel.mean
title any/relevant "cat dog"
(mean: the boolean modifier; rel: its context set)
Defaults
Consider the query:
cat
The server needs to turn that into a search clause, i.e. an index, relation, and search term.
As it is, there’s only a search term:
<index> <relation> cat
<index>: cql.serverChoice (default index)
<relation>: cql.scr (default context set and relation; scr: “server choice relation”)
Next, consider the query:
title = cat
The server needs to assign a context set to the index (title) and a context set to the relation (=).
Or to make it even more complicated…
Add a relation modifier:
title =/relevant cat
The server needs to assign a context set to the index (title), a context set to the relation (=), and a context set to the relation modifier.
Default Context Sets
<>.title cql.=/cql.relevant cat
Default context set for the index is selected by the server.
Default context set for the relation is ‘cql’.
Default context set for the relation modifier is ‘cql’.
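The defaulting rules can be summarized in a short Python sketch. It attaches modifiers to the relation with a slash, as in the earlier subject any/relevant examples, and leaves an unqualified index name untouched because its context set is the server's choice. `qualify` and `apply_defaults` are hypothetical helper names, not part of any CQL library.

```python
def qualify(name, default_set="cql"):
    """Prefix a relation or modifier with the default context set if it has none."""
    return name if "." in name else f"{default_set}.{name}"

def apply_defaults(term, index=None, relation=None, modifiers=()):
    """Return a fully spelled-out search clause for the given parts."""
    if index is None and relation is None:
        # Bare search term: the server picks both index and relation.
        index, relation = "cql.serverChoice", "cql.scr"
    elif index is None or relation is None:
        raise ValueError("index and relation go together")
    parts = [qualify(relation)] + [qualify(m) for m in modifiers]
    # The index is left as given: its default context set is up to the server.
    return f"{index} {'/'.join(parts)} {term}"

print(apply_defaults("cat"))
# cql.serverChoice cql.scr cat
print(apply_defaults("cat", "title", "=", ["relevant"]))
# title cql.=/cql.relevant cat
```

An already qualified modifier is kept as written, so `apply_defaults('"fish frog"', "dc.subject", "any", ["rel.lr"])` gives `dc.subject cql.any/rel.lr "fish frog"`.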
Additional relation modifiers
word: The term should be broken into words (according to the server's definition of a 'word').
string: The term is a single item, and should not be broken up.
isoDate: Each item within the term conforms to ISO 8601.
number: Each item within the term is a number.
uri: Each item within the term is a URI.
masked (default modifier)
title any "cat dog"
same as
title any/word "cat dog"
title exact "cat in the hat"
same as
title exact/string "cat in the hat"
title = "cat * hat"
same as
title =/masked "cat * hat"