CQL “Common Query Language” Ray Denenberg March 2005

CQL’s Goals

Combine the simplicity and intuitiveness of Google searching with the expressive power of XQuery.

Support very simple queries, and arbitrarily complex expressions as necessary.

Example: search on “cat”

cat

(That’s it. The whole query.)

Simple CQL Queries

cat (simplest)

cat and dog (simple boolean)

title = cat (index)

dc.title = cat (index qualified)

Boolean

cat and dog

cat or dog

cat not dog

cat not dog and fish or frog

The query cat not dog and fish or frog is evaluated left to right, as:

(((cat not dog) and fish) or frog)

not as:

(cat not dog) and (fish or frog)
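The left-to-right grouping can be sketched in a few lines of Python. This is only an illustration: the helper name `group_left_to_right` is made up here, and it assumes a flat query of single-word terms separated by booleans, with no parentheses or quoted strings.

```python
def group_left_to_right(query: str) -> str:
    """Fully parenthesize a flat boolean query, folding from the left."""
    booleans = {"and", "or", "not", "prox"}
    tokens = query.split()
    grouped = tokens[0]
    # Walk the remaining tokens as (operator, operand) pairs.
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op.lower() not in booleans:
            raise ValueError(f"expected a boolean operator, got {op!r}")
        grouped = f"({grouped} {op} {operand})"
    return grouped


print(group_left_to_right("cat not dog and fish or frog"))
# prints (((cat not dog) and fish) or frog)
```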

Index Search

title = cat

Qualified Index

title = cat

dc.title = cat

bib.title = cat

Bath.keyTitle

Fielded/Index Search

dc.title = cat

bib.title = cat

dc.title: A name given to the resource.

bib.title (fictitious): A word, phrase, character, or group of characters, normally appearing in an item, that names the item or the work contained in it.

Zthes Indexes

zthes.nt=sauropod and zthes.bt=macronaria

Finds terms narrower than sauropod but broader than macronaria.

Relations

The triple

<index> <relation> <search term>

is called a search clause.

(e.g. title = cat)

Simple Relations

title = "the complete dinosaur"

title all "complete dinosaur"

title any "dinosaur bird reptile"

title exact "the complete dinosaur"

The = relation

title = "the complete dinosaur"

(find these three words, adjacent and in this order)

matches “a day in the life of the complete dinosaur”

and “the complete dinosaur goes to Paris”

but not “the complete and unabridged dinosaur"
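A minimal Python sketch of this word-adjacency behaviour, for illustration only. The function names are hypothetical, and splitting on whitespace and ignoring case are assumptions; a real server applies its own definition of a word.

```python
def words(text: str) -> list[str]:
    # Assumption: words are whitespace-separated and case is ignored.
    return text.lower().split()


def matches_adjacent(term: str, title: str) -> bool:
    """True if the words of `term` occur in `title`, adjacent and in order."""
    needle, haystack = words(term), words(title)
    n = len(needle)
    # Slide a window of n words across the title and compare.
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


term = "the complete dinosaur"
print(matches_adjacent(term, "a day in the life of the complete dinosaur"))
print(matches_adjacent(term, "the complete and unabridged dinosaur"))
```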

All

title all "complete dinosaur"

matches “the complete and unabridged dinosaur”

does not match “the unabridged dinosaur”

title all "dinosaur bird reptile"

does not match “the complete dinosaur”

Any

title any "dinosaur bird reptile"

matches “the complete dinosaur” and “the unabridged dinosaur”

Exact

title exact "the complete dinosaur"

matches “the complete dinosaur”

does not match “a day in the life of the complete dinosaur” or “the complete dinosaur goes to Paris” or “the complete and unabridged dinosaur”
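The all, any and exact relations can be modelled roughly as set and list comparisons over words. The sketch below is an assumption-laden illustration (hypothetical function names, whitespace word-splitting, case ignored), not how any particular server implements them.

```python
def words(text: str) -> list[str]:
    # Assumption: words are whitespace-separated and case is ignored.
    return text.lower().split()


def matches_all(term: str, title: str) -> bool:
    """Every word of the term appears somewhere in the title."""
    return set(words(term)) <= set(words(title))


def matches_any(term: str, title: str) -> bool:
    """At least one word of the term appears in the title."""
    return bool(set(words(term)) & set(words(title)))


def matches_exact(term: str, title: str) -> bool:
    """The title consists of exactly the term, nothing more."""
    return words(term) == words(title)


print(matches_all("complete dinosaur", "the complete and unabridged dinosaur"))
print(matches_any("dinosaur bird reptile", "the unabridged dinosaur"))
print(matches_exact("the complete dinosaur", "the complete dinosaur goes to Paris"))
```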

Relations … observations

Observation 1: Shorthand

title all "old man sea"

same as

title="old" and title="man" and title="sea"
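The shorthand can be expanded mechanically. In this sketch the `expand` helper is hypothetical, and the `any` case (joined with `or`) is an assumption by analogy; the slide only shows the `all` case.

```python
def expand(index: str, relation: str, term: str) -> str:
    """Rewrite an all/any clause as single-word '=' clauses."""
    # 'all' joins with and; 'any' is assumed to join with or.
    joiner = {"all": " and ", "any": " or "}[relation]
    return joiner.join(f'{index}="{word}"' for word in term.split())


print(expand("title", "all", "old man sea"))
# prints title="old" and title="man" and title="sea"
```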

Observation 2: Anchoring

The anchor character: ^

Recall that title = "the complete dinosaur" matches “a day in the life of the complete dinosaur”.

title="^the complete dinosaur" would not match “a day in the life of the complete dinosaur”

title="the complete dinosaur^" would not match “the complete dinosaur goes to Paris”

Observation 3: Index and Relation go together

Valid:

cat

title = cat

Not valid (an index without a relation, or a relation without an index):

title cat

= cat

BNF

searchClause ::= '(' cqlQuery ')' | index relation searchTerm | searchTerm
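As a rough illustration of the last two alternatives of this rule, the Python sketch below accepts either a lone search term or a full index/relation/term triple and rejects anything in between. The names `SearchClause` and `parse_search_clause` are invented for the example, the input is assumed to be already split into tokens, and the parenthesized `cqlQuery` alternative is not handled.

```python
from typing import NamedTuple, Optional


class SearchClause(NamedTuple):
    index: Optional[str]
    relation: Optional[str]
    term: str


def parse_search_clause(tokens: list[str]) -> SearchClause:
    """Accept 'searchTerm' or 'index relation searchTerm', nothing else."""
    if len(tokens) == 1:
        return SearchClause(None, None, tokens[0])
    if len(tokens) == 3:
        index, relation, term = tokens
        return SearchClause(index, relation, term)
    # e.g. ['title', 'cat'] or ['=', 'cat']: index and relation go together.
    raise ValueError(f"not a search clause: {tokens!r}")


print(parse_search_clause(["cat"]))
print(parse_search_clause(["title", "=", "cat"]))
```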

Basic Relations … summary

title = "the complete dinosaur"

title all "complete dinosaur"

title any "dinosaur bird reptile"

title exact "the complete dinosaur"

A few more relations …

< less

> greater

<= less or equal

>= greater or equal

= (see next)

<> not equal

The = relation

= means:

word adjacency, when the term is a list of words.

Equality, otherwise.

Relation Modifiers

stem

relevant

fuzzy

phonetic

Stemming

title =/stem "these completed dinosaurs"

matches “The Complete Dinosaur”.

Relevance

subject any/relevant "fish frog"

would find records whose subject field included words like shark, tuna, coelacanth, toad, amphibian, etc.

Fuzzy

fuzzy means: “be liberal in what you count as a match … details left to the server. Might include permutations of character order, off-by-one for numerical terms.”

title =/fuzzy "sharlot simmins" might match “I am Charlotte Simmons”

telephoneNumber exact/fuzzy "303 441 1319"

Phonetic

Match words that sound the same,

e.g. hostel might match “hostile”

Booleans

and: cat and dog

or: cat or dog

not: cat not dog

proximity: cat prox dog

(roughly: “find cat near dog”)

Proximity

(chestnut prox "Cryphonectaria parasitica") prox ("dutch elm" prox Ceratocystisulmi)

Proximity parameters

relation

distance

unit

ordering

e.g. “Find cat in the same sentence as dog”

Relation: less or equal

Distance: 0

Unit: sentence

Ordering: unordered

relation ("<", ">", "<=", ">=", "=", "<>"; default "<="),

distance (integer; default: 1 for word, zero otherwise),

unit ("word", "sentence", "paragraph", or "element"; default "word"),

ordering ("ordered" or "unordered"; default "unordered")
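The defaults above can be applied with a small parser. This is a sketch under stated assumptions: `parse_prox` is a hypothetical helper, the four slots are read strictly by position (relation/distance/unit/ordering), an empty or missing slot takes its default, and slot values are not validated.

```python
def parse_prox(operator: str) -> dict:
    """Split 'prox/relation/distance/unit/ordering' and fill in defaults."""
    name, *slots = operator.split("/")
    if name.lower() != "prox" or len(slots) > 4:
        raise ValueError(f"not a prox operator: {operator!r}")
    slots += [""] * (4 - len(slots))  # pad omitted trailing slots
    relation, distance, unit, ordering = slots
    unit = unit or "word"
    return {
        "relation": relation or "<=",
        # Default distance is 1 for words, zero for every other unit.
        "distance": int(distance) if distance else (1 if unit == "word" else 0),
        "unit": unit,
        "ordering": ordering or "unordered",
    }


print(parse_prox("prox"))
print(parse_prox("prox/<=/0/sentence/unordered"))
```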

“Find cat in the same sentence as dog”

cat prox//sentence dog

same as:

cat prox/<=/0/sentence/unordered dog

(chestnut prox//sentence "Cryphonectaria parasitica") prox//paragraph ("dutch elm" prox//sentence Ceratocystisulmi)

(Find chestnut in the same sentence as “Cryphonectaria parasitica”, and “dutch elm” in the same sentence as Ceratocystisulmi, and both sentences in the same paragraph.)

(chestnut prox//paragraph "Cryphonectaria parasitica") and ("dutch elm" prox//paragraph Ceratocystisulmi)

(Find chestnut in the same paragraph as “Cryphonectaria parasitica”, and “dutch elm” in the same paragraph as Ceratocystisulmi.)

cat prox/>/2//ordered hat

retrieves “cat in the hat” but not “cat in hat”

nor “hat on the cat”
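A word-level proximity test reproducing this example might look like the following Python sketch. The `prox_words` helper is hypothetical, and it assumes that distance is the difference between word positions (adjacent words are 1 apart) and that words are whitespace-separated.

```python
import operator

RELATIONS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
             ">=": operator.ge, "=": operator.eq, "<>": operator.ne}


def prox_words(left, right, text, relation="<=", distance=1, ordered=False):
    """True if some occurrence of `left` and `right` satisfies the constraint."""
    words = text.lower().split()
    compare = RELATIONS[relation]
    for i, w in enumerate(words):
        if w != left:
            continue
        for j, v in enumerate(words):
            if v != right:
                continue
            gap = j - i
            if ordered and gap < 0:
                continue  # right operand precedes left: wrong order
            if compare(abs(gap), distance):
                return True
    return False


# cat prox/>/2//ordered hat
for title in ("cat in the hat", "cat in hat", "hat on the cat"):
    print(title, prox_words("cat", "hat", title, ">", 2, ordered=True))
```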

Pattern Matching

? matches any single character

• c?t matches cat, cot, cut, but not coat or ct. c??t matches cart, but not cat or crypt.

* matches any sequence of zero or more characters

• c*t matches cat, coat, crypt and counterargument.

^ word-anchoring
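One way to experiment with the ? and * masks is to translate them into a regular expression. The sketch below is illustrative only: `mask_to_regex` and `mask_matches` are hypothetical names, and backslash escapes and the ^ anchor are deliberately left out.

```python
import re


def mask_to_regex(term: str) -> re.Pattern:
    """Translate CQL-style ? and * masks into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def mask_matches(term: str, word: str) -> bool:
    # fullmatch: the mask must cover the whole word.
    return mask_to_regex(term).fullmatch(word) is not None


print(mask_matches("c?t", "cot"), mask_matches("c?t", "coat"))
print(mask_matches("c*t", "counterargument"))
```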

Word Anchoring

title="^the complete dinosaur"

matches “the complete dinosaur meets godzilla” but not “a day in the life of the complete dinosaur”

title="the complete dinosaur^"

matches “a day in the life of the complete dinosaur” but not “the complete dinosaur meets godzilla”

Word Anchoring - any

title any "^cat ^dog rat"

• Means title with cat at the beginning, or with dog at the beginning, or with rat anywhere.

matches
• 'cat eats dog'
• 'dog eats hat'
• 'hat eats rat'

but not
• 'hat eats dog'
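The behaviour of anchored words under `any` can be checked with a short Python sketch. It is a simplification with a made-up function name: only a leading ^ on a single word is handled, words are whitespace-separated, and case is ignored.

```python
def matches_any_anchored(term: str, title: str) -> bool:
    """'any' relation where a leading ^ pins a word to the start of the title."""
    title_words = title.lower().split()
    for word in term.lower().split():
        if word.startswith("^"):
            # Anchored: must be the first word of the title.
            if title_words[:1] == [word[1:]]:
                return True
        elif word in title_words:
            # Unanchored: may appear anywhere.
            return True
    return False


term = "^cat ^dog rat"
for title in ("cat eats dog", "dog eats hat", "hat eats rat", "hat eats dog"):
    print(title, matches_any_anchored(term, title))
```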

CQL Syntax

Reserved words: and, or, not, prox

Special characters: space ( ) = < > " /

Tokens

A string that has no special characters; or

Any string at all enclosed by double quotes. (Except the string cannot include a double quote, unless escaped.)
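These token rules can be approximated with a single regular expression. The Python sketch below is an illustration, not a complete lexer: the `tokenize` name is an assumption, quoted tokens keep their quotes and backslashes, and malformed input such as an unterminated quote is silently skipped rather than reported.

```python
import re

TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"      # quoted string; an escaped quote does not end it
  | <=|>=|<>               # two-character relations
  | [()=<>/]               # single special characters
  | [^\s()=<>/"]+          # bare string with no special characters
''', re.VERBOSE)


def tokenize(query: str) -> list[str]:
    """Split a query into tokens, dropping the spaces between them."""
    return TOKEN.findall(query)


print(tokenize('dc.subject any/relevant "fish frog"'))
# prints ['dc.subject', 'any', '/', 'relevant', '"fish frog"']
```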

Escape Character

Backslash (\) escapes '*', '?', '"' and '^', as well as itself.

"\"why not\?\" she said"

results in the following token:

"why not?" she said
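Resolving the escapes in a quoted token can be sketched as below. `unescape` is a hypothetical helper; note that a real matcher would still need to remember which * and ? characters were escaped so that they are not treated as masks, which this sketch does not do.

```python
import re


def unescape(quoted: str) -> str:
    """Strip the enclosing double quotes and resolve backslash escapes."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("expected a double-quoted string")
    # A backslash followed by *, ?, ", ^ or \ yields that character.
    return re.sub(r'\\([*?"^\\])', r"\1", quoted[1:-1])


print(unescape(r'"\"why not\?\" she said"'))
# prints "why not?" she said
```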

Context sets

Indexes

Relations

Relation modifiers

Boolean modifiers

subject any/relevant "fish frog"

index: subject

relation: any

relation modifier: relevant

search term: "fish frog"

The index, relation, and relation modifier are subject to context qualification.

dc.subject any/relevant "fish frog"

(dc is the context set for the index.)

dc.subject any/rel.lr "fish frog"

(rel is the context set for the relation modifier; lr is a specific relevance algorithm.)

dc.subject cql.any/rel.lr "fish frog"

(cql is the context set for the relation.)

Example: fictitious relation “only”

depicts only "cat"

(depicts is the index; only is the relation.)

Matching images would depict only a cat and nothing else. The same cat with a person would not match.

image.depicts image.only "cat"

(The first image is the context set for the index; the second is the context set for the relation.)

Go back to:

subject any/relevant "fish frog"

or/rel.mean

title any/relevant "cat dog"

(mean is a boolean modifier; rel is its context set.)

Defaults

Consider the query:

cat

The server needs to turn that into a search clause, i.e. an index, relation, and search term. As it is, there's only a search term:

<index> <relation> cat

<index> defaults to cql.serverChoice (default index).

<relation> defaults to cql.scr (default context set and relation; scr: “server choice relation”).
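Filling in these defaults for a bare search term might be sketched as follows. The function name `complete_clause` and the token-list input are assumptions made for the example.

```python
DEFAULT_INDEX = "cql.serverChoice"   # default index
DEFAULT_RELATION = "cql.scr"         # "server choice relation"


def complete_clause(tokens: list[str]) -> tuple[str, str, str]:
    """Turn a lone search term into a full (index, relation, term) triple."""
    if len(tokens) == 1:
        return (DEFAULT_INDEX, DEFAULT_RELATION, tokens[0])
    if len(tokens) == 3:
        index, relation, term = tokens
        return (index, relation, term)
    raise ValueError(f"not a search clause: {tokens!r}")


print(complete_clause(["cat"]))
# prints ('cql.serverChoice', 'cql.scr', 'cat')
```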

Next, consider the query:

title = cat

The server needs to assign a context set to the index (title) and a context set to the relation (=).

Or, to make it even more complicated, add a relation modifier:

title =/relevant cat

The server needs to assign a context set to the index (title), a context set to the relation (=), and a context set to the relation modifier.

Default Context Sets

<>.title cql.=/cql.relevant cat

Default context set for the index is selected by the server.

Default context set for the relation is 'cql'.

Default context set for the relation modifier is 'cql'.
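Assigning default context sets can be sketched as a simple prefixing step. Everything named here is hypothetical: the `qualify` and `qualify_clause` helpers, and in particular the choice of `dc` as the server's default context set for indexes, which each server decides for itself.

```python
def qualify(name: str, default_set: str) -> str:
    """Prefix an unqualified name with a context set; leave qualified names alone."""
    return name if "." in name else f"{default_set}.{name}"


def qualify_clause(index, relation, modifiers, server_index_set):
    """Apply default context sets to the parts of a search clause."""
    return (
        qualify(index, server_index_set),        # index: server's choice
        qualify(relation, "cql"),                # relation: 'cql'
        [qualify(m, "cql") for m in modifiers],  # relation modifiers: 'cql'
    )


# 'dc' stands in for whatever default the server has picked.
print(qualify_clause("title", "=", ["relevant"], "dc"))
# prints ('dc.title', 'cql.=', ['cql.relevant'])
```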

Additional relation modifiers

word: The term should be broken into words (according to the server's definition of a 'word').

string: The term is a single item, and should not be broken up.

isoDate: Each item within the term conforms to ISO 8601.

number: Each item within the term is a number.

uri: Each item within the term is a URI.

masked: (default modifier)

title any "cat dog"

same as

title any/word "cat dog"

title exact "cat in the hat"

same as

title exact/string "cat in the hat"

title = "cat * hat"

same as

title =/masked "cat * hat"