1 CS 430: Information Discovery Lecture 12 Extending the Boolean Model

CS 430: Information Discovery

Lecture 12

Extending the Boolean Model

Course Administration

Midterm examination:

Date: Wednesday, 31 October, 7:30 to 8:30 p.m.

Room: TBA

Open book

Assignment 1:

Grades have been sent by email. If you have not received a grade, please send a message to wya@cs.cornell.edu

Problems with the Boolean model

Counter-intuitive results:

Query q = A and B and C and D and E

Document d has terms A, B, C and D, but not E

Intuitively, d is quite a good match for q, but it is rejected by the Boolean model.

Query q = A or B or C or D or E

Document d1 has terms A, B, C, D and E

Document d2 has term A, but not B, C, D or E

Intuitively, d1 is a much better match than d2, but the Boolean model ranks them as equal.
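The two counter-intuitive cases can be reproduced with a minimal sketch of strict Boolean matching. The helper names `matches_and` and `matches_or` are illustrative assumptions, not part of any particular retrieval system.

```python
def matches_and(query_terms, doc_terms):
    """Strict Boolean AND: every query term must be present in the document."""
    return set(query_terms) <= set(doc_terms)


def matches_or(query_terms, doc_terms):
    """Strict Boolean OR: at least one query term must be present."""
    return bool(set(query_terms) & set(doc_terms))


query = ["A", "B", "C", "D", "E"]
d = ["A", "B", "C", "D"]          # four of the five query terms
d1 = ["A", "B", "C", "D", "E"]    # all five query terms
d2 = ["A"]                        # only one query term

# The AND query rejects d outright, although it misses only one term.
and_result = matches_and(query, d)

# The OR query accepts d1 and d2 alike and cannot tell them apart.
or_results = (matches_or(query, d1), matches_or(query, d2))
```

Both functions return only True or False, so there is nothing on which to base a ranking.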

Problems with the Boolean model (continued)

• The Boolean model has no way to rank documents.

• The Boolean model allows for no uncertainty in assigning index terms to documents.

• The Boolean model has no provision for assigning weights to the importance of query terms.

Boolean is all or nothing.

Boolean model as sets

d and q are either in the set A or not in A. There is no halfway!

Extending the Boolean model

Term weighting

• Give weights to terms in documents and/or queries.

• Combine standard Boolean retrieval with vector ranking of results

Fuzzy sets

• Relax the boundaries of the sets used in Boolean retrieval

Ranking methods in Boolean systems

SIRE (Syracuse Information Retrieval Experiment)

Term weights

• Add term weights to documents

Weights calculated by the standard method of

term frequency * inverse document frequency.

Ranking

• Calculate results set by standard Boolean methods

• Rank results by vector distances
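The two-stage approach (Boolean selection, then vector ranking) might be sketched as follows. This is an illustration under stated assumptions, not SIRE's actual implementation: the idf form log(N / df), the use of cosine similarity as the vector measure, the unit-weight query vector, and all function names are choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each doc id to {term: tf * idf}, assuming idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {
        doc_id: {t: tf * math.log(n / df[t]) for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def boolean_then_rank(docs, query_terms):
    """Boolean AND selects the results set; cosine similarity orders it."""
    vectors = tfidf_vectors(docs)
    query_vec = {t: 1.0 for t in query_terms}  # assumed: unweighted query
    hits = [d for d, terms in docs.items() if set(query_terms) <= set(terms)]
    return sorted(hits, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)


docs = {
    "d1": ["fuzzy", "boolean", "boolean", "retrieval"],
    "d2": ["fuzzy", "boolean", "ranking", "vector", "weights"],
    "d3": ["vector", "ranking"],
}
ranked = boolean_then_rank(docs, ["fuzzy", "boolean"])
```

Here d3 is excluded by the Boolean step, and the two surviving documents are ordered by how strongly the query terms characterize them.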

Relevance feedback in SIRE

SIRE (Syracuse Information Retrieval Experiment)

Relevance feedback is particularly important with Boolean retrieval because it allows the results set to be expanded.

• Results set is created by standard Boolean retrieval

• User selects one document from results set

• Other documents in collection are ranked by vector distance from this document
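A minimal sketch of this feedback step, assuming that document vectors are already available as term-weight dictionaries and that "vector distance" is measured by cosine similarity (both assumptions for illustration; the function name `more_like_this` is hypothetical):

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def more_like_this(vectors, selected_id):
    """Rank every other document in the collection against the selected one."""
    seed = vectors[selected_id]
    others = [d for d in vectors if d != selected_id]
    return sorted(others, key=lambda d: cosine(seed, vectors[d]), reverse=True)


vectors = {
    "d1": {"fuzzy": 0.8, "boolean": 0.6},
    "d2": {"fuzzy": 0.7, "ranking": 0.2},
    "d3": {"vector": 0.9, "ranking": 0.4},
    "d4": {"boolean": 0.5, "fuzzy": 0.5, "sets": 0.1},
}
# Suppose the user picked d1 from the Boolean results set.
expanded = more_like_this(vectors, "d1")
```

Note that d2 would fail the Boolean query "fuzzy and boolean", yet it still ranks above d3 here. This is the sense in which feedback expands the results set.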

Boolean model as fuzzy sets

q is more or less in A. There is a halfway!

Basic concept

• A document has a term weight associated with each index term. The term weight measures the degree to which that term characterizes the document.

• Term weights are in the range [0, 1]. (In the standard Boolean model all weights are either 0 or 1.)

• For a given query, calculate the similarity between the query and each document in the collection.

• This calculation is needed for every document that has a non-zero weight for any of the terms in the query.
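One way to find the documents that need a similarity calculation is to read them off an inverted file of weights. The sketch below assumes a hypothetical `postings` structure that maps each term to a dictionary of document weights; it is an illustration, not a prescribed data structure.

```python
def candidate_documents(postings, query_terms):
    """Return ids of documents with a non-zero weight for any query term."""
    candidates = set()
    for term in query_terms:
        for doc_id, weight in postings.get(term, {}).items():
            if weight > 0:
                candidates.add(doc_id)
    return candidates


# Hypothetical weighted inverted file: term -> {doc id: weight in [0, 1]}.
postings = {
    "fuzzy": {"d1": 0.8, "d2": 0.3},
    "boolean": {"d1": 0.6, "d3": 0.0},
    "vector": {"d4": 0.9},
}
candidates = candidate_documents(postings, ["fuzzy", "boolean"])
```

Only the candidate documents are then scored; d3 (zero weight) and d4 (no query term) are skipped.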

MMM: Mixed Min and Max model

Fuzzy set theory

dA is the degree of membership of an element to set A

intersection (and)

dA∩B = min(dA, dB)

union (or)

dA∪B = max(dA, dB)

Fuzzy set theory example

              standard set theory    fuzzy set theory
dA            1   1   0   0          0.5  0.5  0    0
dB            1   0   1   0          0.7  0    0.7  0
and  dA∩B     1   0   0   0          0.5  0    0    0
or   dA∪B     1   1   1   0          0.7  0.5  0.7  0
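The rows of the example can be recomputed with a short sketch. The function names are illustrative; the min and max rules are the ones stated above.

```python
def fuzzy_and(d_a, d_b):
    """Degree of membership in the intersection: the smaller of the two."""
    return min(d_a, d_b)


def fuzzy_or(d_a, d_b):
    """Degree of membership in the union: the larger of the two."""
    return max(d_a, d_b)


# (dA, dB) pairs: four crisp cases followed by four fuzzy cases.
pairs = [(1, 1), (1, 0), (0, 1), (0, 0), (0.5, 0.7), (0.5, 0), (0, 0.7), (0, 0)]

and_row = [fuzzy_and(a, b) for a, b in pairs]
or_row = [fuzzy_or(a, b) for a, b in pairs]
```

With weights restricted to 0 and 1, min and max reduce to the ordinary Boolean and and or, which is why the first four columns agree with standard set theory.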

Terms: A1, A2, . . . , An

Document D, with index-term weights: dA1, dA2, . . . , dAn

Qor = (A1 or A2 or . . . or An)

Query-document similarity:

S(Qor, D) = Cor1 * max(dA1, dA2,.. , dAn) + Cor2 * min(dA1, dA2,.. , dAn)

where Cor1 + Cor2 = 1

Terms: A1, A2, . . . , An

Document D, with index-term weights: dA1, dA2, . . . , dAn

Qand = (A1 and A2 and . . . and An)

Query-document similarity:

S(Qand, D) = Cand1 * min(dA1,.. , dAn) + Cand2 * max(dA1,.. , dAn)

where Cand1 + Cand2 = 1
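Both MMM similarities can be written in a few lines. In this sketch the default coefficients (0.6 and 0.7) are arbitrary illustrative values, and the example weights revisit the counter-intuitive cases from the start of the lecture, with 0 standing for a missing term.

```python
def mmm_or(weights, c_or1=0.6):
    """S(Qor, D) = Cor1 * max(weights) + Cor2 * min(weights), Cor1 + Cor2 = 1."""
    c_or2 = 1.0 - c_or1
    return c_or1 * max(weights) + c_or2 * min(weights)


def mmm_and(weights, c_and1=0.7):
    """S(Qand, D) = Cand1 * min(weights) + Cand2 * max(weights), Cand1 + Cand2 = 1."""
    c_and2 = 1.0 - c_and1
    return c_and1 * min(weights) + c_and2 * max(weights)


# Document with strong weights for A, B, C, D but no weight for E.
and_score = mmm_and([0.9, 0.8, 0.7, 0.6, 0.0])

# d1 has all five terms, d2 has only A.
or_d1 = mmm_or([1.0, 1.0, 1.0, 1.0, 1.0])
or_d2 = mmm_or([1.0, 0.0, 0.0, 0.0, 0.0])
```

The and query no longer rejects the near-miss document (its score is about 0.27 rather than 0), and the or query now ranks d1 above d2.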

Experimental values:

Cand1 in range [0.5, 0.8]

Cor1 > 0.2

Computational cost is low. Retrieval performance much improved.

Paice Model

Paice model is a relative of the MMM model.

The MMM model considers only the maximum and minimum document weights.

The Paice model takes into account all of the document weights.

Computational cost is higher than for MMM. Retrieval performance is improved.
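The slide does not give the Paice formula. A commonly cited form, used here as an assumption, sorts the document weights (descending for or, ascending for and) and takes a weighted average with geometrically decaying coefficients r^(i-1), where r is a parameter between 0 and 1. The sorting step is one reason the cost exceeds that of MMM.

```python
def paice(weights, r, operator):
    """Assumed Paice similarity: sum(r**(i-1) * d_i) / sum(r**(i-1)).

    Weights are taken in descending order for "or" and ascending for "and".
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    ordered = sorted(weights, reverse=(operator == "or"))
    coeffs = [r ** i for i in range(len(ordered))]
    return sum(c * w for c, w in zip(coeffs, ordered)) / sum(coeffs)


weights = [0.9, 0.5, 0.1]
or_score = paice(weights, r=0.7, operator="or")    # emphasizes the largest weights
and_score = paice(weights, r=1.0, operator="and")  # r = 1 gives the plain average

# With two terms the result coincides with MMM for Cor1 = 1 / (1 + r).
two_term = paice([0.8, 0.2], r=0.5, operator="or")
```

Unlike MMM, the middle weight 0.5 contributes to the score, so documents that differ only in their intermediate weights can be told apart.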

See Frakes, pages 396-397, for more details.

P-norm model

Terms: A1, A2, . . . , An

Document D, with term weights: dA1, dA2, . . . , dAn

Query terms are given weights, a1, a2, . . . ,an, which indicate their relative importance.

Operators have coefficients that indicate their degree of strictness

Query-document similarity is calculated by considering each document and query as a point in n space.
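The slide leaves the formula to the reading. A commonly cited form of the p-norm similarity, given here as an assumption, uses a single strictness coefficient p >= 1 per operator: for or, the normalized p-norm distance from the all-zeros point; for and, one minus the normalized distance from the all-ones point.

```python
def pnorm_or(doc_weights, query_weights, p):
    """Assumed form: (sum(a_i^p * d_i^p) / sum(a_i^p)) ** (1/p)."""
    num = sum((a ** p) * (d ** p) for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return (num / den) ** (1.0 / p)


def pnorm_and(doc_weights, query_weights, p):
    """Assumed form: 1 - (sum(a_i^p * (1 - d_i)^p) / sum(a_i^p)) ** (1/p)."""
    num = sum((a ** p) * ((1.0 - d) ** p) for a, d in zip(query_weights, doc_weights))
    den = sum(a ** p for a in query_weights)
    return 1.0 - (num / den) ** (1.0 / p)


d = [0.9, 0.1]   # document term weights dA1, dA2
a = [1.0, 1.0]   # query term weights a1, a2 (equal importance)

or_p1, and_p1 = pnorm_or(d, a, 1), pnorm_and(d, a, 1)
or_p2, and_p2 = pnorm_or(d, a, 2), pnorm_and(d, a, 2)
or_p50 = pnorm_or(d, a, 50)
```

At p = 1 the two operators give the same weighted average, so the query behaves like a simple vector match. As p grows, or moves toward max and and toward min, approaching the fuzzy-set behaviour.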

See Frakes, pages 397-398, for details.

Test data

          CISI   CACM   INSPEC
P-norm      79    106      210
Paice       77    104      206
MMM         68    109      195

Percentage improvement over standard Boolean model (average best precision)

Lee and Fox, 1988

Reading

E. Fox, S. Betrabet, M. Koushik, W. Lee, Extended Boolean Models, Frakes, Chapter 15

Methods based on fuzzy set concepts
