Presented by: AKHIL GADA CSCI 572 University of Southern California Full Text Indexing Based On Lexical Relations An Application :Software Library by YS Maarek and F.A. Smadja


Page 1

Full Text Indexing Based On Lexical Relations; An Application: Software Libraries, by Y. S. Maarek and F. A. Smadja

Presented by: AKHIL GADA

CSCI 572, University of Southern California

Page 2

Full Text Indexing Based On Lexical Relations; An Application: Software Libraries. July 15th, 2010

REQUIREMENT FOR SEARCH IN SOFTWARE LIBRARY

SEARCH FOR FUNCTIONALLY SIMILAR COMPONENTS

E.g. Yahoo Search API and Google Search API should both be found for the query “I want to search pages”

Page 3

A.I. OR KNOWLEDGE BASE APPROACH:

• DOMAIN KNOWLEDGE MUST BE ENTERED
• MANUAL OR SEMI-AUTOMATIC
• SPECIFIC AND DIFFICULT TO SCALE TO A NEW DOMAIN
• SEMANTIC UNDERSTANDING OF DOCUMENTS

I.R. OR FREE TEXT BASED APPROACH:

• NO PRIOR KNOWLEDGE REQUIRED
• COMPLETELY AUTOMATIC
• GENERIC AND VERY EASY TO SCALE TO A NEW DOMAIN
• NO SEMANTIC UNDERSTANDING OF DOCUMENTS

Page 4

SINGLE KEYWORD VS LEXICAL RELATION

SINGLE KEYWORD:

• CONTEXT INFORMATION IS LOST. E.g. Apple (fruit) vs Apple (computers)
• HIGH-FREQUENCY GENERIC TERMS MIGHT INTRODUCE NOISE. E.g. the word “file” in the UNIX manual does not characterize the functionality of any command

LEXICAL RELATION:

• REVEALS CONTEXT INFORMATION
• A HIGH-FREQUENCY LEXICAL RELATION PROVIDES HIGH FUNCTIONAL INFORMATION ABOUT THE DOCUMENT. E.g. “copy file” in UNIX

Page 5

LINEAR IR USING INVERTED INDEX

CLUSTERING IR USING HAC (HIERARCHICAL AGGLOMERATIVE CLUSTERING)

Page 6

LEXICAL RELATION: TWO WORDS IN A SENTENCE THAT HAVE A SYNTACTIC RELATIONSHIP BETWEEN THEM: Subject-Verb, Verb-Direct object, Verb-Indirect object, etc.

OPEN-CLASS WORDS: NOUNS, VERBS, ADJECTIVES AND ADVERBS. THESE ARE MEANING-BEARING.

CLOSED-CLASS WORDS: Conjunctions (and, or), Articles (the, a), Demonstratives (this, that), and Prepositions (to, from, at, with). These do not carry content meaning of their own.

Page 7

EXTRACT [1] LEXICAL RELATIONS ALGO. [2]

5-WORD WINDOW: W1 W2 W3 W4 W5
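The 5-word window can be illustrated with a minimal sketch. It assumes that a lexical relation is approximated by any two open-class words falling inside the same window of five consecutive words; the tiny `CLOSED_CLASS` list and the function name `extract_relations` are hypothetical and are not taken from the EXTRACT algorithm itself.

```python
import re
from collections import Counter

# Hypothetical, deliberately small closed-class word list (assumption for this sketch).
CLOSED_CLASS = {"and", "or", "the", "a", "an", "this", "that",
                "to", "from", "at", "with", "of", "in"}

def extract_relations(sentence, window=5):
    """Count unordered pairs of open-class words that share a `window`-word span."""
    words = re.findall(r"[a-z]+", sentence.lower())
    pairs = Counter()
    for i, w1 in enumerate(words):
        if w1 in CLOSED_CLASS:
            continue
        # Words at positions i+1 .. i+window-1 lie in the same window as w1.
        for w2 in words[i + 1 : i + window]:
            if w2 in CLOSED_CLASS or w2 == w1:
                continue
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

relations = extract_relations("Copy the file to the directory")
```

In this sketch closed-class words still occupy a position in the window, so "copy" and "directory" (six positions apart) are not paired, while "copy" and "file" are.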

Page 8

EXTRACT [1] LEXICAL RELATIONS ALGO. [2]

Page 9

EXTRACT [1] LEXICAL RELATIONS ALGO. [2]

Page 10

RESOLVING POWER

OUTPUT FROM EXTRACT [1] ALGORITHM. [0]

Page 11

SELECT THE TOP N MOST INFORMATIVE LEXICAL RELATIONS (BY RESOLVING POWER) FOR EACH DOCUMENT; THESE FORM THE PROFILE OF THE DOCUMENT.

CREATE INVERTED INDEX. [2]
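The two steps above can be sketched as follows. The resolving-power values are made-up numbers, since the slide does not give the formula, and `build_profile` and `build_inverted_index` are hypothetical helper names used only for illustration.

```python
from collections import defaultdict

def build_profile(scores, n):
    """Keep the n highest-scoring lexical relations of one document."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:n])

def build_inverted_index(profiles):
    """Map each lexical relation to the documents whose profile contains it."""
    index = defaultdict(dict)
    for doc_id, profile in profiles.items():
        for relation, score in profile.items():
            index[relation][doc_id] = score
    return index

# Hypothetical resolving-power scores for two UNIX commands (assumed values).
scores = {
    "cp": {("copy", "file"): 3.2, ("directory", "file"): 1.1, ("file", "name"): 0.4},
    "mv": {("file", "move"): 2.9, ("directory", "file"): 1.5, ("file", "name"): 0.3},
}
profiles = {doc: build_profile(s, n=2) for doc, s in scores.items()}
index = build_inverted_index(profiles)
```

With N = 2 the low-scoring relation ("file", "name") drops out of both profiles and never reaches the index, which is the intended noise-filtering effect of keeping only the top N relations.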

Page 12

SIMILARITY MEASURE BETWEEN TWO DOCUMENTS [2]

• LET X = set of the top N lexical relations, by resolving power, for document dx
• Y = set of the top N lexical relations, by resolving power, for document dy
• (X ∩ Y) = set of lexical relations common to dx and dy

∂(dx, dy) = similarity between dx and dy
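A minimal sketch of a similarity computed over (X ∩ Y) follows. The exact formula is not reproduced on the slide, so the sum of products of the two documents' scores used here is an assumption chosen for illustration; the profiles and their values are hypothetical.

```python
def similarity(profile_x, profile_y):
    """Sum, over the shared relations X ∩ Y, of the product of their scores.

    The product-sum form is an assumption of this sketch, not the slide's formula.
    """
    common = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in common)

# Hypothetical profiles: relation -> resolving-power score.
dx = {("copy", "file"): 2.0, ("directory", "file"): 1.0}
dy = {("file", "move"): 3.0, ("directory", "file"): 0.5}
score = similarity(dx, dy)
```

Only ("directory", "file") is shared, so it alone contributes to the score; documents with no common relation get a similarity of zero.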

Page 13

CLUSTER FUNCTIONALLY SIMILAR COMPONENTS USING HIERARCHICAL AGGLOMERATIVE CLUSTERING [2]

Figure: documents {d1}, {d2}, {d3}, {d4}, {d5} are merged step by step; the merges shown are ∂({d1},{d2}), ∂({d3},{d4}) and ∂({d3,d4},{d5}).
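The merging process can be sketched as below. This is a simplified illustration under stated assumptions: Jaccard overlap stands in for ∂, a merged cluster is represented by the union of its members' relations, and the five profiles are invented so that the merge order matches d1 with d2, d3 with d4, then {d3,d4} with d5.

```python
def jaccard(x, y):
    """Hypothetical stand-in for ∂: overlap between two sets of lexical relations."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def hac(profiles):
    """Repeatedly merge the two most similar clusters; return the merge history."""
    clusters = {frozenset([d]): set(p) for d, p in profiles.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters, key=sorted)  # deterministic order for tie-breaking
        a, b = max(
            ((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda pair: jaccard(clusters[pair[0]], clusters[pair[1]]),
        )
        # Assumption: a merged cluster is described by the union of its relations.
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)
        merges.append((set(a), set(b)))
    return merges

# Invented profiles (sets of lexical relations written as short strings).
profiles = {
    "d1": {"copy-file", "file-directory"},
    "d2": {"copy-file", "file-directory"},
    "d3": {"print-job", "printer-queue"},
    "d4": {"print-job", "printer-queue", "job-status"},
    "d5": {"print-job", "line-printer"},
}
merges = hac(profiles)
```

The merge history is the hierarchy a user would browse: the last entries are the coarsest groupings and the first entries are the tightest pairs.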

Page 14

INFORMATION RETRIEVAL [2]

• USER SPECIFIES A FREE-TEXT QUERY
• SEARCH AND RETURN RESULTS: LINEAR I.R. USING INVERTED INDEX
• USER SATISFIED? IF NO, ALLOW THE USER TO TRAVERSE THE CLUSTERED HIERARCHY

Page 15

LINEAR INFORMATION RETRIEVAL [2]

Figure: the query dq is compared with each document d1, d2, ..., dn through ∂(dq,d1), ∂(dq,d2), ..., ∂(dq,dn).
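A minimal sketch of linear retrieval over an inverted index follows. It assumes the query has already been turned into a profile dq of weighted lexical relations and scores documents by a sum of products, which is an illustrative choice rather than the system's documented formula; the index contents and the name `retrieve` are hypothetical.

```python
from collections import defaultdict

def retrieve(query_profile, index):
    """Score every document that shares at least one relation with the query."""
    scores = defaultdict(float)
    for relation, q_weight in query_profile.items():
        # Only documents listed under this relation are touched.
        for doc_id, d_weight in index.get(relation, {}).items():
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties broken by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical inverted index: relation -> {document: resolving-power score}.
index = {
    ("copy", "file"): {"cp": 3.0, "rcp": 2.0},
    ("directory", "file"): {"cp": 1.0, "mv": 1.5},
}
dq = {("copy", "file"): 1.0, ("directory", "file"): 1.0}
ranking = retrieve(dq, index)
```

Because the index is keyed by relation, documents that share nothing with the query are never visited, even though the result is conceptually a ranking of ∂(dq,d1) through ∂(dq,dn).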

Page 16

GURU : WORKING SYSTEM SNAPSHOT [2]

Page 17

EVALUATION [2]

MAINTENANCE COST: INCREMENTAL INSERTION [3] OF NEW COMPONENTS IS EASY

EFFICIENCY: 2.5 secs on RT; 0.15 secs on IBM RISC, for a query containing 5 to 15 lexical relations

RETRIEVAL EFFECTIVENESS: Contd…

Page 18

EVALUATION: Precision-Recall Curve [2]

If

• c = total number of records retrieved after executing query q
• R = total number of expected correct results, determined before the query is executed
• r = total number of correct results retrieved after executing query q

Then Recall = r/R and Precision = r/c
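The two ratios can be computed directly from the retrieved and expected result sets. The sketch below uses the same symbols c, R and r; the example query results are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    c = len(retrieved)             # records retrieved
    R = len(relevant)              # expected correct results, fixed beforehand
    r = len(retrieved & relevant)  # correct results actually retrieved
    precision = r / c if c else 0.0
    recall = r / R if R else 0.0
    return precision, recall

# Invented example: 4 records retrieved, 3 expected, 2 of them found.
precision, recall = precision_recall(["cp", "rcp", "mv", "ls"], ["cp", "rcp", "dd"])
```

Here precision is 2/4 and recall is 2/3; a precision-recall curve is obtained by recomputing both values as more results are retrieved.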

Page 19

PROS:

EASY TO EXTEND TO ANY DOMAIN i.e. GENERIC APPROACH

VERY SIMPLE AND ELEGANT APPROACH

PAPER ADEQUATELY PROVIDED BACKGROUND BY DESCRIBING PAST RESEARCH

Page 20

CONS: MAY FAIL IN THE FOLLOWING CASE

E.g. ‘xcalc’ and ‘bc’

Page 21

FURTHER RESEARCH:

COMBINE THE KNOWLEDGE BASE APPROACH WITH THIS TECHNIQUE, e.g. the knowledge “bc = calculator” can be added to GURU to increase recall.

IMPROVED ALGORITHMS FOR INCREMENTAL UPDATING OF INDICES.

Page 22

References

• 0 - Yoelle S. Maarek and Frank A. Smadja. Full Text Indexing Based On Lexical Relations; An Application: Software Libraries.

• 1 - F. de Saussure. Cours de Linguistique Générale, quatrième édition. Librairie Payot, Paris, France, 1949.

• 2 - Y. S. Maarek, Daniel M. Berry and Gail E. Kaiser. GURU: Information Retrieval For Reuse.

• 3 - Kaplan and Maarek, 1990. Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.

Page 23

Q & A