Applying the KISS Principle with Prior-Art Patent Search

  • Published on
    11-Jan-2016


DESCRIPTION

Applying the KISS Principle with Prior-Art Patent Search. CLEF-IP, 22 Sep 2010. Walid Magdy, Gareth Jones, Dublin City University. DCU participation in CLEF-IP 2009: the more text, the better the results; structured search does not help; filtering helps. PowerPoint presentation.

Transcript

  • Applying the KISS Principle with Prior-Art Patent Search

    Walid Magdy, Gareth Jones
    Dublin City University
    CLEF-IP, 22 Sep 2010

  • DCU participation in CLEF-IP 2009
      - The more text, the better the results
      - Structured search does not help
      - Filtering helps
      - Combination of terms and phrases does better
      - Word matching for search is not the best
      - Blind relevance feedback is ineffective
      - Part of the answer is within the question

  • KISS: Keep It Simple and Straightforward
    Three simple runs were submitted:
      1. IR run (simple search)
      2. Cit run (straightforward citation extraction)
      3. IR+Cit run (combination of the IR and Cit runs)
    Evaluation results (out of 25 submitted runs):
      1. IR run: 3rd in recall
      2. Cit run: 1st in precision
      3. IR+Cit run: 2nd in MAP, recall, and PRES

  • IR run
      - Different document versions of a patent are merged
      - Only the English parts are indexed (title, abstract, description, and claims)
      - The query is constructed from the same fields as follows:
          - unigrams with frequency > 2 from the description field
          - bigrams with frequency > 3 from all fields
      - French and German topics are translated using Google Translate
      - The first three levels of the classification are used to filter results
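The query construction and classification filtering described on the IR run slide can be sketched as follows. This is a minimal illustration, not DCU's implementation: the tokenizer, the hypothetical helpers `build_query` and `filter_by_class`, the assumption that classification codes are IPC-style strings whose first three levels are the first four characters (e.g. `A61K`), and the rule that a result is kept when it shares at least one such prefix with the topic are all assumptions. Merging document versions and translating French and German topics are not shown.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphanumeric runs. The slides do not specify
# tokenization, stopword removal, or stemming.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def bigrams(tokens: list[str]) -> list[str]:
    # Adjacent token pairs joined by a space, used as phrase terms.
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_query(fields: dict[str, str]) -> list[str]:
    """Hypothetical helper: query terms from a topic's English fields.

    Unigrams occurring more than twice in the description, plus bigrams
    occurring more than three times across all fields.
    """
    desc_counts = Counter(tokenize(fields.get("description", "")))
    unigrams = [t for t, c in desc_counts.items() if c > 2]

    bigram_counts = Counter()
    for text in fields.values():
        # Counted per field, so no bigram spans the boundary between two fields.
        bigram_counts.update(bigrams(tokenize(text)))
    phrases = [b for b, c in bigram_counts.items() if c > 3]
    return unigrams + phrases


def class_prefixes(codes: list[str]) -> set[str]:
    # Assumption: IPC-style codes such as "A61K 31/00", whose first three
    # levels (section, class, subclass) are the first four characters.
    return {code.replace(" ", "")[:4] for code in codes}


def filter_by_class(results: list[str], topic_codes: list[str],
                    doc_codes: dict[str, list[str]]) -> list[str]:
    """Keep ranked results that share at least one 3-level class with the topic
    (an assumed matching rule); documents with no known codes are dropped."""
    wanted = class_prefixes(topic_codes)
    return [d for d in results if class_prefixes(doc_codes.get(d, [])) & wanted]
```

For example, a topic whose description contains "fuel valve" four times and "seal" three times, with "fuel valve" also in the title, yields the unigrams `fuel`, `valve`, `seal` and the bigram `fuel valve`.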

  • Cit and IR+Cit runs
      - All patent IDs are extracted from the description section of the patent topics
      - IDs that do not exist in the collection are filtered out
      - The remaining IDs are considered relevant documents
      - Citations could be extracted from the text of only 771 of the 2,005 topics (2,307 citations)
      - The IR run is appended to the Cit run, after removing duplicates, to create the IR+Cit run
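A minimal sketch of the Cit and IR+Cit runs, assuming cited patents appear in the topic description as EP publication numbers (for example `EP 1 234 567` or `EP1234567`) and that collection IDs are normalized to the form `EP1234567`. The regex, the ID format, and the function names `extract_citations` and `combine_runs` are hypothetical; the slides do not say which ID patterns were matched.

```python
import re

# Hypothetical pattern for EP publication numbers written as "EP1234567",
# "EP 1234567" or "EP 1 234 567" (seven digits, optional single spaces).
EP_ID_RE = re.compile(r"\bEP\s?-?\s?(\d(?:\s?\d){6})\b")


def extract_citations(description: str, collection_ids: set[str]) -> list[str]:
    """Cit run: patent IDs cited in a topic's description that exist in the
    collection, in order of first mention, without duplicates."""
    cited = []
    for match in EP_ID_RE.finditer(description):
        doc_id = "EP" + re.sub(r"\s", "", match.group(1))  # normalize to EP1234567
        # IDs missing from the collection are filtered out.
        if doc_id in collection_ids and doc_id not in cited:
            cited.append(doc_id)
    return cited


def combine_runs(cit_run: list[str], ir_run: list[str]) -> list[str]:
    """IR+Cit run: the Cit run first, then the IR run with duplicates removed."""
    seen = set(cit_run)
    return cit_run + [d for d in ir_run if d not in seen]
```

Placing the extracted citations at the top of the ranking reflects the slide's treatment of them as relevant documents; the IR results then fill the rest of the list.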

  • Results

    Run      MAP    R      R@100  PRES   PRES@100
    IR       0.122  0.570  0.304  0.461  0.228
    Cit      0.112  0.119  0.119  0.119  0.118
    IR+Cit   0.203  0.618  0.385  0.523  0.316

    (R = recall; R@100 = recall at rank 100)

    Chart: PRES@100 of DCU runs among submitted runs (large topics set)

    Run           PRES@100
    humb          0.3907
    dcu-3         0.3162
    dcu-2         0.2281
    spq           0.2167
    uned-4        0.2003
    uned-2        0.2001
    uned-5        0.1857
    uned-3        0.1856
    uned-1        0.1823
    uned-8        0.1816
    uned-6        0.1809
    uned-7        0.1641
    ui-3          0.1292
    dcu-1         0.1176
    ui-1          0.1129
    ui-2          0.104
    bitem-2       0.06363
    bitem-1       0.05877
    uaic          0.01089

    Chart data sheet for the small topics set (the metric is not labeled in the source):

    Run           Score
    bitem-1-S     0.3191
    bitem-2-S     0.3315
    dcu-1-S       0.1163
    dcu-3-S       0.5283
    dcu-2-S       0.4633
    hild-1-small  0.2622
    hild-2-small  0.3552
    hild-3-small  0.2849
    hild-4-small  0.2809
    humb-S        0.6409
    run-1-small   0.1215
    run-2-small   0.1216
    spq-S         0.4579
    uaic-S        0.01097
    ui-2-small    0.2171
    ui-3-small    0.264
    ui-1-small    0.2215
    uned-1-S      0.3635
    uned-2-S      0.3843
    uned-3-S      0.3658
    uned-4-S      0.3847
    uned-5-S      0.3664
    uned-6-S      0.3636
    uned-7-S      0.3339
    uned-8-S      0.3599
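The Results slide reports MAP, R, R@100, PRES and PRES@100. The sketch below shows how the per-topic components can be computed, assuming each run is a ranked list of document IDs without duplicates and each topic has a non-empty set of relevant IDs. The `pres` function follows our reading of the PRES formula, in which relevant documents missing from the top N_max are treated as if ranked immediately after N_max. MAP is then the mean of `average_precision` over all topics. The cut-off used for the plain R and PRES columns is not stated on the slides, so it is left as a parameter.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision at the rank of each retrieved relevant document,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def recall_at(ranking: list[str], relevant: set[str], cutoff: int) -> float:
    """Fraction of relevant documents found in the top `cutoff` results."""
    found = sum(1 for doc in ranking[:cutoff] if doc in relevant)
    return found / len(relevant) if relevant else 0.0


def pres(ranking: list[str], relevant: set[str], n_max: int) -> float:
    """PRES = 1 - (sum(r_i) / n - (n + 1) / 2) / n_max.

    r_i are the ranks of the n relevant documents. Relevant documents missing
    from the top n_max are assigned ranks n_max + i for i = found + 1 .. n,
    i.e. they are treated as appearing right after the cut-off.
    """
    n = len(relevant)
    if n == 0:
        return 0.0
    ranks = [r for r, doc in enumerate(ranking[:n_max], start=1) if doc in relevant]
    found = len(ranks)
    ranks += [n_max + i for i in range(found + 1, n + 1)]
    return 1 - (sum(ranks) / n - (n + 1) / 2) / n_max
```

For a topic with relevant set {a, b} and ranking [a, x, b], `pres(ranking, relevant, 100)` is 0.995, close to the perfect score of 1.0, while a run that retrieves neither document scores 0.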

  • Conclusion & Future Work
    When simpler approaches achieve better results than sophisticated ones, much research is still needed in this area.

      - Extracted citations can be useful for relevance feedback
      - Better translations can be used for FR/DE topics
      - Faster translation techniques can be used to translate FR/DE documents

  • Simply, thank you: this was the KISS principle with patent search

