Korean script searching in Korean Library OPACs
Junglim Chae, Yonsei University

DESCRIPTION

Korean script searching in Korean Library OPACs. Junglim Chae, Yonsei University. Indexing Method: N-Gram, Morphological Analysis. N-Gram Indexing. N-Gram: Unigram, Bigram, Trigram, N-Gram. E.g.) 아버지가 방에 들어가신다 ("Father goes into the room"): 12 index terms by bigram segmentation.

Citation preview

  • Korean script searching in Korean Library OPACs Junglim Chae Yonsei University

  • Indexing Method

    N-Gram

    Morphological Analysis

  • N-Gram Indexing

    N-Gram: Unigram, Bigram, Trigram, N-Gram

    E.g.) 아버지가 방에 들어가신다 ("Father goes into the room")

    12 index terms by bigram segmentation (0 appears to mark a space):
    아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다

    Many index terms: many results, but lots of noise

    High recall ratio but low precision ratio
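The bigram segmentation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the indexing code of any particular OPAC: the function name `char_ngrams` is hypothetical, and it assumes that spaces count as characters, which is what makes the example sentence yield 12 bigrams.

```python
import unicodedata


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text, spaces included."""
    # Normalize to NFC so each Hangul syllable is a single code point.
    text = unicodedata.normalize("NFC", text)
    return [text[i:i + n] for i in range(len(text) - n + 1)]


sentence = "아버지가 방에 들어가신다"  # "Father goes into the room"
bigrams = char_ngrams(sentence, 2)
print(len(bigrams))
print(bigrams)
```

The sentence has 13 characters including its two spaces, so sliding a two-character window across it gives 12 overlapping terms. Every one of them becomes a possible match, which is where both the high recall and the noise come from.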

  • Morphological Analysis

    Requires a morphological analysis dictionary

    E.g.) Three index terms by morphological analysis

    Ability to match linguistically similar terms

    Faster performance with a smaller index

    Accurate matches that meet user expectations

    High precision ratio but low recall ratio
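As a rough illustration of dictionary-based indexing, the sketch below keeps, for each space-separated word, the longest dictionary entry that the word starts with. This is a toy stand-in, not a real morphological analyzer: the dictionary entries, the function name `dictionary_terms`, and the longest-prefix rule are all assumptions made for the example.

```python
# Toy stand-in for a morphological analysis dictionary (hypothetical entries).
STEM_DICTIONARY = {"아버지", "방", "들어가"}


def dictionary_terms(text: str, dictionary: set[str]) -> list[str]:
    """For each space-separated word, keep the longest dictionary entry that prefixes it."""
    terms = []
    for word in text.split():
        # Try the longest candidate first, then shorten one character at a time.
        for end in range(len(word), 0, -1):
            if word[:end] in dictionary:
                terms.append(word[:end])
                break
    return terms


terms = dictionary_terms("아버지가 방에 들어가신다", STEM_DICTIONARY)
print(terms)
```

With this toy dictionary the sentence produces three index terms instead of twelve, which shows why the index is smaller and more precise, and why a word missing from the dictionary is simply not indexed.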

  • N-Gram Vs. Morphological Analysis

                       N-Gram       Morphological Analysis
    Recall Ratio       High         Low
    Precision Ratio    Low          High
    Size of Index      Big          Small
    Indexing Speed     Fast         Slow
    Search Speed       Slow         Fast
    Application        Libraries    Web Search Engines
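For reference, the recall and precision ratios compared here are usually defined as follows, where "relevant" is the set of records that actually satisfy the user's need and "retrieved" is the set the search returns. These are the standard information-retrieval definitions, not formulas taken from the slides.

```latex
\[
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|},
\qquad
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}
\]
```

An index with many terms tends to enlarge the retrieved set, which raises recall and lowers precision. A smaller, dictionary-based index tends to do the opposite.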

  • A Case Study

    Yonsei University Library

    Library System: Maestro-Y
    Search Engine: K2 by Verity
    Indexing Method: N-Gram (bigram) + Morphological Analysis

    Indexing Rules
    Rule 1: Divide strings by space
    Rule 2: Extract index terms using the bigram indexing method
    Rule 3: Add the whole string, excluding the spaces between strings
    Rule 4: Add words from the Korean morphological analysis dictionary
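The four indexing rules can be sketched as a single function. This is one plausible reading of the rules, not the actual Maestro-Y or K2 behaviour: the names `word_bigrams` and `build_index`, the sample title, the sample dictionary, and the handling of single-character strings are assumptions.

```python
def word_bigrams(word: str) -> list[str]:
    """Overlapping two-character terms of one string."""
    if len(word) < 2:
        return [word]  # assumption: a single character is kept as its own term
    return [word[i:i + 2] for i in range(len(word) - 1)]


def build_index(field: str, dictionary: set[str]) -> set[str]:
    """Apply rules 1 to 4 to one field value and return its index terms."""
    strings = field.split()                       # Rule 1: divide by space
    index: set[str] = set()
    for s in strings:
        index.update(word_bigrams(s))             # Rule 2: bigrams of each string
    joined = "".join(strings)
    if joined:
        index.add(joined)                         # Rule 3: whole string without spaces
    # Rule 4 (assumed reading): dictionary words that occur in the joined string.
    index.update(w for w in dictionary if w in joined)
    return index


# Hypothetical title ("Korean library") and hypothetical dictionary entries.
index_terms = build_index("한국 도서관", {"한국", "도서관", "대학"})
print(sorted(index_terms))
```

For the sample title the result holds the bigrams of each word, the joined form 한국도서관, and the two dictionary words that occur in it, so a query can match whether or not the user typed the space.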

  • A Case Study

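    Yonsei University Library

    E.g.) A two-part string is divided at the space (rule1), segmented into bigrams (rule2), added as a whole string without spaces (rule3), and supplemented with dictionary words (rule4). The index contains the terms produced by all four rules.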

  • Search Tips

  • Search Tips(1)

    Keyword Search

    Default search option
    Use at most 3 keywords
    Use Boolean operators
    Omit stop-words

  • Search Tips(2)

    Keyword Search

    Follow the Korean word division rules
    E.g.) correctly divided form (O), incorrectly divided form (X)

  • Search Tips(3)

    Keyword Search

    Compound nouns: do not use spaces between nouns
    E.g.) written without spaces (O), written with spaces (X)

  • Search Tips(4)

    Browse Search

    Begin with, or Truncation

    Use when you already know the first word of the title, author, or publisher
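A browse ("begin with") search amounts to a prefix lookup in an alphabetically sorted list of headings. The sketch below uses Python's standard `bisect` module; the function name `browse` and the sample titles are hypothetical, and a real OPAC would apply its own collation rules rather than plain code-point order.

```python
import bisect


def browse(headings_sorted: list[str], prefix: str) -> list[str]:
    """Return the headings that begin with prefix, in list order."""
    # Jump to the first heading that is not smaller than the prefix.
    start = bisect.bisect_left(headings_sorted, prefix)
    matches = []
    for heading in headings_sorted[start:]:
        if not heading.startswith(prefix):
            break  # sorted order: no later heading can match
        matches.append(heading)
    return matches


# Hypothetical title list, sorted as the browse index would be.
titles = sorted(["Hamlet", "Korean Art", "Korean History", "Macbeth"])
print(browse(titles, "Korean"))
```

Because the lookup starts from the first word, it only helps when the user already knows how the title, author, or publisher begins.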

  • Search Tips(5)

    Browse Search

    Korean Classics

  • Search Tips(6)

    Exact Match

    Precise Search

    Known items

  • Search Tips(7)

    Exact Match

    Single-character words
    E.g.) C

  • Search Tips(8)

    Support Hangul/Hancha Searching

    E.g.) Hangul form / Hancha form of the same word

  • Search Tips(9)

    Japanese Kana, Archaic Korean, Russian, Special characters: choose scripts from the Multi-language Input Table

  • E.g.) Multi-Script Input Table

  • Search Tips(10)

    Japanese Kana

  • Search Tips(11)

    Personal names
    E.g.) Shakespeare ; Murakami, Haruki

  • Search Tips(12)

    Space is treated as AND
    E.g.) two keywords separated by a space = first keyword AND second keyword

    In some OPACs, spaces in the character fields do make a difference in retrieval
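The effect of treating a space as AND can be sketched with a small inverted index. The function name `search`, the index terms, and the record numbers below are made up for illustration. The point is only that the same characters typed with and without a space become different queries.

```python
def search(query: str, postings: dict[str, set[int]]) -> set[int]:
    """Treat each space-separated keyword as a term and AND the results together."""
    terms = query.split()
    if not terms:
        return set()
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())  # space acts as Boolean AND
    return result


# Hypothetical inverted index: index term -> record numbers.
postings = {
    "한국": {1, 2, 3},
    "도서관": {2, 3, 4},
    "한국도서관": {3},
}

with_space = search("한국 도서관", postings)     # 한국 AND 도서관
without_space = search("한국도서관", postings)   # one term
print(with_space, without_space)
```

Here the query with a space retrieves every record that contains both words, while the query without a space retrieves only records indexed under the joined form.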

  • Comparative search with and without space

  • Thank You
