Personalized Literature Retrieval Recommendation System Framework Design


  • 7/31/2019 Personalized Literature Retrieval Recommendation System Framework Design.


    Journal of Computational Science and Engineering

    www.jcseuk.com

    2050-2311/Copyright 2012 IE Enterprises Ltd. All rights reserved. Jour. of Comp. Sci. and Eng., Vol. 1, Num. 1, 0014-0017, 2012

    Personalized Literature Retrieval Recommendation System

    Framework Design

    Wenli Hua a,*

    a Anhui Vocational College of Electronic & Information Technology, Department of Teaching and Scientific Research, Bengbu, Anhui 233030, China

    Abstract

    Personalized recommendation of academic literature from vast digital literature databases is one of the active topics in information retrieval. Common recommendation systems often miss literature that is semantically similar but worded differently. This paper introduces the design scheme of an academic literature recommendation system. The user interest model and the features of academic literature are studied from a semantic point of view, and a semantics-based similarity measure is discussed. An academic literature recommendation system is proposed based on an analysis of the recommendation process.

    Keywords: Recommendation system; Literature; ODP categories system; Cosine similarity

    Due to the huge amount of digital literature resources available, researchers seeking specific pieces of literature are often unable to quickly locate the literature in which they are interested, resulting in information loss and information overload [1].

    A personalized literature recommendation system is proposed to solve this problem. It infers the user's preferences from the collected user information and then recommends documents to the user from the existing mass of literature, thus greatly reducing the time the user spends waiting for retrieval.

    We combined elements from previous studies in this field and designed a framework for the literature query recommendation system. The framework mainly contains the following modules: user information collection, user preference feature extraction, digital literature feature extraction, similarity measurement between literature and user preferences, and ranked literature recommendation.

    * Corresponding author. Wenli Hua

    E-mail address: [email protected]

    September 2012


    Wenli Hua / Journal of Computational Science and Engineering 1:1 (2012) 0014-0017

    The design of this system framework provides a basis for the detailed design and development of a prototype system; it also has reference value for the promotion and application of literature recommendation theory.

    1. The overall system function

    The framework design uses the literature retrieval process as the basis for dividing the system into modules. The process of retrieving literature with the PRSSL (Personal Recommendation System for Scholarly Literature) system is shown in Figure 1. The basic retrieval strategy is that the system automatically retrieves matching literature according to the user's preference information and then recommends the results to the user in order of matching level. To achieve these functions, the main problems the system needs to solve are: extraction of user interest preferences, extraction of literature features, similarity measurement, and ranked recommendation of literature.

    Fig. 1. The process of literature query recommendation

    2. The logical structure of the PRSSL

    Following the requirements analysis of PRSSL [2], its main logical modules include collection of user information, extraction of user preference features, collection of literature information, extraction of literature features, and literature recommendation. The logical structure of the modules is shown in Figure 2.

    Fig. 2. Logical structure of the PRSSL system modules

    [Figure: PRSSL module tree. Module groups: information collection; feature extraction; literature recommendation. Modules: extraction of user information; extraction of literature information; feature extraction of user preference; extraction of literature feature; match measure of literature and user preference; sequence recommendation.]

    [Figure: stages of the recommendation process. Collection of user information and collection of literature information; feature extraction; match of user preference and literature feature; sequence recommendation.]


    3. PRSSL function design

    3.1 Function design of main module of system

    According to the logical structure of the PRSSL system, its core functions comprise three parts: information collection, feature extraction, and literature recommendation. The difficulties of the algorithm design lie in three aspects: representation and extraction of user preferences, representation and extraction of literature features, and the matching measure between literature and user preferences. PRSSL represents the user preference U and the literature feature P as vectors, uses the cosine similarity measure $\cos(V_u, V_p)$ to calculate the matching level between user preferences and literature features, and finally recommends literature in order of matching level.
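
    The matching and ranking stage described in this subsection can be sketched as follows. This is a minimal illustration, not the PRSSL implementation: it assumes both vectors are stored as dictionaries from feature word to weight, treats a feature missing from one vector as weight zero, and performs no semantic standardization of features. The names `cosine_similarity`, `recommend` and `top_n` are hypothetical.

```python
import math


def cosine_similarity(vu, vp):
    """Cosine of two sparse vectors stored as {feature word: weight} dicts."""
    # Features absent from vp contribute zero to the dot product.
    dot = sum(weight * vp.get(word, 0.0) for word, weight in vu.items())
    norm_u = math.sqrt(sum(w * w for w in vu.values()))
    norm_p = math.sqrt(sum(w * w for w in vp.values()))
    if norm_u == 0.0 or norm_p == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_p)


def recommend(user_vector, literature_vectors, top_n=10):
    """Return (title, score) pairs sorted by descending match level."""
    scored = [(title, cosine_similarity(user_vector, vec))
              for title, vec in literature_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

    Calling `recommend(U, {"paper A": P_a, "paper B": P_b})` would return the papers ordered by how closely their feature vectors point in the same direction as the user's preference vector.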

    3.2 Core algorithm design of system

    Three core algorithms need to be designed in the PRSSL system: representation of user preferences, representation of literature features, and the matching level between user preferences and literature features.

    (1) User preference representation algorithm

    PRSSL uses the feature vector $U = ((w_{u1}, v_{u1}), (w_{u2}, v_{u2}), \ldots, (w_{uu}, v_{uu}))$ to represent the user's interest preferences, where $w_{ui}$ is a feature word and $v_{ui}$ is the level of interest the user places on that feature word. Each $w_{ui}$ comes from the research-area keywords in the user's registration information and from the keywords of the literature the user has published.

    Define $v_{ui} = k_1 + k_2 + k_3$, where $k_1$, $k_2$ and $k_3$ represent the influence on the interest level of the registration information, the published literature and the current retrieval query, respectively. The values of $v_{ui}$, $k_1$, $k_2$ and $k_3$ are set mainly according to experimental results and so as to keep the order of magnitude consistent with the literature feature vector, with $0 \le v_{ui} \le 1$. If the feature word $w_{ui}$ appears in the registration information, then $k_1 = 0.281$; otherwise $k_1 = 0$. If $w_{ui}$ appears in the published literature, then $k_2 = 0.015 \times$ (number of published works that include $w_{ui}$ / total number of published works); otherwise $k_2 = 0$. If $w_{ui}$ is included in the retrieval formula, then $k_3 = 0.704$; otherwise $k_3 = 0$.
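
    The weighting rule above can be written out as a short sketch. The three constants are the ones stated in the text; everything else is an assumption made for illustration, including the function names and the choice to pass published works as a list of keyword sets and the retrieval formula as a set of query terms.

```python
# Constants taken from the text; all names below are illustrative.
K1_REGISTRATION = 0.281
K2_PUBLICATION_SCALE = 0.015
K3_QUERY = 0.704


def interest_level(word, registration_keywords, publications, query_terms):
    """Return v_ui = k1 + k2 + k3 for one feature word.

    publications is a list of keyword sets, one per published work (assumed layout).
    """
    k1 = K1_REGISTRATION if word in registration_keywords else 0.0
    if publications:
        containing = sum(1 for keywords in publications if word in keywords)
        k2 = K2_PUBLICATION_SCALE * containing / len(publications)
    else:
        k2 = 0.0  # no published works, so no publication influence
    k3 = K3_QUERY if word in query_terms else 0.0
    return k1 + k2 + k3


def user_preference_vector(registration_keywords, publications, query_terms):
    """Build U as a dict mapping each feature word w_ui to its weight v_ui."""
    # Feature words come from registration keywords and publication keywords.
    words = set(registration_keywords)
    for keywords in publications:
        words |= set(keywords)
    return {w: interest_level(w, registration_keywords, publications, query_terms)
            for w in sorted(words)}
```

    Because the three constants sum to 1, a feature word that appears in the registration information, in every published work and in the current query reaches the upper bound $v_{ui} = 1$.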

    (2) Literature feature representation algorithm

    All digital literature has a Keywords section, so it is feasible to choose these keywords as the carrier of the literature feature description. PRSSL uses the number of times each keyword appears in the literature as the basis for measuring the literature feature, and uses the vector $P = ((w_{p1}, v_{p1}), (w_{p2}, v_{p2}), \ldots, (w_{pp}, v_{pp}))$ to represent the literature features, where $w_{pi}$ is a literature keyword. Let $fn_i$ be the number of times the keyword $w_{pi}$ appears in the literature, and define $v_{pi} = fn_i / (fn_1 + \cdots + fn_p)$. From these definitions, $v_{pi}$ represents the weight of the $i$-th keyword among the literature features.
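
    A minimal sketch of this keyword-frequency weighting follows. The paper does not say how occurrences are counted, so the sketch assumes case-insensitive substring counting over the full text; the function name `literature_feature_vector` is hypothetical.

```python
def literature_feature_vector(keywords, text):
    """Build P as a dict mapping each keyword w_pi to v_pi = fn_i / (fn_1 + ... + fn_p)."""
    lowered = text.lower()
    # fn_i: occurrences of each keyword, counted as case-insensitive substrings (assumption).
    counts = {kw: lowered.count(kw.lower()) for kw in keywords}
    total = sum(counts.values())
    if total == 0:
        # No keyword occurs in the text; return zero weights instead of dividing by zero.
        return {kw: 0.0 for kw in keywords}
    return {kw: n / total for kw, n in counts.items()}
```

    With this definition the weights of one document always sum to 1 when at least one keyword occurs, which keeps them on the same scale as the user interest levels.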

    (3) Match measure of literature and user preference

    The match level between the literature features and the user preferences is measured using cosine similarity. Let $V_u = (v_{u1}, v_{u2}, \ldots, v_{uu})$ and $V_p = (v_{p1}, v_{p2}, \ldots, v_{pp})$; then

    $$\mathrm{Sim}(U, P) = \cos(V_u, V_p) = \frac{V_u \cdot V_p}{|V_u|\,|V_p|}.$$

    One problem needs to be addressed, however: the features of $V_u$ and $V_p$ are not necessarily the same. In fact, there are three possible relationships between two features: same, similar or different. "Similar" means that, in terms of disciplines, the two features may stand in a conceptual father-son relationship; for example, the relationship between association rule and data mining can be seen as a father-son relationship. For this reason, before calculating the match level between literature features and user preferences, the features of U and P must first be standardized so that U and P have the same features. The processing strategy is as follows. First step: generalize each feature of the user preference feature vector following the ODP directory system approach [3]. All the generalization paths of the features constitute a generalization tree whose root is the name of the second-level discipline of the research area that the user specified at registration.

    Second step: standardize U and P so that they have the same feature items. This requires building W, the union of U and P. W can be built starting from U and then, on the basis of conceptual generalization, gradually adding the feature items of P to W. Third step: using W as the standard, normalize $V_u$ and $V_p$ respectively in preparation for calculating the cosine similarity. Fourth step: calculate the cosine similarity.
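
    The four steps can be illustrated with a small sketch. The paper leaves the construction of W and the transformation of weights open, so this is only one possible reading under stated assumptions: the concept hierarchy is a hypothetical, acyclic child-to-parent map (not real ODP data); a literature keyword is merged into the nearest user feature found on its generalization path; merged weights are summed; and the reverse case, where a user feature is more specific than a literature keyword, is not handled. All names are illustrative.

```python
import math

# Hypothetical fragment of a concept hierarchy (child -> parent); not real ODP data.
PARENT = {
    "association rule": "data mining",
    "data mining": "computer science",
}


def generalization_path(word, parent=PARENT):
    """Return the word followed by its ancestors, from most to least specific."""
    path = [word]
    while path[-1] in parent:  # assumes the hierarchy has no cycles
        path.append(parent[path[-1]])
    return path


def standardize(u, p, parent=PARENT):
    """Project U and P onto a shared feature list W and return (W, Vu, Vp)."""
    w = list(u)  # W starts from the features of U
    p_mapped = {}
    for word, weight in p.items():
        # Map a P feature to the nearest concept on its path that U already has.
        target = next((c for c in generalization_path(word, parent) if c in u), word)
        p_mapped[target] = p_mapped.get(target, 0.0) + weight  # summing is an assumption
        if target not in w:
            w.append(target)  # unrelated P features extend W
    vu = [u.get(word, 0.0) for word in w]
    vp = [p_mapped.get(word, 0.0) for word in w]
    return w, vu, vp


def cosine(vu, vp):
    """Cosine similarity of two equal-length weight lists."""
    dot = sum(a * b for a, b in zip(vu, vp))
    norm = math.sqrt(sum(a * a for a in vu)) * math.sqrt(sum(b * b for b in vp))
    return dot / norm if norm else 0.0
```

    With `u = {"data mining": 0.5}` and `p = {"association rule": 1.0}`, the two vectors share no literal feature, yet after standardization both are expressed over the single item "data mining" and their match level is no longer zero. This is the situation the text describes for features in a father-son relationship.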


    4. Research outlook

    The main issues that need to be addressed next are: (1) how to determine the structure of the conceptual generalization tree of the feature vector; (2) how to derive the union W from the conceptual generalization tree; (3) using W as the standard, how to standardize $V_u$ and $V_p$. The key issue here is how to transform the weights of all feature items of the original $V_u$ and $V_p$ into weights with respect to W.

    Acknowledgement

    The work is supported by Humanity and Social Science foundation of Ministry of Education of China under

    Grant No. 09YJC870001.

    References

    [1] Genyuan Lai. Research of cross-language recommended model of scientific literature. Journal of Library Science in China, 2012, 38(198): 71-78.

    [2] Yong Li, Dezhi Xu, Yong Zhang, et al. Article recommendation algorithm in VRE based on content filtering. Application Research of Computers, 2007, 24(9): 58-61.

    [3] http://www.dmoz.org/ [OL]

    Biography

    Wenli Hua (1969.2-), male, postgraduate, assistant professor. Current research interests: recommendation systems and data mining. Email: [email protected].