FP-Tree - Data Mining

  • Mining frequent patterns without candidate generation

  • Outline
    Introduction to frequent pattern mining
    The Apriori approach
    Design and construction of the FP-tree (Frequent Pattern Tree)
    Mining frequent patterns using the FP-tree
    Evaluation of experimental results
    Open issues for discussion

  • Introduction to frequent pattern mining
    From a data set, we find the patterns of length 1, 2, 3, ... that satisfy min_support.

    Data: supermarket sales data

  • The Apriori approach
    Idea of the Apriori algorithm:
    Repeatedly generate the set of candidates of length k+1 from the frequent patterns of length k.
    Scan the database to check whether the support of each candidate satisfies min_support.

  • The Apriori approach (cont.)
    TID   Items bought
    100   f, a, c, d, g, i, m, p
    200   a, b, c, f, l, m, o
    300   b, f, h, j, o
    400   b, c, k, s, p
    500   a, f, c, e, l, p, m, n

    Minimum support (min_support) = 3 (60%)
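
As a minimal sketch (Python is used here only for illustration; the names `TRANSACTIONS`, `MIN_SUPPORT` and `item_supports` are assumptions, not part of the slides), the support of each single item in the table above can be counted and compared with min_support:

```python
from collections import Counter

# Transaction table from the slide (TID -> items bought).
TRANSACTIONS = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_SUPPORT = 3  # absolute count, i.e. 60% of the 5 transactions


def item_supports(transactions):
    """Count in how many transactions each item occurs."""
    counts = Counter()
    for items in transactions.values():
        counts.update(set(items))  # set(): count an item once per transaction
    return counts


supports = item_supports(TRANSACTIONS)
frequent_items = {i: n for i, n in supports.items() if n >= MIN_SUPPORT}
print(sorted(frequent_items.items()))
```

Only f, c, a, b, m and p reach the threshold; every other item occurs in at most two transactions.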

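  • The Apriori approach (cont.)
    Step 1: Find F1, the set of patterns of length 1 that satisfy min_support.
    F1 = {f, c, a, b, m, p}
    Step 2: Iteratively build the candidate set Ck and derive Fk from Ck.
    For k = 2:
    C2 = {fc, fa, fb, fm, fp, ca, cb, cm, cp, ab, am, ap, bm, bp, mp}
    F2 = {fc, fa, fm, ca, cm, cp, am}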

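  • The Apriori approach (cont.)
    For k = 3:
    C3 = {...} (5 candidates)
    F3 = {fca, fcm, fam, cam}
    For k = 4:
    C4 = {fcam}
    F4 = {fcam}
    For k = 5: C5 is empty, so the algorithm stops.
    The complete set of frequent patterns is therefore: f, c, a, b, m, p, fc, fa, fm, ca, cm, cp, am, fca, fcm, fam, cam, fcam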
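
The level-wise search can be sketched in Python as follows. This is an illustrative version, not the presenters' code: the function names are assumptions, and it applies the usual subset-pruning step before counting, so its intermediate candidate sets may be smaller than the ones listed on the slides while the frequent patterns are the same.

```python
from itertools import combinations

# The five transactions from the slides, one set of items per transaction.
TRANSACTIONS = [set("facdgimp"), set("abcflmo"), set("bfhjo"),
                set("bcksp"), set("afcelpmn")]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (level-wise search)."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        known = set(level)
        # Join: merge two frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in known for s in combinations(c, k))}
        # Count: one pass over the transactions per surviving candidate.
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
        k += 1
    return frequent


patterns = apriori(TRANSACTIONS, min_support=3)
print(sorted("".join(sorted(p)) for p in patterns))
```

On the example data this yields the 18 frequent patterns listed above, each written with its items in alphabetical order.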

  • Limitations of the Apriori algorithm
    Apriori incurs two kinds of cost:
    The cost of generating candidates.
    The cost of repeatedly scanning the database to check a large number of candidates against min_support.
    Example: with 10^4 frequent patterns of length 1, the cost of candidate generation is too high and the cost of scanning the database is high.
    Goal: avoid generating an excessively large candidate set. Proposal: build an FP-tree.
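
To give a sense of scale for the 10^4 example, the number of length-2 candidates can be computed directly, assuming every pair of frequent single items becomes a candidate (which is what the join step does at k = 2):

```python
from math import comb

n_frequent_items = 10**4
# Apriori pairs up the frequent 1-itemsets to form the length-2 candidates.
n_candidate_pairs = comb(n_frequent_items, 2)
print(n_candidate_pairs)  # 49995000
```

That is roughly 5 x 10^7 candidates whose supports all have to be counted against the database.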

  • FP-tree construction algorithm
    Step 1: Scan the database, collect the set F of frequent items and compute their supports. Sort the items of F in descending order of support; the result is the list L.
    Step 2: Create the root of the tree T and label it Null. Then scan the database a second time. For each transaction in the database, do the following two things:
    Select the frequent items in the transaction and sort them according to their order in L.
    Call Insert_tree([p|P], T) to insert the items into the tree T.

  • FP-tree construction algorithm (cont.)
    Step 1: Extract the frequent list L. L contains the frequent items in descending order of support.
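
The two scans can be sketched in Python as below. Apart from `insert_tree`, which mirrors Insert_tree([p|P], T), the names are assumptions made for this sketch. Ties in support are broken with a hard-coded order (`TIE_BREAK`) so that the result matches the item order used in the example; any fixed tie-break gives a valid FP-tree.

```python
from collections import Counter

TRANSACTIONS = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
                list("bcksp"), list("afcelpmn")]
# Assumption: ties in support are broken in this order to match the example.
TIE_BREAK = "fcabmp"


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # None for the root ("Null")
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode


def insert_tree(items, node):
    """Insert_tree([p|P], T): add the ordered items below `node`."""
    if not items:
        return
    first, rest = items[0], items[1:]
    child = node.children.get(first)
    if child is None:                       # no child for p yet: create it
        child = node.children[first] = FPNode(first, parent=node)
    child.count += 1                        # shared prefix: only bump the count
    insert_tree(rest, child)                # recurse on the remaining list P


def build_fp_tree(transactions, min_support):
    # Scan 1: supports of all items, then the sorted frequent list L.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i in counts if counts[i] >= min_support]
    L = sorted(frequent, key=lambda i: (-counts[i], TIE_BREAK.index(i)))
    rank = {item: r for r, item in enumerate(L)}
    # Scan 2: insert the ordered frequent items of every transaction.
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        insert_tree(ordered, root)
    return root, L


def show(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


root, L = build_fp_tree(TRANSACTIONS, min_support=3)
print(L)
print("\n".join(show(root)))
```

Transactions that share a prefix of frequent items share the corresponding branch of the tree, which is why only the counts change when a repeated prefix is inserted.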

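  • FP-tree - Example
    Minimum support (min_support) = 3 (60%)

    Table of all items (item:support): f:4, c:4, a:3, b:3, m:3, p:3, l:2, o:2, d:1, e:1, g:1, h:1, i:1, j:1, k:1, n:1, s:1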

  • FP-tree - Example (cont.)
    The list of frequent items is L = ((f:4), (c:4), (a:3), (b:3), (m:3), (p:3)). The items are sorted in descending order of support.

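  • FP-tree - Example (cont.)
    From the initial data set, we obtain the FP-tree as follows:

    TID   Items bought               Frequent items (ordered)
    100   f, a, c, d, g, i, m, p     f, c, a, m, p
    200   a, b, c, f, l, m, o        f, c, a, b, m
    300   b, f, h, j, o              f, b
    400   b, c, k, s, p              c, b, p
    500   a, f, c, e, l, p, m, n     f, c, a, m, p

    Inserting the ordered frequent items of each transaction gives the tree (item:count, children indented):
    root (Null)
      f:4
        c:3
          a:3
            m:2
              p:2
            b:1
              m:1
        b:1
      c:1
        b:1
          p:1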
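
The third column of the table can be reproduced with a short filter-and-sort step. This is a sketch: `ordered_frequent_items` is a name chosen here, and the list `L` is written out by hand in the order used by the example.

```python
# Transaction table from the slides (TID -> items bought).
TRANSACTIONS = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
# Frequent items in descending order of support, as used in the example.
L = ["f", "c", "a", "b", "m", "p"]


def ordered_frequent_items(items, frequent_list):
    """Drop infrequent items and sort the rest by their position in L."""
    rank = {item: r for r, item in enumerate(frequent_list)}
    return sorted((i for i in items if i in rank), key=rank.get)


ordered = {tid: ordered_frequent_items(items, L)
           for tid, items in TRANSACTIONS.items()}
for tid, items in ordered.items():
    print(tid, ", ".join(items))
```

Using one global order for every transaction is what lets transactions with common frequent items share a prefix in the tree.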

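  • FP-tree - Example (cont.)
    From the initial data set, we build the header table of the FP-tree as follows:
    f:4, c:4, a:3, b:3, m:3, p:3 (each entry links to the nodes of that item in the tree)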
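
A header table can be kept while the tree is built. In the sketch below (names are assumptions) each header entry is a Python list of nodes, which stands in for the chain of node-links; the input is the ordered frequent items of the five example transactions.

```python
from collections import defaultdict

# Ordered frequent items of the five example transactions.
ORDERED = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_with_header(ordered_transactions):
    """Build the FP-tree and a header table: item -> list of its nodes."""
    root = FPNode(None, None)
    header = defaultdict(list)   # the list plays the role of the node-link chain
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)   # link the new node into the chain
            child.count += 1
            node = child
    return root, header


root, header = build_with_header(ORDERED)
for item in "fcabmp":
    nodes = header[item]
    print(item, "support:", sum(n.count for n in nodes), "nodes:", len(nodes))
```

Summing the counts along an item's chain gives that item's support without touching the database again.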

  • Cost analysis of the FP-tree construction algorithm
    The algorithm above needs exactly 2 scans over all transactions of the database.
    The cost of inserting a transaction Trans into the tree is O(|Trans|), where |Trans| is the number of frequent items in Trans.

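  • Definition
    The conditional pattern base of a node consists of the prefix paths that lead from the root to each occurrence of that node in the FP-tree, each taken with the count of that occurrence.
    Conditional pattern base of node m:
    (f:2, c:2, a:2)
    (f:1, c:1, a:1, b:1)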
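
The conditional pattern base of m can be read off the tree by following m's node-links and walking up the parent pointers. The sketch below rebuilds the example tree first; all names are assumptions made for illustration.

```python
from collections import defaultdict

# Ordered frequent items of the five example transactions.
ORDERED = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_with_header(ordered_transactions):
    root, header = FPNode(None, None), defaultdict(list)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Prefix paths of all nodes for `item`, each with that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent.item is not None:      # walk up until the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))  # root-to-node order
    return base


_, header = build_with_header(ORDERED)
print(conditional_pattern_base("m", header))
```

For m this gives the two prefix paths f, c, a (count 2) and f, c, a, b (count 1), i.e. the two entries shown in the definition.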

  • Algorithm for mining frequent patterns using the FP-tree
    Procedure FP-growth(Tree, α) {
      if Tree contains a single path P
      then for each combination β of the nodes in the path P do
        generate pattern β ∪ α with support = min(support of the nodes in β);
      else for each ai in the header of Tree do {
        generate pattern β = ai ∪ α with support = ai.support;
        construct the conditional pattern base of β and then the conditional FP-tree Treeβ of β;
        if Treeβ ≠ ∅
        then call FP-growth(Treeβ, β) }
    }
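
A compact Python sketch of the procedure follows. It is a simplification, not the authors' implementation: it always takes the general branch (one recursive call per header item) and leaves out the single-path shortcut, and it breaks support ties alphabetically, so the trees it builds may be shaped differently from the ones in the example while the resulting patterns are the same. All names are assumptions.

```python
from collections import defaultdict

TRANSACTIONS = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
                list("bcksp"), list("afcelpmn")]


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return (header, order)."""
    counts = defaultdict(int)
    for items, n in weighted:
        for item in items:
            counts[item] += n
    # Frequent items, most frequent first; alphabetical tie-break (arbitrary).
    order = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return header, order


def fp_growth(weighted, min_support, suffix=frozenset()):
    """Return {frozenset: support} for every frequent pattern ending in `suffix`."""
    header, order = build_tree(weighted, min_support)
    patterns = {}
    for item in reversed(order):                 # least frequent item first
        beta = suffix | {item}
        patterns[beta] = sum(n.count for n in header[item])
        base = []                                # conditional pattern base of beta
        for n in header[item]:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:                                 # recurse on the conditional tree
            patterns.update(fp_growth(base, min_support, beta))
    return patterns


patterns = fp_growth([(t, 1) for t in TRANSACTIONS], min_support=3)
print(sorted("".join(sorted(p)) for p in patterns))
```

No candidate sets are generated: each recursive call only looks at the prefix paths of one item, which is the divide-and-conquer idea behind the procedure.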

  • Mining frequent patterns using the FP-tree (cont.)
    Call FP-growth(Tree, null)

    For node p: β = p ∪ null = p, output p:3
    Conditional pattern base: (f:2, c:2, a:2, m:2), (c:1, b:1)
    Conditional FP-tree built from this base: {(c:3)}|p
    Output: cp:3
    So node p yields the frequent patterns p:3 and cp:3
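
The step from p's conditional pattern base to its conditional FP-tree amounts to re-counting the items inside the base and keeping those that still reach min_support. A small sketch, with names chosen here for illustration:

```python
from collections import Counter

# Conditional pattern base of p: (prefix path, count) pairs.
P_BASE = [(["f", "c", "a", "m"], 2), (["c", "b"], 1)]
MIN_SUPPORT = 3


def frequent_in_base(base, min_support):
    """Support of each item inside the base, keeping only the frequent ones."""
    counts = Counter()
    for path, count in base:
        for item in path:
            counts[item] += count
    return {item: n for item, n in counts.items() if n >= min_support}


kept = frequent_in_base(P_BASE, MIN_SUPPORT)
patterns = {"p": 3}
for item, n in kept.items():
    patterns[item + "p"] = n     # extend the suffix p with each surviving item
print(kept, patterns)
```

Only c survives (2 + 1 = 3), so the conditional tree has the single node c:3 and the only new pattern is cp:3.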

  • Mining frequent patterns using the FP-tree (cont.)
    For node m: β = m ∪ null = m, output m:3
    Conditional pattern base of node m: (f:2, c:2, a:2), (f:1, c:1, a:1, b:1)
    Conditional FP-tree of m: {(f:3, c:3, a:3)}|m
    Call FP-growth(Treem, m)
    Because Treem contains a single path, node m yields the frequent patterns: {(m:3), (am:3), (cm:3), (fm:3), (cam:3), (fam:3), (fcm:3), (fcam:3)}
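
When the conditional tree is a single path, every combination of its nodes gives a pattern whose support is the smallest count in the combination. The sketch below enumerates these for m; the function name and the use of single-character item names are assumptions.

```python
from itertools import combinations


def single_path_patterns(path, suffix, suffix_support):
    """Enumerate patterns from a single-path tree.

    path: list of (item, count) pairs from the root down to the leaf.
    """
    patterns = {suffix: suffix_support}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = "".join(item for item, _ in combo)
            # Support of a combination = smallest count among its nodes.
            patterns[items + suffix] = min(count for _, count in combo)
    return patterns


# Single path of m's conditional FP-tree: f:3 -> c:3 -> a:3.
M_TREE = [("f", 3), ("c", 3), ("a", 3)]
patterns = single_path_patterns(M_TREE, "m", 3)
print(patterns)
```

A path of three nodes has 2^3 - 1 = 7 non-empty combinations, which together with m itself gives the eight patterns listed for node m.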

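  • Mining frequent patterns using the FP-tree (cont.)
    Table of results for all items:
    Item  Conditional pattern base                 Conditional FP-tree
    p     (f:2, c:2, a:2, m:2), (c:1, b:1)         {(c:3)}|p
    m     (f:2, c:2, a:2), (f:1, c:1, a:1, b:1)    {(f:3, c:3, a:3)}|m
    b     (f:1, c:1, a:1), (f:1), (c:1)            empty
    a     (f:3, c:3)                               {(f:3, c:3)}|a
    c     (f:3)                                    {(f:3)}|c
    f     empty                                    empty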

  • Mining frequent patterns using the FP-tree (cont.)

    More efficient than Apriori.
    Divide-and-conquer processing.
    Representing the frequent patterns with an FP-tree makes the data considerably smaller than its representation in the database.
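
The size reduction can be illustrated on the small example, with the caveat that five transactions say nothing about the scale of the actual experiments. The sketch counts FP-tree nodes by using the fact that the tree is a prefix tree: each node corresponds to one distinct prefix of some transaction's ordered frequent items. The names are assumptions.

```python
TRANSACTIONS = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
                list("bcksp"), list("afcelpmn")]
# Frequent items in descending order of support, as in the example.
L = list("fcabmp")


def count_tree_nodes(transactions, frequent_list):
    """Number of FP-tree nodes (root excluded) = number of distinct prefixes."""
    rank = {item: r for r, item in enumerate(frequent_list)}
    prefixes = set()
    for t in transactions:
        ordered = sorted((i for i in t if i in rank), key=rank.get)
        for end in range(1, len(ordered) + 1):
            prefixes.add(tuple(ordered[:end]))
    return len(prefixes)


n_nodes = count_tree_nodes(TRANSACTIONS, L)
n_items = sum(len(t) for t in TRANSACTIONS)
print(n_nodes, n_items)
```

Here 33 item occurrences in the database shrink to 11 tree nodes, because infrequent items are dropped and shared prefixes are stored once.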

  • Comparison of FP-growth and Apriori

  • Open issues under discussion

    Building FP-trees for projected databases.

    Organizing the storage of the FP-tree on disk.

    Updating the tree as it grows in size.

  • Building FP-trees for projected databases
    The FP-tree cannot be built in main memory when the database is large.

    First partition the database into projected databases, then build an FP-tree and mine it within each projected database.
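
The slides do not define "projected database", so the sketch below rests on one common reading, stated here as an assumption: the projected database of an item holds, for every transaction containing that item, the frequent items that precede it in the order L. Each projected database is smaller than the original and can be mined on its own.

```python
from collections import defaultdict

# Ordered frequent items of the five example transactions.
ORDERED = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]


def project(ordered_transactions):
    """item -> list of prefixes that precede the item in each transaction."""
    projected = defaultdict(list)
    for items in ordered_transactions:
        for pos in range(1, len(items)):
            projected[items[pos]].append(items[:pos])
    return projected


projected = project(ORDERED)
print(projected["p"])
print(projected["m"])
```

Under this reading, the projected database of p contains the same prefixes as p's conditional pattern base, only written out once per transaction instead of merged with counts.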

  • Organizing the storage of the FP-tree on disk
    Store the FP-tree on hard disk.
    Use a B+Tree structure.

  • Updating the tree as it grows in size
    Some information is lost.

    The tree may have to be rebuilt.
