000000208019.pdf


  • 8/8/2019 000000208019.pdf

    1/125

MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF TECHNOLOGY

-----------------------------------------------------

MASTER OF SCIENCE THESIS

ONLINE ANALYTICAL PROCESSING METHODS APPLIED TO BUILDING
A DATA-DRIVEN DECISION SUPPORT SYSTEM

MAJOR: INFORMATION PROCESSING AND COMMUNICATION

TRAN DINH CHIEN

SCIENTIFIC SUPERVISOR: Prof. Dr. NGUYEN THUC HAI

HANOI 2006


Master's graduation thesis, Information Processing and Communication major, class of 2004 - 2006

TABLE OF CONTENTS

List of figures
List of terms and abbreviations
Preface
Chapter I. Data mining and online analytical processing
1.1. Introduction to data mining methods
1.2. Online analytical processing (OLAP)
1.3. Principles of OLAP
1.3.1. Multidimensional view
1.3.2. Transparency
1.3.3. Accessibility
1.3.4. Consistent reporting performance
1.3.5. Client/Server architecture
1.3.6. Generic dimensionality
1.3.7. Working with matrices
1.3.8. Multi-user support
1.3.9. Unrestricted cross-dimensional operations
1.3.10. Data-centered manipulation
1.3.11. Flexible reporting
1.3.12. Unlimited dimensions and aggregation levels
Chapter II. Data warehouse
2.1. Components of a data warehouse
2.1.1. Metadata
2.1.2. Data sources
2.1.3. Online transaction processing (OLTP) systems
2.1.3.1. Characteristics of OLTP systems
2.1.3.2. Tools for collecting, cleaning, and transforming source data
2.1.4. The data warehouse database
2.1.5. The data warehouse
2.1.5.1. Definition
2.1.5.2. Characteristics of data in the data warehouse
2.1.6. Subject-oriented data warehouse (Datamart)
2.2. Using the data warehouse
2.3. Methods for building a data warehouse
2.4. Database design for the data warehouse
2.4.1. Star schema
2.4.2. Snowflake schema
2.4.3 Combined schema
2.4.4. Issues in star schema design
2.4.4.1. Indexing
2.4.4.2. Level indicators
2.4.5. Design factors to be considered
2.5. Data warehouse administration


Chapter III. Multidimensional approach and analysis in online analytical processing
3.1. The multidimensional approach
3.2. Multidimensional analysis
3.3. OLAP cube architecture
3.3.1. Introduction to cube architecture
3.3.2. Cube
3.3.2.1. Defining cubes
3.3.2.2. Processing cubes
3.3.2.3. Virtual cube
3.3.3 Dimension
3.3.3.1. Defining dimensions
3.3.3.2. Hierarchical dimensions
3.3.3.3. Dimension hierarchies
3.3.3.4. Roll_up and Drill_down based on dimension hierarchies
3.3.3.5. Virtual dimensions
3.3.4. Measures
3.3.5. Partitions
3.3.6. Data storage methods (MOLAP, ROLAP, HOLAP)
3.3.6.1. MOLAP (Multidimensional OLAP)
3.3.6.2. ROLAP (Relational OLAP)
3.3.6.3. HOLAP (Hybrid OLAP)
3.4. Algorithm for indexing views in online analytical processing of a data warehouse
3.4.1. Basic concepts
3.4.1.1. Subcubes
3.4.1.2. Queries
3.4.1.3. Indexes
3.4.1.4. Computation and dependency relations
3.4.2. View and index selection algorithm
3.4.2.1. Estimating the size of each view
3.4.2.2. Estimating the size of an index
3.4.2.3. Problem definition
3.4.2.4. Solving the problem
3.3.5 Conclusion
Chapter IV. Data-driven decision support systems
4.1. Decision support systems
4.1.1. Introduction
4.1.2. Decision support systems
4.1.3. Classification of decision support systems
4.2. Data-driven decision support systems
4.2.1. The data warehouse and OLAP approach
4.2.2. Data-driven decision support based on the data warehouse and OLAP
4.2.3. The data-driven decision support process for a specific problem
4.3. Building an information structure to support decision making


4.3.1. Role of the information structure
4.3.2. Influencing factors
4.3.2.1. Information requirements
4.3.2.2. Degree of integration
4.3.3. Information organization model
4.3.3.1. Information requirements and information system capabilities
4.3.3.2. Degree of system integration
4.3.4. Conclusion
4.4. Microsoft decision support services
4.4.1. Microsoft data warehousing
4.4.1.1. Microsoft Data Warehousing Framework
4.4.1.2. Data complexity
4.4.1.3. Business benefits
4.4.1.4. Data model
4.4.1.5. Storage modes
4.4.2. Architecture of Microsoft decision support services
4.4.3. Issues in deploying Microsoft DSS
4.4.3.1. Building the OLAP data model for Microsoft DSS
4.4.3.2. Flexible storage
4.4.3.3. Delivering information to users
4.4.3.4. Capabilities of OLAP tools
4.5. Direction for further research: distributed decision support systems
Chapter V. Building a data-driven decision support system with Analysis Services
5.1. System objectives
5.2. System requirements
5.3. Main system functions
5.3.1. Creating the multidimensional database
5.3.2. Analyzing and displaying data
5.4. System overview
5.4.1. Starting Analysis Manager
5.4.2. Setting up the database and data source (Database & Data Source)
5.4.3. Creating a cube
5.4.4. Storing and processing a cube
5.4.5. Virtual cubes for improved processing and security
5.4.6. Creating a virtual cube
5.4.7. Displaying cube data
5.4.8. Illustrative example
Conclusion
References
Thesis summary


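List of figures

Figure 1.1. Data warehouse and OLAP
Figure 2.1. Data warehouse model
Figure 2.2. Star and snowflake schemas
Figure 3.1. Multidimensional data model
Figure 3.2. Cube data model
Figure 3.3. Star cube schema
Figure 3.4. Snowflake cube schema
Figure 3.5. Cube modeling diagram
Figure 3.6. Hierarchy of the Sn_phm (product) dimension
Figure 3.7. Symmetric hierarchy tree
Figure 3.8. Roll_up and Drill_down along a dimension hierarchy
Figure 4.1. Classification of management information systems
Figure 4.2. Data warehouse and OLAP system
Figure 4.3. The data-driven decision support process for a specific problem
Figure 4.4. Requirements/Capabilities matrix
Figure 5.1. Architecture of the data-driven decision support system
Figure 5.2. Functions of the data-driven decision support system
Figure 5.3. Creating a DataSource for the cubes in the Database
Figure 5.4. Selecting the Fact table
Figure 5.5. Selecting measures
Figure 5.6. Creating dimensions
Figure 5.7. Selecting the levels of a dimension
Figure 5.8. Selecting the storage type
Figure 5.9. Speeding up execution
Figure 5.10. Processing the cube
Figure 5.11. Selecting cubes for the virtual cube
Figure 5.12. Selecting measures for the virtual cube
Figure 5.13. Selecting dimensions for the virtual cube
Figure 5.14. Displaying cube data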


List of terms and abbreviations

CSDL: database (Vietnamese abbreviation)
DBA: DataBase Administrator
DM: DataMart (subject-oriented data warehouse)
DSS: Decision Support System
HOLAP: Hybrid OLAP
ETL: Extract, Transform, Load
LS: Legacy System (existing system)
MIS: Management Information System
MOLAP: Multidimensional OLAP
MSS: Management Support System
OLAP: On-Line Analytical Processing
OLTP: On-Line Transaction Processing
RDBMS: Relational DataBase Management System
ROLAP: Relational OLAP
SA: Subject Area


Preface

Today's production and business activities demand quick, immediate responses to continual change. Managers must therefore make many sound decisions at once, and quickly, and these decisions significantly affect the direction and competitiveness of the enterprise. Decision support has thus become essential. Data from many different sources must be collected, consolidated, and analyzed quickly and efficiently before prompt and appropriate decisions can be made. This creates the need for information systems that know how to select and analyze data on behalf of their users.

Many software products today give users querying and reporting capabilities, relational database management systems in particular. However, a relational database, with its two-dimensional structure of rows and columns, is not designed to provide multidimensional views of the input data for complex analyses. With such systems it is difficult and inconvenient to organize multidimensional data into two-dimensional tables, and analytical data cannot be deployed at large scale. The analysis tools that produce decision data are not powerful, convenient, flexible, or fast, and above all they are not easy to use for managers, the people who actually make the decisions.

It is therefore necessary to build a new kind of system that can organize multidimensional data and analyze it flexibly, answering multidimensional queries easily and quickly in support of managers' decision making.

Purpose of the thesis:

This thesis studies the construction of a data-driven decision support system based on the methodology of online analytical processing (OLAP). It concentrates on two main tasks: organizing a multidimensional database, and analyzing and displaying data to support decision making.

A decision support system built this way can help managers set up an OLAP model for their own application when organizing a multidimensional database, and can let them easily adjust the analysis and search for information along different aspects of the data, so that they gather as much of the necessary data as possible and reach the best decisions quickly.

Traditional decision support systems are usually built to produce an optimal solution to one specific problem within a narrow application scope. A data-driven decision support system instead aims to help users exploit the latent potential of a large volume of data, obtaining consolidated information on different aspects of the data from which correct decisions can be made quickly. Because of this, its range of application is broad: it can support decisions for different problems in different fields.

Structure of the thesis:

The thesis is presented in 5 chapters:

Chapter 1: Introduces data mining methods and the fundamentals of online analytical processing.

Chapter 2: Presents the general theory of data warehouses and data warehouse models, and the methods for building a data warehouse and designing its database.

Chapter 3: Presents the multidimensional approach and multidimensional analysis in online analytical processing.


Chapter 4: Introduces data-driven decision support systems and their two main components, the data warehouse and online analytical processing; describes the data-driven decision support process; builds an information structure to support decision making; introduces Microsoft's decision support services; and outlines directions for further research.

Chapter 5: Builds a system that creates a multidimensional database and analyzes and displays its data.


Chapter I. Data mining and online analytical processing

1.1. Introduction to data mining methods

Data mining is the process of discovering new correlations, patterns, and trends by examining large amounts of data stored in repositories, using pattern recognition technologies together with statistical and mathematical techniques. It can be understood as drilling into data in depth and aggregating it in the opposite direction: digging through the data and examining it from many angles to find relationships among data elements and to uncover the trends, patterns, and past experience hidden in the data warehouse. It is therefore well suited to analyzing data in support of management and decision making.

Most data mining methods draw on fields such as machine learning and statistics, along with other tools. Commonly used techniques include neural networks, genetic algorithms, and online analytical processing (OLAP).

Online analytical processing is precisely the use of a data warehouse for decision support. The idea of modeling dimensions in data can be generalized: a table with n attributes can be viewed as an n-dimensional space. Managers often ask questions that are best answered through multidimensional analysis. Such information is not easy to analyze when the table is represented in two dimensions, and a standard relational database does not handle this task well. In such cases OLAP is the appropriate choice.

There is also a difference between OLAP tools and data mining: OLAP tools cannot learn, they do not create new knowledge, and they cannot find new solutions. There is thus a fundamental difference between multidimensional knowledge and the kind of knowledge that can be extracted from a database through data mining.
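The remark that a table with n attributes can be viewed as an n-dimensional space can be illustrated with a minimal Python sketch. The sample rows, the three dimensions (product, region, quarter), and the helper names `build_cube` and `slice_cube` are assumptions made for illustration only; they do not come from the thesis.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales_amount).
rows = [
    ("TV", "North", "Q1", 120.0),
    ("TV", "South", "Q1", 80.0),
    ("Radio", "North", "Q1", 30.0),
    ("TV", "North", "Q2", 150.0),
    ("TV", "North", "Q1", 20.0),
]


def build_cube(rows):
    """Map each (product, region, quarter) coordinate to its summed measure."""
    cube = defaultdict(float)
    for *coords, amount in rows:
        cube[tuple(coords)] += amount
    return dict(cube)


def slice_cube(cube, axis, value):
    """Keep only the cells whose coordinate on `axis` equals `value`."""
    return {coords: v for coords, v in cube.items() if coords[axis] == value}


cube = build_cube(rows)
north = slice_cube(cube, axis=1, value="North")
```

Each row becomes a point in a three-dimensional space, and fixing one coordinate (here the region) yields a lower-dimensional slice, which is the kind of question a flat two-dimensional table answers less directly.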

Figure 1.1. Data warehouse and OLAP

1.2. Online analytical processing (OLAP)

OLAP is an intelligent capability in business processing that makes information easy to understand. It lets end users gain insight through fast, interactive access to a variety of views of information that has been transformed from raw data to reflect its multidimensional nature.

OLAP is a data analysis technology that does the following:

- Presents a logical, multidimensional view of the data in the data warehouse. This view is entirely independent of how the data is stored (it may be stored in a multidimensional data store or in a relational one).

- Usually involves interactive analytical queries on the data. The interaction is often complex, involving drilling down to more detailed levels of data or rolling up to higher, summarized or consolidated levels.

- Provides the ability to build analytical models, including computing ratios,


exist within an open system architecture, allowing the analysis tools to be embedded wherever the user wishes without any adverse effect on the functions of the tool on the host machine.

1.3.3. Accessibility

An OLAP tool must map its own logical schema onto heterogeneous physical data stores, access the data, and perform every conversion needed to present a simple, coherent, and consistent view to the user. The physical data of such a system becomes transparent to the user and is the concern of the tool alone.

1.3.4. Consistent reporting performance

Reporting performance tends to drop as the number of dimensions grows; this principle requires that the drop not be significant.

1.3.5. Client/Server architecture

The server component of an OLAP tool must be intelligent enough that many clients can connect to it easily and can be integrated with it programmatically. An intelligent server must be able to map and consolidate data from disparate logical and physical databases. This is essential for ensuring transparency and for building a common conceptual, logical, and physical schema.

1.3.6. Generic dimensionality

Every data dimension must be equivalent in both its structure and its operational capabilities. Usually a single common structure exists for all dimensions. Any function applied to one dimension can also be applied to the others.


1.3.7. Working with matrices

The physical structure of the OLAP server must adapt to the specific analytical model that is created and loaded, so that matrix handling is optimal. When working with matrices, the OLAP server must be able to infer and choose the most efficient way to store the data. Physical access methods also change frequently and offer different mechanisms such as direct calculation, binary trees, hashing, or the best combination of these techniques.
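As a rough illustration of choosing an efficient storage method, the sketch below stores only the populated cells of a cube in a hash table (a Python `dict`), one of the access mechanisms named above. It assumes the matrices in question are sparse, which the text does not state explicitly, and the class `SparseCube` and its methods are hypothetical names, not part of any OLAP product.

```python
def dense_cell_count(dim_sizes):
    """Number of cells a fully materialized (dense) array would need."""
    count = 1
    for size in dim_sizes:
        count *= size
    return count


class SparseCube:
    """Stores only non-empty cells, keyed by their coordinate tuple."""

    def __init__(self, dim_sizes):
        self.dim_sizes = tuple(dim_sizes)
        self._cells = {}  # hash-based access: coordinates -> value

    def set(self, coords, value):
        coords = tuple(coords)
        if len(coords) != len(self.dim_sizes):
            raise ValueError("wrong number of coordinates")
        for c, size in zip(coords, self.dim_sizes):
            if not 0 <= c < size:
                raise IndexError("coordinate out of range")
        self._cells[coords] = value

    def get(self, coords, default=0.0):
        return self._cells.get(tuple(coords), default)

    def density(self):
        """Fraction of the dense cell space that is actually populated."""
        return len(self._cells) / dense_cell_count(self.dim_sizes)


sparse = SparseCube((100, 50, 12))
sparse.set((0, 0, 0), 5.0)
sparse.set((10, 3, 6), 2.5)
sparse.set((99, 49, 11), 1.0)
```

A server could, for example, compare `density()` against a threshold to decide between a dense array (direct calculation of the cell address) and a hashed structure like this one; that decision rule is an assumption of this sketch.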

1.3.8. Multi-user support

OLAP tools must provide concurrent access (retrieval and update), integrity, and security, so that users can work at the same time on the same analytical model or create different models from the same data.

1.3.9. Unrestricted cross-dimensional operations

In multidimensional data analysis, all dimensions are created equal and play the same role. OLAP tools manage the calculations associated with the dimensions and do not require the user to define those operations. Where a calculation requires formulas to be defined in some language, that language must allow calculation and manipulation across any number of dimensions, unrestricted by the relationships between cells and regardless of the number of common data attributes each cell has.

1.3.10. Data-centered manipulation

Operations such as reorienting the consolidation path or drilling down along dimensions or rows are carried out by direct action on the cells of the analytical model, without requiring menus or interruptions of the user interface. The dimensions defined in the analytical model contain all the information the user needs to perform these inherent actions.

1.3.11. Flexible reporting

With an OLAP server and its tools, an end user can manipulate, analyze, synchronize, and examine data in any way desired, including creating logical groupings or placing rows, columns, and cells next to other cells. Reporting facilities must likewise be flexible and present synchronized information in whatever way the user wants to display it.

1.3.12. Unlimited dimensions and aggregation levels

An OLAP server should accommodate at least 15 dimensions in a typical analytical model. Each dimension allows an unlimited number of user-defined aggregation and consolidation levels, together with the way those levels are built.
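User-defined aggregation levels on a dimension can be sketched as follows. The time hierarchy (day, month, year), the sample facts, and the names `LEVELS` and `roll_up` are assumptions chosen for illustration; the thesis does not prescribe this representation.

```python
from collections import defaultdict

# Hypothetical levels of a time dimension, each defined by a function that
# maps a day key such as "2005-03-14" to the key of the coarser level.
LEVELS = {
    "day": lambda day: day,
    "month": lambda day: day[:7],   # "2005-03"
    "year": lambda day: day[:4],    # "2005"
}

# Hypothetical facts: (day, sales_amount).
facts = [
    ("2005-03-14", 10.0),
    ("2005-03-20", 5.0),
    ("2005-04-01", 7.0),
    ("2006-01-02", 3.0),
]


def roll_up(facts, level):
    """Aggregate the measure to the requested level of the time dimension."""
    to_level = LEVELS[level]
    totals = defaultdict(float)
    for day, amount in facts:
        totals[to_level(day)] += amount
    return dict(totals)


by_month = roll_up(facts, "month")
by_year = roll_up(facts, "year")
```

Adding a new level (for example a quarter) only requires one more entry in `LEVELS`, which mirrors the idea that the number of levels per dimension is not fixed in advance. Moving from `by_year` back to `by_month` corresponds to drilling down.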


Chapter II. Data warehouse

Most organizations today must cope with changing markets. To make a sound decision, one must first be able to access all kinds of information quickly. An organization that wants to decide well must also study its historical data and analyze it to identify every possible trend. As information technology develops and data is concentrated in enormous databases, access to all of this information becomes a necessity. The most effective way to meet this need is to organize a data warehouse.

2.1. Components of a data warehouse

The components that make up a data warehouse provide a basic framework for discussing the architecture, structure, and strategies of the data warehouse.

Figure 2.1. Data warehouse model


2.1.1. Metadata

When a data warehouse is organized, not only end users but also administrators need access to all the information in its tables, including objects and their attributes. They therefore want to know:

- Where can the data be found?
- What kinds of information and data exist?
- What type is the data, and what format does it have?
- How is the data in different databases related?
- Where does the data come from, and who is responsible for it?

This gives rise to another kind of database, called Metadata, which describes the structure and content of the main database. In a complex database environment, suitable Metadata is indispensable because it defines the structure of both the operational databases and the data warehouse. A recurring problem is how to communicate to users what information the data warehouse holds and how it can be accessed. Metadata is the means by which users and applications reach the information stored in the data warehouse. It can define all data elements and their attributes.

Metadata should be collected while the data warehouse is being designed and built. It must be available to all users of the data warehouse to guide them in using it. Supporting tools are also set up and need to be evaluated.
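The questions listed above (where the data is, what type it has, how it relates to other data, where it comes from, and who owns it) suggest the fields a Metadata record might carry. The following Python sketch is one possible shape; the class names `MetadataEntry` and `MetadataCatalog`, the field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    name: str            # what data exists
    location: str        # where it can be found in the warehouse
    data_type: str       # its type or format
    source_system: str   # where it was taken from
    owner: str           # who is responsible for it
    related_to: list = field(default_factory=list)  # links to other elements


class MetadataCatalog:
    """A minimal in-memory catalog answering the questions listed above."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find(self, name):
        return self._entries[name]

    def owned_by(self, owner):
        return sorted(e.name for e in self._entries.values() if e.owner == owner)


catalog = MetadataCatalog()
catalog.register(MetadataEntry(
    "sales.amount", "warehouse.fact_sales", "decimal", "order_entry", "finance"))
catalog.register(MetadataEntry(
    "customer.name", "warehouse.dim_customer", "text", "crm", "sales",
    related_to=["sales.amount"]))
```

A real warehouse would persist such records in a database rather than a dictionary; the in-memory form is used here only to keep the sketch self-contained.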

2.1.2. Data sources

Data sources include systems both inside and outside an organization and are very diverse. The internal systems are regarded as source systems or legacy systems.

- Legacy system (LS): an operational system. It was developed in the past with the technologies then available and still meets current needs. Such systems may have been in operation for many years and may have little or no documentation.

- External data: data that does not reside in the organization's operational systems but is requested by end users.

Legacy systems were developed to serve individual projects. Applications were developed together with their data, and that data in turn serves many different needs. The same data item may carry different names or be expressed in different systems of measurement. As a result, the data sources must be evaluated and their definitions entered into the Metadata, with the following aims:

- Identify the different sources, file structures, and platforms.
- Understand what data exists in the current source systems, how it is defined, and what rules apply to it.
- Detect overlaps in information between different systems.
- Decide which system holds the best data. Each system must be evaluated to determine which one has the clearer and more accurate data.

2.1.3. Online transaction processing (OLTP) systems

Data that arises from day-to-day activities and is collected and processed to serve an organization's specific work is usually called operational data, and the collection and processing of such data is called online transaction processing (OLTP).

Data in operational databases comes from many different sources, so it is prone to noise and heterogeneity, which leaves it unclean and lacking integrity. Data checking and cleaning must therefore be carried out at this stage to ensure the integrity and correctness of the data that will later be used to build the data warehouse and support decision making.

2.1.3.1. Characteristics of OLTP systems

- Support large numbers of concurrent users who add and modify data.
- Represent the continually changing state of the organization without keeping its history.
- Hold large volumes of data, including general data used to control operations.
- Are tuned for fast response in execution.
- Provide the technology infrastructure that supports an organization's day-to-day operations.

Because of these characteristics, using OLTP for online analysis usually runs into the following difficulties:

- Requests to analyze and summarize large volumes of data affect the capacity of the system.
- System performance when serving complex analytical requests may be slow or unstable, giving users inadequate support for online analysis.
- Frequent data changes undermine the reliability of analytical information.
- Security becomes more complex when online analysis is combined with online transaction processing.

A data warehouse, whose task is to organize data for analysis, resolves these difficulties. In particular, a data warehouse can:

- Combine data from heterogeneous data sources into a single homogeneous structure.
- Organize data in simple structures that serve analytical requests efficiently rather than transaction processing.
- Contain transformed data that is valid, consistent, and rationalized for analysis.
- Provide stable data.
- Be updated periodically with additional data rather than through frequent transactions.
- Provide a database organized for OLAP rather than for OLTP.

    2.1.3.2. Tools for collecting, cleaning and transforming source data

    An important requirement is to take data refined from the operational systems and put it into a format suitable for information applications. These tools perform all the conversions, summarizations, key changes, structural changes and other work needed to turn disparate data into information that decision-support tools can use. They generate the programs and control statements (Cobol, JCL, Unix scripts and SQL data definition language) needed to move data into the data warehouse from many different operational systems. They also maintain the Metadata. The main functions include:

    - Removing unwanted data from operational databases.
    - Converting to common data names and definitions.
    - Calculating totals and derived data.
    - Establishing defaults for missing data.
    - Accommodating changes in the source data definitions.
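
    The functions listed above can be illustrated with a minimal sketch. It is not taken from any particular tool: the column names, the name mapping, the defaults and the helper functions are all hypothetical and exist only to show the kind of work such a tool automates.

```python
# Hypothetical mapping from source-specific column names to common names.
COMMON_NAMES = {
    "cust_nm": "customer_name",
    "CUSTNAME": "customer_name",
    "amt": "amount",
    "AMOUNT": "amount",
}
# Hypothetical defaults applied when a value is missing.
DEFAULTS = {"customer_name": "UNKNOWN", "amount": 0.0}
# Hypothetical operational columns that the warehouse does not need.
UNWANTED = {"internal_flag"}


def clean_record(raw: dict) -> dict:
    """Drop unwanted columns, rename to common names, fill in defaults."""
    cleaned = {}
    for name, value in raw.items():
        if name in UNWANTED:
            continue
        cleaned[COMMON_NAMES.get(name, name)] = value
    for name, default in DEFAULTS.items():
        if cleaned.get(name) is None:
            cleaned[name] = default
    return cleaned


def total_amount(records) -> float:
    """Compute a derived total over already cleaned records."""
    return sum(record["amount"] for record in records)


# Two rows coming from two different (hypothetical) operational systems.
rows = [
    {"cust_nm": "An", "amt": 120.0, "internal_flag": "x"},
    {"CUSTNAME": None, "AMOUNT": 80.0},
]
cleaned_rows = [clean_record(row) for row in rows]
```

    After cleaning, both rows share the same column names, the missing customer name is replaced by the default, and a total can be computed over the unified records.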

    These tools can save a considerable amount of time and effort. However, many of the available tools are useful only for refining simple data, so it is necessary to develop customizable refinement procedures. The stages involved are:

    a. Data extraction

    Data extraction is the process of pulling previously identified data out of the operational systems and external data sources. Extraction from a source can be done by reading the source directly, by reading an image of the source, or by reading the Log.

    A number of tools and utilities support the extraction process. The issues surrounding data extraction include the time frame in which the data is extracted and the efficiency of the extraction.

    Whatever the extraction method, Metadata always plays an important role in the process. Typical Metadata includes the definitions of the source system, the physical formats, and the method and schedule of the extraction. Metadata can be obtained with tools or by hand.

    Changes made to the data in the LS system can be detected by reading the Log. These changes are the insert, update and delete actions, together with the information about the columns or rows concerned. All the changes are recorded and then applied in the same order in which they were made in the operational system.
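
    Replaying logged changes in their original order can be sketched as follows. The log format used here (a list of `(action, key, values)` tuples) and the function `apply_log` are assumptions made for illustration; real database logs have vendor-specific formats.

```python
def apply_log(table: dict, log: list) -> dict:
    """Replay insert/update/delete entries, in order, onto a copy of `table`.

    `table` maps a row key to a dict of column values. Each log entry is a
    hypothetical (action, key, values) tuple.
    """
    result = {key: dict(row) for key, row in table.items()}
    for action, key, values in log:
        if action == "insert":
            result[key] = dict(values)
        elif action == "update":
            result[key].update(values)
        elif action == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return result


# State of the warehouse copy before the log is applied.
warehouse_copy = {1: {"name": "An", "city": "Ha Noi"}}
# Changes recorded by the operational system, oldest first.
change_log = [
    ("insert", 2, {"name": "Binh", "city": "Hue"}),
    ("update", 1, {"city": "Da Nang"}),
    ("delete", 2, None),
]
synced = apply_log(warehouse_copy, change_log)
```

    Order matters: applying the delete of row 2 before its insert would leave a row in the copy that no longer exists in the operational system.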

    b. Data refinement

    Once extracted, the data is refined through cleaning, transforming and integration. The tools can work from a set of predefined parameters, from fuzzy logic, or by applying intelligent algorithms. Heuristic algorithms with extensible rule sets that imitate human reasoning make the investigation faster.

    Before data can be transformed and integrated, a system of measurement should be established and the definitions and semantics standardized. The purpose of transformation and integration is to turn data into information and to make it easier for users to understand and use.

    Data definitions must be accurate, complete, reliable and valid. If data has been loaded into the warehouse incorrectly, it has to be reviewed afterwards, and this is largely an organizational matter. The questions to ask before changing existing data are: Are the changes legitimate and in the proper form? Can these changes be accommodated? Are the changes permanent? If the answer to all three questions is yes, the change can be made.

    2.1.4. The data warehouse database

    The central database is a cornerstone of the data warehouse environment. It is almost always implemented on relational database management system (RDBMS) technology. However, implementing a data warehouse on a traditional RDBMS is constrained by the fact that traditional RDBMS implementations are optimized for transactional database processing. The inherent attributes of a data warehouse, such as very large size, ad hoc query processing, and the need to create flexible user views involving aggregation, multi-table joins and drill-down, have become the drivers for different approaches to the warehouse database. These approaches include:

    - Parallel relational database designs.
    - A newer approach that speeds up a traditional RDBMS by using index structures to bypass scans of the relational tables.
    - Multidimensional databases, built on their own database technology or implemented on top of a familiar RDBMS. Multidimensional databases are designed to overcome the limitations that the relational data model imposes on the data warehouse. This approach is closely tied to online analytical processing tools, which act as the counterpart of multidimensional data stores. These tools form a group that covers querying, reporting, analysis and data mining.

    2.1.5. Data warehouse

    2.1.5.1. Definition

    A data warehouse is a collection of integrated, subject-oriented databases designed to support the decision-support function, in which every unit of data is related to a specific period of time.[1]

    A data warehouse is usually very large, hundreds of gigabytes or even terabytes of data that is organized, stored and analyzed to provide information services related to the needs of an organization. The data warehouse serves analysis whose results have high information value. Information systems that collect and process this kind of data are also called online analytical processing (OLAP) systems.

    A data warehouse is often used as the foundation of a decision-support system. It is designed to overcome the problems that arise when an organization tries to carry out strategic analysis on the same database that is used for online transaction processing.

    2.1.5.2. Characteristics of the data in a data warehouse

    A data warehouse is a collection of data with the following properties:

    a. The data is integrated

    A data warehouse is an enterprise-wide view of information that unifies different views into a single view of a subject. For example, a traditional OLTP system is built around one business area. A sales system and a marketing system may share one form of customer information, while financial matters need a different view. A data warehouse holds a complete view of a customer, made up of the different pieces of data from finance through to marketing.

    Integration means that the data gathered in the warehouse is collected from many sources and merged into a unified whole.

    b. The data is time-variant and historical

    A data warehouse contains a large volume of historical data. The data is stored as a series of snapshots. Each snapshot reflects the values of the data at a given moment and represents a view of a subject area over a period. This makes it possible to reconstruct history and to compare different periods accurately. The time element forms part of the key, which guarantees uniqueness and gives the data its time characteristic.
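
    The role of time as part of the key can be sketched as follows. The `snapshots` store, the functions `take_snapshot` and `history`, and the sample balances are hypothetical and only illustrate the idea of one stored value per subject and per point in time.

```python
from datetime import date

# Snapshot store: the key combines the business key with the snapshot date,
# so the same customer can appear once per point in time.
snapshots = {}  # (customer_id, snapshot_date) -> balance


def take_snapshot(customer_id, snapshot_date, balance):
    """Store one value per (customer, date); existing history is never overwritten."""
    key = (customer_id, snapshot_date)
    if key in snapshots:
        raise KeyError(f"snapshot already stored for {key}")
    snapshots[key] = balance


def history(customer_id):
    """Return the (date, balance) pairs of one customer in chronological order."""
    return sorted((d, v) for (cid, d), v in snapshots.items() if cid == customer_id)


take_snapshot(7, date(2005, 12, 31), 100)
take_snapshot(7, date(2006, 3, 31), 140)
take_snapshot(8, date(2006, 3, 31), 50)

# Comparing two periods for the same customer.
change = snapshots[(7, date(2006, 3, 31))] - snapshots[(7, date(2005, 12, 31))]
```

    Because the date is part of the key, loading a new period adds rows instead of replacing old ones, which is what allows periods to be compared.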

    c. The data is read-only

    Data in the warehouse is read-only: users can examine it but cannot modify it.

    d. The data is non-volatile

    Information is loaded into the warehouse after the data in the operational system is considered too old. Non-volatility means that data is kept in the warehouse for a long time. Although new data is added, the old data in the warehouse is not deleted. This makes it possible to provide information about a long period of time and to supply enough data for analytical and forecasting business models.

    e. The data is both summarized and detailed

    Detailed data is the lowest level of information stored in the warehouse. Operational data is the lowest level of information for an organization. Purely operational data is not stored in the warehouse. Summarized data is accumulated over many different periods.

    2.1.6. Subject data warehouse (Datamart)

    A subject data warehouse (Datamart - DM) is a database with the same characteristics as a data warehouse, but smaller in scale and holding data about a single field or specialty. Datamarts can be formed from a subset of the data in the warehouse, or they can be built independently and later connected and integrated with one another to form the data warehouse.

    A Datamart is a secondary data store made up of integrated data from the warehouse. It is directed at one part of the data, usually called a subject area (SA), and is created for one group of users. The data in a Datamart gives information about a specific subject, not about all the business activities taking place in an organization. The most common form of Datamart is a physically separate data store, usually kept on its own server in a local network and serving a particular group of people. Sometimes a Datamart with OLAP technology creates special star-shaped relations or hypercubes of data for analysis by a group of people who share an interest in one range of data. Datamarts can be divided into two types: independent and dependent.

    A dependent Datamart contains data taken from the data warehouse. This data is filtered, refined and integrated again at a higher level to serve a particular subject.

    An independent Datamart, unlike a dependent one, is built before the data warehouse, and its data is taken from the operational data sources. This method is simpler and cheaper, but it has weaknesses in return. Each independent Datamart integrates data in its own way, so data from different Datamarts is hard to reconcile.

    Datamarts raise two issues: stability when an initially small Datamart grows quickly in many dimensions, and data integration. When designing a Datamart, attention must therefore be paid to the stability of the system, the consistency of the data and manageability.

    2.2. Using the data warehouse

    A data warehouse is used in three main ways:

    - In the traditional way, the data warehouse is used to retrieve information with query and reporting tools. Thanks to the extraction, summarization and conversion of raw data into stable, high-quality data, the warehouse improves these traditional information delivery techniques (query and reporting). By creating a hidden layer between the user and the databases, the input data for these techniques is placed in a single source. This consolidation eliminates many of the errors caused by having to collect and present information from a great many sources. It also reduces the delay caused by fetching data that is fragmented across different databases, and it spares users from writing complex statements. However, ...

    2.3. Method for building a data warehouse

    Building a data warehouse is both a work process and an architecture. Its purpose is to select, convert, move, preserve the integrity of, integrate and clean data, and to bring data from many operational sources into a database management system that serves decision-making processes. The architecture of a data warehouse offers a great deal of flexibility and scalability for existing applications as well as for future ones. A data warehouse consists of the following essential components:

    - Operational data sources, ODS (Operational Data Sources).
    - Data conversion and extraction (Data Conversion and Extraction).
    - Data summarization and enrichment (Data Summarization & Data Enrichment).
    - The database management system of the warehouse (Database Management System - DBMS).
    - Metadata management.
    - Access and analysis tools.

    The construction of a data warehouse can begin with building the Datamarts: once the Datamarts are built, they are connected and integrated with one another to form the warehouse. In this approach the Datamart is the model and the first step in building the warehouse. In the second approach, the data warehouse is built first and the Datamarts are created from it. Each method has its own advantages and difficulties, and depending on the circumstances one method or a combination is chosen.

    The analysis, design and construction of a data warehouse can be divided into phases, each with its own steps:

    - Survey phase
      Step 1: Define the strategy and draw up the plan
      Step 2: Survey and assess the current state of the system
    - Analysis and design phase
      Step 3: Analyze and design the system and build a prototype
    - System construction and development phase
      Step 4: Deploy and build the system
      Step 5: Operate and maintain the system

    2.4. Database design for the data warehouse

    Some of the methods and tools that work well for building operational systems are almost entirely unsuited to the different requirements of a data warehouse. This is especially true of database management systems. A traditional OLTP system is simply not designed for the requirements of data warehousing. Data warehouse projects used to be forced to choose between a data model and corresponding schema that is intuitive for analysis but performs poorly, and a model and schema that performs better but is not well suited to analysis. As data warehousing continued to develop, new approaches to schema design that fit analysis better emerged, and they have been essential to its success. One schema that has been widely adopted for data warehousing is the star schema.

    2.4.1. Star schema (Star)

    Analysis and forecasting require database schemas that focus mainly on queries that are multidimensional and array-oriented in nature. Even so, the main database technology for the data warehouse is the RDBMS. We will therefore look at schema design in connection with relational database technology.

    The star schema was first proposed by Ralph Kimball as a database design option for the data warehouse. In a star schema, data is identified and classified into two kinds: facts (the Fact table, the central object) and dimensions (the Dimension tables, the linked tables). Only one table in a star schema is related directly to almost all the others: the Fact table, which holds the core elements to be analyzed. It is called a star schema because the facts sit at the center of the model and are surrounded by the related dimensions, much like the points of a star. Facts are the numeric measures of the business. Dimensions are the filters or constraints on these facts. For example, information about a customer such as name and address is a dimension, while the sales made to that customer are a fact.

    Figure 2.2. Star schema and snowflake schema

    With a star schema, the designer can easily simulate the functions of a multidimensional database. Denormalization can be regarded as pre-joining the tables so that applications do not have to perform the joins themselves, which reduces execution time.

    The star schema is designed to overcome the limitations of the two-dimensional relational model. In a database designed as a star schema, queries that ask complex questions involving many tables and totals become simpler, and the amount of work needed to produce the answer is smaller than in a standard relational model. The star schema considerably improves query time and makes some multidimensional features possible. The schema is very intuitive and easy to use, and it presents a multidimensional view of the data using relational database semantics. The key of the Fact table is formed from the keys of the tables that hold the information for each dimension (the Dimension tables). All keys are defined with the same naming convention.

    For example, to obtain the city of a particular customer, the customer key in the Fact table must be joined with the customer key in the Dimension table, and the city is then read from the city attribute of that customer.

    The Fact table contains the keys of the Dimension tables, possibly under different names, to guarantee that each row is unique. Dimension tables usually have a unique identifier and hold the information about their own dimension.
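
    The relationship between a Fact table and its Dimension tables can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_customer`, `dim_product`) and the sample rows are hypothetical and are not taken from the thesis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT);
-- The key of the fact table is made up of the keys of its dimensions.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    amount REAL,
    PRIMARY KEY (customer_key, product_key)
);
INSERT INTO dim_customer VALUES (1, 'An', 'Ha Noi'), (2, 'Binh', 'Hue');
INSERT INTO dim_product  VALUES (10, 'Pen'), (11, 'Book');
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 20.0), (2, 10, 7.5);
""")

# Join the customer key of the fact table to the customer dimension
# to read the city attribute, then total the measure per city.
sales_by_city = conn.execute("""
    SELECT c.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
```

    The query needs a single join per dimension used, which is the property that keeps analytical queries against a star schema simple.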

    Because the Fact table is summarized in advance and combined along many dimensions, it tends to have a very large number of rows and to grow quickly, whereas Dimension tables have few rows and grow little. A Fact table can contain tens of millions of rows. A Dimension table holds attributes that can be used as search criteria. It is usually much smaller and already very familiar to users. Its key is not a composite key like that of the Fact table. If a Dimension table starts to approach the size of the Fact table, it needs to be split further into more Dimension tables. If a Dimension table is split into a main Dimension and a sub-Dimension, the resulting structure is called a snowflake schema or an extended star structure.

    A simple star schema consists of one Fact table and a few Dimension tables. A complex star schema contains hundreds of Fact and Dimension tables. Some techniques for improving query performance in a star schema are:

    - Identifying existing aggregations of the Fact tables or creating new ones.
    - Partitioning the Fact table so that most queries access only one partition.
    - Creating separate Fact tables.
    - Creating unique index sets or using other techniques to improve join performance.

    Neither the Fact table nor the Dimension tables have to be in normal form as in traditional design methods, which means that the data is redundant. This kind of schema allows redundant data to be stored in exchange for faster access, which suits complex, multidimensional analytical questions. In essence, the Fact table is in first normal form with a very high level of redundancy.

    A star schema can be regarded as a read-only database: updating the data is very difficult, if not impossible. Some Dimension tables hold data that can be added through join queries, while others hold no data at all and serve only as an index to the data.

    2.4.2. Snowflake schema (Snowflake)

    The snowflake schema is an extension of the star schema in which each point of the star is not one Dimension table but several. In this kind of schema, each dimension table of the star schema is normalized further. The snowflake schema improves query performance, minimizes the disk space needed to store the data, and improves performance because only smaller tables have to be joined instead of large, denormalized ones. It also makes applications more flexible, because the data is normalized and less dimensional in nature. On the other hand, it increases the number of tables and makes some queries that must reference many tables more complex. Some tools hide the physical database schema from users and let them work at the conceptual level. These tools map user queries onto the physical schema. They need a database administrator to set up this mapping once, when the tool is installed.

    2.4.3 Combined schema

    This is a combination of the star schema, based on a Fact table and Dimension tables that are not normalized to first, second or third normal form, and the snowflake schema, in which all Dimension tables are normalized. In a combined schema only the large Dimension tables are normalized, while the other tables hold a large number of columns that are not normalized.

    Some databases and user query tools, especially online analytical processing (OLAP) tools, require the data model to be a star schema. The reason is that it is a relational data model that is nevertheless designed to support the multidimensional data model, which is the core of OLAP. These databases and tools are tuned to handle queries against this model.

    2.4.4. Issues in designing a star schema

    Although most experts agree that the star schema is a suitable modeling approach for data warehousing, some relational database management system issues still affect the implementation of a star schema.

    2.4.4.1. Indexing

    Indexing can guarantee the uniqueness of keys and can improve read performance. Because the tables of a typical star design contain the entire hierarchy of attributes, one approach is to build a composite key from those attributes. This approach is acceptable for ordinary designs, but in a star schema it raises several problems:

    - It requires complex Metadata definitions (one for each key component) to define a single relationship (one table). This makes the design more complex and considerably reduces performance.
    - Because the Fact table must carry all the key components as part of its primary key, adding or removing a level in the hierarchy requires physical changes to the tables concerned, which takes a long time and limits flexibility.
    - Carrying all the key segments of every Dimension in the Fact table increases the size of the index and has a strong impact on performance and stability.

    One remedy for the composite key described above is to split it into single keys. This solves the first two problems, but the size of the index remains an issue. The best approach is to replace the meaningful keys with a generated key, the smallest key possible that still guarantees the uniqueness of every record. The meaningful keys that have been replaced do not have to be discarded. They can simply be moved to a non-key attribute. The resulting star design consists of a Fact table with a primary key made up of one key column per dimension, each of which is a generated key. This method gives the greatest flexibility, the least maintenance and the best possible performance.
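
    Replacing a meaningful composite key with a generated key can be sketched as follows. The class `SurrogateKeyMap` and the period attributes are hypothetical; the sketch only shows that the generated key stays small while the meaningful parts survive as ordinary attributes.

```python
import itertools


class SurrogateKeyMap:
    """Hand out small generated integer keys, one per distinct natural key."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_natural = {}

    def key_for(self, natural_key):
        # Reuse the generated key if this natural key has been seen before.
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = next(self._next)
        return self._by_natural[natural_key]


period_keys = SurrogateKeyMap()
dim_period = {}

# The meaningful composite key (year, quarter, month) becomes plain attributes.
for natural in [("2006", "Q1", "01"), ("2006", "Q1", "02"), ("2006", "Q1", "01")]:
    generated = period_keys.key_for(natural)
    dim_period[generated] = {
        "year": natural[0],
        "quarter": natural[1],
        "month": natural[2],
    }

# The fact row carries only the single generated key column for this dimension.
fact_row = {"period_key": period_keys.key_for(("2006", "Q1", "02")), "amount": 12.5}
```

    Adding a level to the hierarchy (a week, for example) would then change only the attributes of the dimension table, not the key columns of the Fact table.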

    2.4.4.2. Level indicator

    To navigate the dimensions successfully, the design of the Dimension tables usually includes a hierarchy level indicator for every record. Every query that retrieves data from a table holding both detail records and aggregated data must use this indicator as an additional constraint in order to obtain a correct result. The level indicator is a useful tool in environments that are tightly controlled by the DBAs and in which only a few ad hoc queries are allowed. If users ignore the level indicator, or if its value is wrong, an otherwise correct query can still return an invalid result.
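
    The effect of omitting the level indicator can be shown with a minimal sketch. The record layout and the function `total_sales` are hypothetical: one table mixes daily detail rows with a monthly aggregate row, and each row carries a `level` value.

```python
# One table that stores detail rows and an aggregate row side by side.
rows = [
    {"level": "day", "period": "2006-01-01", "sales": 10},
    {"level": "day", "period": "2006-01-02", "sales": 15},
    {"level": "month", "period": "2006-01", "sales": 25},
]


def total_sales(records, level=None):
    """Sum sales, optionally constrained to one hierarchy level."""
    return sum(r["sales"] for r in records if level is None or r["level"] == level)


without_level = total_sales(rows)            # detail and aggregate both counted
with_level = total_sales(rows, level="day")  # constrained to the detail level
```

    Without the constraint the monthly aggregate is added on top of the days it already summarizes, so the total is doubled.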

    The best alternative to a level indicator is the snowflake schema. In this kind of schema, the aggregate Fact tables are created separately from the tables that hold the detail data. In addition to the main Fact tables, the snowflake schema contains a separate Fact table for each level of aggregation, so no mistake can be made when selecting detail records. However, the snowflake schema is more complex than the star schema and usually requires more complex SQL statements to obtain an answer.

    2.4.5. Design factors to be considered

    The design of the data warehouse structure affects how easily the cubes can be designed and built. Microsoft SQL Server OLAP Services relies on the data supplied by the data warehouse being accurate, stable and consistent. When creating a data warehouse for use with OLAP, the design factors to consider are:

    - Use a star schema or a flat main table if possible. If a snowflake schema is necessary, minimize the number of Dimension tables that lie more than one level away from the main table.
    - Design the Dimension tables for the users. Dimension tables should hold meaningful information about the facts that users want to explore.
    - Apply ordinary normalization to the design of Dimension tables. Do not combine unrelated data in a single Dimension table, and do not repeat data across Dimension tables. For example, create a separate customer Dimension instead of repeating customer information in several Dimension tables.
    - Do not over-summarize in the main table. Keep the granularity that users need to access, and keep all records of the main table at the same level of detail. OLAP Services is designed to create and manage summary data from large, highly granular data stores without increasing response time.
    - Use a common structure for the main (Fact) tables that hold data of the same kind. The data used in one cube can be stored in several main tables, but these tables must have the same structure.
    - Do not create additional tables for summary data. OLAP Services precalculates the totals in a structure designed for efficient querying. Additional summary tables are not used.
    - Create indexes on the key fields. For each Dimension table, create an index on its key column. For each Fact table, create a single index on the combination of columns that hold the foreign keys of the Dimension tables joined to that Fact table. OLAP Services uses these indexes when it loads the multidimensional data structures and calculates the summary data. These indexes considerably improve processing.
    - Ensure referential integrity. This is important because Fact tables are expressed in terms of Dimension tables. Fact table rows that have no corresponding key in a Dimension table can cause errors, or can be dropped, when the Fact table and the Dimension table are used in the same cube. Dimension tables that hold information not represented in the Fact table can cause empty cells in the cubes. These empty cells may interfere with some analytical calculations.
    - Design a data update strategy. When data is added to or changed in the data warehouse, the cubes built from the earlier data must be updated before the new data is made available to users. Merging additional data into the cubes takes less time than rebuilding the cubes when existing data has changed.
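
    The integrity check mentioned in the list above can be sketched in a few lines. The function `integrity_report` and the sample tables are hypothetical. The sketch does not reproduce what OLAP Services does internally; it only shows the two conditions the text warns about.

```python
def integrity_report(fact_rows, dimension_keys, key_column):
    """Return fact rows whose key has no dimension member, and unused members."""
    used = {row[key_column] for row in fact_rows}
    # Fact rows pointing at a key that the dimension does not contain.
    orphan_rows = [row for row in fact_rows if row[key_column] not in dimension_keys]
    # Dimension members never referenced by any fact row (empty cells in a cube).
    unused_members = sorted(set(dimension_keys) - used)
    return orphan_rows, unused_members


dim_store = {1: "Ha Noi", 2: "Hue", 3: "Da Nang"}
fact_sales = [
    {"store_key": 1, "amount": 5.0},
    {"store_key": 4, "amount": 9.0},  # no store 4 in the dimension
]
orphans, unused = integrity_report(fact_sales, dim_store, "store_key")
```

    Running such a check before a cube is built shows which fact rows would be rejected or dropped and which dimension members would produce empty cells.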

    2.5. Data warehouse administration

    A data warehouse is several times larger than the corresponding operational data store as a whole. It is not synchronized with the related operational data in real time, but it can be updated frequently if the application requires it.

    Most data warehouse products include gateways for accessing complex data sources without having to rewrite software to convert, translate and use the data. In a heterogeneous data warehouse environment, many different databases reside on separate systems, which calls for tools that work across networks. This creates the need to administer the infrastructure components. Data warehouse administration includes:

    - Managing safety, security and priorities
    - Managing updates from many different sources
    - Checking data quality
    - Managing and updating Metadata
    - Auditing and reporting on the use and status of the data warehouse
    - Cleaning data
    - Replicating data, splitting data into subsets and distributing data
    - Backing up and recovering data
    - Managing the data stores

    Chapter III. The multidimensional approach and multidimensional analysis in online analytical processing

    3.1. The multidimensional approach

    OLAP is the processing that creates and manages multidimensional data in practice. It helps users analyze and explore data easily, so that they can understand the hidden information the data contains. The essential requirements of OLAP are:

    - Fast access and fast calculation.
    - Powerful analytical capability.
    - Flexibility (flexible analysis, flexible interface, flexible data display).
    - Support for many users.

    The question is which approach to data organization can meet these functional requirements of OLAP and of the multidimensional data model in practice. Many people have tried to use spreadsheets or SQL for OLAP, but this is very difficult and limited. Above all, it does not express the distinctive features of OLAP and does not meet the functional requirements of OLAP and of the multidimensional model. The main reason why spreadsheets are limited when used to build a multidimensional data model is that a spreadsheet does not separate the structure of the model from its representations. A spreadsheet can therefore be applied only to a simple problem on a small amount of data organized as a two-dimensional table. SQL gives us a means of querying based on the columns of the data, but it does not cover every analytical case, nor comparisons across rows. Neither approach makes querying easy when a large amount of data is organized in a complex way. The best approach for providing decision-oriented, analysis-based processing that fits the requirements of OLAP is the multidimensional approach. Business models require the ability to aggregate data at many different levels within the dimensions. Analysts need to flip through the data quickly by changing the way it is laid out on the screen. They need to analyze the data, mainly by summarizing and comparing it across the dimensions. The multidimensional approach has clear advantages over the spreadsheet and SQL approaches both in defining and in using such models.

    The separation of the data structure (defined in the dimensions) from the representation of the data is a major advantage of the multidimensional approach. It minimizes the need to repeat structural information and gives direct support for changing display requirements easily. In addition, direct support for multilevel dimensions and the ability to attach formulas to an axis (axis-based) instead of to a cell (cell-based) make it easy to define multilevel aggregations and multidimensional calculations.

    OLAP is a tool for online analysis. The essence of OLAP is that data is taken from the data warehouse or Datamart, converted into a multidimensional model and stored in a multidimensional data store (the data is stored as arrays instead of as records, as in the relational model). OLAP services (or tools) take the data in the warehouse and carry out complex, ad hoc, multidimensional analysis to support decision making. The star schema used to design the data model of the warehouse or Datamart is a relational data model, yet it has multidimensional properties, which makes it very convenient for implementing OLAP.

    3.2. Multidimensional analysis

    All data that is interrelated needs to be analyzed. In analytical processing the focus is on data analysis, and on multidimensional analysis in particular. In multidimensional analysis, data is described in terms of dimensions such as Product, Region and Customer. Dimensions usually involve hierarchies, for example City, Region and Country. The time dimension is a standard dimension with its own hierarchy: Day, Week, Month, Quarter and Year.

    Figure 3.1. Multidimensional data model

    To handle complex analysis, multidimensional analysis presents a view of the data that is close to the way users think. For example, a user can request the budget by department over the last four quarters for a set of products. The result can be rotated to change the position of the axes and the view. The user can also examine the dimensions by drilling down or rolling up through the members of each dimension. Drilling down on the dimensions can produce further views. The scope of information processing is usually simpler (only two or three dimensions). Analyzing historical data in order to understand the past is static analysis. Analytical processing can be used for complex historical analysis with extended operations, known as dynamic analysis: planning and forecasting in which the past serves as the starting point for the future.
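
    Rolling up and drilling down along a hierarchy can be sketched as follows. The City to Region mapping, the sample facts and the functions `roll_up` and `drill_down` are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy of the geography dimension: City -> Region.
CITY_TO_REGION = {"Ha Noi": "North", "Hai Phong": "North", "Hue": "Central"}

# Facts at the most detailed level: (city, quarter, amount).
facts = [
    ("Ha Noi", "Q1", 10),
    ("Hai Phong", "Q1", 4),
    ("Hue", "Q1", 6),
    ("Ha Noi", "Q2", 8),
]


def roll_up(records):
    """Aggregate city-level amounts to the (region, quarter) level."""
    totals = defaultdict(int)
    for city, quarter, amount in records:
        totals[(CITY_TO_REGION[city], quarter)] += amount
    return dict(totals)


def drill_down(records, region, quarter):
    """Return the city-level detail behind one (region, quarter) total.

    Assumes at most one record per city and quarter, as in the sample data.
    """
    return {
        city: amount
        for city, q, amount in records
        if CITY_TO_REGION[city] == region and q == quarter
    }


by_region = roll_up(facts)
north_q1 = drill_down(facts, "North", "Q1")
```

    The rolled-up total for the North in Q1 is exactly the sum of the city-level values that the drill-down returns, which is what lets a user move between the two levels consistently.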

    In a data warehouse, data is stored for querying and analysis. These purposes differ from those of OLTP, where data is collected and stored for operational activities and for control.

    3.3. OLAP cube architecture (OLAP Cube Architecture)

    3.3.1. Introduction to the cube architecture

    An OLAP database uses the data cube as its foundation. To understand what an OLAP cube is, consider how the data loaded into the OLAP database originates from queries on the Fact table and the Dimension tables. In other words, the final report of the data analysis is derived from these tables together with the application of a number of calculation functions.

    Figure 3.2. Cube data model

    To describe cube data, imagine the data in the Fact table distributed as follows. The central object of OLAP is the cube, a multidimensional representation of detailed and summarized data. A cube consists of a fact table (Fact), one or more dimension tables (Dimensions), measures (Measures) and partitions (Partitions). Cubes can be designed on the basis of the users' analytical requirements. One data warehouse can support many different cubes: a volume cube, an inventory cube, and so on. An example of a cube with a star schema is shown below:

    Figure 3.3. Star schema of a cube

    If required, the cube can be extended over several years by adding the column Year_ID to Time_Dimension_Table and creating one more Dimension table, Time_Dimension_Table_2, containing the two columns Year_ID and Year. The cube then has a snowflake schema, as follows:

    Figure 3.4. Snowflake schema of a cube

    3.3.2. Cube (Cube)

    The cube is the central element of online analytical processing, a technology ...

    ... allows users to pose questions at a high level and then expand a dimension hierarchy to uncover more detail. For example, an analyst might begin by asking to see the Fiscal_Year values for the results of the last three fiscal years. The analysis may show that costs in one year are higher than in the others. Expanding the Fiscal_Year dimension to the Month level shows that product costs were especially high in a particular month. The analyst can then explore the levels of the Store_Location dimension to see whether one particular area contributes significantly to the high product costs, or expand the Product_Line dimension to see whether Item_Cost is high for one product group or one particular product. This kind of exploration is known as Drill_down and is common in OLAP applications.

    Although the cube just described has three dimensions, a cube can have up to 64 dimensions. Cube data and aggregations can be stored in several ways. Aggregations are precalculated summaries of data, and they provide the mechanism for fast query response in OLAP systems.

    Cubes can require considerable storage space to hold the data and the precalculated summary information in multidimensional structures. One factor that affects the storage requirements is sparsity (the number of empty cells in a cube). For example, if one dimension contains sales representatives and another contains regions, the cells at the intersection of Northern sales representatives and the Southern region are likely to be empty.
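
    The share of empty cells can be computed with a short sketch. The sales representatives, regions and figures below are hypothetical. The point is only that a dense cube reserves a cell for every combination of members, whether or not any data exists for it.

```python
from itertools import product

# Hypothetical members of two dimensions.
reps = {"An": "North", "Binh": "North", "Chi": "South"}  # rep -> home region
regions = ["North", "South"]

# Cells that actually hold data: each rep sells only in the home region.
sales = {("An", "North"): 10, ("Binh", "North"): 7, ("Chi", "South"): 12}

# A dense cube would reserve one cell per (rep, region) combination.
all_cells = list(product(reps, regions))
empty_cells = [cell for cell in all_cells if cell not in sales]
sparsity = len(empty_cells) / len(all_cells)
```

    Here half of the six cells are empty. With more dimensions and more members per dimension, the proportion of empty cells usually grows, which is why storage options for sparse data matter.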

    The storage options make it possible to choose suitable storage modes and locations for the cube data, so that an OLAP storage strategy can be tailored to specific needs.

    3.3.2.2. Processing cubes

    When a cube is processed, the aggregations designed for it are computed and loaded into the cube together with its data. Processing a cube involves reading the Dimension tables to determine the current data levels, reading the Fact table, computing the specified aggregations, and storing the results in the cube. Once a cube has been processed, it is available to answer user queries.

    Processing is the term for the complete loading of a cube's data: all dimension data and Fact table data is read and all specified aggregations are computed. A cube must be processed when its structure is new or when its dimensions or measures have been changed. Processing a cube can take a considerable amount of time if the Fact table is large or if there are many dimensions with many levels and many members in each level. Dimension information does not have to be reloaded if the cube uses shared dimensions that have already been processed in other cubes.

    Changes to the data warehouse schema that affect the structure of cubes require those cubes to be restructured and then processed. Changes or additions to the data in the warehouse do not require cubes to be fully reprocessed: such changes can be merged into the existing cubes using the incremental update or data refresh processing options, depending on how the data has changed.

    3.3.2.3. Virtual cubes (Virtual Cube)

    Cubes can be joined in a virtual cube, much as tables can be joined by views in a relational database. A virtual cube provides access to the data in the combined cubes without building a new cube, which lets us keep the best design for each individual cube.

    3.3.3. Dimensions (Dimension)

    Dimensions describe the categories by which the numeric data in a cube is broken down for analysis. For example, if a cube's measure is the production count (Production Count) and its dimensions are time, factory location, and product (Time, Factory Location, Product), cube users can break the production count down by Time, Factory Location, and Product.

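    A dimension can be used by several cubes, in which case it is called a shared dimension. In general, cubes share one or more dimensions. Suppose, for example, that we have two cubes: DOANH_THU (revenue) and NHÂN_SỰ (personnel). The two cubes share two dimensions: Cửa_hàng (store) and Thời_gian (time). In addition, the DOANH_THU cube has the dimensions Sản_phẩm (product), Khung_cảnh (scenario) and Biến_số_sp, and the NHÂN_SỰ cube has the dimensions Nhân_viên (employee) and Biến_số_s.

    Figure 3.5. Multi-cube model diagram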

    Shared dimensions can be used in any cube in the database. By creating shared dimensions and using them across several cubes, we avoid creating identical private dimensions in each of the cubes.

    Shared dimensions also standardise the cubes. For example, standard shared dimensions for time and geographic location ensure that data analysed from different cubes is organised in the same way. We can also create another kind of dimension, known as a virtual dimension.

    3.3.3.1. Defining dimensions

    To define a dimension, we select one or more columns from one of the joined tables (the dimension table). If we select several columns, they must all be related to one another so that their values can be organised into a single hierarchy. To define the hierarchy, order the columns from the most general to the most specific. For example, a Time dimension is built from the columns Year, Quarter, Month, Day.

    Each column in a dimension contributes one level to the dimension. Levels are ordered by specificity and organised into a hierarchy that provides logical paths for drilling down (Drill_down). For example, the Time dimension described above lets cube users drill down from Year to Quarter, from Quarter to Month, and from Month to Day. Each Drill_down yields more specific detail.

    Each level contains members. Members are the values in the column that defines the level. For example, the Quarter level may have four members: Quarter I, Quarter II, Quarter III and Quarter IV. However, if the data in the table spans more than one year, for example the Year level holds three distinct values 1996, 1997 and 1998, then the Quarter level has 12 members.
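The member counts in this example can be checked with a small sketch. It assumes a hypothetical in-memory Time dimension table (`rows`) and a helper named `members`; neither comes from the thesis, and a quarter is counted once per year because each member is qualified by its ancestors.

```python
from itertools import product

# Hypothetical Time dimension table: one row per (Year, Quarter) pair.
years = [1996, 1997, 1998]
quarters = ["Quarter I", "Quarter II", "Quarter III", "Quarter IV"]
rows = [{"Year": y, "Quarter": q} for y, q in product(years, quarters)]

def members(rows, levels):
    """Distinct members of the deepest level in `levels`, qualified by its ancestors."""
    return sorted({tuple(r[level] for level in levels) for r in rows})

year_members = members(rows, ["Year"])                # 3 members
quarter_members = members(rows, ["Year", "Quarter"])  # 12 members: 4 quarters x 3 years
unqualified_quarters = members(rows, ["Quarter"])     # 4 labels if the year is ignored

print(len(year_members), len(quarter_members), len(unqualified_quarters))
```

Qualifying each quarter by its year is what turns the four quarter labels into twelve members of the Quarter level.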

    3.3.3.2. Hierarchical dimensions

    Hierarchies are the backbone of data aggregation; in other words, aggregation is only possible on the basis of hierarchies. Most dimensions have a multi-level, hierarchical structure. If we have to make decisions about product prices in order to maximise revenue, we need to look at product revenue data aggregated by product price, that is, we perform one particular aggregation. When other decisions have to be made, we need to perform other, corresponding aggregations. There can therefore be a great many aggregation processes, and they have to be easy and flexible to carry out so that unplanned analyses can be supported. This can be achieved with the help of hierarchies that are both wide and deep.

    3.3.3.3. Dimension hierarchies

    Consider the example of a dimension hierarchy in the following figure:

    Figure 3.6. Hierarchy of the Sản_phẩm (product) dimension

    References to members in multidimensional applications are often relative to other members. Relative references in a hierarchical structure are more complex than relative references in a row-and-column structure.


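    In a hierarchy, the direction in which we count levels matters. For example, if we want to refer to all members at the same distance from the root as Gia dụng (household goods), the set consists of Bàn (tables), Ghế (chairs), Tủ (cabinets), Gia dụng and Văn phòng (office goods), all of which are two levels from the root. A dimension hierarchy like the one above is called an asymmetric hierarchy. A hierarchy like the one in the following figure is called a symmetric hierarchy:

    Figure 3.7. A symmetric hierarchy tree

    In a symmetric hierarchy we can refer to members by their level. The quarters, for example, are the set of members one level up from the bottom and one level down from the top.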

    3.3.3.4. Roll_up and Drill_down based on dimension hierarchies

    Using the hierarchy of a dimension, we can roll up (Roll_up) from a lower level to the levels above it, performing an aggregation to obtain a more summarised result, and we can drill down (Drill_down) from an upper level to the levels below it to obtain more detailed results (see the example in Figure 3.8).
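A minimal sketch of both operations follows. The fact rows, the level order year, quarter, month, and the helper `aggregate` are illustrative assumptions rather than anything defined in the thesis.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, sales).
facts = [
    (1997, "Q1", "Jan", 10), (1997, "Q1", "Feb", 20), (1997, "Q2", "Apr", 5),
    (1998, "Q1", "Jan", 7), (1998, "Q3", "Jul", 8),
]

def aggregate(facts, depth):
    """Sum sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *path, sales in facts:
        totals[tuple(path[:depth])] += sales
    return dict(totals)

by_quarter = aggregate(facts, 2)   # detail at the quarter level
by_year = aggregate(facts, 1)      # Roll_up: quarter -> year
# Drill_down: open the 1997 total to see its quarters.
drill_1997 = {key: total for key, total in by_quarter.items() if key[0] == 1997}

print(by_year)
print(drill_1997)
```

Rolling up only shortens the grouping key, and drilling down lengthens it again, so both operations can reuse the same aggregation routine.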

    3.3.3.5. Virtual dimensions (Virtual Dimensions)

    A virtual dimension is a special kind of dimension: it maps the properties of members of other dimensions into a dimension that can then be used in cubes. Virtual dimensions and member properties are evaluated as needed when queries are answered, and they do not require physical cube storage.

    Figure 3.8. Roll_up and Drill_down along a dimension hierarchy

    3.3.4. Measures (Measures)

    A cube's measures are columns of the Fact table. Measures identify the numeric values from the Fact table that are aggregated for analysis, such as price, cost or quantity.

    3.3.5. Partitions (Partitions)

    Every cube has at least one partition that holds its data. A single partition is created automatically when the cube is defined. When we create a new partition for a cube, it is added to the set of partitions that already exist for the cube. The cube reflects the combined data of all its partitions. The way a cube is partitioned is invisible to users.

    Partitions are a powerful and flexible tool for managing OLAP cubes, especially large ones. For example, a cube that holds sales information might have one or more partitions for the data of previous years and one partition for each quarter of the current year. At the end of the year the four quarterly partitions can be merged into a single partition for that year. Partitions can be stored with different combinations of storage mode, source data location and aggregation design. This flexibility lets us design cube storage strategies that suit our requirements.

    Partitions must be designed and managed carefully to avoid inconsistent or incorrect results. The integrity of cube data depends on the data being distributed among the cube's partitions in such a way that no data is repeated between partitions. When data is summarised across partitions, a data item that appears in more than one partition is summarised as though its copies were different data items. This can produce incorrect summaries and wrong data for users. For example, if a sales transaction for product X is repeated in the Fact tables of two partitions, the sales totals for product X may count it twice.
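The double-counting risk can be illustrated with a small sketch. The two partitions, the transaction ids and the helper names are hypothetical; the point is only that a cube total sums every row of every partition.

```python
# Hypothetical fact rows per partition: (transaction_id, product, amount).
partition_q1 = [(1, "X", 100), (2, "Y", 40)]
partition_q2 = [(1, "X", 100), (3, "X", 25)]   # transaction 1 repeated by mistake

def cube_total(partitions, product):
    """Total as the cube sees it: every row of every partition is counted."""
    return sum(amount
               for part in partitions
               for _, prod, amount in part
               if prod == product)

def duplicated_ids(partitions):
    """Transaction ids that occur in more than one partition."""
    seen, dupes = set(), set()
    for part in partitions:
        ids = {tid for tid, _, _ in part}
        dupes |= seen & ids
        seen |= ids
    return dupes

total_x = cube_total([partition_q1, partition_q2], "X")   # counts transaction 1 twice
dupes = duplicated_ids([partition_q1, partition_q2])

print(total_x, dupes)
```

A check like `duplicated_ids` run before partitions are processed is one possible way to catch overlapping data slices early.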

    Partitions can be merged, and this feature can be used as part of an overall strategy for storing and updating data. Partitions can be merged only if they have the same storage mode and the same aggregation design. To create partitions that are intended to be merged later, we can choose the storage mode and copy the aggregation design from another partition when we create the partition. We can also modify a partition after it has been created and copy the [...]


    [...] operational database.

    3.3.6.2. ROLAP (Relational OLAP)

    The cube's base data is stored together with the aggregations (Aggregation) in a relational database. This approach combines OLAP services with a relational database. The data is stored in relational tables and can reach hundreds of Gigabytes in size. ROLAP systems provide extremely flexible query engines by making all operational data available to end users, so that data can easily be extracted and summarised on demand. ROLAP tools can extract data directly from many different relational database sources.

    ROLAP is the choice for data warehouses with the following characteristics:

    - The data changes frequently: in a volatile data warehouse whose users demand near-instant summaries, ROLAP is the only choice. MOLAP has to extract and summarise data offline (Offline); moreover, most multidimensional databases require the entire database to be recomputed when a dimension is added, when an aggregation schema changes or when new data is added. These characteristics make MOLAP unsuitable for decision support systems whose data sources change frequently.

    - The data volume is large: for data warehouses on the order of Terabytes, MOLAP requires the data to be precomputed, using hundreds of Terabytes of storage space.

    - The kinds of queries are not known in advance: ROLAP allows querying and summarising from any operational data source. This capability, however, makes the system more complex to use because of the mapping to the operational data sources.


    3.3.6.3. HOLAP (Hybrid OLAP)

    HOLAP combines the MOLAP and ROLAP approaches. The cube's base data is stored in a relational database, while the aggregations (Aggregation) are stored in a high-performance multidimensional structure. HOLAP storage offers the benefits of MOLAP for the aggregations without requiring an exact copy of the detail data.

    3.4. Algorithms for indexing views in online analytical processing of data warehouses

    Two approaches are commonly used to access a data warehouse directly. The first goes through multidimensional views (View) and presents the warehouse as a multidimensional structure for analysis and reporting at workstations. To make online analytical processing on such views efficient, work usually concentrates on algorithms that automatically select the summary tables and index the views. The second approach analyses directly the multidimensional data cubes built from the data warehouse and provides summarisation and aggregation capabilities in support of decision making and forecasting, trend analysis and statistical analysis.

    In this thesis I present algorithms that automatically select the Subcubes and the corresponding indexes to be precomputed in the most reasonable way.

    Consider example (1). Looking at a data warehouse that manages sales information from the stores of a corporation, we find that the questions that need OLAP processing typically have the following form:

    - How many Mặt_hàng (items) does each Cửa_hàng (store) sell per week?
    - What quantity of each Mặt_hàng is sold?

    To answer such questions, OLAP applications have to look at the data warehouse along several different dimensions (aspects).


    In the example above, the attributes that define the dimensions are Cửa_hàng (store) and Mặt_hàng (item). The measure we are most interested in is the quantity sold. The OLAP system has to present the data to users as multidimensional Views in the form of a cube (Data Cube). In the example above, the Data Cube consists of the following 4 Subcubes:

    - the quantity sold of each Mặt_hàng in each Cửa_hàng,
    - the quantity sold of each Mặt_hàng over all Cửa_hàng,
    - the quantity sold of all Mặt_hàng in each Cửa_hàng,
    - the quantity sold of all Mặt_hàng over all Cửa_hàng.
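The four Subcubes are exactly the subsets of the two dimensions. A short sketch, using the assumed English dimension names `item` and `store` and a hypothetical helper `subcubes`, enumerates them for any dimension list:

```python
from itertools import combinations

def subcubes(dimensions):
    """All subsets of the dimension list, largest first; each subset is one group-by."""
    dims = list(dimensions)
    return [subset
            for size in range(len(dims), -1, -1)
            for subset in combinations(dims, size)]

cubes = subcubes(["item", "store"])
for cube in cubes:
    print(cube if cube else "(grand total)")
```

With n dimensions there are 2^n subsets, so the number of Subcubes doubles with each added dimension.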

    3.4.1. Basic concepts

    3.4.1.1. Subcubes (Subcubes)

    A Subcube is a part of the data cube (Data Cube). In other words, every element of the set of subsets of the warehouse dimensions is a Subcube. Continuing example (1) above, the pair {Mặt_hàng, Khách_hàng} (item, customer) corresponds to a Subcube that holds the Mặt_hàng sold to each Khách_hàng. In SQL, Subcubes differ only in their grouping clause (Groupby Clause). Here we likewise let a Subcube correspond to a set of attributes that can be grouped together. The set {Mặt_hàng, Khách_hàng} thus corresponds to the Subcube defined by the following SQL statement:

    SELECT Mặt_hàng, Khách_hàng, SUM(Hàng_bán) AS TotalSales

    FROM R

    GROUP BY Mặt_hàng, Khách_hàng
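To see that Subcubes differ only in the grouping clause, the statement can be run against a toy table. This is a sketch under assumptions: it uses Python's built-in `sqlite3` module, ASCII column names `item`, `customer` and `sales` in place of the thesis identifiers, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (item TEXT, customer TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [("pen", "An", 3), ("pen", "An", 2), ("pen", "Binh", 1), ("ink", "An", 4)],
)

def subcube(conn, group_by):
    """Compute one Subcube; only the GROUP BY list changes between Subcubes.

    `group_by` must hold trusted column names, since it is spliced into the SQL text.
    """
    cols = ", ".join(group_by)
    select = f"{cols}, " if cols else ""
    group = f" GROUP BY {cols} ORDER BY {cols}" if cols else ""
    return conn.execute(f"SELECT {select}SUM(sales) FROM R{group}").fetchall()

item_customer = subcube(conn, ["item", "customer"])
item_only = subcube(conn, ["item"])
grand_total = subcube(conn, [])

print(item_customer, item_only, grand_total)
```

The empty grouping list yields the single-row Subcube that aggregates over all dimensions.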

    3.4.1.2. Queries (Queries)

    A query can use a dimension either as a grouping attribute or as a selection attribute (in SQL, the dimension appears in the Groupby Clause, which groups rows, or in the Where Clause, which keeps the rows that satisfy some condition).


    Using the shorthand of the model, a query Q can be written in the form $\gamma_{cp}\sigma_{s}$, where $\gamma$ marks the grouping attributes (Groupby) and $\sigma$ marks the selection attributes (Selection) of the query; c: Khách_hàng (customer); p: Mặt_hàng (part) and s: Hàng_bán (sales). The order of the attributes does not matter, of course: the query $\gamma_{p}\sigma_{sc}$ is exactly the same as $\gamma_{p}\sigma_{cs}$.

    Every query of the form $\gamma_{c}(\sigma_{p = \text{constant}}(R))$ asks for a slice through Subcube(customer, part). We write the general query as $\gamma_{c}\sigma_{p}$ and call it a slice query (Slice Query) on Subcube(customer, part).

    In general, a query $\gamma_{G_1, \ldots, G_k}\sigma_{S_1, \ldots, S_l}$ is a slice query on Subcube(G1, ..., Gk, S1, ..., Sl), the smallest Subcube that can answer it, where G1, ..., Gk and S1, ..., Sl are dimensions of the data warehouse.

    3.4.1.3. Indexes (Indexes)

    To speed up query processing we can use B-tree index structures (B-Tree: Balance-Tree). For Subcube(p,s), for example, we can build the following indexes:

    - $I_{ps}$: an index whose search key is the concatenation of dimension p (part) followed by dimension s (sales).
    - $I_{sp}$: an index whose search key is the concatenation of dimension s followed by dimension p.

    The order of the dimensions matters here. Given a value of p, we can use $I_{ps}$ to find all rows of Subcube(p,s) that have that value of p. Similarly, given a pair (p,s), we can use $I_{ps}$ to find the rows of Subcube(p,s) that have that pair of values.
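Why attribute order matters can be shown with a sketch in which a sorted list stands in for the B-tree. The rows, the variable `i_ps` and the helper `lookup_prefix` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Subcube(p, s) rows: (part, sales_key, total).
rows = [("p1", "s1", 10), ("p1", "s2", 7), ("p2", "s1", 3), ("p3", "s2", 9)]

# I_ps: the search key is (p, s); a sorted list stands in for the B-tree leaves.
i_ps = sorted((p, s, row_id) for row_id, (p, s, _) in enumerate(rows))

def lookup_prefix(index, prefix):
    """Row ids whose search key starts with `prefix` (a tuple of leading attributes)."""
    n = len(prefix)
    # Keys are rebuilt on every call for clarity; a real B-tree would not scan.
    keys = [entry[:n] for entry in index]
    lo, hi = bisect_left(keys, prefix), bisect_right(keys, prefix)
    return [index[k][-1] for k in range(lo, hi)]

by_p = lookup_prefix(i_ps, ("p1",))         # rows with p = p1
by_ps = lookup_prefix(i_ps, ("p1", "s2"))   # the row with (p, s) = (p1, s2)
by_s = lookup_prefix(i_ps, ("s1",))         # s alone is not a prefix of (p, s)

print(by_p, by_ps, by_s)
```

A lookup on s alone finds nothing through this index because s is not a leading attribute of its key; serving that lookup is the job of an index ordered (s, p).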

    B-tree indexes shorten the time needed to answer queries. Each View can be indexed in several ways. For Subcube(p,s), for example, we can build 4 indexes: $I_{p}(ps)$, $I_{s}(ps)$, $I_{ps}(ps)$, $I_{sp}(ps)$. In each case the subscript lists the search key attributes, and the argument names the Subcube(p,s) on which the Index is built. For every subset of the attributes of a View we can define an index for each ordering of that subset. The number of possible indexes of a View with m attributes is therefore:

    $$\sum_{r=0}^{m} \binom{m}{r}\, r!$$

    The number of possible indexes is thus very large. In general, an index can help answer queries: a slice query can be processed quickly when some prefix (Prefix) of the indexed attributes matches the selection attributes (Selection Attribute) of the query. For fast and accurate processing, the views of the data warehouse are therefore indexed and the indexes are computed in advance.
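The count of ordered attribute subsets can be evaluated directly. In this sketch the function names are hypothetical; the r = 0 term corresponds to the empty search key, and leaving it out gives the four indexes listed for Subcube(p,s).

```python
from itertools import permutations
from math import comb, factorial

def index_count(m, include_empty=True):
    """Sum over r of C(m, r) * r!, i.e. the number of ordered subsets of m attributes."""
    start = 0 if include_empty else 1
    return sum(comb(m, r) * factorial(r) for r in range(start, m + 1))

def enumerate_indexes(attrs):
    """Every non-empty ordered subset of `attrs` is one candidate search key."""
    return ["".join(key)
            for r in range(1, len(attrs) + 1)
            for key in permutations(attrs, r)]

print(enumerate_indexes("ps"))
print([index_count(m, include_empty=False) for m in range(1, 7)])
```

The counts grow roughly like e·m!, which is why the algorithms below avoid enumerating every index.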

    3.4.1.4. The computability and dependence relations

    Between queries (Queries) and views (Views) we define the computability relation [...]


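    [...] customer), but part $\not\preceq$ customer, and vice versa.

    The set of Subcubes of the Data Cube together with the relation defined above is called the dependence relation of the Views. For example, the Subcubes of the Data Cube of example (1), together with the relation $\preceq$, form the Subcubes of a database as follows:

    where non is the empty set.

    It is easy to see that if V1 $\preceq$ V2 and Q1 [...]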


    [...] each edge (Q,V). From the analysis above we can generalise the formula for the cost of answering a query Q using View V and index J.

    Assume that Q is the query $\gamma_{A}\sigma_{B}$, where A and B are sets of dimensions. B = $\emptyset$ if and only if Q is a Subcube query, and B = $\emptyset$ means that all dimensions are referred to.

    Assume that V is the View C. If Q [...]


    $$C(Q, V, J) = \frac{6\ \text{million}}{0.01\ \text{million}}$$

    3.4.2. Algorithms for selecting Views and Indexes

    To run the algorithms we need the following information:

    - the size of each View,
    - the size of each index,
    - for each triple (Query, View, Index), the cost C(Q,V,J) of answering the query.

    Formula (*) above gives C(Q,V,J). What remains is to determine the actual size of each View and each Index. This is not a simple matter, because they are usually very large.

    3.4.2.1. Estimating the size of each View

    There are several ways to determine the size of a View without materialising all the Views. We can use analytical methods and sampling to determine the sizes of the Views from other Views, so that only the largest element V1 (the View that contains all dimensions) has to be materialised. For a given View, if its grouping attributes are statistically independent, the size of the View can be determined analytically. Otherwise, a sample of V1 can be used to compute the sizes of the other Views. The size of a View is the number of distinct values of the attributes it groups by.
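The last sentence gives a direct way to measure a View from the rows of V1. The sketch below uses invented rows for V1 and a hypothetical helper `view_size`; the sampling step is only an illustration of the idea, not the estimator used in the thesis.

```python
import random

# Hypothetical rows of the top View V1 over dimensions p, s, c.
v1 = [{"p": i % 4, "s": i % 3, "c": i % 2} for i in range(12)]

def view_size(rows, attrs):
    """Number of distinct value combinations of the grouping attributes."""
    return len({tuple(row[a] for a in attrs) for row in rows})

sizes = {attrs: view_size(v1, attrs)
         for attrs in [("p", "s", "c"), ("p", "s"), ("p", "c"), ("p",), ("s",), ()]}

# A sample of V1 can only see a subset of the groups, so it gives a lower bound.
sample = random.Random(0).sample(v1, 6)
sampled_ps = view_size(sample, ("p", "s"))

print(sizes, sampled_ps)
```

In this toy data c is determined by p, so View (p, c) is no larger than View (p): correlated attributes are exactly the case in which a purely analytical estimate would be misleading.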

    3.4.2.2. Estimating the size of an Index

    Given the size of each View, we now compute the size of the corresponding Index. In the model, the size of a View is normally the number of rows in that View. The size of an Index (B-tree) is the number of leaves of the B-tree built for that Index. The number of leaf nodes of the B-tree of an Index is in turn approximately the number of rows of the corresponding View. We conclude that the size of an Index on a View V is the same as the size of the View V itself.

    Consider two Indexes J1 = $I_{A}(V)$ and J2 = $I_{B}(V)$ on the same View V. If B is a proper prefix of A, then C(Q,V,J1) $\le$ C(Q,V,J2) for every query Q. Since the sizes of J1 and J2 are both approximately the size of the View, to the same accuracy, we can drop J2 and consider only J1. For each View we therefore only need to consider the longest indexes, that is, the indexes whose search key attributes are not a proper prefix of the search key attributes of another Index on the same View.

    If V is the View (C), the set of Indexes is {$I_{D}(V)$ | D is a permutation of C}.
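The pruning argument can be checked mechanically: after dropping every search key that is a proper prefix of another, only the permutations of all attributes remain. The function names in this sketch are illustrative.

```python
from itertools import permutations

def all_indexes(attrs):
    """Every non-empty ordered subset of the View's attributes."""
    return [key
            for r in range(1, len(attrs) + 1)
            for key in permutations(attrs, r)]

def is_proper_prefix(short, long):
    return len(short) < len(long) and long[:len(short)] == short

def longest_indexes(attrs):
    """Indexes whose search key is not a proper prefix of another index's key."""
    keys = all_indexes(attrs)
    return [k for k in keys
            if not any(is_proper_prefix(k, other) for other in keys)]

print(longest_indexes("ps"))
print(len(all_indexes("psc")), len(longest_indexes("psc")))
```

For a View with m attributes this leaves m! candidate indexes instead of all ordered subsets.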

    3.4.2.3. Problem statement

    As stated above, our task is to construct algorithms that select specific Views and Indexes for answering the queries on a given Data Cube. Informally: we are given a set of Views, each View with a set of Indexes, and a set of queries that the system has to answer. The goal is to choose Views and Indexes from these sets so that the queries are answered at the lowest cost, subject to the constraint that the chosen Views and Indexes together occupy no more than a given space S. This problem is NP-complete. To solve it we therefore have to construct Heuristic algorithms, while making sure that they run efficiently.

    We first formalise the problem. Consider a bipartite graph G = (V $\cup$ Q, E), called the query-view graph (Query - View Graph), where V is the set of Views and Q is the set of queries. With each $v_i \in V$ we associate a pair ($S_i$, $I_i$), where $S_i$ is the space occupied by $v_i$ and $I_i$ is the set of indexes on $v_i$. We write $I_{ik}$ for the k-th index of $v_i$.

    With each $q_i \in Q$ we associate the cost $T_i$ of answering the query $q_i$. Each edge ($q_i$, $v_j$) carries a label (k, $t_{ijk}$), where $t_{ijk}$ is the cost of answering the query $q_i$ using the View $v_j$ and its k-th index. When k = 0, $t_{ijk}$ is the cost of answering $q_i$ using $v_j$ alone.

    Problem: Given the set of Views V and the set of queries Q, determine a set M of specific Views and indexes such that the space occupied by these Views and indexes does not exceed S (the space limit) and such that the choice of M minimises the total cost of answering the queries of Q from the Views of M.

    That is, we have to minimise the following quantity subject to the total space of the structures selected in M being at most S:

    $$\tau(G, M) = \sum_{i=1}^{|Q|} \min\Bigl(T_i,\ \min_{v_j,\, I_{jk} \in M} t_{ijk}\Bigr) \qquad (**)$$

    This problem generalises readily to the problem of determining the Views and Indexes of a Data Cube.

    3.4.2.4. Solving the problem

    We first define some notation. C is any set of Views and Indexes of the graph G. S(C) is the space occupied by the structures in C.

    B(C,M) is the benefit of C with respect to M: B(C,M) = $\tau$(G, M) - $\tau$(G, M $\cup$ C); B(C, $\emptyset$) is the absolute benefit of C.

    The benefit of C per unit space with respect to M is B(C, M) / S(C).
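The definitions of total cost and benefit translate into a few lines of code. The representation is an assumption made for illustration: a structure is a pair `(view, None)` for a View or `(view, index_name)` for an index, `T` holds the default cost of each query, and `edges` maps `(query, view, index_or_None)` to a cost. All numbers are invented.

```python
def tau(T, edges, M):
    """Total cost of answering every query using only the structures in M."""
    total = 0
    for q, default in T.items():
        best = default
        for (eq, v, k), t in edges.items():
            # An edge is usable if its View is in M and, when it names an index,
            # that index is in M as well.
            usable = (v, None) in M and (k is None or (v, k) in M)
            if eq == q and usable:
                best = min(best, t)
        total += best
    return total

def benefit(T, edges, C, M):
    """B(C, M) = tau(G, M) - tau(G, M ∪ C)."""
    return tau(T, edges, M) - tau(T, edges, M | C)

T = {"q1": 100, "q2": 100}
edges = {("q1", "ps", None): 40, ("q1", "ps", "I_sp"): 5,
         ("q2", "ps", None): 40, ("q2", "p", None): 10}
size = {("ps", None): 8, ("ps", "I_sp"): 8, ("p", None): 2}

view_ps, index_sp = {("ps", None)}, {("ps", "I_sp")}
per_space = benefit(T, edges, view_ps, set()) / sum(size[s] for s in view_ps)
print(tau(T, edges, set()), benefit(T, edges, view_ps, set()), per_space)
```

An index without its View brings no benefit in this model, which is why the algorithms only ever add an index together with, or after, the View it is built on.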


    a. The r-structure algorithm

    Given: the query - view graph G
           the space limit S

    BEGIN
      M = ∅; /* M = the set of selected structures */
      While (S(M) < S)
      BEGIN
        Find all sets of Views and Indexes of one of the following forms:
          {vi, Iij1, Iij2, ..., Iijp} such that vi ∉ M, Iijl ∉ M for 1 ≤ l ≤ p, 0 ≤ p < r
          or {Iij} such that the View vi of Iij is in M and Iij ∉ M.
        Choose C as the one of these sets whose benefit per unit space with respect to M is maximal.
        Set M = M ∪ C;
      END while
      Return M;
    END;

    The r-structure algorithm proceeds in steps, and each step selects a set C that contains at most r structures. C consists of:

    - a View and some of its indexes, or
    - an index whose View was selected in an earlier step.

    The key point of the algorithm is to choose C at each step so that its benefit with respect to M is maximal.

    Evaluation of the algorithm: assume there are n Views in the Data Cube and each View has at most l indexes. At each step the r-structure algorithm then has to compute the benefit of $n \cdot l + n \cdot \binom{l}{r-1}$ sets. The complexity of the algorithm is therefore $O(k\,m^{r})$, where m is the number of structures in the graph G and k is the number of structures selected by the algorithm, which in the worst case equals S.
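A runnable sketch of the r-structure algorithm follows. The data representation is assumed: structures are `(view, None)` or `(view, index_name)`, with invented costs and sizes. Two details are additions not present in the pseudocode: candidates that would exceed S are skipped, and the loop stops when no candidate has a positive benefit.

```python
from itertools import combinations

def tau(T, edges, M):
    """Total query cost when only the structures in M are materialised."""
    total = 0
    for q, default in T.items():
        best = default
        for (eq, v, k), t in edges.items():
            usable = (v, None) in M and (k is None or (v, k) in M)
            if eq == q and usable:
                best = min(best, t)
        total += best
    return total

def space(size, C):
    return sum(size[s] for s in C)

def r_structure(T, edges, size, S, r):
    M = set()
    while space(size, M) < S:
        candidates = []
        for v in [s for s in size if s[1] is None]:
            free = [s for s in size
                    if s[0] == v[0] and s[1] is not None and s not in M]
            if v not in M:
                # A new View together with p of its indexes, 0 <= p < r.
                for p in range(r):
                    candidates += [{v, *combo} for combo in combinations(free, p)]
            else:
                # A single index whose View is already selected.
                candidates += [{i} for i in free]
        base = tau(T, edges, M)
        best, best_ratio = None, 0.0
        for C in candidates:
            if space(size, M | C) > S:      # added guard: stay within S
                continue
            ratio = (base - tau(T, edges, M | C)) / space(size, C)
            if ratio > best_ratio:
                best, best_ratio = C, ratio
        if best is None:                    # added guard: nothing useful fits
            break
        M |= best
    return M

T = {"q1": 100, "q2": 100}
edges = {("q1", "ps", None): 40, ("q1", "ps", "I_sp"): 5,
         ("q2", "ps", None): 40, ("q2", "p", None): 10}
size = {("ps", None): 8, ("ps", "I_sp"): 8, ("p", None): 2}

chosen = r_structure(T, edges, size, S=20, r=1)
print(sorted(chosen, key=str), tau(T, edges, chosen))
```

With r = 1 every step adds exactly one structure; larger r lets a View be judged together with its indexes at the price of many more candidate sets per step.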


    b. The general algorithm

    As above, each step of the algorithm selects a set C consisting of:

    - a View and some of its indexes, with no limit on their number, or
    - an index whose View was selected in an earlier step.

    Note that the size of C is not bounded by r as in the previous algorithm. Each step of the algorithm has two parts:

    - For each View vi we build a set IGi that initially contains only vi. Indexes are then added to IGi one by one until the benefit per unit space of IGi, the set of selected structures, reaches its maximum.
    - Next we choose the Index, among those whose View is already selected, with the maximal benefit per unit space with respect to M.

    This benefit is compared with the benefit of C with respect to M, and whichever is better is added to M.

    The algorithm is described formally as follows:

    Given: the query - view graph G
           the space limit S

    BEGIN
      M = ∅; /* M = the set of selected structures */
      While (S(M) [...]


          IG = IG ∪ Iic;
        END while;
        If (B(IG,M)/S(IG) > B(C,M)/S(C) or C = ∅) then C = IG;
      END for
      For each Iij such that vi ∈ M
        If (B(Iij,M)/S(Iij) > B(C,M)/S(C)) then C = {Iij};
      M = M ∪ C;
    END while
    Return M;
    END;

    Evaluation of the algorithm: the complexity of the algorithm is $O(k^{2} m^{r})$, where m is the total number of structures in the graph G and k is the maximum number of structures that fit in the space S, which in the worst case equals S.
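The general algorithm can be sketched in the same style. Because part of the pseudocode is missing from this copy, the inner loop below follows the prose description (grow IG while its benefit per unit space improves). The data representation, the sample numbers and the space checks in `fits` are assumptions.

```python
def tau(T, edges, M):
    """Total query cost when only the structures in M are materialised."""
    total = 0
    for q, default in T.items():
        best = default
        for (eq, v, k), t in edges.items():
            usable = (v, None) in M and (k is None or (v, k) in M)
            if eq == q and usable:
                best = min(best, t)
        total += best
    return total

def space(size, C):
    return sum(size[s] for s in C)

def general_greedy(T, edges, size, S):
    M = set()

    def ratio(C):
        """Benefit of C per unit space with respect to the current M."""
        return (tau(T, edges, M) - tau(T, edges, M | C)) / space(size, C)

    def fits(C):
        return space(size, M | C) <= S      # assumed guard: stay within S

    while space(size, M) < S:
        C = set()
        # Part 1: for each unselected View, grow IG = {View} with its indexes.
        for v in [s for s in size if s[1] is None]:
            if v in M or not fits({v}):
                continue
            IG = {v}
            while True:
                grown = [IG | {i} for i in size
                         if i[0] == v[0] and i[1] is not None and i not in IG
                         and fits(IG | {i}) and ratio(IG | {i}) > ratio(IG)]
                if not grown:
                    break
                IG = max(grown, key=ratio)
            if not C or ratio(IG) > ratio(C):
                C = IG
        # Part 2: single indexes whose View is already in M.
        for i in size:
            if i[1] is not None and (i[0], None) in M and i not in M and fits({i}):
                if not C or ratio({i}) > ratio(C):
                    C = {i}
        if not C or ratio(C) <= 0:
            break
        M |= C
    return M

T = {"q1": 100, "q2": 100}
edges = {("q1", "ps", None): 90, ("q1", "ps", "I_sp"): 5,
         ("q2", "ps", None): 90, ("q2", "ps", "I_ps"): 5}
size = {("ps", None): 8, ("ps", "I_sp"): 8, ("ps", "I_ps"): 8}

print(sorted(general_greedy(T, edges, size, S=24), key=str))
```

In this example the View alone has a low benefit per unit space, but the set grown with its indexes is much better, so the whole group is selected in a single step.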

    3.4.3. Conclusion

    The performance of OLAP queries depends heavily on the summary tables built for the Views in the data warehouse. To process queries more efficiently, we can build indexes (Index) on the views (View). The two algorithms above describe how to select the Views (or Subcubes) and determine the Indexes that should be precomputed in order to process OLAP queries on the data warehouse more efficiently.


    Chapter IV. Data-driven decision support systems

    4.1. Decision support systems

    4.1.1. Introduction

    Ngay tnh