POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY (Học viện Công nghệ Bưu chính Viễn thông)
---------------------------------------
Vũ Thị Gương
TIME SERIES DATA MINING TECHNIQUES APPLIED TO STOCK FORECASTING
Major: Data Transmission and Computer Networks
Code: 60.48.15
MASTER'S THESIS SUMMARY
HANOI - 2012



The thesis was completed at:

POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY

Scientific supervisor: Dr. NGUYỄN ĐỨC DŨNG

Reviewer 1: ....................................................................

Reviewer 2: ....................................................................

The thesis will be defended before the Master's Thesis Examination Committee at the Posts and Telecommunications Institute of Technology

At: ..... (hour), day ..... month ..... year ............

The thesis can be consulted at:

- The library of the Posts and Telecommunications Institute of Technology

INTRODUCTION

1. Rationale for the topic

As society develops, the volume of information grows explosively. This enormous amount of data is an invaluable resource if we know how to discover and exploit the useful information it contains. The problem, then, is how to store and exploit that data. Traditional data analysis methods increasingly fail to meet practical needs. A new technical trend has therefore emerged: Knowledge Discovery and Data Mining (KDD).

Data mining technology makes it possible to obtain useful knowledge by extracting information that shows particular relationships or correlations from a large (or very large) data repository, information that could not otherwise be recognized. This knowledge is then used for search problems, for forecasting future trends and behavior, and for many other intelligent functions. Today, data mining technologies are widely applied in almost every field, including data analysis and forecasting.

One of the most important problems in modern finance is finding effective ways to summarize and visualize stock market data, so that individuals and organizations receive useful information about market behavior to support their investment decisions. The large amount of valuable data generated by the stock market has attracted researchers to explore this problem using a variety of methods.

In Vietnam the stock market is still fairly new, but its potential and considerable benefits are widely recognized. Exploiting this market successfully would bring high economic returns, and stock market forecasting is an important part of doing so. For this reason I chose the topic "Time series data mining techniques applied to stock forecasting" for my thesis. The aim is to understand data mining technology and its major applications in forecasting and predicting future trends, especially in financial and stock markets, in order to support appropriate investment and trading decisions.

2. Research objectives

- Study the concept, role, applications, and techniques of data mining.
- Study time series analysis techniques in data mining as applied to forecasting in general and to stock market forecasting in particular.
- Study the ARIMA (AutoRegressive Integrated Moving Average) model, covering model identification, parameter estimation, and forecasting based on the estimated parameters that are selected as optimal. Test the ARIMA model on real time series data, applying it to stock data with the goal of forecasting stock prices.

3. Subject and scope of the research

The thesis studies data mining techniques, focusing on time series analysis applied to forecasting the rise and fall of the stock market. The ARIMA model is tested on data for VNIndex, ABT, and ACB.

4. Research method

- Study the theory of data mining techniques.
- Study and analyze financial and stock data.
- Study the theoretical basis of the ARIMA model for real time series data and how to apply it to a practical problem: forecasting the rise and fall of the stock market.
- Build and run an ARIMA model and apply it to time series data mining for financial and stock forecasting.
- Use the Eviews software to run the model.
- Evaluate the resulting forecasts.

5. Structure of the thesis

The main content of the thesis is divided into three chapters:

Chapter 1: Overview of data mining. This chapter gives an overview of the knowledge discovery process and data mining, data mining techniques, and applications of data mining.

Chapter 2: Time series data mining techniques. This chapter introduces real time series data and the forecasting problem that is of current interest in data mining. It presents the theoretical basis of the ARIMA model and the steps for developing a model. The forecasting problem is approached by applying the ARIMA model to real time series. The chapter then introduces the Eviews software used to run the model.

Chapter 3: Applying the ARIMA model to stock forecasting. This chapter presents forecasting experiments on financial and stock data series using the ARIMA model. The modeling steps are carried out with Eviews 6, and the results are reported and evaluated against actual values.

The thesis closes with the conclusion and directions for further development.

Chapter 1: OVERVIEW OF DATA MINING

1.1. Introduction

1.1.1. Concepts

Data mining

Knowledge discovery (KD)

Data mining is the process of extracting information that shows particular relationships or correlations from a large (or very large) data repository, in order to predict future trends and behavior or to find sets of useful information that could not otherwise be recognized.

1.1.2. The knowledge discovery process in databases

Figure 1.1. The knowledge discovery process

1.2. Data mining techniques

1.2.1. Decision trees

1.2.2. Neural networks

1.2.3. Clustering

1.2.4. Association rules

1.2.5. Factor analysis

1.2.6. Time series

1.3. Applications of data mining

1.3.1. Types of data that can be mined

Because data mining is so widely applied, it works with many different kinds of data. Typical examples include relational databases, multidimensional databases (multidimensional structures, data warehouses), transactional databases, object-relational databases, spatial and temporal data, time series data, multimedia databases, and text and Web data.

1.3.2. Applications of data mining

Data mining is a field that attracts wide interest and is widely applied. Typical applications include: (i) data analysis and decision support; (ii) medical treatment; (iii) text mining; (iv) bioinformatics; (v) finance and the stock market; (vi) insurance.

1.3.3. Applications of data mining techniques in the stock market

Typical applications of data mining in financial and stock markets are analyzing financial conditions and forecasting the prices of stocks. These give investors more opportunities to select the stocks to invest in and to choose an appropriate form and scale of trading, with the aim of achieving effective returns.

1.3.3.1. Applications of decision trees

1.3.3.2. Applications of neural networks

1.3.3.3. Applications of clustering

1.3.3.4. Applications of association rules

1.3.3.5. Applications of factor analysis

1.3.3.6. Applications of time series

Chapter 2: TIME SERIES DATA MINING TECHNIQUES

2.1. The forecasting problem

Forecasting is an indispensable need for human activity in an era of information explosion. It provides the necessary basis for planning, and it can be said that without the science of forecasting, plans for the future would not be very convincing.

There are many methods and techniques for solving the forecasting problem, among them forecasting based on time series. ARIMA is a quantitative time-based forecasting model in which the future value of the forecast variable depends on how that variable has moved in the past (the historical data series).

2.2. Time series data

A time series is a sequence of observations ordered in time. These observations are mostly collected at discrete, equally spaced intervals. Time series models are applied especially to short-term forecasting. In forecasting problems in general, and in financial and stock forecasting in particular, data is usually represented as a time series. Among the kinds of data that are analyzed, time series data is always one of the most common.

2.2.1. Real time series

2.2.2. Long-term trend component

2.2.3. Seasonal component

2.2.4. Cyclical component

2.2.5. Irregular component

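2.3. The ARIMA model for time series data

2.3.1. Tools used in the model

2.3.1.1. Autocorrelation function (ACF)

The sample autocorrelation at lag k is conventionally defined as

$$r_k = \frac{\sum_{t=1}^{n-k} \left(y(t) - \bar{y}\right)\left(y(t+k) - \bar{y}\right)}{\sum_{t=1}^{n} \left(y(t) - \bar{y}\right)^2} \qquad (2.1)$$

where $n$ is the number of observations and $\bar{y}$ is the mean of the series.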
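To make the ACF concrete, the following minimal Python sketch computes the sample autocorrelation of a short series with the conventional estimator: the sum of lagged products of deviations from the mean, divided by the sum of squared deviations. The function name `sample_acf` and the toy series are illustrative assumptions and do not come from the thesis, which relies on Eviews for this step.

```python
def sample_acf(y, max_lag):
    """Return [r_0, r_1, ..., r_max_lag] for the series y.

    r_k = sum_{t}(y[t] - mean)(y[t+k] - mean) / sum_{t}(y[t] - mean)^2
    """
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]          # deviations from the mean
    denom = sum(d * d for d in dev)      # total sum of squares
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series chosen only for illustration (not thesis data).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf = sample_acf(series, 2)
print(acf)
```

A slowly decaying ACF is usually read as a sign that the series needs differencing before an ARMA model is fitted.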

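2.3.1.2. Partial autocorrelation function (PACF)

$$y(t+k) = C_{k1}\,y(t+k-1) + C_{k2}\,y(t+k-2) + \dots + C_{k,k-1}\,y(t+1) + C_{kk}\,y(t) + e(t) \qquad (2.2)$$

The partial autocorrelation at lag k is the coefficient $C_{kk}$. It is computed according to Durbin:

$$C_{kk} = \frac{r_k - \sum_{j=1}^{k-1} C_{k-1,j}\,r_{k-j}}{1 - \sum_{j=1}^{k-1} C_{k-1,j}\,r_j} \qquad (2.3)$$

where $r_k$ denotes the autocorrelation at lag k, $C_{11} = r_1$, and $C_{kj} = C_{k-1,j} - C_{kk}\,C_{k-1,k-j}$ for $j = 1, \dots, k-1$.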
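The Durbin recursion can be sketched in a few lines of Python. This is a minimal illustration under the usual Durbin-Levinson form of the recursion, not the Eviews implementation; the function name `pacf_durbin` is hypothetical, and the input is assumed to be a list of autocorrelations with `r[0] == 1`.

```python
def pacf_durbin(r, max_lag):
    """Partial autocorrelations C_11, C_22, ..., via the Durbin recursion.

    r[k] is the autocorrelation at lag k, with r[0] == 1.
    """
    pacf = []
    prev = []  # prev[j - 1] holds C_{k-1, j}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        ckk = num / den
        # Update the remaining coefficients: C_kj = C_{k-1,j} - C_kk * C_{k-1,k-j}
        cur = [prev[j - 1] - ckk * prev[k - j - 1] for j in range(1, k)]
        cur.append(ckk)
        pacf.append(ckk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: r_k = 0.5 ** k.
r = [0.5 ** k for k in range(4)]
pacf = pacf_durbin(r, 3)
print(pacf)
```

For an AR(1) process the PACF cuts off after lag 1, which is the pattern used to choose the order p when reading a correlogram.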

2.3.1.3. The AR(p) model

$$y(t) = a_0 + a_1 y(t-1) + a_2 y(t-2) + \dots + a_p y(t-p) + e(t) \qquad (2.4)$$

AR(1) model: $y(t) = a_0 + a_1 y(t-1) + e(t)$

AR(2) model: $y(t) = a_0 + a_1 y(t-1) + a_2 y(t-2) + e(t)$

2.3.1.4. The MA(q) model

$$y(t) = b_0 + e(t) + b_1 e(t-1) + b_2 e(t-2) + \dots + b_q e(t-q) \qquad (2.5)$$

MA(1) model: $y(t) = b_0 + e(t) + b_1 e(t-1)$

MA(2) model: $y(t) = b_0 + e(t) + b_1 e(t-1) + b_2 e(t-2)$

2.3.1.5. Differencing I(d)

First difference (I(1)): $z(t) = y(t) - y(t-1)$

Second difference (I(2)): $h(t) = z(t) - z(t-1)$
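As a small illustration of the differencing step, the sketch below applies the first difference d times to a list of values. The helper name `difference` and the sample prices are made up for the example and are not thesis data.

```python
def difference(y, d=1):
    """Apply first differencing d times: z(t) = y(t) - y(t-1)."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


# Illustrative values only (not taken from the thesis data set).
prices = [10.0, 12.0, 15.0, 19.0, 24.0]
z = difference(prices, 1)   # first difference, I(1)
h = difference(prices, 2)   # second difference, I(2)
print(z, h)
```

Each round of differencing shortens the series by one observation, so a series of n values leaves n - d values for fitting the ARMA part.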

2.3.2. The ARIMA model

- ARMA(p,q) model:

$$y(t) = a_0 + a_1 y(t-1) + a_2 y(t-2) + \dots + a_p y(t-p) + e(t) + b_1 e(t-1) + b_2 e(t-2) + \dots + b_q e(t-q) \qquad (2.6)$$

- ARIMA(p,d,q) model: an ARMA(p,q) model fitted to the series after it has been differenced d times.

ARIMA(1,1,1) model:

$$y(t) - y(t-1) = a_0 + a_1 \left(y(t-1) - y(t-2)\right) + e(t) + b_1 e(t-1)$$

or $z(t) = a_0 + a_1 z(t-1) + e(t) + b_1 e(t-1)$, where $z(t) = y(t) - y(t-1)$ is the first difference: d = 1.

Similarly, for ARIMA(1,2,1):

$h(t) = a_0 + a_1 h(t-1) + e(t) + b_1 e(t-1)$, where $h(t) = z(t) - z(t-1)$ is the second difference: d = 2.
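The relationship between the differenced form and the level form of ARIMA(1,1,1) can be checked numerically. The sketch below generates z(t) from the ARMA(1,1) recursion, accumulates it into y(t), and then verifies that the levels satisfy y(t) = a0 + (1 + a1) y(t-1) - a1 y(t-2) + e(t) + b1 e(t-1). The coefficients, shocks, starting level, and the convention that z and e are zero before the first step are all assumptions chosen for illustration.

```python
def simulate_arima_111(a0, a1, b1, shocks, y0):
    """Generate levels y(t) from z(t) = a0 + a1*z(t-1) + e(t) + b1*e(t-1).

    Assumes z and e are zero before the first step.
    """
    y = [y0]
    z_prev, e_prev = 0.0, 0.0
    for e in shocks:
        z = a0 + a1 * z_prev + e + b1 * e_prev  # ARMA(1,1) on the differences
        y.append(y[-1] + z)                     # integrate once: y(t) = y(t-1) + z(t)
        z_prev, e_prev = z, e
    return y


# Hypothetical parameters and shocks, for illustration only.
a0, a1, b1 = 0.1, 0.5, 0.3
shocks = [1.0, -0.5, 0.2, 0.0]
y = simulate_arima_111(a0, a1, b1, shocks, y0=100.0)

# Check the equivalent level form for every t with two past levels available.
gaps = []
for t in range(2, len(y)):
    e_t, e_prev = shocks[t - 1], shocks[t - 2]
    rhs = a0 + (1 + a1) * y[t - 1] - a1 * y[t - 2] + e_t + b1 * e_prev
    gaps.append(abs(y[t] - rhs))
print(y, max(gaps))
```

The check confirms that differencing only rewrites the model; the same coefficients describe both forms.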

2.3.3. Steps in developing the model

2.3.3.1. Model identification

2.3.3.2. Parameter estimation

2.3.3.3. Accuracy testing

2.3.3.4. Forecasting
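For the forecasting step, a multi-step forecast from an ARMA(1,1) model can be produced recursively: future shocks are replaced by their expected value of zero, so the MA term only affects the first step. The sketch below assumes the coefficients have already been estimated; the function name `forecast_arma11` and all numeric values are hypothetical.

```python
def forecast_arma11(a0, a1, b1, last_y, last_e, steps):
    """Recursive forecasts for y(t) = a0 + a1*y(t-1) + e(t) + b1*e(t-1).

    last_y is the last observed value, last_e the last residual.
    """
    if steps < 1:
        return []
    # One step ahead: the future shock e(t+1) is replaced by 0.
    forecasts = [a0 + a1 * last_y + b1 * last_e]
    # Further ahead: the past shock is also unknown, so the MA term drops out.
    for _ in range(steps - 1):
        forecasts.append(a0 + a1 * forecasts[-1])
    return forecasts


# Hypothetical estimates, for illustration only.
fc = forecast_arma11(a0=1.0, a1=0.5, b1=0.4, last_y=10.0, last_e=0.5, steps=3)
print(fc)
```

With |a1| < 1 the forecasts converge toward the long-run mean a0 / (1 - a1), which is one reason such models are considered suitable mainly for short horizons.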

Figure 2.16. Schematic of the Box-Jenkins model

2.4. The EVIEWS software

2.4.1. Introduction to the Eviews application

Figure 2.17. The main Eviews window [Source: Eviews 5 User's Guide, p. 16]

2.4.2. Using Eviews to carry out the steps of the ARIMA model

2.4.2.1. Model identification

2.4.2.2. Model estimation and model checking

2.4.2.3. Forecasting

Chapter 3: APPLYING THE ARIMA MODEL TO STOCK FORECASTING

3.1. Financial and stock data

Stock data is regarded as a multivariate time series because several attributes are recorded at each point in time. The attributes of stock data are: Open, High, Low, Close, Volume.

3.2. The ARIMA model for stock forecasting

3.2.1. The model-building process

- Model identification
- Model estimation and checking
- Forecasting

3.2.2. Designing the ARIMA model for the data

The steps for building a model are as follows:

1. Select the variable
2. Prepare the data
   - Determine whether the data series is stationary
   - Determine the seasonal component
   - Determine the trend component
3. Determine the orders p and q of the ARMA model
4. Estimate the parameters and diagnose the best-fitting model
5. Make short-term forecasts

3.3. Experiments

The ARIMA model and the Box-Jenkins method are used to carry out three short-term forecasts of closing prices, based on the historical data series of each ticker: VnIndex, the stock ABT (Ben Tre Aquatic Products Import-Export Joint Stock Company), and the stock ACB (Asia Commercial Joint Stock Bank, Ngân hàng Thương mại cổ phần Á Châu).

3.3.1. Experimental environment

3.3.2. Input data

The input data for the thesis was taken from http://www.cophieu68.com/datametastock.php. It consists of three .CSV files, one for each of the three tickers, downloaded from that website. The data has the following form:

Figure 3.1. Input data.

Create the workfiles.

3.3.3. Data processing

3.3.3.1. Checking the stationarity of the stock series

Stationarity is assessed from the plot of the closing price variable of each stock series.

Figure 3.6. Closing price chart of ABT

3.3.3.2. Model identification

- The parameters p, d, and q of the ARIMA model for each ticker are determined from the autocorrelation plots.

Figure 3.9. SAC and SPAC plots of the GIADONGCUA (closing price) series of VNINDEX

3.3.3.3. Estimation and testing of the ARIMA model

Figure 3.16. Estimating the ARIMA(1,0,1) model for ABT

Figure 3.17. Results of the ARIMA(1,0,1) model for ABT

Figure 3.18. Checking the residuals of the ABT series

Table 3.2. Evaluation criteria for the ARIMA models of ABT

ARIMA model   | BIC      | Adjusted R2 | SEE
ARIMA(1,0,0)  | 2.385271 | 0.814950    | 0.782972
ARIMA(1,0,1)  | 2.345217 | 0.825445    | 0.760445
ARIMA(1,0,2)  | 2.397569 | 0.816063    | 0.780614

The model chosen for the ABT series is ARIMA(1,0,1), which has the lowest BIC, the highest adjusted R2, and the lowest SEE of the three candidates.
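The selection among candidate models can be reproduced from the values in Table 3.2. The sketch below simply picks the candidate that is best on each criterion (lowest BIC, highest adjusted R2, lowest SEE); the dictionary layout and variable names are illustrative assumptions, while the numbers are copied from the table.

```python
# Criteria copied from Table 3.2 (ABT series).
candidates = {
    "ARIMA(1,0,0)": {"bic": 2.385271, "adj_r2": 0.814950, "see": 0.782972},
    "ARIMA(1,0,1)": {"bic": 2.345217, "adj_r2": 0.825445, "see": 0.760445},
    "ARIMA(1,0,2)": {"bic": 2.397569, "adj_r2": 0.816063, "see": 0.780614},
}

# Lower is better for BIC and SEE, higher is better for adjusted R2.
best_by_bic = min(candidates, key=lambda m: candidates[m]["bic"])
best_by_r2 = max(candidates, key=lambda m: candidates[m]["adj_r2"])
best_by_see = min(candidates, key=lambda m: candidates[m]["see"])

print(best_by_bic, best_by_r2, best_by_see)
```

Here all three criteria agree. When they disagree, a common convention is to give priority to the information criterion, but that choice is a judgment call and is not specified in the thesis summary.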

3.3.4. Forecasting

Closing prices of VNINDEX, ABT, and ACB are forecast for 8 days, from 11/09/2012 to 20/09/2012.

Figure 3.22. Forecasting

Figure 3.23. Forecast results for VNINDEX.

Table 3.4. Forecast prices of VNINDEX compared with actual prices

Date       | Forecast price | Actual price | Difference (forecast - actual) | Error (%)
11/09/2012 | 390.8433       | 386.6        |  4.2433                        | 1.09
12/09/2012 | 391.1221       | 388.4        |  2.7221                        | 0.70
13/09/2012 | 391.3961       | 391.4        | -0.0039                        | ~0.00
14/09/2012 | 391.6655       | 398.9        | -7.2345                        | 1.85
17/09/2012 | 391.9303       | 401.8        | -9.8697                        | 2.52
18/09/2012 | 392.1906       | 394.5        | -2.3094                        | 0.59
19/09/2012 | 392.4465       | 394.6        | -2.1535                        | 0.55
20/09/2012 | 392.6980       | 389.3        |  3.3980                        | 0.87

Evaluation: the forecasts are fairly accurate (the error is very low, ranging from approximately 0% to 2.52%).
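The figures in Table 3.4 can be recomputed with a short script. The sketch below takes the forecast and actual prices from the table and derives the difference and the percentage error. Dividing by the forecast price rather than the actual price is an inference made here because it reproduces the published percentages; the summary itself does not state the formula.

```python
# (date, forecast price, actual price) copied from Table 3.4.
rows = [
    ("11/09/2012", 390.8433, 386.6),
    ("12/09/2012", 391.1221, 388.4),
    ("13/09/2012", 391.3961, 391.4),
    ("14/09/2012", 391.6655, 398.9),
    ("17/09/2012", 391.9303, 401.8),
    ("18/09/2012", 392.1906, 394.5),
    ("19/09/2012", 392.4465, 394.6),
    ("20/09/2012", 392.6980, 389.3),
]

results = []
for date, forecast, actual in rows:
    diff = forecast - actual
    # Assumption: error is measured relative to the forecast price.
    pct = abs(diff) / forecast * 100
    results.append((date, round(diff, 4), round(pct, 2)))

for date, diff, pct in results:
    print(f"{date}  diff={diff:+.4f}  error={pct:.2f}%")

max_error = max(pct for _, _, pct in results)
```

The largest error in the eight-day window falls on 17/09/2012, the day with the largest one-day move in the actual index.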

CONCLUSION

The thesis has presented an overview of data mining: its concepts, techniques, and applications. It focuses on time series data mining techniques applied to a practical problem of current interest, namely forecasting in general and stock price forecasting in particular.

The thesis has also presented the theoretical basis of real time series and of the ARIMA model (the tools used in the model and the model-building procedure), as well as the Eviews software and its use in carrying out the steps of the ARIMA model for stock forecasting. The author has gained a basic command of the procedure for using Eviews to build an ARIMA model for real time series data and to compute forecast values for stock data series.

The thesis applied the theory studied to experiments on three stock series (the VnIndex index and the tickers ABT and ACB), based on the historical data of each series (257 past observations), and forecast the closing prices of the following 10 days. The forecasts were analyzed, checked, and compared with actual prices, and they proved fairly accurate and highly reliable. This also indicates that the ARIMA model selected for each stock series in the thesis is reasonably well suited to short-term stock price forecasting.

Alongside these results, the thesis still has some limitations:

- The estimation and evaluation algorithms remain limited.
- Trading sessions can also be affected by strong external factors such as investor sentiment, the influence of other stock markets, and news of policy changes, all of which increase the forecast error. The model's results should therefore be treated mainly as a reference. This is only a technical analysis model and cannot yet forecast accurately, because it depends on a single variable, time, whereas forecasting depends on many factors.

Direction for further development: build a multivariate ARIMA model in which the stock price index depends on several different variables.