Data Mining: Concepts and Techniques
Chapter 1 and 2
Slides related to: Introduction and Data preprocessing

Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.

Why Data Mining?
- The Explosive Growth of Data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks, …
    - Science: Remote sensing, bioinformatics, scientific simulation, …
    - Society and everyone: news, digital cameras, YouTube
- We are drowning in data, but starving for knowledge!
- “Necessity is the mother of invention”: Data mining, the automated analysis of massive data sets

Ex. 1: Market Analysis and Management
- Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
  - Determine customer purchasing patterns over time
- Cross-market analysis: Find associations/co-relations between product sales, & predict based on such association
- Customer profiling: What types of customers buy what products (clustering or classification)
- Customer requirement analysis
  - Identify the best products for different groups of customers
  - Predict what factors will attract new customers
- Provision of summary information
  - Multidimensional summary reports
  - Statistical summary information (data central tendency and variation)
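
The "statistical summary information" bullet above can be illustrated with a small sketch. It uses Python's standard `statistics` module; the `summarize` helper and the spending figures are made up for illustration and do not come from the slides.

```python
import statistics

def summarize(values):
    """Return simple measures of central tendency and variation."""
    return {
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency, robust to extremes
        "stdev": statistics.stdev(values),    # variation (sample standard deviation)
        "range": max(values) - min(values),   # variation (spread between extremes)
    }

# Hypothetical yearly spending for five customers (made-up values).
spending = [1200, 1500, 1500, 2100, 9700]
summary = summarize(spending)
print(summary)
```

In this toy data the mean is pulled far above the median by one large value, which is why a summary report usually shows more than one measure.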

Ex. 2: Corporate Analysis & Risk Management
- Finance planning and asset evaluation
  - cash flow analysis and prediction
  - contingent claim analysis to evaluate assets
  - cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
- Resource planning
  - summarize and compare the resources and spending
- Competition
  - monitor competitors and market directions
  - group customers into classes and a class-based pricing procedure
  - set pricing strategy in a highly competitive market

Ex. 3: Fraud Detection & Mining Unusual Patterns
- Approaches: Clustering & model construction for frauds, outlier analysis
- Applications: Health care, retail, credit card service, telecomm.
  - Auto insurance: ring of collisions
  - Money laundering: suspicious monetary transactions
  - Medical insurance
    - Professional patients, ring of doctors, and ring of references
    - Unnecessary or correlated screening tests
  - Telecommunications: phone-call fraud
    - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
    - Analysts estimate that 38% of retail shrink is due to dishonest employees
  - Anti-terrorism

Evolution of Database Technology
- 1960s:
  - Data collection, database creation, IMS and network DBMS
- 1970s:
  - Relational data model, relational DBMS implementation
- 1980s:
  - Advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, temporal, multimedia, etc.)
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
- Watch out: Is everything “data mining”?
  - Simple search and query processing
  - (Deductive) expert systems

Knowledge Discovery (KDD) Process
- Data mining: core of knowledge discovery process

[Figure: KDD process diagram. Labels: Databases; Data Cleaning; Data Integration; Data Warehouse; Selection and transformation; Task-relevant Data; Data Mining; Pattern evaluation and presentation]
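
As a rough sketch of how the stages named in the figure chain together, the toy pipeline below maps each stage to one small function. Every function name, the record layout, and the data are hypothetical; a real system would do far more at each stage, and the "mining" step here is only a frequency count.

```python
from collections import Counter

def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [rec for src in sources for rec in src]

def clean(records):
    """Data cleaning: drop records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, field):
    """Selection and transformation: keep one task-relevant field, lower-cased."""
    return [r[field].lower() for r in records]

def mine(values):
    """Data mining (toy version): count how often each value occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only values seen at least min_count times."""
    return {k: n for k, n in patterns.items() if n >= min_count}

# Two made-up source databases.
store_a = [{"id": 1, "item": "Beer"}, {"id": 2, "item": None}]
store_b = [{"id": 3, "item": "beer"}, {"id": 4, "item": "Diaper"}]

result = evaluate(mine(select(clean(integrate(store_a, store_b)), "item")), 2)
print(result)
```

The point of the sketch is only the ordering: mining runs on cleaned, integrated, task-relevant data, and its output is filtered again before presentation.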

Why Data Preprocessing?
- Data in the real world is dirty
  - incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    - e.g., occupation=“”
  - noisy: containing errors or outliers
    - e.g., Salary=“-10”
  - inconsistent: containing discrepancies in codes or names
    - e.g., Age=“42” Birthdate=“03/07/1997”
    - e.g., Was rating “1,2,3”, now rating “A, B, C”
    - e.g., discrepancy between duplicate records
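
The three kinds of dirty data listed above can each be caught by a simple check. The sketch below reuses the slide's example values; the `find_problems` helper, the record layout, the reading of the birthdate as month/day/year, and the reference date (taken from the 2006 copyright year) are all assumptions made for illustration.

```python
from datetime import date

def find_problems(record, today):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Incomplete: a required attribute has no value.
    if not record.get("occupation"):
        problems.append("incomplete: occupation is missing")
    # Noisy: the value is outside any plausible range.
    if record["salary"] < 0:
        problems.append("noisy: salary is negative")
    # Inconsistent: two attributes contradict each other.
    born = record["birthdate"]
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    if age != record["age"]:
        problems.append("inconsistent: age does not match birthdate")
    return problems

record = {
    "occupation": "",
    "salary": -10,
    "age": 42,
    "birthdate": date(1997, 3, 7),  # assumes "03/07/1997" means March 7
}
problems = find_problems(record, today=date(2006, 1, 1))
for p in problems:
    print(p)
```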

Why Is Data Dirty?
- Incomplete data may come from
  - “Not applicable” data value when collected
  - Different considerations between the time when the data was collected and when it is analyzed.
  - Human/hardware/software problems
- Noisy data (incorrect values) may come from
  - Faulty data collection instruments
  - Human or computer error at data entry
  - Errors in data transmission
- Inconsistent data may come from
  - Different data sources
  - Functional dependency violation (e.g., modify some linked data)
- Duplicate records also need data cleaning

Why Is Data Preprocessing Important?
- No quality data, no quality mining results!
  - Quality decisions must be based on quality data
    - e.g., duplicate or missing data may cause incorrect or even misleading statistics.
  - Data warehouse needs consistent integration of quality data
- Data extraction, cleaning, and transformation comprises the majority of the work of building a data warehouse

Forms of Data Preprocessing

[Figure: forms of data preprocessing (image not preserved)]

Architecture: Typical Data Mining System

[Figure: architecture diagram. Components shown: Graphical User Interface; Pattern Evaluation; Data Mining Engine; Knowledge-Base; Database or Data Warehouse Server; data cleaning, integration, and selection; Database; Data Warehouse; World-Wide Web; Other Info Repositories]

Why Not Traditional Data Analysis?
- Tremendous amount of data
  - Algorithms must be highly scalable to handle large amounts of data
- High-dimensionality of data
  - Micro-array may have tens of thousands of dimensions
- High complexity of data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structure data, graphs, social networks and multi-linked data
  - Heterogeneous databases and legacy databases
  - Spatial, spatiotemporal, multimedia, text and Web data
- New and sophisticated applications

Data Mining: Classification Schemes
- General functionality
  - Descriptive data mining
  - Predictive data mining
- Different views lead to different classifications
  - Data view: Kinds of data to be mined
  - Knowledge view: Kinds of knowledge to be discovered
  - Method view: Kinds of techniques utilized
  - Application view: Kinds of applications adapted

Data Mining: on what kinds of data?
- Database-oriented data sets and applications
  - Relational database, data warehouse, transactional database
- Advanced data sets and advanced applications
  - Object-relational databases
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Spatial data and spatiotemporal data
  - Text databases and Multimedia databases
  - Data streams and sensor data
  - The World-Wide Web
  - Heterogeneous databases and legacy databases

Data Mining – what kinds of patterns?
- Concept/class description:
  - Characterization: summarizing the data of the class under study in general terms
    - E.g. Characteristics of customers spending more than 10000 sek per year
  - Discrimination: comparing target class with other (contrasting) classes
    - E.g. Compare the characteristics of products that had a sales increase to products that had a sales decrease last year

Data Mining – what kinds of patterns?
- Frequent patterns, association, correlations
  - Frequent itemset
  - Frequent sequential pattern
  - Frequent structured pattern
  - E.g. buy(X, “Diaper”) => buy(X, “Beer”) [support=0.5%, confidence=75%]
    - confidence: if X buys a diaper, then there is a 75% chance that X buys beer
    - support: of all transactions under consideration, 0.5% showed that diaper and beer were bought together
  - E.g. Age(X, “20..29”) and income(X, “20k..29k”) => buys(X, “cd-player”) [support=2%, confidence=60%]
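
The definitions of support and confidence above translate directly into counting. The following is a minimal sketch over a made-up list of five transactions; the function names and the data are illustrative, and the resulting percentages are those of the toy data, not the 0.5% and 75% quoted on the slide.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of all transactions in which the items occur together."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Made-up market baskets.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]

s = support(transactions, {"diaper", "beer"})       # 3 of 5 baskets
c = confidence(transactions, {"diaper"}, {"beer"})  # 3 of the 4 diaper baskets
print(f"support={s:.0%}, confidence={c:.0%}")
```

Note that confidence is not symmetric: the rule beer => diaper would divide by the number of beer baskets instead.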

Data Mining – what kinds of patterns?
- Classification and prediction
  - Construct models (functions) that describe and distinguish classes or concepts for future prediction. The derived model is based on analyzing training data, i.e. data whose class labels are known.
  - E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  - Predict some unknown or missing numerical values
- Cluster analysis
  - Class label is unknown: Group data to form new classes, e.g., cluster customers to find target groups for marketing
  - Maximizing intra-class similarity & minimizing interclass similarity
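
The clustering objective above (high similarity within a class, low similarity between classes) can be checked numerically for a given grouping. The sketch below uses distance as the inverse of similarity on made-up one-dimensional data; the function names and values are hypothetical, and this only evaluates a grouping rather than finding one.

```python
from itertools import combinations, product

def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    dists = [abs(x - y) for c in clusters for x, y in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    dists = [abs(x - y)
             for a, b in combinations(clusters, 2)
             for x, y in product(a, b)]
    return sum(dists) / len(dists)

# Two made-up customer groups, described by a single numeric attribute.
clusters = [[10, 12, 11], [50, 52, 54]]
intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(intra, inter)
```

A good grouping has a small intra-cluster distance relative to the inter-cluster distance, as in this example.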

Data Mining – what kinds of patterns?
- Outlier analysis
  - Outlier: Data object that does not comply with the general behavior of the data
  - Noise or exception? Useful in fraud detection, rare events analysis
- Trend and evolution analysis
  - Trend and deviation
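
One simple way to flag a data object that "does not comply with the general behavior of the data" is a z-score test: measure how many standard deviations each value lies from the mean. This is only one of many outlier-detection techniques and is not prescribed by the slides; the function name, the threshold of 2.0, and the call durations are assumptions for illustration.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Made-up phone-call durations in minutes.
durations = [3, 4, 5, 4, 3, 5, 4, 60]
outliers = z_score_outliers(durations)
print(outliers)
```

Whether a flagged value is noise to be removed or an exception worth investigating (for example, possible fraud) is a decision the test itself cannot make.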

Are All the “Discovered” Patterns Interesting?
- Data mining may generate thousands of patterns: Not all of them are interesting
  - Suggested approach: Human-centered, query-based, focused mining
- Interestingness measures
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.

Find All and Only Interesting Patterns?
- Find all the interesting patterns: Completeness
  - Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns?
  - Heuristic vs. exhaustive search
  - Association vs. classification vs. clustering
- Search for only interesting patterns: An optimization problem
  - Can a data mining system find only the interesting patterns?
  - Approaches
    - First generate all the patterns and then filter out the uninteresting ones
    - Generate only the interesting patterns: mining query optimization
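
The two approaches above can be contrasted on frequent itemsets, taking "interesting" to mean "occurs in at least `min_count` transactions". That definition, the function names, and the data are assumptions for this sketch. The second function uses level-wise, support-based pruning in the spirit of Apriori; it illustrates the idea of not generating uninteresting patterns and is not a full Apriori implementation.

```python
from itertools import combinations

def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def generate_then_filter(transactions, min_count):
    """Approach 1: enumerate every itemset, then discard infrequent ones."""
    universe = sorted(set().union(*transactions))
    candidates = [frozenset(c)
                  for k in range(1, len(universe) + 1)
                  for c in combinations(universe, k)]
    return {c for c in candidates if count(transactions, c) >= min_count}

def level_wise(transactions, min_count):
    """Approach 2: grow itemsets one item at a time, extending only the
    frequent ones. A superset of an infrequent itemset cannot be frequent,
    so no frequent itemset is missed."""
    universe = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in universe
             if count(transactions, frozenset([i])) >= min_count}
    frequent = set()
    while level:
        frequent |= level
        candidates = {s | {i} for s in level for i in universe if i not in s}
        level = {c for c in candidates if count(transactions, c) >= min_count}
    return frequent

# Made-up market baskets.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]

all_then_filtered = generate_then_filter(transactions, min_count=3)
pruned = level_wise(transactions, min_count=3)
print(sorted(sorted(s) for s in pruned))
```

Both functions return the same itemsets; the difference is that the first examines every possible itemset, which grows exponentially with the number of items, while the second never extends an itemset already known to be infrequent.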

Data Mining – what techniques used?

[Figure: Data Mining at the center, drawing on: Database Technology; Statistics; Machine Learning; Pattern Recognition; Algorithm; Visualization; Other Disciplines]

Top-10 Most Popular DM Algorithms: 18 Identified Candidates (I)
- Classification
  - #1. C4.5: Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
  - #2. CART: L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
  - #3. K Nearest Neighbours (kNN): Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. TPAMI. 18(6)
  - #4. Naive Bayes: Hand, D.J., Yu, K., 2001. Idiot's Bayes: Not So Stupid After All? Internat. Statist. Rev. 69, 385-398.
- Statistical Learning
  - #5. SVM: Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.
  - #6. EM: McLachlan, G. and Peel, D. (2000). Finite Mixture Models. J. Wiley, New York.
- Association Analysis
  - #7. Apriori: Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In VLDB '94.
  - #8. FP-Tree: Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In SIGMOD '00.

The 18 Identified Candidates (II)
- Link Mining
  - #9. PageRank: Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In WWW-7, 1998.
  - #10. HITS: Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. SODA, 1998.
- Clustering
  - #11. K-Means: MacQueen, J. B., Some methods for classification and analysis of multivariate observations, in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, 1967.
  - #12. BIRCH: Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: an efficient data clustering method for very large databases. In SIGMOD '96.
- Bagging and Boosting
  - #13. AdaBoost: Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.

The 18 Identified Candidates (III)
- Sequential Patterns
  - #14. GSP: Srikant, R. and Agrawal, R. 1996. Mining Sequential Patterns: Generalizations and Performance Improvements. In Proceedings of the 5th International Conference on Extending Database Technology, 1996.
  - #15. PrefixSpan: J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In ICDE '01.
- Integrated Mining
  - #16. CBA: Liu, B., Hsu, W. and Ma, Y. M. Integrating classification and association rule mining. KDD-98.
- Rough Sets
  - #17. Finding reduct: Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Norwell, MA, 1992
- Graph Mining
  - #18. gSpan: Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In ICDM '02.

Top-10 Algorithm Finally Selected at ICDM’06
- #1: C4.5 (61 votes)
- #2: K-Means (60 votes)
- #3: SVM (58 votes)
- #4: Apriori (52 votes)
- #5: EM (48 votes)
- #6: PageRank (46 votes)
- #7: AdaBoost (45 votes)
- #7: kNN (45 votes)
- #7: Naive Bayes (45 votes)
- #10: CART (34 votes)

A Brief History of Data Mining Society
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998 and SIGKDD Explorations
- More conferences on data mining
  - PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
- ACM Transactions on KDD starting in 2007

Conferences and Journals on Data Mining
- KDD Conferences
  - ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  - SIAM Data Mining Conf. (SDM)
  - (IEEE) Int. Conf. on Data Mining (ICDM)
  - Conf. on Principles and practices of Knowledge Discovery and Data Mining (PKDD)
  - Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Other related conferences
  - ACM SIGMOD
  - VLDB
  - (IEEE) ICDE
  - WWW, SIGIR
  - ICML, CVPR, NIPS
- Journals
  - Data Mining and Knowledge Discovery (DAMI or DMKD)
  - IEEE Trans. On Knowledge and Data Eng. (TKDE)
  - KDD Explorations
  - ACM Trans. on KDD