Text Adaptive Resonance Theory Neural Network for Support Textual Inputs

Text Adaptive Resonance Theory Neural Network

Mr. นรเศรษฐ์ จันทสูตร, 43067138

Information Science program, Class 10, Faculty of Information Technology

King Mongkut's Institute of Technology Ladkrabang

Thesis advisor: Asst. Prof. Dr. วรพจน์ กรีสุระเดช

Outline

• Background and research problem
• Main concepts of the research
• The newly designed neural network used in this research
• Performance evaluation
• Experimental results
• Conclusion

Background and research problem

Background
• The volume of documents in electronic form keeps growing.
• Users need document retrieval that is both accurate and fast.
• Document clustering helps information retrieval.
• The Vector Space Model is used to convert documents into numeric form for the clustering algorithm.

Background and research problem (cont.)

Doc1 = "care, cat, dog"
Doc2 = "care, care, cat, cat, dog"
Index term set = {care, cat, dog}
Example query = "cat", so q = {0, 1, 0}

Doc x index term matrix:

        care   cat   dog
Doc1      1     1     1
Doc2      2     2     1

Similarity in vector space: cosine similarity measure

$$\mathrm{sim}(d_j, q) = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^{2}}\;\sqrt{\sum_{i=1}^{t} w_{i,q}^{2}}}$$

Sim(d1, q) = [(1*0) + (1*1) + (1*0)] / [sqrt(1^2 + 1^2 + 1^2) * sqrt(0^2 + 1^2 + 0^2)] = 1 / (1.732 * 1) ≈ 0.577

Sim(d2, q) = [(2*0) + (2*1) + (1*0)] / [sqrt(2^2 + 2^2 + 1^2) * sqrt(0^2 + 1^2 + 0^2)] = 2 / (3 * 1) ≈ 0.667
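The cosine calculation on this slide can be checked with a few lines of Python. This is only a sketch: the helper names `tf_vector` and `cosine_similarity` are illustrative and not part of the presentation, and returning 0.0 for an all-zero vector is an assumption.

```python
import math

def tf_vector(terms, index_terms):
    """Count how often each index term occurs in a list of terms."""
    return [terms.count(term) for term in index_terms]

def cosine_similarity(d, q):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (norm_d * norm_q)

index_terms = ["care", "cat", "dog"]
doc1 = tf_vector(["care", "cat", "dog"], index_terms)
doc2 = tf_vector(["care", "care", "cat", "cat", "dog"], index_terms)
query = tf_vector(["cat"], index_terms)

sim1 = cosine_similarity(doc1, query)
sim2 = cosine_similarity(doc2, query)
print(f"sim(d1, q) = {sim1:.3f}, sim(d2, q) = {sim2:.3f}")
```

Every document vector here has one entry per index term, which is why the vectors grow with the vocabulary.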

Problem
• In document clustering, documents pass through document representation based on the Vector Space Model, which can lead to the high-dimensional vector problem as the number of index terms grows.
• Under the Vector Space Model, the words in the index term set must be defined in advance.

Background and research problem (cont.)

If k is too large, the high-dimensional vector problem arises.

doc x term matrix:

$$\begin{bmatrix}
tf(doc_1, term_1) & tf(doc_1, term_2) & tf(doc_1, term_3) & \cdots & tf(doc_1, term_k) \\
tf(doc_2, term_1) & tf(doc_2, term_2) & tf(doc_2, term_3) & \cdots & tf(doc_2, term_k) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
tf(doc_n, term_1) & tf(doc_n, term_2) & tf(doc_n, term_3) & \cdots & tf(doc_n, term_k)
\end{bmatrix}$$

[Figure: Term 1, Term 2, Term 3, Term 4, ..., Term k as inputs to unsupervised learning ANNs]

Research objective

To study and develop document clustering with an artificial neural network that uses the Text Adaptive Resonance Theory Neural Network algorithm and can take words (text) directly as input for processing.

Expected benefits

• Reduce the high-dimensional vector problem of words.
• Make document clustering more convenient by feeding words (text) directly into the neural network for processing.

Main concepts of the research

• Adaptive Resonance Theory Neural Networks
• Similarity Measure for Symbolic Object
• The developed neural network uses the Text Adaptive Resonance Theory Neural Network algorithm

Binary ART Neural Network Architecture

Architecture of the Binary Adaptive Resonance Theory Neural Network

[Figure: Binary ART architecture showing the F1(a) layer, F1(b) layer, F2 layer, and the vigilance parameter]

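ART1 learning algorithm

1. Initialize the weights randomly.
2. Compute $\|s\|$.
3. Set $x = s$.
4. For every output node, compute $y_j = \sum_i b_{ij} x_i$.
5. Find the output node with the largest value, $y_J = \max\{y_j\}$; it is the winning node.
6. Combine the input with the top-down weight of the winning node: $x_i = s_i t_{Ji}$.
7. Test for resonance: $\dfrac{\|x\|}{\|s\|} \geq \rho$.

ART1 learning algorithm (cont.)

8. Update the weights of the winning node J:

$$b_{iJ}^{(new)} = \frac{L\, x_i}{L - 1 + \|x\|}$$

$$t_{Ji}^{(new)} = x_i$$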
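The eight steps can be sketched in plain Python as follows. This is an illustrative sketch, not the presenter's code: the function name `art1_train`, the default values of `rho` and `L`, and the deterministic initial weights (the slide initializes them randomly) are all assumptions made to keep the example reproducible.

```python
def art1_train(patterns, n_clusters, rho=0.7, L=2.0):
    """One pass of ART1 over binary patterns; returns (assignments, b, t)."""
    n = len(patterns[0])
    # Assumption: fixed initial weights instead of random ones.
    b = [[1.0 / (1.0 + n)] * n for _ in range(n_clusters)]  # bottom-up b[j][i]
    t = [[1] * n for _ in range(n_clusters)]                # top-down  t[j][i]
    assignments = []
    for s in patterns:
        norm_s = sum(s)                       # step 2: ||s||
        inhibited = set()
        winner = None
        while len(inhibited) < n_clusters:
            # steps 3-4: x = s, y_j = sum_i b_ij * x_i (inhibited nodes get -1)
            y = [sum(b[j][i] * s[i] for i in range(n)) if j not in inhibited else -1.0
                 for j in range(n_clusters)]
            J = max(range(n_clusters), key=lambda j: y[j])   # step 5
            x = [s[i] * t[J][i] for i in range(n)]           # step 6
            norm_x = sum(x)
            if norm_s > 0 and norm_x / norm_s >= rho:        # step 7
                for i in range(n):                           # step 8
                    b[J][i] = L * x[i] / (L - 1 + norm_x)
                    t[J][i] = x[i]
                winner = J
                break
            inhibited.add(J)  # reset: exclude this node and search again
        assignments.append(winner)
    return assignments, b, t

assignments, b, t = art1_train([[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]], n_clusters=2)
print(assignments)
```

A pattern that no node accepts is assigned `None` in this sketch; a full implementation would need to decide whether to add nodes or lower the vigilance in that case.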

Text ART Neural Network Architecture

Architecture of the Text Adaptive Resonance Theory Neural Network

[Figure: Text ART architecture showing the F1(a) layer, F1(b) layer, F2 layer, and the vigilance parameter]

Document Representation

Each document can be written as a Cartesian product:

$$Doc = D_1 \times D_2 \times D_3 \times \dots \times D_d$$

where $D_d$ is the d-th feature of the document.

Document Representation (cont.)

This research uses two features of a news item:
• feature Title
• feature Keyword

A news item can therefore be represented as

$$Doc = Title \times Keyword$$

Text ART Neural Network

Document input: Title = {money, bank}, Keyword = {economic, market, finance, interest}

[Figure: Text ART network with the vigilance parameter, receiving {money, bank} and {economic, market, finance, interest} as inputs]

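Text ART Neural Network

A weight of the Text ART Neural Network consists of
• words
• the association degree of each word

Top-down weight:

$$t_{ji} = \{(B_{1ji}, e_{1ji}), (B_{2ji}, e_{2ji}), \dots, (B_{pji}, e_{pji})\}$$

Bottom-up weight:

$$b_{ij} = \{(A_{1ij}, e_{1ij}), (A_{2ij}, e_{2ij}), \dots, (A_{pij}, e_{pij})\}$$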

Similarity Measure for Symbolic Object

The similarity value consists of two parts:
• span similarity, written $S_s$
• content similarity, written $S_c$

Similarity Measure for Symbolic Object (cont.)

$$S_s(A_k, B_k) = \frac{l_a + l_b}{2\, l_s}$$

$$S_c(A_k, B_k) = \frac{\mathit{inters}}{l_s}$$

where
• $l_a$ is the number of elements in feature A
• $l_b$ is the number of elements in feature B
• $\mathit{inters}$ is the number of elements in the intersection of feature A and feature B
• $l_s$ is the number of elements of feature A and feature B combined, minus the number of elements in their intersection: $l_s = l_a + l_b - \mathit{inters}$

Similarity Measure for Symbolic Object (cont.)

The net similarity between documents $A_k$ and $B_k$ is

$$S(A_k, B_k) = S_s(A_k, B_k) + S_c(A_k, B_k)$$

$$S(A, B) = \sum_{k=1}^{d} S(A_k, B_k)$$

where k is the k-th feature of the document and d is the number of defined features.
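The span and content similarity can be sketched with Python sets. The formulas used here, S_s = (la + lb) / (2 * ls) and S_c = inters / ls with ls = la + lb - inters, are the ones that reproduce the worked example values later in the slides (0.75 and 0.5 for {money, bank} against the single word bank). The function names and the choice to return 0.0 for two empty features are assumptions.

```python
def span_similarity(a, b):
    """S_s = (la + lb) / (2 * ls), with ls = la + lb - inters."""
    a, b = set(a), set(b)
    inters = len(a & b)
    ls = len(a) + len(b) - inters
    if ls == 0:
        return 0.0  # assumption: two empty features have no similarity
    return (len(a) + len(b)) / (2 * ls)

def content_similarity(a, b):
    """S_c = inters / ls."""
    a, b = set(a), set(b)
    inters = len(a & b)
    ls = len(a) + len(b) - inters
    if ls == 0:
        return 0.0  # same assumption as above
    return inters / ls

def feature_similarity(a, b):
    """S(A_k, B_k) = S_s + S_c for one feature."""
    return span_similarity(a, b) + content_similarity(a, b)

def document_similarity(doc_a, doc_b):
    """S(A, B): sum of feature similarities over the features of doc_a."""
    return sum(feature_similarity(doc_a[k], doc_b[k]) for k in doc_a)

doc = {"title": {"money", "bank"},
       "keyword": {"economic", "market", "finance", "interest"}}
print(span_similarity(doc["title"], {"bank"}), content_similarity(doc["title"], {"bank"}))
print(document_similarity(doc, doc))
```

With these definitions each feature contributes between 0.5 (no shared words) and 2 (identical word sets), so a document with d features scores between 0.5 * d and 2 * d.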

Text ART Learning Algorithm

• Competition: find the winning node, the one with the highest similarity, by applying the similarity measure to the input document and the bottom-up weights. Nodes with the next-highest similarity are designated as rival nodes.
• Resonance: test for a match by comparing the vigilance threshold with the similarity between the input document and the top-down weight of the winning node.
  - No match: cancel the winning node, select the rival node with the highest similarity as the new winning node, and return to the Resonance step to test again.
  - Match: accept the winning node and go on to the weight update step.
• Update weight

Computing the winning node (Competition)

$$Y_j = \sum_{i=1}^{d} \sum_{n=1}^{p} S(X_i, A_{nij}) \cdot e_{nij}$$

$$Y_J = \max\{Y_j\}$$
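The competition step can be sketched as below, using the initial bottom-up weights listed in the example slides near the end of the deck. The function names and the dictionary layout (feature name to word to association degree) are illustrative assumptions; each weight word is compared with the whole input word set of the same feature, as in the worked example.

```python
def word_similarity(input_words, weight_word):
    """S_s + S_c between an input word set and a single weight word (lb = 1)."""
    la, lb = len(input_words), 1
    inters = 1 if weight_word in input_words else 0
    ls = la + lb - inters
    return (la + lb) / (2 * ls) + inters / ls

def activation(doc, node):
    """Y_j: similarity of every weight word, scaled by its association degree e."""
    return sum(word_similarity(doc[feature], word) * e
               for feature, weight in node.items()
               for word, e in weight.items())

def compete(doc, nodes):
    """Return node indices ranked by activation (winner first) and the raw scores."""
    scores = [activation(doc, node) for node in nodes]
    ranking = sorted(range(len(nodes)), key=lambda j: scores[j], reverse=True)
    return ranking, scores

doc = {"title": {"money", "bank"},
       "keyword": {"economic", "market", "finance", "interest"}}
nodes = [
    {"title": {"interest": 0.5, "bank": 0.5, "credit": 0.5},      # b11
     "keyword": {"interest": 0.5, "bank": 0.5, "finance": 0.5}},  # b21
    {"title": {"compute": 0.5, "science": 0.5, "logic": 0.5},     # b12
     "keyword": {"compute": 0.5, "accuracy": 0.5, "math": 0.5}},  # b22
    {"title": {"logic": 0.5, "bank": 0.5, "biz": 0.5},            # b13
     "keyword": {"interest": 0.5, "bank": 0.5, "cost": 0.5}},     # b23
]
ranking, scores = compete(doc, nodes)
print(ranking, scores)
```

The first index in `ranking` is the winning node; the remaining indices are the rival nodes in the order they would be tried after a reset.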

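Resonance test

$$Z = \frac{S(X, t_J) - (0.5 \cdot d)}{(2 \cdot d) - (0.5 \cdot d)}$$

$$\begin{cases} Z < \text{vigilance}: & \text{reset} \\ Z \geq \text{vigilance}: & \text{update weights} \end{cases}$$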
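The resonance test rescales the similarity S(X, t) using the bounds 0.5 * d and 2 * d and compares the result with the vigilance. A minimal sketch follows; `match_value` and `resonance_search` are hypothetical names, and the similarity values are passed in rather than computed, since only the normalization and the reset loop are being illustrated.

```python
def match_value(similarity, d):
    """Z = (S - 0.5*d) / (2*d - 0.5*d) for a document with d features."""
    low, high = 0.5 * d, 2 * d
    return (similarity - low) / (high - low)

def resonance_search(ranked_similarities, vigilance, d):
    """Walk through (node_id, S(X, t_node)) pairs in competition order.

    Returns the first node whose Z reaches the vigilance, or None if every
    candidate is reset.
    """
    for node_id, similarity in ranked_similarities:
        if match_value(similarity, d) >= vigilance:
            return node_id  # resonance: this node goes on to the weight update
    return None

# Values taken from the worked example in the slides: S(X, t_1) = 2.575, d = 2.
z = match_value(2.575, 2)
accepted = resonance_search([("Y1", 2.575)], vigilance=0.1, d=2)
print(z, accepted)
```

What happens when `resonance_search` returns `None` is not covered by the slides, so the sketch leaves that decision to the caller.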

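Weight update of the Text ART Neural Network

The words of the updated weight are the union of the words in the weight and the words in the input document:

$$b_{iJ}^{(new)} = b_{iJ}^{(old)} \cup X_i \qquad t_{Ji}^{(new)} = t_{Ji}^{(old)} \cup X_i$$

Weight update of the Text ART Neural Network (cont.)

Bottom-up weight:

$$e_{niJ}^{(new)} = \begin{cases}
f\left(e_{niJ}^{(old)} + \alpha\right) & \text{if } A_{niJ} \in b_{iJ} \cap X_i \\
f\left(e_{niJ}^{(old)} - \alpha\right) & \text{if } A_{niJ} \notin b_{iJ} \cap X_i \\
\alpha * 5 & \text{otherwise}
\end{cases}$$

Top-down weight:

$$e_{nJi}^{(new)} = \begin{cases}
f\left(e_{nJi}^{(old)} + \alpha\right) & \text{if } B_{nJi} \in t_{Ji} \cap X_i \\
f\left(e_{nJi}^{(old)} - \alpha\right) & \text{if } B_{nJi} \notin t_{Ji} \cap X_i \\
\alpha * 5 & \text{otherwise}
\end{cases}$$

Here $\alpha$ is the learning rate. The first two cases apply to words already in the weight; the last case applies to words newly added from the input document.

Where f(.) is defined as

$$f(x) = \begin{cases} x & \text{if } 0 \leq x \leq 1 \\ 0 & \text{if } x < 0 \\ 1 & \text{if } x > 1 \end{cases}$$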
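The update rule can be sketched with a dictionary from word to association degree. The names `clamp01` and `update_weight` are illustrative, and the reading of the three cases (raise matched words, lower unmatched words, add new input words at 5 times the learning rate) follows the before/after weights in the example slides at the end of the deck.

```python
def clamp01(x):
    """f(x): keep x in [0, 1]."""
    return max(0.0, min(1.0, x))

def update_weight(weight, input_words, learning_rate=0.01):
    """Return the updated {word: association degree} mapping for one feature."""
    new_weight = {}
    for word, e in weight.items():
        # Words shared with the input are reinforced, the others decay.
        delta = learning_rate if word in input_words else -learning_rate
        new_weight[word] = clamp01(e + delta)
    for word in sorted(input_words):
        if word not in new_weight:
            # Words that appear only in the input join the weight (the union step).
            new_weight[word] = 5 * learning_rate
    return new_weight

b11 = {"interest": 0.5, "bank": 0.5, "credit": 0.5}
new_b11 = update_weight(b11, {"money", "bank"})
print(new_b11)
```

The same function covers bottom-up and top-down weights, because the two rules differ only in which weight set is passed in.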

Clustering performance measures

• Entropy
• F-measure

Clustering performance measures (cont.)

Entropy of each cluster:

$$E_j = -\sum_{i} p_{ij} \log(p_{ij})$$

Total entropy:

$$E_p = \sum_{j=1}^{m} \frac{N_j}{N} \times E_j$$

where
• $N_j$ is the number of members in cluster j
• $N$ is the total number of members
• $m$ is the number of clusters
• $p_{ij}$ is the probability that a sample of cluster $S_j$ belongs to class $C_i$
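The entropy measure can be sketched as follows. The slide writes log without a base, so the natural logarithm used here is an assumption, as are the function names and the layout of the count table (one row of class counts per cluster). The counts in the demonstration are made up for illustration.

```python
import math

def cluster_entropy(class_counts):
    """E_j = -sum_i p_ij * log(p_ij) for the class counts inside one cluster."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # assumption: an empty cluster contributes no entropy
    return -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)

def total_entropy(clusters):
    """Sum of cluster entropies, each weighted by N_j / N."""
    n = sum(sum(counts) for counts in clusters)
    return sum(sum(counts) / n * cluster_entropy(counts) for counts in clusters)

# Made-up example: two pure clusters and one evenly mixed cluster.
clusters = [[10, 0], [0, 10], [5, 5]]
print(round(total_entropy(clusters), 4))
```

A value of 0 means every cluster contains a single class; larger values mean more mixing.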

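Clustering performance measures (cont.)

$$P = \mathrm{precision}(i, j) = \frac{N_{ij}}{N_j} \qquad R = \mathrm{recall}(i, j) = \frac{N_{ij}}{N_i}$$

where
• $N_{ij}$ is the number of members of class $C_i$ in cluster j
• $N_j$ is the number of members of cluster j
• $N_i$ is the number of members of class $C_i$

[Figure: class i and cluster j, with the correct class members found in the cluster]

Clustering performance measures (cont.)

The F-measure of class $C_i$ is

$$F(i) = \frac{2PR}{P + R}$$

The total F-measure is

$$F_p = \frac{\sum_{i} \left(|i| \times F(i)\right)}{\sum_{i} |i|}$$

where $|i|$ is the number of members in class $C_i$.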
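A sketch of the F-measure computation is shown below. The slides do not say which cluster is paired with each class, so taking the best-scoring cluster per class is an assumption (a common convention for clustering evaluation); the function names and the made-up count table are also illustrative.

```python
def f_measure(n_ij, n_i, n_j):
    """F = 2PR / (P + R) with P = n_ij / n_j and R = n_ij / n_i."""
    if n_ij == 0:
        return 0.0
    precision = n_ij / n_j
    recall = n_ij / n_i
    return 2 * precision * recall / (precision + recall)

def total_f_measure(table):
    """table[i][j] = number of members of class i found in cluster j."""
    class_sizes = [sum(row) for row in table]
    cluster_sizes = [sum(col) for col in zip(*table)]
    weighted = 0.0
    for i, row in enumerate(table):
        if class_sizes[i] == 0:
            continue
        # Assumption: F(i) is taken from the cluster that scores best for class i.
        best = max(f_measure(n_ij, class_sizes[i], cluster_sizes[j])
                   for j, n_ij in enumerate(row))
        weighted += class_sizes[i] * best
    return weighted / sum(class_sizes)

# Made-up counts: 2 classes (rows) by 2 clusters (columns).
table = [[8, 2], [1, 9]]
print(round(total_f_measure(table), 3))
```

A total of 1.0 means every class is recovered exactly by one cluster.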

Experiments

Synthesized data
• Synthesized alphabet documents, 3 classes
• Synthesized text documents, 3 classes

Reuters-21578 data
• Reuters news, 3 classes
• Reuters news, 5 classes
• Reuters news, 14 classes

Experiments (cont.)

Synthesized alphabet document

[Figure: sample synthesized alphabet documents with Title and Keyword fields; letters shown: r s t u v and u v w x y]

Experimental results (cont.)

Synthesized alphabet document: learning rate = 0.01, epoch = 6/100, output nodes = 3, vigilance = 0.1

Number of members falling into each cluster:

Class   Cluster 1   Cluster 2   Cluster 3
1           0          32           1
2           0           0          26
3          41           0           0

Entropy = 0.04, F-measure = 0.99

Experiments (cont.)

Synthesized text document

[Figure: sample synthesized text documents with Title and Keyword fields, built from words such as java, network, internet, protocol, cisco, 3com, car, business, travel, hotel, bank, airline, algorithm, database, predict, cluster, market, tour, benz, toyota, money, airway, mail, smtp, http, ftp, compute, intel, web, firewall]

Experimental results (cont.)

Synthesized text document: learning rate = 0.01, epoch = 8/100, output nodes = 3, vigilance = 0.1

Number of members falling into each cluster:

Class   Cluster 1   Cluster 2   Cluster 3
1           2         349           0
2         323           0           0
3           0           0         326

Entropy = 0.01, F-measure = 0.99

Experiments (cont.)

Reuters-21578 data
• Step 1: Extract only the text in the Title and Body of every news item, and index which Topic each news item belongs to.
• Step 2: Take all the news body text (from the Body tag) and extract keywords with the Copernic Summarizer program. When extracting the keywords of each news item, the number of recurring words was set to 10.
• Step 3: Apply stemming to the Title and Keyword text and remove stop words.
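Step 3 can be illustrated with a toy preprocessing function. Everything in it is a stand-in: the stop-word list is a tiny sample, and `naive_stem` is a crude suffix stripper written for this sketch, not the stemmer used in the research (which the slides do not name).

```python
# Assumption: a tiny illustrative stop-word list, not the one used in the research.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "on"}

def naive_stem(word):
    """Strip one common English suffix; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop punctuation and stop words, then stem what is left."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [naive_stem(w) for w in tokens if w and w not in STOP_WORDS]

title_words = preprocess("The banks raised interest rates")
print(title_words)
```

The resulting word list for Title or Keyword is the kind of set-valued feature that the network takes as input.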

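Comparison with the results of TPCLNN

[Figure: bar chart of F-measure for Text ART NN and TPCLNN on Reuters 3, Reuters 5, and Reuters 14; data labels recovered from the chart: 0.85, 0.85, 0.88, 0.84]

[Figure: bar chart of entropy for Text ART NN and TPCLNN; data labels recovered from the chart: 0.45, 0.51]

[Table: number of training loops per dataset for Text ART and TPCLNN]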

Conclusion, problems found, and suggestions

Conclusion
• Reduces the high-dimensional vector problem of words.
• Requires fewer training epochs for the neural network.

Problems found
• Overlap among news items.

Suggestions
• Add a method that can cluster news items belonging to more than one class.

End of presentation. Thank you.

Text ART Neural Network (example)

Document input: Title = {money, bank}, Keyword = {economic, market, finance, interest}

[Figure: Text ART network with the vigilance parameter, receiving {money, bank} and {economic, market, finance, interest} as inputs]

Text ART Neural Network (example)

Learning rate = 0.01
Vigilance = 0.1
Initialize weights: bottom-up weight and top-down weight

Initial bottom-up weight
b11 = {(interest, 0.5), (bank, 0.5), (credit, 0.5)}
b21 = {(interest, 0.5), (bank, 0.5), (finance, 0.5)}
b12 = {(compute, 0.5), (science, 0.5), (logic, 0.5)}
b22 = {(compute, 0.5), (accuracy, 0.5), (math, 0.5)}
b13 = {(logic, 0.5), (bank, 0.5), (biz, 0.5)}
b23 = {(interest, 0.5), (bank, 0.5), (cost, 0.5)}

Initial top-down weight
t11 = {(market, 1), (money, 1), (biz, 1)}
t12 = {(market, 1), (money, 1), (finance, 1)}
t21 = {(algorithm, 1), (biz, 1), (crude, 1)}
t22 = {(interest, 1), (biz, 1), (crude, 1)}
t31 = {(acq, 1), (economic, 1), (war, 1)}
t32 = {(money, 1), (economic, 1), (marvel, 1)}

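Example: computing the winning node (cont.)

Span similarity between X_title and b11

X_title = {money, bank}
b11 = {(interest, 0.5), (bank, 0.5), (credit, 0.5)}

$$S_s(X_{title}, A_{111}) = \frac{2 + 1}{2 \times (2 + 1 - 0)} = 0.5$$

$$S_s(X_{title}, A_{211}) = \frac{2 + 1}{2 \times (2 + 1 - 1)} = 0.75$$

$$S_s(X_{title}, A_{311}) = \frac{2 + 1}{2 \times (2 + 1 - 0)} = 0.5$$

Example: computing the winning node (cont.)

Content similarity between X_title and b11

$$S_c(X_{title}, A_{111}) = \frac{0}{(2 + 1 - 0)} = 0$$

$$S_c(X_{title}, A_{211}) = \frac{1}{(2 + 1 - 1)} = 0.5$$

$$S_c(X_{title}, A_{311}) = \frac{0}{(2 + 1 - 0)} = 0$$

Example: computing the winning node (cont.)

Combined span and content similarity between X_title and b11

$$\sum_{n} S(X_{title}, A_{n11}) \cdot e_{n11} = \sum_{n} \left(S_s(X_{title}, A_{n11}) + S_c(X_{title}, A_{n11})\right) \cdot e_{n11}$$

$$= (0.5 + 0) * 0.5 + (0.75 + 0.5) * 0.5 + (0.5 + 0) * 0.5 = 0.25 + 0.625 + 0.25 = 1.125$$

Example: computing the winning node (cont.)

Span similarity between X_keyword and b21

X_keyword = {economic, market, finance, interest}
b21 = {(interest, 0.5), (bank, 0.5), (finance, 0.5)}

$$S_s(X_{keyword}, A_{121}) = \frac{4 + 1}{2 \times (4 + 1 - 1)} = 0.625$$

$$S_s(X_{keyword}, A_{221}) = \frac{4 + 1}{2 \times (4 + 1 - 0)} = 0.5$$

$$S_s(X_{keyword}, A_{321}) = \frac{4 + 1}{2 \times (4 + 1 - 1)} = 0.625$$

Example: computing the winning node (cont.)

Content similarity between X_keyword and b21

$$S_c(X_{keyword}, A_{121}) = \frac{1}{(4 + 1 - 1)} = 0.25$$

$$S_c(X_{keyword}, A_{221}) = \frac{0}{(4 + 1 - 0)} = 0$$

$$S_c(X_{keyword}, A_{321}) = \frac{1}{(4 + 1 - 1)} = 0.25$$

Example: computing the winning node (cont.)

Combined span and content similarity between X_keyword and b21

$$\sum_{n} S(X_{keyword}, A_{n21}) \cdot e_{n21} = \sum_{n} \left(S_s(X_{keyword}, A_{n21}) + S_c(X_{keyword}, A_{n21})\right) \cdot e_{n21}$$

$$= (0.625 + 0.25) * 0.5 + (0.5 + 0) * 0.5 + (0.625 + 0.25) * 0.5 = 0.4375 + 0.25 + 0.4375 = 1.125$$

Example: computing the winning node (cont.)

Net similarity of node Y1: 1.125 + 1.125 = 2.25
Net similarity of node Y2: 0.75 + 0.75 = 1.5
Net similarity of node Y3: 1.125 + 0.9375 = 2.0625

Node Y1 is the winning node and goes on to the resonance test.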

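Resonance test (cont.)

$$S(X_{title}, t_{11}) = S_s(X_{title}, t_{11}) + S_c(X_{title}, t_{11}) = 1.125 + 0.25 = 1.375$$

$$S(X_{keyword}, t_{12}) = S_s(X_{keyword}, t_{12}) + S_c(X_{keyword}, t_{12}) = 1.2 + 0.4 = 1.6$$

$$S(X, t_1) = 1.375 + 1.6 = 2.575$$

$$Z = \frac{(2.575) - (0.5 * 2)}{(2 * 2) - (0.5 * 2)} = 0.525$$

$$\rho = 0.1, \quad Z \geq \rho: \text{ update weights}$$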

Weight update of the Text ART Neural Network (cont.)

Input data
X_title = {money, bank}
X_keyword = {economic, market, finance, interest}

Old weights
b11 = {(interest, 0.5), (bank, 0.5), (credit, 0.5)}
b21 = {(interest, 0.5), (bank, 0.5), (finance, 0.5)}
t11 = {(market, 1), (money, 1), (biz, 1)}
t12 = {(market, 1), (money, 1), (finance, 1)}

New weights
b11 = {(interest, 0.49), (bank, 0.51), (credit, 0.49), (money, 0.05)}
b21 = {(interest, 0.51), (bank, 0.49), (finance, 0.51), (economic, 0.05), (market, 0.05)}
t11 = {(market, 0.99), (money, 1), (biz, 0.99), (bank, 0.05)}
t12 = {(market, 1), (money, 0.99), (finance, 1), (economic, 0.05), (interest, 0.05)}
