Link Distribution on Wikipedia

[0422]KwangHee Park

Table of contents

Introduction

Similarity between documents

Error cases

Modify word bag

Conclusion

Introduction

Why focus on links

When someone creates a new article on Wikipedia, they mostly just link to a source in another language or to similar and related articles. After that, the article is written by others.

Assumption

Link terms in a Wikipedia article are the key terms that can represent the specific characteristics of that article.

Introduction

The problem we want to solve

To analyze the latent distribution of a set of target documents by topic modeling.

Topic modeling: our approach

Target

Document = Wikipedia article

Terms = linked terms in the document

Modeling method: LDA

Modeling tool: LingPipe API
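As a rough illustration of the setup above, the sketch below turns an article's wikitext into a bag of link terms. The `[[target|label]]` link syntax is standard MediaWiki markup; the function name `extract_link_terms`, the namespace filter, and the sample text are assumptions made for illustration and are not the author's pipeline, which used LingPipe.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# only the link target is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

def extract_link_terms(wikitext):
    """Return a Counter of link targets found in a piece of wikitext."""
    terms = Counter()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # Skip namespaced links such as File: or Category: (an assumed filter).
        if target and ":" not in target:
            terms[target] += 1
    return terms

# Hypothetical sample text, not taken from a real article.
sample = (
    "[[Cancer]] is a [[disease]] involving abnormal [[cell growth]]. "
    "Some [[tumor|tumors]] are benign; a [[tumor]] may spread. [[File:x.png]]"
)
bag = extract_link_terms(sample)
```

Each article then becomes one "document" whose tokens are its link targets, which is the input shape an LDA implementation expects.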

Advantages of linked terms

No extra preprocessing is needed: boundary detection, stopword removal, word stemming

They carry more semantics: correlation between term and document

Ex) "cancer" as a term, "Cancer" as a document

Preliminary problem

How well do the link terms in a document represent the specific characteristics of that document?

Link evaluation

Calculate the similarity between documents

Link evaluation: similarity-based evaluation

Calculate the similarity between documents: Sim_d{doc1, doc2}

Calculate the similarity between terms: Sim_t{term1, term2}

Compare the two similarities

Similarity between documents: Sim_d

Similarity between documents is significantly affected by the input term set

Data set: 1536 documents

Disease domain: 208

Settlement domain: 1328

p, q = topic distributions of each document

Measure: Kullback-Leibler divergence
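For two discrete topic distributions p and q, the Kullback-Leibler divergence is D_KL(p || q) = sum_i p_i * log(p_i / q_i). A minimal sketch of that computation follows. The slides do not say whether a symmetrized variant or any smoothing was used, so `symmetric_kl`, the `eps` guard, and the three example distributions are assumptions added for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            # eps avoids division by zero when q gives a topic no mass
            # (an assumed safeguard, not something stated in the slides).
            total += p_i * math.log(p_i / max(q_i, eps))
    return total

def symmetric_kl(p, q):
    """Average of both directions, since plain KL is not symmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Hypothetical topic distributions over three topics.
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]
```

A smaller divergence means more similar documents, so here `doc_a` is closer to `doc_b` than to `doc_c`.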

Example: reasonable

Example: not good

Error analysis

Length problem: the topic proportion is overestimated

If a document contains only a few link terms, the topic proportions of that document tend to be overestimated

Ex) 1950 (year), 1960 (year), Papua New Guinea, cannibalism

Error analysis

Some documents' link terms do not describe the document itself

Ex) date, country, etc.

Demo website

For disease domain: http://semanticweb.kaist.ac.kr/research/tmodel/

For settlement domain: http://semanticweb.kaist.ac.kr/research/tmodel/sindex.php

For disease + settlement domain: http://semanticweb.kaist.ac.kr/research/tmodel/dsindex.php

Modify word bag

Include non-link terms

Exclude noise terms

Weighted score for duplicated terms

Include incoming links
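The four modifications listed above could be combined roughly as in the sketch below. The slides do not give the weighting scheme for duplicated terms, so the `1 + log(count)` damping is only one possible reading; the function name `modify_word_bag`, its parameters, and the sample terms are likewise hypothetical.

```python
import math
from collections import Counter

def modify_word_bag(link_terms, non_link_terms=(), incoming_links=(),
                    noise_terms=frozenset()):
    """Build a weighted word bag from several term sources.

    Noise terms (for example dates or country names) are dropped, and a
    repeated term gets weight 1 + log(count) instead of its raw count.
    Both choices are assumptions made for illustration.
    """
    counts = Counter()
    for term in list(link_terms) + list(non_link_terms) + list(incoming_links):
        if term not in noise_terms:
            counts[term] += 1
    return {term: 1.0 + math.log(n) for term, n in counts.items()}

# Hypothetical inputs for one article.
weights = modify_word_bag(
    link_terms=["cancer", "cancer", "1950", "tumor"],
    non_link_terms=["therapy"],
    incoming_links=["oncology"],
    noise_terms={"1950"},
)
```

Damping repeated terms keeps one heavily repeated link from dominating a short document, which relates to the length problem noted in the error analysis.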

Conclusion

Topic modeling with link distribution in Wikipedia

We need to measure how well the link distribution can represent each article's characteristics

After that, analyze the topic distribution in a variety of ways

We expect the topic distribution can be applied to many applications
