Daemin PARK
Korea Press Foundation
Toward a News Data Science
Research Plans
Toward a News Data Science
Research Histories
Toward a News Data Science
Improving Analytics
Designing Systems
Creating Ecosystems
Multi-Level Semantic Network Analysis of News
Level of Analysis | Network Type | Node | Edge | Analysis | Algorithm
Named Entities | 1 mode | Person, Organization | Cooccurrence in articles | Disputant; Relevance | Degree centrality; Tie strength
Named Entities | 1 mode | Topic | Cooccurrence in quotes | Depth of discussion; Relevance | Degree centrality; Tie strength
Named Entities | 2 mode | Person-Topic, Organization-Topic | Cooccurrence in quotes | Specialists/Generalists; Main issues/Peripheral issues | 2-mode degree centrality
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Agenda network; Semantic distance; Semantic path; Main theme; Summary; Particularization | Clustering; Manhattan distance; Path; Degree centrality; Diameter; Clique
Media | 1 mode | Media | Similarity | Uniqueness; Synchronization | Normalized sum of reversed similarity; Ratio of duplication
Park, D.M., Baek, Y.M., & Kim, S.H. (2015). News big data analysis system. Seoul, Korea: Korea Press Foundation.
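The Sentences level lists Manhattan distance as the algorithm for semantic distance between quotes. The following is a minimal sketch of that measure, assuming each quote is represented as a bag-of-words count vector. The slides do not specify the representation, and the function names `bag_of_words` and `manhattan_distance` are hypothetical.

```python
from collections import Counter


def bag_of_words(quote: str) -> Counter:
    """Lowercase, whitespace-split word counts (an assumed representation)."""
    return Counter(quote.lower().split())


def manhattan_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over the union of both vocabularies."""
    return sum(abs(a[w] - b[w]) for w in set(a) | set(b))


q1 = bag_of_words("sanctions will continue")
q2 = bag_of_words("sanctions will be lifted")
print(manhattan_distance(q1, q2))  # 3: "continue", "be", "lifted" each differ by 1
```

Identical quotes have distance 0, and the distance grows by one for every word occurrence that appears in only one of the two quotes.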
News Big Data Analysis System
News Big Data Ecosystem
Park, D.M., Kim, S.H., & Yang J.A. (2014). Strategies for smart news media platform innovation. Seoul, Korea: Korea Press Foundation.
[Diagram: components of the news big data ecosystem and the flows between them]
Big Data Analysis System: Text Mining; Computer Vision; Semantic Net Analysis
Data Driven Services: Expert System; News Startups
Content Provider: Media; User; Experts
Social Media; Advanced Search Engine; CMS; SNS, chatbot; Ads
Flow labels: open API; open source; content; open data; revenue share; archive of unstructured data
Research Plans
Research Histories
Toward a News Data Science
News Source Network Analysis
Park, D.M.(2013). News source network analysis as big data analytics of news articles. Korean Journal of Journalism and Communication Studies. 57(6). 233-261.
Figure panels: 2 persons, 1 article; 2 persons, 2 articles; 4 persons, 1 article; 4 persons, 2 articles
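The panels describe how persons quoted in the same article become linked in a source network. A minimal sketch of that construction follows, assuming each article has already been reduced to the set of persons it quotes. The data and the names `cooccurrence_edges` and `degree` are hypothetical, and this is not necessarily how the cited study builds its networks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: each article reduced to the set of persons it quotes.
articles = [
    {"A", "B", "C", "D"},
    {"A", "B"},
]


def cooccurrence_edges(articles):
    """Map each unordered person pair to the number of articles quoting both."""
    weights = defaultdict(int)
    for persons in articles:
        for u, v in combinations(sorted(persons), 2):
            weights[(u, v)] += 1
    return dict(weights)


def degree(edges):
    """Number of distinct co-quoted persons per person (unweighted degree)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {p: len(n) for p, n in neighbours.items()}


edges = cooccurrence_edges(articles)
print(edges[("A", "B")])  # 2: A and B appear together in both articles
print(degree(edges)["A"])  # 3: A is linked to B, C and D
```

The edge weight can be read as tie strength, and the neighbour count as unnormalized degree centrality.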
Distribution of Semantic Network
Park, D.M., Kim, G.N., & On, B.W.(2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.
1.6±0.2
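The degree distribution of a network is often summarized by the exponent of a power law, p(k) ~ k^(-gamma). The sketch below estimates that exponent with the continuous maximum-likelihood formula, gamma = 1 + n / sum(ln(k_i / k_min)). This is an assumption chosen for illustration: the slides do not state which estimator the cited paper uses, real degrees are discrete, and `estimate_exponent` and `k_min` are hypothetical names.

```python
import math
import random


def estimate_exponent(degrees, k_min=1.0):
    """Continuous maximum-likelihood estimate of gamma in p(k) ~ k^(-gamma).

    Only degrees >= k_min are used; returns (gamma, standard_error).
    """
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    log_sum = sum(math.log(k / k_min) for k in tail)
    gamma = 1.0 + n / log_sum
    return gamma, (gamma - 1.0) / math.sqrt(n)


# Synthetic check: draw from a power law with a known exponent (2.5)
# by inverse-transform sampling, then try to recover it.
random.seed(0)
true_gamma = 2.5
sample = [(1.0 - random.random()) ** (-1.0 / (true_gamma - 1.0)) for _ in range(20000)]
gamma, err = estimate_exponent(sample)
print(f"{gamma:.2f} +/- {err:.2f}")
```

The estimate is sensitive to the choice of `k_min`, so in practice it is worth reporting the cutoff alongside the exponent.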
Fat Tailed, Micro-small World
Park, D.M., Kim, G.N., & On, B.W.(2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.
Important Sources: Barack Obama, Jay Carney, Ban Ki-moon, John Kerry, Victoria Nuland, Kim Hyunwook, Susan Rice, …
Text Mining with NLP & SNA
Stages: Crawling, Advanced NLP, Customized SNA, Discourse Analysis
Crawling
- crawler
- data aggregation
Advanced NLP
- tokenization
- stemming
- stopword elimination
- part-of-speech tagging
- indexing
- sentence boundary recognition
- URL tagging
- co-occurrence analysis
- partial parsing
- named entity recognition
- coreference resolution
- word sense disambiguation
- classification
- clustering
Customized SNA
- projector
- file name standardizer
- edge list converter
- degree centrality
- periodic analysis
- degree exponent
- rank
- quote rank
- description
- fragmentation
Discourse Analysis
- visualization
- data cleansing
- time series content analysis
- governmentality studies
Tools: BigKinds, Semantic Net Analyzer
Park, D.M.(2016). Natural language processing of news articles: A case of ‘NewsSource beta’. Korean Communication Theory. 12(1). 4-52.
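Three of the NLP steps named on this slide (sentence boundary recognition, tokenization, stopword elimination) can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the BigKinds or NewsSource implementation: the regex rules and the stopword list are hypothetical, they target English text, and the naive sentence splitter will break on abbreviations such as "Prof.".

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was"}


def split_sentences(text):
    """Naive sentence boundary recognition: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9']+", sentence.lower()) if t not in STOPWORDS]


text = "The minister denied the report. The opposition demanded an inquiry!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(tokens)
# [['minister', 'denied', 'report'], ['opposition', 'demanded', 'inquiry']]
```

Korean text would need a morphological analyzer in place of the regex tokenizer, since word boundaries and stems cannot be recovered by splitting on non-letters.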
Content Analysis: <News Big Data Analytics & Insights>
Visualization of Millions of News
Park, D.M.(2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.
Automated Time Series Content Analysis
Park, D.M.(2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.
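At its simplest, a time series content analysis counts how often something appears in the articles of each period. A minimal sketch follows, assuming articles arrive as (year, text) pairs and using plain substring matching as a stand-in for the source and quote extraction in the cited study. The records and the function name `yearly_share` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (publication year, article text).
articles = [
    (1990, "The president said the economy is stable."),
    (1990, "Officials said talks will resume."),
    (2016, "The spokesperson said nothing."),
]


def yearly_share(articles, term):
    """Share of articles per year whose text contains `term` (case-insensitive)."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for year, text in articles:
        total[year] += 1
        if term.lower() in text.lower():
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


print(yearly_share(articles, "president"))  # {1990: 0.5, 2016: 0.0}
```

Reporting a share rather than a raw count keeps years with different article volumes comparable.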
Toward a News Data Science
Research Plans
Research Histories
Available Data
Data Sources | Language | Period | No. of Media | No. of Articles | Topics
KINDS | Korean | 1 Jan. 1990 - 30 Jun. 2014 | 66 | About 30 million | All
BIGKINDS | Korean | 1 Jan. 1990 - 31 Aug. 2016 | 44 | About 30 million | All
Naver, Daum | Korean | 1 Jun. 2016 - 30 Jun. 2016 | 200 | About 6 million | All
UPI | English | 4 Jan. 2010 - 16 Jul. 2013 | 1 | About 0.15 million | All
LexisNexis | English | 1 Jan. 1999 - 31 Dec. 2013 | 10* | About 73 thousand | North Korea
* NYT, FT, WP, the Daily Yomiuri (Tokyo), the Nikkei Weekly (Japan), South China Morning Post, The Business Times, The Straits Times, Korea Herald, Korea Times
Type of Named Entities | No. of Entities
Person (Korean) | 116,787
Person (Foreigner) | 6,438
Organization | 489,023 148,405
Rank | 1,035
Current Research Projects
No. | Themes | Collaboration | Progress | Journal
1 | Debating chatbot?: Sentence-level news search engine | Prof. B.W. Suh (SNU) | Prototyping complete | SCI
2 | Is user-centrism a journalistic value?: Social media design based on news big data | Prof. J.S. Lee (SNU) | UI design complete | SCI
3 | Financialization of KPOP | Prof. G.T. Lee (George Mason Uni.) | English draft in progress | SSCI
4 | Political change and journalists’ use of news sources | Prof. Y.M. Baek (Yonsei Uni.) | English draft in progress | SSCI
5 | Politicization of Hallyu | Prof. S.K. Hong (SNU) | Data analysis in progress | SSCI
6 | Time series content analysis on ‘public opinion’, ‘people's voice’, and ‘people's livelihood’ | Dr. S.H. Kim (KPF) | Data analysis in progress | SSCI
7 | Prediction of stock prices (KOSPI) | Prof. W.S. Lee (Dongseo Uni.); Dr. Y.S. Park (Bank of Korea) | Data analysis in progress | KCI
8 | Prediction of North Korea’s provocation | Prof. Y.H. Kim (Sungkyunkwan Uni.) | Data analysis in progress | SSCI
9 | Time series content analysis on ‘social media’ | Prof. E.J. Lee (SNU) | Data crawling complete | SSCI
Integration of Heterogeneous Data for Expert Systems
- Multimedia: text, audio, video, interactive units
- Multilevel: words, sentences, articles, media, systems
- Multilingual: Korean, English, Japanese, Chinese, …
- Multisource: news, reports, journals, literature, behaviors, sensors …
Advanced Methodology
- Opinion Dynamics
- Bayesian Statistics
- Machine Learning
Facebook was not originally created to be a company. It was built to accomplish a social mission: to make the world more open and connected.
Be open, build social value.
Mark Zuckerberg’s Letter to Investors: ‘The Hacker Way’
Q & A