
Toward a news data science


Page 1: Toward a news data science

Daemin PARK

Korea Press Foundation

1

Toward a News Data Science

Page 2: Toward a news data science

Research Plans

Toward a News Data Science

Research Histories

2

Page 3: Toward a news data science

Toward a News Data Science

Improving Analytics

Designing Systems

Creating Ecosystems

3

Page 4: Toward a news data science

Multi-Level Semantic Network Analysis of News

Level of analysis: Named Entities
- 1 mode; nodes: Person, Organization; edge: cooccurrence in articles; analysis (algorithm): Disputant (degree centrality), Relevance (tie strength)
- 1 mode; node: Topic; edge: cooccurrence in quotes; analysis (algorithm): Depth of discussion (degree centrality), Relevance (tie strength)
- 2 mode; nodes: Person-Topic, Organization-Topic; edge: cooccurrence in quotes; analysis (algorithm): Specialists/Generalists, Main issues/Peripheral issues (2-mode degree centrality)

Level of analysis: Sentences
- 1 mode; node: Quotes; edge: cooccurrence in articles + identical sources + similarity; analysis (algorithm): Agenda network (clustering), Semantic distance (Manhattan distance), Semantic path (path), Main theme (degree centrality), Summary (diameter), Particularization (clique)

Level of analysis: Media
- 1 mode; node: Media; edge: similarity; analysis (algorithm): Uniqueness (normalized sum of reversed similarity), Synchronization (ratio of duplication)

Park, D.M., Baek, Y.M., & Kim, S.H. (2015). News big data analysis system. Seoul, Korea: Korea Press Foundation.
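
The named-entity level of this slide (persons linked by cooccurrence in articles, scored by tie strength and degree centrality) can be illustrated with a minimal sketch. The article data, the function names, and the choice to normalize degree by (n - 1) are assumptions made for illustration; the slide does not specify how the original system computes these measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each article is reduced to the set of persons it names.
articles = [
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
]

def cooccurrence_edges(articles):
    """Tie strength: for every pair of persons, count the articles naming both."""
    edges = Counter()
    for persons in articles:
        for pair in combinations(sorted(persons), 2):
            edges[pair] += 1
    return edges

def degree_centrality(edges):
    """Normalized degree: number of distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

edges = cooccurrence_edges(articles)
centrality = degree_centrality(edges)
```

Here the edge weight plays the role of tie strength, and a person who cooccurs with many distinct others receives a high degree centrality.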

4

Page 6: Toward a news data science

News Big Data Ecosystem

Park, D.M., Kim, S.H., & Yang, J.A. (2014). Strategies for smart news media platform innovation. Seoul, Korea: Korea Press Foundation.

Big Data Analysis System
- Text Mining
- Computer Vision
- Semantic Net Analysis

Data Driven Services
- Expert System
- News Startups

Content Provider
- Media
- User
- Experts

Social Media
- Advanced Search Engine
- CMS
- SNS, chatbot
- Ads

Archive

Flow labels in the diagram: open API, open source, content, open data, revenue share, unstructured data

6

Page 7: Toward a news data science

Research Plans

Research Histories

Toward a News Data Science

7

Page 8: Toward a news data science

News Source Network Analysis

Park, D.M. (2013). News source network analysis as big data analytics of news articles. Korean Journal of Journalism and Communication Studies. 57(6). 233-261.

2 persons 1 article

2 persons 2 articles

4 persons 1 article

4 persons 2 articles

8

Page 9: Toward a news data science

Distribution of Semantic Network

Park, D.M., Kim, G.N., & On, B.W. (2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.

1.6±0.2
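
The figure on this slide is not recoverable, so the following is only a sketch under an assumption: that the value above reports a fitted degree exponent of a power-law degree distribution. Under that assumption, an exponent and its uncertainty could be estimated from a list of node degrees with the widely used approximate maximum-likelihood estimator for discrete power laws. The function name, the `k_min` parameter, and the sample degrees are hypothetical, and the approximation is known to be rough when `k_min` is small.

```python
import math

def estimate_degree_exponent(degrees, k_min=1):
    """Approximate MLE of gamma for P(k) ~ k^(-gamma) over degrees >= k_min.

    Returns (gamma, standard_error). Uses the discrete approximation
    gamma = 1 + n / sum(ln(k_i / (k_min - 0.5))).
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    gamma = 1.0 + n / log_sum
    std_error = (gamma - 1.0) / math.sqrt(n)
    return gamma, std_error

# Hypothetical degree sequence, for illustration only.
sample_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
gamma, std_error = estimate_degree_exponent(sample_degrees)
```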

9

Page 10: Toward a news data science

Fat Tailed, Micro-small World

Park, D.M., Kim, G.N., & On, B.W. (2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.

Important Sources

- Barack Obama
- Jay Carney
- Ban Ki-moon
- John Kerry
- Victoria Nuland
- Kim Hyunwook
- Susan Rice
- …

10

Page 11: Toward a news data science

Text Mining with NLP & SNA

Crawling
- crawler
- data aggregation

Advanced NLP
- tokenization
- stemming
- stopword elimination
- part-of-speech tagging
- indexing
- sentence boundary recognition
- URL tagging
- co-occurrence analysis
- partial parsing
- named entity recognition
- coreference resolution
- word sense disambiguation
- classification
- clustering

Customized SNA
- projector
- file name standardizer
- edge list converter
- degree centrality
- periodic analysis
- degree exponent
- rank
- quote rank
- description
- fragmentation

Discourse Analysis
- visualization
- data cleansing
- time series content analysis
- governmentality studies

BigKinds

Semantic Net Analyzer

Park, D.M. (2016). Natural language processing of news articles: A case of ‘NewsSource beta’. Korean Communication Theory. 12(1). 4-52.
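
A few of the NLP steps named on this slide (sentence boundary recognition, tokenization, stopword elimination, co-occurrence analysis) can be sketched in a handful of lines. This is a toy illustration for English text, not the pipeline of the systems named here: the stopword list, the regular expressions, and the function names are all assumptions, and Korean text would need a morphological analyzer instead of whitespace-style tokenization.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list only; a real list would be far longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}

def split_sentences(text):
    """Naive sentence boundary recognition: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase word tokenization followed by stopword elimination."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]

def cooccurrence(text):
    """Count how many sentences contain both tokens of each distinct pair."""
    counts = Counter()
    for sentence in split_sentences(text):
        for pair in combinations(sorted(set(tokenize(sentence))), 2):
            counts[pair] += 1
    return counts

sample = "The minister met the envoy. The envoy praised the minister!"
pairs = cooccurrence(sample)
```

The resulting pair counts form an edge list that a network step such as degree centrality could consume.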

11

Page 12: Toward a news data science

Content Analysis: <News Big Data Analytics & Insights>

12

Page 13: Toward a news data science

Visualization of Millions of News

Park, D.M. (2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.

13

Page 14: Toward a news data science

Automated Time Series Content Analysis

Park, D.M. (2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.
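
As a rough illustration of what a time series content analysis of sources and quotes might compute, the sketch below aggregates quotes by year and reports each source's share of that year's quotes. The record layout, the sample values, and the function name are hypothetical; the cited study's actual measures are not described on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical records: (publication year, quoted source).
quotes = [
    (1990, "Source A"),
    (1990, "Source A"),
    (1990, "Source B"),
    (1991, "Source B"),
]

def yearly_source_share(quotes):
    """For each year, return every source's share of that year's quotes."""
    per_year = defaultdict(Counter)
    for year, source in quotes:
        per_year[year][source] += 1
    shares = {}
    for year in sorted(per_year):
        total = sum(per_year[year].values())
        shares[year] = {s: c / total for s, c in per_year[year].items()}
    return shares

shares = yearly_source_share(quotes)
```

Plotting such yearly shares over a long period would show whether particular sources gain or lose prominence.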

14

Page 15: Toward a news data science

Toward a News Data Science

Research Plans

Research Histories

15

Page 16: Toward a news data science

Available Data

Data sources (language; period; no. of media; no. of articles; topics)
- KINDS: Korean; 1 Jan. 1990 - 30 Jun. 2014; 66; about 30 million; all
- BIGKINDS: Korean; 1 Jan. 1990 - 31 Aug. 2016; 44; about 30 million; all
- Naver, Daum: Korean; 1 Jun. 2016 - 30 Jun. 2016; 200; about 6 million; all
- UPI: English; 4 Jan. 2010 - 16 Jul. 2013; 1; about 0.15 million; all
- LexisNexis: English; 1 Jan. 1999 - 31 Dec. 2013; 10*; about 73 thousand; North Korea

* NYT, FT, WP, the Daily Yomiuri (Tokyo), the Nikkei Weekly (Japan), South China Morning Post, The Business Times, The Straits Times, Korea Herald, Korea Times

Type of named entities (no. of entities)
- Person, Korean: 116,787
- Person, Foreigner: 6,438
- Organization: 489,023 148,405
- Rank: 1,035

16

Page 17: Toward a news data science

Current Research Projects

1. Debating chatbot?: Sentence-level news search engine. Collaboration: Prof. B.W. Suh (SNU). Progress: prototyping complete. Journal: SCI.
2. Is user-centrism a journalistic value?: Social media design based on news big data. Collaboration: Prof. J.S. Lee (SNU). Progress: UI design complete. Journal: SCI.
3. Financialization of KPOP. Collaboration: Prof. G.T. Lee (George Mason Uni.). Progress: English draft in progress. Journal: SSCI.
4. Political change and journalists’ use of news sources. Collaboration: Prof. Y.M. Baek (Yonsei Uni.). Progress: English draft in progress. Journal: SSCI.
5. Politicization of Hallyu. Collaboration: Prof. S.K. Hong (SNU). Progress: data analysis in progress. Journal: SSCI.
6. Time series content analysis on ‘public opinion’, ‘people's voice’, and ‘people's livelihood’. Collaboration: Dr. S.H. Kim (KPF). Progress: data analysis in progress. Journal: SSCI.
7. Prediction of stock prices (KOSPI). Collaboration: Prof. W.S. Lee (Dongseo Uni.), Dr. Y.S. Park (Bank of Korea). Progress: data analysis in progress. Journal: KCI.
8. Prediction of North Korea’s provocation. Collaboration: Prof. Y.H. Kim (Sungkyunkwan Uni.). Progress: data analysis in progress. Journal: SSCI.
9. Time series content analysis on ‘social media’. Collaboration: Prof. E.J. Lee (SNU). Progress: data crawling complete. Journal: SSCI.

17

Page 18: Toward a news data science

Integration of Heterogeneous Data for Expert Systems

- Multimedia: text, audio, video, interactive units
- Multilevel: words, sentences, articles, media, systems
- Multilingual: Korean, English, Japanese, Chinese, …
- Multisource: news, reports, journals, literature, behaviors, sensors, …

18

Page 19: Toward a news data science

Advanced Methodology

Opinion Dynamics Bayesian Statistics Machine Learning

19

Page 20: Toward a news data science

Facebook was not originally created to be a company. It was built to accomplish a social mission: to make the world more open and connected.

Be open, build social value.

Mark Zuckerberg’s Letter to Investors: ‘The Hacker Way’

Q & A

20