Big Data vs. Official Statistics

Yu gyung Kang
Director, Statistical Information Portal Division
Statistics Korea

Directors General of the National Statistical Institutes Meeting
25~27 September 2013 / The Hague, Netherlands

Contents

Technology Assessment (TA) in Korea

Big Data Use in Private Sector
• Market Analysis
• Suicide Warning System

On-going Projects by KOSTAT
• Pilot Project for Mining and Manufacture Survey
• E-household Account System
• Pilot Project for Price Statistics

Future Challenges

Technology Assessment (1)

Conducted by MSIP of Korea in 2012, under Article 14 of the Framework Act on Science and Technology

• What is big data?
  – Data with 3Vs characteristics + Data Management Technology
    * Gartner’s 3Vs: Volume, Variety and Velocity

Volume: GB/TB → PB, EB, ZB
Variety: Structured Data (customer data, sales data, stock data, finance data) → Unstructured Data (video, music, messages, SNS, GPS, BBS)
Velocity: Low speed (hours to weeks) → High speed (mins. to seconds)

Technology Assessment (2)

• Expected Impact

Private Sector
  Benefits
  • Source of new value creation
  • Supporting efficient decision-making
  • Providing business opportunities and jobs
  Risks
  • Aggravating economic inequality
  • Possibility of wasting money due to careless massive investment
  • Social problems caused by unethical use of data

Public Sector
  Benefits
  • Improving public service and its efficiency
  • Real-time response to social issues
  • Creating new industry and job opportunities
  Risks
  • Increasing risk of leaking government secrets
  • ‘Big Brother’
  • Misuse of big data with errors and its negative impact on government policies

Individuals
  Benefits
  • Improving quality of life with individually tailored service
  • Increasing trust in public policies and service
  Risks
  • Increase of privacy and security issues

Technology Assessment (3)

Policy Recommendations

a. Localize Core Technologies related to big data through government-led R&D

b. Establish Legal and Institutional Basis for standardization of managing, sharing and trading big data

c. Foster pool of Big Data Analysts and Experts through interdisciplinary undergraduate and graduate programs

d. Take a Step-By-Step Approach by Setting Priorities in the sectors where benefits to the public will be visible

e. Make Strategies to Protect Privacy

Big Data Use in Private Sector
Case 1: Market Analysis by

X5

Which Business would you like to open?

Big Data Use in Private Sector
Case 1: Market Analysis by

• Floating Population
• Consumer Type
• Sales Information
• Real Estate
• Business Cycle

Real Estate 411

Korean Statistical Information Service

Big Data Use in Private Sector
Case 2: Suicide Warning System

Weather Forecast

Why not Suicide Forecast?

• social factors
• weather factors
• Werther Effect
• personal emotion

OECD (2012), OECD Health Statistics

Big Data Use in Private Sector
Case 2: Suicide Warning System

• Training Set (2008-2009) & Test Set (2010)
  – Total number of suicide incidents
  – Economic and weather data
    • CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
  – 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
    • Var1 (# of posts including “suicide”)
    • Var2 (# of posts including “dysphoria”, “be tired”, “be painful”, or “be exhausted”)

• Model
  – Dependent Variable: No. of suicides in a given period (3 days)
  – Independent Variables
    • CPI, unemployment rate, KOSPI, daylight hours, temperature
    • Two variables obtained from the posts
    • Celebrity suicide (control variable)
    • No. of suicides from the previous period
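
The variable list above can be turned into a small worked sketch. The slides do not state the model form, so ordinary least squares is used here purely as a stand-in. The data are synthetic placeholders, the keyword matching is a simplification in English, and every function name is hypothetical. The split sizes assume roughly 243 three-day periods in 2008-2009 and 121 in 2010.

```python
import numpy as np

FEATURES = ["cpi", "unemployment", "kospi", "daylight", "temperature",
            "var1_suicide_posts", "var2_dysphoria_posts", "celebrity_suicide"]

def count_posts(posts, keywords):
    """Number of posts containing at least one keyword (case-insensitive)."""
    return sum(any(k in p.lower() for k in keywords) for p in posts)

def add_lag(X, y):
    """Append the previous period's count as an extra column.

    The first period has no predecessor, so it is dropped.
    """
    return np.column_stack([X[1:], y[:-1]]), y[1:]

def fit_ols(X, y):
    """Least-squares fit with an intercept (assumed model form)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply fitted coefficients to new periods."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def make_synthetic(n_periods, seed=0):
    """Placeholder data only; not the NAVER or official figures."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_periods, len(FEATURES)))
    weights = rng.normal(size=len(FEATURES))
    y = 100.0 + X @ weights + rng.normal(scale=0.5, size=n_periods)
    return X, y

if __name__ == "__main__":
    n_train, n_test = 243, 121            # assumed period counts
    X, y = make_synthetic(n_train + n_test)
    X_lag, y_lag = add_lag(X, y)
    split = n_train - 1                   # one period is lost to the lag
    beta = fit_ols(X_lag[:split], y_lag[:split])
    errors = predict(beta, X_lag[split:]) - y_lag[split:]
    print(f"test RMSE on synthetic data: {np.sqrt(np.mean(errors ** 2)):.2f}")
```

Fitting on the earlier periods and scoring only on the later ones mirrors the 2008-2009 / 2010 split on the slide: the test period is never seen during fitting.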

What should NSOs do?

Scientifically collected data vs. huge amount of data: Challenge!

Sample Surveys
• Established theoretical basis
• Representativeness of target population
• Relatively slow
• Expensive data collection

Big Data
• Quantity beats quality
• Lack of representativeness of target population
• MORE TIMELY
• Data already there
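
The representativeness contrast can be illustrated with a small simulation. This is a sketch on an invented population, not KOSTAT data. The lognormal distribution, the coverage rule `y / 50`, and the function name are all assumptions chosen only to make the effect visible.

```python
import numpy as np

def compare_estimates(pop_size=200_000, survey_n=1_000, seed=0):
    """Estimate a population mean from a small random sample and from a
    large but selectively covered data set (all figures are synthetic)."""
    rng = np.random.default_rng(seed)
    y = rng.lognormal(mean=3.0, sigma=0.5, size=pop_size)

    # Sample survey: every unit has the same chance of selection.
    survey = rng.choice(y, size=survey_n, replace=False)

    # "Big data": the chance of being covered grows with y, so the
    # covered units are numerous but not representative.
    p_covered = np.minimum(1.0, y / 50.0)
    covered = y[rng.random(pop_size) < p_covered]

    return {
        "true_mean": y.mean(),
        "survey_mean": survey.mean(),
        "big_data_mean": covered.mean(),
        "big_data_n": covered.size,
    }

if __name__ == "__main__":
    for name, value in compare_estimates().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the large data set contains far more records than the survey yet misses the true mean by a wider margin, because its coverage depends on the quantity being measured.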

KOSTAT tried…

Seminars (October 2012~March 2013)
• Organized seminars once or twice a month, inviting outside big data experts
• Aimed to raise awareness of big data and its impact on producing official statistics

Pilot Project (December 2012~April 2013)
• A pilot project on the use of big data in the process of editing existing national statistics
• Used media data to examine outliers when producing the Index of Industrial Production (IIP)
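
One way to picture the outlier examination is sketched below. The slides give no detail on the method, so this is a hypothetical workflow. It flags establishments whose month-over-month change is extreme, using a modified z-score based on the median absolute deviation, then checks whether any media report mentions them. The establishment ids, the 3.5 threshold, and the function names are all illustrative assumptions.

```python
from statistics import median

def flag_outliers(changes, threshold=3.5):
    """Return ids whose change has a modified z-score above the threshold.

    changes maps establishment id -> month-over-month change (%).
    """
    values = list(changes.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be called extreme
    return [k for k, v in changes.items()
            if 0.6745 * abs(v - med) / mad > threshold]

def triage(changes, news_mentions):
    """Split flagged outliers by whether media reports mention them."""
    result = {"explained_by_media": [], "needs_review": []}
    for est in flag_outliers(changes):
        key = "explained_by_media" if news_mentions.get(est) else "needs_review"
        result[key].append(est)
    return result

if __name__ == "__main__":
    # Invented example values, not survey data.
    changes = {"A": 1.2, "B": -0.8, "C": 0.5, "D": -45.0,
               "E": 2.0, "F": 38.0, "G": -1.5}
    news = {"D": ["Plant D halts production after fire"]}
    print(triage(changes, news))
```

In this invented example, D and F are both flagged. D has a matching news item, which suggests a genuine change, while F has none and would go to manual review.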

KOSTAT is doing…

1. E-Diary System (Household Account System)

• Currently, about 48.5% of sample households have adopted the e-Diary system

• Respondents can import their expenditure information from online transaction records held by banks, credit card companies and major retail stores.

Using big data for the convenience of respondents

KOSTAT is doing…

2. Pilot Project of Price Index

KOSTAT is currently preparing a pilot project on compiling a price index using big data for a specific manufacturing product.

Please select specific domains (or items) that can clearly show the difference between big data and existing statistics, e.g. TVs or electronic products.

Prof. Roberto Rigobon
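
To make the idea of a price index compiled from collected prices concrete, here is a minimal sketch using the Jevons formula, the geometric mean of price relatives over matched items. The slides do not say which formula KOSTAT intends to use, so the choice of Jevons, the item ids, and the prices are illustrative assumptions.

```python
import math

def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives over items seen in both periods,
    scaled so that the base period equals 100."""
    matched = [k for k in base_prices if k in current_prices]
    if not matched:
        raise ValueError("no items observed in both periods")
    log_sum = sum(math.log(current_prices[k] / base_prices[k]) for k in matched)
    return 100.0 * math.exp(log_sum / len(matched))

if __name__ == "__main__":
    # Invented TV prices; tv_c disappears and tv_d is new, so only
    # tv_a and tv_b are matched across the two periods.
    base = {"tv_a": 1000.0, "tv_b": 2000.0, "tv_c": 500.0}
    current = {"tv_a": 900.0, "tv_b": 2000.0, "tv_d": 700.0}
    print(f"index: {jevons_index(base, current):.2f}")
```

Matching items across periods matters for products such as TVs, where models enter and leave the market quickly. Unmatched items are simply skipped in this sketch, whereas a production index would need an explicit replacement or quality-adjustment rule.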

Future Challenges

Can we ignore big data just because of its representativeness issue, in spite of strengths such as timeliness?

Can KOSTAT prevent over 380 statistical agencies from producing official statistics with big data?

Maybe not! We will have to make use of big data in producing statistics at some point in the future, as was the case with the transition from survey data to administrative data.

We need to identify the limitations of big data through pilot projects, and learn the techniques and know-how to refine big-data-based statistics into official statistics.

Thank you very much!