Big Data, Potentials, Opportunities, and Challenges

Setia Pramana
Politeknik Statistika STIS
Sub Directorate Statistics Modeling, BPS RI


About Me

1999-2000

2005

BSc in Statistics Brawijaya Univ.

Start working @ STIS

Journey to Europe

2006

M.Sc in App. Statistics Hasselt Univ. Belgium

M.Sc in Biostatistics, Hasselt Univ., Belgium

Research Assistant

2007

2011

PhD in Mathematics (Statistical Bioinformatics), Hasselt Univ., Belgium

Postdoc @ MEB, Karolinska Institutet

2014

2015

@ STIS

Associate Professor

2018

Head of PPPM Polstat STIS

Head of Sub Directorate Statistics Modeling BPS

2019

Board Member

• UN Global Working Group Big Data for Official Statistics

• Asosiasi Ilmuan Data Indonesia

• Ikatan Statistisi Indonesia

• Forum Pendidikan Tinggi Statistika

• Masyarakat Biodiversiti dan Bioinformatika Indonesia

• Asosiasi Artificial Intelegent Indonesia


“Data is our nation's new kind of wealth. Data is now more valuable than oil. Therefore, data sovereignty must be realized. Citizens' rights over their personal data must be protected. The regulations must be prepared immediately; there can be no compromise.”

President of the Republic of Indonesia, Joko Widodo, in the State Address of 16 August 2019

IN THE PAST, data came from:

• Surveys
• Censuses
• Registration

NOW, data also come from BIG DATA:

• Administrative data
• Commercial or transactional digital data
• GPS tracking devices
• Behavioral data
• Opinion data

Data Explosion

• Interactions of billions of people using computers, GPS devices, cell phones, and medical devices.

• Sources include online or mobile financial transactions, social media traffic, and GPS coordinates.

• “In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years.” (Eron Kelly, GM, Microsoft)


What do we “produce”?


How about Indonesia?

Internet of Things


Definition?

“A paradigm for enabling the collection, storage, management, analysis and visualization, potentially under real-time constraints, of extensive datasets with heterogeneous characteristics.” (International Telecommunication Union, 2015)


Big Data for Energy

Data Sources

Exhaust data
• Mobile phone data
• Financial transactions
• Online search and access logs
• Citizen card
• Postal data

Sensing data
• Satellite and UAV imagery
• Sensors in cities, transport and homes
• Sensors in nature, agriculture and water
• Wearable technology
• Biometric data
• Internet of Things (IoT)

Digital Content
• Social media data
• Web scraping
• Participatory sensing / crowdsourcing
• Health records
• Radio content

What People Do

What People Say

Measurement Revolution

Sources: Mobile Phone Data

• Owned by Mobile Network Operators (MNOs)
• Mobile Position (active)
• Call Detail Records (CDR):
  • Contain incoming and outgoing calls, SMS and MMS, and location (passive)
  • Stored in internal data warehouses and billing management systems
• Data Detail Records (DDR): Internet traffic between the mobile devices and the network


Mobile Positioning Data

• Location of mobile devices
• Statistical indicators that can be generated:
  • The number of residences, geographically distributed according to available accuracy;
  • The number of workplaces, schools, secondary homes, and other regular locations;
  • Internal migration, based on changes of residence within the country;
  • Change of workplace over time;
  • Cross-border migration, based on regular travel between different countries;
  • Population grid statistics (1 km²);
  • Temporary population statistics:
    • Assessing the temporary population (hourly, daily, weekly, monthly, etc.);
    • Real-time assessment of a specific location during large-scale events, gatherings of people or actual emergency situations (e.g. what is the composition of the crowd in a specific location, how many people are affected by an earthquake or hurricane);
    • Risk assessment for law enforcement (planning the number of patrol units in an area based on the composition of the temporary population).
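As a rough illustration of the hourly temporary population indicator, the sketch below counts distinct devices per grid cell per hour. The record layout (device ID, ISO timestamp, latitude, longitude), the sample records and the 0.01-degree cell size (roughly 1.1 km of latitude) are assumptions made for this example; real MNO data and a true 1 km² grid would need a proper map projection and anonymization.

```python
from collections import defaultdict
from datetime import datetime
import math

CELL_DEG = 0.01  # assumed cell size in degrees (about 1.1 km of latitude)

def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def hourly_population(records):
    """Count distinct devices seen in each (hour, grid cell) pair."""
    devices = defaultdict(set)
    for device_id, timestamp, lat, lon in records:
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0)
        devices[(hour, grid_cell(lat, lon))].add(device_id)
    return {key: len(ids) for key, ids in devices.items()}

# Hypothetical location records: (device_id, timestamp, lat, lon)
records = [
    ("a", "2019-01-01T08:05:00", -6.2001, 106.8162),
    ("a", "2019-01-01T08:40:00", -6.2003, 106.8165),  # same device, same cell
    ("b", "2019-01-01T08:15:00", -6.2008, 106.8169),
    ("b", "2019-01-01T09:10:00", -6.9147, 107.6098),  # moved to another cell
]
counts = hourly_population(records)
for (hour, cell), n in sorted(counts.items()):
    print(hour, cell, n)
```

Using a set per cell and hour means a device that appears several times within the same hour is counted once, which is what a population estimate needs.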


Big Data Source: Internet of Things


From source to load, we make the grid efficient and reliable
From downtown to suburb, we deliver urban efficiency today

Smart Grid & Smart City IoT Solutions

Smart Grid Operator: “IT/OT integration from field to control center to enterprise”

Smart Generator: “Producing power efficiently”

Renewable Operator: “Making renewables dispatchable”

Energy Services Provider: “Bridging supply & demand”

Smart Energy

Smart Mobility

Smart Water

Smart Public Services

Smart Buildings & Homes

Smart Integration

Smart Data Center

Social Media


Twitter


• Trending Topics


Crowdsourcing

• The process of getting work, funding or information, usually online, from a crowd of people.

• The word “crowdsourcing” is a combination of “crowd” and “outsourcing”.

CROWD + OUTSOURCING = CROWDSOURCING


Web Crawling and Scraping

• Extract Information from Web

• Web crawling is the process of locating information on the World Wide Web (WWW): indexing all the words in a document, adding them to a database, then following all hyperlinks, indexing the linked documents and adding that information to the database as well.

• Web scraping is the process of automatically requesting a web document and collecting information from it.
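The difference between the two can be sketched in a few lines. The example below is only illustrative: it uses a hypothetical in-memory dictionary of pages instead of real HTTP requests, and the URLs, markup and the `price` class name are invented. Scraping is the part that pulls the price out of one page; crawling is the loop that follows links to discover further pages.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "/": '<a href="/rice">Rice</a> <a href="/sugar">Sugar</a>',
    "/rice": '<h1>Rice 5 kg</h1><span class="price">65000</span><a href="/">Home</a>',
    "/sugar": '<h1>Sugar 1 kg</h1><span class="price">14500</span>',
}

class PageParser(HTMLParser):
    """Collects outgoing links and the text of <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "span" and attrs.get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(int(data))

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

def crawl(start):
    """Breadth-first crawl from `start`, scraping a price from each page that has one."""
    seen, queue, prices = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(PAGES[url])        # scraping: extract data from one document
        if parser.prices:
            prices[url] = parser.prices[0]
        for link in parser.links:      # crawling: follow links to new documents
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return prices

result = crawl("/")
print(result)
```

The `seen` set keeps the crawler from revisiting pages when links form a cycle, as the "Home" link does here.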


http://prowebscraping.com/web-scraping-vs-web-crawling/

What for?

• Business insight
• Policy
• Politics
• Etc.


Better Data, Better Government

• Quality and timely data are vital for enabling governments, international organizations, civil society, the private sector and the general public to make informed decisions

• Evidence-based policy making

• Quality of statistics:
  • Accuracy
  • Relevance
  • Timeliness
  • Accessibility
  • Coherence
  • Interpretability


Big Data in Action for Government

World Bank Group


Big Data for Public Policy: Examples


Crowdsourcing


Susanti and Pramana 2017

STATISTICS INDONESIA


Crowdsource for Food Prices Nowcasting

• Collaboration with Pulse Lab UN Jakarta

• Uses the crowdsourcing approach (Premise) of the UN food security project

• Location: Mataram City, NTB (West Nusa Tenggara)

• Time: March to July 2015


Web Scraping: Online Shops


Online Shop        Total Commodities
Hypermart          52 products
KlikMart           75 products
Bhinneka           40 products
Elektronik City    17 products
Zalora             36 products
BerryBenka         25 products
Mothercare         2 products
Babyzania          4 products
Apotek Century     10 products
Pusat Kosmetik     5 products
Sephora            2 products
Stationary         6 products
Gramedia           3 products

• Analysis is in progress

• Track the movement of consumer prices

• Identify the pattern of consumer price changes per commodity type and per e-commerce site

• Construct a CPI by substituting e-commerce-based consumer prices for the conventionally collected consumer prices

• Compare the survey-based CPI with the e-commerce-based CPI
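A minimal sketch of the comparison in the last two bullets, assuming a Laspeyres-type index (a weighted average of price relatives); the slides do not state which index formula is used. The commodity names, weights and prices are invented for illustration.

```python
def price_index(base_prices, current_prices, weights):
    """Weighted average of price relatives, scaled so the base period equals 100."""
    total_weight = sum(weights.values())
    weighted_relatives = sum(
        weights[item] * current_prices[item] / base_prices[item]
        for item in weights
    )
    return 100 * weighted_relatives / total_weight

# Hypothetical expenditure weights and prices (in rupiah)
weights = {"rice": 0.5, "sugar": 0.3, "cooking_oil": 0.2}

survey_base = {"rice": 10000, "sugar": 12000, "cooking_oil": 14000}
survey_now = {"rice": 11000, "sugar": 12000, "cooking_oil": 14700}

scraped_base = {"rice": 10000, "sugar": 12000, "cooking_oil": 14000}
scraped_now = {"rice": 10500, "sugar": 12600, "cooking_oil": 14000}

survey_cpi = price_index(survey_base, survey_now, weights)
scraped_cpi = price_index(scraped_base, scraped_now, weights)

print(f"survey-based index:     {survey_cpi:.1f}")
print(f"e-commerce-based index: {scraped_cpi:.1f}")
print(f"difference:             {scraped_cpi - survey_cpi:+.1f}")
```

Each source is indexed against its own base-period prices, so a constant price-level gap between online and offline shops does not by itself make the two indices diverge.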

Online Shop Commodities Prices


Capturing E-commerce Activities


E-commerce Development in Indonesia by Province

Distribution of E-commerce in Indonesia: Results from Crawling Shopee

Most Common Shop Type per Province: Results from Crawling Google Maps

E-commerce Profile of West Sumatra

Data Sources

• Web scraping from e-commerce sites and Google Maps

• Data description:
  • E-commerce website (national):
    • 1,288 categories
    • 3,065,279 items, of which 4,212 items are in West Sumatra
    • 264,240 shops, of which 892 shops are in West Sumatra
  • Google Maps (6 sample regencies/cities in West Sumatra):
    • Solok Regency
    • Padang
    • Padang Pariaman
    • Pariaman
    • Pesisir Selatan
    • Solok City

Percentage of Items Across All Categories


Crawling online ticketing

• Pegipegi

• Agoda

• Traveloka

• Data: hotel link, hotel ID, hotel name, hotel type, star rating, hotel address, hotel price, review score, number of reviews, room type, number of room types, number of rooms remaining, number of floors, number of restaurants, total number of rooms, year built, latitude, longitude, hotel city/area, and hotel facilities.

Distribution of hotels from the Agoda.com website (12,026 hotels)


• Scraping data from the social media platform Instagram.

• Keywords: #wonderfulindonesia, #pesonaindonesia, #visitindonesia, #exploreinidonesia, #indonesiatourism.

• 1,897,450 posts: 480,100 posts have a geotag (25%); after cleaning: 411,630 posts.
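The filtering step behind these counts might look like the sketch below. It assumes each scraped post is a dictionary with a `caption` and optional `lat`/`lon` fields; that layout and the sample posts are hypothetical, and only the hashtag list is taken from the slide.

```python
# Hashtags as listed on the slide
KEYWORDS = {
    "#wonderfulindonesia", "#pesonaindonesia", "#visitindonesia",
    "#exploreinidonesia", "#indonesiatourism",
}

def matches_keywords(caption):
    """True if the caption contains at least one of the tracked hashtags."""
    tags = {word.lower() for word in caption.split() if word.startswith("#")}
    return bool(tags & KEYWORDS)

def has_geotag(post):
    """True if the post carries both a latitude and a longitude."""
    return post.get("lat") is not None and post.get("lon") is not None

# Hypothetical scraped posts
posts = [
    {"caption": "Sunrise at Bromo #WonderfulIndonesia", "lat": -7.94, "lon": 112.95},
    {"caption": "Beach day #pesonaindonesia", "lat": None, "lon": None},
    {"caption": "Lunch #foodie", "lat": -6.2, "lon": 106.8},
    {"caption": "Raja Ampat #visitindonesia #travel", "lat": -0.23, "lon": 130.52},
]

matching = [p for p in posts if matches_keywords(p["caption"])]
geotagged = [p for p in matching if has_geotag(p)]

print(f"{len(matching)} matching posts, {len(geotagged)} with a geotag "
      f"({len(geotagged) / len(matching):.0%})")
```

Only geotagged posts can be placed on a map, which is why the geotag share matters for the geospatial analysis.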

Geospatial Analysis of Social Media Data: Tourism Patterns in Indonesia

Number of Instagram Postings (bar chart)

• Peak Season (20 December 2018 - 2 January 2019): 61,542
• Low Season (21 January 2019 - 3 February 2019): 27,738


Job Vacancy Monitoring: Twitter

Job vacancy


Subjective Happiness Index

Subjective Happiness Index by Province

The saddest province

Not too sad & not too happy province

The happiest province

Analytics Approaches

• Descriptive: What happened or what is happening now?

• Diagnostic: Why did it happen, or why is it happening now?

• Predictive: What will happen next? What will happen under various conditions?

• Prescriptive: Which options produce the best, highest-value outcome?
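To make the first and third questions concrete, the sketch below contrasts a descriptive summary with a simple predictive step on the same series. The weekly counts are invented, and a straight-line trend fitted by least squares stands in for whatever predictive model a real analysis would use.

```python
from statistics import mean

# Hypothetical weekly counts (e.g. job-vacancy posts), weeks 0..3
weeks = [0, 1, 2, 3]
counts = [10, 12, 14, 16]

# Descriptive: what happened?
average = mean(counts)

# Predictive: what will happen next? Fit count = intercept + slope * week
# by ordinary least squares and extrapolate one week ahead.
week_mean = mean(weeks)
slope = (
    sum((w - week_mean) * (c - average) for w, c in zip(weeks, counts))
    / sum((w - week_mean) ** 2 for w in weeks)
)
intercept = average - slope * week_mean
forecast = intercept + slope * 4

print(f"average so far: {average}")
print(f"trend: {slope:+.1f} per week, forecast for week 4: {forecast:.1f}")
```

Diagnostic and prescriptive analytics build on the same data but need more than this: explanatory variables for the former, and a set of candidate actions with an objective to optimize for the latter.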


Big Data Analytics

• Data are often unstructured

• Data come from different sources and contain conflicts, missing values and outliers

• Usually a data fusion step is required

• Data are dynamic

• Often has a crowdsourcing component (e.g., Twitter)

• Often sensor processing steps are required (domain-specific)

• Because of the size of the processed data, things have to be done differently

• Involves high-performance computing and specialized algorithms

• Uses advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing.
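One possible shape for a data fusion step is sketched below: several sources report a value for the same item, missing reports are skipped, outliers are dropped with a median-absolute-deviation rule, and the remaining conflict is resolved by taking the median. The rule, the threshold of 3 and the sample prices are assumptions chosen for illustration, not a method taken from the slides.

```python
from statistics import median

def fuse(observations, threshold=3.0):
    """Combine reports of one quantity from several sources into a single value.

    None marks a missing report. Values further than `threshold` median absolute
    deviations from the median are treated as outliers and dropped.
    """
    values = [v for v in observations if v is not None]
    if not values:
        return None                      # nothing reported by any source
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return center                    # majority of sources agree exactly
    kept = [v for v in values if abs(v - center) / mad <= threshold]
    return median(kept)

# Hypothetical prices for the same items reported by five sources
reports = {
    "rice": [10000, 10200, None, 9900, 55000],   # one missing, one outlier
    "sugar": [None, None, None, None, None],     # no data at all
    "cooking_oil": [14000, 14000, 14000, None, 14000],
}
fused = {item: fuse(values) for item, values in reports.items()}
print(fused)
```

A median-based rule is used here because, unlike a mean and standard deviation, it is not itself distorted by the outliers it is meant to detect.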


Data Science

“Applying advanced statistical tools to existing data to solve problems, generate new insights, improve products/services”

“Everything that has something to do with data: collecting, analyzing, modeling... yet the most important part is its applications: all sorts of applications”


What is Data Science?

• Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many areas such as science, engineering, economics, politics, finance, and education:
  • Computer Science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
  • Mathematics: mathematical modeling
  • Statistics: statistical and stochastic modeling, probability


Data Science

• A Mashed Up Discipline

• A multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data


Data Science

• New Discipline

• Very few books covering the discipline as a whole

• An interdisciplinary field, like business analysis, that incorporates computer science, modeling, statistics, analytics, and mathematics.


Monica Rogati https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007


Summary

• We are now in the Big Data era

• Big data has huge potential

• Big data analytics plays an important role in all aspects

• Data scientist would be the sexiest job


Challenges

• Information technology (IT) infrastructure.

• Data collection and governance.

• Data integration and sharing.

• Data processing and analysis.

• Security and privacy.

• Professionals of big data analytics and smart energy management.


Thank you
