Big data introduction workshop
Updated October 24, 2018
© Cloudera, Inc. All rights reserved.
http://sli.do/bigdataworkshop



AGENDA

Introduction

What is big data / distributed computing

Map-Reduce concept

Map-Reduce java workshop

HADOOP history & Additional components

Q&A


About me

Zoltan Siegl

Software [email protected]


ASSESS CUSTOMER BEHAVIOUR

• Do you think you get that pink lollipop at random when playing Candy Crush?

• Well… think again...

BEHAVIOUR


DETECTING INTERACTIVE ADVERSE DRUG EFFECTS

• No clinical trials for cross effects

• 91 additional ADEs revealed

• 200 000 man-years of clinical trials did not reveal these

HEALTHCARE


DIGITAL DEFENDERS OF CHILDREN

• Uses big data against human trafficking

• In 1 year:

  • Prevent over 860 cases

  • Help identify over 300 victims

  • Of which 50 were children

PROTECT


WHAT DO YOU MEAN BIG DATA
The amount of DATA possessed and processed

It is 300 000 000 000 000 000 Bytes
600 TB processed per day
1 billion users / month
2.7 billion likes / day
300 million photos uploaded / day

NSA
That is pronounced Exabytes
5 × 10^18 bytes
30 PB processed per day
1.6 % of internet traffic touched / day
Web searches, websites visited, phone calls, Skype calls, credit card transactions, etc.
* This is a rough estimate. For legal purposes I have to state that I do not have possession of the data they store. You hear that NSA, right? :)

100 PB processed per day
60 trillion pages indexed
>1 billion unique search users / month
2.3 billion searches / second
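To make the orders of magnitude on this slide easier to compare, here is a minimal Java sketch that scales raw byte counts to decimal (SI) units. The class and method names are hypothetical and not part of the workshop material; the sketch assumes 1000-based prefixes and truncates any remainder.

```java
class ByteUnits {
    // Decimal (SI) prefixes, a factor of 1000 per step.
    private static final String[] UNITS = {"B", "kB", "MB", "GB", "TB", "PB", "EB"};

    /** Divides by 1000 until the value drops below 1000, truncating any remainder. */
    static String humanReadable(long bytes) {
        long value = bytes;
        int unit = 0;
        while (value >= 1000 && unit < UNITS.length - 1) {
            value /= 1000;
            unit++;
        }
        return value + " " + UNITS[unit];
    }

    public static void main(String[] args) {
        // The two raw byte counts quoted on the slide.
        System.out.println(humanReadable(300_000_000_000_000_000L));   // 300 PB
        System.out.println(humanReadable(5_000_000_000_000_000_000L)); // 5 EB
    }
}
```

Both values still fit in a signed 64-bit `long`, whose maximum is roughly 9.2 × 10^18.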

Reference: 2016 Janani Ravi - Building blocks of Hadoop course on Pluralsight


NO SINGLE COMPUTER IS BIG ENOUGH

• Not even in the 90’s

• Supercomputers are superexpensive

• Amount of data grows. Computers don’t.


REQUIREMENTS FOR A BIG DATA SYSTEM
What expectations do we have

STORAGE
SCALABILITY
COMPUTATION


DISTRIBUTED COMPUTING
Shard data, and shard computing capacity

STORAGE + COMPUTATION
STORAGE + COMPUTATION
STORAGE + COMPUTATION
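The idea of pairing storage with computation on every node can be sketched in a few lines of Java. This is only an illustration: the slides do not say how records are assigned to nodes, so the hash-based placement here is an assumption, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ShardingSketch {
    /** Splits records into one shard per node; hash-based placement is an assumption. */
    static List<List<String>> shard(List<String> records, int nodes) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            shards.add(new ArrayList<>());
        }
        for (String record : records) {
            // floorMod keeps the index non-negative even for negative hash codes.
            shards.get(Math.floorMod(record.hashCode(), nodes)).add(record);
        }
        return shards;
    }

    /** Each "node" computes a partial result using only its own shard. */
    static long localCharCount(List<String> localShard) {
        long chars = 0;
        for (String record : localShard) {
            chars += record.length();
        }
        return chars;
    }

    /** Combines the per-node partial results into the global answer. */
    static long totalCharCount(List<List<String>> shards) {
        long total = 0;
        for (List<String> localShard : shards) {
            total += localCharCount(localShard);
        }
        return total;
    }
}
```

In a real cluster each call to `localCharCount` would run on the machine that stores the shard, so only the small partial results travel over the network.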


MAPPER AND REDUCER
Logical view

Mapper

Map(key1, value1) → list(key2, value2)

Reducer

Reduce(key2, list(value2)) → list(value3)
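The two signatures above translate fairly directly into generic Java interfaces. The sketch below is a single-process illustration, not Hadoop's API: the names `MapFn`, `ReduceFn` and `run` are hypothetical, and the grouping step between map and reduce is done with an in-memory sorted map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Map(key1, value1) -> list(key2, value2) */
interface MapFn<K1, V1, K2, V2> {
    List<Map.Entry<K2, V2>> map(K1 key, V1 value);
}

/** Reduce(key2, list(value2)) -> list(value3) */
interface ReduceFn<K2, V2, V3> {
    List<V3> reduce(K2 key, List<V2> values);
}

class MapReduceSignatures {
    /** Runs map on every input pair, groups the output by key2, then reduces each group. */
    static <K1, V1, K2 extends Comparable<K2>, V2, V3> List<V3> run(
            List<Map.Entry<K1, V1>> input,
            MapFn<K1, V1, K2, V2> mapper,
            ReduceFn<K2, V2, V3> reducer) {
        // Group intermediate values by key; TreeMap iterates keys in sorted order.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> in : input) {
            for (Map.Entry<K2, V2> out : mapper.map(in.getKey(), in.getValue())) {
                groups.computeIfAbsent(out.getKey(), k -> new ArrayList<>()).add(out.getValue());
            }
        }
        // One reduce call per distinct key2.
        List<V3> results = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            results.addAll(reducer.reduce(group.getKey(), group.getValue()));
        }
        return results;
    }
}
```

The type parameters mirror the slide: the mapper may change both key and value types, while the reducer sees every value that shares one intermediate key.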


BIG-O COMPLEXITY OF MAP-REDUCE
This is completely unreasonable

O(n log n * s * (1/p)) where:

- n is the number of items

- s is the number of nodes

- p is the ping time between nodes (assuming equal ping times between all nodes in the network)


MAPPER AND REDUCER
Logical view

Mapper

Map("shard1", "to be or not to be") → <"to", 1>, <"be", 1>, <"or", 1>, <"not", 1>, <"to", 1>, <"be", 1>

Reducer

Reduce("to", <1,1,1>) → <"to", 3>
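The word count above can be reproduced with a small in-memory Java sketch. It is not the Hadoop API, and the names are hypothetical. Since "shard1" contains "to" only twice, the sketch assumes a second shard (made up here) that contributes the third occurrence the reducer sees.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    /** Map(shardId, text) -> list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String shardId, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Reduce(word, list of ones) -> (word, total). */
    static Map.Entry<String, Integer> reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return new SimpleEntry<>(word, sum);
    }

    /** Maps every shard, groups the pairs by word, then reduces each group. */
    static Map<String, Integer> wordCount(Map<String, String> shards) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> shard : shards.entrySet()) {
            for (Map.Entry<String, Integer> pair : map(shard.getKey(), shard.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            Map.Entry<String, Integer> result = reduce(group.getKey(), group.getValue());
            counts.put(result.getKey(), result.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> shards = new TreeMap<>();
        shards.put("shard1", "to be or not to be");
        shards.put("shard2", "to code"); // hypothetical second shard, not from the slides
        System.out.println(wordCount(shards)); // {be=2, code=1, not=1, or=1, to=3}
    }
}
```

The grouping step in `wordCount` stands in for the shuffle phase, which in a distributed setting routes all pairs with the same word to the same reducer.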


LET’S DO SOME CODING


HISTORY OF HADOOP
Started from Google

HDFS
SCALABILITY
COMPUTATION


HISTORY OF HADOOP
Hadoop 2

HDFS
SCALABILITY
COMPUTATION


HADOOP ECOSYSTEM
Other components


CLOUDERA MANAGER


CLOUDERA INTERNSHIP PROGRAM

http://bit.do/clouderaintern

You are

❏ A Bachelor's, Master's or PhD full-time student at a Hungarian university where you have learnt programming
❏ Based in Hungary and currently enrolled at a Hungarian university
❏ Finishing your studies soon: either in summer 2019 or in winter 2020
❏ Able to work 20 hours / week in the active semester (schedule is flexible)
❏ Able to start the internship in early February 2019

You have

❏ Good verbal and written English skills
❏ The urge to learn about system software and algorithms, software quality assurance or tooling


THANK YOU