ECONOMY BEHIND BIG DATA GREGORY CHOI WWW.MBAPROGRAMMER.COM

The economy behind big data technology



BIG DATA VS DATA MINING

• Please don’t confuse the two! They are not interchangeable.
• I’ll explain why, one at a time.
• Do you want to follow me?


BIG DATA

• It is a common misconception that the goal of “Big Data” is simply to handle large-scale data.
• The goal of Big Data is to achieve a “scale-out” structure
  – to REDUCE COST


SCALE-UP VS SCALE-OUT

[Diagram: scale-up vs. scale-out, drawn with 10-core machines]

• Scale-up: increase computing power in one machine – EXPENSIVE
• Scale-out: increase computing power by increasing the number of machines – CHEAP


SCALE-UP VS SCALE-OUT

• Think about it this way: which one is cheaper?
  – Quad-core (4-core) PC x 2
  – Octa-core (8-core) PC x 1
• Generally, two quad-core PCs are cheaper than one octa-core PC.
  – This is because only a limited number of motherboard makers produce boards that support 8-core CPUs.


WHY DO WE CHOOSE A SCALE-OUT OVER A SCALE-UP STRUCTURE?


THE DIFFICULTY OF A SCALE-OUT STRUCTURE

• How do we balance the CPU usage across the machines?
• If one machine fails, how do we manage it?
• How do we distribute the tasks to each machine?
• What if we add one more machine?

• Conclusion: DIFFICULT
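To make the last two questions concrete, here is a minimal Python sketch of one naive way to distribute tasks: hash each task ID and take the result modulo the number of machines. The function names and the 10,000 hypothetical tasks are assumptions for illustration only; this is not how Hadoop itself schedules work. The scheme spreads tasks evenly, but adding a ninth machine moves roughly 8 out of 9 tasks to a different machine.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Well-mixed hash that is identical on every machine and every run."""
    # Python's built-in hash() of a str is salted per process, so use SHA-256 instead
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def assign_machine(task_id: str, num_machines: int) -> int:
    """Naive placement: hash the task ID, then take it modulo the machine count."""
    return stable_hash(task_id) % num_machines


def fraction_moved(task_ids: list[str], before: int, after: int) -> float:
    """Share of tasks whose machine changes when the cluster grows from `before` to `after`."""
    moved = sum(assign_machine(t, before) != assign_machine(t, after) for t in task_ids)
    return moved / len(task_ids)


tasks = [f"task-{i}" for i in range(10_000)]  # hypothetical task IDs

# Balance: count how many tasks land on each of 8 machines (roughly 1,250 each)
load = [0] * 8
for t in tasks:
    load[assign_machine(t, 8)] += 1
print("tasks per machine:", load)

# Adding a 9th machine: with modulo placement, most tasks (about 8 in 9) change machines
print(f"share of tasks moved: {fraction_moved(tasks, 8, 9):.0%}")
```

Moving that many tasks means moving their data too, which is one reason scale-out systems need smarter placement schemes (consistent hashing is a well-known example) plus failure handling on top.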


CASE 01 – BUSINESS TRANSACTION IN RDBMS

• Let’s assume that we need to handle a 1 TB database
• 100 million transactions a day
• You want to handle this without any failure
• You are the H/W architect. What would you do?
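A quick back-of-the-envelope check on the load, assuming (as a simplification) that the 100 million transactions are spread evenly over 24 hours. Real traffic is bursty, so the 3x peak factor below is a hypothetical placeholder, not a figure from this deck.

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds
TRANSACTIONS_PER_DAY = 100_000_000  # from the Case 01 assumptions
PEAK_FACTOR = 3                     # hypothetical: busiest hour runs at 3x the daily average

avg_tps = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY
peak_tps = avg_tps * PEAK_FACTOR

print(f"Average: {avg_tps:,.0f} transactions/second")          # Average: 1,157 transactions/second
print(f"Peak (assumed): {peak_tps:,.0f} transactions/second")  # Peak (assumed): 3,472 transactions/second
```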


H/W ARCHITECTURE FOR THAT

[Diagram: high-availability RDBMS architecture]
• Firewall / L2 switch
• Two Unix servers (40 cores each), each running the commercial DB, configured as a cluster
• SAN switch
• Two 1 TB storage arrays with mirroring


ESTIMATED COST

[S/W]
• DB license: $5,000 / core x 80 cores = $400,000
• Clustering: $50,000

[H/W]
• 40-core Unix server x 2 = $1,000,000
• Storage = $100,000
• Switches = $30,000

Disclaimer: These are not actual prices. Real prices depend on your sales history with the vendor; I wrote these estimates based on my experience.

Total: $1,580,000, i.e. roughly $2,000,000


PROBLEM

Your CFO will probably tell you:

“That’s too expensive. Is there any way to reduce the cost?”


CASE 02 – BUSINESS TRANSACTION IN HADOOP

[Diagram: eight HP DL380 x86 servers (10 cores each) behind a firewall / switch]

Suppose each server has a 500 GB SCSI HDD: 500 GB x 8 = 4 TB of raw disk, enough to store the 1 TB database with full mirroring (2 TB).


ESTIMATED COST

[S/W]
• Hadoop is open source. It’s free!

[H/W]
• 10-core x86 machine x 8 = $80,000
• Switches = $30,000

Disclaimer: These are not actual prices. Real prices depend on your sales history with the vendor; I wrote these estimates based on my experience.

Total: roughly $110,000

vs. roughly $2,000,000 for Unix + commercial DB
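The two estimates can be totalled side by side. This small Python sketch only re-adds the line items from the two ESTIMATED COST slides (the author's rough, non-quoted prices); the dictionary names are illustrative. Note that the itemized Case 01 figures sum to $1,580,000, which the slide rounds to roughly $2,000,000.

```python
# Line items copied from the two ESTIMATED COST slides (rough estimates, not quotes)
case01_unix_rdbms = {
    "DB license ($5,000/core x 80 cores)": 5_000 * 80,
    "Clustering": 50_000,
    "40-core Unix server x 2": 1_000_000,
    "Storage": 100_000,
    "Switches": 30_000,
}
case02_hadoop = {
    "Hadoop (open source)": 0,
    "10-core x86 server x 8": 80_000,
    "Switches": 30_000,
}

total01 = sum(case01_unix_rdbms.values())
total02 = sum(case02_hadoop.values())

print(f"Case 01, Unix + commercial DB: ${total01:,}")     # Case 01, Unix + commercial DB: $1,580,000
print(f"Case 02, x86 + Hadoop: ${total02:,}")             # Case 02, x86 + Hadoop: $110,000
print(f"Case 01 costs {total01 / total02:.1f}x as much")  # Case 01 costs 14.4x as much
```

Whether you use the itemized total or the rounded $2,000,000, the scale-out option comes out more than an order of magnitude cheaper under these assumptions.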


SCALABILITY

• Let’s assume that we have more customers. We need more computing power.

[Unix + commercial DB]
I need to buy one more server, one more storage unit, and a 40-core commercial DB license => Prohibitively expensive

[Linux + Hadoop]
Just add one more x86 server. It’s not a big deal => Cheap
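To put numbers on “one more step of capacity”, this sketch derives per-unit prices from the two cost slides. The per-unit split is an assumption (each slide total divided evenly per machine), and reusing the full $100,000 storage line for “one more storage” is likewise an assumption, not a figure from the deck.

```python
# Unit prices derived from the slide estimates (assumption: totals split evenly per unit)
UNIX_SERVER = 1_000_000 // 2  # one 40-core Unix server (Case 01: two for $1,000,000)
STORAGE = 100_000             # assumption: one more storage unit costs the Case 01 storage line
DB_LICENSE_PER_CORE = 5_000   # commercial DB license per core
X86_SERVER = 80_000 // 8      # one 10-core x86 server (Case 02: eight for $80,000)

# Cost of adding one more server's worth of capacity on each platform
unix_step = UNIX_SERVER + STORAGE + 40 * DB_LICENSE_PER_CORE  # +40 cores
hadoop_step = X86_SERVER                                      # +10 cores

print(f"Unix + commercial DB, +40 cores: ${unix_step:,} (${unix_step // 40:,} per core)")
print(f"Linux + Hadoop, +10 cores: ${hadoop_step:,} (${hadoop_step // 10:,} per core)")
# Unix + commercial DB, +40 cores: $800,000 ($20,000 per core)
# Linux + Hadoop, +10 cores: $10,000 ($1,000 per core)
```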


IS HADOOP ALMIGHTY?

• No
  – You have to write Java code in lieu of SQL.
  – You have to code MapReduce jobs to retrieve the data or transform it into the form you want.
  – It lacks the sophisticated data management technology needed for optimized performance.
  – It is open source, so don’t expect any vendor technical support.
• It has a mutually supportive relationship with commercial RDBMSs:
  – RDBMS: real-time transactions
  – Big Data: business intelligence
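To show why “MapReduce instead of SQL” means more code, here is an in-memory Python sketch of the map → shuffle → reduce flow for what SQL expresses in one line (`SELECT zipcode, COUNT(*) FROM customers WHERE defaulted GROUP BY zipcode`). The records and function names are hypothetical; a real Hadoop job is written against Hadoop’s Java API and runs distributed across machines, which this single-process sketch does not do.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator

# Hypothetical customer records: (zipcode, defaulted?)
records = [
    ("46637", False),
    ("10001", True),
    ("10001", True),
    ("46637", True),
    ("60601", False),
]


def map_phase(record: tuple[str, bool]) -> Iterator[tuple[str, int]]:
    """Emit (zipcode, 1) for every customer who defaulted."""
    zipcode, defaulted = record
    if defaulted:
        yield zipcode, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key (the framework does this between map and reduce)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts for one zipcode."""
    return key, sum(values)


mapped = (pair for record in records for pair in map_phase(record))
result = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(result)  # {'10001': 2, '46637': 1}
```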


DATA MINING

• Please don’t confuse it with Big Data!

Big Data ≠ Data Mining
• Big Data: where do we store the data?
• Data Mining: how do we use the data?


DATA MINING

Suppose that you are in charge of issuing credit cards. You want to know who is likely to default… You already have records of past transactions.

Gender   Zipcode  Age  Education  Income   Default
Male     46637    33   Master     $90,000  No
Female   10001    21   GED        $50,000  Yes
…        …        …    …          …        …


DATA MINING

[Chart: past customers plotted by Income and Age, with dividing lines at age 35 and income $30,000]

There is a certain group of people who are likely to default.


ALGORITHMS

• K-nearest neighbors (k-NN)
• Classification tree
• Naïve Bayes
• Machine learning
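As an illustration of the first algorithm, here is a minimal k-nearest-neighbors sketch in plain Python for the credit-card example: a new applicant gets the majority label of the k most similar past customers by age and income. The first two rows echo the table above; the other four rows are hypothetical, and the unscaled Euclidean distance is a simplification.

```python
import math
from collections import Counter

# Past customers: (age, income in $1,000s, defaulted?)
# The first two rows come from the table above; the rest are hypothetical.
history = [
    (33, 90, "No"),
    (21, 50, "Yes"),
    (45, 120, "No"),
    (24, 25, "Yes"),
    (29, 20, "Yes"),
    (52, 60, "No"),
]


def knn_predict(age: float, income_k: float, k: int = 3) -> str:
    """Majority vote among the k past customers closest in (age, income) space."""
    # Features are not scaled here; real use would normally normalize them first
    by_distance = sorted(
        history,
        key=lambda row: math.dist((age, income_k), (row[0], row[1])),
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(26, 28))   # Yes
print(knn_predict(40, 100))  # No
```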


DATA MINING

• From existing data, identify the relationship between the Y value and the X values.
  – y = f(x1, x2, x3, …)
  – f could be y = ax, y = log(x), y = exp(x), or something else. We don’t know in advance, but a machine can try candidate models and find the one that best accounts for the Y value.

• AlphaGo, the Go-playing AI from Google DeepMind, applied this approach and took it to a far more advanced level
  – Y value: the probability of winning the game
  – X values: the positions of the white and black stones
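The “try candidate models and keep the best fit” idea can be sketched with a single X value (a simplification of y = f(x1, x2, …)). For each candidate shape g, this Python sketch fits y = a·g(x) by least squares and keeps the one with the smallest squared error. The candidate list and the data, generated from y = 2·log(x), are hypothetical choices for illustration.

```python
import math

# Candidate shapes for y = a * g(x); the machine tries each one
CANDIDATES = {
    "y = a*x": lambda x: x,
    "y = a*log(x)": math.log,
    "y = a*exp(x)": math.exp,
}


def fit(g, xs, ys):
    """Least-squares fit of y = a*g(x); returns (a, sum of squared errors)."""
    gs = [g(x) for x in xs]
    a = sum(gi * yi for gi, yi in zip(gs, ys)) / sum(gi * gi for gi in gs)
    sse = sum((yi - a * gi) ** 2 for gi, yi in zip(gs, ys))
    return a, sse


def best_model(xs, ys):
    """Return the candidate with the smallest squared error, plus its coefficient a."""
    results = {name: fit(g, xs, ys) for name, g in CANDIDATES.items()}
    name = min(results, key=lambda n: results[n][1])
    return name, results[name][0]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * math.log(x) for x in xs]  # hypothetical data generated from y = 2*log(x)
print(best_model(xs, ys))  # picks "y = a*log(x)" with a close to 2
```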


WHAT CAN WE DO WITH DATA MINING?

• Combine it with Big Data technology
• Identify marketing opportunities
  – Who has purchased our products?
• Detect financial fraud
  – Which transactions look fraudulent?
• Artificial intelligence
  – Go, chess, and other games
• Etc.


Q&A

• If you have any questions, feel free to ask me.
www.mbaprogrammer.com