Hadoop Distributed File System (HDFS)

These training presentations were produced within the Istanbul Big Data Education and Research Center Project (no. TR10/16/YNY/0036), carried out under the Istanbul Development Agency's 2016 Innovative and Creative Istanbul Financial Support Program. Sole responsibility for the content lies with Bahçeşehir University; it does not reflect the views of İSTKA or the Ministry of Development.

Hadoop Distributed File System (HDFS) - Big Data · 2018-01-31

Motivation Questions

• Problem 1: Data is too big to store on one machine.

• HDFS: Store the data on multiple machines!

Motivation Questions

• Problem 2: Very high-end machines are too expensive.

• HDFS: Run on commodity hardware!

Motivation Questions

• Problem 3: Commodity hardware will fail!

• HDFS: Software is intelligent enough to handle hardware failure!

Motivation Questions

• Problem 4: What happens to the data if the machine that stores it fails?

• HDFS: Replicate the data!

Motivation Questions

• Problem 5: How can distributed machines organize the data in a coordinated way?

• HDFS: Master-Slave Architecture!

How do we get data to the workers?

[Diagram: Compute Nodes, NAS, SAN]

What’s the problem here?

Distributed File System

• Don’t move data to workers… move workers to the data!

– Store data on the local disks of nodes in the cluster

– Start up the workers on the node that holds the data locally

• Why?

– Not enough RAM to hold all the data in memory

– Disk access is slow, but disk throughput is reasonable

• A distributed file system is the answer

– GFS (Google File System) for Google’s MapReduce

– HDFS (Hadoop Distributed File System) for Hadoop

Distributed File System

• Single Namespace for entire cluster

• Data Coherency

– Write-once-read-many access model

– Client can only append to existing files

• Files are broken up into blocks

– Each block is replicated on multiple DataNodes

• Intelligent Client

– Client can find location of blocks

– Client accesses data directly from DataNode
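As a rough illustration of "files are broken up into blocks", the Python sketch below cuts a byte string into fixed-size blocks. The function name `split_into_blocks` and the tiny block size are illustrative assumptions, not HDFS code; real HDFS blocks are tens of megabytes.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks; only the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# A 10-byte "file" with a 4-byte block size gives two full blocks and a short tail.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']

# Concatenating the blocks in order restores the original file.
assert b"".join(blocks) == b"abcdefghij"
```

Each block would then be replicated independently, which is what lets a single file span many DataNodes.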

GFS: Assumptions

• Commodity hardware over “exotic” hardware

– Scale “out”, not “up”

• High component failure rates

– Inexpensive commodity components fail all the time

• “Modest” number of huge files

– Multi-gigabyte files are common, if not encouraged

• Files are write-once, mostly appended to

– Perhaps concurrently

• Large streaming reads over random access

– High sustained throughput over low latency

GFS slides adapted from material by (Ghemawat et al., SOSP 2003)

GFS: Design Decisions

• Files stored as chunks

– Fixed size (64 MB)

• Reliability through replication

– Each chunk replicated across 3+ chunkservers

• Single master to coordinate access, keep metadata

– Simple centralized management

• No data caching

– Little benefit due to large datasets, streaming reads

• Simplify the API

– Push some of the issues onto the client (e.g., data layout)

HDFS = GFS clone (same basic ideas)

From GFS to HDFS

• Terminology differences:

– GFS master = Hadoop namenode

– GFS chunkservers = Hadoop datanodes

• Functional differences:

– HDFS performance is (likely) slower

For the most part, we’ll use the Hadoop terminology…

HDFS Architecture: Master-Slave

• Name Node (NN): Controller

– File System Name Space Management

– Block Mappings

• Data Node (DN): Work Horses

– Block Operations

– Replication

• Secondary Name Node (SNN):

– Checkpoint node

[Diagram: Single Rack Cluster; labels: Master, Slaves, Name Node (NN), Data Node (DN), Secondary Name Node (SNN)]

HDFS Cluster Architecture

[Diagram: Client, NameNode, Secondary NameNode, and DataNodes; links labeled “Cluster Membership”]

• NameNode: Maps a file to a file-id and a list of DataNodes

• DataNode: Maps a block-id to a physical location on disk

• SecondaryNameNode: Periodic merge of the transaction log

Block Placement

• Current Strategy

– One replica on local node

– Second replica on a remote rack

– Third replica on same remote rack

– Additional replicas are randomly placed

• Clients read from nearest replica

• Would like to make this policy pluggable
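The placement strategy listed above can be sketched in a few lines of Python. This is a simplified model, not the HDFS implementation: the function `place_replicas`, the `topology` dictionary, and the node and rack names are all hypothetical, and the sketch assumes at least two racks with at least two nodes on the chosen remote rack.

```python
import random


def place_replicas(writer, topology, replication=3, rng=random):
    """Pick target nodes for one block.

    topology maps a rack name to the list of node names on that rack;
    writer is the node on which the writing client runs.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer]                                  # 1st replica: local node
    if replication >= 2:
        remote_racks = [r for r in topology if r != rack_of[writer]]
        remote_rack = rng.choice(remote_racks)
        second = rng.choice(topology[remote_rack])
        chosen.append(second)                          # 2nd replica: a remote rack
    if replication >= 3:
        others = [n for n in topology[remote_rack] if n != second]
        chosen.append(rng.choice(others))              # 3rd replica: same remote rack
    # Any additional replicas go to randomly chosen remaining nodes.
    remaining = [n for n in rack_of if n not in chosen]
    rng.shuffle(remaining)
    chosen.extend(remaining[:max(0, replication - len(chosen))])
    return chosen


topology = {"r1": ["n1", "n2"], "r2": ["n3", "n4"], "r3": ["n5", "n6"]}
picks = place_replicas("n1", topology)
print(picks)  # first entry is always n1; the other two share one remote rack
```

Keeping two of the three replicas on one remote rack means the block crosses the inter-rack link only once during the write, while a whole-rack failure still leaves at least one copy.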

Data Correctness

• Use Checksums to validate data

– Use CRC32

• File Creation

– Client computes a checksum per 512 bytes

– DataNode stores the checksum

• File access

– Client retrieves the data and checksum from DataNode

– If Validation fails, Client tries other replicas
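The checksum scheme above can be sketched with Python's standard `zlib.crc32`. The helper names `checksums` and `read_verified` are hypothetical, and replicas are modelled as plain byte strings rather than DataNode connections.

```python
import zlib

CHUNK = 512  # bytes covered by each checksum, as stated on the slide


def checksums(data: bytes) -> list[int]:
    """Compute one CRC32 value per 512-byte chunk (done by the client at file creation)."""
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def read_verified(replicas, expected):
    """Return the first replica whose checksums match; otherwise try the next one."""
    for replica in replicas:
        if checksums(replica) == expected:
            return replica
    raise IOError("all replicas failed checksum validation")


original = bytes(range(256)) * 5           # 1280 bytes -> 3 checksummed chunks
stored = checksums(original)               # kept alongside the data
corrupted = b"X" + original[1:]            # first replica has a flipped byte

assert read_verified([corrupted, original], stored) == original
```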

HDFS Architecture: Master-Slave

[Diagram: Multiple-Rack Cluster; Rack 1, Rack 2, ..., Rack N, each with a switch and Data Nodes (DN), plus the Name Node (NN) and Secondary Name Node (SNN)]

Reliable Storage

• Name Node: “I know all blocks and replicas!”

• NN will replicate lost blocks on another node

HDFS Architecture: Master-Slave

[Diagram: Multiple-Rack Cluster; Rack 1, Rack 2, ..., Rack N, each with a switch and Data Nodes (DN), plus the Name Node (NN) and Secondary Name Node (SNN)]

Rack Awareness

• Name Node: “I know the topology of the cluster!”

• NN will replicate lost blocks across racks

HDFS Architecture: Master-Slave

[Diagram: Multiple-Rack Cluster; Rack 1, Rack 2, ..., Rack N, each with a switch and Data Nodes (DN), plus the Name Node (NN) and Secondary Name Node (SNN)]

Single Point of Failure

• Name Node: “Do not ask me, I am down”

HDFS Architecture: Master-Slave

[Diagram: Multiple-Rack Cluster; Rack 1, Rack 2, ..., Rack N, each with a switch and Data Nodes (DN), plus the Name Node (NN) and Secondary Name Node (SNN)]

• How about network performance?

• Keep bulky communication within a rack!

HDFS Inside: Name Node

Name Node:

Filename   Replication factor   Block IDs
File 1     3                    [1, 2, 3]
File 2     2                    [4, 5, 6]
File 3     1                    [7, 8]

Data Nodes (blocks stored on each):

1, 2, 5, 7, 4, 3
1, 5, 3, 2, 8, 6
1, 4, 3, 2, 6

• Snapshot of FS

• Edit log: records changes to FS
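The Name Node bookkeeping on this slide can be modelled with two dictionaries. This is only a sketch: the data node names `dn1` to `dn3` are hypothetical labels for the three block lists on the slide, and the helper functions are illustrative rather than HDFS APIs.

```python
# File name -> (replication factor, block IDs), copied from the slide.
namespace = {
    "File 1": (3, [1, 2, 3]),
    "File 2": (2, [4, 5, 6]),
    "File 3": (1, [7, 8]),
}

# Data node -> block IDs it stores, copied from the slide (node names are made up).
block_reports = {
    "dn1": [1, 2, 5, 7, 4, 3],
    "dn2": [1, 5, 3, 2, 8, 6],
    "dn3": [1, 4, 3, 2, 6],
}


def block_locations(reports):
    """Invert the per-node reports into block ID -> list of nodes holding it."""
    locations = {}
    for node, blocks in reports.items():
        for block in blocks:
            locations.setdefault(block, []).append(node)
    return locations


def under_replicated(namespace, locations):
    """List blocks that have fewer replicas than their file's replication factor."""
    return [
        block
        for factor, blocks in namespace.values()
        for block in blocks
        if len(locations.get(block, [])) < factor
    ]


locations = block_locations(block_reports)
print(locations[4])                              # ['dn1', 'dn3']
print(under_replicated(namespace, locations))    # []
```

With the numbers from the slide, every block already has exactly as many replicas as its file's replication factor.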

HDFS Inside: Name Node

[Diagram: Name Node (FS image, Edit log) with its Data Nodes; Secondary Name Node (FS image, Edit log)]

• Secondary Name Node, periodically:

– House keeping

– Backup of NN metadata
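One way to picture the periodic house keeping is as a checkpoint: replay the edit log on top of the last FS image to obtain a new image, then start a fresh log. The sketch below is a toy model under that assumption; the operation names (`create`, `add_block`, `delete`) and the dictionary layout are hypothetical and do not reflect the real on-disk formats.

```python
import copy


def apply_edit(image: dict, edit: tuple) -> None:
    """Apply one logged change to an in-memory FS image."""
    op, path, *args = edit
    if op == "create":
        image[path] = {"replication": args[0], "blocks": []}
    elif op == "add_block":
        image[path]["blocks"].append(args[0])
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fs_image: dict, edit_log: list) -> tuple:
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    new_image = copy.deepcopy(fs_image)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


fs_image = {"/a": {"replication": 3, "blocks": [1]}}
edit_log = [
    ("add_block", "/a", 2),
    ("create", "/b", 2),
    ("add_block", "/b", 3),
    ("delete", "/a"),
]
new_image, new_log = checkpoint(fs_image, edit_log)
print(new_image)  # {'/b': {'replication': 2, 'blocks': [3]}}
```

Without such merging the edit log would keep growing, and replaying it at Name Node startup would take longer and longer.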

HDFS Inside: Blocks

• Q: Why do we need the abstraction “Blocks” in addition to “Files”?

• Reasons:

• File can be larger than a single disk

• Block is of fixed size, easy to manage and manipulate

• Easy to replicate and to do more fine-grained load balancing

HDFS Inside: Blocks

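• The HDFS block size was 64 MB by default in early versions (128 MB since Hadoop 2). Why is it so much larger than a regular file system block?

• Reasons:

• Minimize overhead: disk seek time is almost constant

• Example: seek time is 10 ms and the transfer rate is 100 MB/s. If the overhead (seek time / block transfer time) should be 1%, what is the block size?

• Answer: 100 MB, because a 1 s transfer at 100 MB/s makes the 10 ms seek 1% of the transfer time (HDFS → 128 MB)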
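The arithmetic behind this example can be checked with a few lines of Python. The variable names and the `overhead` helper are illustrative; the numbers are the ones given on the slide.

```python
seek_time_s = 0.010         # 10 ms per seek
transfer_rate_mb_s = 100    # 100 MB/s sustained transfer rate
target_overhead = 0.01      # seek time / block transfer time = 1%

# The transfer must take seek_time / overhead seconds, i.e. 1 s.
transfer_time_s = seek_time_s / target_overhead
block_size_mb = transfer_time_s * transfer_rate_mb_s
print(block_size_mb)  # 100.0


def overhead(block_mb: float) -> float:
    """Seek time as a fraction of the time needed to transfer one block."""
    return seek_time_s / (block_mb / transfer_rate_mb_s)


for size in (64, 100, 128):
    print(size, "MB ->", f"{overhead(size):.2%}")
```

Under these assumptions a 64 MB block sits a little above the 1% target and a 128 MB block a little below it.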

HDFS Inside: Read

[Diagram: Client, Name Node, and data nodes DN1, DN2, DN3, ..., DNn; arrows numbered 1 to 4 correspond to the steps below]

1. Client connects to NN to read data

2. NN tells client where to find the data blocks

3. Client reads blocks directly from data nodes (without going through NN)

4. In case of node failures, client connects to another node that serves the missing block
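The four read steps can be sketched as a toy simulation. The classes `DataNode` and `NameNode`, the method `get_block_locations`, and the function `read_file` are hypothetical stand-ins for the real components; a node failure is modelled as a `ConnectionError`.

```python
class DataNode:
    def __init__(self, name, blocks, alive=True):
        self.name, self.blocks, self.alive = name, blocks, alive

    def read(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class NameNode:
    def __init__(self, files, locations):
        self.files = files          # path -> ordered list of block IDs
        self.locations = locations  # block ID -> data nodes holding a replica

    def get_block_locations(self, path):
        return [(b, self.locations[b]) for b in self.files[path]]


def read_file(namenode, path):
    data = b""
    # Steps 1-2: ask the NN where the blocks are; only metadata flows through it.
    for block_id, nodes in namenode.get_block_locations(path):
        for node in nodes:
            try:
                data += node.read(block_id)   # Step 3: read directly from a DN
                break
            except ConnectionError:
                continue                      # Step 4: fall back to another replica
        else:
            raise IOError(f"no live replica for block {block_id}")
    return data


dn1 = DataNode("dn1", {1: b"hello ", 2: b"world"}, alive=False)
dn2 = DataNode("dn2", {1: b"hello ", 2: b"world"})
nn = NameNode({"/f": [1, 2]}, {1: [dn1, dn2], 2: [dn2]})
print(read_file(nn, "/f"))  # b'hello world'
```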

HDFS Inside: Read

• Q: Why does HDFS choose such a design for reads? Why not have the client read blocks through the NN?

• Reasons:

• Prevent the NN from becoming the bottleneck of the cluster

• Allow HDFS to scale to a large number of concurrent clients

• Spread the data traffic across the cluster

HDFS Inside: Read

• Q: Given multiple replicas of the same block, how does NN decide which replica the client should read?

• HDFS Solution:

• Rack awareness based on network topology

HDFS Inside: Network Topology

• The critical resource in HDFS is bandwidth; distance is defined in terms of it

• Measuring bandwidth between every pair of nodes is too complex and does not scale

• Basic idea: available bandwidth decreases going down this list

– Processes on the same node

– Different nodes on the same rack

– Nodes on different racks in the same data center (cluster)

– Nodes in different data centers

HDFS Inside: Network Topology

• HDFS takes a simple approach:

– See the network as a tree

– Distance between two nodes is the sum of their distances to their closest common ancestor

[Diagram: network tree]

Data center 1: Rack 1 (n1, n2), Rack 2 (n3, n4)

Data center 2: Rack 3 (n5, n6), Rack 4 (n7, n8)

HDFS Inside: Network Topology

• What are the distances between the following pairs?

– Dist(d1/r1/n1, d1/r1/n1) = 0

– Dist(d1/r1/n1, d1/r1/n2) = 2

– Dist(d1/r1/n1, d1/r2/n3) = 4

– Dist(d1/r1/n1, d2/r3/n6) = 6

[Diagram: network tree]

Data center 1: Rack 1 (n1, n2), Rack 2 (n3, n4)

Data center 2: Rack 3 (n5, n6), Rack 4 (n7, n8)
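The tree distance defined on these slides is short enough to write out. The sketch below assumes nodes are named by their path from the root, as in `d1/r1/n1`; the function name `distance` is illustrative.

```python
def distance(a: str, b: str) -> int:
    """Sum of the hops from each node up to their closest common ancestor."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1  # length of the shared prefix (data center, rack, node)
    return (len(path_a) - common) + (len(path_b) - common)


print(distance("d1/r1/n1", "d1/r1/n1"))  # 0  same node
print(distance("d1/r1/n1", "d1/r1/n2"))  # 2  same rack
print(distance("d1/r1/n1", "d1/r2/n3"))  # 4  same data center, different rack
print(distance("d1/r1/n1", "d2/r3/n6"))  # 6  different data centers
```

A client picking among replicas would prefer the one with the smallest distance, which is how rack awareness feeds into the read path.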

HDFS Inside: Write

[Diagram: Client, Name Node, and data nodes DN1, DN2, DN3, ..., DNn; arrows numbered 1 to 4 correspond to the steps below]

1. Client connects to NN to write data

2. NN tells the client which data nodes to write to

3. Client writes blocks directly to data nodes with the desired replication factor

4. In case of node failures, NN will figure it out and replicate the missing blocks
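Step 4 can be illustrated with a small planning function: given which nodes are still alive, find blocks with too few live replicas and pick a source and a target for each missing copy. This is a deliberately simplified sketch; the function `plan_re_replication` and the node names are hypothetical, and target selection here ignores rack awareness and load.

```python
def plan_re_replication(locations, replication, live_nodes):
    """Return a list of (block, source node, target node) copy operations.

    locations:   block ID -> set of nodes that held a replica
    replication: block ID -> desired number of replicas
    live_nodes:  set of nodes currently alive
    """
    plan = []
    for block, holders in sorted(locations.items()):
        live = sorted(h for h in holders if h in live_nodes)
        missing = replication[block] - len(live)
        if missing <= 0 or not live:
            continue  # fully replicated, or no surviving copy to read from
        candidates = sorted(n for n in live_nodes if n not in live)
        for target in candidates[:missing]:
            plan.append((block, live[0], target))
    return plan


locations = {1: {"dn1", "dn2", "dn3"}, 2: {"dn1", "dn3"}, 3: {"dn2", "dn4"}}
replication = {1: 3, 2: 2, 3: 2}
live_nodes = {"dn1", "dn2", "dn4"}   # dn3 has failed

print(plan_re_replication(locations, replication, live_nodes))
# [(1, 'dn1', 'dn4'), (2, 'dn1', 'dn2')]
```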

HDFS Inside: Write

• Q: Where should HDFS put the three replicas of a block? What tradeoffs do we need to consider?

• Tradeoffs:

• Reliability

• Write Bandwidth

• Read Bandwidth

Q: What are some possible strategies?

HDFS Inside: Write

• Replication Strategy vs Tradeoffs

Strategy                              Reliability   Write Bandwidth   Read Bandwidth
Put all replicas on one node
Put all replicas on different racks

• HDFS:

– 1 → same node as client

– 2 → a node on a different rack

– 3 → a different node on the same rack as 2

HDFS Interface

• Web Based Interface

• Command Line: HDFS shell (hadoop fs / hdfs dfs)

HDFS-Web UI

HDFS Command Line

• HDFS Shell
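A few common shell operations are sketched below as command lines built from Python. The `hdfs dfs` subcommands used (`-mkdir -p`, `-put`, `-ls`, `-cat`, `-get`) are standard, but the paths, file names, the `hdfs_dfs` helper, and the `RUN` switch are assumptions for illustration. The script only prints the commands unless `RUN` is set to `True` and an `hdfs` binary is found on the PATH.

```python
import shutil
import subprocess

RUN = False  # set to True on a machine with a configured Hadoop client


def hdfs_dfs(*args: str) -> list[str]:
    """Build the argument list for one 'hdfs dfs' invocation."""
    return ["hdfs", "dfs", *args]


commands = [
    hdfs_dfs("-mkdir", "-p", "/user/demo"),                 # create a directory
    hdfs_dfs("-put", "local.txt", "/user/demo/"),           # upload a local file
    hdfs_dfs("-ls", "/user/demo"),                          # list the directory
    hdfs_dfs("-cat", "/user/demo/local.txt"),               # print file contents
    hdfs_dfs("-get", "/user/demo/local.txt", "copy.txt"),   # download a copy
]

for cmd in commands:
    print(" ".join(cmd))
    if RUN and shutil.which("hdfs"):
        subprocess.run(cmd, check=False)
```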