
Hadoop Distributed File System Reliability and Durability at Facebook


Page 1: Hadoop Distributed File System Reliability and Durability at Facebook

I accidentally the Namenode
HDFS reliability at Facebook

Andrew Ryan, Facebook
April 2012

Page 2: Hadoop Distributed File System Reliability and Durability at Facebook

The HDFS Namenode: SPOF by design

▪  Single Point of Failure by design

▪  All metadata operations go through Namenode

▪  Early designers made tradeoffs: features & performance first

[Diagram: Simplified HDFS Architecture with the Namenode as SPOF, showing the Namenode, Secondary Namenode, Datanodes, and Clients]
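To make the dependency concrete, here is a minimal sketch (not from the deck) of a metadata-only operation against a Hadoop 0.20-era cluster; the hostname and path are hypothetical. A call like this is an RPC to the Namenode and never touches a Datanode, so it fails during a Namenode outage even though every byte of data is still intact.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListWarehouseDir {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.default.name (the 0.20-era key) points every client at the single Namenode.
    conf.set("fs.default.name", "hdfs://namenode.example.com:8020"); // hypothetical host
    FileSystem fs = FileSystem.get(conf);
    // Listing a directory is pure metadata: it is served entirely by the Namenode.
    for (FileStatus status : fs.listStatus(new Path("/user/hive/warehouse"))) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
    fs.close();
  }
}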

Page 3: Hadoop Distributed File System Reliability and Durability at Facebook

HDFS major use cases at Facebook: Data Warehouse and Facebook Messages

                           Data Warehouse                    Facebook Messages
# of clusters              <10                               10’s
Size of clusters           Large (100’s – 1000’s of nodes)   Small (~100 nodes)
Processing workload        MapReduce batch jobs              HBase transactions
Namenode load              Very heavy                        Very light
End-user downtime impact   None                              Users without Messages

Page 4: Hadoop Distributed File System Reliability and Durability at Facebook

HDFS at Facebook: 2009-2012
Some things have changed…

                                       2009         2012
# HDFS clusters                        1            >100
Largest HDFS cluster size (storage)    600TB        >100PB
Largest HDFS cluster size (# files)    10 million   200 million
HDFS cluster types                     MapReduce    MapReduce, HBase, MySQL backups, +more

Page 5: Hadoop Distributed File System Reliability and Durability at Facebook

HDFS at Facebook: 2009-2012
…and some things have not

                                       2009                  2012
Single points of failure in HDFS       Namenode              Namenode
HDFS cluster restart time              60 minutes            60 minutes
Namenode failover method               Manual, complicated   Manual, complicated
SPOF Namenode as a cause of downtime   Unknown               Unknown

Page 6: Hadoop Distributed File System Reliability and Durability at Facebook

Data Warehouse

▪  Storage and querying of structured log data using Hive and Hadoop MapReduce

▪  Composed of dozens of tools/components

▪  A “vigorous and creative” user population

[Diagram: Data Warehouse stack on Hadoop: UI Tools, Workflow (Nocron), Query (Hive), Compute (MapReduce), Storage (HDFS)]
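As an illustration of the batch work that runs on the Compute (MapReduce) layer, here is a minimal Hadoop 0.20-style job that counts structured log records per event type. The log format (tab-separated lines whose first field is an event name) and the class names are hypothetical; in practice most Data Warehouse jobs are generated by Hive rather than written by hand.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {
  // Emits (eventName, 1) for every log line.
  public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      ctx.write(new Text(fields[0]), ONE);
    }
  }
  // Sums the counts for each event name.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text event, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) sum += c.get();
      ctx.write(event, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "event-count");
    job.setJarByClass(EventCount.class);
    job.setMapperClass(EventMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. a directory of daily logs
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}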

Page 7: Hadoop Distributed File System Reliability and Durability at Facebook

Data Warehouse, all incidents: 41% are HDFS-related

Page 8: Hadoop Distributed File System Reliability and Durability at Facebook

Data Warehouse, SPOF Namenode incidents: 10% are attributable to the SPOF Namenode

Page 9: Hadoop Distributed File System Reliability and Durability at Facebook

Facebook Messages

[Diagram: Facebook Messages architecture, showing clients (www, chat, MTA, etc.), a User Directory Service, Messages Cells (Application Server, HBase/HDFS/ZK), Haystack, and mail servers handling inbound mail, anti-spam, and outbound mail]
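Since the Messages workload consists of HBase transactions layered on HDFS, here is a minimal sketch of a write using the HBase 0.90-era client API. The table name, row-key scheme, and column names are hypothetical and only illustrate the access pattern, not the actual Messages schema.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreMessage {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "messages");                 // hypothetical table
    Put put = new Put(Bytes.toBytes("user123#thread456"));       // hypothetical row key
    put.add(Bytes.toBytes("m"), Bytes.toBytes("body"),           // family "m", qualifier "body"
            Bytes.toBytes("Hi there"));
    table.put(put);  // HBase persists the edit via its write-ahead log, which lives on HDFS
    table.close();
  }
}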

Page 10: Hadoop Distributed File System Reliability and Durability at Facebook

Messages, all incidents: 16% are HDFS-related

Page 11: Hadoop Distributed File System Reliability and Durability at Facebook

Messages, SPOF Namenode incidents: 10% are attributable to the SPOF Namenode

Page 12: Hadoop Distributed File System Reliability and Durability at Facebook

What would happen if… Instead of this…

[Diagram: Simplified HDFS Architecture with the Namenode as SPOF, showing the Namenode, Secondary Namenode, Datanodes, and Clients]

Page 13: Hadoop Distributed File System Reliability and Durability at Facebook

What would happen if… We had this!

[Diagram: Simplified HDFS Architecture with a Highly Available Namenode, showing a Primary Namenode, a Standby Namenode, Datanodes, and Clients]

Page 14: Hadoop Distributed File System Reliability and Durability at Facebook

AvatarNode is our solution

[Diagrams: AvatarNode datanode view and AvatarNode client view]

Page 15: Hadoop Distributed File System Reliability and Durability at Facebook

AvatarNode is…

▪  A two-node, highly available Namenode with manual failover

▪  In production today at Facebook

▪  Open-sourced, based on Hadoop 0.20: https://github.com/facebook/hadoop-20
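Because failover is manual and coordinated through ZooKeeper, a client first has to learn which of the two nodes is currently the primary. The open-sourced code ships a client-side FileSystem wrapper that does this transparently; the sketch below only illustrates the underlying idea, and the ensemble address, znode path, and payload format are assumptions rather than AvatarNode's actual layout.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ResolvePrimaryNamenode {
  public static void main(String[] args) throws Exception {
    // Connect to the ZooKeeper ensemble that records which Avatar is primary.
    ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op for a one-shot lookup */ }
    });
    // Hypothetical znode whose payload is the address of the current primary.
    byte[] data = zk.getData("/hdfs/cluster1/primary", false, new Stat());
    System.out.println("Active Namenode: " + new String(data, "UTF-8"));
    zk.close();
  }
}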

Page 16: Hadoop Distributed File System Reliability and Durability at Facebook

AvatarNode does not…

▪  Eliminate the dependency on shared storage for image/edits

▪  Provide instant failover (~1 second per million blocks+files; a worked estimate follows this list)

▪  Provide automated failover

▪  Guarantee I/O fencing for Primary/Standby (although precautions are taken)

▪  Require ZooKeeper at all times for normal operation (ZooKeeper is required for failover)

▪  Allow for >2 Namenodes to participate in an HA cluster

▪  Have any special network requirements
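To put the failover rate in perspective, here is a back-of-the-envelope estimate that combines the ~1 second per million blocks+files figure above with the 200-million-file cluster from the 2009-2012 table; the assumption of roughly one block per file is ours, for illustration only:

  failover time ≈ (files + blocks) / 1,000,000 seconds
               ≈ (200,000,000 + 200,000,000) / 1,000,000 seconds
               ≈ 400 seconds, i.e. roughly 6-7 minutes

That is still far quicker than the 60-minute full cluster restart shown earlier.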

Page 17: Hadoop Distributed File System Reliability and Durability at Facebook

Wrapping up…

▪  The SPOF Namenode is a weak link in HDFS’s design

▪  In our services which use HDFS, we estimate we could eliminate:

▪  10% of service downtime from unscheduled outages

▪  20-50% of downtime from scheduled maintenance

▪  AvatarNode is Facebook’s solution for 0.20, available today

▪  Other Namenode HA solutions are being worked on in HDFS trunk (HDFS-1623)

Page 18: Hadoop Distributed File System Reliability and Durability at Facebook

Questions?

Page 19: Hadoop Distributed File System Reliability and Durability at Facebook

Sessions will resume at 11:25am