Administering a Hadoop cluster isn't easy. Many Hadoop clusters suffer from Linux configuration problems that can negatively impact performance. With vast and sometimes confusing config/tuning options, it can can tempting (and scary) for a cluster administrator to make changes to Hadoop when cluster performance isn't as expected. Learn how to improve Hadoop cluster performance and eliminate common problem areas, applicable across use cases, using a handful of simple Linux configuration changes.
Improving Hadoop Cluster Performance via Linux Configuration
2014 Hadoop Summit – San Jose, California
Alex Moundalexis • alexm at clouderagovt.com • @technmsg
Tips from a Former SA
CC BY 2.0 / Richard Bumgardner
Been there, done that.
Tips from a Former SA Field Guy
CC BY 2.0 / Alex Moundalexis
Home sweet home.
Tips from a Former SA Field Guy Easy steps to take…
Tips from a Former SA Field Guy Easy steps to take… that most people don’t.
What This Talk Isn’t About
• Deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor
• Sizing & Tuning • Depends heavily on data and workload
• Coding • Unless you count STDOUT redirection
• Algorithms • I suck at math, but we’ll try some multiplication later
“The answer to most Hadoop questions is: it depends.”
So What ARE We Talking About?
• Seven simple things • Quick • Safe • Viable for most environments and use cases
• Identify issue, then offer solution
• Note: Commands run as root or sudo
Bad news, best not to…
1. Swapping
Swapping
• A form of memory management • When OS runs low on memory…
• write blocks to disk • use now-free memory for other things • read blocks back into memory from disk when needed
• Also known as paging
Swapping
• Problem: Disks are slow, especially to seek • Hadoop is about maximizing IO
• spend less time acquiring data • operate on data in place • large streaming reads/writes from disk
• Memory usage is limited within JVM • we should be able to manage our memory
Disable Swap in Kernel
• Well, as much as possible.
• Immediate: # echo 0 > /proc/sys/vm/swappiness
• Persist after reboot: # echo "vm.swappiness = 0" >> /etc/sysctl.conf
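The persisted setting can be audited across a fleet without touching the live kernel. A minimal sketch, assuming a sysctl.conf-style file (the sample file below is made up purely to exercise the function; on a real node, compare its output against `/proc/sys/vm/swappiness`):

```shell
# Extract the persisted vm.swappiness value from a sysctl.conf-style file.
swappiness_of() {   # $1 = path to a sysctl.conf-style file
  awk -F= '/^[[:space:]]*vm\.swappiness/ { gsub(/[[:space:]]/, "", $2); print $2 }' "$1"
}

# Hypothetical sample config, just for illustration:
cat > /tmp/sysctl.sample <<'EOF'
net.ipv4.ip_forward = 0
vm.swappiness = 0
EOF

swappiness_of /tmp/sysctl.sample   # prints: 0
```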
Swapping Peculiarities
• Behavior varies based on Linux kernel • CentOS 6.4+ / Ubuntu 10.10+ • For you kernel gurus, that’s Linux 2.6.32-303+
• Prior • We don’t swap, except to avoid OOM condition.
• After • We don’t swap, ever.
• Details: http://tiny.cloudera.com/noswap
Disable this too.
2. File Access Time
File Access Time
• Linux tracks access time • writes to disk even if all you did was read
• Problem • more disk seeks • HDFS is write-once, read-many • NameNode tracks access information for HDFS
Don’t Track Access Time
• Mount volumes with noatime option • In /etc/fstab: /dev/sdc /data01 ext3 defaults,noatime 0 0
• Note: noatime implies nodiratime as well • What about relatime?
• Faster than atime but slower than noatime • No reboot required
• # mount -o remount /data01
Reclaim it, impress your bosses!
3. Root Reserved Space
Root Reserved Space
• EXT3/4 reserve 5% of disk for root-owned files • On an OS disk, sure • System logs, kernel panics, etc.
CC BY 2.0 / Alex Moundalexis
Disks used to be much smaller, right?
Do The Math
• Conservative • 5% of 1 TB disk = 46 GB • 5 data disks per server = 230 GB • 5 servers per rack = 1.15 TB
• Quasi-Aggressive • 5% of 4 TB disk = 186 GB • 12 data disks per server = 2.23 TB • 18 servers per rack = 40.1 TB
• That’s a LOT of unused storage!
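The arithmetic above is easy to reproduce. A shell sketch, taking a marketing 1 TB as roughly 931 GiB usable and the default 5% reserve (the slide rounds and mixes GB/GiB labels, so totals differ slightly):

```shell
# Reserved space at the default 5% ext3/4 reserve.
reserved_gib() {   # $1 = disk size in TB (marketing TB = ~931 GiB)
  awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 931 * 0.05 }'
}

reserved_gib 1    # about 46.5 GiB per 1 TB disk
reserved_gib 4    # about 186 GiB per 4 TB disk

# Quasi-aggressive rack: 12 x 4 TB disks per server, 18 servers
awk 'BEGIN { printf "%.1f TiB reserved per rack\n", 18 * 12 * 4 * 931 * 0.05 / 1024 }'
```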
Root Reserved Space
• On a Hadoop data disk, no root-‐owned files
• When creating a partition: # mkfs.ext3 -m 0 /dev/sdc
• On existing partitions: # tune2fs -m 0 /dev/sdc
• 0 is safe, 1 is for the ultra-paranoid
Turn it on, already!
4. Name Service Cache Daemon
Name Service Cache Daemon
• Daemon that caches name service requests • Passwords • Groups • Hosts
• Helps weather network hiccups • Helps more with high-latency LDAP, NIS, NIS+ • Small footprint • Zero configuration required
Name Service Cache Daemon
• Hadoop nodes • largely a network-based application • on the network constantly • issue lots of DNS lookups, especially HBase & distcp • can thrash DNS servers
• Reducing latency of service requests? Smart. • Reducing impact on shared infrastructure? Smart.
Name Service Cache Daemon
• Turn it on, let it work, leave it alone: # chkconfig --level 345 nscd on # service nscd start
• Check on it later: # nscd -g
• Unless using Red Hat SSSD; modify nscd config first! • Don’t use nscd to cache passwd, group, or netgroup • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
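For the SSSD case, the change amounts to flipping three `enable-cache` lines in nscd.conf while leaving hosts caching on. A sketch operating on an nscd.conf-style file; the sample below is a made-up fragment, not a full config, and option names follow nscd.conf(5), so verify against your distro:

```shell
# Disable nscd's passwd/group/netgroup caches (SSSD owns those),
# keep hosts caching enabled.
disable_identity_caches() {   # $1 = nscd.conf-style file
  sed -i -E 's/^([[:space:]]*enable-cache[[:space:]]+(passwd|group|netgroup)[[:space:]]+)yes/\1no/' "$1"
}

cat > /tmp/nscd.sample <<'EOF'
enable-cache  passwd    yes
enable-cache  group     yes
enable-cache  netgroup  yes
enable-cache  hosts     yes
EOF
disable_identity_caches /tmp/nscd.sample
grep enable-cache /tmp/nscd.sample
```

On a real node the target would be /etc/nscd.conf, followed by a `service nscd restart`.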
Not a problem, until they are.
5. File Handle Limits
File Handle Limits
• Kernel refers to files via a handle • Also called descriptors
• Linux is a multi-user system • File handles protect the system from
• Poor coding • Malicious users • Pictures of cats on the Internet
Microsoft Office EULA. Really.
java.io.FileNotFoundException: (Too many open files)
File Handle Limits
• Linux defaults usually not enough • Increase maximum open files (default 1024)
# echo "hdfs - nofile 32768" >> /etc/security/limits.conf # echo "mapred - nofile 32768" >> /etc/security/limits.conf # echo "hbase - nofile 32768" >> /etc/security/limits.conf
• Bonus: Increase maximum processes too # echo "hdfs - nproc 32768" >> /etc/security/limits.conf # echo "mapred - nproc 32768" >> /etc/security/limits.conf # echo "hbase - nproc 32768" >> /etc/security/limits.conf
• Note: Cloudera Manager will do this for you.
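Since limits.conf is only read at login, a fleet audit of the file itself is often handier than per-node ulimit checks. A sketch that pulls the nofile limit for a user out of a limits.conf-style file (the sample file and account names mirror the slide but are hypothetical):

```shell
# Print the nofile limit configured for a user.
nofile_limit_for() {   # $1 = user, $2 = limits.conf-style file
  awk -v u="$1" '$1 == u && $3 == "nofile" { print $4 }' "$2"
}

cat > /tmp/limits.sample <<'EOF'
hdfs    -  nofile  32768
mapred  -  nofile  32768
hbase   -  nproc   32768
EOF
nofile_limit_for hdfs /tmp/limits.sample   # prints: 32768
```

To confirm the limit is live for a service account, check from a fresh login session, e.g. `su - hdfs -c 'ulimit -n'`.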
Don’t be tempted to share, even on monster disks.
6. Dedicated Disk for OS and Logs
The Situation in Easy Steps
1. Your new server has a dozen 1 TB disks 2. Eleven disks are used to store data 3. One disk is used for the OS
• 20 GB for the OS • 980 GB sits unused
4. Someone asks “can we store data there too?” 5. Seems reasonable, lots of space… “OK, why not.”
Sound familiar?
33
Microsoft Office EULA. Really.
I don’t understand it, there’s no consistency to these run times!
No Love for Shared Disk
• Our quest for data gets interrupted a lot: • OS operations • OS logs • Hadoop logging, quite chatty • Hadoop execution • userspace execution
• Disk seeks are slow, remember?
Dedicated Disk for OS and Logs
• At install time • Disk 0, OS & logs • Disks 1-n, Hadoop data
• After install, more complicated effort, requires manual HDFS block rebalancing: 1. Take down HDFS
• If you can do it in under 10 minutes, just the DataNode 2. Move or distribute blocks from disk0/dir to disk[1-n]/dir 3. Remove dir from HDFS config (dfs.data.dir) 4. Start HDFS
Sane, both forward and reverse.
7. Name Resolution
Name Resolution Options
1. Hosts file, if you must 2. DNS, much preferred
Name Resolution with Hosts File
• Set canonical names properly
• Right 10.1.1.1 r01m01.cluster.org r01m01 master1 10.1.1.2 r01w01.cluster.org r01w01 worker1
• Wrong 10.1.1.1 r01m01 r01m01.cluster.org master1 10.1.1.2 r01w01 r01w01.cluster.org worker1
Name Resolution with Hosts File
• Set loopback address properly • Ensure 127.0.0.1 resolves to localhost, NOT hostname
• Right 127.0.0.1 localhost
• Wrong 127.0.0.1 r01m01
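Both mistakes above (hostname on the loopback line, short name listed before the FQDN) are mechanical enough to lint for. A sketch, reusing the slide's made-up addresses; the 10.x check is a simplification for these examples:

```shell
# Flag the two classic hosts-file mistakes: 127.0.0.1 mapped to the
# hostname, and a first name that is not fully qualified.
lint_hosts() {   # $1 = hosts file
  awk '
    $1 == "127.0.0.1" && $2 != "localhost" { print "loopback maps to " $2 }
    $1 ~ /^10\./ && $2 !~ /\./             { print $2 " is not an FQDN" }
  ' "$1"
}

cat > /tmp/hosts.sample <<'EOF'
127.0.0.1 r01m01
10.1.1.1 r01m01 r01m01.cluster.org master1
EOF
lint_hosts /tmp/hosts.sample
```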
Name Resolution with DNS
• Forward • Reverse
• Hostname should MATCH the FQDN in DNS
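The forward/reverse match can be checked per node with a round-trip lookup. A sketch using getent (so /etc/hosts entries count too); run it on each node as `dns_roundtrip "$(hostname -f)"`:

```shell
# Forward-resolve a name, reverse-resolve the result, and report
# whether the round trip lands back on the same name.
dns_roundtrip() {   # $1 = hostname or FQDN
  ip=$(getent hosts "$1" | awk '{ print $1; exit }')
  rev=$(getent hosts "$ip" | awk '{ print $2; exit }')
  if [ "$1" = "$rev" ]; then
    echo "OK: $1 <-> $ip"
  else
    echo "MISMATCH: forward($1)=$ip reverse($ip)=$rev"
  fi
}

dns_roundtrip localhost
```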
This Is What You Ought to See
Name Resolution Errata
• Mismatches? Expect odd results. • Problems starting DataNodes • Non-FQDN in Web UI links • Security features are extra sensitive to FQDN
• Errors so common that a link to the FAQ is included in logs! • http://wiki.apache.org/hadoop/UnknownHost
• Get name resolution working BEFORE enabling nscd!
Time to take out your camera phones…
Summary
Summary
1. disable vm.swappiness 2. data disks: mount with noatime option 3. data disks: disable root reserved space 4. enable nscd 5. increase file handle limits 6. use dedicated OS/logging disk 7. sane name resolution
http://tiny.cloudera.com/7steps
Recommended Reading
• Hadoop Operations http://amzn.to/1hDaN9B
Preferably related to the talk…
Questions?
Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.
Because we had enough time…
8. Bonus Round
Other Things to Check
• Disk IO • hdparm
• # hdparm -Tt /dev/sdc • Looking for at least 70 MB/s from 7200 RPM disks • Slower could indicate a failing drive, disk controller, array, etc.
• dd • http://romanrm.ru/en/dd-benchmark
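In the spirit of the dd benchmark linked above, here is a sketch of a sequential-write probe: `conv=fdatasync` keeps the page cache from flattering the number. The function takes the target directory as a parameter; `/data01` and the 1024 MiB size are illustrative choices for a real run:

```shell
# Sequential write probe; reports dd's summary line (throughput).
seq_write_probe() {   # $1 = directory on the disk under test, $2 = size in MiB
  f="$1/ddprobe.$$"
  dd if=/dev/zero of="$f" bs=1M count="$2" conv=fdatasync 2>&1 | tail -1
  rm -f "$f"
}

seq_write_probe /tmp 8   # for a real test: seq_write_probe /data01 1024
```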
Other Things to Check
• Disable Red Hat Transparent Huge Pages (RH6+ Only) • Can reduce elevated CPU usage • In rc.local:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
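After the rc.local change, it is worth confirming the setting stuck. A sketch that reports the THP defrag state; the sysfs path differs between RHEL 6 (`redhat_transparent_hugepage`) and mainline kernels (`transparent_hugepage`), so both are tried:

```shell
# Report THP defrag state under either sysfs naming scheme.
thp_defrag_state() {   # $1 = sysfs mm base, normally /sys/kernel/mm
  for d in redhat_transparent_hugepage transparent_hugepage; do
    if [ -r "$1/$d/defrag" ]; then
      echo "$d: $(cat "$1/$d/defrag")"
    fi
  done
}

thp_defrag_state /sys/kernel/mm   # look for [never] after the change
```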
Other Things to Check
• Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20%
Other Things to Check
• Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20%
• Monitor Everything • How else will you know what’s happening? • Nagios, Ganglia, CM, Ambari
Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.