Administering a Hadoop cluster isn't easy. Many Hadoop clusters suffer from Linux configuration problems that can negatively impact performance. With vast and sometimes confusing config/tuning options, it can can tempting (and scary) for a cluster administrator to make changes to Hadoop when cluster performance isn't as expected. Learn how to improve Hadoop cluster performance and eliminate common problem areas, applicable across use cases, using a handful of simple Linux configuration changes.
Improving Hadoop Cluster Performance via Linux Configuration
2014 Hadoop Summit – San Jose, California
Alex Moundalexis • alexm at clouderagovt.com • @technmsg
Tips from a Former SA
CC BY 2.0 / Richard Bumgardner
Been there, done that.
Tips from a Former SA Field Guy
CC BY 2.0 / Alex Moundalexis
Home sweet home.
Tips from a Former SA Field Guy Easy steps to take…
Tips from a Former SA Field Guy Easy steps to take… that most people don’t.
What This Talk Isn’t About
• Deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor
• Sizing & Tuning • Depends heavily on data and workload
• Coding • Unless you count STDOUT redirection
• Algorithms • I suck at math, but we’ll try some multiplication later
“The answer to most Hadoop questions is: it depends.”
So What ARE We Talking About?
• Seven simple things • Quick • Safe • Viable for most environments and use cases
• Identify issue, then offer solution
• Note: Commands run as root or sudo
Bad news, best not to…
1. Swapping
Swapping
• A form of memory management • When OS runs low on memory…
• write blocks to disk • use now-free memory for other things • read blocks back into memory from disk when needed
• Also known as paging
Swapping
• Problem: Disks are slow, especially to seek • Hadoop is about maximizing IO
• spend less time acquiring data • operate on data in place • large streaming reads/writes from disk
• Memory usage is limited within JVM • we should be able to manage our memory
Disable Swap in Kernel
• Well, as much as possible.
• Immediate: # echo 0 > /proc/sys/vm/swappiness
• Persist after reboot: # echo "vm.swappiness = 0" >> /etc/sysctl.conf
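The persisted setting can be audited across a fleet without touching the live kernel. A minimal sketch, assuming a sysctl.conf-style file (the sample file below is made up purely to exercise the function; on a real node, compare its output against `/proc/sys/vm/swappiness`):

```shell
# Extract the persisted vm.swappiness value from a sysctl.conf-style file.
swappiness_of() {   # $1 = path to a sysctl.conf-style file
  awk -F= '/^[[:space:]]*vm\.swappiness/ { gsub(/[[:space:]]/, "", $2); print $2 }' "$1"
}

# Hypothetical sample config, just for illustration:
cat > /tmp/sysctl.sample <<'EOF'
net.ipv4.ip_forward = 0
vm.swappiness = 0
EOF

swappiness_of /tmp/sysctl.sample   # prints: 0
```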
Swapping Peculiarities
• Behavior varies based on Linux kernel • CentOS 6.4+ / Ubuntu 10.10+ • For you kernel gurus, that’s Linux 2.6.32-303+
• Prior • We don’t swap, except to avoid OOM condition.
• After • We don’t swap, ever.
• Details: http://tiny.cloudera.com/noswap
Disable this too.
2. File Access Time
File Access Time
• Linux tracks access time • writes to disk even if all you did was read
• Problem • more disk seeks • HDFS is write-once, read-many • NameNode tracks access information for HDFS
Don’t Track Access Time
• Mount volumes with noatime option • In /etc/fstab: /dev/sdc /data01 ext3 defaults,noatime 0 0
• Note: noatime implies nodiratime as well • What about relatime?
• Faster than atime but slower than noatime • No reboot required
• # mount -o remount /data01
Reclaim it, impress your bosses!
3. Root Reserved Space
Root Reserved Space
• EXT3/4 reserve 5% of disk for root-owned files • On an OS disk, sure • System logs, kernel panics, etc.
CC BY 2.0 / Alex Moundalexis
Disks used to be much smaller, right?
Do The Math
• Conservative • 5% of 1 TB disk = 46 GB • 5 data disks per server = 230 GB • 5 servers per rack = 1.15 TB
• Quasi-Aggressive • 5% of 4 TB disk = 186 GB • 12 data disks per server = 2.23 TB • 18 servers per rack = 40.1 TB
• That’s a LOT of unused storage!
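The arithmetic above is easy to reproduce. A shell sketch, taking a marketing 1 TB as roughly 931 GiB usable and the default 5% reserve (the slide rounds and mixes GB/GiB labels, so totals differ slightly):

```shell
# Reserved space at the default 5% ext3/4 reserve.
reserved_gib() {   # $1 = disk size in TB (marketing TB = ~931 GiB)
  awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 931 * 0.05 }'
}

reserved_gib 1    # about 46.5 GiB per 1 TB disk
reserved_gib 4    # about 186 GiB per 4 TB disk

# Quasi-aggressive rack: 12 x 4 TB disks per server, 18 servers
awk 'BEGIN { printf "%.1f TiB reserved per rack\n", 18 * 12 * 4 * 931 * 0.05 / 1024 }'
```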
Root Reserved Space
• On a Hadoop data disk, no root-‐owned files
• When creating a partition: # mkfs.ext3 -m 0 /dev/sdc
• On existing partitions: # tune2fs -m 0 /dev/sdc
• 0 is safe, 1 is for the ultra-paranoid
Turn it on, already!
4. Name Service Cache Daemon
Name Service Cache Daemon
• Daemon that caches name service requests • Passwords • Groups • Hosts
• Helps weather network hiccups • Helps more with high-latency LDAP, NIS, NIS+ • Small footprint • Zero configuration required
Name Service Cache Daemon
• Hadoop nodes • largely a network-based application • on the network constantly • issue lots of DNS lookups, especially HBase & distcp • can thrash DNS servers
• Reducing latency of service requests? Smart. • Reducing impact on shared infrastructure? Smart.
Name Service Cache Daemon
• Turn it on, let it work, leave it alone: # chkconfig --level 345 nscd on # service nscd start
• Check on it later: # nscd -g
• Unless using Red Hat SSSD; modify nscd config first! • Don’t use nscd to cache passwd, group, or netgroup • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
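For the SSSD case, the change amounts to flipping three `enable-cache` lines in nscd.conf while leaving hosts caching on. A sketch operating on an nscd.conf-style file; the sample below is a made-up fragment, not a full config, and option names follow nscd.conf(5), so verify against your distro:

```shell
# Disable nscd's passwd/group/netgroup caches (SSSD owns those),
# keep hosts caching enabled.
disable_identity_caches() {   # $1 = nscd.conf-style file
  sed -i -E 's/^([[:space:]]*enable-cache[[:space:]]+(passwd|group|netgroup)[[:space:]]+)yes/\1no/' "$1"
}

cat > /tmp/nscd.sample <<'EOF'
enable-cache  passwd    yes
enable-cache  group     yes
enable-cache  netgroup  yes
enable-cache  hosts     yes
EOF
disable_identity_caches /tmp/nscd.sample
grep enable-cache /tmp/nscd.sample
```

On a real node the target would be /etc/nscd.conf, followed by a `service nscd restart`.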
Not a problem, until they are.
5. File Handle Limits
File Handle Limits
• Kernel refers to files via a handle • Also called descriptors
• Linux is a multi-user system • File handles protect the system from
• Poor coding • Malicious users • Pictures of cats on the Internet
Microsoft Office EULA. Really.
java.io.FileNotFoundException: (Too many open files)
File Handle Limits
• Linux defaults usually not enough • Increase maximum open files (default 1024)
# echo "hdfs - nofile 32768" >> /etc/security/limits.conf # echo "mapred - nofile 32768" >> /etc/security/limits.conf # echo "hbase - nofile 32768" >> /etc/security/limits.conf
• Bonus: Increase maximum processes too # echo "hdfs - nproc 32768" >> /etc/security/limits.conf # echo "mapred - nproc 32768" >> /etc/security/limits.conf # echo "hbase - nproc 32768" >> /etc/security/limits.conf
• Note: Cloudera Manager will do this for you.
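Since limits.conf is only read at login, a fleet audit of the file itself is often handier than per-node ulimit checks. A sketch that pulls the nofile limit for a user out of a limits.conf-style file (the sample file and account names mirror the slide but are hypothetical):

```shell
# Print the nofile limit configured for a user.
nofile_limit_for() {   # $1 = user, $2 = limits.conf-style file
  awk -v u="$1" '$1 == u && $3 == "nofile" { print $4 }' "$2"
}

cat > /tmp/limits.sample <<'EOF'
hdfs    -  nofile  32768
mapred  -  nofile  32768
hbase   -  nproc   32768
EOF
nofile_limit_for hdfs /tmp/limits.sample   # prints: 32768
```

To confirm the limit is live for a service account, check from a fresh login session, e.g. `su - hdfs -c 'ulimit -n'`.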
Don’t be tempted to share, even on monster disks.
6. Dedicated Disk for OS and Logs
The Situation in Easy Steps
1. Your new server has a dozen 1 TB disks 2. Eleven disks are used to store data 3. One disk is used for the OS
• 20 GB for the OS • 980 GB sits unused
4. Someone asks “can we store data there too?” 5. Seems reasonable, lots of space… “OK, why not.”
Sound familiar?
33
Microsoft Office EULA. Really.
I don’t understand it, there’s no consistency to these run times!
No Love for Shared Disk
• Our quest for data gets interrupted a lot: • OS operations • OS logs • Hadoop logging, quite chatty • Hadoop execution • userspace execution
• Disk seeks are slow, remember?
Dedicated Disk for OS and Logs
• At install time • Disk 0, OS & logs • Disks 1-n, Hadoop data
• After install, more complicated effort, requires manual HDFS block rebalancing: 1. Take down HDFS
• If you can do it in under 10 minutes, just the DataNode 2. Move or distribute blocks from disk0/dir to disk[1-n]/dir 3. Remove dir from HDFS config (dfs.data.dir) 4. Start HDFS
Sane, both forward and reverse.
7. Name Resolution
Name Resolution Options
1. Hosts file, if you must 2. DNS, much preferred
Name Resolution with Hosts File
• Set canonical names properly
• Right 10.1.1.1 r01m01.cluster.org r01m01 master1 10.1.1.2 r01w01.cluster.org r01w01 worker1
• Wrong 10.1.1.1 r01m01 r01m01.cluster.org master1 10.1.1.2 r01w01 r01w01.cluster.org worker1
Name Resolution with Hosts File
• Set loopback address properly • Ensure 127.0.0.1 resolves to localhost, NOT hostname
• Right 127.0.0.1 localhost
• Wrong 127.0.0.1 r01m01
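Both mistakes above (hostname on the loopback line, short name listed before the FQDN) are mechanical enough to lint for. A sketch, reusing the slide's made-up addresses; the 10.x check is a simplification for these examples:

```shell
# Flag the two classic hosts-file mistakes: 127.0.0.1 mapped to the
# hostname, and a first name that is not fully qualified.
lint_hosts() {   # $1 = hosts file
  awk '
    $1 == "127.0.0.1" && $2 != "localhost" { print "loopback maps to " $2 }
    $1 ~ /^10\./ && $2 !~ /\./             { print $2 " is not an FQDN" }
  ' "$1"
}

cat > /tmp/hosts.sample <<'EOF'
127.0.0.1 r01m01
10.1.1.1 r01m01 r01m01.cluster.org master1
EOF
lint_hosts /tmp/hosts.sample
```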
Name Resolution with DNS
• Forward • Reverse
• Hostname should MATCH the FQDN in DNS
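The forward/reverse match can be checked per node with a round-trip lookup. A sketch using getent (so /etc/hosts entries count too); run it on each node as `dns_roundtrip "$(hostname -f)"`:

```shell
# Forward-resolve a name, reverse-resolve the result, and report
# whether the round trip lands back on the same name.
dns_roundtrip() {   # $1 = hostname or FQDN
  ip=$(getent hosts "$1" | awk '{ print $1; exit }')
  rev=$(getent hosts "$ip" | awk '{ print $2; exit }')
  if [ "$1" = "$rev" ]; then
    echo "OK: $1 <-> $ip"
  else
    echo "MISMATCH: forward($1)=$ip reverse($ip)=$rev"
  fi
}

dns_roundtrip localhost
```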
This Is What You Ought to See
Name Resolution Errata
• Mismatches? Expect odd results. • Problems starting DataNodes • Non-FQDN in Web UI links • Security features are extra sensitive to FQDN
• Errors so common that a link to the FAQ is included in logs! • http://wiki.apache.org/hadoop/UnknownHost
• Get name resolution working BEFORE enabling nscd!
Time to take out your camera phones…
Summary
Summary
1. disable vm.swappiness 2. data disks: mount with noatime option 3. data disks: disable root reserved space 4. enable nscd 5. increase file handle limits 6. use dedicated OS/logging disk 7. sane name resolution
http://tiny.cloudera.com/7steps
Recommended Reading
• Hadoop Operations http://amzn.to/1hDaN9B
Preferably related to the talk…
Questions?
Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.
Because we had enough time…
8. Bonus Round
Other Things to Check
• Disk IO • hdparm
• # hdparm -Tt /dev/sdc • Looking for at least 70 MB/s from 7200 RPM disks • Slower could indicate a failing drive, disk controller, array, etc.
• dd • http://romanrm.ru/en/dd-benchmark
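In the spirit of the dd benchmark linked above, here is a sketch of a sequential-write probe: `conv=fdatasync` keeps the page cache from flattering the number. The function takes the target directory as a parameter; `/data01` and the 1024 MiB size are illustrative choices for a real run:

```shell
# Sequential write probe; reports dd's summary line (throughput).
seq_write_probe() {   # $1 = directory on the disk under test, $2 = size in MiB
  f="$1/ddprobe.$$"
  dd if=/dev/zero of="$f" bs=1M count="$2" conv=fdatasync 2>&1 | tail -1
  rm -f "$f"
}

seq_write_probe /tmp 8   # for a real test: seq_write_probe /data01 1024
```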
Other Things to Check
• Disable Red Hat Transparent Huge Pages (RH6+ Only) • Can reduce elevated CPU usage • In rc.local:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
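After the rc.local change, it is worth confirming the setting stuck. A sketch that reports the THP defrag state; the sysfs path differs between RHEL 6 (`redhat_transparent_hugepage`) and mainline kernels (`transparent_hugepage`), so both are tried:

```shell
# Report THP defrag state under either sysfs naming scheme.
thp_defrag_state() {   # $1 = sysfs mm base, normally /sys/kernel/mm
  for d in redhat_transparent_hugepage transparent_hugepage; do
    if [ -r "$1/$d/defrag" ]; then
      echo "$d: $(cat "$1/$d/defrag")"
    fi
  done
}

thp_defrag_state /sys/kernel/mm   # look for [never] after the change
```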
Other Things to Check
• Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20%
Other Things to Check
• Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20%
• Monitor Everything • How else will you know what’s happening? • Nagios, Ganglia, CM, Ambari
Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.