Designing for Garbage Collection Gregg Donovan Senior Software Engineer Etsy.com Wednesday, July 31, 13


Designing for Garbage Collection

Gregg Donovan

Senior Software Engineer

Etsy.com


3.5 years Search Engineering at Etsy.com

5 years Search & Web Engineering at TheLadders.com


25+ million members


20+ million items


900k+ active sellers


60+ million monthly unique visitors


CodeAsCraft.etsy.com

Understanding GC

Monitoring GC

Debugging Memory Leaks

Design for Partial Availability

public class BuzzwordDetector {
    static String[] prefixes = { "synergy", "win-win" };
    static String[] myArgs = { "clown synergy", "gorilla win-wins", "whamee" };

    public static void main(String[] args) {
        args = myArgs; // use the canned input instead of the command line

        int buzzwords = 0;
        for (int i = 0; i < args.length; i++) {
            String lc = args[i].toLowerCase();
            for (int j = 0; j < prefixes.length; j++) {
                if (lc.contains(prefixes[j])) {
                    buzzwords++;
                }
            }
        }
        System.out.println("Found " + buzzwords + " buzzwords");
    }
}


New():
    ref <- allocate()
    if ref = null                /* Heap is full */
        collect()
        ref <- allocate()
        if ref = null            /* Heap is still full */
            error "Out of memory"
    return ref

atomic collect():
    markFromRoots()
    sweep(HeapStart, HeapEnd)

From the Garbage Collection Handbook

markFromRoots():
    initialise(worklist)
    for each fld in Roots
        ref <- *fld
        if ref != null && not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)
            mark()

initialise(worklist):
    worklist <- empty

mark():
    while not isEmpty(worklist)
        ref <- remove(worklist)    /* ref is marked */
        for each fld in Pointers(ref)
            child <- *fld
            if child != null && not isMarked(child)
                setMarked(child)
                add(worklist, child)

From the Garbage Collection Handbook
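
The pseudocode above can be tried out on a toy heap. The following is a minimal Java sketch, not the Handbook's code and not how HotSpot works: `ToyMarkSweep`, `Node`, and the explicit `heap` and `roots` lists are hypothetical names invented for illustration, and "freeing" a cell just means dropping it from the list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-and-sweep collector over an explicit object graph (illustrative only). */
class ToyMarkSweep {
    /** A heap cell with a mark bit and outgoing pointers. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    final List<Node> heap = new ArrayList<>();   // every allocated cell
    final List<Node> roots = new ArrayList<>();  // stand-in for stacks and globals

    Node allocate(String name) {
        Node n = new Node(name);
        heap.add(n);
        return n;
    }

    /** markFromRoots(): seed the worklist from each root, then trace. */
    void markFromRoots() {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node ref : roots) {
            if (ref != null && !ref.marked) {
                ref.marked = true;
                worklist.add(ref);
                mark(worklist);
            }
        }
    }

    /** mark(): drain the worklist, marking every unmarked child that is reached. */
    void mark(Deque<Node> worklist) {
        while (!worklist.isEmpty()) {
            Node ref = worklist.remove(); // ref is already marked
            for (Node child : ref.fields) {
                if (child != null && !child.marked) {
                    child.marked = true;
                    worklist.add(child);
                }
            }
        }
    }

    /** sweep(): drop unmarked cells and clear the mark on survivors. Returns cells freed. */
    int sweep() {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    int collect() {
        markFromRoots();
        return sweep();
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Node a = gc.allocate("a");
        Node b = gc.allocate("b");
        Node c = gc.allocate("c");
        a.fields.add(b);   // a -> b, reachable from the root
        c.fields.add(c);   // c points only at itself and has no root
        gc.roots.add(a);
        System.out.println("freed " + gc.collect() + ", live " + gc.heap.size());
    }
}
```

Note that the self-referencing cell `c` is reclaimed: tracing from the roots never reaches it, so a cycle alone does not keep an object alive.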


Trivia: Who invented the first GC and Mark-and-Sweep?


Weak Generational Hypothesis


Where do objects in your application live?


GC Terminology: Concurrent vs Parallel


JVM Collectors


Serial


Throughput


CMS


Garbage First (G1)


Continuously Concurrent Compacting Collector (C4)


IBM, Dalvik, etc.?


Why Throughput?


Questions so far?


Monitoring


GC time per request


...
import java.lang.management.*;
...

public static long getCollectionTime() {
    long collectionTime = 0;
    // Sum the cumulative collection time (ms) reported by every collector in this JVM.
    for (GarbageCollectorMXBean mbean : ManagementFactory.getGarbageCollectorMXBeans()) {
        collectionTime += mbean.getCollectionTime();
    }
    return collectionTime;
}

Available via JMX
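
One way to turn the `getCollectionTime()` helper above into "GC time per request" is to sample it before and after handling a request. The following is a minimal sketch under assumptions: `GcTimePerRequest`, `Timed`, and `measure` are hypothetical names, not Etsy's code. The counter is JVM-wide, so with concurrent requests a pause is attributed to every request in flight at the time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

/** Samples cumulative GC time around a unit of work (illustrative only). */
class GcTimePerRequest {
    /** Cumulative GC time in milliseconds across all collectors in this JVM. */
    static long getCollectionTime() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = bean.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Result of one request plus the wall-clock and GC time that elapsed around it. */
    static final class Timed<T> {
        final T result;
        final long wallMillis;
        final long gcMillis;

        Timed(T result, long wallMillis, long gcMillis) {
            this.result = result;
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }
    }

    /** Runs the request and records how much the JVM-wide GC counter advanced meanwhile. */
    static <T> Timed<T> measure(Supplier<T> request) {
        long gcBefore = getCollectionTime();
        long start = System.nanoTime();
        T result = request.get();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = getCollectionTime() - gcBefore;
        return new Timed<>(result, wallMillis, gcMillis);
    }

    public static void main(String[] args) {
        Timed<Integer> t = measure(() -> {
            int total = 0;
            for (int i = 0; i < 2_000_000; i++) {
                total += new byte[128].length; // allocation churn; may or may not trigger a GC
            }
            return total;
        });
        System.out.println("wall=" + t.wallMillis + "ms gc=" + t.gcMillis + "ms");
    }
}
```

The two numbers could then be logged or sent to a metrics system alongside the request, which makes it possible to tell slow requests caused by GC apart from slow requests caused by the query itself.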


Visual GC


export GC_DEBUG="-verbose:gc \
-XX:+PrintGCDateStamps \
-XX:+PrintHeapAtGC \
-XX:+PrintGCApplicationStoppedTime \
-XX:+PrintGCApplicationConcurrentTime \
-XX:+PrintAdaptiveSizePolicy \
-XX:AdaptiveSizePolicyOutputInterval=1 \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCDetails \
-XX:+PrintCommandLineFlags \
-XX:+PrintSafepointStatistics \
-Xloggc:/var/log/search/gc.log"


2013-04-08T20:14:00.162+0000: 4197.791: [Full GCAdaptiveSizeStart: 4206.559 collection: 213
PSAdaptiveSizePolicy::compute_generation_free_space limits: desired_promo_size: 9927789154 promo_limit: 8321564672 free_in_old_gen: 4096 max_old_gen_size: 22190686208 avg_old_live: 22190682112
AdaptiveSizePolicy::compute_generation_free_space limits: desired_eden_size: 9712028790 old_eden_size: 8321564672 eden_limit: 8321564672 cur_eden: 8321564672 max_eden_size: 8321564672 avg_young_live: 7340911616
AdaptiveSizePolicy::compute_generation_free_space: gc time limit gc_cost: 1.000000 GCTimeLimit: 98
PSAdaptiveSizePolicy::compute_generation_free_space: costs minor_time: 0.167092 major_cost: 0.965075 mutator_cost: 0.000000 throughput_goal: 0.990000 live_space: 29859940352 free_space: 16643129344 old_promo_size: 8321564672 old_eden_size: 8321564672 desired_promo_size: 8321564672 desired_eden_size: 8321564672
AdaptiveSizeStop: collection: 213
[PSYoungGen: 8126528K->7599356K(9480896K)] [ParOldGen: 21670588K->21670588K(21670592K)] 29797116K->29269944K(31151488K) [PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] [Times: user=137.36 sys=0.03, real=8.77 secs]
Heap after GC invocations=213 (full 210):
 PSYoungGen total 9480896K, used 7599356K [0x00007fee47ab0000, 0x00007ff0dd000000, 0x00007ff0dd000000)
  eden space 8126528K, 93% used [0x00007fee47ab0000,0x00007ff0177ef080,0x00007ff037ac0000)
  from space 1354368K, 0% used [0x00007ff037ac0000,0x00007ff037ac0000,0x00007ff08a560000)
  to space 1354368K, 0% used [0x00007ff08a560000,0x00007ff08a560000,0x00007ff0dd000000)
 ParOldGen total 21670592K, used 21670588K [0x00007fe91d000000, 0x00007fee47ab0000, 0x00007fee47ab0000)
  object space 21670592K, 99% used [0x00007fe91d000000,0x00007fee47aaf0e0,0x00007fee47ab0000)
 PSPermGen total 65536K, used 58512K [0x00007fe915000000, 0x00007fe919000000, 0x00007fe91d000000)
  object space 65536K, 89% used [0x00007fe915000000,0x00007fe918924130,0x00007fe919000000)
}
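
The number that usually matters in an entry like the one above is the `real=` value in the `[Times: ...]` block, since that is the wall-clock pause. The following is a minimal Java sketch of pulling it out with a regular expression. `GcTimes` is a hypothetical helper, and the pattern is an assumption that only covers the format shown in this log excerpt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the user/sys/real figures from a "[Times: ...]" block (illustrative only). */
class GcTimes {
    // Assumed format, taken from the log excerpt: [Times: user=137.36 sys=0.03, real=8.77 secs]
    private static final Pattern TIMES = Pattern.compile(
            "\\[Times: user=(\\d+\\.\\d+) sys=(\\d+\\.\\d+), real=(\\d+\\.\\d+) secs\\]");

    final double user;
    final double sys;
    final double real;

    GcTimes(double user, double sys, double real) {
        this.user = user;
        this.sys = sys;
        this.real = real;
    }

    /** Returns the parsed times, or null when the line carries no [Times: ...] block. */
    static GcTimes parse(String line) {
        Matcher m = TIMES.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcTimes(
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        String line = "[PSPermGen: 58516K->58512K(65536K)], 8.7690670 secs] "
                + "[Times: user=137.36 sys=0.03, real=8.77 secs]";
        GcTimes t = parse(line);
        System.out.println("pause=" + t.real + "s cpu=" + (t.user + t.sys) + "s");
    }
}
```

In this entry the CPU time is far larger than the wall-clock pause because `user` is summed over all the parallel collector threads.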


GC Log Analyzers?

GCHisto

GCViewer

garbagecat

github.com/Netflix/gcviz

Graphing with Logster

github.com/etsy/logster


GC Dashboard

github.com/etsy/dashboard

YourKit.com


Designing for Partial Availability


JVMTI GC Hook?


How can a client ignore GC-ing hosts?


Server lies to clients about availability

TCP socket receive buffer

TCP write buffer


“Banner” protocol

1. Connect via TCP

2. Wait ~1-10ms

3. Either receive magic four byte header or try another host

4. Only send query after receiving header from server
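
The four steps above can be sketched from the client side in Java. This is an illustration under assumptions, not Etsy's client: `BannerClient`, `connect`, and `demo` are hypothetical names, and the demo's 100 ms banner timeout is looser than the 1-10 ms on the slide so that it behaves on a loaded machine. The "stalled" server in the demo is a listening socket that never calls `accept()`, which stands in for a JVM paused in GC: the kernel still completes the TCP handshake, but no banner arrives.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

/** Client side of a "banner" handshake (illustrative only). */
class BannerClient {
    static final int BANNER = 0xC0DEA5CF;

    /** Returns a socket to the first host that sends the banner in time, or throws. */
    static Socket connect(List<InetSocketAddress> hosts, int connectTimeoutMs,
                          int bannerTimeoutMs) throws IOException {
        IOException last = null;
        for (InetSocketAddress host : hosts) {
            Socket s = new Socket();
            try {
                s.connect(host, connectTimeoutMs);           // step 1: connect via TCP
                s.setSoTimeout(bannerTimeoutMs);             // step 2: wait only briefly
                int header = new DataInputStream(s.getInputStream()).readInt(); // big-endian
                if (header == BANNER) {                      // step 3: magic header received
                    s.setSoTimeout(0);
                    return s;                                // step 4: caller may now send the query
                }
                last = new IOException("bad banner from " + host);
            } catch (IOException e) {                        // includes SocketTimeoutException
                last = e;
            }
            s.close();                                       // give up on this host, try the next
        }
        throw last != null ? last : new IOException("no hosts given");
    }

    /** Connects past a stalled listener to a healthy one; true if the healthy host was chosen. */
    static boolean demo() throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket stalled = new ServerSocket(0, 50, lo);   // never accepts
             ServerSocket healthy = new ServerSocket(0, 50, lo)) {
            Thread server = new Thread(() -> {
                try (Socket c = healthy.accept()) {
                    DataOutputStream out = new DataOutputStream(c.getOutputStream());
                    out.writeInt(BANNER);
                    out.flush();
                    c.getInputStream().read();               // wait until the client disconnects
                } catch (IOException ignored) {
                    // demo server: nothing useful to do on failure
                }
            });
            server.setDaemon(true);
            server.start();
            List<InetSocketAddress> hosts = Arrays.asList(
                    new InetSocketAddress(lo, stalled.getLocalPort()),
                    new InetSocketAddress(lo, healthy.getLocalPort()));
            try (Socket s = connect(hosts, 1000, 100)) {
                return s.getPort() == healthy.getLocalPort();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected to healthy host: " + demo());
    }
}
```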


0xC0DEA5CF


public function open() {
    $this->handle_ = @fsockopen($this->host_, $this->port_, $errno, $errstr,
                                $this->connectTimeout_ / 1000.0);
    try {
        // $banner_timeout (ms) and $short_hostname are not defined on the slide;
        // presumably they are set elsewhere in the original class.
        stream_set_timeout($this->handle_, 0, $banner_timeout * 1000); // millis to micros
        $read_start = microtime(true);
        $data = $this->readAll(4);
        $read_time = (microtime(true) - $read_start) * 1000; // seconds to millis
        $arr = unpack('N', $data); // unsigned 32-bit, big-endian
        $value = $arr[1];
        if ($value !== 0xC0DEA5CF) {
            StatsD::increment("search.baddata.{$short_hostname}.{$this->getPort()}");
            throw new TTransportException("[$value] does not match banner [0xC0DEA5CF]");
        }
    } catch (Exception $e) {
        $this->close(); // this won't necessarily be closed by clients
        throw new TTransportException($e->getMessage(), self::BANNER_TIMEOUT_CODE);
    }
}


private static class BannerSendingTProcessorFactory extends TProcessorFactory {
    private final TProcessor base;

    public BannerSendingTProcessorFactory(TProcessor base) {
        super(base);
        this.base = base;
    }

    @Override
    public TProcessor getProcessor(TTransport trans) {
        return new BannerTProcessor(base, (TSocket) trans);
    }
}

private static final class BannerTProcessor implements TProcessor {
    private final TProcessor base;
    private final TSocket tsocket;

    private BannerTProcessor(TProcessor base, TSocket tsocket) {
        this.base = checkNotNull(base);
        this.tsocket = checkNotNull(tsocket);
    }

    @Override
    public boolean process(TProtocol in, TProtocol out) throws TException {
        // Send the four-byte banner before handing off to the wrapped processor.
        this.tsocket.write(TBannerUtil.BANNER, 0, 4);
        this.tsocket.flush();
        return this.base.process(in, out);
    }
}


What if GC happens mid-request?


Backup requests


Jeff Dean: Achieving Rapid Response Time in Large Online Services
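
The slides name the backup-request technique without showing code. The following is a rough Java sketch of one common form of the idea: send the request to a primary replica, and if no answer has arrived after a short delay, send the same request to a second replica and take whichever answers first. `BackupRequests`, `withBackup`, `replica`, and the 100 ms delay are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Hedged ("backup") requests against two replicas (illustrative only). */
class BackupRequests {
    /** Completes with the first successful answer; fails only if both attempts fail. */
    static <T> CompletableFuture<T> withBackup(Supplier<T> primary, Supplier<T> backup,
            long backupDelayMs, ScheduledExecutorService timer, ExecutorService pool) {
        CompletableFuture<T> result = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        BiConsumer<T, Throwable> onDone = (value, error) -> {
            if (error == null) {
                result.complete(value);              // first success wins; later ones are ignored
            } else if (failures.incrementAndGet() == 2) {
                result.completeExceptionally(error); // both attempts failed
            }
        };
        CompletableFuture.supplyAsync(primary, pool).whenComplete(onDone);
        timer.schedule(() -> {
            if (!result.isDone()) {                  // primary is slow or failed: hedge
                CompletableFuture.supplyAsync(backup, pool).whenComplete(onDone);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    /** A fake replica that answers with its name after the given latency. */
    static Supplier<String> replica(String name, long latencyMs) {
        return () -> {
            try {
                Thread.sleep(latencyMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    /** Runs one hedged request and reports which replica answered. */
    static String demo(long primaryLatencyMs) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return withBackup(replica("primary", primaryLatencyMs), replica("backup", 0),
                    100, timer, pool).get(5, TimeUnit.SECONDS);
        } finally {
            timer.shutdownNow();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(0));     // primary answers before the backup delay elapses
        System.out.println(demo(2000));  // primary stalled (for example, mid-GC): backup answers
    }
}
```

The delay is the main tuning knob: too short and most requests are sent twice, too long and the backup arrives too late to hide the pause.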


Sharding?

Naive approach: only as fast as the slowest shard.
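
One alternative to waiting on the slowest shard is to give the fan-out a deadline and return whatever has arrived by then. The following is a minimal Java sketch of that idea; `PartialShardQuery`, `queryShards`, and the 500 ms deadline are hypothetical and not taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Fan out to shards with a deadline and keep partial results (illustrative only). */
class PartialShardQuery {
    /** Returns results from the shards that answered within the deadline, in shard order. */
    static <T> List<T> queryShards(ExecutorService pool, List<Callable<T>> shards,
                                   long deadlineMs) throws InterruptedException {
        // invokeAll cancels any task that has not completed when the timeout expires.
        List<Future<T>> futures = pool.invokeAll(shards, deadlineMs, TimeUnit.MILLISECONDS);
        List<T> results = new ArrayList<>();
        for (Future<T> f : futures) {
            if (f.isCancelled()) {
                continue;                       // shard missed the deadline: leave it out
            }
            try {
                results.add(f.get());
            } catch (ExecutionException e) {
                // shard failed outright: also leave it out of the partial result
            }
        }
        return results;
    }

    /** Two fast shards and one stalled shard; the stalled one is dropped. */
    static List<String> demo() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<String>> shards = new ArrayList<>();
            shards.add(() -> "shard-0 hits");
            shards.add(() -> "shard-1 hits");
            shards.add(() -> {
                Thread.sleep(5_000);            // stands in for a shard stuck in a long GC pause
                return "shard-2 hits";
            });
            return queryShards(pool, shards, 500);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Whether a partial result is acceptable depends on the application; a caller would typically also want to know how many shards were dropped.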


“Make a reliable whole out of unreliable parts.”


Memory Leaks


SolrIndexSearcher generation marking with YourKit triggers


Questions so far?


Miscellaneous Topics


System.gc()?


-XX:+UseCompressedOops


-XX:+UseNUMA


Paging


#!/usr/bin/env bash

# This script is designed to be run every minute by cron.

host=$(hostname -s)

psout=$(ps h -p `cat /var/run/etsy-search.pid` -o min_flt,maj_flt 2>/dev/null)
min_flt=$(echo $psout | awk '{print $1}') # minor page faults
maj_flt=$(echo $psout | awk '{print $2}') # major page faults

epoch_s=$(date +%s)

echo -e "search_memstats.$host.etsy-search.min_flt\t${min_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003
echo -e "search_memstats.$host.etsy-search.maj_flt\t${maj_flt:-0}\t$epoch_s" | nc graphite.etsycorp.com 2003


Solution 1: Buy more RAM

Ideally enough RAM to:

Keep data in OS file buffers

AND ensure no paging of VM memory

AND whatever else happens on the box

~$5-10/GB


echo "0" > /proc/sys/vm/swappiness


mlock()/mlockall()

github.com/LucidWorks/mlockall-agent


echo "-17" > /proc/$PID/oom_adj

Mercy from the OOM Killer


Huge Pages


-XX:+AlwaysPreTouch


Future Directions


Many small VMs instead of one large VM

microsharding


Off-heap memory with sun.misc.Unsafe?


Try G1 again


Try C4 again


Resources


gchandbook.org


bit.ly/mmgcb

Mark Miller’s GC Bootcamp


bit.ly/giltene

Gil Tene: Understanding Java Garbage Collection


bit.ly/cpumemory

Ulrich Drepper: What Every Programmer Should Know About Memory


github.com/pingtimeout/jvm-options

Read the JVM Source

(Not as scary as it sounds.)

hg.openjdk.java.net/jdk7/jdk7


Mechanical Sympathy Google Group

bit.ly/mechsym


Questions?

Thanks for coming!

Gregg Donovan
