Detecting and Fixing Memory-Related Performance Problems in Managed Languages
Lu Fang
Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky
University of California, Irvine
May 26, 2017, Irvine, CA, USA
Lu Fang (UC Irvine) Final Defense May 26, 2017 1 / 51
Performance Problems in Real World
Many distributed systems, such as Spark and Hadoop,
also suffer from performance problems:
java.lang.OutOfMemoryError: Java heap space
Performance Problems
Commonly exist in real-world applications
  - Single-machine apps, such as Eclipse, IE
  - Traditional databases and web servers, such as MySQL, Tomcat
  - Big Data systems, such as Hadoop, Spark
Further exacerbated by managed languages
  - Such as Java, C#
  - Big overhead introduced by automatic memory management
Cannot be optimized by compilers
  - Cannot understand the deep semantics
  - Cannot guarantee the correctness
Performance Problems
Difficult to find, especially during development
  - Invisible effect
  - Often escape to production runs
Difficult to fix
  - Large systems are complicated
  - Enough diagnostic information is necessary
  - Problems may be located deeply in systems
Can lead to severe problems
  - Scalability reductions
  - Programs hang and crash
  - Financial losses
Existing Solutions
Many solutions are proposed
  - Pattern-based
  - Mining-based
  - Learning-based
Most are postmortem debugging techniques
  - Require user logs/input to trigger bugs
  - Bugs already escape to production runs
Drawbacks in Existing Works → Our Solutions
  - Lacking a general way to describe problems → Instrumentation Specification Language (ISL)
  - Cannot detect problems under small workload → PerfBlower
  - Lacking a systematic approach to tune memory usage in data-intensive systems → ITask
Lu Fang, Liang Dou, Guoqing Xu
PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification
ECOOP’15
Instrumentation Specification Language
  - Motivation 1: an easy way to develop new detectors
  - Motivation 2: detect the problems with small effects
Instrumentation Specification Language
  - Focus on problems with observable heap symptoms
  - Users define symptoms/counter-evidence in events
  - Two important actions: amplify and deamplify
Amplification and Deamplification
amplify: increases the penalty
deamplify: resets the penalty
Virtual space overhead (VSO)
  - $\mathit{VSO} = \dfrac{\mathit{Sum}_{\mathit{penalty}} + \mathit{Size}_{\mathit{live\,heap}}}{\mathit{Size}_{\mathit{live\,heap}}}$
  - Reflects the severity on 2 dimensions: time and size
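To make the ratio concrete, here is a minimal Java sketch of the VSO computation. The class and method names (`VsoCalculator`, `vso`) are hypothetical and not part of PerfBlower, and penalties are modeled as plain byte counts, which is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical helper, not PerfBlower's API: computes the virtual space
// overhead from per-object penalties and the live heap size, both in bytes.
final class VsoCalculator {
    static double vso(List<Long> penalties, long liveHeapBytes) {
        if (liveHeapBytes <= 0) {
            throw new IllegalArgumentException("live heap size must be positive");
        }
        long sumPenalty = 0;
        for (long p : penalties) {
            sumPenalty += p;
        }
        // VSO = (Sum_penalty + Size_liveheap) / Size_liveheap
        return (double) (sumPenalty + liveHeapBytes) / liveHeapBytes;
    }
}
```

With no penalties the VSO is exactly 1; it grows as amplified penalties accumulate relative to the live heap.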
An ISL Program Example
  1. Context defines the type
  2. History of partition instance
  3. Heap partitioning
  4. Tracked objects
  5. The actions on events

Detecting Leaking Object Arrays

    Context TypeContext {
      type = "java.lang.Object[]";
    }
    History UseHistory {
      type = "boolean";
      size = 10;
    }
    Partition AllPartition {
      kind = all;
      history = UseHistory;
    }
    TObject TrackedObject {
      include = TypeContext;
      partition = AllPartition;
      instance boolean useFlag = false;
    }
    Event on_rw(Object o, Field f, Word w1, Word w2) {
      o.useFlag = true;
      deamplify(o);
    }
    Event on_reachedOnce(Object o) {
      UseHistory h = getHistory(o);
      h.update(o.useFlag);
      if (h.isFull() && !h.contains(true)) amplify(o);
      o.useFlag = false;
    }
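The event logic of the ISL program can be modeled in plain Java to see when a penalty starts to grow. This is only a sketch: the Java classes below are hypothetical, the sketch keeps one history per tracked object as a simplification, `onReachedOnce` is assumed to fire when a GC traversal reaches the object, and the penalty unit of one per `amplify` is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical Java model of the ISL program above; it is not PerfBlower code.
final class UseHistory {
    private final int size;
    private final Deque<Boolean> entries = new ArrayDeque<>();

    UseHistory(int size) { this.size = size; }

    // Record one observation, dropping the oldest once the history is full.
    void update(boolean used) {
        if (entries.size() == size) {
            entries.removeFirst();
        }
        entries.addLast(used);
    }

    boolean isFull() { return entries.size() == size; }

    boolean contains(boolean value) { return entries.contains(value); }
}

final class TrackedObject {
    boolean useFlag = false;
    long penalty = 0;                       // assumed unit: one per amplify
    final UseHistory history = new UseHistory(10);

    // Mirrors on_rw: a read or write is counter-evidence of a leak.
    void onReadWrite() {
        useFlag = true;
        penalty = 0;                        // deamplify resets the penalty
    }

    // Mirrors on_reachedOnce.
    void onReachedOnce() {
        history.update(useFlag);
        if (history.isFull() && !history.contains(true)) {
            penalty++;                      // amplify increases the penalty
        }
        useFlag = false;
    }
}
```

An object that is reached ten times in a row without any read or write starts to accumulate penalty; a single access resets it.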
PerfBlower
A general performance testing framework
Supports ISL
Can capture problems with small effects
Reports reference path to problematic objects
Heap Reference Path
  1. Object leak is referenced by array
  2. Knowing allocation site 1 is not enough
  3. Key point: array keeps a reference to leak, which can be shown by leak's heap reference path

Leak is referenced by whom?

    Object[] array = new Object[10];
    // Allocation site 1, creating the leak.
    Object leak = new Object();
    // Object leak is referenced by array
    array[0] = leak;
    // Keep using Object leak
    ...
    // ... Never use leak again.
    // However, leak is referenced by array,
    // so GC cannot reclaim object leak.
Mirroring the reference path
[Figure: Original Objects, Mirror Objects, Mirroring Ref. Path]

    Stack stack = new Stack();
    // Allocation site 1, creating the leak.
    Object obj = new Object();
    // stack.elements[0] = obj
    stack.push(obj);
    // Keep using Object obj
    ...
    // ... Never use obj again.
    // However, obj is referenced by stack,
    // so GC cannot reclaim object obj.
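To illustrate what a heap reference path is (not how PerfBlower computes or mirrors it), the sketch below models the heap as a plain map and searches for a path from a root to the leaking object. The class `ReferencePathFinder` and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the heap is modeled as a map from an object's name to
// the names of the objects it references. All names here are hypothetical.
final class ReferencePathFinder {
    // Breadth-first search from a root; returns the path root..target,
    // or an empty list if the target is unreachable.
    static List<String> findPath(Map<String, List<String>> heap,
                                 String root, String target) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(root, null);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) {
                List<String> path = new ArrayList<>();
                for (String n = target; n != null; n = parent.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            List<String> referenced =
                heap.getOrDefault(current, Collections.<String>emptyList());
            for (String next : referenced) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

For the stack example, a path such as root, stack, elements, obj tells the developer which container still holds the object, which the allocation site alone cannot show.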
Experiments
Three detectors
  - Memory leak amplifier
  - Under-utilized container amplifier
  - Over-populated container amplifier
DaCapo benchmarks with 500MB heap
Memory Leak Amplifier
VSOs Reported by Memory Leak Amplifier
[Figure: bar chart of VSOs (y-axis 0 to 60) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, hsqldb, jython, xalan. Legend: basic VSOs; VSOs caused by confirmed memory leaks. Programs with confirmed unknown leaks are marked.]
A large VSO indicates that the program is likely to have leaks.
Under-Utilized Container Amplifier
VSOs Reported by Under-Utilized Container Amplifier
[Figure: bar chart of VSOs (y-axis 0 to 60) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, hsqldb, jython, xalan. Legend: basic VSOs; VSOs caused by confirmed under-utilized containers. Programs with confirmed unknown UUCs are marked.]
A large VSO indicates that the program is very likely to have under-utilized containers (UUCs).
Over-Populated Container Amplifier
VSOs Reported by Over-Populated Container Amplifier
[Figure: bar chart of VSOs (y-axis 0 to 30) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, xalan, hsqldb, jython. Legend: basic VSOs; VSOs caused by confirmed over-populated containers. Programs with confirmed unknown OPCs are marked.]
A large VSO indicates that the program is very likely to have over-populated containers (OPCs).
Performance Improvements
Benchmark      Space Reduction   Time Reduction
xalan-leak     25.4%             14.6%
jython-leak    24.3%             7.4%
hsqldb-leak    15.6%             3.1%
xalan-UUC      5.4%              34.1%
jython-UUC     19.1%             1.1%
hsqldb-UUC     17.4%             0.7%
hsqldb-OPC     14.9%             2.9%
The Effectiveness of PerfBlower
VSOs indicate the existence of problems
  - 8 unknown problems are detected
  - All reports contain useful diagnostic information
Low overhead
  - Space overheads are 1.23–1.25×
  - Time overheads are 2.39–2.74×
Fixing Performance Problems
Fixing performance problems is hard
  - Enough information is necessary
  - Have to understand the logic of the system
  - The problem exists deeply in the system
Memory pressure
  - A common performance problem in data-parallel systems
Lu Fang, Khanh Nguyen, Guoqing Xu, Brian Demsky, Shan Lu
Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs
SOSP’15
Memory Pressure in Data-Parallel Systems
Data-parallel system
  - Input data are divided into independent partitions
  - Many popular big data systems
  - Memory pressure on single nodes
Our study
  - Search “out of memory” and “data parallel” in StackOverflow
  - We have collected 126 related problems
Memory Pressure in the Real World
Memory pressure on individual nodes
  - Executions push heap limit (using managed language)
  - Data-parallel systems struggle for memory
[Figure: memory consumption over execution time against the heap size, marking the OutOfMemoryError point and a period of long and useless GC]
CRASH: OutOfMemoryError
SLOW: huge GC effort
Root Cause 1: Hot Keys
Key-value pairs
Popular keys have many associated values
Case study (from StackOverflow)
  - Process StackOverflow posts
  - Long and popular posts
  - Many tasks process long and popular posts
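A small sketch of what "hot" means here: counting the values that arrive per key shows which key dominates. The class `HotKeyFinder` and the pair representation (a two-element `String[]` of key and value) are assumptions made for illustration and are not taken from any of the systems discussed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: counts how many values each key receives, which is the
// quantity that blows up for a hot key. Names are hypothetical.
final class HotKeyFinder {
    static Map<String, Integer> valuesPerKey(List<String[]> keyValuePairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] pair : keyValuePairs) {
            counts.merge(pair[0], 1, Integer::sum);
        }
        return counts;
    }

    // Returns the key with the most associated values, or null if empty.
    static String hottestKey(List<String[]> keyValuePairs) {
        String hottest = null;
        int max = 0;
        for (Map.Entry<String, Integer> e : valuesPerKey(keyValuePairs).entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                hottest = e.getKey();
            }
        }
        return hottest;
    }
}
```

A task that receives the hottest key must hold all of that key's values at once, which is where the memory pressure comes from.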
Root Cause 2: Large Intermediate Results
Temporary data structures
Case study (from StackOverflow)
  - Use NLP library to process customers’ reviews
  - Some reviews are quite long
  - NLP library creates giant temporary data structures for long reviews
Existing Solutions
More memory? Not really!
  - Data doubles in size every two years [http://goo.gl/tM92i0]
  - Memory doubles in size every three years [http://goo.gl/50Rrgk]
Application-level solutions
  - Configuration tuning
  - Skew fixing
System-level solutions
  - Cluster-wide resource manager, such as YARN
We need a systematic and effective solution!
Our Solution
Interruptible Task: treat memory pressure as interrupt
Dynamically change parallelism degree
Why Does Our Technique Help
[Figure sequence: memory consumption over execution time against the heap size, with several tasks and the memory each one consumes]
  1. Program starts with multiple tasks
  2. Program pushes heap limit
  3. Long and useless GC, then OutOfMemoryError
  4. Long and useless GCs are detected, start interrupting tasks (interrupted tasks are marked "Killed")
  5. Release the memory, memory pressure is gone. For the memory consumed by a killed task:
     - Local data structures: released
     - Processed input: released
     - Unprocessed input: kept in memory, can be serialized
     - Output, final result: pushed out and released
     - Output, intermediate result: kept in memory, can be serialized
  6. Program executes without memory pressure
  7. If there is enough memory, increase parallelism degree (a new task is created)
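The last steps of the sequence, shrinking and then growing the number of running tasks, can be sketched as a small controller. Everything in this block is hypothetical: the class name, the bounds, and the step of one task per signal are assumptions of the sketch, not the policy used by the ITask runtime.

```java
// Hypothetical controller; the step size of one task per signal and the
// bounds are assumptions made for this sketch.
final class ParallelismController {
    private final int minTasks;
    private final int maxTasks;
    private int activeTasks;

    ParallelismController(int minTasks, int maxTasks, int initialTasks) {
        if (minTasks < 1 || maxTasks < minTasks
                || initialTasks < minTasks || initialTasks > maxTasks) {
            throw new IllegalArgumentException("inconsistent task bounds");
        }
        this.minTasks = minTasks;
        this.maxTasks = maxTasks;
        this.activeTasks = initialTasks;
    }

    // Memory pressure detected: interrupt one task, but keep at least minTasks.
    int onMemoryPressure() {
        activeTasks = Math.max(minTasks, activeTasks - 1);
        return activeTasks;
    }

    // Enough free memory: create one more task, up to maxTasks.
    int onEnoughMemory() {
        activeTasks = Math.min(maxTasks, activeTasks + 1);
        return activeTasks;
    }

    int activeTasks() { return activeTasks; }
}
```

Keeping at least one task running ensures the program still makes progress while the pressure is being relieved.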
Making Task Interruptible Is Non-trivial
[Figure: an interrupted task and its consumed memory, split into four components; for each component the question is whether it should be released or kept in memory]
Answering this requires semantics
Challenges
How to expose semantics → a programming model
How to interrupt/reactivate tasks → a runtime system
The Programming Model
An ITask requires more semantics
  - Separate processed and unprocessed input
  - Specify how to serialize and deserialize
  - Safely interrupt tasks
  - Specify the actions when interrupt happens
  - Merge the intermediate results
A unified representation of input/output
A definition of an interruptible task
Representing Input/Output as DataPartitions
  - How to separate processed and unprocessed input
  - How to serialize and deserialize the data
  1. A cursor points to the first unprocessed tuple
  2. Users implement serialize and deserialize methods

DataPartition Abstract Class

    // The DataPartition abstract class
    abstract class DataPartition {
      // Some fields and methods
      ...
      // A cursor points to the first
      // unprocessed tuple
      int cursor;
      // Serialize the DataPartition
      abstract void serialize();
      // Deserialize the DataPartition
      abstract DataPartition deserialize();
    }
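As a sketch of how a user might fill in this abstract class, the block below defines a partition of string tuples. The subclass `StringPartition`, its `hasNext`/`next` helpers, and the use of an in-memory byte array in place of disk are all assumptions for illustration; writing only the unprocessed tuples is likewise a choice made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Mirrors the abstract class on the slide; the elided members are omitted.
abstract class DataPartition {
    int cursor;                       // index of the first unprocessed tuple
    abstract void serialize();
    abstract DataPartition deserialize();
}

// Hypothetical partition of string tuples. A byte array stands in for disk.
final class StringPartition extends DataPartition {
    private List<String> tuples;      // null while serialized
    private byte[] spilled;

    StringPartition(List<String> tuples) {
        this.tuples = new ArrayList<>(tuples);
    }

    // Both accessors are valid only while the partition is in memory.
    boolean hasNext() { return cursor < tuples.size(); }

    String next() { return tuples.get(cursor++); }

    @Override
    void serialize() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            // Only unprocessed tuples are written out.
            out.writeInt(tuples.size() - cursor);
            for (int i = cursor; i < tuples.size(); i++) {
                out.writeUTF(tuples.get(i));
            }
            out.flush();
            spilled = bytes.toByteArray();
            tuples = null;            // let the GC reclaim the in-memory tuples
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    DataPartition deserialize() {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(spilled))) {
            int n = in.readInt();
            List<String> restored = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                restored.add(in.readUTF());
            }
            tuples = restored;
            cursor = 0;               // restored list holds only unprocessed tuples
            spilled = null;
            return this;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a serialize and deserialize round trip, processing resumes at the first tuple that had not been consumed before the interrupt.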
Defining an ITask
  - What actions should be taken when interrupt happens
  - How to safely interrupt a task
  1. In interrupt, we define how to deal with partial results
  2. Tasks are always interrupted at the beginning of an iteration in the scaleLoop

ITask Abstract Class

    // The ITask abstract class in the library
    abstract class ITask {
      // Some methods
      ...
      abstract void interrupt();
      boolean scaleLoop(DataPartition dp) {
        // Iterate dp, and process each tuple
        while (dp.hasNext()) {
          // If pressure occurs, interrupt
          if (HasMemoryPressure()) {
            interrupt();
            return false;
          }
          process();
        }
        // All tuples were processed without an interrupt
        return true;
      }
    }
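The loop pattern can be tried out in isolation with the self-contained sketch below. The names `SketchTask` and `CountingTask` are hypothetical, a plain `Iterator` stands in for the partition and its cursor, and the memory-pressure check is injected as a `BooleanSupplier` so the sketch runs without a real monitor; in the slides that check is the library call `HasMemoryPressure()`.

```java
import java.util.Iterator;
import java.util.function.BooleanSupplier;

// Hypothetical, self-contained model of the scaleLoop pattern on the slide.
abstract class SketchTask<T> {
    private final BooleanSupplier hasMemoryPressure;

    SketchTask(BooleanSupplier hasMemoryPressure) {
        this.hasMemoryPressure = hasMemoryPressure;
    }

    abstract void process(T tuple);
    abstract void interrupt();

    // Returns true if all tuples were processed, false if interrupted.
    boolean scaleLoop(Iterator<T> tuples) {
        while (tuples.hasNext()) {
            // Check before touching the next tuple, so an interrupt never
            // leaves a tuple half processed.
            if (hasMemoryPressure.getAsBoolean()) {
                interrupt();
                return false;
            }
            process(tuples.next());
        }
        return true;
    }
}

// Minimal concrete task that only counts what it has seen.
final class CountingTask extends SketchTask<String> {
    int processed = 0;
    boolean interrupted = false;

    CountingTask(BooleanSupplier pressure) { super(pressure); }

    @Override void process(String tuple) { processed++; }
    @Override void interrupt() { interrupted = true; }
}
```

Because the check happens at the top of each iteration, the iterator still points at the first unprocessed tuple when `scaleLoop` returns false, so the task can be resumed later from that position.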
Multiple Input for an ITask
  - How to merge intermediate results
  1. scaleLoop takes a PartitionIterator as input

MITask Abstract Class

    // The MITask abstract class in the library
    abstract class MITask extends ITask {
      // Most parts are the same as ITask
      ...
      // The only difference
      boolean scaleLoop(
          PartitionIterator<DataPartition> i) {
        // Iterate partitions through iterator
        while (i.hasNext()) {
          DataPartition dp = (DataPartition) i.next();
          // Iterate all the data tuples in this partition
          ...
        }
        return true;
      }
    }
ITask WordCount on Hyracks
[Figure: WordCount dataflow on Hyracks, showing HDFS input, Map Operators emitting final results to shuffling, Reduce Operators, Merge Operators, and HDFS output]

MapOperator

    class MapOperator extends ITask
        implements HyracksOperator {
      void interrupt() {
        // Push out final
        // results to shuffling
        ...
      }
      // Some other fields and methods
      ...
    }

ReduceOperator

    class ReduceOperator extends ITask
        implements HyracksOperator {
      void interrupt() {
        // Tag the results;
        // Output as intermediate
        // results
        ...
      }
      // Some other fields and methods
      ...
    }

MergeOperator

    class MergeTask extends MITask {
      void interrupt() {
        // Tag the results;
        // Output as intermediate
        // results
      }
      // Some other fields and methods
      ...
    }
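For WordCount, merging intermediate results amounts to summing partial counts per word. The sketch below shows only that aggregation step; the class `WordCountMerge` is hypothetical, partial results are modeled as in-memory maps, and none of this is Hyracks or ITask library code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what a merge step for WordCount has to do: combine
// partial counts that were output as intermediate results. Not Hyracks code.
final class WordCountMerge {
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                // Add this partial count to the running total for the word.
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```

Because addition is associative and commutative, the partial results can be merged in any order and in any number of rounds, which is what makes interrupting a reduce task safe for this workload.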
Challenges
How to expose semantics → a programming model
How to interrupt/activate tasks → a runtime system
ITask Runtime System
[Figure: ITask Runtime System. Components: Monitor, Scheduler, Partition Manager, Data Partitions, ITasks, Memory, Disk. Edge labels: Check, Grow/Reduce, Reduce, Interrupt/Create, Serialize/Deserialize, Input/Output.]
Lu Fang (UC Irvine) Final Defense May 26, 2017 39 / 51
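One way to picture the Check and Grow/Reduce labels in the runtime diagram is a monitor that samples heap occupancy and tells a scheduler whether to run more or fewer tasks. The sketch below is only an illustration under stated assumptions: the slides do not say how the Monitor detects pressure, so a simple heap-occupancy threshold is swapped in here, and the names `MonitorSketch`, `Action`, `HIGH_WATER`, `LOW_WATER`, `decide`, and `apply` are all hypothetical. Only the `java.lang.Runtime` calls are standard library API.

```java
public class MonitorSketch {

    // Hypothetical scheduler decisions, loosely mirroring the Grow/Reduce labels.
    enum Action { GROW, REDUCE, HOLD }

    // Assumed thresholds for illustration; not taken from the slides.
    static final double HIGH_WATER = 0.85;
    static final double LOW_WATER = 0.50;

    // Pure decision function: which action fits the given heap occupancy?
    static Action decide(long usedBytes, long maxBytes) {
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= HIGH_WATER) return Action.REDUCE;
        if (ratio <= LOW_WATER) return Action.GROW;
        return Action.HOLD;
    }

    // "Check": sample the current JVM heap and decide.
    static Action check() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return decide(used, rt.maxMemory());
    }

    // Scheduler side: adjust the number of running tasks by one step,
    // never dropping below one task or exceeding maxTasks.
    static int apply(Action action, int running, int maxTasks) {
        switch (action) {
            case REDUCE: return Math.max(1, running - 1);
            case GROW:   return Math.min(maxTasks, running + 1);
            default:     return running;
        }
    }

    public static void main(String[] args) {
        int running = 4;
        Action action = check();
        running = apply(action, running, 8);
        System.out.println(action + " -> running tasks: " + running);
    }
}
```

Keeping `decide` separate from the heap sampling makes the policy easy to test with fixed numbers. A real runtime would also need the Interrupt/Create and Serialize/Deserialize paths shown in the diagram, which this sketch leaves out.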
Evaluation Environments
We have implemented ITask on
  - Hadoop 2.6.0
  - Hyracks 0.2.14
An 11-node Amazon EC2 cluster
  - Each machine: 8 cores, 15 GB, 2 × 80 GB SSD
Lu Fang (UC Irvine) Final Defense May 26, 2017 40 / 51
Experiments on Hadoop
Goal
  - Show the effectiveness on real-world problems
Benchmarks
  - Original: five real-world programs collected from Stack Overflow
  - RFix: apply the fixes recommended on websites
  - ITask: apply ITask on original programs

Name                              Dataset
Map-Side Aggregation (MSA)        Stack Overflow Full Dump
In-Map Combiner (IMC)             Wikipedia Full Dump
Inverted-Index Building (IIB)     Wikipedia Full Dump
Word Cooccurrence Matrix (WCM)    Wikipedia Full Dump
Customer Review Processing (CRP)  Wikipedia Sample Dump
Lu Fang (UC Irvine) Final Defense May 26, 2017 41 / 51
Improvements
Benchmark  Original Time   RFix Time  ITask Time  Speed Up
MSA        1047 (crashed)  48         72          -33.3%
IMC        5200 (crashed)  337        238         41.6%
IIB        1322 (crashed)  2568       1210        112.2%
WCM        2643 (crashed)  2151       1287        67.1%
CRP        567 (crashed)   6761       2001        237.9%

  - With ITask, all programs survive memory pressure
  - On average, ITask versions are 62.5% faster than RFix
Lu Fang (UC Irvine) Final Defense May 26, 2017 42 / 51
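The slide does not define the Speed Up column, but its values appear consistent with (RFix time − ITask time) / ITask time, expressed as a percentage. The short program below recomputes the column under that assumption. The class and method names are hypothetical, and the formula is an inference from the numbers, not a definition given in the slides.

```java
import java.util.Locale;

public class SpeedUpTable {

    // Assumed formula, inferred from the table: (rfix - itask) / itask, in percent.
    static double speedUpPercent(double rfixTime, double itaskTime) {
        return (rfixTime - itaskTime) / itaskTime * 100.0;
    }

    // Format with one decimal place, matching the precision used on the slide.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.1f%%", percent);
    }

    public static void main(String[] args) {
        String[] names = {"MSA", "IMC", "IIB", "WCM", "CRP"};
        double[] rfix  = {48, 337, 2568, 2151, 6761};
        double[] itask = {72, 238, 1210, 1287, 2001};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " " + format(speedUpPercent(rfix[i], itask[i])));
        }
    }
}
```

Under this reading, the negative value for MSA means the recommended fix was faster than the ITask version on that benchmark, while ITask was faster on the other four.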
Experiments on Hyracks
Goal
  - Show the improvements on performance
  - Show the improvements on scalability
Benchmarks
  - Original: five hand-optimized applications from repository
  - ITask: apply ITask on original programs

Name                 Dataset
WordCount (WC)       Yahoo Web Map and Its Subgraphs
Heap Sort (HS)       Yahoo Web Map and Its Subgraphs
Inverted Index (II)  Yahoo Web Map and Its Subgraphs
Hash Join (HJ)       TPC-H Data
Group By (GR)        TPC-H Data
Lu Fang (UC Irvine) Final Defense May 26, 2017 43 / 51
Tuning Configurations for Original Programs
Configurations for best performance

Name                 Thread Number  Task Granularity
WordCount (WC)       2              32KB
Heap Sort (HS)       6              32KB
Inverted Index (II)  8              16KB
Hash Join (HJ)       8              32KB
Group By (GR)        6              16KB

Configurations for best scalability

Name                 Thread Number  Task Granularity
WordCount (WC)       1              4KB
Heap Sort (HS)       1              4KB
Inverted Index (II)  1              4KB
Hash Join (HJ)       1              4KB
Group By (GR)        1              4KB
Lu Fang (UC Irvine) Final Defense May 26, 2017 44 / 51
Improvements on Performance
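[Figure: bar chart of normalized speed-up per benchmark (WC, HS, II, HJ, GR). Series: Original Best (normalized to 1 for every benchmark) and ITask. ITask bar labels as extracted, in uncertain order: 1.41.11 (two labels fused), 1.28, 1.67, 1.61.]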
On average, ITask is 34.4% faster
Lu Fang (UC Irvine) Final Defense May 26, 2017 45 / 51
Improvements on Scalability
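[Figure: bar chart of normalized dataset size per benchmark (WC, HS, II, HJ, GR). Series: Original Best (normalized to 1 for every benchmark) and ITask. ITask bar labels as extracted, in uncertain order: 5.12.7 (two labels fused), 24, 6, 5.]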
On average, ITask scales to 6.3×+ larger datasets
Lu Fang (UC Irvine) Final Defense May 26, 2017 46 / 51
The Effectiveness of ITask
ITask is practical
  - It has helped 13 real-world applications survive memory problems
ITask improves performance and scalability
  - On Hadoop, ITask is 62.5% faster than the recommended fixes (RFix)
  - On Hyracks, ITask is 34.4% faster than the best-tuned original programs
  - ITask helps programs scale to 6.3× larger datasets
A programming model + a runtime system
  - Non-intrusive
  - Easy to use
Lu Fang (UC Irvine) Final Defense May 26, 2017 47 / 51
Conclusions
First general technique to amplify problems
  - A class of performance problems
  - Reveals potential problems during testing
A general performance testing framework
  - Includes a compiler and a runtime system
  - Very practical
First systematic approach to address memory pressure
  - Consists of a programming model and a runtime system
  - Solves real-world problems
  - Significantly improves data-parallel tasks' performance and scalability
Lu Fang (UC Irvine) Final Defense May 26, 2017 48 / 51
Future Work
Extend ISL
Add support into production JVMs
Consider more factors to improve test oracle
Instantiate ITask in more data-parallel systems
Lu Fang (UC Irvine) Final Defense May 26, 2017 49 / 51
Publications
  - K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, O. Mutlu. Yak: A High-Performance Big-Data-Friendly Garbage Collector. OSDI'16
  - Z. Zuo, L. Fang, S. Khoo, G. Xu, S. Lu. Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement. OOPSLA'16
  - K. Nguyen, L. Fang, G. Xu, B. Demsky. Speculative Region-based Memory Management for Big Data Systems. PLOS'15
  - L. Fang, K. Nguyen, G. Xu, B. Demsky, S. Lu. Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs. SOSP'15
  - L. Fang, L. Dou, G. Xu. PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification. ECOOP'15
  - K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, G. Xu. Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. ASPLOS'15
Lu Fang (UC Irvine) Final Defense May 26, 2017 50 / 51
Thank You
Q & A
Lu Fang (UC Irvine) Final Defense May 26, 2017 51 / 51