University of Iowa | Mobile Sensing Laboratory
Static Memory Management for Efficient Mobile Sensing Applications
EMSOFT 2015
Farley Lai, Daniel Schmidt, Octav Chipara
Department of Computer Science
• A class of applications that process continuous input data streams and may produce continuous output streams
– real-time processing
– efficient resource management
Emerging Mobile Sensing Applications
[Figure: speaker identification example with stages Speech Recording, VAD, Feature Extraction, Speaker Identifier (using Speaker Models), and HTTP Upload]
Introduction
Sensing Stream Processing
• Workload: stream operations on frames of samples
– e.g., windowing, splitting, or appending
– stream operations tend to be memory intensive
• Goal: implement stream operations efficiently
– reduce memory footprint
– reduce number of memory accesses
• Challenges:
– handle complex interaction between components
– avoid unnecessary memory copies
– enable data sharing between components
The Memory Management Challenge
• Dynamic memory management
– specialized data structures to implement memory management
• e.g., SigSeg [Girod, et al. 2008] – linked list of buffered samples
– a level of indirection in accessing streaming data
• Static memory management
– no runtime overhead
– requires precise knowledge of the variable live ranges
• difficult to achieve in complex applications
• must be time-efficient to be included in compilers
Approaches to Memory Management
[Girod2008] L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, and S. Madden, “XStream: a Signal-Oriented Data Stream Management System,” in ICDE, 2008.
• Application model
• Static analysis
• Memory layout
• Evaluation
• Conclusions
Outline
• StreamIt – synchronous data flow (SDF) language
– application = graph of filters connected with FIFO channels
• limited memory operations: pop(), peek(), and push()
• known consumption and production rates
A Model for Stream Applications
[Figure: Filter::work() reads its INPUT channel with pop and peek and writes its OUTPUT channel with push]
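The filter model above can be illustrated with a minimal Python sketch. It is not StreamIt code and not the authors' runtime: `Channel`, `MovingAverage`, and `run` are hypothetical names, and the rates (peek 3, pop 1, push 1) are assumed for illustration. It only shows how fixed consumption and production rates make a filter's memory behavior predictable.

```python
from collections import deque


class Channel:
    """FIFO channel between two filters (illustrative, not StreamIt's runtime)."""

    def __init__(self):
        self._items = deque()

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.popleft()

    def peek(self, offset):
        # Read without consuming; offset 0 is the next item to be popped.
        return self._items[offset]

    def __len__(self):
        return len(self._items)


class MovingAverage:
    """Hypothetical filter with assumed fixed rates: peek 3, pop 1, push 1."""

    PEEK, POP, PUSH = 3, 1, 1

    def work(self, inp, out):
        total = sum(inp.peek(i) for i in range(self.PEEK))
        out.push(total / self.PEEK)
        inp.pop()


def run(samples):
    """Feed samples through one filter; return (outputs, items left in input)."""
    inp, out = Channel(), Channel()
    for s in samples:
        inp.push(s)
    f = MovingAverage()
    # The filter may fire only while enough items are available to peek.
    while len(inp) >= f.PEEK:
        f.work(inp, out)
    return [out.pop() for _ in range(len(out))], len(inp)
```

Because the filter peeks further than it pops, two items stay buffered in the input channel after every firing; this leftover is what an initialization phase has to provide before the steady state can run.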
• StreamIt – synchronous data flow language
– applications are constructed hierarchically
• pipeline of streams
• split and joins (splitter and joiner)
– pass-by-value semantics
• naïve implementation would incur significant number of copies
A Model for Stream Applications
[Figure: example stream graph with Source, Duplicate splitter, LPF1, LPF2, Round-Robin joiner, Subtract, and Sink]
• SDFs may be executed in a cyclo-static schedule
– the complete memory behavior of the program may be observed within one execution of the schedule
• Our solution: static analysis + memory layout
Insight
[Figure: example stream graph with Source, Duplicate splitter, LPF1, LPF2, RoundRobin joiner, Subtract, and Sink]
INIT PHASE: Source,3 DUP,3 LPF1,1 LPF2,1 RR,1 Sub,1 Sink
STEADY PHASE: Source,1 DUP,1 LPF1,1 LPF2,1 RR,1 Sub,1 Sink
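To make the insight concrete, the following Python sketch simulates the two phases of the example schedule and records channel occupancy. Several things here are assumptions, not facts from the slides: each LPF is taken to peek 3, pop 1, and push 1; the round-robin joiner takes one item from each branch; Subtract consumes two items and produces one; Sink fires once per phase; and the channel names are invented. The init phase is read as Source,3 DUP,3 LPF1,1 LPF2,1 RR,1 Sub,1 Sink.

```python
CHANNELS = ["dup_in", "lpf1_in", "lpf2_in", "rr_in1", "rr_in2", "sub_in", "sink_in"]

INIT = [("Source", 3), ("DUP", 3), ("LPF1", 1), ("LPF2", 1),
        ("RR", 1), ("Sub", 1), ("Sink", 1)]
STEADY = [("Source", 1), ("DUP", 1), ("LPF1", 1), ("LPF2", 1),
          ("RR", 1), ("Sub", 1), ("Sink", 1)]


def fire(buf, name):
    """Fire one component once, updating channel occupancies (assumed rates)."""
    if name == "Source":
        buf["dup_in"] += 1
    elif name == "DUP":
        buf["dup_in"] -= 1
        buf["lpf1_in"] += 1
        buf["lpf2_in"] += 1
    elif name in ("LPF1", "LPF2"):
        n = name[-1]
        src = "lpf" + n + "_in"
        if buf[src] < 3:  # assumed peek rate of 3
            raise RuntimeError(name + " cannot peek 3 items")
        buf[src] -= 1
        buf["rr_in" + n] += 1
    elif name == "RR":
        buf["rr_in1"] -= 1
        buf["rr_in2"] -= 1
        buf["sub_in"] += 2
    elif name == "Sub":
        buf["sub_in"] -= 2
        buf["sink_in"] += 1
    elif name == "Sink":
        buf["sink_in"] -= 1
    else:
        raise ValueError("unknown component " + name)
    if min(buf.values()) < 0:
        raise RuntimeError("schedule underflows a channel at " + name)


def simulate(init=INIT, steady=STEADY):
    """Run init once and steady once; return occupancies and per-channel peaks."""
    buf = dict.fromkeys(CHANNELS, 0)
    peak = dict.fromkeys(CHANNELS, 0)

    def run(schedule):
        for name, count in schedule:
            for _ in range(count):
                fire(buf, name)
                for ch in CHANNELS:
                    peak[ch] = max(peak[ch], buf[ch])

    run(init)
    after_init = dict(buf)
    run(steady)
    return after_init, dict(buf), peak
```

Under these assumed rates the occupancy after one steady-phase execution equals the occupancy after the init phase, so every later steady-phase execution repeats the same memory behavior and one simulated execution is enough to observe it.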
• Location Sharing
– an output element is pushed from an unmodified input element
– each I/O element is associated with a pop/push index
• Temporal Sharing
– an output element reuses the input element storage
– each I/O element is associated with a live range [i, j]
• Builds on abstract interpretation
– build a Control-Flow Graph (CFG) for each filter
– abstract interpretation of memory operations
Component Analysis
• Abstract interpretation of memory operations
– memory counter (MC) – relative order of operation
– indexes of current push (out) and pop (in)
– live range for each input (LIN) and output (LOUT) element
• Indexes and live ranges represented as intervals
• Subset of rules for determining live ranges:
Component Analysis
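\[
\frac{(\mathit{MC},\ \mathit{out},\ \mathit{LOUT})}{\mathit{LOUT}[\mathit{out}] \sqcup \mathit{MC},\quad \mathit{out}{+}{+},\quad \mathit{MC}{+}{+}}\ \text{push}
\]
\[
\frac{(\mathit{MC},\ \mathit{in},\ \mathit{LIN})}{\mathit{LIN}[\mathit{in}] \sqcup \mathit{MC},\quad \mathit{in}{+}{+},\quad \mathit{MC}{+}{+}}\ \text{pop}
\]
\[
\frac{(\mathit{MC}_1,\ \mathit{in}_1,\ \mathit{out}_1) \qquad (\mathit{MC}_2,\ \mathit{in}_2,\ \mathit{out}_2)}{(\mathit{MC} = \max(\mathit{MC}_1, \mathit{MC}_2),\ \mathit{in} = \mathit{in}_1 \sqcup \mathit{in}_2,\ \mathit{out} = \mathit{out}_1 \sqcup \mathit{out}_2)}\ \text{join}
\]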
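Example of Component Analysis: pop
RULE:
\[
\frac{(\mathit{MC},\ \mathit{in},\ \mathit{LIN})}{\mathit{LIN}[\mathit{in}] \sqcup \mathit{MC},\quad \mathit{in}{+}{+},\quad \mathit{MC}{+}{+}}\ \text{pop}
\]
STATE before: MC = 0, in = [0,0], out = [0,0]
STATE after: MC = 1, in = [1,1], out = [0,0]
Update: LIN[0] = LIN[0] ⊔ [0,0]
Live ranges: LIN[0] = [0,0]; LOUT[0] = ∅; LOUT[1] = ∅
[Figure: CFG of the example filter]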
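Example of Component Analysis: push
RULE:
\[
\frac{(\mathit{MC},\ \mathit{out},\ \mathit{LOUT})}{\mathit{LOUT}[\mathit{out}] \sqcup \mathit{MC},\quad \mathit{out}{+}{+},\quad \mathit{MC}{+}{+}}\ \text{push}
\]
STATE before: MC = 1, in = [1,1], out = [0,0]
STATE after: MC = 2, in = [1,1], out = [1,1]
Update: LOUT[0] = LOUT[0] ⊔ [1,1]
Live ranges: LIN[0] = [0,0]; LOUT[0] = [1,1]; LOUT[1] = ∅
[Figure: CFG of the example filter]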
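Example of Component Analysis: join
RULE:
\[
\frac{(\mathit{MC}_1,\ \mathit{in}_1,\ \mathit{out}_1) \qquad (\mathit{MC}_2,\ \mathit{in}_2,\ \mathit{out}_2)}{(\mathit{MC} = \max(\mathit{MC}_1, \mathit{MC}_2),\ \mathit{in} = \mathit{in}_1 \sqcup \mathit{in}_2,\ \mathit{out} = \mathit{out}_1 \sqcup \mathit{out}_2)}\ \text{join}
\]
STATE on first branch: MC = 1, in = [1,1], out = [0,0]
STATE on second branch: MC = 2, in = [1,1], out = [1,1]
STATE after join: MC = 2, in = [1,1], out = [0,1]
Live ranges: LIN[0] = [0,0]; LOUT[0] = [1,1]; LOUT[1] = ∅
[Figure: CFG of the example filter]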
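Example of Component Analysis: push after join
RULE:
\[
\frac{(\mathit{MC},\ \mathit{out},\ \mathit{LOUT})}{\mathit{LOUT}[\mathit{out}] \sqcup \mathit{MC},\quad \mathit{out}{+}{+},\quad \mathit{MC}{+}{+}}\ \text{push}
\]
STATE before: MC = 2, in = [1,1], out = [0,1]
STATE after: MC = 3, in = [1,1], out = [1,2]
Update: LOUT[0,1] = LOUT[0,1] ⊔ [2,2]
Live ranges: LIN[0] = [0,0]; LOUT[0] = [1,1]; LOUT[0,1] = [2,2]
[Figure: CFG of the example filter]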
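The four example steps can be replayed with a small Python sketch of the interval domain and the pop, push, and join rules. This is an illustration under assumptions, not the authors' analyzer: the example filter is read as "pop, then an optional push, then a push"; live-range tables are keyed by the index interval, mirroring the LIN/LOUT table on the slides; and all function names are hypothetical.

```python
def join(a, b):
    """Interval join: smallest interval covering both; None stands for the empty set."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def shift(iv):
    """Increment an index interval by one (the in++ / out++ of the rules)."""
    return (iv[0] + 1, iv[1] + 1)


def pop(state, lin):
    """pop rule: LIN[in] joins MC, then in++ and MC++."""
    mc, inp, out = state
    lin[inp] = join(lin.get(inp), (mc, mc))
    return (mc + 1, shift(inp), out)


def push(state, lout):
    """push rule: LOUT[out] joins MC, then out++ and MC++."""
    mc, inp, out = state
    lout[out] = join(lout.get(out), (mc, mc))
    return (mc + 1, inp, shift(out))


def join_states(s1, s2):
    """join rule at a CFG merge point: max of counters, join of index intervals."""
    return (max(s1[0], s2[0]), join(s1[1], s2[1]), join(s1[2], s2[2]))


def walkthrough():
    lin, lout = {}, {}
    s0 = (0, (0, 0), (0, 0))      # (MC, in, out)
    s1 = pop(s0, lin)             # LIN[0] gets [0,0]
    s2 = push(s1, lout)           # branch that pushes: LOUT[0] gets [1,1]
    s3 = join_states(s1, s2)      # merge with the branch that skips the push
    s4 = push(s3, lout)           # LOUT[0,1] gets [2,2]
    return [s0, s1, s2, s3, s4], lin, lout
```

After the join, the push index is the interval [0,1] because the analysis no longer knows whether the optional push happened; the final push therefore records its live range against that interval rather than a single index.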
• Component analysis constructs a memory fragment
– captures live ranges for temporal reuse
– captures location sharing edges
• Whole program analysis constructs a memory graph
– stitches together memory fragments
– simulates the schedule to
• connect location sharing edges into paths and
• extend live ranges with the phase number and invocation index
• Our approach:
– analysis is precise when there is no input dependency
– otherwise, it is a sound approximation
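As a rough illustration of stitching fragments together, the Python sketch below follows location sharing edges back to the first element of each path and joins the live ranges along that path. The encoding is assumed rather than taken from the slides: elements are plain strings, `shared_from[x] = y` means x is an unmodified copy of y, sharing edges are acyclic, and timestamps are (phase, invocation, MC) tuples compared lexicographically. The function names are hypothetical.

```python
def resolve_roots(shared_from):
    """Map every element to the first element of its location sharing path."""
    roots = {}
    for elem in shared_from:
        cur = elem
        while cur in shared_from:   # follow edges back to the original element
            cur = shared_from[cur]
        roots[elem] = cur
        roots[cur] = cur
    return roots


def merge_live_ranges(roots, live):
    """Join per-element live ranges over each path.

    live[x] = (start, end), where each bound is an assumed
    (phase, invocation, MC) tuple compared lexicographically.
    """
    merged = {}
    for elem, (start, end) in live.items():
        root = roots.get(elem, elem)
        if root in merged:
            s, e = merged[root]
            merged[root] = (min(s, start), max(e, end))
        else:
            merged[root] = (start, end)
    return merged


# Hypothetical edges for one sample flowing through the duplicate splitter.
EDGES = {
    "DUP.in[0]": "Source.out[0]",
    "DUP.out1[0]": "DUP.in[0]",
    "DUP.out2[0]": "DUP.in[0]",
    "LPF1.in[0]": "DUP.out1[0]",
    "LPF2.in[0]": "DUP.out2[0]",
}
LIVE = {
    "Source.out[0]": ((0, 0, 0), (0, 0, 0)),
    "LPF1.in[0]": ((0, 2, 0), (0, 2, 0)),
    "LPF2.in[0]": ((0, 3, 0), (0, 3, 0)),
}
```

In this toy input all six elements resolve to `Source.out[0]`, so a layout could back them with one storage slot whose live range spans from the source's write to the last filter's read.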
Whole Program Analysis
• Empirical insights
– split-joins can be eliminated for manipulating location-shared elements
– a filter usually can reuse its input memory
• Heuristic approaches to resolving temporal reuse conflicts
Memory Layout
[Figure: memory layouts of components A and B in three cases: No conflict, Append on Conflict (AoC), and Insert-in-Place (IP); legend: A, B, other comps, A memory, B memory]
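Temporal reuse can be illustrated with a generic first-fit allocator over live ranges. This sketch is neither AoC nor IP, whose details are not given on the slides; it is a swapped-in textbook technique that only shows the underlying idea that two buffers may occupy the same addresses when their live ranges do not overlap. The buffer names, sizes, and live ranges are made up.

```python
def overlaps(a, b):
    """True if closed intervals a and b intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_fit_layout(buffers):
    """Assign offsets to buffers given as (name, size, (start, end)).

    Returns ({name: offset}, total_size). Buffers whose live ranges
    overlap never share addresses; others may reuse the same offsets.
    """
    placed = []   # (offset, size, live_range) of buffers placed so far
    offsets = {}
    for name, size, live in sorted(buffers, key=lambda b: b[2]):
        # Address ranges held by buffers that are live at the same time.
        busy = sorted((o, o + s) for o, s, l in placed if overlaps(l, live))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break                  # the gap before this range is big enough
            offset = max(offset, hi)   # otherwise skip past it
        placed.append((offset, size, live))
        offsets[name] = offset
    total = max((o + s for o, s, _ in placed), default=0)
    return offsets, total


# Made-up example: B can reuse A's storage because A is dead when B starts.
EXAMPLE = [("A", 4, (0, 2)), ("B", 4, (3, 5)), ("C", 2, (1, 4))]
```

For the made-up example the layout needs 6 units instead of the 10 that separate buffers would take, since B is placed at A's offset.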
• Intel x86_64 on Mac OS X 10.10.3
– 3GHz Intel Xeon CPU E5-1680 v2
– 32KB L1 instruction + 32KB L1 data caches
– 256KB L2 + 25MB L3 caches
• StreamIt Compiler
– baseline default settings without optimizations
– enabled cache optimizations with –cacheopt
– gcc -O3 to compile generated C/C++ code
• 11 micro benchmarks from StreamIt
• 3 macro benchmarks from real MSAs
– BeepBeep [Peng, C., et al. 2007]
– MFCC and Crowd [Xu, C., et al. 2013]
Experimental Setup
Evaluation
– ESMS reduces both channel buffer sizes and the number of memory operations from splitters, joiners, and reordering filters
Memory Usage on Intel x86_64
45% to 96% reductions
73% reduction on average
• Compared with baseline StreamIt
– the average speedups of AA, AoC, and IP are 3, 3.1, and 3, while the average speedup of CacheOpt is merely 1.07
– ESMS improves performance by eliminating unnecessary memory operations and reducing cache/memory references
Speedup on Intel x86_64
• Static memory management is effective for stream languages
– whole program memory behaviors may be characterized
– both location and temporal sharing opportunities are exploited
– performance improvement due to fewer memory operations and references
• ESMS provides significant performance improvements
– 45% to 96% data size reduction
– 73% code size reduction
– 3X speedup
Conclusions
• National Science Foundation (NeTs grant #1144664)
• Carver Foundation (grant #14-43555)
Acknowledgements
CSense Toolkit
Questions?
Thank You