Tutorial on Timing Analysis and Optimization
Is your program always fast enough?
Dr. Christian Ferdinand, AbsInt Angewandte Informatik GmbH
Dr. Kai Richter, Symtavision GmbH
AbsInt Angewandte Informatik GmbH
Provides advanced development tools for embedded systems, and tools for validation, verification, and certification of safety-critical software
Founded in February 1998 by six researchers of Saarland University, Germany
Privately held by the founders
[Figure: staff growth graph, 1998 to 2008]
Key Products
Hard Real-Time Systems
Controllers in planes, cars, plants, … are expected to finish their tasks within reliable time bounds.
Schedulability analysis must be performed.
Hence, it is essential that an upper bound on the execution times of all tasks is known.
This bound is commonly called the Worst-Case Execution Time (WCET).
The Timing Problem
[Figure: probability distribution over execution time, marking the best-case execution time, the unsafe execution time measurement, the exact worst-case execution time, and the safe worst-case execution time estimate]
The Ever-Growing Gap
x = a + b;
LOAD r2, _a
LOAD r1, _b
ADD r3,r2,r1
[Figure: bar charts of execution time (clock cycles) for this code on 68K (1990), MPC 5xx (2000), and PPC 755 (2001), comparing best case and worst case, and showing execution time depending on flash memory: 0 wait cycles, 1 wait cycle, External (6,1,1,1,..)]
(Concrete) Instruction Execution
[Figure: execution of a mul instruction through the pipeline, with the possible cycle counts at each stage]
Fetch: I-Cache miss?
Issue: Unit occupied?
Execute: Multicycle?
Retire: Pending instructions?
Murphy’s Law in Timing Analysis
Naïve, but safe guarantee accepts Murphy’s Law: Any accident that may happen will happen
Consequence: hardware overkill necessary to guarantee timeliness
Example: EADS study: Measured performance of PPC 603e with all the caches switched off
Corresponds to assumption “all memory accesses miss the cache”
Result: slowdown by a factor of 30
Fighting Murphy’s Law
Static Program Analysis allows the derivation of Invariants about all execution states at a program point
Derive Safety Properties from these invariants:
Certain timing accidents will never happen.
Example: At program point p, instruction fetch will never cause a cache miss
The more accidents excluded, the lower the upper bound
aiT WCET Analyzer
The solution to the timing problem
Global program analysis:
abstract interpretation for cache, pipeline, and value analysis
integer linear programming for path analysis
Everything combined in a single intuitive GUI
Structure of the aiT WCET Analyzer
Example: Direct Mapped I-Cache
[Figure: the CPU fetches instructions through the I-Cache, which is backed by main memory; the program counter is at 1028]
1024: mul …
1028: add …
1032: ble 1024
Cache Hit: ~ 1 Cycle
Cache Miss: ~ +1 to +100 Cycles
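The hit and miss behavior of the three instructions at addresses 1024, 1028, and 1032 can be replayed with a small simulation. This is a minimal sketch, not aiT's cache model: the function name `simulate_direct_mapped` and the parameters `num_lines` and `line_size` are assumptions for illustration, and each instruction is assumed to occupy 4 bytes.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_size=4):
    """Return 'hit' or 'miss' for each fetched address (hypothetical helper)."""
    tags = [None] * num_lines          # one tag per cache line, None = invalid
    outcome = []
    for addr in addresses:
        block = addr // line_size      # memory block containing the address
        index = block % num_lines      # the only line this block may occupy
        tag = block // num_lines       # tells apart blocks that share a line
        if tags[index] == tag:
            outcome.append("hit")
        else:
            outcome.append("miss")
            tags[index] = tag          # the fetched block replaces the old one
    return outcome


# Three iterations of the loop mul, add, ble 1024.
trace = [1024, 1028, 1032] * 3
large_cache = simulate_direct_mapped(trace, num_lines=4)
small_cache = simulate_direct_mapped(trace, num_lines=2)
print(large_cache.count("miss"), small_cache.count("miss"))
```

With four lines the loop misses only in its first iteration. With two lines, addresses 1024 and 1032 compete for the same line, so they keep evicting each other in every iteration.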
Cache Analysis
Must analysis: for each program point and calling context, find out which blocks are in the cache
May analysis: for each program point and calling context, find out which blocks may be in the cache
Example: Fully Associative Cache (2 Elements)
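A must analysis for a fully associative LRU cache with 2 elements can be sketched as follows. This follows a common textbook formulation, in which the abstract state maps each block to an upper bound on its LRU age; it is an assumption that this matches the slide's example, and the names `must_update`, `must_join`, and the blocks a, b, c, d are hypothetical.

```python
ASSOC = 2  # fully associative cache with 2 elements


def must_update(state, block):
    """Abstract effect of accessing `block`.

    `state` maps block -> upper bound on its LRU age (0 = youngest).
    Blocks missing from `state` are not guaranteed to be cached.
    """
    old_age = state.get(block, ASSOC)
    new_state = {}
    for other, age in state.items():
        if other == block:
            continue
        if age < old_age:
            age += 1                   # blocks younger than `block` age by one
        if age < ASSOC:
            new_state[other] = age     # otherwise the block may have been evicted
    new_state[block] = 0               # the accessed block becomes the youngest
    return new_state


def must_join(s1, s2):
    """Merge two paths: keep blocks cached on both, with the larger age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}


path1 = must_update(must_update({}, "a"), "b")   # accesses a, then b
path2 = must_update(must_update({}, "c"), "a")   # accesses c, then a
joined = must_join(path1, path2)
print(joined)
```

After the join only block a is guaranteed to be in the cache, so a later fetch of a can be classified as a hit, while an access to an unknown block d removes that guarantee. The may analysis is the dual: it joins by union and tracks lower bounds on the age.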
Set Associative Cache
Address: Address prefix | Set number | Byte in line
[Figure: the CPU issues an address; the set number selects one set, whose entries 1, 2, …, A each hold Adr. prefix, Tag, Rep, and Data block]
Set: Fully associative subcache of A elements with LRU, FIFO, rand. replacement strategy
Compare address prefix
If not equal, fetch block from main memory
Byte select & align
Data Out
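The split of an address into address prefix, set number, and byte in line can be written out as integer arithmetic. This is a minimal sketch: the function name `split_address` and the cache geometry (32-byte lines, 128 sets) are assumptions for illustration, not the parameters of any particular processor.

```python
def split_address(addr, line_size=32, num_sets=128):
    """Split an address into (address prefix, set number, byte in line)."""
    byte_in_line = addr % line_size                # position inside the cache line
    set_number = (addr // line_size) % num_sets    # selects the set to search
    prefix = addr // (line_size * num_sets)        # compared against the stored tags
    return prefix, set_number, byte_in_line


first = split_address(0x12345)
# An address exactly one "cache size per way" further on maps to the same set
# but carries a different prefix, so the two blocks compete for that set.
second = split_address(0x12345 + 32 * 128)
print(first, second)
```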
Pipelines
Ideal case: 1 instruction per cycle
[Figure: pipeline diagram with Inst 1, Inst 2, Inst 3, and Inst 4 overlapping in the stages Fetch, Decode, Execute, and Write back]
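The ideal case can be made concrete with a small schedule generator. This is a sketch under the assumption of a stall-free four-stage pipeline in which one instruction enters per cycle; the function name `ideal_schedule` is hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write back"]


def ideal_schedule(num_instructions, stages=STAGES):
    """For each cycle, map every active instruction to its pipeline stage."""
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for inst in range(num_instructions):
            stage_index = cycle - inst   # instruction `inst` enters in cycle `inst`
            if 0 <= stage_index < len(stages):
                active[f"Inst {inst + 1}"] = stages[stage_index]
        schedule.append(active)
    return schedule


schedule = ideal_schedule(4)
for cycle, active in enumerate(schedule, start=1):
    print(cycle, active)
```

Four instructions finish after 4 + 4 - 1 = 7 cycles. Every cache miss, occupied unit, or multicycle operation adds to this ideal figure, which is why the analysis has to track the possible pipeline states.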
Pipeline Analysis
Goal: calculate all possible pipeline states at a program point
Method: perform a cycle-wise evolution of the pipeline, determining all possible successor pipeline states
Implementation: from a formal model of the pipeline, its stages and communication between them
Generation: from a PAG specification
Result: WCET for basic blocks
Pipeline Model
[Figure panels: MPC555 Block Diagram; RCPU Block Diagram; aiT visualization; aiT's internal pipeline model]
Visualization of Pipeline Analysis Results
Path Analysis: Example (simplified constraints)
if a then
  b
elseif c then
  d
else
  e
endif
f
[Figure: control-flow graph with nodes a, b, c, d, e, f and execution times a: 4t, b: 10t, c: 3t, d: 2t, e: 6t, f: 5t]
max: 4 xa + 10 xb + 3 xc + 2 xd + 6 xe + 5 xf
where xa = xb + xc
      xc = xd + xe
      xf = xb + xd + xe
      xa = 1
Value of objective function: 19
xa = 1, xb = 1, xc = 0, xd = 0, xe = 0, xf = 1
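The path analysis example is small enough to solve by enumeration. The sketch below is not how an ILP solver works internally; it simply tries every 0/1 assignment of the execution counts, keeps those that satisfy the flow constraints, and maximizes the objective function. The names `TIMES`, `feasible`, and `worst_case_path` are hypothetical, and restricting counts to 0 or 1 is an assumption that holds here because the example has no loops.

```python
from itertools import product

# Execution time of each basic block, in units of t.
TIMES = {"a": 4, "b": 10, "c": 3, "d": 2, "e": 6, "f": 5}


def feasible(x):
    """Flow constraints of the example control-flow graph."""
    return (x["a"] == 1
            and x["a"] == x["b"] + x["c"]
            and x["c"] == x["d"] + x["e"]
            and x["f"] == x["b"] + x["d"] + x["e"])


def worst_case_path():
    """Return (objective value, execution counts) of the longest feasible path."""
    best_value, best_counts = -1, None
    for values in product((0, 1), repeat=len(TIMES)):
        x = dict(zip(TIMES, values))
        if not feasible(x):
            continue
        value = sum(TIMES[block] * x[block] for block in TIMES)
        if value > best_value:
            best_value, best_counts = value, x
    return best_value, best_counts


value, counts = worst_case_path()
print(value, counts)
```

The three feasible paths cost 19 (a, b, f), 18 (a, c, e, f), and 14 (a, c, d, f), so the maximum is reached on the path through b.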
A Hybrid Approach: Combining block measurements with static analysis
Measurements of execution times of blocks (emulator, logic analyzer, Nexus, ETM, …)
Avoids the high costs of micro-architecture modeling
Requires “measuring” all local worst-case behaviors
Regrettably, this is nearly impossible, so the approach is generally not safe!
Nevertheless, it can be quite useful for optimizations by hand
Some Architectural Features that make Measurement-Based WCET Analysis a Challenge
Fine-grain timing measurement is not always possible
Instrumentation changes timing behavior
Debug interfaces rarely available in “real” embedded applications
The empty cache is not necessarily the “worst case cache”
“Domino” effects
Domino Effect
Timing anomaly
Execution time increase is not bounded by hardware-determined constants
Certain instruction sequences, e.g. in loop bodies, can trigger this effect and increase latencies in further iterations
Pseudo-LRU Replacement (e.g., PPC G3)
Each setting of B[0..2] points to a specific line:
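[Figure: binary tree with root bit B0 and child bits B1 and B2; each bit has one branch labelled 1 and one labelled 0; the leaves are the cache lines L0, L1, L2, L3]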
4-way PLRU Domino Effect
Sequence: c, d, f, c, d, h
This sequence is then repeated ad infinitum
Each row shows the cache lines L0 L1 L2 L3 and the bits B0 B1 B2 after the access.
Access   Empty cache          Non-empty cache
(start)  . . . .   00 0       f e a b   00 0
c        c . . .   11 0       c e a b   11 0
d        c d . .   10 0       c e d b   01 1
f        c d f .   00 1       c f d b   10 1
c        c d f .   11 1       c f d b   11 1
d        c d f .   10 1       c f d b   01 1
h        c d f h   00 0       c h d b   10 1
c        c d f h   11 0       c h d b   11 1
d        c d f h   10 0       c h d b   01 1
f        c d f h   00 1       c f d b   10 1
c        c d f h   11 1       c f d b   11 1
d        c d f h   10 1       c f d b   01 1
h        c d f h   00 0       c h d b   10 1
Empty cache: only cache hits in the repetitions
Non-empty cache: two misses each time
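The domino effect for the sequence c, d, f, c, d, h can be reproduced with a short simulation of one 4-way PLRU set. This is a sketch under stated assumptions: the bit encoding in `touch` and `victim` is inferred from the trace on the slide rather than taken from a processor manual, empty lines are assumed to be filled first in index order, and all function names are hypothetical.

```python
def touch(bits, way):
    """Update the tree bits (B0, B1, B2) after an access to line `way`."""
    b0, b1, b2 = bits
    if way == 0:
        return (1, 1, b2)
    if way == 1:
        return (1, 0, b2)
    if way == 2:
        return (0, b1, 1)
    return (0, b1, 0)


def victim(bits):
    """Line that the current bit setting selects for replacement."""
    b0, b1, b2 = bits
    if b0 == 0:
        return 0 if b1 == 0 else 1
    return 2 if b2 == 0 else 3


def misses_per_round(initial_lines, sequence, rounds):
    """Count the misses in each repetition of `sequence`."""
    lines = list(initial_lines)   # None marks an empty line
    bits = (0, 0, 0)
    counts = []
    for _ in range(rounds):
        misses = 0
        for block in sequence:
            if block in lines:
                way = lines.index(block)
            else:
                misses += 1
                # Assumption: empty lines are filled before anything is evicted.
                way = lines.index(None) if None in lines else victim(bits)
                lines[way] = block
            bits = touch(bits, way)
        counts.append(misses)
    return counts


SEQUENCE = ["c", "d", "f", "c", "d", "h"]
empty = misses_per_round([None] * 4, SEQUENCE, rounds=4)
non_empty = misses_per_round(["f", "e", "a", "b"], SEQUENCE, rounds=4)
print(empty)      # [4, 0, 0, 0]
print(non_empty)  # [4, 2, 2, 2]
```

Starting from the empty cache, the four blocks settle after the first pass and every later access hits. Starting from the contents f, e, a, b, block b is never evicted, so f and h keep replacing each other and each repetition costs two misses.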
aiT WCET Analysis Input/Output
clock 10200 kHz ;
loop "_codebook" + 1 loop exactly 16 end ;
recursion "_fac" max 6;
SNIPPET "printf" IS NOT ANALYZED AND TAKES MAX 333 CYCLES;
flow "U_MOD" + 0xAC bytes / "U_MOD" + 0xC4 bytes is max 4;
area from 0x20 to 0x497 is read-only;
Specifications (*.ais)
Entry Point
Worst Case Execution Time
Visualization, Documentation
aiT
void Task (void)
{
  variable++;
  function();
  next++;
  if (next)
    do this;
  terminate();
}
Application Code
Executable (*.elf / *.out)
Compiler Linker
Hardware-Settings
Hardware settings have to be specified in aiT according to the target processor configuration in the start-up code.
Challenge: Reconstruction of CFG
Indirect Jumps
Case/Switch statements as compiled by the C-compiler are automatically recognized
For hand-written assembly code annotations might be necessary:
INSTRUCTION ProgramPoint BRANCHES TO Target1, …, Targetn
Indirect Calls
Can often be recognized automatically if a static array of function pointers is used
For other cases:
INSTRUCTION ProgramPoint CALLS Target1, …, Targetn
Loops
aiT includes a loop bound analysis based on interval analysis and pattern matching that is able to recognize the iteration count of many "simple" FOR loops automatically
Other loops need to be annotated
Example: loop "_prime" + 1 loop end max 10;
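For a simple counting loop, an iteration bound follows directly from intervals for the start value and the limit. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of aiT's analysis, and the function name `loop_bound` and its parameters are hypothetical.

```python
def loop_bound(start_min, limit_max, step):
    """Upper bound on the iterations of `for (i = start; i < limit; i += step)`.

    Assumes start >= start_min, limit <= limit_max, and a constant step > 0,
    as an interval analysis of the loop counter might establish.
    """
    if step <= 0:
        raise ValueError("step must be a positive constant")
    span = limit_max - start_min
    # Integer ceiling of span / step, clamped at zero for loops that never run.
    return max(0, (span + step - 1) // step)


print(loop_bound(0, 16, 1))   # loop from 0 up to 16
print(loop_bound(3, 10, 2))   # i = 3, 5, 7, 9
```

Loops whose exit condition is not a simple comparison against the counter, such as `i * i <= n`, fall outside this pattern, which is where a manual annotation comes in.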
Source Level Annotations
bool divides (uint n, uint m) {
  /* ai: SNIPPET HERE NOT ANALYZED, TAKES MAX 173 CYCLES; */
  return (m % n == 0);
}
bool prime (uint n) {
  uint i;
  if (even (n))
    /* ai: SNIPPET HERE INFEASIBLE; */
    return (n == 2);
  for (i = 3; i * i <= n; i += 2) {
    /* ai: LOOP HERE MAX 20; */
    if (divides (i, n))
      return 0;
  }
  return (n > 1);
}
aiT: Timing Details
Recent Advances
Source: studies by Lim et al. (1995), Thesing et al. (2002), and Souyris et al. (2005)
[Figure: cache-miss penalties and WCET overestimation]
Master’s Thesis of Daniel Sehlberg
Mälardalen University, Sweden, ASTEC-Project, August 2005
Real-time tasks under Rubus OS on C16x taken from Volvo CE application
WCET Challenge 2006
Organized by the University of Mälardalen
http://www.idt.mdh.se/personal/jgn/challenge/
Aim: Compare different approaches in analyzing the Worst-Case Execution Time
Excerpts from the final report:
"aiT is able to handle every kind of benchmark and every test program that was tested in the Challenge. aiT is able to support WCET analysis even for complex processors."
"aiT demonstrates its leading position through all its features […]"
Full report: http://dc.informatik.uni-essen.de/Tan/all/
SCADE / aiT automated Flow
Analysis Reports
Customizable HTML reports
Global and detailed reports
Diff feature
Integration with ETAS/ASCET
aiT/StackAnalyzer is started from the ASCET main menu
ASCET generates the annotation files and the analyses are performed in the background
Practical Experiments, Execution Time
Engine throttle control module specified in ASCET, Tasking compiler v7.5, STM ST10F269 microcontroller board.
Run-times extracted from bus traces (ISYSTEMS ILA 128 logic analyzer).
The worst-case path information provided by aiT was used to manually construct a corresponding input.
Practical Experiments: Stack Usage
ST10/C16x uses two stacks.
Most generated functions neither use local variables nor call subroutines, i.e. the stack usage is zero.
Integration with Scheduling Analysis
System level: SymTA/S
Code level: aiT/StackAnalyzer
System model (tasks, activation, scheduling)
WCET/stack analysis (single task)
Scheduling analysis (WCRT), system stack analysis
WCET/stack request
Refinement
WCET/stack response
Additional info
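Single-task WCET bounds become inputs to the system-level scheduling analysis. As one illustration of how a worst-case response time (WCRT) can be derived from them, the sketch below uses the classic response-time iteration for preemptive fixed-priority scheduling. This is an assumption for illustration: the slides do not state which analysis SymTA/S applies, and the task set and the name `response_time` are hypothetical.

```python
def response_time(tasks, index):
    """WCRT of tasks[index] under preemptive fixed-priority scheduling.

    `tasks` is a list of (wcet, period) pairs sorted by priority, highest
    first; deadlines are assumed equal to periods. Returns None if the
    response time exceeds the period.
    """
    wcet, period = tasks[index]
    higher_priority = tasks[:index]
    r = wcet
    while True:
        # Each higher-priority task preempts once per activation within r.
        interference = sum(((r + t - 1) // t) * c for c, t in higher_priority)
        r_next = wcet + interference
        if r_next > period:
            return None
        if r_next == r:
            return r               # fixed point reached
        r = r_next


# Hypothetical task set: (WCET, period) in the same time unit.
TASKS = [(1, 4), (2, 6), (3, 12)]
wcrts = [response_time(TASKS, i) for i in range(len(TASKS))]
print(wcrts)
```

A tighter WCET bound for any task lowers the interference term for all lower-priority tasks, which is one reason the refinement loop between the code level and the system level is useful.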
Future Work
Extraction of timing (pipeline) models from HW description (VHDL)
Use of source-level program analyses
Tighter integration with measurement-based approaches
Early-phase worst-case execution time estimation
aiT WCET Analyzer Advantages
Inspect the worst-case timing behavior of (critical parts of) your code
Tight WCET bounds reflect the actual worst-case performance of your system
Determined automatically
Valid for all inputs and all execution scenarios
No modification of your code or tool chain required
aiT Visualization Features
Precise insight into the program and processor behavior
Valuable feedback for optimizing your program
Conclusion
aiT enables development of complex hard real-time systems on state-of-the-art hardware
Increases safety
Saves development time and costs
Usability proven in industrial practice
Contact
Visit us!
Hall 10, booth 403
Coffee break
We start again at 11h