Workshop Ib: Experimental research in practice. Roland Geraerts 2 May 2014. Bad repeatability. Why can it be hard to reproduce papers’ claims?. Bad repeatability (1). Problem The results cannot be reproduced easily Cause Details of the method are lacking - PowerPoint PPT Presentation
1
Workshop:
Experimental research in practice
Roland Geraerts
8 September 2015
2
Bad repeatability
Why can it be hard to reproduce papers’ claims?
3
Bad repeatability (1)
Problem: The results cannot be reproduced easily
Cause:
Details of the method are lacking
• Parts of the method are not described
• Degenerate cases are missing
• References to other papers (without mentioning details)
Parameters don’t get assigned values (usually weights)
Source code is not available
The experimental setup is not clear
• Tested hardware (e.g. which PC/GPU, the number of cores used)
• Statistical setup (e.g. number of runs, seed)
• Details of the scenario(s) are missing
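One way to keep the experimental setup from getting lost is to store it next to every result. The following is a minimal sketch, not a prescribed format: the struct `ExperimentSetup`, its fields, and the function `describe` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record of what a reader needs to rerun an experiment.
struct ExperimentSetup {
    std::string scenario;   // label of the tested scenario (assumed naming)
    std::uint32_t seed;     // seed passed to the random number generator
    int runs;               // number of repeated runs
    unsigned coresUsed;     // number of cores the experiment was allowed to use
};

// Serializes the setup as one line that can be written above the results.
std::string describe(const ExperimentSetup& setup) {
    std::ostringstream out;
    out << "scenario=" << setup.scenario
        << " seed=" << setup.seed
        << " runs=" << setup.runs
        << " cores_used=" << setup.coresUsed;
    return out.str();
}
```

Writing this line into every log or result file means the statistical setup travels with the numbers it produced.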
4
Bad repeatability (2)
Problem: The results cannot be reproduced easily
Cause:
Low significance caused by a low number of runs
Hard problems can be hard to implement
Solution:
Let someone else implement the method/paper
Provide the source code
5
Data collection errors
What kind of errors occur during the collection of (raw) data?
6
Data collection errors (1)
Problem: Errors occur during collection of raw data
• E.g., copy/paste values from GUIs into Excel sheets or text files
Cause:
The data collection process was not automated
• There is a GUI but not a command line (console) version
• Variables aren’t assigned the right values (how to verify?)
The precision of the stored numbers is too low
Statistics are computed wrongly (e.g. how to compute the SD)
Only the execution of a part of the algorithm is recorded
The visualization part is not strictly separated from the execution part of the algorithm
• E.g. while the method performs its computations, the results are being written to a log file and sent to the GPU for visualization purposes
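Two of the causes above, low stored precision and a wrongly computed SD, can be illustrated in a few lines. This is a sketch under assumptions: it takes the SD to mean the sample standard deviation (dividing by n - 1), uses Welford's one-pass update, and the names `sampleStdDev` and `toText` are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Sample standard deviation (divides by n - 1), computed with Welford's
// one-pass update to limit rounding error.
double sampleStdDev(const std::vector<double>& values) {
    assert(values.size() >= 2);
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t n = 0;
    for (double x : values) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }
    return std::sqrt(m2 / static_cast<double>(n - 1));
}

// Formats a double with enough digits to read back the identical value.
std::string toText(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}
```

Dividing by n instead of n - 1 gives the population SD, which is smaller; whichever is used should be stated with the results.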
7
Data collection errors (2)
Solution:
Automate the process using a console called from a batch file
• For small experiments, call the arguments in the batch file
• Otherwise, build a load/save mechanism
Create an API that supports setting up experiments
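A console version that a batch file can call needs little more than argument parsing. The sketch below assumes three made-up options (`--scenario`, `--seed`, `--runs`); the struct `Settings` and the function `parseArguments` are hypothetical and not part of any existing tool.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical experiment settings; the option names are assumptions.
struct Settings {
    std::string scenario = "default";
    std::uint32_t seed = 0;
    int runs = 1;
};

// Parses "--scenario NAME --seed N --runs N"; throws on anything unexpected
// so that a typo in the batch file cannot silently change the experiment.
Settings parseArguments(int argc, const char* const argv[]) {
    Settings settings;
    for (int i = 1; i < argc; ++i) {
        const std::string option = argv[i];
        if (i + 1 >= argc) {
            throw std::invalid_argument("missing value for " + option);
        }
        const std::string value = argv[++i];
        if (option == "--scenario") {
            settings.scenario = value;
        } else if (option == "--seed") {
            settings.seed = static_cast<std::uint32_t>(std::stoul(value));
        } else if (option == "--runs") {
            settings.runs = std::stoi(value);
        } else {
            throw std::invalid_argument("unknown option " + option);
        }
    }
    return settings;
}
```

A `main(int argc, char* argv[])` can pass its arguments straight to `parseArguments`, and each line of the batch file then describes one experiment completely.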
8
Data collection errors (3): Time measurement errors
Problem: Time is measured wrongly
Cause:
The timer lacks accuracy
• C++: Don’t use time.h
• Don’t start/stop the timer inside the method, especially not if the parts take less than 1 ms to compute
Intervening network/CPU/GPU processes
9
Data collection errors (4): Time measurement errors
Solution:
Use accurate timers
• C++: Use QueryPerformanceCounter(…) instead (be careful of 0.3 s jumps), or in C++11: std::chrono::high_resolution_clock
• Run fast methods many times and take the average; watch out for non-deterministic behavior
Take the average of some runs, also in case of deterministic algorithms
Only measure the running time of the algorithm
• Switch off the network
• Kill the virus killer
• Stop the e-mail program
• Disable update functionality
• Use only 1 core
• Don’t work on your thesis while running the experiments on the same machine; and yes, this happens
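The advice to time many repetitions from the outside can be sketched as follows. Note one deliberate swap: the sketch uses `std::chrono::steady_clock` rather than `high_resolution_clock`, because the standard guarantees that `steady_clock` is monotonic. The helper name `averageSeconds` is an assumption for this example.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock time in seconds of one call to `method`. The timer is
// started and stopped once around all repetitions, not inside the method.
template <typename Method>
double averageSeconds(Method method, int repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) {
        method();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / repetitions;
}
```

The measured method should produce a result that is actually used afterwards; otherwise an optimizing compiler may remove the work and the timing becomes meaningless.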
10
Bad figures
When do figures convey information badly?
11
Bad figures
Problem: The figures convey information badly
Cause:
The figures are hard to read (e.g. too small or bitmapped)
Axes haven’t been labeled
The y-axis doesn’t start at 0, which amplifies (random) differences
The wrong number precision/format is used
• Don’t display 100,000.001
• Don’t display 0.0005 s, or 0.1 0.15 0.2 …
The meaning is not conveyed clearly
Some colors/patterns don’t reproduce well on black & white printers
Solution:
Use e.g. gnuplot (set all labels and export to vector: EPS or PDF)
Use vector images as much as possible (e.g. use Ipe)
Explain all phenomena
12
Conclusions are too general
When are the conclusions drawn too general?
13
Conclusions are too general (1)
Problem: The conclusions drawn are often too general
Cause:
Only one instance is tested, e.g.
• environment / moving entity
Only one problem setting is tested
A favorable setup is used, e.g.
• a few axis-aligned rectangular obstacles
• polygonal convex obstacles
• 1 fixed query
Deterministic experiments do suffer from the ‘variance problem’
14
Conclusions are too general (2)
Solution:
Try to sample the problem space as well as possible
Don’t try to bias any method
• Use a favorable setup (to show certain properties) and a ‘normal’ one
• Also choose worst-case scenarios
• Tune all methods equally
Compare against the state of the art instead of old methods only
Dare to show the weakness(es) of your method
15
Statistical weaknesses
When are the statistics less reliable?
16
Statistical weaknesses
Problem: Statistics are done badly
Cause:
Results have been collected on different sets of hardware
Too few runs
Not all running times are mentioned (e.g. initialization)
Only averages are mentioned
Solution:
Use the same machine (and don’t change the setup)
Use e.g. gnuplot and set all (relevant) labels
Use other measures, e.g.
• SD
• Boxplot
• Student’s t-test: statistical hypothesis test
• ANOVA: Analysis of variance
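As an illustration of going beyond averages, the sketch below computes the two-sample Student's t statistic with pooled variance for two sets of running times. It assumes independent samples with equal variances; the function names are hypothetical, and turning the statistic into a p-value still requires the t distribution with n_a + n_b - 2 degrees of freedom, which is not implemented here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double meanOf(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// Unbiased sample variance (divides by n - 1).
double varianceOf(const std::vector<double>& v) {
    const double m = meanOf(v);
    double sumSq = 0.0;
    for (double x : v) sumSq += (x - m) * (x - m);
    return sumSq / static_cast<double>(v.size() - 1);
}

// Two-sample Student's t statistic with pooled variance.
double studentT(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() >= 2 && b.size() >= 2);
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    const double pooled =
        ((na - 1.0) * varianceOf(a) + (nb - 1.0) * varianceOf(b)) / (na + nb - 2.0);
    return (meanOf(a) - meanOf(b)) / std::sqrt(pooled * (1.0 / na + 1.0 / nb));
}
```

With only a handful of runs per method the statistic is unstable, which ties back to the "too few runs" cause above.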
17
Statistically significant?
18
So your method is statistically significant
Even when a method’s result has been found to be statistically significant, this does not have to mean anything in practice…
…due to the programmer’s bias.
Suppose different methods run in 10.2, 10.0, 10.3, and 9.6 seconds (with appropriate SDs etc.). While the last one might seem better, in reality it does not have to be…
…since the third one might be the only one that wasn’t optimized.
19
Ways to bias your results (1)
Run the code with different choices of
• Hardware (CPU, GPU, memory, cache, #cores, #threads)
• Language (C++/C#, 32/64-bit, different optimizations)
• Software libraries (own code/Boost/STL)
Implementation is done by different people
20
Ways to bias your results (2): Some code optimizations
Enable optimizations in your compiler
Run in release mode!
Visual Studio:
• Full optimization
• Inline function expansion
• Enable intrinsic functions
• Etc.
Compile the code with a 64-bit compiler
2-15% improvement of running times due to
• usage of a larger instruction set
• not having to simulate 32-bit code
However, watch code that deals with memory and loops
• Use memsize-types in address arithmetic
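The memsize-type advice can be illustrated as below, assuming that "memsize-types" refers to pointer-sized integer types such as `std::size_t` and `std::ptrdiff_t`. The functions `cellIndex` and `sumAll` are hypothetical examples, not code from the workshop.

```cpp
#include <cstddef>
#include <vector>

// Index of cell (row, column) in a row-major grid. All arithmetic is done in
// std::size_t, so row * width cannot overflow a 32-bit int on large grids.
std::size_t cellIndex(std::size_t row, std::size_t column, std::size_t width) {
    return row * width + column;
}

// Sums a buffer with a std::size_t loop counter that matches data.size().
double sumAll(const std::vector<double>& data) {
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
    }
    return total;
}
```

With `int` operands, a 50,000 x 50,000 grid would already overflow in `row * width`, a bug that typically shows up only after switching to a 64-bit build with larger inputs.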
21
Ways to bias your results (3): Some code optimizations
Unroll loops
• Improves usage of parallel execution (e.g. SSE2)
Create small code
• E.g. by improving the implementation; properly align data
• Improves cache behavior
Avoid mixed arithmetic
Use STL
• Is heavily optimized
Avoid disk usage and writing to a console etc.
Follow the course: Optimization and vectorization
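To make "unroll loops" concrete, here is a minimal sketch of a sum unrolled by a factor of four with independent accumulators. The function name `sumUnrolled` and the unroll factor are assumptions for illustration; integers are used so that reordering the additions cannot change the result.

```cpp
#include <cstddef>
#include <vector>

// Sum with the loop unrolled by four. The four accumulators do not depend on
// each other, which leaves room for parallel or SIMD execution.
long long sumUnrolled(const std::vector<int>& values) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const std::size_t n = values.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += values[i];
        s1 += values[i + 1];
        s2 += values[i + 2];
        s3 += values[i + 3];
    }
    long long total = s0 + s1 + s2 + s3;
    for (; i < n; ++i) {  // remaining 0 to 3 elements
        total += values[i];
    }
    return total;
}
```

Optimizing compilers often unroll and vectorize such loops on their own once optimizations are enabled, so a hand-unrolled version should be measured before it is kept. If only one of the compared methods receives this kind of attention, the comparison is biased in exactly the way these slides describe.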
22
Ethics versus mistakes
Let’s have a discussion here!