Collection Classes Deep Dive
By Gary Short, Head of Gibraltar Labs
Gibraltar Software
Introduction
• Gary Short
• Head of Gibraltar Labs
• C# MVP
• @garyshort
• facebook.com/TheOtherGaryShort
Why do we Care About This Stuff?
Let’s Start With Something We Know
List<T> Demo
What we Learned
• Don’t add elements one at a time in a loop
• Add() causes capacity growth
• Capacity growth uses Array.Copy()
• Array.Copy() is an O(n) operation
• O(n) is sloooooowwwwwww
• Use AddRange() instead
• Or set a “large enough” initial capacity.
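The points above can be sketched in a few lines. This is an illustration rather than a benchmark: the names ListGrowthSketch, CountGrowthsWithAdd, CountGrowthsPresized and FillWithAddRange are made up for this example, and the exact growth policy of List<T> is an implementation detail that may differ between runtime versions.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthSketch
{
    // Adds n items one at a time to a default-constructed list and counts
    // how often Capacity changes. Each change means a new backing array
    // was allocated and the existing items were copied across.
    public static int CountGrowthsWithAdd(int n)
    {
        return CountGrowths(new List<int>(), n);
    }

    // Same loop, but the list is constructed with enough capacity up front.
    public static int CountGrowthsPresized(int n)
    {
        return CountGrowths(new List<int>(n), n);
    }

    // AddRange with an array: the list can read the item count in advance,
    // so it does not need to grow repeatedly.
    public static List<int> FillWithAddRange(int n)
    {
        var source = new int[n];
        for (int i = 0; i < n; i++) source[i] = i;

        var list = new List<int>();
        list.AddRange(source);
        return list;
    }

    private static int CountGrowths(List<int> list, int n)
    {
        int growths = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                growths++;
                lastCapacity = list.Capacity;
            }
        }
        return growths;
    }
}
```

Calling CountGrowthsPresized(1000) should report no growths at all, while CountGrowthsWithAdd(1000) reports several, each one an O(n) copy.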
How Slow is Slow?
[Chart: Performance: Add Versus AddRange. X axis: Number of Elements Added. Y axis: Number of Ticks. Series: Add, AddRange.]
What About Removing Stuff?
Demo
What we Learned
• Prefer RemoveAt() over Remove() when you already know the index, as there’s no IndexOf() step
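A small sketch of the difference, using hypothetical helper names (RemoveSketch, RemoveByValue, RemoveByIndex). Both calls still shift the trailing elements down, so both remain O(n) overall; RemoveAt() only saves the search.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveSketch
{
    // Remove(value) first searches the list for the value, then removes
    // the first match it finds.
    public static List<string> RemoveByValue(List<string> items, string value)
    {
        var copy = new List<string>(items);
        copy.Remove(value);
        return copy;
    }

    // RemoveAt(index) goes straight to the position; no search is needed.
    public static List<string> RemoveByIndex(List<string> items, int index)
    {
        var copy = new List<string>(items);
        copy.RemoveAt(index);
        return copy;
    }
}
```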
List<T> - Sorting
• Uses QuickSort under the hood (as of .NET 4.0; .NET 4.5 and later use an introspective sort)
• Typically the fastest general purpose sort algorithm
• O(n log n) in best case
• O(n log n) in average case
• Though worst case is O(n^2)
[Chart: Performance: O(n log n) Vs O(n^2). X axis: Elements to be Sorted. Y axis: Effort. Series: O(n log n), O(n^2).]
QuickSort Demo
So What is the Worst Case?
• If the list is already sorted (and the pivot is always taken from one end)
– First partition has lower = 0, upper = n
– Then calls Partition(n-1)
– This happens a further n-2 times
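To make the worst case concrete, here is a minimal textbook quicksort that always pivots on the last element and counts comparisons. It is a sketch for illustration only, not the framework's implementation; NaiveQuickSort and its methods are hypothetical names. On already-sorted input every partition peels off a single element, so n items cost n(n-1)/2 comparisons.

```csharp
using System;

public static class NaiveQuickSort
{
    // Sorts the array in place and returns the number of comparisons made.
    public static long Sort(int[] a)
    {
        long comparisons = 0;
        Sort(a, 0, a.Length - 1, ref comparisons);
        return comparisons;
    }

    private static void Sort(int[] a, int lo, int hi, ref long comparisons)
    {
        if (lo >= hi) return;

        // Always pivot on the last element: the choice that hurts on sorted input.
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            comparisons++;
            if (a[j] < pivot)
            {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;

        // On sorted input the left part has hi - lo items and the right part is empty.
        Sort(a, lo, i - 1, ref comparisons);
        Sort(a, i + 1, hi, ref comparisons);
    }
}
```

The recursion depth also grows linearly in this case, so keep the input small when experimenting.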
Can we Mitigate the Worst Case?
• Median of Three
– Take an element from the “top” of the array
– Take an element from the “middle” of the array
– Take an element from the “bottom” of the array
– Find the median value of the three
– Pivot on the median
• Let’s see if Microsoft uses this algorithm.
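The median-of-three steps can be sketched as a pivot selector. This is an assumed, simplified form for illustration (MedianOfThree and PivotIndex are hypothetical names) and says nothing about what the framework actually does.

```csharp
using System;

public static class MedianOfThree
{
    // Returns the index (lo, mid or hi) that holds the median of the three
    // sampled values, to be used as the pivot position.
    public static int PivotIndex(int[] a, int lo, int hi)
    {
        int mid = lo + (hi - lo) / 2;
        int x = a[lo], y = a[mid], z = a[hi];

        // y lies between x and z: the middle sample is the median.
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;

        // x lies between y and z: the first sample is the median.
        if ((y <= x && x <= z) || (z <= x && x <= y)) return lo;

        // Otherwise the last sample is the median.
        return hi;
    }
}
```

On already-sorted input this picks the middle element, which splits the range roughly in half instead of peeling off one item at a time.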
Disadvantage: O(n) Insert and Remove, and O(n) Add whenever capacity grows
What if we Need Fast Add, Insert & Remove?
LinkedList<T>
• Doubly linked
– Each item points to the previous and next items
– This means it’s super fast
• Add, insert and remove are all O(1) operations, given a reference to the node
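A short sketch of the node-based operations (LinkedListSketch and Run are hypothetical names). Each call below works directly on a node reference, so no elements are shifted or copied.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListSketch
{
    public static string Run()
    {
        var list = new LinkedList<string>();

        // AddLast returns the node, which we keep for later O(1) edits.
        LinkedListNode<string> a = list.AddLast("a");
        LinkedListNode<string> c = list.AddLast("c");

        list.AddAfter(a, "b");   // insert next to a known node
        list.AddFirst("start");  // add at the head
        list.Remove(c);          // remove by node reference, no search

        return string.Join(",", list);
    }
}
```

Note that Remove(value), as opposed to Remove(node), still has to walk the list to find the value.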
Demo
Disadvantage: O(n) lookups
What if we Need Fast Lookups?
Dictionary<TKey, TValue>
• Performance depends on key.GetHashCode()
– Hash codes must be evenly distributed across int
• If two keys return hashes that give the same index
– Dictionary must chain the colliding items together in the same bucket
– Must search that chain later to return the item
– This hurts performance
• If you use your own type as the key, then this is on you.
Dictionary<TKey, TValue>
• Objects used as keys must also implement IEquatable<T>.Equals()
• Or override Equals()
• Why?
– Different keys may return the same hash code
– Equals() is used by the dictionary when comparing keys
– So you must ensure the following
• If A.Equals(B) then A.GetHashCode() and B.GetHashCode() return the same value
• Override Equals() but not GetHashCode() == compiler warning (CS0659).
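A minimal sketch of a custom key type that keeps Equals() and GetHashCode() consistent. PointKey is a hypothetical type invented for this example, and the hash combination shown is one common pattern rather than a recommendation from the talk.

```csharp
using System;
using System.Collections.Generic;

public sealed class PointKey : IEquatable<PointKey>
{
    public int X { get; }
    public int Y { get; }

    public PointKey(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Typed equality, used by the dictionary's default comparer.
    public bool Equals(PointKey other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    // Keep the object overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as PointKey);
    }

    // Built from exactly the fields that Equals() compares, so equal keys
    // always produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X * 397) ^ Y;
        }
    }
}
```

With this in place, two separately constructed PointKey(1, 2) instances address the same dictionary entry.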
Disadvantage: one value per key
What if I Need Multiple Values per Key?
Lookup<TKey, TElement> Demo
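The demo itself is not part of the slides, so the following is only an assumed illustration of the idea. A Lookup<TKey, TElement> is typically created with the LINQ ToLookup() extension method; LookupSketch and ByFirstLetter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupSketch
{
    // Groups words by their first letter; each key maps to a sequence of values.
    public static ILookup<char, string> ByFirstLetter(IEnumerable<string> words)
    {
        return words.ToLookup(w => w[0]);
    }
}
```

Indexing a lookup with a key that is not present returns an empty sequence rather than throwing, which differs from Dictionary<TKey, TValue>.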
Concurrent Collections
Types of Concurrent Collections
• ConcurrentBag<T>
• ConcurrentDictionary<TKey, TValue>
• ConcurrentQueue<T>
• ConcurrentStack<T>
• OrderablePartitioner<TSource>
• BlockingCollection<T>.
Key Characteristics
• New in .NET 4.0
• Guard against multi-thread collection conflicts
• Implement IProducerConsumerCollection<T> (bag, queue and stack)
– TryAdd()
• Tries to add an item to the collection, returns a success bool
– TryTake()
• Tries to remove and return an item, returns a success bool
• Returns the item in an out param.
• Always check the return value before moving on.
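A minimal single-threaded sketch of the TryAdd()/TryTake() pattern, using ConcurrentQueue<T> through the interface. TrySketch and Run are hypothetical names; real code would call these methods from several threads.

```csharp
using System;
using System.Collections.Concurrent;

public static class TrySketch
{
    public static string Run()
    {
        IProducerConsumerCollection<int> queue = new ConcurrentQueue<int>();

        // TryAdd reports success through its return value.
        bool added = queue.TryAdd(42);

        // TryTake hands the item back through the out parameter.
        int item;
        bool taken = queue.TryTake(out item);

        // The queue is now empty, so a second take fails; check the bool
        // before trusting the out value.
        int ignored;
        bool takenAgain = queue.TryTake(out ignored);

        return added + "," + taken + "," + item + "," + takenAgain;
    }
}
```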
Do I Have To Check Every Time?!
• BlockingCollection<T>
– Blocks and waits until task completes
– Uses Add() and Take() methods
• Block the thread and wait until task completes
• Add() has an overload to pass a CancellationToken
• Add() may also block if bounding capacity was used.
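A small producer/consumer sketch of the blocking behaviour. BlockingSketch and SumProduced are hypothetical names, and the bounded capacity of 2 is an arbitrary choice made so that Add() has to wait for the consumer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BlockingSketch
{
    // Produces 1..count on one thread, consumes on the caller's thread,
    // and returns the sum of everything consumed.
    public static int SumProduced(int count)
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 2))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= count; i++)
                {
                    queue.Add(i); // blocks while the collection is full
                }
            });

            int sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += queue.Take(); // blocks while the collection is empty
            }

            producer.Wait();
            return sum;
        }
    }
}
```

No return values need checking here: each call simply waits until it can complete.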
But I Don’t Want it to Wait Forever!
• So we don’t want to wait forever
• Nor do we want to cancel the Add() from outside
• TryAdd() and TryTake() are offered too
• Where you can specify a timeout.
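The timeout overloads can be sketched as follows. TimeoutSketch and its two methods are hypothetical names, and the 50 ms timeout is an arbitrary value picked for the example.

```csharp
using System;
using System.Collections.Concurrent;

public static class TimeoutSketch
{
    // Nothing was ever added, so TryTake gives up after the timeout.
    public static bool TryTakeFromEmpty()
    {
        using (var queue = new BlockingCollection<int>())
        {
            int item;
            return queue.TryTake(out item, TimeSpan.FromMilliseconds(50));
        }
    }

    // Capacity is 1 and already used, so TryAdd gives up after the timeout.
    public static bool TryAddToFull()
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 1))
        {
            queue.Add(1);
            return queue.TryAdd(2, TimeSpan.FromMilliseconds(50));
        }
    }
}
```

Both methods return false here; as with the other Try methods, the bool result has to be checked before moving on.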
Summary
• List is a good general purpose collection
– Construct to size if possible
– Construct to upper threshold then trim
– Prefer AddRange() over Add()
– Be aware of “Quicksort Killers”
• Use LinkedList if you need fast insert/remove
• Use Dictionary if you need fast lookup
• Use Lookup if you need multiple values per key
• Use concurrent collections for thread safety.