Parallel LINQ (PLINQ) is a parallel implementation of LINQ to Objects. PLINQ implements the full set of LINQ standard query operators as extension methods in the System.Linq namespace, and it adds further operators for parallel operations.
PARALLEL LINQ

Multi-Core and .NET 4: In the words of developers
> “Getting an hour-long computation done in 10 minutes changes how we work.”
> “.NET 4 has made it practical and cost-effective to implement parallelism where it may have been hard to justify in the past.”
> “I do believe the .NET Framework 4 will change the way developers think about parallel programming.”
> Why the Parallel Framework (PFX)?
> In recent times, CPU clock speeds have stagnated and manufacturers have shifted their focus to increasing core counts. This is problematic for us as programmers because our standard single-threaded code will not automatically run faster as a result of those extra cores.
> Leveraging multiple cores is easy for most server applications, where each thread can independently handle a separate client request, but is harder on the desktop — because it typically requires that you take your computationally intensive code and do the following:
> Partition it into small chunks.
> Execute those chunks in parallel via multithreading.
> Collate the results as they become available, in a thread-safe and performant manner.
PFX Concepts
> There are two strategies for partitioning work among threads: data parallelism and task parallelism.
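The two strategies can be contrasted with a minimal C# sketch (the Square helper, the inputs, and the two tasks below are illustrative, not from the slides): data parallelism applies the same operation across the elements of a collection, while task parallelism runs distinct operations concurrently.

```csharp
using System;
using System.Threading.Tasks;

class ParallelismStrategies
{
    static int Square(int x) => x * x;   // hypothetical per-element work

    static void Main()
    {
        // Data parallelism: one operation, many data items; the runtime
        // partitions the index range across threads.
        int[] data = { 1, 2, 3, 4, 5 };
        int[] squares = new int[data.Length];
        Parallel.For(0, data.Length, i => squares[i] = Square(data[i]));

        // Task parallelism: two unrelated computations run as separate tasks.
        Task<int> sumTask = Task.Run(() => 1 + 2 + 3);
        Task<int> maxTask = Task.Run(() => Math.Max(10, 20));
        Task.WaitAll(sumTask, maxTask);

        Console.WriteLine(string.Join(",", squares));            // 1,4,9,16,25
        Console.WriteLine(sumTask.Result + " " + maxTask.Result); // 6 20
    }
}
```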
PLINQ
> Parallel LINQ (or PLINQ, as it is called) offers all the benefits of LINQ. In addition, it enables parallel execution of LINQ queries to take advantage of the multiple processors of the host machine.
> AsParallel
> The AsParallel method is the doorway to PLINQ. It converts a data sequence into a ParallelQuery. The LINQ engine detects the use of a ParallelQuery as the source in a query and switches to PLINQ execution automatically. You are likely to use the AsParallel method every time you use PLINQ.
ArrayList list = new ArrayList() {
    "Adams", "Arthur", "Buchanan", "Bush", "Carter", "Cleveland",
    "Clinton", "Coolidge", "Eisenhower", "Fillmore", "Ford", "Garfield",
    "Grant", "Harding", "Harrison", "Hayes", "Hoover", "Jackson" };
IEnumerable<string> results = list.AsParallel().Cast<string>()
    .Where(p => p.Contains('o'))
    .Select(p => p);
foreach (string president in results) {
    Console.WriteLine("Match: {0}", president);
}
IEnumerable<string> ss = list.AsParallel().AsOrdered().Cast<string>()
    .Where(p => p.StartsWith("A"))
    .Select(p => p);
foreach (string president in ss)
{
    DropDownList1.Items.Add(president);
}
Creating a ParallelQuery<T> by Filtering a ParallelQuery
ArrayList list = new ArrayList();
list.Add("Adams");
list.Add(23);
list.Add("Arthur");
list.Add(DateTime.Now);
list.Add("Buchanan");
list.Add(new string[] { "apple", "orange" });
IEnumerable<string> results = list
    .AsParallel()
    .OfType<string>()
    .Select(p => p);
foreach (string president in results) {
    Console.WriteLine("Match: {0}", president);
}
Visual Studio 2010: Tools, programming models and runtimes

[Architecture diagram] The slide shows the Visual Studio 2010 parallel stack:
> Tools (Visual Studio IDE): Parallel Debugger Tool Windows, Concurrency Visualizer
> Programming models, managed (.NET Framework 4): Task Parallel Library, Parallel LINQ, data structures
> Programming models, native (Visual C++ 10): Parallel Pattern Library, Agents Library, data structures
> Concurrency runtime: ThreadPool, Task Scheduler, Resource Manager (managed); Task Scheduler, Resource Manager (native)
> Operating system: threads and UMS (User Mode Scheduling) threads
From LINQ to Objects to PLINQ: An easy change
> LINQ to Objects query:
int[] output = arr.Select(x => Foo(x)).ToArray();

> PLINQ query:
int[] output = arr.AsParallel().Select(x => Foo(x)).ToArray();
Array Mapping

int[] input = ...;
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram] Threads 1..N each run Select over a slice of the input array (e.g. 6, 3, 8, 2, 7, ...) and write true/false results directly into the corresponding slots of the output array.

Array to array mapping is simple and efficient.
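The query from this slide can be made runnable as below. IsPrime is a hypothetical helper, and AsOrdered() is added so the output array is guaranteed to line up positionally with the input, since PLINQ does not promise ordering by default.

```csharp
using System;
using System.Linq;

class ArrayMapping
{
    // Simple trial-division primality test (illustrative helper).
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = { 6, 3, 8, 2, 7 };
        bool[] output = input.AsParallel().AsOrdered()
                             .Select(x => IsPrime(x))
                             .ToArray();
        Console.WriteLine(string.Join(",", output)); // False,True,False,True,True
    }
}
```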
Sequence Mapping

IEnumerable<int> input = Enumerable.Range(1, 100);
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram] Threads 1..N pull elements from the shared input enumerator under a lock, run Select, and write into per-thread result buffers (Results 1..N).

Each thread processes a partition of inputs and stores results into a buffer. Buffers are combined into one array.
Asynchronous Mapping

var q = input.AsParallel()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram] Threads 1..N pull from the input enumerator under a lock and fill per-thread result buffers (Results 1..N); the main thread's foreach polls the output enumerator (MoveNext) and drains results from those buffers.

In this query, the foreach loop starts consuming results as they are computed.
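A runnable sketch of this streaming pattern (the squaring delegate stands in for expensive work and is illustrative): the foreach begins yielding before the whole query has finished, so without AsOrdered() the arrival order is unspecified, though every result arrives exactly once.

```csharp
using System;
using System.Linq;

class StreamingConsumption
{
    static void Main()
    {
        var input = Enumerable.Range(1, 10);
        var q = input.AsParallel()
                     .Select(x => x * x);   // stand-in for an expensive delegate

        int count = 0, sum = 0;
        foreach (var x in q)   // consumes results as worker threads produce them
        {
            count++;
            sum += x;
        }
        Console.WriteLine($"{count} results, sum {sum}"); // 10 results, sum 385
    }
}
```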
Async Ordered Mapping or Filter

var q = input.AsParallel().AsOrdered()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram] As in asynchronous mapping, threads 1..N pull from the input enumerator under a lock and fill per-thread result buffers, but the buffers feed an ordering buffer before the output enumerator that the main thread's foreach polls.

When ordering is turned on, PLINQ orders elements in a reordering buffer before yielding them to the foreach loop.
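The effect of the reordering buffer can be observed directly; a short sketch (the multiply-by-ten delegate is illustrative) where AsOrdered() guarantees that elements are yielded in source order even though they are computed on multiple threads:

```csharp
using System;
using System.Linq;

class OrderedMapping
{
    static void Main()
    {
        var input = Enumerable.Range(1, 8);

        // Without AsOrdered, elements may be yielded in any order; with it,
        // the reordering buffer restores the original sequence order.
        var ordered = input.AsParallel().AsOrdered()
                           .Select(x => x * 10);

        Console.WriteLine(string.Join(",", ordered)); // 10,20,30,40,50,60,70,80
    }
}
```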
Aggregation

int result = input.AsParallel()
    .Aggregate(
        0,                      // seed for each partition
        (a, e) => a + Foo(e),   // fold within a partition
        (a1, a2) => a1 + a2,    // combine partition results
        a => a);                // project the final accumulator

[Diagram] Threads 1..N each run Aggregate over their partition of the input enumerator into local results (res1..resN), which are then combined into the final result.

Each thread computes a local result. The local results are combined into a final result.
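A runnable version of the pattern, using the PLINQ Aggregate overload that takes a per-partition fold and a combiner (Foo here is a hypothetical squaring function):

```csharp
using System;
using System.Linq;

class ParallelAggregation
{
    static int Foo(int x) => x * x;   // hypothetical per-element work

    static void Main()
    {
        var input = Enumerable.Range(1, 100);
        int result = input.AsParallel()
                          .Aggregate(
                              0,                      // seed for each partition
                              (a, e) => a + Foo(e),   // fold within a partition
                              (a1, a2) => a1 + a2,    // combine partition results
                              a => a);                // final projection
        Console.WriteLine(result); // 338350, the sum of squares 1..100
    }
}
```

Note that the combiner must be associative and the seed must be the identity for it, because PLINQ applies the fold independently per partition before combining.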
Search

int result = input.AsParallel().AsOrdered()
    .Where(x => IsPrime(x))
    .First();

[Diagram] Threads 1..N pull from the input enumerator under a lock and evaluate First over their partitions; each thread polls a shared resultFound flag, and the thread that finds a match sets the flag and the result so that the other threads can stop early.
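A runnable sketch of the search pattern (IsPrime is a hypothetical helper): AsOrdered() makes First() return the first prime in source order rather than merely the first one any thread happens to find.

```csharp
using System;
using System.Linq;

class ParallelSearch
{
    // Simple trial-division primality test (illustrative helper).
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = { 8, 9, 10, 7, 4, 2, 6 };
        int result = input.AsParallel().AsOrdered()
                          .Where(x => IsPrime(x))
                          .First();
        Console.WriteLine(result); // 7 (first prime in source order)
    }
}
```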
More complex query

int[] output = input.AsParallel()
    .Where(x => IsPrime(x))
    .GroupBy(x => x % 5)
    .Select(g => ProcessGroup(g))
    .ToArray();

[Diagram] In a first phase, threads 1..N pull from the input enumerator under a lock, apply Where, and build per-thread groupings (Groups 1..N); in a second phase, threads apply Select to the groups and fill per-thread result buffers (Results 1..N) that are merged into the output array.
PLINQ PERFORMANCE TIPS
Performance Tip #1: Avoid memory allocations
> When the delegate allocates memory, GC and memory allocations can become the bottleneck
> Then, your algorithm is only as scalable as the GC
> Mitigations:
> Reduce memory allocations
> Turn on server GC
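For .NET Framework applications, server GC is turned on in the application configuration file; a minimal app.config sketch:

```xml
<configuration>
  <runtime>
    <!-- Use the server garbage collector, which has per-core heaps -->
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```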
Performance Tip #2: Avoid true and false sharing
> Modern CPUs exploit locality
> Recently accessed memory locations are stored in a fast cache
> Multiple cores
> Each core has its own cache
> When a memory location is modified, it is invalidated in all caches
> In fact, the entire cache line is invalidated
> A cache line is usually 64 or 128 bytes
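One common mitigation can be sketched as follows (the counts and loop body are illustrative): rather than having every thread repeatedly update its own slot of a shared array, where adjacent slots sit on the same cache line and each write invalidates the other cores' copies, each thread accumulates into a local variable and writes to the shared array only once.

```csharp
using System;
using System.Threading.Tasks;

class FalseSharingMitigation
{
    static void Main()
    {
        const int Threads = 4, Iterations = 1_000_000;
        long[] totals = new long[Threads];   // adjacent slots share a cache line

        Parallel.For(0, Threads, t =>
        {
            long local = 0;                  // thread-private accumulator
            for (int i = 0; i < Iterations; i++)
                local++;                     // no cross-core invalidation here
            totals[t] = local;               // one write to the shared array
        });

        Console.WriteLine(totals[0] + totals[1] + totals[2] + totals[3]); // 4000000
    }
}
```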
Performance Tip #2: Avoid True and False Sharing (continued)

[Diagram] Four cores, each running a thread, each with a cache holding a copy of the same cache line (values 5, 7, 3, 2, backed by main memory). When Core 1 writes to its copy (changing 5 to 6), the whole line is invalidated in the caches of Cores 2-4.

If cores continue stomping on each other's caches, most reads and writes will go to the main memory!
Performance Tip #3: Use expensive delegates
> A computationally expensive delegate is the best case for PLINQ
> A cheap delegate over a long sequence may also scale, but:
> Overheads reduce the benefit of scaling
> MoveNext and Current virtual method calls on the enumerator
> Virtual method calls to execute delegates
> Reading a long input sequence may be limited by memory throughput
Performance Tip #4: Write simple PLINQ queries
> PLINQ can execute all LINQ queries
> Simple queries are easier to reason about
> Break up complex queries so that only the expensive data-parallel part is in PLINQ:

src.Select(x => Foo(x))
   .TakeWhile(x => Filter(x))
   .AsParallel()
   .Select(x => Bar(x))
   .ToArray();
Performance Tip #5: Choose appropriate partitioning
> Partitioning algorithms vary in:
> Overhead
> Load-balancing
> The required input representation
> By default:
> Array and IList<> are partitioned statically
> Other IEnumerable<> types are partitioned on demand in chunks
> Custom partitioning is supported via Partitioner
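A short sketch of custom partitioning with Partitioner.Create (the input size is arbitrary): passing loadBalance: true asks for on-demand, load-balanced partitioning of an array instead of the default static split, which can help when per-element costs are uneven.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class CustomPartitioning
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();

        // Load-balanced partitioner: elements are handed out on demand
        // rather than being split statically among the threads up front.
        var partitioner = Partitioner.Create(data, loadBalance: true);

        int sum = partitioner.AsParallel().Sum();
        Console.WriteLine(sum); // 500500
    }
}
```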