Sorting Things Out (with tasks)

Ruud van der Pas
Distinguished Engineer
SPARC Microelectronics
Santa Clara, CA, USA

SC’16 Talk at OpenMP Booth
Tuesday, November 15, 2016



The Dark Ages Of OpenMP
Big Brother Had To Know Everything

And in advance, (right) before execution. For example: the loop length, the number of parallel sections, etc.

This gets hard with more dynamic problems, like processing linked lists, divide and conquer, and recursion. A solution was ugly, at best.

Tasking Comes To The Rescue!

And we will show you how it all works.

BUT!

No formal terminology, definitions, etc.

A task is a chunk of independent work. You guarantee that different tasks can be executed simultaneously.

#pragma omp task
{
   /* "this is my task" */
}

The runtime system decides on the scheduling of the tasks.

At certain points (implicit and explicit), tasks are guaranteed to be completed.
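Here is a minimal sketch that makes this concrete. It is not from the talk, and the function name sum_of_squares_with_tasks and the size NTASKS are hypothetical. One thread generates the tasks and the runtime system decides when and where they run. The taskwait is an explicit point where all of them are guaranteed to be completed. Compile with OpenMP enabled (for example -xopenmp as in this talk, or -fopenmp with GCC). Without it, the pragmas are ignored and the code simply runs sequentially.

```c
#define NTASKS 8

/* Hypothetical example: each task computes one square and stores it in its
   own element of results[], so the tasks are independent of each other. */
long sum_of_squares_with_tasks(void)
{
    long results[NTASKS];
    long sum = 0;

    #pragma omp parallel shared(results, sum)
    {
        #pragma omp single
        {
            for (int i = 0; i < NTASKS; i++) {
                /* i is private to the generating thread, so each task
                   gets its own copy of it (firstprivate) */
                #pragma omp task shared(results) firstprivate(i)
                results[i] = (long) i * i;
            }

            /* Explicit completion point: all tasks generated above
               have finished once the taskwait returns */
            #pragma omp taskwait

            for (int i = 0; i < NTASKS; i++)
                sum += results[i];
        }
    } /* End of parallel region (also an implicit completion point) */

    return sum;
}
```

The tasks may run in any order, but sum_of_squares_with_tasks() always returns 140 (0 + 1 + 4 + ... + 49). Every task writes to its own element, and the sum is only formed after the taskwait.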

For those who love to study the fine print, the following advice:

RTFM! And this is what it looks like:

RTFM!

Recognize The Fabulous Masters

The Tasking Concept In OpenMP

[Figure: one thread generates tasks; several threads execute tasks]

Who Does What And When?

You:
• Use a pragma to specify where the tasks are (the assumption is that all tasks can be executed independently)

The OpenMP runtime system:
• When a thread encounters a task construct, a new task is generated
• The moment of execution of the task is up to the runtime system
• Execution can either be immediate or delayed
• Completion of a task can be enforced through task synchronization
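A small sketch makes this division of labour visible. The names generate_and_execute and thread_id are hypothetical. omp_get_thread_num() is the standard OpenMP routine, and it is guarded by _OPENMP so the code also compiles without OpenMP. You only mark the tasks; which thread executes which task is decided by the runtime system.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of the calling thread, or 0 without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical helper: one thread generates ntasks tasks, and any thread
   in the team may execute them. executed_by[t] records which thread ran
   task t; that choice is made by the runtime system, not by the program. */
void generate_and_execute(int ntasks, int executed_by[])
{
    #pragma omp parallel shared(executed_by, ntasks)
    {
        #pragma omp single
        {
            printf("Thread %d generates %d tasks\n", thread_id(), ntasks);
            for (int t = 0; t < ntasks; t++) {
                #pragma omp task shared(executed_by) firstprivate(t)
                executed_by[t] = thread_id();
            }
        } /* Threads waiting at the implicit barrier of single may pick up tasks */
    }

    for (int t = 0; t < ntasks; t++)
        printf("Task %2d was executed by thread %d\n", t, executed_by[t]);
}
```

With, say, OMP_NUM_THREADS=4, several thread numbers typically show up in the listing, and the mapping can change from run to run.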

Tasking Explained By Way Of One Example

A Simple Plan

Your Task for Today:

Write a program that prints either "A race car" or "A car race" and maximize the parallelism.

Tasking Example/1

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   printf("A ");
   printf("race ");
   printf("car ");

   printf("\n");
   return(0);
}

What will this program print?

$ cc -fast hello.c
$ ./a.out
A race car
$

Tasking Example/2

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   #pragma omp parallel
   {
      printf("A ");
      printf("race ");
      printf("car ");
   } // End of parallel region

   printf("\n");
   return(0);
}

What will this program print using 2 threads?

Tasking Example/3

$ cc -xopenmp -fast hello.c
$ export OMP_NUM_THREADS=2
$ ./a.out
A race car A race car

Note that this program could (for example) also print "A A race race car car", "A race A car race car", or "A race A race car car", or ..... But I have not observed this (yet).

Tasking Example/4

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   #pragma omp parallel
   {
      #pragma omp single
      {
         printf("A ");
         printf("race ");
         printf("car ");
      }
   } // End of parallel region

   printf("\n");
   return(0);
}

What will this program print using 2 threads?

Tasking Example/5

$ cc -xopenmp -fast hello.c
$ export OMP_NUM_THREADS=2
$ ./a.out
A race car

But of course now only 1 thread executes .......

Tasking Example/6

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   #pragma omp parallel
   {
      #pragma omp single
      {
         printf("A ");
         #pragma omp task
         {printf("race ");}
         #pragma omp task
         {printf("car ");}
      }
   } // End of parallel region

   printf("\n");
   return(0);
}

What will this program print using 2 threads?

Tasking Example/7

$ cc -xopenmp -fast hello.c
$ export OMP_NUM_THREADS=2
$ ./a.out
A race car
$ ./a.out
A race car
$ ./a.out
A car race
$

Tasks can be executed in arbitrary order.

Another Simple Plan

You did well and quickly, so here is a final task to do:

Have the sentence end with "is fun to watch" (hint: use a print statement).

Tasking Example/8

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   #pragma omp parallel
   {
      #pragma omp single
      {
         printf("A ");
         #pragma omp task
         {printf("race ");}
         #pragma omp task
         {printf("car ");}
         printf("is fun to watch ");
      }
   } // End of parallel region

   printf("\n");
   return(0);
}

What will this program print using 2 threads?

Tasking Example/9

$ cc -xopenmp -fast hello.c
$ export OMP_NUM_THREADS=2
$ ./a.out
A is fun to watch race car
$ ./a.out
A is fun to watch race car
$ ./a.out
A is fun to watch car race
$

Tasks are executed at a task scheduling point, such as the implicit barrier at the end of the single construct.

Tasking Example/10

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

   #pragma omp parallel
   {
      #pragma omp single
      {
         printf("A ");
         #pragma omp task
         {printf("car ");}
         #pragma omp task
         {printf("race ");}
         #pragma omp taskwait
         printf("is fun to watch ");
      }
   } // End of parallel region

   printf("\n");
   return(0);
}

What will this program print using 2 threads?

Tasking Example/11

$ cc -xopenmp -fast hello.c
$ export OMP_NUM_THREADS=2
$ ./a.out
A car race is fun to watch
$ ./a.out
A car race is fun to watch
$ ./a.out
A race car is fun to watch
$

Now the tasks are executed first: the taskwait waits for both of them to complete before "is fun to watch" is printed.
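The same pattern can be checked without relying on the terminal. This sketch uses a hypothetical function, build_sentence, that collects the words in a buffer instead of printing them. Each task claims the next free slot with an atomic capture, so the slot order reflects the order in which the tasks ran. The two tasks may run in either order, but the taskwait ensures that both words are in place before "is fun to watch" is appended.

```c
#include <stdio.h>

/* Hypothetical variant of the "A race car" example that builds the
   sentence in out[] instead of printing it. */
void build_sentence(char *out, size_t size)
{
    const char *word[2] = { "", "" };
    int next = 0;   /* next free slot in word[] */

    #pragma omp parallel shared(word, next, out, size)
    {
        #pragma omp single
        {
            #pragma omp task shared(word, next)
            {
                int slot;
                #pragma omp atomic capture
                slot = next++;
                word[slot] = "car ";
            }

            #pragma omp task shared(word, next)
            {
                int slot;
                #pragma omp atomic capture
                slot = next++;
                word[slot] = "race ";
            }

            /* Both tasks have completed once the taskwait returns */
            #pragma omp taskwait
            snprintf(out, size, "A %s%sis fun to watch", word[0], word[1]);
        }
    } /* End of parallel region */
}
```

The result is either "A car race is fun to watch" or "A race car is fun to watch". The tasks' words never end up after the ending.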

Sorting Things Out

The Quicksort Algorithm

A commonly used algorithm for sorting. It uses a divide and conquer strategy. Main steps:

• Split the array through a pivot, such that
  • All elements to the left are smaller than the pivot
  • All elements to the right are equal to, or greater than, the pivot
• Repeat for the left and right part until done

A Simple Example/1

8 5 7 3 9    initial values
8 5 7 3 9    choose pivot (here 7), keep its index
8 5 7 3 9    swap pivot and last element
8 5 9 3 7    scan array, swap if smaller
8 5 9 3 7    5 < 7 => move to position 0
5 8 9 3 7    3 < 7 => move to position 1
5 3 9 8 7    continue, but nothing found

A Simple Example/2

5 3 9 8 7    restore pivot
5 3 7 8 9    pivot is in final position

5 3    7    8 9

Left branch (5 3): repeat, as an OpenMP task
Right branch (8 9): repeat, as an OpenMP task
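The talk does not show partitionArray itself. The sketch below is one possible implementation and is an assumption, not the author's code. It picks the middle element as the pivot and then follows the steps of the example above literally.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange a[i] and a[j]. */
static void swap(int64_t *a, int64_t i, int64_t j)
{
    int64_t tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

/* Assumed implementation (the talk only shows calls to partitionArray).
   Returns the final index of the pivot. */
int64_t partitionArray(int64_t *a, int64_t lo, int64_t hi)
{
    assert(lo < hi);
    int64_t mid   = lo + (hi - lo) / 2;   /* choose pivot, keep index       */
    int64_t pivot = a[mid];
    swap(a, mid, hi);                     /* swap pivot and last element    */

    int64_t store = lo;                   /* next position for a smaller element */
    for (int64_t i = lo; i < hi; i++) {   /* scan array, swap if smaller    */
        if (a[i] < pivot) {
            swap(a, i, store);
            store++;
        }
    }

    swap(a, store, hi);                   /* restore pivot                  */
    return store;                         /* pivot is in its final position */
}
```

For a[] = {8, 5, 7, 3, 9}, lo = 0 and hi = 4, it returns 2 and leaves 5 3 7 8 9, matching the example step by step.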

The Recursive Sequential Code

#include <stdint.h>

// Not shown in the talk: returns the final index of the pivot
int64_t partitionArray(int64_t *a, int64_t lo, int64_t hi);

void Quicksort(int64_t *a, int64_t lo, int64_t hi)
{
   if ( lo < hi ) {
      int64_t p = partitionArray(a, lo, hi);

      (void) Quicksort(a, lo, p - 1); // Left branch

      (void) Quicksort(a, p + 1, hi); // Right branch
   }
}

And Now With Tasks

void Quicksort(int64_t *a, int64_t lo, int64_t hi)
{
   if ( lo < hi ) {
      int64_t p = partitionArray(a, lo, hi);

      #pragma omp task shared(a) firstprivate(lo,p)
      {(void) Quicksort(a, lo, p - 1);} // Left branch

      #pragma omp task shared(a) firstprivate(hi,p)
      {(void) Quicksort(a, p + 1, hi);} // Right branch

      #pragma omp taskwait
   }
}
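Putting the pieces together gives a complete, compilable version of the tasking code on this slide. The partition scheme (middle element as pivot) and the driver parallel_quicksort are assumptions made for illustration. The Quicksort body follows the slide, including the taskwait.

```c
#include <stdint.h>

/* Exchange a[i] and a[j]. */
static void swap(int64_t *a, int64_t i, int64_t j)
{
    int64_t tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

/* Assumed partition scheme (not shown in the talk): the middle element is
   the pivot; returns the final index of the pivot. */
static int64_t partitionArray(int64_t *a, int64_t lo, int64_t hi)
{
    int64_t mid = lo + (hi - lo) / 2;
    int64_t pivot = a[mid];
    swap(a, mid, hi);
    int64_t store = lo;
    for (int64_t i = lo; i < hi; i++)
        if (a[i] < pivot)
            swap(a, i, store++);
    swap(a, store, hi);
    return store;
}

/* Task version as on the slide: each branch is a task, and the taskwait
   makes the call return only after both branches have been sorted. */
static void Quicksort(int64_t *a, int64_t lo, int64_t hi)
{
    if (lo < hi) {
        int64_t p = partitionArray(a, lo, hi);

        #pragma omp task shared(a) firstprivate(lo, p)
        { Quicksort(a, lo, p - 1); }      // Left branch

        #pragma omp task shared(a) firstprivate(hi, p)
        { Quicksort(a, p + 1, hi); }      // Right branch

        #pragma omp taskwait
    }
}

/* Hypothetical driver: one thread starts the recursion, and all threads
   in the team help execute the tasks it generates. */
void parallel_quicksort(int64_t *a, int64_t nelements)
{
    #pragma omp parallel shared(a, nelements)
    {
        #pragma omp single
        Quicksort(a, 0, nelements - 1);
    }
}
```

Because of the taskwait, a call to Quicksort only returns once both halves are sorted. That also keeps the parent's variable a alive while the child tasks access it through shared(a).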

Including The Driver Part

#pragma omp parallel default(none) shared(a,nelements)
{
   #pragma omp single nowait
   { (void) Quicksort(a, 0, nelements-1); }
} // End of parallel region

void Quicksort(int64_t *a, int64_t lo, int64_t hi)
{
   if ( lo < hi ) {
      int64_t p = partitionArray(a, lo, hi);

      #pragma omp task default(none) firstprivate(a,lo,p)
      {(void) Quicksort(a, lo, p - 1);} // Left branch

      #pragma omp task default(none) firstprivate(a,hi,p)
      {(void) Quicksort(a, p + 1, hi);} // Right branch

      // No taskwait: all tasks are complete at the implicit
      // barrier at the end of the parallel region
   }
}

Fine Tuning The Algorithm

• When the array section gets too small, it is better to switch to the sequential algorithm
• You may also consider the use of the if clause, plus the mergeable and final clauses

Some experimentation is recommended ;-)
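This is a minimal sketch of the first suggestion: switch to the sequential algorithm below a cut-off. The threshold CUTOFF = 1000 is an arbitrary, hypothetical value, and the partition scheme is assumed. The driver follows the single nowait pattern shown earlier in the talk, so no taskwait is needed.

```c
#include <stdint.h>

#define CUTOFF 1000   /* hypothetical threshold: tune it experimentally */

/* Exchange a[i] and a[j]. */
static void swap(int64_t *a, int64_t i, int64_t j)
{
    int64_t tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
}

/* Assumed partition scheme: middle element as pivot. */
static int64_t partitionArray(int64_t *a, int64_t lo, int64_t hi)
{
    int64_t mid = lo + (hi - lo) / 2;
    int64_t pivot = a[mid];
    swap(a, mid, hi);
    int64_t store = lo;
    for (int64_t i = lo; i < hi; i++)
        if (a[i] < pivot)
            swap(a, i, store++);
    swap(a, store, hi);
    return store;
}

/* Plain recursive quicksort, used once a section is below the cut-off. */
static void QuicksortSeq(int64_t *a, int64_t lo, int64_t hi)
{
    if (lo < hi) {
        int64_t p = partitionArray(a, lo, hi);
        QuicksortSeq(a, lo, p - 1);
        QuicksortSeq(a, p + 1, hi);
    }
}

static void Quicksort(int64_t *a, int64_t lo, int64_t hi)
{
    if (hi - lo < CUTOFF) {           /* section too small: no more tasks */
        QuicksortSeq(a, lo, hi);
        return;
    }

    int64_t p = partitionArray(a, lo, hi);

    #pragma omp task default(none) firstprivate(a, lo, p)
    Quicksort(a, lo, p - 1);          // Left branch

    #pragma omp task default(none) firstprivate(a, hi, p)
    Quicksort(a, p + 1, hi);          // Right branch

    /* No taskwait: the tasks only use firstprivate copies, and all of
       them are complete at the end of the parallel region. */
}

/* Driver in the style of the slide: single nowait inside a parallel region. */
void parallel_quicksort_cutoff(int64_t *a, int64_t nelements)
{
    #pragma omp parallel default(none) shared(a, nelements)
    {
        #pragma omp single nowait
        Quicksort(a, 0, nelements - 1);
    } /* Implicit barrier: every task has finished here */
}
```

The final clause offers a related approach. With a clause such as final(hi - lo < CUTOFF) on the task constructs, a task becomes final below that size. All tasks it generates are then executed immediately by the encountering thread as included tasks. Which variant works best is, as the slide says, a matter of experimentation.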

A Performance Example*

Performance of the OpenMP quicksort algorithm (40M elements)

Number of OpenMP threads:   1      2      4      8      16     32     64     128
Elapsed time (seconds):     30.4   15.0   7.6    4.1    2.1    1.4    0.9    0.7

[Figure: elapsed time (seconds) and speedup over a single thread versus the number of OpenMP threads]

*) SPARC M7-8 server @ 4.1 GHz
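The speedup curve can be recomputed from the elapsed times on the slide. This small sketch does the arithmetic: speedup = T(1) / T(p), and efficiency = speedup / p. The function names are hypothetical. The timings are the rounded values from the chart, so the results are approximate.

```c
#include <stdio.h>

/* Rounded elapsed times (seconds) read from the chart: 40M elements. */
static const int    threads[] = { 1, 2, 4, 8, 16, 32, 64, 128 };
static const double elapsed[] = { 30.4, 15.0, 7.6, 4.1, 2.1, 1.4, 0.9, 0.7 };
enum { NPOINTS = sizeof elapsed / sizeof elapsed[0] };

/* Speedup over a single thread: T(1) / T(p) */
double speedup(int k)
{
    return elapsed[0] / elapsed[k];
}

/* Parallel efficiency: speedup divided by the number of threads */
double efficiency(int k)
{
    return speedup(k) / threads[k];
}

void print_scaling_table(void)
{
    printf("%8s %10s %8s %11s\n", "threads", "time (s)", "speedup", "efficiency");
    for (int k = 0; k < NPOINTS; k++)
        printf("%8d %10.1f %8.1f %11.2f\n",
               threads[k], elapsed[k], speedup(k), efficiency(k));
}
```

With these numbers the speedup on 128 threads is 30.4 / 0.7, about 43, which is a parallel efficiency of roughly 34%.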

Summary
Big Brother Does Not Need To Know Everything

• For certain types of algorithms, tasking is ideally suited
• Optimal performance may require some fine tuning

But ....... Remember:

RTFM!

Recognize The Fabulous Masters

Thank You And ..... Stay Tuned