Concurrency and Parallel Programming “An Introduction”

Carlos Jaime Barrios Hernandez, PhD. EISI UIS @carlosjaimebh


Page 1

Carlos Jaime Barrios Hernandez, PhD. EISI UIS @carlosjaimebh

Concurrency and Parallel Programming “An Introduction”

Page 2

Concurrent and Parallel

The Rehearsal on Stage (La répétition sur la scène), 1874, Edgar Degas, Paris, Musée d'Orsay.

Page 3

Plan

• The Traditional Way
• Design Spaces of Parallel Programming Recall
• Concurrent Programming
• Distributed Memory vs. Shared Memory
• Design Models for Concurrent Algorithms
  • Task Decomposition
  • Data Decomposition
• Concurrent Algorithm Design Features and Forces
• Not Parallelizable Jobs, Tasks and Algorithms
• Algorithm Structures
• Final Notes

Page 4

Traditional Way

Designing and Building Parallel Programs, by Ian Foster, in http://www.mcs.anl.gov/~itf/dbpp/

Page 5

Design Spaces of Parallel Programming*

* Patterns for Parallel Programming, Timothy Mattson, Beverly A. Sanders and Berna L. Massingill, Software Patterns Series, Addison-Wesley, 2004

FC
• Finding Concurrency (structuring the problem to expose exploitable concurrency)

AS
• Algorithm Structure (structuring the algorithm to take advantage of the concurrency)

SS
• Supporting Structures (interfaces between algorithms and environments)

IM
• Implementation Mechanisms (define programming environments)

Page 6

(Remember) Concurrency and Parallelism

• A system is “concurrent” if it can support two or more actions in progress at the same time

• A system is “parallel” if it can support two or more actions executing simultaneously

Concurrent programming is all about independent computations that the machine can execute in any order.

Page 7

Concurrent vs. Parallel

Page 8

Distributed vs. Parallel

Page 9

Concurrent Programming: General Steps

1. Analysis
  • Identify possible concurrency
  • Hotspot: any partition of the code that has a significant amount of activity
  • Time spent, independence of the code…

2. Design and Implementation
  • Threading the algorithm

3. Tests of Correctness
  • Detecting and fixing threading errors

4. Tune of Performance
  • Removing performance bottlenecks
  • Logical errors, contention, synchronization errors, imbalance, excessive overhead
  • Tuning performance problems in the code (tuning cycles)
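To make step 2 concrete, here is a minimal sketch (not from the slides; compile with cc -fopenmp): a hotspot loop whose iterations are independent, so it can be threaded directly.

    /* Hypothetical hotspot: each iteration touches only a[i] and b[i],
       so there are no dependencies and iterations may run in any order. */
    #include <math.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        for (int i = 0; i < N; i++) a[i] = (double)i;

        #pragma omp parallel for   /* thread the independent iterations */
        for (int i = 0; i < N; i++)
            b[i] = sqrt(a[i]) * 2.0;

        printf("b[%d] = %f\n", N - 1, b[N - 1]);
        return 0;
    }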

Page 10

Distributed vs. Shared Memory Programming

Common features:
• Redundant work
• Dividing work
• Sharing data (different methods)
• Dynamic/static allocation of work
  • Depending on the nature of the serial algorithm, the resulting concurrent version, and the number of threads/processors

Only in shared memory:
• Local declarations and thread-local storage
• Memory effects:
  • False sharing
• Communication in memory:
  • Mutual exclusion
  • Producer/consumer model
  • Reader/writer locks (in distributed memory this is boss/worker)
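As an illustration of one of these memory effects, a minimal false-sharing sketch (not from the slides; the 64-byte cache line size is an assumption; compile with cc -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Pad each counter to its own cache line so that threads writing
       neighboring counters do not invalidate each other's caches. */
    struct padded { long value; char pad[64 - sizeof(long)]; };

    int main(void) {
        struct padded counters[NTHREADS];
        for (int t = 0; t < NTHREADS; t++) counters[t].value = 0;

        #pragma omp parallel num_threads(NTHREADS)
        {
            int id = omp_get_thread_num();
            for (long i = 0; i < 10000000L; i++)
                counters[id].value++;   /* each thread owns one line */
        }

        long total = 0;
        for (int t = 0; t < NTHREADS; t++) total += counters[t].value;
        printf("total = %ld\n", total);
        return 0;
    }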

Page 11

Tasks and Data Decomposition

• Task Decomposition: task parallelism
• Data Decomposition: data parallelism (geometric parallelism)

Page 12

Concurrent Computation from Serial Codes

• Sequential consistency property: the concurrent solution gets the same answer as the serial code on the same input data set, whatever order the concurrent algorithm's computations execute in.

[Diagram: sequential version: in → P → out; parallel/concurrent version: in → several copies of P running concurrently → out.]
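A minimal check of this property (a sketch, not from the slides; integer addition is associative, so any execution order gives the identical result; compile with cc -fopenmp):

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static long a[N];
        for (long i = 0; i < N; i++) a[i] = i % 7;

        long serial = 0;                 /* reference serial answer */
        for (long i = 0; i < N; i++) serial += a[i];

        long parallel = 0;               /* concurrent version */
        #pragma omp parallel for reduction(+:parallel)
        for (long i = 0; i < N; i++) parallel += a[i];

        printf("serial=%ld parallel=%ld same=%s\n",
               serial, parallel, serial == parallel ? "yes" : "no");
        return 0;
    }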

Page 13

Task Decomposition Considerations

• What are the tasks and how are they defined?

• What are the dependencies between tasks and how can they be satisfied?

• How are the tasks assigned to threads?

Tasks must be assigned to threads for execution.

Page 14

What are the tasks and how are they defined?

• There should be at least as many tasks as there will be threads (or cores)
  • It is almost always better to have (many) more tasks than threads.

• Granularity must be large enough to offset the overhead needed to manage the tasks and threads
  • More computation: higher granularity (coarse-grained)
  • Less computation: lower granularity (fine-grained)

Granularity is the amount of computation done before synchronization is needed.
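A granularity sketch (not from the slides; the chunk sizes are arbitrary assumptions; compile with cc -fopenmp): the schedule chunk size sets how much work a thread takes per scheduling decision, trading scheduling overhead against load balance.

    #include <stdio.h>

    #define N 100000

    /* stand-in for some per-element computation */
    static double work(int i) {
        double x = (double)i;
        for (int k = 0; k < 200; k++) x = x * 0.5 + 1.0;
        return x;
    }

    int main(void) {
        static double out[N];

        /* fine-grained: one iteration per scheduling decision
           (maximum flexibility, maximum overhead) */
        #pragma omp parallel for schedule(dynamic, 1)
        for (int i = 0; i < N; i++) out[i] = work(i);

        /* coarse-grained: 1000 iterations per chunk
           (less overhead, fewer chances to balance load) */
        #pragma omp parallel for schedule(dynamic, 1000)
        for (int i = 0; i < N; i++) out[i] = work(i);

        printf("out[0] = %f\n", out[0]);
        return 0;
    }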

Page 15

Task Granularity

[Diagram: under a fine-grained decomposition, cores 0-3 each run many small tasks, each paying its own scheduling overhead; under a coarse-grained decomposition, each core runs one large task with a single overhead block.]

Page 16

Task Dependencies

Order dependency | Data dependency

Enchantingly parallel code: code without dependencies

[Diagram: an order dependency chains Process 1 → Process 2 → Process 3 from in to Out, each stage waiting on the previous; a data dependency feeds In1 and In2 through Process 1 and Process 2, whose results Out1 and Out2 are consumed by Process 3.]

Page 17

Data Decomposition Considerations (Geometric Decomposition)

Data structures must (commonly) be divided into arrays or logical structures.

• How should you divide the data into chunks?

• How should you ensure that the tasks for each chunk have access to all data required for their updates?

• How are the data chunks assigned to threads?

Page 18

How should you divide data into chunks?

• By individual elements
• By rows
• By groups of columns
• By blocks
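A sketch of the arithmetic behind such a split (not from the slides): dividing N rows as evenly as possible among T threads, so thread t gets the half-open row range [start, end) and the first N % T threads take one extra row.

    #include <stdio.h>

    /* Thread t's share of N rows split across T threads. */
    static void row_range(int N, int T, int t, int *start, int *end) {
        int base = N / T, extra = N % T;
        *start = t * base + (t < extra ? t : extra);
        *end   = *start + base + (t < extra ? 1 : 0);
    }

    int main(void) {
        int s, e;
        for (int t = 0; t < 4; t++) {        /* e.g., 10 rows over 4 threads */
            row_range(10, 4, t, &s, &e);
            printf("thread %d: rows [%d, %d)\n", t, s, e);
        }
        return 0;
    }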

Page 19

The Shape of the Chunk

• Data decomposition has an additional dimension: the shape of the chunk.
  • It determines what the neighboring chunks are and how any exchange of data will be handled during the course of the chunk computations.

• Regular shapes: common, regular data organizations (2 shared borders).
• Irregular shapes: may be necessary due to the irregular organization of the data (5 shared borders).

Page 20

How should you ensure that the tasks for each chunk have access to all data required for updates?

• Using ghost cells
  • Ghost cells hold copied data from a neighboring chunk.

[Diagram: the original split with ghost cells added around each chunk, then data copied from neighbors into the ghost cells.]
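A minimal 1-D sketch of the idea (not from the slides; the 3-point stencil and chunk size are assumptions): each chunk carries one ghost cell on each side holding a copy of its neighbor's border value, so the update never reads outside the chunk's own storage.

    #include <stdio.h>

    #define CHUNK 8

    /* chunk layout: [ghost | CHUNK interior cells | ghost] */
    static void fill_ghosts(double *c, double left, double right) {
        c[0] = left;            /* copy of left neighbor's last cell   */
        c[CHUNK + 1] = right;   /* copy of right neighbor's first cell */
    }

    /* 3-point stencil over the interior; the ghosts are only read */
    static void update(const double *c, double *out) {
        for (int i = 1; i <= CHUNK; i++)
            out[i - 1] = (c[i - 1] + c[i] + c[i + 1]) / 3.0;
    }

    int main(void) {
        double chunk[CHUNK + 2], out[CHUNK];
        for (int i = 1; i <= CHUNK; i++) chunk[i] = (double)i;
        fill_ghosts(chunk, 0.0, 9.0);   /* values a neighbor would provide */
        update(chunk, out);
        printf("out[0] = %.2f, out[%d] = %.2f\n", out[0], CHUNK - 1, out[CHUNK - 1]);
        return 0;
    }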

Page 21

How are the data chunks (and tasks) assigned to threads?

• Data chunks are associated with tasks and are assigned to threads statically or dynamically, via scheduling:
  • Static: when the amount of computation within tasks is uniform and predictable
  • Dynamic: to achieve a good balance when the computation needed per chunk varies
    • Requires many (more) tasks than threads.
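The two policies in OpenMP terms (a sketch, not from the slides; the chunk size of 16 and the artificial variable workload are assumptions; compile with cc -fopenmp):

    #include <stdio.h>

    #define N 100000

    int main(void) {
        static double a[N];
        for (int i = 0; i < N; i++) a[i] = 1.0;

        /* static: uniform, predictable cost per iteration, so an even
           split decided up front is enough */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++) a[i] *= 2.0;

        /* dynamic: per-iteration cost varies, so threads grab
           16-iteration chunks at run time (many more chunks than threads) */
        #pragma omp parallel for schedule(dynamic, 16)
        for (int i = 0; i < N; i++)
            for (int k = 0; k < i % 1000; k++) a[i] += 1.0;

        printf("a[0] = %f, a[N-1] = %f\n", a[0], a[N - 1]);
        return 0;
    }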

Page 22

Concurrent Design Models: Features

• Efficiency
  • Concurrent applications must run quickly and make good use of processing resources.

• Simplicity
  • Easier to understand, develop, debug, verify and maintain.

• Portability
  • In terms of threading portability.

• Scalability
  • It should be effective across a wide range of thread and core counts and data set sizes.

Page 23

Task and Domain Decomposition Patterns

• Task Decomposition Pattern
  • Understand the computationally intensive parts of the problem.
  • Finding tasks (as much…):
    • Actions that are carried out to solve the problem
    • Actions that are distinct and relatively independent.

• Data Decomposition Pattern
  • Data decomposition is implied by the tasks.
  • Finding domains:
    • The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
    • Similar operators are being applied to different parts of the data structure.
  • In shared-memory programming environments, data decomposition will be implied by the task decomposition.

Page 24

Group and Order Tasks Patterns

• Group Tasks Pattern
  • Simplifies the problem's dependency analysis:
    • If a group of tasks must work together on a shared data structure
    • If a group of tasks are dependent

• Order Tasks Pattern
  • Find and correctly account for dependencies resulting from constraints on the order of execution of a collection of tasks:
    • Temporal dependencies
    • Specific requirements of the tasks

Page 25

Data Sharing Pattern

• Data decomposition might define some data that must be shared among the tasks.

• Data dependencies can also occur when one task needs access to some portion of another task's local data. Shared data falls into categories:
  • Read-only
  • Effectively local (accessed by only one of the tasks)
  • Read-write
    • Accumulative
    • Multiple-read/single-write
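A sketch mapping these categories onto code (not from the slides; compile with cc -fopenmp): a is read-only shared data, local is effectively local to each thread, and total is accumulative, combined under mutual exclusion.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];             /* read-only after initialization */
        for (int i = 0; i < N; i++) a[i] = 1.0;

        double total = 0.0;             /* accumulative shared data */
        #pragma omp parallel
        {
            double local = 0.0;         /* effectively local: one task only */
            #pragma omp for
            for (int i = 0; i < N; i++) local += a[i];

            #pragma omp critical        /* read-write: one thread at a time */
            total += local;
        }
        printf("total = %f\n", total);
        return 0;
    }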

Page 26

Design Evaluation Pattern

• Production of the analysis and decomposition:
  • Task decomposition to identify concurrency
  • Data decomposition to identify data local to each task
  • Groups of tasks and an order on the groups to satisfy temporal constraints
  • Dependencies among tasks

• Design evaluation:
  • Suitability for the target platform
  • Design quality
  • Preparation for the next phase of the design

Page 27

Not Parallelizable Jobs, Tasks and Algorithms

• Algorithms with state
• Recurrences
• Induction variables
• Reductions
• Loop-carried dependencies
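For example (a sketch, not from the slides), a recurrence with a loop-carried dependency; reductions and some recurrences do have parallel reformulations, but this direct form does not:

    #include <stdio.h>

    /* Each iteration reads the s written by iteration i-1, so the
       iterations cannot run in an arbitrary order: threading this
       loop naively would give wrong answers. */
    static double recurrence(const double *x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s = 0.5 * s + x[i];   /* loop-carried dependency on s */
        return s;
    }

    int main(void) {
        double x[4] = {1.0, 2.0, 3.0, 4.0};
        printf("s = %f\n", recurrence(x, 4));
        return 0;
    }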

The Mythical Man-Month: Essays on Software Engineering, by Fred Brooks. Addison-Wesley Professional, 1995.

Page 28

Algorithm Structures

• Organizing by tasks
  • Task parallelism
  • Divide and conquer

• Organizing by data decomposition
  • Geometric decomposition
  • Recursive data

• Organizing by flow of data
  • Pipeline
  • Event-based coordination

Page 29

Algorithm Structure Decision Tree (Major Organizing Principle)

Start:
• Organize by tasks
  • Linear: Task Parallelism
  • Recursive: Divide and Conquer
• Organize by data decomposition
  • Linear: Geometric Decomposition
  • Recursive: Recursive Data
• Organize by flow of data
  • Linear: Pipeline
  • Recursive: Event-Based Coordination

Page 30

Divide and Conquer Strategy

[Diagram: the problem is split into subproblems (split, split, split), each subproblem is solved at the leaves (solve), and the subsolutions are merged back up (merge, merge, merge) into the final solution.]

Page 31

Divide and Conquer: Parallel Strategy

[Diagram: the same split / base-case solve / merge tree, with each dashed-line box representing a task, so independent subtrees run concurrently.]
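A minimal sketch of the parallel strategy using OpenMP tasks (not from the slides; the serial cutoff of 1000 elements is an assumption that keeps the grain coarse enough to pay for the task overhead; compile with cc -fopenmp):

    #include <stdio.h>

    #define N 100000

    static long sum(const long *a, int lo, int hi) {  /* range [lo, hi) */
        if (hi - lo < 1000) {                         /* base-case solve */
            long s = 0;
            for (int i = lo; i < hi; i++) s += a[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;                 /* split */
        long left, right;
        #pragma omp task shared(left)                 /* solve halves concurrently */
        left = sum(a, lo, mid);
        right = sum(a, mid, hi);
        #pragma omp taskwait                          /* wait, then merge */
        return left + right;
    }

    int main(void) {
        static long a[N];
        for (int i = 0; i < N; i++) a[i] = 1;
        long total;
        #pragma omp parallel
        #pragma omp single                            /* one thread seeds the tree */
        total = sum(a, 0, N);
        printf("total = %ld\n", total);
        return 0;
    }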

Page 32

Recursive Data Strategy

• Involves an operation on a recursive data structure that appears to require sequential processing:
  • Lists
  • Trees
  • Graphs

• The recursive data structure is completely decomposed into individual elements.
• The structure is in the form of a loop (top-level structure).
• All elements of the data structure are updated simultaneously (synchronization).

• Example:
  • Partial sums of a linked list.

• Uses:
  • Widely used on SIMD platforms (HPF77)
  • Combinatorial optimization problems
  • Partial sums
  • List ranking
  • Euler tours and ear decomposition
  • Finding the roots of trees in a forest of rooted directed trees
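A pointer-jumping sketch of the partial-sums example (not from the slides; a serial driver is shown, where the per-element inner loop is the part that would run in parallel, synchronizing between rounds): every element adds in its successor's value and then jumps its successor pointer two hops ahead, so the remaining list halves each round and O(log n) rounds suffice.

    #include <stdio.h>

    #define N 8

    int main(void) {
        double val[N];   /* element values; all 1.0 here */
        int next[N];     /* successor index; -1 marks the end of the list */
        for (int i = 0; i < N; i++) { val[i] = 1.0; next[i] = i + 1; }
        next[N - 1] = -1;

        int done = 0;
        while (!done) {                       /* O(log N) rounds */
            double v2[N]; int n2[N];
            done = 1;
            for (int i = 0; i < N; i++) {     /* parallel in principle */
                if (next[i] != -1) {
                    v2[i] = val[i] + val[next[i]];  /* add successor's value */
                    n2[i] = next[next[i]];          /* jump two hops ahead   */
                    done = 0;
                } else { v2[i] = val[i]; n2[i] = -1; }
            }
            for (int i = 0; i < N; i++) { val[i] = v2[i]; next[i] = n2[i]; }
        }
        /* val[i] now holds the sum from element i to the end of the list */
        for (int i = 0; i < N; i++) printf("%g ", val[i]);
        printf("\n");
        return 0;
    }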

Page 33

Pipeline Strategy

• Involves performing a calculation on many sets of data, where the calculation can be viewed in terms of data flowing through a sequence of stages:
  • Instruction pipelines in modern CPUs
  • Vector processing (loop-level pipelining)
  • Algorithm-level pipelining:
    • Signal processing
    • Graphics
    • Shell programs in Unix
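A minimal two-stage pipeline sketch in POSIX threads (not from the slides; the bounded-queue size and the -1 end-of-stream marker are assumptions; compile with cc -pthread): stage 1 produces items into a bounded queue and stage 2 consumes them; with more stages, each pair of neighbors is connected by such a queue, so at steady state all stages work on different items simultaneously.

    #include <pthread.h>
    #include <stdio.h>

    #define NITEMS 10
    #define QSIZE  4

    /* bounded queue connecting the two stages */
    static int queue[QSIZE];
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void put(int x) {
        pthread_mutex_lock(&lock);
        while (count == QSIZE) pthread_cond_wait(&not_full, &lock);
        queue[tail] = x; tail = (tail + 1) % QSIZE; count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    static int get(void) {
        pthread_mutex_lock(&lock);
        while (count == 0) pthread_cond_wait(&not_empty, &lock);
        int x = queue[head]; head = (head + 1) % QSIZE; count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        return x;
    }

    static void *producer(void *arg) {        /* stage 1 */
        (void)arg;
        for (int i = 0; i < NITEMS; i++) put(i * i);
        put(-1);                              /* end-of-stream marker */
        return NULL;
    }

    static void *consumer(void *arg) {        /* stage 2 */
        (void)arg;
        for (int x; (x = get()) != -1; )
            printf("got %d\n", x);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }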

Page 34

Event-Based Coordination Strategy

• The application is decomposed into groups of semi-independent tasks interacting in an irregular fashion.

• The interaction is determined by a flow of data between the groups, which implies ordering constraints between the tasks.

[Diagram: three task groups (1, 2, 3) exchanging data irregularly.]

Page 35

Final Notes

• Every parallel algorithm involves a collection of tasks that can execute concurrently.
  • The key is finding the tasks (and collecting them).

• Data-based decomposition is good if:
  • The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
  • Similar operations are being applied to different parts of the data structure independently.

• However, the desired features of a concurrent/parallel program (efficiency, simplicity, portability and scalability) conflict with one another:
  • Efficiency conflicts with portability
  • Efficiency conflicts with simplicity

• Thus a good algorithm design must strike a balance between abstraction and portability on the one hand, and suitability for a particular target architecture on the other.

Page 36

Recommended Reading

• The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, by Clay Breshears (O'Reilly, 2009)

• Writing Concurrent Systems, Part 1, by David Chisnall (InformIT Author's Blog: http://www.informit.com/articles/article.aspx?p=1626979)

• Patterns for Parallel Programming, by T. Mattson, B. Sanders and B. Massingill (Addison-Wesley, 2004). Web site: http://www.cise.ufl.edu/research/ParallelPatterns/

• Designing and Building Parallel Programs, by Ian Foster, in http://www.mcs.anl.gov/~itf/dbpp/

Page 37

Class: Delayed Work

• Review Chapter 2 of Designing and Building Parallel Programs, by Ian Foster, in http://www.mcs.anl.gov/~itf/dbpp/

• Solve exercises 1 and 2 in the Exercises section.

• Imagine (conceptually) a solution for a real-world, highly complex problem to solve on campus.

• Read http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf