Distributed Programming CA107 Topics in Computing Series Martin Crane Karl Podesta


The Basics

• What is a Distributed System (DS)?
• How does it differ from a Parallel Computer (MPP)?
  – differences become fuzzy... both are now called Supercomputers or High Performance Computers (HPC)

• Supercomputers and Supermodels:
  – both expensive
  – both hard to deal with/prone to tantrums
  – both look glamorous but...
  – both spend lots of time doing tedious tasks for others:
    • mostly matrix-vector products for Supercomputers
    • being live mannequins for Supermodels

Why High Performance Computing?

• Solve larger and larger scientific problems
  – advanced product design
  – economic analysis
  – weather prediction/climate modelling
• Store and process huge amounts of data
  – data mining and knowledge discovery
  – image processing, multimedia information
  – internet information storage and search (e.g. Google)

Different Supercomputers (MPPs) in Your Neighbourhood

• Single Instruction, Multiple Data (SIMD)
  – as seen on PlayStation 2
  – very useful for processing large arrays, e.g. a(i) = b(i) + c(i)*d(i) {as are found in games}
• Multiple Instruction, Multiple Data (MIMD)
  – as seen in Deep Blue
• But these are dinosaurs - we want something more flexible
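The SIMD array update a(i) = b(i) + c(i)*d(i) can be sketched in plain Python. This is only an illustration of the idea (one operation applied to a whole group of elements per step); the function name `simd_style_update` and the `lanes` width are assumptions made for the example, and real SIMD hardware does each group in a single instruction rather than in a loop.

```python
def simd_style_update(b, c, d, lanes=4):
    """Compute a[i] = b[i] + c[i] * d[i], handling `lanes` elements per step.

    `lanes` is a hypothetical vector width chosen for illustration.
    """
    if not (len(b) == len(c) == len(d)):
        raise ValueError("input arrays must have the same length")
    a = []
    for start in range(0, len(b), lanes):
        stop = start + lanes
        # The same multiply-add is applied to every element of the group.
        a.extend(x + y * z
                 for x, y, z in zip(b[start:stop], c[start:stop], d[start:stop]))
    return a


b = [1.0, 2.0, 3.0, 4.0, 5.0]
c = [2.0, 2.0, 2.0, 2.0, 2.0]
d = [10.0, 20.0, 30.0, 40.0, 50.0]
a = simd_style_update(b, c, d)
print(a)  # [21.0, 42.0, 63.0, 84.0, 105.0]
```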

Problems with Traditional Supercomputers (i.e. MPPs)

• Expensive
  – very high starting cost ($10,000s per node)
  – expensive software
  – high maintenance cost
  – costly to upgrade
• Vendor dependent
  – lots of companies have come and gone (Datacube, Connection Machines etc.)

So, real/poor people cannot do HPC!

PC Cluster: a poor-man’s supercomputer!

• built from high-end PCs and high-speed comms network

• supports standard parallel programming based on the message-passing model (MPI, the Message Passing Interface)

• cheap (a 16-node cluster can cost less than $10k)
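The message-passing model can be sketched without a cluster. The example below is a toy stand-in, not MPI: it uses Python threads as "nodes" and queues as the "network", and the names `worker` and `parallel_sum` are invented for illustration. A master sends each worker a chunk of data, each worker sends back a partial sum, and the master combines the results.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, sum it, and send the result back."""
    chunk = inbox.get()                # blocking receive from the master
    outbox.put((rank, sum(chunk)))     # send (rank, partial sum) to the master


def parallel_sum(data, n_workers=4):
    """Sum `data` by scattering chunks to workers and gathering partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])   # send every n-th item
    total = sum(results.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return total


total = parallel_sum(list(range(1, 101)))
print(total)  # 5050
```

On a real cluster the queues would be replaced by network messages between separate machines, which is where the interconnect speed starts to matter.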

Cluster Diagram Here

DCU CA Cluster Resources

• “John the Baptist” Cluster
  – built by Redbrick using old CA machines
  – 24 individual 450 MHz machines
  – connected by a Fast Ethernet switch
  – harbinger of better things...
• “The one that is to come”...
  – 24 SMP machines
  – each with 2 GHz
  – plus loadsa memory!
  – arrives about Xmas time, appropriately enough

What are the issues in HPC?

• Communication vs Computation
  – size/nature of problem
  – interconnect speed/processor speed
• Fault tolerance
  – quality of hardware
  – nature of problem
• Load balancing
  – nature of problem/quality of programmer
  – even an easy problem can be made difficult & slow by a bad implementation
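The load-balancing point can be made concrete with a small sketch. It assumes each task has a known cost and that the run finishes only when the busiest node finishes; the helper names (`finish_time`, `block_split`, `greedy_split`) and the task costs are made up for the example. The same eight tasks are split across two nodes in a naive way and in a balanced way.

```python
def finish_time(assignment):
    """The run ends when the most heavily loaded node ends."""
    return max(sum(tasks) for tasks in assignment)


def block_split(tasks, n_nodes):
    """Naive split: hand out consecutive blocks of tasks, ignoring cost."""
    size = -(-len(tasks) // n_nodes)   # ceiling division
    return [tasks[i * size:(i + 1) * size] for i in range(n_nodes)]


def greedy_split(tasks, n_nodes):
    """Balanced split: give each task, largest first, to the least loaded node."""
    nodes = [[] for _ in range(n_nodes)]
    for cost in sorted(tasks, reverse=True):
        min(nodes, key=sum).append(cost)
    return nodes


tasks = [8, 7, 6, 5, 4, 3, 2, 1]       # hypothetical task costs, total 36
naive = block_split(tasks, 2)
balanced = greedy_split(tasks, 2)
print(finish_time(naive))     # 26: one node does 8+7+6+5, the other idles
print(finish_time(balanced))  # 18: both nodes do 18 units of work
```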

Influence of Nature of Problem on Speed

• What is speed?
  – speed-up is a better measure: Time on 1 node / Time on n nodes
• Speed-up and Problems
  – very good: embarrassingly parallel problems
  – fair to middling: regular and synchronous problems
    • a bit of cross-talk between nodes
  – bad: irregular/asynchronous problems
    • lots of cross-talk between nodes
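Speed-up as defined above is easy to compute from two timings. The sketch below also includes a toy cost model to show how cross-talk eats into speed-up; the model (computation divides evenly across nodes, communication cost grows linearly with the number of other nodes) and all the numbers are assumptions for illustration, not measurements.

```python
def speedup(t_one, t_n):
    """Speed-up = time on 1 node / time on n nodes."""
    return t_one / t_n


def efficiency(t_one, t_n, n):
    """Fraction of the ideal n-fold speed-up actually achieved."""
    return speedup(t_one, t_n) / n


def modelled_time(t_compute, n, t_comm_per_node):
    """Toy model (an assumption): work splits evenly, cross-talk grows with n."""
    return t_compute / n + t_comm_per_node * (n - 1)


t_one = 100.0   # hypothetical time on 1 node
n = 4

# Embarrassingly parallel: no cross-talk between nodes.
ideal = speedup(t_one, modelled_time(t_one, n, 0.0))
# Lots of cross-talk: 5 time units per extra node.
chatty = speedup(t_one, modelled_time(t_one, n, 5.0))

print(ideal)    # 4.0
print(chatty)   # 2.5
print(efficiency(t_one, modelled_time(t_one, n, 5.0), n))  # 0.625
```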
