Protocols and software for exploiting Myrinet clusters
Congduc Pham
and the main contributors: P. Geoffray, L. Prylli, B. Tourancheau, R. Westrelin
Parallel machines and clusters
Cplant
Standalone workstation
Pros for clusters
Large supercomputers are expensive and suffer from a short useful life span
Performance of workstations and PCs is rapidly improving
The communications bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.
Workstation clusters are easier to integrate into existing networks than special parallel computers.
Using clusters of workstations as a distributed computing resource is very cost-effective: the system can be grown or updated incrementally!
No polemical discussion, just statement…
Mainframe
Vector Supercomputer
Mini Computer
Workstation
PC
1984
from R. Buyya
GigaEthernet, Giganet, SCI, Myrinet…
The Myrinet technology
Switch
– full crossbar
– wormhole source routing
– small latency
Network interface
– embedded RISC processor
– programmable
– local memory
– several DMA engines
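The source routing listed under the switch can be illustrated with a small sketch. It assumes a simplified model in which the sender prepends one routing byte per switch on the path, and each switch reads and strips the leading byte to select its output port. The function names are hypothetical, real packet headers carry more than this, and the wormhole aspect (forwarding before the whole packet has arrived) is not modeled.

```python
def build_packet(route, payload):
    """Prepend one routing byte per switch hop to the payload (simplified model)."""
    return bytes(route) + payload


def switch_forward(packet):
    """Model one switch: read and strip the leading routing byte.

    Returns (output_port, remaining_packet).
    """
    if not packet:
        raise ValueError("empty packet")
    return packet[0], packet[1:]


def deliver(route, payload):
    """Push a packet through len(route) switches and return (ports_taken, data)."""
    packet = build_packet(route, payload)
    ports = []
    for _ in route:
        port, packet = switch_forward(packet)
        ports.append(port)
    return ports, packet
```

Because the route is chosen entirely by the sender, a switch in this model needs no routing table: it only inspects the first byte it receives.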
Current specifications:
– up to 200 MHz processor
– up to 8 MB local memory
– 64-bit/66 MHz PCI bus (528 MB/s peak)
– 250 MB/s full-duplex links
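The 528 MB/s peak figure for the PCI bus follows from one bus-width transfer per clock cycle. The short sketch below reproduces that arithmetic; the function name is illustrative, and the nominal 66 MHz clock is taken as exactly 66 million cycles per second, as the slide's figure implies.

```python
def pci_peak_mb_per_s(bus_width_bits, clock_mhz):
    """Peak transfer rate in MB/s, assuming one bus-width transfer per clock cycle."""
    bytes_per_cycle = bus_width_bits // 8
    return bytes_per_cycle * clock_mhz


# 64-bit bus at 66 MHz: 8 bytes x 66 million cycles/s
peak_64_66 = pci_peak_mb_per_s(64, 66)

# For comparison, a 32-bit bus at 33 MHz: 4 bytes x 33 million cycles/s
peak_32_33 = pci_peak_mb_per_s(32, 33)
```

This is a theoretical ceiling; sustained rates on a real bus are lower because of arbitration and transaction overhead.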
The raw performance is here, but…
traditional communication software fails to deliver the hardware performance to the applications
Myrinet
Traditional communication layers
Optimized communication layers
Speed labels from the figure: 200 mph, 40 mph, 180 mph, 35 mph, 175 mph
Going faster by taking shortcuts
Our communication architecture
Provides a complete suite for high-performance communications.
Focus on Myrinet-based clusters
Viewed as layers, but bypasses the OS as much as possible
Myrinet physical layer
BIP BIP-SMP
MPI-BIP
programmable NICs break the traditional spatial distribution of tasks
BIP, the lowest protocol level
Basic Interface for Parallelism
– very basic API
– provides a library, a kernel module and an MCP
– definitely not for the end-user
Optimizations for
– latency
– maximum throughput
– the throughput increase
The implementation performs
– reduction of the data critical path
– distinction between small and large messages
– burst or write combining for host → NIC
– optimal cache usage
– cache snooping for NIC → host (monitoring of the PCI bus)
– buffer alignment
– optimal fragment size…
BIP, small message strategy
Avoids handshakes between the host and the NIC
Uses PIO to a NIC FIFO on the sending side and an extra memory copy on the receiving side
BIP, large message strategy
Uses DMA on both the send side and the receive side: higher bandwidth, offloads the CPU
Zero-copy mechanism, pipelined transmission
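The two strategies can be sketched as a choice made on message size, together with a simple model of why pipelining fragments helps large messages. Everything here is an assumption for illustration: the 1024-byte threshold, the function names, and the three-stage view of a transfer (host to NIC, wire, NIC to host) are not taken from BIP.

```python
SMALL_MESSAGE_LIMIT = 1024  # bytes; hypothetical threshold, not a BIP value


def choose_strategy(size, limit=SMALL_MESSAGE_LIMIT):
    """Pick the send path by message size."""
    if size <= limit:
        # Small: PIO into the NIC FIFO, one extra copy at the receiver
        return "pio+copy"
    # Large: DMA on both sides, zero-copy, fragments pipelined
    return "dma-zero-copy"


def pipelined_time(n_fragments, n_stages, t_stage):
    """Time for n_fragments to cross n_stages when the stages overlap."""
    return (n_fragments + n_stages - 1) * t_stage


def sequential_time(n_fragments, n_stages, t_stage):
    """Same transfer when each fragment finishes all stages before the next starts."""
    return n_fragments * n_stages * t_stage
```

In this model, 8 fragments crossing 3 stages of 1 time unit each take 10 units when pipelined and 24 units when sequential. The model treats every stage as equally long, which a real transfer is unlikely to match exactly.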
BIP-SMP: a low level for SMP machines
SMPs (2 or 4 processors) are viewed as the architectures with the best performance/price ratio
BIP-SMP provides
– management of concurrent accesses to the NIC
– low-latency intra-node communications
– inter-node communications equivalent to BIP
– total transparency for the applications and end-users
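The first two points can be pictured with a toy model, which is not the BIP-SMP implementation. It assumes that ranks are assigned to nodes in contiguous blocks, that messages to a rank on the same node go through shared memory, and that a lock serializes access to the single NIC. The class and attribute names are hypothetical, and Python lists stand in for the shared-memory queue and the NIC.

```python
import threading


class Node:
    """Toy model of one SMP node whose processes share a single NIC."""

    def __init__(self, node_id, procs_per_node):
        self.node_id = node_id
        self.procs_per_node = procs_per_node
        self.nic_lock = threading.Lock()  # serializes access to the NIC
        self.nic_log = []                 # messages handed to the NIC
        self.shm_log = []                 # messages passed through shared memory

    def owns(self, rank):
        """True if the rank runs on this node (block distribution assumed)."""
        return rank // self.procs_per_node == self.node_id

    def send(self, src, dst, data):
        """Route a message: shared memory inside the node, NIC otherwise."""
        if self.owns(dst):
            self.shm_log.append((src, dst, data))
            return "shared-memory"
        with self.nic_lock:
            self.nic_log.append((src, dst, data))
        return "network"
```

The caller uses the same `send` call in both cases, which is the sense in which the routing is transparent to the application.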
BIP-SMP: Moving data between processes
MPI-BIP: the communication middleware
MPI-BIP adds high-level features to BIP
– based on the MPICH implementation
– provides a portable and widely-used API
– implements a credit-based flow control for small messages
– request FIFO for multiple non-blocking operations
– provides segmentation/reassembly features to avoid timeouts
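Two of the features above, credit-based flow control and segmentation/reassembly, can be sketched in a few lines. This is a generic illustration of the techniques, not MPI-BIP code: the class, the function names and the one-credit-per-message accounting are assumptions.

```python
from collections import deque


class CreditSender:
    """Sender that may only transmit while it holds credits from the receiver."""

    def __init__(self, credits):
        self.credits = credits  # receive buffers known to be free
        self.pending = deque()  # messages waiting for a credit
        self.sent = []          # stands in for the wire

    def send(self, msg):
        self.pending.append(msg)
        self._drain()

    def add_credits(self, n):
        """Called when the receiver reports n freed buffers."""
        self.credits += n
        self._drain()

    def _drain(self):
        # Transmit queued messages in order, spending one credit each
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.popleft())
            self.credits -= 1


def segment(data, size):
    """Split data into fragments of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reassemble(fragments):
    """Rebuild the original data from in-order fragments."""
    return b"".join(fragments)
```

With credits, the sender never pushes a small message for which the receiver has no buffer, so nothing has to be dropped or retransmitted at this level.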
Working with the BIP software suite
Installation
– run configure
Compilation and linkage
– several libraries: bip, bip-smp, mpi
– compile with bipcc
Submitting jobs and monitoring nodes
– run myristat to know which nodes are available
– run bipconf to configure the virtual machine
– use bipload to launch programs
WebCM: a high-level management tool
web-based management tool
integrates existing solutions into a common framework
The WebCM user interface
graphical interface for myristat and bipconf
allows submission of jobs through batch packages
shows the user's virtual machine definition and the user's running processes
new functionality is added by incorporating new software packages
Latency: BIP and MPI-BIP
Throughput: BIP and MPI-BIP
BIP-SMP: intra-node communications
BIP-SMP: inter-node communications
What runs on our clusters?
Genomic simulation
Fluid dynamics
Parallel discrete event simulation
Distributed shared memory system
Want to know more?
– getting the distribution
– getting the documentation
http://resam.univ-lyon1.fr