
OpenMOSIX experiences in Naples

INFN - Napoli [1]

INFM - UDR Napoli [2]

University of Naples (Dept. of Physics) [3]

CINECA (Bologna) – November 2002

Rosario Esposito [1] [[email protected]]
Paolo Mastroserio [1] [[email protected]]
Francesco Maria Taurino [1,2] [[email protected]]
Gennaro Tortone [1] [[email protected]]


Index

- introduction
- our first cluster: MOSIX (Feb 2000)
- Majorana (Jan 2001)
- farm setup: Etherboot & ClusterNFS
- VIRGO experiment (Jun 2001)
- ARGO experiment (Jan 2002)
- conclusions


Introduction (1/2)

Why a Linux farm?

- high performance
- low cost

Problems with big supercomputers:

- high cost
- poor and expensive scalability (CPU, disk, memory, OS, programming tools, applications)


Introduction (2/2)

Why OpenMOSIX?

In this environment, (open)Mosix has proven to be an optimal solution, giving a general performance boost to the systems on which it is deployed:

- network transparency
- preemptive process migration
- dynamic load balancing
- decentralized control and autonomy


Our first cluster: MOSIX (Feb 2000) (1/2)

Our first test cluster was configured in February 2000:

10 PCs, running Mandrake 6.1, acting as public X terminals used by our students to open X sessions on a DEC Unix AlphaServer;

Those machines had the following hardware configuration:

- Pentium 200 MHz
- 64 MB RAM
- 4 GB hard disk


Our first cluster: MOSIX (Feb 2000) (2/2)

We tried to turn these "X terminals" into something more useful…

Mosix 0.97.3 with kernel 2.2.14 was used to convert those PCs into a small cluster to perform some data-reduction tests (MP3 compression with the bladeenc program).

Compressing a WAV file to MP3 format using bladeenc could take up to 10 minutes on a Pentium 200. Using the Mosix cluster, without any source-code modification, we were able to compress a ripped audio CD (14-16 songs) in no more than 20 minutes.
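The pattern is trivial to script: start one encoder process per track and let Mosix migrate the processes to idle nodes. A minimal sketch, assuming bladeenc is on the PATH and takes a WAV file as its argument:

    # Spawn one bladeenc process per WAV file in the current directory;
    # under (open)Mosix the processes migrate transparently to idle nodes,
    # so no cluster-specific code is needed.
    import glob
    import subprocess

    encoders = [subprocess.Popen(["bladeenc", wav]) for wav in glob.glob("*.wav")]
    for p in encoders:
        p.wait()  # wait until every track has been compressed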


Having verified the ability of Mosix to reduce the execution time of these "toy" programs, thanks to preemptive process migration and dynamic load balancing, we decided to implement a bigger cluster to offer a high-performance facility to our scientific community…


Majorana (Jan 2001) (1/2)

We decided to build a more powerful Mosix cluster, available to all of our users, using low-cost solutions and open-source tools (MOSIX, MPI, PVM);

The farm was composed of 6 machines:

5 computing elements with:
- Abit VP6 motherboard
- 2 Pentium III @ 800 MHz
- 512 MB PC133 RAM
- a 100 Mb/s network card (3Com 3C905C)


Majorana (Jan 2001) (2/2)

1 server with:
- Asus CUR-DLS motherboard
- 2 Pentium III @ 800 MHz
- 512 MB PC133 RAM
- 4 IDE hard disks (OS + home directories in RAID)
- a 100 Mb/s network card (3Com 3C905C) - public LAN
- a 1000 Mb/s network card (Netgear GA620T) - private LAN

All of the nodes were connected to a Netgear switch equipped with 8 Fast Ethernet ports and a Gigabit port dedicated to the server.


Farm setup: Etherboot & ClusterNFS

Diskless nodes:

- low cost
- eliminates install/upgrade of hardware and software on the diskless client side
- backups are centralized in one single main server
- zero administration at the diskless client side


Etherboot (1/2)

Description: Etherboot is a package for creating ROM images that can download code from the network to be executed on an x86 computer.

Example: centrally maintaining the software for a cluster of identically configured workstations.

URL: http://www.etherboot.org


Etherboot (2/2)

The components needed by Etherboot are (a configuration sketch follows this list):

- a bootstrap loader, on a floppy or in an EPROM on a NIC
- a BOOTP or DHCP server, for handing out IP addresses and other information when sent a MAC (Ethernet card) address
- a TFTP server, for sending the kernel images and other files required in the boot process
- an NFS server, for providing the disk partitions that will be mounted when Linux is booted
- a Linux kernel that has been configured to mount the root partition via NFS
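For illustration, a minimal ISC dhcpd entry tying these pieces together might look like the following (the MAC address, IP addresses, and paths are hypothetical):

    # /etc/dhcpd.conf (sketch): map a node's MAC address to a fixed IP,
    # the kernel image to fetch via TFTP, and the NFS root to mount.
    host node1 {
        hardware ethernet 00:01:02:03:04:05;  # the client NIC's MAC address
        fixed-address 192.168.1.11;           # IP handed out to this node
        filename "/tftpboot/vmlinuz-node";    # kernel image served via TFTP
        option root-path "192.168.1.1:/";     # root filesystem exported by NFS
    }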


Diskless farm setup: traditional method (1/2)

Traditional method

Server:
- BOOTP server
- NFS server
- a separate root directory for each client

Client:
- BOOTP to obtain an IP
- TFTP or boot floppy to load the kernel
- rootNFS to load the root filesystem


Diskless farm setup: traditional method (2/2)

Traditional method - problems: a separate root directory structure for each node is

- hard to set up: lots of directories with slightly different contents
- difficult to maintain: changes must be propagated to each directory


ClusterNFS

Description: cNFS is a patch to the standard Universal NFS server code that "parses" file requests to determine an appropriate match on the server.

Example: when client machine foo2 asks for the file /etc/hostname, it gets the contents of /etc/hostname$$HOST=foo2$$

URL: https://sourceforge.net/projects/clusternfs


ClusterNFS features

ClusterNFS allows all machines (including the server) to share the root filesystem:

- all files are shared by default
- files for all clients are named filename$$CLIENT$$
- files for a specific client are named filename$$IP=xxx.xxx.xxx.xxx$$ or filename$$HOST=host.domain.com$$
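The lookup rule is simple enough to restate in a few lines; a toy sketch of the name translation (the precedence order is our reading of the feature list, not the actual server code):

    # Toy model of ClusterNFS name translation: for a requested path,
    # prefer the most client-specific variant that exists on the server.
    import os

    def resolve(path, client_ip, client_host):
        candidates = [
            f"{path}$$IP={client_ip}$$",      # specific client, by IP
            f"{path}$$HOST={client_host}$$",  # specific client, by hostname
            f"{path}$$CLIENT$$",              # file shared by all clients
            path,                             # plain shared file
        ]
        for candidate in candidates:
            if os.path.exists(candidate):
                return candidate
        return path

    # resolve("/etc/hostname", "192.168.1.2", "foo2") returns
    # "/etc/hostname$$HOST=foo2$$" when that file exists on the server.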


Diskless farm setup with ClusterNFS (1/2)

ClusterNFS method

Server:
- BOOTP server
- ClusterNFS server
- a single root directory for server and clients

Clients:
- BOOTP to obtain an IP
- TFTP or boot floppy to load the kernel
- rootNFS to load the root filesystem


Diskless farm setup with ClusterNFS (2/2)

ClusterNFS method - advantages:

- easy to set up: just copy (or create) the files that need to be different
- easy to maintain: changes to shared files are global
- easy to add nodes


VIRGO experiment (Jun 2001) (1/4)

VIRGO is a collaboration between Italian and French research teams for the realization of an interferometric gravitational-wave detector;

The main goal of the VIRGO project is the first direct detection of gravitational waves emitted by astrophysical sources;

Interferometric gravitational-wave detectors produce a large amount of "raw" data that require significant computing power to be analysed.

To satisfy such a strong requirement of computing power we decided to build a Linux cluster running MOSIX.


VIRGO experiment (Jun 2001) (2/4)

[Hardware diagram: 12 SuperMicro 6010H farm nodes (18 GB each) connected through an internal switch and a data switch to the VIRGO Lab LAN, with an AlphaServer 4100 (144 GB) as storage]

Farm nodes: SuperMicro 6010H
- dual Pentium III 1 GHz
- RAM: 512 MB
- HD: 18 GB
- 2 Fast Ethernet interfaces
- 1 Gigabit Ethernet interface (only on the master node)

Storage: AlphaServer 4100, HD: 144 GB


VIRGO experiment (Jun 2001) (3/4)

The Linux farm has been thoroughly tested by executing intensive data-analysis procedures based on the Matched Filter algorithm, one of the best ways to search for known waveforms within a signal affected by background noise.

Matched Filter analysis has a high computational cost, as the method consists of an exhaustive comparison between the source signal and a set of known waveforms, called "templates", to find possible matches. Using a larger number of templates improves the identification of known signals, but a great number of floating-point operations has to be performed.
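In essence, each comparison is a correlation of the data stream with one template. A minimal NumPy sketch of the core loop (illustrative only; real gravitational-wave pipelines work in the frequency domain and whiten the data first):

    # Matched-filter core: slide each template over the signal and keep
    # the template with the highest peak correlation. Every template is
    # independent, so the workload splits naturally across processes.
    import numpy as np

    def best_match(signal, templates):
        scores = []
        for t in templates:
            corr = np.correlate(signal, t, mode="valid")  # all offsets
            scores.append(np.max(np.abs(corr)) / np.linalg.norm(t))
        best = int(np.argmax(scores))
        return best, scores[best]  # index of the best template and its score

The per-template independence is exactly what makes the workload easy to split into migratable processes on the cluster.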

Running Matched Filter test procedures on the MOSIX cluster has shown a progressive reduction of execution times, due to the high scalability of the computing nodes and efficient dynamic load distribution;


VIRGO experiment (Jun 2001) (4/4)

[Plot: measured vs. theoretical speed-up of repeated Matched Filter executions, for 1 to 24 processors; speed-up scale 0 to 30]

The increase of computing speed with respect to the number of processors does not follow an exactly linear curve; this is mainly due to the growth of communication time, spent by the computing nodes to transmit data over the local area network.
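A simple model makes the flattening explicit (an illustrative formula of ours, not a fit to the measurements): with single-processor time $T_1$ and a communication overhead $T_{\mathrm{comm}}(N)$ that grows with the number of processors $N$,

    S(N) = \frac{T_1}{T_1 / N + T_{\mathrm{comm}}(N)}

so the measured curve stays below the ideal $S(N) = N$, and the gap widens as $N$ grows.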


ARGO experiment (Jan 2002) (1/3)

The aim of the ARGO-YBJ experiment is to study cosmic rays, mainly cosmic gamma-radiation, at an energy threshold of ~100 GeV, by means of the detection of small size air showers.

This goal will be achieved by operating a full-coverage array in the Yangbajing Laboratory (Tibet, P.R. China) at 4300 m a.s.l. As we have seen for the VIRGO experiment, the analysis of the data produced by ARGO requires a significant amount of computing power. To satisfy this requirement we decided to implement an OpenMOSIX cluster.


ARGO experiment (Jan 2002) (2/3)

Currently ARGO researchers are using a small Linux farm, located in Naples, consisting of:

- 5 machines (dual 1 GHz Pentium III with 1 GB RAM) running RedHat 7.2 + openMosix 2.4.13
- 1 file server with 1 TB of disk space


ARGO experiment (Jan 2002) (3/3)

At this time the Argo OpenMOSIX farm is mainly used to run Monte Carlo simulations using “Corsika”, a Fortran application developed to simulate and analyse extensive air showers.

The farm is also used to run other applications such as GEANT to simulate the behaviour of the Argo detector.

The OpenMOSIX farm is responding very well to the researchers' computing requirements, and we have already decided to upgrade the cluster in the near future, adding more computing nodes and starting the analysis of real data produced by ARGO.

So far, ARGO researchers in Naples have produced 212 GB of simulated data with this OpenMOSIX cluster.


Conclusions

The most noticeable features of OpenMOSIX are its load-balancing and process-migration algorithms, which mean that users need no knowledge of the current state of the nodes.

This is most useful in time-sharing, multi-user environments, where users have no means of knowing (and are usually not interested in) the status of the nodes (e.g. their load).

A parallel application can be executed by forking many processes, just as on an SMP machine; OpenMOSIX continuously attempts to optimize the resource allocation (see the sketch below).
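A generic sketch of that pattern (not code from the slides), using nothing but the standard Unix fork/wait calls:

    # Plain fork-based parallelism: each child runs one CPU-bound chunk.
    # On an OpenMOSIX cluster the children may be migrated to idle nodes;
    # the program itself contains no cluster-specific code.
    import os

    NWORKERS = 8

    pids = []
    for _ in range(NWORKERS):
        pid = os.fork()
        if pid == 0:                          # child: do one chunk of work
            sum(i * i for i in range(10_000_000))
            os._exit(0)                       # child exits without returning
        pids.append(pid)                      # parent: remember the child

    for pid in pids:
        os.waitpid(pid, 0)                    # wait for all workers to finish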