Open Distributed Systems, December 15, 2009

Building a Supercomputer


Table of Contents

Table of Figures
1.0 Introduction
  1.1 An Overview
  1.2 Motivation
  1.3 Aims & Objectives
2.0 Review
  2.1 High Performance Computing
    2.1.1 Types of HPC architectures
  2.2 Supercomputers
    2.2.1 Processing Techniques
    2.2.2 Operating Systems
    2.2.3 Programming
  2.3 Clusters
    2.3.1 Cluster Categorizations
    2.3.2 Hardware issues
    2.3.3 Grid Computing and Clusters
    2.3.4 Clustering, Linux and HPC
    2.3.5 Cluster Programming
    2.3.6 Beowulf Clusters
  2.4 OpenMosix
  2.5 ClusterKnoppix
  2.6 Cluster Hardware
  2.7 Testing the Cluster
    2.7.1 Fractals
    2.7.2 Audio Encoding
    2.7.3 Image Rendering
  2.8 Networking Hardware fundamentals
    2.8.1 Cable Medium
    2.8.2 Network Switch
    2.8.3 Network Interface Card (NIC)
    2.8.4 Chapter Conclusion
3.0 Implementation
  3.1 Introduction
  3.2 Building the Cluster
    3.2.1 Network / OS "Installation" and set-up
4.0 Testing & Conclusions
  4.1 "Over-Loading" the Cluster
  4.2 Fractal Calculation
  4.3 Audio Encoding
  4.4 Image rendering
  4.5 Conclusions
References

Table of Figures

Figure 1 - Beowulf cluster at the Centre for Advanced Computing Research at Caltech (2001). The Beowulf Project at CACR. Retrieved 22 December 2009.
Figure 2 - Shared memory HPC architecture. Source: Carol Gauthier, Centre de Calcul Scientifique (CCS), Université de Sherbrooke.
Figure 3 - Operating systems used on Top 500 supercomputers. Eigenes Werk (2005), Top500 Supercomputers (2009). Retrieved December 18, 2009, from http://top500.org
Figure 4 - Roadrunner, the world's first petaflop computer. Leroy N. Sanchez (2008), United States Department of Energy. Retrieved December 19, 2009, from http://www.lanl.gov/
Figure 5 - HPC distributed memory architecture. Gauthier, C.G. (2003). Introduction to HPC. Centre de Calcul Scientifique (CCS), Université de Sherbrooke.
Figure 6 - A basic high performance cluster. Narayan, A.N. (2005). High-performance Linux clustering, Part 1: Clustering fundamentals. Retrieved December 25, 2009, from http://www.ibm.com/developerworks/linux/library/l-cluster1/
Figure 7 - NASA 128-processor Beowulf cluster. Computational Science & Engineering Research Institute, Michigan Tech. Retrieved December 20, 2009, from www.cs.mtu.edu/~merk/public/websitedir/cseri.html
Figure 8 - The Mandelbrot set. University of Utah. Retrieved December 20, 2009, from http://www.math.utah.edu/~pa/math/mandelbrot/mandelbrot.html
Figure 9 - Glasses, an image created by POV-Ray. Author Gilles Tran, Oyonale - 3D art and graphic experiments. Retrieved December 22, 2009, from http://www.oyonale.com/modeles.php?lang=en&page=40
Figure 10 - Blue stranded category 5 cable with RJ45 plugs. Retrieved December 23, 2009, from http://en.wikipedia.org
Figure 11 - How to wire a Cat5 UTP crossover cable. UTP cabling how-to. Retrieved December 22, 2009, from www.patraswireless.net/.../cable_utp.htm
Figure 12 - The ClusterKnoppix environment along with the main window of OpenMosixView, which shows information about the cluster.
Figure 13 - Cluster testing script, running ClusterKnoppix as a master node to a CHAOS drone army. Retrieved December 26, 2009, from http://www.midnightcode.org/projects/chaos/
Figure 14 - Process migration shown by the migMon tool, which is part of OpenMosixView; the two nodes can be seen as well as the process migrating from one node to the other.
Figure 15 - Creating the Mandelbrot fractal by using the cluster reduced the calculation time.


1.0 Introduction

Processing capacity in modern computers seems to be increasing exponentially compared with the not so distant past; on the other hand, an even greater increase in computing power is demanded by certain "power-hungry" applications.

Specific classes of computing applications present a demand in processing power and resources that exceeds what a typical personal computer can offer. Rendering photorealistic animations, MPEG video encoding and audio file compression to MP3 or OGG format are everyday applications that require large amounts of computing power, particularly in speed of calculation, making them challenging even for modern multi-core processors. Other, more "exotic" and calculation-intensive tasks, such as weather forecasting, climate research, molecular modeling, simulation of nuclear weapon detonations and research into nuclear fusion, require computing machines at the frontline of current processing capacity, widely known as "Supercomputers".

Another definition (Miller, 2008) presents the supercomputer as a mainframe computer that is one of the most powerful available at a given time, typically a one-of-a-kind custom design with a comparably high cost.

Clustering, on the other hand, represents the lower end of supercomputing, a more build-it-yourself approach (Sharma, 2004). Examples like the Beowulf Project (www.beowulf.org) illustrate how to use off-the-shelf PC hardware, connected through Fast Ethernet networks, along with the Linux operating system to build a supercomputer at a fraction of the cost, making high performance computing (HPC) accessible even to home users.


Supercomputer is a generic term that refers to a computer typically used for scientific and engineering applications that perform a large amount of computation. Mayank Sharma (2004)

Figure 1 - Beowulf cluster at the Centre for Advanced Computing Research at Caltech


1.1 An Overview

During the course of this report a load-balancing cluster will be implemented using Linux clustering technology, and more precisely ClusterKnoppix in the form of live CDs. Various calculation-intensive tasks, for example the creation of fractals, audio encoding and 3D image rendering, will be undertaken by the cluster and benchmarked against execution in a single-processor environment.

The following list illustrates the structure of this report:

Chapter 1 - Introduction
Chapter 2 - Review
Chapter 3 - Implementation
Chapter 4 - Testing and Conclusions
Appendices

In more detail:

The Review chapter presents vital background knowledge, ranging from basic definitions to hardware and implementation issues, as illustrated through research.

The Implementation part of this report deals, as its name implies, with the actual creation of the load-balancing cluster.

The Testing and Conclusions chapter illustrates the whole benchmarking process along with any observations made, closing with conclusions as well as remarks about the whole process.


1.2 Motivation

As stated in paragraph "1.0 Introduction", modern computer applications require large amounts of computing power, a need that is partially covered by off-the-shelf solutions that range in processing power as well as market price. On the other hand, home users on limited budgets fail to leverage the full potential modern technology has to offer in terms of processing power and computing resources.

Multi-core processors and high speed computer peripherals are not accessible to a broad range of users. Besides the social and ethical issues that arise from having different "classes" of users, a more practical issue comes to the surface: even the technologies at the frontline of high-end home computing seem to be hampered by "demanding" applications. Video and audio editing hobbyists, computer scientists and university students who do not have full-time access to High Performance Computing equipment are all affected by this lack of processing power.

With the use of clustering technology and freely available open source software, supercomputers can now be created for a fraction of the cost, making HPC available to computer hobbyists as well as more scientifically and technically oriented users.

1.3 Aims & Objectives

High performance computing (HPC) is becoming more accessible and easier to implement; the reasons are the growing acceptance of open source software (Linux in particular) and the advent of clustering technology. Aditya Narayan (2005)

The aim of this report is to put the above assumption to the test and to illustrate how Linux and clusters have changed HPC by enabling home users as well as industries to leverage the computing power of a supercomputer. By benchmarking the cluster against a single processing unit in terms of the real time required to undertake certain tasks, a secondary objective of this report is also met: to establish, based on experimentation, that the use of clustering has a positive impact on calculation-intensive tasks.

An alternative title for this report could have been "new uses for old PCs" or "keeping your old computers alive", as it will be made obvious that an HPC Linux cluster can be created just by utilizing old personal computers with a minimum set of requirements in computing power and hardware.


2.0 Review

In this chapter the underpinning theories, hardware and technologies, as well as basic background knowledge concerning clustering, and Linux HPC in particular, will be discussed and analyzed in order to form a basis for the next chapter, which tackles implementation issues. The list below illustrates some of the notions discussed in this chapter:

High Performance Computing (HPC)
Supercomputers
  o SMP, symmetric multiprocessing
  o MPP, massively parallel processing
Clusters
  o High Availability Clusters
  o Load Balancing
  o Compute Clusters
  o Grid Computing
Beowulf clusters
OpenMosix
ClusterKnoppix
Cluster Hardware
Testing the Cluster
  o Audio Encoding
  o Fractals
  o Image Rendering
Networking Hardware Fundamentals
  o Ethernet
  o Ethernet Cables (UTP, Crossover, STP)
  o Network Switch

2.1 High Performance Computing

High Performance Computing (HPC) refers to the use of supercomputers in solving extreme scientific computation problems (Young and Guo, 2005); at the time this report was written, any advanced computer system approaching the teraflops region (one teraflop being 10^12 floating-point operations per second) was considered an HPC system. Predominantly the term HPC has been used for scientific computing and research; more recently HPC has also found applications in business, for data mining and warehousing as well as transaction processing.


At this point confusion with supercomputing needs to be avoided: the two terms are similar, and supercomputing is sometimes used as a synonym for HPC and sometimes referred to as a powerful subset of HPC (Young and Guo, 2005). In this report supercomputers are considered a subset of HPC.

2.1.1 Types of HPC architectures

The concept of parallelism is used by the majority of HPC systems; as stated by Narayan (2005), HPC systems can be classified based on hardware architecture:

Symmetric Multiprocessors (SMP), which use a number of processors that share the same memory

Vector Processors, where a powerful CPU is optimized to manipulate arrays or vectors

Clusters, the predominant type of hardware used for HPC, presented in more detail later in the Review chapter.

Gauthier (2003) defines the domains of HPC applications as follows:

Fluid Dynamics
Physics & Astrophysics
Nanoscience
Chemistry and Biochemistry
Biophysics and Bioinformatics
Databases & Data Mining
Image and Signal Processing
and more...


2.2 Supercomputers

Numerous definitions exist that try to describe the term supercomputer. In the introductory chapter of this report supercomputers were defined as powerful computers used for applications that perform a large amount of computation (Sharma, 2004), as well as mainframe computers that are the most powerful available at a given time (Miller, 2008).

In this paragraph the term supercomputer is going to be defined more closely in terms of architectures, programming trends and operating systems.

2.2.1 Processing Techniques

Most supercomputing systems are multiple interlinked computers performing parallel processing, usually following one of the following approaches (Sharma, 2004):

SMP, symmetric multiprocessing, using a shared memory, operating system and I/O bus

MPP, massively parallel processing, characterized by a number of processors with separate OS and memory

Vector processing, utilizing a powerful CPU for specific types of applications (manipulating vectors or arrays)

Clustering, presented in more detail in the following paragraph, using distributed memory, a number of commodity processors and custom interconnects, with the best known example being the Beowulf project.

2.2.2 Operating Systems


Figure 2 - Shared memory HPC architecture, source: Carol Gauthier, Centre de Calcul Scientifique (CCS), Université de Sherbrooke


Operating systems used in supercomputers are predominantly variants of Linux and UNIX. Figure 3 illustrates the use of operating systems in supercomputers from the early 90's until the present time.

2.2.3 Programming

Due to the parallel architectures of supercomputers, special programming languages and software tools are used to exploit their processing power and speed. The base languages for supercomputer program coding are FORTRAN and C, used together with specialized libraries, usually adapted to the specific problem and architecture. Depending on the processing technique and architecture adopted, a new language based on the hardware may be developed as well (Perrott and Aliabadi, 1986, p. 22).
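As an illustration of the library-based approach described above, the minimal sketch below distributes a trivial summation across processes with MPI. It is not taken from this report's implementation; it simply shows the message-passing style, and it assumes an MPI implementation (for example Open MPI or MPICH) providing mpicc and mpirun.

/* parallel_sum.c - illustrative MPI sketch: each process sums a slice
   of 1..1,000,000 and the partial results are combined on rank 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    long local = 0, total = 0;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

    /* each rank sums a disjoint slice of the range */
    for (long i = rank + 1; i <= 1000000; i += size)
        local += i;

    /* combine the partial sums on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld\n", total);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and started with, for example, "mpirun -np 4 ./parallel_sum", each node of a cluster contributes one or more processes to the same calculation.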


Figure 3 - OS used for supercomputers, large scale use of UNIX & Linux is obvious. source: Eigenes Werk/top500.org


2.3 Clusters

In the context of HPC hardware, clusters are the predominant type (Narayan, 2005). A cluster is made up of a number of MPP nodes; as described in paragraph 2.2.1, the MPP approach uses processors with their own memory, operating system and I/O subsystem. In the context of clustering they are referred to as nodes and are capable of communicating with other nodes through high speed Ethernet.

The term computer cluster refers to a group of computers interlinked in such a way that in many respects they form a single computer; Figure 5 illustrates the architecture of a 4-node cluster.


Figure 4 - Roadrunner, world's first petaflop computer, source: http://www.lanl.gov/news/albums/computer/Roadrunner_1207.jpg


2.3.1 Cluster Categorizations

The term cluster can take various meanings in different contexts; as stated by Narayan (2005), it is possible to categorize clusters as follows:

High-availability (HA) or Fail-Over clusters have the primary purpose of extending and maintaining the availability of the system and are usually implemented for mission critical systems. They use redundant nodes, which are utilized when other system nodes fail, in order to maintain system functionality; the minimum number of nodes in a fail-over cluster is 2 in order to have redundancy.


Figure 5 - HPC distributed memory architecture, Clusters


High-performance or Compute clusters run parallel programs; this architecture is used for applications requiring time-intensive computations, usually in scientific computing. Figure 6 illustrates a basic high performance cluster.

Figure 6 - A Basic High Performance Cluster

Load Balancing clusters are multiple computers linked together to share processing workload and form a single virtual yet powerful computer. Computers in a cluster are referred to as nodes; one or more master nodes and several drone nodes form the cluster. Usually the applications are initialized on a master node and processes migrate to other nodes when required; this is referred to as load balancing. In more detail, load balancing means spreading computational work among different machines in order to improve performance, thus forming a supercomputer. This is the type of cluster that will be created in the Implementation chapter of this report.

2.3.2 Hardware issues

Instead of using specialized, custom-built and optimized processing hardware, clusters are built using off-the-shelf commodity hardware, thus keeping the cost at a fraction of other supercomputer implementations, for example HPC with vector processors.

Other hardware features worth mentioning are:

Programs have to be explicitly coded to make use of distributed hardware.

More nodes can be added to the cluster based on system needs.

The use of commodity hardware enables clusters to have a much lower maintenance cost, take up less space, consume less power and need less cooling; the heat generated by modern supercomputers is a major HVAC (Heating, Ventilating and Air-conditioning) challenge faced by modern supercomputer manufacturers.

2.3.3 Grid Computing and Clusters

Narayan (2005) states that Grid Computing can be considered a broad term which refers to Service-Oriented Architectures (SOA). HPC based on clusters can be thought of as a special case of Grid Computing that incorporates tightly coupled nodes, as opposed to the loosely coupled nodes used for Grid Computing.

One well known example of grid computing is SETI@home (http://setiathome.ssl.berkeley.edu/), which uses idle CPU time from home PC users connected to the internet in order to analyse radio telescope data in the search for extraterrestrial intelligence.

2.3.4 Clustering, Linux and HPC

The adoption of open source clustering software, mainly Linux based, and advances in commodity hardware have enabled even home users to enter the world of HPC by building powerful clusters on a relatively small budget and expanding them by adding extra nodes when more computing power is needed (Robbins, 2005).

Before the advent of Linux and the adoption of Clustering technology, a typical supercomputer usually equipped with a vector processor could cost millions.

Aditya Narayan (2005) suggests that Linux and clustering technology have changed HPC by making it easier to implement and more accessible to the wider public.

2.3.5 Cluster Programming

Due to the parallel nature of clusters, existing non-parallel programs must be re-written in order to perform well and take full advantage of a cluster's capabilities; in other words, programs should be written explicitly to take advantage of the underlying cluster hardware (Narayan, 2005).


It is safe to assume that "software and hardware go hand in hand when it comes to achieving high performance on a cluster", as stated by Sharma (2004).

2.3.6 Beowulf Clusters

A very popular example is the Beowulf project; this type of implementation uses off-the-shelf PC processors and the Linux operating system in multiple nodes interconnected using high speed Ethernet. Figure 7 illustrates a Beowulf cluster.

Figure 7 - NASA 128-processor Beowulf cluster: A cluster built from 64 ordinary PC's.

For these systems to be able to fully utilize their resources, specialized cluster-enabled applications are required. These applications use advanced clustering libraries, with the most popular being PVM and MPI, allowing computations and problem solving at a rate that scales almost linearly with the number of machines in the cluster (Robbins, 2005).

On the other hand, Beowulf clusters require specialized, custom-made software that is PVM or MPI "aware" in order to take full advantage of the cluster's capabilities and hardware. While not a problem for scientific communities and research institutes, this presents a major drawback for the wider public that simply wishes to implement a basic cluster and witness an increase in performance.

Since the applications that even the advanced home user runs are not usually implemented to be PVM or MPI aware, the power of the cluster cannot be leveraged, which keeps clustering accessible to only a small percentage of users.

OpenMosix technology enables standard Linux applications to take advantage of a cluster's capabilities without any need for the application code to be rewritten or recompiled.

2.4 OpenMosix

OpenMosix adds clustering capabilities to the Linux kernel. By using adaptive load-balancing techniques, it enables any process to migrate from one node to another in order to execute faster, and as far as the system is concerned the process is still running locally at the "home" node where it was initiated.

At this level of transparency, OpenMosix load-balancing technology can be taken advantage of without any "special programming", as required by the other solutions mentioned in the previous paragraph. An OpenMosix installation by default "chooses" the "best" node and then migrates the process to it for optimal execution (Robbins, 2005).

OpenMosix turns a number of interlinked computers running Linux into a load-balancing cluster resembling a large scale SMP (symmetric multiprocessing) system. A basic difference is that in an SMP environment data exchange speeds are typically very high, while in an OpenMosix cluster data exchange speed and effectiveness depend solely on the interconnecting technology used; Gigabit Ethernet produces optimal results.

On the other hand, an OpenMosix cluster can be extended using low-cost personal computers. According to the OpenMosix Project (http://openmosix.sourceforge.net/), OpenMosix can scale up to more than 1,000 nodes, while an SMP system composed of multiple processors can be extremely expensive (Buytaert, 2004).


2.5 ClusterKnoppix

ClusterKnoppix derives from a distribution of Linux called Knoppix, with clustering support added by incorporating the OpenMosix kernel.

ClusterKnoppix features, as illustrated by the development team (The ClusterKnoppix Project, 2004), include:

OpenMosix terminal server - PXE, DHCP and TFTP are used in order to boot Linux clients (cluster nodes) via the network

Auto-discovery, which automatically adds new nodes joining the cluster without any further configuration; this implies no need for an optical, hard disk or floppy drive in the node

Cluster management tools used for managing the cluster

X-terminal server, which provides a graphical user interface (GUI) for networked computers

This specific Linux distribution comes in Live CD format, which enables a complete Linux system to boot from the optical drive without the need to install ClusterKnoppix on a local hard drive.

2.6 Cluster Hardware

In order to set up a minimum OpenMosix Cluster, at least 2 networked systems running Linux are required.

For optimal performance the following system configuration is recommended (Buytaert, 2004):

Cable medium: 100 Mbit (Fast) Ethernet is strongly recommended; legacy 10BaseT (10 Mbit) will work at a cost in performance, and Gigabit Ethernet produces optimal results with an increase in cost.

Switch: the use of dedicated high speed switches can increase the performance of the cluster by enabling communication between nodes in "full-duplex" mode. For a cluster comprised of two or three nodes, a UTP crossover cable can take the place of a switch and enhance performance at minimal cost; later in the Review chapter cable mediums are discussed in further detail.

NIC: any 10/100/1000 network card will suffice, with Gigabit network cards being the ideal solution for optimal results.


Hard disk swap space: a sufficient amount of swap space is needed in case a node is removed, in order to prevent the other nodes from running out of virtual memory.
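Purely as an illustration of how such swap space might be provisioned on a Linux node (this is not a step from the original set-up; the file name and size are arbitrary), a swap file can be created and enabled as follows:

dd if=/dev/zero of=/swapfile bs=1M count=512   # create a 512 MB file
mkswap /swapfile                               # format it as swap space
swapon /swapfile                               # enable it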

2.7 Testing the Cluster

A specific class of applications that require large amounts of computing power, in particular the creation of fractals, audio encoding and image rendering, is used to test the cluster.

In the following paragraphs the above applications will be briefly introduced and discussed.

2.7.1 Fractals

Fractals can be defined as “a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole” Mandelbrot (1982).

Figure 8 - The Mandelbrot set, a famous fractal

By appearing similar at any level of magnification, fractals are considered to be infinitely complex. Natural objects that are approximated by fractals to a degree include clouds, snowflakes, various vegetables (cauliflower and broccoli) as well as coastlines and mountain ranges (Falconer, 2003).

Fractal generating software is used for fractal creation using recursion, without requiring any mathematical knowledge from the user; nevertheless, the generation of "genuine" fractals requires large amounts of computing power, which makes it an ideal application to test the cluster's performance.
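To make the kind of calculation involved concrete, the sketch below shows the classic escape-time iteration (repeatedly squaring and adding the candidate point) used to decide how quickly a point leaves the Mandelbrot set. It is only a minimal illustration of why fractal generation is CPU-bound, not the code used by Kandel or any other tool mentioned in this report.

/* mandel.c - minimal escape-time test for one point of the Mandelbrot set.
   A real renderer repeats this for every pixel, which is what makes the
   workload so processor-intensive. */
#include <stdio.h>

/* number of iterations before the point escapes, or max_iter if it
   appears to belong to the set */
static int mandel_iterations(double cx, double cy, int max_iter)
{
    double x = 0.0, y = 0.0;
    int n = 0;
    while (x * x + y * y <= 4.0 && n < max_iter) {
        double xt = x * x - y * y + cx;   /* real part of z*z + c */
        y = 2.0 * x * y + cy;             /* imaginary part of z*z + c */
        x = xt;
        n++;
    }
    return n;
}

int main(void)
{
    /* sample a single point; a renderer would loop over every pixel */
    printf("iterations for c = -0.75 + 0.1i: %d\n",
           mandel_iterations(-0.75, 0.1, 1000));
    return 0;
}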

More information about fractals can be found on: http://math.fullerton.edu/mathews/c2003/Fractal-MandelbrotBib.html

2.7.2 Audio Encoding

Audio encoding is a process which reduces the data rate or storage size of digital audio signals. Pan (1993)

When converted from analogue to digital, uncompressed digital audio files, wave files for instance, require a significant amount of storage space. Audio encoding, as a form of data compression, more specifically audio compression, reduces the storage size of audio files as well as the bandwidth required for transmitting digital audio streams.
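As a worked example with the standard CD-audio parameters (44.1 kHz sampling, 16 bits, stereo):

44,100 samples/s x 16 bits x 2 channels = 1,411,200 bit/s (about 1.4 Mbit/s)
one minute of WAV audio = 1,411,200 x 60 / 8 bytes, roughly 10.6 MB
the same minute at 128 kbit/s MP3 is roughly 1 MB, an 11:1 reduction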

Specialized audio compression algorithms, called codecs, are usually utilized by software solutions to compress audio, because generic data compression methods exhibit poor performance when it comes to audio.

Some codec examples are MPEG-1 Layer 3 (MP3), OGG and FLAC, with the first two producing a digitally degraded version of the original data ("lossy" compression), while the latter reproduces the exact bit-stream of the original when uncompressed.

Robbins (2005) states that audio as well as video encoding applications are ideal for testing the cluster; being CPU-intensive they can benefit from a clustered environment, while at the same time they do not create an excessive demand in I/O that could overwhelm the network and hence reduce performance.


2.7.3 Image Rendering

3D computer graphics use a three-dimensional representation of geometric data (a model), usually in Cartesian form, stored in the computer in order to perform calculations and finally render 2D images (Shirley, 2005). A model is usually represented by a data structure holding information about geometry, viewpoint, texture, lighting and shading.

The process of generating the actual image from a model is referred to as rendering. A number of algorithms have been developed and researched, the most popular being rasterisation, ray casting, radiosity and ray tracing. Rendering finds use in architecture, video games, simulators, and movie and TV special effects. Figure 9 presents a rendered image created by POV-Ray, an open source software package available for Linux.


Figure 9 - An image created by using POV-Ray.


Photorealistic computer generated images like the above, and animations, are created or rendered with sophisticated and compute-intensive software, for example MAYA, 3D-MAX and Mental Ray. These specialized programs are characterised by "heavy" use of the computer's central processing unit; depending on the "complexity" of a scene and, for an animation, the number of frames, the rendering process can take up to several hours.

While professionals and animation studios use specialised and expensive equipment or outsource the rendering process to companies called "CPU or rendering farms", home users, hobbyists and even smaller studios cannot afford these solutions. Modern Graphics Processing Units (GPUs) have improved to such a degree that they compete with computer CPUs in computing power in the context of 3D graphics rendering. Unfortunately, even for this hardware 3D computer graphics are challenging: considering that a single frame, depending on the complexity of the scene, can take several hours to render, and that one minute of animation contains 1,800 frames (30 FPS x 60), a number of issues arise.
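As a rough, purely illustrative calculation (the per-frame time below is an assumption, not a measurement from this report):

1 minute of animation = 30 FPS x 60 s = 1,800 frames
at an assumed 2 hours per frame: 1,800 x 2 h = 3,600 hours, about 150 days on one machine
spread over N equally fast nodes, the wall-clock time falls to roughly 3,600 / N hours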

All the above considerations make 3D rendering a perfect candidate for testing the efficiency of the Linux cluster by using it as a "rendering farm".

2.8 Networking Hardware fundamentals

In the following paragraphs networking hardware fundamentals are introduced and presented as background knowledge; in addition, the hardware choices used in the implementation of the cluster are discussed.

In order to interconnect the nodes in the cluster, a member of a frame-based family of computer networking technologies is used, namely Ethernet. It defines a number of wiring, signaling, addressing and topology standards for a Local Area Network.


2.8.1 Cable Medium

Category 5 cable, often referred to as Cat5, is a twisted pair cable with high signal integrity, available either shielded or unshielded. The term "twisted" refers to the type of wiring in which two conductors are twisted together in order to cancel out electromagnetic interference (EMI) from external sources (Lammle, 2007).

Shielded Twisted Pair (STP) has further metal shielding over each pair of copper wires for additional protection from external EMI.

Unshielded Twisted Pair (UTP) is the most common cable used in computer networks, as well as in the most common networking standard, Ethernet. UTP cable does not have the additional shielding, making it less expensive.

Cat5 UTP cable will be used for interconnecting the testing Cluster, Figure 10 presents a Cat5 Ethernet cable.

Figure 10 - Blue stranded category 5 cable with 8P8C modular connector RJ-45

2.8.2 Network Switch

As discussed in paragraph 2.6 Cluster Hardware, using dedicated high speed switches can increase the performance of the cluster, especially when communication is in "full-duplex" mode.

A network switch is a computer networking device that connects network segments (Lammle, 2007). A standard 10/100 Ethernet switch operates at the data-link layer of the OSI model; switches may also operate at additional OSI layers (physical, data link, network). A characteristic of switches that finds great application in cluster computing is their ability to enable data transfer between pairs of nodes at the same time, in full-duplex communication.


Another characteristic of switches worth mentioning is their ability to manage the network traffic that passes through them, as well as to act as a "bridge" connecting different networks.

For a cluster comprised of two or three nodes, a UTP crossover cable that takes the place of a switch can enhance performance at minimal cost; such a cable can be created from a standard UTP cable, minimizing costs even further. Figure 11 illustrates how to wire a UTP crossover Cat5 cable.

Figure 11 - How to wire a crossover cable

2.8.3 Network Interface Card (NIC)

Lammle (2007) defines a Network Interface Card, or LAN adapter, as a computer hardware component designed to enable computers to communicate over a computer network through a cable medium. For this specific implementation ordinary 100 Mbps Ethernet network adapters were used; for optimum performance, Gigabit Ethernet cards connected through a Gigabit switch are advised.

2.8.4 Chapter Conclusion

This concludes the Review chapter, which aimed at providing all the required background knowledge in terms of both theoretical and more practical issues. The following chapter presents the actual implementation and testing of the cluster.


At this point it is worth mentioning that the Review chapter has presented a wide range of different subjects from computer science, networking, computer graphics, hardware, image processing and other scientific fields. The intention was to provide a basic introduction to the fields that affect cluster computing in some way. Covering these subjects in more detail would be out of the scope of this report; more information can be found in the references section.


3.0 Implementation

3.1 Introduction

In this chapter the actual Cluster will be implemented as well as tested under the following scenarios:

Load Balancing: the process of load-balancing and process migration in the cluster will be illustrated by using the ClusterKnoppix built-in tracking tools, migMon and OpenMosixView, and shell scripting.

Fractal Calculation: a Mandelbrot fractal will be created using Kandel under Linux, which is cluster enabled and can take advantage of cluster environments.

Audio Encoding: audio files (WAV) will be encoded into MPEG Layer 3 and OGG using open source software.

Image Rendering: a 640x480 resolution image will be rendered using POV-Ray, an open source 3D image creation and rendering software tool under Linux.

3.2 Building the Cluster

As mentioned in the Introductory chapter, for the purpose of this report a basic 2-node cluster will be implemented; for this, two outdated personal computers were used, connected through a UTP crossover Ethernet cable.

Master: Celeron M 1.4 GHz CPU, 256 MB SD-RAM
Drone: AMD 1.2 GHz CPU, 256 MB SD-RAM

Both nodes are equipped with a CD-ROM drive and are enabled to boot from CD at start-up.

At this point, in order to avoid any confusion with the terms Master and Drone that were used to refer to the nodes, the following clarification is in order: OpenMosix, and ClusterKnoppix by extension, does not imply any master/slave architecture or any central controlling node. Each node is autonomous, and the network is dynamically adjusted in order to add or remove nodes in real time without affecting the overall functionality of the system (Robbins, 2005).

3.2.1 Network / OS “Installation” and set-up

Setting up the cluster is relatively easy and requires basic OS and computer networking knowledge. ClusterKnoppix comes in the form of a Live CD, so no installation to the local hard drive is necessary. In fact, a cluster can be created with only one node equipped with a hard drive, in order to launch applications from that node; all the other nodes of the cluster need only be equipped with a bootable CD-ROM drive in order to "run" ClusterKnoppix.

All the preparatory steps are illustrated below:

Create as many ClusterKnoppix CDs as there are nodes in the cluster.
Enable the "Boot from CD-ROM" BIOS function in all nodes.
Connect the nodes using either a crossover UTP cable or a switch, depending on the number of nodes.
Power on the nodes in any order.

The cluster is almost set up; the next step is to adjust the network connections:

Network IP address: 192.168.1.0
Net mask: 255.255.255.0
Default gateway: 192.168.1.1
IP address of Node 1: 192.168.1.1
IP address of Node 2: 192.168.1.2

All the network configuration can be adjusted by entering the following commands in the Linux console of each node, changing only the IP address:

ifconfig eth0 192.168.1.10
route add -net 0.0.0.0 gw 192.168.1.1
omdiscd


Executing the first two commands defines the static IP address of each node in the cluster, and entering the "omdiscd" command automatically adjusts the network for OpenMosix.
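For example, following the addressing plan listed above, the second node would be configured with its own address while the gateway and the discovery command stay the same (shown here only as an illustration of the per-node change):

ifconfig eth0 192.168.1.2
route add -net 0.0.0.0 gw 192.168.1.1
omdiscd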

As was made obvious, setting up an OpenMosix cluster did not involve any specialized knowledge on the user's part; in fact the whole process requires only a small amount of time due to the Live CD nature of ClusterKnoppix. From a hardware perspective, two relatively old-technology PCs, of the kind that can be found in any user's basement, were used.

Figure 12 - The ClusterKnoppix environment along with the main window of OpenMosix view which shows information about the Cluster.

At this point the main aim of this report, as set out in paragraph 1.3, is met, as it is shown that open source software and clustering technology have indeed made HPC easier to implement and accessible even to home users.

4.0 Testing & Conclusions

To investigate the second objective of this report, the Cluster will be put to the test, in order to establish that the use of clustering has a positive impact on calculation-intensive tasks for home users.

4.1 “Over-Loading” the Cluster

The load-balancing and migration functionality of the OpenMosix kernel will be put to the test by executing a shell-launched program designed to overload the cluster, so that load balancing and process migration can be witnessed "live" through the migMon monitoring tool.

A program designed for testing load-balancing clusters was executed on both nodes three times; Figure 13 shows the specific code used.


// testapp.c - program for testing load-balancing clusters

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <unistd.h>   // fork()

int main() {
    unsigned int o = 0;
    unsigned int i = 0;
    unsigned int max = 255 * 255 * 255 * 128;

    // daemonize code (flogged from thttpd)
    switch ( fork() ) {
        case 0:  break;
        case -1: // syslog( 1, "fork - %m" );
                 exit( 1 );
        default: exit( 0 );
    }

    // incrementing counters is like walking to the moon
    // its slow, and if you don't stop, you'll crash.
    while (o < max) {
        o++;
        i = 0;
        while (i < max) { i++; }
    }
    return 0;
}

Figure 13 - the actual C script used in order to test the load-balancing and process migration capabilities of the OpenMosix Kernel, this script was developed by the CHAOS Distribution team. Source: http://www.midnightcode.org/projects/chaos/


Executing the above code is fairly straightforward: open any text editor under ClusterKnoppix, type in the program, then compile and run it as follows:

gcc testapp.c -o testapp
./testapp

Load balancing and process migration, implemented automatically by the OpenMosix kernel, can then be viewed using the OpenMosixView monitoring tool. Figure 14 presents a screenshot of the above process.

Figure 14 - Process migration shown by the migMon tool which is part of OpenMosix View, the 2 nodes can be seen as well as the process migrating from one node to the other.

OpenMosix supports a "5 star" node rating system based on node computing power; the philosophy behind the rating system is to migrate processes from a node with a lower rating to more "powerful" nodes. This procedure is undertaken automatically by OpenMosix, with the ability to balance loads manually using "drag & drop" functionality in the monitoring and management tool GUI.


4.2 Fractal Calculation

Kandel, an open source Fractal creation tool, will be used to create a Mandelbrot Fractal. Kandel has been developed by the open source community for use in clustering environments making it ideal for testing the Cluster.

By setting Concurrent Processes to 1 from the user menu in Kandel, a Mandelbrot fractal is created using just one node of the cluster. Next, Concurrent Processes is set to 2, or to as many as there are nodes in the cluster. The measured times went from 15 seconds with one concurrent process to 11 minutes with two; unfortunately Kandel exhibited "unstable" behaviour, for example abnormal program terminations, unexpected errors and exceptions, and as a result these times are not considered representative. Figure 15 illustrates a phase during the creation of the Mandelbrot fractal.


4.3 Audio Encoding

FLAC, a free lossless audio encoder, will be used to compress digital audio data so that it takes up less space; 560 MB of audio data will be encoded from wave (*.wav) to FLAC format. The whole process will be undertaken first by a single node and then by utilizing the whole cluster.

The following script, run from the bash shell, compresses the wave files using just one node; no more than one process runs at a time:

for x in *.wav
do flac -8 $x
done

The audio encoding process was completed in 13 minutes using the node equipped with the more powerful CPU, the Celeron.

Next the whole cluster is utilized to complete the encoding process, by executing the following script:

for x in *.wav
do flac -8 $x &
done

By using the "&" operator the processes are executed in parallel, taking advantage of the automatic load-balancing and process-migration functionality of OpenMosix. The audio encoding process was now completed in 8 minutes and 25 seconds just by adding the second node!
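Expressed as a simple speed-up calculation from the two timings above:

single node: 13 min = 780 s
two nodes: 8 min 25 s = 505 s
speed-up: 780 / 505, approximately 1.5x on the 2-node cluster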


Figure 15 - creating the Mandelbrot fractal by using the cluster reduced the calculation time


4.4 Image rendering

Finally, the calculation-intensive task of image rendering will be undertaken by the cluster using POV-Ray, an open source 3D graphics design tool. In a console environment:

cp /usr/share/doc/povray/pov3demo/showoff/matches.* .
gunzip matches.pov.gz
time povray matches.ini +w640 +h480 +FN

The first two commands copy a POV-Ray demo from the examples directory and decompress it, while the third command renders the demo and produces a 640x480 resolution PNG image. The following table illustrates the results of the above procedure in single processing and clustered environments.

3D Rendering using POV-Ray

CPU                    Time
AMD 1.2 GHz            16' 40''
Celeron 1.4 GHz         9' 30''
AMD (clustered)         8' 50''
Celeron (clustered)     9' 05''

The node equipped with the AMD processor, in a single processing environment, required 16 minutes and 40 seconds to render the image, while it required only 8 minutes and 50 seconds when part of the cluster.
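Expressed as a speed-up figure from the table above:

AMD alone: 16 min 40 s = 1,000 s
AMD in the cluster: 8 min 50 s = 530 s
speed-up: 1,000 / 530, approximately 1.9x for the slower node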

On the other hand, the node equipped with the Celeron processor did not seem to benefit from being part of the cluster; we can reasonably speculate that the superior computing power of the Celeron CPU was the reason that OpenMosix did not choose to migrate any processes from this node to the node equipped with the less powerful AMD CPU.


4.5 Conclusions

Building a ClusterKnoppix array is a relatively straightforward process, with only basic networking and Linux OS knowledge required from the user. From a hardware perspective, off-the-shelf commodity hardware can be used; in fact, as mentioned earlier, the cluster used in this report was implemented with two old-technology personal computers interconnected with a UTP crossover cable. The cost of the above set-up was minimal, proving that any home user can utilize any computing hardware stock in their possession to enter the world of HPC, even at a basic level.

Computing tasks that require large amounts of processing power seem to benefit the most from the parallel processing nature of the cluster, with audio encoding and rendering times reduced dramatically.

On the other hand, not all applications can benefit from a clustered environment; a great demand in I/O, leading to network overhead, is one of the main factors that affect performance. Another reason leading to poor performance is the nature of the application itself: if it was not developed to be "cluster-aware", it cannot fully leverage the parallel processing environment of a cluster, and in such a case the gain in performance and processing time will be negligible.

All the above performance-limiting factors can be overcome by using custom, cluster-oriented or even application-oriented software, as well as specialized computing and networking hardware, depending on the budget available for the specific project.

References

Miller, George A. (2009). WordNet, Princeton University. Retrieved December 13, 2009, from http://wordnetweb.princeton.edu/perl/webwn?s=supercomputer

Sharma, Mayank (2004). What is a Supercomputer. Retrieved December 13, 2009, from the IBM website: http://www.ibm.com/developerworks/linux/library/l-clustknop.html

Narayan, Aditya (2005). High-performance Linux clustering. Retrieved December 16, 2009, from the IBM website: http://www.ibm.com/developerworks/linux/library/l-cluster1/#N10184

Young, Laurence T. & Guo, Minyi (2005). High Performance Computing: Paradigm and Infrastructure. Wiley-Blackwell.

Gauthier, Carol (2003). Introduction to HPC. Paper presented at the 17th Annual International Symposium on High Performance Computing Systems and Applications. Retrieved December 18, 2009, from http://hpcs2003.css.usherbrooke.ca/tutorials/Intro_to_HPC.pdf

Perrott, R.H. & Zarea-Aliabadi, A. (1986). Supercomputer languages [electronic version]. ACM Computing Surveys (CSUR), Volume 18, Issue 1, pages 5-22.

Robbins, Daniel (2005). Advantages of OpenMosix. Retrieved December 19, 2009, from the IBM website: http://www.ibm.com/developerworks/systems/articles/openmosix.html

Buytaert, Kris (2004). The openMosix HOWTO. Retrieved December 20, 2009, from http://tldp.org/HOWTO/openMosix-HOWTO/

Bar, Mosche (2009). OpenMosix project leader home page. Retrieved December 20, 2009, from http://barlab.mgh.harvard.edu/

Vandersmissen, Wim (2004). The ClusterKnoppix Project Homepage. Retrieved from http://clusterknoppix.sw.be/index.htm

Mandelbrot, B.B. (1982). The Fractal Geometry of Nature. W.H. Freeman and Company.

Falconer, Kenneth (2003). Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons.

Pan, Davis Yen (1993). Digital Audio Compression [electronic version]. Digital Technical Journal, Volume 5, Issue 2.

Shirley, Peter; Ashikhmin, Michael; Gleicher, Michael; Marschner, Stephen; Reinhard, Erik; Sung, Kelvin; Thompson, William & Willemsen, Peter (2005). Fundamentals of Computer Graphics, Second Edition. A K Peters.

Lammle, Todd (2007). CCNA: Cisco Certified Network Associate Study Guide [electronic version]. New York: Sybex.