History of Supercomputers

Submitted by: Sweety Yadava

A supercomputer is defined as a computer that is among the fastest and most powerful machines of its time. Supercomputers have grown and changed throughout their history: their speed is unparalleled, their future is exciting, and their uses are nearly limitless. The term "super computing" was first used by the New York World newspaper in 1929 to refer to the large custom-built tabulators IBM made for Columbia University. Since then, supercomputer speed has grown at a steady pace, with performance in floating-point operations per second (FLOPS) doubling roughly every 1.55 years, as can be seen in Figure 1 below. The term "supercomputer" itself is rather fluid, and today's supercomputer tends to become tomorrow's normal computer. The history of development in the field of supercomputers is described in phases in the sections that follow.
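To make the doubling-time figure concrete (an illustrative restatement of the trend, not a formula from the source), performance that doubles every 1.55 years follows the exponential relation

P(t) = P_0 \cdot 2^{t/1.55}

where P_0 is the performance in FLOPS at a reference year and t is the number of years elapsed. Over a single decade this implies growth by a factor of 2^{10/1.55} ≈ 88.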

Figure 1: Development of supercomputers

Year 1940-1960

The first supercomputer, named Colossus, was made in England in 1943 and could handle only 5,000 characters per second. It was the first programmable electronic digital computer and worked with the help of vacuum valves. It was a top-secret project developed specifically for code breaking by a defence organization. It was further improved to handle 500 instructions per second and was named the Manchester Mark I in 1950. The Manchester Mark I was the first machine to use index registers for modifying base addresses. It was capable of performing 40-bit serial arithmetic with logical instructions for add, subtract and multiply. It used two cathode ray tubes (Williams tubes) as memory, which measured the charge at each point and could store 32 words in 64 rows of 42 points. It also used magnetic drums for permanent storage, each of which could store up to 64 pages of information. Its main disadvantage was that it ran in batch mode, wherein a series of paper tapes or cards, onto which the results were to be printed, had to be fed into the computer in advance.

In the early 1950s, the world's first real-time computer, called Whirlwind, was developed by MIT; it was integrated with CRT displays and was designed for flight simulation. For this design a core memory was created that stored data in ferrite rings according to the polarity of the applied magnetic field. This technology doubled the processing speed compared with Williams tubes, which helped in processing the continually changing series of inputs from the flight simulator control panel. The core memory used in Whirlwind was adopted by IBM in 1956 to develop an advanced supercomputer called the IBM 704, the first mass-produced computer with floating-point hardware. The IBM 704 was subsequently improved into the IBM 709 and the IBM 7090, which added capabilities such as overlapped I/O, indirect addressing, decimal instructions and the use of transistors.

Year 1960-1980

In the early 1960s, remarkable advancements were made in the area of supercomputing. The LARC (Livermore Advanced Research Computer), built by UNIVAC, was the first multiprocessing supercomputer, working with two central processing units and a separate I/O processor. It comprised 26 general-purpose registers with an access time of 1 microsecond and used a special form of decimal arithmetic with 48 bits per word. It had a core memory of 8 banks that could store up to 20,000 words, with an access time of 8 microseconds and a cycle time of 4 microseconds. The I/O processor could control 12 magnetic drums, 4 tape drives, a printer and a punched card reader. In 1961, IBM launched a successor to the 7090, known as the IBM 7030, in competition with the LARC. It turned out to be a failure despite being 30 times faster than the 7090. Nevertheless, many advanced ideas developed for the 7030, such as multiprogramming, memory protection, generalized interrupts, the 8-bit byte, instruction pipelining, prefetch and decoding, were used in the further development of supercomputers and modern-day CPUs.

Seymour Cray, employed by Control Data Corporation (CDC), developed a supercomputer named the CDC 6600 in the early 1960s that was 10 times faster than any other computer of that era. Cray was nicknamed the father of supercomputing in later years for his contribution to the field. He developed a simple instruction set for the CDC 6600 to simplify the timing within the CPU, and the instruction pipelining that resulted achieved a clock speed of 10 MHz for the first time ever. A logical address translation was used in the CDC 6600 to map the addresses in user programs so that only one portion of the core memory was used at a time, which allowed the operating system to move user programs within core memory. The CDC 6600 contained 10 peripheral processors to handle I/O and run the operating system. The CDC 6600 is shown in Figure 2.

The CDC STAR-100, the advanced successor of the CDC 6600, was developed in 1974 and was the first machine ever to use a vector processor. The vector pipeline developed for the STAR-100 had a high setup cost and could fill only 50 data points per set into the pipeline. Scalar performance was sacrificed to improve vector performance, so algorithms were processed at a very slow rate. The machine was deemed a failure, as it lacked several capabilities and was sold at a high cost.

In 1972, Seymour Cray founded a new company called Cray Research, which developed a new supercomputer called the Cray-1 in 1975. The Cray-1 had improved vector performance along with high scalar performance as well. It used vector registers along with ECL transistors, with wire lengths of less than 4 feet. The world's first auto-vectorizing Fortran compiler was developed on this platform; the machine had 8 MB of RAM and a clock speed of 80 MHz, and recorded a speed of 160 million floating-point operations per second. A new Freon-based cooling system for supercomputers was developed by Cray to overcome the heating problems encountered in this machine.

Figure 2: CDC 6600 computer
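To illustrate what an auto-vectorizing compiler does (a minimal, hypothetical sketch in C rather than the Cray-1's actual Fortran compiler), consider the following loop. A vectorizing compiler detects that the iterations are independent and replaces the scalar loop with vector instructions that process whole register-length chunks of the arrays at once:

```c
#include <stddef.h>

/* Scalar form as the programmer writes it: one multiply-add per
 * iteration. A vectorizing compiler would turn this into
 * "load a vector of x, multiply by the scalar a, add a vector
 * of y, store the result", many elements per instruction. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The trade-off described above for the STAR-100 is visible here: starting a vector operation carries a fixed setup cost, so the transformation pays off only when n is large, which is why machines with high setup costs struggled on short vectors.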

Year 1980-1990

The era of 1980-1990 is also referred to as the vector years, as a large number of small competitors entered the supercomputer market, and most of them could not sustain the growing competition in the field. IBM and HP purchased many of these companies in the 1980s, which helped them gain expertise and experience for the present-day supercomputer market. In 1981, CDC developed a newer version of the STAR-100, known as the CDC Cyber 205, in competition with the Cray-1. It was programmed with hand-crafted assembly code and had 1-4 separate vector units with semiconductor memory. Even after all these efforts, the Cyber 205 could not match the peak speed of its competitors.

In 1983, Cray Research developed a parallel (1-4) vector processor machine with a clock speed of 120 MHz and 8-128 MB of RAM, known as the Cray X-MP. It had a processing speed of 125 million floating-point operations per second per CPU, better chaining support with parallel arithmetic pipelines, and shared memory access with multiple pipelines per processor. It was also launched with support for the UNIX operating system in 1984, which widened its applicability. In 1985, the Cray-2 was developed: a compact design with 4-8 processors and a main memory in the range of 512 MB-4 GB, but with a higher memory latency than the Cray X-MP. The Cray Y-MP, launched in 1988, had up to 16 processors. In response to the growing competition, CDC developed a new product called the ETA-10 in 1989, which also turned out to be a failure like its predecessors. The ETA-10 was compatible with the CDC Cyber 205, had pipelined memory, and came in two cooling variants; it had a lower memory capacity than its counterpart, the Cray-2.

In the later 1980s and 1990s, attention turned from vector processors to massively parallel processing systems with thousands of "ordinary" CPUs, some being off-the-shelf units and others custom designs. During this era, Japanese companies such as NEC, Fujitsu and Hitachi also entered the supercomputer market and contributed significantly to vector computing through their expertise in chip technology. The NEC SX-3 vector processor was the most powerful processor built to that point, running at 5.5 giga floating-point operations per second. Fujitsu developed an architecture for vector computing in the early 1990s known as the Fujitsu Numerical Wind Tunnel. It was built with advanced gallium-arsenide (GaAs) semiconductors that had a gate delay of less than 60 picoseconds. Each CPU had 4 independent pipelines, a peak speed of 1.7 giga floating-point operations per second, and a main memory of 256 MB.

Year 1990-2010

The advancement of CMOS VLSI technology brought a radical change to the supercomputer industry. The size of microprocessors started shrinking, and clock speeds crossed the 100 MHz barrier in this era. The low-cost, high-speed microprocessors developed in the 1990s brought about a boom in the field of supercomputers. A massively parallel processing machine called the Intel ASCI Red (Figure 3) was developed in 1995 under the Accelerated Strategic Computing Initiative (ASCI) program of the Department of Energy (DoE) and National Nuclear Security Administration (NNSA) for building a nuclear weapon simulator. It is based on the multiple instruction, multiple data (MIMD) paradigm, with 38x32x2 CPUs (Pentium II Xeons), 1,212 GB of RAM and 12.5 TB of hard disk. The Beowulf supercomputer was developed in 1994 by Don Becker and Thomas Sterling at NASA's Goddard Space Flight Center. It was made from 16 486DX processors connected by a 10 Mbps Ethernet bus. Although such a system reaches only single-digit giga floating-point operations per second, it can be built anywhere, by anyone, from commonly used home computers.

In 2000, IBM came up with the ASCI White, a cluster of RS/6000 SP computers. It was composed of 512 machines, each containing 16 CPUs, and had 6 TB of RAM with 160 TB of disk memory. A similar cluster, the Earth Simulator, was developed in 2002 based on NEC SX-6 computers with 8 vector processors and 16 GB of RAM per node. It was designed to simulate global climate change with its 640 nodes comprising 5,120 CPUs and 10 TB of RAM. IBM also came up with a newer machine, the ASCI Blue Gene, in 2005, which was 10 times faster than the Earth Simulator. It was a constellation computer made of an integrated collection of smaller parallel nodes, composed of 65,536 CPUs connected through 3 integrated networks with different underlying topologies. In 2008, IBM came up with the fastest supercomputer built yet, called Roadrunner. It is composed of 6,480 dual-core Opteron CPUs that handle the operating system, interconnection and scheduling, with 12,960 PowerXCell 8i CPUs dedicated to computation. It has a unique hybrid architecture that needs specially written software with complex programs.

Figure 3: Intel ASCI Red supercomputer

Cray developed the Jaguar XT5 supercomputer at Oak Ridge National Laboratory in 2009. It has a peak performance of 1,750 tera floating-point operations per second. It contains 18,688 compute nodes, each containing a dual hex-core AMD Opteron 2435 processor with 16 GB of memory. An external Lustre file system called Spider provides read/write capability at a rate of 240 GB/s and over 10 petabytes of storage capacity. The graphs plotted below in Figures 4-8 depict the trends observed in various supercomputing parameters in this era. It has been observed that vector processing lost its importance in the middle of this period but has now returned. Parallel processing followed a steep growth curve, indicating that it would stay for a long period in the years to come. The size of the supercomputer/mainframe market shrank dramatically during the 1990s. Cluster computing gives high performance at marginal cost using commodity components. Parallel programming concepts are needed, as it is difficult to exploit many processors for a single task. Many hardware and software concepts developed for supercomputers are now used in the latest commodity high-performance CPUs. Electrical power requirements became non-trivial, giving rise to green computing.
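As a concrete illustration of the message-passing style of parallel programming that cluster machines made commonplace (a minimal sketch using the standard MPI interface, not code from any of the systems described above), each process computes a partial result and the pieces are combined over the network:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal MPI sketch of cluster-style parallelism: every process
 * sums its own slice of 1..1000, and the partial sums are combined
 * on rank 0 with an explicit reduction over the interconnect.
 * Build and run with an MPI toolchain, e.g.:
 *   mpicc sum.c -o sum && mpirun -np 4 ./sum */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */

    /* Each process handles every size-th integer, starting at rank+1. */
    long local = 0, total = 0;
    for (long i = rank + 1; i <= 1000; i += size)
        local += i;

    /* Message passing: partial sums travel to rank 0 and are added. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld\n", total);  /* 500500 regardless of -np */

    MPI_Finalize();
    return 0;
}
```

The difficulty noted above follows from this model: the programmer, not the hardware, must decompose the task, distribute the data and orchestrate the communication.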

Figure 4: Architecture share over time

Figure 5: Processor architecture share over time

Figure 6: Processor family share over time

Figure 7: Operating system family share over time

Figure 8: Number of processors share over time

Year 2010-present

An upgraded version of the Jaguar, called Titan, was built by Cray in 2012. Titan uses graphics processing units in addition to conventional central processing units in a hybrid architecture. It is made up of 18,688 AMD Opteron 6274 16-core CPUs and 18,688 Nvidia Tesla K20X GPUs. It has a storage capacity of over 40 PB with a file transfer capacity of 1.4 TB/s, and a total memory of 693.5 TiB comprising both CPU and GPU memory. Tianhe-2 (also known as Milky Way 2) is currently the world's fastest supercomputer; it was developed in 2013 by a team of 1,300 scientists and engineers under the Chinese government's 863 High Technology Program. Tianhe-2 consists of 16,000 compute nodes, each comprising two Intel Ivy Bridge Xeon processors and three Xeon Phi coprocessor chips. Tianhe-2 has a memory of 1,375 TiB, a speed of 33.86 PFLOPS and a storage capacity of 12.4 PB. Table 1 below lists the top 5 supercomputers currently in operation throughout the world, according to their ranking on top500.org.

Table 1: Top 5 supercomputers in operation (source: top500.org)

S. No. | Site | System | Cores | Rmax (TFLOP/s)
1 | National Super Computer Center in Guangzhou, China | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200 GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT) | 3,120,000 | 33,862.70
2 | DOE/SC/Oak Ridge National Laboratory, United States | Titan - Cray XK7, Opteron 6274 16C 2.200 GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc.) | 560,640 | 17,590.00
3 | DOE/NNSA/LLNL, United States | Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM) | 1,572,864 | 17,173.20
4 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect (Fujitsu) | 705,024 | 10,510.00
5 | DOE/SC/Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM) | 786,432 | 8,586.60

References:

1. Bell, G., "The Future of High Performance Computers in Science and Engineering", Communications of the ACM, Vol. 32, No. 9, September 1989, pp. 1091-1101.

2. Sterling, T., Messina, P. and Smith, P. H., "Enabling Technologies for Petaflops Computing", MIT Press, Cambridge, MA, July 1995.

3. Woodford, C. (2012), "Supercomputers". Retrieved from http://www.explainthatstuff.com/how-supercomputers-work.html [Accessed 5/5/2015]

4. Thornton, J. E., "Design of a Computer: The Control Data 6600", Scott, Foresman and Co., Glenview, IL, 1970.

5. Hintz, R. G. and Tate, D. P., "Control Data STAR-100 Processor Design", in Proc. COMPCON 72, pp. 1-4, IEEE Computer Society, 1972.

6. Hennessy, J. L. and Patterson, D. A., "Computer Architecture: A Quantitative Approach" (2nd edition), Morgan Kaufmann Publishers Inc., 1996.

7. http://www.sandia.gov/ASCI/Red/RedFacts.html [Accessed 5/5/2015]

8. https://www.llnl.gov/str/Seager.html [Accessed 5/5/2015]

9. https://www.jamstec.go.jp/es/en/index.html [Accessed 5/5/2015]

10. https://asc.llnl.gov/computing_resources/bluegenel [Accessed 5/5/2015]