
What Happens When Processing Storage Bandwidth are Free and Infinite?


What Happens When Processing, Storage, and Bandwidth are Free and Infinite? Jim Gray, Microsoft Research. Outline: clusters of hardware CyberBricks (all nodes are very intelligent); software CyberBricks (a standard way to interconnect intelligent nodes); what next?


Page 1: What Happens When Processing Storage Bandwidth  are Free and Infinite?

What Happens When Processing, Storage, and Bandwidth are Free and Infinite?

Jim Gray

Microsoft Research

Page 2: Outline

Clusters of Hardware CyberBricks
– all nodes are very intelligent
Software CyberBricks
– standard way to interconnect intelligent nodes
What next?
– Processing migrates to where the power is
  • Disk, network, display controllers have full-blown OS
  • Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  • Computer is a federated distributed system.

Page 3: When Computers & Communication are Free

Traditional computer industry is 0 B$/year
All the costs are in
– Content (good)
– System Management (bad)
  • A vendor claims it costs 8$/MB/year to manage disk storage.
    – => WebTV (1GB drive) costs 8,000$/year to manage!
    – => 10 PB DB costs 80 Billion $/year to manage!
  • Automatic management is ESSENTIAL
In the meantime….
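The management-cost figures on this slide follow from simple multiplication. A minimal sketch, assuming decimal units (1 GB = 1,000 MB and 1 PB = 10^9 MB); the function and constant names are illustrative and do not come from the talk:

```python
# Vendor-claimed cost quoted on the slide: 8 dollars per MB per year.
COST_PER_MB_YEAR = 8

# Assumption: decimal units, as the slide's round numbers suggest.
MB_PER_GB = 1_000
MB_PER_PB = 1_000_000_000


def yearly_management_cost(megabytes, rate=COST_PER_MB_YEAR):
    """Return the yearly cost in dollars of managing `megabytes` of disk."""
    return megabytes * rate


webtv_cost = yearly_management_cost(1 * MB_PER_GB)     # 1 GB WebTV drive
big_db_cost = yearly_management_cost(10 * MB_PER_PB)   # 10 PB database

print(f"WebTV drive: ${webtv_cost:,}/year")
print(f"10 PB DB:    ${big_db_cost:,}/year")
```

At the quoted rate the cost of managing storage grows linearly with capacity, which is why the slide calls automatic management essential.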

Page 4: 1980 Rule of Thumb

You need a systems programmer per MIPS
You need a Data Administrator per 10 GB

Page 5: One Person per MegaBuck

1 Breadbox ~ 5x 1987 machine room
48 GB is hand-held
One person does all the work
Cost/tps is 1,000x less
25 micro dollars per transaction
A megabuck buys 40 of these!!!

[Figure labels: 4 x 200 MHz cpu, 1/2 GB DRAM, 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays; Hardware expert, OS expert, Net expert, DB expert, App expert]

Page 6: All God’s Children Have Clusters! Buying Computing By the Slice

People are buying computers by the gross
– After all, they only cost 1k$/slice!
Clustering them together

Page 7: A cluster is a cluster is a cluster

It’s so natural, even mainframes cluster!
Looking closer at usage patterns, a few models emerge
Looking closer at sites, hierarchies, bunches, and functional specialization emerge
Which are the roses? Which are the briars?

Page 8: “Commercial” NT Clusters

16-node Tandem Cluster
– 64 cpus
– 2 TB of disk
– Decision support
45-node Compaq Cluster
– 140 cpus
– 14 GB DRAM
– 4 TB RAID disk
– OLTP (Debit Credit)
  • 1 B tpd (14 k tps)

Page 9: Tandem Oracle/NT

27,383 tpmC
71.50 $/tpmC
4 x 6 cpus
384 disks = 2.7 TB

Page 10: Microsoft.com: ~150x4 nodes

[Figure: "The Microsoft.Com Site" network diagram (MidYear98a.vsd, 12/15/97). Server groups shown: www, home, search, premium, register, support, activex, cdm, and FTP.microsoft.com; msid.msn.com and register.msn.com; FTP and HTTP download servers; live SQL servers, SQL consolidators, and SQL reporting; DMZ, IDC, and other staging servers; log processing; internal WWW. Network elements shown: FDDI rings MIS1 to MIS4, switched Ethernet, primary and secondary Gigaswitches, routers, MOSWest admin LAN, SQLNet feeder LAN, Building 11, and European and Japan data centers. Internet links: 13 DS3 (45 Mb/sec each), 2 OC3 (100 Mb/sec each), 2 Ethernet (100 Mb/sec each). Per-group average configurations are 4xP5 or 4xP6 with 256 RAM to 1 GB RAM and 12 GB to 180 GB HD, at an average cost of $24K to $128K. Note on the diagram: all servers in Building 11 are accessible from corpnet.]

Page 11: HotMail: ~400 Computers

[Figure: HotMail site diagram. Labels: 200 MBps Internet link; LocalDirector (x4); Cisco Catalyst 5000 Enet Switch; local 10 Mbps switched Ethernet; Front Door (P-200, 128MB), 140 +10/mo, FreeBSD/Apache; Graphics, 15xP6, FreeBSD/Hotmail; Ad, 10xP6, FreeBSD/Apache; Incoming Mail, 25xP-200, FreeBSD/hm-SMTP; Security, 2xP200, FreeBSD; Ad Pacer, 3 P6, FreeBSD; Member Dir; U Store, E3k, xxMB, 384GB RAID5 + DLT tape robot, Solaris/HMNNFS; 50 machines, many old, 13 + 1.5/mo, 1 per million users; M Serv (SPARC Ultra-1, ??MB), 4 replicas, Solaris; Telnet maintenance interface]

Page 12: Inktomi (hotbot), WebTV: > 200 nodes

Inktomi: ~250 UltraSparcs
– web crawl
– index crawled web and save index
– Return search results on demand
– Track Ads and click-thrus
– ACID vs BASE (basic Availability, Serialized Eventually)
Web TV
– ~200 UltraSparcs
  • Render pages, Provide Email
– ~4 Network Appliance NFS file servers
– A large Oracle app tracking customers

Page 13: Loki: Pentium Clusters for Science
http://loki-www.lanl.gov/

16 Pentium Pro Processors
x 5 Fast Ethernet interfaces
+ 2 Gbytes RAM
+ 50 Gbytes Disk
+ 2 Fast Ethernet switches
+ Linux
= 1.2 real Gflops for $63,000 (but that is the 1996 price)

Beowulf project is similar
http://cesdis.gsfc.nasa.gov/pub/people/becker/beowulf.html
Scientists want cheap mips.

Page 14: Your Tax Dollars At Work: ASCI for Stockpile Stewardship

Intel/Sandia: 9000x1 node Ppro
LLNL/IBM: 512x8 PowerPC (SP2)
LNL/Cray: ?
Maui Supercomputer Center
– 512x1 SP2

Page 15: Berkeley NOW (network of workstations) Project
http://now.cs.berkeley.edu/

105 nodes
– Sun UltraSparc 170, 128 MB, 2x2GB disk
– Myrinet interconnect (2x160MBps per node)
– SBus (30MBps) limited
GLUNIX layer above Solaris
Inktomi (HotBot search)
NAS Parallel Benchmarks
Crypto cracker
Sort 9 GB per second

Page 16: Wisconsin COW

40 UltraSparcs
64MB + 2x2GB disk + Myrinet
SUN OS
Used as a compute engine

Page 17: Andrew Chien’s JBOB
http://www-csag.cs.uiuc.edu/individual/achien.html

48 nodes
36 HP 2PIIx128 1 disk Kayak boxes
10 Compaq 2PIIx128 1 disk, Wkstation 6000
32-Myrinet & 16-ServerNet connected
Operational
All running NT

Page 18: NCSA Cluster

The National Center for Supercomputing Applications, University of Illinois @ Urbana
500 Pentium cpus, 2k disks, SAN
Compaq + HP + Myricom
A Super Computer for 3M$
Classic Fortran/MPI programming
NT + DCOM programming model

Page 19: 4 B PC’s (1 Bips, .1GB dram, 10 GB disk, 1 Gbps Net, B=G): The Bricks of Cyberspace

Cost 1,000 $
Come with
– NT
– DBMS
– High speed Net
– System management
– GUI / OOUI
– Tools
Compatible with everyone else
CyberBricks

Page 20: Super Server: 4T Machine

Array of 1,000 4B machines
– 1 Bips processors
– 1 BB DRAM
– 10 BB disks
– 1 Bbps comm lines
– 1 TB tape robot
A few megabucks
Challenge:
– Manageability
– Programmability
– Security
– Availability
– Scalability
– Affordability
As easy as a single system
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]

Page 21: Cluster Vision: Buying Computers by the Slice

Rack & Stack
– Mail-order components
– Plug them into the cluster
Modular growth without limits
– Grow by adding small modules
Fault tolerance:
– Spare modules mask failures
Parallel execution & data search
– Use multiple processors and disks
Clients and servers made from the same stuff
– Inexpensive: built with commodity CyberBricks

Page 22: Nostalgia: Behemoth in the Basement

today’s PC is yesterday’s supercomputer
Can use LOTS of them
Main Apps changed:
– scientific -> commercial -> web
– Web & Transaction servers
– Data Mining, Web Farming

Page 23: SMP -> nUMA: BIG FAT SERVERS

Directory based caching lets you build large SMPs
Every vendor building a HUGE SMP
– 256 way
– 3x slower remote memory
– 8-level memory hierarchy
  • L1, L2 cache
  • DRAM
  • remote DRAM (3, 6, 9,…)
  • Disk cache
  • Disk
  • Tape cache
  • Tape
Needs
– 64 bit addressing
– nUMA sensitive OS
  • (not clear who will do it)
Or Hypervisor
– like IBM LSF,
– Stanford Disco
  www-flash.stanford.edu/Hive/papers.html
You get an expensive cluster-in-a-box with very fast network

Page 24: Thesis: Many little beat few big

Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?

[Figure: price points $1 million, $100 K, $10 K; machine classes Mainframe, Mini, Micro, Nano; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"]

1 M SPEC marks, 1 TFLOP
10^6 clocks to bulk ram
Event-horizon on chip
VM reincarnated
Multi-program cache, On-Chip SMP

[Figure: Pico Processor (1 MM 3) and its memory hierarchy: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacity labels 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]

Page 25: A Hypothetical Question: Taking things to the limit

Moore’s law 100x per decade:
– Exa-instructions per second in 30 years
– Exa-bit memory chips
– Exa-byte disks
Gilder’s Law of the Telecosm: 3x/year more bandwidth, 60,000x per decade!
– 40 Gbps per fiber today
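The growth factors quoted on this slide are compound rates. A minimal sketch of the arithmetic; the helper name is illustrative and not from the talk:

```python
def compound_growth(factor_per_period, periods):
    """Total growth after `periods` periods at `factor_per_period` each."""
    return factor_per_period ** periods


# Gilder: 3x per year, compounded over a 10-year decade.
gilder_per_decade = compound_growth(3, 10)

# Moore (as stated on the slide): 100x per decade, over three decades.
moore_per_30_years = compound_growth(100, 3)

print(f"Gilder, one decade:    {gilder_per_decade:,}x")
print(f"Moore, three decades:  {moore_per_30_years:,}x")
```

Tripling every year gives 3^10 = 59,049, which the slide rounds to 60,000x per decade; 100x per decade compounds to a million-fold over 30 years.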

Page 26: Grove’s Law

Link Bandwidth doubles every 100 years!
Not much has happened to telephones lately
Still twisted pair

Page 27: Gilder’s Telecosm Law: 3x bandwidth/year for 25 more years

Today:
– 10 Gbps per channel
– 4 channels per fiber: 40 Gbps
– 32 fibers/bundle = 1.2 Tbps/bundle
In lab 3 Tbps/fiber (400 x WDM)
In theory 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth

[Figure: 1 fiber = 25 Tbps]

Page 28: Networking: BIG!! Changes coming!

CHALLENGE
– reduce software tax on messages
– Today 30 K ins + 10 ins/byte
– Goal: 1 K ins + .01 ins/byte
Best bet:
– SAN/VIA
– Smart NICs
– Special protocol
– User-Level Net IO (like disk)
Technology
– 10 GBps bus “now”
– 1 Gbps links “now”
– 1 Tbps links in 10 years
– Fast & cheap switches
Standard interconnects
– processor-processor
– processor-device (=processor)
Deregulation WILL work someday
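The "software tax on messages" quoted on this slide is a fixed cost per message plus a cost per byte. A minimal sketch of that cost model, using only the two pairs of figures from the slide; the function name, dictionary names, and message sizes are illustrative assumptions:

```python
def instructions_per_message(nbytes, fixed_ins, ins_per_byte):
    """Instructions spent sending one message of `nbytes` bytes."""
    return fixed_ins + ins_per_byte * nbytes


# Figures quoted on the slide.
TODAY = {"fixed_ins": 30_000, "ins_per_byte": 10}
GOAL = {"fixed_ins": 1_000, "ins_per_byte": 0.01}

# Hypothetical message sizes, chosen only to show small and large messages.
for nbytes in (100, 8_000, 1_000_000):
    today = instructions_per_message(nbytes, **TODAY)
    goal = instructions_per_message(nbytes, **GOAL)
    print(f"{nbytes:>9,} bytes: today {today:>12,.0f} ins, "
          f"goal {goal:>9,.0f} ins, ratio {today / goal:,.0f}x")
```

Under this model the fixed term dominates small messages and the per-byte term dominates large ones, so the goal on the slide has to cut both.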

Page 29: What if Networking Was as Cheap As Disk IO?

TCP/IP
– Unix/NT 100% cpu @ 40MBps
Disk
– Unix/NT 8% cpu @ 40MBps
Why the Difference?
Host Bus Adapter does SCSI packetizing, checksum,… flow control, DMA
Host does TCP/IP packetizing, checksum,… flow control, small buffers

Page 30: The Promise of SAN/VIA: 10x better in 2 years

Today:
– wires are 10 MBps (100 Mbps Ethernet)
– ~20 MBps tcp/ip saturates 2 cpus
– round-trip latency is ~300 us
In two years
– wires are 100 MBps (1 Gbps Ethernet, ServerNet,…)
– tcp/ip ~ 100 MBps 10% of each processor
– round-trip latency is 20 us
works in lab today
assumes app uses zero-copy Winsock2 api.
See http://www.viarch.org/

[Figure: bar chart of Bandwidth, Latency, and Overhead, Now vs. Soon; axis 0 to 250]

Page 31: Functionally Specialized Cards

Storage, Network, and Display cards, each with
– ASIC
– P mips processor
– M MB DRAM
Today:
– P = 50 mips
– M = 2 MB
In a few years
– P = 200 mips
– M = 64 MB

Page 32: It’s Already True of Printers: Peripheral = CyberBrick

You buy a printer
You get
– several network interfaces
– A Postscript engine
  • cpu,
  • memory,
  • software,
  • a spooler (soon)
– and… a print engine.

Page 33: System On A Chip

Integrate Processing with memory on one chip
– chip is 75% memory now
– 1MB cache >> 1960 supercomputers
– 256 Mb memory chip is 32 MB!
– IRAM, CRAM, PIM,… projects abound
Integrate Networking with processing on one chip
– system bus is a kind of network
– ATM, FiberChannel, Ethernet,.. Logic on chip.
– Direct IO (no intermediate bus)
Functionally specialized cards shrink to a chip.

Page 34: All Device Controllers will be Cray 1’s

TODAY
– Disk controller is 10 mips risc engine with 2MB DRAM
– NIC is similar power
SOON
– Will become 100 mips systems with 100 MB DRAM.
They are nodes in a federation (can run Oracle on NT in disk controller).
Advantages
– Uniform programming model
– Great tools
– Security
– economics (cyberbricks)
– Move computation to data (minimize traffic)

[Figure: Central Processor & Memory on a Tera Byte Backplane]

Page 35: With Tera Byte Interconnect and Super Computer Adapters

Processing is incidental to
– Networking
– Storage
– UI
Disk Controller/NIC is
– faster than device
– close to device
– Can borrow device package & power
So use idle capacity for computation.
Run app in device.

[Figure: Tera Byte Backplane]

Page 36: Implications

Conventional
– Offload device handling to NIC/HBA
– higher level protocols: I2O, NASD, VIA…
– SMP and Cluster parallelism is important.
Radical
– Move app to NIC/device controller
– higher-higher level protocols: CORBA / DCOM.
– Cluster parallelism is VERY important.

[Figure: Central Processor & Memory on a Tera Byte Backplane]

Page 37: How Do They Talk to Each Other?

Each node has an OS
Each node has local resources: A federation.
Each node does not completely trust the others.
Nodes use RPC to talk to each other
– CORBA? DCOM? IIOP? RMI?
– One or all of the above.
Huge leverage in high-level interfaces.
Same old distributed system story.

[Figure: two nodes joined by wire(s); each shows Applications over RPC?, streams, and datagrams, over VIAL/VIPL]

Page 38: Punch Line

The huge clusters we saw are prototypes for this:
A Federation of Functionally specialized nodes
Each node shrinks to a “point” device with embedded processing.
Each node / device is autonomous
Each talks a high-level protocol

Page 39: Outline

Hardware CyberBricks
– all nodes are very intelligent
Software CyberBricks
– standard way to interconnect intelligent nodes
What next?
– Processing migrates to where the power is
  • Disk, network, display controllers have full-blown OS
  • Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  • Computer is a federated distributed system.

Page 40: Software CyberBricks: Objects!

It’s a zoo
Objects and 3-tier computing (transactions)
– Give natural distribution & parallelism
– Give remote management!
– TP & Web: Dispatch RPCs to pool of object servers
Components are a 1B$ business today!

Page 41: The COMponent Promise

Objects are Software CyberBricks
– productivity breakthrough (plug ins)
– manageability breakthrough (modules)
Microsoft Promise
– DCOM + ActiveX +
IBM/Sun/Oracle/Netscape promise
– CORBA + Open Doc + Java Beans +
Both promise
– parallel distributed execution
– centralized management of distributed system

Both camps share key goals:
– Encapsulation: hide implementation
– Polymorphism: generic ops, key to GUI and reuse
– Uniform Naming
– Discovery: finding a service
– Fault handling: transactions
– Versioning: allow upgrades
– Transparency: local/remote
– Security: who has authority
– Shrink-wrap: minimal inheritance
– Automation: easy

Page 42: History and Alphabet Soup

[Figure: timeline from 1985 to 1995. Labels: X/Open, UNIX International, Open Software Foundation (OSF), Object Management Group (OMG), Open Group; Solaris, NT, OSF DCE, ODBC, XA / TX, CORBA, COM; under DCE: RPC, GUIDs, IDL, DNS, Kerberos]

Microsoft DCOM based on OSF-DCE Technology
DCOM and ActiveX extend it

Page 43: Objects Meet Databases: basis for universal data servers, access, & integration

Object-oriented (COM oriented) interface to data
Breaks DBMS into components
Anything can be a data source
Optimization/navigation “on top of” other data sources
Makes an RDBMS an O-R DBMS, assuming optimizer understands objects

[Figure: DBMS engine over data sources: Database, Spreadsheet, Photos, Mail, Map, Document]

Page 44: The BIG Picture: Components and transactions

Software modules are objects
Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
Standard interfaces allow software plug-ins
Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated

[Figure: Object Request Broker]

Page 45: The OO Points So Far

Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.
Next points:
– put processing close to data
– do parallel processing.

Page 46: Transaction Processing Evolution to Three Tier: Intelligence migrated to clients

Mainframe Batch processing (centralized)
Dumb terminals & Remote Job Entry
Intelligent terminals, database backends
Workflow Systems, Object Request Brokers, Application Generators

[Figure labels: Mainframe, cards, green screen 3270, Server, TP Monitor, ORB, Active]

Page 47: Web Evolution to Three Tier: Intelligence migrated to clients (like TP)

Character-mode clients, smart servers
GUI Browsers - Web file servers
GUI Plugins - Web dispatchers - CGI
Smart clients - Web dispatcher (ORB), pools of app servers (ISAPI, Viper), workflow scripts at client & server

[Figure labels: archie, gopher, green screen, WAIS, Web Server, Mosaic, NS & IE, Active]

Page 48: PC Evolution to Three Tier: Intelligence migrated to server

Stand-alone PC (centralized)
PC + File & print server: message per I/O
PC + Database server: message per SQL statement
PC + App server: message per transaction
ActiveX Client, ORB ActiveX server, Xscript

[Figure labels: disk I/O, IO request, reply, SQL Statement, Transaction]

Page 49: Why Did Everyone Go To Three-Tier?

Manageability
– Business rules must be with data
– Middleware operations tools
Performance (scalability)
– Server resources are precious
– ORB dispatches requests to server pools
Technology & Physics
– Put UI processing near user
– Put shared data processing near shared data
– Minimizes data moves
– Encapsulate / modularity

[Figure: tiers: Presentation, workflow, Business Objects, Database]

Page 50: Why Put Business Objects at Server?

DAD’s Raw Data
– Customer comes to store
– Takes what he wants
– Fills out invoice
– Leaves money for goods
– Easy to build
– No clerks
MOM’s Business Objects
– Customer comes to store with list
– Gives list to clerk
– Clerk gets goods, makes invoice
– Customer pays clerk, gets goods
– Easy to manage
– Clerk controls access
– Encapsulation

Page 51: The OO Points So Far

Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.
Put processing close to data
Next point:
– do parallel processing.

Page 52: Kinds of Parallel Execution

Pipeline: Any Sequential Program -> Any Sequential Program
Partition: outputs split N ways, inputs merge M ways

[Figure: sequential programs chained as a pipeline, and copies of a sequential program running side by side on partitioned data]
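The two kinds of parallel execution named on this slide can be sketched with ordinary sequential code as the building block. This is an illustrative sketch, not code from the talk: the stage functions and the use of Python generators and a thread pool are assumptions chosen to show the dataflow shape (CPython threads illustrate the structure rather than real CPU speedup).

```python
from concurrent.futures import ThreadPoolExecutor


# Three ordinary sequential programs, each consuming and producing a stream.
def scan(rows):
    for row in rows:
        yield row


def keep_even(stream):
    for value in stream:
        if value % 2 == 0:
            yield value


def square(stream):
    for value in stream:
        yield value * value


def run_pipeline(rows):
    """Pipeline parallelism: each stage feeds the next, one item at a time."""
    return list(square(keep_even(scan(rows))))


def split(rows, n):
    """Split the input N ways (round robin)."""
    return [rows[i::n] for i in range(n)]


def run_partitioned(rows, n):
    """Partition parallelism: run the same pipeline on N splits, then merge."""
    parts = split(rows, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partial_results = list(pool.map(run_pipeline, parts))
    return sorted(value for part in partial_results for value in part)


data = list(range(10))
print(run_pipeline(data))
print(run_partitioned(data, 3))
```

Both calls produce the same result; the partitioned version simply runs the unchanged sequential pipeline on each split and merges the outputs.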

Page 53: Object Oriented Programming: Parallelism From Many Little Jobs

Gives location transparency
ORB/web/tpmon multiplexes clients to servers
Enables distribution
Exploits embarrassingly parallel apps (transactions)
HTTP and RPC (dcom, corba, rmi, iiop, …) are basis

[Figure: Tp mon / orb / web server]

Page 54: Why Parallel Access To Data?

At 10 MB/s, 1.2 days to scan 1 Terabyte
1,000 x parallel: 100 second SCAN.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.

[Figure: one 10 MB/s stream versus 1,000 parallel streams reading 1 Terabyte; label: BANDWIDTH]
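The scan times on this slide can be checked with a few lines of arithmetic. A minimal sketch, assuming decimal units (1 Terabyte = 10^12 bytes, 10 MB/s = 10^7 bytes per second) and perfectly even partitioning; the names are illustrative:

```python
TERABYTE = 10**12            # bytes (assumption: decimal units)
DISK_RATE = 10 * 10**6       # 10 MB/s per stream, in bytes per second
SECONDS_PER_DAY = 86_400


def scan_seconds(total_bytes, bytes_per_second, parallel_streams=1):
    """Time to scan `total_bytes` split evenly over `parallel_streams`."""
    return total_bytes / (bytes_per_second * parallel_streams)


serial = scan_seconds(TERABYTE, DISK_RATE)
parallel = scan_seconds(TERABYTE, DISK_RATE, parallel_streams=1_000)

print(f"1 stream:      {serial:,.0f} s = {serial / SECONDS_PER_DAY:.1f} days")
print(f"1,000 streams: {parallel:,.0f} s")
```

One stream needs 100,000 seconds, about 1.2 days; a thousand streams need 100 seconds, matching the slide.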

Page 55: Why are Relational Operators Successful for Parallelism?

Relational data model: uniform operators on uniform data stream
Closed under composition
Each operator consumes 1 or 2 input streams
Each stream is a uniform collection of data
Sequential data in and out: Pure dataflow
partitioning some operators (e.g. aggregates, non-equi-join, sort,..) requires innovation
AUTOMATIC PARALLELISM

Page 56: Database Systems “Hide” Parallelism

Automate system management via tools
– data placement
– data organization (indexing)
– periodic tasks (dump / recover / reorganize)
Automatic fault tolerance
– duplex & failover
– transactions
Automatic parallelism
– among transactions (locking)
– within a transaction (parallel execution)

Page 57: Automatic Parallel Object Relational DB

    Select image
    from landsat
    where date between 1970 and 1990
      and overlaps(location, :Rockies)
      and snow_cover(image) > .7;

[Figure: Landsat table with columns date (1/2/72 ... 4/8/95), loc (33N 120W ... 34N 120W), and image; Temporal, Spatial, and Image tests on date, location, & image produce the Answer]

Assign one process per processor/disk:
– find images with right date & location
– analyze image, if 70% snow, return it

Page 58: Automatic Data Partitioning

Split a SQL table to subset of nodes & disks
Partition within set:
– Range (A...E, F...J, K...N, O...S, T...Z): Good for equi-joins, range queries, group-by
– Hash: Good for equi-joins
– Round Robin: Good to spread load
Shared disk and memory less sensitive to partitioning,
Shared nothing benefits from "good" partitioning
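The three partitioning schemes on this slide each map a row to one of N nodes. A minimal sketch, assuming string keys that start with a letter A to Z and five partitions matching the slide's A...E through T...Z ranges; the function names and the choice of CRC32 as the hash are illustrative assumptions, not from the talk:

```python
import bisect
import zlib

# Upper bound (inclusive) of each range partition: A...E, F...J, K...N, O...S, T...Z.
RANGE_UPPER_BOUNDS = ["E", "J", "N", "S", "Z"]


def range_partition(key, upper_bounds=RANGE_UPPER_BOUNDS):
    """Range: nearby keys land together, so range queries touch few nodes."""
    return bisect.bisect_left(upper_bounds, key[0].upper())


def hash_partition(key, n):
    """Hash: equal keys land on the same node, which is what equi-joins need."""
    return zlib.crc32(key.encode("utf-8")) % n


def round_robin_partition(row_number, n):
    """Round robin: ignores the key and spreads rows evenly."""
    return row_number % n


names = ["Adams", "Gray", "Kim", "Olson", "Turing"]
for row_number, name in enumerate(names):
    print(name,
          range_partition(name),
          hash_partition(name, 5),
          round_robin_partition(row_number, 5))
```

Range and hash placement depend only on the key, so a lookup by key can go straight to one node; round robin balances load best but forces a lookup to ask every node.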

Page 59: Partitioned Execution

Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism

[Figure: A Table partitioned A...E, F...J, K...N, O...S, T...Z; one Count per partition feeds a final Count]
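The Count diagram on this slide is a two-level aggregate: each partition counts its own rows, and a final step adds the partial counts. A minimal sketch with made-up sample rows; the thread pool stands in for one process per processor/disk and is an assumption, not the talk's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data, one list of rows per range partition.
PARTITIONS = {
    "A...E": ["Adams", "Chen"],
    "F...J": ["Gray"],
    "K...N": [],
    "O...S": ["Olson", "Smith", "Stone"],
    "T...Z": ["Turing"],
}


def count_rows(rows):
    """Local Count: runs next to one partition and sees only its rows."""
    return len(rows)


def parallel_count(partitions):
    """Run one local Count per partition, then combine the partial counts."""
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(count_rows, partitions.values()))
    return sum(partial_counts)


print(parallel_count(PARTITIONS))
```

Only the small partial counts cross between nodes; the rows themselves never move, which is the sense in which partitioned data gives natural parallelism.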

Page 60: N x M way Parallelism

N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows

[Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Sort and a Join; results flow into three Merge operators]

Page 61: Hash Join: Combining Two Tables

Hash smaller table into N buckets (hope N=1)
If N=1 read larger table, hash to smaller
Else, hash outer to disk then bucket-by-bucket hash join.
Purely sequential data behavior
Always beats sort-merge and nested unless data is clustered.
Good for equi, outer, exclusion join
Lots of papers, products just appearing (what went wrong?)
Hash reduces skew

[Figure: Right Table and Left Table hashed into Hash Buckets]
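The two cases on this slide (N=1, and N>1 with a bucket-by-bucket join) can be sketched as follows. This is an illustrative in-memory sketch, not the talk's implementation: rows are dictionaries, the table and column names are made up, and the N>1 case keeps its buckets in lists where a real system would write them to disk.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """N=1 case: build a hash table on the smaller table, probe with the larger."""
    buckets = defaultdict(list)
    for row in small:
        buckets[row[small_key]].append(row)
    for row in large:
        for match in buckets.get(row[large_key], ()):
            yield match, row


def partitioned_hash_join(small, large, small_key, large_key, n):
    """N>1 case: hash both tables into N buckets, then join bucket by bucket."""
    small_parts = [[] for _ in range(n)]
    large_parts = [[] for _ in range(n)]
    for row in small:
        small_parts[hash(row[small_key]) % n].append(row)
    for row in large:
        large_parts[hash(row[large_key]) % n].append(row)
    for small_part, large_part in zip(small_parts, large_parts):
        yield from hash_join(small_part, large_part, small_key, large_key)


# Hypothetical sample tables.
depts = [{"dept": 1, "name": "DB"}, {"dept": 2, "name": "OS"}]
emps = [{"emp": "Ann", "dept": 1}, {"emp": "Bob", "dept": 2},
        {"emp": "Cy", "dept": 1}, {"emp": "Di", "dept": 3}]

one_pass = [(d["name"], e["emp"]) for d, e in hash_join(depts, emps, "dept", "dept")]
by_bucket = [(d["name"], e["emp"])
             for d, e in partitioned_hash_join(depts, emps, "dept", "dept", 2)]
print(one_pass)
print(sorted(by_bucket))
```

Because rows with equal keys always hash to the same bucket, each bucket pair can be joined independently, which is also why the technique partitions well across nodes.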

Page 62: Parallel Hash Join

ICL implemented hash join with bitmaps in CAFS machine (1976)!
Kitsuregawa pointed out the parallelism benefits of hash join in early 1980’s (it partitions beautifully)
We ignored them! (why?)
But now, everybody's doing it (or promises to do it).
Hashing minimizes skew, requires little thinking for redistribution
Hashing uses massive main memory

Page 63: Main Message

Technology trends give
– many processors and storage units
– inexpensively
To analyze large quantities of data
– sequential (regular) access patterns are 100x faster
– parallelism is 1000x faster (trades time for money)
– Relational systems show many parallel algorithms.

Page 64: Summary

All God’s Children Got Clusters!
Technology trends imply processors migrated to transducers
Components (Software CyberBricks)
Programming & Managing Clusters
Database experience
– Parallelism via transaction processing
– Parallelism via data flow
– Auto Everything, Always Up

Page 65: End

86 slides is more than enough for an hour.

Page 66: Clusters Have Advantages

Clients and Servers made from the same stuff.
Inexpensive:
– Built with commodity components
Fault tolerance:
– Spare modules mask failures
Modular growth
– grow by adding small modules

Page 67: What Happens When Processing Storage Bandwidth  are Free and Infinite?

99

Meta-Message: Technology Ratios Are Important

If everything gets faster & cheaper at the same rate THEN nothing really changes.

Things getting MUCH BETTER:
– communication speed & cost 1,000x
– processor speed & cost 100x
– storage size & cost 100x

Things staying about the same:
– speed of light (more or less constant)
– people (10x more expensive)
– storage speed (only 10x better)

Page 68: What Happens When Processing Storage Bandwidth  are Free and Infinite?

100

Storage Ratios Changed

10x better access time
10x more bandwidth
4,000x lower media price
DRAM/DISK 100:1 to 10:1 to 50:1

[Figure: Disk Performance vs Time, 1980 to 2000. Axes: seeks per second, bandwidth (MB/s), capacity (GB).]

[Figure: Disk accesses/second vs Time, 1980 to 2000. Axis: accesses per second.]

[Figure: Storage Price vs Time, 1980 to 2000. Axis: megabytes per kilo-dollar (MB/k$).]

Page 69: What Happens When Processing Storage Bandwidth  are Free and Infinite?

104

Performance = Storage Accesses, not Instructions Executed

In the “old days” we counted instructions and IOs
Now we count memory references
Processors wait most of the time

[Figure: Where the time goes: clock ticks used by AlphaSort components. Labels: Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, B-Cache Data Miss.]

70 MIPS
“Real” apps have worse I-cache misses, so they run at 60 MIPS if well tuned, 20 MIPS if not

Page 70: What Happens When Processing Storage Bandwidth  are Free and Infinite?

105

Storage Latency: How Far Away is the Data?

Level                  Clock ticks   Analogy        Travel time
Registers              1             My Head        1 min
On Chip Cache          2             This Room
On Board Cache         10            This Campus    10 min
Memory                 100           Sacramento     1.5 hr
Disk                   10^6          Pluto          2 Years
Tape / Optical Robot   10^9          Andromeda      2,000 Years
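The analogy appears to work by treating one clock tick as one minute of travel. The sketch below assumes that scale and assumes the chart's tick labels pair with the storage levels in order (1, 2, 10, 100, 10^6, 10^9). The function name and the rounding choices are illustrative, so the results only roughly match the slide's labels.

```python
# Assumed pairing of storage levels with the tick counts shown on the chart.
LEVELS = [
    ("Registers", 1),
    ("On-chip cache", 2),
    ("On-board cache", 10),
    ("Memory", 100),
    ("Disk", 10**6),
    ("Tape / optical robot", 10**9),
]

def human_scale(ticks):
    """Convert clock ticks to travel time, assuming 1 tick = 1 minute."""
    minutes = ticks
    if minutes < 60:
        return f"{minutes} min"
    hours = minutes / 60
    if hours < 24:
        return f"{hours:.1f} hr"
    days = hours / 24
    if days < 365:
        return f"{days:.0f} days"
    return f"{days / 365:,.0f} years"

for name, ticks in LEVELS:
    print(f"{name:22s} {ticks:>13,} ticks  ~ {human_scale(ticks)}")
```

Under this scale a disk access is about two years away and a tape robot roughly two millennia, which is the point of the chart.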

Page 71: What Happens When Processing Storage Bandwidth  are Free and Infinite?

106

Tape Farms for Tertiary Storage, Not Mainframe Silos

Scan in 27 hours.
Many independent tape robots (like a disc farm)

One robot: 10 K$, 14 tapes, 500 GB, 5 MB/s, 20 $/GB, 30 Maps

100 robots: 1 M$, 50 TB, 50 $/GB, 3K Maps, 27 hr scan
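The 27-hour figure follows from capacity divided by bandwidth. A small check, assuming decimal units (1 GB = 1,000 MB) and that all robots in the farm stream in parallel; the function name is illustrative.

```python
def scan_hours(capacity_gb, bandwidth_mb_s):
    """Hours to read the full capacity once at the given bandwidth."""
    seconds = capacity_gb * 1000 / bandwidth_mb_s  # assumes 1 GB = 1,000 MB
    return seconds / 3600

# One robot: 500 GB at 5 MB/s.
robot_hours = scan_hours(500, 5)

# 100 independent robots: capacity and bandwidth both scale by 100,
# so the scan time stays the same.
farm_capacity_tb = 100 * 500 / 1000
farm_hours = scan_hours(100 * 500, 100 * 5)

print(f"one robot:  {robot_hours:.1f} hours")
print(f"100 robots: {farm_capacity_tb:.0f} TB in {farm_hours:.1f} hours")
```

Adding robots adds bandwidth along with capacity, so the farm can hold 100x the data without taking any longer to scan it.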

Page 72: What Happens When Processing Storage Bandwidth  are Free and Infinite?

107

The Metrics: Disk and Tape Farms Win

[Figure: log-scale chart (0.01 to 1,000,000) comparing 1000x Disc Farm, STC Tape Robot (6,000 tapes, 8 readers), and 100x DLT Tape Farm on GB/K$, Kaps, Maps, and SCANS/Day.]

Data Motel: Data checks in, but it never checks out

Page 73: What Happens When Processing Storage Bandwidth  are Free and Infinite?

108

Tape & Optical: Beware of the Media Myth

Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)

Tape is cheap: 50 $/tape, 20 GB/tape => 2.5 $/GB (100x cheaper than disc).

Page 74: What Happens When Processing Storage Bandwidth  are Free and Infinite?

109

Tape & Optical Reality: Media is 10% of System Cost

Tape needs a robot (10 k$ ... 3 M$), 10 ... 1000 tapes (at 20 GB each) => 20 $/GB ... 200 $/GB (1x…10x cheaper than disc)

Optical needs a robot (100 k$), 100 platters = 200 GB (TODAY) => 400 $/GB (more expensive than mag disc)

Robots have poor access times
Not good for Library of Congress (25 TB)
Data motel: data checks in but it never checks out!

Page 75: What Happens When Processing Storage Bandwidth  are Free and Infinite?

110

The Access Time Myth

The Myth: seek or pick time dominates
The reality:
(1) Queuing dominates
(2) Transfer dominates BLOBs
(3) Disk seeks often short

Implication: many cheap servers better than one fast expensive server
– shorter queues
– parallel transfer
– lower cost/access and cost/byte

This is now obvious for disk arrays
This will be obvious for tape arrays

[Figure: two breakdowns of access time. One shows Seek, Rotate, Transfer; the other shows Wait, Seek, Rotate, Transfer.]
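A rough way to see why queuing can dominate is the textbook M/M/1 response-time formula, R = S / (1 - utilization). The sketch below is not from the talk: it assumes Poisson arrivals, exponential service times, and load spread evenly across servers, and the service times and request rate are invented for illustration.

```python
def mm1_response_ms(service_ms, arrivals_per_s):
    """Mean response time (wait + service) of an M/M/1 queue, in ms."""
    utilization = arrivals_per_s * service_ms / 1000.0
    if utilization >= 1:
        raise ValueError("server is saturated")
    return service_ms / (1 - utilization)

TOTAL_LOAD = 90  # requests per second (hypothetical)

# One fast, expensive server: 10 ms per request, takes the whole load.
fast = mm1_response_ms(10, TOTAL_LOAD)

# Ten cheap servers: each is 2x slower (20 ms) but sees a tenth of the load.
cheap = mm1_response_ms(20, TOTAL_LOAD / 10)

print(f"one fast server:   {fast:.1f} ms (of which 10 ms is service)")
print(f"ten cheap servers: {cheap:.1f} ms (of which 20 ms is service)")
```

In this made-up case the fast server spends about 90% of each request's time in the queue, while the slower but lightly loaded servers respond sooner. The advantage comes from the extra aggregate capacity that many cheap devices provide, not from the devices being faster.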

Page 76: What Happens When Processing Storage Bandwidth  are Free and Infinite?

111

Billions Of Clients

Every device will be “intelligent”
Doors, rooms, cars…
Computing will be ubiquitous

Page 77: What Happens When Processing Storage Bandwidth  are Free and Infinite?

112

Billions Of Clients Need Millions Of Servers

[Figure: mobile clients and fixed clients connect to servers and a super server.]

All clients networked to servers
– May be nomadic or on-demand
Fast clients want faster servers
Servers provide
– Shared Data
– Control
– Coordination
– Communication

Page 78: What Happens When Processing Storage Bandwidth  are Free and Infinite?

113

1987: 256 tps Benchmark

14 M$ computer (Tandem)
A dozen people
False floor, 2 rooms of machines

Simulate 25,600 clients
A 32 node processor array
A 40 GB disk array (80 drives)

People: OS expert, Network expert, DB expert, Performance expert, Hardware experts, Admin expert, Auditor, Manager

Page 79: What Happens When Processing Storage Bandwidth  are Free and Infinite?

114

1988: DB2 + CICS Mainframe, 65 tps

IBM 4391
Simulated network of 800 clients
2 M$ computer
Staff of 6 to do benchmark

2 x 3725 network controllers
16 GB disk farm (4 x 8 x .5 GB)
Refrigerator-sized CPU

Page 80: What Happens When Processing Storage Bandwidth  are Free and Infinite?

115

1997: 10 years later
1 Person and 1 box = 1250 tps

1 Breadbox ~ 5x 1987 machine room
23 GB is hand-held
One person does all the work
Cost/tps is 1,000x less
25 micro dollars per transaction

4 x 200 MHz CPU
1/2 GB DRAM
12 x 4 GB disk
3 x 7 x 4 GB disk arrays

Hardware expert, OS expert, Net expert, DB expert, App expert

Page 81: What Happens When Processing Storage Bandwidth  are Free and Infinite?

116

What Happened?

Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)

New Economics: Commodity

class           price/mips ($/mips)   software (k$/year)
mainframe       10,000                100
minicomputer    100                   10
microcomputer   10                    1

GUI: Human-computer tradeoff; optimize for people, not computers

[Figure: price vs time for mainframe, mini, and micro.]
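The "4x every 3 years" rule compounds quickly. A short worked calculation, assuming smooth exponential growth between the 3-year steps; the function name is illustrative.

```python
def improvement(years, factor=4.0, period_years=3.0):
    """Cumulative improvement after `years` at `factor` per `period_years`."""
    return factor ** (years / period_years)

annual = improvement(1)   # growth factor per single year
decade = improvement(10)  # growth factor over ten years

print(f"per year:   {annual:.2f}x (about {100 * (annual - 1):.0f}% better)")
print(f"per decade: {decade:.0f}x")
```

At this rate things improve by roughly 59% a year, or about two orders of magnitude per decade.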

Page 82: What Happens When Processing Storage Bandwidth  are Free and Infinite?

117

What Happens Next

Last 10 years: 1000x improvement
Next 10 years: ????

Today: text and image servers are free
– 25 $/hit => advertising pays for them
Future: video, audio, … servers are free

“You ain’t seen nothing yet!”

[Figure: performance vs time, 1985 to 2005.]

Page 83: What Happens When Processing Storage Bandwidth  are Free and Infinite?

118

Smart Cards

Then (1979): Bull CP8 two chip card, first public demonstration 1979

Now (1997): EMV card with dynamic authentication (EMV = Europay, MasterCard, Visa standard); door key, vending machines, photocopiers

Courtesy of Dennis Roberson NCR.

Page 84: What Happens When Processing Storage Bandwidth  are Free and Infinite?

119

Smart Card Memory Capacity

Applications: cards will be able to store
– data (e.g. medical)
– books, movies, …
– money

[Figure: memory size (bits) vs year, 1990 to 2004, with axis marks at 3 K, 10 K, 1 M, and 300 M and a “You are here” marker.]

16 KB today, but growing super-exponentially

Source: PIN/Card-Tech / Courtesy of Dennis Roberson NCR