Scaleable Computing
Jim Gray
Microsoft Corporation
[email protected]@Microsoft.com
Thesis: Scaleable Servers
  Scaleable Servers
    Commodity hardware allows new applications
    New applications need huge servers
    Clients and servers are built of the same “stuff”
      Commodity software and commodity hardware
  Servers should be able to
    Scale up (grow node by adding CPUs, disks, networks)
    Scale out (grow by adding nodes)
    Scale down (can start small)
  Key software technologies
    Objects, Transactions, Clusters, Parallelism
1987: 256 tps Benchmark
  14 M$ computer (Tandem)
  A dozen people
  False floor, 2 rooms of machines
  Simulate 25,600 clients
  A 32 node processor array
  A 40 GB disk array (80 drives)
  Staff: OS expert, Network expert, DB expert, Performance expert, Hardware experts, Admin expert, Auditor, Manager
1988: DB2 + CICS Mainframe, 65 tps
  IBM 4391
  Simulated network of 800 clients
  2 M$ computer
  Staff of 6 to do benchmark
  2 x 3725 network controllers
  16 GB disk farm (4 x 8 x .5 GB)
  Refrigerator-sized CPU
1997: 10 years later, 1 Person and 1 box = 1250 tps
  1 Breadbox ~ 5x 1987 machine room
  23 GB is hand-held
  One person does all the work
    Hardware expert, OS expert, Net expert, DB expert, App expert
  Cost/tps is 1,000x less
  25 micro dollars per transaction
  4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk
  3 x 7 x 4 GB disk arrays
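The three benchmark slides can be compared on capital cost per unit of throughput. The sketch below uses only the prices and tps figures quoted on the 1987 and 1988 slides. The 1997 price is not given in the deck, so the last two values are implied estimates that assume the "1,000x less" claim is measured against the 1987 system.

```python
# Figures quoted on the 1987 and 1988 slides: system price (dollars) and throughput (tps).
benchmarks = {
    "1987 Tandem": {"price_usd": 14_000_000, "tps": 256},
    "1988 DB2 + CICS mainframe": {"price_usd": 2_000_000, "tps": 65},
}


def cost_per_tps(price_usd, tps):
    """Capital cost per transaction-per-second of capacity."""
    return price_usd / tps


for name, b in benchmarks.items():
    print(f"{name}: {cost_per_tps(b['price_usd'], b['tps']):,.0f} $/tps")

# Assumption: "Cost/tps is 1,000x less" is relative to the 1987 system.
cost_1987 = cost_per_tps(14_000_000, 256)
implied_1997_cost_per_tps = cost_1987 / 1000
implied_1997_price = implied_1997_cost_per_tps * 1250  # 1250 tps from the 1997 slide
print(f"implied 1997: {implied_1997_cost_per_tps:.1f} $/tps, "
      f"roughly {implied_1997_price:,.0f} $ for the whole box")
```

Under that assumption the 1997 box would cost on the order of tens of thousands of dollars, which fits the "1 breadbox" description but is not a number stated in the deck.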
What Happened?
  Moore’s law: things get 4x better every 3 years
    (applies to computers, storage, and networks)
  New economics: commodity
    class            price ($/mips)   software (k$/year)
    mainframe        10,000           100
    minicomputer     100              10
    microcomputer    10               1
  GUI: human-computer tradeoff
    optimize for people, not computers
  [Figure: price versus time for mainframe, mini, and micro]
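The "4x every 3 years" rule is easy to turn into yearly and per-decade factors. The sketch below is plain compound-growth arithmetic; the function name `growth` is illustrative and not from the deck.

```python
import math


def growth(factor, period_years, elapsed_years):
    """Compound improvement after elapsed_years if things improve by `factor` every period_years."""
    return factor ** (elapsed_years / period_years)


annual = growth(4, 3, 1)      # improvement per year
decade = growth(4, 3, 10)     # improvement per decade
doubling_years = 3 * math.log(2) / math.log(4)  # time to double

print(f"per year:   {annual:.2f}x")
print(f"per decade: {decade:.0f}x")
print(f"doubling time: {doubling_years:.1f} years")
```

By this arithmetic, hardware improvement alone gives about two orders of magnitude per decade, so a larger decade-over-decade gain has to come from somewhere else as well, such as the commodity pricing shown in the table.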
What Happens Next
  Last 10 years: 1000x improvement
  Next 10 years: ????
  Today: text and image servers are free
    25 $/hit => advertising pays for them
  Future: video, audio, … servers are free
  “You ain’t seen nothing yet!”
  [Figure: performance versus time, 1985 to 2005]
Kinds Of Information Processing
                  Point-to-point         Broadcast
    Immediate     Conversation, Money    Lecture, Concert     (Network)
    Time-shifted  Mail                   Book, Newspaper      (Database)
  It’s ALL going electronic
  Immediate is being stored for analysis (so ALL database)
  Analysis and automatic processing are being added
Why Put Everything In Cyberspace?
  Low rent: min $/byte
  Shrinks time: now or later
  Shrinks space: here or there
  Automate processing: knowbots
  [Figure: point-to-point OR broadcast; immediate OR time-delayed; Network and Database; Locate, Process, Analyze, Summarize]
Magnetic Storage Cheaper Than Paper
  File cabinet:
    cabinet (four drawer)       250$
    paper (24,000 sheets)       250$
    space (2x3 @ 10$/ft^2)      180$
    total                       700$
    3¢/sheet
  Disk:
    disk (4 GB =)               800$
    ASCII: 2 mil pages          0.04¢/sheet (80x cheaper)
    Image: 200,000 pages        0.4¢/sheet (8x cheaper)
  Store everything on disk
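The per-sheet figures follow directly from the totals on the slide. The sketch below redoes the division with the slide's own inputs; the slide rounds paper to 3¢/sheet before taking ratios, which is why it quotes 80x and 8x where the unrounded ratios come out a little lower.

```python
# Inputs as quoted on the slide.
cabinet_total_usd = 700      # slide's rounded total for cabinet + paper + floor space
sheets = 24_000
disk_usd = 800               # one 4 GB disk
ascii_pages = 2_000_000
image_pages = 200_000

paper_cents = 100 * cabinet_total_usd / sheets
ascii_cents = 100 * disk_usd / ascii_pages
image_cents = 100 * disk_usd / image_pages

print(f"paper: {paper_cents:.2f} cents/sheet")
print(f"ASCII on disk: {ascii_cents:.2f} cents/page, {paper_cents / ascii_cents:.0f}x cheaper")
print(f"image on disk: {image_cents:.2f} cents/page, {paper_cents / image_cents:.1f}x cheaper")
```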
Databases
  Information at Your Fingertips™
  Information Network™
  Knowledge Navigator™
  All information will be in an online database (somewhere)
  You might record everything you
    Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
    Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
    See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
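The daily and lifetime figures imply a lifetime length that the slide never states. A minimal sketch that backs it out, assuming decimal units (1 GB = 10^9 bytes); the variable names are illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12

# (bytes per day, bytes per lifetime) as quoted on the slide; 1.6 PB is written as 1600 TB.
recordings = {
    "read": (10 * MB, 400 * GB),
    "hear": (400 * MB, 16 * TB),
    "see": (40 * GB, 1600 * TB),
}

implied_days = {kind: lifetime / per_day for kind, (per_day, lifetime) in recordings.items()}
for kind, days in implied_days.items():
    print(f"{kind}: {days:,.0f} days, about {days / 365.25:.0f} years")

# "See" is quoted as 1 MB/s and 40 GB/day, which implies the recording hours per day.
seeing_hours_per_day = (40 * GB / (1 * MB)) / 3600
print(f"seeing: about {seeing_hours_per_day:.1f} hours recorded per day")
```

All three rows use the same 40,000-day lifetime, so the slide's numbers are internally consistent round figures and not precise estimates.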
Database Store ALL Data Types
  The old world:
    Millions of objects
    100-byte objects
    [Figure: People table with columns Name, Address; names Mike, Won, David; addresses Berk, Austin, NY]
  The new world:
    Billions of objects
    Big objects (1 MB)
    Objects have behavior (methods)
    [Figure: People table with columns Name, Address, Papers, Picture, Voice; names Mike, Won, David; addresses Berk, Austin, NY]
  Paperless office
  Library of Congress online
  All information online
    Entertainment
    Publishing
    Business
  WWW and Internet
Billions Of Clients
  Every device will be “intelligent”
  Doors, rooms, cars…
  Computing will be ubiquitous

Billions Of Clients Need Millions Of Servers
  All clients networked to servers
    May be nomadic or on-demand
  Fast clients want faster servers
  Servers provide
    Shared Data
    Control
    Coordination
    Communication
  [Figure: mobile clients and fixed clients connected to servers and super servers]
Thesis: Many little beat few big
  Smoking, hairy golf ball
  How to connect the many little parts?
  How to program the many little parts?
  Fault tolerance?
  [Figure: price tiers $1 million, $100 K, $10 K across Mainframe, Mini, Micro, Nano; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"]
  1 M SPECmarks, 1 TFLOP
  10^6 clocks to bulk ram
  Event-horizon on chip
  VM reincarnated
  Multiprogram cache, on-chip SMP
  [Figure: Pico Processor, 1 MM 3; storage levels 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]
Future Super Server: 4T Machine
  Array of 1,000 4B machines
    1 bps processors
    1 BB DRAM
    10 BB disks
    1 Bbps comm lines
    1 TB tape robot
  A few megabucks
  Challenge:
    Manageability
    Programmability
    Security
    Availability
    Scaleability
    Affordability
  As easy as a single system
  Future servers are CLUSTERS of processors, discs
  Distributed database techniques make clusters work
  [Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
Performance = Storage Accesses, not Instructions Executed
  In the “old days” we counted instructions and IO’s
  Now we count memory references
  Processors wait most of the time
  [Figure: Where the time goes: clock ticks used by AlphaSort components: Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, B-Cache Data Miss]
  70 MIPS
  “real” apps have worse Icache misses, so run at 60 MIPS if well tuned, 20 MIPS if not
Storage Latency: How Far Away is the Data?
    Level                   Clock ticks   Analogy       Travel time
    Registers               1             My Head       1 min
    On Chip Cache           2             This Room
    On Board Cache          10            This Campus   10 min
    Memory                  100           Sacramento    1.5 hr
    Disk                    10^6          Pluto         2 Years
    Tape / Optical Robot    10^9          Andromeda     2,000 Years
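The travel-time analogy works by treating one clock tick as one minute of human time. The sketch below applies that scaling, assuming the tick counts on the slide read as 1, 2, 10, 100, 10^6, and 10^9; the helper name `as_human_time` is illustrative.

```python
# Clock ticks per storage level, as read from the slide (an assumption about the garbled axis).
TICKS = {
    "registers": 1,
    "on-chip cache": 2,
    "on-board cache": 10,
    "memory": 100,
    "disk": 10**6,
    "tape / optical robot": 10**9,
}


def as_human_time(ticks):
    """Scale latency so that one clock tick equals one minute."""
    minutes = ticks
    if minutes < 60:
        return f"{minutes} min"
    hours = minutes / 60
    if hours < 24:
        return f"{hours:.1f} hr"
    days = hours / 24
    if days < 365.25:
        return f"{days:.0f} days"
    return f"{days / 365.25:,.0f} years"


for level, ticks in TICKS.items():
    print(f"{level:22s} {ticks:>13,} ticks  ->  {as_human_time(ticks)}")
```

The computed values land close to the slide's round numbers: memory is about 1.7 hours, disk about 2 years, and tape about 1,900 years.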
The Hardware Is In Place… And then a miracle occurs
  ?
  SNAP: scaleable network and platforms
  Commodity-distributed OS built on:
    Commodity platforms
    Commodity network interconnect
  Enables parallel applications
Scaleable Servers: BOTH SMP And Cluster
  Grow up with SMP; 4xP6 is now standard
  Grow out with cluster
  Cluster has inexpensive parts
  [Figure: personal system, departmental server, SMP super server, cluster of PCs]
SMPs Have Advantages
  Single system image: easier to manage, easier to program
    threads in shared memory, disk, Net
  4x SMP is commodity
  Software capable of 16x
  Problems:
    >4 not commodity
    Scale-down problem (starter systems expensive)
    There is a BIGGEST one
  [Figure: personal system, departmental server, SMP super server]
Building the Largest Node
  There is a biggest node (size grows over time)
  Today, with NT, it is probably 1 TB
  We are building it (with help from DEC and SPIN2)
    1 TB GeoSpatial SQL Server database (1.4 TB of disks = 320 drives).
    30K BTU, 8 KVA, 1.5 metric tons.
  Will put it on the Web as a demo app.
    10 meter image of the ENTIRE PLANET.
    2 meter image of interesting parts (2% of land)
    One pixel per meter = 500 TB uncompressed.
    Better resolution in US (courtesy of USGS).
  www.SQL.1TB.com
  [Figure: 1-TB home page; 1-TB SQL Server DB; satellite and aerial photos; support files]
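The "one pixel per meter = 500 TB" figure can be checked against the size of the planet. The sketch below assumes one byte per uncompressed pixel, a mean Earth radius of 6,371 km, and a land fraction of 29%; none of these three values appear on the slide.

```python
import math

EARTH_RADIUS_M = 6_371_000   # assumption: mean Earth radius in meters
LAND_FRACTION = 0.29         # assumption: share of the surface that is land
BYTES_PER_PIXEL = 1          # assumption: one byte per uncompressed pixel
TB = 10**12


def image_bytes(area_m2, meters_per_pixel):
    """Uncompressed size of an image covering area_m2 at the given ground resolution."""
    return area_m2 / meters_per_pixel**2 * BYTES_PER_PIXEL


surface_m2 = 4 * math.pi * EARTH_RADIUS_M**2
one_meter_bytes = image_bytes(surface_m2, 1)
ten_meter_bytes = image_bytes(surface_m2, 10)
two_meter_bytes = image_bytes(surface_m2 * LAND_FRACTION * 0.02, 2)

print(f"1 m, whole planet:  {one_meter_bytes / TB:,.0f} TB")
print(f"10 m, whole planet: {ten_meter_bytes / TB:.1f} TB")
print(f"2 m, 2% of land:    {two_meter_bytes / TB:.2f} TB")
```

Under these assumptions the one-meter figure comes out near 510 TB, in line with the slide. How the 10-meter and 2-meter images fit into a 1 TB database (compression, coverage, pixel depth) is not stated in the deck.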
What’s a TeraByte?
  1 Terabyte:
    1,000,000,000 business letters    150 miles of book shelf
    100,000,000 book pages            15 miles of book shelf
    50,000,000 FAX images             7 miles of book shelf
    10,000,000 TV pictures (mpeg)     10 days of video
    4,000 LandSat images              16 earth images (100m)
    100,000,000 web pages             10 copies of the web HTML
  Library of Congress (in ASCII) is 25 TB
  1980: 200 million $ of disc             10,000 discs
        5 million $ of tape silo          10,000 tapes
  1997: 200 k$ of magnetic disc           48 discs
        30 k$ nearline tape               20 tapes
  Terror Byte!
TB DB User Interface
TPC-C Web-Based Benchmarks
  Client is a Web browser (7,500 of them!)
  Submits
    Order
    Invoice
    Query
    to server via Web page interface
  Web server translates to DB
  SQL does DB work
  Net:
    easy to implement
    performance is GREAT!
  [Figure: browsers connect over HTTP to IIS (the Web server), which connects over ODBC to SQL]
TPC-C Shows How Far SMPs have come
  Performance is amazing:
    2,000 users is the min!
    30,000 users on a 4x12 alpha cluster (Oracle)
  Peak Performance: 30,390 tpmC @ $305/tpmC (Oracle/DEC)
  Best Price/Perf: 6,712 tpmC @ $65/tpmC (MS SQL/DEC/Intel)
  Graphs show UNIX high price & diseconomy of scaleup
  [Figure: tpmC & Price Performance (only "best" data shown for each vendor); x axis tpmC, 0 to 20,000; y axis $/tpmC, 0 to 400; series DB2, Informix, MS SQL Server, Oracle, Sybase]
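Because $/tpmC is total system price divided by throughput, multiplying the two quoted numbers gives the implied total price of each configuration. A minimal sketch using only the figures on the slide:

```python
# (tpmC, dollars per tpmC) as quoted on the slide.
results = {
    "peak performance (Oracle/DEC)": (30_390, 305),
    "best price/perf (MS SQL/DEC/Intel)": (6_712, 65),
}

implied_price = {name: tpmc * dollars for name, (tpmc, dollars) in results.items()}
for name, price in implied_price.items():
    print(f"{name}: about ${price:,}")

peak_tpmc, peak_dollars = results["peak performance (Oracle/DEC)"]
best_tpmc, best_dollars = results["best price/perf (MS SQL/DEC/Intel)"]
print(f"throughput ratio: {peak_tpmc / best_tpmc:.1f}x, "
      f"price ratio: {implied_price['peak performance (Oracle/DEC)'] / implied_price['best price/perf (MS SQL/DEC/Intel)']:.1f}x")
```

The peak system delivers about 4.5 times the throughput for about 21 times the price, which is the "diseconomy of scaleup" the slide refers to.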
TPC-C SMP Performance
  [Figure: tpmC vs CPUs; x axis CPUs, 0 to 20; y axis tpmC, 0 to 20,000; series SUN Scaleability]
  [Figure: tpmC vs CPUs; x axis CPUs, 0 to 20; y axis tpmC, 0 to 20,000; series SUN Scaleability, SQL Server]
  SMPs do offer speedup, but 4x P6 is better than some 18x MIPSco
The TPC-C Revolution Shows How Far NT and SQL Server have Come
  Economy of scale on Windows NT
  Recent Microsoft SQL Server benchmarks are Web-based
  [Figure: tpmC and $/tpmC, MS SQL Server: Economy of Scale & Low Price; x axis Performance tpmC, 0 to 8000; y axis Price $/tpmC, $0 to $250; series DB2, Informix, Microsoft, Oracle, Sybase; arrows marked "Better"]
What Happens To Prices?
  No expensive UNIX front end (20$/tpmC)
  No expensive TP monitor software (10$/tpmC)
  => 65$/tpmC
  [Figure: TPC Price/tpmC bar chart, y axis 0 to 100, broken down by processor, disk, software, net; systems: Informix on SNI, Oracle on DEC Unix, Oracle on Compaq/NT, Sybase on Compaq/NT, Microsoft on Compaq with Visigenics, Microsoft on HP with Visigenics, Microsoft on Intergraph with IIS, Microsoft on Compaq with IIS]
Grow UP and OUT
  [Figure: personal system, departmental server, SMP super server; 1 Terabyte DB; 1 billion transactions per day]
  Cluster:
    a collection of nodes
    as easy to program and manage as a single node
Clusters Have Advantages
  Clients and servers made from the same stuff
  Inexpensive:
    Built with commodity components
  Fault tolerance:
    Spare modules mask failures
  Modular growth
    Grow by adding small modules
  Unlimited growth: no biggest one
Windows NT clusters
  Key goals:
    Easy: to install, manage, program
    Reliable: better than a single node
    Scaleable: added parts add power
  Microsoft & 60 vendors defining NT clusters
    Almost all big hardware and software vendors involved
  No special hardware needed, but it may help
  Enables
    Commodity fault-tolerance
    Commodity parallelism (data mining, virtual reality…)
    Also great for workgroups!
  Initial: two-node failover
    Beta testing since December96
    SAP, Microsoft, Oracle giving demos.
    File, print, Internet, mail, DB, other services
    Easy to manage
    Each node can be 4x (or more) SMP
  Next (NT5) “Wolfpack” is modest size cluster
    About 16 nodes (so 64 to 128 CPUs)
    No hard limit, algorithms designed to go further
SQL Server™ Failover Using “Wolfpack” Windows NT Clusters
  Each server “owns” half the database
  When one fails…
    The other server takes over the shared disks
    Recovers the database and serves it
  [Figure: clients connect to servers A and B; each server has private disks; both attach to shared SCSI disk strings]
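The failover rule on this slide can be modeled in a few lines. The sketch below is a toy model of the ownership hand-off only; `FailoverPair` and its methods are hypothetical names, not part of any Wolfpack or SQL Server API, and the recovery step is reduced to a comment.

```python
class FailoverPair:
    """Toy model of two servers that each own half of a database on shared disks."""

    def __init__(self):
        self.alive = {"A": True, "B": True}
        self.owner = {"half_1": "A", "half_2": "B"}   # each server owns half the database

    def fail(self, node):
        """Mark a node as failed and hand its half to the surviving node."""
        self.alive[node] = False
        survivor = "B" if node == "A" else "A"
        if not self.alive[survivor]:
            raise RuntimeError("both nodes are down; no one can take over")
        for part, current_owner in self.owner.items():
            if current_owner == node:
                # The survivor takes over the shared disks, recovers the
                # database (not modeled here), and starts serving it.
                self.owner[part] = survivor

    def route(self, part):
        """Return the server that a client request for this half should go to."""
        return self.owner[part]


pair = FailoverPair()
before = (pair.route("half_1"), pair.route("half_2"))
pair.fail("A")
after = (pair.route("half_1"), pair.route("half_2"))
print(before, "->", after)
```

After node A fails, both halves are served by B, so clients keep working at reduced capacity.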
Billion Transactions per Day Project
  Building a 20-node Windows NT Cluster (with help from Intel)
    > 800 disks
  All commodity parts
  Using SQL Server & DTC distributed transactions
  Each node has 1/20th of the DB
  Each node does 1/20th of the work
  15% of the transactions are “distributed”
How Much Is 1 Billion Transactions Per Day?
  [Figure: millions of transactions per day (Mtpd), log scale 0.1 to 1,000, for 1 Btpd, Visa, AT&T, BofA, NYSE]
  1 Btpd = 11,574 tps (transactions per second)
         ~ 700,000 tpm (transactions/minute)
  AT&T
    185 million calls (peak day worldwide)
  Visa ~20 M tpd
    400 M customers
    250,000 ATMs worldwide
    7 billion transactions / year (card+cheque) in 1994
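The conversions on this slide are simple unit arithmetic. The sketch below reproduces them and adds the per-node rate for the 20-node cluster described in the project; the even split across nodes is the project's stated design, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
NODES = 20                           # cluster size from the Billion Transactions per Day project

btpd = 1_000_000_000                 # one billion transactions per day
tps = btpd / SECONDS_PER_DAY         # transactions per second
tpm = tps * 60                       # transactions per minute
per_node_tps = tps / NODES           # each node does 1/20th of the work

visa_tpd = 7_000_000_000 / 365       # 7 billion transactions per year, spread evenly

print(f"1 Btpd = {tps:,.0f} tps = {tpm:,.0f} tpm")
print(f"per node: {per_node_tps:,.0f} tps")
print(f"Visa: about {visa_tpd / 1e6:.0f} M tpd")
```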
Parallelism: The OTHER aspect of clusters
  Clusters of machines allow two kinds of parallelism
    Many little jobs: online transaction processing
      TPC-A, B, C…
    A few big jobs: data search and analysis
      TPC-D, DSS, OLAP
  Both give automatic parallelism
Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey
Kinds of Parallel Execution
  Pipeline: one Any Sequential Program feeds the next Any Sequential Program
  Partition: outputs split N ways, inputs merge M ways; copies of Any Sequential Program run side by side
Data Rivers: Split + Merge Streams
  Producers add records to the river; consumers consume records from the river
  Purely sequential programming.
  River does flow control and buffering
    does partition and merge of data records
  River = Split/Merge in Gamma = Exchange operator in Volcano.
  [Figure: N producers feed the River, which feeds M consumers; N x M data streams]
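A river can be sketched with bounded queues: producers and consumers stay purely sequential, and the river does the partitioning, buffering, and flow control. The code below is an illustrative toy, not the Gamma or Volcano implementation; `River`, `producer`, `consumer`, and `run` are hypothetical names, and CRC32 hashing is just one assumed way to partition records.

```python
import queue
import threading
import zlib


class River:
    """Routes records from N producers to M consumers through bounded queues."""

    def __init__(self, n_consumers, buffer_size=8):
        self.streams = [queue.Queue(maxsize=buffer_size) for _ in range(n_consumers)]

    def put(self, key, record):
        # Partition by key; put() blocks when the stream is full (flow control).
        index = zlib.crc32(key.encode()) % len(self.streams)
        self.streams[index].put((key, record))

    def close(self):
        for stream in self.streams:
            stream.put(None)              # end-of-stream marker for each consumer

    def records(self, consumer_id):
        while True:
            item = self.streams[consumer_id].get()
            if item is None:
                return
            yield item


def producer(river, words):
    for word in words:                    # plain sequential loop
        river.put(word, 1)


def consumer(river, consumer_id, results):
    counts = {}
    for key, value in river.records(consumer_id):   # plain sequential loop
        counts[key] = counts.get(key, 0) + value
    results[consumer_id] = counts


def run(inputs, n_consumers):
    river = River(n_consumers)
    results = [None] * n_consumers
    consumers = [threading.Thread(target=consumer, args=(river, i, results))
                 for i in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(river, chunk)) for chunk in inputs]
    for thread in consumers + producers:
        thread.start()
    for thread in producers:
        thread.join()
    river.close()
    for thread in consumers:
        thread.join()
    merged = {}
    for part in results:                  # keys are disjoint across consumers
        merged.update(part)
    return merged


word_counts = run([["a", "b", "a"], ["b", "c"], ["a"]], n_consumers=2)
print(word_counts)
```

Here three producers and two consumers form a 3 x 2 set of data streams, yet neither `producer` nor `consumer` contains any parallel code.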
Partitioned Execution
  Spreads computation and IO among processors
  Partitioned data gives NATURAL parallelism
  [Figure: a table partitioned into A...E, F...J, K...N, O...S, T...Z; one Count per partition feeds a final Count]
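The partitioned count in the figure can be sketched as one count per key range followed by a final sum. The code below is a structural illustration under stated assumptions: `partition` and `parallel_count` are hypothetical names, rows are assumed to be names starting with a letter A to Z, and Python threads are used only to show the shape of the computation (they do not give CPU parallelism in CPython).

```python
from concurrent.futures import ThreadPoolExecutor

# The five key ranges shown in the figure.
RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition(names):
    """Place each name in the partition whose range contains its first letter."""
    parts = [[] for _ in RANGES]
    for name in names:
        first = name[0].upper()
        for index, (low, high) in enumerate(RANGES):
            if low <= first <= high:
                parts[index].append(name)
                break
    return parts


def parallel_count(names):
    """Count each partition independently, then combine the partial counts."""
    parts = partition(names)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_counts = list(pool.map(len, parts))   # one Count per partition
    return sum(partial_counts), partial_counts        # the final Count


total, per_partition = parallel_count(["Alice", "Bob", "Gray", "Mike", "Won", "Zed"])
print(total, per_partition)
```

Each partial count touches only its own partition, which is why partitioned data parallelizes without any coordination until the final sum.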
N x M way Parallelism
  N inputs, M outputs, no bottlenecks.
  Partitioned data; partitioned and pipelined data flows
  [Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Sort and then a Join; the join outputs flow into three Merge operators]
The Parallel Law Of Computing
  Grosch's Law: 2x $ is 4x performance
    1 MIPS         1 $
    1,000 MIPS     32 $     (.03 $/MIPS)
  Parallel Law: 2x $ is 2x performance
    1 MIPS         1 $
    1,000 MIPS     1,000 $
    Needs: linear speedup and linear scale-up
    Not always possible
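The two laws differ only in the exponent relating price to performance. A minimal sketch, normalized as on the slide so that 1 MIPS costs 1 $; the function names are illustrative.

```python
import math


def grosch_cost(mips):
    """Grosch's law: performance grows as the square of cost, so cost = sqrt(performance)."""
    return math.sqrt(mips)


def parallel_cost(mips):
    """Parallel law: performance grows linearly with cost (needs linear speedup and scale-up)."""
    return float(mips)


for mips in (1, 1_000):
    print(f"{mips:>5} MIPS: Grosch {grosch_cost(mips):7.2f} $ "
          f"({grosch_cost(mips) / mips:.3f} $/MIPS), "
          f"parallel {parallel_cost(mips):8.2f} $ "
          f"({parallel_cost(mips) / mips:.3f} $/MIPS)")
```

Under Grosch's law the big machine gets cheaper per MIPS as it grows; under the parallel law the price per MIPS stays flat, which is what makes building from many small nodes competitive.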
The BIG Picture: Components and transactions
  Software modules are objects
  Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
  Standard interfaces allow software plug-ins
  Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated
  [Figure: Object Request Broker]
ActiveX and COM
  COM is Microsoft model, engine inside OLE
  ALL Microsoft software is based on COM (ActiveX)
  CORBA + OpenDoc is equivalent
  Heated debate over which is best
  Both share same key goals:
    Encapsulation: hide implementation
    Polymorphism: generic operations; key to GUI and reuse
    Versioning: allow upgrades
    Transparency: local/remote
    Security: invocation can be remote
    Shrink-wrap: minimal inheritance
    Automation: easy
  COM now managed by the Open Group
Linking And Embedding
  Objects are data modules; transactions are execution modules
  Link: pointer to object somewhere else
    Think URL in Internet
  Embed: bytes are here
  Objects may be active; can callback to subscribers
Objects Meet Databases
  The basis for universal data servers, access, & integration
  Object-oriented (COM oriented) programming interface to data
  Breaks DBMS into components
  Anything can be a data source
  Optimization/navigation “on top of” other data sources
  A way to componentize a DBMS
  Makes an RDBMS an O-R DBMS (assumes optimizer understands objects)
  [Figure: DBMS engine over data sources: Database, Spreadsheet, Photos, Mail, Map, Document]
The Pattern: Three Tier Computing
  Clients do presentation, gather input
  Clients do some workflow (Xscript)
  Clients send high-level requests to ORB (Object Request Broker)
  ORB dispatches workflows and business objects -- proxies for client, orchestrate flows & queues
  Server-side workflow scripts call on distributed business objects to execute task
  [Figure: tiers from top to bottom: Presentation, workflow, Business Objects, Database]
The Three Tiers
  [Figure:
    Web Client: HTML; VB or Java Script Engine (VBScript, JavaScript); VB or Java Virt Machine (VB, Java plug-ins)
    Internet: HTTP + DCOM
    Middleware: ORB, TP Monitor, Web Server...; object server pool
    DCOM (OLE DB, ODBC, ...) to Object & Data server
    LU6.2 to IBM Legacy Gateways]
Why Did Everyone Go To Three-Tier?
  Manageability
    Business rules must be with data
    Middleware operations tools
  Performance (scaleability)
    Server resources are precious
    ORB dispatches requests to server pools
  Technology & Physics
    Put UI processing near user
    Put shared data processing near shared data
  [Figure: tiers from top to bottom: Presentation, workflow, Business Objects, Database]
What Middleware Does
  ORB, TP Monitor, Workflow Mgr, Web Server
  Registers transaction programs: workflow and business objects (DLLs)
  Pre-allocates server pools
  Provides server execution environment
  Dynamically checks authority (request-level security)
  Does parameter binding
  Dispatches requests to servers
    parameter binding
    load balancing
  Provides Queues
  Operator interface
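Several of the middleware duties listed above (registration, pre-allocated server pools, request-level authority checks, parameter binding, and load-balanced dispatch) can be shown in a toy broker. This is a sketch under stated assumptions: `Broker` and `Server` are hypothetical classes, round-robin is just one assumed load-balancing policy, and nothing here reflects a real ORB or TP monitor API.

```python
import itertools


class Server:
    """One pre-allocated server in a pool; runs the registered program."""

    def __init__(self, server_id, handler):
        self.server_id = server_id
        self.handler = handler
        self.calls = 0

    def invoke(self, **params):
        self.calls += 1
        return self.handler(**params)       # parameter binding by keyword


class Broker:
    """Toy request broker: registers programs, checks authority, dispatches to pools."""

    def __init__(self):
        self._pools = {}
        self._allowed = {}

    def register(self, name, handler, pool_size, allowed_users):
        pool = [Server(i, handler) for i in range(pool_size)]   # pre-allocate the pool
        self._pools[name] = (pool, itertools.cycle(pool))
        self._allowed[name] = set(allowed_users)

    def dispatch(self, user, name, **params):
        # Request-level security: checked on every call.
        if user not in self._allowed.get(name, set()):
            raise PermissionError(f"{user} may not call {name}")
        _, round_robin = self._pools[name]
        return next(round_robin).invoke(**params)               # load balancing

    def load(self, name):
        return [server.calls for server in self._pools[name][0]]


broker = Broker()
broker.register("new_order", lambda item, qty: {"item": item, "qty": qty},
                pool_size=2, allowed_users={"clerk"})
replies = [broker.dispatch("clerk", "new_order", item="disk", qty=n) for n in range(4)]
print(replies[-1], broker.load("new_order"))
```

The client names only the program and its parameters; which server in the pool runs the request is the broker's decision.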
Server Side Objects: Easy Server-Side Execution
  Give simple execution environment
  Object gets
    start
    invoke
    shutdown
  Everything else is automatic
  Drag & Drop Business Objects
  [Figure: A Server: Network, Receiver, Queue, Connections, Thread Pool, Context, Security, Synchronization, Shared Data, Service logic, Configuration, Management]
A new programming paradigm
  Develop object on the desktop
  Better yet: download them from the Net
  Script work flows as method invocations
  All on desktop
  Then, move work flows and objects to server(s)
  Gives
    desktop development
    three-tier deployment
  Software Cyberbricks
Transactions Coordinate Components (ACID)
  Transaction properties
    Atomic: all or nothing
    Consistent: old and new values
    Isolated: automatic locking or versioning
    Durable: once committed, effects survive
  Transactions are built into modern OSs
    MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC

Transactions & Objects
  Application requests transaction identifier (XID)
  XID flows with method invocations
  Object Managers join (enlist) in transaction
  Distributed Transaction Manager coordinates commit/abort
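The XID flow above can be sketched with a toy transaction manager. The deck does not say which commit protocol the transaction manager uses; the sketch assumes two-phase commit, a common choice for this job, and all class and method names are hypothetical (this is not the DTC API).

```python
import itertools


class ObjectManager:
    """A participant that enlists in a transaction and votes at prepare time."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = {}                       # xid -> last known outcome

    def prepare(self, xid):
        self.state[xid] = "prepared" if self.will_vote_yes else "refused"
        return self.will_vote_yes

    def commit(self, xid):
        self.state[xid] = "committed"

    def abort(self, xid):
        self.state[xid] = "aborted"


class TransactionManager:
    """Hands out XIDs and coordinates commit/abort (assumed: two-phase commit)."""

    def __init__(self):
        self._next_xid = itertools.count(1)
        self._enlisted = {}

    def begin(self):
        xid = next(self._next_xid)
        self._enlisted[xid] = []
        return xid

    def enlist(self, xid, manager):
        if manager not in self._enlisted[xid]:
            self._enlisted[xid].append(manager)

    def commit(self, xid):
        members = self._enlisted.pop(xid)
        # Phase 1: ask everyone to prepare. Phase 2: commit only if all voted yes.
        if all(member.prepare(xid) for member in members):
            for member in members:
                member.commit(xid)
            return "committed"
        for member in members:
            member.abort(xid)
        return "aborted"


tm = TransactionManager()
orders, stock = ObjectManager("orders"), ObjectManager("stock")
xid_ok = tm.begin()
tm.enlist(xid_ok, orders)                     # the XID travels with each method call
tm.enlist(xid_ok, stock)
outcome_ok = tm.commit(xid_ok)

flaky = ObjectManager("billing", will_vote_yes=False)
xid_bad = tm.begin()
tm.enlist(xid_bad, orders)
tm.enlist(xid_bad, flaky)
outcome_bad = tm.commit(xid_bad)
print(outcome_ok, outcome_bad)
```

Either every enlisted object manager commits or every one aborts, which is the all-or-nothing property listed under Atomic.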
Distributed Transactions Enable Huge Throughput
  Each node capable of 7 KtpmC (7,000 active users!)
  Can add nodes to cluster (to support 100,000 users)
  Transactions coordinate nodes
  ORB / TP monitor spreads work among nodes

Distributed Transactions Enable Huge DBs
  Distributed database technology spreads data among nodes
  Transaction processing technology manages nodes
Thesis: Scaleable Servers
  Scaleable Servers Built from Cyberbricks
    Allow new applications
  Servers should be able to
    Scale up, out, down
  Key software technologies
    Clusters (ties the hardware together)
    Parallelism (uses the independent cpus, stores, wires)
    Objects (software CyberBricks)
    Transactions: masks errors.
Computer Industry Laws (Rules of thumb)
Metcalfe’s law
Moore’s first law
Bell’s computer classes (7 price tiers)
Bell’s platform evolution
Bell’s platform economics
Bill’s law
Software economics
Grove’s law
Moore’s second law
Is info-demand infinite?
The death of Grosch’s law
Metcalfe’s Law
Network Utility = Users^2
How many connections can it make?
  1 user: no utility
  100,000 users: a few contacts
  1 million users: many on Net
  1 billion users: everyone on Net
That is why the Internet is so “hot”
  Exponential benefit
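A small sketch of the arithmetic behind this slide, under the assumption that "utility" is counted either as Users squared (the slide's formula) or as the number of distinct user pairs, which is one way to read "how many connections can it make". Both function names are illustrative. Note that both measures grow quadratically with the number of users.

```python
def network_utility(users: int) -> int:
    """Utility as stated on the slide: the square of the number of users."""
    return users ** 2


def possible_connections(users: int) -> int:
    """Distinct user-to-user pairs; zero for a single user (assumed reading)."""
    return users * (users - 1) // 2


# Doubling the user count quadruples the slide's utility measure.
ratio = network_utility(2_000) / network_utility(1_000)
```

With the pair-counting reading, a single user has zero possible connections, which matches the "1 user: no utility" bullet.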
Moore’s First Law
XXX doubles every 18 months: 60% increase per year
  Micro processor speeds
  Chip density
  Magnetic disk density
  Communications bandwidth
  WAN bandwidth approaching LANs
Exponential growth:
  The past does not matter
  10x here, 10x there, soon you’re talking REAL change
PC costs decline faster than any other platform
  Volume and learning curves
  PCs will be the building bricks of all future systems
[Figure: 1 chip memory size (2 MB to 32 MB), 1970 to 2000. Chip capacity in bits: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Memory size axis: 8KB, 128KB, 1MB, 8MB, 128MB, 1GB]
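The "doubles every 18 months" and "60% increase per year" figures can be related with a short calculation. This is an illustrative sketch; the function names are not from the slides.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Yearly multiplier implied by a given doubling period in months."""
    return 2.0 ** (12.0 / doubling_months)


def growth_over(years: float, doubling_months: float) -> float:
    """Total multiplier after the given number of years."""
    return 2.0 ** (years * 12.0 / doubling_months)


yearly = annual_growth_factor(18)   # roughly 1.59, i.e. close to 60% per year
decade = growth_over(10, 18)        # roughly 100x over ten years
```

An 18-month doubling period therefore compounds to about two orders of magnitude per decade, which is why the slide says the past does not matter.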
Bumps In The Moore’s Law Road
DRAM:
  1988: United States anti-dumping rules
  1993-1995: ?price flat
Magnetic disk:
  1965-1989: 10x/decade
  1989-1996: 4x/3 years! (100x/decade)
[Figure: $/MB of DRAM, 1970 to 2000, log scale from 1 to 1,000,000]
[Figure: $/MB of disk, 1970 to 2000, log scale from .01 to 10,000]
Gordon Bell’s 1975 VAX Planning Model... He Didn’t Believe It!
5x: Memory is 20% of cost
3x: DEC markup
.04x: $ per byte
He didn’t believe: the projection, $500 machine
He couldn’t comprehend the implications
[Figure: system price from 0.01 K$ to 100,000 K$ (log scale), 1960 to 2000, one curve per memory size: 16 KB, 64 KB, 256 KB, 1 MB, 8 MB]
System Price = 5 x 3 x .04 x memory size / 1.26^(t-1972) K$
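The planning formula can be written as a small function. This is a sketch under one loudly stated assumption: the slide does not give units for "memory size", so the code takes it in KB, which makes the .04 factor read as K$ per KB (4 cents per byte) in 1972. The function and parameter names are illustrative.

```python
def system_price_k_dollars(memory_kb: float, year: int) -> float:
    """Bell's planning model as given on the slide, result in K$.

    ASSUMPTION: memory_kb is in kilobytes; the slide does not state the unit.
    """
    memory_share = 5        # memory is 20% of system cost, so system = 5 x memory
    markup = 3              # DEC markup
    cost_per_unit = 0.04    # $ per byte in 1972, read here as K$ per KB
    yearly_decline = 1.26   # memory price falls by this factor each year
    return memory_share * markup * cost_per_unit * memory_kb / yearly_decline ** (year - 1972)


price_1972 = system_price_k_dollars(16, 1972)
price_1975 = system_price_k_dollars(16, 1975)
```

Under that assumption the price of a fixed-memory machine halves roughly every three years, since 1.26 cubed is about 2.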
Gordon Bell’s Processing, Memories, And Comm 100 Years
[Figure: log-scale plot from 1.E+00 to 1.E+18, 1947 to 2047. Series: Processing, Pri. Mem, Sec. Mem., POTS (bps), Backbone]
Gordon Bell’s Seven Price Tiers
10$: wrist watch computers
100$: pocket/palm computers
1,000$: portable computers
10,000$: personal computers (desktop)
100,000$: departmental computers (closet)
1,000,000$: site computers (glass house)
10,000,000$: regional computers (glass castle)
Super server: costs more than $100,000
“Mainframe”: costs more than $1 million
Must be an array of processors, disks, tapes, comm ports
Bell’s Evolution Of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
[Figure: log price versus time, one band per class: Mainframes (central), Minis (dep’t.), WSs, PCs (personals)]
1.26 = 2x/3 yrs, 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs, 100x/decade; 1/1.6 = .62
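The two rate lines above can be checked by compounding each annual factor over three and ten years (values rounded):

```latex
\begin{align*}
1.26^{3} &\approx 2.0, & 1.26^{10} &\approx 10.1, & 1/1.26 &\approx 0.79 \\
1.6^{3}  &\approx 4.1, & 1.6^{10}  &\approx 110,  & 1/1.6  &= 0.625
\end{align*}
```

The reciprocals are the yearly price multipliers at constant performance: roughly 0.8 on the slower curve and roughly 0.62 on the faster one.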
Gordon Bell’s Platform Economics
[Figure: chart by computer type (Mainframe, WS, Browser), log scale from 0.01 to 100,000. Series: Price (K$), Volume (K), Application price]
Traditional computers: custom or semi-custom, high-tech and high-touch
New computers: high-tech and no-touch
Software Economics
An engineer costs about $150,000/year
R&D gets [5%…15%] of budget
Need [$3 million…$1 million] revenue per engineer
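The revenue range follows from dividing the engineer's cost by the R&D share of the budget. A minimal sketch, with an illustrative function name:

```python
def revenue_per_engineer(engineer_cost: float, rd_share: float) -> float:
    """Revenue needed per engineer when R&D receives rd_share of the budget."""
    if not 0 < rd_share <= 1:
        raise ValueError("rd_share must be a fraction in (0, 1]")
    return engineer_cost / rd_share


low_rd = revenue_per_engineer(150_000, 0.05)    # R&D gets 5% of budget
high_rd = revenue_per_engineer(150_000, 0.15)   # R&D gets 15% of budget
```

The smaller the R&D share, the more revenue each engineer has to be backed by, which is why the range on the slide runs from $3 million down to $1 million.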
Company                  R&D   SG&A   P&S   Tax   Profit
Microsoft: $9 billion    16%   34%    13%   13%   24%
Intel: $16 billion        8%   11%    47%   12%   22%
IBM: $72 billion          8%   22%    59%    5%    6%
Oracle: $3 billion        9%   43%    26%    7%   15%
(P&S = Product and Service)
Software Economics: Bill’s Law
Bill Joy’s law (Sun): don’t write software for less than 100,000 platforms
  @ $10 million engineering expense, $1,000 price
Bill Gates’s law: don’t write software for less than 1,000,000 platforms
  @ $10 engineering expense, $100 price
Examples:
  UNIX versus Windows NT: $3,500 versus $500
  Oracle versus SQL-Server: $100,000 versus $6,000
  No spreadsheet or presentation pack on UNIX/VMS/...
Commoditization of base software and hardware
Price = Fixed_Cost / Units + Marginal_Cost
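The price formula can be tried against the two platform counts on this slide. This sketch assumes a fixed engineering cost of $10 million in both cases and a marginal cost of zero; both are assumptions made for illustration, and the function name is not from the slides.

```python
def unit_price(fixed_cost: float, units: int, marginal_cost: float = 0.0) -> float:
    """Price = Fixed_Cost / Units + Marginal_Cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + marginal_cost


ENGINEERING_COST = 10_000_000  # ASSUMPTION: $10 million fixed cost in both cases

per_unit_100k = unit_price(ENGINEERING_COST, 100_000)     # 100,000 platforms
per_unit_1m = unit_price(ENGINEERING_COST, 1_000_000)     # 1,000,000 platforms
```

Under these assumptions the engineering cost carried by each copy is $100 at 100,000 platforms and $10 at 1,000,000 platforms, a tenth of the slide's $1,000 and $100 prices respectively.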
Grove’s Law: The New Computer Industry
Horizontal integration is new structure
Each layer picks best from lower layer
Desktop (C/S) market
  1991: 50%
  1995: 75%

Function           Example
Operation          AT&T
Integration        EDS
Applications       SAP
Middleware         Oracle
Baseware           Microsoft
Systems            Compaq
Silicon & Oxide    Intel & Seagate
Moore’s Second Law
The cost of fab lines doubles every generation (three years)
Money limit hard to imagine:
  $10-billion line
  $20-billion line
  $40-billion line
Physical limit
  Quantum effects at 0.25 micron now
  0.05 micron seems hard: 12 years, three generations
  Lithography: need X-ray below 0.13 micron
[Figure: $million per fab line, log scale from $1 to $10,000, by year, 1960 to 2000]
Constant Dollars Versus Constant Work
Constant work:
  One SuperServer can do all the world’s computations
Constant dollars:
  The world spends 10% on information processing
  Computers are moving from 5% penetration to 50%
  $300 billion to $3 trillion
  We have the patent on the byte and algorithm
Crossing The Chasm
[Figure: two-by-two matrix of market (Old market, New market) against technology (Old technology, New technology). Quadrant labels: Boring (competitive, slow growth); Hard (Product finds customers); Hard (Customers find product); Very hard (No product, no customers)]