Public, Private, and Hybrid Cloud Computing With Low Latency



  • Computing Techniques for Big Data

    IEEE Presentation, September 24, 2012

    Gayn B. Winters, Ph.D. (IEEE email username: gaynwinters)

    Public, Private, and Hybrid Cloud Computing with Low Latency

  • Multiples of bytes

    kilobyte (KB)  10^3
    megabyte (MB)  10^6
    gigabyte (GB)  10^9
    terabyte (TB)  10^12
    petabyte (PB)  10^15
    exabyte (EB)   10^18
    zettabyte (ZB) 10^21
    yottabyte (YB) 10^24

    Library of Congress ~ 10 PB

    The binary multiples are actually 1024, 1024^2, 1024^3, etc.; 1024^8 = 2^80 ≈ 1.2 × 10^24 (see the sketch below).
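    A quick sanity check of decimal vs. binary prefixes (a minimal Python sketch; nothing here is from the slides beyond the numbers):

      # Decimal (SI) vs. binary values for each byte prefix.
      prefixes = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

      for i, name in enumerate(prefixes, start=1):
          decimal = 10 ** (3 * i)   # kilo = 10^3, mega = 10^6, ...
          binary = 1024 ** i        # "actually" 2^10, 2^20, ... (KiB, MiB, ...)
          print(f"{name}: 10^{3 * i} = {decimal:.3e}, 1024^{i} = {binary:.4e}, "
                f"ratio = {binary / decimal:.3f}")

      # Last line printed: YB: 10^24 = 1.000e+24, 1024^8 = 1.2089e+24, ratio = 1.209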

  • Big Data

    Big Data = data that exceeds conventional capacity in quantity, transfer speed, and/or data type variety (the 3 V's: volume, velocity, variety)

    The Big Data market is worth ~$100B, growing ~10% per year

    As of June 2012, ~2.5 exabytes (10^18 bytes) of business data are created DAILY

    Twitter receives 12 TB of data a day

    A modern jet engine: 20 TB/hr of performance data

    Square Kilometre Array (radio telescope): 100s of TB/sec = 10s of exabytes/day

    Machines dominate data generation!!!

  • Moving Data to the Cloud will make this transmission ramp even steeper!

  • What do you do...? (Tipping Points)

    When your data size exceeds what can be put on or handled by one machine?

    When your daily calculations take longer than a day?

    When your fancy compute systems have terrible ROA? E.g. 10% utilization. (IT's dirty secret!)

  • Virtualization

    VMM: run multiple VMs on a single machine

    History

    In the 1960's, IBM partitioned mainframes into virtual machines as an R&D tool on System/360. VM/370 was released in 1972. OK performance.

    On x86, 17 sensitive but non-privileged instructions had to be trapped and replaced for virtualization; pioneers were VMware (now part of EMC) and Connectix (acquired by Microsoft).

    2005 VM support in Intel VT-x and AMD-V.

    Virtual IO, storage, networks

    Live VM Migration

  • Clouds

  • Cloud Computing

    Elastically and Independently provision:

    Compute resources

    Storage

    Network capacity: access and internal

    Resource Pooling and Load Balancing

    Only pay for what you use; includes transmission, maintenance, etc.

    Failover, backup, restore, disaster recovery

    Security and availability "guaranteed", but...

    See my blog: gaynwinters.wordpress.com

  • Cloud Computing: NIST Definition

    Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.

    http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

  • NIST Service Models

    Software as a Service (SaaS)

    rent an app+

    Platform as a Service (PaaS)

    rent an OS+

    Infrastructure as a Service (IaaS)

    rent a datacenter

  • NIST Deployment Models

    Private Cloud

    Own everything, provide cloud services internally

    Local or at a remote data center

    Community Cloud

    Private cloud with shared ownership and use

    Public Cloud

    Vendor owned, rent everything

    Hybrid Cloud

    Mix of the above

    The big trend now!!!

  • Public Cloud Vendors

    Amazon Web Services (AWS)

    Rackspace

    Google

    Microsoft

    IBM

    HP

    Dozens of others....

  • Building Clouds

    Need VMM virtualization technology

    Some Linux, VMware, Citrix Xen, many startups

    Need infrastructure technology

    Eucalyptus implements the AWS APIs, supports AWS hybrids, and has the most installations

    OpenStack: started by NASA and Rackspace; now governed by the OpenStack Foundation; hundreds of supporting IaaS vendors

    CloudStack started by cloud.com; purchased by Citrix, now part of Apache. Many installations, but fewer supporters than OpenStack.

    Other technology: monitoring, CASE, etc.

  • PCWorld's Virtualization Table

  • Why Public Clouds?

    Usually cheaper than private clouds

    Public cloud vendors benefit from economies of scale

    Equipment: volume discounts

    Energy: can negotiate discounts and locate data centers where energy is cheap and little air conditioning is needed.

    Can recruit more (and better?) expertise

    Better for high usage elasticity

    Seasonal, tax, audits, random projects, fast growth, negative growth, schools, ...

  • Why Private Clouds?

    Availability (Public clouds are < 99.9%)

    gaynwinters.wordpress.com discusses outages

    Security and Regulatory Conformance

    Performance: configure your own hardware for

    Base CPU, memory, speed to storage

    High IO/s

    Low latency

    Transition step to hybrid and public clouds

  • Google: Failures/Year/Cluster

    ~0.5 overheating events (power down most equipment in < 5 min, ~2 day MTTR)

    ~1 PDU failure (500-1000 machines gone, ~6 hrs MTTR)

    ~1 rack move (~500-1000 machines down, ~6 hrs MTTR)

    ~1 network rewiring (rolling ~5% of machines down over a 2-day span)

    ~20 rack failures (40-80 machines disappear, 1-6 hrs MTTR)

    ~6 racks see 50% packet loss (?? MTTR)

    ~8 network maintenance failures (~30 min connectivity loss)

    ~12 router reloads (2-3 min MTTR, affects web and clusters)

    ~3 router failures (web access down, ~1 hr MTTR)

    ~1000 individual machine failures; many thousands of disk failures

    Many minor hardware and software problems; many Internet problems

  • Hadoop Distributed File System (Modeled after Google's GFS)

    Divide a (big) file into large chunks of 64-128 MB.

    Replicate each chunk at least 3 times, storing each chunk as a Linux file on a data server. Usually two of the replicas go in the same rack.

    Optimize for reads and for record appends.

    Before every read, checksum the data.

    Maintain a master name server to keep track of metadata, incl. chunk locations, replicas, ...

    Clients cache metadata and read/write directly to the data servers. (See the placement sketch after this list.)
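    A toy Python sketch of the chunking and 3-way replica placement just described; the chunk size, rack layout, and placement rule here are illustrative assumptions, not Hadoop's actual code:

      CHUNK_SIZE = 64 * 2**20   # 64 MB, the low end of the 64-128 MB range

      # A made-up cluster: two racks of two data servers each.
      racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}

      def split_into_chunks(file_size):
          """Yield (chunk_id, offset, length) triples covering the file."""
          for cid, off in enumerate(range(0, file_size, CHUNK_SIZE)):
              yield cid, off, min(CHUNK_SIZE, file_size - off)

      def place_replicas(local_rack="rack1"):
          """3 replicas: one on the writer's rack, two together on a remote rack."""
          remote = next(r for r in racks if r != local_rack)
          return [racks[local_rack][0]] + racks[remote][:2]

      # The master name server's metadata: chunk id -> data servers holding it.
      metadata = {cid: place_replicas() for cid, _, _ in split_into_chunks(200 * 2**20)}
      print(metadata)   # {0: ['dn1', 'dn3', 'dn4'], 1: [...], 2: [...], 3: [...]}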

  • [Architecture diagram: master name server above a row of chunk/data servers]

  • HDFS/GFS Notes

    A client interacts with the master name server only to get file metadata, which it caches. Then it interacts directly with the data servers.

    Data servers update their metadata with the master every minute or so. (Heartbeat)

    Large chunk size reduces size of metadata, master-client interaction, and network overhead.

    Master name server is a single point of failure that is addressed with shadow servers and logs.

    Applications must optimize to the file system.

    No Network Attached Storage or RAID (too expensive)

    No Mainframes (or big databases)

  • MapReduce: A System to Process Big Files

    Prepare a Big File for input as (key1, value1) pairs into multiple Mapper jobs

    Map(key1, value1) generates intermediate (key2, value2) pairs and separates this output into R files for input into R parallel Reducers.

    When all mappers are done, the R reducers read and sort their input files.

    Reduce(key2, values) does a data reduction processing stage. Output = R Big Files of (key3, value3) pairs (neither sorted nor combined). A toy end-to-end sketch follows below.
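    To make that dataflow concrete, here is a minimal single-process Python sketch of the pipeline (an illustration of the model, not Hadoop's API; R and the word-count map/reduce functions are assumptions for the example):

      from collections import defaultdict

      R = 2  # number of reducers

      def map_fn(key1, value1):
          """Word-count mapper: (docName, text) -> (word, 1) pairs."""
          for word in value1.split():
              yield word, 1

      def reduce_fn(key2, values):
          """Word-count reducer: (word, [1, 1, ...]) -> (word, count)."""
          yield key2, sum(values)

      def map_reduce(inputs):
          # Map phase: partition each intermediate pair into one of R "files".
          partitions = [defaultdict(list) for _ in range(R)]
          for key1, value1 in inputs:
              for key2, value2 in map_fn(key1, value1):
                  partitions[hash(key2) % R][key2].append(value2)
          # Reduce phase: each reducer sorts its input by key, then reduces.
          for part in partitions:
              for key2 in sorted(part):
                  yield from reduce_fn(key2, part[key2])

      print(dict(map_reduce([("doc1", "dog cat dog"), ("doc2", "cat fun")])))
      # e.g. {'cat': 2, 'dog': 2, 'fun': 1} (grouping depends on the partitioning)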

  • [Architecture diagram: a client submits a job to the Job Master, which coordinates mapper and reducer workers]

  • Interesting Google Applications of MR

    Inverting {outgoing links} to {incoming links}

    Efficient support of Google Queries

    Stitching together overlapping satellite images for Google Earth

    Rendering map-tile images of road segments for Google Maps

    Over 15,000+ MR applications; 100,000+ MR jobs/day

  • Simple MapReduce Examples


  • Sort on Key

    Map(key, value) = (key, value), the identity function

    Reduce(key, value) = (key, value), the identity function

    With multiple reducers, Map needs a partitioner class or a hash function such that k1 < k2 implies hash(k1) <= hash(k2), an order-preserving hash (see the sketch below); concatenating the reducers' outputs then yields one globally sorted file
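    A sketch of such an order-preserving partitioner in Python (the split points are an assumption; real jobs pick them by sampling the key distribution):

      import bisect

      # Keys < "h" go to reducer 0, keys in ["h", "p") to reducer 1, the rest to 2.
      split_points = ["h", "p"]

      def ordered_partition(key):
          """k1 < k2 implies ordered_partition(k1) <= ordered_partition(k2)."""
          return bisect.bisect_right(split_points, key)

      print(ordered_partition("cat"), ordered_partition("mouse"), ordered_partition("zebra"))
      # 0 1 2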

  • String Search with Sorted Output

    Map(docName, Contents):

    For each line_number in Contents: if string in that line, emit(docName, line_number)

    Use an order-preserving hash to assign output to the reducers

    Reduce(docName, line_numbers) emits (docName, Contents[line_number-1 : line_number+1]) for each hit, i.e. the matching line with one line of context on each side (see the sketch below)
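    A rough single-process Python rendering of this job (the pattern, document, and one-line context window are illustrative assumptions):

      pattern = "cloud"

      def map_fn(doc_name, contents):
          """Emit one (docName, line_number) pair per line containing the pattern."""
          for line_no, line in enumerate(contents.splitlines()):
              if pattern in line:
                  yield doc_name, line_no

      def reduce_fn(doc_name, line_nos, contents):
          """Return each hit with one line of context on either side."""
          lines = contents.splitlines()
          for n in line_nos:
              yield doc_name, lines[max(0, n - 1): n + 2]

      doc = "intro\nthe cloud is elastic\nsummary"
      hits = [n for _, n in map_fn("doc1", doc)]
      print(list(reduce_fn("doc1", hits, doc)))
      # [('doc1', ['intro', 'the cloud is elastic', 'summary'])]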

  • Index a Document

    map(pageName, pageText):

    For each word w in pageText:

    Emit(w, pageName) to file(hash(w))

    Use an order-preserving hash to route output to the reducers

    reduce(word, values):

    For each pageName in values (until word changes): AddToPageNameList(pageName)

    Emit(word, ":", PageNameList)

  • Example Word Index

    Documents: A = "Dog Cat Play Dig", B = "Girl Dog Play Fun", C = "Boy Cat Girl Play Fun"

    Map emits (word, doc) pairs, partitioned across two reducers

    Reducer 1 input, sorted: Boy C; Cat A; Cat C; Dog A; Dog B; Dig A

    Reducer 2 input, sorted: Fun B; Fun C; Girl B; Girl C; Play A; Play B; Play C

    Reducer 1 output: Boy: C; Cat: A, C; Dog: A, B; Dig: A

    Reducer 2 output: Fun: B, C; Girl: B, C; Play: A, B, C

    (A runnable sketch follows below.)
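    A Python sketch that reproduces the index above in a single process (the two-reducer partitioning is collapsed into one sort, so output order is alphabetical):

      from collections import defaultdict

      docs = {                      # the three documents from the slide
          "A": "Dog Cat Play Dig",
          "B": "Girl Dog Play Fun",
          "C": "Boy Cat Girl Play Fun",
      }

      # Map: emit (word, docName) for every word in every document.
      pairs = [(w, name) for name, text in docs.items() for w in text.split()]

      # Shuffle/sort + reduce: group doc names under each word, in key order.
      index = defaultdict(list)
      for word, name in sorted(pairs):
          index[word].append(name)

      for word in index:
          print(f"{word}: {', '.join(index[word])}")
      # Boy: C / Cat: A, C / Dig: A / Dog: A, B / Fun: B, C / Girl: B, C / Play: A, B, C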

  • Latency

    One way network latency = time from packet send start at the source to packet arrival start at the destination.

    Round trip network latency = time from packet send start at the source to packet arrival start back at the source, when the target just returns the packet with no processing. (Ping; see the sketch below.)

    Transaction time = latency + processing time

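    A quick way to approximate round-trip latency without raw ICMP sockets is to time a TCP connect, which completes after roughly one round trip (a Python sketch; the host and port are placeholders):

      import socket, time

      HOST, PORT = "example.com", 80   # placeholder target

      t0 = time.perf_counter()
      with socket.create_connection((HOST, PORT), timeout=5):
          rtt = time.perf_counter() - t0   # SYN out, SYN-ACK back, connect() returns
      print(f"approximate RTT: {rtt * 1e3:.1f} ms")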

  • Low Latency Applications

    Trading, esp. arbitrage

    InformationWeek: a 1 millisecond advantage in trading can be worth $100 million per year.

    Broadcasting market data, publishing volatilities, reconfiguring connections, and performing other time-critical tasks.

    Connecting news to trades. A weather forecast in Brazil affects the futures price of coffee.

    Telecom voice and video quality

    Beware of droop: slow handshake speeds dragging down throughput on fiber channels.

  • Latency in Networking

    4G and LTE mobile connections are driving the removal of latency problems across the entire infrastructure.

    Voice and video (and games, of course)

    Trading using mobile devices.

    Light travels at 3.33 microseconds per km in a vacuum and ~5.0 microseconds per km through fiber (and a fiber run is almost never a straight line, nor unaided).

    The distance from NYC to LA is 2462 miles (3961 km) point-to-point but 2777 miles (4469 km) driving. Fiber distance is close to driving distance.

    One-way latency is therefore at least ~22.3 milliseconds (4469 km x 5.0 us/km; see the calculation below).

    Add amplifier and regeneration delays, and longer actual routes.

    [Image: speed-limit sign reading "c"]
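    The arithmetic behind the 22.3 ms figure, as a Python sketch (the assumption that fiber tracks the driving route is from the slide):

      VACUUM_US_PER_KM = 1e6 / 299_792   # ~3.34 microseconds per km
      FIBER_US_PER_KM = 5.0              # refractive index ~1.5, plus slack

      straight_km, fiber_km = 3961, 4469   # point-to-point vs. driving distance

      print(f"vacuum lower bound: {straight_km * VACUUM_US_PER_KM / 1e3:.1f} ms")  # ~13.2 ms
      print(f"fiber one-way:      {fiber_km * FIBER_US_PER_KM / 1e3:.1f} ms")      # ~22.3 ms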

  • Latency in the Cloud

    Assume your packet to the cloud (private or public) contains the command to start a job.

    Latency is measured from the time the packet starts to arrive to the time the job starts.

    Includes sending and receiving the entire transaction packet

    Low latency applications want (and will pay for) 10 GigE connections (and even 40 GigE sometimes) from clients to servers. These are in or near the datacenter!

  • Low Latency Requirements

    Availability

    High MTTF (highly reliable equipment)

    Low MTTR (nanosecond failover; see the availability sketch after this list)

    Bandwidth: 10-40 GigE

    Latency of all components in 100s of nanoseconds or less

    Timing, Timestamping, Synchronization

    Accurate clocks with nanosecond precision
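    These requirements interact: steady-state availability is roughly MTTF / (MTTF + MTTR), so fast failover buys as much as reliable hardware. A quick illustration in Python (the one-failure-per-year rate and the recovery times are assumptions):

      SECONDS_PER_YEAR = 365 * 24 * 3600

      # Availability with one failure per year, as a function of recovery time.
      for label, mttr_s in [("1 hour repair", 3600.0),
                            ("1 ms failover", 1e-3),
                            ("100 ns failover", 1e-7)]:
          availability = 1 - mttr_s / SECONDS_PER_YEAR
          print(f"{label:>16}: {availability:.12f}")

      # 1 hour repair:  0.999885844749 (about "three nines")
      # 1 ms failover:  0.999999999968
      # 100 ns failover rounds to 1.000000000000 at this precision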

  • What's Happening at the Bleeding Edge?

  • International Securities Exchange

    Orders acknowledged in 310 us, from entry to the ISE network, through the matching engine, and back, via 10 GigE

    Round trip mass quote to acknowledgement

    Reference: www.ise.com

  • BATS

    Licensed 2008; handles 11-12% of securities trading

    All electronic trading

    Roundtrip from firewall to FIX handler to matching engine and back

    Sustained under heavy load, including open and close of day trading

    10 GigE

    SAN storage (EMC and Hitachi)

    Avg 191 us (99.9% within 2 milliseconds)

    Ref: www.batstrading.com

  • NASDAQ

    59 us round trip for order leaving customer port, through colo network and firewalls to the matching engine and back.

    Measured across the day, including open and close of trading.

    10 GigE (NASDAQ also sells access to its 40 GigE subnet)

    Ref: www.nasdaqtrader.com

  • CheckPoint Firewalls

    Data Security Checks

    Security clearance in 5 us

    Moves 110 billion bits/sec through the firewall

    Uses 108 processor cores

    Supports 300,000 connections/second

    Forwards 60,000,000 packets/second

    Ref: www.checkpoint.com

  • Switch Latency

    Switch hardware

    Zeptonics, a Sydney-based financial technology firm, said clients had begun testing its trading switch, which routes messages inside high-speed data centers in 130 nanoseconds.

    The Cisco Nexus 3016 switch can get the last character of a 1,024-character message onto the wire in 1.17 microseconds (millionths of a second); the last character of a 256-character message makes it onto the wire in 950 nanoseconds (billionths of a second). It has nanosecond accuracy/consistency for PTP. (A sanity check of these figures follows below.)
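    Those numbers are easy to sanity-check: at 10 Gb/s, serialization alone accounts for most of the quoted time (a back-of-the-envelope Python sketch):

      LINK_BPS = 10e9   # 10 GigE

      def serialization_ns(message_bytes):
          """Time just to clock the message's bits onto the wire at 10 Gb/s."""
          return message_bytes * 8 / LINK_BPS * 1e9

      for size in (256, 1024):
          print(f"{size} B: {serialization_ns(size):.0f} ns on the wire")
      # 256 B: 205 ns; 1024 B: 819 ns -- the gap between these and the quoted
      # 950 ns / 1.17 us figures is the switch's internal forwarding overhead.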

  • Cabling Latency

    Latency is nanoseconds for fiber transceivers vs. microseconds for copper (e.g. 10GBASE-T PHY overhead). Obvious choice!

    Fiber: lighter, lower power, lower heat.

    BUT, fiber to the desk is impractical due to cost and the (current) lack of video, voice, and Power over Ethernet (PoE) services. Consider combo cables, e.g. from Siemon.

    All cabling and installation must be of the highest quality. Do not go cheap here!

  • How to get TCP Latency < 1 ms

    Often TCP buffering gets in the way and adds ~40ms latency. The flush() function on the socket won't help.

    Set TCP_NODELAY on the socket (in Java, use Socket.setTcpNoDelay(true); see the sketch below)

    Be careful of unbalanced networks. Too much TCP_NODELAY causes congestion.

    Another approach: use UDP as many real-time applications do. Sometimes, however, TCP performs better!
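    In Python, the same switch is set with setsockopt (a minimal sketch; the endpoint and payload are placeholders):

      import socket

      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      # Disable Nagle's algorithm so small writes go out immediately
      # instead of being buffered to coalesce with later writes.
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
      sock.connect(("localhost", 9000))   # placeholder endpoint
      sock.sendall(b"ORDER:BUY:100")      # sent without waiting to coalesce
      sock.close()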

  • Clocks

    Symmetricom

    SyncPoint PCIe-1000 PTP clock for Linux

    170 nanosecond accuracy to UTC (USNO)

    Synchronizes all servers on LAN within 10 nanoseconds of each other

    Ref: www.symmetricom.com

    Other vendors: Korusys, Meinberg, EndRun, Optisynx, TimeSync, ...

    Absolute time: GPS (~100 ns), DCF77 (long wave from Germany at 77.5 kHz, accuracy 2.5 us), CDMA/GSM mobile networks (accuracy 10 us).

  • Saved the Best for Last - Servers

  • Google's Server

  • Google Server Backside

    Photo by Stephen Shankland of CNET

  • Google Server Details

    2 CPUs

    2 Hard Drives

    Custom 12v Power Supply

    Low cost, peak capacity, high efficiency

    Gigabyte motherboard converts 12 V down to lower voltages

    12v Battery Backup (99.9% efficiency vs 95%)

    8 sticks DRAM

    Ethernet, USB, and other ports

    2U = 3.5" high

  • Facebook Rack of Servers

    1.5U, no big UPS, no AC, tall heat sinks, four huge fans, no tools, 6 lbs lighter, cheaper, motherboard from Quanta Computer (Taiwan)

  • Facebook's Open Compute Server

  • Open Compute(From Facebook's Specifications)

    Open Rack (OU = 1.5U), 21" wide

    60mm fans more efficient than 40mm

    Tall cooling towers

    Custom power supplies use 277 volt AC to avoid step-downs from outside power (from Delta Electronics of Taiwan and Power-One of California).

    One UPS per row: 20 48v batteries. Power supply has separate connection for UPS power.

    Datacenter has separate surge and harmonics suppression.

  • Facebook Open Compute v2 Server

  • Intel Open Compute mb v2

  • AMD Roadrunner Open Compute v2, Designed for Financial Apps

  • What is your ideal server?

    CPU etc

    Cache

    Memory

    PCIe vs SATA

    SSD vs HDD

    Network Connections

    UPS and Power Supply

    Cooling

  • Consulting and Careers

    Data Scientist, Hadoop Administrator: new job titles

    IBM's Big Data University (2,400 at their bootcamps, 14,000 registered for on-line)

    Hadoop certification (Cloudera, IBM, Hortonworks, MapR, Informatica)

    3rd Hadoop World (Oct 23-25, 2012 NYC)

    A deluge of Hadoop activities in Silicon Valley

    Consult on using Google Apps, AWS, Hyper-V, etc.

    Performance experts needed!!! (esp. Low Latency)

    Security experts needed!!!

    Analysis/algorithms experts needed !!!

  • Thank You!

    Questions?

    Please contact me at (714) 366-4296 (FON-GAYN, if you remember letters better) or at ieee.org with username gaynwinters for

    Consulting/jobs

    Seminars
