Sr. Director, Product Management February 21, 2013 #lspe: Dynamic Scaling Shock Absorbers and APIs Steve Shah

#lspe: Dynamic Scaling


DESCRIPTION

Presentation given to the #lspe meetup (Large Systems Performance Engineering) on February 21, 2013 by Steve Shah. Topic for the night was Dynamic Scaling. This presentation is titled "Shock Absorbers and APIs" and covers features typical of ADCs (modern load balancers) that can help in managing scale as well as give a quick overview of what to expect from an API in an ADC.


Page 1: #lspe: Dynamic Scaling

Sr. Director, Product Management

February 21, 2013

#lspe: Dynamic Scaling

Shock Absorbers and APIs

Steve Shah

Page 2: #lspe: Dynamic Scaling

Disclaimer

• I’m going to talk about a product.

ᵒIt’s kind of necessary in order to make this talk useful.

ᵒBut a lot of you have this product or know someone that does!

ᵒThe product is pretty cool…

ᵒIt can also sing and dance.

ᵒMaking coffee is on the roadmap.

• Sorry.

ᵒYes, I am marketing scum.

ᵒNo, I will not do a hard sell.

• My Competition

ᵒGoogle it. No really… It’s not hard to find them.

ᵒTheir product has various approaches too. I encourage you to ask them.

Page 3: #lspe: Dynamic Scaling

Performance, Offload, Security, Availability

What is NetScaler?

NetScaler powers some of the world’s largest infrastructures.

Page 4: #lspe: Dynamic Scaling

1998 to 2012: From Load Balancing to Virtual Networking

• 1998: L4 SLB
• 1999: L7 SLB, GSLB, MUX
• 2002: SSL, CMP, DNS
• 2003: SSL VPN, RHI
• 2005: AppFW, SIP, AAA-TM
• 2006: ICA, IPv6
• 2008: XML, nCore
• 2009: VPX, EdgeSight
• 2011: SDX, AppFlow, DataStream

Secret Decoder Ring:

• SLB = Server Load Balancing
• GSLB = Global Server Load Balancing
• MUX = HTTP Multiplexing
• SSL = SSL Acceleration
• CMP = HTTP Compression
• DNS = DNS Load Balancing / Proxy
• RHI = Route Health Injection
• ICA = App Proxy for ICA
• IPv6 = IPv6 Routing, Switching, LB
• XML = XML Security, Routing
• VPX = Virtual NetScaler
• nCore = multi-core scaling
• SDX = Multi-tenant NetScaler

Page 5: #lspe: Dynamic Scaling

Agenda

• Things That Impact Scalability

• Shock Absorbers

• Out Scaling

• Your ADC has an API!

Page 6: #lspe: Dynamic Scaling

Things That Impact Scalability

Touching on a bit of theory…

Page 7: #lspe: Dynamic Scaling

Load is Not Linear

• There are startup costs for enabling features in an ADC (memory and CPU)

• However, each incremental request takes a small fraction of resources

• As load increases, some global functions can take resources as well

ᵒE.g., flushing unused IP fragments, running timers, management overhead, etc.

Page 8: #lspe: Dynamic Scaling

Data Structures and Big O

• I/O, data structures, and string processing are big factors

• The two that get you are data structures and string processing

ᵒData structures: ACLs, VLANs, connection table, connection state, persistence table, etc.

ᵒString processing: HTTP request processing and policy execution

• Know your Big O – understand their impact

ᵒBig O notation is how programmers describe efficiency of algorithms

ᵒE.g., O(n) vs. O(log n) vs. O(1)
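To make the O(n) versus O(1) point concrete, here is a minimal Python sketch of a connection-table lookup done two ways. The 4-tuple key, the addresses, and the function names are illustrative assumptions and do not describe how any particular ADC stores its connections.

```python
from typing import Dict, List, Optional, Tuple

# A flow is identified here by a 4-tuple: (src_ip, src_port, dst_ip, dst_port).
FlowKey = Tuple[str, int, str, int]


def lookup_linear(table: List[Tuple[FlowKey, str]], key: FlowKey) -> Optional[str]:
    """O(n): scan entries one by one until the key matches."""
    for entry_key, state in table:
        if entry_key == key:
            return state
    return None


def lookup_hashed(table: Dict[FlowKey, str], key: FlowKey) -> Optional[str]:
    """O(1) on average: hash the key and probe the table directly."""
    return table.get(key)


# Build the same 10,000 hypothetical connections in both structures.
flows = [(("10.0.0.1", 1024 + i, "192.0.2.10", 80), "ESTABLISHED") for i in range(10_000)]
linear_table = list(flows)
hashed_table = dict(flows)

last_key = flows[-1][0]
print(lookup_linear(linear_table, last_key))  # has to walk all 10,000 entries
print(lookup_hashed(hashed_table, last_key))  # one hash and one probe on average
```

Both calls return the same state; the difference is that the cost of the first grows with the number of connections while the second stays roughly flat.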

Page 9: #lspe: Dynamic Scaling

Shock Absorbers

Coping with Load

Page 10: #lspe: Dynamic Scaling

Launching v8: The Role of Data Structures

• Story time… launching a major service and what we learned

• Major new roll-out – expected to double the number of servers to handle

• Early testing revealed that large numbers of slow connections are meh

• Invest in your data structures! Cleaned up several core structures

• Average connection lookup time driven to near constant time: O(1)

• Stir in a team that dreams in assembly language and can see cache misalignment by glancing at code and shave another 20% off connection lookup times (absolute times)

• Lesson: drive your apps to good data structures. Drive your vendors to do better.

Page 11: #lspe: Dynamic Scaling

MaxConns and SurgeQ

[Figure: typical server performance curve; x-axis: incoming load; callout: “Peak perf – we want to stay there”]

Page 12: #lspe: Dynamic Scaling

MaxConns and SurgeQ

[Figure: server performance curve; callouts: “Server stays operating at maximum throughput”, “Set max conns here”, “Queue incoming requests in the ADC”]
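The idea on these two slides can be sketched in a few lines of Python: cap the number of in-flight requests a server sees and hold the overflow in a FIFO queue in front of it. The class and method names are hypothetical and this is not NetScaler's implementation, only a model of the behavior described.

```python
from collections import deque
from typing import Deque, List


class SurgeProtectedServer:
    """Caps in-flight requests at max_conns and queues the overflow in FIFO order."""

    def __init__(self, max_conns: int) -> None:
        self.max_conns = max_conns
        self.active: List[str] = []
        self.surge_queue: Deque[str] = deque()

    def on_request(self, request_id: str) -> str:
        # Forward while the server is below its cap, otherwise hold the request.
        if len(self.active) < self.max_conns:
            self.active.append(request_id)
            return "forwarded"
        self.surge_queue.append(request_id)
        return "queued"

    def on_response(self, request_id: str) -> None:
        # A finished request frees a slot; the oldest queued request takes it.
        self.active.remove(request_id)
        if self.surge_queue:
            self.active.append(self.surge_queue.popleft())


server = SurgeProtectedServer(max_conns=2)
print([server.on_request(f"r{i}") for i in range(4)])
server.on_response("r0")
print(server.active, list(server.surge_queue))
```

With a cap of 2, the third and fourth requests wait in the queue, and each completed response releases exactly one of them, so the server never sees more than the configured number of concurrent requests.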

Page 13: #lspe: Dynamic Scaling

Story time:

When 4 Hurricanes Hit

Page 14: #lspe: Dynamic Scaling

Out-Scaling

Page 15: #lspe: Dynamic Scaling

The SR-71 Approach: Go Faster

• Single System

ᵒConfigured and managed as a single logical system

• Scalable

ᵒScales with number of devices (distributes work)

• Fault Tolerant

ᵒHandles device failure, addition…

• Dynamic

Treat a collection of NS devices like a grand unified “big” device

The Sheet-metal Test

Steps:

• Take a cluster of NS, and an L2 switch.

• Configure the devices to your liking.

• Wrap the whole thing with sheet-metal, such that only the network ports remain exposed.

Test:

Must be able to configure and use this contraption as if it were just another NS box.

• Connect wires into any visible port(s), create LAGs at will, enable L2 mode, MBF …

• Point GUI to Cluster’s IP and configure away

Page 16: #lspe: Dynamic Scaling

Clustering

• Create a single system image out of a collection of instances

ᵒInstances = virtual machines, physical instances, or instances on multi-tenant boxes

• True shared management + data plane (the sheet metal test)

• Shared state for key data structures (persistence, health check, etc.)

• Linear scale by adding instances (up to 32)

• Ability to manage faults with proportional degradation
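One way to picture "linear scale" and "proportional degradation" is a flow-to-node mapping in which losing a node moves only that node's share of the traffic. The sketch below uses rendezvous (highest-random-weight) hashing as a stand-in technique; the slides do not say how the cluster actually distributes flows, and the node and flow names are made up.

```python
import hashlib
from typing import Dict, List


def pick_node(flow_id: str, nodes: List[str]) -> str:
    """Rendezvous hashing: the node with the highest hash score owns the flow."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def assign(flows: List[str], nodes: List[str]) -> Dict[str, str]:
    """Map every flow to its owning node."""
    return {flow: pick_node(flow, nodes) for flow in flows}


flows = [f"flow-{i}" for i in range(1000)]
nodes = ["ns1", "ns2", "ns3", "ns4"]
before = assign(flows, nodes)
after = assign(flows, [n for n in nodes if n != "ns3"])  # simulate ns3 failing

moved = [f for f in flows if before[f] != after[f]]
# Only flows that were owned by the failed node get a new owner.
print(len(moved), "of", len(flows), "flows moved")
```

Flows owned by the surviving nodes keep their owner, so the impact of a failure is limited to roughly the failed node's fraction of the load.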

Page 17: #lspe: Dynamic Scaling

Real-time Analytics: Bandwidth, Connections, Top ‘N Requests, Response Time, Frequency

Policy Based Traffic Selection

Policy Based Actions: Compress, Cache, Log, Drop, Respond

Decision, Feedback loop

Page 18: #lspe: Dynamic Scaling

Scaling Globally

Global Server Load Balancing (GSLB)

NetScaler uses DNS to send users to the closest site based on administrator-defined metrics (geography, topology, site performance, availability).

Route Health Injection (RHI)

NetScaler dynamically updates routing tables to direct clients to the active site based on real-time health monitoring of backend infrastructure.

[Diagram labels: Active Site, Mirror Site]

Page 19: #lspe: Dynamic Scaling

Your ADC Has an API!

Page 20: #lspe: Dynamic Scaling

API in a Nutshell: Your ADC Has This

API

• Interfaces: SOAP, RESTful

• Client Toolkits: Scripting (Perl/PHP/Python/PowerShell); OOP (Java/C#/ASP/.NET based)

• Policy: Reverse Call-Out (JSON/XML)

• Statistics: Bulk Reporting, Granular Reporting

Page 21: #lspe: Dynamic Scaling

More RESTful - HTTP Status Code

REQUEST RESPONSE


Success Case:

GET http://<nsip>/nitro/v1/config/lbvserver/lbv1

Failure Case:

POST http://<nsip>/nitro/v1/config/lbvserver

Content-Type: application/vnd.com.citrix.netscaler.lbvserver+json

{"lbvserver": {"name": "lbv111", "servicetype": "HTTP"}}

Success Case:

HTTP 200 OK

Failure Case:

HTTP/1.0 409 Conflict

{"errorcode": 273, "message": "Resource already exists", "severity": "ERROR"}
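As a rough illustration of driving this from a script, the following Python sketch builds the POST shown above with the standard library and maps the HTTP status to a short message. The URL, content type, and JSON shapes are taken from the slide; the function names and the placeholder address are assumptions, and authentication, which the slide does not show, is left out.

```python
import json
import urllib.error
import urllib.request

NSIP = "192.0.2.1"  # placeholder address from the documentation range, not a real appliance


def build_add_lbvserver(nsip: str, name: str, servicetype: str) -> urllib.request.Request:
    """Build the POST from the slide; nothing is sent until urlopen() is called."""
    body = json.dumps({"lbvserver": {"name": name, "servicetype": servicetype}})
    return urllib.request.Request(
        url=f"http://{nsip}/nitro/v1/config/lbvserver",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.com.citrix.netscaler.lbvserver+json"},
        method="POST",
    )


def describe_result(status: int, body: str) -> str:
    """Turn an HTTP status and a body shaped like the slide's into a short message."""
    if 200 <= status < 300:
        return "ok"
    error = json.loads(body) if body else {}
    message = error.get("message", "unknown error")
    return f"HTTP {status}: {message} (errorcode {error.get('errorcode')})"


def add_lbvserver(nsip: str, name: str, servicetype: str) -> str:
    """Send the request; login/session handling is omitted in this sketch."""
    request = build_add_lbvserver(nsip, name, servicetype)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return describe_result(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:  # urllib raises on 4xx/5xx responses
        return describe_result(err.code, err.read().decode("utf-8"))


# Offline demonstration using the failure body from the slide.
slide_error = '{"errorcode": 273, "message": "Resource already exists", "severity": "ERROR"}'
print(describe_result(409, slide_error))
```

The point of the status-code approach is visible in `describe_result`: a caller can branch on the HTTP status alone and only parse the body when it needs the detailed error.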

Page 22: #lspe: Dynamic Scaling

Example: Using Java

• Indicate we want “rollback on failure” in this session

• Prepare 3 lbvservers to be added in one bulk operation

• Print results

Output

• No attempt to add “lb3” because of rollback behavior
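The rollback behavior described on this slide can be modeled without a device. The Python sketch below simulates a bulk add against an in-memory set of names; it assumes the second item is the one that fails, which the slide does not state, and the function name and status strings are hypothetical rather than part of any real client toolkit.

```python
from typing import Dict, List, Set


def bulk_add_with_rollback(existing: Set[str], names: List[str]) -> Dict[str, str]:
    """Add names in order; on the first failure undo earlier adds and skip the rest."""
    results: Dict[str, str] = {}
    added: List[str] = []
    for index, name in enumerate(names):
        if name in existing:
            results[name] = "failed: resource already exists"
            for done in reversed(added):  # roll back what this batch created
                existing.discard(done)
                results[done] = "rolled back"
            for skipped in names[index + 1:]:
                results[skipped] = "not attempted"
            return results
        existing.add(name)
        added.append(name)
        results[name] = "added"
    return results


config = {"lb2"}  # hypothetical device state: lb2 already exists
outcome = bulk_add_with_rollback(config, ["lb1", "lb2", "lb3"])
for name, status in outcome.items():
    print(name, "->", status)
```

In this model the failure on the second item undoes the first and stops the batch, so the third item is never attempted and the configuration ends up exactly as it started.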

Page 23: #lspe: Dynamic Scaling

AutoSense and AutoScale

CloudStack

Internet

• NetScaler is auto-provisioned by CloudStack

• NetScaler monitors servers for CPU, Memory, Latency, Throughput …

• NetScaler monitoring engine auto-detects abnormal behavior with servers

• NetScaler triggers AutoScale capability in CloudStack

• CloudStack “auto-provisions” new server instances based on AutoScale policy

• On successful AutoScale, CloudStack provides new service descriptions

• NetScaler automatically adds new service resources and does bindings

• Traffic is automatically scaled to the newly added services on NetScaler
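The monitor, trigger, provision, and bind sequence can be sketched as a small control loop. Everything here is hypothetical: the `AutoScaler` class, the 80% CPU threshold, and the `provision` callback standing in for the orchestrator are illustrative and are not the NetScaler or CloudStack APIs.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class AutoScaler:
    provision: Callable[[], str]   # stand-in for the orchestrator's provisioning call
    cpu_threshold: float = 0.80    # hypothetical AutoScale policy value
    services: List[str] = field(default_factory=list)

    def evaluate(self, cpu_by_service: Dict[str, float]) -> bool:
        """Scale up when average CPU across monitored services reaches the threshold."""
        if not cpu_by_service:
            return False
        if mean(list(cpu_by_service.values())) < self.cpu_threshold:
            return False
        new_service = self.provision()     # orchestrator returns the new service
        self.services.append(new_service)  # bind it so traffic can reach it
        return True


scaler = AutoScaler(
    provision=lambda: f"web-{len(scaler.services) + 1}",
    services=["web-1", "web-2"],
)
print(scaler.evaluate({"web-1": 0.55, "web-2": 0.60}))  # below threshold, no action
print(scaler.evaluate({"web-1": 0.92, "web-2": 0.88}))  # above threshold, scale up
print(scaler.services)
```

A real policy would also need scale-down rules and a cool-down period between actions; those are left out to keep the loop readable.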

Page 24: #lspe: Dynamic Scaling

Work better. Live better.