Making clouds go faster, for fun and profit!

DESCRIPTION

Everyone loves it when things are fast, and that holds true whether you're visiting http://www.livingsocial.com or hitting the OpenStack Nova API to ask, "Please show me all the instances I've got running". Nobody ever writes in to support saying, "All of my API calls are completing far too quickly. Slow it down!". Optimizing the performance of software is arguably a never-ending crusade. At some point you'll get things fast enough that you can say, "Any effort invested beyond this point is not adding value for the business", but then along comes new code which adds a zillion awesome features and also regresses performance to the point where it needs another tune-up. In the process of transforming our infrastructure and preparing our new OpenStack IaaS to host all our applications, we've been looking for performance wins across the whole stack, and we've got some aggressive targets to meet. We've investigated many hardware options and chosen the one that best fits our needs, we've instrumented some of the OpenStack APIs and benchmarked them with interesting results, and whilst we're not done yet, we do have a "Half-Time Match Report". Join me as I walk through our learnings so far and propose follow-on areas for investigation and optimization.

Page 1: Making clouds go faster, for fun and profit!

This slide intentionally left blank.

Wednesday, 17 October 12

Page 2

MAKING CLOUDS GO FASTER, FOR FUN AND PROFIT

Page 3

Page 4

Speakers

Who crafted this talk?

Page 5

Alex Howells (@nixgeek)

Technical Operations, LivingSocial

[email protected]

github.com/agh

Page 6

Paul Thomas (@ftergl0w)

Technical Operations, LivingSocial

[email protected]

github.com/AfterGlow

Page 7

Bedtime Reading

You can get a copy of these slides after the talk:

https://speakerdeck.com/u/nixgeek

Page 8

Problem?

Page 9

Performance

It doesn’t need to be rocket science.

It does matter though!

I promise I’m not trolling you.

Page 10

“Oh man, that was too fast! It’s so much better now it’s slow!!”

-- Average User

In a parallel universe...

Page 11

YEAH RIGHT

I wish I had users who were that easy to please!

But since we live in the real world...

Page 12

“Why is that dude smiling?! This is too slow!

Why can’t it be faster?”

-- Average Users

In our universe...

Page 13

THINGS ARE IMPROVING

Cactus => Diablo => Essex => Folsom

But things can improve faster with focus!

Page 14

Today

Mostly reliable, but can be a bit slow!

Page 15

The Future?

Faster. More scalable. A real driving experience.

Page 16

Why should I listen to you?

What’s the big deal?

Page 17

WE’RE A LOT LIKE YOU!

Developers. Operators. Engineers. Users.

We see potential. We see opportunities.

Page 18

Page 19

Airspace

LivingSocial PaaS

We care about speed because ...

* Scaling services up/down needs to happen fast!
* Needing to maintain huge pools of “slack capacity” to account for sudden spikes in traffic sucks.
* Upgrading applications should be fast.

What does fast mean to us? One example?

New instances online in under 10 seconds.
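
Checking the “under 10 seconds” target can be automated. The sketch below is a minimal illustration, not LivingSocial's tooling: it assumes a hypothetical `get_status` callable that returns the instance's current Nova status string (`BUILD`, `ACTIVE` or `ERROR`), polls it until the instance is `ACTIVE`, and reports whether the target was met.

```python
import time
from typing import Callable, Tuple


def time_to_active(get_status: Callable[[], str],
                   target_seconds: float = 10.0,
                   poll_interval: float = 0.5,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Tuple[float, bool]:
    """Poll a hypothetical status callable until the instance is ACTIVE.

    Returns (elapsed_seconds, met_target). Gives up with TimeoutError after
    3x the target so a stuck build cannot hang the caller, and raises
    RuntimeError if the instance lands in ERROR. `clock` and `sleep` are
    injectable so the logic can be tested without waiting.
    """
    start = clock()
    give_up_at = start + 3 * target_seconds
    while True:
        status = get_status()
        now = clock()
        if status == "ACTIVE":
            elapsed = now - start
            return elapsed, elapsed < target_seconds
        if status == "ERROR":
            raise RuntimeError("instance went into ERROR state")
        if now >= give_up_at:
            raise TimeoutError(f"instance not ACTIVE after {now - start:.1f}s")
        sleep(poll_interval)
```

In practice `get_status` would wrap a call to the Nova API, for example through python-novaclient; that wiring is omitted because it depends on your credentials and client version. Injecting `clock` and `sleep` keeps the polling logic testable without real waits.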

Page 20

Performance Matters

What could your business do if instances came online in under 5 seconds vs. 50 seconds?

> Makes integration tests leveraging the Cloud complete much faster.
> Seasonal spikes? React to them faster; happier customers spend more money.
> Engineers who don’t grumble that “getting servers is a pain in the ass”.
> Deploy new applications and services more quickly and easily.

Along with many other things ...

Page 21

What do we do?

Page 22

Think Positive

Because solutions are better than problems!

Page 23

Page 24

Two-Pronged Approach

Hardware & Software

“A Love Story”

Page 25

Warning!

Picking the right hardware is quite hard. It’s often specific to your users’ needs.

What works for us may not rock your world.

Page 26

Hardware

Page 27

Our Servers

Supermicro 1027R-WRFT+
2x Intel Xeon E5-2670 (8C/16T, 2.60GHz)
16 x 8GB 1600MHz ECC memory
LSI 9266-8i (1-LD RAID-10)
8 x Intel 520-series 240GB SSD
Dual-port Intel X540 10GBASE-T
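
The spec list is easier to reason about as per-host totals. A small worked calculation using only the numbers above, assuming RAID-10 usable space is half the raw SSD capacity (mirrored pairs, striped) and ignoring filesystem overhead; the `ServerSpec` class is just an illustrative container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    dimm_count: int
    dimm_gb: int
    ssd_count: int
    ssd_gb: int
    nic_ports: int
    nic_gbps: int

    @property
    def hardware_threads(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core

    @property
    def ram_gb(self) -> int:
        return self.dimm_count * self.dimm_gb

    @property
    def raid10_usable_gb(self) -> int:
        # RAID-10 mirrors pairs of drives, so half the raw capacity is usable.
        if self.ssd_count % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        return self.ssd_count // 2 * self.ssd_gb

    @property
    def network_gbps(self) -> int:
        return self.nic_ports * self.nic_gbps


# Numbers taken from the slide: 2x E5-2670 (8C/16T), 16 x 8GB,
# 8 x 240GB SSD in RAID-10, dual-port 10GBASE-T.
node = ServerSpec(sockets=2, cores_per_socket=8, threads_per_core=2,
                  dimm_count=16, dimm_gb=8, ssd_count=8, ssd_gb=240,
                  nic_ports=2, nic_gbps=10)
```

That comes to 32 hardware threads, 128GB of RAM (4GB per thread), roughly 960GB of usable RAID-10 SSD and 20Gbps of network per host.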

Page 28

Benefits

* ‘Just right’ balance of CPU/RAM for us.

* Exceptional ephemeral I/O performance
  > Not using eMLC: a trade-off?
  > We can think about SQL on IaaS

* A surplus of network bandwidth

Servers are not a bottleneck!

Page 29

Our Network

Top of Rack: Arista Networks 7050T
48-port 10GBASE-T switch + 4-port 40GbE (uplinks)

Zone Spine: Arista Networks 7050Q
16-port 40GbE switch

Page 30

Benefits

* A network which runs Linux!
* Ability to automate it via ZTP (Zero Touch Provisioning) and Chef

* Non-blocking communication in a rack.
* Provision 160Gbps to spine via four cables.
* Under 2:1 contention for comms in/out of rack.

* Less need to think about QoS!

Network is not a bottleneck!
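
The contention figures are simple arithmetic on the port counts from the previous slide. A sketch, assuming the worst case where every active 10GBASE-T port runs at line rate at the same moment; the 15-server rack below is a hypothetical example, not our actual rack layout.

```python
from fractions import Fraction


def oversubscription(active_ports: int, port_gbps: int = 10,
                     uplinks: int = 4, uplink_gbps: int = 40) -> Fraction:
    """Ratio of server-facing to spine-facing bandwidth on one top-of-rack switch.

    Defaults follow the slides: 10GBASE-T server ports and four 40GbE
    uplinks (4 x 40 = 160Gbps) to the zone spine. Assumes every active
    port could run at line rate at the same time (the worst case).
    """
    downstream_gbps = active_ports * port_gbps
    upstream_gbps = uplinks * uplink_gbps
    return Fraction(downstream_gbps, upstream_gbps)


# All 48 ports busy: 480Gbps down vs 160Gbps up.
full_rack = oversubscription(48)
# Hypothetical rack of 15 servers, each using both X540 ports.
fifteen_servers = oversubscription(15 * 2)
```

A fully populated switch would be 3:1 contended, so under this worst-case assumption staying below 2:1 means fewer than 32 busy 10G ports (under 320Gbps) per rack; the hypothetical 15-server rack works out to 15:8, about 1.9:1.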

Page 31

Software

Page 32

Production

Ubuntu 12.04 LTS (‘Precise Pangolin’)

Hypervisor: KVM

CloudScaling OCS 1.3 (based on OpenStack Essex)

Moving to OCS 2.0 in the near future (that one is based on OpenStack Folsom).

Page 33

Ubuntu 12.04 LTS (‘Precise Pangolin’)

Hypervisor: KVM

Useful for development and testing (we’re running OpenStack Folsom now).

Most of the data shown later was grabbed with help from DevStack running on hardware similar to our production environment.

Page 34

WHAT NOW?

We’ve picked the hardware stack. It’s awesome.

We’ve got our software installed. It’s looking great.

Page 35

Support calls are imprecise. We need data!

Monitoring

Page 36

Old School

* Is my service (API) responding on TCP/8774?
* Am I able to make a GET and fetch instance info?
* Is my server running all the processes it should?
* Are there any errors on my network ports?

If any of this looks broken, send me alerts saying so!
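
The old-school checks are easy to script. A minimal sketch of the first one, using only Python's standard library; `port_is_open` is a hypothetical helper name, and in production this sort of check would be run by your monitoring system rather than by hand.

```python
import socket


def port_is_open(host: str, port: int = 8774, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    8774 is the Nova API port named on the slide. A successful connect only
    proves something is listening; it says nothing about response times.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Note what it does not tell you: a port can accept connections while every request behind it takes ten seconds, which is the gap the next slide is about.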

Page 37

New Thinking

* “How long did my website take to show?”
* Individual performance of each click or API call
* Inspection of latency within the application

If lots of user interactions are slow, then I want you to alert me.

If it’s just an outlier, log it and shut up.

“End-User Experience Monitoring”
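
The alerting rule above (lots of slow interactions: alert; a lone outlier: log it) can be sketched in a few lines. This is a hypothetical illustration: the 1000ms slow threshold and 5% alert fraction are placeholders to tune, not numbers from the talk, and the samples would come from whatever collects per-request timings.

```python
import logging
from dataclasses import dataclass
from typing import List, Sequence

log = logging.getLogger("end_user_experience")


@dataclass
class Verdict:
    alert: bool           # True when "lots of" interactions were slow
    slow_fraction: float  # share of samples above the slow threshold
    slow_samples: List[float]


def evaluate_latencies(samples_ms: Sequence[float],
                       slow_ms: float = 1000.0,
                       alert_fraction: float = 0.05) -> Verdict:
    """Decide between alerting and merely logging outliers.

    A sample slower than slow_ms counts as slow; we alert only when more
    than alert_fraction of all samples are slow. Both thresholds are
    illustrative placeholders.
    """
    if not samples_ms:
        return Verdict(alert=False, slow_fraction=0.0, slow_samples=[])
    slow = [ms for ms in samples_ms if ms > slow_ms]
    fraction = len(slow) / len(samples_ms)
    return Verdict(alert=fraction > alert_fraction,
                   slow_fraction=fraction, slow_samples=slow)


def handle(samples_ms: Sequence[float]) -> Verdict:
    verdict = evaluate_latencies(samples_ms)
    if verdict.alert:
        log.error("%.1f%% of requests were slow", verdict.slow_fraction * 100)
    else:
        for ms in verdict.slow_samples:
            log.info("outlier request took %.0f ms", ms)  # log it and shut up
    return verdict
```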

Page 38

DEMO TIME!

Because pretty pictures are awesome.

We’ll call the slowest transactions our “Disaster Porn”.

Page 39

Boundary

“AppViz”

* Port-to-port throughput/latency
* How much SQL traffic are you doing?

Updates in real time. Look backwards in time.

Powered by IPFIX (RFC 5101)
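
Boundary's internals aren't covered in the talk, but to show what IPFIX looks like on the wire, here is a sketch that parses the fixed 16-octet message header defined in RFC 5101 (version 10, message length, export time, sequence number, observation domain ID). Decoding the template and data sets that follow the header is left out, and `parse_ipfix_header` is a hypothetical helper name.

```python
import struct
from typing import NamedTuple

IPFIX_VERSION = 10                 # RFC 5101: version number 0x000a
_HEADER = struct.Struct("!HHIII")  # network byte order, 16 octets


class IpfixHeader(NamedTuple):
    version: int
    length: int              # total message length in octets, header included
    export_time: int         # seconds since the UNIX epoch
    sequence: int            # running count of data records sent
    observation_domain: int


def parse_ipfix_header(message: bytes) -> IpfixHeader:
    if len(message) < _HEADER.size:
        raise ValueError("message shorter than the 16-octet IPFIX header")
    header = IpfixHeader(*_HEADER.unpack_from(message))
    if header.version != IPFIX_VERSION:
        raise ValueError(f"not IPFIX: version {header.version}")
    if header.length < _HEADER.size or header.length > len(message):
        raise ValueError("length field does not match the buffer")
    return header
```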

Page 40

Tracelytics

Lots more cool stuff to help ...
We’ll blitz through a few more things next ...

Latency Trends
* Over the last 60 minutes
* Over the last 24 hours
* Over the last 7 days

Top Tip: This is bad news.
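
Tracelytics computes these trends for you. To show what the three views summarize, here is a minimal sketch over hypothetical `(timestamp, latency_ms)` samples that reports the median latency in each trailing window; using the median is our choice for the example and not necessarily what Tracelytics plots.

```python
from statistics import median
from typing import Dict, Iterable, Optional, Tuple

# Trailing windows matching the three views on the slide, in seconds.
WINDOWS = {
    "last 60 minutes": 60 * 60,
    "last 24 hours": 24 * 60 * 60,
    "last 7 days": 7 * 24 * 60 * 60,
}


def latency_trends(samples: Iterable[Tuple[float, float]],
                   now: float) -> Dict[str, Optional[float]]:
    """Median latency (ms) per trailing window.

    samples are (unix_timestamp, latency_ms) pairs; a window with no
    samples maps to None rather than a misleading zero.
    """
    points = list(samples)
    trends: Dict[str, Optional[float]] = {}
    for name, span in WINDOWS.items():
        in_window = [ms for ts, ms in points if now - span <= ts <= now]
        trends[name] = median(in_window) if in_window else None
    return trends
```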

Page 41

Tracelytics Patches

If you want to try out OpenStack APM: https://github.com/Afterglow/tracelytics-openstack

Any questions? Just open an issue!

Page 42

Glance

Page 43

Keystone

Page 44

Nova

Page 45

Nova

Page 46

Nova

Page 47

Nova

Page 48

“Call to Arms”

Reminder about those patches: https://github.com/Afterglow/tracelytics-openstack

> Performance regression tests as an OpenStack CI gate?
> More people talking about “How I fixed those >5 second outliers!”
> Better ‘shared knowledge’ about what settings to tweak for added oomph
> Architectural analysis asking about “big picture” (big impact) changes
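
For the first idea, one possible shape of a performance regression gate: time an operation several times, take the median, and fail the build if it exceeds a stored baseline by more than a tolerance. The function names, the 20-run default and the 20% tolerance are hypothetical choices for this sketch; nothing here is part of OpenStack's CI or the Tracelytics patches.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], runs: int = 20,
                   clock: Callable[[], float] = time.perf_counter) -> float:
    """Call fn `runs` times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = clock()
        fn()
        durations.append(clock() - start)
    return statistics.median(durations)


def check_no_regression(measured: float, baseline: float,
                        tolerance: float = 0.20) -> None:
    """Raise AssertionError if measured exceeds baseline by more than tolerance.

    baseline would come from a previous known-good run; the tolerance
    absorbs ordinary noise on shared CI workers.
    """
    limit = baseline * (1 + tolerance)
    if measured > limit:
        raise AssertionError(
            f"performance regression: {measured:.3f}s > {limit:.3f}s "
            f"(baseline {baseline:.3f}s + {tolerance:.0%})")
```

Taking the median rather than the mean keeps one noisy run on a shared CI worker from failing the build on its own.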

Page 49

Credits

Because these folks are awesome

N.B. Not intended as an exhaustive list of all the awesome people in the world/room!

Page 50

http://www.livingsocial.com

Credits

Page 51

http://www.cloudscaling.com

Credits

Page 52

http://www.aristanetworks.com

Credits

Page 53

http://www.tracelytics.com

Credits

Page 54

We’re done talking, thanks for listening!

Any questions?

Page 55

Interested? E-mail Ken:

[email protected]

Or just find me!

Reminder that these slides are over at: https://speakerdeck.com/u/nixgeek
