
The Reluctant SysAdmin : 360|iDev Austin 2010


Page 1: The Reluctant SysAdmin : 360|iDev Austin 2010

The Reluctant SysAdmin

Managing the Server side of a Client-Server iPhone App

Jen Harvey, Voxilate, @jen_h

360|iDev Austin, Nov 10 2010

Wednesday, November 17, 2010

Page 2: The Reluctant SysAdmin : 360|iDev Austin 2010

A little background...

• Me: Network security background, OSS & Linux fangirl

• Currently: Co-founder of Voxilate with Steven Hugg

• Last year: Traveling the country while bootstrapping the company, building iPhone apps on the road

Page 3: The Reluctant SysAdmin : 360|iDev Austin 2010

HeyTell

• HeyTell Voice Messenger allows users to share short voice messages & location

• Released February 2010

• Have been building, managing, deploying, re-deploying, updating, expanding, scaling on the road ever since...


Page 4: The Reluctant SysAdmin : 360|iDev Austin 2010

• Map of travels, 360iDev San Jose!

360|iDev San Jose!


Page 5: The Reluctant SysAdmin : 360|iDev Austin 2010

Current Objectives

• Keep over 1 million users happy & using our app

• Maintain respectable uptime & performance while adding new features & expanding our reach

• Get a little sleep at night

• Share what we’ve learned so that others who embark on similar journeys can also sleep!


Page 6: The Reluctant SysAdmin : 360|iDev Austin 2010

Agenda

• Why a Server?

• Choose Your Poison

• Build It Out

• Lock It Down

• Maintain & Monitor


Page 7: The Reluctant SysAdmin : 360|iDev Austin 2010

So...why would you want to run a server component?

Page 8: The Reluctant SysAdmin : 360|iDev Austin 2010

Metrics!

• What metrics are valuable to you?

• Number of total users

• Number of active users per day/month/year

• Number of whatever-it-is-you-do all day (for us, submitted messages)

• Number of customers vs. users

• Busiest times of day/week/month?

grep is awesome
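
The same kind of counting grep does can be scripted once the questions get repetitive. A minimal sketch, assuming a hypothetical log format of `YYYY-MM-DD HH:MM:SS user=<id> action=<name>` (not HeyTell's real format; adjust the pattern to whatever your server writes):

```python
import re
from collections import defaultdict

# Assumed log line format, for illustration only:
#   2010-11-10 14:03:22 user=alice action=send_message
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} user=(\S+) action=(\S+)$")


def daily_metrics(lines):
    """Return {date: (active_user_count, message_count)} for the given log lines."""
    users = defaultdict(set)
    messages = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        day, user, action = match.groups()
        users[day].add(user)
        if action == "send_message":
            messages[day] += 1
    return {day: (len(users[day]), messages[day]) for day in users}


# Made-up sample lines, purely to show the shape of the result.
sample = [
    "2010-11-10 14:03:22 user=alice action=send_message",
    "2010-11-10 14:05:01 user=bob action=login",
    "2010-11-10 15:10:45 user=alice action=send_message",
    "2010-11-11 09:00:00 user=bob action=send_message",
]
metrics = daily_metrics(sample)
```

Here `metrics` maps each day to a pair of active users and submitted messages, which covers the "active users per day" and "number of whatever-it-is-you-do" questions above.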


Page 9: The Reluctant SysAdmin : 360|iDev Austin 2010

Track app usage & errors

• Speed customer support

• Understand how users really use your app

• Be alerted when errors occur

• Really useful for beta testing to determine app viability

Page 10: The Reluctant SysAdmin : 360|iDev Austin 2010

Provide value-added content

• Virtual goods or in-app purchase goodies

• User-to-User or User-to-Public content sharing

• Run your own analytics or ad servers


Page 11: The Reluctant SysAdmin : 360|iDev Austin 2010

Basic Web Server

• Informational site for game

• Customer service site

• FAQ hosting

• Note: This is not what we’re focusing on in this talk, but the info here is pretty general purpose! :)


Page 12: The Reluctant SysAdmin : 360|iDev Austin 2010

Control your own Push Notifications

• Don’t need an external service (free)

• Can be a little painful to set up, but resources & libraries exist on web for PHP, Java, Python, Ruby...

• Additional insight when users run into Push Notification issues
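
Whichever library you pick, the part your server builds is a small JSON payload with an `aps` dictionary. A minimal sketch, assuming the 256-byte payload limit that applied to Apple's binary push interface around the time of this talk (check Apple's current documentation for today's limit); the function name is made up for illustration:

```python
import json


def build_apns_payload(alert, badge=None, sound=None, max_bytes=256):
    """Build a compact push payload and refuse to return one that is too large.

    max_bytes defaults to 256, an assumption based on the limit in force in 2010.
    """
    aps = {"alert": alert}
    if badge is not None:
        aps["badge"] = badge
    if sound is not None:
        aps["sound"] = sound
    # Compact separators keep the payload as small as possible.
    payload = json.dumps({"aps": aps}, separators=(",", ":"), ensure_ascii=False)
    encoded = payload.encode("utf-8")
    if len(encoded) > max_bytes:
        raise ValueError("payload is %d bytes, limit is %d" % (len(encoded), max_bytes))
    return encoded
```

Checking the size on your own server is one source of the "additional insight" mentioned above: an oversized payload is rejected before it is ever sent.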


Page 13: The Reluctant SysAdmin : 360|iDev Austin 2010

App Store Receipt checking

[Diagram: iPhone Client, Your systems, Apple’s systems]

• Verify user is a customer before enabling feature

• Gather real-time statistics: piracy trends, conversion rates for freemium apps
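
The server-side check can be sketched as follows, assuming Apple's `verifyReceipt` endpoint as it worked in 2010: the client sends its receipt to your server, your server posts it base64-encoded to Apple, and a `status` of 0 in the reply means the receipt is valid. Apple has since deprecated this endpoint, so treat this as an illustration of the flow rather than current practice. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Production endpoint as documented at the time; the sandbox used a different host.
VERIFY_URL = "https://buy.itunes.apple.com/verifyReceipt"


def build_request_body(receipt_bytes):
    """Wrap the raw receipt from the client in the JSON body the endpoint expects."""
    encoded = base64.b64encode(receipt_bytes).decode("ascii")
    return json.dumps({"receipt-data": encoded}).encode("utf-8")


def receipt_is_valid(response_text):
    """Return True only if the reply parses as JSON and reports status 0."""
    try:
        reply = json.loads(response_text)
    except ValueError:
        return False
    return isinstance(reply, dict) and reply.get("status") == 0


def verify_receipt(receipt_bytes, url=VERIFY_URL, timeout=10):
    """Post the receipt and report whether it was accepted (performs network I/O)."""
    request = urllib.request.Request(
        url,
        data=build_request_body(receipt_bytes),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return receipt_is_valid(response.read().decode("utf-8"))
```

Logging each result on your side is what makes the real-time piracy and conversion statistics possible.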


Page 14: The Reluctant SysAdmin : 360|iDev Austin 2010

#1 reason to use a server component?


Page 15: The Reluctant SysAdmin : 360|iDev Austin 2010

Your server is your app’s engine

Image courtesy of Richard Smith/gocarts on flickr: http://flickr.com/gocarts

Page 16: The Reluctant SysAdmin : 360|iDev Austin 2010

Choose Your Poison


Page 17: The Reluctant SysAdmin : 360|iDev Austin 2010

We’re lucky! So many hosting options!


Page 18: The Reluctant SysAdmin : 360|iDev Austin 2010

Cloud: Infrastructure as a Service

• Pay-as-you-go systems deployment

• Amazon Web Services (EC2, S3, RDS, ELB, ...)

• Microsoft Azure

• VMware vCloud

• Rackspace Cloud (formerly Mosso)

• ...

Page 19: The Reluctant SysAdmin : 360|iDev Austin 2010

Cloud: Platform as a Service

• Write your app for the platform, interact via API, provider handles scaling and administrative tasks:

• Heroku (for Ruby enthusiasts, built on EC2)

• Google App Engine (Java, Python, JRuby...)

• Engine Yard (Ruby)

• ...

Page 20: The Reluctant SysAdmin : 360|iDev Austin 2010

Virtual Private Servers (VPS)

• You pay for a dedicated server, sometimes a VM, sometimes hardware

• Rackspace

• Slicehost

• Linode

• ...


Page 21: The Reluctant SysAdmin : 360|iDev Austin 2010

Your Mom’s Basement

• Or your office.

• You don’t find sleep essential, do you?

• (No, really, this is fantastic if you have a large team & money to build out...but as an indie, you are likely to have neither)


Page 22: The Reluctant SysAdmin : 360|iDev Austin 2010

Considerations

• What’s your preferred language & OS? Write and work with what you know!

• How much responsibility/flexibility/portability do you want/need to have?

• What’s your budget? GAE & AWS have free tiers to give you a taste & likely have enough horsepower to start with.


Page 23: The Reluctant SysAdmin : 360|iDev Austin 2010

My advice: Go with what you know & feel comfortable with

Page 24: The Reluctant SysAdmin : 360|iDev Austin 2010

We chose Amazon Web Services

• Quick & flexible & full of building blocks:

• Load balancers

• Hosted MySQL & SimpleDB

• Multiple availability zones

• Lots of h/w & memory configs

• S3 redundant storage

Page 25: The Reluctant SysAdmin : 360|iDev Austin 2010

And...

• Great APIs: Command line tools & lots of libraries

• Can script anything or integrate w/web app

• Can do some management tasks from phone

• Huge user community - many ways to obtain support


Page 26: The Reluctant SysAdmin : 360|iDev Austin 2010

Also...

• Quick & simple to prototype system architecture

• Easy to bring up test beds with the same configuration as production - but in a discrete & separate security group

• Published Service Level Agreement and Security Practices documentation


Page 27: The Reluctant SysAdmin : 360|iDev Austin 2010

Cons

• Handle scaling (& everything else) yourself - just because your app is “in the cloud,” doesn’t mean it automatically scales

• Harder to set up: pre-built machine images are available, but you still need to customize/secure them

• Instances are ephemeral (but I like this because of the way it forces you to architect)


Page 28: The Reluctant SysAdmin : 360|iDev Austin 2010

Build It Out


Page 29: The Reluctant SysAdmin : 360|iDev Austin 2010

A note on scaling early

• Be prepared to do it

• Know it’s coming if you’re successful and architect/code with the understanding that you’re the guy/gal who’s going to have to make it work when it comes

• Don’t overarchitect early on

• Slow, hypeless ramp-up & predictable viral growth can help here


Page 30: The Reluctant SysAdmin : 360|iDev Austin 2010

Cool! We have an Enterprise-Grade(TM) horizontal webscale scaling solution!

Uh, it’s getting corrupted every 12 hours.

SHUTDOWN EVERYTHING

Page 31: The Reluctant SysAdmin : 360|iDev Austin 2010

Build with security in mind

• Develop & build your custom software with security in mind

• You know what anomalous behavior is/can be

• Put on the adversary’s hat - what could they do? What’s the worst outcome? Is it worth building in protection for certain scenarios?


Page 32: The Reluctant SysAdmin : 360|iDev Austin 2010

The Voltron Principle

Individual components join to build the ultimate defender of the universe


Page 33: The Reluctant SysAdmin : 360|iDev Austin 2010

Voltron Core

• Single Linux-based machine image we use to build everything on top of

• Document changes for future migrations (I ♥ scripts)

• On deployment, bolt on the pieces we need & config changes

• If a host goes down, we can bring up an identical host in known state in minutes, swap out their IPs and run the post-mortem once we’ve normalized

Page 34: The Reluctant SysAdmin : 360|iDev Austin 2010

• Essential logs & configuration files periodically stored on S3

• Rotate logs frequently, especially as you grow

• Don’t store passwords or keys in configs, populate these on deploy (I abuse sed, you may use something more elegant)
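
Populating secrets at deploy time can be sketched without sed. A minimal example, assuming the config file carries `${NAME}` placeholders and the real values arrive from somewhere outside the machine image (the placeholder name and the sample values are made up):

```python
from string import Template


def render_config(template_text, secrets):
    """Fill ${NAME} placeholders; raises KeyError if a secret is missing.

    Failing loudly is deliberate: a deploy with an empty password
    should stop, not limp along.
    """
    return Template(template_text).substitute(secrets)


# The template is what lives in the image and in version control: no secrets in it.
template_text = "db_host = db.internal\ndb_user = app\ndb_password = ${DB_PASSWORD}\n"

# At deploy time the value would come from your own secure source; hard-coded here for the demo.
rendered = render_config(template_text, {"DB_PASSWORD": "s3cret"})
```

The rendered text is written to disk only on the running host, so the base image and any copies of it stored on S3 never contain the password.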


Page 35: The Reluctant SysAdmin : 360|iDev Austin 2010

[Architecture diagram. Labels: Load balancer, Security Infrastructure, Database, Persistent storage, Cache, Base AMI, Application Core, Notification server]

Page 36: The Reluctant SysAdmin : 360|iDev Austin 2010

But some days...


Page 37: The Reluctant SysAdmin : 360|iDev Austin 2010

Assume everything will fail


Page 38: The Reluctant SysAdmin : 360|iDev Austin 2010

Ready to set up our new domain name?

Hey, do CNAMEs have a “.” at the end?

D’OH! Let’s wait 2 hours for it to expire...
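
Why the trailing dot matters, as a sketch: in a BIND-style zone file, a name without a trailing dot is treated as relative and gets the zone origin appended. The talk does not say which DNS tooling was involved, so the zone-file assumption and the example hostnames below are illustrative only.

```python
def expand_zone_name(name, origin):
    """Return the absolute name a BIND-style zone file would resolve `name` to."""
    absolute_origin = origin if origin.endswith(".") else origin + "."
    if name == "@":
        return absolute_origin  # "@" stands for the origin itself
    if name.endswith("."):
        return name  # already fully qualified, left untouched
    return name + "." + absolute_origin  # relative: origin gets appended


# Forgetting the dot on a CNAME target quietly points it at the wrong host.
wrong = expand_zone_name("lb.example.net", "example.com")
right = expand_zone_name("lb.example.net.", "example.com")
```

`wrong` ends up inside your own zone, and the bad record then sits in resolver caches until its TTL runs out, which is the two-hour wait in the exchange above.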


Page 39: The Reluctant SysAdmin : 360|iDev Austin 2010

Find your possible points of failure (rusty robot joints)

• DNS - if your hostname doesn’t resolve, your app can’t get home

• Are backups working?

• Storage and/or database - what happens when/if they go away?

• DDoS (intentional or not...)


Page 40: The Reluctant SysAdmin : 360|iDev Austin 2010

• Deal with small amounts of failure gracefully (cache, limited functionality)

• Don’t put your web server & application server components on the same *anything*
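
Handling a small failure gracefully with a cache might look like the sketch below: serve the last known value, flagged as stale, when the backend call fails. The class name and the staleness window are assumptions for illustration, not a description of HeyTell's code.

```python
import time


class CachedFallback:
    """Wrap a fetch function; fall back to a recent cached value if it fails."""

    def __init__(self, fetch, max_stale_seconds=300, clock=time.time):
        self._fetch = fetch
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, time stored)

    def get(self, key):
        """Return (value, is_stale). Re-raises if there is nothing usable cached."""
        try:
            value = self._fetch(key)
        except Exception:
            cached = self._cache.get(key)
            if cached is not None and self._clock() - cached[1] <= self._max_stale:
                return cached[0], True  # degraded but still functional
            raise
        self._cache[key] = (value, self._clock())
        return value, False
```

The `is_stale` flag lets the app decide what "limited functionality" means, for example showing cached data while disabling writes.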


Page 41: The Reluctant SysAdmin : 360|iDev Austin 2010

But you will, without a doubt, run into a ‘flesh wound’ issue


Page 42: The Reluctant SysAdmin : 360|iDev Austin 2010

How you handle it is pivotal


Page 43: The Reluctant SysAdmin : 360|iDev Austin 2010

The database is bogged down. I think this one feature is causing it.

Does anyone even know we have that feature?

That feature’s GONE!

Page 44: The Reluctant SysAdmin : 360|iDev Austin 2010

Customer Communication == Key

• Twitter

• Facebook

• Respond to customer support emails (have a cut-and-pasteable friendly response - a small team has no time for personal emails in a crisis)

• You may feel like it’s the end of the world, but this, too, shall pass

Page 45: The Reluctant SysAdmin : 360|iDev Austin 2010

Hey, guys, Justin Bieber just announced he’s using us on Twitter!

Cool. Who’s that?

Gah! Server’s melted! Users revolt!

Page 46: The Reluctant SysAdmin : 360|iDev Austin 2010

Helpful tip for high-traffic systems

• If you’re looking to max out connections on a single Linux-based system, think about:

• Memory & file handles (see also: ulimit tweaking)

• Connection tracking as it relates to memory (look up netfilter/tcp stack tweaking)
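
Each open connection consumes one file descriptor, so the file handle limit is a hard ceiling on concurrent connections per process. A minimal sketch for checking it from inside a process on a Unix-like system (the `reserved` headroom for logs, sockets to your database and so on is an assumed figure):

```python
import resource


def file_handle_limits():
    """Return (soft, hard) limits on open files; the soft one is what `ulimit -n` shows."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_fd_limit(target_connections, reserved=64):
    """Check whether the target connection count fits under the soft limit."""
    soft, _hard = file_handle_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return target_connections + reserved <= soft
```

Running a check like this at startup turns a mysterious "too many open files" under load into an obvious configuration warning.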


Page 47: The Reluctant SysAdmin : 360|iDev Austin 2010

Lock it Down


Page 48: The Reluctant SysAdmin : 360|iDev Austin 2010

Yes, security is your problem

• If you are storing users’ personal information, you are subject to laws and regulations in the US, specific states, and foreign countries

• Many jurisdictions define personal information differently

• Most regulations require a written policy and best practices for security


Page 49: The Reluctant SysAdmin : 360|iDev Austin 2010

So what are best practices?

• Secure your perimeter

• Secure your services

• Detect, alert on, and block suspicious activity

• Protect your users and encrypt user information in transit and at rest

• Have written policies and plans


Page 50: The Reluctant SysAdmin : 360|iDev Austin 2010

Secure Your Perimeter

• AWS has (at least) two walls

• One is its “security group” context

• One is your image’s local firewall

• Block everything by default, open only the ports you need

• No root login

• Passwordless login only (use key pairs)
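
The last two bullets map onto two OpenSSH server options, `PermitRootLogin` and `PasswordAuthentication`. A minimal audit sketch, assuming the host runs OpenSSH; it is deliberately simplified and ignores `Match` blocks and included files:

```python
def parse_sshd_config(text):
    """Parse keyword/value lines; keywords are case-insensitive in sshd_config."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: cut at any '#'
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip().lower()
        settings.setdefault(key, value)  # sshd keeps the first value it sees
    return settings


def audit_sshd(text):
    """Return a list of problems; an empty list means both checks passed."""
    settings = parse_sshd_config(text)
    problems = []
    if settings.get("permitrootlogin") != "no":
        problems.append("PermitRootLogin should be 'no'")
    if settings.get("passwordauthentication") != "no":
        problems.append("PasswordAuthentication should be 'no'")
    return problems
```

A check like this fits into the script that builds the base image, so a hardened config is verified rather than assumed.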


Page 51: The Reluctant SysAdmin : 360|iDev Austin 2010

Secure your services

• Services should not run as root (e.g., www-data for apache2)

• Service usernames should not have shell login access

• Monitor for security vulnerabilities & upgrade when needed

• Build security into custom software
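
The "no shell login" rule is easy to check mechanically. A sketch that scans passwd-format text (seven colon-separated fields, shell last) for service accounts that still have a login shell; the set of "no login" shells is an assumption that covers common Linux defaults, and the sample accounts are made up:

```python
NO_LOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def service_accounts_with_shell(passwd_text, service_names):
    """Return the service account names whose shell allows interactive login."""
    flagged = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # not a well-formed passwd entry
        name, shell = fields[0], fields[6]
        if name in service_names and shell not in NO_LOGIN_SHELLS:
            flagged.append(name)
    return flagged


# Made-up sample in /etc/passwd format.
sample_passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\n"
    "appsvc:x:1001:1001::/home/appsvc:/bin/bash\n"
)
flagged = service_accounts_with_shell(sample_passwd, {"www-data", "appsvc"})
```

In this sample only `appsvc` is flagged, since `www-data` already has a non-login shell.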


Page 52: The Reluctant SysAdmin : 360|iDev Austin 2010

Detect & Alert

For host-based intrusion detection - I love OSSEC:

• Quick & easy, lightweight, Open Source, free

• Alerts on logs - extensive default ruleset but can customize alerting for your specific app

• Daily Tripwire & rootkit checks

• Active response: can block IPs on suspicious behavior

Page 53: The Reluctant SysAdmin : 360|iDev Austin 2010

Protect Your Users

• If you need to store user information, encrypt in transit and at rest

• If you need data from your systems locally, use encryption end-to-end -- down to encrypting your drive

• Use SSL in the great wide world; it’s not that hard!

Page 54: The Reluctant SysAdmin : 360|iDev Austin 2010

Why use SSL?

• Protects your users from sending personal data over the Internet in the clear

• Protects you from neophyte reverse engineers


Page 55: The Reluctant SysAdmin : 360|iDev Austin 2010

On Using SSL

• EC2 Load Balancer now allows SSL termination - https to the LB, http inside data center

• Small & bootstrapped like us? Use StartSSL - free certs. Go to someone like DigiCert for nifty wildcard certs once you’ve got the resources.


Page 57: The Reluctant SysAdmin : 360|iDev Austin 2010

User Passwords

• Many users will use the same password for everything--banks & FourSquare.

• There’s nothing you can do about it.

• Databases full of email addresses and passwords are attractive targets for this reason
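
If you do have to store passwords, one common way to make that database less rewarding to steal is to keep only a salted, slow hash of each password. The talk does not say how HeyTell handles this; the sketch below is a generic illustration using PBKDF2 from Python's standard library, with an assumed iteration count.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash and compare in constant time."""
    _salt, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Store the salt and digest, never the password itself; a stolen table then cannot be replayed directly against users' bank or email accounts.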


Page 58: The Reluctant SysAdmin : 360|iDev Austin 2010

Don’t be an attractive target...don’t make personal information necessary to use the service, if at all possible


Page 59: The Reluctant SysAdmin : 360|iDev Austin 2010

Allow mechanisms for users to update or clear their information at any time without your intervention


Page 60: The Reluctant SysAdmin : 360|iDev Austin 2010

Whenever possible, educate users about protecting their privacy (this leads to all good - more educated users, fewer complaints, more trust, more goodwill, more/happier users!)


Page 61: The Reluctant SysAdmin : 360|iDev Austin 2010

• Have a policy for purging data/accounts/etc. that you don’t need and follow it

• Automate this or build it into the app if you can

• Have a written policy for data breaches and intrusions

• Write down instructions for yourself-- this’ll keep you sane if you ever have a real breach or a false alarm
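
Automating the purge policy can start as a single function that a scheduled job calls. A minimal sketch, assuming you track a last-seen timestamp per account; the one-year retention period and the sample accounts are placeholders, not a recommendation:

```python
from datetime import datetime, timedelta


def accounts_to_purge(last_seen_by_account, now, retention_days=365):
    """Return account ids not seen within the retention window, sorted for stable logs."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        account for account, seen in last_seen_by_account.items() if seen < cutoff
    )


# Made-up sample data.
last_seen = {
    "acct-1": datetime(2009, 1, 1),
    "acct-2": datetime(2010, 11, 1),
}
stale = accounts_to_purge(last_seen, now=datetime(2010, 11, 10))
```

Keeping the selection separate from the deletion makes it easy to log or dry-run the list before anything is removed.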


Page 62: The Reluctant SysAdmin : 360|iDev Austin 2010

• Keep a list of the services you use. One quick & dirty thing to do is scrape vulnerability feeds like feed://nvd.nist.gov/download/nvd-rss.xml for your service names

• When security issues are reported and new versions released, patch out of band, test, replace (pretty easy to do with EC2!)
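
The quick & dirty feed scrape might look like this. It is a sketch that matches service names against item titles and descriptions in RSS-style XML; the sample feed below is a made-up placeholder (not real NVD data), and plain substring matching will produce false positives.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace, so RSS 1.0 and 2.0 style feeds both work."""
    return tag.rsplit("}", 1)[-1]


def matching_titles(feed_xml, service_names):
    """Return titles of feed items that mention any of the given service names."""
    wanted = [name.lower() for name in service_names]
    hits = []
    for element in ET.fromstring(feed_xml).iter():
        if local_name(element.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in element}
        haystack = (fields.get("title", "") + " " + fields.get("description", "")).lower()
        if any(name in haystack for name in wanted):
            hits.append(fields.get("title", ""))
    return hits


# Placeholder feed with invented entries, only to show the input shape.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>CVE-0000-0001 (apache2)</title><description>Placeholder entry</description></item>
<item><title>CVE-0000-0002 (someotherd)</title><description>Placeholder entry</description></item>
</channel></rss>"""

hits = matching_titles(SAMPLE_FEED, ["apache2", "mysql"])
```

Run from cron and mailed to yourself, a list like `hits` is a cheap early warning for the patch-test-replace cycle described above.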


Page 63: The Reluctant SysAdmin : 360|iDev Austin 2010

Your users & peace of mind are totally worth it! This will save you time & sanity in the long run

Page 64: The Reluctant SysAdmin : 360|iDev Austin 2010

Maintain & Monitor


Page 65: The Reluctant SysAdmin : 360|iDev Austin 2010

Planning Maintenance

• If you can, use a load balancer and switch out backend servers

• Have backup systems in working state for fall-back

• Track usage statistics throughout your app’s lifetime - schedule maintenance for “slowest” time period


Page 66: The Reluctant SysAdmin : 360|iDev Austin 2010

The User Rollercoaster (# connections/hour, GMT)

[Chart: connections per hour for each day of the week, Sunday through Saturday; x-axis: hour of day, 00 to 22 GMT]

Sunday, 11:00 GMT it is, then.
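
Finding the "slowest" slot from your own connection log can be sketched as below, assuming you can extract one timezone-aware timestamp per connection. Function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def connections_per_slot(timestamps):
    """Count connections per (weekday, hour) in GMT, including slots with zero."""
    counts = Counter({(day, hour): 0 for day in range(7) for hour in range(24)})
    for stamp in timestamps:
        utc = stamp.astimezone(timezone.utc)
        counts[(utc.weekday(), utc.hour)] += 1
    return counts


def quietest_slot(counts):
    """Return (day name, hour) of the least busy slot; ties go to the earliest slot."""
    day, hour = min(counts, key=lambda slot: (counts[slot], slot))
    return DAYS[day], hour
```

Feed it several weeks of data rather than one, so a single odd night does not pick your maintenance window for you.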


Page 67: The Reluctant SysAdmin : 360|iDev Austin 2010

Keep a Calendar

• Keep a calendar of important dates:

• Developer certificate expirations

• SSL certificate expiration

• APNS certificate expiration

• Domain name registry expiration
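
The calendar can also nag you automatically. A minimal sketch that works on certificate expiry strings in the text format OpenSSL uses (the same format Python's `ssl` module returns in a peer certificate's `notAfter` field); the entry names and dates in the usage example are hypothetical:

```python
import ssl
from datetime import datetime, timezone


def days_until(not_after, now):
    """Whole days from `now` until a certificate's notAfter time."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def expiring_soon(calendar, now, warn_days=30):
    """Return names from {name: notAfter string} that expire within warn_days."""
    return sorted(
        name for name, not_after in calendar.items()
        if days_until(not_after, now) <= warn_days
    )


# Hypothetical entries and dates.
calendar = {
    "ssl": "Dec 10 00:00:00 2010 GMT",
    "apns": "Jun 15 00:00:00 2011 GMT",
}
due = expiring_soon(calendar, now=datetime(2010, 11, 10, tzinfo=timezone.utc))
```

Developer certificates and domain registrations do not come with a `notAfter` string, but the same comparison works once you record their dates in one place.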


Page 68: The Reluctant SysAdmin : 360|iDev Austin 2010

Monitor Uptime

• Check out Pingdom - set thresholds to be alerted when servers are slow or inaccessible

• Configure OSSEC to alert on conditions that precipitate an “issue”

• Set alerts or automated account recharges for *everything* that could block app functionality

• Make sure someone’s always accessible
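
The "slow or inaccessible" distinction is simple to sketch as a homemade check. This is not Pingdom's API, just an illustration of the idea with an assumed 2-second slowness threshold; the `opener` and `clock` parameters exist so the function can be exercised without a network.

```python
import time
import urllib.request


def check_endpoint(url, slow_after_seconds=2.0, timeout=10.0,
                   opener=urllib.request.urlopen, clock=time.monotonic):
    """Return "ok", "slow", or "down" for a single HTTP check."""
    start = clock()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except Exception:
        # DNS failures, refused connections, timeouts and HTTP errors all land here.
        return "down"
    elapsed = clock() - start
    if status != 200:
        return "down"
    return "slow" if elapsed > slow_after_seconds else "ok"
```

Run a check like this from somewhere other than the server being monitored; a host cannot report its own outage.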


Page 69: The Reluctant SysAdmin : 360|iDev Austin 2010

Hey, the server’s down. Where are you guys?

I’m on a BOAT!

I’m on a PLANE, yo!


Page 70: The Reluctant SysAdmin : 360|iDev Austin 2010

HeyTell Systems Uptime

[Chart: uptime by month, Jan through Nov; y-axis from 97 to 100]

Page 71: The Reluctant SysAdmin : 360|iDev Austin 2010

Downtime, Minutes

[Chart: downtime in minutes by month, Jan through Nov; y-axis from 0 to 700]

99.2% uptime == 350 minutes == 5.8 hours!!
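
The conversion behind that line is plain arithmetic. A small sketch, assuming the figure is per month and using a 30-day month (the slide does not state the period):

```python
def downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime implied by an uptime percentage over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def uptime_percent(downtime, days=30):
    """Uptime percentage implied by `downtime` minutes over `days` days."""
    total_minutes = days * 24 * 60
    return 100.0 * (1.0 - downtime / total_minutes)
```

Over 30 days, 99.2% uptime works out to about 346 minutes, in line with the rounded 350 minutes (roughly 5.8 hours) quoted above.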


Page 72: The Reluctant SysAdmin : 360|iDev Austin 2010

Total Downtime: 1.5 days

Uptime: 99.48%

(uptime sounds sexy; downtime...not so much)


Page 73: The Reluctant SysAdmin : 360|iDev Austin 2010

Managing on the Run

• Phone SSH client (CommandBot on Droid, iSSH on iPhone)

• EC2 Management client (Decaf on Droid, iAWSManager on iPhone)

• Separate Support Account email setup on phone

• Notepad app with customer support FAQ answers


Page 74: The Reluctant SysAdmin : 360|iDev Austin 2010

Other lifesavers on the run

• Reliable 3G service

• Mobile broadband card and/or tethering setup

• Netbook or small laptop


Page 75: The Reluctant SysAdmin : 360|iDev Austin 2010

Summary

• On hosting: Go with what you know

• Architect with failure & future scaling issues in mind

• Lock it down: Keep your data & your users safe

• Monitoring & maintenance: Make your systems work for you

• Good luck! You can do it!


Page 76: The Reluctant SysAdmin : 360|iDev Austin 2010

References & Links

OSSEC: http://ossec.net

Pingdom uptime cheatsheet: http://royal.pingdom.com/2009/03/24/a-handy-uptime-and-downtime-conversion-cheat-sheet/

AWS Free Tier info: http://aws.amazon.com/free/

AWS Security Doc: http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper.pdf

TCP stack tweakage: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1


Page 77: The Reluctant SysAdmin : 360|iDev Austin 2010

Thanks!

email: [email protected]

twitter: @jen_h
