
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013


DESCRIPTION

MACPAC is a federal legislative branch agency tasked with reviewing state and federal Medicaid and Children's Health Insurance Program (CHIP) access and payment policies and making recommendations to Congress. By March 15 and again by June 15 each year, the agency produces a comprehensive report for Congress that compiles results from Medicaid and CHIP data sources for the 50 states and territories. MACPAC's CIO wanted a secure, cost-effective, high-performance platform capable of processing this large volume of health data. In this session, learn how MACPAC and 8KMiles set up the agency’s Big Data/HPC analytics platform on AWS using SAS analytics software.


Page 1: Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Empowering Congress with Data-Driven Analytics

Mathew Chase, November 13, 2013

Sri Vasireddy,

Page 2

• A small federal legislative branch agency
• Newly established in late 2010
• Going beyond the “Cloud First” goal to “Cloud Only”

Page 3

Hello

• Mathew Chase
• Federal CIO
• Over 20 years of experience leading technology operations in the public and private sectors

Page 4

Who are you?

• Government
• Health care industry
• Cloud newbies
• AWS ninjas

• Whoops… wrong session

Page 5

Question?

How many of you are using AWS as your primary computing datacenter?

Page 6

MACPAC’s AWS Datacenter

• AWS as the replacement for an onsite or hosted datacenter
• Single primary region, with cold recovery on the other coast
• Multiple Availability Zones (AZs) for redundancy
• Separate VPCs as security “air gaps”

Page 7

MACPAC: the “perfect” cloud customer

• Predictable work cycles
• Two intense work periods each year
• Growing, with an undefined future
• Potential need for more computing resources
• Very cost conscious
• No legacy infrastructure

Page 8

What we achieved in the cloud

• > 40% reduction in capital expenses
  o With additional savings in rent, utilities, and labor
• Cost spread over the typical equipment lifespan
• On-demand storage and archiving
• Zero overprovisioning
• Ability to expand and contract resources at will

Page 9

Core focus

Recommendations to Congress on Medicaid and the Children’s Health Insurance Program

Page 10

Reports to the Congress

Reports due by:
• March 15
• June 15

www.MACPAC.gov/reports

Page 11

Research backed by analytics

• Analyze Medicaid program data
• Find intersections with Medicare
• Evaluate Medicaid survey information

Page 12

Tools

• SAS Office Analytics enterprise platform
• Red Hat Enterprise Linux x64
• Amazon EC2

Page 13

Concerns

1. Security
2. Performance

Page 14

Security

Page 15

Security Requirements

• Multi-user controlled environment
• Isolated environment with strong controls
• No sensitive or personal data sitting at the periphery
• Data encrypted at rest and in transit

Page 16

Access Protection Challenge

• Twenty instances
• Twenty ports for AD
• 20 x 20 = 400 rules
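The 400-rule figure is simple multiplication. A minimal Python sketch, assuming one rule per (source address, port) pair when access is granted address by address, versus one rule per port when the rule names a source security group instead. The IP addresses, the group ID `sg-infra-example`, and the port list are hypothetical placeholders. The dictionaries follow the `IpPermissions` shape accepted by the EC2 `AuthorizeSecurityGroupIngress` API (for example through boto3), but nothing here calls AWS.

```python
from typing import Dict, List


def per_address_rules(source_ips: List[str], ports: List[int]) -> List[Dict]:
    """One ingress rule per (source IP, port) pair: grows as instances x ports."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": f"{ip}/32"}],
        }
        for ip in source_ips
        for port in ports
    ]


def per_group_rules(source_group_id: str, ports: List[int]) -> List[Dict]:
    """One ingress rule per port that names a source security group;
    any instance launched into that group is covered automatically."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }
        for port in ports
    ]


# Hypothetical inputs: 20 client instances and 20 AD-related ports.
client_ips = [f"10.0.1.{host}" for host in range(10, 30)]
ad_ports = list(range(1000, 1020))  # placeholder numbers, not the real AD ports

print(len(per_address_rules(client_ips, ad_ports)))
print(len(per_group_rules("sg-infra-example", ad_ports)))
```

Adding a twenty-first client instance adds 20 more rules under the first scheme and none under the second.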

Page 17

Access Control Using Security Groups (diagram)

• Infra Security Group: client instances
• AD Security Group (AD-1, AD-2): accepts AD-related requests from the ‘Infra’ group
• DNS Security Group (DNS-1, DNS-2): accepts DNS queries from the ‘Infra’ group and from the AD group
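The diagram's three rules can be written down as data and checked. A minimal Python model, assuming the names `infra`, `ad`, and `dns` stand for the three groups in the diagram and the service labels are shorthand. Any flow not explicitly listed is denied, which mirrors security-group ingress rules: they can only allow traffic, never deny it.

```python
from typing import Set, Tuple

# (destination group, service, allowed source group), as read from the diagram.
INGRESS_RULES: Set[Tuple[str, str, str]] = {
    ("dns", "dns", "infra"),  # DNS group accepts DNS queries from 'Infra'
    ("dns", "dns", "ad"),     # DNS group accepts DNS queries from the AD group
    ("ad", "ad", "infra"),    # AD group accepts AD-related requests from 'Infra'
}


def is_allowed(source: str, destination: str, service: str) -> bool:
    """Security groups only express allows; any flow not listed is denied."""
    return (destination, service, source) in INGRESS_RULES


if __name__ == "__main__":
    print(is_allowed("infra", "ad", "ad"))
    print(is_allowed("ad", "infra", "ad"))  # no rule admits traffic into 'Infra' here
```

Keeping the policy as a small table like this makes it easy to review against the diagram before translating it into actual security group rules.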

Page 18

Encrypted Data flow

Page 19

Cloud Security Design

Page 20

Performance

Page 21

SAS Requirements

• Very I/O intensive
• Sequential reads and writes
  o 35-70 MB/s of I/O per core desired
  o Goal: 4-core system = ~200 MB/s of I/O
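Scaling the per-core guideline to a 4-core instance shows where the ~200 MB/s goal comes from. Here is a small Python helper that uses the slide's 35 and 70 MB/s per-core bounds as defaults. The function name is illustrative.

```python
from typing import Tuple


def io_target_mb_per_s(cores: int, low_per_core: float = 35.0,
                       high_per_core: float = 70.0) -> Tuple[float, float]:
    """Scale the per-core sequential I/O guideline to a whole instance."""
    if cores < 1:
        raise ValueError("cores must be positive")
    return cores * low_per_core, cores * high_per_core


low, high = io_target_mb_per_s(4)
print(f"4 cores: {low:.0f}-{high:.0f} MB/s")
```

For four cores the range is 140 to 280 MB/s, so a ~200 MB/s goal sits near the middle of it.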

Page 22

Base AWS Structure

• M3 extra large running RHEL x64 for the cluster
  o 1 TB EBS RAID 10 for primary data (4 x 500 GB drives)
  o 1 TB EBS RAID 0 for temporary work space (4 x 256 GB drives)
  o 1 TB EBS LUKS-encrypted RAID 0 for ETL (4 x 256 GB drives)
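The capacities follow from the RAID levels. RAID 10 mirrors pairs of drives, so four 500 GB volumes give 1 TB usable. RAID 0 stripes without redundancy, so four 256 GB volumes give 1,024 GB. The Python sketch below checks that arithmetic and prints the kind of `mdadm` command that could assemble such arrays with Linux software RAID. The device names (`/dev/xvdf` and so on), the array names, and the `etl_crypt` mapping name are hypothetical. The slides do not say which tooling the team actually used.

```python
from typing import List


def usable_gb(level: int, drives: int, drive_gb: int) -> int:
    """Usable capacity for the two RAID levels on the slide."""
    if level == 0:
        return drives * drive_gb           # striping: all capacity usable
    if level == 10:
        if drives < 2 or drives % 2:
            raise ValueError("this sketch assumes an even number of drives for RAID 10")
        return drives * drive_gb // 2      # striped mirrors: half holds copies
    raise ValueError(f"unsupported RAID level: {level}")


def mdadm_create(md_device: str, level: int, members: List[str]) -> str:
    """Build (but do not run) an mdadm command line that creates a new array."""
    return (f"mdadm --create {md_device} --level={level} "
            f"--raid-devices={len(members)} " + " ".join(members))


# Hypothetical device names; actual EBS attachment names vary by instance.
primary = ["/dev/xvdf", "/dev/xvdg", "/dev/xvdh", "/dev/xvdi"]
etl = ["/dev/xvdn", "/dev/xvdo", "/dev/xvdp", "/dev/xvdq"]

print(usable_gb(10, 4, 500))   # primary data array
print(usable_gb(0, 4, 256))    # temp work space and ETL arrays
print(mdadm_create("/dev/md0", 10, primary))
print(mdadm_create("/dev/md2", 0, etl))
# The ETL array would then be LUKS-encrypted before a filesystem goes on top:
#   cryptsetup luksFormat /dev/md2
#   cryptsetup luksOpen /dev/md2 etl_crypt
```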

Page 23

Can AWS yield the necessary performance?

Page 24

In the immortal words of Spinal Tap: “These go to eleven!”

Page 25

Turning up the AWS dial

Page 26

Volume @ 3

Specifications:
• M3 extra large
• 4 x 256 GB EBS disks
• RAID 0 stripe

Page 27

fio Sequential Read @ 3

[ec2-user]# fio sastest.fio

job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1

fio-2.1.2

Starting 1 process

Jobs: 1 (f=1)

job1: (groupid=0, jobs=1): err= 0: pid=31661: Sun Oct 27 23:07:18 2013

read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec

clat (usec): min=3, max=25911, avg=44.70, stdev=572.02

lat (usec): min=5, max=25913, avg=46.86, stdev=572.02

Run status group 0 (all jobs):

READ: io=102400KB, aggrb=77166KB/s, minb=77166KB/s, maxb=77166KB/s, mint=1327msec, maxt=1327msec

77,166 KB/s
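The job file `sastest.fio` itself is not shown, but its main parameters can be read back from the output header (`rw=read`, `bs=4K`, `ioengine=sync`, `iodepth=1`) and from `io=102400KB`. The following Python sketch renders a fio job file with those values. It is a reconstruction, not the authors' file, and the `directory` mount point is a hypothetical placeholder.

```python
from typing import Dict


def fio_job(name: str, options: Dict[str, str]) -> str:
    """Render one fio job section in the INI-style format fio reads."""
    lines = [f"[{name}]"] + [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Values read back from the fio output header; 'directory' is a hypothetical
# mount point for the RAID 0 array under test.
sastest = fio_job("job1", {
    "rw": "read",          # sequential read
    "bs": "4k",            # 4 KiB blocks
    "ioengine": "sync",    # plain read() calls
    "iodepth": "1",
    "size": "100m",        # 100 MiB, matching io=102400KB in the output
    "directory": "/saswork",
})
print(sastest, end="")
```

Each job reads only 100 MiB, so some of the data may be served from the Linux page cache rather than from EBS. Setting `direct=1`, or raising `size` above the instance's memory, are common ways to keep caching out of the measurement.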

Page 28

Volume @ 10

Specifications:
• M3 extra large
• 4 x 256 GB EBS disks
• 4,000 IOPS per drive
• RAID 0 stripe
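Provisioning 4,000 IOPS on each of the four striped volumes gives the array an aggregate ceiling of 16,000 IOPS. The throughput that ceiling implies depends on the I/O size. In the Python sketch below, the 16 KB I/O size is an assumption chosen for illustration and does not come from the slides. The instance's own EBS bandwidth can cap the result before the volumes do.

```python
def aggregate_iops(drives: int, iops_per_drive: int) -> int:
    """RAID 0 spreads I/O across members, so provisioned IOPS add up (ideal case)."""
    return drives * iops_per_drive


def throughput_kb_per_s(iops: int, io_size_kb: int) -> int:
    """Throughput implied by an I/O rate at a fixed I/O size."""
    return iops * io_size_kb


total = aggregate_iops(4, 4000)
print(total)
print(throughput_kb_per_s(total, 16))  # assuming 16 KB per I/O
```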

Page 29

fio Sequential Read @ 10

[ec2-user]$ fio sastest.fio

job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1

fio-2.1.2

Starting 1 process

job1: (groupid=0, jobs=1): err= 0: pid=2731: Tue Nov 5 22:55:33 2013

read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec

clat (usec): min=3, max=51820, avg=13.29, stdev=337.22

lat (usec): min=4, max=51821, avg=15.52, stdev=337.21

Run status group 0 (all jobs):

READ: io=102400KB, aggrb=191401KB/s, minb=191401KB/s, maxb=191401KB/s, mint=535msec, maxt=535msec

191,401 KB/s

Page 30

“If we need that extra push over the cliff, you know what we do?”

“11! Exactly.” — Nigel

Page 31

fio Sequential Read @ 11

[ec2-user]$ fio sastest.fio

job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1

fio-2.1.2

Starting 1 process

job1: (groupid=0, jobs=1): err= 0: pid=3133: Tue Nov 5 23:13:13 2013

read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec

clat (usec): min=0, max=1594, avg= 8.26, stdev=42.59

lat (usec): min=0, max=1594, avg= 8.38, stdev=42.59

Run status group 0 (all jobs):

READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec

432,067 KB/s
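Putting the three runs side by side makes the effect of each change clearer. The Python sketch below pulls `aggrb` out of the fio 2.x summary lines quoted above and converts it to MB/s, assuming fio's default base-2 units (1 KB = 1,024 bytes). The regular expression and the function name are illustrative.

```python
import re
from typing import Optional

AGGRB = re.compile(r"aggrb=(\d+)KB/s")


def aggregate_kb_per_s(summary_line: str) -> Optional[int]:
    """Extract aggrb (in KB/s) from a fio 2.x 'Run status' summary line."""
    match = AGGRB.search(summary_line)
    return int(match.group(1)) if match else None


runs = {
    "@ 3": "READ: io=102400KB, aggrb=77166KB/s, minb=77166KB/s, maxb=77166KB/s, mint=1327msec, maxt=1327msec",
    "@ 10": "READ: io=102400KB, aggrb=191401KB/s, minb=191401KB/s, maxb=191401KB/s, mint=535msec, maxt=535msec",
    "@ 11": "READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec",
}

baseline = aggregate_kb_per_s(runs["@ 3"])
for label, line in runs.items():
    kb = aggregate_kb_per_s(line)
    # Base-2 units: KB/s divided by 1024 gives MB/s.
    print(f"{label}: {kb / 1024:.1f} MB/s, {kb / baseline:.2f}x the @ 3 run")
```

This works out to roughly 75, 187, and 422 MB/s. Against the ~200 MB/s goal, the @ 10 configuration sits just under it and the @ 11 configuration roughly doubles it.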

Page 32

Volume @ 11

Specifications:
• 4 x 256 GB EBS disks
• 4,000 IOPS per drive
• RAID 0 stripe
• cg1.4xlarge instance (10 Gb I/O channel)


Page 34

I am pretty sure I can make the dial go higher

• RAM disks
• Block sizes
• Larger stripes
• Application tuning
• Etc.
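Of the tuning levers listed, block size is the easiest to explore with the same tool. The following hypothetical Python generator builds a single fio job file that sweeps several block sizes. The job names and chosen sizes are illustrative assumptions. The remaining parameters mirror the 4K sequential-read test shown earlier. fio's `stonewall` option makes each job wait for the previous one to finish, so the runs do not overlap.

```python
from typing import List


def block_size_sweep(block_sizes: List[str], size: str = "100m") -> str:
    """Render a fio job file with one sequential-read job per block size."""
    sections = ["[global]", "rw=read", "ioengine=sync", "iodepth=1", f"size={size}", ""]
    for bs in block_sizes:
        # stonewall: wait for the preceding jobs to finish before starting this one
        sections += [f"[read-{bs}]", f"bs={bs}", "stonewall", ""]
    return "\n".join(sections)


print(block_size_sweep(["4k", "64k", "256k", "1m"]))
```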

Page 35

WARNING!

• Be sure to touch all sectors of a new disk per AWS guidance prior to testing and production

$ dd if=/dev/md0 of=/dev/null

Command for Unix environments
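The same whole-device read can also be written in Python, which can be convenient when the step is part of a larger provisioning script. This sketch assumes the array appears at `/dev/md0`, as on the slide, and that the script runs as root. It reads in 1 MiB chunks, rather than dd's default 512-byte blocks, to cut down the number of system calls. The function name is hypothetical.

```python
def touch_all_blocks(path: str, chunk_bytes: int = 1024 * 1024) -> int:
    """Read a device or file end to end in fixed-size chunks, similar to
    `dd if=PATH of=/dev/null bs=1M`; returns the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_bytes)
            if not chunk:  # empty read means end of device/file
                break
            total += len(chunk)
    return total
```

For example, `touch_all_blocks("/dev/md0")` returns the size of the array in bytes once every block has been read.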

Page 36

You are not alone…

• Guidance from software vendors
• AWS Professional Services
• Use an iterative process (fail quickly)
• Third-party partners (8kMiles)

so get going!

Page 37

What did we learn?

• Make a decision
• Start at zero…
• Spend time really thinking about security
• And then crank it up where you need it

“Try again. Fail again. Fail better.” Samuel Beckett, Worstward Ho (1983)

Page 38

References

• Amazon EBS Volume Performance – http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
• AWS Microsoft Platform Security – http://media.amazonwebservices.com/AWS_Microsoft_Platform_Security.pdf
• Benchmarking SAS I/O: Verifying I/O Performance Using fio – http://support.sas.com/resources/papers/proceedings13/479-2013.pdf
• This is Spinal Tap (movie, 1984, directed by Rob Reiner)

Page 39

Special Thanks to: 8kMiles, AWS, and SAS

And thank you for your time today.

Page 40

Contact Information [email protected]

www.macpac.gov

Page 41

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

BDT304