MACPAC is a federal legislative branch agency tasked with reviewing state and federal Medicaid and Children's Health Insurance Program (CHIP) access and payment policies and making recommendations to Congress. By March 15 and again by June 15 each year, the agency produces a comprehensive report for Congress that compiles results from Medicaid and CHIP data sources for the 50 states and territories. The CIO of MACPAC wanted a secure, cost-effective, high performance platform that met their needs to crunch this large amount of health data. In this session, learn how MACPAC and 8KMiles helped set up the agency’s Big Data/HPC analytics platform on AWS using SAS analytics software.
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Empowering Congress with Data-Driven Analytics
Mathew Chase, November 13, 2013
Sri Vasireddy,
• A small federal legislative branch agency
• Newly established in late 2010
• Going beyond the “Cloud First” goal to “Cloud Only”
Hello
• Mathew Chase
• Federal CIO
• Over 20 years of experience in the public and private sectors leading technology operations
Who are you?
• Government
• Health care industry
• Cloud newbies
• AWS ninjas
• Whoops… wrong session
Question?
How many of you are using AWS as your primary computing datacenter?
MACPAC’s AWS Datacenter
• AWS to replace an onsite or hosted datacenter
• Single primary region with cold recovery on the other coast
• Multiple AZs for redundancy
• Separate VPCs for security “air gaps”
MACPAC: the “perfect” cloud customer
• Predictable work cycles
• Two intense work periods (annual)
• Growing with an undefined future
• Potential need for more computing resources
• Very cost conscious
• No legacy infrastructure
What we achieved in the cloud
• > 40% reduction in capital expenses
  – With additional savings in rent, utilities, and labor
• Cost spread over typical equipment lifespan
• On-demand storage and archiving
• Zero over-provisioning
• Ability to expand and contract resources at will
Core focus
Recommendations to Congress on Medicaid and the Children’s Health Insurance Program
Reports to the Congress
Reports due by: • March 15th & • June 15th
www.MACPAC.gov/reports
Research backed by analytics
• Analyze Medicaid program data
• Find intersections with Medicare
• Evaluate Medicaid survey information
Tools
• SAS Office Analytics enterprise platform
• Red Hat Enterprise Linux x64
• Amazon EC2
Concerns
1. Security
2. Performance
Security
Security Requirements
• Multi-user controlled environment
• Isolated environment with strong controls
• No sensitive or personal data sitting at the periphery
• Data encrypted at rest and in transit
Access Protection Challenge
• Twenty Instances
• Twenty Ports for AD
• 20 × 20 = 400 Rules
Access Control Using Security Groups
[Diagram: client instances in the Infra security group; AD-1 and AD-2 in the AD security group; DNS-1 and DNS-2 in the DNS security group]
• DNS security group: accept DNS queries from the ‘Infra’ group
• DNS security group: accept DNS queries from the AD group
• AD security group: accept AD-related requests from the ‘Infra’ group
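The rule-count problem and the security-group fix can be compared with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes one rule per port per source, which is a simplification of how rules are actually written.

```python
def per_instance_rule_count(instances: int, ports: int) -> int:
    """Rules needed if every client instance is listed individually on every port."""
    return instances * ports


def group_rule_count(source_groups: int, ports: int) -> int:
    """Rules needed if each rule names a source security group instead of an instance."""
    return source_groups * ports


if __name__ == "__main__":
    # Twenty instances, twenty AD ports, as on the slide.
    print(per_instance_rule_count(20, 20))  # 400
    # The same twenty ports opened once to a single 'Infra' group.
    print(group_rule_count(1, 20))          # 20
```

Because the rule refers to the group, adding a twenty-first client instance to the Infra group changes neither count.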
Encrypted Data flow
Cloud Security Design
Performance
SAS Requirements
• Very I/O intensive
• Sequential reads and writes
  o 35-70 MB/sec per core of I/O desired
  o GOAL: 4-core system = ~200 MB/sec I/O
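As a quick check on the goal above, the per-core range can be scaled by the core count. This is a minimal sketch; the function name is an assumption made for illustration, and the 35 and 70 defaults are simply the figures from the slide.

```python
from typing import Tuple


def io_target_mb_per_sec(cores: int, low: float = 35.0, high: float = 70.0) -> Tuple[float, float]:
    """Return the (low, high) sequential I/O target in MB/sec for a given core count."""
    return cores * low, cores * high


if __name__ == "__main__":
    low, high = io_target_mb_per_sec(4)
    # A 4-core system lands at 140 to 280 MB/sec, so ~200 MB/sec sits inside the range.
    print(low, high)
```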
Base AWS Structure
• M3 extra large running RHEL x64 for cluster
  o 1 TB EBS RAID 10 for primary data (4 × 500 GB drives)
  o 1 TB EBS RAID 0 for temp work space (4 × 256 GB drives)
  o 1 TB EBS LUKS-encrypted RAID 0 for ETL (4 × 256 GB drives)
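The 1 TB figures follow from how each RAID level uses its member drives: RAID 0 stripes across all of them, while RAID 10 mirrors pairs and so keeps half the raw capacity. The helper below is a hypothetical sketch of that arithmetic, not part of the original deck.

```python
def usable_gb(level: str, drives: int, drive_gb: int) -> int:
    """Usable capacity in GB for the two RAID levels used in this layout."""
    if level == "raid0":
        # Striping only: every drive contributes its full size.
        return drives * drive_gb
    if level == "raid10":
        if drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives")
        # Mirrored pairs: half of the raw capacity is usable.
        return drives * drive_gb // 2
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(usable_gb("raid10", 4, 500))  # primary data: 1000 GB
    print(usable_gb("raid0", 4, 256))   # temp work space and ETL: 1024 GB
```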
Can AWS yield the necessary performance?
In the immortal words of Spinal Tap: “These go to eleven!”
Turning up the AWS dial
Volume @ 3
Specifications
• M3 extra large
• 4 × 256 GB EBS disks
• RAID 0 stripe
fio Sequential Read @ 3
[ec2-user]# fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
Jobs: 1 (f=1)
job1: (groupid=0, jobs=1): err= 0: pid=31661: Sun Oct 27 23:07:18 2013
read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec
clat (usec): min=3, max=25911, avg=44.70, stdev=572.02
lat (usec): min=5, max=25913, avg=46.86, stdev=572.02
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=77166KB/s, minb=77166KB/s, maxb=77166KB/s, mint=1327msec, maxt=1327msec
77,166 KB/s
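The slides do not show the contents of sastest.fio. The job header in the output (rw=read, bs=4K, ioengine=sync, iodepth=1) and the 102400KB transferred suggest a job roughly like the one rendered below. Treat this as a guess: the size value is inferred from the output, and the filename path is entirely hypothetical.

```python
def render_fio_job(name: str, options: dict) -> str:
    """Render a single-section fio job file as text ([name] followed by key=value lines)."""
    lines = [f"[{name}]"]
    lines += [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Options inferred from the fio output header; not the presenters' actual file.
SASTEST_GUESS = {
    "rw": "read",          # sequential read
    "bs": "4k",            # 4 KB block size
    "ioengine": "sync",    # synchronous read() calls
    "iodepth": 1,
    "size": "100m",        # assumed from io=102400KB
    "filename": "/saswork/fio-testfile",  # hypothetical path on the RAID volume
}

if __name__ == "__main__":
    print(render_fio_job("job1", SASTEST_GUESS))
```

Writing the rendered text to a file and passing it to fio would mirror the `fio sastest.fio` invocation shown on the slide.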
Volume @ 10
Specifications
• M3 extra large
• 4 × 256 GB EBS disks
• 4000 IOPS per drive
• RAID 0 stripe
fio Sequential Read @ 10
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
job1: (groupid=0, jobs=1): err= 0: pid=2731: Tue Nov 5 22:55:33 2013
read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec
clat (usec): min=3, max=51820, avg=13.29, stdev=337.22
lat (usec): min=4, max=51821, avg=15.52, stdev=337.21
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=191401KB/s, minb=191401KB/s, maxb=191401KB/s, mint=535msec, maxt=535msec
191,401 KB/s
“If we need that extra push over the cliff. You know what we do?”
“11! Exactly.” — Nigel
fio Sequential Read @ 11
[ec2-user]$ fio sastest.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.1.2
Starting 1 process
job1: (groupid=0, jobs=1): err= 0: pid=3133: Tue Nov 5 23:13:13 2013
read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec
clat (usec): min=0, max=1594, avg= 8.26, stdev=42.59
lat (usec): min=0, max=1594, avg= 8.38, stdev=42.59
Run status group 0 (all jobs):
READ: io=102400KB, aggrb=432067KB/s, minb=432067KB/s, maxb=432067KB/s, mint=237msec, maxt=237msec
432,067 KB/s
Volume @ 11
Specifications
• 4 × 256 GB EBS disks
• 4000 IOPS per drive
• RAID 0 stripe
• cg1.4xlarge (10 Gb I/O channel)
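To line the three fio runs up against the ~200 MB/sec goal, the bandwidth field can be pulled out of each "read :" line. The sketch below copies those lines from the slides; the helper names are made up for this example, and dividing by 1024 assumes fio's KB here means 1024 bytes.

```python
import re

BW_PATTERN = re.compile(r"bw=(\d+)KB/s")
GOAL_MB_PER_SEC = 200  # the ~200 MB/sec target for a 4-core system

# "read :" lines copied from the three runs in this deck.
RESULTS = {
    "Volume @ 3": "read : io=102400KB, bw=77167KB/s, iops=19291, runt= 1327msec",
    "Volume @ 10": "read : io=102400KB, bw=191402KB/s, iops=47850, runt= 535msec",
    "Volume @ 11": "read : io=102400KB, bw=432068KB/s, iops=108016, runt= 237msec",
}


def bandwidth_kb_per_sec(fio_read_line: str) -> int:
    """Extract the bw= value (in KB/s) from a fio 'read :' summary line."""
    match = BW_PATTERN.search(fio_read_line)
    if match is None:
        raise ValueError("no bw=...KB/s field found")
    return int(match.group(1))


def meets_goal(kb_per_sec: int, goal_mb_per_sec: float = GOAL_MB_PER_SEC) -> bool:
    """True if the measured bandwidth reaches the goal (1 MB taken as 1024 KB)."""
    return kb_per_sec / 1024 >= goal_mb_per_sec


if __name__ == "__main__":
    for label, line in RESULTS.items():
        kb = bandwidth_kb_per_sec(line)
        verdict = "meets goal" if meets_goal(kb) else "below goal"
        print(f"{label}: {kb / 1024:.1f} MB/sec ({verdict})")
```

On these numbers the run at 10 falls just short of the target and only the run at 11 clears it.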
I am pretty sure I can make the dial go higher
• RAM disks
• Block sizes
• Larger stripes
• Application tuning
• Etc…
WARNING!
• Be sure to touch all sectors of a new disk per AWS guidance prior to testing and production
$ dd if=/dev/md0 of=/dev/null
Command for Unix environments
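The dd command above simply reads the whole device and throws the data away. For environments where scripting is preferred, the same idea can be sketched in Python. The function name and the 1 MB chunk size are assumptions for illustration, and reading a block device such as /dev/md0 normally requires root.

```python
import sys


def touch_all_blocks(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read a file or block device from start to end, discarding the data.

    Returns the number of bytes read, like a dd to /dev/null would report.
    """
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:  # empty read means end of device
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Example: python prewarm.py /dev/md0  (script name is hypothetical)
    if len(sys.argv) > 1:
        print(touch_all_blocks(sys.argv[1]), "bytes read")
```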
You are not alone…
• Guidance from software vendors
• AWS professional services
• Use an iterative process (Fail quickly)
• Third party partners (8kMiles)
so get going!
What did we learn?
• Make a decision
• Start at zero…
• Spend time really thinking about security
• And then crank it up where you need it
“Try again. Fail again. Fail better.” Samuel Beckett, Worstward Ho (1983)
References
• Amazon EBS Volume Performance
  – http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
• AWS Microsoft Platform Security
  – http://media.amazonwebservices.com/AWS_Microsoft_Platform_Security.pdf
• Benchmarking SAS I/O: Verifying I/O Performance Using fio
  – http://support.sas.com/resources/papers/proceedings13/479-2013.pdf
• This is Spinal Tap (Movie, 1984, Rob Reiner - Director)
Special Thanks to: 8kMiles, AWS, and SAS
And thank you for your time today.
Contact Information [email protected]
www.macpac.gov
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
BDT304