On-Demand HDP Clusters using Cloudbreak and Ambari
Provisioning Big Data Platform using Cloudbreak & Ambari
Karthik Karuppaiya, Sr. Engineering Manager, CPE
Vivek Madani, Sr. Principal Software Engineer, CPE
San Jose Hadoop Summit 2016 Karthik Karuppaiya & Vivek Madani
Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting
Introduction
Symantec
- Symantec is the world leader in providing security software for both enterprises and end users
- Thousands of enterprises and more than 400 million devices (PCs, tablets and phones) rely on Symantec to help them secure their assets from attacks, including their data centers, emails and other sensitive data
Cloud Platform Engineering (CPE)
- Build consolidated cloud infrastructure and platform services for next generation data powered Symantec applications
- A big data platform for batch and stream analytics integrated with both private and public clouds
- Open source components as building blocks
- Bridge feature gaps and contribute back
Big Data Platform Challenge
- Hundreds of millions of users generating billions of events every day from across the globe
- Hundreds of big data application developers developing thousands of applications
- At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec
- Elasticity is built into the platform to optimize costs in the cloud
Big Data Platform Challenge
- Great! Now developers can start building applications on our big data lake
- Hundreds of developers start building applications using different big data tools
Big Data Platform Challenge
- Product team developers want quick changes and the latest versions
- Platform team wants stability!
- Soon, frustration prevails
What is the Solution?
- Build and use your own little cluster for development
- Copy a subset of data for development purposes
- Build elasticity into the platform for cost optimization
- Tear down the cluster after development is complete
- Rinse and repeat
What is the Solution?
- But building clusters is hard and time-consuming
- Too many services to install and configure
- Developers are not interested in building and managing clusters
What is the Solution? Self Service
- What if we make it really easy to build clusters?
- Abstract all the deployment complexities and enable developers to get their own cluster in one click of a button
- Use the same blueprint for both dev and prod clusters
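To illustrate the "same blueprint for dev and prod" idea, here is a minimal Python sketch. It assumes the standard Ambari blueprint layout (a `Blueprints` header plus `host_groups`) and the Ambari cluster creation template that maps hosts onto those groups. The blueprint name `ssa-small`, the host names, and the component selection are hypothetical, and the blueprint is deliberately too small to be deployable as-is.

```python
import json

# Hypothetical, deliberately incomplete blueprint. The component names are
# standard Ambari/HDP identifiers; the layout is only an illustration.
BLUEPRINT = {
    "Blueprints": {
        "blueprint_name": "ssa-small",
        "stack_name": "HDP",
        "stack_version": "2.4",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "1+",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
}


def cluster_template(blueprint, hosts_by_group):
    """Map concrete hosts onto every host group of a blueprint."""
    known = {group["name"] for group in blueprint["host_groups"]}
    if set(hosts_by_group) != known:
        raise ValueError("expected hosts for exactly: %s" % sorted(known))
    return {
        "blueprint": blueprint["Blueprints"]["blueprint_name"],
        "host_groups": [
            {"name": name, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for name, fqdns in sorted(hosts_by_group.items())
        ],
    }


# Same blueprint, two differently sized clusters (host names are made up).
dev = cluster_template(
    BLUEPRINT,
    {"master": ["dev-m1.example.com"], "worker": ["dev-w1.example.com"]},
)
prod = cluster_template(
    BLUEPRINT,
    {
        "master": ["prod-m1.example.com"],
        "worker": ["prod-w%d.example.com" % i for i in range(1, 21)],
    },
)
print(json.dumps(dev, indent=2))
```

Only the host mapping differs between `dev` and `prod`; the service layout comes from the one shared blueprint.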
Self Service Analytics (SSA) Clusters
- RESTful web services to allow creation and management of custom clusters
- Select from pre-defined Ambari Blueprints
- Can provision infrastructure on OpenStack as well as AWS
- Installs the HDP stack specified as part of the Ambari blueprint
- Dashing dashboard to monitor and manage (start/stop/kill) clusters
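The slides do not show the SSA REST API itself, so the following Python sketch only illustrates the Ambari side of such a service: registering a blueprint and then creating a cluster from it through the Ambari REST API (`POST /api/v1/blueprints/{name}` followed by `POST /api/v1/clusters/{name}`). The server URL, credentials, cluster name, and payloads are placeholders, and the requests are built but not sent.

```python
import base64
import json
import urllib.request


def ambari_request(base_url, path, payload, user, password):
    """Build (but do not send) an authenticated POST for the Ambari REST API."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
    )


def provisioning_requests(base_url, cluster_name, blueprint, template, user, password):
    """Return the two calls, in order: register blueprint, then create cluster."""
    name = blueprint["Blueprints"]["blueprint_name"]
    return [
        ambari_request(base_url, "/api/v1/blueprints/" + name, blueprint, user, password),
        ambari_request(base_url, "/api/v1/clusters/" + cluster_name, template, user, password),
    ]


# Placeholder payloads; a real blueprint lists host groups and components.
example_blueprint = {
    "Blueprints": {"blueprint_name": "ssa-small", "stack_name": "HDP", "stack_version": "2.4"},
    "host_groups": [],
}
example_template = {"blueprint": "ssa-small", "host_groups": []}

for request in provisioning_requests(
    "http://ambari.example.com:8080", "dev-cluster",
    example_blueprint, example_template, "admin", "secret",
):
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it to a real Ambari server.
```

Building the requests separately from sending them keeps the provisioning logic easy to test without a running Ambari server.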
Environment
- Private cloud on OpenStack (Kilo, no Heat)
- Public cloud on AWS
- HDP 2.3.2 & 2.4.2
- Ambari 2.1.2 & 2.2
SSA Architecture
SSA Services
SSA Demo
Ambari Custom Services
- What about the services that are not supported by Ambari out of the box?
- We write our own Ambari custom stack
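The slides do not name the custom services, so the sketch below uses a hypothetical service called `SAMPLESRV`. It generates the `metainfo.xml` descriptor that an Ambari service definition needs, declaring one master component whose lifecycle commands are handled by a Python command script. The script path and timeout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def service_metainfo(name, display_name, version, script="scripts/master.py"):
    """Return a minimal metainfo.xml for a single-component custom service."""
    metainfo = ET.Element("metainfo")
    ET.SubElement(metainfo, "schemaVersion").text = "2.0"
    services = ET.SubElement(metainfo, "services")
    service = ET.SubElement(services, "service")
    ET.SubElement(service, "name").text = name
    ET.SubElement(service, "displayName").text = display_name
    ET.SubElement(service, "comment").text = "Custom service managed by Ambari"
    ET.SubElement(service, "version").text = version

    components = ET.SubElement(service, "components")
    component = ET.SubElement(components, "component")
    ET.SubElement(component, "name").text = name + "_MASTER"
    ET.SubElement(component, "displayName").text = display_name + " Master"
    ET.SubElement(component, "category").text = "MASTER"
    ET.SubElement(component, "cardinality").text = "1"

    # The command script is where install/start/stop/status are implemented.
    command = ET.SubElement(component, "commandScript")
    ET.SubElement(command, "script").text = script
    ET.SubElement(command, "scriptType").text = "PYTHON"
    ET.SubElement(command, "timeout").text = "600"
    return ET.tostring(metainfo, encoding="unicode")


# SAMPLESRV is a made-up service name used only for this example.
xml_text = service_metainfo("SAMPLESRV", "Sample Service", "1.0.0")
print(xml_text)
```

In an Ambari stack the generated file sits in the service's directory next to a `package/scripts` folder that holds the referenced command script.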
Next Gen SSA
- This is all great! But it is a lot of work to add more cloud providers
- It takes a lot of effort to understand each cloud provider's APIs
Next Gen SSA: Cloudbreak
- Cloudbreak helps to simplify the provisioning of HDP clusters in cloud environments
- Supports multiple clouds including AWS, Google, Azure and OpenStack
- Uses Apache Ambari for HDP installation and management
- Has a nice UI to build and manage clusters
- Supports automated cluster scaling
AWS Cluster Architecture
[Architecture diagram. Recoverable labels:]
- Symantec Datacenter: hosts HDP over bare-metal and OpenStack
- Direct Connect, 10 Gbps: data ingestion pipes, telemetry ingestion pipes
- Private subnet (AWS): uses d3.* and r3.* flavors, LUKS-encrypted volumes, non-EBS root volume, non-Dockerized HDP, custom AMI, enhanced networking
Cloudbreak Demo
Hybrid Cloud Using Cloudbreak: Customization & Contribution
- Non-Dockerized HDP installation
- Support for Keystone v3 for OpenStack
- Cloudbreak 1.2 released 03/2016
- Support for custom AMIs
  - We have our own hardened images with enhanced networking, volume encryption, etc.
- Support for non-EBS backed root volumes
- Deploy in existing private VPC/subnet
- Additional AWS instance flavors supported
  - We use r3.* and d3.*, which are not supported by Cloudbreak
- We build our own Cloudbreak package from the trunk
Cloudbreak Keystone V3 Screenshot
Cloudbreak Keystone V3 Project Scope Screenshot
Custom AMI Support
- Org security mandates using specific hardened AMIs only
- Created our own hardened image with the software and configurations required by Cloudbreak
- Allows us to use features like:
  - Volume encryption, enhanced networking enabled
  - Non-EBS volumes
  - Symantec specific configurations like LDAP, repos, DNS, etc.
  - Symantec standard for hostnames
  - Use JDK 1.8 instead of the Java 7 which comes with the Cloudbreak AMI
/cloud-aws/src/main/resources/aws-images.yml
Non-Dockerized HDP Support
Why?
- No experience running production clusters under Docker
- Unknowns with the upgrade path for HDP components
- Encrypted disk volumes had issues working with Docker
What?
- Worked with the Cloudbreak team to test out the non-Dockerized version of Cloudbreak
- Provided feedback from our test deployment of the non-Dockerized version
- Feature now available in the master branch
Non-EBS backed root volume
- Changes to the AWS CloudFormation template used by Cloudbreak
- We use ephemeral storage for root volumes for availability reasons
- Will contribute this back as an option to Cloudbreak
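The slides do not show the actual template change, so the Python sketch below is only a guess at its shape. It assumes the template declares an `AWS::AutoScaling::LaunchConfiguration` with `BlockDeviceMappings`, and it drops the EBS mapping for the root device so that an instance-store backed AMI can supply the root volume. The resource name, device names, and sample template are hypothetical.

```python
import copy


def drop_root_ebs_mapping(template, root_device="/dev/sda1"):
    """Return a copy of the template without an EBS mapping for the root device."""
    result = copy.deepcopy(template)
    for resource in result.get("Resources", {}).values():
        if resource.get("Type") != "AWS::AutoScaling::LaunchConfiguration":
            continue
        props = resource.get("Properties", {})
        if "BlockDeviceMappings" not in props:
            continue
        # Keep ephemeral and non-root mappings; remove only the root EBS entry.
        props["BlockDeviceMappings"] = [
            mapping
            for mapping in props["BlockDeviceMappings"]
            if not (mapping.get("DeviceName") == root_device and "Ebs" in mapping)
        ]
    return result


# Made-up fragment of a template, reduced to the part being changed.
sample_template = {
    "Resources": {
        "ClusterNodeConfig": {
            "Type": "AWS::AutoScaling::LaunchConfiguration",
            "Properties": {
                "BlockDeviceMappings": [
                    {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 50}},
                    {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
                ]
            },
        }
    }
}

patched = drop_root_ebs_mapping(sample_template)
print(patched["Resources"]["ClusterNodeConfig"]["Properties"]["BlockDeviceMappings"])
```

Working on a deep copy leaves the original template untouched, which makes it possible to offer the non-EBS root volume as an option rather than a replacement.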
Cloudbreak Contribution In Progress
- Placement groups
- Multiple security groups attached to one cluster
- Multiple subnet deployment inside VPC
- Support for non-EBS root volumes
Monitoring & Alerting
Now that we have delivered an elephant, the next question from users is: "How is his health?"
Monitoring and Alerting
- Comprehensive dashboards for all environments managed by the platform team
- Extensively use Ambari Alerts
- QueryX: custom framework to fill the gaps in Ambari Alerts
- All alerts are sent to the OpenTSDB + Grafana stack
- Critical alerts go to PagerDuty
Monitoring and Alerting
[Diagram labels: Ambari Metrics Collector + QueryX; Cluster 1, Cluster 2, Cluster 3, ...; Call Ambari Metrics API; OpenTSDB; Grafana]
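QueryX internals are not shown in the slides, so the following Python sketch only illustrates the data flow in the diagram: it takes a response in the shape returned by the Ambari Metrics Collector timeline API (`/ws/v1/timeline/metrics`) and converts it into data points in the JSON format accepted by the OpenTSDB `/api/put` endpoint. The `cluster` tag, the sample values, and the function name are assumptions.

```python
import json


def to_opentsdb(ams_response, cluster):
    """Convert an Ambari Metrics timeline response into OpenTSDB data points."""
    points = []
    for series in ams_response.get("metrics", []):
        tags = {"cluster": cluster, "appid": series.get("appid", "unknown")}
        if series.get("hostname"):
            tags["host"] = series["hostname"]
        # Ambari Metrics keys each value by an epoch timestamp in milliseconds.
        ordered = sorted(series.get("metrics", {}).items(), key=lambda kv: int(kv[0]))
        for ts_ms, value in ordered:
            points.append(
                {
                    "metric": series["metricname"],
                    "timestamp": int(ts_ms) // 1000,  # seconds for OpenTSDB
                    "value": value,
                    "tags": dict(tags),
                }
            )
    return points


# Hand-written sample in the shape of a timeline API response.
sample_response = {
    "metrics": [
        {
            "metricname": "cpu_user",
            "appid": "HOST",
            "hostname": "node1.example.com",
            "starttime": 1466000000000,
            "metrics": {"1466000060000": 14.0, "1466000000000": 12.5},
        }
    ]
}

datapoints = to_opentsdb(sample_response, cluster="cluster1")
print(json.dumps(datapoints, indent=2))
```

Tagging every point with the cluster name is what would let a single Grafana dashboard compare the same metric across several clusters.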
Grafana Dashboards (screenshots)
Ambari Alerts (screenshots)
Summary and Future Work
- A journey towards one click cluster deployment
- Cloudbreak: one tool for all clouds
- Contribute back the features developed in-house
- Enable Cloudbreak to support bare-metal cluster provisioning
- Auto-scaling using Cloudbreak and Periscope
- Single large YARN cluster for a variety of compute and storage loads
- Open source: use and contribute
- Work with the community to address gaps
- SSA code already open-sourced: https://github.com/symantec/
Thank You!
Q & A
Karthik [email protected]
Vivek [email protected]