Bridging the Big Data Gap in the Software-Driven World


Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories. To learn more about Mainframe solutions from CA Technologies, visit:


1. Bridging the Big Data Gap in the Software-Driven World
Mainframe
Michael Harer, CA Technologies, Product Management
Scott Andress, Hortonworks, Sr. Director, Business Development
MFT09S #CAWorld

2. Abstract
Michael Harer, CA Technologies, Sr. Principal Product Mgr., Database and Analytics
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
© 2014 CA. ALL RIGHTS RESERVED.

3. Agenda
  1 QUICK REFRESHER ON BIG DATA
  2 BIG DATA INFRASTRUCTURE MANAGEMENT CHALLENGES
  3 360 DEGREE BIG DATA INFRASTRUCTURE MANAGEMENT APPROACH
  4 HORTONWORKS BIG DATA PLATFORM
  5 SUMMARY
  6 RECOMMENDED SESSIONS / RELATED ACTIVITIES

4. Big Data Means Different Things To Different People
Customers define Big Data in a broad sense:
  • Any analytical processing that is different from the traditional data warehouse applications in place today
  • Defined by the types and speed of data being analyzed
Or explained via the 4 Vs:
  • High-Velocity capture, discovery and/or analysis
  • Large Volumes of a Variety of data from various sources across the enterprise
  • Veracity: keeping the right, trusted data

5. Big Data Growing Fast
Working with many new types of data
  • 80 percent of data is unstructured (images, audio, tweets, etc.)
  • New analytic applications based on a next-generation big data platform are reaching the market
Commoditized Hardware and Software
  • Low-cost hardware and software environments
  • Less costly capture and exploitation of big data
Capturing and Managing lots of information
  • Data volumes are doubling every year
  • Organizations are storing three or more years of data
New Personas
  • Hadoop Administrator
  • Hadoop Developer/Architect
  • Data Scientist, etc.

6. Going From The Science Project To Production
  • The organization realizes that the analytics and insights coming out of a Big Data project are essential
  • To keep costs down, you start with the basic Hadoop distribution from Apache
  • Maybe a free tool or two and off you go
  • Gain traction: tremendous pressure to deliver or the business gets farther behind
  • More tools, software and data sources are added
You now have a huge number of moving parts, tools from many vendors and a ton of complexity

7. The Big Big Data Management Pains
The Need to Overcome Many Challenges
  • Managing complex multi-vendor big data environments
  • Finding Hadoop/Big Data experts
  • Understanding capacity requirements for rapidly changing business needs
  • As complexity increases, manual processes are often required
  • System problems are hard to isolate, downtime increases
  • Unique tools and shortcomings
  • Driving forces (acquisitions, department consolidations) demand greater operational efficiency
[Diagram labels: Mainframe, AMZ EMR Console]

8. Gaps/Complexities in Managing These Environments
  1 How many people do you have to manage your Big Data infrastructure?
  2 Do your Big Data administrators always know the health of the systems?
  3 Can you detect most problems before significant system outages occur?
  4 How many different monitoring tools do you have in place now?
  5 How do you know if your capacity is optimized for cost and performance?
  6 What was the financial impact of downtime over the past year?

9. A New Role in the Organization is Born
Big Data / Hadoop Administrator
Role / Responsibilities:
  • Perform day-to-day operations and support of Hadoop infrastructure
  • Monitor/maintain existing clusters and provision new ones
  • Integrate enterprise monitoring tools
  • Analyze current workloads and perform capacity planning
Key Management Capabilities:
  • Hadoop Multi-Vendor Management
  • Hadoop Resource Management / Reporting
  • Hadoop Process Management / Automation
  • Hadoop Job Management & Monitoring
  • Hadoop System Health Monitoring & Alerts

10. 360 Degree Big Data Infrastructure Management Approach
Big Data Infrastructure Management Use Cases
  • Multi-Vendor Management
  • System Management
  • Job Mgmt / Monitoring
  • Alert Management
  • Process Mgmt / Automation
  • Resource Mgmt / Reporting
[Diagram labels: CA Big Data Infrastructure Management; Configuration; Mobility; Security; Data Management; System Health Monitoring; Data Movement (ETL); Storage: Hadoop Distributed File System (Unstructured/Structured), NAS; Big Data Platform Vendors (Hadoop & Hybrid): Hadoop Big Data Platform Vendors A-D, Hybrid Big Data Platform Vendors A-C]

11. 360 Degree Big Data Infrastructure Management Approach
CA Big Data Infrastructure Management (In Development)
  • SINGLE, CONSISTENT MANAGEMENT UI EXPERIENCE
  • SINGLE ACCESS POINT INTO HETEROGENEOUS ENVIRONMENT
  • OPERATIONALIZE, MANAGE MULTI-VENDOR HADOOP MANAGEMENT DOMAINS
[Diagram labels: Big Data Infrastructure Management Server (In Development); Big Data (Hadoop) Infrastructure; Linux / x86]

12. CA Big Data Infrastructure Management Demonstration (Under development)
Demo Scenario:
  • A global financial institution has been using Big Data technologies to bring new investment products to the market.
  • They are now expanding their Big Data environment to support 6 other business units and an ever-growing number of business initiatives.
  • They also discovered that some of the business units had already started their own Big Data projects using different big data platforms.
Challenges:
  • Revised budget remains flat and requires a 30% to 50% increase in Big Data environment utilization.
  • Significant complexity associated with hosting multiple Hadoop distributions and an increasing number of business-critical Hadoop clusters to support their business apps.

13-20. [No text on these slides]

21. Hortonworks
We Do Hadoop
Scott Andress, Senior Director, Business Development
CA World 2014
© Hortonworks Inc. 2011-2014. All Rights Reserved

22. Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform)
  • Founded in 2011
  • Original 24 architects, developers, operators of Hadoop from Yahoo!
  • We are leaders in Hadoop community
  • 500+ employees
Customer Momentum
  • 300+ customers in seven quarters, growing at 75+/quarter
  • Two thirds of customers come from F1000
Hortonworks and Hadoop at Scale
  • HDP in production on largest clusters on planet
  • Multiple +1000 node clusters, including 35,000 nodes at Yahoo!, 800 nodes at Spotify

23. Key Drivers of Hadoop
A. Unlock New Approach to Analytics
  • Agile analytics via "Schema on Read" with ability to store all data in native format
  • Create new apps from new types of data
B. Optimize Investments, Cut Costs
  • Focus EDW on high value workloads
  • Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
C. Enable a Modern Data Architecture
  • Integrate new & existing data sets
  • Make all data available for shared access and processing in multitenant infrastructure
  • Batch, interactive & real-time use cases
  • Integrated with existing tools & skills
[Diagram labels: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DEV & DATA TOOLS (Build & Test); OPERATIONS TOOLS (Provision, Manage & Monitor); DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP), Interactive / Batch / Real-Time, YARN: Data Operating System, HDFS: Hadoop Distributed File System; SOURCES: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured]

24. Hortonworks Approach
1 Innovate the Core
Architect and build innovation at the core of Hadoop
  • YARN: Data Operating System
  • HDFS as the storage layer
  • Key processing engines
2 Extend Hadoop as an Enterprise Data Platform
  • Extend Hadoop with enterprise capabilities for governance, security & operations
  • Apply enterprise software rigor to the open source development process
3 Enable the Ecosystem
  • Enable the leaders in the data center to easily adopt & extend their platforms
  • Establish Hadoop as standard component of a modern data architecture
  • Joint engineering
[Diagram labels: HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management; Data Access engines: Script (Pig), Search (Solr), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Batch (MapReduce); YARN: Data Operating System; HDFS (Hadoop Distributed File System)]

25. ...all done completely in Open Source
Hadoop is a platform decision
  • Open Source: fastest path to innovation for a platform technology
  • Eliminate vendor lock-in, no proprietary software
  • Data center leaders have committed to the open source approach
Contributes more to the Apache Hadoop ecosystem in the ASF than any other vendor

  Apache Project   Committers   PMC Members
  Hadoop               27           20
  Tez                  15           15
  Hive                 16            4
  HBase                 6            4
  Pig                   5            5
  Accumulo              2            2
  Flume                 1            0
  Storm                 3            2
  Sqoop                 1            1
  Ambari               32           27
  Oozie                 3            2
  Zookeeper             2            1
  Knox                 11            5
  Falcon                5            3
  TOTAL               129           91

26. Our Vision - Big Data Infrastructure Management
Extending our IT Management Leadership
LAS VEGAS, November 10, 2014, CA WORLD '14: CA Technologies (NASDAQ:CA) today announced a new global distribution agreement with Veristorm, a software company focused on Big Data management. The agreement strengthens CA's ability to help customers leverage key business data on the mainframe for Big Data and analytics projects.
"It's extremely difficult for data scientists, Chief Marketing Officers (CMOs) and other stakeholders to get access to their raw System z data in tandem with machine logs and other types of transactional information," said Mike Madden, general manager, Mainframe, CA Technologies. "Customers around the world are looking for greater insight to gain competitive advantage and much of the world's most important transactional data resides on System z. Veristorm provides next-generation data movement technology that makes it easier to move System z data into Hadoop, lowering overall total cost of ownership."

27. Wrap Up
Key Thoughts
  • The Big Data market is forcing significant changes to IT.
  • Most Big Data infrastructures will grow in complexity as business needs evolve.
  • Think ahead: you will need to effectively manage mixed, heterogeneous Big Data environments.
Next Steps
  • Understand the changes (e.g. Hadoop) and align a Big Data roadmap to meet your changing business needs.
  • Retain flexibility & adopt the Big Data technologies that are right for your business needs.
  • Choose a management solution that can support the range of Big Data technologies your business requires now and in the future.
  • Consider CA Big Data Infrastructure Management

28. Polling Question
When it comes to a Big Data project, what best describes your organization:
  1 HAVE PROJECT IN PRODUCTION
  2 CONDUCTING A PILOT PROJECT
  3 PROJECT BEING PLANNED
  4 INVESTIGATING A PROJECT
  5 NONE OF THE ABOVE

29. For More Information
Mainframe
To learn more about Mainframe solutions from CA Technologies, please visit:

30. For Informational Purposes Only
Terms of this Presentation
© 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.
This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.