Using Data Virtualization to Integrate With Big Data

  • Published on
    27-Jan-2015

DESCRIPTION

Hadoop and big data don't sit on an island in organizations. Analyzing event streams and similar data requires integrating them with data from other systems in the organization. This isn't easy with big data systems today because their technologies and environments differ from those of traditional IT. Data virtualization is one way to smooth over the integration, allowing Hadoop to access other data or allowing SQL-oriented tools to access Hadoop.

Transcript

<ul><li> 1. The Role of Data Virtualization in a World of Big Data. June 6, 2012. Mark Madsen, @markmadsen, www.ThirdNature.net. Information Management Through Human History: new technology development (innovation) creates new methods to cope (maturation) creates new information scale and availability (saturation) creates... Copyright Third Nature, Inc.</li></ul> <p> 2. Big Data. You keep using that word. I do not think it means what you think it means.
3. What makes data big? Hierarchical structures; nested structures; encoded values; nonstandard (for a database) types; deep structure; very large amounts; human-authored text. "Big" is better off being defined as "complex" or "hard to manage".
4. You could store this data in the data warehouse, but old database technology has so many problems.
5. Big Data: new technology has so many problems.
6. Reality is multiple data stores and platforms. Separate, purpose-built databases and processing systems for different types of data and query / computing workloads are the norm for information delivery. Data flows between most of these environments. [Diagram labels: BI, Reporting, Dashboards; Data Warehouse; Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications. Sample rows repeated in each store: 1, Marge Inovera, $150,000, Statistician; 2, Anita Bath, $120,000, Sewer inspector; 3, Ivan Awfulitch, $160,000, Dermatologist; 4, Nadia Geddit, $36,000, DBA.]
7. Example big data: Web tracking data. USER_ID 301212631165031; SESSION_ID 590387153892659; VISIT_DATE 1/10/2010 0:00; SESSION_START_DATE 1:41:44 AM; PAGE_VIEW_DATE 1/10/2010 9:59; DESTINATION_URL https://www.phisherking.com/gifts/store/LogonForm?mmc=linksrcemail_m100109_44IOJ1_shop&amp;langId=1&amp;storeId=1055&amp;URL=BECGiftListItemDisplay; REFERRAL_NAME Direct; REFERRAL_URL (empty); PAGE_ID PROD_24259_CARD; REL_PRODUCTS PROD_24654_CARD, PROD_3648_FLOWERS; SITE_LOCATION_NAME VALENTINES DAY MICROSITE; SITE_LOCATION_ID SHOPBYHOLIDAYVALENTINESDAY; IP_ADDRESS 67.189.110.179; BROWSER_OS_NAME MOZILLA/4.0 (COMPATIBLE; MSIE 7.0; AOL 9.0; WINDOWS NT 5.1; TRIDENT/4.0; GTB6; .NET CLR 1.1.4322). The event stream contains IDs, but no reference data. Reference data, aka dimensions, master data. This isn't an OLTP DB; there is no reference data available from the source. The typical situation for analysts: "I need that data now." "It would be logical to keep all the data in one place." "It will take 6 months."
8. There are two architectural approaches to facilitating analysis, depending on where the analyst works in the environment: 1. Back-end integration: for analysts working within the big data (BD) environment, reaching out from the environment to get other data that's needed to make sense of information. 2. Front-end integration: for analysts working in a more conventional BI/analysis environment, reaching into the BD environment from other tools. Solution: copy the data into Hadoop? Just load it from the DW. If it's there. Otherwise, dump and load the data from the sources. Great for one-time analysis, but what if you need to do it again next week, or need current values on a regular basis? You can build custom extracts from each source. But: poor tool support; problem of on-demand / current values; minimal data management possible in the Hadoop environment; the analyst waits. [Diagram labels: Data warehouse, OLTP sources.]
9. Alternative: data virtualization to enable access. A data virtualization layer can be used to make other sources (OLTP, the data warehouse) appear locally accessible to the analyst or Hadoop programmer. Then, two choices are possible: extract the data and load it into the local environment, or access it dynamically from within the environment. [Diagram labels: Data warehouse, OLTP sources.] Alternative: data virtualization to bridge stores. A data virtualization layer can be used to bridge the database and big data environments, hiding the back-end complexities. It allows one to access raw or processed data from Hadoop alongside data from other environments, with some benefits: no limited Hive connectors, no client-side data merging, no difficult metadata layer integrations. [Diagram labels: Data warehouse, OLTP sources.]
10. Data virtualization can simplify access across the entire data environment, big or not. DV also enables shared metadata across environments, avoiding the cost of integrating models by hand and burying that integration in source code. [Diagram labels: BI, Reporting, Dashboards; Data virtualization layer (front end); Data Warehouse; DV layer (back end); Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications.] Bridge the data environment to uses beyond BI. The use cases are now interactive applications, lower-latency data, and complex analytics, and they extend beyond read-only queries.
11. About the Presenter. Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity &amp; Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data then you're at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions. </p>
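The two choices named on slide 9 (extract the reference data and load it locally, or access it dynamically) can be sketched in a few lines. This is only an illustration under stated assumptions, not the presenter's implementation or any vendor's API: an in-memory SQLite database stands in for the data warehouse, the class VirtualReferenceData and the function enrich are hypothetical names, and the product names are invented. Only the PAGE_ID and USER_ID values come from the web tracking example in the slides.

```python
import sqlite3


def make_warehouse():
    """Build an in-memory SQLite database standing in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (page_id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        # Product names are invented for illustration.
        [("PROD_24259_CARD", "card A"), ("PROD_3648_FLOWERS", "flowers A")],
    )
    return conn


class VirtualReferenceData:
    """Hypothetical, toy virtualization layer: callers never see the source."""

    def __init__(self, conn):
        self._conn = conn

    def lookup(self, page_id):
        """Dynamic access: ask the source at read time, so values are current."""
        row = self._conn.execute(
            "SELECT name FROM product WHERE page_id = ?", (page_id,)
        ).fetchone()
        return row[0] if row else None

    def extract(self):
        """Extract and load: copy the reference data into a local dict."""
        return dict(self._conn.execute("SELECT page_id, name FROM product"))


def enrich(events, resolve):
    """Attach a product name to each event using any resolver function."""
    return [{**event, "PRODUCT_NAME": resolve(event["PAGE_ID"])} for event in events]


# Event records carry IDs but no reference data, as on slide 7.
events = [
    {"USER_ID": "301212631165031", "PAGE_ID": "PROD_24259_CARD"},
    {"USER_ID": "301212631165031", "PAGE_ID": "PROD_24654_CARD"},
]

warehouse = make_warehouse()
reference = VirtualReferenceData(warehouse)

snapshot = reference.extract()  # choice 1: extract and load

# The warehouse gains a product after the extract was taken.
warehouse.execute("INSERT INTO product VALUES (?, ?)", ("PROD_24654_CARD", "card B"))

from_snapshot = enrich(events, snapshot.get)    # stale local copy
from_lookup = enrich(events, reference.lookup)  # choice 2: dynamic access

print([e["PRODUCT_NAME"] for e in from_snapshot])  # ['card A', None]
print([e["PRODUCT_NAME"] for e in from_lookup])    # ['card A', 'card B']
```

The local copy misses the product added after the extract, which is the "current values" problem raised on slide 8, while the dynamic lookup sees it at the price of a query per event. A real layer would also have to deal with query pushdown, caching and security, none of which this sketch attempts.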
