Red Hat JBoss Middleware: Discover Red Hat and Hortonworks for the Modern Data Architecture. Kimberly Palko, Product Manager, Red Hat

Red Hat - Presentation at Hortonworks Booth - Strata 2014

As an enterprise’s big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To help enterprises operate a modern data architecture at scale, Red Hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat’s proven platform technologies. Join us in this interactive series, where we demonstrate how Red Hat JBoss Data Virtualization integrates with Hadoop through Hive and gives users easy access to data.

Page 1: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Discover Red Hat and Hortonworks for the Modern Data Architecture

Kimberly Palko, Product Manager, Red Hat

Page 2: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Agenda

● Red Hat and JBoss Middleware Overview
● Combining data in Hadoop with traditional data sources
● Federating two geographically distributed Hadoop clusters
● Virtual data marts for Hadoop Lake

Page 3: Red Hat - Presentation at Hortonworks Booth - Strata 2014

RED HAT & JBOSS MIDDLEWARE OVERVIEW

Page 4: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Engineering Collaboration and Benefits

Integration with JBoss Data Virtualization: Enable agile Big Data Hadoop integration with existing enterprise assets and maximize universal data utilization to enable self-service analytics

Integration with multiple Red Hat JBoss Middleware product family: Enables millions of JBoss developers to quickly build applications with Hadoop

Integration with Red Hat Storage: Enables Hadoop to use Red Hat Storage secure resilient storage pool for data applications

Integration with Red Hat Enterprise Linux OpenStack Platform: Simplifies automated deployment of Hadoop on OpenStack

Integrated with Red Hat Enterprise Linux and OpenJDK: Develop and deploy Apache Hadoop as an integrated component for multiple deployment scenarios

Page 5: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Big Data Integration: Turn Data into Actionable Information

Speed of Iteration leads to Success

Diagram labels:
Stages: Analyze, Integrate, Enrich, Ingest
Hadoop & NoSQL
Data Integration & Data Services: JBoss Data Virtualization
In-memory data management: JBoss Data Grid
BI Analytics (diagnostic, descriptive, predictive, prescriptive)
SOA Applications
Event Processing & Messaging: JBoss BRMS & JBoss A-MQ
Structured Data: DW, OLAP, OLTP
Semi / Unstructured Data: SOCIAL, LOGS
Streaming Data: EVENTS, IOT
Red Hat Enterprise Linux, Red Hat Storage

Page 6: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Data Challenges Getting Bigger…

Diagram labels: NoSQL, Hive, MapReduce, HDFS, Storm, HBase, Spark

Page 7: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Make Big Data Accessible for Everyone

Page 8: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Data Supply and Integration Solution

Data Virtualization sits in front of multiple data sources and
● allows them to be treated as a single source
● delivering the desired data
– in the required form
– at the right time
– to any application and/or user.

THINK VIRTUAL MACHINE FOR DATA

Page 9: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Easy Access to Big Data

● Reporting tool accesses the data virtualization server via rich SQL dialect
● The data virtualization server translates rich SQL dialect to HiveQL
● Hive translates HiveQL to MapReduce
● MapReduce runs MR job on big data

Diagram labels: Analytical Reporting Tool; Data Virtualization Server; Hadoop (Hive, MapReduce, HDFS); Big Data

Page 10: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Different Users, Different Views of Big Data

Diagram labels: Hive, MapReduce, HDFS

● Logical tables with different forms of aggregation
● Logical tables containing extra derived data
● Logical tables with filtered data
● All reports/users share the same specifications
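The three kinds of logical table listed on this slide can be pictured as SQL views over one physical table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the Hive source; the table `clicks`, its columns, and the view names are hypothetical illustrations and are not taken from the presentation or from JBoss Data Virtualization.

```python
import sqlite3


def build_views() -> sqlite3.Connection:
    """Create one physical table and three logical views over it."""
    conn = sqlite3.connect(":memory:")
    # Physical table: stands in for data that would live in Hive (assumption).
    conn.execute("CREATE TABLE clicks (region TEXT, page TEXT, ms INTEGER)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?, ?)",
        [("EU", "/home", 1200), ("EU", "/cart", 300), ("US", "/home", 2500)],
    )
    # Logical table with aggregation.
    conn.execute(
        "CREATE VIEW clicks_by_region AS "
        "SELECT region, COUNT(*) AS n FROM clicks GROUP BY region"
    )
    # Logical table with an extra derived column.
    conn.execute(
        "CREATE VIEW clicks_with_seconds AS "
        "SELECT region, page, ms / 1000.0 AS seconds FROM clicks"
    )
    # Logical table with filtered data.
    conn.execute("CREATE VIEW clicks_eu AS SELECT * FROM clicks WHERE region = 'EU'")
    return conn
```

Every report queries a view by name, so all reports share one definition instead of each repeating the aggregation or filter logic.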

Page 11: Red Hat - Presentation at Hortonworks Booth - Strata 2014

USE CASE 1: COMBINING DATA FROM HADOOP WITH TRADITIONAL SOURCES - USING JBOSS DATA VIRTUALIZATION

Page 12: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Integration of Big Data with “Small Data”

• Integrating small data with big data is easy
• Integration specifications can be shared or be developed for individual reports

Diagram labels: Hive, MapReduce, HDFS; Application Database Server
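As a rough illustration of joining "big" and "small" data through one SQL entry point, the sketch below attaches two separate in-memory SQLite databases to a single connection and joins across them. This is only an analogy for federation: SQLite stands in for both the Hive source and the application database, and the schema names (`main.sentiment`, `crm.products`) and sample rows are invented for the example.

```python
import sqlite3


def federated_totals() -> list:
    """Join rows held in two separate databases through a single connection."""
    conn = sqlite3.connect(":memory:")
    # Second, independent database: stands in for the application database server.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    # "Big data" side: stands in for a Hive table of sentiment events (assumption).
    conn.execute("CREATE TABLE main.sentiment (product_id INTEGER, score INTEGER)")
    # "Small data" side: a reference table in the attached database.
    conn.execute("CREATE TABLE crm.products (product_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.sentiment VALUES (?, ?)", [(1, 4), (1, 2), (2, -1)]
    )
    conn.executemany(
        "INSERT INTO crm.products VALUES (?, ?)", [(1, "widget"), (2, "gadget")]
    )
    # One query spans both databases; the caller never sees where each table lives.
    return conn.execute(
        "SELECT p.name, SUM(s.score) "
        "FROM crm.products AS p "
        "JOIN main.sentiment AS s ON s.product_id = p.product_id "
        "GROUP BY p.name ORDER BY p.name"
    ).fetchall()
```

The join specification lives in one place, which is the property the slide describes as shareable across reports.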

Page 13: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Caching the Big Data

• Caches to speed up interactive reporting
• Caches to create a consistent view of big data
• Different caches for different reports

Diagram labels: Hive, MapReduce, HDFS

Page 14: Red Hat - Presentation at Hortonworks Booth - Strata 2014

USE CASE 2: GEOGRAPHICALLY DISTRIBUTED HADOOP CLUSTERS WITH DATA VIRTUALIZATION - SECURING DATA BY USER ROLE

Page 15: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Role based access control

Roles
• Define roles based on organization hierarchy

Users
• External authentication via Kerberos, LDAP, etc.

VDB
• Assign users and groups to a virtual database

Page 16: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Authentication

Kerberos: From client to the virtual database
Login Modules: LDAP (MS Active Directory, OpenLDAP, etc.), any JAAS based security domain
REST and Web Services: WS-UsernameToken, HTTP Basic authentication
SAML: SAML authentication for web client applications

Page 17: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Audit Logging via Dashboard

Page 18: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Row and Column Masking

- Row based masking
  Ex: keyed off geographic marker
- Column masking to a constant, null, or a SQL statement
  Example: change all but the last 4 digits in a credit card number to stars
  concat('****', substring(column, length(column)-3))
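The column-masking example on this slide can be mirrored in a few lines of Python to make the intended result concrete. This is a minimal sketch of the masking rule only, not of how JBoss Data Virtualization applies it; the function name `mask_card_number` is hypothetical.

```python
def mask_card_number(card_number: str) -> str:
    """Return four stars followed by the last four characters of the input."""
    # Mirrors the slide's intent: hide everything except the final four digits.
    return "****" + card_number[-4:]
```

For a 16-digit number ending in 1234, the function returns `****1234`, so a report can still distinguish cards without exposing the full number.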

Page 19: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Summary of Security Capabilities

● Authentication
– Kerberos, LDAP, WS-UsernameToken, HTTP Basic, SAML
● Authorization
– Virtual data views, Role based access control
● Administration
– Centralized management of VDB privileges
● Audit
– Centralized audit logging and dashboard
● Protection
– Row and column masking
– SSL encryption (ODBC and JDBC)

Page 20: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Demonstration: Geographically Distributed Hadoop Clusters with Data Virtualization - Securing Data by User Role

Page 21: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 2: Federating across Geographically Distributed Hadoop Clusters

Problem: Geographically distributed Hadoop clusters contain sensitive data like patient records or customer identification that cannot be accessed by other regions due to regulatory policy. IT needs access to all data, but users can only access the data in their region.

Solution: Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns while federating across Hadoop clusters.

Consume, Compose, Connect: Data can be accessed by multiple tools and methods already in-house

Diagram labels: JBoss Data Virtualization; Hive on a Hadoop cluster in one geographic region; Hive on a Hadoop cluster in a second geographic region
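The access rule in this use case (IT sees every row, other users see only rows from their own region) can be sketched as a row filter applied to the federated result. The sketch below is plain Python under stated assumptions: the role names `"it"` and `"regional"`, the `User` class, and the sample records are all hypothetical and only illustrate the rule, not the product's configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class User:
    name: str
    role: str    # hypothetical roles: "it" sees everything, "regional" is restricted
    region: str


def federate(*clusters: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine rows from several clusters into one logical result set."""
    return [row for cluster in clusters for row in cluster]


def visible_rows(user: User, rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply row-level security: restrict non-IT users to their own region."""
    if user.role == "it":
        return list(rows)
    return [row for row in rows if row["region"] == user.region]


# Invented sample data standing in for two regional Hadoop clusters.
EU_CLUSTER = [{"region": "EU", "patient": "p-001"}]
US_CLUSTER = [{"region": "US", "patient": "p-002"}, {"region": "US", "patient": "p-003"}]
ALL_ROWS = federate(EU_CLUSTER, US_CLUSTER)
```

Calling `visible_rows(User("ana", "regional", "EU"), ALL_ROWS)` yields only the EU record, while a user with the `"it"` role receives all three.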

Page 22: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 2 - Architecture

Diagram labels: Applications (Business Analytics, Custom Applications, Packaged Applications); Data System; Virtual Data Mart

Page 23: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 2 - Resources

• GUIDE
How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase2
Tutorial: Available soon
• VIDEOS: http://vimeo.com/user16928011/hortonworksusecase2short
• SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase2

Page 24: Red Hat - Presentation at Hortonworks Booth - Strata 2014

USE CASE 3: VIRTUAL DATA MARTS FOR HADOOP DATA LAKE - WITH JBOSS DATA VIRTUALIZATION

Page 25: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Data for entire organization in Hadoop Data Lake

Problem: How does IT control access and give business users just the data they need?
- Does every line of business have access to everyone’s data?
- How do business users get access to the data they need in a simple (even self-service) way?

Diagram labels: Hadoop Data Lake (Marketing Clickstream Data; Finance Expense Reports; HR Employee Files; Server Logs; Sales Transactions; Customer Accounts; Twitter Sentiment Data)

Page 26: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Secure, Self-Service Virtual Data Marts for Hadoop

Solution: Use JBoss Data Virtualization to create virtual data marts on top of a Hadoop cluster
- Lines of Business get access to the data they need in a simple manner
- IT maintains the process and control it needs
- All data remains in the data lake, nothing is copied or moved

Diagram labels: Marketing; IT; Finance; Sales; Hadoop Data Lake (Marketing Clickstream Data; Customer Accounts; Twitter Sentiment Data; Server Logs; Sales Transactions; HR Employee Files; Finance Expense Reports)

Page 27: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Optional hierarchical data architectures with virtual data mart

Can be combined with security features like user role access and row and column masking

Diagram labels: Dept Base Virtual Database (VDB); Team 1 VDB; Team 2 VDB; View1; View2

Page 28: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Virtual Data Marts for Operational Data

Problem: All the legacy and archived data is in the Hadoop data lake. We want to access the most recent, up to the minute, operational data often and quickly.

Diagram labels: Hadoop Data Lake, Historical Data (Marketing Clickstream Data; Finance Expense Reports; HR Employee Files; Server Logs; Sales Transactions; Customer Accounts; Twitter Sentiment Data)

Page 29: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Caching For Faster Performance – Materialized View

Diagram labels: Query 1; Query 2; View 1; Cached or Materialized View 1; Virtual Database (VDB)

• Same cached view for multiple queries
• Refreshed automatically or manually
• Cache repository can be any supported data source
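The caching behaviour in these bullets (one cached result shared by several queries, refreshed automatically after some interval or manually on demand) can be sketched in Python as follows. This is an illustrative model only, not the product's caching mechanism; the class name `CachedView`, the `ttl_seconds` parameter, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, List, Optional


class CachedView:
    """Hypothetical stand-in for a materialized view shared by several queries."""

    def __init__(
        self,
        loader: Callable[[], List[tuple]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader        # the slow query against the underlying source
        self._ttl = ttl_seconds      # how long a cached result stays fresh
        self._clock = clock          # injectable so the sketch can be tested
        self._rows: Optional[List[tuple]] = None
        self._loaded_at = 0.0
        self.loads = 0               # how many times the slow source was queried

    def refresh(self) -> None:
        """Manual refresh: re-run the slow query and replace the cache."""
        self._rows = list(self._loader())
        self._loaded_at = self._clock()
        self.loads += 1

    def rows(self) -> List[tuple]:
        """Serve from the cache, refreshing automatically once the TTL expires."""
        if self._rows is None or self._clock() - self._loaded_at >= self._ttl:
            self.refresh()
        return self._rows
```

Any number of queries can call `rows()` between refreshes and the slow source is hit only once, which is where the reporting speed-up comes from.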

Page 30: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Virtual operational data store

Solution: Use JBoss Data Virtualization to integrate up to the minute data from multiple diverse data sources that can be quickly queried.
- Use HDP for older data
- Use JDV to materialize the data in HDP for faster access and to combine with operational VDB

Diagram labels: Hadoop Data Lake, Historical Data (Marketing Clickstream Data; Finance Expense Reports; HR Employee Files; Server Logs; Sales Transactions; Customer Accounts; Twitter Sentiment Data); Operational VDB with up to the minute data; Periodic Transfer from Data Sources; Materialized View

Page 31: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Demonstration: Virtual Data Marts with Hadoop Data Lake

Page 32: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 3 - Overview

Objective:
– Purpose oriented data views for functional teams over a rich variety of semi-structured and structured data

Problem:
– Data Lakes have large volumes of consolidated clickstream data, product and customer data that need to be constrained for multi-departmental use.

Solution:
– Leverage HDP to mashup Clickstream analysis data with product and customer data on HDP to answer
– Leverage JBoss Data Virt to provide Virtual data marts for Marketing and Product teams

Page 33: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 3 - Architecture

Diagram labels: Applications (Business Analytics, Custom Applications, Packaged Applications); Data System: HDP 2.1 (Governance & Integration, Data Access, Data Management, Security, Operations), Virtual Data Mart; Sources: Emerging Sources (Sensor, Sentiment, Geo, Unstructured), Existing Sources (CRM, ERP, Clickstream, Logs)

Page 34: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 3 - Resources

• GUIDE
How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase3
Tutorial: Available soon
• VIDEOS:
http://vimeo.com/user16928011/hwxuc3configuration
http://vimeo.com/user16928011/hwxuc3run
http://vimeo.com/user16928011/hwxuc3overview
• SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase3

Page 35: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Demonstration: Combining Sentiment Data with Sales Data

Page 36: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 1: Combine data from Hadoop with traditional data sources

Problem: Data from new data sources like social media, clickstream and sensors needs to be combined with data from traditional sources to get the full value.

Solution: Leverage JBoss Data Virtualization to mash up new data in Hadoop with data in traditional data sources without moving or copying any data, and access it through a variety of BI tools and SOA technologies.

Consume, Compose, Connect: Data can be accessed by multiple tools and methods already in-house

Diagram labels: JBoss Data Virtualization; Hive

SOURCE 1: Hive/Hadoop contains data from new data sources like social media, clickstream and sensor data
SOURCE 2: Traditional relational databases in the enterprise

Page 37: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 1 - Architecture

Diagram labels: Applications (Business Analytics, Custom Applications, Packaged Applications); Data System; Traditional Repositories (RDBMS, EDW, MPP); Virtual Data Mart

Page 38: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 1 – Demo

Page 39: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Use Case 1 - Resources

http://hortonworks.com/hadoop-tutorial/evolving-data-stratagic-asset-using-hdp-red-hat-jboss-data-virtualization/

Page 40: Red Hat - Presentation at Hortonworks Booth - Strata 2014

Benefits of Data Virtualization on Big Data

● Enterprise democratization of big data
● Any reporting or analytical tool can be used
● Easy access to big data
● Seamless integration of big data and small data
● Sharing of integration specifications
● Collaborative development on big data
● Fine-grained security of big data
● Speedy delivery of reports on big data

Page 41: Red Hat - Presentation at Hortonworks Booth - Strata 2014

QUESTIONS