Data Lake for Hadoop

  • 8/11/2019 Data Lake for Hadoop

    2010 Cisco and/or its affiliates. All rights reserved.

    Cisco Data Lake

    March 3, 2014

    Data Lake Definition
    Current Hadoop Landscape
    Why to Build Data Lake
    Benefits
    Data Lake Design

    Data Lake - a place to store practically unlimited amounts of data of any form [...] type that is relatively inexpensive and massively scalable. Data processing so[...] Hadoop can transform the data from its raw state to a finished product.
    --Revelytix

    If you think of a datamart as a store of bottled water, cleansed and packaged and structured for easy consumption, the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.
    --Pentaho

    The difference between a data lake and a data warehouse is that in a data warehouse, the data is pre-categorized at the point of entry, which can dictate how it's going to be analyzed.
    --Forbes
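
    The contrast in the Forbes quote can be illustrated with a small sketch. This is not taken from the deck: the event records, field names, and the two helper functions are hypothetical. It shows a warehouse-style load that fixes the columns at entry, next to a lake-style read where each consumer chooses its fields when it reads the raw data.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (data lake style).
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/docs", "ms": 340, "referrer": "search"}',
    '{"user": "a", "page": "/docs", "ms": 95}',
]


def warehouse_load(raw_lines):
    """Schema-on-write: the columns are fixed at the point of entry."""
    columns = ("user", "page")
    return [tuple(json.loads(line)[c] for c in columns) for line in raw_lines]


def lake_read(raw_lines, fields):
    """Schema-on-read: each consumer picks its fields when it reads."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields absent from a record come back as None instead of failing.
        yield {f: record.get(f) for f in fields}


warehouse_rows = warehouse_load(RAW_EVENTS)
latency_view = list(lake_read(RAW_EVENTS, ["page", "ms"]))
referrer_view = list(lake_read(RAW_EVENTS, ["user", "referrer"]))
```

    In this sketch the warehouse rows no longer carry `ms` or `referrer`, while both lake views are derived later from the same untouched raw lines.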

    Current Hadoop Landscape

    [Diagram: Data Sources feed the Hadoop Platform. Data Sources: Databases (ERP, SFDC, Database N) and Unstructured Data (Docs, Cases, Content; IoE, Machine Data, Clickstream). Hadoop Platform: separate areas for CPAI, Collab, CSTG, Marketing, and the Data Science Program, each listing its own data sets drawn from IB, Contracts, Cases, Hierarchies, Customer, Bookings, Network Logs, Cisco.com logs, etc.]

    Every project team spends resources bringing in its data
    Difficult to track which data elements are available in the platform
    Redundant platform resource utilization for data acquisition and maintenance
    Data quality and reliability issues
    Project teams develop their data acquisition flows manually

    Data Lake

    [Diagram: Data Sources (Databases: ERP, SFDC, Database N; Unstructured Data: Docs, Cases, Content; IoE, Machine Data, Clickstream) feed a Data Lake (EDS) on the Hadoop Platform. The lake lists IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc., plus Network Logs, Cisco.com logs, Documents, etc. Teams shown: CPAI, Marketing, Data Science, CSTG.]

    Data reuse: bring data in once, to be consumed by multiple projects
    Data stored in raw format: can be used by a variety of apps and tools
    Automated framework: can be quickly configured to get data from [...]
    Better resource utilization: frees resources in source systems and [...] platform
    Quick project deliveries
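
    The first three benefits (bring data in once, keep it raw, drive acquisition from configuration) can be sketched as follows. Nothing here comes from the deck itself: the `SOURCES` list, the inline payloads standing in for real source systems, the `raw/<name>/` directory layout, and the function names are all assumptions made for illustration.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

# Hypothetical source definitions; the inline payloads stand in for real systems.
SOURCES = [
    {"name": "bookings", "format": "csv", "payload": "id,amount\n1,100\n2,250\n"},
    {"name": "cases", "format": "json", "payload": '[{"id": 7, "status": "open"}]'},
]


def ingest(sources, lake_dir):
    """Land every configured source once, unchanged, under lake_dir/raw/<name>/."""
    landed = {}
    for src in sources:
        target = Path(lake_dir) / "raw" / src["name"] / f"data.{src['format']}"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(src["payload"])  # stored as received, no reshaping
        landed[src["name"]] = target
    return landed


def read_dataset(path):
    """Parse a landed file at read time, based on its extension."""
    text = Path(path).read_text()
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    if path.suffix == ".json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {path.suffix}")


with tempfile.TemporaryDirectory() as lake:
    landed = ingest(SOURCES, lake)
    # Two independent "projects" reuse the same landed copies.
    total_bookings = sum(int(r["amount"]) for r in read_dataset(landed["bookings"]))
    open_cases = [r["id"] for r in read_dataset(landed["cases"]) if r["status"] == "open"]
```

    Adding a source in this sketch means adding one entry to `SOURCES`; no new acquisition flow is written per project.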

    High Level Data Lake Architecture

    [Diagram: Data Sources (Databases: ERP, SFDC, Database N; Unstructured Data: Docs, Cases, Content; IoE, Machine Data, Clickstream) feed the Data Lake (EDS) on the Hadoop Platform, which lists IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc., plus Network Logs, Cisco.com logs, Documents, etc. Other labeled components: Tidal, Data Lake Load Process, Hadoop Edge Node, Data Lake Metadata (TD).]
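
    The architecture names a metadata component next to the load process, but the deck does not say what it stores. As one possible reading, a metadata catalog could record each loaded data set so that teams can look up which data elements are already available. The sketch below is an in-memory stand-in under that assumption; the class names, column names, and the pairing of data sets with source systems are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetEntry:
    """One catalog record per data set landed in the lake (hypothetical shape)."""
    name: str
    source: str
    columns: tuple
    loaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class LakeCatalog:
    """In-memory stand-in for a metadata store."""

    def __init__(self):
        self._entries = {}

    def register(self, name, source, columns):
        # Called by the load process after a data set lands; replaces older entries.
        entry = DatasetEntry(name, source, tuple(columns))
        self._entries[name] = entry
        return entry

    def datasets_with(self, column):
        # Answers "is this data element already in the lake, and where?"
        return sorted(e.name for e in self._entries.values() if column in e.columns)


catalog = LakeCatalog()
catalog.register("bookings", "ERP", ["booking_id", "customer_id", "amount"])
catalog.register("cases", "SFDC", ["case_id", "customer_id", "status"])
```

    A project team could then call `catalog.datasets_with("customer_id")` before requesting a new feed from a source system.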

    Data Lake Population and Consumption

    [Diagram: Structured Sources and Unstructured Sources ("Any Source": Docs, Cases, Content; IoE, Machine data, Clickstream) on one side; on HADOOP, a Data Lake layer (Source Like Structure) and a Transformed Layer. Other labels: ETL Offload (3NF Model), CG1, TD, F1 to F6, "SSOT to be consumed". The side notes are cut off in the source: "What data [...]", "What are [...] Lake?", "Do we bu[...] layer? Yes", "Functional A[...] Data Lake, other Funct[...]", "EDS Gover[...]", "Transforme[...]".]
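
    The two layers named on this slide, a source-like Data Lake layer and a Transformed Layer, can be sketched in a few lines. The deck does not describe the transformation rules, so the column names, the rename map, and the de-duplication step below are assumptions chosen only to show the raw layer staying untouched while a cleaned layer is derived from it.

```python
# Hypothetical source-like rows: source column names, everything still a string.
RAW_BOOKINGS = [
    {"BKG_ID": "1", "CUST_ID": "c1", "BKG_AMT": "100.50"},
    {"BKG_ID": "2", "CUST_ID": "c2", "BKG_AMT": "20"},
    {"BKG_ID": "2", "CUST_ID": "c2", "BKG_AMT": "20"},  # duplicate delivery
]

# Assumed mapping from source column names to conformed names.
COLUMN_MAP = {"BKG_ID": "booking_id", "CUST_ID": "customer_id", "BKG_AMT": "amount"}


def to_transformed(raw_rows):
    """Derive the transformed layer: rename, cast, and drop repeated booking ids."""
    seen = set()
    out = []
    for row in raw_rows:
        rec = {COLUMN_MAP[k]: v for k, v in row.items()}
        rec["booking_id"] = int(rec["booking_id"])
        rec["amount"] = float(rec["amount"])
        if rec["booking_id"] in seen:
            continue
        seen.add(rec["booking_id"])
        out.append(rec)
    return out


transformed = to_transformed(RAW_BOOKINGS)
```

    Because `to_transformed` builds new records, the source-like rows remain available for any other consumer that needs them as delivered.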

    Thank you.
