HadoopDB in Action: Building Real World Applications Tilani Gunawardena


Page 1: HadoopDB in Action

HadoopDB in Action: Building Real World Applications

Tilani Gunawardena

Page 2: HadoopDB in Action

Road Map

Introduction
Architecture and Design
Example Application
Demonstration Scenario

Page 3: HadoopDB in Action

Introduction

Managing and analysing massive data
◦ Provides high performance
◦ Scales over clusters of thousands of heterogeneous machines
◦ Versatile: adaptability of a system to analytical queries of varying complexity

How does one build real world applications with HadoopDB?

Page 4: HadoopDB in Action

Architecture And Design

Database Connector - connects Hadoop with the single-node database systems.

Data Loader - partitions data and manages parallel loading of data into the database systems.

Catalog - tracks locations of different data chunks, including those replicated across multiple nodes.

SQL-MapReduce-SQL (SMS) planner - extends Hive to provide a SQL interface to HadoopDB.
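How the Data Loader and Catalog relate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HadoopDB's actual code: the function names (`chunk_of`, `load`, `build_catalog`), the node names, the MD5-based hashing, and the round-robin replica placement are all hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict


def chunk_of(key: str, num_chunks: int) -> int:
    """Map a partitioning key to a chunk id with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_chunks


def load(rows, key_field, num_chunks):
    """Loader step: split rows into chunks by hashing the partitioning key."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[chunk_of(str(row[key_field]), num_chunks)].append(row)
    return dict(chunks)


def build_catalog(num_chunks, nodes, replication=2):
    """Catalog step: record which nodes hold each chunk, replicas included."""
    catalog = {}
    for chunk_id in range(num_chunks):
        # Hypothetical placement: consecutive nodes, wrapping around the list.
        catalog[chunk_id] = [
            nodes[(chunk_id + r) % len(nodes)] for r in range(replication)
        ]
    return catalog


# Toy data and node names, invented for the example.
rows = [{"orderkey": k, "total": 10 * k} for k in range(1, 9)]
chunks = load(rows, "orderkey", num_chunks=4)
catalog = build_catalog(num_chunks=4, nodes=["node-a", "node-b", "node-c"])

for chunk_id, holders in catalog.items():
    print(chunk_id, holders, len(chunks.get(chunk_id, [])))
```

A planner that needs a given chunk would look it up in the catalog and could fall back to a replica if the first node is unavailable.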

Page 5: HadoopDB in Action
Page 6: HadoopDB in Action

HadoopDB

Supports any JDBC-compliant database server as an underlying DBMS layer.

Applications built on top of HadoopDB generally use the 3-tier architecture:
◦ data tier
◦ business logic tier
◦ presentation tier

From the application's perspective, HadoopDB is a black box.

Page 7: HadoopDB in Action

Example Application

A semantic web/biological data analysis application.

A business data warehousing application.

Page 8: HadoopDB in Action

Semantic Web: Biological Data Analysis

The Semantic Web is an effort by the W3C to enable integration and sharing of data across different applications.

RDF is a directed, labeled graph data format for representing information on the Web.

SPARQL is an RDF query language.
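To make the triple model concrete, here is a minimal Python sketch of an RDF-style graph stored as (subject, predicate, object) triples, with a small matcher for conjunctions of triple patterns, which is the core idea behind a SPARQL basic graph pattern. Every identifier in the toy data (`protein:P1`, `ex:organism`, `ex:existence`, and so on) is invented for illustration and does not come from any real vocabulary or dataset; a real SPARQL engine does far more than this.

```python
# Toy RDF graph as (subject, predicate, object) triples.
# All identifiers below are invented for illustration.
TRIPLES = {
    ("protein:P1", "rdf:type", "ex:Protein"),
    ("protein:P1", "ex:organism", "taxon:Human"),
    ("protein:P1", "ex:existence", "ex:Uncertain"),
    ("protein:P2", "rdf:type", "ex:Protein"),
    ("protein:P2", "ex:organism", "taxon:Human"),
    ("protein:P2", "ex:existence", "ex:Confirmed"),
    ("protein:P3", "rdf:type", "ex:Protein"),
    ("protein:P3", "ex:organism", "taxon:Mouse"),
    ("protein:P3", "ex:existence", "ex:Uncertain"),
}


def match(pattern, binding, triple):
    """Try to extend a variable binding so that the pattern equals the triple."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns against the triple set."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, binding, triple)) is not None
        ]
    return bindings


patterns = [
    ("?p", "rdf:type", "ex:Protein"),
    ("?p", "ex:organism", "taxon:Human"),
    ("?p", "ex:existence", "ex:Uncertain"),
]
result = sorted(b["?p"] for b in query(patterns, TRIPLES))
print(result)
```

Each pattern that shares the variable `?p` acts as a self-join on the triple set, which is why RDF workloads translate naturally into join-heavy relational queries.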

Page 9: HadoopDB in Action

Find all proteins whose existence in the 'Human' organism is uncertain.

SPARQL query:

Page 10: HadoopDB in Action
Page 11: HadoopDB in Action

Demonstrate:
◦ how the data administrator should prepare the dataset.

The analyst is shielded from the complexity of the actual implementation of the RDF storage layer.

Page 12: HadoopDB in Action

Business Data Warehousing

A natural target application for HadoopDB.

Common business data warehousing workloads are read-mostly and involve analytical queries over a complex schema.

To achieve good query performance, the dataset requires significant preparation through data partitioning and replication to optimize for join queries.

Data and queries: the TPC-H benchmark.
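Why partitioning helps join queries can be shown with a small Python sketch. It assumes two toy tables partitioned on the same join key with the same function (co-partitioning), so that each partition can be joined locally and the union of the local results equals the global join. The table contents, the modulo partitioning function, and the helper names `partition` and `join` are assumptions made for illustration, not HadoopDB's actual partitioning scheme.

```python
def partition(rows, key, n):
    """Hash-partition rows into n buckets on an integer key (toy scheme)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def join(orders, lineitems):
    """Plain hash join of orders and lineitems on orderkey."""
    by_key = {o["orderkey"]: o for o in orders}
    return [
        (li["orderkey"], by_key[li["orderkey"]]["customer"], li["price"])
        for li in lineitems
        if li["orderkey"] in by_key
    ]


# Toy tables, invented for the example.
orders = [{"orderkey": k, "customer": f"c{k % 3}"} for k in range(1, 7)]
lineitems = [{"orderkey": k % 6 + 1, "price": 5.0 * k} for k in range(1, 13)]

N = 3  # number of partitions, standing in for nodes
order_parts = partition(orders, "orderkey", N)
item_parts = partition(lineitems, "orderkey", N)

# Each partition is joined on its own; no rows cross partitions.
local = [row for i in range(N) for row in join(order_parts[i], item_parts[i])]
global_result = join(orders, lineitems)
print(sorted(local) == sorted(global_result))
```

Because matching rows always land in the same partition, the join can be pushed entirely into the single-node databases, with no data exchanged between nodes for that step.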

Page 13: HadoopDB in Action

Find the 10 highest-revenue unshipped orders.

Query:
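As a hedged illustration of what such a query can look like, the sketch below runs a query modelled on TPC-H Q3 (the shipping priority query, which asks for the ten unshipped orders with the highest revenue) against toy tables in an in-memory SQLite database. The table contents, the cutoff date, the market segment, and the use of SQLite instead of HadoopDB are all assumptions made for the example; it is not necessarily the query text used in the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy subset of the TPC-H schema with invented rows; dates are ISO strings
# so that plain string comparison orders them correctly.
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER, c_mktsegment TEXT);
CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER,
                     o_orderdate TEXT, o_shippriority INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
INSERT INTO customer VALUES (1, 'BUILDING'), (2, 'AUTOMOBILE');
INSERT INTO orders VALUES
    (10, 1, '1995-03-01', 0),
    (11, 1, '1995-03-10', 0),
    (12, 2, '1995-03-05', 0),
    (13, 1, '1995-04-01', 0);
INSERT INTO lineitem VALUES
    (10, 100.0, 0.10, '1995-03-20'),
    (10, 50.0, 0.00, '1995-03-25'),
    (11, 300.0, 0.50, '1995-03-16'),
    (11, 80.0, 0.00, '1995-03-12'),
    (12, 999.0, 0.00, '1995-03-30'),
    (13, 500.0, 0.00, '1995-04-10');
""")

# Orders placed before the cutoff whose line items ship after it count as
# unshipped; revenue is the discounted price summed per order.
QUERY = """
SELECT l_orderkey,
       SUM(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate,
       o_shippriority
FROM customer
JOIN orders ON c_custkey = o_custkey
JOIN lineitem ON l_orderkey = o_orderkey
WHERE c_mktsegment = 'BUILDING'
  AND o_orderdate < :cutoff
  AND l_shipdate > :cutoff
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10
"""

top_orders = conn.execute(QUERY, {"cutoff": "1995-03-15"}).fetchall()
for row in top_orders:
    print(row)
```

The three-way join on customer, orders, and line items is the kind of query that benefits from the partitioning and replication described for this workload.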

Page 14: HadoopDB in Action

Demonstration Scenario

The audience is invited to query both data sets through HadoopDB.

The data sets are located in a remote cluster.

Multi-user interaction: two client machines connect to the cluster.

Page 15: HadoopDB in Action

The user selects a dataset:
◦ Semantic Web, Biological Data Analysis: an animation of the behind-the-scenes data preparation and loading is presented, with details on the tools used for data conversion from RDF to relational form.
◦ Business Data Warehousing: the animation provides details on the partitioning scheme, the interaction between the loader and catalog components, and a summary of the configuration parameters.

The user then selects and parametrizes a query to execute, and can monitor the progress of query execution.

Page 16: HadoopDB in Action

In addition, HadoopDB's fault tolerance is demonstrated by introducing a node failure.

For a subset of the predefined queries, as the query executes in the background, an animation of the flow of data and control through the HadoopDB system is presented simultaneously, highlighting which parts of the query execution run in parallel.

Page 17: HadoopDB in Action

Thank You!