INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)

PER STRICKER, THOMAS KALB

07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN

08.02.2017, DB2 FORUM USER GROUP, DALLAS

Copyright © 2016 ITGAIN GmbH


Agenda

• Introduction
• The MPP Architecture
• DB2 DPF and Hadoop (HDFS)
• Installation stumbling blocks
  Red Hat or CentOS
  The HDP Installation with Ambari (See Appendix)
  The BigSQL Installation
• Working with BigSQL: the Familiar and the New
  a. DB2 Interface
  b. HDFS Interface
• The Big Data Deployment (SQL for unstructured Data)
• DB2 Engine versus HDFS Engine
  Functional Differences
  Performance Differences
• BIG SQL and Hive
• Conclusion – Sham or Masterstroke?
• Questions and Discussions

Hadoop (HDFS)

http://bradhedlund.s3.amazonaws.com/2011/hadoop-network-intro/Hadoop-Cluster.PNG

Hadoop Distribution

Cloudera / Hortonworks / MapR / IOP (Worldwide Market share)

• Cloudera 53 %
• Hortonworks 16 %
• MapR 11 %
• others 20 %

Source: https://www.dezyre.com/article/top-6-hadoop-vendors-providing-big-data-solutions-in-open-data-platform/93

Hadoop Appraisal

Source: https://www.cloudera.com/content/dam/www/static/documents/analyst-reports/forrester-wave-big-data-hadoop-distributions.pdf

Hadoop SQL Engines

Source: IBM Big SQL – Vendor Landscape © 2014 IBM Corporation

Big SQL and MPP-Architecture

IBM Big SQL is a high-performance SQL-on-Apache-Hadoop engine

The IBM MPP engine (C++) replaces the MapReduce layer (Java)

Big SQL is an MPP (Massively Parallel Processing) SQL engine

Hive extends Hadoop with data warehouse features

HBase is a distributed column-oriented database

HDFS is a highly available file system for storing very large volumes of data distributed across many nodes.

Source: Big SQL: A Technical Introduction © 2016 IBM Corporation

SMP vs. MPP Architecture

SMP (symmetric multiprocessing): Dynamically distributes running processes across all available processors, which share system resources (multi-processor systems)

MPP (massively parallel processing): Distributes a task across multiple independent nodes, each with its own processors, RAM and I/O (shared-nothing architecture)
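A toy illustration of the shared-nothing idea described above, not the actual distribution logic of Big SQL or DB2 DPF: rows are assigned to nodes by hashing a key, each node aggregates only its own rows, and a coordinator combines the partial results. The function names, the crc32-based hash and the sample rows are assumptions made for this sketch.

```python
import zlib
from collections import defaultdict


def partition_rows(rows, key, node_count):
    """Assign every row to exactly one node by hashing its key column."""
    nodes = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs (unlike hash() on strings)
        node = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        nodes[node].append(row)
    return nodes


def parallel_sum(nodes, column):
    """Each node sums only its local rows; the coordinator adds the partials."""
    partials = {node: sum(r[column] for r in local) for node, local in nodes.items()}
    return partials, sum(partials.values())


# Hypothetical sample data: 8 rows spread over 4 nodes
rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
nodes = partition_rows(rows, "id", node_count=4)
partials, total = parallel_sum(nodes, "amount")
print(total)  # 360, regardless of how the rows were distributed
```

Because no node needs another node's rows to compute its partial sum, adding nodes adds capacity, which is the horizontal scaling argument made on the following slides.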

SMP Scaling

Vertical Scaling

Horizontal Scaling

DB2 DPF versus Hadoop (HDFS)

[Figure: Hadoop Cluster (Diploma Thesis) compared with DB2 DPF]

DB2 DPF

Source: toadworld.com

Big SQL – IBM Slide

Source: Big SQL: A Technical Introduction © 2016 IBM Corporation

BIG SQL – ITGAIN Slide


Installation Stumbling Blocks

ITGAIN Test Environment

Installing two nodes

• Hardware
  2 virtual servers with 8 cores / 10 GB RAM / SSDs

• Software
  Red Hat Enterprise Linux (RHEL) 7.2 / CentOS 7.2
  Ambari 2.2.2.0
  Hortonworks Data Platform (HDP) 2.4.2
  BETA: Big SQL 4.2 for Hortonworks Data Platform

Extending with two additional identical nodes (DataNode / WorkerNode)

Installation Stumbling Blocks: Red Hat or CentOS?

IBM BigInsights for Apache Hadoop 4.2 only supports

• Red Hat Enterprise Linux (RHEL) Server 6.7
• Red Hat Enterprise Linux (RHEL) Server 7.2

Hortonworks Data Platform HDP 2.4.2 supports

• Red Hat Enterprise Linux (RHEL) 6.x - 7.x
• CentOS 6.x - 7.x
• Debian 7.x
• Oracle Linux 6.x - 7.x
• SUSE Linux Enterprise Server (SLES) v11 SP3 / SP4
• Ubuntu Precise v12.04
• Ubuntu Trusty v14.04

Recommendation for the BETA on Hortonworks: Red Hat Enterprise Linux (RHEL) Server 7.2

Test cluster on

• Red Hat Enterprise Linux (RHEL) Server 7.2
• CentOS 7.2

Installation on both OSes was successful

Installation Stumbling Blocks: The HDP Installation with Ambari

Tips and Tricks:

• Installation with Ambari is very simple, provided there are no errors

• Therefore: before the installation, take the time to clear any warnings in the Confirm Hosts and Check Scripts

• In case of errors: check the error output written to stderr
  Often stderr is empty; the typical cause is then a timeout
  If stderr contains errors, attempt to correct the error and retry

• If the installation crashes, it is often easier to retry with a fresh OS than to modify the existing OS and retry the installation

Installation Stumbling Blocks: The BigSQL Installation

Recommendation: Execute the Big SQL Pre-Checker before the installation

The Pre-Checker scripts are included in the installation package but need to be extracted:

rpm2cpio BigInsights-HDP-1.2.0.0-2.4.el7.x86_64.rpm | cpio -ivd ./var/lib/ambari-server/resources/stacks/HDP/2.4/services/BIGSQL/package/scripts/bigsql-precheck.sh

rpm2cpio BigInsights-HDP-1.2.0.0-2.4.el7.x86_64.rpm | cpio -ivd ./var/lib/ambari-server/resources/stacks/HDP/2.4/services/BIGSQL/package/scripts/bigsql-util.sh

All errors should be cleared before starting the installation

Execute the Pre-Checker on ALL servers!

Only start the installation once the check has succeeded

Add the Service to a Cluster

It is always possible to add additional Big SQL Workers to an individual host via the Add Services option under Hosts

However, this is not possible on a Big SQL Head Node!

Installation Stumbling Blocks: Extending the Cluster with Ambari

Additional hosts can easily be added with the Add New Hosts wizard

Data must be redistributed after the extension


Working with BigSQL – The New and the Familiar

DB2 Interface

Where does one find the Tables in HDFS? /apps/hive/warehouse/bigsql.db/firsttable

The same location can also be browsed from the command line (HDFS Browse).
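The path /apps/hive/warehouse/bigsql.db/firsttable quoted above looks like the usual Hive warehouse layout of root directory, then schema name plus ".db", then table name. The following sketch builds such a path under that assumption; the helper name and the lowercasing are illustrative choices, and tables created with an explicit location would not follow this pattern.

```python
import posixpath

# Warehouse root as it appears in the slide's example path
WAREHOUSE_ROOT = "/apps/hive/warehouse"


def table_location(schema, table, root=WAREHOUSE_ROOT):
    """Guess the HDFS directory of a table, assuming <root>/<schema>.db/<table>."""
    return posixpath.join(root, f"{schema.lower()}.db", table.lower())


print(table_location("BIGSQL", "FIRSTTABLE"))
# /apps/hive/warehouse/bigsql.db/firsttable
```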

Not everything works with the DB2 command line: for example, loading data into a Hadoop table

What now?

There is also a command line for BigSQL: JSqsh (Java SQL Shell), pronounced "jay-skwish"

According to the docs it should be found in:

/usr/ibmpacks/common-utils/current/jsqsh

BUT:

SOLUTION: JSqsh isn't part of the BigSQL installation

JSqsh appears in the list of installed clients

JSqsh can also be installed via the open-source GitHub project

JSqsh Setup:

• Driver selection
• Customize the connection details and save

Requesting the table list with JSqsh

JSqsh command help via \help, e.g.:

• Defining the current schema: use BIGSQL
• Requesting a table list in a given schema: \show tables

Starting point: Load data into the tables

Tip: for better performance, first copy the load file into HDFS

hdfs dfs -copyFromLocal /tmp/firsttable.csv /tmp/

hdfs dfs -chmod 777 /tmp/firsttable.csv

What happened in the HDFS file system? A new file has appeared

db2top also works: for example, for LOAD

Even db2pd works: for example, for LOAD. However, LIST UTILITIES does not work


Loading the Benchmark BIGSQL HDFS Table

The HDFS (DB2-) Blocks

BIGSQL HDFS versus DB2 DPF

DB2 DPF Restrictions

Performance differences DB2 DPF versus DB2 HDFS: Loading 10 million rows

DB2 HDFS: 64 sec.

DB2 DPF: 22 sec.
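To put the two load times in relation, the arithmetic below converts them into rows per second and a slowdown factor. Only the row count and the two durations come from the slide; the variable names are chosen for this sketch.

```python
ROWS = 10_000_000  # rows loaded in the benchmark

# Load durations in seconds as reported on the slide
load_seconds = {"DB2 HDFS": 64, "DB2 DPF": 22}

# Throughput in rows per second for each engine
throughput = {engine: ROWS / secs for engine, secs in load_seconds.items()}

# How many times longer the HDFS load took compared with DPF
slowdown = load_seconds["DB2 HDFS"] / load_seconds["DB2 DPF"]

for engine, rate in throughput.items():
    print(f"{engine}: {rate:,.0f} rows/s")
print(f"HDFS load took {slowdown:.1f}x as long as the DPF load")
```

This works out to roughly 156,000 rows/s for the HDFS table against roughly 455,000 rows/s for DPF, so the HDFS load took about 2.9 times as long.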

Performance differences DB2 DPF versus DB2 HDFS: Random I/O Benchmark (reading 1023 rows)

[Figure: Cold and Warm results for DB2 DPF and DB2 HDFS]

Performance differences DB2 DPF versus DB2 HDFS: Read-Ahead I/O Benchmark (reading 10 million rows)

[Figure: Cold and Warm results for DB2 DPF and DB2 HDFS]


The Big Data Deployment (SQL for unstructured Data)

Working with data types for complex (partially structured) data

• ARRAY: Collection of data of the same data type
• MAP: Collection of key-value pairs
• STRUCT: Collection of data with different data types

Working with unstructured data is possible via the Serializer and Deserializer (SerDe)

• The SerDe interface is told how it should process data blocks
• There are many built-in SerDes, for example for JSON, Avro, Parquet, regular expressions, etc.
• Many SerDes are available in the public domain
• Specific SerDes that may be required can be developed in Java

Big Data – Working with ARRAY Data Types

Collection of data of the same data type

Big Data – Working with MAP Types

Collection of key-value pairs

Big Data – Working with STRUCTs

Collection of data with different data types
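As a rough analogy for readers more familiar with programming languages than with Hive-style complex types, the three types map onto familiar Python structures. This is only an analogy, not Big SQL syntax, and all field names and values are hypothetical.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """STRUCT analogue: a fixed set of named fields, each with its own type."""
    street: str
    zip_code: int


# ARRAY analogue: an ordered collection whose elements all share one type
phone_numbers = ["555-0100", "555-0101"]

# MAP analogue: a collection of key-value pairs
attributes = {"browser": "Firefox", "os": "Linux"}

# STRUCT analogue: fields of different types, accessed by name
address = Address(street="Main St 1", zip_code=78701)

print(phone_numbers[0], attributes["browser"], address.zip_code)
```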

Big Data – Unstructured Data

Using SerDes in BigSQL

Before a SerDe .jar file can be used, it needs to be registered in BigSQL. Only once the jar file has been successfully registered is it available to BigSQL

3 steps to register:

1. Hive servers: Copy the SerDe .jar file into the /lib/ directory
2. Big SQL nodes: Copy the SerDe .jar file into the /userlib/ directory of each individual node
3. Restart all BigSQL services

Big Data – Example of Unstructured Data

Example: Parsing log files with regular expressions (RegexSerDe)

select * from apache_log fetch first 5 rows only

Use case: for example, correlating client data with web browser data to analyze user behavior
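The sketch below mimics in plain Python what a regex-based SerDe does conceptually: one regular expression with one capture group per column turns a raw log line into a row. The pattern targets the Apache common log format; the column names and the sample line are assumptions for illustration, since the table definition from the slide is not preserved.

```python
import re

# One named group per column; assumed layout: Apache common log format
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of column values, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line in common log format
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
row = parse_line(sample)
print(row["host"], row["status"], row["request"])
```

Once lines are split into columns this way, the log can be queried and joined with other tables like any structured data, which is what makes the correlation use case above possible.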


Big SQL versus Hive

SQLReplayer


Hive Big SQL Object Synchronization

Create a table in Hive:
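The screenshot for this step is not reproduced here. A minimal sketch of the step, run in Hive (for example through beeline) rather than in Big SQL, could look as follows. The database demo, the table hive_demo and its columns are hypothetical names chosen for illustration.

```sql
-- HiveQL, executed in Hive. All object names are assumptions.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.hive_demo (
    id   INT,
    name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

A table created this way is recorded in the Hive metastore; making it queryable from Big SQL is the purpose of the synchronization step.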


Synchronize the Hive Tables:
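Big SQL provides the stored procedure SYSHADOOP.HCAT_SYNC_OBJECTS for importing Hive metastore definitions into its own catalog. A sketch of a call, assuming a hypothetical Hive database demo that contains a table hive_demo (neither name comes from the slides):

```sql
-- Run in Big SQL. Arguments, in order:
--   schema name, object name, object types ('a' = all),
--   action if the object already exists in Big SQL, action on error.
CALL SYSHADOOP.HCAT_SYNC_OBJECTS('demo', 'hive_demo', 'a', 'REPLACE', 'CONTINUE');
```

With 'REPLACE', an existing Big SQL definition of the same object is replaced by the one imported from Hive, so the call can be repeated safely.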


Test the Big SQL Table:
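Once the definition is present in the Big SQL catalog, the table can be queried like any other. A sketch, again using the hypothetical name demo.hive_demo:

```sql
-- Run in Big SQL after the synchronization; demo.hive_demo is an assumed name.
SELECT * FROM demo.hive_demo FETCH FIRST 5 ROWS ONLY;
```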


Hive Big SQL Data Synchronization (Refresh)

Edit the HDFS File:


Select the Hive Table:


Synchronization (Refresh):
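When the files behind a table change outside Big SQL, the stored procedure SYSHADOOP.HCAT_CACHE_SYNC asks the Big SQL scheduler to refresh its cached metadata for that table. A sketch of the call, assuming the hypothetical table demo.hive_demo:

```sql
-- Run in Big SQL. Arguments: schema name, table name (both assumed here).
CALL SYSHADOOP.HCAT_CACHE_SYNC('demo', 'hive_demo');
```

Unlike the object synchronization, this call does not change the table definition; it only refreshes what Big SQL has cached about the table's files.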




BIGSQL – Sham or Masterstroke?

Sham

DB2 DPF for HDFS

Masterstroke

The right strategy at the right time

Reuse of existing investments

Increased acceptance via the reuse of SQL

Simple integration of Big Data in an existing infrastructure


The Big Data Solution

Big SQL Hadoop-Tables are not a replacement for OLTP-DBMS Technology

Big SQL makes it possible to run SQL queries against existing Hadoop data (no proprietary storage formats)

All the data are Hadoop files in HDFS

Big SQL was developed to make effective and efficient use of the Hadoop infrastructure

Most organizations possess experienced SQL developers

No UPDATE or DELETE is possible on a Hadoop table

Much lower license costs than DPF

Good SQL compatibility

Great monitoring with Speedgain for BIGSQL is available
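The restriction on UPDATE and DELETE noted above means that changes to a Hadoop table are typically expressed by writing a new set of rows. The following is a sketch of that general pattern, not a procedure taken from the slides; the tables orders and orders_current and their columns are hypothetical.

```sql
-- All object names are assumptions made for illustration.
CREATE HADOOP TABLE orders_current (
    order_id INT,
    status   VARCHAR(16)
);

-- Emulate a DELETE of cancelled orders by copying only the rows to keep.
INSERT INTO orders_current
SELECT order_id, status
FROM orders
WHERE status <> 'CANCELLED';
```

Queries are then pointed at orders_current instead of orders. Copying rows in bulk like this suits batch-oriented workloads, which is consistent with the point that Hadoop tables do not replace an OLTP DBMS.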


The primary use cases would be:

To move rarely referenced data out of the Data-Warehouse and onto cheaper hardware while maintaining the ability to query the data via SQL

To set up a new data warehouse

To filter and analyze unstructured data (such as log files, sensor data and social media) as well as to connect this data to existing structured data (such as via federation)


Conclusion

Bluff = Homerun


Q & A