MariaDB ColumnStore BigData Analytics

Big Data Analytics with MariaDB ColumnStore

Page 1: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore BigData Analytics

Page 2: Big Data Analytics with MariaDB ColumnStore

Agenda

Session 1: Overview, Architecture

Session 2: Window Functions, Analytic Functions

Session 3: Demo – DEX Data Explorer

Page 3: Big Data Analytics with MariaDB ColumnStore

Analytics vs Data Warehouse

What questions do you have?

What happened?

Page 4: Big Data Analytics with MariaDB ColumnStore

Technical Use Cases

Data Warehousing
•  Selective column-based queries
•  Large number of dimensions

High Performance Analytics On Large Volume Of Data
•  Reporting and analysis on millions or billions of rows from datasets containing millions to trillions of rows
•  Terabytes to petabytes of datasets

Analytics Require
•  Statistical algorithms, windowing functions
•  Learning from data and understanding data

Page 5: Big Data Analytics with MariaDB ColumnStore

Data Scientist/Engineer

•  What tool(s) do I use? SQL interfaces
•  What’s inside the dataset? Data exploration
•  What story can I tell? Visualization (a picture is worth 1000 words)

Page 6: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore

•  GPLv2 Open Source
•  Columnar, Massively Parallel MariaDB Storage Engine
•  Scalable, high-performance analytics platform
•  Built-in redundancy and high availability
•  Runs on premise, on AWS cloud
•  Full SQL syntax and capabilities regardless of platform

[Diagram: Big Data Sources; ELT Tools; MariaDB ColumnStore (Node 1, Node 2, Node 3, . . . Node N on Local / AWS® / GlusterFS® storage); BI Tools; Analytics; Insight]

Page 7: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore Architecture

User Connections

User Module (User Module 1 . . . User Module n)
•  Processes SQL requests
•  MariaDB Front End
•  Query Engine

Performance Module (Performance Module 1, Performance Module 2 . . . Performance Module n)
•  Distributed processing engine

Columnar Distributed Data Storage

Page 8: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore

High-performance columnar storage engine that supports a wide variety of analytical use cases with SQL in highly scalable distributed environments

•  Faster, More Efficient Queries: parallel query processing for distributed environments
•  Easier Enterprise Analytics: single SQL interface for OLTP and analytics
•  Better Price Performance: power of SQL and freedom of open source for big data analytics

Page 9: Big Data Analytics with MariaDB ColumnStore

Workload – Query Vision/Scope

OLTP/NoSQL Workloads

OLAP/Analytic/Reporting Workloads
Suited for reporting or analysis of millions-billions of rows from data sets containing millions-trillions of rows.

[Chart: row-count axis marked 1, 100, 10,000, 1,000,000, 100,000,000, 10,000,000,000; data size bands 10-100GB, 100-1,000GB, 1-10TB]

Page 10: Big Data Analytics with MariaDB ColumnStore

Sizing

Minimum Spec
•  UM: 4 core, 32G RAM
•  PM: 4 core, 16G RAM

Typical Server Spec
•  UM: 8 core, 264G RAM
•  PM: 8 core, 64G RAM

Data Storage

External Data Volumes
•  Maximum 2 data volumes per IO channel per PM node server
•  Up to 2TB on the disk per data volume ≈ max 4TB per PM node

Local disk
•  Up to 2TB on the disk per PM node server

Detailed sizing guide based on data size and workload

Page 11: Big Data Analytics with MariaDB ColumnStore

Sizing - Example

•  MariaDB ColumnStore 60TB uncompressed data = 6TB compressed data at 10x compression
•  2 UMs - 8 core, 512G (based on workload)
•  6TB compressed = 3 data volumes (at 2TB per volume)
   -  with 1 data volume per PM node - 3 PMs
•  Data growth - 2TB per month, data retention - 2 years
   -  Plan for 2TB x 24 = 48TB additional
   -  48TB = 4.8TB compressed ≈ 3 data volumes (at 2TB per volume) with 1 data volume per PM node - 3 additional PMs
•  Total 6 PMs, 2 UMs
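
The arithmetic in this sizing example can be reproduced with a short sketch. The function name `pm_nodes_needed` and its parameters are hypothetical helpers introduced only for illustration; the 10x compression ratio, 2TB volume size, and one data volume per PM node are the assumptions stated on the slide, not fixed product limits.

```python
import math


def pm_nodes_needed(uncompressed_tb, compression_ratio=10,
                    volume_tb=2, volumes_per_pm=1):
    """Estimate PM nodes for a given uncompressed data size (hypothetical helper)."""
    compressed_tb = uncompressed_tb / compression_ratio
    # Each data volume holds up to volume_tb of compressed data.
    volumes = math.ceil(compressed_tb / volume_tb)
    # The slide assumes one data volume per PM node.
    return math.ceil(volumes / volumes_per_pm)


# Initial load: 60TB uncompressed.
initial_pms = pm_nodes_needed(60)

# Growth: 2TB per month, retained for 2 years (24 months).
growth_tb = 2 * 24
growth_pms = pm_nodes_needed(growth_tb)

total_pms = initial_pms + growth_pms
print(initial_pms, growth_pms, total_pms)
```

Rounding up at the volume level is what turns 4.8TB of compressed growth into 3 data volumes rather than 2.4.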

Page 12: Big Data Analytics with MariaDB ColumnStore

Analytics with MariaDB ColumnStore

•  SQL Features
•  Aggregation
•  Window Functions

Page 13: Big Data Analytics with MariaDB ColumnStore

ColumnStore SQL Features

Source : InfiniDB SQL Syntax Guide

Cross Engine Joins

CTE

DML

Aggregation

DDL

Disk Based Joins

Windowing Functions

SELECT QUERY

Page 14: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore

MariaDB ColumnStore uses standard “Engine=columnstore” syntax

CREATE TABLE `game_warehouse`.`dim_title` (
  `id` INT,
  `name` VARCHAR(45),
  `publisher` VARCHAR(45),
  `release_date` DATE,
  `language` INT,
  `platform_name` VARCHAR(45),
  `version` VARCHAR(45)
) ENGINE=columnstore;

Uses custom scalable columnar architecture

mysql> use tpcds_djoshi
Database changed
mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
|  2880404 |
+----------+
1 row in set (1.68 sec)

mysql> describe warehouse;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| w_warehouse_sk    | int(11)      | NO   |     | NULL    |       |
| w_warehouse_id    | char(16)     | NO   |     | NULL    |       |
| w_warehouse_name  | varchar(20)  | YES  |     | NULL    |       |
| w_warehouse_sq_ft | int(11)      | YES  |     | NULL    |       |
| w_street_number   | char(10)     | YES  |     | NULL    |       |
| w_street_name     | varchar(60)  | YES  |     | NULL    |       |
| w_street_type     | char(15)     | YES  |     | NULL    |       |
| w_suite_number    | char(10)     | YES  |     | NULL    |       |
| w_city            | varchar(60)  | YES  |     | NULL    |       |
| w_county          | varchar(30)  | YES  |     | NULL    |       |
| w_state           | char(2)      | YES  |     | NULL    |       |
| w_zip             | char(10)     | YES  |     | NULL    |       |
| w_country         | varchar(20)  | YES  |     | NULL    |       |
| w_gmt_offset      | decimal(5,2) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
14 rows in set (0.05 sec)

Page 15: Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore

MariaDB Front End

Standard ANSI SQL

Page 16: Big Data Analytics with MariaDB ColumnStore

Windowing Functions

•  Aggregate over a series of related rows
•  Simplified functions for complex statistical analytics over a sliding window per row
   -  Cumulative, moving or centered aggregates
   -  Simple statistical functions like rank, max, min, average, median
   -  More complex functions such as distribution, percentile, lag, lead
   -  Without running complex sub-queries

MAX            RANK
MIN            DENSE_RANK
COUNT          PERCENT_RANK
SUM            NTH_VALUE
AVG            FIRST_VALUE
VARIANCE       LAST_VALUE
VAR_POP        CUME_DIST
VAR_SAMP       LAG
STD            LEAD
STDDEV         NTILE
STDDEV_POP     PERCENTILE_CONT
STDDEV_SAMP    PERCENTILE_DISC
ROW_NUMBER     MEDIAN

Source : InfiniDB SQL Syntax Guide
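
As a rough illustration of what a windowing function computes, the sketch below runs a cumulative SUM, a RANK, and a LAG over a small partitioned dataset. It makes several assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in engine rather than a ColumnStore server, and the `sales` table, its columns, and its rows are hypothetical sample data. The `OVER (PARTITION BY ... ORDER BY ...)` clause is standard SQL, but function coverage and details can differ between engines.

```python
import sqlite3

# Hypothetical sample data: (region, month, amount).
rows = [
    ("east", "2017-01", 100),
    ("east", "2017-02", 150),
    ("east", "2017-03", 120),
    ("west", "2017-01", 200),
    ("west", "2017-02", 200),
    ("west", "2017-03", 90),
]

# In-memory SQLite database used only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

query = """
SELECT region, month, amount,
       -- cumulative aggregate per region, ordered by month
       SUM(amount) OVER (PARTITION BY region ORDER BY month
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS running_total,
       -- rank within the region, highest amount first
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       -- previous month's amount within the region
       LAG(amount) OVER (PARTITION BY region ORDER BY month) AS prev_amount
FROM sales
ORDER BY region, month
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row keeps its own detail columns alongside the windowed values, which is the point of the "without running complex sub-queries" bullet: no self-join or correlated subquery is needed to get the running total or the previous month's value.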

Page 17: Big Data Analytics with MariaDB ColumnStore

Data exploration

Dataset Import

Data Visualization

Dataset Exploration Demo

Page 18: Big Data Analytics with MariaDB ColumnStore

Thank you