mariadb-corporation
MariaDB ColumnStore BigData Analytics
Agenda
• Session 1: Overview, Architecture
• Session 2: Window Functions, Analytic Functions
• Session 3: Demo – DEX Data Explorer
Analytics vs. Data Warehouse
What questions do you have?
What happened?
Data Warehousing
• Selective column-based queries
• Large number of dimensions
High-Performance Analytics on Large Volumes of Data
• Reporting and analysis on millions or billions of rows
• From datasets containing millions to trillions of rows
• Terabytes to petabytes of data
Analytics Require
• Statistical algorithms, windowing functions
• Learning from data and understanding data
Technical Use Cases
Data Scientist/Engineer
• What tool(s) do I use? SQL interfaces
• What’s inside the dataset? Data exploration
• What story can I tell? Visualization (a picture is worth 1,000 words)
MariaDB ColumnStore
• GPLv2 open source
• Columnar, massively parallel MariaDB storage engine
• Scalable, high-performance analytics platform
• Built-in redundancy and high availability
• Runs on premises and on the AWS cloud
• Full SQL syntax and capabilities regardless of platform
[Diagram labels: Big Data Sources, ELT Tools; MariaDB ColumnStore (Node 1, Node 2, Node 3, ... Node N on Local / AWS® / GlusterFS® storage); BI and analytical tools; Analytics; Insight]
MariaDB ColumnStore Architecture
[Diagram labels: User Connections; User Module 1 ... User Module n (MariaDB Front End, Query Engine); Performance Module 1, Performance Module 2, ... Performance Module n; Columnar Distributed Data Storage]
• User Module: processes SQL requests
• Performance Module: distributed processing engine
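The User Module / Performance Module split described above can be pictured as a scatter-gather pattern. The following is only a toy sketch in Python, not ColumnStore's actual implementation: the function names `performance_module` and `user_module_avg`, the in-memory `partitions` layout, and the use of threads are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def performance_module(partition, column):
    """Hypothetical PM: scan one column of a local partition and
    return a partial aggregate as a (sum, count) pair."""
    values = partition[column]
    return sum(values), len(values)

def user_module_avg(partitions, column):
    """Hypothetical UM: fan the request out to every PM, then merge
    the partial (sum, count) pairs into a single AVG result."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: performance_module(p, column), partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

# Each dict stands in for the column data held by one PM node (made-up values).
partitions = [
    {"price": [10.0, 20.0], "qty": [1, 2]},
    {"price": [30.0], "qty": [3]},
    {"price": [40.0, 50.0, 60.0], "qty": [4, 5, 6]},
]

print(user_module_avg(partitions, "price"))
```

Only the requested column is read on each node, and only small partial results travel back to the coordinating module, which is the general idea behind columnar, massively parallel query processing.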
MariaDB ColumnStore
High-performance columnar storage engine that supports a wide variety of analytical use cases with SQL in highly scalable distributed environments
• Faster, more efficient queries: parallel query processing for distributed environments
• Easier enterprise analytics: single SQL interface for OLTP and analytics
• Better price performance: the power of SQL and the freedom of open source for big data analytics
Workload – Query Vision/Scope
• OLTP/NoSQL workloads
• OLAP/analytic/reporting workloads: suited for reporting or analysis of millions to billions of rows from data sets containing millions to trillions of rows
[Chart scale markers. Rows: 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000. Data size: 10-100 GB; 100-1,000 GB; 1-10 TB]
Sizing
Minimum spec
• UM: 4 cores, 32 GB RAM
• PM: 4 cores, 16 GB RAM
Typical server spec
• UM: 8 cores, 264 GB RAM
• PM: 8 cores, 64 GB RAM
Data storage
External data volumes
• Maximum 2 data volumes per I/O channel per PM node server
• Up to 2 TB on disk per data volume ≈ max 4 TB per PM node
Local disk
• Up to 2 TB on disk per PM node server
Detailed sizing guide based on data size and workload
Sizing - Example
• MariaDB ColumnStore: 60 TB uncompressed data = 6 TB compressed data at 10x compression
• 2 UMs: 8 cores, 512 GB RAM (based on workload)
• 6 TB compressed = 3 data volumes (at 2 TB per volume); with 1 data volume per PM node, 3 PMs
• Data growth of 2 TB per month with 2 years of data retention: plan for 2 TB × 24 = 48 TB additional; 48 TB = 4.8 TB compressed ≈ 3 data volumes (at 2 TB per volume); with 1 data volume per PM node, 3 additional PMs
• Total: 6 PMs, 2 UMs
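The arithmetic in the sizing example above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not an official sizing tool: the helper name `pm_nodes_needed` and its parameters are hypothetical, and the defaults (10x compression, 2 TB per volume, 1 volume per PM node) are simply the figures quoted in the example.

```python
import math

def pm_nodes_needed(uncompressed_tb, compression_ratio=10, volume_tb=2, volumes_per_pm=1):
    """Hypothetical helper: estimate compressed size, data volumes, and PM nodes.

    Defaults mirror the example: 10x compression, 2 TB per data volume,
    and 1 data volume per PM node.
    """
    compressed_tb = uncompressed_tb / compression_ratio
    volumes = math.ceil(compressed_tb / volume_tb)   # round up to whole volumes
    pms = math.ceil(volumes / volumes_per_pm)        # round up to whole PM nodes
    return compressed_tb, volumes, pms

initial = pm_nodes_needed(60)        # initial 60 TB of uncompressed data
growth = pm_nodes_needed(2 * 24)     # 2 TB per month retained for 24 months
total_pms = initial[2] + growth[2]

print("initial:", initial)
print("growth:", growth)
print("total PMs:", total_pms)
```

Rounding up at each step is why 4.8 TB of compressed growth data still needs 3 volumes rather than 2.4.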
Analytics with MariaDB ColumnStore
• SQL Features
• Aggregation
• Window Functions
ColumnStore SQL Features
• SELECT query
• Aggregation
• Windowing functions
• CTE
• Cross-engine joins
• Disk-based joins
• DML
• DDL
Source: InfiniDB SQL Syntax Guide
MariaDB ColumnStore
MariaDB ColumnStore uses standard “Engine=columnstore” syntax

mysql> use tpcds_djoshi
Database changed
mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
|  2880404 |
+----------+
1 row in set (1.68 sec)

mysql> describe warehouse;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| w_warehouse_sk    | int(11)      | NO   |     | NULL    |       |
| w_warehouse_id    | char(16)     | NO   |     | NULL    |       |
| w_warehouse_name  | varchar(20)  | YES  |     | NULL    |       |
| w_warehouse_sq_ft | int(11)      | YES  |     | NULL    |       |
| w_street_number   | char(10)     | YES  |     | NULL    |       |
| w_street_name     | varchar(60)  | YES  |     | NULL    |       |
| w_street_type     | char(15)     | YES  |     | NULL    |       |
| w_suite_number    | char(10)     | YES  |     | NULL    |       |
| w_city            | varchar(60)  | YES  |     | NULL    |       |
| w_county          | varchar(30)  | YES  |     | NULL    |       |
| w_state           | char(2)      | YES  |     | NULL    |       |
| w_zip             | char(10)     | YES  |     | NULL    |       |
| w_country         | varchar(20)  | YES  |     | NULL    |       |
| w_gmt_offset      | decimal(5,2) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
14 rows in set (0.05 sec)

CREATE TABLE `game_warehouse`.`dim_title` (
  `id` INT,
  `name` VARCHAR(45),
  `publisher` VARCHAR(45),
  `release_date` DATE,
  `language` INT,
  `platform_name` VARCHAR(45),
  `version` VARCHAR(45)
) ENGINE=columnstore;
Uses custom scalable columnar architecture
MariaDB ColumnStore
MariaDB Front End
Standard ANSI SQL
Windowing Functions
MAX, MIN, COUNT, SUM, AVG, VARIANCE, VAR_POP, VAR_SAMP, STD, STDDEV, STDDEV_POP, STDDEV_SAMP, ROW_NUMBER
RANK, DENSE_RANK, PERCENT_RANK, NTH_VALUE, FIRST_VALUE, LAST_VALUE, CUME_DIST, LAG, LEAD, NTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN
• Aggregate over a series of related rows
• Simplified functions for complex statistical analytics over a sliding window per row
  - Cumulative, moving, or centered aggregates
  - Simple statistical functions like rank, max, min, average, median
  - More complex functions such as distribution, percentile, lag, lead
  - Without running complex sub-queries
Source: InfiniDB SQL Syntax Guide
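To make the windowing functions listed above concrete, here is a small sketch that runs three of them (a cumulative SUM, RANK, and LAG) from Python. It uses SQLite's in-memory engine (version 3.25 or later) purely as a stand-in so the example is self-contained; the `sales` table and its rows are invented for illustration, and the same `OVER (PARTITION BY ... ORDER BY ...)` form should be checked against the ColumnStore documentation before relying on it there.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 1, 100), ("A", 2, 150), ("A", 3, 120),
     ("B", 1, 200), ("B", 2, 180)],  # made-up sample rows
)

query = """
SELECT store, day, amount,
       -- cumulative aggregate per store, ordered by day
       SUM(amount) OVER (PARTITION BY store ORDER BY day) AS running_total,
       -- rank of each day's amount within its store (1 = highest)
       RANK() OVER (PARTITION BY store ORDER BY amount DESC) AS amount_rank,
       -- previous day's amount within the same store (NULL for the first day)
       LAG(amount) OVER (PARTITION BY store ORDER BY day) AS prev_amount
FROM sales
ORDER BY store, day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row keeps its own detail columns while also carrying the per-store running total, rank, and previous value, which is what would otherwise require self-joins or correlated sub-queries.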
Data exploration
Dataset Import
Data Visualization
Dataset Exploration Demo
Thank you