James Taylor jtaylor@salesforce


Phoenix
James Taylor
jtaylor@salesforce.com

We put the SQL back in the NoSQL

Agenda


Phoenix Overview
Phoenix Implementation
Performance Analysis
Phoenix Roadmap
Demo

Phoenix Overview


SQL layer on top of HBase
Delivered as an embedded JDBC driver
Targeting low-latency queries over HBase data
Columns modeled as a multi-part row key and key values
Query engine transforms SQL into a series of scans
Using native HBase APIs and capabilities
  Coprocessors for aggregation
  Custom filters for expression evaluation
  Transaction isolation through scan time range
  Optionally client-controlled timestamps

Open sourcing soon
100% Java
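The overview says columns are modeled as a multi-part row key. HBase sorts row keys by unsigned byte comparison, so each part must be encoded so that byte order matches value order. The following is a minimal Java sketch of that idea. It assumes a hypothetical fixed-width ORG_ID followed by a DATE stored as epoch milliseconds. The class name, the width, and the encoding are illustrative assumptions, not Phoenix's actual serialization.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical composite row key: a fixed-width ORG_ID followed by DATE.
// Illustrates the ordering idea only; this is not Phoenix's actual encoding.
final class CompositeRowKey {
    // Assumed fixed width, so the DATE part always starts at the same offset.
    static final int ORG_ID_WIDTH = 15;

    static byte[] encode(String orgId, long epochMillis) {
        byte[] org = orgId.getBytes(StandardCharsets.US_ASCII);
        if (org.length != ORG_ID_WIDTH) {
            throw new IllegalArgumentException("ORG_ID must be exactly " + ORG_ID_WIDTH + " ASCII characters");
        }
        ByteBuffer buf = ByteBuffer.allocate(ORG_ID_WIDTH + Long.BYTES); // big-endian by default
        buf.put(org);
        // Flipping the sign bit makes negative values sort before positive ones
        // under unsigned, byte-by-byte comparison.
        buf.putLong(epochMillis ^ Long.MIN_VALUE);
        return buf.array();
    }

    // Unsigned lexicographic comparison: the order in which HBase sorts row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Because ORG_ID comes first, all rows for one org are contiguous. Within that org, a date range is a single contiguous key range, and a scan can target it with a start key and an end key.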

Phoenix SQL Support

SELECT <expression>…
FROM <table>
WHERE <expression>
GROUP BY <expression>…
HAVING <aggregate expression>
ORDER BY <aggregate expression>…
LIMIT <value>

Aggregation functions: MIN, MAX, AVG, SUM, COUNT

Built-in functions: SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE

Operators: =, !=, <>, <, <=, >, >=, LIKE, AND, OR, NOT

Bind parameters: ? and :# (numbered, e.g. :1)

CASE WHEN
IN (<value>…)

DDL/DML (in progress)
CREATE/DROP <table>
DELETE FROM <table> WHERE <expression>
UPSERT INTO <table> [(<column>…)] VALUES (<value>…)
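One way to picture the semantics of the LIKE operator is to translate the pattern into an anchored Java regular expression. The following is a minimal sketch under stated assumptions. It handles only the standard '%' (any run of characters) and '_' (exactly one character) wildcards and has no ESCAPE clause. The class name is hypothetical, and Phoenix may evaluate LIKE differently.

```java
import java.util.regex.Pattern;

// Sketch: evaluate SQL LIKE by translating the pattern to a Java regex (hypothetical class).
// Supports '%' and '_' only; no ESCAPE clause.
final class LikeMatcher {
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%' || c == '_') {
                flush(literal, regex);                 // quote pending literal text
                regex.append(c == '%' ? ".*" : ".");   // '%' = any run, '_' = exactly one char
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        // DOTALL lets '%' and '_' match line terminators too.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));  // regex metacharacters stay literal
            literal.setLength(0);
        }
    }

    // matches() requires the whole value to match, as LIKE does.
    static boolean like(String value, String likePattern) {
        return toRegex(likePattern).matcher(value).matches();
    }
}
```

Quoting the literal runs matters. Without it, a '.' in a pattern such as 'a.c' would act as a regex wildcard instead of matching a literal dot.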

Sample Queries

SELECT host, TRUNC(dateTime, 'DAY'), AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
  AND dateTime > TO_DATE('2012-04-01')
  AND dateTime < TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)

SELECT product_number, product_name,
  CASE WHEN list_price = 0 THEN 'Mfg item - not for resale'
       WHEN list_price < 50 THEN 'Under $50'
       WHEN list_price >= 50 AND list_price < 250 THEN 'Under $250'
       WHEN list_price >= 250 AND list_price < 1000 THEN 'Under $1000'
       ELSE 'Over $1000'
  END AS price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking')
  AND (product_name LIKE '%Pack' OR product_name LIKE '% Cots %')
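To make the first query's semantics concrete, the following in-memory Java sketch applies the same WHERE, GROUP BY, HAVING and ORDER BY steps to a list of rows. The sketch rests on several assumptions:
- The MetricRow, GroupKey and Result records are hypothetical.
- TO_DATE is assumed to yield midnight of the given day.
- Records require Java 16 or later.

The sketch describes what the query returns, not how Phoenix executes it.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Comparator;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory reference semantics for the first sample query (not how Phoenix executes it).
final class ServerMetricsQuery {
    // Hypothetical row shape for SERVER_METRICS.
    record MetricRow(String host, LocalDateTime dateTime, double cacheHit) {}

    // GROUP BY key: host, TRUNC(dateTime, 'DAY').
    record GroupKey(String host, LocalDate day) {}

    // Output columns: host, day, AVG, MIN and MAX of cache_hit.
    record Result(String host, LocalDate day, double avg, double min, double max) {}

    static List<Result> run(List<MetricRow> rows) {
        LocalDateTime from = LocalDate.of(2012, 4, 1).atStartOfDay();  // assumed TO_DATE('2012-04-01')
        LocalDateTime to = LocalDate.of(2012, 7, 1).atStartOfDay();    // assumed TO_DATE('2012-07-01')
        Map<GroupKey, DoubleSummaryStatistics> groups = rows.stream()
            // WHERE host LIKE 'cs11-%' (a pure prefix pattern, so startsWith is equivalent)
            .filter(r -> r.host().startsWith("cs11-"))
            // AND dateTime > from AND dateTime < to
            .filter(r -> r.dateTime().isAfter(from) && r.dateTime().isBefore(to))
            .collect(Collectors.groupingBy(
                r -> new GroupKey(r.host(), r.dateTime().toLocalDate()),
                Collectors.summarizingDouble(MetricRow::cacheHit)));
        return groups.entrySet().stream()
            .map(e -> new Result(e.getKey().host(), e.getKey().day(),
                e.getValue().getAverage(), e.getValue().getMin(), e.getValue().getMax()))
            .filter(r -> r.min() < 90)                          // HAVING MIN(cache_hit) < 90
            .sorted(Comparator.comparing(Result::host)          // ORDER BY host, AVG(cache_hit)
                .thenComparingDouble(Result::avg))
            .collect(Collectors.toList());
    }
}
```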

Query Processing

Product Metrics HTable
Row key: ORG_ID, DATE, FEATURE
Key values: TXNS, IO_TIME, RESPONSE_TIME

Scan
  Start key: ORG_ID (:1) + DATE (:2)
  End key: ORG_ID (:1) + DATE (:3)

Filter
  IO_TIME > 100

Aggregation
  Intercepts scan on region server
  Builds map of distinct FEATURE values
  Returns one row per distinct group
  Client does final merge

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
  AND date >= :2 AND date <= :3
  AND io_time > 100
GROUP BY feature
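The following sketch shows how the work for the query above divides between region servers and the client. It assumes a hypothetical Row record (Java 16+) and a DATE simplified to an int such as 20120401. For each region, it builds a map from FEATURE to a partial SUM(txns). The client then merges those maps. An explicit key-range check stands in for the scan's start and end keys. This models the data flow only and does not use the HBase coprocessor API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of region-side partial aggregation plus client-side merge for:
//   SELECT feature, SUM(txns) FROM product_metrics
//   WHERE org_id = :1 AND date >= :2 AND date <= :3 AND io_time > 100 GROUP BY feature
final class FeatureTxnAggregation {
    // Hypothetical PRODUCT_METRICS row; DATE is simplified to an int such as 20120401.
    record Row(String orgId, int date, String feature, long txns, long ioTime) {}

    // Work done per region: keep rows in the scanned key range that pass the filter,
    // and build a map of distinct FEATURE values to partial sums.
    static Map<String, Long> partialAggregate(List<Row> region, String orgId, int fromDate, int toDate) {
        Map<String, Long> sums = new HashMap<>();
        for (Row r : region) {
            boolean inKeyRange = r.orgId().equals(orgId) && r.date() >= fromDate && r.date() <= toDate;
            if (!inKeyRange || r.ioTime() <= 100) {
                continue;                     // outside the scan range, or rejected by IO_TIME > 100
            }
            sums.merge(r.feature(), r.txns(), Long::sum);
        }
        return sums;                          // one entry (one returned row) per distinct group
    }

    // Client-side final merge of the per-region partial results.
    static Map<String, Long> merge(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((feature, sum) -> total.merge(feature, sum, Long::sum));
        }
        return total;
    }
}
```

SUM can be merged this way because partial sums add. AVG would need each region to return a (sum, count) pair instead.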

Phoenix Query Optimizations


Start/stop key of scan based on AND-ed columns
  Through SUBSTR, ROUND, TRUNC, LIKE
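When a LIKE pattern on the leading row key column begins with a literal prefix, the scan can be bounded by that prefix. The following sketch assumes that the column is the first part of the row key and that keys are compared as UTF-8 bytes. The start key is the prefix. The exclusive stop key is the prefix with its last byte incremented, after dropping any trailing 0xFF bytes. The class name is hypothetical, and rows inside the range still need the full LIKE filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: derive an HBase-style [start, stop) key range from a LIKE pattern on the
// leading row key column. Rows in the range still go through the full LIKE filter.
final class LikeKeyRange {
    // Returns {startKey, stopKey}, or null when the pattern has no literal prefix.
    static byte[][] rangeFor(String likePattern) {
        int end = 0;
        while (end < likePattern.length()
                && likePattern.charAt(end) != '%' && likePattern.charAt(end) != '_') {
            end++;
        }
        if (end == 0) {
            return null;                      // e.g. '%Pack': must scan the whole table
        }
        byte[] start = likePattern.substring(0, end).getBytes(StandardCharsets.UTF_8);
        return new byte[][] {start, nextPrefix(start)};
    }

    // Smallest key greater than every key that begins with prefix (an exclusive stop key).
    static byte[] nextPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);   // drop trailing bytes that were 0xFF
            }
        }
        return new byte[0];                          // all 0xFF: an empty stop key means "to the end"
    }
}
```

For host LIKE 'cs11-%', this gives start key cs11- and stop key cs11. because '.' (0x2E) is the byte after '-' (0x2D).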

Parallelized on client by chunking over start/stop key of scan
Aggregation on region servers through coprocessor

  Inline for GROUP BY over row-key-ordered columns
  In-memory map per group otherwise
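The two aggregation strategies above differ in how much state they keep. The following sketch uses a hypothetical Row record (Java 16+) with SUM as the aggregate. If the GROUP BY columns lead the row key, rows arrive ordered by group, so a group can be emitted as soon as its key changes. Otherwise, every distinct group needs a map entry until the scan ends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch contrasting the two GROUP BY strategies (hypothetical names; SUM as the aggregate).
final class GroupByStrategies {
    record Row(String groupKey, long value) {}

    // Inline: rows arrive sorted by the group key, so only the current group's state is kept
    // and each group is emitted as soon as the key changes.
    static List<Map.Entry<String, Long>> orderedSum(Iterator<Row> sortedRows) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String current = null;
        long sum = 0;
        while (sortedRows.hasNext()) {
            Row r = sortedRows.next();
            if (current != null && !current.equals(r.groupKey())) {
                out.add(Map.entry(current, sum));   // previous group is complete
                sum = 0;
            }
            current = r.groupKey();
            sum += r.value();
        }
        if (current != null) {
            out.add(Map.entry(current, sum));       // flush the last group
        }
        return out;
    }

    // Otherwise: one map entry per distinct group, held until the scan ends.
    static Map<String, Long> hashSum(Iterator<Row> rows) {
        Map<String, Long> sums = new HashMap<>();
        while (rows.hasNext()) {
            Row r = rows.next();
            sums.merge(r.groupKey(), r.value(), Long::sum);
        }
        return sums;
    }
}
```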

WHERE clause executed through custom filters
  Incremental evaluation with early termination
  Evaluated through byte pointers
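The idea of incremental evaluation with early termination can be sketched as a filter that sees a row's key values one at a time. The filter decides as soon as the AND-ed WHERE clause is settled. The sketch rests on these assumptions:
- Each conjunct is a single-column numeric predicate.
- Values arrive as longs rather than through byte pointers.
- The class and column names are hypothetical.

A row that never supplies every referenced column stays UNKNOWN, which a caller would treat as not matching.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongPredicate;

// Sketch of a WHERE clause of AND-ed single-column predicates evaluated incrementally
// as a row's key values arrive (hypothetical class; values modeled as longs).
final class IncrementalAndFilter {
    enum Outcome { TRUE, FALSE, UNKNOWN }

    private final Map<String, LongPredicate> conjuncts;   // column name -> predicate
    private final Set<String> satisfied = new HashSet<>();
    private boolean failed;

    IncrementalAndFilter(Map<String, LongPredicate> conjuncts) {
        this.conjuncts = conjuncts;
    }

    // Call before the first key value of each new row.
    void reset() {
        satisfied.clear();
        failed = false;
    }

    // Feed one key value of the current row; the outcome is reported as soon as it is known.
    Outcome onCell(String column, long value) {
        if (failed) {
            return Outcome.FALSE;                 // already decided: skip remaining cells
        }
        LongPredicate predicate = conjuncts.get(column);
        if (predicate != null) {
            if (!predicate.test(value)) {
                failed = true;                    // one false conjunct settles the whole AND
                return Outcome.FALSE;
            }
            satisfied.add(column);
        }
        return satisfied.size() == conjuncts.size() ? Outcome.TRUE : Outcome.UNKNOWN;
    }
}
```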

IN and OR over same column (in progress)
  Becomes batched get or filter with next row hint
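An IN list over the leading row key column lets the scan jump directly between matching keys instead of reading every row in between. The following sketch models the "filter with next row hint" idea, with row keys simplified to single-column Strings. In HBase itself, a filter requests such a seek by returning Filter.ReturnCode.SEEK_NEXT_USING_HINT. This code only simulates that decision and does not use the HBase API. The names are hypothetical, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Sketch of "IN over the leading key column becomes a filter with a next-row hint".
// Row keys are simplified to Strings that consist of that single column.
final class InListSkipScan {
    enum Action { INCLUDE, SEEK, DONE }

    record Decision(Action action, String hint) {}

    private final TreeSet<String> inValues;          // the IN (...) list, kept sorted

    InListSkipScan(Collection<String> values) {
        this.inValues = new TreeSet<>(values);
    }

    // Per-row decision: include the row, seek forward to the next IN value, or stop.
    Decision evaluate(String rowKey) {
        if (inValues.contains(rowKey)) {
            return new Decision(Action.INCLUDE, null);
        }
        String next = inValues.higher(rowKey);       // smallest IN value greater than this key
        return next == null ? new Decision(Action.DONE, null) : new Decision(Action.SEEK, next);
    }

    // Simulates a scan over sorted row keys, jumping directly to each hinted key.
    List<String> scan(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.size()) {
            Decision d = evaluate(sortedKeys.get(i));
            switch (d.action()) {
                case INCLUDE -> out.add(sortedKeys.get(i++));
                case SEEK -> {
                    int pos = Collections.binarySearch(sortedKeys, d.hint());
                    i = pos >= 0 ? pos : -pos - 1;   // first key >= hint
                }
                case DONE -> i = sortedKeys.size();
            }
        }
        return out;
    }
}
```

With IN ('b', 'd', 'x') over keys a, b, c, d, e, f, z, the scan visits a, b, c, d, e and z, skipping f. It returns b and d.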

Top N queries (future)
  Through coprocessor keeping top N rows
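Top N is listed as future work. One standard approach, sketched here as an assumption rather than Phoenix's eventual design, has each region keep a min-heap bounded at N entries. The client then merges the per-region lists, since the global top N must be contained in the union of each region's top N. Plain long values stand in for rows, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of top-N: bounded min-heap per region, then a client-side merge (hypothetical names).
final class TopN {
    // Keeps the n largest values, holding at most n entries at any time.
    static List<Long> topN(Iterable<Long> values, int n) {
        PriorityQueue<Long> heap = new PriorityQueue<>();  // smallest retained value at the head
        for (long v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (n > 0 && v > heap.peek()) {
                heap.poll();                              // evict the current smallest
                heap.add(v);
            }
        }
        List<Long> out = new ArrayList<>(heap);
        out.sort(Comparator.reverseOrder());              // largest first
        return out;
    }

    // The global top n is contained in the union of the per-region top n lists.
    static List<Long> merge(List<List<Long>> perRegion, int n) {
        List<Long> all = new ArrayList<>();
        perRegion.forEach(all::addAll);
        return topN(all, n);
    }
}
```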

TABLESAMPLE (future)
  Becomes filter with next row hint

Phoenix Performance


Phoenix Roadmap


Increase breadth of SQL support
  DML/DDL (in progress)
  Derived tables (SELECT * FROM (SELECT foo FROM bar))
  More built-in functions: COALESCE, UPPER, TRIM
  More operators: ||, IS NULL, *, /, +, -

Secondary indexes
  Multiple projections for immutable data
    Reordered columns in row key
    Different levels of aggregation
  Incrementally maintained for mutable data

TABLESAMPLE for sampling

Improve multi-byte support

Joins
  Hash join

OLAP extensions
  OVER
  PARTITION BY
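Hash join is on the roadmap. The following in-memory sketch shows the generic technique. It builds a hash table on the smaller input, then streams the larger input through it and emits one pair per matching key. The method name is hypothetical, and the sketch does not represent Phoenix's eventual design. Rows must be non-null because the pairs are built with Map.entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of an in-memory inner equi-join using a hash table (hypothetical names).
final class HashJoinSketch {
    static <B, P, K> List<Map.Entry<B, P>> innerJoin(
            List<B> build, Function<B, K> buildKey,
            Iterable<P> probe, Function<P, K> probeKey) {
        // Build phase: index the smaller input by its join key.
        Map<K, List<B>> table = new HashMap<>();
        for (B b : build) {
            table.computeIfAbsent(buildKey.apply(b), k -> new ArrayList<>()).add(b);
        }
        // Probe phase: stream the larger input and emit one pair per match.
        List<Map.Entry<B, P>> out = new ArrayList<>();
        for (P p : probe) {
            for (B b : table.getOrDefault(probeKey.apply(p), List.of())) {
                out.add(Map.entry(b, p));
            }
        }
        return out;
    }
}
```

Only the build side has to fit in memory, which is why the smaller input is chosen for it. The probe side can be streamed from a scan.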

Demo


Time-series database charting

http://goo.gl/61WRs

Thank you!
Questions/comments?
