Phoenix
James [email protected]
We put the SQL back in the NoSQL
Agenda
Phoenix Overview
Phoenix Implementation
Performance Analysis
Phoenix Roadmap
Demo
Phoenix Overview
SQL layer on top of HBase
Delivered as an embedded JDBC driver
Targeting low-latency queries over HBase data
Columns modeled as multi-part row key and key values
Query engine transforms SQL into a series of scans
Using native HBase APIs and capabilities
  Coprocessors for aggregation
  Custom filters for expression evaluation
  Transaction isolation through scan time range
  Optionally client-controlled timestamps
Open sourcing soon
100% Java
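The "isolation through scan time range" point can be illustrated with a small model. This is only a sketch in plain Python, not the HBase or Phoenix API: the `Store` layout and the `put` and `read_as_of` helpers are hypothetical names introduced here, and the upper bound of the time range is assumed to be exclusive. A reader that fixes its upper timestamp when the query starts does not see cells written afterwards.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a versioned store: each (row, column)
# maps to a list of (timestamp, value) versions. This is not the HBase API.
Cell = Tuple[int, str]
Store = Dict[Tuple[str, str], List[Cell]]


def put(store: Store, row: str, column: str, ts: int, value: str) -> None:
    """Record a new version of a cell at the given timestamp."""
    store.setdefault((row, column), []).append((ts, value))


def read_as_of(store: Store, row: str, column: str, max_ts: int) -> Optional[str]:
    """Return the newest value written strictly before max_ts, if any."""
    visible = [(ts, v) for ts, v in store.get((row, column), []) if ts < max_ts]
    return max(visible)[1] if visible else None


store: Store = {}
put(store, "org1", "txns", ts=100, value="5")
put(store, "org1", "txns", ts=200, value="9")  # written after the query began

# A query that started at time 150 keeps reading the older version.
snapshot = read_as_of(store, "org1", "txns", max_ts=150)
latest = read_as_of(store, "org1", "txns", max_ts=300)
```

With client-controlled timestamps, the caller would choose the `ts` passed to `put` and the `max_ts` used for reads instead of relying on server time.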
Phoenix SQL Support
SELECT <expression>…
FROM <table>
WHERE <expression>
GROUP BY <expression>…
HAVING <aggregate expression>
ORDER BY <aggregate expression>…
LIMIT <value>
Aggregation Functions: MIN, MAX, AVG, SUM, COUNT
Built-in Functions: SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE
Operators: =, !=, <>, <, <=, >, >=, LIKE, AND, OR, NOT
Bind Parameters: ?, :#
CASE WHEN
IN (<value>…)
DDL/DML (in progress)
  CREATE/DROP <table>
  DELETE FROM <table> WHERE <expression>
  UPSERT INTO <table> [(<column>…)] VALUES (<value>…)
Sample Queries
SELECT host, TRUNC(dateTime, 'DAY'), AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
  AND dateTime > TO_DATE('2012-04-01')
  AND dateTime < TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)

SELECT product_number, product_name,
  CASE WHEN list_price = 0 THEN 'Mfg item - not for resale'
       WHEN list_price < 50 THEN 'Under $50'
       WHEN list_price >= 50 AND list_price < 250 THEN 'Under $250'
       WHEN list_price >= 250 AND list_price < 1000 THEN 'Under $1000'
       ELSE 'Over $1000'
  END AS price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking')
  AND (product_name LIKE '%Pack' OR product_name LIKE '% Cots %')
Query Processing
Product Metrics HTable
  Row Key: ORG_ID, DATE, FEATURE
  Key Values: TXNS, IO_TIME, RESPONSE_TIME
Scan
  Start key: ORG_ID (:1) + DATE (:2)
  End key: ORG_ID (:1) + DATE (:3)
Filter
  IO_TIME > 100
Aggregation
  Intercepts scan on region server
  Builds map of distinct FEATURE values
  Returns one row per distinct group
  Client does final merge
SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
  AND date >= :2 AND date <= :3
  AND io_time > 100
GROUP BY feature
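The scan, filter, aggregation and merge steps for this query can be modeled in a few lines. The following is a rough sketch in plain Python under stated assumptions, not Phoenix or HBase code: rows are held in lists standing in for two regions, the row key is assumed to be the tuple (org_id, date, feature), and `region_scan`, `region_aggregate` and `client_merge` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row layout: key (org_id, date, feature), values txns / io_time.
Row = Tuple[Tuple[str, str, str], Dict[str, int]]


def region_scan(rows: List[Row], org_id: str, start: str, end: str) -> Iterator[Row]:
    """Yield rows whose key lies between org_id+start and org_id+end, in key order."""
    for key, values in sorted(rows, key=lambda r: r[0]):
        if key[0] == org_id and start <= key[1] <= end:
            yield key, values


def region_aggregate(rows: List[Row], org_id: str, start: str, end: str,
                     min_io_time: int) -> Dict[str, int]:
    """Region-side step: filter on io_time, then sum txns per distinct feature."""
    partial: Dict[str, int] = defaultdict(int)
    for key, values in region_scan(rows, org_id, start, end):
        if values["io_time"] > min_io_time:      # the WHERE io_time > 100 filter
            partial[key[2]] += values["txns"]    # one entry per distinct feature
    return dict(partial)


def client_merge(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Client-side step: combine the per-region partial sums."""
    merged: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for feature, total in partial.items():
            merged[feature] += total
    return dict(merged)


region_a: List[Row] = [
    (("org1", "2012-04-01", "search"), {"txns": 10, "io_time": 150}),
    (("org1", "2012-04-02", "login"), {"txns": 4, "io_time": 90}),
    (("org2", "2012-04-01", "search"), {"txns": 99, "io_time": 500}),
]
region_b: List[Row] = [
    (("org1", "2012-04-03", "search"), {"txns": 7, "io_time": 120}),
    (("org1", "2012-04-03", "login"), {"txns": 3, "io_time": 200}),
    (("org1", "2012-08-01", "search"), {"txns": 50, "io_time": 300}),
]

partials = [region_aggregate(r, "org1", "2012-04-01", "2012-06-30", 100)
            for r in (region_a, region_b)]
result = client_merge(partials)
```

Each region returns only one partial sum per feature, so the client merges a handful of small maps rather than every matching row.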
Phoenix Query Optimizations
Start/stop key of scan based on AND-ed columns
  Through SUBSTR, ROUND, TRUNC, LIKE
Parallelized on client by chunking over start/stop key of scan
Aggregation on region servers through coprocessor
  Inline for GROUP BY over row key ordered columns
  In-memory map per group otherwise
WHERE clause executed through custom filters
  Incremental evaluation with early termination
  Evaluated through byte pointers
IN and OR over same column (in progress)
  Becomes batched get or filter with next row hint
Top N queries (future)
  Through coprocessor keeping top N rows
TABLESAMPLE (future)
  Becomes filter with next row hint
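As an illustration of the first optimization, a LIKE pattern with a literal prefix on the leading row key column can be turned into a start/stop key pair. This is a minimal sketch in plain Python, not Phoenix's implementation: `like_prefix`, `next_key` and `scan_range` are hypothetical helpers, escape characters in the pattern are ignored, and the stop key is assumed to be exclusive.

```python
from typing import Optional, Tuple


def like_prefix(pattern: str) -> str:
    """Return the literal prefix of a LIKE pattern, up to the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "%_":
            return pattern[:i]
    return pattern


def next_key(prefix: bytes) -> Optional[bytes]:
    """Smallest byte string greater than every key that starts with prefix.

    Returns None when no such bound exists (empty prefix or only 0xFF bytes),
    meaning the scan has to run to the end of the table.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scan_range(pattern: str) -> Tuple[bytes, Optional[bytes]]:
    """Derive (start key, exclusive stop key) from a LIKE pattern."""
    start = like_prefix(pattern).encode("utf-8")
    return start, next_key(start)


prefix_range = scan_range("cs11-%")   # bounded on both sides
open_range = scan_range("%Pack")      # leading wildcard: no usable prefix
```

A pattern such as `'%Pack'` yields no key bounds, so it can only be applied as a filter over the scanned rows. Anything after the first wildcard still has to be checked by a filter even when a prefix range exists.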
Phoenix Performance
Phoenix Roadmap
Increase breadth of SQL support
  DML/DDL (in progress)
  Derived tables (SELECT * FROM (SELECT foo FROM bar))
  More built-in functions: COALESCE, UPPER, TRIM
  More operators: ||, IS NULL, *, /, +, -
Secondary indexes
  Multiple projections for immutable data
    Reordered columns in row key
    Different levels of aggregation
  Incrementally maintained for non-immutable data
TABLESAMPLE for sampling
Improve multi-byte support
Joins
  Hash join
OLAP extensions
  OVER
  PARTITION BY
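For readers unfamiliar with the hash join named on the roadmap, the general technique looks roughly like the sketch below. It is a generic in-memory inner equi-join in plain Python and says nothing about how Phoenix will implement joins. The `hash_join` function, the `dc` column and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def hash_join(build: Iterable[Record], probe: Iterable[Record],
              key: str) -> Iterator[Record]:
    """Inner equi-join: index the build side by key, then stream the probe side."""
    index: Dict[object, List[Record]] = defaultdict(list)
    for row in build:
        index[row[key]].append(row)
    for row in probe:
        # .get avoids creating empty entries for probe keys with no match
        for match in index.get(row[key], []):
            yield {**match, **row}


hosts = [
    {"host": "cs11-1", "dc": "east"},
    {"host": "cs11-2", "dc": "west"},
]
metrics = [
    {"host": "cs11-1", "cache_hit": 95},
    {"host": "cs11-1", "cache_hit": 80},
    {"host": "cs11-3", "cache_hit": 70},
]

joined = list(hash_join(hosts, metrics, "host"))
```

The smaller input is normally used as the build side so that the hash index fits in memory while the larger input is streamed past it.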
Thank you!
Questions/comments?