Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

DESCRIPTION

Pivotal Xtension Framework (PXF) support for Accumulo within HAWQ provides a fully featured, native SQL interface to data stored in Accumulo. The Accumulo/PXF module extracts data from Accumulo through iterators and the Accumulo APIs and delivers it to HAWQ's SQL execution engine. Data extraction is fully parallel and uses query predicate pushdown for an additional performance boost. It also natively supports Accumulo's security labels. PXF is an external table interface in HAWQ, a SQL-on-Hadoop system, that lets you read data stored within the Hadoop ecosystem. External tables can be used to load data into HAWQ from Hadoop or to query Hadoop data without materializing it in HAWQ. PXF enables analysis of HAWQ data and Hadoop data in a single query. It supports a wide range of data formats and stores, such as Text, Avro, Hive, SequenceFile, RCFile, HBase, and now Accumulo.

Page 1: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

SQL-on-Accumulo with Pivotal HAWQ and PXF

Agenda

• HAWQ & PXF Overview

• Accumulo Connector - Usage

• Accumulo Connector - Advanced Features

• PXF API

• Demo

Page 2: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

HAWQ is…

A parallel SQL query engine on Hadoop

Page 3: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

PHD

Page 7: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

PXF is...

A fast, extensible framework connecting HAWQ to a data store of choice that exposes a parallel API

Page 8: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

[Diagram labels: PHD, direct analytics, PXF]

Page 9: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

[Diagram labels: PHD, indirect analytics, PXF]

Page 10: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Usage

CREATE EXTERNAL TABLE <table> (<col list>)
LOCATION ('pxf://rest_host:port/<data source>?<plugin options>')
FORMAT '<type>' (<params>)
[[LOG ERRORS INTO <err_t>] SEGMENT REJECT LIMIT <n> [ROWS|PERCENT]]

-- direct analytics (external)
SELECT <…> FROM <table> WHERE <…>

-- indirect analytics (internal)
INSERT INTO <hawq table> SELECT <…> FROM <table> WHERE <…>

Any SQL operation (joins, aggregates, sorting, etc.) can be executed.

Page 11: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Accumulo Connector - Usage

CREATE EXTERNAL TABLE <table> (<col list>)
LOCATION ('pxf://…/<accumulo table name>?profile=accumulo')
FORMAT 'custom' (formatter='pxfwritable_import')

CREATE EXTERNAL TABLE t (recordkey text, "cf1:date" date, "cf1:price" double precision)
LOCATION ('pxf://…/instance:sales?profile=accumulo')
FORMAT 'custom' (formatter='pxfwritable_import')

-- Example of a simple query
SELECT "cf1:date", max("cf1:price") FROM t
GROUP BY "cf1:date"
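
As a rough sketch of how the external table t could be combined with native HAWQ data, the statements below show one direct and one indirect use. The tables holidays and sales_local, and their columns, are hypothetical and do not appear in the original slides.

```sql
-- Hypothetical internal HAWQ table (an assumption for illustration).
CREATE TABLE holidays (day date, name text);

-- Direct analytics: join Accumulo data (external table t) with HAWQ data
-- in a single query, without materializing the Accumulo rows.
SELECT h.name, max(t."cf1:price") AS max_price
FROM t
JOIN holidays h ON t."cf1:date" = h.day
GROUP BY h.name;

-- Indirect analytics: copy the Accumulo data into an internal table first,
-- then query the local copy.
CREATE TABLE sales_local (recordkey text, sale_date date, price double precision);

INSERT INTO sales_local
SELECT recordkey, "cf1:date", "cf1:price" FROM t;
```

The direct form always reads current data from Accumulo; the indirect form trades freshness for repeated local scans.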

Page 12: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Accumulo Connector - Advanced Features

Smart filtering with predicate pushdown
Excludes irrelevant tablets and filters values at the source according to the WHERE clause of the HAWQ query.

Error tables for logging badly formatted data without aborting the query
Specify the desired error threshold. After the operation, query the error table to see the rejected data and the related errors.

Lookup table for easy access to non-textual qualifiers
Define a qualifier lookup table that translates between Accumulo-style naming and SQL-style naming.

Automatic statistics for better join planning
Run ANALYZE on a PXF-Accumulo table to update HAWQ's optimizer with table-level and attribute-level statistics from the Accumulo table.

Mechanism for storing remote credentials
The mapping between HAWQ user credentials and Accumulo user credentials is entered once in HAWQ and automatically passed to the Accumulo connector at runtime.
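
The error-table and statistics features might be used along the following lines. This is a sketch only: the names t_with_errors and err_sales and the limit of 10 rows are made up, the elided host in the LOCATION string is left as in the slides, and the error table columns shown assume the Greenplum-style error table layout that HAWQ inherits.

```sql
-- External table that tolerates up to 10 malformed rows per segment
-- (t_with_errors and err_sales are hypothetical names).
CREATE EXTERNAL TABLE t_with_errors (recordkey text, "cf1:date" date, "cf1:price" double precision)
LOCATION ('pxf://…/instance:sales?profile=accumulo')
FORMAT 'custom' (formatter='pxfwritable_import')
LOG ERRORS INTO err_sales SEGMENT REJECT LIMIT 10 ROWS;

-- Rows that cannot be parsed are diverted to err_sales instead of
-- aborting the query, as long as the threshold is not exceeded.
SELECT count(*) FROM t_with_errors;

-- Inspect the rejected data and the related error messages.
SELECT relname, errmsg, rawdata FROM err_sales;

-- Refresh optimizer statistics before joining against the external table.
ANALYZE t_with_errors;
```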

Page 13: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Accumulo Connector - Advanced Features

Visibility labels for enhanced security
The Accumulo connector uses Accumulo's built-in cell-level security to ensure users can view only the information for which they have been granted access.

Custom iterators for increased performance
Predicate pushdown is implemented with stackable custom iterators, which speed up the comparison operations (<, <=, >, >=, ==, !=) in a query's WHERE clause.

Intelligent range filtering
A comparison on recordkey narrows the range scanned by the Accumulo connector, minimizing the amount of data read and resulting in faster scans.

Automatic type detection
Data types are detected automatically within the iterator, so that the correct comparison operation is applied.
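
To make the filtering features concrete, the queries below show the kinds of predicates that are candidates for pushdown against the table t defined earlier. The key and value literals are invented for illustration, and whether a particular predicate is actually pushed down depends on the connector.

```sql
-- A comparison on recordkey can narrow the Accumulo scan range,
-- so only matching tablets and rows need to be read.
SELECT recordkey, "cf1:date", "cf1:price"
FROM t
WHERE recordkey >= '0001' AND recordkey < '0100';

-- Comparisons on qualifier columns are candidates for iterator-based
-- filtering at the source; the date and numeric types determine how
-- the values are compared.
SELECT recordkey, "cf1:price"
FROM t
WHERE "cf1:price" > 100.0 AND "cf1:date" >= '2014-01-01';

-- Both kinds of predicate can be combined in one WHERE clause.
SELECT "cf1:date", max("cf1:price")
FROM t
WHERE recordkey >= '0001' AND "cf1:price" <= 500.0
GROUP BY "cf1:date";
```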

Page 14: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

PXF API

• Fragmenter – returns a list of data source fragments and their locations

• Accessor – accesses a given list of fragments, reads them, and returns records

• Resolver – deserializes each record according to a given schema or technique

Distributed execution threads

Distributed database servers

Page 15: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

PXF API

• AccumuloFragmenter – returns a list of Accumulo tablets and their locations for a given table

• AccumuloAccessor – accesses a given list of fragments, reads them, and returns Accumulo records, using filter pushdown when possible

• AccumuloResolver – converts each qualifier value into a form that HAWQ can interpret

Page 16: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Live Demo

Page 17: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Accumulo Table Contents

Page 18: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

User Authorizations

Page 19: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

$PHD_ROOT/conf/pxf-profiles.xml

Page 20: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Define Table in HAWQ

Page 21: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Setting Authorizations

Page 22: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Executing a Simple Query

Page 23: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

A Query With a Single Pushdown Filter

Page 24: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

A Query With a Single Pushdown Filter

Page 25: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

A Query With Multiple Pushdown Filters

Page 26: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

A Query With Multiple Pushdown Filters

Page 27: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

A Query With Multiple Pushdown Filters

Page 28: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Setting Authorizations

Page 29: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Executing a Query as ‘foo’

Page 30: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Define a Lookup Table in Accumulo

Page 31: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Define a Lookup Table in HAWQ

Page 32: Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF

Performing a Simple Query