May Netezza Meetup: NPS 7.2, dashDB & Bluemix, Fluid Query

© Smart Associates (USA) Inc, 2015

April Netezza Meetupfiles.meetup.com/15407782/May Netezza Meetup.pdf• a single Netezza query can run in multiple physical locations, and combine data from local and remote data sources



© Smart Associates (Aotearoa) Ltd, 2015

NPS7.2 Highlights

• Kerberos authentication support

• Workload management improvements for medium-latency queries

• Improved system views to gauge query/snippet progress and runtimes

• Improved CallHome support and configuration options

• New system view to track load progress status 

• GPFS mount support, for easy loading from or exporting to a Hadoop cluster

• Netezza Replication Services improvements

• New SQL language functions (including string_to_int, int_to_string, hex_to_binary, hex_to_geometry)

• Multi-stream nzrestore!

• New system configuration and CLI settings


Workload Management Implications

• 2-60 second (configurable by host.schedMediumQueryLimitSecs registry setting) queries get scheduled to run ahead of estimated longer queries - accurate statistics very important!

• giving priority to shorter running queries improves throughput (by reducing scheduler conflicts), latency (by reducing queuing time), and GRA accuracy (by avoiding over or under-serving resource groups)

• can dramatically improve performance on busy systems (294 queries in 2 hrs vs. 539 queries in 1 hr)
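To put the two figures in the last bullet on a common footing, a minimal sketch of the arithmetic follows. It assumes the first figure is the measurement before the change and the second is the measurement after it; the variable names are illustrative only.

```python
# Figures quoted on the slide: (queries completed, elapsed hours).
# Assumption: the first pair is "before", the second is "after".
before = (294, 2.0)
after = (539, 1.0)


def queries_per_hour(completed: int, hours: float) -> float:
    """Average throughput over the measurement window."""
    return completed / hours


before_rate = queries_per_hour(*before)
after_rate = queries_per_hour(*after)
speedup = after_rate / before_rate

print(f"before: {before_rate:.0f} queries/hr")
print(f"after:  {after_rate:.0f} queries/hr")
print(f"throughput ratio: {speedup:.2f}x")
```

On these numbers the hourly throughput rises from 147 to 539 queries, roughly 3.7 times as many.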

[Figure: "Before" and "After" comparison charts, not preserved in this text]

Improved Monitoring Implications

• Improved Performance Portal provides GRA, resource usage, query workload/throughput, nzlocal swap space usage, and hardware status on one page

• nzsqa progress -tr command provides some of the information available in _vt_snippet_progress to help diagnose query performance issues

• can see table/database/file names and the number of rows loaded/rejected for ALL loads currently in progress by querying the new _v_load_status system view


Other NPS7.2 Implications

• need to upgrade clients to latest ODBC/JDBC drivers if you want to use Kerberos authentication

• after upgrading to 7.2, you should remove your existing call home events and configure the new events to define the latest list of event rules. Note: call home is disabled by default, needs to be enabled for automatic PMR creation with IBM.

• can create external tables that unload data directly to a query-able Hadoop cluster, or load from it - note GPFS support requires raising a support ticket in order to configure Hosts correctly

• Replication Services no longer require dedicated IBM hardware, and PTS servers can be configured for HA, along with numerous other improvements

• nzrestore is now a multi-stream operation. host.bnrRestoreStreamsDefault registry setting controls the default number of streams to use - if set to zero and the -streams AUTO parameter is used, it will default to the same as nzbackup
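The stream-count rule in the last bullet can be sketched as a small decision function. This is an illustration of the rule as described, not nzrestore's actual logic: only the "registry setting is zero and -streams AUTO is used" branch comes from the slide, and the other branches, the function name, and its parameters are assumptions.

```python
from typing import Union


def resolve_restore_streams(streams_arg: Union[int, str],
                            registry_default: int,
                            backup_streams: int) -> int:
    """Pick the stream count for a restore (illustrative only).

    streams_arg      -- value given for -streams: a number or "AUTO"
    registry_default -- value of host.bnrRestoreStreamsDefault
    backup_streams   -- number of streams the backup was taken with
    """
    if isinstance(streams_arg, int):
        # Assumption: an explicit count on the command line wins.
        if streams_arg < 1:
            raise ValueError("stream count must be at least 1")
        return streams_arg

    if streams_arg.upper() != "AUTO":
        raise ValueError(f"unsupported -streams value: {streams_arg!r}")

    if registry_default > 0:
        # Assumption: a non-zero registry default is used as-is.
        return registry_default

    # From the slide: registry setting 0 plus -streams AUTO
    # falls back to the same number of streams as nzbackup.
    return backup_streams


print(resolve_restore_streams("AUTO", 0, 4))
```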


dashDB

• Single user/multi-tenant cloud database service

• combines BLU in-memory, columnar storage with INZA, ESRI Spatial, and R functions on top

• no need to worry about hardware/VMs, o/s, networking, physical storage, availability/backups

• web-based monitoring, querying, and usage reporting

• can be integrated with other Bluemix services e.g. DataWorks, BigInsights

• Different data volume, performance, and pricing tiers

• Free <=1 GB

• US$50 a month <= 20 GB

• 64GB RAM, 16 vCPUs, <=1 TB raw

• 256GB RAM, 32 Cores, <=4 TB raw

• 256GB RAM, 32 Cores, <=12 TB raw (more data, lower speed)


Some Screenshots


dashDB Implications

• The Good

• BLU provides fast query performance with minimal physical database design

• Mature DB2 capabilities (e.g. recursive SQL, PK/FK/Unique constraint enforcement)

• in-database analytic function support (e.g. Open Source R, ESRI Spatial)

• no need to GROOM

• no separate software license/maintenance fees or data transfer fees

• Potential Gotchas

• not a true ‘Netezza in the Cloud’ offering (although we can provide that if wanted)

• no external table load mechanism - can use DataWorks Bluemix component for bulk data loading, but requires coding

• different SQL syntax (e.g. no distribute/organize clauses) & ODBC/JDBC drivers

• Contact us if you need > 20GB storage


…& it’s not just IBM services…

Data Stores Development Tools Security Operations Support

Messaging Mobile Analytics

Cloud innovators are joining the IBM Cloud marketplace

CloudAMQP

Active or being onboarded. More joining every day…

Business Support


Bluemix Implications

• Faster, easier, and cheaper to build, deploy, manage, & maintain applications than provisioning infrastructure on premise or via infrastructure-as-a-service providers like Azure, Amazon, or Google, or developing bespoke equivalent functionality

• None of the hidden charges or billing surprises that other service providers impose simply for accessing your own data or making it highly available. Less administration, monitoring, & maintenance hassle too. Register at hub.jazz.net for DevOps & Git source code control of your Bluemix cloud applications.

• Allows you to preserve, leverage, & integrate with existing system investments - no need to duplicate existing infrastructure & data in the cloud to leverage it

• From a BI/DW perspective, it is very quick & easy to integrate, enrich, and analyse external data (e.g. social graph, weather, demographic) with internal customer, account, and transactional information, without sacrificing functionality like data cleansing/profiling, security, etc.

• Not confined to or constrained by IBM’s offerings - the combination of IBM Cloud Marketplace and Bluemix Platform As A Service packaging represents the future of software development (which others are trying to copy)


FluidQuery

• New, free PureData for Analytics extension, available now

• Provides remote/distributed SQL capability

• a single Netezza query can run in multiple physical locations, and combine data from local and remote data sources

• totally transparent to end users and applications

• no need to copy/move data to/from your Hadoop cluster (either on premise or in the cloud) in order to query it and combine it with your existing Netezza data

• Can optionally be used for bulk data import/export also if necessary

• Requires some installation and configuration


Supported Systems


FluidQuery in Action


FluidQuery Implications

• This is v1.0. Future releases could potentially add support for federated queries to other Netezza appliances, Apache Spark, MapR, and dashDB, as well as improved semantics-based query routing, RESTful API support, improved distributed workload management, and more…

• Note the dependency on INZA already being installed for FluidQuery to work

• Note the use of UDTF syntax in SQL statements to reference remote objects

• Can make remote objects appear just like local ones by encapsulating the UDTF call in a view, and then querying through the view instead

• Each remote source needs to be defined/configured to the FluidQuery Data Connector service using the fqConfigure.sh script

• Data Connector functions need to be created in one or more Netezza databases using the fqRegister.sh script, and users need to be assigned permissions to invoke the functions if they want to use them e.g.

• create_inza_db_user.sh dc_db username

• grant execute on function_name (varchar(any),varchar(any)) to username/groupname;
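The view-encapsulation idea above can be sketched as a small helper that generates the view definition. Everything named here is hypothetical: the view name, the function name, and the meaning of its two arguments depend on what fqRegister.sh created at your site. The sketch also assumes the `TABLE WITH FINAL (...)` form used to invoke Netezza table functions; check the FluidQuery documentation for the exact call your registered function expects.

```python
def udtf_view_sql(view_name: str, function_name: str, *args: str) -> str:
    """Build a CREATE VIEW statement that hides a table-function call.

    All identifiers passed in are illustrative; nothing is validated
    against a real database.
    """
    # Quote each argument as a SQL string literal, doubling any
    # embedded single quotes.
    quoted = ", ".join("'" + a.replace("'", "''") + "'" for a in args)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT * FROM TABLE WITH FINAL ({function_name}({quoted}));"
    )


# Hypothetical names: a registered function taking two varchar arguments.
print(udtf_view_sql("v_remote_sales", "my_fq_function",
                    "remote_db", "remote_table"))
```

Once such a view exists, users can select from it like any local table, provided they hold execute permission on the underlying function.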


Local vs Remote Mode

• Local Mode

• connections enable users to run concurrent queries to different Hadoop service providers (but will need to register functions specific to each service provider)

• connections have a greater impact on the remote system than remote mode connections (as queries are effectively run twice)

• Remote Mode

• only allows connections to one Hadoop service at a time

• connections to other Hadoop services must use local mode, or you must stop the remote mode service and start a new one to a different Hadoop provider

• requires a service to be running which can be started by running the fqRemote.sh script or automatically when you run a remote mode function

• recommended for query users who plan to select and retrieve data directly from Hadoop objects


Bulk Data Movement using FluidQuery

• Requires additional software installation and configuration (on both the PDA Hosts, and the remote Hadoop instance)

• Uses a command line tool with XML parameter files to perform the actual movement (i.e. not using SQL)

• Transfer can occur in three modes (set in the Compression properties section of the XML configuration files)

• Text mode - NPS tables are transferred in text format and saved to HDFS in text format

• Mixed mode - NPS tables are transferred in compressed format and saved to HDFS in text format

• Compressed mode - NPS tables are transferred in compressed format and saved to HDFS in compressed format

• Note that you can store data in compressed mode on Hadoop, but only for backup/restore purposes. Tables stored in the compressed format on Hadoop cannot be queried or modified

• Some data type translation will occur as a result (e.g. dates/timestamps stored on Hadoop as strings)
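The three transfer modes above can be summarised as a small lookup. This is only a restatement of the slide in code; the class and field names are made up for illustration and do not correspond to the XML configuration properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferMode:
    name: str
    wire_format: str   # format used while the table is transferred
    hdfs_format: str   # format the table is saved in on HDFS

    @property
    def queryable_on_hadoop(self) -> bool:
        # Per the slide: tables stored compressed on Hadoop are for
        # backup/restore only and cannot be queried or modified.
        return self.hdfs_format == "text"


MODES = {
    "text": TransferMode("text", "text", "text"),
    "mixed": TransferMode("mixed", "compressed", "text"),
    "compressed": TransferMode("compressed", "compressed", "compressed"),
}

for mode in MODES.values():
    print(f"{mode.name:<10} transfer={mode.wire_format:<10} "
          f"stored={mode.hdfs_format:<10} "
          f"queryable={mode.queryable_on_hadoop}")
```

Read this way, mixed mode is the option that reduces transfer volume while still leaving the data queryable on Hadoop.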


Demonstration Videos

• dashDB Overview

• https://youtu.be/__yudj8whWk

• R Studio integration with dashDB

• https://youtu.be/Idq-24nD9DY

• ESRI Spatial integration with dashDB

• https://youtu.be/lVb0vWaokuU

• How BlueMix Works

• https://youtu.be/OD1NP-Yk2BI

• DataWorks Service

• https://www.youtube.com/playlist?list=PLmVWZ6sPOWL0ibI368j-7AQM-udSm9Aap

• BigInsights Service

• https://youtu.be/gcB7PKrVapw
