Backup Options: IBM PureData™ System for Analytics, powered by Netezza. Tony Pearson, IBM Master Inventor and Senior IT Specialist. March 2014. © 2014 IBM Corporation.

Backup Options for IBM PureData for Analytics powered by Netezza

DESCRIPTION

Confused about what options there are to back up your Netezza or IBM PureData for Analytics solution? This presentation provides alternatives related to file system and external backup software approaches using IBM Storwize V7000 Unified and IBM Tivoli Storage Manager.

Page 1: Backup Options for IBM PureData for Analytics powered by Netezza

© 2014 IBM Corporation

Backup Options IBM PureData™ System for Analytics, powered by Netezza

Tony Pearson – IBM Master Inventor and Senior IT Specialist

March 2014

Page 2: Backup Options for IBM PureData for Analytics powered by Netezza

Part of the IBM Big Data Platform

Workload Optimized Solutions for All Your Analytic Needs

[Diagram: IBM Big Data Platform, with these labeled components]

• Analytics & Decision Management
• Solutions
• Big Data Infrastructure
• Accelerators
• Information Integration & Governance
• Visualization & Discovery
• Application Development
• Systems Management
• Stream Computing
• Hadoop System
• Data Warehouse
• PureData System for Analytics

Page 3: Backup Options for IBM PureData for Analytics powered by Netezza

Spend Less Time Managing and More Time Innovating

Simplicity and Ease of Administration

• No dbspace/tablespace sizing and configuration
• No redo/physical/logical log sizing and configuration
• No page/block sizing and configuration for tables
• No extent sizing and configuration for tables
• No temp space allocation and monitoring
• No RAID level decisions for dbspaces
• No logical volume creations of files
• No integration of OS kernel recommendations
• No maintenance of OS recommended patch levels
• No JAD sessions to configure host/network/storage

Data Experts, not Database Experts

• Easy Administration Portal
• No software installation
• No indexes and tuning
• No storage administration

IBM’s Advantage: FPGA

• A real-time silicon SQL accelerator
• Dynamically reprogrammed for each individual query
• Eradicates ~95% of system I/O before the CPU ever sees it
• Completely unique to PDA

Page 4: Backup Options for IBM PureData for Analytics powered by Netezza

PureData System for Analytics Hardware Overview: Model N200x

• User Data Capacity: 192 TB*
• Data Scan Speed: 478 TB/hr*
• Load Speed (per system): 5+ TB/hr
• Active Data Slices: 96
• Power Requirements: 7.5 kW
• Cooling Requirements: 27,000 BTU/hr

* Assuming 4X compression

Scales from 1/4 Rack to 4 Racks

2 Hosts (Active-Passive)
• 2 Intel 2.7 GHz Sandy Bridge CPUs
• 7 x 300 GB SAS Drives
• Red Hat Linux 6 64-bit

7 PureData for Analytics S-Blades™
• 2 Intel 8 Core 2+ GHz CPUs
• 2 8-Engine Xilinx Virtex-6 FPGAs
• 128 GB RAM + 8 GB slice buffer
• Linux 64-bit Kernel

12 Disk Enclosures
• 288 600 GB SAS2 Drives
  – 240 for User Data
  – 14 for S-Blades
  – 34 Spare
• RAID 1 Mirroring
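The capacity figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constants are copied from the slide, decimal units (1 TB = 1000 GB) are an assumption, and the helper name `on_disk_tb` is hypothetical.

```python
# Figures copied from the slide; everything else is derived arithmetic.
USER_DATA_CAPACITY_TB = 192   # quoted with the assumed 4X compression
SCAN_SPEED_TB_PER_HR = 478    # quoted with the assumed 4X compression
COMPRESSION_RATIO = 4
DRIVE_SIZE_GB = 600
DRIVES = {"user data": 240, "S-Blades": 14, "spare": 34}


def on_disk_tb(logical_tb, ratio=COMPRESSION_RATIO):
    """Convert a figure quoted 'after compression' to its on-disk equivalent."""
    return logical_tb / ratio


total_drives = sum(DRIVES.values())
# Assumption: decimal units, 1 TB = 1000 GB.
raw_user_tb = DRIVES["user data"] * DRIVE_SIZE_GB / 1000
# RAID 1 keeps two copies of every block, so half the raw space is usable.
mirrored_user_tb = raw_user_tb / 2

if __name__ == "__main__":
    print("drives in enclosures:", total_drives)
    print("raw user-data space (TB):", raw_user_tb)
    print("after RAID 1 mirroring (TB):", mirrored_user_tb)
    print("on-disk user capacity (TB):", on_disk_tb(USER_DATA_CAPACITY_TB))
    print("on-disk scan rate (TB/hr):", on_disk_tb(SCAN_SPEED_TB_PER_HR))
```

The three drive groups add up to the 288 drives listed, and 192 TB at 4X compression corresponds to 48 TB on disk. The slide does not say how the remaining mirrored space is used, so the sketch makes no claim about it.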

Page 5: Backup Options for IBM PureData for Analytics powered by Netezza

IBM PureData for Analytics – Reasons for Backup

Firmware (Linux, Code)
• IBM will take care of Red Hat Enterprise Linux, Web Admin and other code as needed
  – No need for you to back it up yourself

Metadata (Host Catalog; global users, groups, permissions)
• Back this up to protect host configuration from data corruption (rare)

User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• Various reasons to back up database schema and contents
  – As part of firmware upgrade/downgrade
  – To transfer data to another system
  – Protect against hardware failure / disaster
  – Protect against data corruption

Page 6: Backup Options for IBM PureData for Analytics powered by Netezza

Compressed versus Text-format

[Diagram: user data (Database 1: Tables A, B; Database 2: Tables X, Y, Z) is backed up from a system at a given firmware level and restored to a system at Firmware -1 (downgrade), the same firmware, Firmware +1 (upgrade), or to other database systems]

• Compressed database backup and compressed external tables: restore to same or higher firmware
• Text-format external table: restore to any, but slower, takes up more space

Page 7: Backup Options for IBM PureData for Analytics powered by Netezza

Two Primary Approaches

1. Filesystem Approach
• Back up metadata and databases to external NAS storage devices
• Built-in CLI commands included
• Scripts for large databases available

2. External Backup Software
• Back up metadata and databases to external backup server/media
• User-initiated and automatic scheduled backups
• Supports disk, tape and virtual tape storage devices

[Diagram: Metadata (Host Catalog; global users, groups, permissions) and User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)]

Page 8: Backup Options for IBM PureData for Analytics powered by Netezza

Network Configuration using SAN or LAN as Backup Network

[Diagram: users reach the PureData for Analytics system over the User Network; backups travel over the Backup Network to an external storage device]

Metadata (Host Catalog; global users, groups, permissions)
• nzhostbackup
• nzbackup -users

User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• nzbackup -db (up to 16 streams)
• CREATE EXTERNAL TABLE
• nz_backup script for larger databases
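The filesystem flow on this slide could be scripted along the following lines. This is a minimal sketch rather than IBM's tooling: it only assembles argument lists and never runs them. The mount point `/mnt/v7000/backup` and the database names are hypothetical, and both the `-dir` option and the archive-path argument to `nzhostbackup` are assumptions to check against the command reference for your release.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical NAS mount point exported by the external storage device.
BACKUP_ROOT = PurePosixPath("/mnt/v7000/backup")


def filesystem_backup_commands(databases, root=BACKUP_ROOT):
    """Return the argument lists for one filesystem backup pass.

    The order follows the slide: host catalog first, then global users,
    groups and permissions, then each user database.
    """
    commands = [
        # Assumption: nzhostbackup takes the target archive path as its argument.
        ["nzhostbackup", str(root / "hostbackup.tar.gz")],
        # Assumption: -dir names the target directory for nzbackup.
        ["nzbackup", "-users", "-dir", str(root / "users")],
    ]
    for db in databases:
        commands.append(["nzbackup", "-db", db, "-dir", str(root / db)])
    return commands


if __name__ == "__main__":
    for argv in filesystem_backup_commands(["DATABASE1", "DATABASE2"]):
        print(shlex.join(argv))  # printed for review, not executed
```

Keeping each command as a list avoids shell-quoting mistakes if the lists are later handed to `subprocess.run`.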

Page 9: Backup Options for IBM PureData for Analytics powered by Netezza

Proof-of-Concept (PoC) Configuration

• Storwize V7000 Unified comprising
  – Two file modules (2073-700)
  – One V7000 control enclosure (2076-324)
  – Code level 1.4.0.1
• File modules connected via 4 x 10 Gbit interfaces
• 24 x 600 GB 10K SAS drives installed in V7000 control enclosure
• Test database:

Page 10: Backup Options for IBM PureData for Analytics powered by Netezza

Test Conclusion / Best Practices

[Chart: backup throughput in MB/sec (axis 0 to 500) by array layout and NSD count: 4 x RAID-5 4+P (4, 8, 20 NSDs); 2 x RAID-5 8+P (2, 10 NSDs); 3 x RAID-10 4+4 (3, 6, 8 NSDs)]

6+ TB/h uncompressed data *

* ~1.7 TB/h compressed data

• Matching the GPFS block size to RAID full stripe width is beneficial
• Matching the number of NSDs to number of RAIDs is beneficial
• When matching number of NSDs to number of RAIDs, usage of sequential NSDs is beneficial
• Small RAID-5 arrays (4+P) with the matching number of NSDs and mdisks (RAIDs) and 2 mount points show best performance (multiple streams)
• Supports both nzbackup/nzrestore CLI and nz_backup/nz_restore scripts

• Focusing on backup performance
  – Run multiple backup streams
• Focusing on restore performance
  – Run single backup stream
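To relate the two throughput figures on this slide to the chart axis and to a backup window, a rough calculation helps. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and reuses the 4X compression figure from the hardware overview slide; the 17 TB database size in the usage line is purely hypothetical.

```python
COMPRESSED_RATE_TB_PER_HR = 1.7   # "~1.7 TB/h compressed data" from the slide
COMPRESSION_RATIO = 4             # assumption carried over from the hardware slide


def tb_per_hour_to_mb_per_sec(tb_per_hour):
    """Convert TB/h to MB/sec using decimal units (assumption)."""
    return tb_per_hour * 1_000_000 / 3600


def backup_hours(compressed_tb, rate_tb_per_hr=COMPRESSED_RATE_TB_PER_HR):
    """Estimate the hours needed to write compressed_tb at the measured rate."""
    return compressed_tb / rate_tb_per_hr


if __name__ == "__main__":
    print("MB/sec on the wire:", round(tb_per_hour_to_mb_per_sec(1.7)))
    print("uncompressed-equivalent TB/h:", COMPRESSED_RATE_TB_PER_HR * COMPRESSION_RATIO)
    # Hypothetical database holding 17 TB of compressed data.
    print("hours for 17 TB compressed:", round(backup_hours(17), 1))
```

At about 472 MB/sec, the compressed rate sits near the top of the chart's 0 to 500 MB/sec axis, and multiplying by four gives roughly 6.8 TB/h, consistent with the "6+ TB/h uncompressed" label.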

Page 12: Backup Options for IBM PureData for Analytics powered by Netezza

Network Configuration

[Diagram: users reach the PureData for Analytics system over the User Network; backups travel over the Backup Network to an external backup server]

Metadata (Host Catalog; global users, groups, permissions)
• nzhostbackup to local file, then transfer to backup server
• nzbackup -users

User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• nzbackup -db (up to 1000 streams)
• Specify -connector and -connectorArgs
• Create scripts for automatic schedule
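A scheduled script for the external-software path might assemble its command as sketched below. Only the `-db`, `-connector` and `-connectorArgs` options come from the slide. The connector name `tsm`, the `name=value` pairs joined with `:`, and the setting name `EXAMPLE_SETTING` are assumptions or placeholders, so consult the backup product's documentation for the real values.

```python
import shlex


def connector_backup_command(database, connector, connector_args):
    """Build an nzbackup argument list that targets external backup software.

    connector_args is a dict of product-specific settings. Joining the
    name=value pairs with ':' is an assumed format, not a confirmed one.
    """
    joined = ":".join(f"{name}={value}" for name, value in connector_args.items())
    return [
        "nzbackup", "-db", database,
        "-connector", connector,
        "-connectorArgs", joined,
    ]


if __name__ == "__main__":
    # "tsm" and EXAMPLE_SETTING are placeholders for illustration only.
    argv = connector_backup_command("DATABASE1", "tsm", {"EXAMPLE_SETTING": "value"})
    print(shlex.join(argv))  # printed for review, not executed
```

A scheduler such as cron could call a script built this way once per database, which is one way to read the slide's "Create scripts for automatic schedule" bullet.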

Page 13: Backup Options for IBM PureData for Analytics powered by Netezza

External Backup Architecture

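[Diagram: client code on the PureData for Analytics system sends backups over the LAN to the backup server, an IBM Tivoli Storage Manager (TSM) server. The backup server holds the Master Catalog, performs Media Management, and attaches over the SAN to the storage hierarchy.]

Storage Hierarchy
• Disk
• Physical Tape
• Virtual Tape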

Page 14: Backup Options for IBM PureData for Analytics powered by Netezza

External Backup Architecture – TSM Proxy Node

[Diagram: XBSA code on the PureData for Analytics system connects over the LAN to a proxy node running the LAN Free storage agent; the proxy node writes over the SAN to the storage hierarchy (physical tape, virtual tape); the backup server holds the Master Catalog and performs Media Management]

• TSM client code sends backup to Proxy node
• Proxy node
  – Sends data directly to physical or virtual tape over SAN fabric
  – Registers copies with Master Catalog
  – Can support multiple PureData for Analytics systems
• TSM server manages media, tape reclamation, backup copy pools, etc.

Page 15: Backup Options for IBM PureData for Analytics powered by Netezza

External Backup Architecture – TSM LAN Free

[Diagram: XBSA code and the LAN Free storage agent run on the PureData for Analytics system, which writes over the SAN to the storage hierarchy (physical tape, virtual tape) and talks to the backup server over the LAN; the backup server holds the Master Catalog and performs Media Management]

• TSM client code sends backups directly to physical or virtual tape over SAN fabric
• TSM client code registers backup copies with Master Catalog
• TSM server manages media, tape reclamation, backup copy pools, etc.
• LAN Free
  – Avoids congestion traffic on LAN by using SAN directly
  – Will consume more CPU resources on PureData for Analytics system

Page 16: Backup Options for IBM PureData for Analytics powered by Netezza

Summary

1. Use Filesystem Method with SAN or NAS storage device such as Storwize V7000 Unified

2. Use IBM Tivoli Storage Manager server infrastructure to back up PureData for Analytics systems

Page 18: Backup Options for IBM PureData for Analytics powered by Netezza

About the Speaker

Mr. Tony Pearson

Master Inventor,

Senior Managing Consultant

IBM System Storage

Tony Pearson is a Master Inventor and Senior IT storage consultant for the IBM System Storage™ product line.

Tony Pearson joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products.

In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, as well as various storage software products. He interacts with clients, speaks at conferences and events, and leads workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs of 2006 for the IT storage industry by “Networking World” magazine. The blog was published in book form as Inside System Storage: Volume I and Volume II, both available from Lulu publishing.

Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.

9000 S. Rita Road

Bldg 9070 Mail 9070

Tucson, AZ 85744

+1 520-799-4309 (Office)

[email protected]

Page 19: Backup Options for IBM PureData for Analytics powered by Netezza

Additional Resources

Email: [email protected]

Twitter: http://twitter.com/az990tony

Blog: http://ibm.co/brAeZ0

Books: http://www.lulu.com/spotlight/990_tony

IBM Expert Network: http://www.slideshare.net/az990tony

Page 20: Backup Options for IBM PureData for Analytics powered by Netezza

Trademarks and disclaimers

© IBM Corporation 2011. All rights reserved.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

ZSP03490-USEN-00