Netezza Data Loading Guide

  • Netezza Data Loading Guide

    Document Number: 20525-1 Rev. 1

    Software Release: 6.0.x

    Revised: October 6, 2010

    Netezza Corporation

    Corporate Headquarters

    26 Forest St., Marlborough, Massachusetts 01752

    tel 508.382.8200 fax 508.382.8300 www.netezza.com

  • The specifications and information regarding the products described in this manual are subject to change without notice. All statements, information, and recommendations in this manual are believed to be accurate.

    Netezza makes no representations or warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and non infringement, regarding this manual or the products' use or performance. In no event will Netezza be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like) arising out of the use or inability to use this manual or the products, regardless of the form of action, whether in contract, tort (including negligence), breach of warranty, or otherwise, even if Netezza has been advised of the possibility of such damages.

    Netezza, the Netezza logo, Netezza TwinFin, TwinFin, Snippet Blades, S-Blades, NPS, Snippet, Snippet Processing Unit, SPU, Snippet Processing Array, SPA, Performance Server, Netezza Performance Server, Asymmetric Massively Parallel Processing, AMPP, Intelligent Query Streaming and other marks are trademarks or registered trademarks of Netezza Corporation in the United States and/or other countries. All rights reserved.

    Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or other countries.

    Linux is a trademark or registered trademark of Linus Torvalds in the United States and/or other countries.

    D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the Wind River logo are trademarks, registered trademarks, or service marks of Wind River Systems, Inc. Tornado patent pending.

    APC and the APC logo are trademarks or registered trademarks of American Power Conversion Corporation.

    All document files and software of the above named third-party suppliers are provided "as is" and may contain deficiencies. Netezza and its suppliers disclaim all warranties of any kind, express or implied, including, without limitation, those of merchantability, fitness for a particular purpose, and non infringement.

    In no event will Netezza or its suppliers be liable for indirect, incidental, consequential, special, or economic damages (including lost business profits, business interruption, loss or damage of data, and the like), or the use or inability to use the above-named third-party products, even if Netezza or its suppliers have been advised of the possibility of such damages.

    All other trademarks mentioned in this document are the property of their respective owners.

    Document Number: 20525-1

    Software Release Number: 6.0.x

    Netezza Data Loading Guide

    Copyright 2001-2010 Netezza Corporation.

    All rights reserved.

    PostgreSQL

    Portions of this publication were derived from PostgreSQL documentation. For those portions of the documentation that were derived originally from PostgreSQL documentation, and only for those portions, the following applies:

    PostgreSQL is copyright 1996-2001 by the PostgreSQL global development group and is distributed under the terms of the license of the University of California below.

    Postgres95 is copyright 1994-5 by the Regents of the University of California.

    Permission to use, copy, modify, and distribute this documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

    In no event shall the University of California be liable to any party for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this documentation, even if the University of California has been advised of the possibility of such damage.

    The University of California specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The documentation provided hereunder is on an "as-is" basis, and the University of California has no obligations to provide maintenance, support, updates, enhancements, or modifications.

    ICU Library

    The Netezza implementation of the ICU library is an adaptation of an open source library Copyright (c) 1995-2003 International Business Machines Corporation and others.

    ICU License - ICU 1.8.1 and later

    COPYRIGHT AND PERMISSION NOTICE

    Copyright (c) 1995-2003 International Business Machines Corporation and others

    All rights reserved.

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, provided that the above copyright notice(s) and this permission notice appear in all copies of the Software and that both the above copyright notice(s) and this permission notice appear in supporting documentation.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

    Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization of the copyright holder.

    ODBC Driver

    The Netezza implementation of the ODBC driver is an adaptation of an open source driver, Copyright 2000, 2001, Great Bridge LLC. The source code for this driver and the object code of any Netezza software that links with it are available upon request to [email protected]

  • Botan License

    Copyright (C) 1999-2008 Jack Lloyd

    2001 Peter J Jones
    2004-2007 Justin Karneges
    2005 Matthew Gregan
    2005-2006 Matt Johnston
    2006 Luca Piccarreta
    2007 Yves Jerschow
    2007-2008 FlexSecure GmbH
    2007-2008 Technische Universitat Darmstadt
    2007-2008 Falko Strenzke
    2007-2008 Martin Doering
    2007 Manuel Hartl
    2007 Christoph Ludwig
    2007 Patrick Sona

    All rights reserved.

    Redistribution and use in source and binary forms, for any use, with or without modification, of Botan (http://botan.randombit.net/license.html) is permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.

    THIS SOFTWARE IS PROVIDED BY THE AUTHOR(S) "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ARE DISCLAIMED.

    IN NO EVENT SHALL THE AUTHOR(S) OR CONTRIBUTOR(S) BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    Regulatory Notices

    Install the NPS system in a restricted-access location. Ensure that only those trained to operate or service the equipment have physical access to it. Install each AC power outlet near the NPS rack that plugs into it, and keep it freely accessible.

    Provide approved 30A circuit breakers on all power sources.

    Product may be powered by redundant power sources. Disconnect ALL power sources before servicing.

    High leakage current. Earth connection essential before connecting supply. Courant de fuite élevé. Raccordement à la terre indispensable avant le raccordement au réseau.

    FCC - Industry Canada Statement

    This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio-frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case users will be required to correct the interference at their own expense.

    This Class A digital apparatus meets all requirements of the Canadian Interference-Causing Equipment Regulations.

    Cet appareil numérique de la classe A respecte toutes les exigences du Règlement sur le matériel brouilleur du Canada.

    WEEE

    Netezza Corporation is committed to meeting the requirements of the European Union (EU) Waste Electrical and Electronic Equipment (WEEE) Directive. This Directive requires producers of electrical and electronic equipment to finance the takeback, for reuse or recycling, of their products placed on the EU market after August 13, 2005.

    CE Statement (Europe)

    This product complies with the European Low Voltage Directive 73/23/EEC and EMC Directive 89/336/EEC as amended by European Directive 93/68/EEC.

    Warning: This is a class A product. In a domestic environment this product may cause radio interference in which case the user may be required to take adequate measures.

    VCCI Statement

    VCCI A

  • Table of Contents

    Preface

    1 Overview
      Data Loading Components
      Data Loading Formats
      New Decimal Delimiter Option

    2 External Tables
      About External Tables
        Privileges Required
        Displaying External Table Information
        Log Files
        Usage
        Parsing
        Backing Up and Restoring
      Command Syntax
      Transient External Tables
        Syntax
        Explicit Schema Definition
        Implicit Schema Definition
        Exporting Data Using Transient External Tables
        Remote Transient External Tables
      Supported Data Types
        Integer Data Types
        Fixed-Point Data Types
        Floating-Point Data Types
        Character Strings
        Time Data Types
      Restrictions
      Best Practices
      Examples
        Transient External Table
        Fixed-Length Format
        Standard Unloading and Reloading
        Back up and Restore a User Table

    3 External Table Options
      Options
      Option Details
        BoolStyle
        Compress
        CRinString
        CtrlChars
        DataObject
        DateDelim
        DateStyle
        DecimalDelim
        Delimiter
        Encoding
        EscapeChar
        FillRecord
        Format
        IgnoreZero
        IncludeZeroSeconds
        Layout
        LogDir
        MaxErrors
        MaxRows
        NullValue
        QuotedValue
        RecordDelim
        RecordLength
        RemoteSource
        RequireQuotes
        SkipRows
        SocketBufSize
        TimeDelim
        TimeRoundNanos
        TimeStyle
        TruncString
        Y2Base
      Option Processing
        Counting Rows
        Handling Bad Rows
        Delineating Input Rows
        Matching Input Fields to Table Columns
        Using String and Non-string Fields
        Handling the Absence of a Value
        Enabling Load Continuation
        Handling Legal Characters
      Session Variables

    4 Using nzload
      How the nzload Command Works
        Protection and Privileges
        Concurrency and Transactions
        Program Invocation
      Using the nzload Command
        Syntax
        Inputs
        Additional Options
        Outputs
        Using a Control File
      Configuration File Example

    5 Unloading Data
      Unloading Options
      Unloading Data to a Remote Client System

    6 Using Fixed-Length Format
      Formatting Background
        Fixed-Length Format
        Data Attributes
      Format Options
        New Options
        Changed Options
        Unsupported Options
        Default Values
      Layout Definitions
      Building the Fixed-Length Format Definition
        End-of-Record
        Record Length
        Skipping Fields
        Temporal Values
        Numeric Values
        Logical Values
        Null Values

    Appendix A: Examples and Grammar
      The nzload Command
        Specifying nzload Arguments
        Using Named Pipes
        Sample nzload Usage
      Reference Examples
      Decimal Delimiter Examples
      SQL Grammar
      Fixed-Length Format Definition

    Appendix B: Troubleshooting
      Tips for Successful Loading
        Create Your Table
        Determine Your Data Format
        Consider the Load Source
        Run the Job
        Troubleshoot
        Handle Exceptions
        Validate the Results
        Generate Statistics
        Test Performance
      nzload Error Handling
        Reporting Errors
        Understanding nzload Log Files

    Appendix C: Option Names
      Specifying Options

    Index

  • Preface

    The Netezza Data Loading Guide describes the Netezza functionality for data loading.

    Audience

    The Netezza Data Loading Guide is written for administrators using data loading features.

    About this Guide

    This guide contains the following information:

    Topics                                              See the following
    Introduction to Data Loading Concepts and Terms     Chapter 1, Overview
    How to use External Tables                          Chapter 2, External Tables
    External Table options to use, and how the system   Chapter 3, External Table Options
    processes them
    Details on the nzload command                       Chapter 4, Using nzload
    Details on unloading data                           Chapter 5, Unloading Data
    Details on the Fixed-Length format                  Chapter 6, Using Fixed-Length Format
    Examples of commands, format, and usage             Appendix A, Examples and Grammar
    Command and Task Tips                               Appendix B, Troubleshooting
    How to enter external table options on the command  Appendix C, Option Names
    line, in a control file, or in a SQL command

    Symbols and Conventions

    This guide uses the following typographical conventions:

    - Italics for emphasis on terms and user-defined values such as user input
    - Upper case for SQL commands; for example, INSERT, DELETE
    - Bold for command line input; for example, nzsystem stop

    If You Need Help

    If you are having trouble using the Netezza appliance, you should:

    1. Retry the action, carefully following the instructions given for that task in the documentation.

    2. Go to the Netezza Knowledge Base at https://knowledge.netezza.com. Enter your support username and password. You can search the knowledge base or the latest updates to the product documentation. Click Netezza HelpDesk to submit a support request.

    3. If you are unable to access the Netezza Knowledge Base, you can also contact Netezza Support at the following telephone numbers:

       - North American Toll-Free: +1.877.810.4441
       - United Kingdom Free-Phone: +0.800.032.8382
       - International Direct: +1.508.620.2281

    Refer to your Netezza maintenance agreement for details about your support plan choices and coverage.

    Netezza Welcomes Your Comments

    Let us know what you like or dislike about our manuals. To help us with future versions of our manuals, we want to know about any corrections or clarifications that you would find useful.

    Include the following information:

    - The name and version of the manual that you are using
    - Any comments that you have about the manual
    - Your name, address, and phone number

    Send us an e-mail message at the following address: [email protected]

    The doc alias is reserved exclusively for reporting errors and omissions in our documentation.

    We appreciate your suggestions.

  • Chapter 1: Overview

    What's in this chapter

    - Data Loading Components
    - Data Loading Formats
    - New Decimal Delimiter Option

    This chapter provides general information about the data loading methods now available.

    Note that loading data takes a significant allocation of system resources, which may affect performance.

    Data Loading Components

    Within the Netezza environment, data loading simply means transferring data into the Netezza appliance. This framework includes several components:

    - External Tables: Tables stored as flat files on the host or client systems rather than in the Netezza appliance database. You can use them to load data into the Netezza appliance. For more information, see Chapter 2, External Tables.

    - nzload: A command that provides an easy method for using external tables and getting data into the Netezza appliance. For more information, see Chapter 4, Using nzload.

    - Format Options: Options for formatting the data loaded to and from external tables. Because data comes in different forms, Netezza provides different ways of setting up the load. For more information, see Chapter 4, Using nzload, and Chapter 6, Using Fixed-Length Format.

    - Backup and Restore: There are different methods for doing backups and restores to transfer data between systems. One method is to create external tables and use nzload, as described in Chapter 2, External Tables, and Chapter 4, Using nzload. For more information on backups and restores, see Backing Up and Restoring Databases in the Netezza Performance Server System Administrator's Guide.

    - nz_migrate: A separate tool that is not part of the Netezza software package. This utility is a script that can migrate (copy) a database or table from one Netezza appliance to another, or make a copy of a database or table on the same server. Run the following command to display the help text for the command, including its syntax and usage:

    nz_migrate -?

  • Data Loading Formats

    In the database environment, there is always the need to load data from external sources such as files, pipes, or sockets into a table. These external sources have a variety of formats to represent each of the data types individually, and together as records or rows.

    Database-like applications, such as an RDBMS, a Web server, or some other structured data store, may export data into files or streams in different formats. The following formats are used with the Netezza environment:

    - Text-Delimited: The format most commonly used for data loading. Every field or column value ends with a delimiter, and each row or record ends with an end-of-record delimiter, typically a newline character. Previously, this was the preferred format for loading data into external tables.

    - Fixed-Length: This new format allows a more expressive form of external table definition, which increases the kinds of data formats and layouts that can be loaded.

    - Compressed Binary: This Netezza proprietary format compresses the data before a backup or restore to benefit performance. It typically yields smaller data files and retains information about the Netezza appliance topology, so it is often faster to back up and restore. Compress the data before loading, and uncompress before unloading. For more information, see the Netezza Performance Server System Administrator's Guide.
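    The difference between the text-delimited and fixed-length layouts described above can be sketched in a few lines of Python. This is an illustration only, not Netezza code: the function names, the pipe delimiter, the sample record, and the column widths are all assumptions made for the example.

```python
# Illustrative only: these helpers are hypothetical and are not part of any
# Netezza tool. They show how the two text layouts encode the same record.

def parse_delimited(line, delimiter="|"):
    """Split one text-delimited record into its field values."""
    # Remove the end-of-record delimiter (a newline here) before splitting.
    return line.rstrip("\n").split(delimiter)


def parse_fixed_length(line, widths):
    """Slice one fixed-length record into fields using column widths."""
    fields = []
    start = 0
    for width in widths:
        # Each field occupies an exact number of characters; padding is trimmed.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


delimited_record = "1001|Widget|19.99\n"
fixed_record = "1001Widget    19.99"  # assumed widths: 4, 10, and 5 characters

print(parse_delimited(delimited_record))
print(parse_fixed_length(fixed_record, [4, 10, 5]))
```

    Both calls yield the same three field values. The delimited form needs a character that never appears in the data, while the fixed-length form needs a layout that states where each field begins and ends.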

    New Decimal Delimiter Option

    In the 6.0 release, a new option allows you to specify a comma as a decimal separator, in addition to the period (the default value). This new option is available for external tables and for nzload, to help you to directly load data without extra pre-load conversion.

    X For the text-delimited format, and for unloading data, this option is available only at the table level.

    X For the fixed-length format, you can specify this option at the column level, making it possible to have a mix of comma and period separators.

    The option is available for the following data types, for both text-delimited and fixed-length formats:

    X Numeric
    X Float
    X Double
    X Time
    X Timetz
    X Timestamp

    Option usage for each data type is explained in the section describing that data type. For examples of how to use this new option, see Appendix A, Examples and Grammar.
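    The effect of the decimal delimiter option on a numeric value can be sketched in Python as follows. The function parse_numeric and its decimal_delim parameter are hypothetical names chosen for this illustration; the sketch only models the idea of accepting either a period or a comma and does not reproduce the loader's behavior.

```python
from decimal import Decimal


def parse_numeric(text, decimal_delim="."):
    """Convert text to a Decimal, treating decimal_delim as the decimal separator."""
    if decimal_delim not in (".", ","):
        raise ValueError("decimal delimiter must be '.' or ','")
    other = "," if decimal_delim == "." else "."
    if other in text:
        # The other character is not a separator in this mode.
        raise ValueError("unexpected %r in %r" % (other, text))
    return Decimal(text.replace(decimal_delim, "."))


print(parse_numeric("123,679", decimal_delim=","))
```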

  • CHAPTER 2: External Tables

    What's in this chapter

    X About External Tables
    X Command Syntax
    X Transient External Tables
    X Supported Data Types
    X Restrictions
    X Best Practices
    X Examples

    This chapter describes external tables, as well as best practices and restrictions for using them. For the options available with external tables, see Chapter 3, External Table Options. For examples of how to use external tables, see Appendix A, Examples and Grammar.

    In the Netezza environment, there are the following types of tables:

    X System tables: Stored on the host
    X User tables: Stored on the SPUs
    X External tables: Stored as flat files on the host or client systems

    About External Tables

    An external table allows Netezza to treat an external file as a database table. An external table has a definition (a table schema), but the actual data exists outside of the Netezza appliance database. External tables can be used to access files which are stored on the Netezza host server or, in the case of a remote external table, Netezza can treat a file on a client system as an external table (see REMOTESOURCE option).

    After you have created the external table definition, you can use INSERT INTO statements to load data from the external file into a database table, or SELECT FROM statements to query the external table.

  • Privileges Required

    To create an external table, you must have LIST privilege on the database and CREATE EXTERNAL TABLE administration privilege. The database user who issues the CREATE EXTERNAL TABLE command owns the resultant table. The operating system user must have proper permission on the data object (READ permission for loading, WRITE permission for unloading).

    Displaying External Table Information

    To display information about external tables, use the \d command from the nzsql prompt.

    T To list all external tables found in the current database, use the \dx command. For example:

    dev(admin)=> \dx
             List of relations
        Name     |   Type    | Owner
    -------------+-----------+-------
     extlineitem | ext table | admin
     xlineitem   | ext table | admin
    (2 rows)

    T To list the options defined in an external table, use the \d command followed by the table name. For example:

    dev(admin)=> \d extlineitem

    Log Files

    By default, loading errors go into the following log files:

    X nzbad: ..nzbad
    X nzlog: ..nzlog

    You can override the defaults by specifying a file name with the following options:

    X -bf for nzbad
    X -lf for nzlog

    Usage

    Use external tables to do the following:

    X Load data into the Netezza appliance from an external table and structure the loading operation to manipulate the data by using casts, joins, dropping columns, and so on.

    X Store data outside the Netezza appliance, either to transfer to another application, or as a table backup. See Backing Up and Restoring on page 2-4.

    X Create an external table and use data from an external table as part of a SQL query.

    The power of external tables is that the entire Extraction-Transformation-Loading (ETL) process is mapped to plain SQL. Because a SQL-based ETL process can be initiated and executed from any SQL client that can talk to the Netezza appliance, it reduces or avoids the requirement for specialized ETL tools.

  • To load an external data file into the Netezza appliance as an external table, you can do either of the following:

    X Use a FROM clause of a SELECT SQL statement/command, like any normal table.
    X Use a WHERE clause of an UPDATE or DELETE SQL statement.

    To unload an external table into an external data file, use the table as the target table in any of the following SQL statements:

    X INSERT SQL
    X SELECT INTO SQL
    X CREATE TABLE AS SELECT SQL

    All references to columns in the external table can be complex SQL expressions used for the transformation of external data during a load/unload process. For more information, see Restrictions on page 2-13.

    Parsing

    For loads, the rows are parsed one by one from the external data file and converted into internal records of the external table. Errors can occur while parsing each row or each column. For example, there could be errors in identifying the column value itself, as in the case of a missing delimiter. Or there could be errors during the conversion from external format to internal records of the external table, such as alphabetic characters supplied for an integer column in Text-Delimited format.

    Each error is logged in detail in an nzlog file, and bad rows are logged in an nzbad file. These files help you identify bad rows in the external data file and correct them for reloading. Depending on the load options of the external table in use, each bad row causes either the row to be skipped or the entire load to be aborted. Similarly, each bad column of a bad row could cause the rest of the row to be ignored or, if recovery is possible, the load could continue to parse subsequent columns of the same row.

    Note that if there is an error in the project-expression on the external table columns, then the entire load is aborted and the transaction rolled back. Errors of this nature are not logged in nzbad or nzlog files, as they are outside of the scope of the external table load mechanism. Once the processing reaches the normal SQL engine, the external table is treated as if it is a normal table.

    Unlike an external table that has external rows in an ordered sequence, normal user tables have no implicit row order other than hidden rowid columns. So there is no way for a user not using rowids to identify the bad row in a SQL engine. In this case, the Netezza system just returns an error that a particular column caused an error, without identifying the bad row. It is as if the query was selecting from a normal table and inserting into another normal table, with some row that caused the error during insertion.
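    The split between good rows, the bad-row file, and the error log can be sketched as follows. This Python example is an assumption-laden illustration, not the Netezza implementation: the function load_rows is hypothetical, the only error it detects is a wrong field count, and the max_errors threshold (abort once that many bad rows have been seen) is an assumed reading of the MaxErrors option listed in Table 3-1.

```python
def load_rows(lines, n_fields, delim="|", max_errors=1):
    """Parse rows one by one; collect bad rows and a log entry for each error.

    Returns (good, bad, log, aborted). When the load aborts, no good rows are
    returned, mimicking a rolled-back transaction.
    """
    good, bad, log = [], [], []
    for row_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delim)
        if len(fields) != n_fields:
            bad.append(line)                      # would go to the nzbad file
            log.append("row %d: expected %d fields, found %d"
                       % (row_no, n_fields, len(fields)))  # nzlog entry
            if len(bad) >= max_errors:            # assumed abort threshold
                return [], bad, log, True
            continue                              # skip the bad row
        good.append(fields)
    return good, bad, log, False


rows = ["1|a\n", "2\n", "3|c\n"]
print(load_rows(rows, n_fields=2, max_errors=2))
```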

  • Backing Up and Restoring

    You can use external tables to back up a table in the system database. While the Netezza appliance database backup utility, nzbackup, enables you to create backups of the entire database, the external table backup method allows you to create a backup of a single table, with the ability to later restore it to the database as needed.

    X To back up table data using an external table, create external table definitions for each user table and then use SQL to insert into the external table.

    X When you restore table data, create a table definition (if it does not exist) and then use SQL to insert into the table from an external table.

    Command Syntax

    The CREATE EXTERNAL TABLE command has the following syntax.

    T To create an external table based on another table:

    CREATE EXTERNAL TABLE table_name
    SAMEAS table_name
    USING external_table_options

    T To create an external table by defining columns:

    CREATE EXTERNAL TABLE table_name
    ({ column_name type
    [ column_constraint [ ... ] ]} [, ... ])
    [USING external_table_options]

    Note: Although you can specify column constraints, they are ignored, and must be defined elsewhere. For more information, see Column Constraint Rules for Empty Strings on page 2-10.

    Transient External Tables

    Transient external tables (TET) provide a way to define an external table that exists only for the duration of a single query. Transient external tables have the same capabilities and limitations as normal external tables. A special feature of a TET is that the schema does not have to be defined when the TET is used to load data into a table or when the TET is created as the target of a SELECT statement.

    Syntax

    The following is the syntax for a TET:

    INSERT INTO <table> SELECT <column_list | *> FROM EXTERNAL 'filename' [(schema_definition)]
    [USING (external_table_options)];

    CREATE EXTERNAL TABLE 'filename' [USING (external_table_options)]
    AS select_statement;

    SELECT <column_list | *> FROM EXTERNAL 'filename' (schema_definition)
    [USING (external_table_options)];

    Explicit Schema Definition

    The schema of a transient external table can be explicitly defined in a query. When defined this way, the schema definition is the same as is used when defining a schema using CREATE TABLE.

    SELECT x, y, NVL(dt, current_date) AS dt FROM EXTERNAL '/tmp/test.txt' ( x integer, y numeric(18,4), dt date ) USING (DELIM ',');

    The explicit schema definition feature can be used to specify fixed length formats.

    SELECT * FROM EXTERNAL '/tmp/fixed.txt' ( x integer, y numeric(18,4), dt date ) USING (FORMAT 'fixed' LAYOUT (bytes 4, bytes 20, bytes 10));
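    The layout in the preceding statement assigns 4, 20, and 10 bytes to the columns x, y, and dt. As a rough illustration of what such a layout means for one record, the following Python sketch cuts a byte string into zones of those widths. The function split_fixed and the sample record are assumptions for this example only; type conversion and record delimiters are left out.

```python
def split_fixed(record, widths):
    """Cut a fixed-length record (bytes) into consecutive zones of the given widths."""
    if len(record) != sum(widths):
        raise ValueError("record is %d bytes, layout needs %d"
                         % (len(record), sum(widths)))
    zones, pos = [], 0
    for width in widths:
        zones.append(record[pos:pos + width])
        pos += width
    return zones


# One hypothetical record for the layout (bytes 4, bytes 20, bytes 10).
record = b"  42" + b"123.4500".rjust(20) + b"2010-10-06"
x, y, dt = split_fixed(record, (4, 20, 10))
print(x.strip(), y.strip(), dt)
```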

    The SAMEAS keyword can also be used to specify that the schema of the external table is identical to some other table that currently exists in the database.

    SELECT * FROM EXTERNAL '/tmp/test.txt' SAMEAS test_table USING (DELIM ',');

    Implicit Schema Definition

    If the schema is not explicitly defined, the schema for a transient external table is determined based on the query being executed. When a TET is used as a data source for an INSERT statement, the external table will take on the schema of the target table.

    The external table in this INSERT statement takes on the schema of the target table. The columns in the external data file must be in the same order as the target table, and every column in the target table must also exist in the external table data file.

    INSERT INTO target SELECT * FROM external '/tmp/data.txt' USING (DELIM '|');

    Exporting Data Using Transient External Tables

    A transient external table can also be used to export data out of the database. In this case the schema of the external table is based on the query being executed.

    Example:

    CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS
    SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;

    Remote Transient External Tables

    A session connected to Netezza using ODBC, JDBC, or OLE DB from a client system can import and export data using a remote transient external table, which is defined by using the REMOTESOURCE option in the USING clause.

    For example, the following SQL statement loads data from a file on a Windows system into a TEMP table on Netezza, using an ODBC connection.

    CREATE TEMP TABLE mydata AS SELECT cust_id, upper(cust_name) as name from external 'c:\customer\data.csv' (cust_id integer, cust_name varchar(100)) USING (DELIM ',' REMOTESOURCE 'ODBC');

  • Remote external table loads work by sending the contents of a file from the client system to the Netezza server where the data is then parsed. This method minimizes CPU usage on the client system during a remote external table load.

    Supported Data Types

    Table 2-1 describes the Netezza supported data types for external tables. For more information about the specific data types, see the Netezza Performance Server Database User's Guide.

    Table 2-1: Supported Data Types

    Data Type                          | Example             | Description
    -----------------------------------+---------------------+------------
    byteint, smallint, integer, bigint | 12002561290985      | See Integer Data Types on page 2-7.
    numeric, decimal                   | -99.56123.679       | See Fixed-Point Data Types on page 2-7.
    real, double precision             | 81293.35            | See Floating-Point Data Types on page 2-8.
    char (n)                           | salary              | See Character Strings on page 2-10 and Column Constraint Rules for Empty Strings on page 2-10.
    varchar (n)                        | I am John           | See Character Strings on page 2-10 and Column Constraint Rules for Empty Strings on page 2-10.
    boolean                            | true                | An ASCII string that contains any of the following values: [true|false]|[yes|no]|[1|0]|[t|f]|[y|n]. See BoolStyle on page 3-3.
    date                               | 2002-02-04          | The date is an exact four-byte data type. The system recognizes a range of dates composed of year, month, and day. See DateStyle on page 3-5.
    time                               | 01:59:45, 23:00:01  | See Time on page 2-11.
    time with time zone                | 01:15:33 -05        | See Time with time zone on page 2-12.
    timestamp                          | 2002-02-04 01:15:33 | See Timestamp on page 2-12.

  • Integer Data Types

    Integer types are exact data types. The system generates an error if an input field's value cannot be expressed without loss of accuracy in the target table.

    Table 2-2 describes the integer syntax.

    Table 2-3 describes the integer handling.

    Fixed-Point Data Types

    The fixed-point data types are exact data types. The system generates an error if an input field's value cannot be expressed without loss of accuracy in the target table or database.

    Table 2-4 lists and describes the fixed-point syntax.

    Table 2-2: Integer Description

    Syntax      | [+|-]<digits>
    Description | Optional leading sign
                | Unlimited leading zeros
                | At least one decimal digit
    Limitation  | No thousands-separator commas
                | No support for exponential notation

    Table 2-3: Integer Handling

    SQL      | Alias       | Representation  | Values
    ---------+-------------+-----------------+-------
    byteint  | int1        | 1 byte, signed  | min value = -128, max value = 127
    smallint | int2        | 2 bytes, signed | min value = -32768, max value = 32767
    integer  | int or int4 | 4 bytes, signed | min value = -2147483648, max value = 2147483647
    bigint   | int8        | 8 bytes, signed | min value = -9223372036854775808, max value = 9223372036854775807
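    The integer syntax and range rules can be checked with a few lines of Python. This is a sketch of the rules as stated in Tables 2-2 and 2-3, not the loader's code; the names parse_integer and INT_RANGES are assumptions for the illustration.

```python
import re

# Signed ranges for the 1-, 2-, 4-, and 8-byte integer types.
INT_RANGES = {
    "byteint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}

# Optional sign, then at least one digit; no commas, no exponent.
_INT_RE = re.compile(r"[+-]?[0-9]+")


def parse_integer(text, sql_type="integer"):
    """Return the integer value of text, or raise ValueError if it breaks a rule."""
    if not _INT_RE.fullmatch(text):
        raise ValueError("not an integer literal: %r" % text)
    value = int(text)                      # leading zeros are accepted
    low, high = INT_RANGES[sql_type]
    if not low <= value <= high:
        raise ValueError("%d is out of range for %s" % (value, sql_type))
    return value


print(parse_integer("+000127", "byteint"))
```

    Values such as 1,000 and 1e3 fail the syntax check, and 128 fails the range check for byteint.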

    Table 2-4: Fixed-Point Description

    Syntax      | [+|-]<digits>[.[<digits>]]
                | [+|-].<digits>
                | [+|-]<digits>[,[<digits>]]
                | [+|-],<digits>
    Description | Optional leading sign
                | Unlimited leading zeros
                | At least one decimal digit

  • The syntax of fixed-point values is the same as the syntax of integer values with the addition of an optional decimal point that can occur anywhere from before the first decimal digit to after the last decimal digit.

    The optional decimal point can be followed by zero or more decimal digits, if there is at least one decimal digit before the decimal point; followed by one or more decimal digits if there are no decimal digits before the decimal point.

    If there is no explicit decimal point, the system assumes a decimal point immediately following the last decimal digit.

    You can also specify a comma as a separator, using it like the decimal point. For examples of how to do this, see Decimal Delimiter Examples on page A-4.

    Table 2-5 describes the fixed-point precision and representation:

    The following result in system errors:

    X Precision Having more decimal digits before the decimal point than the declaration allows (P-S).

    X Scale Having more decimal digits following the decimal point than the declared scale (S).

    Note: Because fixed-point is an exact data type, when there are too many digits following the decimal point, the system does not round the number.
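    The precision and scale errors can be illustrated with the following Python sketch. It is an assumption-based model of the rules above: the function check_numeric is hypothetical, it handles only the period as decimal point, it does not count leading zeros toward the integer digits, and it counts every digit after the decimal point (including trailing zeros) toward the scale.

```python
import re
from decimal import Decimal

# sign, then either digits with an optional fraction, or a bare fraction
_FIXED_RE = re.compile(r"[+-]?(?:([0-9]+)(?:\.([0-9]*))?|\.([0-9]+))")


def check_numeric(text, precision, scale):
    """Validate text against numeric(precision, scale); never rounds."""
    match = _FIXED_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a fixed-point literal: %r" % text)
    int_digits = (match.group(1) or "").lstrip("0")
    frac_digits = match.group(2) or match.group(3) or ""
    if len(int_digits) > precision - scale:
        raise ValueError("too many digits before the decimal point")
    if len(frac_digits) > scale:
        raise ValueError("too many digits after the decimal point")
    return Decimal(text)


print(check_numeric("-99.56", precision=5, scale=2))
```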

    Floating-Point Data Types

    The floating-point data types are approximate data types. The system rounds the significand if more precision is present than it can represent.

    Table 2-6 lists the floating point syntax.

    Table 2-4: Fixed-Point Description (continued)

    Limitation | No thousands-separator commas
               | No support for exponential notation

    Table 2-5: Fixed-Point Precision

    Precision      | P <= 9          | 9 < P <= 18     | 18 < P <= 38
    Representation | 4 bytes, signed | 8 bytes, signed | 16 bytes, signed

    1;

    If the query returns multiple rows that share the same row ID, truncate the database table and reload the external table (but only once).

    After you load data from an external table into a user table, you should run GENERATE STATISTICS to update the statistics for the user table. This improves the performance of queries that run against that table.

  • Examples

    The following examples show how to use the CREATE EXTERNAL TABLE command.

    T To create an external table, enter:

    CREATE EXTERNAL TABLE ext_orders(ord_num INT, ord_dt TIMESTAMP)
    USING(dataobject('/tmp/order.tbl') DELIMITER '|');

    T To create an external table that uses column definitions from an existing table, enter:

    CREATE EXTERNAL TABLE demo_ext SAMEAS emp USING (dataobject ('/tmp/demo.out') DELIMITER '|');

    T To create an external table and specify the escape character (\), enter:

    CREATE EXTERNAL TABLE extemp SAMEAS emp USING( dataobject ('/tmp/extemp.dat') DELIMITER '|' escapechar '\');

    T To unload data from your database into a file by using an insert statement, enter:

    INSERT INTO demo_ext SELECT * FROM weather;

    T To drop an external table, enter:

    DROP TABLE extemp

    The system removes only the external table's schema information from the system catalog. The file defined in the dataobject option remains unaffected in the filesystem.

    T To back up by creating an external table, enter:

    CREATE EXTERNAL TABLE '/path/extfile' USING (FORMAT 'internal' COMPRESS true) AS SELECT * FROM source_table;

    T To restore from an external table, enter:

    INSERT INTO t_desttbl SELECT * FROM EXTERNAL '/path/extfile' USING(FORMAT 'internal' COMPRESS true);

    Transient External Table

    The following examples show how to specify the shape of a transient external table:

    T To take on the schema of the target table:

    insert into <table> select * from external '<file>' [USING(...)]

    T To take on the schema of the query:

    create external table '<file>' [USING (...)] as <query>

    T To take on the schema of <table>:

    select * from external '<file>' sameas <table> [USING(...)]

    T To take on the schema as defined:

    select * from external '<file>' (schema) [USING(...)]

    T To take on the schema as defined:

    create external table '<file>' (schema) [USING(...)]

    T To make the source file FIXED format with the schema as defined:

    select * from external '<file>' (schema) USING (FORMAT 'FIXED' LAYOUT (...))

  • T To make the source file FIXED format and the table takes on the schema of the target table:

    insert into <table> select * from external '<file>' USING (FORMAT 'FIXED' LAYOUT (...))

    T The following example will not work, because you cannot unload data into a FIXED format external table:

    create external table '<file>' [(schema)] USING (FORMAT 'FIXED' LAYOUT ... )

    Fixed-Length Format

    The following examples show how to use Fixed-Length format with external tables:

    T To load data in fixed format, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' LAYOUT (BYTES 20, REF BYTES 3, BYTES @2) )

    T To load data with different date/time delimiters for different zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' LAYOUT ( YMD '-' BYTES 15, DMY '/' BYTES 15 ) )

    T To load spatial data (binary data into VARCHAR), enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' CTRLCHARS true LAYOUT ( BYTES 100, REF BYTES 4, BYTES @2) )

    T To load fixed format data with record-length and no record-delimiter, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' RECORDDELIM '' RECORDLENGTH @1 LAYOUT( REF BYTES 2, BYTES 120, REF BYTES 2, BYTES @3) )

    T To load data with different NULLIF clauses for different zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' LAYOUT ( BYTES 15 NULLIF 2000-10-10, BYTES 2 NULLIF & = 12) )

    T To load data with NULLIF clauses referring other zones, enter:

    INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT 'FIXED' LAYOUT ( REF BYTES 2, BYTES @1 NULLIF @1 = -1, REF BYTES 4, BYTES 100 NULLIF &&3 = null ) )

    Standard Unloading and Reloading

    The following examples unload and load a user table to an external table in text-delimited format. Unloading is not supported for Fixed-Length format.

    T To create a text-format external table, enter:

    CREATE EXTERNAL TABLE extemp SAMEAS emp USING (DATAOBJECT ('/tmp/emp.dat'));

    T To unload data in user table EMP to the external table EXTEMP, enter:

    INSERT INTO extemp SELECT * FROM emp;

    T To load data into user table EMP from external table EXTEMP, enter:

  • TRUNCATE TABLE emp;
    INSERT INTO emp SELECT * FROM extemp;

    Back up and Restore a User Table

    The following examples show how to back up and restore the user table EMP to an external table in binary compressed format.

    T To create a compressed binary format external table definition called emp_backup for the table emp, enter:

    CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (DATAOBJECT ('/tmp/emp.bck') COMPRESS true FORMAT 'internal');

    T To back up the emp table data into emp_backup, enter:

    INSERT INTO emp_backup SELECT * FROM emp;

    T To restore the emp table from emp_backup, make sure that the emp table is empty and enter:

    TRUNCATE TABLE emp;
    INSERT INTO emp SELECT * FROM emp_backup;


  • CHAPTER 3: External Table Options

    What's in this chapter

    X Options
    X Option Details
    X Option Processing
    X Session Variables

    This chapter describes the options used with external tables. For examples of how to use external tables, see Appendix A, Examples and Grammar.

    Options

    When you create an external table definition, you can specify options. There are different types of options: some are for records/rows, some are for fields, and some are for loads. Use these options when loading from an external table or when using the external table directly in a SQL query.

    Note: The best method to verify that the load processing has been successful is to ensure the system records any errors to the nzlog and nzbad files. Check these files occasionally.

    Table 3-1 lists the external table options, and a description of each option follows. In the Valid Formats column, Text refers to Text-Delimited format and Fixed refers to Fixed-Length format. In the Data type column, enumeration refers to the system accepting a specified set of quoted or unquoted string values.

    Table 3-1: External Table Options

    Option             | Valid Formats | Values               | Default        | Unload Y/N | Data Type
    -------------------+---------------+----------------------+----------------+------------+------------
    BoolStyle          | Text, Fixed   | 1_0/T_F/Y_N          | NULL, 1_0      | Y          | enumeration
    Compress           | Text, Fixed   | True/False           | False          | Y          | boolean
    CRinString         | Text, Fixed   | True/False           | NULL, False    | Y          | boolean
    CtrlChars          | Text, Fixed   | True/False           | NULL, False    | N          | boolean
    DataObject         | Text, Fixed   | Existing file path   | No default     | Y          | filename
    DateDelim          | Text, Fixed   | 1-byte               | NULL, "-"      | Y          | string
    DateStyle          | Text, Fixed   | YMD/MDY/DMY          | NULL, YMD      | Y          | enumeration
    DecimalDelim       | Text, Fixed   | 1-byte               | .              | Y          | string
    Delimiter          | Text          | 1-byte               | NULL, "|"      | Y          | string
    Encoding           | Text          | Internal/Latin9/Utf8 | NULL, Internal | Y          | enumeration
    EscapeChar         | Text          | 1-byte               | NULL           | Y          | string
    FillRecord         | Text          | True/False           | NULL, False    | N          | boolean
    Format             | Text, Fixed   | Text/Internal/Fixed  | Text           | Y          | enumeration
    IgnoreZero         | Text          | True/False           | NULL, False    | N          | boolean
    IncludeZeroSeconds | Text          | True/False           | NULL, False    | Y          | boolean
    Layout             | Text, Fixed   | Zone definitions     | NULL, Inherit  | N          | none
    LogDir             | Text, Fixed   | existing dir path    | NULL, /tmp     | N          | string
    MaxErrors          | Text, Fixed   | >=0                  | NULL, 1        | N          | integer
    MaxRows            | Text, Fixed   | >=0                  | NULL, 0        | N          | integer
    NullValue          | Text, Fixed   | 4-bytes              | NULL, "NULL"   | Y          | string
    QuotedValue        | Text          | No/Yes/Single/Double | NULL, No       | N          | enumeration

  • Option Details

    The following sections detail the different options.

    BoolStyle

    Specifies the boolean style. During a load, the loader can handle only a specific style of boolean values.

    Table 3-2 lists the styles and their values.

    Table 3-1: External Table Options (continued)

    Option                         | Valid Formats | Values                | Default        | Unload Y/N | Data Type
    -------------------------------+---------------+-----------------------+----------------+------------+------------
    RecordDelim                    | Text, Fixed   | 4-bytes               | NULL, /newline | N          | string
    RecordLength                   | Fixed         | Integer/Zone-ref expr | NULL           | N          | integer
    RemoteSource                   | Text, Fixed   | ODBC/JDBC             | NULL           | Y          | enumeration
    RequireQuotes                  | Text          | True/False            | NULL, False    | N          | boolean
    SkipRows                       | Text, Fixed   | >=0                   | NULL, 0        | N          | bigint
    SocketBufSize                  | Text, Fixed   | 64KB-2GB              | 8MB            | Y          | integer
    TimeDelim                      | Text, Fixed   | 1-byte                | NULL, ":"      | Y          | string
    TimeRoundNanos, TimeExtraZeros | Text          | True/False            | NULL, False    | N          | boolean
    TimeStyle                      | Text, Fixed   | 24hour/12hour         | NULL, 24hour   | Y          | enumeration
    TruncString                    | Text          | True/False            | NULL, False    | N          | boolean
    Y2Base                         | Text, Fixed   | >=0                   | NULL, 0        | N          | integer

    Table 3-2: Boolean Values

    Style Name | Value
    -----------+-------
    1_0        | 1 or 0

  • The default style is 1_0. The values can be expressed in mixed case, so true can be True or TRUE or tRuE.

    If you specify the YES_NO option on the command line, the system assumes that the data in the Boolean field is in the form yes or no. If the data is any of the other values: true, false, 1, 0, t, f, y, or n, the system discards the record to the nzbad file and logs an error with the record number in the nzlog file.
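    The "one style per load" rule can be modeled in a few lines of Python. The sketch below is illustrative only: parse_bool and BOOL_STYLES are hypothetical names, and raising ValueError stands in for discarding the record to the nzbad file.

```python
# Accepted (true, false) spellings for each boolean style, compared case-insensitively.
BOOL_STYLES = {
    "1_0": ("1", "0"),
    "T_F": ("t", "f"),
    "Y_N": ("y", "n"),
    "YES_NO": ("yes", "no"),
    "TRUE_FALSE": ("true", "false"),
}


def parse_bool(text, style="1_0"):
    """Return True or False if text matches the chosen style, else raise ValueError."""
    true_value, false_value = BOOL_STYLES[style]
    value = text.lower()
    if value == true_value:
        return True
    if value == false_value:
        return False
    raise ValueError("%r is not valid for boolean style %s" % (text, style))


print(parse_bool("tRuE", "TRUE_FALSE"))
```

    With the YES_NO style, a value such as y or 1 raises the error even though it would be valid under another style.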

    Compress

    Specifies whether the source datafile data is compressed or not. The valid values are true or on, false or off. The default is false. This can only be true if the format is set to internal.

    CRinString

    Specifies whether to allow unescaped carriage returns in char/varchar and nchar/nvarchar fields. Acceptable values are true or false, on or off. Do not put quotes around the value.

    X False: Default; treats all CR or CRLF as end-of-record.
    X True: Accepts unescaped CR in char/varchar fields (LF becomes the only end of row).

    Note: This option is different for Fixed-Length format. For more information, see Changed Options on page 6-3.

    CtrlChars

    Specifies whether to allow an ASCII value 1-31 in char/varchar and nchar/nvarchar fields. You must escape NULL, CR, and LF characters. Acceptable values are: true or false, on or off. The default is false. Do not insert quotes around the value.

    Note: This option is different for Fixed-Length format. For more information, see Changed Options on page 6-3.

    DataObject

    Specifies the OS path to the source datafile (or any media that can be treated as a file). There is no default, and this must be specified. When the remotesource option is not set (or set to empty string), this path has to be an absolute path and not a relative path. The filename must be a valid UTF-8 string.

    X For loads, this file has to be an existing file with READ permission for the OS user initiating the load.

    Table 3-2: Boolean Values (continued)

    Style Name | Value
    -----------+--------------
    T_F        | T or F
    Y_N        | Y or N
    YES_NO     | YES or NO
    TRUE_FALSE | TRUE or FALSE

    X For unloads, the parent directory of this file has to have READ-WRITE permissions for the OS user initiating the unload, and the data file is overwritten if it already exists.

    DateDelim

    Specifies the delimiter character that separates the date components; it is used with the dateStyle option. The default is - for all dateStyles except MONDY[2], where the default is a space. This is a single-byte string.

    X If you specify the option as an empty string, which means that there is no delimiter between the date components, you must specify days and months as two-digit numbers. Single-digit months and days are not supported.

    X With MonDY or MonDY2, the default dateDelim option is space.

    X With days and months less than 10, use either one or two digits, or a space followed by a single digit.

    X With the dateDelim option as a space, the system allows a comma after the day.

    X With any component (day, month, year) as zero, or any day/month inconsistency, such as August 32 or February 30, the system returns an error.

    Table 3-3 lists dateDelim option examples.

    Note: If not using delimiters, the date will be determined as in the following example for June 12, 2009:

    06122009

    DateStyle

    Specifies how to interpret the date format. The date style settings are YMD, MDY, DMY, DMONY, and MONDY. The default is YMD.

    Note: The two-digit year formats (Y2MD, MDY2, DMY2, DMONY2 and MONDY2) are not supported for unloads.

    The dateStyle options are shown in Table 3-4.

    Table 3-3: The -dateDelim

    No dateDelim | -dateDelim , | -dateDelim (space)
    -------------+--------------+-------------------
    Jan 01 2003  | Jan 01,2003  | Jan 01, 2003
    Jan 1 2003   | Jan 1,2003   | Jan 1, 2003
    Jan 1 2003   | Jan 1,2003   | Jan 1, 2003

    Table 3-4: DateStyle

    Sequence of Date Components | Four-digit Year | Two-digit Year
    ----------------------------+-----------------+---------------
    Year Numeric-month Day      | YMD             | Y2MD
    Day Numeric-month Year      | DMY             | DMY2

  • Note: Two-digit year formats are not supported for unloads.

    The default dateStyle is YMD, and the SQL standard stipulates that the legal years are 0001 to 9999. There is no provision in SQL for years prior to 0001 AD or later than 9999 AD.

    Date example: In the data file jan-01.data, the data are specified as the following:

    14255932|30/06/2002|20238|20127|40662|157|

    Because the date value is using the DD/MM/YYYY format, specify the following dateStyle and dateDelim options:

    nzload -t agg_month -df jan-01.data -delim '|' -dateStyle DMY -datedelim '/'
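    How the dateStyle and dateDelim options combine can be sketched in Python for the numeric-month styles with four-digit years (YMD, MDY, DMY). The function parse_date is a hypothetical helper written for this illustration; it is not part of nzload, and it relies on Python's datetime.date to reject zero components, inconsistent days, and years outside 0001 to 9999.

```python
from datetime import date


def parse_date(text, style="YMD", delim="-"):
    """Parse text into a date using a numeric date style and a delimiter.

    An empty delim means no delimiter: two-digit month and day, four-digit year.
    """
    style = style.upper()
    if sorted(style) != ["D", "M", "Y"]:
        raise ValueError("unsupported style in this sketch: %r" % style)
    if delim:
        parts = text.split(delim)
        if len(parts) != 3:
            raise ValueError("expected three date components in %r" % text)
    else:
        widths = {"Y": 4, "M": 2, "D": 2}
        if len(text) != 8:
            raise ValueError("expected eight digits in %r" % text)
        parts, pos = [], 0
        for component in style:
            parts.append(text[pos:pos + widths[component]])
            pos += widths[component]
    fields = {component: int(part) for component, part in zip(style, parts)}
    return date(fields["Y"], fields["M"], fields["D"])  # raises ValueError if invalid


print(parse_date("30/06/2002", style="DMY", delim="/"))  # 2002-06-30
```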

    DecimalDelim

    Specifies the decimal delimiter for the following data types, for both text-delimited and fixed-length formats: float, double, numeric, time, timetz, and timestamp. The default is a period (.). For examples of usage, see Decimal Delimiter Examples on page A-4.

    Delimiter

    Specifies the field delimiter. The default is the pipe character (|). You can specify characters in the 7-bit ASCII range using either a quoted value (for example: delimiter '|') or an unquoted decimal number (delimiter 124). To specify a byte value above 127, use the decimal number. This is a single-byte string.

    Note: For nzload, the default is \t (tab).

    Note: This option is not supported for Fixed-Length format.

    The system processes an input row by identifying the successive fields within that row. A single character field delimiter separates adjacent fields. The lack of a field delimiter between fields is an error. You can use a trailing field delimiter following the last field in a row (but it is not required).

    You can specify the following delimiters:

    X Numeric: 0xNN or NN, where NN is a hexadecimal or decimal number.
    X Control characters: ^A - ^Z (low-order 5 bits) and ^a - ^z (low-order 5 bits).
    X Symbols: \b backspace (8), \t horizontal tab (9), \n line feed (10), \f form feed (12), \r carriage return (13), \\ backslash, \' single quote, \" double quote.
    X Literal: Any character, such as c (the non-control character c).
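    The delimiter forms listed above can be reduced to a single byte value, as the following Python sketch shows. The function delimiter_byte is hypothetical, and one rule is an assumption of this sketch rather than documented behavior: a one-character specification is always treated as a literal, while two or more digits are treated as a decimal number.

```python
# Backslash symbols and their byte values.
_SYMBOLS = {
    "\\b": 8, "\\t": 9, "\\n": 10, "\\f": 12, "\\r": 13,
    "\\\\": 92, "\\'": 39, '\\"': 34,
}


def delimiter_byte(spec):
    """Map a delimiter specification to its byte value (0-255)."""
    if spec in _SYMBOLS:
        return _SYMBOLS[spec]
    if len(spec) == 2 and spec[0] == "^" and spec[1].isascii() and spec[1].isalpha():
        return ord(spec[1]) & 0x1F            # keep the low-order 5 bits
    if spec.lower().startswith("0x"):
        value = int(spec, 16)
    elif len(spec) > 1 and spec.isdigit():
        value = int(spec)
    elif len(spec) == 1 and ord(spec) < 128:
        value = ord(spec)                     # 7-bit ASCII literal
    else:
        raise ValueError("use a decimal or hex number for %r" % spec)
    if not 0 <= value <= 255:
        raise ValueError("delimiter must fit in one byte: %r" % spec)
    return value


for spec in ["|", "124", "0xe9", "^A", "\\t"]:
    print(spec, delimiter_byte(spec))
```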

    Table 3-4: DateStyle (continued)

    Sequence of Date Components | Four-digit Year | Two-digit Year
    ----------------------------+-----------------+---------------
    Numeric-month Day Year      | MDY             | MDY2
    Alphabetic-month Day Year   | MonDY           | MonDY2
    Day Alphabetic-month Year   | DMonY           | DMonY2

  • To use a character other than a 7-bit ASCII character as a delimiter, specify it as a decimal or hex number. Do not specify a character literal, which could result in errors from encoding transformation. For example, to use the hex value 0xe9 (a Latin9 character) as a delimiter, specify 0xe9 as the value rather than the character literal.
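    The reason a character literal is risky can be seen by encoding the same character two ways. In the Python sketch below, the helper delimiter_bytes is a hypothetical name; it uses Python's standard iso8859_15 (Latin9) and utf-8 codecs to show that the character behind 0xe9 is one byte in Latin9 but two bytes in UTF-8, so a literal typed in a UTF-8 terminal would not be a single-byte delimiter.

```python
def delimiter_bytes(ch):
    """Return the bytes that represent ch in Latin9 and in UTF-8."""
    return {
        "latin9": ch.encode("iso8859_15"),
        "utf8": ch.encode("utf-8"),
    }


print(delimiter_bytes("\u00e9"))   # the character with code point 0xe9
print(delimiter_bytes("|"))        # a 7-bit ASCII character is identical in both
```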

    Although the system accepts alpha-numeric characters, to avoid ambiguity do not select a delimiter that conflicts with the data in a field. Also if you use the dateDelim and timeDe-lim options, select different delimiters for each type.

    Note: When you are using the nzload wrapper, you can enter escape characters on the command line, such as \b. If you use the CREATE EXTERNAL TABLE command, the only special character you can specify is the tab character (\t).
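    The delimiter forms described in this section can be resolved to a single byte value roughly as follows. This is a minimal Python sketch, not the loader's implementation: the function name delimiter_byte and the SYMBOLS table are hypothetical, and the quoted form is assumed to use single quotes around exactly one character.

```python
# Hypothetical helper: resolve a delimiter specification to one byte value.
SYMBOLS = {
    "\\b": 8, "\\t": 9, "\\n": 10, "\\f": 12, "\\r": 13,
    "\\\\": 92, "\\'": 39, '\\"': 34,
}


def delimiter_byte(spec: str) -> int:
    """Return the byte value (0-255) named by a delimiter specification."""
    # Quoted literal such as '|': allowed only for 7-bit ASCII characters.
    if len(spec) == 3 and spec[0] == "'" and spec[2] == "'":
        if ord(spec[1]) > 127:
            raise ValueError("use a decimal or hex number for bytes above 127")
        return ord(spec[1])
    # Symbolic escapes such as \t or \n.
    if spec in SYMBOLS:
        return SYMBOLS[spec]
    # Control characters ^A-^Z and ^a-^z: keep the low-order 5 bits.
    if len(spec) == 2 and spec[0] == "^" and spec[1].isascii() and spec[1].isalpha():
        return ord(spec[1]) & 0x1F
    # Numeric forms: 0xNN (hexadecimal) or NN (decimal).
    value = int(spec, 16) if spec.lower().startswith("0x") else int(spec)
    if not 0 <= value <= 255:
        raise ValueError("delimiter must be a single byte")
    return value
```

    For example, delimiter_byte("'|'") and delimiter_byte("124") both name the pipe character, and delimiter_byte("0xe9") names the Latin-9 byte for é without passing a character literal.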

    Encoding

    Specifies the encoding of the datafile for the character set. The default is internal.

    You can also specify utf8 if the whole file is in UTF-8 encoding and has only nchar/nvarchar data and no char/varchar data. Use internal if the file could have both Latin-9 and UTF-8 data, or either type using char, varchar, nchar, or nvarchar data.

    The system supports single-byte characters in Latin9 encoding, and Unicode data in the multi-byte UTF-8 encoding. Use the encoding option to specify the type of data in the file. The encoding option has three values:

    - A value of latin9 indicates that the whole file is in Latin-9 char/varchar data and has no nchar/nvarchar data. (If the file contains any nchar/nvarchar data, the load operation rejects it.)

    - A value of utf8 indicates that the whole file is in UTF-8 encoding and has only nchar/nvarchar data and no char/varchar data. (If the file contains any char/varchar data, the load operation rejects it.)

    - A value of internal indicates that the file could have either or both Latin-9 and UTF-8 data using any or all of the char, varchar, nchar, or nvarchar data types. As a best practice, use internal if you are not certain of the data encoding.

    For more information, see the Using International Character Sets chapter in the Netezza Performance Server Database User's Guide.

    Use the nzconvert command to convert character encoding before loading with external tables. For the command options and examples, refer to Converting Legacy Formats in the Netezza Performance Server Database User's Guide.

    Note: This option is not supported for Fixed-Length format.

    EscapeChar

    Specifies the use of an escape character. The character immediately following the backslash (\) is escaped. The only supported value is \, and the default is no escaping.

    By default, the system expects fields to be delimited by a field-delimiter character or by an end-of-row sequence. The system assumes all other characters are part of the field's value.

    Although efficient, this representation has the drawback that string fields cannot contain instances of the field delimiter. In addition, one value typically becomes inexpressible because you have used it to convey the absence of any value (that is, that the column is null).

    One solution is to use an escape character for the delimiter. For example, the following command line demonstrates using the escapeChar option.

    nzload -escapeChar '\' -nullValue 'NULL' -delim '|'

    - |NULL|     A null input field
    - |\NULL|    A non-null input field containing the text NULL
    - |\||       A non-null input field containing the single character |
    - |\\|       A non-null input field containing the single character \

    Note: This option is not supported for Fixed-Length format.
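    The four cases above can be reproduced with a small field splitter. This is a hedged Python sketch of the behavior described in this section, not the loader's code: the names parse_fields and _finish are hypothetical, and the sketch assumes that any escaped character in a field prevents that field from matching the null token.

```python
def _finish(chars, escaped_any, null_token):
    """Turn collected characters into a field value; None marks a null."""
    text = "".join(chars)
    if not escaped_any and text.lower() == null_token.lower():
        return None
    return text


def parse_fields(row, delim="|", escape="\\", null_token="NULL"):
    """Split one input row into field values, honoring the escape character."""
    fields, chars, escaped_any = [], [], False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            # The character after the escape is taken literally.
            chars.append(row[i + 1])
            escaped_any = True
            i += 2
            continue
        if ch == delim:
            fields.append(_finish(chars, escaped_any, null_token))
            chars, escaped_any = [], False
        else:
            chars.append(ch)
        i += 1
    fields.append(_finish(chars, escaped_any, null_token))
    return fields
```

    Calling parse_fields on a row that contains the four sample fields in order yields a null, the text NULL, the single character |, and the single character \.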

    FillRecord

    Specifies whether to allow an input line with fewer columns than the table definition. Missing trailing input fields are treated as nulls if the columns are nullable. The default is false.

    The system expects one input field for every column in the target table's schema, and rejects a row with fewer fields. If you specify the fillRecord option, the system allows you to omit one or more trailing (rightmost) fields, as long as all corresponding columns can be null.

    Note: This option is not supported for Fixed-Length format.
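    The fillRecord rule can be illustrated with a short Python sketch. The function name apply_fill_record and its arguments are hypothetical; the sketch only models the rule as stated above (trailing fields may be omitted when every corresponding column is nullable) and treats a row with too many fields as an error.

```python
def apply_fill_record(fields, nullable, fill_record=False):
    """Pad a short row with nulls (None) when fillRecord permits it.

    fields   -- values parsed from one input row, left to right
    nullable -- one boolean per target column: True if the column can be null
    """
    if len(fields) > len(nullable):
        raise ValueError("row has more fields than the table has columns")
    missing = len(nullable) - len(fields)
    if missing == 0:
        return list(fields)
    if not fill_record:
        raise ValueError("row has fewer fields than the table has columns")
    if not all(nullable[len(fields):]):
        raise ValueError("an omitted field corresponds to a not-null column")
    return list(fields) + [None] * missing
```

    With fill_record=True, a two-field row loaded into a three-column table whose last column is nullable gains a trailing null; with the default setting, the same row is rejected.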

    Format

    Specifies the data format of the source file to load and unload. The valid values are as follows:

    - text (default): data in Text-Delimited format
    - fixed: data in the new Fixed-Length format
    - internal: data in compressed binary format (to use this value, the compress option must be set to true)

    IgnoreZero

    Specifies discarding byte value zero in char() and varchar() fields. The default is false. If true, the command accepts binary value zeroes in input fields and discards them.

    Note: This option is not supported for Fixed-Length format.

    IncludeZeroSeconds

    Specifies that 00 seconds values will be unloaded to the external table. For example, a time value such as 12:34:00 or 12:34 will be unloaded to the external table in the format 12:34:00. The default is false.

    Note: This option is not supported for Fixed-Length format, and is only for unloading.

    Layout

    Specifies the zone definitions.

    Note: This option is used only with the Fixed-Length format. For more information, see New Options on page 6-2.

    LogDir

    Specifies the directory to which nzlog and nzbad files are generated for loads. This is not used for unloads. The default value is '/tmp'. Note that when doing remote loads from Windows clients (through ODBC/JDBC), the default output directory is mapped to "C:\". The directory name must be a valid UTF-8 string.

    MaxErrors

    Specifies the number of errors at which the system stops processing rows. If the count of rejected rows reaches this threshold, the system immediately aborts and rolls back the load.

    The default value is 1. This default has the effect of committing a load only if it contains no errors. A maxErrors value n (where n is greater than 1) allows the first n-1 row rejections to be recoverable errors, not including the number of rows processed in the skipped row range.

    Use this option to specify a different value, from 0 (unlimited errors) up to 2,147,483,647 (the largest signed 32-bit integer).

    Note: This option is different for Fixed-Length format. For more information, see Changed Options on page 6-3.

    MaxRows

    Specifies that processing stops after this initial number of rows. You can also use a limit clause with the select statement to limit the data that is loaded. The default is 0 (load all rows).

    After processing a row (whether inserted, skipped or rejected), the system decides whether to look for another input row:

    - If you did not specify the maxRows option, the system attempts to locate the next input row.

    - If you specified the maxRows option and the input row counter is equal to the maxRows count, the system ends the load and commits all inserted records, not including the rows processed in the skipped row range. Otherwise, the system attempts to locate the next input row.

    NullValue

    Specifies the string to use for the null value, with a maximum 4-byte UTF-8 string. The default is NULL. You can specify a value such as a space (' ') or any string up to four characters. Conceptually, a field contains either a value or an indication that there is no value. The system provides some flexibility in how you indicate that a field contains no value. For more information about how the system handles nulls, see Column Constraint Rules for Empty Strings on page 2-10.

    The system determines a field's type and whether it is null by inspecting the corresponding column declaration:

    - If there is no value, the system sets the corresponding value in the candidate binary record to null.

    - If you declared the target column not null, then an absence of a value is an error.

    - If a field does not indicate null, the system assumes it contains a value. The system analyzes the contents of that field, converts its textual input representation to binary, and sets the corresponding value in the candidate binary record to that value.

    QuotedValue

    Specifies whether data values are quoted or not. The default is false. Specify SINGLE or YES to require single quotes or DOUBLE to require double quotation marks. You can precede the opening quote or follow the closing quote with spaces. You can use the actual quote characters if you enclose them in double quotes. The system recognizes the end of the field by a field-delimiter character or an end-of-row sequence.

    The system recognizes a quoted value when the first non-space character is the quote character specified in the quotedValue option. If the first non-space character is not the specified quote character, then the system handles it according to the normal rules. In particular, leading or trailing spaces in string fields are considered part of the string's value.

    For example, the following command line demonstrates using the quotedValue option.

    nzload -quotedValue SINGLE -nullValue 'NULL' -delim '|'

    - |NULL| A null input field
    - |NULL| A null input field
    - | Im | A non-null input field containing the text Im
    - | Im | A non-null input field containing the text Im
    - | | | A non-null input field containing the single character |
    - | | A non-null input field containing a single space
    - | | A non-null input field containing a single space
    - | | A non-null input field containing a zero-length string
    - || A non-null input field containing a zero-length string

    Note that unlike the escapeChar option, the quotedValue option is not able to force the system to accept the nullValue token as a valid non-null input value. The system overhead for processing quoted value syntax is much greater than the default unquoted syntax. In addition, except for strings containing three or more field delimiters that need to be escaped and no embedded quotes, using the quotedValue option results in more bytes of input data than the escapeChar option. When you have a choice, use unquoted syntax.

    If you expect all values in all input fields (string or otherwise) to be uniformly enclosed in quotes, then use the requireQuotes option to cause the system to enforce this usage. Using the requireQuotes option reduces the parsing overhead and provides extra robustness.

    Note: This option is not supported for Fixed-Length format.

    RecordDelim

    Specifies that the row/record delimiter to be used is the string literal. Valid values must be a maximum 8-byte UTF-8 string.

    Note: This option is used only with the Fixed-Length format. For more information, see New Options on page 6-2.

    RecordLength

    Specifies the length of the entire record. The value includes the length itself, but does not include the RecordDelimiter.

    Note: This option is used only with the Fixed-Length format. For more information, see New Options on page 6-2.

    RemoteSource

    Specifies that the source datafile is remote, and takes the following values: ODBC, JDBC or empty string. External tables created with the remote source set to ODBC or JDBC are usable only through ODBC or JDBC respectively. External tables created with the remote source not set (or set to empty string) are usable from any client (the source datafile path is assumed to be on the Netezza host, even if the load/unload is initiated remotely from a different host).

    Note that nzsql does not support remote loads/unloads using external tables (you can only create external tables remotely), though it does support loads/unloads locally on the host.

    This option is automatically set to ODBC if the hostname option is set to anything but localhost or the reserved IP address (127.0.0.1).

    RequireQuotes

    Specifies whether quotes are mandatory. The default is false. If set to true, the quotedValue option must be set to YES, SINGLE, or DOUBLE. See QuotedValue on page 3-10.

    Note: This option is not supported for Fixed-Length format.

    SkipRows

    Specifies the number of initial rows to skip before loading the data. The default is 0 (none). After the system has a candidate binary record from an input row, it determines whether to insert that record into the target table:

    - If you did not specify this option, the system inserts every record.

    - If you specified this option and the input row counter is less than or equal to the skipRows count, the system discards the candidate binary record (skipped). Otherwise, the system inserts the record.

    Note: If you use the skipRows option, the system skips that number of rows, and then begins the count for the maxErrors and/or maxRows options (if you have specified them).

    Note that this cannot be used for 'header' row processing in a datafile, as even the skipped rows are processed first, so the data in the header rows should be valid with respect to the external table definition.

    This option can be used for doing a dry-run to validate the datafile is correct, before loading into a user table, by setting a maximum value.
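    The interaction of the skipRows, maxRows, and maxErrors options can be sketched as a single row loop. This Python sketch is one reading of the descriptions in this chapter, not the loader's implementation. The function run_load and its parse callback are hypothetical, and the sketch assumes that rows in the skipped range are still parsed but are never inserted or counted toward maxRows or maxErrors.

```python
def run_load(rows, parse, skip_rows=0, max_rows=0, max_errors=1):
    """Model a load; returns (status, inserted_records, bad_rows).

    parse -- callable that converts one input row into a record and raises
             ValueError when the row must be rejected (hypothetical hook)
    """
    inserted, bad_rows = [], []
    counted = 0  # rows processed after the skipped range
    errors = 0   # rows rejected after the skipped range
    for row_number, row in enumerate(rows, start=1):
        in_skip_range = row_number <= skip_rows
        try:
            record = parse(row)
        except ValueError:
            bad_rows.append((row_number, row))
            if not in_skip_range:
                errors += 1
                # max_errors == 0 means unlimited; otherwise abort at the threshold.
                if max_errors and errors >= max_errors:
                    return "rolled back", [], bad_rows
        else:
            if not in_skip_range:
                inserted.append(record)
        if not in_skip_range:
            counted += 1
            # max_rows == 0 means load all rows.
            if max_rows and counted == max_rows:
                break
    return "committed", inserted, bad_rows
```

    With the default max_errors=1, the first rejected row outside the skipped range rolls the load back, which matches the statement that the default commits a load only if it contains no errors.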

    SocketBufSize

    Specifies the chunk size at which to read the data from the source file, expressed in bytes. Valid values range from 64KB to 800MB, with a default value of 8MB. Values outside this range result in a system notice that the value will be reset to the appropriate minimum or maximum level. This is used to fine-tune the performance of loads, depending on the speed at which the source data is available for loads.
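    The range handling described for SocketBufSize amounts to clamping the requested value. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that KB and MB mean 1024 and 1024 * 1024 bytes.

```python
KB = 1024          # assumption: binary kilobytes
MB = 1024 * 1024   # assumption: binary megabytes

MIN_BUF = 64 * KB
MAX_BUF = 800 * MB
DEFAULT_BUF = 8 * MB


def effective_socket_buf_size(requested=None):
    """Return (size_in_bytes, was_reset) for a requested buffer size."""
    if requested is None:
        return DEFAULT_BUF, False
    if requested < MIN_BUF:
        return MIN_BUF, True   # reset to the minimum, with a notice
    if requested > MAX_BUF:
        return MAX_BUF, True   # reset to the maximum, with a notice
    return requested, False
```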

    TimeDelim

    Specifies the single-byte character that separates the time components. The default is ':'.

    - If you specify the timeDelim option as an empty string, you must specify the hour, minutes, and optional seconds as two-digit numbers.

    - If you specify the 12-hour format, you can precede the AM or PM token with a single space. Note that the tokens AM and PM are case-insensitive.

    The system checks syntax and range errors. If an error occurs, the system discards the record to the nzbad file and logs an error with the record number in the nzlog file.

    TimeRoundNanos

    Rounds the time value to six fractional-seconds digits. Use the timeRoundNanos option to accept, but round, non-zero digits with finer than microsecond precision.

    - If you do not use the timeRoundNanos option, a value is accepted as long as it can be stored without loss of precision.

    - If you specify this option, the value is accepted even when the full precision of any fractional seconds cannot be preserved. In this case, the value is rounded.

    For example, consider the following timestamps:

    1999/12/31 23:59:59.9999994
    1999/12/31 23:59:59.9999995

    Both of these timestamps specify finer than microsecond resolution. Without the option, each would be rejected. Using the option, the first sample timestamp would round to:

    1999/12/31 23:59:59.999999

    The second sample would round to:

    2000/01/01 00:00:00.0

    Note: This option is not supported for Fixed-Length format, and is also referred to as the TimeExtraZeros option.
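    The two sample timestamps can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the system's algorithm: the function name round_to_micros is hypothetical, the input is assumed to use the YYYY/MM/DD HH:MM:SS layout shown above, and ties are assumed to round half up (the samples do not distinguish between tie-breaking rules).

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def round_to_micros(text):
    """Parse a timestamp and round its fractional seconds to six digits."""
    whole, _, fraction = text.partition(".")
    base = datetime.strptime(whole, "%Y/%m/%d %H:%M:%S")
    # Work in exact decimal arithmetic to avoid binary floating-point error.
    micros = (Decimal("0." + (fraction or "0")) * 1_000_000).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    # A carry of 1,000,000 microseconds rolls over into the next second.
    return base + timedelta(microseconds=int(micros))
```

    The first sample keeps its date and ends in .999999, while the second carries over into the first instant of 2000/01/01.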

    TimeStyle

    Specifies the time format (24HOUR, 12HOUR) used in the data file. The default is 24HOUR.

    TruncString

    Specifies truncating a string and inserting it into the declared storage.

    - False (default): the system reports an error when a string exceeds its declared storage.
    - True: the system truncates any string value that exceeds its declared char/varchar storage.

    Note: This option is not supported for Fixed-Length format.

    Y2Base

    If you specify a Y2-style date, use the -y2Base option to specify the start of the 100-year range. Table 3-5 provides some examples of date ranges and their corresponding input values.
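    The mapping that Table 3-5 illustrates can be expressed as a short calculation. The Python sketch below is an illustration, not the loader's code; the function name expand_y2 is hypothetical.

```python
def expand_y2(two_digit_year, y2_base):
    """Map a two-digit year into the 100-year window that starts at y2_base."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    # Start in the century that contains the base year.
    year = (y2_base - y2_base % 100) + two_digit_year
    # Years that fall before the base belong to the following century.
    if year < y2_base:
        year += 100
    return year
```

    For example, with a base of 1923 the input 24 becomes 1924 and the input 00 becomes 2000, while with a base of 1976 the input 24 becomes 2024.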

    Table 3-5: The -y2Base Option

    Desired Range   1900-1999      1923-2022      1976-2075      2000-2099
    Option          -y2Base 1900   -y2Base 1923   -y2Base 1976   -y2Base 2000

    Y2 input
    00              1900           2000           2000           2000
    01              1901           2001           2001           2001
    02              1902           2002           2002           2002
    24              1924           1924           2024           2024
    25              1925           1925           2025           2025
    76              1976           1976           1976           2076
    77              1977           1977           1977           2077
    98              1998           1998           1998           2098
    99              1999           1999           1999           2099

    Option Processing

    This section contains additional information on how the system processes the options.

    Counting Rows

    The system uses a line-oriented input format: one line of text is an input row. It operates by isolating successive rows in the input stream. Every time it finds a new row, it increments a row counter (starting with number 1) and analyzes the contents of the row.

    During analysis, two sorts of errors can occur:

    - The input text might not match the expected format.
    - A field value might fail to meet a requirement imposed by the target table schema.

    If a row contains no errors, the system converts the row into a candidate binary record.

    Handling Bad Rows

    When the system encounters an error, it stops analyzing the row, appends the row to the bad rows file, writes a supporting diagnostic message to the nzlog file describing the position and nature of the error, and increments a rejected rows counter.

    Delineating Input Rows

    Input rows are separated by any of the common end-of-line conventions: <CR>, <LF>, <CR><LF>, or <LF><CR>. In UNIX environments, <LF> is commonly known as NewLine. The last row/line need not have an end-of-line character.

    Neither of the pairs <CR><CR> nor <LF><LF> is a valid end-of-line sequence. Instead, each pair encloses an empty row containing no values. The system considers such an empty row valid only if you specified the fillRecord option and every column in the target table can be set to null.

    Matching Input Fields to Table Columns

    The system determines the shape of input rows by inspecting the target table's schema. The fields are paired up left-to-right with the columns in the target schema. Once the system has located the start of a field, the declared type of the corresponding target column guides further processing.

    Note: It is an error for a row to contain more fields than the target table contains columns.

    Using String and Non-string Fields

    If an input field corresponds to a column declared char, nchar, varchar, or nvarchar, the system considers it a string field, with all other types as non-string fields. This distinction is important because spaces are significant within string fields, but not elsewhere.

    Note: An empty field or a field containing only spaces can represent a legitimate string value, but can never be a legitimate non-string value.

    The system uses the following rules based on whether the field is a string field:

    - If the field is a string field: all characters from the beginning of the field to the terminating delimiter or end-of-row sequence contribute to the field's value.

    - If the field is a non-string field: the system skips any leading spaces, interprets or converts the field's contents, and skips any trailing spaces.

    The string/non-string distinction also affects the details of how a field indicates that it is null. For more information, see Handling the Absence of a Value on page 3-14.

    Handling the Absence of a Value

    In SQL, a record must include a value if a column is declared not null. When a record contains no value for a column, the column is considered to be null. The system provides an explicit and an implicit method for conveying nullness.

    - The explicit method includes a specific token in the field instead of a value. By default, this token is the word null (case insensitive). You can use the nullValue option to change this token to any other 1-4 character alphabetic token. You can precede or follow an occurrence of the explicit null token in a non-string field with adjacent spaces. For the system to recognize an explicit null token in a string field, the token cannot have preceding or trailing adjacent spaces. The explicit null token method makes it impossible to express a string consisting of exactly the text of the null token.

    - The implicit method interprets an empty field as null. This method is always available to non-string fields independent of any nullValue option setting and works even if the non-string field contains spaces. You can use the implicit method on string fields only if you have set the nullValue option to the empty string ('').

    The system considers a string field empty (potentially null) only if it contains truly zero characters (no spaces). Setting nullValue to the empty string makes it impossible to set any character varying (alias varchar(n)) column to an empty, zero-length string. In other words, if the system encounters an empty string and the nullValue is set to '', then the system treats the empty string as a null value.
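    The explicit and implicit rules above can be combined into one decision function. This Python sketch is a hedged illustration of the rules as written, not the system's code: the name field_is_null is hypothetical, and custom null tokens are assumed to be matched case-insensitively, as the default token is.

```python
def field_is_null(raw, is_string, null_value="NULL"):
    """Decide whether the raw text of one input field denotes null."""
    if is_string:
        if null_value == "":
            # Implicit method for strings: truly zero characters only.
            return raw == ""
        # Explicit method for strings: no leading or trailing spaces allowed.
        return raw.lower() == null_value.lower()
    # Non-string fields: surrounding spaces are not significant.
    stripped = raw.strip(" ")
    if stripped == "":
        return True  # implicit method, always available to non-string fields
    return null_value != "" and stripped.lower() == null_value.lower()
```

    Under these rules, the text " NULL" in a string field is an ordinary five-character value, while the same text in a non-string field is null.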

    Enabling Load Continuation

    If you enable load continuation with the allowReplay option, or set the session variable LOAD_REPLAY_REGION to true, the system ensures that a simple load using external tables can continue after the system has been paused and resumed. You do not have to abort and resubmit the load. If no value is specified for the allowReplay option, or the value n is 0, the system defaults to the postgres default setting. If n is a valid non-zero number, it specifies the number of allowable query restarts.

    The system accomplishes this automatic resumption by holding records to be sent to the SPU in the replay region in host memory. After the system sends the data in this region to the SPUs, it does a partial commit that forces all the unwritten data to the SPUs' disks and allows the system to re-use the region's data buffers. If an SPU reboots or resets, the system rolls back to the last partial commit, and reprocesses and resends the data.

    Note: Setting this option has a performance impact which depends on the speed of the incoming data. In addition, system memory is used for the data buffering that enables loads to be continued. When the buffer memory is exhausted, new loads will pend until needed memory becomes available.

    Load continuation cannot operate on any table that has one or more materialized views in an active state. Before enabling load continuation, suspend the associated materialized views. You can suspend active materialized views either through the NzAdmin tool or by issuing the ALTER VIEWS command. Sample syntax for ALTER VIEWS follows.

    ALTER VIEWS ON <table> MATERIALIZE SUSPEND

    Once loading has completed, you can update and activate the materialized views for the table. Sample syntax follows.

    ALTER VIEWS ON <table> MATERIALIZE REFRESH

    For more information, see the Netezza Performance Server System Administrator's Guide.

    Handling Legal Characters

    Input is composed of the printing characters (bytes 33-255), space (byte 32), horizontal tab (byte 9), line feed (byte 10) and carriage return (byte 13). By default you cannot use the nonprinting control characters.

    - Specify the ctrlChars option to permit control characters (bytes 1-8, 11-12, and 14-31) to appear within strings. In this case, only 0, 10, and 13 are not allowed.

    - Specify the crInString option to permit unescaped carriage returns (CR) in char/varchar fields. If you specify the crInString option, line feed (LF) becomes the default end-of-row indicator.

    - Specify the escapeChar option to permit any character preceded with a backslash (\) to be interpreted as an escape character. In this way, you could use the zero (byte 0), line feed (byte 10), carriage return (byte 13), or the closing delimiter.

    - Specify the ignoreZero option to cause the system to check every character for zero. This causes the system to skip over each zero it finds and to consider the next character. If you specify this option, you cannot include a zero byte in a string.

    For example, assume that the fields below contain embedded null bytes (byte 0), the field delimiter is '|', and you have specified ignoreZero:

    ..|ABCDEF|..
    fills a char(6) column with 'ABCDEF'.

    ..|127|..
    fills a byteint column with binary 01111111 (= 0x7F).
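    The effect of ignoreZero on a single field can be sketched in Python as follows. The function name read_field is hypothetical, and the positions of the zero bytes in the usage example are invented for illustration; the sketch only models the rule that zero bytes are skipped when the option is set and are not allowed otherwise.

```python
def read_field(raw: bytes, ignore_zero: bool = False) -> bytes:
    """Return the field's bytes with zero bytes removed when ignoreZero is set."""
    if b"\x00" in raw:
        if not ignore_zero:
            raise ValueError("zero byte in input field")
        # Skip each zero byte and keep every other byte in order.
        raw = raw.replace(b"\x00", b"")
    return raw
```

    For instance, read_field(b"A\x00BC\x00DEF", ignore_zero=True) yields the six bytes ABCDEF, and int(read_field(b"1\x0027", ignore_zero=True)) yields 127.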

    Table 3-6 lists the end-of-row and control characters that are permitted with the different nzload system options. A check mark indicates that the option is specified or allowed.

    Note: In Fixed-Length format, control characters are treated differently. For more information, see Chapter 6, Using Fixed-Length Format.

    Session Variables

    The following session variables work as nzload options.

    - LOAD_REPLAY_REGION: see Enabling Load Continuation