

Informatica Confidential. Do not duplicate. Revision: 1/25/2008

Tips & Techniques for NCR Teradata© with Informatica PowerCenter©

Tips & Techniques for NCR Teradata© with Informatica PowerCenter© .... 1
Introduction .... 3
Teradata Basics .... 3
Teradata Hardware .... 3
Teradata Software .... 3
Tools .... 4
Client Configuration Basics for Teradata .... 5
Informatica/Teradata Touch Points .... 5
    ODBC .... 5
        ODBC Windows .... 7
        ODBC UNIX .... 7
    Teradata External Loaders .... 8
    Partitioned Loading .... 15
    Teradata TPump .... 16
    Teradata MultiLoad .... 17
        Cleaning up after a failed MultiLoad session .... 17
        Using One Instance of Teradata MultiLoad to Load Multiple Tables .... 18
        Multiple Workflows that “MultiLoad” to The Same Table .... 22
    Teradata FastLoad .... 22
    FAQ: Why are there three different loaders for Teradata? Which loader should I use? .... 22
    Teradata FastExport .... 24
        HOW TO: Enable the Teradata FastExport option in an existing repository using command line .... 25
        HOW TO: Use encryption with FastExport .... 28
    Teradata Parallel Transporter (TPT) .... 29
        Connection attributes for Teradata Parallel Transporter (TPT) .... 31
    ETL vs. EL-T Design Paradigm (Pushdown Optimization) .... 32
        Maximizing Performance using Pushdown Optimization .... 32
        Running Pushdown Optimization Sessions .... 33
        Running Source-Side Pushdown Optimization Sessions .... 33
        Running Target-Side Pushdown Optimization Sessions .... 33
        Running Full Pushdown Optimization Sessions .... 33
        Integration Service Behavior with Full Optimization .... 34
        Working with SQL Overrides .... 36
        Configuring Sessions for Pushdown Optimization .... 36
        Design Techniques .... 38
FAQs .... 41
    Uncached Lookup Date/Time limitation .... 41
    “Streaming”/Non-Staged Mode .... 41
    Lookup Performance .... 49


Hiding the Password .... 49
Troubleshooting .... 50
    Errors that indicate a prior MultiLoad session has not been “cleaned up” .... 51
    Sessions periodically fail with “broken pipe” errors when writing to a loader in “streaming” (non-staging) mode .... 51


Introduction

This document gives an overview of the integration and touch points between Informatica and Teradata. It discusses EL-T architecture and design methodologies, covers the configuration of the various Informatica/Teradata touch points, and supplies how-to examples for using Informatica PowerCenter® 8.1/8.1.1 with Teradata Warehouse. It also covers Teradata basics and describes some “tweaks” that experience has shown may be necessary to deal with common practices you may encounter at a Teradata account. The Teradata documentation (especially the MultiLoad, FastLoad, TPump, FastExport, and TPT reference materials) is highly recommended, as is the “External Loader” section of the Server Manager Guide for PowerCenter.

Additional information: All Teradata documentation can be downloaded from the Teradata Web site (http://www.info.ncr.com/Teradata/eTeradata-BrowseBy.cfm). The Teradata Forum (http://www.Teradataforum.com) also provides a wealth of useful information. Visit my.informatica.com for Informatica documentation.

Teradata Basics

Teradata, a division of NCR Corporation (NYSE: NCR), is the global technology leader in enterprise data warehousing, analytic applications, and data warehousing services. Optimized for decision support, Teradata Warehouse is a suite of software (the Teradata RDBMS, data access and management utilities, and data mining capabilities), hardware, and consulting services. Due to its parallel database architecture and scalable hardware, Teradata Warehouse outperforms other vendors’ solutions, from small production warehouses to very large ones with hundreds of terabytes. With more than 25 years of experience, Teradata is a major player in the financial services, retail, communications, insurance, travel and transportation, and manufacturing industries, as well as with government organizations.

Teradata Hardware

While Teradata can run on other platforms, it is predominantly found on NCR Intel-based servers running NCR’s version of UNIX (NCR UNIX MP-RAS), which deliver high performance, availability, and scalability to accommodate business growth. NCR servers can be configured for both massively parallel processing (MPP) and symmetric multiprocessing (SMP); each MPP “node” (or semi-autonomous processing unit) can itself support SMP. Teradata can be configured to communicate directly with a mainframe’s input/output (I/O) channel, known as “channel-attached.” Alternatively, it can be “network-attached”; that is, configured to communicate via transmission control protocol/Internet protocol (TCP/IP) over a local area network (LAN). Because PowerCenter runs on UNIX, you will be dealing with a network-attached configuration most of the time. Occasionally, however, clients will want to use their existing channel-attached configuration under the assumption of better performance. Do not assume that channel-attached is always faster than network-attached: similar performance has been observed across a channel attachment and a 100-Mb LAN. In addition, channel attachment requires an extra sequential data move; data must be moved from the PowerCenter server to the mainframe before it can cross the mainframe channel to Teradata.

Teradata Software


In the Teradata world, there are Teradata Director Program Identifiers (TDPIDs), databases, and users. The TDPID is simply the name a Teradata client uses to connect to a Teradata server (comparable to the host/port/instance-name mapping files used by other RDBMSs). Teradata also treats databases and users somewhat synonymously: a user has a user ID, a password, and space in which to store tables, and that user ID is what you log in to the instance with. Teradata AMPs are access module processors; think of them as Teradata’s parallel database engines. Although they are strictly software (“virtual processors” in Teradata terminology), Teradata often uses the terms AMP and hardware “node” interchangeably because an AMP was previously a piece of hardware.

Tools

These are the main tools you may find at a Teradata site. There are others, but these are the most common.

BTEQ (pronounced BEE-teek): The command-line utility for Teradata, similar to Oracle’s SQL*Plus. Teradata SQL is, for the most part, standard SQL.

Teradata SQL Assistant/Queryman: The GUI SQL client for Teradata. Older versions were called “Queryman,” newer versions “Teradata SQL Assistant”; they are basically the same tool. If you are going to be doing a lot with Teradata, it is probably worth getting access to this tool so you do not have to do so much command-line typing.

WinDDI (Windows Data Dictionary Interface): A DBA-type client tool used to perform database administration tasks. It is nice if you can get access to this tool, but experience shows that not every client will allow it.
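As an illustration, a minimal BTEQ session might look like the following sketch. The TDPID (demo1099), the credentials, and the table name are hypothetical; substitute your own.

```
.LOGON demo1099/etl_user,etl_pass;
SELECT COUNT(*) FROM targetdb.TD_INVENTORY;
.QUIT;
```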

MultiLoad: This is a sophisticated bulk-load utility and the primary method PowerCenter uses to load large quantities of data into Teradata. Unlike bulk-load utilities from other vendors, MultiLoad supports insert, update, upsert, and delete operations, and you can use variables and embed conditional logic in MultiLoad scripts. It is very fast (millions of rows in a few minutes). It is also a resource hog.

TPump: TPump is something of a “MultiLoad lite.” It also supports inserts, updates, upserts, and deletes. It is not as fast as MultiLoad, but it does not use as many resources, nor does it require table-level locks, so it is often used to “trickle load” a table. The syntax of a TPump script is very similar to MultiLoad’s.

FastLoad: As the name suggests, this is the fastest method to load data into Teradata. However, there is one major restriction: the target table must be empty (yes, you read that correctly).

FastExport: As the name suggests, this is a very fast utility to unload data from Teradata. Teradata’s ODBC driver has been optimized for query support, so it is fairly fast (testing has shown it to be as fast as BTEQ), but not as fast as FastExport. FastExport has been supported since version 7.1.3.

Teradata Warehouse Builder: Teradata Warehouse Builder (TWB) is a single utility intended to replace FastLoad, MultiLoad, TPump, and FastExport. It was to support a single scripting environment with different “modes,” each roughly equating to one of the legacy utilities, as well as parallel loading (that is, multiple instances of a TWB client running and loading the same table at the same time, something the legacy loaders cannot do). PowerCenter supports TWB; unfortunately, NCR/Teradata does not. Much to Informatica’s dismay, TWB has never been formally released (it never went “GA”). According to NCR, its release was delayed primarily because of issues with the mainframe version, and the delay has lasted for over two years. If you find a prospect willing to use TWB, please do; its ability to support parallel load clients makes some things quite a bit easier.


Client Configuration Basics for Teradata

The client configuration is wholly contained in the “hosts” file (/etc/hosts on UNIX, or winnt\system32\drivers\etc\hosts on Windows). Informatica does not run on NCR UNIX MP-RAS, so you should not have to deal with the server side. Teradata uses a naming convention in the hosts file: the name of the Teradata instance (that is, the TDPID) is indicated by the letters and numbers that precede the string “cop1” in a hosts file entry. For example:

127.0.0.1      localhost demo1099cop1
192.168.80.113 curly pcop1

This tells Teradata that when a client tool references the instance “demo1099,” it should direct requests to “localhost” (IP address 127.0.0.1); when a client tool references instance “p,” it is located on the server “curly” (IP address 192.168.80.113). There is no tie here to any kind of database server-specific information; the TDPID is used strictly to define the name a client uses to connect to a server. Teradata simply takes the name you specify, looks in the hosts file to map <name>cop1 (or cop2, and so on) to an IP address, and then attempts to establish a connection with Teradata at that address. Sometimes you will see multiple entries in a hosts file with similar TDPIDs:

127.0.0.1      localhost demo1099cop1
192.168.80.113 curly_1 pcop1
192.168.80.114 curly_2 pcop2
192.168.80.115 curly_3 pcop3
192.168.80.116 curly_4 pcop4

This setup allows load balancing of clients among multiple Teradata nodes. Most Teradata systems have many nodes, and each node has its own IP address. Without the multiple hosts file entries, every client would connect to one node, and eventually that node would be doing more than its fair share of client processing.
With multiple hosts file entries, if the node specified with the “cop1” suffix (that is, curly_1) takes too long to respond to the client’s request to connect to “p,” the client automatically attempts to connect to the node with the “cop2” suffix (that is, curly_2), and so on.
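The cop&lt;n&gt; naming convention can be illustrated with a short parsing sketch. This is a hypothetical helper, not part of any Teradata tool; it simply groups the IP addresses in a hosts file by TDPID, in the cop-suffix order a client would try them.

```python
import re

def tdpids(hosts_text):
    """Group IP addresses by TDPID using the <tdpid>cop<n> alias convention."""
    result = {}
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()   # ignore comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, aliases = fields[0], fields[1:]
        for alias in aliases:
            m = re.fullmatch(r"(\w+?)cop(\d+)", alias)
            if m:
                result.setdefault(m.group(1), []).append((int(m.group(2)), ip))
    # order each TDPID's addresses by cop suffix: cop1 is tried first
    return {name: [ip for _, ip in sorted(addrs)] for name, addrs in result.items()}

hosts = """\
127.0.0.1      localhost demo1099cop1
192.168.80.113 curly_1 pcop1
192.168.80.114 curly_2 pcop2
"""
print(tdpids(hosts))
```

Running this on the sample entries shows the TDPID “p” mapped to two node addresses, which is exactly the failover/load-balancing order described above.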

Informatica/Teradata Touch Points

Informatica PowerCenter 7.1x/8.x accesses the Teradata Database through various Teradata tools. Each touch point is described below, along with how it is configured within PowerCenter.

ODBC

Teradata provides 32-bit and 64-bit ODBC drivers for Windows and UNIX platforms. If possible, use the ODBC driver from Teradata’s TTU 8.1 release (or above) of its client software, because this version supports “array reads”; tests have shown these “new” drivers (3.05) can be 20 to 30 percent faster than the “old” drivers (3.01). The 64-bit drivers also provide better performance than their 32-bit counterparts, but note that the driver’s bit mode needs to be compatible with PowerCenter’s bit mode. The TTU 8.1 release uses ODBC v3.0.5; TTU 8.2 uses ODBC v3.0.6, which is not yet supported by PowerCenter 8.1.1. Teradata’s ODBC driver is on a performance par with Teradata’s SQL CLI; in fact, ODBC is Teradata’s recommended SQL interface for its partners. When using a traditional ETL approach, use ODBC to write to Teradata only for very small data sets (and even then, you should probably use TPump, described later), because Teradata’s ODBC driver is optimized for query access, not for writing data. For extraction and large lookups, it is better to use FastExport than ODBC. PowerCenter Designer uses Teradata’s ODBC driver to import source and target tables.


If you have performance problems, Pushdown Optimization is suggested; detailed design methodologies, the EL-T approach, and configuring and designing mappings with Pushdown Optimization are described in the following sections. If Pushdown Optimization is unavailable, use the native Teradata utilities. Note: ODBC is not well suited to sourcing and lookups; FastExport should be used instead for large source reads and lookups. The most efficient way to build a lookup over a very large data set is to use FastExport to create a sorted file that can then be used as the source for a flat-file lookup. In version 8.5, you can also use the “pipeline lookup” feature to do this automatically: it allows any source qualifier (even one that uses FastExport behind the scenes) to be tied to a lookup.
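For example, a FastExport script that produces a sorted flat file suitable for a flat-file lookup might look roughly like the following sketch. The logon details, database, table, and file names are hypothetical; consult the Teradata FastExport reference for exact syntax and options.

```
.LOGTABLE srcdb.fexp_log;
.LOGON demo1099/etl_user,etl_pass;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE lkp_customers.dat MODE RECORD FORMAT TEXT;
SELECT CUST_ID, CUST_NAME
FROM srcdb.CUSTOMERS
ORDER BY CUST_ID;            /* sorted so the flat-file lookup can use it directly */
.END EXPORT;
.LOGOFF;
```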


ODBC Windows

ODBC UNIX

When the PowerCenter server is running on UNIX, ODBC is required to read (for both sourcing and lookups) from the Teradata Database. As with all UNIX ODBC drivers, the key to configuring the Teradata driver is adding the appropriate entries to the “.odbc.ini” file: there must be an entry under [ODBC Data Sources] that points to the Teradata ODBC driver shared library (tdata.sl on HP-UX; the standard shared-library extension on other flavors of UNIX). The following example shows the required entries from an actual “.odbc.ini” file (note that the path to the driver may differ on each computer):

[ODBC Data Sources]


dBase=MERANT 3.60 dBase Driver
Oracle8=MERANT 3.60 Oracle 8 Driver
Text=MERANT 3.60 Text Driver
Sybase11=MERANT 3.60 Sybase 11 Driver
Informix=MERANT 3.60 Informix Driver
DB2=MERANT 3.60 DB2 Driver
MS_SQLServer7=MERANT SQLServer driver

TeraTest=tdata.sl

[TeraTest]
Driver=/usr/odbc/drivers/tdata.sl
Description=Teradata Test System
DBCName=148.162.247.34

Similar to the client hosts file setup, you can specify multiple IP addresses for DBCName to balance the client load across multiple Teradata nodes. Consult the Teradata administrator for exact details, or copy the entries from the PC client’s hosts file (see the section “Client Configuration Basics for Teradata” earlier in this document).

Important note: Make sure that the Merant ODBC path precedes the Teradata ODBC path in the PATH and SHLIB_PATH (or LD_LIBRARY_PATH, and so on) environment variables. This is necessary because both sets of ODBC software use some of the same file names, and PowerCenter should use the Merant files because that software has been certified.

Important note: If possible, use the ODBC driver from Teradata’s TTU7 release (or above) of its client software, because this version supports “array reads.” Tests have shown these “new” drivers (3.02) can be 20 to 30 percent faster than the “old” drivers (3.01).

Teradata External Loaders

PowerCenter 7.1.2/8.x supports four different Teradata external loaders: TPump, FastLoad, MultiLoad, and Teradata Warehouse Builder. The actual Teradata loader executables (tpump, mload, fastload, tbuild) must be accessible to the PowerCenter server, generally via the PATH. (Note: check the Product Availability Matrix for supported combinations.) All of the Teradata loader connections require a value for the TDPID attribute; refer to the first section of this document to understand how to enter this value correctly. All of these loaders require:

- A load file, which can be configured as a stream/pipe and is autogenerated by PowerCenter
- A control file of commands that tell the loader what to do (also autogenerated by PowerCenter)

All of these loaders also produce a log file, which is the primary means of debugging the loader if something goes wrong. Because these are external loaders, PowerCenter is only notified of whether the loader ran successfully or not.


By default, the input file, control file, and log file will be created in $PMTargetFileDir of the PowerCenter server executing the workflow.
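For reference, the autogenerated control file for a simple MultiLoad insert session looks roughly like the following sketch. All names here (logon, database, tables, layout, and input file) are hypothetical; inspect the file PowerCenter actually generates and the Teradata MultiLoad reference for the exact content.

```
.LOGTABLE targetdb.TD_INVENTORY_LOG;
.LOGON demo1099/etl_user,etl_pass;
.BEGIN IMPORT MLOAD
    TABLES targetdb.TD_INVENTORY
    WORKTABLES targetdb.WT_TD_INVENTORY
    ERRORTABLES targetdb.ET_TD_INVENTORY targetdb.UV_TD_INVENTORY
    CHECKPOINT 0;
.LAYOUT lay1;
    .FIELD ITEM_ID * VARCHAR(10);
    .FIELD QTY     * VARCHAR(10);
.DML LABEL ins1;
    INSERT INTO targetdb.TD_INVENTORY (ITEM_ID, QTY)
        VALUES (:ITEM_ID, :QTY);
.IMPORT INFILE td_inventory.out
    FORMAT VARTEXT '|'
    LAYOUT lay1
    APPLY ins1;
.END MLOAD;
.LOGOFF;
```

Seeing the structure makes the override mechanics in the following pages easier to follow: the pieces you typically touch in an override are the .BEGIN options (work/error tables, checkpoint) and the .DML/.IMPORT blocks.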


You can use any of these loaders by configuring the target in the PowerCenter session to be a “File Writer” and then choosing the appropriate loader.


The auto-generated control file can be overridden. Click the Pencil icon next to the loader connection name.


Scroll to the bottom of the connection attribute list and click the value next to the “Control File Content Override” attribute. Then click the Down arrow.


Click the “Generate” button and change the control file as you wish. The repository stores the changed control file.

Alternate option:
1. Run the session.
2. Modify the control file created by the initial session run.
3. Make the control file read-only.
4. Run the session again. This and subsequent session runs will use the modified control file.


Most of the loaders also use some combination of internal work, error, and log tables. By default, these are created in the same database as the target table; all of them can now be overridden in the attributes of the connection.
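If a MultiLoad job fails and leaves these tables behind, cleanup typically means releasing the load lock on the target and dropping the leftover tables. A sketch, assuming the default ET_/UV_/WT_ naming convention and a hypothetical restart log table name (adjust to any overrides set on the connection):

```sql
RELEASE MLOAD targetdb.TD_INVENTORY;      /* may need ... IN APPLY if the job failed in the apply phase */
DROP TABLE targetdb.ET_TD_INVENTORY;      /* acquisition-phase error table */
DROP TABLE targetdb.UV_TD_INVENTORY;      /* application-phase (uniqueness violation) error table */
DROP TABLE targetdb.WT_TD_INVENTORY;      /* work table */
DROP TABLE targetdb.TD_INVENTORY_LOG;     /* restart log table */
```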

To stage the input flat file to disk (versus the in-memory/pipe option), ensure the “Is Staged” attribute is checked. If the “Is Staged” attribute is not checked, the file will be piped/streamed to the loader. If you select non-staged mode for a loader, also set the “checkpoint” property to 0, which effectively turns off checkpoint processing. Checkpoint processing is used for recovery/restart of Teradata FastLoad and MultiLoad sessions; however, if you are using a named pipe instead of a physical file as input, the recovery/restart mechanism of the loaders does not work. Besides impacting performance (checkpoint processing is not free, and unnecessary overhead should be eliminated where possible), a nonzero checkpoint value will sometimes cause seemingly random errors and session failures when used with named-pipe input (as is the case in “streaming” mode).


Partitioned Loading

With PowerCenter, if you set a “round robin” partition point on the target definition and set each target instance to be loaded using the same loader connection instance, PowerCenter automatically writes all data to the first partition and starts only one instance of Teradata FastLoad or MultiLoad. You will know you are getting this behavior if you see the following entry in the session log:

MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions. All data will be routed to the first partition.

If you do not see this message, then chances are the session fails with the following error:

WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.


Teradata TPump

Teradata TPump is an external loader that supports inserts, updates, upserts, deletes, and data-driven updates. Multiple TPump loaders can execute simultaneously against the same table because TPump does not use many resources or require table-level locks. It is often used to “trickle load” a table. As stated earlier, Teradata TPump provides a faster method of updating a table than ODBC, but it is not as fast as the other loaders.


Teradata MultiLoad

This sophisticated bulk-load utility is the primary method PowerCenter uses to load or update large quantities of data in a Teradata warehouse. Unlike bulk-load utilities from other vendors, Teradata MultiLoad supports inserts, updates, upserts, deletes, and data-driven operations in PowerCenter. You can also use variables and embed conditional logic in Teradata MultiLoad scripts. It is very fast (millions of rows in a few minutes), but it can be resource-intensive and will take a table lock.

Cleaning up after a failed MultiLoad session

Teradata MultiLoad supports sophisticated error recovery; that is, it allows load jobs to be restarted without having to redo all of the prior work. However, for the types of problems normally encountered during a proof of concept (loading null values into a column that does not support nulls, incorrectly formatted date columns), the error recovery mechanisms tend to get in the way. For details on MultiLoad's error recovery, refer to the Teradata MultiLoad manual (available at www.teradata.com/manuals). To learn how to work around the recovery mechanisms and restart a failed MultiLoad script from scratch, read this section.


Teradata MultiLoad puts the target table into the “MultiLoad” state. Upon successful completion, the target table is returned to the “normal” (non-“MultiLoad”) state. Therefore, when a MultiLoad session fails for any reason, the table is left in the “MultiLoad” state and you cannot simply rerun the same MultiLoad session; MultiLoad will report an error. In addition, MultiLoad queries the target table's MultiLoad log table to see if it contains any errors. If a MultiLoad log table exists for the target table, you also will not be able to rerun your MultiLoad job.

To recover from a failed MultiLoad, “release” the target table from the “MultiLoad” state and also drop the MultiLoad log table. You can do this using BTEQ or Teradata QueryMan to issue the following commands:

drop table mldlog_<table name>;
release mload <table name>;

Note: The “drop table” command assumes that you're recovering from a MultiLoad script generated by PowerCenter (PowerCenter always names the MultiLoad log table “mldlog_<table name>”). If you're working with a hand-coded MultiLoad script, the name of the MultiLoad log table could be anything.

Here is the actual text from a BTEQ session that cleans up a failed load to the table “td_test” owned by the user “infatest”:

BTEQ -- Enter your DBC/SQL request or BTEQ command:
drop table infatest.mldlog_td_test;

drop table infatest.mldlog_td_test;

*** Table has been dropped.
*** Total elapsed time was 1 second.

BTEQ -- Enter your DBC/SQL request or BTEQ command:
release mload infatest.td_test;

release mload infatest.td_test;

*** Mload has been released.
*** Total elapsed time was 1 second.
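These cleanup statements can be generated by a small helper. This is a sketch, not Informatica-supplied tooling: it only prints the SQL, following the mldlog_<table name> convention above; you would pipe its output into bteq together with a .LOGON line for your system.

```shell
# Emit the BTEQ cleanup statements for a failed MultiLoad run,
# given a database and table name (PowerCenter log-table naming).
mload_cleanup() {
  db="$1"; tbl="$2"
  printf 'drop table %s.mldlog_%s;\n' "$db" "$tbl"
  printf 'release mload %s.%s;\n' "$db" "$tbl"
}

mload_cleanup infatest td_test
```

For example: { echo '.LOGON demo1099/infatest,infatest;'; mload_cleanup infatest td_test; echo '.LOGOFF;'; } | bteq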

Using One Instance of Teradata MultiLoad to Load Multiple Tables

MultiLoad is a big consumer of resources on a Teradata system, and some systems have hard limits on the number of concurrent MultiLoad sessions allowed. By default, PowerCenter starts an instance of MultiLoad for every target file. Sometimes this is illegal (if the multiple instances target the same table); other times it is just expensive. Therefore, a prospect may ask that PowerCenter use a single instance of MultiLoad to load multiple tables (or to load both inserts and updates into the same target table). To make this happen, you must heavily edit the generated MultiLoad script file.

Note: This is not an issue with Teradata TPump, because TPump is not as resource-intensive as MultiLoad (and multiple concurrent instances of TPump can target the same table).

Here's the workaround:

1) Use a dummy session (i.e., set test rows to 1 and target a test database) to generate MultiLoad control files for each of the targets.

2) Merge the multiple control files (one per target table) into a single control file (one for all target tables).

3) Configure the session to call MultiLoad from a post-session script using the control file created in Step 2. Integrated support cannot be used because each input file is processed sequentially, and this causes problems when combined with PowerCenter's integrated named pipes and streaming.

Details on “merging” the control files:


1) There is a single log file for each instance of MultiLoad, so you do not have to change or add anything to the “LOGFILE” statement. However, you might want to change the name of the log table, because it now spans multiple tables.

2) Copy the work and error tables' delete statements into the common control file.

3) Modify the “BEGIN MLOAD” statement to specify all the tables that the MultiLoad job will be hitting.

4) Copy the “Layout” sections into the common control file and give each a unique name. Organize the file such that all the layout sections are grouped together.

5) Copy the “DML” sections into the common control file and give each a unique name. Organize the file such that all the DML sections are grouped together.

6) Copy the “Import” statements into the common control file and modify them to reference the unique layout and DML names created in Steps 4 and 5. Organize the file such that all the import sections are grouped together.

7) Run “chmod -w” on the newly minted control file so PowerCenter doesn't overwrite it, or, better yet, give it a different name so PowerCenter cannot overwrite it.

8) Remember, a single instance of Teradata MultiLoad can target five tables at most. Therefore, don’t combine more than five target files into a common file.
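As a guard against the five-table limit, a merged control file can be sanity-checked before use. This is a sketch that assumes the clause layout PowerCenter generates, where only target tables appear as db.table names inside the “.BEGIN … MLOAD” statement:

```shell
# Count qualified db.table names between ".BEGIN ... MLOAD" and the
# terminating ";" of a control file; warn when more than five.
count_mload_tables() {
  sed -n '/\.BEGIN.*MLOAD/,/;/p' "$1" \
    | tr ' ,' '\n\n' \
    | grep -c '^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$'
}

# Example usage against a hypothetical merged file name:
if [ -f merged.ctl ]; then
  n=$(count_mload_tables merged.ctl)
  [ "$n" -le 5 ] || echo "WARNING: $n tables exceed MultiLoad's five-table limit"
fi
```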

Here's an example of a control file merged from two default control files:

.DATEFORM ANSIDATE;
.LOGON demo1099/infatest,infatest;
.LOGTABLE infatest.mldlog_TD_TEST;
DROP TABLE infatest.UV_TD_TEST ;
DROP TABLE infatest.WT_TD_TEST ;
DROP TABLE infatest.ET_TD_TEST ;
DROP TABLE infatest.UV_TD_CUSTOMERS ;
DROP TABLE infatest.WT_TD_CUSTOMERS ;
DROP TABLE infatest.ET_TD_CUSTOMERS ;
.ROUTE MESSAGES WITH ECHO TO FILE c:\LOGS\TgtFiles\td_test.out.ldrlog ;
.BEGIN IMPORT MLOAD
  TABLES infatest.TD_TEST, infatest.TD_CUSTOMERS
  ERRLIMIT 1
  CHECKPOINT 10000
  TENACITY 10000
  SESSIONS 1
  SLEEP 6 ;
/* Begin Layout Section */
.Layout InputFileLayout1;
.Field CUST_KEY 1 CHAR( 12) NULLIF CUST_KEY = '*' ;
.Field CUST_NAME 13 CHAR( 20) NULLIF CUST_NAME = '*' ;
.Field CUST_DATE 33 CHAR( 10) NULLIF CUST_DATE = '*' ;
.Field CUST_DATEmm 33 CHAR( 2) ;
.Field CUST_DATEdd 36 CHAR( 2) ;
.Field CUST_DATEyyyy 39 CHAR( 4) ;
.Field CUST_DATEtd CUST_DATEyyyy||'/'||CUST_DATEmm||'/'||CUST_DATEdd NULLIF CUST_DATE = '*' ;
.Filler EOL_PAD 43 CHAR( 2) ;
.Layout InputFileLayout2;
.Field CUSTOMER_KEY 1 CHAR( 12) ;
.Field CUSTOMER_ID 13 CHAR( 12) ;
.Field COMPANY 25 CHAR( 50) NULLIF COMPANY = '*' ;
.Field FIRST_NAME 75 CHAR( 30) NULLIF FIRST_NAME = '*' ;
.Field LAST_NAME 105 CHAR( 30) NULLIF LAST_NAME = '*' ;
.Field ADDRESS1 135 CHAR( 72) NULLIF ADDRESS1 = '*' ;
.Field ADDRESS2 207 CHAR( 72) NULLIF ADDRESS2 = '*' ;
.Field CITY 279 CHAR( 30) NULLIF CITY = '*' ;
.Field STATE 309 CHAR( 2) NULLIF STATE = '*' ;
.Field POSTAL_CODE 311 CHAR( 10) NULLIF POSTAL_CODE = '*' ;
.Field PHONE 321 CHAR( 30) NULLIF PHONE = '*' ;
.Field EMAIL 351 CHAR( 30) NULLIF EMAIL = '*' ;
.Field REC_STATUS 381 CHAR( 1) NULLIF REC_STATUS = '*' ;
.Filler EOL_PAD 382 CHAR( 2) ;
/* End Layout Section */
/* Begin DML Section */
.DML Label tagDML1;
INSERT INTO infatest.TD_TEST
( CUST_KEY , CUST_NAME , CUST_DATE )
VALUES
( :CUST_KEY , :CUST_NAME , :CUST_DATEtd ) ;
.DML Label tagDML2;
INSERT INTO infatest.TD_CUSTOMERS
( CUSTOMER_KEY , CUSTOMER_ID , COMPANY , FIRST_NAME , LAST_NAME , ADDRESS1 , ADDRESS2 , CITY , STATE , POSTAL_CODE , PHONE , EMAIL , REC_STATUS )
VALUES
( :CUSTOMER_KEY , :CUSTOMER_ID , :COMPANY , :FIRST_NAME , :LAST_NAME , :ADDRESS1 , :ADDRESS2 , :CITY , :STATE , :POSTAL_CODE , :PHONE , :EMAIL , :REC_STATUS ) ;
/* End DML Section */
/* Begin Import Section */
.Import Infile c:\LOGS\TgtFiles\td_test.out
  Layout InputFileLayout1
  Format Unformat
  Apply tagDML1 ;
.Import Infile c:\LOGS\TgtFiles\td_customers.out
  Layout InputFileLayout2
  Format Unformat
  Apply tagDML2 ;
/* End Import Section */
.END MLOAD;
.LOGOFF;


Multiple Workflows that “MultiLoad” to the Same Table

Because Teradata MultiLoad puts a lock on the table, all MultiLoad sessions must handle wait events so they do not try to access the table simultaneously. Any log files should also be given unique names for the same reason.

Teradata FastLoad

As the name suggests, this utility is the fastest method to load data into a Teradata warehouse. However, there is one major restriction: the target table must be empty.

FAQ: Why are there three different loaders for Teradata? Which loader should I use?

FastLoad is the fastest loader, but it only works with empty tables with no secondary indexes. Use FastLoad for a high-volume initial load, or for high-volume “truncate and reload” operations. FastLoad can only insert data.


MultiLoad can insert, update, delete, and upsert into Teradata. An upsert is essentially an “update else insert” performed at the database level. Note that this does not require specifying “update else insert” in PowerCenter, or use of an Update Strategy transformation. You specify “Upsert” as the Load Mode in the Connection Properties when defining the MultiLoad External Loader Connection in the PowerCenter Workflow Manager. Use MultiLoad for large volume incremental loads.
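In the generated MultiLoad script, upsert mode shows up in the DML section as a DO INSERT FOR MISSING UPDATE ROWS label. A hand-written sketch reusing the td_test example from earlier (column list abbreviated; verify against your generated script):

```
.DML Label tagUpsert
  DO INSERT FOR MISSING UPDATE ROWS;
UPDATE infatest.TD_TEST
  SET CUST_NAME = :CUST_NAME, CUST_DATE = :CUST_DATEtd
  WHERE CUST_KEY = :CUST_KEY;
INSERT INTO infatest.TD_TEST ( CUST_KEY, CUST_NAME, CUST_DATE )
VALUES ( :CUST_KEY, :CUST_NAME, :CUST_DATEtd );
```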

Both FastLoad and MultiLoad work at the data-block level. In other words, these loaders are much faster than “standard DML” within Teradata. They both acquire table-level locks, which means they are only appropriate for off-line data loading. MultiLoad first writes the data into temporary tables in Teradata, and then it updates the data blocks directly. All changes to a physical data block are made in a single operation.

TPump is designed to refresh the data warehouse on-line or in real time. TPump is an alternative to MultiLoad for relatively low-volume, on-line data loads. It does not incur the overhead of writing to temporary tables, but it does potentially incur the expense of changing the same physical data block multiple times. TPump is not as fast as MultiLoad for large-volume loads, but it acquires row-hash locks on the table rather than a table-level lock. TPump also provides a mechanism to limit resource consumption by controlling the rate at which statements are sent to the RDBMS. Other users and applications can access data in the table being loaded while TPump is running.


PowerCenter 7.1.x/8.x Product Availability Matrix for Teradata

Product                           OS             Teradata Versions                          Src  Tgt  Rep  Status
PowerCenter 8.1.1 SP1 (Ltd PAM)   Windows, UNIX  v2R6.1, v2R6, v2R5, v2R5.1                 x    x         Supported
PowerCenter 8.0.0                 Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1   x    x         Supported
PowerCenter 7.1.5                 Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1   x    x    x    Supported
PowerCenter 7.1.4                 Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1   x    x    x    Supported
PowerCenter 7.1.3                 Windows, UNIX  v2R6.1, v2R6, v2R5.1, v2R5, v2R4, v2R4.1   x    x    x    Supported
PowerCenter 7.1.2                 Windows, UNIX  v2R6, v2R5.1, v2R5, v2R4, v2R4.1           x    x    x    Supported
PowerCenter 7.1.1                 Windows, UNIX  v2R5.1, v2R5, v2R4, v2R4.1                 x    x    x    Supported

Teradata FastExport

PowerCenter 7.1.3 and later versions support Teradata FastExport; prior versions do not. FastExport is a utility that uses multiple Teradata sessions to quickly export large amounts of data from a Teradata database. You can create a PowerCenter session that uses FastExport to read Teradata sources. To use FastExport with PowerCenter, you need to register the FastExport plug-in with PowerCenter. The plug-in includes a FastExport Teradata connection and a FastExport Reader that you can select for a session. To register the FastExport plug-in in PowerCenter 8.1.1, see the instructions below.

HOW TO: Register the FastExport plug-in using the Admin Console in PowerCenter 8.1.1:

1) Run the Repository Service in exclusive mode.

2) In the Navigator, select the Repository Service to which you want to add the plug-in.

3) Click the Plug-ins tab.

4) Click the link to register a Repository Service plug-in.

5) On the Register Plugin for <Repository Service> page, click the Browse button to locate the plug-in file.

6) If the plug-in was registered previously and you want to overwrite the registration, select the check box to update the existing plug-in registration. For example, you might select this option when you upgrade a plug-in to the latest version.

7) Enter your repository user name and password and click OK.

8) The Repository Service registers the plug-in with the repository. The results of the registration operation appear in the activity log.

9) Run the Repository Service in normal mode.

HOW TO: Enable the Teradata FastExport option in an existing repository using the command line

To enable the FastExport Application Connection for a repository, register the plug-in file for Teradata FastExport ("pmtdfexp.xml"). This plug-in file is located in the "native" sub-directory of the Repository Server installation.

* Windows: The default directory is C:\Program Files\Informatica PowerCenter 7.1.3\RepositoryServer\bin\native
* UNIX: An example would be /local/repserver/native

Use the "pmrepagent" command located in the Repository Server installation directory.

Syntax:

* Windows:
pmrepagent registerplugin -r reponame -n Administrator -x Administrator -t dbtype -u dbuser -p dbpwd -c connect_string -i .\native\pmtdfexp.xml -N

* UNIX:
pmrepagent registerplugin -r reponame -n Administrator -x Administrator -t dbtype -u dbuser -p dbpwd -c connect_string -i ./native/pmtdfexp.xml -N

Example:

* Windows:
cd C:\Program Files\Informatica PowerCenter 7.1.3\RepositoryServer\bin
pmrepagent registerplugin -r PC71X -n Administrator -x Administrator -t oracle -u PC71X -p PCPASS -c orarepa2.informatica.com -i .\native\pmtdfexp.xml -N

* UNIX:
$ cd /local/repserver
$ pmrepagent registerplugin -r PC71X -n Administrator -x Administrator -t oracle -u PC71X -p PCPASS -c orarepa2.informatica.com -i ./native/pmtdfexp.xml -N


To use FastExport, create a mapping with a Teradata source database. In the session, use the FastExport reader instead of the Relational reader, and use a FastExport connection to the Teradata tables you want to export. FastExport uses a control file that defines what to export. When a session starts, the Integration Service creates the control file from the FastExport connection attributes. If you create a SQL override for the Teradata tables, the Integration Service uses that SQL to generate the control file. You can override the control file for a session by defining a control file in the session properties. The Integration Service writes FastExport messages to the session log and information about FastExport performance to the FastExport log. PowerCenter saves the FastExport log in the folder defined by the Temporary File Name session attribute. The default extension for the FastExport log is .log.

HOW TO: Use FastExport in a session:

1) Create a FastExport connection in the Workflow Manager and configure the connection attributes.

2) Open the session and change the Reader property from Relational Reader to Teradata FastExport Reader.

3) Change the connection type and select a FastExport connection for the session.

4) Optionally, create a FastExport control file in a text editor and save it in the repository.
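For the optional hand-written control file, a FastExport script has roughly this shape. This is a sketch reusing the td_test example names from earlier in this document, not a PowerCenter-generated file; check the Teradata FastExport manual for exact syntax:

```
.LOGTABLE infatest.fexplog_td_test;
.LOGON demo1099/infatest,infatest;
.BEGIN EXPORT SESSIONS 4;
.EXPORT OUTFILE c:\LOGS\SrcFiles\td_test.dat;
SELECT CUST_KEY, CUST_NAME, CUST_DATE
FROM infatest.TD_TEST;
.END EXPORT;
.LOGOFF;
```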

HOW TO: Create a FastExport connection:

1) Click Connections > Application in the Workflow Manager. The Connection Browser dialog box appears.

2) Click New.

3) Select a Teradata FastExport connection and click OK.


4) Enter a name for the FastExport connection.

At run time, PowerCenter starts FastExport and streams the data (in FastExport format) to a named pipe or file; PowerCenter then reads from that named pipe or file. The FastExport file format is used because it is more efficient than converting everything to ASCII characters (within a FastExport-formatted file, numbers are stored as binary data).

Important FastExport session attributes:

* Is Staged – If selected, FastExport writes data to a stage file.
* Fractional Seconds – Precision for fractional seconds following the decimal point in a timestamp. The range is 0 to 6; be very careful, as this has to match the table definition.
* Control File Override – Control file override attribute. There is a known issue with the control file override: currently, overrides in the control file are not persistent.

If FastExport support is not available in your PowerCenter version (prior to 7.1.3), the following options are available:

* Version 3.02 of Teradata's ODBC driver supports array reads (SQLExtendedFetch).
* Write a FastExport script and invoke it as a pre-session command task. The FastExport task could also write to a pipe.


Additional Information:

* FastExport is an extract utility (a counterpart to the MultiLoad, TPump, and FastLoad loaders).
* FastExport can be used in streaming mode, which avoids the need to stage the file.
* FastExport is only available for sources, not for lookups (to do a lookup-type operation, use a Joiner; refer to article 10238).

HOW TO: Use encryption with FastExport

To encrypt data with FastExport, enable the DataEncryption attribute in the Teradata FastExport connection. The DataEncryption attribute is disabled by default.

Teradata FastExport is a feature available in PowerCenter 7.1.3 and later releases. A repository plug-in comes with the 7.1.3 install, and when you create a new repository using the PowerCenter 7.1.3 Repository Server, it automatically registers the plug-in. However, a repository created with a previous release of PowerCenter will not have this plug-in registered.

WHAT command is used behind the scenes while running FastExport?

When running a session with a Teradata source that extracts data using FastExport, PowerCenter runs the fexp command as a child process and opens a named pipe to retrieve data from the Teradata table. Either of the following FastExport commands can be used with the .ctl file generated by PowerCenter:

fexp -c ".RUN FILE <control file name>;"
fexp -r ".RUN FILE <control file name>;"

Some known limitations with PowerCenter 7.1.3:

1. Teradata FastExport (CR 88001) – Teradata FastExport does not support SQL override at the session level. If you paste SQL into the Teradata FastExport properties on the Mapping tab of the session properties, it does not override. Either SQL override should be allowed at the session level, or the property should be disabled.

2. Teradata FastExport (CR 88240) – Only ANSI time is supported for FastExport at this point. Otherwise you get errors such as:

READER_1_1_1> [PMTDFEXP_EN_305416] [ERROR] Received unexpected data.
READER_1_1_1> SDKS_38200 Partition-level [SQ_CERT_ALL_DATATYPES_SRC]: Plug-in #305400 failed in run().

3. Teradata FastExport (CR 88367) – Teradata FastExport gives a different DataEncryption message on TTU v7. TTU v7 does not have a DATAENCRYPTION option, so customers should use TTU v8.


Teradata Parallel Transporter (TPT)

(Figure provided courtesy of NCR Teradata©)

Teradata Parallel Transporter (TPT) is a single utility intended to replace Teradata FastLoad, MultiLoad, TPump, and FastExport. It supports a single scripting environment with different “modes,” each roughly equating to one of the legacy utilities. It also supports parallel loading (i.e., multiple instances of a TPT client can run and load the same table at the same time – something the legacy loaders cannot do). Teradata Parallel Transporter uses massively parallel processing to read large amounts of data from Teradata and write large amounts of data to Teradata. The Teradata Parallel Transporter PowerExchange connector provides integration between PowerCenter and Teradata for data extraction and loading. Using Teradata Parallel Transporter in Informatica PowerCenter, sessions can read Teradata sources and load Teradata targets. You can create a mapping with a Teradata source or target (or use an existing mapping created with a Teradata ODBC connection), then use a Teradata Parallel Transporter connection to connect to the Teradata tables to be loaded or exported in a session. PowerCenter Connect for Teradata PT then extracts or loads data using one of the following methods:

* Export: Extracts data from Teradata.
* Load: Used for initial bulk table loading into the Teradata database.
* Update: Used to update, insert, upsert, and delete data in the Teradata database.
* Stream: Used to update, insert, upsert, and delete data (continuous data load) in the Teradata database.


Traditionally, to do the above you would have to use FastLoad, MultiLoad, TPump, FastExport, ODBC, or a combination of two or more loading mechanisms. A key thing to note is that no control files are generated under the covers – so there is no need to overwrite control files or store passwords in a file. Also, the metadata lineage is completely preserved within PowerCenter. Performance of TPT is reportedly about 20% faster than the traditional loading or extraction mechanisms.

RELEASE INFORMATION

Supported since Informatica PowerCenter version 8.1.1.0.2, released July 2007. Supports Teradata TPT API 8.2.

Prerequisites (on the machine where the PowerCenter Integration Service is running):

* Teradata Parallel Transporter API 8.2
* Teradata CLIv2 4.8.2
* Shared ICU libraries for Teradata 01.01.02.xx
* Teradata GSS Client nt-i386 06.02.00.00

According to Teradata, a separate license is required for the TPT API.


Connection attributes for Teradata Parallel Transporter (TPT)

Connection Attribute   Required/Optional   Description
TDPID                  Required            Name/IP address of the Teradata server host
Database Name          Optional            Teradata database name
Tenacity               Optional            Number of hours the driver attempts to log on
Max Session            Optional            Maximum number of sessions to log on
Block Size             Optional            Block size in bytes used when returning data to the client
Sleep                  Optional            Number of minutes the driver pauses before attempting to log on
Data Encryption        Optional            Activates full security encryption of SQL requests, responses, and data
System Operator        Required            Data loading operator
Log Database           Optional            Name of the log database
Log Table Name         Required            Name of the restart log table for restart information
Error Database         Optional            Name of the error database
Error Table Name 1     Optional            Name of the first error table
Error Table Name 2     Optional            Name of the second error table
Drop Error Tables      N/A                 Reserved for future use

Known Issues in 8.1.1 SP4

CR 130060: Session processes a different number of rows than configured for test load

When you enable test load in the session properties, the total number of rows processed by the session might differ from the number of rows you configure for the test load in the session properties.

CR 130061: Teradata PT API 8.2 UPDATE and STREAM system operators not supported on UNIX platforms

TPT API 8.2 supports only LOAD and EXPORT system operators in ASCII and UNICODE mode on UNIX platforms. The UPDATE system operator fails on multi-AMP instances and the STREAM system operator cannot load UTF-8 data.

CR 130062: Teradata PT API 8.2 might not return correct row statistics in the session load summary

Teradata PT API 8.2 might not include the correct number of affected and rejected rows for the update strategy, including insert, update, and delete operations. Workaround: If a session fails, use the session log and error and log tables for more information about errors that occurred during the session.

CR 130064: Cannot insert data using LOAD system operator for multiple pass-through partitions

The LOAD system operator requires the target table to be empty for loading data. If any partition establishes a connection with Teradata and starts inserting the data, Teradata locks the target table. As a result, other partitions cannot establish connections with the target table.

CR 130065: Cannot insert, update, or delete data using the UPDATE system operator for multiple pass-through partitions


The UPDATE system operator cannot establish a connection with a Teradata target table for a partition when the target table is already being loaded by another partition. Using the UPDATE operator with multiple pass-through partitions might cause inconsistent results.

CR 170071: Decimal data type behaves incorrectly for high precision data when you enable the Enable High Precision session property

For the data with precision greater than 10, the last digit of a decimal number is not the same as the last digit of the source data when you enable the Enable High Precision session property. Workaround: Disable the Enable High Precision session property.

ETL vs. EL-T Design Paradigm (Pushdown Optimization)

Informatica PowerCenter embeds a powerful engine with a built-in memory management system and smart algorithms to perform various transformation operations such as aggregation, sorting, joining, and lookup. This is typically referred to as an ETL architecture, where EXTRACT, TRANSFORM, and LOAD are performed in that order: data is extracted from the data source to the PowerCenter engine (which can be on the same machine as the source or a separate machine), where all the transformations are applied, and the results are then pushed to the target. In this scenario, the performance considerations are that, because data is transferred, the network has to be fast and tuned effectively, and the hardware on which PowerCenter runs should be a powerful machine with high processing power and high memory.

EL-T is a newer design (or runtime) paradigm that is becoming popular with the advent of higher-performing RDBMS systems, be they DSS or OLTP. Teradata in particular runs on a well-tuned operating system and well-tuned hardware, so the EL-T paradigm tries to maximize the benefits of this system by pushing as much transformation logic as possible onto the Teradata box. The EL-T design paradigm can be achieved through the Pushdown Optimization option provided in Informatica PowerCenter version 8.1.

Maximizing Performance Using Pushdown Optimization

You can push transformation logic to the source or target database using pushdown optimization. The amount of work you can push to the database depends on the pushdown optimization configuration, the transformation logic, and the mapping and session configuration. When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and writes one or more SQL statements based on the mapping transformation logic. The Integration Service analyzes the transformation logic, mapping, and session configuration to determine the transformation logic it can push to the database. At run time, the Integration Service executes any SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database. Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view messages related to pushdown optimization.

[Figure showing the mapping: Teradata_Source (Teradata) → SQ_TD_SRC → FILTRANS → Teradata_Target (Teradata)]


The mapping contains a Filter transformation that filters out all items except those with an ID greater than 1005. The Integration Service can push the transformation logic to the database, and it generates the following SQL statement to process it:

INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER)
FROM ITEMS
WHERE (ITEMS.ITEM_ID > 1005)

The Integration Service generates an INSERT SELECT statement to obtain and insert the ID, NAME, and DESCRIPTION columns from the source table, and it filters the data using a WHERE clause. The Integration Service does not extract any data from the database during this process.

Running Pushdown Optimization Sessions
When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and transformations to determine the transformation logic it can push to the database. If the mapping contains a mapplet, the Integration Service expands the mapplet and treats the transformations in the mapplet as part of the parent mapping. You can configure pushdown optimization in the following ways:

Source-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the source database.

Target-side pushdown optimization: the Integration Service pushes as much transformation logic as possible to the target database.

Full pushdown optimization: the Integration Service pushes as much transformation logic as possible to both the source and target databases. If you configure a session for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs partial pushdown optimization instead.

Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source toward the target, or until it reaches a downstream transformation it cannot push to the database. It generates a SELECT statement based on the transformation logic for each transformation it can push. When you run the session, the Integration Service pushes all valid transformation logic to the database by executing the generated SQL statement, then reads the results of that statement and continues to run the session. If the session contains an SQL override, the Integration Service generates a view based on the override, then generates a SELECT statement and runs it against this view. When the session completes, the Integration Service drops the view from the database.
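The view lifecycle described above can be sketched as the following Teradata SQL. This is a hypothetical illustration: the view name and the override query are made up here, since the Integration Service generates its own names (with a PM_V prefix, as described later in this document).

```sql
-- 1. Create a view based on the Source Qualifier's SQL override
--    (view name and query body are illustrative only).
CREATE VIEW PM_V_EXAMPLE AS
SELECT ITEM_ID, ITEM_NAME, PRICE
FROM ITEMS
WHERE STATUS = 'A';

-- 2. Run the generated pushdown SELECT against the view.
SELECT ITEM_ID, ITEM_NAME, PRICE
FROM PM_V_EXAMPLE
WHERE ITEM_ID > 1005;

-- 3. Drop the view when the session completes.
DROP VIEW PM_V_EXAMPLE;
```

If the session fails before the final step, the view can be left behind; see "Running a Query" below for how to find orphaned views.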

Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target toward the source, or until it reaches an upstream transformation it cannot push to the database. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push, starting with the first transformation in the pipeline it can push to the database. The Integration Service processes the transformation logic up to the point at which it can push the remaining logic to the target database, then executes the generated SQL.

Running Full Pushdown Optimization Sessions


To use full pushdown optimization, the source and target must be on the same database. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target. It generates SQL statements to run against the source and target database based on the transformation logic it can push. If the session contains an SQL override, the Integration Service generates a view and runs a SELECT statement against this view.

When you run a session with full pushdown optimization and the session contains a large quantity of data, the database must run a long transaction. Consider the following database performance issues when you generate a long transaction:

A long transaction uses more database resources.

A long transaction locks the database for longer periods of time, and thereby reduces the database concurrency and increases the likelihood of deadlock.

A long transaction can increase the likelihood that an unexpected event may occur.

Integration Service Behavior with Full Optimization
When you configure a session for full optimization, the Integration Service might determine that it can push all of the transformation logic to the database. In that case, it generates a single INSERT ... SELECT statement that is run on the database and that incorporates the transformation logic from all the transformations in the mapping.

Alternatively, the Integration Service might determine that it can push only part of the transformation logic to the database. In that case, it pushes as much transformation logic as possible to the source and target databases and processes the remaining transformation logic itself. For example, suppose a mapping contains a Source Qualifier transformation, an Aggregator transformation, a Rank transformation, and an Expression transformation, in that order.

The Rank transformation cannot be pushed to the database. If you configure the session for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, pushes the Expression transformation and target to the target database, and processes the Rank transformation itself. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.

Known Issues with Teradata and PowerCenter 8.1.1


You may encounter the following problems using ODBC drivers with a Teradata database:

1. Teradata sessions fail if the session requires a conversion to a numeric datatype and the precision is greater than 18.
2. Teradata sessions fail when you use full pushdown optimization for a session containing a Sorter transformation.
3. A sort on a distinct key may give inconsistent results if the sort is not case sensitive and one port is a character port.
4. A session containing an Aggregator transformation may produce results that differ from PowerCenter's if the group by port is a string datatype and it is not case sensitive.
5. A session containing a Lookup transformation fails if it is configured for target-side pushdown optimization.
6. A session that requires type casting fails if the casting is from x to date/time.
7. A session that contains a date to string conversion fails.

[Figure: sample mapping with two partitions]

The first key range is 1313 - 3340, and the second key range is 3340 - 9354. The SQL statement merges all the data into the first partition:

INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
FROM ITEMS
WHERE (ITEMS.ITEM_ID >= 1313) AND (ITEMS.ITEM_ID < 9354)
ORDER BY ITEMS.ITEM_ID

The SQL statement selects items 1313 through 9354, which includes all values in the key range, and merges the data from both partitions into the first partition. The SQL statement for the second partition passes empty data:


INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
ORDER BY ITEMS.ITEM_ID

Working with SQL Overrides
You can configure the Integration Service to perform an SQL override with pushdown optimization. To perform an SQL override, you configure the session to create a view. When you use an SQL override for a Source Qualifier transformation in a session configured for source-side or full pushdown optimization with a view, the Integration Service creates a view in the source database based on the override. After it creates the view, the Integration Service generates an SQL query that it can push to the database and runs that query against the view to perform pushdown optimization. Note: to use an SQL override with pushdown optimization, you must configure the session for pushdown optimization with a view.

Running a Query
If the Integration Service did not successfully drop the view, you can run a query against the source database to search for the views it generated. When the Integration Service creates a view, it uses a prefix of PM_V, so you can search for views with this prefix to locate the views created during pushdown optimization. Teradata-specific SQL:

SELECT TableName FROM DBC.Tables WHERE CreatorName = USER AND TableKind ='V' AND TableName LIKE 'PM\_V%' ESCAPE '\'
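Any orphaned views that the query returns can then be dropped individually. A minimal sketch, where PM_V_EXAMPLE is a placeholder for whatever name the query actually returns:

```sql
-- Drop a leftover pushdown view found by the DBC.Tables query above;
-- substitute the actual view name returned by that query.
DROP VIEW PM_V_EXAMPLE;
```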

Rules and Guidelines for SQL Override
Use the following rules and guidelines when you configure pushdown optimization for a session containing an SQL override:

1. Do not use an ORDER BY clause in the SQL override.
2. Use ANSI outer join syntax in the SQL override.
3. Do not use a Sequence Generator transformation.
4. If a Source Qualifier transformation is configured for a distinct sort and contains an SQL override, the Integration Service ignores the distinct sort configuration.
5. If the Source Qualifier contains multiple partitions, specify the SQL override for all partitions.
6. If a Source Qualifier transformation contains Informatica outer join syntax in the SQL override, the Integration Service processes the Source Qualifier transformation logic itself.
7. PowerCenter does not validate the override SQL syntax, so test the SQL override query before you push it to the database.
8. When you create an SQL override, ensure that the SQL syntax is compatible with the source database.

Configuring Sessions for Pushdown Optimization
You configure a session for pushdown optimization in the session properties. However, you may need to edit the transformation, mapping, or session configuration to push more transformation logic to the database. Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. To configure a session for pushdown optimization:


1. In the Workflow Manager, open the session properties for the session containing the transformation logic you want to push to the database.
2. From the Properties tab, select one of the following Pushdown Optimization options: None, To Source, To Source with View, To Target, Full, or Full with View.
3. Click the Mapping tab in the session properties.
4. Click View Pushdown Optimization. The Pushdown Optimization Viewer displays the pushdown groups and the SQL that is generated to perform the transformation logic, along with messages related to each pushdown group and numbered flags indicating the transformations in each group.
5. Review the information in the Pushdown Optimization Viewer to determine whether you need to edit the mapping, transformation, or session configuration to push more transformation logic to the database.


Design Techniques
Because the Pushdown Optimization option supports most transformations on the source side, when running a data transfer job, use a simple pass-through mapping to stage the entire source data set in a staging area on the target Teradata database, and then use the Full Pushdown Optimization option to run the actual mapping that contains the transformations. This design gains all of the benefits of the pushdown (EL-T) approach using full pushdown.

Effectively designing mappings for PushDown Optimization
Below is an example of a mapping that needs to be redesigned in order to use pushdown optimization.

[Figure: mapping with Source_Table (Teradata) → Source_Qualifier feeding two parallel lookups (Lookup_1 and Lookup_3) and Filter_1 into Target_Table (Teradata)]

In the above mapping there are two lookups and one filter. Because the staging area is the same database as the target area, we can use pushdown optimization to achieve high performance. However, parallel lookups are not supported within PowerCenter 8.1.1, so the mapping needs to be redesigned. See below for the redesigned mapping.


[Figure: redesigned mapping with Source_Table (Teradata) → Source_Qualifier → Lookup_1 → Lookup_2 → Filter_1 → Target_Table (Teradata)]

In order to use pushdown optimization, the lookups have been serialized, which turns each lookup into a subquery in the generated SQL. The figure below shows the complete SQL and the pushdown configuration using the Full Pushdown option.


Sample SQL generated is shown below.

Group 1

INSERT INTO Target_Table (ID, ID2, SOME_CAST)
SELECT Source_Table.ID, Source_Table.SOME_CONDITION, CAST(Source_Table.SOME_CAST), Lookup_1.ID, Source_Table.ID
FROM ((Source_Table
LEFT OUTER JOIN Lookup_1 ON (Lookup_1.ID = Source_Table.ID)
AND (Source_Table.ID2 = (SELECT Lookup_2.ID2 FROM Lookup_2 Lookup_1 WHERE (Lookup_1.ID = Source_Table.ID2))))
LEFT OUTER JOIN Lookup_1 Lookup_2 ON (Lookup_1.ID = Source_Table.ID)
AND (Source_Table.ID = (SELECT Lookup_2.ID2 FROM Lookup_2 WHERE (Lookup_2.ID2 = Source_Table.ID2))))
WHERE (NOT (Lookup_1.ID1 IS NULL) AND NOT (Lookup_2.ID2 IS NULL))

As you can see from the above example, very complicated SQL can be generated using pushdown optimization. When configuring sessions, make sure the right joins are being generated.

FAQs

Uncached Lookup Date/Time limitation
From the 6.1 release notes: when you run a session with a mapping that uses an uncached lookup on a Teradata database, the Informatica Server fails the session if any transformation port in the lookup condition uses a Date/Time datatype. The Informatica Server writes the following Teradata error message to the session log:

[NCR][ODBC Teradata Driver][Teradata RDBMS] Invalid operation on an ANSI Datetime or Interval value.

Workaround: configure the Lookup transformation to use a lookup cache, or remove the Date/Time port from the lookup condition.

There is now a better workaround, from the v7.1.2 release notes: apply the Teradata ODBC patch 3.2.011 or later and remove NoScan=Yes from the odbc.ini file.

"Streaming"/Non-Staged Mode
If you select "streaming" (a.k.a. non-staged) mode for a loader, also set the "checkpoint" property to 0. This effectively turns off checkpoint processing, which is used for recovery/restart of FastLoad and MultiLoad sessions. However, if the input is not a physical file but a named pipe, the recovery/restart mechanism of the loaders does not work. Not only does checkpoint processing impact performance (it is not free, and we want to eliminate as much unnecessary overhead as possible), but a non-zero checkpoint value will sometimes cause seemingly random errors and session failures when used with named pipe input (as is the case with "streaming" mode).

Creating a session which performs both inserts and updates
MkIII Update (6/05): PowerCenter now supports data driven sessions that target TPump or MultiLoad. This feature is called "Teradata mixed mode processing". Essentially, because MultiLoad and TPump support inserts, updates and deletes, the row indicator is also written to the output file/stream, and the generated control file is enhanced to obey the row indicator. When using this option, make sure to set both the session's and the loader's "mode" property to "data driven".

Important usage note: "mixed mode" only works when there is a single target definition instance for the target table. That is, if one has a target definition instance of the target table to which the "insert" rows are mapped and another target definition instance to which "update" rows are mapped, this will not work as expected. The problem is that PowerCenter will start an instance of the loader for each target definition instance, and two MultiLoads cannot write to the same table at the same time. Multiple TPumps may be able to write to the same table, although you could get locking conflicts between the inserts, updates and deletes. If one needs to update different columns than when inserting, override the target's update SQL (see "Using Update SQL Override on Target Definitions" below). If this does not work, then perhaps the "legacy" solution described below will.

Legacy Notes: Suppose you need to populate a slowly changing dimension. By typical design, the mapping will contain at least two instances of the target definition: an "insert" target definition and an "update" target definition. The type of operation MultiLoad or TPump performs is determined by the "Load Mode" property of the external loader (defined within the Server Manager). In this way, one can create external loaders for each type of Load Mode (e.g. an "insert" MultiLoad, an "update" MultiLoad, etc.), and then assign the corresponding external loader to the correct target definition instance (i.e. assign the "insert" loader to the "insert" target definition instance and the "update" loader to the "update" target definition instance)... except this does not work! This approach works fine for TPump (although there could be locking conflicts between the updates and inserts), but it will not work for MultiLoad (and don't even think about using FastLoad, since it requires the target table to be empty, except as described below). Only a single instance of MultiLoad can run against a given table at any one time. Following the method described above, PowerCenter would start two instances of MultiLoad (one for the inserts and one for the updates), and whichever MultiLoad instance starts second will fail. The simple workaround is as follows:

1) Configure PowerCenter to use the integrated MultiLoad support for the larger of the two output files. That is, if the majority of rows will be inserts, configure PowerCenter to use the “insert” MultiLoad external loader for the “insert” file.

2) Load the remaining file using MultiLoad called via a post-session script (Run a dummy session to generate a MultiLoad control file for this file, then use this control file for the post-session script). The syntax to run MultiLoad from a post session script is simply:

mload < <control file>

For example, if the target definition's output file is named "td_test_update.out":

mload < ./TgtFiles/td_test_update.out.ctl

The benefit of this approach is that the larger file is "streamed" into Teradata, only one of the output files must be completely staged prior to loading, and it is fairly simple to set up. The downside is that MultiLoad is invoked twice, which incurs more overhead on the Teradata system. A more difficult workaround that only calls MultiLoad once is described below in "Using one instance of MultiLoad to load multiple tables".

Using Update SQL Override on Target Definitions
Both MultiLoad and TPump can do updates as well as inserts to the target table (again, FastLoad does one thing: insert into empty tables). In addition, they also support the concept of an "upsert" (update if exists, else insert). Where does this "update" statement come from? By default, the update statement generated in a MultiLoad or TPump script is just like the update statement that would be used by a native connection (update all the target's mapped ports using the key ports in the "where" clause). However, one can override the target definition's update SQL to change this default behavior. When one overrides a target's update SQL, the SQL from the "Override SQL" property of the mapping's target definition is moved to the MultiLoad or TPump script verbatim. This means that one must get rid of each column's ":TU." prefix, because the utilities do not understand this PowerCenter-specific nomenclature.

Why would you ever do this? Suppose you need to do an "incremental aggregation" against an existing table that already contains many millions of rows. That is, suppose every day a set of aggregations must be computed and then applied as an "upsert" to an existing table (i.e. if the aggregate already exists, update the existing row to reflect the latest data, otherwise insert a new row). You could use PowerCenter's patented "incremental aggregation" capability. However, if the table already existed before PowerCenter came into the picture, there is the problem of building the initial "incremental aggregation" cache (the initial transactional source data is probably long gone). There are also session restart issues caused by PowerCenter's "incremental aggregation" (and Teradata folks seem to be keen on making sure there is always a mechanism to assure clean restarts).

How do you do this? Assume you're computing a single aggregate to be incrementally applied to a single target table. Compute the aggregate as usual and map all aggregate ports to the target definition (no need for any update strategies). Now, override the target update SQL to add the computed values (those coming from the mapping) to the existing values (those coming from the table) and modify the SQL to make it syntactically correct for MultiLoad or TPump. Here is an example of a target definition SQL override that, when combined with the "Upsert" Load Mode, will perform an "incremental aggregation". Here is the original update SQL (updating every non-primary-key field in the table):

UPDATE TERA_DIST_INVENTORY
SET QTY = :TU.QTY, LAST_TRANS = :TU.LAST_TRANS
WHERE PRODUCT = :TU.PRODUCT

Here is the modified MultiLoad SQL (it adds the computed QTY to the existing QTY and updates the last transaction date column; the "td" at the end of the date field name is just something the MultiLoad control file generation routine does):

UPDATE TERA_DIST_INVENTORY
SET QTY = :QTY + QTY, LAST_TRANS = :LAST_TRANStd
WHERE PRODUCT = :PRODUCT

Here is the modified TPump SQL (the same as above, but as of v5.1.1, TPump and MultiLoad scripts use different naming conventions for date fields; TPump uses no "td" suffix on dates):

UPDATE TERA_DIST_INVENTORY
SET QTY = :QTY + QTY, LAST_TRANS = :LAST_TRANS
WHERE PRODUCT = :PRODUCT
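In the generated MultiLoad script, the "Upsert" Load Mode pairs the update with a fallback insert under a DO INSERT FOR MISSING UPDATE ROWS clause. The following is a sketch of what the DML section might look like with the override above applied; the label name and column list are illustrative, so consult the actual generated control file for the real layout:

```sql
.DML LABEL tagUpsert
DO INSERT FOR MISSING UPDATE ROWS;
/* The overridden update, applied when the row already exists. */
UPDATE TERA_DIST_INVENTORY
SET QTY = :QTY + QTY, LAST_TRANS = :LAST_TRANStd
WHERE PRODUCT = :PRODUCT;
/* The fallback insert, applied when the update matches no row. */
INSERT INTO TERA_DIST_INVENTORY (PRODUCT, QTY, LAST_TRANS)
VALUES (:PRODUCT, :QTY, :LAST_TRANStd);
```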

Date Formats
Key point: if the target table contains dates, expect to run into problems. Here's why. When one creates a date column in a Teradata table, one can specify a display format for the date. Not only does this determine the format in which dates will be displayed by the Teradata client tools, it also, unexpectedly, determines the format in which dates can be loaded into the column. For example, suppose a date column has been created with a format of "yyyy/mm/dd"; if one attempts to load a date string formatted as "mm/dd/yyyy" into the column, the load will fail! This is further complicated by the fact that PowerCenter only supports a small subset of the date formats supported by Teradata (basically, PowerCenter supports "yyyy/mm/dd" and "mm/dd/yyyy"). Also, it is unwise to assume consistency in date column formats between tables; experience has shown that just because a particular date format has been specified for tableA, there is no guarantee that the same format will be specified for tableB.

NOTE: The QueryMan client tool does not respect the "format" option on dates. That is, it displays dates in a consistent format regardless of the declared "format" option. To view the format defined for a date field, you must run the command "show table <table name>" (or run a "select" from the table using BTEQ).

An especially dangerous Teradata date format which one might run into is "yyyyddd". This corresponds to the 4-digit year followed by the day of the year (e.g. 1-365). If you run into a column defined like this, you must use the following workaround:

1) Edit the target table definition within PowerCenter to change the date column's datatype from "date" to "char(7)".
2) Create an Expression transformation that converts the date port into a string of the format "yyyyddd", i.e. to_char(date_port,'yyyy') || to_char(date_port,'ddd') (note: to_char(date_port, 'yyyyddd') does not work).
3) Map the output of this Expression transformation into the port of the target definition that was changed in step 1.
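To make the situation concrete, here is a sketch of how such a column would be declared; the table and column names are hypothetical, but the FORMAT phrase is standard Teradata DDL, and "show table" is the only reliable way to discover it:

```sql
-- Hypothetical table with the dangerous date format: only a string
-- shaped like 'yyyyddd' (e.g. '2008031' for Jan 31, 2008) will load.
CREATE TABLE demo_db.DATE_DEMO (
    PRODUCT  VARCHAR(20) NOT NULL,
    SHIP_DAY DATE FORMAT 'YYYYDDD'
)
PRIMARY INDEX (PRODUCT);
```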


Of course, another alternative would be to get the column redefined with a date format that PowerCenter supports. It is important to note that the date format does not change the way a date is internally stored by Teradata; this might help the argument for making the change.

Creating the work, log and error tables in a different database
MkIII Update (6/05): PowerCenter now supports more flexibility in the creation of work, log and error tables. The loaders now support properties to specify the database in which to put these tables.

Legacy Notes: By default, the MultiLoad and TPump scripts generated by PowerCenter will place the work, log and error tables in the same database as the target table (for a more detailed discussion of the error tables, see section 6, "Troubleshooting"). Sometimes it is a site's standard to put the work, log or error tables in a database other than the target. Here's the workaround (note: at this point it is probably a good idea to reference the Teradata MultiLoad or TPump documentation, and to enlist somebody from the prospect who is familiar with the standards to which you are trying to conform):

To change the location of the work table, add a WORKTABLES clause to the "BEGIN MLOAD" statement:

.BEGIN IMPORT MLOAD
TABLES infatest.TD_TEST
WORKTABLES dumpdb.WT_TD_TEST
ERRLIMIT 1
CHECKPOINT 10000
TENACITY 10000
SESSIONS 1
SLEEP 6 ;

To change the location of the log table, find the following line and change the database name ("infatest" is the database name):

.LOGTABLE infatest.mldlog_TD_TEST;

To change the location of the error tables, add an ERRORTABLES clause to the "BEGIN MLOAD" statement:

.BEGIN IMPORT MLOAD
TABLES infatest.TD_TEST
WORKTABLES dumpdb.WT_TD_TEST
ERRORTABLES dumpdb.ET_TD_TEST dumpdb.UV_TD_TEST
ERRLIMIT 1
CHECKPOINT 10000
TENACITY 10000
SESSIONS 1
SLEEP 6 ;

If you change the name or location of the log or error tables, you will also need to change the statements that drop these tables at the beginning of the script:

DROP TABLE dumpdb.UV_TD_TEST ;
DROP TABLE dumpdb.WT_TD_TEST ;
DROP TABLE dumpdb.ET_TD_TEST ;

After you make these changes to the generated control file, run the "chmod" command (change file mode) to make the control file read-only, so that PowerCenter will not overwrite your changes the next time it runs the session:


chmod -w td_test.out.ctl

The obvious downside to this is maintenance: when/if the target table changes, you will need to update the control file to reflect the changes.

Using one instance of MultiLoad to load multiple tables
MultiLoad is a big consumer of resources on a Teradata system, and some systems will have hard limits on the number of concurrent MultiLoad sessions allowed. By default, PowerCenter will start an instance of MultiLoad for every target file. Sometimes this is illegal (if the multiple instances target the same table); other times it is just expensive. Therefore, a prospect may ask that PowerCenter use a single instance of MultiLoad to load multiple tables (or to load both inserts and updates into the same target table). To make this happen, we're back to heavy editing of the generated MultiLoad script file. Note: this should not be an issue with TPump, because TPump is not as resource intensive as MultiLoad (and multiple concurrent instances of TPump can target the same table). Here's the workaround:

1) Use a dummy session (i.e. set test rows to 1 and target a test database) to generate MultiLoad control files for each of the targets.
2) Merge the multiple control files (one per target table) into a single control file (one for all target tables).
3) Configure the session to call MultiLoad from a post-session script using the control file created in step 2. Integrated support cannot be used, because each input file is processed sequentially and this causes problems when combined with PowerCenter's integrated named pipes and streaming.

Details on "merging" the control files:

1) There is a single log file for each instance of MultiLoad, so you do not have to change or add anything in the "LOGFILE" statement. However, you might want to change the name of the log table, since it may be a log that spans multiple tables.
2) Copy the work and error table delete statements into the common control file.
3) Modify the "BEGIN MLOAD" statement to specify all the tables that the MultiLoad will be hitting.
4) Copy the "Layout" sections into the common control file and give each a unique name. Organize the file such that all the layout sections are grouped together.
5) Copy the "DML" sections into the common control file and give each a unique name. Organize the file such that all the DML sections are grouped together.
6) Copy the "Import" statements into the common control file and modify them to reference the unique names created for the LAYOUT and DML sections in steps 4) and 5). Organize the file such that all the import sections are grouped together.
7) Run "chmod -w" on the newly minted control file so PowerCenter doesn't overwrite it, or, better yet, name it something different so PowerCenter cannot overwrite it.
8) It's just that easy!

Also remember: a single instance of MultiLoad can target at most 5 tables, so don't combine more than 5 target files into a common control file.

Here's an example of a control file merged from two default control files:

.DATEFORM ANSIDATE;
.LOGON demo1099/infatest,infatest;
.LOGTABLE infatest.mldlog_TD_TEST;
DROP TABLE infatest.UV_TD_TEST ;
DROP TABLE infatest.WT_TD_TEST ;
DROP TABLE infatest.ET_TD_TEST ;
DROP TABLE infatest.UV_TD_CUSTOMERS ;
DROP TABLE infatest.WT_TD_CUSTOMERS ;
DROP TABLE infatest.ET_TD_CUSTOMERS ;


.ROUTE MESSAGES WITH ECHO TO FILE c:\LOGS\TgtFiles\td_test.out.ldrlog ;
.BEGIN IMPORT MLOAD
TABLES infatest.TD_TEST, infatest.TD_CUSTOMERS
ERRLIMIT 1
CHECKPOINT 10000
TENACITY 10000
SESSIONS 1
SLEEP 6 ;
/* Begin Layout Section */
.Layout InputFileLayout1;
.Field CUST_KEY 1 CHAR( 12) NULLIF CUST_KEY = '*' ;
.Field CUST_NAME 13 CHAR( 20) NULLIF CUST_NAME = '*' ;
.Field CUST_DATE 33 CHAR( 10) NULLIF CUST_DATE = '*' ;
.Field CUST_DATEmm 33 CHAR( 2) ;
.Field CUST_DATEdd 36 CHAR( 2) ;
.Field CUST_DATEyyyy 39 CHAR( 4) ;
.Field CUST_DATEtd CUST_DATEyyyy||'/'||CUST_DATEmm||'/'||CUST_DATEdd NULLIF CUST_DATE = '*' ;
.Filler EOL_PAD 43 CHAR( 2) ;
.Layout InputFileLayout2;
.Field CUSTOMER_KEY 1 CHAR( 12) ;
.Field CUSTOMER_ID 13 CHAR( 12) ;
.Field COMPANY 25 CHAR( 50) NULLIF COMPANY = '*' ;
.Field FIRST_NAME 75 CHAR( 30) NULLIF FIRST_NAME = '*' ;
.Field LAST_NAME 105 CHAR( 30) NULLIF LAST_NAME = '*' ;
.Field ADDRESS1 135 CHAR( 72) NULLIF ADDRESS1 = '*' ;
.Field ADDRESS2 207 CHAR( 72) NULLIF ADDRESS2 = '*' ;
.Field CITY 279 CHAR( 30) NULLIF CITY = '*' ;
.Field STATE 309 CHAR( 2) NULLIF STATE = '*' ;
.Field POSTAL_CODE 311 CHAR( 10) NULLIF POSTAL_CODE = '*' ;
.Field PHONE 321 CHAR( 30) NULLIF PHONE = '*' ;
.Field EMAIL 351 CHAR( 30) NULLIF EMAIL = '*' ;
.Field REC_STATUS 381 CHAR( 1) NULLIF REC_STATUS = '*' ;
.Filler EOL_PAD 382 CHAR( 2) ;
/* End Layout Section */
/* begin DML Section */
.DML Label tagDML1;
INSERT INTO infatest.TD_TEST
(
CUST_KEY ,
CUST_NAME ,


CUST_DATE
)
VALUES
(
:CUST_KEY ,
:CUST_NAME ,
:CUST_DATEtd
) ;
.DML Label tagDML2;
INSERT INTO infatest.TD_CUSTOMERS
(
CUSTOMER_KEY ,
CUSTOMER_ID ,
COMPANY ,
FIRST_NAME ,
LAST_NAME ,
ADDRESS1 ,
ADDRESS2 ,
CITY ,
STATE ,
POSTAL_CODE ,
PHONE ,
EMAIL ,
REC_STATUS
)
VALUES
(
:CUSTOMER_KEY ,
:CUSTOMER_ID ,
:COMPANY ,
:FIRST_NAME ,
:LAST_NAME ,
:ADDRESS1 ,
:ADDRESS2 ,
:CITY ,
:STATE ,
:POSTAL_CODE ,
:PHONE ,
:EMAIL ,
:REC_STATUS
) ;
/* end DML Section */
/* Begin Import Section */
.Import Infile c:\LOGS\TgtFiles\td_test.out
Layout InputFileLayout1
Format Unformat
Apply tagDML1 ;
.Import Infile c:\LOGS\TgtFiles\td_customers.out
Layout InputFileLayout2
Format Unformat
Apply tagDML2 ;


/* End Import Section */
.END MLOAD;
.LOGOFF;

Partitioned Loading

As previously mentioned, without "special behavior" one cannot simultaneously run multiple instances of MultiLoad targeting the same table. This is exactly what PowerCenter would do if one were allowed to specify Teradata external loaders for a partitioned session. However, "special behavior" has been added to PowerCenter. Please read on…

MkIII Update (6/05): With PowerCenter v7.x, if one sets a "round robin" partition point on the target definition and sets each target instance to be loaded using the same loader connection instance, then PowerCenter automatically writes all data to the first partition and starts only one instance of FastLoad or MultiLoad. You will know you are getting this behavior if you see the following entry in the session log:

MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions. All data will be routed to the first partition.

If you do not see this message, then chances are the session fails with the following error:

WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.

Legacy Notes: With v6.1 and beyond, there is a special "undocumented" pmserver.cfg/registry variable to handle this (equally applicable to UDB and Sybase IQ as well as Teradata). One must add the following line to pmserver.cfg:

SupportNonPartitionedLoaders=Yes

On Win2K/NT, the following registry entry must be created:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Powermart\Parameters\MiscInfo\SupportNonPartitionedLoaders=Yes

When this flag is set to "Yes", a writer thread does not actually start the external loader process until it receives data. However, you must still configure your session so that only a single writer thread receives data. This is typically accomplished by placing a "key range" partition point on the target definition. The key ranges must then be configured such that one and only one partition receives data (place an all-inclusive key range on one partition and a non-inclusive key range on all others). While not required at this time, it is recommended that "partition 1" be the "non-null" partition; this is to support future behavior changes to this flag.

Prior to v6.1, PowerCenter did not allow one to configure integrated MultiLoad support for partitioned sessions. The workaround is:

1) Use a dummy non-partitioned session (i.e. set test rows to 1 and target a test database) to generate the MultiLoad control file.
2) Check the "Merge targets for partitioned sessions" check box under the "Target Options" for the partitioned session.
3) Configure the session to call MultiLoad from a post-session script using the control file created in step 1.

“Streaming” data into MultiLoad and Tpump on Win2K/NT

MkIII Update (6/05): PowerCenter now supports streaming on Win2K to MultiLoad, FastLoad, Tpump, and TWB automatically with no extra work. Simply select or deselect the "staged" property.


Legacy Notes:

In general, Win2K/NT does not support the Unix facility of "named pipes". Therefore, INFA has not been able to "stream" into external loaders on Win2K/NT. However, Teradata supports a special "named pipes" access module that INFA can leverage to stream data to/from the Teradata utilities (MultiLoad, Tpump, FastLoad, and FastExport). To do this, one uses the AXSMOD (Access Module) option of the various tools.

The following is an example of streaming data into Tpump (MultiLoad and FastLoad would be very similar). One must modify the "IMPORT" statement in the Tpump command script to specify a Win2K/NT named pipe instead of the PowerCenter output file, and the AXSMOD modifier must be specified.

Default:

.Import Infile c:\LOGS\TgtFiles\td_test.out Layout InputFileLayout Format Unformat Apply tagDML ;

Modified to use the Teradata Named Pipe Access Module:

.Import Infile \\.\pipe\mypipe axsmod np_axsmod.dll Layout InputFileLayout Format Unformat Apply tagDML ;

Unlike Unix named pipes, the Teradata implementation of Win2K/NT named pipes can support checkpoint restart capabilities. Also, the Teradata utility is responsible for creating and deleting the named pipe. It is likely that the PATH environment variable of the PowerCenter server must include the directory where "np_axsmod.dll" is located.

After modifying the Tpump control file to use a named pipe (it is suggested that you rename the control file to something other than the default name so it does not get overwritten if you re-run the session with Tpump or MultiLoad specified for the target), you must reconfigure the session to write to the named pipe rather than to Tpump. That is, do not specify an external loader, and specify the named pipe (e.g. "\\.\pipe\mypipe") as the output file for the session. In addition, you must run Tpump as a pre-session command, as the Tpump access module is responsible for creating the named pipe, and the named pipe must exist before PowerCenter can write to it.
For more information on the Win2K/NT Named Pipe Access Module, see the Teradata manual: “Teradata Tools and Utilities Access Module Reference”.
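The IMPORT-statement edit described above can be scripted rather than applied by hand each time the control file is regenerated. The following is a minimal sketch (the pipe name and access-module DLL are the document's own examples; the helper name is hypothetical) that swaps the staged output file for a named pipe and inserts the axsmod clause:

```python
import re

def use_named_pipe(import_stmt: str, pipe: str, axsmod_dll: str = "np_axsmod.dll") -> str:
    """Rewrite a Tpump/MultiLoad .Import statement to read from a named pipe
    via the access module, per the Default/Modified pattern shown above."""
    # Match ".Import Infile <path>" and replace only the path token,
    # keeping the Layout/Format/Apply clauses that follow it.
    return re.sub(
        r"(\.Import\s+Infile\s+)\S+",
        lambda m: f"{m.group(1)}{pipe} axsmod {axsmod_dll}",
        import_stmt,
        count=1,
    )

default = (".Import Infile c:\\LOGS\\TgtFiles\\td_test.out "
           "Layout InputFileLayout Format Unformat Apply tagDML ;")
print(use_named_pipe(default, r"\\.\pipe\mypipe"))
```

Applied to the default statement above, this produces the axsmod form shown in the text; the same substitution works for the MultiLoad control file.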

Lookup Performance

Creating a lookup cache based on a multi-million-row table may take some time. This is especially true since the lookup will be using ODBC to populate the cache. A non-cached lookup may actually improve overall throughput, because a big NCR box can service simple queries very quickly, versus pulling millions of rows into an often much smaller box. Your mileage will vary and there are no hard and fast rules about this – just keep it in mind as a potential area to improve performance.

Hiding the Password

Teradata loaders use clear-text passwords and offer no good way to obscure the password within the file. Some prospects see this as a security concern. The easiest solution is to lock down the directory in which the control file is generated so that the general public cannot read the control file.


You can also direct PowerCenter to write the control files to a different location (directory) and then secure this location. To configure the PowerCenter Server on UNIX to write the external loader control file to a separate directory, add the following entry to pmserver.cfg:

LoaderControlFileDirectory=<directory_name>

On Win2K, add the following to the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Powermart\Parameters\MiscInfo\LoaderControlFileDirectory=<directory_name>

Finally, MultiLoad and Tpump (but NOT FastLoad) support a command called "RUN FILE". This essentially transfers control from the current control file to the control file specified in the statement. Place your login statement in a file in a secure location, and then add a RUN FILE statement to the generated control file to call it. For example, create a login script as follows (in the file "login.ctl" in a <secure directory path>):

.LOGON demo1099/infatest,infatest;

Then modify the generated control file to replace the login statement with:

.RUN FILE <secure directory path>/login.ctl;
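The control-file edit above lends itself to a small post-processing step, for example in a post-session or scheduler script. The sketch below (the function name and paths are hypothetical; the .LOGON line is the document's example) swaps the clear-text .LOGON statement for a .RUN FILE reference to a secured login script:

```python
def hide_logon(control_text: str, login_ctl_path: str) -> str:
    """Replace the clear-text .LOGON line in a generated MultiLoad/Tpump
    control file with a .RUN FILE statement pointing at a secured script."""
    out = []
    for line in control_text.splitlines():
        if line.lstrip().upper().startswith(".LOGON"):
            # Drop the clear-text credentials; defer to the secured file.
            out.append(f".RUN FILE {login_ctl_path};")
        else:
            out.append(line)
    return "\n".join(out)

ctl = (".LOGTABLE infatest.mldlog_TD_TEST;\n"
       ".LOGON demo1099/infatest,infatest;\n"
       ".BEGIN IMPORT MLOAD TABLES infatest.TD_TEST;")
print(hide_logon(ctl, "/secure/login.ctl"))
```

Note this only works for MultiLoad and Tpump; FastLoad does not support RUN FILE, so for FastLoad the directory lockdown remains the only option described here.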

Troubleshooting

A MultiLoad load actually consists of two main phases: acquisition and application. In the acquisition phase, the input data file is read and written to a temporary work table. During the application phase, the data from the work table is written to the actual target table. Different errors appear in different phases: errors that have to do with the format or content of the source data are generally identified during the acquisition phase, while errors that have to do with the data as it is stored in the target table (i.e. constraints, primary keys) generally occur during the application phase. MultiLoad requires an exclusive lock on the target table during the application phase.

Tpump is a single-phase load. In fact, Tpump does not do anything "fancy" except for macro-ized SQL. That is, it takes the SQL from the control file and turns it into a database macro, then takes the input data and applies the macro to it. It uses standard SQL and standard locking.

The “et” table

Errors generated during the acquisition phase of a MultiLoad or during a Tpump load can be found in the "et" table (i.e. "et_<table name>" by default). This is generally the first place to look for more specific information when a load fails. The key column of the error table is "ErrorField", which indicates the column of the target table that could not be loaded. There is also an "ErrorCode" column that provides details about why the column failed. The most common ErrorCodes are:

2689: Trying to load a null value into a non-null field
2665: Invalid date format

See the Teradata documentation for a complete list of possible error codes.
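When triaging failed loads across many sessions, it can help to keep the common codes in a small lookup. A minimal sketch (only the two codes named above are included; anything else falls through to the Teradata documentation):

```python
# Common et-table ErrorCodes named in the text; extend from the
# Teradata documentation as needed.
COMMON_ERROR_CODES = {
    2689: "Trying to load a null value into a non-null field",
    2665: "Invalid date format",
}

def describe(error_code: int) -> str:
    """Return a human-readable description for an et-table ErrorCode."""
    return COMMON_ERROR_CODES.get(
        error_code, f"Unknown code {error_code}; see the Teradata documentation"
    )

print(describe(2689))
```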

The “uv” table


Errors generated during the application phase of a MultiLoad can be found in this table (because Tpump does not have an application phase, it does not create a "uv" table). The most common types of errors logged to the "uv" table are non-unique primary keys, field overflow, and constraint violations. Analogous to the "et" table, the key columns of the "uv" table are "DBCErrorField" and "DBCErrorCode". The "DBCErrorField" column is not initialized in the case of primary-key uniqueness violations; however, the DBCErrorCode that corresponds to a primary-key uniqueness violation is 2794.

Errors that indicate a prior MultiLoad session has not been “cleaned up”

The following is an excerpt from the tail end of a MultiLoad session log (<output file name>.ldrlog, found in the target file directory):

**** 21:23:55 UTY0817 MultiLoad submitting the following request:
CHECKPOINT LOADING;
**** 21:23:56 UTY0805 RDBMS failure, 2801: Duplicate unique prime key error in infatest.ET_TD_TEST.
========================================================================
=                          Logoff/Disconnect                           =
========================================================================
**** 21:23:58 UTY6212 A successful disconnect was made from the RDBMS.
**** 21:23:58 UTY2410 Total processor time used = '0.0701008 Seconds'
.    Start : 21:23:48 - THU APR 18, 2002
.    End   : 21:23:58 - THU APR 18, 2002
.    Highest return code encountered = '12'.

This error indicates that you are trying to run a MultiLoad session without properly cleaning up a previously failed MultiLoad session. See section 4.5. You might also see messages similar to these in the pmserver's standard output file (see section 4.3):

0003 .LOGTABLE infatest.mldlog_TD_TEST;
**** 21:23:50 UTY8400 Default character set: ASCII
**** 21:23:50 UTY8400 Maximum supported buffer size: 64K
**** 21:23:51 UTY6211 A successful connect was made to the RDBMS.
**** 21:23:51 UTY6210 Logtable 'infatest.mldlog_TD_TEST' indicates that a restart is in progress.
0004 DROP TABLE infatest.UV_TD_TEST ;
**** 21:23:51 UTY1012 A restart is in progress. This request has already been executed. The return code was: 0.

This, too, will be corrected by properly cleaning up after the failed MultiLoad.
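When many sessions run unattended, scanning the .ldrlog files for these symptoms can be automated. The sketch below is an assumption-laden heuristic, not an official diagnostic: it simply flags logs containing the UTY codes from the excerpts above (note UTY0805 is a generic RDBMS-failure message, so expect false positives):

```python
import re

# UTY message codes seen in the excerpts above when a prior MultiLoad
# was not cleaned up (heuristic; UTY0805 is a generic failure code).
CLEANUP_CODES = {"UTY0805", "UTY6210", "UTY1012"}

def needs_cleanup(ldrlog_text: str) -> bool:
    """True if the loader log contains messages suggesting a prior
    MultiLoad session was not cleaned up properly."""
    codes = set(re.findall(r"\bUTY\d{4}\b", ldrlog_text))
    return bool(codes & CLEANUP_CODES)

sample = ("**** 21:23:56 UTY0805 RDBMS failure, 2801: "
          "Duplicate unique prime key error in infatest.ET_TD_TEST.")
print(needs_cleanup(sample))
```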

Sessions periodically fail with “broken pipe” errors when writing to a loader in “streaming” (non-staging) mode

If you are using FastLoad or MultiLoad and have a non-zero value specified for the "checkpoint" property, this can happen. Set the loader's "checkpoint" property to 0.


©Copyright 2005 Informatica Corporation. Informatica and PowerCenter are registered trademarks of Informatica Corporation. Teradata is a registered trademark of NCR Corporation. All other company, product, or service names may be trademarks or registered trademarks of their respective owners.