PowerCenter 8

Enhancements in PowerCenter 8.6 over 8.5

Informatica has released version 8.6, which rolls up all the hot fixes issued for version 8.5 and adds a few new features. Since version 8, a unified Administration Console has been available for managing the Integration Service and Repository Service; these were discussed in earlier blogs.

What does PowerCenter 8.6 bring that is new for developers? Let us look at the PowerCenter 8.6 Client enhancements that developers will find useful.

 

1.     Creating Targets from Transformations

We can create targets based on transformations in the workspace or navigator.

To create a target, do either of the following:

1. Right-click the transformation in the workspace and select the Create and Add Target option.

2. Alternatively, drag and drop the transformation into the Target Designer.

The target that is created has the same port definitions as the transformation from which it was created. We can edit the target definition later. In addition, the target's database type is the same as that of the repository database.

 

2.     Invalid/Invalidated renamed

In PowerCenter 7, the two states of objects were known as Invalid and Invalidated. Their meanings were as follows:

Invalid – the object will not run.

Invalidated – the object may be invalid, so it may not run.

The difference between the two terms was not very clear. To avoid confusion, PowerCenter 8.6 renames the two states to Invalid and Impacted. The Invalid state still means that an object will not run, while Impacted means that an object is affected by a change and therefore may not run.

Apart from the names, the icons have also changed in PowerCenter 8.

 

3.     Propagating Port Descriptions

In the Designer, in addition to the other properties of port propagation, we can edit a port

description and propagate the description to other transformations in the mapping. 

 

4.     Environment SQL Enhancements

In PowerCenter 8, environment SQL can also be used to execute an SQL statement at the beginning of each transaction: the Integration Service runs this transaction environment SQL whenever a transaction starts. Environment SQL can still be used to execute an SQL statement each time a connection to the database is made.

For transaction environment SQL, use SQL commands that depend upon a transaction being open during the entire read or write process. For example, the following SQL command modifies how the session handles characters:

ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR
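The difference between the two kinds of environment SQL is mainly one of timing. The following is a minimal Python sketch of that timing only; `ToyConnection` is a hypothetical stand-in that records statements rather than talking to a database, and the connection-level statement is an illustrative choice, not something taken from PowerCenter.

```python
class ToyConnection:
    """Toy stand-in for a database connection; it only records statements."""

    def __init__(self, connection_env_sql=None, transaction_env_sql=None):
        self.log = []
        self.transaction_env_sql = transaction_env_sql
        # Connection environment SQL: runs once, when the connection is made.
        if connection_env_sql:
            self.log.append(("connect", connection_env_sql))

    def run_transaction(self, statements):
        # Transaction environment SQL: runs at the start of every transaction.
        if self.transaction_env_sql:
            self.log.append(("begin", self.transaction_env_sql))
        for statement in statements:
            self.log.append(("statement", statement))
        self.log.append(("commit", "COMMIT"))


conn = ToyConnection(
    # Illustrative connection-level statement (an assumption for this sketch).
    connection_env_sql="ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD'",
    transaction_env_sql="ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR",
)
conn.run_transaction(["INSERT INTO t VALUES (1)"])
conn.run_transaction(["INSERT INTO t VALUES (2)"])

for phase, sql in conn.log:
    print(f"{phase:9} {sql}")
```

In the recorded log the connection-level statement appears once, while the transaction-level statement appears before the work of each of the two transactions.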

5.     Flat File Enhancements

PowerCenter 8 includes enhancements for handling flat files. Some of these improve

performance.

Flat files can now use Integer or Double data types.

In addition, target partitions can be merged.

2. PowerCenter Repository

The PowerCenter Repository is one of the best metadata stores among ETL products. The repository is sufficiently normalized to store metadata at a very detailed level, which in turn means that updates to the repository are quick and team-based development runs smoothly. The repository data structure is also useful for analysis and reporting.

Access to the repository through MX views and the SDK extends the repository's role from simple storage of technical data to a database for analysis of ETL metadata.

The PowerCenter Repository is a collection of 355 tables that can be created on any major relational database. The kinds of information stored in the repository are:

1. Repository configuration details

2. Mappings

3. Workflows

4. User Security

5. Process Data of session runs

For a quick understanding:

When a user creates a folder, corresponding entries are made in table OPB_SUBJECT; attributes such as folder name, owner ID and folder type (shared or not) are all stored.

When we create or import sources and define field names, datatypes etc. in the Source Analyzer, entries are made in OPB_SRC and OPB_SRC_FLD.

When a target and its fields are created or imported from any database, entries are made in tables like OPB_TARG and OPB_TARG_FLD.

Table OPB_MAPPING stores mapping attributes like mapping name, folder ID, valid status and mapping comments.

Table OPB_WIDGET stores attributes like widget type, widget name, comments etc. Widgets are simply transformations; "widget" is the name Informatica uses for them internally.

Table OPB_SESSION stores configuration related to a session task, and table OPB_CNX_ATTR stores information related to connection objects.

Table OPB_WFLOW_RUN stores process details like workflow name, workflow start time, workflow completion time, the server node it ran on etc.

REP_ALL_SOURCES, REP_ALL_TARGETS and REP_ALL_MAPPINGS are a few of the many views created over these tables.
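To make the folder-to-mapping relationship concrete, here is a small sketch that builds a toy copy of two of these tables in an in-memory SQLite database and reports mappings per folder. The table names come from the text above, but the column names and sample rows are assumptions made for illustration; a real repository has many more columns and should be checked against its actual schema.

```python
import sqlite3

# Toy repository: column names and rows are assumed for this sketch only.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJ_NAME TEXT, IS_SHARED INTEGER);
CREATE TABLE OPB_MAPPING (MAPPING_ID INTEGER PRIMARY KEY, SUBJECT_ID INTEGER,
                          MAPPING_NAME TEXT, IS_VALID INTEGER);
INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DEV', 0);
INSERT INTO OPB_SUBJECT VALUES (2, 'SHARED_OBJECTS', 1);
INSERT INTO OPB_MAPPING VALUES (10, 1, 'm_load_orders', 1);
INSERT INTO OPB_MAPPING VALUES (11, 1, 'm_load_customers', 0);
INSERT INTO OPB_MAPPING VALUES (12, 2, 'm_common_lookup', 1);
""")


def mappings_per_folder(conn):
    """Return (folder name, mapping count, valid mapping count) per folder."""
    return conn.execute("""
        SELECT s.SUBJ_NAME, COUNT(m.MAPPING_ID), SUM(m.IS_VALID)
        FROM OPB_SUBJECT s
        JOIN OPB_MAPPING m ON m.SUBJECT_ID = s.SUBJ_ID
        GROUP BY s.SUBJ_NAME
        ORDER BY s.SUBJ_NAME
    """).fetchall()


report = mappings_per_folder(db)
for folder, total, valid in report:
    print(folder, total, valid)
```

This is the kind of analysis and reporting the repository structure makes possible: a folder's metadata can be joined to the objects stored under it with an ordinary SQL query.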

PowerCenter applications access the PowerCenter repository through the Repository

Service. The Repository Service protects metadata in the repository by managing

repository connections and using object-locking to ensure object consistency.

We can create a repository as global or local. A global repository stores common objects that multiple developers can use through shortcuts, while a local repository is where mappings and workflows are developed. From a local repository, we can create shortcuts to objects in shared folders in the global repository. PowerCenter also supports versioning: a versioned repository can store multiple versions of an object.

3. Administration Console

The Administration Console is a web application that we use to administer the PowerCenter domain and PowerCenter security. The console has two pages, the Domain page and the Security page.

We can do the following on the Domain page:

o Create and manage application services like the Integration Service and Repository Service

o Create and manage nodes, licenses and folders

o Restart and shut down nodes

o View log events

o Perform other domain management tasks like applying licenses and managing grids and resources

We can do the following on the Security page:

o Create, edit and delete native users and groups

o Configure a connection to an LDAP directory service and import users and groups from it

o Create, edit and delete roles (roles are collections of privileges)

o Assign roles and privileges to users and groups

o Create, edit and delete operating system profiles. An operating system profile is a level of security that the Integration Service uses to run workflows

4. PowerCenter Client

Designer, Workflow Manager, Workflow Monitor, Repository Manager and Data Stencil are the five client tools used to design mappings and mapplets, create sessions to load data, and manage the repository.

A mapping is ETL code that pictorially depicts the logical data flow from source to target, including the transformations applied to the data. The Designer is the tool used to create mappings.

The Designer has five window panes: Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer and Mapplet Designer.

Source Analyzer:

Allows us to import source table metadata from relational databases, flat files, XML and COBOL files. Note that only the source definition is imported into the Source Analyzer, not the source data itself. The Source Analyzer also allows us to define our own source definitions.

Warehouse Designer:

Allows us to import target table definitions, which can come from relational databases, flat files, XML and COBOL files. We can also create target definitions manually and group them into folders. Unlike the Source Analyzer, it offers an option to create the tables physically in the database. The Warehouse Designer does not allow two tables with the same name, even if their column names differ or they come from different databases or schemas.

Transformation Developer:

Transformations such as Filters, Lookups and Expressions that have scope for re-use are developed in this pane. Alternatively, a transformation developed in the Mapping Designer can be made reusable by checking the 're-use' option, after which it is displayed under the Transformation Developer folders.

Mapping Designer:

This is where we actually depict our ETL process: we bring in source definitions, target definitions and transformations such as filter, lookup and aggregate, and develop a logical ETL program. At this stage it is only a logical program, because the actual data load can be done only by creating a session and workflow.

Mapplet Designer:

We create a set of transformations to be used and re-used across mappings.

4. PowerCenter Client (contd)

Workflow Manager: In the Workflow Manager, we define a set of instructions called a workflow to execute the mappings we build in the Designer. Generally, a workflow contains a session and any other task we may want to perform when we run a session. Tasks can include a session, email notification, or scheduling information.

A set of tasks grouped together becomes a worklet. After we create a workflow, we run it in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Manager has the following three window panes:

Task Developer – Create the tasks we want to accomplish in the workflow.

Worklet Designer – Create a worklet. A worklet is an object that groups a set of tasks; it is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.

Workflow Designer – Create a workflow by connecting tasks with links. We can also create tasks in the Workflow Designer as we develop the workflow.

The ODBC connection details are defined in the Workflow Manager "Connections" menu.

Workflow Monitor: We can monitor workflows and tasks in the Workflow Monitor. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.

The Workflow Monitor displays workflows that have run at least once. The Workflow

Monitor continuously receives information from the Integration Service and Repository

Service. It also fetches information from the repository to display historic information.

The Workflow Monitor consists of the following windows:

Navigator window – Displays monitored repositories, servers, and repository objects.

Output window – Displays messages from the Integration Service and Repository

Service.

Time window – Displays progress of workflow runs.

Gantt chart view – Displays details about workflow runs in chronological format.

Task view – Displays details about workflow runs in a report format.

Repository Manager

We can navigate through multiple folders and repositories and perform basic repository

tasks with the Repository Manager. We use the Repository Manager to complete the

following tasks:

2. Add and connect to a repository. We can add repositories to the Navigator window and client registry and then connect to the repositories.

3. Work with PowerCenter domain and repository connections. We can edit or remove domain connection information. We can connect to one repository or multiple repositories. We can export repository connection information from the client registry to a file. We can import the file on a different machine and add the repository connection information to the client registry.

4. Change the password. We can change the password for our user account.

5. Search for repository objects or keywords. We can search for repository objects containing specified text. If we add keywords to target definitions, we can use a keyword to search for a target definition.

6. View object dependencies. Before we remove or change an object, we can view dependencies to see the impact on other objects.

7. Compare repository objects. In the Repository Manager, we can compare two repository objects of the same type to identify differences between the objects.

8. Truncate session and workflow log entries. We can truncate the list of session and workflow logs that the Integration Service writes to the repository. We can truncate all logs, or truncate all logs older than a specified date.

Informatica PowerCenter 8x Key Concepts – 5

By Badri Narayanan on January 16th, 2009 under Informatica Way.

5. Repository Service

Having already discussed the metadata repository, we now turn to the Repository Service: a separate, multi-threaded process that retrieves, inserts and updates metadata in the repository database tables.

The Repository Service manages connections to the PowerCenter repository from PowerCenter client applications such as the Designer, Workflow Manager, Workflow Monitor, Repository Manager and the console, and from the Integration Service. The Repository Service is responsible for ensuring the consistency of metadata in the repository.

Creation & Properties:

Use the PowerCenter Administration Console Navigator window to create a Repository Service. The properties needed to create it are:

Service Name – name of the service like rep_SalesPerformanceDev

Location – Domain and folder where the service is created

License – license service name

Node, Primary Node & Backup Nodes – Node on which the service process runs

CodePage – The Repository Service uses the character set encoded in the repository code page when writing data to the repository

Database type & details – Type of database, username, password, connect string and tablespace name

The above properties are sufficient to create a Repository Service; however, the following properties are also worth a look, as they matter for performance and maintenance.

General Properties

> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform administrative tasks like enabling version control or promoting a local repository to a global repository

> EnableVersionControl: Creates a versioned repository

Node Assignments: The high availability option is a licensed feature that allows us to choose primary and backup nodes for continuous running of the Repository Service. Under a normal license we would see only one node to select from

Database Properties

> DatabaseArrayOperationSize: Number of rows to fetch each time an array database operation is issued, such as insert or fetch. Default is 100

> DatabasePoolSize: Maximum number of connections to the repository database that the Repository Service can establish. If the Repository Service tries to establish more connections than specified for DatabasePoolSize, it times out the connection attempt after the number of seconds specified for DatabaseConnectionTimeout

Advanced Properties

> CommentsRequiredForCheckin: Requires users to add comments when checking in repository objects.

> Error Severity Level: Level of error messages written to the Repository Service log. Specify one of the following message levels: Fatal, Error, Warning, Info, Trace or Debug

> EnableRepAgentCaching: Enables repository agent caching. Repository agent caching provides optimal performance of the repository when you run workflows. When you enable repository agent caching, the Repository Service process caches metadata requested by the Integration Service. Default is Yes.

> RACacheCapacity: Number of objects that the cache can contain when repository agent caching is enabled. You can increase the number of objects if there is available memory on the machine running the Repository Service process. The value must be between 100 and 10,000,000,000. Default is 10,000

> AllowWritesWithRACaching: Allows you to modify metadata in the repository when repository agent caching is enabled. When you allow writes, the Repository Service process flushes the cache each time you save metadata through the PowerCenter Client tools. You might want to disable writes to improve performance in a production environment where the Integration Service makes all changes to repository metadata. Default is Yes.

Environment Variables

The database client code page on a node is usually controlled by an environment variable.

For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All

Integration Services and Repository Services that run on this node use the same

environment variable. You can configure a Repository Service process to use a different

value for the database client code page environment variable than the value set for the

node.

You might want to configure the code page environment variable for a Repository

Service process when the Repository Service process requires a different database client

code page than the Integration Service process running on the same node.

For example, the Integration Service reads from and writes to databases using the UTF-8

code page. The Integration Service requires that the code page environment variable be

set to UTF-8. However, you have a Shift-JIS repository that requires that the code page

environment variable be set to Shift-JIS. Set the environment variable on the node to

UTF-8. Then add the environment variable to the Repository Service process properties

and set the value to Shift-JIS.
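The override described above amounts to a two-level lookup: a service process sees the node's environment, except where its own process properties set a different value. A minimal sketch of that rule follows; the function name is hypothetical, and the strings "UTF-8" and "Shift-JIS" are the placeholders used in the text, not literal NLS_LANG settings.

```python
def effective_environment(node_env, process_env):
    """Node-level variables, with process-level values taking precedence."""
    merged = dict(node_env)
    merged.update(process_env)
    return merged


# Node-wide setting, as in the example above (placeholder value).
node_env = {"NLS_LANG": "UTF-8"}

# The Integration Service process adds no override and inherits the node value.
integration_service_env = effective_environment(node_env, {})

# The Repository Service process overrides the same variable for itself only.
repository_service_env = effective_environment(node_env, {"NLS_LANG": "Shift-JIS"})

print(integration_service_env["NLS_LANG"])
print(repository_service_env["NLS_LANG"])
```

The node-level dictionary is never modified, which mirrors the point of the feature: one process gets a different code page without affecting the other services on the node.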

6.  Integration Service (IS)

The key functions of the IS are:

Interpreting the workflow and mapping metadata from the repository

Executing the instructions in the metadata

Managing the data from source system to target system within memory and on disk

The three main components of the Integration Service that enable data movement are:

Integration Service Process

Load Balancer

Data Transformation Manager

6.1 Integration Service Process (ISP) 

The Integration Service starts one or more Integration Service processes to run and

monitor workflows. When we run a workflow, the ISP starts and locks the workflow,

runs the workflow tasks, and starts the process to run sessions. The functions of the

Integration Service Process are,

Locks and reads the workflow

Manages workflow scheduling, i.e., maintains session dependencies

Reads the workflow parameter file

Creates the workflow log

Runs workflow tasks and evaluates the conditional links

Starts the DTM process to run the session

Writes historical run information to the repository

Sends post-session emails

6.2    Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before looking at these steps, we need to know about resources, resource provision thresholds, dispatch modes and service levels.

Resources – We can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed.

Resource provision thresholds – There are three:

o Maximum CPU Run Queue Length: the maximum number of runnable threads waiting for CPU resources on the node.

o Maximum Memory %: the maximum percentage of virtual memory allocated on the node relative to the total physical memory size.

o Maximum Processes: the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node.

Dispatch modes – There are three:

o Round-robin: The Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the Maximum Processes threshold.

o Metric-based: Checks all three resource provision thresholds and dispatches tasks in a round-robin fashion.

o Adaptive: Checks all three resource provision thresholds and also ranks nodes according to current CPU availability.

Service levels – A service level establishes priority among tasks that are waiting to be dispatched. Its three components are Name, Dispatch Priority and Maximum Dispatch Wait Time. The maximum dispatch wait time is the amount of time a task can wait in the queue, which ensures that no task waits forever.
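One way to picture how a service level orders the dispatch queue is the sketch below. It is an illustration under stated assumptions, not the Load Balancer's actual algorithm: `QueuedTask` and `next_task` are hypothetical names, a smaller priority number is assumed to mean "dispatch sooner", and a task that has waited past its maximum dispatch wait time is assumed to jump ahead of everything that has not.

```python
from dataclasses import dataclass


@dataclass
class QueuedTask:
    name: str
    dispatch_priority: int   # assumption: smaller number = dispatched earlier
    max_wait_seconds: float  # the service level's maximum dispatch wait time
    enqueued_at: float       # time the task entered the dispatch queue


def next_task(queue, now):
    """Pick the next task: overdue tasks first, then priority, then arrival."""
    def sort_key(task):
        overdue = (now - task.enqueued_at) >= task.max_wait_seconds
        return (0 if overdue else 1, task.dispatch_priority, task.enqueued_at)
    return min(queue, key=sort_key)


queue = [
    QueuedTask("low_priority_load", dispatch_priority=9, max_wait_seconds=60, enqueued_at=0.0),
    QueuedTask("high_priority_load", dispatch_priority=1, max_wait_seconds=60, enqueued_at=50.0),
]

print(next_task(queue, now=55.0).name)  # neither task is overdue yet
print(next_task(queue, now=61.0).name)  # the low-priority task has now waited too long
```

Without the wait-time rule, a steady stream of high-priority tasks could keep the low-priority task queued indefinitely, which is exactly what the maximum dispatch wait time is meant to prevent.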

A. Dispatching Tasks on a node

1. The Load Balancer checks different resource provision thresholds on the node

depending on the Dispatch mode set. If dispatching the task causes any threshold

to be exceeded, the Load Balancer places the task in the dispatch queue, and it

dispatches the task later

2. The Load Balancer dispatches all tasks to the node that runs the master

Integration Service process

B. Dispatching Tasks on a grid

1. The Load Balancer verifies which nodes are currently running and enabled

2. The Load Balancer identifies nodes that have the PowerCenter resources required

by the tasks in the workflow

3. The Load Balancer verifies that the resource provision thresholds on each

candidate node are not exceeded. If dispatching the task causes a threshold to be

exceeded, the Load Balancer places the task in the dispatch queue, and it

dispatches the task later

4. The Load Balancer selects a node based on the dispatch mode 
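The four grid-dispatch steps can be sketched as a filter followed by a selection. The code below is a simplified illustration: the `Node` fields and function names are hypothetical, only the Maximum Processes threshold is checked (the other two thresholds are left out for brevity), and round-robin and metric-based modes are both modelled as a plain rotation over the candidates.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running: bool
    resources: set          # resources available on the node, e.g. {"sap"}
    running_processes: int  # Session and Command tasks currently running
    max_processes: int      # the Maximum Processes threshold
    cpu_idle_pct: float     # stand-in for "current CPU availability"


def candidate_nodes(nodes, required_resources):
    """Steps 1-3: running nodes that have the resources and stay under threshold."""
    return [
        n for n in nodes
        if n.running
        and required_resources <= n.resources
        and n.running_processes + 1 <= n.max_processes
    ]


def select_node(nodes, required_resources, mode, rr_counter=0):
    """Step 4: choose by dispatch mode; None means the task stays queued."""
    candidates = candidate_nodes(nodes, required_resources)
    if not candidates:
        return None
    if mode == "adaptive":
        # Rank candidates by CPU availability and take the best one.
        return max(candidates, key=lambda n: n.cpu_idle_pct)
    # Round-robin and metric-based: rotate through the candidates.
    return candidates[rr_counter % len(candidates)]


grid = [
    Node("node1", True, {"sap"}, 2, 10, 20.0),
    Node("node2", True, {"sap"}, 10, 10, 90.0),   # already at Maximum Processes
    Node("node3", False, {"sap"}, 0, 10, 99.0),   # not running
    Node("node4", True, set(), 0, 10, 80.0),      # lacks the SAP resource
    Node("node5", True, {"sap"}, 1, 10, 60.0),
]

print(select_node(grid, {"sap"}, "adaptive").name)
print(select_node(grid, {"sap"}, "round-robin", rr_counter=1).name)
```

When no node passes the filter, `select_node` returns `None`, which corresponds to the task being placed in the dispatch queue and dispatched later.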

6.3 Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Integration Service Process starts the DTM

process. The DTM is the process associated with the session task. The DTM process

performs the following tasks:

Retrieves and validates session information from the repository.

Validates source and target code pages.

Verifies connection object permissions.

Performs pushdown optimization when the session is configured for pushdown

optimization.

Adds partitions to the session when the session is configured for dynamic

partitioning.

Expands the service process variables, session parameters, and mapping variables

and parameters.

Creates the session log.

Runs pre-session shell commands, stored procedures, and SQL.

Sends a request to start worker DTM processes on other nodes when the session is

configured to run on a grid.

Creates and runs mapping, reader, writer, and transformation threads to extract,

transform, and load data

Runs post-session stored procedures, SQL, and shell commands and sends post-

session email

After the session is complete, reports the execution result to the ISP

Pictorial Representation of Workflow execution:

1. A PowerCenter Client requests the IS to start the workflow

2. The IS starts the ISP

3. The ISP consults the LB to select a node

4. The ISP starts the DTM on the node selected by the LB.