www.edureka.co/informatica Management in Informatica Power center

Informatica PowerCenter : Agile Data Integration Tool

Objectives

At the end of this module, you will be able to:

Understand Informatica & the Informatica Product Suite

Explain Error Handling in Informatica

Understand Informatica Domain & Repository Management

Understand Informatica Recovery Concepts

Understand PowerCenter Log Management

Informatica – A Product Company

Informatica Corp. provides data integration software and services for various businesses, industries and government organizations including telecommunication, health care, financial and insurance services

Informatica Products & Their Functionalities

A wide range of products is available under the Informatica product suite to help satisfy data integration requirements within the enterprise and beyond

Informatica's product portfolio is focused on Data Integration:

» Data Integration & ETL

» Information Lifecycle Management

» Complex Event Processing

» Data Masking

» Data Quality

» Data Replication

» Data Virtualization

» Master Data Management

» Ultra Messaging

Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data warehouses

Informatica Products & Their Functionalities (Contd.)

PowerCenter - Fully integrated end-to-end data integration platform, Informatica PowerCenter Enterprise converts raw data into information to drive analysis, daily operations, and data governance initiatives

Information Lifecycle Management - Informatica’s Information Lifecycle Management software empowers your IT organizations to cost-effectively handle data growth, safely retire legacy systems and applications, optimize test data management and protect sensitive data

Complex Event Processing - Informatica RulePoint is a complex event processing software that delivers robust and effective complex event processing with real-time alerts and insight into pertinent information to operate in a smarter, faster, efficient and competitive way

Data Masking - Informatica Data Masking products dynamically mask sensitive production data from unauthorized access, permanently and irreversibly mask nonproduction data thereby helping IT organizations to comply with data privacy regulations, organization-wide data privacy mandates and reduce the risk of a data breach

Data Quality - Informatica Data Quality provides clean, high-quality data to the business regardless of size, data format, platform, or technology. It helps validate and improve address information, profile and cleanse business data, or implement a data governance practice, and ensures that data quality requirements are met

Data Replication - Informatica Data Replication is database-agnostic, real-time transaction replication software that’s highly scalable, reliable, and non-disruptive to the performance of operational source systems

Data Virtualization - Informatica Data Services provides a single scalable architecture for both data integration and data federation, creating a data virtualization layer that hides and handles the complexity of accessing underlying data sources - all while insulating them from change

Master Data Management - The Informatica Master Data Management (MDM) product family delivers consolidated and reliable business-critical data—also known as master data—to the applications that employees rely on every day

Ultra Messaging - Informatica Ultra Messaging is a family of next-generation, low-latency messaging middleware products. With very high throughput and 24x7 reliability, they deliver extremely low-latency application messaging over both network-based and shared-memory (inter-process) based transports

Informatica Resources

Informatica Corporate Website

Informatica University

Customer Portal

Product Documentation

Knowledge Base

Technical Support

Informatica Product Certification

Introduction to PowerCenter

PowerCenter:

It is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover and integrate data from virtually any business system, in any format and deliver that data throughout the enterprise at any speed

It is an ETL (Extract, Transform and Load) tool

PowerCenter can read from a variety of different sources and write to as many targets, while transforming data in between

The main advantages of PowerCenter over other ETL tools, and hence the reasons for its popularity, are as follows:

» It is robust, and can be used on both Windows and UNIX based systems

» It offers high performance, yet is very simple to develop with, maintain and administer

Versions of PowerCenter

PowerCenter Version History:

The current version of PowerCenter is Informatica PowerCenter 9.6.1 HF2 (as of Feb ’15)

From version 9.x onwards, PowerCenter has become service oriented, with each server component being identified as a service. (Ex.: Repository service, Integration service etc.)

Previous versions of Informatica are neither in use nor supported by Informatica

For more information please visit www.informatica.com

PowerCenter Architecture - SOA

The architecture of Informatica PowerCenter (version 9.x onwards) is based on the Service Oriented Architecture (SOA) concept

A service oriented architecture (SOA) can be defined as a group of services that communicate with each other. The communication can involve simple data passing, or two or more services coordinating the same activity

Informatica 9.x represents a major change in the architecture of the product line

Aim: Its main aim is to provide improved performance and high availability

Approach: By reengineering, the underlying architecture has been made even more service-based

PowerCenter Architecture - Single Unified Architecture

Error Handling In Informatica

Error Handling is one of the must have components in any Data Warehouse or Data Integration project.

When we start any Data Warehouse or Data Integration project, business users come up with a set of exceptions to be handled in the ETL process. Here, let's talk about how we can easily handle these user-defined errors.

Identifying errors and creating an error handling strategy is very important.

The two types of errors in an ETL process are Data Errors and Process Errors.

Data Errors: To handle data errors, we can use the Row Error Logging feature. The errors are captured in the error tables. We can then analyse, correct and reprocess them.

Process Errors: To handle process errors, we can configure an Email task to send a notification in the event of a session failure.

INFORMATICA FUNCTIONS USED

We use two Informatica PowerCenter functions to define our user-defined error capture logic.

ERROR() : This function causes the PowerCenter Integration Service to skip a row and issue an error message, which you define. The error message is displayed in the session log or written to the error log tables, depending on the error logging type configured in the session.

ABORT() : This function stops the session and issues a specified error message, which is written to the session log file or to the error log tables, depending on the error logging type configured in the session. When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row. It processes any rows read before the session aborts.
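
The difference between skipping a row and aborting the session can be illustrated outside PowerCenter. The following is a minimal Python sketch, not PowerCenter expression code: the exception classes `RowError` and `SessionAbort`, the `validate` rule and the `qty` field are hypothetical names chosen only for this illustration.

```python
class RowError(Exception):
    """Stands in for ERROR(): skip the current row and log a message."""


class SessionAbort(Exception):
    """Stands in for ABORT(): stop processing at the current row."""


def validate(row):
    # Hypothetical validation rules, for illustration only.
    if row["qty"] is None:
        raise SessionAbort("qty missing: feed is corrupt")
    if row["qty"] < 0:
        raise RowError("qty must not be negative")
    return row


def run_session(rows):
    """Process rows in order; return (loaded rows, error log, aborted flag)."""
    loaded, error_log, aborted = [], [], False
    for row_number, row in enumerate(rows, start=1):
        try:
            loaded.append(validate(row))
        except RowError as exc:
            # ERROR-like behaviour: record the message, skip the row, carry on.
            error_log.append((row_number, str(exc)))
        except SessionAbort as exc:
            # ABORT-like behaviour: record the message and stop at this row.
            # Rows handled before this point remain in `loaded`.
            error_log.append((row_number, str(exc)))
            aborted = True
            break
    return loaded, error_log, aborted
```

With the input `[{"qty": 1}, {"qty": -2}, {"qty": 3}, {"qty": None}, {"qty": 5}]`, the second row is skipped, the fourth row stops the run, and the fifth row is never processed.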

INFORMATICA ERROR TABLES

Once the configuration is specified, Informatica PowerCenter creates four different tables for error logging. The table details are as below.

ETL_PMERR_DATA :- Stores data about a transformation row error and its corresponding source row.

ETL_PMERR_MSG :- Stores metadata about an error and the error message.

ETL_PMERR_SESS :- Stores metadata about the session.

ETL_PMERR_TRANS :- Stores metadata about the source and transformation ports when an error occurs.

With this, we are done with the settings required to capture user-defined errors. Any data record that violates our data validation check will be captured in the PMERR tables mentioned above.

REPORT THE ERROR DATA

Now that the error data is stored in the error tables, we can pull the error report using a SQL query.

We can get fancier with the SQL and pull more information about the error.

select
    sess.FOLDER_NAME as 'Folder Name',
    sess.WORKFLOW_NAME as 'WorkFlow Name',
    sess.TASK_INST_PATH as 'Session Name',
    data.SOURCE_ROW_DATA as 'Source Data',
    msg.ERROR_MSG as 'Error MSG'
from ETL_PMERR_SESS sess
left outer join ETL_PMERR_DATA data
    on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
    and data.SESS_INST_ID = sess.SESS_INST_ID
left outer join ETL_PMERR_MSG msg
    on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
    and msg.SESS_INST_ID = sess.SESS_INST_ID
where
    sess.FOLDER_NAME = <Project Folder Name>
    and sess.WORKFLOW_NAME = <Workflow Name>
    and sess.TASK_INST_PATH = <Session Name>
    and sess.SESS_START_TIME = <Session Run Time>
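
To try the shape of this report without a PowerCenter installation, the query can be run against a small in-memory SQLite database. This is only a sketch under stated assumptions: the table definitions below are a hypothetical minimal subset containing just the columns the query touches (the real error tables have more), the sample rows are invented, and the helper names `build_demo_db` and `error_report` are not part of any Informatica API. Bound parameters take the place of the `<...>` placeholders.

```python
import sqlite3

# Hypothetical minimal schema: only the columns used by the report query.
SCHEMA = """
CREATE TABLE ETL_PMERR_SESS (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER,
    FOLDER_NAME TEXT, WORKFLOW_NAME TEXT,
    TASK_INST_PATH TEXT, SESS_START_TIME TEXT
);
CREATE TABLE ETL_PMERR_DATA (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, SOURCE_ROW_DATA TEXT
);
CREATE TABLE ETL_PMERR_MSG (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, ERROR_MSG TEXT
);
"""

REPORT_SQL = """
SELECT sess.FOLDER_NAME, sess.WORKFLOW_NAME, sess.TASK_INST_PATH,
       err.SOURCE_ROW_DATA, msg.ERROR_MSG
FROM ETL_PMERR_SESS sess
LEFT OUTER JOIN ETL_PMERR_DATA err
    ON err.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
   AND err.SESS_INST_ID = sess.SESS_INST_ID
LEFT OUTER JOIN ETL_PMERR_MSG msg
    ON msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
   AND msg.SESS_INST_ID = sess.SESS_INST_ID
WHERE sess.FOLDER_NAME = ?
  AND sess.WORKFLOW_NAME = ?
  AND sess.TASK_INST_PATH = ?
  AND sess.SESS_START_TIME = ?
"""


def build_demo_db():
    """Create an in-memory database holding one invented error record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO ETL_PMERR_SESS VALUES (?, ?, ?, ?, ?, ?)",
        (101, 7, "SALES", "wf_load_orders", "s_m_load_orders", "2015-02-01 02:00:00"),
    )
    conn.execute("INSERT INTO ETL_PMERR_DATA VALUES (?, ?, ?)", (101, 7, "42|ACME|-5"))
    conn.execute(
        "INSERT INTO ETL_PMERR_MSG VALUES (?, ?, ?)",
        (101, 7, "Quantity must be positive"),
    )
    conn.commit()
    return conn


def error_report(conn, folder, workflow, session, start_time):
    """Return the error rows for one session run as a list of tuples."""
    return conn.execute(REPORT_SQL, (folder, workflow, session, start_time)).fetchall()
```

Passing the filter values as bound parameters rather than pasting them into the SQL text keeps the query reusable across folders, workflows and runs.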

Informatica Domain & Repository Management

Overview of PowerCenter Architecture

The PowerCenter tool consists of :

Client components

Server components

Client Components of PowerCenter

PowerCenter Repository Manager

PowerCenter Designer

PowerCenter Workflow Manager

PowerCenter Workflow Monitor

PowerCenter Administration Console (browser based)

Server Components of PowerCenter

The PowerCenter server components comprise the following services:

Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables

Integration service: The Integration service runs sessions and workflows

SAP BW service: The SAP BW service looks out for RFC requests from SAP BW and initiates workflows to extract data from, or load data into the SAP BW

Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services

Overall Architecture of PowerCenter

PowerCenter 9.x Architecture

Informatica- Domain & Nodes

The salient features of a Domain are as follows:

A Domain is a logical collection or set of nodes and services

The PowerCenter Domain is the fundamental administrative unit of PowerCenter

A Domain can be a single PowerCenter installation, or it can consist of multiple PowerCenter installations

The salient features of a node are as follows:

A node is a logical representation of a physical machine. It has physical attributes such as a hostname and a port number

Each node runs a service manager which is responsible for the application and core services

A node can be a gateway node or a worker node, but it can belong to only one Domain

Gateway Node

A gateway node can be described as follows:

The gateway node is the node where all core services are meant to run

The primary function of a gateway node is to route all service requests from the PowerCenter client to other available nodes

If the gateway node is unavailable, the Domain cannot accept any service requests. Only one node within the Domain can act as the gateway at any given point in time

Informatica- Domain & Nodes (Summarization)

How different components of PowerCenter interact

Informatica Repository Management

#1. Repository is a generic term for a container, place or room where something is stored.

#2. The Informatica repository is a set of database tables where Informatica stores its metadata. Metadata is data that describes other data. More specifically, it is data about data.

#3. The Informatica repository keeps Informatica metadata: information about different types of objects, for example mappings, transformations, folders, connections, user privileges etc.

#4. In the industry, the Informatica repository metadata tables are also called OPB tables/views or REP tables/views.

#5. The repository is managed with the client tool “Informatica PowerCenter Repository Manager”. Repository Manager is useful for admin activities:

1. You can create, edit and delete folders.

2. You can manage object and user permissions.

3. You can back up the repository to a local machine and restore it to some other server.

4. You can create deployment groups.

5. You can view objects and their locks, and can disable the write intent lock on objects locked by you.

6. You can import and export objects.

7. You can copy objects from one folder to another.

Informatica Recovery Concepts

# Informatica Recovery Strategies

• Workflow Configuration for Recovery

• Session Recovery

# Workflow Recovery

• Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption.

• During the workflow recovery process, the Integration Service accesses the workflow state, which is stored in memory or on disk based on the recovery configuration.

• The workflow state of operation includes the status of tasks in the workflow and workflow variable values.

• The configuration includes:

1. Workflow Configuration for Recovery

2. Session and Tasks Configuration for Recovery

3. Recovering the Workflow from Failure

1. Workflow Configuration for Recovery

To configure a workflow for recovery, we must enable the workflow for recovery or configure the workflow to suspend on task error.

Enable Recovery : When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running.

Suspend : When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory.

You can recover the suspended workflow if a task fails.

You can fix the task error and recover the workflow.

If the workflow is not able to recover automatically from failure within the maximum allowed number of attempts, it goes to the 'suspended' state.

2. Session and Tasks Configuration for Recovery

Each session or task in a workflow has its own recovery strategy.

When the Integration Service recovers a workflow, it recovers tasks based on the recovery strategy specified for each task or session.

Three different options are available:

1. Restart task

2. Fail task and continue workflow

3. Resume from the last checkpoint

1. Restart task : This recovery strategy is available for all types of workflow tasks. When the Integration Service recovers a workflow, it restarts each recoverable task that is configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy. All other tasks have a restart recovery strategy by default.

2. Fail task and continue workflow : This recovery strategy is only available for Session and Command tasks. When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow. Configure a fail recovery strategy if you want to complete the workflow, but you do not want to recover the task.

3. Resume from the last checkpoint : This recovery strategy is only available for Session tasks. The Integration Service saves the session state of operation and maintains target recovery tables. If the session aborts, stops, or terminates, the Integration Service uses the saved recovery information to resume the session from the point of interruption.
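
The availability rules for the three strategies can be summarised in a few lines of code. The following Python sketch only models the rules stated above; it is not how the Integration Service is implemented. The names `Strategy`, `allowed_strategies` and `recovery_action`, the task type strings, and the `checkpoint_row` parameter are all hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    RESTART = "Restart task"
    FAIL_AND_CONTINUE = "Fail task and continue workflow"
    RESUME_FROM_CHECKPOINT = "Resume from the last checkpoint"


def allowed_strategies(task_type):
    """Return the strategies that can be configured for a task type."""
    if task_type == "session":
        # Session tasks can use all three strategies.
        return set(Strategy)
    if task_type == "command":
        # Command tasks can restart or fail-and-continue, but not resume.
        return {Strategy.RESTART, Strategy.FAIL_AND_CONTINUE}
    # All other task types use the restart strategy by default.
    return {Strategy.RESTART}


def recovery_action(task_type, strategy, checkpoint_row=0):
    """Describe what recovery does for one task as an (action, start_row) pair."""
    if strategy not in allowed_strategies(task_type):
        raise ValueError(f"{strategy.value!r} is not available for {task_type} tasks")
    if strategy is Strategy.RESTART:
        return ("rerun", 0)               # run the task again from the beginning
    if strategy is Strategy.FAIL_AND_CONTINUE:
        return ("mark_failed", None)      # task stays failed, workflow moves on
    return ("resume", checkpoint_row)     # continue from the saved checkpoint
```

For example, `recovery_action("session", Strategy.RESUME_FROM_CHECKPOINT, 500)` describes resuming at row 500, while asking for the same strategy on a Command task raises `ValueError`.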

3. Recovering the Workflow from Failure

A workflow can be recovered either automatically or manually, depending on the workflow recovery strategy

Recovering Automatically

If you have a High Availability (HA) licence and the workflow is configured to recover automatically as described above, the Integration Service automatically attempts to recover the workflow based on the recovery strategy set for each session or task in the workflow. If the workflow is not able to recover automatically from failure within the maximum allowed number of attempts, it goes to the 'suspended' state, from which it can be recovered manually.

Recovering Manually

If you do not have a High Availability (HA) licence, you can manually recover the workflow, or recover individual tasks within a workflow separately. You can access the following options from the Workflow Manager or from the Workflow Monitor.

Recover workflow :- Continue processing the workflow from the point of interruption.

Recover Task :- Recover a session but not the rest of the workflow.

Recover workflow from a task :- Recover a session and continue processing a workflow.

PowerCenter Log Management

Informatica Log Management

The Integration Service generates two logs when a mapping runs:

1) Session log -- Has the details of the task, session errors and load statistics.

2) Workflow log -- Has the details of the workflow processing and workflow errors.

The workflow log is generated when the workflow starts, and the session log is generated once the session is initiated.

The following process takes place when the workflow is initiated:

1. The Integration Service writes binary log files on the node. It sends information about the sessions and workflows to the Log Manager.

2. The Log Manager stores information about workflow and session logs in the domain configuration database. The domain configuration database stores information such as the path to the log file location, the node that contains the log, and the Integration Service that created the log.

3. When you view a session or workflow in the Log Events window, the Log Manager retrieves the information from the domain configuration database to determine the location of the session or workflow logs.

4. The Log Manager dispatches a Log Agent to retrieve the log events on each node to display in the Log Events window.
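
The four steps above amount to a catalogue lookup followed by a fetch from the node that holds the file. The following Python sketch models that flow with plain dictionaries. It is an illustration only: the `LogLocation` and `LogManager` classes, their method names, and the sample node, path and service values are hypothetical, and the dictionaries merely stand in for the domain configuration database and the binary log files on each node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogLocation:
    node: str                 # node that contains the log
    path: str                 # path to the log file on that node
    integration_service: str  # Integration Service that created the log


class LogManager:
    def __init__(self):
        self._catalog = {}     # stands in for the domain configuration database
        self._node_files = {}  # stands in for the binary log files on each node

    def register(self, run_id, location, events):
        """Steps 1-2: a log is written on a node and its location is recorded."""
        self._node_files[(location.node, location.path)] = list(events)
        self._catalog[run_id] = location

    def view(self, run_id):
        """Steps 3-4: look up the location, then fetch the events from the node."""
        location = self._catalog[run_id]
        return self._node_files[(location.node, location.path)]


manager = LogManager()
manager.register(
    "wf_load_orders#101",
    LogLocation("node01", "/logs/wf_load_orders.bin", "IS_DEV"),
    ["workflow started", "session s_m_load_orders started"],
)
events = manager.view("wf_load_orders#101")
```

The point of the indirection is that the viewer never needs to know where a log lives; it asks the catalogue first and only then reads from the node.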

When a workflow is invoked the Integration Service creates the following output files:

Workflow log : The Integration Service process creates a workflow log for each workflow it runs. It writes information in the workflow log such as initialization of processes, workflow task run information, errors encountered, and workflow run summary.

Session log : The Integration Service process creates a session log for each session it runs. It writes information in the session log such as initialization of processes, session validation, creation of SQL commands for reader and writer threads, errors encountered, and load summary.

Session detail : When you run a session, the Workflow Manager creates session details that provide load statistics for each target in the mapping.

Performance Detail : Performance details provide transformation-by-transformation information on the flow of data through the session.

Reject Files : By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.

Row Error Logs : When a row error occurs, the Integration Service process logs error information that allows you to determine the cause and source of the error.

Recovery Tables Files : The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery. When you run a session in recovery mode, the Integration Service process uses the information in the recovery tables to complete the session.

Indicator File : If you use a flat file as a target, you can configure the Integration Service to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.

Cache Files : When the Integration Service process creates memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects: Aggregator transformation, Joiner transformation, Rank transformation, Lookup transformation, Sorter transformation, XML target.

Informatica Logs - Different Types of Tracing Levels In Informatica

Tracing levels can be configured at the transformation and/or session level in Informatica. There are four tracing levels, plus a None setting that applies only at the session level. They are listed below:

Tracing levels:

• None: Applicable only at session level. The Integration Service uses the tracing levels configured in the mapping.

• Terse: The Integration Service logs initialization information, error messages, and notification of rejected data in the session log file.

• Normal: The Integration Service logs initialization and status information, errors encountered, and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.

• Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, the names of index and data files used, and detailed transformation statistics.

• Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column, and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.
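
Because each level adds detail on top of the previous one, the levels can be treated as an ordered scale. The Python sketch below models that idea and is not PowerCenter code. It rests on two assumptions that go beyond the list above: that Normal logs everything Terse does, and that a session-level setting other than None takes precedence over the level configured in the mapping. The names `Tracing`, `effective_level`, `logs_init_details` and `logs_each_row` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tracing(IntEnum):
    # Ordered from least to most detail (assumed cumulative).
    TERSE = 1
    NORMAL = 2
    VERBOSE_INITIALIZATION = 3
    VERBOSE_DATA = 4


def effective_level(session_level: Optional[Tracing], mapping_level: Tracing) -> Tracing:
    """Pick the level that applies to a transformation.

    A session level of None means "use the level configured in the mapping".
    Treating any other session level as an override is an assumption.
    """
    return mapping_level if session_level is None else session_level


def logs_init_details(level: Tracing) -> bool:
    """True when index/data file names and detailed statistics are logged."""
    return level >= Tracing.VERBOSE_INITIALIZATION


def logs_each_row(level: Tracing) -> bool:
    """True only for Verbose Data, the one level that writes row data."""
    return level >= Tracing.VERBOSE_DATA
```

Modelling the levels as an `IntEnum` keeps comparisons such as `level >= Tracing.VERBOSE_INITIALIZATION` readable and makes the cumulative relationship explicit.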

Questions
