venkat485
INFORMATICA
Overview
A data warehouse is a collection of subject-oriented databases. It is also a series of processes, procedures and tools (hardware and software). From the data warehouse, data flows to various customized databases. When this data is periodically extracted from the data warehouse and loaded into local databases, each local database is called a data mart.
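The periodic extraction into a data mart can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions: the `transactions` table, its columns and the `refresh_data_mart` helper are hypothetical, and in practice the load would usually be done by an ETL tool rather than hand-written code.

```python
import sqlite3


def refresh_data_mart(warehouse, mart, department):
    """Copy one department's rows from the warehouse into a local data mart.

    Hypothetical schema: the warehouse has a `transactions` table with a
    `department` column; the mart keeps its own copy of the matching rows.
    """
    rows = warehouse.execute(
        "SELECT txn_id, department, amount FROM transactions WHERE department = ?",
        (department,),
    ).fetchall()
    mart.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(txn_id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
    )
    mart.execute("DELETE FROM transactions")  # full refresh on every scheduled run
    mart.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    mart.commit()
    return len(rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE transactions (txn_id INTEGER, department TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "Sales", 120.0), (2, "Inventory", 40.0), (3, "Sales", 75.5)],
)
sales_mart = sqlite3.connect(":memory:")  # stands in for a separate local database
print(refresh_data_mart(warehouse, sales_mart, "Sales"))  # 2
```

Deleting and reloading the mart on each run keeps the sketch simple; an incremental load would need a change-tracking column, which the source does not describe.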
[Figure: Complete Warehouse Solution Architecture. Columns: Data Sources (Operational Data, Legacy Data, The Post, VISA, External Data Sources), Data Management (Extract/Transform/Load into an organizationally structured Enterprise Data Warehouse, feeding departmentally structured Data Marts: Sales, Inventory, Purchase) and Access. Bands: Data, Information, Knowledge; Asset Assembly (and Management), Asset Exploitation; Metadata.]
Use of Informatica in Data Warehousing
The data in a data warehouse comes from various sources running on different platforms. An ETL tool is used to integrate data from these sources and load it into the data warehouse. INFORMATICA is an ETL tool used to extract data, transform it and load it into the data warehouse. INFORMATICA has two products to carry out this ETL process:
• PowerCenter
• PowerMart
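The three ETL steps can be sketched in plain Python. This is not how INFORMATICA works internally; it only illustrates the idea, assuming two hypothetical sources in different formats (a CSV file and a JSON feed) and an in-memory SQLite database standing in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats, standing in for systems on different platforms.
CSV_SOURCE = "id,name,amount\n1, alice ,10.5\n2,bob,3\n"
JSON_SOURCE = '[{"id": 3, "name": "Carol", "amount": "7.25"}]'


def extract():
    """Extract: read raw records from every source as dicts."""
    yield from csv.DictReader(io.StringIO(CSV_SOURCE))
    yield from json.loads(JSON_SOURCE)


def transform(record):
    """Transform: cast types and clean text so all sources share one format."""
    return int(record["id"]), str(record["name"]).strip().title(), float(record["amount"])


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_fact "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, (transform(r) for r in extract()))
print(warehouse.execute("SELECT * FROM customer_fact ORDER BY id").fetchall())
# [(1, 'Alice', 10.5), (2, 'Bob', 3.0), (3, 'Carol', 7.25)]
```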
[Figure: Informatica overview. The Server reads Source Data from the Source, follows the Instructions stored in the Repository, and writes Transformed Data to the Target.]
Components
INFORMATICA PowerCenter has the following components:
• ODBC
• PowerCenter Server: an application that reads data from sources, transforms it and writes it to targets.
• PowerCenter Client: the client has five tools:
  - Source Analyzer: used to add source definitions to the repository.
  - Warehouse Designer: used to create targets and add their definitions to the repository.
  - Transformation Developer: used to create reusable transformations.
  - Mapplet Designer: used to create mapplets.
  - Mapping Designer: used to create mappings from sources to targets.
Connectivity And Set Up
Configuring Server Manager
• Informatica Server name
• Network protocol used to access the server: TCP/IP or IPX/SPX
• Port number on which the client communicates (for TCP/IP): 4001
• Address of the machine on which the server runs (for IPX/SPX)
• Timeout: number of seconds the Server Manager waits for a response from the Informatica Server
• Default directories for session files and caches, e.g. $PMRootDir, $PMSessionLogDir, $PMBadFileDir
• Defining Database Connections
• Defining FTP connections
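At the network level, the port and timeout settings above amount to an ordinary TCP connection attempt. A minimal Python sketch, assuming a hypothetical host name; it checks only that something is listening on the port and does not implement the Informatica Server protocol.

```python
import socket


def server_reachable(host, port=4001, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Only checks that something is listening on the port; it does not speak the
    Informatica Server protocol. The default port mirrors the setting listed above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Hypothetical host name; replace it with the machine that runs the Informatica Server.
# server_reachable("etl-server.example.com", port=4001, timeout=10)
```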
Features
• INFORMATICA Server: reads data from sources, transforms it as instructed by the repository metadata, and writes it to targets.
• Repository Manager: used to create and manage repositories.
The repository is a database containing a set of instructions that specify where to get data (source), how to process or transform it, and where to write it (target). This set of instructions is called metadata.
With the Repository Manager you can create repository users and groups, assign privileges and permissions, manage folders and locks, and import from and export to ODBC data sources.
• Designer: used to create mappings and target tables.
• Server Manager: used to create sessions and configure the schedule for running them.
Repository User Management
Multiple developers can use the same repository to create and manage multiple projects or the same project. Informatica allows you to create a separate user profile, with its own username and password, for each developer.
Privileges such as Administer Server, Create Sessions and Use Designer can be assigned to each user of the repository. Groups of users can be created and privileges granted to the groups. A user can be a member of one or more groups.
Access can be restricted to individual folders within a repository. The following folder permissions can be granted to the owner, the owner's group and all repository users:
• Read: view the folder and the objects within it.
• Write: create and edit objects within the folder.
• Execute: execute or schedule a session in the folder.
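The folder permission model can be sketched as a small Python class. The names (`Folder`, `User`, `allows`) are hypothetical, and the rule that a user receives the union of the permissions of every class they belong to is an assumption, not documented Informatica behaviour.

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = "read", "write", "execute"


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)  # a user can belong to several groups


@dataclass
class Folder:
    """Folder permissions granted to three classes of users (hypothetical model)."""
    name: str
    owner: str
    owner_group: str
    owner_perms: frozenset
    group_perms: frozenset
    repository_perms: frozenset  # every other repository user

    def allows(self, user, permission):
        # Assumption: a user gets the union of the permissions of every class they belong to.
        granted = set(self.repository_perms)
        if self.owner_group in user.groups:
            granted |= self.group_perms
        if user.name == self.owner:
            granted |= self.owner_perms
        return permission in granted


sales = Folder(
    name="SALES_DW",
    owner="alice",
    owner_group="developers",
    owner_perms=frozenset({READ, WRITE, EXECUTE}),
    group_perms=frozenset({READ, WRITE}),
    repository_perms=frozenset({READ}),
)
bob = User("bob", {"developers"})
carol = User("carol")
print(sales.allows(bob, WRITE), sales.allows(bob, EXECUTE), sales.allows(carol, WRITE))
# True False False
```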
Designer
• Creation of mappings
MAPPING
A mapping is a type of metadata that you create to specify how to move and transform data between sources and targets.
- Stored in the repository
A mapping includes:
- Sources
- Targets
- Transformations
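Conceptually, a mapping is a source, an ordered chain of transformations and a target. A minimal Python model under that assumption follows; the `Mapping` class and the sample transformations (including the 20% tax rate) are hypothetical and are not the repository's actual metadata format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Transformation = Callable[[Iterable[Row]], Iterable[Row]]


@dataclass
class Mapping:
    """Hypothetical model of a mapping: a source, ordered transformations, and a target."""
    source: Callable[[], Iterable[Row]]
    transformations: List[Transformation]
    target: List[Row] = field(default_factory=list)

    def run(self) -> int:
        rows = self.source()
        for transform in self.transformations:  # data flows through each step in order
            rows = transform(rows)
        written = list(rows)
        self.target.extend(written)
        return len(written)


def drop_zero_amounts(rows):
    return (r for r in rows if r["amount"] != 0)


def add_amount_with_tax(rows):
    return ({**r, "amount_with_tax": round(r["amount"] * 1.2, 2)} for r in rows)


target_table = []
mapping = Mapping(
    source=lambda: [{"id": 1, "amount": 10}, {"id": 2, "amount": 0}],
    transformations=[drop_zero_amounts, add_amount_with_tax],
    target=target_table,
)
print(mapping.run(), target_table)
# 1 [{'id': 1, 'amount': 10, 'amount_with_tax': 12.0}]
```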
[Figure: Sample Mapping]
Transformations
A transformation is a component of a mapping that describes how the Informatica Server should transform data.
There are two categories of transformations, depending on their scope:
• Standard transformation: created in a mapping and exists only within that mapping. It cannot be used in other mappings.
• Reusable transformation: created and stored independently in the repository. It can be used by all mappings.
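The difference in scope can be illustrated with a hypothetical in-memory repository: a reusable transformation is defined once and referenced by name, while a standard transformation is written inline in a single mapping. The names `REPOSITORY` and `build_mapping` are assumptions made for this sketch.

```python
# Hypothetical repository holding reusable transformations, shared by every mapping.
REPOSITORY = {
    "trim_name": lambda row: {**row, "name": row["name"].strip()},
}


def build_mapping(*steps):
    """A string step names a reusable transformation in the repository;
    a function step is a standard transformation local to this mapping."""
    def run(row):
        for step in steps:
            # Looked up at run time, so an edit to the repository copy is seen by every mapping.
            func = REPOSITORY[step] if isinstance(step, str) else step
            row = func(row)
        return row
    return run


customer_mapping = build_mapping("trim_name", lambda r: {**r, "name": r["name"].upper()})
supplier_mapping = build_mapping("trim_name")  # reuses the same repository transformation
print(customer_mapping({"name": "  acme "}), supplier_mapping({"name": " globex"}))
# {'name': 'ACME'} {'name': 'globex'}
```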
The following types of transformations are available:
• Expression: calculates a value or modifies text. Operates on individual rows.
• Aggregator: performs aggregate calculations. Operates on sets of rows.
• Source Qualifier: filters records read from relational sources only, and orders the records queried by the Informatica Server.
• Filter: filters records sent to the targets. Applicable to any source.
• Stored Procedure: calls a stored procedure.
• External Procedure / Advanced External Procedure: calls a procedure in a shared library (e.g. a DLL) or in the COM layer of Windows NT.
• Sequence Generator: generates primary keys.
• Rank: limits records to a top or bottom range.
• Normalizer: normalizes records, including those read from COBOL sources.
• Lookup: gets related values.
• Update Strategy: determines whether to insert, update, delete or reject data.
• Joiner: joins records from different databases or flat file systems.
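Several of these transformation types have direct analogues in plain Python. The sketch below chains an Expression, a Joiner, an Aggregator and a Rank, then uses a Lookup, a Sequence Generator and an Update Strategy to decide how to load each customer. All data and function names are hypothetical; only the flag names `DD_INSERT` and `DD_UPDATE` are borrowed from Informatica's update strategy constants.

```python
import heapq
from collections import defaultdict
from itertools import count

# Hypothetical source rows and a second source keyed by customer id.
orders = [
    {"order_id": 1, "cust_id": "C1", "qty": 2, "price": 10.0},
    {"order_id": 2, "cust_id": "C2", "qty": 1, "price": 99.0},
    {"order_id": 3, "cust_id": "C1", "qty": 5, "price": 4.0},
]
customers = {"C1": "Acme", "C2": "Globex"}


def expression(row):
    """Expression: compute a value for one row at a time."""
    return {**row, "amount": row["qty"] * row["price"]}


def joiner(rows, master):
    """Joiner: match rows from two sources on a key (inner join)."""
    return [{**r, "cust_name": master[r["cust_id"]]} for r in rows if r["cust_id"] in master]


def aggregator(rows, group_key, value_key):
    """Aggregator: operate on sets of rows, here SUM(value) per group."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_key]] += r[value_key]
    return dict(totals)


def rank(totals, n):
    """Rank: keep only the top n groups."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


# Sequence Generator: hands out surrogate keys, starting above the keys already in use.
next_key = count(start=2)
customer_dim = {"Acme": 1}  # existing target rows: name -> surrogate key


def update_strategy(name):
    """Look up the target, then flag the row for insert or update."""
    if name in customer_dim:  # Lookup: get the related value from the target
        return "DD_UPDATE", customer_dim[name]
    customer_dim[name] = next(next_key)
    return "DD_INSERT", customer_dim[name]


enriched = joiner([expression(o) for o in orders], customers)
totals = aggregator(enriched, "cust_name", "amount")
print(totals)                                      # {'Acme': 40.0, 'Globex': 99.0}
print(rank(totals, 1))                             # [('Globex', 99.0)]
print([update_strategy(name) for name in totals])  # [('DD_UPDATE', 1), ('DD_INSERT', 2)]
```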
Every mapping needs at least one Source Qualifier transformation, or a Normalizer transformation for COBOL sources.
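The practical difference between a Source Qualifier filter and a Filter transformation is where the rows are dropped. A sketch in Python, assuming a hypothetical `orders` table in SQLite and a hypothetical flat file: the Source Qualifier version pushes the condition into SQL, while the Filter version works on rows from any source after they are read.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 20.0), (2, "US", 99.0), (3, "EU", 5.0)],
)


def source_qualifier(conn, where=None, order_by=None):
    """Relational sources only: the filter and sort become part of the SQL query,
    so the database does the work before rows reach the pipeline.
    The clauses are trusted literals here; never build SQL from user input."""
    sql = "SELECT order_id, region, amount FROM orders"
    if where:
        sql += f" WHERE {where}"
    if order_by:
        sql += f" ORDER BY {order_by}"
    return conn.execute(sql).fetchall()


def filter_transformation(rows, predicate):
    """Any source: rows are read first, then dropped inside the pipeline."""
    return [row for row in rows if predicate(row)]


print(source_qualifier(conn, where="region = 'EU'", order_by="amount DESC"))
# [(1, 'EU', 20.0), (3, 'EU', 5.0)]

flat_file = io.StringIO("1,EU,20.0\n2,US,99.0\n3,EU,5.0\n")  # hypothetical flat-file source
print(filter_transformation(csv.reader(flat_file), lambda row: row[1] == "EU"))
# [['1', 'EU', '20.0'], ['3', 'EU', '5.0']]
```

Filtering in the Source Qualifier reduces the number of rows read from the database, while the Filter transformation is the option for sources that cannot run SQL.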
Ports
A port represents a single column of data. Every source definition, target definition and transformation contains a collection of ports.
There are four types of ports:
• Input port: receives data.
• Output port: provides data.
• Input/output port: passes data through.
• Variable port: stores intermediate components of an expression.
Source definitions contain only output ports, since they provide data.
Target definitions contain only input ports, since they receive data.
Transformations contain a combination of input, output and input/output ports, since they can pass data through unchanged or modify it, depending on the transformation type.
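The four port types can be mapped onto a single Python function that plays the role of an Expression transformation. The port names (`QTY`, `PRICE`, `DISCOUNT`, `NET_AMOUNT`) and the net-amount formula are hypothetical.

```python
def expression_transformation(row):
    """Hypothetical Expression transformation showing the four port types."""
    # Input ports: receive data from upstream; they are not passed on by themselves.
    qty, price, discount = row["QTY"], row["PRICE"], row["DISCOUNT"]
    # Variable port: holds an intermediate part of the expression and never leaves the transformation.
    v_gross = qty * price
    return {
        "ORDER_ID": row["ORDER_ID"],       # input/output port: passed through unchanged
        "NET_AMOUNT": v_gross - discount,  # output port: computed value sent downstream
    }


print(expression_transformation({"ORDER_ID": 7, "QTY": 2, "PRICE": 5.0, "DISCOUNT": 1.0}))
# {'ORDER_ID': 7, 'NET_AMOUNT': 9.0}
```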
Transformation Language
The transformation language is used to write expressions in transformations. It consists of SQL-like functions used to modify or validate data.
Expressions can be written in the following types of transformations:
Aggregator
Expression
Filter
Rank
Update Strategy
The transformation language consists of the following components:
• Functions: e.g. AVG, COUNT, ISNULL, SUBSTR, IIF
• Operators: e.g. addition, subtraction, multiplication, division
• Constants: e.g. built-in constants such as TRUE
• Variables: e.g. SYSDATE, which represents the current date
• Return values
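A few of the functions named above can be emulated in Python to show what they return. These are hypothetical stand-ins named after the Informatica functions (hence the uppercase names), not the real implementations; the SUBSTR handling of a start of 0 or a negative start is an assumption. Python evaluates both value arguments before calling this IIF.

```python
def IIF(condition, value1, value2):
    """Return value1 when condition is true, otherwise value2."""
    return value1 if condition else value2


def ISNULL(value):
    """True when the value is NULL (None in Python)."""
    return value is None


def SUBSTR(string, start, length=None):
    """Substring with a 1-based start position.

    Assumed edge cases: start 0 acts as 1, a negative start counts from the end.
    """
    if string is None:
        return None
    if start == 0:
        start = 1
    begin = start - 1 if start > 0 else max(len(string) + start, 0)
    end = None if length is None else begin + max(length, 0)
    return string[begin:end]


name = None
print(IIF(ISNULL(name), "UNKNOWN", name))  # UNKNOWN
print(SUBSTR("Informatica", 1, 4))          # Info
```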
Mapplets
A Mapplet is a reusable object created in a repository that represents a set of transformations.
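A mapplet bundles several transformations so they can be reused as one unit. A minimal Python sketch, assuming row-level transformations represented as functions; `make_mapplet` and the sample name-cleaning steps are hypothetical.

```python
from functools import reduce


def make_mapplet(*transformations):
    """Bundle an ordered set of row-level transformations into one reusable unit."""
    def mapplet(row):
        # Apply each transformation in order, feeding the output of one into the next.
        return reduce(lambda acc, transform: transform(acc), transformations, row)
    return mapplet


# Hypothetical mapplet that cleans a customer name, reused by two mappings.
clean_name = make_mapplet(
    lambda r: {**r, "name": r["name"].strip()},
    lambda r: {**r, "name": r["name"].title()},
)

orders_mapping = [clean_name({"name": "  acme corp "})]
returns_mapping = [clean_name({"name": "globex"})]
print(orders_mapping, returns_mapping)
# [{'name': 'Acme Corp'}] [{'name': 'Globex'}]
```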
Summary
Basic steps to create a project:
• Create the database that will contain the repository.
• Create the data model for the target.
• Create repositories.
• Create folders within the repositories.
• Import source definitions.
• Create targets that will receive data.
• Create mappings between sources and targets, including transformations that modify the data.
• Create source and target connections in the Server Manager.
• Create sessions for transferring data between source and target.
• Schedule and run sessions.