Informatica Fundamentalsetl


  • 8/6/2019 Informatica Fundamentalsetl


    Informatica Fundamentals

    1. Introduction

    Organizations have a number of ERP, CRM, SCM and Web application implementations and are hence burdened with the maintenance of these heterogeneous environments. To address the existing and evolving integration requirements, organizations need a reliable and scalable data integration architecture so that individual projects can build value on one another. Informatica provides a complete range of tools and data services needed to address the most complex data integration projects.

    2. Purpose and Intended Audience

    The purpose of this document is to provide an overview of the architecture of Informatica, its features, its working, and the advantages it offers vis-à-vis other data integration tools.

    This document is intended as reference material for members of the ETL team, to help them gain an initial understanding of the Architecture, Features and Working of Informatica. The Case Study provided herein should help the reader gain a good working knowledge of the application.

    3. Assumptions:

    To follow this document, the reader should have a sound knowledge of Data Warehousing concepts and some exposure to SQL as a database language. Knowledge of ODBC and basic networking is essential for installing Informatica, and knowledge of Unix and shells is helpful for Unix-based servers.

    4. Reference:

    Title Location


    5. Informatica in the Data Warehousing Scenario

    a) What is a Data Warehouse?

    A Data Warehouse is a Subject Oriented, Integrated, Non-volatile and Time Variant repository of data that is generally used for querying and analyzing past trends to support management decisions for the future.

    A Data Warehouse can be a relational database, multidimensional database, flat file, hierarchical database, object database, etc.

    Please refer to the following for more information on Data Warehousing concepts:

    http://www.dwinfocenter.org/

    DWH_Material_Presentation.ppt

    b) Stages in a typical Data Warehousing project

    i. Requirement Gathering

    The Project team will gather end user reporting requirements and the

    remaining period of the project would be dedicated to satisfying these

    requirements.

    ii. Identify the Business Areas.

    Identify the data that would be required by the Business.

    iii. Data Modeling

    The foundation of the data warehousing system is the data model. The first step in this stage is to build the Logical data model based on the user requirements, and the next step is to translate the Logical data model into a Physical data model.

    iv. ETL Process: ETL is the Data Warehouse acquisition process of Extracting, Transforming and Loading data from source systems into the data warehouse. This requires an understanding of the business rules and of the logical and physical data models, and involves getting the data from the source and populating it into the target.

    v. Reporting: Design, Develop and enable the end users to visualize the

    reports thereby bringing value to the Data Warehouse.
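
    As a rough illustration of the ETL stage described above, the sketch below extracts rows from a source table, applies a simple rule and loads the result into a target table. It is plain Python using the standard sqlite3 module, not Informatica itself; the table names (src_customer, dw_customer) and the rule (trim and upper-case the name) are assumptions made up for the example.

```python
import sqlite3

def run_etl(conn):
    """Extract from a source table, transform, and load into a target table."""
    # Extract: read the raw rows from the (hypothetical) source table.
    rows = conn.execute("SELECT id, name FROM src_customer").fetchall()
    # Transform: apply an assumed business rule (trim and upper-case names).
    cleaned = [(row_id, name.strip().upper()) for row_id, name in rows]
    # Load: write the transformed rows into the (hypothetical) target table.
    conn.executemany("INSERT INTO dw_customer (id, name) VALUES (?, ?)", cleaned)
    conn.commit()
    return len(cleaned)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE dw_customer (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src_customer VALUES (?, ?)",
                 [(1, " alice "), (2, "bob")])
loaded = run_etl(conn)
print(loaded, conn.execute("SELECT * FROM dw_customer ORDER BY id").fetchall())
```

    In an ETL tool the three steps are configured as sources, transformations and targets rather than written by hand, but the order of operations is the same.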


    c) What are the various ETL tools that are available?

    Selection of an ETL tool depends on various factors such as the complexity of the data transformation, data cleansing needs and the volume of data involved. The commonly used ETL tools are:

    Informatica

    Ab Initio. For information on Ab Initio as an ETL tool, refer to the link

    http://www.abinitio.com/abinitio/ab.nsf/index-flash

    For discussions on Ab Initio, refer to the link below:

    http://www.datawarehouse.com/forum/read.php?f=21&i=1921&t=1921

    Ascential DataStage. For information on Ascential DataStage, refer to the link below:

    http://www.ascential.com/products/ds_features.html

    Data Junction

    Reveleus

    d) What is Informatica?

    Informatica provides an environment that can extract data from multiple sources, transform the data according to the business logic that is built in the Informatica Client application, and load the transformed data into files or relational targets.

    Informatica comes in different packages:

    PowerCenter license has all options, including distributed metadata (data about data).

    PowerMart is a limited license and does not have distributed metadata.


    The other products provided by Informatica are:

    PowerAnalyzer, a web-based tool for data analysis.

    SuperGlue, which provides a graphical representation of data quality and flow, and flexible analysis and reporting of overall data volumes, loading performance, etc.

    6. Architecture:

    The diagram below provides an overview of the various components of Informatica and the connectivity between them:

    Informatica 5.1 provides the following integrated components:

    a) Informatica Repository:

    The Informatica Repository is a database with a set of metadata tables that is accessed by the Informatica Client and Server to save and retrieve metadata. The Repository stores the data needed for data extraction, transformation, loading, and management.

    b) Informatica Client:

    The Informatica Client is used to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create sessions to run the mapping logic. The Informatica Client has three main applications:

    i. Repository Manager: This is used to create and administer the metadata repository.

    The repository users and groups are created through the Repository Manager.


    Assigning privileges and permissions, managing folders in the repository and managing locks on the mappings are also done through the Repository Manager.

    ii. Designer: The Designer has five tools that are used to analyze sources, design target schemas and build the Source to Target mappings. These are:

    Source Analyzer: This is used to either import or create the source definitions.

    Warehouse Designer: This is used to import or create target definitions.

    Mapping Designer: This is used to create mappings that will be run by the Informatica Server to extract, transform and load data.

    Transformation Developer: This is used to develop reusable transformations that can be used in mappings.

    Mapplet Designer: This is used to create sets of transformations, referred to as Mapplets, which can be used across mappings.

    iii. Server Manager: The Server Manager is used to create, schedule, execute and monitor sessions.

    c) Informatica Server:

    The Informatica Server reads the mapping and the session information from the repository. It extracts data from the mapping sources, stores it in memory, applies the transformation rules and loads the transformed data into the mapping targets.

    Connectivity:

    Informatica uses network protocols, native drivers or ODBC for the connectivity between its various components. The connectivity details are as provided in the diagram above.

    7. Setting up Informatica:

    i. Install and configure the Server components.

    ii. Install the Client applications.

    iii. Configure the ODBC.

    iv. Register the Informatica Server in the Server Manager.

    v. Create a Repository, create users and groups, edit user profiles.

    vi. Add source and target definitions, set up mappings between the sources and targets, create a session for each mapping and run the sessions.

    a) Configuring the ODBC

    i. Go to Start > Settings > Control Panel.

    ii. Go to Administrative Tools > Data Sources (ODBC).


    iii. Click on the System DSN tab and add an entry.

    iv. Select MERANT CLOSED 3.60 32-BIT Oracle 8 driver.

    v. Provide any Data Source Name.

    vi. Provide the tns entry name for the (Informatica) database as the Server Name.

    vii. Do a test connect by providing the Informatica database userid and password.

    viii. Save the settings.

    b) Configuring the Informatica Repository

    i. Open the Repository Manager

    ii. Click on Repository > Add Repository.

    iii. Provide the Name of an existing Repository and its Username.

    iv. Click on Repository > Connect.

    v. Provide the password for the repository.

    vi. Provide the Informatica database details (those provided during the ODBC setup).

    vii. Open the Designer.

    viii. Click on Repository > Connect.

    ix. Provide the password for the repository.

    x. The left pane displays the various folders and the Sources, Targets, Mappings, Transformations, Mapplets etc. within each folder.

    xi. Click on the Mappings tab within any folder, select a mapping and drag it into the right pane to view the mapping.


    8. Case Study

    A Transformation is a repository object that generates, modifies, or passes data.

    The various Transformations that are provided by the Designer in Informatica are explained with the aid of a mapping, Map_CD_Country_code (explained in blue). The mapping is present in the cifSIT9i repository of the SIT machine under the folder Ecif_Dev_map.

    Objective: The mapping Map_CD_Country_code has been developed to extract data

    from the STG_COUNTRY table and move it into the ECIF_COUNTRY and the

    TRF_COUNTRY target tables.

    a) Source Definition:

    i. The Source Definition contains a detailed definition of the Source.

    ii. The Source can be a Relational table, Fixed width and delimited flat files that do not contain binary data, COBOL files etc.

    iii. The relational source definition is imported from database tables by connecting to the source database from the client machine.

    The Source in the Map_CD_Country_code is

    Shortcut_To_STG_COUNTRY*, a Source Definition Shortcut.

    Right click on the Source and select edit.

    In the Edit Transformations window, the Transformation tab has the

    following info:


    The circled area provides the location of the object that the shortcut references. In the above example, the object referenced by the shortcut is present in the cifSIT9i repository under the Ecif_dev_def folder and the object name is STG_COUNTRY.

    All fields from the Source are moved into the Source Qualifier.

    *For information on the Naming Standard, please refer to the document embedded below:

    Informatica_ETL_Naming_Conventions.d

    P.N: The Naming standards provided in the document indicate generic standards that CAN be followed while designing a mapping.

    What are the advantages of having a Shortcut?

    The following are the main advantages of having a Shortcut:

    The main advantage of having a shortcut is maintenance. If all instances of an object have to change, the original repository object is the only object that has to be edited, and all shortcuts accessing the object automatically inherit the changes.

    Repository users can be restricted to a set of predefined metadata by asking them to incorporate the shortcuts into their work instead of developing repository objects independently.

    Space can be saved in a repository by keeping a single repository object and using shortcuts to that object, instead of creating copies of the object in multiple folders.
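
    The maintenance advantage can be pictured with a small Python sketch: a shortcut behaves like a reference to the original object rather than a copy of it. The class names (SourceDefinition, Shortcut) are hypothetical stand-ins for the idea, not Informatica objects or APIs.

```python
from dataclasses import dataclass, field

@dataclass
class SourceDefinition:
    """Hypothetical stand-in for a repository object kept in a shared folder."""
    name: str
    columns: list = field(default_factory=list)

@dataclass
class Shortcut:
    """Hypothetical shortcut: holds a reference to the original, not a copy."""
    target: SourceDefinition

    @property
    def columns(self):
        # Always read through to the original object.
        return self.target.columns

original = SourceDefinition("STG_COUNTRY", ["ISO_CTRY_COD", "CTRY_NAM"])
shortcut_a = Shortcut(original)
shortcut_b = Shortcut(original)

# Editing the original once is visible through every shortcut.
original.columns.append("EMU_IND")
print(shortcut_a.columns == shortcut_b.columns == original.columns)
```

    Because no copy of the column list is made, there is also only one object to store, which mirrors the space saving mentioned above.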

    For information on creating and working with Shortcuts, refer to the Informatica Designer Help.

    b) Source Qualifier (SQ_Shortcut_To_STG_COUNTRY):

    i. The Source Qualifier is an Active transformation.

    ii. The differences between an Active and a Passive transformation are as given below:

    Active Transformation                       Passive Transformation
    An Active Transformation can change the     A Passive Transformation does not change
    number of rows that pass through it.        the number of rows that pass through it.
    Ex.: Advanced External Procedure            Ex.: Expression
         Aggregator                                  External Procedure
         ERP Source Qualifier                        Input
         Filter                                      Lookup
         Joiner                                      Output
         Normalizer                                  Sequence Generator
         Rank                                        Stored Procedure
         Source Qualifier                            XML Source Qualifier
         Router
         Update Strategy
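
    The distinction can be shown with two small Python functions. This is only a sketch of the row-count behaviour, not of how Informatica implements transformations; the sample rows and function names are made up for the example.

```python
rows = [
    {"ISO_CTRY_COD": "IN", "CTRY_NAM": "India"},
    {"ISO_CTRY_COD": None, "CTRY_NAM": "Unknown"},
    {"ISO_CTRY_COD": "FR", "CTRY_NAM": "France"},
]

def expression(rows):
    """Passive: every input row produces exactly one output row."""
    return [dict(row, CTRY_NAM=row["CTRY_NAM"].upper()) for row in rows]

def filter_not_null(rows):
    """Active: rows failing the condition are dropped, so the count can change."""
    return [row for row in rows if row["ISO_CTRY_COD"] is not None]

print(len(rows), len(expression(rows)), len(filter_not_null(rows)))
```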

    In the SQ_Shortcut_To_STG_COUNTRY, click on the Properties tab > SQL Query.

    The SQL Query is the default query generated by Informatica: a SELECT statement that includes each source column used in the mapping. The Informatica Server reads only the columns in the Source Qualifier that are connected to another transformation.

    In SQ_Shortcut_To_STG_COUNTRY, all 4 columns (ISO_CTRY_COD, CTRY_NAM, EMU_IND, PROC_FLG) are connected to the EXP_COUNTRY transformation, and hence the default SQL Query generated by Informatica has all 4 columns. If one of the fields had not been mapped to any other transformation, that field would not have appeared in the default SQL Query.
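
    The rule described above, that only connected columns appear in the default query, can be sketched in Python as follows. The function name default_sql_query is hypothetical and the exact SQL text that Informatica generates may differ; the sketch only shows how the column list follows the connected ports.

```python
def default_sql_query(table, ports, connected):
    """Build a SELECT that lists only the ports connected downstream."""
    selected = [port for port in ports if port in connected]
    if not selected:
        raise ValueError("no connected ports to select")
    columns = ", ".join(f"{table}.{port}" for port in selected)
    return f"SELECT {columns} FROM {table}"

ports = ["ISO_CTRY_COD", "CTRY_NAM", "EMU_IND", "PROC_FLG"]
all_connected = default_sql_query("STG_COUNTRY", ports, set(ports))
one_unconnected = default_sql_query("STG_COUNTRY", ports, set(ports) - {"EMU_IND"})
print(all_connected)
print(one_unconnected)
```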

    The ISO_CTRY_COD field from the Source Qualifier is moved to the Lookup transformation LKP_CTRY_COD, and all the fields, including ISO_CTRY_COD, are moved to the Expression transformation EXP_COUNTRY.


    c) Lookup Transformation (LKP_CTRY_COD)

    i. Lookup transformation is a Passive transformation.

    ii. A Lookup transformation is used in an Informatica mapping to look up data in a relational table, view, or synonym.

    iii. The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. The result of the Lookup is then passed on to other transformations and targets.

    In the Lookup transformation LKP_CTRY_COD, the input field SRC_COUNTRY_CODE is looked up against the COUNTRY_CODE field of the Lookup table and, if the Lookup is successful, the corresponding COUNTRY_CODE is returned as the output.

    For more info on the Lookup transformation and on Lookup caches, refer to the Informatica Designer Help and also the attached doc.

    Lookup_Cache.doc

    How does the Lookup Cache work?

    Informatica creates a data cache and an index cache when the first row in the data flow hits the Lookup transformation. This happens only when the Lookup cache option is enabled in the transformation properties.

    To create these caches, Informatica issues a SELECT statement against the database where the lookup table resides and extracts all the data it needs for the lookup. After that, whenever a row passes through the lookup, Informatica tries to find a match within the cached data set based on the lookup conditions and input port values for that row.

    When the cache option is disabled, Informatica queries the lookup table every time a row passes through the lookup.
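
    The difference between the two modes can be sketched in Python with the standard sqlite3 module. This is an illustration only: the table and column names are made up, and the separate data and index caches are collapsed into a single dictionary for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (country_code TEXT, country_name TEXT)")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("IN", "India"), ("FR", "France")])

def build_cache(conn):
    """Cached mode: one SELECT up front, then every row is matched in memory."""
    rows = conn.execute("SELECT country_code, country_name FROM country")
    return {code: name for code, name in rows}

def lookup_cached(cache, code):
    return cache.get(code)  # None when there is no match

def lookup_uncached(conn, code):
    """Uncached mode: the lookup table is queried for every input row."""
    row = conn.execute(
        "SELECT country_name FROM country WHERE country_code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

cache = build_cache(conn)
for code in ["IN", "FR", "XX"]:
    print(code, lookup_cached(cache, code), lookup_uncached(conn, code))
```

    Both modes return the same answers; the trade-off is one large read at the start against one small query per row.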

    Advantages of Lookup transformation over Source Qualifier/Joiner transformation

    Lookup transformation helps in fetching data from a table exactly where we need it in the data stream, instead of having to pass the data through every step of the mapping, as it would with a Source Qualifier or a Joiner transformation.

    How do we handle multiple matches in the Lookup table?

    The Lookup transformation can be configured to handle multiple matches in the

    following ways:

    Return the first matching value, or return the last matching value: The transformation can be configured to return the first matching value or the last matching value. The first and last values are the first and last values found in the lookup cache that match the lookup condition.

    Return an error: The Informatica Server returns the default value for the output ports.
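
    A minimal Python sketch of the three options follows. The function name lookup and the policy strings are assumptions for the example; the "error" branch simply returns the default value, following the description above, and does not model any logging the server may do.

```python
def lookup(cache_rows, key, policy="first", default=None):
    """Return one value for `key` from a list of (key, value) cache rows.

    policy is "first", "last" or "error"; with "error", multiple matches
    yield `default`, as described in the text above.
    """
    if policy not in ("first", "last", "error"):
        raise ValueError(f"unknown policy: {policy}")
    matches = [value for k, value in cache_rows if k == key]
    if not matches:
        return default
    if len(matches) > 1 and policy == "error":
        return default
    return matches[-1] if policy == "last" else matches[0]

cache_rows = [("IN", "India"), ("IN", "Bharat"), ("FR", "France")]
print(lookup(cache_rows, "IN", "first"))
print(lookup(cache_rows, "IN", "last"))
print(lookup(cache_rows, "IN", "error", default="N/A"))
```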


    d) Expression Transformation (EXP_COUNTRY)

    i. Expression transformation is a Passive transformation.

    All fields from the Source Qualifier are moved into the Expression transformation. The COUNTRY_CODE that is the output of the Lookup transformation is also moved into the Expression transformation.

    O_PROC_FLAG has been set to Y in the Expression transformation.

    All fields from the Expression transformation except the PROC_FLG field are moved into the Filter transformations FIL_NOTNULL_CTRY_COD and FIL_NULL_CTRY_COD.

    e) Filter Transformation (FIL_NOTNULL_CTRY_COD)

    Filter transformation is an Active transformation.

    The COUNTRY_CODE field is checked for NOT NULL and, if found true, the records are passed on to the Update Strategy UPD_COUNTRY_CODE, the Lookup transformation LKPTRANS and the Update Strategy UPD_UPD_STG_COUNTRY.

    f) Update Strategy Transformation (UPD_COUNTRY_CODE)

    i. Update Strategy transformation is an Active transformation.

    The ISO_CTRY_COD, CTRY_NAM, EMU_IND fields are moved to the Update Strategy transformation from the FIL_NOTNULL_CTRY_COD transformation.

    Click on the Properties tab

    Update Strategy Expression is DD_UPDATE.

    Forward Rejected Rows option is selected.

    ii. Update Strategy Expression is used to flag individual records for insert, delete, update or reject.

    iii. The table below lists the constants for each database operation and the numerical equivalent:

    Operation   Constant    Numeric Value
    Insert      DD_INSERT   0
    Update      DD_UPDATE   1
    Delete      DD_DELETE   2
    Reject      DD_REJECT   3

    iv. A session can also be configured for handling specific database operations. This is done by setting the "Treat rows as" field in the Session Wizard dialog box that appears during session configuration.

    Open the Server Manager.

    Click on cifSIT9i under the Repositories tab
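
    The row flags listed in the table above can be mirrored in a short Python sketch. The enum values come from the table; everything else (the RowFlag and apply_row names, and a dict standing in for the target table) is assumed for illustration and is not Informatica code.

```python
from enum import IntEnum

class RowFlag(IntEnum):
    """Numeric values as listed in the constants table."""
    DD_INSERT = 0
    DD_UPDATE = 1
    DD_DELETE = 2
    DD_REJECT = 3

def apply_row(target, key, row, flag):
    """Apply one flagged row to a dict standing in for the target table."""
    if flag == RowFlag.DD_INSERT:
        target[key] = row
    elif flag == RowFlag.DD_UPDATE:
        if key in target:
            target[key] = row
    elif flag == RowFlag.DD_DELETE:
        target.pop(key, None)
    # DD_REJECT: the row is not written to the target.

target = {"IN": {"CTRY_NAM": "India"}}
apply_row(target, "FR", {"CTRY_NAM": "France"}, RowFlag.DD_INSERT)
apply_row(target, "IN", {"CTRY_NAM": "INDIA"}, RowFlag.DD_UPDATE)
apply_row(target, "XX", {"CTRY_NAM": "Nowhere"}, RowFlag.DD_REJECT)
print(target)
```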


    g) Update Strategy Transformation (UPD_UPD_STG_COUNTRY)

    This receives the ISO_CTRY_COD and PROC_FLG fields from the filter transformation FIL_NOTNULL_CTRY_COD when the COUNTRY_CODE is NOT NULL.

    This updates the target table Shortcut_To_STG_COUNTRY which is a shortcut to the STG_COUNTRY table.

    h) Lookup Transformation (LKPTRANS)

    The ISO_CTRY_COD from the filter transformation

    FIL_NOTNULL_CTRY_COD is brought as input to the Lookup transformation.

    ISO_CTRY_COD as SRC_ISO_CTRY_COD is looked up against the ISO_CTRY_COD of the TRF_COUNTRY lookup table and, if the Lookup is successful, the corresponding ISO_CTRY_COD of the lookup table is taken as the output.

    The output of the Lookup table is passed to the Filter transformations

    FIL_NULL_TRF_CTRY_COD and FIL_NOTNULL_TRF_CTRY_COD.


    i) Filter Transformation (FIL_NULL_TRF_CTRY_COD)

    This transformation receives the ISO_CTRY_COD from the Lookup transformation LKPTRANS and the rest of the fields from the Filter transformation FIL_NOTNULL_CTRY_COD.

    The ISO_CTRY_COD field which is the output of the previous lookup is checked for NULL and, if found to be NULL, the records are inserted into the target Shortcut_To_TRF_COUNTRY which is a Shortcut to the TRF_COUNTRY table.

    j) Filter Transformation (FIL_NOTNULL_TRF_CTRY_COD)

    This transformation receives the ISO_CTRY_COD from the Lookup

    transformation LKPTRANS and the rest of the fields from the Filter

    transformation FIL_NOTNULL_CTRY_COD.

    The ISO_CTRY_COD field which is the output of the previous lookup is checked

    for NOT NULL and if found to be NOT NULL, the records are passed on to the

    Update Strategy UPD_TRF_CTRY_COD.

    k) Update Strategy Transformation (UPD_TRF_CTRY_COD)

    This is used to update the target table Shortcut_To_TRF_COUNTRY, which is a Shortcut to the TRF_COUNTRY table.

    l) Filter Transformation (FIL_NULL_CTRY_COD)

    The COUNTRY_CODE field is checked for NULL and if found true, the records

    are passed on to the Lookup transformation LKPTRANS1 and the Update

    Strategy UPD_INS_STG_COUNTRY.

    The records are also inserted into the target table Shortcut_To_ECIF_COUNTRY

    which is a shortcut to the ECIF_COUNTRY table.

    m) Update Strategy Transformation (UPD_INS_STG_COUNTRY)

    This receives the ISO_CTRY_COD and PROC_FLG fields from the filter

    transformation FIL_NULL_CTRY_COD when the COUNTRY_CODE is NULL.

    This inserts a record into the target table Shortcut_To_STG_COUNTRY which is

    a shortcut to the STG_COUNTRY table.

    n) Lookup Transformation (LKPTRANS1)

    The ISO_CTRY_COD from the filter transformation FIL_NULL_CTRY_COD is

    brought as input to the Lookup transformation.

    ISO_CTRY_COD as SRC_ISO_CTRY_COD is looked up against the ISO_CTRY_COD of the TRF_COUNTRY lookup table and, if the Lookup is successful, the corresponding ISO_CTRY_COD of the lookup table is taken as the output.

    The output of the Lookup table is passed to the Filter transformations

    FIL_NULL_TRF_CTRY_COD2 and FIL_NOTNULL_TRF_CTRY_COD2.


    o) Filter Transformation (FIL_NULL_TRF_CTRY_COD2)

    This transformation receives the ISO_CTRY_COD from the Lookup

    transformation LKPTRANS1 and the rest of the fields from the Filter

    transformation FIL_NULL_CTRY_COD.

    The ISO_CTRY_COD1 field which is the output of the previous lookup is checked for NULL and, if found to be NULL, the records are inserted into the target Shortcut_To_TRF_COUNTRY which is a Shortcut to the TRF_COUNTRY table.

    p) Filter Transformation (FIL_NOTNULL_TRF_CTRY_COD2)

    This transformation receives the ISO_CTRY_COD from the Lookup

    transformation LKPTRANS1 and the rest of the fields from the Filter

    transformation FIL_NULL_CTRY_COD.

    The ISO_CTRY_COD1 field which is the output of the previous lookup is

    checked for NOT NULL and if found to be NOT NULL, the records are passed on

    to the Update Strategy UPD_TRF_CTRY_COD2.

    q) Update Strategy Transformation (UPD_TRF_CTRY_COD2)

    This is used to update the target table Shortcut_To_TRF_COUNTRY, which is a

    Shortcut to the TRF_COUNTRY table.
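
    Taken together, the lookups, filters and update strategies above decide, for each target, whether a staging row is inserted or updated. The Python sketch below condenses that routing into one function. It is a simplification under stated assumptions: the dicts stand in for the ECIF_COUNTRY and TRF_COUNTRY tables, both are assumed to be keyed on ISO_CTRY_COD, and the function name route_row is made up for the example. The real mapping uses a separate transformation for each branch.

```python
def route_row(row, ecif, trf):
    """Decide insert vs. update for one staging row, per target table.

    `ecif` and `trf` are dicts keyed by ISO_CTRY_COD, standing in for the
    ECIF_COUNTRY and TRF_COUNTRY tables (an assumption of this sketch).
    """
    key = row["ISO_CTRY_COD"]
    actions = []
    for name, table in (("ECIF_COUNTRY", ecif), ("TRF_COUNTRY", trf)):
        # Lookup plus NULL / NOT NULL filter pair: is the key already present?
        action = "update" if key in table else "insert"
        table[key] = {"CTRY_NAM": row["CTRY_NAM"], "EMU_IND": row["EMU_IND"]}
        actions.append((name, action))
    # Mark the staging row as processed, like O_PROC_FLAG being set to Y.
    row["PROC_FLG"] = "Y"
    return actions

ecif = {"IN": {"CTRY_NAM": "India", "EMU_IND": "N"}}
trf = {}
row = {"ISO_CTRY_COD": "IN", "CTRY_NAM": "INDIA", "EMU_IND": "N", "PROC_FLG": "N"}
actions = route_row(row, ecif, trf)
print(actions)
```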

    Stored Procedure Transformation (PR_COMP_COUNTRY)

    i. A Stored Procedure is a Passive transformation.

    ii. A Stored Procedure can be run with the following options:

    Normal

    Pre-load of the Source.

    Post-load of the Source.

    Pre-load of the Target.

    Post-load of the Target.

    iii. Pre-load of the Source is when the Stored Procedure runs before the session retrieves data from the source.

    The Stored Procedure PR_COMP_COUNTRY is called as a Source Pre Load

    procedure.

    What is a Sequence Generator Transformation?

    The Sequence Generator transformation is an object in Informatica which outputs

    a unique sequential number to each dataflow that it is attached to.

    The starting value and the increment are set in the Sequence Generator

    transformation and the NEXTVAL is connected to the dataflow.

    A Sequence Generator is normally placed after a filter (generally a filter that checks the primary key value of the target for NULL, which would indicate that the record is new) and before an update strategy that is set to DD_INSERT.


    If multiple Informatica mappings write to the same target table, the Sequence Generator should be used as a reusable object or a shortcut.

    If non-Informatica routines write to the same target table, using a trigger or a database method is recommended.
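
    The behaviour of NEXTVAL, a start value plus a fixed increment handed out to each new row, can be sketched in Python with itertools.count. The function name sequence_generator and the COUNTRY_ID key column are hypothetical, chosen only for the example.

```python
from itertools import count

def sequence_generator(start=1, increment=1):
    """Yield NEXTVAL-style numbers from a start value and an increment."""
    return count(start, increment)

nextval = sequence_generator(start=100, increment=1)
new_rows = [{"CTRY_NAM": "India"}, {"CTRY_NAM": "France"}]
# Assign a surrogate key to each new row before it is flagged for insert.
keyed = [dict(row, COUNTRY_ID=next(nextval)) for row in new_rows]
print(keyed)
```

    A single shared generator is what keeps the numbers unique, which is why the text above recommends a reusable object or shortcut when several mappings write to the same table.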

    The document provided below highlights the Best Practices that can be taken into

    consideration either while designing mappings or when running sessions.

    Informatica_Tuning_Guide.doc

    For info on the features in Informatica PowerCenter 6.2, refer to the link below:

    http://www.itap.purdue.edu/ea/files/PMPC-62_release%20notes%20for%206.2.pdf

    Please refer to the link below for enhancements related to Informatica PowerCenter 7.1:

    http://www.csn.no/nyhetsbrev/0402NyhetsbrevInfa_files/whats_new_PC7_dec2003.pdf
