

  • FINAL INTERVIEW QUESTIONS ( ETL - INFORMATICA)

    Data warehousing Basics

    1. Definition of data warehousing? Data warehouse is a Subject oriented, Integrated, Time variant, Non volatile collection

    of data in support of management's decision making process.

    Subject Oriented Data warehouses are designed to help you analyze data. For example, to learn more

    about your company's sales data, you can build a warehouse that concentrates on sales.

    Using this warehouse, you can answer questions like "Who was our best customer for this

    item last year?" This ability to define a data warehouse by subject matter, sales in this case

    makes the data warehouse subject oriented.

    Integrated Integration is closely related to subject orientation. Data warehouses must put data

    from disparate sources into a consistent format. They must resolve such problems as

    naming conflicts and inconsistencies among units of measure. When they achieve this, they

    are said to be integrated.

    Nonvolatile Nonvolatile means that, once entered into the warehouse, data should not

    change. This is logical because the purpose of a warehouse is to enable you to analyze

    what has occurred.

    Time Variant In order to discover trends in business, analysts need large amounts of data. This

    is very much in contrast to online transaction processing (OLTP) systems, where

    performance requirements demand that historical data be moved to an archive. A data

    warehouse's focus on change over time is what is meant by the term time variant.

    2. How many stages are there in data warehousing?

    A data warehouse implementation generally includes two stages:
    1. ETL
    2. Report generation

    ETL

    Short for extract, transform, load, three database functions that are combined into one tool

    Extract -- the process of reading data from a source database.

    Transform -- the process of converting the extracted data from its previous form

    into required form

    Load -- the process of writing the data into the target database.

    ETL is used to migrate data from one database to another, to form data marts and data warehouses, and also to convert databases from one format to another.

    It is used to retrieve data from various operational databases, transform it into useful information, and finally load it into the data warehousing system.

    Popular ETL tools include:

    1. Informatica

    2. Ab Initio

    3. DataStage

    4. BODI

    5. Oracle Warehouse Builder

    Report generation

    In report generation, OLAP (online analytical processing) is used. It is a set of specifications which allows client applications to retrieve data for analytical processing.

    It is a specialized tool that sits between a database and the user in order to provide various analyses of the data stored in the database.

    An OLAP tool is a reporting tool which generates reports that are useful for decision support for top-level management. Examples of OLAP tools:

    1. Business Objects

    2. Cognos

    3. Micro strategy

    4. Hyperion

    5. Oracle Express

    6. Microsoft Analysis Services

    Difference between OLTP and OLAP

    1. OLTP is application oriented (e.g., a purchase order is functionality of an application); OLAP is subject oriented (subject in the sense of customer, product, item, time).

    2. OLTP is used to run the business; OLAP is used to analyze the business.

    3. OLTP holds detailed data; OLAP holds summarized data.

    4. OLTP access is repetitive; OLAP access is ad hoc.

    5. OLTP accesses few records at a time (tens) with simple queries; OLAP accesses large volumes at a time (millions) with complex queries.

    6. OLTP uses a small database; OLAP uses a large database.

    7. OLTP holds current data; OLAP holds historical data.

    8. OLTP serves clerical users; OLAP serves knowledge users.

    9. OLTP loads row by row; OLAP uses bulk loading.

    10. OLTP is time invariant; OLAP is time variant.

    11. OLTP data is normalized; OLAP data is de-normalized.

    12. OLTP uses an E-R schema; OLAP uses a star schema.

  • 3. What are the types of datawarehousing?

    EDW (Enterprise data warehouse)

    It provides a central database for decision support throughout the enterprise.
    It is a collection of data marts.

    DATAMART

    It is a subset of the data warehouse.
    It is a subject-oriented database which supports the needs of individual departments in an organization.
    It is called a high-performance query structure.
    It supports a particular line of business, like sales, marketing, etc.

    ODS (Operational data store)

    It is defined as an integrated view of operational databases designed to support operational monitoring.
    It is a collection of operational data sources designed to support transaction processing.
    Data is refreshed near real-time and used for business activity.
    It is an intermediate layer between OLTP and OLAP which helps to create instant reports.

    4. What are the models involved in Data Warehouse Architecture?

  • 5. What are the types of Approach in DWH?

    Bottom-up approach: first we develop the data marts, then we integrate those data marts into the EDW.

    Top-down approach: first we develop the EDW, then from that EDW we develop the data marts.

    Bottom up: OLTP -> ETL -> Data mart -> DWH -> OLAP

    Top down: OLTP -> ETL -> DWH -> Data mart -> OLAP

    Top down: the cost of initial planning & design is high, and it takes a longer duration, often more than a year.

    Bottom up: planning & designing the data marts proceeds without waiting for the global warehouse design; it gives immediate results from the data marts; it tends to take less time to implement; errors in critical modules are detected earlier; benefits are realized in the early phases. It is generally considered the best approach.

    Data Modeling Types:
    1. Conceptual Data Modeling
    2. Logical Data Modeling
    3. Physical Data Modeling
    4. Dimensional Data Modeling

    1. Conceptual Data Modeling

    A conceptual data model includes all major entities and relationships and does not contain much detailed information about attributes; it is often used in the INITIAL PLANNING PHASE.

    The conceptual data model is created by gathering business requirements from various sources like business documents, discussions with functional teams, business analysts, subject matter experts and end users who do the reporting on the database. Data modelers create the conceptual data model and forward that model to the functional team for their review.

    Conceptual data modeling gives the functional and technical team an idea of how the business requirements will be projected in the logical data model.

    2. Logical Data Modeling This is the actual implementation and extension of a conceptual data model. Logical

    data model includes all required entities, attributes, key groups, and relationships

    that represent business information and define business rules.

    3. Physical Data Modeling

    A physical data model includes all required tables, columns, relationships, and database properties for the physical implementation of the database. Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.

    Logical vs. Physical Data Modeling

    A logical data model represents business information and defines business rules; a physical data model represents the physical implementation of the model in a database.

    Entity -> Table
    Attribute -> Column
    Primary Key -> Primary Key Constraint
    Alternate Key -> Unique Constraint or Unique Index
    Inversion Key Entry -> Non-Unique Index
    Rule -> Check Constraint, Default Value
    Relationship -> Foreign Key
    Definition -> Comment

    Dimensional Data Modeling

    A dimensional model consists of fact and dimension tables. It is an approach to developing the schema (DB) design.

    Types of dimensional modeling:
    1. Star schema
    2. Snowflake schema
    3. Star flake schema (or) hybrid schema
    4. Multi star schema

    What is Star Schema?

    The star schema is a logical database design which contains a centrally located fact table surrounded by one or more dimension tables. Since the database design looks like a star, it is called a star schema.

    The dimension tables contain primary keys and textual descriptions; they contain de-normalized business information. A fact table contains a composite key and measures. The measures are key performance indicators which are used to evaluate the enterprise performance in the form of success and failure. Eg: total revenue, product sales, discount given, number of customers.

    To generate a meaningful report, the report should contain at least one dimension and one fact table.

    The advantages of a star schema:
    Fewer joins
    Improved query performance
    Slicing down
    Easy understanding of data

    Disadvantage: requires more storage space.

  • Example of Star Schema:
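    The original example diagram is not reproduced here. As a hedged illustration (the table and column names below are hypothetical, not taken from the source), a minimal sales star schema could be defined in SQL like this:

    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,   -- surrogate key
        product_name  VARCHAR(100),
        category      VARCHAR(50)            -- hierarchy kept de-normalized in the dimension
    );

    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        month_name    VARCHAR(20),
        year_no       INTEGER
    );

    -- Central fact table: composite key of dimension keys plus additive measures
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES dim_product (product_key),
        date_key      INTEGER REFERENCES dim_date (date_key),
        sales_amount  DECIMAL(12,2),
        quantity_sold INTEGER,
        PRIMARY KEY (product_key, date_key)
    );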

    Snowflake Schema

    In a star schema, the dimension tables may be split into one or more additional dimension tables; the de-normalized dimension tables are split into normalized dimension tables.

    Example of Snowflake Schema:

    In the snowflake schema, the example diagram (not reproduced here) has 4 dimension tables, 4 lookup tables and 1 fact table. The reason is that the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME) respectively and stored separately.

    It increases the number of joins and gives poorer performance when retrieving data. In a few organizations, they try to normalize the dimension tables to save space. Since dimension tables hold comparatively little space, the snowflake schema approach may be avoided. Bitmap indexes cannot be effectively utilized.
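    As a minimal sketch of the normalization described above (names are hypothetical and follow the star schema sketch earlier), the product dimension could be split so that its category hierarchy moves into a lookup table:

    CREATE TABLE lkp_category (
        category_key  INTEGER PRIMARY KEY,
        category_name VARCHAR(50)
    );

    -- The de-normalized category text is replaced by a foreign key,
    -- which adds one more join to every query that needs the category name.
    CREATE TABLE dim_product_snow (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        category_key  INTEGER REFERENCES lkp_category (category_key)
    );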

    Important aspects of Star Schema & Snowflake Schema
    In a star schema every dimension will have a primary key.
    In a star schema, a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables.
    Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level to the lowermost level.

    Star flake schema (or) Hybrid schema

    A hybrid schema is a combination of the star and snowflake schemas.

    Multi star schema

    Multiple fact tables share a set of dimension tables.

    Conformed dimensions are nothing but reusable dimensions: the dimensions which you are using multiple times or in multiple data marts. They are common across different data marts.

    Measure Types (or) Types of Facts

    Additive - Measures that can be summed up across all dimensions.

    o Ex: Sales Revenue

    Semi Additive - Measures that can be summed up across few dimensions and not

    with others

    o Ex: Current Balance

    Non Additive - Measures that cannot be summed up across any of the dimensions.

    o Ex: Student attendance

    Surrogate Key
    Joins between fact and dimension tables should be based on surrogate keys.
    Users should not be able to obtain any information by looking at these keys.
    These keys should be simple integers.

  • A sample data warehouse schema

    WHY DO WE NEED A STAGING AREA FOR A DWH?

    The staging area is needed to clean operational data before loading it into the data warehouse. Cleaning here means merging data which comes from different sources. It is the area where most of the ETL is done.

    Data Cleansing
    It is used to remove duplicates.
    It is used to correct wrong email addresses.
    It is used to identify missing data.
    It is used to convert data types.
    It is used to capitalize names & addresses.
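    For illustration only (the staging table and its columns are hypothetical), typical cleansing steps of this kind can be expressed in SQL:

    -- Standardize case and trim names and addresses
    UPDATE stg_customer
    SET    cust_name = UPPER(TRIM(cust_name)),
           city      = UPPER(TRIM(city));

    -- Identify duplicated e-mail addresses before loading
    SELECT email, COUNT(*) AS dup_count
    FROM   stg_customer
    GROUP BY email
    HAVING COUNT(*) > 1;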

    Types of Dimensions:

    There are several types of dimensions:

    Conformed dimensions
    Junk dimensions (garbage dimensions)
    Degenerate dimensions
    Slowly changing dimensions

    A conformed dimension is something which can be shared by multiple fact tables or multiple data marts.
    A junk (garbage) dimension is a grouping of flag values.
    A degenerate dimension is something dimensional in nature but existing in the fact table (e.g., Invoice No).

    It is neither a fact nor strictly a dimension attribute, but it is useful for some kinds of analysis. Such attributes are kept in the fact table and called a degenerate dimension.

    Degenerate dimension: A column of the key section of the fact table that does not

    have the associated dimension table but used for reporting and analysis, such column is

    called degenerate dimension or line item dimension.

    For ex, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no,

    and date in key section and price, quantity, amount in measure section. In this fact table,

    bill_no from the key section is a single value; it has no associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate dimension or line item dimension.
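    A hedged sketch of the fact table just described, using the column names from the example (data types are assumptions); bill_no sits in the fact table with no dimension table of its own:

    CREATE TABLE fact_billing (
        customer_id INTEGER,
        product_id  INTEGER,
        branch_id   INTEGER,
        employee_id INTEGER,
        bill_no     VARCHAR(20),   -- degenerate / line item dimension: no dim_bill table exists
        bill_date   DATE,
        price       DECIMAL(10,2),
        quantity    INTEGER,
        amount      DECIMAL(12,2)
    );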

    Informatica Architecture

  • The Power Center domain

    It is the primary unit of administration. An installation can have a single domain or multiple domains. A domain is a collection of nodes and services.

    Nodes A node is the logical representation of a machine in a domain

    One node in the domain acts as a gateway node to receive service requests from clients and

    route them to the appropriate service and node

    Integration Service:

    Integration Service does all the real job. It extracts data from sources, processes it as per

    the business logic and loads data to targets.

    Repository Service:

    Repository Service is used to fetch the data from the repository and sends it back to

    the requesting components (mostly client tools and integration service)

    Power Center Repository:

    Repository is nothing but a relational database which stores all the metadata created

    in Power Center.

    Power Center Client Tools:

    The Power Center Client consists of multiple tools.

    Power Center Administration Console:

    This is simply a web-based administration tool you can use to administer the Power Center

    installation.

  • Q. How can you define a transformation? What are different types of

    transformations available in Informatica?

    A. A transformation is a repository object that generates, modifies, or passes data. The

    Designer provides a set of transformations that perform specific functions. For example, an

    Aggregator transformation performs calculations on groups of data. Below are the various

    transformations available in Informatica:

    Aggregator

    Custom

    Expression

    External Procedure

    Filter

    Input

    Joiner

    Lookup

    Normalizer

    Rank

    Router

    Sequence Generator

    Sorter

    Source Qualifier

    Stored Procedure

    Transaction Control

    Union

    Update Strategy

    XML Generator

    XML Parser

    XML Source Qualifier

    Q. What is a source qualifier? What is meant by Query Override?

    A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational

    or flat file source when it runs a session. When a relational or a flat file source definition is

    added to a mapping, it is connected to a Source Qualifier transformation.

    The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query by changing the default settings of the transformation properties. The list of selected ports and the order in which they appear in the default query must not be changed in the overridden query.
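    For illustration only (the EMPLOYEES table and its columns are hypothetical), a default generated query and a possible override could look like this; note that the selected ports and their order stay the same in the override:

    -- Default query generated by the Source Qualifier
    SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID, EMPLOYEES.SALARY
    FROM EMPLOYEES

    -- SQL override: same ports in the same order, with an extra filter and ordering
    SELECT EMPLOYEES.EMP_ID, EMPLOYEES.EMP_NAME, EMPLOYEES.DEPT_ID, EMPLOYEES.SALARY
    FROM EMPLOYEES
    WHERE EMPLOYEES.DEPT_ID = 10
    ORDER BY EMPLOYEES.EMP_ID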

    Q. What is aggregator transformation?

    A. The Aggregator transformation allows performing aggregate calculations, such as

    averages and sums. Unlike Expression Transformation, the Aggregator transformation can

    only be used to perform calculations on groups. The Expression transformation permits

    calculations on a row-by-row basis only.

  • Aggregator Transformation contains group by ports that indicate how to group the data.

    While grouping the data, the aggregator transformation outputs the last row of each group

    unless otherwise specified in the transformation properties.

    Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST, MAX,

    MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

    Q. What is Incremental Aggregation?

    A. Whenever a session is created for a mapping Aggregate Transformation, the session

    option for Incremental Aggregation can be enabled. When PowerCenter performs

    incremental aggregation, it passes new source data through the mapping and uses

    historical cache data to perform new aggregation calculations incrementally.

    Q. How Union Transformation is used?

    A. The union transformation is a multiple input group transformation that can be used to

    merge data from various sources (or pipelines). This transformation works just like UNION

    ALL statement in SQL, that is used to combine result set of two SELECT statements.
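    The SQL behaviour it mirrors can be sketched as follows (table names are hypothetical); like the transformation, UNION ALL keeps duplicates and requires both inputs to have matching columns:

    SELECT customer_id, customer_name FROM customers_north
    UNION ALL
    SELECT customer_id, customer_name FROM customers_south;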

    Q. Can two flat files be joined with Joiner Transformation?

    A. Yes, joiner transformation can be used to join data from two flat file sources.

    Q. What is a look up transformation?

    A. This transformation is used to lookup data in a flat file or a relational table, view or

    synonym. It compares lookup transformation ports (input ports) to the source column

    values based on the lookup condition. Later returned values can be passed to other

    transformations.

    Q. Can a lookup be done on Flat Files?

    A. Yes.

    Q. What is a mapplet?

    A. A mapplet is a reusable object that is created using mapplet designer. The mapplet

    contains set of transformations and it allows us to reuse that transformation logic in multiple

    mappings.

    Q. What does reusable transformation mean?

    A. Reusable transformations can be used multiple times in a mapping. The reusable

    transformation is stored as a metadata separate from any other mapping that uses the

    transformation. Whenever any changes to a reusable transformation are made, all the

    mappings where the transformation is used will be invalidated.

    Q. What is update strategy and what are the options for update strategy?

    A. Informatica processes the source data row-by-row. By default every row is marked to be

    inserted in the target table. If the row has to be updated/inserted based on some logic

    Update Strategy transformation is used. The condition can be specified in Update Strategy

    to mark the processed row for update or insert.

    Following options are available for update strategy:

    DD_INSERT: If this is used the Update Strategy flags the row for insertion. Equivalent

    numeric value of DD_INSERT is 0.

    DD_UPDATE: If this is used the Update Strategy flags the row for update. Equivalent

    numeric value of DD_UPDATE is 1.

  • DD_DELETE: If this is used the Update Strategy flags the row for deletion. Equivalent

    numeric value of DD_DELETE is 2.

    DD_REJECT: If this is used the Update Strategy flags the row for rejection. Equivalent

    numeric value of DD_REJECT is 3.

    Q. What are the types of loading in Informatica? There are two types of loading, 1. Normal loading and 2. Bulk loading.

    In normal loading, it loads record by record and writes log for that. It takes comparatively a

    longer time to load data to the target.

    In bulk loading, it loads number of records at a time to target database. It takes less time

    to load data to target.

    Q. What is aggregate cache in aggregator transformation? The aggregator stores data in the aggregate cache until it completes aggregate calculations.

    When you run a session that uses an aggregator transformation, the informatica server

    creates index and data caches in memory to process the transformation. If the informatica

    server requires more space, it stores overflow values in cache files.

    Q. What type of repositories can be created using Informatica Repository

    Manager?

    A. Informatica PowerCenter includes following type of repositories:

    Standalone Repository: A repository that functions individually and this is unrelated to

    any other repositories.

    Global Repository: This is a centralized repository in a domain. This repository can

    contain shared objects across the repositories in a domain. The objects are shared through

    global shortcuts.

    Local Repository: A local repository is within a domain and is not a global repository.

    Local repository can connect to a global repository using global shortcuts and can use

    objects in its shared folders.

    Versioned Repository: This can either be local or global repository but it allows version

    control for the repository. A versioned repository can store multiple copies, or versions of an

    object. This feature allows efficiently developing, testing and deploying metadata in the

    production environment.

    Q. What is a code page?

    A. A code page contains encoding to specify characters in a set of one or more languages.

    The code page is selected based on source of the data. For example if source contains

    Japanese text then the code page should be selected to support Japanese text.

    When a code page is chosen, the program or application for which the code page is set,

    refers to a specific set of data that describes the characters the application recognizes. This

    influences the way that application stores, receives, and sends character data.

    Q. Which all databases PowerCenter Server on Windows can connect to?

    A. PowerCenter Server on Windows can connect to following databases:

  • IBM DB2

    Informix

    Microsoft Access

    Microsoft Excel

    Microsoft SQL Server

    Oracle

    Sybase

    Teradata

    Q. Which all databases PowerCenter Server on UNIX can connect to?

    A. PowerCenter Server on UNIX can connect to following databases:

    IBM DB2

    Informix

    Oracle

    Sybase

    Teradata

    Q. How to execute PL/SQL script from Informatica mapping?

    A. Stored Procedure (SP) transformation can be used to execute PL/SQL Scripts. In SP

    Transformation PL/SQL procedure name can be specified. Whenever the session is

    executed, the session will call the pl/sql procedure.

    Q. What is Data Driven? The informatica server follows instructions coded into update strategy transformations

    within the session mapping which determine how to flag records for insert, update, delete or

    reject. If we do not choose data driven option setting, the informatica server ignores all

    update strategy transformations in the mapping.

    Q. What are the types of mapping wizards that are provided in Informatica?

    The Designer provides two mapping wizards.

    1. Getting Started Wizard - Creates mapping to load static facts and dimension tables as

    well as slowly growing dimension tables.

    2. Slowly Changing Dimensions Wizard - Creates mappings to load slowly changing

    dimension tables based on the amount of historical dimension data we want to keep and the

    method we choose to handle historical dimension data.

    Q. What is Load Manager?

    A. While running a Workflow, the PowerCenter Server uses the Load Manager

    process and the Data Transformation Manager Process (DTM) to run the workflow and

    carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load

    Manager performs the following tasks:

    1. Locks the workflow and reads workflow properties.

    2. Reads the parameter file and expands workflow variables.

    3. Creates the workflow log file.

    4. Runs workflow tasks.

    5. Distributes sessions to worker servers.

    6. Starts the DTM to run sessions.

    7. Runs sessions from master servers.

  • 8. Sends post-session email if the DTM terminates abnormally.

    When the PowerCenter Server runs a session, the DTM performs the following tasks:

    1. Fetches session and mapping metadata from the repository.

    2. Creates and expands session variables.

    3. Creates the session log file.

    4. Validates session code pages if data code page validation is enabled. Checks

    Query conversions if data code page validation is disabled.

    5. Verifies connection object permissions.

    6. Runs pre-session shell commands.

    7. Runs pre-session stored procedures and SQL.

    8. Creates and runs mappings, reader, writer, and transformation threads to extract,

    transform, and load data.

    9. Runs post-session stored procedures and SQL.

    10. Runs post-session shell commands.

    11. Sends post-session email.

    Q. What is Data Transformation Manager?

    A. After the load manager performs validations for the session, it creates the DTM

    process. The DTM process is the second process associated with the session run. The

    primary purpose of the DTM process is to create and manage threads that carry out

    the session tasks.

    The DTM allocates process memory for the session and divides it into buffers. This

    is also known as buffer memory. It creates the main thread, which is called the

    master thread. The master thread creates and manages all other threads.

    If we partition a session, the DTM creates a set of threads for each partition to

    allow concurrent processing. When the Informatica server writes messages to the

    session log it includes thread type and thread ID.

    Following are the types of threads that DTM creates:

    Master Thread - Main thread of the DTM process. Creates and manages all other

    threads.

    Mapping Thread - One Thread to Each Session. Fetches Session and Mapping

    Information.

    Pre and Post Session Thread - One Thread each to Perform Pre and Post Session

    Operations.

    Reader Thread - One Thread for Each Partition for Each Source Pipeline.

    Writer Thread - One Thread for Each Partition if target exist in the source pipeline

    write to the target.

    Transformation Thread - One or More Transformation Thread For Each Partition.

    Q. What is Session and Batches?

    Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.

    Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:

    1. Sequential - runs sessions one after the other.

    2. Concurrent - runs sessions at the same time.

    Q. How many ways you can update a relational source definition and what

    are they?

    A. Two ways

    1. Edit the definition

    2. Reimport the definition

    Q. What is a transformation?

    A. It is a repository object that generates, modifies or passes data.

    Q. What are the designer tools for creating transformations?

    A. Mapping designer

    Transformation developer

    Mapplet designer

    Q. In how many ways can you create ports?

    A. Two ways

    1. Drag the port from another transformation

    2. Click the add button on the ports tab.

    Q. What are reusable transformations?

    A. A transformation that can be reused is called a reusable transformation

    They can be created using two methods:

    1. Using transformation developer

    2. Create normal one and promote it to reusable

    Q. What is the aggregate cache in the Aggregator transformation?

    A. The aggregator stores data in the aggregate cache until it completes aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

    Q. What are the settings that you use to configure the Joiner transformation?

    Master and detail source

    Type of join

    Condition of the join

    Q. What are the join types in joiner transformation?

    A. Normal (Default) -- only matching rows from both master and detail

    Master outer -- all detail rows and only matching rows from master

    Detail outer -- all master rows and only matching rows from detail

    Full outer -- all rows from both master and detail (matching or non matching)

    Q. What are the joiner caches?

    A. When a Joiner transformation occurs in a session, the Informatica Server reads all the

    records from the master source and builds index and data caches based on the master

    rows. After building the caches, the Joiner transformation reads records

    from the detail source and performs joins.

    Q. What are the types of lookup caches?

    Static cache: You can configure a static or read-only cache for any lookup table. By

    default Informatica server creates a static cache. It caches the lookup table and lookup

    values in the cache for each row that comes into the transformation. When the lookup

    condition is true, the Informatica server does not update the cache while it processes the

    lookup transformation.

    Dynamic cache: If you want to cache the target table and insert new rows into cache and

    the target, you can create a look up transformation to use dynamic cache. The Informatica

    server dynamically inserts data to the target table.

    Persistent cache: You can save the lookup cache files and reuse them the next time the

    Informatica server processes a lookup transformation configured to use the cache.

    Recache from database: If the persistent cache is not synchronized with the lookup

    table, you can configure the lookup transformation to rebuild the lookup cache.

    Shared cache: You can share the lookup cache between multiple transactions. You can

    share unnamed cache between transformations in the same mapping.

    Q. What is Transformation?

    A: Transformation is a repository object that generates, modifies, or passes data.

    Transformation performs specific function. They are two types of transformations:

    1. Active

    Rows, which are affected during the transformation or can change the no of rows that pass

    through it. Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source qualifier, Update

    Strategy, ERP Source Qualifier, Advance External Procedure.

    2. Passive

    Does not change the number of rows that pass through it. Eg: Expression, External

    Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source

    Qualifier.

    Q. What are Options/Type to run a Stored Procedure?

    A: Normal: During a session, the stored procedure runs where the

    transformation exists in the mapping on a row-by-row basis. This is useful for calling the

    stored procedure for each row of data that passes through the mapping, such as running a

    calculation against an input port. Connected stored procedures run only in normal mode.

    Pre-load of the Source. Before the session retrieves data from the source, the stored

    procedure runs. This is useful for verifying the existence of tables or performing joins of

    data in a temporary table.

    Post-load of the Source. After the session retrieves data from the source, the stored

    procedure runs. This is useful for removing temporary tables.

    Pre-load of the Target. Before the session sends data to the target, the stored procedure

    runs. This is useful for verifying target tables or disk space on the target system.

    Post-load of the Target. After the session sends data to the target, the stored procedure

    runs. This is useful for re-creating indexes on the database. It must contain at least one

    Input and one Output port.

    Q. What kinds of sources and of targets can be used in Informatica?

    Sources may be Flat file, relational db or XML.

  • Target may be relational tables, XML or flat files.

    Q: What is Session Process?

    A: The Load Manager process starts the session, creates the DTM process, and sends post-session email when the session completes.

    Q. What is DTM process?

    A: The DTM process creates threads to initialize the session, read, write, transform

    data and handle pre and post-session operations.

    Q. What is the different type of tracing levels?

    Tracing level represents the amount of information that Informatica Server writes in

    a log file. Tracing levels store information about mapping and transformations. There are 4

    types of tracing levels supported

    1. Normal: It specifies the initialization and status information, summarization of the successful rows and target rows, and information about the skipped rows due to transformation errors.

    2. Terse: Specifies only initialization information, error messages, and notification of rejected data (less detail than Normal).

    3. Verbose Initialization: In addition to Normal tracing, it specifies the location of the data cache files and index cache files that are created, and detailed transformation statistics for each and every transformation within the mapping.

    4. Verbose Data: Along with verbose initialization records each and every record processed

    by the informatica server.

    Q. TYPES OF DIMENSIONS?

    A dimension table consists of the attributes about the facts. Dimensions store the

    textual descriptions of the business.

    Conformed Dimension:

    Conformed dimensions mean the exact same thing with every possible fact table to

    which they are joined.

    Eg: The date dimension table connected to the sales facts is identical to the date

    dimension connected to the inventory facts.

    Junk Dimension:

    A junk dimension is a collection of random transactional codes flags and/or text

    attributes that are unrelated to any particular dimension. The junk dimension is

    simply a structure that provides a convenient place to store the junk attributes.

    Eg: Assume that we have a gender dimension and marital status dimension. In the

    fact table we need to maintain two keys referring to these dimensions. Instead of

    that create a junk dimension which has all the combinations of gender and marital

    status (cross join gender and marital status table and create a junk table). Now we

    can maintain only one key in the fact table.
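    A minimal SQL sketch of building such a junk dimension (the source table and columns are hypothetical):

    -- Cross join the low-cardinality attributes into a single junk dimension
    CREATE TABLE dim_junk AS
    SELECT ROW_NUMBER() OVER (ORDER BY g.gender, m.marital_status) AS junk_key,
           g.gender,
           m.marital_status
    FROM  (SELECT DISTINCT gender FROM src_customer) g
    CROSS JOIN
          (SELECT DISTINCT marital_status FROM src_customer) m;

    -- The fact table then stores one junk_key instead of two separate dimension keys.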

    Degenerated Dimension:

    A degenerate dimension is a dimension which is derived from the fact table and

    doesn't have its own dimension table.

    Eg: A transactional code in a fact table.

    Slowly changing dimension:

  • Slowly changing dimensions are dimension tables that have slowly increasing

    data as well as updates to existing data.

    Q. What are the output files that the Informatica server creates during the

    session running?

    Informatica server log: Informatica server (on UNIX) creates a log for all status and

    error messages (default name: pm.server.log). It also creates an error log for error

    messages. These files will be created in Informatica home directory

    Session log file: Informatica server creates session log file for each session. It writes

    information about session into log files such as initialization process, creation of sql

    commands for reader and writer threads, errors encountered and load summary. The

    amount of detail in session log file depends on the tracing level that you set.

    Session detail file: This file contains load statistics for each target in mapping.

    Session detail includes information such as table name, number of rows written or

    rejected. You can view this file by double clicking on the session in monitor window.

    Performance detail file: This file contains information known as session performance

    details which helps you where performance can be improved. To generate this file

    select the performance detail option in the session property sheet.

    Reject file: This file contains the rows of data that the writer does not write to

    targets.

    Control file: Informatica server creates control file and a target file when you run a

    session that uses the external loader. The control file contains the information about

    the target flat file such as data format and loading instructions for the external

    loader.

    Post session email: Post session email allows you to automatically communicate

    information about a session run to designated recipients. You can create two

    different messages. One if the session completed successfully the other if the session

    fails.

    Indicator file: If you use the flat file as a target, you can configure the Informatica

    server to create indicator file. For each target row, the indicator file contains a

    number to indicate whether the row was marked for insert, update, delete or reject.

    Output file: If session writes to a target file, the Informatica server creates the

    target file based on file properties entered in the session property sheet.

    Cache files: When the Informatica server creates memory cache it also creates cache

    files.

    For the following circumstances Informatica server creates index and data cache

    files:

    Aggregator transformation

    Joiner transformation

    Rank transformation

    Lookup transformation

    Q. What is meant by lookup caches?

    A. The Informatica server builds a cache in memory when it processes the first row

    of a data in a cached look up transformation. It allocates memory for the cache

  • based on the amount you configure in the transformation or session properties. The

    Informatica server stores condition values in the index cache and output values in

    the data cache.

    Q. How do you identify existing rows of data in the target table using lookup

    transformation?

    A. There are two ways to lookup the target table to verify a row exists or not :

    1. Use a connected dynamic cache lookup and then check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table/cache or not.

    2. Use Unconnected lookup and call it from an expression transformation and check

    the Lookup condition port value (Null/ Not Null) to decide whether the incoming

    record already exists in the table or not.

    Q. What are Aggregate tables? An aggregate table contains the summary of existing warehouse data grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to the required level and use it. These tables reduce the load on the database server, increase query performance, and return results very quickly.
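    As an illustrative sketch (assuming the hypothetical fact_sales and dim_date tables from the star schema sketch earlier), an aggregate table can be pre-computed like this:

    -- Summarize the detailed fact table to the product / month level
    CREATE TABLE agg_sales_product_month AS
    SELECT f.product_key,
           d.year_no,
           d.month_name,
           SUM(f.sales_amount)  AS total_sales,
           SUM(f.quantity_sold) AS total_quantity
    FROM   fact_sales f
    JOIN   dim_date d ON d.date_key = f.date_key
    GROUP BY f.product_key, d.year_no, d.month_name;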

    Q. What is a level of Granularity of a fact table? Level of granularity means level of detail that you put into the fact table in a data

    warehouse. For example: Based on design you can decide to put the sales data in each

    transaction. Now, level of granularity would mean what detail you are willing to put for each

    transactional fact: product sales recorded for each individual transaction, or aggregated up to the minute, and store that data.

    Q. What is session?

    A session is a set of instructions to move data from sources to targets.

    Q. What is worklet?

    A worklet is an object that represents a set of workflow tasks and allows a set of workflow logic to be reused in several workflows.

    Use of Worklet: You can bind many of the tasks in one place so that they can easily get

    identified and also they can be of a specific purpose.

    Q. What is workflow?

    A workflow is a set of instructions that tells the Informatica server how to execute the tasks.

    Q. Why cannot we use sorted input option for incremental aggregation? In incremental aggregation, the aggregate calculations are stored in historical cache on the

    server. In this historical cache the data need not be in sorted order. If you give sorted

    input, the records come as presorted for that particular run but in the historical cache the

    data may not be in the sorted order. That is why this option is not allowed.

    Q. What is target load order plan?

    You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.

    The target load plan defines the order in which data is extracted from the source qualifier transformations. It is set in the Mappings (tab) > Target Load Plan.

    Q. What is constraint based loading? Constraint based load order defines the order of loading the data into the multiple targets

    based on primary and foreign keys constraints.

    To set the option: double-click the session > Config Object (tab) > check Constraint Based Load Ordering.
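    For example (hypothetical tables), given the foreign key below, constraint-based loading would load the DEPT target before the EMP target in the same commit cycle, because EMP rows reference DEPT rows:

    CREATE TABLE dept (
        dept_id   INTEGER PRIMARY KEY,
        dept_name VARCHAR(50)
    );

    CREATE TABLE emp (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(50),
        dept_id  INTEGER REFERENCES dept (dept_id)   -- forces parent (dept) rows to be loaded first
    );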

    Q. What is the status code in stored procedure transformation? Status code provides error handling for the informatica server during the session. The

    stored procedure issues a status code that notifies whether or not stored procedure

    completed successfully. This value cannot be seen by the user; it is only used by the Informatica server to determine whether to continue running the session or stop.

    Q. Define Informatica Repository? The Informatica repository is a relational database that stores information, or metadata,

    used by the Informatica Server and Client tools. Metadata can include information such as

    mappings describing how to transform source data, sessions indicating when you want the

    Informatica Server to perform the transformations, and connect strings for sources and

    targets.

    The repository also stores administrative information such as usernames and passwords,

    permissions and privileges, and product version.

    Use repository manager to create the repository. The Repository Manager connects to the

    repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica server and client tools use.

    Q. What is a metadata?

    Designing a data mart involves writing and storing a complex set of instructions. You need

    to know where to get data (sources), how to change it, and where to write the information

    (targets). PowerMart and PowerCenter call this set of instructions metadata. Each piece of

    metadata (for example, the description of a source table in an operational database) can

    contain comments about it.

    In summary, Metadata can include information such as mappings describing how to

    transform source data, sessions indicating when you want the Informatica Server to

    perform the transformations, and connect strings for sources and targets.

  • Q. What is metadata reporter? It is a web based application that enables you to run reports against repository metadata.

    With a Meta data reporter you can access information about your repository without having

    knowledge of sql, transformation language or underlying tables in the repository.

  • Q. What are the types of metadata that stores in repository? Source definitions. Definitions of database objects (tables, views, synonyms) or files that

    provide source data.

    Target definitions. Definitions of database objects or files that contain the

    target data. Multi-dimensional metadata. Target definitions that are configured as cubes and

    dimensions.

    Mappings. A set of source and target definitions along with transformations containing

    business logic that you build into the transformation. These are the instructions that the

    Informatica Server uses to transform and move data.

    Reusable transformations. Transformations that you can use in multiple mappings.

    Mapplets. A set of transformations that you can use in multiple mappings.

    Sessions and workflows. Sessions and workflows store information about how and when

    the Informatica Server moves data. A workflow is a set of instructions that describes how

    and when to run tasks related to extracting, transforming, and loading data. A session is a

    type of task that you can put in a workflow. Each session corresponds to a single mapping.

    Following are the types of metadata that stores in the repository

    Database Connections

    Global Objects

    Multidimensional Metadata

    Reusable Transformations

    Short cuts

    Transformations

    Q. How can we store previous session logs?

    Go to Session Properties > Config Object > Log Options and select the properties as follows:
    Save session log by > Session runs
    Save session log for these runs > change the number to however many log files you want to keep (default is 0).

    If you want to save all of the log files created by every run, then select the option:
    Save session log for these runs > Session TimeStamp

    You can find these properties in the session/workflow properties.

    Q. What is Changed Data Capture? Changed Data Capture (CDC) helps identify the data in the source system that has changed

    since the last extraction. With CDC data extraction takes place at the same time the insert

    update or delete operations occur in the source tables and the change data is stored inside

    the database in change tables.

    The change data thus captured is then made available to the target systems in a controlled

    manner.

    Q. What is an indicator file and how can it be used?

    An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script or batch file creates and sends this indicator file to a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.

    Q. What is an audit table and what are the columns in it?

    An audit table is a table which contains your workflow names and session names; it holds information about workflow and session status and their details. Typical columns (see the sketch after this list):

    WKFL_RUN_ID

    WKFL_NME

    START_TMST

    END_TMST

    ROW_INSERT_CNT

    ROW_UPDATE_CNT

    ROW_DELETE_CNT

    ROW_REJECT_CNT
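    A hedged sketch of such an audit table using the column names listed above (the data types are assumptions):

    CREATE TABLE etl_audit (
        wkfl_run_id    INTEGER,
        wkfl_nme       VARCHAR(100),
        start_tmst     TIMESTAMP,
        end_tmst       TIMESTAMP,
        row_insert_cnt INTEGER,
        row_update_cnt INTEGER,
        row_delete_cnt INTEGER,
        row_reject_cnt INTEGER
    );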

    Q. If a session fails after loading 10000 records into the target, how can we load the 10001st record when we run the session the next time?

    Select the recovery strategy in the session properties as "Resume from last checkpoint". Note: set this property before running the session.

    Q. Informatica Reject File How to identify rejection reason D - Valid data or Good Data. Writer passes it to the target database. The target accepts it

    unless a database error occurs, such as finding a duplicate key while inserting.

    O - Overflowed Numeric Data. Numeric data exceeded the specified precision or scale for

    the column. Bad data, if you configured the mapping target to reject overflow or truncated

    data.

    N - Null Value. The column contains a null value. Good data. Writer passes it to the target,

    which rejects it if the target database does not accept null values.

    T - Truncated String Data. String data exceeded a specified precision for the column, so

    the Integration Service truncated it. Bad data, if you configured the mapping target to

    reject overflow or truncated data.

    It should also be noted that the second column contains the column indicator flag value D, which signifies that the row indicator is valid.

    Now let us see how data in a bad file looks:

    0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T

    Q. What is Insert Else Update and Update Else Insert? These options are used when dynamic cache is enabled.

    Insert Else Update option applies to rows entering the lookup transformation with the row

    type of insert. When this option is enabled the integration service inserts new rows in the

    cache and updates existing rows. When disabled, the Integration Service does not update

    existing rows.

    Update Else Insert option applies to rows entering the lookup transformation with the row

    type of update. When this option is enabled, the Integration Service updates existing rows,

    and inserts a new row if it is new. When disabled, the Integration Service does not insert

    new rows.

  • Q. What are the Different methods of loading Dimension tables? Conventional Load - Before loading the data, all the Table constraints will be checked

    against the data.

    Direct load (Faster Loading) - All the Constraints will be disabled. Data will be loaded

    directly. Later the data will be checked against the table constraints and the bad data won't be indexed.

    Q. What are the different types of Commit intervals?

    The different commit intervals are:

    Source-based commit. The Informatica Server commits data based on the number of

    source rows. The commit point is the commit interval you configure in the session

    properties.

    Target-based commit. The Informatica Server commits data based on the number of

    target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.

    Q. How to add source flat file header into target file?

    Edit Task-->Mapping-->Target-->Header Options--> Output field names

    Q. How to load name of the file into relation target?

    Source Definition-->Properties-->Add currently processed file name port

    Q. How to return multiple columns through un-connect lookup?

    Suppose your lookup table has f_name, m_name and l_name and you are using an unconnected lookup. In the lookup SQL override, concatenate the columns as f_name || '~' || m_name || '~' || l_name; you can then retrieve this single value with the unconnected lookup in an expression. Use the substring function in the Expression transformation to separate the three columns back out and make individual ports for the downstream transformation/target (see the sketch below).
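    A minimal sketch of the lookup SQL override part (table and column names are hypothetical); the single returned string is then split back into three ports with SUBSTR/INSTR in the Expression transformation:

    -- Lookup SQL override: return the three name columns as one delimited string
    SELECT f_name || '~' || m_name || '~' || l_name AS full_name,
           emp_id
    FROM   employee_lkp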

    -----------------------------------------------------------------------------------------

    Q. What is a factless fact table? For what purpose do we use it in our DWH projects?

    It is a fact table which does not contain any measurable data.

    EX: Student attendance fact (it contains only Boolean values, whether student

    attended class or not ? Yes or No.)

    A Factless fact table contains only the keys but there is no measures or in other

    way we can say that it contains no facts. Generally it is used to integrate the fact

    tables

    A factless fact table contains only foreign keys. We can apply two kinds of aggregate functions to a factless fact: count and distinct count.

    Two purposes of a factless fact:

    1. Coverage: to indicate what did NOT happen. For example: which product did not sell well in a particular region?

    2. Event tracking: to know whether the event took place or not. For example: a fact for tracking student attendance will not contain any measures.
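    A minimal sketch (hypothetical names) of an attendance factless fact table and the kind of count query run against it:

    -- Factless fact: only foreign keys to dimensions, no measure columns
    CREATE TABLE fact_attendance (
        student_key INTEGER,   -- FK to a student dimension
        class_key   INTEGER,   -- FK to a class dimension
        date_key    INTEGER    -- FK to a date dimension
    );

    -- Event tracking: how many classes did each student attend?
    SELECT student_key, COUNT(*) AS classes_attended
    FROM   fact_attendance
    GROUP BY student_key;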

    Q. What is staging area?

    The staging area is where we apply our logic to extract the data from the source, cleanse it, and put it into meaningful, summarized form for the data warehouse.

    Q. What is constraint based loading?

    Constraint based load order defines the order of loading the data into the multiple targets based on primary and foreign key constraints.

    Q. Why is the Union transformation an active transformation?

    The only condition for a transformation to become active is that the row number changes. The question then is how a row number can change. There are two conditions:

    1. The number of rows coming in and going out is different.
    For example, in the case of a Filter we have data like:

    id  name  dept  row_num
    1   aa    4     1
    2   bb    3     2
    3   cc    4     3

    With a filter condition like dept=4, the output would be:

    id  name  dept  row_num
    1   aa    4     1
    3   cc    4     2

    So the row numbers changed, and it is an active transformation.

    2. Or the order of the rows changes. For example, when the Union transformation pulls in data, suppose we have two sources.

    Source 1:
    id  name  dept  row_num
    1   aa    4     1
    2   bb    3     2
    3   cc    4     3

    Source 2:
    id  name  dept  row_num
    4   aaa   4     4
    5   bbb   3     5
    6   ccc   4     6

    The Union never restricts the data from any source, so the data can come in any order:

    id  name  dept  row_num  old_row_num
    1   aa    4     1        1
    4   aaa   4     2        4
    5   bbb   3     3        5
    2   bb    3     4        2
    3   cc    4     5        3
    6   ccc   4     6        6

    So the row numbers change. Thus we say that Union is an active transformation.

    Q. What is use of batch file in informatica? How many types of batch file in

    informatica?

    With a batch, we can run sessions either sequentially or concurrently. A grouping of sessions is known as a batch.

    Two types of batches:

    1) Sequential: runs sessions one after another.

    2) Concurrent: runs the sessions at the same time.

    If you have sessions with source-target dependencies you have to go for a sequential batch to start the sessions one after another. If you have several independent sessions you can use concurrent batches, which run all the sessions at the same time.

    Q. What is joiner cache?

    When we use the Joiner transformation, the Integration Service maintains a cache; all the master source records are stored in the joiner cache. Joiner caches are of two types:

    1. Index cache
    2. Data cache

    The index cache stores all the port values which participate in the join condition, and the data cache stores all the ports which do not participate in the join condition.

  • Q. What is the location of parameter file in Informatica?

    $PMBWPARAM

    Q. How can you display only hidden files in UNIX?

    $ ls -la
    total 16
    8 drwxrwxrwx 2 zzz yyy 4096 Apr 26 12:00 ./
    8 drwxrwxrwx 9 zzz yyy 4096 Jul 31 16:59 ../

    The correct answer is:

    $ ls -a | grep "^\."

    Q. How to delete the data in the target table after loaded.

SQ ---> Properties tab ---> Post SQL:

delete from target_tablename

Post SQL statements on the Source Qualifier are executed using the source database connection after a pipeline runs. You can also write a Post SQL on the target table, such as a TRUNCATE of the target table, or use the Truncate target table option in the session properties.

    Q. What is polling in informatica?

    It displays the updated information about the session in the monitor window. The

    monitor window displays the status of each session when you poll the Informatica

server.

Q. How will I stop my workflow after 10 errors?

Use the session-level error handling property Stop on errors and set it to 10:

Session ---> Config Object tab ---> Error Handling ---> Stop on errors: 10

Q. How can we calculate the fact table size?

A fact table's size is a multiple of the combinations of its dimension tables. E.g., to find the fact table size for 3 years of historical data with 200 products and 200 stores: 3 * 365 * 200 * 200 = fact table size (in rows).

Q. Without using an Email task, how will you send mail from Informatica?

By using the 'mailx' command in a UNIX shell script.

Q. How will you compare two mappings in two different repositories?

In the Designer client, go to the Mappings menu; there is a Compare option, with which we can compare two mappings from two different repositories.

We can compare 2 folders within the same repository, and we can also compare 2 folders in different repositories.

Q. What is constraint based load order?

Constraint based load order defines the order in which data loads into multiple targets based on primary key and foreign key relationships.

Q. What is a target load plan?

Suppose I have 3 pipelines in a single mapping in the Designer:

emp source ---> SQ ---> tar1
dept source ---> SQ ---> tar2
bonus source ---> SQ ---> tar3

My requirement is to load tar2 first, then tar1, and finally tar3. For this type of loading, to control the order in which the Source Qualifiers extract data from the sources, we use the target load plan.

Q. What is meant by data driven? In which scenario do we use it?

Data driven is available at the session level. It says that when we are using an Update Strategy transformation, the Integration Service follows it to decide how each fetched row should be inserted or updated in the target database.

Data driven is nothing but instructing what action each source row should take on the target, i.e. update, delete, reject, or insert. If we use an Update Strategy transformation in a mapping, then we select the Data Driven option in the session.

Q. How to run a workflow in UNIX?

Syntax: pmcmd startworkflow -sv <integration_service> -d <domain> -u <user> -p <password> -f <folder> <workflow_name>

Example:

pmcmd startworkflow -service ${INFA_SERVICE} -domain ${INFA_DOMAIN} -uv xxx_PMCMD_ID -pv PSWD -folder ${ETLFolder} -wait ${ETLWorkflow}

Q. What is the main difference between a Joiner Transformation and a Union Transformation?

A Joiner Transformation merges data horizontally; a Union Transformation merges data vertically.

A Joiner Transformation is used to join data from heterogeneous sources, e.g. a SQL database and a flat file, whereas a Union Transformation is used to combine data from relational sources with the same structure (an Oracle table and another Oracle table).

A Joiner Transformation combines data records horizontally based on a join condition, and it can combine data from two different sources having different metadata. It supports heterogeneous and homogeneous data sources.

A Union Transformation combines data records vertically from multiple sources having the same metadata. The Union transformation also supports heterogeneous data sources.

The Union transformation functions like the UNION ALL set operator.
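For comparison, here is what the two operations look like in plain SQL (a minimal sketch; the emp, dept, emp_us and emp_eu tables are hypothetical and only illustrate the shape of each merge):

-- Joiner-style: horizontal merge of two sources on a join condition
SELECT e.emp_id, e.emp_name, d.dept_name
FROM   emp e
JOIN   dept d ON d.dept_id = e.dept_id;

-- Union-style: vertical merge of two sources with identical metadata, duplicates kept
SELECT emp_id, emp_name FROM emp_us
UNION ALL
SELECT emp_id, emp_name FROM emp_eu;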

Q. What is constraint based loading exactly? And how do you enable it? I think it applies when we have a primary key-foreign key relationship. Is that correct?

Yes. Constraint based load order loads the data into multiple targets in an order that depends on the primary key-foreign key relationships. To set the option: double-click the session, open the Config Object tab, and check Constraint Based Load Ordering.

Q. Difference between the top-down (W.H. Inmon) and bottom-up (Ralph Kimball) approaches?

Top-down approach: As per W.H. Inmon, first we need to build the data warehouse and after that we build the data marts; however, this makes the DWH somewhat difficult to maintain.

Bottom-up approach: As per Ralph Kimball, first we need to build the data marts and then we build the data warehouse. This approach is the most useful in real time while creating a data warehouse.

    Q. What are the different caches used in informatica?

    Static cache

    Dynamic cache

    Shared cache

    Persistent cache

    Q. What is the command to get the list of files in a directory in unix?

    $ls -lrt

Q. How to import multiple flat files into a single target when there is no common column in the flat files?

In the session properties (Mapping tab), set the source property Source filetype to Indirect and give a Source filename that points to a list file containing the names of all the files you want to load.

Q. How to connect two or more tables with a single Source Qualifier?

Create an Oracle source definition with as many columns as you want and write the join query in the SQL Query override. The column order and data types must match the SQL query; a sketch of such an override follows.
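A minimal sketch of an override joining two tables in one Source Qualifier (the EMP and DEPT tables and columns are hypothetical, used only to show the shape of the query):

SELECT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO, DEPT.DNAME
FROM   EMP, DEPT
WHERE  EMP.DEPTNO = DEPT.DEPTNO

The Source Qualifier must expose the same four ports, in the same order and with matching data types, for the override to work.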

Q. How to call an unconnected lookup in an Expression transformation?

:LKP.LKP_NAME(PORTS)

Q. What is the difference between connected and unconnected lookup?

Connected lookup:

It is used to join two tables and can return multiple columns per row. It must sit in the mapping pipeline. You can implement the lookup condition in a connected lookup, and you can generate sequence numbers by enabling the dynamic lookup cache.

Unconnected lookup:

It returns a single output through its return port. It acts as a lookup function (:LKP) and is called from another transformation; it is not connected to either a source or a target.

------

CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It has multiple inputs and multiple outputs.
>> It supports static and dynamic caches.

UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It has multiple inputs and a single output.
>> It supports a static cache only.

Q. Types of partitioning in Informatica?

There are 5 partition types:

1. Simple pass-through
2. Key range
3. Hash
4. Round-robin
5. Database partitioning

    Q. Which transformation uses cache?

1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation

    Q. Explain about union transformation?

    A union transformation is a multiple input group transformation, which is used to

    merge the data from multiple sources similar to UNION All SQL statements to

    combine the results from 2 or more sql statements.

    Similar to UNION All statement, the union transformation doesn't remove

    duplicate rows. It is an active transformation.

Q. Explain about the Joiner transformation?

    Joiner transformation is used to join source data from two related heterogeneous

    sources. However this can also be used to join data from the same source. Joiner

    t/r join sources with at least one matching column. It uses a condition that matches

    one or more pair of columns between the 2 sources.

    To configure a Joiner t/r various settings that we do are as below:

    1) Master and detail source

    2) Types of join

    3) Condition of the join

    Q. Explain about Lookup transformation?

    Lookup t/r is used in a mapping to look up data in a relational table, flat file, view

    or synonym.

    The informatica server queries the look up source based on the look up ports in the

    transformation. It compares look up t/r port values to look up source column values

    based on the look up condition.

    Look up t/r is used to perform the below mentioned tasks:

    1) To get a related value.

    2) To perform a calculation.

    3) To update SCD tables.

    Q. How to identify this row for insert and this row for update in dynamic lookup

    cache?

Based on the NewLookupRow port, the Informatica server indicates which row is an insert and which is an update:

NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update

    Q. How many ways can we implement SCD2?

    1) Date range 2) Flag 3) Versioning

Q. How will you check for bottlenecks in Informatica? Where do you start checking?

You check in this order:

1. Target
2. Source
3. Mapping
4. Session
5. System

Q. What is incremental aggregation?

When the Aggregator transformation executes, all the output data is stored in

a temporary location called the aggregator cache. The next time the mapping runs, the Aggregator transformation processes only the new records loaded after the first run, and their output values are incremented against the values already in the aggregator cache. This is called incremental aggregation, and it improves performance.

---------------------------

Incremental aggregation means applying only the captured changes in the source to the aggregate calculations in a session. When the source changes only incrementally, and we can capture those changes, we can configure the session to process only those changes. This allows the Informatica server to update the target table incrementally, rather than forcing it to process the entire source and recalculate the same aggregations each time the session runs. By doing this, the session performance obviously increases.

Q. How can I explain my project architecture in an interview? Tell me your project flow from source to target.

    Project architecture is like

    1. Source Systems: Like Mainframe,Oracle,People soft,DB2.

2. Landing tables: These tables act like the source. They are used for easy access, for backup purposes, and as reusable sources for other mappings.

    3. Staging tables: From landing tables we extract the data into staging tables

    after all validations done on the data.

    4. Dimension/Facts: These are the tables those are used for analysis and make

    decisions by analyzing the data.

    5. Aggregation tables: These tables have summarized data useful for managers

    who wants to view monthly wise sales, year wise sales etc.

    6. Reporting layer: 4 and 5 phases are useful for reporting developers to generate

    reports. I hope this answer helps you.

    Q. What type of transformation is not supported by mapplets?

    Normalizer transformation

    COBOL sources, joiner

    XML source qualifier transformation

    XML sources

    Target definitions

    Pre & Post Session stored procedures

    Other mapplets

    Q. How informatica recognizes mapping?

Everything is organized by the Integration Service. PowerCenter talks to the Integration Service, the Integration Service runs the session, and the session holds the mapping structure. That is the flow of execution.

Q. Can every transformation be reusable? How?

Except for the Source Qualifier transformation, all transformations support the reusable property. A reusable transformation can be developed in two ways:

1. In the mapping, select the transformation you want to reuse and double-click it; in its properties you will find the option to make it a reusable transformation. Check that option to convert the non-reusable transformation into a reusable one (again, this does not apply to the Source Qualifier transformation).

2. By using the Transformation Developer.

    Q. What is Pre Sql and Post Sql?

    Pre SQL means that the integration service runs SQL commands against the source

    database before it reads the data from source.

    Post SQL means integration service runs SQL commands against target database

    after it writes to the target.

    Q. Insert else update option in which situation we will use?

This option applies when the Lookup transformation uses a dynamic cache: if the record specified in the associated ports is not found in the lookup cache, the row is inserted into the cache; if the record is found, the data in the associated ports is updated.

----------------------

We set this property when the Lookup transformation uses a dynamic cache and the session property Treat Source Rows As has been set to "Insert".

--------------------

We use this option when we want to maintain history: if a record is not available in the target table it is inserted, and if it is already available in the target table it is updated.

Q. What is incremental loading? In which situations do we use incremental loading?

Incremental loading is an approach. Suppose you have a mapping that loads data from an employee table to an employee_target table based on hire date, and suppose you have already moved the employee data from source to target for employees hired up to 31-12-2009. Your organization now wants to load data into employee_target today. The target already holds the data for employees hired up to 31-12-2009, so you now pick up only the source rows for employees hired from 1-1-2010 to the current date. You need not take the data before that date; if you did, it would just be extra overhead, loading data into the target that already exists there. So in the Source Qualifier you filter the records by hire date, and you can also parameterize the hire date, which controls from which date onward you load data into the target.

This is the concept of incremental loading.
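A minimal sketch of such a Source Qualifier filter (the EMPLOYEE table, its columns, and the $$LAST_LOAD_DATE mapping parameter are hypothetical names used only for illustration):

SELECT EMP_ID, EMP_NAME, HIRE_DATE
FROM   EMPLOYEE
WHERE  HIRE_DATE > TO_DATE('$$LAST_LOAD_DATE', 'DD-MM-YYYY')   -- only rows newer than the last load

At each run the parameter file supplies the cut-off date, so only the new slice of source data is extracted and loaded.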

    Q. What is target update override?

By default, the Integration Service updates the target based on key columns. But we might want to update non-key columns as well; in that case we can override the UPDATE statement for each target in the mapping. The target update override takes effect only when the source rows are marked as update by an Update Strategy in the mapping.
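A sketch of what such an override typically looks like (the T_EMPLOYEE target and its columns are hypothetical; :TU references the target transformation's ports):

UPDATE T_EMPLOYEE
SET    EMP_NAME = :TU.EMP_NAME,
       SALARY   = :TU.SALARY
WHERE  EMP_ID   = :TU.EMP_ID

Here the WHERE clause could just as well use a non-key column, which is exactly what the default generated UPDATE statement does not allow.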

Q. What is a Mapping parameter and a Mapping variable?

Mapping parameter: A mapping parameter is a constant value that is defined before the mapping runs. A mapping parameter lets the mapping be reused with various constant values.

Mapping variable: A mapping variable represents a value that can change during the mapping run. Its value is stored in the repository; the Integration Service retrieves that value from the repository and uses the incremented value for the next run.

Q. What is rank and dense rank in Informatica? Give examples and the SQL query for both ranks.

For example, a file contains records with a column whose values are:

100
200 (repeated row)
200
300
400
500

The rank function gives the output:

1
2
2
4
5
6

and dense rank gives:

1
2
2
3
4
5

For example, with a file containing the columns:

empno  sal
100    1000
200    2000   (repeated empno)
200    3000
300    4000
400    5000
500    6000

Rank:

select rank() over (order by empno) from emp;

1
2
2
4
5
6

Dense rank:

select dense_rank() over (order by empno) from emp;

1
2
2
3
4
5

    Q. What is the incremental aggregation?

    The first time you run an upgraded session using incremental aggregation, the

    Integration Service upgrades the index and data cache files. If you want to partition

    a session using a mapping with incremental aggregation, the Integration Service

    realigns the index and data cache files.

    Q. What is session parameter?

A parameter file is a text file where we can define the values for parameters. Session parameters are used to assign values such as database connections.

    Q. What is mapping parameter?

A mapping parameter represents a constant value that is defined before the mapping runs. Its value is supplied through a parameter file, which is saved with a .prm extension. A mapping parameter lets the mapping be reused with various constant values.

    Q. What is parameter file?

A parameter file is a text file created with a text editor such as WordPad or Notepad. It defines the values for the parameters and variables used in a session. You can define the following values in a parameter file:

    Mapping parameters

    Mapping variables

    Session parameters

    Q. What is session override?

Session override is an option in Informatica at the session level. Here we can manually supply a SQL query that is issued to the database when the session runs. It is nothing but overriding the default SQL generated by a particular transformation at the mapping level.

    Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?

There is little change apart from the Administrator Console. In 8.1.1 we do all the creation of the Integration Service, Repository Service, web services, domain, nodes, and grid (if we have a licensed version). In 8.6.1 the Informatica Admin Console manages both a Domain page and a Security page. The Domain page covers all of the above: creation of the Integration Service, Repository Service, web services, domain, nodes, grid (if licensed), etc. The Security page covers creation of users, privileges, LDAP configuration, export/import of users and privileges, etc.

    Q. What are the uses of a Parameter file?

A parameter file is one which contains the values of mapping parameters and variables.

Type the following in Notepad and save it:

foldername.sessionname
$$inputvalue1=

    ---------------------------------

    Parameter files are created with an extension of .PRM

These are created to pass values that can be changed for a Mapping Parameter or Session Parameter during the mapping run.

Mapping Parameters:

A value is defined in the parameter file for a parameter that has already been created in the mapping with a data type, precision, and scale.

    The Mapping parameter file syntax (xxxx.prm).

    [FolderName.WF:WorkFlowName.ST:SessionName]

    $$ParameterName1=Value

    $$ParameterName2=Value

    After that we have to select the properties Tab of Session and Set Parameter file

    name including physical path of this xxxx.prm file.

    Session Parameters:

    The Session Parameter files syntax (yyyy.prm).

    [FolderName.SessionName]

    $InputFileValue1=Path of the source Flat file

    After that we have to select the properties Tab of Session and Set Parameter file

    name including physical path of this yyyy.prm file.

Then make the following changes in the Mapping tab, in the Source Qualifier's Properties section (attribute ---> value):

Source File Type      ---------> Direct
Source File Directory ---------> (empty)
Source File Name      ---------> $InputFileValue1

    Q. What is the default data driven operation in informatica?

Data driven is the default option when the mapping contains an Update Strategy transformation. The Integration Service follows the instructions coded in the Update Strategy transformations within the mapping to determine how to flag records for insert, delete, update, or reject. If you do not set the Data Driven option, the Integration Service ignores the Update Strategy transformations in the mapping.

    Q. What is threshold error in informatica?

When the Update Strategy flags rows with DD_REJECT or DD_UPDATE and an error limit is set, then if the number of rejected records exceeds that count, the session ends with a failed status. This error is called a threshold error.

Q. So many times I have seen "$PM parser error". What is meant by PM?

PM: PowerMart

    1) Parsing error will come for the input parameter to the lookup.

    2) Informatica is not able to resolve the input parameter CLASS for your lookup.

    3) Check the Port CLASS exists as either input port or a variable port in your

    expression.

    4) Check data type of CLASS and the data type of input parameter for your lookup.

    Q. What is a candidate key?

    A candidate key is a combination of attributes that can be uniquely used to identify

    a database record without any extraneous data (unique). Each table may have one

or more candidate keys. One of these candidate keys is selected as the table's primary key; the rest are called alternate keys.

    Q. What is the difference between Bitmap and Btree index?

A bitmap index is used for columns with repeating (low-cardinality) values.

ex: Gender: male/female
Account status: Active/Inactive

A B-tree index is used for columns with unique values.

ex: empid.
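For illustration (a sketch in Oracle syntax, with a hypothetical employee table), the two index types are created like this:

-- Bitmap index on a low-cardinality column
CREATE BITMAP INDEX idx_emp_gender ON employee (gender);

-- B-tree index (the default) on a high-cardinality column
CREATE INDEX idx_emp_empid ON employee (empid);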

    Q. What is ThroughPut in Informatica?

Throughput is the rate at which the PowerCenter server reads rows (in bytes) from the source, or writes rows (in bytes) into the target, per second.

You can find this option in the Workflow Monitor. Right-click the session, choose Properties, and in the Source/Target Statistics tab you can find the throughput details for each instance of the source and target.

    Q. What are set operators in Oracle

    UNION

    UNION ALL

    MINUS

    INTERSECT
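A quick sketch of how these behave (the emp_2013 and emp_2014 tables are hypothetical and must have matching column lists):

SELECT emp_id FROM emp_2013 UNION     SELECT emp_id FROM emp_2014;  -- rows from both, duplicates removed
SELECT emp_id FROM emp_2013 UNION ALL SELECT emp_id FROM emp_2014;  -- rows from both, duplicates kept
SELECT emp_id FROM emp_2013 MINUS     SELECT emp_id FROM emp_2014;  -- rows in the first query but not the second
SELECT emp_id FROM emp_2013 INTERSECT SELECT emp_id FROM emp_2014;  -- rows common to both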

    Q. How i can Schedule the Informatica job in "Unix Cron scheduling tool"?

    Crontab

    The crontab (cron derives from chronos, Greek for time; tab stands for table)

    command, found in Unix and Unix-like operating systems, is used to schedule

    commands to be executed periodically. To see what crontabs are currently running

    on your system, you can open a terminal and run:

    sudo crontab -l

    To edit the list of cronjobs you can run:

    sudo crontab -e

This will open the default editor (could be vi or pico; if you want, you can change the default editor) to let us manipulate the crontab. If you save and exit the editor,

    all your cronjobs are saved into crontab. Cronjobs are written in the following

    format:

    * * * * * /bin/execute/this/script.sh

    Scheduling explained

    As you can see there are 5 stars. The stars represent different date parts in the

    following order:

    1. minute (from 0 to 59)

    2. hour (from 0 to 23)

    3. day of month (from 1 to 31)

    4. month (from 1 to 12)

    5. day of week (from 0 to 6) (0=Sunday)

    Execute every minute

If you leave the star, or asterisk, it means every. Maybe that's a bit unclear. Let's use the previous example again:

    * * * * * /bin/execute/this/script.sh

    They are all still asterisks! So this means

    execute /bin/execute/this/script.sh:

    1. every minute

    2. of every hour

    3. of every day of the month

4. of every month

    5. and every day in the week.

    In short: This script is being executed every minute.

    Without exception.

    Execute every Friday 1AM

    So if we want to schedule the script to run at 1AM every

    Friday, we would need the following cronjob:

    0 1 * * 5 /bin/execute/this/script.sh

    Get it? The script is now being executed when the system

    clock hits:

    1. minute: 0

    2. of hour: 1

    3. of day of month: * (every day of month)

    4. of month: * (every month)

    5. and weekday: 5 (=Friday)

Execute on weekdays at 1AM

So if we want to schedule the script to run at 1AM on every weekday (Monday through Friday), we would need the following cronjob:

    0 1 * * 1-5 /bin/execute/this/script.sh

    Get it? The script is now being executed when the system

    clock hits:

    1. minute: 0

    2. of hour: 1

    3. of day of month: * (every day of month)

    4. of month: * (every month)

    5. and weekday: 1-5 (=Monday til Friday)

    Execute 10 past after every hour on the 1st of every month

    Here's another one, just for practicing

    10 * 1 * * /bin/execute/this/script.sh

    Fair enough, it takes some getting used to, but it offers great flexibility.

Q. Can anyone tell me the difference between persistent and dynamic caches? Under which conditions do we use these caches?

Dynamic:

1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.

2) With a dynamic cache, the cache can also be updated with new data.

3) A dynamic cache is not reusable.

(We need a dynamic cache when we need the cache data kept up to date.)

Persistent:

1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server saves or deletes the lookup cache files after a successful session based on the Lookup Cache Persistent property.

2) With a persistent cache, we are not able to update the cache with new data.

3) A persistent cache is reusable.

(We need a persistent cache when we need the previous cache data again.)

    ----------------------------------

A few more additions to the above answer:

1. A dynamic lookup allows modifying the cache, whereas a persistent lookup does not allow us to modify the cache.

2. A dynamic lookup uses NewLookupRow, a default port in the cache, but a persistent cache does not use any default ports.

3. When the session completes, the dynamic cache is removed, but the persistent cache is saved on the Informatica PowerCenter server.

Q. How to obtain performance data for individual transformations?

There is a session-level property, Collect Performance Data; you can select that property. It gives you performance details for all the transformations.

Q. List the Active and Passive Transformations in Informatica?

Active Transformation - An active transformation changes the number of rows that pass through the mapping.

    Source Qualifier Transformation

    Sorter Transformations

    Aggregator Transformations

    Filter Transformation

    Union Transformation

    Joiner Transformation

    Normalizer Transformation

    Rank Transformation

    Router Transformation

    Update Strategy Transformation

    Advanced External Procedure Transformation

    Passive Transformation - Passive transformations do not change the number of rows that

    pass through the mapping.

    Expression Transformation

    Sequence Generator Transformation

    Lookup Transformation

    Stored Procedure Transformation

    XML Source Qualifier Transformation

    External Procedure Transformation

Q. Eliminating duplicate records without using dynamic lookups?

You can identify duplicate records with a simple one-line SQL query:

Select id, count(*) from seq1 group by id having count(*) > 1;

Below are the ways to eliminate duplicate records in Informatica:

1. By enabling the Select Distinct option in the Source Qualifier transformation.
2. By enabling the Distinct option in the Sorter transformation.
3. By marking all ports as group-by in the Aggregator transformation.
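If the duplicates have to be removed in the database itself, one common Oracle pattern (a sketch against the same hypothetical seq1 table as above, keeping one row per id) is:

-- Delete every row except the first physical row for each id
DELETE FROM seq1 a
WHERE  ROWID > (SELECT MIN(ROWID)
                FROM   seq1 b
                WHERE  b.id = a.id);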

    Q. Can anyone give idea on how do we perform test load in informatica? What do

    we test as part of test load in informatica?

    With a test load, the Informatica Server reads and transforms data without writing to

    targets. The Informatica Server does everything, as if running the full session. The

    Informatica Server writes data to relational targets, but rolls back the data when the session

    completes. So, you can enable collect performance details property and analyze the how

    efficient your mapping is. If the session is running for a long time, you may like to find out

    the bottlenecks that are existing. It may be bottleneck of type target, source, mapping etc.

    The basic idea behind test load is to see the behavior of Informatica Server with your

    session.

    Q. What is ODS (Operational Data Store)?

A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into the enterprise data architecture.

An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse.

The ODS may also be used to audit the data warehouse, to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise's shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational database.

    Q. How many tasks are there in informatica?

    Session Task

    Email Task

    Command Task

    Assignment Task

    Control Task

    Decision Task

    Event-Raise

    Event- Wait

    Timer Task

    Link Task

    Q. What are business components in Informatica?

    Domains

    Nodes

    Services

    Q. WHAT IS VERSIONING?

It is used to keep a history of the changes made to mappings and workflows.

    1. Check in: You check in when you are done with your changes so that everyone can see

    those changes.

    2. Check out: You check out from the main stream when you want to make any change to the

    mapping/workflow.

    3. Version history: It will show you all the changes made and who made it.

Q. Difference between $$$SessStartTime and SESSSTARTTIME?

$$$SessStartTime - Returns the session start time as a string value (String datatype)

    SESSSTARTTIME - Returns the date along with date timestamp (Date datatype)

    Q. Difference between $,$$,$$$ in Informatica?

    1. $ Refers

    These are the system variables/Session Parameters like $Bad file,$input

    file, $output file, $DB connection,$source,$target etc..

2. $$ Refers

    User defined variables/Mapping Parameters like $$State,$$Time, $$Entity,

    $$Business_Date, $$SRC,etc.

    3. $$$ Refers

    System Parameters like $$$SessStartTime

    $$$SessStartTime returns the session start time as a string value. The format of the

    string depends on the database you are using.


    Q. Finding Duplicate Rows based on Multiple Columns?

    SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email)

    FROM employee

    GROUP BY firstname, surname, email

    HAVING (COUNT(firstname) > 1) AND (COUNT(surname) > 1) AND (COUNT(email) > 1);

    Q. Finding Nth Highest Salary in Oracle? Pick out the Nth highest salary, say the 4th highest salary.

    Select * from

    (select ename,sal,dense_rank() over (order by sal desc) emp_rank from emp)

    where emp_rank=4;

    Q. Find out the third highest salary?

SELECT MIN(sal) FROM emp WHERE
sal IN (SELECT DISTINCT TOP 3 sal FROM emp ORDER BY sal DESC);

(Note: TOP is SQL Server syntax; an Oracle version is sketched below.)
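In Oracle, the same result can be obtained with the DENSE_RANK pattern already used for the Nth highest salary above (a sketch against the standard emp table):

SELECT sal
FROM   (SELECT sal, DENSE_RANK() OVER (ORDER BY sal DESC) AS sal_rank FROM emp)
WHERE  sal_rank = 3;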

    Q. How do you handle error logic in Informatica? What are the

    transformations that you used while handling errors? How did you reload

    those error records in target?

Row indicator: It generally happens when working with an Update Strategy transformation; the writer/target rejects the rows going to the target.

Column indicator:

D - Valid
O - Overflow
N - Null
T - Truncate

    When the data is with nulls, or overflo