XML With Informatica


DESCRIPTION

Working with Informatica and XML processing


  • 5/28/2018 XML With Informatica

    1/12

    XML & XML with Informatica


    XML

    My best description of XML is this: XML is a cross-platform, software and hardware

    independent tool for transmitting information.

    XML is used to Exchange Data

    With XML, data can be exchanged between incompatible systems.

    In the real world, computer systems and databases contain data in incompatible formats.

    One of the most time-consuming challenges for developers has been to exchange data

    between such systems over the Internet.

    Converting the data to XML can greatly reduce this complexity and create data that can be

    read by many different types of applications.

    XML, DTD, and XML Schema

    Extensible Markup Language (XML) is a markup language generally regarded as the universal format for structured documents and data on the Web. Like HTML, XML contains element tags and attributes that define data. Unlike HTML, XML element tags and attributes are not based on a predefined, static set of elements and attributes. Every XML file can have a different set of tags and attributes. Document Type Definition (DTD) files and XML schema files define the elements and attributes that can be used and the structure within which they fit in an XML file.

    DTD and XML schema files specify the structure and content of XML files in different ways. A DTD file defines the names of elements, the number of times they occur, and how they fit together. The XML schema file provides the same information plus the datatypes of the elements.

    DTD

    The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference.

    The DTD file contains only metadata. It contains the description of the structure and the definition of the elements and attributes that can be found in the associated XML file. It does not contain any data.

    A sample DTD looks like this:

    <!ELEMENT employees (companyname, employee)>
    <!ELEMENT companyname (id, name)>
    <!ELEMENT employee (emp+)>
    <!ELEMENT emp (id, info)>
    <!ELEMENT info (name, age, sex, job, sal)>
    <!ELEMENT created-date (format, timestamp)>
    <!ELEMENT id (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT format (#PCDATA)>
    <!ELEMENT timestamp (#PCDATA)>

    e.g.:

    <employees>
      <companyname>
        <id>01</id>
        <name>Wipro Technologies</name>
      </companyname>
      <employee>
        <emp>
          <id>71000</id>
          <info>
            <name>Dileep</name>
            <age>29</age>
            <sex>Male</sex>
            <job>Project Engineer</job>
            <sal>20000</sal>
          </info>
        </emp>
      </employee>
    </employees>
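    As a rough illustration of how an application might read a document shaped like the sample above, the following Python sketch uses the standard library's `xml.etree.ElementTree`. The function name `read_employees` is hypothetical, and the literal values in `SAMPLE` should be treated as illustrative only.

```python
import xml.etree.ElementTree as ET

# Retyped sample document; the values are illustrative.
SAMPLE = """<employees>
  <companyname>
    <id>01</id>
    <name>Wipro Technologies</name>
  </companyname>
  <employee>
    <emp>
      <id>71000</id>
      <info>
        <name>Dileep</name>
        <age>29</age>
        <sex>Male</sex>
        <job>Project Engineer</job>
        <sal>20000</sal>
      </info>
    </emp>
  </employee>
</employees>"""


def read_employees(xml_text):
    """Return one dict per <emp> element, flattening its <info> children."""
    root = ET.fromstring(xml_text)
    rows = []
    for emp in root.iter("emp"):
        row = {"id": emp.findtext("id")}
        info = emp.find("info")
        if info is not None:
            for child in info:
                row[child.tag] = child.text
        rows.append(row)
    return rows


rows = read_employees(SAMPLE)
print(rows)
```

    Every value comes back as a string, because a DTD carries no datatype information; typing the values would need an XML schema or explicit conversion.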

    XML Schema

    The XML schema file, like the DTD file, contains only metadata. In addition to the definition and structure of elements and attributes, an XML schema contains a description of the type of elements and attributes found in the associated XML file.

    A sample XML Schema file looks like this:

    <xs:element name=…
      <xs:complexType>
        <xs:sequence>
          <xs:element ref=…
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name=…
      <xs:complexType>
        <xs:sequence>
          <xs:element name=…
        <xs:sequence>
          <xs:element name=…
    <xs:element name=…

    CDATA means character data. If an element's content is declared as CDATA, the characters, text, or data inside the XML tags are not parsed by the XML parser. If the text contains a lot of […]

    […] requires a lot of memory and resources to parse very large XML files and extract metadata for source or target definitions. To ensure that the Designer creates an XML source or target definition quickly and efficiently, Informatica recommends that you import source or target definitions only from XML files that are no larger than 100K or from DTD or XML schema files. If you want to import from a very large XML file that has no DTD or XML schema file, decrease the size of the XML file by deleting duplicate data elements. You do not need all of your data to import an XML source or target definition. You need only enough data to accurately show the hierarchy of your XML file and enable the Designer to create a source or target definition.
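    One possible way to shrink a large XML file before import is sketched below in Python: it keeps only the first child of each tag name at every level, so the hierarchy survives while repeated data is dropped. The helper `prune_repeats` is a hypothetical name, not an Informatica utility, and the approach assumes the first occurrence of each element is representative; optional children that appear only in later occurrences would be lost.

```python
import xml.etree.ElementTree as ET


def prune_repeats(element):
    """Keep only the first child of each tag name, recursively, in place."""
    seen = set()
    for child in list(element):  # iterate over a copy while removing
        if child.tag in seen:
            element.remove(child)
        else:
            seen.add(child.tag)
            prune_repeats(child)
    return element


big = ET.fromstring(
    "<employee>"
    "<emp><id>1</id></emp>"
    "<emp><id>2</id></emp>"
    "<emp><id>3</id></emp>"
    "</employee>"
)
small = prune_repeats(big)
print(ET.tostring(small, encoding="unicode"))
```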


    Target from XML:

    You can create an XML target definition from an XML, DTD, or XML schema file. You can also create an XML target definition from an XML source definition or from one or more relational source definitions.

    Rules for a Valid Group

    An XML group is valid when it follows these rules:

    Any element or attribute in an XML file can be included in a group.

    A group cannot contain two elements with a many-to-many relationship.

    Column names in the groups are unique within a source or target definition.

    Group names are unique within a source or target definition.

    The Designer validates any group you create or modify. When you try to create a group that does not follow these constraints, the Designer returns an error message and does not create the group.
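    The two uniqueness rules can be expressed as a small check. The Python sketch below is only an illustration of those rules, not how the Designer validates groups; `find_name_clashes` and the group and column names are hypothetical, and the many-to-many rule is not covered.

```python
from collections import Counter


def find_name_clashes(groups):
    """groups is a list of (group_name, [column_names]) pairs.

    Returns error strings for group or column names that occur more than
    once within the whole definition.
    """
    errors = []
    group_counts = Counter(name for name, _ in groups)
    errors += [f"duplicate group name: {n}" for n, c in group_counts.items() if c > 1]
    column_counts = Counter(col for _, cols in groups for col in cols)
    errors += [f"duplicate column name: {n}" for n, c in column_counts.items() if c > 1]
    return errors


definition = [
    ("COMPANY", ["COMPANY_ID", "COMPANY_NAME"]),
    ("EMP", ["EMP_ID", "COMPANY_ID"]),  # COMPANY_ID clashes with the first group
]
errors = find_name_clashes(definition)
print(errors)
```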


    Note: If the target definition consists of only one group, then it does not require a primary key or a foreign key.

    Normalized Groups

    A normalized group is a valid group that contains only one multiple-occurring element. In most cases, XML sources contain more than one multiple-occurring element and convert to more than one normalized group.

    The following rules apply to normalized groups:

    A normalized group must be a valid group.

    A normalized group cannot contain more than one multiple-occurring element.

    Denormalized Groups

    A denormalized group has more than one multiple-occurring element. The multiple-occurring elements can have a one-to-many relationship, but not a many-to-many relationship. All the elements in a denormalized group belong to the same parent chain.

    Source definitions can have denormalized groups, but target definitions cannot have denormalized groups.

    Denormalized groups, like denormalized relational tables, generate duplicate data. They can also generate null data. Make sure you filter out any unwanted duplicate or null data before passing data to the target.

    The following rules apply to denormalized groups:

    A denormalized group must be a valid group.

    A denormalized group can contain more than one multiple-occurring element.

    Multiple-occurring elements in a denormalized group must have a one-to-many relationship.

    Denormalized groups can exist in a source definition, but not in a target definition.
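    To make the difference concrete, the Python sketch below flattens one small document both ways. The `<skill>` element and both function names are hypothetical, and the choice to emit a null row for an `<emp>` with no skills is an assumption of this sketch rather than documented Informatica behavior.

```python
import xml.etree.ElementTree as ET

# <emp> and <skill> are both multiple-occurring and sit in one parent chain.
DOC = """<employee>
  <emp><id>1</id><skill>XML</skill><skill>SQL</skill></emp>
  <emp><id>2</id></emp>
</employee>"""


def denormalized_rows(root):
    """One group holding both multiple-occurring elements."""
    rows = []
    for emp in root.findall("emp"):
        emp_id = emp.findtext("id")
        skills = [s.text for s in emp.findall("skill")] or [None]
        for skill in skills:
            rows.append((emp_id, skill))  # emp_id repeats once per skill
    return rows


def normalized_groups(root):
    """Two groups, each with a single multiple-occurring element."""
    emp_rows, skill_rows = [], []
    for emp in root.findall("emp"):
        emp_id = emp.findtext("id")
        emp_rows.append((emp_id,))
        for skill in emp.findall("skill"):
            skill_rows.append((emp_id, skill.text))
    return emp_rows, skill_rows


root = ET.fromstring(DOC)
print(denormalized_rows(root))
print(normalized_groups(root))
```

    In the denormalized output the id `1` is repeated and the second employee carries a null skill, which is the duplicate and null data the text warns about.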

    Group Keys and Relationships

    The relationship between elements in the XML hierarchy translates into a combination of primary and foreign keys that define the relationship between XML groups. If you define a key in the XML hierarchy, the Designer uses it as a primary key in a group. The Designer handles group keys and relationships differently for sources and targets.

    In a source definition, a group does not have to be related to any other group. A denormalized group can be independent of any other group. Therefore, groups in a source definition do not require primary or foreign keys. However, if a group is related to another group based on the XML hierarchy, and you do not designate any column as a key for the group, the Designer creates a column called the Generated Primary Key to hold a key for the group.


    In a target definition, each group must be related to one other group. Therefore, each group needs at least one key to establish its relationship with another group. If you do not designate any column as a key for a group, the Designer creates a column called Group Link Key to hold a key for the group.

    When you run a session with a mapping that contains an XML source, the Informatica Server generates the values for the generated primary key columns in the source definition. When you run a session with a mapping that contains an XML target, you need to pass the values to the group link columns in the target groups from the data in the pipeline.
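    The idea of a generated key linking a parent group to a child group can be sketched in Python as follows. This is only an illustration: the `<orders>` document, the `PK` and `FK` column names, the function name, and the sequential integer keys are all assumptions, and the actual values the Informatica Server generates are not described here.

```python
import itertools
import xml.etree.ElementTree as ET


def split_with_generated_keys(root, parent_tag, child_tag):
    """Return (parent_rows, child_rows).

    Each parent row receives a generated key, and each child row carries
    that key as its foreign key.
    """
    next_key = itertools.count(1)
    parent_rows, child_rows = [], []
    for parent in root.iter(parent_tag):
        pk = next(next_key)
        parent_rows.append({"PK": pk})
        for child in parent.findall(child_tag):
            child_rows.append({"FK": pk, "value": child.text})
    return parent_rows, child_rows


doc = ET.fromstring(
    "<orders>"
    "<order><item>a</item><item>b</item></order>"
    "<order><item>c</item></order>"
    "</orders>"
)
parents, children = split_with_generated_keys(doc, "order", "item")
print(parents)
print(children)
```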

    Group keys and relationships follow these rules:

    Any element or attribute can be marked as a key.

    A group can have only one primary key.

    A group can be related to only one other group, and therefore can have only one foreign key.

    A column cannot be marked as both a primary key and a foreign key.

    A key column can be a column that points to an element in the hierarchy or a column created by the Designer. A group can have a combination of the two types of key columns.

    A source group does not require a key.

    A target group requires at least one key.

    The target root group requires a primary key. It does not require a foreign key.

    A target leaf group requires a foreign key. It does not require a primary key.

    A foreign key always refers to a primary key in another group. Self-referencing keys are not allowed.

    A foreign key column created by the Designer always refers to a primary key column created by the Designer.

    Code Pages

    XML files contain an encoding declaration that indicates the code page used in the file. The most commonly used code pages in XML are UTF-8 and UTF-16. All XML parsers support these two code pages. For information on the XML character encoding specification, go to the W3C website at http://www.w3c.org.

    PowerCenter and PowerMart support the same set of code pages for XML files that they support for relational databases and other flat files. You can use any code page supported by both Informatica and the XML specification. For a list of code pages that Informatica supports, see "Code Pages" in the Installation and Configuration Guide. Informatica does not support any user-defined code page.

    For XML source definitions, PowerCenter and PowerMart use the repository code page. When you import a source definition from an XML file, the Designer displays the code page declared in the file for verification only. It does not use the code page declared in the XML file.


    For XML target definitions, PowerCenter and PowerMart use the code page declared in the XML file. If Informatica does not support the declared code page, the Designer returns an error. You cannot import the target definition.
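    Before importing a definition it can help to see which code page a file declares. The Python sketch below reads the encoding declaration from the start of a file's bytes. The function name `declared_encoding` is hypothetical, and the sketch assumes an ASCII-compatible encoding (a UTF-16 file would not match the pattern) and falls back to UTF-8 when no declaration is present.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data, default="UTF-8"):
    """Return the encoding named in the XML declaration, or the default."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # skip a UTF-8 byte order mark
    match = ENCODING_RE.match(data)
    return match.group(1).decode("ascii") if match else default


with_decl = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
without_decl = b"<a/>"
print(declared_encoding(with_decl))
print(declared_encoding(without_decl))
```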

    XML writer:

    Verify that the XML environment is set up correctly: the environment variables are set properly, the .dll files are in the correct location on Windows (or the shared libraries on UNIX), and the supporting .dat files are present.

    How XML sources and targets look in Informatica

    XML Source:

    Each group in an XML definition is analogous to a relational table, and the Designer treats each group within the XML Source Qualifier as a separate source of data.

    In a mapping, the ports of one group in an XML Source Qualifier can be part of more than one data flow. However, the ports of more than one group in the same XML Source Qualifier cannot link to one transformation or be part of the same data flow. This is the biggest drawback with XML sources. If you need to use data from two different XML source definitions, you can link a group from each source qualifier and join the data in a Joiner transformation. You can also use the same source definition more than once in a mapping. Connect each source definition to a different XML Source Qualifier and join the groups in a Joiner transformation. The following figure shows how we can join two XML groups in the same mapping using a Joiner transformation.
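    Conceptually, joining two XML groups on a shared key resembles an ordinary keyed join. The Python sketch below shows a simple hash join over two lists of rows; it is not the Joiner transformation itself, and the function, the parameter names `master_rows` and `detail_rows`, and the sample rows are hypothetical. Only matching rows are kept, as in an inner join.

```python
def join_groups(master_rows, detail_rows, key):
    """Index the master rows by key, then stream the detail rows past the index."""
    index = {}
    for row in master_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for detail in detail_rows:
        for master in index.get(detail[key], []):
            joined.append({**master, **detail})
    return joined


companies = [{"company_id": "01", "company": "Wipro Technologies"}]
employees = [
    {"company_id": "01", "emp": "Dileep"},
    {"company_id": "02", "emp": "Asha"},  # no matching company, so dropped
]
joined = join_groups(companies, employees, "company_id")
print(joined)
```

    Because the whole master side is held in memory while the detail side is streamed, building the index from the smaller of the two groups keeps memory use down.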


    If we need to load data from several groups to the same target, based on the granularity, it's always better to split the work into 2 or 3 mappings and load the data to the target.

    When we create a session to extract data from an XML source, we need to configure source properties, such as the source file location, in the session properties. Define the XML source properties in the Properties settings on the Sources tab.


    XML Target:

    The following figure shows how an XML target looks in Informatica Designer.

    When you configure a session to load data to an XML target, you define properties on the Targets tab and the Transformations tab of the session properties. You can configure the following properties for XML targets:


    Output file options. You can configure the directory and file name to which the Informatica Server writes the target file.

    Code page. You can define the code page declared in the XML target file. Use the Set File Properties button to define the code page.

    Duplicate Group Row Handling. You can configure how the Informatica Server handles duplicate rows.

    DTD/Schema Reference. You can specify a DTD or an XML schema file name for the XML target.

    Points to take care of while using XML as a source or target:

    The code page used in the XML/DTD/XML Schema file should be valid and supported by Informatica. Take care when creating the file that its actual encoding matches the declared one. For example, a file declared as UTF-8 should actually be saved as UTF-8, not ANSI.

    If a DTD/XML Schema file is associated with the source/target, the XML data file should match the DTD/XML Schema file exactly.

    If the XML source holds a large amount of data, or a large amount of data must be loaded to the XML target, divide it into smaller modules according to the business requirement. Informatica will not be able to read or write very large XML files.

    If the source/target DTD/XML schema file changes, always re-import the source/target.

    Always make sure that the data type and size of the imported XML metadata are correct and match the requirement. By default, the import assigns only number or string as the data type and 10 as the size.

    Whenever we join two groups in the Joiner transformation, we need to make sure that we select the smaller group/set as the Master group.

    If we have XML as the target, we should always make sure that the data sent to the target matches the cardinality defined in the target DTD/XML Schema file/XML file.

    If we have XML as the source, decide whether the groups in the source should be normalized or denormalized based on the requirement. But make sure that the XML sources contain only one multiple-occurring element.

    An XML target can never be denormalized.
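    The first point, that a file's real encoding should match its declaration, can be checked roughly by trying to decode the bytes with the declared code page. The Python sketch below assumes "ANSI" means the Windows-1252 code page (`cp1252`), which is common on Western-locale Windows but is an assumption here; `matches_declared_encoding` is a hypothetical helper.

```python
def matches_declared_encoding(data, encoding):
    """Return True if the bytes decode cleanly under the declared encoding."""
    try:
        data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


text = '<?xml version="1.0" encoding="UTF-8"?><name>café</name>'
saved_as_utf8 = text.encode("utf-8")
saved_as_ansi = text.encode("cp1252")  # declared UTF-8 but stored as ANSI

print(matches_declared_encoding(saved_as_utf8, "UTF-8"))
print(matches_declared_encoding(saved_as_ansi, "UTF-8"))
```

    A clean decode is necessary but not sufficient: a file containing only ASCII characters decodes identically under both code pages, so the check only catches mismatches once non-ASCII characters are present.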