irfan-ali
Working with Informatica and XML processing
5/28/2018 XML With Informatica
XML & XML with Informatica
XML
My best description of XML is this: XML is a cross-platform, software- and hardware-
independent tool for transmitting information.
XML is used to Exchange Data
With XML, data can be exchanged between incompatible systems.
In the real world, computer systems and databases contain data in incompatible formats.
One of the most time-consuming challenges for developers has been to exchange data
between such systems over the Internet.
Converting the data to XML can greatly reduce this complexity and create data that can be
read by many different types of applications.
XML, DTD, and XML Schema
Extensible Markup Language (XML) is a markup language generally regarded as the universal
format for structured documents and data on the Web. Like HTML, XML contains element
tags and attributes that define data. Unlike HTML, XML element tags and attributes are
not based on a predefined, static set of elements and attributes. Every XML file can have a
different set of tags and attributes. Document Type Definition (DTD) files and XML
schema files define the elements and attributes that can be used and the structure within
which they fit in an XML file.
DTD and XML schema files specify the structure and content of XML files in different
ways. A DTD file defines the names of elements, the number of times they occur, and
how they fit together. The XML schema file provides the same information plus the
datatypes of the elements.
DTD
The purpose of a DTD is to define the legal building blocks of an XML document. It defines
the document structure with a list of legal elements. A DTD can be declared inline in your
XML document, or as an external reference.
The DTD file contains only metadata. It contains the description of the structure and the
definition of the elements and attributes that can be found in the associated XML file. It
does not contain any data.
A sample DTD looks like this:
<!ELEMENT employees (companyname, employee) >
<!ELEMENT companyname (id, name) >
<!ELEMENT employee (emp+) >
<!ELEMENT emp (id, info) >
<!ELEMENT info (name, age, sex, job, sal) >
<!ELEMENT created-date (format, timestamp) >
<!ELEMENT id (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT format (#PCDATA) >
<!ELEMENT timestamp (#PCDATA) >
e.g.:
<employees>
  <companyname>
    <id>45</id>
    <name>Wipro Technologies</name>
  </companyname>
  <employee>
    <emp>
      <id>75444</id>
      <info>
        <name>Dileep</name>
        <age>89</age>
        <sex>Male</sex>
        <job>Project Engineer</job>
        <sal>84444</sal>
      </info>
    </emp>
  </employee>
</employees>
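As a rough illustration of how an application might read a document with this employees/employee/emp hierarchy, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample string, its values, and the function name read_employees are assumptions made for the example; they are not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the same element hierarchy; all values are placeholders.
SAMPLE = """<employees>
  <companyname><id>1</id><name>Example Co</name></companyname>
  <employee>
    <emp>
      <id>100</id>
      <info><name>Asha</name><age>30</age><sex>F</sex><job>Engineer</job><sal>5000</sal></info>
    </emp>
  </employee>
</employees>"""


def read_employees(xml_text):
    """Return one dict per <emp>, with the company name repeated on each row."""
    root = ET.fromstring(xml_text)
    company = root.findtext("companyname/name")
    rows = []
    for emp in root.iterfind("employee/emp"):
        row = {"company": company, "emp_id": emp.findtext("id")}
        info = emp.find("info")
        if info is not None:
            # Copy every child of <info> (name, age, sex, job, sal) into the row.
            row.update({child.tag: child.text for child in info})
        rows.append(row)
    return rows


rows = read_employees(SAMPLE)
print(rows)
```

Every value comes back as text; the DTD declares elements only as character data, so any typing has to be applied by the reader.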
XML Schema
The XML schema file, like the DTD file, contains only metadata. In addition to the
definition and structure of elements and attributes, an XML schema contains a description
of the type of elements and attributes found in the associated XML file.
A sample XML Schema file looks like this:
<xs:element name=…
<xs:complexType>
<xs:sequence>
<xs:element ref=…
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name=…
<xs:complexType>
<xs:sequence>
<xs:element name=…
<xs:sequence>
<xs:element name=…
<xs:element name=…
CDATA means character data, i.e. if we have a character data element declared as CDATA,
then all characters, text, or data inside the XML tags will not be parsed by the XML
parser. If the text contains a lot of [...]
Parsing very large XML files and extracting metadata for source or target definitions
requires a lot of memory and resources. To ensure that the Designer creates an
XML source or target definition quickly and efficiently, Informatica recommends that you
import source or target definitions only from XML files that are no larger than 100K or
from DTD or XML schema files. If you want to import from a very large XML file that has
no DTD or XML schema file, decrease the size of the XML file by deleting duplicate data
elements. You do not need all of your data to import an XML source or target definition. You
need only enough data to accurately show the hierarchy of your XML file and enable the
Designer to create a source or target definition.
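One way to shrink a large XML file before import is to drop repeated sibling elements while leaving the hierarchy intact. The sketch below is only an illustration of that idea in Python; the function name trim_repeats and the choice to keep two occurrences of each repeated element (so that the repetition itself stays visible) are assumptions, not an Informatica recommendation.

```python
import xml.etree.ElementTree as ET


def trim_repeats(elem, keep=2):
    """Remove siblings beyond the first `keep` of each tag name, recursively."""
    counts = {}
    for child in list(elem):  # iterate over a copy because we remove while looping
        counts[child.tag] = counts.get(child.tag, 0) + 1
        if counts[child.tag] > keep:
            elem.remove(child)
        else:
            trim_repeats(child, keep)
    return elem


# Hypothetical oversized document: 1000 <emp> elements with the same shape.
big = "<employee>" + "<emp><id>1</id><name>x</name></emp>" * 1000 + "</employee>"
root = trim_repeats(ET.fromstring(big))
print(len(root.findall("emp")))
```

If some occurrences carry optional children that the first ones lack, a trim like this would lose them, so the result should be checked against the full hierarchy before it is used for import.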
Target from XML:
You can create an XML target definition from an XML, DTD, or XML schema file. You can
also create an XML target definition from an XML source definition or from one or more
relational source definitions.
Rules for a Valid Group
An XML group is valid when it follows these rules:
Any element or attribute in an XML file can be included in a group.
A group cannot contain two elements with a many-to-many relationship.
Column names in the groups are unique within a source or target definition.
Group names are unique within a source or target definition.
The Designer validates any group you create or modify. When you try to create a group that
does not follow these constraints, the Designer returns an error message and does not
create the group.
Note: If the target definition consists of only one group, then it does not require a primary
key or a foreign key.
Normalized Groups
A normalized group is a valid group that contains only one multiple-occurring element. In
most cases, XML sources contain more than one multiple-occurring element and convert to
more than one normalized group.
The following rules apply to normalized groups:
A normalized group must be a valid group.
A normalized group cannot contain more than one multiple-occurring element.
Denormalized Groups
A denormalized group has more than one multiple-occurring element. The multiple-occurring
elements can have a one-to-many relationship, but not a many-to-many relationship. All the
elements in a denormalized group belong to the same parent chain.
Source definitions can have denormalized groups, but target definitions cannot have
denormalized groups.
Denormalized groups, like denormalized relational tables, generate duplicate data. They can
also generate null data. Make sure you filter out any unwanted duplicate or null data before
passing data to the target.
The following rules apply to denormalized groups:
A denormalized group must be a valid group.
A denormalized group can contain more than one multiple-occurring element.
Multiple-occurring elements in a denormalized group must have a one-to-many
relationship.
Denormalized groups can exist in a source definition, but not in a target definition.
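To make the difference concrete, the following Python sketch flattens a small hierarchy both ways. It is a conceptual illustration only, not how the Designer builds groups; the document, the element names (store, order, item) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <order> and <item> are both multiple-occurring,
# with a one-to-many relationship along the same parent chain.
DOC = """<store>
  <order><id>1</id><item>pen</item><item>ink</item></order>
  <order><id>2</id></order>
</store>"""


def denormalized_rows(root):
    """One row per item; the order id repeats, and an order with no items yields a null."""
    rows = []
    for order in root.iterfind("order"):
        order_id = order.findtext("id")
        items = [item.text for item in order.iterfind("item")] or [None]
        for item in items:
            rows.append((order_id, item))
    return rows


def normalized_groups(root):
    """Two groups, each built around a single multiple-occurring element."""
    orders = [(o.findtext("id"),) for o in root.iterfind("order")]
    items = [(o.findtext("id"), i.text)
             for o in root.iterfind("order") for i in o.iterfind("item")]
    return orders, items


root = ET.fromstring(DOC)
print(denormalized_rows(root))
print(normalized_groups(root))
```

In the denormalized output the order id "1" appears twice and order "2" carries a null item, which is the kind of duplicate and null data that needs filtering before it reaches a target.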
Group Keys and Relationships
The relationship between elements in the XML hierarchy translates into a combination of
primary and foreign keys that define the relationship between XML groups. If you define a
key in the XML hierarchy, the Designer uses it as a primary key in a group. The Designer
handles group keys and relationships differently for sources and targets.
In a source definition, a group does not have to be related to any other group. A
denormalized group can be independent of any other group. Therefore, groups in a source
definition do not require primary or foreign keys. However, if a group is related to another
group based on the XML hierarchy, and you do not designate any column as a key for the
group, the Designer creates a column called the Generated Primary Key to hold a key for
the group.
In a target definition, each group must be related to one other group. Therefore, each
group needs at least one key to establish its relationship with another group. If you do not
designate any column as a key for a group, the Designer creates a column called Group Link
Key to hold a key for the group.
When you run a session with a mapping that contains an XML source, the Informatica
Server generates the values for the generated primary key columns in the source definition.
When you run a session with a mapping that contains an XML target, you need to pass the
values to the group link columns in the target groups from the data in the pipeline.
Group keys and relationships follow these rules:
Any element or attribute can be marked as a key.
A group can have only one primary key.
A group can be related to only one other group, and therefore can have only one
foreign key.
A column cannot be marked as both a primary key and a foreign key.
A key column can be a column that points to an element in the hierarchy or a column
created by the Designer. A group can have a combination of the two types of key columns.
A source group does not require a key.
A target group requires at least one key.
The target root group requires a primary key. It does not require a foreign key.
A target leaf group requires a foreign key. It does not require a primary key.
A foreign key always refers to a primary key in another group. Self-referencing
keys are not allowed.
A foreign key column created by the Designer always refers to a primary key column
created by the Designer.
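The idea of a generated key linking a child group to its parent can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Informatica Server's algorithm: the document, the column names PK_emp and FK_emp, and the simple counter used for key values are all hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical source in which no element was designated as a key.
DOC = """<employee>
  <emp><name>A</name><phone>111</phone><phone>222</phone></emp>
  <emp><name>B</name><phone>333</phone></emp>
</employee>"""


def build_groups(root):
    """Give each parent row a generated primary key and point child rows at it."""
    next_key = itertools.count(1)
    emp_group, phone_group = [], []
    for emp in root.iterfind("emp"):
        pk = next(next_key)
        emp_group.append({"PK_emp": pk, "name": emp.findtext("name")})
        for phone in emp.iterfind("phone"):
            # The child's foreign key refers to the parent's primary key.
            phone_group.append({"FK_emp": pk, "phone": phone.text})
    return emp_group, phone_group


emp_group, phone_group = build_groups(ET.fromstring(DOC))
print(emp_group)
print(phone_group)
```

Each group here has one primary key or one foreign key, and the foreign key refers to a key in another group, in line with the rules listed above.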
Code Pages
XML files contain an encoding declaration that indicates the code page used in the file. The
most commonly used code pages in XML are UTF-8 and UTF-16. All XML parsers support
these two code pages. For information on the XML character encoding specification, go to
the W3C website at http://www.w3c.org.
PowerCenter and PowerMart support the same set of code pages for XML files that they
support for relational databases and other flat files. You can use any code page supported
by both Informatica and the XML specification. For a list of code pages that Informatica
supports, see "Code Pages" in the Installation and Configuration Guide. Informatica does not
support any user-defined code page.
For XML source definitions, PowerCenter and PowerMart use the repository code page.
When you import a source definition from an XML file, the Designer displays the code page
declared in the file for verification only. It does not use the code page declared in the XML
file.
For XML target definitions, PowerCenter and PowerMart use the code page declared in the
XML file. If Informatica does not support the declared code page, the Designer returns an
error. You cannot import the target definition.
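Before importing a file, it can help to confirm that the encoding declared in the XML declaration matches the bytes actually saved on disk. The Python sketch below is one possible check, assuming an ASCII-compatible encoding such as UTF-8 (a UTF-16 file would need byte-order-mark handling that is not shown); the function names are hypothetical.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
DECL = re.compile(rb"""^<\?xml[^>]*encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data, default="UTF-8"):
    """Return the declared encoding name, or the XML default when none is declared."""
    match = DECL.match(data)
    return match.group(1).decode("ascii") if match else default


def decodes_as_declared(data):
    """True if the raw bytes can be decoded with the encoding the file declares."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        return False


text = '<?xml version="1.0" encoding="UTF-8"?><name>Ren\u00e9</name>'
print(decodes_as_declared(text.encode("utf-8")))   # saved as UTF-8
print(decodes_as_declared(text.encode("cp1252")))  # saved as ANSI but declared UTF-8
```

A file that declares UTF-8 but was saved in an ANSI code page fails this check as soon as it contains a non-ASCII character, which is the mismatch to watch for when creating XML files by hand.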
XML Writer:
Verify that the XML environment is set up correctly: the environment variables are set
properly, the .dll files are in the correct location on Windows (or the shared libraries on
UNIX), and the supporting .dat files are present.
How XML sources & targets look in Informatica?
XML Source:
Each group in an XML definition is analogous to a relational table, and the Designer treats
each group within the XML Source Qualifier as a separate source of data.
In a mapping, the ports of one group in an XML Source Qualifier can be part of more than
one data flow. However, the ports of more than one group in the same XML Source Qualifier
cannot link to one transformation or be part of the same data flow. This is the biggest
drawback with XML sources. If you need to use data from two different XML source
definitions, you can link a group from each source qualifier and join the data in a Joiner
transformation. You can also use the same source definition more than once in a mapping.
Connect each source definition to a different XML Source Qualifier and join the groups in a
Joiner transformation. The following figure shows how we can join two XML groups in the
same mapping using a Joiner transformation.
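Conceptually, joining two XML groups on a shared key works like an equi-join between two tables. The Python sketch below illustrates that idea only; it is not the Joiner transformation's implementation, and the group contents, the key name emp_id, and the function name join_groups are hypothetical.

```python
def join_groups(master_rows, detail_rows, key):
    """Inner-join two lists of row dicts on `key`, indexing the master side."""
    index = {}
    for row in master_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in detail_rows:
        # Detail rows without a matching master row are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical rows from two groups that share an emp_id column.
emp_group = [{"emp_id": "1", "name": "A"}, {"emp_id": "2", "name": "B"}]
phone_group = [
    {"emp_id": "1", "phone": "111"},
    {"emp_id": "1", "phone": "222"},
    {"emp_id": "3", "phone": "999"},
]
joined = join_groups(emp_group, phone_group, "emp_id")
print(joined)
```

The side that gets indexed is held in memory in full, which is why passing the smaller group as the master side keeps the lookup structure small.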
If we need to load data from several groups to the same target, then depending on the
granularity it is always better to divide the mapping into 2 or 3 mappings and load the
data to the target.
When we create a session to extract data from an XML source, we need to configure source
properties, such as the source file location, in the session properties. Define the XML source
properties on the Properties settings on the Sources tab.
XML Target:
The following figure shows how an XML target looks in Informatica Designer.
When you configure a session to load data to an XML target, you define properties on the
Targets tab and the Transformations tab of the session properties. You can configure the
following properties for XML targets:
Output file options. You can configure the directory and file name to which the
Informatica Server writes the target file.
Code page. You can define the code page declared in the XML target file. Use the Set File
Properties button to define the code page.
Duplicate Group Row Handling. You can configure how the Informatica Server handles
duplicate rows.
DTD/Schema Reference. You can specify a DTD or an XML schema file name for the XML
target.
Points to be taken care of while using XML as source or target:
The code page used in the XML/DTD/XML Schema file should be a valid one that is
supported by Informatica. Take care when creating the file that its actual encoding
matches the declared one. For example, a file declared as UTF-8 should actually be saved
as UTF-8, not as ANSI.
If we have a DTD/XML Schema file associated with the source/target, then the
XML data file should exactly match the DTD/XML Schema file.
If we have a large amount of data in the XML source, or have to load a large amount of
data to our XML target, then divide it into smaller modules according to the business
requirement. Informatica will not be able to read or write bigger XML files.
If there are any changes to the source/target DTD/XML schema file, always re-import
the source/target.
Always make sure that the data type and size of the imported XML metadata are
correct and match the requirement. By default it will take only number and
string as data types for all data, and size as 10.
Whenever we join two groups in the Joiner transformation, we need to make sure that
we select the smaller group/set as the Master group.
If we have XML as target, we should always make sure that the data sent to the
target matches the cardinality defined in the target DTD/XML Schema
file/XML file.
If we have XML as source, decide whether the groups in the source are to be normalized or
de-normalized based on our requirement. But make sure that the XML sources
contain only one multiple-occurring element.
An XML target can never be a de-normalized one.