
  • 7/30/2019 Ds Lineage


    Design InfoSphere DataStage jobs for optimum lineage


    Design your IBM InfoSphere DataStage jobs to ensure that complete metadata is available for lineage reports in IBM InfoSphere Metadata Workbench.

    When an IBM InfoSphere DataStage and QualityStage job is developed, the information that is included in the job is called design metadata. When you design a job, you build the data flow from a source of the job to a target in the job.

    IBM InfoSphere Metadata Workbench uses design metadata to build lineage reports that analyze the flow of data from source to target. The lineage analysis establishes relationships and links between job assets and stages. In addition, InfoSphere Metadata Workbench uses the design metadata to identify the sources that the job stages read from or write to. This metadata includes the following information: the name of the database server or the data connection, the name of the database schema, any user-defined SQL statements, and the name and location of the data file.

    Information that flows across InfoSphere DataStage and QualityStage jobs is called design lineage. The data output of one job can be the data source of another job. In this case, the data source is shared between the two jobs. If a source of the job is not imported into the metadata repository, the design lineage metadata is used to infer the relationship with other jobs. This relationship is based on the shared usage of the referenced data source.
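    To make this inference concrete, here is a minimal Python sketch. It is not the Metadata Workbench implementation. It assumes each job's design metadata has already been reduced to two sets of data source identifiers: the sources the job reads and the sources it writes. `JobDesign`, `infer_job_links`, and the job and table names are hypothetical. A link is inferred whenever one job writes a data source that another job reads.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class JobDesign:
    """Hypothetical, simplified design metadata for one job."""
    name: str
    reads: set[str] = field(default_factory=set)   # data sources the job reads from
    writes: set[str] = field(default_factory=set)  # data sources the job writes to


def infer_job_links(jobs: list[JobDesign]) -> set[tuple[str, str, str]]:
    """Return (upstream job, downstream job, shared data source) triples."""
    # Index every data source by the jobs that write to it.
    writers: dict[str, list[str]] = defaultdict(list)
    for job in jobs:
        for source in job.writes:
            writers[source].append(job.name)

    # A job that reads a source written by another job is downstream of that job.
    links: set[tuple[str, str, str]] = set()
    for job in jobs:
        for source in job.reads:
            for writer in writers.get(source, []):
                if writer != job.name:
                    links.add((writer, job.name, source))
    return links


jobs = [
    JobDesign("LoadStaging", reads={"/data/in/customers.csv"}, writes={"STG.CUSTOMERS"}),
    JobDesign("LoadWarehouse", reads={"STG.CUSTOMERS"}, writes={"DW.DIM_CUSTOMER"}),
]
print(infer_job_links(jobs))  # {('LoadStaging', 'LoadWarehouse', 'STG.CUSTOMERS')}
```

    The sketch matches on the identifier string. It therefore links the two jobs only if both refer to the shared data source in exactly the same way, which is why consistent naming across jobs matters for lineage.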

    Use the following table of actions to ensure that your job design gives complete metadata for the best lineage results.

    Table 1. Actions to ensure complete job design metadata for data lineage

    Columns: Action | Description | How this action affects lineage | Additional information

    Action: Use Connector stages
    Description: Connector stages give the maximum amount of metadata about the job design. Therefore, use Connector stages instead of equivalent generic stages. For example, use the ODBC Connector stage rather than the ODBC Enterprise stage.
    How this action affects lineage: The Manage Lineage utility reads the design lineage metadata from the stages of the job and then infers the database or data file assets that the job reads from or writes to. Connector stages give the utility more information to work with.
    Additional information: For a list of job stages with their descriptions, see Alphabetical list of stages (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.stageix.doc/topics/master_stagesbytype.html). Whether a particular stage is displayed on the InfoSphere DataStage Designer client palette depends on the type of job and the installed products and add-ons.


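    Action: Use environment variables and job parameters
    Description: You can define variables and parameters to reuse across all jobs of a project by using environment variables and job parameters. Wherever possible, use parameters and parameter sets as common references across all jobs in a project.
    How this action affects lineage: The use of variables reduces errors and promotes reuse in job development.
    Additional information: For more information about how to set up job parameters and parameter sets, see Making your jobs adaptable (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.design.doc/topics/c_ddesref_Parameter_Sets.html). For general information about setting environment variables, see Guide to setting environment variables (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_Guide_to_Setting_Environment_Variables.html). For general information about environment variables, see Environment variables (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_Environment_Variables_environment.html).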

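    Action: Import project-level environment variables
    Description: Before you run lineage reports, you must import the project-level environment variables that you defined in InfoSphere DataStage into InfoSphere Metadata Workbench.
    How this action affects lineage: InfoSphere Metadata Workbench uses the environment variables to reconcile and link the job with referenced sources.
    Additional information: For information about how to import environment variables, see Import project-level environment variables (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/t_importingProjectLevelEnvironmentVariables.html).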

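    Action: Check the project-level environment variables
    Description: To list the environment variables that are defined for the project, use the dsadmin utility.
    Additional information: For information about how to run this utility, see Listing environment variables (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.cliapi.ref.doc/topics/r_dsvjbref_Listing_Environment_Variables.html).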

    Action: Load columns of database and file stages from shared metadata
    Description: Table definitions carry information about your source and target data, such as the name and structure of the database tables or files that contain your data. A table definition contains column definitions, which hold the column name, column length, data type, and other column properties, such as keys and null values.
    How this action affects lineage: InfoSphere Metadata Workbench requires table and column definitions to match imported database assets to jobs and to other assets in the metadata repository.
    Additional information: For more information about shared metadata in InfoSphere DataStage, see Shared metadata (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.design.doc/topics/workingwithsharedmetadata.html).

    Action: When you import a data file, ensure that its name and directory path are defined in the same way that they are defined in the stage
    Description: The name and directory path of the imported or shared data file must match the name and directory path in the stage.
    How this action affects lineage: If the name or directory path differs from the one in the stage, the data file and the stage cannot be linked correctly in the job data flow. As a result, the lineage is incorrect or incomplete.

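    Action: Use job parameters to define file names and directory paths
    Description: To minimize errors, use job parameters wherever possible.
    Additional information: For information about job parameters, see Job parameters (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.design.doc/topics/c_ddesref_Parameter_Sets.html).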


    Action: Use the default SQL statements rather than user-defined SQL
    Description: In InfoSphere Metadata Workbench, the schema and database table name of the imported database must be the same as the schema and table name in the stage. You can generate default SQL statements to read from and write to data sources. Alternatively, you can define your own SQL statements that read from and write to data sources.
    How this action affects lineage: The Manage Lineage utility parses all SQL statements to extract information about the schema, owner, database tables, and columns. The utility then maps this information to shared database tables that were previously imported. User-defined SQL that contains complex statements might not be parsed correctly. If statements are not parsed correctly, you must run the Manual Binding utility. This utility manually sets the relationships between stages and data sources and between stages and other stages.
    Additional information: For information about user-defined SQL in InfoSphere DataStage, see User-defined SQL (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitationshand-writtensql.html). For information about job design considerations and SQL, see Job design considerations (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitations.html).

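    Action: Set up a logging view and review the Metadata Workbench logs
    Description: You can view the log information in the IBM InfoSphere Information Server Web console.
    Additional information: For information about log views and their configuration in InfoSphere Metadata Workbench, see Log messages (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/c_logMessagesMDWB.html), Creating logging configurations (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/t_creatingLoggingConfigurationMDWB.html), and Creating log views (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/t_creatingLogViewMDWB.html).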

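    Action: Query InfoSphere DataStage jobs in InfoSphere Metadata Workbench
    Description: On the Discover tab, you can run the Job Design Usage published query to see the links between jobs and their sources. You can also construct your own queries to see the stage types of a project.
    Additional information: For general information about queries, see Queries (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.doc/topics/c_queriesMDWB.html). For information about creating queries, see Creating queries (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.doc/topics/t_creatingQueries.html).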
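    Several of the file-related actions in Table 1 reduce to one rule. After parameter values are substituted, the path in the stage must be identical to the path of the imported data file. The following Python sketch illustrates that comparison. It is a sketch under stated assumptions, not how InfoSphere Metadata Workbench performs the match:

    - It assumes the DataStage convention of referencing job parameters in stage properties as #Name#, as #SetName.Name# for a parameter set, or as #$NAME# for an environment variable used as a job parameter.
    - `resolve`, `paths_match`, and the parameter values are hypothetical.
    - The comparison is deliberately literal, including case.

```python
from __future__ import annotations

import re

# Assumed DataStage-style parameter reference: #SourceDir#, #FileSet.SourceDir#,
# or #$SOURCE_DIR# (an environment variable used as a job parameter).
PARAM_REF = re.compile(r"#([A-Za-z_$][\w.$]*)#")


def resolve(value: str, params: dict[str, str]) -> str:
    """Substitute every #Name# reference in a stage property with its value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return params[name]

    return PARAM_REF.sub(substitute, value)


def paths_match(stage_path: str, imported_path: str, params: dict[str, str]) -> bool:
    """True only when the resolved stage path is identical to the imported file path."""
    return resolve(stage_path, params) == imported_path


params = {"SourceDir": "/data/in", "FileName": "customers.csv"}
print(paths_match("#SourceDir#/#FileName#", "/data/in/customers.csv", params))  # True
print(paths_match("#SourceDir#/#FileName#", "/data/in/Customers.csv", params))  # False
```

    Defining the directory and file name once, as shared parameter values, keeps every stage that uses them producing the same path string. A one-character difference, such as the capital C in the second call, is enough to break the link.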
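    This document does not describe how the SQL parser in the Manage Lineage utility works internally. The following Python sketch only illustrates the general idea: pull schema-qualified table names out of a statement, then look them up among previously imported tables. It deliberately uses a regular expression instead of a real SQL parser. As a result, it handles a simple join correctly, but it reports the name of a common table expression as an unmatched table. A statement like that is the kind of user-defined SQL that might need the Manual Binding utility. `table_refs`, `bind`, and the table names are hypothetical.

```python
from __future__ import annotations

import re

# A table name, optionally schema-qualified, that follows FROM, JOIN, INTO, or UPDATE.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)?)",
    re.IGNORECASE,
)


def table_refs(sql: str) -> list[tuple[str | None, str]]:
    """Return (schema, table) pairs in upper case, in order of appearance."""
    refs = []
    for name in TABLE_REF.findall(sql):
        schema, _, table = name.upper().rpartition(".")
        refs.append((schema or None, table))
    return refs


def bind(sql: str, imported: set[tuple[str | None, str]]) -> tuple[list, list]:
    """Split the references into those that match an imported table and those that do not."""
    matched, unmatched = [], []
    for ref in table_refs(sql):
        (matched if ref in imported else unmatched).append(ref)
    return matched, unmatched


imported = {("STG", "CUSTOMERS"), ("STG", "REGIONS"), ("STG", "ORDERS")}

simple = "SELECT c.ID, r.NAME FROM STG.CUSTOMERS c JOIN STG.REGIONS r ON c.REGION_ID = r.ID"
print(bind(simple, imported))
# ([('STG', 'CUSTOMERS'), ('STG', 'REGIONS')], [])

with_cte = "WITH recent AS (SELECT * FROM STG.ORDERS) SELECT * FROM recent"
print(bind(with_cte, imported))
# ([('STG', 'ORDERS')], [(None, 'RECENT')])
```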

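    After you complete these actions, you are ready to set up InfoSphere Metadata Workbench to analyze metadata for lineage. Follow these steps:

    1. Run the Manage Lineage utility (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/t_runningAutomatedServices.html). This utility automatically runs the Manual Binding and Map Database Alias utilities.

    2. To identify schemas that are identical, run the Data Source Identity utility (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.admin.doc/topics/t_runningDataSourceIdentity.html). If two schemas are identified as identical, the database tables and database columns that the schemas contain are also marked as identical when their names match. Running this utility might be necessary when the same data source is imported into the repository by different means, such as by a connector and by a bridge.

    3. Run the data lineage report (http://publib.boulder.ibm.com/infocenter/iisinfsv/v9r1/topic/com.ibm.swg.im.iis.mdwb.doc/topics/t_runningDataFlowJobFlow.html). The data lineage report shows the movement of data within a job or through multiple jobs. The report can also show the order of activities in a run of a job.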
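    The Data Source Identity step describes a propagation rule. Once two schemas are marked identical, tables with matching names are marked identical too, and so are the columns with matching names inside those tables. This Python sketch shows only that rule. It assumes that each imported schema is represented as a mapping from table names to column names and that names must match exactly. It does not cover how the utility decides that the schemas themselves are identical. `propagate_identity` and the `conn/SALES` and `bridge/SALES` labels are hypothetical.

```python
from __future__ import annotations


def propagate_identity(
    schema_a: str, tables_a: dict[str, set[str]],
    schema_b: str, tables_b: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Pair the assets of two schemas that are already known to be identical."""
    pairs = [(schema_a, schema_b)]
    # Tables are paired only when the same name exists in both schemas.
    for table in sorted(tables_a.keys() & tables_b.keys()):
        pairs.append((f"{schema_a}.{table}", f"{schema_b}.{table}"))
        # Within a paired table, columns are paired only when their names match.
        for column in sorted(tables_a[table] & tables_b[table]):
            pairs.append((f"{schema_a}.{table}.{column}", f"{schema_b}.{table}.{column}"))
    return pairs


# The same schema imported twice: once through a connector and once through a bridge
# (in this made-up example the bridge import happens to lack the ORDERS table).
via_connector = {"CUSTOMERS": {"ID", "NAME"}, "ORDERS": {"ID", "TOTAL"}}
via_bridge = {"CUSTOMERS": {"ID", "NAME", "EMAIL"}}
for a, b in propagate_identity("conn/SALES", via_connector, "bridge/SALES", via_bridge):
    print(a, "<->", b)
```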
