SSIS Framework.pptx

  • 8/17/2019 SSIS Framework.pptx

    1/53

    SSIS – A BEGINNING FRAMEWORK

    SQL Saturday #15

    2 May 2009

    Eric Wisdahl

    http://ewisdahl.spaces.live.com

    OVERVIEW

    Most developers would agree that every SSIS solution has the same fundamental outline. A basic framework expedites development by handling the tasks common to all systems while allowing the developer to concentrate on the task at hand. This framework consists of many items, including but not limited to package configurations, logging, audit trails, error handling, and naming standards. This document presents an example framework which can be used as the basis for future SSIS package development.

    META SUB SYSTEMS

    The Meta subsystem contains information relating to auditing, data quality, data dictionaries, processing control tables, and configurations.

    Please see the documentation for the Meta-Data subsystem at http://ewisdahl.spaces.live.com for more information on the audit, data quality, and dictionary tables.

     AUDIT TABLES

    The Packages table stores the names and versions of the packages which are executing.

    The PackageExecutions table stores information relating to the package that is being run, the start and end dates, and whether or not the execution was successful.

    The TableProcessing table stores the statistics of the package execution: how many records were initially in the table, how many were inserted, how many were updated, how many errors there were, and how many records were in the table after execution.

    The DimAudit table ties the PackageExecutions and TableProcessing tables together for those packages which might have more than one entry in the TableProcessing table.
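
    The relationship between the four audit tables can be sketched as follows. This is only an illustration: the slides name the tables but not their columns, so every column name here is an assumption, and SQLite (via Python's sqlite3 module) stands in for the SQL Server META database.

```python
import sqlite3

# Hypothetical columns; only the four table names come from the slides.
SCHEMA = """
CREATE TABLE Packages (
    PackageKey     INTEGER PRIMARY KEY,
    PackageName    TEXT NOT NULL,
    PackageVersion TEXT NOT NULL
);
CREATE TABLE PackageExecutions (
    PackageExecutionKey INTEGER PRIMARY KEY,
    PackageKey          INTEGER NOT NULL REFERENCES Packages(PackageKey),
    StartDate           TEXT NOT NULL,
    EndDate             TEXT,
    Successful          INTEGER
);
CREATE TABLE TableProcessing (
    TableProcessingKey INTEGER PRIMARY KEY,
    TableName          TEXT NOT NULL,
    InitialRowCount    INTEGER,
    InsertCount        INTEGER,
    UpdateCount        INTEGER,
    ErrorCount         INTEGER,
    FinalRowCount      INTEGER
);
CREATE TABLE DimAudit (
    AuditKey            INTEGER PRIMARY KEY,
    PackageExecutionKey INTEGER NOT NULL
        REFERENCES PackageExecutions(PackageExecutionKey),
    TableProcessingKey  INTEGER NOT NULL
        REFERENCES TableProcessing(TableProcessingKey)
);
"""


def build_audit_db():
    """Create an in-memory database holding the sketched audit tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_execution(conn, package_name, version, table_stats):
    """Log one package run that touched one or more destination tables.

    table_stats is a list of tuples:
    (table name, initial rows, inserts, updates, errors, final rows).
    """
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO Packages (PackageName, PackageVersion) VALUES (?, ?)",
        (package_name, version),
    )
    package_key = cur.lastrowid
    cur.execute(
        "INSERT INTO PackageExecutions (PackageKey, StartDate) "
        "VALUES (?, datetime('now'))",
        (package_key,),
    )
    execution_key = cur.lastrowid
    for stats in table_stats:
        cur.execute(
            "INSERT INTO TableProcessing (TableName, InitialRowCount, "
            "InsertCount, UpdateCount, ErrorCount, FinalRowCount) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            stats,
        )
        table_processing_key = cur.lastrowid
        # DimAudit links this execution to each table it processed.
        cur.execute(
            "INSERT INTO DimAudit (PackageExecutionKey, TableProcessingKey) "
            "VALUES (?, ?)",
            (execution_key, table_processing_key),
        )
    cur.execute(
        "UPDATE PackageExecutions SET EndDate = datetime('now'), "
        "Successful = 1 WHERE PackageExecutionKey = ?",
        (execution_key,),
    )
    conn.commit()
    return execution_key
```

    A package that loads two tables produces one PackageExecutions row, two TableProcessing rows, and two DimAudit rows joining them.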

    DATA DICTIONARY TABLES

    The TableDictionary table stores information relating to the database, schema, and table from the database system tables, as well as user input information such as the description, grain, display name, business name, etc.

    The ColumnDictionary table stores information relating to the column name, data type, size, precision, scale, nullability, and default value from the database system tables, as well as user input information such as description, business name, display name, type of SCD dimension, example values, unknown member values, etc.

    DATA DICTIONARY TABLES (CONTINUED)

    The LogicalDataMap table stores information relating to how the data was input into the system. This includes the source system database, schema, table, field, and data type, as well as the ETL rules and any relevant comments.

    CONTROL TABLES

    The FrequencyTypes table holds information relating to types of date ranges. It is used by the ProcessingDates table below as an enumeration.

    The ProcessingDates table is a control table which holds a pointer to a filter as well as a start and end date range for the particular job to process. It also holds a pointer to the frequency type. The date range values in the ProcessingDates table are updated via a stored procedure based on the frequency type.

    The DictionaryDatabaseList table stores a list of attributes relating to the databases which will be looped through when processing the data dictionary tables.
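
    The date range update driven by the frequency type might look something like the sketch below. The slides do not show the stored procedure, so the frequency type names ("Daily", "Weekly", "Monthly") and the half-open range convention are assumptions made for illustration.

```python
from datetime import date, timedelta


def next_range(frequency_type, previous_end):
    """Return the (start, end) date range that follows previous_end.

    The range is half-open: start is included, end is excluded.
    The frequency type names are hypothetical.
    """
    start = previous_end
    if frequency_type == "Daily":
        end = start + timedelta(days=1)
    elif frequency_type == "Weekly":
        end = start + timedelta(weeks=1)
    elif frequency_type == "Monthly":
        # Jump past the end of the current month, then snap back to
        # the first day of the following month.
        first_of_month = start.replace(day=1)
        end = (first_of_month + timedelta(days=32)).replace(day=1)
    else:
        raise ValueError(f"Unknown frequency type: {frequency_type}")
    return start, end
```

    Because each new range starts where the previous one ended, a job that is skipped for a day simply picks up the next range on its following run.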

    CONFIGURATIONS TABLE

    The META environment also houses the SSIS Configuration table. It holds all of the SQL Server configurations that are used in the various SSIS packages. Please see SSIS Configurations, Expressions and Constraints on http://ewisdahl.spaces.live.com or BOL for an overview of SQL Server configurations.

    CONFIGURATIONS

    In this version of an SSIS framework, we use an environment variable to hold the connection string for the META database. In this fashion we form an indirect configuration that leads to the rest of the configurations to be performed.

    Once we have the connection to META, we use the SQL Server configuration table to populate the rest of the framework configurations as well as the remainder of the connection strings.

    When using configurations, always put the description for the variable or property with the configuration if possible, as this allows the next user to identify how the record(s) in the configuration table are being used.
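
    The two-step lookup described above (environment variable first, configuration table second) can be sketched outside of SSIS as follows. The environment variable name META_CONNECTION and the table name SSISConfigurations are hypothetical, SQLite stands in for SQL Server, and the column names follow the layout SSIS generates for a SQL Server configuration table.

```python
import os
import sqlite3

ENV_VAR = "META_CONNECTION"  # hypothetical environment variable name


def load_configurations(filter_name, environ=os.environ,
                        connect=sqlite3.connect):
    """Resolve META through the environment, then read its configurations.

    Returns a dict mapping each package path to its configured value
    for the given configuration filter.
    """
    # Step 1: the indirect configuration. Only the location of META
    # lives outside the database.
    meta_connection_string = environ[ENV_VAR]

    # Step 2: everything else is read from the configuration table.
    conn = connect(meta_connection_string)
    try:
        rows = conn.execute(
            "SELECT PackagePath, ConfiguredValue FROM SSISConfigurations "
            "WHERE ConfigurationFilter = ?",
            (filter_name,),
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

    Moving a package between environments then only requires changing the environment variable on the target server; the package itself is untouched.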

    CONFIGURATIONS – FRAMEWORK-AUDITPARAMETERS-SERVERNAME

    The ServerName configuration allows easy identification of which server the configurations are coming from as well as (presumably) which server the SSIS job was running from. It is further used in communicating back with the operator during error or completion emails.

    CONFIGURATIONS – FRAMEWORK-AUDITQUERYEXPRESSIONS

    The AuditQueryExpressions configurations are used to set the variable values which contain the SQL command strings (via expressions) for the Execute SQL tasks within the pre- and post-processing sequence containers.

    CONFIGURATIONS – FRAMEWORK-EMAILSETTINGS

    The EmailSettings configurations hold the values for the from and to email addresses. They also hold the expressions for the subject and body of the email sent when a package generates an error as well as when a package executes successfully.

    Note – There is an alternative configuration, Controller-EmailSettings, which houses the same information but with different values, that will be used in the control (master) packages.

    CONFIGURATIONS – FRAMEWORK-INDEXSCRIPTGENERATION

    The IndexScriptGeneration configuration houses the expressions for the Create and Delete Index Script queries. Note that this is for the query that generates the individual scripts and not for the scripts themselves. See the section on handling indexes in SSIS which follows.

    CONFIGURATIONS – FRAMEWORK-ROOTFOLDER

    The RootFolder configuration houses the UNC path to the folder which will contain subfolders for your log files, raw files, packages, Access databases, etc.

    NOTE – In the examples I am presenting I use the “C:\” named drive. This is bad practice. All paths within SSIS should be full UNC paths (\\servername.domainname\folder\subfolder\). However, I do not have shares set up on my personal laptop… This is an example of “Do as I say, not as I do!”

    CONFIGURATIONS – SMTPCONNECTIONMANAGER-CONNECTIONSTRING

    The SMTPConnectionManager-ConnectionString configuration houses the connection string to the local Exchange server (or other mail service).

    Note – As I do not have access to an Exchange server outside of work, my examples either have non-working email components or script tasks pointing to Gmail’s outward-facing SMTP server. This script task, or something similar, will need to be used in any situation where you need to pass security credentials to an email task, as the Send Mail task does not allow any security outside of Windows security.

    CONFIGURATIONS – OTHER

    If you have connection strings to a set of databases outside of the META database, it is often useful to include all of these connections within the framework as well, so that you do not have to continually recreate the connection managers or reset the configurations for these connection managers.

    Once the framework configurations are set up, it is important to realize that other configurations can and should be set for the individual packages as applicable. In the screenshot showing the package configuration organizer you can see an extra configuration, Dictionary-DynamicDatabaseConnectionString, that is relevant only to a particular package or set of packages, but not to the framework as a whole. This is normal behavior.

    LOGGING

    SSIS contains an internal logging mechanism to expose run-time events. This information can be sent to text files, a SQL Server Profiler file, the sysssislog table on an instance of SQL Server, the Windows event log, or an XML file. For our purposes, we use the text file logging mechanism. This creates a CSV file for each package, which is dynamically named with the package name and date.

    This file can be used to track down warnings and errors from the execution of the package, as well as to determine the last activity from the package if the package has hung. We have chosen the text file as it is a basic method of tracking errors which does not rely on any other system being up in order to function.

    In this framework I have included all logging events except for the OnPipeline events and the diagnostic events, as these add a lot of records to the log without providing details that I feel are really needed.
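
    The dynamic file naming can be sketched as below. In the framework this would be an SSIS expression on the log connection manager; the "Logs" subfolder name and the name_date.csv pattern are assumptions, since the slides only say that the name is built from the package name and date.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def log_file_path(root_folder, package_name, run_time):
    """Build a per-package, per-day log file path under the root folder.

    The 'Logs' subfolder and the file name pattern are hypothetical.
    """
    stamp = run_time.strftime("%Y%m%d")
    file_name = f"{package_name}_{stamp}.csv"
    return str(PureWindowsPath(root_folder) / "Logs" / file_name)
```

    Including the date in the name gives one file per package per day, which keeps individual files small enough to scan when hunting for the last event of a hung package.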

    LOGGING MENU ITEM

    LOGGING WIZARD

    LOGGING WIZARD 2

    FRAMEWORK VARIABLES

    Variables are used for a host of activities throughout the framework. Some variables are set by package configurations and others by expressions.

    There has been some effort to keep the variables organized by using the namespace property. To see the namespace property, open the variables window and select the “choose variable columns” button.

    FRAMEWORK VARIABLES

    This opens the choose variable columns window. Here you have the option to select from the scope, data type, value, namespace, and raise event when variable value changes columns. Check the namespace column.

    FRAMEWORK VARIABLES

    In the framework, we have created a collection of namespaces to hold related variables.

    The AuditParameter namespace currently houses information about the destination and source tables. It is necessary to fill out the variables in this namespace for every package in order to leave the proper audit trail.

    The AuditQuery namespace currently houses variables which use expressions to generate the SQL query or command used in the pre-processing and post-processing sequence containers (as well as the stop process task).

    FRAMEWORK VARIABLES

    The AuditVariable namespace houses the return values from the SQL queries, the insert, update, and error counts from the data flow, etc. Essentially any item used to track an audit item for the package will be stored in this namespace.

    The DateParameter namespace houses information relating to the processing dates record. The namespace contains the frequency type variable, which must be filled in for any package which wishes to make use of the processing dates table. This frequency type is combined with the package name in an expression to generate the processing dates filter. The DateParameter namespace further contains the processing date key and the start and end date ranges for this package (if a record is present in the processing dates table for the package).

    FRAMEWORK VARIABLES

    The Files namespace contains variables used to house network paths and file names. It includes variables that are set either via package configurations or via expressions.

    The Index namespace houses the queries that will generate the create and delete index scripts, the record sets that will house these scripts, and the individual variable that will hold one of these scripts at a time.

    The Key namespace houses any returned surrogate key values. As of this writing this is only used for the audit trail, although it is certainly possible to house any returned key within the namespace.

    FRAMEWORK VARIABLES

    The Query namespace houses any queries that are process related as opposed to relating to the audit or control procedures. An example is a query used to update the type 2 slowly changing dimension columns in a batch update (as opposed to a row-by-row approach within the data flow).

    The SSISEmail namespace holds variables related to emailing the operators and constructing the subject and body of emails to be sent out.

    The User namespace is the default namespace for SSIS. It will contain any variables which are added to a package built on the framework (unless you specify another namespace).

    SSIS AND INDEXES

    Indexes are known to have a great impact on performance when performing a large number of inserts or updates. As such, it is advisable to drop the indexes associated with any table that an SSIS package is processing before the load and to recreate them afterwards.

    We handle the creation and deletion of the indexes through a pair of expressions, stored as package configurations, which return a recordset of the scripts used in this process. We then loop through the recordset and execute each statement individually.
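
    The generate-then-loop pattern can be illustrated with the sketch below. It is not the framework's actual query: SQLite's sqlite_master catalog stands in for the SQL Server system tables, and the function names are made up for this example.

```python
import sqlite3


def index_scripts(conn, table):
    """Return (drop_scripts, create_scripts) for the indexes on a table.

    Each list holds one SQL statement per index, mirroring the recordset
    of scripts the framework loops through.
    """
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL",
        (table,),
    ).fetchall()
    drops = [f'DROP INDEX "{name}"' for name, _ in rows]
    creates = [sql for _, sql in rows]
    return drops, creates


def load_without_indexes(conn, table, insert_sql, rows):
    """Drop the table's indexes, bulk insert, then recreate the indexes."""
    # The create scripts must be captured before anything is dropped.
    drops, creates = index_scripts(conn, table)
    for statement in drops:
        conn.execute(statement)
    conn.executemany(insert_sql, rows)
    for statement in creates:
        conn.execute(statement)
    conn.commit()
```

    Capturing both script lists up front matters: once an index is dropped, the catalog no longer holds the definition needed to rebuild it.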

    STOP PROCESS

    The Stop Process task in the framework is used to determine whether or not this process has been run for the parent package before. This task uses the AuditQuery::StopProcessQuery variable as the source of the query and the AuditVariable::StopProcess variable to store the Boolean value returned by the query.

    Finally, the precedence constraint going into the pre-processing container is as follows:

    @[AuditVariable::StopProcess] == false || @[AuditParameter::ParentPackageExecutionKey] == -1
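
    Read as ordinary Boolean logic, the constraint lets the package proceed in two cases, sketched below. The reading of -1 as "no parent package" is an assumption; the slides give the expression but do not state what the sentinel means.

```python
def should_run_pre_processing(stop_process, parent_package_execution_key):
    """Mirror the precedence constraint on the pre-processing container.

    Proceed when the stop-process query returned False, or when the
    parent execution key is -1 (assumed here to mean the package was
    started on its own rather than by a control package).
    """
    return (not stop_process) or parent_package_execution_key == -1
```

    The only combination that halts the package is a True StopProcess value together with a real parent execution key.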

    PRE-PROCESSING CONTAINER

    The pre-processing sequence container houses the tasks used in determining the initial row counts and surrogate key for the destination table, creating the audit trail, generating the necessary control information for the package, and handling the indexes on the destination table.

    POST PROCESSING CONTAINER

    The post-processing sequence container houses the tasks used in determining the final row counts and surrogate key for the destination table, updating the audit trail, recreating the indexes on the destination table, sending out completion emails (where appropriate), and deleting any files which are no longer necessary.

    PROCESSING CONTAINER

    The processing container houses the tasks specific to the package being developed. It can be further broken down into sub-containers if desired.

    DATA FLOW TASKS

    Most of the activity in the processing sequence container will take place in a data flow task. Inside the data flow task, we like to keep certain items standardized across packages.

    COUNTS

    Extract – The number of rows pulled from the source system.

    Error Type 1 Update – The number of data errors encountered during the type 1 update branch.

    Error Type 2 Update – The number of data errors encountered during the type 2 update branch.

    Error Insert – The number of data errors encountered during the insertion of the records into the destination table.

    Failed Lookup – The number of rows that failed to find a match in a lookup transformation. Often used when building dimensions.

    Insert Standard – The number of rows inserted during standard processing.

    Insert Non-Standard – The number of rows inserted during non-standard processing (e.g. late arriving).

    No Change – The number of rows which did not change between what was input from the source system and what is currently stored in the destination.

    Update Type 1 – The number of rows updated during the processing of the SCD Type 1 branch.

    Update Type 2 – The number of rows updated during the processing of the SCD Type 2 branch.
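
    One way to picture these counts is as a single record per data flow, as in the sketch below. The field names simply mirror the list above. The reconciliation check is an assumption made for illustration: it supposes that every extracted row ends up in exactly one bucket, which will not hold for every package design.

```python
from dataclasses import dataclass


@dataclass
class RowCounts:
    """Standard data flow counts; field names mirror the list above."""
    extract: int = 0
    error_type1_update: int = 0
    error_type2_update: int = 0
    error_insert: int = 0
    failed_lookup: int = 0
    insert_standard: int = 0
    insert_non_standard: int = 0
    no_change: int = 0
    update_type1: int = 0
    update_type2: int = 0

    def total_errors(self):
        """Sum of the three data error counts."""
        return (self.error_type1_update + self.error_type2_update
                + self.error_insert)

    def unaccounted(self):
        """Extracted rows not explained by any other bucket.

        Assumes each extracted row lands in exactly one bucket.
        """
        accounted = (self.total_errors() + self.failed_lookup
                     + self.insert_standard + self.insert_non_standard
                     + self.no_change + self.update_type1
                     + self.update_type2)
        return self.extract - accounted
```

    A non-zero unaccounted value after a run is a cheap signal that a branch of the data flow is missing a row count transformation.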

    ERROR FILES

    Data errors are written to a raw file destination. All errors within the data flow should be brought together via a Union All operation with enough information to describe where the error occurred as well as what the error was.

    ONERROR EVENT HANDLER

    The OnError Event Handler is a set of tasks executed any time an error occurs while executing a package. These are errors that occur in the process itself, and they are different from data errors, provided the data errors are handled within the data flow task. Within the OnError Event Handler we determine whether or not we have already sent an error email for this package. If we have not previously sent an error email, we send one now to a list of recipients determined via package configuration. Afterwards we increment the counter so that we do not send a second error email.
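
    The send-once behaviour can be sketched as follows. In the framework the counter is a package variable and the email goes out through a Send Mail or script task; here the class name, the send callable, and the subject wording are stand-ins for illustration.

```python
class ErrorNotifier:
    """Send at most one error email per package execution."""

    def __init__(self, send, recipients):
        self.send = send                  # callable(recipients, subject, body)
        self.recipients = list(recipients)
        self.error_count = 0              # errors seen so far in this run

    def on_error(self, package_name, message):
        """Handle one OnError event."""
        if self.error_count == 0:
            # Only the first error of the run triggers an email.
            subject = f"Error during execution of the {package_name} package"
            self.send(self.recipients, subject, message)
        # Count every error so later ones are suppressed.
        self.error_count += 1
```

    Since a single failure often raises the OnError event several times as it bubbles up through containers, the counter keeps operators from receiving a burst of near-identical emails.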

    Sample Error Email:

    From: [email protected] [mailto:[email protected]]
    Sent: Tuesday, March 1, [year and time illegible]
    To: Report Team
    Subject: Error during execution of the [package name illegible] package.
    Importance: High

    There was an error in the execution of the [package name illegible] package which started at [timestamp illegible]. The following is the first error reported:

    [error text illegible]

    CONNECTION MANAGERS

    Connection managers should be created for every database which is used. The name should be the name of the database or file with no reference to the machine or account to be used (as these will change between environments). The connection managers that are common to the development efforts should be placed in the common template for a project and should have the connection string and descriptions set via package configuration. It is worth noting that having extra connection managers within a package that are not used carries a minimal cost when validating the package.

    If there are two separate connection managers to the same database, but with different connection manager types, assume that the OLE DB connection manager is the default and name any other connection managers with their type (for example, META and META.NET).

    HASH VALUES (CHECK SUMS)

    Hash values allow quick comparisons to determine whether or not a record, or a subset of a record’s columns, has changed. In order to facilitate the quick computation of hash values within a data flow we have employed the Checksum Transformation available from Konesans. With this transformation you simply select which columns you would like to be included in the hash and specify an output column name.
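
    The comparison idea can be sketched outside of SSIS as below. This is not the algorithm the Konesans component uses; SHA-256 from Python's hashlib is swapped in purely to show how hashing a chosen set of columns turns change detection into a single comparison. The function names and separator choices are made up for this example.

```python
import hashlib


def row_hash(row, columns):
    """Hash the selected columns of a row (a dict) into one hex string."""
    parts = []
    for column in columns:
        value = row.get(column)
        # Use a marker for NULLs so None and the text 'None' differ.
        parts.append("\x00" if value is None else str(value))
    # A separator keeps ('ab', 'c') distinct from ('a', 'bc').
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def has_changed(source_row, stored_hash, columns):
    """True when the tracked columns no longer match the stored hash."""
    return row_hash(source_row, columns) != stored_hash
```

    Storing one hash for the type 1 columns and another for the type 2 columns lets a data flow route each incoming row to the right branch with two comparisons instead of a column-by-column check.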

    BIDS HELPER

    BIDS Helper is a Visual Studio add-in that expands the functionality of Business Intelligence Development Studio (BIDS). BIDS Helper includes a vast array of extensions: it gives a graphical representation of expressions and configurations, allows for pipeline component performance breakdowns, extends the variables window, sorts the project files, fixes relative paths, gives a list of all expressions and non-standard property values used within the packages, etc. It is HIGHLY recommended that anyone using BIDS to develop SSIS packages install this product.

    BIDS Helper is available at http://www.codeplex.com/bidshelper. For more information on this product, please see the BIDS Helper web site listed above.

    BIDS OBJECT GUIDS

    However, if you have installed BIDS Helper, you can generate new GUIDs for all objects within the package by right-clicking on the package name within the Solution Explorer and choosing Reset GUIDs (this method is preferred as it will reset all of the IDs within the package).

    PACKAGE VERSIONS

    There is also a Version Comments property that should be filled in to explain the changes that have been implemented.

    CONCLUSION

    I hope that this has been helpful. I will try to provide the packages that load the META data dictionary shortly on my SkyDrive (which you can find a link to at http://ewisdahl.spaces.live.com) as a working example. I will also try to provide a package or two showing a normal load into an ODS and a sample package used to conform data.

    NOTE: The framework I have presented is a draft item. I am continually updating it, and, if you should happen to use it as your base framework going forward, I would expect you to do the same.