DataStage Projects – Life Cycle Stages

Day 2[1].1.2 DataStage Projects Life Cycle




© 2002. Infosys Technologies Ltd.

Agenda

Introduction

Requirements

Design

Build

Testing

Implementation

Support


Introduction

DataStage projects follow the same life cycle stages as other projects.

The typical life cycle phases of a DataStage project are:

Requirements, Design, Build, Test, Implement, Support


Requirements

The warehouse needs to cater to a wide range of user analytics. Requirements should be well documented, detailed and precise.

Clearly identify the interface points and define the communication protocol

User views need to be modeled and aligned more closely to meet business needs

Identify the dependencies between all aspects of the project like ETL feeds, User Views etc. to facilitate better control over project execution

Performance related requirements need to be identified and documented.

Source data analysis needs to be done to understand the type of data that needs to be processed.

A detailed analysis/high-level design phase is required to drill down into the requirements.


Steps to effective Requirement gathering

Identify the source system tables required.

Identify the data flow

Identify the data process.

Identify Views to be created, Reports to be generated etc.

Create Requirement Traceability and Test Matrix

State the assumptions clearly.

Define implementation Considerations.

Document Design Solution.

Identify Transformations -- Define data mapping.

Gather Volumetrics

Start Data Analysis.
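
The requirement traceability and test matrix step above can be sketched as a simple lookup from requirements to test cases. This is a minimal illustration only, not a prescribed format: the requirement IDs, test case IDs and the function name are hypothetical.

```python
from typing import Dict, List


def untested_requirements(matrix: Dict[str, List[str]]) -> List[str]:
    """Return the requirements that have no test case mapped to them."""
    return sorted(req for req, tests in matrix.items() if not tests)


# Hypothetical requirements mapped to hypothetical test case IDs.
traceability = {
    "REQ-001 load customer feed": ["TC-01", "TC-02"],
    "REQ-002 reject malformed rows": ["TC-03"],
    "REQ-003 publish sales view": [],
}

if __name__ == "__main__":
    # Lists every requirement that still needs at least one test case.
    print(untested_requirements(traceability))
```

Keeping the matrix as data makes it easy to re-run the coverage check whenever requirements or test plans change.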


Design

A fluid data model will result in a lot of rework. Each change might be small, but it might be required in multiple places, increasing the volume of rework.

A changing data model makes metadata management difficult, and metadata management is critical for an enterprise data warehouse. Metadata needs to be extracted and loaded into DataStage every time there is a change, and this process needs significant lead time.

Design should be robust and accommodate process health features such as auditing, ACR balancing, error processing and reprocessing, restartability and recovery.

Perform a proof of concept (POC) on critical requirements and identify performance bottlenecks up front.

ACR checkpoints in the data flow help identify data problems early in the process, before data is loaded into the warehouse.

Design patterns should be reusable across projects to reduce development time

Brainstorm and consider the various aspects of the framework, then finalize it and bring clarity.

A flexible framework design that takes care of recovery after downtime is critical from an application support perspective.
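
The ACR checkpoints mentioned above can be illustrated with a small balancing check. The sketch below assumes that a checkpoint compares a record count and a control total between a source and a target; that interpretation, the field names and the function names are assumptions for illustration and do not come from any DataStage API.

```python
from decimal import Decimal
from typing import Iterable, Mapping, Tuple


def control_totals(rows: Iterable[Mapping[str, str]], amount_field: str) -> Tuple[int, Decimal]:
    """Return (record count, sum of amount_field) for a set of records."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        total += Decimal(row[amount_field])
    return count, total


def is_balanced(source_rows, target_rows, amount_field: str) -> bool:
    """True when source and target agree on both the count and the control total."""
    return control_totals(source_rows, amount_field) == control_totals(target_rows, amount_field)


# Hypothetical records; the field names are illustrative only.
source = [{"id": "1", "amount": "10.50"}, {"id": "2", "amount": "4.25"}]
target_ok = [{"id": "1", "amount": "10.50"}, {"id": "2", "amount": "4.25"}]
target_short = [{"id": "1", "amount": "10.50"}]

if __name__ == "__main__":
    print(is_balanced(source, target_ok, "amount"))
    print(is_balanced(source, target_short, "amount"))
```

Running a check like this between stages, rather than only after the final load, is what allows a mismatch to be caught before the data reaches the warehouse.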


Steps to a Good Design

Re-validate Data Mapping.

Define General programming specifications.

Define Development objects.

Define Miscellaneous processes like Error processing, re-processing, ACR balancing, Auditing etc.

Create a POC for all the critical or complicated points, and make it end to end so that there are no surprises during build.

Identify common functionality, jobs, scripts, etc., keeping reusability in mind.

Prepare Test plans, map them to requirements.

Define Programming standards, directory structure.

Explore different options/possibilities for Data Extraction
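
The error processing and reprocessing step above can be sketched as a reject-routing pattern: invalid records are set aside with a reason, corrected, and then passed through the same validation again. The following is a minimal sketch under stated assumptions; the record layout, the validation rule and all names are hypothetical and not part of DataStage.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_records(
    records: Iterable[Record], validate: Callable[[Record], str]
) -> Tuple[List[Record], List[Record]]:
    """Route records to (good, rejects); validate returns "" when a record is valid."""
    good: List[Record] = []
    rejects: List[Record] = []
    for record in records:
        reason = validate(record)
        if reason:
            # Keep the reason with the record so it can be corrected later.
            rejects.append({**record, "reject_reason": reason})
        else:
            good.append(record)
    return good, rejects


def require_customer_id(record: Record) -> str:
    """Hypothetical rule: a record must carry a non-empty customer_id."""
    return "" if record.get("customer_id") else "missing customer_id"


batch = [{"customer_id": "C1"}, {"customer_id": ""}]
good, rejects = split_records(batch, require_customer_id)

# Reprocessing: drop the reject reason, apply a correction, and rerun the same check.
fixed = [
    {**{k: v for k, v in r.items() if k != "reject_reason"}, "customer_id": "C2"}
    for r in rejects
]
good_again, rejects_again = split_records(fixed, require_customer_id)
```

Because corrected records go back through the same routine, the validation rules stay in one place for both the first pass and the reprocessing pass.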


Build

Multiple stages can be used to achieve the same or a similar function. Choosing the right stage and configuration is key to developing a quality solution.

Encryption routines that use the OpenSSL library (AES encryption and decryption, SHA-1 hashing, etc.) should be implemented at the start of the phase.

Metadata is a key aspect of a successful data warehouse implementation. Standards need to be clearly defined and followed

Accessing DataStage over a Citrix server has improved productivity to a large extent. It has also given the flexibility to try out multiple options and provide the best solution. Hence a Citrix server should be used for accessing DataStage.

Knowledge management practices capture and disseminate information. A repository of knowledge articles, learnings and checklists should be built from experience.
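
For the encryption routines mentioned above, the hashing step can be illustrated in a few lines. This sketch uses Python's standard `hashlib` module rather than the OpenSSL C API the slide refers to, and the function name is hypothetical. AES is not shown because the Python standard library does not provide it.

```python
import hashlib


def sha1_hex(value: str) -> str:
    """Return the SHA-1 digest of a UTF-8 encoded string as 40 hex characters."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # "abc" is the standard SHA-1 test vector.
    print(sha1_hex("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Settling details such as the character encoding and the output format early avoids digest mismatches between jobs later in the build.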


Tips to Efficient Build

Categorize similar jobs.

Define framework for each category.

Define framework for each process (like error processing, record processing,

Finalize job parameters.

Build re-usable components, frameworks and custom stages.

Prepare necessary check list for Build.

Get Metadata ready.

Build DataStage jobs.

Perform Usage Analysis for Metadata Compliance.

Finalize sequencing and scheduling (either Control-M or Sequencer).
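
The sequencing step above amounts to ordering jobs so that each one runs only after its prerequisites. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9 or later). It does not use Control-M or the DataStage Sequencer, and the job names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical jobs; each job maps to the set of jobs it depends on.
dependencies: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_orders": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_orders"},
    "acr_balance": {"load_warehouse"},
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a job order in which every job follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(run_order(dependencies))
```

Writing the dependencies down as data before configuring the scheduler also surfaces circular dependencies early, since `TopologicalSorter` raises `CycleError` when it finds one.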


Testing – System/ Volume/ Performance/ Integration/ Acceptance

Experience in handling large volumes of data in multiple projects, including the huge CSPAM volumes from Target Stores

Broader understanding and good experience from innumerable challenges that we have overcome across projects and environments, old as well as new.

Understanding the role of the various teams involved. Ability to partner/coordinate/collaborate with multiple teams.

Testing DataStage jobs requires a considerable amount of time. Adequate testing time should be planned.

Preparing a good test data bed is often complex and difficult. Plan well in advance.

Plan to have enough database capacity and test schemas to have a smooth testing phase.

Learnings from the Target DataStage/UDB environment are critical to a successful testing phase.


Testing – System/ Volume/ Performance/ Integration/ Acceptance

Obtain/ prepare source data matching all the scenarios.

Try to obtain production source data if possible for testing.

If possible, run multiple rounds of testing.

Perform Unit testing

Test for negative cases too.

If there are changes, do regression testing.

Ensure the configuration is similar to the production environment while testing.

Identify system-related issues and include them in system testing. Configure and use schedulers.

Perform volume testing with a variety of data. Use source data from production if available. Otherwise, generate data using tools for volume testing.

Identify all other external components and include them for Integration testing.
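
For the volume testing step above, when production source data is not available, a small generator can produce repeatable synthetic input. This is a minimal sketch and not a substitute for a dedicated data generation tool: the column names, value ranges and function names are hypothetical.

```python
import csv
import io
import random
from typing import Dict, Iterable, Iterator, TextIO


def generate_rows(count: int, seed: int = 0) -> Iterator[Dict[str, object]]:
    """Yield synthetic order rows; a fixed seed keeps test runs repeatable."""
    rng = random.Random(seed)
    for i in range(1, count + 1):
        yield {
            "order_id": i,
            "store_id": rng.randint(1, 50),
            "amount": f"{rng.uniform(1, 500):.2f}",
        }


def write_csv(rows: Iterable[Dict[str, object]], handle: TextIO) -> int:
    """Write rows as CSV with a header line; return the number of data rows written."""
    writer = csv.DictWriter(handle, fieldnames=["order_id", "store_id", "amount"])
    writer.writeheader()
    written = 0
    for row in rows:
        writer.writerow(row)
        written += 1
    return written


# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
row_count = write_csv(generate_rows(1000), buffer)
```

Because rows are generated lazily, the same functions can be pointed at a file handle and a much larger count without holding the whole data set in memory.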


Implementation

Plan in advance for the implementation phase. Collaborate with the different stakeholders to successfully implement the various aspects of the application, such as DataStage jobs, the Control-M schedule, Unix scripts and the ACR application.

In case of a new environment like grmetlprod01, there needs to be a test implementation phase to iron out any environment related surprises.

Be aware of the new processes in place for DataStage implementation, such as deployment using WBSD. This will help resolve problems and reduce delays.

Develop a deployment checklist that can be reused across projects.


Support

We have supported DataStage applications after implementation and successfully turned over a few applications to ESS. Plan well in advance for the involvement of TOC in the application turnover. A comprehensive knowledge article listing all the issues faced, and their resolutions, from UAT through the support phase is also critical for the support team.

Based on the criticality of the application, a clear escalation procedure/support plan should be put in place to address environment related issues. This should be planned in advance with the DataStage support team as well as the DB hosting team.

The ability and experience to provide 24x7 support is critical for most high-volume data warehouse ETL applications. The Infosys Global Delivery Model is well suited to round-the-clock support with onsite and offshore resources.

Familiarity with all the support and turnover activities, and with systems such as Remedy, is needed to manage post-implementation turnover effectively.
