rahul-verma
DataStage Projects – Life Cycle Stages
© 2002. Infosys Technologies Ltd.
Agenda
Introduction
Requirements
Design
Build
Testing
Implementation
Support
Introduction
DataStage projects follow the same life cycle stages as other projects.
A typical DataStage project moves through the following life cycle phases, in order:
Requirements, Design, Build, Test, Implement, Support
Requirements
The warehouse needs to cater to a wide range of user analytics, so requirements should be well documented, detailed and tightly scoped.
Clearly identify the interface points and define the communication protocol
User views need to be modeled and aligned more closely to meet business needs
Identify the dependencies between all aspects of the project like ETL feeds, User Views etc. to facilitate better control over project execution
Performance related requirements need to be identified and documented.
Source data analysis needs to be done to understand the type of data that needs to be processed.
A detailed analysis/high-level design phase is required to drill down into the requirements.
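As one illustration of the source data analysis mentioned above, a minimal profiling sketch follows. The column names and sample extract are hypothetical and not from the project; the idea is only to show the kind of per-column facts (empty values, distinct values, maximum length) that are worth collecting before design starts.

```python
import csv
import io
from collections import defaultdict

def profile_rows(rows):
    """Return per-column null count, distinct count, and max value length."""
    stats = defaultdict(lambda: {"nulls": 0, "distinct": set(), "max_len": 0})
    for row in rows:
        for column, value in row.items():
            col = stats[column]
            if value is None or value.strip() == "":
                col["nulls"] += 1
                continue
            col["distinct"].add(value)
            col["max_len"] = max(col["max_len"], len(value))
    return {
        name: {"nulls": s["nulls"], "distinct": len(s["distinct"]), "max_len": s["max_len"]}
        for name, s in stats.items()
    }

# Hypothetical source extract, inlined here so the sketch runs on its own.
SAMPLE = "cust_id,store,amount\n1,T001,10.50\n2,,7.25\n3,T001,\n"
profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
```

On a real feed the same function could be pointed at a file opened with csv.DictReader; the resulting counts also feed the volumetrics gathered during requirements.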
Steps to effective Requirement gathering
Identify the source system tables required.
Identify the data flow
Identify the data process.
Identify Views to be created, Reports to be generated etc.
Create Requirement Traceability and Test Matrix
State the assumptions clearly.
Define implementation Considerations.
Document Design Solution.
Identify transformations and define the data mapping.
Gather Volumetrics
Start Data Analysis.
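The requirement traceability and test matrix in the steps above can be kept as simple structured data. The sketch below is only an illustration: the requirement IDs, test-case IDs and descriptions are invented, and a real project would more likely hold the matrix in a spreadsheet or test management tool.

```python
# Hypothetical requirement and test-case IDs, for illustration only.
REQUIREMENTS = {
    "REQ-01": "Load daily sales feed",
    "REQ-02": "Reject records with missing keys",
    "REQ-03": "Publish sales summary view",
}
TEST_CASES = {
    "TC-01": ["REQ-01"],
    "TC-02": ["REQ-01", "REQ-02"],
}

def build_traceability(requirements, test_cases):
    """Map every requirement ID to the sorted list of tests that cover it."""
    matrix = {req_id: [] for req_id in requirements}
    for test_id, covered in test_cases.items():
        for req_id in covered:
            if req_id not in matrix:
                raise KeyError(f"{test_id} references unknown requirement {req_id}")
            matrix[req_id].append(test_id)
    return {req_id: sorted(tests) for req_id, tests in matrix.items()}

def uncovered(matrix):
    """Requirements that no test case exercises yet."""
    return sorted(req_id for req_id, tests in matrix.items() if not tests)

matrix = build_traceability(REQUIREMENTS, TEST_CASES)
gaps = uncovered(matrix)
```

Here gaps lists the requirements that still need a test case, which is the main question the matrix is meant to answer.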
Design
A fluid data model will result in a lot of rework. Individual changes might be small, but they may be required in multiple places, which increases the volume of rework.
A changing data model makes metadata management difficult, and metadata management is critical for an enterprise data warehouse. Metadata needs to be extracted and loaded into DataStage every time there is a change, and this process needs significant lead time.
The design should be robust and accommodate process health features such as auditing, ACR balancing, error processing and reprocessing, restartability, recovery, etc.
Perform a proof of concept (POC) on critical requirements and identify performance bottlenecks upfront.
ACR checkpoints in the data flow help identify data problems early in the process, before data is loaded into the warehouse.
Design patterns should be reusable across projects to reduce development time
Brainstorm and consider the various aspects of the framework, then finalize it and bring clarity.
A flexible framework design that takes care of recovery after downtime is critical from an application support perspective.
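The slides do not define ACR balancing in detail. The sketch below assumes it means reconciling record counts and a control total between two points in the data flow, so that every source record is accounted for as either loaded or rejected. The field names and sample records are hypothetical.

```python
from decimal import Decimal

def control_totals(records, amount_field):
    """Record count plus the sum of one numeric field, as a control total."""
    total = sum((Decimal(r[amount_field]) for r in records), Decimal("0"))
    return {"count": len(records), "total": total}

def balance(source, target, rejected, amount_field="amount"):
    """Check that source = target + rejected for both count and amount."""
    src = control_totals(source, amount_field)
    tgt = control_totals(target, amount_field)
    rej = control_totals(rejected, amount_field)
    return {
        "count_ok": src["count"] == tgt["count"] + rej["count"],
        "total_ok": src["total"] == tgt["total"] + rej["total"],
    }

source = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "7.25"}, {"id": 3, "amount": "1.00"}]
target = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "7.25"}]
rejected = [{"id": 3, "amount": "1.00"}]
result = balance(source, target, rejected)
```

Decimal is used instead of float so that monetary totals compare exactly. A checkpoint that fails either check would stop the flow before the warehouse load.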
Steps to a Good Design
Re-validate Data Mapping.
Define General programming specifications.
Define Development objects.
Define Miscellaneous processes like Error processing, re-processing, ACR balancing, Auditing etc.
Create a POC for all critical or complicated points, and make it end to end so that there are no surprises during build.
Identify common functionality, jobs, scripts, etc., keeping reusability in mind.
Prepare Test plans, map them to requirements.
Define Programming standards, directory structure.
Explore different options/possibilities for Data Extraction
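The error processing and re-processing step listed above can be pictured as follows. This is a sketch under assumptions, not the project's actual framework: the validation rules, the row_id key and the record layout are all invented for illustration.

```python
def validate(record):
    """Return a rejection reason, or None when the record is loadable."""
    if not record.get("cust_id"):
        return "missing cust_id"
    try:
        float(record.get("amount", ""))
    except (TypeError, ValueError):
        return "amount is not numeric"
    return None

def process(records):
    """Split records into loadable rows and rejects tagged with a reason."""
    loaded, rejects = [], []
    for record in records:
        reason = validate(record)
        if reason is None:
            loaded.append(record)
        else:
            rejects.append({**record, "reject_reason": reason})
    return loaded, rejects

def reprocess(rejects, corrections):
    """Apply corrections keyed by row_id, then run the rejects through again."""
    fixed = []
    for reject in rejects:
        record = {k: v for k, v in reject.items() if k != "reject_reason"}
        record.update(corrections.get(record["row_id"], {}))
        fixed.append(record)
    return process(fixed)

batch = [
    {"row_id": 1, "cust_id": "C1", "amount": "10.50"},
    {"row_id": 2, "cust_id": "", "amount": "7.25"},
    {"row_id": 3, "cust_id": "C3", "amount": "n/a"},
]
loaded, rejects = process(batch)
reloaded, still_rejected = reprocess(rejects, {2: {"cust_id": "C2"}})
```

Keeping the reject reason with each record is what makes re-processing practical: the corrected row 2 loads on the second pass, while row 3 stays in the reject set until its amount is fixed.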
Build
Several stages can often be used to implement the same or a similar function. Choosing the right stage and configuration is key to developing a quality solution.
Encryption routines that use the OpenSSL library (AES encryption/decryption, SHA-1 hashing, etc.) should be implemented at the start of the phase.
Metadata is a key aspect of a successful data warehouse implementation. Standards need to be clearly defined and followed
Accessing DataStage over a Citrix server has improved productivity to a large extent. It has also given the flexibility to try out multiple options and provide the best solution. Hence, a Citrix server should be used for accessing DataStage.
Knowledge management practices capture and disseminate information. A repository of knowledge articles, learnings and checklists should be built from experience.
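To make the SHA-1 hashing routine mentioned above concrete, here is a minimal sketch. The slides name the OpenSSL library; this example instead uses Python's standard hashlib module as a stand-in, and the record layout is hypothetical. AES is left out because the Python standard library does not provide it.

```python
import hashlib

def sha1_hex(value: str) -> str:
    """SHA-1 digest of a UTF-8 string, as 40 hex characters."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()

def mask_field(record: dict, field: str) -> dict:
    """Return a copy of the record with one field replaced by its SHA-1 digest."""
    masked = dict(record)
    masked[field] = sha1_hex(str(record[field]))
    return masked

row = {"cust_id": "C1", "card_no": "abc"}  # hypothetical record layout
masked = mask_field(row, "card_no")
```

Building and testing a routine like this early matters because every job that touches the protected field depends on it producing identical digests.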
Tips to Efficient Build
Categorize similar jobs.
Define framework for each category.
Define framework for each process (like error processing, record processing,
Finalize job parameters.
Build re-usable components, frameworks and custom stages.
Prepare the necessary checklist for build.
Get Metadata ready.
Build DataStage jobs.
Perform Usage Analysis for Metadata Compliance.
Finalize sequencing and scheduling (either Control-M or Sequencer).
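Finalizing the sequence amounts to ordering jobs so that each one runs only after the jobs it depends on. The sketch below shows that idea with Python's standard graphlib module (Python 3.9 or later). It is not how Control-M or the DataStage Sequencer is configured, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the jobs that must finish first.
DEPENDENCIES = {
    "extract_sales": set(),
    "extract_stores": set(),
    "transform_sales": {"extract_sales", "extract_stores"},
    "load_warehouse": {"transform_sales"},
    "refresh_views": {"load_warehouse"},
}

def run_order(dependencies):
    """Return one valid execution order; raises CycleError on circular dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

order = run_order(DEPENDENCIES)
```

Writing the dependencies down in this form before configuring the scheduler also exposes circular dependencies early, since the sort fails on them.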
Testing – System/ Volume/ Performance/ Integration/ Acceptance
Experience in handling large volumes of data in multiple projects, including the huge CSPAM volumes from Target Stores
Broader understanding and good experience from innumerable challenges that we have overcome across projects and environments, old as well as new.
Understanding the role of the various teams involved. Ability to partner/coordinate/collaborate with multiple teams.
Testing of DataStage jobs requires a considerable amount of time. Adequate testing time should be planned.
Preparing a good test data bed is often complex and difficult. Plan well in advance.
Plan to have enough database capacity and test schemas to have a smooth testing phase.
Learnings from the Target DataStage/UDB environment are critical to a successful testing phase.
Testing – System/ Volume/ Performance/ Integration/ Acceptance
Obtain/ prepare source data matching all the scenarios.
Try to obtain production source data if possible for testing.
If possible, run more than one round of testing.
Perform Unit testing
Test for negative cases too.
If there are changes, do regression testing.
Ensure the test configuration is similar to the production environment.
Identify system related issues and include them in System Testing. Configure and use Schedulers.
Perform volume testing with varied data. Use source data from production if available. Otherwise, generate data with tools for volume testing.
Identify all other external components and include them for Integration testing.
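Where production data is not available, volume test data can be generated. The following is a minimal sketch, assuming a simple three-column record and a fixed share of deliberately invalid rows for negative testing; the field names, ratio and seed are all illustrative choices.

```python
import random

def generate_rows(count, negative_ratio=0.1, seed=42):
    """Yield synthetic rows; roughly negative_ratio of them are deliberately invalid."""
    rng = random.Random(seed)  # fixed seed keeps test runs repeatable
    for row_id in range(1, count + 1):
        row = {
            "row_id": row_id,
            "cust_id": f"C{rng.randint(1, 9999):04d}",
            "amount": f"{rng.uniform(0.01, 500.0):.2f}",
            "expect_reject": False,
        }
        if rng.random() < negative_ratio:
            # Negative case: blank out the key so the load should reject it.
            row["cust_id"] = ""
            row["expect_reject"] = True
        yield row

rows = list(generate_rows(1000))
negatives = [r for r in rows if r["expect_reject"]]
```

Because the generator is seeded, a failed run can be reproduced with exactly the same data, and the expect_reject flag gives the expected result for each negative case.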
Implementation
Plan in advance for the implementation phase. Collaborate with the different stakeholders to successfully implement the various aspects of the application, such as DataStage jobs, the Control-M schedule, Unix scripts, the ACR application, etc.
In a new environment such as grmetlprod01, there needs to be a test implementation phase to iron out any environment related surprises.
Be aware of the new processes in place for DataStage implementation, such as deployment using WBSD. This helps in resolving problems and reducing delays.
Maintain a well-developed deployment checklist that can be reused across projects.
Support
We have supported DataStage applications after implementation and successfully turned over a few applications to ESS. Plan well in advance for the involvement of TOC in the application turnover. A comprehensive knowledge article listing all the issues faced, and their resolutions, from UAT through the support phase is also critical for the support team.
Based on the criticality of the application, a clear escalation procedure/support plan should be put in place to address environment related issues. This should be planned in advance with the DataStage support team as well as the DB hosting team.
The ability and experience to provide 24x7 support is critical for most high-volume data warehouse ETL applications. The Infosys Global Delivery Model is suited to round-the-clock support with onsite and offshore resources.
Familiarity with all the support/turnover activities and systems such as Remedy is needed to manage post-implementation turnover effectively.