ISYS 637 603: IPL Data Warehouse Implementation

Table of Contents

1. EXECUTIVE SUMMARY
2. INTRODUCTION
3. BUSINESS CASE FOR DATA WAREHOUSE
4. DATA WAREHOUSE ARCHITECTURE
5. DIMENSIONAL MODEL
6. ETL IMPLEMENTATION
6.1 Extraction and Import of Data from Source
6.1.1 IPL data
6.1.2 Retail Data
6.2 Transforming the data
6.2.1 To obtain Batsman data
6.2.2 To obtain Bowler Data
6.2.3 To obtain Player data
6.3 Loading the Data into the Data Warehouse
6.3.1 Loading the Time Dimension (Dim_Time)
6.3.2 Loading the Match dimension (Dim_Match)
6.3.3 Loading the retail dimensions
6.3.4 Creating and Loading the Fact tables
6.3.5 Creating the Player fact table
6.3.6 Creating the Procurement Fact table
7. METADATA DOCUMENTATION
7.1 Fact Meta Data
7.2 Dimension Metadata
8. BI REPORTS
8.1 Cricket Related Analysis
8.2 Business Operations Analysis
8.3 Additional Report Generated using SSRS
9. GLOSSARY OF TERMS
10. REFERENCES


1. EXECUTIVE SUMMARY

The Indian Premier League (abbreviated as IPL) is a professional Twenty20 cricket league hosted in India and contested by teams representing various Indian cities. As of 2016, there have been nine seasons of the cricket extravaganza, which is watched by millions throughout the world. The soaring popularity of the tournament has generated vast amounts of data related to the matches, the players, and the league's profitability and marketability. Currently, all the data sit in isolated systems, and team business users have difficulty making the decisions that would drive sustainable growth. There is a need for a data strategy that complements the teams' marketing strategies and helps drive revenue growth and customer experience [1].

The objective of this project is to create a data warehouse that consolidates and transforms data into useful information and serves as a central repository for decision-making by all the stakeholders of the Indian Premier League [2]. The data warehouse can be used by team executives, team coaches and bookies as a platform to analyze the performances of teams and make decisions that offer them a competitive advantage. The data warehouse also brings together business operations information, such as customer, merchandising and procurement information, to help the executives correlate the on-field performance of their teams with their business revenues and profitability.

To realize this objective, an enterprise-wide data warehouse has been implemented that integrates the business processes encompassing both the sports information and the retail data to provide a bird’s eye view of the data to key stakeholders. This goal was accomplished through multiple source data extractions, data staging, cube generation and the development of extensive dashboards. The resulting decision support can be used by the IPL team executives, coaches and bookies in a multitude of scenarios. The implemented data warehouse contains consistent, cleaned, transformed and summarized information derived from multiple data sources. The rest of this report describes the processes followed.

The recommendation to IPL would be to make effective use of the implemented data warehouse to improve match performance, perform sophisticated player analysis to propel team selection and opposition scouting and increase revenue from retail operations. It will help the stakeholders to extract maximum profitability from the league.


2. INTRODUCTION

Cricket is one of the most popular sports in the world, with a particularly large following in the Commonwealth nations and South Asia. Traditionally, the sport has been a long affair, with matches lasting anywhere between one and five days depending on the format. At the dawn of the 21st century, a shortened version called Twenty20 was introduced to combat the diminishing popularity of the sport and make it more entertaining for fans. The new format soon became popular among supporters, helped by the massive fanfare with which it was received in India, one of the world's growing economies and its second most populous country.

To take advantage of the format’s popularity, the Indian Premier League was founded by the Board of Control for Cricket in India (BCCI) in 2007 as the first franchise-based T20 league in the world. It takes place annually during the months of April and May and is contested by eight teams from different parts of India. The league has a round-robin format in which each team plays every other team home and away, followed by knockout matches.

Each team has a financial cap on the maximum amount it can spend on player salaries. Every participating team has to follow the guidelines for player selection, which require a mix of local players and international stars. Players who are not contracted to any team can be bought in the IPL auctions that take place annually. These financial and logistical restrictions make it even more important to make sound business decisions and invest in the right talent.

The IPL has also grown rapidly over the years to become one of the biggest sporting leagues in the world. As of 2016, the IPL is the most-attended cricket league in the world and ranks sixth among all sports leagues [5]. The brand value of the IPL was estimated at US$4.5 billion in 2015 by American Appraisal. It also contributes significantly to the Indian economy, generating nearly US$182 million in 2015 alone [6].

With India being a growing economy, many multinational corporations are opening branches in India and diversifying their operations. Most of these companies view the IPL as an ideal platform to launch their products and advertise nationally. The growing popularity of the IPL and the revenue it generates have made it pivotal for team owners and executives to align their business strategy with their team strategy and maximize their financial gains. The need for a consolidated, consistent data warehouse that satisfies business intelligence requirements has never been greater.


3. BUSINESS CASE FOR DATA WAREHOUSE

There is a strong need for a data warehouse in sports, especially in a league as popular and as profitable as the IPL. The league is one of the richest sporting tournaments in the world, which makes it an extremely competitive affair with the world’s best cricket players battling it out against each other. In a league with such cutthroat competition, making informed decisions with the data available is of utmost importance. The majority of sports teams do not have a data warehouse, and information is fragmented across the numerous departments of the enterprise, leaving them unable to process information in a timely manner.

Team executives need information to record the past, understand current circumstances and predict the future. In the current situation, each executive across the organization is struggling on their own to identify data analysis requirements and interpret results obtained from non-integrated data sources. There is no single version of accurate information that could assist the executives and the team coaches in buying new players, renewing contracts based on performance, and predicting individual and team performance.

Apart from the sporting aspects, the stakeholders are also in the dark about how to understand and develop a sustainable growth model for their team. Even though information is available about merchandise sales, vendor transactions and customer preferences, no logical relationship has been established between this data and the sporting data. Hence, business leaders do not have enough relevant information to make decisions related to store expansions, product diversification, merchandise inventory and so on. This inability prevents the team executives from fully exploiting the brand value of their teams. It has led to a situation where the executives are either drowning in too much data with no means to analyze it or left with too little data [7]. A data warehouse would provide an instant snapshot of the organization's performance wherever the executives are, while allowing them to drill down into any level of detail without waiting for staff to create one-off reports.

The data warehouse would also serve the needs of the team coaches and give them the right tools to perform complex analysis of players and matches. Currently, most decisions are based on either the word-of-mouth reputation of a player or their recent performances. This approach has several drawbacks. It does not consider the suitability of a player for a particular team with regard to playing style, or how the player would perform at different venues or in different stages of the match. Without these details, the team coaches are left with a haphazard approach to player management, impeding the overall performance of the team. The data warehouse would allow them to slice and dice the data as they see fit, simplifying the decision-making process to a great extent.

The other users who would benefit from the data warehouse implementation are bookies who run betting websites. While betting is considered a game of chance, there is certainly a lot of science and data crunching that goes on behind the scenes. With a central data store that contains both atomic and aggregated data about players and matches, it would be possible for the bookies to forecast future trends for players and teams. Integration and aggregation of all key sources will drive the most thorough year-over-year insights and lead to more accurate forecasting of future results. This would enable them to set proper odds, thus increasing their revenues and profits.

To address all the above issues, it is rational to implement an enterprise-wide data warehouse and provide ad hoc reporting tools as an extension. The business questions we hope to answer through this implementation relate to both the sporting aspects and the business operations aspects of the IPL:

1. Which batsmen perform best in the death overs, by venue?
2. How has a given player performed across all seasons of the IPL?
3. Which team has won the most home matches across all seasons?
4. What is the economy of overseas bowlers, by country, for a particular season?
5. How do batsmen perform through the various phases of a match?
6. Which bowling style is most successful in the death overs?
7. Which team is most successful after winning the toss?
8. What is the trend in jersey sales by player from 2008 to 2016?
9. What is the profit of the various brands that produce merchandise?
10. How do merchandise sales of all teams vary over time?


4. DATA WAREHOUSE ARCHITECTURE

The IPL Data Warehouse has been designed based on Kimball’s bottom-up approach, where the focus is on creating individual data marts first and then combining them into an organization-wide data warehouse. In this approach, individual data marts are logically separated but are located within the same dimensional data warehouse [8]. This approach allows us to store both the atomic data and the aggregated data at the same location, making it optimized for both transactional data and reporting services [9]. Each data mart builds on the next, reusing dimensions and facts so users can query across data marts, if desired, to obtain a single version of the truth as well as both summary and atomic data.

Kimball’s approach consciously tries to minimize back-office operations, preferring to focus an organization’s effort on developing dimensional designs that meet end-user requirements. Since the most important requirement for our data warehouse is fast access to data for reporting and analysis, Kimball’s approach is a good fit. The incremental approach also allows us to add further dimensions in the future to accommodate changing requirements [10]. The following diagram depicts the data warehouse for the IPL:


Figure: Data Warehouse Architecture for IPL

Data Sources: The IPL has a host of data sources which serve as the cornerstone for the data warehouse. They include extensive cricket data covering every match and every ball from the first tournament to the recently concluded one in 2016. These data sources, which are in comma-separated values (CSV) and SQLite format, contain most of the data about the sporting aspects, such as player measures, match results and runs scored off each ball. Other required data, such as the captains of each team and team merchandise information, were acquired through a meticulous web scraping process.

Retail data such as customer information, vendor data and sales data could not be obtained for the IPL merchandise. As a result, we have mostly used data from other similar datasets available on the Internet that closely resemble actual retail data.

Data Extraction: Since the data was obtained from different sources, it was necessary to extract the data and perform transformations to provide a single version of the data. This step was accomplished by applying a series of rules and functions to the data before loading it into the warehouse. The data from the SQLite database was exported and then converted into CSV format so that it could be combined with the other data that we had acquired [11]. SSIS was used to extract the data from multiple sources and load it into a database created on the server, infodata.tamu.edu.
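The SQLite-to-CSV export described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the project actually used: the table name Ball_by_Ball comes from this report, but its columns and the in-memory database are hypothetical stand-ins for the real source file.

```python
import csv
import io
import sqlite3


def list_tables(conn):
    """Return the user-defined table names in the database."""
    query = ("SELECT name FROM sqlite_master "
             "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")
    return [name for (name,) in conn.execute(query)]


def dump_table(conn, table, handle):
    """Write one SQLite table to an open text handle as CSV, header first."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(col[0] for col in cursor.description)
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count


# Tiny in-memory stand-in for the real source database (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ball_by_Ball (match_id INTEGER, over_id INTEGER, runs INTEGER)")
conn.executemany("INSERT INTO Ball_by_Ball VALUES (?, ?, ?)",
                 [(1, 1, 4), (1, 2, 0)])

# Export every table to CSV text; a real run would write to files instead.
exports = {}
for table in list_tables(conn):
    buffer = io.StringIO()
    dump_table(conn, table, buffer)
    exports[table] = buffer.getvalue()
conn.close()
```

Passing an open file instead of the StringIO buffer would produce one CSV file per table, ready to be picked up by a flat-file import.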

Data Transformation: Most of the data that we got from the data sources was consistent, making the data cleaning process a simple task. Some of the columns with null values had to be inspected and eliminated. Once the data was standardized into a single format, it was necessary to execute certain transformations to denormalize the data and make it suitable for business intelligence needs. These data transformations produced aggregate data such as runs scored in various phases of the match and aggregate match results at different venues. The staging area also provided the platform to combine retail data with match performance, which is crucial to addressing several business questions. Other transformations, such as keeping the date in a consistent format across all data sources, had to be performed to ensure data consistency.

The SSIS functions that aided during the transformation and cleaning process include:

a. LOOKUP: It joins additional columns to the data flow by looking up values in a table.

b. DATA CONVERSION: This converts data from one data type to another.

c. DERIVED COLUMNS: This creates new column values by applying expressions to input columns.

d. MERGE: This merges datasets from different source tables into a single output table.

e. SORT: Arranges a data set based on a key. This is done before merging to optimize the process.
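The effect of the five transformations listed above can be imitated in plain Python. The sketch below is not SSIS code and does not reflect the project's actual packages; the row layout, column names and player names are hypothetical, and each function only mirrors what the corresponding SSIS component does to a row set.

```python
import heapq
from operator import itemgetter

# Hypothetical staged rows; every value arrives as text, as in a CSV import.
deliveries = [
    {"match_id": "2", "batsman_id": "11", "runs": "6"},
    {"match_id": "1", "batsman_id": "10", "runs": "4"},
]
more_deliveries = [
    {"match_id": "1", "batsman_id": "11", "runs": "1"},
    {"match_id": "3", "batsman_id": "10", "runs": "0"},
]
player_names = {10: "Player A", 11: "Player B"}  # hypothetical lookup table


def convert(row):
    """DATA CONVERSION: cast the text columns to integers."""
    return {key: int(value) for key, value in row.items()}


def lookup(row):
    """LOOKUP: add the player name found under the row's batsman_id."""
    return {**row, "batsman_name": player_names[row["batsman_id"]]}


def derive(row):
    """DERIVED COLUMN: flag boundaries with an expression over an input column."""
    return {**row, "is_boundary": row["runs"] in (4, 6)}


def prepare(rows):
    """Apply the row-level steps, then SORT on the merge key."""
    cleaned = [derive(lookup(convert(row))) for row in rows]
    return sorted(cleaned, key=itemgetter("match_id"))


# MERGE: combine two inputs that are each already sorted on match_id.
merged = list(heapq.merge(prepare(deliveries), prepare(more_deliveries),
                          key=itemgetter("match_id")))
```

Sorting each input first is what lets the merge step walk both inputs once, in order, instead of re-sorting the combined output.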

Data Loading: Once the data transformations are complete, the final dimension and fact tables are ready to be loaded into the data warehouse.


Data Storage: The data warehouse is then segregated into multiple data marts, each serving a specific business process. The data marts are typically set up as OLAP cubes optimized for querying large data sets for reporting and analytical purposes. OLAP cubes are multi-dimensional databases that usually contain data specific to a particular function [12]. The IPL data warehouse is organized around three data marts: Player, Match and Procurement. Each data mart provides only the data relevant to the particular operation it supports.

Reporting: The final component of the data warehouse provides reporting and analytical capabilities to users. For the IPL data warehouse, we have used Tableau and SQL Server Reporting Services (SSRS) to support ad hoc reports, complex queries, multidimensional analysis, statistical number crunching and data mining. These tools provide a range of out-of-the-box reports and dashboards that business users can use according to their requirements.


5. DIMENSIONAL MODEL

After the transformation process, the data is stored in the data warehouse in a format that optimizes the analytical querying process. At the core of this architecture are dimensions and facts. Multiple conformed dimensions are shared across our business processes, and the overlap is shown below:

Dimensions (columns): Dim_Time, Dim_Player, Dim_Match, Dim_Team, Dim_Product, Dim_Customer, Dim_Vendor, Dim_Store

Fact_Bowling_Performance X X X X
Fact_Batting_Performance X X X X
Fact_Player X X X
Fact_Product_Sales X X X X X X
Fact__Procurement X X X X

Table: Data Warehouse Bus Matrix for IPL


Figure: Dimensional Data Model for IPL

(Enlarged picture attached)

6. ETL IMPLEMENTATION

The ETL implementation for the various data sources was done in SQL Server Management Studio, using SQL Server Integration Services (SSIS) and SQL Server Data Tools for the staging database. This section describes the key fact and dimension tables that we created using SSIS.

6.1 Extraction and Import of Data from Source

6.1.1 IPL data

Before starting the SSIS processes, we had to make sure that all the IPL data provided to us was denormalized. We therefore extracted the data provided, created the different tables in CSV format and then imported them into our staging database. The main tables that were denormalized from the different sources are shown here.

Figure: Importing IPL Data


6.1.2 Retail Data

To find the sales trend of the merchandise sold by each IPL franchise, we created data for the retail dimensions. Additional details were noted before creating the different retail sources. While creating the data, we kept in mind the performance of each team in every season and mapped each team's performance to its product sales. We also considered the star players in each team and accounted separately for the merchandise sales associated with them. The main retail tables created were Dim_Store, Dim_Customer, Dim_Product and Dim_Vendor.

Figure: Importing Retail Data

6.2 Transforming the data

We make use of a variety of transformation tools built into Excel to calculate, consolidate and create specific tables related to batting performance, bowling performance and player performance.


6.2.1 To obtain Batsman data

To calculate the batting data, we mainly used the Ball_by_Ball table to compute the runs scored by each batsman in each season and throughout the IPL. We used a combination of aggregate and derived column transformations to find the total runs scored by a batsman. Additionally, we divided the match into three phases, namely PowerPlay overs (1-6), Middle overs (7-16) and Death overs (17-20), using a conditional split to calculate the runs scored by batsmen at crucial stages of the match. This makes it easy for the team coach and management to obtain critical data when selecting batsmen for a match.

Figure: Transforming Batsman Data
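The phase split and aggregation described in section 6.2.1 can be sketched in Python as follows. This is an illustrative stand-in for the SSIS conditional split and aggregate steps, not the project's package; the over boundaries (1-6, 7-16, 17-20) come from this report, while the tuple layout and player names are hypothetical.

```python
from collections import defaultdict


def match_phase(over):
    """Map a 1-based over number to the phase names used in this report."""
    if not 1 <= over <= 20:
        raise ValueError(f"over must be between 1 and 20, got {over}")
    if over <= 6:
        return "PowerPlay"
    if over <= 16:
        return "Middle"
    return "Death"


def runs_by_phase(deliveries):
    """Sum runs per (batsman, phase) from (batsman, over, runs) tuples."""
    totals = defaultdict(int)
    for batsman, over, runs in deliveries:
        totals[(batsman, match_phase(over))] += runs
    return dict(totals)


# Hypothetical deliveries: (batsman, over number, runs off the bat)
sample = [("Player A", 1, 4), ("Player A", 6, 1), ("Player A", 17, 6),
          ("Player B", 7, 2), ("Player B", 16, 0), ("Player B", 20, 4)]
phase_totals = runs_by_phase(sample)
```

Keeping the phase rule in one function means the same boundaries can be reused for any measure that needs to be split by stage of the match.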

6.2.2 To obtain Bowler Data

To calculate the bowling data, we mainly used the Ball_by_Ball table to compute the wickets taken by each bowler in each season and throughout the IPL. We used a combination of aggregate, summation, pivot and derived column transformations to find the total wickets taken by a bowler. Additionally, we divided the match into three phases, namely PowerPlay overs (1-6), Middle overs (7-16) and Death overs (17-20), using a conditional split to calculate the wickets taken by the bowler at crucial stages of the match. This makes it easy for the team coach and management to obtain critical data when selecting bowlers for a match.

Figure: Transforming Bowler Data
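The pivot step mentioned in section 6.2.2 turns one record per wicket into one row per bowler with a column per phase. The Python sketch below illustrates that reshaping only; it is not the SSIS Pivot configuration used in the project, and the record layout and bowler names are hypothetical.

```python
from collections import Counter

PHASES = ("PowerPlay", "Middle", "Death")

# Hypothetical dismissals, one record per wicket: (bowler, phase of the match)
dismissals = [
    ("Bowler X", "PowerPlay"), ("Bowler X", "Death"), ("Bowler X", "Death"),
    ("Bowler Y", "Middle"),
]


def pivot_wickets(records):
    """Reshape per-wicket records into one row per bowler, one column per phase."""
    counts = Counter(records)  # (bowler, phase) -> number of wickets
    bowlers = sorted({bowler for bowler, _ in records})
    table = []
    for bowler in bowlers:
        row = {"bowler": bowler}
        for phase in PHASES:
            row[phase] = counts[(bowler, phase)]  # Counter yields 0 if absent
        row["Total"] = sum(row[phase] for phase in PHASES)
        table.append(row)
    return table


wicket_table = pivot_wickets(dismissals)
```

The wide layout is convenient for reports that compare a bowler's phases side by side, while the long per-wicket layout remains the better input for further aggregation.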

6.2.3 To obtain Player data

To calculate player performance, we mainly used the Ball_by_Ball and Player tables to compute the strike rate and economy of players in each season and throughout the IPL. We used a combination of aggregate and derived column transformations to find these measures. This makes it easy for the team coach and management to obtain critical data when selecting players for a match. This data will also help management decide on future players based on the needs of the current team.


Figure: Transforming Player Data
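The two player measures named in section 6.2.3 follow the usual cricket definitions: strike rate is runs scored per 100 balls faced, and economy is runs conceded per six-ball over. The sketch below applies those definitions in Python; the function names are hypothetical, and the report does not state how the project handled players with no balls faced or bowled, so returning None for that case is an assumption.

```python
def strike_rate(runs_scored, balls_faced):
    """Batting strike rate: runs scored per 100 balls faced."""
    if balls_faced == 0:
        return None  # assumption: undefined when no ball has been faced
    return 100.0 * runs_scored / balls_faced


def economy_rate(runs_conceded, legal_balls_bowled):
    """Bowling economy: runs conceded per six-ball over."""
    if legal_balls_bowled == 0:
        return None  # assumption: undefined when no ball has been bowled
    return 6.0 * runs_conceded / legal_balls_bowled


# 45 runs off 30 balls, and 28 runs conceded in 4 overs (24 legal balls)
example_strike_rate = strike_rate(45, 30)
example_economy = economy_rate(28, 24)
```

Both are ratios, so they should be recomputed from summed runs and balls at each level of aggregation rather than averaged across matches.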

6.3 Loading the Data into the Data Warehouse

After extracting the data from the sources and transforming it in the staging database, we proceed to load the data into the data warehouse. We use the data warehouse for all our reporting and data analysis processes, and it is the core component of the business intelligence system. The data warehouse consists of various fact and dimension tables arranged in the form of a star schema. The dimensions created for the data warehouse are Dim_Time, Dim_Player, Dim_Match, Dim_Product, Dim_Store, Dim_Vendor, Dim_Customer and Dim_Team.

6.3.1 Loading the Time Dimension (Dim_Time)

The time dimension consolidates all the dates and times available from different tables across multiple sources into a single Dim_Time dimension. We used the derived column function in SSIS to obtain the date and time from the different tables and merged them to have a single timeline across the different fact tables.


Figure: Loading Time Dimension
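Consolidating dates from several sources into one time dimension, as described in section 6.3.1, can be sketched in Python. The two input formats, the attribute names and the YYYYMMDD-style key are assumptions made for illustration; the report does not specify the actual Dim_Time columns or key scheme.

```python
from datetime import datetime


def parse_date(text):
    """Accept two hypothetical source formats and return a date."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def build_dim_time(*sources):
    """Merge date strings from several sources into one de-duplicated dimension."""
    unique_days = sorted(
        {parse_date(text) for source in sources for text in source})
    return [
        {
            "time_key": int(day.strftime("%Y%m%d")),  # e.g. 20080418
            "full_date": day,
            "year": day.year,
            "month": day.month,
            "day": day.day,
        }
        for day in unique_days
    ]


match_dates = ["2008-04-18", "2008-04-19"]  # hypothetical match table dates
sales_dates = ["18/04/2008", "20/04/2008"]  # hypothetical retail table dates
dim_time = build_dim_time(match_dates, sales_dates)
```

Because both sources resolve to the same key for 18 April 2008, match facts and sales facts for that day can later be joined through the shared dimension row.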

6.3.2 Loading the Match dimension (Dim_Match)

From the different tables present in our staging database, we now start creating our data warehouse dimensions. The dimensions are the building blocks of our data warehouse, and we can base our BI questions on the attributes present in each dimension. In this case, we create a Match dimension from the data in the staging tables Match and Ball_by_Ball. As our dimensional model shows, the Match dimension holds a variety of information, including the date of the match, the teams, and details of the outcome of the match and the win type.


Figure: Loading Match Dimension

6.3.3 Loading the retail dimensions

After creating the dimensions needed for the player statistics, we move on to the retail aspect of the IPL. Here we import all the retail dimensions from our staging database. No transformation is needed in this case, as we created the data for these tables from scratch and they are already clean.


Figure: Loading retail dimensions

6.3.4 Creating and Loading the Fact tables:

The final step in building the data warehouse is creating the fact tables on top of the dimensions. A fact table works with the dimension tables to hold the data to be analyzed. It consists mainly of foreign key columns and measures: the foreign keys allow joins with the dimension tables, and the measures contain the data being analyzed. Below we show two of the main fact tables in the data warehouse. The player fact table, Fact_Player_Performance, holds the measures that account for player performance, as shown in the dimensional model, and is connected to multiple dimensions such as Dim_Player, Dim_Time, Dim_Team and Dim_Match. The procurement fact table, Fact_Product_Procurement, is connected to multiple retail dimensions such as Dim_Time, Dim_Product, Dim_Store and Dim_Vendor. The measures in the procurement fact table are useful for understanding the various trends in the retail marketplace.
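The split between foreign keys and measures can be illustrated with a small sketch. It uses Python's built-in sqlite3 module instead of SQL Server, reduces the player fact table to two dimensions and two measures, and uses made-up sample rows; it is an illustration of the structure, not the project's actual DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Dim_Player (Player_ID INTEGER PRIMARY KEY, Player_Name TEXT);
CREATE TABLE Dim_Match  (Match_ID  INTEGER PRIMARY KEY, Match_Date  TEXT);

-- Foreign keys point at the dimensions; the remaining columns are measures.
CREATE TABLE Fact_Player_Performance (
    Player_ID     INTEGER NOT NULL REFERENCES Dim_Player (Player_ID),
    Match_ID      INTEGER NOT NULL REFERENCES Dim_Match (Match_ID),
    Runs_scored   INTEGER,
    Wickets_taken INTEGER,
    PRIMARY KEY (Player_ID, Match_ID)  -- composite key built from the foreign keys
);

-- Made-up sample rows.
INSERT INTO Dim_Player VALUES (1, 'Player A'), (2, 'Player B');
INSERT INTO Dim_Match  VALUES (10, '2016-04-09'), (11, '2016-04-12');
INSERT INTO Fact_Player_Performance VALUES (1, 10, 40, 0), (1, 11, 25, 1), (2, 10, 5, 3);
""")

# Join through the foreign key and aggregate the additive measures.
totals = conn.execute("""
    SELECT p.Player_Name, SUM(f.Runs_scored), SUM(f.Wickets_taken)
    FROM Fact_Player_Performance AS f
    JOIN Dim_Player AS p ON p.Player_ID = f.Player_ID
    GROUP BY p.Player_Name
    ORDER BY p.Player_Name
""").fetchall()
```

The query never reads descriptive text from the fact table itself: names come from the dimension, numbers from the fact.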


6.3.5 Creating the Player fact table

Figure: Creating Player Fact table

6.3.6 Creating the Procurement Fact table

Figure: Creating Procurement Fact table


7. METADATA DOCUMENTATION

7.1 Fact Meta Data

Fact Table Meta Data: Description
Name of Fact Table: Fact_Bowling_Performance
Business Definition: This table stores various metrics to track each bowler’s performance
Alias: Bowling Performance Table
Grain: Bowler performance in each match
Load Frequency: Daily
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows; check for null values
Key: {Match_ID, Player_ID, Date_ID, Team_ID}
Key Generation Method: Composite key (all the foreign keys are concatenated to create a primary key for the fact table)
Measures:
  Number_of_Wickets: Additive
  Number_of_Runs: Additive
  Extras_given: Additive
  Economy: Non-additive
  Wickets_in_powerplay: Additive
  Wickets_in_middle: Additive
  Wickets_in_death: Additive
  Number_of_overs_bowled: Additive
  Number_of_lbw: Additive
  Number_of_wk_catch: Additive
  Number_of_catch_wickets: Additive
  Runs_in_powerplay: Additive
  Runs_in_middle: Additive
  Runs_in_death: Additive
Dimensions: Dim_Time, Dim_Player, Dim_Match, Dim_Team
Contact Person: Mr. Anantha Bolar
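The difference between additive and non-additive measures can be shown with Economy. The sketch below assumes the usual cricket definition (runs conceded per over bowled), assumes Number_of_Runs holds runs conceded, and uses whole overs and made-up figures for simplicity.

```python
# Two hypothetical match-level rows for one bowler.
match_rows = [
    {"Number_of_Runs": 24, "Number_of_overs_bowled": 4},  # 6.0 runs per over
    {"Number_of_Runs": 20, "Number_of_overs_bowled": 2},  # 10.0 runs per over
]

def economy(runs, overs):
    """Runs conceded per over bowled."""
    return runs / overs

# Additive measures can simply be summed across matches.
total_runs = sum(r["Number_of_Runs"] for r in match_rows)
total_overs = sum(r["Number_of_overs_bowled"] for r in match_rows)

# The non-additive measure is recomputed from the additive totals.
season_economy = economy(total_runs, total_overs)

# Averaging the per-match ratios gives a different, misleading figure.
naive_average = sum(
    economy(r["Number_of_Runs"], r["Number_of_overs_bowled"]) for r in match_rows
) / len(match_rows)
```

Here the recomputed economy is 44 / 6, roughly 7.33, while the naive average of the two ratios is 8.0, which is why a ratio such as Economy should be derived from its additive components when rolling up.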

Fact Table Meta Data: Description
Name of Fact Table: Fact_Batting_Performance
Business Definition: This table stores various metrics to track each batsman’s performance
Alias: Batting Performance Table
Grain: Batsman performance in each match
Load Frequency: Daily
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows; check for null values
Key: {Match_ID, Player_ID, Date_ID, Team_ID}
Key Generation Method: Composite key (all the foreign keys are concatenated to create a primary key for the fact table)
Measures:
  Strike_Rate: Non-additive
  Runs_Scored: Additive
  Number_of_fours: Additive
  Number_of_sixes: Additive
  Runs_in_powerplay: Additive
  Runs_in_middle: Additive
  Runs_in_death: Additive
Conformed Fact: Yes
Dimensions: Dim_Time, Dim_Player, Dim_Match, Dim_Team
Contact Person: Mr. Anantha Bolar

Fact Table Meta Data: Description
Name of Fact Table: Fact_Product_Sales
Business Definition: Every invoice generated is stored in this table
Alias: Sales Transaction
Grain: Each sale at the retail store
Load Frequency: Daily
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows
Key: {Date_ID, Player_ID, Team_ID, Customer_ID, Product_ID, Store_ID}
Key Generation Method: Composite key (all the foreign keys are concatenated to create a primary key for the fact table)
Measures:
  Sales_Quantity
  Sales_Amount
  Cost_Amount
  Discount
  Gross_Profit_Amount
Conformed Fact: Yes
Dimensions: Dim_Time, Dim_Team, Dim_Player, Dim_Customer, Dim_Product, Dim_Store
Degenerate Dimensions: Order_No
Contact Person: Mr. Anantha Bolar

Fact Table Meta Data: Description
Name of Fact Table: Fact_Procurement
Business Definition: Every order for product procurement is stored in this table
Alias: Vendor_Transaction
Grain: Each order fulfilled by a vendor to a store owner
Load Frequency: Weekly
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows
Key: {Date_ID, Product_ID, Vendor_ID, Store_ID}
Key Generation Method: Composite key (all the foreign keys are concatenated to create a primary key for the fact table)
Facts:
  Quantity_Procured: Additive
  Purchase_Amount: Additive
Conformed Fact: Yes
Dimensions: Dim_Store, Dim_Vendor, Dim_Product, Dim_Time
Contact Person: Mr. Anantha Bolar

Fact Table Meta Data: Description
Name of Fact Table: Fact_Player
Business Definition: Every player’s key performance metrics are stored in this table
Alias: Player Performance
Grain: Each player who has played a match
Load Frequency: Daily
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows; check for null values
Key: {Date_ID, Player_ID, Match_ID}
Key Generation Method: Composite key (all the foreign keys are concatenated to create a primary key for the fact table)
Facts:
  Runs_scored: Additive
  Wickets_taken: Additive
  Batting_avg: Non-additive
  Bowling_avg: Non-additive
  Batting_strikerate: Non-additive
  Bowling_strikerate: Non-additive
  Number_of_catches_taken: Additive
  Man_of_the_matches: Additive
Conformed Fact: Yes
Dimensions: Dim_Time, Dim_Player, Dim_Match
Contact Person: Mr. Anantha Bolar

7.2 Dimension Metadata

Dimension Meta Data: Description
Dimension Name: Dim_Time
Business Definition: Lists different timespans of interest
Alias: N/A
Hierarchy: Balanced
SCD: No
Load Frequency: Daily
Data Quality: Clean Data
Primary Key: {Date_ID}
Key Generation Method: Key is generated when a new date is added to the table.
Conformed: Yes. The attribute remains the same across all fact tables
Role Playing: No

Attribute 1
  Name of the Attribute: Date_ID
  Definition of the Attribute: Uniquely identifies each time element in the table
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Date
  Definition of the Attribute: Date as per record
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: DATE
  Domain values for the Attribute: 20 Characters

Attribute 3
  Name of the Attribute: Day_of_week
  Definition of the Attribute: Specifies the day of the week
  Alias of the Attribute: N/A
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Month_of_year
  Definition of the Attribute: Specifies the month in that calendar year
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 5
  Name of the Attribute: Year
  Definition of the Attribute: Number of the year
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 6
  Name of the Attribute: Quarter_of_year
  Definition of the Attribute: Specifies the quarter in that calendar year
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: INT
  Domain values for the Attribute: 1-4

Facts: Fact_Bowling_Performance, Fact_Batting_Performance, Fact_Product_Sales, Fact_Procurement, Fact_Player
Contact Person: Mr. Anantha Bolar

Dimension Meta Data: Description
Dimension Name: Dim_Player
Business Definition: Lists personal details of each player
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows; check for null values
Key: {Player_ID}
Key Generation Method: Key is generated when a new player is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Player_ID
  Definition of the Attribute: Uniquely identifies each player in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Player_Name
  Definition of the Attribute: Name of the Player
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Team_Name
  Definition of the Attribute: Specifies the name of the team
  Alias of the Attribute: N/A
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Date_of_birth
  Definition of the Attribute: Specifies the date on which the player was born
  Change rules for the Attribute: N/A
  Data Type for the Attribute: DATE
  Domain values for the Attribute: 20 Characters

Attribute 5
  Name of the Attribute: Country
  Definition of the Attribute: Specifies the nationality of the player
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 6
  Name of the Attribute: Bowling_style
  Definition of the Attribute: Specifies the kind of bowling approach adopted
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 7
  Name of the Attribute: Batting_style
  Definition of the Attribute: Specifies the kind of batting approach adopted
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Facts: Fact_Bowling_Performance, Fact_Batting_Performance, Fact_Product_Sales, Fact_Player
Contact Person: Mr. Anantha Bolar
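A Type - 1 change rule means the attribute is overwritten in place and no history is kept. The sketch below shows this for Team_Name in Dim_Player using Python's built-in sqlite3 module; the helper name `apply_type1_change` and the sample values are hypothetical, and the table is cut down to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dim_Player ("
    "Player_ID INTEGER PRIMARY KEY, Player_Name TEXT, Team_Name TEXT)"
)
# Player_ID is autogenerated: the first inserted row receives 1.
conn.execute(
    "INSERT INTO Dim_Player (Player_Name, Team_Name) VALUES (?, ?)",
    ("Player A", "Team X"),
)

def apply_type1_change(conn, player_id, new_team):
    """Type 1: overwrite the attribute; the previous value is not retained."""
    conn.execute(
        "UPDATE Dim_Player SET Team_Name = ? WHERE Player_ID = ?",
        (new_team, player_id),
    )

apply_type1_change(conn, 1, "Team Y")
rows = conn.execute(
    "SELECT Player_ID, Player_Name, Team_Name FROM Dim_Player"
).fetchall()
```

After the update the table still holds a single row for the player, so reports on past seasons would show the new team as well. That trade-off is what distinguishes Type 1 from Type 2.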

Dimension Meta Data: Description
Dimension Name: Dim_Team
Business Definition: Lists details of each team
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Clean Data
Key: {Team_ID}
Key Generation Method: Key is generated when a new team is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Team_ID
  Definition of the Attribute: Uniquely identifies each team in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Team_Name
  Definition of the Attribute: Specifies the name of the team
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Home_venue
  Definition of the Attribute: Specifies the home ground of the team
  Alias of the Attribute: N/A
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Team_Captain
  Definition of the Attribute: Specifies the captain of the team
  Change rules for the Attribute: Type - 2
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 5
  Name of the Attribute: Captain_Start_Year
  Definition of the Attribute: Specifies year in which captain assumed responsibility
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 6
  Name of the Attribute: Captain_End_Year
  Definition of the Attribute: Specifies year in which captain relinquished responsibility
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Facts: Fact_Bowling_Performance, Fact_Batting_Performance, Fact_Product_Sales
Contact Person: Mr. Anantha Bolar
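Team_Captain is the one attribute here marked Type - 2, with Captain_Start_Year and Captain_End_Year bounding each captaincy. One way such a change could be handled is sketched below in Python with sqlite3. It assumes that each captaincy period gets its own row with a fresh surrogate Team_ID, that Team_Name serves as the natural key, and that a NULL Captain_End_Year marks the current row; none of these choices is confirmed by the report, and `change_captain` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dim_Team (
        Team_ID            INTEGER PRIMARY KEY,  -- surrogate key, autogenerated
        Team_Name          TEXT,
        Team_Captain       TEXT,
        Captain_Start_Year INTEGER,
        Captain_End_Year   INTEGER               -- NULL means "current captain"
    )
""")
conn.execute(
    "INSERT INTO Dim_Team (Team_Name, Team_Captain, Captain_Start_Year, Captain_End_Year) "
    "VALUES ('Team X', 'Captain A', 2008, NULL)"
)

def change_captain(conn, team_name, new_captain, year):
    """Type 2: close the current row, then add a new row for the new captain."""
    conn.execute(
        "UPDATE Dim_Team SET Captain_End_Year = ? "
        "WHERE Team_Name = ? AND Captain_End_Year IS NULL",
        (year, team_name),
    )
    conn.execute(
        "INSERT INTO Dim_Team (Team_Name, Team_Captain, Captain_Start_Year, Captain_End_Year) "
        "VALUES (?, ?, ?, NULL)",
        (team_name, new_captain, year),
    )

change_captain(conn, "Team X", "Captain B", 2011)
history = conn.execute(
    "SELECT Team_ID, Team_Captain, Captain_Start_Year, Captain_End_Year "
    "FROM Dim_Team ORDER BY Team_ID"
).fetchall()
```

Both captaincies remain queryable, so a fact row recorded in 2009 can still be attributed to the captain of that time.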

Dimension Meta Data: Description
Dimension Name: Dim_Product
Business Definition: Lists details of each product/merchandise sold
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Clean Data
Key: {Product_ID}
Key Generation Method: Key is generated when a new product is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Product_ID
  Definition of the Attribute: Uniquely identifies each product in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Product_Name
  Definition of the Attribute: Name of the Product
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Product_Brand
  Definition of the Attribute: Specifies the brand name of the product
  Alias of the Attribute: N/A
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Cost_price
  Definition of the Attribute: Specifies the cost of the product
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Currency
  Domain values for the Attribute: 0-9

Attribute 5
  Name of the Attribute: Discount
  Definition of the Attribute: Specifies the discount available on the product
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 6
  Name of the Attribute: Product_Category
  Definition of the Attribute: Specifies the product line of the merchandise
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 7
  Name of the Attribute: Vendor_Name
  Definition of the Attribute: Specifies the name of the vendor supplying the product
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 8
  Name of the Attribute: Team_Name
  Definition of the Attribute: Specifies the name of the team
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 9
  Name of the Attribute: Player_Name
  Definition of the Attribute: Name of the Player
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Facts: Fact_Product_Sales, Fact_Procurement
Contact Person: Mr. Anantha Bolar

Dimension Meta Data: Description
Dimension Name: Dim_Store
Business Definition: Lists details of each store
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Clean Data
Key: {Store_ID}
Key Generation Method: Key is generated when a new store is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Store_ID
  Definition of the Attribute: Uniquely identifies each store in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Store_Address
  Definition of the Attribute: Specifies the street where the store is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Store_City
  Definition of the Attribute: Specifies the city where the store is located
  Alias of the Attribute: N/A
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Store_State
  Definition of the Attribute: Specifies the state where the store is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 5
  Name of the Attribute: Store_Zip
  Definition of the Attribute: Specifies zip code of the area where the store is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 6
  Name of the Attribute: Store_Phone_No
  Definition of the Attribute: Specifies contact phone number of the store
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 7
  Name of the Attribute: Store_email_id
  Definition of the Attribute: Specifies contact email address of the store
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Facts: Fact_Product_Sales, Fact_Procurement
Contact Person: Mr. Anantha Bolar


Dimension Meta Data: Description
Dimension Name: Dim_Match
Business Definition: Lists details of each match
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Check for incorrect characters, redundant rows and inconsistent rows; check for null values
Key: {Match_ID}
Key Generation Method: Key is generated when a new match is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Match_ID
  Definition of the Attribute: Uniquely identifies each match in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Match_Date
  Definition of the Attribute: Specifies the date on which the game is played
  Change rules for the Attribute: N/A
  Data Type for the Attribute: DATE
  Domain values for the Attribute: 20 Characters

Attribute 3
  Name of the Attribute: Host_Team
  Definition of the Attribute: Specifies the name of the home team
  Alias of the Attribute: N/A
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Away_Team
  Definition of the Attribute: Specifies the name of the away team
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 5
  Name of the Attribute: Winning_Team
  Definition of the Attribute: Specifies the name of the winning team
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 6
  Name of the Attribute: Toss_won_by
  Definition of the Attribute: Specifies the team that won the toss for that game
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 7
  Name of the Attribute: Toss_decision
  Definition of the Attribute: Specifies the decision taken by the toss winner to bat or field
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 8
  Name of the Attribute: First_Innings_Score
  Definition of the Attribute: Specifies the target set by the team batting first
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 9
  Name of the Attribute: Second_Innings_Score
  Definition of the Attribute: Specifies the runs scored by the team batting second
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 10
  Name of the Attribute: Total_Runs
  Definition of the Attribute: Sum of the runs scored in a game by both teams
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 11
  Name of the Attribute: Win_margin
  Definition of the Attribute: Difference in scores by which the team won
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 12
  Name of the Attribute: Total_Wickets
  Definition of the Attribute: Sum of the wickets that fell in a game
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 13
  Name of the Attribute: Result_type
  Definition of the Attribute: Indicates the manner in which the winning team defeated the opposition
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Facts: Fact_Bowling_Performance, Fact_Batting_Performance, Fact_Player
Contact Person: Mr. Anantha Bolar

Dimension Meta Data: Description
Dimension Name: Dim_Customer
Business Definition: Lists details of each customer
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Clean Data
Key: {Customer_ID}
Key Generation Method: Key is generated when a new customer is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Customer_ID
  Definition of the Attribute: Uniquely identifies each customer in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: First_Name
  Definition of the Attribute: Specifies the first name of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Last_Name
  Definition of the Attribute: Specifies the last name of the customer
  Alias of the Attribute: N/A
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Customer_Address
  Definition of the Attribute: Specifies the street where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 5
  Name of the Attribute: Customer_City
  Definition of the Attribute: Specifies the city where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 6
  Name of the Attribute: Customer_State
  Definition of the Attribute: Specifies the state where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 7
  Name of the Attribute: Customer_Zip
  Definition of the Attribute: Specifies zip code of the area where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 8
  Name of the Attribute: Customer_Phone_No
  Definition of the Attribute: Specifies contact phone number of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 9
  Name of the Attribute: Customer_email_id
  Definition of the Attribute: Specifies contact email address of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 10
  Name of the Attribute: Customer_Street_no
  Definition of the Attribute: Specifies the street number where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 11
  Name of the Attribute: Customer_Street_Name
  Definition of the Attribute: Specifies the street name where the customer is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 12
  Name of the Attribute: Cust_Age
  Definition of the Attribute: Specifies the age of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 13
  Name of the Attribute: Cust_Gender
  Definition of the Attribute: Indicates the sex of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 14
  Name of the Attribute: Cust_Weight
  Definition of the Attribute: Indicates weight of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 15
  Name of the Attribute: Cust_Height
  Definition of the Attribute: Indicates height of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 16
  Name of the Attribute: Cust_Education
  Definition of the Attribute: Indicates literacy levels of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 17
  Name of the Attribute: Cust_Income
  Definition of the Attribute: Indicates the income of the customer
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Facts: Fact_Product_Sales
Contact Person: Mr. Anantha Bolar

Dimension Meta Data: Description
Dimension Name: Dim_Vendor
Business Definition: Lists details of each vendor
Alias: N/A
Hierarchy: Balanced
Change Rule: Type - 1
Load Frequency: Weekly
Data Quality: Clean Data
Key: {Vendor_ID}
Key Generation Method: Key is generated when a new vendor is added to the table.
Conformed: Yes
Role Playing: No

Attribute 1
  Name of the Attribute: Vendor_ID
  Definition of the Attribute: Uniquely identifies each vendor in the table
  Change rules for the Attribute: Type - 1
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9
  Derivation rules for the Attribute: Autogenerated (+1)

Attribute 2
  Name of the Attribute: Vendor_Address
  Definition of the Attribute: Specifies the street where the vendor is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 3
  Name of the Attribute: Vendor_City
  Definition of the Attribute: Specifies the city where the vendor is located
  Alias of the Attribute: N/A
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 4
  Name of the Attribute: Vendor_State
  Definition of the Attribute: Specifies the state where the vendor is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 5
  Name of the Attribute: Vendor_Zip
  Definition of the Attribute: Specifies zip code of the area where the vendor is located
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 6
  Name of the Attribute: Vendor_Phone_No
  Definition of the Attribute: Specifies contact phone number of the vendor
  Change rules for the Attribute: N/A
  Data Type for the Attribute: INT
  Domain values for the Attribute: 0-9

Attribute 7
  Name of the Attribute: Vendor_email_id
  Definition of the Attribute: Specifies contact email address of the vendor
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Attribute 8
  Name of the Attribute: Vendor_Name
  Definition of the Attribute: Specifies name of the vendor
  Change rules for the Attribute: N/A
  Data Type for the Attribute: Varchar
  Domain values for the Attribute: 50 Characters

Facts: Fact_Procurement
Contact Person: Mr. Anantha Bolar


8. BI REPORTS

There are two categories of BI reports that can be generated from the IPL data warehouse:

i. For Cricket Related Analysis: These reports relate solely to sporting information such as player performance, match-by-match analysis, and venue statistics.

ii. For Business Operations Analysis: These reports combine cricket data and retail data to understand customer shopping trends, procurement forecasts, and so on.

8.1 Cricket Related Analysis

Business Question 1: Which batsmen perform best in the death overs, by venue?


Figure: Batsmen performance during death overs at different venues

We can analyze batsmen's performance in the death overs across venues. This gives the team manager the information needed to decide which batsmen perform best under pressure. Filters in the dashboard let the user select the players and the venue. Here we have selected batsmen from the Kolkata Knight Riders team. As the above report shows, Gautam Gambhir has scored the most runs in the death overs.
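The aggregation behind a report of this kind can be sketched in a few lines of Python. This is only an illustration, not the dashboard's actual query: the row layout, the player and venue names, and the definition of the death overs as overs 16 to 20 of a 20-over innings are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical ball-by-ball rows: (batsman, venue, over_number, runs_scored).
# Names and figures are placeholders, not values from the warehouse.
DELIVERIES = [
    ("Batsman A", "Venue X", 17, 4),
    ("Batsman A", "Venue X", 19, 6),
    ("Batsman A", "Venue Y", 5, 1),
    ("Batsman B", "Venue X", 16, 2),
    ("Batsman B", "Venue Y", 20, 4),
]

# Assumption: the death overs are overs 16-20 of a 20-over innings.
DEATH_OVERS = range(16, 21)


def death_over_runs(deliveries):
    """Sum runs per (batsman, venue) for deliveries bowled in the death overs."""
    totals = defaultdict(int)
    for batsman, venue, over, runs in deliveries:
        if over in DEATH_OVERS:
            totals[(batsman, venue)] += runs
    return dict(totals)


if __name__ == "__main__":
    for (batsman, venue), runs in sorted(death_over_runs(DELIVERIES).items()):
        print(f"{batsman} at {venue}: {runs} runs in the death overs")
```

Filtering on the over number before grouping mirrors what the dashboard filters do: the player and venue selections simply narrow the set of keys that are displayed.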

Business Question 2: How has a player performed across all seasons of the IPL?


Figure: Player Performance across different IPL seasons

We have analyzed the data with the Player Name filter set to Virat Kohli. The dark blue line indicates the total runs Kohli scored in each IPL season. As we can see, his total runs have generally increased over the years. The light blue line represents the estimated values, including a prediction of his runs in 2016. The trend fitted to the historical data has an R-squared of 0.4, meaning it explains about 40% of the season-to-season variation in his runs. Past performance is therefore a moderate, rather than conclusive, indicator of future performance. This report can be used by the owner to judge whether the player is worth investing in.
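To make the R-squared figure concrete, the following sketch fits a straight-line trend to season totals and reports how much of the variation the line explains. It assumes the trend line is an ordinary least-squares fit, which the report does not state, and the season totals shown are made-up placeholders rather than Kohli's actual figures.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    seasons = [2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]
    runs = [150, 250, 300, 550, 360, 630, 360, 500]
    slope, intercept, r_squared = fit_trend(seasons, runs)
    forecast_2016 = intercept + slope * 2016
    print(f"R-squared: {r_squared:.2f}, forecast for 2016: {forecast_2016:.0f} runs")
```

An R-squared close to 1 would mean the seasons line up almost exactly along the trend; a value around 0.4 leaves most of the variation unexplained, so a forecast from the line should be read with a wide margin.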

Business Question 3: Which team won the most matches at home across all seasons?


Figure: Performance of teams at home venue

This question reveals critical information about home-ground advantage. As expected, the home team knows the ground better than the other teams and so has an advantage. This visual representation indicates which team won the most matches at home. The darkness of the blue and the size of each box indicate the number of matches won. It is evident that Chennai Super Kings had the greatest home-ground advantage. This report can be used by a bookie to place safer bets.

Business Question 4: What is the overseas bowlers’ economy by country for a particular season?


Figure: Overseas Bowlers’ Economy by Country

This thematic map represents the average bowlers’ economy by country for overseas players. Since there is a cap on the number of overseas players allowed in the IPL, this report can be used by team owners when selecting overseas players for the team. The color scale runs from gold to dark green according to the bowlers’ economy. On this scale, the bowlers from Sri Lanka have the best average economy.
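As a rough sketch of the measure behind this map, a bowler's economy is the number of runs conceded per over bowled, and a lower value is better. The example below averages individual economies by country. The row layout, names, and figures are hypothetical, and whole overs are used so that cricket's "overs.balls" notation does not need converting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical season figures: (bowler, country, runs_conceded, overs_bowled).
# Whole overs only, so no conversion from "overs.balls" notation is needed.
SPELLS = [
    ("Bowler A", "Country X", 24, 4),
    ("Bowler B", "Country X", 36, 4),
    ("Bowler C", "Country Y", 28, 4),
]


def economy_rate(runs_conceded, overs_bowled):
    """Runs conceded per over bowled; lower is better."""
    return runs_conceded / overs_bowled


def average_economy_by_country(spells):
    """Average the individual bowlers' economy rates within each country."""
    by_country = defaultdict(list)
    for _bowler, country, runs, overs in spells:
        by_country[country].append(economy_rate(runs, overs))
    return {country: mean(rates) for country, rates in by_country.items()}


if __name__ == "__main__":
    averages = average_economy_by_country(SPELLS)
    best = min(averages, key=averages.get)
    print(averages, "best:", best)
```

Averaging per-bowler rates weights every bowler equally regardless of how many overs he bowled; dividing total runs by total overs per country would be an equally reasonable choice and can give a different ranking.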

Business Question 5: How do batsmen perform through the various phases of a match?


Figure: Batsmen performance during various phases of the match

The user can use the filter to select a player’s name. The graph represents Virat Kohli’s performance in the various phases of a match across all IPL seasons. The light blue points indicate the runs Kohli is forecast to score based on the historical data. This report can be used by the team manager and/or team owner to analyze the performance of the player.

Business Question 6: Which bowling style is most successful in the death overs?


Figure: Success of bowling styles during death overs

This bar chart shows how successful each bowling style is in the death overs. This report can be used by the coach and captain to decide which bowler should bowl when the match is at a critical stage. The most successful bowling style, the one that took the most wickets in the death overs, is right-arm medium.

Business Question 7: Which team is most successful after winning the toss?


Figure: Success rate of teams after winning toss

This report can be used by a bookie to place and take bets on a particular team based on the result of the toss. The filters can be used to select particular teams or seasons. The data charted above indicates that Chennai Super Kings are the team most likely to win a match after winning the toss.
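One plausible way to compute the success rate shown in this report is to divide the matches a team won after winning the toss by the number of tosses it won. The sketch below assumes that definition and uses placeholder team names and results; the report itself does not spell out the formula.

```python
from collections import Counter

# Hypothetical match records: (toss_winner, match_winner).
MATCHES = [
    ("Team A", "Team A"),
    ("Team A", "Team B"),
    ("Team B", "Team B"),
    ("Team A", "Team A"),
]


def toss_success_rate(matches):
    """Share of toss wins that each team converted into match wins."""
    tosses_won = Counter()
    converted = Counter()
    for toss_winner, match_winner in matches:
        tosses_won[toss_winner] += 1
        if toss_winner == match_winner:
            converted[toss_winner] += 1
    return {team: converted[team] / tosses_won[team] for team in tosses_won}


if __name__ == "__main__":
    for team, rate in sorted(toss_success_rate(MATCHES).items()):
        print(f"{team}: {rate:.0%} of toss wins converted into match wins")
```

A rate based on only a handful of tosses is unstable, so the number of tosses won is worth showing alongside the percentage when comparing teams.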


8.2 Business Operations Analysis

Business Question 8: What is the trend in Jersey sales by player from 2008 to 2016?

Figure: Jersey sales by player

This report can be used by the various brands to see which player’s jersey has sold the most across all seasons, and how sales relate to the player’s performance. The user can select the player name in the dashboard. The different colors on the graph represent different players.


Business Question 9: What is the profit of various brands who produce merchandise?

Figure: Profits of brands producing merchandise

The above line graph represents the profit of the brands selling team merchandise. The report also displays the forecasted profit for the coming years. Brand owners can use this report to monitor their profit. Based on the report, Reebok had the highest profit across all seasons.


Business Question 10: How have the merchandise sales of all teams changed over time?

Figure: Merchandise Sales of all teams

This line graph represents the merchandise sales of all teams over time. This report can be used by team and brand owners to see the trend in merchandise sales, which can also be related to the performance of the team. The various colors represent the IPL teams. According to this report, Kolkata Knight Riders had the highest quantity of merchandise sold.


8.3 Additional Report Generated using SSRS

Business Question 11: What is the correlation between a team’s performance and its merchandise sales?

Figure: Correlation between a team’s on-field performance and merchandise sales

The above report depicts how a team’s performance affects its off-field revenue-generating activities, such as merchandise sales, over the last 9 seasons. From this report, we can see that Kolkata Knight Riders had the highest sales and profits for the majority of the 9-year span, although they have fared well on the field only since the 5th season of the IPL. This report can be used by the team owners to make strategic decisions on how to maximize their revenue from merchandise.
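The relationship described here can be quantified with a Pearson correlation coefficient between a performance measure and a sales measure per season. The following is a minimal sketch under stated assumptions: wins per season stands in for on-field performance, the figures are invented placeholders, and the SSRS report itself may present the relationship differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-season figures for one team, for illustration only.
    wins_per_season = [6, 7, 5, 9, 11, 8, 10, 9, 12]
    units_sold = [900, 950, 1000, 1200, 1500, 1400, 1600, 1550, 1800]
    print(f"Correlation: {pearson(wins_per_season, units_sold):.2f}")
```

A coefficient near +1 would support the idea that sales follow results, while a value near 0 would suggest that other factors, such as fan base or star players, drive merchandise revenue.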


9. GLOSSARY OF TERMS

1. Aggregation: One way of speeding up query performance. Facts are summed up for selected dimensions from the original fact table. The resulting aggregate table will have fewer rows, thus making queries that can use them go faster.

2. Attribute: Attributes represent a single type of information in a dimension. For example, year is an attribute in the Time dimension.

3. Alias: An alias is a short substitute or nickname for a table name.

4. Conformed Dimension: A dimension that has exactly the same meaning and content when being referred to from different fact tables.

5. Data Mart: Data marts have the same definition as the data warehouse (see below), but data marts have a more limited audience and/or data content.

6. Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process (as defined by Bill Inmon).

7. Data Warehousing: The process of designing, building, and maintaining a data warehouse system.

8. Degenerate dimension: A degenerate dimension is a dimension key, such as an invoice number or a ticket number, that has no attributes and hence has no actual dimension table.

9. Dimension: A category of related information. For example, year, month, day, and week are all part of the Time Dimension.

10. Dimensional Model: A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables: dimensional tables and fact tables. A dimensional table records information on each dimension, and a fact table records all the "facts", or measures.

11. Dimensional Table: A dimensional table stores records related to one particular dimension. No facts are stored in a dimensional table.

12. Drill Across: Data analysis across dimensions.

13. Drill Down: Data analysis to a child attribute.

14. Drill Through: Data analysis that goes from an OLAP cube into the relational database.

15. Drill Up: Data analysis to a parent attribute.


16. ETL: Stands for Extraction, Transformation, and Loading. The movement of data from one area to another.

17. Fact Table: A type of table in the dimensional model. A fact table typically includes two types of columns: fact columns and foreign keys to the dimensions.

18. Hierarchy: A hierarchy defines the navigating path for drilling up and drilling down. All attributes in a hierarchy belong to the same dimension.

19. Metadata: Data about data. For example, the number of tables in the database is a type of metadata.
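Several of the terms above (fact table, dimensional table, aggregation, drill up, drill down) can be seen together in a small example. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse database. The table names Dim_Time and Fact_Product_Sales come from this document, but the column names and the rows are assumptions made for illustration.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory star schema: one dimension table, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Dim_Time (
            time_key INTEGER PRIMARY KEY,  -- hypothetical surrogate key
            year INTEGER,
            month INTEGER
        );
        CREATE TABLE Fact_Product_Sales (
            time_key INTEGER REFERENCES Dim_Time(time_key),
            quantity INTEGER               -- hypothetical measure column
        );
        INSERT INTO Dim_Time VALUES (1, 2015, 4), (2, 2015, 5), (3, 2016, 4);
        INSERT INTO Fact_Product_Sales VALUES (1, 10), (1, 5), (2, 20), (3, 7);
        """
    )
    return conn


def quantity_by(conn, level):
    """Aggregate the quantity fact at the 'year' or 'month' level of the time hierarchy."""
    columns = {"year": "t.year", "month": "t.year, t.month"}[level]
    sql = (
        f"SELECT {columns}, SUM(f.quantity) "
        "FROM Fact_Product_Sales AS f "
        "JOIN Dim_Time AS t ON t.time_key = f.time_key "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_warehouse()
    print("Drill down to month:", quantity_by(conn, "month"))
    print("Drill up to year:", quantity_by(conn, "year"))
```

Moving from the month query to the year query is a drill up along the Time hierarchy, and the result of either query is an aggregation in the glossary's sense: fewer rows than the fact table, with the measure summed for the selected attributes.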

10. REFERENCES

[1] https://www.umbel.com/blog/sports/5-burning-questions-sports-marketers-have-about-fan-data/

[2] http://www.rpi.edu/datawarehouse/docs/Data-Warehouse-White-Paper.pdf

[3] http://cricbetlive.co.uk/2015/08/27/indian-premier-league-schedule-ipl-2016-fixtures/

[4] http://www.espncricinfo.com/india/content/story/374805.html

[5] http://www.smh.com.au/sport/cricket/big-bash-league-jumps-into-top-10-of-most-attended-sports-leagues-in-the-world-20160110-gm2w8z.html

[6] http://www.thehindu.com/sport/cricket/2015-indian-premier-league-ipl-contributed-rs115-billion-12-million-to-indias-gross-domestic-product-gdp-says-bcci/article7823334.ece

[7] https://www.umbel.com/blog/sports/5-burning-questions-sports-marketers-have-about-fan-data/

[8] http://www.jamesserra.com/archive/2012/03/data-warehouse-architecture-kimball-and-inmon-methodologies/

[9] http://tdan.com/data-warehouse-design-inmon-versus-kimball/20300

[10] http://www.bi-bestpractices.com/view-articles/4770

[11] http://www.dataintegration.info/etl

[12] http://www.solverglobal.com/blog/2014/04/data-warehouse-vs-olap-cube/


CONTRIBUTIONS OF EACH TEAM MEMBER

Collective work as a group:

Collecting the Cricket data and preparing retail data. Finalizing BI questions that the data warehouse will address. Drafting and Finalizing the dimensional model for the data set. Documentation for the respective tasks performed.

Individual Contributions:

Student: Satish Kaladagimath
Staging: Denormalizing the data provided and creating source tables and staging databases.
Data warehouse: ETL transformations and loading the dimension Dim_Match from staging to data warehouse. Loading the fact Fact_Bowling_performance from staging to data warehouse.
Reporting: Implementation of SSAS and SSRS reporting.

Student: Rahul Banur
Staging: Creating the retail data sets and loading the data from source CSV tables to staging.
Data warehouse: ETL transformations and loading the dimension Dim_Customer from staging to data warehouse. Loading the fact Fact_Batting_performance from staging to data warehouse.
Reporting: Implementation of Tableau dashboard.

Student: Anantha Bolar
Staging: Creating the retail data sets and loading the data from source CSV tables to staging.
Data warehouse: Loading the dimensions Dim_Time and Dim_Team from staging to data warehouse. Loading the fact Fact_Player from staging to data warehouse.
Reporting: Implementation of Tableau dashboard.

Student: Sameer Raja
Staging: Creating the retail data sets and loading the data from source CSV tables to staging.
Data warehouse: Loading the dimensions Dim_Product and Dim_Store from staging to data warehouse. Loading the fact Fact_Product_Sales from staging to data warehouse.
Reporting: Implementation of Tableau dashboard.

Student: Suraj Ishwaran
Staging: Creating the retail data sets and loading the data from source CSV tables to staging.
Data warehouse: Loading the dimensions Dim_Vendor and Dim_Player from staging to data warehouse. Loading the fact Fact_Procurement from staging to data warehouse.
Reporting: Implementation of SSAS and SSRS reporting.

Satish Kaladagimath

___________________________________________________

Rahul Banur

___________________________________________________

Anantha Bolar

_____________________________________________________

Sameer Raja

_____________________________________________________

Suraj Ishwaran

_____________________________________________________
