STAR SCHEME 1

Group 1

Ferris State University

MISM740: Business Intelligence

Assignment: Star Scheme

November 19, 2011


Introduction

Working as a team, our objective was to design and populate a star schema of the Northwind database. The assignment was graded as follows:

• Projects that have 3 dimensions will be able to receive up to 12 points.
• Projects that have at least 4 dimensions will be able to receive up to 15 points.
• Projects that have a design that has at least 4 dimensions and is able to allow reporting by Employee Region consistent with the existing Northwind database design will be eligible for extra credit.

Our team, consisting of Debbie Davis, Jennifer Dilly, Janice Hinds, and Jo Wood, chose a design with 4 dimensions that allows reporting by Employee Region consistent with the existing Northwind database design, to earn up to 15 points and be eligible for extra credit.

Our assignment submission, uploaded by one team member, includes:

• The original design submitted for review.
• A copy of our final database design.
• A title page with team members listed.
• Documentation of our ETL processes.
• A review of our team’s design efforts, realizations, and experiences.

Star Scheme

This group’s original design in Diagram 1 included only the Employee, Time, Customer, and OrderDetails tables.


Diagram 1 – Original Design

Fact: EmployeeID (PK), TimeID (PK), CustomerID (PK), OrderID (PK), EmpSalesbyState, Quantity, Discount
Employees: EmployeeID (PK), FName, LName
Time: TimeID (PK), Year, Month, Day, DayOfWeek, Holiday, Quarter
OrdersDetail: OrderID (PK), OrderDate, TotalPrice
Customers: CustomerID (PK), CompanyName, City, State

The Time table was designed to support reporting by year, quarter, month, day, and day of the week, and to indicate sales made during holidays. We felt the OrderDetails table was needed to bring in enough detail to see the amount of sales each employee had. The Customers table would bring in the detail about where each customer resided; combined in the Fact table with information from the other tables, this would show the region, or state, where the orders were going. The Fact table also included columns for Employee Sales by State as well as Quantity and Discount. At this point, we struggled with whether to include a Products table, because we were unsure where the quantity detail could be pulled from if it was not included in another table within the design. The Invoices table seemed to provide more pertinent information, including quantity and price, and therefore our second draft (Diagram 2) included the Invoices table rather than the OrderDetails table.


Diagram 2 – Second Draft

Fact: EmployeeID (PK), TimeID (PK), CustomerID (PK), InvoiceID (PK), EmpSalesbyState
Employees: EmployeeID (PK), FName, LName
Time: TimeID (PK), Year, Month, Day, DayOfWeek, Holiday, Quarter
Invoice: InvoiceID (PK), UnitPrice, Quantity, Discount, Extended Price
Customers: CustomerID (PK), CompanyName, City, State

There were a few things about this draft that we did not understand at the time. The first was that we needed to somehow link the employee information with the various regions. We also did not understand that the Invoice table contained the actual facts we needed to query, or rather measure, and that it was therefore not appropriate to keep them in a dimension table, which typically contains descriptive items.

Diagram 3 (below) shows this group’s final design. The four dimensions surrounding the fact table are Time, Product, Customers, and Employees. There is also a snowflake table, Region, linked to the Employees table, which provides a description of the employee’s region. The Fact table contains the IDs from each dimension, so that a query can return descriptive information from each dimension table. The Fact table also holds the measures, which in a report provide enough information to show the salesperson by name and employee ID, the price and quantity of what was sold, the date it was sold, and in which region.

Diagram 3 – Final Design

Fact: EmployeeID (PK), TimeID (PK), CustomerID (PK), ProductID (PK), Discount, ExtendedPrice, Quantity, UnitPrice, OrderDate, SalesPerson
Employees: EmployeeID (PK), FName, LName, RegionID
Region (snowflake table linked to Employees): RegionID (PK), CountryRegion
Time: TimeID (PK), Year, Month, Day, Qtr, OrderDate
Customers: CustomerID (PK), CompanyName, State, CountryRegion
Product: ProductID (PK), ProductName
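
As a rough illustration of how the final design supports reporting by Employee Region, the sketch below rebuilds the tables of Diagram 3 in an in-memory SQLite database through Python. It is only a sketch under stated assumptions: our project used SQL Server, not SQLite, and the column types, employee names, region names, and sales figures here are all made up for the example.

```python
import sqlite3

# In-memory SQLite stand-in for NorthwindStar (assumption: the project itself used SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region    (RegionID INTEGER PRIMARY KEY, CountryRegion TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FName TEXT, LName TEXT,
                        RegionID INTEGER REFERENCES Region(RegionID));
CREATE TABLE "Time"    (TimeID INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER,
                        Day INTEGER, Qtr INTEGER, OrderDate TEXT);
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT,
                        State TEXT, CountryRegion TEXT);
CREATE TABLE Product   (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Fact (
    EmployeeID INTEGER REFERENCES Employees(EmployeeID),
    TimeID     INTEGER REFERENCES "Time"(TimeID),
    CustomerID TEXT    REFERENCES Customers(CustomerID),
    ProductID  INTEGER REFERENCES Product(ProductID),
    Discount REAL, ExtendedPrice REAL, Quantity INTEGER, UnitPrice REAL,
    OrderDate TEXT, SalesPerson TEXT,
    PRIMARY KEY (EmployeeID, TimeID, CustomerID, ProductID)
);
""")

# Made-up sample rows. SQLite does not enforce foreign keys by default,
# so only the tables used by the report are populated here.
conn.executemany("INSERT INTO Region VALUES (?, ?)", [(1, "East"), (2, "West")])
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "Lee", 1), (2, "Bob", "Ray", 2)])
conn.executemany(
    "INSERT INTO Fact VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, "C001", 1, 0.0, 100.0, 10, 10.0, "1996-07-04", "Ann Lee"),
     (1, 2, "C001", 2, 0.0, 50.0, 5, 10.0, "1996-07-05", "Ann Lee"),
     (2, 1, "C002", 1, 0.0, 30.0, 3, 10.0, "1996-07-04", "Bob Ray")])

# Sales per employee, grouped by the employee's region via the snowflake table.
report = conn.execute("""
    SELECT r.CountryRegion, e.FName || ' ' || e.LName AS Employee,
           SUM(f.ExtendedPrice) AS Sales
    FROM Fact f
    JOIN Employees e ON e.EmployeeID = f.EmployeeID
    JOIN Region r    ON r.RegionID = e.RegionID
    GROUP BY r.CountryRegion, e.EmployeeID
    ORDER BY r.CountryRegion, e.EmployeeID
""").fetchall()
print(report)
```

The region never appears in the Fact table itself; it is reached through Employees and then Region, which is what makes Region a snowflake table.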

SQL Server

The objective of this assignment was to design and populate a star schema of the Northwind database. The Northwind database is a Microsoft sample database that can be found in many incarnations online; we downloaded the version that indicated it was for Microsoft Access 2007.

To build decision support structures, Microsoft SQL Server 2008 R2 was downloaded and Microsoft SQL Server Management Studio, Version 10.50.1617.0, was used. Getting SQL Server was a personal trial and tribulation in this assignment, as described in the sections about personal efforts, realizations, and experiences. Many other members of the team had similar issues getting SQL Server downloaded and installed.

With the help of Dr. Gogolin, the first step was to create a database, which was named Northwind. Next, the Import Data option on the Tasks menu was used to import the entire Access Northwind database and all of its data. The entire database was imported in case the team made a design change. After that, the database for the star schema was named NorthwindStar.

With more coaching by Dr. Gogolin, data was imported from the Northwind database into the NorthwindStar database. He showed the group how to import just the information that our star schema design was going to use.

Importing the information our star schema needed was accomplished using the SQL Server Import and Export Wizard. Its Edit Mappings feature allowed us to leave out the fields we did not need, as seen in the screen shot above, by picking <ignore> from the drop-down menu under Destination in Column Mappings.
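
The effect of picking <ignore> can be sketched outside the wizard as a column mapping in which some source columns have no destination. The example below is an illustration only, using Python and SQLite instead of the wizard; the table SrcCustomers, its columns, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table standing in for a wider Northwind table.
conn.execute("CREATE TABLE SrcCustomers (CustomerID TEXT, CompanyName TEXT, "
             "ContactName TEXT, Phone TEXT, State TEXT)")
conn.execute("INSERT INTO SrcCustomers VALUES "
             "('C001', 'Example Co', 'Pat Doe', '555-0100', 'MI')")

# Destination dimension table with only the columns the design needs.
conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, "
             "CompanyName TEXT, State TEXT)")

# Source column -> destination column; None plays the role of <ignore>.
column_map = {
    "CustomerID": "CustomerID",
    "CompanyName": "CompanyName",
    "ContactName": None,
    "Phone": None,
    "State": "State",
}

# Keep only the mapped columns and copy them across in one statement.
kept = [(src, dst) for src, dst in column_map.items() if dst is not None]
src_cols = ", ".join(src for src, _ in kept)
dst_cols = ", ".join(dst for _, dst in kept)
conn.execute(f"INSERT INTO Customers ({dst_cols}) SELECT {src_cols} FROM SrcCustomers")

rows = conn.execute("SELECT * FROM Customers").fetchall()
print(rows)
```

The ignored columns stay in the source table and never reach the dimension table.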


Three of our four dimension tables (Customers, Products, and Employees) were created from our database design and imported with the SQL Server Import and Export Wizard. Screen shots of them are shown below:

Our design included a snowflake dimension table for Region, connected to Employees; its creation is shown in the screen shot to the right.

An Excel spreadsheet for the Region dimension table data was created, as shown in the screen shot to the left.


A time dimension had to be created for our project. We determined that our star schema would report by day (DD), month (MM), year (YYYY), and quarter (QTR), so the Time dimension table shown in the screen shot to the right was created.

Data for Time was imported from an Excel spreadsheet that was created to hold time data, as shown in the screen shot to the left.
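
We built the Time data by hand in Excel. As an alternative, the same kind of rows could be generated with a short script; the sketch below is not what we did, and it assumes a sequential TimeID, a text OrderDate in YYYY-MM-DD form, and 1996 as an example year.

```python
from datetime import date, timedelta


def build_time_rows(year):
    """Return one row per calendar day of `year` with the Time dimension fields."""
    rows = []
    day = date(year, 1, 1)
    time_id = 1  # assumption: TimeID is a simple running number
    while day.year == year:
        rows.append({
            "TimeID": time_id,
            "Year": day.year,
            "Month": day.month,
            "Day": day.day,
            "Qtr": (day.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
            "OrderDate": day.isoformat(),
        })
        day += timedelta(days=1)
        time_id += 1
    return rows


rows_1996 = build_time_rows(1996)
print(len(rows_1996), rows_1996[0], rows_1996[-1])
```

Because the rows come from the calendar itself, a leap year gets its extra day (February 29) without any special handling.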

Our fact table, named Fact, is shown in the screen shot to the left.


After the dimensions and fact table were done, our next objective in completing this project was to create queries that would provide the employee sales report that we wanted. Our first SQL query attempted to match OrderDate to the corresponding value in the Time dimension table.

As you can see in the screen shot below, the TimeID field still contained NULL values because OrderDate included both a date (year, month, and day) and a time (hours and seconds), and our Time dimension table did not contain the time fields. In addition, the EmployeeID field contained NULL values.


The NULL values in the EmployeeID field were the next problem that we needed a SQL query to solve. The problem was that the Salesperson field held the employee’s full name in a single field.

Dr. Gogolin came to the rescue for our team again and helped us with the SQL query to allow Salesperson to be matched against our Employee dimension table. That was accomplished by a statement that read Salesperson and separated it at the space into first name and last name, allowing it to be matched against the separate FirstName and LastName fields of the Employee dimension table. The results are shown in the script and execution screen shot below:
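
The script itself survives only as a screen shot. As a rough illustration of the same idea, and not the statement Dr. Gogolin wrote, the sketch below splits Salesperson at the first space and uses the two halves to look up EmployeeID. It runs on SQLite through Python, so the substr and instr functions are SQLite's and not the SQL Server functions we used; the column names follow Diagram 3 and the employee names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
CREATE TABLE Fact (SalesPerson TEXT, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO Fact (SalesPerson) VALUES ('Ann Lee'), ('Bob Ray'), ('Ann Lee');
""")

# Text before the first space is matched to FName, text after it to LName.
conn.execute("""
UPDATE Fact
SET EmployeeID = (
    SELECT e.EmployeeID
    FROM Employees e
    WHERE e.FName = substr(Fact.SalesPerson, 1, instr(Fact.SalesPerson, ' ') - 1)
      AND e.LName = substr(Fact.SalesPerson, instr(Fact.SalesPerson, ' ') + 1)
)
""")

rows = conn.execute(
    "SELECT SalesPerson, EmployeeID FROM Fact ORDER BY rowid").fetchall()
print(rows)
```

A name with more than one space (for example a two-part last name) would put everything after the first space into the last-name half, so this approach only works when the stored names follow the simple "first last" pattern.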


Finally, the TimeID field still contained NULL values because OrderDate included both a date (year, month, and day) and a time (hours and seconds), and our Time dimension table did not contain the time fields.

Dr. Gogolin revised our original SQL query to “cast” OrderDate as an integer, which allowed OrderDate to be matched in our Time dimension table.

The script and final execution results are shown below:
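
The revised script is likewise only available as a screen shot. The sketch below illustrates the underlying problem and one way around it, using SQLite through Python. Note the swap: our fix cast OrderDate as an integer in SQL Server, whereas this example uses SQLite's date() function to drop the time portion before matching. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Time" (TimeID INTEGER PRIMARY KEY, OrderDate TEXT);
CREATE TABLE Fact (OrderDate TEXT, TimeID INTEGER);
INSERT INTO "Time" VALUES (1, '1996-07-04'), (2, '1996-07-05');
INSERT INTO Fact (OrderDate) VALUES ('1996-07-04 09:30:00'), ('1996-07-05 00:00:00');
""")

# First attempt: compare the full date-and-time value with the date-only key.
conn.execute("""
UPDATE Fact
SET TimeID = (SELECT t.TimeID FROM "Time" t WHERE t.OrderDate = Fact.OrderDate)
""")
naive = conn.execute("SELECT TimeID FROM Fact ORDER BY rowid").fetchall()

# Second attempt: strip the time portion first, then compare.
conn.execute("""
UPDATE Fact
SET TimeID = (SELECT t.TimeID FROM "Time" t WHERE t.OrderDate = date(Fact.OrderDate))
""")
fixed = conn.execute("SELECT TimeID FROM Fact ORDER BY rowid").fetchall()

print(naive)  # no match is found, so every TimeID stays NULL (None in Python)
print(fixed)  # each order now points at its day in the Time table
```

Either approach works for the same reason: both sides of the comparison are reduced to a whole day before they are matched.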

ETL

We may have performed the ETL for populating our star schema dimensions clumsily, but it was done. They say Excel is a good tool, and it was put to work developing the time dimension. Rather than creating a format that used a limited number of items to generate a multitude of combinations, actual items were used to populate our dimensions. The regions did not come from actual places and were created for the assignment, but all in all, the actual import was performed using SQL Server, which imported the Excel, Northwind, and newly created information.

Realizations and Experiences with the Assignment

Jo Woods

Unlike my teammates, I did not have a problem getting SQL Server to work for this project, but only because I had encountered the same problems earlier and Dr. Gogolin helped me get it installed and working during the previous class meeting. Even though my knowledge of and experience with SQL Server are very limited, I like working with it, and I know I would benefit from more education in how it works.

I called us the “remedial team” only half-jokingly because of our limited background in this kind of work: two of us are accountants and one is a nurse. On the other hand, many of our fellow students on other teams work with SQL Server and seemed to have a better idea of what they were doing than we did. It was, at times, quite intimidating.

Creating a star schema seemed fairly straightforward when Dr. Gogolin explained it in class. However, putting together the design was more complicated than I had imagined. I had to keep reminding myself what the function of each kind of table was when thinking about what fields went in them.

One of the most frustrating parts of the project was the poor design of Microsoft’s Northwind database in two areas that our team needed to make our star schema work. The first was that Salesperson was a single field containing both the first and last name, a violation of database design rules. The other was that OrderDate included both date (year, month, day) and time (hours, seconds).


I did not think that I would take a lot away from this project, because with my limited knowledge of and experience with SQL Server I did not expect to understand much of what we were doing. However, I am glad to say that I was wrong. With the help of Dr. Gogolin, I am able to understand star schema design and concepts. I have also learned a lot about using SQL Server.

Jan Hinds

When I first began working on this assignment, I had continued difficulty getting a working SQL Server application. This made it difficult to collaborate on the assignment with my teammates and meant relying on their manipulation of the data. I didn’t want to be of no help to them. I definitely wanted to help, but their use of a laptop prevented that. Luckily, at the end of the day, I downloaded and installed a fully working SQL Server application.

In addition, there were classmates who used SQL in their work or had other experience using it. It was intimidating to find that my questions were below their working abilities. However, this intimidation was personal and was never once communicated to me by anyone in the class, which helped my ego. Once I resolved this issue in my mind (although I hoped my incredulity was conveyed in a humorous manner!), it was gratifying to get down to work and arrive at a design that satisfied the assignment, challenged our team, and was comparable to the work of our fellow students.

The work we completed means that, as a Database Manager, I may work on databases with people who have more knowledge, but I can compete with them in broad knowledge of the database’s function. The main work consists of preparing the database’s tables (with the correct properties) and importing the work of others or information created in another format (Access, Excel, etc.). I may not be the designer of the database, but I will be able to understand or oversee the work of others.

Our ETL process reflected our lack of expertise in extracting information from the Northwind database. We did not create formulas to manipulate data and derive new information from existing fields, but instead imported what existed. The Time table designed in Excel served as the template for a year-long calendar for use with the data. The year in question turned out to be a leap year, but disregarding this fact helped us complete the assignment and fit the Time dimension together with the other dimensions.

Jennifer Dilly

At first the star schema design did not seem difficult, but I quickly realized I did not know as much as I thought! Working through the design with a group, in a classroom with others who obviously already have some of this knowledge, helped somewhat. The biggest realization came with what is contained in the Fact table. The PowerPoint presentation about star schemas provided for class helped clarify the difference between the descriptive information contained in the dimension tables and the facts contained in the Fact table, which are the facts needed when creating a report.

The hard part for me was thinking in terms of business (price, quantity, customers, etc.) instead of details related to healthcare topics. As a very visual and hands-on learner, I found that not having an example I could relate to made this a little more difficult. I could understand what each table in the Northwind database contains, but I had a hard time visualizing what descriptive information was needed to produce the required report with the appropriate information.


Debbie Davis

I found this assignment to be very challenging, just as my fellow group members did. I thought that it was easy at first; I was sure that I understood what the assignment was and what a star schema database was all about. As we worked through the assignment, we found that we hadn’t really understood what the assignment was all about. I know that I did not realize that the fact table was the root of the database, although I should have, considering where it is located in the database. So I learned that the fact table is the center of the database. It is in essence the root of what you are trying to discover with the database.

As far as populating the tables in the database, I really was lost; SQL is really not my thing yet. I hope that as the opportunity arises I won’t feel so intimidated by this and will be able to gain the experience to work with SQL databases. I feel that in the end I learned a lot from this experience. You really had to think out first just what you wanted to accomplish and then build your tables to accommodate this, and I thought that made you think about your design better.

I also enjoyed seeing the other groups’ databases, and especially how differently the groups went about this task. Obviously the ones with the most experience did an awesome job on both the database and how they populated it. But I thought that all the groups gave this assignment the thought and attention that it deserved and did the assignment to the best of their abilities as a group. It was a great learning experience for each of us; you could see the excitement for what the groups had accomplished.