STAR SCHEME 1
Group 1
Ferris State University
MISM740: Business Intelligence
Assignment: Star Scheme
November 19, 2011
Introduction
Working as a team, our objective was to design and populate a star schema of the Northwind database under the following grading criteria:
Projects that have 3 dimensions will be able to receive up to 12 points.
Projects that have at least 4 dimensions will be able to receive up to 15 points.
Projects with a design that has at least 4 dimensions and allows reporting by Employee Region, consistent with the existing Northwind database design, will be eligible for extra credit.
Our team, consisting of Debbie Davis, Jennifer Dilly, Janice Hinds, and Jo Wood, chose a design with 4 dimensions that allows reporting by Employee Region, consistent with the existing Northwind database design, so that it could earn up to 15 points and be eligible for extra credit.
Our assignment submission, uploaded by one team member, includes:
The original design submitted for review.
A copy of our final database design.
A title page with team members listed.
Documentation of our ETL processes.
A review of our team’s design efforts, realizations, and experiences.
Star Schema
Our group's original design, shown in Diagram 1, included only the Employees, Time, Customers, and OrderDetails tables.
Diagram 1 – Original Design
Fact: PK EmployeeID, PK TimeID, PK CustomerID, PK OrderID; EmpSalesbyState, Quantity, Discount
Employees: PK EmployeeID; FName, LName
Time: PK TimeID; Year, Month, Day, DayOfWeek, Holiday, Quarter
OrdersDetail: PK OrderID; OrderDate, TotalPrice
Customers: PK CustomerID; CompanyName, City, State
The Time table was designed to break dates down by year, month, day, day of the week, and quarter, and to flag sales made on holidays. We felt the OrderDetails table was needed to bring in the order details required to see each employee's sales amounts. The Customers table would supply where each customer was located; combining this information in the Fact table with information from the other tables would show the region, or state, to which orders were going. The Fact table also included columns for Employee Sales by State, Quantity, and Discount.
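As a minimal sketch of the attributes described above, the Python function below breaks a single date into the Year, Month, Day, DayOfWeek, Quarter, and Holiday columns of the original Time table. The function name, the Monday = 1 weekday numbering, and the holiday list are assumptions for illustration; the team's design did not specify them.

```python
from datetime import date


def time_attributes(d: date, holidays: set) -> dict:
    """Break one calendar date into the columns of the original Time table."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Day": d.day,
        # isoweekday(): Monday = 1 ... Sunday = 7 (this numbering is an assumption)
        "DayOfWeek": d.isoweekday(),
        # Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
        "Quarter": (d.month - 1) // 3 + 1,
        # 1 if the date is on the (hypothetical) holiday list, else 0
        "Holiday": 1 if d in holidays else 0,
    }


# Hypothetical holiday list, for illustration only
HOLIDAYS = {date(1996, 7, 4), date(1996, 12, 25)}

print(time_attributes(date(1996, 7, 4), HOLIDAYS))
```

Storing these values as columns, rather than recomputing them in every query, is what lets a report group sales by quarter or by holiday with a simple join.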
At this point, we debated whether to include a Products table, because we were unsure where the quantity detail would come from if no other table in the design contained it. The Invoices table seemed to provide more pertinent information, including quantity and price, so our second draft (Diagram 2) used the Invoices table instead of the OrderDetails table.
Diagram 2 – Second Draft
Fact: PK EmployeeID, PK TimeID, PK CustomerID, PK InvoiceID; EmpSalesbyState
Employees: PK EmployeeID; FName, LName
Time: PK TimeID; Year, Month, Day, DayOfWeek, Holiday, Quarter
Invoice: PK InvoiceID; UnitPrice, Quantity, Discount, Extended Price
Customers: PK CustomerID; CompanyName, City, State
There were two things about this draft that we did not understand at the time. First, we needed some way to link the employee information to the various regions. Second, the information in the Invoice table contained the actual facts we needed to query and measure, so it was not appropriate to keep it in a dimension table, which typically contains descriptive items.
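To illustrate the distinction, here is a hypothetical Python sketch that splits one invoice line into a fact row (keys plus numeric measures) and a Product dimension row (key plus description). It assumes ExtendedPrice is UnitPrice x Quantity x (1 - Discount) with Discount stored as a fraction; that formula is an assumption to check against the actual Invoices data, and the sample values are made up.

```python
from decimal import Decimal, ROUND_HALF_UP


def extended_price(unit_price: Decimal, quantity: int, discount: Decimal) -> Decimal:
    """UnitPrice x Quantity x (1 - Discount), rounded to cents.

    Assumes Discount is a fraction, e.g. Decimal("0.15") for 15% off.
    """
    raw = unit_price * quantity * (Decimal(1) - discount)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def split_invoice_line(line: dict) -> tuple:
    """Separate an invoice line into a fact row and a Product dimension row."""
    fact = {
        # The key points at the dimension; the measures are what a report adds up
        "ProductID": line["ProductID"],
        "UnitPrice": line["UnitPrice"],
        "Quantity": line["Quantity"],
        "Discount": line["Discount"],
        "ExtendedPrice": extended_price(
            line["UnitPrice"], line["Quantity"], line["Discount"]
        ),
    }
    # Descriptive text stays in the dimension, keyed by the same ID
    product = {"ProductID": line["ProductID"], "ProductName": line["ProductName"]}
    return fact, product


# Made-up invoice line, for illustration only
LINE = {"ProductID": 1, "ProductName": "Tea", "UnitPrice": Decimal("18.00"),
        "Quantity": 10, "Discount": Decimal("0.15")}

fact_row, product_row = split_invoice_line(LINE)
print(fact_row["ExtendedPrice"])  # 153.00
```

Decimal is used instead of float so that money values round predictably to cents.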
Diagram 3 (below) shows our group's final design. The four dimensions surrounding the fact table are Time, Product, Customers, and Employees. There is also a snowflake table, Region, linked to the Employees table, which provides a description of each employee's region. The Fact table contains the IDs from every dimension, so a query can pull descriptive information from each table. The measures in the Fact table give a report enough information to show the salesperson by name and employee ID, the price and quantity of what was sold, the date it was sold, and the region in which it was sold.
Diagram 3 – Final Design
Employees: PK EmployeeID; FName, LName, RegionID
Time: PK TimeID; Year, Month, Day, Qtr, OrderDate
Customers: PK CustomerID; CompanyName, State, CountryRegion
Fact: PK EmployeeID, PK TimeID, PK CustomerID, PK ProductID; Discount, ExtendedPrice, Quantity, UnitPrice, OrderDate, SalesPerson
Region: PK RegionID; CountryRegion
Product: PK ProductID; ProductName
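The final design can be sketched as runnable DDL. The example below uses Python's built-in sqlite3 module as a stand-in for SQL Server, so column types are simplified. It loads a few made-up rows and then runs a sales-by-region report that follows the snowflake link from Fact through Employees to Region. Table and column names follow Diagram 3; the sample data, the East/West region names, and the report query are hypothetical.

```python
import sqlite3

# Diagram 3 as DDL (types simplified for SQLite)
DDL = """
CREATE TABLE Region    (RegionID INTEGER PRIMARY KEY, CountryRegion TEXT);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FName TEXT, LName TEXT,
                        RegionID INTEGER REFERENCES Region(RegionID));
CREATE TABLE Time      (TimeID INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER,
                        Day INTEGER, Qtr INTEGER, OrderDate TEXT);
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT,
                        State TEXT, CountryRegion TEXT);
CREATE TABLE Product   (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Fact (
    EmployeeID INTEGER REFERENCES Employees(EmployeeID),
    TimeID     INTEGER REFERENCES Time(TimeID),
    CustomerID TEXT    REFERENCES Customers(CustomerID),
    ProductID  INTEGER REFERENCES Product(ProductID),
    Discount REAL, ExtendedPrice REAL, Quantity INTEGER, UnitPrice REAL,
    OrderDate TEXT, SalesPerson TEXT,
    PRIMARY KEY (EmployeeID, TimeID, CustomerID, ProductID)
);
"""

# Total sales per employee per region, via the Employees -> Region snowflake link
REPORT = """
SELECT r.CountryRegion, e.FName || ' ' || e.LName AS Employee,
       SUM(f.ExtendedPrice) AS TotalSales
FROM Fact f
JOIN Employees e ON e.EmployeeID = f.EmployeeID
JOIN Region r    ON r.RegionID   = e.RegionID
GROUP BY r.CountryRegion, e.FName, e.LName
ORDER BY r.CountryRegion, Employee;
"""


def build_star(conn: sqlite3.Connection) -> None:
    """Create the Diagram 3 tables and load a few made-up rows."""
    conn.executescript(DDL)
    conn.executemany("INSERT INTO Region VALUES (?, ?)", [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                     [(1, "Ann", "Lee", 1), (2, "Bob", "Ray", 2)])
    conn.execute("INSERT INTO Time VALUES (1, 2011, 11, 19, 4, '2011-11-19')")
    conn.execute("INSERT INTO Customers VALUES ('C1', 'Acme', 'MI', 'USA')")
    conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "Tea"), (2, "Coffee")])
    # Columns: EmployeeID, TimeID, CustomerID, ProductID, Discount,
    #          ExtendedPrice, Quantity, UnitPrice, OrderDate, SalesPerson
    conn.executemany(
        "INSERT INTO Fact VALUES (?, 1, 'C1', ?, 0, ?, ?, ?, '2011-11-19', ?)",
        [(1, 1, 36.0, 2, 18.0, "Ann Lee"),
         (1, 2, 46.0, 1, 46.0, "Ann Lee"),
         (2, 1, 18.0, 1, 18.0, "Bob Ray")],
    )


conn = sqlite3.connect(":memory:")
build_star(conn)
for row in conn.execute(REPORT):
    print(row)
```

Because RegionID lives on Employees rather than on Fact, the report reaches the region through one extra join; that extra hop is the trade-off a snowflake table makes in exchange for storing each region description only once.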
SQL Server
The objective of this assignment was to design and populate a star schema of the Northwind database. The Northwind database is a Microsoft sample database that exists online in many versions; we downloaded the version labeled for Microsoft Access 2007.
To build the decision support structures, we downloaded Microsoft SQL Server 2008 R2 and used Microsoft SQL Server Management Studio, version 10.50.1617.0. Getting SQL Server running was a trial in itself, as described in the sections on personal efforts, realizations, and experiences; other team members had similar trouble downloading and installing it.
With Dr. Gogolin's help, we first created a database named Northwind. We then used the Tasks > Import Data option to import the entire Access Northwind database with all of its data, so that everything would be available if the team made a design change. After that, we created a second database named NorthwindStar.
With further coaching from Dr. Gogolin, we imported data from the Northwind database into the NorthwindStar database. He showed the group how to import only the information our star schema design would use.
We imported only the information our star schema needed by using the SQL Server Import and Export Wizard. Its Edit Mappings feature let us exclude the fields we did not need, as seen in the screen shot above, by selecting <ignore> from the Destination drop-down menu in Column Mappings.
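The effect of choosing <ignore> can be sketched as an INSERT ... SELECT that lists only the wanted columns. In this hypothetical example, sqlite3 stands in for SQL Server, an attached in-memory database named star plays the role of NorthwindStar, the helper name copy_columns is made up, and the source column names (ID, Company, Phone, StateProvince, CountryRegion) are invented; Phone is the ignored field.

```python
import sqlite3


def copy_columns(conn, src_table, dest_table, column_map):
    """Copy only the mapped columns from src_table into dest_table.

    column_map is {source column: destination column}. A source column left
    out of the map is skipped, like picking <ignore> in Column Mappings.
    Names are pasted into the SQL text, so pass trusted names only.
    """
    src_cols = ", ".join(f'"{c}"' for c in column_map)
    dest_cols = ", ".join(f'"{c}"' for c in column_map.values())
    conn.execute(
        f"INSERT INTO {dest_table} ({dest_cols}) SELECT {src_cols} FROM {src_table}"
    )


conn = sqlite3.connect(":memory:")
# A second in-memory database plays the role of NorthwindStar
conn.execute("ATTACH DATABASE ':memory:' AS star")

# Hypothetical source table with one column we do not want (Phone)
conn.execute("CREATE TABLE main.Customers "
             "(ID TEXT, Company TEXT, Phone TEXT, StateProvince TEXT, CountryRegion TEXT)")
conn.execute("INSERT INTO main.Customers VALUES ('C1', 'Acme', '555-0100', 'MI', 'USA')")

# Destination shaped like the Customers dimension in Diagram 3
conn.execute("CREATE TABLE star.Customers "
             "(CustomerID TEXT PRIMARY KEY, CompanyName TEXT, State TEXT, CountryRegion TEXT)")

copy_columns(conn, "main.Customers", "star.Customers", {
    "ID": "CustomerID",
    "Company": "CompanyName",
    "StateProvince": "State",
    "CountryRegion": "CountryRegion",
})
print(conn.execute("SELECT * FROM star.Customers").fetchall())
```

The mapping dictionary also handles renaming, much as the wizard lets a source column land in a differently named destination column.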
Screen shots of three of our four dimension tables from our database design (Customers, Products, and Employees), populated using the SQL Server Import and Export Wizard, are shown below:
Our design included a snowflake dimension table for Region, connected to Employees; its creation is shown in the screen shot to the right.
An Excel spreadsheet holding the Region dimension table data was created, as shown in the screen shot to the left.
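If the Region spreadsheet is saved as CSV, loading it amounts to reading each row and inserting it into the snowflake table. The sketch below assumes hypothetical column headers RegionID and CountryRegion, a made-up helper named load_region, and sqlite3 in place of SQL Server; the team itself imported the spreadsheet with the Import and Export Wizard.

```python
import csv
import io
import sqlite3


def load_region(conn, csv_file):
    """Insert RegionID/CountryRegion rows from a CSV export into Region.

    Returns the number of rows loaded.
    """
    reader = csv.DictReader(csv_file)
    rows = [(int(r["RegionID"]), r["CountryRegion"].strip()) for r in reader]
    conn.executemany("INSERT INTO Region (RegionID, CountryRegion) VALUES (?, ?)", rows)
    return len(rows)


# Stand-in for the spreadsheet saved as CSV; the values are hypothetical
sample = io.StringIO("RegionID,CountryRegion\n1,East\n2,West\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Region (RegionID INTEGER PRIMARY KEY, CountryRegion TEXT)")
print(load_region(conn, sample))  # 2
```

Converting RegionID to int before inserting keeps the key type consistent with the RegionID column on Employees.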
Our project required a time dimension. Because we decided that our star schema would report by day (DD), month (MM), year (YYYY), and quarter (QTR), we created the Time dimension table shown in the screen shot to the right.
The data for Time was imported from an Excel spreadsheet created to hold it, as shown in the screen shot to the left.
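As an alternative to typing the calendar into Excel, the rows of a Time table like this one could be generated. The sketch below produces one row per day of a given year with the Year, Month, Day, Qtr, and OrderDate columns from Diagram 3, then loads them with sqlite3 as a stand-in for SQL Server. The YYYYMMDD integer TimeID and the function name build_time_rows are assumptions, not the team's actual key or tooling; one side benefit is that leap years are handled automatically by Python's date arithmetic.

```python
import sqlite3
from datetime import date, timedelta


def build_time_rows(year: int) -> list:
    """One row per day: (TimeID, Year, Month, Day, Qtr, OrderDate).

    TimeID is a hypothetical YYYYMMDD integer, not the team's actual key.
    """
    rows = []
    d = date(year, 1, 1)
    while d.year == year:  # stops after Dec 31, so a leap year yields 366 rows
        time_id = d.year * 10000 + d.month * 100 + d.day
        qtr = (d.month - 1) // 3 + 1
        rows.append((time_id, d.year, d.month, d.day, qtr, d.isoformat()))
        d += timedelta(days=1)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, Year INTEGER, "
             "Month INTEGER, Day INTEGER, Qtr INTEGER, OrderDate TEXT)")
conn.executemany("INSERT INTO Time VALUES (?, ?, ?, ?, ?, ?)", build_time_rows(1996))
print(conn.execute("SELECT COUNT(*) FROM Time").fetchone()[0])  # 366 (leap year)
```

Storing OrderDate as an ISO date string (YYYY-MM-DD) keeps it directly comparable with the date part of an order's timestamp.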
Our fact table, named Fact, is shown in the screen shot to the left.
Once the dimension and fact tables were built, our next objective was to write queries that would produce the employee sales report we wanted. Our first SQL query attempted to fill in TimeID by matching each OrderDate to the corresponding value in the Time dimension table.
As the screen shot below shows, the TimeID field still contained NULL values, because OrderDate held both a date (year, month, and day) and a time of day, while our Time dimension table held dates only. The EmployeeID field also contained NULL values.
The NULL values in the EmployeeID field were the next problem we needed a SQL query to solve. The cause was that the Salesperson field stored the employee's full name in a single field.
Dr. Gogolin again came to our team's rescue and helped us write a SQL query that matched Salesperson against our Employee dimension table. The query read Salesperson, split it at the space between the first and last names, and compared the two parts with the separate FirstName and LastName fields of the Employee dimension table. The results are shown in the script and execution screen shot below:
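Separately from the team's script, the same first-name/last-name split can be sketched in SQL. The hypothetical example below runs in sqlite3, where instr and substr play the roles that CHARINDEX and SUBSTRING (or LEFT) would typically play in T-SQL. It assumes every Salesperson value is exactly "First Last" with a single space; a middle name or a missing space would fail to match and leave EmployeeID NULL. The sample names are made up.

```python
import sqlite3

# Match Salesperson ("First Last") to Employees by splitting at the first space:
# text before the space is compared with FName, text after it with LName.
FILL_EMPLOYEE_IDS = """
UPDATE Fact SET EmployeeID = (
    SELECT e.EmployeeID FROM Employees e
    WHERE e.FName = substr(Fact.SalesPerson, 1, instr(Fact.SalesPerson, ' ') - 1)
      AND e.LName = substr(Fact.SalesPerson, instr(Fact.SalesPerson, ' ') + 1)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FName TEXT, LName TEXT)")
conn.execute("CREATE TABLE Fact (EmployeeID INTEGER, SalesPerson TEXT)")
# Hypothetical sample rows
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [(1, "Ann", "Lee"), (2, "Bob", "Ray")])
conn.executemany("INSERT INTO Fact (SalesPerson) VALUES (?)", [("Ann Lee",), ("Bob Ray",)])

conn.execute(FILL_EMPLOYEE_IDS)
print(conn.execute("SELECT SalesPerson, EmployeeID FROM Fact ORDER BY SalesPerson").fetchall())
```

Running a quick count of rows where EmployeeID is still NULL after the update is a simple way to catch names that do not fit the "First Last" pattern.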
Finally, we returned to the NULL values in the TimeID field, which were caused by the time-of-day portion of OrderDate.
Dr. Gogolin revised our original SQL query to cast OrderDate as an integer, which allowed OrderDate to be matched against our Time dimension table.
The script and final execution results are shown below:
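The cause of the NULL TimeID values, and one way around it, can be reproduced in a small sqlite3 sketch. Comparing the full OrderDate, time of day included, with a date-only value finds no match, while stripping the time first with SQLite's date() function does. This is a swapped-in technique, not Dr. Gogolin's integer cast; in SQL Server 2008 a closer equivalent would be CAST(OrderDate AS date). If the integer cast is kept, note that SQL Server rounds rather than truncates when converting datetime to int, so orders timed at or after noon can map to the next day (worth confirming against the SQL Server documentation). The sample values and the helper name time_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Time (TimeID INTEGER PRIMARY KEY, OrderDate TEXT)")
conn.execute("CREATE TABLE Fact (TimeID INTEGER, OrderDate TEXT)")
conn.execute("INSERT INTO Time VALUES (1, '1996-07-04')")  # date only
# Hypothetical order whose OrderDate carries a time of day
conn.execute("INSERT INTO Fact (OrderDate) VALUES ('1996-07-04 13:30:00')")

# Compares the full date-and-time value: no Time row matches, so TimeID is NULL
NAIVE = """UPDATE Fact SET TimeID =
    (SELECT t.TimeID FROM Time t WHERE t.OrderDate = Fact.OrderDate)"""

# date() drops the time of day before comparing, so the row matches
FIXED = """UPDATE Fact SET TimeID =
    (SELECT t.TimeID FROM Time t WHERE t.OrderDate = date(Fact.OrderDate))"""


def time_ids(conn, update_sql):
    """Run one UPDATE and return the resulting TimeID values."""
    conn.execute(update_sql)
    return [row[0] for row in conn.execute("SELECT TimeID FROM Fact")]


print(time_ids(conn, NAIVE))  # [None]
print(time_ids(conn, FIXED))  # [1]
```

Truncating to the date, rather than rounding, keeps an afternoon order on the day it was actually placed.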
ETL
Our ETL for populating the dimensions of the star schema may have been clumsy, but it got the job done. Excel is often described as a good tool for this work, and we put it to use developing the time dimension. Rather than building a format that generated many combinations from a small set of values, we populated our dimensions with actual items. The regions did not come from actual locations; they were created for the assignment. Overall, however, the actual import was performed with SQL Server, which imported the Excel data, the Northwind data, and the newly created information.
Realizations and Experiences with the Assignment
Jo Woods
Unlike my teammates, I did not have a problem getting SQL Server to work for this project, but only because I had encountered the same problems earlier and Dr. Gogolin helped me install it and get it working during the previous class meeting. Even though my knowledge of and experience with SQL Server are very limited, I like working with it and know I would benefit from more education in how it works.
I called us the “remedial team” only half-jokingly because of our limited background in
this kind of work – two of us are accountants and one is a nurse. On the other hand, many of our
fellow students on other teams work with SQL Server and seemed to have a better idea of what
they were doing than we did. It was, at times, quite intimidating.
Creating a star schema seemed fairly straightforward when Dr. Gogolin explained it in
class. However, putting together the design was more complicated than I had imagined. I had to
keep reminding myself what the function of each kind of table was when thinking about what
fields went in them.
One of the most frustrating parts of the project was the poor design of Microsoft's Northwind database in two areas our team needed in order to make our star schema work. The first was that Salesperson was a single field containing both the first and last name, a violation of database design rules. The second was that OrderDate included both the date (year, month, day) and the time (hours, seconds).
I did not expect to take much away from this project; given my limited knowledge of and experience with SQL Server, I did not think I would understand much of what we were doing. However, I am glad to say that I was wrong. With the help of Dr. Gogolin, I now understand star schema design and concepts. I have also learned a lot about using SQL Server.
Jan Hinds
When I first started on this assignment, I had continued difficulty getting a working SQL Server installation. This made it hard to collaborate with my teammates and meant relying on them to manipulate the data. I did not want to be helpless; I definitely wanted to help, but because the work was on their laptop, I could not. Luckily, in the end I downloaded and installed a fully working copy of SQL Server.
In addition, some classmates used SQL in their work or had other experience with it, so it was intimidating to find that my questions were below their level. However, this intimidation was my own and was never once conveyed to me by anyone in the class, which helped my ego. Once I resolved this issue in my mind (although I hope my incredulity came across with humor!), it was gratifying to get down to work and arrive at a design that satisfied the assignment, challenged our team, and was comparable to the work of our classmates.
The work we completed means that, as a Database Manager, even if the people I work with on databases know more than I do, I can hold my own in a broad understanding of how the database functions. The main work consists of preparing the database's tables (with the correct properties) and importing the work of others or information created in another format (Access, Excel, etc.). I may not be the designer of a database, but I will be able to understand and oversee the work of others.
Our ETL process reflected our lack of expertise in extracting information from the Northwind database. Rather than writing formulas to derive new, combined information from existing data, we imported what already existed. The Time table designed in Excel served as the template for a year-long calendar covering the data. Although the year in question turned out to be a leap year, disregarding that fact helped us complete the assignment and coordinate the Time dimension with the other dimensions.
Jennifer Dilly
At first, star schema design did not seem difficult, but I quickly realized I did not know as much as I thought! Working through the design with a group, in a classroom with others who clearly already had some of this knowledge, helped somewhat. My biggest realization concerned what belongs in the Fact table. The PowerPoint on star schemas provided for class helped clarify the difference between the descriptive information in the dimension tables and the facts in the Fact table, which are what a report needs.
The hard part for me was thinking in terms of business (price, quantity, customers, etc.) instead of healthcare topics. As a very visual and hands-on learner, I found it a little more difficult to work without an example I could relate to. I could understand what each table in the Northwind database contains, but I had a hard time visualizing what descriptive information was needed to produce the required report.
Debbie Davis
Like my fellow group members, I found this assignment very challenging. At first I thought it was easy; I was sure I understood what the assignment was and what a star schema database was all about. As we worked through it, however, we found that we had not really understood the assignment. I did not realize that the fact table was the root of the database, although I should have, considering its central position in the design. So I learned that the fact table is the center of the database: in essence, it is the root of what you are trying to discover with the database.
When it came to populating the tables in the database, I was really lost; SQL is not my thing yet. I hope that as opportunities arise I will not feel so intimidated and will be able to gain experience working with SQL databases. In the end, I feel that I learned a lot from this experience. You really had to think through first just what you wanted to accomplish and then build your tables to accommodate it, and I thought that made you think more carefully about your design.
I also enjoyed seeing the other groups' databases, especially how different groups approached the task. Obviously the groups with the most experience did an awesome job on both the database and how they populated it. But I thought all the groups gave this assignment the thought and attention it deserved and did it to the best of their abilities. It was a great learning experience for each of us, and you could see the excitement about what the groups had accomplished.