You will submit two files:
Word file with Q1 and Q2
EXCEL file for Q1 (show all work)
READ THE INSTRUCTIONS CAREFULLY!!!!!!!!
NO LATE SUBMISSION!!!!!!!
Note: Email me for any GENERAL clarifications. All work should be done on the sheet itself, except the EXCEL file. ORACLE queries and output should be embedded in the exam. Make sure to include all Oracle statements that you use (including how many rows were selected, etc.). I will be checking your queries in ORACLE in your account.
Please Show All Work -- ORACLE queries, their output For Full Credit
Q1. Given the following file for assignment worker.com, identify the data anomalies that must be removed before the data can be loaded into the data warehouse.
Worker_assignment (on course web site)
Assignment_worker(assignment_no, assignment_date, emp_number, chg_hour, assigned_hour, charges)
Where:
Assignment number is the number assigned to an assignment
Assignment_date is the date the assignment started
Emp_number is the number of the employee assigned to that assignment
Chg_hour is the amount paid to that employee for that assignment
Assigned_hour is the hours assigned to that employee for that assignment
Charges are the total charges for that employee for that assignment (this is calculated as Chg_hour*Assigned_hour)
Rules:
Assignment numbers always start with a letter followed by a 1 and are ALWAYS four characters long, ex: A123, Z178
Emp No is ALWAYS 3 CHARACTERS LONG
An employee cannot work more than 40 hours on a given project
Requirement: Count (using EXCEL formulas -- IF, COUNTIF etc. as done in class) four types of errors:
Missing data
Incorrect Format
  o To check length of empno, you can use LEN(cell address) to get the length of the item in that cell
  o check for assignment number format (BONUS +1 points)
Zero values
Incorrect Calculations
  o check for charges: charges = chg_hour*Assigned_Hour
  o check for employee working more than 40 hours
Once counted, draw a pie chart or line chart of the data anomalies and discuss which errors can be corrected and how. (submit in WORD)
(20 points)
Must submit the EXCEL worksheet where errors are calculated and graph is drawn
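The Q1 rules can also be cross-checked outside EXCEL. The following is only a rough sketch in Python, not the required EXCEL solution: the sample rows are made up for illustration (they are not taken from the course file), the helper names are hypothetical, each row is counted at most once per error type, and the assignment number pattern assumes the last two characters are letters or digits.

```python
import re
from collections import Counter

# Assumed pattern: a letter, then a 1, then two more letters/digits (four characters).
ASSIGNMENT_RE = re.compile(r"[A-Za-z]1[A-Za-z0-9]{2}")
NUMERIC_FIELDS = ("chg_hour", "assigned_hour", "charges")


def is_blank(value):
    """True when a cell is empty."""
    return value is None or value == ""


def classify(row):
    """Return the set of anomaly types found in one row."""
    errors = set()
    if any(is_blank(v) for v in row.values()):
        errors.add("missing")
    assignment_no, emp_number = row["assignment_no"], row["emp_number"]
    if not is_blank(assignment_no) and not ASSIGNMENT_RE.fullmatch(assignment_no):
        errors.add("format")
    if not is_blank(emp_number) and len(emp_number) != 3:
        errors.add("format")
    numbers = [row[f] for f in NUMERIC_FIELDS]
    if any(n == 0 for n in numbers if not is_blank(n)):
        errors.add("zero")
    chg_hour, assigned_hour, charges = numbers
    if not any(is_blank(n) for n in numbers):
        # charges should equal chg_hour * assigned_hour (small tolerance for rounding)
        if abs(charges - chg_hour * assigned_hour) > 0.005:
            errors.add("calculation")
    if not is_blank(assigned_hour) and assigned_hour > 40:
        errors.add("calculation")
    return errors


def make_row(assignment_no, assignment_date, emp_number, chg_hour, assigned_hour, charges):
    return {
        "assignment_no": assignment_no, "assignment_date": assignment_date,
        "emp_number": emp_number, "chg_hour": chg_hour,
        "assigned_hour": assigned_hour, "charges": charges,
    }


# Hypothetical rows, invented only to exercise each rule once or twice.
rows = [
    make_row("A123", "2020-01-05", "101", 10.0, 20, 200.0),   # clean
    make_row("B234", "2020-01-06", "102", 12.0, 10, 120.0),   # second character is not 1
    make_row("A145", "2020-01-07", "1003", 15.0, 8, 120.0),   # emp number too long
    make_row("A167", "2020-01-08", "104", 0, 10, 0),          # zero values
    make_row("A189", None, "105", 10.0, 5, 50.0),             # missing date
    make_row("A112", "2020-01-09", "106", 10.0, 45, 400.0),   # wrong charges, over 40 hours
]

counts = Counter(err for row in rows for err in classify(row))
for error_type, n in sorted(counts.items()):
    print(error_type, n)
```

The resulting counts per error type are the numbers that would feed the pie chart or line chart.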
Q2 (20 points) Data integrity is a required feature of data warehouses. P & G is building a data warehouse and has run into data integration problems. They need to get data from 2 different users and combine them to maintain data integrity in their data warehouse.
The sources are:
Asia region
North American Region
Both regions have data stored in different formats in two different files (employee_asia and emp_NA).
Both tables are available in account Aggarwal as READ ONLY. You must create a copy in your account before using it.
Or you can create your own tables.
Asia region data is available as
EMPLOYEE_ASIA ( Emp_ID, Emp_Last, Emp_first, gender, country of origin, no of years working)
A fictitious sample data is presented below:
Emp_ID  Emp_Last  Emp_first  gender  country of origin  noofyears
S112    Bora      Lakshmi    female  India              30
S113    Teela     Sony       male    Singapore          5
S115    Patel     Danny      male    Ceylon             20
S118    Raj       Desai      male    Ceylon             10
S121    Singh     Linda      female  United States      15
S411    Sawal     Gary       male    India              40
S124    Ye        Linda      female  China              0.5
S456    Saul      Bee        male    United Kingdom     40
S101    Marriott  Uli        male    Ceylon             25
SQL> desc employee_asia
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMP_ID                                    NOT NULL CHAR(4)
 EMP_LAST                                           CHAR(15)
 EMP_FIRST                                          CHAR(10)
 GENDER                                             CHAR(7)
 COUNTRY                                            CHAR(50)
 NOOFYEARS                                          NUMBER(3,1)
North America data is available as
EMP_NA (Emp_num, Employee_first, emp_last, emp_gender, emp_country, job_title)
A sample is available as (note m for male and f for female)
Emp_num  Emp_first  emp_last  Emp_gender  Emp_country  job_title
PM112    Maria      Santa     f           USA          Manager
PM345    Mary       Bowie     f           USA          Salesman
PM455    Bora       Bora      m           Canada       Salesman
PM233    Lucky      Willy     f           Canada       Manager
PM101    Bobby      Reyas     f           Canada       CEO
PM202    Wheely     Sancez    f           Mexico       Manager
PM221    Li         Chi       m           USA          DBA
PM312    Perry      Well      m           USA          CIO
SQL> desc emp_NA
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 EMP_NUM                                   NOT NULL CHAR(5)
 EMP_FIRST                                          CHAR(10)
 EMP_LAST                                           CHAR(12)
 EMP_GENDER                                         CHAR(6)
 EMP_COUNTRY                                        CHAR(30)
 JOB_TITLE                                          CHAR(12)
P&G wants to develop the following integrated table:
EMPLOYEE_DIM (Employee Id, Employee_name, seniority, gender, country, job_class)
Note: Job_class is classified as:
Job            Job_class
CEO, CIO       TOP
Manager, DBA   MIDDLE
Salesman       OPERATIONS
In addition seniority is defined as:
No of years          Seniority
<1                   temporary
Between 1 and 5      junior
Between 5.1 and 10   senior
10.1 and above       Super senior
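The two classification rules above come down to a lookup and a set of range checks. The sketch below is in Python rather than ORACLE and is meant only to make the boundaries explicit; the function names are hypothetical. It relies on NOOFYEARS being NUMBER(3,1), so a value is either at most 5 or at least 5.1, and leaves no gap between the ranges.

```python
# Job title -> job_class, as listed in the classification table.
JOB_CLASS = {
    "CEO": "TOP",
    "CIO": "TOP",
    "Manager": "MIDDLE",
    "DBA": "MIDDLE",
    "Salesman": "OPERATIONS",
}


def job_class(job_title):
    """Map a job title to its class; strip() drops the blank padding of CHAR columns."""
    return JOB_CLASS.get(job_title.strip())


def seniority(years):
    """Classify number of years worked using the ranges from the seniority table."""
    if years < 1:
        return "temporary"
    if years <= 5:
        return "junior"
    if years <= 10:
        return "senior"
    return "Super senior"


for years in (0.5, 1, 5, 5.1, 10, 10.1):
    print(years, seniority(years))
for title in ("CEO", "DBA", "Salesman"):
    print(title, job_class(title))
```

An unlisted title returns None here; how such rows should be treated is not stated in the exam.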
emp_ID and emp_NUM are the same fields.
SHOW ALL QUERIES AND OUTPUTS
1. CLEAN the data in required format (for gender, country of origin, job_class and seniority)
   a. Employee gender should be standardized, i.e., male should be changed to m and female to f
   b. Country should be spelled completely, i.e., USA should be spelled out as United States of America
   c. Ceylon no longer exists, change the name to Sri Lanka
   d. Name is one attribute in dimension table, combine name as last and first, example Bora (last) and Lakshmi (first) should be modified to Bora, Lakshmi
   e. Calculate both job_class and seniority
2. Create CLEAN_ASIA table
3. Create CLEAN_NA table
4. Combine the two using UNION to create the following table
EMPLOYEE_DIM (Employee Id, Employee_name, seniority, gender, country, job_class)
5. Show the contents and structure of EMPLOYEE_DIM table.
6. Give a count of male and female employees
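One possible shape for steps 1 to 6 is sketched below. It is not the ORACLE solution: it runs the SQL against an in-memory SQLite database from Python, loads only a few of the sample rows, and makes two assumptions that the exam does not state, namely that job_class is left NULL for Asia rows (no job title available) and seniority is left NULL for North America rows (no years available).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Small stand-ins for the two source tables (subset of the sample data).
cur.execute("""CREATE TABLE employee_asia (
    emp_id TEXT, emp_last TEXT, emp_first TEXT,
    gender TEXT, country TEXT, noofyears REAL)""")
cur.executemany("INSERT INTO employee_asia VALUES (?, ?, ?, ?, ?, ?)", [
    ("S112", "Bora", "Lakshmi", "female", "India", 30),
    ("S115", "Patel", "Danny", "male", "Ceylon", 20),
    ("S124", "Ye", "Linda", "female", "China", 0.5),
])
cur.execute("""CREATE TABLE emp_na (
    emp_num TEXT, emp_first TEXT, emp_last TEXT,
    emp_gender TEXT, emp_country TEXT, job_title TEXT)""")
cur.executemany("INSERT INTO emp_na VALUES (?, ?, ?, ?, ?, ?)", [
    ("PM112", "Maria", "Santa", "f", "USA", "Manager"),
    ("PM455", "Bora", "Bora", "m", "Canada", "Salesman"),
    ("PM312", "Perry", "Well", "m", "USA", "CIO"),
])

# Steps 1-2: clean the Asia data (gender codes, Ceylon, combined name, seniority).
cur.execute("""
CREATE TABLE clean_asia AS
SELECT emp_id AS employee_id,
       emp_last || ', ' || emp_first AS employee_name,
       CASE WHEN noofyears < 1 THEN 'temporary'
            WHEN noofyears <= 5 THEN 'junior'
            WHEN noofyears <= 10 THEN 'senior'
            ELSE 'Super senior' END AS seniority,
       CASE gender WHEN 'male' THEN 'm'
                   WHEN 'female' THEN 'f'
                   ELSE gender END AS gender,
       CASE country WHEN 'Ceylon' THEN 'Sri Lanka' ELSE country END AS country,
       NULL AS job_class          -- assumption: no job title in the Asia source
FROM employee_asia""")

# Steps 1 and 3: clean the North America data (country spelled out, job_class).
cur.execute("""
CREATE TABLE clean_na AS
SELECT emp_num AS employee_id,
       emp_last || ', ' || emp_first AS employee_name,
       NULL AS seniority,         -- assumption: no years worked in the NA source
       emp_gender AS gender,
       CASE emp_country WHEN 'USA' THEN 'United States of America'
                        ELSE emp_country END AS country,
       CASE WHEN job_title IN ('CEO', 'CIO') THEN 'TOP'
            WHEN job_title IN ('Manager', 'DBA') THEN 'MIDDLE'
            WHEN job_title = 'Salesman' THEN 'OPERATIONS' END AS job_class
FROM emp_na""")

# Step 4: combine the two cleaned tables with UNION.
cur.execute("""
CREATE TABLE employee_dim AS
SELECT * FROM clean_asia
UNION
SELECT * FROM clean_na""")

# Steps 5-6: contents of the dimension table and the count by gender.
dim_rows = cur.execute(
    "SELECT * FROM employee_dim ORDER BY employee_id").fetchall()
gender_counts = cur.execute(
    "SELECT gender, COUNT(*) FROM employee_dim GROUP BY gender ORDER BY gender"
).fetchall()

for row in dim_rows:
    print(row)
print(gender_counts)
```

When moving this to ORACLE, keep in mind that the source columns are CHAR and therefore blank-padded, so the names would probably need RTRIM before concatenation, and the NULL placeholder columns would likely need an explicit CAST in a CREATE TABLE ... AS SELECT.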
Q3 (10 points) Revise the data warehouse based on new requirements (same as what we did in class)