
Source: muresults.net/itacademic/TYIT6/Nov17/DWSS.pdf

(2½ hours)

Total Marks: 75

N. B.: (1) All questions are compulsory.

(2) Make suitable assumptions wherever necessary and state the assumptions made.

(3) Answers to the same question must be written together.

(4) Numbers to the right indicate marks.

(5) Draw neat labeled diagrams wherever necessary.

(6) Use of Non-programmable calculators is allowed.

1. Attempt any two of the following: 10

a. What are operational databases? Explain the following characteristics of data in a data warehouse.
i) Subject-oriented
ii) Integrated
iii) Time-variant
iv) Non-volatile

Operational databases are often used for on-line transaction processing (OLTP). They deal with day-to-day operations such as banking, purchasing, manufacturing, registration, and accounting. These systems typically put data into the database; each transaction processes information about a single entity. The purpose of their queries is to support business operations.

Features of a data warehouse

• Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. It focuses on the modeling and analysis of data rather than day-to-day business operations.

• Integrated: A data warehouse is constructed by integrating data from multiple heterogeneous data sources.

• Time-variant: A data warehouse is a repository of historical data. It gives a view of the data for a designated time frame.

• Non-volatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms.

b. Explain virtual data warehouse in detail.

This option provides end users with direct access to multiple operational databases through middleware tools. That is, it provides on-the-fly data for decision support purposes. The end users can generate "summarized data" reports for their data analysis.

The advantages of this approach are:

– Easy to build

– Elimination of the time and expense of developing a traditional data warehouse

– Flexibility

– No data redundancy

– Provides end-users with the most current corporate information

The drawbacks of this approach include:

– Repetitive transformation and integration operations

– Impacts to source systems

– Loss of historical perspective


• Virtual data warehouses often provide a starting point for organizations to learn what end users are really looking for.

• The major drawback of this approach is that it can put the largest unplanned query load on operational systems, which will certainly affect OLTP query response time.

c. Explain star schema model in relational implementation of data warehouses.

The relational implementation of a dimensional model is done using a star schema, which represents multi-dimensional data. A star schema consists of a central fact table containing measures and a set of dimension tables. In the star schema model the fact table is at the center of the star and the dimension tables are the points of the star. A star schema represents one central set of facts, while the dimension tables contain descriptions of each of its aspects. For example, in a warehouse that stores sales data, a sales fact table stores facts about sales while dimension tables store data about locations, clients, items, times, and branches.

Examples of sales facts are unit sales, dollar sales, and sale cost. Facts are numeric values that enable users to query and understand business performance metrics by summarizing data. The primary key in each dimension table is related to a foreign key in the fact table.
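As a sketch of this layout, the following illustrative example builds a tiny star schema in an in-memory SQLite database and runs a typical star query. All table and column names here are assumptions for illustration, not taken from any particular warehouse.

```python
import sqlite3

# A minimal star schema: one fact table whose foreign keys point into
# two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    unit_sales INTEGER,
    dollar_sales REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (10, 'Mumbai'), (20, 'Pune');
INSERT INTO fact_sales  VALUES (1, 10, 5, 50.0), (2, 10, 3, 90.0), (1, 20, 2, 20.0);
""")

# A typical star query: join the fact table to a dimension and
# summarize a measure by a dimension attribute.
rows = conn.execute("""
SELECT s.city, SUM(f.dollar_sales)
FROM fact_sales f
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY s.city
ORDER BY s.city
""").fetchall()
print(rows)  # [('Mumbai', 140.0), ('Pune', 20.0)]
```

The join from fact to dimension on the dimension's primary key is the mechanical form of the "points of the star" described above.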

d. What is data aggregation? Briefly explain granularity of facts.

The process of summarizing information for the purpose of statistical analysis is known as data aggregation. Data aggregation helps company data warehouses piece together different kinds of data within the warehouse so that they carry meaning useful as a statistical basis for company reporting and analysis.

Granularity refers to the level of detail of the information stored in the fact table. The data in the fact table needs to be highly granular; to achieve the highest granularity, data should be kept at the most detailed level. Low granularity refers to data that is summarized or aggregated. Monthly summarized data is lightly summarized compared with yearly summarized data; similarly, daily summarized data is lightly summarized compared with monthly summarized data. For example, the levels of granularity for a time dimension can be Year, Quarter, Month, Week, Day, Hour, Minute, and Second, with Year being the highest level, at which fact table granularity is low. The lowest level of detail, the smallest unit of analysis, is known as the 'grain'.


Selecting the appropriate level of granularity is totally dependent on the business requirement. For example, if a Product dimension has only a Category attribute, then you cannot query for information at the Brand or Item level. So for this example, it is required to define the grain at the Item level.
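The one-way nature of rolling up to a coarser grain can be sketched in a few lines; the dates and quantities below are made up for illustration.

```python
from collections import defaultdict

# Illustrative daily-grain facts: (date 'YYYY-MM-DD', units sold).
daily = [("2017-11-01", 5), ("2017-11-02", 3), ("2017-12-01", 7)]

# Rolling up to monthly grain discards the day-level detail: once data
# is stored only at this grain, day-level questions can no longer be
# answered from it.
monthly = defaultdict(int)
for day, units in daily:
    monthly[day[:7]] += units   # truncate 'YYYY-MM-DD' to 'YYYY-MM'

print(dict(monthly))  # {'2017-11': 8, '2017-12': 7}
```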

2. Attempt any two of the following: 10

a. Explain about data objects and data object editor of Oracle Warehouse Builder.

The data in the target schema is in the form of data objects such as tables, views, dimensional objects, and cubes. Oracle Warehouse Builder uses relational and dimensional data objects as follows:

• Relational objects rely on tables and table-derived objects to store and link all of their data. Relational objects include tables, views, materialized views, and sequences.

• Dimensional objects contain additional metadata to identify and categorize your data. Dimensional objects include dimensions and cubes.

The Data Object Editor is the manual editor interface that the Warehouse Builder provides to create, edit, configure, validate, and deploy Oracle data objects. Use data object editors to:

• Create, edit, and delete relational and dimensional objects.

• Define relationships between Oracle data objects.

• Validate, generate, and deploy Oracle data objects.

• Define and edit all aspects of a data object such as its columns, constraints, indexes, partitions, data rules, and attribute sets.

• Define implementation details for dimensional objects with a relational implementation.

b. List various components of OWB. Explain major functions of design center.

Following are the client-side components:
– Design Center
– Repository Browser

Following are the server-side components:
– Control Center Service
– Repository
– Target Schema

The Design Center is the primary graphical user interface for creating a logical design of the data warehouse. The Design Center is used to:
– import source objects
– design ETL processes
– define the integration solution

The Design Center provides a logical design, not the physical implementation. This logical design is stored behind the scenes in a Workspace in the Repository on the server.


The Control Center Manager is a part of the Design Center. It manages communication between the target schema and the Design Center, and is used for managing the creation of the physical implementation by deploying the designs we have created into the Target Schema. Through the Control Center Manager we can execute the design by running the code associated with the ETL that we have designed. The design objects are stored as metadata in a centralized repository known as a workspace; this is where all of the design information is stored for the target systems you are creating. The Repository Browser is another user interface used to browse design metadata. The Target Schema is where OWB deploys objects to, and where the execution of the ETL processes that load our data warehouse takes place. It contains the objects that were designed in the Design Center, as well as the ETL code to load those objects.

c. What is meant by importing metadata in OWB? Explain steps to import metadata from flat files.

Metadata is data that describes the source data. By importing metadata, Warehouse Builder gets to know about source database objects and their location, so that it can build the code necessary to pull the data from them when we design and run mappings to populate the data warehouse. The metadata is represented in Warehouse Builder as objects corresponding to the type of the source object; so if we are representing tables in a database, we will have tables defined in Warehouse Builder.

Steps to import metadata from flat files:

1. Creating a flat file module. Right-click the Files node under your project node and select New Flat File Module to create a flat file module, then name the module, say for example FileMod.

2. Specifying the file location. Specify the flat file location using the Locations Navigator. It indicates where the file is located.

3. Starting the Import Metadata Wizard. Right-click the newly created module FileMod, select New, and follow the prompts in the Import Metadata Wizard. On the summary page of the Import Metadata Wizard, select the file whose metadata is to be imported and then select Sample to launch the Flat File Sample Wizard.

4. Flat File Sample Wizard. Follow the prompts in the Flat File Sample Wizard to specify the metadata structure. After sampling the flat file, return to the summary and select Finish.

d. What is a listener? How is it configured?

Listener

In Oracle, all network connections are made through the listener. The listener is a named process which runs on the Oracle server. The listener process runs constantly in the background on the database server computer, awaiting requests from clients to connect to the Oracle database. It receives connection requests from clients and manages the traffic of these requests to the database server.

Configuring the listener

Run the Net Configuration Assistant to configure a listener.

Step 1: The first screen is a welcome screen. Select the Listener Configuration option and click the Next button.

Step 2: The second screen allows you to add, reconfigure, delete, or rename a listener. Choose Add from the given options to configure a new listener and click Next.

Step 3: The third screen asks you to enter a name for the listener. The default name is "LISTENER". Enter a new name or continue with the default, then click Next to proceed.

Step 4: The fourth screen is the protocol selection screen. By default the TCP protocol is selected; TCP is the standard communication protocol for the internet and most local networks. Select the protocol and click Next.

Step 5: The fifth and final screen asks for the TCP/IP port number on which the listener will run. The default port number is 1521; continue with the default. The assistant then asks whether you want to configure another listener; select No to finish the listener configuration.
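Behind the scenes, the assistant records the listener definition in the listener.ora file on the server. A minimal sketch of what such an entry might look like is shown below; the host name is a placeholder, and the exact file the assistant generates can vary by Oracle version.

```
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
  )
```

The name, protocol, and port in this entry correspond directly to the choices made in steps 3 to 5 above.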

3. Attempt any two of the following: 10

a. Explain about dimensional modeling of data warehouse

A dimensional model represents the business rules in a more understandable way. Users just want to know what the result is, and do not want to worry about how many tables need to be joined in a complex query to get that result. A dimensional model removes the complexity and represents the data in a form that is simple, understandable, and easy to query for the business end user.

Dimensions

A dimension is a structure that organizes data. Examples of commonly used dimensions are Customers, Time, Store, and Products. Dimensions are perspectives with respect to which an organization wants to analyze data. Dimensions are organized into levels and hierarchies; for example, a Time dimension can have month, quarter, and year levels.

Hierarchies break down the dimensions into navigational paths which you can use to get to more granular detail in the data. A hierarchy is composed of certain levels in order; in the Time example, the levels year, quarter, and month form a hierarchy. The data can be viewed at each of these levels, and each level up is simply a summation of all the lower-level data within that period.

Cubes

A data warehouse cube is a multidimensional structure composed of fact tables and dimensions. It enables us to make a multidimensional analysis of the facts. Cubes are fast to access, secure, and user-friendly. Cubes contain measures and link to one or more dimensions.


Although a real-world cube can represent only three dimensions (length, breadth, and height), a warehouse cube can represent any number of dimensions. One can think of additional dimensions as being cubes within a cube. The term hypercube is used to refer to a cube with many dimensions.

b. What is the use of the canvas area in the Data Object Editor? Also explain the explorer and palette windows.

The Canvas is the area of the Data Object Editor in which its contents are displayed graphically. The objects you have created in the Data Object Editor are displayed on the Canvas; these objects include dimensions, cubes, tables, and so on. Each object is displayed in a box with the name of the object as the title and the attributes of the object listed inside. The Canvas has three tabs: one for Relational, one for Dimensional, and one for Business Definition, each displaying objects of the corresponding type. Dimensional objects such as cubes and dimensions are displayed on the Dimensional tab, and relational objects such as tables are displayed on the Relational tab. Business Definitions are for interfacing with the Oracle Discoverer business intelligence tool to analyze data.

Explorer

This window is similar to the Project Explorer in the Design Center. It shows all objects that can be edited with the Data Object Editor. The Available Objects tab of the Explorer shows objects that are available to include on the Canvas, and the Selected Objects tab shows the objects that are currently placed on the Canvas. By clicking and dragging an object from the Explorer you can place an already-created object on the Canvas.

Palette

The Palette contains the various types of objects that can be used in the Data Object Editor. The list of available objects changes as the tab is changed in the Canvas. New objects can be created on the Canvas by clicking and dragging objects from the Palette to the Canvas. This creates a new object, whereas clicking and dragging from the Explorer places an already-created object on the Canvas.

c. Briefly explain various steps to create a dimension using dimension wizard.

Right-click the Dimensions node under the target module, which is under Databases in the Design Center Project Explorer. Choose New and then Using Wizard... to launch the Create Dimension Wizard. The first screen is the welcome screen; it shows a summary of all the steps to create a dimension. Click Next to continue.

Step 1: Provide name and description. This screen asks for a name and description for the new dimension.

Step 2: Set the storage type. Choose the storage type as either ROLAP or MOLAP.

Step 3: Define the dimension attributes.


Define the attributes that you want for the new dimension. This step also allows you to define the surrogate key and business identifier attributes for the dimension.

Step 4: Define the levels within the default hierarchy. Specify the levels in the dimension. The levels must be entered in order from top to bottom, with the highest level listed first, down to the lowest level.

Step 5: Choose the level attributes from the dimension attributes. On the next screen, specify the level attributes.

Step 6: Choose the slowly changing dimension type. Choose one of the following options for slowly changing dimensions:

• Type 1 (Overwrite): Do not keep a history. This means we basically do not care what the old value was and just overwrite it.

• Type 2 (Add a row): Stores the complete change history. A new record (row) is added to the dimension table every time a dimension value changes.

• Type 3 (Add a column): Stores only the previous value when dimension data changes.

Step 7: Review dimension settings. The next screen shows a summary of the actions we performed.

Step 8: Dimension creation progress. This step creates the dimension.

d. What are surrogate keys? Explain the need of surrogate keys with example.

Surrogate keys are artificial keys that have no business meaning and are used as a substitute for source system primary keys. They are generated and maintained within the data warehouse. A surrogate key is a NUMBER-type column and is generated using a sequence. Surrogate keys are used to uniquely identify each record in a dimension; because they are numeric values, indexing on them is faster.

Consider a dimension table in which CustomerID is the business key, and suppose the value of the dimension attribute CustomerCity changes twice. If the dimension is a Type 2 slowly changing dimension, two additional records are inserted into the dimension table to preserve history, all with the same business key. This is where a surrogate key is extremely important: the surrogate key ID is the primary key and has no business meaning, but it uniquely identifies each row in the dimension table.
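The scenario above can be sketched as follows. This is an illustrative example, not OWB-generated code: Oracle would draw the surrogate key from a sequence, whereas here SQLite's auto-assigned INTEGER PRIMARY KEY stands in for it, and the table and column names are assumptions.

```python
import sqlite3

# A Type 2 slowly changing dimension with a surrogate key.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    id          INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
    customer_id INTEGER,               -- business key from the source system
    city        TEXT,
    current_row INTEGER                -- 1 = current version, 0 = history
)""")

def change_city(customer_id, new_city):
    """Type 2 change: close off the current row and insert a new one."""
    conn.execute("UPDATE dim_customer SET current_row = 0 "
                 "WHERE customer_id = ? AND current_row = 1", (customer_id,))
    conn.execute("INSERT INTO dim_customer (customer_id, city, current_row) "
                 "VALUES (?, ?, 1)", (customer_id, new_city))

change_city(1001, "Mumbai")   # initial load
change_city(1001, "Pune")     # the customer moves: history is preserved

rows = conn.execute("SELECT id, customer_id, city, current_row "
                    "FROM dim_customer ORDER BY id").fetchall()
print(rows)  # [(1, 1001, 'Mumbai', 0), (2, 1001, 'Pune', 1)]
```

Both rows share the business key 1001, so only the surrogate key `id` can serve as the primary key.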

4. Attempt any two of the following: 10

a. What is joiner operator? Explain steps to use it in a mapping.

The Joiner operator implements SQL joins on two or more input sets of data and produces a single output row set; that is, it combines data from multiple input sources into one. It has a property called Join Condition through which you can specify the criterion for the join.

Following are the steps to connect source tables to a target using the Joiner operator:

1. Place the source tables by clicking and dragging them from the Explorer window of the Mapping Editor to the mapping area. Arrange the source tables vertically on the left side of the mapping area.

2. Place the target table on the right side of the mapping area in a similar way.

3. Place a Joiner operator between the source tables and the target table by clicking and dragging the Joiner operator from the Palette window to the mapping area. The Joiner operator has two input groups and one output group by default. Each input group corresponds to a separate table, and the output group represents the joined output from the input tables.

4. Suppose you want to combine two tables using the Joiner. Click and drag INOUTGRP1 of the first source table operator onto the INGRP1 group of the JOINER. Similarly, click and drag INOUTGRP1 of the second source table operator onto the INGRP2 group of the JOINER.

5. The next step is to specify the join condition. Select the Joiner operator to show the properties of the selected object in the Joiner Properties window. Through the Joiner Properties window you can launch the Expression Builder to specify the join condition.

6. Finally, link OUTGRP1 of the Joiner to the target table.
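The SQL that a Joiner ultimately corresponds to can be sketched by hand. The example below joins two illustrative source tables (names and data are assumptions) on the kind of condition you would enter in the Join Condition property, run here against an in-memory SQLite database.

```python
import sqlite3

# Two illustrative "source tables" feeding a Joiner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders    (order_id INTEGER, cust_id INTEGER, amount REAL);
CREATE TABLE customers (cust_id INTEGER, name TEXT);
INSERT INTO orders    VALUES (1, 100, 25.0), (2, 200, 40.0);
INSERT INTO customers VALUES (100, 'Asha'), (200, 'Ravi');
""")

# Join condition: orders.cust_id = customers.cust_id -- the criterion
# the Joiner's Expression Builder would capture. The SELECT produces
# the single combined row set that flows to the target.
rows = conn.execute("""
SELECT o.order_id, c.name, o.amount
FROM orders o JOIN customers c ON o.cust_id = c.cust_id
ORDER BY o.order_id
""").fetchall()
print(rows)  # [(1, 'Asha', 25.0), (2, 'Ravi', 40.0)]
```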

b. What is cube operator? Discuss various attributes in cube operator.

A cube is a source-target operator that represents a cube in a mapping. When you load a cube, you map the data flow from the source to the attribute that represents the business identifier of the referencing level.

The Cube operator contains a group with the same name as the cube. This group contains an attribute for each of the cube measures. It also contains the attributes for the surrogate identifier and business identifier of each dimension level that the cube references. Additionally, the Cube operator displays one group for each dimension that the cube references.

For example, a SALES cube contains a group with the same name as the cube. This SALES group contains two attributes, QUANTITY and SALES_AMOUNT, representing measures. The attributes PRODUCT_NAME and PRODUCT_SKU represent business identifiers of the PRODUCT dimension, and the attribute DATE_DIM_DAY_START_DATE represents the business identifier of the TIME dimension. The attributes ACTIVE_DATE, PRODUCT, and STORE are surrogate identifiers for the TIME, PRODUCT, and STORE dimensions respectively.

c. Explain about aggregator and filter operators in mapping.

Aggregator

The Aggregator is used to perform data aggregations on source data. The aggregation is implemented behind the scenes using a SQL GROUP BY clause with an aggregation SQL function applied to the amount(s) we want to aggregate. The Aggregator has two properties:

– Group by clause
– Having clause

A single Aggregator operator can perform multiple aggregations. The Aggregator operator has one input group and one output group. For the output group, define a GROUP BY clause that specifies the attributes over which the aggregation is performed. The GROUP BY clause can be specified using the Expression Builder, which you can launch from the Aggregator Properties window of the Mapping Editor. Finally, map the attributes in the output group of the Aggregator operator to the input group of the target.

Filter

The Filter operator is used to limit the rows in an output set based on criteria that we specify. It is generally implemented as a WHERE clause in SQL to restrict the rows that are returned. We can connect a Filter to a source object, specify the filter criteria, and get only those records that we want in the output. It has a Filter Condition property for specifying the filter criteria.
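The SQL clauses these two operators map onto can be shown in one query. The table and data below are illustrative assumptions; the comments mark which clause corresponds to which operator property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 10), ('North', 30), ('South', 5), ('South', 2);
""")

rows = conn.execute("""
SELECT region, SUM(amount)        -- Aggregator: aggregation function
FROM sales
WHERE amount > 1                  -- Filter: row-level Filter Condition
GROUP BY region                   -- Aggregator: Group By clause
HAVING SUM(amount) > 10           -- Aggregator: Having clause
""").fetchall()
print(rows)  # [('North', 40.0)]
```

Note the difference the answer describes: the Filter's WHERE restricts individual rows before grouping, while the Aggregator's HAVING restricts the aggregated groups (here dropping South, whose total is only 7).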

d. Write short notes on the following:
i) Extract, Transform and Load in data warehousing
ii) Source to target map

ETL stands for extract, transform, and load. The ETL process transforms the data from an application-oriented structure into a corporate data structure. Once the source and target structures are defined, we can move on to the following activities in constructing a data warehouse:

▪ extracting data from the sources
▪ performing any transformations on the data
▪ loading into the target data warehouse structure

The data warehouse architect builds a source-to-target data map before ETL processing starts. The source-to-target map specifies:

▪ what data must be placed in the data warehouse environment
▪ where that data comes from (known as the source, or system of record)
▪ the logic, calculation, or data reformatting that must be applied to the data

The data mapping is the input needed to feed the ETL process. Mappings are visual representations of the flow of data from source to target and the operations that need to be performed on the data.
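The three ETL activities can be sketched in miniature. This is a toy illustration with made-up data: a real warehouse would extract from operational databases and load through generated SQL, not Python lists.

```python
# Extract: pull rows from a source (here, an in-memory stand-in where
# quantities arrive as strings, as they might from a flat file).
source_rows = [("widget", "5"), ("gadget", "3")]

# Transform: apply the reformatting the source-to-target map calls for
# (normalize the name, convert the quantity to a number).
def transform(row):
    name, qty = row
    return (name.upper(), int(qty))

# Load: write the transformed rows into the target structure.
target = [transform(r) for r in source_rows]
print(target)  # [('WIDGET', 5), ('GADGET', 3)]
```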

5. Attempt any two of the following: 10

a. What is expression operator? Explain TO_CHAR() function with proper syntax and example.

The Expression operator represents a SQL expression that can be applied to the output to produce the desired result. Any valid SQL code for an expression can be used, and we can reference input attributes as well as functions. It is possible to write expressions in an Expression operator for which separate predefined operators exist, such as functions; however, we will generally get better performance out of our mappings if we use the prebuilt operators whenever possible rather than implement the code in expressions.

The TO_CHAR function converts a DATETIME, number, or NTEXT expression to a TEXT expression in a specified format. For dates, the syntax of TO_CHAR is:

TO_CHAR(datetime-exp, [datetime-fmt])

where datetime-exp is the expression to be converted to text and datetime-fmt is the format of the resulting text.

Example: TO_CHAR(SALE_DATE, 'Month DD, YYYY') would return the sale date as 'April 07, 2009'.
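TO_CHAR itself runs only inside the database, but the same formatting idea can be reproduced outside it; the sketch below uses Python's strftime as a rough analogue of the format model in the example (note this is an approximation: Oracle's 'Month' element is blank-padded to a fixed width, which strftime's %B is not).

```python
from datetime import date

# Rough equivalent of TO_CHAR(SALE_DATE, 'Month DD, YYYY'):
# %B = full month name, %d = zero-padded day, %Y = four-digit year.
sale_date = date(2009, 4, 7)
formatted = sale_date.strftime('%B %d, %Y')
print(formatted)  # April 07, 2009
```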

b. Write a detailed note on data object validation in OWB.

• The process of validation is all about making sure the objects and mappings we have defined in the Warehouse Builder have no obvious errors in design.

• Oracle Warehouse Builder runs a series of validation tests to ensure that data object definitions are complete and that scripts can be generated and deployed. When these tests are complete, the results are displayed.

• Oracle Warehouse Builder enables you to open object editors and correct any invalid objects before continuing.

• Validating objects and mappings can be done from the Design Center: validation of repository objects can be done with the Data Object Editor, and validation of mappings can be done through the Mapping Editor.

• The validation results in one of three possibilities:
– The validation completes successfully with no warnings or errors.
– The validation completes successfully, but with some non-fatal warnings.
– The validation fails due to one or more errors.

c. What is constant operator? Explain steps to create constants in mapping.

The Constant operator enables you to generate constant values. The outputs of the Constant operator are constants; it only allows output, so there is no input group defined or allowed. Multiple constants can be defined using a single Constant operator.

• Drag a Constant operator onto the mapping canvas.

• Right-click the Constant operator and select Open Details... from the pop-up menu to open the CONSTANT editor dialog box. This dialog box helps in defining constant values as output.

• The Output Attributes tab of the CONSTANT editor allows you to add or remove constants. Let us add two output attributes, X and Y.

• Click on X and enter the value 10 for its Expression property (through the Properties window). Similarly, click on Y and enter the value 20 for its Expression property.

• The final step is to connect the constants X and Y to the target attributes where the constant values are needed.

d. Explain various functions of control center manager.

6. Attempt any two of the following: 10

a. Explain clipboard and recycle bin features of Oracle Warehouse Builder.


Clipboard

The Clipboard is a concept the OWB has borrowed from operating systems. The

clipboard facilitates cut, copy and paste objects. The Clipboard is a temporary storage

area for objects that you have copied or moved from one project and plan to use

somewhere else.

To view the content of the Clipboard:

– select Clipboard from the Tools menu

– or press the F8 key

Recycle Bin

• The Recycle Bin in OWB is similar to recycle bin in operating systems.

• OWB keeps deleted objects in the recycle bin.

• The deleted objects can be restored from the Recycle Bin.

• To undo a deletion select an object from the Recycle Bin and click Restore. .

• If the “Put in Recycle Bin” check box is checked while deleting, then the object will be

sent to the Recycle Bin.

• The Warehouse Builder Recycle Bin window can be opened by clicking on Tools

menu and selecting Recycle Bin option from the pop-up menu

• This window has a content area which shows all deleted objects.

• The content is shown with Object Parent as well as Time Deleted information.

• The Object Parent is the project from which the object was deleted, and the Time

Deleted is when it was deleted.

• Below the content area it has two buttons:

– One for restoring a deleted object

– Another for emptying the content of recycle bin

b. What is Metadata Loader? What are its benefits?

The workspace objects can be exported to a file. We can export anything from an entire

project down to a single data object or mapping. Following are the benefits of Metadata

Loader exports and imports:

– Backup

– To transport metadata definitions to another repository for loading

If we choose an entire project or a collection such as a node or module, it will export all

objects contained within it. If we choose any subset, it will also export the context of the

objects so that it will remember where to put them on import.

Say for example if you export a table, the metadata also contains the definition for:

– the module in which it resides

– the project the module is in

We can also choose to export any dependencies on the object being exported if they exist.

To export an object, select the object to be exported and click Design | Export

| Warehouse Builder Metadata from the main menu. This will launch the Metadata Export

dialog box. This dialog box lists all objects selected from explorer for export. We can

specify the file name of the export file. The Metadata Export dialog box has an option to

export all dependencies of the object being exported.
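The idea that an export carries the context of its objects can be sketched as follows. This is an illustrative Python sketch, not OWB's actual export file format; the module name SALES_MODULE is hypothetical, while MY_PROJECT is the default project mentioned elsewhere in the paper.

```python
import json

# Exporting a single table also records the module and project it lives in,
# so that the import knows where to put the object back.
export = {
    "project": "MY_PROJECT",
    "module": "SALES_MODULE",
    "object": {"type": "TABLE", "name": "CUSTOMER"},
}
serialized = json.dumps(export)       # written to the export file
restored = json.loads(serialized)     # on import, the context is recovered
# restored["project"] == 'MY_PROJECT': the table is re-created in its module
```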

c. Explain data density and data sparsity with example.

d. Explain ROLAP and its merits.

It uses a standard relational database to store the physical data. For this, the data is stored in

a special structure known as a star schema or snowflake schema. A star schema consists

of a central fact table containing measures and a set of dimension tables with the

hierarchy defined by child-parent columns. In star schema model a fact table is at the

center of the star and the dimension tables as points of the star. The dimension tables are

then relationally joined with the fact table to allow multidimensional queries. The data

is retrieved from the relational database into the client tool by SQL queries.

Page 12: All compulsory state the assumptions same question written ...muresults.net/itacademic/TYIT6/Nov17/DWSS.pdf · (2½ hours) Total Marks: 75 N. B.: (1) All questions are compulsory

• Advantages:

– Because it utilizes a relational database, ROLAP can support massive

amounts of data

– Same technology as existing source systems (source systems are RDBMS

based)
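The star schema described above can be sketched with any relational database. Below is a minimal Python/SQLite sketch (table and column names are illustrative, not from any specific warehouse): a central SALES fact table holding the measure, joined to PRODUCT and TIME dimension tables to answer a multidimensional query with plain SQL.

```python
import sqlite3

# Build a tiny star schema: one fact table, two dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER, time_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1,'Books'),(2,'Music');
INSERT INTO dim_time VALUES (1,2017,1),(2,2017,2);
INSERT INTO fact_sales VALUES (1,1,100.0),(1,2,150.0),(2,1,80.0);
""")

# A multidimensional query: total sales by category and quarter, produced
# by relationally joining the dimension tables with the fact table.
rows = con.execute("""
    SELECT p.category, t.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time    t ON t.time_id    = f.time_id
    GROUP BY p.category, t.quarter
    ORDER BY p.category, t.quarter
""").fetchall()
# rows -> [('Books', 1, 100.0), ('Books', 2, 150.0), ('Music', 1, 80.0)]
```

The same SQL-over-relational-tables approach is what lets ROLAP reuse existing RDBMS technology and scale to large data volumes.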

4. Attempt any three of the following: 15

a. Differentiate between OLTP database and Data warehouse database.

OLTP database                                   | Data warehouse database
------------------------------------------------|------------------------------------------------
Application oriented                            | Subject oriented
Detailed data                                   | Summarized and refined data
Designed for real-time business transactions    | Designed for analysis of business
and concurrent processes                        |
Isolated data                                   | Integrated data
Repetitive access                               | Ad-hoc access
Performance sensitive                           | Performance relaxed
Few records accessed at a time                  | Large volumes accessed at a time
Optimized for a common and known set of         | Optimized for bulk loads and complex,
transactions, usually intensive in nature:      | unpredictable queries that access many
addition, updation and deletion of rows         | rows per table
one at a time per table                         |
Database size: 100 MB to 100 GB                 | Database size: 100 GB to a few terabytes
Very minimal historical data                    | Current as well as historical data

b.

What is Oracle Warehouse Builder? Explain the significance of projects and

modules in OWB?

Oracle Warehouse Builder is an ETL tool produced by Oracle that offers a graphical

environment to build, manage and maintain data integration processes in business

intelligence systems.

The OWB design objects are organized under a project, which provides a means for

structuring the objects for security and reusability. Each project contains nodes for each

type of design object that you can create or import. These projects are stored in a

workspace. So prior to extracting data one has to create a project. A default project called

MY_PROJECT is automatically created when you create a workspace. Alternatively,

you can rename MY_PROJECT or define more projects. A project will contain one or

more modules. Therefore before you import source metadata into Warehouse Builder,

create a module that will contain these metadata definitions. The type of module you

create depends on the source from which you are importing metadata. The Oracle

Warehouse Builder supports Oracle, Non-Oracle or Flat File modules to import metadata

definitions from Oracle database, Non-Oracle database and flat files respectively.


c. What is time dimension? Explain various steps to create it through time dimension

wizard.

The Time/Date dimension provides the time series information to describe warehouse

data. Most of the data warehouses include a time dimension. Also the information it

contains is very similar from warehouse to warehouse. It has levels such as days, weeks,

months, etc. The Time dimension enables the warehouse users to retrieve data by time

period.

Creation of Time dimension

• Launch Design Center

• Expand the Databases node under your project node and then right-click on the

Dimensions node; a pop-up menu appears. Select New | Using Time Wizard... from the

pop up menu to launch the Time Dimension Wizard.

• The first screen is a welcome screen which shows various steps involved in creation

of a time dimension.

• Step 1: The first screen asks for a name and description for the time dimension.

• Step 2: This step sets the storage type for this dimension, either as ROLAP or as

MOLAP.

• Step 3: Define the range of data stored in the time dimension. It asks us what year we

want to start with, and then how many total years to include starting with that year.

• Step 4: This step is to choose the hierarchy and the levels in the hierarchy. Following are

the hierarchies and the levels in the hierarchy:

1. Normal Hierarchy

• Calendar Year

• Calendar Quarter

• Calendar month

• Day

2. Week Hierarchy

• Calendar Week

• Day

• Step 5: This screen shows a summary of the Time Dimension before creation of the

Sequence and Map.

• Step 6: Progress status.
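The rows such a time dimension holds can be sketched in a few lines of Python. This is an illustrative sketch of the Normal Hierarchy (Calendar Year → Quarter → Month → Day), mirroring the wizard's start-year and number-of-years inputs; the function and column names are assumptions, not OWB's generated structure.

```python
from datetime import date, timedelta

def build_time_dimension(start_year, number_of_years):
    """Generate one row per day covering the requested range of years,
    with the Normal Hierarchy levels as columns."""
    rows = []
    day = date(start_year, 1, 1)
    end = date(start_year + number_of_years, 1, 1)
    while day < end:
        rows.append({
            "calendar_year": day.year,
            "calendar_quarter": (day.month - 1) // 3 + 1,
            "calendar_month": day.month,
            "day": day,
        })
        day += timedelta(days=1)
    return rows

dim = build_time_dimension(2017, 1)
# 365 rows; the first row is 2017-01-01 in year 2017, quarter 1, month 1
```

This is the time-series backbone that lets warehouse users retrieve data by time period.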

d. Write a detailed note on staging and its benefits.

Staging is the process of temporarily copying the source data into tables in the target

database. The purpose is to perform any cleaning and transformations before loading the

source data into the final target tables. The data staging area is a temporary area for

storage and processing. This is the place where all the extracted data is put together and

prepared for loading into the data warehouse. A staging area is like a large table with

data pulled from various sources to be loaded into a data warehouse in the required

format. In the absence of a staging area, data must be loaded directly from OLTP system

to the OLAP system. A staging area can be created

– within the database (using database tables)

– outside the database (using flat files)

Advantages of staging

• Source database connection can be freed immediately after copying the data to the

staging area. The formatting and restructuring of the data happens later with data in

the staging area.

• If the ETL process needs to be restarted, there is no need to go back and disturb the

source system to retrieve the data.


e. Describe on SUBSTR transformation function.

The SUBSTR function is used to get a substring of a given string value. After dropping

the Transformation Operator on the mapping, a dialog box will pop up where we

can select SUBSTR function. This operator needs three parameters, STRING,

POSITION and SUBSTR_LENGTH. The STRING parameter represents the string field

from which substring is to be extracted. The other two parameters must be constant

integer values. The POSITION is a number indicating the start position of the substring

within the source string. The SUBSTR_LENGTH specifies the length of the substring to

extract.

Map the input field to the STRING input attribute of the SUBSTR operator. Using

Constant operator, you can supply integer constant values to POSITION and

SUBSTR_LENGTH. The following figure shows what a SUBSTR Transformation Operator

looks like when it is placed on the mapping.
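The semantics of the three parameters can be sketched in Python. This is a minimal sketch of what the SUBSTR transformation computes, not OWB's implementation: Oracle's SUBSTR uses a 1-based start POSITION, unlike Python's 0-based slicing, and only positive positions are handled here.

```python
def oracle_substr(string, position, substr_length):
    """Extract SUBSTR_LENGTH characters from STRING, starting at the
    1-based POSITION (positive positions only in this sketch)."""
    start = position - 1                      # convert to 0-based index
    return string[start:start + substr_length]

# e.g. extracting a 3-character prefix, or 5 characters from position 5:
oracle_substr("2125551234", 1, 3)   # -> '212'
oracle_substr("WAREHOUSE", 5, 5)    # -> 'HOUSE'
```

In the mapping, the two integer arguments would come from a Constant operator, exactly as the answer describes.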

f. What are snapshots? Explain full snapshot and signature snapshot.

Snapshot

• A snapshot is a point in time version of an object.

• The snapshot of an object captures all the metadata information about that object at the time when

the snapshot is taken.

• It enables you to compare the current object with a previously taken snapshot.

• Since objects can be restored from snapshots, snapshots can be used as a backup mechanism.

Full Snapshots :

• Full snapshots provide complete metadata of an object that you can use to restore it later.

• So it is suitable for making backups of objects.

• Full snapshots take longer to create and require more storage space than signature snapshots.

Signature Snapshots:

• It captures only the signature of an object.

• A signature contains enough information about the selected metadata component to detect changes

when compared with another snapshot or the current object definition.

• Signature snapshots are small and can be created quickly.
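The difference between the two snapshot kinds can be sketched in code. This is an illustrative Python sketch of the idea, not OWB's actual snapshot storage: a full snapshot keeps the complete metadata and can restore the object, while a signature snapshot keeps only a digest, which is enough to detect a change but not to restore.

```python
import copy
import hashlib
import json

def full_snapshot(metadata):
    """Complete copy of the metadata: enough to restore the object later."""
    return copy.deepcopy(metadata)

def signature_snapshot(metadata):
    """Digest of the metadata: small and quick, detects changes only."""
    canonical = json.dumps(metadata, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

table = {"name": "SALES", "columns": ["ID", "AMOUNT"]}   # hypothetical object
full = full_snapshot(table)
sig = signature_snapshot(table)

table["columns"].append("REGION")            # the object changes...
changed = signature_snapshot(table) != sig   # ...and the signature detects it
# changed is True, and full still holds the old definition ["ID", "AMOUNT"],
# so only the full snapshot could be used to restore the object.
```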