(2½ hours)
Total Marks: 75
N. B.: (1) All questions are compulsory.
(2) Make suitable assumptions wherever necessary and state the assumptions made.
(3) Answers to the same question must be written together.
(4) Numbers to the right indicate marks.
(5) Draw neat labeled diagrams wherever necessary.
(6) Use of Non-programmable calculators is allowed.
1. Attempt any two of the following: 10
a. What are operational databases? Explain the following characteristics of data in a data
warehouse.
i) Subject-oriented
ii) Integrated
iii) Time-variant
iv) Non-volatile
Operational databases are used for on-line transaction processing (OLTP). They deal
with day-to-day operations such as banking, purchasing, manufacturing, registration,
accounting, etc. These systems typically get data into the database. Each transaction
processes information about a single entity, and the purpose of these queries is to
support business operations.
Features of Data warehouse
• Subject-oriented:
A data warehouse is organized around major subjects, such as customer, vendor,
product, and sales. It focuses on the modeling and analysis of data rather than
day-to-day business operations.
• Integrated: A data warehouse is constructed by integrating data from multiple
heterogeneous data sources.
• Time variant: A data warehouse is a repository of historical data. It gives the view of
the data for a designated time frame.
• Non-volatile: A data warehouse is always a physically separate store of data
transformed from the application data found in the operational environment. Due to
this separation, a data warehouse does not require transaction processing, recovery,
and concurrency control mechanisms.
b. Explain virtual data warehouse in detail.
This option provides end users with direct access to multiple operational databases
through middleware tools. That is, it provides on-the-fly data for decision support
purposes. The end users can generate "summarized data" reports for their data analysis.
The advantages of this approach are:
– Easy to build
– Elimination of the time and expense of developing a traditional data warehouse
– Flexibility
– No data redundancy
– Provides end-users with the most current corporate information
The drawbacks of this approach include:
– Repetitive transformation and integration operations
– Impacts to source systems
– Loss of historical perspective
• Virtual data warehouses often provide a starting point for organizations to learn what
end users are really looking for.
• The major drawback of this approach is that it can put the largest unplanned query
load on operational systems, which will certainly affect OLTP query response time.
c. Explain star schema model in relational implementation of data warehouses.
The relational implementation of the dimensional model is done using a star schema. It
represents multi-dimensional data. A star schema consists of a central fact table
containing measures and a set of dimension tables. In the star schema model, the fact
table is at the center of the star with the dimension tables as the points of the star. A
star schema represents one central set of facts, and the dimension tables contain
descriptions of each of its aspects. For example, in a warehouse that stores sales data,
a sales fact table stores facts about sales while dimension tables store data about
locations, clients, items, times, and branches.
Examples of sales facts are unit sales, dollar sales, sale cost, etc. Facts are numeric
values which enable users to query and understand business performance metrics by
summarizing data. The primary key in each dimension table is related to a foreign key in
the fact table.
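The star schema described above can be sketched with SQLite; the table and column names here are illustrative, not taken from any particular warehouse:

```python
import sqlite3

# A minimal star schema: one fact table with foreign keys into two
# dimension tables (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id   INTEGER REFERENCES dim_product(product_id),
    store_id     INTEGER REFERENCES dim_store(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (10, 'Mumbai'), (20, 'Pune');
INSERT INTO fact_sales  VALUES (1, 10, 5, 50.0), (2, 10, 3, 90.0), (1, 20, 2, 20.0);
""")

# A typical dimensional query: join the fact table to a dimension
# through the foreign key and summarize the measures.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.dollar_sales)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 90.0), ('Widget', 70.0)]
```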
d. What is data aggregation? Briefly explain granularity of facts.
The process of summarizing information for the purpose of statistical analysis is known
as data aggregation. Data aggregation pieces together different kinds of data within the
data warehouse so that they gain a meaning that is useful as a statistical basis for
company reporting and analysis.
Granularity refers to the level of detail stored in the fact table. The data in the fact
table needs to be highly granular: to achieve the highest granularity, data should be
kept at the most detailed level. Low granularity refers to data that is summarized or
aggregated. Monthly summarized data is lightly summarized when compared with yearly
summarized data; similarly, daily summarized data is lightly summarized when compared
with monthly summarized data. For example, the levels of granularity for a time
dimension can be Year, Quarter, Month, Week, Day, Hour, Minute, and Second, with the
year being the highest level, at which fact table granularity is lowest. The lowest
level of detail, the smallest unit of analysis, is known as the 'grain'.
Selecting the appropriate level of granularity depends entirely on the business
requirement. For example, if a Product dimension has only a Category attribute, then you
cannot query for information at the Brand or Item level. For such queries, it is
necessary to define the grain at the Item level.
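The effect of aggregating away the finest grain can be sketched in a few lines; the daily figures are illustrative:

```python
from collections import defaultdict

# Illustrative daily sales facts (date, amount): the finest grain.
daily_sales = [
    ("2009-04-06", 100.0),
    ("2009-04-07", 150.0),
    ("2009-05-01", 200.0),
]

# Aggregating to the month level lowers the granularity: per-day detail
# is lost, but the summarized rows are smaller and faster to query.
monthly = defaultdict(float)
for day, amount in daily_sales:
    monthly[day[:7]] += amount   # "YYYY-MM" is the month grain

print(dict(monthly))  # {'2009-04': 250.0, '2009-05': 200.0}
```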
2. Attempt any two of the following: 10
a. Explain about data objects and data object editor of Oracle Warehouse Builder.
The data in the target schema is in the form of data objects such as tables, views,
dimensional objects, and cubes.
Oracle Warehouse Builder uses relational and dimensional data objects and intelligence
objects as follows:
• Relational objects rely on tables and table-derived objects to store and link all
of their data. Relational objects include tables, views, materialized views, and
sequences.
• Dimensional objects contain additional metadata to identify and categorize your
data. Dimensional objects include dimensions and cubes.
The Data Object Editor is the manual editor interface that the Warehouse Builder
provides to create, edit, configure, validate, and deploy Oracle data objects.
Use data object editors to:
• Create, edit, and delete relational and dimensional objects.
• Define relationships between Oracle data objects.
• Validate, generate, and deploy Oracle data objects.
• Define and edit all aspects of a data object such as its columns, constraints,
indexes, partitions, data rules, and attribute sets.
• Define implementation details for dimensional objects with a relational
implementation.
b. List various components of OWB. Explain major functions of design center.
• Following are the client side components:
– Design Center
– Repository Browser.
• Following are the server side components:
– Control Center Service
– Repository
– Target Schema.
The Design Center is the primary graphical user interface for designing a logical design
of the data warehouse.
Design Center is used to:
– import source objects
– design ETL processes
– define the integration solution.
Design Center provides a logical design, not the physical implementation. This logical
design will be stored behind the scenes in a Workspace in the Repository on the server.
The Control Center Manager is a part of the design center. It manages communication
between target schema and design center. The Control Center Manager is used for
managing the creation of that physical implementation by deploying the designs we've
created into the Target Schema. Through the Control Center Manager we can execute
the design by running the code associated with the ETL that we've designed. The design
objects are stored as metadata in a centralized repository known as workspace. This is
where all of the design information is stored for the target systems you are creating. The
Repository Browser is another user interface used to browse design metadata. The Target
Schema is where OWB will deploy the object to, and where the execution of ETL
processes that load our data warehouse will take place. It contains the objects that were
designed in the Design Center, as well as the ETL code to load those objects.
c. What is meant by importing metadata in OWB? Explain steps to import metadata
from flat files.
Metadata is the data that describes the source data. By importing metadata Warehouse
Builder will get to know about source database objects and their location, so that it can
build the code necessary to pull the data from them when we design and run mappings
to populate data warehouse. The metadata is represented in the Warehouse Builder as
objects corresponding to the type of the source object. So if we’re representing tables in
a database, we will have tables defined in the Warehouse Builder.
Steps to import metadata from flat files
1. Creating a Flat File module
Right-click the Files node under your project node and select New Flat File
Module to create a flat file module. Name the module, for example FileMod.
2. Specifying the file location
Specify the flat file location using the Locations Navigator. It indicates where the
file is located.
3. Starting the Import Metadata Wizard
Right-click the newly created module FileMod, select New, and follow the
prompts in the Import Metadata Wizard. On the summary page of the Import
Metadata Wizard, select the file whose metadata is to be imported and then
select Sample to launch the Flat File Sample Wizard.
4. Flat File Sample Wizard
Follow the prompts in the Flat File Sample Wizard to specify the metadata
structure. After sampling the flat file, return to the Summary page and select Finish.
d. What is a listener? How is it configured?
Listener
In Oracle, all network connections are made through the listener. The listener is a
named process which runs on the Oracle server. The listener process runs constantly in
the background on the database server computer, waiting for requests from clients to
connect to the Oracle database. It receives connection requests from clients and
manages the traffic of these requests to the database server.
Configuring Listener
Run Net Configuration Assistant to configure a listener.
Step 1
• The first screen is a welcome screen. Select Listener Configuration option from it and
then click next button.
Step 2
• The second screen allows you to add, reconfigure, delete or rename a listener.
• Choose Add from the given option to configure a new listener and click next.
Step 3
• The third screen asks you to enter a name for the listener.
• The default name is “LISTENER”.
• Enter a new name or continue with the default and then click next button to proceed.
Step 4
• The fourth screen is the protocol selection screen.
• By default the TCP protocol is selected in this screen.
• TCP is the standard communication protocol for internet and most local networks
• Select the protocol and click next.
Step 5
• The fifth and final screen asks for the TCP/IP port number on which the listener will run.
• The default port number is 1521; continue with the default.
The wizard then asks whether we want to configure another listener. Select No to finish
the listener configuration.
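The choices made in the wizard are written to the listener.ora configuration file on the server; a minimal sketch of the resulting entry (the host name here is illustrative) might look like:

```
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver.example.com)(PORT = 1521))
    )
  )
```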
3. Attempt any two of the following: 10
a. Explain about dimensional modeling of data warehouse
A dimensional model represents the business rules in a more understandable way. Users
just want to know what the result is, and don't want to worry about how many tables need
to be joined in a complex query to get that result. A dimensional model removes the
complexity and represents the data in a form that is simple, understandable, and easy to
query for the business end user.
Dimensions
A dimension is a structure that organizes data. Examples of commonly used dimensions
are Customers, Time, Store and Products. Dimensions are perspectives with respect to
which an organization wants to analyze data. Dimensions are organized into levels and
hierarchies; for example, a Time dimension may have month, quarter and year levels.
Hierarchies break the dimensions down into navigational paths which you can use to get
more granular detail in the data. A hierarchy is composed of certain levels in order. In
the Time example, the levels year, quarter and month form a hierarchy. The data can be
viewed at each of these levels, and the next level up is simply a summation of all the
lower-level data within that period.
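The roll-up behaviour of such a hierarchy can be sketched in a few lines; the monthly figures are illustrative:

```python
# A sketch of a Time hierarchy (Month -> Quarter): each level up is a
# summation of the lower-level data within that period.
monthly_sales = {"2009-01": 10, "2009-02": 20, "2009-03": 30, "2009-04": 40}

def quarter_of(month_key):
    """Map a 'YYYY-MM' key to its parent level 'YYYY-Qn'."""
    year, month = month_key.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"

quarterly = {}
for month, amount in monthly_sales.items():
    q = quarter_of(month)
    quarterly[q] = quarterly.get(q, 0) + amount

print(quarterly)  # {'2009-Q1': 60, '2009-Q2': 40}
```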
Cubes
A data warehouse cube is a multidimensional structure composed of fact tables
and dimensions. It enables us to make a multidimensional analysis of the facts. Cubes
are fast to access, secure and user-friendly. They contain measures and link to one or
more dimensions.
Although a real-world cube can represent only three dimensions (length, breadth and
height), a warehouse cube can represent any number of dimensions. One can think of
additional dimensions as being cubes within a cube. The term hypercube is used to refer
to a cube with many dimensions.
b. What is the use of the canvas area in the Data Object Editor? Also explain the
explorer and palette windows.
The Canvas is the area of the Data Object Editor in which the contents are displayed
graphically. The objects you have created in the Data Object Editor are displayed on the
Canvas. These objects include dimensions, cubes, tables and so on. Each object is
displayed in a box with the name of the object as the title of the box and the attributes
of the object listed inside the box. The Canvas has three tabs: one for Relational, one
for Dimensional, and one for Business Definition, each displaying objects of the
corresponding type. Dimensional objects such as cubes and dimensions are displayed on
the Dimensional tab, and relational objects such as tables are displayed on the
Relational tab. Business Definitions are for interfacing with the Oracle Discoverer
business intelligence tool to analyze data.
Explorer
This window is similar to the Project Explorer in the Design Center. It shows all
objects that can be edited with the Data Object Editor. The Available Objects tab of the
Explorer shows objects that are available to include on the Canvas window, and the
Selected Objects tab shows the objects that are currently placed on the Canvas. By
clicking and dragging an object from the Explorer, you can place an already created
object on the Canvas.
Palette
The Palette contains the various types of objects that can be used in the Data Object
Editor. The list of objects available changes as the tab is changed in the Canvas. New
objects can be created on the Canvas by clicking and dragging objects from the Palette
to the Canvas. This creates a new object, whereas clicking and dragging from the
Explorer places an already created object on the Canvas.
c. Briefly explain various steps to create a dimension using dimension wizard.
Right-click on the Dimensions node under our target module, which is under Databases
in the Design Center Project Explorer. Choose New and then Using Wizard... to
launch the Create Dimension Wizard.
The first screen is the welcome screen. It shows a summary of all steps to create a
dimension. Click Next to continue.
Step 1: Provide name and description
This screen asks for a name and description for the new dimension.
Step 2: Set the storage type
Choose the storage type as either ROLAP or MOLAP.
Step 3: Define the dimension attributes
Define the attributes that you want to define for the new dimension. It also allows you
to designate the surrogate key and business identifier attributes for the dimension.
Step 4: Define the levels within the default hierarchy
The next step is where we specify the levels in our dimension. The levels must be
entered in order from top to bottom, with the highest level listed first, down to the
lowest level.
Step 5: Choose the level attributes from the dimension attributes
Moving on to the next screen, we get to specify the level attributes.
Step 6: Choose the slowly changing dimension type
Choose one of the following options for slowly changing dimensions.
• Type 1 (Overwrite) – Do not keep a history. This means we basically do not care
what the old value was and just overwrite it.
• Type 2 (Add a row) – It stores the complete change history. A new record (row) is
added to the dimension table every time a dimension value changes.
• Type 3 (Add a column) – It stores only the previous value when a dimension value
changes.
Step 7: Review dimension settings
The next screen shows a summary of the actions we performed.
Step 8: Dimension creation progress
This step creates the dimension.
d. What are surrogate keys? Explain the need of surrogate keys with example.
Surrogate keys are artificial keys that have no business meaning and are used as a
substitute for source system primary keys. They are generated and maintained within the
data warehouse. A surrogate key is a NUMBER-type column and is generated using a
sequence. Surrogate keys are used to uniquely identify each record in a dimension.
Because surrogate keys are numeric values, indexing on them is faster.
Consider a dimension table in which CustomerID is the business key and the value of the
dimension attribute CustomerCity changes twice. If the dimension is a Type 2 slowly
changing dimension, two additional records are inserted into the dimension table to
preserve history, all with the same business key. This is where a surrogate key is
extremely important: the surrogate key ID is the primary key and has no business
meaning, but it uniquely identifies each row in the dimension table.
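The Type 2 scenario above can be sketched in Python; the column names and the sequence stand-in are illustrative:

```python
# A sketch of a Type 2 slowly changing dimension: the surrogate key ID
# (generated from a sequence) is the primary key, while CustomerID is
# the business key repeated across history rows.
dimension = []    # stands in for the customer dimension table
next_id = [1]     # stands in for an Oracle sequence

def change_city(customer_id, new_city):
    """Type 2 (add a row): close out the current row and append a new one."""
    for row in dimension:
        if row["CustomerID"] == customer_id and row["Current"]:
            row["Current"] = False
    dimension.append({"ID": next_id[0], "CustomerID": customer_id,
                      "CustomerCity": new_city, "Current": True})
    next_id[0] += 1

change_city(101, "Mumbai")   # initial load
change_city(101, "Pune")     # city changes: history is preserved
change_city(101, "Nagpur")   # changes again: three rows, one business key

print([(r["ID"], r["CustomerCity"], r["Current"]) for r in dimension])
# [(1, 'Mumbai', False), (2, 'Pune', False), (3, 'Nagpur', True)]
```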
4. Attempt any two of the following: 10
a. What is joiner operator? Explain steps to use it in a mapping.
The Joiner operator implements SQL joins on two or more input sets of data and produces
a single output row set. That is, it combines data from multiple input sources into one.
It has a property called Join Condition through which you can specify the criterion for
the join.
Following are the steps to connect source tables to target using joiner operator
1. Place source tables by clicking and dragging them from Explorer Window of the
Mapping Editor to Mapping Area of Mapping Editor. Arrange the source tables
vertically on the left side of the mapping area.
2. Now place the target table on the right side of mapping area in a similar way.
3. Place a Joiner operator in between the source tables and target table on the
mapping area. It can be done by clicking and dragging Joiner operator from the
Palette window to the Mapping area. The Joiner operator has two input groups
and one output group by default. Each input group corresponds to a separate table,
and the output group represents the joined output from the input tables.
4. Let us suppose that you want to combine two tables using Joiner. Click and drag
INOUTGRP1 of the first source table operator onto the INGRP1 group of the
JOINER. Similarly click and drag INOUTGRP1 of the second source table
operator onto the INGRP2 group of the JOINER.
5. The next step in the process is to specify the join condition. Select the Joiner
operator to show the properties of the selected object in the Joiner Properties
window. Through the Joiner Properties window you can launch the Expression
Builder to specify the join condition.
6. Now link the OUTGRP1 of joiner to the target table.
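What the Joiner produces amounts to a SQL join over the two input groups, using the join condition set in Expression Builder. A hedged sketch with SQLite and illustrative tables:

```python
import sqlite3

# Two input sources and one target (names are illustrative, not
# OWB-generated code).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE target    (order_id INTEGER, name TEXT, amount REAL);
INSERT INTO orders    VALUES (1, 100, 25.0), (2, 200, 40.0);
INSERT INTO customers VALUES (100, 'Asha'), (200, 'Ravi');
""")

# Join condition: orders.customer_id = customers.customer_id
conn.execute("""
    INSERT INTO target (order_id, name, amount)
    SELECT o.order_id, c.name, o.amount
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""")
rows = conn.execute("SELECT * FROM target ORDER BY order_id").fetchall()
print(rows)  # [(1, 'Asha', 25.0), (2, 'Ravi', 40.0)]
```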
b. What is cube operator? Discuss various attributes in cube operator.
A cube is a source-target operator that represents a cube in a mapping. When you load a
cube, you map the data flow from the source to the attribute that represents the business
identifier of the referencing level.
The Cube operator contains a group with the same name as the cube. This group
contains an attribute for each of the cube measures. It also contains the attributes for the
surrogate identifier and business identifier of each dimension level that the cube
references. Additionally, the Cube operator displays one group for each dimension that
the cube references.
For example, a SALES cube contains a group with the same name as the cube. This SALES
group contains two attributes, QUANTITY and SALES_AMOUNT, representing measures. The
attributes PRODUCT_NAME and PRODUCT_SKU represent business identifiers of the PRODUCT
dimension, and the attribute DATE_DIM_DAY_START_DATE represents the business identifier
of the TIME dimension. The attributes ACTIVE_DATE, PRODUCT and STORE are surrogate
identifiers for the TIME, PRODUCT and STORE dimensions respectively.
c. Explain about aggregator and filter operators in mapping.
Aggregator
The Aggregator is used to perform data aggregations on source data. The aggregation is
implemented behind the scenes using a SQL GROUP BY clause, with an aggregation SQL
function applied to the amount(s) we want to aggregate. The Aggregator has the following
two properties:
– Group By clause
– Having clause
A single Aggregator operator can perform multiple aggregations. The Aggregator
operator has one input group and one output group. For the output group, define a
GROUP BY clause that specifies the attributes over which the aggregation is performed.
The GROUP BY clause can be specified using the Expression Builder. Through the
Aggregator Properties window of Mapping Editor, you can launch Expression Builder
to specify the Group By clauses. Finally map the attributes in the output group of the
Aggregator operator to the input group of the target.
Filter
The Filter operator is used to limit the rows from an output set based on the criteria that
we specify. It is generally implemented in a where clause in SQL to restrict the rows that
are returned. We can connect a Filter to a source object, specify the filter criteria, and get
only those records that we want in the output. It has a Filter Condition property to specify
the filter criteria.
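The WHERE / GROUP BY / HAVING clauses that these two operators generate behind the scenes can be sketched with SQLite (illustrative data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('East', 100), ('East', 200), ('West', 50), ('West', 30);
""")

# Filter -> WHERE keeps only rows over 40; Aggregator -> GROUP BY sums
# the surviving rows per region, and HAVING restricts the groups.
rows = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales
    WHERE amount > 40
    GROUP BY region
    HAVING SUM(amount) > 60
    ORDER BY region
""").fetchall()
print(rows)  # [('East', 300.0)]
```

Note how the Filter removes rows before aggregation (WHERE), while HAVING removes whole groups after aggregation.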
d. Write short notes on the following:
i) Extract, Transform and Load in data warehousing
ii) Source-to-target map
ETL stands for extract, transform and load. The ETL process transforms data from an
application-oriented structure into a corporate data structure. Once the source and
target structures are defined, we can move on to the following activities in
constructing a data warehouse.
▪ Work on extracting data from sources
▪ Perform any transformations on the data
▪ Load into target data warehouse structure
The data warehouse architect builds a source-to-target data map before ETL processing
starts. The source-to-target map specifies:
▪ what data must be placed in the data warehouse environment
▪ where that data comes from (known as the source or system of record)
▪ the logic, calculation, or data reformatting that must be applied to the data
The data mapping is the input needed to feed the ETL process. Mappings are visual
representations of the flow of data from source to target and the operations that need to
be performed on the data.
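The extract-transform-load flow over such a map can be sketched minimally; all names here are illustrative, not OWB-generated code:

```python
# Extract: rows as they arrive from an application-oriented source,
# everything as raw strings.
source = [("101", "asha", "2500"), ("102", "ravi", "3100")]

def transform(row):
    """The reformatting the source-to-target map calls for:
    type conversion and name normalization."""
    emp_id, name, salary = row
    return (int(emp_id), name.title(), float(salary))

# Load: write the transformed rows into the warehouse structure.
target = []                       # stands in for the target table
for row in source:
    target.append(transform(row))

print(target)  # [(101, 'Asha', 2500.0), (102, 'Ravi', 3100.0)]
```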
5. Attempt any two of the following: 10
a. What is expression operator? Explain TO_CHAR() functions with proper syntax
and example.
The Expression operator represents a SQL expression that can be applied to the output to
produce the desired result. Any valid SQL code for an expression can be used, and we can
reference input attributes as well as functions. It is possible to write expressions in
an Expression operator for which separate predefined operators exist; however, we will
generally get better performance out of our mappings if we use the prebuilt operators
whenever possible rather than implementing the code in expressions.
The TO_CHAR function converts a datetime, number, or text expression to a character
string in a specified format. The syntax of TO_CHAR for dates is as follows:
TO_CHAR(datetime-exp, [datetime-fmt])
Where datetime-exp is the expression which is to be converted to text and datetime-fmt
is the format of resultant text.
Example:
TO_CHAR(SALE_DATE, 'Month DD, YYYY') would return the sale date as 'April 07, 2009'.
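TO_CHAR itself is Oracle SQL, but the same formatting idea can be sketched in Python, where strftime plays the role of the format mask ('Month DD, YYYY' roughly corresponds to '%B %d, %Y'):

```python
from datetime import date

# The format mask decides how the date is rendered as text,
# just as the second argument to TO_CHAR does.
sale_date = date(2009, 4, 7)
formatted = sale_date.strftime("%B %d, %Y")
print(formatted)  # April 07, 2009
```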
b. Write a detailed note on data object validation in OWB.
• The process of validation is all about making sure the objects and mappings we've
defined in the Warehouse Builder have no obvious errors in design.
• Oracle Warehouse Builder runs a series of validation tests to ensure that data object
definitions are complete and that scripts can be generated and deployed.
• When these tests are complete, the results are displayed.
• Oracle Warehouse Builder enables you to open object editors and correct any invalid
objects before continuing.
• Validating objects and mappings can be done from the Design Center.
• Validation of repository objects can also be done from the Data Object Editor.
• Validation of mappings can be done through the Mapping Editor.
• The validation will result in one of the following three possibilities:
– The validation completes successfully with no warnings and/or errors.
– The validation completes successfully, but with some non-fatal warnings.
– The validation fails due to one or more errors.
c. What is constant operator? Explain steps to create constants in mapping.
The Constant operator enables you to generate constant values. The outputs of constant
operator are constants.
• Drag a Constant operator onto the mapping canvas.
• The Constant operator only allows output, so there is no input group defined or
allowed.
• Multiple constants can be defined using a single Constant operator.
• Right-click the Constant operator and select Open Details... from the pop-up menu
to open the CONSTANT editor dialog box.
• This dialog box helps in defining constant values as output.
• The Output Attributes tab of the CONSTANT editor allows you to add or remove
constants.
• For example, add two output attributes X and Y.
• Click on X and enter the value 10 for the Expression property (through the
Properties window).
• Similarly, click on Y and enter the value 20 for the Expression property.
• The final step is to connect the constants X and Y to the target attributes where
the constant values are needed.
d. Explain various functions of control center manager.
6. Attempt any two of the following: 10
a. Explain clipboard and recycle bin features of oracle warehouse builder.
Clipboard
The Clipboard is a concept OWB has borrowed from operating systems. The Clipboard
facilitates cutting, copying, and pasting objects. It is a temporary storage area for
objects that you have copied or moved from one project and plan to use somewhere else.
To view the content of the Clipboard:
– select Clipboard from the Tools menu
– or press the F8 key
Recycle Bin
• The Recycle Bin in OWB is similar to the recycle bin in operating systems.
• OWB keeps deleted objects in the Recycle Bin.
• Deleted objects can be restored from the Recycle Bin.
• To undo a deletion, select an object in the Recycle Bin and click Restore.
• If the "Put in Recycle Bin" check box is checked while deleting, the object will be
sent to the Recycle Bin.
• The Warehouse Builder Recycle Bin window can be opened by clicking the Tools
menu and selecting the Recycle Bin option.
• This window has a content area which shows all deleted objects.
• The content is shown with Object Parent as well as Time Deleted information. The
Object Parent is the project from which the object was deleted, and the Time
Deleted is when we deleted it.
• Below the content area there are two buttons:
– one for restoring a deleted object
– another for emptying the contents of the Recycle Bin
b. What is Metadata Loader? What are its benefits?
The workspace objects can be exported to a file. We can export anything from an entire
project down to a single data object or mapping. Following are the benefits of Metadata
Loader exports and imports
– Backup
– To transport metadata definitions to another repository for loading
If we choose an entire project or a collection such as a node or module, it will export all
objects contained within it. If we choose any subset, it will also export the context of the
objects so that it will remember where to put them on import.
Say for example if you export a table, the metadata also contains the definition for:
– the module in which it resides
– the project the module is in
We can also choose to export any dependencies of the object being exported, if they
exist. To export an object, select the object to be exported and click Design | Export |
Warehouse Builder Metadata from the main menu. This launches the Metadata Export dialog
box, which lists all objects selected in the Explorer for export. We can specify the
file name of the export file, and the dialog box has an option to export all
dependencies of the object being exported.
c. Explain data density and data sparsity with example.
d. Explain ROLAP and its merits.
ROLAP uses a standard relational database to store the physical data. For this, data is
stored in a special structure known as a star schema or snowflake schema. A star schema
consists of a central fact table containing measures and a set of dimension tables with
the hierarchy defined by child-parent columns. In the star schema model, the fact table
is at the center of the star with the dimension tables as the points of the star. The
dimension tables are relationally joined with the fact table to allow multidimensional
queries. The data is retrieved from the relational database into the client tool by SQL
queries.
• Advantages:
– Because it utilizes a relational database, ROLAP can support massive
amounts of data
– It uses the same technology as existing source systems (source systems are
RDBMS based)
7. Attempt any three of the following: 15
a. Differentiate between OLTP database and Data warehouse database.
OLTP database | Data warehouse database
Application oriented | Subject oriented
Detailed data | Summarized and refined data
Designed for real-time business transactions and concurrent processes | Designed for analysis of the business
Isolated data | Integrated data
Repetitive access | Ad-hoc access
Performance sensitive | Performance relaxed
Few records accessed at a time | Large volumes accessed at a time
Optimized for a common and known set of transactions, usually intensive in nature (addition, updation and deletion of rows at a time per table) | Optimized for bulk loads and complex, unpredictable queries that access many rows per table
Database size 100 MB to 100 GB | Database size 100 GB to a few terabytes
Very minimal historical data | Current as well as historical data
b. What is Oracle Warehouse Builder? Explain the significance of projects and
modules in OWB.
Oracle Warehouse Builder is an ETL tool produced by Oracle that offers a graphical
environment to build, manage and maintain data integration processes in business
intelligence systems.
The OWB design objects are organized under a project, which provides a means of
structuring the objects for security and reusability. Each project contains nodes for
each type of design object that you can create or import. These projects are stored in a
workspace, so prior to extracting data one has to create a project. A default project
called MY_PROJECT is automatically created when you create a workspace. Alternatively,
you can rename MY_PROJECT or define more projects. A project will contain one or more
modules; therefore, before you import source metadata into Warehouse Builder, create a
module that will contain these metadata definitions. The type of module you create
depends on the source from which you are importing metadata. Oracle Warehouse Builder
supports Oracle, non-Oracle, and Flat File modules for importing metadata definitions
from Oracle databases, non-Oracle databases, and flat files respectively.
c. What is time dimension? Explain various steps to create it through time dimension
wizard.
The Time/Date dimension provides the time-series information that describes warehouse
data. Most data warehouses include a time dimension, and the information it contains is
very similar from warehouse to warehouse. It has levels such as days, weeks, months,
etc. The Time dimension enables warehouse users to retrieve data by time period.
Creation of Time dimension
• Launch Design Center
• Expand the Databases node under your project node and then right-click on the
Dimensions node; a pop-up menu appears. Select New | Using Time Wizard... from the
pop-up menu to launch the Time Dimension Wizard.
• The first screen is a welcome screen that lists the steps involved in creating a time
dimension.
• Step 1: Provide a name and description for the time dimension.
• Step 2: Set the storage type for the dimension as either ROLAP or MOLAP.
• Step 3: Define the range of data stored in the time dimension: the year to start with,
and the total number of years to include starting with that year.
• Step 4: Choose the hierarchy and the levels in the hierarchy. The available
hierarchies and their levels are:
1. Normal Hierarchy
• Calendar Year
• Calendar Quarter
• Calendar month
• Day
2. Week Hierarchy
• Calendar Week
• Day
• Step 5: This screen shows a summary of the time dimension before creation of the
sequence and map.
• Step 6: Progress status.
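The day-level content that such a wizard generates can be pictured with a minimal Python sketch, assuming a simple calendar hierarchy (year, quarter, month, day). The column names used here are illustrative, not the names OWB actually generates:

```python
import datetime

def build_time_rows(start_year, number_of_years):
    """Generate one row per day, tagged with its month, quarter and year levels."""
    rows = []
    day = datetime.date(start_year, 1, 1)
    end = datetime.date(start_year + number_of_years, 1, 1)
    while day < end:
        rows.append({
            "day": day.isoformat(),
            "calendar_month": day.month,
            "calendar_quarter": (day.month - 1) // 3 + 1,  # months 1-3 -> Q1, etc.
            "calendar_year": day.year,
        })
        day += datetime.timedelta(days=1)
    return rows

# Example: the wizard's "range" inputs map to these two arguments.
rows = build_time_rows(2007, 1)
```

This mirrors what Step 3 of the wizard collects (start year and number of years) and what Step 4's hierarchy levels describe.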
d. Write a detailed note on staging and its benefits.
Staging is the process of temporarily copying the source data into tables in the target
database, so that any cleaning and transformation can be performed before the data is
loaded into the final target tables. The data staging area is a temporary area for storage
and processing: it is the place where all the extracted data is put together and prepared
for loading into the data warehouse. A staging area is like a large table holding data
pulled from various sources, to be loaded into the data warehouse in the required
format. In the absence of a staging area, data must be loaded directly from the OLTP
system into the OLAP system. A staging area can be created
– within the database (using database tables)
– outside the database (using flat files)
Advantages of staging
• Source database connection can be freed immediately after copying the data to the
staging area. The formatting and restructuring of the data happens later with data in
the staging area.
• If the ETL process needs to be restarted, there is no need to go back and disturb the
source system to retrieve the data.
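The staging pattern can be sketched in a few lines of Python with an in-memory SQLite database standing in for the target; the table names and data here are invented for illustration. The raw extract is copied verbatim into a staging table, the source connection is then no longer needed, and cleaning happens entirely against the staging copy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table: the source extract is copied in as-is (all text, untrimmed).
cur.execute("CREATE TABLE stg_sales (sale_date TEXT, amount TEXT)")
cur.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                [("2007-01-15", " 100.50 "), ("2007-01-16", "200")])

# The source connection could be released here; cleaning and transformation
# run against the staging table only, so a restart never touches the source.
cur.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
cur.execute("""INSERT INTO sales
               SELECT sale_date, CAST(TRIM(amount) AS REAL)
               FROM stg_sales""")

total = cur.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

This illustrates both advantages listed above: the source is touched only once, and a failed load can be rerun from the staging copy.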
e. Describe the SUBSTR transformation function.
The SUBSTR function is used to get a substring of a given string value. After you drop
the Transformation Operator on the mapping, a dialog box pops up where you can
select the SUBSTR function. This operator needs three parameters: STRING,
POSITION and SUBSTR_LENGTH. The STRING parameter represents the string field
from which substring is to be extracted. The other two parameters must be constant
integer values. The POSITION is a number indicating the start position of the substring
within the source string. The SUBSTR_LENGTH specifies the length of the substring to
extract.
Map the input field to the STRING input attribute of the SUBSTR operator. Using
Constant operator, you can supply integer constant values to POSITION and
SUBSTR_LENGTH. The following figure shows how a SUBSTR Transformation Operator
looks when it is placed on the mapping.
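The semantics of the three parameters can be made concrete with a small Python sketch that mimics Oracle's SUBSTR behaviour (1-based positions, with a negative POSITION counting back from the end of the string); this is an illustrative re-implementation, not OWB code:

```python
def oracle_substr(string, position, substr_length=None):
    """Sketch of Oracle SUBSTR: 1-based POSITION, optional SUBSTR_LENGTH.

    Position 0 is treated as 1; a negative position counts from the end.
    Returns None (Oracle's NULL) when no substring can be extracted.
    """
    if position == 0:
        position = 1
    start = position - 1 if position > 0 else len(string) + position
    if start < 0 or start >= len(string):
        return None
    if substr_length is None:
        return string[start:]
    if substr_length < 1:
        return None
    return string[start:start + substr_length]

# SUBSTR('WAREHOUSE', 5, 5) -> 'HOUSE'
print(oracle_substr("WAREHOUSE", 5, 5))
```

In the mapping, STRING is wired from an input field, while POSITION and SUBSTR_LENGTH are supplied as integer constants, exactly as the two constant arguments are here.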
f. What are snapshots? Explain full snapshot and signature snapshot.
Snapshot
• A snapshot is a point in time version of an object.
• The snapshot of an object captures all the metadata information about that object at the time when
the snapshot is taken.
• It enables you to compare the current object with a previously taken snapshot.
• Since objects can be restored from snapshots, snapshots can be used as a backup mechanism.
Full Snapshots :
• Full snapshots provide complete metadata of an object that you can use to restore it later.
• So it is suitable for making backups of objects.
• Full snapshots take longer to create and require more storage space than signature snapshots.
Signature Snapshots:
• A signature snapshot captures only the signature of an object.
• A signature contains enough information about the selected metadata component to detect changes
when compared with another snapshot or the current object definition.
• Signature snapshots are small and can be created quickly.
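The difference between the two snapshot types can be sketched in Python, treating an object's metadata as a dictionary: a full snapshot keeps a complete copy (so it can both detect changes and restore the object), while a signature snapshot keeps only a digest (enough to detect changes, too little to restore from). This is an illustrative analogy, not OWB's internal format:

```python
import copy
import hashlib
import json

def full_snapshot(metadata):
    # Complete copy of the metadata: can detect changes AND restore the object.
    return copy.deepcopy(metadata)

def signature_snapshot(metadata):
    # Digest only: enough to detect a change, too little to restore from.
    blob = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# Hypothetical table metadata, for illustration only.
table = {"name": "SALES", "columns": ["ID", "AMOUNT"]}
full = full_snapshot(table)
sig = signature_snapshot(table)

table["columns"].append("SALE_DATE")        # the object changes later

changed = signature_snapshot(table) != sig  # the signature detects the change
restored = full                             # the full snapshot can restore it
```

The signature here is a fixed-size hash regardless of the object's size, which is why signature snapshots are small and quick to create.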