Introduction to Extraction, Transportation and Loading Methods in Data Warehouses (8/17/2019)
Introduction to Extraction Methods in Data Warehouses
Extraction is the operation of extracting data from a source system for further use in a data
warehouse environment. This is the first step of the ETL process. After the extraction,
this data can be transformed and loaded into the data warehouse.

The extraction method you choose depends highly on the source system
and also on the business needs in the target data warehouse environment. Very
often there is no possibility of adding additional logic to the source systems to enhance
an incremental extraction of data, due to the performance impact or the increased workload
on these systems. Sometimes the customer is not even allowed to add anything to
an out-of-the-box application system.

The estimated amount of data to be extracted and the stage in the ETL process
(initial load or maintenance of data) may also impact the decision of how to extract,
from both a logical and a physical perspective. Basically, you have to decide how to
extract data logically and physically.
Logical Extraction Methods
There are two kinds of logical extraction:
Full Extraction
Incremental Extraction
Full Extraction
The data is extracted completely from the source system. Since this extraction
reflects all the data currently available on the source system, there is no need to
keep track of changes to the data source since the last successful extraction. The
source data is provided as-is, and no additional logical information (for example,
timestamps) is necessary on the source site. An example of a full extraction may
be an export file of a distinct table or a remote SQL statement scanning the
complete source table.
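As a minimal sketch of the second variant, a full extraction can be a single statement that scans the complete source table; the table and database link names here (orders, src_db) are hypothetical, not from the source text:

```sql
-- Hypothetical sketch of a full extraction: scan the complete
-- remote source table and materialize the result in the staging
-- area. No change tracking is needed; every row is taken as-is.
CREATE TABLE stg_orders AS
  SELECT * FROM orders@src_db;
```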
Incremental Extraction
At a specific point in time, only the data that has changed since a well-defined
event back in history is extracted. This event may be the last time of extraction
or a more complex business event, such as the last booking day of a fiscal period. To
identify this delta change, there must be a way to identify all the changed
information since this specific time event. This information can be provided
either by the source data itself, such as an application column reflecting the last-changed
timestamp, or by a change table in which an appropriate additional mechanism keeps
track of the changes alongside the originating transactions. In most cases, using the
latter method means adding extraction logic to the source system.
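The first variant, a last-changed timestamp column, can be sketched as follows; the table, link, and column names are hypothetical, and the cutoff would normally come from the warehouse's record of the previous extraction:

```sql
-- Hypothetical sketch of timestamp-based incremental extraction.
-- Assumes the source table carries a last_changed column and the
-- warehouse remembers when the previous extraction ran.
SELECT *
  FROM orders@src_db
 WHERE last_changed > TO_DATE('31-JAN-2000 00:00:00',
                              'DD-MON-YYYY HH24:MI:SS');
```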
Many data warehouses do not use any change-capture techniques as part of the
extraction process. Instead, entire tables from the source systems are extracted to
the data warehouse or staging area, and these tables are compared with a previous
extract from the source system to identify the changed data. This approach may not
have a significant impact on the source systems, but it clearly can place a
considerable burden on the data warehouse processes, particularly if the data
volumes are large.
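A minimal sketch of such a comparison, assuming two full extracts staged side by side (all table names hypothetical), uses set difference to isolate the delta:

```sql
-- Hypothetical sketch: detect changes by diffing a fresh full
-- extract against the previous one, with no change capture on
-- the source. MINUS returns rows in the first query not present
-- in the second.
-- Rows that are new or changed since the last extract:
SELECT * FROM orders_extract_new
MINUS
SELECT * FROM orders_extract_prev;
-- Rows that were deleted or changed since the last extract:
SELECT * FROM orders_extract_prev
MINUS
SELECT * FROM orders_extract_new;
```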
Oracle's Change Data Capture mechanism can extract and maintain such delta
information.
See Also:
Chapter 15, "Change Data Capture", for further details about the Change Data
Capture framework
Physical Extraction Methods
Depending on the chosen logical extraction method and the capabilities and
restrictions on the source side, the extracted data can be physically extracted by
two mechanisms. The data can either be extracted online from the source system or
from an offline structure. Such an offline structure might already exist or it might be
generated by an extraction routine.
There are the following methods of physical extraction:
Online Extraction
Offline Extraction
Online Extraction
The data is extracted directly from the source system itself. The extraction process
can connect directly to the source system to access the source tables themselves, or
to an intermediate system that stores the data in a preconfigured manner (for
example, snapshot logs or change tables). Note that the intermediate system is not
necessarily physically different from the source system.
With online extractions, you need to consider whether the distributed transactions
are using original source objects or prepared source objects.
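The distinction can be sketched as follows; the object and link names are hypothetical, not from the source text:

```sql
-- Hypothetical sketch of two online-extraction styles over a
-- database link.

-- (a) Reading the original source object directly:
INSERT INTO stg_orders
  SELECT * FROM orders@src_db;

-- (b) Reading a prepared intermediate object (here, a change
--     table maintained on the source side) instead of the
--     original table:
INSERT INTO stg_orders_delta
  SELECT * FROM orders_change_table@src_db;
```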
Offline Extraction
The data is not extracted directly from the source system but is staged explicitly
outside the original source system. The data either already has an existing structure (for
example, redo logs, archive logs, or transportable tablespaces) or was created by an
extraction routine.
You should consider the following structures:
Flat files
Data in a defined, generic format. Additional information about the source object is
necessary for further processing.
Dump files
Oracle-specific format. Information about the contained objects is included.
Redo and archive logs
Information is in a special additional dump file.
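One common way to produce such a flat file from an Oracle source is spooling query output in SQL*Plus; this is a minimal sketch, with the file path, table, and columns all hypothetical:

```sql
-- SQL*Plus sketch: unload a table to a pipe-delimited flat file.
-- The formatting settings suppress headers and padding so the
-- spooled file contains only data rows.
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 TRIMSPOOL ON
SPOOL /tmp/orders.dat
SELECT order_id || '|' || cust_id || '|' || amount
  FROM orders;
SPOOL OFF
```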
Data transformation is the process of converting data from one format (for example, a
database file) into another.
The structure of stored data may also vary between applications, requiring semantic
mapping prior to the transformation process. For instance, two applications might
store the same customer credit card information using slightly different structures.
See more at: https://www.mulesoft.com/resources/esb/data-transformation
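A semantic mapping of this kind can be sketched as a view that brings both structures into one shape; every table and column name here is hypothetical, invented only to illustrate the idea:

```sql
-- Hypothetical sketch: two applications store card data in
-- different structures; a view maps both into a common format.
CREATE VIEW unified_cards AS
SELECT cust_id,
       card_number,
       expiry_date
  FROM app1_cards
UNION ALL
SELECT customer_no AS cust_id,
       cc_no       AS card_number,
       -- app2 splits the expiry into month and year columns
       TO_DATE(exp_month || '-' || exp_year, 'MM-YYYY') AS expiry_date
  FROM app2_cards;
```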
Transportation in Data Warehouses
The following topics provide information about transporting data into a data
warehouse:
• Overview of Transportation in Data Warehouses
• Introduction to Transportation Mechanisms in Data Warehouses
Overview of Transportation in Data Warehouses
Transportation is the operation of moving data from one system to another system. In
a data warehouse environment, the most common requirements for transportation are in
moving data from:
• A source system to a staging database or a data warehouse database
• A staging database to a data warehouse
• A data warehouse to a data mart
Transportation is often one of the simpler portions of the ETL process, and can be
integrated with other portions of the process. For example, as shown in Chapter 11,
"Extraction in Data Warehouses", distributed query technology provides a mechanism for
both extracting and transporting data.
Introduction to Transportation Mechanisms in Data Warehouses
You have three basic choices for transporting data in warehouses:
• Transportation Using Flat Files
• Transportation Through Distributed Operations
• Transportation Using Transportable Tablespaces
Transportation Using Flat Files
The most common method for transporting data is the transfer of flat files, using
mechanisms such as FTP or other remote file system access protocols. Data is
unloaded or exported from the source system into flat files using techniques discussed
in Chapter 11, "Extraction in Data Warehouses", and is then transported to the target
platform using FTP or similar mechanisms.
Because source systems and data warehouses often use different operating systems
and database systems, using flat files is often the simplest way to exchange data
between heterogeneous systems with minimal transformations. However, even when
transporting data between homogeneous systems, flat files are often the most efficient
and most easy-to-manage mechanism for data transfer.
Transportation Through Distributed Operations
Distributed queries, either with or without gateways, can be an effective mechanism
for extracting data. These mechanisms also transport the data directly to the target
systems, thus providing both extraction and transformation in a single step.
Depending on the tolerable impact on time and system resources, these mechanisms
can be well suited for both extraction and transformation.
As opposed to flat file transportation, the success or failure of the transportation is
recognized immediately with the result of the distributed query or transaction.
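A single-statement sketch of this pattern follows; the database link and table names are hypothetical, and the point is that one statement both extracts and moves the data, so its outcome signals success or failure immediately:

```sql
-- Hypothetical sketch: a distributed query extracts from the
-- remote source and lands the rows in the local staging table
-- in one step. If this statement succeeds, the transport
-- succeeded; if it fails, nothing partial is left committed.
INSERT INTO sales_staging
  SELECT s.*
    FROM sales@source_db s
   WHERE s.time_id >= TO_DATE('01-JAN-2000', 'DD-MON-YYYY');
```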
Transportation Using Transportable Tablespaces
Oracle8i introduced an important mechanism for transporting data: transportable
tablespaces. This feature is the fastest way to move large volumes of data between
two Oracle databases.
Previous to Oracle8i, the most scalable data transportation mechanisms relied on
moving flat files containing raw data. These mechanisms required that data be
unloaded or exported into files from the source database. Then, after transportation,
these files were loaded or imported into the target database. Transportable tablespaces
entirely bypass the unload and reload steps.
Using transportable tablespaces, Oracle data files (containing table data, indexes, and
almost every other Oracle database object) can be directly transported from one
database to another. Furthermore, like import and export, transportable tablespaces
provide a mechanism for transporting metadata in addition to transporting data.
Transportable tablespaces have some notable limitations: source and target systems
must be running Oracle8i (or higher), must be running the same operating system,
must use the same character set, and, prior to Oracle9i, must use the same block size.
Despite these limitations, transportable tablespaces can be an invaluable data
transportation technique in many warehouse environments.
The most common applications of transportable tablespaces in data warehouses are in
moving data from a staging database to a data warehouse, or in moving data from a
data warehouse to a data mart.
See Also:
Oracle9i Database Concepts for more information on transportable tablespaces
Transportable Tablespaces Example
Suppose that you have a data warehouse containing sales data, and several data marts
that are refreshed monthly. Also suppose that you are going to move one month of
sales data from the data warehouse to the data mart.
Step 1: Place the Data to be Transported into its own Tablespace
The current month's data must be placed into a separate tablespace in order to be
transported. In this example, you have a tablespace ts_temp_sales, which will hold a
copy of the current month's data. Using the CREATE TABLE ... AS SELECT statement, the
current month's data can be efficiently copied to this tablespace:
CREATE TABLE temp_jan_sales
NOLOGGING
TABLESPACE ts_temp_sales
AS
SELECT * FROM sales
WHERE time_id BETWEEN '31-DEC-1999' AND '01-FEB-2000';

Following this operation, the tablespace ts_temp_sales is set to read-only:

ALTER TABLESPACE ts_temp_sales READ ONLY;
A tablespace cannot be transported unless there are no active transactions modifying
the tablespace. Setting the tablespace to read-only enforces this.
The tablespace ts_temp_sales may be a tablespace that has been especially created to
temporarily store data for use by the transportable tablespace features. Following "Step
3: Copy the Datafiles and Export File to the Target System", this tablespace can be set to
read/write, and, if desired, the table temp_jan_sales can be dropped, or the tablespace
can be reused for other transportations or for other purposes.
In a given transportable tablespace operation, all of the objects in a given tablespace
are transported. Although only one table is being transported in this example, the
tablespace ts_temp_sales could contain multiple tables. For example, perhaps the data
mart is refreshed not only with the new month's worth of sales transactions, but also
with a new copy of the customer table. Both of these tables could be transported in the
same tablespace. Moreover, this tablespace could also contain other database objects
such as indexes, which would also be transported.
Additionally, in a given transportable-tablespace operation, multiple tablespaces can
be transported at the same time. This makes it easier to move very large volumes of
data between databases. Note, however, that the transportable tablespace feature can
only transport a set of tablespaces which contain a complete set of database objects
without dependencies on other tablespaces. For example, an index cannot be
transported without its table, nor can a partition be transported without the rest of the
table. You can use the DBMS_TTS package to check that a tablespace is transportable.
See Also:
Oracle9i Supplied PL/SQL Packages and Types Reference for detailed
information about the DBMS_TTS package
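The self-containment check can be sketched as follows; DBMS_TTS.TRANSPORT_SET_CHECK and the TRANSPORT_SET_VIOLATIONS view are documented parts of the package, though this exact invocation is only illustrative:

```sql
-- Check that ts_temp_sales is a self-contained set; any
-- dependency problems (for example, an index whose table lives
-- in another tablespace) are reported in the violations view.
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('ts_temp_sales', TRUE);
SELECT * FROM transport_set_violations;
```

An empty result from the second query indicates the tablespace set can be transported.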
In this step, we have copied the January sales data into a separate tablespace; however,
in some cases, it may be possible to leverage the transportable tablespace feature
without even moving data to a separate tablespace. If the sales table has been
partitioned by month in the data warehouse and if each partition is in its own
tablespace, then it may be possible to directly transport the tablespace containing the
January data. Suppose the January partition, sales_jan2000, is located in the
tablespace ts_sales_jan2000. Then the tablespace ts_sales_jan2000 could potentially
be transported, rather than creating a temporary copy of the January sales data in
ts_temp_sales.
However, the same conditions must be satisfied in order to transport the
tablespace ts_sales_jan2000 as are required for the specially created tablespace. First,
this tablespace must be set to READ ONLY. Second, because a single partition of a
partitioned table cannot be transported without the remainder of the partitioned table
also being transported, it is necessary to exchange the January partition into a separate
table (using the ALTER TABLE statement) to transport the January data.
The EXCHANGE operation is very quick, but the January data will no longer be a part of
the underlying sales table, and thus may be unavailable to users until this data is
exchanged back into the sales table after the export of the metadata. The January data
can be exchanged back into the sales table after you complete Step 3.
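The exchange itself can be sketched as follows; the standalone table name (jan_sales_out) is hypothetical, while the partition and tablespace names follow the example above:

```sql
-- Hypothetical sketch: exchange the January partition into a
-- standalone table prior to transport.

-- Create an empty table with the same shape as sales, in the
-- tablespace that will be transported:
CREATE TABLE jan_sales_out TABLESPACE ts_sales_jan2000
  AS SELECT * FROM sales WHERE 1 = 0;

-- Swap the partition's segment with the empty table (a fast,
-- metadata-only operation):
ALTER TABLE sales
  EXCHANGE PARTITION sales_jan2000 WITH TABLE jan_sales_out;

-- The tablespace must be read-only before it can be transported:
ALTER TABLESPACE ts_sales_jan2000 READ ONLY;
```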
Step 2: Export the Metadata
The Export utility is used to export the metadata describing the objects contained in
the transported tablespace. For our example scenario, the Export command could be:

EXP TRANSPORT_TABLESPACE=y TABLESPACES=ts_temp_sales
FILE=jan_sales.dmp
This operation will generate an export file, jan_sales.dmp. The export file will be
small, because it contains only metadata. In this case, the export file will contain
information describing the table temp_jan_sales, such as the column names, column
datatypes, and all other information that the target Oracle database will need in order to
access the objects in ts_temp_sales.
Step 3: Copy the Datafiles and Export File to the Target System
Copy the data files that make up ts_temp_sales, as well as the export
file jan_sales.dmp, to the data mart platform, using any transportation mechanism for
flat files.
Once the datafiles have been copied, the tablespace ts_temp_sales can be set
to READ WRITE mode if desired.
Step 4: Import the Metadata
Once the files have been copied to the data mart, the metadata should be imported into
the data mart:

IMP TRANSPORT_TABLESPACE=y DATAFILES='/db/tempjan.f'
TABLESPACES=ts_temp_sales FILE=jan_sales.dmp
At this point, the tablespace ts_temp_sales and the table temp_sales_jan are
accessible in the data mart. You can incorporate this new data into the data mart's
tables.
You can insert the data from the temp_sales_jan table into the data mart's sales table
in one of two ways:
INSERT /*+ APPEND */ INTO sales SELECT * FROM temp_sales_jan;
Following this operation, you can delete the temp_sales_jan table (and even the
entire ts_temp_sales tablespace).
Alternatively, if the data mart's sales table is partitioned by month, then the new
transported tablespace and the temp_sales_jan table can become a permanent part of
the data mart. The temp_sales_jan table can become a partition of the data mart's sales
table:

ALTER TABLE sales ADD PARTITION sales_00jan VALUES
LESS THAN (TO_DATE('01-feb-2000','dd-mon-yyyy'));
ALTER TABLE sales EXCHANGE PARTITION sales_00jan
WITH TABLE temp_sales_jan
INCLUDING INDEXES WITH VALIDATION;
Other Uses of Transportable Tablespaces
The previous example illustrates a typical scenario for transporting data in a data
warehouse. However, transportable tablespaces can be used for many other purposes.
In a data warehousing environment, transportable tablespaces should be viewed as a
utility (much like Import/Export or SQL*Loader), whose purpose is to move large
volumes of data between Oracle databases. When used in conjunction with parallel
data movement operations such as
the CREATE TABLE ... AS SELECT and INSERT ... AS SELECT statements, transportable
tablespaces provide an important mechanism for quickly transporting data for many
purposes.
Loading and Transformation
This chapter helps you create and manage a data warehouse, and discusses:
• Overview of Loading and Transformation in Data Warehouses
• Loading Mechanisms
• Transformation Mechanisms
• Loading and Transformation Scenarios
Overview of Loading and Transformation in Data Warehouses
Data transformations are often the most complex and, in terms of processing time, the
most costly part of the ETL process. They can range from simple data conversions to
extremely complex data scrubbing techniques. Many, if not all, data transformations
can occur within an Oracle9i database, although transformations are often
implemented outside of the database (for example, on flat files) as well.
This chapter introduces techniques for implementing scalable and efficient data
transformations within Oracle9i. The examples in this chapter are relatively simple.
Real-world data transformations are often considerably more complex. However, the
transformation techniques introduced in this chapter meet the majority of real-world
data transformation requirements, often with more scalability and less programming
than alternative approaches.
This chapter does not seek to illustrate all of the typical transformations that would be
encountered in a data warehouse, but to demonstrate the types of fundamental
technology that can be applied to implement these transformations and to provide
guidance in how to choose the best techniques.
Transformation Flow
From an architectural perspective, you can transform your data in two ways:
• Multistage Data Transformation
• Pipelined Data Transformation
Multistage Data Transformation
The data transformation logic for most data warehouses consists of multiple steps. For
example, in transforming new records to be inserted into a sales table, there may be
separate logical transformation steps to validate each dimension key.
Figure 13-1 offers a graphical way of looking at the transformation logic.
Figure 13-1 Multistage Data Transformation
When using Oracle9i as a transformation engine, a common strategy is to implement
each different transformation as a separate SQL operation and to create a separate,
temporary staging table (such as the
tables new_sales_step1 and new_sales_step2 in Figure 13-1) to store the incremental
results for each step. This load-then-transform strategy also provides a natural
checkpointing scheme for the entire transformation process, which enables the
process to be more easily monitored and restarted. However, a disadvantage of
multistaging is that the space and time requirements increase.
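A single stage of such a flow can be sketched as one SQL operation writing to the next staging table; the table names follow the figure, while the column names and the product dimension table are hypothetical:

```sql
-- Hypothetical sketch of one multistage step: validate the
-- product dimension key and pass only surviving rows to the
-- next staging table. Each step materializes its result, which
-- gives a natural restart point if a later step fails.
CREATE TABLE new_sales_step2 NOLOGGING AS
SELECT s.*
  FROM new_sales_step1 s
 WHERE EXISTS (SELECT 1
                 FROM product p
                WHERE p.prod_id = s.prod_id);
```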
It may also be possible to combine many simple logical transformations into a single
SQL statement or single PL/SQL procedure. Doing so may provide better
performance than performing each step independently, but it may also introduce
difficulties in modifying, adding, or dropping individual transformations, as well as
recovering from failed transformations.
Pipelined Data Transformation
With the introduction of Oracle9i, Oracle's database capabilities have been
significantly enhanced to specifically address some of the tasks in ETL environments.
The ETL process flow can be changed dramatically, and the database becomes an
integral part of the ETL solution.
The new functionality renders some of the formerly necessary process steps obsolete,
while some others can be remodeled to enhance the data flow and the data
transformation to become more scalable and non-interruptive. The task shifts from a
serial transform-then-load process (with most of the tasks done outside the database)
or load-then-transform process, to an enhanced transform-while-loading.
Oracle9i offers a wide variety of new capabilities to address all the issues and tasks
relevant in an ETL scenario. It is important to understand that the database offers
toolkit functionality rather than trying to address a one-size-fits-all solution. The
underlying database has to enable the most appropriate ETL process flow for a
specific customer need, and not dictate or constrain it from a technical
perspective. Figure 13-2 illustrates the new functionality, which is discussed throughout
later sections.
Figure 13-2 Pipelined Data Transformation
Loading Mechanisms
You can use the following mechanisms for loading a warehouse:
• SQL*Loader
• External Tables
• OCI and Direct-Path APIs
• Export/Import
SQL*Loader
Before any data transformations can occur within the database, the raw data must
become accessible to the database. One approach is to load it into the
database. Chapter 12, "Transportation in Data Warehouses", discusses several techniques
for transporting data to an Oracle data warehouse. Perhaps the most common
technique for transporting data is by way of flat files.
SQL*Loader is used to move data from flat files into an Oracle data warehouse.
During this data load, SQL*Loader can also be used to implement basic data
transformations. When using direct-path SQL*Loader, basic data manipulation, such
as datatype conversion and simple NULL handling, can be automatically resolved
during the data load. Most data warehouses use direct-path loading for performance
reasons.
Oracle's conventional-path loader provides broader capabilities for data
transformation than the direct-path loader: SQL functions can be applied to any column
as those values are being loaded. This provides a rich capability for transformations
during the data load. However, the conventional-path loader is slower than the direct-path
loader. For these reasons, the conventional-path loader should be considered primarily
for loading and transforming smaller amounts of data.
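This capability can be sketched in a control-file fragment; the file, table, and column names are hypothetical, and the quoted SQL expression is what the conventional-path loader applies to each value as it is loaded:

```sql
-- Hypothetical SQL*Loader control-file fragment (conventional
-- path): the double-quoted SQL expression transforms each
-- cust_name value as it is loaded.
LOAD DATA
INFILE customers.dat
APPEND INTO TABLE customers
FIELDS TERMINATED BY '|'
(cust_id,
 cust_name "UPPER(TRIM(:cust_name))",
 cust_since DATE 'DD-MON-YYYY')
```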
See Also:
Oracle9i Database Utilities for more information on SQL*Loader
The following is a simple example of a SQL*Loader control file to load data into
the sales table of the sh sample schema from an external file sh_sales.dat. The
external flat file sh_sales.dat consists of sales transaction data, aggregated on a daily
level. Not all columns of this external file are loaded into sales. This external file will
also be used as a source for loading the second fact table of the sh sample schema,
which is done using an external table.

The following shows the control file (sh_sales.ctl) to load the sales table:

LOAD DATA
INFILE sh_sales.dat
APPEND INTO TABLE sales
FIELDS TERMINATED BY "|"
(PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD)

It can be loaded with the following command:

sqlldr sh/sh control=sh_sales.ctl direct=true
External Tables
Another approach for handling external data sources is using external tables.
Oracle9i's external table feature enables you to use external data as a virtual table that
can be queried and joined directly and in parallel, without requiring the external data to
be first loaded in the database. You can then use SQL, PL/SQL, and Java to access the
external data.
External tables enable the pipelining of the loading phase with the transformation
phase. The transformation process can be merged with the loading process without
any interruption of the data streaming. It is no longer necessary to stage the data inside
the database for further processing inside the database, such as comparison or
transformation. For example, the conversion functionality of a conventional load can
be used for a direct-path INSERT AS SELECT statement in conjunction with
the SELECT from an external table.
The main difference between external tables and regular tables is that externally
organized tables are read-only. No DML operations (UPDATE/INSERT/DELETE) are
possible and no indexes can be created on them.
Oracle9i's external tables are a complement to the existing SQL*Loader functionality,
and are especially useful for environments where the complete external source has to
be joined with existing database objects and transformed in a complex manner, or
where the external data volume is large and used only once. SQL*Loader, on the other
hand, might still be the better choice for loading of data where additional indexing of
the staging table is necessary. This is true for operations where the data is used in
independent complex transformations or the data is only partially used in further
processing.
See Also:
Oracle9i SQL Reference for a complete description of external table syntax
and restrictions, and Oracle9i Database Utilities for usage examples
You can create an external table named sales_transactions_ext, representing the
structure of the complete sales transaction data, represented in the external
file sh_sales.dat. The product department is especially interested in a cost analysis on
product and time. We thus create a fact table named cost in the sales history schema.
The operational source data is the same as for the sales fact table. However, because
we are not investigating every dimensional information that is provided, the data in
the cost fact table has a coarser granularity than in the sales fact table; for example, all
different distribution channels are aggregated.
We cannot load the data into the cost fact table without applying the previously
mentioned aggregation of the detailed information, due to the suppression of some of
the dimensions.
Oracle's external table framework offers a solution to this. Unlike SQL*Loader,
where you would have to load the data before applying the aggregation, you can
combine the loading and transformation within a single SQL DML statement, as
shown in the following. You do not have to stage the data temporarily before inserting
into the target table.
The Oracle object directories must already exist, and point to the directory containing
the sh_sales.dat file as well as the directory containing the bad and log files.
CREATE TABLE sales_transactions_ext
(PROD_ID        NUMBER(6),
 CUST_ID        NUMBER,
 TIME_ID        DATE,
 CHANNEL_ID     CHAR(1),
 PROMO_ID       NUMBER(6),
 QUANTITY_SOLD  NUMBER(3),
 AMOUNT_SOLD    NUMBER(10,2),
 UNIT_COST      NUMBER(10,2),
 UNIT_PRICE     NUMBER(10,2))
ORGANIZATION external
(TYPE oracle_loader
 DEFAULT DIRECTORY data_file_dir
 ACCESS PARAMETERS
 (RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
  BADFILE log_file_dir:'sh_sales.bad_xt'
  LOGFILE log_file_dir:'sh_sales.log_xt'
  FIELDS TERMINATED BY "|")
 location ('sh_sales.dat')
)
REJECT LIMIT UNLIMITED;
You have the following choices for transforming data inside the database:

• Transformation Using SQL
• Transformation Using PL/SQL
• Transformation Using Table Functions
Transformation Using SQL

Once data is loaded into an Oracle9i database, data transformations can be executed using SQL operations. There are four basic techniques for implementing SQL data transformations within Oracle9i:

• CREATE TABLE ... AS SELECT and INSERT /*+APPEND*/ AS SELECT
• Transformation Using UPDATE
• Transformation Using MERGE
• Transformation Using Multitable INSERT
CREATE TABLE ... AS SELECT And INSERT /*+APPEND*/ AS SELECT

The CREATE TABLE ... AS SELECT statement (CTAS) is a powerful tool for manipulating large sets of data. As shown in the following example, many data transformations can be expressed in standard SQL, and CTAS provides a mechanism for efficiently executing a SQL query and storing the results of that query in a new database table. The INSERT /*+APPEND*/ ... AS SELECT statement offers the same capabilities with existing database tables.

In a data warehouse environment, CTAS is typically run in parallel using NOLOGGING mode for best performance.
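As a minimal sketch of this pattern (table and column names here are illustrative, not from this document):

```sql
-- CTAS run in parallel with NOLOGGING: the query result is written directly
-- into a new table, skipping most redo generation for best load performance.
CREATE TABLE sales_summary
PARALLEL NOLOGGING
AS SELECT prod_id, TRUNC(sales_date) AS day, SUM(amount) AS amount
   FROM sales_raw
   GROUP BY prod_id, TRUNC(sales_date);
```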
A simple and common type of data transformation is data substitution. In a data substitution transformation, some or all of the values of a single column are modified. For example, our sales table has a channel_id column. This column indicates whether a given sales transaction was made by a company's own sales force (a direct sale) or by a distributor (an indirect sale).

You may receive data from multiple source systems for your data warehouse. Suppose that one of those source systems processes only direct sales, and thus the source system does not know indirect sales channels. When the data warehouse initially receives sales data from this system, all sales records have a NULL value for the sales.channel_id field. These NULL values must be set to the proper key value. You can do this efficiently using a SQL function as part of the insertion into the target sales table statement:
The structure of source table sales_activity_direct is as follows:

SQL> DESC sales_activity_direct
Name            Null?    Type
------------    -----    ----------------
SALES_DATE               DATE
PRODUCT_ID               NUMBER
CUSTOMER_ID              NUMBER
PROMOTION_ID             NUMBER
AMOUNT                   NUMBER
QUANTITY                 NUMBER

INSERT /*+ APPEND NOLOGGING PARALLEL */
INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 'S',
       promotion_id, quantity, amount
FROM sales_activity_direct;
Transformation Using UPDATE

Another technique for implementing a data substitution is to use an UPDATE statement to modify the sales.channel_id column. An UPDATE will provide the correct result. However, if the data substitution transformations require that a very large percentage of the rows (or all of the rows) be modified, then it may be more efficient to use a CTAS statement than an UPDATE.
Transformation Using MERGE

Oracle's merge functionality extends SQL, by introducing the SQL keyword MERGE, in order to provide the ability to update or insert a row conditionally into a table or out of line single table views. Conditions are specified in the ON clause. This is, besides pure bulk loading, one of the most common operations in data warehouse synchronization.

Prior to Oracle9i, merges were expressed either as a sequence of DML statements or as PL/SQL loops operating on each row. Both of these approaches suffer from deficiencies in performance and usability. The new merge functionality overcomes these deficiencies with a new SQL statement. This syntax has been proposed as part of the upcoming SQL standard.
When to Use Merge
There are several benefits of the new MERGE statement as compared with the two other existing approaches.

• The entire operation can be expressed much more simply as a single SQL statement.
• You can parallelize statements transparently.
• You can use bulk DML.
• Performance will improve because your statements will require fewer scans of the source table.
Merge Examples

The following discusses various implementations of a merge. The examples assume that new data for the dimension table products is propagated to the data warehouse and has to be either inserted or updated. The table products_delta has the same structure as products.

Example 1 Merge Operation Using SQL in Oracle9i

MERGE INTO products t
USING products_delta s
ON (t.prod_id=s.prod_id)
WHEN MATCHED THEN UPDATE SET
  t.prod_list_price=s.prod_list_price,
  t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN INSERT
  (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
   prod_category, prod_cat_desc, prod_status, prod_list_price, prod_min_price)
VALUES
  (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
   s.prod_subcat_desc, s.prod_category, s.prod_cat_desc,
   s.prod_status, s.prod_list_price, s.prod_min_price);
Example 2 Merge Operation Using SQL Prior to Oracle9i

A regular join between source products_delta and target products:

UPDATE products t SET
 (prod_name, prod_desc, prod_subcategory, prod_subcat_desc, prod_category,
  prod_cat_desc, prod_status, prod_list_price, prod_min_price) =
 (SELECT prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
   prod_category, prod_cat_desc, prod_status, prod_list_price,
   prod_min_price FROM products_delta s WHERE s.prod_id=t.prod_id);

An antijoin between source products_delta and target products:

INSERT INTO products t
SELECT * FROM products_delta s
WHERE s.prod_id NOT IN (SELECT prod_id FROM products);

The advantage of this approach is its simplicity and lack of new language extensions. The disadvantage is its performance. It requires an extra scan and a join of both the products_delta and the products tables.
Example 3 Pre-9i Merge Using PL/SQL

CREATE OR REPLACE PROCEDURE merge_proc IS
 CURSOR cur IS
  SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
   prod_category, prod_cat_desc, prod_status, prod_list_price, prod_min_price
  FROM products_delta;
 crec cur%rowtype;
BEGIN
 OPEN cur;
 LOOP
  FETCH cur INTO crec;
  EXIT WHEN cur%notfound;
  UPDATE products SET
   prod_name = crec.prod_name, prod_desc = crec.prod_desc,
   prod_subcategory = crec.prod_subcategory,
   prod_subcat_desc = crec.prod_subcat_desc,
   prod_category = crec.prod_category,
   prod_cat_desc = crec.prod_cat_desc,
   prod_status = crec.prod_status,
   prod_list_price = crec.prod_list_price,
   prod_min_price = crec.prod_min_price
  WHERE crec.prod_id = prod_id;
  IF SQL%notfound THEN
   INSERT INTO products
    (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcat_desc,
     prod_category, prod_cat_desc, prod_status, prod_list_price,
     prod_min_price)
   VALUES
    (crec.prod_id, crec.prod_name, crec.prod_desc, crec.prod_subcategory,
     crec.prod_subcat_desc, crec.prod_category, crec.prod_cat_desc,
     crec.prod_status, crec.prod_list_price, crec.prod_min_price);
  END IF;
 END LOOP;
 CLOSE cur;
END merge_proc;
/
Transformation Using Multitable INSERT

Many times, external data sources have to be segregated based on logical attributes for insertion into different target objects. It's also frequent in data warehouse environments to fan out the same source data into several target objects. Multitable inserts provide a new SQL statement for these kinds of transformations, where data can either end up in several or exactly one target, depending on the business transformation rules. This insertion can be done conditionally based on business rules or unconditionally.

It offers the benefits of the INSERT ... SELECT statement when multiple tables are involved as targets. In doing so, it avoids the drawbacks of the alternatives available to you using functionality prior to Oracle9i. You either had to deal with n independent INSERT ... SELECT statements, thus processing the same source data n times and increasing the transformation workload n times. Alternatively, you had to choose a procedural approach with a per-row determination how to handle the insertion. This solution lacked direct access to high-speed access paths available in SQL.

As with the existing INSERT ... SELECT statement, the new statement can be parallelized and used with the direct-load mechanism for faster performance.
Example 13-1 Unconditional Insert

The following statement aggregates the transactional sales information, stored in sales_activity_direct, on a per daily base and inserts into both the sales and the costs fact table for the current day.

INSERT ALL
 INTO sales VALUES (product_id, customer_id, today, 'S', promotion_id,
  quantity_per_day, amount_per_day)
 INTO costs VALUES (product_id, today, product_cost, product_price)
SELECT TRUNC(s.sales_date) AS today,
 s.product_id, s.customer_id, s.promotion_id,
 SUM(s.amount) AS amount_per_day,
 SUM(s.quantity) quantity_per_day,
 p.product_cost, p.product_price
FROM sales_activity_direct s, product_information p
WHERE s.product_id = p.product_id
AND TRUNC(sales_date) = TRUNC(sysdate)
GROUP BY TRUNC(sales_date), s.product_id,
 s.customer_id, s.promotion_id, p.product_cost, p.product_price;
Example 13-2 Conditional ALL Insert

The following statement inserts a row into the sales and costs tables for all sales transactions with a valid promotion and stores the information about multiple identical orders of a customer in a separate table cum_sales_activity. It is possible two rows will be inserted for some sales transactions, and none for others.

INSERT ALL
 WHEN promotion_id IN (SELECT promo_id FROM promotions) THEN
  INTO sales VALUES (product_id, customer_id, today, 'S', promotion_id,
   quantity_per_day, amount_per_day)
  INTO costs VALUES (product_id, today, product_cost, product_price)
 WHEN num_of_orders > 1 THEN
  INTO cum_sales_activity VALUES (today, product_id, customer_id,
   promotion_id, quantity_per_day, amount_per_day, num_of_orders)
SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id,
 s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity)
 quantity_per_day, COUNT(*) num_of_orders, p.product_cost, p.product_price
FROM sales_activity_direct s, product_information p
WHERE s.product_id = p.product_id
AND TRUNC(sales_date) = TRUNC(sysdate)
GROUP BY TRUNC(s.sales_date), s.product_id, s.customer_id,
 s.promotion_id, p.product_cost, p.product_price;
Example 13-3 Conditional FIRST Insert

The following statement inserts into an appropriate shipping manifest according to the total quantity and the weight of a product order. An exception is made for high value orders, which are also sent by express, unless their weight classification is too high. It assumes the existence of appropriate tables large_freight_shipping, express_shipping, and default_shipping.

INSERT FIRST
 WHEN (sum_quantity_sold > 10 AND prod_weight_class < 5) OR
      (sum_quantity_sold > 5 AND prod_weight_class > 5) THEN
  INTO large_freight_shipping VALUES
   (time_id, cust_id, prod_id, prod_weight_class, sum_quantity_sold)
 WHEN sum_amount_sold > 1000 THEN
  INTO express_shipping VALUES
   (time_id, cust_id, prod_id, prod_weight_class, sum_amount_sold,
    sum_quantity_sold)
 ELSE INTO default_shipping VALUES
   (time_id, cust_id, prod_id, sum_quantity_sold)
SELECT s.time_id, s.cust_id, s.prod_id, p.prod_weight_class,
 SUM(amount_sold) AS sum_amount_sold,
 SUM(quantity_sold) AS sum_quantity_sold
FROM sales s, products p
WHERE s.prod_id = p.prod_id
AND s.time_id = TRUNC(sysdate)
GROUP BY s.time_id, s.cust_id, s.prod_id, p.prod_weight_class;
Example 13-4 Mixed Conditional and Unconditional Insert

The following example inserts new customers into the customers table and stores all new customers with cust_credit_limit higher than 4500 in an additional, separate table for further promotions.

INSERT FIRST
 WHEN cust_credit_limit >= 4500 THEN
  INTO customers
  INTO customers_special VALUES (cust_id, cust_credit_limit)
 ELSE INTO customers
SELECT * FROM customers_new;
Transformation Using PL/SQL

In a data warehouse environment, you can use procedural languages such as PL/SQL to implement complex transformations in the Oracle9i database. Whereas CTAS operates on entire tables and emphasizes parallelism, PL/SQL provides a row-based approach and can accommodate very sophisticated transformation rules. For example, a PL/SQL procedure could open multiple cursors and read data from multiple source tables, combine this data using complex business rules, and finally insert the transformed data into one or more target tables. It would be difficult or impossible to express the same sequence of operations using standard SQL statements.
Using a procedural language, a specific transformation (or number of transformation steps) within a complex ETL processing can be encapsulated, reading data from an intermediate staging area and generating a new table object as output. A previously generated transformation input table and a subsequent transformation will consume the table generated by this specific transformation. Alternatively, these encapsulated transformation steps within the complete ETL process can be integrated seamlessly, thus streaming sets of rows between each other without the necessity of intermediate staging. You can use Oracle9i's table functions to implement such behavior.
Transformation Using Table Functions

Oracle9i's table functions provide the support for pipelined and parallel execution of transformations implemented in PL/SQL, C, or Java. Scenarios as mentioned earlier can be done without requiring the use of intermediate staging tables, which interrupt the data flow through various transformation steps.
What is a Table Function?

A table function is defined as a function that can produce a set of rows as output. Additionally, table functions can take a set of rows as input. Prior to Oracle9i, PL/SQL functions:

• Could not take cursors as input
• Could not be parallelized or pipelined

Starting with Oracle9i, functions are not limited in these ways. Table functions extend database functionality by allowing:

• Multiple rows to be returned from a function
• Results of SQL subqueries (that select multiple rows) to be passed directly to functions
• Functions to take cursors as input
• Functions to be parallelized
• Result sets to be returned incrementally for further processing as soon as they are created. This is called incremental pipelining

Table functions can be defined in PL/SQL using a native PL/SQL interface, or in Java or C using the Oracle Data Cartridge Interface (ODCI).
See Also:
PL/SQL User's Guide and Reference for further information and Oracle9i Data Cartridge Developer's Guide
Figure 13-3 illustrates a typical aggregation where you input a set of rows and output a set of rows, in that case, after performing a SUM operation.

Figure 13-3 Table Function Example
The pseudocode for this operation would be similar to:

INSERT INTO Out SELECT * FROM ("Table Function"(SELECT * FROM In));

The table function takes the result of the SELECT on In as input and delivers a set of records in a different format as output for a direct insertion into Out.

Additionally, a table function can fan out data within the scope of an atomic transaction. This can be used for many occasions like an efficient logging mechanism or a fan out for other independent transformations. In such a scenario, a single staging table will be needed.
Figure 13-4 Pipelined Parallel Transformation with Fanout
The pseudocode for this would be similar to:

INSERT INTO target SELECT * FROM (tf2(SELECT * FROM (tf1(SELECT * FROM source))));

This will insert into target and, as part of tf1, into Stage Table 1 within the scope of an atomic transaction.

INSERT INTO target SELECT * FROM tf3(SELECT * FROM stage_table1);
Example 13-5 Table Functions Fundamentals

The following examples demonstrate the fundamentals of table functions, without the usage of complex business rules implemented inside those functions. They are chosen for demonstration purposes only, and are all implemented in PL/SQL.

Table functions return sets of records and can take cursors as input. Besides the Sales History schema, you have to set up the following database objects before using the examples:
REM object types
CREATE TYPE product_t AS OBJECT (
 prod_id              NUMBER(6),
 prod_name            VARCHAR2(50),
 prod_desc            VARCHAR2(4000),
 prod_subcategory     VARCHAR2(50),
 prod_subcat_desc     VARCHAR2(2000),
 prod_category        VARCHAR2(50),
 prod_cat_desc        VARCHAR2(2000),
 prod_weight_class    NUMBER(2),
 prod_unit_of_measure VARCHAR2(20),
 prod_pack_size       VARCHAR2(30),
 supplier_id          NUMBER(6),
 prod_status          VARCHAR2(20),
 prod_list_price      NUMBER(8,2),
 prod_min_price       NUMBER(8,2));
/
CREATE TYPE product_t_table AS TABLE OF product_t;
/
COMMIT;

REM package of all cursor types
REM we have to handle the input cursor type and the output cursor collection
REM type
CREATE OR REPLACE PACKAGE cursor_PKG AS
 TYPE product_t_rec IS RECORD (
  prod_id              NUMBER(6),
  prod_name            VARCHAR2(50),
  prod_desc            VARCHAR2(4000),
  prod_subcategory     VARCHAR2(50),
  prod_subcat_desc     VARCHAR2(2000),
  prod_category        VARCHAR2(50),
  prod_cat_desc        VARCHAR2(2000),
  prod_weight_class    NUMBER(2),
  prod_unit_of_measure VARCHAR2(20),
  prod_pack_size       VARCHAR2(30),
  supplier_id          NUMBER(6),
  prod_status          VARCHAR2(20),
  prod_list_price      NUMBER(8,2),
  prod_min_price       NUMBER(8,2));
 TYPE product_t_rectab IS TABLE OF product_t_rec;
 TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec;
 TYPE refcur_t IS REF CURSOR;
END;
/

REM artificial help table, used to demonstrate figure 13-4
CREATE TABLE obsolete_products_errors (prod_id NUMBER, msg VARCHAR2(2000));
The following example demonstrates a simple filtering; it shows all obsolete products except the prod_category Boys. The table function returns the result set as a set of records and uses a weakly typed ref cursor as input.
CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t)
RETURN product_t_table IS
 prod_id              NUMBER(6);
 prod_name            VARCHAR2(50);
 prod_desc            VARCHAR2(4000);
 prod_subcategory     VARCHAR2(50);
 prod_subcat_desc     VARCHAR2(2000);
 prod_category        VARCHAR2(50);
 prod_cat_desc        VARCHAR2(2000);
 prod_weight_class    NUMBER(2);
 prod_unit_of_measure VARCHAR2(20);
 prod_pack_size       VARCHAR2(30);
 supplier_id          NUMBER(6);
 prod_status          VARCHAR2(20);
 prod_list_price      NUMBER(8,2);
 prod_min_price       NUMBER(8,2);
 sales NUMBER := 0;
 objset product_t_table := product_t_table();
 i NUMBER := 0;
BEGIN
 LOOP
  -- Fetch from cursor variable
  FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
   prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
   prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
   prod_list_price, prod_min_price;
  EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
  IF prod_status = 'obsolete' AND prod_category != 'Boys' THEN
   -- append to collection
   i := i + 1;
   objset.extend;
   objset(i) := product_t(prod_id, prod_name, prod_desc, prod_subcategory,
    prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
    prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
    prod_list_price, prod_min_price);
  END IF;
 END LOOP;
 CLOSE cur;
 RETURN objset;
END;
/
You can use the table function in a SQL statement to show the results. Here we use additional SQL functionality for the output.

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products)));

UPPER(PROD_CATEGORY) PROD_STATUS
-------------------- -----------
GIRLS                obsolete
MEN                  obsolete

2 rows selected.
The following example implements the same filtering as the first one. The main differences between those two are:

• This example uses a strong typed REF cursor as input and can be parallelized based on the objects of the strong typed cursor, as shown in one of the following examples.
• The table function returns the result set incrementally as soon as records are created.

REM Same example, pipelined implementation
REM strong ref cursor (input type is defined)
REM a table without a strong typed input ref cursor cannot be parallelized
REM
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur
cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
 prod_id              NUMBER(6);
 prod_name            VARCHAR2(50);
 prod_desc            VARCHAR2(4000);
 prod_subcategory     VARCHAR2(50);
 prod_subcat_desc     VARCHAR2(2000);
 prod_category        VARCHAR2(50);
 prod_cat_desc        VARCHAR2(2000);
 prod_weight_class    NUMBER(2);
 prod_unit_of_measure VARCHAR2(20);
 prod_pack_size       VARCHAR2(30);
 supplier_id          NUMBER(6);
 prod_status          VARCHAR2(20);
 prod_list_price      NUMBER(8,2);
 prod_min_price       NUMBER(8,2);
 sales NUMBER := 0;
BEGIN
 LOOP
  -- Fetch from cursor variable
  FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
   prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
   prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
   prod_list_price, prod_min_price;
  EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
  IF prod_status = 'obsolete' AND prod_category != 'Boys' THEN
   PIPE ROW (product_t(prod_id, prod_name, prod_desc, prod_subcategory,
    prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
    prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
    prod_list_price, prod_min_price));
  END IF;
 END LOOP;
 CLOSE cur;
 RETURN;
END;
/
You can use the table function as follows:

SELECT DISTINCT prod_category,
 DECODE(prod_status, 'obsolete', 'NO LONGER AVAILABLE', 'N/A')
FROM TABLE(obsolete_products_pipe(CURSOR(SELECT * FROM products)));

PROD_CATEGORY DECODE(PROD_STATUS,
------------- -------------------
Girls         NO LONGER AVAILABLE
Men           NO LONGER AVAILABLE

2 rows selected.
We now change the degree of parallelism for the input table products and issue the same statement again:

ALTER TABLE products PARALLEL 4;
The session statistics show that the statement has been parallelized:

SELECT * FROM V$PQ_SESSTAT WHERE statistic='Queries Parallelized';

STATISTIC            LAST_QUERY SESSION_TOTAL
-------------------- ---------- -------------
Queries Parallelized          1             3

1 row selected.
Table functions are also capable of fanning out results into persistent table structures. This is demonstrated in the next example. The function filters returns all obsolete products except those of a specific prod_category (default Men), which was set to status obsolete by error. The detected wrong prod_id's are stored in a separate table structure. Its result set consists of all other obsolete product categories. It furthermore demonstrates how normal variables can be used in conjunction with table functions:
CREATE OR REPLACE FUNCTION obsolete_products_dml(cur
cursor_pkg.strong_refcur_t, prod_cat VARCHAR2 DEFAULT 'Men')
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
 PRAGMA AUTONOMOUS_TRANSACTION;
 prod_id              NUMBER(6);
 prod_name            VARCHAR2(50);
 prod_desc            VARCHAR2(4000);
 prod_subcategory     VARCHAR2(50);
 prod_subcat_desc     VARCHAR2(2000);
 prod_category        VARCHAR2(50);
 prod_cat_desc        VARCHAR2(2000);
 prod_weight_class    NUMBER(2);
 prod_unit_of_measure VARCHAR2(20);
 prod_pack_size       VARCHAR2(30);
 supplier_id          NUMBER(6);
 prod_status          VARCHAR2(20);
 prod_list_price      NUMBER(8,2);
 prod_min_price       NUMBER(8,2);
 sales NUMBER := 0;
BEGIN
 LOOP
  -- Fetch from cursor variable
  FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
   prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
   prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
   prod_list_price, prod_min_price;
  EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
  IF prod_status = 'obsolete' THEN
   IF prod_category = prod_cat THEN
    INSERT INTO obsolete_products_errors VALUES
     (prod_id, 'correction: category '||UPPER(prod_cat)||' still available');
   ELSE
    PIPE ROW (product_t(prod_id, prod_name, prod_desc, prod_subcategory,
     prod_subcat_desc, prod_category, prod_cat_desc, prod_weight_class,
     prod_unit_of_measure, prod_pack_size, supplier_id, prod_status,
     prod_list_price, prod_min_price));
   END IF;
  END IF;
 END LOOP;
 COMMIT;
 CLOSE cur;
 RETURN;
END;
/
The following query shows all obsolete product groups except the prod_category Men, which was wrongly set to status obsolete.

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT * FROM products)));

PROD_CATEGORY PROD_STATUS
------------- -----------
Boys          obsolete
Girls         obsolete

2 rows selected.
As you can see, there are some products of the prod_category Men that were obsoleted by accident:

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
----------------------------------------
correction: category MEN still available

1 row selected.
Taking advantage of the second input variable changes the result set as follows:

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT * FROM products), 'Boys'));

PROD_CATEGORY PROD_STATUS
------------- -----------
Girls         obsolete
Men           obsolete

2 rows selected.

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
-----------------------------------------
correction: category BOYS still available

1 row selected.
Because table functions can be used like a normal table, they can be nested, as shown in the following:

SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
 FROM TABLE(obsolete_products_pipe(CURSOR(SELECT * FROM products))))));

PROD_CATEGORY PROD_STATUS
------------- -----------
Girls         obsolete

1 row selected.
Because the table function obsolete_products_pipe filters out all products of the prod_category Boys, our result no longer includes products of the prod_category Boys. The prod_category Men is still set to be obsolete by accident:

SELECT DISTINCT msg FROM obsolete_products_errors;

MSG
----------------------------------------
correction: category MEN still available
The biggest advantage of Oracle9i ETL is its toolkit functionality, where you can combine any of the latter discussed functionality to improve and speed up your ETL processing. For example, you can take an external table as input, join it with an existing table and use it as input for a parallelized table function to process complex business logic. This table function can be used as input source for a MERGE operation, thus streaming the new information for the data warehouse, provided in a flat file, within one single statement through the complete ETL process.
Loading and Transformation Scenarios

The following sections offer examples of typical loading and transformation tasks:

• Parallel Load Scenario
• Key Lookup Scenario
• Exception Handling Scenario
• Pivoting Scenarios
$arallel *oad cenario
This section presents a case study illustrating how to create, load, inde%, and analy/e a
large data warehouse fact table with partitions in a typical star schema. This e%ample
uses 6=#>#oader to e%plicitly stripe data over 8@ dis4s.
• The e%ample 'A@ F table is named 0a:ts.
• The system is a '@&+* shared memory computer with more than '@@ dis4
drives.
• Thirty dis4s 1G F each3 are used for base table data, '@ dis4s for inde%es, and
8@ dis4s for temporary space. !dditional dis4s are needed for rollbac4
segments, control files, log files, possible staging area for loader flat files, and
so on.
• The 0a:ts table is partitioned by month into 'A partitions. To facilitate bac4up
and recovery, each partition is stored in its own tablespace.
• "ach partition is spread evenly over '@ dis4s, so a scan accessing few partitions
or a single partition can proceed with full parallelism. Thus there can be intra partition parallelism when ueries restrict data access by partition pruning.
• "ach dis4 has been further subdivided using an operating system utility into G
operating system files with names li4e .de.$"-"5 .de.$"-'5 --- 5
.de.$!&-.
• Four tablespaces are allocated on each group of 10 disks. To better balance I/O and parallelize tablespace creation (because Oracle writes each block in a datafile when it is added to a tablespace), it is best if each of the four tablespaces on each group of 10 disks has its first datafile on a different disk. Thus the first tablespace has /dev/D1.1 as its first datafile, the second tablespace has /dev/D4.2 as its first datafile, and so on, as illustrated in Figure 13-4.
Figure 13-4 Datafile Layout for Parallel Load Example
Step 1: Create the Tablespaces and Add Datafiles in Parallel

The following is the command to create a tablespace named TSfacts1. Other tablespaces are created with analogous commands. On a 10-CPU machine, it should be possible to run all 12 CREATE TABLESPACE statements together. Alternatively, it might be better to run them in two batches of 6 (two from each of the three groups of disks).
CREATE TABLESPACE TSfacts1
DATAFILE '/dev/D1.1'  SIZE 1024MB REUSE,
         '/dev/D2.1'  SIZE 1024MB REUSE,
         '/dev/D3.1'  SIZE 1024MB REUSE,
         '/dev/D4.1'  SIZE 1024MB REUSE,
         '/dev/D5.1'  SIZE 1024MB REUSE,
         '/dev/D6.1'  SIZE 1024MB REUSE,
         '/dev/D7.1'  SIZE 1024MB REUSE,
         '/dev/D8.1'  SIZE 1024MB REUSE,
         '/dev/D9.1'  SIZE 1024MB REUSE,
         '/dev/D10.1' SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts2
DATAFILE '/dev/D4.2'  SIZE 1024MB REUSE,
         '/dev/D5.2'  SIZE 1024MB REUSE,
         '/dev/D6.2'  SIZE 1024MB REUSE,
         '/dev/D7.2'  SIZE 1024MB REUSE,
         '/dev/D8.2'  SIZE 1024MB REUSE,
         '/dev/D9.2'  SIZE 1024MB REUSE,
         '/dev/D10.2' SIZE 1024MB REUSE,
         '/dev/D1.2'  SIZE 1024MB REUSE,
         '/dev/D2.2'  SIZE 1024MB REUSE,
         '/dev/D3.2'  SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts4
DATAFILE '/dev/D10.4' SIZE 1024MB REUSE,
         '/dev/D1.4'  SIZE 1024MB REUSE,
         '/dev/D2.4'  SIZE 1024MB REUSE,
         '/dev/D3.4'  SIZE 1024MB REUSE,
         '/dev/D4.4'  SIZE 1024MB REUSE,
         '/dev/D5.4'  SIZE 1024MB REUSE,
         '/dev/D6.4'  SIZE 1024MB REUSE,
         '/dev/D7.4'  SIZE 1024MB REUSE,
         '/dev/D8.4'  SIZE 1024MB REUSE,
         '/dev/D9.4'  SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);
...
CREATE TABLESPACE TSfacts12
DATAFILE '/dev/D30.4' SIZE 1024MB REUSE,
         '/dev/D21.4' SIZE 1024MB REUSE,
         '/dev/D22.4' SIZE 1024MB REUSE,
         '/dev/D23.4' SIZE 1024MB REUSE,
         '/dev/D24.4' SIZE 1024MB REUSE,
         '/dev/D25.4' SIZE 1024MB REUSE,
         '/dev/D26.4' SIZE 1024MB REUSE,
         '/dev/D27.4' SIZE 1024MB REUSE,
         '/dev/D28.4' SIZE 1024MB REUSE,
         '/dev/D29.4' SIZE 1024MB REUSE
DEFAULT STORAGE (INITIAL 100MB NEXT 100MB PCTINCREASE 0);

Extent sizes in the STORAGE clause should be multiples of the multiblock read size, where:

blocksize * MULTIBLOCK_READ_COUNT = multiblock read size

INITIAL and NEXT should normally be set to the same value. In the case of parallel load, make the extent size large enough to keep the number of extents reasonable, and to avoid excessive overhead and serialization due to bottlenecks in the data dictionary.
When PARALLEL=TRUE is used for parallel loader, the INITIAL extent is not used. In this case you can override the INITIAL extent size specified in the tablespace default storage clause with the value specified in the loader control file, for example, 64KB.
Tables or indexes can have an unlimited number of extents, provided you have set the COMPATIBLE initialization parameter to match the current release number, and use the MAXEXTENTS keyword on the CREATE or ALTER statement for the tablespace or object. In practice, however, a limit of 10,000 extents for each object is reasonable. A table or index has an unlimited number of extents, so set the PERCENT_INCREASE parameter to zero to have extents of equal size.
Note:
If possible, do not allocate extents faster than about 2 or 3 for each minute. Thus, each process should get an extent that lasts for 3 to 5 minutes. Normally, such an extent is at least 50 MB for a large object. Too small an extent size incurs significant overhead, which affects performance and scalability of parallel operations. The largest possible extent size for a 4 GB disk evenly divided into 4 partitions is 1 GB. 100 MB extents should perform well. Each partition will have 100 extents. You can then customize the default storage parameters for each object created in the tablespace, if needed.
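The sizing arithmetic in this note can be sanity-checked with a quick calculation. This sketch is not part of the original guide; the constants (120 GB table, 12 monthly partitions, 100 MB extents) come from the case study, while the 1 MB multiblock read size is an assumed illustrative value.

```python
# Back-of-the-envelope check of the extent sizing for the case study.
# Decimal MB-per-GB is used to keep the numbers round.
table_size_gb = 120
num_partitions = 12
extent_size_mb = 100

partition_size_gb = table_size_gb / num_partitions           # 10.0 GB each
extents_per_partition = partition_size_gb * 1000 // extent_size_mb

# Extent size should also be a multiple of the multiblock read size
# (blocksize * MULTIBLOCK_READ_COUNT); 1 MB here is an assumption.
multiblock_read_mb = 1
assert extent_size_mb % multiblock_read_mb == 0

print(partition_size_gb)      # 10.0
print(extents_per_partition)  # 100.0
```

With 100 MB extents, each 10 GB partition allocates on the order of 100 extents, which stays well under the practical 10,000-extent limit mentioned above.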
Step 2: Create the Partitioned Table

We create a partitioned table with 12 partitions, each in its own tablespace. The table contains multiple dimensions and multiple measures. The partitioning column is named dim_2 and is a date. There are other columns as well.

CREATE TABLE facts (dim_1 NUMBER, dim_2 DATE, ...
  meas_1 NUMBER, meas_2 NUMBER, ... )
PARALLEL
PARTITION BY RANGE (dim_2)
 (PARTITION jan95 VALUES LESS THAN ('02-01-1995') TABLESPACE TSfacts1,
  PARTITION feb95 VALUES LESS THAN ('03-01-1995') TABLESPACE TSfacts2,
  ...
  PARTITION dec95 VALUES LESS THAN ('01-01-1996') TABLESPACE TSfacts12);
Step 3: Load the Partitions in Parallel

This section describes four alternative approaches to loading partitions in parallel. The different approaches to loading help you manage the ramifications of the PARALLEL=TRUE keyword of SQL*Loader that controls whether individual partitions are loaded in parallel. The PARALLEL keyword entails the following restrictions:

• Indexes cannot be defined.
• You must set a small initial extent, because each loader session gets a new extent when it begins, and it does not use any existing space associated with the object.
• Space fragmentation issues arise.
However, regardless of the setting of this keyword, if you have one loader process for each partition, you are still effectively loading into the table in parallel.
Example 13-6 Loading Partitions in Parallel: Case 1

In this approach, assume 12 input files are partitioned in the same way as your table. You have one input file for each partition of the table to be loaded. You start 12 SQL*Loader sessions concurrently in parallel, entering statements like these:

SQLLDR DATA=jan95.dat DIRECT=TRUE CONTROL=jan95.ctl
SQLLDR DATA=feb95.dat DIRECT=TRUE CONTROL=feb95.ctl
. . .
SQLLDR DATA=dec95.dat DIRECT=TRUE CONTROL=dec95.ctl

In the example, the keyword PARALLEL=TRUE is not set. A separate control file for each partition is necessary, because the control file must specify the partition into which the loading should be done. It contains a statement such as the following:

LOAD INTO facts partition(jan95)

The advantage of this approach is that local indexes are maintained by SQL*Loader. You still get parallel loading, but at the partition level, without the restrictions of the PARALLEL keyword.

A disadvantage is that you must manually partition the input prior to loading.
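The manual partitioning that Case 1 requires can be done with a small preprocessing script. The following is an illustrative sketch only, not part of the original example: it assumes comma-separated records whose date field is at a known position in MM-DD-YYYY form, and writes one per-month file (jan95.dat, feb95.dat, and so on) ready for the per-partition loader sessions.

```python
import csv
from pathlib import Path

# Month number -> partition-style file name prefix used in the example.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def split_by_month(input_path, date_field=1, out_dir="."):
    """Split a CSV flat file into one .dat file per month (e.g. jan95.dat).

    date_field is the zero-based index of an MM-DD-YYYY column; this layout
    is an assumption for illustration, not part of the SQL*Loader example.
    """
    writers = {}
    files = {}
    with open(input_path, newline="") as f:
        for row in csv.reader(f):
            parts = row[date_field].split("-")   # ["MM", "DD", "YYYY"]
            month = int(parts[0])
            year = parts[2][-2:]                 # two-digit year, e.g. "95"
            name = f"{MONTHS[month - 1]}{year}.dat"
            if name not in writers:
                files[name] = open(Path(out_dir) / name, "w", newline="")
                writers[name] = csv.writer(files[name])
            writers[name].writerow(row)
    for fh in files.values():
        fh.close()
    return sorted(files)
```

Each output file can then be fed to its own SQL*Loader session with the matching control file, as shown above.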
Example 13-7 Loading Partitions in Parallel: Case 2

In another common approach, assume an arbitrary number of input files that are not partitioned in the same way as the table. You can adopt a strategy of performing parallel load for each input file individually. Thus if there are seven input files, you can start seven SQL*Loader sessions, using statements like the following:

SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE

Oracle partitions the input data so that it goes into the correct partitions. In this case all the loader sessions can share the same control file, so there is no need to mention it in the statement.

The keyword PARALLEL=TRUE must be used, because each of the seven loader sessions can write into every partition. In Case 1, every loader session would write into only one partition, because the data was partitioned prior to loading. Hence all the PARALLEL keyword restrictions are in effect.
In this case, Oracle attempts to spread the data evenly across all the files in each of the 12 tablespaces; however, an even spread of data is not guaranteed. Moreover, there could be I/O contention during the load when the loader processes are attempting to write to the same device simultaneously.
Example 13-8 Loading Partitions in Parallel: Case 3

In this example, you want precise control over the load. To achieve this, you must partition the input data in the same way as the datafiles are partitioned in Oracle.

This example uses 10 processes loading into 30 disks. To accomplish this, you must split the input into 120 files beforehand. The 10 processes will load the first partition in parallel on the first 10 disks, then the second partition in parallel on the second 10 disks, and so on through the 12th partition. You then run the following commands concurrently as background processes:

SQLLDR DATA=jan95.file1.dat  DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1.1
. . .
SQLLDR DATA=jan95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D10.1
WAIT;
. . .
SQLLDR DATA=dec95.file1.dat  DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30.4
. . .
SQLLDR DATA=dec95.file10.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D29.4
For Oracle Real Application Clusters, divide the loader sessions evenly among the nodes. The datafile being read should always reside on the same node as the loader session.

The keyword PARALLEL=TRUE must be used, because multiple loader sessions can write into the same partition. Hence all the restrictions entailed by the PARALLEL keyword are in effect. An advantage of this approach, however, is that it guarantees that all of the data is precisely balanced, exactly reflecting your partitioning.
Note:
Although this example shows parallel load used with partitioned tables, the two features can be used independently of one another.
Example 13-9 Loading Partitions in Parallel: Case 4
For this approach, all partitions must be in the same tablespace. You need to have the same number of input files as datafiles in the tablespace, but you do not need to partition the input the same way in which the table is partitioned.

For example, if all 30 devices were in the same tablespace, then you would arbitrarily partition your input data into 30 files, then start 30 SQL*Loader sessions in parallel. The statement starting up the first session would be similar to the following:

SQLLDR DATA=file1.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D1
. . .
SQLLDR DATA=file30.dat DIRECT=TRUE PARALLEL=TRUE FILE=/dev/D30
The advantage of this approach is that, as in Case 3, you have control over the exact placement of datafiles because you use the FILE keyword. However, you are not required to partition the input data by value because Oracle does that for you.

A disadvantage is that this approach requires all the partitions to be in the same tablespace. This minimizes availability.
Example 13-10 Loading External Data
This is probably the most basic use of external tables, where the data volume is large and no transformations are applied to the external data. The load process is performed as follows:
1. You create the external table. Most likely, the table will be declared as parallel to perform the load in parallel. Oracle will dynamically perform load balancing between the parallel execution servers involved in the query.
2. Once the external table is created (remember that this only creates the metadata in the dictionary), data can be converted, moved, and loaded into the database using either a PARALLEL CREATE TABLE AS SELECT or a PARALLEL INSERT statement.
CREATE TABLE products_ext
  (prod_id    NUMBER,
   prod_name  VARCHAR2(50),
   ...,
   price      NUMBER(6,2),
   discount   NUMBER(6,2))
ORGANIZATION EXTERNAL
  (DEFAULT DIRECTORY stage_dir
   ACCESS PARAMETERS
    (RECORDS FIXED 30
     BADFILE 'bad/bad_products_ext'
     LOGFILE 'log/log_products_ext'
     (prod_id    POSITION (1:8)     CHAR,
      prod_name  POSITION (*,+50)   CHAR,
      prod_desc  POSITION (*,+200)  CHAR,
      . . .))
   LOCATION ('new/new_prod1.txt', 'new/new_prod2.txt'))
PARALLEL 5
REJECT LIMIT 200;

-- load it in the database using a parallel insert
ALTER SESSION ENABLE PARALLEL DML;
INSERT INTO products SELECT * FROM products_ext;
In this example, stage_dir is a directory object pointing to where the external flat files reside.

Note that loading data in parallel can also be performed in Oracle9i by using SQL*Loader. But external tables are probably easier to use, and the parallel load is automatically coordinated. Unlike SQL*Loader, dynamic load balancing between the parallel execution servers will be performed as well, because there will be intra-file parallelism. The latter implies that the user will not have to manually split input files before starting the parallel load. This will be accomplished dynamically.
Key Lookup Scenario

Another simple transformation is a key lookup. For example, suppose that sales transaction data has been loaded into a retail data warehouse. Although the data warehouse's sales table contains a product_id column, the sales transaction data extracted from the source system contains Uniform Price Codes (UPC) instead of product IDs. Therefore, it is necessary to transform the UPC codes into product IDs before the new sales transaction data can be inserted into the sales table.

In order to execute this transformation, a lookup table must relate the product_id values to the UPC codes. This table might be the product dimension table, or perhaps another table in the data warehouse that has been created specifically to support this transformation. For this example, we assume that there is a table named product, which has a product_id and an upc_code column.
This data substitution transformation can be implemented using the following CTAS statement:

CREATE TABLE temp_sales_step2 NOLOGGING PARALLEL AS
SELECT
  sales_transaction_id,
  product.product_id sales_product_id,
  sales_customer_id,
  sales_time_id,
  sales_channel_id,
  sales_quantity_sold,
  sales_dollar_amount
FROM temp_sales_step1, product
WHERE temp_sales_step1.upc_code = product.upc_code;
This CTAS statement will convert each valid UPC code to a valid product_id value. If the ETL process has guaranteed that each UPC code is valid, then this statement alone may be sufficient to implement the entire transformation.
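The relational lookup above is simply a join against a small reference table. As a language-neutral illustration (an in-memory sketch with invented sample data, not how the warehouse executes the CTAS), the same substitution looks like this:

```python
# UPC -> product_id lookup, standing in for the product table.
# All codes and IDs here are invented sample values.
product = {"036000291452": 101, "012345678905": 102}

# Rows standing in for the temp_sales_step1 staging table:
# (sales_transaction_id, upc_code, sales_quantity_sold)
temp_sales_step1 = [
    (1, "036000291452", 3),
    (2, "012345678905", 1),
]

# Equivalent of the CTAS inner join: keep only rows whose UPC resolves,
# substituting the looked-up product_id for the UPC code.
temp_sales_step2 = [
    (txn_id, product[upc], qty)
    for (txn_id, upc, qty) in temp_sales_step1
    if upc in product
]
print(temp_sales_step2)  # [(1, 101, 3), (2, 102, 1)]
```

As with the inner join, any row whose UPC code is missing from the lookup table would silently disappear, which is why the next section adds explicit exception handling.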
Exception Handling Scenario

In the preceding example, if you must also handle new sales data that does not have valid UPC codes, you can use an additional CTAS statement to identify the invalid rows:

CREATE TABLE temp_sales_step1_invalid NOLOGGING PARALLEL AS
SELECT * FROM temp_sales_step1
WHERE temp_sales_step1.upc_code NOT IN (SELECT upc_code FROM product);
This invalid data is now stored in a separate table, temp_sales_step1_invalid, and can be handled separately by the ETL process.

Another way to handle invalid data is to modify the original CTAS to use an outer join:

CREATE TABLE temp_sales_step2 NOLOGGING PARALLEL AS
SELECT
  sales_transaction_id,
  product.product_id sales_product_id,
  sales_customer_id,
  sales_time_id,
  sales_channel_id,
  sales_quantity_sold,
  sales_dollar_amount
FROM temp_sales_step1, product
WHERE temp_sales_step1.upc_code = product.upc_code (+);
Using this outer join, the sales transactions that originally contained invalid UPC codes will be assigned a product_id of NULL. These transactions can be handled later.

Additional approaches to handling invalid UPC codes exist. Some data warehouses may choose to insert null-valued product_id values into their sales table, while other data warehouses may not allow any new data from the entire batch to be inserted into the sales table until all invalid UPC codes have been addressed. The correct approach is determined by the business requirements of the data warehouse. Regardless of the specific requirements, exception handling can be addressed by the same basic SQL techniques as transformations.
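Both exception-handling variants have simple procedural analogues. The sketch below uses invented sample data purely for illustration: it separates invalid rows the way the NOT IN subquery does, and mirrors the outer-join behavior of assigning NULL (None) when the lookup fails.

```python
# Invented sample data: one valid and one invalid UPC code.
product = {"036000291452": 101}
temp_sales_step1 = [
    (1, "036000291452", 3),
    (2, "999999999999", 5),   # UPC not present in the lookup table
]

# Equivalent of the NOT IN subquery: isolate rows with unknown UPC codes.
temp_sales_step1_invalid = [
    row for row in temp_sales_step1 if row[1] not in product
]

# Equivalent of the outer join (+): an unknown UPC code yields a NULL
# (None) product_id instead of dropping the row.
temp_sales_step2 = [
    (txn_id, product.get(upc), qty)
    for (txn_id, upc, qty) in temp_sales_step1
]
print(temp_sales_step1_invalid)  # [(2, '999999999999', 5)]
print(temp_sales_step2)          # [(1, 101, 3), (2, None, 5)]
```

Which variant to use depends, as noted above, on whether the business rules allow null product IDs into the sales table or require the whole batch to be held back.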
Pivoting Scenarios

A data warehouse can receive data from many different sources. Some of these source systems may not be relational databases and may store data in very different formats from the data warehouse. For example, suppose that you receive a set of sales records from a nonrelational database having the form:

product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sale