WhitePaper Solution for Staging Area 01


  • 7/30/2019 WhitePaper Solution for Staging Area 01

    1/19

    Solution for Staging Area in Near Real-Time DWH: Efficient in Refresh and Easy to Operate

    Technical White Paper

    Mathias Zarick, Karol Hajdu

    Senior Consultants

    March-2011


    [email protected] . www.trivadis.com . Info-Tel. 0800 87 482 347 . 10.05.2011 . Page 3 / 19

    1. The role of a Staging Area in Data Warehouse

    In data warehouse architectures, there are some common good practices concerning the staging area:

    1. Create a staging area. After being extracted from the source systems, the data is loaded into the staging area. The staging area serves as the input for the transformation processes.

    2. During extraction and load into the staging area, apply only minimal data transformations: the tables in the staging area have the same structure as the corresponding tables in the source system. This makes the ETL architecture much more transparent.

    Based on the staging area's content, the transformation and integration processes produce:
    - snapshots of data, serving as input for the DWH's versioning
    - sets of change events (transactions) to be loaded into the DWH

    1.1 The Challenge called short latency

    In many enterprises, the Data Warehouse is the place where operative data originating from different systems comes together and is integrated with analytical or dispositive data. Step by step, many business users have discovered the value of integrated data. They use the data stored in the Data Warehouse to build reporting or analytical applications.

    As the markets in many lines of business become more and more volatile, business users are no longer willing to wait several days or hours for the latest figures. They require a shorter latency of the Data Warehouse: the need for a near real-time data warehouse was born.

    Integration tasks consume both hardware resources and time. Hence, Data Warehouse architects face a new challenge: finding a trade-off between speed (short data latency) and integrated, cleansed data. Some of them decided to introduce additional redundancy (by creating an Operational Data Store, which has short latency but less integration). Others decided to provide short latency only for very narrow and well-specified content: they speak about real-time data warehouse content rather than a real-time data warehouse.

    Regardless of which approach the Data Warehouse architect has chosen, he or she needs a Staging Area with short latency. This is the subject of this white paper.

    1.2 Different solutions having different advantages

    There are several technical approaches to implementing data extraction and the loading of a staging area. The technical implementations differ mainly in the following characteristics:

    - transferred data volumes required to refresh the staging area


    2. Solution with Data Guard: the management perspective

    2.1 Benefits for Data Warehouses

    Our experience shows that the solution described in this paper brings the following benefits for Data Warehousing:

    Benefit | How is this achieved?
    Short latency of data stored in the DWH or ODS (down to near real-time) | The refresh process of the Staging Area and/or Operational Data Store (ODS) is very efficient. It consumes only a small amount of hardware resources and completes in a short elapsed time.
    Shorter time-to-market for new ETL functionality | The solution allows the Staging Area to contain the full set of data (not only the changed records). This makes the ETL application more transparent. Introducing changes in ETL applications is then less complex.
    Easy and stable operation | While the tables in the Staging Area are refreshed, the operational complexity is delegated to standard and reliable Oracle products and features. These features are easy to operate.

    2.2 Which types of Data Warehouses will benefit most?

    Extraction and sourcing from dedicated online transaction applications that manage complex relationships between customers, suppliers, accounts or delivery components (applications like CRM or SCM^2) can be very hard. The underlying database schemas of these applications are based on complex data models^3. Companies using dedicated CRM or SCM applications often have to manage the life cycle of several million individual subjects (such as customers, suppliers, contracts, product components, stock keeping units, policies, etc.).

    A Staging Area, or even an Operational Data Store (ODS), built with the solution described in this paper benefits most if:

    - The source system holds a huge data volume with complex relationships and has a relatively small rate of data changes.
    - There are reports with short latency requirements. The Staging Area needs to capture the changes made in the online applications with a very short latency.

    2 Supply Chain Management
    3 A lot of relationships between the tables


    3. Solution with Oracle Data Guard: the technical insight

    3.1 How it works?

    The solution presented in this article is based on Oracle Data Guard. Data Guard technology maintains standby databases, which are copies of primary databases. A Data Guard standby database can also be used for refreshing the staging area in a data warehouse.

    The main idea is based on Data Guard's ability to open a physical standby database temporarily read-write and to rewind it back to the point in time when it was opened. This is achieved by using Oracle guaranteed restore points and flashback technology.

    How can this be used for refreshing a staging area?

    Let's explain it using Figure 1. On the data warehouse machine (host DWH), a physical standby for the database OLTP is configured. The primary database of OLTP is on host OLTP.

    This setup leads to the following situation: with Data Guard, any change made on the primary database is applied to the standby database as well.

    [Figure 1 diagram: the primary database OLTP (OLTP_SITE1, host OLTP, primary site) sends redo via redo transport to the physical standby database OLTP (OLTP_SITE2, standby site) on host DWH. The standby holds tablespace CRM in datafile crm01OLTP.dbf and resides next to database DWH with its Staging Area and Core DWH.]

    Figure 1: On the DWH machine, a physical standby database of OLTP is configured with Data Guard


    Reading from the Staging Area:

    As soon as an ETL process inside the DWH database needs to read the content of the staging area, the following actions are taken:

    - The recovery process on the standby is paused.
    - The physical standby database is converted to a snapshot standby database. This opens the standby database read-write.
    - Using the transportable tablespaces feature, the tablespace CRM of the snapshot standby database is plugged into the database DWH:
        o The tablespace CRM in the snapshot standby database is set to read-only mode.
        o The metadata (definitions of tables, indexes, etc.) of this tablespace is transferred with data pump from the snapshot standby database to the DWH database^4.
    - Datafile crm01OLTP.dbf is now part of both databases (snapshot standby database OLTP and database DWH). In both databases the tablespace is in read-only mode.
    - The ETL process can read the data from the staging area.

    [Figure 2 diagram: on host DWH, tablespace CRM (datafile crm01OLTP.dbf) is accessed read-only by both the physical / snapshot standby database OLTP (OLTP_SITE2) and database DWH, where it forms part of the Staging Area next to the Core DWH. Redo transport from the primary database OLTP (OLTP_SITE1, host OLTP) continues.]

    Figure 2: On the DWH machine, datafile crm01OLTP.dbf is part of both databases (read-only)

    4 For convenient handling of this transfer with data pump a database link can be used.


    Refreshing the Content in the Staging Area:

    As soon as more current content is needed, that is, the CRM part of the staging area has to be refreshed, the following actions are taken:

    - The plugged-in tablespace CRM is dropped from the DWH database.
    - The snapshot standby database is converted back to a physical standby database.
    - This resumes the recovery process of all its datafiles, including those of the tablespace CRM.

    [Figure 3 diagram: on host DWH, tablespace CRM has been dropped from database DWH, and the datafile crm01OLTP.dbf is under recovery again in the physical / snapshot standby database OLTP (OLTP_SITE2), fed by redo transport from the primary database OLTP (OLTP_SITE1, host OLTP).]

    Figure 3: Tablespace CRM is dropped from the DWH database; the standby database is converted back to a physical standby


    3.2 The Key Advantages

    This solution has the following key advantages:

    - The staging area contains the full set of data.
    - There is no additional workload on the host OLTP.
    - Datafiles with the full set of data are neither transferred nor copied.
        o The volume of data transferred between OLTP and DWH is determined merely by the volume of data changes (size of the archived redo logs).
    - The elapsed time of the refresh process of the staging area (which is the refresh of the standby database) does not include the elapsed time to copy the archived redo logs from host OLTP to host DWH:
        o The standby site is able to receive logs from the primary database in both physical standby mode and snapshot standby mode. In snapshot standby mode, the logs are queued and not applied.
        o Since the log transport to the standby site is running all the time, the outstanding archived redo log files are already registered and available for the recovery^5 of the physical standby database when the recovery process resumes.
    - The elapsed time of the refresh process of the staging area does not depend on the tablespace size, but only on the volume of data changes since the last refresh.
    - Once configured, both physical standby databases and transportable tablespaces are easy to operate and maintain.
    - Neither remote queries nor distributed joins are used.
    - On the DWH database, the access methods to the data residing in the transported tablespace(s) can be adjusted as follows:
        o estimation of additional statistics like histograms
        o manipulation of statistics
        o creation of additional data structures like indexes or materialized views

    Considering the overhead produced on the source system and the workload produced on the DWH machine, the solution presented in this article is the most efficient one:

    - Only the redo logs, and no additional structures, are used.
    - It works on the level of changes to data blocks and not on the level of SQL statements.

    5 Transported redo logs are applied in physical standby mode only.


    If refreshing the staging area is the only purpose of the standby database on the DWH machine, the elapsed time of the refresh process can be minimized by narrowing the scope of the recovery process on the standby database to only those tablespaces of the OLTP database that need to be read by the ETL application.

    Usually, the ETL processes in a DWH require other index types than an OLTP application. If the indexes of an OLTP schema reside in a separate tablespace, excluding them can speed up the recovery process.

    Irrelevant tablespaces can easily be excluded by taking their datafiles offline and deleting them on the standby database^6.
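    As an illustration, excluding a tablespace could look like the sketch below. It assumes the statement is run on the mounted physical standby while redo apply is stopped. The datafile name is hypothetical (an index tablespace that the ETL does not need) and is not taken from this paper, so the statement should be tested in your own environment first.

```sql
-- On the standby (mounted, redo apply stopped): take the unneeded datafile offline.
-- The file name is a placeholder chosen for this example.
ALTER DATABASE DATAFILE 'd:\oradata\oltp\crm_idx01oltp.dbf' OFFLINE DROP;

-- List the datafiles and their status to see which ones recovery will now skip.
SELECT file#, name, status
  FROM v$datafile
 ORDER BY file#;
```

    Once the datafile is offline, the file itself can be deleted at operating system level on the standby host.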

    The standby database on the DWH machine can be configured to serve two purposes at the same time: refresh of the staging area and disaster protection of the OLTP database.

    When considering this approach, be aware of the following impacts:

    - A standby database with offline datafiles cannot be used for disaster protection.
    - If the protection mode MaxAvailability or MaxProtection is considered, the availability of or the workload on the DWH machine can impact the availability or the performance of the OLTP database.

    3.3 Technical Prerequisites

    Some technical prerequisites have to be fulfilled in order to use the solution described. They can be grouped into the following categories:

    - Identical database character set
    - Self-contained tablespace sets
    - Required Oracle database releases
    - Required Oracle licenses

    3.3.1 Identical Database Character Set

    In order to use transportable tablespaces, the database OLTP and the database DWH must have an identical database character set and an identical national character set.
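    A quick way to verify this prerequisite is to compare the character set parameters of both databases. The sketch below uses the standard dictionary view NLS_DATABASE_PARAMETERS; run it once on OLTP and once on DWH and compare the results.

```sql
-- Run on both the OLTP and the DWH database; the two result sets must match.
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')
 ORDER BY parameter;
```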

    3.3.2 Self-contained Tablespace Sets

    A set of tablespaces can only be transported if it is self-contained. This means that you cannot transport a set of tablespaces containing objects with dependent objects (such as materialized views, table partitions, etc.) unless you transfer all those objects together in one set^7.
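    Whether a tablespace set is self-contained can be checked in advance with the DBMS_TTS package. The following is a minimal sketch for the tablespace CRM from this paper; it assumes a session with sufficient privileges (typically a DBA account holding EXECUTE_CATALOG_ROLE).

```sql
-- Check the tablespace set; TRUE also takes referential constraints into account.
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('CRM', TRUE);

-- An empty result means the set is self-contained.
SELECT violations FROM transport_set_violations;
```

    For several tablespaces, the first argument takes a comma-separated list of tablespace names.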

    6 Tablespaces that are needed for opening the database, like SYSTEM, SYSAUX and UNDO, cannot be excluded.
    7 Segmentless objects like sequences, views and PL/SQL packages are not transferred with transportable tablespaces. Normally you don't need to transfer them into the Staging Area anyway.


    3.3.3 Required Oracle Database Release

    The OLTP database needs to be operated with Oracle Database release 10g or higher. Oracle 11g is recommended, as the snapshot standby database feature is available as of this release.

    With Oracle 10g, this functionality has to be emulated manually by creating a guaranteed restore point on the standby database before opening it read-write. The following limitations also have to be considered when running Oracle 10g:

    - There is no out-of-the-box handling with Data Guard for this functionality. You will need to develop a piece of code, but this is quite straightforward.
    - The redo transport between primary and standby is stopped during the period when the standby is open read-write^8.
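    The manual emulation on Oracle 10g could be sketched roughly as follows. This is an outline under stated assumptions, not the authors' code: the restore point name before_open_rw is made up for this example, the standby is assumed to be served by LOG_ARCHIVE_DEST_2 on the primary, and all commands are assumed to run in SQL*Plus connected AS SYSDBA with a flash recovery area configured on the standby.

```sql
-- 1. On the standby: stop redo apply and mark the point to return to.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
CREATE RESTORE POINT before_open_rw GUARANTEE FLASHBACK DATABASE;

-- 2. On the primary: archive the current log, then defer redo transport.
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = DEFER;

-- 3. On the standby: activate it and open it read-write.
ALTER DATABASE ACTIVATE STANDBY DATABASE;
STARTUP MOUNT FORCE;
ALTER DATABASE OPEN;

-- ... the ETL processes read the staging area here ...

-- 4. On the standby: rewind and turn it back into a physical standby.
STARTUP MOUNT FORCE;
FLASHBACK DATABASE TO RESTORE POINT before_open_rw;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
STARTUP MOUNT FORCE;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
DROP RESTORE POINT before_open_rw;

-- 5. On the primary: resume redo transport.
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = ENABLE;
```

    Steps 2 and 5 are the reason for the second limitation above: while the destination is deferred, no redo reaches the standby, so the backlog has to be shipped before recovery can catch up.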

    In order to use transportable tablespaces in this context, the DWH database needs to be at the same or a higher release than the OLTP database.

    3.3.4 Required Oracle Licenses

    This solution requires an Oracle Enterprise Edition license for both the OLTP host and the DWH host. All required features, such as Data Guard, transportable tablespaces and snapshot standby databases, are included in the Enterprise Edition license.

    No additional option is required for this solution: neither the Active Data Guard^9 option nor the Partitioning option.

    8 As mentioned before, with a snapshot standby database as of 11g the log transport stays active all the time.
    9 Active Data Guard is a new, separately licensed option with 11g which includes real-time query and fast incremental backup. Neither of these features is required by the described solution.


    4. Take a Tour on Real-Life Example

    To demonstrate our approach on a representative sample, we use an excerpt from the database schema of the CRM application Siebel. We chose Siebel to improve the readability of this example: Siebel is a widely used CRM application owned by Oracle, and hence there is a higher chance that ETL developers are familiar with the data model behind it.

    It is important to understand that the described solution works with any other system or application too, even a non-standard, in-house developed software application^10.

    We took the Siebel tables S_CONTACT, S_ORG_EXT and S_ASSET as representatives of a set of approximately 15 Siebel tables with complex relationships and high cardinality.

    4.1 Real-life example

    Consider the following common Data Warehouse situation:

    Transformation processes have to read the content of the Siebel tables and transform it into a new entity, let's call it Customer Subscription (refer to Figure 3).

    Figure 3: Transformation process reads Siebel tables and transforms the data into the new entity Customer Subscription

    The Data Warehouse has to store not only the latest status of the Customer Subscriptions, but also all the historical values. The ETL has to compare the new snapshot of Customer Subscriptions with the latest one and, in case of changes, create new versions which keep track of the fragmented history. This concept is known as versioning (refer to Figure 4).

    10 as long as the data is stored in Oracle RDBMS


    [Figure 4 diagram: inside database DWH, the step "Derive Customer Subscription" reads the Staging Area tables S_ASSET, S_ORG_EXT and S_CONTACT in tablespace CRM (row counts labelled in the diagram: 0.5 mio rows, 0.5 mio rows, 2.5 mio rows) and produces the latest snapshot. The snapshot is compared with the highest version from the history in the Core DWH table C_CUST_SUBSCRIPTION. The resulting DELTA (new rows, updated rows, deleted rows) feeds the step "Historize Subscription", which creates new versions and closes versions.]

    Figure 4: The ETL compares the new snapshot of Customer Subscriptions with the latest one and, in case of changes, creates new versions which keep track of the fragmented history
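    The comparison shown in Figure 4 can be sketched in SQL as follows. This is only an illustration under stated assumptions, not the ETL code used in the project: the table NEW_CUST_SUBSCRIPTION (the freshly derived snapshot), the columns SUBSCRIPTION_ID and STATUS, the open-end date 31.12.9999 and the bind variable :snapshot_ts are all hypothetical. Only C_CUST_SUBSCRIPTION, VALID_FROM and VALID_TO appear in this paper.

```sql
-- Step 1: close the open version of every subscription that was changed or deleted.
-- STATUS stands for all compared attributes and is assumed to be NOT NULL.
UPDATE c_cust_subscription c
   SET c.valid_to = :snapshot_ts
 WHERE c.valid_to = DATE '9999-12-31'
   AND NOT EXISTS (SELECT 1
                     FROM new_cust_subscription n
                    WHERE n.subscription_id = c.subscription_id
                      AND n.status          = c.status);

-- Step 2: create a new version for every new or changed subscription,
-- that is, for every snapshot row that has no open version after step 1.
INSERT INTO c_cust_subscription (subscription_id, status, valid_from, valid_to)
SELECT n.subscription_id, n.status, :snapshot_ts, DATE '9999-12-31'
  FROM new_cust_subscription n
 WHERE NOT EXISTS (SELECT 1
                     FROM c_cust_subscription c
                    WHERE c.subscription_id = n.subscription_id
                      AND c.valid_to = DATE '9999-12-31');
```

    The order of the two statements matters: because step 1 closes the versions of changed rows first, step 2 can treat new and changed subscriptions alike. Unchanged rows keep their open version and are touched by neither statement.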

    Consider the following design decisions of a Data Warehouse architect:

    - Due to the many inner joins and filters inside the query, the Staging Area needs to hold the full set of data.
    - Transferring millions of rows from the source system to the Staging Area every night is not an option.
    - In the source system, no reliable row markers or journals exist or can be introduced.
    - The architect decided to use the solution described in this white paper.

    Because of the high cardinality of the data set (several million rows), good scalability of the underlying database^11 is assumed.

    In the next sections we present the most important steps to build and operate this solution.

    4.2 Setup and Configuration

    Oracle Data Guard has been set up as described above in chapter 3.

    On both the OLTP database and the DWH database, we used Oracle Database 11.2.0.2.0.

    We created a Data Guard Broker configuration. We left the protection mode at Maximum Performance (the default) and set the log transport to asynchronous.

    11 including the physical data model of CORE DWH


    To enforce logging on database OLTP, we issued the following statement:

    ALTER DATABASE FORCE LOGGING;

    This causes every attempt at an unrecoverable NOLOGGING operation to be logged anyway^12.
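    Whether the setting is active can be checked with a simple query against V$DATABASE, shown here as a small sketch:

```sql
-- Run on the OLTP primary; FORCE_LOGGING shows YES once force logging is enabled.
SELECT force_logging FROM v$database;
```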

    4.2.1 Create Role with Common Name in Both Databases

    In both the database DWH and the database OLTP, the role dwh_sa_crm_role was created:

    CREATE ROLE dwh_sa_crm_role;

    Grant the SELECT privilege on the Siebel tables to this role in the OLTP database:

    GRANT SELECT ON s_contact TO dwh_sa_crm_role;
    GRANT SELECT ON s_org_ext TO dwh_sa_crm_role;
    GRANT SELECT ON s_asset TO dwh_sa_crm_role;

    You will also need to create the owner of the transported tables on the DWH database:

    CREATE USER crm IDENTIFIED BY thisIsASecretPassword;

    Neither the CREATE SESSION nor the CREATE TABLE privilege is necessary for this user.

    4.3 Operation

    Let's take a look at the operation of this solution.

    From the point of view of the CRM data in the staging area, there are two main operational states:

    - A snapshot of the latest CRM data is available in the staging area (Status A)
    - A refresh of the CRM data in the staging area is in progress (Status B)

    [Figure 5 diagram: a timeline. On the OLTP side, CRM users change the operative data around the clock (24/7). On the DWH side, periods of Status A (snapshot of the latest CRM data in the Staging Area is available for read) alternate with periods of Status B (refresh of CRM data in the Staging Area).]

    Figure 5: Two main operational states for CRM data in the Staging Area

    Most of the time, the CRM data in the staging area is available for read (Status A). From time to time you need to refresh the data in the staging area: during this period, the data is not available (Status B).

    12 As for any other change in database instance parameters: an impact analysis is required before making this change.


    Transitions between these two operational states are usually triggered by one of the following two events:

    - ETL processes need more current CRM data (A to B)
        o This event triggers the start of the refresh process.
        o The goal of the refresh process is to reach a given (defined) point in time for the snapshot.
    - ETL processes need to read CRM data again (B to A)
        o As soon as the CRM data in the physical standby is current enough, the refresh process is terminated and the transition to Status A takes place.
        o This event triggers the immediate termination of the refresh process and causes the transition to Status A.

    In the next paragraphs we describe:
    - the actions related to the termination of the refresh process and
    - the actions related to the start of the refresh process

    4.3.1 Termination of Refresh Process

    As long as the refresh process is in progress, the CRM data in the staging area is not available. The datafiles of the tablespace CRM^13 on the host DWH are exclusively assigned to the physical standby database of OLTP, called OLTP_SITE2, for recovery.

    In order to terminate the refresh process, the following sequence of actions is taken:

    First, the physical standby is converted to a snapshot standby database. This is performed as follows:

    DGMGRL> connect sys@OLTP_SITE2
    Password:
    Connected.
    DGMGRL> convert database 'OLTP_SITE2' to snapshot standby
    Converting database "OLTP_SITE2" to a Snapshot Standby database, please wait...
    Database "OLTP_SITE2" converted successfully

    Second, the tablespace is set to read-only and plugged into the DWH database with data pump via a database link from DWH to the snapshot standby database^14:

    SQL> alter tablespace crm read only;
    # impdp system@DWH logfile=imp_crm.log network_link=OLTP_SNAP transport_tablespaces=CRM transport_datafiles=d:\oradata\oltp\crm01oltp.dbf
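    After the import, the result can be verified on the DWH database. The queries below are a small sketch using the standard dictionary views DBA_TABLESPACES and DBA_TABLES; for a successfully transported tablespace, the first query should report the status READ ONLY and PLUGGED_IN = YES.

```sql
-- Run on the DWH database after impdp has finished.
SELECT tablespace_name, status, plugged_in
  FROM dba_tablespaces
 WHERE tablespace_name = 'CRM';

-- List the transported tables owned by the user CRM.
SELECT table_name
  FROM dba_tables
 WHERE owner = 'CRM'
   AND tablespace_name = 'CRM'
 ORDER BY table_name;
```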

    13 Of course this concept can be extended to transfer multiple tablespaces.
    14 When running 11.2.0.2, due to Oracle Bug 10185688 it is required that either XDB is loaded into the source database or the related patch is applied.


    In order to transport the metadata, you may also use alternatives such as:

    - exporting the metadata with data pump to a dump file and importing from that dump file instead of using a database link
    - export / import with classical exp/imp^15
    - initiating data pump directly from PL/SQL, see [2] for details

    When transferring the metadata, you can also decide whether to include or exclude certain tables. You can also choose whether to import indexes, object privileges, table triggers and table constraints.

    As the last step, a deterministic function is created in the DWH database. The return value of this function reflects the timestamp of the CRM data. We used the following PL/SQL code:

    declare
      sql_text    varchar2(1000);
      v_timestamp varchar2(20);
    begin
      select to_char(timestamp,'DD.MM.YYYY HH24:MI:SS') into v_timestamp
      from (select timestamp from gv$recovery_progress@OLTP
            where item = 'Last Applied Redo'
            order by start_time desc)
      where rownum < 2;
      dbms_output.put_line('timestamp is ' || v_timestamp);
      sql_text := 'create or replace function crm.SA_CRM_SNAPSHOT_TIMESTAMP return date
      deterministic is
        ts date;
      begin
        select to_date ('''|| v_timestamp ||''', ''DD.MM.YYYY HH24:MI:SS'') into ts from dual;
        return ts;
      end;';
      execute immediate sql_text;
      execute immediate 'GRANT EXECUTE ON crm.SA_CRM_SNAPSHOT_TIMESTAMP to DWH_SA_CRM_ROLE';
    end;
    /

    Listing 1: In the DWH database, this creates a function which returns the timestamp of the data in the CRM tablespace

    This function is used by the ETL processes during the versioning operation (Figure 4): it is used to build the value of the VALID_FROM attribute of new versions and of the VALID_TO attribute of versions to be closed.

    15 Deprecated with 11g but worked in our case.


    4.3.2 Start of Refresh Process

    In order to start the refresh process, the following sequence of actions has to be taken:

    First, the tablespace CRM has to be dropped from the DWH database. After this, any query on the data will fail, as the data is not available. Dependent views, synonyms, stored PL/SQL procedures, etc. become invalid^16. The statement deliberately omits the AND DATAFILES clause, because the datafile still belongs to the standby database. Second, the snapshot standby database is converted back to a physical standby database:

    SQL> drop tablespace crm including contents;
    DGMGRL> convert database 'OLTP_SITE2' to physical standby

    If you need to check whether your physical standby database with the CRM tablespace in OLTP_SITE2 is current enough again to be used for the next integration load cycle, you can easily query the Data Guard Broker:

    DGMGRL> show database 'OLTP_SITE2';

    Database - OLTP_SITE2

      Role:            PHYSICAL STANDBY
      Intended State:  APPLY-ON
      Transport Lag:   0 seconds
      Apply Lag:       11 minutes 23 seconds
      Real Time Query: OFF
      Instance(s):
        oltp

    Database Status:
    SUCCESS
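    If the check is to be scripted from SQL rather than from DGMGRL, the same lag figures can be read from the dynamic view V$DATAGUARD_STATS. The following sketch assumes a session on the standby database OLTP_SITE2:

```sql
-- Run on the standby database; VALUE holds the lag as an interval string.
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```

    An ETL scheduler could poll this query and terminate the refresh process (section 4.3.1) once the apply lag falls below the required threshold.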

    As the refresh is a parallel media recovery, the process is very efficient. Media recovery works on the level of block changes and is much faster and less resource-consuming than the mechanisms of GoldenGate and Streams, where SQL is extracted and processed row by row.

    The presented real-life example clearly demonstrates the high efficiency and the easy and stable operation of this solution.

    5. Solution extension: If data availability for operational reporting matters

    There is yet another challenge for today's DWH architects: where to place the Operational Reporting?

    - The OLTP database is becoming a less and less suitable place due to the heavy workload caused by the complex query logic inside the Operational Reports.
    - Many Operational Reports query not only the data residing in the OLTP system, but also additional analytical attributes which are typically stored in a Core DWH.

    With the solution presented in this white paper, the DWH architect can consider using the data residing in the Staging Area for Operational Reporting^17 as well. As this data resides in the data warehouse database, it can be joined with analytical attributes in the Core DWH without performance impacts (no distributed queries).

    16 They become valid again automatically when they are used after the tablespace reappears in the next cycle.

    However, one point has to be taken into consideration: during the refresh, the data in the Staging Area is not available (refer to Figure 5). This unavailability needs to be eliminated.

    The snapshot functionality of the operating system and/or the storage facility can be used to overcome this.

    The concept: after the standby database is turned into a snapshot standby and the tablespaces are set to read-only, snapshots of the data files are created. These snapshots are then plugged into the DWH database instead of the standby database's data files.

    This results in two advantages:

    - The data in the Staging Area is available almost all^18 the time.
    - The recovery of the tablespaces can go on, as the standby database can be converted from snapshot standby back to a physical standby right after taking the snapshot of the data files. This achieves an even shorter latency for the refresh cycles.

    Note: the snapshots do not copy the data. The data is presented a second time. Later changes are tracked for both sets of data: origin and snapshot. This is known as the copy-on-write (COW) mechanism.

    Examples of OS-side snapshotting:

    ZFS on Solaris offers copy-on-write snapshots. Snapshots are also possible with the Veritas file system, LVM snapshots on Linux and Microsoft Volume Shadow Copy on Windows. SAN and NAS systems also offer snapshot features that work with the COW mechanism.

    By using the know-how of Trivadis, we believe it is possible to reduce the operating costs and the complexity of your data warehouse: proper design is what matters!

    Contact

    Karol Hajdu [email protected]

    Mathias Zarick [email protected]

    Trivadis Delphi GmbH

    Millennium Tower

    Handelskai 94-96

    A-1200 Vienna

    Tel.: +43 1 332 35 31 00

    www.trivadis.com

    Please contact us if you need more information or help with your setup.

    17 At least for that part of reporting where the integrity level of the data in the Staging Area is sufficient.
    18 A short downtime will still occur during tablespace drop and re-plug-in.


    Literature and Links

    [1] Oracle Data Guard Concepts and Administration, http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/toc.htm

    [2] Oracle Database PL/SQL Packages and Types Reference, Chapter 46, http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_datpmp.htm

    [3] Jordan et al.: Data Warehousing mit Oracle Business Intelligence in der Praxis, Chapter 3.4. Hanser, 2011.
