Version 2.3 Real-time High Volume Data Replication White Paper



Real-time and High Volume Data Replication


Table of Contents

1 HVR overview
  1.1 HVR Usage Scenarios
  1.2 Product Overview
2 Technology
  2.1 Architecture and Key Capabilities
  2.2 Continuous Database Replication
  2.3 Database Compare and Refresh/Repair
  2.4 File Replication
  2.5 HVR Management and Operations
  2.6 Platform Support
3 About HVR – High Volume Replication


1 HVR overview

Companies and organizations rely ever more on intensive flows of information to carry out or sustain their business processes. The information needs to be available at the right time and in the right place. As a result, IT systems must be able to distribute large amounts of data to a multitude of computing platforms in geographically dispersed locations. And in today’s ultra-connected world, this must be achieved in near real-time. This is where HVR comes in.

HVR is a software package that can be deployed to configure and perform data replication and synchronization between various kinds of databases and other types of data repositories within distributed computing environments. Using HVR, enterprises can easily manage very large and sophisticated data integration scenarios from a central point of control. HVR is designed for infinite scalability and for the most efficient use of network and computing resources, while aiming for minimum latency in data transfers.

Figure 1. HVR Real-time Data Integration

All of HVR’s functionality is provided by a single product and can be used interchangeably without extra configuration. HVR can coexist with other integration solutions. It can operate in conjunction with an Enterprise Service Bus (ESB) for integration closer to the application layer, or with an ETL (Extract, Transform, Load) tool when extensive transformations are required in addition to real-time integration. HVR’s open architecture simplifies cooperation with other integration technologies.


1.1 HVR Usage Scenarios

HVR’s unique set of capabilities makes it a very versatile integration tool that can be applied as a solution in a wide variety of usage scenarios. The first four use cases center around trends in the data integration space.

Real-time analytics, Business Intelligence and reporting

More and more companies are using their own data to obtain a competitive advantage through sophisticated analysis and reporting. In many cases data moves from transactional systems, often more than one, into a consolidated system optimized for analytics. Today’s competitive environment increases the need for up-to-date information in support of much more operational and personalized analytics. This kind of “operational analytics” requires real-time integration and consolidation from all relevant data sources. HVR fulfills these requirements through its real-time data integration capabilities and support for a variety of analytical database technologies.

Figure 2. Real-time Data Integration trends

Big data

Data volumes are ever increasing as organizations want to store and mine all available data, wherever it is generated. Besides transactional data, organizations also want to store behavioral data like click paths and social media interactions. Add data emitted by devices such as sensors and mobile phones and, even looking at volume alone, one could say this is big data. Organizations will somehow need to bring this data together to make sense of it, often using technologies such as Hadoop and NoSQL databases that were not traditionally part of the data center. HVR’s heterogeneous nature and efficient operations make it ideally suited to support big data use cases with its scalable infrastructure and support for structured and unstructured data.

Hybrid Cloud Computing

To many organizations, cloud computing services offer advantages in terms of cost flexibility, scalability, availability and speed of deployment. As a result, many organizations are migrating parts, if not all, of their IT environment to the cloud. With this, at least temporarily, the IT landscape turns into a combination of cloud-hosted and internally hosted applications. IT departments are now faced with the challenge of exchanging, replicating and synchronizing data between these applications. HVR’s rich capabilities and focus on performance, efficiency and security make it particularly suitable for cloud environments. HVR is specifically adapted to the most popular cloud environments.


Data Integration: Separate => Single solutions

Enterprise Integration covers a multitude of the use cases mentioned before. HVR provides a very versatile technology that can be used for many use cases, and multiple organizations have selected HVR to consolidate a variety of different technologies into a single real-time integration environment that is easy to use and maintain.

On top of the trends there are still a number of traditional data integration use cases.

Geographical Distributed Computing

Some organizations have to spread their computing systems over a number of locations to minimize latency and offer a great user experience across the globe. Globally operating companies often run IT services from regional data centers to ensure local availability and performance. Organizations that operate on a regional level may find it useful to retain data and computing facilities in local branch offices, especially if the local availability of business applications is business-critical and network performance is relatively expensive or unreliable. HVR can be deployed to selectively distribute large databases and files between geographically dispersed sites, reliably and in real-time, with minimum use of network resources.

Figure 3. Geographical Distributed Computing

High Availability

Business-critical operations often warrant multiple copies of the data and the databases to provide failover and disaster recovery solutions, and to minimize downtime during planned or unplanned outages. Multiple copies of the data also allow for load-balancing of concurrent systems to avoid degradation of performance. HVR can be used to create multiple database and file copies. Because of HVR’s fast and efficient replication and transport mechanisms, there will be minimal, if any, data loss in case of a failover.


Figure 4. High availability between databases

Migrations

In every large IT environment migrations are a fact of life. They may range from operating system and software upgrades, through server installations or upgrades, to large-scale projects implementing new platforms or business applications. Such migrations often introduce downtime and may introduce integration challenges when legacy systems have to integrate with new applications, either temporarily or on an ongoing basis. The combined benefits that HVR provides for data integration and High Availability scenarios make it a well-suited solution for migrations.

Figure 5. Cross-platform migrations


1.2 Product Overview

HVR offers a number of powerful features that can be grouped into three major functionalities. These are provided within a common software framework and can be managed from a single integrated management console.

Figure 6. HVR framework

With those functionalities, HVR can support the total data integration lifecycle: from table creation and initial data load, through continuous integration and compare and repair, to decommissioning.

Continuous Database Replication

HVR supports continuous replication using log-based capture between databases within large distributed computing environments. Changes applied to the source database are detected by HVR in real-time and transmitted over the network to be copied to one or more target databases. Replication schemes can be quite complex. Implementations vary from a simple one-to-one replication on identical systems, to multi-way active/active environments, to multi-way distribution into a variety of different database technologies. All replication scenarios are configured, initiated and monitored from a single point of control within the enterprise, through a highly intuitive Graphical User Interface.

Replication may be applied to entire databases, but can also be done selectively on specific tables within a database or even on rows within a table. Data replication may be configured between instances of different DBMS products or versions, for example between OLTP databases such as Oracle and MS SQL Server, or into analytical databases such as Teradata, Pivotal Greenplum or Actian Vector. HVR’s algorithms are optimized to make the replication processes as fast and efficient as possible. For example, transactions are captured efficiently using log-based change data capture, data is highly compressed when sent over the network, and whenever possible the software uses native DBMS interfaces to connect or load data.


Database Compare and Refresh/Repair

HVR Refresh is used during the initial load process. Optionally, absent tables can be created (with keys) as part of this process. HVR can also be used to compare the contents of different databases. If there are differences, refresh/repair can be used to bring the databases in sync. This may be required before enabling real-time replication, or after recovering a database that takes part in a replication and has crashed. In this way HVR supports the complete replication lifecycle.

As with continuous real-time replication, Database Compare and Refresh can be applied to heterogeneous DBMS instances and operates fast and efficiently. Any transformations defined in HVR are considered when performing the comparison.

File replication

HVR offers the functionality to schedule, execute and monitor complex chains of file transfers. Files can be exchanged between multiple platforms, including Unix/Linux, Windows, Microsoft SharePoint and ftp/sftp locations. Optionally, extensive and flexible rules can be applied to select, route and rename files. Managed File Transfer can be used simply to manage file streams, but also in conjunction with data integration (e.g. picking up data from a database and delivering it onto a big data platform like Hadoop).

Figure 7. File replication

Though functionally distinct, all capabilities are delivered within the same integrated software package using a common architecture. As a result, all use cases take advantage of the same architecture in terms of scalability, performance, efficiency, and single control through an intuitive graphical user interface (offering a low TCO and quick time to market). HVR excels in complex real-time data integration scenarios in heterogeneous environments.


2 Technology

2.1 Architecture and Key Capabilities

Internally HVR uses a common software framework built from shared code components. All functions (refresh, continuous replication, compare and repair, and managed file transfer) use this framework. HVR tasks operate on databases and file stores that are generically referred to as “locations”.

One server running the HVR software must be assigned a central role and is called the HVR hub. The hub serves as the central point from which all processes are run and tasks are scheduled, monitored and logged. The hub typically interacts with other installations of HVR called agents. The hub also interfaces with the HVR Graphical User Interface (GUI) to configure tasks and generate code for execution, and to start and stop jobs. Metadata for the hub is stored in relational tables in a database (the hub database).

Locations interact with the hub either by having an HVR agent installed or by using protocols such as ODBC or FTP. The HVR agents at source locations capture data from their local database or file store and send the captured data to the hub, which distributes it to the target locations (often to agents at the target locations). Agents at the target locations integrate the data they receive into their local database or file store. Data exchange between the hub and agents benefits from the optimized HVR protocols and algorithms to provide optimum performance, efficiency and stability.

The HVR hub also serves as the central queuing node for the data exchanges between source and target locations if, for whatever reason, a target system cannot keep up with the data volumes that must be consumed, for example during a network outage. Since the hub requires only limited resources, it is often deployed on one of the source or target machines and does not require dedicated hardware.

Figure 8. HVR architecture


HVR operates well in large distributed and heterogeneous environments thanks to its specific design criteria:

High performance and efficiency

The HVR agents interface to the local databases or file systems through native interfaces, avoiding any overhead due to compatibility layers such as ODBC or middleware. The protocols and algorithms that the HVR agents use to communicate over the network are optimized for high performance and low bandwidth usage. This is achieved through proprietary compression and smart data packaging techniques.

Scalability

Most of the HVR processing is done by the local HVR agents to capture or integrate data. Almost every additional location adds an additional HVR agent, so that a single hub can scale to thousands of locations.

Availability and continuity

If multiple locations are selected as the source or target of an HVR task, and one of the locations becomes unavailable in the process, HVR will still complete the task for the remaining locations. This is achieved by creating independent jobs, especially on the hub, that handle the communication with each of the locations. To prevent the hub from becoming a single point of failure, it should be installed on a highly available server or inside a cluster with failover set up between the nodes in the cluster. Beyond that, it is possible to include a standby hub in the setup, to which the HVR operation can be switched over when the primary hub fails. In this case, all HVR tasks can be continued without disruption or loss of data.

Robustness and error recovery

HVR has a range of options and mechanisms to detect and resolve errors that may occur, such as database collisions due to bi-directional replication, database errors, application errors and operator mistakes.

2.2 Continuous Database Replication

HVR can be used to selectively deliver captured changes from one or more designated source databases to one or more target databases. The databases may be of different types and versions, or be located on distributed sites. HVR is optimized for performance, efficient use of network and system resources, and large data volumes. HVR’s continuous database replication works through HVR agents that reside on the database servers involved in the replication process and that directly connect to their local databases. The HVR agent on a source database server continuously scans the source database transaction log for changes, extracts the relevant changes and sends these over the network to the HVR agents on the target database servers. The receiving agents apply the changes to the local database as SQL transactions through the native interface of the database server. The consistency and integrity of the replicated data is preserved based on the following principles:

Changes are acknowledged after they have arrived and have been applied to the target database.

Changes are not replicated until they are committed.

Changes are replayed on the target database in the same order in which they occurred on the source database (except when using burst mode; see “using burst mode”).

By default, transaction boundaries are maintained.
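The principles above can be illustrated with a minimal Python sketch. It assumes a simplified stream of log records of the form (transaction id, kind, payload); this record shape and the function name are hypothetical and do not represent HVR's internal log format. Changes are buffered per transaction, released only on commit, and emitted as whole transactions in commit order.

```python
from collections import defaultdict

def committed_transactions(log_records):
    """Yield (txn_id, changes) for committed transactions, in commit order.

    Each record is (txn_id, kind, payload) with kind "change", "commit"
    or "rollback". This shape is an assumption made for the sketch.
    """
    pending = defaultdict(list)  # txn_id -> changes seen so far
    for txn_id, kind, payload in log_records:
        if kind == "change":
            pending[txn_id].append(payload)
        elif kind == "commit":
            # Release the whole transaction at once, keeping its boundary.
            yield txn_id, pending.pop(txn_id, [])
        elif kind == "rollback":
            # Work that was never committed is never replicated.
            pending.pop(txn_id, None)

log = [
    ("t1", "change", "insert row 1"),
    ("t2", "change", "insert row 2"),
    ("t2", "commit", None),
    ("t1", "change", "update row 1"),
    ("t3", "change", "insert row 3"),
    ("t3", "rollback", None),
    ("t1", "commit", None),
]
replayed = list(committed_transactions(log))
```

In this example t2 is replayed before t1 because it committed first, and t3 is dropped because it rolled back.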

This section describes the primary technical features and components of continuous database replication and explains how these contribute to HVR’s unique capabilities.

Change Data Capture

Change Data Capture (CDC) is the mechanism through which changes to a database can be detected and extracted. HVR uses CDC to retrieve changes to the source database. Almost all implementations use log-based capture. Log-based capture reads the transaction log files in which the database server records all transactions that are applied to its databases (e.g. the redo and archive logs in an Oracle Database server). In a typical operating mode HVR reads the online log files at the tail end of the log, where active transactions are written. However, if HVR falls behind for whatever reason, it will resort to reading from the archived log files until it catches up again to the current point in the online log files. Log-based capture is non-intrusive to the database server, as it only needs read permission on the database log files, and it generally reads the log directly out of the file system cache.

HVR also supports trigger-based capture, which is rarely used. Unlike log-based capture, trigger-based capture does involve interference with the database and therefore additional overhead and latency. It does, however, allow for specific customizations.

Network Transport

Captured changes from the source databases must be sent through the hub to the target databases, which may be situated in sites connected via slow networks. To perform this transfer efficiently HVR features a built-in transport mechanism allowing for very efficient use of the available system and network resources. The transport of change data is achieved over a direct socket connection between the HVR agents on the database servers involved in the replication process. This “symmetrical” streaming approach with collaborating HVR agents on both ends of each transport connection enables the following specific features and benefits:

Data compression

HVR implements a proprietary compression algorithm that exploits specific information on the transported data, such as the column data types and table widths involved. The resulting compression ratios can be 95% or higher, making very efficient use of the available bandwidth.

Data packaging

Prior to transmitting the data, HVR combines whole queues of changes into only a few network packets to minimize the overhead of network protocols and the performance reductions due to network round-trip latencies.

Data encryption

Sensitive data can be protected using Secure Sockets Layer (SSL) encryption.

Bandwidth management

HVR can be configured to use only a defined fraction of the maximum bandwidth on a network connection, to ensure that other types of network traffic can go through concurrently.

Coalescing of changes

Multiple changes applied to the same row within a transaction can be coalesced by HVR before transmission. Coalescing means that multiple changes to the same data are replaced by a single change that has the same end result when it is applied to the target database. For instance, an insert and two updates to the same row in a single transaction can be merged into a single insert. On the destination, transactions are by default applied in groups of 100 source transactions, across which HVR can perform another coalesce operation. Applying coalesced changes reduces the number of changes to be transported and applied, which further boosts replication speed.

Minimization of replication hops

Every change that is written to the file system causes additional I/O overhead and may introduce additional latency. HVR queues data changes to disk only once, in order to keep the process running if one of the target databases is offline. Change data written to disk is highly compressed.

Integration optimizations

Several options exist to optimize HVR’s performance at the target location. The default mode of HVR is “trickle integrate” in which every transaction is immediately propagated but transactions are grouped into larger transactions for optimum performance (so long as there is enough transaction volume). On a busy system end-to-end latency is often seconds at most, and less than a second in many cases. Trickle integrate works well on an OLTP database that is optimized for fast inserts, updates and deletes.


Figure 9. HVR optimizations

Alternatively, HVR can run in “batch” mode, where transactions are not applied individually but are collected and applied periodically. Batch mode works well for analytic databases that typically perform well for large bulk operations, but poorly when applying single-row changes or deletes.

Coalescing changes

HVR can coalesce multiple operations on the same row into a single change. For example, instead of performing an insert followed by 5 updates, HVR would perform a single insert of the final row image. Also, an insert followed by a delete results in a no-op.
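The coalescing rule can be sketched in a few lines of Python. This is an illustration only, not HVR's algorithm: the (operation, key, row) tuple shape is an assumption, every insert or update is assumed to carry the full row image after the change, and the input is assumed to be a valid sequence of operations per key.

```python
def coalesce(changes):
    """Collapse a sequence of row changes into at most one change per key.

    Each change is (op, key, row) with op in {"insert", "update", "delete"};
    row is the full row image after the change (None for deletes).
    """
    net = {}  # key -> (op, row)
    for op, key, row in changes:
        prev = net.get(key)
        if prev is None:
            net[key] = (op, row)
        elif prev[0] == "insert":
            if op == "delete":
                del net[key]                # insert + delete is a no-op
            else:
                net[key] = ("insert", row)  # insert + update stays an insert
        elif prev[0] == "update":
            net[key] = (op, row)            # update + update, or update + delete
        else:
            # delete + insert: the row existed before, so the net effect
            # is an update to the new image.
            net[key] = ("update", row)
    return [(op, key, row) for key, (op, row) in net.items()]

burst = [("insert", 1, {"qty": 0})] + [("update", 1, {"qty": n}) for n in range(1, 6)]
result = coalesce(burst)
```

Here the insert followed by five updates collapses to a single insert carrying the final row image.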

“Burst” mode

For optimal performance when integrating captured changes at the target location, HVR provides a burst mode. In this mode, HVR automatically coalesces changes and uses SQL set operations to integrate the results. This feature is essential if large volumes of updates or deletes have to be loaded continuously into a target database that is not optimized for single-row operations (e.g. an analytic database that does not support regular indexes).
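To illustrate what integrating with set operations can look like, the following Python sketch uses SQLite from the standard library. The staging table named burst, the target table layout and the delete-then-insert strategy are all assumptions chosen for the example; the source does not describe the SQL that HVR generates.

```python
import sqlite3

def burst_apply(conn, coalesced):
    """Apply coalesced changes (op, id, value) with set-based SQL.

    Assumes at most one change per id, i.e. the input is already coalesced.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE burst (op TEXT, id INTEGER PRIMARY KEY, value TEXT)")
    cur.executemany("INSERT INTO burst VALUES (?, ?, ?)", coalesced)
    # One statement removes every row that was deleted or will be rewritten.
    cur.execute("DELETE FROM target WHERE id IN (SELECT id FROM burst)")
    # One statement writes the final image of every inserted or updated row.
    cur.execute(
        "INSERT INTO target (id, value) "
        "SELECT id, value FROM burst WHERE op <> 'delete'"
    )
    cur.execute("DROP TABLE burst")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
burst_apply(conn, [("update", 1, "a2"), ("delete", 2, None), ("insert", 4, "d")])
rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
```

Two set-based statements replace three single-row operations here; on a target that handles bulk operations well, that difference grows with the size of the batch.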

Parallel integrate to leverage massively parallel (scale-out) clustered targets

HVR can be configured to launch parallel integration tasks to speed up performance. On top of that, using the sharded key feature of specific (scale-out) databases, HVR can deliver to each of those parallel jobs the set of data to be applied to a separate cluster node. This implies that nearly 100% scalability in the number of cluster nodes and HVR jobs can be achieved.

Replication Topologies

HVR can be used for one-directional, bi-directional and multi-directional replication between databases. In one-directional mode, application transactions are applied to a source database and the resulting changes are replicated to a target database. In bi-directional or multi-directional mode, application transactions run against two or more databases and the resulting changes travel in all directions between the databases. Every database acts both as a source and as a target in the replication process. HVR can be used in complex meshed topologies with a large number of databases that each may have multiple incoming and outgoing replication streams. In all these topologies HVR routes the changes through the server that has been designated as the HVR hub. Changes originating from a source database are sent to the HVR hub, where they are stored in a queue file. From there the changes are sent to the target databases. Especially in complex topologies with many data flows, this central queuing mechanism greatly simplifies the configuration and management of the replication environment, improving robustness and scalability. It is very easy to add or remove a database without disrupting the rest of the replication scheme.

Collision Detection and Resolution

Collisions can occur in bi-directional or multi-directional replication scenarios when multiple users make conflicting changes on different databases, for example when two users change the same row at the same time on different databases. Undetected collisions can lead to inconsistencies within and between the replicated databases. HVR has an efficient collision detection and resolution mechanism that uses the timestamps of the changes involved. It can be configured to operate selectively on the databases and tables where collisions may occur, such as in a bi-directional or multi-directional replication process. If the replicated table contains timestamp columns, these can be used without the need to maintain a separate timestamp table, which reduces overhead.
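A timestamp-based resolution rule can be sketched as follows. The source only states that timestamps are used; the "most recent change wins" policy, the tie-breaking rule and the last_updated column name are assumptions made for this illustration.

```python
from datetime import datetime

def resolve_collision(incoming, existing):
    """Return the row version that should survive on the target.

    Both rows are dicts with a "last_updated" datetime column (a
    hypothetical column name). The newer change wins; on a tie the
    existing row is kept.
    """
    if existing is None:
        return incoming
    if incoming["last_updated"] > existing["last_updated"]:
        return incoming
    return existing

local = {"id": 7, "city": "Utrecht", "last_updated": datetime(2015, 3, 1, 12, 0, 5)}
remote = {"id": 7, "city": "Leiden", "last_updated": datetime(2015, 3, 1, 12, 0, 9)}
winner = resolve_collision(incoming=remote, existing=local)
```

Because both databases apply the same rule to the same pair of changes, they converge on the same row version regardless of the order in which the changes arrive.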

Heterogeneous Replication

In contrast to many other database replication tools, HVR is not tied to a single DBMS product. HVR can just as easily replicate data between an Oracle database and a DB2 database as between two different versions of Microsoft SQL Server.

Figure 10. Example: HVR replication from Oracle Database to Microsoft SQL Server

HVR’s performance and functional features, including bi-directional replication and collision handling, are independent of the DBMS environments and of whether they are homogeneous or heterogeneous. There is no need for additional tools or interfaces, as the local HVR agents interface directly with the local DBMS instance and automatically take care of any necessary data type conversions. Moreover, HVR can be configured to selectively replicate data from the source to the target databases, map between different table or column names and convert data values during the replication. The database schemas of the source and target databases need not be identical.

Selective replication

HVR can be configured to selectively replicate changes in a given table. Only changes to rows that meet a definable criterion are replicated, and changes to other rows are ignored. This can, for example, be used to allow old rows to be purged from a source production database while they are kept in the target database that is used for archiving or reporting. It can also be used for horizontal partitioning, in which different parts of a table are replicated in different directions, for example when a SaaS vendor distributes data out of its central database to local customers who can only see their own data.
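Both uses of selective replication can be sketched with plain Python filters. The change records, column names and helper functions below are hypothetical; in HVR the criterion is part of the replication configuration rather than application code.

```python
def select_changes(changes, criterion):
    """Keep only the changes that satisfy `criterion`."""
    return [change for change in changes if criterion(change)]

def partition_changes(changes, column):
    """Group changes by the value of `column` (horizontal partitioning)."""
    routed = {}
    for change in changes:
        routed.setdefault(change["row"][column], []).append(change)
    return routed

changes = [
    {"op": "insert", "row": {"id": 1, "customer": "acme"}},
    {"op": "delete", "row": {"id": 2, "customer": "acme"}},
    {"op": "update", "row": {"id": 3, "customer": "globex"}},
]

# Archive target: skip deletes so rows purged at the source are retained.
archive_feed = select_changes(changes, lambda c: c["op"] != "delete")

# Per-customer targets: each customer receives only its own rows.
per_customer = partition_changes(changes, "customer")
```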


Name and data conversion

HVR supports replication between tables or columns that have different names in the source and the target databases. On top of that, HVR can be instructed to calculate new column values for the target database table from the source database table column values using an SQL expression, or to provide a default value. As a result, source and target database tables can have different columns. Columns that are present in the source database table but not in the target database table may be ignored, or can be used as input to calculate values for another column in the target database. Conversely, columns that are present in the target database table but not in the source database table may be filled with a default value or a calculated value.
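The mapping described above can be sketched as a small row transformation. HVR expresses calculated values as SQL expressions; in this illustration Python callables stand in for them, and the function and column names are hypothetical.

```python
def transform_row(source_row, renames, computed, defaults):
    """Build a target row from a source row.

    renames:  source column -> target column
    computed: target column -> function of the source row
    defaults: target column -> value used when nothing else supplies one
    Source columns that appear in none of these are ignored.
    """
    target = {}
    for src_col, tgt_col in renames.items():
        target[tgt_col] = source_row[src_col]
    for tgt_col, fn in computed.items():
        target[tgt_col] = fn(source_row)
    for tgt_col, value in defaults.items():
        target.setdefault(tgt_col, value)
    return target

source_row = {"cust_no": 7, "first": "Ada", "last": "Lovelace", "internal_flag": 1}
target_row = transform_row(
    source_row,
    renames={"cust_no": "customer_id"},
    computed={"full_name": lambda r: r["first"] + " " + r["last"]},
    defaults={"region": "EU"},
)
```

The column internal_flag is dropped, first and last feed a calculated column, and region exists only on the target and receives a default.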

Figure 11. Name and data conversion

2.3 Database Compare and Refresh/Repair

The HVR Database Compare and Refresh/Repair functionality is very useful for bringing two or more databases in sync. Refresh can also be used for fast batch data loading or to perform the initial load for continuous database replication. Database Compare and Refresh/Repair can also be used to resynchronize replicated databases after a crash has occurred. The data transformation rules that were defined for the continuous database replication are taken into consideration during Compare, Repair and Refresh operations.

Database Compare and Refresh/Repair shares many of the features and benefits of continuous data replication. Specifically, it can be deployed in heterogeneous database environments and between databases with different schemas, and it can handle large data volumes through its high performance and efficiency. Database Compare can in itself be useful to monitor and verify the consistency of the databases. Both Database Compare and Database Refresh/Repair can be done in two modes, a bulk mode and a row-wise mode:

Bulk compare

In the case of bulk compare, HVR computes checksums on each of the corresponding tables in the compared databases and subsequently compares these checksums. The actual data does not travel over the network, so this method is very efficient for verifying whether large databases are in sync.
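The idea of comparing checksums instead of data can be sketched with SQLite and SHA-256 from the Python standard library. The hashing scheme, the sort on the first column and the table layout are assumptions for the example; HVR's actual checksum method is not described in the source.

```python
import hashlib
import sqlite3

def table_checksum(conn, table):
    """Return a SHA-256 digest over all rows of `table`, in a fixed order."""
    digest = hashlib.sha256()
    # ORDER BY 1 sorts on the first column, assumed here to be the key.
    # `table` is a trusted identifier in this sketch, not user input.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn

source = make_db([(1, "Ada"), (2, "Grace")])
target = make_db([(2, "Grace"), (1, "Ada")])  # same rows, different insert order
in_sync = table_checksum(source, "customers") == table_checksum(target, "customers")
```

Each side computes its digest locally, so only the short digests need to be exchanged. Between different DBMS products the row values would first have to be normalized to a common representation, which this sketch does not attempt.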

Figure 12. Bulk compare
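The idea behind bulk compare can be sketched as follows. The paper does not specify the checksum algorithm, so the use of SHA-256 over key-sorted rows is an assumption made for illustration, and the function and table names are hypothetical. In a real deployment each side would compute its digest locally, so that only the digests cross the network.

```python
import hashlib


def table_checksum(rows):
    """Return a hex digest for a table given as an iterable of row tuples."""
    digest = hashlib.sha256()
    # Sort the rows so that both sides hash them in the same order.
    for row in sorted(rows):
        # repr() is used as a simple, unambiguous row encoding for this sketch.
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def bulk_compare(source_tables, target_tables):
    """Compare per-table checksums and return the names of tables that differ."""
    differing = []
    for name in sorted(set(source_tables) | set(target_tables)):
        src = table_checksum(source_tables.get(name, []))
        tgt = table_checksum(target_tables.get(name, []))
        if src != tgt:
            differing.append(name)
    return differing


source = {"orders": [(1, "open"), (2, "paid")], "customers": [(10, "Ann")]}
target = {"orders": [(2, "paid"), (1, "open")], "customers": [(10, "Anne")]}
print(bulk_compare(source, target))  # prints ['customers']
```

A checksum comparison reports which tables differ but not which rows. Locating the differing rows is the job of the row-wise mode.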

Bulk refresh/repair

During bulk refresh, HVR reads the data from all tables in the source database. If the databases are on different servers, the data is compressed and transported over the network. As the final step, the data is loaded into the target database, whereby the table constraints and indexes are reinitialized. Bulk refresh is the fastest option to perform a database copy.

Figure 13. Bulk refresh

Row-wise compare

In a row-wise compare, HVR extracts sorted data from the source database, sends it in compressed format over the network (if required), and compares the data table by table and row by row with the target database. For any detected differences, it then generates the minimal SQL script required to apply the inserts, updates or deletes that make the target database consistent with the source database.

Row-wise refresh/repair

Row-wise refresh performs the same steps as row-wise compare, but the resulting SQL script is applied directly to the target database for resynchronization. Row-wise refresh is efficient in cases where the source and the target database are largely equal.
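Row-wise compare and refresh can be sketched as a merge over two key-sorted row streams. This is an illustrative sketch rather than HVR's code: rows are modelled as hypothetical (key, value) pairs, and the changes are returned as tuples instead of being rendered as an SQL script.

```python
def row_wise_compare(source_rows, target_rows):
    """Merge two key-sorted lists of (key, value) rows into a minimal change list.

    Returns ("insert" | "update" | "delete", key, value) tuples that, applied
    to the target, make it equal to the source.
    """
    changes = []
    i = j = 0
    while i < len(source_rows) and j < len(target_rows):
        s_key, s_val = source_rows[i]
        t_key, t_val = target_rows[j]
        if s_key == t_key:
            if s_val != t_val:
                changes.append(("update", s_key, s_val))
            i += 1
            j += 1
        elif s_key < t_key:
            # Row exists only in the source.
            changes.append(("insert", s_key, s_val))
            i += 1
        else:
            # Row exists only in the target.
            changes.append(("delete", t_key, None))
            j += 1
    changes.extend(("insert", k, v) for k, v in source_rows[i:])
    changes.extend(("delete", k, None) for k, _ in target_rows[j:])
    return changes


def apply_changes(target_rows, changes):
    """Row-wise refresh: apply the change list and return the repaired rows."""
    table = dict(target_rows)
    for op, key, value in changes:
        if op == "delete":
            del table[key]
        else:
            table[key] = value
    return sorted(table.items())


source = [(1, "a"), (2, "b"), (4, "d")]
target = [(1, "a"), (2, "x"), (3, "c")]
changes = row_wise_compare(source, target)
print(changes)
# prints [('update', 2, 'b'), ('delete', 3, None), ('insert', 4, 'd')]
print(apply_changes(target, changes) == source)  # prints True
```

Because both inputs are sorted by key, the comparison needs a single pass over each side, and the change list is small whenever the two databases are largely equal.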

Figure 14. Row-wise compare and repair (refresh)

All modes of Database Compare and Refresh/Repair are very flexible. They respect any transformation rules that may have been defined for a continuous database replication, such as selective replications and name and data conversions. Database Compare and Refresh/Repair are optimized for performance and flexibility. For example, data streams can be parallelized across tables and across destinations. HVR orders the tables in groups of similar size by checking the DBMS catalogue information on the table sizes, to ensure optimal load-balancing across the streams and processes.

Moreover, the source database does not have to be taken off-line during a refresh action to avoid loss of changes. HVR will capture all changes applied to the source database during the refresh process and include them in the update of the target database. If a row-wise refresh is applied, it is not even necessary to take the target database off-line.
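The size-based load balancing mentioned above can be approximated with a simple heuristic. The paper does not describe HVR's grouping algorithm, so the sketch below substitutes a standard greedy "largest table first, least-loaded stream next" assignment. The table names and sizes are hypothetical stand-ins for values read from the DBMS catalogue.

```python
import heapq


def balance_tables(table_sizes, stream_count):
    """Assign tables to parallel streams so that total sizes stay close.

    Greedy heuristic: take tables from largest to smallest and give each one
    to the stream with the smallest total so far.
    """
    # Heap entries are (total size assigned so far, stream index).
    heap = [(0, idx) for idx in range(stream_count)]
    heapq.heapify(heap)
    streams = [[] for _ in range(stream_count)]
    by_size = sorted(table_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in by_size:
        total, idx = heapq.heappop(heap)
        streams[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return streams


# Hypothetical table sizes, e.g. in megabytes.
sizes = {"orders": 900, "order_lines": 700, "customers": 300,
         "products": 200, "regions": 100}
print(balance_tables(sizes, 2))
# prints [['orders', 'products'], ['order_lines', 'customers', 'regions']]
```

In this example both streams end up with a total of 1100, so neither stream finishes much later than the other.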

Using refresh for optimizing target performance

In some cases optimal target performance can be achieved by dropping the target tables and recreating them with fresh data. HVR’s bulk refresh feature performs this task and can be scheduled regularly the same way as change data capture.

2.4 File Replication

In many organizations, a large part of the exchange and distribution of information is realized by copying and transferring data files. This can result in large numbers of unsecured and unmonitored point-to-point connections using various protocols, such as FTP or proprietary protocols. Business-critical processes, however, benefit from a more infrastructural solution. Managed File Transfer enables centralized management and control of end-to-end file transfers within the enterprise. It also enables secure and auditable file transports and can be used between disparate platforms, applications and people. HVR provides Managed File Transfer through its File Replication functionality, which can be used as a tool in its own right or as part of the integrated HVR suite for enterprise data integration. Complex file transfer chains can be configured, scheduled and controlled from a central point at an enterprise-wide level. HVR supports three types of file replication: file-to-file, database-to-file and file-to-database.

File-to-file transfer

An HVR file-to-file transfer copies the files from one file location (the source location) to one or more other file locations (the target locations). A file location is a directory or a tree of directories, which can be accessed either through the local file system (Unix, Linux or Windows) or through a network file protocol (FTP, FTPS, SFTP, WebDAV or HDFS). Files can be copied or moved. In the latter case, the files on the source location are deleted after they have been copied to the target locations. The file contents are normally preserved, but it is possible to include file transformations in the copy process using external commands or definitions in an XSLT file.

File distribution

HVR can copy files selectively from the source location by matching their names to a predefined pattern. This feature also makes it possible to route files within the same source location to different target locations on the basis of their file names, which enables selective file distribution scenarios.
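Name-based routing of this kind can be sketched in a few lines. The pattern syntax HVR uses is not described in this paper, so shell-style wildcards from Python's standard `fnmatch` module are assumed here. The patterns and location names are hypothetical.

```python
from fnmatch import fnmatchcase


def route_file(file_name, routes):
    """Return the target locations whose pattern matches file_name.

    `routes` is an ordered list of (pattern, [target locations]) pairs.
    A file can match several patterns; each target is listed only once.
    """
    targets = []
    for pattern, locations in routes:
        if fnmatchcase(file_name, pattern):
            targets.extend(loc for loc in locations if loc not in targets)
    return targets


# Hypothetical routing table for one source location.
routes = [
    ("invoice_*.xml", ["finance"]),
    ("*.csv", ["warehouse", "archive"]),
    ("*", ["archive"]),
]
print(route_file("invoice_2015.xml", routes))  # prints ['finance', 'archive']
print(route_file("stock.csv", routes))         # prints ['warehouse', 'archive']
print(route_file("readme.txt", routes))        # prints ['archive']
```

A file that matches no pattern gets an empty target list and is not copied, which corresponds to selective copying from the source location.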

File-to-database

In a file-to-database transfer, data is read from files in the source file location and replicated to one or more target databases. The source files are by default expected to be in a specific HVR XML format, which contains the table information required to determine to which tables and rows the changes should be written in the target database. It is also possible to use other input file formats by including an additional transformation step in the file capture. Support for CSV is already provided, but any format can be handled by providing an external command or an XSLT definition.

Database-to-file

Alternatively, in a database-to-file transfer the data is read from a source database and copied into one or more files on the target file location. The resulting files are by default in the HVR XML format, so that the table information is preserved. However, CSV is also supported, and other file formats can be obtained by including an additional transformation command or XSLT definition in the file output. As in continuous replication between databases, it is possible to select specific tables and rows from the source database and to convert names and column values.

Advanced Scenarios

HVR’s flexible architecture and seamless file integration also make it possible to combine these scenarios, for example by combining distribution and conversion in a channel that performs both database distribution and file distribution.

It is no surprise that HVR File Replication is optimized for maximum performance, efficiency and scalability as it benefits from the same mechanisms for data compression, data encryption, network efficiency and queuing as continuous data replication and Database Compare and Refresh/Repair. It therefore can handle multi-gigabyte files sent in a single task to or from 100 or more locations.

2.5 HVR Management and Operations

User Interface

All HVR tasks can be managed from a central, intuitive and integrated Graphical User Interface (GUI). The GUI connects to the HVR hub and can be installed directly on the hub machine. Alternatively, it can be installed on the user’s PC and connect to the hub machine over the network. The hub controls and monitors all HVR agents. Administrators use the GUI to configure HVR tasks, such as continuous database replication or Managed File Transfer schemes, and to schedule, execute and monitor these tasks.

Instead of using the GUI, all HVR actions can also be initiated from the command line and hence be included in scripts for automating operations.

HVR Channels

Setting up or modifying an HVR task consists of two parts: Location Configuration and Channel Definition.

Location Configuration

The Location Configuration contains the parameters that describe the addresses and the access credentials of each database and file store (“location”) that will be involved in the HVR task. Typical parameters include network node names or IP addresses, port numbers and login passwords. The type of storage (database or file store) and the kind of database (e.g. Oracle or Ingres) are also defined here. Each location has a logical name.

Channel Definition

The Channel Definition defines the logical transformation rules for the HVR task. Locations are referenced by the logical names assigned in the Location Configuration. The locations are combined into location groups, which act as the source and target of the HVR operation. Tables that should be replicated can be selected from the database data dictionary. Finally, channel actions define the type of operation (e.g. a database replication consisting of a capture and an integration action) and the options and parameters that should be applied (such as tuning options or column transformations).

Figure 15. HVR GUI - Configuration panel

Once an HVR task has been configured, it can be generated into a job on the hub machine and scheduled for execution. The separation between the location groups in the Channel Definition and the Location Configuration allows for flexibility in terms of reuse and role separation. With the implementation details of the environment included in the Location Configuration and hidden from the Channel Definition, it is easy to change “physical” parameters within the environment, such as addresses or even replacing one type of DBMS by another, without having to change the “logical” parameters in the Channel Definition. Location Configuration can be done by the system administrators without any knowledge of the replication logic. Channel Definition, on the other hand, can be done by the developers of replication schemes, who do not need to be aware of the particular details of the deployment environment. With this separation, a Channel Definition can first be tested in a testing environment and then be deployed to the production environment by changing only the Location Configuration. Likewise, a high availability implementation can be reversed simply by swapping the allocation of physical source and target environments.
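The separation between logical and physical settings can be illustrated with a small data model. This sketch does not reflect HVR's actual configuration format or API: the class names, fields, host names and port numbers are all hypothetical. It only shows how one channel definition can be bound to different environments through logical location names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical settings, maintained by system administrators."""
    name: str   # logical name referenced by channels
    kind: str   # e.g. "oracle", "ingres" or "file"
    host: str
    port: int


@dataclass(frozen=True)
class Channel:
    """Logical replication rules, maintained by replication developers."""
    name: str
    source_group: tuple  # logical location names
    target_group: tuple  # logical location names
    tables: tuple


def resolve(channel, locations):
    """Bind a channel's logical location names to one physical environment."""
    by_name = {loc.name: loc for loc in locations}
    return {
        "sources": [by_name[n] for n in channel.source_group],
        "targets": [by_name[n] for n in channel.target_group],
    }


# One channel definition, two hypothetical environments.
channel = Channel("sales", ("src",), ("tgt",), ("orders", "customers"))
test_env = [Location("src", "oracle", "test-db1.example.com", 4343),
            Location("tgt", "ingres", "test-db2.example.com", 4343)]
prod_env = [Location("src", "oracle", "prod-db1.example.com", 4343),
            Location("tgt", "ingres", "prod-db2.example.com", 4343)]

for env in (test_env, prod_env):
    bound = resolve(channel, env)
    print(bound["sources"][0].host, "->", bound["targets"][0].host)
```

Moving from test to production changes only the list of `Location` objects, while the `Channel` object stays the same. Reversing a high availability setup would amount to swapping which physical machine the names `src` and `tgt` point to.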

Monitoring and Statistics

The monitor, which is part of the GUI, provides status information and statistics on the progress of the various HVR tasks for all operations involved in the distributed environment. HVR can also create alerts and interface through SNMP traps to popular system management tools, such as HP OpenView and Nagios.

Figure 16. Monitoring

2.6 Platform Support

HVR has been designed to support complex data integration scenarios across large heterogeneous computing environments with divergent hardware platforms, database products and file systems. HVR can be installed on a number of Operating Systems and interface with most of the mainstream DBMS products and file systems.

Supported Database Management Systems

HVR can interact with the following DBMS products through native support:

Oracle Database: Change Data Capture & Target

Oracle Exadata: Change Data Capture & Target

Microsoft SQL Server: Change Data Capture & Target

Actian Ingres: Change Data Capture & Target

IBM DB2 (Linux/Unix and iSeries): Change Data Capture & Target

Salesforce: Data Capture & Target

Actian Vector (Vectorwise): Target only

Actian Matrix (ParAccel): Target only

Teradata Database: Target only

Pivotal Greenplum: Target only

Pivotal HawQ: Target only

PostgreSQL: Target only

Microsoft SQL Azure: Target only

Amazon Redshift: Target only

Through its XML file interface and support of external agents HVR provides an API into any database or application platform.

HVR supports all editions of the above databases, such as Enterprise, Standard, Express, BI, RAC and ASM.

Supported Operating Systems & Cloud platforms

HVR can be installed on the following Operating Systems:

Linux for x86 and x86-64

Microsoft Windows for x86 and x86-64

Solaris for Sparc and Intel

HP-UX for Itanium

IBM AIX

HVR supports real and virtual instances of these operating systems.

HVR can be installed on the following Cloud Platforms:

Microsoft Azure VM: IAAS

Microsoft Azure Cloud Services: PAAS

Amazon EC2: IAAS

Amazon RDS: PAAS

Salesforce: SAAS

HVR also supports the databases and file systems listed in this section when they run in an IAAS / PAAS environment, including (virtualized) environments provided by the cloud vendor.

Supported File Locations

HVR can directly access files on the local file system of the servers on which it is installed or access files over network file protocols. The following file systems are supported:

Unix and Linux file systems

Windows file systems

Cloud IAAS file systems (e.g. Amazon S3 & Microsoft Azure Storage)

Hadoop Distributed File System (HDFS)

Microsoft SharePoint (WebDAV)

FTP(s) and SFTP

3 About HVR – High Volume Replication

At HVR Software we believe it should be easy to deliver large volumes of data efficiently and at the right time into your data store of choice. Our software HVR, High Volume Replicator, supports log-based change data capture from relational databases and integration into relational and BI databases as well as file systems and Hadoop. HVR simplifies the complex challenges of real-time data integration through an easy-to-use graphical user interface and an innovative architecture.

Customers across industries and regions rely on HVR for real-time data integration. Typical use cases include real-time Business Intelligence, hybrid cloud integration and geographically distributed computing. We serve customers around the world with offices in San Francisco (USA), Amsterdam (the Netherlands), Shanghai (PRC) and Sydney (Australia). For more information or to request a trial, please visit us at http://www.hvr-software.com.

Copyright © 2015 HVR Software. All rights reserved. Trademarks referenced in this document are the sole property of their respective owners.