
SAP BusinessObjects Data Services Setup Guide


Follow the instructions below to set up SAP BusinessObjects Data Services.

Once you have the licensed version of SAP BODS XI, click the installation icon "setup.exe" to launch the installer.

Click Next.

Check the option "I accept the License Agreement" and click the Next button.


Click the Next button and provide the license information supplied by SAP.


Full Name: XYZ.

Organisation: XYZ CORP.

Product Key: SAP Provided License Information.

Click the Next button to continue.

Change the software installation directory to D:\Business Objects\BusinessObjects Data Services\. You can install in any local directory of your choice, or on a SAN drive.

Click the Next button.

Select the features to install.


By default, the server, client, and web components are selected. Real-time services, i.e. the Message Client components, are not available in this license copy.

Click the Next button to continue.


Select the option Use an existing database server.

Click the Next button to continue.

Provide the preconfigured database connection information (MS SQL Server in this case).


Database Type: Microsoft SQL Server

Database Server Name: EDWDEVETL

Database Name: DEVETLREPO

User name: DEVETLREPO

Password:

Click the Get Version button to check whether the service can connect to the database.

Click the Ok button.
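Optionally, you can verify the same connectivity yourself by running a trivial query against the repository database, e.g. from sqlcmd or SQL Server Management Studio, logged in as the repository user (server, database, and login names as configured above):

    -- Manual connectivity check against the DEVETLREPO database
    SELECT @@VERSION AS sql_server_version;   -- succeeds only if the login and connection work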

Next check the Create option and click the Next button to continue.

This will create the local metadata repository of SAP BODS.

Click the Next button to configure the Job Server.


Check the option - Configure a new Job Server

Job Server name: JS_DEVETLREPO

Job Server port: 3500

Next, click the Advanced button to change the cache directory for this job server.

Change the cache folder to D:\Business Objects\CacheDir.

Leave the ports as default. Click the OK button.

Click the Next button to continue with the Service Login Information.


Check the option Use system account and click the Next button to continue.

Three BODS services, namely BOE120Tomcat, BusinessObjectsAddressServer, and DI_JOBSERVICE, will be created under this system account. Click Next to continue.


Next check the option Skip Access Server Configuration and click the Next button to continue.

Since we are not using Real-Time Data Services, we skip this configuration step.

Click Next to install the web application server.


Check the option Install Tomcat application server and click the Next button to continue.


Leave the ports as default and click the Next button to continue installation.

This Tomcat web application server will host the Data Services web-based Management Console.

Click Next to finalize the installation. The list of programs and services to be installed is displayed.


Click the Next button to start the SAP BODS installation.

After successful installation of SAP BusinessObjects Data Services, reboot the machine to reflect the configuration changes.


Click Finish to exit installation.

Next, restart the system to complete the BODS configuration.

Click Yes.

After the machine restarts, do an initial validation to check the SAP BODS services and programs installed. Click Start, then All Programs, and verify the installed SAP BODS programs.


In this part of the article, we will see how to change the default log file directory of SAP BODS; create Local, Central (secured version control for multi-user environments), and Profiler (data profiling) repositories; configure the Job Server; configure the DS Management Console; add/remove license information; and configure the SAP BODS Metadata Integrator with SAP BOXI.

Getting Started with SAP BODS DESIGNER

After the BODS installation, let us look at the basics of creating a Project, creating a Datastore, importing a Table, building a Batch Job, Workflow, and Dataflow, and executing a simple Job.

We will now go step by step after logging into the SAP BODS Designer Console. The screenshots given here are self-explanatory.

CREATE PROJECT


CHECK DATA TO VALIDATE SUCCESSFUL JOB RUN


List of available transforms

1. Data Integrator- Data_Transfer, Date_Generation, Effective_Date, Hierarchy_Flattening, History_Preserving, Key_Generation, Map_CDC_Operation, Pivot (Columns to Rows), Reverse Pivot (Rows to Columns), Table_Comparison, XML_Pipeline

2. Data Quality- Associate, Country ID, Data Cleanse, DSF2 Walk Sequencer, Geocoder, Global Address Cleanse, Global Suggestion Lists, Match, USA Regulatory Address Cleanse, User-Defined

3. Platform- Case, Map_Operation, Merge, Query, Row_Generation, SQL, Validation

4. Text Data Processing- Entity_Extraction

Our approach is to get detailed knowledge of all the above transforms, starting with the most commonly used ones. So we will start with the Query transform.


QUERY Transform

The Query transform is used to retrieve a data set, based on the input schema, that satisfies the conditions we specify. A Query transform is similar to a SQL SELECT statement. The Query transform is used to perform the following operations:

- Map columns from the input Schema to the output Schema.
- Perform transformations and functions on the source data.
- Assign Primary Keys to output Schema columns.
- Add new output columns, nested Schemas, and function calls to the output Schema.
- Perform data nesting and unnesting with sub-schemas of the output Schema; also assign Make Current Schema.
- Generate a DISTINCT result set output for the input Schema.
- Join data from multiple input source Schemas; equi joins as well as outer joins are supported.
- Filter the input source data.
- Perform aggregation based on input column groups.
- Generate a sorted dataset based on the source input column order.
- Generate a DTD, XML Schema, or File Format based on the input or output Schema.
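Most of these operations map one-to-one onto clauses of a single SELECT. Here is a rough SQL sketch of what one Query transform can express (the table and column names are hypothetical, and the Distinct option would simply add DISTINCT):

    -- Hypothetical SELECT expressing one Query transform: column mapping,
    -- a function call, an outer join, a filter, aggregation, and sorting
    SELECT c.CUST_ID           AS CUSTOMER_KEY,    -- column mapping
           UPPER(c.CUST_NAME)  AS CUSTOMER_NAME,   -- function on source data
           SUM(o.ORDER_AMOUNT) AS TOTAL_AMOUNT     -- aggregation
    FROM   CUSTOMER c
    LEFT OUTER JOIN ORDERS o                       -- outer join
           ON o.CUST_ID = c.CUST_ID
    WHERE  c.STATUS = 'ACTIVE'                     -- filter
    GROUP  BY c.CUST_ID, UPPER(c.CUST_NAME)        -- aggregation groups
    ORDER  BY c.CUST_ID;                           -- sorted output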

CASE Transform

The Case transform is used to divide or route the input data set into multiple output data sets based on defined logical expressions. It is used to implement IF-THEN-ELSE logic at the dataflow level. This transform accepts only one source input. We can define multiple labels and their corresponding CASE expressions. For input rows that do not satisfy any of the CASE conditions, we may choose to output those records using the DEFAULT case. For that we need to select the check box Produce default output when all expressions are false.

Two other featured properties of this transform are Row can be TRUE for one case only and Preserve expression order. If we select the option Row can be TRUE for one case only, then a row is passed to the first case whose expression returns TRUE. Otherwise, the row is passed to all the cases whose expression returns TRUE. The Preserve expression order option is available only when the Row can be TRUE for one case only option is checked. We can select this option if expression order is important to us, because otherwise there is no way to guarantee which expression will evaluate to TRUE first.
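As a loose analogy (hypothetical labels and table), a Case transform with two labels plus the DEFAULT output routes rows the way these three filtered SELECTs would:

    -- Each Case label acts like a filtered SELECT feeding its own output
    SELECT * FROM SALES WHERE REGION = 'NORTH';   -- label CASE_NORTH
    SELECT * FROM SALES WHERE REGION = 'SOUTH';   -- label CASE_SOUTH
    SELECT * FROM SALES
    WHERE  REGION NOT IN ('NORTH', 'SOUTH');      -- DEFAULT output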

MERGE Transform

The Merge transform is used to combine multiple input datasets with the same schema into a single output dataset of that schema. It is equivalent to the SQL UNION ALL statement. In order to eliminate duplicate records from the output dataset, i.e. to attain a UNION operation, add a Query transform with the DISTINCT option enabled after the Merge transform.
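In SQL terms (hypothetical tables), Merge alone corresponds to the first statement; Merge followed by a DISTINCT Query corresponds to the second:

    -- Merge transform alone: duplicates are kept
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_EAST
    UNION ALL
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_WEST;

    -- Merge + Query with DISTINCT: duplicates removed, i.e. a plain UNION
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_EAST
    UNION
    SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_WEST;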

VALIDATION Transform

The Validation transform is used to filter or replace the source dataset based on criteria or validation rules to produce the desired output dataset. It lets us create validation rules on the input dataset and generate the output based on whether rows have passed or failed the validation condition. This transform is typically used for NULL checking of mandatory fields, pattern matching, checking the existence of a value in a reference table, validating datatypes, etc.

The Validation transform can generate three output datasets: Pass, Fail, and RuleViolation. The Pass output schema is identical to the input schema. The Fail output schema has two extra columns, DI_ERRORACTION and DI_ERRORCOLUMNS. The RuleViolation output has three columns: DI_ROWID, DI_RULENAME, and DI_COLUMNNAME.
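As a sketch (hypothetical table and rule), a NULL-check rule on a mandatory column splits the input roughly the way these two SELECTs would, before DS appends its error columns to the Fail output:

    -- Hypothetical NULL-check validation rule on a mandatory column
    SELECT * FROM CUSTOMER_STAGE
    WHERE  CUST_ID IS NOT NULL;   -- Pass output: schema identical to input

    SELECT * FROM CUSTOMER_STAGE
    WHERE  CUST_ID IS NULL;       -- Fail output: DS adds DI_ERRORACTION
                                  -- and DI_ERRORCOLUMNS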

MAP_OPERATION Transform

The Map_Operation transform allows conversions between data manipulation operations like INSERT, UPDATE, DELETE, and REJECT. It enables us to change the operation codes of input data sets to produce the desired output row type. There are four operation codes for any input row type: Normal, Update, Insert, and Delete. In addition, the DISCARD option can be assigned to the output row type. Discarded rows are not passed through to the output of the transform.

If the output record is flagged NORMAL or INSERT, it inserts a new record into the target table. If it is marked as UPDATE, it overwrites an existing row in the target table. If the input record is flagged as DELETE, it does not load the record into the target table; but if the output row type is set to DELETE, it deletes the corresponding records present in the target. If the row is marked as DISCARD, no records are passed to the output of the transform.
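The target loader then turns each opcode into DML roughly like this (hypothetical target table TGT; :id and :name stand for the row's values):

    -- Rough DML effect of each output opcode on the target table
    INSERT INTO TGT (ID, NAME) VALUES (:id, :name);   -- NORMAL / INSERT
    UPDATE TGT SET NAME = :name WHERE ID = :id;       -- UPDATE
    DELETE FROM TGT WHERE ID = :id;                   -- DELETE (output type)
    -- DISCARD: no statement is issued; the row is dropped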

ROW_GENERATION Transform

The Row_Generation transform produces a dataset with a single column. The column values start with the number we specify in the Row number starts at option. The value then increments by one, up to the number of rows set in the Row count option. This transform does not allow any input data set.
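For intuition, Row_Generation with Row number starts at = 1 and Row count = 5 yields the same single-column dataset as this recursive CTE (SQL Server syntax; other databases typically need WITH RECURSIVE):

    -- Generates ROW_NUM values 1 through 5
    WITH NUMS (ROW_NUM) AS (
        SELECT 1
        UNION ALL
        SELECT ROW_NUM + 1 FROM NUMS WHERE ROW_NUM < 5
    )
    SELECT ROW_NUM FROM NUMS;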

SQL Transform

The SQL transform is used to submit or perform standard SQL operations on the database server. The SQL transform supports a single SELECT statement only. This transform does not allow any input data set. Use this transform when other built-in transforms cannot perform the required SQL operation. Try to use this transform as your last option, as it is not optimised for performance and also reduces readability.
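For example, the SQL text placed in the transform might be a single SELECT like this one (hypothetical tables), pushing a correlated subquery down to the database:

    -- One SELECT statement is all the SQL transform accepts
    SELECT c.CUST_ID,
           c.CUST_NAME,
           (SELECT MAX(o.ORDER_DATE)
            FROM   ORDERS o
            WHERE  o.CUST_ID = c.CUST_ID) AS LAST_ORDER_DATE
    FROM   CUSTOMER c;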

KEY_GENERATION Transform

The Key_Generation transform helps to generate artificial keys for new rows in a table. The transform looks up the maximum existing key value of the surrogate key column from the table, and uses it as the starting value to generate new keys for the new rows in the input dataset. The transform expects a column with the same name as the Generated key column of the source table to be part of the input schema.

The source table must be imported into the DS repository before defining the source table for this transform. The fully qualified table name, e.g. DATASTORE.OWNER.TABLE, should be specified. We can also set the Increment value, i.e. the interval between generated key values; by default it is 1, and we can also use a variable placeholder for this option. We will be using this transform frequently while populating surrogate key values of slowly changing dimension tables.
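Conceptually, the transform begins from a lookup like the one below (hypothetical dimension table and key column), then assigns MAX + increment, MAX + 2 x increment, and so on to the incoming rows:

    -- Key_Generation's starting point, conceptually
    SELECT MAX(CUST_KEY) AS MAX_KEY
    FROM   DW.DBO.DIM_CUSTOMER;
    -- new rows then receive MAX_KEY + 1, MAX_KEY + 2, ... (increment = 1)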

TABLE_COMPARISON Transform

The Table_Comparison transform helps to compare two data sets and generates the difference between them as a resultant data set, with rows flagged as INSERT, UPDATE, or DELETE. This transform can be used to ensure rows are not duplicated in a target table, or to compare the changed records of a data warehouse dimension table. It helps to detect and forward all changes, or only the latest ones, that have occurred since the last time the comparison table was updated. We will be using this transform frequently while implementing slowly changing dimensions and while designing dataflows for recovery.

The source table must already be imported into the DS repository. The fully qualified table name, e.g. DATASTORE.OWNER.TABLE, should be specified. Also set the input dataset columns that uniquely identify each row as the Input primary key columns. These columns must be present in the comparison table with the same column names and datatypes. If the primary key value from the input data set does not match a value in the comparison table, DS generates an INSERT row. Otherwise, it generates an UPDATE row with the values from the input dataset row, after comparing all the columns in the input data set that are also present in the comparison table, apart from the primary key columns. As per your requirements, select only the required subset of non-key Compare columns, which will give a performance improvement.
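A loose SQL sketch of that decision (hypothetical input INPUT_DS, comparison table DIM_CUST, primary key CUST_ID, compare columns CUST_NAME and CITY):

    -- No match on the primary key: row is flagged INSERT
    SELECT i.*
    FROM   INPUT_DS i
    LEFT JOIN DIM_CUST t ON t.CUST_ID = i.CUST_ID
    WHERE  t.CUST_ID IS NULL;

    -- Key matches but a compare column differs: row is flagged UPDATE
    SELECT i.*
    FROM   INPUT_DS i
    JOIN   DIM_CUST t ON t.CUST_ID = i.CUST_ID
    WHERE  i.CUST_NAME <> t.CUST_NAME OR i.CITY <> t.CITY;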


If the Input primary key columns have duplicate keys, the transform arbitrarily chooses any of the rows to compare during dataflow processing, i.e. the order of the input rows is ignored. Selecting the Input contains duplicate keys check box provides a method of handling duplicate keys in the input data set.

If the comparison table contains rows with the same primary keys, the transform arbitrarily chooses any of the rows to compare. Specify the column of the comparison table with unique keys, i.e. one that by design contains no duplicates, as the Generated key column. A generated key column indicates which row of a set containing identical primary keys is to be used in the comparison. This provides a method of handling duplicate keys in the comparison table.

For an UPDATE, the output data set will contain the largest key value found for the given primary key. For a DELETE, the output data set can include all duplicate key rows or just the row with the largest key value.

When we select the check box Detect deleted row(s) from comparison table, the transform flags rows of the comparison table with the same key value as DELETE. When we select the transform options Generated key column and Detect deleted row(s) from comparison table together with the Row-by-row select or Sorted input comparison method, an additional section appears to specify how to handle DELETE rows with duplicate keys, i.e. Detect all rows or Detect row with largest generated key value.

Apart from all these properties, there are three methods for accessing the comparison table, namely Row-by-row select, Cached comparison table, and Sorted input. Below is a brief on when to select which option.

1. Row-by-row select is best if the target table is large compared to the number of rows the transform will receive as input. In this case, for every input row the transform fires a SQL query to look up the target table.

2. Cached comparison table is best when we are comparing the entire target table. DS uses pageable cache as the default. If the table fits in the available memory, we can change the Cache type property of the dataflow to In-Memory.

3. Sorted input is best when the input data is pre-sorted based on the primary key columns. DS reads the comparison table in the order of the primary key columns, using a sequential read, only once. NOTE: The order of the input data set must exactly match the order of all primary key columns in the Table_Comparison transform.

NOTE:

The transform only considers rows flagged as NORMAL as the input dataset. Be cautious when using real datatype columns in this transform, as comparison results are unpredictable for this datatype.