
Data Warehousing Lab Manual

Engr. Muhammad Waseem


SUBJECT: DATA WAREHOUSING & MINING

SUBJECT CODE: (CS-401)

LIST OF PRACTICALS

LAB NO:1 Understanding Teradata

LAB NO:2 Creating Database and Users

LAB NO:3 Creating the Tables in the Database

LAB NO:4 To be familiar with Teradata SQL Assistant

LAB NO:5 Execute the different data manipulation queries

LAB NO:6 To be familiar with the Visual Explain

LAB NO:7 Generating reports using Teradata Warehouse Miner 5.3.0 Express

LAB NO:8 Histograms generation using Teradata Warehouse Miner 5.3.0 Express

LAB NO:9 Connecting database with VB

LAB NO:10 Loading of data using FastLoad utility

LAB NO:11 To be familiar with schemas

LAB NO:12 Teradata Warehouse Builder Visual Interface

LAB NO:13 Generating frequency diagram of data using Warehouse Miner

LAB NO:14 To become familiar with Teradata Parallel Transporter Wizard 13.0

LAB NO:15 Creating a job script by using Teradata Parallel Transporter Wizard 13.0

Engr. Shakeel Ahmed Shaikh


LAB TASK # 01

Understanding Teradata

Object The purpose of this lab is to introduce you to Teradata.

Introduction Teradata provides solutions for data warehousing. TERADATA is a registered trademark of NCR International, Inc. The Teradata Tools and Utilities are a group of products designed to work with the Teradata RDBMS.

Tools • Teradata Service Control

• Teradata Administrator

Theory The Teradata RDBMS is a complete relational database management system

composed of hardware and software. The system can use either of two attachment

methods to connect to other computer systems as illustrated in the following table:

This attachment method…    Allows the system to be attached…
Channel                    Directly to an I/O channel of a mainframe computer.
Network                    To intelligent workstations through a Local Area Network (LAN).

With the Teradata RDBMS, you can access, store, and operate on data using

Teradata Structured Query Language (Teradata SQL).

Teradata Service Control

It is used to start and stop the Teradata server.

Teradata Administrator

Teradata Administrator is a Windows-based tool that interfaces with the Teradata

Database Data Dictionary to perform database administration tasks. Teradata

Administrator enables you to view the hierarchy of databases and users on a

Teradata system. You can then display information about these objects, create

new objects, or perform other maintenance functions on the system.

Procedure

Starting the Teradata server
To start the Teradata server, click on the Start menu, then Programs >> Teradata Database Express 13.0 >> Teradata Service Control.

You will see the following window.


You can see that Teradata is currently Down/Stopped. To start Teradata, click on Start Teradata!, then wait for two minutes. When the Teradata server has started, its status will be shown in the same window.

The databases and the users in the database

You can check all databases, users, tables, views, etc. by using the database browser. To start the database browser, start the Teradata Administrator program via:

Start >> Programs >> Teradata Administrator 13.0

In the 'Please select a data source' window, select tdadmin and click on the OK button.


When you click OK, you will see the Teradata Administrator window.

Result Now we know how to start the Teradata Server and how to check the different

databases and users in the database.


LAB TASK # 02 Creating Database and Users

Object To create a database and its Users

Introduction In Teradata, a database is always created inside another database. The dbc

database is the parent of all databases. The users are created for the databases to

perform different operations on the data.

Tools • Teradata Service Control

• Teradata Administrator 13.0

Theory When the Teradata RDBMS software is installed, the Teradata RDBMS contains

the following system users/databases:

• DBC

• SysAdmin

• SystemFE

Initially, the special system user DBC owns all space in the Teradata

Database. Some space is assigned from DBC to the system users and databases

named SysAdmin, SystemFE, Crashdumps, and Sys_Calendar.

Everyone higher in the hierarchy is a parent or owner. Everyone lower in the

hierarchy is a child.

Every object has one and only one creator. The creator is the user who executes

the CREATE statement.

The GRANT statement enables you to grant any of the privileges you have to

another user. For example, when logged on as user DBC you need to grant all the

privileges retained by DBC to your new DBAdmin user:

GRANT ALL ON ALL TO DBADMIN;

The GIVE statement enables you to transfer ownership of a database or user to a

non-owner. GIVE transfers to the recipient not only the specified database or user

space, but also all of the databases, users, and objects owned by that database or

user.
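As a sketch (assuming the DBAdmin user mentioned above and the Mydatabase database created later in this lab), transferring ownership of a database looks like this:

GIVE Mydatabase TO DBAdmin;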

Permanent space is allocated to a Teradata user or database for creating:

• Data tables
• Permanent journals
• Table headers of global temporary tables (one header row per table)
• Secondary indexes (SIs)
• Join indexes (JIs)
• Hash indexes (HIs)
• Stored procedures
• Triggers

Procedure Start the Teradata server using the Teradata Service Control and start the Teradata Administrator 13.0. To start the Teradata Administrator 13.0, click on the Start button:

Start >> Programs >> Teradata Administrator 13.0

In the 'Please select a data source' window, select tdadmin and click on the OK button.

You will see the following window.

When you click OK, you will see the Teradata Administrator window.


To create the database, click on the diamond icon shown in the center of the fig below.

You will see the following window.


Type the entries as shown in the table below:

Database Name    Mydatabase
Owner            Dbc
Perm Space       10
Spool Space      10
Temp Space       10

For Perm Space, Spool Space, and Temp Space, select the option.

Click on the Create button and the database will be created. See the status bar message.

To create the user, click on the user icon shown at the top of the fig below.

You will see the following window.


Type the entries as shown in the table below:

User Name          Ahmed
Owner              Mydatabase
Password           Ahmed
Perm Space         10
Spool Space        10
Temp Space         10
Account
Default Database   mydatabase

Click on the Create button and the user will be created.

You can see the database and its user in the Teradata Administrator 13.0 as shown

in the fig below.


Finally, grant the privileges to the user Ahmed.
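For example, a minimal sketch (run as DBC in BTEQ or SQL Assistant; adjust the privilege list to your needs):

GRANT ALL ON Mydatabase TO Ahmed;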

Result Now we are familiar with Teradata Administrator 13.0 and we can create databases and users.


LAB TASK # 03 Creating the tables in the database

Object To create the tables in the database using BTEQ

Introduction The tables created in a relational database management system store the data. They are created within a database by a user. Tables consist of rows and columns. The rows store the records and the columns store data of the same type.

Tools • Teradata Service Control

• BTEQ

Theory

BTEQ is used to execute SQL queries. You can start many sessions of BTEQ at one time.

BTEQ is an abbreviation of Basic Teradata Query. It is a general-purpose,

command-based program that allows users on a workstation to communicate with

one or more Teradata RDBMS systems, and to format reports for both print and

screen output. Using BTEQ you can submit SQL queries to the Teradata RDBMS.

BTEQ formats the results and returns them to the screen, a file, or to a designated

printer.

A BTEQ session provides a quick and easy way to access a Teradata RDBMS. In

a BTEQ session, you can do the following:

• Enter Teradata SQL statements to view, add, modify, and delete data.

• Enter BTEQ commands.

• Enter operating system commands.

• Create and use Teradata stored procedures from a BTEQ session.

Procedure

Start BTEQ. To start BTEQ, click on the Start menu, then Programs >> Teradata Client >> BTEQ.

In the BTEQ window, type the following commands to log on:

.logon
UserId: Asad
Password: Lodhi

The session is shown in the fig below.

Now you can create tables by executing the following SQL command.

CREATE TABLE Event (

account_id char(12) CHARACTER SET LATIN,

account_type char(2) CHARACTER SET LATIN,

Complaint_id integer ,

Complaint_detail varchar(50) CHARACTER SET LATIN,

Actions_taken varchar(20) CHARACTER SET LATIN,

Remarks varchar(50) CHARACTER SET LATIN

);


When you execute the above command in BTEQ, the table will be created and the following message will be displayed.

*** Table has been created.

*** Total elapsed time was 1 second.
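As a quick check, you can display the table definition or count its rows in the same BTEQ session (a sketch, assuming the logon above):

SHOW TABLE Event;
SELECT COUNT(*) FROM Event;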

Result

We are familiar with the BTEQ and we can create the tables using this utility.


LAB TASK # 04

Teradata SQL Assistant

Object

To be familiar with Teradata SQL Assistant

Introduction

Designed to provide a simple way to execute and manage your queries against a Teradata or other ODBC-compliant database, SQL Assistant stores your queries for easy re-use and provides you with an audit trail that shows the steps that produced your current results.

Tools

• Teradata Service Control

• Teradata SQL Assistant 6.1

Theory

There are several tools for executing SQL queries, but Teradata SQL Assistant 6.1 is the easiest, visual query-submission tool. Any kind of query can be executed using this utility. You can create new databases, tables, views, macros, etc. The data present in the tables can also be manipulated.

Procedure

Start SQL Assistant

To start SQL Assistant, click on the following menu from the Windows Start bar: Start >> Programs >> Teradata SQL Assistant 6.1.

You will see the following window. Now connect it with the DemoTDAT data source.


Now type any valid SQL query and execute it by pressing the F5 key. The results will be displayed. The results of a query are shown in the following figure.
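If you need a query to try, here is a small sketch (it assumes the thesis.Event table used in the next lab already exists and contains data):

select account_type, count(*) as cnt
from thesis.Event
group by 1
order by 2 desc;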


Result Now we are familiar with the Teradata SQL Assistant 6.1. We can execute any

query using this utility.

LAB TASK # 05

Data manipulation

Object

Execute the different data manipulation queries.

Introduction


Data manipulation statements affect one or more table rows.

Tools

• Teradata Service Control

• BTEQ

• Teradata SQL Assistant 6.1

Theory

Some of the data manipulation statements and their purposes are given in the following table:

Command Purpose

ABORT Terminates the current transaction.

BEGIN TRANSACTION Defines the beginning of a single logical transaction.

CALL Invokes a stored procedure.

CHECKPOINT Places a flag in a journal table that can be used to coordinate

transaction recovery.

COMMENT Adds or replaces an object comment.

COMMIT Terminates the current ANSI SQL transaction and commits all

changes made within it.

DELETE Removes rows from a table.

ECHO Returns a fixed character string to the requestor.

END TRANSACTION Defines the end of a single logical transaction.

EXECUTE (Macro Form) Performs a macro.

EXPLAIN Modifier Reports a summary of the plan generated by the SQL query

optimizer: the steps Teradata would perform to resolve a request.

The request itself is not processed.

GROUP BY Clause Groups result rows of a SELECT query by the values in one or

more columns.

HAVING Clause Specifies a conditional expression that must be satisfied by the

rows in a SELECT query to be included in the resulting data.

INCLUDE

INSERT Adds new rows to a named table by directly specifying the row

data to be inserted (valued form) or by retrieving the new row

data from another table (selected form).

LOCKING Modifier Locks a database, table, view, or row hash, overriding the default

usage lock that Teradata places on a database, table, view, or row

hash in response to a request.

MERGE Merges a source row into a target table based on whether any

target rows satisfy a specified matching condition with the source

row.

ORDER BY Clause Specifies how result data in a SELECT statement is to be ordered.

QUALIFY Clause Eliminates rows from a SELECT query based on the value of a

computation.

ROLLBACK Terminates and rolls back the current transaction.


SAMPLE Clause Selects rows from a SELECT query for further processing to

satisfy a conditional expression expressed in a WHERE clause.

SELECT Returns selected rows in the form of a result table.

SELECT INTO Returns a selected row from a table and assigns its values to host

variables.

UPDATE (Searched Form) Modifies field values in existing rows of a table.

UPDATE (Upsert Form) Updates column values in a specified row and, if the row does not

exist, inserts it into the table with a specified set of initial column

values.

USING Row Descriptor Defines one or more variable parameter names.

WHERE Clause Selects rows in a SELECT query that satisfy a conditional

expression.

WITH Clause Specifies summary lines and breaks (grouping conditions) that

determine how selected results from a SELECT query are

returned (typically used for subtotals).
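As a small sketch against the thesis.Event table used in the procedure below (the column values are invented purely for illustration), typical data manipulation statements look like this:

INSERT INTO thesis.Event (account_id, account_type, Complaint_id, Complaint_detail, Actions_taken, Remarks)
VALUES ('ACC000000001', '01', 101, 'Late billing', 'Refund issued', 'Closed');

UPDATE thesis.Event
SET Actions_taken = 'Escalated'
WHERE Complaint_id = 101;

DELETE FROM thesis.Event
WHERE Complaint_id = 101;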

Procedure

Open any query-executing tool; here we are using Teradata SQL Assistant 6.1.

To start SQL Assistant click on the following menu from the Windows

Start bar: Start >> Programs >> Teradata SQL Assistant 6.1.

Connect Teradata SQL Assistant 6.1 with DemoTDAT by clicking on the connection button.

Type the following SQL SELECT query in the Query window and press F5 to execute it.

select Complaint_detail as vent_f_Compalint_detail, b, rank(b) as rnk
from (select Complaint_detail, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_detail, b)
qualify rnk < 25;

You will see the following result.

Execute the other sample query, which is shown below.

select Actions_taken as vent_f_Actions_taken, b, rank(b) as rnk
from (select Actions_taken, count(*) as cnt from thesis.Event group by 1) as foo(Actions_taken, b)
qualify rnk < 25;


After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

select account_type as Eventaccount_type, b, rank(b) as rnk
from (select account_type, count(*) as cnt from thesis.Event group by 1) as foo(account_type, b)
qualify rnk < 25;

After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

select Complaint_detail as EventCompalint_detail, b, rank(b) as rnk
from (select Complaint_detail, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_detail, b)
qualify rnk < 25;

After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

select account_id as Eventaccount_id, b, rank(b) as rnk
from (select account_id, count(*) as cnt from thesis.Event group by 1) as foo(account_id, b)
qualify rnk < 25;

After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

select Complaint_id as EventComapint_id, b, rank(b) as rnk
from (select complaint_id, count(*) as cnt from thesis.Event group by 1) as foo(Complaint_id, b)
qualify rnk < 25;

After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

select Actions_taken as EventActions_taken, b, rank(b) as rnk
from (select Actions_taken, count(*) as cnt from thesis.Event group by 1) as foo(Actions_taken, b)
qualify rnk < 25;

After executing the above query you will find the following result.

Execute the other sample query, which is shown below.

explain select * from thesis.event;

After executing the above query you will find the following result.

Explanation

-------------------------------------------------------------------------

1) First, we lock a distinct thesis."pseudo table" for read on a

RowHash to prevent global deadlock for thesis.event.


2) Next, we lock thesis.event for read.

3) We do an all-AMPs RETRIEVE step from thesis.event by way of an

all-rows scan with no residual conditions into Spool 1

(group_amps), which is built locally on the AMPs. The input table

will not be cached in memory, but it is eligible for synchronized

scanning. The size of Spool 1 is estimated with low confidence to

be 12,936 rows. The estimated time for this step is 0.31 seconds.

4) Finally, we send out an END TRANSACTION step to all AMPs involved

in processing the request.

-> The contents of Spool 1 are sent back to the user as the result of

statement 1. The total estimated time is 0.31 seconds.

Similarly, we can execute other commands in the query editor of Teradata SQL Assistant 6.1.

Result

We are familiar with the data manipulation statements.


LAB TASK # 06 Visual Explain

Object

To be familiar with the visual Explain

Introduction

The Visual Explain Demo provides a visual depiction of the execution plan

chosen by the Teradata Database Optimizer to access data.

Tools

• Teradata Service Control

• Teradata Visual Explain 3.0

Theory

Visual Explain produces this depiction by turning the output text of the EXPLAIN modifier into a series of easily readable icons. We will use 7 queries in the Visual Explain lab.

Procedure

Start the Visual Explain and Compare Utility

Start>>Programs>>Teradata>>Visual Explain 3.0

• Connect to the Teradata Database by clicking the green connect icon (looks like a plug)
• Highlight “DemoTDAT”
• Click OK


• Click on File>>Open Plan from Database…

• Under Selection fill in the Database name: QCD

• Click on Browse QCD… button. Note: Make sure Query Tag field is

blank.

• A list of seven queries appears; click the checkbox for the first query. Select the first item and click ADD; the entry now appears on the right-hand side.

• Click OPEN, the query plan will load

• The visual plan now appears

• A summary will appear on top of the plan, click the X in the upper-right

corner to close it.

• Moving the mouse over the plan components will display various pieces

of information about the plan


Result Now we are familiar with the visual explain.


LAB TASK # 07

Generating reports using the Miner

Object

Generating the frequency diagrams of our data using the Miner

Introduction

Compute frequency of column values or multi-column combined values.

Optionally, compute frequency of values for pairs of columns in a single column

list or two column lists.

Tools

• Teradata Service Control

• Teradata Warehouse Miner

Theory

Frequency analysis is designed to count the occurrence of individual data values

in columns that contain categorical data. It can be useful in understanding the

meaning of a particular data element, and it may point out the need to recode

some of the data values found, either permanently or in the course of building an

analytic data set. This function can also be useful in analyzing combinations of

values occurring in two or more columns.
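Under the hood, a frequency analysis is essentially a grouped count. As a SQL sketch (assuming the Employee table with a Deptno column that is used in the procedure below):

SELECT Deptno, COUNT(*) AS freq
FROM Employee
GROUP BY Deptno
ORDER BY freq DESC;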

Procedure

To generate the frequency diagrams, start the Teradata Warehouse Miner by clicking its icon on the desktop.

Connect it with the thesis database.

Start the new project by clicking the new project icon or from file menu.

Now click the Project menu>>Add new Analysis.

In the Analysis window, from categories pane select Descriptive Statistics and

click on the Frequency icon in the Analysis pane

and then press the OK button.


• Select the Employee table from available tables.

• Select Deptno column.

• Click the right arrow to move the Deptno into the selected columns,

as shown in the fig below.


Start the report generation by clicking the run button or using the shortcut F5. The status can be seen in the execution status pane, as shown below.

The resultant report can be viewed by clicking on the Results icon in the

frequency window.

Output can be viewed as Data, Graph, or SQL, as shown below. Click on the Graph icon in the Frequency window.

The resultant graph will be displayed as shown below.

Result

We are familiar with frequency diagrams and we know how to generate them.


LAB TASK # 08 Histograms generation

Object

To generate the histograms of data

Introduction

Determine the distribution of a numeric column(s) giving counts with optional

overlay counts and statistics. Optionally sub-bin numeric column(s) and

determine data "spikes" giving additional counts as an option.

Tools

• Teradata Service Control

• Teradata Warehouse Miner

Theory

Histogram analysis is designed to study the distribution of continuous numeric

values in a column by providing the data necessary to create a histogram graph.

This type of analysis is sometimes also referred to as binning because it counts the

occurrence of values in a series of numeric ranges called bins.
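As a rough SQL sketch of binning (the integer Complaint_id column of thesis.Event and a bin width of 100 are assumed purely for illustration):

SELECT (Complaint_id / 100) * 100 AS bin_start, COUNT(*) AS bin_count
FROM thesis.Event
GROUP BY 1
ORDER BY 1;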

Procedure

To generate the histograms, start the Teradata Warehouse Miner by clicking its icon on the desktop. Connect it with the thesis database.

Start the new project by clicking the new project icon or from file menu. Now

click the Project menu>>Add>>Descriptive Statistics.

Then click on the Histogram icon in the Add Descriptive Statistics Function

window and press the OK button.


Select the event table from available tables, and move the account_id into the

selected Bin columns and Aliases as shown in the fig below.

Start the report generation by pressing F5 or choosing Run >> Start.


The resultant report can be checked by clicking on the Results icon in the new

project window.

Click on the Histogram Graph icon in the Analysis Results window.

The resultant graph will be displayed as shown below.

Result

We are familiar with histograms and we know how to generate them.


LAB TASK # 09

Connecting database with VB

Object

To connect the database with VB

Introduction

Visual Basic is a language used to make different software. 90% of VB applications involve databases.

Tools

1. Teradata Service Control

2. Visual Basic 6.0

Theory

We use the Adodc data control to connect with the database. We define the data sources and then connect the Adodc control with those data sources. The following are the drivers which VB supports for databases.


Procedure

To connect any database with VB, we must set the driver for the data source. To do this, click on the Data Sources icon in the Control Panel.

The following window will be displayed.

Click on the Add button.


In the Create New Data Source window, select Teradata and click on the Finish button.


Fill in the entries as shown above, or according to your requirements, then click on the OK button.

Click Yes for the warning message.

You can check the Event data source name, with the Teradata driver, in the ODBC Data Source Administrator window. Close the ODBC Data Source Administrator window.

Start a new Standard EXE project in Visual Basic and select the Adodc data control from the components. Place the Adodc control on the form and right-click on it. Select Properties from the popup menu.

Select Event in the Use ODBC Data Source Name field under the General tab.


Write the user name and password under the Authentication tab.

Select 2 - adCmdTable in the Command Type and Event in the Table or Stored Procedure Name under the RecordSource tab.

Click OK to close the properties page.

Select the DataGrid control from the components and place it on the VB form. Make it bigger so that the data can be seen easily.


Now set the DataSource property of the grid control to Adodc1.

Press F5 to run the VB project; the data will be shown in the grid control as shown below.

Result

We have learned from this lab how to view the data placed in a data warehouse using VB.


LAB TASK # 10

Loading of data

Object

To load the data into the tables using the FastLoad utility

Introduction

To load data, we have utilities such as FastLoad, BTEQ, TPump, MultiLoad, and tbuild. FastLoad is used to load data into empty tables.

Theory

FastLoad is a command-driven utility you can use to quickly load large amounts

of data in an empty table on a Teradata Relational Database Management System

(RDBMS).

You can load data from:

• Disk or tape files on a channel-attached client system

• Input files on a network-attached workstation

• Special input module (INMOD) routines you write to select, validate, and

preprocess input data

• Any other device providing properly formatted source data

FastLoad uses multiple sessions to load data. However, it loads data into only one

table on a Teradata RDBMS per job. If you want to load data into more than one

table in an RDBMS, you must submit multiple FastLoad jobs—one for each table.

Procedure

To start FastLoad, click on the Start menu, then Programs >> Teradata Client >> FastLoad.

You will see the following Fastload screen.


The following script is used to create one table and then load data into that table from a flat file. Run the commands of the script in FastLoad. You will see that the data is loaded into the table.

LOGON dbc/dbc , dbc;

.set record unformatted

DATABASE thesis;

DROP table Event;

drop table Event_error1;

drop table Event_error2;

CREATE TABLE Event (

account_id char(12) CHARACTER SET LATIN,

account_type char(2) CHARACTER SET LATIN,

Complaint_id integer ,

Complaint_detail varchar(50) CHARACTER SET LATIN,

Actions_taken varchar(20) CHARACTER SET LATIN,


Remarks varchar(50) CHARACTER SET LATIN

);

DEFINE

account_id (char(12)),

account_type (char(2)),

Complaint_id (char(5)),

Complaint_detail (char(50)),

Actions_taken (char(20)),

Remarks (char(50)),

newline3 (char(2))

FILE= c:\fl\events.txt;

BEGIN LOADING

Event

ERRORFILES

Event_ERROR1,Event_ERROR2

CHECKPOINT 10000;

INSERT INTO Event

(

account_id ,

account_type ,

Complaint_id ,

Complaint_detail ,

Actions_taken ,

Remarks

)

VALUES


(

:account_id ,

:account_type ,

:Complaint_id ,

:Complaint_detail ,

:Actions_taken ,

:Remarks

);

END LOADING

;

.LOGOFF;

Here we are loading the data from the events.txt file. You can retrieve that data using BTEQ or any other utility. Here is the sample data.
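After the load completes, a quick check in BTEQ or SQL Assistant (a sketch) confirms the rows arrived:

SELECT COUNT(*) FROM thesis.Event;
SELECT * FROM thesis.Event SAMPLE 10;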

Result

Now we are familiar with the loading of data.


LAB TASK # 12

Teradata Warehouse Builder Visual Interface

Object

To run the job scripts in the Teradata Warehouse Builder Visual Interface.

Introduction

The warehouse builder is used to create the warehouse. We can create the DBMS,

Schema, Table, TableSet, Operator, Logger, LogView, and Job by using the

Teradata Warehouse Builder Visual Interface.

Tools

1. Teradata Service Control

2. Teradata Warehouse Builder Visual Interface

Theory

This demo script will show you how to familiarize yourself with the application and how to run and verify a predefined data load job. In this short lesson we will:

Start the Teradata Warehouse Builder Visual Interface application

Run a predefined data load job.

Check that data was loaded into the warehouse as a result of the job, using Teradata Administrator (WinDDI)

The following is a graphical representation of the tasks that will be performed in

Demo script 1. The other scripts are graphically represented at the end of this

demo.

Figure: Demo script 1 data flow. A flat file (read through a Data Connector operator) and a Teradata source (read through an ODBC operator) are combined with a UNION ALL and written to Teradata through Update operators.


Procedure

Start the Teradata Warehouse Builder Visual Interface application

To start the Teradata Warehouse Builder Visual Interface, click on the following menu from the Windows Start bar:

Start >> Programs >> Teradata Client >> Teradata Warehouse Builder Visual Interface, or double-click on the desktop shortcut.

The Teradata Warehouse Builder application window will open:

Click “+” sign at the left of Job. A list of predefined jobs will be shown.

Click “+” sign at the left of “Demo_01_setup” job. This will show nothing at this point.

Click “Demo_01_setup” to highlight this job then right mouse click.

Select “Submit” to run this job.


Enter a name like “run-01” and click OK button.


Answer OK to the pop-up window saying job is being started.

If the following window does not appear, expand “Demo_01_setup” and click on the run-01 selection. Under the job you will see the name of the running job, and in the Job Output window a message saying the job has started. Next click on the “Job Details” tab.

You will see the details of the setup tasks that are being performed. Wait until it

terminates then click back on the “Job Output” tab.


Back in the Job State output window you will see the summary of the completed tasks.

Also, note that the icon for the job you just ran is now a checkered flag, indicating the job has finished.

Next select “DEMO_01” job to highlight it.


Again, right-click and select Submit to run this new job.

As before, fill in a job name like “run-01” and click on OK. You will also click on OK to close the message box.

If you don’t see any job information to the right, then click the “+” sign to the left of the job name to reveal the running job underneath.

As before, click on the “Job Details” tab to see the tasks being performed. For this job you will see much more information. This job loads data from a flat file and merges it with a record read from Teradata via an ODBC connection.


Watch the job as it executes the various stages. This particular job will take several minutes to complete as it loads approximately 100,000 rows.


Back on the “Job Output” tab you will see a summary of all the steps completed.

Now let us have a look at the warehouse to see if the rows were indeed loaded. Bring up Teradata Administrator via Start >> Programs >> Teradata Administrator 6.0.

Double-click DemoTDAT then OK to access the data. If you are not familiar with Teradata Administrator then run through its demo script first.

Navigate down the left hand side until you find “twbdemo” database/user. Double click

on “twbdemo”.


Next select the “twb_target_table”, which is where we just loaded data from

“Demo_01”.

Right mouse click and select “Row Count”.


You will see the count of 100,001 rows. 100,000 rows were added from flat files and 1

row came from the ODBC connection to “twb_source_table”.
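The same check can also be run as a query (a sketch; database and table names as used in this demo):

SELECT COUNT(*) FROM twbdemo.twb_target_table;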

Result

We are familiar with the Teradata warehouse builder visual interface.


LAB TASK # 13

Generating the Frequency Diagrams using the Miner

Object

Generating the frequency diagrams of our data using the Miner

Introduction

Compute frequency of column values or multi-column combined values.

Optionally, compute frequency of values for pairs of columns in a single column

list or two column lists.

Tools

1. Teradata Service Control

2. Teradata Warehouse Miner

Theory

Frequency analysis is designed to count the occurrence of individual data values

in columns that contain categorical data. It can be useful in understanding the

meaning of a particular data element, and it may point out the need to recode

some of the data values found, either permanently or in the course of building an

analytic data set. This function can also be useful in analyzing combinations of

values occurring in two or more columns.

Procedure

To generate the frequency diagrams, start the Teradata Warehouse Miner by clicking its icon on the desktop.

Connect it with the thesis database.

Start the new project by clicking the new project icon or from file menu.

Now click the Project menu>>Add new Analysis.


In the Analysis window, from categories pane select Descriptive Statistics and

click on the Frequency icon in the Analysis pane

and then press the OK button.

Select the Employee table from available tables.

Select Deptno column.

Click the right arrow to move the Deptno into the selected columns,

as shown in the fig below.


Start the report generation by clicking the run button or using the shortcut F5. The status can be seen in the execution status pane, as shown below.

The resultant report can be viewed by clicking on the Results icon in the

frequency window.


Output can be viewed as Data, Graph, or SQL, as shown below. Click on the Graph icon in the Frequency window.

The resultant graph will be displayed as shown below.

Result

We are familiar with frequency diagrams and we know how to generate them.


LAB TASK # 11

Implementing Schemas

Object

To be familiar with schemas

Introduction

A schema is a set of metadata definitions about the columns and rows of a data

source or destination object, such as:

• Data types and column sizes
• Precision, scale, and null-value indicators
• Database tables, columns, and rows

Tools

1. Teradata Service Control

2. BTEQ

3. Tbuild

4. Teradata Administrator 6.0

Theory

Teradata WB uses schema definitions, which are similar to SQL’s table

definitions. The schema definitions used in Teradata WB:

• Represent virtual tables. They do not have to correspond to any actual tables in the Teradata RDBMS.
• Contain column definitions: names and data types.
• Act as reusable templates.
• Describe the contents of various data sources and targets, such as files, relational tables, etc.
• Are similar to record layout definitions used by the Teradata load and unload utilities.

Procedure

Run the following code in the BTEQ utility to create the tables.

.LOGON dbc/Asad,Lodhi;

DATABASE thesis;

drop table RL_Event;


DROP table Event;

drop table Event_error1;

drop table Event_error2;

CREATE TABLE Event (
account_id char(12) CHARACTER SET LATIN,
account_type char(2) CHARACTER SET LATIN,
Complaint_id integer ,
Complaint_detail varchar(50) CHARACTER SET LATIN,
Actions_taken varchar(20) CHARACTER SET LATIN,
Remarks varchar(50) CHARACTER SET LATIN
);

.LOGOFF;

Then execute the following code in the tbuild utility:

DEFINE JOB PRODUCT_SOURCE_LOAD

DESCRIPTION 'LOAD PRODUCT DEFINITION TABLE'

(

DEFINE SCHEMA PRODUCT_SOURCE_SCHEMA

DESCRIPTION 'PRODUCT INFORMATION SCHEMA'

(

account_id char(12),

account_type char(2),

Complaint_id char(5),

Complaint_detail char(50),

Actions_taken char(20),


Remarks char(50),

newline3 char(2)

);

DEFINE OPERATOR LOAD_OPERATOR ()

DESCRIPTION 'TERADATA WB LOAD OPERATOR'

TYPE CONSUMER

INPUT SCHEMA *

EXTERNAL NAME 'libldop'

ALLOW PARALLEL MULTIPHASE

MSGCATALOG 'pcommon'

ATTRIBUTES

(

VARCHAR PauseAcq ,

INTEGER ErrorLimit = 50,

INTEGER BufferSize ,

INTEGER TenacityHours,

INTEGER TenacitySleep,

INTEGER MaxSessions = 2,

INTEGER MinSessions,

INTEGER RowInteval,

VARCHAR TdpID = 'dbc',

VARCHAR UserName = 'Asad',

VARCHAR UserPassword = 'Lodhi',

VARCHAR AccountID,


VARCHAR TargetTable = 'Event',

VARCHAR ErrorTable1 = 'Event_ERROR1',

VARCHAR ErrorTable2 = 'Event_ERROR2',

VARCHAR LogTable = 'RL_Event',

VARCHAR PrivateLogName ,

VARCHAR WorkingDatabase = 'thesis'

) ;

DEFINE OPERATOR DATACON

DESCRIPTION 'TERADATA WB DATACONNECTOR OPERATOR'

TYPE PRODUCER

OUTPUT SCHEMA PRODUCT_SOURCE_SCHEMA

EXTERNAL NAME 'libdtac'

ALLOW PARALLEL MULTIPHASE

MSGCATALOG 'pdatacon'

ATTRIBUTES

(

VARCHAR AccessModuleName ,

VARCHAR PrivateLogName ,

VARCHAR DirectoryPath = 'c:\fl',

VARCHAR FileName = 'events.txt',

VARCHAR IndicatorMode = 'N',

VARCHAR OpenMode = 'read' ,

VARCHAR Format = 'UNFORMATTED'

) ;


APPLY

(

'INSERT INTO Event (

account_id ,

account_type ,

Complaint_id ,

Complaint_detail ,

Actions_taken ,

Remarks ) VALUES (

:account_id ,

:account_type ,

:Complaint_id ,

:Complaint_detail ,

:Actions_taken ,

:Remarks );

')

TO OPERATOR ( LOAD_OPERATOR() [1])

SELECT * FROM OPERATOR

( DATACON());

) ;


tbuild is a utility which is used to implement schemas on the tables, execute different jobs, and build operators. The two pieces of code above define and create a complete warehouse.

You can check the data using any query-executing tool.
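For example (a sketch; the script file name is invented, and -f is the standard tbuild option for naming the script file), you could save the job definition above to a file, submit it, and then verify the load:

tbuild -f c:\fl\product_source_load.txt product_load_run1

SELECT COUNT(*) FROM thesis.Event;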

Result

Now we are familiar with schemas and we know how to develop a warehouse.


LAB TASK # 14

Object

To become familiar with Teradata Parallel Transporter Wizard 13.0

Tools

1. Teradata Service Control

2. Tbuild

3. Teradata Parallel Transporter Wizard 13.0

Introduction

Teradata PT is an object-oriented client application that provides scalable, high-speed, parallel

data:

• Extraction
• Loading
• Updating

These capabilities can be extended with customizations or with third-party products. Teradata PT uses and expands on the functionality of the traditional Teradata extract and load utilities, that is, FastLoad, MultiLoad, FastExport, and TPump, also known as standalone utilities.

Teradata PT supports:

• Process-specific operators: Teradata PT jobs are run using operators. These are discrete

object-oriented modules that perform specific extraction, loading, and updating

processes.

• Access modules: These are software modules that give Teradata PT access to various

data stores.

• A parallel execution structure: Teradata PT can simultaneously load data from multiple

and dissimilar data sources into, and extract data from, Teradata Database. In addition,

Teradata PT can execute multiple instances of an operator to run multiple and concurrent


loads and extracts and perform inline updating of data. Teradata PT maximizes

throughput performance through scalability and parallelism.

Basic Processing

Teradata PT can load data into, and export data from, any accessible database object in the

Teradata Database or other data store using Teradata PT operators or access modules.

Multiple targets are possible in a single Teradata PT job. A data target or destination for a

Teradata PT job can be any of the following:

• Databases (both relational and non-relational)

• Database servers

• Data storage devices

• File objects, texts, and comma separated values (CSV)

When job scripts are submitted, Teradata PT can do the following:

• Analyze the statements in the job script.

• Initialize its internal components.

• Create, optimize, and execute a parallel plan for completing the job by:

• Creating instances of the required operator objects.

• Creating a network of data streams that interconnect the operator instances.

• Coordinating the execution of the operators.

• Coordinate checkpoint and restart processing.

• Restart the job automatically when the Teradata Database signals restart.

• Terminate the processing environments.

Between the data source and destination, Teradata PT jobs can:

• Retrieve, store, and transport specific data objects using parallel data streams.

• Merge or split multiple parallel data streams.

• Duplicate data streams for loading multiple targets.

• Filter, condition, and cleanse data.

Teradata PT Parallel Environment

Although the traditional Teradata standalone utilities offer load and extract functions, these

utilities are limited to a serial environment. The figure below illustrates the parallel environment of Teradata PT.


Traditional Teradata Utilities vs. Teradata Parallel Transporter

Teradata PT uses data streams that act as a pipeline between operators. With data streams, data

basically flows from one operator to another.

Teradata PT supports the following types of environments:

• Pipeline Parallelism
• Data Parallelism

Pipeline Parallelism

Teradata PT pipeline parallelism is achieved by connecting operator instances through data

streams during a single job.

• An export operator on the left that extracts data from a data source and writes it to the

data stream.

• A filter operator extracts data from the data stream, processes it, then writes it to another

data stream.

• A load operator starts writing data to a target as soon as data is available from the data

stream.

All three operators, each running its own process, can operate independently and concurrently.

As the figure shows, data sources and destinations for Teradata PT jobs can include:


• Databases (both relational and non-relational)

• Database servers

• Data storage devices, such as tapes or DVD readers

• File objects, such as images, pictures, voice, and text

Data Parallelism

The figure below shows how larger quantities of data can be processed by partitioning source data into a number of separate sets, with each partition handled by a separate instance of an operator.


Teradata PT Data Parallelism

Verifying the Teradata PT Version

To verify the version of Teradata PT you are running, issue a tbuild command (on the command

line) with no options specified, as follows:

tbuild

Switching Versions

Multiple versions of Teradata Warehouse Builder (Teradata WB) and Teradata PT can be installed.

Result

We have now become familiar with Teradata Parallel Transporter Wizard 13.0.


LAB TASK # 15

Object

Creating a job script by using Teradata Parallel Transporter Wizard 13.0.

Tools

1. Teradata Service Control

2. Tbuild

3. Teradata Parallel Transporter Wizard 13.0

Introduction

Creating a job script requires that you define the job components in the declarative section of the

job script, and then apply them in the executable section of the script to accomplish the desired

extract, load, or update tasks. The object definition statements in the declarative section of the

script can be in any order as long as they appear prior to being referenced by another object.

The following sections describe how to define the components of a Teradata PT job script.

• Defining the Job Header and Job Name

• Defining a Schema

• Defining Operators

• Coding the Executable Section

• Defining Job Steps

Defining the Job Header and Job Name

A Teradata PT script starts with an optional header that contains general information about the job, and the required DEFINE JOB statement that names and describes the job, as shown in the figure below.


Job Header and Job Name

Consider the following when creating the job header and assigning the job name.

• The Script Name shown in the job header is optional, and is there for quick reference. It can be the same as the jobname or it can be the filename for the script.
• The jobname shown in the DEFINE JOB statement is required. It is best to use a descriptive name; in the case of the example script, something like “Two Source Bulk Update.”

Note that the jobname shown in the DEFINE JOB statement is not necessarily the same as the

“jobname” specified in the tbuild statement when launching the job, although it can be. The

tbuild statement might specify something like “Two Source Bulk Updateddmmyy,” to

differentiate a specific run of the job.
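A minimal sketch of such a header (the comment block and job name here are invented for illustration):

/* Script Name: two_source_bulk_update.txt */
/* Description: bulk update from two sources */
DEFINE JOB Two_Source_Bulk_Update
DESCRIPTION 'Two Source Bulk Update'
(
  /* schema, operator, and APPLY definitions go here */
);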

Defining a Schema

Teradata PT requires that the job script describe the structure of the data to be processed, that is, the columns in table rows or fields in file records. This description is called the schema.

Schemas are created using the DEFINE SCHEMA statement.

The value following the keyword SCHEMA in a DEFINE OPERATOR statement identifies the

schema that the operator will use to process job data. Schemas specified in operator definitions

must have been previously defined in the job script. To determine how many schemas you must


define, observe the following guidelines on how and why schemas are referenced in operator

definitions (except standalone operators):

• The schema referenced in a producer operator definition describes the structure of the source data.

• The schema referenced in a consumer operator definition describes the structure of the data that will be loaded into the target. The consumer operator schema can be coded as SCHEMA * (a deferred schema), which means that it will accept the schema of the output data from the producer.

• You can use the same schema for multiple operators.

• You cannot use multiple schemas within a single operator, except in filter operators, which use

two schemas (input and output).

• The column names in a schema definition in a Teradata PT script do not have to match the

actual column names of the target table, but their data types must match exactly. Note that when a Teradata PT job is processing character data in the UTF-16 character set, all

CHAR(m) and VARCHAR(n) schema columns will have byte count values m and n,

respectively, that are twice the character count values in the corresponding column definitions of

the DBS table. Because of this, m and n must be even numbers.

The following is an example of a schema definition:
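(A sketch based on the Event layout used in the schemas lab, Lab 11, shown here for illustration.)

DEFINE SCHEMA EVENT_SCHEMA
DESCRIPTION 'EVENT RECORD SCHEMA'
(
account_id char(12),
account_type char(2),
Complaint_id char(5),
Complaint_detail char(50),
Actions_taken char(20),
Remarks char(50)
);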

Defining Operators

Choosing operators for use in a job script is based on the type of data source, the

characteristics of the target tables, and the specific operations to be performed.

Teradata PT scripts can contain one or more of the following operator types.

• Producer operators “produce” data streams after reading data from data sources.

• Consumer operators “consume” data from data streams and write it to target tables or files.

• Filter operators read data from input data streams, perform operations on the data or filter it,

and write it to output data streams. Filter operators are optional.

• Standalone operators issue Teradata SQL statements or host operating system commands to

set up or clean up jobs; they do not read from, or write to, the data stream.

Coding the Executable Section

After defining the Teradata PT script objects required for a job, you must code the executable

(processing) statement to specify which objects the script will use to execute the job tasks and

the order in which the tasks will be executed. The APPLY statement may also include data


transformations by including filter operators or through the use of derived columns in its

SELECT FROM.

A job script must always contain at least one APPLY statement, and if the job contains

multiple steps, each step must have an APPLY statement.

Coding the APPLY Statement

An APPLY statement typically contains two parts, which must appear in the order shown:

1. A DML statement (such as INSERT, UPDATE, or DELETE) that is applied TO the consumer operator that will write the data to the target, as shown in the figure below. The statement may also include a conditional CASE or WHERE clause.

2. For most jobs, the APPLY statement also includes the read activity, which uses a SELECT

FROM statement to reference the producer operator. If the APPLY statement uses a standalone

operator, it does not need the SELECT FROM statement.


Note: In Figure below, the SELECT statement also contains the UNION ALL statement to

combine the rows from two SELECT operations against separate sources, each with its own

operator.
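A bare-bones sketch of that shape (the FILE_READER and ODBC_READER producer operator names are hypothetical, and the Event columns are reused from the schemas lab):

APPLY
('INSERT INTO Event (account_id, account_type, Complaint_id, Complaint_detail, Actions_taken, Remarks)
VALUES (:account_id, :account_type, :Complaint_id, :Complaint_detail, :Actions_taken, :Remarks);')
TO OPERATOR ( LOAD_OPERATOR [1] )
SELECT * FROM OPERATOR ( FILE_READER [1] )
UNION ALL
SELECT * FROM OPERATOR ( ODBC_READER [1] );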

Defining Job Steps

Job steps are units of execution in a Teradata PT job. Using job steps is optional, but when

used, they can execute multiple operations within a single Teradata PT job. Job steps are

subject to the following rules:

• A job must have at least one step, but jobs with only one step do not need to use the STEP syntax.

• Each job step contains an APPLY statement that specifies the operation to be performed

and the operators that will perform it.

• Most job steps involve the movement of data from one or more sources to one or more

targets, using a minimum of one producer and one consumer operator.

• Some job steps may use a single standalone operator, such as:

• DDL operator, for setup or cleanup operations in the Teradata Database.

• The Update operator, for bulk delete of data from the Teradata Database.

• OS Command operator, for operating system tasks such as file backup.

Using Job Steps

Job steps are executed in the order in which they appear within the DEFINE JOB statement.

Each job step must complete before the next step can begin. For example, the first job step

could execute a DDL operator to create a target table. The second step could execute a Load

operator to load the target table. A final step could then execute a cleanup operation.

The following is a sample of implementing multiple job steps:

DEFINE JOB multi-step

(

DEFINE SCHEMA...;

DEFINE SCHEMA...;


DEFINE OPERATOR...;

DEFINE OPERATOR...;

STEP first_step

(

APPLY...; /* DDL step */

);

STEP second_step

(

APPLY...; /* DML step */

);

STEP third_step

(

APPLY...; /* DDL step */

);

);

Starting a Job from a Specified Job Step

You can start a job from step one or from an intermediate step. The tbuild -s command

option allows you to specify the step from which the job should start, identifying it by either

the step name, as specified in the job STEP syntax, or by the implicit step number, such as 1, 2,

3, and so on. Job execution begins at the specified job step, skipping the job steps that precede

it in the script.
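For example (a sketch; the script file and job names are invented), the following would start the job above at its second step:

tbuild -f multi_step_job.txt -s second_step multi_step_run1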

Result

We have now created a job script by using Teradata Parallel Transporter Wizard 13.0.