A Practical Introduction to Ab Initio v14

Confidential & Proprietary

A Practical Introduction to Ab Initio Software

Course Structure

Part 1: Basic Concepts and DML

Part 2: Building Applications

Part 3: Partitioning, Layouts, Checkpoints

Database Connectivity

Intermediate Exercises

&

Day 1

Day 2

Part 4: Lookups, Partitioners, Variables

Testing/Validation

Finger Exercises

Confidential and Proprietary!

All software and training material is:

Copyright © 1994-2001

Software Corporation

Presentations, on-line files, and printed matter are covered by nondisclosure agreement(s).

Course material and documentation are not to be circulated to organizations or individuals not under nondisclosure.

A Practical Introduction to Ab Initio Software

Part 1: Basic Concepts and DML

V14

Outline

• Ab Initio Overview
• First Principles
• Parallel Computer Architecture
• Sample Applications using Ab Initio Software
• Ab Initio Product Architecture

• The Graph Model
• DML to Describe Data (Data Formats)
• DML to Transform Data

What Does “Ab Initio” Mean?

• Ab Initio is Latin for “From the Beginning.”

• From the beginning our software was designed to support the largest, most complex business applications. Crucial capabilities like parallelism and checkpointing can’t be added after the fact.

• The Graphical Development Environment and a powerful set of components allow our customers to get valuable results from the beginning.

Ab Initio’s focus

“Big Data” problems: high volume, high complexity

High performance, scalable solutions

High productivity development

Ab Initio Software

• Ab Initio software is a general-purpose data processing platform for enterprise-class, mission-critical applications such as:
  • Data warehousing
  • Batch processing
  • Click-stream analysis
  • Data movement
  • Data transformation

Parallel Computer Architecture

• Computers come in many “shapes and sizes”:
  • Single-CPU
  • Multi-CPU
  • Network of single-CPU nodes
  • Network of multi-CPU nodes

• Multi-CPU machines are often called SMPs (for Symmetric Multiprocessors).

• Specially built networks of machines are often called MPPs (for Massively Parallel Processors).

A Single-CPU Computer

(Figure: a processor, memory, and disk connected by a bus.)

A Multi-CPU Computer (SMP)

A Network of Single-CPU Nodes

(Figure: single-CPU nodes connected by a network.)

If all of these comprise one computer, it may be an MPP.

A Network of Multi-CPU Nodes

A Network of Networks

Ab Initio Provides For:

• Distribution - a platform for applications to run on collections of CPUs.

• Complexity - the ability for applications to run in parallel on any combination of single-CPU computers, multi-CPU computers, and networks of computers.

Applications of Ab Initio Software

• “Big Data” processing.

• Parallel execution of existing applications.

• Parallel sort/merge processing.

• Data transformation.

• Rehosting of corporate data.

Applications of Ab Initio Software

• Front end of Data Warehouse:
  • Transformation of disparate sources
  • Aggregation and other preprocessing
  • Referential integrity checking
  • Database loading

• Back end of Data Warehouse:
  • Extraction for external processing
  • Aggregation and loading of Data Marts

Ab Initio Product Architecture

(Diagram: layered product architecture, with these labeled layers.)

• Native Operating Systems (Unix, Windows, OS/390)
• The Co>Operating System
• Component Suite: Partitioners, Transforms, ...
• Development Environments: GDE, Shell, C++
• 3rd Party Components
• User Components
• User Applications

Co>Operating System Runs on:

• Sun Solaris 2.6, 7, and 8 (SPARC)

• IBM AIX 4.2 and 4.3

• Hewlett-Packard HP-UX 10.20, 11.00, and 11.11

• Siemens Pyramid Reliant UNIX Release 5.43

• IBM DYNIX/ptx 4.4.6, 4.4.8, 4.5.1, and 4.5.2

• Silicon Graphics IRIX 6.5

• Red Hat Linux 6.2 and 7.0 (x86)

• Windows NT 4.0 (x86) with SP 4, 5 or 6

• Windows NT 2000 (x86) with no service pack or SP1

• Digital UNIX V4.0D (Rev. 878) and 4.0E (Rev. 1091)

• Compaq Tru64 UNIX Versions 4.0F (Rev 1229) and 5.1 (Rev 732)

• IBM OS/390 Version 2.8, 2.9, and 2.10

• NCR MP-RAS 3.02

Connectivity to Other Software

• Common, high performance database interface:
  • IBM DB2, DB2/PE, UDB
  • Oracle
  • Informix XPS
  • Sybase
  • Teradata
  • MS SQL Server 7

• Other software packages:
  • SAS
  • Trillium
  • Postalsoft
  • ...

Co>Operating System Services

• Parallel and distributed application execution
  • Control
  • Data Transport

• Transactional semantics at the application level.

• Checkpointing.

• Monitoring and debugging.

• Parallel file management.

• Metadata-driven components.

The Graph Model

The Graph Model: Naming the Pieces

(Figure: a graph with its datasets, components, and flows labeled.)

The Graph Model: Some Details

(Figure: a graph with its ports, record format metadata, and expression metadata labeled.)

Components

• A component is a program.

• Components may run on any computer running the Co>Operating System.

• Different components do different jobs.

• The particular work a component accomplishes depends on its parameter settings.

• Some parameters are computational metadata.

Datasets

• A dataset is a source or destination of data. It can be a file, a database table, a SAS dataset, ...

• Datasets may reside on any machine running the Co>Operating System.

• Datasets may reside on other machines if connected by FTP or database middleware.

• Data is described by record format metadata.

Dataset: Records and Fields

0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black

(Figure callouts: Dataset, Records, Fields.)

A dataset is made up of records; a record consists of fields.

Analogous database terms are rows and columns; analogous SAS terms are observations and variables.

Sources of Record Format Metadata

• Record formats can be generated manually (hand coding / typing) or automatically from:
  • Database catalogs
  • COBOL copybooks
  • SAS datasets
  • Other third-party products

Record Format Metadata in GDE

0345John Smith
0212Sam Spade
0322Elvis Jones
0492Sue West
0121Mary Forth
0221Bill Black

Editing Types in GDE

Field name Field type Field length

The Record Format in Text

record
  decimal(4) id;
  string(6) first_name;
  string(6) last_name;
  string(5) newfield;
end
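To make the idea of a fixed-length record format concrete, here is a minimal Python sketch (not DML, and not how the Co>Operating System reads data) that slices one record using the field widths from the format above. The sample record and the helper name parse_fixed are hypothetical, chosen only for illustration.

```python
# Fixed-width layout copied from the record format above:
# (field name, width in characters).
LAYOUT = [("id", 4), ("first_name", 6), ("last_name", 6), ("newfield", 5)]


def parse_fixed(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of field name -> raw text."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


# Hypothetical sample record, padded to the declared widths (4 + 6 + 6 + 5 = 21).
sample = "0345John  Smith abcde"
rec = parse_fixed(sample)
print(rec)
```

Because every field has a fixed width, no delimiters are needed; the padding spaces are part of the field values.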

Field Names

Names consist of letters, digits, and underscores: a … z, A … Z, 0 … 9, _

Note: No spaces, hyphens, $’s, #’s, %’s, or other symbols that may be acceptable to RDBMS

Case matters! ABC and abc are different!

Some words are reserved (record, end, date, …)
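The naming rules above can be sketched as a small check. This is a Python illustration only; the reserved-word set contains just the three words named on the slide (the real list is longer), and the function name is_valid_field_name is hypothetical.

```python
import re

# Assumption: only the reserved words named on the slide are listed here;
# the real DML reserved-word list is longer.
RESERVED = {"record", "end", "date"}

# Letters, digits, and underscores only, as the slide states.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_field_name(name):
    """Return True if name uses only allowed characters and is not reserved."""
    return bool(NAME_PATTERN.fullmatch(name)) and name not in RESERVED


for candidate in ["first_name", "first-name", "pur date", "amt$", "end"]:
    print(candidate, is_valid_field_name(candidate))
```

Names imported from a database catalog that contain hyphens, spaces, or symbols would fail this check and need renaming.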

Field Type and Field Length

There are several built-in types available via the drop-down menu. This course uses three types: string, decimal (for all numbers), and date.

A date requires a format specifier that is an exact representation of the date (e.g., “MM-DD-YY”).

A field length is either a number for fixed-length fields, or the delimiter that terminates the field for variable-length fields.

What Data Can Be Described?

• There are both fixed-size and variable-length types.

• ASCII, EBCDIC, UNICODE character sets are supported.

• Supported types can represent strings, numbers, binary numbers, packed decimals, dates …

• Complex data formats can consist of vectors, nested records, ...

Access to Field Characteristics

• Some aspects of field descriptions (e.g., date formats) must be accessed via the attribute pane.

• To see additional attributes, use the ‘Attributes’ item on the Record Format Editor’s View Menu or use the Attributes button.

More Record Format Editing

View… Attributes.

Field Type drop-down

Length can be delimiter string

Date format goes here

Text Record Format for Date Field

record
  decimal(4) id;
  string(6) first_name;
  string(6) last_name;
  date("YYYY-DD-MM") newfield;
end;

Viewing Data (figure-01)

1. Right click on dataset.

2. Select “View Data...”

Expressions in DML

• Computations are expressed in the algebraic syntax of C, Pascal, etc.

• Field names act as variables.

• Arithmetic operators: +, -, *, ...

• Comparison operators: >, <, ==, !=, ...

• Many built-in functions: string_concat, string_trim, today, date_day_of_week, …

• (See Chapter 4 of the Data Manipulation Language Reference for more information on expressions and built-in functions.)

Evaluating Expressions from View Data

Type in an expression...

…or use the expression editor

Expression Editor

Expression text

Fields Functions Operators

Exercise 1: Writing DML

• Open examples\intro-course\ex1.

• The data file ex1.dat contains these lines:

Smith,John,1992.02.23,2400
Jones,Jane,1993.10.29,320
Warren,Jake,1994.11.02,9045

• Use the Record Format Editor (New) to create a description of this data with the fields lastname, firstname, pur_date, and amt. Then use View Data to verify the description is correct.

• (Hint: Newline delimiters are written: "\n".)
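As a way to think about delimited fields (it is not the DML answer to the exercise), the following Python sketch reads the same three lines by treating each field as "everything up to its delimiter". The layout list, the helper name parse_delimited, and the assumption that the file ends with a newline are all illustrative.

```python
# Contents of ex1.dat as shown in the exercise.
# Assumption: the last line is also terminated by a newline.
DATA = (
    "Smith,John,1992.02.23,2400\n"
    "Jones,Jane,1993.10.29,320\n"
    "Warren,Jake,1994.11.02,9045\n"
)

# (field name, delimiter that terminates the field).
LAYOUT = [("lastname", ","), ("firstname", ","), ("pur_date", ","), ("amt", "\n")]


def parse_delimited(data, layout=LAYOUT):
    """Read records by scanning for each field's terminating delimiter."""
    records, pos = [], 0
    while pos < len(data):
        record = {}
        for name, delim in layout:
            end = data.index(delim, pos)  # raises ValueError if a delimiter is missing
            record[name] = data[pos:end]
            pos = end + len(delim)
        records.append(record)
    return records


records = parse_delimited(DATA)
for r in records:
    print(r)
```

The last field uses the newline as its delimiter, which is why the hint about "\n" matters: it is what separates one record from the next.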

Simple Components

• In these components the record format metadata does not change from input to output

The Filter by Expression Component

• Reads records from the input port and evaluates the select_expr parameter for each. If the expression is true (non-zero), the record is written to the out port.

• Optionally, if the expression is false (zero), the record is written to the deselect port.
  • One port must be connected downstream.
  • Both flows can be used.
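The routing described above can be sketched in Python as follows. This is an illustration of the behavior, not the component's implementation; the function name filter_by_expression and the example expression (id greater than 300) are hypothetical, and the ids are taken from the sample dataset used earlier in the course.

```python
def filter_by_expression(records, select_expr):
    """Route each record to 'out' if select_expr is true, else to 'deselect'."""
    out, deselect = [], []
    for rec in records:
        if select_expr(rec):
            out.append(rec)
        else:
            deselect.append(rec)
    return out, deselect


# Ids from the sample dataset shown earlier in the course.
records = [{"id": i} for i in (345, 212, 322, 492, 121, 221)]

# Hypothetical select expression: id > 300.
out, deselect = filter_by_expression(records, lambda r: r["id"] > 300)
print([r["id"] for r in out])
print([r["id"] for r in deselect])
```

Every input record lands on exactly one of the two lists, and the record itself is unchanged, which is why the record format is the same on input and output.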

Filter Data (Selection) (figure-02)

1. Push “Run” button.

2. View monitoring information.

3. View output data.

Expression Parameter

Exercise 2: Data Filtering (Selection)

• Using example graph figure-02.mp, change the select expression parameter of the Filter by Expression component to select records with id greater than 215.

• Run the application and examine the resulting data.

Keys

• A key identifies a field or set of fields used to organize a dataset in some way.

• Single field: id

• Multiple field: { last_name; first_name }

• Modifiers: { id descending }

• Used for sorting, grouping, partitioning.

• (See Chapter 8 of the Data Manipulation Language Reference for more information on keys. Note: keys are also called collators.)
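To illustrate how a multi-field key with modifiers orders records, here is a small Python sketch. It represents a key as a list of (field, descending) pairs, which is an illustrative encoding and not DML syntax; the record for "Adam Smith" is a hypothetical addition so that the second key field has something to break a tie on.

```python
def sort_by_key(records, key):
    """Sort by a key given as (field, descending) pairs, most significant first."""
    result = list(records)
    # Python's sort is stable, so sorting by the least significant field first
    # and the most significant field last yields the combined ordering.
    for field, descending in reversed(key):
        result.sort(key=lambda r, f=field: r[f], reverse=descending)
    return result


people = [
    {"id": 345, "first_name": "John", "last_name": "Smith"},
    {"id": 212, "first_name": "Sam", "last_name": "Spade"},
    {"id": 999, "first_name": "Adam", "last_name": "Smith"},  # hypothetical record
    {"id": 492, "first_name": "Sue", "last_name": "West"},
]

# Analogous to the key { last_name; first_name }
by_name = sort_by_key(people, [("last_name", False), ("first_name", False)])
# Analogous to the key { id descending }
by_id_desc = sort_by_key(people, [("id", True)])

print([p["id"] for p in by_name])
print([p["id"] for p in by_id_desc])
```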

The Sort Component

• Reads records from input port, sorts them by key, and writes result on output port.

Sorting (figure-03)

Exercise 3: Sorting

• Using example graph figure-03.mp, change the key parameter of the Sort component to sort the data by first_name.

• Run the application and examine the resulting data.

More Complex Components

• In these components the record format metadata typically changes (goes through a transformation) from input to output.

Data Transformation

Input record: 0345,090263John,Smith;

Output record: 1000345Smith   1963.09.02

(Figure callouts: Drop, id+1000000, Reformat, Reorder.)

Input record format:

record
  decimal(",") id;
  date("MMDDYY") bday;
  string(",") first_name;
  string(";") last_name;
end

Output record format:

record
  decimal(7) id;
  string(8) last_name;
  date("YYYY.MM.DD") bday;
end
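The transformation pictured above can be traced step by step in Python. This is a sketch of the data flow only, not DML: the helper names are hypothetical, and the century used to expand the two-digit year (19xx) is an assumption made to match the pictured output.

```python
def parse_input(text):
    """Split one input record using the delimiters and widths of the input format."""
    id_text, rest = text.split(",", 1)        # decimal(",") id
    bday, rest = rest[:6], rest[6:]           # date("MMDDYY") bday: six characters
    first_name, rest = rest.split(",", 1)     # string(",") first_name
    last_name, _ = rest.split(";", 1)         # string(";") last_name
    return {"id": int(id_text), "bday": bday,
            "first_name": first_name, "last_name": last_name}


def reformat(rec):
    """Drop first_name, add 1000000 to id, and rewrite bday as YYYY.MM.DD."""
    mm, dd, yy = rec["bday"][0:2], rec["bday"][2:4], rec["bday"][4:6]
    year = 1900 + int(yy)  # assumption: two-digit years fall in 1900-1999
    return {"id": rec["id"] + 1000000,
            "last_name": rec["last_name"],
            "bday": f"{year:04d}.{mm}.{dd}"}


def render_output(rec):
    """Lay out the output record: decimal(7) id, string(8) last_name, date."""
    return f"{rec['id']:07d}{rec['last_name']:<8}{rec['bday']}"


result = render_output(reformat(parse_input("0345,090263John,Smith;")))
print(repr(result))
```

The output record is shorter in fields but fixed in width: last_name is padded to eight characters, so the date always starts at the same offset.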

The Reformat Component

• Reads records from input port, reformats them according to a transform function, and writes the result records to output (out0) port.

• Additional output ports (out1, ...) can be created by adjusting the count parameter.

Transform Function

• A transform function specifies the rules used to create the output record.

• Each field of the output record must be assigned a value. Partial output records are not allowed!

• The transform editor is used to create a transform function in a graphical manner.

Transform Editor

Text DML: Transform Function Syntax

• Functions look like:

  output-variables :: name ( input-variables ) =
  begin
    assignments
  end;

• Assignments look like:

  output-variable.field :: expression ;

(See Chapter 6 of the Data Manipulation Language Reference for more information on transform functions.)

A Look Inside the Reformat Component

(Figure: an input record with fields a, b, c and an output record with fields x, y, z.)

out :: trans(in) =
begin
  out.x :: in.b - 1;
  out.y :: in.a;
  out.z :: fn(in.c);
end;

1. Record arrives at input port (45 QF9)

2. Record is read into component

3. Transform function is evaluated

4. Transform function yields a result record (44 RG9)

5. Result record is written to output port
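The five steps above can be mirrored in a short Python sketch. The slides never define fn, so the version below is a hypothetical stand-in that shifts each letter forward by one, chosen only because it reproduces the pictured values ("QF" becomes "RG"). The mapping of the pictured values to fields a, b, and c is inferred from the assignments.

```python
def fn(text):
    # Hypothetical stand-in: the slides do not say what fn does. Shifting each
    # letter forward by one reproduces the pictured result ("QF" -> "RG").
    return "".join(chr(ord(ch) + 1) for ch in text)


def trans(in_rec):
    """Mirror of the transform: every output field gets exactly one value."""
    return {
        "x": in_rec["b"] - 1,   # out.x :: in.b - 1;
        "y": in_rec["a"],       # out.y :: in.a;
        "z": fn(in_rec["c"]),   # out.z :: fn(in.c);
    }


# Field values inferred from the pictured record "45 QF9".
result = trans({"a": 9, "b": 45, "c": "QF"})
print(result)
```

The function is called once per input record and returns one complete output record, which matches the rule that partial output records are not allowed.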

Exercise 4: Reformat Data

• Using graph figure-04.mp, write a record format with an id from the simple dataset and a single name field of 20 characters.

• Write a transform function to produce a dataset in this format passing through the id and concatenating first_name and last_name using string_concat.

• Run the graph and examine the results.

• Modify the transform to trim the spaces from the first name before concatenating it with the last name, to get “John Smith ” rather than “John   Smith ”.

Data Aggregation

Input:

0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42

Output:

Bristol 63
Compton 12
London 31
New York 42

Data Aggregation of Sorted/Grouped Input

Input (sorted/grouped by city):

0345Smith Bristol 56
0121Forth Bristol 7
0322Jones Compton 12
0212Spade London 8
0492West London 23
0221Black New York 42

Output:

Bristol 63
Compton 12
London 31
New York 42

The Rollup Component

• By default, Rollup reads sorted records from the input port, aggregates them as indicated by key and transform parameters, and writes the resulting aggregated records on the out port.

Built-in Functions for Rollup

• The following aggregation functions are predefined and are only available in the rollup component:

  avg      max
  count    min
  first    product
  last     sum
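Here is a minimal Python sketch of what a rollup over sorted input does, using the city and amount values from the Data Aggregation slides. It is an analogy rather than the Rollup component itself: itertools.groupby, like the default Rollup, only groups adjacent records, so the input must already be sorted or grouped by the key. The function name rollup_sum is hypothetical.

```python
from itertools import groupby

# City and amount values from the sorted/grouped input on the aggregation slide.
sorted_records = [
    {"city": "Bristol", "amount": 56},
    {"city": "Bristol", "amount": 7},
    {"city": "Compton", "amount": 12},
    {"city": "London", "amount": 8},
    {"city": "London", "amount": 23},
    {"city": "New York", "amount": 42},
]


def rollup_sum(records, key, field):
    """Emit one record per run of adjacent records sharing the same key value."""
    return [
        {key: value, field: sum(r[field] for r in group)}
        for value, group in groupby(records, key=lambda r: r[key])
    ]


totals = rollup_sum(sorted_records, "city", "amount")
for row in totals:
    print(row["city"], row["amount"])
```

If the input were not grouped by city, the same city could appear in more than one output record, which is why the default Rollup expects sorted input.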

Rollup Wizard

Note the use of an aggregation function in the expression.

Exercise 6: Rollup Data

• Using example graph figure-05.mp, modify the transform function to count the number of records for the same city.

• Run the application and examine the results.

Joining Data

First input:

0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42

Second input:

0322970402 1242.50
0345970924 923.75
0121961211 12392.00
0492971123 234.12
0666950616 2312.10

Output:

0345Bristol 561997/09/24
0212London 81900/01/01
0322Compton 121997/04/02
0492London 231997/11/23
0121Bristol 71996/12/11
0221New York 421900/01/01

Joining Sorted Data

First input (sorted by id):

0121Forth Bristol 7
0212Spade London 8
0221Black New York 42
0322Jones Compton 12
0345Smith Bristol 56
0492West London 23

Second input (sorted by id):

0121961211 12392.00
0322970402 1242.50
0345970924 923.75
0492971123 234.12
0666950616 2312.10

Output:

0121Bristol 71996/12/11
0212London 81900/01/01
...

Building the Output Record

out:

record
  decimal(4) id;
  string(8) city;
  decimal(3) amount;
  date("YYYY/MM/DD") dt;
end

in0:

record
  decimal(4) id;
  string(6) name;
  string(8) city;
  decimal(3) amount;
end

in1:

record
  decimal(4) id;
  date("YYMMDD") dt;
  decimal(9.2) cost;
end

What if in1 record is missing?

out:

record
  decimal(4) id;
  string(8) city;
  decimal(3) amount;
  date("YYYY/MM/DD") dt;
end

in0:

record
  decimal(4) id;
  string(6) name;
  string(8) city;
  decimal(3) amount;
end

in1: ???

record
  decimal(4) id;
  date("YYMMDD") dt;
  decimal(9.2) cost;
end

Prioritized Assignment

• In DML, a missing value (say, if there is no in1 record) causes an assignment to fail.

• If an assignment for a left hand side fails, the next priority assignment is tried. There must be one successful assignment to each output field.

out.dt :1: in1.dt;
out.dt :2: "1900/01/01";

(Callouts: each assignment reads destination :priority: source.)
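One way to picture prioritized assignment is "try each source in priority order and keep the first that succeeds". The Python sketch below models a missing in1 record as None and a failed assignment as an exception; these modeling choices, the helper names, and the 19xx century used when converting the date are assumptions for illustration, not DML behavior.

```python
class AssignmentFailed(Exception):
    """Raised when no prioritized assignment for a field succeeds."""


def assign(*sources):
    """Try each source in priority order and return the first that succeeds."""
    for source in sources:
        try:
            return source()
        except (KeyError, TypeError):  # missing field or missing record (None)
            continue
    raise AssignmentFailed("no assignment succeeded")


def to_output_date(yymmdd):
    # Assumption: two-digit years fall in 1900-1999.
    return f"19{yymmdd[0:2]}/{yymmdd[2:4]}/{yymmdd[4:6]}"


def out_dt(in1):
    return assign(
        lambda: to_output_date(in1["dt"]),  # out.dt :1: in1.dt;
        lambda: "1900/01/01",               # out.dt :2: "1900/01/01";
    )


matched = out_dt({"id": 345, "dt": "970924"})
unmatched = out_dt(None)  # no in1 record for this id
print(matched, unmatched)
```

The priority 2 assignment is a constant, so it can never fail; that guarantees the output field always receives a value.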

Assigning Priority to Business Rules

Resulting Display

The Join Component

• Join performs a join of inputs. By default, the inputs to join must be sorted and an inner join is computed.

• Note: The following slides and the on-line example assume the join-type parameter is set to ‘Outer’, and thus compute an outer join.

Joining (figure-06)

A Look Inside the Join Component*

out :: fname(in0, in1) =
begin
  ...
end;

Align inputs by key

(Figure: one input carries fields a, b, c and the other carries fields a, q, r; the output record carries fields a, x, q.)

*join-type = Full Outer join

out :: join(in0, in1) =
begin
  out.a :: in0.a;
  out.x :1: in1.r + 20;
  out.x :2: in0.b + 10;
  out.q :1: in1.q;
  out.q :2: "XX";
end;

Align inputs by a

First pair of records (NY 4G and 234 42G):

1. Records arrive at inputs

2. Records read into component

3. Keys compared

4. Aligned records passed to function

5. Transform evaluated

6. Result record generated (24G NY)

7. Result record written

Next records (IL 8K and 79 23H):

8. Records arrive at input

9. Records read into component

10. Keys compared

11. Aligned records passed to function (79 23H is passed; IL 8K stays at its input)

12. Transform evaluated

13. Result record generated (89H XX)

14. Result record written
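The walkthrough above can be approximated in Python as a merge of two inputs sorted by the key a, with the transform applied to each aligned pair. This is a sketch under several assumptions, not the Join component's implementation: keys are unique within each input, a missing record is modeled as None, a record is rejected when some output field has no successful assignment, and the values given to fields b, c, q, and r are a guess at how the pictured records map onto fields.

```python
def first_success(*sources):
    """Return the first prioritized source that evaluates without failing."""
    for source in sources:
        try:
            return source()
        except TypeError:  # the input record is missing (None)
            continue
    raise ValueError("no assignment succeeded")


def join_transform(in0, in1):
    return {
        "a": first_success(lambda: in0["a"]),                              # out.a :: in0.a;
        "x": first_success(lambda: in1["r"] + 20, lambda: in0["b"] + 10),  # :1: then :2:
        "q": first_success(lambda: in1["q"], lambda: "XX"),                # :1: then :2:
    }


def outer_join(left, right, key="a"):
    """Full outer join of two key-sorted inputs (assumes unique keys per input)."""
    out, rejected = [], []
    i = j = 0
    while i < len(left) or j < len(right):
        left_rec = left[i] if i < len(left) else None
        right_rec = right[j] if j < len(right) else None
        if right_rec is None or (left_rec is not None and left_rec[key] < right_rec[key]):
            pair = (left_rec, None)       # left record has no partner
            i += 1
        elif left_rec is None or right_rec[key] < left_rec[key]:
            pair = (None, right_rec)      # right record has no partner
            j += 1
        else:
            pair = (left_rec, right_rec)  # keys match
            i += 1
            j += 1
        try:
            out.append(join_transform(*pair))
        except ValueError:
            rejected.append(pair)
    return out, rejected


# Hypothetical field values modeled on the pictured records.
in0 = [{"a": "G", "b": 234, "c": 42}, {"a": "H", "b": 79, "c": 23}]
in1 = [{"a": "G", "q": "NY", "r": 4}, {"a": "K", "q": "IL", "r": 8}]
joined, rejected = outer_join(in0, in1)
print(joined)
print(rejected)
```

In this sketch the unmatched in1 record (key K) is rejected because out.a has only one assignment and it depends on in0; adding a second-priority assignment for out.a would let that record through.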

Exercise 7: Join Data

• Using example graph figure-06.mp, modify the transform function to join visits.dat and last-visits.dat so that no records are rejected.

• Run the application, and examine the results. The Unmatched Last Visits dataset should be empty.

Exercise 8 (if time): Join Retaining All Fields

• Building upon the graph you created in Exercise 7, create a new output record format and transform function to join visits.dat and last-visits.dat according to the following rules:
  • Retain all fields from each dataset.
  • Supply defaults where necessary.

• Change the necessary parameters, run the application, and examine the results.

Mouse and Key Shortcuts

Action                                                  On What?                        Does This
Shift-<double click>                                    Components                      Open Editor
<double click>                                          A parameter in Parameter Tab    Open Editor
<double click>                                          Port                            Open Record Format Editor
Drag input field to blank space in output field pane    Transform editor                Adds field to output record format

Part 2: Building Applications

A Practical Introduction to Ab Initio Software

Outline

• Constructing Applications
• Parallelism
• Data Partitioning
• Multifiles

Steps in Building an Application

• Add datasets.
• Add components.
• Add flows.
• Modify as needed.

• Configure datasets and components along the way; let the yellow “To Do” cues guide you.

• Generally, you should configure your input and output metadata (record formats) before adding flows.

Adding an Input Dataset

1. Click on Component Button

2. Open Datasets Category

3. Choose Input File

Configuring the Input Dataset

1. Browse to find simple.dat

2. Browse to find simple.dml

3. Change label to something descriptive

Adding a Filter by Expression Component

1. Open Transform Category

2. Choose Filter by Expression

Adding an Output Dataset

Choose Output File

Configuring the Output Dataset

1. Browse to see directory

2. Enter name of output file

Adding Flows

1. Click on source (hold)

2. Drag to destination (release)

Configuring Filter by Expression

Enter expression

Flows Can Propagate Configuration

• One way to “Get rid of yellow” is to configure datasets or components.

• Hooking up flows allows the GDE to automatically propagate many kinds of information, like record format metadata; sometimes, connecting things is all you need to do to “Get rid of yellow.”

Tip: Let Propagation Do the Work!

• Define record formats for input datasets.

• Define record formats for output datasets only when they differ from input datasets; let propagation do as much as possible.

• If record formats change, this minimizes the impact on the graph.

• Sometimes you will need to set record formats on components. In such cases, usually you should set the format on the output port.

Tip: Look Before Deleting Components!

• Before deleting a component in a graph, look to see whether the component defines record formats for any of its ports. If you delete a component with record format definitions, you may lose the definitions.

• To safely delete such a component: For each port with a record format definition, go to the other end of the flow for that port (which will be some other component or dataset) and uncheck the ‘propagate from neighbor’ box for the associated port.

Page 114: A Practical Introduction to Ab Initio v14

Running the Application

1. Push “Run” button.

2. View monitoring information.

3. View output data.

Page 115: A Practical Introduction to Ab Initio v14

Diagnostic Ports: Reject, Error

• Reject: Input records that caused errors.

• Error: Error messages.

Page 116: A Practical Introduction to Ab Initio v14

Instrumentation Parameters: Reject-threshold

• A drop-down menu specifying the number of errors to tolerate. The choice “Use limit/ramp” allows for other possibilities.

Page 117: A Practical Introduction to Ab Initio v14

Diagnostic Port: Log

• Log: Logging records.

Page 118: A Practical Introduction to Ab Initio v14

Instrumentation Parameters: Log

• Syntax: event or event/n, where n is a power of 10.

• Logs records of type event. If n is specified, only 1 of every n records is logged. Valid events are:

• input, output, reject, intermediate

Page 119: A Practical Introduction to Ab Initio v14

Logging Record Format

• Logging flows have predefined metadata.

• The record format is:

record
  string("|") node;
  string("|") timestamp;
  string("|") component;
  string("|") subcomponent;
  string("|") event_type;
  string("|\n") event_text;
end
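
As a rough illustration of the logging record format, the sketch below splits one log line into its six delimited fields. It is written in Python rather than DML. The field names come from the slide, while the function name and the sample line are invented for illustration and do not show real Ab Initio output.

```python
from typing import Dict

# Field names taken from the logging record format on the slide.
LOG_FIELDS = ["node", "timestamp", "component", "subcomponent",
              "event_type", "event_text"]

def parse_log_record(line: str) -> Dict[str, str]:
    """Split one log line: five '|'-terminated fields, then event_text ending in '|' + newline."""
    body = line[:-2] if line.endswith("|\n") else line.rstrip("\n")
    # Split at most 5 times so the last field keeps any remaining '|' characters.
    parts = body.split("|", len(LOG_FIELDS) - 1)
    if len(parts) != len(LOG_FIELDS):
        raise ValueError(f"expected {len(LOG_FIELDS)} fields, got {len(parts)}")
    return dict(zip(LOG_FIELDS, parts))

# Hypothetical sample record; real log contents will differ.
sample = "node1|2001-01-01 12:00:00|Reformat|0|reject|bad record|\n"
record = parse_log_record(sample)
print(record["component"], record["event_type"])
```

Limiting the number of splits means an event_text that itself contains a "|" is kept whole, which matches the idea that only "|" followed by a newline ends the last field.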

Page 120: A Practical Introduction to Ab Initio v14

Component: Gather Logs

• Reads logging records from multiple flows connected to the input port and writes them to the specified file outside of the application’s transactional context. The start-text and end-text parameter values are written to the log at the beginning and end.

Page 121: A Practical Introduction to Ab Initio v14

Component: Replicate

• Copies records from input port to multiple flows connected to output port.

Page 122: A Practical Introduction to Ab Initio v14

Sample Graph

Page 123: A Practical Introduction to Ab Initio v14

Exercise 9: Creating a Reformatting Application

• Create a new graph that:

Reads data from simple.dat with record format simple.dml.

Reformats that data with simple-out.xfr.

Writes the results to simple-out.dat with record format simple-out.dml.

• Run it and verify the results.

Page 124: A Practical Introduction to Ab Initio v14

Exercise 10: Obtaining Log Information

• Add a Gather Logs component to the application.

• Configure the component. Don’t forget to provide a log file name.

• Connect it to the Reformat’s log port.

• Run the application.

• View the log file on the server.

Page 125: A Practical Introduction to Ab Initio v14

Exercise 11: Creating an Aggregation Application

• Create an application that:

Reads data from visits.dat with record format visits.dml.

Sorts it by city.

Aggregates it (using Rollup component) by city with visits-to-city-rollup.xfr.

Writes the results to visits-to-city.dat with record format visits-to-city.dml.

Logs input, output, and intermediate events.

Page 126: A Practical Introduction to Ab Initio v14

Computing without Sort

Some components do not require pre-sorted inputs.

These components work by keeping some or all of the inputs in memory.

These components usually have a sorted-input parameter, or have the word hash in their name.

There are rules of thumb about when to use “in-memory” sorting or grouping vs sorting before the component.
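
The difference between the two approaches can be sketched in Python (not Ab Initio code). The function names and the sample (city, count) records are assumptions made for illustration: one rollup relies on sorted input and handles one group at a time, the other keeps a running total per key in memory and accepts any order.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

visits = [("Bristol", 56), ("Compton", 12), ("Bristol", 7)]  # (city, count), unsorted

def rollup_sorted(records):
    """Requires input sorted by key; only one group is held at a time."""
    return [(city, sum(n for _, n in group))
            for city, group in groupby(records, key=itemgetter(0))]

def rollup_in_memory(records):
    """Accepts any order; keeps one running total per key in a hash table."""
    totals = defaultdict(int)
    for city, n in records:
        totals[city] += n
    return sorted(totals.items())

via_sort = rollup_sorted(sorted(visits, key=itemgetter(0)))
via_hash = rollup_in_memory(visits)
print(via_sort, via_hash)
```

The in-memory version trades memory (one entry per distinct key) for skipping the sort, which is the trade-off the rules of thumb are about.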

Page 127: A Practical Introduction to Ab Initio v14

Exercise 12: Rollup without Sort

• Open figure-05.

• Save As... to figure-05-nosort.

• Delete the Sort component.

• Change the sorted-input parameter of the Rollup component to “in-memory…”

• Run the application and examine the results.

Page 128: A Practical Introduction to Ab Initio v14

Exercise 13: Join without Sort

• Open figure-06.

• Save As... to figure-06-nosort.

• Delete both Sort components.

• Change the sorted-input parameter of the Join component to “in-memory…”

• Run the application and examine the results.

Page 129: A Practical Introduction to Ab Initio v14

Forms of Parallelism

• Component parallelism

• Pipeline parallelism

• Data parallelism

Page 130: A Practical Introduction to Ab Initio v14

Component Parallelism

Sorting Customers

Sorting Transactions

Page 131: A Practical Introduction to Ab Initio v14

Component Parallelism

• Comes “for free” with graph programming.

• Limitation: scales to the number of “branches” in a graph.

Page 132: A Practical Introduction to Ab Initio v14

Pipeline Parallelism

Processing Record: 100

Processing Record: 99

Page 133: A Practical Introduction to Ab Initio v14

Pipeline Parallelism

• Comes “for free” with graph programming.

• Limitations:
  • Scales to length of “branches” in a graph.
  • Some operations, like sorting, do not pipeline.

Page 134: A Practical Introduction to Ab Initio v14

Data Parallelism

Partitions

Page 135: A Practical Introduction to Ab Initio v14

Two Ways of Looking at Data Parallelism

Global View:

Expanded View:

Page 136: A Practical Introduction to Ab Initio v14

Data Parallelism

•Scales with data.

•Requires data partitioning.

•Different partitioning methods for different operations.

Page 137: A Practical Introduction to Ab Initio v14

Data Partitioning

Expanded View:

Global View:

Page 138: A Practical Introduction to Ab Initio v14

Data Partitioning: The Global View

Fan-out Flow

Degree of Parallelism

Page 139: A Practical Introduction to Ab Initio v14

Component: Partition by Round-robin

• Reads records from its input port and writes them to the flow partitions connected to its output port. Records are written to partitions in “roundrobin” fashion, with block-size records going to a partition before moving on to the next.
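
A minimal Python sketch of this dealing-out behaviour follows. It is an illustration only, not the component's implementation: the function name is hypothetical and the block_size argument simply mirrors the block-size parameter described above.

```python
from typing import List, Sequence

def partition_round_robin(records: Sequence[str], n_partitions: int,
                          block_size: int = 1) -> List[List[str]]:
    """Deal records out in turn, block_size records per partition per turn."""
    partitions: List[List[str]] = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        # i // block_size numbers the blocks; blocks rotate through the partitions.
        partitions[(i // block_size) % n_partitions].append(rec)
    return partitions

records = list("ABCDEFGH")
one_at_a_time = partition_round_robin(records, 3)
two_at_a_time = partition_round_robin(records, 3, block_size=2)
print(one_at_a_time)
print(two_at_a_time)
```

Because the record content is never inspected, partition sizes differ by at most one block, but records with the same key can land in different partitions.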

Page 140: A Practical Introduction to Ab Initio v14

Roundrobin Partitioning

[Figure: a stream of input records being dealt in round-robin order to Partition 0, Partition 1, and Partition 2]

Page 141: A Practical Introduction to Ab Initio v14

Roundrobin Partitioning

[Figure, continued: the input records distributed across Partition 0, Partition 1, and Partition 2]

Page 142: A Practical Introduction to Ab Initio v14

A Data Parallel Application: The Expanded View

Page 143: A Practical Introduction to Ab Initio v14

Exercise 14: Data Parallel Reformatting (Expanded)

• Open figure-04.

• Save As... to figure-04-expanded.

• Create a copy of the Reformat and the Simple-Out dataset (use Edit...Copy and Edit…Paste).

• Change the path for the copy of Simple-Out.

• Add a Partition by Round-robin component before the Reformat components; hook them up with flows.

• Run the application and examine the results.

Page 144: A Practical Introduction to Ab Initio v14

A Data Parallel Application: The Global View

[Figure labels: Fan-out Flow; Multifile; Degree of Parallelism (Abstract)]

Page 145: A Practical Introduction to Ab Initio v14

What is a Multifile?

• A multifile is essentially the “global view” of a set of ordinary files, each of which may be located anywhere.

• Each partition of a multifile is an ordinary file.

• By using the global view and multifiles, you can avoid having to draw data parallelism explicitly.

• Ab Initio utilities let you manipulate (copy, rename, delete, etc.) multifiles as easily as ordinary files.

• Note that the icon for a multifile has 3 platters instead of 2.

Page 146: A Practical Introduction to Ab Initio v14

Multifiles

• Multifiles reside in multidirectories.

• Multidirectories and multifiles are identified using URL syntax with “mfile” as the protocol part:

•mfile:/users/training-07/test-mfs/

•mfile:mfs2/transactions/

•mfile://mktg-mpp/vol3/big-mfs/january/sales.dat

• These URLs are simply abbreviations for the many pieces making up a multidirectory or multifile.

(See Chapter 2 of the Co>Operating System Administrator’s Guide for more information on multifiles.)

Page 147: A Practical Introduction to Ab Initio v14

A Multidirectory

mfile://host1/u/jo/mfs

Control Partition: //host1/u/jo/mfs/
Data Partition: //host1/vol4/pA/
Data Partition: //host2/vol3/pB/
Data Partition: //host3/vol7/pC/

A single name for three directories

Page 148: A Practical Introduction to Ab Initio v14

A Multifile

mfile://host1/u/jo/mfs/a.dat

Control Partition: //host1/u/jo/mfs/a.dat
Data Partition: //host1/vol4/pA/a.dat
Data Partition: //host2/vol3/pB/a.dat
Data Partition: //host3/vol7/pC/a.dat

A single name for three files
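
To make the "single name for three files" idea concrete, the following Python sketch expands a multifile name into the ordinary files behind it. The Multidirectory class and its method are hypothetical and exist only for this illustration; the directory paths are the ones shown on the slide, and the real Co>Operating System utilities are not modelled here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Multidirectory:
    control: str      # control partition directory
    data: List[str]   # data partition directories

    def multifile_paths(self, name: str) -> List[str]:
        """Return the ordinary file behind each data partition of multifile `name`."""
        return [d.rstrip("/") + "/" + name for d in self.data]

# Paths copied from the slide's example multidirectory.
mfs = Multidirectory(
    control="//host1/u/jo/mfs/",
    data=["//host1/vol4/pA/", "//host2/vol3/pB/", "//host3/vol7/pC/"],
)
paths = mfs.multifile_paths("a.dat")
for path in paths:
    print(path)
```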

Page 149: A Practical Introduction to Ab Initio v14

Additional Multidirectories

mfile://host1/u/jo/mfs/dir1

Control Partition: //host1/u/jo/mfs/ (contains a.dat, dir1/)
Data Partition: //host1/vol4/pA/ (contains a.dat, dir1/)
Data Partition: //host2/vol3/pB/ (contains a.dat, dir1/)
Data Partition: //host3/vol7/pC/ (contains a.dat, dir1/)

Page 151: A Practical Introduction to Ab Initio v14

Additional Multidirectories

mfile://host1/u/jo/mfs/dir2

Control Partition: //host1/u/jo/mfs/ (contains a.dat, dir1/, dir2/)
Data Partition: //host1/vol4/pA/ (contains a.dat, dir1/, dir2/)
Data Partition: //host2/vol3/pB/ (contains a.dat, dir1/, dir2/)
Data Partition: //host3/vol7/pC/ (contains a.dat, dir1/, dir2/)

Page 152: A Practical Introduction to Ab Initio v14

A Multidirectory Hierarchy

mfile://host1/u/jo/mfs/dir2/b.dat

Control Partition: //host1/u/jo/mfs/ (contains a.dat, dir1/, dir2/)
Data Partition: //host1/vol4/pA/ (contains a.dat, dir1/, dir2/)
Data Partition: //host2/vol3/pB/ (contains a.dat, dir1/, dir2/)
Data Partition: //host3/vol7/pC/ (contains a.dat, dir1/, dir2/)

Files b.dat (in dir2/, per the URL above) and x.dat also appear in every partition.

Page 153: A Practical Introduction to Ab Initio v14

Adding a Multifile Dataset

1. Drill into multidirectory

2. Type in filename

Page 154: A Practical Introduction to Ab Initio v14

Exercise 15: Data Parallel Reformatting (Global)

• Open figure-04.

• Save As... to figure-04-global.

• Add a Partition by Round-robin component.

• Change the Simple-Out dataset to a multifile.

• Run the application and examine the results (use the “Partition” option in View Data).

Page 155: A Practical Introduction to Ab Initio v14

Data Aggregation in Parallel

Input, first partition:
0345Smith Bristol 56
0322Jones Compton 12
0121Forth Bristol 7

Output, first partition:
Bristol 63
Compton 12

Input, second partition:
0212Spade London 8
0492West London 23
0221Black New York 42

Output, second partition:
London 31
New York 42

Page 156: A Practical Introduction to Ab Initio v14

Data Aggregation of Grouped Input in Parallel

Input, first partition (grouped by city):
0345Smith Bristol 56
0121Forth Bristol 7
0322Jones Compton 12

Output, first partition:
Bristol 63
Compton 12

Input, second partition (grouped by city):
0212Spade London 8
0492West London 23
0221Black New York 42

Output, second partition:
London 31
New York 42

Page 157: A Practical Introduction to Ab Initio v14

Key-Dependent Data Parallelism

• Aggregation processes records in groups defined by key values.

• Parallel aggregation requires partitioning based on key value.

• Parallel aggregation takes three steps, using the same key in each step:
  • Partition by key.
  • Sort by key.
  • Aggregate by key.
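
The three steps can be sketched end to end in Python. This is an illustration under stated assumptions, not Ab Initio code: the helper names are hypothetical, CRC32 stands in for whatever hash the real partitioner uses, and the (city, amount) records are taken from the aggregation slides.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def partition_by_key(records, n, key):
    """Send each record to partition crc32(key) % n, so equal keys stay together."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[zlib.crc32(key(rec).encode()) % n].append(rec)
    return parts

def rollup(sorted_records, key):
    """Sum amounts per key; assumes records are already grouped by key."""
    return [(k, sum(amount for _, amount in grp))
            for k, grp in groupby(sorted_records, key=key)]

city = itemgetter(0)
visits = [("Bristol", 56), ("Compton", 12), ("Bristol", 7),
          ("London", 8), ("London", 23), ("New York", 42)]

results = []
for part in partition_by_key(visits, 2, city):   # 1. partition by key
    part.sort(key=city)                          # 2. sort by the same key
    results.extend(rollup(part, city))           # 3. aggregate by the same key
print(sorted(results))
```

Each partition could be processed on a different node. Because every record for a given city lands in one partition, no city is ever split across two partial totals.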

Page 158: A Practical Introduction to Ab Initio v14

Component: Partition by Key

• Reads records from its input port and writes them to the flow partitions connected to its output port. A hash code computed using the key determines which partition a record will be written on, meaning that records with the same key value will go to the same partition.

Page 159: A Practical Introduction to Ab Initio v14

Partitioning by Key

[Figure: a stream of keyed input records being distributed by key to Partition 0, Partition 1, and Partition 2]

Page 160: A Practical Introduction to Ab Initio v14

Partitioning by Key

[Figure, continued: the keyed records distributed across Partition 0, Partition 1, and Partition 2, with equal keys in the same partition]

Page 161: A Practical Introduction to Ab Initio v14

Partition by Key + Sort = Parallel Grouping

[Figure: records partitioned by key into Partition 0, Partition 1, and Partition 2, then sorted so that equal keys are adjacent within each partition]

Page 162: A Practical Introduction to Ab Initio v14

Common Mistakes

• Incorrect results if:
  • Keys for partition, sort, or aggregate differ.
  • Data is partitioned, but is never sorted.

• Computationally expensive if:
  • Data is sorted before it is partitioned.
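
The "partitioned but never sorted" mistake can be imitated in Python. The sketch assumes an aggregation that, like a rollup configured for sorted input, only combines adjacent records with equal keys; the function name and data are illustrative, and how the real component reacts to unsorted input is not shown here.

```python
from itertools import groupby
from operator import itemgetter

def rollup_sorted_input(records):
    """Aggregation that only merges adjacent records with equal keys."""
    return [(k, sum(n for _, n in grp))
            for k, grp in groupby(records, key=itemgetter(0))]

partition = [("Bristol", 56), ("Compton", 12), ("Bristol", 7)]  # never sorted

wrong = rollup_sorted_input(partition)
right = rollup_sorted_input(sorted(partition, key=itemgetter(0)))
print(wrong)
print(right)
```

In the unsorted case Bristol is reported twice with partial totals, which is the kind of incorrect result the slide warns about.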

Page 163: A Practical Introduction to Ab Initio v14

Exercise 16: Data Parallel Aggregation

• Start with figure-05.

• Save As... to figure-05-parallel.

• Add a Partition by Key component.

• Change the output file to a multifile.

• Run the application and examine the results.

Page 165: A Practical Introduction to Ab Initio v14

A Practical Introduction to Ab Initio Software

Part 3: Intermediate Topics

Page 166: A Practical Introduction to Ab Initio v14

Outline

•Departitioning

•Deadlock

•Repartitioning

•Layouts

•Phases and Checkpoints

•Anatomy of a Running Job

•Sample Applications

Page 167: A Practical Introduction to Ab Initio v14

Departitioning

Departitioning combines many flows of data to produce one flow. It is the opposite of partitioning.

Each departition component combines flows in a different manner.

Page 168: A Practical Introduction to Ab Initio v14

Departitioning

Expanded View:

Global View:

[Figure: Score 1, Score 2, and Score 3 feed a Departition component that writes to Output File]

Page 169: A Practical Introduction to Ab Initio v14

Departitioning

• For the various departitioning components:
  • Key-based?
  • Result ordering?
  • Effect on parallelism?
  • Uses?

Fan-in Flow

Page 170: A Practical Introduction to Ab Initio v14

Departitioning: Performance

Input buffer Output buffer

Free space

Used space

Page 171: A Practical Introduction to Ab Initio v14

Concatenation

Globally ordered, partitioned data (three partitions):

49Jane 02241 2
44Bob 02116 8
43Mark 02114 9

47Bill 02114 14
46Rick 02116 23

42John 02116 30
48Mary 02116 38
45Sue 02241 92

Sorted data:

49Jane 02241 2
44Bob 02116 8
43Mark 02114 9
47Bill 02114 14
46Rick 02116 23
42John 02116 30
48Mary 02116 38
45Sue 02241 92

Page 172: A Practical Introduction to Ab Initio v14

Concatenation: Performance

Blocked components

Running components

Reading single flow in its entirety

Page 173: A Practical Introduction to Ab Initio v14

Concatenation

• Not key-based.
• Result ordering is by partition.
• Serializes pipelined computation.
• Useful for:
  • creating serial flow from partitioned data
  • appending headers and trailers
  • writing DML
• Used infrequently
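
A small Python sketch of concatenation, using the globally ordered partitions from the Concatenation slide. The function name is hypothetical, and itertools.chain stands in for the component: it reads each flow to the end before touching the next.

```python
from itertools import chain

# Globally ordered, partitioned data from the slide: (id + name, zipcode, amount).
partitions = [
    [("49Jane", "02241", 2), ("44Bob", "02116", 8), ("43Mark", "02114", 9)],
    [("47Bill", "02114", 14), ("46Rick", "02116", 23)],
    [("42John", "02116", 30), ("48Mary", "02116", 38), ("45Sue", "02241", 92)],
]

def concatenate(flows):
    """Read each flow in its entirety, in partition order."""
    return list(chain.from_iterable(flows))

serial = concatenate(partitions)
amounts = [rec[2] for rec in serial]
print(amounts)
```

The output is sorted only because the input partitions were already globally ordered; concatenation itself never compares records.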

Page 174: A Practical Introduction to Ab Initio v14

Merge

Round-robin partitioned and sorted by amount (three partitions):

42John 02116 30
48Mary 02116 38
45Sue 02241 92

49Jane 02241 2
43Mark 02114 9
46Rick 02116 23

44Bob 02116 8
47Bill 02114 14

Sorted data, following merge on amount:

49Jane 02241 2
44Bob 02116 8
43Mark 02114 9
47Bill 02114 14
46Rick 02116 23
42John 02116 30
48Mary 02116 38
45Sue 02241 92

Page 175: A Practical Introduction to Ab Initio v14

Merge: Performance

Components running roughly in lock-step

If keys evenly distributed: Reading flows roughly evenly

Page 176: A Practical Introduction to Ab Initio v14

Merge: Performance

If keys globally sorted or near globally sorted:

Blocked components

Reading single flow in its entirety

Page 177: A Practical Introduction to Ab Initio v14

Merge

• Key-based.
• Result ordering is sorted if each input is sorted.
• Possibly synchronizes pipelined computation; may even serialize.
• Useful for creating ordered data flows.
• Used more than concatenate, but still infrequently
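
The merge behaviour can be sketched with Python's standard heapq.merge, which repeatedly takes the smallest head record among flows that are each already sorted. The data is from the Merge slide (zip codes omitted for brevity); using heapq here is an illustrative choice, not a description of how the component is built.

```python
import heapq
from operator import itemgetter

# Round-robin partitions, each sorted by amount: (id + name, amount).
partitions = [
    [("42John", 30), ("48Mary", 38), ("45Sue", 92)],
    [("49Jane", 2), ("43Mark", 9), ("46Rick", 23)],
    [("44Bob", 8), ("47Bill", 14)],
]

amount = itemgetter(1)
merged = list(heapq.merge(*partitions, key=amount))
names = [name for name, _ in merged]
print(names)
```

A merge must always look at the head of every flow, which is why it can synchronize upstream components, and why nearly globally sorted input makes it drain one flow at a time.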

Page 178: A Practical Introduction to Ab Initio v14

Interleave

Round-robin partitioned and scored (three partitions):

42John 02116 30A
45Sue 02241 92A
48Mary 02116 38A

43Mark 02114 9C
46Rick 02116 23B
49Jane 02241 2C

44Bob 02116 8C
47Bill 02114 14B

Scored dataset in original order, following interleave:

42John 02116 30A
43Mark 02114 9C
44Bob 02116 8C
45Sue 02241 92A
46Rick 02116 23B
47Bill 02114 14B
48Mary 02116 38A
49Jane 02241 2C

Page 179: A Practical Introduction to Ab Initio v14

Interleave: Performance

Components running in lock-step

Reading flows in round-robin sequence

Page 180: A Practical Introduction to Ab Initio v14

Interleave

• Not key-based.

• Result ordering is inverse of round-robin.

• Synchronizes pipelined computation.

• Useful for restoring original order following a record-independent parallel computation partitioned by round-robin.

• Used in rare circumstances
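
Interleaving can be sketched in Python as taking one record from each flow in turn. The function name is hypothetical, a block size of one record is assumed, and the records are the id + name values from the Interleave slide.

```python
from itertools import zip_longest

_MISSING = object()  # marks flows that have run out of records

def interleave(flows):
    """Take one record from each flow in turn, skipping exhausted flows."""
    out = []
    for row in zip_longest(*flows, fillvalue=_MISSING):
        out.extend(rec for rec in row if rec is not _MISSING)
    return out

# Round-robin partitions from the slide.
partitions = [
    ["42John", "45Sue", "48Mary"],
    ["43Mark", "46Rick", "49Jane"],
    ["44Bob", "47Bill"],
]
restored = interleave(partitions)
print(restored)
```

This restores the original order only if each partition still holds exactly the records that round-robin partitioning gave it, in the same order, which is why the slide limits its use to record-independent computations.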

Page 181: A Practical Introduction to Ab Initio v14

Gather

Round-robin partitioned and scored (three partitions):

42John 02116 30A
45Sue 02241 92A
48Mary 02116 38A

43Mark 02114 9C
46Rick 02116 23B
49Jane 02241 2C

44Bob 02116 8C
47Bill 02114 14B

Scored dataset in random order, following gather:

43Mark 02114 9C
46Rick 02116 23B
42John 02116 30A
45Sue 02241 92A
48Mary 02116 38A
44Bob 02116 8C
47Bill 02114 14B
49Jane 02241 2C

Page 182: A Practical Introduction to Ab Initio v14

Gather: Performance

Reading flows as data is available

Page 183: A Practical Introduction to Ab Initio v14

Gather

• Not key-based.
• Result ordering is unpredictable.
• Neither serializes nor synchronizes pipelined computation.
• Useful for efficient collection of data from multiple partitions and for repartitioning.
• Used most frequently
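
One way to picture gathering is a shared queue that several producers fill at their own pace. The Python sketch below uses threads purely as a stand-in for upstream partitions; the function name is hypothetical, and the arrival order it produces is not guaranteed, which is the point.

```python
import queue
import threading

def gather(flows):
    """Collect records from all flows in whatever order they arrive."""
    q = queue.Queue()
    done = object()  # sentinel marking the end of one flow

    def feed(flow):
        for rec in flow:
            q.put(rec)
        q.put(done)

    threads = [threading.Thread(target=feed, args=(f,)) for f in flows]
    for t in threads:
        t.start()
    out, remaining = [], len(threads)
    while remaining:
        item = q.get()          # take the next record that is available
        if item is done:
            remaining -= 1
        else:
            out.append(item)
    for t in threads:
        t.join()
    return out

partitions = [["42John", "45Sue", "48Mary"],
              ["43Mark", "46Rick", "49Jane"],
              ["44Bob", "47Bill"]]
gathered = gather(partitions)
print(len(gathered))
```

The consumer never waits for a particular producer, so a slow partition does not hold up records from the others.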

Page 184: A Practical Introduction to Ab Initio v14

Summary of Departitioning Methods

Method       Key-based?  Ordering?               Uses
Merge        Yes         Sorted                  Creating ordered serial flow
Concatenate  No          Global                  Creating serial flow from partitioned data
Interleave   No          Inverse of round-robin  “Undoing” round-robin partitioning
Gather       No          Unpredictable           Unordered departitioning, repartitioning

Page 185: A Practical Introduction to Ab Initio v14

Deadlock

Blocking on read

Blocking on write

Page 186: A Practical Introduction to Ab Initio v14

Avoiding Deadlock

•Use Concatenate, Interleave and Merge with care

•Use flow buffering.

• Insert phase break before departition.

•Don’t serialize data unnecessarily; repartition instead of departition.

Page 187: A Practical Introduction to Ab Initio v14

Repartitioning

Use to redistribute records across partitions.

Records are almost always redistributed in a key-based manner, but don’t have to be.

Records can be redistributed to fewer partitions, the same number of partitions, or more partitions.

Page 188: A Practical Introduction to Ab Initio v14

The “Wrong” Way

This serializes the computation.

Page 189: A Practical Introduction to Ab Initio v14

Repartitioning -- The Right Way

Expanded View:

Global View:

Page 190: A Practical Introduction to Ab Initio v14

Repartitioning

Note: The departition component is almost always a Gather.

All-to-All Flow

Page 191: A Practical Introduction to Ab Initio v14

Key Repartition + Sort = Regroup

[Figure: keyed records in Partition 0, Partition 1, and Partition 2 are redistributed by a Partition by Key step followed by a Gather step]

Page 192: A Practical Introduction to Ab Initio v14

Key Repartition + Sort = Regroup

[Figure: the repartitioned records in Partition 0, Partition 1, and Partition 2 after the Sort step]

Page 193: A Practical Introduction to Ab Initio v14

Key Repartition + Sort = Regroup
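
A Python sketch of regrouping follows. It assumes hypothetical (letter, digit) records that are currently grouped by digit and must be regrouped by letter. CRC32 stands in for the partitioner's hash, and the function names are illustrative only.

```python
import zlib

def key_partition(key: str, n: int) -> int:
    """Map a key to a partition number; equal keys always map to the same one."""
    return zlib.crc32(key.encode()) % n

def repartition(partitions, n_out, key):
    """All-to-all flow: every input partition may feed every output partition."""
    out = [[] for _ in range(n_out)]
    for part in partitions:                  # each source partitions its own records
        for rec in part:
            out[key_partition(key(rec), n_out)].append(rec)  # destination gathers
    return out

# Hypothetical records, currently grouped by digit.
current = [[("B", "1"), ("C", "1")],
           [("D", "2"), ("B", "2")],
           [("C", "3"), ("D", "3")]]

regrouped = repartition(current, 3, key=lambda rec: rec[0])
for part in regrouped:
    part.sort(key=lambda rec: rec[0])        # sort makes equal keys adjacent
print(regrouped)
```

No step funnels all records through a single serial flow, so the regrouping keeps its degree of parallelism.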

Page 194: A Practical Introduction to Ab Initio v14

Sort Does “Gathering”

Page 195: A Practical Introduction to Ab Initio v14

Which Components will Gather?

Many built-in components will gather. To find out if a specific component will gather:

• Select the component in the component organizer
• Either:
  – Look at the adjacent help
  – Look for “fan” next to Input Ports: in
  OR
  – Press the help button
  – Look for “fan-in” in the Ports section beside in

Page 196: A Practical Introduction to Ab Initio v14

Layout

•Layout determines the location of a resource.

•A layout is either serial or parallel.

•A serial layout specifies one node and one directory.

•A parallel layout specifies multiple nodes and multiple directories. It is permissible for the same node to be repeated.

Page 197: A Practical Introduction to Ab Initio v14

Layout

•The location of a Dataset is one or more places on one or more disks.

• The location of a computing component is one or more directories on one or more nodes. By default, the node and directory are unknown.

•Computing components propagate their layouts from neighbors, unless specifically given a layout by the user.

Page 198: A Practical Introduction to Ab Initio v14

Layout

(notice that all layouts are serial in this graph)

files on Node X

file on Node X

Q: On which node do the processing components run?

A: On Node X.

Page 199: A Practical Introduction to Ab Initio v14

Layout Determines What Runs Where

Node W   Node X   Node Y   Node Z

Q: On which Node do the processing components run?

Page 200: A Practical Introduction to Ab Initio v14

Layout Determines What Runs Where

Node W   Node X   Node Y   Node Z

Page 201: A Practical Introduction to Ab Initio v14

Layout Determines What Runs Where

Serial

Parallel

file on Node W

3-way multifile on Nodes X, Y, Z

Page 202: A Practical Introduction to Ab Initio v14

Layout Determines What Runs Where

Node W   Node X   Node Y   Node Z

Page 203: A Practical Introduction to Ab Initio v14

Layout Determines What Runs Where

Serial Serial

Q: Where do the Reformat(s) run?

file on Node W

file on Node W

Q: Serial or Parallel?

Page 204: A Practical Introduction to Ab Initio v14

Controlling Layout

• Propagate (default)
• Bind layout to that of another component
• Use layout of URL
• Construct layout manually
• Run on these hosts

Page 205: A Practical Introduction to Ab Initio v14

Multidirectory URL as a Layout

mfile://host1/u/jo/mfs

//host1/vol4/pA/
//host2/vol3/pB/
//host3/vol7/pC/

Layout specifies the locations of the partitions.

Each partition of a layout has:
• A host part (node to run on)
• A data part (directory for working storage)
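
The host part and data part of each partition can be pulled apart with a few lines of Python. This is only a sketch: the LayoutPartition type and parse_partition function are hypothetical names, and the three partition paths are the ones shown on the slide.

```python
from typing import NamedTuple

class LayoutPartition(NamedTuple):
    host: str       # node to run on
    directory: str  # directory for working storage

def parse_partition(url: str) -> LayoutPartition:
    """Split '//host/path/' into its host part and its data (directory) part."""
    if not url.startswith("//"):
        raise ValueError(f"expected //host/path, got {url!r}")
    host, _, path = url[2:].partition("/")
    return LayoutPartition(host, "/" + path)

layout = [parse_partition(u) for u in
          ("//host1/vol4/pA/", "//host2/vol3/pB/", "//host3/vol7/pC/")]
print(len(layout), [p.host for p in layout])
```

The length of the list is the degree of data parallelism for any component given this layout, and the host of each entry says where that partition of the component runs.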

Page 206: A Practical Introduction to Ab Initio v14

Reining in the Parallel Beast

• Applications built with Ab Initio Software can combine all forms of parallelism.

• Layouts control the number of partitions of a parallel computation; that is, the degree of data parallelism.

• Phases control the number of components running at any one time; that is, the degree of component and pipeline parallelism.

Page 207: A Practical Introduction to Ab Initio v14

Phases

Phase 0 Phase 1

Page 208: A Practical Introduction to Ab Initio v14

Phases

• Breaking an application into phases limits the contention for:
  • Main memory.
  • Processor(s).

• Breaking an application into phases costs:
  • Disk space.

Page 209: A Practical Introduction to Ab Initio v14

Checkpoints

•Since data is staged to disk between phases, one can arrange to use that data to “start from the middle” should something go wrong.

•Any phase break can be a checkpoint.
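
The "start from the middle" idea can be sketched in Python. This is a simplified illustration under stated assumptions, not how the Co>Operating System implements recovery: phases are plain functions, each phase's output is staged to a hypothetical phaseN.json file, and a rerun resumes after the latest staged phase.

```python
import json
import os
import tempfile

def run_with_checkpoints(phases, data, workdir):
    """Run phases in order, staging each phase's output to disk as a checkpoint."""
    start = 0
    for i in reversed(range(len(phases))):
        path = os.path.join(workdir, f"phase{i}.json")
        if os.path.exists(path):             # latest completed checkpoint
            with open(path) as f:
                data = json.load(f)
            start = i + 1
            break
    for i in range(start, len(phases)):
        data = phases[i](data)
        with open(os.path.join(workdir, f"phase{i}.json"), "w") as f:
            json.dump(data, f)
    return data

calls = []
state = {"fail": True}                       # simulates a problem in phase 1

def phase0(xs):
    calls.append("phase0")
    return sorted(xs)

def phase1(xs):
    calls.append("phase1")
    if state["fail"]:
        raise RuntimeError("simulated failure in phase 1")
    return [x * 2 for x in xs]

with tempfile.TemporaryDirectory() as workdir:
    try:
        run_with_checkpoints([phase0, phase1], [3, 1, 2], workdir)
    except RuntimeError:
        state["fail"] = False                # "fix" the problem, then rerun
    result = run_with_checkpoints([phase0, phase1], [3, 1, 2], workdir)
print(result, calls)
```

On the rerun, phase 0 is not executed again because its staged output is reused. The cost is the disk space for the staged data.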

Page 210: A Practical Introduction to Ab Initio v14

The Phase Toolbar

Select Phase Number

View Phase Set Phase

A toggle between Phase (P) and Checkpoint After Phase (C)

Page 211: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

What happens when you push the “Run” button?

• Your graph is translated into a script that can be executed in the Shell Development Environment.

• This script and any metadata files stored on the GDE client machine are shipped (via FTP) to the server.

• The script is invoked (via REXEC or TELNET) on the server.

• The script creates and runs a job that may run across many nodes.

• Monitoring information is sent back to the GDE client.

Page 212: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Host Process Creation
  • Pushing “Run” button generates script.
  • Script is transmitted to Host node.
  • Script is invoked, creating Host process.

[Figure: Client, Host, and Processing nodes, showing the GDE and the Host process]

Page 213: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Agent Process Creation
  • Host process spawns Agent processes.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 214: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Component Process Creation
  • Agent processes create Component processes on each processing node.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 215: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Component Execution
  • Component processes do their jobs.
  • Component processes communicate directly with datasets and each other to move data around.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 216: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Successful Component Termination
  • As each Component process finishes with its data, it exits with success status.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 217: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Agent Termination
  • When all of an Agent’s Component processes exit, the Agent informs the Host process that those components are finished.
  • The Agent process then exits.

[Figure: Client, Host, and Processing nodes, showing the GDE and the Host process]

Page 218: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Host Termination
  • When all Agents have exited, the Host process informs the GDE that the job is complete.
  • The Host process then exits.

[Figure: Client, Host, and Processing nodes, showing the GDE and the Host process]

Page 219: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Abnormal Component Termination
  • When an error occurs in a Component process, it exits with error status.
  • The Agent then informs the Host.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 220: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Abnormal Component Termination
  • The Host tells each Agent to kill its Component processes.

[Figure: Client, Host, and Processing nodes, showing the GDE, the Host process, and two Agent processes]

Page 221: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Agent Termination
  • When every Component process of an Agent has been killed, the Agent informs the Host process that those components are finished.
  • The Agent process then exits.

[Figure: Client, Host, and Processing nodes, showing the GDE and the Host process]

Page 222: A Practical Introduction to Ab Initio v14

Anatomy of a Running Job

• Host Termination
  • When all Agents have exited, the Host process informs the GDE that the job failed.
  • The Host process then exits.

[Figure: Client, Host, and Processing nodes, showing the GDE and the Host process]

Page 223: A Practical Introduction to Ab Initio v14

To View or Edit the Script

“Edit Script” button

Lines beginning with “mp” are Shell Development Environment directives

Page 224: A Practical Introduction to Ab Initio v14

Connecting the GDE to the Server

Hostname of server

User ID

Password

Page 225: A Practical Introduction to Ab Initio v14

Sample Applications

•Loading a Data Warehouse

•Extracting a Data Mart

•Data Cleansing

•Rehosting a Database

Page 226: A Practical Introduction to Ab Initio v14

Loading a Data Warehouse

Page 227: A Practical Introduction to Ab Initio v14

Extracting a Data Mart

Page 228: A Practical Introduction to Ab Initio v14

Data Cleansing

Page 229: A Practical Introduction to Ab Initio v14

Rehosting a Database

Page 230: A Practical Introduction to Ab Initio v14

Rehosting a Database - example run

Page 231: A Practical Introduction to Ab Initio v14

Texas Massachusetts

Page 233: A Practical Introduction to Ab Initio v14

A Practical Introduction to Ab Initio Software

Topics for Developers: Partitioners, Multistage Components, Lookup Files, and More

Page 234: A Practical Introduction to Ab Initio v14


Outline

Digging Deeper
• Partitioners Revisited
• Introduction to Multi-stage Transforms
• Online Examples
• Controlling Rejects via Limit/Ramp
• Lookup Tables
• Multifile Creation

Page 235: A Practical Introduction to Ab Initio v14


Partitioning Review

• For the various partitioning components:
  • Is it key-based? Does the problem require a key-based partition?
  • Performance: Are the partitions balanced or skewed?

Fan-out Flow

Page 236: A Practical Introduction to Ab Initio v14


Partitioning: Performance

Balanced: Processors get neither too much nor too little.

Skewed: Some processors get too much, others too little.

Partition 0

Partition 1

Partition 2

Partition 3

Partition 0

Partition 1

Partition 2

Partition 3

Page 237: A Practical Introduction to Ab Initio v14


Sample Data to be Partitioned

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
45Sue 02241 92
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38
49Jane 02241 2

record
  decimal(2) id;
  string(5) name;
  decimal(5) zipcode;
  decimal(3) amount;
  string(1) newline;
end

Page 238: A Practical Introduction to Ab Initio v14


Partition by Round-robin

Partition 0
Customers
42John 02116 30
45Sue 02241 92
48Mary 02116 38

Partition 1
Customers
43Mark 02114 9
46Rick 02116 23
49Jane 02241 2

Partition 2
Customers
44Bob 02116 8
47Bill 02114 14

Page 239: A Practical Introduction to Ab Initio v14


Partition by Round-robin

• Not key-based.
• Results in very well balanced data, especially with a block-size of 1.
• Useful for record-independent parallelism.
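Round-robin distribution can be sketched in a few lines of Python. This is an illustration only, not Ab Initio code: the function name `partition_round_robin` and its `block_size` argument are assumptions modeled on the block-size setting mentioned on this slide.

```python
def partition_round_robin(records, n_partitions, block_size=1):
    """Deal records out to partitions in turn, block_size records at a time."""
    partitions = [[] for _ in range(n_partitions)]
    for position, rec in enumerate(records):
        # Each consecutive block of records goes to the next partition in turn.
        target = (position // block_size) % n_partitions
        partitions[target].append(rec)
    return partitions


# Customer ids from the sample data, in input order.
ids = [42, 43, 44, 45, 46, 47, 48, 49]
parts = partition_round_robin(ids, 3)
print(parts)
```

With three partitions and a block size of 1, the ids land as in the Round-robin slide: 42, 45, 48 in partition 0; 43, 46, 49 in partition 1; 44, 47 in partition 2.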

Page 240: A Practical Introduction to Ab Initio v14


Partition by Key

Partition on zipcode:

Customers
43Mark 02114 9
45Sue 02241 92
47Bill 02114 14
49Jane 02241 2

Customers
42John 02116 30
44Bob 02116 8
46Rick 02116 23
48Mary 02116 38

Page 241: A Practical Introduction to Ab Initio v14


Partition by Key often followed by a Sort

Sort on zipcode:

Customers
43Mark 02114 9
47Bill 02114 14
45Sue 02241 92
49Jane 02241 2

Customers
42John 02116 30
44Bob 02116 8
46Rick 02116 23
48Mary 02116 38

Rollup by zipcode:

Totals by Zipcode
02114 23
02241 94

Totals by Zipcode
02116 99

Page 242: A Practical Introduction to Ab Initio v14


Partition by Key

•Key-based.

•Usually results in well balanced data.

•Useful for key-dependent parallelism.
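The essential property of key-based partitioning is that records with equal keys always land in the same partition. The Python sketch below illustrates that property; it is not Ab Initio code. The hash actually used by the product is not described in this material, so `zlib.crc32` stands in for it, and the function name `partition_by_key` is hypothetical.

```python
import zlib


def partition_by_key(records, key, n_partitions):
    """Send each record to the partition chosen by a hash of its key."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        # crc32 is a stand-in hash: deterministic, so equal keys collide on purpose.
        digest = zlib.crc32(str(key(rec)).encode("utf-8"))
        partitions[digest % n_partitions].append(rec)
    return partitions


# (id, name, zipcode, amount) tuples from the sample data.
customers = [
    (42, "John", "02116", 30), (43, "Mark", "02114", 9),
    (44, "Bob", "02116", 8), (45, "Sue", "02241", 92),
    (46, "Rick", "02116", 23), (47, "Bill", "02114", 14),
    (48, "Mary", "02116", 38), (49, "Jane", "02241", 2),
]
parts = partition_by_key(customers, key=lambda rec: rec[2], n_partitions=2)
zip_sets = [{rec[2] for rec in part} for part in parts]
print(zip_sets)
```

Which zipcode goes to which partition depends on the hash, so the split may differ from the one drawn on the Partition by Key slide. What matters for a downstream Sort and Rollup is that no zipcode appears in more than one partition.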

Page 243: A Practical Introduction to Ab Initio v14


Partition by Expression

Expression: amount/33

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
46Rick 02116 23
47Bill 02114 14
49Jane 02241 2

Customers
48Mary 02116 38

Customers
45Sue 02241 92

Page 244: A Practical Introduction to Ab Initio v14


Partition by Expression

• Key-based, depending on the expression.

• Resulting balance very dependent on expression and on data.

• Various application-dependent uses.
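The amount/33 example can be reproduced with a short Python sketch. It is an illustration under two assumptions: the division is integer division, as the slide's result implies, and an expression value outside the valid partition range is treated as an error. The function name `partition_by_expression` is hypothetical.

```python
def partition_by_expression(records, expression, n_partitions):
    """Use the value of expression(rec) directly as the partition number."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        index = expression(rec)
        if not 0 <= index < n_partitions:
            raise ValueError(f"expression returned {index}, outside 0..{n_partitions - 1}")
        partitions[index].append(rec)
    return partitions


# (id, amount) pairs from the sample data.
customers = [(42, 30), (43, 9), (44, 8), (45, 92), (46, 23), (47, 14), (48, 38), (49, 2)]
parts = partition_by_expression(customers, lambda rec: rec[1] // 33, 3)
ids_per_partition = [[cust_id for cust_id, _ in part] for part in parts]
print(ids_per_partition)
```

Six of the eight records fall in partition 0, which shows how strongly the balance depends on both the expression and the data.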

Page 245: A Practical Introduction to Ab Initio v14


Partition by Range

With splitter values of 9 and 23:

Customers
43Mark 02114 9
44Bob 02116 8
49Jane 02241 2

Customers
46Rick 02116 23
47Bill 02114 14

Customers
42John 02116 30
45Sue 02241 92
48Mary 02116 38

Page 246: A Practical Introduction to Ab Initio v14


Range+Sort: Global Ordering

Sort following a partition by range:

Customers
49Jane 02241 2
44Bob 02116 8
43Mark 02114 9

Customers
47Bill 02114 14
46Rick 02116 23

Customers
42John 02116 30
48Mary 02116 38
45Sue 02241 92

Page 247: A Practical Introduction to Ab Initio v14


Partition by Range

•Key-based.

•Resulting balance dependent on set of splitters chosen.

•Useful for “binning” and global sorting.
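Range partitioning and the global ordering it enables can be sketched in Python. This is an illustration, not Ab Initio code: it assumes a key equal to a splitter goes to the lower partition, which is what the example with splitters 9 and 23 shows, and the function name `partition_by_range` is hypothetical.

```python
from bisect import bisect_left


def partition_by_range(records, key, splitters):
    """Place each record in the range bounded above by the first splitter >= key."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        partitions[bisect_left(splitters, key(rec))].append(rec)
    return partitions


# (id, amount) pairs from the sample data.
customers = [(42, 30), (43, 9), (44, 8), (45, 92), (46, 23), (47, 14), (48, 38), (49, 2)]
parts = partition_by_range(customers, key=lambda rec: rec[1], splitters=[9, 23])
ids_per_partition = [[cust_id for cust_id, _ in part] for part in parts]

# Sorting inside each partition and reading the partitions in order
# yields one globally sorted sequence.
globally_sorted = [rec[1] for part in parts for rec in sorted(part, key=lambda rec: rec[1])]
print(ids_per_partition, globally_sorted)
```

Because every key in partition 0 is no larger than any key in partition 1, and so on, no merge step is needed after the per-partition sorts.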

Page 248: A Practical Introduction to Ab Initio v14


Partition with Load Balance

If middle node highly loaded:

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
49Jane 02241 2

Customers
45Sue 02241 92

Customers
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38

Page 249: A Practical Introduction to Ab Initio v14


Partition by Load Balance

• Not key-based.

• Results in skewed data distribution to complement skewed load.

• Useful for record-independent parallelism.

Page 250: A Practical Introduction to Ab Initio v14


Partition with Percentage

With percentages: 4, 20

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
45Sue 02241 92

Customers
...

Customers
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38
49Jane 02241 2

The next 16 records would go here, and the next 76 records would go here

Page 251: A Practical Introduction to Ab Initio v14


Partition by Percentage

• Not key-based

• Results in usually skewed data distribution conforming to the provided percentages.

• Useful for record-independent parallelism.

Page 252: A Practical Introduction to Ab Initio v14


Broadcast (as a Partitioner)

Unlike all other partitioners, which write a record to ONE output flow, Broadcast writes each record to EVERY output flow.

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
45Sue 02241 92
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38
49Jane 02241 2

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
45Sue 02241 92
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38
49Jane 02241 2

Customers
42John 02116 30
43Mark 02114 9
44Bob 02116 8
45Sue 02241 92
46Rick 02116 23
47Bill 02114 14
48Mary 02116 38
49Jane 02241 2

Page 253: A Practical Introduction to Ab Initio v14


Broadcast

•Not key-based

•Results in perfectly balanced partitions

•Useful for record-independent parallelism.

Page 254: A Practical Introduction to Ab Initio v14


Summary of Partitioning Methods

Method        Key-based?  Balancing?                     Uses
Key           Yes         Good                           Key-dependent parallelism
Expression    Yes         Depends on data & expression   Application specific
Range         Yes         Depends on splitters           Key-dependent parallelism, global sorting
Round-robin   No          Good                           Record-independent parallelism
Load Balance  No          Depends on load                Record-independent parallelism
Percentage    No          Depends on percentages given   Record-independent parallelism

Page 255: A Practical Introduction to Ab Initio v14


Multistage Transform Components

• These components take several sets of rules to tell them how data is to be transformed in several stages.

• Each set of rules (in the form of one transform function) determines how each stage of the transformation will proceed.

• Stages include: input selection, initialization, iteration, finalization, output selection, and more.

Page 256: A Practical Introduction to Ab Initio v14


Packages Hold Types and Functions

•Multistage transform components are driven by packages:

Temporary type

Initialization stage

Iteration stage

Finalization stage

“Helper” function

Page 257: A Practical Introduction to Ab Initio v14


Rollup is a Multistage Component

• By default, Rollup comes up in “Wizard” mode.

• To access the full power of this component, switch to Package Mode once you are in the transform editor (View -> Package).

Page 258: A Practical Introduction to Ab Initio v14


Rollup

• Rollup performs a general aggregation of data.

• Rollup has these stages:
  • Key Change         key_change
  • Input Selection    input_select
  • Initialization     initialize (required)
  • Rollup             rollup (required)
  • Finalization       finalize (required)
  • Output Selection   output_select

Page 259: A Practical Introduction to Ab Initio v14


Data Aggregation

0345Smith Bristol 56
0212Spade London 8
0322Jones Compton 12
0492West London 23
0121Forth Bristol 7
0221Black New York 42

Bristol 63
Compton 12
London 31
New York 42

Page 260: A Practical Introduction to Ab Initio v14


Data Aggregation of Sorted/Grouped Input

0345Smith Bristol 56
0121Forth Bristol 7
0322Jones Compton 12
0212Spade London 8
0492West London 23
0221Black New York 42

Bristol 63
Compton 12
London 31
New York 42

Page 261: A Practical Introduction to Ab Initio v14


Aggregation Calculation for each City

Input record format:
  record
    decimal(4) id;
    string(6) name;
    string(8) city;
    decimal(3) amount;
  end

Output record format:
  record
    string(8) city;
    decimal(4) sum;
  end

Initialization: tmp = 0;

Calculation (loop): tmp = tmp + amount;

Result: sum = tmp;

Page 262: A Practical Introduction to Ab Initio v14


Data Aggregation in the Rollup Transform

temp:
  record
    decimal(4) sum;
  end

in:
  record
    decimal(4) id;
    string(6) name;
    string(8) city;
    decimal(3) amount;
  end

out:
  record
    string(8) city;
    decimal(4) sum;
  end

record name

use a descriptive name

Initialization: temp.sum = 0;

Calculation (loop): temp.sum = temp.sum + in.amount;

Result: out.sum = temp.sum;

Page 263: A Practical Introduction to Ab Initio v14


The Temporary Variable

•Multistage transform components provide a temporary variable which may be used to carry information between stages.

•Multiple pieces of information may be conveyed from stage to stage by having multiple fields in the temporary type.

Page 264: A Practical Introduction to Ab Initio v14


Package Editor: creating temporary type...

Page 265: A Practical Introduction to Ab Initio v14


…and rollup code

Page 266: A Practical Introduction to Ab Initio v14


Text Representation of Rollup Aggregation

type temporary_type =
record
  decimal(4) sum;
end;

temp::initialize(in) =
begin
  temp.sum :: 0;
end;

temp::rollup(temp, in) =
begin
  temp.sum :: temp.sum + in.amount;
end;

out::finalize(temp, in) =
begin
  out.city :: in.city;
  out.sum :: temp.sum;
end;
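To make the interplay of the three stage functions concrete, here is a Python simulation of the same aggregation. It is a sketch, not Ab Initio code: the driver `run_rollup` is hypothetical and assumes the input is already grouped by city, as in the sorted/grouped aggregation example.

```python
from itertools import groupby


def initialize(rec):
    # Called for the first record of each group: create the temporary record.
    return {"sum": 0}


def rollup(temp, rec):
    # Called for every record of the group: accumulate into the temporary.
    temp["sum"] = temp["sum"] + rec["amount"]
    return temp


def finalize(temp, rec):
    # Called with the last record of the group: build the output record.
    return {"city": rec["city"], "sum": temp["sum"]}


def run_rollup(records, key):
    output = []
    for _, group in groupby(records, key=key):  # needs grouped input
        temp, last = None, None
        for rec in group:
            if temp is None:
                temp = initialize(rec)
            temp = rollup(temp, rec)
            last = rec
        output.append(finalize(temp, last))
    return output


records = [
    {"id": 345, "name": "Smith", "city": "Bristol", "amount": 56},
    {"id": 121, "name": "Forth", "city": "Bristol", "amount": 7},
    {"id": 322, "name": "Jones", "city": "Compton", "amount": 12},
    {"id": 212, "name": "Spade", "city": "London", "amount": 8},
    {"id": 492, "name": "West", "city": "London", "amount": 23},
    {"id": 221, "name": "Black", "city": "New York", "amount": 42},
]
totals = run_rollup(records, key=lambda rec: rec["city"])
print(totals)
```

The totals match the Data Aggregation slides: Bristol 63, Compton 12, London 31, New York 42.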

Page 267: A Practical Introduction to Ab Initio v14


A Look Inside the Rollup Component

Initialize: ... Do for first record in each group

Rollup: ... Do for every record in each group

Finalize: ... Do for last record in each group

[Diagram: the in, temp, and out records flowing through the three stages]

Page 268: A Practical Introduction to Ab Initio v14


Normalization

H002 Smith 3 1994.03.23 Jane Bill Thom
H003 Jones 2 1993.02.12 Andy Elle
H004 Lee 1 1994.08.15 Lori
H008 Ruben 2 1993.10.22 Eric Anne

Jane Smith
Bill Smith
Thom Smith
Andy Jones
Elle Jones
Lori Lee
Eric Ruben
Anne Ruben

Page 269: A Practical Introduction to Ab Initio v14


Inside Normalize

len = length(input);

for index = 0 to len-1
    output = normalize(input, index);

[Diagram: the length function produces len; the normalize function is called once for each index from 0 to len-1]
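The loop above can be simulated in Python using the data from the Normalization slide. This is a sketch, not Ab Initio code: the field names (`count`, `names`) and the driver `run_normalize` are hypothetical, chosen only to mirror the pseudocode.

```python
def length(rec):
    # Tells the driver how many output records this input record produces.
    return rec["count"]


def normalize(rec, index):
    # Builds the output record for one element of the repeating group.
    return f'{rec["names"][index]} {rec["name"]}'


def run_normalize(records):
    output = []
    for rec in records:
        for index in range(length(rec)):  # index = 0 to len-1
            output.append(normalize(rec, index))
    return output


records = [
    {"id": "H002", "name": "Smith", "count": 3, "date": "1994.03.23", "names": ["Jane", "Bill", "Thom"]},
    {"id": "H003", "name": "Jones", "count": 2, "date": "1993.02.12", "names": ["Andy", "Elle"]},
    {"id": "H004", "name": "Lee", "count": 1, "date": "1994.08.15", "names": ["Lori"]},
    {"id": "H008", "name": "Ruben", "count": 2, "date": "1993.10.22", "names": ["Eric", "Anne"]},
]
flat = run_normalize(records)
print(flat)
```

Four input records with counts 3, 2, 1, and 2 yield eight output records, one per name.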

Page 270: A Practical Introduction to Ab Initio v14


Online Example of Normalize

Open this example graph: Examples… DML… Transforms… Normalize

View input data, run the graph, and view the output data.

Examine the Normalize parameters.

Page 271: A Practical Introduction to Ab Initio v14


Denormalize

• Denormalize generates one output record for a group of input records.

• Denormalize has these stages:
  • Input Selection    input_select
  • Initialization     initialize
  • Initialization     initial_denormalization (required)
  • Rollup             rollup
  • Denormalize        denormalize (required)
  • Finalization       finalize
  • Output Selection   output_select

Page 272: A Practical Introduction to Ab Initio v14


Denormalization

Smith 3 Jane Bill Thom
Jones 2 Andy Elle
Lee 1 Lori
Ruben 2 Eric Anne

Jane Smith
Bill Smith
Thom Smith
Andy Jones
Elle Jones
Lori Lee
Eric Ruben
Anne Ruben
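Denormalization runs in the opposite direction: one output record per group of input records. The Python sketch below illustrates the idea with the data from this slide. It is not Ab Initio code; the field names and the function `run_denormalize` are hypothetical, and the input is assumed to arrive grouped by last name.

```python
from itertools import groupby


def run_denormalize(records):
    """Collapse each group of (first, last) records into one record with a name list."""
    output = []
    for last, group in groupby(records, key=lambda rec: rec["last"]):
        names = [rec["first"] for rec in group]
        output.append({"last": last, "count": len(names), "names": names})
    return output


records = [
    {"first": "Jane", "last": "Smith"}, {"first": "Bill", "last": "Smith"},
    {"first": "Thom", "last": "Smith"}, {"first": "Andy", "last": "Jones"},
    {"first": "Elle", "last": "Jones"}, {"first": "Lori", "last": "Lee"},
    {"first": "Eric", "last": "Ruben"}, {"first": "Anne", "last": "Ruben"},
]
grouped = run_denormalize(records)
print(grouped)
```

Eight input records collapse into four, each carrying the count and the list of names for one family.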

Page 273: A Practical Introduction to Ab Initio v14


Online Examples of Denormalize

Open either of these example graphs:
Examples… DML… Transforms… Denormalize
Examples… DML… Transforms… Denorm-rollup

View input data, run the graph, and view the output data.

Examine the Denormalize parameters.

Page 274: A Practical Introduction to Ab Initio v14


Online Examples of Transform Components

See: Help… Examples… DML… Transforms for a number of graphs that demonstrate transform component usage.

Page 275: A Practical Introduction to Ab Initio v14


Join

• Join performs a join of inputs. By default, the inputs to join must be sorted and an inner join is computed.

• Options:

• join-type: Inner, Outer or Explicit (other).

• dedupn: Call the transform function only once for any matching record on input n. Defaults to false.

• record-requiredn: Call transform function for all keys, even if there is not a matching record for input n. Defaults to true. Only used if join-type is Explicit.

Page 276: A Practical Introduction to Ab Initio v14


Inner Join:

An inner join produces an output record only when a given key is present on ALL inputs. If the key is duplicated on any input, each (duplicate) key is matched with the other inputs.

in0     in1    result
a,me    b,7    b,we,7
b,we    b,8    b,we,8
b,she   c,9    b,she,7
c,he           b,she,8
d,us           c,he,9
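The inner join table can be reproduced with a short Python sketch. It illustrates the matching rule only and is not how the Join component works internally: the component expects sorted inputs by default, whereas this sketch uses a dictionary lookup for brevity. The function name `inner_join` is hypothetical.

```python
from collections import defaultdict


def inner_join(in0, in1):
    """Emit one output record for every pair of in0/in1 records sharing a key."""
    by_key = defaultdict(list)
    for key, value in in1:
        by_key[key].append(value)
    # Keys missing from in1 contribute nothing; duplicate keys multiply out.
    return [(key, v0, v1) for key, v0 in in0 for v1 in by_key.get(key, [])]


in0 = [("a", "me"), ("b", "we"), ("b", "she"), ("c", "he"), ("d", "us")]
in1 = [("b", 7), ("b", 8), ("c", 9)]
result = inner_join(in0, in1)
print(result)
```

Keys a and d appear on only one input and therefore produce no output, while the two b records on each side produce four combinations.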

Page 277: A Practical Introduction to Ab Initio v14


Full Outer Join:

A full outer join produces an output record whether there is a match for a given key on an input or not. If the key is duplicated on any input, each (duplicate) key is matched with the other inputs. The user should provide default values.

in0     in1    result
a,hi    b,7    a,hi,999
b,lo    b,8    b,lo,7
c,bye   c,9    b,lo,8
        d,1    c,bye,9
               d,XXX,1
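A full outer join with user-supplied defaults can be sketched in Python as follows. This is an illustration of the matching rule, not Ab Initio code: the function name `full_outer_join` is hypothetical, and the defaults "XXX" and 999 are the ones shown in the result column.

```python
def full_outer_join(in0, in1, default0="XXX", default1=999):
    """Emit a record for every key on either input, filling gaps with defaults."""
    keys = sorted({key for key, _ in in0} | {key for key, _ in in1})
    result = []
    for key in keys:
        # An input with no record for this key contributes its default once.
        left = [value for k, value in in0 if k == key] or [default0]
        right = [value for k, value in in1 if k == key] or [default1]
        for v0 in left:
            for v1 in right:
                result.append((key, v0, v1))
    return result


in0 = [("a", "hi"), ("b", "lo"), ("c", "bye")]
in1 = [("b", 7), ("b", 8), ("c", 9), ("d", 1)]
result = full_outer_join(in0, in1)
print(result)
```

Key a has no partner on in1 and key d has none on in0, so each still produces one output record with the missing side filled by its default.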

Page 278: A Practical Introduction to Ab Initio v14


Joins can be arbitrarily complex in Ab Initio

The Join component is capable of combining its input in many ways. It is also capable of combining more than two inputs.

See the Component Reference or the Online Help for complete information about Join.

Page 279: A Practical Introduction to Ab Initio v14


Controlling Rejects: When First/Never Are Not Enough

•Sometimes it is desirable to exercise more control over when to abort a graph than is possible with “Never Abort” or “Abort on first reject”. The choice “Use limit/ramp” allows for other possibilities...

Page 280: A Practical Introduction to Ab Initio v14


Instrumentation Parameters:Limit, Ramp

• Limit: Number of errors to tolerate.

• Ramp: Scale of errors to tolerate per input. Similar to percentage in fractional form.

Page 281: A Practical Introduction to Ab Initio v14


Typical Limit and Ramp Settings

• Limit = 0, Ramp = 0.0: Abort on any error.

• Limit = 50, Ramp = 0.0: Abort after 50 errors.

• Limit = 100, Ramp = 0.01: Abort if more than 1 record in 100 causes an error, but only after processing 100 records.
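One way to reason about limit and ramp together is as a tolerance that grows with the number of records processed. The Python sketch below assumes the rule "abort once rejects exceed limit + ramp × records processed". That formula is an interpretation, not something these slides state, so check the Component Reference before relying on it. The function name `should_abort` is hypothetical.

```python
def should_abort(rejects, records_processed, limit, ramp):
    """Assumed rule: tolerate up to limit + ramp * records_processed rejects."""
    return rejects > limit + ramp * records_processed


# Limit = 0, Ramp = 0.0: the first reject exceeds the tolerance.
first_setting = should_abort(rejects=1, records_processed=10, limit=0, ramp=0.0)

# Limit = 50, Ramp = 0.0: a fixed allowance, independent of volume.
at_allowance = should_abort(rejects=50, records_processed=1000, limit=50, ramp=0.0)
past_allowance = should_abort(rejects=51, records_processed=1000, limit=50, ramp=0.0)

print(first_setting, at_allowance, past_allowance)
```

Under this reading, limit acts as a fixed allowance and ramp as a tolerated error rate, so a nonzero ramp lets long runs absorb proportionally more rejects than short ones.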

Page 282: A Practical Introduction to Ab Initio v14


Lookup Files

• DML provides a facility for looking up records in a dataset based on a key:

lookup("file-name", key-expression)

• The data is read from a file into memory.

• The GDE provides a Lookup File component as a special dataset with no ports.

Page 283: A Practical Introduction to Ab Initio v14


Using lookup instead of Join

Using Last-Visits as a lookup file

Page 284: A Practical Introduction to Ab Initio v14


Configuring a Lookup File

1. Label used as name in lookup expression

2. Browse for pathname

3. Set record format

4. Set key

Page 285: A Practical Introduction to Ab Initio v14


Using lookup in a Transform Function

• Transform function:

  out :: lookup_info(in) =
  begin
    out.id :: in.id;
    out.city :: in.city;
    out.amount :: in.amount;
    out.dt :1: lookup("Last-Visits", in.id).dt;
    out.dt :2: "1900/01/01";
  end;

Input 0 record format:
  record
    decimal(4) id;
    string(6) name;
    string(8) city;
    decimal(3) amount;
  end

Output record format:
  record
    decimal(4) id;
    string(8) city;
    decimal(3) amount;
    date("YYYY/MM/DD") dt;
  end
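The two prioritized rules for out.dt behave like a lookup with a fallback: the priority 2 rule supplies a default date when the priority 1 lookup finds no matching record. The Python sketch below mimics that behavior with an in-memory dictionary. It is an illustration, not Ab Initio code, and the contents of `last_visits` are invented sample values.

```python
# Hypothetical in-memory contents of the Last-Visits lookup file, keyed by id.
last_visits = {345: "1994/03/23", 212: "1993/02/12"}


def lookup_info(rec, table, default_dt="1900/01/01"):
    """Copy id, city, and amount; take dt from the lookup, else the default."""
    found = table.get(rec["id"])  # priority 1: value from the lookup
    return {
        "id": rec["id"],
        "city": rec["city"],
        "amount": rec["amount"],
        "dt": found if found is not None else default_dt,  # priority 2: default
    }


hit = lookup_info({"id": 345, "name": "Smith", "city": "Bristol", "amount": 56}, last_visits)
miss = lookup_info({"id": 221, "name": "Black", "city": "New York", "amount": 42}, last_visits)
print(hit["dt"], miss["dt"])
```

Because the whole lookup table sits in memory, each input record costs one keyed access, which is why a lookup file can stand in for a Join when one side is small.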

Page 286: A Practical Introduction to Ab Initio v14


Multifile Commands

Roles of people in an Ab Initio Project

• Normally the SA for the project manages the multifile systems (with input from the team).

Suggested Directory Structures in a Project

• Ab Initio has a white paper and course modules on the significance of environment in a project.

Utilities for multifile structures

• The Co>Operating System reference guide describes the “m_” commands, some of which follow.

Page 287: A Practical Introduction to Ab Initio v14


The m_mkfs Command

m_mkfs mfs-url dir-url1 dir-url2 ...

• Creates a multifile system rooted at mfs-url and having as partitions the new directories dir-url1, dir-url2, ...

$ m_mkfs //host1/u/jo/mfs3 //host1/vol4/dat \
      //host2/vol3/dat //host3/vol7/dat

$ m_mkfs my-mfs dir1 dir2 dir3

Page 288: A Practical Introduction to Ab Initio v14


The m_mkdir Command

m_mkdir url

• Creates the named multidirectory. The url must refer to a pathname within an existing multifile system.

$ m_mkdir mfile:my-mfs/subdir

$ m_mkdir mfile://host2/tmp/temp-mfs/dir1

Page 289: A Practical Introduction to Ab Initio v14


The m_ls command

m_ls [option...] url [url...]

• Lists information on the files or directories specified by the urls. The information presented is controlled by the options, which follow the form of ls.

$ m_ls -ld mfile:my-mfs/subdir

$ m_ls mfile://host2/tmp/temp-mfs

Page 290: A Practical Introduction to Ab Initio v14


Exercise C: Multifile Commands

1.Create a three-partition multifile system named mfs-3way.

2.Create two directories within mfs-3way named dir1 and dir2.

3.Use m_ls to list the contents of mfs-3way.

Page 291: A Practical Introduction to Ab Initio v14


Exercise D: Using Multifiles

1.Use m_ls to examine other multidirectories and multifiles (in particular, mfs-2way).

2.Use ls to examine the control and data partitions of mfs-2way.

Page 292: A Practical Introduction to Ab Initio v14


Page 293: A Practical Introduction to Ab Initio v14


IDB Database

Page 294: A Practical Introduction to Ab Initio v14


Setting Up

•Ensure the Accounts, Sites, and Transactions datasets from Figure-07 of the intro training class are available

•Create the dbc config file for your database

•Load the data from each of the datasets into the database – remember to set necessary radio buttons on access tab

Page 295: A Practical Introduction to Ab Initio v14


Database Config File

•Click Config File and New to create a config file for your DBMS and instance.

Page 296: A Practical Introduction to Ab Initio v14


Database Config File

•A window with the available DBMS types will pop-up.

•Choose the one you want.

Page 297: A Practical Introduction to Ab Initio v14


Database Config File

•The Database Configuration file will be brought into an editor for you to fill in the necessary information

Page 298: A Practical Introduction to Ab Initio v14


Database Config File

dbms: oracle                    ## REQUIRED. Do not change the value of this tag from oracle.
db_version: 8.0.5               ## REQUIRED. Enter the Oracle version number.
db_home: c:/orant               ## REQUIRED. Enter the Oracle home directory.
db_name: NTORCL                 ## often just the SID
db_nodes: laptop-12             ## can be multiple
user: ${MY_USERNAME}            ## use environment variables
password: ${MY_PASSWORD}        ## so as not to hardcode
case: lower                     ## dml from dbms in lowercase
##column_delimiter:
generate_dml_with_nulls: false
fixed_size_dml: false
treat_blanks_as_null: true
##local_db_version:
## environment:
direct_parallel: false

Page 299: A Practical Introduction to Ab Initio v14


setup-idb-training.mp

These reformats do nothing but are required for the NT Oracle version to carry the fields to the database tables.

Page 300: A Practical Introduction to Ab Initio v14


Load Account Table Parameters: Access

• If truncating, append or replace are irrelevant.

Page 301: A Practical Introduction to Ab Initio v14


modify_accounts.mp

Page 302: A Practical Introduction to Ab Initio v14


update_account.mp

Page 303: A Practical Introduction to Ab Initio v14


Update Table Parameters

Page 304: A Practical Introduction to Ab Initio v14


updateSqlFile & insertSqlFile

updateSqlFile:
update accounts set address = :address where acct_id = :acct_id

insertSqlFile:
insert into accounts values (:acct_id,:acct_name,:address)

Page 305: A Practical Introduction to Ab Initio v14


Log file

laptop-12.abinitio.com|Thu May 03 10:21:45 2001|Gather_Logs.000||start|Start|

laptop-12.abinitio.com|Thu May 03 10:21:45 2001|Update_Table_Accounts.000|update|start||

laptop-12.abinitio.com|Thu May 03 10:21:46 2001|Update_Table_Accounts.000|update|sql|

Primary SQL supplied: update accounts set address =

:address where acct_id = :acct_id|

laptop-12.abinitio.com|Thu May 03 10:21:46 2001|Update_Table_Accounts.000|update|sql|

Secondary SQL supplied: insert into accounts values

(:acct_id,:acct_name,:address)|

laptop-12.abinitio.com|Thu May 03 10:21:48 2001|Update_Table_Accounts.000|update|finish|

10 records read

10 rows updated by SQL1

0 records sent to SQL2

0 rows updated by SQL2

0 records rejected|

laptop-12.abinitio.com|Thu May 03 10:21:48 2001|Gather_Logs.000||finish|End|

Page 306: A Practical Introduction to Ab Initio v14


check_updates.mp

Page 307: A Practical Introduction to Ab Initio v14


Browsing the database

Page 308: A Practical Introduction to Ab Initio v14


Selecting from available

Page 309: A Practical Introduction to Ab Initio v14


Input Table Properties: Select Statement

Page 310: A Practical Introduction to Ab Initio v14


Run SQL

Page 311: A Practical Introduction to Ab Initio v14


delete_rows.sql

Page 312: A Practical Introduction to Ab Initio v14


Log file (shortened for clarity)

SQL File to run: c:\data\training\data-for-training\delete_rows.sql|

SQL*Plus: Release 8.0.5.0.0 - Production on Thu May 3 12:12:8 2001|

(c) Copyright 1998 Oracle Corporation. All rights reserved.|

Connected to:|Oracle8 Enterprise Edition Release 8.0.5.0.0 - Production|

PL/SQL Release 8.0.5.0.0 - Production|

COUNT(*)|---------| 1000|

3 rows deleted.|

COUNT(*)|---------| 997|

Commit complete.|

Disconnected from Oracle8 Enterprise Edition Release 8.0.5.0.0 - Production|

PL/SQL Release 8.0.5.0.0 - Production|

Page 313: A Practical Introduction to Ab Initio v14


Input Table (direct to output file = unload)

Page 314: A Practical Introduction to Ab Initio v14


Input Table Parameters: Source

Page 315: A Practical Introduction to Ab Initio v14


Serial unload: Select Statement

Page 316: A Practical Introduction to Ab Initio v14


Parallel Unload: ABLOCAL(tablename)

Page 317: A Practical Introduction to Ab Initio v14


Parallel Unload: ABLOCAL()

Page 318: A Practical Introduction to Ab Initio v14


ablocal_expr

ablocal_expr:
  if (this_partition() == 0) "acct_id <= 112347000"
  else if (this_partition() == 1) "acct_id > 112347000"
  else "1 = 2"
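The effect of this expression is that each partition of a parallel unload receives its own WHERE condition, and any partition beyond the first two receives a condition that is never true. The Python sketch below mirrors that logic. It is an illustration only: the function takes the partition number as an argument instead of calling `this_partition()`, and the surrounding SELECT text is an assumption about how the condition might be used.

```python
def ablocal_expr(this_partition):
    """Return the WHERE condition that a given partition should unload."""
    if this_partition == 0:
        return "acct_id <= 112347000"
    if this_partition == 1:
        return "acct_id > 112347000"
    return "1 = 2"  # never true: extra partitions unload no rows


# One query per partition; together the first two cover every acct_id once.
queries = [f"select * from accounts where {ablocal_expr(p)}" for p in range(3)]
for query in queries:
    print(query)
```

The two real conditions split the key space at a single boundary value, so every row is unloaded by exactly one partition.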

Page 319: A Practical Introduction to Ab Initio v14


Testing and Validation:Techniques and Strategies

Page 320: A Practical Introduction to Ab Initio v14


Testing and Validation

• Components
• DML Features
• Data Cleansing
• Generating Test Data
• Testing Strategies

Page 321: A Practical Introduction to Ab Initio v14


Components for Testing and Validation

• In Component Organizer: Validate Category
  • Check Order
  • Compare Checksums
  • Compare Records
  • Compute Checksum
  • Generate Random Bytes
  • Generate Records
  • Validate Records

• In Other Categories:
  • Intermediate File (Datasets)
  • Trash (Miscellaneous)
  • Dedup Sorted (Transform)

Page 322: A Practical Introduction to Ab Initio v14


Compare Records

Page 323: A Practical Introduction to Ab Initio v14


Generate Records and Validate Records

Page 324: A Practical Introduction to Ab Initio v14


DML: Validation Function Fields

Function fields with names that begin with “is_valid” are called to check validity of data.

record
  string(20) x;
  decimal(5) y;
  int is_valid_y() = (y > 0);
end
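The role of an is_valid function field can be pictured as a predicate that splits a flow into valid and rejected records. The Python sketch below shows that idea for the y > 0 rule. It is an illustration, not Ab Initio code; the `validate` driver and the sample records are hypothetical.

```python
def is_valid_y(rec):
    # Mirrors the function field: y is valid only when it is positive.
    return rec["y"] > 0


def validate(records):
    """Split records into those that pass the validity check and those that fail."""
    valid, rejected = [], []
    for rec in records:
        (valid if is_valid_y(rec) else rejected).append(rec)
    return valid, rejected


records = [{"x": "a", "y": 5}, {"x": "b", "y": -3}, {"x": "c", "y": 0}]
valid, rejected = validate(records)
print(valid, rejected)
```

Keeping the rule next to the field it checks means every consumer of the record format applies the same definition of validity.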

Page 325: A Practical Introduction to Ab Initio v14


Data Cleansing with Reformat

Page 326: A Practical Introduction to Ab Initio v14


Data Cleansing with Reformat

out :: cleanse(in) =
begin
  out.x :: in.x;
  out.y :1: if (in.y < 0) -in.y;
  out.y :2: 99999;
end;
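Read as prioritized rules, the cleanse function flips the sign of a negative y and otherwise falls back to 99999. The Python sketch below mirrors that reading. It assumes that an if without an else yields no value when its condition is false, so that the priority 2 rule applies; treat that as an interpretation of the slide rather than a statement of DML semantics.

```python
def cleanse(rec):
    """Repair y: negate it if negative (priority 1), else use 99999 (priority 2)."""
    y = rec["y"]
    if y is not None and y < 0:
        new_y = -y      # priority 1 produced a value
    else:
        new_y = 99999   # priority 1 produced nothing, so priority 2 applies
    return {"x": rec["x"], "y": new_y}


negative = cleanse({"x": "b", "y": -3})
zero = cleanse({"x": "c", "y": 0})
print(negative, zero)
```

This fallback only makes sense for records that already failed validation, for example rejects from a validity check such as y > 0, since a valid positive y would also be replaced by 99999.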

Page 327: A Practical Introduction to Ab Initio v14


Generating Test Data

• Generate Records component has many options: sequential values, values of expressions, etc.

• Data can be produced “by hand” and reformatted in any way desired.

• Data that is to be joined can be produced by generating “wide” records and reformatting into different data sets.

• Real data can be sampled/selected to produce test data.

• Test data can be combined from multiple sources.

Page 328: A Practical Introduction to Ab Initio v14


Generating Test Data

Page 329: A Practical Introduction to Ab Initio v14


Using Intermediate Files to “Capture” Data

Page 330: A Practical Introduction to Ab Initio v14


Testing Strategies

•Mix generated data with select special cases.

•Test serial form of application first.

•Test parallel form only after serial version passes.

•Use production data to flesh out test data.

•Use Intermediate Files to capture intermediate results

Page 331: A Practical Introduction to Ab Initio v14


Appendix A: Additional Exercises

A Practical Introduction to Ab Initio Software

Page 332: A Practical Introduction to Ab Initio v14


The Problem

A phone company has accounts (customers) who have switches at a number of sites (locations). Each site can both send and receive (from-site and to-site). The switches at the sites record transactions that include the from-site, the to-site, the date, the time of day, and the duration of the call. The phone company wants to do some analysis on this data.

Page 333: A Practical Introduction to Ab Initio v14


The Data (see Figure 07)

These exercises will make use of:

• Accounts dataset: acct.dat, acct.dml. Records for each account.

• Sites dataset: site.dat, site.dml. Records for each account’s site.

• Transactions dataset: trans.dat, trans.dml. Records for transactions between sites.

Open figure-07.mp where these datasets are defined.

Page 334: A Practical Introduction to Ab Initio v14


The Scenario - High Level ER Diagram

Accounts: acct_id, acct_name, address

Sites: site_id, acct_id, address

Transactions: from_site, to_site, dt, hhmmss, duration

[Diagram: entity-relationship lines with cardinality labels 1, M, 1, N, 1, M]

Page 335: A Practical Introduction to Ab Initio v14


Data format and sample: Accounts

record
  decimal(9) acct_id;
  string(15) acct_name;
  string(20) address;
  string(1) newline = "\n";
end

112346893Johnson Paints 2303 Appian Way
112342374Jackson Stone 419 Rockville Pkwy
112346225Kendall Drug One Main Street
112391676Stanfill Flower7286A Befug Ave
100246677Sihebosev Toys 59520 Lico St

Page 336: A Practical Introduction to Ab Initio v14


Data format and sample: Sites

record
  decimal(8) site_id;
  decimal(9) acct_id;
  string(20) address;
  string(1) newline = "\n";
end

33213432112347574Fourteen Helime Blvd
3321365411234237442 Babcock Way
332132341123468938288 Main St
332139981123423741232 Center St
332102121123462255061 Pollard Rd

Page 337: A Practical Introduction to Ab Initio v14


Data format and sample: Transactions

record
  decimal(8) from_site;
  decimal(8) to_site;
  date("YYYYMMDD") dt;
  decimal(6) hhmmss;
  decimal(5) duration;
  string(1) newline = "\n";
end

332136543321323419940428072929 236
332134323321399819940429082345 102
332102123321343219940430125202 2310
332136543321323419940403142811 39

Page 338: A Practical Introduction to Ab Initio v14

Exercise 1: Early Analysis

Build a non-partitioned application that processes the sites dataset to produce a dataset with the number of sites for each account.

Page 339: A Practical Introduction to Ab Initio v14

Things to Consider

Build a non-partitioned application that processes the sites dataset to produce a dataset with the number of sites for each account.

• What fields do you need on the output?
  (Hint: fewer is better.)

• What datasets do you need to process?
  (Hint: fewer is much better.)

• What are the steps you need to take?
  (Hint: fewer is better.)
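The logic behind Exercise 1 can be sketched in Python, purely to show the shape of the computation; it is not the Ab Initio graph the exercise asks for. The pairs come from the Sites sample, and only `acct_id` is actually used.

```python
from collections import Counter

# (site_id, acct_id) pairs taken from the Sites sample.
sites = [
    (33213432, 112347574),
    (33213654, 112342374),
    (33213234, 112346893),
    (33213998, 112342374),
    (33210212, 112346225),
]

# One input dataset, one key, one count per key.
sites_per_account = Counter(acct_id for _site_id, acct_id in sites)

for acct_id, n_sites in sorted(sites_per_account.items()):
    print(acct_id, n_sites)
```

The output needs only two fields: the account key and the count.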

Page 340: A Practical Introduction to Ab Initio v14

Exercise 2: Include the Account Name in the Information from the Last Exercise

• Modify the previous application to produce a dataset that includes the account name of every account in the accounts dataset, and the number of sites per account you just computed.
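A Python sketch of the join in Exercise 2, for illustration only. The data comes from the Accounts and Sites samples. Reporting a count of 0 for an account with no sites is an assumption; the slide says only that every account must appear.

```python
# acct_id -> acct_name, from the Accounts sample.
accounts = {
    112346893: "Johnson Paints",
    112342374: "Jackson Stone",
    112346225: "Kendall Drug",
    112391676: "Stanfill Flower",
    100246677: "Sihebosev Toys",
}

# acct_id -> number of sites, as computed in the previous exercise.
site_counts = {112347574: 1, 112342374: 2, 112346893: 1, 112346225: 1}

# Drive the join from accounts so that every account is kept.
# Assumption: accounts without sites get a count of 0.
report = [
    (acct_id, name, site_counts.get(acct_id, 0))
    for acct_id, name in accounts.items()
]

for row in report:
    print(row)
```

Note that account 112347574 has a site but no row in the Accounts sample, so it does not appear in the report.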

Page 341: A Practical Introduction to Ab Initio v14

Exercise 3: Practice Going Parallel with Data

• Make your solution to Exercise 2 run with parallel data streams.
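The idea behind going parallel can be sketched in Python: partition the records by the key, aggregate each partition independently, then combine. This is an illustration only. `N_PARTITIONS` and `partition_by_key` are hypothetical, and `acct_id % n` stands in for whatever key-based partitioning the real graph uses.

```python
from collections import Counter

N_PARTITIONS = 2  # hypothetical degree of parallelism

# (site_id, acct_id) pairs taken from the Sites sample.
sites = [
    (33213432, 112347574),
    (33213654, 112342374),
    (33213234, 112346893),
    (33213998, 112342374),
    (33210212, 112346225),
]


def partition_by_key(records, n):
    """Send every record with the same acct_id to the same partition."""
    parts = [[] for _ in range(n)]
    for site_id, acct_id in records:
        parts[acct_id % n].append((site_id, acct_id))
    return parts


partitions = partition_by_key(sites, N_PARTITIONS)

# Each partition is counted on its own; in a real run these would proceed in parallel.
partial_counts = [Counter(acct for _site, acct in part) for part in partitions]

# A key never spans partitions, so combining the results needs no re-aggregation.
combined = Counter()
for counts in partial_counts:
    combined.update(counts)

print(combined)
```

Partitioning on the aggregation key is what makes the per-partition counts final; partitioning on anything else would require a second aggregation step.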

Page 342: A Practical Introduction to Ab Initio v14

Exercise 4: Further Analysis

• Build a data parallel application that processes the transactions dataset to produce a dataset that contains, for each site:
  • The number of transactions made from the site.
  • The sum of the durations of all transactions made from the site.
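The two aggregates in Exercise 4 can be sketched in Python for a single partition of data. This is an illustration of the logic, not an Ab Initio transform. The rows are the Transactions sample reduced to (from_site, to_site, duration).

```python
from collections import defaultdict

# (from_site, to_site, duration) from the Transactions sample.
transactions = [
    (33213654, 33213234, 236),
    (33213432, 33213998, 102),
    (33210212, 33213432, 2310),
    (33213654, 33213234, 39),
]

# from_site -> [transaction count, total duration]
totals = defaultdict(lambda: [0, 0])
for from_site, _to_site, duration in transactions:
    entry = totals[from_site]
    entry[0] += 1
    entry[1] += duration

# Freeze into plain tuples for output.
per_site = {site: tuple(values) for site, values in totals.items()}
print(per_site)
```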

Page 343: A Practical Introduction to Ab Initio v14

Exercise 5: Yet More Analysis

• Modify the previous application to produce two serial datasets: one sorted by number of transactions (descending) and another sorted by duration of all transactions (descending).
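A short Python sketch of the two outputs in Exercise 5: the same per-site totals are sorted twice on different keys. It is an illustration only, and the rows are the totals implied by the Transactions sample.

```python
# (from_site, transaction count, total duration) per site.
per_site = [
    (33213654, 2, 275),
    (33213432, 1, 102),
    (33210212, 1, 2310),
]

# Output 1: most transactions first.
by_count = sorted(per_site, key=lambda row: row[1], reverse=True)

# Output 2: longest total duration first.
by_duration = sorted(per_site, key=lambda row: row[2], reverse=True)

print(by_count)
print(by_duration)
```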

Page 344: A Practical Introduction to Ab Initio v14

Exercise 6: Marketing’s Real Question

• Which named accounts are our “best” accounts based on frequency and duration?

• Build a data parallel application that finds accounts that are both frequent (20 or more transactions) and long in total duration (greater than 10000). Include the account name in your answer.
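The selection in Exercise 6 can be sketched in Python. This is an illustration under two assumptions: a transaction is attributed to the account that owns its `from_site`, and the data below is made up (the course samples are too small to reach the thresholds). The account names are placeholders.

```python
from collections import defaultdict

MIN_TRANSACTIONS = 20  # "20 or more transactions"
MIN_DURATION = 10000   # "greater than 10000", so the comparison is strict

# Made-up illustration data, not from the course datasets.
account_names = {1: "Account A", 2: "Account B"}
site_to_account = {10: 1, 11: 1, 20: 2}
# (from_site, duration)
transactions = [(10, 600)] * 12 + [(11, 500)] * 10 + [(20, 9000)] * 2

# acct_id -> [transaction count, total duration]
per_account = defaultdict(lambda: [0, 0])
for from_site, duration in transactions:
    entry = per_account[site_to_account[from_site]]
    entry[0] += 1
    entry[1] += duration

best = [
    (account_names[acct], count, total)
    for acct, (count, total) in sorted(per_account.items())
    if count >= MIN_TRANSACTIONS and total > MIN_DURATION
]
print(best)
```

Account B has a long total duration but only two transactions, so it fails the frequency test.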

Page 345: A Practical Introduction to Ab Initio v14

Exercise 7: Review/Revise for Efficiency

• Use the output select stage in the Package Editor of Rollup to do the filter specified in Exercise 6.

• Don’t forget to document that the selector is there.

Page 346: A Practical Introduction to Ab Initio v14

Exercise 8: Further Revise for Efficiency

• Replace one of the JOIN components with a lookup table.

Hint: Make ACCOUNTS.dat a lookup table.
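The idea of swapping a join for a lookup can be sketched in Python, with a dictionary standing in for the lookup table. This is not Ab Initio's lookup mechanism; `account_name` is a hypothetical helper, and the data comes from the course samples.

```python
from typing import Optional

# ACCOUNTS held in memory, keyed on acct_id (from the Accounts sample).
accounts_lookup = {
    112346893: "Johnson Paints",
    112342374: "Jackson Stone",
    112346225: "Kendall Drug",
    112391676: "Stanfill Flower",
    100246677: "Sihebosev Toys",
}

# (acct_id, number of sites) records flowing through the graph.
site_counts = [(112342374, 2), (112346893, 1), (112347574, 1)]


def account_name(acct_id: int) -> Optional[str]:
    """Return the account name, or None when the key has no match."""
    return accounts_lookup.get(acct_id)


# Each record is enriched in place: no second input flow and no sort on the key.
enriched = [(acct_id, account_name(acct_id), n) for acct_id, n in site_counts]
print(enriched)
```

This works well when the lookup side is small enough to hold in memory, which is why the hint points at the accounts dataset.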

Page 347: A Practical Introduction to Ab Initio v14

Exercise 9: How Many Within-Account Transactions?

• How many From_Site / To_Site combinations are within one account?
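One reading of Exercise 9, sketched in Python: map each site to its account, then keep the from/to pairs whose two sites share an account. Counting distinct pairs (rather than transactions) is an assumption, so both counts are shown. The last two rows are made up so the example has a match; the rest come from the course samples.

```python
# site_id -> acct_id, from the Sites sample.
site_to_account = {
    33213432: 112347574,
    33213654: 112342374,
    33213234: 112346893,
    33213998: 112342374,
    33210212: 112346225,
}

# (from_site, to_site) per transaction.
pairs = [
    (33213654, 33213234),
    (33213432, 33213998),
    (33210212, 33213432),
    (33213654, 33213234),
    (33213654, 33213998),  # made-up row: both sites belong to 112342374
    (33213654, 33213998),  # made-up repeat of the same combination
]

# Distinct combinations whose two sites map to the same known account.
within = {
    (src, dst)
    for src, dst in pairs
    if site_to_account.get(src) is not None
    and site_to_account.get(src) == site_to_account.get(dst)
}

n_within_pairs = len(within)
n_within_txns = sum(1 for pair in pairs if pair in within)
print(n_within_pairs, n_within_txns)
```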

Page 348: A Practical Introduction to Ab Initio v14

Page 349: A Practical Introduction to Ab Initio v14

Appendix B: Setting Up the Graphical Development Environment

A Practical Introduction to Ab Initio Software

Page 350: A Practical Introduction to Ab Initio v14

Setting Up the GDE

From menu bar: Run... Settings...

Edit

Page 351: A Practical Introduction to Ab Initio v14

Editing the Host Profile

Hostname of server

User ID

Password

Some environments may require other settings

Page 352: A Practical Introduction to Ab Initio v14

Host Profile Settings:

Connection:
  Host: host
  Login: login
  Password: password

Page 353: A Practical Introduction to Ab Initio v14

Setting Up: Copying On-Line Materials

• From menu bar: Run… Execute Command…

• $AB_HOME/examples/intro-course/set-up-training