8/3/2019 a - Advanced Topics
1/43
Advanced Topics
Informatica
Topics of Discussion
Identifying Bottlenecks
Session Settings
Some tips from experience
Identifying Bottlenecks
Target Bottleneck: Writing to a slow target?
Source Bottleneck: Reading from a slow source?
Mapping Bottleneck: Transformation inefficiencies?
Session Bottleneck: Session inefficiencies?
System Bottleneck: System not optimized?
Flow: Identify Bottleneck -> Performance ok? -> No: identify the next bottleneck; Yes: Done
Target Bottleneck
Common causes of problems
Indexes or key constraints
Database checkpoints
Small database network packet size
Too many target instances in the mapping
Target table is too wide
Common solutions
Drop indexes and key constraints before loading, then rebuild them
Use bulk loading wherever practical
Increase database network packet size
Decrease the frequency of database checkpoints
When using partitions, consider partitioning the target table
Source Bottleneck
Common causes of problems
Slow query
Index issues
Highly complex
Small database network packet size
Wide source tables
Common Solutions
Analyze the query with the help of a show plan or other tools
Consider using database optimizer hints when joining several tables
Consider indexing tables when you have ORDER BY or GROUP BY clauses
Test the source qualifier conditional filter versus filtering at the database level
Increase database network packet size
Mapping Bottleneck
Common causes of problems
Too many transformations
Unused links between ports
Too many input/output or output ports in aggregator or rank transformations
Unnecessary datatype conversions
Common Solutions
Eliminate transform errors
If several mappings read from the same source, try single-pass reading
Optimize datatypes: use integers for comparisons
Don't convert back and forth between datatypes
Optimize lookups and lookup tables, using caching and indexed tables
Put the filters early in the data flow and use simple filter conditions
Mapping Bottleneck
Common Solutions
For aggregators, use sorted input, group by integer columns, and simplify expressions
Use reusable sequence generators, increase number of cached values
If you use the same logic in different data streams, apply it before the streams branch off
Optimize expressions
Isolate slow and complex expressions
Reduce or simplify aggregate functions
Use local variables to encapsulate repeated computations
Integer computations are faster than character computations
Use operators rather than the equivalent functions: || is faster than CONCAT()
Session Bottleneck
Common causes of problems
Inappropriate memory allocation settings
Running in series rather than in parallel
Error tracing override set to high level
Common Solutions
Experiment with DTM buffer pool and buffer block size
If your mapping allows it, use partitioning
Run sessions in parallel with concurrent batches, whenever possible
Increase database commit interval
Turn off recovery and decimal arithmetic (they're off by default)
Use the debugger rather than high error tracing; always reduce your tracing level during production runs
Don't stage your data if you can avoid it
System Bottleneck
Common causes of problems
Slow network connections
Overloaded or under-powered servers
Slow disk performance
Common Solutions
Get the best machines to run the servers
Use multiple CPUs and session partitioning
Make sure Informatica servers and database servers are closely located in your network
If possible, consider having the Informatica server and database server on the same machine
Identifying Bottlenecks
Examining session results
Read/write throughput
Rows failed
Number of objects in the mapping
Type of objects in the mapping
Examining Parallelism, Partitioning
How many objects run in parallel/partitioned?
What's the size of the hardware (source/target/database)?
What kind of pipeline is set up for sourcing and targeting?
Identifying Bottlenecks (2)
Source SQL/Lookup SQL
Group/Order bys
Distinct clauses
WHERE clause (filters): use of non-indexed fields
Invalid plans
Database issues
Database connection configuration
Database instance configuration
Identifying Bottlenecks (3)
Aggregator Problems
Too many multi-level aggregates
Joiner Problems
Incorrect selection of master table
Too many (or too wide) join columns
Not tuning the data and index caches
Rank Problems
Not tuning the data and index caches
Identifying Bottlenecks (4)
Source or Target problems
Too many fields, width (precision) issues
Implicit data conversions
Update Strategies
Too many targets per mapping
No use of the bulk-loader
Session Problems
Not enough RAM given to the session
Commit point too high
Commit point too low
Too many sessions running in parallel
Top 10 Mapping Bottlenecks
1. Too many targets in a single mapping
2. Data width is too large (too many columns passing through the mapping)
3. Too many aggregators, lookups, joiners, ranks in the mapping
4. Not tuning data/index settings for the above objects
5. Too many objects in a single mapping
6. Unused ports in cached lookups
7. Source query/joins not tuned
8. Lookup query/cache not tuned
9. Ports passed through the mapping but not passed to the target
10. Huge expressions
Top 10 Session Mistakes
1. Not controlling the log file override
2. Not tuning the data and index caches for lookups, aggregators, ranks and joiners
3. Not tuning the commit point to match the database performance setup
4. Assuming that giving the mapping more memory will make it run faster
5. Assuming that increasing the commit point will make it run faster
6. Not utilizing available partitioning
7. Running too many sessions in parallel on an undersized, over-utilized machine
8. Not architecting for failed rows
9. Not setting the line buffer length for flat files
10. Not testing the session for performance, when targeting a flat file
Problems with Maps
Multiple targets, single thread used for write processes
Multiple aggregators, single thread used for moving the data
Stacked aggregators fight for memory, disk and cache directory in a single session
Problems with Maps (2)
Filter condition is too lengthy and not optimized
Expression only performs a single calculation, forcing the entire row to be processed when only the one field should flow through
Disk contention is high: 4 targets, single writer thread, I/O is a hotspot
Expression/Filter Contention
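Slower: Expression -> Filter, with the filter condition IIF (EmpId=A and .. Or .. ..)
Faster: Expression -> Filter, with the expression computing B_Rowpass = IIF (EmpId=A and .. Or ...) and the filter condition B_Rowpass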
Expressions are built for evaluation speed
Filters take a different code path (slower)
Passing a numeric integer to the filter keeps it fast (increases throughput by between 0.5x and 3x)
Filter expressions should be as simple as possible to maximize throughput
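A minimal sketch of this idea, written in Python rather than Informatica expression syntax. The compound condition is evaluated once in an expression step and stored as an integer flag (called b_rowpass here, after the slide's B_Rowpass), so the filter step only tests that integer. The row layout and the condition itself are assumptions made up for illustration.

```python
def expression_step(row):
    """Evaluate the compound condition once and store it as an integer flag."""
    # Hypothetical condition standing in for the slide's IIF(EmpId=A and .. Or ...)
    passes = row["emp_id"] == "A" and (row["dept"] == "HR" or row["active"])
    return {**row, "b_rowpass": 1 if passes else 0}


def filter_step(rows):
    """The filter only tests the precomputed integer flag."""
    return [row for row in rows if row["b_rowpass"] == 1]


rows = [
    {"emp_id": "A", "dept": "HR", "active": False},
    {"emp_id": "A", "dept": "IT", "active": False},
    {"emp_id": "B", "dept": "HR", "active": True},
]
kept = filter_step([expression_step(r) for r in rows])
```

Only the first row survives the filter, and the filter itself never sees the long condition.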
Aggregator Contention
Serial Execution: single map, multiple aggregators
Fight for disk I/O (cache directory)
Fight for RAM
Multiple pass aggregation of the entire data set
Session runs only as fast as the slowest aggregator
All aggregation done in serial
Parallel Execution: splitting the aggregators across maps
Each map has its own I/O process threads
Data is aggregated only once
Parallelism is increased
Amount of RAM per object is increased
All aggregation is done in parallel
Update Strategies
Update Target1
Update Target2
Update Target3
If each row must be examined, speed will be negatively impacted
Remove the update strategies through parallel mappings
Update strategies force each row to be analyzed
Update strategies don't work against flat files
Steps to tuning
Make a copy of the map for each target
Remove all but one target from each copy
Work backwards from the target to the source, eliminating unused or unnecessary transformations
Simplify the mapping
Move the filters upstream to the source if possible
Move a large cached lookup into a joiner on source feed
Stage the target data if necessary, then use a bulk loader to mass insert at high speeds
Tune the source SQL, and the session parameters
Tune the DB connection and the RDBMS
What does Session Partitioning Do?
It separates the data into physical blocks. This reduces the amount of work that each load process has to do, but increases the number of load processes that have to take place.
Partitioning is the method of splitting the data.
Execution of the partitioned loads is the parallelization of the loading process.
Source Horizontal Partitioning
Source/Target split into key ranges: A - L, M - S, T - Z
Provides best read ranges if you know what data you are after
Allows for process parallelism
Breaks source data into manageable parts
Caution: Requires additional maintenance
Faster indexes
Allows parallel queries to be run
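The key ranges above can be sketched in Python as follows. This is only an illustration of splitting rows by range and running one load per partition in parallel; the row layout, the partition_rows helper and the load_partition stand-in are hypothetical and say nothing about how Informatica implements partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

# Key ranges taken from the slide; each range becomes one partition.
RANGES = [("A", "L"), ("M", "S"), ("T", "Z")]


def partition_rows(rows, key):
    """Split rows into one list per key range, based on the first letter."""
    partitions = [[] for _ in RANGES]
    for row in rows:
        first = row[key][0].upper()
        # Rows whose first character falls outside A-Z are skipped in this sketch.
        for i, (low, high) in enumerate(RANGES):
            if low <= first <= high:
                partitions[i].append(row)
                break
    return partitions


def load_partition(rows):
    """Hypothetical stand-in for one load process; returns the row count."""
    return len(rows)


names = ["Adams", "Lee", "Moore", "Smith", "Turner", "Young", "Zhou"]
rows = [{"name": n} for n in names]
partitions = partition_rows(rows, "name")
with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    counts = list(pool.map(load_partition, partitions))
```

Each partition does less work than a single load over all rows, at the cost of running three loads instead of one.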
Source Vertical Partitioning
Source/Target split into column groups: Cols 0-100, Cols 200-300
Provides smaller network packets
Allows increased parallelism / increased parallel reads
Can provide better management over wide tables
Potentially decreases I/Os on the source side
Assists in the processing component of the data movement
Joiner Object
Master reads rows first into data and index caches
RAM is impacted depending on the rows read
Joiner fields are read into D&I
No control over D&I memory sizes
Contrary to popular belief, the joiner is not slow (compared to a cached lookup)
The joiner is a powerful object for performance tuning, especially when getting rid of cached lookups.
Joiner Contention
Data and index cache must be calculated properly
Master-Detail Join: master should be the smaller table
Detail Outer Join: Master should be smaller table
Full Outer Join: Both tables should be relatively small
Good for heterogeneous sources
Rarely necessary when staging table architecture is employed
Consider the size of shared memory for large scale join operations
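To see why the master should be the smaller table, here is a generic hash-join sketch in Python. It is not the Joiner's actual implementation; it only assumes, as the slides describe, that master rows are read into a cache first and detail rows are then streamed past it. The table contents and the join_master_detail name are made up for illustration.

```python
def join_master_detail(master_rows, detail_rows, key):
    """Cache the (smaller) master rows by key, then stream the detail rows."""
    cache = {}
    for row in master_rows:  # memory use grows with the master row count
        cache.setdefault(row[key], []).append(row)

    joined = []
    for detail in detail_rows:  # detail rows are never cached
        for master in cache.get(detail[key], []):
            joined.append({**master, **detail})
    return joined


departments = [{"dept_id": 1, "dept": "HR"}, {"dept_id": 2, "dept": "IT"}]
employees = [
    {"dept_id": 1, "emp": "Ann"},
    {"dept_id": 2, "emp": "Bob"},
    {"dept_id": 2, "emp": "Cy"},
    {"dept_id": 3, "emp": "Di"},
]
result = join_master_detail(departments, employees, "dept_id")
```

Only the two department rows are held in memory; swapping the arguments would cache all four employee rows instead.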
Lookup Object
Initialization caches ALL data identified in the ports of the
lookup, including the data to be matched on
RAM is impacted depending on the number of rows read
The lookup's two flaws are: it fights for resources during execution, and its initialization speed depends on the speed of the SQL beneath it.
WIDE lookups can soak up a lot of time and space, especially with the ORDER BY clause that's generated/appended to the SQL.
Lookup Contention
Lookups should always be cached when: small width and huge number of rows (primary key only), or any width and small number of rows (less than 100,000). The exception is: if the hardware has enough RAM to handle all the concurrent sessions and the cached lookup, then it is cached.
If the lookup is cached, the data and index cache are utilized.
Uncached lookups should be used when: an extremely large number of rows is sourced, or a wide table is sourced, or RAM is scarce
An uncached lookup generates its own database connection
An uncached lookup should always be retrieved by primary key
Cached or uncached, the connection to the database should have the maximum packet size.
Aggregator Object
Initialization caches ALL data identified in the ports of the
aggregator
RAM is impacted depending on the number of rows read
The aggregator absorbs all rows before pushing them to the target
WIDE aggregates can soak up a lot of space; they can also increase I/Os if the data isn't sorted on the way in.
Rank Object
A lot like the aggregator: reads all rows into the data and index cache
RAM is impacted depending on the number of rows read.
WIDE ranks can soak up space; they can also increase I/Os if the data isn't sorted on the way in. All rows must be evaluated to get the top or bottom X%.
Sort Object
A lot like the aggregator: reads all rows into the data and index cache
RAM is impacted depending on the number of rows read.
WIDE rows can soak up space. Depending on what's set up for sorting on, the index can also grow large.
Router Transformation
Total of 6 passes for 3 filters. This can double the amount of
work for passing a row.
Expression -> Filter1 -> Target1
Expression -> Filter2 -> Target2
Expression -> Filter3 -> Target3
Router Transformation
Compared to 6 passes of data, now there are 4. This will help
improve the performance.
Expression -> Router -> Target1, Target2, Target3
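A rough Python sketch of the router idea: each row is read once and tested against every group condition, instead of being handed separately to three filters. The group names, the conditions and the amount field are hypothetical.

```python
def route(rows, groups):
    """Send each row to every group whose condition it satisfies, in one pass."""
    routed = {name: [] for name in groups}
    for row in rows:  # each row is read once
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
    return routed


# Hypothetical group conditions, one per target.
groups = {
    "target1": lambda r: r["amount"] < 100,
    "target2": lambda r: 100 <= r["amount"] < 1000,
    "target3": lambda r: r["amount"] >= 1000,
}
rows = [{"amount": a} for a in (50, 150, 5000, 999)]
routed = route(rows, groups)
```

The conditions here happen to be mutually exclusive, but nothing in the sketch requires that; a row matching two conditions would reach both targets.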
Why use Bulk-Loaders?
Native (Internal) Connectivity
Build row sets into RAM blocks
Capable of bypassing logging mechanisms
Capable of being run in parallel (synchronized with the RDBMS parallel engine)
Why NOT use Bulk-Loaders?
Limited to FLAT FILE sourcing
Provides for inserts only
Requires rigid input structure
Usually requires external scheduling resources
Most loaders cannot be scripted within the database
Complex logic for inserts can slow loaders tremendously
Bulk-Loader Modes
Fast: fast loads usually bypass referential integrity (R.I.) and indexing. Even faster modes append to the target tables.
Slow: slow loads run directly through the engine (same as other applications).
Loaders will switch modes if certain criteria aren't met before the job starts.
Important session settings
Shared Memory Size
Buffer Block Size
Line Width Size (If flat file source)
Data and Index Cache sizes
Commit Point
Log Setting
PowerCenter: Partition Settings
Session Speeds
Simple Map
Simple maps run faster given more memory
There is a 1.8 GB real memory limit for all maps
The performance depends on how many maps are running in parallel
Eventually too much parallelism will cause a slowdown of the entire system
Even the simple maps run only as fast as the source and the target
Complex Map
Each thread is linked by memory management: as the block sizes change, the thread speed slides
Speed tests indicate best average performance achieved with 128 KB block sizes and 24 MB RAM (shared memory)
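As a quick arithmetic check on those two figures, the sketch below counts how many blocks of the quoted size fit in the quoted shared memory. It assumes the shared memory is carved into equal-sized blocks and that "k" and "Mb" mean binary kilobytes and megabytes; both are assumptions, and buffer_block_count is a made-up helper name.

```python
def buffer_block_count(shared_memory_bytes, block_size_bytes):
    """Number of whole buffer blocks that fit in the shared memory."""
    return shared_memory_bytes // block_size_bytes


KB = 1024
MB = 1024 * KB
blocks = buffer_block_count(24 * MB, 128 * KB)
print(blocks)  # 192
```

Under these assumptions the quoted settings give 192 blocks; doubling the block size at the same shared memory would halve that count.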
Alternatives to a lookup
Expression
Lookup Expression
Source Qual
Joiner
Lookups are to be used for relatively smaller tables
A lookup caches all the data identified in the ports
Joiners cache only the keys on which the join happens
Alternatives to a lookup
Expression
Lookup Expression
Src Qual
Joiner
This is an unconnected lookup
The lookup is used to get the values based on a key and a code
Input (key, code, text):
1 1 Txt1
1 2 Txt2
1 3 Txt3
O/P result set (key, then one text per code):
1 Txt1 Txt2 Txt3
2 txta txtb txtc
The lookup is replaced by the source qualifier, expression (Exp), aggregator (Agg) and the joiner.
The expression is used to calculate the proper values and the aggregator is used to keep the record with the accurate information (which in this case is the last record)
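One way to picture the expression and aggregator steps is the Python sketch below. The expression carries a running list of texts for the current key, and the aggregator groups by key and keeps the last record, which by then holds every text. The input rows for key 2 are assumed (the slide shows only their result), and the function names are made up.

```python
def expression_step(rows):
    """Carry a running list of texts per key, like a variable port would."""
    current_key, texts = None, []
    for key, code, text in rows:  # rows must arrive sorted by key, then code
        if key != current_key:
            current_key, texts = key, []
        texts = texts + [text]  # new list, so earlier outputs are not mutated
        yield key, texts


def aggregator_step(rows):
    """Group by key and keep the last record seen for each key."""
    last = {}
    for key, texts in rows:
        last[key] = texts
    return last


rows = [
    (1, 1, "Txt1"), (1, 2, "Txt2"), (1, 3, "Txt3"),
    (2, 1, "txta"), (2, 2, "txtb"), (2, 3, "txtc"),  # assumed input for key 2
]
result = aggregator_step(expression_step(rows))
```

The result has one entry per key, matching the O/P result set shape: key 1 with Txt1, Txt2, Txt3 and key 2 with txta, txtb, txtc.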
Group by Clause
Src Qual -> Agg
A GROUP BY clause can be replaced by an aggregator with sorted input (if possible)
If the data arrives sorted, the aggregator is always faster than the GROUP BY
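The benefit of sorted input can be illustrated with a small Python sketch: when rows arrive sorted by the group key, each group can be finished and emitted as soon as the key changes, so only one group is held at a time. This is a generic streaming aggregation, not Informatica's implementation, and the data is made up.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum amounts per key; rows must already be sorted by key."""
    for key, group in groupby(rows, key=itemgetter(0)):
        # groupby only starts a new group when the key changes,
        # so unsorted input would produce split groups.
        yield key, sum(amount for _, amount in group)


rows = [("east", 10), ("east", 5), ("north", 7), ("west", 1), ("west", 2)]
totals = list(aggregate_sorted(rows))
```

If the input were not sorted, the same key could appear in several separate groups, which is why the sorted-input precondition matters.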
Deletes
Usual design: Src1, Src2 -> Src Qual -> Upd -> Tgt(Src1)
Source 2 is the driving table and Source 1 is the table from which the data is deleted
Normally this process gives the keys in table Source 1 from which the data is to be deleted based on some conditions, and then the data is deleted accordingly
Source 2 contains a few records on a key which forms part of the composite primary key in table 1
Alternative design: Src2 -> Src Qual -> Upd -> Tgt(Src1)
Source 2 is the only table that is used
The target instance contains an update override, based on which the deletion happens on the particular key coming from the source table
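The second design can be sketched with Python's built-in sqlite3 module. Only the driving keys are read, and the delete statement (standing in for the target update override) matches on the partial key, so the table being deleted from is never queried first. The table, column names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src1 (region TEXT, order_id INTEGER, "
    "PRIMARY KEY (region, order_id))"
)
conn.executemany(
    "INSERT INTO src1 VALUES (?, ?)",
    [("east", 1), ("east", 2), ("west", 1), ("north", 1)],
)

# Keys read from the driving table (Src2); src1 itself is never read.
driving_keys = [("east",), ("north",)]

# Stand-in for the target update override: delete on the partial key only.
conn.executemany("DELETE FROM src1 WHERE region = ?", driving_keys)
conn.commit()

remaining = conn.execute(
    "SELECT region, order_id FROM src1 ORDER BY region, order_id"
).fetchall()
```

Every row sharing a driving key is removed in one statement per key, regardless of the other part of the composite primary key.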
Thank You