Choosing the Right Steps in Pentaho Kettle
Alex Meadows
BI Engineer, iContact
August RTP PUG Meetup
Kettle (PDI): The ETL Swiss Army Knife
Over 100 steps
Plugin Architecture
Scripting Steps
Which to use?!?!?
Example: Loading a Text File
Text File Input, right?
Will work for most text files
Most powerful of the text file inputs
There are other options in PDI!
Find the one that closely matches what you're trying to do
Example: Sharded Databases
Sharding is a default feature of database connections
The shard list is not dynamic, so it has to be updated by hand as needed
Example: Sharded Databases
Needed a dynamic shard list
Built a job and transformation that read the shard list from a table and perform a function on each shard in the table
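The pattern on this slide (read the shard list from a table, then run the same work once per shard) can be sketched outside of PDI. The following is a minimal Python illustration, not the job and transformation from the talk: the `shards` table, its `name`, `dsn`, and `active` columns, and the helper names are all hypothetical, and an in-memory SQLite database stands in for the real control table.

```python
import sqlite3


def load_shards(control_conn):
    """Read the current shard list from a control table (hypothetical schema)."""
    rows = control_conn.execute(
        "SELECT name, dsn FROM shards WHERE active = 1 ORDER BY name"
    ).fetchall()
    return [{"name": name, "dsn": dsn} for name, dsn in rows]


def run_on_each_shard(shards, work):
    """Apply `work` to every shard and collect the results by shard name."""
    return {shard["name"]: work(shard) for shard in shards}


# Demo: an in-memory table stands in for the real shard list (assumption).
control = sqlite3.connect(":memory:")
control.execute("CREATE TABLE shards (name TEXT, dsn TEXT, active INTEGER)")
control.executemany(
    "INSERT INTO shards VALUES (?, ?, ?)",
    [
        ("shard_a", "host-a/db", 1),
        ("shard_b", "host-b/db", 1),
        ("shard_old", "host-c/db", 0),  # retired shard, skipped by the query
    ],
)

shards = load_shards(control)
# The lambda is a placeholder for whatever function runs against each shard.
results = run_on_each_shard(shards, lambda shard: "processed " + shard["dsn"])
```

Because the list is queried on every run, adding or retiring a shard only requires a change to the table, not to the connection definition.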
Plugins Add More Functions
Community contributions
Teradata Bulk Loader
R/Weka Integration
Treated as siblings of native steps
All native steps are in essence plugins.
Many eventually become part of the core product.
Processing handled directly within the engine, just like native steps
Scripting Steps
Greatest functionality/flexibility
Executes/compiles at runtime
Can dramatically slow performance
If a script is used in multiple places, turn it into a plugin for potentially better performance
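The runtime cost mentioned on this slide can be illustrated by analogy. The sketch below is plain Python, not PDI: it compares the same per-row calculation written as a native function, as script text that is parsed again for every row, and as script text compiled once up front. The field names `qty` and `price` and all function names are made up for the example, and it makes no claim about how PDI's scripting steps work internally.

```python
def native_step(row):
    """Logic built into the program, analogous to a native step or plugin."""
    return row["qty"] * row["price"]


# The same logic held as text, as it would be in a scripting step.
SCRIPT = "row['qty'] * row['price']"


def scripted_step_naive(row):
    """Parses and compiles the script text again for every single row."""
    return eval(SCRIPT, {}, {"row": row})


# Compiling once moves the parsing cost out of the per-row path.
COMPILED = compile(SCRIPT, "<script>", "eval")


def scripted_step_compiled(row):
    """Evaluates the precompiled script; no re-parsing per row."""
    return eval(COMPILED, {}, {"row": row})


rows = [{"qty": qty, "price": 2.5} for qty in range(1000)]

native = [native_step(row) for row in rows]
naive = [scripted_step_naive(row) for row in rows]
compiled = [scripted_step_compiled(row) for row in rows]
```

All three variants produce the same values; they differ only in how much work happens per row. Wrapping each list comprehension with `timeit` from the standard library is one way to compare them on a given machine.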
Recommended Reading
Pentaho Solutions (general BI audience)
Pentaho Data Integration Beginner's Guide (beginner)
Pentaho Data Integration Cookbook (intermediate)
Pentaho Kettle Solutions (advanced)