14
More ETL

More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Embed Size (px)

Citation preview

Page 1: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

More ETL

Page 2: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

ETL in a nutshell

• ETL is an abbreviation of the three words Extract, Transform and Load.

• It is an ETL process to – extract data, mostly from different types

of systems, – transform it into a structure that's more

appropriate for reporting and analysis– finally load it into the database and or

cube(s).

Page 3: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Logical Data Map

• Used to collect and document source systems to be used for DW

• Should contain the following:– Target table name– Target column name– Table type– SCD type– Source db– Source table name– Source column name– Transformation

Page 4: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

ETL in a nutshell

• Extract from source• Transform the data• Load the data into the

datawarehouse

Page 5: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Do not forget

• Data Profiling• Data Quality Control• Metadata Management

Page 6: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

ETL Tools1. Oracle Warehouse Builder (OWB) Oracle 2. Data Services SAP Business Objects 3. IBM Information Server (Datastage) IBM4. SAS Data Integration Studio SAS Institute 5. PowerCenter Informatica 9.0 Informatica 6. Elixir Repertoire 7.2.2 Elixir7. Data Migrator 7.7 Information Builders 8. SQL Server Integration Services (SSIS) Microsoft 9. Talend Open Studio & Integration Suite Talend 10.DataFlow Manager 6.5 Pitney Bowes

Business Insight

Page 7: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

On ETL Tools

• All the ETL vendors say the same thing: “our biggest competitor is custom code and stored procedures”. There are still a large number of organizations building data warehouses without using an ETL tool, writing their own, mostly very complex, SQL statements, often difficult to maintain.

• ETL tools can increase productivity by a factor of three to five. Two of the most important features in this case are that these tools are completely meta data driven, and work with an easy to understand graphical user interface (GUI).

Page 8: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

On ETL Tools (WYSWYG)

• Tools with data driven, embedded data driven approach. They can show you, at any time you choose, how the data will be transformed and loaded into the data warehouse, without actually loading it. At any time, the user can press the ‘WYSIWYG’ button, and the ETL tool shows immediately what the results of an ETL flow would be

Page 9: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

The Future? EAI and ETL

• What are the similarities between Enterprise Application Integration and Extraction Transformation and Loading tools? Some people (analysts) think these technologies will converge to form one product, Enterprise Information Management. They both move data, often use the same metadata, so why shouldn't they merge into one package?

Page 10: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

ETL Tool Selection

• In many organisations there is no software selection process at all. Sometimes there is a corporate standard you need to comply to, or you may have a database or BI tool in place from a specific vendor, and you decide to buy an ETL tool from the same vendor, because you expect that these will work better together.

• Often organisations find it difficult and labor intensive to execute a software selection process in order to gather all the necessary data. 

Page 11: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

ETL Tool Selection: From Company Perspective

• Create data integration strategy• Define criteria in business terms• Create a short-list• Invite vendors for live demonstration• Perform Proof of Concept• Negotiate with Vendors• Close the Deal

Page 12: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Selection Criteria

• Architecture• ETL Functionality• Ease-of-Use• Reusability• Debugging• Real-time• Connectivity• General ETL tool characteristics

Page 13: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Selection Criteria

• Architecture– Symmetric Multiprocssing? Grid? etc

• ETL Functionality– Conditional Splitting? Pivoting? etc

• Ease-of-Use– Screen design? Task compatibility? etc

• Reusability– Decompisition? User-defined functions?

• Debugging– Breakpoints? Validators? etc

• Real-time– On demand data integration? etc

Page 14: More ETL. ETL in a nutshell ETL is an abbreviation of the three words Extract, Transform and Load. It is an ETL process to –extract data, mostly from

Long Test

• Date and Venue• Coverage