`
Advanced Scanner
Naveed Haider & Rakesh Varma K
Principal Customer Success Technologist
2 © Informatica. Proprietary and Confidential.
Housekeeping Tips
Today’s Webinar is scheduled for 1 hour
The session will include a webcast and then your questions will be answered live at the end of the presentation
All dial-in participants will be muted to enable the speakers to present without interruption
Questions can be submitted to “All Panelists” via the Q&A option and we will respond at the end of the presentation
The webinar is being recorded and will be available to view on our INFASupport YouTube channel and Success Portal. The link will be emailed as well.
Please take time to complete the post-webinar survey and provide your feedback and suggestions for upcoming topics.
Feature Rich Success Portal
Product Learning Paths and Weekly Expert Sessions
Bootstrap trial and POC Customers
Informatica Concierge with Chatbot integrations
Enriched Customer Onboarding experience
Tailored training and content recommendations
More Information
Success Portal: https://success.informatica.com
Communities & Support: https://network.informatica.com
Documentation: https://docs.informatica.com
University: https://www.informatica.com/in/services-and-training/informatica-university.html
Safe Harbor
The information being provided today is for informational purposes only. The
development, release, and timing of any Informatica product or functionality
described today remain at the sole discretion of Informatica and should not be
relied upon in making a purchasing decision.
Statements made today are based on currently available information, which is
subject to change. Such statements should not be relied upon as a
representation, warranty or commitment to deliver specific products or
functionality in the future.
Agenda
• Advanced Scanners
• Why Advanced Scanners
• Advanced Scanner Architecture
• Installation Requirements & Best Practices
• Demo – Oracle Procedure
• Roadmap
What are the EDC Advanced Scanners?
Enterprise Data Catalog – Broadest Metadata Connectivity
Database: Oracle; IBM DB2 LUW; Microsoft SQL Server; Sybase ASE; IBM Netezza; Teradata; JDBC; Azure SQL DB; Azure SQL DW; MySQL; Amazon Redshift; Google Big Query; Snowflake; SAP Hana DB
Big Data: Cloudera Navigator; Hortonworks Atlas; HDFS; Hive; Kafka
NoSQL: Cassandra; MongoDB
Catalog: AWS Glue
Mainframe: IBM DB2 z/OS; IBM DB2 i5/OS
BI: SAP Business Objects; Cognos; Tableau; QlikView; Microstrategy; Microsoft PowerBI; OBIEE; QlikSense
INFA: Informatica Platform; Informatica Cloud; Informatica PowerCenter; Informatica Axon; Business Glossary; Informatica Data Integration Hub; Informatica Data Quality
Modelling: Erwin; SAP PowerDesigner
Files: Amazon S3; Azure Blob; Azure Data Lake Store; Microsoft SharePoint (2013, 2016, Online/Office 365); Microsoft OneDrive, OneDrive for Business; Google Cloud Storage; Windows/Linux
Apps: Salesforce; Workday; SAP ECC
Metadata connectivity is foundational to data cataloging and metadata management
Informatica provides the broadest metadata connectivity out of the box with EDC
Some ETL tools, scripts, and complex sources are still missing from this list
Advanced Scanner bridges that gap
What is Advanced Scanner
• An extension to the current Enterprise Data Catalog (EDC) product
• Extracts lineage from non-Informatica ETL processes and scripts
• Extracts metadata from complex sources
• Needs to be installed separately
EDC Advanced Scanners extend the industry’s broadest and most complete metadata connectivity
Code and Scripting
• Stored Procedures for Oracle
• Stored Procedures for SQL Server
• Stored Procedures for IBM DB2
• Stored Procedures for Netezza
• Stored Procedures for Teradata
Analytic Applications
• SAP BW
• SAP BW/4HANA
• SAS
• Microsoft SSRS
• Microsoft SSAS
Multi-Vendor ETL
• IBM DataStage
• Microsoft SSIS
Mainframe
• JCL
• COBOL
Advanced Scanners Architecture
Overall Architecture
• EDC Advanced Scanner (formerly known as Metadex) is a Java-based application
• Advanced Scanner has its own UI to configure the scanners
• Can be installed on
  • Linux (on the same machine as EDC or a separate one)
  • Windows (on the same machine as the EDC Agent or a separate one)
• Loads metadata into EDC automatically through the custom scanner framework
• Can run online (direct connection to EDC) or offline (manual steps are needed to load the metadata)
EDC – Advanced Scanner Deployment – Logical Architecture
[Diagram. Labels: Informatica EDC with its Metadata Cluster; EDC Advanced Scanner (MetaDex) on Windows or Linux, with MetaDex Configuration and a Repository DB (Oracle/SQL/DB2); Advanced Scanner inputs: SQL Scripts (Copy), View Definitions (Select from DB Catalog); Standard Resource (Out of Box); sources: DB/EDW, File System, Data Integration.]
EDC Architecture – Services & Advanced Scanner
[Diagram. Labels: Domain; Analyst Service; Developer UI; Catalog Administrator; Business Glossary; Model Repository Service; Smart Executor; Profiling Service; Data Integration Service; Content Mgmt Service; Enterprise Data Catalog User Interface; Enterprise Data Catalog Service; Informatica Cluster Service; Ambari UI; repositories MRS, PWH, REF on Oracle/DB2; Dedicated/Embedded Cluster (HDFS, YARN, HBase, Solr, Slider, Spark, Zookeeper, Scanner); Profiling Server; Infrastructure Server; HDFS, YARN, Zookeeper, Hive, Spark, Blaze, Sentry/Ranger; sources: Data Lake, DB/EDW, File System, Business Intelligence, Application/Cloud, Data Integration; EDC Advanced Scanner with EDC AS UI and EAS Repo.]
Advanced Scanner Ingestion Process Steps
[Diagram. Labels: Metadata Sources; Advanced Scanner Server (Metadex server, Advanced Scanner Processing) with its Repository; EDC (Create Metadex Model, Create Custom Resource Type, Create Custom Resource, Upload metadata zip file, Resource exec & Monitoring); Start process; steps numbered 1 to 6.]
1. The Metadex process extracts the metadata from the source system.
2. The process ensures that the Metadex model is available in EDC; if not, it creates it.
3. The process ensures that the custom resource type is available in EDC; if not, it creates it.
4. The process creates the custom resource.
5. The process uploads the metadata file.
6. The process starts and monitors the execution of the custom resource.
Connections: JDBC for DBs and Local/NAS for files (metadata sources); JDBC (repository); HTTP(s)/REST (EDC).
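As a rough illustration of the ordering in the six steps above, the following shell sketch chains one placeholder function per step so that a failure stops the run before any later step executes. Every function name here is hypothetical and only prints a message; none of them is an Advanced Scanner or EDC command.

```shell
#!/bin/sh
# Hypothetical stubs: each one only prints what the real step would do.
extract_metadata()     { echo "1. extract metadata from the source system"; }
ensure_model()         { echo "2. ensure the Metadex model exists in EDC"; }
ensure_resource_type() { echo "3. ensure the custom resource type exists in EDC"; }
create_resource()      { echo "4. create the custom resource"; }
upload_metadata()      { echo "5. upload the metadata zip file"; }
run_and_monitor()      { echo "6. start and monitor the custom resource"; }

# Run the steps in order; '&&' stops the chain at the first failing step.
run_ingestion() {
  extract_metadata &&
  ensure_model &&
  ensure_resource_type &&
  create_resource &&
  upload_metadata &&
  run_and_monitor
}

run_ingestion
```

Steps 2 and 3 are written as "ensure" operations because the slide describes them as create-if-missing, which makes repeated runs safe.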
EDC Sizing Guideline (summary)
Sizing based on metadata resources and concurrent users for 10.x
Informatica Domain + Metadex

Workload
Env. size   Conc. (total) users   Metadata resources   # of objects
Small       20 (200)              30-40                1 million
Medium      50 (500)              200-400              20 million
Large       100 (1000)            500-1000             50 million

Infrastructure
Env. size   CPU   RAM     Disk
Small       16    32 GB   200 GB
Medium      24    32 GB   200 GB
Large       48    64 GB   300 GB

Metadata Processing
Env. size   CPU   RAM     Disk
Small       16    32 GB   20 GB**
Medium      32    64 GB   100 GB**
Large       32    64 GB   500 GB**

Hadoop Cluster
Env. size   # of nodes   CPU   RAM      Disk
Small       1            8     24 GB    120 GB***
Medium      3            24    72 GB    2 TB***
Large       6            48    144 GB   12 TB***

• Refer to the Sizing and Performance Tuning Guide for sizing recommendations, parameter tuning, and more.
** 1 to 4 disks for profiling
*** 4 to 6 disks recommended on cluster nodes
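One way to read the sizing summary is as a lookup from expected object count to environment size. The sketch below assumes the object counts in the table (1, 20, and 50 million) act as upper bounds for each tier; that interpretation and the function name edc_env_size are assumptions for illustration, not something the slide states.

```shell
#!/bin/sh
# edc_env_size OBJECT_COUNT
# Prints the smallest tier whose object count (treated here as an upper
# bound, which is an assumption) covers the given number of objects.
edc_env_size() {
  objects=$1
  if   [ "$objects" -le 1000000 ];  then echo "Small"
  elif [ "$objects" -le 20000000 ]; then echo "Medium"
  elif [ "$objects" -le 50000000 ]; then echo "Large"
  else echo "Beyond the summary table"
  fi
}

edc_env_size 750000
edc_env_size 15000000
```

Concurrent users and the number of metadata resources also drive sizing, so a real estimate should take the largest tier any of those inputs points to.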
Advanced Scanners Pre-Requisites & Best Practices
Advanced Scanner Pre-Requisites
1. Operating System: one of these supported operating systems:
   1. Linux (Red Hat Enterprise Linux 7 or higher, Oracle Linux 7 or higher)
   2. MS Windows (Windows 10, Windows 2012, Windows 2016, Windows 2019)
   Note: The machine on which Advanced Scanners runs can be either a physical or a virtual machine. When running in a virtual environment, the virtualization platform (VMware, Citrix Xen, etc.) must be supported by the guest operating system being used.
2. Java: OpenJDK version 8.x
3. Memory: 4 GB RAM minimum (8 GB recommended); 18 GB if you plan to run two parallel conversion processes
4. Disk Space: 20 GB
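The prerequisites above can be checked before installing with a short script. This is a minimal sketch for Linux only: the helper names (check_min, ram_gb, free_disk_gb) are made up for this example, RAM is read from /proc/meminfo and rounded to the nearest GB, and disk space is measured on the filesystem of the current directory, so run it from the intended install location.

```shell
#!/bin/sh
# Thresholds taken from the prerequisite list above.
MIN_RAM_GB=4
MIN_DISK_GB=20

# check_min NAME ACTUAL REQUIRED: prints OK or FAIL; returns non-zero on FAIL.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 GB (minimum $3 GB)"
  else
    echo "FAIL $1: $2 GB (minimum $3 GB)"
    return 1
  fi
}

# Total RAM rounded to the nearest GB, read from /proc/meminfo (Linux only).
ram_gb() { awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo; }

# Free space in whole GB on the filesystem that holds the given directory.
free_disk_gb() { df -Pk "$1" | awk 'NR==2 {print int($4 / 1024 / 1024)}'; }

# Java: only report what is on PATH; the slide asks for OpenJDK 8.x.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "FAIL java not found on PATH (OpenJDK 8.x is required)"
fi

check_min "RAM" "$(ram_gb)" "$MIN_RAM_GB" || true
check_min "Disk" "$(free_disk_gb .)" "$MIN_DISK_GB" || true
```

Raise MIN_RAM_GB to 8 or 18 to check against the recommended or two-parallel-process figures instead of the minimum.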
Advanced Scanner - Workspace Folder & Variables
1. Advanced Scanner Home is the location where the software is unzipped.
2. Advanced Scanner (Metadex) uses a workspace to store the config files and scanner results.
   • If you do not specify a workspace (via the SCANNERS_WORKSPACE environment variable), it creates a workspace folder within the home folder for the software.
   • It is a best practice to separate the workspace from Scanner Home, so that new versions of the software can be installed and replace older ones without impacting the workspace.

Metadex env variable (old)   EDC Advanced Scanner env variable   Assigned value
METADEX_HOME                 SCANNERS_HOME                       /home/opt/Infa_Products/EDCAdvancedScanners-10.4-g70f977b-lx-GenericEDC
METADEX_WORKSPACE_PATH       SCANNER_WORKSPACE                   /home/opt/Infa_Products/workspace

Note: Add the defined variables to the OS .bash_profile on Linux, or create environment variables on Windows.
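On Linux, the note above amounts to appending two export lines to the profile file. The sketch below uses the example values from the table and writes to a temporary file by default so it can be tried safely; PROFILE is a name introduced for this example. The workspace variable is spelled SCANNER_WORKSPACE as in the table, although the slide text also uses SCANNERS_WORKSPACE, so confirm the exact name against the product documentation for your version.

```shell
#!/bin/sh
# Target file (hypothetical default): set PROFILE="$HOME/.bash_profile"
# to make the change permanent.
PROFILE="${PROFILE:-$(mktemp)}"

# Append the two variables, using the example values from the table.
cat >> "$PROFILE" <<'EOF'
# EDC Advanced Scanner locations
export SCANNERS_HOME=/home/opt/Infa_Products/EDCAdvancedScanners-10.4-g70f977b-lx-GenericEDC
export SCANNER_WORKSPACE=/home/opt/Infa_Products/workspace
EOF

# Load the variables into the current shell and show them.
. "$PROFILE"
echo "SCANNERS_HOME=$SCANNERS_HOME"
echo "SCANNER_WORKSPACE=$SCANNER_WORKSPACE"
```

Keeping the workspace outside SCANNERS_HOME, as in these values, is what lets a new software version replace the home folder without touching configuration or results.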
Advanced Scanner - High-Level Installation Steps
1. Download the Advanced Scanner software and license
2. Install Java and configure JAVA_HOME (if not already set)
3. Create the folders and define the variables (SCANNERS_HOME, SCANNER_WORKSPACE) as discussed on the previous slide
4. Create the database user and the JDBC properties file (if they do not already exist)
5. Set the Advanced Scanner repository properties: create the “repository.properties” configuration file under the “SCANNERS_WORKSPACE/etc” directory, then create the repository contents
6. Start the scanner
Advanced Scanner – Security Authentication
• Supports both built-in and LDAP users for authentication
• Permissions can be granted to individuals or groups
• System-level privileges (“Global roles”)
  • Admin
    • Permission to create, edit, and delete projects
    • Allows project role assignments
    • Permission to upload, edit, and delete files in the allowed server location
    • …
  • Repository Viewer
    • Allows viewing Advanced Scanner repository content
    • Allows accessing the built-in HTTP server
EDC Advanced Scanner Demo
Advanced Scanner – Download
1. Download the Advanced Scanner binaries
2. Unzip the downloaded Advanced Scanner binaries
Advanced Scanner – Extracted Files
Once unzipped, you should see a folder named EDCAdvancedScanners*
Advanced Scanner - Workspace Folder & Variables
1. Advanced Scanner Home is the location where the software is unzipped.
2. Advanced Scanner (Metadex) uses a workspace to store the config files and scanner results. If you do not specify one (via the SCANNERS_WORKSPACE environment variable), it creates a workspace folder within the home folder for the software; keeping the workspace separate from Scanner Home lets new versions replace older ones without impacting the workspace.

Metadex env variable (old)   EDC Advanced Scanner env variable   Assigned value
METADEX_HOME                 SCANNERS_HOME                       /home1/pc1021/Metadex/EDCAdvancedScanners-10.4.1.2.202010151311-g70f977b-lx-GenericEDC
METADEX_WORKSPACE_PATH       SCANNER_WORKSPACE                   /home1/pc1021/Metadex/workspace

Add the defined variables to the OS .bash_profile.
Advanced Scanner – Create DB Schema
1. Advanced Scanner requires one DB schema to store its metadata. Supported databases are Oracle, SQL Server, and DB2.
2. Create the DB schema on Oracle and grant privileges:
   create user <user_name> identified by <password>;
   grant connect, resource, create view, unlimited tablespace to <user_name>;
3. Create the workspace (the SCANNER_WORKSPACE location) and config directories for the scanners:
   mkdir -p $SCANNER_WORKSPACE/lib
   mkdir -p $SCANNER_WORKSPACE/etc
4. Copy the DB JDBC driver libraries (jar files) into the “lib” folder, subfolder “jdbc”:
   mkdir -p $SCANNER_WORKSPACE/lib/jdbc
   cp */CatalogService/scanner_agents/Catalog_Agent_install/java/Jdbc/oracle/ojdbc7.jar $SCANNER_WORKSPACE/lib/jdbc/
5. Create the JDBC properties file:
   echo 'oracle.jdbc.driver.OracleDriver=ojdbc7.jar' > $SCANNER_WORKSPACE/etc/jdbc.properties
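The workspace and driver steps above can be combined into one script that is safe to re-run. This is a sketch under assumptions: WORKSPACE falls back to a temporary directory when SCANNER_WORKSPACE is not set, and JDBC_JAR is a hypothetical path to a driver jar you have already located (the slide copies it from the EDC Catalog Agent installation).

```shell
#!/bin/sh
# Use the exported workspace if present; otherwise a temp dir for a dry run.
WORKSPACE="${SCANNER_WORKSPACE:-$(mktemp -d)}"
# Hypothetical path to the Oracle driver jar; adjust to your environment.
JDBC_JAR="ojdbc7.jar"

# Steps 3 and 4: create the config and driver directories (idempotent).
mkdir -p "$WORKSPACE/etc" "$WORKSPACE/lib/jdbc"

# Step 4: copy the driver if it can be found, otherwise say what to do.
if [ -f "$JDBC_JAR" ]; then
  cp "$JDBC_JAR" "$WORKSPACE/lib/jdbc/"
else
  echo "Driver not found at $JDBC_JAR; copy it to $WORKSPACE/lib/jdbc/ manually" >&2
fi

# Step 5: map the driver class to the jar file name.
echo "oracle.jdbc.driver.OracleDriver=$(basename "$JDBC_JAR")" > "$WORKSPACE/etc/jdbc.properties"
cat "$WORKSPACE/etc/jdbc.properties"
```

Quoting the paths and using mkdir -p keeps the script usable when the workspace path contains spaces or the directories already exist.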
Advanced Scanner – Set Repository Properties
1. Create the “repository.properties” configuration file under the “$SCANNERS_WORKSPACE/etc” directory:
   scanners.repository.enabled=true
   # If you have a SID:
   #scanners.repository.jdbcUrl=jdbc:oracle:thin:@<hostName>:<portNumber>:<sid>
   # If you have an Oracle service name:
   #scanners.repository.jdbcUrl=jdbc:oracle:thin:@//<hostName>:<portNumber>/<serviceName>
   scanners.repository.username=<UserName>
   scanners.repository.password=PBE0}C2Kw8KtA21+LbRPgOBNDrN11aaBp+dqkJr493jTRAoI=
   scanners.repository.schema=<SchemaName>
   scanners.repository.type=ORACLE12c
   scanners.repository.log.enabled=True
   Note: The password can be encrypted using encrypt.sh
2. Create the repository contents. This process creates the contents in the database:
   cd $SCANNERS_HOME
   . ./utils/repositoryUtils.sh -u
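The properties file from step 1 can be generated with a here-document rather than typed by hand. In this sketch the property keys come from the slide, while the host, port, service name, user, and schema are placeholder values invented for illustration; the service-name form of the JDBC URL is chosen arbitrarily, and the password is left as a placeholder to be replaced with the encrypted value.

```shell
#!/bin/sh
# Use the exported workspace if present; otherwise a temp dir for a dry run.
WORKSPACE="${SCANNER_WORKSPACE:-$(mktemp -d)}"
mkdir -p "$WORKSPACE/etc"

# Placeholder connection details (hypothetical); replace before real use.
DB_HOST="dbhost.example.com"
DB_PORT="1521"
DB_SERVICE="ORCLPDB1"
DB_USER="scanner_repo"
DB_SCHEMA="SCANNER_REPO"

# Write the file; keys are the ones listed on the slide.
cat > "$WORKSPACE/etc/repository.properties" <<EOF
scanners.repository.enabled=true
scanners.repository.jdbcUrl=jdbc:oracle:thin:@//${DB_HOST}:${DB_PORT}/${DB_SERVICE}
scanners.repository.username=${DB_USER}
scanners.repository.password=<EncryptedPassword>
scanners.repository.schema=${DB_SCHEMA}
scanners.repository.type=ORACLE12c
scanners.repository.log.enabled=true
EOF

# The file holds credentials, so restrict it to the owning user.
chmod 600 "$WORKSPACE/etc/repository.properties"
cat "$WORKSPACE/etc/repository.properties"
```

For a SID-based connection, the URL line would use the jdbc:oracle:thin:@host:port:sid form shown on the slide instead.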
Advanced Scanner – Start Server
1. Start Advanced Scanner Server using
Advanced Scanner – Log in
http://HostName:port/login (for example, http://IP:8090/login)
Advanced Scanners – Administration
Advanced Scanners – Configuration
Example: Stored procedure parsing
• Detailed lineage available for the stored procedure
• Statement-level information available
• Field level lineage and impact analysis
• Support for dynamically generated SQL
• Replaces the tech preview feature
EDC Advanced Scanner Road Map (Sneak Peek)

Jul’2020
SQL Script & Stored Procedure lineage
• Oracle, MS SQL Server, IBM DB2, Teradata, Netezza
Statistical & BI Tools
• SAS
• Microsoft SSAS
• Microsoft SSRS
ETL Scanners
• Microsoft SSIS
• IBM DataStage
Complex Systems
• SAP BW
• SAP BW4HANA
Mainframe
• Cobol
• JCL

Q1 2021
SQL Script & Stored Procedure lineage
• HANA DB, Greenplum, MySQL, PostgreSQL, Redshift PSQL, Snowflake
Scripts
• Pyspark
• Databricks Notebooks
ETL Scanners
• Microsoft Azure Data Factory
• Oracle Data Integrator (GA)
• SAP BODS
• Talend DI
Statistical & BI Tools
• SAS BI, SAS DI

Q4’2021
ETL Scanners
• Alteryx
• AWS Glue ETL
• Google Cloud Dataflow
• Denodo
• Mulesoft
Mainframe
• VSAM

Metadex as separate application; Metadex as platform service; complete configuration, execution, logging integration
Questions?
`
Thank You