Prasad Gaikwad
8087002445
Big Data Hadoop developer with over three years of experience in using cutting-edge technologies such as Big Data on Cloud, machine learning, and data visualization and discovery to help businesses identify new opportunities and create disruptive business models. Leading and mentoring a team of solution developers in integrating new-age technologies with enterprise ETL and DW appliances. Won the Tata Technologies Spot Award for proactive involvement in performance tuning and automating manual deliverables.
Professional Experience
Big Data Hadoop Developer | Lead
Aug'15 to present Tata Technologies
Currently working as Big Data lead in the digital team for Tata Motors Ltd, a major Indian automotive manufacturer. Helping the business identify value-added insights by designing and implementing cost-effective cloud-based solutions that integrate open source technologies (Spark, Hive, Sqoop, Kafka, Oozie) with enterprise applications such as SAP, CRM and Cordys. Implementing multiple fast-paced PoCs to validate concepts and, where successful, mature them into projects.
Hands-on experience in using Spark, Hive, Sqoop, Kafka, Oozie, Hue, Ambari and Zeppelin for ingesting, cleaning and integrating enterprise-wide application data.
Excellent understanding of Hadoop and Spark internals such as HDFS, MapReduce, YARN, RDDs, DataFrames and Datasets.
Designing solution architectures based on an excellent understanding of various cloud offerings and hands-on experience in provisioning and managing resources such as Amazon EMR, EC2, S3, Redshift, RDS, Kinesis, Lambda, Google Compute Engine, GCS, BigQuery and HDInsight (Azure).
Setting up and using a multi-node EMR cluster, a Cloudera CDH cluster on AWS using Cloudera Director, and an on-premise HDP Hadoop cluster using Ambari.
Active participation in summits, sessions and hands-on workshops on large-scale data processing led by Solution Architects and SMEs from industry leaders such as AWS, Cloudera, Teradata, Microsoft and Google, to evaluate the Big Data appliances and data lake solutions on offer, compare on-premise and on-cloud architectures, and understand how they fit into the current enterprise IT landscape.
Projects-
Profix DataMart – Tata Motors (AWS) - AWS EMR, Spark, Hive, Sqoop, Oozie, SAS
Designed and deployed a DataMart on top of Amazon S3, establishing ODBC connectivity for the SAS modelling team via Hive on Amazon EMR. Automated the daily refresh of data from the Teradata EDW box using a combination of Sqoop, Spark and Oozie to create daily ETL jobs running on an on-demand EMR cluster.
Project Wave – Tata Motors (AWS) - AWS EMR, Spark, Hive, S3, Tableau
Designed and deployed trend analysis dashboards in Tableau to predict the impact of commodity market fluctuations on VC cost, using Hive, Spark, S3, Hue and Zeppelin. Automated provisioning of EMR clusters with spot instances for executing batch ETL workloads.
Vehicle stoppage analysis – Tata Motors (GCP) - Google BigQuery, Python, MS-SQL, Tableau
Designed and deployed a pure-cloud Big Data solution for telemetry data that converts live vehicle tracking into stoppage heat maps. Used clustering algorithms to identify points of interest to the business based on the most frequent stoppages across India.
POCs-
SAP BOM data explosion using Hive, Pig, Spark and Tableau (on premise).
Deployed an on-premise HDP cluster to develop interactive Tableau dashboards displaying component-, vehicle- and plant-wise cost variations. Used Pig, Hive, Spark and Ambari for ingesting BOM data from SAP BW and integrating it with CRM data.
Designing and developing Data Lake strategy
Developing a data lake strategy for ingesting structured, semi-structured and unstructured data generated by various applications in the current enterprise landscape and exposing only relevant data to end users.
Solution Developer | ETL Lead
January 2014 to Jul'15 Tata Technologies
Worked as ETL lead in the Business Intelligence team for Tata Motors Ltd, a major Indian automotive manufacturer. Integrated Siebel CRM, SAP and Cordys Portal data into the Teradata EDW (10 TB+) using Informatica. Delivered end-to-end business solutions, implemented PoCs, tuned performance and provided production support for one of the largest CRM deployments. Technical lead for ETL developers, responsible for deploying CRs in production and for monitoring and troubleshooting execution of the nightly ETL.
Hands-on work experience in-
Providing L2 and L3 support in a production environment serving 5000+ application end users on the reporting engine (OBIEE).
Performance Tuning – cut ETL execution time from 4 hours to 2.5 hours (a reduction of about 37%) by implementing Informatica, DAC and Teradata best practices.
Understanding business requirements for designing end-to-end module deployment.
Building and maintaining complex ETLs for data integration across business applications with 1000+ mappings, 600+ tables and a 10+ TB relational database on Teradata using Informatica PowerCenter and DAC.
Building custom data models to integrate enterprise data with CSV uploads to create a complete picture of the business landscape.
Debugging and resolving ETL failures and data discrepancies in existing models.
Shell scripting and writing cron jobs for executing Teradata scripts and ftp/sftp transfers between application servers.
Technical Proficiency
Big Data Technologies: Hadoop (Cloudera CDH, Hortonworks HDP, AWS EMR), HDFS, Hive, Kafka, MapReduce, Oozie, Pig, Spark, Sqoop, ZooKeeper, Zeppelin
Cloud Vendors: Amazon AWS, Google Cloud, Microsoft Azure
Database: Teradata, MongoDB, Google BigQuery, Oracle, MySQL
Tools: Informatica, DAC, Teradata Utilities
Monitoring and Reporting: Apache Ambari, HUE, Cloudera Manager, TD Viewpoint, OBIEE, Tableau
Programming/Scripting Languages: SQL, Python, Java, Scala
Operating Systems: RHEL, CentOS, Fedora, Windows
Academic Qualifications
B.E. in Information Technology from Walchand Institute of Technology, Solapur with 72%
Diploma in Computer Engineering from Govt. Polytechnic Mumbai with 81.3%
S.S.C. with 88.3%