Prasad Gaikwad

8087002445

[email protected]

Big Data Hadoop developer with over three years of experience using cutting-edge technologies such as Big Data on Cloud, machine learning, and data visualization and discovery to help businesses identify new opportunities and create disruptive business models. Leads and mentors a team of solution developers integrating new-age technologies with enterprise ETL and DW appliances. Won the Tata Technologies Spot award for proactive involvement in performance tuning and automating manual deliverables.

Professional Experience

Big Data Hadoop Developer | Lead

Aug'15 to present, Tata Technologies

Currently working as Big Data lead in the digital team for Tata Motors Ltd, a major Indian automotive manufacturer. Helping the business identify value-added insights by designing and implementing cost-effective cloud-based solutions that integrate open source technologies (Spark, Hive, Sqoop, Kafka, Oozie) with enterprise applications such as SAP, CRM and Cordys. Implementing multiple fast-paced PoCs to validate concepts and maturing the successful ones into projects.

Hands-on experience using Spark, Hive, Sqoop, Kafka, Oozie, Hue, Ambari and Zeppelin for ingesting, cleaning and integrating enterprise-wide application data.

Excellent understanding of Hadoop and Spark internals such as HDFS, MapReduce, YARN, RDDs, DataFrames and Datasets.

Designing solution architectures based on an excellent understanding of various cloud offerings, with hands-on experience provisioning and managing resources such as Amazon EMR, EC2, S3, Redshift, RDS, Kinesis, Lambda, Google Compute Engine, GCS, BigQuery and HDInsight (Azure).

Setting up and using multi-node EMR clusters, a Cloudera CDH cluster on AWS using Cloudera Director, and an on-premise HDP Hadoop cluster using Ambari.

Active participation in summits, sessions and hands-on workshops on large-scale data processing led by solution architects and SMEs from industry leaders such as AWS, Cloudera, Teradata, Microsoft and Google, to evaluate the Big Data appliances and data lake solutions on offer, compare on-premise and on-cloud architectures, and understand how they fit into the current enterprise IT landscape.

Projects-

Profix DataMart – Tata Motors (AWS) - AWS EMR, Spark, Hive, Sqoop, Oozie, SAS

Designed and deployed a datamart on top of Amazon S3, establishing ODBC connectivity for the SAS modelling team via Hive on Amazon EMR. Automated the daily refresh of data from the Teradata EDW box using a combination of Sqoop, Spark and Oozie to create daily ETL jobs running on an on-demand EMR cluster.

Project Wave – Tata Motors (AWS) - AWS EMR, Spark, Hive, S3, Tableau

Designed and deployed trend analysis dashboards in Tableau to predict the impact of commodity market fluctuations on VC cost, using Hive, Spark, S3, Hue and Zeppelin. Automated provisioning of EMR clusters with spot instances for executing batch ETL workloads.

Vehicle stoppage analysis – Tata Motors (GCP) - Google BigQuery, Python, MS-SQL, Tableau

Designed and deployed a pure-cloud Big Data solution for telemetry data that converts live vehicle tracking into stoppage heat maps. Used clustering algorithms to identify points of interest to the business based on the most frequent stoppages across India.

POCs-

SAP BOM data explosion using Hive, Pig, Spark and Tableau (on premise).

Deployed an on-premise HDP cluster to develop interactive Tableau dashboards displaying component-, vehicle- and plant-wise cost variations. Used Pig, Hive, Spark and Ambari for ingesting and integrating BOM data from SAP BW with CRM.

Designing and developing Data Lake strategy

Developing a data lake strategy for ingesting structured, semi-structured and unstructured data generated by various applications in the current enterprise landscape and exposing only relevant data to end users.

Solution Developer | ETL Lead

Jan'14 to Jul'15, Tata Technologies

Worked as ETL lead in the Business Intelligence team for Tata Motors Ltd, a major Indian automotive manufacturer. Integrated Siebel CRM, SAP and Cordys Portal data into the Teradata EDW (10 TB+) using Informatica. Delivered end-to-end business solutions, implemented PoCs, tuned performance and provided production support for one of the largest CRM deployments. As technical lead for the ETL developers, was responsible for deploying CRs in production and for monitoring and troubleshooting the nightly ETL runs.

Hands-on work experience in:

Providing L2 and L3 support on the production environment, serving 5000+ application end users on the reporting engine (OBIEE).

Performance tuning – reduced ETL execution time by nearly 40%, from 4 hours to 2.5 hours, by implementing Informatica, DAC and Teradata best practices.

Understanding business requirements for designing end-to-end module deployments. Building and maintaining complex ETLs for data integration across business applications, with 1000+ mappings, 600+ tables and a 10+ TB relational database on Teradata, using Informatica PowerCenter and DAC.

Building custom data models to integrate enterprise data with CSV uploads to create a complete picture of the business landscape.

Debugging and resolving ETL failures and data discrepancies in existing models. Shell scripting and writing cron jobs for executing Teradata scripts and ftp/sftp transfers between application servers.

Technical Proficiency

Big Data Technologies: Hadoop (Cloudera CDH, HortonWorks HDP, AWS EMR), HDFS, Hive, Kafka, MapReduce, Oozie, Pig, Spark, Sqoop, Zookeeper, Zeppelin

Cloud Vendors: Amazon AWS, Google Cloud, Microsoft Azure

Database: Teradata, MongoDB, Google BigQuery, Oracle, MySQL

Tools: Informatica, DAC, Teradata Utilities

Monitoring and Reporting: Apache Ambari, HUE, Cloudera Manager, TD Viewpoint, OBIEE, Tableau

Programming/Scripting Languages: SQL, Python, Java, Scala

Operating Systems: RHEL, CentOS, Fedora, Windows

Academic Qualifications

B.E. in Information Technology from Walchand Institute of Technology, Solapur with 72%

Diploma in Computer Engineering from Govt. Polytechnic Mumbai with 81.3%

S.S.C. with 88.3%
