The Evolution of Big Data Pipelines at Intuit


The Evolution of Big Data Pipelines at Intuit
June 30, 2016

#hadoopsummit #HS16SJ

Your Speakers

Lokesh Rajaram, Senior Software Engineer, Intuit

Likes photography

Rekha Joshi, Principal Software Engineer, Intuit

Currently likes Chopped

The Plan

Unicellular Amoeba

Multicellular Humans

Cannot Evolve? Disappear…

Gone!

Evolution of Big Data

Our Mission

To improve our customers’ financial lives so profoundly … they can’t imagine going back to the old way!

Consumers · Small Businesses · Accounting Professionals

Who we serve

42M file their own taxes with TurboTax

2.3M run their small businesses with QuickBooks

7M manage their personal finances with Mint

The Numbers Are Growing

65+ Applications, 25% of US GDP

Era of DOS → Era of Windows → Era of Web → Era of the Cloud

Intuit - An Evolution Case Study

Compliant data

Mobile First

1980s · 1990s · 2000s · 2010 · 2016

• Employees: 150 • Customers: 1.3M • Revenue: $33M

• Employees: 4,500 • Customers: 5.6M • Revenue: $1.04B

• Employees: 7,700 • Customers: 37M • Revenue: $4.2B

Regulatory data → Transactional data → Batch data → Real-time data → Complex, secure data

Data Is The Decision Maker

Evolution of Big Data Pipelines – The Need

Secure Cloud Environment

Single Cohesive Data Pipeline

A/B Testing

Personalization

Streaming

Profile Store

Fraud Detection

Support Varied Use Cases

and more..

Evolution of Big Data Pipelines

Thin Slices - Minimum Viable Product

Evolution of Big Data Pipelines – The Recipe

Taking the Data In

Transforming Data

Handling The Indigestion With Scale

Evolution of Big Data Pipelines – The Recipe

No Snowflake Solutions

Getting Vested Stakeholders' Agreement

Establishing The Standards

Evolution of Big Data Pipelines – The Recipe

Breaking The Silos

Moving The Organization In One Direction

Evolution of Big Data Pipelines – The Recipe

● Making The Configuration Knobs Work
● At Scale:
  o Latency
  o Throughput
● Schema, PII, Metadata, Changes, Audit, Governance
● Controlled Access ←→ Innovation
● Error Monitoring
● Cluster Deployment
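The latency/throughput knobs above can be sketched with standard Kafka producer settings. This is a hedged illustration: the parameter names follow the stock Kafka producer config, but the values and the `pick_profile` helper are assumptions, not Intuit's actual tuning.

```python
# Illustrative Kafka producer profiles for the latency vs. throughput
# trade-off. Values are assumptions for the sketch, not production tuning.

LOW_LATENCY = {
    "linger.ms": 0,              # send immediately, no batching delay
    "batch.size": 16384,         # small batches keep per-event latency low
    "acks": "1",                 # leader-only ack shortens the round trip
    "compression.type": "none",  # skip compression CPU on the hot path
}

HIGH_THROUGHPUT = {
    "linger.ms": 50,             # wait to fill larger batches
    "batch.size": 262144,        # big batches amortize per-request overhead
    "acks": "all",               # full ISR ack trades latency for durability
    "compression.type": "snappy",# cheap compression boosts effective throughput
}

def pick_profile(max_latency_ms: int) -> dict:
    """Choose a config profile from a simple latency budget (hypothetical rule)."""
    return LOW_LATENCY if max_latency_ms < 10 else HIGH_THROUGHPUT
```

The point is that the same pipeline code can serve both streaming and batch-ish consumers by turning only these knobs.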

Organization Evolution Data Evolution

SDK

User-entered data

Apache Kafka

Collector: User-entered and clickstream data

Real-time processing

Personalization Engine

Profile Store

Big Data Pipeline Slice View
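The slice above (SDK → Apache Kafka → collector → real-time processing → personalization engine / profile store) can be sketched as a minimal collector step. The field names and routing rule below are hypothetical, chosen only to show the shape of an enrich-then-route stage.

```python
import json
import time

def enrich(raw: bytes) -> dict:
    """Collector step: parse a user-entered/clickstream event and attach
    pipeline metadata. Field names are illustrative, not an actual schema."""
    event = json.loads(raw)
    event["ingest_ts"] = time.time()      # when the collector saw the event
    event["pipeline_stage"] = "collector"
    return event

def route(event: dict) -> str:
    """Fan events out to downstream consumers by type (hypothetical rule)."""
    if event.get("type") == "profile_update":
        return "profile-store"
    return "personalization"
```

In the real pipeline both functions would sit behind Kafka consumers; here they are plain functions so the data flow is visible.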

Big Data Pipeline Components

Monitoring The Pipeline

AWS resource alarms

Custom App Metrics

JVM and App Metrics

Custom process alerts

Logging and alerting
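A custom process alert of the kind listed above might be a simple consumer-lag check. This is a sketch under assumptions: the threshold and the dict shape are illustrative, and real offsets would come from the broker rather than function arguments.

```python
def check_lag(latest_offset: int, committed_offset: int,
              threshold: int = 10_000) -> dict:
    """Custom process alert (sketch): fire when consumer lag — how far a
    consumer trails the newest message — exceeds a threshold."""
    lag = latest_offset - committed_offset
    return {"lag": lag, "alert": lag > threshold}
```

Hooked up to a scheduler, the `alert` flag would page on-call or feed the logging/alerting layer above.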

Evolution In Stages

Evolution - Stage 0: Disparate And Chaotic

Disparate Databases

Data Pipeline (an example)

• Collect event stream data into one location

• Handle ~200K events/sec

• Payload ~3-5KB

• Enrich messages and load them into Hive within a defined SLA
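A quick capacity check for the numbers above: 200K events/sec at 3-5KB per event implies roughly 0.6-1 GB/s of raw ingest, which is what makes the scale bullets non-trivial. The arithmetic:

```python
# Back-of-the-envelope ingest bandwidth for the pipeline example above.
EVENTS_PER_SEC = 200_000  # ~200K events/sec, from the slide

def ingest_rate_mb_per_sec(events_per_sec: int, payload_kb: float) -> float:
    """Raw ingest bandwidth in MB/s for a given event rate and payload size."""
    return events_per_sec * payload_kb / 1024

low = ingest_rate_mb_per_sec(EVENTS_PER_SEC, 3)   # ~586 MB/s at 3KB payloads
high = ingest_rate_mb_per_sec(EVENTS_PER_SEC, 5)  # ~977 MB/s at 5KB payloads
```

Any broker, network, or Hive-load sizing has to start from this envelope.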

Evolution - Stage 1

Event Stream

Oozie

Sqoop

Netezza Loader

Hive QL operations

Storm

Samza

Flume

Evolution - Stage 2

Event Stream

{ ReST }

Evolution - Stage 3 (HA & DR)

SDK

{ ReST }

SDK

{ ReST }

Mirroring

Challenges & Opportunities

Set of Changes

• Network upgrades

• Increase pipe

• Broker

• MirrorMaker

• Host TCP

Evolution - Stage 4 (Streaming + Batch)

SDK

{ ReST }

SDK

{ ReST }

Mirroring

Evolution - Stage 5 (Cloud only)

SDK

{ ReST }

Kafka Connectors

Evolution - Stage 5 (Cloud only - Future state)

Pipeline Essentials

SDK { ReST }

SDK { ReST }

Traffic Rate Monitoring

Trust by Verification

• Test all Observable End-points
• Functional
• Data Loss
• Data Parity

• Measure for SLA
• Baseline Tests
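The data-loss and data-parity checks above can be sketched as a count comparison between source and sink. The loss threshold here is illustrative; a real check would also compare checksums or sampled records, not just counts.

```python
def parity_check(source_count: int, sink_count: int,
                 max_loss_pct: float = 0.01) -> dict:
    """Trust-by-verification sketch: compare record counts at the pipeline's
    source and sink and flag loss above an (assumed) tolerance."""
    lost = source_count - sink_count
    loss_pct = 100.0 * lost / source_count if source_count else 0.0
    return {"lost": lost, "loss_pct": loss_pct, "ok": loss_pct <= max_loss_pct}
```

Run after each batch window (or continuously on streaming counters), this is the simplest observable end-point test that catches silent drops.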

Interested in Joining?

goo.gl/BLPfyR

Thank You!