Put your data to work with Big Data services from AWS and Informatica

SendGrid Improves Email Delivery with Hybrid Data Warehousing

Page 1: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Put your data to work with Big Data services from AWS and Informatica

Page 2: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data is growing

1.7MB of new data will be created every second for every human being on the planet by 2020

http://www.whizpr.be/upload/medialab/21/company/Media_Presentation_2012_DigiUniverseFINAL1.pdf

58% compound annual growth rate, surpassing $1 billion by 2020, forecasted for the Hadoop market

http://www.ap-institute.com/big-data-articles/big-data-what-is-hadoop-%E2%80%93-an-explanation-for-absolutely-anyone.aspx

http://www.marketanalysis.com/?p=279

<0.5% of all data is ever analyzed and used at the moment

http://www.technologyreview.com/news/514346/the-data-made-me-do-it/

Page 3: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Big Data is for everyone

The market for Big Data technologies is growing more than six times faster than the information technology market as a whole…

…and those companies who use their data well win.

Page 4: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Why AWS for Big Data?

Immediately Available

Broad and Deep Capabilities

Trusted and Secure

Scalable

Page 5: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Collect, Store, Analyze, and Visualize

It’s easy to get data to AWS, store it securely, and analyze it with the engine of your choice, without any long-term commitment or vendor lock-in

Collect

Import/Export

Snowball

Direct Connect

VM Import/Export

Store

Amazon S3

EMR

Amazon Glacier

Amazon Redshift

DynamoDB

Analyze

Amazon Kinesis

Lambda

EMR

EC2

Aurora

Page 6: SendGrid Improves Email Delivery with Hybrid Data Warehousing

AWS provides the most complete platform for Big Data

What can you do with Big Data on AWS?

Big Data Repositories

Clickstream Analysis

ETL Offload

Machine Learning

Online Ad Serving

BI Applications

Page 7: SendGrid Improves Email Delivery with Hybrid Data Warehousing

The Amazon Redshift view of data warehousing

10x cheaper

Easy to provision

Higher DBA productivity

10x faster

No programming

Easily leverage BI tools, Hadoop, Machine Learning, Streaming

Analysis in-line with process flows

Pay as you go, grow as you need

Managed availability & DR

Enterprise Big Data SaaS

Page 8: SendGrid Improves Email Delivery with Hybrid Data Warehousing

The cloud can be made more secure than on-premises

High speed redundant direct connect lines

Load billions of rows in minutes

All data in private VPC

All data encrypted with private on-premises hardware keys

Encryption of data, transport, backups, partial spills

Audit of all SQL actions

Audit of all configuration changes

Page 9: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data warehouses can support real-time data

Big data does not mean batch

Can be streamed in

Can be processed in near real time

Can be used to respond quickly to requests

You can mix and match on-premises and cloud

Custom development and managed services

Infrastructure with managed scaling, security

Page 10: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hybrid Cloud Data Management with AWS and Informatica

Presented by Andrew McIntyre

Page 11: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Agenda

The IT landscape and how it is changing

IT challenges with hybrid cloud architecture

Customer success story with SendGrid

How Informatica can help customers migrate to a hybrid architecture

Why choose Informatica?

Page 12: SendGrid Improves Email Delivery with Hybrid Data Warehousing

IT Landscape is Changing…

Page 13: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Why Enterprises are Adopting Cloud Architecture

Business agility requires IT agility

Cloud economics pay off in a big way

Focus on core competencies & unique value

Page 14: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hybrid Cloud is a Common Approach

On-Premises: ERP & On-Premises Apps, Traditional Relational Databases, Traditional Data Warehouse

+

Cloud: Amazon Redshift

Page 15: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Defining Hybrid Cloud Data Management

Integrate, Cleanse, Govern, Master, Secure

Integrating data from:

On-premises databases, data warehouses, apps with SaaS applications

With Public cloud: AWS

Page 16: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data Management Challenges in Hybrid Cloud Architecture

Connectivity

Many data systems: cloud & on-prem

Reuse work across systems

Secure connection

Data Visibility

Complex data flows, less comprehension

Quality, governance, security, regulation, audits, mastering

Scalability

Support large data volume

Match infinite capacity in cloud platform

Operational Control

Monitor & manage data in production

Ensure operational success

Monitor end-to-end business process

Page 17: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica + AWS Use Cases

Lift and Shift: Moving on-premises databases, systems and/or DW to AWS-based workloads

Hybrid App Integration: Integrate on-premises and cloud apps with Informatica Cloud. Also known as iPaaS (integration Platform-as-a-Service)

Hybrid Data Warehousing: Load multiple data sources from cloud and/or on-premises systems to AWS using Informatica Cloud

Page 18: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Lift and Shift your Workloads

Cloud

On premise

Use Case Summary: Moving on-premises databases, systems and/or data warehouse to AWS-based workloads

Amazon Redshift

On-premises Data Warehouse

Other Databases

Your Data Integration Platform

Firewall

Amazon RDS

Amazon Aurora

Page 19: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hybrid App Integration

Use Case Summary: Integrate on-premises and cloud apps with Informatica Cloud. Also known as iPaaS (integration Platform-as-a-Service)

Cloud

On premise

Data Warehouse

on-premises Apps

Firewall

Amazon RDS Amazon Redshift

Your Data Integration Platform

on-premises Data Warehouse

Other Databases

Page 20: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hybrid Data Warehousing

Use Case Summary: Load multiple data sources from cloud and/or on-premises systems to AWS using Informatica Cloud

On-premises Data Warehouse

Your Data Integration Platform

ERP, on-premises Apps

Traditional Relational Databases

Social Media

Logs

IoT

Analytics Tools

Cloud

On premise

Firewall

Amazon RDS Amazon Redshift

Page 21: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica Cloud for Amazon Web Services

Amazon DynamoDB

Amazon EMR

Amazon S3

Amazon Redshift

Amazon Aurora

Amazon RDS

Informatica Cloud provides native connectivity to Amazon Web Services for scalable, high-performance integration with any cloud and on-premises data source.

Page 22: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica Cloud and Amazon Redshift

Seamless integration with any data system on cloud and on-prem

Native, high performance data integration and synchronization

The only solution to provide “Upsert” functionality

Step by step integration wizards for non-technical users

Advanced point and click integration workflows for technical users

Page 23: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hybrid Data Warehousing

An Informatica Case Study

Page 24: SendGrid Improves Email Delivery with Hybrid Data Warehousing

SendGrid: Company Background

Founded in 2009 after graduating from the TechStars program, SendGrid developed an industry-disrupting, cloud-based email service to solve the challenges of reliably delivering emails on behalf of growing companies. Like many great solutions, SendGrid was born from the frustration of three engineers whose application emails didn’t get delivered, so they built an app for email deliverability. Today, SendGrid’s reliable email platform delivers over 25 billion transactional and marketing emails each month on behalf of many of your favorite brands, including Uber, Airbnb, Spotify, Foursquare and NextDoor.

Page 25: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Business and Technical Requirements

Architecture Issues

Prior to my joining the company, SendGrid had already committed to using MySQL for a new data warehouse build. The SendGrid data warehouse architecture that was underway did not follow a formal data warehousing methodology. It was built specifically to support the BI tool and its features and limitations. This resulted in an architecture that did not follow many of the industry-standard data warehousing best practices.

The emphasis for the architecture was speed over accuracy, sustainability, and growth.

As a result, the architecture was already hitting the limitations of its design.

Page 26: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Business and Technical Requirements

Business Needs for Data & Analytics

Our small team is responsible for the strategic direction, design, delivery and availability of business data for corporate-wide utilization in measuring performance, business outcomes and decision making capabilities. Data and analytics need to be provided in various ways and formats through effective and efficient delivery methods.

To accomplish this, the team was tasked with building a new data warehouse. We planned to start on our main data source, which houses our email event and customer information.

Meet the Team:

Director, Enterprise Data Operations

Data Warehouse Architect / ETL Developer

BI Developer

Business Systems Analyst

Page 27: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Technical Requirements

Data Warehouse Assessment Needed

Evaluate the overall data warehouse architecture and suggest required changes and improvements to:

Database technology, design and work products

ETL tool, design and work products

Technical Requirements

Database Technology:

Nimble

Cost effective

Meets storage and capacity needs

Allows the team to be self-sufficient without reliance on other teams’ skill sets

ETL Tool:

Mature ETL tool to leverage for data warehousing

Page 28: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data Warehouse Assessment

Database Technology Options

Page 29: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data Warehouse Assessment

The Findings: Overview

Confirmed assumption that utilizing MySQL was not sustainable as a database technology

Switch to a technology that better aligns with a data warehouse infrastructure: Amazon Redshift selected

Mature ETL tool is needed for data warehousing while providing a user-friendly tool for business communities

Informatica selected to load data into Amazon Redshift from multiple data sources, cloud, and on-prem while supporting citizen integrators.

Page 30: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Data Warehouse and Analytics Conceptual Architecture

[Diagram. Labels recovered from the slide:

Data Sources (cloud or on-premises): Marketo, SalesForce, Zuora, Mail db, Segment*, Jira, Zendesk, Hadoop Cluster

ETL: Informatica

Enterprise Data Warehouse (Amazon Redshift) layers: Acquisition Layer, Core Layer, Publishing Layer

Data stages: Raw Data, Mapping Schemas, Clean Data/Metadata, Dimensional Data

Subject areas: Time, Revenue, Customer, SalesForce, Product Volume Usage, Product Usage, Test and Learn Campaigns

Analytics Tools: Reporting/Analytics, Dashboards, Export Data, Data Mining, Benchmark Data]

Page 31: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Project Outlook

The project is in the early stages of the data warehouse build. We have set up our Amazon Redshift instance for the data warehouse and have started sourcing data from six sources, a mix of both cloud and on-premises.

We are actively using the Informatica data integration portfolio in a hybrid architecture to support ETL integration.

By the end of 2016, we will have enough data from multiple data sources in the Amazon Redshift data warehouse and our BI tool, allowing us to roll out self-service analytics with a foundational view of customer, product, revenue, and email volume and usage data.

We are confident that with this approach we have set ourselves up for success in a nimble, scalable, cost-effective manner to rapidly enable business-driven insights for SendGrid!

Page 32: SendGrid Improves Email Delivery with Hybrid Data Warehousing

How Informatica Can Help

Page 33: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica Addresses Data Management Challenges

Connectivity

High-performance, out-of-the-box native connectors to any data system

Abstraction layer enables reuse

Secure

Data Visibility

Metadata-driven visual design: visibility into data flows across cloud and on-prem

Metadata: the foundation of quality, governance, security, mastering

Scalability

Inherently designed for performance at scale

iPaaS offers infinite integration capacity and bursting

Operational Control

Single point of control for production data across cloud and on-prem

Admin can monitor production data flows and flag issues early

Page 34: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Hundreds of Connectors For Every Type of Data Source

Sales & Service

Big Data

Human Resources

Web Protocols & API

ERP & Financials

B2B

Marketing

Social

IT & Admin

Analytics

Page 35: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica’s 3 Key Differentiators

1. CONNECT: Unlock your data

Hundreds of out-of-box connectors for cloud and on-prem data sources

2. DEVELOP: UI maximizes productivity for developers & citizen integrators

Visual data mapping

Out-of-box templates & wizards

Easy to use & highly reusable

3. DEPLOY: Scale with Performance

Optimized to process the largest data volumes

Pushdown Optimization

Automated

Page 36: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica Product Portfolio for Hybrid Cloud Management

Cloud Test Data Management

Cloud Application Integration

Cloud Data Integration

Data as a Service

Cloud Customer 360

Page 37: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Amazon Redshift Upsert – Manual Coding Method

1. Extract the data from source

2. Put into flat files and compress

3. Transfer Compressed Files To S3

4. Wait for S3 Consistency

5. Create Staging Table in Redshift

6. Copy Data From S3 Into Staging Table

7. Inner Join With Target Table To Delete Rows To Be Updated

8. Insert Updated Rows From Staging Table

9. Delete Staging Table

10. Delete Files From S3
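The database side of the manual method (steps 5 through 9) can be sketched as a short sequence of SQL statements. The Python below only builds those statements as strings; it does not connect to Redshift. The table names, key column, S3 URI, IAM role, and the pipe-delimited gzip file format are all hypothetical placeholders, and wrapping the delete and insert in a transaction is an assumption of this sketch, not something the slide specifies.

```python
from typing import List


def build_upsert_statements(
    target: str,
    staging: str,
    key: str,
    s3_uri: str,
    iam_role: str,
) -> List[str]:
    """Return SQL for steps 5-9 of the manual upsert, in execution order.

    Identifiers are interpolated directly, so this sketch assumes they
    come from trusted configuration, never from user input.
    """
    return [
        # Step 5: staging table with the same columns as the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # Step 6: bulk-load the compressed flat files from S3.
        f"COPY {staging} FROM '{s3_uri}' IAM_ROLE '{iam_role}' "
        f"GZIP DELIMITER '|';",
        # Assumption: run the delete and insert atomically.
        "BEGIN;",
        # Step 7: delete target rows that have a match in staging.
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        # Step 8: insert new and updated rows from staging.
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
        # Step 9: remove the staging table.
        f"DROP TABLE {staging};",
    ]


if __name__ == "__main__":
    for statement in build_upsert_statements(
        target="events",
        staging="events_stage",
        key="event_id",
        s3_uri="s3://example-bucket/events/",
        iam_role="arn:aws:iam::123456789012:role/ExampleRole",
    ):
        print(statement)
```

Extracting the source data, uploading to S3, and cleaning up the S3 files (steps 1 through 4 and step 10) are outside this sketch.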

Or, Do It In 3 Simple Steps…

Page 38: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Amazon Redshift Upsert – Informatica Cloud Method

1. Choose Upsert Operation

2. Map Your Fields

3. Run Or Schedule!

Page 39: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Informatica Cloud Amazon Redshift Architecture

Informatica Cloud Secure Agent

Metadata Mappings

Firewall

1. Build mapping and execute job

2. Retrieve Account Data

3. Put Account Data into Flat File(s)

4. Transfer compressed Flat File(s) to S3

5. Initiate copy from S3

6. Load data into Amazon Redshift
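The flat-file staging in this flow (putting account data into flat files and compressing them before the transfer to S3) can be illustrated with the Python standard library. This is a minimal sketch of the general technique, not Informatica's implementation: the function name, the pipe delimiter, and the sample rows are assumptions, and the upload to S3 is left out.

```python
import csv
import gzip
import io
from typing import Iterable, Sequence


def rows_to_gzip_flat_file(
    rows: Iterable[Sequence[object]], delimiter: str = "|"
) -> bytes:
    """Serialize rows as delimited text and gzip-compress the result."""
    text = io.StringIO()
    # The csv module quotes any field that contains the delimiter.
    writer = csv.writer(text, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


# Hypothetical account rows: (account_id, account_name).
payload = rows_to_gzip_flat_file([(1, "Acme"), (2, "Globex")])

# Decompressing shows the flat file that a bulk load would read.
print(gzip.decompress(payload).decode("utf-8"))
```

Compressing before the transfer reduces the bytes sent across the firewall; the delimiter chosen here has to match whatever the load step is configured to expect.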

Page 40: SendGrid Improves Email Delivery with Hybrid Data Warehousing

4,500 iPaaS customers

70+ OEMs with over 1,000 customers

300B Transactions per month, 130% growth YoY

1M< Integration jobs / processes per day

Page 41: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Next Steps…

Page 42: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Additional Resources

Page 43: SendGrid Improves Email Delivery with Hybrid Data Warehousing

Getting Started – Amazon Web Services

www.informatica.com/products/cloud-integration/connectivity/amazon-connectors.html

4 hour Trial of Specific Use Cases

60 Day Trial of All Functionality

www.informatica.com/products/cloud-integration/connectivity/amazon-connectors/amazon-test-drive.html

Informatica.com Amazon Marketplace

Page 44: SendGrid Improves Email Delivery with Hybrid Data Warehousing