Big Data in the Cloud with Informatica Cloud and Amazon Redshift

Published on 11-May-2015

DESCRIPTION

Data warehousing costs have been rising continually with the explosion of Big Data. To help you explore the most cost-effective data warehousing techniques, learn from cloud experts at Amazon and Informatica. Learn more: http://www.informaticacloud.com/amazon-redshift

Amazon Redshift is a petabyte-scale, cloud-based data warehouse that lets you provision multiple database nodes on demand and offload raw data from on-premises databases for more cost-effective data warehousing. Getting this data into Redshift is easy with Informatica Cloud.

In this interactive webinar, you'll learn:
- How Amazon Redshift is changing the economics of data warehousing
- Why Big Data integration and management is a strategic imperative within enterprises
- How cloud integration makes cloud data warehousing even more cost-effective

At Informatica, our goal is to unlock your information potential. Join us and our featured guest speakers from Amazon for this interactive webinar.

Transcript

Cloud and Amazon Redshift
Rahul Pathak, Amazon Redshift Product Management
Nicolas Brisoux, Informatica Cloud Platform Adoption
Darren Cunningham, Informatica Cloud Marketing
@infacloud #redshift

Today's Agenda
- Informatica and Amazon Strategic Partnership
- Amazon Redshift Overview
- Informatica Cloud Redshift Connector
- Demonstration
- Discussion
- Next Steps

Informatica: The Information Management Leader
- B2B Data Exchange: "Informatica supports the requirements of cross-organizational data exchange, so users apply familiar and trusted data integration tools and techniques to the growing practice of B2B data integration."
- Cloud Data Integration
- Enterprise Data Integration
- Complex Event Processing: "Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day standup of RulePoint."
- Ultra Messaging: "In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market."
- Data Quality
- Master Data Management
- Application ILM

Informatica Cloud: our fastest growing product line
Today's focus: Cloud Data Integration

Informatica Cloud and Amazon Redshift: Enabling cost-effective data warehousing
- Redshift Connector pre-release announced in February
- General availability this month (August)
InformaticaCloud.com/Amazon-Redshift

Rahul Pathak | rapathak@amazon.com | @rahulpathak
Senior Product Manager, Amazon Redshift

AWS Database Services
- Amazon RDS: fully managed SQL database service for OLTP workloads
- Amazon DynamoDB: fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
- Amazon Redshift: fully managed, fast and powerful, petabyte-scale data warehouse service
- Amazon ElastiCache: fully managed, Memcached-compliant in-memory caching service

We set out to build...
A fast and powerful, petabyte-scale data warehouse that is:
- A lot faster
- A lot cheaper
- A lot simpler
Amazon Redshift

Data warehousing done the AWS way
- Pay as you go, no upfront costs
- Fast, cheap, easy-to-use SQL
- Easy to provision

Common Customer Use Cases
Traditional Enterprise DW:
- Reduce costs by extending DW rather than adding HW
- Migrate completely from existing DW systems
- Respond faster to business; provision in minutes
Companies with Big Data:
- Improve performance by an order of magnitude
- Make more data available for analysis
- Access business data via standard reporting tools
SaaS Companies:
- Add analytic functionality to applications
- Scale DW capacity as demand grows
- Reduce HW & SW costs by an order of magnitude

Progress Since Launch on Feb 14, 2013
- Fastest growing service in AWS history
- Well over 1,000 customers; adding over 100 per week
- Obtained SOC 1 and SOC 2 certification, with more in progress
- Deployed in US East (N. Virginia), US West (Oregon), EU (Ireland) and Asia Pacific (Tokyo)
- Additional global regions coming soon

Amazon Redshift Customers
- 5x to 20x reduction in query times; 4x cost reduction over Hive
- 20x to 40x reduction in query times
- Nokia: 50% reduction in costs, 2x improvement in query times

Amazon Redshift Customer: bit.ly
"When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes if not seconds."
- Sean O'Connor, Engineer at bit.ly
Bit.ly provides social link sharing analytics, managing over 300 million shortens and 5 billion clicks each month.

Amazon Redshift Customer: HasOffers
"Amazon Redshift introduces a major opportunity to improve the performance of our real-time reporting, allowing us to run queries up to 50 times faster than our current OLAP solution."
- Niek Sanders, VP of Engineering, HasOffers
HasOffers records and reports billions of desktop and mobile interactions for performance marketers.

Amazon Redshift Customer: Infor
"This is the formula for fast and broad adoption, where customers can get consistent, accurate, and useful data fast, in weeks, not months or years."
- Ali Shadman, SVP, Business Cloud & Upgrades, Infor
Infor is the world's third largest ERP vendor, serving over 70,000 customers in 194 countries.

Amazon Redshift dramatically reduces I/O
- Column storage
- Data compression
- Zone maps
- Direct-attached storage
- Large data block sizes

Column storage: with row storage you do unnecessary I/O. To get the total amount, you have to read everything:

 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375

With column storage, you only read the data you need: the same table is stored column by column, so computing the total reads only the Amount column.

Data compression: columnar compression saves space and reduces I/O. Amazon Redshift analyzes and compresses your data:

analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw

Zone maps:
- Track the minimum and maximum value for each block
- Skip over blocks that don't contain the data needed for a given query
- Minimize unnecessary I/O

Direct-attached storage and large data block sizes:
- Use direct-attached storage to maximize throughput
- Hardware optimized for high-performance data processing
- Large block sizes to make the most of each read
- Amazon Redshift manages durability for you
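
The column-storage and zone-map ideas above can be illustrated with a toy model. This is a minimal sketch under our own assumptions, not how Redshift is implemented: I/O is counted in values rather than bytes, the block size is two values so the effect is visible, and every function name here is hypothetical. Zone maps pay off most when a column is sorted, so the usage note sorts Amount before building blocks.

```python
from dataclasses import dataclass

# The four-row sample table from the slides: (ID, Age, State, Amount)
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("ID", "Age", "State", "Amount")


def total_amount_row_store(rows):
    """Row storage: every field of every row is read just to sum Amount."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[3]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column (hypothetical layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_amount_column_store(columns):
    """Column storage: only the Amount column is read."""
    amounts = columns["Amount"]
    return sum(amounts), len(amounts)


@dataclass
class Block:
    values: list
    min_value: int  # zone map entry: smallest value in the block
    max_value: int  # zone map entry: largest value in the block


def build_blocks(values, block_size):
    """Split a column into blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def sum_where_greater(blocks, threshold):
    """Sum values > threshold, skipping blocks whose max rules them out."""
    total = 0
    blocks_read = 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # no value in this block can match, so skip the read
        blocks_read += 1
        total += sum(v for v in block.values if v > threshold)
    return total, blocks_read
```

On the sample table, summing Amount touches 16 values in the row layout and 4 in the column layout (both totals are 1,250). With `build_blocks(sorted(to_column_store(ROWS)["Amount"]), 2)`, a query for amounts above 300 reads only one of the two blocks and returns 875.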

Amazon Redshift architecture
Leader node:
- SQL endpoint
- Stores metadata
- Coordinates query execution
Compute nodes:
- Local, columnar storage
- Execute queries in parallel
- Load, backup, restore via Amazon S3
- Parallel load from Amazon DynamoDB
Single node version available
[Diagram: JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]

Amazon Redshift runs on optimized hardware
- HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
- HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed customer storage
- Optimized for I/O-intensive workloads
- High disk density
- Runs in HPC: fast network
- HS1.8XL available on Amazon EC2

Amazon Redshift lets you start small and grow big
- Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
  - Single node (2 TB)
  - Cluster of 2-32 nodes (4 TB to 64 TB)
- Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
  - Cluster of 2-100 nodes (32 TB to 1.6 PB)
Note: Nodes not to scale

Amazon Redshift is priced to let you analyze all your data
- Simple pricing: number of nodes x cost per hour
- No charge for the leader node
- No upfront costs
- Pay as you go

 Pricing option     | Price per hour, HS1.XL single node | Effective hourly price per TB | Effective annual price per TB
--------------------+------------------------------------+-------------------------------+------------------------------
 On-Demand          | $0.850                             | $0.425                        | $3,723
 1 Year Reservation | $0.500                             | $0.250                        | $2,190
 3 Year Reservation | $0.228                             | $0.114                        | $999

Amazon Redshift is easy to use
- Provision in minutes
- Monitor query performance
- Point-and-click resize
- Built-in security
- Automatic backups
Slides not intended for redistribution.

Amazon Redshift has security built-in
- SSL to secure data in transit
- Encryption to secure data at rest: AES-256, hardware accelerated
- All blocks on disks and in Amazon S3 encrypted
- No direct access to compute nodes
- Amazon VPC support
[Diagram: Customer VPC, Internal Security Group, JDBC/ODBC, 10 GigE (HPC), Ingestion, Backup, Restore]
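
The "effective price per TB" columns on the pricing slide follow from two numbers: the hourly node price and the node's capacity. The sketch below reproduces them under our own assumptions: an HS1.XL node holds 2 TB (as the hardware slide states) and a year is 8,760 hours (365 x 24). The constant and function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; assumption: no leap-year adjustment
TB_PER_HS1_XL_NODE = 2     # HS1.XL compressed storage per node, from the slides

# Hourly price for one HS1.XL node, as listed on the pricing slide.
HOURLY_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_hourly_per_tb(node_hourly_price, tb_per_node=TB_PER_HS1_XL_NODE):
    """Spread one node's hourly price across the terabytes it stores."""
    return node_hourly_price / tb_per_node


def effective_annual_per_tb(node_hourly_price, tb_per_node=TB_PER_HS1_XL_NODE):
    """Per-TB hourly price run for a full year."""
    return effective_hourly_per_tb(node_hourly_price, tb_per_node) * HOURS_PER_YEAR


def cluster_hourly_cost(num_compute_nodes, node_hourly_price):
    """Number of nodes x cost per hour; the leader node is not charged."""
    return num_compute_nodes * node_hourly_price


if __name__ == "__main__":
    for plan, price in HOURLY_PRICE.items():
        print(f"{plan:20s} ${effective_hourly_per_tb(price):.3f}/TB-hour  "
              f"${effective_annual_per_tb(price):,.0f}/TB-year")
```

Running it prints $3,723, $2,190 and $999 per TB-year for the three options, matching the slide; the 3-year figure is 998.64 before rounding.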

Amazon Redshift continuously backs up your data and recovers from failures
- Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
- Backups to Amazon S3 are continuous, automatic, and incremental
- Designed for eleven nines of durability
- Continuous monitoring and automated recovery from failures of drives and nodes
- Able to restore snapshots to any Availability Zone within a region

Amazon Redshift works with your existing analysis tools
[Diagram: analysis tools connecting to Amazon Redshift via JDBC/ODBC; more coming soon]

Amazon Redshift integrates with multiple data sources
- Amazon Elastic MapReduce
- Amazon DynamoDB
- Amazon Elastic Compute Cloud (EC2)
- AWS Storage Gateway Service
- Amazon Simple Storage Service (S3)
- Corporate Data Center
- Amazon Relational Database Service (RDS)

Informatica Cloud Architecture Overview
[Diagram, steps 1-4: Secure Agent, Your Company, Marketplace, Amazon Redshift]

Map Once. Deploy Anywhere.
- On-premises
- Hadoop
- Third-party applications
- Cloud

Cloud Amazon Redshift Connector Demo
Nicolas Brisoux, Cloud Platform Adoption

Best practices to remember
- The Amazon S3 bucket that holds the data files must be created in the same region as your cluster
- Files are deleted from the Amazon S3 bucket when the upload is complete
- Choose a batch size where the number of batches matches the number of slices in your cluster
  - Each XL node has 2 slices; each 8XL node has 16
  - If you have a 2-node XL cluster and 40,000 rows of data, choose a batch size of 10,000 (2 nodes x 2 slices = 4 slices; 40,000 rows / 4 slices = 10,000 rows per batch)
- This way, the Informatica Cloud Redshift connector can take full advantage of Amazon Redshift's parallel processing capabilities
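
The batch-size rule of thumb above is simple arithmetic, sketched here in Python. The slice counts (2 per XL node, 16 per 8XL node) come from the slide; the function names are hypothetical and are not part of the Informatica Cloud connector. Rounding up guarantees no rows are left out and never produces more batches than slices.

```python
import math

# Slices per node type, as given on the best-practices slide.
SLICES_PER_NODE = {"XL": 2, "8XL": 16}


def cluster_slices(node_type, num_nodes):
    """Total slices in a cluster of identical nodes."""
    return SLICES_PER_NODE[node_type] * num_nodes


def batch_size_for(total_rows, node_type, num_nodes):
    """Rows per batch so the batch count matches the slice count.

    Rounds up, so the number of batches is at most the number of slices.
    """
    slices = cluster_slices(node_type, num_nodes)
    return math.ceil(total_rows / slices)


def split_into_batches(rows, batch_size):
    """Chunk rows into consecutive batches of batch_size (last may be short)."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


if __name__ == "__main__":
    size = batch_size_for(40_000, "XL", 2)
    print(size)  # 10000: 2 nodes x 2 slices = 4 slices
    batches = split_into_batches(list(range(40_000)), size)
    print(len(batches))  # 4, one batch per slice
```

When the row count does not divide evenly, for example 10 rows on 4 slices, the batch size rounds up to 3 and the last batch is smaller.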

Informatica Cloud Amazon Redshift demonstration
[Diagram: Firewall; Informatica Cloud (Metadata, Mappings); Secure Agent]
1. Authenticate and retrieve the Data Synchronization Task
2. Retrieve Account data
3. Perform a lookup on SLA level
4. Put Account data and SLA level into a flat file
5. Transfer the compressed flat file
6. Initiate the load from Amazon S3
7. Load data into Amazon Redshift

PowerCenter Mappings and Informatica Cloud
If you want to reuse your existing PowerCenter mappings with Informatica Cloud and Redshift, you have 2 options:
1. Use the PowerCenter Repository Manager to export your existing workflows and import them into Informatica Cloud using the PowerCenter Tasks feature.
2. Keep your existing mappings in PowerCenter and stage the data. Create a DSS task in Informatica Cloud to move the data from the staging area to Redshift. This task can be managed from PowerCenter.

Why Informatica Cloud Integration for Redshift?
1. Map Once, Deploy Anywhere
2. Rapid Connectivity & Deployment
3. Advanced Integration Delivered Easily
4. Excellence in batch and real-time integration
InformaticaCloud.com

Next Steps
- Get started with Amazon Redshift
- Get started with Informatica Cloud: InformaticaCloud.com
- Learn more about our Redshift Connector: InformaticaCloud.com/Amazon-Redshift

Discussion
Rahul Pathak, Amazon Redshift Product Management
Nicolas Brisoux, Informatica Cloud Platform Adoption
Darren Cunningham, Informatica Cloud Marketing
@infacloud #redshift
InformaticaCloud.com
