© 2012 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. AWS Meister Series: A Review of the Big Data Services + AWS Data Pipeline. 2014.3.19. Solutions Architects 今井 雄太 and 蒋 逸峰. re:Generate

A Review of the Big Data Services & AWS Data Pipeline

DESCRIPTION

AWS Meister Series re:Generate: A Review of the Big Data Services & AWS Data Pipeline

Citation preview

  • AWS Meister Series: A Review of the Big Data Services + AWS Data Pipeline. 2014.3.19. re:Generate
  • Agenda: AWS big data services; AWS Data Pipeline
  • AWS big data services
  • AWS big data services by category: Storage (S3); Cold Storage (Glacier); NoSQL (DynamoDB); Data Warehouse (Redshift); Relational Database (RDS); Hadoop (EMR); Stream Computing (Kinesis); Workflow Management (Data Pipeline)
  • Amazon Simple Storage Service (S3): 99.999999999% durability; accessed over HTTP. Amazon Glacier (Glacier): works with S3; about 1/8 the cost of S3; retrieval takes about 4 hours
  • Amazon DynamoDB (DynamoDB): NoSQL as a Service. Amazon Redshift (Redshift): data warehouse. Amazon RDS (RDS): relational database; PostgreSQL, MySQL, Oracle, SQL Server
  • Amazon Elastic MapReduce (EMR): Hadoop (MapReduce); HDFS and S3. Amazon Kinesis (Kinesis): stream computing
  • AWS Data Pipeline (Data Pipeline): ETL / workflow management
  • Diagram: Amazon S3 together with Elastic MapReduce, Redshift, EC2, RDS, Storage Gateway, EBS, CloudFront, Elastic Transcoder, Glacier, and Data Pipeline
  • S3 with EMR, DynamoDB, and Redshift
  • S3 and EMR: Hadoop
  • S3 and EMR: S3, HDFS
  • S3 and EMR: S3
  • S3 and Redshift: loading from S3 into Redshift. COPY table_name FROM 's3://hoge' CREDENTIALS 'access_key_id:hoge' DELIMITER ','
  • S3 and Redshift: Redshift, S3, Glacier. Unloading from Redshift to S3: UNLOAD ('SELECT * FROM') TO 's3://fuga/.' CREDENTIALS 'access_key_id:hoge';
  • S3 and Redshift: Redshift, S3, Glacier
  • S3 and DynamoDB
  • S3 and DynamoDB: DynamoDB, S3
  • S3 and DynamoDB: DynamoDB, S3, Glacier
  • Diagram (built up over six slides): S3 and EMR
  • Diagram (repeated): Amazon S3 together with Elastic MapReduce, Redshift, EC2, RDS, Storage Gateway, EBS, CloudFront, Elastic Transcoder, Glacier, and Data Pipeline
  • ETL example: web logs (Glacier, EMR, Redshift, S3, RDS)
    Web log:
      1.1.1.1, /login, 20140226000101,
      1.1.1.1, /home, 20140226011226,
      1.1.1.2, /home, 20140226011331,
    After ETL:
      USER   PATH    TIMESTAMP
      -----------------------------------
      USER1  /login  2014-02-26 00:00:01
      USER1  /home   2014-02-26 01:12:26
      USER2  /home   2014-02-26 01:13:31
    Aggregated:
      DATE        PATH    UU
      ------------------------
      2014-02-26  /login  1
      2014-02-26  /home   2
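The ETL example on this slide (raw web log, a USER/PATH/TIMESTAMP table, then unique users per DATE and PATH) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IP_TO_USER` lookup is hypothetical (the slide shows only the resulting user names), and in the deck the work would run on EMR or Redshift rather than in a local script.

```python
from collections import defaultdict
from datetime import datetime

# Raw log lines as they appear on the slide: IP, path, timestamp (YYYYMMDDhhmmss).
RAW_LOG = """\
1.1.1.1, /login, 20140226000101,
1.1.1.1, /home, 20140226011226,
1.1.1.2, /home, 20140226011331,
"""

# Hypothetical IP-to-user lookup (assumption; not given in the slide).
IP_TO_USER = {"1.1.1.1": "USER1", "1.1.1.2": "USER2"}


def parse_line(line):
    """Turn one raw log line into a (user, path, timestamp) record."""
    ip, path, stamp = [field.strip() for field in line.split(",")[:3]]
    return IP_TO_USER[ip], path, datetime.strptime(stamp, "%Y%m%d%H%M%S")


def unique_users(records):
    """Count distinct users (UU) per (date, path)."""
    seen = defaultdict(set)
    for user, path, ts in records:
        seen[(ts.date().isoformat(), path)].add(user)
    return {key: len(users) for key, users in seen.items()}


records = [parse_line(line) for line in RAW_LOG.splitlines() if line.strip()]
uu = unique_users(records)
for (date, path), count in uu.items():
    print(date, path, count)
```

Keeping the parse step separate from the aggregation mirrors the two tables on the slide: one per-event table and one summary table.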
  • Example 2: DynamoDB and Kinesis. Twitter API data through Kinesis; hashtag counts per minute:
      HASH_TAG  TIME              COUNT
      ------------------------
      A         2014-02-26 00:00  30
      A         2014-02-26 00:01  20
      B         2014-02-26 00:00  10
      B         2014-02-26 00:01  5
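The HASH_TAG / TIME / COUNT table on this slide is a per-minute count of stream events. A minimal sketch of that aggregation follows, assuming events arrive as (hashtag, timestamp) pairs. The sample events are made up for illustration, and the function names are hypothetical; a real consumer would read records from the stream and write the counts to a table.

```python
from collections import Counter
from datetime import datetime


def minute_bucket(ts):
    """Truncate a timestamp to its minute, formatted like the slide's TIME column."""
    return ts.replace(second=0, microsecond=0).strftime("%Y-%m-%d %H:%M")


def count_hashtags(events):
    """Count events per (hashtag, minute) from an iterable of (hashtag, datetime)."""
    counts = Counter()
    for tag, ts in events:
        counts[(tag, minute_bucket(ts))] += 1
    return counts


# Made-up sample events (assumption), not data from the slide.
events = [
    ("A", datetime(2014, 2, 26, 0, 0, 5)),
    ("A", datetime(2014, 2, 26, 0, 0, 40)),
    ("B", datetime(2014, 2, 26, 0, 0, 59)),
    ("A", datetime(2014, 2, 26, 0, 1, 2)),
]
counts = count_hashtags(events)
for (tag, minute), n in sorted(counts.items()):
    print(tag, minute, n)
```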
  • Diagram: Glacier, RDS, EMR, Redshift, DynamoDB, S3, Kinesis, and Data Pipeline on AWS; Data, WebApp, BI, Dashboard
  • Diagram: the same services, now introducing AWS Data Pipeline
  • What is Data Pipeline? Works with S3, RDS, EMR, Redshift, and DynamoDB. Flowchart: Input Data Ready? Yes: Run. No
  • AWS Data Pipeline
  • Pipeline components: Data Node, Activity, Schedule, Resource, Precondition, Action
  • Activities: CopyActivity, EmrActivity, HiveActivity, HiveCopyActivity, PigActivity, RedshiftCopyActivity, SqlActivity, ShellCommandActivity
  • Data nodes (input / output): S3, SQL, DynamoDB, Redshift. Data format: CSV
  • Staging. Data nodes: S3DataNode, SqlDataNode. ShellCommandActivity: off by default; enable with Stage = true; variables: ${INPUTx_STAGING_DIR}, ${OUTPUTx_STAGING_DIR}. HiveActivity: on by default; variables: ${inputx}, ${outputx}, which refer to tables: { "id": "MyHiveActivity", "hiveScript": "INSERT OVERWRITE TABLE ${output1} select * from ${input1};" },
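To see what the `${input1}` and `${output1}` placeholders in the hiveScript amount to, the substitution can be imitated with Python's `string.Template`, which happens to use the same `${name}` syntax. This is an illustration only: Data Pipeline resolves these variables itself, and the table names `staging_in` and `staging_out` are hypothetical.

```python
from string import Template

# The hiveScript from the slide, unchanged.
HIVE_SCRIPT = "INSERT OVERWRITE TABLE ${output1} select * from ${input1};"


def resolve(script, **tables):
    """Fill ${name} placeholders; raises KeyError if a placeholder has no value."""
    return Template(script).substitute(**tables)


# Hypothetical staged table names (assumption).
resolved = resolve(HIVE_SCRIPT, input1="staging_in", output1="staging_out")
print(resolved)
```

Using `substitute` rather than `safe_substitute` makes a missing placeholder fail loudly instead of leaving `${...}` in the query.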
  • Preconditions: checks on a DynamoDB table, on S3 (two kinds), or with a shell command. Flowchart in a pipeline: S3 key exists? Yes: Copy. No
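The flowchart on this slide ("S3 key exists?" then Copy) is a gate in front of an activity. A rough local sketch of that control flow follows, with a file on disk standing in for the S3 key. `key_exists` and `run_if_ready` are hypothetical helper names, not Data Pipeline APIs.

```python
import os
import shutil
import tempfile


def key_exists(path):
    """Stand-in precondition: a local file plays the role of the S3 key."""
    return os.path.exists(path)


def run_if_ready(precondition, activity):
    """Run the activity only when the precondition holds; report what happened."""
    if precondition():
        activity()
        return "ran"
    return "waiting"


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.csv")
dst = os.path.join(workdir, "output.csv")

# First attempt: the input does not exist yet, so the copy is skipped.
first = run_if_ready(lambda: key_exists(src), lambda: shutil.copy(src, dst))

with open(src, "w") as f:
    f.write("a,b\n")

# Second attempt: the input is present, so the copy runs.
second = run_if_ready(lambda: key_exists(src), lambda: shutil.copy(src, dst))
print(first, second)  # waiting ran
```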
  • Schedule. Cron: runs at the start of each period. Time Series: runs at the end of each period. EC2 and EMR resources: start. Period: minimum 15 minutes; 15min ~ 3year. Timeline: Start / Cron1; one Period later, TS1 / Cron2; one Period later, TS2 / Cron3
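The difference between the two schedule styles is easiest to see by listing run times: cron-style runs fall at the start of each period and time-series runs at the end, so TS1 coincides with Cron2. The sketch below assumes a fixed period and uses a hypothetical helper, `run_times`; it is not how the service computes its schedule internally.

```python
from datetime import datetime, timedelta

MIN_PERIOD = timedelta(minutes=15)  # minimum period stated on the slide


def run_times(start, period, count, style):
    """Return the first `count` run times.

    style="cron": run at the beginning of each interval.
    style="timeseries": run at the end of each interval.
    """
    if style not in ("cron", "timeseries"):
        raise ValueError("style must be 'cron' or 'timeseries'")
    if period < MIN_PERIOD:
        raise ValueError("period must be at least 15 minutes")
    offset = 0 if style == "cron" else 1
    return [start + (i + offset) * period for i in range(count)]


start = datetime(2014, 3, 19, 0, 0, 0)
cron = run_times(start, timedelta(hours=1), 3, "cron")
series = run_times(start, timedelta(hours=1), 3, "timeseries")
print([t.strftime("%H:%M") for t in cron])
print([t.strftime("%H:%M") for t in series])
```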
  • Schedule (2). Backfill (1); with the CLI, use --force. Time zone: UTC; format YYYY-MM-DDTHH:MM:SS; #{inTimeZone(myDateTime,'Asia/Tokyo')}
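Since schedule times are in UTC, the `inTimeZone` expression on this slide converts a value for display in local time. The same conversion can be sketched in Python. `in_time_zone` is a hypothetical helper, and a fixed +09:00 offset is assumed for Asia/Tokyo (Japan does not observe daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Japan Standard Time (assumption: no DST to account for).
JST = timezone(timedelta(hours=9), "Asia/Tokyo")


def in_time_zone(utc_text, tz):
    """Parse a UTC timestamp in YYYY-MM-DDTHH:MM:SS form and convert it to tz."""
    naive = datetime.strptime(utc_text, "%Y-%m-%dT%H:%M:%S")
    return naive.replace(tzinfo=timezone.utc).astimezone(tz)


local = in_time_zone("2014-03-19T00:00:00", JST)
print(local.strftime("%Y-%m-%dT%H:%M:%S"))  # 2014-03-19T09:00:00
```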
  • Resources. EC2: EC2-Classic and EC2-VPC. EMR: spot instance. Multi-region
  • (2) Activity: 20. Resource: EMR 1. Diagram: Task 1, Task 2, Task 3
  • SNS; 1~6; 3. Diagram: Task 1, Task 2, Alert, Alert
  • GUI; CSV/TSV; EMR/EC2
  • GUI, JSON pipeline definition, and CLI. ./datapipeline --create pipeline_name --put pipeline_file --activate --force. Definition: { "objects": [ { "id": "ActivityId_YYbJV", "schedule": { "ref": "ScheduleId_X8kbH" }, "scriptUri": "s3://mybucket/myscript.sh", "name": "ShellActivity1", "runsOn": { "ref": "ResourceId_5nJIh" }, ... ] }. ./datapipeline --validate my-pipeline.json --credentials credentials.json --force --id df-0123456789ABCD
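In the JSON definition on this slide, objects point at each other through `{"ref": ...}` fields. Before uploading a definition, a quick local check that every ref resolves to an object id can catch typos. The sketch below is an assumption-laden illustration: the activity fields come from the slide, but the `type` values and the two placeholder objects are added here because the slide elides the rest of the definition, and `dangling_refs` is a hypothetical helper, not part of the CLI.

```python
import json

definition = {
    "objects": [
        {
            "id": "ActivityId_YYbJV",
            "name": "ShellActivity1",
            "type": "ShellCommandActivity",  # assumption: type not shown on the slide
            "schedule": {"ref": "ScheduleId_X8kbH"},
            "scriptUri": "s3://mybucket/myscript.sh",
            "runsOn": {"ref": "ResourceId_5nJIh"},
        },
        # Placeholder objects (assumption): the slide elides them with "...".
        {"id": "ScheduleId_X8kbH", "type": "Schedule"},
        {"id": "ResourceId_5nJIh", "type": "Ec2Resource"},
    ]
}


def dangling_refs(definition):
    """Return (object id, field, ref) for every ref that names no object."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for field, value in obj.items():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], field, value["ref"]))
    return missing


problems = dangling_refs(definition)
print(json.dumps(definition, indent=2))
print("dangling refs:", problems)
```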
  • Pricing (EC2, S3 charged separately), as of 2014/3/19:
                         High frequency (>1 per day)  Low frequency
      On AWS             $1.00                        $0.60
      On-premises        $2.50                        $1.50
      Inactive pipeline  $1.00
  • DynamoDB Import/Export: DynamoDB Import/Export with Data Pipeline
  • Ad use case: S3 and RDS; EMR (Hive); S3; Redshift; SQL and BI; S3 and Hive; SNS
  • AWS; AWS Data Pipeline