© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
2-H1-1-10 / 2-H1-1-19
AWS DataLake
Please give us your feedback on this session
Please respond via the QR code printed on the cover of your Summit guidebook. Everyone who responds will receive a piece of original AWS merchandise.
To redeem your gift, please visit either the survey confirmation area or the reception area in the exhibition hall on the 3rd floor of Pamir.
Name: Makoto Uehara
Role: Solutions Architect
Focus: customers in media and ad tech
Service I have been using most lately: AWS Glue
1.
2. RDB
3.
4. AWS
5. AWS
6.
• Putting data to use requires a variety of tools
  ➢ Tools for acquiring, storing, and cleaning/reshaping data
  ➢ Visualization: BI tools / SQL for basic aggregation
  ➢ Analysis tools such as deep learning frameworks, Hadoop clusters, and notebooks
• The resources required also differ by tool and model
  ➢ CPU / GPU / memory / IO
  ➢ Multiple instances / clusters
• Putting data to use is fundamentally a process of trial and error
  ➢ Continuously adapting to changes in data volume and content
  ➢ Periodically reviewing metrics as business conditions change
  ➢ The build-and-improve cycle for machine learning models
• How quickly you can iterate through this trial-and-error cycle is key to getting good results
RDB
[Figure labels: Databases, Logs, RDB, BI, Report]
RDB
• The schema is defined
• Data types can be enforced
• Access tools are simple, and the ecosystem is stable
• Transactions
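As a small illustration of the RDB properties listed above (a defined schema, enforced types and constraints, transactions), the following sketch uses Python's built-in `sqlite3` module. The table and column names are hypothetical and do not come from the session. SQLite is used only because it runs without a server, and the type check is expressed as a `CHECK` constraint because SQLite does not strictly enforce column types by default.

```python
import sqlite3


def demo():
    """Return (rejected, row_count) after a transaction that violates the schema."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the schema fixes column names, types, and constraints.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " item TEXT NOT NULL,"
        " amount INTEGER NOT NULL CHECK (typeof(amount) = 'integer')"
        ")"
    )
    conn.commit()

    rejected = False
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "INSERT INTO orders (item, amount) VALUES (?, ?)", ("book", 3)
            )
            # This row violates the CHECK constraint (amount is not an integer).
            conn.execute(
                "INSERT INTO orders (item, amount) VALUES (?, ?)", ("pen", "many")
            )
    except sqlite3.IntegrityError:
        rejected = True

    # The first insert was rolled back together with the failing one.
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return rejected, count
```

Calling `demo()` shows both properties at once: the malformed row is rejected by the schema, and the transaction leaves no partial result behind.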
[Figure "1:": labels Databases, Logs, RDB, BI, Report, with the added labels Logs, Events, Media]
[Figure "2:": labels Databases, Logs, RDB, BI, Report, with the added labels Lab, Realtime, Machine Learning]
https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/
[Figure: Data Lake]
• Separation of storage and compute
  ➢ Each can scale independently, which makes optimization easier
• Single Source of Truth (SSOT)
  ➢ Whatever is in the data lake can be treated as authoritative
• Support for a variety of input/output methods
  ➢ Input and output are independent, and ETL can be independent too, so later extension is smooth
Until now:
1. Disk capacity has an upper limit
2. Only summaries are stored, or data is kept for a limited period
3. The processing that can be done is fixed
On AWS:
1. Inexpensive storage with no upper limit
2. Keep all of the original data
3. What is processed, and how, changes with the business
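The contrast above can be illustrated with a small, hypothetical sketch. If only a summary is stored, a question that comes up later cannot be answered. If the original records are kept, a new aggregation can be added whenever the business needs it. The event fields and values below are made up for illustration.

```python
from collections import Counter

# Hypothetical raw access-log events, kept in their original form.
RAW_EVENTS = [
    {"user": "a", "page": "/top", "device": "mobile"},
    {"user": "b", "page": "/top", "device": "pc"},
    {"user": "a", "page": "/item", "device": "mobile"},
]


def views_per_page(events):
    """The aggregation that was planned from the start."""
    return Counter(e["page"] for e in events)


def views_per_device(events):
    """A question added later; answerable only because raw events were kept."""
    return Counter(e["device"] for e in events)
```

Had only the output of `views_per_page` been stored, `views_per_device` could not be computed afterwards, because the `device` field would already be gone.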
Amazon S3
[Figure: Amazon S3 with the AWS services shown around it. Labels: Glacier, EC2, RDS, Storage Gateway, EBS, Redshift, CloudFront, Elastic Transcoder, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Machine Learning, AWS IoT, QuickSight]
AWS Amazon S3
S3
• IAM
• CloudTrail S3
• S3
•
• ( CSE, SSE )
[Figure labels: Fluentd, Logstash, Flume, OSS, OSS on EC2, MQTT, HTTPS, AWS IoT, Amazon Kinesis, S3, AWS Snowball]
Amazon Kinesis
A family of fully managed services for collecting, processing, and delivering stream data
• Kinesis Data Streams: stores stream data so that it can be processed and analyzed
• Kinesis Data Firehose: easily loads stream data into S3, Redshift, Amazon ES, and Splunk
• Kinesis Data Analytics: easily analyzes streaming data with standard SQL queries
[Figure labels: Kinesis Data Firehose, Amazon Athena]
[Figure labels: Amazon S3, Snowball, Fluentd, Logstash, Flume, AWS IoT, Amazon Kinesis, Amazon Elasticsearch Service, AWS DMS, OSS]
[Figure labels: Amazon Redshift, Amazon EMR, DWH, S3, Amazon Athena, AWS Glue, ETL]
Amazon Redshift
[Figure labels: SQL Client / BI Tools, JDBC / ODBC Driver, Leader node, Compute nodes]
• MPP (Massively Parallel Processing)
• 2PB
• JDBC/ODBC BI
Amazon Redshift Spectrum
[Figure labels: SQL Client / BI Tools, JDBC / ODBC Driver, Leader node, Compute nodes, Spectrum, Redshift, S3]
• Redshift S3
• Redshift JOIN
• S3
• Redshift S3
Amazon EMR
Hadoop/Spark AWS
• Big Data
• Spot
[Figure labels: Hadoop, Amazon EMR, Spot]
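EMRFS: using S3 like HDFS
Simply by specifying an S3 location, you can access S3 in the same way as HDFS
• Compute resources and storage can be separated
• A major benefit in terms of cost as well
• Clusters can be deleted
• Deleting a cluster does not lose the data
• Sharing data between multiple clusters is easy
• Data is placed on highly durable S3
[Figure labels: EMR, EMR, Amazon S3]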
Amazon Athena
• Serverless query service
• Queries data in S3 directly
• Presto-based; standard SQL can be executed
• Pay-as-you-go billing based on the amount of data scanned by the queries you run
• $5 per TB of data scanned
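As a rough worked example of the scan-based pricing above, the sketch below estimates the cost of a query from the number of bytes it scans. The $5 per TB figure is the one quoted on the slide. Treating 1 TB as 1024^4 bytes is an assumption made here, and the function name is hypothetical. The sketch does not model any minimum charge or rounding, and actual prices may differ by region and over time.

```python
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0     # figure quoted on the slide


def estimate_query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD
```

Because the charge depends on data scanned rather than on query time, anything that reduces the scanned bytes (for example, storing data in a columnar format and reading only the needed columns) lowers the estimate proportionally.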
Amazon Athena
• Write SQL in the query editor and run the query
• Past query results can be downloaded from History
• Query results are automatically saved to S3
• Results are displayed in the console
AWS Glue
• ETL +
• GUI ETL PySpark Scala
• Athena, Redshift Spectrum, EMR Spark, Hive
Glue: GUI ETL
• GUI
• CSV→Parquet
Glue: ETL
Job = the unit in which ETL processing is executed
• Written in PySpark or Scala
• Extract and Load are abstracted, so you mainly write the Transform
■ Sample scripts
https://github.com/awslabs/aws-glue-samples (Note: an FAQ is available on the same site and is also a must-read)
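To illustrate the division of labor described above, where extract and load are handled generically and the job author mainly writes the transform, here is a plain-Python sketch. It deliberately does not use the AWS Glue API. The function names, the CSV columns, and the in-memory source and sink are all hypothetical. For real Glue scripts, see the sample repository linked above.

```python
import csv
import io
import json


def extract(csv_text):
    """Generic extract step: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """The job-specific part: cast types and drop rows without a user id."""
    return [
        {"user_id": r["user_id"], "amount": int(r["amount"])}
        for r in rows
        if r["user_id"]
    ]


def load(rows):
    """Generic load step: serialize rows as JSON Lines text."""
    return "\n".join(json.dumps(r) for r in rows)


def run_job(csv_text):
    """Run the three steps in order; only transform() is job-specific."""
    return load(transform(extract(csv_text)))
```

In this shape, changing what the job does means editing only `transform`, while `extract` and `load` stay the same across jobs.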
Glue: job execution
• Loads and runs the code you created
  ➢ Permissions are set with an IAM role
• Ways to start a job run
  ➢ API call (manual)
  ➢ Trigger (scheduled execution is possible)
• You can specify a retry limit and pass parameters
• Execution logs are written to CloudWatch Logs
[Figure: the Glue data catalog is used by Amazon Athena, Redshift Spectrum, and EMR (Hive/Spark). Other labels: ETL, Amazon S3, AWS Glue, AWS]
[Figure labels: Amazon S3, Amazon EMR, AWS Glue]
Amazon QuickSight
BI
• $9/
• SPICE
• Redshift, RDS, S3, Athena, Salesforce,
• AWS
EC2 + BI tools
• Use a wide range of partner solutions and OSS on EC2
[Figure labels: Amazon S3, EC2, QuickSight, RDS]
[Figure labels: Amazon Glacier, Amazon S3, Amazon DynamoDB, Amazon RDS/Aurora, Amazon CloudSearch, Amazon Elasticsearch, Amazon QuickSight, Amazon Kinesis Analytics, Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon Athena, AWS IoT, Amazon Kinesis, Snowball]
primeNumber
DMP (Data Management Platform)
https://speakerdeck.com/hiro_koba_jp/aws-etlji-ri-aws-gluehuo-yong-shi-li-at-primenumber
Thank you !