
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Page 1: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Log Analytics with Amazon Kinesis and Amazon Elasticsearch Service

Page 2: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

What to do with a terabyte of logs?

Page 4: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Log analytics architecture

data source → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana

Page 5: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Elasticsearch Service is a cost-effective managed service that makes it easy to deploy, manage, and scale open source Elasticsearch for log analytics, full-text search and more.

Page 6: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Elasticsearch Service benefits

Easy to use

Open-source compatible

Secure

Highly available

AWS integrated

Scalable

Page 7: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Adobe Developer Platform (Adobe I/O)

PROBLEM
• Cost-effective monitoring for an XL amount of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem

SOLUTION
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using Amazon ES Kibana
• Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges

BENEFITS
• Management and operational simplicity
• Flexibility to try out different cluster configurations during dev and test

Data Sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service

Page 8: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

McGraw Hill Education

PROBLEM
• Supporting a wide catalog across multiple services in multiple jurisdictions
• Over 100 million learning events each month
• Tests, quizzes, learning modules begun / completed / abandoned

SOLUTION
• Search and analyze test results, student/teacher interaction, teacher effectiveness, student progress
• Analytics of applications and infrastructure are now integrated to understand operations in real time

BENEFITS
• Confidence to scale throughout the school year. From 0 to 32TB in 9 months
• Focus on their business, not their infrastructure

Page 9: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Get set up right

Page 10: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon ES overview

Diagram components: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, Elasticsearch API, CloudTrail

Page 12: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Data pattern

Amazon ES cluster, one index per day:

logs_01.21.2017
logs_01.22.2017
logs_01.23.2017
logs_01.24.2017
logs_01.25.2017
logs_01.26.2017
logs_01.27.2017

• Each index has multiple shards (Shard 1, Shard 2, Shard 3)
• Each shard contains a set of documents
• Each document contains a set of fields and values (host, ident, auth, timestamp, etc.)
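The one-index-per-day pattern can be sketched in a few lines of Python. This is only an illustration: the function name `daily_index_name`, the `logs` prefix default, and the choice to bucket days in UTC are assumptions, not something the slides specify. The date layout follows the index names shown on the slide.

```python
from datetime import datetime, timezone


def daily_index_name(event_time: datetime, prefix: str = "logs") -> str:
    """Return the per-day index name for an event, e.g. logs_01.21.2017."""
    # Assumption: days are cut in UTC so every event maps to exactly one index.
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}_{utc_time:%m.%d.%Y}"


event = datetime(2017, 1, 21, 23, 59, tzinfo=timezone.utc)
print(daily_index_name(event))  # logs_01.21.2017
```

Keeping the date in the index name means old data can be dropped by deleting whole indices rather than individual documents.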

Page 13: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Deployment of indices to a cluster

• Index 1
  – Shard 1
  – Shard 2
  – Shard 3
• Index 2
  – Shard 1
  – Shard 2
  – Shard 3

Amazon ES cluster (legend: Primary, Replica)

Instance 1, Master: shards 1, 3, 3, 1
Instance 2: shards 2, 1, 1, 2
Instance 3: shards 3, 2, 2, 3
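As a rough illustration of the layout above, the sketch below spreads the primary and replica copy of each shard over the instances in round-robin order, so that no instance holds both copies of the same shard. This is not Elasticsearch's actual allocation algorithm; `allocate_shards` is a hypothetical helper, and which copy the slide marks as primary on each instance is not recoverable from the text.

```python
def allocate_shards(num_shards: int, num_instances: int) -> dict:
    """Place each shard's primary and replica on two different instances."""
    if num_instances < 2:
        raise ValueError("a replica needs a second instance")
    layout = {instance: [] for instance in range(1, num_instances + 1)}
    for shard in range(1, num_shards + 1):
        primary = (shard - 1) % num_instances + 1
        replica = primary % num_instances + 1  # next instance, wrapping around
        layout[primary].append((shard, "primary"))
        layout[replica].append((shard, "replica"))
    return layout


for instance, copies in allocate_shards(3, 3).items():
    print(f"Instance {instance}: {copies}")
```

With three shards and three instances, each instance ends up holding two different shards per index, which matches the shard numbers listed per instance on the slide.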

Page 15: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

How many instances?

The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica

Size based on storage requirements
• Either local storage or up to 1.5TB of EBS per instance

• Example: 2TB corpus will need 4 instances
  – Assuming a replica and using EBS
  – Or with i2.2xlarge nodes (1.6TB ephemeral storage)
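The sizing rule can be written as a small calculation. The sketch below is an assumption-laden illustration: `data_nodes_needed` is a hypothetical helper, and the `headroom` parameter is not on the slide. Plain division for the slide's example (2TB corpus, one replica, 1.5TB per instance) gives 3 nodes; the slide arrives at 4 without stating why, and an assumed 25% headroom is one way to reproduce that figure.

```python
import math


def data_nodes_needed(corpus_tb, storage_per_node_tb, replicas=1, headroom=0.0):
    """Estimate the data node count from storage requirements alone."""
    # Index size is taken to be about the corpus size, once per copy.
    # headroom is a hypothetical safety margin, not a figure from the slides.
    storage_needed_tb = corpus_tb * (1 + replicas) * (1 + headroom)
    return math.ceil(storage_needed_tb / storage_per_node_tb)


print(data_nodes_needed(2, 1.5))                 # 3 (plain division)
print(data_nodes_needed(2, 1.5, headroom=0.25))  # 4 (EBS, assumed 25% headroom)
print(data_nodes_needed(2, 1.6, headroom=0.25))  # 4 (i2.2xlarge, same assumption)
```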

Page 17: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Instance type recommendations

Instance   Workload
T2         Entry point. Dev and test.
M3, M4     Equal read and write volumes.
R3, R4     Read-heavy or workloads with high memory demands (e.g., aggregations).
C4         High concurrency/indexing workloads.
I2         Up to 1.6 TB of SSD instance storage.

Page 19: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Cluster with no dedicated masters

Amazon ES cluster

Instance 1, Master: shards 1, 3, 3, 1
Instance 2: shards 2, 1, 1, 2
Instance 3: shards 3, 2, 2, 3

Page 20: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Cluster with dedicated masters

Amazon ES cluster

Dedicated master nodes

Data nodes: queries and updates
Instance 1: shards 1, 3, 3, 1
Instance 2: shards 2, 1, 1, 2
Instance 3: shards 3, 2, 2, 3

Page 21: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Master node selection

• < 10 nodes - m3.medium, c4.large
• 11-20 nodes - m4.large, r4.large, m3.large, r3.large
• 21-40 nodes - c4.xlarge, m4.xlarge, r4.xlarge, m3.xlarge
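The tiers above map naturally onto a lookup. The sketch below is illustrative only: `recommended_master_types` is a hypothetical helper, and because the slide lists "< 10" and "11-20" without covering exactly 10 nodes, the code assumes 10 falls into the first tier.

```python
# Upper bound on data node count -> dedicated master instance types (from the slide).
MASTER_TIERS = [
    (10, ["m3.medium", "c4.large"]),  # assumption: exactly 10 nodes lands here
    (20, ["m4.large", "r4.large", "m3.large", "r3.large"]),
    (40, ["c4.xlarge", "m4.xlarge", "r4.xlarge", "m3.xlarge"]),
]


def recommended_master_types(data_nodes: int) -> list:
    """Return the master instance types suggested for a cluster of this size."""
    if data_nodes < 1:
        raise ValueError("a cluster needs at least one data node")
    for upper_bound, instance_types in MASTER_TIERS:
        if data_nodes <= upper_bound:
            return instance_types
    raise ValueError("the slide gives no recommendation beyond 40 nodes")


print(recommended_master_types(15))
```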

Page 23: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Cluster with zone awareness

[Diagram: Amazon ES cluster with Instance 1 through Instance 4 split between Availability Zone 1 and Availability Zone 2, with copies of shards 1, 2, and 3 distributed across the instances]

Page 24: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Small use cases

• Logstash co-located on the Application instance
• SigV4 signing via provided output plugin
• Up to 200GB of data
• m3.medium + 100G EBS data nodes
• 3x m3.medium master nodes

Diagram components: Application Instance

Page 25: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Large use cases

• Data flows from instances and applications via Lambda; CloudWatch Logs (CWL) is implicit
• SigV4 signing via Lambda/roles
• Up to 5TB of data
• r3.2xlarge + 512GB EBS data nodes
• 3x m3.medium master nodes

Diagram components: Amazon DynamoDB, AWS Lambda, Amazon S3 bucket, Amazon CloudWatch

Page 26: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

XL use cases

• Ingest supported through high-volume technologies like Spark or Kinesis
• Up to 60 TB of data
• r3.8xlarge + 640GB data nodes
• 3x m3.xlarge master nodes

Diagram components: Amazon Kinesis, Amazon EMR
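The small, large, and XL slides can be summarized as a lookup by data volume. The following is a sketch under stated assumptions: the `Tier` type and `pick_tier` function are hypothetical, 1TB is taken as 1,000GB, and the upper bounds are read directly from the "Up to" figures on the three slides.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_data_gb: int
    data_nodes: str
    master_nodes: str


# Figures copied from the three use-case slides; 1TB = 1,000GB is an assumption.
TIERS = [
    Tier("small", 200, "m3.medium + 100G EBS", "3x m3.medium"),
    Tier("large", 5_000, "r3.2xlarge + 512GB EBS", "3x m3.medium"),
    Tier("XL", 60_000, "r3.8xlarge + 640GB", "3x m3.xlarge"),
]


def pick_tier(data_gb: float) -> Tier:
    """Return the smallest use-case tier whose limit covers the data volume."""
    for tier in TIERS:
        if data_gb <= tier.max_data_gb:
            return tier
    raise ValueError("data volume is beyond the sizes covered by the slides")


print(pick_tier(1_200))
```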

Page 27: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Best practices

Data nodes = Storage needed / Storage per node

Use GP2 EBS volumes

Use 3 dedicated master nodes for production deployments

Enable Zone Awareness

Set indices.fielddata.cache.size = 40

Page 28: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis

Page 29: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis: Streaming Data Made Easy

Services make it easy to capture, deliver, process streams on AWS

• Amazon Kinesis Streams
• Amazon Kinesis Analytics
• Amazon Kinesis Firehose

Page 30: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis Streams

• Easy administration
• Build real time applications with framework of choice
• Low cost

Page 31: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis Firehose

• Zero administration
• Direct-to-data store integration
• Seamless elasticity

Page 32: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis Analytics

• Interact with streaming data in real-time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms

Page 33: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon Kinesis - Firehose vs. Streams

Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.

Page 34: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Kinesis Firehose overview

Delivery Stream: Underlying AWS resource

Destination: Amazon ES, Amazon Redshift, or Amazon S3

Record: Put records in streams to deliver to destinations

Page 35: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Kinesis Firehose Data Transformation

• Firehose buffers up to 3MB of ingested data
• When buffer is full, automatically invokes Lambda function, passing array of records to be processed
• Lambda function processes and returns array of transformed records, with status of each record
• Transformed records are saved to configured destination

Records passed to the Lambda function:
[{ "recordId": "1234", "data": "encoded-data" }, { "recordId": "1235", "data": "encoded-data" }]

Records returned by the Lambda function:
[{ "recordId": "1234", "result": "Ok", "data": "encoded-data" }, { "recordId": "1235", "result": "Dropped", "data": "encoded-data" }]
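A transformation function following this contract might look like the sketch below. It is a minimal example, not the presenters' code: the filter rule (dropping records whose `level` is `DEBUG`) and the added `processed` field are hypothetical, and the payload is assumed to be base64-encoded JSON. The `recordId`, `result`, and `data` keys and the `Ok`, `Dropped`, and `ProcessingFailed` statuses follow the Firehose transformation interface.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and report a status for each."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Hand unparseable records back unchanged, marked as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if doc.get("level") == "DEBUG":  # hypothetical filter rule
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        doc["processed"] = True  # hypothetical enrichment step
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": encoded})
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is how Firehose matches results back to the records it sent.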

Page 36: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Kinesis Firehose delivery architecture with transformations

[Diagram: the data source sends source records to the Firehose delivery stream, which invokes the data transformation function and delivers the transformed records to Amazon Elasticsearch Service. Source records, transformation failures, and delivery failures go to an S3 bucket.]

Page 37: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Kinesis Firehose features for ingest

• Serverless scale
• Error handling
• S3 backup

Page 38: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Best practices

Use smaller buffer sizes to increase throughput, but be careful of concurrency

Use index rotation based on sizing

Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
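A quick way to see whether a workload fits inside these defaults is to compare its rates against all three limits. The sketch below is an illustration under assumptions: `exceeded_limits` is a hypothetical helper, 1 MB is taken as 1,000,000 bytes, and a "transaction" is taken to mean one API call that may carry several records.

```python
import math

# Default per-stream limits as listed on the slide.
DEFAULT_LIMITS = {
    "transactions_per_sec": 2000,
    "records_per_sec": 5000,
    "mb_per_sec": 5.0,
}


def exceeded_limits(records_per_sec, avg_record_bytes, records_per_call=1,
                    limits=DEFAULT_LIMITS):
    """Return the names of the limits this workload would exceed."""
    usage = {
        # Assumption: each API call counts as one transaction.
        "transactions_per_sec": math.ceil(records_per_sec / records_per_call),
        "records_per_sec": records_per_sec,
        # Assumption: 1 MB = 1,000,000 bytes.
        "mb_per_sec": records_per_sec * avg_record_bytes / 1_000_000,
    }
    return [name for name, limit in limits.items() if usage[name] > limit]


print(exceeded_limits(4000, 500))                        # ['transactions_per_sec']
print(exceeded_limits(4000, 500, records_per_call=100))  # []
```

Batching records per call lowers the transaction rate without changing the record or byte rates, which is why the second call fits.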

Page 39: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Log analysis with aggregations

Page 40: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Amazon ES aggregations

Buckets – a collection of documents meeting some criterion

Metrics – calculations on the content of buckets

Bucket: time

Metric: count

Page 41: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

host:199.72.81.55 with <histogram of verb>

Look up 199.72.81.55: 1, 4, 8, 12, 30, 42, 58, 100...

Field data: GET, GET, POST, GET, PUT, GET, GET, POST

Buckets: GET, POST, PUT

Counts: 5, 2, 1
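The same lookup, bucket, and count steps can be mimicked in plain Python. This is only a toy model of what the slide describes, not how Elasticsearch is implemented; `terms_aggregation` and the second host address are made up for the example, while the first host, the document ids, and the verbs come from the slide.

```python
from collections import Counter


def terms_aggregation(docs, match_field, match_value, bucket_field):
    """Find matching documents, then count them per value of bucket_field."""
    matching = [doc for doc in docs if doc.get(match_field) == match_value]
    return Counter(doc[bucket_field] for doc in matching)


doc_ids = [1, 4, 8, 12, 30, 42, 58, 100]
verbs = ["GET", "GET", "POST", "GET", "PUT", "GET", "GET", "POST"]
docs = [{"id": i, "host": "199.72.81.55", "verb": v} for i, v in zip(doc_ids, verbs)]
docs.append({"id": 2, "host": "10.0.0.1", "verb": "GET"})  # hypothetical other host

buckets = terms_aggregation(docs, "host", "199.72.81.55", "verb")
print(buckets.most_common())  # [('GET', 5), ('POST', 2), ('PUT', 1)]
```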

Page 42: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

A more complicated aggregation

Bucket: ARN

Bucket: Region

Bucket: eventName

Metric: Count
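A nested aggregation like this can be expressed as a request body in which each bucket level carries the next one under `aggs`. The sketch below builds such a body as a Python dict. The helper `nested_terms_aggs` and the field names `userIdentity.arn`, `awsRegion`, and `eventName` are assumptions (CloudTrail-style names chosen for illustration); the count metric needs no extra clause because each bucket reports its `doc_count`.

```python
import json


def nested_terms_aggs(fields, size=10):
    """Build one terms aggregation per field, each nested inside the previous."""
    aggs = {}
    # Work from the innermost bucket outward.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if aggs:
            level["aggs"] = aggs
        aggs = {"by_" + field.replace(".", "_"): level}
    # size 0: return only the aggregation results, not the matching documents.
    return {"size": 0, "aggs": aggs}


query = nested_terms_aggs(["userIdentity.arn", "awsRegion", "eventName"])
print(json.dumps(query, indent=2))
```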

Page 43: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Best practices

Make sure that your fields are not_analyzed

Visualizations are based on buckets/metrics

Use a histogram on the x-axis first, then sub-aggregate
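For the first practice, an index mapping can declare fields as not analyzed so that aggregations bucket on whole values rather than on individual tokens. The sketch below assumes the Elasticsearch versions of the talk's era: in 2.x a `string` field with `"index": "not_analyzed"`, and in 5.x the `keyword` type, which serves the same purpose. The helper name, the `log` type name, and the field names are hypothetical.

```python
def not_analyzed_mapping(fields, doc_type="log", es_major_version=5):
    """Build a mapping body that stores the given fields as exact values."""
    if es_major_version >= 5:
        field_def = {"type": "keyword"}
    else:
        field_def = {"type": "string", "index": "not_analyzed"}
    properties = {field: dict(field_def) for field in fields}
    # Assumption: a mapping type name is required, as in 2.x and 5.x.
    return {"mappings": {doc_type: {"properties": properties}}}


print(not_analyzed_mapping(["host", "verb"], es_major_version=2))
```

The resulting body would be sent when creating the index, before any documents are ingested.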

Page 44: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service

Use Kinesis Firehose to ingest data simply

Kibana for monitoring, Elasticsearch queries for deeper analysis

Page 45: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

What to do next

Qwiklab: https://qwiklabs.com/searches/lab?keywords=introduction%20to%20amazon%20elasticsearch%20service

Centralized logging solution: https://aws.amazon.com/answers/logging/centralized-logging/

Our overview page on AWS: https://aws.amazon.com/elasticsearch-service/

Page 46: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks

Q&A

Thank you for joining!