Amazon Kinesis Data Analytics Developer Guide


Amazon Kinesis Data Analytics: Developer Guide

Copyright © 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


Table of Contents

What Is Amazon Kinesis Data Analytics? (p. 1)
    When Should I Use Amazon Kinesis Data Analytics? (p. 1)
    Are You a First-Time User of Amazon Kinesis Data Analytics? (p. 1)
How It Works (p. 3)
    Input (p. 5)
        Configuring a Streaming Source (p. 5)
        Configuring a Reference Source (p. 7)
        Working with JSONPath (p. 9)
        Mapping Streaming Source Elements to SQL Input Columns (p. 12)
        Using the Schema Discovery Feature on Streaming Data (p. 16)
        Using the Schema Discovery Feature on Static Data (p. 18)
        Preprocessing Data Using a Lambda Function (p. 20)
        Parallelizing Input Streams for Increased Throughput (p. 27)
    Application Code (p. 30)
    Output (p. 31)
        Creating an Output Using the AWS CLI (p. 32)
        Using a Lambda Function as Output (p. 33)
        Application Output Delivery Model (p. 38)
    Error Handling (p. 39)
        Reporting Errors Using an In-Application Error Stream (p. 39)
    Granting Permissions (p. 40)
        Trust Policy (p. 40)
        Permissions Policy (p. 41)
    Auto Scaling Applications (p. 42)
Getting Started (p. 44)
    Step 1: Set Up an Account (p. 44)
        Sign Up for AWS (p. 44)
        Create an IAM User (p. 45)
        Next Step (p. 45)
    Step 2: Set Up the AWS CLI (p. 45)
        Next Step (p. 46)
    Step 3: Create Your Starter Analytics Application (p. 46)
        Step 3.1: Create an Application (p. 48)
        Step 3.2: Configure Input (p. 49)
        Step 3.3: Add Real-Time Analytics (Add Application Code) (p. 51)
        Step 3.4: (Optional) Update the Application Code (p. 53)
    Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 55)
        Working with the Schema Editor (p. 55)
        Working with the SQL Editor (p. 62)
Streaming SQL Concepts (p. 65)
    In-Application Streams and Pumps (p. 65)
    Timestamps and the ROWTIME Column (p. 66)
        Understanding Various Times in Streaming Analytics (p. 66)
    Continuous Queries (p. 68)
    Windowed Queries (p. 69)
        Tumbling Windows (p. 69)
        Sliding Windows (p. 70)
    Stream Joins (p. 74)
        Example 1: Report Orders Where There Are Trades Within One Minute of the Order Being Placed (p. 74)
Example Applications (p. 76)
    Examples: Preprocessing Streams (p. 76)
        Preprocessing Streams with Lambda (p. 76)
        Example: Manipulating Strings and Date Times (p. 76)
        Example: Streaming Source With Multiple Record Types (p. 85)
        Example: Add Reference Data Source (p. 90)
    Examples: Basic Analytics (p. 94)
        Example: Most Frequently Occurring Values (p. 94)
        Example: Count Distinct Values (p. 95)
        Example: Simple Alerts (p. 96)
        Example: Throttled Alerts (p. 98)
    Examples: Advanced Analytics (p. 99)
        Example: Detect Anomalies (p. 99)
        Example: Detect Anomalies and Obtain an Explanation (p. 104)
        Example: Detect Hotspots (p. 108)
        Example: Using Different Types of Times in Analytics (p. 118)
    Examples: Other Amazon Kinesis Data Analytics Applications (p. 118)
        Example: Explore the In-Application Error Stream (p. 118)
Monitoring (p. 120)
    Monitoring Tools (p. 120)
        Automated Tools (p. 121)
        Manual Tools (p. 121)
    Monitoring with Amazon CloudWatch (p. 121)
        Metrics and Dimensions (p. 122)
Limits (p. 129)
Best Practices (p. 130)
    Managing Applications (p. 130)
    Defining Input Schema (p. 131)
    Connecting to Outputs (p. 132)
    Authoring Application Code (p. 132)
Troubleshooting (p. 133)
    Get a SQL Statement to Work Correctly (p. 133)
    Unable to Detect or Discover My Schema (p. 133)
    Important Application Health Parameters to Monitor (p. 133)
    Invalid Code Errors When Running an Application (p. 134)
    Application Doesn't Process Data After Deleting and Re-creating the Kinesis Application Input Stream or Kinesis Data Firehose Delivery Stream with the Same Name (p. 134)
    Insufficient Throughput or High MillisBehindLatest (p. 134)
Authentication and Access Control (p. 136)
    Authentication (p. 136)
    Access Control (p. 137)
    Overview of Managing Access (p. 137)
        Amazon Kinesis Data Analytics Resources and Operations (p. 138)
        Understanding Resource Ownership (p. 138)
        Managing Access to Resources (p. 138)
        Specifying Policy Elements: Actions, Effects, and Principals (p. 140)
        Specifying Conditions in a Policy (p. 140)
    Using Identity-Based Policies (IAM Policies) (p. 141)
        Permissions Required to Use the Amazon Kinesis Data Analytics Console (p. 141)
        AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics (p. 142)
        Customer Managed Policy Examples (p. 143)
    Amazon Kinesis Data Analytics API Permissions Reference (p. 146)
SQL Reference (p. 148)
API Reference (p. 149)
    Actions (p. 149)
        AddApplicationCloudWatchLoggingOption (p. 150)
        AddApplicationInput (p. 152)
        AddApplicationInputProcessingConfiguration (p. 155)
        AddApplicationOutput (p. 157)
        AddApplicationReferenceDataSource (p. 160)
        CreateApplication (p. 163)
        DeleteApplication (p. 168)
        DeleteApplicationCloudWatchLoggingOption (p. 170)
        DeleteApplicationInputProcessingConfiguration (p. 172)
        DeleteApplicationOutput (p. 174)
        DeleteApplicationReferenceDataSource (p. 176)
        DescribeApplication (p. 178)
        DiscoverInputSchema (p. 182)
        ListApplications (p. 186)
        StartApplication (p. 188)
        StopApplication (p. 190)
        UpdateApplication (p. 192)
    Data Types (p. 195)
        ApplicationDetail (p. 197)
        ApplicationSummary (p. 200)
        ApplicationUpdate (p. 201)
        CloudWatchLoggingOption (p. 202)
        CloudWatchLoggingOptionDescription (p. 203)
        CloudWatchLoggingOptionUpdate (p. 204)
        CSVMappingParameters (p. 205)
        DestinationSchema (p. 206)
        Input (p. 207)
        InputConfiguration (p. 209)
        InputDescription (p. 210)
        InputLambdaProcessor (p. 212)
        InputLambdaProcessorDescription (p. 213)
        InputLambdaProcessorUpdate (p. 214)
        InputParallelism (p. 215)
        InputParallelismUpdate (p. 216)
        InputProcessingConfiguration (p. 217)
        InputProcessingConfigurationDescription (p. 218)
        InputProcessingConfigurationUpdate (p. 219)
        InputSchemaUpdate (p. 220)
        InputStartingPositionConfiguration (p. 221)
        InputUpdate (p. 222)
        JSONMappingParameters (p. 224)
        KinesisFirehoseInput (p. 225)
        KinesisFirehoseInputDescription (p. 226)
        KinesisFirehoseInputUpdate (p. 227)
        KinesisFirehoseOutput (p. 228)
        KinesisFirehoseOutputDescription (p. 229)
        KinesisFirehoseOutputUpdate (p. 230)
        KinesisStreamsInput (p. 231)
        KinesisStreamsInputDescription (p. 232)
        KinesisStreamsInputUpdate (p. 233)
        KinesisStreamsOutput (p. 234)
        KinesisStreamsOutputDescription (p. 235)
        KinesisStreamsOutputUpdate (p. 236)
        LambdaOutput (p. 237)
        LambdaOutputDescription (p. 238)
        LambdaOutputUpdate (p. 239)
        MappingParameters (p. 240)
        Output (p. 241)
        OutputDescription (p. 243)
        OutputUpdate (p. 245)
        RecordColumn (p. 247)
        RecordFormat (p. 248)
        ReferenceDataSource (p. 249)
        ReferenceDataSourceDescription (p. 250)
        ReferenceDataSourceUpdate (p. 252)
        S3Configuration (p. 254)
        S3ReferenceDataSource (p. 255)
        S3ReferenceDataSourceDescription (p. 256)
        S3ReferenceDataSourceUpdate (p. 257)
        SourceSchema (p. 258)
Document History (p. 259)
AWS Glossary (p. 261)


What Is Amazon Kinesis Data Analytics?

With Amazon Kinesis Data Analytics, you can process and analyze streaming data using standard SQL. The service enables you to quickly author and run powerful SQL code against streaming sources to perform time series analytics, feed real-time dashboards, and create real-time metrics.

To get started with Kinesis Data Analytics, you create a Kinesis data analytics application that continuously reads and processes streaming data. The service supports ingesting data from Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose streaming sources. Then, you author your SQL code using the interactive editor and test it with live streaming data. You can also configure destinations where you want Kinesis Data Analytics to send the results.

Kinesis Data Analytics supports Amazon Kinesis Data Firehose (Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service), AWS Lambda, and Amazon Kinesis Data Streams as destinations.

When Should I Use Amazon Kinesis Data Analytics?

Amazon Kinesis Data Analytics enables you to quickly author SQL code that continuously reads, processes, and stores data in near real time. Using standard SQL queries on the streaming data, you can construct applications that transform and provide insights into your data. The following are some example scenarios for using Kinesis Data Analytics:

• Generate time-series analytics – You can calculate metrics over time windows, and then stream values to Amazon S3 or Amazon Redshift through a Kinesis data delivery stream.

• Feed real-time dashboards – You can send aggregated and processed streaming data results downstream to feed real-time dashboards.

• Create real-time metrics – You can create custom metrics and triggers for use in real-time monitoring, notifications, and alarms.

For information about the SQL language elements that are supported by Kinesis Data Analytics, see Amazon Kinesis Data Analytics SQL Reference.

Are You a First-Time User of Amazon Kinesis Data Analytics?

If you are a first-time user of Amazon Kinesis Data Analytics, we recommend that you read the following sections in order:

1. Read the How It Works section of this guide. This section introduces various Kinesis Data Analytics components that you work with to create an end-to-end experience. For more information, see Amazon Kinesis Data Analytics: How It Works (p. 3).

2. Try the Getting Started exercises. For more information, see Getting Started with Amazon Kinesis Data Analytics (p. 44).

3. Explore the streaming SQL concepts. For more information, see Streaming SQL Concepts (p. 65).


4. Try additional examples. For more information, see Example Amazon Kinesis Data Analytics Applications (p. 76).


Amazon Kinesis Data Analytics: How It Works

An application is the primary resource in Amazon Kinesis Data Analytics that you can create in your account. You can create and manage applications using the AWS Management Console or the Amazon Kinesis Data Analytics API. Kinesis Data Analytics provides API operations to manage applications. For a list of API operations, see Actions (p. 149).

Amazon Kinesis data analytics applications continuously read and process streaming data in real time. You write application code using SQL to process the incoming streaming data and produce output. Then, Kinesis Data Analytics writes the output to a configured destination. The following diagram illustrates a typical application architecture.

Each application has a name, description, version ID, and status. Amazon Kinesis Data Analytics assigns a version ID when you first create an application. This version ID is updated when you update any application configuration. For example, if you add an input configuration, add or delete a reference data source, add or delete an output configuration, or update the application code, Kinesis Data Analytics updates the current application version ID. Kinesis Data Analytics also maintains timestamps for when an application was created and last updated.
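For example, you can read these properties back with the DescribeApplication operation. The following is a minimal sketch using the AWS SDK for Python (Boto3); the application name is a placeholder, and the call assumes that credentials and a region are already configured.

import boto3

# Client for the Kinesis Data Analytics SQL API
client = boto3.client("kinesisanalytics")

# "ExampleApp" is a hypothetical application name
response = client.describe_application(ApplicationName="ExampleApp")
detail = response["ApplicationDetail"]

# The version ID increments each time the configuration is updated
print(detail["ApplicationVersionId"])
print(detail["ApplicationStatus"])
print(detail["CreateTimestamp"], detail["LastUpdateTimestamp"])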

In addition to these basic properties, each application consists of the following:

• Input – The streaming source for your application. You can select either a Kinesis data stream or a Kinesis Data Firehose data delivery stream as the streaming source. In the input configuration, you map the streaming source to an in-application input stream. The in-application stream is like a continuously updating table upon which you can perform the SELECT and INSERT SQL operations. In your application code, you can create additional in-application streams to store intermediate query results.

 


You can optionally partition a single streaming source into multiple in-application input streams to improve throughput. For more information, see Limits (p. 129) and Configuring Application Input (p. 5).

 

Amazon Kinesis Data Analytics provides a timestamp column in each in-application stream called ROWTIME; for more information, see Timestamps and the ROWTIME Column (p. 66). You can use this column in time-based windowed queries. For more information, see Windowed Queries (p. 69).

 

You can optionally configure a reference data source to enrich your input data stream within the application. It results in an in-application reference table. You must store your reference data as an object in your S3 bucket. When the application starts, Amazon Kinesis Data Analytics reads the Amazon S3 object and creates an in-application table. For more information, see Configuring Application Input (p. 5).

 

• Application code – A series of SQL statements that process input and produce output. You can write SQL statements against in-application streams and reference tables, and you can write JOIN queries to combine data from both of these sources.

 

For information about the SQL language elements that are supported by Kinesis Data Analytics, see Amazon Kinesis Data Analytics SQL Reference.

 

In its simplest form, application code can be a single SQL statement that selects from a streaming input and inserts results into a streaming output. It can also be a series of SQL statements where the output of one feeds into the input of the next SQL statement. Further, you can write application code to split an input stream into multiple streams and then apply additional queries to process these streams. For more information, see Application Code (p. 30).

 

• Output – In application code, query results go to in-application streams. In your application code, you can create one or more in-application streams to hold intermediate results. You can then optionally configure application output to persist data from the in-application streams that hold your application output (also referred to as in-application output streams) to external destinations. External destinations can be a Kinesis data delivery stream or a Kinesis data stream. Note the following about these destinations:

• You can configure a Kinesis data delivery stream to write results to Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES).

 

• You can also write application output to a custom destination, instead of Amazon S3 or Amazon Redshift. To do that, you specify a Kinesis data stream as the destination in your output configuration. Then, you configure AWS Lambda to poll the stream and invoke your Lambda function. Your Lambda function code receives stream data as input. In your Lambda function code, you can write the incoming data to your custom destination, as shown in the sketch after this list. For more information, see Using AWS Lambda with Amazon Kinesis Data Analytics.

For more information, see Configuring Application Output (p. 31).
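To make the Lambda path concrete, the following is a minimal sketch of such a function in Python. It assumes the function is subscribed to the Kinesis data stream that your application writes to, and that the application emits JSON records; the write to the custom destination is left as a placeholder.

import base64
import json

def lambda_handler(event, context):
    # Kinesis event records carry base64-encoded payloads
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        output_row = json.loads(payload)  # assumes JSON application output
        # Placeholder: forward output_row to your custom destination here
        print(output_row)
    return {"recordsProcessed": len(event["Records"])}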


In addition, note the following:

• Amazon Kinesis Data Analytics needs permissions to read records from a streaming source and write application output to the external destinations. You use IAM roles to grant these permissions.

 

• Kinesis Data Analytics automatically provides an in-application error stream for each application. If your application has issues while processing certain records, for example because of a type mismatch or late arrival, that record will be written to the error stream. You can configure application output to direct Kinesis Data Analytics to persist the error stream data to an external destination for further evaluation. For more information, see Error Handling (p. 39).

 

• Amazon Kinesis Data Analytics ensures that your application output records are written to the configured destination. It uses an "at least once" processing and delivery model, even in the event of an application interruption for various reasons. For more information, see Delivery Model for Persisting Application Output to an External Destination (p. 38).

Topics

• Configuring Application Input (p. 5)

• Application Code (p. 30)

• Configuring Application Output (p. 31)

• Error Handling (p. 39)

• Granting Amazon Kinesis Data Analytics Permissions to Access Streaming Sources (Creating an IAM Role) (p. 40)

• Automatically Scaling Applications to Increase Throughput (p. 42)

Configuring Application Input

Your Amazon Kinesis Data Analytics application can receive input from a single streaming source and, optionally, use one reference data source. For more information, see Amazon Kinesis Data Analytics: How It Works (p. 3). The sections in this topic describe the application input sources.

Topics

• Configuring a Streaming Source (p. 5)

• Configuring a Reference Source (p. 7)

• Working with JSONPath (p. 9)

• Mapping Streaming Source Elements to SQL Input Columns (p. 12)

• Using the Schema Discovery Feature on Streaming Data (p. 16)

• Using the Schema Discovery Feature on Static Data (p. 18)

• Preprocessing Data Using a Lambda Function (p. 20)

• Parallelizing Input Streams for Increased Throughput (p. 27)

Configuring a Streaming Source

At the time that you create an application, you specify a streaming source. You can also modify an input after you create the application. Amazon Kinesis Data Analytics supports the following streaming sources for your application:


• A Kinesis data stream
• A Kinesis Data Firehose delivery stream

Kinesis Data Analytics continuously polls the streaming source for new data and ingests it in in-application streams according to the input configuration. Your application code can query the in-application stream. As part of the input configuration, you provide the following:

• Streaming source – You provide the Amazon Resource Name (ARN) of the stream and an IAM role that Kinesis Data Analytics can assume to access the stream on your behalf.

• In-application stream name prefix – When you start the application, Kinesis Data Analytics creates the specified in-application stream. In your application code, you access the in-application stream using this name.

You can optionally map a streaming source to multiple in-application streams. For more information, see Limits (p. 129). In this case, Amazon Kinesis Data Analytics creates the specified number of in-application streams with names as follows: prefix_001, prefix_002, and prefix_003. By default, Kinesis Data Analytics maps the streaming source to one in-application stream named prefix_001.

There is a limit on the rate that you can insert rows in an in-application stream. Therefore, Kinesis Data Analytics supports multiple such in-application streams so that you can bring records into your application at a much faster rate. If you find that your application is not keeping up with the data in the streaming source, you can add units of parallelism to improve performance.

• Mapping schema – You describe the record format (JSON, CSV) on the streaming source. You also describe how each record on the stream maps to columns in the in-application stream that is created. This is where you provide column names and data types.

Note: Kinesis Data Analytics adds quotation marks around the identifiers (stream name and column names) when creating the input in-application stream. When querying this stream and the columns, you must specify them in quotation marks using the same casing (matching lowercase and uppercase letters exactly). For more information about identifiers, see Identifiers in the Amazon Kinesis Data Analytics SQL Reference.

You can create an application and configure inputs in the Amazon Kinesis Data Analytics console. The console then makes the necessary API calls. You can configure application input when you create a new application, or you can add an input configuration to an existing application. For more information, see CreateApplication (p. 163) and AddApplicationInput (p. 152). The following is the input configuration part of the CreateApplication API request body:

"Inputs": [ { "InputSchema": { "RecordColumns": [ { "IsDropped": boolean, "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": {

6

Page 13: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideConfiguring a Reference Source

"RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "KinesisFirehoseInput": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsInput": { "ResourceARN": "string", "RoleARN": "string" }, "Name": "string" } ]

Configuring a Reference Source

You can also optionally add a reference data source to an existing application to enrich the data coming in from streaming sources. You must store reference data as an object in your Amazon S3 bucket. When the application starts, Amazon Kinesis Data Analytics reads the Amazon S3 object and creates an in-application reference table. Your application code can then join it with an in-application stream.

You store reference data in the Amazon S3 object using supported formats (CSV, JSON). For example, suppose that your application performs analytics on stock orders. Assume the following record format on the streaming source:

Ticker, SalePrice, OrderId

AMZN, $700, 1003
XYZ, $250, 1004
...

In this case, you might then consider maintaining a reference data source to provide details for each stock ticker, such as company name.

Ticker, Company
AMZN, Amazon
XYZ, SomeCompany
...

Amazon Kinesis Data Analytics provides the following API actions to manage reference data sources:

• AddApplicationReferenceDataSource (p. 160)
• UpdateApplication (p. 192)

Note: The Kinesis Data Analytics console does not support managing reference data sources for your applications. You can use the AWS Command Line Interface (AWS CLI) to add a reference data source to your application. For an example, see Example: Adding Reference Data to an Amazon Kinesis Data Analytics Application (p. 90).

Note the following:

• If the application is running, Kinesis Data Analytics creates an in-application reference table, and then loads the reference data immediately.


• If the application is not running (for example, it's in the ready state), Kinesis Data Analytics saves only the updated input configuration. When the application starts running, Kinesis Data Analytics loads the reference data in your application as a table.

Suppose that you want to refresh the data after Kinesis Data Analytics creates the in-application reference table. Perhaps you updated the Amazon S3 object, or you want to use a different Amazon S3 object. In this case, you must explicitly call UpdateApplication (p. 192). Kinesis Data Analytics does not refresh the in-application reference table automatically.
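A Boto3 sketch of such a refresh follows. The field names inside ApplicationUpdate follow the *Update naming convention of this API version, and the reference ID and version ID shown are placeholders that you would obtain from DescribeApplication; pointing at the same bucket and key simply forces a reload.

import boto3

client = boto3.client("kinesisanalytics")

client.update_application(
    ApplicationName="ExampleApp",      # placeholder
    CurrentApplicationVersionId=5,     # from describe_application
    ApplicationUpdate={
        "ReferenceDataSourceUpdates": [
            {
                "ReferenceId": "1.1",  # assigned when the source was added
                "S3ReferenceDataSourceUpdate": {
                    "BucketARNUpdate": "arn:aws:s3:::examplebucket",
                    "FileKeyUpdate": "TickerReference.csv",
                },
            }
        ]
    },
)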

There is a limit on the size of the Amazon S3 object that you can create as a reference data source. For more information, see Limits (p. 129). If the object size exceeds the limit, Kinesis Data Analytics can't load the data. The application state appears as running, but the data is not being read.

When you add a reference data source, you provide the following information:

• S3 bucket and object key name – In addition to the bucket name and object key, you also provide an IAM role that Kinesis Data Analytics can assume to read the object on your behalf.

• In-application reference table name – Kinesis Data Analytics creates this in-application table and populates it by reading the Amazon S3 object. This is the table name you specify in your application code.

• Mapping schema – You describe the record format (JSON, CSV) and the encoding of the data stored in the Amazon S3 object. You also describe how each data element maps to columns in the in-application reference table.

The following shows the request body in the AddApplicationReferenceDataSource API request.

{
    "ApplicationName": "string",
    "CurrentApplicationVersionId": number,
    "ReferenceDataSource": {
        "ReferenceSchema": {
            "RecordColumns": [
                {
                    "IsDropped": boolean,
                    "Mapping": "string",
                    "Name": "string",
                    "SqlType": "string"
                }
            ],
            "RecordEncoding": "string",
            "RecordFormat": {
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordColumnDelimiter": "string",
                        "RecordRowDelimiter": "string"
                    },
                    "JSONMappingParameters": {
                        "RecordRowPath": "string"
                    }
                },
                "RecordFormatType": "string"
            }
        },
        "S3ReferenceDataSource": {
            "BucketARN": "string",
            "FileKey": "string",
            "ReferenceRoleARN": "string"
        },
        "TableName": "string"
    }
}
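
If you manage applications with the AWS SDK rather than raw CLI JSON, the same request body maps directly onto an SDK call. The following is a minimal boto3 sketch, assuming hypothetical application, bucket, and role names (my-application, my-bucket, my-role) that you replace with your own; it registers the stock-ticker reference table from the earlier example.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_reference_data_source(
    ApplicationName="my-application",          # hypothetical application name
    CurrentApplicationVersionId=1,             # get the current version from DescribeApplication
    ReferenceDataSource={
        "TableName": "CompanyNames",           # in-application reference table name
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::my-bucket",                         # placeholder
            "FileKey": "tickers.csv",                                      # placeholder
            "ReferenceRoleARN": "arn:aws:iam::123456789012:role/my-role",  # placeholder
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordColumnDelimiter": ",",
                        "RecordRowDelimiter": "\n",
                    }
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "Ticker", "SqlType": "VARCHAR(4)"},
                {"Name": "Company", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)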

Working with JSONPath

JSONPath is a standardized way to query elements of a JSON object. JSONPath uses path expressions to navigate elements, nested elements, and arrays in a JSON document. For more information about JSON, see Introducing JSON.

Accessing JSON Elements with JSONPath

Following, you can find how to use JSONPath expressions to access various elements in JSON-formatted data. For the examples in this section, assume that the source stream contains a JSON record in the following format.

{
    "customerName":"John Doe",
    "address": {
        "streetAddress": {
            "number":"123",
            "street":"AnyStreet"
        },
        "city":"Anytown"
    },
    "orders": [
        { "orderId":"23284", "itemName":"Widget", "itemPrice":"33.99" },
        { "orderId":"63122", "itemName":"Gadget", "itemPrice":"22.50" },
        { "orderId":"77284", "itemName":"Sprocket", "itemPrice":"12.00" }
    ]
}

Accessing JSON Elements

To query an element in JSON data using JSONPath, use the following syntax. Here, $ represents the root of the data hierarchy and elementName is the name of the element node to query.

$.elementName

The following expression queries the customerName element in the preceding JSON example.

$.customerName

The preceding expression returns the following from the preceding JSON record.

John Doe

Note
Path expressions are case sensitive. The expression $.customername returns null from the preceding JSON example.

Note
If no element appears at the location that the path expression specifies, the expression returns null. The following expression returns null from the preceding JSON example, because there is no matching element.


$.customerId

Accessing Nested JSON Elements

To query a nested JSON element, use the following syntax.

$.parentElement.element

The following expression queries the city element in the preceding JSON example.

$.address.city

The preceding expression returns the following from the preceding JSON record.

Anytown

You can query further levels of subelements using the following syntax.

$.parentElement.element.subElement

The following expression queries the street element in the preceding JSON example.

$.address.streetAddress.street

The preceding expression returns the following from the preceding JSON record.

AnyStreet

Accessing Arrays

Arrays are queried using an array index expression inside square brackets ([]). Currently, the only index expression supported is 0:, meaning that all the elements in the array are returned.

The format of the data returned depends on whether the array index expression is the last expression in the path:

• When the array index is the last expression in the path expression, all of the contents of the array are returned as a single field in a single data row.

• When there is a nested expression after the array index expression, the array is "flattened." In other words, each element in the array is returned as a separate data row.

To query the entire contents of an array as a single row, use the following syntax.

$.arrayObject[0:]

The following expression queries the entire contents of the orders element in the preceding JSON example. It returns the array contents in a single column in a single row.

$.orders[0:]

The preceding expression returns the following from the preceding JSON record.


[{"orderId":"23284","itemName":"Widget","itemPrice":"33.99"}, {"orderId":"63122","itemName":"Gadget","itemPrice":"22.50"}, {"orderId":"77284","itemName":"Sprocket","itemPrice":"12.00"}]

To query the individual elements in an array as separate rows, use the following syntax.

$.arrayObject[0:].element

The following expression queries the orderId elements in the preceding JSON example, and returns each array element as a separate row.

$.orders[0:].orderId

The preceding expression returns the following from the preceding JSON record, with each data item returned as a separate row.

23284

63122

77284

Note
If expressions that query nonarray elements are included in a schema that queries individual array elements, the nonarray elements are repeated for each element in the array. For example, suppose that a schema for the preceding JSON example includes the following expressions:

• $.customerName
• $.orders[0:].orderId

In this case, the returned data rows from the sample input stream element resemble the following, with the name element repeated for every orderId element.

John Doe 23284

John Doe 63122

John Doe 77284

Note
The following limitations apply to array expressions in Amazon Kinesis Data Analytics:

• Only one level of dereferencing is supported in an array expression. The following expression format is not supported.

$.arrayObject[0:].element[0:].subElement

• Only one array can be flattened in a schema. Multiple arrays can be referenced and returned as one row containing all of the elements in the array. However, only one array can have each of its elements returned as individual rows.

A schema containing elements in the following format is valid. This format returns the contents of the second array as a single column, repeated for every element in the first array.


$.arrayObjectOne[0:].element
$.arrayObjectTwo[0:]

A schema containing elements in the following format is not valid.

$.arrayObjectOne[0:].element
$.arrayObjectTwo[0:].element

Other Considerations

Additional considerations for working with JSONPath are as follows:

• If no arrays are accessed by an individual element in the JSONPath expression, then a single row is created for each JSON record processed. Every JSONPath expression corresponds to a single column.

• When an array is flattened, any missing elements result in a null value being created in the in-application stream.

• An array is always flattened to at least one row. If no values would be returned (that is, the array is empty or none of its elements are queried), a single row with all null values is returned.

The following expression returns records with null values from the preceding JSON example, because there is no matching element at the specified path.

$.orders[0:].itemId

The preceding expression returns the following from the preceding JSON example record.

null

null

null

Related Topics

• Introducing JSON

Mapping Streaming Source Elements to SQL Input Columns

With Amazon Kinesis Data Analytics, you can process and analyze streaming data in either JSON or CSV formats using standard SQL.

• To process and analyze streaming CSV data, you assign column names and data types for the columns of the input stream. Your application imports one column from the input stream per column definition, in order.

You don't have to include all of the columns in the application input stream, but you cannot skip columns from the source stream. For example, you can import the first three columns from an input stream containing five elements, but you cannot import only columns 1, 2, and 4.


• To process and analyze streaming JSON data, you use JSONPath expressions to map JSON elements from a streaming source to SQL columns in an input stream. For more information about using JSONPath with Amazon Kinesis Data Analytics, see Working with JSONPath (p. 9). The columns in the SQL table have data types that are mapped from JSON types. For supported data types, see Data Types. For details about converting JSON data to SQL data, see Mapping JSON Data Types to SQL Data Types (p. 15).

For more information about how to configure input streams, see Configuring Application Input (p. 5).

Mapping JSON Data to SQL Columns

You can map JSON elements to input columns using the AWS Management Console or the Kinesis Data Analytics API.

• To map elements to columns using the console, see Working with the Schema Editor (p. 55).
• To map elements to columns using the Kinesis Data Analytics API, see the following section.

To map JSON elements to columns in the in-application input stream, you need a schema with the following information for each column:

• Source Expression: The JSONPath expression that identifies the location of the data for the column.
• Column Name: The name that your SQL queries use to reference the data.
• Data Type: The SQL data type for the column.

Using the API

To map elements from a streaming source to input columns, you can use the Kinesis Data Analytics API CreateApplication (p. 163) action. To create the in-application stream, specify a schema to transform your data into a schematized version used in SQL. The CreateApplication (p. 163) action configures your application to receive input from a single streaming source. To map JSON elements or CSV columns to SQL columns, you create a RecordColumn (p. 247) object in the SourceSchema (p. 258) RecordColumns array. The RecordColumn (p. 247) object has the following schema:

{
    "Mapping": "String",
    "Name": "String",
    "SqlType": "String"
}

The fields in the RecordColumn (p. 247) object have the following values:

• Mapping: The JSONPath expression that identifies the location of the data in the input stream record. This value is not present for an input schema for a source stream in CSV format.

• Name: The column name in the in-application SQL data stream.
• SqlType: The data type of the data in the in-application SQL data stream.

JSON Input Schema Example

The following example demonstrates the format of the InputSchema value for a JSON schema.

"InputSchema": {
    "RecordColumns": [
        {
            "SqlType": "VARCHAR(4)",
            "Name": "TICKER_SYMBOL",
            "Mapping": "$.TICKER_SYMBOL"
        },
        {
            "SqlType": "VARCHAR(16)",
            "Name": "SECTOR",
            "Mapping": "$.SECTOR"
        },
        {
            "SqlType": "TINYINT",
            "Name": "CHANGE",
            "Mapping": "$.CHANGE"
        },
        {
            "SqlType": "DECIMAL(5,2)",
            "Name": "PRICE",
            "Mapping": "$.PRICE"
        }
    ],
    "RecordFormat": {
        "MappingParameters": {
            "JSONMappingParameters": {
                "RecordRowPath": "$"
            }
        },
        "RecordFormatType": "JSON"
    },
    "RecordEncoding": "UTF-8"
}

CSV Input Schema Example

The following example demonstrates the format of the InputSchema value for a schema in comma-separated value (CSV) format.

"InputSchema": {
    "RecordColumns": [
        {
            "SqlType": "VARCHAR(16)",
            "Name": "LastName"
        },
        {
            "SqlType": "VARCHAR(16)",
            "Name": "FirstName"
        },
        {
            "SqlType": "INTEGER",
            "Name": "CustomerId"
        }
    ],
    "RecordFormat": {
        "MappingParameters": {
            "CSVMappingParameters": {
                "RecordColumnDelimiter": ",",
                "RecordRowDelimiter": "\n"
            }
        },
        "RecordFormatType": "CSV"
    },
    "RecordEncoding": "UTF-8"
}

Mapping JSON Data Types to SQL Data Types

JSON data types are converted to corresponding SQL data types according to the application's input schema. For information about supported SQL data types, see Data Types. Amazon Kinesis Data Analytics converts JSON data types to SQL data types according to the following rules.

Null Literal

A null literal in the JSON input stream (that is, "City":null) converts to a SQL null regardless of the destination data type.

Boolean Literal

A Boolean literal in the JSON input stream (that is, "Contacted":true) converts to SQL data as follows:

• Numeric (DECIMAL, INT, and so on): true converts to 1; false converts to 0.
• Binary (BINARY or VARBINARY):
  • true: Result has the lowest bit set and the remaining bits cleared.
  • false: Result has all bits cleared.
  Conversion to VARBINARY results in a value 1 byte in length.
• BOOLEAN: Converts to the corresponding SQL BOOLEAN value.
• Character (CHAR or VARCHAR): Converts to the corresponding string value (true or false). The value is truncated to fit the length of the field.
• Datetime (DATE, TIME, or TIMESTAMP): Conversion fails and a coercion error is written to the error stream.

Number

A number literal in the JSON input stream (that is, "CustomerId":67321) converts to SQL data as follows:

• Numeric (DECIMAL, INT, and so on): Converts directly. If the converted value exceeds the size or precision of the target data type (that is, converting 123.4 to INT), conversion fails and a coercion error is written to the error stream.
• Binary (BINARY or VARBINARY): Conversion fails and a coercion error is written to the error stream.
• BOOLEAN:
  • 0: Converts to false.
  • All other numbers: Converts to true.
• Character (CHAR or VARCHAR): Converts to a string representation of the number.
• Datetime (DATE, TIME, or TIMESTAMP): Conversion fails and a coercion error is written to the error stream.

String

A string value in the JSON input stream (that is, "CustomerName":"John Doe") converts to SQL data as follows:


• Numeric (DECIMAL, INT, and so on): Amazon Kinesis Data Analytics attempts to convert the value to the target data type. If the value cannot be converted, conversion fails and a coercion error is written to the error stream.
• Binary (BINARY or VARBINARY): If the source string is a valid binary literal (that is, X'3F67A23A', with an even number of hexadecimal digits), the value is converted to the target data type. Otherwise, conversion fails and a coercion error is written to the error stream.
• BOOLEAN: If the source string is "true", converts to true. This comparison is case insensitive. Otherwise, converts to false.
• Character (CHAR or VARCHAR): Converts to the string value in the input. If the value is longer than the target data type, it is truncated and no error is written to the error stream.
• Datetime (DATE, TIME, or TIMESTAMP): If the source string is in a format that can be converted to the target value, the value is converted. Otherwise, conversion fails and a coercion error is written to the error stream.

Valid datetime formats include:

• "1992-02-14"

• "1992-02-14 18:35:44.0"

Array or Object

An array or object in the JSON input stream converts to SQL data as follows:

• Character (CHAR or VARCHAR): Converts to the source text of the array or object. See Accessing Arrays (p. 10).

• All other data types: Conversion fails and a coercion error is written to the error stream.

For an example of a JSON array, see Working with JSONPath (p. 9).

Related Topics

• Configuring Application Input (p. 5)

• Data Types

• Working with the Schema Editor (p. 55)

• CreateApplication (p. 163)

• RecordColumn (p. 247)

• SourceSchema (p. 258)

Using the Schema Discovery Feature on Streaming Data

Providing an input schema that describes how records on the streaming input map to an in-application stream can be cumbersome and error prone. You can use the DiscoverInputSchema (p. 182) API (called the discovery API) to infer a schema. Using random samples of records on the streaming source, the API can infer a schema (that is, column names, data types, and position of the data element in the incoming data).

Note
To use the Discovery API to generate a schema from a file stored in Amazon S3, see Using the Schema Discovery Feature on Static Data (p. 18).


The console uses the Discovery API to generate a schema for a specified streaming source. Using the console, you can also update the schema, including adding or removing columns, changing column names or data types, and so on. However, make changes carefully to ensure that you do not create an invalid schema.
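
If you are not using the console, you can call the discovery API directly. The following is a minimal boto3 sketch, assuming placeholder stream and role ARNs; the service samples records from the stream and returns the inferred schema in InputSchema.

import boto3

client = boto3.client("kinesisanalytics")

response = client.discover_input_schema(
    ResourceARN="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # placeholder
    RoleARN="arn:aws:iam::123456789012:role/my-role",                       # placeholder
    InputStartingPositionConfiguration={"InputStartingPosition": "NOW"},    # sample from the tip of the stream
)

# The inferred column names, data types, and record format are in InputSchema.
print(response["InputSchema"])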

After you finalize a schema for your in-application stream, there are functions you can use to manipulate string and datetime values. You can use these functions in your application code when working with rows in the resulting in-application stream. For more information, see Example: Manipulating Strings and Date Times (p. 76).

Column Naming During Schema Discovery

During schema discovery, Amazon Kinesis Data Analytics tries to retain as much of the original column name as possible from the streaming input source, except in the following cases:

• The source stream column name is a reserved SQL keyword, such as TIMESTAMP, USER, VALUES, or YEAR.

• The source stream column name contains unsupported characters. Only letters, numbers, and the underscore character ( _ ) are supported.

• The source stream column name begins with a number.
• The source stream column name is longer than 100 characters.

If a column is renamed, the renamed schema column name begins with COL_. In some cases, none of the original column name can be retained (for example, if the entire name is unsupported characters). In such a case, the column is named COL_#, with # being a number indicating the column's place in the column order.

After discovery completes, you can update the schema using the console to add or remove columns, or change column names, data types, or data size.

Examples of Discovery-Suggested Column Names

Source Stream Column Name | Discovery-Suggested Column Name
USER | COL_USER
USER@DOMAIN | COL_USERDOMAIN
@@ | COL_0

Schema Discovery Issues

What happens if Kinesis Data Analytics does not infer a schema for a given streaming source?

Kinesis Data Analytics infers your schema for common formats, such as CSV and JSON, which are UTF-8 encoded. Kinesis Data Analytics supports any UTF-8 encoded records (including raw text like application logs and records) with a custom column and row delimiter. If Kinesis Data Analytics doesn't infer a schema, you can define a schema manually using the schema editor on the console (or using the API).

If your data does not follow a pattern (which you can specify using the schema editor), you can define a schema as a single column of type VARCHAR(N), where N is the largest number of characters you expect your record to include. From there, you can use string and date-time manipulation to structure your data after it is in an in-application stream. For examples, see Example: Manipulating Strings and Date Times (p. 76).


Using the Schema Discovery Feature on Static Data

The schema discovery feature can generate a schema from either the data in a stream or data in a static file that is stored in an Amazon S3 bucket. Suppose that you want to generate a schema for a Kinesis Data Analytics application for reference purposes or when live streaming data isn't available. You can use the schema discovery feature on a static file that contains a sample of the data in the expected format of your streaming or reference data. Kinesis Data Analytics can run schema discovery on sample data from a JSON or CSV file that's stored in an Amazon S3 bucket. Using schema discovery on a data file uses the DiscoverInputSchema (p. 182) API with the S3Configuration parameter specified.

To run discovery on a static file, you provide the API with an S3Configuration structure with the following information:

• BucketARN: The Amazon Resource Name (ARN) of the Amazon S3 bucket that contains the file. For the format of an Amazon S3 bucket ARN, see Amazon Resource Names (ARNs) and AWS Service Namespaces: Amazon Simple Storage Service (Amazon S3).

• RoleARN: The ARN of an IAM role with the AmazonS3ReadOnlyAccess policy. For information about how to add a policy to a role, see Modifying a Role.

• FileKey: The file name of the object.

Note
Generating a schema from a data file is currently not available in the AWS Management Console.

To generate a schema from an Amazon S3 object using the DiscoverInputSchema API

1. Make sure that you have the AWS CLI set up. For more information, see Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 45) in the Getting Started section.

2. Create a file named data.csv with the following contents:

year,month,state,producer_type,energy_source,units,consumption
2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615
2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535
2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890
2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601
2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681

3. Sign in to the Amazon S3 console at https://console.aws.amazon.com/s3/.
4. Create an Amazon S3 bucket and upload the data.csv file you created. Note the ARN of the created bucket. For information about creating an Amazon S3 bucket and uploading a file, see Getting Started with Amazon Simple Storage Service.

5. Open the IAM console at https://console.aws.amazon.com/iam/. Create a role with the AmazonS3ReadOnlyAccess policy. Note the ARN of the new role. For information about creating a role, see Creating a Role to Delegate Permissions to an AWS Service. For information about how to add a policy to a role, see Modifying a Role.

6. Run the following DiscoverInputSchema command in the AWS CLI, substituting the ARNs for your Amazon S3 bucket and IAM role:

$ aws kinesisanalytics discover-input-schema --s3-configuration '{ "RoleARN": "arn:aws:iam::123456789012:role/service-role/your-IAM-role", "BucketARN": "arn:aws:s3:::your-bucket-name", "FileKey": "data.csv" }'

7. The response will look similar to the following:

{
    "InputSchema": {
        "RecordEncoding": "UTF-8",
        "RecordColumns": [
            { "SqlType": "INTEGER", "Name": "COL_year" },
            { "SqlType": "INTEGER", "Name": "COL_month" },
            { "SqlType": "VARCHAR(4)", "Name": "state" },
            { "SqlType": "VARCHAR(64)", "Name": "producer_type" },
            { "SqlType": "VARCHAR(4)", "Name": "energy_source" },
            { "SqlType": "VARCHAR(16)", "Name": "units" },
            { "SqlType": "INTEGER", "Name": "consumption" }
        ],
        "RecordFormat": {
            "RecordFormatType": "CSV",
            "MappingParameters": {
                "CSVMappingParameters": {
                    "RecordRowDelimiter": "\r\n",
                    "RecordColumnDelimiter": ","
                }
            }
        }
    },
    "RawInputRecords": [
        "year,month,state,producer_type,energy_source,units,consumption\r\n2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615\r\n2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535\r\n2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890\r\n2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601\r\n2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681"
    ],
    "ParsedInputRecords": [
        [ null, null, "state", "producer_type", "energy_source", "units", null ],
        [ "2001", "1", "AK", "TotalElectricPowerIndustry", "Coal", "ShortTons", "47615" ],
        [ "2001", "1", "AK", "ElectricGeneratorsElectricUtilities", "Coal", "ShortTons", "16535" ],
        [ "2001", "1", "AK", "CombinedHeatandPowerElectricPower", "Coal", "ShortTons", "22890" ],
        [ "2001", "1", "AL", "TotalElectricPowerIndustry", "Coal", "ShortTons", "3020601" ],
        [ "2001", "1", "AL", "ElectricGeneratorsElectricUtilities", "Coal", "ShortTons", "2987681" ]
    ]
}

Preprocessing Data Using a Lambda Function

If the data in your stream needs format conversion, transformation, enrichment, or filtering, you can preprocess the data using an AWS Lambda function. You can do this before your application SQL code executes or before your application creates a schema from your data stream.

Using a Lambda function for preprocessing records is useful in the following scenarios:

• Transforming records from other formats (such as KPL or GZIP) into formats that Kinesis Data Analytics can analyze. Kinesis Data Analytics currently supports JSON or CSV data formats.

• Expanding data into a format that is more accessible for operations such as aggregation or anomaly detection. For instance, if several data values are stored together in a string, you can expand the data into separate columns.

• Data enrichment with other AWS services, such as extrapolation or error correction.
• Applying complex string transformation to record fields.
• Data filtering for cleaning up the data.


Using a Lambda Function for Preprocessing Records

When creating your Kinesis Data Analytics application, you enable Lambda preprocessing on the Connect to a Source page.

To use a Lambda function to preprocess records in a Kinesis Data Analytics application

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. On the Connect to a Source page for your application, choose Enabled in the Record pre-processing with AWS Lambda section.

3. To use a Lambda function that you have already created, choose the function in the Lambda function drop-down list.

4. To create a new Lambda function from one of the Lambda preprocessing templates, choose the template from the drop-down list. Then choose View <template name> in Lambda to edit the function.

5. To create a new Lambda function, choose Create new. For information about creating a Lambda function, see Create a HelloWorld Lambda Function and Explore the Console.

6. Choose the version of the Lambda function to use. To use the latest version, choose $LATEST.

When you choose or create a Lambda function for record preprocessing, the records are preprocessed before your application SQL code executes or your application generates a schema from the records.

Lambda Preprocessing Permissions

To use Lambda preprocessing, the application's IAM role requires the following permissions policy:

{
    "Sid": "UseLambdaFunction",
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction",
        "lambda:GetFunctionConfiguration"
    ],
    "Resource": "<FunctionARN>"
}

For more information about adding permissions policies, see Authentication and Access Control for Amazon Kinesis Data Analytics (p. 136).

Lambda Preprocessing Metrics

You can monitor the number of Lambda invocations, bytes processed, successes and failures, and so on, using Amazon CloudWatch. For information about CloudWatch metrics that are emitted by Kinesis Data Analytics Lambda preprocessing, see Amazon Kinesis Analytics Metrics.

Data Preprocessing Event Input Data Model / Record Response Model

To preprocess records, your Lambda function must be compliant with the required event input data and record response models.

Event Input Data Model

Kinesis Data Analytics continuously reads data from your Kinesis data stream or Kinesis Data Firehose delivery stream. For each batch of records it retrieves, the service manages how each batch gets passed to your Lambda function. Your function receives a list of records as input. Within your function, you iterate through the list and apply your business logic to accomplish your preprocessing requirements (such as data format conversion or enrichment).

The input model to your preprocessing function varies slightly, depending on whether the data was received from a Kinesis data stream or a Kinesis Data Firehose delivery stream.

If the source is a Kinesis Data Firehose delivery stream, the event input data model is as follows:

Kinesis Data Firehose Request Data Model

• invocationId – The Lambda invocation ID (random GUID).
• applicationArn – The Kinesis Data Analytics application Amazon Resource Name (ARN).
• streamArn – The delivery stream ARN.
• records:
  • recordId – The record ID (random GUID).
  • kinesisFirehoseRecordMetadata:
    • approximateArrivalTimestamp – The approximate arrival time of the record in the delivery stream.
  • data – The Base64-encoded source record payload.

If the source is a Kinesis data stream, the event input data model is as follows:

Kinesis Streams Request Data Model

• invocationId – The Lambda invocation ID (random GUID).
• applicationArn – The Kinesis Data Analytics application ARN.
• streamArn – The source stream ARN.
• records:
  • recordId – The record ID, based on the Kinesis record sequence number.
  • kinesisStreamRecordMetadata:
    • sequenceNumber – The sequence number from the Kinesis stream record.
    • partitionKey – The partition key from the Kinesis stream record.
    • shardId – The shard ID from the Kinesis stream record.
    • approximateArrivalTimestamp – The approximate arrival time of the record in the stream.
  • data – The Base64-encoded source record payload.

Record Response Model

All records that are sent to the Lambda function (with record IDs) must be returned to the Kinesis Data Analytics service. They must contain the following parameters, or Kinesis Data Analytics rejects them and treats it as a data preprocessing failure. The data payload part of the record can be transformed to accomplish preprocessing requirements.

Response Data Model

records:
• recordId – The record ID is passed from Kinesis Data Analytics to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data preprocessing failure.
• result – The status of the data transformation of the record. The possible values are:
  • Ok: The record was transformed successfully. Kinesis Data Analytics ingests the record for SQL processing.
  • Dropped: The record was dropped intentionally by your processing logic. Kinesis Data Analytics drops the record from SQL processing. The data payload field is optional for a Dropped record.
  • ProcessingFailed: The record could not be transformed. Kinesis Data Analytics considers it unsuccessfully processed by your Lambda function and writes an error to the error stream. For more information about the error stream, see Error Handling (p. 39). The data payload field is optional for a ProcessingFailed record.
• data – The transformed data payload, after base64 encoding. Each data payload can contain multiple JSON documents if the application ingestion data format is JSON. Or each can contain multiple CSV rows (with a row delimiter specified in each row) if the application ingestion data format is CSV. The Kinesis Data Analytics service successfully parses and processes data with either multiple JSON documents or CSV rows within the same data payload.

Common Data Preprocessing Failures

The following are common reasons why preprocessing can fail.

• Not all records (with record IDs) in a batch that are sent to the Lambda function are returned to the Kinesis Data Analytics service.

• The response is missing either the record ID, status, or data payload field. The data payload field is optional for a Dropped or ProcessingFailed record.

• The Lambda function timeout is not sufficient to preprocess the data.
• The Lambda function response exceeds the response limits imposed by the AWS Lambda service.

In the case of data preprocessing failures, the Kinesis Data Analytics service continues to retry Lambda invocations on the same set of records until successful. You can monitor the following CloudWatch metrics to gain insight into failures.

• Kinesis Data Analytics application MillisBehindLatest: Indicates how far behind an application is reading from the streaming source.

• Kinesis Data Analytics application InputPreprocessing CloudWatch metrics: Indicates the number of successes and failures, among other statistics. For more information, see Amazon Kinesis Analytics Metrics.

• AWS Lambda function CloudWatch metrics and logs.

Creating Lambda Functions for Preprocessing

Your Amazon Kinesis Data Analytics application can use Lambda functions for preprocessing records as they are ingested into the application. Kinesis Data Analytics provides the following templates on the console to use as a starting point for preprocessing your data.

Topics
• Creating a Preprocessing Lambda Function in Node.js (p. 24)
• Creating a Preprocessing Lambda Function in Python (p. 25)
• Creating a Preprocessing Lambda Function in Java (p. 25)
• Creating a Preprocessing Lambda Function in .NET (p. 26)

Creating a Preprocessing Lambda Function in Node.js

The following templates for creating a preprocessing Lambda function in Node.js are available on the Kinesis Data Analytics console:


• General Kinesis Data Analytics Input Processing (Node.js 6.10) – A Kinesis Data Analytics record preprocessor that receives JSON or CSV records as input and then returns them with a processing status. Use this processor as a starting point for custom transformation logic.

• Compressed Input Processing (Node.js 6.10) – A Kinesis Data Analytics record processor that receives compressed (GZIP or Deflate compressed) JSON or CSV records as input and returns decompressed records with a processing status.

Creating a Preprocessing Lambda Function in Python

The following templates for creating a preprocessing Lambda function in Python are available on the console:

• General Kinesis Analytics Input Processing (Python 2.7) – A Kinesis Data Analytics record preprocessor that receives JSON or CSV records as input and then returns them with a processing status. Use this processor as a starting point for custom transformation logic.

• KPL Input Processing (Python 2.7) – A Kinesis Data Analytics record processor that receives Kinesis Producer Library (KPL) aggregates of JSON or CSV records as input and returns disaggregated records with a processing status.
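
The console templates above are the intended starting points. For orientation, the following is a minimal hand-written Python sketch of the same pass-through pattern, following the event input and record response models described earlier; replace the pass-through step with your own transformation logic.

import base64

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Decode the Base64-encoded source record payload.
        payload = base64.b64decode(record["data"])

        # Add your transformation logic here; this sketch passes data through unchanged.
        transformed = payload

        # Return each record with the same recordId and a processing status.
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}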

Creating a Preprocessing Lambda Function in Java

To create a Lambda function in Java for preprocessing records, use the Java events classes.

The following code demonstrates a sample Lambda function that preprocesses records using Java:

public class LambdaFunctionHandler
        implements RequestHandler<KinesisAnalyticsStreamsInputPreprocessingEvent, KinesisAnalyticsInputPreprocessingResponse> {

    @Override
    public KinesisAnalyticsInputPreprocessingResponse handleRequest(
            KinesisAnalyticsStreamsInputPreprocessingEvent event, Context context) {
        context.getLogger().log("InvocationId is : " + event.invocationId);
        context.getLogger().log("StreamArn is : " + event.streamArn);
        context.getLogger().log("ApplicationArn is : " + event.applicationArn);

        List<KinesisAnalyticsInputPreprocessingResponse.Record> records =
                new ArrayList<KinesisAnalyticsInputPreprocessingResponse.Record>();
        KinesisAnalyticsInputPreprocessingResponse response =
                new KinesisAnalyticsInputPreprocessingResponse(records);

        event.records.stream().forEach(record -> {
            context.getLogger().log("recordId is : " + record.recordId);
            context.getLogger().log("record aat is :" +
                    record.kinesisStreamRecordMetadata.approximateArrivalTimestamp);

            // Add your record.data pre-processing logic here.
            // response.records.add(new Record(record.recordId,
            //         KinesisAnalyticsInputPreprocessingResult.Ok, <preprocessedrecordData>));
        });
        return response;
    }
}

Creating a Preprocessing Lambda Function in .NET

To create a Lambda function in .NET for preprocessing records, use the .NET events classes.

The following code demonstrates a sample Lambda function that preprocesses records using C#:

public class Function
{
    public KinesisAnalyticsInputPreprocessingResponse FunctionHandler(KinesisAnalyticsStreamsInputPreprocessingEvent evnt, ILambdaContext context)
    {
        context.Logger.LogLine($"InvocationId: {evnt.InvocationId}");
        context.Logger.LogLine($"StreamArn: {evnt.StreamArn}");
        context.Logger.LogLine($"ApplicationArn: {evnt.ApplicationArn}");

        var response = new KinesisAnalyticsInputPreprocessingResponse
        {
            Records = new List<KinesisAnalyticsInputPreprocessingResponse.Record>()
        };

        foreach (var record in evnt.Records)
        {
            context.Logger.LogLine($"\tRecordId: {record.RecordId}");
            context.Logger.LogLine($"\tShardId: {record.RecordMetadata.ShardId}");
            context.Logger.LogLine($"\tPartitionKey: {record.RecordMetadata.PartitionKey}");
            context.Logger.LogLine($"\tRecord ApproximateArrivalTime: {record.RecordMetadata.ApproximateArrivalTimestamp}");
            context.Logger.LogLine($"\tData: {record.DecodeData()}");

            // Add your record preprocessing logic here.
            var preprocessedRecord = new KinesisAnalyticsInputPreprocessingResponse.Record
            {
                RecordId = record.RecordId,
                Result = KinesisAnalyticsInputPreprocessingResponse.OK
            };
            preprocessedRecord.EncodeData(record.DecodeData().ToUpperInvariant());
            response.Records.Add(preprocessedRecord);
        }
        return response;
    }
}

For more information about creating Lambda functions for preprocessing and destinations in .NET, see Amazon.Lambda.KinesisAnalyticsEvents.


Parallelizing Input Streams for Increased Throughput

Amazon Kinesis Data Analytics applications can support multiple in-application input streams, to scale an application beyond the throughput of a single in-application input stream. For more information on in-application input streams, see Amazon Kinesis Data Analytics: How It Works (p. 3).

In almost all cases, Amazon Kinesis Data Analytics scales your application to handle the capacity of the Amazon Kinesis streams or Amazon Kinesis Data Firehose source streams that feed into your application. However, if your source stream's throughput exceeds the throughput of a single in-application input stream, you can explicitly increase the number of in-application input streams that your application uses. You do so with the InputParallelism parameter.

When the InputParallelism parameter is greater than one, Amazon Kinesis Data Analytics evenly splits the partitions of your source stream among the in-application streams. For instance, if your source stream has 50 shards, and you have set InputParallelism to 2, each in-application input stream receives the input from 25 source stream shards.

When you increase the number of in-application streams, your application must access the data in each stream explicitly. For information on accessing multiple in-application streams in your code, see Accessing Separate In-Application Streams in Your Amazon Kinesis Data Analytics Application (p. 28).

Although Kinesis data stream and Kinesis Data Firehose delivery stream shards are both divided among in-application streams in the same way, they differ in the way they appear to your application:

• The records from a Kinesis data stream include a shard_id field that can be used to identify the source shard for the record.

• The records from a Kinesis Data Firehose delivery stream don't include a field that identifies the record's source shard or partition, because Kinesis Data Firehose abstracts this information away from your application.

Evaluating Whether to Increase Your Number of In-Application Input Streams

In most cases, a single in-application input stream can handle the throughput of a single source stream, depending on the complexity and data size of the input streams. To determine if you need to increase the number of in-application input streams, you can monitor the MillisBehindLatest metric in Amazon CloudWatch. If the MillisBehindLatest metric has either of the following characteristics, you should increase your application's InputParallelism setting:

• The MillisBehindLatest metric is gradually increasing, indicating that your application is falling behind the latest data in the stream.

• The MillisBehindLatest metric is consistently above 1000 (one second).

You don't need to increase your application's InputParallelism setting if the following are true:

• The MillisBehindLatest metric is gradually decreasing, indicating that your application is catching up to the latest data in the stream.

• The MillisBehindLatest metric is below 1000 (one second).

For more information on using CloudWatch, see the CloudWatch User Guide.
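
As a rough sketch of that check, the following boto3 snippet polls the metric over the last hour. It assumes the AWS/KinesisAnalytics CloudWatch namespace and an Application dimension with a placeholder application name; verify the exact namespace and dimensions that your application emits before relying on this.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/KinesisAnalytics",                                 # assumed namespace
    MetricName="MillisBehindLatest",
    Dimensions=[{"Name": "Application", "Value": "my-application"}],  # assumed dimension; placeholder name
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

# If the average stays above 1000 (one second), consider raising InputParallelism.
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])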


Implementing Multiple In-Application Input Streams

You can set the number of in-application input streams when an application is created using CreateApplication (p. 163). You set this number after an application is created using UpdateApplication (p. 192).

Note
You can only set the InputParallelism setting using the Amazon Kinesis Data Analytics API or the AWS CLI. You cannot set this setting using the AWS Management Console. For information on setting up the CLI, see Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 45).

Setting a New Application's Input Stream Count

The following example demonstrates how to use the CreateApplication API action to set a new application's input stream count to 2.

For more information on CreateApplication, see CreateApplication (p. 163).

{
    "ApplicationCode": "<The SQL code the new application will run on the input stream>",
    "ApplicationDescription": "<A friendly description for the new application>",
    "ApplicationName": "<The name for the new application>",
    "Inputs": [
        {
            "InputId": "ID for the new input stream",
            "InputParallelism": {
                "Count": 2
            }
        }
    ],
    "Outputs": [
        ...
    ]
}

Setting an Existing Application's Input Stream Count

The following example demonstrates how to use the UpdateApplication API action to set an existing application's input stream count to 2.

For more information on UpdateApplication, see UpdateApplication (p. 192).

{
    "InputUpdates": [
        {
            "InputId": "yourInputId",
            "InputParallelismUpdate": {
                "CountUpdate": 2
            }
        }
    ]
}
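
The preceding InputUpdates fragment is the heart of the UpdateApplication request. As a minimal boto3 sketch, assuming a placeholder application name and an InputId taken from DescribeApplication, the full call looks like the following.

import boto3

client = boto3.client("kinesisanalytics")

# Look up the current application version; UpdateApplication requires it.
app = client.describe_application(ApplicationName="my-application")  # placeholder name
version_id = app["ApplicationDetail"]["ApplicationVersionId"]

client.update_application(
    ApplicationName="my-application",
    CurrentApplicationVersionId=version_id,
    ApplicationUpdate={
        "InputUpdates": [
            {
                "InputId": "yourInputId",  # from DescribeApplication
                "InputParallelismUpdate": {"CountUpdate": 2},
            }
        ]
    },
)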

Accessing Separate In-Application Streams in Your Amazon Kinesis Data Analytics Application

To use multiple in-application input streams in your application, you must explicitly select from the different streams. The following code example demonstrates how to query multiple input streams in the application created in the Getting Started tutorial.


In the following example, each source stream is first aggregated using COUNT before being combined into a single in-application stream called in_application_stream_001. Aggregating the source streams beforehand helps make sure that the combined in-application stream can handle the traffic from multiple streams without being overloaded.

Note
To run this example and get results from both in-application input streams, you need to update both the number of shards in your source stream and the InputParallelism parameter in your application.

CREATE OR REPLACE STREAM in_application_stream_001 (
    ticker VARCHAR(64),
    ticker_count INTEGER);

CREATE OR REPLACE PUMP pump001 AS
INSERT INTO in_application_stream_001
SELECT STREAM ticker_symbol, COUNT(ticker_symbol)
FROM source_sql_stream_001
GROUP BY STEP(source_sql_stream_001.rowtime BY INTERVAL '60' SECOND), ticker_symbol;

CREATE OR REPLACE PUMP pump002 AS
INSERT INTO in_application_stream_001
SELECT STREAM ticker_symbol, COUNT(ticker_symbol)
FROM source_sql_stream_002
GROUP BY STEP(source_sql_stream_002.rowtime BY INTERVAL '60' SECOND), ticker_symbol;

The preceding code example produces rows in in_application_stream_001 that contain each ticker symbol and its count for each 60-second interval.

Additional Considerations

When using multiple input streams, be aware of the following:

• The maximum number of in-application input streams is 64.

• The in-application input streams are distributed evenly among the shards of the application's input stream.

• The performance gains from adding in-application streams don't scale linearly. That is, doubling the number of in-application streams doesn't double throughput. With a typical row size, each in-application stream can achieve throughput of about 5,000 to 15,000 rows per second. By increasing the in-application stream count to 10, you can achieve a throughput of 20,000 to 30,000 rows per second. Throughput speed is dependent on the count, data types, and data size of the fields in the input stream.

• Some aggregate functions (such as AVG) can produce unexpected results when applied to input streams partitioned into different shards. Because you need to run the aggregate operation on individual shards before combining them into an aggregate stream, the results might be weighted toward whichever stream contains more records.

• If your application continues to experience poor performance (reflected by a high MillisBehindLatest metric) after you increase your number of input streams, you might have reached your limit of Kinesis Processing Units (KPUs). For more information, see Automatically Scaling Applications to Increase Throughput (p. 42).

Application Code

Application code is a series of SQL statements that process input and produce output. These SQL statements operate on in-application streams and reference tables. For more information, see Amazon Kinesis Data Analytics: How It Works (p. 3).

For information about the SQL language elements that are supported by Kinesis Data Analytics, see Amazon Kinesis Data Analytics SQL Reference.

In relational databases, you work with tables, using INSERT statements to add records and the SELECT statement to query the data. In Amazon Kinesis Data Analytics, you work with streams. You can write a SQL statement to query these streams. The results of querying one in-application stream are always sent to another in-application stream. When performing complex analytics, you might create several in-application streams to hold the results of intermediate analytics. And then finally, you configure application output to persist results of the final analytics (from one or more in-application streams) to external destinations. In summary, the following is a typical pattern for writing application code:

• The SELECT statement is always used in the context of an INSERT statement. That is, when you select rows, you insert results into another in-application stream.

• The INSERT statement is always used in the context of a pump. That is, you use pumps to write to an in-application stream.

The following example application code reads records from one in-application stream (SOURCE_SQL_STREAM_001) and writes them to another in-application stream (DESTINATION_SQL_STREAM). You can insert records into in-application streams using pumps, as shown following:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

-- Create a pump and insert into the output stream.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker_symbol, change, price
        FROM "SOURCE_SQL_STREAM_001";

The identifiers that you specify for stream names and column names follow standard SQL conventions. For example, if you put quotation marks around an identifier, it makes the identifier case sensitive. If you don't, the identifier defaults to uppercase. For more information about identifiers, see Identifiers in the Amazon Kinesis Data Analytics SQL Reference.

Your application code can consist of many SQL statements. For example:


• You can write SQL queries in a sequential manner where the result of one SQL statement feeds into the next SQL statement.

• You can also write SQL queries that run independently of each other. For example, you can write two SQL statements that query the same in-application stream, but send output into different in-application streams. You can then query the newly created in-application streams independently.

You can create in-application streams to save intermediate results. You insert data into in-application streams using pumps. For more information, see In-Application Streams and Pumps (p. 65).

If you add an in-application reference table, you can write SQL to join data in in-application streams and reference tables. For more information, see Example: Adding Reference Data to an Amazon Kinesis Data Analytics Application (p. 90).

Amazon Kinesis Data Analytics writes data from specific in-application streams to the external destination according to the application's output configuration. Make sure that your application code writes to the in-application streams specified in the output configuration.

For more information, see the following topics:

• Streaming SQL Concepts (p. 65)
• Amazon Kinesis Data Analytics SQL Reference

Configuring Application Output

In your application code, you write the output of SQL statements to one or more in-application streams. You can optionally add an output configuration to your application to persist everything written to an in-application stream to an external destination such as an Amazon Kinesis data stream, a Kinesis Data Firehose delivery stream, or an AWS Lambda function.

There is a limit on the number of external destinations you can use to persist an application output. For more information, see Limits (p. 129).

Note
We recommend that you use one external destination to persist in-application error stream data so that you can investigate the errors.

In each of these output configurations, you provide the following:

• In-application stream name – The stream that you want to persist to an external destination.

Amazon Kinesis Data Analytics looks for the in-application stream that you specified in the output configuration (note that the stream name is case sensitive and must match exactly). Make sure that your application code creates this in-application stream.

• External destination – You can persist data to a Kinesis data stream, a Kinesis Data Firehose delivery stream, or a Lambda function. You provide the Amazon Resource Name (ARN) of the stream or function, and an IAM role that Kinesis Data Analytics can assume to write to the stream or function on your behalf. You also describe the record format (JSON, CSV) for Kinesis Data Analytics to use when writing to the external destination.

If Amazon Kinesis Data Analytics can't write to the streaming or Lambda destination, the service continues to try indefinitely. This creates back pressure, causing your application to fall behind. If this is not resolved, your application eventually stops processing new data. You can monitor Amazon Kinesis Analytics Metrics and set alarms for failures. For more information about metrics and alarms, see Using Amazon CloudWatch Metrics and Creating Amazon CloudWatch Alarms.


You can configure the application output using the AWS Management Console. The console makes the API call to save the configuration.

Creating an Output Using the AWS CLI

This section describes how to create the Outputs section of the request body for a CreateApplication or AddApplicationOutput operation.

Creating a Kinesis Stream Output

The following JSON fragment shows the Outputs section in the CreateApplication request body for creating an Amazon Kinesis data stream destination.

"Outputs": [
    {
        "DestinationSchema": {
            "RecordFormatType": "string"
        },
        "KinesisStreamsOutput": {
            "ResourceARN": "string",
            "RoleARN": "string"
        },
        "Name": "string"
    }
]

Creating a Kinesis Data Firehose Delivery Stream Output

The following JSON fragment shows the Outputs section in the CreateApplication request body for creating an Amazon Kinesis Data Firehose delivery stream destination.

"Outputs": [
    {
        "DestinationSchema": {
            "RecordFormatType": "string"
        },
        "KinesisFirehoseOutput": {
            "ResourceARN": "string",
            "RoleARN": "string"
        },
        "Name": "string"
    }
]

Creating a Lambda Function Output

The following JSON fragment shows the Outputs section in the CreateApplication request body for creating an AWS Lambda function destination.

"Outputs": [
    {
        "DestinationSchema": {
            "RecordFormatType": "string"
        },
        "LambdaOutput": {
            "ResourceARN": "string",
            "RoleARN": "string"
        },
        "Name": "string"
    }
]

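Any of these Outputs fragments can also be added to an existing application through the SDK. The following is a minimal boto3 sketch of adding the Lambda destination, assuming placeholder application, function, and role names.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_output(
    ApplicationName="my-application",            # placeholder
    CurrentApplicationVersionId=1,               # get the current version from DescribeApplication
    Output={
        "Name": "DESTINATION_SQL_STREAM",        # in-application stream to persist
        "LambdaOutput": {
            "ResourceARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",  # placeholder
            "RoleARN": "arn:aws:iam::123456789012:role/my-role",                          # placeholder
        },
        "DestinationSchema": {"RecordFormatType": "JSON"},
    },
)
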
Using a Lambda Function as Output

Using AWS Lambda as a destination allows you to more easily perform post-processing of your SQL results before sending them to a final destination. Common post-processing tasks include the following:

• Aggregating multiple rows into a single record

• Combining current results with past results to address late-arriving data

• Delivering to different destinations based on the type of information

• Record format translation (such as translating to Protobuf)

• String manipulation or transformation

• Data enrichment after analytical processing

• Custom processing for geospatial use cases

• Data encryption

Lambda functions can deliver analytic information to a variety of AWS services and other destinations, including the following:

• Amazon Simple Storage Service (Amazon S3)

• Custom APIs

• Amazon DynamoDB

• Amazon Aurora

• Amazon Redshift

• Amazon Simple Notification Service (Amazon SNS)

• Amazon Simple Queue Service (Amazon SQS)

• Amazon CloudWatch

For more information about creating Lambda applications, see Getting Started with AWS Lambda.

Topics

• Lambda as Output Permissions (p. 33)

• Lambda as Output Metrics (p. 34)

• Lambda as Output Event Input Data Model and Record Response Model (p. 34)

• Lambda Output Invocation Frequency (p. 35)

• Adding a Lambda Function for Use as an Output (p. 35)

• Common Lambda as Output Failures (p. 36)

• Creating Lambda Functions for Application Destinations (p. 36)

Lambda as Output Permissions

To use Lambda as output, the application's Lambda output IAM role requires the following permissions policy:


{
    "Sid": "UseLambdaFunction",
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction",
        "lambda:GetFunctionConfiguration"
    ],
    "Resource": "FunctionARN"
}

Lambda as Output Metrics

You use Amazon CloudWatch to monitor the number of bytes sent, successes and failures, and so on. For information about CloudWatch metrics that are emitted by Kinesis Data Analytics using Lambda as output, see Amazon Kinesis Analytics Metrics.

Lambda as Output Event Input Data Model and Record Response Model

To send Kinesis Data Analytics output records, your Lambda function must be compliant with the required event input data and record response models.

Event Input Data Model

Kinesis Data Analytics continuously sends the output records from the application to the Lambda as output function with the following request model. Within your function, you iterate through the list and apply your business logic to accomplish your output requirements (such as data transformation before sending to a final destination).

• invocationId – The Lambda invocation ID (random GUID).
• applicationArn – The Kinesis Data Analytics application Amazon Resource Name (ARN).
• records:
  • recordId – The record ID (random GUID).
  • lambdaDeliveryRecordMetadata:
    • retryHint – The number of delivery retries.
  • data – The Base64-encoded output record payload.

Note
The retryHint is a value that increases for every delivery failure. This value is not durably persisted, and resets if the application is disrupted.
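Putting the fields together, a request sent to the Lambda as output function looks similar to the following. The identifiers and Base64 payload shown here are illustrative sample values only.

{
    "invocationId": "00540a87-5050-496a-84e4-e7d92bbaf5e2",
    "applicationArn": "arn:aws:kinesisanalytics:us-east-1:123456789012:application/my-application",
    "records": [
        {
            "recordId": "49572672223665514422805246926656954630972486059535892482",
            "lambdaDeliveryRecordMetadata": {
                "retryHint": 0
            },
            "data": "eyJ0aWNrZXJfc3ltYm9sIjoiQU1aTiIsInByaWNlIjo3ODR9"
        }
    ]
}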


Record Response Model

Each record sent to your Lambda as output function (with record IDs) must be acknowledged with either Ok or DeliveryFailed and must contain the following parameters. Otherwise, Kinesis Data Analytics treats them as a delivery failure.

records

Field      Description

recordId   The record ID is passed from Kinesis Data Analytics to Lambda during the invocation. Any mismatch between the ID of the original record and the ID of the acknowledged record is treated as a delivery failure.

result     The status of the delivery of the record. The following are possible values:

           • Ok: The record was transformed successfully and sent to the final destination.

           • DeliveryFailed: The record was not delivered successfully to the final destination by the Lambda as output function. Kinesis Data Analytics continuously retries sending the delivery failed records to the Lambda as output function.
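The console templates implement this contract for you. As a rough sketch only (not the console template itself), a Python handler that satisfies the event and response models could look like the following; the per-record try/except guards against the unhandled exceptions described later under Common Lambda as Output Failures.

import base64

def lambda_handler(event, context):
    """Acknowledge each Kinesis Data Analytics output record with Ok or DeliveryFailed."""
    results = []
    for record in event["records"]:
        try:
            payload = base64.b64decode(record["data"])
            # Add logic here to transform the payload and deliver it to a
            # final destination of your choice.
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Reporting DeliveryFailed causes Kinesis Data Analytics to retry the record.
            results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
    return {"records": results}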

Lambda Output Invocation Frequency

A Kinesis data analytics application buffers the output records and invokes the AWS Lambda destination function frequently.

• If records are emitted to the destination in-application stream within the data analytics application as a tumbling window, the AWS Lambda destination function is invoked per tumbling window trigger. For example, if a tumbling window of 60 seconds is used to emit the records to the destination in-application stream, the AWS Lambda function is invoked once every 60 seconds.
• If records are emitted to the destination in-application stream within the data analytics application as a continuous query or a sliding window, the AWS Lambda destination function is invoked approximately once per second.

Note
Per-Lambda function invoke request payload size limits apply. Exceeding those limits results in output records being split and sent across multiple Lambda function calls.

Adding a Lambda Function for Use as an Output

The following procedure demonstrates how to add a Lambda function as an output for an Amazon Kinesis data analytics application.

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose the application in the list, and then choose Application details.

3. In the Destination section, choose Connect new destination.


4. For the Destination item, choose AWS Lambda function.
5. In the Deliver records to AWS Lambda section, either choose an existing Lambda function or choose Create new.
6. If you are creating a new Lambda function, do the following:

a. Choose one of the templates provided. For more information, see Creating Lambda Functions for Application Destinations (p. 36).
b. The Create Function page opens in a new browser tab. In the Name box, give the function a meaningful name (for example, myLambdaFunction).
c. Update the template with post-processing functionality for your application. For information about creating a Lambda function, see Getting Started in the AWS Lambda Developer Guide.
d. On the Kinesis Data Analytics console, in the Lambda function drop-down list, choose the Lambda function that you just created.

7. In the In-application stream section, choose Choose an existing in-application stream. For In-application stream name, choose your application's output stream. The results from the selected output stream are sent to the Lambda output function.

8. Leave the rest of the form with the default values, and choose Save and continue.

Your application now sends records from the in-application stream to your Lambda function. You can see the results of the default template in the Amazon CloudWatch console. Monitor the AWS/KinesisAnalytics/LambdaDelivery.OkRecords metric to see the number of records being delivered to the Lambda function.

Common Lambda as Output Failures

The following are common reasons why delivery to a Lambda function can fail.

• Not all records (with record IDs) in a batch that are sent to the Lambda function are returned to the Kinesis Data Analytics service.
• The response is missing either the record ID or the status field.
• The Lambda function timeouts are not sufficient to accomplish the business logic within the Lambda function.
• The business logic within the Lambda function does not catch all the errors, resulting in a timeout and backpressure due to unhandled exceptions. These are often referred to as “poison pill” messages.

In the case of data delivery failures, Kinesis Data Analytics continues to retry Lambda invocations on the same set of records until successful. To gain insight into failures, you can monitor the following CloudWatch metrics:

• Kinesis Data Analytics application Lambda as Output CloudWatch metrics: Indicates the number of successes and failures, among other statistics. For more information, see Amazon Kinesis Analytics Metrics.

• AWS Lambda function CloudWatch metrics and logs.
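You can also read these metrics programmatically. The following sketch queries the LambdaDelivery.OkRecords metric mentioned earlier using Boto3; the Application dimension value and the time range are assumptions that you should adjust for your own application.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Sum of records delivered successfully to the Lambda function over the last
# hour, in five-minute buckets.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/KinesisAnalytics",
    MetricName="LambdaDelivery.OkRecords",
    Dimensions=[{"Name": "Application", "Value": "my-application"}],  # assumed dimension value
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])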

Creating Lambda Functions for Application Destinations

Your Kinesis Data Analytics application can use AWS Lambda functions as an output. Kinesis Data Analytics provides templates for creating Lambda functions to use as a destination for your applications. Use these templates as a starting point for post-processing output from your application.

Topics
• Creating a Lambda Function Destination in Node.js (p. 37)


• Creating a Lambda Function Destination in Python (p. 37)

• Creating a Lambda Function Destination in Java (p. 37)

• Creating a Lambda Function Destination in .NET (p. 38)

Creating a Lambda Function Destination in Node.js

The following template for creating a destination Lambda function in Node.js is available on the console:

Lambda as Output Blueprint   Language and Version   Description

kinesis-analytics-output     Node.js 6.10           Deliver output records from a Kinesis Data Analytics application to a custom destination.

Creating a Lambda Function Destination in Python

The following templates for creating a destination Lambda function in Python are available on the console:

Lambda as Output Blueprint     Language and Version   Description

kinesis-analytics-output-sns   Python 2.7             Deliver output records from a Kinesis Data Analytics application to Amazon SNS.

kinesis-analytics-output-ddb   Python 2.7             Deliver output records from a Kinesis Data Analytics application to Amazon DynamoDB.
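The console shows the full source of a template when you select it. As an illustrative sketch only (not the kinesis-analytics-output-sns blueprint itself), a Python destination function that publishes each record to Amazon SNS might look like the following; the topic ARN is a placeholder.

import base64
import boto3

sns = boto3.client("sns")

# Placeholder topic ARN for illustration.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-topic"

def lambda_handler(event, context):
    results = []
    for record in event["records"]:
        try:
            # Decode the record payload and publish it to the topic.
            message = base64.b64decode(record["data"]).decode("utf-8")
            sns.publish(TopicArn=TOPIC_ARN, Message=message)
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
    return {"records": results}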

Creating a Lambda Function Destination in Java

To create a destination Lambda function in Java, use the Java events classes.

The following code demonstrates a sample destination Lambda function using Java:

public class LambdaFunctionHandler
        implements RequestHandler<KinesisAnalyticsOutputDeliveryEvent, KinesisAnalyticsOutputDeliveryResponse> {

    @Override
    public KinesisAnalyticsOutputDeliveryResponse handleRequest(KinesisAnalyticsOutputDeliveryEvent event,
            Context context) {
        context.getLogger().log("InvocationId is : " + event.invocationId);
        context.getLogger().log("ApplicationArn is : " + event.applicationArn);

        List<KinesisAnalyticsOutputDeliveryResponse.Record> records =
                new ArrayList<KinesisAnalyticsOutputDeliveryResponse.Record>();
        KinesisAnalyticsOutputDeliveryResponse response = new KinesisAnalyticsOutputDeliveryResponse(records);

        event.records.stream().forEach(record -> {
            context.getLogger().log("recordId is : " + record.recordId);
            context.getLogger().log("record retryHint is : " + record.lambdaDeliveryRecordMetadata.retryHint);
            // Add logic here to transform and send the record to the final destination of your choice.
            response.records.add(new Record(record.recordId, KinesisAnalyticsOutputDeliveryResult.Ok));
        });
        return response;
    }
}

Creating a Lambda Function Destination in .NET

To create a destination Lambda function in .NET, use the .NET events classes.

The following code demonstrates a sample destination Lambda function using C#:

public class Function
{
    public KinesisAnalyticsOutputDeliveryResponse FunctionHandler(KinesisAnalyticsOutputDeliveryEvent evnt, ILambdaContext context)
    {
        context.Logger.LogLine($"InvocationId: {evnt.InvocationId}");
        context.Logger.LogLine($"ApplicationArn: {evnt.ApplicationArn}");

        var response = new KinesisAnalyticsOutputDeliveryResponse
        {
            Records = new List<KinesisAnalyticsOutputDeliveryResponse.Record>()
        };

        foreach (var record in evnt.Records)
        {
            context.Logger.LogLine($"\tRecordId: {record.RecordId}");
            context.Logger.LogLine($"\tRetryHint: {record.RecordMetadata.RetryHint}");
            context.Logger.LogLine($"\tData: {record.DecodeData()}");

            // Add logic here to send the record to the final destination of your choice.

            var deliveredRecord = new KinesisAnalyticsOutputDeliveryResponse.Record
            {
                RecordId = record.RecordId,
                Result = KinesisAnalyticsOutputDeliveryResponse.OK
            };
            response.Records.Add(deliveredRecord);
        }
        return response;
    }
}

For more information about creating Lambda functions for pre-processing and destinations in .NET, see Amazon.Lambda.KinesisAnalyticsEvents.

Delivery Model for Persisting Application Output to an External Destination

Amazon Kinesis Data Analytics uses an "at least once" delivery model for application output to the configured destinations. When an application is running, Kinesis Data Analytics takes internal checkpoints. These checkpoints are points in time when output records have been delivered to the destinations without data loss. The service uses the checkpoints as needed to ensure that your application output is delivered at least once to the configured destinations.

In a normal situation, your application processes incoming data continuously, and Kinesis Data Analytics writes the output to the configured destinations such as a Kinesis data stream or a Kinesis Data Firehose delivery stream. However, your application can be interrupted occasionally; for example:

• You choose to stop your application and restart it later.
• You delete the IAM role that Kinesis Data Analytics needs to write your application output to the configured destination. Without the IAM role, Kinesis Data Analytics does not have any permissions to write to the external destination on your behalf.
• A network outage or other internal service failures cause your application to stop running momentarily.

When your application restarts, Kinesis Data Analytics ensures that it continues to process and write output from a point before or equal to when the failure occurred. This helps ensure that it doesn't miss delivering any application output to the configured destinations.

Suppose that you configured multiple destinations from the same in-application stream. After the application recovers from failure, Kinesis Data Analytics resumes persisting output to the configured destinations from the last record that was delivered to the slowest destination. This might result in the same output record being delivered more than once to other destinations. In this case, you must handle potential duplications in the destination externally.
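A common way to handle such duplicates externally is to make the destination write idempotent, keyed on a unique record attribute. The following is a minimal sketch, assuming a hypothetical DynamoDB table named dedup_table with recordId as its partition key.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def write_once(record_id, payload):
    """Write a record only if its ID has not been stored before."""
    try:
        dynamodb.put_item(
            TableName="dedup_table",  # hypothetical table name
            Item={"recordId": {"S": record_id}, "payload": {"S": payload}},
            # The conditional write fails if the record ID already exists,
            # so redelivered duplicates are silently dropped.
            ConditionExpression="attribute_not_exists(recordId)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise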

Error Handling

Amazon Kinesis Data Analytics returns API or SQL errors directly to you. For more information about API operations, see Actions (p. 149). For more information about handling SQL errors, see Amazon Kinesis Data Analytics SQL Reference.

Amazon Kinesis Data Analytics reports runtime errors using an in-application error stream called error_stream.

Reporting Errors Using an In-Application Error Stream

Amazon Kinesis Data Analytics reports runtime errors to the in-application error stream called error_stream. The following are examples of errors that might occur:

• A record read from the streaming source does not conform to the input schema.
• Your application code specifies division by zero.
• The rows are out of order (for example, a record appears on the stream with a ROWTIME value that a user modified, causing the record to go out of order).
• The data in the source stream can't be converted to the data type specified in the schema (coercion error). For information about what data types can be converted, see Mapping JSON Data Types to SQL Data Types (p. 15).

We recommend that you handle these errors programmatically in your SQL code and/or persist the data on the error stream to an external destination. This requires that you add an output configuration (see Configuring Application Output (p. 31)) to your application. For an example of how the in-application error stream works, see Example: Explore the In-Application Error Stream (p. 118).


Error Stream Schema

The error stream has the following schema:

Field          Data Type        Notes

ERRORTIME      TIMESTAMP        The time when the error occurred
ERROR_LEVEL    VARCHAR(10)
ERROR_NAME     VARCHAR(32)
MESSAGE        VARCHAR(4096)
DATA_ROWTIME   TIMESTAMP        The row time of the incoming record
DATA_ROW       VARCHAR(49152)   The hex-encoded data in the original row
PUMP_NAME      VARCHAR(128)     The originating pump, as defined with CREATE PUMP
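Because DATA_ROW is hex-encoded, you typically decode it when inspecting rows that landed on the error stream. A short Python sketch, assuming the original row was UTF-8 text:

# DATA_ROW as read from the error stream (illustrative value).
data_row = "7b227469636b65725f73796d626f6c223a22414d5a4e227d"

# Decode the hex-encoded bytes back into the original row text.
original_row = bytes.fromhex(data_row).decode("utf-8")
print(original_row)  # {"ticker_symbol":"AMZN"}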

Granting Amazon Kinesis Data Analytics Permissions to Access Streaming Sources (Creating an IAM Role)

Amazon Kinesis Data Analytics needs permissions to read records from a streaming source that you specify in your application input configuration. Amazon Kinesis Data Analytics also needs permissions to write your application output to streams that you specify in your application output configuration.

You can grant these permissions by creating an IAM role that Amazon Kinesis Data Analytics can assume. Permissions that you grant to this role determine what Amazon Kinesis Data Analytics can do when the service assumes the role.

Note
The information in this section is useful if you want to create an IAM role yourself. When you create an application in the Amazon Kinesis Data Analytics console, the console can create an IAM role for you at that time. The console uses the following naming convention for IAM roles that it creates:

kinesis-analytics-ApplicationName

After the role is created, you can review the role and attached policies in the IAM console.

Each IAM role has two policies attached to it. In the trust policy, you specify who can assume the role. In the permissions policy (there can be one or more), you specify the permissions that you want to grant to this role. The following sections describe these policies, which you can use when you create an IAM role.

Trust Policy

To grant Amazon Kinesis Data Analytics permissions to assume a role, you can attach the following trust policy to an IAM role:


{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "kinesisanalytics.amazonaws.com" }, "Action": "sts:AssumeRole" } ]}

Permissions Policy

If you are creating an IAM role to allow Amazon Kinesis Data Analytics to read from an application's streaming source, you must grant permissions for relevant read actions. Depending on your streaming source (for example, a Kinesis stream or a Kinesis Data Firehose delivery stream), you can attach the following permissions policy.

Permissions Policy for Reading a Kinesis Stream

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInputKinesis",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords"
            ],
            "Resource": [
                "arn:aws:kinesis:aws-region:aws-account-id:stream/inputStreamName"
            ]
        }
    ]
}

Permissions Policy for Reading a Kinesis Data Firehose Delivery Stream

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInputFirehose",
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:Get*"
            ],
            "Resource": [
                "arn:aws:firehose:aws-region:aws-account-id:deliverystream/inputFirehoseName"
            ]
        }
    ]
}

If you direct Amazon Kinesis Data Analytics to write output to external destinations in your application output configuration, you need to grant the following permission to the IAM role.

Permissions Policy for Writing to a Kinesis Stream

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WriteOutputKinesis",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:PutRecord",
                "kinesis:PutRecords"
            ],
            "Resource": [
                "arn:aws:kinesis:aws-region:aws-account-id:stream/output-stream-name"
            ]
        }
    ]
}

Permissions Policy for Writing to a Firehose Delivery Stream

{ "Version": "2012-10-17", "Statement": [ { "Sid": "WriteOutputFirehose", "Effect": "Allow", "Action": [ "firehose:DescribeDeliveryStream", "firehose:PutRecord", "firehose:PutRecordBatch" ], "Resource": [ "arn:aws:firehose:aws-region:aws-account-id:deliverystream/output-firehose-name" ] } ]}

Automatically Scaling Applications to Increase Throughput

Amazon Kinesis Data Analytics elastically scales your application to accommodate the data throughput of your source stream and your query complexity for most scenarios. Kinesis Data Analytics provisions capacity in the form of Kinesis Processing Units (KPU). A single KPU provides you with the memory (4 GB) and corresponding computing and networking.

The default limit for KPUs for your application is eight. For instructions on how to request an increase to this limit, see To request a limit increase in AWS Service Limits.


Note
The drop-down item that is used to select a limit increase for KPUs is not yet available. When requesting an increase, choose the following options on the support form:

• Regarding: Service limit increase
• Limit Type: Kinesis Analytics
• Region: Select your application's Region
• Limit: Number of applications limit
• New limit value: 100
• Use Case Description: Provide your application prefix, and specify that you are requesting a limit increase for KPUs.


Getting Started with Amazon Kinesis Data Analytics

Following, you can find topics to help get you started using Amazon Kinesis Data Analytics. If you are new to Kinesis Data Analytics, we recommend that you review the concepts and terminology presented in Amazon Kinesis Data Analytics: How It Works (p. 3) before performing the steps in the Getting Started section.

Topics

• Step 1: Set Up an AWS Account and Create an Administrator User (p. 44)

• Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 45)

• Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46)

• Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 55)

Step 1: Set Up an AWS Account and Create an Administrator User

Before you use Amazon Kinesis Data Analytics for the first time, complete the following tasks:

1. Sign Up for AWS (p. 44)

2. Create an IAM User (p. 45)

Sign Up for AWS

When you sign up for Amazon Web Services (AWS), your AWS account is automatically signed up for all services in AWS, including Amazon Kinesis Data Analytics. You are charged only for the services that you use.

With Kinesis Data Analytics, you pay only for the resources you use. If you are a new AWS customer, you can get started with Kinesis Data Analytics for free. For more information, see AWS Free Usage Tier.

If you already have an AWS account, skip to the next task. If you don't have an AWS account, perform the steps in the following procedure to create one.

To create an AWS account

1. Open https://aws.amazon.com/, and then choose Create an AWS Account.

Note
This might be unavailable in your browser if you previously signed into the AWS Management Console. In that case, choose Sign in to a different account, and then choose Create a new AWS account.

2. Follow the online instructions.


Part of the sign-up procedure involves receiving a phone call and entering a PIN using the phone keypad.

Note your AWS account ID because you'll need it for the next task.

Create an IAM User

Services in AWS, such as Amazon Kinesis Data Analytics, require that you provide credentials when you access them so that the service can determine whether you have permissions to access the resources owned by that service. The console requires your password. You can create access keys for your AWS account to access the AWS CLI or API. However, we don't recommend that you access AWS using the credentials for your AWS account. Instead, we recommend that you use AWS Identity and Access Management (IAM). Create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user that you created. You can then access AWS using a special URL and that IAM user's credentials.

If you signed up for AWS, but you haven't created an IAM user for yourself, you can create one using the IAM console.

The Getting Started exercises in this guide assume that you have a user (adminuser) with administrator privileges. Follow the procedure to create adminuser in your account.

To create an administrator user and sign in to the console

1. Create an administrator user called adminuser in your AWS account. For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.
2. A user can sign in to the AWS Management Console using a special URL. For more information, see How Users Sign In to Your Account in the IAM User Guide.

For more information about IAM, see the following:

• AWS Identity and Access Management (IAM)

• Getting Started

• IAM User Guide

Next Step

Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 45)

Step 2: Set Up the AWS Command Line Interface (AWS CLI)

Follow the steps to download and configure the AWS Command Line Interface (AWS CLI).

Important
You don't need the AWS CLI to perform the steps in the Getting Started exercise. However, some of the exercises in this guide use the AWS CLI. You can skip this step and go to Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46), and then set up the AWS CLI later when you need it.


To set up the AWS CLI

1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:

• Getting Set Up with the AWS Command Line Interface

• Configuring the AWS Command Line Interface

2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when executing the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide.

[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region

For a list of available AWS Regions, see Regions and Endpoints in the Amazon Web Services General Reference.

3. Verify the setup by entering the following help command at the command prompt:

aws help
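You can also verify the named profile by making a call with it. The following sketch uses the AWS SDK for Python (Boto3), which reads the same configuration; ListApplications returns the Kinesis Data Analytics applications in the profile's Region.

import boto3

# Use the named profile configured in the previous step.
session = boto3.Session(profile_name="adminuser")
client = session.client("kinesisanalytics")

# Lists the Kinesis Data Analytics applications in the configured Region.
print(client.list_applications())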

Next Step

Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46)

Step 3: Create Your Starter Amazon Kinesis Data Analytics Application

By following the steps in this section, you can create your first Amazon Kinesis data analytics application using the console.

Note
We suggest that you review Amazon Kinesis Data Analytics: How It Works (p. 3) before trying the Getting Started exercise.

For this Getting Started exercise, you can use the console to work with either the demo stream or templates with application code.

• If you choose to use the demo stream, the console creates a Kinesis data stream in your account that is called kinesis-analytics-demo-stream.

A Kinesis data analytics application requires a streaming source. For this source, several SQL examples in this guide use the demo stream kinesis-analytics-demo-stream. The console also runs a script that continuously adds sample data (simulated stock trade records) to this stream, as shown following.


You can use kinesis-analytics-demo-stream as the streaming source for your application in this exercise.

Note
The demo stream remains in your account. You can use it to test other examples in this guide. However, when you leave the console, the script that the console uses stops populating the data. When needed, the console provides the option to start populating the stream again.

• If you choose to use the templates with example application code, you use template code that the console provides to perform simple analytics on the demo stream.

You use these features to quickly set up your first application as follows:

1. Create an application – You only need to provide a name. The console creates the application and the service sets the application state to READY.

 

2. Configure input – First, you add a streaming source, the demo stream. You must create a demo stream in the console before you can use it. Then, the console takes a random sample of records on the demo stream and infers a schema for the in-application input stream that is created. The console names the in-application stream SOURCE_SQL_STREAM_001.

The console uses the discovery API to infer the schema. If necessary, you can edit the inferred schema. For more information, see DiscoverInputSchema (p. 182). Kinesis Data Analytics uses this schema to create an in-application stream.

 

When you start the application, Kinesis Data Analytics reads the demo stream continuously on your behalf and inserts rows in the SOURCE_SQL_STREAM_001 in-application input stream.

 

3. Specify application code – You use a template (called Continuous filter) that provides the following code:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    symbol VARCHAR(4),
    sector VARCHAR(12),
    CHANGE DOUBLE,
    price DOUBLE);

-- Create pump to insert into output.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker_symbol, sector, CHANGE, price
        FROM "SOURCE_SQL_STREAM_001"
        WHERE sector SIMILAR TO '%TECH%';

The application code queries the in-application stream SOURCE_SQL_STREAM_001. The code then inserts the resulting rows in another in-application stream DESTINATION_SQL_STREAM, using pumps. For more information about this coding pattern, see Application Code (p. 30).

For information about the SQL language elements that are supported by Kinesis Data Analytics, see Amazon Kinesis Data Analytics SQL Reference.

4. Configuring output – In this exercise, you don't configure any output. That is, you don't persist data in the in-application stream that your application creates to any external destination. Instead, you verify query results in the console. Additional examples in this guide show how to configure output. For one example, see Example: Simple Alerts (p. 96).

Important
The exercise uses the US East (N. Virginia) Region (us-east-1) to set up the application. You can use any of the supported AWS Regions.

Next Step

Step 3.1: Create an Application (p. 48)

Step 3.1: Create an Application

In this section, you create an Amazon Kinesis data analytics application. You configure application input in the next step.

To create a data analytics application

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.
2. Choose Create new application.
3. On the New application page, type an application name, type a description, and then choose Save and continue.

Doing this creates a Kinesis data analytics application with a status of READY. The console shows the application hub where you can configure input and output.


Note
To create an application, the CreateApplication (p. 163) operation requires only the application name. You can add input and output configuration after you create an application in the console.

In the next step, you configure input for the application. In the input configuration, you add a streaming data source to the application and discover a schema for an in-application input stream by sampling data on the streaming source.

Next Step

Step 3.2: Configure Input (p. 49)

Step 3.2: Configure Input

Your application needs a streaming source. To help you get started, the console can create a demo stream (called kinesis-analytics-demo-stream). The console also runs a script that populates records in the stream.

To add a streaming source to your application

1. On the application hub page in the console, choose Connect to a source.

2. On the page that appears, review the following:

• Source section, where you specify a streaming source for your application. You can select an existing stream source or create one. In this exercise, you create a new stream, the demo stream.

By default, the console names the in-application input stream that is created as INPUT_SQL_STREAM_001. For this exercise, keep this name as it appears.

 


• Stream reference name – This option shows the name of the in-application input stream that is created, SOURCE_SQL_STREAM_001. You can change the name, but for this exercise, keep this name.

In the input configuration, you map the demo stream to an in-application input stream that is created. When you start the application, Amazon Kinesis Data Analytics continuously reads the demo stream and inserts rows in the in-application input stream. You query this in-application input stream in your application code.

• Record pre-processing with AWS Lambda: This option is where you specify an AWS Lambda expression that modifies the records in the input stream before your application code executes. In this exercise, leave the Disabled option selected. For more information about Lambda preprocessing, see Preprocessing Data Using a Lambda Function (p. 20).

After you provide all the information on this page, the console sends an update request (see UpdateApplication (p. 192)) to add the input configuration to the application.

3. On the Source page, choose Configure a new stream.

4. Choose Create demo stream. The console configures the application input by doing the following:

• The console creates a Kinesis data stream called kinesis-analytics-demo-stream.

• The console populates the stream with sample stock ticker data.

• Using the DiscoverInputSchema (p. 182) input action, the console infers a schema by reading sample records on the stream. The schema that is inferred is the schema for the in-application input stream that is created. For more information, see Configuring Application Input (p. 5).
• The console shows the inferred schema and the sample data it read from the streaming source to infer the schema.

The console displays the sample records on the streaming source.

The following appear on the Stream sample console page:

• The Raw stream sample tab shows the raw stream records sampled by the DiscoverInputSchema (p. 182) API action to infer the schema.


• The Formatted stream sample tab shows the tabular version of the data in the Raw stream sample tab.
• If you choose Edit schema, you can edit the inferred schema. For this exercise, don't change the inferred schema. For more information about editing a schema, see Working with the Schema Editor (p. 55).

If you choose Rediscover schema, you can request the console to run DiscoverInputSchema (p. 182) again and infer the schema.

5. Choose Save and continue.

You now have an application with input configuration added to it. In the next step, you add SQL code to perform some analytics on the data in the in-application input stream.

Next Step

Step 3.3: Add Real-Time Analytics (Add Application Code) (p. 51)

Step 3.3: Add Real-Time Analytics (Add Application Code)

You can write your own SQL queries against the in-application stream, but for the following step you use one of the templates that provides sample code.

1. On the application hub page, choose Go to SQL editor.

2. In the Would you like to start running "GSExample1"? dialog box, choose Yes, start application.

The console sends a request to start the application (see StartApplication (p. 188)), and then the SQL editor page appears.

3. The console opens the SQL editor page. Review the page, including the buttons (Add SQL from templates, Save and run SQL) and various tabs.

4. In the SQL editor, choose Add SQL from templates.


5. From the available template list, choose Continuous filter. The sample code reads data from one in-application stream (the WHERE clause filters the rows) and inserts it in another in-application stream as follows:

• It creates the in-application stream DESTINATION_SQL_STREAM.
• It creates a pump STREAM_PUMP, and uses it to select rows from SOURCE_SQL_STREAM_001 and insert them in the DESTINATION_SQL_STREAM.

6. Choose Add this SQL to editor.

7. Test the application code as follows:

Remember, you already started the application (status is RUNNING). Therefore, Amazon Kinesis Data Analytics is already continuously reading from the streaming source and adding rows to the in-application stream SOURCE_SQL_STREAM_001.

a. In the SQL Editor, choose Save and run SQL. The console first sends an update request to save the application code. Then, the code continuously executes.

b. You can see the results in the Real-time analytics tab.

The SQL editor has the following tabs:


• The Source data tab shows an in-application input stream that is mapped to the streaming source. Choose the in-application stream, and you can see data coming in. Note the additional columns in the in-application input stream that weren't specified in the input configuration. These include the following time stamp columns:

  • ROWTIME – Each row in an in-application stream has a special column called ROWTIME. This column is the time stamp when Amazon Kinesis Data Analytics inserted the row in the first in-application stream (the in-application input stream that is mapped to the streaming source).

  • Approximate_Arrival_Time – Each Kinesis Data Analytics record includes a value called Approximate_Arrival_Time. This value is the approximate arrival time stamp that is set when the streaming source successfully receives and stores the record. When Kinesis Data Analytics reads records from a streaming source, it fetches this column into the in-application input stream.

  These time stamp values are useful in windowed queries that are time-based. For more information, see Windowed Queries (p. 69).

• The Real-time analytics tab shows all the other in-application streams created by your application code. It also includes the error stream. Kinesis Data Analytics sends any rows it cannot process to the error stream. For more information, see Error Handling (p. 39).

  Choose DESTINATION_SQL_STREAM to view the rows your application code inserted. Note the additional columns that your application code didn't create. These columns include the ROWTIME time stamp column. Kinesis Data Analytics simply copies these values from the source (SOURCE_SQL_STREAM_001).

• The Destination tab shows the external destination where Kinesis Data Analytics writes the query results. You haven't configured any external destination for your application output yet.

Next Step

Step 3.4: (Optional) Update the Application Code (p. 53)

Step 3.4: (Optional) Update the Application Code

In this step, you explore how to update the application code.

To update application code

1. Create another in-application stream as follows:

• Create another in-application stream called DESTINATION_SQL_STREAM_2.

• Create a pump, and then use it to insert rows in the newly created stream by selecting rows from the DESTINATION_SQL_STREAM.

In the SQL editor, append the following code to the existing application code:


CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM_2" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP_2" AS
    INSERT INTO "DESTINATION_SQL_STREAM_2"
        SELECT STREAM ticker_symbol, change, price
        FROM "DESTINATION_SQL_STREAM";

Save and run the code. Additional in-application streams appear on the Real-time analytics tab.

2. Create two in-application streams. Filter rows in the SOURCE_SQL_STREAM_001 based on the stock ticker, and then insert them into these separate streams.

Append the following SQL statements to your application code:

CREATE OR REPLACE STREAM "AMZN_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

CREATE OR REPLACE PUMP "AMZN_PUMP" AS
    INSERT INTO "AMZN_STREAM"
        SELECT STREAM ticker_symbol, change, price
        FROM "SOURCE_SQL_STREAM_001"
        WHERE ticker_symbol SIMILAR TO '%AMZN%';

CREATE OR REPLACE STREAM "TGT_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

CREATE OR REPLACE PUMP "TGT_PUMP" AS
    INSERT INTO "TGT_STREAM"
        SELECT STREAM ticker_symbol, change, price
        FROM "SOURCE_SQL_STREAM_001"
        WHERE ticker_symbol SIMILAR TO '%TGT%';

Save and run the code. Notice additional in-application streams on the Real-time analytics tab.

You now have your first working Amazon Kinesis data analytics application. In this exercise, you did the following:

• Created your first Kinesis data analytics application.

 

• Configured application input that identified the demo stream as the streaming source and mapped it to an in-application stream (SOURCE_SQL_STREAM_001) that is created. Kinesis Data Analytics continuously reads the demo stream and inserts records in the in-application stream.

 

• Your application code queried the SOURCE_SQL_STREAM_001 and wrote output to another in-application stream called DESTINATION_SQL_STREAM.

Now you can optionally configure application output to write the application output to an external destination. That is, you can configure the application output to write records in the DESTINATION_SQL_STREAM to an external destination. For this exercise, this is an optional step. To learn how to configure the destination, go to the next step.

Next Step

Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 55).

Step 4 (Optional) Edit the Schema and SQL Code Using the Console

Following, you can find information about how to edit an inferred schema and how to edit SQL code for Amazon Kinesis Data Analytics. You do so by working with the schema editor and SQL editor that are part of the Kinesis Data Analytics console.

Topics
• Working with the Schema Editor (p. 55)
• Working with the SQL Editor (p. 62)

Working with the Schema Editor

The schema for an Amazon Kinesis data analytics application's input stream defines how data from the stream is made available to SQL queries in the application.

The schema contains selection criteria for determining what part of the streaming input is transformed into a data column in the in-application input stream. This input can be one of the following:

• A JSONPath expression for JSON input streams. JSONPath is a tool for querying JSON data.
• A column number for input streams in comma-separated values (CSV) format.
• A column name and a SQL data type for presenting the data in the in-application data stream. The data type also contains a length for character or binary data.

The console attempts to generate the schema using DiscoverInputSchema (p. 182). If schema discovery fails or returns an incorrect or incomplete schema, you must edit the schema manually by using the schema editor.


Schema Editor Main Screen

The following screenshot shows the main screen for the Schema Editor.

You can apply the following edits to the schema:

• Add a column (1): You might need to add a data column if a data item is not detected automatically.
• Delete a column (2): You can exclude data from the source stream if your application doesn't require it. This exclusion doesn't affect the data in the source stream. If data is excluded, that data simply isn't made available to the application.


• Rename a column (3): A column name can't be blank, must be longer than a single character, and must not contain reserved SQL keywords. The name must also meet naming criteria for SQL ordinary identifiers: The name must start with a letter and contain only letters, underscore characters, and digits.
• Change the data type (4) or length (5) of a column: You can specify a compatible data type for a column. If you specify an incompatible data type, the column is either populated with NULL or the in-application stream is not populated at all. In the latter case, errors are written to the error stream. If you specify a length for a column that is too small, the incoming data is truncated.
• Change the selection criteria of a column (6): You can edit the JSONPath expression or CSV column order used to determine the source of the data in a column. To change the selection criteria for a JSON schema, type a new value for the row path expression. A CSV schema uses the column order as selection criteria. To change the selection criteria for a CSV schema, change the order of the columns.

Editing the Schema for a Streaming Source

If you need to edit a schema for a streaming source, follow these steps.

To edit the schema for a streaming source

1. On the Source page, choose Edit schema.

2. On the Edit schema page, edit the source schema.


3. For Format, choose JSON or CSV. For JSON or CSV format, the supported encoding is ISO 8859-1.

For further information on editing the schema for JSON or CSV format, see the procedures in the next sections.

Editing a JSON Schema

You can edit a JSON schema by using the following steps.

To edit a JSON schema

1. In the schema editor, choose Add column to add a column.

A new column appears in the first column position. To change column order, choose the up and down arrows next to the column name.

For a new column, provide the following information:

• For Column name, type a name.

A column name cannot be blank, must be longer than a single character, and must not contain reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters, underscore characters, and digits.

• For Column type, type an SQL data type.

A column type can be any supported SQL data type. If the new data type is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data Types.


• For Row path, provide a row path. A row path is a valid JSONPath expression that maps to a JSON element.

Note
The base Row path value is the path to the top-level parent that contains the data to be imported. This value is $ by default. For more information, see RecordRowPath in JSONMappingParameters.

2. To delete a column, choose the x icon next to the column number.
3. To rename a column, type a new name for Column name. The new column name cannot be blank, must be longer than a single character, and must not contain reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters, underscore characters, and digits.
4. To change the data type of a column, choose a new data type for Column type. If the new data type is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data Types.

5. Choose Save schema and update stream to save your changes.

The modified schema appears in the editor and looks similar to the following.


If your schema has many rows, you can filter the rows using Filter by column name. For example, to edit column names that start with P, such as a Price column, type P in the Filter by column name box.

Editing a CSV Schema

You can edit a CSV schema by using the following steps.

To edit a CSV schema

1. In the schema editor, for Row delimiter, choose the delimiter used by your incoming data stream. This is the delimiter between records of data in your stream, such as a newline character.
2. For Column delimiter, choose the delimiter used by your incoming data stream. This is the delimiter between fields of data in your stream, such as a comma.

3. To add a column, choose Add column.

A new column appears in the first column position. To change column order, choose the up and down arrows next to the column name.

For a new column, provide the following information:

• For Column name, type a name.

A column name cannot be blank, must be longer than a single character, and must not contain reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters, underscore characters, and digits.

• For Column type, type a SQL data type.

A column type can be any supported SQL data type. If the new data type is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data Types.


4. To delete a column, choose the x icon next to the column number.

5. To rename a column, type a new name in Column name. The new column name cannot be blank, must be longer than a single character, and must not contain reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters, underscore characters, and digits.
6. To change the data type of a column, choose a new data type for Column type. If the new data type is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data Types.

7. Choose Save schema and update stream to save your changes.

The modified schema appears in the editor and looks similar to the following.


If your schema has many rows, you can filter the rows using Filter by column name. For example, to edit column names that start with P, such as a Price column, type P in the Filter by column name box.

Working with the SQL Editor

Following, you can find information about sections of the SQL editor and how each works. In the SQL editor, you can either author your own code or choose Add SQL from templates. A SQL template gives you example SQL code that can help you write common Amazon Kinesis data analytics applications. The example applications in this guide use some of these templates. For more information, see Example Amazon Kinesis Data Analytics Applications (p. 76).

Source Data Tab

The Source data tab identifies a streaming source. It also identifies the in-application input stream that this source maps to and that provides the application input configuration.


Amazon Kinesis Data Analytics provides the following time stamp columns, so that you don't need to provide explicit mapping in your input configuration:

• ROWTIME – Each row in an in-application stream has a special column called ROWTIME. This column is the time stamp for the point when Kinesis Data Analytics inserted the row in the first in-application stream.
• Approximate_Arrival_Time – Records on your streaming source include the Approximate_Arrival_Timestamp column. It is the approximate arrival time stamp that is set when the streaming source successfully receives and stores the related record. Kinesis Data Analytics fetches this column into the in-application input stream as Approximate_Arrival_Time. Amazon Kinesis Data Analytics provides this column only in the in-application input stream that is mapped to the streaming source.

These time stamp values are useful in windowed queries that are time-based. For more information, see Windowed Queries (p. 69).

Real-Time Analytics Tab

The Real-time analytics tab shows all the in-application streams that your application code creates. This group of streams includes the error stream (error_stream) that Amazon Kinesis Data Analytics provides for all applications.


Destination Tab

The Destination tab enables you to configure application output, to persist in-application streams to external destinations. You can configure output to persist data in any of the in-application streams to external destinations. For more information, see Configuring Application Output (p. 31).


Streaming SQL Concepts

Amazon Kinesis Data Analytics implements the ANSI 2008 SQL standard with extensions. These extensions enable you to process streaming data. The following topics cover key streaming SQL concepts.

Topics
• In-Application Streams and Pumps (p. 65)
• Timestamps and the ROWTIME Column (p. 66)
• Continuous Queries (p. 68)
• Windowed Queries (p. 69)
• Streaming Data Operations: Stream Joins (p. 74)

In-Application Streams and Pumps

When you configure application input, you map a streaming source to an in-application stream that is created. Data continuously flows from the streaming source into the in-application stream. An in-application stream works like a table that you can query using SQL statements, but it's called a stream because it represents continuous data flow.

Note
Do not confuse in-application streams with the Amazon Kinesis streams and Kinesis Data Firehose delivery streams. In-application streams exist only in the context of an Amazon Kinesis Data Analytics application. Amazon Kinesis streams and Kinesis Data Firehose delivery streams exist independent of your application, and you can configure them as a streaming source in your application input configuration or as a destination in output configuration.

You can also create additional in-application streams as needed to store intermediate query results. Creating an in-application stream is a two-step process. First, you create an in-application stream, and then you pump data into it. For example, suppose the input configuration of your application creates an in-application stream called INPUTSTREAM. In the following example, you create another stream (TEMPSTREAM), and then you pump data from INPUTSTREAM into it.

1. Create an in-application stream (TEMPSTREAM) with three columns, as shown following:

CREATE OR REPLACE STREAM "TEMPSTREAM" (
    "column1" BIGINT NOT NULL,
    "column2" INTEGER,
    "column3" VARCHAR(64));

The column names are specified in quotes, making them case-sensitive. For more information, see Identifiers in the Amazon Kinesis Data Analytics SQL Reference.

2. Insert data into the stream using a pump. A pump is a continuously running insert query that inserts data from one in-application stream to another in-application stream. The following statement creates a pump (SAMPLEPUMP) and inserts data into the TEMPSTREAM by selecting records from another stream (INPUTSTREAM).

CREATE OR REPLACE PUMP "SAMPLEPUMP" AS
    INSERT INTO "TEMPSTREAM" ("column1", "column2", "column3")
        SELECT STREAM inputcolumn1, inputcolumn2, inputcolumn3
        FROM "INPUTSTREAM";

You can have multiple writers insert into an in-application stream, and there can be multiple readers selected from the stream. You can think of an in-application stream as implementing a publish/subscribe messaging paradigm in which the data row, including time of creation and time of receipt, can be processed, interpreted, and forwarded by a cascade of streaming SQL statements, without having to be stored in a traditional RDBMS.

After an in-application stream is created, you can perform normal SQL queries.

Note
When querying streams, most SQL statements are bound using a row-based or time-based window. For more information, see Windowed Queries (p. 69).

You can also join streams. For examples of joining streams, see Streaming Data Operations: Stream Joins (p. 74).

Timestamps and the ROWTIME Column

In-application streams include a special column called ROWTIME. It stores a timestamp when Amazon Kinesis Data Analytics inserts a row in the first in-application stream. ROWTIME reflects the timestamp at which Amazon Kinesis Data Analytics inserted a record into the first in-application stream after reading from the streaming source. This ROWTIME value is then maintained throughout your application.

Note
When you pump records from one in-application stream into another, you don't need to explicitly copy the ROWTIME column; Amazon Kinesis Data Analytics copies this column for you.

Amazon Kinesis Data Analytics guarantees that the ROWTIME values are monotonically increasing. You use this timestamp in time-based windowed queries. For more information, see Windowed Queries (p. 69).

You can access the ROWTIME column in your SELECT statement like any other column in your in-application stream. For example:

SELECT STREAM ROWTIME,
    some_col_1,
    some_col_2
FROM SOURCE_SQL_STREAM_001

Understanding Various Times in Streaming Analytics

In addition to ROWTIME, there are other types of times in real-time streaming applications. These are:

• Event time – The timestamp when the event occurred. This is also sometimes called the client-side time. It is often desirable to use this time in analytics because it is the time when an event occurred. However, many event sources, such as mobile phones and web clients, do not have reliable clocks, which can lead to inaccurate times. In addition, connectivity issues can lead to records appearing on a stream not in the same order the events occurred.

• Ingest time – The timestamp when the record was added to the streaming source. Amazon Kinesis Data Streams includes a field called ApproximateArrivalTimeStamp in every record that provides this timestamp. This is also sometimes referred to as the server-side time. This ingest time is often a close approximation of event time. If there is any kind of delay in the record ingestion to the stream, this can lead to inaccuracies, which are typically rare. Also, the ingest time is rarely out of order, but it can occur due to the distributed nature of streaming data. Therefore, ingest time is a mostly accurate and in-order reflection of the event time.

• Processing time – The timestamp when Amazon Kinesis Data Analytics inserts a row in the first in-application stream. Amazon Kinesis Data Analytics provides this timestamp in the ROWTIME column that exists in each in-application stream. The processing time is always monotonically increasing and is very accurate in relation to the wall clock, but if your application falls behind, it does not accurately reflect the time when the event actually occurred.

As the preceding discussion shows, each of these times has advantages and disadvantages when used in time-based windowed queries. We recommend that you choose one or more of these times, and a strategy to deal with the relevant disadvantages, based on your use case.

Note
If you are using row-based windows, time is not an issue and you can ignore this section.

We recommend a two-window strategy that uses two time-based windows, one based on ROWTIME and one based on one of the other times (ingest or event time).

• Use ROWTIME as the first window, which controls how frequently the query emits the results, as shown in the following example. It is not used as a logical time.

• Use one of the other times as the logical time that you want to associate with your analytics. This time represents when the event occurred. In the following example, the analytics goal is to group the records and return the count by ticker.

The advantage of this strategy is that it can use a time that represents when the event occurred, and it can gracefully handle situations when your application falls behind or when events arrive out of order. If the application falls behind when bringing records into the in-application stream, they are still grouped by the logical time in the second window. The query uses ROWTIME to guarantee the order of processing. Any records that are late (the ingest timestamp shows an earlier value compared to the ROWTIME value) are also processed successfully.

Consider the following query against the demo stream used in the Getting Started exercise. The query uses the GROUP BY clause and emits a ticker count in a one-minute tumbling window.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "ingest_time" timestamp,
    "approximate_arrival_time" timestamp,
    "ticker_symbol" VARCHAR(12),
    "symbol_count" integer);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND) AS "ingest_time",
            STEP("SOURCE_SQL_STREAM_001".Approximate_Arrival_Time BY INTERVAL '60' SECOND) AS "approximate_arrival_time",
            "TICKER_SYMBOL",
            COUNT(*) AS "symbol_count"
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY "TICKER_SYMBOL",
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND),
            STEP("SOURCE_SQL_STREAM_001".Approximate_Arrival_Time BY INTERVAL '60' SECOND);

In the GROUP BY clause, you first group the records based on ROWTIME in a one-minute window and then by Approximate_Arrival_Time.


Note that the timestamp values in the result are rounded down to the nearest 60-second interval. The first group of results emitted by the query shows records in the first minute. The second group of results shows records in the next minute based on ROWTIME. The last record indicates that the application was late in bringing the record into the in-application stream (it shows a late ROWTIME value compared to the ingest timestamp).

ROWTIME                INGEST_TIME            TICKER_SYMBOL  SYMBOL_COUNT

--First one-minute window.
2016-07-19 17:05:00.0  2016-07-19 17:05:00.0  ABC            10
2016-07-19 17:05:00.0  2016-07-19 17:05:00.0  DEF            15
2016-07-19 17:05:00.0  2016-07-19 17:05:00.0  XYZ            6
--Second one-minute window.
2016-07-19 17:06:00.0  2016-07-19 17:06:00.0  ABC            11
2016-07-19 17:06:00.0  2016-07-19 17:06:00.0  DEF            11
2016-07-19 17:06:00.0  2016-07-19 17:05:00.0  XYZ            1  ***

*** Late-arriving record. Instead of appearing in the result of the first 1-minute window (where it belongs based on ingest_time), it appears in the result of the second 1-minute window.

You can combine the results for a final accurate count per minute by pushing the results to a downstream database. For example, you can configure the application output to persist the results to a Kinesis Data Firehose delivery stream that can write to an Amazon Redshift table. After the results are in an Amazon Redshift table, you can query the table to compute the total count grouped by Ticker_Symbol. In the case of XYZ, the total is accurate (6+1) even though a record arrived late. A sketch of such a downstream query follows.
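For example, once the windowed results are in Amazon Redshift, a query such as the following computes the corrected totals. This is a sketch; the table name ticker_counts and its columns are hypothetical and assume that the delivery stream writes the application output shown preceding into that table.

-- Hypothetical Amazon Redshift table populated by the delivery stream:
-- ticker_counts(rowtime, ingest_time, ticker_symbol, symbol_count)
SELECT ticker_symbol,
       SUM(symbol_count) AS total_count
FROM ticker_counts
GROUP BY ticker_symbol;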

Continuous Queries

A query over a stream executes continuously over streaming data. This continuous execution enables scenarios such as the ability for applications to continuously query a stream and generate alerts.

In the Getting Started exercise, you have an in-application stream called SOURCE_SQL_STREAM_001 that continuously receives stock prices from a demo stream (a Kinesis data stream). Following is the schema:

(TICKER_SYMBOL VARCHAR(4), SECTOR varchar(16), CHANGE REAL, PRICE REAL)

Suppose that you are interested in stock price changes greater than 1 percent. You can use the following query in your application code. This query runs continuously and emits records when a stock price change greater than 1 percent is detected.

SELECT STREAM TICKER_SYMBOL, PRICE
FROM "SOURCE_SQL_STREAM_001"
WHERE (ABS((CHANGE / (PRICE - CHANGE)) * 100)) > 1

Use the following procedure to set up an Amazon Kinesis Data Analytics application and test this query.

To test the query

1. Set up an application by following the Getting Started exercise.

2. Replace the SELECT statement in the application code with the preceding SELECT query. The resulting application code is shown following:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price DOUBLE);

-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM TICKER_SYMBOL, PRICE
        FROM "SOURCE_SQL_STREAM_001"
        WHERE (ABS((CHANGE / (PRICE - CHANGE)) * 100)) > 1;

Windowed Queries

SQL queries in your application code execute continuously over in-application streams. An in-application stream represents unbounded data that flows continuously through your application. Therefore, to get result sets from this continuously updating input, you often bound queries using a window defined in terms of time or rows. These are also called windowed SQL queries.

For a time-based windowed query, you specify the window size in terms of time (for example, a one-minute window). This requires a timestamp column in your in-application stream that is monotonically increasing (the timestamp for a new row is greater than or equal to that of the previous row). Amazon Kinesis Data Analytics provides such a timestamp column, called ROWTIME, for each in-application stream. You can use this column when specifying time-based queries. For your application, you might choose some other timestamp option. For more information, see Timestamps and the ROWTIME Column (p. 66).

For a row-based windowed query, you specify window size in terms of the number of rows.

You can specify a query to process records in a tumbling window or sliding window manner, depending on your application needs. For more information, see the following topics:

• Tumbling Windows (Aggregations Using GROUP BY) (p. 69)
• Sliding Windows (p. 70)

Tumbling Windows (Aggregations Using GROUP BY)

When a windowed query processes each window in a non-overlapping manner, the window is referred to as a tumbling window. In this case, each record on an in-application stream belongs to a specific window, and it's processed only once (when the query processes the window to which the record belongs).

For example, an aggregation query using a GROUP BY clause processes rows in a tumbling window. The demo stream in the Getting Started exercise receives stock price data that is mapped to the in-application stream SOURCE_SQL_STREAM_001 in your application. This stream has the following schema.

(TICKER_SYMBOL VARCHAR(4), SECTOR varchar(16), CHANGE REAL, PRICE REAL)

In your application code, suppose that you want to find the aggregate (min, max) prices for each ticker over a one-minute window. You can use the following query.


SELECT STREAM ROWTIME,
    Ticker_Symbol,
    MIN(Price) AS Min_Price,
    MAX(Price) AS Max_Price
FROM "SOURCE_SQL_STREAM_001"
GROUP BY Ticker_Symbol,
    STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);

The preceding is an example of a time-based windowed query. The query groups records by ROWTIME values. For reporting on a per-minute basis, the STEP function rounds down the ROWTIME values to the nearest minute.

Note
You can also use the FLOOR function to group records into windows, but FLOOR can only round time values down to a whole time unit (hour, minute, second, and so on). STEP is recommended for grouping records into tumbling windows because it can round values down to an arbitrary interval, such as 30 seconds, as shown in the following sketch.
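For example, the following sketch (using the demo stream schema from this section) groups records into 30-second tumbling windows, an interval that FLOOR cannot express:

-- Count records per ticker in 30-second tumbling windows using STEP.
SELECT STREAM Ticker_Symbol,
    COUNT(*) AS symbol_count
FROM "SOURCE_SQL_STREAM_001"
GROUP BY Ticker_Symbol,
    STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '30' SECOND);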

The min/max query shown preceding is an example of a nonoverlapping (tumbling) window. The GROUP BY clause groups records in a one-minute window, and each record belongs to a specific window (no overlapping). The query emits one output record per minute, providing the min/max ticker price recorded during that specific minute. This type of query is useful for generating periodic reports from the input data stream. In this example, reports are generated each minute.

To test the query

1. Set up an application by following the Getting Started exercise.

2. Replace the SELECT statement in the application code with the preceding SELECT query. The resulting application code is shown following:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    Min_Price DOUBLE,
    Max_Price DOUBLE);

-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM Ticker_Symbol,
            MIN(Price) AS Min_Price,
            MAX(Price) AS Max_Price
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY Ticker_Symbol,
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);

Sliding Windows

Instead of grouping records using GROUP BY, you can define a time-based or row-based window. You do this by adding an explicit WINDOW clause.

In this case, as the window slides with time, Amazon Kinesis Data Analytics emits an output when new records appear on the stream. Kinesis Data Analytics emits this output by processing rows in the window. Windows can overlap in this type of processing, and a record can be part of multiple windows and be processed with each window. The following example illustrates a sliding window.

Consider a simple query that counts records on the stream. We assume a 5-second window. In the following example stream, new records arrive at times t1, t2, t6, and t7, and three records arrive at time t8.


Keep the following in mind:

• We assume a 5-second window. The 5-second window slides continuously with time.

• For every row that enters a window, an output row is emitted by the sliding window. Soon after the application starts, you see the query emit output for every new record that appears on the stream, even though a 5-second window hasn't passed yet. For example, the query emits output when a record appears in the first second and in the second second. Later, the query processes records in the 5-second window.

• The windows slide with time. If an old record on the stream falls out of the window, the query doesn't emit output unless there is also a new record on the stream that falls within that 5-second window.

Suppose the query starts executing at t0. If so, the following occurs:

1. At time t0, the query starts. The query doesn't emit output (a count value) because there are no records at this time.

2. At time t1, a new record appears on the stream, and the query emits count value 1.

3. At time t2, another record appears, and the query emits count 2.

4. The 5-second window slides with time:

• At t3, the sliding window is t3 to t0.

• At t4, the sliding window is t4 to t0.

• At t5, the sliding window is t5 to t0.

At all of these times, the 5-second window contains the same records; there are no new records. Therefore, the query doesn't emit any output.

5. At time t6, the 5-second window is t6 to t1. The query detects one new record at t6, so it emits the count 2. The record at t1 is no longer in the window and doesn't count.

71

Page 78: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideSliding Windows

6. At time t7, the 5-second window is t7 to t2. The query detects one new record at t7, so it emits the count 2. The record at t2 is no longer in the 5-second window, and therefore isn't counted.

7. At time t8, the 5-second window is t8 to t3. The query detects three new records, and therefore emits the record count 5.

In summary, the window is a fixed size and slides with time. The query emits output when new records appear. A sketch of this counting query follows.
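The walkthrough doesn't show the counting query itself; a minimal sketch might look like the following, assuming the demo stream used elsewhere in this section:

-- Emit a running count over the preceding 5-second sliding window.
SELECT STREAM COUNT(*) OVER W1 AS record_count
FROM "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (RANGE INTERVAL '5' SECOND PRECEDING);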

Note
We recommend that you use a sliding window no longer than one hour. If you use a longer window, the application takes longer to restart after regular system maintenance, because the source data needs to be read from the stream again.

The following are example queries that use the WINDOW clause to define windows and perform aggregates. Because the queries don't specify GROUP BY, they use the sliding window approach to process records on the stream.

Example 1: Process a Stream Using a 1-minute Sliding Window

Consider the demo stream in the Getting Started exercise that populates the in-application stream, SOURCE_SQL_STREAM_001. The following is the schema.

(TICKER_SYMBOL VARCHAR(4), SECTOR varchar(16), CHANGE REAL, PRICE REAL)

Suppose that you want your application to compute aggregates using a sliding 1-minute window. That is, for each new record that appears on the stream, you want the application to emit an output by applying aggregates on records in the preceding 1-minute window.

You can use the following time-based windowed query. The query uses the WINDOW clause to define the 1-minute range interval. The PARTITION BY in the WINDOW clause groups records by ticker values within the sliding window.

SELECT STREAM ticker_symbol,
    MIN(Price) OVER W1 AS Min_Price,
    MAX(Price) OVER W1 AS Max_Price,
    AVG(Price) OVER W1 AS Avg_Price
FROM "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (
    PARTITION BY ticker_symbol
    RANGE INTERVAL '1' MINUTE PRECEDING);

To test the query

1. Set up an application by following the Getting Started Exercise.

2. Replace the SELECT statement in the application code with the preceding SELECT query. The resulting application code is the following.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(10),
    Min_Price double,
    Max_Price double,
    Avg_Price double);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker_symbol,
            MIN(Price) OVER W1 AS Min_Price,
            MAX(Price) OVER W1 AS Max_Price,
            AVG(Price) OVER W1 AS Avg_Price
        FROM "SOURCE_SQL_STREAM_001"
        WINDOW W1 AS (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' MINUTE PRECEDING);

Example 2: Query Applying Aggregates on a Sliding Window

The following query on the demo stream returns the average of the percent change in the price of each ticker in a 10-second window.

SELECT STREAM Ticker_Symbol,
    AVG(Change / (Price - Change)) OVER W1 AS Avg_Percent_Change
FROM "SOURCE_SQL_STREAM_001"
WINDOW W1 AS (
    PARTITION BY ticker_symbol
    RANGE INTERVAL '10' SECOND PRECEDING);

To test the query

1. Set up an application by following the Getting Started Exercise.

2. Replace the SELECT statement in the application code with the preceding SELECT query. The resulting application code is the following.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(10),
    Avg_Percent_Change double);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM Ticker_Symbol,
            AVG(Change / (Price - Change)) OVER W1 AS Avg_Percent_Change
        FROM "SOURCE_SQL_STREAM_001"
        WINDOW W1 AS (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '10' SECOND PRECEDING);


Example 3: Query Data from Multiple Sliding Windows on the Same Stream

You can write queries to emit output in which each column value is calculated using a different sliding window defined over the same stream.

In the following example, the query emits the output ticker_symbol, price, average_last2rows, and average_last10rows. The two average columns are derived from two-row and ten-row sliding windows, which you can compare, for example, to detect when a short moving average crosses a longer one.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(12),
    price double,
    average_last2rows double,
    average_last10rows double);

CREATE OR REPLACE PUMP "myPump" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker_symbol,
            price,
            avg(price) over last2rows,
            avg(price) over last10rows
        FROM SOURCE_SQL_STREAM_001
        WINDOW
            last2rows AS (PARTITION BY ticker_symbol ROWS 2 PRECEDING),
            last10rows AS (PARTITION BY ticker_symbol ROWS 10 PRECEDING);

To test this query against the demo stream, follow the test procedure described in Example 1 (p. 72).

Streaming Data Operations: Stream Joins

You can have multiple in-application streams in your application. You can write JOIN queries to correlate data arriving on these streams. For example, suppose that you have the following in-application streams:

• OrderStream – Receives stock orders being placed.

(orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)

• TradeStream – Receives resulting stock trades for those orders.

(tradeId SqlType, orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)

The following are JOIN query examples that correlate data on these streams.

Example 1: Report Orders Where There Are Trades Within One Minute of the Order Being Placed

In this example, your query joins both the OrderStream and the TradeStream. However, because we want only trades placed within one minute of the orders, the query defines the 1-minute window over the TradeStream. For information about windowed queries, see Sliding Windows (p. 70).

SELECT STREAM ROWTIME,
    o.orderId, o.ticker, o.amount AS orderAmount,
    t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON o.orderId = t.orderId;

You can define the windows explicitly using the WINDOW clause by writing the preceding query as follows:

SELECT STREAM ROWTIME,
    o.orderId, o.ticker, o.amount AS orderAmount,
    t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER t
ON o.orderId = t.orderId
WINDOW t AS (RANGE INTERVAL '1' MINUTE PRECEDING)

When you include this query in your application code, the application code runs continuously. For each arriving record on the OrderStream, the application emits an output if there are trades within the 1-minute window following the order being placed.

The join in the preceding query is an inner join, where the query emits records in OrderStream for which there is a matching record in TradeStream (and vice versa). Using an outer join, you can create another interesting scenario. Suppose that you want stock orders for which there are no trades within one minute of the stock order being placed, and trades reported within the same window but for some other orders. This is an example of an outer join.

SELECT STREAM ROWTIME,
    o.orderId, o.ticker, o.amount AS orderAmount,
    t.ticker, t.tradeId, t.amount AS tradeAmount
FROM OrderStream AS o
LEFT OUTER JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON o.orderId = t.orderId;


Example Amazon Kinesis Data Analytics Applications

This section provides examples of working with Amazon Kinesis Data Analytics. Some of these examples also provide step-by-step instructions for you to create an Amazon Kinesis Data Analytics application and test the setup.

Before you explore these walkthroughs, we recommend that you first review Amazon Kinesis Data Analytics: How It Works (p. 3) and Getting Started with Amazon Kinesis Data Analytics (p. 44).

Topics

• Examples: Preprocessing Streams (p. 76)

• Examples: Basic Analytics (p. 94)

• Examples: Advanced Analytics (p. 99)

• Examples: Other Amazon Kinesis Data Analytics Applications (p. 118)

Examples: Preprocessing Streams

There are times when your application code needs to preprocess incoming records before performing any analytics. This can happen for various reasons, such as records not conforming to the supported record formats, which can result in unnormalized columns in in-application input streams. This section provides examples of how to use the available string functions to normalize data, how to extract information that you need from string columns, and so on. The section also points to date and time functions that you might find useful.

Preprocessing Streams with Lambda

For information about preprocessing streams with AWS Lambda, see Preprocessing Data Using a Lambda Function (p. 20).

Topics

• Example: Manipulating Strings and Date Times (p. 76)

• Example: Streaming Source With Multiple Record Types (p. 85)

• Example: Adding Reference Data to an Amazon Kinesis Data Analytics Application (p. 90)

Example: Manipulating Strings and Date Times

String Manipulation

Amazon Kinesis Data Analytics supports formats such as JSON and CSV for records on a streaming source. For details, see RecordFormat (p. 248). These records then map to rows in an in-application stream as per the input configuration. For details, see Configuring Application Input (p. 5). The input configuration specifies how record fields in the streaming source map to columns in the in-application stream.

This mapping works when records on the streaming source follow the supported formats, which results in an in-application stream with normalized data.

But what if the data on your streaming source does not conform to the supported standards? For example, what if your streaming source contains data such as clickstream data, IoT sensor readings, and application logs? Consider these examples:

• Streaming source contains application logs – The application logs follow the standard Apache log format and are written to the stream using JSON format.

{ "Log":"192.168.254.30 - John [24/May/2004:22:01:02 -0700] "GET /icons/apache_pb.gif HTTP/1.1" 304 0"}

For more information about the standard Apache log format, see Log Files on the Apache website.

 

• Streaming source contains semi-structured data – The following example shows two records. The Col_E_Unstructured field value is a series of comma-separated values.

{ "Col_A" : "string", "Col_B" : "string", "Col_C" : "string", "Col_D" : "string", "Col_E_Unstructured" : "value,value,value,value"}

{ "Col_A" : "string", "Col_B" : "string", "Col_C" : "string", "Col_D" : "string", "Col_E_Unstructured" : "value,value,value,value"}

There are five columns; the first four have string-type values, and the last column contains comma-separated values.

• Records on your streaming source contain URLs, and you need a portion of the URL domain name for analytics.

{ "referrer" : "http://www.amazon.com"}{ "referrer" : "http://www.stackoverflow.com" }

In such cases, the following two-step process generally works for creating in-application streams that contain normalized data:

1. Configure the application input to map the unstructured field to a column of the VARCHAR(N) type in the in-application input stream that is created.

2. In your application code, use string functions to split this single column into multiple columns and then save the rows in another in-application stream. This in-application stream that your application code creates will have normalized data. You can then perform analytics on it.

Amazon Kinesis Data Analytics provides string operations, standard SQL functions, and extensions to the SQL standard for working with string columns, including the following:


• String operators – Operators such as LIKE and SIMILAR are useful in comparing strings. For more information, see String Operators in the Amazon Kinesis Data Analytics SQL Reference.

 

• SQL functions – The following functions are useful when manipulating individual strings. For more information, see Scalar Functions in the Amazon Kinesis Data Analytics SQL Reference.

• CHAR_LENGTH – Provides the length of a string.

• LOWER/UPPER – Converts a string to lowercase or uppercase.

• OVERLAY – Replaces a portion of the first string argument (the original string) with the second string argument (the replacement string).

• SUBSTRING – Extracts a portion of a source string starting at a specific position.

• POSITION – Searches for a string within another string.

 

• SQL Extensions – These are useful for working with unstructured strings such as logs and URIs.

• REGEX_LOG_PARSE – Parses a string based on default Java Regular Expression patterns.

• FAST_REGEX_LOG_PARSER – Works similar to the regex parser, but takes several shortcuts to ensure faster results. For example, the fast regex parser stops at the first match it finds (known as lazy semantics).

• W3C_Log_Parse – A function for quickly formatting Apache logs.

• FIXED_COLUMN_LOG_PARSE – Parses fixed-width fields and automatically converts them to the given SQL types.

• VARIABLE_COLUMN_LOG_PARSE – Splits an input string into fields separated by a delimiter character or a delimiter string.

For examples using these functions, see the following topics:

• Example: String Manipulation (W3C_LOG_PARSE Function) (p. 78)

• Example: String Manipulation (VARIABLE_COLUMN_LOG_PARSE Function) (p. 80)

• Example: String Manipulation (SUBSTRING Function) (p. 83)

Example: String Manipulation (W3C_LOG_PARSE Function)

In this example, you write log records to an Amazon Kinesis data stream. Example logs are shown following:

{"Log":"192.168.254.30 - John [24/May/2004:22:01:02 -0700] "GET /icons/apache_pba.gif HTTP/1.1" 304 0"}{"Log":"192.168.254.30 - John [24/May/2004:22:01:03 -0700] "GET /icons/apache_pbb.gif HTTP/1.1" 304 0"}{"Log":"192.168.254.30 - John [24/May/2004:22:01:04 -0700] "GET /icons/apache_pbc.gif HTTP/1.1" 304 0"}...

You then create an Amazon Kinesis Data Analytics application in the console, with the Kinesis data stream as the streaming source. The discovery process reads sample records on the streaming source and infers an in-application schema with one column (log), as shown following:


Then, you use the application code with the W3C_LOG_PARSE function to parse the log, and create another in-application stream with the various log fields in separate columns, as shown following:

Step 1: Create a Kinesis Data Stream

Create an Amazon Kinesis data stream and populate log records as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Data Streams and then create a stream with one shard.

3. Run the following Python code to populate sample log records. The Python code is simple; it continuously writes the same log record to the stream.

import json
from boto import kinesis

kinesis = kinesis.connect_to_region("us-east-1")

def getHighHeartRate():
    data = {}
    data['log'] = '192.168.254.30 - John [24/May/2004:22:01:02 -0700] "GET /icons/apache_pb.gif HTTP/1.1" 304 0'
    return data

# Continuously write the same log record to the stream.
while True:
    data = json.dumps(getHighHeartRate())
    print data
    kinesis.put_record("stream-name", data, "partitionkey")

Step 2: Create the Amazon Kinesis Data Analytics Application

Create an Amazon Kinesis Data Analytics application as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Create new application, and specify an application name.

3. On the application hub, connect to the source.

4. On the Source page, do the following:

• Select the stream that you created in the preceding section.

• Choose the create IAM role option.

• Wait for the console to show the inferred schema and the sample records used to infer the schema for the in-application stream that is created. Note that the inferred schema has only one column.

• Choose Save and continue.

5. On the application hub, choose Go to SQL editor. To start the application, choose Yes in the dialog box that appears.

6. In the SQL editor, write the application code and verify the results as follows:

• Copy the following application code and paste it into the editor.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    column1 VARCHAR(16),
    column2 VARCHAR(16),
    column3 VARCHAR(16),
    column4 VARCHAR(16),
    column5 VARCHAR(16),
    column6 VARCHAR(16),
    column7 VARCHAR(16));

CREATE OR REPLACE PUMP "myPUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        l.r.COLUMN1,
        l.r.COLUMN2,
        l.r.COLUMN3,
        l.r.COLUMN4,
        l.r.COLUMN5,
        l.r.COLUMN6,
        l.r.COLUMN7
    FROM (SELECT STREAM W3C_LOG_PARSE("log", 'COMMON')
          FROM "SOURCE_SQL_STREAM_001") AS l(r);

• Choose Save and run SQL. On the Real-time analytics tab, you can see all of the in-application streams that the application created and verify the data.

Example: String Manipulation (VARIABLE_COLUMN_LOG_PARSE Function)

In this example, you write semi-structured records to an Amazon Kinesis data stream. The example records are as follows:

80

Page 87: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideExample: Manipulating Strings and Date Times

{ "Col_A" : "string", "Col_B" : "string", "Col_C" : "string", "Col_D_Unstructured" : "value,value,value,value"}{ "Col_A" : "string", "Col_B" : "string", "Col_C" : "string", "Col_D_Unstructured" : "value,value,value,value"}

You then create an Amazon Kinesis Data Analytics application in the console, with the Kinesis stream as the streaming source. The discovery process reads sample records on the streaming source and infers an in-application schema, as shown following:

Then, you use the application code with the VARIABLE_COLUMN_LOG_PARSE function to parse the comma-separated values, and insert normalized rows in another in-application stream, as shown following:

Step 1: Create a Kinesis Data Stream

Create an Amazon Kinesis data stream and populate log records as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Kinesis Stream and then create a stream with one shard.

3. Run the following Python code to populate sample log records. The Python code is simple; it continuously writes the same log record to the stream.

import json
from boto import kinesis

kinesis = kinesis.connect_to_region("us-east-1")

def getHighHeartRate():
    data = {}
    data['Col_A'] = 'a'
    data['Col_B'] = 'b'
    data['Col_C'] = 'c'
    data['Col_E_Unstructured'] = 'x,y,z'
    return data

# Continuously write the same record to the stream.
while True:
    data = json.dumps(getHighHeartRate())
    print data
    kinesis.put_record("teststreamforkinesisanalyticsapps", data, "partitionkey")

Step 2: Create the Amazon Kinesis Data Analytics Application

Create an Amazon Kinesis Data Analytics application as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Create new application, and specify an application name.

3. On the application hub, connect to the source.

4. On the Source page, do the following:

• Select the stream you created in the preceding section.

• Choose the create IAM role option.

• Wait for the console to show the inferred schema and the sample records used to infer the schema for the in-application stream that is created.

• Choose Save and continue.

5. On the application hub, choose Go to SQL editor. To start the application, choose Yes in the dialog box that appears.

6. In the SQL editor, write the application code and verify the results as follows:

• Copy the following application code and paste it into the editor.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "column_A" VARCHAR(16),
    "column_B" VARCHAR(16),
    "column_C" VARCHAR(16),
    "COL_1" VARCHAR(16),
    "COL_2" VARCHAR(16),
    "COL_3" VARCHAR(16));

CREATE OR REPLACE PUMP "SECOND_STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM t."Col_A", t."Col_B", t."Col_C",
                  t.r."COL_1", t.r."COL_2", t.r."COL_3"
    FROM (SELECT STREAM "Col_A", "Col_B", "Col_C",
                 VARIABLE_COLUMN_LOG_PARSE("Col_E_Unstructured",
                     'COL_1 TYPE VARCHAR(16), COL_2 TYPE VARCHAR(16), COL_3 TYPE VARCHAR(16)',
                     ',') AS r
          FROM "SOURCE_SQL_STREAM_001") as t;

• Choose Save and run SQL. On the Real-time analytics tab, you can see all of the in-application streams that the application created and verify the data.


Example: String Manipulation (SUBSTRING Function)

In this example, you write the following records to an Amazon Kinesis stream.

{ "referrer" : "http://www.stackoverflow.com" }{ "referrer" : "http://www.amazon.com"}{ "referrer" : "http://www.amazon.com"}...

You then create an Amazon Kinesis Data Analytics application in the console, with the Kinesis stream as the streaming source. The discovery process reads sample records on the streaming source and infers an in-application schema with one column (referrer), as shown following:

Then, you use the application code with the SUBSTRING function to parse the URL string and retrieve the company name. You then insert the resulting data into another in-application stream, as shown following:

Step 1: Create a Kinesis Data Stream

Create an Amazon Kinesis data stream and populate log records as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Kinesis Stream, and then create a stream with one shard.

3. Run the following Python code to populate sample log records. The Python code is simple; it continuously writes the same log record to the stream.

import json
from boto import kinesis

kinesis = kinesis.connect_to_region("us-east-1")

def getReferrer():
    data = {}
    data['referrer'] = 'http://www.amazon.com'
    return data

# Continuously write the same record to the stream.
while True:
    data = json.dumps(getReferrer())
    print data
    kinesis.put_record("teststreamforkinesisanalyticsapps", data, "partitionkey")

Step 2: Create the Amazon Kinesis Data Analytics Application

Create an Amazon Kinesis Data Analytics application as follows:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Create new application, and specify an application name.
3. On the application hub, connect to the source.
4. On the Source page, do the following:

• Select the stream you created in the preceding section.
• Choose the create IAM role option.
• Wait for the console to show the inferred schema and the sample records used to infer the schema for the in-application stream that is created. Note that the inferred schema has only one column.
• Choose Save and continue.

5. On the application hub, choose Go to SQL editor. To start the application, choose Yes in the dialog box that appears.

6. In the SQL editor, write the application code and verify the results as follows:

• Copy the following application code and paste it into the editor.

-- CREATE OR REPLACE STREAM for cleaned up referrer
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "ingest_time" TIMESTAMP,
    "referrer" VARCHAR(32));

CREATE OR REPLACE PUMP "myPUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        "APPROXIMATE_ARRIVAL_TIME",
        SUBSTRING("referrer", 12,
            (POSITION('.com' IN "referrer") - POSITION('www.' IN "referrer") - 4))
    FROM "SOURCE_SQL_STREAM_001";

• Choose Save and run SQL. On the Real-time analytics tab, you can see all of the in-application streams that the application created and verify the data.

Date Time Manipulation

Amazon Kinesis Data Analytics supports converting columns to timestamps. For example, you might want to use your own timestamp as part of a GROUP BY clause as another time-based window, in addition to the ROWTIME column. Kinesis Data Analytics provides operations and SQL functions for working with date and time fields.


• Date and time operators – You can perform arithmetic operations on date, time, and interval data types. For more information, see Date, Timestamp, and Interval Operators in the Amazon Kinesis Data Analytics SQL Reference.

 

• SQL Functions – These include the following:

• EXTRACT() – Extracts one field from a date, time, time stamp, or interval expression.

• CURRENT_TIME – Returns the time when the query executes (UTC).

• CURRENT_DATE – Returns the date when the query executes (UTC).

• CURRENT_TIMESTAMP – Returns the time stamp when the query executes (UTC).

• LOCALTIME – Returns the current time when the query executes as defined by the environment on which Amazon Kinesis Data Analytics is running (UTC).

• LOCALTIMESTAMP – Returns the current timestamp as defined by the environment on which Amazon Kinesis Data Analytics is running (UTC).

 

• SQL Extensions – These include the following:

• CURRENT_ROW_TIMESTAMP – Returns a new time stamp for each row in the stream.

• TSDIFF – Returns the difference of two time stamps in milliseconds.

• CHAR_TO_DATE – Converts a string to a date.

• CHAR_TO_TIME – Converts a string to time.

• CHAR_TO_TIMESTAMP – Converts a string to a time stamp.

• DATE_TO_CHAR – Converts a date to a string.

• TIME_TO_CHAR – Converts a time to a string.

• TIMESTAMP_TO_CHAR – Converts a time stamp to a string.

Most of the preceding SQL functions use a format to convert the columns. The format is flexible. For example, you can specify the format yyyy-MM-dd hh:mm:ss to convert an input string 2009-09-16 03:15:24 into a timestamp. For more information, see Char To Timestamp(Sys) in the Amazon Kinesis Data Analytics SQL Reference.
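For example, the following sketch converts a string column to a timestamp. The column name event_time_string is hypothetical; it assumes that your input configuration maps such a field to a VARCHAR column:

-- Convert a hypothetical VARCHAR column to a SQL timestamp.
SELECT STREAM
    CHAR_TO_TIMESTAMP('yyyy-MM-dd hh:mm:ss', "event_time_string") AS event_ts
FROM "SOURCE_SQL_STREAM_001";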

Example: Streaming Source With Multiple Record Types

Topics

• Step 1: Prepare (p. 88)

• Step 2: Create an Application (p. 89)

A common requirement in extract, transform, and load (ETL) applications is to process multiple record types on a streaming source. You can create an Amazon Kinesis Data Analytics application to process these kinds of streaming sources. You do the following:

• First, you map the streaming source to an in-application input stream, similar to all other Kinesis Data Analytics applications.

• Then, in your application code, you write SQL statements to retrieve rows of specific types from the in-application input stream, and insert them in separate in-application streams. (You can create additional in-application streams in your application code.)


In this exercise, you have a streaming source that receives records of two types (Order and Trade). These are stock orders and corresponding trades. For each order, there can be zero or more trades. Example records of each type are shown following:

Order record

{"RecordType": "Order", "Oprice": 9047, "Otype": "Sell", "Oid": 3811, "Oticker": "AAAA"}

Trade record

{"RecordType": "Trade", "Tid": 1, "Toid": 3812, "Tprice": 2089, "Tticker": "BBBB"}

When you create an application using the AWS Management Console, the console displays the following inferred schema for the in-application input stream that is created. By default, the console names this in-application stream SOURCE_SQL_STREAM_001.

When you save the configuration, Amazon Kinesis Data Analytics continuously reads data from the streaming source and inserts rows in the in-application stream. You can now perform analytics on data in the in-application stream.

In this example, in the application code you first create two additional in-application streams, Order_Stream and Trade_Stream. You then filter the rows from the SOURCE_SQL_STREAM_001 stream based on the record type and insert them in the newly created streams using pumps. For information about this coding pattern, see Application Code (p. 30).

Filter the order and trade rows into separate in-application streams as follows:

• Filter the order records in SOURCE_SQL_STREAM_001, and save the orders in the Order_Stream.

--Create Order_Stream.
CREATE OR REPLACE STREAM "Order_Stream" (
    order_id    integer,
    order_type  varchar(10),
    ticker      varchar(4),
    order_price DOUBLE,
    record_type varchar(10));

CREATE OR REPLACE PUMP "Order_Pump" AS
    INSERT INTO "Order_Stream"
        SELECT STREAM oid, otype, oticker, oprice, recordtype
        FROM "SOURCE_SQL_STREAM_001"
        WHERE recordtype = 'Order';

• Filter the trade records in SOURCE_SQL_STREAM_001, and save the trades in the Trade_Stream.

--Create Trade_Stream.
CREATE OR REPLACE STREAM "Trade_Stream" (
    trade_id    integer,
    order_id    integer,
    trade_price DOUBLE,
    ticker      varchar(4),
    record_type varchar(10));

CREATE OR REPLACE PUMP "Trade_Pump" AS
    INSERT INTO "Trade_Stream"
        SELECT STREAM tid, toid, tprice, tticker, recordtype
        FROM "SOURCE_SQL_STREAM_001"
        WHERE recordtype = 'Trade';

• Now you can perform additional analytics on these streams. In this example, you count the number of trades by ticker in a one-minute tumbling window and save the results to yet another stream, DESTINATION_SQL_STREAM.

--do some analytics on the Trade_Stream and Order_Stream.
-- To see results in the console, you must write to DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker      varchar(4),
    trade_count integer);

CREATE OR REPLACE PUMP "Output_Pump" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker, count(*) as trade_count
        FROM "Trade_Stream"
        GROUP BY ticker, FLOOR("Trade_Stream".ROWTIME TO MINUTE);

You see the result, as shown following:

Next Step


Step 1: Prepare (p. 88)

Step 1: Prepare

In this section, you create a Kinesis data stream, and then populate order and trade records on the stream. This is your streaming source for the application you create in the next step.

Step 1.1: Create a Streaming Source

You can create a Kinesis data stream using the console or the AWS CLI. The example assumes OrdersAndTradesStream as the stream name.

• Using the console – Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis. Choose Data Streams, and then create a stream with one shard.

• Using the AWS CLI – Use the following Kinesis create-stream AWS CLI command to create the stream:

$ aws kinesis create-stream \
--stream-name OrdersAndTradesStream \
--shard-count 1 \
--region us-east-1 \
--profile adminuser
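You can then confirm that the stream is ACTIVE before writing records to it, for example:

$ aws kinesis describe-stream \
--stream-name OrdersAndTradesStream \
--region us-east-1 \
--query 'StreamDescription.StreamStatus'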

Step 1.2: Populate the Streaming Source

Run the following Python script to populate sample records on the OrdersAndTradesStream. If you created the stream with a different name, update the Python code appropriately.

1. Install Python and pip.

For information about installing Python, see the Python website.

You can install dependencies using pip. For information about installing pip, see Installing on the pip website.
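For example, the following script uses the boto package, which you can install with pip:

$ pip install boto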

2. Run the following Python code. The put_record calls in the code write the JSON records to the stream.

import json
from boto import kinesis
import random

kinesis = kinesis.connect_to_region("us-east-1")

def getOrderData(orderId, ticker):
    data = {}
    data['RecordType'] = "Order"
    data['Oid'] = orderId
    data['Oticker'] = ticker
    data['Oprice'] = random.randint(500, 10000)
    data['Otype'] = "Sell"
    return data

def getTradeData(orderId, tradeId, ticker, tradePrice):
    data = {}
    data['RecordType'] = "Trade"
    data['Tid'] = tradeId
    data['Toid'] = orderId
    data['Tticker'] = ticker
    data['Tprice'] = tradePrice
    return data

x = 1
while True:
    # Pick a random ticker for this order.
    rnd = random.randint(1, 3)
    if rnd == 1:
        ticker = "AAAA"
    elif rnd == 2:
        ticker = "BBBB"
    else:
        ticker = "CCCC"
    data = json.dumps(getOrderData(x, ticker))
    kinesis.put_record("OrdersAndTradesStream", data, "partitionkey")
    print data
    # Write zero or more trades for the order.
    tId = 1
    for y in range(0, random.randint(0, 6)):
        tradeId = tId
        tradePrice = random.randint(0, 3000)
        data2 = json.dumps(getTradeData(x, tradeId, ticker, tradePrice))
        kinesis.put_record("OrdersAndTradesStream", data2, "partitionkey")
        print data2
        tId += 1
    x += 1

Next Step

Step 2: Create an Application (p. 89)

Step 2: Create an Application

In this section, you create an Amazon Kinesis Data Analytics application. You then update the application by adding an input configuration that maps the streaming source you created in the preceding section to an in-application input stream.

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Create new application. This example uses the application name ProcessMultipleRecordTypes.

3. On the application hub, connect to the source.
4. On the Source page, do the following:

   a. Select the stream you created in the preceding section.
   b. Choose the create IAM role option.
   c. Wait for the console to show the inferred schema and the sample records used to infer the schema for the in-application stream that is created.
   d. Choose Save and continue.

5. On the application hub, choose Go to SQL editor. To start the application, choose Yes in the dialog box that appears.

6. In the SQL editor, write the application code and verify the results:

a. Copy the following application code and paste it into the editor.

--Create Order_Stream.
CREATE OR REPLACE STREAM "Order_Stream" (
    "order_id"    integer,
    "order_type"  varchar(10),
    "ticker"      varchar(4),
    "order_price" DOUBLE,
    "record_type" varchar(10));

CREATE OR REPLACE PUMP "Order_Pump" AS
    INSERT INTO "Order_Stream"
        SELECT STREAM "Oid", "Otype", "Oticker", "Oprice", "RecordType"
        FROM "SOURCE_SQL_STREAM_001"
        WHERE "RecordType" = 'Order';
--********************************************
--Create Trade_Stream.
CREATE OR REPLACE STREAM "Trade_Stream" (
    "trade_id"    integer,
    "order_id"    integer,
    "trade_price" DOUBLE,
    "ticker"      varchar(4),
    "record_type" varchar(10));

CREATE OR REPLACE PUMP "Trade_Pump" AS
    INSERT INTO "Trade_Stream"
        SELECT STREAM "Tid", "Toid", "Tprice", "Tticker", "RecordType"
        FROM "SOURCE_SQL_STREAM_001"
        WHERE "RecordType" = 'Trade';
--*****************************************************************
--do some analytics on the Trade_Stream and Order_Stream.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "ticker"      varchar(4),
    "trade_count" integer);

CREATE OR REPLACE PUMP "Output_Pump" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM "ticker", count(*) as trade_count
        FROM "Trade_Stream"
        GROUP BY "ticker", FLOOR("Trade_Stream".ROWTIME TO MINUTE);

b. Choose Save and run SQL. Choose the Real-time analytics tab to see all of the in-application streams that the application created and verify the data.

Next Step

You can configure the application output to persist results to an external destination, such as another Kinesis stream or a Kinesis Data Firehose delivery stream.

Example: Adding Reference Data to an Amazon Kinesis Data Analytics Application

Topics

• Step 1: Prepare (p. 91)
• Step 2: Add Reference Data Source to the Application Configuration (p. 92)
• Step 3: Test: Query the In-Application Reference Table (p. 94)

In this exercise, you add reference data to an existing Amazon Kinesis Data Analytics application. For information about reference data, see the following topics:


• Amazon Kinesis Data Analytics: How It Works (p. 3)

• Configuring Application Input (p. 5)

In this exercise, you add reference data to the application you created in the Getting Started exercise. The reference data provides the company name for each ticker symbol. For example:

Ticker, Company
AMZN, Amazon
ASD, SomeCompanyA
MMB, SomeCompanyB
WAS, SomeCompanyC

First, complete the Getting Started exercise. Then, do the following to set up and add reference data to your application.

1. Prepare

• Store the preceding reference data as an object in your S3 bucket.

• Create an IAM role that Amazon Kinesis Data Analytics can assume to read the S3 object on your behalf.

2. Add the reference data source to your application. Amazon Kinesis Data Analytics reads the S3 object and creates an in-application reference table that you can query in your application code.

3. Test. In your application code, you write a join query to join the in-application stream with the in-application reference table, to get the company name for each ticker symbol.

Note
The Amazon Kinesis Data Analytics console does not support managing reference data sources for your applications. In this exercise, you use the AWS CLI to add a reference data source to your application. If you haven't already done so, set up the AWS CLI.

Step 1: Prepare

In this section, you store sample reference data as an object in your S3 bucket. You also create an IAM role that Amazon Kinesis Data Analytics can assume to read the object on your behalf.

Prepare: Store Reference Data as S3 Object

Store the sample reference data as an S3 object.

1. Open a text editor, type the following data, and save the file as TickerReference.csv.

Ticker, Company
AMZN, Amazon
ASD, SomeCompanyA
MMB, SomeCompanyB
WAS, SomeCompanyC

2. Upload the TickerReference.csv file to your S3 bucket. For instructions, see Uploading Objects into Amazon S3 in the Amazon Simple Storage Service Console User Guide.
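Because this exercise uses the AWS CLI anyway, you can also upload the file from the command line; for example (replace bucket-name with the name of your bucket):

$ aws s3 cp TickerReference.csv s3://bucket-name/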

Prepare: Create an IAM Role

Follow this procedure to create an IAM role that Amazon Kinesis Data Analytics can assume to read the S3 object.


1. Create an IAM role called KinesisAnalytics-ReadS3Object. In the IAM console, specify the following when you create the role:

• Choose AWS Lambda on the Select Role Type page. After creating the role, you will change the trust policy to allow Amazon Kinesis Data Analytics (not AWS Lambda) to assume the role.

• Do not attach any policy on the Attach Policy page.

For instructions, see Creating a Role for an AWS Service (AWS Management Console) in the IAM User Guide.

2. Update the IAM role policies.

a. In the IAM console, select the role you created.

b. On the Trust Relationships tab, update the trust policy to grant Amazon Kinesis Data Analytics permissions to assume the role. The trust policy is shown following:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "kinesisanalytics.amazonaws.com" }, "Action": "sts:AssumeRole" } ]}

c. On the Permissions tab, attach the AWS managed policy called AmazonS3ReadOnlyAccess. This grants the role permissions to read an S3 object. The policy is shown following for your information:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:Get*", "s3:List*" ], "Resource": "*" } ]}

Step 2: Add Reference Data Source to the Application Configuration

In this section, you add a reference data source to your application configuration. You need the following information:

• Your Amazon Kinesis Data Analytics application name and current application version ID

• S3 bucket name and object key name

• IAM role ARN


Now use the AWS CLI to complete the step:

1. Run the describe-application command to get the application description, as shown following:

$ aws kinesisanalytics describe-application \
--region us-east-1 \
--application-name application-name

2. Note the current application version ID.

Each time you make changes to your application, the current version is updated, so make sure that you have the current application version ID.
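For example, you can retrieve just the version ID by adding the --query option to the describe-application command shown preceding:

$ aws kinesisanalytics describe-application \
--region us-east-1 \
--application-name application-name \
--query 'ApplicationDetail.ApplicationVersionId'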

3. Use the following JSON to add the reference data source:

{ "TableName":"CompanyName", "S3ReferenceDataSource":{ "BucketARN":"arn:aws:s3:::bucket-name", "FileKey":"TickerReference.csv", "ReferenceRoleARN":"arn:aws:iam::aws-account-id:role/IAM-role-name" }, "ReferenceSchema":{ "RecordFormat":{ "RecordFormatType":"CSV", "MappingParameters":{ "CSVMappingParameters":{ "RecordRowDelimiter":"\n", "RecordColumnDelimiter":"," } } }, "RecordEncoding":"UTF-8", "RecordColumns":[ { "Name":"Ticker", "SqlType":"VARCHAR(64)" }, { "Name":"Company", "SqlType":"VARCHAR(64)" } ] }}

Run the add-application-reference-data-source command using the preceding reference data configuration information. You need to provide your bucket name, object key name, IAM role name, and AWS account ID.

$ aws kinesisanalytics add-application-reference-data-source \
--endpoint https://kinesisanalytics.aws-region.amazonaws.com \
--region us-east-1 \
--application-name DemoStreamBasedGettingStarted \
--debug \
--reference-data-source '{"TableName":"CompanyName","S3ReferenceDataSource":{"BucketARN":"arn:aws:s3:::bucket-name","FileKey":"TickerReference.csv","ReferenceRoleARN":"arn:aws:iam::aws-account-id:role/IAM-role-name"},"ReferenceSchema":{"RecordFormat":{"RecordFormatType":"CSV","MappingParameters":{"CSVMappingParameters":{"RecordRowDelimiter":"\n","RecordColumnDelimiter":","}}},"RecordEncoding":"UTF-8","RecordColumns":[{"Name":"Ticker","SqlType":"VARCHAR(64)"},{"Name":"Company","SqlType":"VARCHAR(64)"}]}}' \
--current-application-version-id 10

4. Verify that the reference data was added to the application by getting the application description using the describe-application operation.

Step 3: Test: Query the In-Application Reference Table

You can now query the in-application reference table, CompanyName. You can use the reference information to enrich your application by joining the ticker price data with the reference table; the result then shows the company name.

1. Replace your application code with the following. The query joins the in-application input stream with the in-application reference table. The application code writes the results to another in-application stream, DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), "Company" varchar(20), sector VARCHAR(12), change DOUBLE, price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" SELECT STREAM ticker_symbol, "c"."Company", sector, change, price FROM "SOURCE_SQL_STREAM_001" LEFT JOIN "CompanyName" as "c" ON "SOURCE_SQL_STREAM_001".ticker_symbol = "c"."Ticker";

2. Verify that the application output appears in the SQL results tab. Make sure that some of the rows show company names (your sample reference data does not have all company names).

Examples: Basic Analytics

This section provides examples of Amazon Kinesis Data Analytics applications that perform basic analytics. The examples provide step-by-step instructions to set up an Amazon Kinesis Data Analytics application.

Topics

• Example: Most Frequently Occurring Values (the TOP_K_ITEMS_TUMBLING Function) (p. 94)

• Example: Counting Distinct Values (the COUNT_DISTINCT_ITEMS_TUMBLING function) (p. 95)

• Example: Simple Alerts (p. 96)

• Example: Throttled Alerts (p. 98)

Example: Most Frequently Occurring Values (the TOP_K_ITEMS_TUMBLING Function)

In this exercise, you set up an Amazon Kinesis Data Analytics application to find the top ten most frequently traded stocks in a one-minute window.

For this exercise, you use the demo stream, which provides a continuous flow of simulated stock trade records. The application finds the top ten most frequently traded stocks in a one-minute window.

Use the following application code:

CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    ITEM VARCHAR(1024),
    ITEM_COUNT DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM *
    FROM TABLE(TOP_K_ITEMS_TUMBLING(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        'column1',  -- name of column in single quotes
        10,         -- number of top items
        60          -- tumbling window size in seconds
    ));

The code uses the TOP_K_ITEMS_TUMBLING function to find the most frequently traded stocks. Note that, for efficiency, the function approximates the most frequently occurring values. For more information about the function, see TOP_K_ITEMS_TUMBLING Function in the Amazon Kinesis Data Analytics SQL Reference.

In the console, this application code is available as a template (Approximate Top-K items), which you use to quickly create the application. You need to update this template code by replacing 'column1' with 'TICKER_SYMBOL' to estimate the most frequently occurring values in a one-minute tumbling window.

You can use the following procedure to test this template using the demo stream.

To create an application

1. Complete the Getting Started exercise. For instructions, see Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46).

2. In the SQL editor, replace the application code with the Approximate Top-K items template as follows:

a. Delete the existing sample code.

b. Choose Add SQL from templates and then select the TopKItems template.

c. Update the template code by replacing the column name from COLUMN1 to 'TICKER_SYMBOL' (with single quotes around it). Also, change the number of items from 10 to 3, so that you get the top three most frequently traded stocks in each one-minute window.

3. Save and run SQL. Review results in the Real-time analytics tab in the SQL editor.

Because the window size is one minute, you need to wait to see the results. The DESTINATION_SQL_STREAM displays three columns (ROWTIME, ITEM, and ITEM_COUNT). The query emits results every minute.
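For reference, after the edits in step 2c, the template code looks similar to the following (a sketch based on the template shown earlier, with the column name and item count replaced):

CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    ITEM VARCHAR(1024),
    ITEM_COUNT DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM *
    FROM TABLE(TOP_K_ITEMS_TUMBLING(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        'TICKER_SYMBOL',  -- name of column in single quotes
        3,                -- number of top items
        60                -- tumbling window size in seconds
    ));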

Example: Counting Distinct Values (the COUNT_DISTINCT_ITEMS_TUMBLING Function)

In this exercise, you set up an Amazon Kinesis Data Analytics application to count distinct values in a one-minute tumbling window.

For the exercise, you use the demo stream, which provides a continuous flow of simulated stock trade records. The application finds the number of distinct stocks traded in a one-minute window.

Use the following application code:

CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    NUMBER_OF_DISTINCT_ITEMS BIGINT);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM *
    FROM TABLE(COUNT_DISTINCT_ITEMS_TUMBLING(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        'column1',  -- name of column in single quotes
        60          -- tumbling window size in seconds
    ));

The code uses the COUNT_DISTINCT_ITEMS_TUMBLING function to approximate the number of distinct values. For more information about the function, see COUNT_DISTINCT_ITEMS_TUMBLING Function in the Amazon Kinesis Data Analytics SQL Reference.

In the console, this application code is available as a template (Approximate distinct count), which you use to quickly create the application. You need to update this template code by replacing 'column1' with 'TICKER_SYMBOL' to estimate the number of distinct stocks traded in a one-minute tumbling window.

You can use the following procedure to test this template using the demo stream.

To create an application

1. Complete the Getting Started exercise. For instructions, see Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46).

2. Replace the application code in the SQL editor with the Approximate distinct count template. In the SQL editor, do the following:

a. Delete the existing sample code.

b. Choose Add SQL from templates and then select the Approximate distinct count template.

c. Update the template code by replacing the column name from column1 to 'TICKER_SYMBOL' (with single quotes around it).

3. Save and run SQL. Review results in the Real-time analytics tab in the SQL editor.

Because the window size is one minute, you need to wait to see the results. The DESTINATION_SQL_STREAM shows two columns (ROWTIME and NUMBER_OF_DISTINCT_ITEMS). The query emits results every minute.
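For reference, after the edit in step 2c, the template code looks similar to the following (a sketch based on the template shown earlier, with the column name replaced):

CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    NUMBER_OF_DISTINCT_ITEMS BIGINT);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM *
    FROM TABLE(COUNT_DISTINCT_ITEMS_TUMBLING(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        'TICKER_SYMBOL',  -- name of column in single quotes
        60                -- tumbling window size in seconds
    ));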

Example: Simple Alerts

In this application, the query runs continuously on the in-application stream created over the demo stream. For more information, see Continuous Queries (p. 68). If any rows show a stock price change that is greater than 1 percent, those rows are inserted in another in-application stream. In the exercise, you can configure the application output to persist the results to an external destination. You can then further investigate the results. For example, you can use an AWS Lambda function to process records and send you alerts.

To create a simple alerts application

1. Create the Amazon Kinesis Data Analytics application as described in the Getting Started Exercise.

2. In the SQL editor, replace the application code with the following:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), sector VARCHAR(12), change DOUBLE,

96

Page 103: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideExample: Simple Alerts

price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" SELECT STREAM ticker_symbol, sector, change, price FROM "SOURCE_SQL_STREAM_001" WHERE (ABS(Change / (Price - Change)) * 100) > 1;

The SELECT statement in the application code filters rows in the SOURCE_SQL_STREAM_001 for stock price changes greater than 1 percent, and inserts those rows into another in-application stream, DESTINATION_SQL_STREAM, using a pump. For example, a record with price 101 and change 2 yields ABS(2 / (101 - 2)) * 100 ≈ 2.02, which is greater than 1, so the row passes the filter. For more information about the coding pattern of using pumps to insert rows into in-application streams, see Application Code (p. 30).

3. Choose Save and run SQL.

4. Add a destination. You can either choose the Destination tab in the SQL editor, or choose Add a destination on the application hub.

a. In SQL editor, choose the Destination tab and then choose Add a destination.

On the Add a destination page, choose Configure a new stream.

b. Choose Go to Kinesis Streams.

c. In the Amazon Kinesis Data Streams console, create a new Kinesis stream (for example, gs-destination) with 1 shard. Wait until the stream status is ACTIVE.

d. Return to the Amazon Kinesis Data Analytics console. On the Destination page, choose the stream that you created.

If the stream does not show, refresh the page.


e. Choose Save and continue.

Now you have an external destination, a Kinesis stream, where Amazon Kinesis Data Analytics persists your application output in the DESTINATION_SQL_STREAM in-application stream.

5. Configure AWS Lambda to monitor the Kinesis stream that you created and invoke a Lambda function.

For instructions, see Preprocessing Data Using a Lambda Function (p. 20).
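As a starting point, a minimal sketch of such a Lambda function in Python is shown following. The function name is a placeholder, and the sketch assumes that the application output format is JSON; replace the print call with your own notification logic, for example publishing to an Amazon SNS topic.

import base64
import json

def lambda_handler(event, context):
    # Each invocation delivers a batch of records from the Kinesis stream
    for record in event['Records']:
        # Kinesis record payloads arrive base64-encoded
        payload = base64.b64decode(record['kinesis']['data'])
        # Assumes the application output format is JSON
        alert = json.loads(payload)
        # Replace this with your notification logic (for example, Amazon SNS)
        print("Price change alert: {}".format(alert))
    return "Processed {} records".format(len(event['Records']))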


Example: Throttled Alerts

In this application, the query runs continuously on the in-application stream created over the demo stream. For more information, see Continuous Queries (p. 68). If any rows show a stock price change that is greater than 1 percent, those rows are inserted in another in-application stream. The application throttles the alerts such that an alert is sent immediately when the stock price changes, but no more than one alert per minute per stock symbol is sent to the in-application stream.

To create a throttled alerts application

1. Create the Amazon Kinesis Data Analytics application as described in the Getting Started Exercise.

2. In the SQL editor, replace the application code with the following:

CREATE OR REPLACE STREAM "CHANGE_STREAM" (ticker_symbol VARCHAR(4), sector VARCHAR(12), change DOUBLE, price DOUBLE);

CREATE OR REPLACE PUMP "change_pump" AS INSERT INTO "CHANGE_STREAM" SELECT STREAM ticker_symbol, sector, change, price FROM "SOURCE_SQL_STREAM_001" WHERE (ABS(Change / (Price - Change)) * 100) > 1; -- ** Trigger Count and Limit **-- Counts "triggers" or those values that evaluated true against the previous where clause-- Then provides its own limit on the number of triggers per hour per ticker symbol to what-- is specified in the WHERE clause

CREATE OR REPLACE STREAM TRIGGER_COUNT_STREAM ( ticker_symbol VARCHAR(4), change REAL, trigger_count INTEGER);

CREATE OR REPLACE PUMP trigger_count_pump AS INSERT INTO TRIGGER_COUNT_STREAMSELECT STREAM ticker_symbol, change, trigger_countFROM ( SELECT STREAM ticker_symbol, change, COUNT(*) OVER W1 as trigger_count FROM "CHANGE_STREAM" --window to perform aggregations over last minute to keep track of triggers WINDOW W1 AS (PARTITION BY ticker_symbol RANGE INTERVAL '1' MINUTE PRECEDING))WHERE trigger_count >= 1;

The SELECT statement in the application code filters rows in the SOURCE_SQL_STREAM_001 for stock price changes greater than 1 percent, and inserts those rows into another in-application stream, CHANGE_STREAM, using a pump.

The application then creates a second stream called TRIGGER_COUNT_STREAM for the throttled alerts. A second query selects records from a window that hops forward every time a record is admitted into it, such that only one record per stock ticker per minute is written to the stream.

3. Choose Save and run SQL.

The preceding example outputs rows to TRIGGER_COUNT_STREAM, with at most one alert per minute for each ticker symbol.


Examples: Advanced Analytics

This section provides additional examples of Amazon Kinesis Data Analytics applications. These include using the RANDOM_CUT_FOREST function to assign anomaly scores to your stream data. You can then evaluate the anomaly scores to determine whether the data is anomalous and perhaps take additional action. In addition, this section provides examples of using different types of times in analytics.

Topics

• Example: Detecting Data Anomalies on a Stream (the RANDOM_CUT_FOREST Function) (p. 99)

• Example: Detecting Data Anomalies and Getting an Explanation (RANDOM_CUT_FOREST_WITH_EXPLANATION Function) (p. 104)

• Example: Detecting Hotspots on a Stream (HOTSPOTS Function) (p. 108)

• Example: Using Different Types of Times in Streaming Analytics (p. 118)

Example: Detecting Data Anomalies on a Stream (the RANDOM_CUT_FOREST Function)

Amazon Kinesis Data Analytics provides a function (RANDOM_CUT_FOREST) that can assign an anomaly score to each record based on values in the numeric columns. For more information, see RANDOM_CUT_FOREST Function in the Amazon Kinesis Data Analytics SQL Reference. In this exercise, you write application code to assign an anomaly score to records on your application's streaming source. You do the following to set up the application:

1. Set up a streaming source – You set up a Kinesis stream and write sample heartRate data as shown following:

{"heartRate": 60, "rateType":"NORMAL"}...{"heartRate": 180, "rateType":"HIGH"}

The walkthrough provides a Python script for you to populate the stream. The heartRate values are randomly generated, with 99% of the records having heartRate values between 60 and 100, and only 1% of heartRate values between 150 and 200. Thus, records with heartRate values between 150 and 200 are anomalies.


2. Configure input – Using the console, create an Amazon Kinesis Data Analytics application, and configure application input by mapping the streaming source to an in-application stream (SOURCE_SQL_STREAM_001). When the application starts, Amazon Kinesis Data Analytics continuously reads the streaming source and inserts records into the in-application stream.

3. Specify application code – Use the following application code:

--Creates a temporary stream.
CREATE OR REPLACE STREAM "TEMP_STREAM" (
    "heartRate"     INTEGER,
    "rateType"      varchar(20),
    "ANOMALY_SCORE" DOUBLE);

--Creates another stream for application output.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "heartRate"     INTEGER,
    "rateType"      varchar(20),
    "ANOMALY_SCORE" DOUBLE);

-- Compute an anomaly score for each record in the input stream
-- using Random Cut Forest
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "TEMP_STREAM"
    SELECT STREAM "heartRate", "rateType", ANOMALY_SCORE
    FROM TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));

-- Sort records by descending anomaly score, insert into output stream
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM * FROM "TEMP_STREAM"
    ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;

The code reads rows in the SOURCE_SQL_STREAM_001, assigns an anomaly score, and writes the resulting rows to another in-application stream (TEMP_STREAM). The application code then sorts the records in the TEMP_STREAM and saves the results to another in-application stream (DESTINATION_SQL_STREAM). Note that you use pumps to insert rows in in-application streams. For more information, see In-Application Streams and Pumps (p. 65).

4. Configure output – You configure the application output to persist data in the DESTINATION_SQL_STREAM to an external destination, which is another Kinesis stream. Reviewing the anomaly scores assigned to each record and determining what score indicates an anomaly (and you need to be alerted) is external to the application. You can use a Lambda function to process these anomaly scores and configure alerts.

The exercise uses the US East (N. Virginia) (us-east-1) AWS Region to create these streams and your application. If you use any other Region, you need to update the code accordingly.

Next Step

Step 1: Prepare (p. 100)

Step 1: Prepare

Before you create an Amazon Kinesis Data Analytics application for this exercise, you create two Kinesis streams. You configure one of the streams as the streaming source for your application, and the other stream as the destination where Amazon Kinesis Data Analytics persists your application output.

Step 1.1: Create Two Kinesis Streams

In this section, you create two Kinesis streams (ExampleInputStream and ExampleOutputStream).


1. You can create these streams using the console or the AWS CLI.

• Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

• Choose Kinesis Stream, and then create a stream with one shard.

• Alternatively, use the following Kinesis create-stream CLI command to create the first stream (ExampleInputStream):

$ aws kinesis create-stream \
--stream-name ExampleInputStream \
--shard-count 1 \
--region us-east-1 \
--profile adminuser

2. Run the same command, changing the stream name to ExampleOutputStream, to create the second stream that the application will use to write output.

Step 1.2: Write Sample Records to the Input Stream

In this step, you run Python code to continuously generate sample records and write them to the ExampleInputStream stream.

{"heartRate": 60, "rateType":"NORMAL"} ...{"heartRate": 180, "rateType":"HIGH"}

The code writes these records to the ExampleInputStream stream.

1. Install Python and pip.

For information about installing Python, see the Python website.

You can install dependencies using pip. For information about installing pip, see Installation on the pip website.
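For example, the script in the next step uses the boto library, so installing it with pip (assuming pip is on your path) looks like the following:

$ pip install boto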

2. Run the following Python code. The put_record call in the code writes the JSON records to the stream.

import json
from boto import kinesis
import random

kinesis = kinesis.connect_to_region("us-east-1")

# generate normal heart rate with probability .99
def getNormalHeartRate():
    data = {}
    data['heartRate'] = random.randint(60, 100)
    data['rateType'] = "NORMAL"
    return data

# generate high heart rate with probability .01 (very few)
def getHighHeartRate():
    data = {}
    data['heartRate'] = random.randint(150, 200)
    data['rateType'] = "HIGH"
    return data

while True:
    rnd = random.random()
    if (rnd < 0.01):
        data = json.dumps(getHighHeartRate())
        print(data)
        kinesis.put_record("ExampleInputStream", data, "partitionkey")
    else:
        data = json.dumps(getNormalHeartRate())
        print(data)
        kinesis.put_record("ExampleInputStream", data, "partitionkey")

Next Step

Step 2: Create an Application (p. 102)

Step 2: Create an Application

In this section, you create an Amazon Kinesis Data Analytics application as follows:

• Configure the application input to use the Kinesis stream you created in the preceding section as the streaming source.

• Use the Anomaly Detection template in the console.

To create an application

1. Follow steps 1, 2, and 3 in the Getting Started exercise (see Step 3.1: Create an Application (p. 48)) to create an application. Note the following:

• In the source configuration, do the following:

• Specify the streaming source you created in the preceding section.

• After the console infers the schema, edit the schema and set the heartRate column type to INTEGER.

Most of the heart rate values are normal, and the discovery process will most likely assign the TINYINT type to this column. But a small percentage of the values show a high heart rate. If these high values don't fit in the TINYINT type, Amazon Kinesis Data Analytics sends those rows to the error stream. Update the data type to INTEGER so that it can accommodate all of the generated heart rate data.

• Use the Anomaly Detection template in the console. You then update the template code to provide the appropriate column names.

2. Update the application code by providing column names. The resulting application code is shown following (you can paste this code into the SQL editor):

--Creates a temporary stream.
CREATE OR REPLACE STREAM "TEMP_STREAM" (
    "heartRate"     INTEGER,
    "rateType"      varchar(20),
    "ANOMALY_SCORE" DOUBLE);

--Creates another stream for application output.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "heartRate"     INTEGER,
    "rateType"      varchar(20),
    "ANOMALY_SCORE" DOUBLE);

-- Compute an anomaly score for each record in the input stream
-- using Random Cut Forest
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "TEMP_STREAM"
    SELECT STREAM "heartRate", "rateType", ANOMALY_SCORE
    FROM TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));

-- Sort records by descending anomaly score, insert into output stream
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM * FROM "TEMP_STREAM"
    ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;

3. Run the SQL code and review the results.

Next Step

Step 3: Configure Application Output (p. 103)

Step 3: Configure Application Output

At this time, you have application code reading heart rate data from a streaming source and assigning an anomaly score to each record. You can now send the application result from the in-application stream to an external destination, another Kinesis stream (OutputStreamTestingAnomalyScores). You can then analyze the anomaly scores and determine which heart rates are anomalous. You can extend this application further to generate alerts. Follow these steps to configure application output:

1. In the SQL editor, choose either Destination or Add a destination in the application dashboard.

2. On the Add a destination page, choose Select from your streams, and then choose the OutputStreamTestingAnomalyScores stream you created in the preceding section.

Now you have an external destination, where Amazon Kinesis Data Analytics persists any records your application writes to the in-application stream DESTINATION_SQL_STREAM.

3. You can optionally configure AWS Lambda to monitor the OutputStreamTestingAnomalyScores stream and send you alerts. For instructions, see Preprocessing Data Using a Lambda Function (p. 20). If not, you can review the records that Amazon Kinesis Data Analytics writes to the external destination, the Kinesis stream OutputStreamTestingAnomalyScores, as described in the next step.

Next Step

Step 4: Verify Output (p. 104)


Step 4: Verify Output

In this step, you use the following AWS CLI commands to read records in the destination stream written by the application:

1. Run the get-shard-iterator command to get a pointer to data on the output stream.

aws kinesis get-shard-iterator \
--shard-id shardId-000000000000 \
--shard-iterator-type TRIM_HORIZON \
--stream-name OutputStreamTestingAnomalyScores \
--region us-east-1 \
--profile adminuser

You get a response with a shard iterator value, as shown in the following example response:

{
    "ShardIterator": "shard-iterator-value"
}

Copy the shard iterator value.

2. Run the CLI get-records command.

aws kinesis get-records \
--shard-iterator shard-iterator-value \
--region us-east-1 \
--profile adminuser

The command returns a page of records and another shard iterator that you can use in the subsequent get-records command to fetch the next set of records.
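For example, a minimal paging loop (a sketch that assumes a single-shard stream and that the jq tool is installed to parse the JSON responses) looks like the following:

SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
  --shard-id shardId-000000000000 \
  --shard-iterator-type TRIM_HORIZON \
  --stream-name OutputStreamTestingAnomalyScores \
  --region us-east-1 \
  --query 'ShardIterator' --output text)

while true; do
  RESULT=$(aws kinesis get-records --shard-iterator "$SHARD_ITERATOR" --region us-east-1)
  # Record data is base64-encoded in the get-records response
  echo "$RESULT" | jq -r '.Records[].Data' | while read -r DATA; do
    echo "$DATA" | base64 --decode
    echo
  done
  SHARD_ITERATOR=$(echo "$RESULT" | jq -r '.NextShardIterator')
  sleep 1
done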

Example: Detecting Data Anomalies and Getting an Explanation (RANDOM_CUT_FOREST_WITH_EXPLANATION Function)

Amazon Kinesis Data Analytics provides the RANDOM_CUT_FOREST_WITH_EXPLANATION function, which assigns an anomaly score to each record based on values in the numeric columns. The function also provides an explanation of the anomaly. For more information, see RANDOM_CUT_FOREST_WITH_EXPLANATION.

In this exercise, you write application code to obtain anomaly scores for records in your application's streaming source. You also obtain an explanation for each anomaly.

First Step

Step 1: Prepare the Data (p. 104)

Step 1: Prepare the Data

Before you create an Amazon Kinesis Data Analytics application for this example (p. 104), you create a Kinesis data stream to use as the streaming source for your application. You also run Python code to write simulated blood pressure data to the stream.


Step 1.1: Create a Kinesis Data Stream

In this section, you create a Kinesis data stream named ExampleInputStream. You can create this data stream using the AWS Management Console or the AWS CLI.

• To use the console:

1. Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.

2. Go to the Data Streams dashboard, and choose Create Kinesis stream.

3. For the name, type ExampleInputStream. For the number of shards, type 1.

• Alternatively, to use the AWS CLI to create the data stream, run the following command:

$ aws kinesis create-stream --stream-name ExampleInputStream --shard-count 1

Step 1.2: Write Sample Records to the Input Stream

In this step, you run Python code to continuously generate sample records and write them to the data stream you created.

1. Install Python and pip.

For information about installing Python, see Python.

You can install dependencies using pip. For information about installing pip, see Installation in the pip documentation.

2. Run the following Python code. You can change the Region to the one you want to use for this example. The put_record call in the code writes the JSON records to the stream.

import json
from boto import kinesis
import random

kinesis = kinesis.connect_to_region("us-east-1")

# generate normal blood pressure with a 0.995 probability
def getNormalBloodPressure():
    data = {}
    data['Systolic'] = random.randint(90, 120)
    data['Diastolic'] = random.randint(60, 80)
    data['BloodPressureLevel'] = 'NORMAL'
    return data

# generate high blood pressure with probability 0.005
def getHighBloodPressure():
    data = {}
    data['Systolic'] = random.randint(130, 200)
    data['Diastolic'] = random.randint(90, 150)
    data['BloodPressureLevel'] = 'HIGH'
    return data

# generate low blood pressure with probability 0.005
def getLowBloodPressure():
    data = {}
    data['Systolic'] = random.randint(50, 80)
    data['Diastolic'] = random.randint(30, 50)
    data['BloodPressureLevel'] = 'LOW'
    return data

while True:
    rnd = random.random()
    if (rnd < 0.005):
        data = json.dumps(getLowBloodPressure())
        print(data)
        kinesis.put_record("ExampleInputStream", data, "partitionkey")
    elif (rnd > 0.995):
        data = json.dumps(getHighBloodPressure())
        print(data)
        kinesis.put_record("ExampleInputStream", data, "partitionkey")
    else:
        data = json.dumps(getNormalBloodPressure())
        print(data)
        kinesis.put_record("ExampleInputStream", data, "partitionkey")

The previous code writes to ExampleInputStream records similar to the following examples:

{"Systolic": 109, "Diastolic": 64, "BloodPressureLevel": "NORMAL"}{"Systolic": 99, "Diastolic": 72, "BloodPressureLevel": "NORMAL"}{"Systolic": 159, "Diastolic": 100, "BloodPressureLevel": "HIGH"}{"Systolic": 94, "Diastolic": 75, "BloodPressureLevel": "NORMAL"}{"Systolic": 91, "Diastolic": 78, "BloodPressureLevel": "NORMAL"}{"Systolic": 91, "Diastolic": 74, "BloodPressureLevel": "NORMAL"}{"Systolic": 102, "Diastolic": 75, "BloodPressureLevel": "NORMAL"}{"Systolic": 50, "Diastolic": 31, "BloodPressureLevel": "LOW"}{"Systolic": 100, "Diastolic": 66, "BloodPressureLevel": "NORMAL"}{"Systolic": 115, "Diastolic": 65, "BloodPressureLevel": "NORMAL"}{"Systolic": 99, "Diastolic": 74, "BloodPressureLevel": "NORMAL"}

Next Step

Step 2: Create an Analytics Application (p. 106)

Step 2: Create an Analytics Application

In this section, you create an Amazon Kinesis data analytics application and configure it to use the Kinesis data stream you created in the preceding section (p. 104) as the streaming source. You then run application code that uses the RANDOM_CUT_FOREST_WITH_EXPLANATION function.

To create an application

1. Open the Kinesis console at https://console.aws.amazon.com/kinesis.

2. Go to the Amazon Kinesis Data Analytics dashboard, and choose Create application.

3. Provide an application name and description (optional), and choose Create application.

4. Choose Connect to a source, and then choose ExampleInputStream from the list.

5. Choose Discover schema, and make sure that Systolic and Diastolic appear as INTEGER columns. If they have another type, choose Edit schema, and assign the type INTEGER to both of them.

6. Under Real time analytics, choose Go to SQL editor. When prompted, choose to run your application.

7. Paste the following code into the SQL editor, and then choose Save and run SQL.

--Creates a temporary stream.
CREATE OR REPLACE STREAM "TEMP_STREAM" (
    "Systolic"            INTEGER,
    "Diastolic"           INTEGER,
    "BloodPressureLevel"  varchar(20),
    "ANOMALY_SCORE"       DOUBLE,
    "ANOMALY_EXPLANATION" varchar(512));

--Creates another stream for application output.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "Systolic"            INTEGER,
    "Diastolic"           INTEGER,
    "BloodPressureLevel"  varchar(20),
    "ANOMALY_SCORE"       DOUBLE,
    "ANOMALY_EXPLANATION" varchar(512));

-- Compute an anomaly score with explanation for each record in the input stream
-- using RANDOM_CUT_FOREST_WITH_EXPLANATION
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "TEMP_STREAM"
    SELECT STREAM "Systolic", "Diastolic", "BloodPressureLevel", ANOMALY_SCORE, ANOMALY_EXPLANATION
    FROM TABLE(RANDOM_CUT_FOREST_WITH_EXPLANATION(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"), 100, 256, 100000, 1, true));

-- Sort records by descending anomaly score, insert into output stream
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM * FROM "TEMP_STREAM"
    ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;

Next Step

Step 3: Examine the Results (p. 107)

Step 3: Examine the Results

When you run the SQL code for this example (p. 104), you first see rows with an anomaly score equal to zero. This happens during the initial learning phase. Then you get results similar to the following.

ROWTIME  SYSTOLIC  DIASTOLIC  BLOODPRESSURELEVEL  ANOMALY_SCORE  ANOMALY_EXPLANATION
27:49.0  101       66         NORMAL              0.711460417    {"Systolic":{"DIRECTION":"LOW","STRENGTH":"0.0922","ATTRIBUTION_SCORE":"0.3792"},"Diastolic":{"DIRECTION":"HIGH","STRENGTH":"0.0210","ATTRIBUTION_SCORE":"0.3323"}}
27:50.0  144       123        HIGH                3.855851061    {"Systolic":{"DIRECTION":"HIGH","STRENGTH":"0.8567","ATTRIBUTION_SCORE":"1.7447"},"Diastolic":{"DIRECTION":"HIGH","STRENGTH":"7.0982","ATTRIBUTION_SCORE":"2.1111"}}
27:50.0  113       69         NORMAL              0.740069409    {"Systolic":{"DIRECTION":"LOW","STRENGTH":"0.0549","ATTRIBUTION_SCORE":"0.3750"},"Diastolic":{"DIRECTION":"LOW","STRENGTH":"0.0394","ATTRIBUTION_SCORE":"0.3650"}}
27:50.0  105       64         NORMAL              0.739644157    {"Systolic":{"DIRECTION":"HIGH","STRENGTH":"0.0245","ATTRIBUTION_SCORE":"0.3667"},"Diastolic":{"DIRECTION":"LOW","STRENGTH":"0.0524","ATTRIBUTION_SCORE":"0.3729"}}
27:50.0  100       65         NORMAL              0.736993425    {"Systolic":{"DIRECTION":"HIGH","STRENGTH":"0.0203","ATTRIBUTION_SCORE":"0.3516"},"Diastolic":{"DIRECTION":"LOW","STRENGTH":"0.0454","ATTRIBUTION_SCORE":"0.3854"}}
27:50.0  108       69         NORMAL              0.733767202    {"Systolic":{"DIRECTION":"LOW","STRENGTH":"0.0974","ATTRIBUTION_SCORE":"0.3961"},"Diastolic":{"DIRECTION":"LOW","STRENGTH":"0.0189","ATTRIBUTION_SCORE":"0.3377"}}

• The algorithm in the RANDOM_CUT_FOREST_WITH_EXPLANATION function sees that the Systolic and Diastolic columns are numeric, and uses them as input.


• The BloodPressureLevel column has text data, and is therefore not taken into account by the algorithm. This column is simply a visual aid to help you quickly spot the normal, high, and low blood pressure levels in this example.

• In the ANOMALY_SCORE column, records with higher scores are more anomalous. The second record in this sample set of results is the most anomalous, with an anomaly score of 3.855851061.

• To understand the extent to which each of the numeric columns taken into account by the algorithm contributes to the anomaly score, consult the JSON field named ATTRIBUTION_SCORE in the ANOMALY_EXPLANATION column. In the case of the second row in this set of sample results, the Systolic and Diastolic columns contribute to the anomaly in the ratio of 1.7447:2.1111. In other words, 45 percent of the explanation for the anomaly score is attributable to the systolic value (1.7447 / (1.7447 + 2.1111) ≈ 0.45), and the remaining attribution is due to the diastolic value.

• To determine the direction in which the point represented by the second row in this sample is anomalous, consult the JSON field named DIRECTION. Both the diastolic and systolic values are marked as HIGH in this case. To determine the confidence with which these directions are correct, consult the JSON field named STRENGTH. In this example, the algorithm is more confident that the diastolic value is high. Indeed, the normal value for the diastolic reading is usually 60–80, and 123 is much higher than expected.

Example: Detecting Hotspots on a Stream (HOTSPOTS Function)

Amazon Kinesis Data Analytics provides a function (HOTSPOTS) that can locate and return information about relatively dense regions in your data. For more information, see HOTSPOTS in the Amazon Kinesis Data Analytics SQL Reference.

In this exercise, you write application code to locate hotspots on your application's streaming source. You do the following to set up the application:

1. Set up a streaming source – Set up a Kinesis stream and write sample coordinate data as shown following:

{"x": 7.921782426109737, "y": 8.746265312709893, "is_hot": "N"}{"x": 0.722248626528026, "y": 4.648868803193405, "is_hot": "Y"}

The example provides a Python script for you to populate the stream. The x and y values are randomly generated, with some records being clustered around certain locations.

The is_hot field indicates whether the script intentionally generated the value as part of a hotspot. This can help you evaluate whether the hotspot detection function is working properly.

2. Create the application – Using the AWS Management Console, create a Kinesis Data Analytics application. Configure the application input by mapping the streaming source to an in-application stream (SOURCE_SQL_STREAM_001). When the application starts, Kinesis Data Analytics continuously reads the streaming source and inserts records into the in-application stream.

Use the following application code for the application:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ( "x" DOUBLE, "y" DOUBLE, "is_hot" VARCHAR(4), HOTSPOTS_RESULT VARCHAR(10000)); CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" SELECT "x", "y", "is_hot", "HOTSPOTS_RESULT"

108

Page 115: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideExample: Detect Hotspots

FROM TABLE ( HOTSPOTS( CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"), 1000, 0.2, 17) );

The code reads rows in the SOURCE_SQL_STREAM_001, analyzes them for significant hotspots, and writes the resulting data to another in-application stream (DESTINATION_SQL_STREAM). You use pumps to insert rows in in-application streams. For more information, see In-Application Streams and Pumps (p. 65).

3. Configure the output – Configure the application output to send data from the application to an external destination, which is another Kinesis data stream. Review the hotspot scores and determine what scores indicate that a hotspot occurred (and that you need to be alerted). You can use an AWS Lambda function to further process hotspot information and configure alerts.

4. Verify the output – The example includes a JavaScript application that reads data from the output stream and displays it graphically, so you can view the hotspots that the application generates in real time.

The exercise uses the US West (Oregon) (us-west-2) AWS Region to create these streams and your application. If you use any other Region, update the code accordingly.

Topics

• Step 1: Create the Input and Output Streams (p. 109)

• Step 2: Create the Kinesis Data Analytics Application (p. 112)

• Step 3: Configure the Application Output (p. 113)

• Step 4: Verify the Application Output (p. 113)

Step 1: Create the Input and Output Streams

Before you create an Amazon Kinesis Data Analytics application for the Hotspots example (p. 108), you create two Kinesis data streams. Configure one of the streams as the streaming source for your application, and the other stream as the destination where Kinesis Data Analytics persists your application output.

Step 1.1: Create the Kinesis Data Streams

In this section, you create two Kinesis data streams: ExampleInputStream and ExampleOutputStream.

1. Create these data streams using the console or the AWS CLI.

• To create the data streams using the console:

1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. Choose Kinesis Stream, and then create a stream with one shard called ExampleInputStream.

3. Repeat the previous step, creating a stream with one shard called ExampleOutputStream.

• To create data streams using the AWS CLI:

1. Create streams (ExampleInputStream and ExampleOutputStream) using the following Kinesis create-stream AWS CLI commands:


$ aws kinesis create-stream \
--stream-name ExampleInputStream \
--shard-count 1 \
--region us-west-2 \
--profile adminuser

$ aws kinesis create-stream \
--stream-name ExampleOutputStream \
--shard-count 1 \
--region us-west-2 \
--profile adminuser

2. To create the second stream, which the application will use to write output, run the same command, changing the stream name to ExampleOutputStream.

Step 1.2: Write Sample Records to the Input Stream

In this step, you run Python code to continuously generate sample records and write them to the ExampleInputStream stream.

{"x": 7.921782426109737, "y": 8.746265312709893, "is_hot": "N"}{"x": 0.722248626580026, "y": 4.648868803193405, "is_hot": "Y"}

The code writes these records to the ExampleInputStream stream.

1. Install Python and pip.

For information about installing Python, see the Python website.

You can install dependencies using pip. For information about installing pip, see Installation on the pip website.
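For example, the script in the next step uses the boto3 library, so installing it with pip (assuming pip is on your path) looks like the following:

$ pip install boto3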

2. Run the following Python code. This code does the following:

• A potential hotspot is generated somewhere in the (X, Y) plane.

• A set of 1000 points is generated for each hotspot. Of these points, 20 percent are clustered around the hotspot. The rest are generated randomly within the entire space.

• The put_records call writes the JSON records to the stream.

Note
Do not upload this file to a web server, as it contains your AWS credentials.

import boto3
import json
import time

from random import random

# Modify this section to reflect your AWS configuration
awsRegion = ""  # The AWS region where your Kinesis Analytics application is configured.
accessKeyId = ""  # Your AWS Access Key ID
secretAccessKey = ""  # Your AWS Secret Access Key
inputStream = "ExampleInputStream"  # The name of the stream being used as input into the Kinesis Analytics hotspots application

# Variables that control properties of the generated data
xRange = [0, 10]  # The range of values taken by the x-coordinate
yRange = [0, 10]  # The range of values taken by the y-coordinate
hotspotSideLength = 1  # The side length of the hotspot
hotspotWeight = 0.2  # The fraction of points that are drawn from the hotspot

def generate_point_in_rectangle(x_min, width, y_min, height):
    """Generate points uniformly in the given rectangle."""
    return {
        'x': x_min + random() * width,
        'y': y_min + random() * height
    }

class RecordGenerator(object):
    """A class used to generate points used as input to the hotspot detection
    algorithm. With probability hotspotWeight, a point is drawn from a hotspot,
    otherwise it is drawn from the base distribution. The location of the
    hotspot changes after every 1000 points generated."""

    def __init__(self):
        self.x_min = xRange[0]
        self.width = xRange[1] - xRange[0]
        self.y_min = yRange[0]
        self.height = yRange[1] - yRange[0]
        self.points_generated = 0
        self.hotspot_x_min = None
        self.hotspot_y_min = None

    def get_record(self):
        if self.points_generated % 1000 == 0:
            self.update_hotspot()

        if random() < hotspotWeight:
            record = generate_point_in_rectangle(self.hotspot_x_min, hotspotSideLength,
                                                 self.hotspot_y_min, hotspotSideLength)
            record['is_hot'] = 'Y'
        else:
            record = generate_point_in_rectangle(self.x_min, self.width, self.y_min, self.height)
            record['is_hot'] = 'N'

        self.points_generated += 1
        data = json.dumps(record)
        return {'Data': bytes(data, 'utf-8'), 'PartitionKey': 'partition_key'}

    def get_records(self, n):
        return [self.get_record() for _ in range(n)]

    def update_hotspot(self):
        self.hotspot_x_min = self.x_min + random() * (self.width - hotspotSideLength)
        self.hotspot_y_min = self.y_min + random() * (self.height - hotspotSideLength)

def main():
    kinesis = boto3.client("kinesis",
                           region_name=awsRegion,
                           aws_access_key_id=accessKeyId,
                           aws_secret_access_key=secretAccessKey)

    generator = RecordGenerator()
    batch_size = 10

    while True:
        records = generator.get_records(batch_size)
        kinesis.put_records(StreamName=inputStream, Records=records)

        time.sleep(0.1)

if __name__ == "__main__":
    main()

Next Step

Step 2: Create the Kinesis Data Analytics Application (p. 112)

Step 2: Create the Kinesis Data Analytics Application

In this section of the Hotspots example (p. 108), you create an Amazon Kinesis Data Analytics application as follows:

• Configure the application input to use the Kinesis data stream you created as the streaming source in Step 1 (p. 109).

• Use the provided application code in the AWS Management Console.

To create an application

1. Create a Kinesis Data Analytics application by following steps 1, 2, and 3 in Create Your Starter Amazon Kinesis Data Analytics Application (p. 46) (see Step 3.1: Create an Application (p. 48)).

In the source configuration, do the following:

• Specify the streaming source you created in the section called “Step 1: Create Streams” (p. 109).

• After the console infers the schema, edit the schema. Ensure that the x and y column types are set to DOUBLE and that the IS_HOT column type is set to VARCHAR.

2. Use the following application code (you can paste this code into the SQL editor):

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ( "x" DOUBLE, "y" DOUBLE, "is_hot" VARCHAR(4), HOTSPOTS_RESULT VARCHAR(10000)); CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" SELECT "x", "y", "is_hot", "HOTSPOTS_RESULT" FROM TABLE ( HOTSPOTS( CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"), 1000, 0.2, 17) );

3. Run the SQL code and review the results.


Next Step

Step 3: Configure the Application Output (p. 113)

Step 3: Configure the Application Output

At this point in the Hotspots example (p. 108), you have Amazon Kinesis Data Analytics application code discovering significant hotspots from a streaming source and assigning a heat score to each.

You can now send the application result from the in-application stream to an external destination, which is another Kinesis data stream (ExampleOutputStream). You can then analyze the hotspot scores and determine what an appropriate threshold is for hotspot heat. You can extend this application further to generate alerts.

To configure the application output

1. Open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.

2. In the SQL editor, choose either Destination or Add a destination in the application dashboard.

3. On the Add a destination page, choose Select from your streams, and then choose the ExampleOutputStream stream that you created in the preceding section.

Now you have an external destination, where Amazon Kinesis Data Analytics persists any records your application writes to the in-application stream DESTINATION_SQL_STREAM.

4. You can optionally configure AWS Lambda to monitor the ExampleOutputStream stream and send you alerts. For more information, see Using a Lambda Function as Output (p. 33). You can also review the records that Kinesis Data Analytics writes to the external destination, which is the Kinesis stream ExampleOutputStream, as described in Step 4: Verify the Application Output (p. 113).

Next Step

Step 4: Verify the Application Output (p. 113)

Step 4: Verify the Application Output

In this section of the Hotspots example (p. 108), you set up a web application that displays the hotspot information in a Scalable Vector Graphics (SVG) control.

1. Create a file named index.html with the following contents:

<!doctype html>
<html lang=en>

<head>
    <meta charset=utf-8>
    <title>hotspots viewer</title>

    <style>
    #visualization {
        display: block;
        margin: auto;
    }

    .point {
        opacity: 0.2;
    }

    .hot {
        fill: red;
    }

    .cold {
        fill: blue;
    }

    .hotspot {
        stroke: black;
        stroke-opacity: 0.8;
        stroke-width: 1;
        fill: none;
    }
    </style>
    <script src="https://sdk.amazonaws.com/js/aws-sdk-2.202.0.min.js"></script>
    <script src="https://d3js.org/d3.v4.min.js"></script>
</head>
<body>
<svg id="visualization" width="600" height="600"></svg>
<script src="hotspots_viewer.js"></script>
</body>
</html>

2. Create a file in the same directory named hotspots_viewer.js with the following contents. Provide your AWS Region, credentials, and output stream name in the variables provided.

// Visualize example output from the Kinesis Analytics hotspot detection algorithm.
// This script assumes that the output stream has a single shard.

// Modify this section to reflect your AWS configuration
var awsRegion = "",        // The AWS region where your Kinesis Analytics application is configured.
    accessKeyId = "",      // Your AWS Access Key ID
    secretAccessKey = "",  // Your AWS Secret Access Key
    hotspotsStream = "";   // The name of the Kinesis Stream where the output from the HOTSPOTS function is being written

// The variables in this section should reflect the way input data was generated and the parameters that the HOTSPOTS
// function was called with.
var windowSize = 1000,  // The window size used for hotspot detection
    minimumHeat = 20,   // A filter applied to returned hotspots before visualization
    xRange = [0, 10],   // The range of values to display on the x-axis
    yRange = [0, 10];   // The range of values to display on the y-axis

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// D3 setup
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

var svg = d3.select("svg"),
    margin = {"top": 20, "right": 20, "bottom": 20, "left": 20},
    graphWidth = +svg.attr("width") - margin.left - margin.right,
    graphHeight = +svg.attr("height") - margin.top - margin.bottom;

// Return the linear function that maps the segment [a, b] to the segment [c, d].
function linearScale(a, b, c, d) {
    var m = (d - c) / (b - a);
    return function(x) {
        return c + m * (x - a);
    };
}

// helper functions to extract the x-value from a stream record and scale it for output
var xValue = function(r) { return r.x; },
    xScale = linearScale(xRange[0], xRange[1], 0, graphWidth),
    xMap = function(r) { return xScale(xValue(r)); };

// helper functions to extract the y-value from a stream record and scale it for output
var yValue = function(r) { return r.y; },
    yScale = linearScale(yRange[0], yRange[1], 0, graphHeight),
    yMap = function(r) { return yScale(yValue(r)); };

// a helper function that assigns a CSS class to a point based on whether it was generated as part of a hotspot
var classMap = function(r) { return r.is_hot == "Y" ? "point hot" : "point cold"; };

var g = svg.append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

function update(records, hotspots) {

    var points = g.selectAll("circle")
        .data(records, function(r) { return r.dataIndex; });

    points.enter().append("circle")
        .attr("class", classMap)
        .attr("r", 3)
        .attr("cx", xMap)
        .attr("cy", yMap);

    points.exit().remove();

    if (hotspots) {
        var boxes = g.selectAll("rect").data(hotspots);

        boxes.enter().append("rect")
            .merge(boxes)
            .attr("class", "hotspot")
            .attr("x", function(h) { return xScale(h.minValues[0]); })
            .attr("y", function(h) { return yScale(h.minValues[1]); })
            .attr("width", function(h) { return xScale(h.maxValues[0]) - xScale(h.minValues[0]); })
            .attr("height", function(h) { return yScale(h.maxValues[1]) - yScale(h.minValues[1]); });

        boxes.exit().remove();
    }
}

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Use the AWS SDK to pull output records from Kinesis and update the visualization
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

var kinesis = new AWS.Kinesis({
    "region": awsRegion,
    "accessKeyId": accessKeyId,
    "secretAccessKey": secretAccessKey
});

var textDecoder = new TextDecoder("utf-8");

// Decode an output record into an object and assign it an index value
function decodeRecord(record, recordIndex) {
    var record = JSON.parse(textDecoder.decode(record.Data));
    var hotspots_result = JSON.parse(record.hotspots_result);
    record.hotspots = hotspots_result.hotspots
        .filter(function(hotspot) { return hotspot.heat >= minimumHeat});
    record.index = recordIndex
    return record;
}

// Fetch new records from the shard iterator, append them to records, and update the visualization
function getRecordsAndUpdateVisualization(shardIterator, records, lastRecordIndex) {
    kinesis.getRecords({
        "ShardIterator": shardIterator
    }, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            return;
        }

        var newRecords = data.Records.map(function(raw) { return decodeRecord(raw, ++lastRecordIndex); });
        newRecords.forEach(function(record) { records.push(record); });

        var hotspots = null;
        if (newRecords.length > 0) {
            hotspots = newRecords[newRecords.length - 1].hotspots;
        }

        while (records.length > windowSize) {
            records.shift();
        }

        update(records, hotspots);

        getRecordsAndUpdateVisualization(data.NextShardIterator, records, lastRecordIndex);
    });
}

// Get a shard iterator for the output stream and begin updating the visualization. Note that this script will only
// read records from the first shard in the stream.
function init() {
    kinesis.describeStream({
        "StreamName": hotspotsStream
    }, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            return;
        }

        var shardId = data.StreamDescription.Shards[0].ShardId;

        kinesis.getShardIterator({
            "StreamName": hotspotsStream,
            "ShardId": shardId,
            "ShardIteratorType": "LATEST"
        }, function(err, data) {
            if (err) {
                console.log(err, err.stack);
                return;
            }
            getRecordsAndUpdateVisualization(data.ShardIterator, [], 0);
        })
    });
}

// Start the visualization
init();

3. With the Python code from the first section running, open index.html in a web browser. The hotspot information will display in the page.


Example: Using Different Types of Times in Streaming Analytics

For information about different types of times and an example query, see Timestamps and the ROWTIME Column (p. 66). You can try the example query in that section against the demo stream you created in the Getting Started exercise.

Examples: Other Amazon Kinesis Data Analytics Applications

This section provides examples that help you explore Amazon Kinesis Data Analytics concepts. These include an example in which you introduce errors that cause your application to send rows to the in-application error stream, and in which you explore the console support for editing the schema that the console infers for the in-application input stream by sampling data on the streaming source.

Topics

• Example: Explore the In-Application Error Stream (p. 118)

Example: Explore the In-Application Error Stream

Amazon Kinesis Data Analytics provides an in-application error stream for each application you create. Any rows that your application cannot process are sent to this error stream. You might consider persisting the error stream data to an external destination so that you can investigate it.

In this exercise, you introduce errors in the input configuration by editing the schema inferred by the discovery process, and then verify the rows that are sent to the error stream.

You perform this exercise in the console.

Introduce Parse Error

In this exercise, you introduce a parse error.

1. Create an application. For instructions, see Step 3.1: Create an Application (p. 48).

2. On the newly created application hub, choose Connect to a source.

3. On the Source page, select the demo stream (kinesis-analytics-demo-stream).

   If you followed the Getting Started exercise, you have a demo stream in your account.

4. Amazon Kinesis Data Analytics takes a sample from the demo stream to infer a schema for the in-application input stream it creates. The console shows the inferred schema and sample data in the Formatted stream sample tab.

5. Now you edit the schema and modify the column type to introduce the parse error. Choose Edit schema.

6. Change the TICKER_SYMBOL column type from VARCHAR(4) to INTEGER.

Now that the column type in the in-application schema is invalid, Amazon Kinesis Data Analytics can't bring data into the in-application stream. Instead, Kinesis Data Analytics sends the rows to the error stream.

7. Choose Save schema.

8. Choose Refresh schema samples.


Notice that there are no rows in the Formatted stream sample tab. However, the Error stream tab shows data with an error message; that is, the data sent to the in-application error stream.

Because you changed the column data type, Amazon Kinesis Data Analytics was not able to bring the data into the in-application input stream. Instead, it sent the data to the error stream.

Divide by Zero Error

In this exercise, you update the application code to introduce a runtime error (division by zero). Notice that Amazon Kinesis Data Analytics sends the resulting rows to the in-application error stream, not to the DESTINATION_SQL_STREAM in-application stream where the results are supposed to be written.

1. Follow the Getting Started exercise to create an application. For instructions, see Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 46).

Verify the results on the Real-time analytics tab.

2. Update the SELECT statement in the application code to introduce division by zero. For example:

SELECT STREAM ticker_symbol, sector, change, (price / 0) as ProblemColumn
FROM "SOURCE_SQL_STREAM_001"
WHERE sector SIMILAR TO '%TECH%';

3. Run the application. Because a division by zero runtime error occurs, instead of writing the results to the DESTINATION_SQL_STREAM, Amazon Kinesis Data Analytics sends the rows to the in-application error stream. On the Real-time analytics tab, choose the error-stream, and then you can see the rows in the in-application error stream.


Monitoring Amazon Kinesis Data Analytics

Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon Kinesis Data Analytics and your Amazon Kinesis Data Analytics application. You should collect monitoring data from all of the parts of your AWS solution so that you can more easily debug a multipoint failure if one occurs. Before you start monitoring Amazon Kinesis Data Analytics, however, you should create a monitoring plan that includes answers to the following questions:

• What are your monitoring goals?
• What resources will you monitor?
• How often will you monitor these resources?
• What monitoring tools will you use?
• Who will perform the monitoring tasks?
• Who should be notified when something goes wrong?

The next step is to establish a baseline for normal Amazon Kinesis Data Analytics performance in your environment, by measuring performance at various times and under different load conditions. As you monitor Amazon Kinesis Data Analytics, you can store historical monitoring data. If you do, you can compare it with current performance data, identify normal performance patterns and performance anomalies, and devise methods to address issues.

With Amazon Kinesis Data Analytics, you monitor the application. The application processes data streams (input or output), both of which include identifiers that you can use to narrow your search on CloudWatch logs. For information about how Amazon Kinesis Data Analytics processes data streams, see Amazon Kinesis Data Analytics: How It Works (p. 3).

The most important metric is millisBehindLatest, which indicates how far behind an application is reading from the streaming source. In a typical case, the milliseconds behind should be at or near zero. It is common for brief spikes to appear, which show up as an increase in millisBehindLatest.

We recommend that you set up a CloudWatch alarm that triggers when the application is behind by more than an hour reading the streaming source. For some use cases that require very close to real-time processing, such as emitting processed data to a live application, you might choose to set the alarm at a lower value, such as five minutes.
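For example, the following AWS CLI sketch creates such an alarm. The alarm name, SNS topic ARN, application name, and dimension values (Application, Flow, Id) are placeholders for illustration and are not confirmed here; run aws cloudwatch list-metrics to see the exact dimensions that your application reports before creating the alarm.

# Alarm when the application is more than one hour (3,600,000 ms) behind,
# averaged over one-minute periods
aws cloudwatch put-metric-alarm \
    --alarm-name "sampleApp-behind-one-hour" \
    --namespace "AWS/KinesisAnalytics" \
    --metric-name "MillisBehindLatest" \
    --dimensions Name=Application,Value=sampleApp Name=Flow,Value=Input Name=Id,Value=1.1 \
    --statistic Average \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 3600000 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts-topic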

For a list of metrics Amazon Kinesis Data Analytics supports, see Amazon Kinesis Data Analytics Metrics.

Topics
• Monitoring Tools (p. 120)
• Monitoring with Amazon CloudWatch (p. 121)

Monitoring Tools

AWS provides various tools that you can use to monitor Amazon Kinesis Data Analytics. You can configure some of these tools to do the monitoring for you, while some of the tools require manual intervention. We recommend that you automate monitoring tasks as much as possible.


Automated Monitoring Tools

You can use the following automated monitoring tools to watch Amazon Kinesis Data Analytics and report when something is wrong:

• Amazon CloudWatch Alarms – Watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods. The action is a notification sent to an Amazon Simple Notification Service (Amazon SNS) topic or Amazon EC2 Auto Scaling policy. CloudWatch alarms do not invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods. For more information, see Monitoring with Amazon CloudWatch (p. 121).

• Amazon CloudWatch Logs – Monitor, store, and access your log files from AWS CloudTrail or other sources. For more information, see Monitoring Log Files in the Amazon CloudWatch User Guide.

• Amazon CloudWatch Events – Match events and route them to one or more target functions or streams to make changes, capture state information, and take corrective action. For more information, see What is Amazon CloudWatch Events in the Amazon CloudWatch User Guide.

• AWS CloudTrail Log Monitoring – Share log files between accounts, monitor CloudTrail log files in real time by sending them to CloudWatch Logs, write log processing applications in Java, and validate that your log files have not changed after delivery by CloudTrail. For more information, see Working with CloudTrail Log Files in the AWS CloudTrail User Guide.

Manual Monitoring Tools

Another important part of monitoring Amazon Kinesis Data Analytics involves manually monitoring those items that the CloudWatch alarms don't cover. The Amazon Kinesis Data Analytics, CloudWatch, Trusted Advisor, and other AWS console dashboards provide an at-a-glance view of the state of your AWS environment.

• The CloudWatch home page shows the following:

• Current alarms and status

• Graphs of alarms and resources

• Service health status

In addition, you can use CloudWatch to do the following:

• Create customized dashboards to monitor the services you care about

• Graph metric data to troubleshoot issues and discover trends

• Search and browse all your AWS resource metrics

• Create and edit alarms to be notified of problems

• AWS Trusted Advisor can help you monitor your AWS resources to improve performance, reliability, security, and cost effectiveness. Four Trusted Advisor checks are available to all users. More than 50 checks are available to users with a Business or Enterprise support plan. For more information, see AWS Trusted Advisor.

Monitoring with Amazon CloudWatch

You can monitor Amazon Kinesis Data Analytics applications using CloudWatch, which collects and processes raw data from Amazon Kinesis Data Analytics into readable, near real-time metrics. These statistics are retained for a period of two weeks, so that you can access historical information and gain a better perspective on how your web application or service is performing. By default, Amazon Kinesis Data Analytics metric data is automatically sent to CloudWatch. For more information, see What Are Amazon CloudWatch, Amazon CloudWatch Events, and Amazon CloudWatch Logs? in the Amazon CloudWatch User Guide.

Topics
• Viewing Amazon Kinesis Data Analytics Metrics and Dimensions (p. 122)

Viewing Amazon Kinesis Data Analytics Metrics and Dimensions

When your Amazon Kinesis Data Analytics application processes data streams, Kinesis Data Analytics sends the following metrics and dimensions to CloudWatch. You can use the following procedures to view the metrics for Kinesis Data Analytics.

In the console, metrics are grouped first by service namespace, and then by the dimension combinations within each namespace.

For a list of metrics Amazon Kinesis Data Analytics supports, see Amazon Kinesis Data Analytics Metrics.

To view metrics using the CloudWatch console

1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Metrics.
3. In the CloudWatch Metrics by Category pane for Amazon Kinesis Data Analytics, select a metrics category.
4. In the upper pane, scroll to view the full list of metrics.

To view metrics using the AWS CLI

• At a command prompt, use the following command.

aws cloudwatch list-metrics --namespace "AWS/KinesisAnalytics" --region region

Amazon Kinesis Data Analytics metrics are collected at the following levels:

• Application
• Input stream
• Output stream
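As a quick command-line check of application health, you can also pull recent datapoints for a single metric with get-metric-statistics. This is a minimal sketch; the application name (sampleApp), dimension values, and time range are placeholders, and you should confirm the dimension names against the output of list-metrics for your account.

# Average MillisBehindLatest over one hour, in one-minute datapoints
aws cloudwatch get-metric-statistics \
    --namespace "AWS/KinesisAnalytics" \
    --metric-name "MillisBehindLatest" \
    --dimensions Name=Application,Value=sampleApp Name=Flow,Value=Input Name=Id,Value=1.1 \
    --start-time 2018-05-01T00:00:00Z \
    --end-time 2018-05-01T01:00:00Z \
    --period 60 \
    --statistics Average \
    --region us-east-1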

Creating CloudWatch Alarms to Monitor Amazon Kinesis Data Analytics

You can create a CloudWatch alarm that sends an Amazon SNS message when the alarm changes state. An alarm watches a single metric over a time period you specify. It performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods. The action is a notification sent to an Amazon SNS topic or Auto Scaling policy.

Alarms invoke actions for sustained state changes only. For a CloudWatch alarm to invoke an action, the state must have changed and been maintained for a specified amount of time.

You can set alarms using the AWS Management Console, the CloudWatch CLI, or the CloudWatch API, as described following.


To set an alarm using the CloudWatch console

1. Sign in to the AWS Management Console and open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

2. Choose Create Alarm. The Create Alarm Wizard launches.
3. Choose Kinesis Analytics Metrics, and then scroll through the Amazon Kinesis Data Analytics metrics to locate the metric that you want to place an alarm on.

   To display just Amazon Kinesis Data Analytics metrics, search for the name of your application. Select the metric to create an alarm for, and then choose Next.

4. Type values for Name, Description, and Whenever for the metric.
5. If you want CloudWatch to send you an email when the alarm state is reached, in the Whenever this alarm: field, choose State is ALARM. In the Send notification to: field, choose an existing SNS topic. If you select Create topic, you can set the name and email addresses for a new email subscription list. This list is saved and appears in the field for future alarms.

Note
If you use Create topic to create a new Amazon SNS topic, the email addresses must be verified before they receive notifications. Emails are only sent when the alarm enters an alarm state. If this alarm state change happens before the email addresses are verified, they do not receive a notification.

6. In the Alarm Preview section, preview the alarm you're about to create.
7. Choose Create Alarm to create the alarm.

To set an alarm using the CloudWatch CLI

• Call mon-put-metric-alarm. For more information, see the Amazon CloudWatch CLI Reference.

To set an alarm using the CloudWatch API

• Call PutMetricAlarm. For more information, see the Amazon CloudWatch API Reference.

Working with Amazon CloudWatch Logs

If an Amazon Kinesis Data Analytics application is misconfigured, it can transition to a running state during application start or update but not process any data into the in-application input stream. By adding a CloudWatch log option to the application, you can monitor for application configuration problems.

Amazon Kinesis Data Analytics can generate configuration errors under the following conditions:

• The Kinesis stream used for input doesn't exist.
• The Amazon Kinesis Data Firehose delivery stream used for input doesn't exist.
• The Amazon S3 bucket used as a reference data source doesn't exist.
• The specified file in the reference data source in the S3 bucket doesn't exist.
• The correct resource is not defined in the AWS Identity and Access Management (IAM) role that manages related permissions.
• The correct permission is not defined in the IAM role that manages related permissions.
• Kinesis Data Analytics doesn't have permission to assume the IAM role that manages related permissions.

For more information on Amazon CloudWatch, see the CloudWatch User Guide.


Adding the PutLogEvents Policy Action

Amazon Kinesis Data Analytics needs permissions to write misconfiguration errors to CloudWatch. You can add these permissions to the IAM role that Amazon Kinesis Data Analytics assumes, as described following. For more information on using an IAM role for Amazon Kinesis Data Analytics, see Granting Amazon Kinesis Data Analytics Permissions to Access Streaming Sources (Creating an IAM Role) (p. 40).

Trust Policy

To grant Kinesis Data Analytics permissions to assume an IAM role, you can attach the following trust policy to the role.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "kinesisanalytics.amazonaws.com" }, "Action": "sts:AssumeRole" } ]}

Permissions Policy

To grant an application permissions to write log events to CloudWatch from a Kinesis Data Analytics resource, you can use the following IAM permissions policy.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt0123456789000", "Effect": "Allow", "Action": [ "logs:PutLogEvents" ], "Resource": [ "arn:aws:logs:us-east-1:123456789012:log-group:my-log-group:log-stream:my-log-stream*" ] } ]}

Adding Configuration Error Monitoring

Use the following API actions to add a CloudWatch log option to a new or existing application or change a log option for an existing application.

Note
You can currently only add a CloudWatch log option to an application by using API actions. You can't add CloudWatch log options by using the console.

Adding a CloudWatch Log Option When Creating an Application

The following code example demonstrates how to use the CreateApplication action to add a CloudWatch log option when you create an application. For more information on CreateApplication, see CreateApplication (p. 163).


{ "ApplicationCode": "<The SQL code the new application will run on the input stream>", "ApplicationDescription": "<A friendly description for the new application>", "ApplicationName": "<The name for the new application>", "Inputs": [ ... ], "Outputs": [ ... ], "CloudWatchLoggingOptions": [{ "LogStreamARN": "<Amazon Resource Name (ARN) of the CloudWatch log stream to add to the new application>", "RoleARN": "<ARN of the role to use to access the log>" }]}

Adding a CloudWatch Log Option to an Existing Application

The following code example demonstrates how to use the AddApplicationCloudWatchLoggingOption action to add a CloudWatch log option to an existing application. For more information on AddApplicationCloudWatchLoggingOption, see AddApplicationCloudWatchLoggingOption (p. 150).

{ "ApplicationName": "<Name of the application to add the log option to>", "CloudWatchLoggingOption": { "LogStreamARN": "<ARN of the log stream to add to the application>", "RoleARN": "<ARN of the role to use to access the log>" }, "CurrentApplicationVersionId": <Version of the application to add the log to>}

Updating an Existing CloudWatch Log Option

The following code example demonstrates how to use the UpdateApplication action to modify an existing CloudWatch log option. For more information on UpdateApplication, see UpdateApplication (p. 192).

{ "ApplicationName": "<Name of the application to update the log option for>", "ApplicationUpdate": { "CloudWatchLoggingOptionUpdates": [ { "CloudWatchLoggingOptionId": "<ID of the logging option to modify>", "LogStreamARNUpdate": "<ARN of the new log stream to use>", "RoleARNUpdate": "<ARN of the new role to use to access the log stream>" } ], }, "CurrentApplicationVersionId": <ID of the application version to modify>}

Deleting a CloudWatch Log Option from an Application

The following code example demonstrates how to use the DeleteApplicationCloudWatchLoggingOption action to delete an existing CloudWatch log option. For more information on DeleteApplicationCloudWatchLoggingOption, see DeleteApplicationCloudWatchLoggingOption (p. 170).

{
    "ApplicationName": "<Name of application to delete log option from>",
    "CloudWatchLoggingOptionId": "<ID of the application log option to delete>",
    "CurrentApplicationVersionId": <Version of the application to delete the log option from>
}

Configuration Errors

Following, you can learn details about errors that you might see in CloudWatch logs from a misconfigured application.

Error Message Format

Error messages generated by application misconfiguration appear in the following format.

{"applicationARN": "string","applicationVersionId": integer,"messageType": "ERROR","message": "string","inputId": "string","referenceId": "string","errorCode": "string""messageSchemaVersion": "integer",}

The fields in an error message contain the following information:

• applicationARN: The Amazon Resource Name (ARN) of the generating application, for example: arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp

• applicationVersionId: The version of the application at the time the error was encountered. For more information, see ApplicationDetail (p. 197).

• messageType: The message type. Currently, this type can be only ERROR.

• message: The details of the error, for example:

There is a problem related to the configuration of your input. Please check that the resource exists, the role has the correct permissions to access the resource and that Kinesis Analytics can assume the role provided.

• inputId: ID associated with the application input. This value is only present if this input is the cause of the error. This value is not present if referenceId is present. For more information, see DescribeApplication (p. 178).

• referenceId: ID associated with the application reference data source. This value is only present if this source is the cause of the error. This value is not present if inputId is present. For more information, see DescribeApplication (p. 178).

• errorCode: The identifier for the error. This ID is either InputError or ReferenceDataError.

• messageSchemaVersion: A value that specifies the current message schema version, currently 1. You can check this value to see if the error message schema has been updated.
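One quick way to scan for these messages is to filter the application's log group for the ERROR message type, as in the following AWS CLI sketch. The log group name is a placeholder for whichever log group you configured in the application's CloudWatch log option.

# List recent error messages written by the application
aws logs filter-log-events \
    --log-group-name my-log-group \
    --filter-pattern "ERROR" \
    --region us-east-1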

Errors

The errors that might appear in CloudWatch logs for Amazon Kinesis Data Analytics include the following.


Resource Does Not Exist

If an ARN is specified for a Kinesis input stream that doesn't exist, but the ARN is syntactically correct, an error like the following is generated.

{"applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp","applicationVersionId": "5", "messageType": "ERROR", "message": "There is a problem related to the configuration of your input. Please check that the resource exists, the role has the correct permissions to access the resource and that Kinesis Analytics can assume the role provided.", "inputId":"1.1", "errorCode": "InputError", "messageSchemaVersion": "1"}

If an incorrect Amazon S3 file key is used for reference data, an error like the following is generated.

{ "applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp", "applicationVersionId": "5", "messageType": "ERROR", "message": "There is a problem related to the configuration of your reference data. Please check that the bucket and the file exist, the role has the correct permissions to access these resources and that Kinesis Analytics can assume the role provided.", "referenceId":"1.1", "errorCode": "ReferenceDataError", "messageSchemaVersion": "1"}

Role Does Not Exist

If an ARN is specified for an IAM input role that doesn't exist, but the ARN is syntactically correct, an error like the following is generated.

{ "applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp", "applicationVersionId": "5", "messageType": "ERROR", "message": "There is a problem related to the configuration of your input. Please check that the resource exists, the role has the correct permissions to access the resource and that Kinesis Analytics can assume the role provided.", "inputId":null, "errorCode": "InputError", "messageSchemaVersion": "1"}

Role Does Not Have Permissions to Access the Resource

If an input role is used that doesn't have permission to access the input resources, such as a Kinesis source stream, an error like the following is generated.

{ "applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp", "applicationVersionId": "5", "messageType": "ERROR", "message": "There is a problem related to the configuration of your input. Please check that the resource exists, the role has the correct permissions to access the resource and that Kinesis Analytics can assume the role provided.",

127

Page 134: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideMetrics and Dimensions

"inputId":null, "errorCode": "InputError", "messageSchemaVersion": "1"}


Limits

When working with Amazon Kinesis Data Analytics, note the following limits:

• The size of a row in an in-application stream is limited to 50 KB.
• The service is available in specific AWS Regions. For more information, see Amazon Kinesis Data Analytics in the AWS General Reference.
• You can create up to 50 Kinesis Data Analytics applications per AWS Region in your account. You can create a case to request additional applications via the service limit increase form. For more information, see the AWS Support Center.
• The maximum amount of source parallelism is 64. That is, in your application input configuration, you can request the mapping of a streaming source to up to 64 in-application streams.
• The number of Kinesis processing units (KPU) is limited to eight. For instructions on how to request an increase to this limit, see To request a limit increase in AWS Service Limits.

  With Kinesis Data Analytics, you pay only for what you use. You are charged an hourly rate based on the average number of KPUs that are used to run your stream-processing application. A single KPU provides you with 1 vCPU and 4 GB of memory.
• Each application can have one streaming source and up to one reference data source.
• You can configure up to three destinations for your Kinesis Data Analytics application. We recommend that you use one of these destinations to persist in-application error stream data.
• The Amazon S3 object that stores reference data can be up to 1 GB in size.
• If you change the reference data that is stored in the S3 bucket after you upload reference data to an in-application table, you need to use the UpdateApplication (p. 192) operation (using the API or AWS CLI) to refresh the data in the in-application table; see the CLI sketch following this list. Currently, the AWS Management Console doesn't support refreshing reference data in your application.
• Currently, Kinesis Data Analytics doesn't support data generated by the Amazon Kinesis Producer Library (KPL).
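The following AWS CLI sketch illustrates the reference data refresh mentioned in the list above; it re-points the application at the same S3 object, which causes the in-application table to be reloaded. The application name, version ID, reference ID, bucket ARN, file key, role ARN, and the exact shape of the update JSON are assumptions shown for illustration; get the current version ID and reference ID from the DescribeApplication operation first.

aws kinesisanalytics update-application \
    --application-name sampleApp \
    --current-application-version-id 5 \
    --application-update '{
        "ReferenceDataSourceUpdates": [
            {
                "ReferenceId": "1.1",
                "S3ReferenceDataSourceUpdate": {
                    "BucketARNUpdate": "arn:aws:s3:::my-reference-bucket",
                    "FileKeyUpdate": "reference-data.csv",
                    "ReferenceRoleARNUpdate": "arn:aws:iam::123456789012:role/kinesis-analytics-reference-role"
                }
            }
        ]
    }'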


Best Practices

This section describes best practices when working with Amazon Kinesis Data Analytics applications.

Topics

• Managing Applications (p. 130)

• Defining Input Schema (p. 131)

• Connecting to Outputs (p. 132)

• Authoring Application Code (p. 132)

Managing Applications

When managing Amazon Kinesis Data Analytics applications, follow these best practices:

• Set up CloudWatch alarms – Using the CloudWatch metrics that Amazon Kinesis Data Analytics provides, you can monitor the following:

• Input bytes and input records (number of bytes and records entering the application)

• Output bytes, output records

• MillisBehindLatest (tracks how far behind the application is in reading from the streaming source)

We recommend that you set up at least two CloudWatch alarms on the following metrics for your in-production applications:

 

• Alarm on MillisBehindLatest – For most cases, we recommend that you set this alarm to trigger when your application is one hour behind the latest data, for an average of one minute. For applications with lower end-to-end processing needs, you can tune this to a lower tolerance. The alarm can help you ensure that your application is reading the latest data.

 

• Limit the number of production applications reading from the same Kinesis stream to two applications to avoid getting the ReadProvisionedThroughputException exception.

Note
In this case, the term application refers to any application that can read from the streaming source. Only an Amazon Kinesis Data Analytics application can read from a Kinesis Data Firehose delivery stream. However, many applications can read from a Kinesis stream, such as an Amazon Kinesis Data Analytics application or AWS Lambda. The recommended application limit refers to all applications that you configure to read from a streaming source.

 

Amazon Kinesis Data Analytics reads a streaming source approximately once per second per application. However, an application that falls behind might read data at a faster rate to catch up. To allow adequate throughput for applications to catch up, limit the number of applications reading the same data source.

• Limit the number of production applications reading from the same Kinesis Data Firehose delivery stream to one application.


A Kinesis Data Firehose delivery stream can write to destinations such as Amazon S3 and Amazon Redshift, and it can also be a streaming source for your Amazon Kinesis Data Analytics application. Therefore, we recommend that you do not configure more than one Amazon Kinesis Data Analytics application per Kinesis Data Firehose delivery stream, to make sure that the delivery stream can also deliver to other destinations.

Defining Input Schema

When configuring application input in the console, you first specify a streaming source. The console then uses the discovery API (see DiscoverInputSchema (p. 182)) to infer a schema by sampling records on the streaming source. The schema, among other things, defines names and data types of the columns in the resulting in-application stream. The console displays the schema. We recommend that you do the following with this inferred schema:

• Adequately test the inferred schema. The discovery process uses only a sample of records on the streaming source to infer a schema. If your streaming source has many record types, there is a possibility that the discovery API missed sampling one or more record types, which can result in a schema that does not accurately reflect data on the streaming source.

 

When your application starts, these missed record types might result in parsing errors. Amazon Kinesis Data Analytics sends these records to the in-application error stream. To reduce these parsing errors, we recommend that you test the inferred schema interactively in the console, and monitor the in-application stream for missed records.

• The Amazon Kinesis Data Analytics API does not support specifying the NOT NULL constraint on columns in the input configuration. If you want NOT NULL constraints on columns in your in-application stream, create these in-application streams using your application code. You can then copy data from one in-application stream into another, and the constraint will be enforced.

 

Any attempt to insert rows with NULL values when a value is required results in an error, and Amazon Kinesis Data Analytics sends these errors to the in-application error stream.

• Relax data types inferred by the discovery process. The discovery process recommends columns and data types based on a random sampling of records on the streaming source. We recommend that you review these carefully and consider relaxing these data types to cover all of the possible cases of records in your input. This ensures fewer parsing errors across the application while it is running. For example, if the inferred schema has SMALLINT as a column type, consider changing it to INTEGER.

• Use SQL functions in your application code to handle any unstructured data or columns. You may have unstructured data or columns, such as log data, in your input. For examples, see Example: Manipulating Strings and Date Times (p. 76). One approach to handling this type of data is to define the schema with only one column of type VARCHAR(N), where N is the largest possible row that you would expect to see in your stream. In your application code, you can then read the incoming records and use the String and Date Time functions to parse and schematize the raw data.

 


• Make sure that you completely handle streaming source data that contains nesting more than two levels deep. When source data is JSON, you can have nesting. The discovery API infers a schema that flattens one level of nesting. For two levels of nesting, the discovery API also attempts to flatten these. Beyond two levels of nesting, there is limited support for flattening. To handle nesting completely, you have to manually modify the inferred schema to suit your needs. Use either of the following strategies to do this:

• Use the JSON row path to selectively pull out only the required key-value pairs for your application. A JSON row path provides a pointer to the specific key-value pair that you would like to bring into your application. This can be done for any level of nesting.

• Use the JSON row path to selectively pull out complex JSON objects, and then use string manipulation functions in your application code to pull out the specific data that you need.

Connecting to Outputs

We recommend that every application have at least two outputs. Use the first destination to insert the results of your SQL queries. Use the second destination to insert the entire error stream and send it to an S3 bucket through an Amazon Kinesis Data Firehose delivery stream.

Authoring Application Code

We recommend the following:

• In your SQL statement, we recommend that you do not specify a time-based window that is longer than one hour, for the following reasons:
  • If an application needs to be restarted, either because you updated the application or for Amazon Kinesis Data Analytics internal reasons, all data included in the window must be read again from the streaming data source. This takes time before Amazon Kinesis Data Analytics can emit output for that window.
  • Amazon Kinesis Data Analytics must maintain everything related to the application's state, including relevant data, for the duration. This consumes significant Amazon Kinesis Data Analytics processing units.

• During development, keep the window size small in your SQL statements so that you can see the results faster. When you deploy the application to your production environment, you can set the window size as appropriate.

• Instead of a single complex SQL statement, consider breaking it into multiple statements, in each step saving results in intermediate in-application streams. This might help you debug faster.

• When using tumbling windows, we recommend that you use two windows, one for processing time and one for your logical time (ingest time or event time). For more information, see Timestamps and the ROWTIME Column (p. 66).


Troubleshooting Amazon Kinesis Data Analytics

The following can help you troubleshoot problems you have with Amazon Kinesis Data Analytics.

Get a SQL Statement to Work Correctly

If you need to figure out how to get a particular SQL statement to work correctly, you have several different resources when using Amazon Kinesis Data Analytics:

• For more information about SQL statements, see Example Amazon Kinesis Data Analytics Applications (p. 76) in the Amazon Kinesis Data Analytics Developer Guide. This section provides a number of SQL examples that you can use.

• The Amazon Kinesis Data Analytics SQL Reference provides a detailed guide to authoring streaming SQL statements.

• If you are still running into issues, we recommend that you ask a question on the Kinesis Data Analytics Forums.

Unable to Detect or Discover My Schema

In some cases, Kinesis Data Analytics is unable to detect or discover a schema. In many of these cases, you can still use Kinesis Data Analytics.

Suppose that you have UTF-8 encoded data that doesn't use a delimiter, data that uses a format other than comma-separated value (CSV) format, or the discovery API did not discover your schema. In these cases, you can define a schema by hand or use string manipulation functions to structure your data.

To discover the schema for your stream, Kinesis Data Analytics randomly samples the latest data in your stream. If you aren't consistently sending data to your stream, Kinesis Data Analytics might not be able to retrieve a sample and detect a schema. For more information, see Using the Schema Discovery Feature on Streaming Data (p. 16) in the Amazon Kinesis Data Analytics Developer Guide.

Important Application Health Parameters to Monitor

To make sure that your application is running correctly, we recommend that you monitor certain important parameters.

The most important parameter to monitor is the Amazon CloudWatch metric MillisBehindLatest. This metric represents how far behind the current time you are reading from the stream. This metric helps you determine whether you are processing records from the source stream fast enough.

As a rule of thumb, you should set up a CloudWatch alarm to trigger if you fall behind by more than one hour. However, the amount of time depends on your use case. You can adjust it as needed.


For more information, see Best Practices (p. 130) in the Amazon Kinesis Data Analytics Developer Guide.

Invalid Code Errors When Running an Application

When you cannot save and run the SQL code for your Amazon Kinesis Data Analytics application, the following are common causes:

• The stream was redefined in your SQL code – After you create a stream and the pump associated with the stream, you cannot redefine the same stream in your code. For more information about creating a stream, see CREATE STREAM. For more information about creating a pump, see CREATE PUMP.

• A GROUP BY clause uses multiple ROWTIME columns – You can specify only one ROWTIME column in the GROUP BY clause. For more information, see GROUP BY and ROWTIME.

• One or more data types have an invalid casting – In this case, your code has an invalid implicit cast. For example, you might be casting a timestamp to a bigint in your code.

• A stream has the same name as a service reserved stream name – A stream cannot have the same name as the service-reserved stream error_stream.

Application Doesn't Process Data After Deleting and Re-creating the Kinesis Application Input Stream or Kinesis Data Firehose Delivery Stream with the Same Name

Suppose that you delete the Kinesis stream that provides application input for a running application and create a new Kinesis stream with the same name. In this case, the application doesn't process the input data from the new stream. In addition, no data is delivered to the destination.

The same effect occurs if you delete the Kinesis Data Firehose delivery stream for a running application and create a new Kinesis Data Firehose delivery stream with the same name.

To resolve this issue, stop and restart the application through the AWS Management Console.
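You can also stop and restart the application from the AWS CLI, as in the following sketch. The application name and input ID are placeholders; get the input ID from the DescribeApplication operation. Note that starting from NOW reads from the tip of the new stream, so records written between the stop and the restart are not processed.

# Stop the application
aws kinesisanalytics stop-application --application-name sampleApp

# Restart it, reading from the tip of the new stream
aws kinesisanalytics start-application \
    --application-name sampleApp \
    --input-configurations "Id=1.1,InputStartingPositionConfiguration={InputStartingPosition=NOW}"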

Insufficient Throughput or High MillisBehindLatest

If your application's MillisBehindLatest metric is steadily increasing or is consistently above 1,000 (one second), it can be due to the following reasons:

• Check your application's InputBytes CloudWatch metric. If you are ingesting more than 4 MB/sec, this can cause an increase in MillisBehindLatest. To improve your application's throughput, increase the value of the InputParallelism parameter (see the CLI sketch following this list). For more information, see Parallelizing Input Streams for Increased Throughput (p. 27).

• Check your application's output delivery Success metric for failures in delivering to your destination. Verify that you have correctly configured the output, and that your output stream has sufficient capacity.

• If your application uses an AWS Lambda function for pre-processing or as an output, check the application's InputProcessing.Duration or LambdaDelivery.Duration CloudWatch metric. If the Lambda function invocation duration is longer than 5 seconds, consider doing the following:


• Increase the Lambda function's Memory allocation under Configuration -> Basic Settings in the Lambda console. For more information, see Configuring Lambda Functions.

• Increase the number of shards in the input stream of the application. This increases the number of parallel functions that the application invokes, which might increase throughput.

• Verify that the function is not making blocking calls that are impacting performance, such as synchronous requests for external resources.

• Examine your Lambda function to see if there are other areas where you can improve performance. Check the CloudWatch Logs of the application Lambda function. For more information, see Accessing Amazon CloudWatch Metrics for AWS Lambda.

• Verify that your application is not reaching the default limit for Kinesis Processing Units (KPU). If your application is reaching this limit, you can request a limit increase. For more information, see Automatically Scaling Applications to Increase Throughput (p. 42).
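As referenced in the first item in this list, the following AWS CLI sketch raises the input parallelism on an existing application. The application name, version ID, input ID, and count are placeholders for illustration; get the current values from the DescribeApplication operation first.

aws kinesisanalytics update-application \
    --application-name sampleApp \
    --current-application-version-id 5 \
    --application-update '{
        "InputUpdates": [
            {
                "InputId": "1.1",
                "InputParallelismUpdate": { "CountUpdate": 2 }
            }
        ]
    }'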


Authentication and Access Control for Amazon Kinesis Data Analytics

Access to Amazon Kinesis Data Analytics requires credentials. Those credentials must have permissions to access AWS resources, such as an Amazon Kinesis Data Analytics application or an Amazon Elastic Compute Cloud (Amazon EC2) instance. The following sections provide details on how you can use AWS Identity and Access Management (IAM) and Amazon Kinesis Data Analytics to help secure access to your resources.

• Authentication (p. 136)

• Access Control (p. 137)

Authentication

You can access AWS as any of the following types of identities:

• AWS account root user – When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the AWS account root user and is accessed by signing in with the email address and password that you used to create the account. We strongly recommend that you do not use the root user for your everyday tasks, even the administrative ones. Instead, adhere to the best practice of using the root user only to create your first IAM user. Then securely lock away the root user credentials and use them to perform only a few account and service management tasks.

• IAM user – An IAM user is an identity within your AWS account that has specific custom permissions (for example, permissions to create an application in Amazon Kinesis Data Analytics). You can use an IAM user name and password to sign in to secure AWS webpages like the AWS Management Console, AWS Discussion Forums, or the AWS Support Center.

 

In addition to a user name and password, you can also generate access keys for each user. You can use these keys when you access AWS services programmatically, either through one of the several SDKs or by using the AWS Command Line Interface (CLI). The SDK and CLI tools use the access keys to cryptographically sign your request. If you don't use AWS tools, you must sign the request yourself. Amazon Kinesis Data Analytics supports Signature Version 4, a protocol for authenticating inbound API requests. For more information about authenticating requests, see Signature Version 4 Signing Process in the AWS General Reference.

 

• IAM role – An IAM role is an IAM identity that you can create in your account that has specific permissions. It is similar to an IAM user, but it is not associated with a specific person. An IAM role enables you to obtain temporary access keys that can be used to access AWS services and resources. IAM roles with temporary credentials are useful in the following situations:

 

• Federated user access – Instead of creating an IAM user, you can use existing user identities from AWS Directory Service, your enterprise user directory, or a web identity provider. These are known as federated users. AWS assigns a role to a federated user when access is requested through an identity provider. For more information about federated users, see Federated Users and Roles in the IAM User Guide.

• AWS service access – You can use an IAM role in your account to grant an AWS service permissions to access your account's resources. For example, you can create a role that allows Amazon Redshift to access an Amazon S3 bucket on your behalf and then load data from that bucket into an Amazon Redshift cluster. For more information, see Creating a Role to Delegate Permissions to an AWS Service in the IAM User Guide.

• Applications running on Amazon EC2 – You can use an IAM role to manage temporary credentials for applications that are running on an EC2 instance and making AWS API requests. This is preferable to storing access keys within the EC2 instance. To assign an AWS role to an EC2 instance and make it available to all of its applications, you create an instance profile that is attached to the instance. An instance profile contains the role and enables programs that are running on the EC2 instance to get temporary credentials. For more information, see Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances in the IAM User Guide.

Access Control

You can have valid credentials to authenticate your requests, but unless you have permissions, you cannot create or access Amazon Kinesis Data Analytics resources. For example, you must have permissions to create an Amazon Kinesis Data Analytics application.

The following sections describe how to manage permissions for Amazon Kinesis Data Analytics. We recommend that you read the overview first.

• Overview of Managing Access Permissions to Your Amazon Kinesis Data Analytics Resources (p. 137)
• Using Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics (p. 141)
• Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources Reference (p. 146)

Overview of Managing Access Permissions to Your Amazon Kinesis Data Analytics Resources

Every AWS resource is owned by an AWS account, and permissions to create or access a resource are governed by permissions policies. An account administrator can attach permissions policies to IAM identities (that is, users, groups, and roles), and some services (such as AWS Lambda) also support attaching permissions policies to resources.

Note
An account administrator (or administrator user) is a user with administrator privileges. For more information, see IAM Best Practices in the IAM User Guide.

When granting permissions, you decide who is getting the permissions, the resources they get permissions for, and the specific actions that you want to allow on those resources.

Topics
• Amazon Kinesis Data Analytics Resources and Operations (p. 138)
• Understanding Resource Ownership (p. 138)
• Managing Access to Resources (p. 138)
• Specifying Policy Elements: Actions, Effects, and Principals (p. 140)
• Specifying Conditions in a Policy (p. 140)

Amazon Kinesis Data Analytics Resources and Operations

In Amazon Kinesis Data Analytics, the primary resource is an application. In a policy, you use an Amazon Resource Name (ARN) to identify the resource that the policy applies to.

These resources have unique Amazon Resource Names (ARNs) associated with them, as shown in the following table.

Resource Type ARN Format

Application arn:aws:kinesisanalytics:region:account-id:application/application-name

Amazon Kinesis Data Analytics provides a set of operations to work with Amazon Kinesis Data Analytics resources. For a list of available operations, see Amazon Kinesis Data Analytics Actions (p. 149).

Understanding Resource Ownership

The AWS account owns the resources that are created in the account, regardless of who created the resources. Specifically, the resource owner is the AWS account of the principal entity (that is, the root account, an IAM user, or an IAM role) that authenticates the resource creation request. The following examples illustrate how this works:

• If you use the root account credentials of your AWS account to create an application, your AWS account is the owner of the resource (in Amazon Kinesis Data Analytics, the resource is an application).
• If you create an IAM user in your AWS account and grant permissions to create an application to that user, the user can create an application. However, your AWS account, to which the user belongs, owns the application resource.
• If you create an IAM role in your AWS account with permissions to create an application, anyone who can assume the role can create an application. Your AWS account, to which the role belongs, owns the application resource.

Managing Access to Resources

A permissions policy describes who has access to what. The following section explains the available options for creating permissions policies.

Note
This section discusses using IAM in the context of Amazon Kinesis Data Analytics. It doesn't provide detailed information about the IAM service. For complete IAM documentation, see What Is IAM? in the IAM User Guide. For information about IAM policy syntax and descriptions, see AWS IAM Policy Reference in the IAM User Guide.

Policies attached to an IAM identity are referred to as identity-based policies (IAM policies), and policies attached to a resource are referred to as resource-based policies. Amazon Kinesis Data Analytics supports only identity-based policies (IAM policies).

Topics


• Identity-Based Policies (IAM Policies) (p. 139)

• Resource-Based Policies (p. 140)

Identity-Based Policies (IAM Policies)

You can attach policies to IAM identities. For example, you can do the following:

• Attach a permissions policy to a user or a group in your account – To grant a user permissions to create an Amazon Kinesis Data Analytics resource, such as an application, you can attach a permissions policy to a user or group that the user belongs to.

• Attach a permissions policy to a role (grant cross-account permissions) – You can attach an identity-based permissions policy to an IAM role to grant cross-account permissions. For example, the administrator in account A can create a role to grant cross-account permissions to another AWS account (for example, account B) or an AWS service as follows:

1. Account A administrator creates an IAM role and attaches a permissions policy to the role that grants permissions on resources in account A.
2. Account A administrator attaches a trust policy to the role identifying account B as the principal who can assume the role.
3. Account B administrator can then delegate permissions to assume the role to any users in account B. Doing this allows users in account B to create or access resources in account A. The principal in the trust policy can also be an AWS service principal if you want to grant an AWS service permissions to assume the role.

For more information about using IAM to delegate permissions, see Access Management in the IAM User Guide.

The following is an example policy that grants permission for the kinesisanalytics:CreateApplication action, which is required to create an Amazon Kinesis Data Analytics application.


Note
This is an introductory example policy. When you attach the policy to the user, the user will be able to create an application using the AWS CLI or AWS SDK. But the user will need more permissions to configure input and output. In addition, the user will need more permissions when using the console. The later sections provide more information.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1473028104000", "Effect": "Allow", "Action": [ "kinesisanalytics:CreateApplication" ], "Resource": [ "*" ] } ]}

For more information about using identity-based policies with Amazon Kinesis Data Analytics, see Using Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics (p. 141). For more information about users, groups, roles, and permissions, see Identities (Users, Groups, and Roles) in the IAM User Guide.

Resource-Based Policies

Other services, such as Amazon S3, also support resource-based permissions policies. For example, you can attach a policy to an S3 bucket to manage access permissions to that bucket. Amazon Kinesis Data Analytics doesn't support resource-based policies.

Specifying Policy Elements: Actions, Effects, and Principals

For each Amazon Kinesis Data Analytics resource, the service defines a set of API operations. To grant permissions for these API operations, Amazon Kinesis Data Analytics defines a set of actions that you can specify in a policy. Some API operations can require permissions for more than one action in order to perform the API operation. For more information about resources and API operations, see Amazon Kinesis Data Analytics Resources and Operations (p. 138) and Amazon Kinesis Data Analytics Actions (p. 149).

The following are the most basic policy elements:

• Resource – You use an Amazon Resource Name (ARN) to identify the resource that the policy applies to. For more information, see Amazon Kinesis Data Analytics Resources and Operations (p. 138).
• Action – You use action keywords to identify resource operations that you want to allow or deny. For example, you can use create to allow users to create an application.
• Effect – You specify the effect, either allow or deny, when the user requests the specific action. If you don't explicitly grant access to (allow) a resource, access is implicitly denied. You can also explicitly deny access to a resource, which you might do to make sure that a user cannot access it, even if a different policy grants access.
• Principal – In identity-based policies (IAM policies), the user that the policy is attached to is the implicit principal. For resource-based policies, you specify the user, account, service, or other entity that you want to receive permissions (applies to resource-based policies only). Amazon Kinesis Data Analytics doesn't support resource-based policies.

To learn more about IAM policy syntax and descriptions, see AWS IAM Policy Reference in the IAM User Guide.

For a list showing all of the Amazon Kinesis Data Analytics API operations and the resources that they apply to, see Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources Reference (p. 146).

Specifying Conditions in a Policy

When you grant permissions, you can use the access policy language to specify the conditions when a policy should take effect. For example, you might want a policy to be applied only after a specific date. For more information about specifying conditions in a policy language, see Condition in the IAM User Guide.

To express conditions, you use predefined condition keys. There are no condition keys specific to Amazon Kinesis Data Analytics. However, there are AWS-wide condition keys that you can use as appropriate. For a complete list of AWS-wide keys, see Available Keys for Conditions in the IAM User Guide.


Using Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics

This topic provides examples of identity-based policies that demonstrate how an account administrator can attach permissions policies to IAM identities (that is, users, groups, and roles) and thereby grant permissions to perform operations on Amazon Kinesis Data Analytics resources.

Important
We recommend that you first review the introductory topics that explain the basic concepts and options available to manage access to your Amazon Kinesis Data Analytics resources. For more information, see Overview of Managing Access Permissions to Your Amazon Kinesis Data Analytics Resources (p. 137).

Topics

• Permissions Required to Use the Amazon Kinesis Data Analytics Console (p. 141)

• AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics (p. 142)

• Customer Managed Policy Examples (p. 143)

The following shows an example of a permissions policy.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1473028104000", "Effect": "Allow", "Action": [ "kinesisanalytics:CreateApplication" ], "Resource": [ "*" ] } ]}

The policy has one statement:

• The first statement grants permissions for one Amazon Kinesis Data Analytics action (kinesisanalytics:CreateApplication) on a resource using the Amazon Resource Name (ARN) for the application. The ARN in this case specifies a wildcard character (*) to indicate that the permission is granted for any resource.

For a table showing all of the Amazon Kinesis Data Analytics API operations and the resources that they apply to, see Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources Reference (p. 146).

Permissions Required to Use the Amazon Kinesis Data Analytics Console

For a user to work with the Amazon Kinesis Data Analytics console, you need to grant the requisite permissions. For example, if you want to grant a user permission to create an application, you need to grant permissions that will show the user the streaming sources in the account so that the user can configure input and output in the console.

We recommend the following:

• Use the AWS managed policies to grant user permissions. For available policies, see AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics (p. 142).

• Create custom policies. In this case, we recommend that you review the example provided in this section. For more information, see Customer Managed Policy Examples (p. 143).

AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics

AWS addresses many common use cases by providing standalone IAM policies that are created and administered by AWS. These AWS managed policies grant necessary permissions for common use cases so that you can avoid having to investigate what permissions are needed. For more information, see AWS Managed Policies in the IAM User Guide.

The following AWS managed policies, which you can attach to users in your account, are specific to Amazon Kinesis Data Analytics:

• AmazonKinesisAnalyticsReadOnly – Grants permissions for Amazon Kinesis Data Analytics actions that enable a user to list Amazon Kinesis Data Analytics applications and review input/output configuration. It also grants permissions that allow a user to view a list of Kinesis streams and Kinesis Data Firehose delivery streams. As the application is running, the user can view source data and real-time analytics results in the console.

 

• AmazonKinesisAnalyticsFullAccess – Grants permissions for all Amazon Kinesis Data Analytics actions and all other permissions that allow a user to create and manage Amazon Kinesis Data Analytics applications. However, note the following:

  • These permissions are not sufficient if the user wants to create a new IAM role in the console (these permissions allow the user to select an existing role). If you want the user to be able to create an IAM role in the console, add the IAMFullAccess AWS managed policy.

  • A user must have permission for the iam:PassRole action to specify an IAM role when configuring an Amazon Kinesis Data Analytics application. This AWS managed policy grants permission for the iam:PassRole action to the user only on the IAM roles that start with the prefix service-role/kinesis-analytics.

    If the user wants to configure the Amazon Kinesis Data Analytics application with a role that does not have this prefix, you first need to explicitly grant the user permission for the iam:PassRole action on the specific role.

Note
You can review these permissions policies by signing in to the IAM console and searching for specific policies there.

You can also create your own custom IAM policies to allow permissions for Amazon Kinesis Data Analytics actions and resources. You can attach these custom policies to the IAM users or groups that require those permissions.


Customer Managed Policy Examples

The examples in this section provide a group of sample policies that you can attach to a user. If you are new to creating policies, we recommend that you first create an IAM user in your account and attach the policies to the user in sequence, as outlined in the steps in this section. You can then use the console to verify the effects of each policy as you attach the policy to the user.

Initially, the user doesn't have permissions and won't be able to do anything in the console. As you attachpolicies to the user, you can verify that the user can perform various actions in the console. 

We recommend that you use two browser windows: one to create the user and grant permissions, andthe other to sign in to the AWS Management Console using the user's credentials and verify permissionsas you grant them to the user.

For examples that show how to create an IAM role that you can use as an execution role for your Amazon Kinesis Data Analytics application, see Creating IAM Roles in the IAM User Guide.

Example Steps

• Step 1: Create an IAM User (p. 143)
• Step 2: Allow the User Permissions for Actions that Are Not Specific to Amazon Kinesis Data Analytics (p. 143)
• Step 3: Allow the User to View a List of Applications and View Details (p. 144)
• Step 4: Allow the User to Start a Specific Application (p. 145)
• Step 5: Allow the User to Create an Amazon Kinesis Data Analytics Application (p. 145)
• Step 6: Allow the Application to Use Lambda Preprocessing (p. 146)

Step 1: Create an IAM User

First, you need to create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user that you created. You can then access AWS using a special URL and that IAM user's credentials.

For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.
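The following minimal sketch automates this step with the AWS SDK for Python (Boto3); the user and group names are placeholder values chosen for illustration, not names used elsewhere in this guide.

import boto3

iam = boto3.client("iam")

# Create the test user.
iam.create_user(UserName="kda-test-user")

# Create an administrators group and attach the AWS managed
# AdministratorAccess policy to it.
iam.create_group(GroupName="Administrators")
iam.attach_group_policy(
    GroupName="Administrators",
    PolicyArn="arn:aws:iam::aws:policy/AdministratorAccess",
)

# Add the user to the group.
iam.add_user_to_group(GroupName="Administrators", UserName="kda-test-user")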

Step 2: Allow the User Permissions for Actions that Are Not Specific to Amazon Kinesis Data Analytics

First, grant a user permission for all actions that aren't specific to Amazon Kinesis Data Analytics that the user will need when working with Amazon Kinesis Data Analytics applications. These include permissions for working with streams (Amazon Kinesis Data Streams actions, Amazon Kinesis Data Firehose actions), and permissions for CloudWatch actions. Attach the following policy to the user.

You need to update the policy by providing an IAM role name for which you want to grant the iam:PassRole permission, or by specifying a wildcard character (*) to indicate all IAM roles. Using a wildcard is not a secure practice; however, you might not have a specific IAM role created during this testing.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "kinesis:CreateStream", "kinesis:DeleteStream",

143

Page 150: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideCustomer Managed Policy Examples

"kinesis:DescribeStream", "kinesis:ListStreams", "kinesis:PutRecord", "kinesis:PutRecords" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "firehose:DescribeDeliveryStream", "firehose:ListDeliveryStreams" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics" ], "Resource": "*" }, { "Effect": "Allow", "Action": "logs:GetLogEvents", "Resource": "*" }, { "Effect": "Allow", "Action": [ "iam:ListPolicyVersions", "iam:ListRoles" ], "Resource": "*" }, { "Effect": "Allow", "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/service-role/role-name" } ]}

Step 3: Allow the User to View a List of Applications and View Details

The following policy grants a user the following permissions:

• Permission for the kinesisanalytics:ListApplications action so the user can view a list of applications. Note that this is a service-level API call, and therefore you specify "*" as the Resource value.

• Permission for the kinesisanalytics:DescribeApplication action so that the user can get information about any of the applications.

Add this policy to the user.

{ "Version": "2012-10-17", "Statement": [ {

144

Page 151: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideCustomer Managed Policy Examples

"Effect": "Allow", "Action": [ "kinesisanalytics:ListApplications" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "kinesisanalytics:DescribeApplication" ], "Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-id:application/*" } ]}

Verify these permissions by signing in to the Amazon Kinesis Data Analytics console using the IAM user credentials.

Step 4: Allow the User to Start a Specific Application

If you want the user to be able to start one of the existing Amazon Kinesis Data Analytics applications, attach the following policy to the user, which grants permission for the kinesisanalytics:StartApplication action. You need to update the policy by providing your account ID, AWS Region, and application name.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "kinesisanalytics:StartApplication" ], "Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-id:application/application-name" } ]}

Step 5: Allow the User to Create an Amazon Kinesis Data Analytics Application

Now suppose you want the user to create an Amazon Kinesis Data Analytics application. You can then attach the following policy to the user. You need to update the policy by providing an AWS Region, your account ID, and either a specific application name that you want the user to create or a "*" so that the user can specify any application name (and thus create multiple applications).

{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1473028104000", "Effect": "Allow", "Action": [ "kinesisanalytics:CreateApplication" ], "Resource": [ "*" ]

145

Page 152: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAmazon Kinesis Data Analytics API Permissions Reference

}, { "Effect": "Allow", "Action": [ "kinesisanalytics:StartApplication", "kinesisanalytics:UpdateApplication", "kinesisanalytics:AddApplicationInput", "kinesisanalytics:AddApplicationOutput" ], "Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-id:application/application-name" } ]}

Step 6: Allow the Application to Use Lambda Preprocessing

If you want the application to be able to use Lambda preprocessing, attach the following policy statement to the application's execution role. For more information on Lambda preprocessing, see Preprocessing Data Using a Lambda Function (p. 20).

{ "Sid": "UseLambdaFunction", "Effect": "Allow", "Action": [ "lambda:InvokeFunction", "lambda:GetFunctionConfiguration" ], "Resource": "<FunctionARN>" }

Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources Reference

When you are setting up Access Control (p. 137) and writing a permissions policy that you can attach to an IAM identity (identity-based policies), you can use the following list as a reference. The list includes each Amazon Kinesis Data Analytics API operation, the corresponding actions for which you can grant permissions to perform the action, and the AWS resource for which you can grant the permissions. You specify the actions in the policy's Action field, and you specify the resource value in the policy's Resource field.

You can use AWS-wide condition keys in your Amazon Kinesis Data Analytics policies to express conditions. For a complete list of AWS-wide keys, see Available Keys in the IAM User Guide.

Note
To specify an action, use the kinesisanalytics prefix followed by the API operation name (for example, kinesisanalytics:AddApplicationInput).

Amazon Kinesis Data Analytics API and Required Permissions for Actions

API Operation: AddApplicationInput (p. 152)

Required Permissions (API Action): kinesisanalytics:AddApplicationInput

Resources: arn:aws:kinesisanalytics:region:accountId:application/application-name

Kinesis Data Analytics SQL Reference

For information about the SQL language elements that are supported by Amazon Kinesis Data Analytics, see Amazon Kinesis Data Analytics SQL Reference.


API Reference

You can use the AWS CLI to explore the Amazon Kinesis Data Analytics API. This guide provides Getting Started with Amazon Kinesis Data Analytics (p. 44) exercises that use the AWS CLI.

Topics

• Actions (p. 149)
• Data Types (p. 195)

Actions

The following actions are supported:

• AddApplicationCloudWatchLoggingOption (p. 150)
• AddApplicationInput (p. 152)
• AddApplicationInputProcessingConfiguration (p. 155)
• AddApplicationOutput (p. 157)
• AddApplicationReferenceDataSource (p. 160)
• CreateApplication (p. 163)
• DeleteApplication (p. 168)
• DeleteApplicationCloudWatchLoggingOption (p. 170)
• DeleteApplicationInputProcessingConfiguration (p. 172)
• DeleteApplicationOutput (p. 174)
• DeleteApplicationReferenceDataSource (p. 176)
• DescribeApplication (p. 178)
• DiscoverInputSchema (p. 182)
• ListApplications (p. 186)
• StartApplication (p. 188)
• StopApplication (p. 190)
• UpdateApplication (p. 192)


AddApplicationCloudWatchLoggingOption

Adds a CloudWatch log stream to monitor application configuration errors. For more information about using CloudWatch log streams with Amazon Kinesis Analytics applications, see Working with Amazon CloudWatch Logs.

Request Syntax

{ "ApplicationName": "string", "CloudWatchLoggingOption": { "LogStreamARN": "string", "RoleARN": "string" }, "CurrentApplicationVersionId": number}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 150)

The Kinesis Analytics application name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CloudWatchLoggingOption (p. 150)

Provides the CloudWatch log stream Amazon Resource Name (ARN) and the IAM role ARN. Note: To write application messages to CloudWatch, the IAM role that is used must have the PutLogEvents policy action enabled.

Type: CloudWatchLoggingOption (p. 202) object

Required: Yes

CurrentApplicationVersionId (p. 150)

The version ID of the Kinesis Analytics application.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
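The following minimal sketch shows one way to call this operation using the AWS SDK for Python (Boto3); the application name, version ID, and ARNs are placeholder values.

import boto3

kda = boto3.client("kinesisanalytics")

kda.add_application_cloud_watch_logging_option(
    ApplicationName="sample-app",
    CurrentApplicationVersionId=1,
    CloudWatchLoggingOption={
        "LogStreamARN": "arn:aws:logs:us-east-1:012345678901:log-group:my-log-group:log-stream:my-log-stream",
        "RoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
    },
)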

150

Page 157: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAddApplicationCloudWatchLoggingOption

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


AddApplicationInput

Adds a streaming source to your Amazon Kinesis application. For conceptual information, see Configuring Application Input.

You can add a streaming source either when you create an application or you can use this operation to add a streaming source after you create an application. For more information, see CreateApplication (p. 163).

Any configuration update, including adding a streaming source using this operation, results in a new version of the application. You can use the DescribeApplication (p. 178) operation to find the current application version.

This operation requires permissions to perform the kinesisanalytics:AddApplicationInput action.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "Input": { "InputParallelism": { "Count": number }, "InputProcessingConfiguration": { "InputLambdaProcessor": { "ResourceARN": "string", "RoleARN": "string" } }, "InputSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "KinesisFirehoseInput": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsInput": { "ResourceARN": "string", "RoleARN": "string" }, "NamePrefix": "string"

152

Page 159: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAddApplicationInput

}}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 152)

Name of your existing Amazon Kinesis Analytics application to which you want to add the streaming source.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 152)

Current version of your Amazon Kinesis Analytics application. You can use the DescribeApplication (p. 178) operation to find the current application version.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

Input (p. 152)

The Input (p. 207) to add.

Type: Input (p. 207) object

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
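A minimal sketch of calling this operation with Boto3 follows; the application name, stream ARN, role ARN, and schema are illustrative placeholders.

import boto3

kda = boto3.client("kinesisanalytics")

kda.add_application_input(
    ApplicationName="sample-app",
    CurrentApplicationVersionId=1,  # from DescribeApplication
    Input={
        "NamePrefix": "SOURCE_SQL_STREAM",
        "KinesisStreamsInput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:012345678901:stream/my-input-stream",
            "RoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
        },
        "InputSchema": {
            "RecordFormat": {
                "RecordFormatType": "JSON",
                "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "Mapping": "$.ticker", "SqlType": "VARCHAR(8)"},
                {"Name": "PRICE", "Mapping": "$.price", "SqlType": "DOUBLE"},
            ],
        },
    },
)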

Errors

CodeValidationException

User-provided application code (query) is invalid. This can be a simple syntax error.

HTTP Status Code: 400

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


AddApplicationInputProcessingConfiguration

Adds an InputProcessingConfiguration (p. 217) to an application. An input processor preprocesses records on the input stream before the application's SQL code executes. Currently, the only input processor available is AWS Lambda.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "InputId": "string", "InputProcessingConfiguration": { "InputLambdaProcessor": { "ResourceARN": "string", "RoleARN": "string" } }}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 155)

Name of the application to which you want to add the input processing configuration.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 155)

Version of the application to which you want to add the input processing configuration. You can use the DescribeApplication (p. 178) operation to get the current application version. If the version specified is not the current version, the ConcurrentModificationException is returned.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

InputId (p. 155)

The ID of the input configuration to add the input processing configuration to. You can get a list of the input IDs for an application using the DescribeApplication (p. 178) operation.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

155

Page 162: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAddApplicationInputProcessingConfiguration

InputProcessingConfiguration (p. 155)

The InputProcessingConfiguration (p. 217) to add to the application.

Type: InputProcessingConfiguration (p. 217) object

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
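A minimal sketch with Boto3 follows, first looking up the input ID and current version; the application name and ARNs are placeholders.

import boto3

kda = boto3.client("kinesisanalytics")

# Look up the current version and input ID first.
detail = kda.describe_application(ApplicationName="sample-app")["ApplicationDetail"]
input_id = detail["InputDescriptions"][0]["InputId"]

kda.add_application_input_processing_configuration(
    ApplicationName="sample-app",
    CurrentApplicationVersionId=detail["ApplicationVersionId"],
    InputId=input_id,
    InputProcessingConfiguration={
        "InputLambdaProcessor": {
            "ResourceARN": "arn:aws:lambda:us-east-1:012345678901:function:my-preprocessor",
            "RoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
        }
    },
)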

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


AddApplicationOutput

Adds an external destination to your Amazon Kinesis Analytics application.

If you want Amazon Kinesis Analytics to deliver data from an in-application stream within your application to an external destination (such as an Amazon Kinesis stream, an Amazon Kinesis Firehose delivery stream, or an AWS Lambda function), you add the relevant configuration to your application using this operation. You can configure one or more outputs for your application. Each output configuration maps an in-application stream and an external destination.

You can use one of the output configurations to deliver data from your in-application error stream to an external destination so that you can analyze the errors. For more information, see Understanding Application Output (Destination).

Any configuration update, including adding a streaming source using this operation, results in a new version of the application. You can use the DescribeApplication (p. 178) operation to find the current application version.

For the limits on the number of application inputs and outputs you can configure, see Limits.

This operation requires permissions to perform the kinesisanalytics:AddApplicationOutput action.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "Output": { "DestinationSchema": { "RecordFormatType": "string" }, "KinesisFirehoseOutput": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsOutput": { "ResourceARN": "string", "RoleARN": "string" }, "LambdaOutput": { "ResourceARN": "string", "RoleARN": "string" }, "Name": "string" }}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 157)

Name of the application to which you want to add the output configuration.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

157

Page 164: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAddApplicationOutput

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 157)

Version of the application to which you want to add the output configuration. You can use the DescribeApplication (p. 178) operation to get the current application version. If the version specified is not the current version, the ConcurrentModificationException is returned.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

Output (p. 157)

An object describing the output configuration. In the output configuration, you specify the name of an in-application stream, a destination (that is, an Amazon Kinesis stream, an Amazon Kinesis Firehose delivery stream, or an AWS Lambda function), and the record format to use when writing to the destination.

Type: Output (p. 241) object

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
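A minimal Boto3 sketch follows; the application name, in-application stream name, and ARNs are placeholders.

import boto3

kda = boto3.client("kinesisanalytics")

kda.add_application_output(
    ApplicationName="sample-app",
    CurrentApplicationVersionId=1,  # from DescribeApplication
    Output={
        "Name": "DESTINATION_SQL_STREAM",  # in-application stream to deliver
        "KinesisStreamsOutput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:012345678901:stream/my-output-stream",
            "RoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
        },
        "DestinationSchema": {"RecordFormatType": "JSON"},
    },
)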

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


AddApplicationReferenceDataSource

Adds a reference data source to an existing application.

Amazon Kinesis Analytics reads reference data (that is, an Amazon S3 object) and creates an in-application table within your application. In the request, you provide the source (S3 bucket name and object key name), the name of the in-application table to create, and the necessary mapping information that describes how data in the Amazon S3 object maps to columns in the resulting in-application table.

For conceptual information, see Configuring Application Input. For the limits on data sources you can add to your application, see Limits.

This operation requires permissions to perform the kinesisanalytics:AddApplicationReferenceDataSource action.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "ReferenceDataSource": { "ReferenceSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "S3ReferenceDataSource": { "BucketARN": "string", "FileKey": "string", "ReferenceRoleARN": "string" }, "TableName": "string" }}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 160)

Name of an existing application.

Type: String

160

Page 167: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideAddApplicationReferenceDataSource

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 160)

Version of the application for which you are adding the reference data source. You can use the DescribeApplication (p. 178) operation to get the current application version. If the version specified is not the current version, the ConcurrentModificationException is returned.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

ReferenceDataSource (p. 160)

The reference data source can be an object in your Amazon S3 bucket. Amazon Kinesis Analytics reads the object and copies the data into the in-application table that is created. You provide an S3 bucket, an object key name, and the name of the resulting in-application table. You must also provide an IAM role with the necessary permissions that Amazon Kinesis Analytics can assume to read the object from your S3 bucket on your behalf.

Type: ReferenceDataSource (p. 249) object

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
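A minimal Boto3 sketch follows; the application name, bucket ARN, object key, table name, role ARN, and schema are placeholders.

import boto3

kda = boto3.client("kinesisanalytics")

kda.add_application_reference_data_source(
    ApplicationName="sample-app",
    CurrentApplicationVersionId=1,  # from DescribeApplication
    ReferenceDataSource={
        "TableName": "CompanyName",
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::my-reference-bucket",
            "FileKey": "tickers.csv",
            "ReferenceRoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordRowDelimiter": "\n",
                        "RecordColumnDelimiter": ",",
                    }
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "SqlType": "VARCHAR(8)"},
                {"Name": "COMPANY", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)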

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400


CreateApplication

Creates an Amazon Kinesis Analytics application. You can configure each application with one streaming source as input, application code to process the input, and up to three destinations where you want Amazon Kinesis Analytics to write the output data from your application. For an overview, see How it Works.

In the input configuration, you map the streaming source to an in-application stream, which you can think of as a constantly updating table. In the mapping, you must provide a schema for the in-application stream and map each data column in the in-application stream to a data element in the streaming source.

Your application code is one or more SQL statements that read input data, transform it, and generate output. Your application code can create one or more SQL artifacts like SQL streams or pumps.

In the output configuration, you can configure the application to write data from in-application streams created in your applications to up to three destinations.

To read data from your source stream or write data to destination streams, Amazon Kinesis Analytics needs your permissions. You grant these permissions by creating IAM roles. This operation requires permissions to perform the kinesisanalytics:CreateApplication action.

For introductory exercises to create an Amazon Kinesis Analytics application, see Getting Started.

Request Syntax

{ "ApplicationCode": "string", "ApplicationDescription": "string", "ApplicationName": "string", "CloudWatchLoggingOptions": [ { "LogStreamARN": "string", "RoleARN": "string" } ], "Inputs": [ { "InputParallelism": { "Count": number }, "InputProcessingConfiguration": { "InputLambdaProcessor": { "ResourceARN": "string", "RoleARN": "string" } }, "InputSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" },

163

Page 170: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideCreateApplication

"JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "KinesisFirehoseInput": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsInput": { "ResourceARN": "string", "RoleARN": "string" }, "NamePrefix": "string" } ], "Outputs": [ { "DestinationSchema": { "RecordFormatType": "string" }, "KinesisFirehoseOutput": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsOutput": { "ResourceARN": "string", "RoleARN": "string" }, "LambdaOutput": { "ResourceARN": "string", "RoleARN": "string" }, "Name": "string" } ]}

Request Parameters

The request accepts the following data in JSON format.

ApplicationCode (p. 163)

One or more SQL statements that read input data, transform it, and generate output. For example, you can write a SQL statement that reads data from one in-application stream, generates a running average of the number of advertisement clicks by vendor, and inserts resulting rows in another in-application stream using pumps. For more information about the typical pattern, see Application Code.

You can provide a series of SQL statements, where the output of one statement can be used as the input for the next statement. You store intermediate results by creating in-application streams and pumps.

Note that the application code must create the streams with names specified in the Outputs. For example, if your Outputs defines output streams named ExampleOutputStream1 and ExampleOutputStream2, then your application code must create these streams.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 51200.

164

Page 171: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideCreateApplication

Required: No

ApplicationDescription (p. 163)

Summary description of the application.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 1024.

Required: No

ApplicationName (p. 163)

Name of your Amazon Kinesis Analytics application (for example, sample-app).

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CloudWatchLoggingOptions (p. 163)

Use this parameter to configure a CloudWatch log stream to monitor application configuration errors. For more information, see Working with Amazon CloudWatch Logs.

Type: Array of CloudWatchLoggingOption (p. 202) objects

Required: No

Inputs (p. 163)

Use this parameter to configure the application input.

You can configure your application to receive input from a single streaming source. In this configuration, you map this streaming source to an in-application stream that is created. Your application code can then query the in-application stream like a table (you can think of it as a constantly updating table).

For the streaming source, you provide its Amazon Resource Name (ARN) and the format of data on the stream (for example, JSON, CSV, etc.). You also must provide an IAM role that Amazon Kinesis Analytics can assume to read this stream on your behalf.

To create the in-application stream, you need to specify a schema to transform your data into a schematized version used in SQL. In the schema, you provide the necessary mapping of the data elements in the streaming source to record columns in the in-app stream.

Type: Array of Input (p. 207) objects

Required: No

Outputs (p. 163)

You can configure application output to write data from any of the in-application streams to up to three destinations.

These destinations can be Amazon Kinesis streams, Amazon Kinesis Firehose delivery streams, AWS Lambda destinations, or any combination of the three.

In the configuration, you specify the in-application stream name, the destination stream or Lambda function Amazon Resource Name (ARN), and the format to use when writing data. You must also


provide an IAM role that Amazon Kinesis Analytics can assume to write to the destination stream or Lambda function on your behalf.

In the output configuration, you also provide the output stream or Lambda function ARN. For stream destinations, you provide the format of data in the stream (for example, JSON, CSV). You also must provide an IAM role that Amazon Kinesis Analytics can assume to write to the stream or Lambda function on your behalf.

Type: Array of Output (p. 241) objects

Required: No

Response Syntax

{ "ApplicationSummary": { "ApplicationARN": "string", "ApplicationName": "string", "ApplicationStatus": "string" }}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

ApplicationSummary (p. 166)

In response to your CreateApplication request, Amazon Kinesis Analytics returns a response with a summary of the application it created, including the application Amazon Resource Name (ARN), name, and status.

Type: ApplicationSummary (p. 200) object
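A minimal Boto3 sketch follows; the application name, stream ARN, role ARN, schema, and SQL are illustrative placeholders rather than a complete application.

import boto3

kda = boto3.client("kinesisanalytics")

response = kda.create_application(
    ApplicationName="sample-app",
    ApplicationCode=(
        'CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker VARCHAR(8), price DOUBLE);\n'
        'CREATE OR REPLACE PUMP "STREAM_PUMP" AS\n'
        '  INSERT INTO "DESTINATION_SQL_STREAM"\n'
        '  SELECT STREAM "TICKER", "PRICE" FROM "SOURCE_SQL_STREAM_001";'
    ),
    Inputs=[
        {
            "NamePrefix": "SOURCE_SQL_STREAM",
            "KinesisStreamsInput": {
                "ResourceARN": "arn:aws:kinesis:us-east-1:012345678901:stream/my-input-stream",
                "RoleARN": "arn:aws:iam::012345678901:role/service-role/my-kda-role",
            },
            "InputSchema": {
                "RecordFormat": {
                    "RecordFormatType": "JSON",
                    "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
                },
                "RecordEncoding": "UTF-8",
                "RecordColumns": [
                    {"Name": "TICKER", "Mapping": "$.ticker", "SqlType": "VARCHAR(8)"},
                    {"Name": "PRICE", "Mapping": "$.price", "SqlType": "DOUBLE"},
                ],
            },
        }
    ],
)

print(response["ApplicationSummary"]["ApplicationStatus"])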

Errors

CodeValidationException

User-provided application code (query) is invalid. This can be a simple syntax error.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

LimitExceededException

Exceeded the number of applications allowed.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400


DeleteApplication

Deletes the specified application. Amazon Kinesis Analytics halts application execution and deletes the application, including any application artifacts (such as in-application streams, reference table, and application code).

This operation requires permissions to perform the kinesisanalytics:DeleteApplication action.

Request Syntax

{ "ApplicationName": "string", "CreateTimestamp": number}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 168)

Name of the Amazon Kinesis Analytics application to delete.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CreateTimestamp (p. 168)

You can use the DescribeApplication operation to get this value.

Type: Timestamp

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
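Because the request requires the application's creation timestamp, a typical pattern is to fetch it with DescribeApplication first. A minimal Boto3 sketch follows; the application name is a placeholder.

import boto3

kda = boto3.client("kinesisanalytics")

# Fetch CreateTimestamp, then delete the application.
detail = kda.describe_application(ApplicationName="sample-app")["ApplicationDetail"]

kda.delete_application(
    ApplicationName="sample-app",
    CreateTimestamp=detail["CreateTimestamp"],
)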

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400


DeleteApplicationCloudWatchLoggingOption

Deletes a CloudWatch log stream from an application. For more information about using CloudWatch log streams with Amazon Kinesis Analytics applications, see Working with Amazon CloudWatch Logs.

Request Syntax

{ "ApplicationName": "string", "CloudWatchLoggingOptionId": "string", "CurrentApplicationVersionId": number}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 170)

The Kinesis Analytics application name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CloudWatchLoggingOptionId (p. 170)

The CloudWatchLoggingOptionId of the CloudWatch logging option to delete. You can get the CloudWatchLoggingOptionId by using the DescribeApplication (p. 178) operation.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 170)

The version ID of the Kinesis Analytics application.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.


Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


DeleteApplicationInputProcessingConfiguration

Deletes an InputProcessingConfiguration (p. 217) from an input.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "InputId": "string"}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 172)

The Kinesis Analytics application name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 172)

The version ID of the Kinesis Analytics application.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

InputId (p. 172)

The ID of the input configuration from which to delete the input processing configuration. You can get a list of the input IDs for an application by using the DescribeApplication (p. 178) operation.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


DeleteApplicationOutput

Deletes output destination configuration from your application configuration. Amazon Kinesis Analytics will no longer write data from the corresponding in-application stream to the external output destination.

This operation requires permissions to perform the kinesisanalytics:DeleteApplicationOutput action.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "OutputId": "string"}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 174)

Amazon Kinesis Analytics application name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 174)

Amazon Kinesis Analytics application version. You can use the DescribeApplication (p. 178) operation to get the current application version. If the version specified is not the current version, the ConcurrentModificationException is returned.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

OutputId (p. 174)

The ID of the configuration to delete. Each output configuration that is added to the application, either when the application is created or later using the AddApplicationOutput (p. 157) operation, has a unique ID. You need to provide the ID to uniquely identify the output configuration that you want to delete from the application configuration. You can use the DescribeApplication (p. 178) operation to get the specific OutputId.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

174

Page 181: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDeleteApplicationOutput

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


DeleteApplicationReferenceDataSource

Deletes a reference data source configuration from the specified application configuration.

If the application is running, Amazon Kinesis Analytics immediately removes the in-application table that you created using the AddApplicationReferenceDataSource (p. 160) operation.

This operation requires permissions to perform the kinesisanalytics:DeleteApplicationReferenceDataSource action.

Request Syntax

{ "ApplicationName": "string", "CurrentApplicationVersionId": number, "ReferenceId": "string"}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 176)

Name of an existing application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

CurrentApplicationVersionId (p. 176)

Version of the application. You can use the DescribeApplication (p. 178) operation to get the current application version. If the version specified is not the current version, the ConcurrentModificationException is returned.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

ReferenceId (p. 176)

ID of the reference data source. When you add a reference data source to your application using the AddApplicationReferenceDataSource (p. 160) operation, Amazon Kinesis Analytics assigns an ID. You can use the DescribeApplication (p. 178) operation to get the reference ID.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

176

Page 183: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDeleteApplicationReferenceDataSource

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


DescribeApplication

Returns information about a specific Amazon Kinesis Analytics application.

If you want to retrieve a list of all applications in your account, use the ListApplications (p. 186) operation.

This operation requires permissions to perform the kinesisanalytics:DescribeApplication action. You can use DescribeApplication to get the current application versionId, which you need to call other operations such as Update.

Request Syntax

{ "ApplicationName": "string"}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 178)

Name of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

Response Syntax

{ "ApplicationDetail": { "ApplicationARN": "string", "ApplicationCode": "string", "ApplicationDescription": "string", "ApplicationName": "string", "ApplicationStatus": "string", "ApplicationVersionId": number, "CloudWatchLoggingOptionDescriptions": [ { "CloudWatchLoggingOptionId": "string", "LogStreamARN": "string", "RoleARN": "string" } ], "CreateTimestamp": number, "InputDescriptions": [ { "InAppStreamNames": [ "string" ], "InputId": "string", "InputParallelism": { "Count": number },

178

Page 185: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDescribeApplication

"InputProcessingConfigurationDescription": { "InputLambdaProcessorDescription": { "ResourceARN": "string", "RoleARN": "string" } }, "InputSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "InputStartingPositionConfiguration": { "InputStartingPosition": "string" }, "KinesisFirehoseInputDescription": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsInputDescription": { "ResourceARN": "string", "RoleARN": "string" }, "NamePrefix": "string" } ], "LastUpdateTimestamp": number, "OutputDescriptions": [ { "DestinationSchema": { "RecordFormatType": "string" }, "KinesisFirehoseOutputDescription": { "ResourceARN": "string", "RoleARN": "string" }, "KinesisStreamsOutputDescription": { "ResourceARN": "string", "RoleARN": "string" }, "LambdaOutputDescription": { "ResourceARN": "string", "RoleARN": "string" }, "Name": "string", "OutputId": "string" } ], "ReferenceDataSourceDescriptions": [ {

179

Page 186: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDescribeApplication

"ReferenceId": "string", "ReferenceSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "S3ReferenceDataSourceDescription": { "BucketARN": "string", "FileKey": "string", "ReferenceRoleARN": "string" }, "TableName": "string" } ] }}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

ApplicationDetail (p. 178)

Provides a description of the application, such as the application Amazon Resource Name (ARN), status, latest version, and input and output configuration details.

Type: ApplicationDetail (p. 197) object
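A minimal Boto3 sketch follows; the application name is a placeholder.

import boto3

kda = boto3.client("kinesisanalytics")

detail = kda.describe_application(ApplicationName="sample-app")["ApplicationDetail"]

# The version ID is required by most configuration-update operations.
print(detail["ApplicationStatus"], detail["ApplicationVersionId"])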

Errors

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


DiscoverInputSchema

Infers a schema by evaluating sample records on the specified streaming source (Amazon Kinesis stream or Amazon Kinesis Firehose delivery stream) or S3 object. In the response, the operation returns the inferred schema and also the sample records that the operation used to infer the schema.

You can use the inferred schema when configuring a streaming source for your application. For conceptual information, see Configuring Application Input. Note that when you create an application using the Amazon Kinesis Analytics console, the console uses this operation to infer a schema and show it in the console user interface.

This operation requires permissions to perform the kinesisanalytics:DiscoverInputSchema action.

Request Syntax

{ "InputProcessingConfiguration": { "InputLambdaProcessor": { "ResourceARN": "string", "RoleARN": "string" } }, "InputStartingPositionConfiguration": { "InputStartingPosition": "string" }, "ResourceARN": "string", "RoleARN": "string", "S3Configuration": { "BucketARN": "string", "FileKey": "string", "RoleARN": "string" }}

Request Parameters

The request accepts the following data in JSON format.

InputProcessingConfiguration (p. 182)

The InputProcessingConfiguration (p. 217) to use to preprocess the records before discovering the schema of the records.

Type: InputProcessingConfiguration (p. 217) object

Required: No

InputStartingPositionConfiguration (p. 182)

Point at which you want Amazon Kinesis Analytics to start reading records from the specified streaming source for discovery purposes.

Type: InputStartingPositionConfiguration (p. 221) object

Required: NoResourceARN (p. 182)

Amazon Resource Name (ARN) of the streaming source.

182

Page 189: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDiscoverInputSchema

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN (p. 182)

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

S3Configuration (p. 182)

Specify this parameter to discover a schema from data in an Amazon S3 object.

Type: S3Configuration (p. 254) object

Required: No

Response Syntax

{ "InputSchema": { "RecordColumns": [ { "Mapping": "string", "Name": "string", "SqlType": "string" } ], "RecordEncoding": "string", "RecordFormat": { "MappingParameters": { "CSVMappingParameters": { "RecordColumnDelimiter": "string", "RecordRowDelimiter": "string" }, "JSONMappingParameters": { "RecordRowPath": "string" } }, "RecordFormatType": "string" } }, "ParsedInputRecords": [ [ "string" ] ], "ProcessedInputRecords": [ "string" ], "RawInputRecords": [ "string" ]}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

183

Page 190: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideDiscoverInputSchema

The following data is returned in JSON format by the service.

InputSchema (p. 183)

Schema inferred from the streaming source. It identifies the format of the data in the streaming source and how each data element maps to corresponding columns in the in-application stream that you can create.

Type: SourceSchema (p. 258) object

ParsedInputRecords (p. 183)

An array of elements, where each element corresponds to a row in a stream record (a stream record can have more than one row).

Type: Array of arrays of strings

ProcessedInputRecords (p. 183)

Stream data that was modified by the processor specified in the InputProcessingConfiguration parameter.

Type: Array of strings

RawInputRecords (p. 183)

Raw stream data that was sampled to infer the schema.

Type: Array of strings
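A minimal Boto3 sketch follows; the stream ARN and role ARN are placeholders.

import boto3

kda = boto3.client("kinesisanalytics")

response = kda.discover_input_schema(
    ResourceARN="arn:aws:kinesis:us-east-1:012345678901:stream/my-input-stream",
    RoleARN="arn:aws:iam::012345678901:role/service-role/my-kda-role",
    InputStartingPositionConfiguration={"InputStartingPosition": "NOW"},
)

# Print the inferred column names and SQL types.
for column in response["InputSchema"]["RecordColumns"]:
    print(column["Name"], column["SqlType"])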

Errors

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceProvisionedThroughputExceededException

Discovery failed to get a record from the streaming source because of the Amazon Kinesis Streams ProvisionedThroughputExceededException. For more information, see GetRecords in the Amazon Kinesis Streams API Reference.

HTTP Status Code: 400

ServiceUnavailableException

The service is unavailable. Back off and retry the operation.

HTTP Status Code: 500

UnableToDetectSchemaException

Data format is not valid. Amazon Kinesis Analytics is not able to detect schema for the given streaming source.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


ListApplications

Returns a list of Amazon Kinesis Analytics applications in your account. For each application, the response includes the application name, Amazon Resource Name (ARN), and status. If the response returns the HasMoreApplications value as true, you can send another request by adding ExclusiveStartApplicationName in the request body and setting its value to the last application name from the previous response.

If you want detailed information about a specific application, use DescribeApplication (p. 178).

This operation requires permissions to perform the kinesisanalytics:ListApplications action.

Request Syntax

{ "ExclusiveStartApplicationName": "string", "Limit": number}

Request Parameters

The request accepts the following data in JSON format.

ExclusiveStartApplicationName (p. 186)

Name of the application to start the list with. When using pagination to retrieve the list, you don't need to specify this parameter in the first request. However, in subsequent requests, you add the last application name from the previous response to get the next page of applications.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: No

Limit (p. 186)

Maximum number of applications to list.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 50.

Required: No

Response Syntax

{ "ApplicationSummaries": [ { "ApplicationARN": "string", "ApplicationName": "string", "ApplicationStatus": "string" } ], "HasMoreApplications": boolean

186

Page 193: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideListApplications

}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

ApplicationSummaries (p. 186)

List of ApplicationSummary objects.

Type: Array of ApplicationSummary (p. 200) objects

HasMoreApplications (p. 186)

Returns true if there are more applications to retrieve.

Type: Boolean
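A minimal Boto3 sketch of the pagination pattern described above follows; the Limit value is arbitrary.

import boto3

kda = boto3.client("kinesisanalytics")

apps = []
last_name = None
while True:
    kwargs = {"Limit": 50}
    if last_name:
        # Continue from the last application name of the previous page.
        kwargs["ExclusiveStartApplicationName"] = last_name
    page = kda.list_applications(**kwargs)
    apps.extend(page["ApplicationSummaries"])
    if not page["HasMoreApplications"]:
        break
    last_name = apps[-1]["ApplicationName"]

for app in apps:
    print(app["ApplicationName"], app["ApplicationStatus"])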

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2


StartApplication

Starts the specified Amazon Kinesis Analytics application. After creating an application, you must exclusively call this operation to start your application.

After the application starts, it begins consuming the input data, processes it, and writes the output to the configured destination.

The application status must be READY for you to start an application. You can get the application status in the console or using the DescribeApplication (p. 178) operation.

After you start the application, you can stop the application from processing the input by calling the StopApplication (p. 190) operation.

This operation requires permissions to perform the kinesisanalytics:StartApplication action.

Request Syntax

{ "ApplicationName": "string", "InputConfigurations": [ { "Id": "string", "InputStartingPositionConfiguration": { "InputStartingPosition": "string" } } ]}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 188)

Name of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

InputConfigurations (p. 188)

Identifies the specific input, by ID, that the application starts consuming. Amazon Kinesis Analytics starts reading the streaming source associated with the input. You can also specify where in the streaming source you want Amazon Kinesis Analytics to start reading.

Type: Array of InputConfiguration (p. 209) objects

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

InvalidApplicationConfigurationException

User-provided application configuration is not valid.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400
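
The following boto3 sketch starts a hypothetical application named ExampleApp at input ID 1.1; both values are illustrative, and real values come from DescribeApplication (p. 178).

import boto3

client = boto3.client("kinesisanalytics")

# Start reading just after the most recent record (NOW); TRIM_HORIZON or
# LAST_STOPPED_POINT could be used instead, as described under
# InputStartingPositionConfiguration.
client.start_application(
    ApplicationName="ExampleApp",
    InputConfigurations=[
        {
            "Id": "1.1",
            "InputStartingPositionConfiguration": {
                "InputStartingPosition": "NOW"
            },
        }
    ],
)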

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2

StopApplication

Stops the application from processing input data. You can stop an application only if it is in the running state. You can use the DescribeApplication (p. 178) operation to find the application state. After the application is stopped, Amazon Kinesis Analytics stops reading data from the input, the application stops processing data, and no output is written to the destination.

This operation requires permissions to perform the kinesisanalytics:StopApplication action.

Request Syntax

{
   "ApplicationName": "string"
}

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 190)

Name of the running application to stop.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400
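
A minimal boto3 sketch, assuming a hypothetical application named ExampleApp, that checks the state requirement described above before stopping:

import boto3

client = boto3.client("kinesisanalytics")

# Stop the application only if it is currently running.
detail = client.describe_application(ApplicationName="ExampleApp")
if detail["ApplicationDetail"]["ApplicationStatus"] == "RUNNING":
    client.stop_application(ApplicationName="ExampleApp")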

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2

UpdateApplication

Updates an existing Amazon Kinesis Analytics application. Using this API, you can update application code, input configuration, and output configuration.

Note that Amazon Kinesis Analytics updates the CurrentApplicationVersionId each time you update your application.

This operation requires permission for the kinesisanalytics:UpdateApplication action.

Request Syntax

{
   "ApplicationName": "string",
   "ApplicationUpdate": {
      "ApplicationCodeUpdate": "string",
      "CloudWatchLoggingOptionUpdates": [
         {
            "CloudWatchLoggingOptionId": "string",
            "LogStreamARNUpdate": "string",
            "RoleARNUpdate": "string"
         }
      ],
      "InputUpdates": [
         {
            "InputId": "string",
            "InputParallelismUpdate": {
               "CountUpdate": number
            },
            "InputProcessingConfigurationUpdate": {
               "InputLambdaProcessorUpdate": {
                  "ResourceARNUpdate": "string",
                  "RoleARNUpdate": "string"
               }
            },
            "InputSchemaUpdate": {
               "RecordColumnUpdates": [
                  {
                     "Mapping": "string",
                     "Name": "string",
                     "SqlType": "string"
                  }
               ],
               "RecordEncodingUpdate": "string",
               "RecordFormatUpdate": {
                  "MappingParameters": {
                     "CSVMappingParameters": {
                        "RecordColumnDelimiter": "string",
                        "RecordRowDelimiter": "string"
                     },
                     "JSONMappingParameters": {
                        "RecordRowPath": "string"
                     }
                  },
                  "RecordFormatType": "string"
               }
            },
            "KinesisFirehoseInputUpdate": {
               "ResourceARNUpdate": "string",
               "RoleARNUpdate": "string"
            },
            "KinesisStreamsInputUpdate": {
               "ResourceARNUpdate": "string",
               "RoleARNUpdate": "string"
            },
            "NamePrefixUpdate": "string"
         }
      ],
      "OutputUpdates": [
         {
            "DestinationSchemaUpdate": {
               "RecordFormatType": "string"
            },
            "KinesisFirehoseOutputUpdate": {
               "ResourceARNUpdate": "string",
               "RoleARNUpdate": "string"
            },
            "KinesisStreamsOutputUpdate": {
               "ResourceARNUpdate": "string",
               "RoleARNUpdate": "string"
            },
            "LambdaOutputUpdate": {
               "ResourceARNUpdate": "string",
               "RoleARNUpdate": "string"
            },
            "NameUpdate": "string",
            "OutputId": "string"
         }
      ],
      "ReferenceDataSourceUpdates": [
         {
            "ReferenceId": "string",
            "ReferenceSchemaUpdate": {
               "RecordColumns": [
                  {
                     "Mapping": "string",
                     "Name": "string",
                     "SqlType": "string"
                  }
               ],
               "RecordEncoding": "string",
               "RecordFormat": {
                  "MappingParameters": {
                     "CSVMappingParameters": {
                        "RecordColumnDelimiter": "string",
                        "RecordRowDelimiter": "string"
                     },
                     "JSONMappingParameters": {
                        "RecordRowPath": "string"
                     }
                  },
                  "RecordFormatType": "string"
               }
            },
            "S3ReferenceDataSourceUpdate": {
               "BucketARNUpdate": "string",
               "FileKeyUpdate": "string",
               "ReferenceRoleARNUpdate": "string"
            },
            "TableNameUpdate": "string"
         }
      ]
   },
   "CurrentApplicationVersionId": number
}

193

Page 200: Amazon Kinesis Analytics - Developer Guide

Amazon Kinesis Data Analytics Developer GuideUpdateApplication

Request Parameters

The request accepts the following data in JSON format.

ApplicationName (p. 192)

Name of the Amazon Kinesis Analytics application to update.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

ApplicationUpdate (p. 192)

Describes application updates.

Type: ApplicationUpdate (p. 201) object

Required: Yes

CurrentApplicationVersionId (p. 192)

The current application version ID. You can use the DescribeApplication (p. 178) operation to get this value.

Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

Response Elements

If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.

Errors

CodeValidationException

User-provided application code (query) is invalid. This can be a simple syntax error.

HTTP Status Code: 400

ConcurrentModificationException

Exception thrown as a result of concurrent modification to an application. For example, two individuals attempting to edit the same application at the same time.

HTTP Status Code: 400

InvalidArgumentException

Specified input parameter value is invalid.

HTTP Status Code: 400

ResourceInUseException

Application is not available for this operation.

HTTP Status Code: 400

ResourceNotFoundException

Specified application can't be found.

HTTP Status Code: 400
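
The following boto3 sketch updates the application code of a hypothetical application. It fetches the current version ID first because CurrentApplicationVersionId must match the live version; the application name and SQL shown are illustrative.

import boto3

client = boto3.client("kinesisanalytics")

# DescribeApplication returns the live version as ApplicationVersionId;
# passing a stale value causes the update to be rejected.
detail = client.describe_application(ApplicationName="ExampleApp")
version_id = detail["ApplicationDetail"]["ApplicationVersionId"]

client.update_application(
    ApplicationName="ExampleApp",
    CurrentApplicationVersionId=version_id,
    ApplicationUpdate={
        "ApplicationCodeUpdate": (
            'CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" '
            "(ticker VARCHAR(4), price DOUBLE);"
        )
    },
)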

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS Command Line Interface
• AWS SDK for .NET
• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for JavaScript
• AWS SDK for PHP V3
• AWS SDK for Python
• AWS SDK for Ruby V2

Data Types

The following data types are supported:

• ApplicationDetail (p. 197)
• ApplicationSummary (p. 200)
• ApplicationUpdate (p. 201)
• CloudWatchLoggingOption (p. 202)
• CloudWatchLoggingOptionDescription (p. 203)
• CloudWatchLoggingOptionUpdate (p. 204)
• CSVMappingParameters (p. 205)
• DestinationSchema (p. 206)
• Input (p. 207)
• InputConfiguration (p. 209)
• InputDescription (p. 210)
• InputLambdaProcessor (p. 212)
• InputLambdaProcessorDescription (p. 213)
• InputLambdaProcessorUpdate (p. 214)
• InputParallelism (p. 215)
• InputParallelismUpdate (p. 216)
• InputProcessingConfiguration (p. 217)
• InputProcessingConfigurationDescription (p. 218)
• InputProcessingConfigurationUpdate (p. 219)
• InputSchemaUpdate (p. 220)
• InputStartingPositionConfiguration (p. 221)
• InputUpdate (p. 222)

• JSONMappingParameters (p. 224)
• KinesisFirehoseInput (p. 225)
• KinesisFirehoseInputDescription (p. 226)
• KinesisFirehoseInputUpdate (p. 227)
• KinesisFirehoseOutput (p. 228)
• KinesisFirehoseOutputDescription (p. 229)
• KinesisFirehoseOutputUpdate (p. 230)
• KinesisStreamsInput (p. 231)
• KinesisStreamsInputDescription (p. 232)
• KinesisStreamsInputUpdate (p. 233)
• KinesisStreamsOutput (p. 234)
• KinesisStreamsOutputDescription (p. 235)
• KinesisStreamsOutputUpdate (p. 236)
• LambdaOutput (p. 237)
• LambdaOutputDescription (p. 238)
• LambdaOutputUpdate (p. 239)
• MappingParameters (p. 240)
• Output (p. 241)
• OutputDescription (p. 243)
• OutputUpdate (p. 245)
• RecordColumn (p. 247)
• RecordFormat (p. 248)
• ReferenceDataSource (p. 249)
• ReferenceDataSourceDescription (p. 250)
• ReferenceDataSourceUpdate (p. 252)
• S3Configuration (p. 254)
• S3ReferenceDataSource (p. 255)
• S3ReferenceDataSourceDescription (p. 256)
• S3ReferenceDataSourceUpdate (p. 257)
• SourceSchema (p. 258)


ApplicationDetail

Provides a description of the application, including the application Amazon Resource Name (ARN), status, latest version, and input and output configuration.

Contents

ApplicationARN

ARN of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

ApplicationCode

Returns the application code that you provided to perform data analysis on any of the in-application streams in your application.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 51200.

Required: No

ApplicationDescription

Description of the application.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 1024.

Required: No

ApplicationName

Name of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

ApplicationStatus

Status of the application.

Type: String

Valid Values: DELETING | STARTING | STOPPING | READY | RUNNING | UPDATING

Required: Yes

ApplicationVersionId

Provides the current application version.


Type: Long

Valid Range: Minimum value of 1. Maximum value of 999999999.

Required: Yes

CloudWatchLoggingOptionDescriptions

Describes the CloudWatch log streams that are configured to receive application messages. For more information about using CloudWatch log streams with Amazon Kinesis Analytics applications, see Working with Amazon CloudWatch Logs.

Type: Array of CloudWatchLoggingOptionDescription (p. 203) objects

Required: No

CreateTimestamp

Time stamp when the application version was created.

Type: Timestamp

Required: No

InputDescriptions

Describes the application input configuration. For more information, see Configuring Application Input.

Type: Array of InputDescription (p. 210) objects

Required: No

LastUpdateTimestamp

Time stamp when the application was last updated.

Type: Timestamp

Required: No

OutputDescriptions

Describes the application output configuration. For more information, see Configuring Application Output.

Type: Array of OutputDescription (p. 243) objects

Required: No

ReferenceDataSourceDescriptions

Describes reference data sources configured for the application. For more information, see Configuring Application Input.

Type: Array of ReferenceDataSourceDescription (p. 250) objects

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

ApplicationSummary

Provides application summary information, including the application Amazon Resource Name (ARN), name, and status.

Contents

ApplicationARN

ARN of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

ApplicationName

Name of the application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 128.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

ApplicationStatus

Status of the application.

Type: String

Valid Values: DELETING | STARTING | STOPPING | READY | RUNNING | UPDATING

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

ApplicationUpdate

Describes updates to apply to an existing Amazon Kinesis Analytics application.

Contents

ApplicationCodeUpdate

Describes application code updates.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 51200.

Required: No

CloudWatchLoggingOptionUpdates

Describes application CloudWatch logging option updates.

Type: Array of CloudWatchLoggingOptionUpdate (p. 204) objects

Required: No

InputUpdates

Describes application input configuration updates.

Type: Array of InputUpdate (p. 222) objects

Required: No

OutputUpdates

Describes application output configuration updates.

Type: Array of OutputUpdate (p. 245) objects

Required: No

ReferenceDataSourceUpdates

Describes application reference data source updates.

Type: Array of ReferenceDataSourceUpdate (p. 252) objects

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

CloudWatchLoggingOption

Provides a description of CloudWatch logging options, including the log stream Amazon Resource Name (ARN) and the role ARN.

Contents

LogStreamARN

ARN of the CloudWatch log to receive application messages.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

IAM ARN of the role to use to send application messages. Note: To write application messages to CloudWatch, the IAM role that is used must have the PutLogEvents policy action enabled.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

CloudWatchLoggingOptionDescription

Description of the CloudWatch logging option.

Contents

CloudWatchLoggingOptionId

ID of the CloudWatch logging option description.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: No

LogStreamARN

ARN of the CloudWatch log to receive application messages.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

IAM ARN of the role to use to send application messages. Note: To write application messages to CloudWatch, the IAM role used must have the PutLogEvents policy action enabled.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

CloudWatchLoggingOptionUpdate

Describes CloudWatch logging option updates.

Contents

CloudWatchLoggingOptionId

ID of the CloudWatch logging option to update.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

LogStreamARNUpdate

ARN of the CloudWatch log to receive application messages.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

IAM ARN of the role to use to send application messages. Note: To write application messages to CloudWatch, the IAM role used must have the PutLogEvents policy action enabled.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

CSVMappingParameters

Provides additional mapping information when the record format uses delimiters, such as CSV. For example, the following sample records use CSV format, where the records use the '\n' as the row delimiter and a comma (",") as the column delimiter:

"name1", "address1"

"name2", "address2"

Contents

RecordColumnDelimiter

Column delimiter. For example, in a CSV format, a comma (",") is the typical column delimiter.

Type: String

Length Constraints: Minimum length of 1.

Required: Yes

RecordRowDelimiter

Row delimiter. For example, in a CSV format, '\n' is the typical row delimiter.

Type: String

Length Constraints: Minimum length of 1.

Required: Yes
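
As a sketch, the mapping parameters for a hypothetical tab-delimited source whose records end with a newline would look like this (a Python dict in the shape the AWS SDKs accept):

# Illustrative values: tab-separated columns, newline-terminated rows.
csv_mapping_parameters = {
    "RecordColumnDelimiter": "\t",
    "RecordRowDelimiter": "\n",
}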

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

DestinationSchema

Describes the data format when records are written to the destination. For more information, see Configuring Application Output.

Contents

RecordFormatType

Specifies the format of the records on the output stream.

Type: String

Valid Values: JSON | CSV

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

Input

When you configure the application input, you specify the streaming source, the in-application stream name that is created, and the mapping between the two. For more information, see Configuring Application Input.

Contents

InputParallelism

Describes the number of in-application streams to create.

Data from your source is routed to these in-application input streams (see Configuring Application Input).

Type: InputParallelism (p. 215) object

Required: No

InputProcessingConfiguration

The InputProcessingConfiguration (p. 217) for the input. An input processor transforms records as they are received from the stream, before the application's SQL code executes. Currently, the only input processing configuration available is InputLambdaProcessor (p. 212).

Type: InputProcessingConfiguration (p. 217) object

Required: No

InputSchema

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns in the in-application stream that is being created.

Also used to describe the format of the reference data source.

Type: SourceSchema (p. 258) object

Required: Yes

KinesisFirehoseInput

If the streaming source is an Amazon Kinesis Firehose delivery stream, identifies the delivery stream's ARN and an IAM role that enables Amazon Kinesis Analytics to access the stream on your behalf.

Note: Either KinesisStreamsInput or KinesisFirehoseInput is required.

Type: KinesisFirehoseInput (p. 225) object

Required: No

KinesisStreamsInput

If the streaming source is an Amazon Kinesis stream, identifies the stream's Amazon Resource Name (ARN) and an IAM role that enables Amazon Kinesis Analytics to access the stream on your behalf.

Note: Either KinesisStreamsInput or KinesisFirehoseInput is required.

Type: KinesisStreamsInput (p. 231) object

Required: No


NamePrefix

Name prefix to use when creating an in-application stream. Suppose that you specify a prefix "MyInApplicationStream." Amazon Kinesis Analytics then creates one or more (as per the InputParallelism count you specified) in-application streams with names "MyInApplicationStream_001," "MyInApplicationStream_002," and so on.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: Yes
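
Putting the fields above together, here is a sketch of an Input object for a hypothetical application reading CSV records from a Kinesis stream; all ARNs, names, and columns are illustrative:

# Hypothetical input configuration in the shape the AWS SDKs accept.
example_input = {
    "NamePrefix": "SOURCE_SQL_STREAM",
    "InputParallelism": {"Count": 1},
    "KinesisStreamsInput": {
        "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/ExampleStream",
        "RoleARN": "arn:aws:iam::123456789012:role/example-analytics-role",
    },
    "InputSchema": {
        "RecordFormat": {
            "RecordFormatType": "CSV",
            "MappingParameters": {
                "CSVMappingParameters": {
                    "RecordColumnDelimiter": ",",
                    "RecordRowDelimiter": "\n",
                }
            },
        },
        "RecordEncoding": "UTF-8",
        "RecordColumns": [
            {"Name": "TICKER", "SqlType": "VARCHAR(4)"},
            {"Name": "PRICE", "SqlType": "DOUBLE"},
        ],
    },
}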

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputConfiguration

When you start your application, you provide this configuration, which identifies the input source and the point in the input source at which you want the application to start processing records.

Contents

Id

Input source ID. You can get this ID by calling the DescribeApplication (p. 178) operation.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

InputStartingPositionConfiguration

Point at which you want the application to start processing records from the streaming source.

Type: InputStartingPositionConfiguration (p. 221) object

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputDescription

Describes the application input configuration. For more information, see Configuring Application Input.

Contents

InAppStreamNames

Returns the in-application stream names that are mapped to the stream source.

Type: Array of strings

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No

InputId

Input ID associated with the application input. This is the ID that Amazon Kinesis Analytics assigns to each input configuration you add to your application.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: No

InputParallelism

Describes the configured parallelism (number of in-application streams mapped to the streaming source).

Type: InputParallelism (p. 215) object

Required: No

InputProcessingConfigurationDescription

The description of the preprocessor that executes on records in this input before the application's code is run.

Type: InputProcessingConfigurationDescription (p. 218) object

Required: No

InputSchema

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns in the in-application stream that is being created.

Type: SourceSchema (p. 258) object

Required: No

InputStartingPositionConfiguration

Point at which the application is configured to read from the input stream.

Type: InputStartingPositionConfiguration (p. 221) object


Required: No

KinesisFirehoseInputDescription

If an Amazon Kinesis Firehose delivery stream is configured as a streaming source, provides the delivery stream's ARN and an IAM role that enables Amazon Kinesis Analytics to access the stream on your behalf.

Type: KinesisFirehoseInputDescription (p. 226) object

Required: No

KinesisStreamsInputDescription

If an Amazon Kinesis stream is configured as the streaming source, provides the Amazon Kinesis stream's Amazon Resource Name (ARN) and an IAM role that enables Amazon Kinesis Analytics to access the stream on your behalf.

Type: KinesisStreamsInputDescription (p. 232) object

Required: No

NamePrefix

In-application name prefix.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputLambdaProcessor

An object that contains the Amazon Resource Name (ARN) of the AWS Lambda function that is used to preprocess records in the stream, and the ARN of the IAM role that is used to access the AWS Lambda function.

Contents

ResourceARN

The ARN of the AWS Lambda function that operates on records in the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

The ARN of the IAM role that is used to access the AWS Lambda function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputLambdaProcessorDescription

An object that contains the Amazon Resource Name (ARN) of the AWS Lambda function that is used to preprocess records in the stream, and the ARN of the IAM role that is used to access the AWS Lambda function.

Contents

ResourceARN

The ARN of the AWS Lambda function that is used to preprocess the records in the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

The ARN of the IAM role that is used to access the AWS Lambda function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputLambdaProcessorUpdate

Represents an update to the InputLambdaProcessor (p. 212) that is used to preprocess the records in the stream.

Contents

ResourceARNUpdate

The Amazon Resource Name (ARN) of the new AWS Lambda function that is used to preprocess the records in the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

The ARN of the new IAM role that is used to access the AWS Lambda function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputParallelism

Describes the number of in-application streams to create for a given streaming source. For information about parallelism, see Configuring Application Input.

Contents

Count

Number of in-application streams to create. For more information, see Limits.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 64.

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputParallelismUpdate

Provides updates to the parallelism count.

Contents

CountUpdate

Number of in-application streams to create for the specified streaming source.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 64.

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputProcessingConfiguration

Provides a description of a processor that is used to preprocess the records in the stream before being processed by your application code. Currently, the only input processor available is AWS Lambda.

Contents

InputLambdaProcessor

The InputLambdaProcessor (p. 212) that is used to preprocess the records in the stream before being processed by your application code.

Type: InputLambdaProcessor (p. 212) object

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputProcessingConfigurationDescription

Provides configuration information about an input processor. Currently, the only input processor available is AWS Lambda.

Contents

InputLambdaProcessorDescription

Provides configuration information about the associated InputLambdaProcessorDescription (p. 213).

Type: InputLambdaProcessorDescription (p. 213) object

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputProcessingConfigurationUpdate

Describes updates to an InputProcessingConfiguration (p. 217).

Contents

InputLambdaProcessorUpdate

Provides update information for an InputLambdaProcessor (p. 212).

Type: InputLambdaProcessorUpdate (p. 214) object

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputSchemaUpdate

Describes updates for the application's input schema.

Contents

RecordColumnUpdates

A list of RecordColumn objects. Each object describes the mapping of the streaming source element to the corresponding column in the in-application stream.

Type: Array of RecordColumn (p. 247) objects

Array Members: Minimum number of 1 item. Maximum number of 1000 items.

Required: No

RecordEncodingUpdate

Specifies the encoding of the records in the streaming source. For example, UTF-8.

Type: String

Pattern: UTF-8

Required: No

RecordFormatUpdate

Specifies the format of the records on the streaming source.

Type: RecordFormat (p. 248) object

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputStartingPositionConfiguration

Describes the point at which the application reads from the streaming source.

Contents

InputStartingPosition

The starting position on the stream.

• NOW - Start reading just after the most recent record in the stream, starting at the request timestamp that the customer issued.
• TRIM_HORIZON - Start reading at the last untrimmed record in the stream, which is the oldest record available in the stream. This option is not available for an Amazon Kinesis Firehose delivery stream.
• LAST_STOPPED_POINT - Resume reading from where the application last stopped reading.

Type: String

Valid Values: NOW | TRIM_HORIZON | LAST_STOPPED_POINT

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

InputUpdate

Describes updates to a specific input configuration (identified by the InputId of an application).

Contents

InputId

Input ID of the application input to be updated.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

InputParallelismUpdate

Describes the parallelism updates (the number of in-application streams that Amazon Kinesis Analytics creates for the specific streaming source).

Type: InputParallelismUpdate (p. 216) object

Required: No

InputProcessingConfigurationUpdate

Describes updates for an input processing configuration.

Type: InputProcessingConfigurationUpdate (p. 219) object

Required: No

InputSchemaUpdate

Describes the data format on the streaming source, and how record elements on the streaming source map to columns of the in-application stream that is created.

Type: InputSchemaUpdate (p. 220) object

Required: No

KinesisFirehoseInputUpdate

If an Amazon Kinesis Firehose delivery stream is the streaming source to be updated, provides an updated stream ARN and IAM role ARN.

Type: KinesisFirehoseInputUpdate (p. 227) object

Required: No

KinesisStreamsInputUpdate

If an Amazon Kinesis stream is the streaming source to be updated, provides an updated stream Amazon Resource Name (ARN) and IAM role ARN.

Type: KinesisStreamsInputUpdate (p. 233) object

Required: No

NamePrefixUpdate

Name prefix for in-application streams that Amazon Kinesis Analytics creates for the specific streaming source.


Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

JSONMappingParameters

Provides additional mapping information when JSON is the record format on the streaming source.

Contents

RecordRowPath

Path to the top-level parent that contains the records.

Type: String

Length Constraints: Minimum length of 1.

Required: Yes
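
As a sketch, for hypothetical records shaped like {"ticker": "AMZN", "price": 10.5}, a RecordRowPath of "$" treats each top-level object as one record; the per-column Mapping values (such as $.ticker) then select individual elements:

# "$" points at the top-level object of each record (illustrative).
json_mapping_parameters = {"RecordRowPath": "$"}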

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseInput

Identifies an Amazon Kinesis Firehose delivery stream as the streaming source. You provide the delivery stream's Amazon Resource Name (ARN) and an IAM role ARN that enables Amazon Kinesis Analytics to access the stream on your behalf.

Contents

ResourceARN

ARN of the input delivery stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to make sure that the role has the necessary permissions to access the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseInputDescription

Describes the Amazon Kinesis Firehose delivery stream that is configured as the streaming source in the application input configuration.

Contents

ResourceARN

Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics assumes to access the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseInputUpdate

When updating application input configuration, provides information about an Amazon Kinesis Firehose delivery stream as the streaming source.

Contents

ResourceARNUpdate

Amazon Resource Name (ARN) of the input Amazon Kinesis Firehose delivery stream to read.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseOutput

When configuring application output, identifies an Amazon Kinesis Firehose delivery stream as the destination. You provide the stream Amazon Resource Name (ARN) and an IAM role that enables Amazon Kinesis Analytics to write to the stream on your behalf.

Contents

ResourceARN

ARN of the destination Amazon Kinesis Firehose delivery stream to write to.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseOutputDescription

For an application output, describes the Amazon Kinesis Firehose delivery stream configured as its destination.

Contents

ResourceARN

Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisFirehoseOutputUpdate

When updating an output configuration using the UpdateApplication (p. 192) operation, provides information about an Amazon Kinesis Firehose delivery stream configured as the destination.

Contents

ResourceARNUpdate

Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream to write to.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsInput

Identifies an Amazon Kinesis stream as the streaming source. You provide the stream's Amazon Resource Name (ARN) and an IAM role ARN that enables Amazon Kinesis Analytics to access the stream on your behalf.

Contents

ResourceARN

ARN of the input Amazon Kinesis stream to read.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsInputDescription

Describes the Amazon Kinesis stream that is configured as the streaming source in the application input configuration.

Contents

ResourceARN

Amazon Resource Name (ARN) of the Amazon Kinesis stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsInputUpdate

When updating application input configuration, provides information about an Amazon Kinesis stream as the streaming source.

Contents

ResourceARNUpdate

Amazon Resource Name (ARN) of the input Amazon Kinesis stream to read.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsOutput

When configuring application output, identifies an Amazon Kinesis stream as the destination. You provide the stream Amazon Resource Name (ARN) and also an IAM role ARN that Amazon Kinesis Analytics can use to write to the stream on your behalf.

Contents

ResourceARN

ARN of the destination Amazon Kinesis stream to write to.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsOutputDescription

For an application output, describes the Amazon Kinesis stream configured as its destination.

Contents

ResourceARN

Amazon Resource Name (ARN) of the Amazon Kinesis stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

KinesisStreamsOutputUpdate

When updating an output configuration using the UpdateApplication (p. 192) operation, provides information about an Amazon Kinesis stream configured as the destination.

Contents

ResourceARNUpdate

Amazon Resource Name (ARN) of the Amazon Kinesis stream where you want to write the output.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

LambdaOutput

When configuring application output, identifies an AWS Lambda function as the destination. You provide the function Amazon Resource Name (ARN) and also an IAM role ARN that Amazon Kinesis Analytics can use to write to the function on your behalf.

Contents

ResourceARN

Amazon Resource Name (ARN) of the destination Lambda function to write to.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

LambdaOutputDescription

For an application output, describes the AWS Lambda function configured as its destination.

Contents

ResourceARN

Amazon Resource Name (ARN) of the destination Lambda function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

LambdaOutputUpdate

When updating an output configuration using the UpdateApplication (p. 192) operation, provides information about an AWS Lambda function configured as the destination.

Contents

ResourceARNUpdate

Amazon Resource Name (ARN) of the destination Lambda function.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

RoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function on your behalf. You need to grant the necessary permissions to this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

MappingParameters

When configuring application input at the time of creating or updating an application, provides additional mapping information specific to the record format (such as JSON, CSV, or record fields delimited by some delimiter) on the streaming source.

Contents

CSVMappingParameters

Provides additional mapping information when the record format uses delimiters (for example, CSV).

Type: CSVMappingParameters (p. 205) object

Required: No

JSONMappingParameters

Provides additional mapping information when JSON is the record format on the streaming source.

Type: JSONMappingParameters (p. 224) object

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

Output

Describes application output configuration in which you identify an in-application stream and a destination where you want the in-application stream data to be written. The destination can be an Amazon Kinesis stream or an Amazon Kinesis Firehose delivery stream.

For limits on how many destinations an application can write to, and for other limitations, see Limits.

Contents

DestinationSchema

Describes the data format when records are written to the destination. For more information, see Configuring Application Output.

Type: DestinationSchema (p. 206) object

Required: Yes

KinesisFirehoseOutput

Identifies an Amazon Kinesis Firehose delivery stream as the destination.

Type: KinesisFirehoseOutput (p. 228) object

Required: No

KinesisStreamsOutput

Identifies an Amazon Kinesis stream as the destination.

Type: KinesisStreamsOutput (p. 234) object

Required: No

LambdaOutput

Identifies an AWS Lambda function as the destination.

Type: LambdaOutput (p. 237) object

Required: No

Name

Name of the in-application stream.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: Yes
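
As a sketch, an Output object that writes an in-application stream to a hypothetical Kinesis stream as JSON (the ARN and names are illustrative):

# Hypothetical output configuration in the shape the AWS SDKs accept.
example_output = {
    "Name": "DESTINATION_SQL_STREAM",
    "DestinationSchema": {"RecordFormatType": "JSON"},
    "KinesisStreamsOutput": {
        "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/ExampleOutputStream",
        "RoleARN": "arn:aws:iam::123456789012:role/example-analytics-role",
    },
}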

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

OutputDescription

Describes the application output configuration, which includes the in-application stream name and the destination where the stream data is written. The destination can be an Amazon Kinesis stream or an Amazon Kinesis Firehose delivery stream.

Contents

DestinationSchema

Data format used for writing data to the destination.

Type: DestinationSchema (p. 206) object

Required: No

KinesisFirehoseOutputDescription

Describes the Amazon Kinesis Firehose delivery stream configured as the destination where output is written.

Type: KinesisFirehoseOutputDescription (p. 229) object

Required: No

KinesisStreamsOutputDescription

Describes the Amazon Kinesis stream configured as the destination where output is written.

Type: KinesisStreamsOutputDescription (p. 235) object

Required: No

LambdaOutputDescription

Describes the AWS Lambda function configured as the destination where output is written.

Type: LambdaOutputDescription (p. 238) object

Required: No

Name

Name of the in-application stream configured as output.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No

OutputId

A unique identifier for the output configuration.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: No


See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

OutputUpdate

Describes updates to the output configuration identified by the OutputId.

Contents

DestinationSchemaUpdate

Describes the data format when records are written to the destination. For more information, see Configuring Application Output.

Type: DestinationSchema (p. 206) object

Required: No

KinesisFirehoseOutputUpdate

Describes an Amazon Kinesis Firehose delivery stream as the destination for the output.

Type: KinesisFirehoseOutputUpdate (p. 230) object

Required: No

KinesisStreamsOutputUpdate

Describes an Amazon Kinesis stream as the destination for the output.

Type: KinesisStreamsOutputUpdate (p. 236) object

Required: No

LambdaOutputUpdate

Describes an AWS Lambda function as the destination for the output.

Type: LambdaOutputUpdate (p. 239) object

Required: No

NameUpdate

If you want to specify a different in-application stream for this output configuration, use this field to specify the new in-application stream name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No

OutputId

Identifies the specific output configuration that you want to update.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes


See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2

RecordColumnDescribes the mapping of each data element in the streaming source to the corresponding column in thein-application stream.

Also used to describe the format of the reference data source.

Contents

Mapping

A reference to the data element in the streaming input or the reference data source.

Type: String

Required: No

Name

Name of the column created in the in-application input stream or reference table.

Type: String

Pattern: [a-zA-Z_][a-zA-Z0-9_]*

Required: Yes

SqlType

Type of column created in the in-application input stream or reference table.

Type: String

Length Constraints: Minimum length of 1.

Required: Yes
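For illustration, a RecordColumn that maps a JSON field to a SQL column might look like the following sketch; the column name, JSONPath expression, and SQL type are hypothetical.

```python
# A hypothetical RecordColumn, as supplied inside a SourceSchema (p. 258).
record_column = {
    "Name": "TICKER_SYMBOL",      # column created in the in-application stream
    "Mapping": "$.tickerSymbol",  # JSONPath reference into the source record
    "SqlType": "VARCHAR(16)",     # SQL type of the column
}
```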

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


RecordFormat

Describes the record format and relevant mapping information that should be applied to schematize the records on the stream.

Contents

MappingParameters

When configuring application input at the time of creating or updating an application, provides additional mapping information specific to the record format (such as JSON, CSV, or record fields delimited by some delimiter) on the streaming source.

Type: MappingParameters (p. 240) object

Required: No

RecordFormatType

The type of record format.

Type: String

Valid Values: JSON | CSV

Required: Yes
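As a sketch, a RecordFormat for comma-delimited records might be expressed as follows; the delimiters shown are illustrative.

```python
# A hypothetical RecordFormat for CSV records, with mapping parameters.
record_format = {
    "RecordFormatType": "CSV",
    "MappingParameters": {
        "CSVMappingParameters": {
            "RecordRowDelimiter": "\n",    # rows separated by newlines
            "RecordColumnDelimiter": ",",  # fields separated by commas
        }
    },
}
```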

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


ReferenceDataSource

Describes the reference data source by providing the source information (S3 bucket name and object key name), the resulting in-application table name that is created, and the necessary schema to map the data elements in the Amazon S3 object to the in-application table.

Contents

ReferenceSchema

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns created in the in-application stream.

Type: SourceSchema (p. 258) object

Required: Yes

S3ReferenceDataSource

Identifies the S3 bucket and object that contains the reference data. Also identifies the IAM role Amazon Kinesis Analytics can assume to read this object on your behalf. An Amazon Kinesis Analytics application loads reference data only once. If the data changes, you call the UpdateApplication (p. 192) operation to trigger reloading of data into your application.

Type: S3ReferenceDataSource (p. 255) object

Required: No

TableName

Name of the in-application table to create.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: Yes
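The following is a minimal sketch of passing a ReferenceDataSource to the AddApplicationReferenceDataSource (p. 160) operation using Boto3; the application name, version, bucket, object key, role ARN, and schema contents are placeholders.

```python
import boto3

client = boto3.client("kinesisanalytics")

# All names, ARNs, and columns below are illustrative placeholders.
client.add_application_reference_data_source(
    ApplicationName="example-app",
    CurrentApplicationVersionId=3,
    ReferenceDataSource={
        "TableName": "CompanyName",
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::example-bucket",
            "FileKey": "reference/tickers.csv",
            "ReferenceRoleARN": "arn:aws:iam::123456789012:role/example-role",
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordRowDelimiter": "\n",
                        "RecordColumnDelimiter": ",",
                    }
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "SqlType": "VARCHAR(4)"},
                {"Name": "COMPANY", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)
```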

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


ReferenceDataSourceDescription

Describes the reference data source configured for an application.

Contents

ReferenceId

ID of the reference data source. This is the ID that Amazon Kinesis Analytics assigns when you add the reference data source to your application using the AddApplicationReferenceDataSource (p. 160) operation.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

ReferenceSchema

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns created in the in-application stream.

Type: SourceSchema (p. 258) object

Required: No

S3ReferenceDataSourceDescription

Provides the S3 bucket name and the object key name that contains the reference data. It also provides the Amazon Resource Name (ARN) of the IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object and populate the in-application reference table.

Type: S3ReferenceDataSourceDescription (p. 256) object

Required: Yes

TableName

The in-application table name created by the specific reference data source configuration.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


ReferenceDataSourceUpdate

When you update a reference data source configuration for an application, this object provides all the updated values (such as the source bucket name and object key name), the in-application table name that is created, and updated mapping information that maps the data in the Amazon S3 object to the in-application reference table that is created.

Contents

ReferenceId

ID of the reference data source being updated. You can use the DescribeApplication (p. 178) operation to get this value.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 50.

Pattern: [a-zA-Z0-9_.-]+

Required: Yes

ReferenceSchemaUpdate

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns created in the in-application stream.

Type: SourceSchema (p. 258) object

Required: No

S3ReferenceDataSourceUpdate

Describes the S3 bucket name, object key name, and IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object on your behalf and populate the in-application reference table.

Type: S3ReferenceDataSourceUpdate (p. 257) object

Required: No

TableNameUpdate

In-application table name that is created by this update.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 32.

Pattern: [a-zA-Z][a-zA-Z0-9_]+

Required: No
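As a sketch, a ReferenceDataSourceUpdate that points an existing reference table at a new S3 object might be passed to the UpdateApplication (p. 192) operation as follows; all identifiers are placeholders.

```python
import boto3

client = boto3.client("kinesisanalytics")

# Placeholder identifiers; obtain the real ReferenceId from DescribeApplication.
client.update_application(
    ApplicationName="example-app",
    CurrentApplicationVersionId=4,
    ApplicationUpdate={
        "ReferenceDataSourceUpdates": [
            {
                "ReferenceId": "1.1",
                "S3ReferenceDataSourceUpdate": {
                    # Only the object key changes; bucket and role stay as-is.
                    "FileKeyUpdate": "reference/tickers-v2.csv",
                },
            }
        ]
    },
)
```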

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


S3Configuration

Provides a description of an Amazon S3 data source, including the Amazon Resource Name (ARN) of the S3 bucket, the ARN of the IAM role that is used to access the bucket, and the name of the Amazon S3 object that contains the data.

Contents

BucketARN

ARN of the S3 bucket that contains the data.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

FileKey

The name of the object that contains the data.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 1024.

Required: Yes

RoleARN

ARN of the IAM role used to access the data.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes
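As an illustrative sketch, an S3Configuration can be passed to the DiscoverInputSchema operation to run schema discovery on a static S3 object (Boto3); the bucket, key, and role ARN are placeholders.

```python
import boto3

client = boto3.client("kinesisanalytics")

# Placeholder bucket, object key, and role ARN.
response = client.discover_input_schema(
    S3Configuration={
        "BucketARN": "arn:aws:s3:::example-bucket",
        "FileKey": "samples/stock-trades.json",
        "RoleARN": "arn:aws:iam::123456789012:role/example-role",
    }
)

# The inferred schema is a SourceSchema (p. 258) object.
print(response["InputSchema"])
```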

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


S3ReferenceDataSource

Identifies the S3 bucket and object that contains the reference data. Also identifies the IAM role Amazon Kinesis Analytics can assume to read this object on your behalf.

An Amazon Kinesis Analytics application loads reference data only once. If the data changes, you call the UpdateApplication (p. 192) operation to trigger reloading of data into your application.

Contents

BucketARN

Amazon Resource Name (ARN) of the S3 bucket.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

FileKey

Object key name containing reference data.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 1024.

Required: Yes

ReferenceRoleARN

ARN of the IAM role that the service can assume to read data on your behalf. This role must have permission for the s3:GetObject action on the object and a trust policy that allows the Amazon Kinesis Analytics service principal to assume this role.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


S3ReferenceDataSourceDescription

Provides the bucket name and object key name that store the reference data.

Contents

BucketARN

Amazon Resource Name (ARN) of the S3 bucket.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: Yes

FileKey

Amazon S3 object key name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 1024.

Required: Yes

ReferenceRoleARN

ARN of the IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object on your behalf to populate the in-application reference table.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: Yes

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


S3ReferenceDataSourceUpdate

Describes the S3 bucket name, object key name, and IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object on your behalf and populate the in-application reference table.

Contents

BucketARNUpdate

Amazon Resource Name (ARN) of the S3 bucket.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:.*

Required: No

FileKeyUpdate

Object key name.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 1024.

Required: No

ReferenceRoleARNUpdate

ARN of the IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object and populate the in-application reference table.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 2048.

Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+

Required: No

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


SourceSchema

Describes the format of the data in the streaming source, and how each data element maps to corresponding columns created in the in-application stream.

Contents

RecordColumns

A list of RecordColumn objects.

Type: Array of RecordColumn (p. 247) objects

Array Members: Minimum number of 1 item. Maximum number of 1000 items.

Required: Yes

RecordEncoding

Specifies the encoding of the records in the streaming source. For example, UTF-8.

Type: String

Pattern: UTF-8

Required: No

RecordFormat

Specifies the format of the records on the streaming source.

Type: RecordFormat (p. 248) object

Required: Yes
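Putting the pieces together, a SourceSchema for JSON records might look like the following sketch; the row path, column names, and SQL types are hypothetical.

```python
# A hypothetical SourceSchema combining RecordFormat (p. 248),
# RecordEncoding, and RecordColumn (p. 247) objects.
source_schema = {
    "RecordFormat": {
        "RecordFormatType": "JSON",
        "MappingParameters": {
            "JSONMappingParameters": {"RecordRowPath": "$"}
        },
    },
    "RecordEncoding": "UTF-8",
    "RecordColumns": [
        {"Name": "TICKER_SYMBOL", "Mapping": "$.tickerSymbol", "SqlType": "VARCHAR(16)"},
        {"Name": "PRICE", "Mapping": "$.price", "SqlType": "REAL"},
    ],
}
```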

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following:

• AWS SDK for C++
• AWS SDK for Go
• AWS SDK for Java
• AWS SDK for Ruby V2


Document History for Amazon Kinesis Data Analytics

The following entries describe the important changes to the documentation since the last release of Amazon Kinesis Data Analytics.

• API version: 2015-08-14

• Latest documentation update: March 22, 2018

• March 22, 2018: AWS Lambda function examples in Java and .NET. Code samples for creating Lambda functions for preprocessing records and for application destinations. For more information, see Creating Lambda Functions for Preprocessing (p. 24) and Creating Lambda Functions for Application Destinations (p. 36).

• March 19, 2018: New HOTSPOTS function. Locate and return information about relatively dense regions in your data. For more information, see Example: Detecting Hotspots on a Stream (HOTSPOTS Function) (p. 108).

• December 20, 2017: Lambda function as a destination. Send analytics results to a Lambda function as a destination. For more information, see Using a Lambda Function as Output (p. 33).

• November 2, 2017: New RANDOM_CUT_FOREST_WITH_EXPLANATION function. Get an explanation of what fields contribute to an anomaly score in a data stream. For more information, see Example: Detecting Data Anomalies and Getting an Explanation (RANDOM_CUT_FOREST_WITH_EXPLANATION Function) (p. 104).

• October 6, 2017: Schema discovery on static data. Run schema discovery on static data stored in an Amazon S3 bucket. For more information, see Using the Schema Discovery Feature on Static Data (p. 18).

• October 6, 2017: Lambda preprocessing feature. Preprocess records in an input stream with AWS Lambda before analysis. For more information, see Preprocessing Data Using a Lambda Function (p. 20).

• September 13, 2017: Auto scaling applications. Automatically increase the data throughput of your application with auto scaling. For more information, see Automatically Scaling Applications to Increase Throughput (p. 42).

• June 29, 2017: Multiple in-application input streams. Increase application throughput with multiple in-application streams. For more information, see Parallelizing Input Streams for Increased Throughput (p. 27).

• April 7, 2017: Guide to using the AWS Management Console for Kinesis Data Analytics. Edit an inferred schema and SQL code using the schema editor and SQL editor in the Kinesis Data Analytics console. For more information, see Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 55).

• August 11, 2016: Public release of the Amazon Kinesis Data Analytics Developer Guide.

• January 29, 2016: Preview release of the Amazon Kinesis Data Analytics Developer Guide.


AWS Glossary

For the latest AWS terminology, see the AWS Glossary in the AWS General Reference.
