© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Finding a home for your data in your serverless app
Marcilio Mendonca
Sr. Solutions Developer, Amazon Web Services
SVS223-R1
Why are we here today?
1) To discuss popular AWS database options for serverless applications
2) To discuss best practices for interacting with various databases from a serverless application
Agenda
• Amazon Relational Database Service (Amazon RDS)
• Amazon Aurora
• Amazon Aurora Serverless and the Data API
• Amazon DynamoDB
• Amazon Simple Storage Service (Amazon S3)
• Other data store options
Amazon RDS
• Relational databases (SQL)
• Cost-efficient and resizable capacity
• Automates hardware provisioning, database setup, patching and backups
• Supports: MySQL, PostgreSQL, MariaDB, Oracle, MS SQL Server
Amazon Aurora
• Relational database (SQL)
• MySQL and PostgreSQL-compatible
• Automates hardware provisioning, database setup, patching and backups
• Security, availability and reliability of commercial databases at 1/10th the cost
Amazon Aurora Serverless
• Same as Amazon Aurora but with no need to manage DB servers
• Simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
• Supported by the Data API, which lets you run SQL statements through an API without managing connections
Amazon S3
• Object store service
• Massive scale
• Low latency, high throughput
• 99.999999999% durability, 99.99% availability
Amazon DynamoDB
• Key/value store
• Massive scale (horizontal)
• Schemaless (hash and sort keys)
• Serverless
• API-driven
• Multi-region (global tables)
Factors to consider when choosing a database technology
Must support SQL
• Amazon RDS, Amazon Aurora, Amazon Aurora Serverless, Amazon S3 (Athena, S3 Select)
Don’t want to change my existing SQL applications
• Amazon RDS, Amazon Aurora, Amazon Aurora Serverless
Need massive scale (hundreds of thousands of req/s) and/or a flexible schema
• Amazon DynamoDB, Amazon S3
Need to store and handle arbitrary objects (e.g., large media files)
• Amazon S3
Need to do event-driven data processing
• Amazon DynamoDB, Amazon S3
Don’t want to manage servers!
• Amazon Aurora Serverless, Amazon DynamoDB
I’m expecting infrequent, intermittent or unpredictable workload
• Amazon Aurora Serverless, Amazon DynamoDB
I don’t want to have to deal with database connections
• Amazon Aurora Serverless, Amazon DynamoDB
For further info on database options, please check:
FSI309 - Relational databases: Performance, scale and availability
DAT301 - Data modeling with Amazon DynamoDB in 60 minutes
DAT334 - Advanced design patterns for Amazon DynamoDB
DAT202 - What's new in Amazon Aurora
DAT355 - How to choose between Amazon Aurora MySQL and PostgreSQL
DAT309 - Amazon Aurora storage demystified: How it all works
DAT402 - Going deep on Amazon Aurora Serverless
DAT207 - What's new in Amazon RDS
DAT316 - MySQL options on AWS: Self-managed, managed and serverless
DAT317 - PostgreSQL options on AWS: Self-managed, managed and serverless
DAT336 - Process data using cloud databases and serverless technologies
Lambda Code: Initialization vs. Entry Point

import my_package
...

# Initialization code: executed once per execution environment provisioning
my_var_1 = "hello"
my_var_2 = "world"
...

# Execution entry point: executed once per request
def lambda_handler(event, context):
    try:
        ...  # Your code here
    except Exception as e:
        logger.error(f"Oops, something went wrong: {e}")
        raise e
...
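The split above can be observed locally without deploying anything. This sketch (pure Python, no AWS involved; the counters are illustrative) mimics how module-level code runs once while the handler runs per request:

```python
# A local simulation of the initialization-vs-entry-point split: module-level
# statements run once per "execution environment", the handler on every request.
init_count = 0
request_count = 0

# "Initialization code" - executed once, when the module is first loaded
init_count += 1

# "Execution entry point" - executed on every invocation
def lambda_handler(event, context):
    global request_count
    request_count += 1
    return {"init_count": init_count, "request_count": request_count}

# Three invocations reuse the same initialized module state
for _ in range(3):
    result = lambda_handler({}, None)

print(result)  # → {'init_count': 1, 'request_count': 3}
```

This is why expensive setup (database connections, SDK clients) belongs at module level rather than inside the handler.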
Amazon RDS: Connect to the database from AWS Lambda

import pymysql
...

# RDS settings
rds_host = "payrolldbinstance.c0zzsas12345.us-east-1.rds.amazonaws.com"
username = 'admin'
password = '12345678'
db_name = 'PayrollDB'
db_port = 3306
...

# Execution entry point
def lambda_handler(event, context):
    try:
        conn = pymysql.connect(rds_host, user=username,
                               passwd=password, db=db_name,
                               connect_timeout=5, port=db_port)
    except pymysql.MySQLError as e:
        logger.error(f"Could not connect to MySQL instance: {e}")
        raise e
    ...
Amazon RDS: No explicit password in your code!

import pymysql
...

# RDS settings
rds_host = "payrolldbinstance.c0zzsas12345.us-east-1.rds.amazonaws.com"
username = 'admin'
password = '12345678'  # <== NEVER DO THIS!
db_name = 'PayrollDB'
db_port = 3306
...

def lambda_handler(event, context):
    try:
        conn = pymysql.connect(rds_host, user=username,
                               passwd=password, db=db_name,
                               connect_timeout=5, port=db_port)
    except pymysql.MySQLError as e:
        logger.error(f"Could not connect to MySQL instance: {e}")
        raise e
    ...
Amazon RDS: Reuse the DB connections

...
try:
    conn = pymysql.connect(db_endpoint, user=username,
                           passwd=password, db=db_name,
                           connect_timeout=5, port=db_port)
except pymysql.MySQLError as e:
    logger.error(f"Could not connect to MySQL instance: {e}")
    raise e

# conn is created at initialization time and reused across invocations
def lambda_handler(event, context):
    with conn.cursor() as cur:
        ...
Amazon RDS: Consider using "lazy instantiation"

# Module-level database connection object
conn = None

# Returns the existing database connection, or creates one if needed
def get_database_connection():
    global conn
    if conn is None:
        conn = pymysql.connect(...)
    return conn

def input_is_valid(event):
    ...

def lambda_handler(event, context):
    if input_is_valid(event):
        conn = get_database_connection()
        ...
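The payoff of lazy instantiation can be checked locally. In this sketch (pure Python; `fake_connect` is a stand-in for `pymysql.connect`, not a real driver call), requests that fail validation never pay the cost of opening a connection:

```python
conn = None
connect_calls = 0

def fake_connect():
    # Stand-in for pymysql.connect(...); counts how often it actually runs
    global connect_calls
    connect_calls += 1
    return {"connected": True}

def get_database_connection():
    # Lazy instantiation: create the connection only on first use
    global conn
    if conn is None:
        conn = fake_connect()
    return conn

def lambda_handler(event, context):
    if not event.get("valid"):
        return "rejected"              # invalid input: no connection needed
    get_database_connection()
    return "ok"

assert lambda_handler({"valid": False}, None) == "rejected"
assert connect_calls == 0              # rejected request never connected
lambda_handler({"valid": True}, None)
lambda_handler({"valid": True}, None)
assert connect_calls == 1              # created once, then reused
```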
Amazon RDS: Consider using "singleton" connections

import functools

def singleton(func):
    singleton_obj = None
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal singleton_obj
        if singleton_obj is None:
            singleton_obj = func()
        return singleton_obj
    return wrapper

@singleton
def get_database_connection():
    conn = pymysql.connect(...)
    return conn

def input_is_valid(event):
    ...

def lambda_handler(event, context):
    if input_is_valid(event):
        conn = get_database_connection()
        ...
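The singleton decorator is plain Python, so it can be exercised without a database. Here the decorated factory returns a stand-in object instead of a real connection (the counter is mine, for illustration):

```python
import functools

def singleton(func):
    # Caches the first result of func() and returns it on every later call
    singleton_obj = None
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal singleton_obj
        if singleton_obj is None:
            singleton_obj = func()
        return singleton_obj
    return wrapper

calls = 0

@singleton
def get_database_connection():
    # Stand-in for pymysql.connect(...); counts how often it is really built
    global calls
    calls += 1
    return {"conn": "fake"}

a = get_database_connection()
b = get_database_connection()
assert a is b      # the same object comes back on every call
assert calls == 1  # the underlying factory ran exactly once
```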
Amazon RDS: Don't hardcode database parameters

import os
...

db_endpoint = os.environ['DB_ENDPOINT']
db_port = int(os.environ['DB_PORT'])
db_name = os.environ['DB_NAME']
db_employee_table = os.environ['DB_EMPLOYEE_TABLE']
db_username = os.environ['DB_USERNAME']
db_password = os.environ['DB_PASSWORD']

try:
    conn = pymysql.connect(db_endpoint, user=db_username,
                           passwd=db_password, db=db_name,
                           connect_timeout=5, port=db_port)
except pymysql.MySQLError as e:
    logger.error(f"Could not connect to MySQL instance: {e}")
    raise e

def lambda_handler(event, context):
    with conn.cursor() as cur:
        ...
Amazon RDS: Store DB credentials on AWS Secrets Manager

import boto3
...

# Secrets Manager settings
secret_name = "ExampleDBCredentials"  # You might also fetch this from an env. variable
secrets_client = boto3.client('secretsmanager')

# Get DB credentials from AWS Secrets Manager
try:
    secret_response = secrets_client.get_secret_value(SecretId=secret_name)
except Exception as e:
    raise e
else:
    secret_string = json.loads(secret_response["SecretString"])
    db_username = secret_string["user"]
    db_password = secret_string["password"]

# Initialize the database connection (runs only once when the Lambda env is provisioned)
try:
    conn = pymysql.connect(rds_host, user=db_username,
                           passwd=db_password, db=db_name,
                           connect_timeout=5, port=db_port)
    ...
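The shape of the `GetSecretValue` response can be illustrated locally with a sample payload (no call to Secrets Manager is made here; the credential values are placeholders):

```python
import json

# SecretString arrives as a JSON-encoded string inside the response dict
secret_response = {
    "SecretString": '{"user": "payroll_app", "password": "xxxxxxxx"}'
}

# Decode it once at initialization time, exactly as in the slide above
secret_string = json.loads(secret_response["SecretString"])
db_username = secret_string["user"]
db_password = secret_string["password"]

print(db_username)  # → payroll_app
```

Storing the secret as JSON with these key names is what makes the `json.loads` + key lookup pattern work.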
Amazon RDS: Running DDL and DML statements

def lambda_handler(event, context):
    with conn.cursor() as cur:
        try:
            # Create the Employee table
            cur.execute(f"CREATE TABLE IF NOT EXISTS {db_employee_table} "
                        "(EmpID int NOT NULL, Name varchar(255) NOT NULL, "
                        "PRIMARY KEY (EmpID))")
            # Insert employee data
            emp_id = random.randint(1, 100000)
            cur.execute(f"INSERT INTO {db_employee_table} (EmpID, Name) VALUES (%s, %s)",
                        (emp_id, f"Employee-{emp_id}"))
        except:
            conn.rollback()
            raise
        else:
            conn.commit()
            # Traverse all employees
            cur.execute(f"SELECT * FROM {db_employee_table}")
            rows = cur.fetchall()
            return format(rows)
Amazon Aurora: Connect to the database cluster
• You can connect to an Aurora DB cluster using the same tools that you use to connect to a MySQL or PostgreSQL database
• There are writer and reader (1+ read replicas) endpoints
Amazon Aurora: Use the Writer endpoint

...
db_w_endpoint = os.environ['DB_W_ENDPOINT']
...
try:
    wconn = pymysql.connect(db_w_endpoint, user=db_username,
                            passwd=db_password, db=db_name,
                            connect_timeout=5, port=db_port)
except pymysql.MySQLError as e:
    logger.error(f"Could not connect to MySQL instance: {e}")
    raise e

def lambda_handler(event, context):
    with wconn.cursor() as wcur:
        try:
            # Create the Employee table
            wcur.execute(f"CREATE TABLE IF NOT EXISTS {db_employee_table} "
                         "(EmpID int NOT NULL, Name varchar(255) NOT NULL, "
                         "PRIMARY KEY (EmpID))")
            # Insert employee data
            emp_id = random.randint(1, 100000)
            wcur.execute(f"INSERT INTO {db_employee_table} (EmpID, Name) VALUES (%s, %s)",
                         (emp_id, f"Employee-{emp_id}"))
        except:
            wconn.rollback()
            raise
        else:
            wconn.commit()
Amazon Aurora: Use the Reader endpoint

...
db_r_endpoint = os.environ['DB_R_ENDPOINT']
...
try:
    rconn = pymysql.connect(db_r_endpoint, user=db_username,
                            passwd=db_password, db=db_name,
                            connect_timeout=5, port=db_port)
except pymysql.MySQLError as e:
    logger.error(f"Could not connect to MySQL instance: {e}")
    raise e

def lambda_handler(event, context):
    with rconn.cursor() as rcur:
        try:
            # Traverse all employees
            rcur.execute(f"SELECT * FROM {db_employee_table}")
            rows = rcur.fetchall()
            return format(rows)
        except:
            raise
The Data API for Aurora Serverless
Interact with an Aurora Serverless cluster using an API!
https://aws.amazon.com/blogs/database/using-the-data-api-to-interact-with-an-amazon-aurora-serverless-mysql-database/
Amazon Aurora Serverless and the Data API
• Use the AWS CLI
• Or the various AWS SDKs: Node.js, Java, Python, PHP, Go, JavaScript, .NET, Ruby, C++
Amazon Aurora Serverless and the Data API: Python SDK (boto3)

import boto3

rds_client = boto3.client('rds-data')

cluster_arn = 'arn:aws:rds:us-east-1:123456789012:cluster:mydbcluster'
secret_arn = 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mysecret'

response = rds_client.execute_statement(
    resourceArn=cluster_arn,
    secretArn=secret_arn,
    database='mydb',
    sql='select * from employees limit 3')

print(response['records'])
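The Data API returns each row as a list of typed field dicts (e.g., `{'stringValue': 'Python'}`) rather than plain values. A small helper can flatten them; the helper name and scope are mine, not part of boto3, and this sketch assumes one typed key per field (NULL columns with `{'isNull': True}` and array types would need extra handling):

```python
def unwrap_records(records):
    # Flatten Data API records into plain Python values by taking the
    # single typed value out of each field dict.
    return [[next(iter(field.values())) for field in row] for row in records]

# Sample payload mimicking the shape of response['records']
sample = [
    [{"longValue": 1}, {"stringValue": "Python"}],
    [{"longValue": 2}, {"stringValue": "Go"}],
]
print(unwrap_records(sample))  # → [[1, 'Python'], [2, 'Go']]
```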
Amazon Aurora Serverless and the Data API: Wrap execute_statement()

def execute_statement(sql, sql_parameters=[], transaction_id=None):
    parameters = {
        'secretArn': db_credentials_secrets_store_arn,
        'database': database_name,
        'resourceArn': db_cluster_arn,
        'sql': sql,
        'parameters': sql_parameters
    }
    if transaction_id is not None:
        parameters['transactionId'] = transaction_id
    response = rds_client.execute_statement(**parameters)
    return response

response = execute_statement('select * from package')
Amazon Aurora Serverless and the Data API: Parameterized statements (prevent SQL injection attacks)

# Query data
sql = 'select * from package where package_name=:package_name'
package_name = 'Python'
sql_parameters = [{'name': 'package_name', 'value': {'stringValue': f'{package_name}'}}]
response = execute_statement(sql, sql_parameters)
print(response['records'])

# Insert data
sql = 'insert into package (package_name, package_version) values (:package_name, :package_version)'
sql_parameters = [
    {'name': 'package_name', 'value': {'stringValue': 'Python'}},
    {'name': 'package_version', 'value': {'stringValue': '3.7.0'}}
]
response = execute_statement(sql, sql_parameters)
print(f'Number of records updated: {response["numberOfRecordsUpdated"]}')
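Hand-writing those parameter entries gets verbose. A helper can build them from a plain dict; this is my own sketch, not part of boto3, and it only covers strings, ints, and bools (bool is tested before int because `bool` is a subclass of `int` in Python):

```python
def to_sql_parameters(params):
    # Build Data API parameter entries from a {name: value} dict
    entries = []
    for name, value in params.items():
        if isinstance(value, bool):
            field = {"booleanValue": value}
        elif isinstance(value, int):
            field = {"longValue": value}
        else:
            field = {"stringValue": str(value)}
        entries.append({"name": name, "value": field})
    return entries

print(to_sql_parameters({"package_name": "Python"}))
# → [{'name': 'package_name', 'value': {'stringValue': 'Python'}}]
```

The result can be passed straight to the `execute_statement` wrapper as `sql_parameters`.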
Amazon Aurora Serverless and the Data API: Wrap batch_execute_statement()

def batch_execute_statement(sql, sql_parameter_sets, transaction_id=None):
    parameters = {
        'secretArn': db_credentials_secrets_store_arn,
        'database': database_name,
        'resourceArn': db_cluster_arn,
        'sql': sql,
        'parameterSets': sql_parameter_sets
    }
    # Optional transaction support, mirroring the execute_statement() wrapper
    if transaction_id is not None:
        parameters['transactionId'] = transaction_id
    response = rds_client.batch_execute_statement(**parameters)
    return response
Amazon Aurora Serverless and the Data API: Batching inserts

sql = 'insert into package (package_name, package_version) values (:package_name, :package_version)'
sql_parameter_sets = []
for i in range(1, 101):
    entry = [
        {'name': 'package_name', 'value': {'stringValue': f'package{i}'}},
        {'name': 'package_version', 'value': {'stringValue': 'v1.0'}}
    ]
    sql_parameter_sets.append(entry)
response = batch_execute_statement(sql, sql_parameter_sets)
print(f'Number of records updated: {len(response["updateResults"])}')
Amazon Aurora Serverless and the Data API: Handling transactions

transaction = rds_client.begin_transaction(
    secretArn=db_credentials_secrets_store_arn,
    resourceArn=db_cluster_arn,
    database=database_name)
try:
    sql = 'insert into package (package_name, package_version) values (:package_name, :package_version)'
    sql_parameter_sets = []
    for i in range(package_start_idx, package_end_idx):
        entry = [
            {'name': 'package_name', 'value': {'stringValue': f'package-{i}'}},
            {'name': 'package_version', 'value': {'stringValue': 'version-1'}}
        ]
        sql_parameter_sets.append(entry)
    response = batch_execute_statement(sql, sql_parameter_sets, transaction['transactionId'])
except Exception as e:
    transaction_response = rds_client.rollback_transaction(
        secretArn=db_credentials_secrets_store_arn,
        resourceArn=db_cluster_arn,
        transactionId=transaction['transactionId'])
    raise
else:
    transaction_response = rds_client.commit_transaction(
        secretArn=db_credentials_secrets_store_arn,
        resourceArn=db_cluster_arn,
        transactionId=transaction['transactionId'])
Amazon DynamoDB: Create table

import boto3
...

# DynamoDB high-level client
ddb_client = boto3.resource('dynamodb')
# Read table name from environment
ddb_payroll_tablename = os.environ['DDB_TABLE_NAME']

def create_payroll_table():
    logger.info(f'Creating table {ddb_payroll_tablename}...')
    table = ddb_client.create_table(
        TableName=ddb_payroll_tablename,
        KeySchema=[{'AttributeName': 'emp_id', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'emp_id', 'AttributeType': 'S'}],
        ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
    )
    # Wait until the table exists
    table.meta.client.get_waiter('table_exists').wait(TableName=ddb_payroll_tablename)
    return table.creation_date_time

def lambda_handler(event, context):
    create_payroll_table()
    ...
Amazon DynamoDB: Put and get item

...
# Payroll table object
ddb_payroll_table = ddb_client.Table(ddb_payroll_tablename)

def add_employee(emp_id, first_name, last_name):
    logger.info(f'Adding employee {emp_id} to table {ddb_payroll_tablename}...')
    ddb_payroll_table.put_item(
        Item={'emp_id': emp_id, 'first_name': first_name, 'last_name': last_name})

def get_employee(emp_id):
    logger.info(f'Searching employee {emp_id} on table {ddb_payroll_tablename}...')
    response = ddb_payroll_table.get_item(Key={'emp_id': emp_id})
    return response['Item']

def lambda_handler(event, context):
    add_employee('12345678', 'John', 'Smith')
    print(get_employee('12345678'))
Amazon DynamoDB and other DBs: Tracing with AWS X-Ray

import boto3
...
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Only instrument libraries if not running locally
if "AWS_SAM_LOCAL" not in os.environ:
    patch_all()

...
ddb_client = boto3.resource('dynamodb', endpoint_url=ddb_endpoint_url)
...

def lambda_handler(event, context):
    ...  # Your code here

Example: Tracing DynamoDB PutItem calls (sequential vs. batch)
Amazon DynamoDB: Running locally
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html

# Start DynamoDB locally
$ docker run -p 8000:8000 amazon/dynamodb-local

# Env variable we set and use in Lambda code when developing locally
$ export DDB_ENDPOINT_URL="http://localhost:8000"                 # if running as a Python script
$ export DDB_ENDPOINT_URL="http://docker.for.mac.localhost:8000"  # sam local invoke on a Mac

# Read table name from environment
ddb_payroll_tablename = os.environ['DDB_TABLE_NAME']
# Set the endpoint for local testing (e.g., http://localhost:8000), unset for production
ddb_endpoint_url = os.getenv('DDB_ENDPOINT_URL', None)
# DynamoDB high-level client (for the low-level client, use boto3.client('dynamodb') instead)
ddb_client = boto3.resource('dynamodb', endpoint_url=ddb_endpoint_url)
# Payroll table object
ddb_payroll_table = ddb_client.Table(ddb_payroll_tablename)
...
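The endpoint-selection logic itself can be tested without boto3 or Docker. This sketch (function name is mine) isolates the rule: when `DDB_ENDPOINT_URL` is set, the client targets DynamoDB local; when it is unset, `endpoint_url` is `None` and boto3 falls back to the real service:

```python
def resolve_ddb_endpoint(environ):
    # Mirrors os.getenv('DDB_ENDPOINT_URL', None) from the slide above,
    # but takes the environment as a dict so it can be tested in isolation.
    return environ.get("DDB_ENDPOINT_URL", None)

# Local development: environment variable set
assert resolve_ddb_endpoint({"DDB_ENDPOINT_URL": "http://localhost:8000"}) == "http://localhost:8000"
# Production: variable unset, boto3 uses the regional DynamoDB endpoint
assert resolve_ddb_endpoint({}) is None
```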
Amazon S3: Real-time image resizing example
Amazon S3: Configure Amazon S3 events using AWS SAM

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  CreateThumbnail:
    Type: AWS::Serverless::Function
    Properties:
      Handler: example-s3-resize-image.lambda_handler
      Runtime: python3.6
      Timeout: 60
      Policies: AWSLambdaExecute
      Events:
        ResizeImageEvent:
          Type: S3
          Properties:
            Bucket: !Ref SrcBucket
            Events: s3:ObjectCreated:*
Amazon S3: Resize images stored in Amazon S3

import uuid
import boto3
from urllib.parse import unquote_plus
from PIL import Image

s3_client = boto3.client('s3')

def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        upload_path = '/tmp/resized-{}'.format(key)
        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, '{}resized'.format(bucket), key)
https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example-deployment-pkg.html
AWS Bookstore Demo App
A simple bookstore serverless application making use of various purpose-built databases on AWS
• https://github.com/aws-samples/aws-bookstore-demo-app
re:Invent 2018: Databases on AWS: The Right Tool for the Right Job
• https://www.youtube.com/watch?v=-pb-DkD6cWg
• https://aws.amazon.com/blogs/database/building-a-modern-application-with-purpose-built-aws-databases/
I’d like to invite you to join me in one of these hands-on sessions:
SVS333-R - Build serverless APIs supported by Amazon Aurora Serverless and the Data API
• Wednesday, Dec 4, 4:00 PM - 5:00 PM – Mirage, Events Center C1 - Table 6
• Thursday, Dec 5, 3:15 PM - 4:15 PM – Aria, Level 1 West, Bristlecone 4 - Table 10
Learn serverless with AWS Training and Certification
Resources created by the experts at AWS to help you learn modern application development

Free, on-demand courses on serverless, including:
• Introduction to Serverless Development
• Getting into the Serverless Mindset
• AWS Lambda Foundations
• Amazon API Gateway for Serverless Applications
• Amazon DynamoDB for Serverless Architectures

Additional digital and classroom trainings cover modern application development and computing.

Visit the Learning Library at https://aws.training
Thank you!
Marcilio Mendonca
Amazon RDS: Create secret on AWS Secrets Manager
• Create a DB user/pass for your application in the database (e.g., MySQL)
• Add the DB user credentials to Secrets Manager
aws secretsmanager create-secret --name Apps/Payroll \
    --secret-string '{"user": "payroll_app", "password": "xxxxxxxx"}'
Amazon RDS: IAM database authentication
https://aws.amazon.com/premiumsupport/knowledge-center/users-connect-rds-iam/

Check the DB engines that support IAM database authentication
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["rds-db:connect"],
            "Resource": [
                "arn:aws:rds-db:us-east-1:1234567890:dbuser:db-ABCDEFGHIJKL01234/PayrollApp"
            ]
        }
    ]
}
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html#UsingWithRDS.IAMDBAuth.Availability
# RDS settings fetched from environment variables
db_endpoint = os.environ['DB_ENDPOINT']
db_port = int(os.environ['DB_PORT'])
db_name = os.environ['DB_NAME']
db_username = os.environ['DB_USERNAME']
db_employee_table = os.environ['DB_EMPLOYEE_TABLE']

# Use IAM credentials to connect to the MySQL database
rds_client = boto3.client('rds')
db_token = rds_client.generate_db_auth_token(db_endpoint, db_port, db_username)
try:
    conn = pymysql.connect(db_endpoint, user=db_username, passwd=db_token,
                           db=db_name, connect_timeout=5, port=db_port)
except pymysql.MySQLError as e:
    logger.error(f"Could not connect to MySQL instance: {e}")
    raise e
Amazon RDS: IAM database authentication
• AWS recommends the following when using the MySQL engine:
• Use IAM database authentication as a mechanism for temporary, personal access to databases
• Use IAM database authentication only for workloads that can be easily retried
• Don't use IAM database authentication if your application requires more than 256 new connections per second
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.html#UsingWithRDS.IAMDBAuth.Availability
Amazon Aurora: Set up user/pass upon cluster provisioning
E.g., PostgreSQL quick start (AWS CloudFormation):
https://github.com/aws-samples/aws-aurora-cloudformation-samples/blob/master/cftemplates/Aurora-Postgres-DB-Cluster.yml
Amazon Aurora: Single endpoint for writers and readers?
Amazon DynamoDB: Use AWS SAM to create simple tables
https://github.com/awslabs/serverless-application-model
https://aws.amazon.com/serverless/sam/
DynamoDBEmployeeTable:
  Type: AWS::Serverless::SimpleTable
  Properties:
    TableName: Employee
    PrimaryKey:
      Name: emp_id
      Type: String
    ProvisionedThroughput:
      ReadCapacityUnits: 5
      WriteCapacityUnits: 5
    Tags:
      Department: Engineering
      AppType: Serverless
    SSESpecification:
      SSEEnabled: true
Other database options for your serverless application
Amazon ElastiCache
• Managed, Redis or Memcached-compatible in-memory data store
• Serverless apps can cache data for faster lookups (e.g., online auctions)
Amazon Redshift
• Fully managed, petabyte-scale data warehouse service in the cloud
• Serverless apps can be used to populate data into Amazon Redshift clusters and to query and aggregate data into reports
Amazon Elasticsearch Service (Amazon ES)
• Fully managed service to deploy, secure, and operate Amazon ES at scale
• Serverless apps can be used to ingest data into Amazon ES clusters (e.g., from Amazon S3)
Amazon Neptune
• Fast, reliable, fully managed graph database service
• Build serverless applications that make use of highly connected datasets