Open-source Infrastructure at Lyft Constance Caramanolis Daniel Hochman July 2017

Agenda

Overview of Lyft Architecture

Open-source Infrastructure Projects
- Confidant
- Discovery
- Ratelimit
- Envoy

Q&A

Architecture (simplified)

[Diagram: Front Envoy, Application, Envoy, Discovery, Confidant, Ratelimit, >100 Clusters]

lyft / confidant

Your secret keeper. Stores secrets in DynamoDB, encrypted at rest.

- Language: Python
- GitHub stars: 1,105
- Contributors: 12
- Open-sourced: November 2015

How is a service configured?

environment.yaml in the private lyft/location-service repository:

common:
  PORT: 8080
  TIMEOUT_MS: 15000
development:
  USE_AUTH: False
staging:
  API_KEY: secret_key_igjq3i494fqq234qbc
production:
  API_KEY: secret_key_ojajf823jj49ij8h
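The file mixes shared settings with per-environment overrides, and the staging and production API keys sit in the repository in plaintext. Below is a minimal sketch of how such a file might be resolved for one environment. It assumes environment-specific values override `common`. The merge rule and the `settings_for` helper are hypothetical and are not Lyft's loader.

```python
# environment.yaml from the slide, written as the dict a YAML parser would return.
ENVIRONMENT_YAML = {
    "common": {"PORT": 8080, "TIMEOUT_MS": 15000},
    "development": {"USE_AUTH": False},
    "staging": {"API_KEY": "secret_key_igjq3i494fqq234qbc"},
    "production": {"API_KEY": "secret_key_ojajf823jj49ij8h"},
}


def settings_for(env, config=ENVIRONMENT_YAML):
    """Hypothetical helper: start from 'common', then apply the env section."""
    merged = dict(config.get("common", {}))
    merged.update(config.get(env, {}))
    return merged


prod = settings_for("production")
# Anyone who can read the repository can read this production secret.
print(prod["API_KEY"])
```

Whatever the exact loader does, the secret values travel with the code. Confidant is meant to solve that problem.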

Confidant to the rescue!

Service: location-service
Credential pair: api_key: password123

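Behind the scenes

[Diagram: Application on an EC2 Instance with an IAM Role; Confidant, KMS, DynamoDB; Credential api_key: password123]

The application reads the credential from an environment variable:

import os

api_key = os.getenv('CREDENTIAL_API_KEY')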
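The slide pairs the credential key `api_key` with the environment variable `CREDENTIAL_API_KEY`. Below is a minimal sketch of that naming step. It assumes the decrypted pairs are exported as `CREDENTIAL_<KEY>` before the application starts. `credential_env_vars` is a hypothetical helper, and the fetch from Confidant (KMS auth, DynamoDB) is left out.

```python
import os


def credential_env_vars(pairs, prefix="CREDENTIAL_"):
    """Map credential key/value pairs to environment variable names.

    Hypothetical helper: the CREDENTIAL_<KEY> rule is inferred from the slide
    (api_key -> CREDENTIAL_API_KEY), not taken from Confidant's source.
    """
    return {prefix + key.upper(): value for key, value in pairs.items()}


# Stand-in for a bootstrap step that exports the decrypted pairs
# before the application process starts.
os.environ.update(credential_env_vars({"api_key": "password123"}))

# The application side, exactly as on the slide.
api_key = os.getenv('CREDENTIAL_API_KEY')
print(api_key)  # password123
```

Because of this split, the application code never talks to Confidant directly. It only sees ordinary environment variables.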

Server-blind secrets

Highly sensitive secrets are encrypted and decrypted by end users. Confidant stores them but cannot read them.

[Diagram: Confidant, KMS, IAM Role, EC2 Instance]

lyft / discovery

Provides a REST interface for querying the list of hosts that belong to a microservice.

- Language: Python
- GitHub stars: 54
- Contributors: 6
- Open-sourced: August 2016

Tracking hosts

POST /v1/registration/location-service

{
  "ip": "10.0.0.1",
  "port": 80,
  "revision": "da08f35b",
  "tags": {
    "id": "i-910203",
    "az": "us-east-1a",
    "canary": true
  }
}

Registration is repeated from cron: * * * * * (every minute).
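Below is a minimal sketch of building the registration call with the standard library. The discovery URL is hypothetical. The JSON body mirrors the slide's payload, but the encoding is an assumption, because the real service may expect form fields instead. The request is built and not sent.

```python
import json
import urllib.request

# Hypothetical address of a discovery instance; not from the slides.
DISCOVERY_URL = "http://discovery.example.internal"


def build_registration(service, ip, port, revision, tags):
    """Build, without sending, the POST /v1/registration/<service> request.

    Assumes a JSON body shaped like the slide's example payload.
    """
    body = {"ip": ip, "port": port, "revision": revision, "tags": tags}
    return urllib.request.Request(
        f"{DISCOVERY_URL}/v1/registration/{service}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_registration(
    "location-service", "10.0.0.1", 80, "da08f35b",
    {"id": "i-910203", "az": "us-east-1a", "canary": True},
)
# Sending it would be urllib.request.urlopen(req), e.g. from a per-minute cron job.
print(req.get_method(), req.full_url)
```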

Storage

- Hosts are stored in DynamoDB
- Storage support is abstract, so other backends can be plugged in
- Hosts are removed if they have not reported since now - HOST_TTL
- The ecosystem is designed to tolerate eventual consistency, unlike ZooKeeper, etcd, and Consul, which favor strong consistency
- Paired with active healthchecks
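The TTL rule can be sketched with a toy in-memory store in place of DynamoDB. This version filters stale hosts at read time rather than deleting them. The `HOST_TTL` value and the `InMemoryRegistry` class are hypothetical.

```python
import time

# Hypothetical TTL in seconds; discovery's actual HOST_TTL setting may differ.
HOST_TTL = 600


class InMemoryRegistry:
    """Toy stand-in for discovery's abstract storage (DynamoDB in production)."""

    def __init__(self, ttl=HOST_TTL):
        self.ttl = ttl
        self.hosts = {}  # (service, ip, port) -> host record with last_check_in

    def register(self, service, host, now=None):
        now = time.time() if now is None else now
        # Re-registering the same ip:port just refreshes its timestamp.
        self.hosts[(service, host["ip"], host["port"])] = dict(host, last_check_in=now)

    def get_hosts(self, service, now=None):
        now = time.time() if now is None else now
        cutoff = now - self.ttl
        # Hosts whose last report is older than now - HOST_TTL are left out.
        return [
            {k: v for k, v in record.items() if k != "last_check_in"}
            for (svc, _, _), record in self.hosts.items()
            if svc == service and record["last_check_in"] >= cutoff
        ]


registry = InMemoryRegistry(ttl=600)
registry.register("roads", {"ip": "10.0.0.1", "port": 80}, now=0)
registry.register("roads", {"ip": "10.0.0.2", "port": 80}, now=500)
print(registry.get_hosts("roads", now=700))  # [{'ip': '10.0.0.2', 'port': 80}]
```

The registry never coordinates with other writers. A host that stops reporting simply ages out, so the system tolerates eventual consistency instead of requiring a consensus store.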

Fetching hosts

GET /v1/registration/<service>

{
  "hosts": [
    {
      "ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}
    },
    ...
    {
      "ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-121286", "az": "us-east-1d"}
    }
  ]
}
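On the consuming side, a caller only needs the address list plus whatever tags it cares about. Below is a minimal sketch that parses a response of the shape above, with the elided `...` entries left out of the sample. Excluding canary hosts is a hypothetical use of the `canary` tag and is not something the slides describe.

```python
import json

# Response body in the shape shown above (the elided hosts are omitted).
response_body = """
{"hosts": [
  {"ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}},
  {"ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-121286", "az": "us-east-1d"}}
]}
"""


def endpoints(body, include_canary=True):
    """Return (ip, port) pairs; canary hosts can be excluded via their tag."""
    hosts = json.loads(body)["hosts"]
    return [
        (h["ip"], h["port"])
        for h in hosts
        if include_canary or not h.get("tags", {}).get("canary", False)
    ]


print(endpoints(response_body))                        # [('10.0.0.1', 80), ('10.0.0.2', 80)]
print(endpoints(response_body, include_canary=False))  # [('10.0.0.2', 80)]
```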

Envoy per-service configuration

Services list the hosts they want to talk to!

location-service/envoy.yaml:

internal_hosts:
  - jobscheduler
  - roads
external_hosts:
  - dynamodb_iad
  - kinesis_iad

[Diagram: location-service/envoy.yaml → /etc/envoy.conf (on the box)]
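The per-service lists mean a host's Envoy only needs entries for the clusters its service actually calls. Below is a minimal sketch of that expansion step. It assumes internal hosts are resolved through Discovery and treats external ones separately. The output fields `name` and `source` are hypothetical placeholders and are not Envoy's configuration schema.

```python
# location-service/envoy.yaml from the slide, as a parsed dict.
SERVICE_CONFIG = {
    "internal_hosts": ["jobscheduler", "roads"],
    "external_hosts": ["dynamodb_iad", "kinesis_iad"],
}


def upstream_clusters(config):
    """Expand the per-service lists into one entry per upstream cluster.

    The "name"/"source" fields are hypothetical placeholders, not Envoy's
    configuration schema.
    """
    internal = [{"name": n, "source": "discovery"} for n in config.get("internal_hosts", [])]
    external = [{"name": n, "source": "external"} for n in config.get("external_hosts", [])]
    return internal + external


for cluster in upstream_clusters(SERVICE_CONFIG):
    print(cluster["name"], cluster["source"])
```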

Active Healthcheck

[Diagram: location-service (Application + Envoy), Discovery, and the jobscheduler and roads clusters (Application + Envoy); Envoy sends GET /healthcheck requests]

Every host healthchecks every host in a destination cluster.
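Because every source host checks every host in each destination cluster, healthcheck traffic grows with the product of the two sizes. Below is a back-of-the-envelope helper. The host counts in the example are hypothetical.

```python
def healthcheck_requests_per_interval(source_hosts, destination_cluster_sizes):
    """Active healthchecks sent per interval when every source host checks
    every host in each destination cluster it talks to."""
    return source_hosts * sum(destination_cluster_sizes)


# Hypothetical sizes: 10 location-service hosts, 4 jobscheduler and 6 roads hosts.
print(healthcheck_requests_per_interval(10, [4, 6]))  # 100
```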

lyft / ratelimit

Go/gRPC service designed to enable generic rate limit scenarios.

- Language: Go
- GitHub stars: 224
- Contributors: 6
- Open-sourced: January 2017

Why rate limit?

- Control flow
- Protect against attacks
- Bad actors
- Accidents happen (oops!)

Rate Limit Service

- Written in Go
- Enables generic rate limit scenarios
- Decisions based on a domain and a set of descriptors
- Settings configured at runtime
- Backed by Redis

[Diagram: a "?" query to Ratelimit, which issues INCR against Redis]

Domains and descriptors

Domain
- Defines a container for a set of rate limits
- Globally unique
- e.g. "envoy_front"

Descriptors
- Ordered list of key/value pairs
- Case sensitive
- e.g. ("destination_cluster", "location-service"), ("user_id", "1234")

Limit definition

A runtime setting that defines the number of requests allowed per time unit for a descriptor.

Request flow example

Definition:

domain: test_domain
descriptors:
  - key: user_id
    rate_limit:
      unit: hour
      requests_per_unit: 1

Rq1: ("user_id", "1234")
Redis state: user_id_1234: 1
Rs1: RateLimitResponse_OK

Rq2: ("user_id", "9876")
Redis state: user_id_1234: 1, user_id_9876: 1
Rs2: RateLimitResponse_OK

Rq3: ("user_id", "1234")
Redis state: user_id_1234: 2, user_id_9876: 1
Rs3: RateLimitResponse_OVER_LIMIT

Rq3 is the second request for user_id 1234 within the hour, which exceeds requests_per_unit: 1.
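The flow above is a fixed-window counter: one counter per descriptor and time window, incremented on each request and compared against the limit. Below is a minimal in-memory sketch that reproduces Rq1 to Rq3. It simplifies several things. Each descriptor has a single entry. A dict stands in for Redis `INCR`, and the real service's key expiry is omitted. The cache-key format is hypothetical. Descriptors without a configured limit are treated as allowed.

```python
import time
from collections import defaultdict

# Length of each supported unit in seconds.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class FixedWindowLimiter:
    """In-memory stand-in for the Redis counters in the request flow."""

    def __init__(self, domain, limits):
        self.domain = domain
        self.limits = limits  # descriptor key -> (unit, requests_per_unit)
        self.counters = defaultdict(int)  # plays the role of Redis

    def should_rate_limit(self, entry, now=None):
        key, value = entry
        if key not in self.limits:
            return "OK"  # no configured limit for this descriptor
        unit, requests_per_unit = self.limits[key]
        now = time.time() if now is None else now
        window = UNIT_SECONDS[unit]
        window_start = int(now) // window * window
        # One counter per domain, descriptor entry, and time window (hypothetical format).
        cache_key = f"{self.domain}_{key}_{value}_{window_start}"
        self.counters[cache_key] += 1  # the Redis INCR step
        if self.counters[cache_key] > requests_per_unit:
            return "OVER_LIMIT"
        return "OK"


limiter = FixedWindowLimiter("test_domain", {"user_id": ("hour", 1)})
requests = [("user_id", "1234"), ("user_id", "9876"), ("user_id", "1234")]
results = [limiter.should_rate_limit(r, now=0) for r in requests]
print(results)  # ['OK', 'OK', 'OVER_LIMIT']
```

Putting the window start into the key means a new hour starts a fresh counter automatically. A Redis-backed version would also set an expiry so that old windows do not accumulate.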

Ratelimit Client

from lyft_idl.client.ratelimit.ratelimit_client import RateLimitClient

# `settings` is assumed to be Lyft's internal settings object (not shown).
ratelimit_client = RateLimitClient(settings.LYFT_API_USER_AGENT)


# Determines whether or not to limit jsonp_messages_post according to the ratelimit service.
def should_allow_jsonp_messages_post(ip_address, phone_number):
    domain = settings.get('RATE_LIMIT_DOMAIN')
    # Each descriptor is a tuple of (key, value) entries; one entry each here.
    ip_descriptors = [(('jsonp_messages_post_from_ip_address', ip_address), )]
    phone_descriptors = [(('jsonp_messages_post_from_phone_number', phone_number), )]
    # Allow only if both checks pass; `and` short-circuits, so the phone
    # number is not checked when the IP address is already over its limit.
    return (
        ratelimit_client.is_request_allowed(domain, ip_descriptors) and
        ratelimit_client.is_request_allowed(domain, phone_descriptors)
    )

lyft / envoy

Front/service L7 proxy

- Language: C++
- GitHub stars: 1,924
- Contributors: 62
- Open-sourced: September 2016

Why Envoy?

Service Oriented Architecture
- Many languages and frameworks
- Protocols (HTTP/1, HTTP/2, databases, caching, etc.)
- Partial implementation of SOA best practices (retries, timeouts, …)
- Observability
- Load balancers (AWS, F5)

What is Envoy?

The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.

What is Envoy?

- Modern C++11
- Runs alongside applications
- Service discovery integration
- Rate Limit integration
- HTTP/2 first (get gRPC!)
- Acts as a front/edge proxy
- Stats, Stats, Stats
- Logging

Observability: Global Health

Observability: Service to Service

Envoy Client in Python (internal)

from lyft.api_client import EnvoyClient

# Client for the switchboard service; requests go through Envoy.
switchboard_client = EnvoyClient(
    service='switchboard'
)

switchboard_client.post(
    "/v2/messages",
    data={
        'template': 'welcome'
    },
    headers={
        'x-lyft-user-id': 12345647363394
    }
)
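The internal client hides the network, but underneath it must address a local Envoy instead of the remote service. Below is a minimal standard-library sketch of such a request, built and not sent. The listener address `127.0.0.1:9001` and the Host-header routing are assumptions, not details from the slides. The JSON body encoding is also assumed.

```python
import json
import urllib.request

# Hypothetical address of the local Envoy listener for outbound traffic.
ENVOY_EGRESS_URL = "http://127.0.0.1:9001"


def build_envoy_request(service, path, data, headers=None):
    """Build, without sending, a POST routed through the local Envoy.

    Assumes Envoy picks the upstream cluster from the Host header, so the
    target service name goes there instead of a real hostname.
    """
    all_headers = {"Host": service, "Content-Type": "application/json"}
    # HTTP header values are strings, so numeric IDs are converted.
    all_headers.update({name: str(value) for name, value in (headers or {}).items()})
    return urllib.request.Request(
        ENVOY_EGRESS_URL + path,
        data=json.dumps(data).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


req = build_envoy_request(
    "switchboard", "/v2/messages",
    data={"template": "welcome"},
    headers={"x-lyft-user-id": 12345647363394},
)
print(req.full_url, req.get_header("Host"))
```

Because of this design, the application only ever talks to localhost. Retries, timeouts, service discovery, and stats stay in Envoy rather than in each language's client library.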

Envoy deployment @Lyft

- > 100 services
- > 10,000 hosts
- > 2,000,000 RPS
- All service-to-service traffic (REST and gRPC)
- MongoDB, DynamoDB, and Redis proxy
- External service proxy (AWS and other partners)
- Kibana/Elasticsearch for logging
- LightStep for tracing
- Wavefront for stats

Architecture Revisited

[Diagram: Front Envoy, Application, Envoy, Discovery, Confidant, Ratelimit, >100 Clusters]

Done!

- Lyft is hiring. If you want to work on large-scale problems in a fast-moving, high-growth company, visit lyft.com/jobs
- Visit github.com/lyft
- Slides available at slideshare.net/danielhochman
- Q&A