30
Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed Mohamed, Vasily Tarasov, Michael Littley, Lukas Rupprecht, Yue Cheng, Nannan Zhao, Dimitrios Skourtis, Amit S. Warke, Heiko Ludwig, Dean Hildebrand, and Ali R. Butt

Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Improving Docker Registry Design based on Production Workload Analysis

Ali Anwar, Mohamed Mohamed, Vasily Tarasov, Michael Littley, Lukas Rupprecht, Yue Cheng,

Nannan Zhao, Dimitrios Skourtis, Amit S. Warke, Heiko Ludwig, Dean Hildebrand, and Ali R. Butt

Page 2: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

TheaveragecompanyQUINTUPLESitsDockerusagewithin

9MONTHS

Source: Datadog

Containers will be a $2.7B market by 2020*

2

§  Containers accelerate software development and distribution.

§  In 2017 alone, Docker adoption went up by 40%.

§  Containers use in enterprise and cloud infrastructure is expected to grow much faster.

*http://bit.ly/2uryjDI

Page 3: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Docker usage patterns remain a mystery

3

§  How are Docker containers used and managed?

§  How can we streamline Docker workflows?

§  How do we facilitate Docker performance analysis?

Page 4: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Our contribution: Characterization and optimization of Docker workflow

4

§  Conduct a large-scale analysis of a real-world Docker workload from geo-distributed IBM container service

§  Provide insights and develop heuristics to increase Docker performance

§  Develop an open source Docker workflow analysis tool* *https://dssl.cs.vt.edu/drtp/

Page 5: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Background: Docker container image

5

§  Container images are divided into layers. §  The metadata file is called manifest. §  Users create repositories to store images. §  Images in a repository can have

different tags (versions). JSON

Layer

Layer

Layer

Manifest

Container image

}

Redis CentOS

v2.6

latest myOS

Page 6: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Background: Docker container image

6

§  Container images are divided into layers. §  The metadata file is called manifest. §  Users create repositories to store images. §  Images in a repository can have

different tags (versions). JSON

Layer

Layer

Layer

Manifest

Container image

}

Redis CentOS

v2.6

latest myOS<user,repository,tag>

Page 7: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Background: Docker container registry

7

§  Docker container images are stored online in Docker registry.

dockerpush dockerpull

§  Push image: 1.  HEAD layers 2.  POST/PUT layer 3.  PUT manifest

§  Pull image: 1.  GET manifest 2.  GET layers

Page 8: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

Background: Docker container registry

8

§  Docker container images are stored online in Docker registry.

dockerpush dockerpull

§  Push image: 1.  HEAD layers 2.  POST/PUT layer 3.  PUT manifest

§  Pull image: 1.  GET manifest 2.  GET layers

Significantamountofacontainerstartuptimeisspentinpullingtheimage

Page 9: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

The IBM Cloud Docker registry traces

9

§  Capture a diverse set of customers: individuals, small & medium businesses, government institutions

§  Cover five geographical locations and seven availability zones

§  Span 75 days and 38M requests that account for more than ~181TB of data transferred

Page 10: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

IBM Docker registry service

10

Five geographical locations constitute seven Availability Zones (AZ):

IBMCloudRegistryarchitecture

*The registry setup is identical, except prs and dev are only half the size of the other Azs.

IBMInternal5.  Staging(stg)

Testing*6.  Prestaging(prs)7.  Development(dev)

Production1.  Dallas(dal)2.  London(lon)3.  Frankfurt(fra)4.  Sydney(syd) Nginx

Object store Registry

Broadcaster

Registry

Registry

Stats counter

Page 11: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

11

Tracing methodology

§  Combined traces by matching the incoming HTTP request identifier across the components

§  Removed redundant fields and anonymized the traces

§  Collected data from Registry, Nginx, and Broadcaster

§  Studied requests: GET, PUT, HEAD, PATCH, POST

Registry

Broadcaster

Registry

Registry

Nginx

Page 12: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

12

{"host":" ","http.request.duration":0.879271282,"http.request.method":"GET","http.request.remoteaddr":" ","http.request.uri":"v2/ / /blobs/ ","http.request.useragent":"docker/17.04.0-cego/go1.7.5..)","http.response.status":200,"http.response.written":1518,"id":" ","timestamp":"2017-07-01T01:39:37.098Z"}

Anonymized log sample

Page 13: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

13

Q1: What is the distribution of request types?

0%20%40%60%80%

100%

dal

lon fra

syd

stg

prs

dev

Requ

ests

pull push80%–95%ofrequestsarereads(pulls)

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 14: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

0%

20%

40%

60%

80%

100%

dal lon fra syd stg prs dev

Requ

ests

GET POST HEAD PUT PATCH

14

Q1: What is the distribution of request types?

60%oftherequestsareGETand10%–22%areHEADrequests

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 15: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

15

Q2: What is the manifest size distribution?

Typicalmanifestsizeisaround1KB

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 16: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

16

Q3: What is the layer size distribution? 65%ofthelayersaresmallerthan1MBand

around80%aresmallerthan10MB

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 17: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

17

Q3: What is the layer size distribution? 65%ofthelayersaresmallerthan1MBand

around80%aresmallerthan10MB

Thereisasignificantopportunityforcachingthelayers

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 18: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

18

Q4: Is there spatial locality? 1%ofmostaccessedlayersaccountfor42%and59%of

allrequestsindalandsyd,respectively

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 19: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

19

Q4: Is there spatial locality?

0%

5%

10%

15%

20%

1 2 3 4 5 6 7 8 9 10

%ofreq

uests

Popularityrank

dal lon fra sydstg prs dev

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Thepopularityratedropsrapidlyaswemovefrommostpopulartotenthmostpopularlayer

Page 20: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

20

Q5: Can future requests be predicted?

GETmanifestrequestsarenotfollowedbyanysubsequentGETlayerrequest Production:dal,lon,fra,syd

IBMinternal:stgTesting:prs,dev

Page 21: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

21

Q5: Can future requests be predicted? SignificantincreaseinsubsequentGET

layerrequestswithinasessionProduction:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 22: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

22

Q5: Can future requests be predicted? SignificantincreaseinsubsequentGET

layerrequestswithinasession

StrongcorrelationbetweenrequestsàGETlayersrequestscanbepredictedàopportunityforlayerprefetching

Production:dal,lon,fra,sydIBMinternal:stgTesting:prs,dev

Page 23: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

23

Enabling further analysis: Trace re-player

Client 1

Master Client 2

Client 3

Registry

Trace Round Robin/ Hashing (client remote address)

Performanceanalysismode

Offlineanalysismode

§  Study throughput and latency §  Understand effect of CPU,

Memory, Storage, Network

§  Simulate prefetching and caching policies §  Explore cache efficacy

Additionalanalysis§  Analyze request arrival rate at user define granularity §  Study effect of deduplication on registry size

Page 24: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

24

Effect of backend storage technologies

10210 103 104 105 106 107 108 109

Experimental setup:

§  Registry on 32 core machine with 64 GB RAM and 512 GB SSD

§  Swift object store on 10 similar nodes

§  Trace re-player on 6 additional nodes

Page 25: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

25

Effect of backend storage technologies

10210 103 104 105 106 107 108 109

Experimental setup:

§  Registry on 32 core machine with 64 GB RAM and 512 GB SSD

§  Swift object store on 10 similar nodes

§  Trace re-player on 6 additional nodes Fastbackendstorage/cachefortheregistrycansignificantlyimprovetheoverallperformance

Page 26: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

26

Effect of a two-level Main Memory+SSD cache

Experimental setup: §  Small layers (<100 MB) are stored in the main memory §  Replacement policy for both cache level is LRU §  Studied cache sizes:

RAM: 2%, 4%, 6%, 8%, and 10% of the data ingress SSD: 10x, 15x, 20x the size of RAM cache

§  Layers are content addressable à cache invalidation is not a problem

Page 27: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

27

Two-level cache: Main memory+SSD

00.20.40.60.81

2% 4% 6% 8% 10%

hitratio

dataingress

LRU:mem LRU:mem+SSD(10x)

LRU:mem+SSD(15x) LRU:mem+SSD(20x)

Dallas

Page 28: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

28

Benefit of layer prefetching

PUT layer GET manifest GET layer LMthresh MLthresh

036912

1h 12h 1d

hits/prefetch

LMthresh

ML-thresh:1hour ML-thresh:12hours ML-thresh:1day

Page 29: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

29

Summary

§  We perform a quantitative characterization of a production Docker registry deployment §  Registry workload is read intensive §  Layers sizes are small §  Strong correlation exists between layer requests

§  We propose effective caching and prefetching strategies for container layers

§  We enable further Docker investigation and optimization by making our traces and the trace re-player tool open source*

*https://dssl.cs.vt.edu/drtp/

Page 30: Improving Docker Registry Design based on Production Workload Analysis · 2019-12-18 · Improving Docker Registry Design based on Production Workload Analysis Ali Anwar, Mohamed

30

Thank You!

Questions & contact: AliAnwar,[email protected]

https://dssl.cs.vt.edu/