The Great Migration by Baruch Sadogursky


THE GREAT MIGRATION

@jbaruch


a.k.a. @jbaruch

JFrog

Artifactory

Bintray

Groovy

Architecture

Quick overview to fill in, then get to business

Social software distribution platform

Interwebz
Client
Client

DNS

Interaction
Download
ES
DB
DB
CDN

Load Balancer

HTTPS

bintray.com

dl.bintray.com

The download server is the most important
dl.bintray.com
jcenter.bintray.com
JCenter on Grails or Gradle

Webapp & Micro
Grails
Groovy
Spock
Geb
GPars

Build & Auto
Gradle
Groovy
GVM
Crash

~ 90%


- SOAP -!

!

1 + 1 = 2
1 + 2 = 3
1 + 3 = ?

Sitemap

crawler-?

Runs once a day
Iterates over all major content

https://www.flickr.com/photos/calafellvalo/2859947965


XML

50MB max

50K entries max

Don't we all love XML.
Makes incremental changes a PITA
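The entry limit above means a large site needs several sitemap files. A minimal sketch of that split, assuming a plain list of URLs as input; the function names and the example.com URLs are hypothetical, and only the 50K-entry cap is enforced here, not the 50MB size cap.

```groovy
import groovy.xml.MarkupBuilder

// Render one sitemap file for a batch of URLs.
String renderSitemap(List<String> urls) {
    def out = new StringWriter()
    new MarkupBuilder(out).urlset(xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9') {
        urls.each { u ->
            url { loc(u) }
        }
    }
    out.toString()
}

// Split the full URL list into files of at most maxEntries entries each.
List<String> splitIntoSitemaps(List<String> urls, int maxEntries = 50_000) {
    urls.collate(maxEntries).collect { batch -> renderSitemap(batch) }
}

// Tiny demo with a limit of 2 so the split is visible.
def files = splitIntoSitemaps((1..5).collect { 'https://example.com/pkg/' + it }, 2)
println "${files.size()} sitemap files"   // prints "3 sitemap files"
```

Because every run regenerates whole files, adding a single URL still rewrites at least one file, which is the incremental-update pain the slide complains about.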

Webapp
Jenkins

nginx
Sitemap
Domains
Mapper

Still small
Not much content
Pragmatic
Runs once a day
Iterates over all major content.
Query GORM and write the output

10sK Users
80K Repositories
230K Packages
+40 MB of XML

Incremental update is a PITA


https://www.flickr.com/photos/bryanburke/2854366734

Not the actual JFrog offices

Jenkins


DB

XML

Garbage

Mapper

Not quick enough. I love GORM, but GORM is chubby: too much, too soon.

There are solutions:
Could provide more memory (stupid; a heroin addiction)
Fire up an interaction instance closed off to the world and run it there


https://www.flickr.com/photos/mit-libraries/3424098958


We cannot allow it to disturb either the webapp or the download server

As quick as possible, but throttle should it disturb any service

Can be interrupted so that it doesn't do a kamikaze

Site Mapper
Jenkins

XML

Ratpack-powered site mapper

HTTP Commands

Crash Commands

Dump index as JSON

Slurp with JsonSlurper

Write XML with XMLBuilder
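The three steps above can be sketched with Groovy's stock `JsonSlurper` and `MarkupBuilder`. This is an illustration only: the dump format, the field names (`owner`, `repo`, `name`) and the URL layout are assumptions, since the talk does not show the real index dump.

```groovy
import groovy.json.JsonSlurper
import groovy.xml.MarkupBuilder

// Hypothetical index dump; the real dump format is not shown in the talk.
String indexDump = '''
{"packages": [
  {"owner": "alice", "repo": "maven", "name": "lib-a"},
  {"owner": "bob",   "repo": "rpm",   "name": "tool-b"}
]}
'''

// Slurp the JSON dump and write one <url> entry per package.
String sitemapFromDump(String json, String baseUrl) {
    def index = new JsonSlurper().parseText(json)
    def out = new StringWriter()
    new MarkupBuilder(out).urlset(xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9') {
        index.packages.each { p ->
            url { loc("${baseUrl}/${p.owner}/${p.repo}/${p.name}") }
        }
    }
    out.toString()
}

println sitemapFromDump(indexDump, 'https://bintray.com')
```

Working from a JSON dump rather than from GORM keeps the mapper out of the webapp's memory, which matches the requirement not to disturb the webapp.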

GPars

Reads and writes are concurrent
Can be stopped in the middle
Speed throttled
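A rough sketch of those three properties. The talk uses GPars; to stay self-contained this example swaps in plain `java.util.concurrent`, so it is not the author's implementation. The class name `ThrottledCopier` and its parameters are hypothetical.

```groovy
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

class ThrottledCopier {
    final AtomicBoolean stopped = new AtomicBoolean(false)
    final long pauseMillis   // delay after each read: the throttle

    ThrottledCopier(long pauseMillis) { this.pauseMillis = pauseMillis }

    // Ask the reader to stop; items already queued are still written.
    void stop() { stopped.set(true) }

    // One thread reads from `source` while the calling thread writes.
    List<String> run(List<String> source) {
        def queue = new ArrayBlockingQueue<String>(100)
        def written = []
        def readerDone = new AtomicBoolean(false)
        def reader = Thread.start {
            try {
                for (item in source) {
                    if (stopped.get()) break
                    queue.put(item)
                    if (pauseMillis > 0) Thread.sleep(pauseMillis)
                }
            } finally {
                readerDone.set(true)
            }
        }
        while (true) {
            def item = queue.poll(10, TimeUnit.MILLISECONDS)
            if (item != null) {
                written << item
            } else if (readerDone.get() && queue.isEmpty()) {
                break   // reader finished and nothing is left to write
            }
        }
        reader.join()
        written
    }
}

println new ThrottledCopier(5).run(['a', 'b', 'c'])   // prints [a, b, c]
```

Raising `pauseMillis` slows the reader down when another service starts to suffer; calling `stop()` from a command handler ends the run early without losing what was already read.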

ElasticSearch

https://github.com/elastic/elasticsearch-groovy

The syntax is closer to the source JSON


RPM: Redhat, YUM, 4 XML+GZ, Repository
POM: Java, Maven, 1 XML, Module
DEB: Debian, APT, 1 TEXT/GZ, Structured

Explain typed and indexed repositories in Bintray

The majority of tools host an index consumed by the client.
Repositories must update these indexes with every update to the repo.

Maven also covers Gradle; it's the same indexing

RPM: createrepo
POM: Aether
DEB: reprepro

Client

Index

Binaries

Index
Binaries

Create a filesystem folder with all files, run the tool.
Run every time you change the index.
Requires native tools
OS dependent

repomd

RPM

Maven

Debian

Bower

Docker

Gems

NPM

Pypi

Vagrant

Java API (Groovy)

OS agnostic
Embeddable
Usable by any JVM language

Client

Publish

Index
Binaries

~600K Files
~3 Hours
~400 MB of Index

Webapp
repomd

Maven is the exception: many package systems embed descriptors in the package (EXPLAIN THIS)
Have to fetch the descriptors
Quad-core machine
Multiple workers

Multiply by the number of customers
Concurrent calculations

https://www.flickr.com/photos/thinredjellies/408275494

Client

Publish

Load Balancer

Instance 1: repomd
Instance ...: repomd
Instance N: repomd

DB

How can you tell when a multi-module deployment completes? (EXPLAIN THIS)
How can you synchronize multiple embedded repomd processes?

Now we are stuck with


https://www.flickr.com/photos/wackystuff/14931244568

Resque

Redis
Atomic, O(1)
Distributed
Persistence
Queries
Replication

Queue per type
Worker/Producer

Jesque
Java-flavored Resque

https://github.com/resque/resque
https://github.com/gresrun/jesque

Resque started at GitHub.
They needed monitorable, throwaway tasks: important, but not life-saving critical.
Queue-based workers.
Jesque is Java based.

def job = new Job('WorkerClassName', ['arg1', 'arg2'])
jesqueClient.enqueue('queueName', job)

Client
Redis

Jesque

def factorySettings = [workerName: WorkerClass]
def jobFactory = new MapBasedJobFactory(factorySettings)
def worker = new WorkerImplFactory(jesqueConfig, ['queueName'], jobFactory).call()
threadPool.submit(worker)

enqueue('workerType') {
    arg1 = 'value1'
    arg2 = false
}

Bintray
Redis

Gresque

submit(WorkerType) {
    conf1 = 'value1'
    conf2 = false
}

GPars
Handle clustering better
Nice DSL
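For readers wondering how a block like `enqueue('workerType') { arg1 = 'value1' }` can work, here is a minimal sketch of the Groovy mechanism such a DSL can be built on: the closure's delegate is set to a map, so assignments inside the block become map entries. `SketchQueue` is a hypothetical in-memory stand-in, not the Gresque API, and nothing here talks to Redis.

```groovy
// In-memory stand-in for a Redis-backed queue; purely illustrative.
class SketchQueue {
    final List<Map> jobs = []

    // Collects the assignments made inside the closure into an argument map.
    void enqueue(String workerType, Closure config) {
        def args = [:]
        def body = (Closure) config.clone()
        body.delegate = args
        body.resolveStrategy = Closure.DELEGATE_FIRST   // assignments hit the map first
        body.call()
        jobs << [worker: workerType, args: args]
    }
}

def queue = new SketchQueue()
queue.enqueue('workerType') {
    arg1 = 'value1'
    arg2 = false
}
println queue.jobs   // prints [[worker:workerType, args:[arg1:value1, arg2:false]]]
```

A real implementation would serialize the collected map and push it to the queue for that worker type instead of appending to a list.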

Bintray
Events (Quiet Period)

Bintray
Bintray

Redis
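One reading of the quiet period on the slide above: since nothing announces that a multi-module deployment is finished, wait until a repository has received no publish events for a while and only then recalculate its index. The sketch below illustrates that idea under stated assumptions; the class and method names are hypothetical, time is passed in explicitly to keep it deterministic, and state lives in a local map, whereas a clustered setup would keep it somewhere shared such as Redis.

```groovy
class QuietPeriod {
    final long quietMillis
    private final Map<String, Long> lastEvent = [:]

    QuietPeriod(long quietMillis) { this.quietMillis = quietMillis }

    // Record a publish event for a repository at the given time.
    void touch(String repo, long nowMillis) {
        lastEvent[repo] = nowMillis
    }

    // Return (and forget) repositories that have been quiet long enough.
    List<String> drainQuiet(long nowMillis) {
        def ready = lastEvent.findAll { repo, ts -> nowMillis - ts >= quietMillis }
                             .keySet().toList()
        ready.each { lastEvent.remove(it) }
        ready
    }
}

def quiet = new QuietPeriod(5000)
quiet.touch('my-repo', 0)
quiet.touch('my-repo', 3000)      // another module arrives, the clock restarts
println quiet.drainQuiet(6000)    // prints []
println quiet.drainQuiet(8000)    // prints [my-repo]
```

Each new event pushes the recalculation further out, so a burst of uploads triggers one index run instead of one per file.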

Ratpack, BTW

RPM XML

primary.xml: General archive info
filelists.xml: Lists files in RPM
other.xml: Misc attributes
repomd.xml: Inventory of above

Truly sorry you had to witness

DOM/STAX LOL

XMLBuilder

private def primaryPackageBuilder = { del, packageMd ->
    del.'package'(type: 'rpm') {
        name(packageMd.name)
        arch(packageMd.architecture)
        version(epoch: packageMd.epoch, ver: packageMd.version, rel: packageMd.release)
        checksum(type: 'sha', pkgid: 'YES', packageMd.sha1Digest)
        summary(packageMd.summary)
        description(packageMd.description)
        packager(packageMd.packager)
        url(packageMd.url)
        time(file: packageMd.lastModified, build: packageMd.buildTime)
        size(package: packageMd.size, installed: packageMd.installedSize, archive: packageMd.archiveSize)
        location(href: packageMd.artifactRelativePath)
        format {
            'rpm:license'(packageMd.license)
            'rpm:vendor'(packageMd.vendor)
            'rpm:group'(packageMd.group)
            'rpm:buildhost'(packageMd.buildHost)
            'rpm:sourcerpm'(packageMd.sourceRpm)
            'rpm:header-range'(start: packageMd.headerStart, end: packageMd.headerEnd)
            ...
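To see how a closure of this shape is driven, here is a trimmed-down, runnable variant that passes a stock `groovy.xml.MarkupBuilder` as `del`. It is a sketch, not the repomd code: only a handful of elements are kept, and the sample metadata map is made up.

```groovy
import groovy.xml.MarkupBuilder

// Trimmed-down version of the closure above: `del` is the builder,
// `packageMd` is any object or map exposing the metadata fields.
def primaryPackageBuilder = { del, packageMd ->
    del.'package'(type: 'rpm') {
        name(packageMd.name)
        arch(packageMd.architecture)
        version(epoch: packageMd.epoch, ver: packageMd.version, rel: packageMd.release)
        format {
            'rpm:license'(packageMd.license)
        }
    }
}

// Hypothetical sample metadata, for illustration only.
def md = [name: 'hello', architecture: 'x86_64', epoch: 0,
          version: '1.0', release: '1', license: 'ASL 2.0']

def out = new StringWriter()
primaryPackageBuilder(new MarkupBuilder(out), md)
println out
```

Method calls inside the nested block that nothing else defines, such as `name(...)` or `'rpm:license'(...)`, fall through to the builder, which turns each one into an element of the same name.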


I promise you we don't lie

Don't ask me which country it is; I made it up and I was drunk


Per minute
Per day
Per country

DL Server
DL Server
DL Server

Redis

GeoIP

Mongo (UI)
Downloadable log files

Not part of the webapp but a monolith

Stats data is fat and we can't afford to retain it.
Rather suffer the overhead asynchronously.

100500

Kekekekekekekekekeke

Not the actual datacenter

http://www.mengsbizarreadventure.com/2010/starcraft-2-betaaka-zerg-rush-kekekeke-hd/

Incremental update is a PITA

Per minute
Per day
Per country

GeoIP

Mongo (UI)
Downloadable log files

Proximity to datacenters


http://galleryhip.com/black-and-white-fight-scene-kill-bill.html


We cannot allow it to disturb either the webapp or the download server

As quick as possible, but throttle should it disturb any service

Can be interrupted so that it doesn't do a kamikaze

Gather

Format

Scatter

What do I mean? Sounds like a motto; sounds like Steve Ballmer. I assure you these aren't just buzzwords.

Gather: download servers are spewing out info. BAM!
Format: regain data lost when reported.
Scatter: make sure that the services facing the user get the information they need. Quick!
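The three phases can be pictured with a small in-memory sketch. Everything concrete here is an assumption made for illustration: the event fields, the IP-to-country table standing in for a GeoIP lookup, and the function names. The real pipeline runs as separate services linked through Redis.

```groovy
// Format: roll raw events up per minute, per day and per country.
Map formatStats(List<Map> events, Map<String, String> geoIp) {
    return [
        perMinute : events.countBy { it.epochSecond.intdiv(60) },
        perDay    : events.countBy { it.epochSecond.intdiv(86_400) },
        perCountry: events.countBy { geoIp[it.ip] ?: 'unknown' }
    ]
}

// Scatter: hand the formatted stats to every user-facing consumer.
void scatter(Map stats, List<Closure> sinks) {
    sinks.each { sink -> sink(stats) }
}

// Gather: raw download events as the download servers might report them.
def events = [
    [path: 'a.jar', epochSecond: 0L,  ip: '10.0.0.1'],
    [path: 'a.jar', epochSecond: 30L, ip: '10.0.0.2'],
    [path: 'b.rpm', epochSecond: 61L, ip: '10.0.0.1']
]
def geoIp = ['10.0.0.1': 'IL', '10.0.0.2': 'US']   // hypothetical lookup table

def stats = formatStats(events, geoIp)
scatter(stats, [
    { println "UI store: ${it.perCountry}" },      // prints UI store: [IL:2, US:1]
    { println "log files: ${it.perMinute}" }       // prints log files: [0:2, 1:1]
])
```

Keeping the phases separate means a slow scatterer never holds up the download servers, which only need to get their raw events out quickly.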

Redis

Dispatcher
Redis

Protobuf (Gradle)

Redis' Atomic Ops

Link the different stages with dispatchers. They ensure that no data is lost between phases

Dispatcher

Minute formatter
Day formatter
Country formatter

Dispatcher

Mongo
GeoIP
Whois

UI Scatterer
Log file Scatterer

DL Server

HTTP Commands

Crash Commands

Controller

DL Server

Controller
Dispatcher
Dispatcher
Dispatcher
Dispatcher
Dispatcher
Dispatcher

Formatters

Scatterers




:)

Especially if the infra, domain, team or concept is new,

It'll let you get the hang of things before you've fully committed and gone ahead

?