51
Building low latency Micro Services and monolith in Java using high performance serialization and messaging Peter Lawrey - CEO of Higher Frequency Trading GOTO Chicago - 2016 Microservices for Performance

Microservices for Performance

Embed Size (px)

Citation preview

Page 1: Microservices for Performance

Building low latency Micro Services and monolith in Java using high performance serialization and messaging Peter Lawrey - CEO of Higher Frequency Trading

GOTO Chicago - 2016

Microservices for Performance

Page 2: Microservices for Performance

PeterLawreyJavaDeveloper/Consultantforinvestmentbanksandhedgefundsfor8years,23yearsinIT.MostanswersforJavaandJVMonstackoverflow.comFounderofthePerformanceJavaUser’sGroup.ArchitectofChronicleSoNwareJavaChampion

Page 3: Microservices for Performance

ChronicleSoNwareHelpcompaniesmigratetohighperformanceJavacode.SponsoropensourceprojectshQps://github.com/OpenHFTLicensedsoluTonsChronicle-FIX,Chronicle-EnterpriseandChronicle-Queue-EnterpriseOfferoneweekproofofconceptworkshops,advancedJavatraining,consulTngandbespokedevelopment.

Page 4: Microservices for Performance

MyFirstComputer

128KBofRAM

Page 5: Microservices for Performance

WheredoMicroservicescomefrom?

MicroservicesbuildsondesignprincipleswhichhavebeenaroundforsomeTme.•  UNIXPrinciple.•  StagedEventDrivenArchitecture.•  ServiceOrientatedArchitecture.•  LambdaArchitecture.•  ReacTveStreams.•  UsedinbuildingWebapplicaTons.“Micro-Web-Services”

Page 6: Microservices for Performance

PerformanceforGUIapplicaTonsAkeyperformancethresholdforGUIapplicaTonsisthespeedahumancansee.Anythingfasterthanthisdoesn’tmaQerforaGUI.Incinemas,theframerateusedtobe24framespersecond.Thismeansanythingshorterthan40msisunlikelytobeimportant.

Page 7: Microservices for Performance

ComputerprogramsvsHardware.GUIprogramshavetendedtonotgetanyfaster,rathertheyaQempttogivearicherexperience.

Page 8: Microservices for Performance

Microservicesdenial?

MicroservicesbringtogetherbestpracTcesfromavarietyofareas.MostlikelyyouarealreadyusingsomeofthesebestpracTces.ReacTonstoMicroservices•  ItsoundslikemarkeTnghype.•  ItallsoundspreQyfamiliar.•  Itjustarebrandingofstuffwealreadydo.•  Thereisroomforimprovementinwhatwedo.•  Therearesometools,andideaswecouldapplytooursystems

withoutchangingtoomuch.

Page 9: Microservices for Performance

MicroservicesscorecardToday QuickWins 6Months

Simplecomponentbaseddesign. ★★ ★★☆ ★★☆

DistributedbyJVMandMachine ★★ ★★ ★★☆

ServiceDiscovery ★ ★☆ ★★

Resiliencetofailure ★☆ ★☆ ★★

TransportagnosTc ★ ★☆ ★★

Asynchronousmessaging. ★☆ ★★ ★★

Automated,dynamicdeploymentofservices.

★☆ ★★ ★★☆

Serviceprivatedatasets. ☆ ★☆ ★★

Transparentmessaging. ☆ ★★ ★★☆

IndependentTeams ★☆ ★★ ★★

LambdaArchitecture ★ ★★ ★★★

Page 10: Microservices for Performance

BenefitofMicroservicesinTradingSystems

Standardtechniquesfordevelopinganddeployingdistributedsystems•  ShorterTmetomarket.•  Easiertomaintain.•  Simplerprogrammingmodels.

Page 11: Microservices for Performance

WhatMicroservicescanlearnfromTradingSystemsTradingsystemhavebeenworkingwithperformantdistributed

systemsforyears.•  Asynchronousmessaging,howtotestcorrectnessand

performanceforlatenciesyoucannotsee.•  BuildingdeterminisTc,highlyreproduciblesystems.

Page 12: Microservices for Performance

Whatislowlatency?

Theterm“lowlatency”canappliedtoawiderangeofsituaTons.AbroaddefiniTonmightbe;LowlatencymeansyouhaveaviewonhowmuchtheresponseTmeofasystemcostsyourbusiness.InthistalkIwillassume;Lowlatencymeansyoucareaboutlatenciesyoucanonlymeasureaseventheworstlatenciesaretoofasttosee.

Page 13: Microservices for Performance

Exampleoflowlatency?

AnInvestmentBankmeasuredthe99.999%ile(worst1in100,000)latencyofourChronicleFIXengineat450micro-seconds.ChronicleFIXiswriQeninJava.Thiswasunacceptabletothem.Wefixedthisbuganddroppedittobelow35micro-seconds.ThiswasaNerasocketreadtoaNerdecodingthemessagetotheirdatamodelandpersisTngit.

Page 14: Microservices for Performance

ToGoFasterDoLessWork

Micro-servicesdesignencouragesadesignwhereeachservicedoessomethingsimpleanditdoesitwell.YourL1/L2cachesareapreciousresource.Theyarelimitedofonly32KB(instrucTon),32KB(data)and256KB(L2).IfyouexceedthisyouarehipngthesharedL3cachewhichisascalabilityproblemANDeveryaccessis10xslowerormore.Ifyouwanttoscaleacrosscores,stayinyourL1/L2cacheyouwanteachthreadtoperformasimpletaskwhichfitsinsideitscachesasmuchaspossible.

Page 15: Microservices for Performance

Wheredotheyoverlap.

MicroservicesandTradingSystemshavehighlevelprinciplesof•  Simplecomponentbaseddesign.•  Asynchronousmessaging.•  Automated,dynamicdeploymentofservices.•  Serviceprivatedatasets.•  Transparentmessaging.•  Teamscandevelopindependentlybasedonwelldefined

contracts.

Page 16: Microservices for Performance

Eachoutputistheresultofoneinputmessage.Thisisusefulforgateways,bothinandoutofyoursystemandhighlyconcurrent.

Page 17: Microservices for Performance

EachoutputistheresultofALLtheinputs.InsteadofreplyingALLinputmessageeachTme,theFuncToncouldsaveanaccumulatedstate(whichcanberecreatedbyreplayinginputs)

Page 18: Microservices for Performance

YourcriTcalpathasaseriesoflowlatency,nonblockingtasks.Thiskeepsyourlatenciesendtoendconsistentlylow.

Page 19: Microservices for Performance
Page 20: Microservices for Performance

WhatdowemeanbyaDistributedSystem.

UsuallyadistributedsystemwillhaveaprocessesrunacrossmulTplemachines.Inahighperformancespaceyouwanttothinkingabouthowyourthreadsaredistributedwithinyourmachine.InparTcular,NUMAregionscanbecriTcalespeciallyfortheJVMwhichhasaGCwhichassumesrandomaccesstoallmemory.EvenifyouareworkingwithinasingleNUMAregion,itcanbeworthlookingatthedistribuTonofyourapplicaTonwithinaregion.

Page 21: Microservices for Performance

AComputerisaDistributedSystem.

WhenyouareconsideringshortTmescalesof10micro-secondsorless,youhavetoconsiderthateachcoreasaprocessorofit’sown.Eachcore-  hasit’sownmemory(L1&L2caches)-  canrunindependently-  communicateswithothercoresviaaL2cachecoherencebus.

Page 22: Microservices for Performance
Page 23: Microservices for Performance
Page 24: Microservices for Performance
Page 25: Microservices for Performance

DesigningforMicro-services

Micro-servicesneedtohave;-  IsolaTon,minimisescontenTonofstate.-  Asynchronousmessages,minimisestheimpactofdelays.-  Onethinganddoitwell,stayinyourprivateCPUcaches.-  Servicesshouldbeaddressable.-  Transparencyofwhatyourservicesaredoing.*

Micro-servicescomeinsystemsmustbetestableinisolaTon.*Transparencyisneededifyouaretoremoveredundantwork.

Page 26: Microservices for Performance

TesTngandDebuggingMicro-services

Youwantmicro-serviceswhichareeasytounittestanddebug.However,yourframeworkandinfrastructurecangetintheway.Theyarenothelping.Youneedtobeabletorunyourservicesasstandalonecomponents.Thesecomponentscanbetested,integratedanddebuggedwithouttheframeworkorinfrastructuresoyoucanseewherethesourceofyourissuesare.

Page 27: Microservices for Performance

Whynotuseaframework?

Inthisexamplewelookathowtoimplementtheseserviceswithoutaframework.Frameworksareverygoodforgepngstartedbutarenotsogoodiftheframeworkdoesn’tdoexactlywhatyouwant.InparTculartheyarenotverygoodatdoingless.Inperformantsystems,howeasyitistoremovesomethingtheprogramdoesn’thavetobedoingisjustasimportantashoweasyitistoaddsomefuncTonality.Thekeyistransparencyinwhatyourservicesaredoing.

Page 28: Microservices for Performance

TurningaMonolithintoMicro-Services

YouneedtohavegoodcomponentbaseddesignwhetheryouhaveMicro-ServicesoraMonolithwithclearseparaTonofconcerns.Micro-servicesmakegoodcomponentdesignevenmoreimportantastheywon’twellworkunlessyoudothis.Toturnyourcomponentsintoservices,youneedtoaddatransport.ThistransportshouldbeinterchangeableinfactitshouldbeopTonalandhavenoimpactifit’snotthere.

Component+Transport=Service.

Page 29: Microservices for Performance

Let’slookatanexample

Saywehaveamarketdatacomponentwhichcombinespricesandthisfeedsanothercomponentwhichisyourordermanager.AmoredetailsdiscussionofthisexampleisonmybloghQps://vanilla-java.github.io/ThefullcodeishQps://github.com/Vanilla-Java/Microservices/

Page 30: Microservices for Performance

StarTngwithasimplecontract

Asimplecontractforaservicewhichtakesasynchronousmessagesisaninterface.Eachmessagehasamethodnameandittakesoneormorearguments.SaywehaveacomponentwhichconsumesonesidedpricesAndthisproducestopofbookpriceswithbidandask

public interface SidedMarketDataListener { void onSidedPrice(SidedPrice sidedPrice); }

public interface MarketDataListener { void onTopOfBookPrice(TopOfBookPrice price); }

Page 31: Microservices for Performance

ADataTransferObject

AbstractMarshallableprovidesanequals,hashCodeandtoString.public class SidedPrice extends AbstractMarshallable { String symbol; long timestamp; Side side; double price, quantity; public SidedPrice(String symbol, long timestamp, Side side, double price, double quantity) { init(symbol, timestamp, side, price, quantity); } public SidedPrice init(String symbol, long timestamp, Side side, double price, double quantity) { this.symbol = symbol; this.timestamp = timestamp; this.side = side; this.price = price; this.quantity = quantity; return this; } }

Page 32: Microservices for Performance

DeserializabletoString()

Forittodeserializethesameobject,noinformaToncanbelost,whichusefultocreaTngtestobjectsfromproducTonlogs.

SidedPrice sp = new SidedPrice("Symbol", 123456789000L, Side.Buy, 1.2345, 1_000_000);

assertEquals("!SidedPrice {\n" + " symbol: Symbol,\n" + " timestamp: 123456789000,\n" + " side: Buy,\n" + " price: 1.2345,\n" + " quantity: 1000000.0\n" + "}\n", sp.toString()); // from string SidedPrice sp2 = Marshallable.fromString(sp.toString()); assertEquals(sp2, sp); assertEquals(sp2.hashCode(), sp.hashCode());

Page 33: Microservices for Performance

WriTngasimplecomponent

Wehaveacomponentwhichimplementsourcontractandinturncallsanotherinterfacewiththeresult(ifthereisone)

public class SidedMarketDataCombiner implements SidedMarketDataListener { final MarketDataListener mdListener; final Map<String, TopOfBookPrice> priceMap = new TreeMap<>(); public SidedMarketDataCombiner(MarketDataListener mdListener) { this.mdListener = mdListener; } public void onSidedPrice(SidedPrice sidedPrice) { TopOfBookPrice price = priceMap.computeIfAbsent(sidedPrice.symbol, TopOfBookPrice::new); if (price.combine(sidedPrice)) mdListener.onTopOfBookPrice(price); } }

Page 34: Microservices for Performance

Mockingoursimplecomponent

WecanusestandardmockingtoolssuchasEasyMockandIeasytestthisfrommyIDE.@Test public void testOnSidedPrice() { // what we expect to happen SidedPrice sp = new SidedPrice("Symbol", 123456789000L,

Side.Buy, 1.2345, 1_000_000); SidedMarketDataListener listener = createMock(SidedMarketDataListener.class); listener.onSidedPrice(sp); replay(listener); // what happens listener.onSidedPrice(sp); // verify we got everything we expected. verify(listener); }

Page 35: Microservices for Performance

TesTngoursimplecomponent

Wecanmocktheoutputlistenerofourcomponent.

MarketDataListener listener = createMock(MarketDataListener.class); listener.onTopOfBookPrice(new TopOfBookPrice("EURUSD", 123456789000L,

1.1167, 1_000_000, Double.NaN, 0)); listener.onTopOfBookPrice(new TopOfBookPrice("EURUSD", 123456789100L,

1.1167, 1_000_000, 1.1172, 2_000_000)); replay(listener); SidedMarketDataListener combiner = new SidedMarketDataCombiner(listener); combiner.onSidedPrice(new SidedPrice("EURUSD", 123456789000L,

Side.Buy, 1.1167, 1e6)); combiner.onSidedPrice(new SidedPrice("EURUSD", 123456789100L,

Side.Sell, 1.1172, 2e6)); verify(listener);

Page 36: Microservices for Performance

TesTngmulTplecomponents

Wecanmocktheoutputlistenerofourcomponent.// what we expect to happen OrderListener listener = createMock(OrderListener.class); listener.onOrder(new Order("EURUSD", Side.Buy, 1.1167, 1_000_000)); replay(listener); // build our scenario OrderManager orderManager = new OrderManager(listener); SidedMarketDataCombiner combiner = new SidedMarketDataCombiner(orderManager); // events in orderManager.onOrderIdea(new OrderIdea("EURUSD", Side.Buy, 1.1180, 2e6)); // not expected to trigger combiner.onSidedPrice(new SidedPrice("EURUSD", 123456789000L, Side.Sell, 1.1172, 2e6)); combiner.onSidedPrice(new SidedPrice("EURUSD", 123456789100L, Side.Buy, 1.1160, 2e6)); combiner.onSidedPrice(new SidedPrice("EURUSD", 123456789100L, Side.Buy, 1.1167, 2e6)); orderManager.onOrderIdea(new OrderIdea("EURUSD", Side.Buy, 1.1165, 1e6)); // expected to trigger verify(listener);

Page 37: Microservices for Performance

Addingatransport

Anymessagingsystemcanbeusedasatransport.Youcanuse-  RESTorHTTP-  JMSorAkka-  AeronoraUDPbasedtransport.-  RawTCPorUDP.-  ChronicleQueue.

Page 38: Microservices for Performance

WhyuseChronicleQueue

ChronicleQueuev4hasanumberofadvantages-  Brokerless,onlytheOSneedstobeup.-  Lowlatency,lessthan10microseconds99%oftheTme.-  Persisted,givingyourreplayandtransparency.-  Canreplaceyourloggingimprovingperformance.-  KernelBypass,SharedacrossJVMswithasystemcallforeach

message.

Page 39: Microservices for Performance

WhatdoesChronicleQueuelooklike?---!!meta-data#binaryheader:!SCQStore{wireType:!WireTypeBINARY,writePosition:777,roll:!SCQSRoll{length:86400000,format:yyyyMMdd,epoch:0},indexing:!SCQSIndexing{indexCount:!int8192,indexSpacing:64,index2Index:0,lastIndex:0}}#position:227---!!data#binaryonOrderIdea:{symbol:EURUSD,side:Buy,limitPrice:1.118,quantity:2000000.0}#position:306---!!data#binaryonTopOfBookPrice:{symbol:EURUSD,timestamp:123456789000,buyPrice:NaN,buyQuantity:0,sellPrice:1.1172,sellQuantity:2000000.0}#position:434---!!data#binaryonTopOfBookPrice:{symbol:EURUSD,timestamp:123456789100,buyPrice:1.116,buyQuantity:2000000.0,sellPrice:1.1172,sellQuantity:2000000.0}#position:566---!!data#binaryonTopOfBookPrice:{symbol:EURUSD,timestamp:123456789100,buyPrice:1.1167,buyQuantity:2000000.0,sellPrice:1.1172,sellQuantity:2000000.0}#position:698---!!data#binaryonOrderIdea:{symbol:EURUSD,side:Buy,limitPrice:1.1165,quantity:1000000.0}...#83885299bytesremaining

Page 40: Microservices for Performance

Measuringtheperformance?

MeasurethewritelatencywithJMH(JavaMicrobenchmarkHarness)

Percentiles, us/op:

p(0.0000) = 2.552 us/op

p(50.0000) = 2.796 us/op

p(90.0000) = 5.600 us/op

p(95.0000) = 5.720 us/op

p(99.0000) = 8.496 us/op p(99.9000) = 15.232 us/op

p(99.9900) = 19.977 us/op p(99.9990) = 422.475 us/op

p(99.9999) = 438.784 us/op p(100.0000) = 438.784 us/op

Page 41: Microservices for Performance

JavaLatencyBenchmarkHarness

JMHisanexcellenttool,whyaddanother?JLBHmeasures-  Timingsinasynchronousthreads.-  Timingsfromanypointintheprocesstoanother.-  Workswithasetthroughput.-  Accountsforcoordinatedomission.-  Hasabackgroundthreadtoreportthegeneralnoise.

Page 42: Microservices for Performance

JavaLatencyBenchmarkHarness

MulTpleservicesrunningintheirownthreadscanbechainedandperformancetestedindividuallyaswellasendtoend. @Override public void init(JLBH jlbh) { serviceIn = SingleChronicleQueueBuilder.binary(queueIn).build()

.createAppender().methodWriter(Service.class); service2 = new ServiceWrapper<>(queueIn, queue2,

new ServiceImpl(jlbh.addProbe("Service 2"))); service3 = new ServiceWrapper<>(queue2, queue3,

new ServiceImpl(jlbh.addProbe("Service 3"))); serviceOut = new ServiceWrapper<>(queue3, queueOut,

new ServiceImpl(jlbh.addProbe("Service Out"), jlbh)); }

Page 43: Microservices for Performance

Typicalperformance

Foratrivialservice,thetypicalperformanceisconsistent.TestedonaE5-2650v2.FiNeen2minutetestswerecompared.

Page 44: Microservices for Performance

Lookingatthenines

Foratrivialservice,thetypicalperformanceisconsistent.At99%,thethroughputhasanimpactonlatency.

Page 45: Microservices for Performance

Lookingatthenines

At99.9%,thethroughputmakesabigdifferencetothelatency.At400K/sthe99.9%ilehit19msinone2minuterun.

Page 46: Microservices for Performance

NoFlowControl?

ChronicleQueueisaproducercentricsoluTon,unlikemostmessagingsystemswhichareconsumercentric.-  manypublishersdon'tsupportflowcontrol(e.g.marketdata)

ordon'twantit(reporTngtocompliancesystems)-  Flowcontrolcanhidetheworstlatencies.-  FlowcontrolmeansanasynchronousconsumerisiteracTng

withit’sproducer.Withoutflowcontrol,theproducerandconsumerbehavethesamewhethertestedstandaloneorrunningatthesameTmeastheproducerisneverimpactedbyaslowconsumer.

Page 47: Microservices for Performance

ChronicleQueueEnterprise

WehavealicensedversionofChronicleQueuewhichincludes•  AringbuffertominimisethejiQerofpersisTngmessages.•  ConfirmedmulT-nodereplicaTon.Thewriterandreadercan

knoworwaitforamessagetobesuccessfullyreplicated.•  ReplicaTonthroQling,andtrafficshaping.•  Compressingoffilesonrolling.

Comingsoon,mulT-mastermulT-nodequeue.

Page 48: Microservices for Performance

WherecanItrythisout?

LowLatencyMicroservicesexampleshQps://github.com/Vanilla-Java/MicroservicesTheOSSChronicleproductsareavailablehQps://github.com/OpenHFT/ContactusforatrailversionofChronicle-FIX,Chronicle-EnterpriseorChronicle-Queue-Enterprise.sales@chronicle.soNware

Page 49: Microservices for Performance
Page 50: Microservices for Performance

Insummary

•  Microservicesdoesn’tmeanyoucandothingsdifferently,onlyimprovewhatyouaredoingalready.

•  IntroducetheBestPracTceswhichmakesenseforyou.•  YouwillhavesomeBestPracTcesalready.•  TradingSystemsaredistributedsystems,evenifonone

machine.•  LambdaArchitectureissimplesouseitasmuchaspossible,

thoughitwon’tsolveallproblems.

Page 51: Microservices for Performance

Q&A

Blog:hQp://vanilla-java.github.io/

hQp://chronicle.soNware

@ChronicleUG

[email protected]

hQps://groups.google.com/forum/#!forum/java-chronicle