Download pdf - Micro Benchmarking WfMS with Workflow Patterns

Research

Marigianna Skouradaki, FrankLeymannInstituteofArchitectureofApplication Systems

University ofStuttgart,Germany

VincenzoFerme,Cesare PautassoFacultyofInformatics,

University ofLugano,Switzerland

AndrévanHoornInstituteofSoftwareTechnology,University ofStuttgart,Germany

Micro– BenchmarkingBPMN2.0WorkflowManagementSystems

withWorkflowPatterns

22

Research

©Marigianna Skouradaki

Motivation

*the appearance and company selection are random

77

Research


Motivation

WorkloadMix Results

2. What is the impact of the core BPMN 2.0 language constructs on the performance of the Workflow Management Systems?

3. Can simple workload mix reveal performance bottlenecks?

1. How to define meaningful workload models candidates?

Benchmark

88

Research


Agenda

n Backgroundn WorkloadModelsn BenchFlowEnvironmentn ExperimentsSetupn ExperimentsMethodologyn Resultsn Discussionn ConclusionsandFutureWork

99

Research


WhatisaWorkflowManagementSystem(WfMS)?

4.Navigator

3.Web Services

6.ApplicationServer

WorkflowEngine

5.InstanceDatabase

2.Users& IT tools

…

1.ProcessModels

1010

Research


WhatisaWorkflowManagementSystem(WfMS)?

4.Navigator

3.Web Services

6.ApplicationServer

WorkflowEngine

5.InstanceDatabase

2.Users& IT tools

…

1.ProcessModels

1313

Research


Micro-Benchmark

n Simpleworkloadandtargetstheevaluationofatomicoperations1

1Waller, J., Hasselbring, W.: A benchmark engineering methodology to measure the overhead of application-level monitoring. In: KPDAYS. CEUR Workshop Proceedings, vol. 1083, pp. 59–68. CEUR-WS.org (2013) 2Oracle, “Java Microbenchmark Harness”, (2013). Online resource: http://openjdk.java.net/projects/code-tools/jmh/3Van Der Aalst, W.M.P., Hofstede, T., et al.: Workflow patterns. Distrib. Parallel Databases 14(1), 5–51 (Jul 2003)

HelloWorld() micro-benchmark measures “nothing”.It is a good showcase for the overheads in infrastructure2

à For Workflow Management SystemsThe basic control flow

workflow patterns can be considered as atomic operations.

ContributionIntroduce & Execute

the first Micro-Benchmark onBPMN 2.0 WfMS using

Workflow Patterns

1818

Research


WorkflowPatternsasWorkloadModels(Constraints)

1. Maximizethesimplicityoftheprocessmodels2. Omitinteractionswithexternalsystems(i.e.,stressthe

ProcessNavigator)3. Mostscripttasksareempty,unlessthe

implementationoftheworkflowpatternrequiresotherwise

4. Defineequalprobabilityforpassingthecontrolflowtotheoutgoingbranches

5. Combine:- Simplemerge+exclusivechoice- Parallelsplit+synchronization

2323

Research


WorkloadModelssequencePattern

EmptyScript 1

EmptyScript 2

Sequence Pattern [SEQ]parallelSplitPattern

EmptyScript 1

EmptyScript 2

Parallel Split & Synchronization [PAR]

implicitTerminationPattern

EmptyScript 1

Wait 5Sec

Explicit Termination Pattern [EXT]arbitaryCyclesPattern

generate1 or 2

EmptyScript 1

EmptyScript 2

i++

i < 10

i >= 10case_1

case_2

Arbitrary Cycle Pattern [CYC]

exclusiveChoicePattern

generatenumber

EmptyScript 1

EmptyScript 2case_2

case_1

Simple Merge & Exclusive Choice [EXC]

2424

Research


ThreeOpen-SourceWfMS UnderTest

1. Widely used in the industry2. Tested against BPMN 2.0 conformance1

3. Vendor provided configurations are used

1.Geiger, M., Harrer, S., Lenhard, J., Casar, M., Vorndran, A., Wirtz, G.: BPMN conformance in open source engines. In: Proc. of the 9th IEEE International Symposium on Service-Oriented System Engineering (SOSE 2015). SOSE 2015, IEEE, San Francisco Bay, CA, USA (March 30 – April 3 2015)

2525

Research

VincenzoFerme

TheBenchFlowEnvironment5

5Ferme, Vincenzo; Ivanchikj, Ana and Pautasso Cesare: "A Framework for Benchmarking BPMN 2.0 Workflow Management Systems", In: Proceedings of BPM’15, Innsbruck, Austria, Springer, August, 2015.

2626

Research

VincenzoFerme

Metrics/KPIs

2727

Research

VincenzoFerme

ExperimentsMethodology

n Experiment1:Max(1500)concurrentInstanceProducersstartinginstancesforeachworkflowpattern[SEQ,EXC,EXT,PAR,CYC]

n Experiment2:1500concurrentInstanceProducersstartinganequalnumberinstancesoftheworkflowpatterns(20%[MIX]of[SEQ,EXC,EXT,PAR,CYC])

à Threeroundsperexperimentà Eachexperiment lastsfor10minutesà Discardedthedataofthewarm-upstate(first1min)

2828

Research


Results:MeanDuration(ms)

SEQ EXC EXT PAR CYC MIX0

5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)

Activiti jBPM Camunda


20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)

Fig. 2. (a) Mean Duration (ms) per workflow pattern and (b) Mean CPU (%) Usageper workflow pattern

shows that the duration times are not notably a↵ected as the values are close tothose of [SEQ].

More particularly, we have a mean of 0.48 ms for Activiti, 9.30 ms for jBPMand 0.85 ms for Camunda. Concerning the CPU and RAM utilization, we seea slight increase with respect to the [SEQ]. Activiti uses an average of 57.42%CPU and 12, 215 MB RAM for executing 775, 455 workflow instances in 540 sec,jBPM takes approximately the same amount of time (562 sec) to execute 27, 805workflow instances. For this, it utilizes a mean of 5.73% CPU and 2, 976.37 MBof RAM, and Camunda 43.21% of CPU and 824.96 MB of RAM for executing765, 274 workflow instances in 540 sec.

4.2.3 Explicit Termination Pattern [EXT] As discussed in Sec. 2 the[EXT] executes concurrently an empty script and a script that implements a fiveseconds wait. According to the BPMN 2.0 execution semantics, the branch of the[EXT] that finishes first terminates the rest of the workflow’s running branches.We have therefore designed the model considering that the fastest branch (emptyscript) will complete first, and stop the slow script on the other branch whenthe terminate end event following the empty script is activated. This was thecase for Activiti and Camunda, which executed the workflow patterns in anaverage of 14.11 ms and 0.4 ms respectively. The resource utilization of these twoWfMSs also increases in this workflow pattern, i.e., we have 60.20% mean CPUusage and 12, 025 MB mean RAM usage for Activiti and 33.34% mean CPUusage and 794.92 MB mean RAM usage for Camunda. We can already see aninteresting di↵erence on the performance of the two WfMS as [EXT] constitutesthe the slowest workflow pattern for Activiti and the fastest workflow patternfor Camunda. However, we should notice that in this experiment Camunda hada higher response time which was slowing down the instance producers. Thisresulted in an experiment execution time of 539 sec and a lower throughput.


5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)

WfMS A WfMS B Camunda


20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)


mentary material. The behaviour of all the WfMSs is discussed thoroughly inSec. 4.3.

4.2 Results Analysis

Sequence Flow Pattern [SEQ] The [SEQ] workflow pattern lasted onaverage 0.39 ms for WfMS A, 6.39 ms for WfMS B and 0.74 ms for Camunda.The short duration of this workflow pattern justifies the low mean CPU usagewhich is 43.21% for WfMS A, 5.83% for WfMS B and 36.75% for Camunda.WfMS B also has a very low average throughput of 63.31 wi/s while for the othertwo WfMS the average throughput is similar. Concerning the memory utilizationunder the maximum load WfMS A needed in average 12, 074, WfMS B 2, 936 andCamunda 807.81 MB of RAM respectively. As observed from the Table 1 [SEQ]is the workflow pattern with the highest throughput for all the WfMS under test.

Exclusive Choice & Simple Merge Patterns [EXC] Before proceedingto the results analysis of the [EXC], we should consider that the first script taskof the workflow pattern generates a random integer, which is given as an inputto the very simple evaluation condition of the exclusive choice gateway. This wasexpected to have some impact on the performance. However, Fig. 2(a) showsthat the duration times are not notably a↵ected as the values are close to thoseof [SEQ]. More particularly, we have a mean of 0.48 ms for WfMS A, 9.30 ms forWfMS B and 0.85 ms for Camunda. Concerning the CPU and RAM utilization,we see a slight increase with respect to the [SEQ]. WfMS A uses an average of57.42% CPU and 12, 215 MB RAM for executing 775, 455 workflow instancesin 540 sec, WfMS B takes approximately the same amount of time (562 sec) toexecute 27, 805 workflow instances. For this, it utilizes a mean of 5.73% CPU and2, 976.37 MB of RAM, and Camunda 43.21% of CPU and 824.96 MB of RAMfor executing 765, 274 workflow instances in 540 sec.

Mean duration (ms)

sequencePattern

EmptyScript 1

EmptyScript 2

[SEQ]

2929

Research




5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






Mean duration (ms)

[EXC]

exclusiveChoicePattern

generatenumber

EmptyScript 1

EmptyScript 2case_2

case_1

3030

Research




5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






Mean duration (ms)


EmptyScript 1

Wait 5Sec

[EXT]

3131

Research




5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






Mean duration (ms)

parallelSplitPattern

EmptyScript 1

EmptyScript 2

[PAR]

3232

Research




5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






Mean duration (ms)

CYC Instance Producers:

Camunda: 600

WfMS A: 600WfMS B: 600

arbitaryCyclesPattern

generate1 or 2

EmptyScript 1

EmptyScript 2

i++

i < 10

i >= 10case_1

case_2

[CYC]

3333

Research




5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






Mean duration (ms)

3434

Research


Results:MeanCPU(%)andRAM(MB)usage


5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)







2,000

4,000

12,074

.2

12,215

.9

12,025

.54

12,201

.16

12,340

.65

12,310

.88

2,93

6.6

2,97

6.37

2,74

7.34

2,93

5.81

2,85

1.9

2,78

6.54

807.81

824.96

794.92

828.37

933.31

991.86

MeanRAM

(MB)


SEQ EXC EXT PAR CYC0

10

20

30

0.6

0.72

18.99

17.2

3.45

9.21 12

.39

2,62

7.09

12.66

51.95

0.74

0.81

0.54

0.85 3.49

MeanDuration

(ms)

Fig. 3. (a) Mean RAM (MB) Usage per workflow pattern and (b) Mean Duration (ms)per workflow pattern in [MIX]

Explicit Termination Pattern [EXT] As discussed in Sec. 2 the [EXT]executes concurrently an empty script and a script that implements a five secondswait. According to the BPMN 2.0 execution semantics, the branch of the [EXT]that finishes first terminates the rest of the workflow’s running branches. Wehave therefore designed the model considering that the fastest branch (emptyscript) will complete first, and stop the slow script on the other branch whenthe terminate end event following the empty script is activated. This was thecase for WfMS A and Camunda, which executed the workflow patterns in anaverage of 14.11 ms and 0.4 ms respectively. The resource utilization of these twoWfMSs also increases in this workflow pattern, i.e., we have 60.20% mean CPUusage and 12, 025 MB mean RAM usage for WfMS A and 33.34% mean CPUusage and 794.92 MB mean RAM usage for Camunda. We can already see aninteresting di↵erence on the performance of the two WfMS as [EXT] constitutesthe slowest workflow pattern for WfMS A and the fastest for Camunda.

As seen in Fig. 2(a), WfMS B has very high duration results for this workflowpattern. We have investigated this matter in more detail and we have observedthat over the executions WfMS B chooses the sequential execution of each pathwith an average percentage of 52.23% for following the waiting script first and47.77% for following first the empty script. Since the waiting script takes fiveseconds to complete, every time it is chosen for execution it adds a five secondsoverhead, and thus the average duration time is so high. This alternate executionof the two branches also explains the rest of the statistics. For example, weobserve a very high standard deviation of 2500.44 that indicates that there isa very large spread of the values around the mean duration. Concerning theresource utilization we can observe a very low average usage of CPU at 0.24%and a mean RAM usage similar to the rest of the workflow patterns at 2, 747.34MB. In Sec. 4.3 we attempt to give an explanation of this behavior for WfMS B.


5

10

15

20

0.39

0.48

14.1

13.29

6.23 8.

16

6.39

9.3

2,62

2

10.06

39.36

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)






CPU(%) usage

Memory (MB) Usage

3737

Research


ResultsThroughput(wi/s)

SEQ EXC EXT PAR CYC MIX

WfMS A 1,447.66 1,436.03 1,426.35 1,429.65 644.02 1,402.33

WfMS B 63.31 49.04 0.38 48.89 13.46 1.78

Camunda 1,456.79 1,417.17 1,455.68 1,455.68 327.99 1,061.27

CYC Instance Producers: WfMS A: 800 WfMS B: 1500 Camunda: 600

WfMS B uses a synchronous instantiation API.Therefore, load drivers must wait for the workflow instances to end

Before instantiating the next one

4646

Research


Findings

Even though the defined workload is simple


5

10

15

20

0.39

0.48

14.1

13.29

2.91

8.16

6.39

9.3

2,62

2

10.06

38.65

540.02

0.74

0.85

0.4

0.7 3.

06

1.22

(a)MeanDuration

(ms)



20

40

60

80

100

43.21 57.42

60.2 66.1

70.09

77.69

5.83

5.73

0.24 5.64

4.67

0.66

36.75

44.62

33.34

41.76

41.67

48.53

(b)MeanCPU

(%)





3. Differences in the duration

of the workflow instances

2. Discovered Bottleneck on WfMS B

1. Differences in resource utilization

show space for improvement

4. EXT pattern is not implemented

consistently on WfMS B

5. Sequential patterns are better candidatesfor discovering the

maximum throughputsequencePattern

EmptyScript 1

EmptyScript 2

6. Complex and parallel workflows

are better candidatesfor stressing the WfMS on

Resource UtilizationparallelSplitPattern

EmptyScript 1

EmptyScript 2


EmptyScript 1

Wait 5Sec

7. The CYC pattern caused

time out errors in Camunda

arbitaryCyclesPattern

generate1 or 2

EmptyScript 1

EmptyScript 2

i++

i < 10

i >= 10case_1

case_2

8. The execution of the MIX

had the expectedbehavior

4747

Research


FutureWork

n FinalizeandreleasetheBenchFlowframeworkinanopen-sourcemodelonGithubandDockerHub:

n Generatesyntheticprocessmodelsw.r.t.thepresentresultsn Executeamacrobenchmarkwiththesynthetic,complex,

representativeworkflowmixn Executedifferenttestswithrealisticloadfunctionsandtestdatan Production-likeconfigurations

benchflowhttps://github.com/benchflow

goo.gl/Olw32m [email protected]