Your Program as a Transpiler - QConSP

Your Program as a Transpiler

Applying Compiler Design to Everyday Programming

About Me

• Edoardo Vacchi @evacchi

• Research @ University of Milan

• Research @ UniCredit R&D

• Drools and jBPM Team @ Red Hat

Motivation

• My first task in Red Hat: marshalling backend for jBPM

• Data model mapping

• From XML tree model to graph representation

• Apparently boring, but challenging in a way

Motivation

• Language implementation is often seen as a dark art

• But some design patterns are simple at their core

• Best practices can be applied to everyday programming

Motivation (cont'd)

• Learning about language implementation will give you a different angle to deal with many problems

• It will lead you to a better understanding of how GraalVM and Quarkus do their magic

Goals

• Programs often have a pre-processing phase where you prepare for execution

• Then, there's the actual process execution phase

• Learn to recognize and structure the pre-processing phase

Transpilers

Transpilers vs. Compilers

• Compiler: translates code written in a language (source code) into code written in a target language (object code). The target language may be at a lower level of abstraction

• Transpiler: translates code written in a language into code written in another language at the same level of abstraction (Source-to-Source Translator).

Are transpilers simpler than compilers?

• "Lower-level languages are complex"

  • They are not: if anything, they're simple

• "Syntactic sugar is not a higher level of abstraction"

  • It is: a concise construct is expanded at compile-time

• "Proper compilers do low-level optimizations"

  • You are thinking of optimizing compilers.

The distinction is moot

• It is pretty easy to write a crappy compiler, call it a transpiler and feel at peace with yourself

• Writing a good transpiler is no different or harder than writing a good compiler

• So, how do you write a good compiler?

Your Program as a Compiler

Applying Compiler Design to Everyday Programming

Compiler-like workflows

• At least two classes of problems can be solved with compiler-like workflows

• Boot time optimization problems

• Data transformation problems

Running Example: Function Orchestration

Function Orchestration

• You are building an immutable Dockerized serverless function

f g

Function Orchestration

• Problem

  • No standard* way to describe function orchestration yet

* Yes, I know about https://github.com/cncf/wg-serverless

f g

process:
  elements:
    - start: &_1
        name: Start
    - function: &_2
        name: Hello
    - end: &_3
        name: End
    - edge:
        source: *_1
        target: *_2
    - edge:
        source: *_2
        target: *_3

[Diagram: Start → Hello → End]

Solution: Roll your own YAML format

Congratulations !

Enjoy attending conferences worldwide

Alternate Solution

• You are describing a workflow

• There is a perfectly fine standard: BPMN

  • Business Process Model and Notation

Task 1 Task 2

<process id="Minimal" name="Minimal Process">

<startEvent id="_1" name="Start"/>

<scriptTask id="_2" name="Hello">

<script>System.out.println("Hello World");</script>

</scriptTask>

<endEvent id="_3" name="End">

<terminateEventDefinition/>

</endEvent>

<sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>


</process>

https://github.com/evacchi/ypaat

[Diagram: Start → Hello → End]

Downside: Nobody will invite you to their conference to talk about BPM.

Unless you trick them.

Bonuses for choosing BPMN

• Standard XML-based serialization format

• that's not the bonus

• There is standard tooling to validate and parse

• that is a bonus

• Moreover:

  • Different types of nodes included in the main spec

• Optional spec for laying out nodes on a diagram

Goals

• Read a BPMN workflow

• Execute that workflow

• Visualize that workflow

Step 1: Recognize your compilation phase

What's a compilation phase?

• It's your setup phase.

• You do it only once before the actual processing begins

Configuring the application

• Problem. Use config values from a file/env vars/etc

• Do you validate config values each time you read them?

• Compile-time:

  • Read config values into a validated data structure

• Run-time:

  • Use validated config values
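The compile-time/run-time split for configuration could look roughly like the sketch below. `AppConfig`, its `host` and `port` fields, and the default port are hypothetical names chosen for illustration; they do not come from the talk's source code.

```java
import java.util.Map;

// Hypothetical validated configuration: every check runs once, in compile().
final class AppConfig {
    final String host;
    final int port;

    private AppConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // "Compile-time": parse and validate the raw string values exactly once.
    static AppConfig compile(Map<String, String> raw) {
        String host = raw.get("host");
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getOrDefault("port", "8080").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new AppConfig(host.trim(), port);
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        AppConfig config = AppConfig.compile(Map.of("host", " localhost ", "port", "9000"));
        // "Run-time": typed values are used directly, with no re-validation.
        System.out.println(config.host + ":" + config.port);
    }
}
```

Because the constructor is private, the only way to obtain an `AppConfig` is through `compile`, so run-time code never sees an unvalidated value.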

Data Transformation Pipeline

• Problem. Manipulate data to produce analytics

• Compile-time:

  • Define transformations (e.g. map, filter, etc. operations)

  • Decide the execution plan (local, distributed, etc.)

• Run-time:

  • Evaluate the execution plan
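A minimal sketch of the same idea for a data pipeline: transformations are recorded first and evaluated only when data arrives. The `Pipeline` class and its methods are hypothetical, and the "execution plan" here is simply an in-memory list of steps, a deliberate simplification of what a real engine would decide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pipeline: steps are recorded up front and evaluated later.
final class Pipeline {
    private final List<Function<Stream<Integer>, Stream<Integer>>> steps = new ArrayList<>();

    // "Compile-time": record a map step without touching any data.
    Pipeline map(Function<Integer, Integer> f) {
        steps.add(s -> s.map(f));
        return this;
    }

    // "Compile-time": record a filter step without touching any data.
    Pipeline filter(Predicate<Integer> p) {
        steps.add(s -> s.filter(p));
        return this;
    }

    // "Run-time": evaluate the recorded plan against actual input.
    List<Integer> run(List<Integer> input) {
        Stream<Integer> stream = input.stream();
        for (Function<Stream<Integer>, Stream<Integer>> step : steps) {
            stream = step.apply(stream);
        }
        return stream.collect(Collectors.toList());
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Pipeline plan = new Pipeline().map(x -> x * 2).filter(x -> x > 4);
        // The same plan can be evaluated many times on different inputs.
        System.out.println(plan.run(List.of(1, 2, 3, 4)));
        System.out.println(plan.run(List.of(10, 0)));
    }
}
```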

Example: BPMN Execution

• Problem. Execute a workflow description.

• Compile-time:

  • Read BPMN into a visitable structure (StartEvent)

• Run-time:

  • Visit the structure

  • For each node, execute tasks

Example: BPMN Visualization

• Problem. Visualize a workflow diagram.

• Compile-time:

  • Read BPMN into a graph

• Run-time:

  • For each node and edge, draw on a canvas

Read BPMN into a Data Structure

• Full XML Schema Definition* is automatically mapped onto Java classes, validated against schema constraints

TDefinitions tdefs = JAXB.unmarshal(resource, TDefinitions.class);

* Yes kids, we have working schemas

BPMN: From Tree to Graph

• No ordering imposed on the description

Forward References

<definitions>
  <process id="Minimal" name="Minimal Process">
    <startEvent id="_1" name="Start"/>
    <scriptTask id="_2" name="Hello">
      <script>System.out.println("Hello World");</script>
    </scriptTask>
    <endEvent id="_3" name="End">
      <terminateEventDefinition/>
    </endEvent>
    ...
  </process>
  <bpmndi:BPMNDiagram>
    <bpmndi:BPMNPlane bpmnElement="SubProcess">
      <bpmndi:BPMNShape bpmnElement="_1">
        <dc:Bounds x="11" y="30" width="48" height="48"/>
      </bpmndi:BPMNShape>
      ...
      <bpmndi:BPMNEdge bpmnElement="_1-_2">
        <di:waypoint x="35" y="50"/>
        ...
      </bpmndi:BPMNEdge>
      ...
    </bpmndi:BPMNPlane>
  </bpmndi:BPMNDiagram>
</definitions>

Separate Layout Definition

Step 2: Work like a compiler

Compiling a programming language

• You start from a text representation of a program

• The text representation is fed to a parser

• The parser returns a parse tree

• The parse tree is refined into an abstract syntax tree (AST)

• The AST is further refined through intermediate representations (IRs)

• Up until the final representation is returned

What makes a compiler a proper compiler

• Not optimization

• Compilation Phases

• You can have as many as you like

Example. A Configuration File

1 Read file from (class)path

2 Unmarshal file into a typed object

3 Sanitize values

4 Validate values

5 Coerce to typed values

Example. Produce a Report

1 Fetch data from different sources

2 Discard invalid values

3 Merge into single data stream

4 Compute aggregates (sums, avgs, etc.)

5 Generate synthesis data structure

Example. A Workflow Engine

1 Read BPMN file

2 Collect nodes

3 Collect edges

4 Prepare for visit/layout

Start EndHello

Compilation Phases

• Better separation of concerns

• Better testability

• You can test each intermediate result

• You can choose when and where each phase gets evaluated

• More Requirements = More Phases !

Phase vs Pass

• Many phases do not necessarily mean as many passes

• You could do several phases in one pass

• Logically phases are still distinct

One Pass vs. Multi-Pass

for value in config:
    sanitized = sanitize(value)
    validated = validate(sanitized)
    coerced = coerce(validated)

for value in config:
    sanitized += sanitize(value)
for value in sanitized:
    validated += validate(value)
for value in validated:
    coerced += coerce(value)

Myth: one pass doing many things is better than doing many passes, each doing one thing

It is not: Complexity

for value in config:
    sanitized = sanitize(value)
    validated = validate(sanitized)
    coerced = coerce(validated)

n times:
    sanitize = 1 op
    validate = 1 op
    coerce = 1 op

(1 op + 1 op + 1 op) × n = 3n

for value in config:
    sanitized += sanitize(value)
for value in sanitized:
    validated += validate(value)
for value in validated:
    coerced += coerce(value)

n times: sanitize = n ops
n times: validate = n ops
n times: coerce = n ops

(n + n + n) = 3n
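To make the comparison concrete, here is a small sketch of both styles in Java. The `sanitize`, `validate` and `coerce` bodies are placeholder assumptions (trim, reject empty strings, parse an integer); only the shape of the two loops matters.

```java
import java.util.List;
import java.util.stream.Collectors;

class Passes {
    // Placeholder phases: the real logic would depend on the configuration format.
    static String sanitize(String raw) {
        return raw.trim();
    }

    static String validate(String value) {
        if (value.isEmpty()) {
            throw new IllegalArgumentException("empty value");
        }
        return value;
    }

    static int coerce(String value) {
        return Integer.parseInt(value);
    }

    // One pass: each value goes through all three phases before the next is read.
    static List<Integer> onePass(List<String> config) {
        return config.stream()
                .map(v -> coerce(validate(sanitize(v))))
                .collect(Collectors.toList());
    }

    // Three passes: each phase consumes the complete output of the previous one.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = config.stream().map(Passes::sanitize).collect(Collectors.toList());
        List<String> validated = sanitized.stream().map(Passes::validate).collect(Collectors.toList());
        return validated.stream().map(Passes::coerce).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> config = List.of(" 1", "2 ", " 3 ");
        System.out.println(onePass(config));
        System.out.println(multiPass(config));
    }
}
```

Both methods perform 3n phase applications and return the same list. The multi-pass version additionally exposes `sanitized` and `validated` as intermediate results that can be inspected or tested on their own.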

Single-pass is not always possible

However, doing everything in one pass may be cumbersome or plain impossible

Forward References
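A sketch of why forward references push you toward two passes: when an edge may appear before the nodes it connects, the edge cannot be resolved until every node has been registered. The `Element` type and the `node`/`edge` helpers below are hypothetical stand-ins for the unmarshalled BPMN elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForwardRefs {
    // Hypothetical flat element: a node (id only) or an edge (source and target ids).
    static final class Element {
        final String id;
        final String source;
        final String target;

        private Element(String id, String source, String target) {
            this.id = id;
            this.source = source;
            this.target = target;
        }

        static Element node(String id) {
            return new Element(id, null, null);
        }

        static Element edge(String source, String target) {
            return new Element(source + "-" + target, source, target);
        }

        boolean isEdge() {
            return source != null;
        }
    }

    static Map<String, List<String>> build(List<Element> elements) {
        Map<String, List<String>> outgoing = new LinkedHashMap<>();
        // Pass 1: register every node, wherever it appears in the description.
        for (Element e : elements) {
            if (!e.isEdge()) {
                outgoing.put(e.id, new ArrayList<>());
            }
        }
        // Pass 2: resolve edges; by now all nodes are known.
        for (Element e : elements) {
            if (e.isEdge()) {
                if (!outgoing.containsKey(e.source) || !outgoing.containsKey(e.target)) {
                    throw new IllegalStateException("unknown node in edge " + e.id);
                }
                outgoing.get(e.source).add(e.target);
            }
        }
        return outgoing;
    }

    public static void main(String[] args) {
        // The first edge refers to nodes that are declared after it.
        List<Element> elements = List.of(
                Element.edge("_1", "_2"),
                Element.node("_1"),
                Element.node("_2"),
                Element.node("_3"),
                Element.edge("_2", "_3"));
        System.out.println(build(elements));
    }
}
```

A single loop would hit the `_1-_2` edge before either endpoint exists, so it would need to buffer unresolved edges, which is a second pass in disguise.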

Workflow Phases: Evaluation

var resource = getResourceAsStream("/example.bpmn2");

var tdefs = unmarshall(resource, TDefinitions.class);

var graphBuilder = new GraphBuilder();

// collect nodes on the builder

var nodeCollector = new NodeCollector(graphBuilder);

nodeCollector.visitFlowElements(tdefs.getFlowElements());

// collect edges on the builder

var edgeCollector = new EdgeCollector(graphBuilder);

edgeCollector.visitFlowElements(tdefs.getFlowElements());


// prepare graph for visit

var engineGraph = EngineGraph.of(graphBuilder);

// “interpret” the graph

var engine = new Engine(engineGraph);

engine.eval();

Workflow Phases: Layout

<?xml version="1.0" encoding="UTF-8"?>
<definitions ...>
  <process ...>
    ...
  </process>
  <bpmndi:BPMNDiagram>
    <bpmndi:BPMNPlane bpmnElement="SubProcess">
      ...
  </bpmndi:BPMNDiagram>
</definitions>


var resource = getResourceAsStream("/example.bpmn2");

var tdefs = unmarshall(resource, TDefinitions.class);

var graphBuilder = new GraphBuilder();


// collect nodes on the builder

var nodeCollector = new NodeCollector(graphBuilder);

nodeCollector.visitFlowElements(tdefs.getFlowElements());

// collect edges on the builder

var edgeCollector = new EdgeCollector(graphBuilder);

edgeCollector.visitFlowElements(tdefs.getFlowElements());

// extract layout information

var extractor = new LayoutExtractor();

extractor.visit(tdefs);

var index = extractor.index();

// “compile” into buffered image

var canvas = new Canvas(graphBuilder, index);

var bufferedImage = canvas.eval();

Visitors

Data Structures

TFlowElement
 |
 +---- StartEventNode
 |
 +---- EndEventNode
 |
 `---- ScriptTask

Pattern Matching

nodeCollector.visit(node)

def visit(node: TFlowElement) = {
  node match {
    case StartEventNode(...) => ...
    case EndEventNode(...) => ...
    case ScriptTask(...) => ...
  }
}

The Poor Man's Alternatives

interface Visitor {
    void visit(TFlowElement el);
    void visit(TStartEventNode start);
    void visit(TEndEventNode end);
    void visit(TScriptTask task);
}

interface Visitable {
    void accept(Visitor v);
}

if (node instanceof StartEventNode) {
    StartEventNode evt = (StartEventNode) node;
    ...
} else if (node instanceof EndEventNode) {
    EndEventNode evt = (EndEventNode) node;
    ...
} else if (node instanceof ScriptTask) {
    ScriptTask evt = (ScriptTask) node;
    ...
}

Visitor Pattern

class NodeCollector implements Visitor {
    void visit(TStartEventNode start) {
        graphBuilder.add(new StartEventNode(start.getId(), start));
    }
    void visit(TEndEvent evt) {
        graphBuilder.add(new EndEventNode(evt.getId(), evt));
    }
    void visit(TScriptTask task) {
        graphBuilder.add(new ScriptTaskNode(task.getId(), task));
    }
}

class EdgeCollector implements Visitor {
    void visit(TSequenceFlow seq) {
        graphBuilder.addEdge(seq.getId(), seq.getSourceRef(), seq.getTargetRef());
    }
}
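For readers who have not used the pattern, the following self-contained sketch shows the double dispatch that makes it work without an `instanceof` chain. All type names (`Start`, `Script`, `End`, `Recorder`) are simplified, hypothetical stand-ins and not the classes from the talk's repository.

```java
import java.util.ArrayList;
import java.util.List;

class VisitorDemo {
    // One visit method per concrete element type.
    interface Visitor {
        void visit(Start start);
        void visit(Script script);
        void visit(End end);
    }

    // Each element selects the matching overload by calling back with 'this'.
    interface Element {
        void accept(Visitor v);
    }

    static final class Start implements Element {
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class Script implements Element {
        final String body;
        Script(String body) { this.body = body; }
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class End implements Element {
        public void accept(Visitor v) { v.visit(this); }
    }

    // A visitor that records what it saw, standing in for a node collector.
    static final class Recorder implements Visitor {
        final List<String> seen = new ArrayList<>();
        public void visit(Start start) { seen.add("start"); }
        public void visit(Script script) { seen.add("script:" + script.body); }
        public void visit(End end) { seen.add("end"); }
    }

    static List<String> record(List<Element> elements) {
        Recorder recorder = new Recorder();
        for (Element element : elements) {
            element.accept(recorder); // double dispatch, no instanceof chain
        }
        return recorder.seen;
    }

    public static void main(String[] args) {
        System.out.println(record(List.of(new Start(), new Script("hello"), new End())));
    }
}
```

Adding a new phase means adding a new `Visitor` implementation; the element classes stay untouched.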


Step 3: Choose a run-time representation

Workflow Evaluation

• Choose a representation suitable for evaluation

• In our case, for each node, we need to get the outgoing edges with the next node to visit

• The most convenient representation of the graph is adjacency lists

• adj( p ) = { q | ( p, q ) ∈ edges }


...

// prepare graph for visit
var engineGraph = EngineGraph.of(graphBuilder);

// decorate with an evaluator
var engine = new Engine(engineGraph);

// evaluate the graph by visiting once more
engine.eval();

Map<Node, List<Node>> outgoing;

Workflow Evaluation

• The most convenient representation of the graph is adjacency lists

• adj( p ) ↦ { q | ( p, q ) ∈ edges }

• Map<Node, List<Node>> outgoing
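As an illustration, the adjacency-list representation and a walk over it could be sketched as follows. Nodes are plain strings here rather than the `Node` type above, and the `adjacency` and `walk` helpers are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdjacencyDemo {
    // Build adj(p) = { q | (p, q) in edges } from (source, target) pairs.
    static Map<String, List<String>> adjacency(List<String[]> edges) {
        Map<String, List<String>> outgoing = new LinkedHashMap<>();
        for (String[] edge : edges) {
            outgoing.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            // Make sure targets with no outgoing edges still have an entry.
            outgoing.computeIfAbsent(edge[1], k -> new ArrayList<>());
        }
        return outgoing;
    }

    // Visit each reachable node once by following outgoing edges.
    static List<String> walk(Map<String, List<String>> outgoing, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (!visited.add(node)) {
                continue; // already evaluated
            }
            order.add(node);
            for (String next : outgoing.getOrDefault(node, List.of())) {
                pending.push(next);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[] {"Start", "Hello"},
                new String[] {"Hello", "End"});
        System.out.println(walk(adjacency(edges), "Start"));
    }
}
```

Looking up the successors of a node is a single map access, which is exactly what the evaluation loop needs at every step.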

Evaluation

class Engine implements GraphVisitor {
    void visit(StartEventNode node) {
        logger.info("Process '{}' started.", graph.name());
        graph.outgoing(node).forEach(this::visit);
    }
    void visit(EndEventNode node) {
        logger.info("Process ended.");
        // no outgoing edges
    }
    void visit(ScriptTaskNode node) {
        logger.info("Evaluating script task: {}", node.element().getScript().getContent());
        graph.outgoing(node).forEach(this::visit);
    }
    ...
}


Workflow Layout

• In this case, for each node and edge, we need to get the shape and position

• No particular ordering is required

• e.g. first render edges and then shapes

<?xml version="1.0" encoding="UTF-8"?>
<definitions ...>
  <process ...>
    ...
  </process>
  <bpmndi:BPMNDiagram>
    <bpmndi:BPMNPlane bpmnElement="SubProcess">
      ...
  </bpmndi:BPMNDiagram>
</definitions>

var canvas = new Canvas(graph, index);
var bufferedImage = canvas.eval();

void eval() {
    graph.edges().forEach(this::draw);
    graph.nodes().forEach(this::visit);
}


Layout

class Canvas implements GraphVisitor {
    void draw(Edge edge) {
        var pts = index.edge(edge.id());
        setStroke(Color.BLACK);
        var left = pts.get(0);
        for (int i = 1; i < pts.size(); i++) {
            var right = pts.get(i);
            drawLine(left.x, left.y, right.x, right.y);
            left = right;
        }
    }
    void visit(StartEventNode node) {
        var shape = shapeOf(node);
        setStroke(Color.BLACK);
        setFill(Color.GREEN);
        drawEllipse(shape.x, shape.y, shape.width, shape.height);
        drawLabel(node.element().getName());
    }
    ...
}

Bonus Step 4: Generate code at compile-time

The Killer App

• Move pre-processing out of program run-time

• Generate code

• Run-time effectively consists only of pure processing
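A toy sketch of what "generate code" can mean in practice: instead of interpreting the workflow at run-time, a build step emits source code that already contains the result of the pre-processing. The `WorkflowCodegen` class and the shape of the generated class are purely illustrative assumptions; they are not how Drools, jBPM or Quarkus generate code.

```java
import java.util.List;

class WorkflowCodegen {
    // Escape characters that would break a Java string literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Emit the source of a class whose run() method prints each step in order.
    static String generate(String className, List<String> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        src.append("    public static void run() {\n");
        for (String step : steps) {
            src.append("        System.out.println(\"")
               .append(escape(step))
               .append("\");\n");
        }
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // The step order was decided at build time; the generated class only executes it.
        System.out.print(generate("MinimalProcess", List.of("Start", "Hello", "End")));
    }
}
```

The generated class contains no parsing, no graph building and no reflection, which is what makes this style a good fit for ahead-of-time compilation.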

AI and Automation Platform

• Drools rule engine

• jBPM workflow platform

• OptaPlanner constraint solver

The Submarine Initiative

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

Edsger W. Dijkstra

GraalVM: “One VM to Rule Them All”

• Polyglot VM with cross-language JIT

• Java Bytecode and JVM Languages

• Dynamic Languages (Truffle API)

• Native binary compilation (SubstrateVM)

Native Image: Restrictions

• Native binary compilation

• Restriction: “closed-world assumption”

• No dynamic code loading

• You must declare classes you want to reflect upon

Quarkus

Drools and jBPM

rule R1 when // constraints

$r : Result()

$p : Person( age >= 18 )

then // consequence

$r.setValue( $p.getName() + " can drink");

end

Drools

jBPM

Drools DRL

rule R1 when // constraints
    $r : Result()
    $p : Person( age >= 18 )
then // consequence
    $r.setValue( $p.getName() + " can drink");
end

var r = declarationOf(Result.class, "$r");
var p = declarationOf(Person.class, "$p");

var rule = rule("com.example", "R1").build(
    pattern(r),
    pattern(p)
        .expr("e", p -> p.getAge() >= 18,
            alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1, this::getAge, 18),
            reactOn("age")),
    on(p, r).execute(
        ($p, $r) -> $r.setValue($p.getName() + " can drink")));

jBPM

RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
factory.variable("item", new ObjectDataType("java.lang.String"));
factory.name("orderItems");
factory.packageName("com.myspace.demo");
factory.dynamic(false);
factory.version("1.0");
factory.visibility("Private");
factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
startNode1.name("Start");
startNode1.done();
org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
actionNode2.name("Show order details");
actionNode2.action(kcontext -> {

Startup Time

Conclusion

Take Aways

• Process in phases

• Do more in the pre-processing phase (compile-time)

• Do less during the processing phase (run-time)

• In other words, separate what you can do once from what you have to do repeatedly

• Move all or some of your phases to compile-time

Resources

• Full Source Code https://github.com/evacchi/ypaat

• Your Program as a Transpiler (part I)

• Improving Application Performance by Applying Compiler Design http://bit.ly/ypaat-performance

• Other resources

• Submarine https://github.com/kiegroup/submarine-examples

• Drools Blog http://blog.athico.com

• Crafting Interpreters http://craftinginterpreters.com

• GraalVM.org

• Quarkus.io

Edoardo Vacchi @evacchi

Q&A
