Your Program as a Transpiler
Applying Compiler Design
to Everyday Programming
About Me
• Edoardo Vacchi @evacchi
• Research @ University of Milan
• Research @ UniCredit R&D
• Drools and jBPM Team @ Red Hat
Motivation
• My first task in Red Hat: marshalling backend for jBPM
• Data model mapping
• From XML tree model to graph representation
• Apparently boring, but challenging in its own way
Motivation
• Language implementation is often seen as a dark art
• But some design patterns are simple at their core
• Best practices can be applied to everyday programming
Motivation (cont'd)
• Learning about language implementation will give you a
different angle to deal with many problems
• It will lead you to a better understanding of how GraalVM
and Quarkus do their magic
Goals
• Programs often have a pre-processing phase where you
prepare for execution
• Then, there is the actual execution phase
• Learn to recognize and structure the pre-processing phase
Transpilers
Transpilers vs. Compilers
• Compiler: translates code written in a language (source
code) into code written in a target language (object code).
The target language may be at a lower level of abstraction
• Transpiler: translates code written in a language into
code written in another language at the same level of
abstraction (Source-to-Source Translator).
Are transpilers simpler than compilers?
• "Lower-level languages are complex"
• They are not: if anything, they are simple
• "Syntactic sugar is not a higher level of abstraction"
• It is: a concise construct is expanded at compile-time
• "Proper compilers do low-level optimizations"
• You are thinking of optimizing compilers
The distinction is moot
• It is pretty easy to write a crappy compiler, call it a
transpiler and feel at peace with yourself
• Writing a good transpiler is no different or harder than
writing a good compiler
• So, how do you write a good compiler?
Your Program as a Compiler
Applying Compiler Design
to Everyday Programming
Compiler-like workflows
• At least two classes of problems can be solved with
compiler-like workflows
• Boot time optimization problems
• Data transformation problems
Running Example
Function Orchestration
• You are building an immutable Dockerized serverless
function
[Diagram: two functions, f and g]
Function Orchestration
• Problem
• No standard* way to describe function orchestration yet
* Yes, I know about https://github.com/cncf/wg-serverless
process:
  elements:
    - start: &_1
        name: Start
    - function: &_2
        name: Hello
    - end: &_3
        name: End
    - edge:
        source: *_1
        target: *_2
    - edge:
        source: *_2
        target: *_3
[Diagram: Start → Hello → End]
Solution: Roll your own YAML format
Congratulations!
Enjoy attending conferences worldwide
Alternate Solution
• You are describing a workflow
• There is a perfectly fine standard: BPMN
• Business Process Model and Notation
[Diagram: Task 1 → Task 2]
<process id="Minimal" name="Minimal Process">
<startEvent id="_1" name="Start"/>
<scriptTask id="_2" name="Hello">
<script>System.out.println("Hello World");</script>
</scriptTask>
<endEvent id="_3" name="End">
<terminateEventDefinition/>
</endEvent>
<sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
<sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
</process>
https://github.com/evacchi/ypaat
Downside: Nobody will invite you to
their conference to talk about BPM.
Unless you trick them.
Bonuses for choosing BPMN
• Standard XML-based serialization format
• that's not the bonus
• There is standard tooling to validate and parse
• that is a bonus
• Moreover:
• Different types of nodes included in the main spec
• Optional spec for laying out nodes on a diagram
Goals
• Read a BPMN workflow
• Execute that workflow
• Visualize that workflow
Step 1: Recognize your compilation phase
What's a compilation phase?
• It's your setup phase.
• You do it only once before the actual processing begins
Configuring the application
• Problem. Use config values from a file/env vars/etc
• Do you validate config values each time you read them?
• Compile-time:
• Read config values into a validated data structure
• Run-time:
• Use validated config values
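A minimal sketch of this split in Java, assuming configuration arrives as a `java.util.Properties` object. The `AppConfig` record, the `server.host`/`server.port` keys and the validation rules are all hypothetical. Every check runs once, inside `compile`; run-time code only ever receives an already-validated, typed value.

```java
import java.util.Properties;

// Hypothetical validated configuration: if an instance exists, its values are valid.
record AppConfig(String host, int port) {

    // "Compile-time": read raw values once, then sanitize, coerce and validate them.
    static AppConfig compile(Properties raw) {
        String host = raw.getProperty("server.host", "localhost").trim(); // sanitize
        if (host.isEmpty()) {
            throw new IllegalArgumentException("server.host must not be empty");
        }
        int port;
        try {
            port = Integer.parseInt(raw.getProperty("server.port", "8080").trim()); // coerce
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("server.port is not an integer", e);
        }
        if (port < 1 || port > 65535) { // validate
            throw new IllegalArgumentException("server.port out of range: " + port);
        }
        return new AppConfig(host, port);
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        Properties raw = new Properties();
        raw.setProperty("server.port", " 9090 ");
        AppConfig config = AppConfig.compile(raw); // done once, at startup
        // "Run-time": just use the typed values, no re-checking.
        System.out.println(config.host() + ":" + config.port()); // prints localhost:9090
    }
}
```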
Data Transformation Pipeline
• Problem. Manipulate data to produce analytics
• Compile-time:
• Define transformations (e.g. map, filter, etc. operations)
• Decide the execution plan (local, distributed, etc.)
• Run-time:
• Evaluate the execution plan
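Java streams happen to follow the same split, which makes them a convenient illustration: intermediate operations such as `filter` and `map` only describe the pipeline, and nothing is evaluated until a terminal operation such as `collect` runs. The `PipelineDemo` class and its call counter are hypothetical, added only to make the laziness observable.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineDemo {
    // Counts how many times the map function actually runs.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // "Compile-time": describe the transformations. Intermediate stream
    // operations are lazy, so no element is processed here.
    static Stream<Integer> plan(List<Integer> source) {
        return source.stream()
                .filter(x -> x >= 0) // discard invalid values
                .map(x -> {
                    mapCalls.incrementAndGet();
                    return x * 10;
                });
    }

    public static void main(String[] args) {
        Stream<Integer> plan = plan(List.of(1, -2, 3));
        System.out.println("map calls before eval: " + mapCalls.get()); // prints 0
        // "Run-time": the terminal operation evaluates the plan.
        List<Integer> result = plan.collect(Collectors.toList());
        System.out.println(result + ", map calls: " + mapCalls.get()); // prints [10, 30], map calls: 2
    }
}
```

Calling `.parallel()` on the stream before the terminal operation is a small-scale example of deciding the execution plan (sequential vs. parallel) before anything is evaluated.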
Example: BPMN Execution
• Problem. Execute a workflow description.
• Compile-time:
• Read BPMN into a visitable structure (StartEvent)
• Run-time:
• Visit the structure
• For each node, execute tasks
Example: BPMN Visualization
• Problem. Visualize a workflow diagram.
• Compile-time:
• Read BPMN into a graph
• Run-time:
• For each node and edge, draw on a canvas
Read BPMN into a Data Structure
• Full XML Schema Definition* is automatically mapped
onto Java classes, validated against schema constraints
TDefinitions tdefs = JAXB.unmarshal( resource, TDefinitions.class);
* Yes kids, we have working schemas
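If you want the schema constraints enforced as an explicit, separate step, the JDK's `javax.xml.validation` API can do it; the resulting `Schema` object can also be attached to a JAXB `Unmarshaller` via `setSchema`. The sketch below uses a tiny, hypothetical schema rather than the real BPMN XSD, and compiles it once so that it can be reused for every document, in the same compile-time spirit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Toy schema (hypothetical, NOT the BPMN XSD): a <process> with a required id.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="process">
            <xs:complexType>
              <xs:attribute name="id" type="xs:ID" use="required"/>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Compile the schema once; reuse it for every document.
    static final Schema SCHEMA = compile(XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    // Returns true if the document satisfies the schema constraints.
    static boolean isValid(String xml) {
        try {
            Validator validator = SCHEMA.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<process id=\"Minimal\"/>")); // true
        System.out.println(isValid("<process/>"));                // false: id is required
    }
}
```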
BPMN: From Tree to Graph
• No ordering imposed
on the description
<process id="Minimal" name="Minimal Process">
<sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
<endEvent id="_3" name="End">
<terminateEventDefinition/>
</endEvent>
<sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
<scriptTask id="_2" name="Hello">
<script>System.out.println("Hello World");</script>
</scriptTask>
<startEvent id="_1" name="Start"/>
</process>
Forward References
<definitions>
<process id="Minimal" name="Minimal Process">
<startEvent id="_1" name="Start"/>
<scriptTask id="_2" name="Hello">
<script>System.out.println("Hello World");</script>
</scriptTask>
<endEvent id="_3" name="End">
<terminateEventDefinition/>
</endEvent>
<sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
<sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
</process>
<bpmndi:BPMNDiagram>
<bpmndi:BPMNPlane bpmnElement="SubProcess">
<bpmndi:BPMNShape bpmnElement="_1">
<dc:Bounds x="11" y="30" width="48" height="48"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNShape bpmnElement="_2">
<dc:Bounds x="193" y="30" width="80" height="48"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNShape bpmnElement="_3">
<dc:Bounds x="396" y="30" width="48" height="48"/>
</bpmndi:BPMNShape>
<bpmndi:BPMNEdge bpmnElement="_1-_2">
<di:waypoint x="35" y="50"/>
<di:waypoint x="229" y="50"/>
</bpmndi:BPMNEdge>
<bpmndi:BPMNEdge bpmnElement="_2-_3">
<di:waypoint x="229" y="50"/>
<di:waypoint x="441" y="50"/>
</bpmndi:BPMNEdge>
</bpmndi:BPMNPlane>
</bpmndi:BPMNDiagram>
</definitions>
Separate Layout Definition
Step 2: Work like a compiler
Compiling a programming language
• You start from a text representation of a program
• The text representation is fed to a parser
• The parser returns a parse tree
• The parse tree is refined into an abstract syntax tree (AST)
• The AST is further refined through intermediate representations (IRs)
• Up until the final representation is returned
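To make the pipeline concrete, here is a toy compiler for integer expressions such as `1 + 2 * 3`, written as distinct phases: lexing (text to tokens), parsing (tokens to an AST) and evaluation. Everything here (`TinyCompiler`, the `Expr` records) is a hypothetical illustration, and it skips the separate parse-tree and IR stages: the parser builds the AST directly.

```java
import java.util.ArrayList;
import java.util.List;

// AST: each node knows how to evaluate itself.
interface Expr { int eval(); }
record Num(int value) implements Expr { public int eval() { return value; } }
record Add(Expr left, Expr right) implements Expr { public int eval() { return left.eval() + right.eval(); } }
record Mul(Expr left, Expr right) implements Expr { public int eval() { return left.eval() * right.eval(); } }

class TinyCompiler {
    // Phase 1, lexing: text -> tokens (integers, "+" and "*").
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Phase 2, parsing: tokens -> AST, with "*" binding tighter than "+".
    private final List<String> tokens;
    private int pos = 0;

    private TinyCompiler(List<String> tokens) { this.tokens = tokens; }

    static Expr parse(List<String> tokens) {
        TinyCompiler parser = new TinyCompiler(tokens);
        Expr expr = parser.sum();
        if (parser.pos != tokens.size()) throw new IllegalArgumentException("unexpected trailing tokens");
        return expr;
    }

    // Consumes the next token if it matches.
    private boolean next(String token) {
        if (pos < tokens.size() && tokens.get(pos).equals(token)) { pos++; return true; }
        return false;
    }

    private Expr sum() {
        Expr left = product();
        while (next("+")) left = new Add(left, product());
        return left;
    }

    private Expr product() {
        Expr left = number();
        while (next("*")) left = new Mul(left, number());
        return left;
    }

    private Expr number() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("expected a number");
        return new Num(Integer.parseInt(tokens.get(pos++)));
    }

    // Phase 3, evaluation: run the final representation.
    static int run(String src) { return parse(lex(src)).eval(); }
}
```

Each phase has its own input and output type, so each intermediate result (the token list, the AST) can be inspected and tested on its own.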
What makes a compiler a proper compiler
• Not optimization
• Compilation Phases
• You can have as many as you like
Example. A Configuration File
1 Read file from (class)path
2 Unmarshall file into a typed object
3 Sanitize values
4 Validate values
5 Coerce to typed values
Example. Produce a Report
1 Fetch data from different sources
2 Discard invalid values
3 Merge into single data stream
4 Compute aggregates (sums, avgs, etc.)
5 Generate synthesis data structure
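A sketch of the report example with one function per phase, so that each intermediate result can be tested on its own. The `Report` record, the rule that negative or `null` readings are invalid, and the assumption that phase 1 (fetching) has already produced in-memory lists are all hypothetical.

```java
import java.util.List;

// Hypothetical synthesis structure produced by the last phase.
record Report(int count, double sum, double average) {}

class ReportPipeline {
    // Phase 2: discard invalid values (here, by assumption: null or negative).
    static List<Double> discardInvalid(List<Double> raw) {
        return raw.stream().filter(v -> v != null && v >= 0).toList();
    }

    // Phase 3: merge several sources into a single data stream.
    static List<Double> merge(List<List<Double>> sources) {
        return sources.stream().flatMap(List::stream).toList();
    }

    // Phases 4 and 5: compute aggregates, then build the synthesis structure.
    static Report summarize(List<Double> values) {
        double sum = values.stream().mapToDouble(Double::doubleValue).sum();
        double average = values.isEmpty() ? 0.0 : sum / values.size();
        return new Report(values.size(), sum, average);
    }

    // Phase 1 (fetching from the sources) is assumed to have already happened.
    static Report run(List<List<Double>> fetched) {
        List<List<Double>> cleaned = fetched.stream().map(ReportPipeline::discardInvalid).toList();
        return summarize(merge(cleaned));
    }
}
```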
Example. A Workflow Engine
1 Read BPMN file
2 Collect nodes
3 Collect edges
4 Prepare for visit/layout
Compilation Phases
• Better separation of concerns
• Better testability
• You can test each intermediate result
• You can choose when and where each phase gets evaluated
• More Requirements = More Phases!
Phase vs Pass
• Many phases do not necessarily mean as many passes
• You could do several phases in one pass
• Logically phases are still distinct
One Pass vs. Multi-Pass
for value in config:
    sanitized = sanitize(value)
    validated = validate(sanitized)
    coerced = coerce(validated)

for value in config:
    sanitized += sanitize(value)
for value in sanitized:
    validated += validate(value)
for value in validated:
    coerced += coerce(value)
Myth: one pass doing many things is better than many passes, each doing one thing
It is not. Compare the complexity:
• One pass: the loop runs n times, and each iteration does sanitize (1 op) + validate (1 op) + coerce (1 op), so (1 + 1 + 1) × n = 3n ops
• Multi-pass: three loops, each running n times with 1 op per element (sanitize: n ops, validate: n ops, coerce: n ops), so n + n + n = 3n ops
• Both are O(n): splitting the work into passes does not change the asymptotic cost
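Both shapes can be written in Java to check that they compute the same thing. The phase functions are hypothetical stand-ins; both versions make 3n calls in total, and the multi-pass version additionally keeps each intermediate list, which is what makes every phase inspectable.

```java
import java.util.ArrayList;
import java.util.List;

class Passes {
    // Hypothetical phase functions, for illustration only.
    static String sanitize(String v) { return v.trim(); }

    static String validate(String v) {
        if (v.isEmpty()) throw new IllegalArgumentException("empty value");
        return v;
    }

    static int coerce(String v) { return Integer.parseInt(v); }

    // One pass: all three phases interleaved in a single loop, (1 + 1 + 1) x n calls.
    static List<Integer> onePass(List<String> config) {
        List<Integer> coerced = new ArrayList<>();
        for (String value : config) {
            coerced.add(coerce(validate(sanitize(value))));
        }
        return coerced;
    }

    // Multi-pass: one loop per phase, n + n + n calls, with inspectable intermediate lists.
    static List<Integer> multiPass(List<String> config) {
        List<String> sanitized = new ArrayList<>();
        for (String value : config) sanitized.add(sanitize(value));
        List<String> validated = new ArrayList<>();
        for (String value : sanitized) validated.add(validate(value));
        List<Integer> coerced = new ArrayList<>();
        for (String value : validated) coerced.add(coerce(value));
        return coerced;
    }
}
```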
Single-pass is not always possible
However, doing one pass may be cumbersome or plain impossible
<process id="Minimal" name="Minimal Process">
<sequenceFlow id="_1-_2" sourceRef="_1" targetRef="_2"/>
<endEvent id="_3" name="End">
<terminateEventDefinition/>
</endEvent>
<sequenceFlow id="_2-_3" sourceRef="_2" targetRef="_3"/>
<scriptTask id="_2" name="Hello">
<script>System.out.println("Hello World");</script>
</scriptTask>
<startEvent id="_1" name="Start"/>
</process>
Forward References
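The usual way around forward references is two passes: first collect every declared node by id, then resolve the references. The element records below are hypothetical simplifications of the JAXB types. Fed the elements in the same order as the XML above (the flow `_1-_2` precedes the declaration of `_2`), a single pass would hit an unknown id; two passes do not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified flow elements, in document order.
interface Element {}
record NodeDecl(String id, String name) implements Element {}
record FlowDecl(String id, String sourceRef, String targetRef) implements Element {}

// A resolved edge points at actual node objects, not at string ids.
record ResolvedEdge(String id, NodeDecl source, NodeDecl target) {}

class TwoPassResolver {
    static List<ResolvedEdge> resolve(List<Element> elements) {
        // Pass 1: collect every node by id, wherever it appears in the document.
        Map<String, NodeDecl> nodes = new LinkedHashMap<>();
        for (Element e : elements) {
            if (e instanceof NodeDecl n) nodes.put(n.id(), n);
        }
        // Pass 2: every reference can now be resolved, including forward ones.
        List<ResolvedEdge> edges = new ArrayList<>();
        for (Element e : elements) {
            if (e instanceof FlowDecl f) {
                edges.add(new ResolvedEdge(f.id(), lookup(nodes, f.sourceRef()), lookup(nodes, f.targetRef())));
            }
        }
        return edges;
    }

    private static NodeDecl lookup(Map<String, NodeDecl> nodes, String id) {
        NodeDecl n = nodes.get(id);
        if (n == null) throw new IllegalArgumentException("dangling reference: " + id);
        return n;
    }
}
```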
Workflow Phases: Evaluation
var resource = getResourceAsStream("/example.bpmn2");
var tdefs = unmarshall(resource, TDefinitions.class);
var graphBuilder = new GraphBuilder();
// collect nodes on the builder
var nodeCollector = new NodeCollector(graphBuilder);
nodeCollector.visitFlowElements(tdefs.getFlowElements());
// collect edges on the builder
var edgeCollector = new EdgeCollector(graphBuilder);
edgeCollector.visitFlowElements(tdefs.getFlowElements());
// prepare graph for visit
var engineGraph = EngineGraph.of(graphBuilder);
// "interpret" the graph
var engine = new Engine(engineGraph);
engine.eval();
Workflow Phases: Layout
<?xml version="1.0" encoding="UTF-8"?>
<definitions ...>
<process id="Minimal" name="Minimal Process">
<startEvent id="_1" name="Start"/>
...
</process>
<bpmndi:BPMNDiagram>
<bpmndi:BPMNPlane bpmnElement="SubProcess">
<bpmndi:BPMNShape bpmnElement="_1">
<dc:Bounds x="11" y="30" width="48" height="48"/>
...
</bpmndi:BPMNDiagram>
</definitions>
var resource = getResourceAsStream("/example.bpmn2");
var tdefs = unmarshall(resource, TDefinitions.class);
var graphBuilder = new GraphBuilder();
// collect nodes on the builder
var nodeCollector = new NodeCollector(graphBuilder);
nodeCollector.visitFlowElements(tdefs.getFlowElements());
// collect edges on the builder
var edgeCollector = new EdgeCollector(graphBuilder);
edgeCollector.visitFlowElements(tdefs.getFlowElements());
// extract layout information
var extractor = new LayoutExtractor();
extractor.visit(tdefs);
var index = extractor.index();
// "compile" into buffered image
var canvas = new Canvas(graphBuilder, index);
var bufferedImage = canvas.eval();
Visitors
Data Structures
TFlowElement
|
+---- StartEventNode
|
+---- EndEventNode
|
`---- ScriptTask
Pattern Matching
nodeCollector.visit(node)
def visit(node: TFlowElement) = {
  node match {
    case StartEventNode(...) => ...
    case EndEventNode(...) => ...
    case ScriptTask(...) => ...
  }
}
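On Java 21 or later, sealed types plus pattern matching for `switch` give roughly the same shape as the Scala `match`. The node records here are hypothetical stand-ins for the graph node classes. Because the interface is sealed, forgetting a case is a compile-time error rather than a silently ignored node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node hierarchy; `sealed` lets the compiler check exhaustiveness.
sealed interface FlowNode permits StartEventNode, EndEventNode, ScriptTaskNode {}
record StartEventNode(String id) implements FlowNode {}
record EndEventNode(String id) implements FlowNode {}
record ScriptTaskNode(String id, String script) implements FlowNode {}

class PatternNodeCollector {
    final List<String> collected = new ArrayList<>();

    void visit(FlowNode node) {
        // No default branch: a missing case would not compile.
        switch (node) {
            case StartEventNode s -> collected.add("start " + s.id());
            case EndEventNode e -> collected.add("end " + e.id());
            case ScriptTaskNode t -> collected.add("script " + t.id() + ": " + t.script());
        }
    }
}
```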
The Poor Man's Alternatives
interface Visitor {
    void visit(TFlowElement el);
    void visit(TStartEvent start);
    void visit(TEndEvent end);
    void visit(TScriptTask task);
    void visit(TSequenceFlow flow);
}
interface Visitable {
    void accept(Visitor v);
}
if (node instanceof StartEventNode) {
    StartEventNode evt = (StartEventNode) node;
    ...
} else if (node instanceof EndEventNode) {
    EndEventNode evt = (EndEventNode) node;
    ...
} else if (node instanceof ScriptTask) {
    ScriptTask evt = (ScriptTask) node;
    ...
}
Visitor Pattern
class NodeCollector implements Visitor {
void visit(TStartEvent evt) {
graphBuilder.add(
new StartEventNode(evt.getId(), evt));
}
void visit(TEndEvent evt) {
graphBuilder.add(
new EndEventNode(evt.getId(), evt));
}
void visit(TScriptTask task) {
graphBuilder.add(
new ScriptTaskNode(task.getId(), task));
}
}
class EdgeCollector implements Visitor {
void visit(TSequenceFlow seq) {
graphBuilder.addEdge(
seq.getId(),
seq.getSourceRef(),
seq.getTargetRef());
}
}
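For reference, a complete, compilable version of the classic Visitor with double dispatch. The collectors above omit the `accept` side; here every element calls back the `visit` overload matching its own static type. The element types are hypothetical simplifications of the JAXB classes, and each visitor acts only on the elements its phase cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element types standing in for the JAXB-generated classes.
interface Visitor {
    void visit(StartEvent e);
    void visit(EndEvent e);
    void visit(SequenceFlow f);
}

interface Visitable {
    void accept(Visitor v);
}

// Double dispatch: the static type of `this` selects the right overload.
record StartEvent(String id) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}
record EndEvent(String id) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}
record SequenceFlow(String id, String sourceRef, String targetRef) implements Visitable {
    public void accept(Visitor v) { v.visit(this); }
}

// Phase 1: collect node ids, ignore edges.
class NodeIds implements Visitor {
    final List<String> ids = new ArrayList<>();
    public void visit(StartEvent e) { ids.add(e.id()); }
    public void visit(EndEvent e) { ids.add(e.id()); }
    public void visit(SequenceFlow f) { /* not a node: ignore */ }
}

// Phase 2: collect edges, ignore nodes.
class EdgePairs implements Visitor {
    final List<String> edges = new ArrayList<>();
    public void visit(StartEvent e) { }
    public void visit(EndEvent e) { }
    public void visit(SequenceFlow f) { edges.add(f.sourceRef() + "->" + f.targetRef()); }
}
```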
Step 3: Choose a run-time representation
Workflow Evaluation
• Choose a representation suitable for
evaluation
• In our case, for each node, we need to get
the outgoing edges with the next node to
visit
• The most convenient representation of
the graph is adjacency lists
• $\mathrm{adj}(p) = \{\, q \mid (p, q) \in \mathit{edges} \,\}$
var graphBuilder = new GraphBuilder();
...
// prepare graph for visit
var engineGraph =
EngineGraph.of(graphBuilder);
// decorate with an evaluator
var engine =
new Engine(engineGraph);
// evaluate the graph by visiting once more
engine.eval();
Map<Node, List<Node>> outgoing;
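A minimal adjacency-list graph with a depth-first walk, assuming nodes are identified by their string ids. `AdjacencyGraph` is a hypothetical class, not the `EngineGraph` from the repository. Looking up adj(p) is a single map access, which is exactly what the evaluator needs at each step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal run-time graph: node ids mapped to their adjacency lists.
class AdjacencyGraph {
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    void addNode(String id) { outgoing.putIfAbsent(id, new ArrayList<>()); }

    // Records the edge (p, q): q is appended to adj(p).
    void addEdge(String p, String q) {
        addNode(p);
        addNode(q);
        outgoing.get(p).add(q);
    }

    // adj(p): one map lookup, no scan over all edges.
    List<String> outgoing(String p) { return outgoing.getOrDefault(p, List.of()); }

    // Depth-first walk from `start`, visiting outgoing nodes like an engine would.
    List<String> walk(String start) {
        List<String> visited = new ArrayList<>();
        walk(start, visited);
        return visited;
    }

    private void walk(String node, List<String> visited) {
        if (visited.contains(node)) return; // guard against cycles
        visited.add(node);
        for (String next : outgoing(node)) walk(next, visited);
    }
}
```

Unlike the `Engine` on the next slide, this walk skips nodes it has already visited, so a cyclic graph does not recurse forever; a real workflow engine would need a deliberate policy for loops instead.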
Evaluation
class Engine implements GraphVisitor {
    void visit(StartEventNode node) {
        logger.info("Process '{}' started.", graph.name());
        graph.outgoing(node).forEach(this::visit);
    }
    void visit(EndEventNode node) {
        logger.info("Process ended.");
        // no outgoing edges
    }
    void visit(ScriptTaskNode node) {
        logger.info("Evaluating script task: {}",
            node.element().getScript().getContent());
        graph.outgoing(node).forEach(this::visit);
    }
    ...
}
Workflow Layout
• In this case, for each node and edge,
we need to get the shape and position
• No particular ordering is required
• e.g. first render edges and then shapes
<?xml version="1.0" encoding="UTF-8"?>
<definitions ...>
<process id="Minimal" name="Minimal Process">
<startEvent id="_1" name="Start"/>
...
</process>
<bpmndi:BPMNDiagram> <bpmndi:BPMNPlane bpmnElement="SubProcess">
<bpmndi:BPMNShape bpmnElement="_1">
<dc:Bounds x="11" y="30" width="48" height="48"/>
... </bpmndi:BPMNDiagram>
</definitions>
var canvas = new Canvas(graph, index);
var bufferedImage = canvas.eval();
void eval() {
graph.edges().forEach(this::draw);
graph.nodes().forEach(this::visit);
}
Layout
class Canvas implements GraphVisitor {
    void draw(Edge edge) {
        var pts = index.edge(edge.id());
        setStroke(Color.BLACK);
        var left = pts.get(0);
        for (int i = 1; i < pts.size(); i++) {
            var right = pts.get(i);
            drawLine(left.x, left.y, right.x, right.y);
            left = right;
        }
    }
    void visit(StartEventNode node) {
        var shape = shapeOf(node);
        setStroke(Color.BLACK);
        setFill(Color.GREEN);
        drawEllipse(shape.x, shape.y, shape.width, shape.height);
        drawLabel(node.element().getName());
    }
    ...
}
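A rough equivalent of the `Canvas` using the standard `java.awt` 2D API, assuming the layout index has already been extracted into the hypothetical `Bounds` and `Point` records (mirroring `<dc:Bounds>` and `<di:waypoint>`). It makes the same ordering choice: edges first, then shapes, so that shapes are painted on top of the lines.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical layout index entries, mirroring <dc:Bounds> and <di:waypoint>.
record Bounds(int x, int y, int width, int height) {}
record Point(int x, int y) {}

class SimpleCanvas {
    static BufferedImage render(List<List<Point>> edges, List<Bounds> shapes, int w, int h) {
        BufferedImage image = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            // Edges first: one segment between each pair of consecutive waypoints.
            g.setColor(Color.BLACK);
            for (List<Point> pts : edges) {
                for (int i = 1; i < pts.size(); i++) {
                    Point a = pts.get(i - 1), b = pts.get(i);
                    g.drawLine(a.x(), a.y(), b.x(), b.y());
                }
            }
            // Shapes second, so they are painted over the edges.
            for (Bounds s : shapes) {
                g.setColor(Color.GREEN);
                g.fillOval(s.x(), s.y(), s.width(), s.height());
                g.setColor(Color.BLACK);
                g.drawOval(s.x(), s.y(), s.width(), s.height());
            }
        } finally {
            g.dispose();
        }
        return image;
    }
}
```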
[Rendered diagram: Start → Hello → End]
Bonus Step 4: Generate code at compile-time
The Killer App
• Move pre-processing out of program run-time
• Generate code
• Run-time effectively consists only of pure processing
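A deliberately naive sketch of the idea, assuming the graph walk has already produced the script tasks in execution order: instead of interpreting the workflow at run time, emit Java source in which the workflow is unrolled. `ProcessCodegen` and `ScriptStep` are hypothetical and much simpler than the generated jBPM code later in this talk; in practice a generator like this would run in a build step rather than inside the application.

```java
import java.util.List;

// Hypothetical input: script tasks already in execution order (e.g. from the graph walk).
record ScriptStep(String name, String script) {}

class ProcessCodegen {
    // Emits Java source with the workflow "unrolled": at run time there is
    // no XML parsing and no graph visit left, only the scripts themselves.
    // Note: names and scripts are pasted verbatim, with no escaping.
    static String generate(String className, List<ScriptStep> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static void run() {\n");
        for (ScriptStep step : steps) {
            src.append("        // ").append(step.name()).append('\n');
            src.append("        ").append(step.script()).append('\n');
        }
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }
}
```

For a single `Hello` task with the script `System.out.println("Hello World");`, `generate("MinimalProcess", ...)` returns a class whose `run()` method contains just a `// Hello` comment and that one statement.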
AI and Automation Platform
• Drools rule engine
• jBPM workflow platform
• OptaPlanner constraint solver
The Submarine Initiative
“The question of whether a computer can
think is no more interesting than the
question of whether a submarine can
swim.”
Edsger W. Dijkstra
GraalVM: “One VM to Rule Them All”
• Polyglot VM with cross-language JIT
• Java Bytecode and JVM Languages
• Dynamic Languages (Truffle API)
• Native binary compilation (SubstrateVM)
Native Image: Restrictions
• Native binary compilation
• Restriction: “closed-world assumption”
• No dynamic code loading
• You must declare classes you want to reflect upon
Quarkus
Drools and jBPM
rule R1 when // constraints
$r : Result()
$p : Person( age >= 18 )
then // consequence
$r.setValue( $p.getName() + " can drink");
end
Drools
jBPM
Drools DRL
rule R1 when
  // constraints
  $r : Result()
  $p : Person( age >= 18 )
then
  // consequence
  $r.setValue( $p.getName() + " can drink");
end
var r = declarationOf(Result.class, "$r");
var p = declarationOf(Person.class, "$p");
var rule = rule("com.example", "R1").build(
    pattern(r),
    pattern(p)
        .expr("e", person -> person.getAge() >= 18,
            alphaIndexedBy(int.class, GREATER_OR_EQUAL, 1, Person::getAge, 18),
            reactOn("age")),
    on(p, r).execute(($p, $r) ->
        $r.setValue($p.getName() + " can drink")));
jBPM
RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("demo.orderItems");
factory.variable("order", new ObjectDataType("com.myspace.demo.Order"));
factory.variable("item", new ObjectDataType("java.lang.String"));
factory.name("orderItems");
factory.packageName("com.myspace.demo");
factory.dynamic(false);
factory.version("1.0");
factory.visibility("Private");
factory.metaData("TargetNamespace", "http://www.omg.org/bpmn20");
org.jbpm.ruleflow.core.factory.StartNodeFactory startNode1 = factory.startNode(1);
startNode1.name("Start");
startNode1.done();
org.jbpm.ruleflow.core.factory.ActionNodeFactory actionNode2 = factory.actionNode(2);
actionNode2.name("Show order details");
actionNode2.action(kcontext -> {
Startup Time
Conclusion
Take Aways
• Process in phases
• Do more in the pre-processing phase (compile-time)
• Do less during the processing phase (run-time)
• In other words, separate what you can do once from what you
have to do repeatedly
• Move all or some of your phases to compile-time
Resources
• Full Source Code https://github.com/evacchi/ypaat
• Your Program as a Transpiler (part I)
• Improving Application Performance by Applying Compiler Design http://bit.ly/ypaat-performance
• Other resources
• Submarine https://github.com/kiegroup/submarine-examples
• Drools Blog http://blog.athico.com
• Crafting Interpreters http://craftinginterpreters.com
• GraalVM.org
• Quarkus.io
Edoardo Vacchi @evacchi
Q&A