Rust Parsing Robustness
Corey Richardson
Mozilla Intern 2014
Rust
Language in development by Mozilla Research.
Really awesome. Getting more awesome.
The Goal
Define a grammar.
Verify that what we implement is what we say we implement.
The Situation
lines of handwritten parser
No grammar
Everyone is sad :(
The Benefits
Rust gets a well-defined grammar.
Our parser gets a model we can cross-verify.
We gain confidence in our syntax.
The Snag
I had never done anything like this before!
I knew almost nothing about parsing.
I barely grokked (E)BNF.
The Gameplan
1. Define a lexical grammar.
2. Define a “parser” grammar.
3. Make sure + grammar accept/reject same things.
4. ???
5. Profit
Lexer
Takes characters, produces “tokens”
Token represents a single “unit” that the grammar cares about, and represents a category of syntax.
Lexer - Tokens
In Rust, we have tokens for literals, punctuation, etc.
Many languages have very similar tokens.
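To make "takes characters, produces tokens" concrete, here is a minimal sketch of a toy lexer. The `Token` categories, the `lex` function, and the punctuation set are all hypothetical and chosen for illustration; this is not Rust's actual lexer or token definition.

```rust
/// A token is one "unit" of syntax, tagged with its category.
/// These three categories are illustrative, not Rust's real token set.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    IntLit(u64),
    Punct(char),
}

/// Turn a string of characters into a flat list of tokens.
/// Returns `Err` with the offending character on input the toy lexer
/// does not understand.
pub fn lex(src: &str) -> Result<Vec<Token>, char> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none.
            chars.next();
        } else if c.is_ascii_digit() {
            // Integer literal: consume a run of decimal digits.
            let mut n: u64 = 0;
            while let Some(&d) = chars.peek() {
                match d.to_digit(10) {
                    Some(v) => {
                        // Saturate instead of overflowing on huge literals.
                        n = n.saturating_mul(10).saturating_add(v as u64);
                        chars.next();
                    }
                    None => break,
                }
            }
            tokens.push(Token::IntLit(n));
        } else if c.is_alphabetic() || c == '_' {
            // Identifier: letters, digits and underscores.
            let mut s = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    s.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(s));
        } else if "+-*/=;(){}".contains(c) {
            // Single-character punctuation.
            tokens.push(Token::Punct(c));
            chars.next();
        } else {
            return Err(c);
        }
    }
    Ok(tokens)
}
```

Note that this sketch treats keywords such as `let` as ordinary identifiers; whether keywords get their own tokens is exactly the kind of design decision a lexical grammar has to pin down.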
Problems with our (old) lexer
Did work a lexer probably shouldn’t.
Very hard to model in any sane way.
Lost information, hard to use in tools (importantly, rustfmt).
Lexical Syntax Simplification
Wrote an RFC to modify our token definition slightly, which simplifies the internals.
Somewhat misguided; tried to make each keyword its own token.
Changes in-tree.
Sample Rule
Sidetracked by: Macros
John Clements came for a month to work on macro hygiene.
I exploited this to learn everything I could about macros, esp. implementing them in Rust.
Lexer Tests
Verifying the model lexer I created produces exactly the same tokens as .
Are in the tree, running today. Took lots of hair-pulling debugging.
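The core of such a cross-check can be sketched as a comparison of two token streams that reports the first position where they disagree. The helper name `first_mismatch` is hypothetical, and this is an assumption about the general shape of the check, not the in-tree test harness.

```rust
/// Compare two token streams and return the index of the first position
/// where they differ, or `None` if they are identical.
/// A length mismatch counts as a difference at the end of the shorter stream.
pub fn first_mismatch<T: PartialEq>(reference: &[T], model: &[T]) -> Option<usize> {
    let common = reference.len().min(model.len());
    for i in 0..common {
        if reference[i] != model[i] {
            return Some(i);
        }
    }
    if reference.len() != model.len() {
        Some(common)
    } else {
        None
    }
}
```

Reporting an index rather than a plain yes/no makes it easier to find the input that triggers a disagreement, which matters when debugging two lexers against each other.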
Parsing
A “language” is a set of strings (over a given alphabet).
A “grammar” defines rules for generating strings in a language.
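As a small illustration of a grammar generating a language, take the toy grammar `E → x | ( E + E )`, which is an assumption made up for this sketch and has nothing to do with Rust's grammar. The hypothetical `generate` function below lists every string derivable with at most `depth` nested uses of the second rule.

```rust
/// Enumerate the strings generated by the toy grammar
///     E -> x | ( E + E )
/// using at most `depth` levels of nesting.
pub fn generate(depth: usize) -> Vec<String> {
    // The rule E -> x always applies.
    let mut out = vec!["x".to_string()];
    if depth > 0 {
        // The rule E -> ( E + E ) combines any two shallower strings.
        let smaller = generate(depth - 1);
        for lhs in &smaller {
            for rhs in &smaller {
                out.push(format!("({}+{})", lhs, rhs));
            }
        }
    }
    out
}
```

The full language is the union of these sets over all depths, so it is infinite even though the grammar has only two rules.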
Parsing
A “parser” produces a “syntax tree” from tokens, and also serves as a recognizer for determining whether a string is in a given language.
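Both roles can be seen in a minimal hand-written recursive-descent sketch for the toy grammar `E → x | ( E + E )`. The grammar, the `Expr` type, and the function names are assumptions for illustration only, and single characters stand in for tokens.

```rust
/// Syntax tree for the toy grammar  E -> x | ( E + E ).
#[derive(Debug, PartialEq)]
pub enum Expr {
    X,
    Add(Box<Expr>, Box<Expr>),
}

/// Consume `want` at the current position, or fail.
fn expect(tokens: &[char], pos: &mut usize, want: char) -> Option<()> {
    if tokens.get(*pos) == Some(&want) {
        *pos += 1;
        Some(())
    } else {
        None
    }
}

/// Parse one `E`, deciding which rule to use from a single token of lookahead.
fn parse_expr(tokens: &[char], pos: &mut usize) -> Option<Expr> {
    match *tokens.get(*pos)? {
        'x' => {
            *pos += 1;
            Some(Expr::X)
        }
        '(' => {
            *pos += 1;
            let lhs = parse_expr(tokens, pos)?;
            expect(tokens, pos, '+')?;
            let rhs = parse_expr(tokens, pos)?;
            expect(tokens, pos, ')')?;
            Some(Expr::Add(Box::new(lhs), Box::new(rhs)))
        }
        _ => None,
    }
}

/// Parser: build a syntax tree, requiring that all input is consumed.
pub fn parse(src: &str) -> Option<Expr> {
    let tokens: Vec<char> = src.chars().collect();
    let mut pos = 0;
    let tree = parse_expr(&tokens, &mut pos)?;
    if pos == tokens.len() {
        Some(tree)
    } else {
        None
    }
}

/// Recognizer: is the string in the language at all?
pub fn recognizes(src: &str) -> bool {
    parse(src).is_some()
}
```

Here `parse("(x+x)")` yields a tree with an `Add` node over two `X` leaves, while `recognizes("(x+")` is false because the input ends before the rule is complete.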
Parser Generators
“Parser generators” can generate a parser for a language, given a grammar.
Many algorithms, quite interesting field in general.
Parser Generators
Most interested in LL and LR parsers, considered “simple”.
Chose due to John Clements’s existing grammar.
Future Work
Finish LL(1) parser grammar.
Work on “nicer” reference grammar for manual.
Investigate integrating generated parser into compiler.
Work on understanding + improving the macro system.
Questions?
Thanks!