Rust Parsing Robustness
Corey Richardson
Mozilla Intern 2014

Mozilla Intern Summer 2014 Presentation

Rust

A language in development by Mozilla Research.

Really awesome. Getting more awesome.

The Goal

Define a grammar.

Verify that what we implement is what we say we implement.

The Situation

lines of handwritten parser

No grammar

Everyone is sad :(

The Benefits

Rust gets a well-defined grammar.

Our parser gets a model we can cross-verify.

We gain confidence in our syntax.

The Snag

I had never done anything like this before!

I knew almost nothing about parsing.

I barely grokked (E)BNF.

The Gameplan

1. Define a lexical grammar.
2. Define a “parser” grammar.
3. Make sure the existing parser and the grammar accept/reject the same things.
4. ???
5. Profit
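
Step 3 is a differential check. As a minimal sketch, assume both sides can be wrapped as recognizers of type `Fn(&str) -> bool` (such wrappers are hypothetical, not part of rustc). `first_disagreement` enumerates every input over a small alphabet up to a length bound, shortest first, and reports the first one the two sides classify differently. Exhaustive enumeration only scales to tiny alphabets and lengths; a realistic setup would more likely feed both sides a corpus of real source files.

```rust
/// Return the first input (shortest first) on which the two recognizers
/// disagree, or `None` if they agree on every string over `alphabet`
/// of length at most `max_len`.
pub fn first_disagreement(
    alphabet: &[char],
    max_len: usize,
    reference: impl Fn(&str) -> bool,
    candidate: impl Fn(&str) -> bool,
) -> Option<String> {
    // All strings of the current length, starting with the empty string.
    let mut current = vec![String::new()];
    for len in 0..=max_len {
        for s in &current {
            if reference(s.as_str()) != candidate(s.as_str()) {
                return Some(s.clone());
            }
        }
        if len == max_len {
            break;
        }
        // Extend every string by one character to get the next length.
        let mut next = Vec::with_capacity(current.len() * alphabet.len());
        for s in &current {
            for &c in alphabet {
                let mut t = s.clone();
                t.push(c);
                next.push(t);
            }
        }
        current = next;
    }
    None
}
```

For example, a correct balanced-parenthesis checker and a version that merely counts `(` against `)` first disagree on `)(`.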

Lexer

Takes characters, produces “tokens”

A token represents a single “unit” that the grammar cares about, and stands for a category of syntax.
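
A minimal sketch of a lexer in this sense: characters in, tokens out. The `Token` categories here (identifiers, integer literals, single-character punctuation) are a hypothetical toy set, not rustc's actual token definition; whitespace is skipped and any other character is reported as an error.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Token categories for a toy language (hypothetical; not rustc's token set).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String), // identifiers and keywords alike, e.g. `x`, `let`
    Int(u64),      // integer literals, e.g. `42`
    Punct(char),   // single-character punctuation, e.g. `=`, `;`
}

/// Consume characters from `chars` while `pred` holds, collecting them.
fn take_while(chars: &mut Peekable<Chars<'_>>, pred: impl Fn(char) -> bool) -> String {
    let mut out = String::new();
    while let Some(&c) = chars.peek() {
        if !pred(c) {
            break;
        }
        out.push(c);
        chars.next();
    }
    out
}

/// Turn a string of characters into tokens, skipping whitespace.
pub fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut chars = src.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            let digits = take_while(&mut chars, |d| d.is_ascii_digit());
            let n = digits
                .parse::<u64>()
                .map_err(|e| format!("bad literal {digits}: {e}"))?;
            tokens.push(Token::Int(n));
        } else if c.is_alphabetic() || c == '_' {
            let name = take_while(&mut chars, |d| d.is_alphanumeric() || d == '_');
            tokens.push(Token::Ident(name));
        } else if "=+-*/;(){}".contains(c) {
            tokens.push(Token::Punct(c));
            chars.next();
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}
```

`lex("let x = 42;")` yields `Ident("let")`, `Ident("x")`, `Punct('=')`, `Int(42)`, `Punct(';')`. In this sketch `let` comes out as an ordinary identifier; deciding which identifiers are keywords is left to a later stage.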

Lexer - Tokens

In Rust, we have tokens for literals, punctuation, etc.

Many languages have very similar tokens.

Problems with our (old) lexer

Did work that a lexer probably shouldn’t.

Very hard to model in any sane way.

Lost information, hard to use in tools (importantly, rustfmt).

Lexical Syntax Simplification

Wrote an RFC to slightly modify our token definition, simplifying the internals.

Somewhat misguided; tried to make each keyword its own token.

The changes are in-tree.

Sample Rule

Sidetracked by: Macros

John Clements came for a month to work on macro hygiene.

I exploited this to learn everything I could about macros, especially how they are implemented in Rust.

Lexer Tests

Verifying that the model lexer I created produces exactly the same tokens as the existing lexer.

The tests are in the tree and running today. Getting them there took lots of hair-pulling debugging.
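
At the core of such a test is a token-by-token comparison that pinpoints where two streams diverge. A sketch, generic over the token type; how tokens are obtained from the model lexer and from the compiler is left out, and `compare_tokens` and `Mismatch` are hypothetical names, not the in-tree test code.

```rust
/// Where a model lexer's token stream first diverges from the reference one.
#[derive(Debug, PartialEq)]
pub enum Mismatch<T> {
    /// Both streams have a token at `index`, but the tokens differ.
    Differs { index: usize, model: T, reference: T },
    /// All shared positions agree, but one stream is longer.
    Length { model_len: usize, reference_len: usize },
}

/// Compare two token streams position by position, reporting the first divergence.
pub fn compare_tokens<T: PartialEq + Clone>(
    model: &[T],
    reference: &[T],
) -> Result<(), Mismatch<T>> {
    for (index, (m, r)) in model.iter().zip(reference).enumerate() {
        if m != r {
            return Err(Mismatch::Differs {
                index,
                model: m.clone(),
                reference: r.clone(),
            });
        }
    }
    if model.len() != reference.len() {
        return Err(Mismatch::Length {
            model_len: model.len(),
            reference_len: reference.len(),
        });
    }
    Ok(())
}
```

Reporting the position and the two differing tokens, rather than a bare yes/no, is what makes a failing comparison debuggable.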

Parsing

A “language” is a set of strings (over a given alphabet).

A “grammar” defines rules for generating strings in a language.
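
To make the two definitions concrete, take the toy grammar `S -> ( S ) S | ε` over the alphabet `{(, )}` (an illustrative assumption); its language is the set of balanced parenthesis strings. The sketch below applies the rules mechanically, always rewriting the leftmost `S`, and collects every fully expanded string up to a length bound.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Toy grammar (illustrative):  S -> ( S ) S  |  ε
/// `S` is the only nonterminal; `(` and `)` are the terminals.
const RULES_FOR_S: [&str; 2] = ["(S)S", ""];

/// Every string in the grammar's language with at most `max_len` characters,
/// found by repeatedly rewriting the leftmost `S` with each of its rules.
pub fn generate(max_len: usize) -> BTreeSet<String> {
    let mut language = BTreeSet::new();
    let mut forms = VecDeque::from([String::from("S")]);
    while let Some(form) = forms.pop_front() {
        match form.find('S') {
            // No nonterminal left: the form is a string of the language.
            None => {
                language.insert(form);
            }
            Some(i) => {
                for rhs in RULES_FOR_S.iter() {
                    let next = format!("{}{}{}", &form[..i], rhs, &form[i + 1..]);
                    // Rules never remove terminals, so prune forms already too long.
                    if next.chars().filter(|&c| c != 'S').count() <= max_len {
                        forms.push_back(next);
                    }
                }
            }
        }
    }
    language
}
```

`generate(6)` contains 9 strings: one each of length 0 and 2, two of length 4, and five of length 6. A string such as `)(` never appears, because no sequence of rule applications produces it.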

Parsing

A “parser” produces a “syntax tree” from tokens, and also serves as a recognizer for determining whether a string is in a given language.
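
A hand-written recursive-descent sketch for the same toy grammar `S -> ( S ) S | ε` (assumed for illustration): one function per nonterminal, with a single character of lookahead choosing the production. `parse` builds the syntax tree; `recognize` reuses it as a recognizer by checking only whether parsing succeeds and consumes the whole input.

```rust
/// Syntax tree for the toy grammar  S -> ( S ) S | ε.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Empty,                       // S -> ε
    Group(Box<Tree>, Box<Tree>), // S -> ( S ) S : the inside, then what follows
}

struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    /// Parse one `S`; a `(` as lookahead selects the first production, anything else ε.
    fn s(&mut self) -> Result<Tree, String> {
        if self.input.get(self.pos) == Some(&b'(') {
            self.pos += 1;
            let inner = self.s()?;
            if self.input.get(self.pos) != Some(&b')') {
                return Err(format!("expected ')' at position {}", self.pos));
            }
            self.pos += 1;
            let rest = self.s()?;
            Ok(Tree::Group(Box::new(inner), Box::new(rest)))
        } else {
            Ok(Tree::Empty)
        }
    }
}

/// Parse a whole string into a syntax tree.
pub fn parse(src: &str) -> Result<Tree, String> {
    let mut p = Parser { input: src.as_bytes(), pos: 0 };
    let tree = p.s()?;
    if p.pos != p.input.len() {
        return Err(format!("unexpected input at position {}", p.pos));
    }
    Ok(tree)
}

/// The same parser used as a recognizer: is `src` in the language?
pub fn recognize(src: &str) -> bool {
    parse(src).is_ok()
}
```

`parse("()")` returns `Group(Empty, Empty)`, while `recognize(")(")` is `false`: parsing stops after the empty prefix and input remains.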

Parser Generators

“Parser generators” can generate a parser for a language, given a grammar.

Many algorithms; quite an interesting field in general.

Parser Generators

Most interested in LL and LR parsers, which are considered “simple”.

Chose these due to John Clements’ existing grammar.
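
For comparison with a hand-written parser, here is a table-driven LL(1) sketch for the toy grammar `S -> ( S ) S | ε` (assumed for illustration). An LL(1) parser generator would compute the table from FIRST and FOLLOW sets; here it is written out by hand, and the driver loop is the generic part that stays the same for any grammar. This does not show how LR parsers work (they shift and reduce using a state stack), nor which tools the project actually used.

```rust
/// Grammar symbols for the toy grammar  S -> ( S ) S | ε.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Sym {
    S,       // the only nonterminal
    T(char), // a terminal character
}

/// Right-hand sides of the two productions for S.
static GROUP: [Sym; 4] = [Sym::T('('), Sym::S, Sym::T(')'), Sym::S];
static EMPTY: [Sym; 0] = [];

/// The LL(1) table, written by hand: which production to use for a nonterminal
/// given the next input character (`None` = end of input).
/// FIRST((S)S) = {'('} and FOLLOW(S) = {')', end of input}.
fn table(nonterminal: Sym, lookahead: Option<char>) -> Option<&'static [Sym]> {
    match (nonterminal, lookahead) {
        (Sym::S, Some('(')) => Some(&GROUP[..]),
        (Sym::S, Some(')')) | (Sym::S, None) => Some(&EMPTY[..]), // S -> ε
        _ => None, // empty table cell: syntax error
    }
}

/// Generic LL(1) driver: keep a stack of symbols still expected; match
/// terminals against the input and expand nonterminals via the table.
pub fn ll1_accepts(input: &str) -> bool {
    let mut chars = input.chars().peekable();
    let mut stack = vec![Sym::S];
    while let Some(top) = stack.pop() {
        let lookahead = chars.peek().copied();
        match top {
            Sym::T(expected) => {
                if lookahead != Some(expected) {
                    return false;
                }
                chars.next();
            }
            nonterminal => match table(nonterminal, lookahead) {
                // Push the right-hand side reversed so its first symbol is on top.
                Some(rhs) => stack.extend(rhs.iter().rev().copied()),
                None => return false,
            },
        }
    }
    // Accept only if the whole input was consumed.
    chars.peek().is_none()
}
```

Swapping in a different grammar means replacing `Sym` and `table`; `ll1_accepts` itself does not change, which is what makes the table a natural output for a generator.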

Future Work

Finish LL(1) parser grammar.

Work on “nicer” reference grammar for manual.

Investigate integrating generated parser into compiler.

Work on understanding + improving the macro system.

Questions?

Thanks!