Lexerless parsing
Encyclopedia : L : LE : LEX : Lexerless parsing
Lexerless parsing (also called "scannerless parsing") refers to the use of a single formalism to express both the lexical and context-free syntax used to parse a language.
Advantages
- Only a single metasyntax is required
- Non-regular lexical structure is handled easily
- "Token classification" is not required (eliminating the need for "the lexer hack")
- A clear lexer-parser distinction is not required; languages without one are handled easily (TeX, most wiki grammars, Makefiles)
- Grammars are compositional (can be merged without human intervention) [#endnote_compositional]
- No "keywordification" -- keywords (such as "while" in C) can be used as identifiers anywhere that the keyword form would not be allowed
Required Extensions
Unfortunately, when parsed at the character level, most popular programming languages are no longer strictly context-free. Visser identified five key extensions to classical context-free syntax which handle almost all common non-context-free constructs arising in practice:
- Follow restrictions, a limited form of "longest match"
- Reject productions, a limited form of negative matching (as found in boolean grammars)
- Preference attributes to handle the "dangling else" construct in C-like languages
- Per-production transitions rather than per-nonterminal transitions in order to facilitate:
- * Associativity attributes, which prevent a self-reference in a particular production of a nonterminal from producing that same production
- * Precedence/priority rules, which prevent self-references in higher-precendence productions from producing lower-precedence productions
Implementations
- [dparser] generates ANSI C code for lexerless GLR parsers
- the [SGLR] component of ASF+SDF generates and interprets parse tables for lexerless grammars
Related Material
[Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam]
- ↑ since lexerless parsers handle the entire class of context-free languages, which is closed under composition.
From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License See Wikipedia Copyrights for details.
