Monday, January 03, 2011

Parser generator versus combinatorial parser

I've used lex/yacc type parser generators many times in the past. For a change I used a parser monad for my latest educational effort. The initial reason was that I wanted to have auto-indentation and getting the lexical indentation state right is hard when the lexer is separate from the parser. With the parser monad, the lexical analysis is just the "lower level" of a single parsing algorithm, it is then much easier to implement constructions like auto-indent where the lexical interpretation is context specific.

Another big advantage of the parser monad is that it is "just a state monad" and, as I mentioned in an earlier post, multiple states can be combined in the same state monad while still being written separately. Therefore, having written my parser and type inference algorithm separately, I can now bring them together with advantage that type errors now show up within their parsing context. This way the type inference reports meaningful errors while not needing to know anything about the language source.

The disadvantage of working with combinatorial parsers are that it takes a lot of core refactoring to make things readable, another issue is correctness of the parser: it is not because you can define the parser that it will parse "correctly". Because of issues with F# laziness, I found that only allowing recursion to be brought in as a top level construction kept me on safer grounds.

No comments: