I’m using the grammar scheme of Danielsson and Norell (“Parsing Mixfix Operators”). The short version is:
- user defined mixfix operators such as
- the precedences form a DAGs
- it is known that this creates an unambiguous grammar if the operators don’t share lexical tokens
- otherwise, input gets rejected when the parse is ambiguous
We could define operators
if_then_, then the classic example has two parse trees (as long as both operators have the same precedence; see page 5 of the paper):
if a then if b then c else d if a then (if b then c else d) if a then (if b then c) else d
I’m interested in unparsing, that is, given an AST, produce a string that will get accepted by the parser as unambiguous and result in the same AST –
parse(unparse(ast)) = ast. This amounts to choosing where to put the parentheses. Of course, there is the trivial strategy of putting parentheses everywhere the grammar will accept them, but in the interest of legibility, I am interested in minimizing the amount of parentheses used.
The case of unambiguous grammar can be easily handled by following the non-terminals around. I can see a pretty naive method of extending this to ambiguous grammars:
- Construct a potentially ambiguous string representation using the above method
- Parse it back and check for ambiguity
- If the parse is ambiguous, diff the parse forest and add parentheses around the subexpressions
of each node that differs
- Try removing some of the parentheses one by one, testing the changes by running a parse
However, considering that parsing a grammar like this takes $O(n^3)$ and potentially superfluous parentheses are limited only by the size of the AST, I’m fairly confident this can take $O(n^4)$ time in the worst case.
- Is there a better way?
- Is there some exploitable structure to the ambiguities that can be found in this grammar scheme?
- Is there a way to check whether a parse tree is unique faster than $O(n^3)$?
- Is there a more conservative way of inserting parentheses where the parse trees differ?
Note: as this SO question explains, there is no way to decide whether parentheses are needed just by looking at a particular edge in the AST (or, in other words, impossible to define a relation of operators to operator holes in which they are, that decides this):
A cautionary example: Add another prefix operator say inc of the same precedence as if-then-else and if-then. Now assume an arbitrary set
P ⊂ H x Owhere
His the set of operator holes and
Ois the set of operators. The set is meant to be a relation that tells when parentheses need to be added. Examine the expressions
if a then inc b else cand
if a then (inc if b then c) else d. The first requires
(if-then-else.2, inc)to not be in
Pand the second requires the opposite. This contradicts the assumption the problem can be solved by a some kind of relation or an order. One could try to say let
(inc.1, if-then)be in
Pmaking the latter expression
if a then inc (if b then c) else d, but then
inc if a then bbecomes
inc (if a then b)which has too many parentheses.