I’m trying to merge a little bit of theory with a little bit of practice. I’m writing a parser-generator that generates a top-down parser based on a given grammar.

I’d like to handle left- and right recursion properly, so they would each yield the “left-recursive” and “right-recursive” tree respectively (I’m not sure if left- and rightmost derivative would be a correct term here, that would mean the whole tree). This is very important for PEG, where the associativity and order of operations is encoded in the tree. The problem is that top-down parsers can’t deal with left-recursion directly.

I know the elimination rule for left-recursion, which is given a rule in the form

$A rightarrow A alpha_1 | dots | A alpha_n | beta_1 | dots | beta_m$

We can rewrite it into right-recursion using the following 2 rules:

$A rightarrow beta_1 A’ | dots | beta_m A’ \

A’ rightarrow alpha_1 A’ | dots | alpha_n A’ | varepsilon$

This is fine for the “formal” part, but the grammar became right-recursive. To me this rule seems like it’s just using right-recursion to essentially loop parts of the grammar.

So my idea was to introduce a looping construct that would account for folding the tree left. If I introduce a new postfix operator `*`

that would mean repeating the construct 0 or more times, would the following also be a valid equivalent left-recursion elimination for A?

$A rightarrow (beta_1 | dots | beta_m) (alpha_1 | dots | alpha_n)^*$

Note that since the repetition can occur 0 times, this should account for the “epsilon-case” too.

I have a strong suspicion that this is equivalent, as I did something very similar when manually parsing left-associative operators in math expressions. I’d like confirmation before introducing it in a general tool.