Introduction to Language Theory and Compiling

Shubham Batra, Nathan Liccardo

November 26, 2017

1

Contents

1 Part 1 : Grammar modifications 3

1.1 Unreachableandunproductivevariables …………….. 4

1.2 Non-ambigousgrammar …………………….. 4

1.3 LeftRecursionandLeftFactoring………………… 4

1.3.1 LeftRecursion………………………. 4

1.3.2 LeftFactoring………………………. 5

1.3.3 Algorithms ……………………….. 5

2 Part 2 : Grammar analyses 6

2.1 First1(X)…………………………….. 6

2.2 Follow1(X)……………………………. 7

2.3 Actiontable…………………………… 8

3 Part 3 : Parser 8

3.1 Recursivedescentparser …………………….. 9

3.2 Implementation…………………………. 10

3.3 ParserTree …………………………… 10

2

1 Part 1 : Grammar modifications

For the first part, we were asked to transform the IMP grammar in order to :

• Remove unreachable and/or unproductive variables, if any;

• Make the grammar non-ambiguous by taking into account the priority and the

associativity of the operators.

• Remove left-recursion and apply factorisation where need be.

Here is the original grammar :

1 ` end`

2 ` ? ?`

3 ` ? `

4

5

6

```
``` 7 ?

8 ?

9 ?

10 ?

11 ?

12 ? VarName :=

13 ? VarName

14 ? Number

15 ? ( )

16 ? –

17 ?

18 ? +

19 ? –

20 ? *

21 ? /

22 ? if then ` endif`

23 ? if then ` else `` endif`

24 ?

25 ? not

26 ?

27 ?

28 ? and

29 ? or

30 ? =

31 ? >=

32 ? >

33 ? <=
34 ? <
35 ? <>

36 ? while do ` done`

37 ? for VarName from by

to do ` done`

38 ? for VarName from to do ` done`

39 ? print(VarName)

40 ? read(VarName)

```
``` 3

1.1 Unreachable and unproductive variables

Firstly, we were asked to find the unreachable and unproductive variables in the gram-

mar, but we do not find any unproductive symbol or inaccessible symbol in the gram-

mar.

1.2 Non-ambigous grammar

Secondly we need to make the grammar non-ambigous by taking into account the fol-

lowing table of priority and associativity of the operators (decreasing order of priority) :

Operators

Associativity

-, not

?, /

+, –

>, <, >=, <=, =, <>

and

or

right

left

left

left

left

left

To do it, we just needed to transform a little bit our grammar. The trick is to detect

firstly the symbols with the lowest priority (or) and to finish by the highest priority (-,

not). The “or” and “and symbols can only appears in condition. We so need to check

that we will firstly detect those elements. This step will be done by transforming the

following rules :

1 ?

2 ? and

3 ? or

by :

6 ? ?

This transforation was aslo applied to +, ? and ?, / operators.

1.3 Left Recursion and Left Factoring

1.3.1 Left Recursion

Left recursion often poses problems for parsers, either because it leads them into infinite

recursion or because they expect rules in a normal form that forbids it Therefore, a

grammar is often pre-processed to eliminate the left recursion. In our current grammar,

we will ilustrated removing left-recursion with the following example :

1 ? or

2 ?

3 ? ?

4 ? and

5 ?

4

1 ? or

2 ?

3 ? ?

To remove it, we firstly transform the and then create the and

rules :

1 ?

2 ?

3 ? or

4 ? ?

The same trick was applied to each left-recursion in the grammar.

1.3.2 Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar

suitable for predictive parsing. The basic idea is that when it is not clear which of two

alternative productions to use to expand a non-terminal A, we may be able to rewrite

the A-productions to defer the decision until we have seen enough of the input to make

the right choice. For example,

1 ? for VarName from by

to do ` done`

2 ? for VarName from to do ` done`

```
``` becomes :

1 ? for VarName from by

to ** do **` done`

2 ** ? by **

3 ** ? ?**

**
** Again, the same trick was applied to each left-factoring in the grammar.

1.3.3 Algorithms

For information, you can find here the pseudo-code to generate both left factoring and

recursion :

5

2 Part 2 : Grammar analyses

Secondly, we were mandatory to give the action table of an LL(1) parser for the trans-

formed grammar. The construction of a predictive parser is aided by two functions

associated with a grammar G. These functions, FIRST and FOLLOW, allow us to

fill in the entries of a predictive parsing table for G, whenever possible. Sets of tokens

yielded by the FOLLOW function can also be used as synchronizing tokens during error

recovery.

2.1 First1(X)

If A is any symbol of the grammar, let FIRST1(A) be the set of terminals that begin

the strings derived from A. If the A rules contain ? then it is also an element of the

result set. To compute FIRST1(X), we applied to following steps for each identifiers :

RecursiveUpdate :

rules ? Grammar.getRulesWithIdentifier

for each rules :

if first element of rule = terminal :

set ? first element of rule

else if first element of rule = identifier :

RecursiveUpdate(first element of rule)

Which can be resumed to :

If X is terminal or ? , then FIRST(A) ? {X}.

If X is nonterminal and X ? Y1 Y2 … Yk is a production, then call FIRST1(Y1)

and put it on FIRST1(X).

In the project, the First1(X) function is represented by the class VariableSetFirst. This

class is helped by some other utils classes which was only created to make the project

more simple. You will find two importants functions used to make the first set which

are :

firstCalculation : used to iterate each identifiers.

recursiveUpdate : used to find every first values.

6

2.2 Follow1(X)

Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear

immediately to the right of A. To compute FOLLOW1(A) for all nonterminals A, we

applied the following code which explain each steps :

private void followCalculation() {

_checked = new ArrayList<>();

for (VariableIdentifier var : _grammar.getIdentifiers()) {

_var = var;

recursive3(var);

}

}

private void recursive3(VariableIdentifier variable) {

_checked.add(variable);

ArrayList rules = _grammar.getNotRulesWith(variable);

for (Rule rule: rules) {

if (_var.equals(variable)) _currentRule = rule;

for (int index: rule.indexs(variable)) {

// CASE 1.1

}

}

if (notLastAndTerminal(rule, index))

caseNotLastAndTerminal(rule, index);

// CASE 1.2

else if (notLastAndIdentifier(rule, index))

caseNotLastAndIdentifier(rule, index);

// CASE 2

else

caseLastElement(rule);

_checked.remove(_checked.size()-1);

}

private boolean notLastAndTerminal(Rule rule, int index) {

if (index+1 < rule.getSize()) {
Variable current = rule.getRuleIndex(index+1);
if (current instanceof VariableTerminal) return true;
}
return false;
}
private boolean notLastAndIdentifier(Rule rule, int index) {
if (index+1 < rule.getSize()) {
Variable current = rule.getRuleIndex(index+1);
if (current instanceof VariableIdentifier) return true;
}
return false;
}
private void caseNotLastAndTerminal(Rule rule, int index) {
VariableTerminal terminal = (VariableTerminal) rule.getRuleIndex(index+1);
7
_set.addStart(_var, terminal, _currentRule);
}
private void caseNotLastAndIdentifier(Rule rule, int index) {
VariablePairSet set = _first.getFirst(rule.getRuleIndex(index+1));
if(set.removeAllTerminal(EPS) && !(_checked.contains(rule.getIdentifier())))
recursive3(rule.getIdentifier());
for (VariablePair var: set) {
_set.addStart(_var, var.getStart(), _currentRule);
}
}
private void caseLastElement(Rule rule) {
if (!(_checked.contains(rule.getIdentifier())))
recursive3(rule.getIdentifier());
}
2.3 Action table
To create the action table, we need the FIRST and FOLLOW sets for each non-terminal
in the grammar. We dram a n*m table from the input we receive. n denotes the num-
ber of non-terminals we have in the grammar and we determine m by counting all the
terminals in the grammar. Then in each cell, we add the number of the rule that allows
a non-terminal to access to a terminal variable.
In the project, the action table is constructed using the actionLine and actionLine-
Tuple classes. An array is constructed of all the variables in a line with line tuples in it.
Integers are only inserted where needed and rest spaces are left blank which helps us to
reduce the memory size. Less memory is needed to store the data. The first and follow
are needed to construct the table. In fact, we use the in the first time the FIRST1
function to calculate each first values. If we found an ? in the FIRST1 set, then we
apply the FOLLOW1 function. Those steps are explained bellow :
ActionTable {
create ActionTableInstance
for each non-terminal X :
create ActionTableLineInstance
set ? getAllFirst(X)
if ? in set :
set ? set.removeEps
set ? getFollow
ActionTableLine.add(set)
}
3
Part 3 : Parser
For the third part, we were asked to write, in Java, a recursive descent LL(1) parser for
this grammar.
8
3.1 Recursive descent parser
Recursive descent is a top-down parsing technique that constructs the parse tree from
the top and the input is read from left to right. It uses procedures for every terminal
and non-terminal entity. This parsing technique recursively parses the input to make a
parse tree, which may or may not require back-tracking. But the grammar associated
with it (if not left factored) cannot avoid back-tracking. A form of recursive-descent
parsing that does not require any back-tracking is known as predictive parsing.
Predictive parsing is possible only for the class of LL(k) grammars, which are the
context-free grammars for which there exists some positive integer k that allows a re-
cursive descent parser to decide which production to use by examining only the next
k tokens of input. The LL(k) grammars therefore exclude all ambiguous grammars,
as well as all grammars that contain left recursion. Any context-free grammar can be
transformed into an equivalent grammar that has no left recursion, but removal of left
recursion does not always yield an LL(k) grammar. A predictive parser runs in linear
time.
Recursive descent with backtracking is a technique that determines which production
to use by trying each production in turn. Recursive descent with backtracking is not
limited to LL(k) grammars, but is not guaranteed to terminate unless the grammar is
LL(k). Even when they terminate, parsers that use recursive descent with backtracking
may require exponential time.
Although predictive parsers are widely used, and are frequently chosen if writing a
parser by hand, programmers often prefer to use a table-based parser produced by
a parser generator, either for an LL(k) language or using an alternative parser, such
as LALR or LR. This is particularly the case if a grammar is not in LL(k) form, as
transforming the grammar to LL to make it suitable for predictive parsing is involved.
Predictive parsers can also be automatically generated, using tools like ANTLR.
Predictive parser is a recursive descent parser, which has the capability to predict
which production is to be used to replace the input string. The predictive parser does
not suffer from backtracking. To accomplish its tasks, the predictive parser uses a
look-ahead pointer, which points to the next input symbols. To make the parser back-
tracking free, the predictive parser puts some constraints on the grammar and accepts
only a class of grammar known as LL(k) grammar.
Predictive parsing uses a stack and a parsing table to parse the input and generate
a parse tree. Both the stack and the input contains an end symbol $ to denote that
the stack is empty and the input is consumed. The parser refers to the parsing table
to take any decision on the input and stack element combination.
9
3.2 Implementation
The implementation of the parser is quite simple. We firstly instantiated a stack and
then read every symbol of the input to change the stack states. Note that the function
stop if any error occurs. In the other case, it returns true which means that the input
is accepted by the grammar.
public boolean parse() {
// Step One : init stack
_stack.push(_grammar.getRule(0));
// Step Two : parse
while (!(_stack.isEmpty()) && !(_input.isEmpty())) {
// Pop the first variable of the stack
Variable current = _stack.pop();
// Get the curent variable in the input
VariableTerminal terminal = new VariableTerminal((_input.get(0)).getType());
// If the poped variable is a terminal
if (current instanceof VariableTerminal)
if (!(caseTerminal((VariableTerminal) current , terminal))) return false;
// If the poped variable is an Id
if (current instanceof VariableIdentifier)
caseIdentifier(terminal, (VariableIdentifier) current);
}
// Step Three : check the stack
System.out.println();
return (_stack.isEmpty());
}
3.3 Parser Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings
are derived from the start symbol. The start symbol of the derivation becomes the root
of the parse tree. In a parse tree:
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• In-order traversal gives original input string.
A parse tree depicts associativity and precedence of operators. The deepest sub-tree is
traversed first, therefore the operator in that sub-tree gets precedence over the operator
which is in the parent nodes.
In the project we added the possibility to show the parse tree. This can be done
by printing the number of the called rule at each recursion.
10