5.1 KiB
Parser
In this chaper I'll show how I would make a parser.
A parser, in addition to our lexer, transforms the input program as text, meaning an unstructured sequence of characters, into a structered representation. Structured meaning the representation tells us about the different constructs such as if statements and expressions.
Abstract Syntax Tree AST
The result of parsing is a tree structure representing the input program.
This structure is a recursive acyclic structure storing the different parts of the program.
This is how I would define an AST data type.
type Stmt = {
kind: StmtKind,
pos: Pos,
};
type StmtKind =
| { type: "error" }
// ...
| { type: "let", ident: string, value: Expr }
// ...
;
type Expr = {
kind: ExprKind,
pos: Pos,
};
type ExprKind =
| { type: "error" }
// ...
| { type: "int", value: number }
// ...
;
Both Stmt
(statement) and Expr
(expression) are polymorphic types, meaning an expression, for example, can be either an addition operation containing 2 inner expressions or an integer expression containing the integer value, etc. This can also be implemented with classes and sub classes.
For both Stmt
and Expr
there's an error-kind. This makes the parser simpler, as we won't need to manage parsing failures differently than successful parslings.
Consumer of lexer
To start, we'll implement a Parser
class, which for now is simply a consumer of a token iterater, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from token to an AST, except that the parser is not an iterator.
class Parser {
private currentToken: Token | null;
public constructor(private lexer: Lexer) {
this.currentToken = lexer.next();
}
// ...
private step() { this.currentToken = this.lexer.next() }
private done(): bool { return this.currentToken == null; }
private current(): Token { return this.currentToken!; }
// ...
}
This implementation should look familiar compared to the lexer. We use the currentToken
as a 'buffer', and then just use the .next()
on the lexer
.
Just as the lexer, we'll have a .pos()
method, returning the current position.
For convenience, although there are other ways of doing it, we'll implement another public method on Lexer
, which will return the lexer's current position.
class Lexer {
// ...
public currentPos(): Pos { return this.pos(); }
// ...
}
The reason, is that when the lexer has reached the end of the file, the .next()
method will return null
instead of a token with a position, meaning we won't get the position after the last token.
class Parser {
// ...
private pos(): Pos {
if (this.done())
return this.lexer.currentPos();
return this.current().pos;
}
// ...
}
The parser does not need to keep track of index
, line
and col
as those are stored in the tokens. The token's position is prefered to the lexer's.
Also like the lexer, we'll have a .test()
method in the parser, which will test for token type rather than strings or regex.
class Parser {
// ...
private test(type: string): bool {
return !this.done() && this.current().type === type;
}
// ...
}
When testing, we first check that we have not reach the end. Either we have to do that here, or the caller will have to write something like !this.done() && this.test(...)
, and it's easy to do it here.
We'll also want a method for reporting errors.
class Parser {
// ...
private report(pos: Pos, msg: string) {
console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`);
}
// ...
}
Operands
Operands are the individual parts of an operation. For example, in the math expression a + b
, (would be + a b
in the input language), a
and b
are the operands, while +
is the operator. In the expression a + b * c
, the operands are a
, b
and c
. But in the expression a * (b + c)
, the operands of the multiply operation are a
and (b + c)
. (b + c)
is an operands, because it is enclosed on both sides. This is how we'll define operands.
We'll make a public method in Parser
called parseOperand
.
class Parser {
// ...
public parseOperand(): Expr {
const pos = this.pos();
if (this.test("int")) {
const value = this.current().intValue;
this.step();
return { kind: { type: "int", value }, pos };
}
this.report(pos "expected expr");
this.step();
return { kind: { type: "error" }, pos };
}
// ...
}
Integer
Parsing an integer is a 1:1 translation between the integer token and an integer expression.
type ExprKind =
// ...
| { type: "int", value: number }
// ...
;
class Parser {
// ...
public parseOperand(): Expr {
// ...
if (this.test("int")) {
const value = this.current().intValue;
this.step();
return { kind: { type: "int", value }, pos };
}
// ...
}
// ...
}