In this chapter I'll show how I would make a parser.
The parser, together with our lexer, transforms the input program from text, meaning an unstructured sequence of characters, into a structured representation. Structured means the representation tells us about the different constructs in the program, such as if statements and expressions.
## Abstract Syntax Tree (AST)
The result of parsing is a tree structure representing the input program.
This structure is a recursive acyclic structure storing the different parts of the program.
Both `Stmt` (statement) and `Expr` (expression) are polymorphic types, meaning an expression, for example, can be either an addition operation containing two inner expressions or an integer expression containing the integer value, etc. This can also be implemented with classes and subclasses.
For both `Stmt` and `Expr` there's an error kind. This makes the parser simpler, as we won't need to handle parsing failures differently from successful parses.
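As a rough sketch of what such polymorphic types might look like, here is one possible encoding using tagged unions. The `{ kind, pos }` shape and the `"int"` and `"error"` kinds follow the code later in this chapter; the `"add"` and `"expr"` kinds, their field names, the fields of `Pos`, and the `exprToString` helper are assumptions made only for illustration.

```ts
// Assumed to match the lexer's position type.
type Pos = { index: number; line: number; col: number };

// An expression is a kind plus the position where it starts.
type Expr = { kind: ExprKind; pos: Pos };

type ExprKind =
    | { type: "error" }
    | { type: "int"; value: number }
    // Hypothetical addition holding two inner expressions.
    | { type: "add"; left: Expr; right: Expr };

type Stmt = { kind: StmtKind; pos: Pos };

type StmtKind =
    | { type: "error" }
    // Hypothetical: an expression used as a statement.
    | { type: "expr"; expr: Expr };

// Illustrative helper: switching on `kind.type` narrows to one variant,
// and the error kind is handled like any other case.
function exprToString(expr: Expr): string {
    const kind = expr.kind;
    switch (kind.type) {
        case "error":
            return "<error>";
        case "int":
            return `${kind.value}`;
        case "add":
            return `(+ ${exprToString(kind.left)} ${exprToString(kind.right)})`;
    }
}
```

Because the error kind is just another variant, code that walks the tree needs no separate failure path.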
To start, we'll implement a `Parser` class, which for now is simply a consumer of a token iterator, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from tokens to an AST, except that the parser is not an iterator.
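A minimal sketch of such a class could look like the following. The `Token` and `Pos` shapes and the `TokenSource` interface are stand-ins for the real lexer types, which are not shown in this section, and the helper methods are marked `public` only so the sketch can be exercised on its own.

```ts
// Assumed shapes; the real ones come from the lexer.
type Pos = { index: number; line: number; col: number };
type Token = { type: string; pos: Pos };

// Hypothetical stand-in for the lexer: yields tokens, then `null` at the end.
interface TokenSource {
    next(): Token | null;
}

class Parser {
    private lexer: TokenSource;
    private currentToken: Token | null;

    public constructor(lexer: TokenSource) {
        this.lexer = lexer;
        // Buffer the first token so there is always a "current" one to inspect.
        this.currentToken = lexer.next();
    }

    // Replace the buffered token with the next one from the lexer.
    public step(): void {
        this.currentToken = this.lexer.next();
    }

    // True once the lexer has run out of tokens.
    public done(): boolean {
        return this.currentToken === null;
    }

    // The buffered token; only valid while `done()` is false.
    public current(): Token {
        if (this.currentToken === null) {
            throw new Error("no current token: end of input reached");
        }
        return this.currentToken;
    }
}
```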
This implementation should look familiar from the lexer. We use `currentToken` as a 'buffer', and simply call `.next()` on the `lexer` to advance.
Just like the lexer, we'll have a `.pos()` method, returning the current position.
For convenience, although there are other ways of doing it, we'll implement another public method on `Lexer`, which will return the lexer's current position.
```ts
class Lexer {
    // ...
    public currentPos(): Pos { return this.pos(); }
    // ...
}
```
The reason is that when the lexer has reached the end of the file, the `.next()` method will return `null` instead of a token with a position, meaning we won't get the position after the last token.
The parser does not need to keep track of `index`, `line` and `col`, as those are stored in the tokens. The token's position is preferred over the lexer's.
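One way the parser's `.pos()` could be written under these rules is sketched below. The `LexerLike` interface and the `Token` and `Pos` shapes are assumptions standing in for the real lexer types; only `.next()` and `.currentPos()` are taken from the text.

```ts
// Assumed shapes; the real ones come from the lexer.
type Pos = { index: number; line: number; col: number };
type Token = { type: string; pos: Pos };

// Hypothetical interface with only the two lexer methods used here.
interface LexerLike {
    next(): Token | null;
    currentPos(): Pos;
}

class Parser {
    private lexer: LexerLike;
    private currentToken: Token | null;

    public constructor(lexer: LexerLike) {
        this.lexer = lexer;
        this.currentToken = lexer.next();
    }

    public pos(): Pos {
        // Prefer the position stored in the buffered token.
        if (this.currentToken !== null) {
            return this.currentToken.pos;
        }
        // At the end of input there is no token, so fall back to the lexer.
        return this.lexer.currentPos();
    }
}
```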
When testing the type of the current token, we first check that we have not reached the end. Either we have to do that here, or the caller will have to write something like `!this.done() && this.test(...)`, and it's easy to do it here.
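A possible `test` method following this reasoning is sketched below. It assumes tokens carry their kind in a `type` field, which is a guess based on calls like `this.test("int")`; the `TokenSource` interface is again a hypothetical stand-in for the lexer.

```ts
// Assumed token shape: the kind of token is stored in `type`.
type Token = { type: string };

// Hypothetical stand-in for the lexer.
interface TokenSource {
    next(): Token | null;
}

class Parser {
    private currentToken: Token | null;

    public constructor(lexer: TokenSource) {
        this.currentToken = lexer.next();
    }

    public done(): boolean {
        return this.currentToken === null;
    }

    public test(type: string): boolean {
        // The end-of-input check comes first, so callers never need to
        // write `!this.done() && this.test(...)` themselves.
        return this.currentToken !== null && this.currentToken.type === type;
    }
}
```

At the end of input, `test` simply answers `false` for every token type, so parsing code can fall through to its error branch without a special case.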
We'll also want a method for reporting errors.
```ts
class Parser {
    // ...
    private report(pos: Pos, msg: string) {
        console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`);
    }
    // ...
}
```
Operands are the individual parts of an operation. For example, in the math expression `a + b` (which would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is an operand because it is enclosed on both sides. This is how we'll define operands.
We'll make a public method in `Parser` called `parseOperand`.
```ts
class Parser {
    // ...
    public parseOperand(): Expr {
        const pos = this.pos();
        // An integer token maps directly to an integer expression.
        if (this.test("int")) {
            const value = this.current().intValue;
            this.step();
            return { kind: { type: "int", value }, pos };
        }
        // Nothing matched: report, skip the token, and return an error expression.
        this.report(pos, "expected expr");
        this.step();
        return { kind: { type: "error" }, pos };
    }
    // ...
}
```
### Integer
Parsing an integer is a 1:1 translation between the integer token and an integer expression.