# Parser

In this chapter I'll show how I would make a parser.

A parser, together with our lexer, transforms the input program from text, meaning an unstructured sequence of characters, into a structured representation. Structured means the representation tells us about the different constructs, such as if statements and expressions.

## Abstract Syntax Tree (AST)

The result of parsing is a tree structure representing the input program.

This tree is a recursive, acyclic structure that stores the different parts of the program.

This is how I would define an AST data type.

```ts
type Stmt = {
    kind: StmtKind,
    pos: Pos,
};

type StmtKind =
    | { type: "error" }
    // ...
    | { type: "let", ident: string, value: Expr }
    // ...
    ;

type Expr = {
    kind: ExprKind,
    pos: Pos,
};

type ExprKind =
    | { type: "error" }
    // ...
    | { type: "int", value: number }
    // ...
    ;
```

Both `Stmt` (statement) and `Expr` (expression) are polymorphic types, meaning an expression, for example, can be either an addition operation containing two inner expressions or an integer expression containing the integer value, etc. This can also be implemented with classes and subclasses.

For both `Stmt` and `Expr` there's an error kind. This makes the parser simpler, as we won't need to handle parsing failures differently from successful parses.
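
To make the idea of a polymorphic type a bit more concrete, here is a minimal sketch of how code might consume such an `Expr`. The `add` kind, the shape of `Pos` and the `show` function are assumptions made for illustration; they are not defined in this chapter.

```ts
// Assumed shape of Pos; the real one comes from the lexer.
type Pos = { index: number, line: number, col: number };

type Expr = {
    kind: ExprKind,
    pos: Pos,
};

type ExprKind =
    | { type: "error" }
    | { type: "int", value: number }
    // Hypothetical addition kind, only here to show nesting.
    | { type: "add", left: Expr, right: Expr }
    ;

// Switching on `kind.type` lets TypeScript narrow to the matching variant.
function show(expr: Expr): string {
    const kind = expr.kind;
    switch (kind.type) {
        case "error":
            return "<error>";
        case "int":
            return `${kind.value}`;
        case "add":
            return `(+ ${show(kind.left)} ${show(kind.right)})`;
    }
}

const pos: Pos = { index: 0, line: 1, col: 1 };
const example: Expr = {
    kind: {
        type: "add",
        left: { kind: { type: "int", value: 1 }, pos },
        right: { kind: { type: "error" }, pos },
    },
    pos,
};
console.log(show(example)); // prints "(+ 1 <error>)"
```

Note that the error kind is handled like any other variant, so a tree containing a failed parse can still be walked.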

## Consumer of lexer

To start, we'll implement a `Parser` class, which for now is simply a consumer of a token iterator, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from tokens to an AST, except that the parser is not an iterator.

```ts
class Parser {
    private currentToken: Token | null;

    public constructor(private lexer: Lexer) {
        this.currentToken = lexer.next();
    }
    // ...
    private step() { this.currentToken = this.lexer.next(); }
    private done(): boolean { return this.currentToken == null; }
    private current(): Token { return this.currentToken!; }
    // ...
}
```

This implementation should look familiar from the lexer. We use `currentToken` as a 'buffer', and simply call `.next()` on the `lexer` to refill it.

Just like the lexer, we'll have a `.pos()` method, returning the current position.

For convenience, although there are other ways of doing it, we'll implement another public method on `Lexer`, which will return the lexer's current position.

```ts
class Lexer {
    // ...
    public currentPos(): Pos { return this.pos(); }
    // ...
}
```

The reason is that when the lexer has reached the end of the file, the `.next()` method will return `null` instead of a token with a position, meaning we wouldn't otherwise get the position after the last token.

```ts
class Parser {
    // ...
    private pos(): Pos {
        if (this.done())
            return this.lexer.currentPos();
        return this.current().pos;
    }
    // ...
}
```
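
The fallback in `.pos()` can be tried out in isolation with a stand-in lexer. In the sketch below, `StubLexer`, `DemoParser` and the shapes of `Token` and `Pos` are assumptions made for illustration, not the chapter's real lexer; `.pos()` and `.step()` are made public only so they can be called from outside.

```ts
// Assumed shapes; the real definitions come from the lexer.
type Pos = { index: number, line: number, col: number };
type Token = { type: string, pos: Pos };

// Hypothetical stand-in for the real Lexer: yields tokens from an array.
class StubLexer {
    private index = 0;
    private tokens: Token[];
    private endPos: Pos;
    public constructor(tokens: Token[], endPos: Pos) {
        this.tokens = tokens;
        this.endPos = endPos;
    }
    public next(): Token | null {
        const token = this.tokens[this.index];
        if (token === undefined)
            return null;
        this.index += 1;
        return token;
    }
    // Pretends the lexer stopped right after the last token.
    public currentPos(): Pos { return this.endPos; }
}

class DemoParser {
    private currentToken: Token | null;
    private lexer: StubLexer;
    public constructor(lexer: StubLexer) {
        this.lexer = lexer;
        this.currentToken = lexer.next();
    }
    public step() { this.currentToken = this.lexer.next(); }
    private done(): boolean { return this.currentToken == null; }
    private current(): Token { return this.currentToken!; }
    // Same logic as `.pos()` above.
    public pos(): Pos {
        if (this.done())
            return this.lexer.currentPos();
        return this.current().pos;
    }
}

const tokenPos: Pos = { index: 0, line: 1, col: 1 };
const endPos: Pos = { index: 2, line: 1, col: 3 };
const demo = new DemoParser(new StubLexer([{ type: "int", pos: tokenPos }], endPos));
const before = demo.pos(); // position of the "int" token
demo.step();
const after = demo.pos(); // no token left, so the lexer's position is used
console.log(before.col, after.col); // prints "1 3"
```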

The parser does not need to keep track of `index`, `line` and `col`, as those are stored in the tokens. When a token is available, its position is preferred over the lexer's.

Also like the lexer, we'll have a `.test()` method in the parser, which will test for token type rather than strings or regex.

```ts
class Parser {
    // ...
    private test(type: string): boolean {
        return !this.done() && this.current().type === type;
    }
    // ...
}
```

When testing, we first check that we have not reached the end. Either we have to do that here, or the caller will have to write something like `!this.done() && this.test(...)`, and it's easy to do it here.

We'll also want a method for reporting errors.

```ts
class Parser {
    // ...
    private report(pos: Pos, msg: string) {
        console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`);
    }
    // ...
}
```

## Operands

Operands are the individual parts of an operation. For example, in the math expression `a + b` (which would be `+ a b` in the input language), `a` and `b` are the *operands*, while `+` is the *operator*. In the expression `a + b * c`, the operands are `a`, `b` and `c`. But in the expression `a * (b + c)`, the operands of the multiply operation are `a` and `(b + c)`. `(b + c)` is an operand because it is enclosed on both sides. This is how we'll define operands.

We'll make a public method in `Parser` called `parseOperand`.

```ts
class Parser {
    // ...
    public parseOperand(): Expr {
        const pos = this.pos();
        if (this.test("int")) {
            const value = this.current().intValue;
            this.step();
            return { kind: { type: "int", value }, pos };
        }
        this.report(pos, "expected expr");
        this.step();
        return { kind: { type: "error" }, pos };
    }
    // ...
}
```
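
To see both the success path and the error path of `parseOperand`, here is a self-contained sketch that runs it on two tokens. `StubLexer`, `DemoParser`, the helper `at` and the shape of `Token` (in particular the optional `intValue` field) are assumptions made for illustration; the real lexer and token type come from the previous chapter.

```ts
// Assumed shapes; the real definitions come from the lexer.
type Pos = { index: number, line: number, col: number };
type Token = { type: string, pos: Pos, intValue?: number };

type Expr = { kind: ExprKind, pos: Pos };
type ExprKind =
    | { type: "error" }
    | { type: "int", value: number }
    ;

// Hypothetical stand-in for the real Lexer: yields tokens from an array.
class StubLexer {
    private index = 0;
    private tokens: Token[];
    public constructor(tokens: Token[]) { this.tokens = tokens; }
    public next(): Token | null {
        const token = this.tokens[this.index];
        if (token === undefined)
            return null;
        this.index += 1;
        return token;
    }
    public currentPos(): Pos { return { index: 0, line: 1, col: 1 }; }
}

class DemoParser {
    private currentToken: Token | null;
    private lexer: StubLexer;
    public constructor(lexer: StubLexer) {
        this.lexer = lexer;
        this.currentToken = lexer.next();
    }
    public parseOperand(): Expr {
        const pos = this.pos();
        if (this.test("int")) {
            // An "int" token is assumed to always carry `intValue`.
            const value = this.current().intValue!;
            this.step();
            return { kind: { type: "int", value }, pos };
        }
        this.report(pos, "expected expr");
        this.step(); // skip the offending token so parsing can continue
        return { kind: { type: "error" }, pos };
    }
    private step() { this.currentToken = this.lexer.next(); }
    private done(): boolean { return this.currentToken == null; }
    private current(): Token { return this.currentToken!; }
    private pos(): Pos {
        return this.done() ? this.lexer.currentPos() : this.current().pos;
    }
    private test(type: string): boolean {
        return !this.done() && this.current().type === type;
    }
    private report(pos: Pos, msg: string) {
        console.log(`Parser: ${msg} at ${pos.line}:${pos.col}`);
    }
}

// Hypothetical helper: a position on line 1 at the given column.
const at = (col: number): Pos => ({ index: col - 1, line: 1, col });
const parser = new DemoParser(new StubLexer([
    { type: "int", pos: at(1), intValue: 42 },
    { type: "+", pos: at(4) },
]));
const first = parser.parseOperand();  // int expression with value 42
const second = parser.parseOperand(); // reports an error, returns an error expression
console.log(first.kind, second.kind);
```

The second call does not throw. It reports the problem, steps past the unexpected token, and returns an error expression carrying the position where the operand was expected.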

### Integer

Parsing an integer is a 1:1 translation between the integer token and an integer expression.

```ts
type ExprKind =
    // ...
    | { type: "int", value: number }
    // ...
    ;
```

```ts
class Parser {
    // ...
    public parseOperand(): Expr {
        // ...
        if (this.test("int")) {
            const value = this.current().intValue;
            this.step();
            return { kind: { type: "int", value }, pos };
        }
        // ...
    }
    // ...
}
```