courses/compiler/chapter_3.md
Simon From Jakobsen 8a56d0b6a6 add chapter 3
2024-08-27 13:48:42 +00:00

Parser

In this chapter I'll show how I would make a parser.

A parser, together with our lexer, transforms the input program from text, meaning an unstructured sequence of characters, into a structured representation. Structured means the representation tells us about the different constructs in the program, such as if statements and expressions.

Abstract Syntax Tree (AST)

The result of parsing is a tree structure representing the input program.

The structure is recursive and acyclic: nodes contain other nodes, for example a let statement contains an expression, but no node ever contains itself.

This is how I would define an AST data type.

type Stmt = {
    kind: StmtKind,
    pos: Pos,
};

type StmtKind =
    | { type: "error" }
    // ...
    | { type: "let", ident: string, value: Expr }
    // ...
    ;

type Expr = {
    kind: ExprKind,
    pos: Pos,
};

type ExprKind =
    | { type: "error" }
    // ...
    | { type: "int", value: number }
    // ...
    ;
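To make the types above concrete, here is a minimal sketch of an AST built by hand. The shape of Pos is my assumption (the chapter only says positions hold index, line and col), the source text `let a = 5;` is a hypothetical example of the language's syntax, and describeStmt and describeExpr are made-up helpers that only show how the type tag is used to tell variants apart.

```typescript
// Assumed shape of Pos: index is 0-based, line and col are 1-based.
type Pos = { index: number; line: number; col: number };

type Expr = { kind: ExprKind; pos: Pos };
type ExprKind =
    | { type: "error" }
    | { type: "int"; value: number };

type Stmt = { kind: StmtKind; pos: Pos };
type StmtKind =
    | { type: "error" }
    | { type: "let"; ident: string; value: Expr };

// Hand-built AST for the hypothetical source text `let a = 5;`.
const letStmt: Stmt = {
    kind: {
        type: "let",
        ident: "a",
        // The statement node contains an expression node: this is the recursion.
        value: {
            kind: { type: "int", value: 5 },
            pos: { index: 8, line: 1, col: 9 },
        },
    },
    pos: { index: 0, line: 1, col: 1 },
};

// Hypothetical helper: switching on the `type` tag narrows the variant.
function describeExpr(expr: Expr): string {
    const kind = expr.kind;
    switch (kind.type) {
        case "int":
            return String(kind.value);
        case "error":
            return "<error>";
    }
}

// Hypothetical helper: walks from the statement down into its expression.
function describeStmt(stmt: Stmt): string {
    const kind = stmt.kind;
    switch (kind.type) {
        case "let":
            return `let ${kind.ident} = ${describeExpr(kind.value)}`;
        case "error":
            return "<error>";
    }
}
```

Calling describeStmt(letStmt) walks the tree from the statement into its expression, which is the same top-down traversal later stages would typically do.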

Consumer of lexer

To start, we'll implement a Parser class, which for now is simply a consumer of a token iterator, meaning the lexer. In simple terms, whereas the lexer is a transformation from text to tokens, the parser is a transformation from tokens to an AST, except that the parser is not an iterator.

class Parser {
    private currentToken: Token | null;

    public constructor(private lexer: Lexer) {
        this.currentToken = lexer.next();
    }
    // ...
    private step() { this.currentToken = this.lexer.next() }
    private done(): boolean { return this.currentToken == null; }
    private current(): Token { return this.currentToken!; }
    // ...
}

This implementation should look familiar from the lexer. We use currentToken as a one-token 'buffer', and refill it by calling .next() on the lexer.
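To see the buffering in action without the real lexer, here is a small sketch. Everything that is not in the class above is an assumption for illustration: the Token and Pos shapes, the TokenSource interface, the ArrayLexer stand-in that hands out tokens from an array, the token type names, and the ParserSketch class, which mirrors Parser but makes the methods public so they can be called from outside.

```typescript
// Assumed shapes; the real Token and Pos come from the lexer chapter.
type Pos = { index: number; line: number; col: number };
type Token = { type: string; pos: Pos };

// Hypothetical stand-in for the lexer: anything with next() returning Token | null.
interface TokenSource {
    next(): Token | null;
}

class ArrayLexer implements TokenSource {
    private i = 0;
    public constructor(private tokens: Token[]) {}
    public next(): Token | null {
        const token = this.tokens[this.i];
        if (token === undefined) return null; // out of tokens
        this.i += 1;
        return token;
    }
}

// Same idea as Parser above, with public methods for demonstration.
class ParserSketch {
    private currentToken: Token | null;
    public constructor(private lexer: TokenSource) {
        this.currentToken = lexer.next(); // fill the buffer once up front
    }
    public step(): void { this.currentToken = this.lexer.next(); }
    public done(): boolean { return this.currentToken == null; }
    public current(): Token { return this.currentToken!; }
}

// Walk the whole token stream, recording each token type.
function collectTypes(parser: ParserSketch): string[] {
    const types: string[] = [];
    while (!parser.done()) {
        types.push(parser.current().type);
        parser.step();
    }
    return types;
}

const at = (index: number): Pos => ({ index, line: 1, col: index + 1 });
const demoTypes = collectTypes(new ParserSketch(new ArrayLexer([
    { type: "let", pos: at(0) },
    { type: "ident", pos: at(4) },
    { type: "=", pos: at(6) },
    { type: "int", pos: at(8) },
])));
```

The loop in collectTypes is the basic pattern: check done(), look at current(), then step(). Note that current() is only safe to call when done() is false.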

Just like the lexer, we'll have a .pos() method, returning the current position.

class Parser {
    // ...
    private pos(): Pos { return this.current().pos; }
    // ...
}

The parser does not need to keep track of index, line and col as those are stored in the tokens.

Also like the lexer, we'll have a .test() method in the parser, which will test for token type rather than strings or regex.

class Parser {
    // ...
    private test(type: string): boolean { return this.current().type === type; }
    // ...
}
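Here is a rough sketch of how .test() and .pos() could be used together. The Token and Pos shapes, the token type names, the ParserSketch class with public methods and the intPositions helper are all assumptions for illustration. One deliberate difference from the method above: this sketch also checks done() inside test(), so that testing at the end of input returns false instead of reading a null token.

```typescript
// Assumed shapes; the real Token and Pos come from the lexer chapter.
type Pos = { index: number; line: number; col: number };
type Token = { type: string; pos: Pos };

// Simplified parser over an array of tokens (a stand-in for the lexer).
class ParserSketch {
    private i = 0;
    public constructor(private tokens: Token[]) {}

    public step(): void { this.i += 1; }
    public done(): boolean { return this.i >= this.tokens.length; }
    public current(): Token { return this.tokens[this.i]!; }

    // The position comes straight from the current token.
    public pos(): Pos { return this.current().pos; }

    // Assumption: guard with done() so test() is safe at end of input.
    public test(type: string): boolean {
        return !this.done() && this.current().type === type;
    }
}

// Hypothetical helper: collect the position of every "int" token.
function intPositions(parser: ParserSketch): Pos[] {
    const found: Pos[] = [];
    while (!parser.done()) {
        if (parser.test("int")) {
            found.push(parser.pos());
        }
        parser.step();
    }
    return found;
}

const at = (index: number): Pos => ({ index, line: 1, col: index + 1 });
const demoParser = new ParserSketch([
    { type: "let", pos: at(0) },
    { type: "ident", pos: at(4) },
    { type: "=", pos: at(6) },
    { type: "int", pos: at(8) },
]);
const demoPositions = intPositions(demoParser);
```

Since the positions are read from the tokens, the parser itself never has to count characters or lines.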

Operands