Lexical Analysis with BKLexer


A few days ago I finished packaging my lexer and named it BKLexer. It currently supports Go, C++, and Python.

The code is hosted in a GitHub project; see the project page.

Each language version ships a try_lexer example you can learn from. Here is the Go one:

package main

import (
    "fmt"
    "strconv"
    "./bklexer"
)

func main() {
    fmt.Println("Test Code:")
    code := "声明 变量 = PI * 100 - fda\n1024 * 4 * 3.14 ### \n123"
    fmt.Println(code)
    fmt.Println("--------------------------------")

    lexer := BKLexer.NewLexer()
    lexer.AddRule("\\d+\\.\\d*", "FLOAT")
    lexer.AddRule("\\d+", "INT")
    lexer.AddRule("[\\p{L}\\d_]+", "NAME")
    lexer.AddRule("\\+", "PLUS")
    lexer.AddRule("\\-", "MINUS")
    lexer.AddRule("\\*", "MUL")
    lexer.AddRule("/", "DIV")
    lexer.AddRule("=", "ASSIGN")
    lexer.AddRule("#[^\\r\\n]*", "COMMENT")
    lexer.AddIgnores("[ \\f\\t]+")

    lexer.Build(code)
    for {
        token := lexer.NextToken()
        // Print every token except EOF (an ERROR token is still printed before we stop).
        if token.TType != BKLexer.TOKEN_TYPE_EOF {
            fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
                token.Name, strconv.Quote(token.Source), token.TType, token.Position, token.Row, token.Col)
        }
        // Stop at the end of the input or on the first lexing error.
        if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
            break
        }
    }
}

First, import the packages we need, including bklexer

import (
    "fmt"
    "strconv"
    "./bklexer"
)

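fmt is used to print output
strconv is used to display token text as quoted literals
./bklexer brings in the BKLexer package (this is a relative import, which Go accepts in GOPATH mode but not inside a module)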

Instantiate the lexer and define the rules

lexer := BKLexer.NewLexer()
lexer.AddRule("\\d+\\.\\d*", "FLOAT")
lexer.AddRule("\\d+", "INT")
lexer.AddRule("[\\p{L}\\d_]+", "NAME")
lexer.AddRule("\\+", "PLUS")
lexer.AddRule("\\-", "MINUS")
lexer.AddRule("\\*", "MUL")
lexer.AddRule("/", "DIV")
lexer.AddRule("=", "ASSIGN")
lexer.AddRule("#[^\\r\\n]*", "COMMENT")
lexer.AddIgnores("[ \\f\\t]+")

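NewLexer creates a lexer instance
AddRule adds a matching rule; its arguments are a regular expression and the name of the token type it produces
AddIgnores sets the pattern for characters that should be skipped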
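
The rules above register FLOAT before INT, and the order plausibly matters. BKLexer's internals are not shown in this post, so the following is only a minimal sketch of one common strategy, "first matching rule wins", built on Go's standard regexp package. The names rule, newRule, and firstMatch are made up for this illustration and are not part of BKLexer.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a compiled pattern with a token name.
// Hypothetical type for illustration; not BKLexer's own.
type rule struct {
	re   *regexp.Regexp
	name string
}

// newRule anchors the pattern so it can only match at the start
// of the remaining input.
func newRule(pattern, name string) rule {
	return rule{regexp.MustCompile(`^(?:` + pattern + `)`), name}
}

// firstMatch tries the rules in registration order and returns the
// first one that matches at the start of src.
// Assumption: first match wins (BKLexer may behave differently).
func firstMatch(rules []rule, src string) (name, text string, ok bool) {
	for _, r := range rules {
		if m := r.re.FindString(src); m != "" {
			return r.name, m, true
		}
	}
	return "", "", false
}

func main() {
	floatFirst := []rule{newRule(`\d+\.\d*`, "FLOAT"), newRule(`\d+`, "INT")}
	intFirst := []rule{newRule(`\d+`, "INT"), newRule(`\d+\.\d*`, "FLOAT")}

	name, text, _ := firstMatch(floatFirst, "3.14 * 2")
	fmt.Println(name, text) // FLOAT 3.14

	name, text, _ = firstMatch(intFirst, "3.14 * 2")
	fmt.Println(name, text) // INT 3
}
```

Under this strategy, putting INT first would split 3.14 into an INT 3 followed by input that no rule covers, which is why the more specific pattern is listed first.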

Build, then match in a loop

lexer.Build(code)
for {
    token := lexer.NextToken()
    // Print every token except EOF (an ERROR token is still printed before we stop).
    if token.TType != BKLexer.TOKEN_TYPE_EOF {
        fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n",
            token.Name, strconv.Quote(token.Source), token.TType, token.Position, token.Row, token.Col)
    }
    // Stop at the end of the input or on the first lexing error.
    if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
        break
    }
}

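Call Build with the source code as its argument, then call NextToken in a loop to fetch the next token and print its details.
Note that you should check each token's type for EOF or ERROR to decide whether to stop.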
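
If you would rather collect the tokens than print them, the same termination check can be wrapped in a helper. This is just a sketch of the pattern: token, kindEOF, kindError, collect, and replay are stand-ins invented here so the example runs on its own. They are not BKLexer's types, and the real values of TOKEN_TYPE_EOF and TOKEN_TYPE_ERROR are not shown in this post.

```go
package main

import "fmt"

// Hypothetical token kinds standing in for BKLexer.TOKEN_TYPE_EOF and
// BKLexer.TOKEN_TYPE_ERROR; the values are assumptions.
const (
	kindEOF   = -1
	kindError = -2
)

// token is a simplified stand-in for the lexer's token type.
type token struct {
	kind   int
	source string
}

// collect calls next until it sees EOF, returning the tokens read so far.
// An ERROR token stops the loop early and is reported as an error.
func collect(next func() token) ([]token, error) {
	var tokens []token
	for {
		tok := next()
		switch tok.kind {
		case kindEOF:
			return tokens, nil
		case kindError:
			return tokens, fmt.Errorf("lexing failed at %q", tok.source)
		}
		tokens = append(tokens, tok)
	}
}

// replay returns a next function that yields the given tokens in order
// and then EOF forever, imitating a lexer for the demo.
func replay(toks ...token) func() token {
	i := 0
	return func() token {
		if i >= len(toks) {
			return token{kind: kindEOF}
		}
		t := toks[i]
		i++
		return t
	}
}

func main() {
	good, err := collect(replay(token{2, "100"}, token{6, "*"}))
	fmt.Println(len(good), err)

	bad, err := collect(replay(token{2, "100"}, token{kindError, "@"}, token{2, "1"}))
	fmt.Println(len(bad), err)
}
```

With a real lexer you would pass something like lexer.NextToken, adapted to the stand-in type, in place of replay.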

The output looks like this

Test Code:
声明 变量 = PI * 100 - fda
1024 * 4 * 3.14 ### 
123
--------------------------------
NAME    "声明"    t3    0    0,0
NAME    "变量"    t3    7    0,3
ASSIGN    "="        t8    14    0,6
NAME    "PI"    t3    16    0,8
MUL        "*"        t6    19    0,11
INT        "100"    t2    21    0,13
MINUS    "-"        t5    25    0,17
NAME    "fda"    t3    27    0,19
NEWLINE    "\n"    t0    30    0,22
INT        "1024"    t2    31    1,0
MUL        "*"        t6    36    1,5
INT        "4"        t2    38    1,7
MUL        "*"        t6    40    1,9
FLOAT    "3.14"    t1    42    1,11
COMMENT    "### "    t9    47    1,16
NEWLINE    "\n"    t0    51    1,20
INT        "123"    t2    52    2,0
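
Judging from the numbers above, the fourth column looks like a byte offset into the source, while the final row,column pair counts characters: "变量" sits at offset 7 but column 3, because each of the two preceding Chinese characters takes 3 bytes in UTF-8. That reading is my inference from the output, not something the BKLexer source is quoted as saying here. A small check using only the standard library (the offsets helper is made up for this example):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// offsets returns the byte offset and the character (rune) column of the
// first occurrence of sub in line, or -1, -1 if sub is absent.
func offsets(line, sub string) (bytePos, runeCol int) {
	bytePos = strings.Index(line, sub)
	if bytePos < 0 {
		return -1, -1
	}
	// Count runes, not bytes, in everything before the match.
	return bytePos, utf8.RuneCountInString(line[:bytePos])
}

func main() {
	line := "声明 变量 = PI * 100 - fda"
	for _, s := range []string{"变量", "=", "PI"} {
		b, c := offsets(line, s)
		fmt.Printf("%s\tbyte %d\tcol %d\n", s, b, c)
	}
}
```

This reports byte 7 / col 3 for "变量", byte 14 / col 6 for "=", and byte 16 / col 8 for "PI", matching the first row of tokens in the output.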

Next up: "Implementing Calc with a Recursive Descent Algorithm". Stay tuned.

Source: Segmentfault

Author: bxtkezhan

Original article: Lexical Analysis with BKLexer