{
    "componentChunkName": "component---src-templates-blog-blog-detail-tsx",
    "path": "/blog/reading-tidb-source-code-with-questions-2",
    "result": {"pageContext":{"blog":{"id":"Blogs_321","title":"Reading TiDB Source Code with Questions: Power BI Desktop Reports an Error When Connecting to TiDB Through the MySQL Driver","tags":["带着问题读 TiDB 源码"],"category":{"name":"产品技术解读"},"summary":"Using a case where Power BI Desktop misbehaves on TiDB as an example, this article walks through the whole process from discovering and locating a problem to filing an issue and submitting a PR in the open-source community, troubleshooting from the code implementation, in the hope of helping readers better understand the TiDB source code.","body":"It is often said that reading source code is a rite of passage for every good software engineer, but for a system as complex as TiDB, reading the source is a huge undertaking. For TiDB users, starting from a problem they hit in daily use and working backward into the source code is a good entry point, which is why we started the [Reading Source Code with Questions](https://pingcap.com/zh/blog/?tag=%E5%B8%A6%E7%9D%80%E9%97%AE%E9%A2%98%E8%AF%BB%20TiDB%20%E6%BA%90%E7%A0%81) series.\n\nThis is the second article in the series. Using a case where Power BI Desktop misbehaves on TiDB as an example, it walks through the whole process from discovering and locating the problem to filing an issue and submitting a PR in the open-source community, troubleshooting from the perspective of the code, in the hope of helping readers better understand the TiDB source code.\n\nFirst, let's reproduce the failure (TiDB 5.1.1 on macOS) by creating a simple table with a single column:\n\n```\nCREATE TABLE test(name VARCHAR(1) PRIMARY KEY);\n```\n\nConnecting Power BI Desktop to this table works on MySQL but fails on TiDB with the following error:\n\n> DataSource.Error: An error happened while reading data from the provider: 'Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.'\n>\n> Details:\n> - DataSourceKind=MySql\n> - DataSourcePath=localhost:4000;test\n\nAccording to the TiDB general log, the last SQL statement executed was:\n\n```\nselect COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, case when NUMERIC_PRECISION is null then null when DATA_TYPE in ('FLOAT', 'DOUBLE') then 2 else 10 end AS NUMERIC_PRECISION_RADIX, NUMERIC_PRECISION, NUMERIC_SCALE,            CHARACTER_MAXIMUM_LENGTH, COLUMN_DEFAULT, COLUMN_COMMENT AS DESCRIPTION, COLUMN_TYPE  from INFORMATION_SCHEMA.COLUMNS  where table_schema = 'test' and table_name = 'test';\n```\n\nWe start a TiDB cluster with tiup and run the same statement with tiup client, which fails as well:\n\n> error: mysql: sql: Scan error on column index 4, name \"NUMERIC_PRECISION_RADIX\": converting NULL to int64 is unsupported\n\nSo we can focus on this single statement. Let's first look at what the error reported by tiup client means. tiup client is built on the Go library `xo/usql`, but the error message does not appear anywhere in `xo/usql`: grepping for the keyword `converting` returns only a handful of unrelated matches. Next we look at the MySQL driver used by `xo/usql`, which in turn references `go-sql-driver/mysql`. Downloading that library and grepping for `converting` returns only a single entry in its changelog, so the error most likely does not originate there either. Skimming the code of `go-sql-driver/mysql` shows that it depends on `database/sql`, so let's look at `database/sql` next. Because `database/sql` is part of the Go standard library, we need to download the Go source code. Grepping for `converting` in Go's `database` directory quickly turns up code that matches the error message:\n\ngo/src/database/sql/convert.go\n\n```\ncase reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:\n        if src == nil {\n                return fmt.Errorf(\"converting NULL to %s is unsupported\", dv.Kind())\n        }\n        s := asString(src)\n        i64, err := strconv.ParseInt(s, 10, dv.Type().Bits())\n        if err != nil {\n                err = strconvErr(err)\n                return fmt.Errorf(\"converting driver.Value type %T (%q) to a %s: %v\", src, s, dv.Kind(), err)\n        }\n        dv.SetInt(i64)\n        return nil\n```\n\nTracing where the destination type in this snippet comes from eventually leads us back to `go-sql-driver/mysql`:\n\nmysql/fields.go\n\n```\n        case fieldTypeLongLong:\n                if mf.flags&flagNotNULL != 0 {\n                        if mf.flags&flagUnsigned != 0 {\n                                return scanTypeUint64\n                        }\n                        return scanTypeInt64\n                }\n                return scanTypeNullInt\n```\n\nThis code parses the [column definition](https://dev.mysql.com/doc/internals/en/com-query-response.html#column-definition) in the query response and maps it to a Go type: a `LONGLONG` column flagged as not null is scanned as a plain `int64` (or `uint64` if unsigned), and only a nullable one is scanned as a nullable integer (`scanTypeNullInt`). We can connect with `mysql --host 127.0.0.1 --port 4000 -u root --column-type-info` and inspect the column metadata returned by the problematic SQL:\n\nMySQL\n\n```\nField 5: `NUMERIC_PRECISION_RADIX`\nCatalog: `def`\nDatabase: ``\nTable: ``\nOrg_table: ``\nType: LONGLONG\nCollation: binary (63)\nLength: 3\nMax_length: 0\nDecimals: 0\nFlags: BINARY NUM\n```\n\nTiDB\n\n```\nField 5: `NUMERIC_PRECISION_RADIX`\nCatalog: `def`\nDatabase: ``\nTable: ``\nOrg_table: ``\nType: LONGLONG\nCollation: binary (63)\nLength: 2\nMax_length: 0\nDecimals: 0\nFlags: NOT_NULL BINARY NUM\n```\n\nThe column definition of `NUMERIC_PRECISION_RADIX`, the column named in the tiup client error, is clearly wrong on TiDB: TiDB's response marks it `NOT_NULL`. That cannot be right, because the `CASE` expression evaluates to `NULL` whenever `NUMERIC_PRECISION` is null, and MySQL's response reflects this. As a result, the driver treats the column as a non-nullable `int64`, and `xo/usql` fails as soon as it reads a `NULL` value from the response. At this point we know why the client fails; next we need to find out why TiDB returns a wrong column definition.\n\nThe TiDB Dev Guide describes the overall execution flow of a DQL statement in TiDB. Starting from the entry point `server/conn.go#clientConn.Run`, we follow the call chain through `server/conn.go#clientConn.dispatch`, `server/conn.go#clientConn.handleQuery`, `server/conn.go#clientConn.handleStmt`, `server/driver_tidb.go#TiDBContext.ExecuteStmt`, `session/session.go#session.ExecuteStmt`, `executor/compiler.go#Compiler.Compile`, `planner/optimize.go#Optimize`, `planner/optimize.go#optimize`, `planner/core/planbuilder.go#PlanBuilder.Build`, and `planner/core/logical_plan_builder.go#PlanBuilder.buildSelect`. In `buildSelect` we can see the series of steps the TiDB planner applies to the query, which take us to `planner/core/expression_rewriter.go#PlanBuilder.rewriteWithPreprocess` and `planner/core/expression_rewriter.go#PlanBuilder.rewriteExprNode`. `rewriteExprNode` resolves the problematic `NUMERIC_PRECISION_RADIX` field, and the `CASE` expression itself is ultimately handled in `expression/builtin_control.go#caseWhenFunctionClass.getFunction`. This is where the column definition returned by the `CASE` expression is computed (which relies on walking the AST parsed from the statement):\n\n```\n    for i := 1; i < l; i += 2 {\n        fieldTps = append(fieldTps, args[i].GetType())\n        decimal = mathutil.Max(decimal, args[i].GetType().Decimal)\n        if args[i].GetType().Flen == -1 {\n            flen = -1\n        } else if flen != -1 {\n            flen = mathutil.Max(flen, args[i].GetType().Flen)\n        }\n        isBinaryStr = isBinaryStr || types.IsBinaryStr(args[i].GetType())\n        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[i].GetType())\n    }\n    if l%2 == 1 {\n        fieldTps = append(fieldTps, args[l-1].GetType())\n        decimal = mathutil.Max(decimal, args[l-1].GetType().Decimal)\n        if args[l-1].GetType().Flen == -1 {\n            flen = -1\n        } else if flen != -1 {\n            flen = mathutil.Max(flen, args[l-1].GetType().Flen)\n        }\n        isBinaryStr = isBinaryStr || 
types.IsBinaryStr(args[l-1].GetType())\n        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[l-1].GetType())\n    }\n\n\n    fieldTp := types.AggFieldType(fieldTps)\n    // Here we turn off NotNullFlag. Because if all when-clauses are false,\n    // the result of case-when expr is NULL.\n    types.SetTypeFlag(&fieldTp.Flag, mysql.NotNullFlag, false)\n    tp := fieldTp.EvalType()\n\n\n    if tp == types.ETInt {\n        decimal = 0\n    }\n    fieldTp.Decimal, fieldTp.Flen = decimal, flen\n    if fieldTp.EvalType().IsStringKind() && !isBinaryStr {\n        fieldTp.Charset, fieldTp.Collate = DeriveCollationFromExprs(ctx, args...)\n        if fieldTp.Charset == charset.CharsetBin && fieldTp.Collate == charset.CollationBin {\n            // When args are Json and Numerical type(eg. Int), the fieldTp is String.\n            // Both their charset/collation is binary, but the String need a default charset/collation.\n            fieldTp.Charset, fieldTp.Collate = charset.GetDefaultCharsetAndCollate()\n        }\n    } else {\n        fieldTp.Charset, fieldTp.Collate = charset.CharsetBin, charset.CollationBin\n    }\n    if isBinaryFlag {\n        fieldTp.Flag |= mysql.BinaryFlag\n    }\n    // Set retType to BINARY(0) if all arguments are of type NULL.\n    if fieldTp.Tp == mysql.TypeNull {\n        fieldTp.Flen, fieldTp.Decimal = 0, types.UnspecifiedLength\n        types.SetBinChsClnFlag(fieldTp)\n    }\n```\n\nThe code above, which computes the column definition flags, always clears the `NOT_NULL` flag no matter what the `CASE` expression looks like, so the problem is not here. We therefore have to walk back along the code path and check whether the generated column definition is modified later on. Eventually we find that `server/conn.go#clientConn.handleStmt` calls `server/conn.go#clientConn.writeResultSet`, which leads in turn to `server/conn.go#clientConn.writeChunks`, `server/conn.go#clientConn.writeColumnInfo`, `server/column.go#ColumnInfo.Dump`, and `server/column.go#dumpFlag`. In `dumpFlag`, the previously computed column definition flag is modified:\n\n```\nfunc dumpFlag(tp byte, flag uint16) uint16 {\n    switch tp {\n    case 
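mysql.TypeSet:\n        return flag | uint16(mysql.SetFlag)\n    case mysql.TypeEnum:\n        return flag | uint16(mysql.EnumFlag)\n    default:\n        if mysql.HasBinaryFlag(uint(flag)) {\n            return flag | uint16(mysql.NotNullFlag)\n        }\n        return flag\n    }\n}\n```\n\nFor every type other than `SET` and `ENUM`, `dumpFlag` adds `NOT_NULL` to any column that carries the `BINARY` flag. `NUMERIC_PRECISION_RADIX` is an integer column with the `BINARY` flag, so it is sent to the client as `NOT_NULL` even though `getFunction` had cleared that flag. This is why TiDB returns a wrong column definition. The bug has in fact already been fixed in TiDB 5.2.0, the latest version when this was written: [*: fix some problems related to notNullFlag by wjhuang2016 · Pull Request #27697 · pingcap/tidb](https://github.com/pingcap/tidb/pull/27697).\n\nFinally, when reading code like this, it helps a lot to see what the AST produced by TiDB's parser actually looks like, so that you are not groping in the dark when following the code that walks it. The [parser chapter](https://pingcap.github.io/tidb-dev-guide/understand-tidb/parser.html) of the TiDB dev guide explains how to debug the parser, and [parser/quickstart.md at master · pingcap/parser](https://github.com/pingcap/parser/blob/master/docs/quickstart.md) includes sample output of a generated AST, but simply printing the node tells you almost nothing. Instead, we can use `davecgh/go-spew` to dump the node produced by the parser and get a tree that a human can read:\n\n```\npackage main\n\nimport (\n        \"fmt\"\n\n        \"github.com/davecgh/go-spew/spew\"\n        \"github.com/pingcap/parser\"\n        \"github.com/pingcap/parser/ast\"\n        // test_driver supplies the ValueExpr implementation required by the parser.\n        _ \"github.com/pingcap/parser/test_driver\"\n)\n\n// parse parses a single SQL statement and returns the root node of its AST.\nfunc parse(sql string) (ast.StmtNode, error) {\n        p := parser.New()\n        stmtNodes, _, err := p.Parse(sql, \"\", \"\")\n        if err != nil {\n                return nil, err\n        }\n        if len(stmtNodes) == 0 {\n                return nil, fmt.Errorf(\"no statement found in %q\", sql)\n        }\n        return stmtNodes[0], nil\n}\n\nfunc main() {\n        // Indent nested fields with four spaces to keep the dump readable.\n        spew.Config.Indent = \"    \"\n        astNode, err := parse(\"SELECT a, b FROM t\")\n        if err != nil {\n                fmt.Printf(\"parse error: %v\\n\", err)\n                return\n        }\n        // Sdump walks the node recursively and prints every field with its type.\n        fmt.Printf(\"%s\\n\", spew.Sdump(astNode))\n}\n```\n\nThe output looks like this (pointer addresses differ from run to run):\n\n```\n(*ast.SelectStmt)(0x140001dac30)({\n    dmlNode: (ast.dmlNode) {\n        stmtNode: (ast.stmtNode) {\n            node: (ast.node) {\n                text: (string) (len=18) \"SELECT a, b FROM t\"\n            }\n        }\n    },\n    resultSetNode: (ast.resultSetNode) {\n        resultFields: 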
([]*ast.ResultField) <nil>\n    },\n    SelectStmtOpts: (*ast.SelectStmtOpts)(0x14000115bc0)({\n        Distinct: (bool) false,\n        SQLBigResult: (bool) false,\n        SQLBufferResult: (bool) false,\n        SQLCache: (bool) true,\n        SQLSmallResult: (bool) false,\n        CalcFoundRows: (bool) false,\n        StraightJoin: (bool) false,\n        Priority: (mysql.PriorityEnum) 0,\n        TableHints: ([]*ast.TableOptimizerHint) <nil>\n    }),\n    Distinct: (bool) false,\n    From: (*ast.TableRefsClause)(0x140001223c0)({\n        node: (ast.node) {\n            text: (string) \"\"\n        },\n        TableRefs: (*ast.Join)(0x14000254100)({\n            node: (ast.node) {\n                text: (string) \"\"\n            },\n            resultSetNode: (ast.resultSetNode) {\n                resultFields: ([]*ast.ResultField) <nil>\n            },\n            Left: (*ast.TableSource)(0x14000156480)({\n                node: (ast.node) {\n                    text: (string) \"\"\n                },\n                Source: (*ast.TableName)(0x1400013a370)({\n                    node: (ast.node) {\n                        text: (string) \"\"\n                    },\n                    resultSetNode: (ast.resultSetNode) {\n                        resultFields: ([]*ast.ResultField) <nil>\n                    },\n                    Schema: (model.CIStr) ,\n                    Name: (model.CIStr) t,\n                    DBInfo: (*model.DBInfo)(<nil>),\n                    TableInfo: (*model.TableInfo)(<nil>),\n                    IndexHints: ([]*ast.IndexHint) <nil>,\n                    PartitionNames: ([]model.CIStr) {\n                    }\n                }),\n                AsName: (model.CIStr)\n            }),\n            Right: (ast.ResultSetNode) <nil>,\n            Tp: (ast.JoinType) 0,\n            On: (*ast.OnCondition)(<nil>),\n            Using: ([]*ast.ColumnName) <nil>,\n            NaturalJoin: (bool) false,\n            StraightJoin: (bool) 
false\n        })\n    }),\n    Where: (ast.ExprNode) <nil>,\n    Fields: (*ast.FieldList)(0x14000115bf0)({\n        node: (ast.node) {\n            text: (string) \"\"\n        },\n        Fields: ([]*ast.SelectField) (len=2 cap=2) {\n            (*ast.SelectField)(0x140001367e0)({\n                node: (ast.node) {\n                    text: (string) (len=1) \"a\"\n                },\n                Offset: (int) 7,\n                WildCard: (*ast.WildCardField)(<nil>),\n                Expr: (*ast.ColumnNameExpr)(0x14000254000)({\n                    exprNode: (ast.exprNode) {\n                        node: (ast.node) {\n                            text: (string) \"\"\n                        },\n                        Type: (types.FieldType) unspecified,\n                        flag: (uint64) 8\n                    },\n                    Name: (*ast.ColumnName)(0x1400017dc70)(a),\n                    Refer: (*ast.ResultField)(<nil>)\n                }),\n                AsName: (model.CIStr) ,\n                Auxiliary: (bool) false\n            }),\n            (*ast.SelectField)(0x14000136840)({\n                node: (ast.node) {\n                    text: (string) (len=1) \"b\"\n                },\n                Offset: (int) 10,\n                WildCard: (*ast.WildCardField)(<nil>),\n                Expr: (*ast.ColumnNameExpr)(0x14000254080)({\n                    exprNode: (ast.exprNode) {\n                        node: (ast.node) {\n                            text: (string) \"\"\n                        },\n                        Type: (types.FieldType) unspecified,\n                        flag: (uint64) 8\n                    },\n                    Name: (*ast.ColumnName)(0x1400017dce0)(b),\n                    Refer: (*ast.ResultField)(<nil>)\n                }),\n                AsName: (model.CIStr) ,\n                Auxiliary: (bool) false\n            })\n        }\n    }),\n    GroupBy: (*ast.GroupByClause)(<nil>),\n    Having: 
(*ast.HavingClause)(<nil>),\n    WindowSpecs: ([]ast.WindowSpec) <nil>,\n    OrderBy: (*ast.OrderByClause)(<nil>),\n    Limit: (*ast.Limit)(<nil>),\n    LockTp: (ast.SelectLockType) none,\n    TableHints: ([]*ast.TableOptimizerHint) <nil>,\n    IsAfterUnionDistinct: (bool) false,\n    IsInBraces: (bool) false,\n    QueryBlockOffset: (int) 0,\n    SelectIntoOpt: (*ast.SelectIntoOption)(<nil>)\n})\n```\n\n> Read more articles in the [Reading TiDB Source Code with Questions series](https://pingcap.com/zh/blog/?tag=%E5%B8%A6%E7%9D%80%E9%97%AE%E9%A2%98%E8%AF%BB%20TiDB%20%E6%BA%90%E7%A0%81)\n","date":"2021-12-01","author":"张翔","fillInMethod":"writeDirectly","customUrl":"reading-tidb-source-code-with-questions-2","file":null,"relatedBlogs":[]}}},
    "staticQueryHashes": ["1327623483","1820662718","3081853212","3430003955","3649515864","4265596160","63159454"]}