上一篇文章《MySQL如何给JSON列添加索引(二)》中,我们介绍了如何给JSON列添加索引,那么接下来,我们看MySQL 8.0新增的一种索引类型:多值索引;
从MySQL 8.0.17开始,InnoDB支持多值索引。多值索引是在存储值数组的列上定义的二级索引。“普通”索引对每个数据记录有一个索引记录(1:1)。对于单个数据记录(N:1),多值索引可以有多个索引记录。多值索引旨在为JSON数组建立索引。例如,在以下JSON文档中的邮政编码数组上定义的多值索引会为每个邮政编码创建一个索引记录,每个索引记录都引用同一数据记录。
多值索引可以在CREATE TABLE、ALTER TABLE或CREATE INDEX语句中创建多值索引。这要求使用CAST(… AS … ARRAY)索引定义,该定义将JSON数组中相同类型的标量值转换为SQL数据类型数组。然后,使用SQL数据类型数组中的值透明地生成一个虚拟列。最后,在虚拟列上创建一个功能索引(也称为虚拟索引)。是在SQL数据类型数组的值的虚拟列上定义的功能索引,该索引构成了多值索引。
下表中的示例显示了在名为customers的表中的JSON列custinfo上的数组$.zipcode上创建多值索引zips的三种不同方法。在每种情况下,JSON数组都被转换为无符号整数值的SQL数据类型数组。
CREATE TABLE customers ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, custinfo JSON, INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) ) );
多值索引也可以定义为复合索引的一部分。此示例显示了一个复合索引,其中包括两个单值部分(用于id和 modified列)和一个多值部分(用于custinfo列):
CREATE TABLE customers ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, custinfo JSON ); ALTER TABLE customers ADD INDEX comp(id, modified, (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) );
复合索引中只能使用一个多值键部分。多值键部分可以相对于键的其他部分以任何顺序使用。换句话说,ALTER TABLE刚刚显示的语句可能已经使用 comp(id, (CAST(custinfo->’$.zipcode’ AS UNSIGNED ARRAY), modified))(或任何其他排序)并且仍然有效。
使用多值索引
在WHERE子句中指定以下功能时,优化程序将使用多值索引来获取记录 :
* MEMBER OF()
* JSON_CONTAINS()
* JSON_OVERLAPS()
关于JSON函数的会在后面的文章中进行详细的讲解;
我们可以通过使用以下CREATE table和INSERT语句创建和填充customers表来演示这一点:
mysql>CREATE TABLE customers ( -> id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, -> modified DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, -> custinfo JSON -> ); Query OK, 0 rows affected (0.00 sec) mysql >INSERT INTO customers VALUES -> (NULL, NOW(), '{"user":"Jack","user_id":37,"zipcode":[94582,94536]}'), -> (NULL, NOW(), '{"user":"Jill","user_id":22,"zipcode":[94568,94507,94582]}'), -> (NULL, NOW(), '{"user":"Bob","user_id":31,"zipcode":[94477,94507]}'), -> (NULL, NOW(), '{"user":"Mary","user_id":72,"zipcode":[94536]}'), -> (NULL, NOW(), '{"user":"Ted","user_id":56,"zipcode":[94507,94582]}'); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql>select * from customers; +----+---------------------+-------------------------------------------------------------------+ | id | modified | custinfo | +----+---------------------+-------------------------------------------------------------------+ | 1 | 2021-05-24 13:37:29 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]} | | 2 | 2021-05-24 13:37:29 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} | | 3 | 2021-05-24 13:37:29 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]} | | 4 | 2021-05-24 13:37:29 | {"user": "Mary", "user_id": 72, "zipcode": [94536]} | | 5 | 2021-05-24 13:37:29 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} | +----+---------------------+-------------------------------------------------------------------+ 5 rows in set (0.00 sec)
首先,我们对customers表执行三个查询,每个查询使用MEMBER OF()、JSON_CONTAINS()和JSON_OVERLAPS(),每个查询的结果如下所示:
mysql>SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode'); +----+---------------------+-------------------------------------------------------------------+ | id | modified | custinfo | +----+---------------------+-------------------------------------------------------------------+ | 2 | 2021-05-24 13:37:29 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} | | 3 | 2021-05-24 13:37:29 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]} | | 5 | 2021-05-24 13:37:29 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} | +----+---------------------+-------------------------------------------------------------------+ 3 rows in set (0.00 sec) mysql>SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+---------------------+-------------------------------------------------------------------+ | id | modified | custinfo | +----+---------------------+-------------------------------------------------------------------+ | 2 | 2021-05-24 13:37:29 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} | | 5 | 2021-05-24 13:37:29 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} | +----+---------------------+-------------------------------------------------------------------+ 2 rows in set (0.00 sec) mysql>SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+---------------------+-------------------------------------------------------------------+ | id | modified | custinfo | +----+---------------------+-------------------------------------------------------------------+ | 1 | 2021-05-24 13:37:29 | {"user": "Jack", "user_id": 37, "zipcode": [94582, 94536]} | | 2 | 2021-05-24 13:37:29 | {"user": "Jill", "user_id": 22, "zipcode": [94568, 94507, 94582]} | | 3 | 2021-05-24 13:37:29 | {"user": "Bob", "user_id": 31, "zipcode": [94477, 94507]} | | 5 | 2021-05-24 13:37:29 | {"user": "Ted", "user_id": 56, "zipcode": [94507, 94582]} | +----+---------------------+-------------------------------------------------------------------+ 4 rows in set (0.00 sec)
接下来,我们EXPLAIN对前三个查询中的每一个运行:
mysql >explain SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode'); +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql>explain SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql >explain SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)
刚刚显示的三个查询都不能使用任何索引。为了解决这个问题,我们可以在JSON列(custinfo)中的zipcode数组上添加一个多值索引,如下所示:
mysql>ALTER TABLE customers ADD INDEX zips( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) ); Query OK, 0 rows affected (0.12 sec) Records: 0 Duplicates: 0 Warnings: 0
当我们EXPLAIN再次运行前面的语句时,我们现在可以观察到查询可以(并且确实)使用zips刚刚创建的索引:
mysql >EXPLAIN SELECT * FROM customers WHERE 94507 MEMBER OF(custinfo->'$.zipcode'); +----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | ref | zips | zips | 9 | const | 1 | 100.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql >EXPLAIN SELECT * FROM customers WHERE JSON_CONTAINS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | range | zips | zips | 9 | NULL | 6 | 100.00 | Using where | +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql >EXPLAIN SELECT * FROM customers WHERE JSON_OVERLAPS(custinfo->'$.zipcode', CAST('[94507,94582]' AS JSON)); +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | customers | NULL | range | zips | zips | 9 | NULL | 6 | 100.00 | Using where | +----+-------------+-----------+------------+-------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)
多值索引的限制
多值索引受此处列出的限制:
* 每个多值索引仅允许一个多值键部分。但是,该CAST(… AS … ARRAY)表达式可以引用JSON文档中的多个数组,如下所示:
CAST(data->'$.arr[*][*]' AS UNSIGNED ARRAY)在这种情况下,所有与JSON表达式匹配的值都作为单个平面数组存储在索引中。
* 具有多值键部分的索引不支持排序,因此不能用作主键。出于相同的原因,不能使用ASC或DESC 关键字定义多值索引。
* 多值索引不能是覆盖索引。
* 多值索引的每条记录的最大值数由可以在单个撤消日志页上存储的数据量决定,即65221字节(64K减去315字节的开销),这意味着最大总数键值的长度也是65221字节。键的最大数量取决于各种因素,这会阻止定义特定的限制。测试显示了一个多值索引,例如,每个记录允许多达1604个整数键。
当达到限制时,将报告类似于以下: ERROR 3905 (HY000): Exceeded max number of values per record for multi-valued index ‘idx’ by 1 value(s).
* 多值键部分中唯一允许的表达式类型是JSON 表达式。该表达式无需引用插入到索引列中的JSON文档中的现有元素,而本身在语法上必须有效。
* 因为同一聚集索引记录的索引记录分散在整个多值索引中,所以多值索引不支持范围扫描或仅只支持索引扫描。
* 外键规范中不允许使用多值索引。
* 不能为多值索引定义索引前缀。
* 无法在强制转换为的数据上定义多值索引 BINARY。
* 不支持在线创建多值索引,这意味着该操作使用 ALGORITHM=COPY。
* 多值索引不支持以下字符集和排序规则的以下两种组合以外的字符集和排序规则:
1. binary具有默认binary排序规则 的字符集
2. utf8mb4具有默认utf8mb4_0900_as_cs排序规则 的字符集。
* 与InnoDB表列上的其他索引一样 ,不能使用USING HASH创建多值索引。尝试执行此操作将导致警告: This storage engine does not support the HASH index algorithm, storage engine default was used instead.。(USING BTREE照常支持。)
好了,今天就先介绍到这里,关于JSON更多内容,后续会慢慢进行介绍;
JSON系列文章
MySQL 8.0新特性:多值索引 –为JSON数组添加索引(三)