临近十一,国庆放假的同时,往往会伴随着国庆期间业务要上相关的活动,那么今天就分享一个今年五一前夕(4月30日)上新活动中遇到的一个性能问题;
问题背景
五一前夕(4月30日)中午频繁的收到业务慢查询的告警
排查过程
查看mysql实例监控,发现CPU在不断增长,系统的负载也在不断的增加
查看慢查询日志,发现有一个update操作的慢查询,执行时间在50s多,而且锁的时间48s多,经过和业务人员沟通,由于五一活动游戏开赛,所以出现业务量的突增,那么接下来我们来分析下慢SQL的问题,下面是慢查询日志中的记录:
# Time: 2021-04-30T06:32:37.832446Z # User@Host: srv_xxx_rwl[srv_xxx_rwl] @ [10.30.xx.xx] Id: 1602496 # Schema: test_tab Last_errno: 0 Killed: 0 # Query_time: 54.673171 Lock_time: 49.894811 Rows_sent: 0 Rows_examined: 4 Rows_affected: 2 # Bytes_sent: 52 SET timestamp=1619764357; UPDATE dic_match_signup_userinfo_6 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=724493906 AND state=1; # Time: 2021-04-30T06:32:37.953099Z # User@Host: srv_xxx_rwl[srv_xxx_rwl] @ [10.30.xx.xx] Id: 299508 # Schema: test_tab Last_errno: 0 Killed: 0 # Query_time: 54.078065 Lock_time: 48.940644 Rows_sent: 0 Rows_examined: 2 Rows_affected: 1 # Bytes_sent: 52 SET timestamp=1619764357; UPDATE dic_match_signup_userinfo_6 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=749435007 AND state=1; # Time: 2021-04-30T06:32:38.378148Z
从慢查询日志中可以看到,锁等待时间比较长,那么我们来看下锁相关的信息,可以发现出现了死锁的情况,如下所示:
------------------------ LATEST DETECTED DEADLOCK ------------------------ 2021-04-30 13:58:47 0x7faff13ff700 *** (1) TRANSACTION: TRANSACTION 13826305171, ACTIVE 53 sec fetching rows mysql tables in use 3, locked 3 LOCK WAIT 9186 lock struct(s), heap size 1220816, 13 row lock(s) MySQL thread id 1152912, OS thread handle 140399163991808, query id 8222848458 10.30.xx.xx srv_xxx_rwl Searching rows for update UPDATE dic_match_signup_userinfo_7 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=749435007 AND state=1 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1376 page no 17410 n bits 320 index PRIMARY of table `test_tab`.`dic_match_signup_userinfo_7` trx id 13826305171 lock_mo de X locks rec but not gap waiting *** (2) TRANSACTION: TRANSACTION 13826304999, ACTIVE 53 sec fetching rows mysql tables in use 3, locked 3 9180 lock struct(s), heap size 1220816, 5 row lock(s) MySQL thread id 299537, OS thread handle 140393643505408, query id 8222848 122 10.30.xx.xx srv_xxxx_rwl Searching rows for update UPDATE dic_match_signup_userinfo_7 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=761512237 AND state=1 *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 1376 page no 17410 n bits 320 index PRIMARY of table `test_tab`.`dic_match_signup_userinfo_7` trx id 13826304999 lock_mo de X locks rec but not gap *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1376 page no 21758 n bits 584 index idx_state of table `test_tab`.`dic_match_signup_userinfo_7` trx id 13826304999 lock_ mode X locks rec but not gap waiting *** WE ROLL BACK TRANSACTION (2)
从上面死锁信息可以看到,产生是两个update操作导致的,按道理来说,update操作不应该会很慢,那么接下来,我们看一下sql语句的执行计划:
mysql>explain UPDATE dic_match_signup_userinfo_7 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=749435007 AND state=1\G *************************** 1. row *************************** id: 1 select_type: UPDATE table: dic_match_signup_userinfo_7 partitions: NULL type: index_merge possible_keys: idx_msid_userid,idx_state key: idx_msid_userid,idx_state key_len: 8,4 ref: NULL rows: 3269 filtered: 100.00 Extra: Using intersect(idx_msid_userid,idx_state); Using where; Using temporary 1 row in set (0.00 sec)
从执行计划中可以看到,MySQL执行计划选择了单独的2个二级索引,然后通过Using intersect算法进行index merge操作。从字面意义来上intersect就是 交集的意思。虽然性能上没多少影响,但比较好奇,在理解当中MySQL知识体系中是没有交集语法。using intersect:表示使用and的各个索引的条件时,该信息表示是从处理结果获取交集
通过查看官方文档了解到:
Using intersect方式是索引合并访问方法。一般有几种算法,在EXPLAIN输出的额外字段中显示:
Using intersect(…)
Using union(…)
Using sort_union(…)
索引合并交集算法对所有使用的索引执行同步扫描,并生成从合并索引扫描中接收到的行序列的交集。其中Using intersect 就是一种。
那么接下来看下表结构和索引结构:
mysql >show create table dic_match_signup_userinfo_7\G *************************** 1. row *************************** Table: dic_match_start_timer_trigger Create Table: CREATE TABLE `dic_match_start_timer_trigger` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键id', `msid` int(11) unsigned NOT NULL COMMENT '开赛ID', `state` int(11) unsigned NOT NULL DEFAULT '1' COMMENT '0-失效,1-生效, 2-已完成', ....... `btime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '生效时间', PRIMARY KEY (`id`), KEY `idx_msid_state` (`msid`,`state`), KEY `idx_state` (`state`) ) ENGINE=InnoDB AUTO_INCREMENT=2375237 DEFAULT CHARSET=utf8 COMMENT='xxxxx' 1 row in set (0.00 sec)
从表结构中可以看到,两个二级索引idx_msid_state和idx_state,在结合SQL语句,我们来调整一下索引结构;
解决思路
添加组合索引KEY `idx_idx_msid_userid_state` (`msid`,`userid`,`state`),
添加完成后,我们在来看下执行计划
mysql>explain UPDATE dic_match_signup_userinfo_7 SET state=0,mtime=NOW() WHERE msid=2332976 AND userid=749435007 AND state=1\G *************************** 1. row *************************** id: 1 select_type: UPDATE table: dic_match_signup_userinfo_7 partitions: NULL type: range possible_keys: idx_msid_userid,idx_state,idx_idx_msid_userid_state key: idx_idx_msid_userid_state key_len: 12 ref: const,const,const rows: 1 filtered: 100.00 Extra: Using where; Using temporary 1 row in set (0.00 sec)
接着观察一下系统的负载是否有降下来,发现添加完索引之后,CPU和实例负载就降下来了;
那么两个update操作怎么就产生了死锁的情况呢?今天就先到这里了,后面我们会继续进行分享;