Percona-toolkit工具包系列文章
关于死锁,相信计算机专业的同学不止在一门课里面接触过,操作系统老师讲过死锁吧,数据库系统概念的老师也讲过死锁吧。至于死锁原理,这里不赘述了,有兴趣的同学可以参考下这两篇博文(http://hedengcheng.com/?p=844,http://www.cnblogs.com/LBSer/p/5183300.html)写的非常棒,本文纯粹讲述如何使用pt-deadlock-logger这个工具来记录死锁。
有同学可能有疑问了,pt-deadlock-logger仅仅只有一个死锁记录功能,这有啥用啊?又不能避免死锁,又不能打开死锁。赫赫,这么讲吧,当线上系统出现死锁,那是相当严重的事情,业务积压,各种领导都蹦出来了,数据库咋回事,各种运维围到你身边,这是比较严重事情哦,但是死锁场景又很难抓出来,让开发修改。但是pt-deadlock-logger就帮我们解决了这个问题。
pt-deadlock-logger工具介绍
pt-deadlock-logger其实就是一个脚本连接到数据库服务器,用于探测死锁,一旦有死锁发现,就会从数据库中采集死锁相关信息,输出到你指定的地方。只要有死锁,我们就可以看到死锁SQL,死锁的线程ID等等相关信息。是不是超棒的功能呢。
模拟死锁场景
root@localhost [3306][wjq]>CREATE TABLE `wjqtab1` ( -> `id` int(11) NOT NULL AUTO_INCREMENT, -> `a` int(11) DEFAULT NULL, -> `b` int(11) DEFAULT NULL, -> PRIMARY KEY (`id`), -> KEY `idxa` (`a`) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.02 sec) root@localhost [3306][wjq]>insert into wjqtab1(a,b) values(2,7),(5,10),(6,45); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0
session1:
[session1] root@localhost [wjq]begin; Query OK, 0 rows affected (0.00 sec) [session1] root@localhost [wjq] [session1] root@localhost [wjq]delete from wjqtab1 where a=5; Query OK, 1 row affected (0.00 sec)
session2:
[session2] root@localhost [wjq] begin; Query OK, 0 rows affected (0.00 sec) [session2] root@localhost [wjq] delete from wjqtab1 where a=5; 等待。。。
session1:
[session1] root@localhost [wjq]insert into wjqtab1(a,b) values(2,30); Query OK, 1 row affected (0.00 sec)
session2:
[session2] root@localhost [wjq] delete from wjqtab1 where a=5; ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
产生了死锁,通常情况下我们通过show engine innodb status命令可以查看到死锁的信息,如下所示:
------------------------ LATEST DETECTED DEADLOCK ------------------------ 2019-03-11 18:05:50 0x7f213c188700 *** (1) TRANSACTION: TRANSACTION 350213, ACTIVE 6 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s) MySQL thread id 103064, OS thread handle 139780852061952, query id 9628 localhost root updating delete from wjqtab1 where a=5 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350213 lock_mode X waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000002; asc ;; *** (2) TRANSACTION: TRANSACTION 350212, ACTIVE 29 sec inserting mysql tables in use 1, locked 1 5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2 MySQL thread id 100428, OS thread handle 139780718888704, query id 9629 localhost root update insert into wjqtab1(a,b) values(2,30) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000002; asc ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 1093 page no 4 n bits 72 index idxa of table `wjq`.`wjqtab1` trx id 350212 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000002; asc ;; *** WE ROLL BACK TRANSACTION (1)
1、直接使用pt-deadlock-logger检测数据库中产生的死锁
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxx,S=/tmp/mysql3306.sock server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30) localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
输出参数相关说明:
server:数据库服务器地址,即死锁产生的数据库主机
ts:检测到死锁的时间戳
thread:产生死锁的线程id,这个id和show processlist里面的线程id是一致的
txn_id:innodb的事务ID
txd_time:死锁检查到前,事务执行时间
user:执行transcation的用户名
hostname:客户端主机名
ip:客户端ip
db:发生死锁的DB名
tbl:死锁发生的表名
idx:产生死锁的索引名(在上面这个demo里面, 我们直接走的主键,加的记录锁)
lock_type:锁的类型(记录锁,gap锁,next-key锁)
lock_mode:锁模式(S,X)
wait_hold:是否等着锁释放,一般死锁都是两个wait
victim:该会话是否做了牺牲,终止了执行
query:造成死锁的SQL语句
2、将死锁的信息输出到日志文件中
[root@VM_54_118_centos ~]# cat pt-deadlocks.conf #login to MySQL server# host=localhost port=3306 socket=/tmp/mysql3306.sock user=root password=qcloud@2018 charset=utf8mb4 #log to file tab log=/tmp/pt_deadlock.log daemonize interval=5
配置文件中参数说明
tab:代表,输出是带上制表符,格式化显示,更加清晰
log:指定日志文件用来存储死锁检测信息输出
daemonize:放到后台运行,守护进程模式
interval:每隔5秒检测一次
还有几个参数没有列出
run-time:pt-deadlock-logger运行时间
iterations:pt-deadlock-logger检测的次数
假如–run-time 2m –iterations 4 –interval 30,运行两分钟,每隔30s检测一次,一共检测四次
[root@VM_54_118_centos ~]# pt-deadlock-logger --config pt-deadlocks.conf [root@VM_54_118_centos ~]# tail -f /tmp/pt_deadlock.log server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30) localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
3、将检查到的死锁信息存储到表中
下面是存储死锁信息的表结构(可以提前创建也可以在使用命令的使用加上–create-dest-table自动创建)
root@localhost [3307][wjq]>CREATE TABLE deadlocks ( -> server char(20) NOT NULL, -> ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, -> thread int unsigned NOT NULL, -> txn_id bigint unsigned NOT NULL, -> txn_time smallint unsigned NOT NULL, -> user char(16) NOT NULL, -> hostname char(20) NOT NULL, -> ip char(15) NOT NULL, -- alternatively, ip int unsigned NOT NULL -> db char(64) NOT NULL, -> tbl char(64) NOT NULL, -> idx char(64) NOT NULL, -> lock_type char(16) NOT NULL, -> lock_mode char(1) NOT NULL, -> wait_hold char(1) NOT NULL, -> victim tinyint unsigned NOT NULL, -> query text NOT NULL, -> PRIMARY KEY (server,ts,thread) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.05 sec)
下面的命令会自动创建存放死锁信息的表
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks --create-dest-table
下面的命令不会自动创建存放死锁信息的表,需要提前创建
[root@VM_54_118_centos ~]# pt-deadlock-logger h=localhost,u=root,p=xxxxxxxx,S=/tmp/mysql3306.sock --dest h=localhost,u=root,p=qcloud@2018,S=/tmp/mysql3307.sock,D=wjq,t=deadlocks server ts thread txn_id txn_time user hostname ip db tbl idx lock_type lock_mode wait_hold victim query localhost 2019-03-11T18:05:50 100428 0 29 root localhost wjq wjqtab1 idxa RECORD X w 0 insert into wjqtab1(a,b) values(2,30) localhost 2019-03-11T18:05:50 103064 0 6 root localhost wjq wjqtab1 idxa RECORD X w 1 delete from wjqtab1 where a=5
查看存放死锁信息的表,查看死锁有关的信息
root@localhost [3307][wjq]>select * from deadlocks\G; *************************** 1. row *************************** server: localhost ts: 2019-03-11 18:05:50 thread: 100428 txn_id: 0 txn_time: 29 user: root hostname: localhost ip: db: wjq tbl: wjqtab1 idx: idxa lock_type: RECORD lock_mode: X wait_hold: w victim: 0 query: insert into wjqtab1(a,b) values(2,30) *************************** 2. row *************************** server: localhost ts: 2019-03-11 18:05:50 thread: 103064 txn_id: 0 txn_time: 6 user: root hostname: localhost ip: db: wjq tbl: wjqtab1 idx: idxa lock_type: RECORD lock_mode: X wait_hold: w victim: 1 query: delete from wjqtab1 where a=5 2 rows in set (0.00 sec)
关于pt-deadlock-logger的详细使用可参考官方文档:
https://www.percona.com/doc/percona-toolkit/3.0/pt-deadlock-logger.html