近期遇到一个主从复制报错的问题,具体的报错详情如下所示:
mysql >show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: xxxxx Master_User: dba_repl Master_Port: 4306 Connect_Retry: 60 Master_Log_File: mysqlbin.001589 Read_Master_Log_Pos: 353490645 Relay_Log_File: slave-relay-bin.004764 Relay_Log_Pos: 912732994 Relay_Master_Log_File: mysqlbin.001588 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1864 Last_Error: Cannot schedule event Rows_query, relay-log name /data/mysql_4306/log/slave-relay-bin.004764, position 912733141 to Worker thread because its size 21520792 exceeds 18777088 of slave_pending_jobs_size_max. Skip_Counter: 0 Exec_Master_Log_Pos: 912732823 Relay_Log_Space: 1427234531 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1864 Last_SQL_Error: Cannot schedule event Rows_query, relay-log name /data/mysql_4306/log/slave-relay-bin.004764, position 912733141 to Worker thread because its size 21520792 exceeds 18777088 of slave_pending_jobs_size_max. Replicate_Ignore_Server_Ids: Master_Server_Id: 255 Master_UUID: 548723ca-1f7f-11e9-b3ab-005056b748c5 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 210514 16:02:41 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 548723ca-1f7f-11e9-b3ab-005056b748c5:532252-1257990582 Executed_Gtid_Set: 548723ca-1f7f-11e9-b3ab-005056b748c5:1-1257510314, 5965cc1e-42b8-11e8-9759-005056b7d9af:1-2805913654 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)
从报错信息看,和参数slave_pending_jobs_size_max有关,那么这个参数具体是干什么的呢?我们今天就来了解一下;
下面是官方文档对该参数的介绍:
For multithreaded replicas, this variable sets the maximum amount of memory (in bytes) available to worker queues holding events not yet applied. Setting this variable has no effect on replicas for which multithreading is not enabled. Setting this variable has no immediate effect. The state of the variable applies on all subsequent START SLAVE commands.
The minimum possible value for this variable is 1024; the default is 16MB. The maximum possible value is 18446744073709551615 (16 exabytes). Values that are not exact multiples of 1024 are rounded down to the next-highest multiple of 1024 prior to being stored.
The value of this variable is a soft limit and can be set to match the normal workload. If an unusually large event exceeds this size, the transaction is held until all the worker threads have empty queues, and then processed. All subsequent transactions are held until the large transaction has been completed.
大概的意思是:
对于多线程复制,slave_pending_jobs_size_max变量设置用于保存尚未应用的event的工作队列可用的最大内存量(以字节为单位)。设置此变量对未启用多线程处理的复制没有影响。设置此变量不会立即生效。必须要停掉复制之后,重新start slave。
此变量的最小值为1024;默认值为16MB。最大可能值为18446744073709551615(16 EB)。
此变量的值是软限制,可以设置为与正常工作负载匹配。如果异常大的事件超过此大小,事务将被保留,直到所有工作线程都有空队列,然后进行处理。如果内存富余,或者延迟较大时,可以适当调大;注意这个值要比主库的max_allowed_packet大!
划重点:
该参数在多线程复制中起作用,
当worker线程正在处理的event的总大小超过slave_pending_jobs_size_max变量的大小时,将发生此等待操作。此时可有在主库看到线程的状态为: Waiting for Slave Workers to free pending events
当event大小降至该限制以下时,协调器将恢复调度。仅当slave_parallel_workers设置为大于0时,才会出现此状态。
那么对该参数了解之后,具体的解决方法也就有了:
1、查看主库max_allowed_packet的大小
mysql> show variables like '%max_allowed_packet%'; +--------------------------+------------+ | Variable_name | Value | +--------------------------+------------+ | max_allowed_packet | 67108864 | | slave_max_allowed_packet | 1073741824 | +--------------------------+------------+ 2 rows in set (0.00 sec)
2、设置从库slave_pending_jobs_size_max的大小,注意,需要大于主库max_allowed_packet的大小
mysql>stop slave; Query OK, 0 rows affected (0.01 sec) mysql>set global slave_pending_jobs_size_max=1073741824; Query OK, 0 rows affected (0.00 sec) mysql>start slave; Query OK, 0 rows affected (0.02 sec)