原创转载请注明出处
本文使用引擎INNODB版本MYSQL5.7.13
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
也就是双1设置,
本文使用通过对mysql 进行trace,gdb 自制工具infobin,以及
select * from sys.session where command<>'SLEEP' \G完成。
这里只是涉及binlog的写入问题和innodb层无关
1、问题由来
在线上环境一次同事删除了1000W的数据,在commit的时候,整个数据库
大约有几十秒的全部hang住的情况,本文就是为了深入剖析这个问题。
2、可能需要的知识
http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么时候同步binlog文件以及它的临时文件在哪里?
http://blog.itpub.net/7728585/viewspace-2133188/ 解析MYSQL BINLOG 二进制格式(1)--准备工作
http://blog.itpub.net/7728585/viewspace-2133189/ 解析MYSQL BINLOG 二进制格式(2)--FORMAT_DESCRIPTION_EVENT
http://blog.itpub.net/7728585/viewspace-2133321/ 解析MYSQL BINLOG 二进制格式(3)--QUERY_EVENT
http://blog.itpub.net/7728585/viewspace-2133429/ 解析MYSQL BINLOG 二进制格式(4)--TABLE_MAP_EVENT
http://blog.itpub.net/7728585/viewspace-2133463/ 解析MYSQL BINLOG 二进制格式(5)--WRITE_ROW_EVENT
http://blog.itpub.net/7728585/viewspace-2133469/ 解析MYSQL BINLOG 二进制格式(6)--UPDATE_ROW_EVENT/DELETE_ROW_EVENT
http://blog.itpub.net/7728585/viewspace-2133502/ 解析MYSQL BINLOG 二进制格式(7)--Xid_log_event/XID_EVENT
http://blog.itpub.net/7728585/viewspace-2133506/ 解析MYSQL BINLOG二进制格式(8)--GTID_LOG_EVENT/ANONYMOUS_GTID_LOG_EVENT及其他
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二进制格式(9)--infobin解析binlog帮助文档
http://blog.itpub.net/7728585/viewspace-2133537/ 解析MYSQL BINLOG二进制格式(10)--问题解答
3、简单说明commit的时候做了什么以及为什么要上锁
在<>一文中有如下的图
其实这幅图中我已经写了需要一把锁,但是没仔细说,这把锁影响的是全部当前数据库的需要写binlog的全部语句,我们已经知道对于大事物(超过)的commit期间需要将临时
文件的中的binlog event 全部拷贝到binlog中,并且是一个事物的event必须是连续的,也就是一个事物的event中不能包含其他事物的event那么这期间需要一把mutex锁,为什
么要这把锁,这是在多线程/进程并发对同一个文件进行修改,也就是这里的binlog文件修改的,一种保护临界资源的方法,如果没有这把锁就会出现binlog中的文件出现错乱的情况。
注意这个锁是mysql层的而非innodb层。
4、模拟方法
对表test 进行删除不提交,观察临时文件大小。
关于生成临时文件的原理参考:
(http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么时候同步binlog文件以及它的临时文件在哪里?)
对删除delete test表commit期间马上对testsort和testsort2表插入一条数据,
这个时候可以观察到这两个插入,会被堵塞知道delete test的事物提交完成
才完成。
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 2621440 |
+----------+
1 row in set (3.14 sec)
及如下:
可以看到一个简单的insert居然花了10多秒,commit也是做了14秒。原因就在于需要将MLq9INFu这个临时文件同步在binlog这个过程
其他需要写入binlog的会话是堵塞的。
删除期间会看到临时文件不断增大如下:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 328560640 14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 346337280 14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 367812608 14680133 /tmp/MLq9INFu (deleted)
到了commit后这个文件变为0,但是任然存在因为这个会话线程还在:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 0 14680133 /tmp/MLq9INFu (deleted)
5、使用infobin 对binlog文件中进行验证
使用./infobin test.000211>log.log 进行分析
(可以使用mysqlbinlog但是不直观
工具可以参考
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二进制格式(9)--infobin解析binlog帮助文档
获得)
-----delete from test; 的binlog event
>Gtid Event:Pos:368030739(0X15efb413) N_pos:368030804(0X15efb454) Time:1487197597 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158184
-->Query Event:Pos:368030804(0X15efb454) N_Pos:368030876(0X15efb49c) Time:1487197355 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158184
---->Map Event:Pos368030876(0X15efb49c) N_pos:368031047(0X15efb547) Time:1487197355 Event_size:171(bytes)
TABLE_ID:129 DB_NAME:test TABLE_NAME:test Gno:1158184
------>Delete Event:Pos:368031047(0X15efb547) N_pos:368039196(0X15efd51c) Time:1487197355 Event_size:8149(bytes)
Dml on table: test.test table_id:129 Gno:1158184
....(中间省略)
------>Delete Event:Pos:736052116(0X2bdf4394) N_pos:736054135(0X2bdf4b77) Time:1487197355 Event_size:2019(bytes)
Dml on table: test.test table_id:129 Gno:1158184
>Xid Event:Pos:736054135(0X2bdf4b77) N_Pos:736054166(0X2bdf4b96) Time:1487197597 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158184
-----insert into testsort values(1);的binlog
>Gtid Event:Pos:736054166(0X2bdf4b96) N_pos:736054231(0X2bdf4bd7) Time:1487197599 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158185
-->Query Event:Pos:736054231(0X2bdf4bd7) N_Pos:736054303(0X2bdf4c1f) Time:1487197599 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158185
---->Map Event:Pos736054303(0X2bdf4c1f) N_pos:736054358(0X2bdf4c56) Time:1487197599 Event_size:55(bytes)
TABLE_ID:187 DB_NAME:test TABLE_NAME:testsort Gno:1158185
------>Insert Event:Pos:736054358(0X2bdf4c56) N_pos:736054410(0X2bdf4c8a) Time:1487197599 Event_size:52(bytes)
Dml on table: test.testsort table_id:187 Gno:1158185
>Xid Event:Pos:736054410(0X2bdf4c8a) N_Pos:736054441(0X2bdf4ca9) Time:1487197599 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158185
------insert into testsort2 values(1);的binlog
>Gtid Event:Pos:736054441(0X2bdf4ca9) N_pos:736054506(0X2bdf4cea) Time:1487197600 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158186
-->Query Event:Pos:736054506(0X2bdf4cea) N_Pos:736054578(0X2bdf4d32) Time:1487197600 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158186
---->Map Event:Pos736054578(0X2bdf4d32) N_pos:736054634(0X2bdf4d6a) Time:1487197600 Event_size:56(bytes)
TABLE_ID:188 DB_NAME:test TABLE_NAME:testsort2 Gno:1158186
------>Insert Event:Pos:736054634(0X2bdf4d6a) N_pos:736054686(0X2bdf4d9e) Time:1487197600 Event_size:52(bytes)
Dml on table: test.testsort2 table_id:188 Gno:1158186
>Xid Event:Pos:736054686(0X2bdf4d9e) N_Pos:736054717(0X2bdf4dbd) Time:1487197600 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158186
可以看到确实insert into testsort values(1)和insert into testsort2 values(1);的event确实在delete后才进入
binlog file
6、分析等待点。
那么到底在等待什么呢?这是另外一次测试的sys.session输出
mysql> select * from sys.session where command<>'SLEEP' \G
*************************** 1. row ***************************
thd_id: 28
conn_id: 3
user: root@localhost
db: test
command: Query
state: starting
time: 13
current_statement: commit
statement_latency: 12.55 s
progress: NULL
lock_latency: 0 ps
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 0
tmp_disk_tables: 0
full_scan: NO
last_statement: NULL
last_statement_latency: NULL
current_memory: 179.02 KiB
last_wait: wait/io/file/sql/binlog
last_wait_latency: Still Waiting
source: binlog.cc:8469
trx_latency: 3.22 m
trx_state: ACTIVE
trx_autocommit: NO
pid: 5821
program_name: mysql
*************************** 2. row ***************************
thd_id: 3031
conn_id: 3006
user: root@localhost
db: test
command: Query
state: query end
time: 12
current_statement: insert into testsort3(id1,id2, ... OR((RAND()*100000)),'gaopeng')
statement_latency: 11.82 s
progress: NULL
lock_latency: 29.18 ms
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 0
tmp_disk_tables: 0
full_scan: NO
last_statement: NULL
last_statement_latency: NULL
current_memory: 556.53 KiB
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
last_wait_latency: Still Waiting
source: binlog.cc:1894
trx_latency: 12.97 s
trx_state: ACTIVE
trx_autocommit: YES
pid: 15312
program_name: mysql
不难看出current_statement: commit 等待是
last_wait: wait/io/file/sql/binlog
last_wait_latency: Still Waiting
source: binlog.cc:8469
而被堵塞的insert等待是
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
last_wait_latency: Still Waiting
source: binlog.cc:1894
这是commit 等待的binlog.cc:8469的内容
/**
On *pure non-transactional* workloads there is a small window
in time where a concurrent rotate might be able to close
the file before the sync is actually done. In that case,
ignore the bad file descriptor errors.
Transactional workloads (InnoDB) are not affected since the
the rotation will not happen until all transactions have
committed to the storage engine, thence decreased the XID
counters.(重点)
TODO: fix this properly even for non-transactional storage
engines.
*/
if (DBUG_EVALUATE_IF("simulate_error_during_sync_binlog_file", 1,
mysql_file_sync(log_file.file,
MYF(MY_WME | MY_IGNORE_BADFD))))
从源码注释上可以看出原因再来看看他掉用int my_sync(File fd, myf my_flags)
既然加入文件描述符fd,应该就是在写binlog file
下面是insert等待的source: binlog.cc:1894的内容
mysql_cond_wait(&m_cond_done, &m_lock_done);
这个地方就是拿不到这把锁,他是Stage_manager::enroll_for调用的函数
而 Stage_manager源码注释是:
/**
Class for maintaining the commit stages for binary log group commit.
*/
而enroll_for方法的描述为:
/**
Enroll a set of sessions for a stage.
This will queue the session thread for writing and flushing.
If the thread being queued is assigned as stage leader, it will
return immediately.
If wait_if_follower is true the thread is not the stage leader,
the thread will be wait for the queue to be processed by the
leader before it returns.(重点)
In DBUG-ON version the follower marks is preempt status as ready.
@param stage Stage identifier for the queue to append to.
@param first Queue to append.
@param stage_mutex
Pointer to the currently held stage mutex, or NULL if
we're not in a stage.
@retval true Thread is stage leader.
@retval false Thread was not stage leader and processing has been done.
*/
那么确实commit后binary log确实有一个写入的策略,会根据是否为leader进行写入,
如果不是leader则等到leader全部写完才进行写入,多个等待写入binlog的会话计入
一个队列中,这貌似就是并行了。
我使用GDB将断点打到
Stage_manager::enroll_for
MYSQL_BIN_LOG::sync_binlog_file
发现整个流程如下:
A会话COMMIT-->拿到锁-->进行binlog写-->commit完成
B会话COMMIT-->等待锁--------------------------->拿到锁-->进行binlog写-->commit完成
7、如何避免
没有什么好的办法如果使用innodb双1的情况下,只能每次修改少量的数据,增加修改次数
前面也说了binlog生成量大约为你修改数据量的2/3如果update是4/3那么就要评估你的I/O
性能,如果顺序写入每秒写入为X需要修改的数据量大约为Y
那么
S(秒)=(Y*(2/3))/(X/2)
控制S到一个你认为可以接受的范围内。其实可以通过copy来试试你的顺序的I/O性能。
比如我们写这样一个存储过程:
DELIMITER //
CREATE PROCEDURE d_move(in beid int,in cunt int,in rws int,in maxid int)
begin
declare num int;
declare nowid int;
set num = 1;
set nowid = beid;
while num <= cunt do
insert into test_vest_hist select * from test_vest where id>=nowid and id<nowid+rws and="" id<nowid+rws="" id<maxid;="" ="" select="" concat(now(),'="" ',row_count(),'="" insert!'); delete from test_vest where id>=nowid and id<nowid+rws and="" id<maxid;="" ="" select="" concat(now(),'="" ',row_count(),'="" delete!'); select concat(now(),' ',num,' times delete finsh!');
set nowid = nowid+rws;
set num = num+1;
select sleep(2);
end while;
end //
将rws控制到一个可以接受的范围比如5W,当然最好进行sleep(),这样给innodb引擎一个刷入脏数据的时间减少压力。
如果sync_log不设置为1,这个我做了trace确实就不会在commit的进行binlog写入了,但是这是不安全的,commit结束
后日志不落地是不安全的。
作者微信: