pt-heartbeat 2.1.8 reports big time drift with UTC_TIMESTAMP
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Fix Released
|
High
|
Daniel Nichter |
Bug Description
I just noticed pt-heartbeat is now using UTC_TIMESTAMP with the check/monitor query, for updates by older version and check by 2.1.8 or vice versa, there will be a large time delay reported. See a snippet of debug output below:
This is 2.1.8:
# $DBD::mysql:
# pt_heartbeat:4904 16322 Heartbeat row primary key: server_id = 1
# pt_heartbeat:4910 16322 SELECT 1 FROM `test`.`heartbeat` WHERE server_id='1' LIMIT 1
# pt_heartbeat:5048 16322 SELECT SQL: SELECT UNIX_TIMESTAMP(
# pt_heartbeat:5055 16322 SELECT UNIX_TIMESTAMP(
# pt_heartbeat:5057 16322 Heartbeat from server 1
# now: 1358235876
# ts: 1358217876
# skew: 0.5
# pt_heartbeat:5062 16322 Delay 17999.500000 on forge.dotmanila.com
# pt_heartbeat:5103 16322 --check and exit
# pt_heartbeat:5234 16322 Checking slave delay
# pt_heartbeat:5274 16322 Checking slave D=test,
# pt_heartbeat:5284 16322 Woke up at 2013-01-
# pt_heartbeat:5055 16322 SELECT UNIX_TIMESTAMP(
# pt_heartbeat:5057 16322 Heartbeat from server 1
# now: 1358235877
# ts: 1358217877
# skew: 0.5
# pt_heartbeat:5062 16322 Delay 17999.500000 on forge.dotmanila.com
17999.50
# pt_heartbeat:5329 16322 Disconnecting
This is 2.1.7
# $DBD::mysql:
# pt_heartbeat:4822 16328 Heartbeat row primary key: server_id = 1
# pt_heartbeat:4828 16328 SELECT 1 FROM `test`.`heartbeat` WHERE server_id='1' LIMIT 1
# pt_heartbeat:4940 16328 SELECT SQL: SELECT ts/*!50038, @@hostname AS host*/, server_id FROM `test`.`heartbeat` WHERE server_id='1' LIMIT 1
# pt_heartbeat:4947 16328 SELECT ts/*!50038, @@hostname AS host*/, server_id FROM `test`.`heartbeat` WHERE server_id='1' LIMIT 1
# pt_heartbeat:4950 16328 Heartbeat from server 1
# now: 2013-01-
# ts: 2013-01-
# skew: 0.5
# pt_heartbeat:4955 16328 Delay 0.156854 on forge.dotmanila.com
# pt_heartbeat:4996 16328 --check and exit
# pt_heartbeat:5127 16328 Checking slave delay
# pt_heartbeat:5167 16328 Checking slave D=test,
# pt_heartbeat:5177 16328 Woke up at 2013-01-
# pt_heartbeat:4947 16328 SELECT ts/*!50038, @@hostname AS host*/, server_id FROM `test`.`heartbeat` WHERE server_id='1' LIMIT 1
# pt_heartbeat:4950 16328 Heartbeat from server 1
# now: 2013-01-
# ts: 2013-01-
# skew: 0.5
# pt_heartbeat:4955 16328 Delay -0.000180 on forge.dotmanila.com
0.00
# pt_heartbeat:5222 16328 Disconnecting
This should at least be documented if not made aware of the ts offset that generated the ts on the master.
Related branches
- Brian Fraser (community): Approve
- Daniel Nichter: Approve
-
Diff: 133 lines (+46/-10)2 files modifiedbin/pt-heartbeat (+20/-8)
t/pt-heartbeat/bugs.t (+26/-2)
Changed in percona-toolkit: | |
milestone: | none → 2.1.9 |
assignee: | nobody → Daniel Nichter (daniel-nichter) |
Changed in percona-toolkit: | |
status: | Triaged → In Progress |
tags: | added: backwards-incompat |
Changed in percona-toolkit: | |
status: | In Progress → Fix Committed |
Changed in percona-toolkit: | |
status: | Fix Committed → Fix Released |
Yes this bug just bit me. This new version is no longer compatible with older version. Servers started reporting massive replication delays after update.
Instead of writting "2013-01- 16T11:58: 35.002250" to the ts column, it now writes "2013-01-16 00:58:35"