Performance tuning: a switch to disable InnoDB deadlock detection
Note: Percona Server moved to https://jira.percona.com/projects/PS (status tracked in 5.7).

| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| 5.1 | Won't Fix | Wishlist | Unassigned | |
| 5.5 | Triaged | Wishlist | Unassigned | |
| 5.6 | Triaged | Wishlist | Unassigned | |
| 5.7 | Fix Released | Wishlist | Unassigned | |
Bug Description
Regarding InnoDB's deadlock detection mechanism, it has long been debated whether recursive deadlock checking is needed for certain special scenarios, such as many concurrent updates to the same record.
On Planet MySQL it is recommended:
“InnoDB is much faster when deadlock detection is disabled for workloads with
a lot of concurrency and contention.”
We hit exactly this scenario in one of Taobao's core applications, Item Center (IC).
Most of the time it is fine, but during special sales promotions (about once per month)
performance degrades very badly, because a large number of Taobao users participate.
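To see why detection cost explodes under contention, consider a deliberately simplified model (not InnoDB's actual implementation): every transaction waits on the same hot record lock, and each new arrival scans the existing wait queue, as a naive deadlock check would, before enqueueing itself. The work per arrival grows with the number of existing waiters, so the total work grows roughly quadratically with concurrency:

```python
# Simplified wait-for-graph model: n transactions all contend for one hot
# record lock; each arriving waiter scans every transaction already waiting
# (a naive deadlock check) before enqueueing itself.
def detection_work(n_waiters: int) -> int:
    """Total edges examined across all arrivals (sum of 0..n-1)."""
    work = 0
    queue = []
    for txn in range(n_waiters):
        work += len(queue)   # scan the whole existing wait queue
        queue.append(txn)
    return work

# Work grows ~quadratically with concurrency on the hot row.
print(detection_work(16))   # 120 edges examined
print(detection_work(700))  # 244650 edges examined
```

This rough model is consistent with the profile below, where detection work dwarfs everything else once hundreds of sessions pile up on one record.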
Here is the oprofile result (simulating the online scenario):

```
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
2008672  84.8036  lock_deadlock_
  91364   3.8573  lock_has_to_wait
  11216   0.4735  safe_mutex_lock
   9719   0.4103  ut_delay
   8047   0.3397  MYSQLparse(void*)
   7938   0.3351  lock_rec_
   7788   0.3288  code_state
   7601   0.3209  my_strnncoll_binary
   6703   0.2830  dict_col_
   6598   0.2786  _db_enter_
   6451   0.2724  _db_return_
   5733   0.2420  _db_doprnt_
   5503   0.2323  rec_get_
   5325   0.2248  ha_innobase:
   5241   0.2213  mutex_spin_wait
   4931   0.2082  build_template(
   4655   0.1965  lock_rec_
```
As you can see, lock_deadlock_ dominates the profile, so we propose a patch that allows
disabling deadlock detection dynamically. For the IC application there are almost no
deadlocks, since the business SQL logic has been tuned, so there appears to be little risk;
any residual deadlock would eventually be broken by innodb_lock_wait_timeout.
To make the scenario reproducible, a test case built from the data we hit (tweaked to mask sensitive columns) is provided below, together with the related patch. Please help review it.
Changed in percona-server: | |
assignee: | nobody → yinfeng (yinfeng-zwx) |
assignee: | yinfeng (yinfeng-zwx) → nobody |
Changed in percona-server: | |
importance: | Undecided → Medium |
importance: | Medium → Wishlist |
tags: | added: contribution |
Changed in percona-server: | |
status: | New → Incomplete |
Changed in percona-server: | |
status: | Expired → New |
tags: | added: xtradb |
tags: | added: performance |
Steps to run:
1. Unzip the deadlock.7zip archive.
2. Start the test with "sh run.sh" and collect the perf data.
3. Apply the patch, turn the switch off, and collect the result again:
```
root@(none) 03:40:21> set global innodb_deadlock_detect = off;
Query OK, 0 rows affected (0.00 sec)

root@(none) 03:40:33> show variables like '%detect%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_deadlock_detect | OFF   |
+------------------------+-------+
1 row in set (0.00 sec)
```
The test results are summarized as (before vs. after disabling deadlock detection):

- 1,000,000 queries, concurrency = 16: 2124 s vs. 1971 s
- 1,000,000 queries, concurrency = 700: 33569 s vs. 2612 s

As we can see, the patched switch greatly improves performance in the high-concurrency scenario.
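For reference, the reported timings work out to only a modest gain at low concurrency but roughly a 12.9x speedup at concurrency 700 (a quick check using the numbers above):

```python
# Speedup implied by the reported timings (detection on vs. off).
low = 2124 / 1971      # concurrency 16
high = 33569 / 2612    # concurrency 700
print(f"concurrency 16:  {low:.2f}x")   # ~1.08x
print(f"concurrency 700: {high:.2f}x")  # ~12.85x
```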