index_merge/intersection is used when ref(const) is faster

Bug #1005898 reported by Sergey Petrunia
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
In Progress
Medium
Sergey Petrunia

Bug Description

The queries were provided by Stephane Varoqui here: http://pastebin.com/1pqidvq6

The dataset is lots.tgz, uploaded to ftp.askmonty.org/private/

EXPLAIN outputs and query time:

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| 1 | SIMPLE | lots | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5 | NULL | 3257 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
1.85 sec.

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(tsClosed) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
| 1 | SIMPLE | lots | ref | contractNumber | contractNumber | 5 | const | 28600 | Using where |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+-------------+
0.30 sec

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots ignore index(contractNumber) WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
| 1 | SIMPLE | lots | ref | tsClosed | tsClosed | 5 | const | 243422 | Using index condition; Using where |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+------------------------------------+
4.50 sec

As one can see, index_merge/intersect is about 1.85/0.30=6 times slower than ref access on contractNumber.

Changed in maria:
assignee: nobody → Sergey Petrunia (sergefp)
Revision history for this message
Sergey Petrunia (sergefp) wrote :

Let's explore the dataset: here's numbers of matching records for both parts of the WHERE:

Total rows: 2 137 152
lots.tsClosed IS NULL - 544 288 (estimate: 243 422)
contractNumber='1478876' - 30 000 (estimate: 28 600)
contractNumber='1478876' AND lots.tsClosed IS NULL - 10 000 (index_merge's estimate: 3257)

* index_merge's estimate of 3257 was obtained assuming both parts of WHERE are not correlated. In fact, they're strongly correlated.

* Estimate for "lots.tsClosed IS NULL" misses the real value by the order of two.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

If I put a breakpoint in ha_myisam::records_in_range() and return the exact number for records_in_range("lots.tsClosed IS NULL") call, index_merge will still be used:

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| 1 | SIMPLE | lots | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5 | NULL | 7283 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+

Changed in maria:
status: New → Confirmed
status: Confirmed → In Progress
Revision history for this message
Sergey Petrunia (sergefp) wrote :

In MariaDB 5.2, I get:

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876';
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | lots | ref | contractNumber | contractNumber | 5 | const | 28600 | Using where; Using index |
+----+-------------+-------+------+----------------+----------------+---------+-------+-------+--------------------------+
1 row in set (0.02 sec)

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE lots.tsClosed IS NULL;
+----+-------------+-------+------+---------------+----------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | lots | ref | tsClosed | tsClosed | 5 | const | 243422 | Using where; Using index |
+----+-------------+-------+------+---------------+----------+---------+-------+--------+--------------------------+
1 row in set (0.01 sec)

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| 1 | SIMPLE | lots | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5 | NULL | 3257 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
contractNumber: 0.30 sec
tsClosed: 4.34 sec
Index_merge: 2.00 sec

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Conclusion: MariaDB 5.2 is the same as 5.3 for this bug.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

=== Range analyzer ===
lots.tsClosed IS NULL
(gdb) print found_records
  $59 = 243422
(gdb) p cost
  $58 = {io_count = 243423, avg_io_cost = 1, cpu_cost = 48684.410000000003, mem_cost = 0, import_cost = 0}

contractNumber='1478876'
(gdb) print found_records
  $60 = 28600
(gdb) p cost
  $61 = {io_count = 28601, avg_io_cost = 1, cpu_cost = 5720.0100000000002, mem_cost = 0, import_cost = 0}

As a result, we have:
(gdb) print read_time
  $62 = 34321.010000000002
(gdb) print best_records
  $63 = 28600

=== Index Merge
T@10 : | | | | | | | | | | | | info: ROR key scans (original): tsClosed,contractNumber
T@10 : | | | | | | | | | | | | info: ROR key scans (ordered): contractNumber,tsClosed

T@10 : | | | | | | | | | | | >ror_intersect_add
T@10 : | | | | | | | | | | | | info: Current out_rows= 2.13715e+06
T@10 : | | | | | | | | | | | | info: Adding scan on contractNumber
T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
T@10 : | | | | | | | | | | | | >ror_scan_selectivity
T@10 : | | | | | | | | | | | | | info: sel_arg step
T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.0133823
T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.0133823
T@10 : | | | | | | | | | | | | <ror_scan_selectivity
(gdb) p intersect->out_rows
  $69 = 28600
(gdb) p intersect->index_records
  $70 = 28600
(gdb) p intersect->index_scan_costs
  $71 = 602.50860420650099
(gdb) p intersect->total_cost
  $72 = 17919.86261827108

T@10 : | | | | | | | | | | | >ror_intersect_add
T@10 : | | | | | | | | | | | | info: Current out_rows= 28600
T@10 : | | | | | | | | | | | | info: Adding scan on tsClosed
T@10 : | | | | | | | | | | | | info: is_cpk_scan: 0
T@10 : | | | | | | | | | | | | >ror_scan_selectivity
T@10 : | | | | | | | | | | | | | info: sel_arg step
T@10 : | | | | | | | | | | | | | info: Selectivity multiplier: 0.1139
T@10 : | | | | | | | | | | | | | info: Returning multiplier: 0.1139
T@10 : | | | | | | | | | | | | <ror_scan_selectivity
T@10 : | | | | | | | | | | | | info: ROR-intersect is covering now
(gdb) p intersect->out_rows
  $73 = 3257.5451816248915
(gdb) p intersect->index_records
  $74 = 272022
(gdb) p intersect->index_scan_costs
  $75 = 5723.2619502868065
(gdb) p intersect->total_cost
  $76 = 5723.2619502868065

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Final comparison of range-vs-index_merge:
(gdb) print min_cost ## this is index_merge
  $77 = 5723.2619502868065
(gdb) print read_time ## this is read_time
  $78 = 34321.010000000002

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Index_merge code assume conditions to be uncorrelated and "tsClosed IS NULL" adds selectivity of " Selectivity multiplier: 0.1139".

The real value of the multiplier is 0.3. If I set that, I ge this plan:

MariaDB [lots]> explain SELECT count(*) AS amount FROM lots WHERE contractNumber='1478876' AND lots.tsClosed IS NULL;
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+
| 1 | SIMPLE | lots | index_merge | tsClosed,contractNumber | contractNumber,tsClosed | 5,5 | NULL | 8580 | Using intersect(contractNumber,tsClosed); Using where; Using index |
+----+-------------+-------+-------------+-------------------------+-------------------------+---------+------+------+--------------------------------------------------------------------+

#rows is closer to reality now (8580 is closer to the real value of 10,000), but still, index_merge is chosen.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The problem seems to be the mismatch between range and index_merge costs:

Range access costs
- are calculated in handler::multi_range_read_info_const(), the formula is roughly

     handler::read_time() + #rows / TIME_FOR_COMPARE

index_merge costs:
- are calculated in opt_range.cc.
- for a 2-way intersection over a MyISAM table the formula is roughly:

  handler->keyread_time(index1, ...) + handler->keyread_time(index2, ...) + get_sweep_read_cost(...)

note that the formula only includes index costs, #rows / TIME_FOR_COMPARE was not added.

Changed in maria:
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.