Ignore INNODB_FT_DEFAULT_STOPWORD for ngram indexes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL Server | Unknown | Unknown | |
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | | |
5.5 | Invalid | Undecided | Unassigned |
5.6 | Invalid | Undecided | Unassigned |
5.7 | Fix Released | Medium | Yura Sorokin |
Bug Description
Originally reported at https:/
[5 Jan 11:19] Miguel Angel Nieto
Description:
Ngram indexes also check the stopword list, to see whether any indexed element *contains* one of the words on that list. This is the normal behaviour and it looks reasonable, but I don't think the default stopword table is suitable for use with ngram.
For example, any token that contains 'a' or 'i' will be ignored. So if you have the word "east", you cannot search for "ea" because that token has been ignored.
Ngram should have a different default list of stopwords, or an empty list.
How to repeat:
mysql> CREATE TABLE `articles` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`body` text,
PRIMARY KEY (`id`),
FULLTEXT KEY `ftx` (`body`) /*!50100 WITH PARSER `ngram` */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
mysql> insert into articles (body) values ('east');
mysql> insert into articles (body) values ('east area');
mysql> insert into articles (body) values ('east job');
mysql> insert into articles (body) values ('eastnation');
mysql> insert into articles (body) values ('eastway, try try');
mysql> SELECT * FROM articles WHERE MATCH(body) AGAINST('ea' IN BOOLEAN MODE);
Empty set (0.00 sec)
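The empty result above can be illustrated with a small, simplified model of the filtering described in the report. This is only a sketch under stated assumptions, not InnoDB code: the helper names `ngram_contains_stopword` and `count_indexed_bigrams` are hypothetical, the stopword array is a small subset of the default list chosen for illustration, the token size is assumed to be 2, and matching is done with plain byte-wise `strstr` rather than the collation-aware comparison the server would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of the default stopword list; the real table
   has more entries. */
static const char *const sample_stopwords[] = {
    "a", "as", "i", "in", "is", "it"
};
static const size_t n_sample_stopwords =
    sizeof(sample_stopwords) / sizeof(sample_stopwords[0]);

/* Hypothetical helper: returns true if the n-gram contains any of the
   sample stopwords as a substring. */
bool ngram_contains_stopword(const char *ngram)
{
    for (size_t i = 0; i < n_sample_stopwords; i++) {
        if (strstr(ngram, sample_stopwords[i]) != NULL) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: splits an ASCII word into overlapping bigrams
   and counts how many of them survive the stopword filter. */
size_t count_indexed_bigrams(const char *word)
{
    size_t len = strlen(word);
    size_t kept = 0;

    for (size_t i = 0; i + 2 <= len; i++) {
        char ngram[3] = { word[i], word[i + 1], '\0' };

        if (!ngram_contains_stopword(ngram)) {
            kept++;
        }
    }
    return kept;
}
```

Under this model, "east" splits into "ea", "as" and "st"; the first two contain "a" and are dropped, so only "st" would be indexed and a search for "ea" finds nothing.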
====
There is a workaround for this bug: create custom INNODB_
Suggested fix: either have special INNODB_
There is also code in fts_check_token:
bool
fts_check_token(
    const fts_string_t* token,
    const ib_rbt_t* stopwords,
    bool is_ngram,
    const CHARSET_INFO* cs)
{
    ut_ad(cs != NULL || stopwords == NULL);

    if (!is_ngram) {
        ib_rbt_bound_t parent;

        if (token->f_n_char < fts_min_token_size
            || token->f_n_char > fts_max_token_size
            || (stopwords != NULL
                && rbt_search(
            return(false);
        } else {
            return(true);
        }
    }

    /* Check token for ngram. */
    DBUG_EXECUTE_IF(
        "fts_instrument
        return(true);
    );
So the only job is to replace DBUG_EXECUTE_IF with some new option.
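A minimal sketch of that idea, assuming a runtime flag takes the place of the debug-only hook. Everything here is hypothetical and simplified for illustration: `ft_ignore_stopwords_sketch` stands in for whatever server option is introduced, `check_ngram_token_sketch` is not the real `fts_check_token`, and the stopword list is a plain string array instead of a red-black tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a server option; not an actual InnoDB
   variable name. */
static bool ft_ignore_stopwords_sketch = false;

/* Hypothetical, simplified ngram token check. Returns true if the
   token should be indexed. */
bool check_ngram_token_sketch(
    const char *token,
    const char *const *stopwords,
    size_t n_stopwords)
{
    /* With the option enabled (or no stopword list at all), every
       ngram token is accepted, which is what the debug hook did. */
    if (ft_ignore_stopwords_sketch || stopwords == NULL) {
        return true;
    }

    /* Otherwise reject any token that contains a stopword. */
    for (size_t i = 0; i < n_stopwords; i++) {
        if (strstr(token, stopwords[i]) != NULL) {
            return false;
        }
    }
    return true;
}
```

With the flag left at `false`, a token such as "ea" is rejected against a list containing "a"; setting the flag to `true` accepts it without changing the list itself.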
Fixed by implementing bp:innodb-fts-ngram-ignore-stopword-list (https://blueprints.launchpad.net/percona-server/+spec/innodb-fts-ngram-ignore-stopword-list).
"A new InnoDB variable to control whether InnoDB FTS should ignore stopword list" (https:/
https://github.com/percona/percona-server/pull/1988