One of the more useful MySQL features is the ability to search for text using a FULLTEXT index. Currently this is only available if you use the MyISAM table type (which is the default table type, so if you don't know what table type you're using, it'll most likely be MyISAM). A fulltext index can be created for a TEXT, CHAR or VARCHAR type field, or combination of fields. We're going to create a sample table and use it to explore the various features.
The simple form of usage (the MATCH() function) is available to all MySQL servers from version 3.23.23, while the more complex usage (the IN BOOLEAN MODE modifier) is available from version 4. The first part of this article looks at the former, and the second part at the latter.
A sample table
We're going to use the following table throughout this tutorial.
CREATE TABLE fulltext_sample(copy TEXT,FULLTEXT(copy)) TYPE=MyISAM;
The TYPE=MyISAM clause isn't necessary unless you've set the default table type to be something other than MyISAM (perhaps you use InnoDB tables to make use of MySQL's transactional capabilities). Once you've created the table, populate it with some data, as follows:
INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');
If you'd already created an existing table, you can add a FULLTEXT index with the ALTER TABLE statement (as well as the CREATE INDEX statement), for example:
ALTER TABLE fulltext_sample ADD FULLTEXT(copy)
Searching for text
The syntax of a FULLTEXT search is simple. You MATCH the field AGAINST the text you are searching for, for example:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
Searches on a FULLTEXT index are performed case-insensitively (as are searches on TEXT and non-binary VARCHAR fields generally). So the following works as well:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('LOVE');
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
FULLTEXT indexes are most often used to search natural language text, such as through newspaper articles, web page contents and so on. For this reason MySQL has added a number of features to assist this kind of searching. MySQL does not index any words less than or equal to 3 characters in length, nor does it index any words that appear in more than 50% of the rows. This means that if your table contains 2 or less rows, a search on a FULLTEXT index will never return anything. In future, MySQL will make this behavior more flexible, but for now it should suit most natural language uses. If most fields in your database contain the word 'music', you probably don't want these records returned, You can use the IN BOOLEAN MODE modifier to get around the 50% threshold, as you'll see in Part 2 of this article.
Results are returned in order of relevance, from highest to lowest.
The main features
A list of the main features of a standard FULLTEXT search follows:
Excludes partial words
Excludes words less than 4 characters in length (3 or less)
Excludes words that appear in more than half the rows (meaning at least 3 rows are required)
Hyphenated words are treated as two words
Rows are returned in order of relevance, descending
Words in the stopword list (common words) are also excluded from the search results. The stopword list is based upon common English words, so if your data is used for a different purpose, you'll probably want to change the list. Unfortunately, doing so at present is not easy. You'll need to edit the file myisam/ft_static.c. recompile MySQL, and rebuild the indexes! To save you hunting through the source, or if you have a binary version of MySQL, here is a list of stopwords. Note that these can and do change with different versions. To be absolutely sure, you'll have to check the specific list for your version.
Stopwords
"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "after", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z", "zero",
Let's have a look at some of the consequences of this. If you were a bit lazy in your typing, and tried 'to look for the word 'love', as follows:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('lov');
Empty set (0.00 sec)
you'd get nothing back, as the FULLTEXT index only contains complete words, not partial words. You'd have to write the full word to get anything back, as you did in the first example.
As mentioned, hyphenated words are also excluded from the FULLTEXT index (they are indexed as separate words), so the following also returns nothing:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('all-out');
Empty set (0.00 sec)
Unfortunately, both words are less than the required 4 letters, so they don't appear on their own either, and cannot be found at all with an ordinary search. Part 2 of this tutorial looks at BOOLEAN MODE searches when you can search for partial or hyphenated words.
You can also search for more than one word at a time, by separating the words with commas. Try and return records containing the words 'here' and 'appears', as follows:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('here');
Empty set (0.01 sec)
Unexpectedly this returns nothing. But, a more careful look at the stopword list shows this word is listed, so they are excluded from the index. Stopwords are a common cause of people complaining that the MySQL FULLTEXT index facility is not working properly. If your query does return a result, then the stopword list in your version of MySQL does not contain the word 'here'.
Relevance
The following example shows how the records are returned in order of precedence:
mysql> SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+
| copy |
+---------------------------+
| A good alert |
| It appears good from here |
| An all-out alert |
+---------------------------+
The record 'A good alert' appears first, as it contains both words being searched for. You don't have to believe me - just ask MySQL to display the precedence in the results. Simply repeat the MATCH() function in the field list, as follows:
mysql> SELECT copy,MATCH(copy) AGAINST('good,alert') AS relevance
FROM fulltext_sample WHERE MATCH(copy) AGAINST('good,alert');
+---------------------------+------------------+
| copy | relevance |
+---------------------------+------------------+
| A good alert | 1.3551264824316 |
| An all-out alert | 0.68526663197496 |
| It appears good from hear | 0.67003110026735 |
+---------------------------+------------------+
The relevance calculation is fairly complex and is based upon the number of words in the index, the number of unique words in that row, the total number of words in both the index and the result, as well as the weight of the word (for the average English sentence, the word 'cool' will be weighted less, and therefore have a lower relevance, than the word 'dandy', though trends can change!). The figures may differ in your version of MySQL, as MySQL does occasionally tweak the calculation algorithms.
While the standard FULLTEXT search is fairly useful and sufficient for many, MySQL 4 takes it much further. Part 2 of this article looks at BOOLEAN FULLTEXT searches, which offer a lot more functionality.
Part 1 of this article looked at the fulltext index, and how to search on it using an ordinary MATCH() AGAINST(). Even more powerful, (although only available on the newer MySQL version 4), is the ability to do a boolean search. Part 2 of this article examines the possibilities.
You'll use the same table you used in Part 1. The full list of records is:
mysql> SELECT * FROM fulltext_sample;
+---------------------------+
| copy |
+---------------------------+
| It appears good from here |
| The here and the past |
| Why are we here |
| An all-out alert |
| All you need is love |
| A good alert |
+---------------------------+
To perform a boolean search, the IN BOOLEAN MODE modifier is used. The following query demonstrates the new syntax.
mysql> SELECT * FROM fulltext_sample
WHERE MATCH (copy) AGAINST ('love' IN BOOLEAN MODE);
+----------------------+
| copy |
+----------------------+
| All you need is love |
+----------------------+
An important difference between ordinary MATCH() AGAINST() queries and IN BOOLEAN MODE searches is that the latter does not apply the 50% limit (so the word can appear in more than half the rows).
There are a number of special operators you'll need to know in order to make the most of the boolean search. If no symbol is specified, the appearance of this word causes the relevance of the row to be higher, similar to an ordinary MATCH() AGAINST().
Boolean Search Operators
+ The word is mandatory in all rows returned.
- The word cannot appear in any row returned.
< The word that follows has a lower relevance than other words, although rows containing it will still match
> The word that follows has a higher relevance than other words.
() Used to group words into subexpressions.
~ The word following contributes negatively to the relevance of the row (which is different to the '-' operator, which specifically excludes the word, or the '<' operator, which still causes the word to contribute positively to the relevance of the row.
* The wildcard, indicating zero or more characters. It can only appear at the end of a word.
" Anything enclosed in the double quotes is taken as a whole (so you can match phrases, for example).
Examples
Let's see some of this action. The first example returns all rows containing the word 'here', but not the word 'past'.
mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy)
AGAINST ('+here -past' IN BOOLEAN MODE);
+---------------------------+
| copy |
+---------------------------+
| It appears good from here |
| Why are we here |
+---------------------------+
Compare the difference between the next two examples:
mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy)
AGAINST ('here past' IN BOOLEAN MODE);
+---------------------------+
| copy |
+---------------------------+
| It appears good from here |
| The here and the past |
| Why are we here |
+---------------------------+
mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy)
AGAINST ('here +past'IN BOOLEAN MODE);
+-----------------------+
| copy |
+-----------------------+
| The here and the past |
+-----------------------+
The first works in a similar way to an ordinary MATCH() AGAINST(), albeit without the 50% threshold. The word 'past' is mandatory in the second search (by default a word is optional), so the other two records are not returned.
The next example demonstrates a common mistake.
mysql> SELECT copy FROM fulltext_sample
WHERE MATCH(copy) AGAINST ('+are here' IN BOOLEAN MODE);
+---------------------------+
| copy |
+---------------------------+
| It appears good from here |
| The here and the past |
| Why are we here |
+---------------------------+
The results may appear surprising compared the previous example, but since 'are' contains three or less letters, it is excluded for purposes of the search and is not mandatory.
The next two examples demonstrate a powerful enhancement for searching purposes:
mysql> SELECT copy FROM fulltext_sample
WHERE MATCH(copy) AGAINST ('aler' IN BOOLEAN MODE);
Empty set (0.01 sec)
mysql> SELECT copy FROM fulltext_sample
WHERE MATCH(copy) AGAINST ('aler*' IN BOOLEAN MODE);
+------------------+
| copy |
+------------------+
| An all-out alert |
| A good alert |
+------------------+
现在应该很明白全文索引列是什么了吧~!但是你可能会遇到以下错误信息!
#1214 - The used table type doesn’t support FULLTEXT indexes
于是我到 MySQL 手册中搜索 FULLTEXT 得到一条有价值的信息:
Full-text indexes can be used only with MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.
mysql4.0竟然没有这个问题,但是使用mysql5.0的时候到出现,不过用以下办法就解决了
alter table `table` type=myisam;
如果看不懂一部分英文的朋友,我在这里简述一下:
ALTER TABLE `table` ADD FULLTEXT (row2)//全文索引列的创建
ALTER TABLE `table` DROP INDEX `row3`,ADD FULLTEXT `IndexName` (`row2`)//修改用
select * from `table` where match (row2) against ('12345'); //查询
SELECT * , MATCH (row2) AGAINST ('12345') FROM `table`
SHOW INDEXES FROM `TABLE` 显示是否已经设置
+
一个领头的加号表示,该词必须出现在每个返回的记录行中。
-
一个领头的减号表示,该词必须不出现在每个返回的记录行中。
缺省的
(当既没有加号也没有负号被指定时)词是随意的,但是包含它的记录行将被排列地更高一点。这个模仿没有 IN BOOLEAN MODE 修饰词的 MATCH() ... AGAINST() 的行为。
< >
这两个操作符用于改变一个词的相似性值的基值。< 操作符减少基值,> 操作符则增加它。参看下面的示例。
( )
圆括号用于对子表达式中的词分组。
~
一个领头的否定号的作用象一个否定操作符,引起行相似性的词的基值为负的。它对标记一个噪声词很有用。一个包含这样的词的记录将被排列得低一点,但是不会被完全的排除,因为这样可以使用 - 操作符。
*
一个星号是截断操作符。不想其它的操作符,它应该被追加到一个词后,不加在前面。
"
短语,被包围在双引号"中,只匹配包含这个短语(字面上的,就好像被键入的)的记录行。
例子:
+apple +(>pie
strudel” 要高一点
apple*
... 包含 “apple”,“apples”,“applesauce” 和 “applet”
"some words"
... 可以包含 “some words of wisdom”,但不是 “some noise words”
» 版权所有:YaoLei's Blog » MySQL中全文索引列入门 MATCH() AGAINST()
» 本文链接:https://www.yaolei.info/archives/66