PHP MySQL Database Search Engine With Snippet Highlighting
Recently I decided to write a simple search engine for my new PHP/MySQL blog content management system.
The requirements for it were as follows:
* The user should be able to type in a search term or terms and receive results on a search results page.
* These results should be in order of relevance.
* The results should have a headline link and snippet of the surrounding text that the search term keyword was found in.
* The keyword should be high lighted.
* The results list should be paginated.
The first thing I had to do was convert my already existing MySQL table, called blogentries to use the MYISAM database engine. This was necessary in order to use MySQL full text searching functionality.
ALTER TABLE blogentries ENGINE = MYISAM;
Next, I had to add a full text search capability using:
ALTER TABLE blogentries ADD FULLTEXT(summary, content, title);
Summary, content and title are the names of the table columns that I wished to search on.
The actual SQL for searching is pretty simple, once broken down into manageable chunks:
SELECT SQL_CALC_FOUND_ROWS *, UNIX_TIMESTAMP(publicationDate) AS publicationDate, MATCH (title, summary, entry) AGAINST ('+$searchTerm' IN BOOLEAN MODE) AS relevance FROM blogentries WHERE MATCH (title, summary, entry) AGAINST ('+$searchTerm' IN BOOLEAN MODE) ORDER BY relevance LIMIT $Page_Start, $per_page
This code can all be found in Blog.php class object in the function getSearchPaginate($searchTerm). This function also paginates the results list.
The SQL_CALC_FOUND_ROWS *, UNIX_TIMESTAMP(publicationDate) bit just allows us to use a date for the entry, and is not really relevant to this current discussion. The bit of interest is the MATCH (title, summary, entry) AGAINST ('+$searchTerm' IN BOOLEAN MODE), which takes the search keywords entered into the html search text input and finds matches in the three columns listed. Then we order them by a relevancy score.
According to the MySQL manual, Full-text is a “natural language search”; it indexes words that appear to represent the row, using the columns you specified. As an example, if all your rows contain “MySQL” then “MySQL” won’t match much. It’s not terribly unique, and it would return too many results. However, if “MySQL” were present in only 5% of the rows, it would return those rows because it doesn’t appear too often to be known as a keyword that’s very common.
MySQL also does something pretty useful. It creates a score.
This score is usually something like .9823475 or .124874, but always larger than zero. It can range up above 1, and I have seen it at 4 sometimes. MySQL will also order a row by its score, descending.
Two other functions in the Blog.php class are of particular interest, and these are:
excerpt($text, $phrase, $radius = 300, $ending = "...")
The first function gets an excerpt of 300 (for example) characters from a string based on a passed in phrase to match, 150 characters either side of it. It is used for high lighting search terms in the blog search results, by passing the snippet created by it into the excerpt() function.
Published on 9 April 2015 in PHP Scripts