alexcolson.com alexcolson.com [Printable Version]
Making a Simple Search Engine
Home > Tech Talk > Article

[Create New Article] :: [Browse Tags]

Modified: Friday, January 25, 2008 10:49:51 PM by Alex Colson
Tags: MySQL, Perl, Search Engine, SQL, Web Development

Search engines can add a lot of functionality to a website but can be very difficult to create.  If you have a database full of items that need to be searched and use Perl, then this might be your solution.  If not, then maybe this tutorial will give you some ideas.

This simple search engine ranks results based on how many times the terms appear in the results. This script can be expanded to include multiple tables and other advanced queries that might be required for certain applications.

Prerequisites:
  • Knowledge of Perl and SQL. 
  • MySQL database (or other SQL database) full of searchable data. 
  • DBI or other database interface.
Related Tutorials:
  • The Page System can be very useful when creating a search engine. 
  • MySQL Prep should always be used when user input is piped directly into a SQL query.
How it works:
The program connects to the database, breaks down the search string into individual terms while filtering out useless terms, builds the SQL query, and displays the results.

Step 1: Connecting to the database and other functions

use DBI;
$db = DBI->connect("DBI:mysql:database=search;host=localhost","user","password");

Remember to change the database name, hostname, username, and password.

Step 2: Breaking down the search string

while ( $input{'q'} =~ m/(\"[^\"]*\"|[^\ ]*)(?:$|\ )/g ) {
  if ($1 =~ m/^[\s]*not[\s]*$/i) {
    $method = " and not ";
  } elsif ($1 =~ m/^[\s]*or[\s]*$/i) {
    $method = " or ";
  } elsif ($1 =~ m/^[\s]*and[\s]*$/i) {
    $method = " and ";
  } else {
    if ($1) {
      $term = mysqlprep($1);
      $term =~ s/\%/\\\%/g;
      if ($term =~ m/\"(.*)\"/) {$term = $1;}
      if ($method) {
        $query .= $method;
        $meth = "";
      } else {
        if ($query) { $query .= " and ";}
        if ($order) { $order .= " + "; }
        $order .= "(length(field)-length(replace(lower(field),lower('$term'),'')))";
      }
      $query .= "(field like '%$term%')";
    }
  }
}

This code separates each term in the search string by spaces, except spaces that are located within quotes.  The code then substitutes the MySQL wildcard (%) with a backslashed wildecard and removes the terms and, not, and or to be used as operators within the query.

This allows users to search for trains not toys to return only train articles that do not reference toys.

Users can also search for trains or toys to return articles containing trains, toys, or both.

The default method is and when no other method is specified so the search of trains toys is the same as trains and toys which will return articles that reference trains and toys.

Items within quotes must be found as a string so "remote control" will only return articles that contain remote and control beside each other.

The remainder of the code sets the order portion of the query.  It uses the entire length of the article and subtracts the length of the article without the search terms.  The article that has the largest difference contains the most search terms.

Step 3: Building the SQL query

$results = $db->selectrow_arrayref("select count(field) from table where ($query);");
$resultcount = $results->[0];
$results = $db->selectall_arrayref("select field from table where ($query) order by $order limit 10 offset ".(10*($page-1)).";");

Two queries were actually run in this code.  The first query simply gets the number of results of the search.  The second query gets the results and is limited to 10.  The 11 - 20 are displayed when $page 1, 21 - 30 when $page is 2, and so on.

Step 4: Displaying the Results

print "<ol start=".(10*($page-1)+1).">";
for ($i=0;$i<@$results;$i++) {
  print "<li>$results->[$i][0]</li>";
}
print "</ol>";

Displaying the results is a simple for loop that goes through the results and lists them in an ordered list.  Notice the ordered list starts at different points depending on which page of results is being viewed.

Don't forget to display an option to go to different pages of the search results.
Sunday, September 5, 2010 10:36:14 AM EST (0.0266 s) Copyright © 2008 Alex Colson