Inside the Minds of the Machine

APIs, WordPress Tips and Tricks

Google’s Custom Search Engine

We do not use the built-in WordPress search on any of the sites we develop. Instead, we add a filter that disables the search function entirely. There are two primary reasons for this decision:

  1. Although the WordPress search may be adequate for a small blog or portfolio, we focus on performance, ensuring our sites run fast no matter the size of the site or the amount of traffic. WordPress searches are inefficient because each query bypasses page caching and hits the database directly. On a site with a large database of posts to search through, performance really takes a hit.
  2. WordPress doesn’t search through all of a post’s metadata. By default it only searches the post title and post content fields. In the custom admins and templates we develop, we sometimes store important data in meta boxes. WordPress doesn’t search this data, so relevant posts are omitted from search results.

Using Google’s Custom Search Engine (CSE) allows us to solve both of these issues. Performance is no longer a concern: results are nearly instantaneous because all queries are handled by Google’s servers. Results are more relevant because Google’s bots scan the entire page’s content instead of selected WordPress database fields. Google is also great with misspellings: a visitor can misspell a term and Google will still most likely find what they’re looking for (WordPress doesn’t do that).

When a search is made using CSE, the search results come from the main Google search index. Google, however, allows us to fine-tune the result set that is presented to the user. We can specify which URL paths to search and set advanced filters to exclude URL paths that we don’t want returned. For example, when setting up a CSE we only want post content pages returned, since they are the primary content of a website. We exclude things like taxonomy, category, and author landing pages, as they just dilute the search results.

Google’s CSE can also be fully integrated with Google Analytics for search statistics. If you don’t make the connection to Analytics, CSE still maintains its own set of search term stats.

One drawback of CSE is that new content is not immediately available in Google’s index. We have to wait for Google to re-crawl our pages, which generally happens within a few days.

Tips

If you’re using an SEO plugin like Yoast, set meta robots on archive and author pages to “noindex, follow”. Do the same for “Noindex subpages of archives” in the “other” settings tab.

Set nocontent flags in your WordPress template wrappers by adding a “nocontent” class to page elements that are not relevant to a specific page’s content, like sidebars and footers. This will result in better search results on your custom search engine. You’ll also need to activate the nocontent flag in the Google CSE admin.
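
As a rough illustration of the tip above, here is a minimal sketch of a helper that wraps a chunk of markup in an element carrying the “nocontent” class. The function name cse_nocontent_wrap() and its parameters are our own invention for this example, not part of WordPress or Google’s tooling. In most themes you would simply add the class by hand in the template markup.

```php
<?php
// Hypothetical helper (not a WordPress or Google API): wraps markup in an
// element that carries the "nocontent" class, keeping any classes the
// element already needs for styling.
function cse_nocontent_wrap( string $html, string $tag = 'div', string $extra_classes = '' ): string {
    // Append "nocontent" to whatever classes were passed in.
    $classes = trim( $extra_classes . ' nocontent' );

    return sprintf(
        '<%1$s class="%2$s">%3$s</%1$s>',
        $tag,
        htmlspecialchars( $classes, ENT_QUOTES ),
        $html
    );
}

// Example: a sidebar that should be ignored when CSE builds its results.
echo cse_nocontent_wrap( '<p>Recent posts</p>', 'aside', 'sidebar' );
```

The same idea applies to footers, navigation, and any other wrapper that repeats on every page.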

How to activate the nocontent flag in Google CSE

In the Google CSE admin for your website’s search:

  • First, select the Advanced tab
  • Select CSE context
  • Download the XML file
  • Open the XML file in your HTML editor. Near the top of the file, find the CustomSearchEngine tag. At the end of the tag, add enable_nocontent_tag="true". It should look something like this, with a few other attributes between the start of the tag and the nocontent attribute: <CustomSearchEngine enable_nocontent_tag="true">
  • Save the file. Select upload XML file, and upload the edited file.

Note: The nocontent flag only applies to pages that were indexed with the nocontent flag already in place. If you are making these changes to an existing live site, you’ll need to wait for the pages to be re-indexed before the changes take effect.
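
If you would rather not edit the file by hand, the XML step above can be sketched in PHP. This is only an illustration under a few assumptions: the function name cse_enable_nocontent_tag() is ours, the sample XML is a stripped-down placeholder rather than a real context file, and PHP’s DOM extension needs to be available.

```php
<?php
// Hypothetical helper: adds enable_nocontent_tag="true" to the
// CustomSearchEngine element of a downloaded CSE context file.
function cse_enable_nocontent_tag( string $xml ): string {
    libxml_use_internal_errors( true ); // collect parse errors instead of printing warnings

    $doc = new DOMDocument();
    if ( ! $doc->loadXML( $xml ) ) {
        throw new InvalidArgumentException( 'The context file is not valid XML.' );
    }

    $engine = $doc->getElementsByTagName( 'CustomSearchEngine' )->item( 0 );
    if ( null === $engine ) {
        throw new InvalidArgumentException( 'No CustomSearchEngine tag found.' );
    }

    // setAttribute() overwrites an existing value, so running this twice is harmless.
    $engine->setAttribute( 'enable_nocontent_tag', 'true' );

    return $doc->saveXML();
}

// Placeholder input: a real context file has more attributes and child elements.
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<CustomSearchEngine language="en"><Title>Example</Title></CustomSearchEngine>';

echo cse_enable_nocontent_tag( $sample );
```

The result still has to be uploaded through the CSE admin as described in the last step.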

How to disable built-in WordPress search

You’ll need to add a function like this to your theme’s functions.php file. This will send any hits to http://siteurl.org/?s=search+terms to the 404 page. Note: the default query string variable for Google’s CSE is ?q=search+terms.

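function disable_search( $query, $error = true ) {
  // Leave admin screens and secondary queries alone.
  if ( is_admin() || ! $query->is_main_query() ) {
    return;
  }
  if ( $query->is_search ) {
    $query->is_search = false;
    // Array keys must be quoted strings; a bare s is an undefined constant.
    $query->query_vars['s'] = false;
    $query->query['s'] = false;
    // Send the request to the 404 template.
    if ( $error ) {
      $query->is_404 = true;
    }
  }
}

add_action( 'parse_query', 'disable_search' );
// Output no search form. create_function() was removed in PHP 8, so use a core helper as the callback.
add_filter( 'get_search_form', '__return_empty_string' );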
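
With the built-in form suppressed, the theme needs its own form that submits the q variable mentioned above. Here is a minimal sketch, assuming your CSE results are displayed on a page at /search/. That path and the function name cse_search_form() are hypothetical, so adjust them to match your own setup.

```php
<?php
// Hypothetical helper: builds a search form that submits ?q=search+terms
// to the page where the CSE results are displayed.
function cse_search_form( string $results_url, string $query = '' ): string {
    return sprintf(
        '<form role="search" method="get" action="%s">'
        . '<input type="search" name="q" value="%s">'
        . '<button type="submit">Search</button>'
        . '</form>',
        htmlspecialchars( $results_url, ENT_QUOTES ), // escape the URL for use in an attribute
        htmlspecialchars( $query, ENT_QUOTES )        // escape any pre-filled search terms
    );
}

// Example: the /search/ path is an assumption, not something CSE requires.
echo cse_search_form( '/search/' );
```

Because the field is named q rather than s, submissions never reach the disabled WordPress search.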

Integrating Google CSE into your WordPress site is super easy. Try setting one up at google.com/cse.