When people ask about search in a Rails app, I normally suggest Ferret. Its only downside is that it can only search model data, not static content that may have been marked up by hand.

Enter Google. They index all of your content, no matter how it is generated. So why not use them for the search?

Prerequisites

Make sure you have the Hpricot gem installed:

gem update hpricot --source \
http://code.whytheluckystiff.net

Make sure you require both Hpricot and open-uri in your environment.rb as follows…


require 'hpricot'
require 'open-uri'

Controller

Generating the query string and getting the results is simple enough to do in a controller method, like so…


# Search site using Google.
def google
  @query = params[:id]
  @start = params[:start] || "0"

  # Site url as well as any other conditions you'd like.
  # I chose to ignore all of my tag cloud pages, pagination pages and date pages.
  site = 'douglasfshearer.com -"tagged with" -"posts by date" -page'

  # @start is the result offset passed to Google in the start parameter.
  uri = "http://www.google.com/search?q=#{URI.escape('site:' + site + ' ' + @query + '&start=' + @start.to_s)}"
  html_result = open(uri)
  parsed = Hpricot(html_result)

  # Parse out the number of results.
  @no_results = parsed.to_s[/<\/b> of about (\d*)<\/b> from/, 1]

  @results = (parsed/"div.g").map do |ele|
    { :title => ele.at("a").inner_text,
      :link  => ele.at("a")['href'],
      # Huge fat hack alert. Use gsub to get rid of the weird stuff around the bold statements.
      :description => (ele/("font".."font/br")).to_s.gsub(/\221/, '').gsub(/\222/, '') }
  end
end
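
The URL building step can also be pulled out of the controller so it is easy to check on its own. The sketch below is only one way to do it: `google_search_url` is a hypothetical helper name, not part of the original code, and it swaps `URI.escape` for `CGI.escape` from the standard library, since `URI.escape` was removed in Ruby 3.0. It escapes only the query value and appends `start` as a separate parameter.

```ruby
require 'cgi'

# Hypothetical helper (not from the original post): builds a site-restricted
# Google search URL. CGI.escape is swapped in for URI.escape here.
def google_search_url(site, query, start = 0)
  # Escape only the value of q; spaces become "+", ":" and quotes are percent-encoded.
  q = CGI.escape("site:#{site} #{query}")
  "http://www.google.com/search?q=#{q}&start=#{start.to_i}"
end

puts google_search_url('douglasfshearer.com -page', 'rails search', 10)
```

In the controller this would replace the interpolated `uri` string, with `@query` and `@start` passed in as arguments.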

View

A very simple view for this looks like so…

<%- if @results -%>
<h3> <%= @start.to_i+1 -%> - <%= @start.to_i+10 -%> of about <%= @no_results -%> results.</h3>

<%- @results.each do |r| -%>
	<h4><%= link_to r[:title], r[:link] -%></h4>
	<p><%= r[:description] -%></p>
<%- end -%>

<%= link_to 'Prev', :start => @start.to_i - 10 if @start.to_i >= 10 -%> | 
<%= link_to 'Next', :start => @start.to_i + 10 if @start.to_i < @no_results.to_i - 10 -%>
<%- end -%>
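
The Prev/Next conditions in the view are plain offset arithmetic, so they can be moved into helpers if the template starts to feel crowded. This is a rough sketch that mirrors the same conditions; `prev_start` and `next_start` are hypothetical names, and the page size of 10 is taken from the view above.

```ruby
# Hypothetical helpers (not from the original post) mirroring the view's
# Prev/Next conditions. Each returns the new offset, or nil when no link
# should be shown.
def prev_start(start, per_page = 10)
  start = start.to_i
  start >= per_page ? start - per_page : nil
end

def next_start(start, total, per_page = 10)
  start = start.to_i
  # total may be a String (as parsed from the page) or nil; to_i handles both.
  start < total.to_i - per_page ? start + per_page : nil
end
```

A view could then guard each link with something like `if (s = next_start(@start, @no_results))`.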

There you go. It’s just a proof of concept, is a bit dirty in places, and uses the URI :id for the query string, but it works and has a few niceties such as pagination. Go play, and report back on how you get on.

Thanks

Core code, hpricot and inspiration by _why.

Nicholas Wright for regular expressions and other random banter.

A Word Of Warning

Don’t hit Google with identical queries too often, or you may find your IP blocked for a short period of time. Mine was, so this is probably best not used for a large production app. I would think this probably breaches Google’s T&Cs too.
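
One way to cut down on repeated identical queries is to cache results for a short time. The sketch below is a minimal in-memory cache, assuming a single Ruby process; `QueryCache` is a hypothetical class that was not part of the original code, and the five minute lifetime is an arbitrary choice. It does nothing about the T&Cs question.

```ruby
# Hypothetical in-memory cache (not from the original post). Assumes a single
# process; entries are never evicted, only refreshed once they are stale.
class QueryCache
  def initialize(ttl_seconds = 300)
    @ttl = ttl_seconds
    @store = {}
  end

  # Returns the cached value for key if it is younger than the TTL.
  # Otherwise runs the block, stores its result, and returns it.
  # `now` is a parameter so the behaviour can be checked without sleeping.
  def fetch(key, now = Time.now)
    entry = @store[key]
    return entry[:value] if entry && (now - entry[:at]) < @ttl

    value = yield
    @store[key] = { :value => value, :at => now }
    value
  end
end
```

In the controller, the fetch-and-parse step could be wrapped as `cache.fetch(uri) { ... }` so that the same query and offset only reach Google once per lifetime.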
