Douglas F Shearer

Site Search Using Google In Ruby On Rails


When people ask about searching in a Rails app, I normally suggest Ferret. Its only downside is that it can only search model data, not static content that may be marked up manually.

Enter Google. They index all of your content, no matter how it is generated. So why not use them for the search?

Prerequisites

Make sure you have the Hpricot gem installed:

gem install hpricot --source \
http://code.whytheluckystiff.net

You also need open-uri, which ships with Ruby. Require both in your environment.rb as follows…

require 'hpricot'
require 'open-uri'

Controller

Generating the query string and getting the results is simple enough to do in a controller method, like so…

  # Search site using Google
  def google

    @query = params[:id].to_s
    @start = params[:start] || "0"

    # Site url as well as any other conditions you'd like.
    # I chose to ignore all of my tag cloud pages, pagination pages and date pages.
    site = 'douglasfshearer.com -"tagged with" -"posts by date" -page'

    # Escape only the search terms; the start offset is its own parameter.
    uri = "http://www.google.com/search?q=#{URI.escape('site:' + site + ' ' + @query)}&start=#{@start.to_i}"

    html_result = open(uri)
    parsed = Hpricot(html_result)

    # Parse out the number of results from markup like:
    # </b> of about <b>36</b> from
    @no_results = parsed.to_s[/<\/b> of about <b>(\d*)<\/b> from/,1]

    # Each search result sits in a div with class "g".
    @results = (parsed/"div.g").map do |ele|
        {:title => ele.at("a").inner_text,
        :link => ele.at("a")['href'],

        # Huge fat hack alert. Use gsub to get rid of the weird stuff around the bold statements.
        :description => (ele/("font".."font/br")).to_s.gsub(/\221/,'').gsub(/\222/,'')}
    end
  end
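
The two string-handling steps in the controller, building the query URL and pulling out the result count, can be tried on their own without hitting Google. What follows is only a sketch: the helper names google_search_url and result_count are hypothetical and not part of the original controller, and it swaps URI.escape for CGI.escape so that characters like & in a query get escaped too. The count regex also allows commas, on the assumption that larger counts are written like 1,230.

```ruby
require 'cgi'

# Hypothetical helper: build the Google search URL for a site-restricted query.
# CGI.escape is used here (an assumption, the post uses URI.escape) so that
# reserved characters such as & and = inside the query are escaped as well.
def google_search_url(query, site, start = 0)
  terms = CGI.escape("site:#{site} #{query}")
  "http://www.google.com/search?q=#{terms}&start=#{start.to_i}"
end

# Hypothetical helper: extract the "of about <b>N</b> from" count as an Integer.
# Returns nil when the markup does not match, e.g. if Google changes its HTML.
def result_count(html)
  raw = html[/<\/b> of about <b>([\d,]+)<\/b> from/, 1]
  raw && raw.delete(',').to_i
end
```

Keeping the start offset outside the escaped string means it is always sent as a plain integer, whatever arrives in params.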

View

A very simple view for this could look like so…

<%- if @results -%>
<h3> <%= @start.to_i+1 -%> - <%= @start.to_i+10 -%> of about <%= @no_results -%> results.</h3>

<%- @results.each do |r| -%>
    <h4><%= link_to r[:title], r[:link] -%></h4>
    <p><%= r[:description] -%></p>
<%- end -%>

<%= link_to 'Prev', :start => @start.to_i - 10 if @start.to_i >= 10 -%> | 
<%= link_to 'Next', :start => @start.to_i + 10 if @start.to_i < @no_results.to_i - 10 -%>
<%- end -%>
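
The pagination arithmetic in the view can also be pulled out into a small helper, which makes the edge cases easier to check. This is a sketch only: page_window and PER_PAGE are hypothetical names, and the page size of 10 is taken from the offsets used in the view above. Unlike the view, it clamps the last shown result to the total, so a final page reads "31 - 36" rather than "31 - 40".

```ruby
# Hypothetical constant: results per page, matching the +10 / -10 in the view.
PER_PAGE = 10

# Hypothetical helper: given the current start offset and the total number of
# results (either may arrive as a String or nil), work out what to display.
def page_window(start, total, per_page = PER_PAGE)
  start = [start.to_i, 0].max
  total = total.to_i
  {
    :first      => total.zero? ? 0 : start + 1,
    :last       => [start + per_page, total].min,
    # nil means "do not render this link".
    :prev_start => start >= per_page ? start - per_page : nil,
    :next_start => start + per_page < total ? start + per_page : nil
  }
end
```

The controller could assign the result to an instance variable, and the view would then render the Prev and Next links only when the corresponding offset is not nil.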

There you go. It’s just a proof of concept, is a bit dirty in places, and uses the URI :id for the query string, but it works and has a few niceties such as pagination. Go play, and report back on how you get on.

Thanks

Core code, hpricot and inspiration by _why.

Nicholas Wright for regular expressions and other random banter.

A Word Of Warning

Don’t hit Google with identical queries too often, or you may find your IP blocked for a short period of time. Mine was, so this is probably best not used for a large production app. I would think this probably breaches Google’s T&Cs too.
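
One way to avoid sending the same query repeatedly is to remember recent results for a short time. The following is a minimal sketch, not something from the original code: QueryCache is a hypothetical class, the five minute lifetime is an arbitrary assumption, and the cache lives in memory in a single process, so it does nothing across multiple app servers and does not change the T&Cs concern.

```ruby
# Hypothetical in-memory cache: remembers the value computed for a key
# and reuses it until ttl_seconds have passed.
class QueryCache
  # clock is injectable so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds = 300, clock = lambda { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Returns the cached value for key if it is still fresh; otherwise runs
  # the block, stores its result with the current time, and returns it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && now - entry[:at] < @ttl
    value = yield
    @store[key] = { :value => value, :at => now }
    value
  end
end
```

In the controller this might wrap the request, for example SEARCH_CACHE.fetch([@query, @start]) { open(uri).read }, where SEARCH_CACHE is a hypothetical constant holding one QueryCache instance.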



April 19th 2007 00:40 | comments (1)
 

Comments


Me

July 27th 2008 20:29

Thank you. It worked.

This is the homepage of Douglas F Shearer, a software developer and mountainbike racer. Find out more at the About page.
