Site Search Using Google In Ruby On Rails
Enter Google. They index all of your content, no matter how it is generated. So why not use them for the search?
Prerequisites
Make sure you have the Hpricot gem installed:
gem update hpricot --source \
  http://code.whytheluckystiff.net
Make sure you require both Hpricot and open-uri in your environment.rb as follows…
require 'hpricot'
require 'open-uri'
Controller
Generating the query string and getting the results is simple enough to do in a controller method, like so…
# Search site using Google.
def google
  @query = params[:id]
  @start = params[:start] if params[:start]
  @start ||= "0"
  # Site URL as well as any other conditions you'd like.
  # I chose to ignore all of my tag cloud pages, pagination pages and date pages.
  site = 'douglasfshearer.com -"tagged with" -"posts by date" -page'
  # Build the query string; @start is the offset of the first result.
  uri = "http://www.google.com/search?q=#{URI.escape('site:' + site + ' ' + @query + '&start=' + @start.to_s)}"
  html_result = open(uri)
  parsed = Hpricot(html_result)
  # Parse out the number of results.
  @no_results = parsed.to_s[/<\/b> of about <b>(\d*)<\/b> from/, 1]
  @results = (parsed/"div.g").map do |ele|
    {:title => ele.at("a").inner_text,
     :link => ele.at("a")['href'],
     # Huge fat hack alert. Use gsub to get rid of the weird stuff around the bold statements.
     :description => (ele/("font".."font/br")).to_s.gsub(/\221/, '').gsub(/\222/, '')}
  end
end
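The query-string step can be tried on its own, without Rails or a network call. The sketch below is only an illustration: `google_search_url` is a hypothetical helper name, and it uses `URI.encode_www_form` from the standard library rather than the `URI.escape` call in the controller, so the exact escaping differs slightly (spaces become `+`, for example).

```ruby
require 'uri'

# Hypothetical helper, not part of the controller above: builds a
# site-restricted Google search URL from the site filter, the user's
# query and the result offset.
def google_search_url(site, query, start = 0)
  params = { 'q' => "site:#{site} #{query}", 'start' => start.to_s }
  "http://www.google.com/search?#{URI.encode_www_form(params)}"
end

url = google_search_url('douglasfshearer.com -"tagged with"', 'rails', 10)
puts url
```

Keeping `start` as its own parameter, instead of gluing it onto the query text, makes it harder to escape the `&` by accident.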
View
A very simple view for this can be done like so…
<%- if @results -%>
<h3> <%= @start.to_i+1 -%> - <%= @start.to_i+10 -%> of about <%= @no_results -%> results.</h3>
<%- @results.each do |r| -%>
<h4><%= link_to r[:title], r[:link] -%></h4>
<p><%= r[:description] -%></p>
<%- end -%>
<%= link_to 'Prev', :start => @start.to_i - 10 if @start.to_i >= 10 -%> |
<%= link_to 'Next', :start => @start.to_i + 10 if @start.to_i < @no_results.to_i - 10 -%>
<%- end -%>
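The pagination arithmetic in the view is easier to check when pulled out into plain Ruby. This is a rough sketch, assuming ten results per page as in the view; `page_window` is a made-up name and is not something the view calls.

```ruby
# Hypothetical helper mirroring the arithmetic in the view above.
# start is the zero-based offset of the first result shown,
# total is the reported number of results.
def page_window(start, total, per_page = 10)
  {
    :first => start + 1,
    :last  => start + per_page,
    # nil means "do not show this link"
    :prev  => (start - per_page if start >= per_page),
    :next  => (start + per_page if start < total - per_page)
  }
end

first_page = page_window(0, 35)
last_page  = page_window(30, 35)
puts first_page.inspect
puts last_page.inspect
```

Like the view, this reports the last page of 35 results as "31 - 40". Capping `:last` with `[start + per_page, total].min` would be one way to tidy that up.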
There you go. It's just a proof of concept, is a bit dirty in places, and uses the URI :id for the query string, but it works and has a few niceties such as pagination. Go play, and report back on how you get on.
Thanks
Core code, hpricot and inspiration by _why.
Nicholas Wright for regular expressions and other random banter.
A Word Of Warning
Don't hit Google with identical queries too often, or you may find your IP blocked by Google for a short period of time. Mine was, so this is probably best not used for a large production app. I would think this probably breaches Google's T&Cs too.
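One way to avoid sending identical queries repeatedly is to remember recent results for a while. The following is a minimal in-memory sketch, not something from the code above: `QueryCache`, its ten-minute default and the `'rails|0'` key format are all assumptions made for illustration, and it would not survive a restart or be shared between server processes.

```ruby
# Hypothetical in-memory cache with a time-to-live, keyed by query string.
class QueryCache
  def initialize(ttl_seconds = 600)
    @ttl = ttl_seconds
    @store = {}
  end

  # Returns the cached value for key if it is still fresh; otherwise runs
  # the block, stores its result with a timestamp, and returns it.
  def fetch(key, now = Time.now)
    entry = @store[key]
    return entry[:value] if entry && now - entry[:at] < @ttl
    value = yield
    @store[key] = { :value => value, :at => now }
    value
  end
end

cache = QueryCache.new(600)
calls = 0
# The block stands in for the real Google request; it only runs once here.
2.times { cache.fetch('rails|0') { calls += 1; ['stub result'] } }
puts calls  # prints 1
```

In the controller, the block would wrap the `open(uri)` and parsing steps, with the query and start offset combined into the key.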