
Douglas F Shearer

Posts Tagged with ferret

There are 5 matching posts.

Ruby Gems Installation And Compilation On OpenSolaris

I’ve had a bit of trouble installing various Rubygems on my system, mostly Hpricot and Ferret which both require some code compilation. In an effort to save others the same trouble, I’ve compiled a list of tips here to make life easier…

  • Change your rbconfig.rb file in the RubyGems folder using the one created by Joyent’s Benr. This will help RubyGems find make, cc, gcc etc, which are not in their usual places in Solaris/OpenSolaris.
  • Install cc by installing Sun Studio. You have to be registered to download it, but if you downloaded OpenSolaris from Sun, you’ll already have an account.
  • Install gcc3. If you are using BlastWave to manage your packages simply run pkg-get install gcc3 to get it.
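
To check whether the build tools Ruby expects can actually be found, a sketch along these lines may help. It reads the compiler and linker commands from RbConfig::CONFIG and looks for each one on your PATH. The helper names find_executable_on_path and build_tool_report are made up for illustration, and the choice of the CC, CPP and LDSHARED keys is an assumption about what matters when compiling gems like Hpricot and Ferret.

```ruby
require 'rbconfig'

# Return the full path of an executable found on the given PATH string,
# or nil when it cannot be found. (Hypothetical helper, not part of RubyGems.)
def find_executable_on_path(name, path = ENV['PATH'].to_s)
  # An explicit path such as /opt/SUNWspro/bin/cc is checked directly.
  return name if name.include?(File::SEPARATOR) && File.executable?(name)

  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Build a report of the tools named in the Ruby build configuration.
# The keys checked here are an assumption; adjust them to taste.
def build_tool_report(config = RbConfig::CONFIG, path = ENV['PATH'].to_s)
  %w[CC CPP LDSHARED].each_with_object({}) do |key, report|
    # Entries can include flags ("gcc -E"), so keep only the command itself.
    command = config[key].to_s.split.first
    next if command.nil?

    report[key] = { :command => command,
                    :found   => find_executable_on_path(command, path) }
  end
end

build_tool_report.each do |key, info|
  status = info[:found] || 'NOT FOUND on PATH'
  puts "#{key}: #{info[:command]} -> #{status}"
end
```

Anything reported as not found is a hint that either the PATH or the rbconfig.rb entry needs fixing before a gem with native code will compile.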

Hopefully this will help some of you out.

Other Great OpenSolaris/Solaris Tips

Fellow Edinburgh Rubyist Graeme Mathieson has put together some great posts on his experiences with Solaris on his new Sun Thumper. Very lucky man!

 
 

Site Search Using Google In Ruby On Rails

When people ask about searching in a Rails app, I normally suggest Ferret. The only downside is that it can only search model data, not static content that may be marked up manually.

Enter Google. They index all of your content, no matter how it is generated. So why not use them for the search?

Prerequisites

Make sure you have the Hpricot gem installed:

gem update hpricot --source \
http://code.whytheluckystiff.net

Make sure you require both Hpricot and open-uri in your environment.rb as follows…


require 'hpricot'
require 'open-uri'

Controller

Generating the query string and getting the results is simple enough to do in a controller method, like so…

# Search site using Google
def google
  @query = params[:id]
  @start = params[:start] if params[:start]
  @start ||= "0"

  # Site url as well as any other conditions you'd like.
  # I chose to ignore all of my tag cloud pages, pagination pages and date pages.
  site = 'douglasfshearer.com -"tagged with" -"posts by date" -page'

  uri = "http://www.google.com/search?q=#{URI.escape('site:'+site+' '+@query+'&start='+@start.to_s)}"
  html_result = open(uri)
  parsed = Hpricot(html_result)

  # Parse out the number of results.
  @no_results = parsed.to_s[/<\/b> of about (\d*)<\/b> from/,1]

  @results = (parsed/"div.g").map do |ele|
    {:title => ele.at("a").inner_text,
     :link => ele.at("a")['href'],
     # Huge fat hack alert. Use gsub to get rid of the weird stuff around the bold statements.
     :description => (ele/("font".."font/br")).to_s.gsub(/\221/,'').gsub(/\222/,'')}
  end
end
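
One caveat for anyone trying this on a current Ruby: URI.escape was removed in Ruby 3.0, so the query string has to be built another way. Here is a minimal sketch using URI.encode_www_form from the standard library. The method name google_search_uri and its default arguments are hypothetical and only there for illustration; the site restriction and the q and start parameters are taken from the controller above.

```ruby
require 'uri'

# Build the Google search URL for a site-restricted query.
# (Hypothetical helper; the parameter names q and start come from the post.)
def google_search_uri(query, start = 0, site = 'douglasfshearer.com')
  params = {
    'q'     => "site:#{site} #{query}",  # restrict results to one site
    'start' => start.to_i                # offset of the first result
  }

  # encode_www_form percent-encodes each value and joins the pairs with "&".
  URI::HTTP.build(:host  => 'www.google.com',
                  :path  => '/search',
                  :query => URI.encode_www_form(params)).to_s
end

puts google_search_uri('ferret rails', 10)
```

Encoding each parameter separately also means a stray & or = in the user's query can no longer leak into the URL as an extra parameter.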

View

A very simple view for this can be written like so…

<%- if @results -%>
<h3> <%= @start.to_i+1 -%> - <%= @start.to_i+10 -%> of about <%= @no_results -%> results.</h3>

<%- @results.each do |r| -%>
	<h4><%= link_to r[:title], r[:link] -%></h4>
	<p><%= r[:description] -%></p>
<%- end -%>

<%= link_to 'Prev', :start => @start.to_i - 10 if @start.to_i >= 10 -%> | 
<%= link_to 'Next', :start => @start.to_i + 10 if @start.to_i < @no_results.to_i - 10 -%>
<%- end -%>
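
The pagination arithmetic in the view could also be pulled out into a helper. The sketch below is one way to do that, assuming ten results per page as in the view; the name page_window and the shape of the hash it returns are made up for illustration. It also caps the upper bound at the total, so the last page would read "21 - 25" rather than "21 - 30".

```ruby
RESULTS_PER_PAGE = 10

# Work out the displayed range and the prev/next offsets for a results page.
# start and total may be strings (as they arrive from params) or nil.
def page_window(start, total, per_page = RESULTS_PER_PAGE)
  start = [start.to_i, 0].max
  total = total.to_i

  {
    :first      => total.zero? ? 0 : start + 1,          # 1-based first result
    :last       => [start + per_page, total].min,        # never past the total
    :prev_start => start >= per_page ? start - per_page : nil,
    :next_start => start + per_page < total ? start + per_page : nil
  }
end

window = page_window('20', '25')
puts "#{window[:first]} - #{window[:last]}"
```

A nil in :prev_start or :next_start means the corresponding link should not be shown.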

There you go. It’s just proof of concept, is a bit dirty in places, and uses the uri :id for the query string, but it works and has a few niceties such as pagination. Go play, and report back on how you get on.

Thanks

Core code, hpricot and inspiration by _why.

Nicholas Wright for regular expressions and other random banter.

A Word Of Warning

Don’t hit Google with identical queries too often, or you may find your IP blocked for a short period of time. Mine was, so this is probably best not used for a large production app. I would think this probably breaches Google’s T&Cs too.
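
One way to cut down on identical queries is to cache results for a while. What follows is only a sketch of that idea, not something from the original app: the TimedCache class, its fetch method and the five minute lifetime are all made up for illustration, and the cache lives in memory in a single process.

```ruby
# A tiny in-memory cache whose entries expire after ttl_seconds.
# (Hypothetical class for illustration; not thread-safe.)
class TimedCache
  def initialize(ttl_seconds, clock = -> { Time.now })
    @ttl     = ttl_seconds
    @clock   = clock      # injectable so expiry can be tested without sleeping
    @entries = {}
  end

  # Return the cached value for key, or run the block and store its result.
  def fetch(key)
    now   = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @entries[key] = { :value => value, :stored_at => now }
    value
  end
end

SEARCH_CACHE = TimedCache.new(300) # assumed lifetime: five minutes

# The block stands in for the real request to Google.
results = SEARCH_CACHE.fetch(['ferret', 0]) { ['...fetched results...'] }
puts results.inspect
```

Keying on both the query and the start offset keeps each page of results separate.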

Did you like my Ruby on Rails related article? Then why not recommend me on Working with Rails.

 
 

Acts_as_ferret Returning Only 10 Results?

The acts_as_ferret gem is great! It makes it easy to use the power of the Ferret search engine (a Ruby port of Apache Lucene, which is written in Java) in a Rails application.

There is one thing about it that leaves me slightly miffed. Why does any search only return 10 items by default?

The fix is to use your search methods as follows…

If you are wanting model objects returned, then use:

objects = Model.find_by_contents('string search query', :limit=>1000000)

If you just want the IDs returned, use this:

object_ids = Model.find_id_by_contents('string search query', :limit=>1000000)

It would be rather nice if an :all option switch was available, but instead we make do with the :limit=>1000000 ‘hack’, which isn’t a problem unless you end up with more than a million results returned by a search. Unless you’re Google, this isn’t going to be an issue.

Hope this helped some folks out who were puzzled by this as I was.

Update

After looking at my logs, I’ve realised that :num_docs has been deprecated in favour of :limit. I’ve updated this post to reflect that.


 
 

Search Back Up

Thanks to Jens Kraemer for sending me a copy of the ferret_ext file I needed to get my search working. Maybe upgrading from Ubuntu Breezy to Dapper causes a few problems I hadn’t come across before. I’ll need to try a fresh install of Dapper, and see if compiling Ferret works on that.

I’m now running Capistrano, a very clever set of Rake deployment scripts for Rails. Being able to commit a revision to the live server, and revert to the previous version if it all goes wrong, is brilliant.
 
 

Finally Online

After 3 days of torment attempting to get my new Rails app online, it is now up!

My server is now running Ubuntu Dapper with Lighttpd and FastCGI serving my blog. I ran into two problems getting this to work:

  1. First up, Ferret does not install properly: the file ferret_ext appears not to be compiled, and is thus unavailable to the app. As soon as I started the server it would crash, citing the missing file. For this reason search is disabled, but will hopefully be reinstated as soon as I have solved my problems with this.
  2. The RMagick Gem wouldn’t work, and was again cited as missing by the server. I uninstalled the RMagick gem, and installed the ruby library available in the Ubuntu packages. I’ll put instructions for this in another Blog post.
Hope you like the new design, I certainly feel it was worth all the effort it took, though I’m slightly disappointed that my search isn’t working for the moment.
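
A crash like the one in the first problem above can be avoided by checking whether a library loads before relying on it. This is a minimal sketch, assuming the extension is loaded with a plain require of ferret_ext; the method name library_available? and the SEARCH_ENABLED constant are made up for illustration.

```ruby
# Try to load a library and report whether it worked, instead of
# letting a LoadError take the whole app down at startup.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Assumption: the compiled Ferret extension is loadable as 'ferret_ext'.
SEARCH_ENABLED = library_available?('ferret_ext')

puts "Search enabled: #{SEARCH_ENABLED}"
```

The app can then hide the search box when SEARCH_ENABLED is false, rather than refusing to start.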