Xapian vs Thinking Sphinx

I recently switched from Xapian / PHP to Sphinx / Thinking Sphinx on Ruby on Rails, using scoop’s fork on GitHub for its (undocumented) faceted classification support.

Defining the index was extremely simple thanks to the clean Thinking Sphinx API and its ActiveRecord integration. Indexes can be defined like this:

define_index do
    indexes subject, :sortable => true
    indexes content
    indexes author.name, :as => :author, :sortable => true

    has author_id, created_at, updated_at
end

That’s it! Just run rake ts:index, then rake ts:start and you’re ready to start sending queries.

Building an index is really fast

The old PHP-based indexer I had used several queries per document and was optimized to fetch attributes in chunks for several related documents in one go, which made it much faster. Had there not been any optimizations, there would probably have been 2-10x as many queries.

Sphinx uses one massive query that joins all the tables needed and queries MySQL directly for this data. Building a full index went from ~15 minutes to ~1 minute.
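
To make the difference in query counts concrete, here is a rough sketch in plain Ruby. It does not talk to a database and is not taken from either indexer: the QueryCounter class, the document count, the number of related tables and the chunk size are all assumptions made up for illustration.

```ruby
# Hypothetical illustration: count how many queries each indexing
# strategy would issue. Nothing here touches a real database.
class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Stand-in for sending one SQL statement to the database.
  def query(_sql)
    @count += 1
  end
end

DOCUMENTS = 1_000      # assumed number of documents to index
RELATED_TABLES = 3     # assumed number of related tables per document
CHUNK_SIZE = 100       # assumed number of documents fetched per chunk

# Naive indexer: one query per document per related table.
def naive_indexer(db)
  DOCUMENTS.times do |id|
    RELATED_TABLES.times do |t|
      db.query("SELECT ... FROM table_#{t} WHERE document_id = #{id}")
    end
  end
  db.count
end

# Chunked indexer: one query per related table per chunk of documents.
def chunked_indexer(db)
  (0...DOCUMENTS).each_slice(CHUNK_SIZE) do |ids|
    RELATED_TABLES.times do |t|
      db.query("SELECT ... FROM table_#{t} WHERE document_id IN (#{ids.join(',')})")
    end
  end
  db.count
end

# Single-join indexer: one query that joins every table needed.
def joined_indexer(db)
  db.query("SELECT ... FROM documents LEFT JOIN ...")
  db.count
end

puts "naive:   #{naive_indexer(QueryCounter.new)} queries"
puts "chunked: #{chunked_indexer(QueryCounter.new)} queries"
puts "joined:  #{joined_indexer(QueryCounter.new)} query"
```

With these made-up numbers the naive approach issues 3000 queries, the chunked one 30, and the single join just 1. Fewer round trips to MySQL is one plausible reason for the speedup, though I have not measured where the time actually went.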

Cleaner code

The clean API and some database refactorings took me from 1000+ lines of PHP code to just over 100 lines of Ruby code (and 15 lines in the model to define the index).

A big win was using Sphinx’s built-in faceted classification. Even though the API is only provided in scoop’s fork and is undocumented, it was not very hard to figure out by reading the source code.
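
For anyone unfamiliar with the term, faceted classification boils down to counting how many matching documents fall under each value of an attribute. The plain Ruby sketch below shows the idea only. It is not the Thinking Sphinx API or the API in scoop’s fork, and the Post struct, the search and facet_counts helpers and the sample data are all invented for illustration.

```ruby
# Hypothetical data: not from the original application.
Post = Struct.new(:subject, :author, :year)

POSTS = [
  Post.new("Indexing with Sphinx", "alice", 2009),
  Post.new("Xapian bindings for PHP", "bob", 2008),
  Post.new("Sphinx attributes", "alice", 2008),
  Post.new("Sphinx facets", "carol", 2009)
]

# Return the posts whose subject contains the term (case-insensitive).
def search(posts, term)
  posts.select { |post| post.subject.downcase.include?(term.downcase) }
end

# For each requested attribute, count the matches per distinct value.
def facet_counts(matches, attributes)
  attributes.each_with_object({}) do |attribute, facets|
    counts = Hash.new(0)
    matches.each { |post| counts[post.public_send(attribute)] += 1 }
    facets[attribute] = counts
  end
end

matches = search(POSTS, "sphinx")
p facet_counts(matches, [:author, :year])
```

Three of the four sample posts match "sphinx", so the author facet comes out as alice 2 and carol 1, and the year facet as 2009 with 2 and 2008 with 1. A search engine with built-in facets returns counts like these alongside the results, which saves writing this kind of bookkeeping by hand.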

Posted on 11 May 2009 by Morgan Christiansson.