Xapian vs Thinking Sphinx
I recently switched from Xapian / PHP to Sphinx / Thinking Sphinx on Ruby on Rails, using scoop’s fork on GitHub for its (undocumented) faceted classification support.
Defining the index was extremely simple. Thanks to the clean Thinking Sphinx API and its ActiveRecord integration, an index can be defined like this:
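define_index do
  # Full-text fields; :sortable also lets results be ordered by the field
  indexes subject, :sortable => true
  indexes content
  # Field pulled in through the author association, exposed as "author"
  indexes author.name, :as => :author, :sortable => true
  # Attributes, usable for filtering, sorting and grouping
  has author_id, created_at, updated_at
end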
That’s it! Just run rake ts:index, then rake ts:start and you’re ready to start sending queries.
Building an index is really fast
The old PHP-based indexer I had used several queries per document, and was already optimized to fetch attributes in chunks for several related documents in one go, which made it much faster. Without those optimizations it would probably have needed 2-10x as many queries.
Sphinx uses one massive query that joins all the tables needed and queries MySQL directly for this data. Building a full index went from ~15 minutes to ~1 minute.
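To make the difference in query counts concrete, here is a minimal plain-Ruby sketch. It does not talk to MySQL or Sphinx: the `posts` and `authors` data, the `run_query` helper and both indexing methods are hypothetical stand-ins, and each call to `run_query` simply represents one round trip to the database.

```ruby
# Toy in-memory "tables"; every name and row here is made up for illustration.
authors = { 1 => "alice", 2 => "bob" }
posts = [
  { :id => 1, :subject => "Hello", :author_id => 1 },
  { :id => 2, :subject => "World", :author_id => 2 },
  { :id => 3, :subject => "Again", :author_id => 1 }
]

# Stands for one round trip to the database: bump the counter, run the block.
def run_query(stats)
  stats[:queries] += 1
  yield
end

# Per-document strategy: one query for the ids, then two more per document.
def index_per_document(posts, authors, stats)
  ids = run_query(stats) { posts.map { |p| p[:id] } }
  ids.map do |id|
    post = run_query(stats) { posts.find { |p| p[:id] == id } }
    name = run_query(stats) { authors[post[:author_id]] }
    post.merge(:author => name)
  end
end

# Joined strategy: a single query returns every row with its author attached.
def index_joined(posts, authors, stats)
  run_query(stats) do
    posts.map { |p| p.merge(:author => authors[p[:author_id]]) }
  end
end

per_doc_stats = { :queries => 0 }
joined_stats = { :queries => 0 }
per_doc_docs = index_per_document(posts, authors, per_doc_stats)
joined_docs = index_joined(posts, authors, joined_stats)

puts "per document: #{per_doc_stats[:queries]} queries"
puts "joined: #{joined_stats[:queries]} queries"
```

Both strategies produce the same documents, but the per-document version grows by two queries for every extra row while the joined version stays at one. Real round trips cost far more than a hash lookup, which is roughly why the gap shows up so clearly in indexing time.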
Cleaner code
The clean API and some database refactoring took me from 1000+ lines of PHP code to just over 100 lines of Ruby code (plus 15 lines in the model to define the index).
Faceted search
A big win was using Sphinx’s built-in faceted classification. Even though the API is only provided in scoop’s fork and is undocumented, it was not very hard to figure out by reading the source code.
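For readers new to the idea, faceted classification boils down to counting how many results fall under each value of an attribute, so the counts can be shown next to filter links. The sketch below is plain Ruby and deliberately does not use the fork's API (which is undocumented, so nothing about it is assumed here); the `results` data and the `facet_counts` helper are hypothetical, and in a real setup the counting would be left to Sphinx rather than done in application code.

```ruby
# Hypothetical search results; in a real setup these would come from Sphinx.
results = [
  { :subject => "Indexing tips",    :author => "alice", :year => 2008 },
  { :subject => "Facets explained", :author => "bob",   :year => 2009 },
  { :subject => "More indexing",    :author => "alice", :year => 2009 }
]

# For each requested attribute, count how many results carry each value.
def facet_counts(results, attributes)
  counts = {}
  attributes.each do |attr|
    counts[attr] = Hash.new(0)
    results.each { |row| counts[attr][row[attr]] += 1 }
  end
  counts
end

facets = facet_counts(results, [:author, :year])
puts facets[:author].inspect
puts facets[:year].inspect
```

Each entry in `facets` maps an attribute to its value counts, which is the shape needed to render something like "alice (2), bob (1)" beside the search results.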