Xapian vs Thinking Sphinx
Defining the index was extremely simple. Thanks to the clean Thinking Sphinx API and its ActiveRecord integration, indexes can be defined like this:
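define_index do
  # full-text fields; subject and author can also be sorted on
  indexes subject, :sortable => true
  indexes content
  indexes author.name, :as => :author, :sortable => true

  # attributes used for filtering and sorting
  has author_id, created_at, updated_at
end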
That’s it! Just run rake ts:index to build the index, then rake ts:start to start the Sphinx search daemon, and you’re ready to start sending queries.
Building an index is really fast
The old PHP-based indexer I had used several queries per document. It was optimized to fetch attributes for several related documents in one go, in chunks, which made it much faster; without those optimizations it would probably have needed 2-10x as many queries.
Sphinx instead uses one large query that joins all the tables it needs and reads the data directly from MySQL. Building a full index went from ~15 minutes to ~1 minute.
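The difference between the three strategies is mostly the number of round trips to the database. The sketch below is a simplified illustration, not the old PHP indexer and not the SQL Thinking Sphinx actually generates: `CountingDB`, the `posts`/`authors` tables and all method names are hypothetical, the SQL in the comments is only indicative, and a single author lookup stands in for the "several queries per document" of the real indexer.

```ruby
# Hypothetical stand-in for MySQL: holds two tables and counts queries issued.
class CountingDB
  attr_reader :queries

  def initialize(posts, authors)
    @posts = posts     # array of { id:, subject:, author_id: }
    @authors = authors # { author_id => name }
    @queries = 0
  end

  # roughly: SELECT id, subject, author_id FROM posts
  def posts
    @queries += 1
    @posts
  end

  # roughly: SELECT id, name FROM authors WHERE id IN (...)
  def author_names(ids)
    @queries += 1
    @authors.slice(*ids)
  end

  # roughly: SELECT posts.id, posts.subject, authors.name
  #          FROM posts LEFT JOIN authors ON authors.id = posts.author_id
  def posts_with_authors
    @queries += 1
    @posts.map { |p| [p[:id], p[:subject], @authors[p[:author_id]]] }
  end
end

# Unoptimized: one extra lookup per document.
def index_per_document(db)
  db.posts.map do |p|
    [p[:id], p[:subject], db.author_names([p[:author_id]])[p[:author_id]]]
  end
end

# Chunked: one lookup per batch of documents.
def index_in_chunks(db, chunk_size)
  db.posts.each_slice(chunk_size).flat_map do |chunk|
    names = db.author_names(chunk.map { |p| p[:author_id] }.uniq)
    chunk.map { |p| [p[:id], p[:subject], names[p[:author_id]]] }
  end
end

# Single joined query: every row the indexer needs comes back at once.
def index_with_join(db)
  db.posts_with_authors
end
```

With 1,000 posts and chunks of 100, all three functions return the same rows while issuing 1,001, 11 and 1 queries respectively; only the number of round trips changes, not the data.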
The clean API and some database refactoring took me from 1000+ lines of PHP code to just over 100 lines of Ruby code (plus 15 lines in the model to define the index).
A big win was using Sphinx’s built-in faceted classification. Even though the API is only provided in scoop’s fork and is undocumented, it was not very hard to figure out by reading the source code.
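For readers unfamiliar with facets: a facet result is, for each chosen attribute, the number of matching documents per attribute value. The plain-Ruby sketch below only illustrates what that result contains; it is not the API from scoop’s fork and does not reflect how Sphinx computes the counts. `Hit` and `facet_counts` are hypothetical names, and `Array#to_h` with a block plus `Enumerable#tally` assume Ruby 2.7 or later.

```ruby
# Hypothetical search hit carrying the attributes we want to facet on.
Hit = Struct.new(:id, :author, :author_id)

# For each attribute, count how many hits share each value,
# e.g. { author: { "alice" => 2, "bob" => 1 } }.
def facet_counts(hits, attributes)
  attributes.to_h { |attr| [attr, hits.map(&attr).tally] }
end
```

A search page can then list each value with its count (for example "alice (2)") and let users narrow the results by filtering on that attribute.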