Got Data? Need Search? Use Sphinx

James Socol

1

So you’ve built your web application, and people are putting data in it, but how do they get that data back out? Often, they find data by searching. MySQL provides a searchable full-text index, but it has several limitations:

  • FULLTEXT indexes are only available on MyISAM tables.
  • Adding FULLTEXT indexes can slow down INSERTs and UPDATEs.
  • FULLTEXT indexes can span multiple columns, but not multiple tables of data.
  • Searching can increase the load on your database server.

Fortunately, a search system, like Sphinx, allows you to overcome these limitations and provide powerful, flexible, and fast search solutions.

Sphinx, in particular, is easy to set up and use. It powers searches on both Addons and Support. We honestly didn’t shop around, because we found Sphinx did what we needed very well.

Because Sphinx is external to MySQL, it doesn’t care about things like what table engine you use, so you can perform fast, full-text searches on InnoDB tables. It also takes the load of searching off the database, leaving it free to retrieve all the documents your users can find. And because Sphinx finds its data via SQL queries, indexes can collate data in several tables.

How fast does Sphinx index data? For both Addons and Support, indexing is under two minutes, which includes on the order of 10^4 documents on Addons, and on the order of 10^5 on Support. This speed means we could easily double or triple the quantity of data before we’d need to rethink our search strategy.

Sphinx typically collects data by running SQL queries, so if you can write a query to get your data–possibly joining data from several tables, or small, extra queries for many-to-many relationships like tags–you’re done. If your data isn’t in a SQL database, you can feed it data in XML. If you have lots of data, or lots of old data that doesn’t change, you can break this into smaller indexes, or just look at what’s changed, and merge the data together.

The index includes all the text you select and several attributes you can define, which you can use to filter results, or to retrieve the document from the database.

Running searches from your application is very flexible. PHP, Python, Ruby, and others have ports of the search API. You can also search via a SQL interface. There’s even a command line search binary for testing. The API can do some pretty advanced things, like sorting, filtering, settings offsets and result limits, or even building excerpts for your results.

Search is an important part of data-driven web applications, and Sphinx is a powerful ally in providing search capabilities. If you’ve been relying on FULLTEXT indexes, I encourage you to try a search system like Sphinx. And if you need help, come find us in #webdev on irc.mozilla.org.

One response

  1. Artem Russakovskii wrote on ::

    In my experience, Solr provides for better search experience. Check out this article: Solr vs Sphinx: http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight/