{"id":3601,"date":"2013-10-18T13:50:29","date_gmt":"2013-10-18T21:50:29","guid":{"rendered":"http:\/\/blog.mozilla.org\/webdev\/?p=3601"},"modified":"2013-10-21T08:04:28","modified_gmt":"2013-10-21T16:04:28","slug":"full-text-search-in-air-mozilla","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/","title":{"rendered":"Full-text search in Air Mozilla with PostgreSQL"},"content":{"rendered":"<p>In a <a href=\"http:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/air-mozilla-mysql-to-postgresql\/\">previous post<\/a> I explained why and how we migrated <a href=\"https:\/\/air.mozilla.org\">Air Mozilla<\/a> to use PostgreSQL as the default database. We did this so we can leverage PostgreSQL&#8217;s powerful full-text search feature. <\/p>\n<p>First, off a tangent we go&#8230; Why not use the popular and also powerful full-text master <a href=\"http:\/\/www.elasticsearch.org\/\">ElasticSearch<\/a>? Surely, since it&#8217;s built on top of <a href=\"http:\/\/lucene.apache.org\/\">Apache Lucene<\/a> it&#8217;s bound to have some amazing full-text search and indexing features. I&#8217;m sure it does &#8212; but we don&#8217;t need them. <\/p>\n<p>All we want to do is find records whose <code>title<\/code>, <code>description<\/code> or <code>short_description<\/code> contain certain words spelled in the same <a href=\"http:\/\/en.wikipedia.org\/wiki\/Stemming\">stem<\/a>. We also want highlighting so we can display a neat search results page with the matches emphasized (something that isn&#8217;t easy to do with regular expressions in Python when the results come back). <\/p>\n<p>PostgreSQL can do all of that and it&#8217;s fast. Very fast! By far, the biggest win of using the same database we already connect the Django ORM to is that we simply don&#8217;t have to worry about indexing. Like, at all. All you do is set this up as a migration:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/peterbe\/7045097.js\"><\/script><\/p>\n<p>At the moment Air Mozilla only has English content, but some day there might be more languages. How to add indexes for different languages is pretty clear; you run the same migration as above with different languages named. <\/p>\n<p>That means that any inserts, updates or deletions automatically updates the full-text index for these columns in the database. We don&#8217;t have to worry about this at all, at any point in the ORM code. It just works!<\/p>\n<p>Now, let&#8217;s explain how the search works. A user types in a search query. E.g. <a href=\"https:\/\/air.mozilla.org\/search\/?q=community\">&#8220;community&#8221;<\/a>. <\/p>\n<p>What we want to do is to return an ORM QuerySet that:<\/p>\n<ul>\n<li>contains all events that the user is allowed to see depending privacy or publishing workflow criteria <strong>and<\/strong><\/li>\n<li>whose <code>title<\/code> or <code>short_description<\/code> or <code>description<\/code> contains the search term.<\/li>\n<\/ul>\n<p>And, we want it to be ranked based on matches in the <code>title<\/code> &#8220;higher&#8221; compared to matches in the <code>short_description<\/code> or <code>description<\/code>. So let&#8217;s add that to the filtering:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/peterbe\/7045326.js\"><\/script><\/p>\n<p>Now, that satisfies the &#8220;where part&#8221;. Next, we need to do something about the ranking, so we extend the code with this:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/peterbe\/7045422.js\"><\/script><\/p>\n<p>Last but not least, we want to let PostgreSQL work out the highlighting of matches so you can show extracts on the search result page with the matched words emphasized. So you extend <code>select<\/code> with some more code to look like this:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/peterbe\/7045472.js\"><\/script><\/p>\n<p>And there you have it. Note, that PostgreSQL inserts HTML markup into these <code>title_highlit<\/code> and <code>desc_highlit<\/code> extra annotations and it also escapes away any previous HTML so they&#8217;re safe to display in raw form in the Django template code. So it can look like this in the search results template:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/peterbe\/7045517.js\"><\/script><\/p>\n<p>In plain PostgreSQL SQL there are actually ways to &#8220;combine&#8221; the rank calculation with the &#8220;where criteria&#8221; so that you don&#8217;t have to do both the rank calculation and the where operation separately. However that&#8217;s way out of scope for the Django ORM API and even though it&#8217;s possible to achieve, the code will quickly get messy. <\/p>\n<p>So, how long does it take to do this query? On my laptop, with a snapshot of the production database containing over 600 events, that big query takes <strong>30-35 milliseconds<\/strong>. That&#8217;s fast enough.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous post I explained why and how we migrated Air Mozilla to use PostgreSQL as the default database. We did this so we can leverage PostgreSQL&#8217;s powerful full-text search feature. First, off a tangent we go&#8230; Why not &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/\">Continue reading<\/a><\/p>\n","protected":false},"author":325,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[288],"tags":[],"coauthors":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Full-text search in Air Mozilla with PostgreSQL - Mozilla Web Development<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Peter Bengtsson\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/\",\"url\":\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/\",\"name\":\"Full-text search in Air Mozilla with PostgreSQL - Mozilla Web Development\",\"isPartOf\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#website\"},\"datePublished\":\"2013-10-18T21:50:29+00:00\",\"dateModified\":\"2013-10-21T16:04:28+00:00\",\"author\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/fbadf48380da3866faf204f1d1c976ea\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.mozilla.org\/webdev\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Full-text search in Air Mozilla with PostgreSQL\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#website\",\"url\":\"https:\/\/blog.mozilla.org\/webdev\/\",\"name\":\"Mozilla Web Development\",\"description\":\"For make benefit of glorious tubes\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.mozilla.org\/webdev\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/fbadf48380da3866faf204f1d1c976ea\",\"name\":\"Peter Bengtsson\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/image\/4b8aaab120020f4d8090901be034f267\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/37d081c393f95a14e2704af38ecc4c8d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/37d081c393f95a14e2704af38ecc4c8d?s=96&d=mm&r=g\",\"caption\":\"Peter Bengtsson\"},\"description\":\"Peter is a staff web developer at Mozilla working on MDN Web Docs. He blogs on www.peterbe.com\",\"sameAs\":[\"https:\/\/www.peterbe.com\"],\"url\":\"https:\/\/blog.mozilla.org\/webdev\/author\/pbengtssonmozilla-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Full-text search in Air Mozilla with PostgreSQL - Mozilla Web Development","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/","twitter_misc":{"Written by":"Peter Bengtsson","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/","url":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/","name":"Full-text search in Air Mozilla with PostgreSQL - Mozilla Web Development","isPartOf":{"@id":"https:\/\/blog.mozilla.org\/webdev\/#website"},"datePublished":"2013-10-18T21:50:29+00:00","dateModified":"2013-10-21T16:04:28+00:00","author":{"@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/fbadf48380da3866faf204f1d1c976ea"},"breadcrumb":{"@id":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/blog.mozilla.org\/webdev\/2013\/10\/18\/full-text-search-in-air-mozilla\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.mozilla.org\/webdev\/"},{"@type":"ListItem","position":2,"name":"Full-text search in Air Mozilla with PostgreSQL"}]},{"@type":"WebSite","@id":"https:\/\/blog.mozilla.org\/webdev\/#website","url":"https:\/\/blog.mozilla.org\/webdev\/","name":"Mozilla Web Development","description":"For make benefit of glorious tubes","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.mozilla.org\/webdev\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/fbadf48380da3866faf204f1d1c976ea","name":"Peter Bengtsson","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/image\/4b8aaab120020f4d8090901be034f267","url":"https:\/\/secure.gravatar.com\/avatar\/37d081c393f95a14e2704af38ecc4c8d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/37d081c393f95a14e2704af38ecc4c8d?s=96&d=mm&r=g","caption":"Peter Bengtsson"},"description":"Peter is a staff web developer at Mozilla working on MDN Web Docs. He blogs on www.peterbe.com","sameAs":["https:\/\/www.peterbe.com"],"url":"https:\/\/blog.mozilla.org\/webdev\/author\/pbengtssonmozilla-com\/"}]}},"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts\/3601"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/users\/325"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/comments?post=3601"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts\/3601\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/media?parent=3601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/categories?post=3601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/tags?post=3601"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/coauthors?post=3601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}