{"id":2206,"date":"2011-11-18T15:05:38","date_gmt":"2011-11-18T23:05:38","guid":{"rendered":"http:\/\/blog.mozilla.org\/webdev\/?p=2206"},"modified":"2011-11-18T15:05:38","modified_gmt":"2011-11-18T23:05:38","slug":"scrubbing-your-django-database","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/","title":{"rendered":"Scrubbing your Django database"},"content":{"rendered":"<p><em>This is the second in a <a href=\"http:\/\/blog.mozilla.org\/webdev\/2011\/10\/20\/open-sourcing-your-django-site\/\">series of posts<\/a>, focusing on issues around open sourcing your Django site and data privacy in Django.<\/em><\/p>\n<p>You&#8217;ll end up with production data in your Django database and that will likely contain different kinds of data, such as configuration data, required basic data (categories, for example), collected data and personal user data. There are a couple of reasons for taking that production data and copying it off your production servers:<\/p>\n<ul>\n<li>for developers and contributors, you want a sample copy of the app with some key data in it.<\/li>\n<li>for testing or staging servers, you might want to copy down the database from the production server so you can test certain scenarios or load.<\/li>\n<\/ul>\n<h3>Extracting parts of your database<\/h3>\n<p>For the first case, it&#8217;s nice to prepare a minimal copy of the database that contains key data. For example, for those wanting to develop or contribute to <a href=\"https:\/\/addons.mozilla.org\">addons.mozilla.org<\/a> we have <a href=\"http:\/\/micropipes.com\/blog\/2011\/03\/29\/welcome-to-the-landfill\/\">Landfill<\/a> by <a href=\"http:\/\/micropipes.com\/blog\/\">Wil Clouser<\/a>.<\/p>\n<p>Django comes with a nice facility for handling data: fixture dumping and loading. This can be used to pull data out of your database and then reload it. 
However, the built-in <a href=\"https:\/\/docs.djangoproject.com\/en\/dev\/ref\/django-admin\/#dumpdata-appname-appname-appname-model\">Django dumpdata<\/a> dumps all the records for your model (depending upon your default object manager). That might not be what you want for this scenario. A useful utility for dumping just the records you want is <a href=\"https:\/\/github.com\/davedash\/django-fixture-magic\">Django fixture magic<\/a>, written by <a href=\"http:\/\/blog.mozilla.org\/webdev\/author\/ddashmozillacom\/\">Dave Dash<\/a>.<\/p>\n<p>A standard dumpdata looks like this:<\/p>\n<pre>manage.py dumpdata users.UserProfile<\/pre>\n<p>This will dump every <code>UserProfile<\/code>. By contrast:<\/p>\n<pre>manage.py dump_object users.UserProfile 1<\/pre>\n<p>This will dump just the <code>UserProfile<\/code> with a primary key of 1. Django fixture magic also has a few other useful features, such as merging and reordering fixtures.<\/p>\n<p>This allows you to quickly trim your live database down to a small set of fixtures. Developers or contributors can then load the key parts of the database that they need from those fixtures.<\/p>\n<h3>Anonymising the database<\/h3>\n<p>Sending the production database downstream to internal developers or internal test sites is a pretty common use case. This process does not require a complete clean of the database, but it does require some cleaning of the database. If you stored credit card data, for example, you&#8217;d never want to copy that off your production database.<\/p>\n<p>At Mozilla we use an <a href=\"https:\/\/github.com\/davedash\/mysql-anonymous\">anonymising script<\/a>, also written by <a href=\"http:\/\/blog.mozilla.org\/webdev\/author\/ddashmozillacom\/\">Dave Dash<\/a>. There are a few options: to truncate, nullify, randomize or selectively delete. 
The format is a simple YAML file, for example:<\/p>\n<pre>\r\n   tables:\r\n        users:\r\n            random_email: email\r\n            nullify:\r\n                - firstname\r\n<\/pre>\n<p>This is a snippet from <a href=\"https:\/\/github.com\/mozilla\/zamboni\/blob\/master\/configs\/mysql-anonymous\/anonymize.yml\">the config file<\/a> for <a href=\"https:\/\/addons.mozilla.org\">addons.mozilla.org<\/a>.<\/p>\n<p>When IT copies the databases down from production, this script is run against the database, ensuring that when we developers access the backups to investigate certain issues, we&#8217;ll be getting the bits we want and not the bits that might expose user data.<\/p>\n<p><em>In the next blog post we&#8217;ll look at logs and tracebacks.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the second in a series of posts, focusing on issues around open sourcing your Django site and data privacy in Django. You&#8217;ll end up with production data in your Django database and that will likely contain different kinds &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/\">Continue reading<\/a><\/p>\n","protected":false},"author":271,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[288],"tags":[553],"coauthors":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Scrubbing your Django database - Mozilla Web Development<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andy McKay\" \/>\n\t<meta 
name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/\",\"url\":\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/\",\"name\":\"Scrubbing your Django database - Mozilla Web Development\",\"isPartOf\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#website\"},\"datePublished\":\"2011-11-18T23:05:38+00:00\",\"dateModified\":\"2011-11-18T23:05:38+00:00\",\"author\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/7e1881db0e8a23a4a06695f8a0efd6b8\"},\"breadcrumb\":{\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.mozilla.org\/webdev\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scrubbing your Django database\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#website\",\"url\":\"https:\/\/blog.mozilla.org\/webdev\/\",\"name\":\"Mozilla Web Development\",\"description\":\"For make benefit of glorious tubes\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.mozilla.org\/webdev\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/7e1881db0e8a23a4a06695f8a0efd6b8\",\"name\":\"Andy McKay\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/image\/96eb032e0f9fa78d076a49a55bf3cd09\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ad304e7a7d4f6fba05a81b10810fe6fd?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ad304e7a7d4f6fba05a81b10810fe6fd?s=96&d=mm&r=g\",\"caption\":\"Andy McKay\"},\"description\":\"Andy is an Engineering Manager at Mozilla. As a Canadian he tweets and blogs about curling, skiing, politics, maple syrup, bears and all things from the great white north.\",\"sameAs\":[\"http:\/\/mckay.pub\",\"https:\/\/x.com\/andymckay\"],\"url\":\"https:\/\/blog.mozilla.org\/webdev\/author\/amckaymozilla-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Scrubbing your Django database - Mozilla Web Development","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/","twitter_misc":{"Written by":"Andy McKay","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/","url":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/","name":"Scrubbing your Django database - Mozilla Web Development","isPartOf":{"@id":"https:\/\/blog.mozilla.org\/webdev\/#website"},"datePublished":"2011-11-18T23:05:38+00:00","dateModified":"2011-11-18T23:05:38+00:00","author":{"@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/7e1881db0e8a23a4a06695f8a0efd6b8"},"breadcrumb":{"@id":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/blog.mozilla.org\/webdev\/2011\/11\/18\/scrubbing-your-django-database\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.mozilla.org\/webdev\/"},{"@type":"ListItem","position":2,"name":"Scrubbing your Django database"}]},{"@type":"WebSite","@id":"https:\/\/blog.mozilla.org\/webdev\/#website","url":"https:\/\/blog.mozilla.org\/webdev\/","name":"Mozilla Web Development","description":"For make benefit of glorious tubes","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.mozilla.org\/webdev\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/7e1881db0e8a23a4a06695f8a0efd6b8","name":"Andy 
McKay","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/webdev\/#\/schema\/person\/image\/96eb032e0f9fa78d076a49a55bf3cd09","url":"https:\/\/secure.gravatar.com\/avatar\/ad304e7a7d4f6fba05a81b10810fe6fd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ad304e7a7d4f6fba05a81b10810fe6fd?s=96&d=mm&r=g","caption":"Andy McKay"},"description":"Andy is an Engineering Manager at Mozilla. As a Canadian he tweets and blogs about curling, skiing, politics, maple syrup, bears and all things from the great white north.","sameAs":["http:\/\/mckay.pub","https:\/\/x.com\/andymckay"],"url":"https:\/\/blog.mozilla.org\/webdev\/author\/amckaymozilla-com\/"}]}},"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts\/2206"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/users\/271"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/comments?post=2206"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/posts\/2206\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/media?parent=2206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/categories?post=2206"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/tags?post=2206"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/webdev\/wp-json\/wp\/v2\/coauthors?post=2206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}