<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Blog of Data</title>
	<atom:link href="http://blog.mozilla.org/data/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.org/data</link>
	<description>Mozilla metrics team technical articles</description>
	<lastBuildDate>Sat, 01 Oct 2011 18:32:33 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>Comment on Migrating HBase: In the Trenches by Tom Goren</title>
		<link>http://blog.mozilla.org/data/2011/02/04/migrating-hbase-in-the-trenches/comment-page-1/#comment-3589</link>
		<dc:creator>Tom Goren</dc:creator>
		<pubDate>Sat, 01 Oct 2011 18:32:33 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=348#comment-3589</guid>
		<description><![CDATA[You guys did an awesome job with this.
You are welcome to check out my solution as well on &lt;a href=&quot;http://tech.tomgoren.com/archives/284&quot; title=&quot;hbase migration&quot; rel=&quot;nofollow&quot;&gt;my blog&lt;/a&gt;.
I was a little hesitant to copy HBase data straight from HDFS because of the same data consistency worries you raised during planning.
Instead I took a more roundabout route, and while I&#039;m sure your solution outperforms mine by far, my approach seems to require a little less manual intervention.
Also, you can divide the table into timestamp-based chunks fairly easily and batch the process.

Anyhow, thanks a lot! This is just my humble contribution; I hope it helps somebody (we had to migrate along the exact same route, 0.20.x to CDH3).]]></description>
		<content:encoded><![CDATA[<p>You guys did an awesome job with this.<br />
You are welcome to check out my solution as well on <a href="http://tech.tomgoren.com/archives/284" title="hbase migration" rel="nofollow">my blog</a>.<br />
I was a little hesitant to copy HBase data straight from HDFS because of the same data consistency worries you raised during planning.<br />
Instead I took a more roundabout route, and while I&#8217;m sure your solution outperforms mine by far, my approach seems to require a little less manual intervention.<br />
Also, you can divide the table into timestamp-based chunks fairly easily and batch the process.</p>
<p>Anyhow, thanks a lot! This is just my humble contribution; I hope it helps somebody (we had to migrate along the exact same route, 0.20.x to CDH3).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Flume, Hive and realtime indexing via ElasticSearch by Understanding DNT Adoption within Firefox &#171; Blog of Metrics</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-3552</link>
		<dc:creator>Understanding DNT Adoption within Firefox &#171; Blog of Metrics</dc:creator>
		<pubDate>Thu, 08 Sep 2011 16:56:07 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-3552</guid>
		<description><![CDATA[[...] guy lives and dies by the data. In late 2010, the metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the [...]]]></description>
		<content:encoded><![CDATA[<p>[...] guy lives and dies by the data. In late 2010, the metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How glow.mozilla.org gets its data by HBase 的应用案例 &#124; 我要去桂林_田春峰的互联网生活</title>
		<link>http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/comment-page-1/#comment-3271</link>
		<dc:creator>HBase 的应用案例 &#124; 我要去桂林_田春峰的互联网生活</dc:creator>
		<pubDate>Sat, 02 Jul 2011 12:10:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=385#comment-3271</guid>
		<description><![CDATA[[...] The first version of Glow used several technologies including SQLStream and HBase to process download request logs in real time and make them available for the site to display. If you are interested in learning the technical details behind Glow read this article from the Mozilla Metrics team, which has links to the code repositories: http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/ [...]]]></description>
		<content:encoded><![CDATA[<p>[...] The first version of Glow used several technologies including SQLStream and HBase to process download request logs in real time and make them available for the site to display. If you are interested in learning the technical details behind Glow read this article from the Mozilla Metrics team, which has links to the code repositories: <a href="http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/" rel="nofollow">http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Migrating HBase: In the Trenches by &#187; HBase Backup Options HBase.info -- All things about HBase</title>
		<link>http://blog.mozilla.org/data/2011/02/04/migrating-hbase-in-the-trenches/comment-page-1/#comment-2970</link>
		<dc:creator>&#187; HBase Backup Options HBase.info -- All things about HBase</dc:creator>
		<pubDate>Thu, 23 Jun 2011 12:10:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=348#comment-2970</guid>
		<description><![CDATA[[...] Because using distcp for cluster replication can leave the data inconsistent, Mozilla&#039;s developers built a Backup tool; for details, see their post Migrating HBase in the Trenches. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Because using distcp for cluster replication can leave the data inconsistent, Mozilla&#8217;s developers built a Backup tool; for details, see their post Migrating HBase in the Trenches. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Migrating HBase: In the Trenches by Xavier</title>
		<link>http://blog.mozilla.org/data/2011/02/04/migrating-hbase-in-the-trenches/comment-page-1/#comment-2925</link>
		<dc:creator>Xavier</dc:creator>
		<pubDate>Fri, 27 May 2011 15:44:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=348#comment-2925</guid>
		<description><![CDATA[Hi Matthias,

I updated the link. I moved the repo to our mozilla-metrics GitHub organization a couple of weeks ago. You don&#039;t actually have to fix anything in the .META. table; HBase will figure that out on its own. But you do need to copy it. As I alluded to in the post, to minimize your downtime you can use Backup to make a &quot;dirty&quot;, non-functioning copy of the data first. Then during your downtime you&#039;ll only need to copy the files that have changed.

Feel free to join our IRC channel irc.mozilla.org #metrics if you need any further clarification.

Cheers,

Xavier]]></description>
		<content:encoded><![CDATA[<p>Hi Matthias,</p>
<p>I updated the link. I moved the repo to our mozilla-metrics GitHub organization a couple of weeks ago. You don&#8217;t actually have to fix anything in the .META. table; HBase will figure that out on its own. But you do need to copy it. As I alluded to in the post, to minimize your downtime you can use Backup to make a &#8220;dirty&#8221;, non-functioning copy of the data first. Then during your downtime you&#8217;ll only need to copy the files that have changed.</p>
<p>Feel free to join our IRC channel irc.mozilla.org #metrics if you need any further clarification.</p>
<p>Cheers,</p>
<p>Xavier</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Migrating HBase: In the Trenches by Matthias</title>
		<link>http://blog.mozilla.org/data/2011/02/04/migrating-hbase-in-the-trenches/comment-page-1/#comment-2924</link>
		<dc:creator>Matthias</dc:creator>
		<pubDate>Fri, 27 May 2011 15:23:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=348#comment-2924</guid>
		<description><![CDATA[Hi Xavier,
we are also trying to copy HBase data over to a new cluster, from 0.20.4 to cdh3u0 (0.90.1). Since we don&#039;t have that much data, we could stop HBase for a short while to perform distcp. However, I am not sure how to fix the meta information referencing the old regionservers once the data moves to the new cluster. Did you come across that problem when trying distcp?

Thanks for your help,
Matthias

P.S. The link to the backup utility no longer works. Is this tool still available?]]></description>
		<content:encoded><![CDATA[<p>Hi Xavier,<br />
we are also trying to copy HBase data over to a new cluster, from 0.20.4 to cdh3u0 (0.90.1). Since we don&#8217;t have that much data, we could stop HBase for a short while to perform distcp. However, I am not sure how to fix the meta information referencing the old regionservers once the data moves to the new cluster. Did you come across that problem when trying distcp?</p>
<p>Thanks for your help,<br />
Matthias</p>
<p>P.S. The link to the backup utility no longer works. Is this tool still available?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Riak and Cassandra and HBase, oh my! by Cassandra/Riak/Dynamo Optimistic Concurrency Control</title>
		<link>http://blog.mozilla.org/data/2010/05/18/riak-and-cassandra-and-hbase-oh-my/comment-page-1/#comment-2887</link>
		<dc:creator>Cassandra/Riak/Dynamo Optimistic Concurrency Control</dc:creator>
		<pubDate>Fri, 06 May 2011 02:36:10 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=184#comment-2887</guid>
		<description><![CDATA[[...] along with installed user base, the Dynamo clones seem to have something else.&#160; They are truly web scale.&#160; I say this because, unlike virtually all other NoSQL implementations, the Dynamo-based [...]]]></description>
		<content:encoded><![CDATA[<p>[...] along with installed user base, the Dynamo clones seem to have something else.&#160; They are truly web scale.&#160; I say this because, unlike virtually all other NoSQL implementations, the Dynamo-based [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Flume, Hive and realtime indexing via ElasticSearch by Rich Kroll</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2809</link>
		<dc:creator>Rich Kroll</dc:creator>
		<pubDate>Tue, 12 Apr 2011 02:26:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2809</guid>
		<description><![CDATA[@deinspanjer I&#039;ve done some experimenting and thought you might be interested in some of the findings. I am interested in using ElasticSearch in a similar way to what you outlined in this post, but I really wanted ES to live in its own decorator and to leverage a fan-out sink to write to both ES and Hive. This would allow for additional decorators over ES or Hive (think batching).

What I ended up doing was adding a UUID/GUID to the Flume event when it is created. This is later used as the ID in ES when indexing the log event, which makes indexing an event idempotent. The drawback to this design is that each event failure causes an additional write to ES. As there is no queue in front of ES, this could be a problem if a large batch of messages fails, but it could be mitigated with other decorators.

Best,
Rich]]></description>
		<content:encoded><![CDATA[<p>@deinspanjer I&#8217;ve done some experimenting and thought you might be interested in some of the findings. I am interested in using ElasticSearch in a similar way to what you outlined in this post, but I really wanted ES to live in its own decorator and to leverage a fan-out sink to write to both ES and Hive. This would allow for additional decorators over ES or Hive (think batching).</p>
<p>What I ended up doing was adding a UUID/GUID to the Flume event when it is created. This is later used as the ID in ES when indexing the log event, which makes indexing an event idempotent. The drawback to this design is that each event failure causes an additional write to ES. As there is no queue in front of ES, this could be a problem if a large batch of messages fails, but it could be mitigated with other decorators.</p>
<p>Best,<br />
Rich</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How glow.mozilla.org gets its data by Links: Googles Python Anfängerkurs, Firefox 4 Visualisierungen, Data Science Toolkit, RSS Tuning, McLuhan &#8211; tobiaskut.de - Open Source &#124; Content Management &#124; Redaktion</title>
		<link>http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/comment-page-1/#comment-2788</link>
		<dc:creator>Links: Googles Python Anfängerkurs, Firefox 4 Visualisierungen, Data Science Toolkit, RSS Tuning, McLuhan &#8211; tobiaskut.de - Open Source &#124; Content Management &#124; Redaktion</dc:creator>
		<pubDate>Fri, 08 Apr 2011 07:01:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=385#comment-2788</guid>
		<description><![CDATA[[...] For the launch of Firefox 4, Firefox put a very impressive visualization online. At http://glow.mozilla.org you can follow the downloads in real time. Each dot represents one download, a bar chart along the bottom of the screen shows the downloads minute by minute, and anyone who wants the full detail can use the pie chart at the bottom left to break the data down from continents to city level. Now that is data visualization! Details on the technical implementation, by the way, can be found here: How glow.mozilla.org gets its data [...]]]></description>
		<content:encoded><![CDATA[<p>[...] For the launch of Firefox 4, Firefox put a very impressive visualization online. At <a href="http://glow.mozilla.org" rel="nofollow">http://glow.mozilla.org</a> you can follow the downloads in real time. Each dot represents one download, a bar chart along the bottom of the screen shows the downloads minute by minute, and anyone who wants the full detail can use the pie chart at the bottom left to break the data down from continents to city level. Now that is data visualization! Details on the technical implementation, by the way, can be found here: How glow.mozilla.org gets its data [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on How glow.mozilla.org gets its data by HBase支持的 Firefox4 下载统计页 &#124; laura&#039;s site</title>
		<link>http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/comment-page-1/#comment-2776</link>
		<dc:creator>HBase支持的 Firefox4 下载统计页 &#124; laura&#039;s site</dc:creator>
		<pubDate>Wed, 06 Apr 2011 03:32:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=385#comment-2776</guid>
		<description><![CDATA[[...] English original: How glow.mozilla.org gets its data [...]]]></description>
		<content:encoded><![CDATA[<p>[...] English original: How glow.mozilla.org gets its data [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
