<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Flume, Hive and realtime indexing via ElasticSearch</title>
	<atom:link href="http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/</link>
	<description>Mozilla metrics team technical articles</description>
	<lastBuildDate>Sat, 01 Oct 2011 18:32:33 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: Understanding DNT Adoption within Firefox &#171; Blog of Metrics</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-3552</link>
		<dc:creator>Understanding DNT Adoption within Firefox &#171; Blog of Metrics</dc:creator>
		<pubDate>Thu, 08 Sep 2011 16:56:07 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-3552</guid>
		<description><![CDATA[[...] guy lives and dies by the data. In late 2010, the metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the [...]]]></description>
		<content:encoded><![CDATA[<p>[...] guy lives and dies by the data. In late 2010, the metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rich Kroll</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2809</link>
		<dc:creator>Rich Kroll</dc:creator>
		<pubDate>Tue, 12 Apr 2011 02:26:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2809</guid>
		<description><![CDATA[@deinspanjer I&#039;ve done some experimenting and thought you might be interested in some of the findings.  I am interested in using ElasticSearch in a way similar to the one you outlined in this post, but I really wanted ES to live in its own decorator and to leverage a fan out sink to write to both ES and Hive.  This would allow for additional decorators over ES or Hive (think batching).

What I ended up doing was to add a UUID/GUID to the Flume event when it is created. This is later used as the ID in ES when indexing the log event, which makes indexing an event idempotent.  The drawback to this design is that each event failure causes an additional write to ES.  As there is no queue in front of ES, this could be a problem in the event of a large batch of failed messages, but it could be mitigated with other decorators.

Best,
Rich]]></description>
		<content:encoded><![CDATA[<p>@deinspanjer I&#8217;ve done some experimenting and thought you might be interested in some of the findings.  I am interested in using ElasticSearch in a way similar to the one you outlined in this post, but I really wanted ES to live in its own decorator and to leverage a fan out sink to write to both ES and Hive.  This would allow for additional decorators over ES or Hive (think batching).</p>
<p>What I ended up doing was to add a UUID/GUID to the Flume event when it is created. This is later used as the ID in ES when indexing the log event, which makes indexing an event idempotent.  The drawback to this design is that each event failure causes an additional write to ES.  As there is no queue in front of ES, this could be a problem in the event of a large batch of failed messages, but it could be mitigated with other decorators.</p>
<p>Best,<br />
Rich</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Orange Factor 2: The redesign &#124; 3.1415926535897932384626433&#8230;</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2656</link>
		<dc:creator>Orange Factor 2: The redesign &#124; 3.1415926535897932384626433&#8230;</dc:creator>
		<pubDate>Tue, 08 Mar 2011 17:29:08 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2656</guid>
		<description><![CDATA[[...] when you star an orange, the data is pushed into our database.  This database is hosted by the metrics team and stores log files and tbpl data.  Actually we pull from that directly and can calculate by test [...]]]></description>
		<content:encoded><![CDATA[<p>[...] when you star an orange, the data is pushed into our database.  This database is hosted by the metrics team and stores log files and tbpl data.  Actually we pull from that directly and can calculate by test [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: deinspanjer</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2488</link>
		<dc:creator>deinspanjer</dc:creator>
		<pubDate>Tue, 04 Jan 2011 13:35:16 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2488</guid>
		<description><![CDATA[There is some documentation, but I don&#039;t have the link handy at the moment. The A-team will be putting a front-end on this that should be accessible to anyone.
The data source is tinderbox logs, and if the UI is too specific, then we can certainly look at providing access to the data in ElasticSearch.]]></description>
		<content:encoded><![CDATA[<p>There is some documentation, but I don&#8217;t have the link handy at the moment. The A-team will be putting a front-end on this that should be accessible to anyone.<br />
The data source is tinderbox logs, and if the UI is too specific, then we can certainly look at providing access to the data in ElasticSearch.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Axel Hecht</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2487</link>
		<dc:creator>Axel Hecht</dc:creator>
		<pubDate>Tue, 04 Jan 2011 12:47:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2487</guid>
		<description><![CDATA[Is there information on what exactly you store, and how folks beyond WOO could get to that/benefit from it?]]></description>
		<content:encoded><![CDATA[<p>Is there information on what exactly you store, and how folks beyond WOO could get to that/benefit from it?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aphadke</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2482</link>
		<dc:creator>aphadke</dc:creator>
		<pubDate>Sun, 02 Jan 2011 18:07:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2482</guid>
		<description><![CDATA[Jonathan,
Thanks for the compliment! Excited to see what Flume has to offer in 2011 :-)

-anurag]]></description>
		<content:encoded><![CDATA[<p>Jonathan,<br />
Thanks for the compliment! Excited to see what Flume has to offer in 2011 :-)</p>
<p>-anurag</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Hsieh</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2481</link>
		<dc:creator>Jonathan Hsieh</dc:creator>
		<pubDate>Sun, 02 Jan 2011 17:10:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2481</guid>
		<description><![CDATA[Great post, guys!  We&#039;re working on making Flume&#039;s agent/collector topologies more flexible, so that doing what you have done will be easier in the future!]]></description>
		<content:encoded><![CDATA[<p>Great post, guys!  We&#8217;re working on making Flume&#8217;s agent/collector topologies more flexible, so that doing what you have done will be easier in the future!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: deinspanjer</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2480</link>
		<dc:creator>deinspanjer</dc:creator>
		<pubDate>Sun, 02 Jan 2011 15:50:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2480</guid>
		<description><![CDATA[We have done a bit of exploration with fanout, but currently, fanout is incompatible with DFO.  If the DFO gets an ack before any failures, it will not handle failures from any other sinks.  Conversely, if the very last sink to report in is a failure, then it will resend to all the sinks.

I wanted to try to patch this, but we didn&#039;t have enough time in our schedule, and Cloudera devs mentioned that there was some re-architecture work to be done in that area so I was afraid that we might be patching soon-to-be dead code.]]></description>
		<content:encoded><![CDATA[<p>We have done a bit of exploration with fanout, but currently, fanout is incompatible with DFO.  If the DFO gets an ack before any failures, it will not handle failures from any other sinks.  Conversely, if the very last sink to report in is a failure, then it will resend to all the sinks.</p>
<p>I wanted to try to patch this, but we didn&#8217;t have enough time in our schedule, and Cloudera devs mentioned that there was some re-architecture work to be done in that area so I was afraid that we might be patching soon-to-be dead code.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rich Kroll</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2479</link>
		<dc:creator>Rich Kroll</dc:creator>
		<pubDate>Sun, 02 Jan 2011 13:23:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2479</guid>
		<description><![CDATA[Thanks for the great writeup!  Have you tried using the fanout collector instead of combining the two sinks into one?]]></description>
		<content:encoded><![CDATA[<p>Thanks for the great writeup!  Have you tried using the fanout collector instead of combining the two sinks into one?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tweets that mention Blog of Data » Blog Archive » Flume, Hive and realtime indexing via ElasticSearch -- Topsy.com</title>
		<link>http://blog.mozilla.org/data/2010/12/30/flume-hive-and-realtime-indexing-via-elasticsearch-2/comment-page-1/#comment-2474</link>
		<dc:creator>Tweets that mention Blog of Data » Blog Archive » Flume, Hive and realtime indexing via ElasticSearch -- Topsy.com</dc:creator>
		<pubDate>Fri, 31 Dec 2010 04:41:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/data/?p=333#comment-2474</guid>
		<description><![CDATA[[...] This post was mentioned on Twitter by Planet Mozilla, anurag, anurag and others. anurag said: CORRECTED Flume, Hive and ElasticSearch at Mozilla - http://bit.ly/flume-es-hive WP cache seems to have an old copy, sorry for the bad link [...]]]></description>
		<content:encoded><![CDATA[<p>[...] This post was mentioned on Twitter by Planet Mozilla, anurag, anurag and others. anurag said: CORRECTED Flume, Hive and ElasticSearch at Mozilla &#8211; <a href="http://bit.ly/flume-es-hive" rel="nofollow">http://bit.ly/flume-es-hive</a> WP cache seems to have an old copy, sorry for the bad link [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
