<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Overview of pdf.js guts</title>
	<atom:link href="http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/</link>
	<description>Mozilla hacks and suchlike</description>
	<lastBuildDate>Mon, 09 Jan 2012 07:35:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: po_chuang</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-7769</link>
		<dc:creator>po_chuang</dc:creator>
		<pubDate>Mon, 09 Jan 2012 07:35:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-7769</guid>
		<description><![CDATA[This idea is gorgeous!]]></description>
		<content:encoded><![CDATA[<p>This idea is gorgeous!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cjones</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-6916</link>
		<dc:creator>cjones</dc:creator>
		<pubDate>Thu, 24 Nov 2011 00:45:14 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-6916</guid>
		<description><![CDATA[abhilash: There&#039;s an issue on file in the pdf.js GitHub repository tracking the SVG backend. @notmasteryet made a prototype.

shobo: HTTP byte-range requests.]]></description>
		<content:encoded><![CDATA[<p>abhilash: There&#8217;s an issue on file in the pdf.js GitHub repository tracking the SVG backend. @notmasteryet made a prototype.</p>
<p>shobo: HTTP byte-range requests.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shobo</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-6907</link>
		<dc:creator>shobo</dc:creator>
		<pubDate>Wed, 23 Nov 2011 14:39:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-6907</guid>
		<description><![CDATA[How can a client-side browser component request only a specific page from a server-side PDF without some server-side component to extract it individually?

Can somebody please answer this question? I would really appreciate it!

Thank you!]]></description>
		<content:encoded><![CDATA[<p>How can a client-side browser component request only a specific page from a server-side PDF without some server-side component to extract it individually?</p>
<p>Can somebody please answer this question? I would really appreciate it!</p>
<p>Thank you!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: abhilash</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-5283</link>
		<dc:creator>abhilash</dc:creator>
		<pubDate>Thu, 15 Sep 2011 06:00:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-5283</guid>
		<description><![CDATA[When do you think you will start working on the SVG version? I would like to contribute to the code once basic text selection is possible.]]></description>
		<content:encoded><![CDATA[<p>When do you think you will start working on the SVG version? I would like to contribute to the code once basic text selection is possible.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leif Halvard Silli</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4309</link>
		<dc:creator>Leif Halvard Silli</dc:creator>
		<pubDate>Sat, 16 Jul 2011 07:29:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4309</guid>
		<description><![CDATA[ACCHSH: http://www.puzzleflow.com/company]]></description>
		<content:encoded><![CDATA[<p>ACCHSH: <a href="http://www.puzzleflow.com/company" rel="nofollow">http://www.puzzleflow.com/company</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cjones</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4264</link>
		<dc:creator>cjones</dc:creator>
		<pubDate>Sun, 10 Jul 2011 21:16:24 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4264</guid>
		<description><![CDATA[Bashev: Pdf.js should support Cyrillic documents (it doesn&#039;t really know what alphabet it&#039;s using to render).  Have you tried http://andreasgal.github.com/pdf.js/multi_page_viewer.html ?  If that&#039;s still broken, can you let us know which PDF file isn&#039;t working for you?  Pdf.js still has lots of bugs ;).]]></description>
		<content:encoded><![CDATA[<p>Bashev: Pdf.js should support Cyrillic documents (it doesn&#8217;t really know what alphabet it&#8217;s using to render).  Have you tried <a href="http://andreasgal.github.com/pdf.js/multi_page_viewer.html" rel="nofollow">http://andreasgal.github.com/pdf.js/multi_page_viewer.html</a> ?  If that&#8217;s still broken, can you let us know which PDF file isn&#8217;t working for you?  Pdf.js still has lots of bugs <img src='http://blog.mozilla.org/cjones/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bashev</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4261</link>
		<dc:creator>Bashev</dc:creator>
		<pubDate>Sun, 10 Jul 2011 15:15:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4261</guid>
		<description><![CDATA[This version does not support Cyrillic documents.

Maybe this has been fixed in one of the later releases?]]></description>
		<content:encoded><![CDATA[<p>This version does not support Cyrillic documents.</p>
<p>Maybe this has been fixed in one of the later releases?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cjones</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4102</link>
		<dc:creator>cjones</dc:creator>
		<pubDate>Mon, 27 Jun 2011 16:41:08 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4102</guid>
		<description><![CDATA[Baz: That is incredibly helpful, thank you.  I&#039;ve linked your notes from our wiki.  Of course, the code is at https://github.com/andreasgal/pdf.js, if you ever have some free time ... ;)]]></description>
		<content:encoded><![CDATA[<p>Baz: That is incredibly helpful, thank you.  I&#8217;ve linked your notes from our wiki.  Of course, the code is at <a href="https://github.com/andreasgal/pdf.js" rel="nofollow">https://github.com/andreasgal/pdf.js</a>, if you ever have some free time &#8230; <img src='http://blog.mozilla.org/cjones/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ThomasW</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4101</link>
		<dc:creator>ThomasW</dc:creator>
		<pubDate>Mon, 27 Jun 2011 15:56:33 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4101</guid>
		<description><![CDATA[O.K., then my comment was a bit short-sighted. I thought that this complex shaping might be handled in the form of ligatures, but if it can&#039;t, then I&#039;ll remain silent...]]></description>
		<content:encoded><![CDATA[<p>O.K., then my comment was a bit short-sighted. I thought that this complex shaping might be handled in the form of ligatures, but if it can&#8217;t, then I&#8217;ll remain silent&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Baz</title>
		<link>http://blog.mozilla.org/cjones/2011/06/15/overview-of-pdf-js-guts/comment-page-1/#comment-4100</link>
		<dc:creator>Baz</dc:creator>
		<pubDate>Mon, 27 Jun 2011 12:54:14 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.org/cjones/?p=82#comment-4100</guid>
		<description><![CDATA[Re the &#039;big project&#039; for text selection: I did some work on fixing this in poppler, which had taken the approach mentioned (identifying &amp; chaining textruns), but it works *really* badly for multicolumn text.

What I did to fix this was to use the reading-order sort algorithm used in Ocropus [T Breuel: High performance document layout analysis. Symposium on Document Image Understanding Technology, pp.209-218 (2003)]. This is a massive improvement, but poppler still has issues because its paragraph detection is primitive; implementing the gutter detection part of the same paper would help with this. 

Adobe&#039;s solution is tagged PDFs, which contain all the info to determine reading order, but trying to deal with tagged PDFs in poppler&#039;s existing data structures is messy. It would have been better to deal with tagged PDFs first, then apply the reading-order heuristics above to infer a tagging structure for &#039;normal&#039; PDFs.

Another problem is that poppler has lacked an accessible API from the outset. Text and selections work by looking at text within a bounding box, rather than finding a start character and an end character and selecting all characters in between (as, e.g., ATK.Text does).

It&#039;d be nice if pdf.js, with its clean slate, can avoid these problems :)]]></description>
		<content:encoded><![CDATA[<p>Re the &#8216;big project&#8217; for text selection: I did some work on fixing this in poppler, which had taken the approach mentioned (identifying &amp; chaining textruns), but it works *really* badly for multicolumn text.</p>
<p>What I did to fix this was to use the reading-order sort algorithm used in Ocropus [T Breuel: High performance document layout analysis. Symposium on Document Image Understanding Technology, pp.209-218 (2003)]. This is a massive improvement, but poppler still has issues because its paragraph detection is primitive; implementing the gutter detection part of the same paper would help with this. </p>
<p>Adobe&#8217;s solution is tagged PDFs, which contain all the info to determine reading order, but trying to deal with tagged PDFs in poppler&#8217;s existing data structures is messy. It would have been better to deal with tagged PDFs first, then apply the reading-order heuristics above to infer a tagging structure for &#8216;normal&#8217; PDFs.</p>
<p>Another problem is that poppler has lacked an accessible API from the outset. Text and selections work by looking at text within a bounding box, rather than finding a start character and an end character and selecting all characters in between (as, e.g., ATK.Text does).</p>
<p>It&#8217;d be nice if pdf.js, with its clean slate, can avoid these problems <img src='http://blog.mozilla.org/cjones/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>
