User talk:Billinghurst/Archives/2011/February

The Signpost: 31 January 2011
Read this Signpost in full · Single-page · Unsubscribe · EdwardsBot (talk) 01:20, 1 February 2011 (UTC)

The Signpost: 7 February 2011
Read this Signpost in full · Single-page · Unsubscribe · EdwardsBot (talk) 00:55, 8 February 2011 (UTC)

Template:Cite Men of the Time
[updated outdated information; the four Google Books scans are in any case now listed for reference on the Index .djvu page] -- P.T. Aufrette (talk) 21:12, 21 February 2011 (UTC)

Google Books has high-quality scans of the 11th edition (1884), which are helpful where a page is missing or illegible, and OCR text can be obtained by clicking on the "Plain text" link on their page.

In many cases, the scan in Wikisource is very low quality, sometimes outright illegible. Compare:
 * http://books.google.com/books?id=-VtkAAAAMAAJ&pg=PA6&lpg=PA6
 * http://en.wikisource.org/wiki/Page:Men_of_the_Time.djvu/14

It might not be a bad idea to wholesale-replace the OCR text in Wikisource (derived from the poor-quality scan) with the text provided by the Google Books "Plain text" link, and use that as a basis for proofreading. -- P.T. Aufrette (talk) 22:19, 14 February 2011 (UTC)
 * Unfortunately, from where I live, and checking via proxy services, that is not a full downloadable version. :-/  If you can get it, it would be great if you could either upload it to archive.org for conversion to PDF or upload it as a PDF to Commons. Please then tell me which you have done, either here, at WS, or at Commons, and I will proceed from there. billinghurst  sDrewth  00:42, 15 February 2011 (UTC)
 * There is a PDF link at the right-hand side of the page. Using it, I could download a 36 MB PDF file.  I originally posted the URL with books.google.ca instead of books.google.com; perhaps that was causing a problem?  I have corrected the link (above), so perhaps you could try it again?


 * The part that is really valuable, though, is Google's OCR text: it has far fewer errors than the current text in Wikisource. They seem to be doing something more sophisticated than simple scanning of the page; for instance, they recombine hyphenated words and perhaps use some heuristics. So proofreading corrections can be done a couple of orders of magnitude faster.  However, I don't really know how to automate the uploading of the OCR text, other than cutting and pasting each individual page. -- P.T. Aufrette (talk) 02:47, 15 February 2011 (UTC) -- P.T. Aufrette (talk) 21:12, 21 February 2011 (UTC)
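
To illustrate the kind of recombining of hyphenated words mentioned above, here is a minimal Python sketch. It is not Google's actual method, which is not known here; the function name `rejoin_hyphenated` and the regular expression are assumptions made for illustration. It simply joins a word fragment ending in a hyphen at the end of a line with the fragment that starts the next line.

```python
import re

# Hypothetical pattern: a word fragment followed by a hyphen at the end of a
# line, then the continuation fragment at the start of the next line.
_LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def rejoin_hyphenated(text: str) -> str:
    """Join words that were split with a hyphen across a line break.

    Hyphens inside a line (e.g. "well-known") are left untouched. A genuine
    compound that happens to break at a line end would be joined too, so the
    result still needs human proofreading.
    """
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "A bio-\ngraphy of well-known emi-\nnent men"
    print(rejoin_hyphenated(sample))
```

A simple rule like this cannot tell a line-break hyphen from a real compound hyphen, which is presumably where additional heuristics (such as a dictionary check) would come in.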

The Signpost: 14 February 2011
Read this Signpost in full · Single-page · Unsubscribe · EdwardsBot (talk) 00:52, 15 February 2011 (UTC)

The Signpost: 21 February 2011
Read this Signpost in full · Single-page · Unsubscribe · EdwardsBot (talk) 17:20, 22 February 2011 (UTC)

English College, Lisbon
So now there is an outline article. Charles Matthews (talk) 22:30, 25 February 2011 (UTC)