Wikipedia:WikiProject Terrorism/Guantanamo/The NYTimes Guantanamo Docket

The NYTimes Guantanamo Docket
In November 2008 The New York Times put up what it calls its Guantanamo Docket. The project consists of about 200 pages of scaffolding, and about 16,000 pages of original OARDEC documents. The scaffolding consists of a list of all the captives, and lists of the captives by nationality. IMO the information in these lists is not copyrightable, as per Feist v. Rural.

The scans of the original public domain documents are also not copyrightable.

Each of the pages from the published OARDEC allegation memos and transcripts is presented in two formats. First, each page of the large OARDEC portable document format files has been converted into an image -- one image per page. Second, each page has been optically scanned, to convert it into searchable text.

The New York Times provided a search capability. Readers can search the whole docket, or they can search for words within the documents for a single captive.

Unfortunately, automated optical scans are imperfect. Dust, or manual notations, on the original documents can produce gibberish. Scanning can also generate typographical errors, when a letter is scanned in as a similar letter, or letter pair, or when digits are scanned as letters, and vice versa.
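To illustrate how such substitutions can defeat a search, here is a minimal Python sketch that lists alternative spellings of a search term. The confusion table is my own assumption about typical scanning mix-ups (l/1/I, O/0, S/5, m/rn); it is not derived from the Docket, and the function name ocr_variants is hypothetical.

```python
from itertools import product

# Assumed confusion sets, for illustration only. Each list starts with the
# original character so that the first variant is the unmodified term.
CONFUSIONS = {
    "l": ["l", "1", "I"],
    "1": ["1", "l", "I"],
    "I": ["I", "l", "1"],
    "O": ["O", "0"],
    "0": ["0", "O"],
    "S": ["S", "5"],
    "5": ["5", "S"],
    "m": ["m", "rn"],  # a single letter scanned as a letter pair
}


def ocr_variants(term, limit=50):
    """Return up to `limit` spellings of `term` that an imperfect scan might produce."""
    choices = [CONFUSIONS.get(ch, [ch]) for ch in term]
    variants = []
    for combo in product(*choices):
        variants.append("".join(combo))
        if len(variants) >= limit:
            break
    return variants


if __name__ == "__main__":
    for spelling in ocr_variants("ISN 10"):
        print(spelling)
```

If a search for a term fails, trying a few of these variants may turn up the page. The sketch only substitutes single characters, so it will not catch a letter pair that was scanned as one letter.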

Even cursory manual correction of automated optical scans adds at least one order of magnitude more work. Manual correction intended to eliminate almost all typos adds several orders of magnitude more work. The New York Times did not perform this level of correction, so some searches will fail, due to typographical errors. Geo Swan (talk) 17:14, 29 March 2010 (UTC)

Each captive's dossier
Each captive has a dossier within the "docket".

If the captive was transferred or released prior to having a CSR Tribunal convened on his behalf, the New York Times has a short page with a short paragraph, stating his nationality, age, and date of repatriation.

When a captive had a CSR Tribunal allegation memo, links are provided to the captive's CSR Tribunal memo, to the ARB memos, and to the transcripts from the Tribunals and ARBs. Captives can have up to eight documents in their dossier.

The NYTimes did not republish the decision memos, or the habeas dossiers. Geo Swan (talk) 17:14, 29 March 2010 (UTC)

Capturing the optically scanned text
It is possible to capture the optically scanned text. Because the original was in the public domain, the NYTimes optical scans are also, IMO, in the public domain. Using the view source feature of a browser like Mozilla Firefox, one can fairly easily capture the optically scanned text from the first page of a memo or transcript.

The page that opens up when one invokes the view source feature is mainly scaffolding. The text of the document constitutes about one eighth of the web-page, and is embedded at about the middle. You have to scroll down to find it.
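Rather than scrolling through the scaffolding by hand, one could save the page source and pull out the text programmatically. The following is only a rough Python sketch. I do not know which element the NYTimes pages use to hold the scanned text, so it relies on a heuristic that is my own assumption: it takes the longest run of visible text on the page. The names TextBlocks and longest_text_block are hypothetical.

```python
from html.parser import HTMLParser


class TextBlocks(HTMLParser):
    """Collect runs of visible text, skipping script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace so that layout indentation does not count as text.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def longest_text_block(page_source):
    """Heuristic (an assumption): the scanned text is the longest visible run."""
    parser = TextBlocks()
    parser.feed(page_source)
    parser.close()  # flush any text still buffered by the parser
    return max(parser.blocks, key=len, default="")
```

To use it, save the output of view source to a file, read the file into a string, and pass the string to longest_text_block. If the scanned text is broken up by line-break tags it will arrive as several shorter runs, and the heuristic would need to join neighbouring runs instead.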

It is possible to capture subsequent pages, but it is more work.

Unfortunately, the NYTimes next page button presents the reader with the next page of the document, but the view source feature still shows the text from the first page. To get to the text of a subsequent page a reader has to:
 * 1) scroll to the subsequent page;
 * 2) pick a phrase that occurs on the subsequent page, and type it in the box that lets readers search within a single captive's documents;
 * 3) click search, which should present the reader with a list of pages where the search term occurred within the captive's documents. If it doesn't find the phrase it means the optical scan produced a typo, and the reader should pick a different phrase, and try to search again.
 * 4) select the appropriate page from the list.
 * 5) Once a page has been brought up through the search function, page view shows the text of that page of the document, which can then be captured, and worked with. Geo Swan (talk) 17:14, 29 March 2010 (UTC)

The quality of the NYTimes efforts
The peeling apart of the original OARDEC pdf files, the optical scanning of them, and the integration of them so the text can be searched, must have represented a very considerable effort. It may have represented several person-years.

It has been an excellent resource.

Nevertheless, I consider it a much more limited resource than another contributor does. That other contributor has asserted that we should place great confidence in the idea that the NYTimes didn't just go to a very considerable clerical effort in the tasks of scanning the text, collating it, and making it searchable. That other contributor has repeatedly asserted that (1) the NYTimes employs subject field experts; and (2) those subject field experts brought their subject field knowledge to certain aspects of how they presented the information. Geo Swan (talk) 17:14, 29 March 2010 (UTC)

The reliability of the NYTimes choice of names
For many captives there are multiple incompatible transliterations of the captives' names. In some cases the names aren't just different transliterations -- the multiple names the DoD called the captives are wildly different, or don't even resemble one another.

The other wikipedian has assured us that the NYTimes experts on the transliteration of names written in non-European scripts reviewed the different names, and made an informed choice of the primary name for the captive. They have argued that we should rename several dozen articles to follow the NYTimes lead.

I see no sign that the NYTimes ever claimed its choice of names was reviewed by language experts. I am skeptical that it was.

They offered no critical interpretation of any other aspect of the documents. Some of the age figures proved to be unreliable, but they repeated the DoD figures without question.

One possible justification for using a different name for a captive than one of the names the DoD used could be to show respect for what the captive says his name was. And in this I have asserted that the NYTimes has been inconsistent. For some captives they have used what the captive said his real name was. In other cases they continued to use the DoD's name. Geo Swan (talk) 17:14, 29 March 2010 (UTC)

How to resolve whether the NYTimes called upon its subject field experts
We could request that the NYTimes reply to a letter from us with a letter to the OTRS committee that confirms or refutes theories about the scholarship they applied.

If we are going to contact them, I think, out of courtesy, we should agree to one brief, simple letter ahead of time.

I'd be very reluctant to take into account a reply to a letter I had no share in drafting. Geo Swan (talk) 17:14, 29 March 2010 (UTC)