User:Underbar dk/Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles

The Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles, produced by the National Institute of Information and Communications Technology (NICT), was created by manually translating Japanese Wikipedia articles related to Kyoto into English. As of December 23, 2010, 14,111 Japanese articles had been translated into English. The corpus is intended to support research and development in high-performance multilingual machine translation, information extraction, and other language processing technologies.

Use and redistribution of the Corpus and the Lexicon are permitted under the terms of the Creative Commons Attribution-ShareAlike 3.0 License (CC BY-SA 3.0).

Because the corpus consists of Japanese Wikipedia articles manually translated into English and is released under CC BY-SA 3.0, English Wikipedia can use it to fill gaps in its coverage, provided that the translated articles are in a state usable on English Wikipedia.

Scope
Only articles meeting all of the conditions below will be considered for use on English Wikipedia:

 * The article in the corpus has no corresponding article on English Wikipedia; and
 * The original article on Japanese Wikipedia cites sources.

Additional considerations

 * Determine whether each article in the corpus would likely pass English Wikipedia's notability guideline (WP:N)
 * Clean up the articles in the corpus to conform to English Wikipedia's Manual of Style

Methodology
Generated by running the following on Wikidata's Query Helper:

 * Get the list of URLs from the HTML source of a subtopic page (e.g. https://www.japanese-wiki-corpus.org/history.html)
 * Join the tables using https://planetcalc.com/7487/
 * Encode the URLs
 * Convert the join result to a wikitable
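The steps above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the procedure actually used: the row fields (`title`, `corpus_url`, `jawiki`), the `sitelinks` mapping from jawiki titles to enwiki titles, and all function names are assumptions, and the join is done in code rather than with the planetcalc tool.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin


class LinkCollector(HTMLParser):
    """Step 1: collect the href of every <a> tag in a page's HTML source."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the subtopic page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_urls(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def missing_on_enwiki(corpus_rows, sitelinks):
    """Step 2 (assumed data shape): left-join corpus rows to a hypothetical
    jawiki-title -> enwiki-title mapping and keep rows with no enwiki match."""
    joined = [{**row, "enwiki": sitelinks.get(row["jawiki"])} for row in corpus_rows]
    return [row for row in joined if row["enwiki"] is None]


def jawiki_url(title):
    """Step 3: percent-encode a jawiki title as UTF-8, with underscores for spaces."""
    return "https://ja.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def to_wikitable(rows):
    """Step 4: render rows as a sortable wikitable with an empty Rating column."""
    lines = ['{| class="wikitable sortable"',
             "! Corpus article !! jawiki article !! Rating"]
    for row in rows:
        lines.append("|-")
        lines.append(f"| [{row['corpus_url']} {row['title']}] "
                     f"|| [{jawiki_url(row['jawiki'])} {row['jawiki']}] || ")
    lines.append("|}")
    return "\n".join(lines)
```

One plausible source for `sitelinks` is a Wikidata query listing items that have both a jawiki and an enwiki sitelink; the Rating column is left empty so each row can be rated by hand.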

Rating criteria
In determining whether the original Japanese Wikipedia (jawiki) article has sufficient sourcing for an English Wikipedia (enwiki) article, jawiki articles are rated with the following ranks:

 * y+: the jawiki article has sufficient reliable sources (RS) to satisfy WP:BEFORE and has adequate citation footnotes
 * y: the jawiki article has sufficient reliable sources to satisfy WP:BEFORE
 * insuf: the jawiki article has insufficient sources (only one source, or overreliance on primary sources), or it is reasonably tagged for lacking sources or for original research (OR) despite satisfying "y" above
 * n: the jawiki article lacks sources

Note that the versions of jawiki articles that were rated may differ drastically from the versions the corpus was based on. Ratings in brackets, where they exist, refer to the current version rather than the version the corpus was based on (assume the current version otherwise).
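The rubric above can be read as a small decision procedure. The sketch below is one interpretation, assuming each judgment (whether the sources satisfy WP:BEFORE, whether a tag is reasonable, and so on) has already been made by a human reviewer and recorded as a boolean. The function and parameter names are hypothetical, and checking "n" before "insuf" is an assumed ordering.

```python
def rate_jawiki(*, has_sources, sufficient_rs, adequate_footnotes, tagged):
    """Map a reviewer's judgments to a rank: "y+", "y", "insuf" or "n".

    has_sources:        the article cites any sources at all
    sufficient_rs:      its reliable sources would satisfy WP:BEFORE
    adequate_footnotes: it has adequate citation footnotes
    tagged:             it is reasonably tagged for lacking sources or for OR
    """
    if not has_sources:
        return "n"
    # Insufficient sourcing, or a reasonable tag, overrides "y" and "y+".
    if not sufficient_rs or tagged:
        return "insuf"
    return "y+" if adequate_footnotes else "y"


def outcome(rank):
    """Group ranks as the History tallies appear to: y+ and y pass; insuf and n fail."""
    return "pass" if rank in ("y+", "y") else "fail"
```

Because "y+" implies "y", a reasonably tagged article is rated "insuf" even when it also has adequate footnotes.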

History

 * 1966 articles in the corpus's History subtopic
 * 1419 articles not on enwiki (72%)
 * 1387 articles not on enwiki and presented in human-readable form on https://www.japanese-wiki-corpus.org/history.html

jawiki sourcing check (percentages are of the 1387 articles):
 * pass: 540 (38.9%)
    * y+: 25 (1.8%)
    * y: 514 (37.0%)
 * fail: 847 (61.0%)
    * insuf: 338 (24.4%)
    * n: 510 (36.8%)