Talk:Ancient text corpora

Papyrus Amherst 63
The whole thing has been published. See van der Toorn 2018. Srnec (talk) 01:36, 4 May 2023 (UTC)


 * Fantastic new article, thanks. Onceinawhile (talk) 06:26, 4 May 2023 (UTC)

"There are also two old African languages that have hardly been explored"
To what does this refer? Srnec (talk) 14:49, 15 May 2023 (UTC)


 * Good point - this referred to the following two bullets, but was not well formatted. I have copyedited the whole section so it reads more clearly. Onceinawhile (talk) 20:11, 15 May 2023 (UTC)

Definite article
"The definite article incorporated in languages such as Hebrew, Aramaic, and Greek has no equivalent in most languages, so its frequency would significantly affect the comparability of numbers - this is excluded in the estimates below." — the meaning of "incorporated" should be explained, as it isn't obvious that you are referring to the practice of attaching the article to the word it qualifies (I assume). Also, one could compare Hebrew to languages with separate articles by counting Hebrew articles as separate words or by not counting them in other languages; which is it? Finally, why focus on definite articles when there are other common prefixes which are likewise usually not separate words in Hebrew? Zerotalk 12:47, 17 May 2023 (UTC)
 * Thanks. The only reason is that Peust does it this way, and his is the most comparable source across the widest range of languages. FYI he writes: "Demgegenüber möchte ich den bestimmten Artikel des Hebräischen, Aramäischen und Griechischen nicht berücksichtigen, da er in den meisten Sprachen keine Entsprechung hat, wo er aber existiert, durch seine Häufigkeit die Zahlen deutlich beeinflussen würde." Onceinawhile (talk) 15:30, 17 May 2023 (UTC)
 * The paragraph as a whole gives a clearer description. My Google-inspired translation: "Another fundamental problem lies in the underlying definition of a word. I have generally understood prepositions as words in their own right, even where they are traditionally written together with the noun (e.g. Hebrew). On the other hand, I do not want to consider the definite article of Hebrew, Aramaic, and Greek, since it has no equivalent in most languages, and where it occurs its frequency would significantly affect the counts." I did not even guess at this meaning when I read your sentence with "incorporated". I propose replacing the sentence with "Attached prepositions are counted as separate words, except in the case of the definite article in Hebrew, Aramaic and Greek." Zerotalk 01:45, 18 May 2023 (UTC)
 * Thanks – I have done this. Onceinawhile (talk) 20:01, 31 May 2023 (UTC)

Plural
Fascinating article! Just one thing: unless I misunderstand, we seem to be using two of the three possible plural forms of corpus in the article; couldn't we be consistent? Using two different forms to express the same meaning seems as awkward as using both center and centre in the same paragraph would be. Happy days, ~ LindsayHello 09:15, 18 June 2023 (UTC)


 * Thank you - good point. Have fixed this. Onceinawhile (talk) 11:24, 18 June 2023 (UTC)

Irish?
Hi, thanks for adding the Irish corpus. Unfortunately I am not sure any of it dates to the period of ancient history – defined in this article as ending in 300 AD – so it might not be in scope here? Onceinawhile (talk) 19:36, 18 June 2023 (UTC)

Yes, thanks, I'll remove it. Sheila1988 (talk) 19:41, 18 June 2023 (UTC)

Script?
"Canaanite and Aramaic inscriptions" is not a script. That link needs to be removed from the table. Srnec (talk) 20:44, 18 June 2023 (UTC)


 * Perhaps the label could be changed to "Northwest Semitic scripts" or similar. We don’t have an article on that topic, only the individual articles Phoenician alphabet, Aramaic alphabet and Paleo-Hebrew script. Each of these articles talks about its related scripts, but there is no parent article. The closest we have at the moment is the lede of Canaanite and Aramaic inscriptions – particularly the quotes in footnotes 8-11 – but really we need a proper article on the script family itself. Onceinawhile (talk) 21:29, 18 June 2023 (UTC)


 * Agree that we need an article on the Northwest Semitic scripts. But the link itself is problematic, since we are not talking mainly about inscriptions. I'm fine with a red link. Srnec (talk)

Phoenician
The Phoenician line should perhaps also be removed from the table for now because it lacks an actual size estimate. Srnec (talk) 20:49, 18 June 2023 (UTC)


 * It does have a size estimate, for the number of texts – which is shown in the table. There is also an estimate for the number of words, which is stated in the footnote but not included in the table as it conflicts with the estimate for the number of texts. Onceinawhile (talk) 21:03, 18 June 2023 (UTC)


 * So do many languages not in the table (e.g., Lydian). What are the criteria for inclusion in the table? Srnec (talk) 21:46, 18 June 2023 (UTC)
 * The table should include all languages where we have sourced numerical estimates. The only reason some are in the list below and not yet in the table is that I ran out of time to put them in. I was prioritizing the big ones – and busy unsuccessfully looking for size estimates for ancient Chinese and ancient South Asian corpora. Onceinawhile (talk) 21:59, 18 June 2023 (UTC)

Aramaic
The Aramaic word count in the table looks like an out-of-date lower bound to me. This source (p. 60) puts the Imperial Aramaic corpus at 237,970 words, citing Stephen Kaufman. This one puts the total Aramaic corpus down to the 13th century at 3 million words. This source puts the Qumran corpus at 21,068 words. Taking these numbers and those in the article right now, I get to at least 287,000 words of pre-3rd century AD Aramaic. I think we should amend the number in the table to read >280,000 and add the two sources I've cited. Srnec (talk) 21:46, 18 June 2023 (UTC)
 * Thanks for bringing these. There is definitely more to do on Aramaic, as Peust described it as "confusing" ("unübersichtlich") and, unlike Kaufman, he does not specialize in the topic. I have fixed the existing footnote to make Peust's position clearer. The Dead Sea Scrolls figure is a similar count (21,068 vs 15,000 - Peust adjusts for the definite article), and the idea of 3 million is also consistent with Peust, who states that after 300 AD the Aramaic corpus grows "sprunghaft an, da sich jetzt mehrere große Literatursprachen ausbilden" (roughly: by leaps and bounds, as several major literary languages now develop).
 * Kaufman's "CAL ImpAram" estimate of 237,970 should definitely be added, but it is puzzling as it is very different to Peust's figures. Note that Cook (your Brill source) wrote in 2022 that the primary source for Imperial Aramaic is the Egyptian texts (see footnote 6 at Textbook of Aramaic Documents from Ancient Egypt; the way Cook describes the corpus is very similar to Peust's description). Peust estimated the first 3 volumes of TADAE to contain 20,000 words (the fourth volume was printed in the same year as Peust's article so I suspect he didn't have access to it). Onceinawhile (talk) 22:37, 18 June 2023 (UTC)
 * "Confusing" it certainly is. Holger Gzella, A Cultural History of Aramaic: From the Beginnings to the Advent of Islam (Brill, 2015), p. 166, says somewhat vaguely that the Elephantine material is "the most extensive part of the total corpus" of Achaemenid Official Aramaic. But on p. 109, he says that "Mesopotamia has yielded the lion's share of the available evidence" for Aramaic in the Neo-Assyrian and Neo-Babylonian periods. So it may depend on what is meant by "Imperial Aramaic", whether it includes the Assyrian and Babylonian periods or not. Gzella has a more recent book on Aramaic, too, but from Google Books I cannot see that he gives word counts. Srnec (talk) 23:58, 20 June 2023 (UTC)
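
As a rough check of the arithmetic in this thread, here is a minimal Python sketch that adds the two word counts cited above (237,970 for Imperial Aramaic and 21,068 for Qumran) and shows how much the figures already in the article would need to contribute to reach the stated total of 287,000. The variable names are illustrative assumptions, and the remainder is derived rather than sourced, since the thread does not itemize the article's existing numbers.

```python
# Word counts quoted in the thread above (not independently verified here).
imperial_aramaic_words = 237_970  # Kaufman figure, as cited in the thread
qumran_aramaic_words = 21_068     # Qumran corpus figure, as cited in the thread

# Total stated in the thread for pre-3rd century AD Aramaic.
stated_total = 287_000

# Sum of the two newly cited sources.
cited_total = imperial_aramaic_words + qumran_aramaic_words

# What the article's existing figures would have to add to reach the stated
# total. This is a derived number, not a sourced one.
remainder = stated_total - cited_total

print(f"Cited sources together: {cited_total:,} words")
print(f"Implied contribution of existing article figures: {remainder:,} words")
```

On these numbers the two cited sources alone account for 259,038 words, so the proposed ">280,000" entry depends on the existing article figures supplying roughly 28,000 more, and on those figures not overlapping with the Kaufman or Qumran counts.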