Wikipedia:Reference desk/Archives/Language/2018 March 12

= March 12 =

Help with a Russian name.
What is the correct English transliteration of Евдокия Николаевна Завалий? For context, this is the woman in this article, this article and this article. If at all possible, the correct transliteration of the male pseudonym she used, mentioned in the uk.wiki and ru.wiki articles, would be appreciated. Prince of Thieves (talk) 00:49, 12 March 2018 (UTC)


 * There is no single transliteration standard for Russian, but the transliterations in the article are both OK. If you want the second one to conform to the first, use Zavaliy Yevdokim Nikolayevich (with maybe a note that this comes from "Завалий Евдок. Ник." in her formal documents, so Zavaliy is still the surname) 93.136.57.218 (talk) 01:11, 12 March 2018 (UTC)


 * So Yevdokim and Evdokim are equivalent? Prince of Thieves (talk) 01:25, 12 March 2018 (UTC)


 * They're both transliterations of a word beginning with Cyrillic letter "Е", though "Ye-" indicates more accurately the pronunciation. Cyrillic letter "Э" would always be "E-", never "Ye-"... AnonMoos (talk) 03:47, 12 March 2018 (UTC)


 * Thanks. Prince of Thieves (talk) 09:12, 12 March 2018 (UTC)


 * I don't know what the "-kim" is in the last few paragraphs above. The name ends with -кия "-kiya". While that does change for different cases, there isn't one that ends "-kim". --ColinFine (talk) 12:41, 12 March 2018 (UTC)


 * The female name Yevdokiya changes its suffix in declension, like most proper and common nouns in Slavic languages. But you're right, Yevdokim is not a grammatical case of Yevdokiya; it is a masculine variant, which she adopted when doctors did not recognize that she was a woman.
 * После лечения была направлена в запасно́й полк, и, когда отбирали солдат на передовую, её приняли за мужчину, тем более что она была в гимнастёрке и галифе, а в документах было записано «Завалий Евдок. Ник.». Она никого разубеждать не стала и была направлена в 6-ю десантную бригаду морской пехоты как Завалий Евдоким Николаевич.[2]' (ru-wiki)
 * (Translation, adapted from Google Translate) After treatment she was sent to a reserve regiment, and when soldiers were being selected for the front line she was taken for a man, especially since she was wearing a gymnastyorka (military tunic) and riding breeches, and her documents read "Zavaliy Yevdok. Nik.". She did not correct anyone and was sent to the 6th Landing Brigade of the naval infantry as Zavaliy Evdokim Nikolaevich. [2]
 * CiaPan (talk) 13:09, 12 March 2018 (UTC)
 * P.S. Евдокия/Yevdokiya is a Russian form of Eudoxia, whose proper male counterpart is Eudoxus or Eudoxius. That one is Евдокс/Yevdox in Russian (see disambig ru:Евдокс (значения) at ru-wiki). I have no idea why our hero didn't choose the proper male version of her name, although I could guess it would have been too obvious and consequently suspicious and somewhat dangerous to her. --CiaPan (talk) 13:51, 12 March 2018 (UTC)
 * You are incorrect and the name came from Εὐδοκία (no English article, but Russian). See more at εὐ- and the difference between δοκέω and δόξα, -ιος/-ία and -ιμος. Even if from the same verb root, these are actually three different names (no female equivalent for Евдоким, though). --Lüboslóv Yęzýkin (talk) 21:13, 12 March 2018 (UTC)

Native speaker variation
Hi, I'm wondering whether there has been any research done on a specific type of variation among native speakers, namely, that where there are equivalent ways of saying something, but you have to choose just one. For example, if I am asking about a place that someone has mentioned in conversation, I nearly always say "Whereabouts is it?" rather than "Where is it?" Conversely, if I need directions, I think I say "Where is it?", not whereabouts. The question is whether, in such cases, everyone/most people have a preferred choice, or whether nearly everyone tends to mix and match. IBE (talk) 06:34, 12 March 2018 (UTC)


 * Whereabouts does not mean the same as where, it means approximately where (Concise Oxford Dictionary), i.e. what general area.--Shantavira|feed me
 * I'm not curious about word meanings. Thanks all the same, IBE (talk) 10:30, 12 March 2018 (UTC)


 * I think you are referring to a Dialect. Extensive research is done on specific dialects and the variations they contain. Prince of Thieves (talk) 09:11, 12 March 2018 (UTC)
 * No, I believe the OP is asking about the variation that is liable to occur within the speech of the same individual speaker, not between speakers of the same dialect. Basemetal  09:49, 12 March 2018 (UTC)
 * In which case, idiolect may be of some relevance. {The poster formerly known as 87.81.230.195} 90.211.131.202 (talk) 09:53, 12 March 2018 (UTC)
 * Basemetal is closest, but to be more precise, it's more like the amount of variation, so one speaker may exhibit no variation (they will always say "whereabouts") whereas another might switch between them. Then this is the point - the variation (between individuals) in the amount of variation (from moment to moment). If it makes it simpler, call the individuals who vary their speech "fickle" and those who stick to one thing, "stubborn". So in general, do we tend towards fickleness, or stubbornness? Are there some alternatives where, say, half the population chooses one (nearly all the time), and half chooses the other? IBE (talk) 10:29, 12 March 2018 (UTC)


 * I don’t think everyone is aware of the exact words used. Some people may be aware of exactly what they say, but oftentimes, it’s the underlying meaning that matters. Some people say, “Let’s go out for a drink.” Others say, “Let’s go for a drink.” And others say, “Let's go out, shall we?” These statements may all perform the same function. The listener will interpret and treat them as the same, not really paying attention to the small details. Non-native speakers and listeners may treat them as different, because often they are interpreting something in their native language. 140.254.70.33 (talk) 15:28, 12 March 2018 (UTC)


 * Certainly. But these variations do exist - you become aware of them, as in my case, from teaching English, and it becomes at least slightly important - not exactly crucial, but something I try to expand incrementally as I go. IBE (talk) 12:15, 13 March 2018 (UTC)


 * You should read about Sociolinguistics, especially variationist sociolinguistics. While earlier "waves" of sociolinguistic research focused on dialect, current approaches do look at variation within the individual (the article on Style (sociolinguistics) has some material, for instance under "style-shifting"). If you are specifically interested in syntax, and if you are North American, you can find details on particular constructions here: https://ygdp.yale.edu/ 164.107.80.170 (talk) 15:56, 12 March 2018 (UTC)


 * Hey, wow! Just browsing now, probably not my exact question, but very much the sort of thing I was looking for. IBE (talk) 12:15, 13 March 2018 (UTC)

Some questions on Breton
Can anyone identify the specific Breton spelling system used for these lyrics of "Ma Zat" ("My father"; the middle part of the song is a French translation of the Breton text), a song from the French musical "Anne de Bretagne"? I'd always thought Breton spelling was phonetic but it is obviously not. Could anyone who knows Breton produce a standard IPA version of these lyrics? By standard I mean not what they sound like in the recording of the song but how they should sound when spoken by a native speaker. That said, I'm pretty certain the pronunciation Cécile Corbel uses is largely correct. I don't know if she's a Breton speaker ("Bretonne bretonnante") but she's from Brittany so I'm fairly convinced she didn't screw up too badly. But if you have an opinion about her Breton pronunciation I'd like to hear it. Thanks. Basemetal 09:39, 12 March 2018 (UTC)
 * What has actually puzzled you? I suppose it is in standard modern Breton (modern because of siwazh, which was formerly written siouaz, more in line with French spelling conventions); the accents are just missing here and there (e.g. it must be teñval). I can't say anything about her accent, but she certainly sings (or tries to sing) in a z-dialect.--Lüboslóv Yęzýkin (talk) 22:40, 12 March 2018 (UTC)
 * Sorry: "phonetic" was obviously an extremely poor word choice. I'd meant it followed French spelling habits. (I know, French spelling is the last that anyone would think of as phonetic!) In other words what I'd meant to say was I'd thought anyone who knew French spelling would be able to make sense of Breton spelling. Then I found that in the spelling used for the lyrics of this song "si" sounded like French "chi", "ae" sounded like "é" or "è" and so on. Since I saw the WP article mentioned several styles of Breton spelling I asked which one was used here. What's a z-dialect? Basemetal  06:44, 14 March 2018 (UTC)
 * Frankly, the link you've provided in the first sentence of your enquiry contains exactly the information you want. That is, Breton was written following French conventions until the spelling was reformed in the 20th century. You may call it New or, rather obviously, Modern Breton. The information about the main dialect division is also there. Moreover, the comparative table there says exactly that ae is pronounced as a monophthong in the Trégor (Tregiereg) and Vannes (Gwenedeg) dialects, and considering her zh as "z" we may conclude she has tried to speak with a Trégor accent. I can't say anything conclusive about her si as "chi"; probably they say it that way somewhere and she just repeats a colloquial pronunciation.
 * Of course, Breton spelling is far more regular than French, but it still requires some understanding and knowledge of dialects; it has its own small peculiarities, which are not obvious at first sight and which you need to know, but they are usually covered in books.--Lüboslóv Yęzýkin (talk) 21:23, 14 March 2018 (UTC)

Unmerging wine-whine sporadically
Has anyone run into English speakers who unmerge wine-whine but only from time to time, or vice versa? Thanks. Basemetal 10:05, 12 March 2018 (UTC)
 * I probably do this myself. I'm aware of the difference, but may not always be strict about pronouncing the "wh-" in casual speech.  Native English, no particular accent.  Rojomoke (talk) 12:39, 12 March 2018 (UTC)
 * Same with me (General American English) as with Rojomoke — In casual speech I merge them, but in careful speech I unmerge them. Loraof (talk) 14:22, 12 March 2018 (UTC)
 * I would continue to merge the pronunciations (here in the UK) except for emphasis or to make a distinction between the words.  Dbfirs  15:29, 12 March 2018 (UTC)
 * Or the opposite, to make this joke: "Would you like some cheese with that whine?" ←Baseball Bugs What's up, Doc? carrots→ 18:21, 12 March 2018 (UTC)

Online Basque dataset
Hi,

For a machine learning project I need a big corpus of (any) text or a collection of texts written in Basque, in a format suitable for download (preferably a .txt file). Are there such texts online apart from Basque Wikipedia?

Thanks. — Preceding unsigned comment added by 84.229.98.188 (talk) 18:33, 12 March 2018 (UTC)

Converting Diacritic to Non-Diacritic
Sometimes there are formatting problems where a word with diacritics ends up being converted to a word that looks normal but is linguistically incorrect. For example, "Hōjō" becomes converted to "Hojo" because the system, which can't handle diacritics, approximates the letter "ō" as "o". And I've read somewhere that "ō" in non-diacritic form is "ou". So does that mean the linguistically correct, non-diacritic spelling of "Hōjō" is "Houjou"? And with that question in mind, can anyone provide a list of letters with diacritics together with their spellings in non-diacritic form? --Arima (talk) 21:39, 12 March 2018 (UTC)
 * I have no idea if it is normal to respell Japanese ō with ou, but in German there is a centuries old convention of respelling ä, ö, ü, ß with ae, oe, ue, ss (the latter is the standard in Swiss German). For more technical cases you may be interested in Doc 9303 (see page 30 and following).--Lüboslóv Yęzýkin (talk) 22:57, 12 March 2018 (UTC)
 * German ä ö ü originated as shorthand for ae oe ue, and ß as a ligature of ſz, so arguably the centuries-old convention of respelling is the other way around. —Tamfang (talk) 00:44, 13 March 2018 (UTC)
 * From the point of view of contemporary German there is only one way. The problem is that, to my knowledge, the -e digraphs were never used in Middle and Early Modern High German. Around the 16th century, writers and printers went straight from e to aͤ and within a few decades started to write/print ä. On the contrary, writing/printing ae would have been very uneconomical from their point of view, and they had none of the reasons for it that we have had in modern times, such as typewriters with no accented characters or limited computer encodings. They rather preferred to abbreviate everything, e.g. they wrote/printed ã for an/am, so it would be strange if they wrote/printed ae. I have no clear idea why Germans would have written in such a way at all, but it seems to me that this must have originated in non-Germanic areas such as France. For example, the great-grandfather of Johann Wolfgang von Goethe was Göthe, but then the grandfather moved to France and returned to Germany, so the father already bore the name Goethe.--Lüboslóv Yęzýkin (talk) 19:59, 13 March 2018 (UTC)
 * I appreciate the clarification on how to spell letters with an umlaut in non-diacritic form. But the document you pointed out didn't seem to be as helpful as I had hoped. Namely, it doesn't even specify what most of the letters with diacritics are supposed to sound like. --Arima (talk) 02:00, 13 March 2018 (UTC)
 * You literally asked about converting, and this document provides exactly that information. Maybe there are other documents, I do not know, but the general rule is simple: just strip out the accents. For the sounds you had better check the pronunciation rules of each language (I'd recommend reading www.omniglot.com). Linguistically, omitting the accents is incorrect in most languages which use them. There are, of course, languages (e.g. Dutch or many African languages) that would tolerate it more than others, but it would still be incorrect. Even in German ä > ae, etc. is incorrect unless serious technical limitations require it.--Lüboslóv Yęzýkin (talk) 20:42, 13 March 2018 (UTC)
 * Sorry. As an English speaker, I was wondering how to convert diacritic to non-diacritic forms so I could get an understanding of how words with diacritics would be properly spelled and said in non-diacritic form.--Arima (talk) 21:45, 13 March 2018 (UTC)
 * I'll be frank: your way of speaking is so obscure that it's quite difficult to understand what you really want and what you are really asking. How to convert is very simple: strip them out (it seems I am repeating myself). The exceptions, where you may respell them with two letters, are very few (namely, German, and no other language which does the same comes to my mind right now). Note that Japanese is not written in Latin script, so we are speaking about romanization, which may be done in a number of ways. How to pronounce words that have been stripped of their diacritics: you never know for sure unless you know the language. One example has come to my mind: Serbo-Croatian family names which end in -ić tend to lose the accent and hence to be pronounced with "-ik" (rather than the proper "-ich") by people who have no idea about Serbo-Croatian. So the only way is to learn the language, or at least its reading rules. I've already provided the website where you could do this.--Lüboslóv Yęzýkin (talk) 21:43, 14 March 2018 (UTC)
 * Okay then. Thank you and sorry for being vague.--Arima (talk) 07:35, 15 March 2018 (UTC)
 * no other language which does the same comes to my mind right now -- all Scandinavian languages do the same, replacing æ or ä with ae, ø or ö with oe, and å with aa. --77.138.191.65 (talk) 07:42, 16 March 2018 (UTC)
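
The two approaches discussed in this thread (simply stripping the accents, versus respelling certain letters with two letters as in German and the Scandinavian languages) can be sketched in Python roughly as follows. This is only an illustration: the `RESPELL` table and the function names are assumptions made up for this example, the table covers only the letters mentioned above, and real conventions such as the ICAO Doc 9303 table are more detailed.

```python
import unicodedata

# Two-letter respellings mentioned in the thread (German and Scandinavian).
# Illustrative assumption only; this is not a standard table.
RESPELL = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "æ": "ae", "ø": "oe", "å": "aa",
}


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def respell(text: str) -> str:
    """Apply the two-letter respellings first, then strip whatever remains."""
    # Normalise to precomposed form so that "ö" is a single character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        rep = RESPELL.get(ch.lower())
        if rep is None:
            out.append(ch)
        else:
            # Keep an initial capital, e.g. "Å" becomes "Aa".
            out.append(rep.capitalize() if ch.isupper() else rep)
    return strip_diacritics("".join(out))


print(strip_diacritics("Hōjō"))   # plain stripping
print(respell("Göthe"))           # German-style respelling
```

Note that neither function knows which language the word comes from, so applying `respell` to a non-German, non-Scandinavian word may give a spelling nobody uses; as said above, the result is an approximation, not a linguistically correct form.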


 * As I misunderstand it – Japanese ō is written with the kana う (after whichever *o-kana is appropriate), literally u, but not pronounced with /u/; so it can be transliterated either ou or oo. I don't know whether the same applies to ē, ei. —Tamfang (talk) 00:44, 13 March 2018 (UTC)
 * Maybe it was pronounced ou̯ when that spelling was adopted? —Tamfang (talk) 02:33, 13 March 2018 (UTC)


 * The "Macron" Diacritic indicates a long vowel. So you're right: "Hōjō" in Non-Diacritic form would be "Hoojoo". So I'm guessing "ē" would be "ee", but I could be wrong because I don't know for sure. --Arima (talk) 02:00, 13 March 2018 (UTC)
 * Yes, the natural alternative for ē is ee. But I've very often seen ei on the same page as ō (see for example 英); and ē seems to me to be rarer.  So: is ええ ee used in Japanese, and is it distinct from えい ei?  I don't know. —Tamfang (talk) 02:50, 13 March 2018 (UTC)


 * Arima, are you talking specifically about Japanese, or about languages in general? If you mean languages in general, there is no single rule. There are different rules for different languages.
 * If you are referring to Japanese, it is a little more complicated. In most cases, Japanese long vowels are made by adding the hiragana vowels -あ (a → ā), -い (i → ī), -う (u → ū), -い (i → ē or ei), -う (u → ō). For example, for the hiragana ‘k-’ series:
 * かあ (kā), きい (kī), くう (kū), けい (kē or kei), こう (kō).
 * However, this is not always the case. In a few words, you have to add -え (e → ē) or -お (o → ō) (instead of -い and -う). For example:
 * おねえさん (onēsan, "older sister"), おおい (ōi, "many"), おおきい (ōkī, "big").
 * The above applies to hiragana only. The rules for katakana are simpler. To make a long vowel in katakana, just add ー, as in:
 * ツアー (tsuā, "tour"), メール (mēru, "email"), ケーキ (kēki, "cake"). —Stephen (talk) 05:25, 13 March 2018 (UTC)
 * Thank you. I was talking both about Japanese and other languages in general.--Arima (talk) 21:45, 13 March 2018 (UTC)
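
The long-vowel rules listed above can be illustrated with a small Python sketch. Everything in it is an assumption for the sake of the example: the `KANA` table contains only the kana used in the examples in this thread, the function name `romanize` is made up, and the code always writes ē for えい (not ei). It also ignores word and morpheme boundaries, so it would wrongly lengthen vowels in words where the two kana belong to different parts of the word.

```python
# Minimal kana table: only the characters used in the examples above.
KANA = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "ね": "ne", "ん": "n",
    "ツ": "tsu", "ア": "a", "メ": "me", "ル": "ru", "ケ": "ke", "キ": "ki",
}

MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}

# (preceding vowel, following hiragana) pairs that lengthen the vowel.
LENGTHENERS = {
    ("a", "あ"), ("i", "い"), ("u", "う"), ("e", "い"), ("o", "う"),
    ("e", "え"), ("o", "お"),  # the less common spellings
}


def romanize(kana: str) -> str:
    """Romanize a kana string, marking long vowels with a macron."""
    out = []  # romanized syllables collected so far
    for ch in kana:
        prev = out[-1] if out else ""
        last = prev[-1:]  # final letter of the previous syllable, if any
        if last in MACRON and (ch == "ー" or (last, ch) in LENGTHENERS):
            # Lengthen the previous vowel instead of adding a new syllable.
            out[-1] = prev[:-1] + MACRON[last]
        else:
            out.append(KANA[ch])  # raises KeyError for kana not in the table
    return "".join(out)


for word in ["こう", "おねえさん", "おおきい", "メール"]:
    print(word, romanize(word))
```

Replacing each macron vowel in the output with a doubled or two-letter spelling (oo or ou, ee or ei) is then a matter of choosing a romanization convention, which is the choice discussed in this thread.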