Talk:Speaker recognition

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 21 January 2020 and 4 May 2020. Further details are available on the course page. Student editor(s): Gsk42.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 09:57, 17 January 2022 (UTC)

Untitled discussion from 2004–2006
'Voice recognition' is claimed to redirect to this page but, in fact, redirects to the Speech recognition page. There are two ways to fix this. mac01021 18:10, 25 May 2011 (UTC) — Preceding unsigned comment added by Mac01021 (talk • contribs)

'Voice recognition' is surely a generic term that encapsulates all other forms (e.g. speaker / speech, etc., recognition).

86.150.22.131 (talk) 10:20, 20 August 2008 (UTC)

Is this somehow distinct from Voice recognition, or should they be merged? -FZ 18:13, 15 Dec 2004 (UTC)


 * Offhand, I think this article should be a section of Voice recognition. Joyous 18:30, Mar 17, 2005 (UTC)


 * Based on the redirect that already existed, it looks like voice recognition redirects to speaker recognition and discussion on the speech recognition page suggests that perhaps voice authentication should really be the title of this article. --DeweyQ 15:54, 22 July 2006 (UTC)


 * Another question is whether this was distinct from speaker identification. That page has a message asking whether it should be merged into this one, and a link inviting discussion here.  The term of art that I know is speaker identification or speaker ID, although my expertise is in a different (related) area, so I am not positive.  To me, speaker ID suggests a 500-way decision (which of my acquaintances is it?), while speaker recognition suggests a binary decision (is it my master or not?).  Eclecticos 13:24, 28 September 2006 (UTC)


 * To me the term speaker recognition means the recognition of a speaker. That is, the system recognizes the speaker. The term itself does not specify if the speaker is recognized from a pool of many others or if the speaker is re-recognized. Thus, it should include both verification and identification. BTW, the term authentication to me is just a synonym for verification and not for recognition just as voice is used as synonym for speaker in voice recognition. Gemuetlich 13:49, 29 September 2006 (UTC)

I agree with Gemuetlich. Speaker verification or authentication checks that the voice is from a specific person, but speaker identification means that the person was selected from a group. Speaker recognition, however, is a general term and applies to both. I think the Speaker Recognition article explains this well and should have sections for speaker verification and identification. PantherDave 02:39, 1 December 2006 (UTC)
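
For what it's worth, the verification/identification distinction discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular system works: the three-number "voice models", the cosine score, the 0.8 threshold, and the names `verify`, `identify`, `alice` and `bob` are all hypothetical and do not come from the article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample, claimed_model, threshold=0.8):
    """Verification: a binary decision about one claimed speaker.

    The threshold is an assumed value chosen for this toy example.
    """
    return cosine(sample, claimed_model) >= threshold


def identify(sample, enrolled):
    """Identification: an N-way decision over all enrolled speakers."""
    return max(enrolled, key=lambda name: cosine(sample, enrolled[name]))


# Hypothetical enrolled voice models and one incoming sample.
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
sample = [0.9, 0.1, 0.2]

is_alice = verify(sample, enrolled["alice"])  # one-to-one check
is_bob = verify(sample, enrolled["bob"])      # one-to-one check
best_match = identify(sample, enrolled)       # one-to-many search
```

Both functions recognize a speaker in the general sense; they differ only in whether the question is "is this the claimed person?" or "which enrolled person is this?".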

Voice biometrics
I merged the stub article Voice biometrics here in order to avoid content forking. --Bejnar (talk) 02:01, 8 April 2012 (UTC)

Stress and fatigue voice characteristics
Where is the best place for voice monitoring for other measures, such as stress and fatigue? Thanks. Martinevans123 (talk) 21:26, 27 September 2012 (UTC)

Non-electronic speaker recognition
There is nothing here about the natural ability of people to identify a speaker. Is that covered elsewhere, and if so, should it be pointed to from this article? 86.139.225.123 (talk) 02:00, 31 October 2013 (UTC)

Does the article really need to explain the terms speaker recognition, speech recognition and voice diarization?
Pi314m and I can't agree on whether the definitions in parentheses at the beginning of the article should be there at all. For example, the Similarities section at the National flag article could have little flags next to the country names so people don't have to open new tabs to look up the flags of each and every country mentioned in the section, but it doesn't, so I was thinking "if that article isn't lazy-reader friendly, why should this one be?". I could "revert 1 good faith edit" once more but I don't think that would be good Wikiquette, so I'll discuss it here instead. -- MrHumanPersonGuy (talk) 01:47, 28 December 2018 (UTC)

Speeches, speakers, English majors, ag-ism, divided powers
There is a difference between knowing English, the language, and having awareness of current technology and terminology.

To respond to
 * "This isn't Simple English Wikipedia"

while trusting that this was not some type of "my backyard's bigger than yours"
 * A New York Times 1962-era writer did not have to distinguish among voice recognition, speaker recognition and speech recognition.


 * That year's World's Fair demonstration of a new IBM product in that space was nowhere near what IBM introduced 35 years later - Business Travel News: Voice Recognition To Ease Travel Bookings

To digress, neither a 1776 British military English Major nor a same era university English major could speak to the splitting of governing known as
 * separation of powers = among branches of the same government
 * division of powers = between or among different governing authorities (e.g. Federal, state)

Back to the present: All of the above can be seen as problems in Ageism, or what USA-centric Americans spell as Agism. These words and phrases had different meanings at different times.

Wiki articles are from time to time tagged as
 * too technical
 * overly technical
 * highly technical

If, by adding 3 phrases to the first paragraph, an article with 20+ following paragraphs can have greater clarity, the burden doesn't appear overly great.

I'd like to believe that even though
 * "This isn't Simple English Wikipedia"

the addition of
 * (who is speaking)
 * (what's being said) -and-
 * (recognizing when the same speaker is speaking)
 * is acceptable, given that you find adding
 * (also called speaker authentication)
 * acceptable, following the "Speaker verification" wording.

If all of this were sitting within a 1962-written article titled "Voice recognition," there still would be a need to distinguish between "who said" and "what was said."
 * The division and separation are not locked into the English language, any more than "depression" (1929) and "recession" (2008).

Please permit others to benefit from what you (and I) don't need; the burden to us is not as great as "dozens of country flags." Pi314m (talk) 05:53, 30 December 2018 (UTC)


 * "A New York Times 1962-era writer did not have to distinguish among voice recognition, speaker recognition and speech recognition."
 * "That year's World's Fair demonstration of a new IBM product in that space was nowhere near what IBM introduced 35 years later"
 * "To digress, neither a 1776 British military English Major nor a same era university English major could speak to the splitting of governing known as"
 * "Back to the present: All of the above can be seen as problems in Ageism, or what USA-centric Americans spell as Agism. These words and phrases had different meanings at different times."
 * These arguments I listed barely do anything to address your insistence on keeping the article other than talk about vernacular. The IBM thing is unrelated to the main topic of the discussion.


 * If all of this were sitting within a 1962-written article titled "Voice recognition," there still would be a need to distinguish between "who said" and "what was said."
 * Speaker recognition is about recognizing particular voices. Speech recognition is about recognizing which sounds are voices. Voice recognition can be about both. The only word I think would be confusing would be diarization which takes just one google to find the definition if the Speaker diarisation article itself doesn't help. And Wikipedia is an encyclopedia, so the prose should look like something from an encyclopedia, not a publication or a journal (let alone a 1962-style one). -- MrHumanPersonGuy (talk) 15:34, 1 January 2019 (UTC)

I cannot fully follow the above discussion, but to me the current introduction does not seem quite right. The first two sentences make perfect sense and explain the topic. The third sentence introduces the different (but partly overlapping) topic of voice recognition. It is important that the page includes the distinction/similarity but is the very first paragraph the best place? And are five references necessary for the term voice recognition while the main topic gets only one reference?

The paragraph finishes with a sentence that is difficult to understand. Again a new topic/term is introduced (speaker verification/authentication) and this is for some reason contrasted with identification (rather than simply speaker recognition which is pretty much defined as identification). The same sentence then deals with yet another topic/differentiation. (I hope I can also add that the section on verification vs identification does not match the same discussion on the page on authentication.) 2A00:1598:C006:0:0:0:0:8D3C (talk) 22:15, 16 December 2019 (UTC)

Difference between voice and speaker recognition
I had a different voice at age four than I have now. It's doubtful a model trained on my adult voice could be used to identify my childhood voice.

Many people speak multiple languages. For a while, I tried to learn Chinese. I highly doubt a model trained to recognize my proficient English voice could be used to recognize my pathetic Chinese voice.

People who relearn to speak after a laryngectomy have different voices than before.

People have different voices after inhaling helium.

Transsexuals under hormone therapy probably end up with substantially different voices. Adolescent boys sometimes sound weirdly croaky after their voice change, especially the ones who end up with the deepest register.

Stephen Hawking and Roger Ebert had more than one distinct electronic voice during their illnesses.

And finally, there's Zaphod Beeblebrox, who had two heads, and quite possibly, two entirely different voices. On the same planet, there's probably some bird species that moults its beak every year, and never again entirely sings the same clef.

Speakers and voices are not 1:1 maps over a variety of different dimensions. I'm pretty sure that most of these systems key on voice, not speaker, and don't come with the expectation of resolving distinct voices onto a singular speaker.

Which leaves voice recognition as the practical technology, and speaker recognition as the aspirational technology (security-focused customers are of course mainly promised the latter, even when only the former is entirely practical). -- MaxEnt 20:32, 13 June 2019 (UTC)

Spoofing
Add a section on the many spoofing attempts that have been done. One could imagine a famous person whose voice is available on many YouTube videos... Jidanni (talk) 06:38, 2 February 2021 (UTC)