Talk:Baum–Welch algorithm

Suggestion
Could Template:HMM example be put to use on this page? (Granted we might have to modify it a bit to make it more general) -- Kowey 14:16, 14 Nov 2004 (UTC)

Content Discussion
There is a document written by Welch in the IEEE Information Theory Society Newsletter, December 2003. See http://www.itsoc.org/publications/nltr/it_dec_03final.pdf

I scrutinized the description, and I also think it is wrong.

We have: gamma_t(i): the probability that an HMM is in state s_i at time t given the observations O and the parameters Lambda. (1)

gamma_t(i) = P(q_t = s_i | O, Lambda)       (2)

gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | Lambda)       (3)

(3) is the forward-backward probability of state s_i: the probability of being in state s_i at time t, given the observations. So this is the role the forward-backward algorithm plays in Baum-Welch.

Now, let's look at the probability of going through a TRANSITION from s_i at time t to s_j at time t+1. Let's call it "chi".

Chi_t(i,j) is the probability of being in state s_i at time t and in state s_j at time t+1, given the observations and the parameters. (4)

Chi_t(i,j) = P(q_t = s_i, q_{t+1} = s_j | O, Lambda)       (5)

Chi_t(i,j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O | Lambda)       (6)

sum_{t=1}^{T-1} Chi_t(i,j) is the expected number of transitions from s_i to s_j.

sum_{t=1}^{T-1} gamma_t(i) is the expected number of transitions from s_i.

Updated a_ij = [ sum_{t=1}^{T-1} Chi_t(i,j) ] / [ sum_{t=1}^{T-1} gamma_t(i) ]

It is the last step I am not getting. Certainly the numerator is not constant, as the current article claims, nor is it equal to the probability of the entire string. I cannot see why the updated a_ij should be greater than the old value.
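To make the derivation concrete, here is a short NumPy sketch that computes alpha, beta, gamma, and Chi from a forward and a backward pass and then re-estimates a_ij as in the update above. The model parameters and observation sequence are illustrative numbers of my own, not taken from the article:

```python
import numpy as np

# Toy 2-state, 2-symbol HMM (illustrative values, not from the article).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # a_ij: transition probabilities
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # b_j(o): emission probabilities
pi = np.array([0.6, 0.4])           # initial state distribution
obs = [0, 1, 0, 0, 1]               # observation sequence O
T, N = len(obs), A.shape[0]

# Forward pass: alpha_t(i) = P(o_1..o_t, q_t = s_i | Lambda)
alpha = np.zeros((T, N))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

# Backward pass: beta_t(i) = P(o_{t+1}..o_T | q_t = s_i, Lambda)
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

P_O = alpha[-1].sum()               # P(O | Lambda)

# Eq. (3): gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | Lambda)
gamma = alpha * beta / P_O

# Eq. (6): Chi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | Lambda)
chi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    chi[t] = alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1] / P_O

# Update: a_ij = sum_t Chi_t(i,j) / sum_t gamma_t(i), with t = 1..T-1
A_new = chi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
```

Note that the numerator really does depend on j: summing Chi_t(i,j) over j recovers gamma_t(i), so each row of A_new automatically sums to one, and individual entries can go either up or down relative to the old a_ij.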

References

Cutting, D., Kupiec, J., Pedersen, J., and Sibun, P., "A Practical Part-of-Speech Tagger", Xerox Palo Alto Research Center.

ChengXiang Zhai, A Brief Note On The Hidden Markov Models, Lecture Notes, Dept. of CS, University of Illinois at Urbana-Champaign.

http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-345Automatic-Speech-RecognitionSpring2003/4C063D4A-3B8B-4F05-B475-85AAD0D66586/0/lecture10.pdf

Koos van der Wilt 00:52, 13 May 2007 (UTC)


 * Were you discussing the article or the Welch presentation? Which one do you think is wrong? I assume the latter; please confirm. I've not read your contribution, but I remember several headaches when trying to make sense of this algorithm in Pattern Classification (2nd ed.) by Richard Duda et al. (ISBN 0 471 05669 3); I was convinced the description was wrong, and I remember thinking it was difficult to fix.

--Blaisorblade (talk) 03:09, 24 June 2008 (UTC)

Confusion over Forward-backward algorithm
I think the link is confusing. The current text of Forward-backward algorithm seems to describe a different algorithm, one that uses a forward and a backward pass but combines them for other purposes (at least, it appears so to me). The Baum-Welch algorithm also performs such a forward and backward pass, so it is itself sometimes called the forward-backward algorithm (for instance in Pattern Classification (2nd ed.) by Richard Duda et al. (ISBN 0 471 05669 3)). --Blaisorblade (talk) 03:09, 24 June 2008 (UTC)


Ambiguous indices
In the formula for the epsilon variables, i and j appear both as free indices and as summation indices. 2001:6B0:1:1DF0:C185:CDB5:7AF7:29BA (talk) 17:58, 19 October 2013 (UTC)

Confusing chicken and egg example
There is one chicken, and we observe whether or not it lays an egg on each day. Why are the observations then like NN, NE, EE, EN; that is, why are there two observations per day? I don't get it. HenningThielemann (talk) 17:33, 25 January 2015 (UTC)

In answer to HenningThielemann: I assume from reading the article that the listed "observations" are supposed to represent observed transitions (pairs of consecutive days). That is, the observations are listed as NN, NN, NN, NN, NE, EE, EN, NN, NN, which is consistent with the actual sequence of events being NNNNNEENNN. This could be clearer; I'll make an edit. — Preceding unsigned comment added by 49.184.214.178 (talk) 05:24, 9 January 2019 (UTC)

Working out the probability all day works up an appetite, wherefore the chicken is fried at suppertime. (Probability of laying an egg drops to zero.)

Convergence?
The description ends with "These steps are now repeated iteratively until a desired level of convergence." What steps? Although it would be nice to specify how convergence is measured/detected, I'll leave that aside for now. Isn't the algorithm meant to be evaluated for T=1, then T=2, T=3, and so on until a desired level of convergence is attained? If true, the text should reflect that (it currently does not, since the "steps" are not linked with T). Urhixidur (talk) 15:54, 24 October 2017 (UTC)
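For what it's worth, the "steps" are one forward-backward pass (computing gamma and chi, the E-step) followed by the re-estimation of the parameters (the M-step); T is the fixed length of the observation sequence and does not grow between iterations. One common stopping rule, which the article does not spell out, is to iterate until the log-likelihood log P(O | Lambda) improves by less than a tolerance. A minimal unscaled NumPy sketch with toy numbers of my own, not the article's chicken example:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation step; returns updated (A, B, pi) and log P(O | Lambda)."""
    T, N = len(obs), len(pi)
    # Forward pass
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Backward pass
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    P_O = alpha[-1].sum()
    # E-step: state and transition posteriors
    gamma = alpha * beta / P_O
    chi = np.array([alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1] / P_O
                    for t in range(T - 1)])
    # M-step: re-estimate transition, emission, and initial probabilities
    A_new = chi.sum(0) / gamma[:-1].sum(0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[[t for t in range(T) if obs[t] == k]].sum(0)
    B_new /= gamma.sum(0)[:, None]
    return A_new, B_new, gamma[0], np.log(P_O)

# Convergence loop: stop when the log-likelihood gain falls below tol.
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.3, 0.7], [0.8, 0.2]])
pi = np.array([0.2, 0.8])
obs = [0, 0, 1, 0, 0, 1, 1, 0, 0]
tol, prev = 1e-6, -np.inf
for _ in range(500):
    A, B, pi, ll = baum_welch_step(A, B, pi, obs)
    if ll - prev < tol:
        break
    prev = ll
```

EM guarantees the log-likelihood is non-decreasing across iterations, which is what makes this stopping rule well-defined; it does not guarantee a global maximum, so the tolerance and iteration cap are both needed in practice.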

Another thing that is sadly missing from the article is the measure of fitness of any given solution. Training an HMM over various data sets (of the same phenomenon) with the same parameters (number of hidden states, basically), how does one pick the "best" one? Urhixidur (talk) 17:34, 24 October 2017 (UTC)

Reproduction
I have written the following code to reproduce the numbers shown in the example. Chrism2671 (talk) 16:57, 11 April 2022 (UTC)