Talk:Boltzmann machine

Global Energy
Why does the global energy function have:
 * $$\sum\limits_{i<j}\cdots$$

Shouldn't this be:
 * $$\sum\limits_{i,j} \cdots$$

But I could be misunderstanding... 129.215.26.79 (talk) 15:31, 13 May 2014 (UTC)


 * Never mind I see it just saves having to divide by two to account for double counting. 129.215.26.79 (talk) 12:43, 15 May 2014 (UTC)
 * Looking at it from the point of view of a programmer, it says "Don't do all the work twice". ;-) 92.0.230.198 (talk) 17:25, 27 June 2015 (UTC)
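A small sketch of the point made in this thread (the weights and states below are made-up example values, not from the article): with symmetric weights, summing over $i<j$ visits each connection once, while summing over all $i,j$ visits it twice and therefore needs a factor of 1/2.

```python
# Assumed setup: symmetric weights w[i][j] == w[j][i], zero diagonal,
# states s[i] in {0, 1}, energy E = -sum w_ij s_i s_j - sum theta_i s_i.
import itertools

w = [[0.0, 0.5, -1.0],
     [0.5, 0.0, 2.0],
     [-1.0, 2.0, 0.0]]   # symmetric, zero diagonal
theta = [0.1, -0.2, 0.3]
s = [1, 0, 1]
n = len(s)

def energy_pairs(w, theta, s):
    """Sum over i < j: each connection counted once."""
    pair_term = sum(w[i][j] * s[i] * s[j]
                    for i, j in itertools.combinations(range(n), 2))
    return -pair_term - sum(theta[i] * s[i] for i in range(n))

def energy_full(w, theta, s):
    """Sum over all i != j, with a 1/2 to undo the double counting."""
    pair_term = 0.5 * sum(w[i][j] * s[i] * s[j]
                          for i in range(n) for j in range(n) if i != j)
    return -pair_term - sum(theta[i] * s[i] for i in range(n))

assert abs(energy_pairs(w, theta, s) - energy_full(w, theta, s)) < 1e-12
```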

Training sign
I removed the minus sign from the RHS of this:
 * $$\frac{\partial{G}}{\partial{w_{ij}}} = \frac{1}{T}[p_{ij}^{+}-p_{ij}^{-}]$$

If p+ is clamped and p- is unclamped, then we want to make the weights MORE like the correlation of the clamped and less like unclamped, I think ... please check this! Charles Fox


 * You're incorrect, the minus sign is needed —Preceding unsigned comment added by 72.137.60.77 (talk) 17:36, 5 April 2009 (UTC)
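The sign can be checked numerically. Below is a sketch (not from the article; the clamped distribution and weight value are arbitrary examples, with $T=1$ and no biases): for a two-unit machine with energy $E(s) = -w\,s_1 s_2$, a finite-difference derivative of $G = \sum_V P^+(V)\ln\frac{P^+(V)}{P^-(V)}$ matches $-\frac{1}{T}[p_{ij}^{+}-p_{ij}^{-}]$, i.e. the minus sign is needed.

```python
import math

T = 1.0
# Arbitrary example of a clamped (data) distribution over two binary units.
p_plus = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.6}

def p_minus(w):
    """Equilibrium (free-running) distribution for weight w, E = -w*s1*s2."""
    weights = {s: math.exp(w * s[0] * s[1] / T) for s in p_plus}
    Z = sum(weights.values())
    return {s: v / Z for s, v in weights.items()}

def G(w):
    """KL divergence from the free-running to the clamped distribution."""
    pm = p_minus(w)
    return sum(pp * math.log(pp / pm[s]) for s, pp in p_plus.items())

w, eps = 0.7, 1e-6
numeric = (G(w + eps) - G(w - eps)) / (2 * eps)

pij_plus = p_plus[(1, 1)]               # both units on, clamped
pij_minus = p_minus(w)[(1, 1)]          # both units on, free-running
analytic = -(pij_plus - pij_minus) / T  # note the minus sign

assert abs(numeric - analytic) < 1e-6
```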

What does "marginalize" mean in the following?
"We denote the converged distribution, after we marginalize it over the visible units V, as P − (V)." There is no other instance of this word in the article. Even a technically-minded reader wouldn't understand this article if the word isn't defined anywhere. - Will

CRF
Is the Boltzmann machine the same as a Conditional Random Field? If so, that should be mentioned somewhere!
 * No, it isn't. A CRF can however be viewed as a convexified Boltzmann machine with hand-picked features. - DaveWF 06:10, 19 April 2007 (UTC)

The threshold
What is the importance of the threshold parameter? How is it set?
 * Learned like any other parameter. Just have a connection wired to '+1' all the time instead of another unit. I should add this. - DaveWF 06:10, 19 April 2007 (UTC)

Can threshold be referred to as bias? Also, the link on threshold takes you to the disambiguation page, which has no articles describing threshold in this context.

The term threshold is wrong in this context. As Boltzmann machines are described in this article, there is no threshold function, and thus no threshold. Instead, Theta is a bias here: if a unit is activated, the bias Theta of that unit is added to the total energy function. I have updated the text accordingly. I assume this is a copy-and-paste error, from copying the term threshold over from the description of Hopfield networks, where it actually makes sense, since Hopfield networks have a threshold function. - sebastian.stueker 23:30, 17 May 2013 (UTC)
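A sketch of the "connection wired to '+1'" reply above (the weights and bias values are made-up examples): a machine with biases theta has the same energy, on every state, as a machine with no biases but one extra always-on unit whose weights to the original units equal the biases.

```python
import itertools

# Two-unit machine with biases.
w = [[0.0, 1.5],
     [1.5, 0.0]]
theta = [0.4, -0.9]
n = 2

def energy(w, theta, s):
    pair = sum(w[i][j] * s[i] * s[j]
               for i, j in itertools.combinations(range(len(s)), 2))
    return -pair - sum(t * si for t, si in zip(theta, s))

# Extended machine: unit 2 is clamped to 1; the biases become its weights.
w_ext = [[0.0, 1.5, 0.4],
         [1.5, 0.0, -0.9],
         [0.4, -0.9, 0.0]]
zeros = [0.0, 0.0, 0.0]

for s in itertools.product([0, 1], repeat=n):
    assert abs(energy(w, theta, list(s))
               - energy(w_ext, zeros, list(s) + [1])) < 1e-12
```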

The Training Section
I have a problem understanding what P+(Vα) is. P+ is the distribution of the states after the values for Vα are fixed. So P+(Vα) should be 1 for those fixed values and 0 for any other values of Vα.

Also, what does α iterate over in the summation for G?

The cost function
What is the cost function? What cost does it measure? How do we train the network if we have more than one input?

{-1,1} or {0,1} ?
In the definition of s the article claims that si is either -1 or 1. Five lines below, it says that the nodes are in state 0 or 1, which is also what I found in (admittedly older) literature on the subject. Is the {-1,1} simply wrong or am I missing something? —Preceding unsigned comment added by Drivehonor (talk • contribs) 13:56, 7 August 2007

I think either representation should work. But I'm not sure. Can anyone confirm this? —Preceding unsigned comment added by Zholyte (talk • contribs) 19:47, 10 November 2007 (UTC)

You probably just don't understand, because it doesn't matter at all. —Preceding unsigned comment added by 130.15.15.193 (talk) 23:12, 2 December 2009 (UTC)

I think si should be in {-1, 1}. If {0, 1}, then the contribution to energy, as per the energy function, of a positively weighted pair of off nodes would be the same as that of a mismatched pair (1 off, 1 on). I believe off-off and on-on should have equal energy contributions, but don't see how this can be achieved given the current statement of the global energy equation, and the state domain {0, 1}. Onejgordon (talk) 08:12, 22 November 2018 (UTC)
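One way to reconcile this thread (a sketch; the weights and biases below are arbitrary examples): the two conventions describe the same family of models. Mapping $x \in \{0,1\}$ to $s = 2x - 1 \in \{-1,+1\}$ and reparameterizing the weights and biases makes the two energies agree up to an additive constant, which cancels in the Boltzmann distribution.

```python
import itertools

w = [[0.0, 0.8, -0.3],
     [0.8, 0.0, 1.1],
     [-0.3, 1.1, 0.0]]
theta = [0.2, -0.5, 0.7]
n = 3

def energy(w, theta, s):
    pair = sum(w[i][j] * s[i] * s[j]
               for i, j in itertools.combinations(range(n), 2))
    return -pair - sum(t * si for t, si in zip(theta, s))

# Using s_i s_j = 4 x_i x_j - 2 x_i - 2 x_j + 1:
w01 = [[4 * w[i][j] for j in range(n)] for i in range(n)]
theta01 = [2 * theta[i] - 2 * sum(w[i][j] for j in range(n) if j != i)
           for i in range(n)]

diffs = set()
for x in itertools.product([0, 1], repeat=n):
    s = [2 * xi - 1 for xi in x]
    diffs.add(round(energy(w, theta, s) - energy(w01, theta01, list(x)), 9))

# The gap between the two energies is the same for every state.
assert len(diffs) == 1
```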

Etymology
WHY is it called a Boltzmann machine? Is it named after Ludwig Boltzmann? The Ludwig Boltzmann article references this one... but that can't be determinant. --Nehushtan (talk) 22:10, 12 January 2009 (UTC)


 * Yes, it's named for Ludwig Boltzmann. AmiDaniel (talk) 08:31, 26 September 2011 (UTC)


 * Because the underlying energy minimization strategy involves the Boltzmann Distribution p.r.newman (talk) 13:45, 19 May 2013 (UTC)

Question: the new phase of learning?
Question about "Later, the weights are updated to maximize the probability of the network producing the completed data." What does this mean? Is this a new phase of learning? Does it mean that $$p_{ij}^{+}$$ are held constant in this phase, computed as a characteristic of the training set? For example, if our data set is {{1,1,0},{1,0,1},{1,0,0}}, then $$p_{12}^{+}=1/3$$, $$p_{13}^{+}=1/3$$, $$p_{23}^{+}=0$$ in all later iterations? Peter 212.76.37.154 (talk) 16:04, 28 January 2009 (UTC)
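Under the questioner's reading (a sketch of that reading, not of the article's method): with the visible units clamped to the training patterns in turn, $p_{ij}^{+}$ is just the fraction of patterns in which units $i$ and $j$ are both on, which for the example data set gives 1/3, 1/3, and 0.

```python
from itertools import combinations

# The example data set from the question above.
data = [(1, 1, 0), (1, 0, 1), (1, 0, 0)]

def clamped_coactivation(data):
    """Fraction of patterns in which each pair of units is jointly on."""
    n = len(data[0])
    return {(i, j): sum(v[i] * v[j] for v in data) / len(data)
            for i, j in combinations(range(n), 2)}

p_plus = clamped_coactivation(data)
assert p_plus[(0, 1)] == 1/3   # p_12^+  (0-based indices)
assert p_plus[(0, 2)] == 1/3   # p_13^+
assert p_plus[(1, 2)] == 0.0   # p_23^+
```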

Please improve the first paragraph
The first paragraph (the description) fails to describe what a Boltzmann machine is. It only talks about what it is not. Given how many things are not a Boltzmann machine, that is a bit wasteful... The class to which this particular network belongs (and to which this article links) lacks a description (it is a stub / something automatically generated and not informative). Other explanations are given by counterexample, i.e. the article says that this network is a counterpart of something else and that it can't be used for something. Neither statement helps in understanding what it is or where it can be useful. 79.181.224.222 (talk) 22:19, 16 December 2012 (UTC)

Tidy up of citations needed
I moved the Ackley et al. citation in the article to be an in-line reference but then got daunted by trying to bring the other citations and further readings into line with Wikipedia standards. I'll try and get back to it but hope others will feel free to take it on! p.r.newman (talk) 13:45, 19 May 2013 (UTC)

Incorrect statement about scalability?
"the time the machine must be run in order to collect equilibrium statistics grows exponentially with the machine's size". I've talked to ML researchers who have disputed this point. This "fact" has been here for years - do we have a reference on it?

Yes, but what does it do? What is it for? ;o)
The article tells us what it looks like and how to train it but I can't for the life of me see what it takes as input and what it gives as output. A bit more on that and especially an example or two would be a big improvement. 92.0.230.198 (talk) 17:31, 27 June 2015 (UTC)

Image
You might want to use the image [image omitted] for the article. --MartinThoma (talk) 23:07, 12 February 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Boltzmann machine. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20100705230134/http://learning.cs.toronto.edu/~hinton/absps/pdp7.pdf to http://learning.cs.toronto.edu/~hinton/absps/pdp7.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 04:44, 23 July 2017 (UTC)

What does it actually do?
I read the article but there is scant indication literally of what it does. It would be helpful to list some examples, especially what it is doing for a user. The article is the "how" of the network. Jazzbox (talk) 23:27, 29 May 2020 (UTC) Jazzbox (talk) 23:28, 29 May 2020 (UTC)

The introduction and description is a mess
The introductory paragraph of this article is an epistemological train-wreck. Lots of related terms are being randomly thrown around with no connection or coherence. Hinton's own Scholarpedia article is a good start. Probably makes sense to be fair, but the Boltzmann Machine was a LEARNING model that had nothing to do with Spin Glasses, which are COMBINATORIAL physics problems. It looks as if the article was hastily written by a graduate student writing a term paper, and not by an expert. — Preceding unsigned comment added by 128.111.64.109 (talk) 16:03, 24 August 2020 (UTC)

the link between E_{i=off} and p_{i=off} is not correct
the probability that s_i=1 is const*\sum_{j\neq i} exp(-\beta E(s_j with s_i=1)). So the rather cumbersome calculation in the middle of the article should be revised, because the partial sums needed to correctly define p(s_i=1) or p(s_i=0) are omitted 24.120.54.52 (talk) 17:18, 10 March 2023 (UTC)
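For comparison, the standard relation usually given for this step (a sketch with made-up example weights and biases, $T=1$): with the other units held fixed, the exact conditional probability from the Boltzmann distribution equals the logistic function of the energy gap $\Delta E_i = E(s_i{=}0) - E(s_i{=}1) = \sum_{j\neq i} w_{ij} s_j + \theta_i$.

```python
import math, itertools

w = [[0.0, 0.6, -0.4],
     [0.6, 0.0, 0.9],
     [-0.4, 0.9, 0.0]]
theta = [0.3, -0.2, 0.5]
n = 3

def energy(s):
    pair = sum(w[i][j] * s[i] * s[j]
               for i, j in itertools.combinations(range(n), 2))
    return -pair - sum(theta[i] * s[i] for i in range(n))

def conditional(i, s):
    """Exact p(s_i = 1 | other units of s), from the two energies."""
    s1, s0 = list(s), list(s)
    s1[i], s0[i] = 1, 0
    e1, e0 = energy(s1), energy(s0)
    return math.exp(-e1) / (math.exp(-e1) + math.exp(-e0))

def logistic_gap(i, s):
    """Logistic function of the energy gap for unit i."""
    gap = sum(w[i][j] * s[j] for j in range(n) if j != i) + theta[i]
    return 1.0 / (1.0 + math.exp(-gap))

for i in range(n):
    for s in itertools.product([0, 1], repeat=n):
        assert abs(conditional(i, s) - logistic_gap(i, s)) < 1e-12
```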