Talk:Restricted Boltzmann machine

weight calculations
Hinton's article A Practical Guide to Training RBMs states that the weight adjustment should be $$ \epsilon \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \right) $$

where the angle brackets indicate an expectation, defined as $$ \sum_{i,j} v_i h_j \, p(v_i, h_j) $$. This article replaces the expectation with the outer product $$ v h^T $$.

Why? Are there other ways to calculate the expectations? Numiri (talk) 06:05, 19 August 2022 (UTC)
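One way to answer this: the exact expectation over all configurations is intractable, so contrastive-divergence training estimates it empirically, averaging the outer product $$ v h^T $$ over data samples (with $$ h $$ drawn from $$ p(h|v) $$), which is presumably why the article writes the update in outer-product form. A minimal numpy sketch of that empirical estimate (all names, sizes, and the random weights are hypothetical, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, batch = 6, 4, 100
W = rng.normal(0, 0.1, size=(n_visible, n_hidden))  # hypothetical weight matrix
V = rng.integers(0, 2, size=(batch, n_visible)).astype(float)  # binary data batch

# p(h_j = 1 | v) for each sample in the batch
H_prob = sigmoid(V @ W)

# Empirical <v_i h_j>_data: mean outer product v h^T over the batch.
# This replaces the intractable sum over all (v, h) configurations.
expectation = V.T @ H_prob / batch  # shape (n_visible, n_hidden)
```

The `_recon` term is estimated the same way, just with reconstructed visible states in place of the data.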

Article Unclear
This article is not very clear. For example, it defines the weights from the hidden units back to the visible units, but not those in the other direction. — Preceding unsigned comment added by Paulhummerman (talk • contribs) 19:34, 25 March 2013 (UTC)
 * There shouldn't be any links from the hidden layer back to the visible layer, according to the claim that it's not recurrent (i.e. no cycles). But the no-cycle limitation does not imply that there can't be links between nodes within the hidden layer: a single linear path from the first to the last hidden node contains no cycle (i.e. not recurrent). This article needs a few citations for the claims it's trying to make, and the claims which can't be sourced should be removed per WP:OR — Preceding unsigned comment added by 89.146.45.81 (talk) 12:38, 10 September 2014 (UTC)

Yes, but what does it do? What is it for? ;o)
The article tells us what it looks like and how to train it but I can't for the life of me see what it takes as input and what it gives as output. A bit more on that and especially an example or two would be a big improvement. 92.0.230.198 (talk) 17:31, 27 June 2015 (UTC)

82.67.197.191 (talk) 16:21, 24 September 2016 (UTC) You can input digital pictures (in which case the visible units are pixels), and the output on the hidden side can be seen as a lower-dimensional encoding of the data - check Hinton's 2006 Science paper to see examples along these lines.
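To make the input/output question concrete: clamp an image to the visible layer, and read the hidden-unit activation probabilities as a compressed code. A sketch in numpy (the weights here are random placeholders; a trained RBM would use learned ones):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

pixels = rng.integers(0, 2, size=64).astype(float)  # a flattened 8x8 binary "image"
W = rng.normal(0, 0.1, size=(64, 16))               # hypothetical trained weights
b_hidden = np.zeros(16)                             # hidden biases

# Input: 64 visible units (pixels). Output: 16 hidden activation
# probabilities, usable as a lower-dimensional encoding of the image.
encoding = sigmoid(pixels @ W + b_hidden)
```

Running the weights in the other direction (hidden to visible) reconstructs an image from a code, which is how the 2006 Science paper uses stacked RBMs as an autoencoder.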

Odd training text
Just chopped the following out:
 * A restricted/layered boltzmann machine (RBM) has either bit or scalar node values, an array for each layer, and between those are scalar values potentially for every pair of nodes one from each layer and an adjacent layer. It is run and trained using "weighted coin flips" of a chance calculated at each individual node. Those chances are the logistic sigmoid of the sum of scalar weights of whichever pairs of nodes are on at the time, divided by temperature which decreases in each round of Simulated annealing as potentially all the data is trained in again. If either node in a pair is off, that weight is not counted. To run it, you go up and down the layers, updating the chances and weighted coin flips, until it converges to the coins in lowest layer (visible nodes) staying mostly a certain way. To train it, its the same shape as running it except you observe the weights of the pairs that are on, the first time up you add the learning rate between those pairs, then go back down and up again and that time subtract the learning rate. As Geoffrey Hinton explained it, the first time up is to learn the data, and the second time up is to unlearn whatever its earlier reaction was to the data.

It was added by 66.169.5.181 in January, but reads very oddly. I'm not qualified to judge whether it's incorrect - perhaps some of this could be re-worded and re-introduced by someone who understands the topic? Snori (talk) 17:22, 6 August 2015 (UTC)
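For comparison, the standard reading of the "add the learning rate going up, subtract it going back up again" description is a CD-1 update: positive statistics from the data pass, negative statistics from the reconstruction pass. A hedged numpy sketch of one such step (sizes and the sampling details are illustrative, not a claim about the removed text's exact intent):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v0, lr=0.1):
    """One CD-1 step: 'learn' the data statistics, 'unlearn' the reconstruction's."""
    # First pass up: sample hidden bits from p(h | v0) ("weighted coin flips")
    h0 = (sigmoid(v0 @ W) > rng.random(W.shape[1])).astype(float)
    # Pass down: reconstruct visible probabilities from the hidden sample
    v1 = sigmoid(h0 @ W.T)
    # Second pass up: hidden probabilities for the reconstruction
    h1 = sigmoid(v1 @ W)
    # Add lr * positive (data) term, subtract lr * negative (reconstruction) term
    return W + lr * (np.outer(v0, h0) - np.outer(v1, h1))

W = rng.normal(0, 0.1, size=(5, 3))
v = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
W_new = cd1_update(W, v)
```

The removed text's simulated-annealing temperature schedule is a separate ingredient of the original (unrestricted) Boltzmann machine and isn't needed for plain CD-1.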

Completeness
"As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph (...)"

Does it need to form a complete bipartite graph? The "usual" Boltzmann machine seems to form a complete graph (Boltzmann machine: "A Boltzmann machine, like a Hopfield network (...)", Hopfield network: "In this sense, the Hopfield network can be formally described as a complete undirected graph (...)") --Nobelium (talk) 17:27, 9 April 2018 (UTC)