Continuous Bernoulli distribution

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter $$\lambda \in (0, 1)$$ and defined on the unit interval $$x \in [0, 1]$$ by:
 * $$ p(x | \lambda) \propto \lambda^x (1-\lambda)^{1-x}. $$
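
The proportionality above can be made concrete by computing the normalizing constant. The following is a minimal sketch, assuming the closed form $$ C(\lambda) = 2\,\operatorname{artanh}(1-2\lambda)/(1-2\lambda) $$ for $$\lambda \neq 1/2$$ and $$ C(1/2) = 2 $$, which is not stated in the text above; the function names are illustrative only. It checks numerically that the normalized density integrates to one.

```python
import math


def cb_norm_const(lam: float) -> float:
    """Assumed closed-form constant C(lam) for the continuous Bernoulli."""
    if abs(lam - 0.5) < 1e-6:
        return 2.0  # limiting value at lam = 1/2 (uniform density)
    return 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)


def cb_pdf(x: float, lam: float) -> float:
    """Normalized density C(lam) * lam**x * (1 - lam)**(1 - x) on [0, 1]."""
    return cb_norm_const(lam) * lam**x * (1.0 - lam) ** (1.0 - x)


def midpoint_integral(f, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    for lam in (0.1, 0.5, 0.9):
        total = midpoint_integral(lambda x: cb_pdf(x, lam))
        print(f"lambda={lam}: integral of density ~ {total:.8f}")
```

Without the factor $$C(\lambda)$$, the same integral differs from one for every $$\lambda \neq 1/2$$ and also at $$\lambda = 1/2$$, where it equals $$1/2$$.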

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders, for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $$[0,1]$$-valued data. This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $$\{0,1\}$$-valued data.
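
The effect of dropping the normalizing constant can be illustrated with a small sketch. It assumes the closed-form constant $$ C(\lambda) = 2\,\operatorname{artanh}(1-2\lambda)/(1-2\lambda) $$ (with $$ C(1/2) = 2 $$), so that the continuous Bernoulli log-likelihood equals the negative binary cross entropy plus $$\log C(\lambda)$$. The single observation `x = 0.8`, the grid search, and all function names are hypothetical choices made for illustration.

```python
import math


def log_norm_const(lam: float) -> float:
    """log C(lam), using the assumed closed form of the constant."""
    if abs(lam - 0.5) < 1e-6:
        return math.log(2.0)  # limiting value at lam = 1/2
    return math.log(2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam))


def bce(x: float, lam: float) -> float:
    """Binary cross entropy between a target x in [0, 1] and lam."""
    return -(x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam))


def cb_log_likelihood(x: float, lam: float) -> float:
    """Continuous Bernoulli log-density: -BCE plus the log normalizer."""
    return -bce(x, lam) + log_norm_const(lam)


# One continuous-valued observation and a coarse grid over lambda.
x = 0.8
grid = [i / 1000 for i in range(1, 1000)]

bce_best = min(grid, key=lambda lam: bce(x, lam))
cb_best = max(grid, key=lambda lam: cb_log_likelihood(x, lam))

print("lambda minimizing BCE:", bce_best)
print("lambda maximizing continuous Bernoulli likelihood:", cb_best)
```

Minimizing the binary cross entropy recovers $$\lambda = x$$, whereas the term $$\log C(\lambda)$$ depends on $$\lambda$$ and moves the maximum-likelihood value away from $$x$$ (here, closer to 1).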

The continuous Bernoulli also defines an exponential family of distributions. Writing $$\eta = \log\left(\lambda/(1-\lambda)\right)$$ for the natural parameter, the density can be rewritten in canonical form: $$ p(x | \eta) \propto \exp (\eta x) $$.
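
As a sketch of the canonical form, the normalizer in terms of $$\eta$$ follows from $$ \int_0^1 \exp(\eta x)\,dx = (e^{\eta} - 1)/\eta $$, giving $$ p(x | \eta) = \eta \exp(\eta x)/(e^{\eta} - 1) $$ for $$\eta \neq 0$$. The code below checks that this agrees with the $$\lambda$$ parameterization, assuming the closed-form constant $$ C(\lambda) = 2\,\operatorname{artanh}(1-2\lambda)/(1-2\lambda) $$; the function names are illustrative only.

```python
import math


def eta_from_lam(lam: float) -> float:
    """Natural parameter eta = log(lam / (1 - lam))."""
    return math.log(lam / (1.0 - lam))


def cb_pdf_natural(x: float, eta: float) -> float:
    """Density in canonical form: eta * exp(eta * x) / (exp(eta) - 1)."""
    if eta == 0.0:
        return 1.0  # eta = 0 corresponds to the uniform density
    return eta * math.exp(eta * x) / math.expm1(eta)


def cb_pdf_lambda(x: float, lam: float) -> float:
    """Density in the lambda parameterization (assumed closed-form constant)."""
    if abs(lam - 0.5) < 1e-6:
        c = 2.0
    else:
        c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c * lam**x * (1.0 - lam) ** (1.0 - x)


if __name__ == "__main__":
    for lam in (0.1, 0.3, 0.7, 0.9):
        eta = eta_from_lam(lam)
        for x in (0.0, 0.25, 1.0):
            a = cb_pdf_natural(x, eta)
            b = cb_pdf_lambda(x, lam)
            print(f"lam={lam}, x={x}: natural={a:.6f}, lambda form={b:.6f}")
```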

Bernoulli distribution
The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $$ \{0,1\} $$ by the probability mass function:
 * $$ p(x) = p^x (1-p)^{1-x}, $$

where $$ p $$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $$ [0,1] $$ results in the continuous Bernoulli probability density function, up to a normalizing constant.

Beta distribution
The Beta distribution has the density function:
 * $$ p(x) \propto x^{\alpha - 1} (1-x)^{\beta - 1}, $$

which, writing $$ x_1 = x $$, $$ x_2 = 1 - x $$, $$ \alpha_1 = \alpha $$ and $$ \alpha_2 = \beta $$, can be rewritten as:
 * $$ p(x) \propto x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1}, $$

where $$ \alpha_1, \alpha_2 $$ are positive scalar parameters, and $$(x_1, x_2)$$ represents an arbitrary point inside the 1-simplex, $$ \Delta^{1} = \{ (x_1, x_2): x_1 > 0, x_2 > 0, x_1 + x_2 = 1 \} $$. Switching the role of the parameter and the argument in this density function, we obtain:
 * $$ p(x) \propto \alpha_1^{x_1} \alpha_2^{x_2}. $$

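Because $$ x_1 + x_2 = 1 $$, rescaling $$ \alpha_1 $$ and $$ \alpha_2 $$ by a common positive factor changes only the normalizing constant, so this family is identifiable only up to scale. Imposing the linear constraint $$ \alpha_1 + \alpha_2 = 1 $$ and writing $$ \lambda = \alpha_1 $$, we obtain: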
 * $$ p(x) \propto \lambda^{x_1} (1-\lambda)^{x_2}, $$

corresponding exactly to the continuous Bernoulli density.

Exponential distribution
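An exponential distribution restricted (truncated) to the unit interval is a continuous Bernoulli distribution. For rate $$ r > 0 $$, the truncated density is proportional to $$ \exp(-r x) $$ on $$ [0, 1] $$, which is the continuous Bernoulli with natural parameter $$ \eta = -r $$, that is, $$ \lambda = 1/(1 + e^{r}) < 1/2 $$.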
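
The correspondence can be checked numerically. The sketch below assumes an exponential distribution with rate $$ r > 0 $$, whose density truncated to $$ [0, 1] $$ is $$ r e^{-r x}/(1 - e^{-r}) $$, and matches it to a continuous Bernoulli with $$ \lambda = 1/(1 + e^{r}) $$ (natural parameter $$ \eta = -r $$). The closed-form constant $$ C(\lambda) = 2\,\operatorname{artanh}(1-2\lambda)/(1-2\lambda) $$ and the function names are assumptions made for illustration.

```python
import math


def truncated_exp_pdf(x: float, rate: float) -> float:
    """Exponential(rate) density renormalized to the interval [0, 1]."""
    return rate * math.exp(-rate * x) / (-math.expm1(-rate))


def cb_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli density (assumed closed-form constant)."""
    if abs(lam - 0.5) < 1e-6:
        c = 2.0
    else:
        c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c * lam**x * (1.0 - lam) ** (1.0 - x)


def lam_from_rate(rate: float) -> float:
    """Matching parameter: eta = -rate, so lam = 1 / (1 + exp(rate))."""
    return 1.0 / (1.0 + math.exp(rate))


if __name__ == "__main__":
    for rate in (0.5, 2.0, 5.0):
        lam = lam_from_rate(rate)
        for x in (0.0, 0.3, 1.0):
            a = truncated_exp_pdf(x, rate)
            b = cb_pdf(x, lam)
            print(f"rate={rate}, x={x}: truncated exp={a:.6f}, CB={b:.6f}")
```

Since $$ r > 0 $$ gives $$ \lambda < 1/2 $$, this covers only decreasing densities; the case $$ \lambda > 1/2 $$ corresponds to the same construction applied to $$ 1 - x $$.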

Continuous categorical distribution
The multivariate generalization of the continuous Bernoulli is called the continuous categorical distribution.