Total sum of squares

In statistical data analysis, the total sum of squares (TSS or SST) is a quantity that appears as part of a standard way of presenting results of such analyses. It is defined as the sum, over all observations, of the squared differences of each observation from the overall mean.

Definition
In statistical linear models (particularly standard regression models), the TSS is the sum of the squared differences between the observed values of the dependent variable and their mean:
 * $$\mathrm{TSS}=\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$, where $$\bar{y}$$ is the mean of the observed values $$y_i$$.
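
A minimal sketch of this definition in Python with NumPy (the data values are made up, chosen only to illustrate the computation):

```python
import numpy as np

# Hypothetical observations y_i, used only to illustrate the definition.
y = np.array([2.0, 4.0, 6.0, 8.0])

# TSS: sum of squared deviations of each observation from the overall mean.
tss = np.sum((y - y.mean()) ** 2)

print(tss)  # 20.0 for these values
```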

Decomposition of total sum of squares
For wide classes of linear models, the total sum of squares equals the explained sum of squares (RegSS) plus the residual sum of squares (RSS): $$\mathrm{TSS} = \mathrm{RegSS} + \mathrm{RSS}$$, where $$\mathrm{RSS}=\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^2$$ and $$\mathrm{RegSS}=\sum_{i=1}^{n}\left(\hat{y}_{i}-\bar{y}\right)^2$$. Here, $$\hat{y}_{i}$$ is the least squares fitted value for observation i.

For a proof of this in the multivariate OLS case, see partitioning in the general OLS model.
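
The decomposition can also be checked numerically. The sketch below fits a least squares line to made-up data and verifies that TSS equals RegSS plus RSS; the identity holds for any OLS fit that includes an intercept:

```python
import numpy as np

# Made-up data for the check; any OLS fit with an intercept will do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares fit y_hat = b0 + b1 * x (polyfit returns the slope first).
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

tss   = np.sum((y - y.mean()) ** 2)      # total sum of squares
rss   = np.sum((y - y_hat) ** 2)         # residual sum of squares
regss = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares

print(np.isclose(tss, regss + rss))  # True
```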

Relationship between total sum of squares and R-squared
The statistic $$R^2$$ (R-squared) is usually used to compare the explained sum of squares with the total sum of squares:

$$R^2 = \frac{\mathrm{RegSS}}{\mathrm{TSS}}$$.

The ratio $$\frac{\mathrm{RegSS}}{\mathrm{TSS}}$$ is the fraction of the total variance that is explained by the explanatory variable(s).

In a linear model with only one explanatory variable (also called a simple linear model):
$$R^2 = \frac{\mathrm{RegSS}}{\mathrm{TSS}} = r^2_{xy}$$, where $$r_{xy}$$ is the sample correlation between the response variable y and the explanatory variable x.
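
This equality can be verified numerically; a minimal sketch, again assuming made-up data:

```python
import numpy as np

# Made-up data for a simple linear model with one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 4.1, 4.8])

b1, b0 = np.polyfit(x, y, 1)   # least squares slope and intercept
y_hat = b0 + b1 * x

# R^2 computed as RegSS / TSS.
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Sample correlation between x and y.
r_xy = np.corrcoef(x, y)[0, 1]

print(np.isclose(r2, r_xy ** 2))  # True
```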

In a linear model with multiple explanatory variables (also called a multivariable linear model):
$$R^2 = \frac{\mathrm{RegSS}}{\mathrm{TSS}}$$ is called the multiple $$R^2$$. Since there is more than one explanatory variable, $$R^2$$ is no longer the square of a simple correlation between the response and a single explanatory variable. The multiple $$R^2$$ measures how well the explanatory variables jointly explain the response variable.

Another way of viewing $$R^2$$:
When a linear model contains only the intercept, $$y = \beta_0 + \epsilon$$, the least squares estimate $$\hat{y}_i$$ equals $$\bar{y}$$, so the RSS for this model is

$$\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^2 = \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$,

which is equal to the TSS. Denote this model as the small model.

The TSS of the big model, $$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i$$, where p is the number of explanatory variables in the model, is also $$\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$, since the TSS depends only on the observations and not on the fitted model.

Since $$R^2 = \frac{\mathrm{RegSS}}{\mathrm{TSS}} = \frac{\mathrm{TSS}_{\text{big model}} - \mathrm{RSS}_{\text{big model}}}{\mathrm{TSS}_{\text{big model}}} = 1 - \frac{\mathrm{RSS}_{\text{big model}}}{\mathrm{TSS}_{\text{big model}}} = 1 - \frac{\mathrm{RSS}_{\text{big model}}}{\mathrm{RSS}_{\text{small model}}}$$, another way of viewing $$R^2$$ is as the proportional reduction in the residual sum of squares achieved by moving from the small model to the big model.
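
A sketch of this comparison on synthetic data: the big model below has two explanatory variables (so its $$R^2$$ is a multiple $$R^2$$), and the small model is the intercept-only model, whose RSS is the TSS. The data-generating coefficients are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data with two explanatory variables.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Big model: intercept plus both explanatory variables, fit by OLS.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_big = np.sum((y - X @ beta) ** 2)

# Small model: intercept only; its fitted values are y-bar, so its RSS is the TSS.
rss_small = np.sum((y - y.mean()) ** 2)

r2 = 1.0 - rss_big / rss_small
print(r2)  # close to 1 here because the noise is small
```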

A special case of $$R^2$$:
The RSS of the intercept-only linear model, $$y = \beta_0 + \epsilon$$, is equal to $$\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^2$$, the TSS.

Under this condition, $$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{TSS}}{\mathrm{TSS}} = 0$$. Here $$R^2$$ is just a comparison of the model with itself, so no additional variance is explained.

For more details about $$R^2 $$, see coefficient of determination.

Total sum of squares in ANOVA
In analysis of variance (ANOVA) the total sum of squares is the sum of the so-called "within-samples" sum of squares and the "between-samples" sum of squares; that is, the total sum of squares is partitioned. In multivariate analysis of variance (MANOVA) the following equation applies:
 * $$\mathbf{T} = \mathbf{W} + \mathbf{B},$$

where T is the total sum of squares and products (SSP) matrix, W is the within-samples SSP matrix and B is the between-samples SSP matrix. Similar terminology may also be used in linear discriminant analysis, where W and B are respectively referred to as the within-groups and between-groups SSP matrices.
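
A numeric sketch of this identity, assuming three hypothetical groups of bivariate observations:

```python
import numpy as np

# Hypothetical bivariate observations in three groups.
groups = [
    np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.2]]),
    np.array([[3.1, 0.5], [2.8, 0.9], [3.3, 0.4]]),
    np.array([[5.0, 4.1], [4.7, 3.8], [5.2, 4.3]]),
]

all_obs = np.vstack(groups)
grand_mean = all_obs.mean(axis=0)

def ssp(data, center):
    """Sum of squares and products (SSP) matrix of data about center."""
    d = data - center
    return d.T @ d

# T: total SSP of all observations about the grand mean.
T = ssp(all_obs, grand_mean)

# W: within-samples SSP, each observation about its own group mean.
W = sum(ssp(g, g.mean(axis=0)) for g in groups)

# B: between-samples SSP, group means about the grand mean, weighted by group size.
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean)
        for g in groups)

print(np.allclose(T, W + B))  # True
```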