Squared deviations from the mean

Squared deviations from the mean (SDM) result from squaring the deviations of individual values from their mean. In probability theory and statistics, the definition of variance is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for analysis of variance involve the partitioning of a sum of SDM.

Background

An understanding of the computations involved is greatly enhanced by a study of the statistical value

[math]\displaystyle{ \operatorname{E}( X ^ 2 ) }[/math], where [math]\displaystyle{ \operatorname{E} }[/math] is the expected value operator.

For a random variable [math]\displaystyle{ X }[/math] with mean [math]\displaystyle{ \mu }[/math] and variance [math]\displaystyle{ \sigma^2 }[/math],

[math]\displaystyle{ \sigma^2 = \operatorname{E}( X ^ 2 ) - \mu^2. }[/math][1]

Therefore,

[math]\displaystyle{ \operatorname{E}( X ^ 2 ) = \sigma^2 + \mu^2. }[/math]

From the above, the following can be derived for a sample of n independent observations [math]\displaystyle{ X_1, \ldots, X_n }[/math], each with mean [math]\displaystyle{ \mu }[/math] and variance [math]\displaystyle{ \sigma^2 }[/math]:

[math]\displaystyle{ \operatorname{E}\left( \sum\left( X ^ 2\right) \right) = n\sigma^2 + n\mu^2, }[/math]
[math]\displaystyle{ \operatorname{E}\left( \left(\sum X \right)^ 2 \right) = n\sigma^2 + n^2\mu^2. }[/math]
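
These identities are easy to check numerically. The following sketch (an illustrative addition, not part of the original article, using a normal distribution with assumed values μ = 2, σ = 3 and n = 10) simulates many samples of size n and compares the averages of Σ(x²) and (Σx)² with nσ² + nμ² and nσ² + n²μ².

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, trials = 2.0, 3.0, 10, 200_000         # assumed illustrative values

    samples = rng.normal(mu, sigma, size=(trials, n))    # each row is one sample of size n

    sum_of_squares = (samples ** 2).sum(axis=1)          # sum(x^2) for each sample
    square_of_sum = samples.sum(axis=1) ** 2             # (sum x)^2 for each sample

    print(sum_of_squares.mean(), n * sigma**2 + n * mu**2)      # both approx. 130
    print(square_of_sum.mean(), n * sigma**2 + n**2 * mu**2)    # both approx. 490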

Sample variance

Main page: Sample variance

The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by n or n − 1) is most easily calculated as

[math]\displaystyle{ S = \sum x ^ 2 - \frac{\left(\sum x\right)^2}{n} }[/math]

From the two expectations derived above, the expected value of this sum is

[math]\displaystyle{ \operatorname{E}(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n} }[/math]

which implies

[math]\displaystyle{ \operatorname{E}(S) = (n - 1)\sigma^2. }[/math]

This effectively proves the use of the divisor n − 1 in the calculation of an unbiased sample estimate of [math]\displaystyle{ \sigma^2 }[/math].
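
As a quick check, the sketch below (an illustrative addition, again using a normal distribution with assumed values μ = 2, σ = 3 and n = 10) computes S both from the shortcut formula above and directly as the sum of squared deviations from the sample mean, and confirms that the average of S over many samples is close to (n − 1)σ², so that S/(n − 1) estimates σ² without bias.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, trials = 2.0, 3.0, 10, 200_000       # assumed illustrative values

    samples = rng.normal(mu, sigma, size=(trials, n))

    # Shortcut formula: S = sum(x^2) - (sum x)^2 / n
    S = (samples ** 2).sum(axis=1) - samples.sum(axis=1) ** 2 / n

    # Direct definition: sum of squared deviations from each sample's own mean
    S_direct = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    print(np.allclose(S, S_direct))          # True: the two forms agree
    print(S.mean(), (n - 1) * sigma**2)      # both approx. 81, i.e. (n - 1) * sigma^2
    print((S / (n - 1)).mean(), sigma**2)    # dividing by n - 1 gives approx. sigma^2 = 9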

Partition — analysis of variance

Main page: Partition of sums of squares

When data are available for k different treatment groups of size [math]\displaystyle{ n_i }[/math], where i varies from 1 to k, it is assumed that the expected mean of each group is

[math]\displaystyle{ \operatorname{E}(\mu_i) = \mu + T_i }[/math]

and the variance of each treatment group is unchanged from the population variance [math]\displaystyle{ \sigma^2 }[/math].

Under the null hypothesis that the treatments have no effect, each of the [math]\displaystyle{ T_i }[/math] will be zero.

It is now possible to calculate three sums of squares:

Individual
[math]\displaystyle{ I = \sum x^2 }[/math]
[math]\displaystyle{ \operatorname{E}(I) = n\sigma^2 + n\mu^2 }[/math]
Treatments
[math]\displaystyle{ T = \sum_{i=1}^k \left(\left(\sum x\right)^2/n_i\right) }[/math]
[math]\displaystyle{ \operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2 }[/math]
[math]\displaystyle{ \operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^k (n_iT_i) + \sum_{i=1}^k n_i(T_i)^2 }[/math]

Under the null hypothesis that the treatments cause no differences and all the [math]\displaystyle{ T_i }[/math] are zero, the expectation simplifies to

[math]\displaystyle{ \operatorname{E}(T) = k\sigma^2 + n\mu^2. }[/math]
Combination
[math]\displaystyle{ C = \left(\sum x\right)^2/n }[/math]
[math]\displaystyle{ \operatorname{E}(C) = \sigma^2 + n\mu^2 }[/math]

Sums of squared deviations

Under the null hypothesis, the difference of any pair of I, T, and C does not depend on [math]\displaystyle{ \mu }[/math], only on [math]\displaystyle{ \sigma^2 }[/math].

[math]\displaystyle{ \operatorname{E}(I - C) = (n - 1)\sigma^2 }[/math] total squared deviations, also known as the total sum of squares
[math]\displaystyle{ \operatorname{E}(T - C) = (k - 1)\sigma^2 }[/math] treatment squared deviations, also known as the explained sum of squares
[math]\displaystyle{ \operatorname{E}(I - T) = (n - k)\sigma^2 }[/math] residual squared deviations, also known as the residual sum of squares

The constants (n − 1), (k − 1), and (n − k) are normally referred to as the number of degrees of freedom.

Example

In a very simple example, five observations arise from two treatments. The first treatment gives the three values 1, 2, and 3, and the second treatment gives the two values 4 and 6.

[math]\displaystyle{ I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66 }[/math]
[math]\displaystyle{ T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62 }[/math]
[math]\displaystyle{ C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2 }[/math]

Giving

Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
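
The arithmetic of this example can be reproduced with a few lines of code (an illustrative sketch; the variable names are not from the article):

    groups = [[1, 2, 3], [4, 6]]             # the two treatment groups of the example
    flat = [x for g in groups for x in g]    # all n = 5 observations
    n, k = len(flat), len(groups)

    I = sum(x ** 2 for x in flat)                     # individual sum of squares
    T = sum(sum(g) ** 2 / len(g) for g in groups)     # treatment sum of squares
    C = sum(flat) ** 2 / n                            # combination term

    print(I, T, C)          # 66, 62.0, 51.2
    print(I - C, n - 1)     # total squared deviations:     14.8 with 4 degrees of freedom
    print(T - C, k - 1)     # treatment squared deviations: 10.8 with 1 degree of freedom
    print(I - T, n - k)     # residual squared deviations:   4.0 with 3 degrees of freedom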

Two-way analysis of variance

In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA aims not only to assess the main effect of each independent variable but also to determine whether there is any interaction between them.
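
As an illustration only (the small data set and the column names y, a and b below are invented for this sketch, not taken from the article), a two-way ANOVA with interaction can be fitted in Python using the statsmodels package:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # Made-up data: continuous response y, two categorical factors a and b
    df = pd.DataFrame({
        "y": [3.1, 2.9, 4.2, 4.8, 5.0, 5.4, 6.1, 6.3],
        "a": ["low", "low", "low", "low", "high", "high", "high", "high"],
        "b": ["x", "x", "z", "z", "x", "x", "z", "z"],
    })

    # 'C(a) * C(b)' requests both main effects and their interaction
    model = ols("y ~ C(a) * C(b)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # ANOVA table partitioning the sums of squares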

References

  1. Mood & Graybill: An Introduction to the Theory of Statistics (McGraw-Hill)