Formulas

FORMULAS / STRATEGY FOR STATISTICS

Probability

Complement Law - P(A’) = 1 - P(A)

Laws Of Addition - P(A ∪ B) = P(A) + P(B) - P(A ∩ B), if A and B are not mutually exclusive
P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive

Conditional Probability - P(A|B) = P(A ∩ B) / P(B)

Independent Condition - If A and B are independent, P(A ∩ B) = P(A) x P(B)

Laws Of Multiplication - If A and B are dependent, P(A ∩ B) = P(A) x P(B|A) or
P(A ∩ B) = P(B) x P(A|B)
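
The probability laws above can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die; the events A and B and the helper prob are made up for illustration.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "even number"
B = {4, 5, 6}   # event "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

p_complement = 1 - prob(A)                  # complement law
p_union = prob(A) + prob(B) - prob(A & B)   # addition law (not mutually exclusive)
p_a_given_b = prob(A & B) / prob(B)         # conditional probability
p_product = prob(B) * p_a_given_b           # multiplication law, gives P(A and B)

print(p_complement, p_union, p_a_given_b, p_product)
```

Here P(A ∩ B) = 1/3 differs from P(A) x P(B) = 1/4, so these two events are dependent.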

Descriptive Statistics

Population Mean, μ = (Σ all values) / N

Sample Mean, x̄ = (Σ all values) / n

Population Variance, σ² = Σ(X - μ)² / N

Sample Variance, S² = Σ(x - x̄)² / (n - 1)

Standard Deviation = square root of σ² or S²
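
As a quick illustration of the N versus n - 1 divisors, Python's standard statistics module provides both versions. The data values below are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values

mean = statistics.fmean(data)            # sum of all values / number of values
pop_var = statistics.pvariance(data)     # population variance, divides by N
sample_var = statistics.variance(data)   # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)         # square root of population variance
sample_sd = statistics.stdev(data)       # square root of sample variance

print(mean, pop_var, sample_var, pop_sd, sample_sd)
```

The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by n - 1 instead of N.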

Probability Distribution

Expected Value, E(X) = Σ x P(X = x) = μ, summed over all x
Properties of E(X),
E(a) = a
E(aX) = aE(X)
E(aX ± b) = aE(X) ± b
E(X₁ ± X₂) = E(X₁) ± E(X₂)
E(X²) = Σ x² P(X = x), summed over all x

Variance, Var(X) = E[(X - μ)²] or Var(X) = E(X²) - [E(X)]²
Properties of Var(X),
Var(a) = 0
Var(aX) = a²Var(X)
Var(aX ± b) = a²Var(X)
Var(X₁ ± X₂) = Var(X₁) + Var(X₂), when X₁ and X₂ are independent

Standard Deviation = square root of var(x)
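
A minimal sketch of these definitions for a discrete distribution, using the shortcut Var(X) = E(X²) - [E(X)]² and checking the rule Var(aX + b) = a²Var(X). The probability table and the constants a and b are hypothetical.

```python
from fractions import Fraction
import math

# Hypothetical distribution: value of X -> P(X = x).
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}

def expect(g, table):
    """E[g(X)] = sum of g(x) * P(X = x) over all x."""
    return sum(g(x) * p for x, p in table.items())

mean = expect(lambda x: x, pmf)                  # E(X)
second_moment = expect(lambda x: x ** 2, pmf)    # E(X^2)
var = second_moment - mean ** 2                  # Var(X)
sd = math.sqrt(var)                              # standard deviation

# Distribution of Y = aX + b, to check the linear-transformation rules.
a, b = 3, 4
pmf_y = {a * x + b: p for x, p in pmf.items()}
mean_y = expect(lambda y: y, pmf_y)
var_y = expect(lambda y: y ** 2, pmf_y) - mean_y ** 2

print(mean, var, sd, mean_y, var_y)
```

Exact fractions are used so that the identities hold without rounding error.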

Binomial Distribution - X ~ Bin(n, p)
Characteristics,
The experiment consists of a fixed number of trials, n
Each trial results in either success or failure
The probability of success, p, is the same in every trial, and the trials are independent

E(X) = np
Var(X) = npq, where q = 1 - p
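
The mean and variance formulas can be confirmed numerically from the binomial probabilities themselves. This is a sketch with assumed values n = 10 and p = 0.3; binom_pmf is a helper written for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # assumed example values
q = 1 - p

probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))                  # should equal np
var = sum(k ** 2 * pk for k, pk in enumerate(probs)) - mean ** 2  # should equal npq

print(mean, n * p)
print(var, n * p * q)
```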

Continuous Distribution - X ~ N(μ, σ²)
Standardising, z = (x - μ) / σ

Normal Approximation to Binomial Distribution - X ~ N(np, npq) approx.
Conditions,
Number of trials n > 50
Must use continuity correction
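
To show why the continuity correction matters, the sketch below compares the exact binomial probability P(X ≤ 45) with the normal approximation, with and without the correction. The values n = 100 and p = 0.4 are assumed for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4   # assumed example values
q = 1 - p

# Exact binomial probability P(X <= 45).
exact = sum(comb(n, k) * p ** k * q ** (n - k) for k in range(46))

# Approximating normal distribution N(np, npq).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
with_correction = approx.cdf(45.5)     # P(X <= 45) becomes P(X < 45.5)
without_correction = approx.cdf(45)

print(exact, with_correction, without_correction)
```

With these values the corrected approximation lands much closer to the exact probability than the uncorrected one.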

Joint Probability

Conditional Mean - E(X | Y = y1) = Σ x P(X = x | Y = y1), summed over all x

E(XY) = Σ Σ x y P(X = x and Y = y), summed over all x and all y
When X and Y are independent, E(XY) = E(X) E(Y)

Covariance of 2 random variables, σxy = Cov(X, Y) = E(XY) - E(X)E(Y)
When X and Y are independent, Cov(X, Y) = 0, since E(XY) = E(X)E(Y)

Correlation Coefficient, ρ = Cov(X, Y) / √[Var(X) Var(Y)], -1 ≤ ρ ≤ 1

Formula for Variance of linear combinations of 2 dependent variables -
Var(X ± Y) = Var(X) + Var(Y) ± 2Cov(X, Y)
Var(aX ± bY) = a²Var(X) + b²Var(Y) ± 2abCov(X, Y)
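
The joint-probability formulas can be traced through on a small table. The sketch below uses a made-up joint distribution of two 0/1 variables and checks Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) against a direct calculation.

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint distribution: (x, y) -> P(X = x and Y = y).
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(3, 8)}

def expect(g):
    """E[g(X, Y)] = sum of g(x, y) * P(X = x and Y = y) over the table."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)

cov = e_xy - e_x * e_y                            # Cov(X, Y)
var_x = expect(lambda x, y: x ** 2) - e_x ** 2
var_y = expect(lambda x, y: y ** 2) - e_y ** 2
rho = cov / sqrt(var_x * var_y)                   # correlation coefficient

# Var(X + Y) by the formula and directly from the distribution of X + Y.
var_sum_formula = var_x + var_y + 2 * cov
var_sum_direct = expect(lambda x, y: (x + y) ** 2) - expect(lambda x, y: x + y) ** 2

print(cov, rho, var_sum_formula, var_sum_direct)
```

The covariance here is non-zero, so dropping the 2Cov(X, Y) term would give the wrong variance for X + Y.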

Distribution Of Sample Mean And Sample Proportion

Let X denote the population variable, μ the population mean and σ² the population variance.
Then, for a sample of size n,
x̄ ~ N(μ, σ²/n)
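
A short sketch of how the sampling distribution of the mean is used: the standard deviation of the sample mean is σ/√n, so probabilities about the sample mean come from N(μ, σ²/n). The population values and the threshold 54 are assumed for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 25   # assumed population mean, sd and sample size

# Sample mean ~ N(mu, sigma^2 / n), so its standard deviation is sigma / sqrt(n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Probability that the sample mean exceeds 54 (that is, z > 2).
p_above_54 = 1 - xbar_dist.cdf(54)

print(xbar_dist.stdev, p_above_54)
```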

Let p denote the population proportion, P the sample proportion and n the sample size,
then
P ~ N(p, p(1-p)/n)

If p is unknown,
P ~ N(P, P(1-P)/n) approx., where P is the sample proportion, with the use of the continuity correction ± 1/(2n)

Theory Of Estimation

Mean Square Error - MSE = E[(V - θ)²], where V is the estimator of the true value θ
The best estimator of the true value is the one that yields the lowest MSE

Confidence Interval - An interval that contains the true value with a stated level of confidence.

3 Cases Of Formula For Confidence Interval -

For population mean μ where
σ² given - μ = x̄ ± Z_(sig level) √(σ²/n)
σ² unknown, sample size n > 50 - μ = x̄ ± Z_(sig level) √(S²/n)
σ² unknown, sample size n < 50 - μ = x̄ ± t_(sig level) √(S²/n), with n - 1 degrees of freedom
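
A minimal sketch of the first case (population variance known), using the standard normal quantile from Python's statistics module. The function name z_interval and the sample figures are assumptions for illustration. The t-based case needs a t-table or a statistics library, which is not shown here.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma2, n, confidence=0.95):
    """Two-sided confidence interval for the mean when the variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * sqrt(sigma2 / n)
    return xbar - half_width, xbar + half_width

# Assumed example: sample mean 20, known variance 16, sample size 64.
low, high = z_interval(xbar=20.0, sigma2=16.0, n=64)
print(low, high)
```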

For difference in population means μx - μy where
σ² given -
μD = (x̄ - ȳ) ± Z_(sig level) √(σx²/nx + σy²/ny)

σ² unknown, sample sizes n > 50 -
μD = (x̄ - ȳ) ± Z_(sig level) √(Sx²/nx + Sy²/ny)

σ² unknown, sample sizes n < 50 -
μD = (x̄ - ȳ) ± t_(sig level) √(Sp²/nx + Sp²/ny), with nx + ny - 2 degrees of freedom, where the pooled variance is
Sp² = [Σ(x - x̄)² + Σ(y - ȳ)²] / (nx + ny - 2)
or, equivalently,
Sp² = [Sx²(nx - 1) + Sy²(ny - 1)] / (nx + ny - 2)
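
The two forms of the pooled variance give the same number, as the sketch below checks on two small made-up samples. The critical value 2.365 is the usual t-table entry for 7 degrees of freedom at 95% two-sided confidence; the confidence level itself is an assumption of this example.

```python
import statistics
from math import sqrt

x = [12, 15, 11, 14, 13]   # made-up sample 1
y = [10, 9, 12, 11]        # made-up sample 2
nx, ny = len(x), len(y)
xbar, ybar = statistics.fmean(x), statistics.fmean(y)

# Form 1: pooled sums of squared deviations.
ss_x = sum((v - xbar) ** 2 for v in x)
ss_y = sum((v - ybar) ** 2 for v in y)
sp2_from_sums = (ss_x + ss_y) / (nx + ny - 2)

# Form 2: sample variances weighted by their degrees of freedom.
sp2_from_vars = (statistics.variance(x) * (nx - 1)
                 + statistics.variance(y) * (ny - 1)) / (nx + ny - 2)

# Confidence interval for the difference in means.
std_error = sqrt(sp2_from_sums / nx + sp2_from_sums / ny)
t_crit = 2.365   # from a t-table: 7 degrees of freedom, 95% two-sided
diff = xbar - ybar
interval = (diff - t_crit * std_error, diff + t_crit * std_error)

print(sp2_from_sums, sp2_from_vars, interval)
```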

Paired Samples -
μD = D̄ ± t_(sig level) √(SD²/nD), where D is the difference between the paired samples.

For Population Proportion, P ~ N(p, p(1-p)/n)
p is not given, so the variance is estimated by P(1-P)/n, giving the confidence interval
p = P ± Z_(sig level) √(P(1-P)/n)
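
A sketch of the proportion interval, with the sample proportion standing in for the unknown p in the variance. The function name proportion_interval and the counts (60 successes out of 200) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Two-sided confidence interval for a population proportion."""
    p_hat = successes / n                                 # sample proportion P
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Assumed example: 60 successes in a sample of 200.
low, high = proportion_interval(successes=60, n=200)
print(low, high)
```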

Hypothesis Testing

Procedure:
State the null and alternative hypotheses
Determine whether the test is one-sided or two-sided
Find Ztest or ttest and compare the result with Zcritical or tcritical respectively
Decision Rule: if |Ztest| < Zcritical or |ttest| < tcritical, do not reject the null hypothesis; otherwise reject it
Conclude in relation to the hypothesis / question

e.g.,
Ztest = (x̄ - μ) / (σ/√n)

P-value -
Decision Rule
Reject H0 if p-value < level of significance
Do not reject H0 if p-value ≥ level of significance
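
The critical-value rule and the p-value rule lead to the same decision, as this sketch of a two-sided z test shows. The function name z_test and the sample figures are assumptions for illustration, and the population standard deviation is taken as known.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mean = mu0, with known population sd."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))              # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # two-sided critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))        # two-sided p-value
    reject = abs(z) > z_crit                          # critical-value rule
    return z, p_value, reject

# Assumed example: sample mean 52 from n = 64, testing H0: mean = 50, sd = 8.
z, p_value, reject = z_test(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(z, p_value, reject)
```

For a one-sided test, the critical value would use alpha instead of alpha / 2 and the p-value would not be doubled.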

Type I Error -