Formulas

FORMULAS / STRATEGY FOR STATISTICS

Probability

Complement Law - $P(A') = 1 - P(A)$, where $A'$ is the complement of $A$

Laws Of Addition - $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, if A and B are not mutually exclusive
$P(A \cup B) = P(A) + P(B)$, if A and B are mutually exclusive

Conditional Probability - $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

Independent Condition - If A and B are independent, $P(A \cap B) = P(A) \times P(B)$

Laws Of Multiplication - If A and B are dependent, $P(A \cap B) = P(A) \times P(B \mid A)$ or
$P(A \cap B) = P(B) \times P(A \mid B)$
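The probability laws above can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die and two made-up events; the names omega, A, B and prob are illustrative only.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = set(range(1, 7))
A = {2, 4, 6}   # event "even"
B = {4, 5, 6}   # event "greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

p_complement = 1 - prob(A)                  # complement law
p_union = prob(A) + prob(B) - prob(A & B)   # addition law (A, B not mutually exclusive)
p_a_given_b = prob(A & B) / prob(B)         # conditional probability
p_both = prob(B) * p_a_given_b              # multiplication law

print(p_complement, p_union, p_a_given_b, p_both)
```

Using Fraction keeps every result exact, so each law can be compared directly with a count over the sample space.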

Descriptive Statistics

Population Mean, $\mu = \frac{\sum X}{N}$

Sample Mean, $\bar{x} = \frac{\sum x}{n}$

Population Variance, $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Sample Variance, $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation = $\sqrt{\sigma^2}$ or $\sqrt{S^2}$
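As a rough illustration of the difference between dividing by N and by n - 1, the sketch below uses Python's standard statistics module on a made-up data set.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values for illustration

mean = statistics.mean(data)            # sum of all values / number of values
pop_var = statistics.pvariance(data)    # population variance, divides by N
samp_var = statistics.variance(data)    # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)        # square root of the population variance
samp_sd = statistics.stdev(data)        # square root of the sample variance

print(mean, pop_var, samp_var, pop_sd, samp_sd)
```

The sample variance is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.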

Probability Distribution

Expected Value, $E(X) = \sum_{\text{all } x} x \, P(X = x) = \mu$
Properties of E(X),
$E(a) = a$
$E(aX) = aE(X)$
$E(aX \pm b) = aE(X) \pm b$
$E(X_1 \pm X_2) = E(X_1) \pm E(X_2)$
$E(X^2) = \sum_{\text{all } x} x^2 \, P(X = x)$

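Variance, $\mathrm{Var}(X) = E[(X - \mu)^2]$ or $\mathrm{Var}(X) = E(X^2) - \mu^2$
Properties of Var(X),
$\mathrm{Var}(a) = 0$
$\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$
$\mathrm{Var}(aX \pm b) = a^2 \mathrm{Var}(X)$
$\mathrm{Var}(X_1 \pm X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2)$, if $X_1$ and $X_2$ are independent
where $E(X^2) = \sum_{\text{all } x} x^2 \, P(X = x)$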

Standard Deviation = $\sqrt{\mathrm{Var}(X)}$
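The expected-value and variance rules can be verified numerically. This is a minimal sketch on a hypothetical three-point distribution; the helper expect and the constants a and b are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical probability distribution: value -> probability.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(dist, g=lambda v: v):
    """E[g(X)]: sum of g(x) * P(X = x) over all x."""
    return sum(g(x) * p for x, p in dist.items())

mu = expect(pmf)                       # E(X)
ex2 = expect(pmf, lambda v: v * v)     # E(X^2)
var = ex2 - mu ** 2                    # Var(X) = E(X^2) - mu^2

# Distribution of Y = aX + b, built directly from the distribution of X.
a, b = 3, 2
pmf_y = {a * x + b: p for x, p in pmf.items()}
mu_y = expect(pmf_y)
var_y = expect(pmf_y, lambda v: v * v) - mu_y ** 2

print(mu, var, mu_y, var_y)
```

Adding the constant b shifts the mean but leaves the variance unchanged, which is why b does not appear in the variance rule.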

Binomial Distribution - $X \sim \mathrm{Bin}(n, p)$
Characteristics,
The experiment consists of a fixed number of trials, n
Each trial results in either a success or a failure
The probability of success, p, is the same for every trial

$E(X) = np$
$\mathrm{Var}(X) = npq$, where $q = 1 - p$
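A short check of E(X) = np and Var(X) = npq, assuming the usual binomial probability formula (which this sheet does not state) and the arbitrary values n = 10 and p = 0.3. The function name binom_pmf is illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * q^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # arbitrary example values
q = 1 - p

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k * k * pk for k, pk in enumerate(pmf)) - mean ** 2

print(mean, n * p)        # both close to 3
print(var, n * p * q)     # both close to 2.1
```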

Continuous Distribution - $X \sim N(\mu, \sigma^2)$
Standardising, $z = \frac{x - \mu}{\sigma}$
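To illustrate standardising, the sketch below compares a probability computed directly from a normal distribution with the same probability computed from the z value. The parameters 100 and 15 and the value 130 are assumed for the example.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # assumed population mean and standard deviation
x = 130.0                 # value to standardise

z = (x - mu) / sigma                       # standardised value
p_direct = NormalDist(mu, sigma).cdf(x)    # P(X <= x) on the original scale
p_via_z = NormalDist().cdf(z)              # P(Z <= z) on the standard normal scale

print(z, p_direct, p_via_z)
```

Both probabilities agree, which is why a single standard normal table is enough for any normal distribution.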

Normal Approximation to Binomial Distribution - $X \sim N(np, npq)$ approximately
Conditions,
Number of trials n > 50
Must use continuity correction
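The effect of the continuity correction can be seen by comparing the exact binomial probability with the normal approximation. This sketch assumes n = 100, p = 0.5 and the event X <= 55, all chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5   # assumed example; n > 50 as required above
q = 1 - p
k = 55

# Exact P(X <= k) from the binomial distribution.
exact = sum(comb(n, i) * p ** i * q ** (n - i) for i in range(k + 1))

approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
corrected = approx_dist.cdf(k + 0.5)   # with continuity correction
uncorrected = approx_dist.cdf(k)       # without continuity correction

print(exact, corrected, uncorrected)
```

For this example the corrected value is much closer to the exact probability than the uncorrected one.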

Joint Probability

Conditional Mean - $E(X \mid Y = y_1) = \sum_{\text{all } x} x \, P(X = x \mid Y = y_1)$

$E(XY) = \sum_{\text{all } x} \sum_{\text{all } y} x y \, P(X = x \text{ and } Y = y)$
When X and Y are independent, $E(XY) = E(X) E(Y)$

Covariance of 2 random variables, $\sigma_{XY} = \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$
When X and Y are independent, $\mathrm{Cov}(X, Y) = 0$, since $E(XY) = E(X)E(Y)$

Correlation Coefficient, $\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$, $-1 \le \rho \le 1$

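Formula for Variance of linear combinations of 2 dependent variables -
$\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \pm 2\,\mathrm{Cov}(X, Y)$
$\mathrm{Var}(aX \pm bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) \pm 2ab\,\mathrm{Cov}(X, Y)$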
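The joint-probability formulas can be worked through on a small table. The joint distribution below is hypothetical, and the helper name expect is an assumption for this sketch.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint distribution: (x, y) -> P(X = x and Y = y).
joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}

def expect(g):
    """E[g(X, Y)]: sum of g(x, y) * P(X = x and Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
exy = expect(lambda x, y: x * y)
cov = exy - ex * ey                                # Cov(X, Y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
rho = float(cov) / sqrt(float(var_x * var_y))      # correlation coefficient

# Var(X + Y) computed directly and through the formula.
es = expect(lambda x, y: x + y)
var_sum_direct = expect(lambda x, y: (x + y) ** 2) - es ** 2
var_sum_formula = var_x + var_y + 2 * cov

print(cov, rho, var_sum_direct, var_sum_formula)
```

Because the covariance here is not zero, dropping the 2Cov(X, Y) term would give the wrong variance for the sum.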

Distribution Of Sample Mean And Sample Proportion

Let X denote the population variable, $\mu$ the population mean and $\sigma^2$ the population variance.
Then, for a sample of size n,
$\bar{x} \sim N(\mu, \sigma^2/n)$

Let p denote the population proportion, P the sample proportion and n the sample size,
then
$P \sim N\left(p, \frac{p(1-p)}{n}\right)$

If p is unknown, it is estimated by the sample proportion P, so that approximately
$P \sim N\left(P, \frac{P(1-P)}{n}\right)$, with the use of the continuity correction $\pm \frac{1}{2n}$

Theory Of Estimation

Mean Square Error - $\mathrm{MSE} = E[(V - \theta)^2]$, where V is the estimator and $\theta$ is the true value
The best estimator of the true value is the one that yields the lowest MSE

Confidence Interval - The interval within which the true value is likely to lie, at a stated level of confidence.

3 Cases Of Formula For Confidence Interval -

For population mean $\mu$ where
$\sigma^2$ given - $\mu = \bar{x} \pm z_{\text{sig level}} \sqrt{\sigma^2/n}$
$\sigma^2$ unknown, sample size n > 50 - $\mu = \bar{x} \pm z_{\text{sig level}} \sqrt{S^2/n}$
$\sigma^2$ unknown, sample size n < 50 - $\mu = \bar{x} \pm t_{\text{sig level}} \sqrt{S^2/n}$
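A minimal sketch of the first case (population variance given), using the standard library to look up the z value. The function name z_interval, the sample values and sigma = 2 are assumptions for the example; the t cases would need a t table or a third-party library and are not shown.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when sigma is known: x_bar +/- z * sqrt(sigma^2 / n)."""
    n = len(sample)
    # Two-sided z value: leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(sigma ** 2 / n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

sample = [10, 12, 11, 13, 9, 11, 12, 10, 11, 11]   # made-up observations
low, high = z_interval(sample, sigma=2.0)
print(low, high)
```

Raising the confidence level widens the interval, and a larger sample narrows it.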

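For difference in population means $\mu_x - \mu_y$ where
$\sigma^2$ given -
$\mu_D = (\bar{x} - \bar{y}) \pm z_{\text{sig level}} \sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}$

$\sigma^2$ unknown, sample size n > 50 -
$\mu_D = (\bar{x} - \bar{y}) \pm z_{\text{sig level}} \sqrt{S_x^2/n_x + S_y^2/n_y}$

$\sigma^2$ unknown, sample size n < 50 -
$\mu_D = (\bar{x} - \bar{y}) \pm t_{\text{sig level}} \sqrt{S_p^2/n_x + S_p^2/n_y}$, where the pooled variance is
$S_p^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_x + n_y - 2} = \frac{S_x^2 (n_x - 1) + S_y^2 (n_y - 1)}{n_x + n_y - 2}$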
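The two forms of the pooled variance give the same number, as this small sketch suggests. The function names and the two samples are made up for illustration.

```python
from statistics import mean, variance

def pooled_variance(xs, ys):
    """Pooled variance from the two sample variances."""
    nx, ny = len(xs), len(ys)
    return (variance(xs) * (nx - 1) + variance(ys) * (ny - 1)) / (nx + ny - 2)

def pooled_variance_from_sums(xs, ys):
    """Pooled variance from the sums of squared deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return (ss_x + ss_y) / (len(xs) + len(ys) - 2)

xs = [1, 2, 3, 4, 5]   # made-up sample x
ys = [2, 4, 6]         # made-up sample y

sp2 = pooled_variance(xs, ys)
sp2_check = pooled_variance_from_sums(xs, ys)
print(sp2, sp2_check)
```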

Paired Samples -
$\mu_D = \bar{D} \pm t_{\text{sig level}} \sqrt{S_D^2/n_D}$, where D is the difference between the paired samples.

For Population Proportion, $P \sim N\left(p, \frac{p(1-p)}{n}\right)$
If p is not given, the variance is estimated by $P(1-P)/n$, giving the confidence interval
$p = P \pm z_{\text{sig level}} \sqrt{P(1-P)/n}$

Hypothesis Testing

Procedure:
State the null and alternative hypotheses
Determine whether the test is one-sided or two-sided
Find Ztest or ttest and compare the result with Zcritical or tcritical respectively
Decision Rule: if |Ztest| < Zcritical or |ttest| < tcritical, do not reject the null hypothesis; otherwise reject it
Conclude in relation to the hypothesis / question

e.g.,
$Z_{\text{test}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

P-value -
Decision Rule
Reject H0 if p-value < level of significance
Do not reject H0 if p-value $\ge$ level of significance
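The procedure and both decision rules can be sketched for a two-sided z test with known population standard deviation. The function name z_test and all the numbers in the example are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 when sigma is known."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))     # test statistic
    z_critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))      # two-sided p-value
    reject = abs(z) >= z_critical                   # critical-value decision rule
    return z, z_critical, p_value, reject

# Made-up example: sample mean 52 from n = 36, testing mu0 = 50 with sigma = 6.
z, z_critical, p_value, reject = z_test(52.0, 50.0, 6.0, 36)
print(z, z_critical, p_value, reject)
```

Comparing |Ztest| with Zcritical and comparing the p-value with the level of significance lead to the same decision; they are two views of the same test.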

Type I Error -