MATHSLIVE.IE
Need to Know

Statistics

The methods, techniques and warnings — the things that don't fit on a flashcard. Read it, learn it, use it.

How to use this. Read each heading and try to recall the method in your head before reading the card. No score, no pressure: this is where you learn it.

Sampling & surveys

5 cards
Card 1
Sampling — guidelines before you pick a method

1. Set a clear target population.

2. Set a clear sampling frame — the list you actually sample from.

3. Make sure the sample size is large enough.

4. Sample must be random.

5. No bias — don’t over-sample one group (e.g. men over women).

Sample size — too small = unreliable. Too big = costly & slow.
Card 2
Three sources of bias

Bias = distorted results. Three classic ways it sneaks in:

1. Sample not representative — e.g. asking about alcohol abuse in a pub.

2. Failure to respond — the people who don’t answer may be different.

3. Dishonest answers.

Card 3
Stratified — proportional, not equal

Use a natural division (gender, year group, class) to split the population into strata.

Pick the sample in proportion to each stratum’s size.

e.g. 1st years = 50, 6th years = 100 ⇒ take twice as many 6th years as 1st years in the sample.

Random within each stratum — otherwise you reintroduce bias.
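Proportional allocation is arithmetic followed by random picks inside each stratum. Below is a minimal Python sketch. It makes three assumptions: the function names are hypothetical, each stratum is given as a list of its members, and counts are rounded to whole people (so awkward sizes can leave the total off by one).

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    population = sum(strata_sizes.values())
    # Round to whole people; awkward sizes can leave the total off by one.
    return {name: round(sample_size * size / population)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """strata maps each stratum name to a list of its members.
    Members are chosen at random *within* each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    counts = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], counts[name]) for name in strata}

# The card's example with a hypothetical total sample of 30:
counts = proportional_allocation({"1st years": 50, "6th years": 100}, 30)
print(counts)   # {'1st years': 10, '6th years': 20}
```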

Card 4
Survey types — pros & cons

Postal — cheap, large reach. But poor response rate, limited type of data.

Personal — high response, can ask lots. But expensive, interviewer bias.

Observation — systematic. But laborious, time-consuming.

Card 5
Good questionnaire — rules

• Brief, clear questions; simple questions first.

• Multiple-choice answers where possible.

• Clear instructions.

• Be clear who fills it in and how answers are recorded.

• NO leading questions.

• NO embarrassing or personal questions.

Averages, mean problems & spread

6 cards
Card 6
Mean problems — the workhorse formula

Sum = mean × n

If you know the mean and how many values there are, you know the total. Almost every “missing number” mean problem starts here.

Card 7
Missing number, given the mean

1. Use Sum = mean × n.

2. Write: (sum of known) + x = mean × n.

3. Solve for x.

e.g. Mean of 5 numbers is 7; four of them are 3, 8, 6, 9. → 3 + 8 + 6 + 9 + x = 35 ⇒ x = 9.
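Here are the three steps as a small Python sketch. The helper name `missing_value` is hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def missing_value(known, mean, n):
    """Sum = mean × n, so the missing value is the total minus the known values."""
    total = Fraction(mean) * n   # Sum = mean × n
    return total - sum(known)    # (sum of known) + x = total, so x = total − known

x = missing_value([3, 8, 6, 9], mean=7, n=5)
print(x)   # 9
```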
Card 8
Missing frequency, given the mean

1. From x̄ = Σfx / Σf:

2. Write Σfx = mean × Σf.

3. Both sides contain the missing f. Expand and solve.
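Below is a Python sketch of the expand-and-solve step, assuming exactly one frequency is missing. The table values and the helper name are made up for illustration.

```python
from fractions import Fraction

def missing_frequency(values, freqs, missing_x, mean):
    """values/freqs: the rows whose frequency is known.
    missing_x: the value whose frequency f is unknown (must differ from the mean).
    Solves Σfx + f·missing_x = mean × (Σf + f) for f."""
    mean = Fraction(mean)
    sum_fx = sum(f * x for x, f in zip(values, freqs))  # known part of Σfx
    sum_f = sum(freqs)                                   # known part of Σf
    # Expand: sum_fx + f·missing_x = mean·sum_f + mean·f
    # Collect f: f·(missing_x − mean) = mean·sum_f − sum_fx
    return (mean * sum_f - sum_fx) / (missing_x - mean)

# Hypothetical table: x = 1, 2, 3, 4 with f = 3, ?, 5, 2 and mean 2.5
f = missing_frequency([1, 3, 4], [3, 5, 2], missing_x=2, mean=Fraction(5, 2))
print(f)   # 2
```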

Card 9
Grouped data — what to use

Use the mid-interval value of each group as x, then apply x̄ = Σfx / Σf.

The modal class is the group with the largest frequency — you can’t pick a single modal value in grouped data.

Note. Discrete data can sit on a continuous scale — e.g. a wage of €25,000 falls in the €20,000–€30,000 group.
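Here is a short Python sketch of the grouped mean and the modal class. It assumes each group is given as (lower, upper) boundaries. The groups and frequencies are hypothetical.

```python
def grouped_mean_and_mode(groups, freqs):
    """groups: list of (lower, upper) boundaries; freqs: matching frequencies."""
    mids = [(lo + hi) / 2 for lo, hi in groups]                  # mid-interval values act as x
    mean = sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)  # x̄ = Σfx / Σf
    modal_class = groups[freqs.index(max(freqs))]                # first group with the largest f
    return mean, modal_class

mean, modal = grouped_mean_and_mode([(0, 10), (10, 20), (20, 30)], [4, 6, 10])
print(mean, modal)   # 18.0 (20, 30)
```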
Card 10
σ — the method (without a calculator)

1. Find the mean x̄.

2. Set up three columns: x   ·   (x − x̄)   ·   (x − x̄)².

3. Sum the squares, divide by n, square root.

σ = √[ Σ(x − x̄)² / n ]

Frequency table: σ = √[ Σf(x − x̄)² / Σf ].

Tip. Use the calculator (1-VAR stats) unless the question says “without calculator”.
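The column method translates directly into Python. This sketch uses the divide-by-n formula, as on the card. The data are hypothetical.

```python
import math

def sigma_by_columns(xs):
    """Standard deviation following the card's three columns."""
    mean = sum(xs) / len(xs)                  # step 1: x̄
    deviations = [x - mean for x in xs]       # column 2: x − x̄
    squares = [d * d for d in deviations]     # column 3: (x − x̄)²
    return math.sqrt(sum(squares) / len(xs))  # step 3: sum, divide by n, square root

def sigma_from_table(xs, fs):
    """Frequency-table version: σ = √[ Σf(x − x̄)² / Σf ]."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n)

print(sigma_by_columns([2, 4, 4, 4, 5, 5, 7, 9]))          # 2.0
print(sigma_from_table([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # 2.0
```

Python's `statistics.pstdev` uses the same divide-by-n formula, so it can serve as a cross-check.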
Card 11
Mean vs Median vs Mode — which to trust

Mode & Median — not affected by extreme values.

Mean — distorted by extremes but best for further analysis (σ, z-scores, etc.).

If the data is skewed or has outliers, quote the median. If it’s for further stats work, you need the mean.
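Here is a quick Python check with hypothetical data containing one extreme value, using the standard `statistics` module.

```python
import statistics

data = [2, 3, 3, 4, 100]          # hypothetical, one extreme value
mean = statistics.mean(data)      # dragged up by the 100
median = statistics.median(data)  # unaffected
mode = statistics.mode(data)      # unaffected
print(mean, median, mode)         # 22.4 3 3
```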

Presenting data & correlation

4 cards
Card 12
Axis rules — easy marks

Frequency on the VERTICAL axis.

Always LABEL both axes (with units).

If you skip a label or put frequency on the x-axis, you lose marks even if the bars are right.

Card 13
Skew — mean follows the tail

Right (positive) skew — long tail on the right. Mode < Median < Mean. e.g. family size.

Left (negative) skew — long tail on the left. Mean < Median < Mode. e.g. 6th year shoe size.

Symmetric: Mean = Median = Mode.

The mean is pulled towards the tail. Just remember that and the order works itself out.
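To see the order on a right-skewed list, here is a Python check on hypothetical family sizes.

```python
import statistics

family_sizes = [1, 2, 2, 2, 3, 3, 4, 5, 8]   # hypothetical, long tail to the right
mode = statistics.mode(family_sizes)         # 2
median = statistics.median(family_sizes)     # 3
mean = statistics.mean(family_sizes)         # 30 / 9, about 3.33
print(mode < median < mean)                  # True: the mean is pulled towards the tail
```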

Card 14
Line of best fit — equation and use

A straight line through the middle of the data on a scatter plot.

y = a + bx, where a is the y-intercept and b is the slope.

Use it to predict y for a given x.

Slope = the change in y for each 1-unit increase in x.

e.g. b = 4.5 (income in €1,000s) ⇒ each extra year of education adds €4,500 to the predicted income.
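The card doesn't say how the line is found. This Python sketch assumes a least-squares fit, which is what a calculator's linear regression mode computes. The function name and data are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = y_bar - b * x_bar    # intercept: the line passes through (x̄, ȳ)
    return a, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = line_of_best_fit(xs, ys)
print(round(a, 2), round(b, 2))   # 2.2 0.6
print(round(a + b * 6, 2))        # 5.8, the predicted y when x = 6
```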
Card 15
r — what it is and what it isn’t

r measures the strength and direction of the linear relationship.

−1 ≤ r ≤ 1.   |r| > 0.7 strong; 0.3–0.7 moderate; < 0.3 weak.

r is NOT the slope of the line of best fit. The slope is b. They’re different numbers.
r = 0 means no linear relationship — not “no relationship at all”.
Correlation ≠ Causality. Two variables can move together because of a hidden third variable, or by chance. e.g. ice-cream sales ↔ drownings: both rise on hot days (the hidden third variable), but neither causes the other. Smoking ↔ cancer: causal.
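Below is a Python sketch that computes r and the slope b from the same sums on hypothetical data. It shows that r and b really are different numbers. The strength bands follow the card.

```python
import math

def sums(xs, ys):
    """Sxx, Syy and Sxy: sums of squared and cross deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def strength(r):
    """Bands from the card: |r| > 0.7 strong, 0.3 to 0.7 moderate, below 0.3 weak."""
    if abs(r) > 0.7:
        return "strong"
    if abs(r) >= 0.3:
        return "moderate"
    return "weak"

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
sxx, syy, sxy = sums(xs, ys)
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
b = sxy / sxx                    # least-squares slope
print(round(r, 3), strength(r))  # 0.775 strong
print(b)                         # 0.6, a different number from r
```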

z-scores & the z-table

5 cards
Card 16
Empirical Rule — 68 / 95 / 99.7

For normally distributed data:

• Within ±1σ of the mean → 68%

• Within ±2σ → 95%

• Within ±3σ → 99.7%

Quick check. Anything beyond ±2σ is unusual; beyond ±3σ is very rare.
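The 68 / 95 / 99.7 figures are rounded. Here is a quick Python check using the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    inside = Z.cdf(k) - Z.cdf(-k)      # P(−k < z < k)
    print(k, round(inside * 100, 1))   # prints 1 68.3, then 2 95.4, then 3 99.7
```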
Card 17
P(z < a) — direct lookup

Look up a in the body of the z-table:

Row = units & tenths (e.g. 1.2).

Column = hundredths (e.g. 0.03).

The number you read off is P(z < a).

Always sketch the curve and shade what you want before looking up — it stops sign mistakes.
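Python's `statistics.NormalDist` can stand in for the table. The hypothetical helper below imitates the row + column lookup and rounds to 4 places like the printed table.

```python
from statistics import NormalDist

def table_lookup(row, column):
    """row = units & tenths, column = hundredths; returns P(z < a) to 4 places."""
    a = round(row + column, 2)   # e.g. 1.2 + 0.03 gives a = 1.23
    return round(NormalDist().cdf(a), 4)

print(table_lookup(1.2, 0.03))   # 0.8907
```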
Card 18
P(z > a) — total area is 1

P(z > a) = 1 − P(z < a)

The total area under the standard normal is 1. Take the left tail off 1.
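In Python the same subtraction looks like this. This is a sketch, and `p_greater` is a hypothetical name.

```python
from statistics import NormalDist

def p_greater(a):
    """P(z > a) = 1 − P(z < a), since the total area under the curve is 1."""
    return 1 - NormalDist().cdf(a)

print(round(p_greater(1.23), 4))   # 0.1093
```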

Card 19
Negatives — use symmetry

P(z < −a) = 1 − P(z < a) — the left tail of −a equals the right tail of +a.

P(z > −a) = P(z < a) — everything to the right of a negative equals everything to the left of the positive.

For an interval: P(−a < z < b) = P(z < b) − P(z < −a). Convert the negative one using symmetry first.
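Here is a Python sketch of both symmetry rules and the interval rule, using `statistics.NormalDist` for P(z < a). Exact values can differ from table arithmetic in the 4th decimal place.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # phi(a) = P(z < a)

# P(z < −a) = 1 − P(z < a): left tail of −a equals right tail of +a
print(round(phi(-1), 4), round(1 - phi(1), 4))   # 0.1587 0.1587

# P(z > −a) = P(z < a)
print(round(1 - phi(-1), 4), round(phi(1), 4))   # 0.8413 0.8413

# P(−1 < z < 2) = P(z < 2) − P(z < −1) = P(z < 2) − (1 − P(z < 1))
between = phi(2) - (1 - phi(1))
print(round(between, 4))   # 0.8186 (table values give 0.9772 − 0.1587 = 0.8185)
```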

Card 20
Inverse z — given a probability, find k

If the probability is > 0.5 (e.g. P(z < k) = 0.8765):

Find 0.8765 in the body of the table, then read k off the row & column. k is positive.

If the probability is < 0.5 (e.g. P(z < k) = 0.1234):

k is negative. Look up 1 − 0.1234 = 0.8766, read the z-value, then negate it.
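Below is a Python sketch of both cases, assuming k is read to 2 decimal places. `find_k` is a hypothetical name, and `NormalDist.inv_cdf` plays the role of reading the table backwards.

```python
from statistics import NormalDist

def find_k(prob):
    """Solve P(z < k) = prob, reading k to 2 decimal places."""
    if prob >= 0.5:
        return round(NormalDist().inv_cdf(prob), 2)   # k is positive
    # prob < 0.5: k is negative, so look up 1 − prob and negate
    return -round(NormalDist().inv_cdf(1 - prob), 2)

print(find_k(0.8765))   # 1.16
print(find_k(0.1234))   # -1.16
```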

Confidence intervals & hypothesis testing

5 cards
Card 21
CLT — use σ/√n, NOT σ

For a large sample (n ≥ 30) the sample mean x̄ is approximately normal, with:

• Mean of the sample means = μ

• Standard deviation of the sample means = σ/√n

Watch. Use σ/√n, NOT just σ. The sample-mean spread is smaller than the raw spread.
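A small simulation shows the σ/√n effect. It assumes a deliberately skewed exponential population (rate 1, so μ = 1 and σ = 1). The sample size, number of repeats and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable
n = 36           # hypothetical sample size (n ≥ 30)

# 2000 samples of size n; keep each sample's mean
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(2000)]
print(round(statistics.mean(means), 2))    # close to μ = 1
print(round(statistics.pstdev(means), 3))  # close to σ/√n = 1/6 ≈ 0.167, not σ = 1
```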
Card 22
95% Confidence Interval — formula

Population proportion p, sample proportion p̂:

p̂ − 1.96·SE  ≤ p ≤  p̂ + 1.96·SE

Or simply p̂ ± 1.96·SE.

Where SE = √[p̂(1−p̂)/n]. As a rough version, the whole margin of error 1.96·SE is about 1/√n (its largest value, when p̂ = 0.5).
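Here is a Python sketch of the interval with hypothetical survey numbers (240 yes out of 400).

```python
import math

def ci_95(successes, n):
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE = √[p̂(1−p̂)/n]
    margin = 1.96 * se                        # margin of error
    return p_hat - margin, p_hat + margin

low, high = ci_95(240, 400)             # p̂ = 0.6
print(round(low, 3), round(high, 3))    # 0.552 0.648
print(1 / math.sqrt(400))               # 0.05, the rough margin of error
```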

Card 23
Hypothesis test — 5-step CI method

1. State H0 (null) and HA (alternative).

2. Find the standard error.

3. Find the margin of error = 1.96 · SE.

4. Build the confidence interval.

5. Is the claimed figure inside the CI?

YES → fail to reject H0.

NO → reject H0.
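Below are the five steps for a claimed proportion as a Python sketch. The survey numbers and claims are hypothetical. The SE uses p̂, as in the 95% confidence interval formula.

```python
import math

def ci_test(successes, n, claimed):
    """Five-step CI method at the 95% level for a claimed proportion (step 1, H0, is the claim)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # step 2: standard error
    margin = 1.96 * se                         # step 3: margin of error
    low, high = p_hat - margin, p_hat + margin # step 4: the CI
    if low <= claimed <= high:                 # step 5: is the claim inside?
        return "fail to reject H0"
    return "reject H0"

# Hypothetical: 240 of 400 sampled said yes
print(ci_test(240, 400, 0.5))   # reject H0 (0.5 is outside roughly [0.552, 0.648])
print(ci_test(240, 400, 0.6))   # fail to reject H0
```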

Card 24
Hypothesis test — z-score method

1. State H0 and HA.

2. Test statistic: z = (x̄ − μ) / (σ/√n)   (in log tables).

3. Decision rule:

• −1.96 ≤ z ≤ 1.96 → fail to reject H0.

• Outside → reject H0.

4. State conclusion in plain English.
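Here is a Python sketch of the test statistic and the 5% decision rule. The numbers are hypothetical, chosen so that z comes out at exactly 2.

```python
import math

def z_test(sample_mean, mu, sigma, n):
    """Two-sided z-test at the 5% level, following the card's decision rule."""
    z = (sample_mean - mu) / (sigma / math.sqrt(n))   # test statistic
    decision = "fail to reject H0" if -1.96 <= z <= 1.96 else "reject H0"
    return z, decision

# Hypothetical: H0: μ = 50, σ = 10, sample of 64 with x̄ = 52.5
print(z_test(52.5, 50, 10, 64))   # (2.0, 'reject H0')
```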

Card 25
p-value method — and the wording warning

The p-value is the probability of a result as extreme as observed, IF H0 were true.

p < 0.05 → reject H0.

p ≥ 0.05 → fail to reject.

Two-sided test: multiply the tail probability by 2.

e.g. z = 2 ⇒ P(z > 2) = 0.0228 ⇒ p = 2 × 0.0228 = 0.0456 < 0.05 ⇒ reject H0.
NEVER use the word “accept” — only “fail to reject”. Failing to reject doesn’t prove H0 is true.
Low p-value = strong evidence against H0.
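Here is the worked example checked in Python with `statistics.NormalDist`. The exact tail gives 0.0455, while the 4-place table gives 0.0456; the decision is the same.

```python
from statistics import NormalDist

def two_sided_p(z):
    """p-value for a two-sided test: double the tail beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p(2)
print(round(p, 4))                                        # 0.0455
print("reject H0" if p < 0.05 else "fail to reject H0")   # reject H0
```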