MATHSLIVE .ie
STATISTICS · HL

Presenting Data

Histograms, scatter, stem & leaf, skew.

Section 1 of 6

Histogram

A histogram shows grouped (continuous) data as bars. Bar height = frequency.
Bars touch — intervals are continuous, not separate categories.

(i)   Worked example

Ages of 20 people:
Age      0–2   2–4   4–6   6–8
People   5     8     3     4
Bar width = age interval. Bar height = number of people.

(ii)   Median from a histogram

Total = $5 + 8 + 3 + 4 = 20$ people.
Median position $= \dfrac{n+1}{2}$.
$\dfrac{20+1}{2} = \dfrac{21}{2} = 10.5$
So the median sits between the 10th and 11th person.
Count bars left to right: $5$ in $0$–$2$, then the next $8$ are in $2$–$4$.
The 10th and 11th both fall in the $2$–$4$ interval.
The median age lies between $2$ and $4$.
[Figure: histogram of People (vertical axis) against Age (horizontal axis), with bars over 0–2, 2–4, 4–6 and 6–8.]
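The counting step can also be written as a short Python sketch. It is only an illustration: the function name `median_interval` and the `(lower, upper, frequency)` layout are assumptions made for this example, not part of the lesson.

```python
# Grouped data from the worked example: (lower, upper, frequency).
groups = [(0, 2, 5), (2, 4, 8), (4, 6, 3), (6, 8, 4)]

def median_interval(groups):
    """Return the (lower, upper) interval that holds the median position."""
    n = sum(freq for _, _, freq in groups)
    position = (n + 1) / 2           # 10.5 when n = 20
    running_total = 0
    for lower, upper, freq in groups:
        running_total += freq        # cumulative frequency so far
        if running_total >= position:
            return lower, upper
    raise ValueError("groups must contain at least one observation")

print(median_interval(groups))  # (2, 4)
```

If the median position falls exactly on the boundary between two bars, this sketch returns the later bar, so that case would need checking by hand.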
Must learn
1. Bar height = frequency.
2. Bars touch: the data is continuous.
3. Median position $= \dfrac{n+1}{2}$. Count bars to find the interval.
YOU TRY · 1
From the same histogram, in which age interval does the median lie?
Median position = $(n+1)/2$. Then count bars.
$n = 20$, position $= 21/2 = 10.5$
5 in 0–2, next 8 in 2–4 — so 10th & 11th sit in 2–4.
Median is in the $2$–$4$ interval.
Section 2 of 6

Scatter plot

A scatter plot shows pairs $(x, y)$ as dots. We're looking for a relationship between the two variables.

(i)   Example — education vs income

Years of education ($x$) and income in €1,000s ($y$):
Years   Income (€000s)
11      65
11      28
12      30
13      35
13      43
14      55
15      38
16      45
17      58
19      70

(ii)   The outlier

One point sticks out — far from the trend:
$(11, 65)$ — the "big farmer".
11 years of education but earning €65,000 — doesn't fit the trend. We flag it and explain.

(iii)   Is there a relationship?

Yes. There is a correlation. Positive.
As years of education increase, in general so does income.
[Figure: scatter plot of Income (vertical axis) against Years (horizontal axis), showing the ten points, the line of best fit and the outlier.]
Correlation
1. Correlation coefficient is $r$, with $-1 \le r \le 1$.
2. Closer to $1$ or $-1$, the stronger the relationship.
3. Correlation is the LINEAR relationship between $2$ variables.
4. Slope of line of best fit = rate of change of one variable as the other changes.
$r = 0.45 \quad \Rightarrow \quad$ weak positive relationship.
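To see how a value of $r$ is obtained, here is a minimal Python sketch that computes the correlation coefficient for the education and income table, with and without the outlier $(11, 65)$. The helper name `pearson_r` is an assumption made for this example. The values it prints come from the ten points in the table, so they need not match the illustrative $r = 0.45$ above.

```python
from math import sqrt

# The ten (years, income in €1,000s) pairs from the table; the outlier is first.
years = [11, 11, 12, 13, 13, 14, 15, 16, 17, 19]
income = [65, 28, 30, 35, 43, 55, 38, 45, 58, 70]

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_all = pearson_r(years, income)
r_without_outlier = pearson_r(years[1:], income[1:])  # drop (11, 65)

print(f"all points:      r = {r_all:.2f}")
print(f"without outlier: r = {r_without_outlier:.2f}")
```

Comparing the two printed values shows how much a single outlier can weaken a correlation.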

(iv)   Using the line of best fit

Equation of the line:
$y = a + bx$
$y = 9.1 + 2.5x$
(1) What is income after 18 years of education?
$x = 18$
$y = 9.1 + 2.5(18) = 54.1$
Income $\approx$ €$54{,}100$
(2) How many years to earn €50,000?
$y = 50$
$50 = 9.1 + 2.5x$
$2.5x = 40.9$
$x = 16.36 \approx 16.4$ years
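Both questions use the same equation, once forwards and once rearranged for $x$. A small Python sketch of that idea follows. The function names `predict_income` and `years_for_income` are made up for this example.

```python
A = 9.1   # intercept a, in €1,000s (from the lesson's line y = 9.1 + 2.5x)
B = 2.5   # slope b, in €1,000s per extra year of education

def predict_income(years):
    """Income in €1,000s predicted by y = a + bx."""
    return A + B * years

def years_for_income(income):
    """Rearranged line: x = (y - a) / b."""
    return (income - A) / B

print(f"{predict_income(18):.1f}")    # 54.1
print(f"{years_for_income(50):.2f}")  # 16.36
```

Predictions are safest inside the range of the data (11 to 19 years here).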

(v)   Meaning of the slope

$b = m = 2.5$
For every extra year in education, income increases by €2,500.

(vi)   Line of best fit without a calculator

Pick any 2 points on the line. Carl uses $(12, 30)$ and $(17, 58)$.
$m = \dfrac{y_2 - y_1}{x_2 - x_1} = \dfrac{58 - 30}{17 - 12} = \dfrac{28}{5}$
$y - y_1 = m(x - x_1)$
$y - 30 = \dfrac{28}{5}(x - 12)$
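The same two-point method can be checked with exact fractions in Python. This is only a sketch: the function name `line_through` is an assumption for this example, and it rewrites the line as $y = mx + c$ so the intercept can be read off.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Slope m and intercept c of the line y = mx + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    m = Fraction(y2 - y1, x2 - x1)   # (y2 - y1) / (x2 - x1)
    c = y1 - m * x1                  # from y - y1 = m(x - x1), with x = 0
    return m, c

m, c = line_through((12, 30), (17, 58))
print(m, c)                # 28/5 -186/5
print(float(m), float(c))  # 5.6 -37.2
```

So Carl's line can also be written as $y = 5.6x - 37.2$.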
YOU TRY · 2
Using $y = 9.1 + 2.5x$, predict the income for someone with $20$ years of education.
Sub $x = 20$ into the equation.
$y = 9.1 + 2.5(20) = 9.1 + 50 = 59.1$
$\approx$ €$59{,}100$
Section 3 of 6

Causality

Correlation is not the same as causation.
Causality
Q. Has one variable caused the other to happen?

(i)   Smoking causes cancer.

Yes — causation.

(ii)   Hot weather causes more ice cream sales.

No — that's only correlation.
Both rise together in summer, but the temperature doesn't cause the sale. People choose to buy.
YOU TRY · 3
A study finds that students who eat breakfast score higher in exams. Does breakfast cause the higher score?
Correlation ≠ causation. Could something else explain it?
It's a correlation. Other factors — sleep, home support, organisation — could be the real cause.
No — correlation, not causation.
Section 4 of 6

Stem and leaf

A stem and leaf plot shows the raw data, sorted, with the leaf = one digit only.
The KEY is key — without it, nobody knows what the numbers mean.

(i)   Worked example

Draw a stem and leaf of:   $143, \; 137, \; 129, \; 133, \; 144$
12 | 9
13 | 3 7
14 | 3 4
Key:   $12 \mid 9 = 129$
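Sorting the data and splitting each value into a stem and a one-digit leaf can be sketched in Python as below. The function name `stem_and_leaf` is an assumption for this example, and the sketch only handles non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted one-digit leaves."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # 143 -> stem 14, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

data = [143, 137, 129, 133, 144]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 12 | 9 = 129")
```

A stem with no leaves is simply left out by this sketch. On paper you would still write the empty stem.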

(ii)   Back-to-back stem and leaf

Compare two sets on the same stem. Read the left side outwards.
John:   $33, \; 45, \; 37, \; 42, \; 29$
Ann:   $29, \; 21, \; 33, \; 48, \; 49$
 John | Stem | Ann
    9 |  2   | 1 9
  7 3 |  3   | 3
  5 2 |  4   | 8 9
Keys:   $9 \mid 2 = 29$   (John)  ·  $4 \mid 9 = 49$   (Ann)

(iii)   Compare the two

Compare using mean and standard deviation.
Smaller $\sigma$ = more consistent.
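Python's standard `statistics` module can produce both figures for the two data sets. This sketch assumes $\sigma$ means the population standard deviation, which is what `pstdev` computes.

```python
from statistics import mean, pstdev

john = [33, 45, 37, 42, 29]
ann = [29, 21, 33, 48, 49]

# Print the mean and population standard deviation for each person.
for name, scores in (("John", john), ("Ann", ann)):
    print(f"{name}: mean = {mean(scores):.1f}, sigma = {pstdev(scores):.2f}")
```

The set with the smaller printed sigma is the more consistent one.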
Must learn
1. Leaf = one digit only.
2. The key is key. Always include it.
3. Back-to-back: left side reads outwards.
4. Compare with mean and $\sigma$.
YOU TRY · 4
Looking at the John vs Ann back-to-back plot, which person has the more consistent scores?
Consistent = smaller spread = smaller $\sigma$.
John: $29, 33, 37, 42, 45$   range $= 16$
Ann: $21, 29, 33, 48, 49$   range $= 28$
John — smaller spread, smaller $\sigma$.
Section 5 of 6

Line plot (dot plot)

A line plot (a.k.a. dot plot) shows frequencies as stacked dots above each value.

(i)   Worked example

Goals scored:
Goals   0   1   2   3
Cases   5   3   2   1
Show on a line plot — one × per case, stacked above each value.
×
×
×   ×
×   ×   ×
×   ×   ×   ×
0   1   2   3   (Goals)
The shape on the dot plot tells you about the skew — see next section.
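A text version of a line plot can be built from the frequency table with a few lines of Python. The function name `dot_plot` is an assumption for this example, and the layout assumes each value is a single character wide.

```python
# Frequency table from the worked example: goals scored -> number of cases.
goals = {0: 5, 1: 3, 2: 2, 3: 1}

def dot_plot(freqs, mark="×"):
    """Return a text dot plot: one mark per case, stacked above each value."""
    tallest = max(freqs.values())
    rows = []
    for level in range(tallest, 0, -1):      # build rows from the top down
        cells = [mark if count >= level else " " for count in freqs.values()]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(value) for value in freqs))  # axis labels
    return "\n".join(rows)

print(dot_plot(goals))
```

The stack is tallest on the left and tails off to the right, which is the shape discussed in the skew section.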
YOU TRY · 5
From the line plot, what is the total number of cases?
Add up all the cases.
$5 + 3 + 2 + 1 = 11$
$11$ cases
Section 6 of 6

Skew

Skew = the cow's tail. The skew points in the direction of the long tail.

(i)   Right skew (positive)

Mode < Median < Mean
Example: family size. Most families small, a few very large pull the mean up.

(ii)   Left skew (negative)

Mean < Median < Mode
Example: 6th year shoe size. Most at the upper end, a few smaller pull the mean down.

(iii)   Symmetrical

Mode = Mean = Median
Skew — remember
1. Skew = the cow's tail.
2. Right (positive): Mode < Median < Mean. (family size)
3. Left (negative): Mean < Median < Mode. (6th yr shoe size)
4. Symmetrical: all three equal.
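The three rules can be turned into a small Python check. The function name `classify_skew` and the sample values are invented for illustration. Real data does not always follow these orderings, so the sketch reports when the rule of thumb does not settle it.

```python
def classify_skew(mean, median, mode):
    """Classify skew using the ordering rules from the list above."""
    if mode < median < mean:
        return "right (positive)"
    if mean < median < mode:
        return "left (negative)"
    if mean == median == mode:
        return "symmetrical"
    return "unclear from this rule"

# Hypothetical values, chosen only to exercise each rule.
print(classify_skew(mean=10, median=7, mode=6))   # right (positive)
print(classify_skew(mean=4, median=6, mode=7))    # left (negative)
print(classify_skew(mean=5, median=5, mode=5))    # symmetrical
```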
YOU TRY · 6
A distribution has Mean $= 8$, Median $= 6$, Mode $= 5$. What's the skew?
Compare the three. Where does the tail point?
Mode $= 5$ < Median $= 6$ < Mean $= 8$. Tail points right.
Right skew (positive).
SUM

The lot in one box

Presenting data toolkit
1. Histogram: continuous data, bars touch. Median position $= (n+1)/2$.
2. Scatter: outlier? relationship? line of best fit $y = a + bx$. Slope $b$ = rate of change.
3. Correlation $r$: $-1 \le r \le 1$. Closer to $\pm 1$ = stronger. LINEAR only.
4. Causality: has one variable caused the other? Correlation ≠ causation.
5. Stem & leaf: leaf $=$ one digit, key is key. Back-to-back reads outwards on left.
6. Line plot: stacked $\times$ above each value.
7. Skew: the cow's tail. Right: Mode < Median < Mean. Left: Mean < Median < Mode.

End of lesson

Presenting Data — HL · Mathslive.ie
