Lesson Notes By Weeks and Term v5 - Grade 11

Data handling: summarising and interpreting data – Week 1 focus

Subject: Mathematical Literacy

Class: Grade 11

Term: Term 4

Week: 1

Theme: General lesson support

Performance objectives

Calculate and interpret mean, median, and mode.
Calculate and interpret range, quartiles, and IQR.
Create and interpret box and whisker plots.
Calculate and interpret variance and standard deviation.
Compare data sets using these measures.

Lesson summary

Data handling is a crucial skill in today's world, especially in South Africa, where we are constantly bombarded with information from various sources – news reports on crime statistics, economic indicators affecting household budgets, health data relating to disease outbreaks, and survey results influencing government policies. Understanding how to summarise and interpret this data empowers you to make informed decisions about your life, your community, and your country. It allows you to critically evaluate information, identify trends, and challenge misleading claims.

Lesson notes

2.1 Measures of Central Tendency: Measures of central tendency aim to describe the "center" of a data set.

Mean (Average): The sum of all the values divided by the number of values.

Formula: Mean (x̄) = Σx / n, where Σx is the sum of all values and n is the number of values.

Example: Consider the monthly salaries of 5 workers at a construction site: R4500, R5000, R5500, R5000, R6000. Mean = (4500 + 5000 + 5500 + 5000 + 6000) / 5 = R26000 / 5 = R5200.

Advantage: Uses all data values.

Disadvantage: Highly affected by outliers (extreme values). For instance, if one worker earned R20000, the mean would be significantly inflated, misrepresenting the "typical" salary.
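The effect of an outlier on the mean can be checked with a few lines of Python. This is only a minimal sketch using the standard `statistics` module. It assumes, purely for illustration, that the worker on R6000 is the one who earns R20000 instead.

```python
from statistics import mean

# Monthly salaries (in rand) from the example above
salaries = [4500, 5000, 5500, 5000, 6000]
mean_salary = mean(salaries)

# Assumption for illustration: the R6000 worker earns R20000 instead
salaries_with_outlier = [4500, 5000, 5500, 5000, 20000]
mean_with_outlier = mean(salaries_with_outlier)

print(f"Mean without the outlier: R{mean_salary}")
print(f"Mean with the outlier:    R{mean_with_outlier}")
```

Under this assumption the mean rises from R5200 to R8000, even though four of the five workers still earn R5500 or less.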

Median: The middle value when the data is arranged in ascending order. If there's an even number of values, the median is the average of the two middle values.

Example: Using the same salaries in ascending order: R4500, R5000, R5000, R5500, R6000. The median is R5000 (the middle value).

Advantage: Resistant to outliers. In the previous scenario with the R20000 salary, the median would stay close to R5000, which reflects the typical salary far better than the inflated mean.

Disadvantage: Doesn't use all data values, potentially ignoring valuable information.
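The median can be checked the same way. The sketch below uses `statistics.median`, which sorts the values itself. The R20000 salary replacing the R6000 one, and the sixth worker in the last list, are assumptions made only to illustrate the outlier and the even-count rule.

```python
from statistics import median

# Monthly salaries (in rand); median() arranges them in order internally
salaries = [4500, 5000, 5500, 5000, 6000]
median_salary = median(salaries)

# Assumption for illustration: the R6000 worker earns R20000 instead
median_with_outlier = median([4500, 5000, 5500, 5000, 20000])

# Hypothetical sixth worker, giving an even number of values:
# the median is the average of the two middle values (5000 and 5500)
median_even = median([4500, 5000, 5000, 5500, 6000, 20000])

print(median_salary, median_with_outlier, median_even)
```

In this sketch the median stays at R5000 when the outlier replaces the top salary, and is R5250 for the six-value list.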

Mode: The value that appears most frequently in the data set.

Example: Using the same salaries: R4500, R5000, R5000, R5500, R6000. The mode is R5000 (appears twice).
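A short Python sketch of the mode, using `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest frequency. The brand names and the list without repeats are made up for illustration.

```python
from statistics import multimode

# Monthly salaries (in rand): R5000 appears twice
salaries = [4500, 5000, 5500, 5000, 6000]
modes = multimode(salaries)

# Hypothetical list with no repeated value: every value ties,
# so there is no useful mode
no_repeats = multimode([4500, 5000, 5500, 6000])

# Hypothetical categorical data: the mode also works on labels
brands = ["Brand A", "Brand B", "Brand A", "Brand C"]
popular = multimode(brands)

print(modes, no_repeats, popular)
```

When `multimode` returns every value in the list, as it does for `no_repeats`, that is a sign the data set has no meaningful mode.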

Advantage: Easy to identify and useful for categorical data (e.g., the most popular cellphone brand).

Disadvantage: May not exist (if all values are different) or may not be representative of the data's center.

2.2 Measures of Spread (Dispersion): Measures of spread describe how the data is distributed around the center.

Range: The difference between the highest and lowest values in the data set.

Formula: Range = Maximum Value - Minimum Value

Example: Salaries: R4500, R5000, R5500, R5000, R6000. Range = R6000 - R4500 = R1500.

Advantage: Simple to calculate.

Disadvantage: Highly sensitive to outliers and only considers the extreme values.
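The range needs only the built-in `max` and `min` functions. The sketch below also shows how a single extreme value changes it, assuming for illustration that the R6000 worker earns R20000 instead.

```python
# Monthly salaries (in rand) from the example above
salaries = [4500, 5000, 5500, 5000, 6000]
salary_range = max(salaries) - min(salaries)

# Assumption for illustration: the R6000 worker earns R20000 instead
salaries_with_outlier = [4500, 5000, 5500, 5000, 20000]
range_with_outlier = max(salaries_with_outlier) - min(salaries_with_outlier)

print(f"Range: R{salary_range}")
print(f"Range with the outlier: R{range_with_outlier}")
```

One changed value moves the range from R1500 to R15500, which is why the range on its own can give a misleading picture of spread.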

Quartiles: Divide the data into four equal parts after arranging it in ascending order.

Q1 (First Quartile): The median of the lower half of the data. About 25% of the data lies below Q1.

Q2 (Second Quartile): The median of the entire data set. About 50% of the data lies below Q2.

Q3 (Third Quartile): The median of the upper half of the data. About 75% of the data lies below Q3.

Example: Consider the following ages of people at a clinic: 12, 15, 18, 20, 22, 25, 28, 30, 32, 35.

Q2 (Median) = (22 + 25) / 2 = 23.5

Lower Half: 12, 15, 18, 20, 22. Q1 = 18

Upper Half: 25, 28, 30, 32, 35. Q3 = 30

A calculator or software can also find these values, although some tools use a slightly different quartile method and may give slightly different answers.
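The "median of each half" method described above can be sketched in Python as follows. The function name `quartiles` is a made-up helper, not a library function, and it assumes the middle value is left out of both halves when the number of values is odd. Built-in tools such as `statistics.quantiles` use interpolation instead and can return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # lower half of the ordered data
    upper = data[len(data) - half:]   # upper half (middle value skipped if n is odd)
    return median(lower), median(data), median(upper)

# Ages of people at the clinic, from the example above
ages = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35]
q1, q2, q3 = quartiles(ages)

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```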

Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1).

Formula: IQR = Q3 - Q1

Example: Using the quartiles from the previous example: IQR = 30 - 18 = 12.

Advantage: Less sensitive to outliers than the range. Represents the spread of the middle 50% of the data.

Disadvantage: Doesn't consider the extreme values.

Box and Whisker Plot: A visual representation of data using the minimum value, Q1, median (Q2), Q3, and maximum value. It helps to visualise the spread and skewness of the data, and to identify potential outliers. Data points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers.

Example: Based on the ages at the clinic example.

Minimum: 12, Q1: 18, Median: 23.5, Q3: 30, Maximum: 35.

We would draw a number line and mark these values, creating the "box" between Q1 and Q3, with a line at the median. Then, "whiskers" extend to the minimum and maximum values that are NOT outliers. Outliers are marked with dots.
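Before drawing the plot, it can help to compute the five values and the outlier limits. The sketch below does this in Python with the median-of-halves quartile method. The helper names `five_number_summary` and `outlier_fences` are made up for this example, and the 80-year-old patient is a hypothetical extra value added only to show an outlier being flagged.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) via the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]   # middle value skipped if n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def outlier_fences(q1, q3):
    """Return the limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Ages at the clinic, from the example above
ages = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35]
summary = five_number_summary(ages)
low, high = outlier_fences(summary[1], summary[3])
outliers = [a for a in ages if a < low or a > high]

# Hypothetical: one 80-year-old patient is added to the data
ages_plus = ages + [80]
summary_plus = five_number_summary(ages_plus)
low_plus, high_plus = outlier_fences(summary_plus[1], summary_plus[3])
outliers_plus = [a for a in ages_plus if a < low_plus or a > high_plus]

print(summary, (low, high), outliers)
print(summary_plus, (low_plus, high_plus), outliers_plus)
```

For the clinic data the limits are 0 and 48, so no age is an outlier and the whiskers run from 12 to 35. With the hypothetical extra patient, 80 lies above the upper limit of 53 and would be drawn as a dot.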

Variance and Standard Deviation: These measures quantify how far the data points lie from the mean. The variance is the average squared deviation of the data points from the mean. The standard deviation is the square root of the variance, so it is in the same units as the data.

Variance Formula (Population): σ² = Σ(xᵢ - μ)² / N, where xᵢ is each value, μ is the population mean, and N is the population size.

Variance Formula (Sample): s² = Σ(xᵢ - x̄)² / (n - 1), where xᵢ is each value, x̄ is the sample mean, and n is the sample size.

Standard Deviation Formula (Population): σ = √σ²

Standard Deviation Formula (Sample): s = √s²

Example: Salaries: R4500, R5000, R5500, R5000, R6000. Mean = R5200.

Calculate deviations from the mean: -700, -200, 300, -200, 800.

Square the deviations: 490000, 40000, 90000, 40000, 640000.

Sum the squared deviations: 490000 + 40000 + 90000 + 40000 + 640000 = 1300000.

Assuming this is a sample, divide by (n-1) = 4: 1300000 / 4 = 325000 (variance).

Take the square root: √325000 ≈ R570.09 (standard deviation).
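The same steps can be followed in Python and cross-checked against the standard `statistics` module. This is a sketch of the sample calculation above. The population figure at the end is included only for comparison and treats the five workers as the whole population, which is an assumption.

```python
from math import sqrt
from statistics import variance, stdev, pstdev

# Monthly salaries (in rand) from the example above
salaries = [4500, 5000, 5500, 5000, 6000]
n = len(salaries)

mean_salary = sum(salaries) / n                      # 5200
deviations = [x - mean_salary for x in salaries]     # -700, -200, 300, -200, 800
squared = [d ** 2 for d in deviations]               # squared deviations
sum_squared = sum(squared)                           # 1300000

sample_variance = sum_squared / (n - 1)              # divide by n - 1 for a sample
sample_sd = sqrt(sample_variance)

# Cross-check with the library functions for sample data
library_variance = variance(salaries)
library_sd = stdev(salaries)

# Assumption: if the five workers were the whole population, divide by N instead
population_sd = pstdev(salaries)

print(f"Sample variance: {sample_variance}")
print(f"Sample standard deviation: R{sample_sd:.2f}")
print(f"Population standard deviation: R{population_sd:.2f}")
```

Dividing by n - 1 rather than N always gives the larger result, so it matters which formula a calculator's standard deviation key uses.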

Advantage: Uses all data points, providing a comprehensive measure of spread.

Disadvantage: More complex to calculate than the range or IQR. Sensitive to outliers, although less directly than the range, which depends only on the two extreme values.

Important Note: Choose the measure that suits your data. If your data contains outliers, the median and IQR are more robust measures than the mean and standard deviation.