Data handling: summarising and interpreting data – Week 5 focus
Subject: Mathematical Literacy
Class: Grade 11
Term: Term 4
Week: 5
Theme: General lesson support
This page supports the lesson note with a companion video and a short classroom-ready summary.
This week we look more closely at data handling, focusing on how to summarise and interpret data. Data is everywhere: census data influences how government allocates resources, and businesses rely on market research. Making sense of information presented in tables, charts, and graphs is therefore a crucial skill. It is not just for academics; it helps you make informed decisions in everyday life, whether you are budgeting your money, understanding election results, or interpreting health information.
2.1 Measures of Central Tendency
These are single values that describe a set of data by identifying the "central" position within that data.
Mean: The average. Calculate it by summing all values and dividing by the number of values.
Ungrouped Data: Simply add all the data values and divide by the total number of values.
Formula: Mean (x̄) = (Σx) / n, where Σx is the sum of all values and n is the number of values.
Example: The ages of 5 students are 16, 17, 16, 18, and 17. The mean age is (16 + 17 + 16 + 18 + 17) / 5 = 84 / 5 = 16.8 years.
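The ungrouped mean can be checked with a few lines of Python. This is a minimal sketch using the ages from the example; the variable names are illustrative only.

```python
from statistics import mean

# Ages of the 5 students from the example
ages = [16, 17, 16, 18, 17]

# Mean = sum of all values divided by the number of values
manual_mean = sum(ages) / len(ages)

print(manual_mean)  # 16.8
print(mean(ages))   # the standard library gives the same result
```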
Grouped Data: When data is presented in frequency tables, we use the midpoints of the intervals. Multiply each midpoint by its frequency, sum these products, and then divide by the total frequency.
Formula: Mean (x̄) = (Σ(f × m)) / Σf, where f is the frequency of each interval and m is the midpoint of each interval.
Example: A survey of household income (in Rands) is presented in the table below:

| Income (R) | Frequency |
|------------|-----------|
| 0 - 5000 | 10 |
| 5001 - 10000 | 15 |
| 10001 - 15000 | 8 |
| 15001 - 20000 | 2 |

Midpoints: 2500, 7500, 12500, 17500
Σ(f × m) = (10 × 2500) + (15 × 7500) + (8 × 12500) + (2 × 17500) = 25000 + 112500 + 100000 + 35000 = 272500
Σf = 10 + 15 + 8 + 2 = 35
Mean = 272500 / 35 = R7785.71 (approximately)
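The grouped-mean calculation can be reproduced in Python. This sketch takes the midpoints and frequencies directly from the household income example; the list and variable names are assumptions made for illustration.

```python
# Midpoints and frequencies of the four income classes (in Rands)
midpoints = [2500, 7500, 12500, 17500]
frequencies = [10, 15, 8, 2]

# Multiply each midpoint by its frequency and add the products
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))

# Total frequency (number of households surveyed)
total_f = sum(frequencies)

grouped_mean = sum_fm / total_f

print(sum_fm)                  # 272500
print(total_f)                 # 35
print(round(grouped_mean, 2))  # 7785.71
```

Because midpoints stand in for the actual incomes, the result is an estimate of the mean rather than an exact value.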
Median: The middle value when the data is arranged in ascending order.
Ungrouped Data: If there's an odd number of data points, the median is the middle value. If there's an even number, the median is the average of the two middle values.
Example 1 (Odd): Ages: 12, 14, 15, 16, 18. The median is 15.
Example 2 (Even): Ages: 12, 14, 15, 16. The median is (14 + 15) / 2 = 14.5.
Grouped Data: The median class is the class interval that contains the median. Finding the exact median within that class requires interpolation, which is more complex and typically introduced in higher-level mathematics, but understanding the concept of the median class is important. The median class is the first class interval whose cumulative frequency is greater than or equal to half the total frequency.
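Both ideas can be sketched in Python. The first part checks the two ungrouped examples; the second finds the median class for the household income table from the grouped-mean example. The function `median_class` is a hypothetical helper written for this illustration, not a library function.

```python
from statistics import median

# Ungrouped data: the two examples above
print(median([12, 14, 15, 16, 18]))  # 15
print(median([12, 14, 15, 16]))      # 14.5


def median_class(labels, frequencies):
    """Return the first class whose cumulative frequency
    reaches at least half of the total frequency."""
    half = sum(frequencies) / 2
    cumulative = 0
    for label, f in zip(labels, frequencies):
        cumulative += f
        if cumulative >= half:
            return label


labels = ["0 - 5000", "5001 - 10000", "10001 - 15000", "15001 - 20000"]
frequencies = [10, 15, 8, 2]

# Cumulative frequencies are 10, 25, 33, 35 and half the total is 17.5
print(median_class(labels, frequencies))  # 5001 - 10000
```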
Mode: The value that appears most frequently in the data set. A data set can have no mode (if all values appear only once), one mode (unimodal), or more than one mode (bimodal, trimodal, etc.).
Ungrouped Data: Simply identify the most frequent value.
Example: Colours of cars in a parking lot: Red, Blue, Red, Green, Red, Blue, Black. The mode is Red.
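Counting by hand gets tedious for longer lists, so here is a small Python sketch that finds the mode of the car colours. It uses `statistics.multimode` (available from Python 3.8), which returns a list so that data sets with more than one mode are handled too.

```python
from collections import Counter
from statistics import multimode

colours = ["Red", "Blue", "Red", "Green", "Red", "Blue", "Black"]

# multimode returns every value that shares the highest count
print(multimode(colours))  # ['Red']

# Counter shows how often each colour appears
counts = Counter(colours)
print(counts["Red"], counts["Blue"])  # 3 2
```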
Grouped Data: The modal class is the class interval with the highest frequency.
Why are these measures important and when should we use them?
The mean is sensitive to outliers (extreme values). If the data contains outliers, the mean may not be the best representation of the central tendency.
The median is resistant to outliers. It's a better measure of central tendency when the data is skewed (has a long tail on one side).
The mode is useful for categorical data (e.g., colours, types of transport).
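The effect of an outlier on the mean and the median can be checked numerically. The values below are made up purely for illustration; they do not come from the lesson.

```python
from statistics import mean, median

# Hypothetical data set without extreme values
values = [8, 9, 10, 11, 12]

# The same data with one extreme value added (hypothetical)
with_outlier = values + [100]

print(mean(values), median(values))              # 10 10
print(mean(with_outlier), median(with_outlier))  # 25 10.5
```

One extreme value moves the mean from 10 to 25, while the median only shifts from 10 to 10.5.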
Example in context: Imagine analysing the salaries of employees at a small business in Cape Town. If the owner earns significantly more than the other employees, the mean salary will be inflated. In this case, the median salary would be a better indicator of the typical employee's earnings.
2.2 Measures of Dispersion
These measures describe the spread or variability of data.
Range: The difference between the highest and lowest values. It's simple to calculate but sensitive to outliers.
Formula: Range = Highest Value - Lowest Value
Example: Ages: 10, 12, 15, 18, 20. Range = 20 - 10 = 10 years.
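A short Python sketch of the range calculation follows. The extra value of 65 is a hypothetical outlier added only to show how strongly a single extreme value affects the range.

```python
ages = [10, 12, 15, 18, 20]

# Range = highest value - lowest value
age_range = max(ages) - min(ages)
print(age_range)  # 10

# Adding one extreme (hypothetical) value changes the range drastically
ages_with_outlier = ages + [65]
outlier_range = max(ages_with_outlier) - min(ages_with_outlier)
print(outlier_range)  # 55
```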
Interquartile Range (IQR): The difference between the upper quartile (Q3) and the lower quartile (Q1). The quartiles divide the data into four equal parts. Q1 is the value below which 25% of the data falls, and Q3 is the value below which 75% of the data falls. The IQR represents the range of the middle 50% of the data. To find the quartiles, first order the data and find the median (Q2). Q1 is then the median of the lower half of the data and Q3 is the median of the upper half. The IQR is less sensitive to outliers than the range.
Formula: IQR = Q3 - Q1
Example: Ages: 10, 12, 15, 18, 20, 22, 25, 28, 30.
The median (Q2) is 20, the fifth of the nine values.
Q1 = (12 + 15) / 2 = 13.5 (median of the lower half: 10, 12, 15, 18)
Q3 = (25 + 28) / 2 = 26.5 (median of the upper half: 22, 25, 28, 30)
IQR = 26.5 - 13.5 = 13 years
2.3 Box-and-Whisker Plots (Box Plots)
A box plot is a visual representation of data that shows the minimum value, Q1, median (Q2), Q3, and maximum value. It provides a clear picture of the data's distribution, symmetry, and potential outliers.
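The five values that a box plot displays can be computed with a short Python sketch. It assumes the median-of-halves method, in which the overall median is left out of both halves when the number of values is odd; other quartile methods exist and can give slightly different answers. The function name `five_number_summary` is a hypothetical helper for this illustration.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum using the median-of-halves
    method. Needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value so it is in neither half
    upper = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }


ages = [10, 12, 15, 18, 20, 22, 25, 28, 30]
summary = five_number_summary(ages)
iqr = summary["Q3"] - summary["Q1"]

print(summary)  # min 10, Q1 13.5, median 20, Q3 26.5, max 30
print(iqr)      # 13.0
```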
Construction: Draw a number line that covers the range of the data. Draw a box from Q1 to Q3. Draw a vertical line inside the box to represent the median (Q2). Draw "whiskers" extending from the box to the minimum and maximum values, excluding outliers. Outliers are often represented as individual points beyond the whiskers. A common rule is to define outliers as values that are less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
Interpretation: The length of the box represents the IQR, indicating the spread of the middle 50% of the data. A longer box indicates greater variability.
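The 1.5 × IQR rule for outliers can be sketched in Python as follows. The quartiles are found with the median-of-halves method (an assumption; other methods exist), the helper `find_outliers` is hypothetical, and the value 70 is made up to demonstrate an outlier.

```python
from statistics import median


def find_outliers(values):
    """Return values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
    Quartiles use the median-of-halves method; needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


ages = [10, 12, 15, 18, 20, 22, 25, 28, 30]

# Fences are -6 and 46, so no age is an outlier
print(find_outliers(ages))         # []

# With a hypothetical extra value of 70 the fences become -4.5 and 47.5
print(find_outliers(ages + [70]))  # [70]
```

In the second case the whiskers of the box plot would stop at 10 and 30, and 70 would be drawn as a separate point.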