
Hello Professor, I was working on the 2.3 problem set and I do not understand the concept behind questions 22 and 24. Thank you.

Posted to STATS 1 on Monday, January 27, 2014

Professor McGuckian
01/27/2014
6:33 PM EST

Hi Cassidy,

The concept behind this problem is fairly straightforward. If the sample data contain extreme values, the median is the better measure of the center. Extreme values can pull the mean too far to the right or left, so the mean stops representing what is typical in the data set. Extreme values are values that stand out from the typical values in the data set.

For example, let's consider the following set of hourly pay rates for an office: 10, 20, 21, 22, 27, 500. The mean is (10 + 20 + 21 + 22 + 27 + 500)/6 = 600/6 = 100. Because there is an even number of values, the median is the average of the two middle values: (21 + 22)/2 = 21.5. The mean may be 100 by definition, but does it do a good job of representing what is typical? The people earning between 10 and 27 dollars per hour would find a mean of 100 a ridiculous overstatement of their pay, while the person earning 500 dollars per hour would find it insultingly low.
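If you want to check this arithmetic, Python's standard-library `statistics` module will do it. This is a minimal sketch, not part of the original problem set: it computes the mean and median of the pay rates above, then repeats the calculation with the 500 removed.

```python
from statistics import mean, median

# Hourly pay rates from the example (dollars per hour)
pay = [10, 20, 21, 22, 27, 500]

print(mean(pay))    # 100  (600 / 6)
print(median(pay))  # 21.5 (average of the two middle values, 21 and 22)

# The same data without the extreme value of 500
typical = pay[:-1]
print(mean(typical), median(typical))  # 20 21
```

Without the 500, the mean (20) and the median (21) are close to each other, so either one describes the typical pay well.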

The median of 21.5 is interpreted as the middle pay: half the people earn less than 21.5 dollars per hour and half earn more. This is factually true, of course, but more than that, it does a better job of describing the pay of most of the office employees. For this reason, the median is the better choice here because of the extreme value of 500. If the 500 weren't part of the data set, we would prefer to use the mean. In most cases, salary data, home value data, ... use the median as the measure of the center because it is less affected by extreme values. For most other data, the mean is the better option.

If you are asked this on an exam, simply scan the data set for any values that really stand out as very large (or very small) compared with the rest of the data. If there are such extremes, use the median instead of the mean.
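Scanning by eye is all the exam question needs. If you want a more concrete test for values that "really stand out", one common rule of thumb is Tukey's fences: flag any value more than 1.5 × IQR below the first quartile or above the third quartile. This is a substitute technique, not the method taught in this course. The sketch below assumes Python 3.8+ (for `statistics.quantiles`), and the helper names `extreme_values` and `suggested_center` are hypothetical.

```python
from statistics import mean, median, quantiles

def extreme_values(data, k=1.5):
    """Hypothetical helper: return values below Q1 - k*IQR or above Q3 + k*IQR.

    Quartiles come from statistics.quantiles with its default
    ('exclusive') method; other quartile conventions can flag
    different values, especially in very small data sets.
    """
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def suggested_center(data):
    """Hypothetical helper: use the median if any value is flagged, else the mean."""
    if extreme_values(data):
        return "median", median(data)
    return "mean", mean(data)

pay = [10, 20, 21, 22, 27, 500]
print(extreme_values(pay))         # [500]
print(suggested_center(pay))       # ('median', 21.5)
print(suggested_center(pay[:-1]))  # ('mean', 20)
```

Quartile conventions differ between textbooks and software, and with only five or six values the fences are sensitive to that choice, so treat a flag as a prompt to look at the data rather than as a final answer.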

I hope that helps,

Professor McGuckian