<em>Note: The second image should be titled "Data Set #2"</em>
It seems like you've already answered a part of this question yourself, but let's get into the details.
<h3>Parts A and B: 5 Number Summary</h3>The questions ask you to find five values for each of the data sets: the minimum, the maximum, the median, and the first and third quartiles. Together, these values are frequently called the <em>five number summary</em> of a data set, and we can use them to create a <em>box plot</em> of our data. The meanings of maximum and minimum are pretty obvious - they're just the biggest and smallest values in the set - but the median and the first and third quartiles all refer to different "middles" in the set.
The <em>median</em> is the "middle value" of a set of ordered data. When we have an odd number of data points, the median is simply the middle number of the set, but when we have an even number, as is the case with these two data sets, we have to find the number halfway between the two middle values. In data set 1, that number is halfway between 16 and 25, which is 20.5.
In data set 2, it's the number halfway between 8 and 10, which is 9.
The median splits any set of data into two parts: all the data points <em>smaller</em> than the median, and all of those <em>larger</em> than the median. In data set 1, it's the two subsets {1, 4, 9, 16} and {25, 36, 49, 64}. The median of the smaller set gives us the <em>first quartile</em>, and the median of the larger one gives us the <em>third quartile</em>. For data set 1, that works out to 6.5 (halfway between 4 and 9) and 42.5 (halfway between 36 and 49).
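If you'd like to check these numbers, here is a minimal Python sketch of the "median of each half" method described above, applied to data set 1. The function name <em>five_number_summary</em> is just an illustrative choice, and keep in mind that calculators and software packages sometimes use slightly different quartile rules, so their answers can differ a little from this one.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method.

    Hypothetical helper for illustration; other quartile conventions exist.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    half = len(data) // 2
    lower = data[:half]                # points below the median
    upper = data[len(data) - half:]    # points above the median
    # With an odd count, the middle point itself is left out of both halves.
    return (data[0], median(lower), median(data), median(upper), data[-1])

data_set_1 = [1, 4, 9, 16, 25, 36, 49, 64]
summary = five_number_summary(data_set_1)
print(summary)  # (1, 6.5, 20.5, 42.5, 64)
```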
Why "first" and "third" quartile? Where are the second and fourth ones? While they don't officially go by those titles, those values are already part of our five number summary:
- First quartile
- Second quartile (the median)
- Third quartile
- Fourth quartile (the maximum)
To see how our data is spread out, we can draw our five number summary as a <em>box plot</em>. I've created a box plot for each of the data sets in the first two image uploads. The little nubs on the far ends, sometimes called the "whiskers" of the plot, mark the minimum and maximum of the data set; the "box" represents the <em>interquartile range</em> of the data: all the values between the first and third quartiles; and the notch going down through the box is the median of the data.
We can see at a glance that data set 1 spans a far greater range of values than data set 2, and that its data points tend to be more concentrated in the lower values. Data set 2, by contrast, is much more uniform; its median lies right in the center of its range, and the "box" is centered similarly along it.
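One rough way to put a number on "concentrated in the lower values" is to ask where the median and the center of the box fall along the full range, on a scale from 0 (the minimum) to 1 (the maximum). The sketch below does this for data set 1; <em>relative_position</em> is a made-up helper for illustration, not a standard statistic. A value near 0.5 would mean the median sits in the center of the range, as it does for data set 2.

```python
from statistics import median

def relative_position(value, low, high):
    """Map value onto a 0-to-1 scale across [low, high]; 0.5 is dead center.

    Hypothetical helper for illustration only.
    """
    return (value - low) / (high - low)

data_set_1 = sorted([1, 4, 9, 16, 25, 36, 49, 64])
low, high = data_set_1[0], data_set_1[-1]

# Even number of points, so the two halves split cleanly down the middle.
half = len(data_set_1) // 2
q1 = median(data_set_1[:half])
q3 = median(data_set_1[half:])
med = median(data_set_1)

median_pos = relative_position(med, low, high)
box_center_pos = relative_position((q1 + q3) / 2, low, high)
print(round(median_pos, 2), round(box_center_pos, 2))  # 0.31 0.37
```

Both values land well below 0.5, which matches what the box plot shows: the median and the box are pulled toward the low end of the range.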
Comparing the medians of two data sets, especially those with the same number of values, can give us valuable information as to how much "larger" or "smaller" one set is than the other, but we need to bring in the other numbers in the five number summary for a better picture of how that data is spread out.
<h3>Part D: Histograms vs. Box Plots: Which one is better?</h3>There's no correct answer to this, because each type of graph gives us insight into different aspects of a data set.
A box and whisker plot is great for understanding:
- The range of a set of data
- Its spread
- Its center
While a histogram can reveal:
- <em>How </em>and <em>where</em> values are concentrated
- Gaps and outliers
The histogram of data set 1, set to constant intervals of 7 units, shows us that many of the values cluster at the lower end and get more spread out as we go further - the empty patches become more frequent as we continue to the right, suggesting that our values will become more sparse as they get larger.
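To see where those empty patches come from, here is a small sketch that counts how many values of data set 1 fall into each interval of width 7. I'm assuming the intervals start at 0 (so 0 to 7, 7 to 14, and so on); the histogram in the image may start its intervals somewhere else, which would shift the counts slightly. The name <em>histogram_counts</em> is just for illustration.

```python
def histogram_counts(values, width, start=0):
    """Count values in consecutive bins [start + k*width, start + (k+1)*width).

    Hypothetical helper; assumes integer values that are all >= start.
    """
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return counts

data_set_1 = [1, 4, 9, 16, 25, 36, 49, 64]
width = 7
counts = histogram_counts(data_set_1, width)

# Print a simple text histogram, one row per interval.
for k, count in enumerate(counts):
    lo = k * width
    print(f"[{lo:2d}, {lo + width:2d}) | {'#' * count}")
```

Under that assumption, the first interval holds two values, and the empty intervals only start showing up past 28.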
Contrast that with data set 2, which has a totally flat, uniform distribution when viewed at a constant interval of 4 units. The box plot and histogram work in tandem to give us a visual, quantitative picture of our data which we can use to make informed conclusions about it.