Bar Charts of Quantitative Measures

Bar charts compare categories of data. In our course, they are particularly useful for visualizing Quantitative Measures. Bar charts take a little more setup than histograms and line plots, but they are manageable. Remember, bar charts are different from histograms! A histogram shows how one list of numbers is distributed, while a bar chart shows one value per category.

Let us say we wanted to compare the average earthquake magnitude across three locations ("California", "Peru", "China"). Although you may be tempted to choose many more locations, you should be strategic and keep your number of bars low.

[Figure: bar_chart_example.png]

You will need to:

  • Calculate the average for each of these locations
  • Create the list of Quantitative Measures
  • Create the list of Labels
  • Create the X-coordinates for the bars
  • Call the .bar() and .show() functions

For each of our locations, we need to compute a quantitative measure (the Average). Earlier in the course, we calculated the average over the entire dataset like so:

import earthquakes
list_of_earthquake = earthquakes.get_earthquakes()
magnitude_total = 0
magnitude_count = 0
for earthquake in list_of_earthquake:
    magnitude_total = magnitude_total + earthquake['impact']['magnitude']
    magnitude_count = magnitude_count + 1
magnitude_average = magnitude_total / magnitude_count

Then we learned how we could filter for a specific location:

china_total = 0
china_count = 0
for earthquake in list_of_earthquake:
    # .strip() removes any stray spaces around the stored name
    if earthquake['location']['state'].strip() == "China":
        china_total = china_total + earthquake['impact']['magnitude']
        china_count = china_count + 1
china_average = china_total / china_count

The same FOR loop, with three IF statements inside it, can calculate all three averages in one pass. You might be tempted to print out these numbers and type the literal values into a list. That is WRONG: if the data changes, the typed-in numbers will be out of date. You should use the variables!

average_magnitudes = [china_average, peru_average, california_average]

Now you must also create the list of categories to write on the x-axis. These must be in the same order as the list above. Notice that we use quotes here, and not in the previous list. If you know why, then you have mastered the difference between String Values and Properties.

average_magnitude_labels = ["China", "Peru", "California"]
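One way to check that the two lists line up is to pair them item by item. The sketch below uses made-up numbers in place of the real averages (they are placeholders, not results from the dataset), and it relies on the built-in zip() function, which this lesson does not otherwise use.

```python
# Placeholder values standing in for the computed averages (not real results)
china_average = 4.5
peru_average = 5.0
california_average = 3.5

average_magnitudes = [china_average, peru_average, california_average]
average_magnitude_labels = ["China", "Peru", "California"]

# zip() pairs the first label with the first value, the second label
# with the second value, and so on
pairs = list(zip(average_magnitude_labels, average_magnitudes))
for label, value in pairs:
    print(label, value)
```

If a label prints next to the wrong number, the two lists are out of order.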

We must create one more list, to set the X-coordinates of the bars. Here, we are *nesting* one function call inside another. The len() function gets the length of the list property (3, in this case). The range() function then produces the whole numbers from 0 up to, but not including, its argument (3). So the end result is the sequence 0, 1, 2. It's okay if this line doesn't make sense to you; you can just treat it as magic.

average_magnitude_xcoords = range(len(average_magnitude_labels))
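If you would rather not treat the line as magic, you can build it up one step at a time. This is a small sketch; the name number_of_bars is introduced here only for illustration, and wrapping the result in list() is only there so that print() shows the individual numbers.

```python
average_magnitude_labels = ["China", "Peru", "California"]

# Step 1: len() counts the items in the list
number_of_bars = len(average_magnitude_labels)
print(number_of_bars)  # prints 3

# Step 2: range() counts from 0 up to, but not including, that number
average_magnitude_xcoords = range(number_of_bars)
print(list(average_magnitude_xcoords))  # prints [0, 1, 2]
```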

Now we call the .bar() function with the appropriate parameters.

plt.bar(average_magnitude_xcoords, average_magnitudes, tick_label=average_magnitude_labels, align='center')

Phew! Quite a bit of code. Let's talk about the four parameters:

  1. The first parameter sets the X-coordinates of the bars. The first bar is placed at 0 on the x-axis, the second bar is placed at 1, and so on.
  2. The second parameter sets the heights of the bars. The first bar will be the height of china_average, the second bar will be the height of peru_average, and so on.
  3. The third parameter sets the labels on the x-axis. The first label will be "China", the second label will be "Peru", and so on.
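  4. The fourth parameter is optional. It centers each bar, and therefore its label, on the bar's X-coordinate. Recent versions of matplotlib (2.0 and later) already do this by default; in older versions, the bars were aligned by their left edge and the labels looked strange.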

Never forget to call plt.show() at the end! Here's the complete final code, for your convenience.

import earthquakes
import matplotlib.pyplot as plt

# Get the data, put it in a property
list_of_earthquake = earthquakes.get_earthquakes()

# Calculate the three averages
china_total = 0
china_count = 0
peru_total = 0
peru_count = 0
california_total = 0
california_count = 0
for earthquake in list_of_earthquake:
    # .strip() removes any stray spaces around the stored name
    location = earthquake['location']['state'].strip()
    if location == "China":
        china_total = china_total + earthquake['impact']['magnitude']
        china_count = china_count + 1
    elif location == "Peru":
        peru_total = peru_total + earthquake['impact']['magnitude']
        peru_count = peru_count + 1
    elif location == "California":
        california_total = california_total + earthquake['impact']['magnitude']
        california_count = california_count + 1
china_average = china_total / china_count
peru_average = peru_total / peru_count
california_average = california_total / california_count

# Combine averages into a list
average_magnitudes = [china_average, peru_average, california_average]
# Create our labels
average_magnitude_labels = ["China", "Peru", "California"]
# Create the x-coordinates of the bars
average_magnitude_xcoords = range(len(average_magnitude_labels))

# Create the bar graph
plt.bar(average_magnitude_xcoords, average_magnitudes, tick_label=average_magnitude_labels, align='center')
# Labelling
plt.title("Average Magnitude across Locations")
plt.xlabel("Locations")
plt.ylabel("Magnitude")
# Show the bar graph
plt.show()
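
The three branches in the loop differ only in the location name, so the same work could be done by a small helper. This is an optional sketch, not part of the approach this lesson asks for: the function name average_magnitude_for and the sample records are made up for illustration, and the records only imitate the shape of the course data. The helper also returns 0 when no earthquake matches, which avoids dividing by zero.

```python
def average_magnitude_for(list_of_earthquake, location):
    """Average magnitude of the earthquakes whose 'state' matches location."""
    total = 0
    count = 0
    for earthquake in list_of_earthquake:
        # .strip() ignores stray spaces around the stored name
        if earthquake['location']['state'].strip() == location:
            total = total + earthquake['impact']['magnitude']
            count = count + 1
    if count == 0:
        return 0  # no matching earthquakes, so there is nothing to average
    return total / count

# Hypothetical sample records (made-up values, not from the real dataset)
sample_earthquakes = [
    {'location': {'state': "China"}, 'impact': {'magnitude': 5.0}},
    {'location': {'state': "China"}, 'impact': {'magnitude': 6.0}},
    {'location': {'state': "Peru"}, 'impact': {'magnitude': 4.0}},
]

average_magnitude_labels = ["China", "Peru", "California"]
average_magnitudes = []
for label in average_magnitude_labels:
    average_magnitudes.append(average_magnitude_for(sample_earthquakes, label))
print(average_magnitudes)  # prints [5.5, 4.0, 0]
```

Because the averages are built by looping over the labels, the two lists are in the same order automatically.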