Creating Volcano Maps with Pandas and the Matplotlib Basemap Toolkit

Author: Ramiro Gómez

Introduction

This notebook walks through the process of creating maps of volcanoes with Python. The main steps involve getting, cleaning and finally mapping the data.

All Python 3rd party packages used, except the Matplotlib Basemap Toolkit, are included with the Anaconda distribution and installed when you create an anaconda environment. To add Basemap simply run the command conda install basemap in your activated anaconda environment. To follow the code you should be familiar with Python, Pandas and Matplotlib.

Get into work

First load all Python libraries required in this notebook.

%load_ext signature
%matplotlib inline

import json

from lxml import html
from mpl_toolkits.basemap import Basemap

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

chartinfo = 'Author: Ramiro Gómez - ramiro.org | Data: Volcano World - volcano.oregonstate.edu'

Get the volcano data

The data is downloaded and parsed using lxml. The Volcano World source page lists the volcano data in several HTML tables, which are each read into individual Pandas data frames that are appended to a list of data frames. Since the page also uses tables for layout the first four tables are omitted.

url ='http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html'
xpath = '//table'
tree = html.parse(url)
tables = tree.xpath(xpath)

table_dfs = []
for idx in range(4, len(tables)):
    df = pd.read_html(html.tostring(tables[idx]), header=0)[0]
    table_dfs.append(df)

The next step is to create a single data frame from the ones in the list using Pandas' concat method. To create a new index with consecutive numbers the index_ignore parameter is set to True.

df_volc = pd.concat(table_dfs, ignore_index=True)

Let's take a look at the data contained in the newly created data frame.

print(len(df_volc))
df_volc.head(10)
1560
Name Location Type Latitude Longitude Elevation (m)
0 Abu Honshu-Japan Shield volcanoes 34.50 131.60 641
1 Acamarachi Chile-N Stratovolcano -23.30 -67.62 6046
2 Acatenango Guatemala Stratovolcano 14.50 -90.88 3976
3 Acigöl-Nevsehir Turkey Caldera 38.57 34.52 1689
4 Adams US-Washington Stratovolcano 46.21 -121.49 3742
5 Adams Seamount Pacific-C Submarine volcano -25.37 -129.27 -39
6 Adatara Honshu-Japan Stratovolcanoes 37.64 140.29 1718
7 Adwa Ethiopia Stratovolcano 10.07 40.84 1733
8 Afderà Ethiopia Stratovolcano 13.08 40.85 1295
9 Agrigan Mariana Is-C Pacific Stratovolcano 18.77 145.67 965

The data frame contains 1560 records with information on name, location, type, latitude, longitude and elevation. Let's first examine the different types.

df_volc['Type'].value_counts()
Stratovolcano           601
Shield volcano          121
Stratovolcanoes         109
Submarine volcano        95
Volcanic field           82
Caldera                  78
Cinder cones             62
Complex volcano          49
Shield volcanoes         31
Pyroclastic cones        30
Lava domes               26
Submarine volcano ?      19
Volcanic  field          16
Fissure vents            16
Shield  volcano          16
Submarine  volcano       16
Maars                    12
Compound volcano         10
Cinder  cones            10
Lava dome                 9
Cinder cone               9
Calderas                  9
Pyroclastic cone          8
Tuff cones                7
Scoria cones              7
Maar                      7
Unknown                   6
Complex  volcano          6
Somma volcano             5
Lava  domes               5
                       ...
Subglacial  volcano       2
Lava cone                 2
Explosion craters         2
Fissure vent              2
Fissure  vents            2
Volcanic Landform         1
Tuff  cone                1
Lava  cone                1
Compound  volcano         1
Scoria  cones             1
Lava Cone                 1
Lava cones                1
Lava Field                1
Somma volcanoes           1
Submarine  volcano ?      1
Cinder Cone               1
Flood  Basalt             1
Flood Basalts             1
Cone                      1
Lava  dome                1
Shield volcanoe           1
Pyroclastic  cone         1
Stratovolcano  ?          1
Complex volcanoes         1
Cones                     1
Island Arc                1
Tuff  rings               1
Crater  rows              1
Hydrothermal field        1
Pumice cone               1
dtype: int64

Looking at the output we see that a single type may be represented by diffent tokens, for example Stratvolcano and Stratvolcanoes refer to the same type. Sometimes entries contain question marks, indicating that the classification may not be correct.

Cleaning the data

The next step is to clean the data. I decided to take the classification for granted and simply remove question marks. Also, use one token for each type and change the alternative spellings accordingly. Finally remove superfluous whitespace and start all words with capital letter.

def cleanup_type(s):
    if not isinstance(s, str):
        return s
    s = s.replace('?', '').replace('  ', ' ')
    s = s.replace('volcanoes', 'volcano')
    s = s.replace('volcanoe', 'Volcano')
    s = s.replace('cones', 'cone')
    s = s.replace('Calderas', 'Caldera')
    return s.strip().title()

df_volc['Type'] = df_volc['Type'].map(cleanup_type)
df_volc['Type'].value_counts()
Stratovolcano         713
Shield Volcano        173
Submarine Volcano     137
Volcanic Field         98
Caldera                87
Cinder Cone            84
Complex Volcano        56
Pyroclastic Cone       43
Lava Domes             31
Fissure Vents          18
Tuff Cone              14
Maars                  12
Compound Volcano       11
Lava Dome              10
Scoria Cone             8
Pyroclastic Shield      8
Maar                    7
Unknown                 6
Somma Volcano           6
Subglacial Volcano      6
Crater Rows             5
Lava Cone               5
Fumarole Field          3
Pumice Cone             3
Explosion Craters       2
Volcanic Complex        2
Fissure Vent            2
Flood Basalt            1
Cones                   1
Hydrothermal Field      1
Island Arc              1
Cone                    1
Volcanic Landform       1
Lava Field              1
Tuff Rings              1
Flood Basalts           1
dtype: int64

Now let's get rid of incomplete records.

df_volc.dropna(inplace=True)
len(df_volc)
1513

Creating the maps

Volcanoes will be plotted as red triangles, whose sizes depend on the elevation values, that's why I'll only consider positive elevations, i. e. remove submarine volcanoes from the data frame.

df_volc = df_volc[df_volc['Elevation (m)'] >= 0]
len(df_volc)
1406

Next I define a function that will plot a volcano map for the given parameters. Lists of longitudes, latitudes and elevations, that all need to have the same lengths, are mandatory, the other parameters have defaults set.

As mentioned above, sizes correspond to elevations, i. e. a higher volcano will be plotted as a larger triangle. To achieve this the 1st line in the plot_map function creates an array of bins and the 2nd line maps the individual elevation values to these bins.

Next a Basemap object is created, coastlines and boundaries will be drawn and continents filled in the given color. Then the volcanoes are plotted. The 3rd parameter of the plot method is set to ^r, the circumflex stands for triangle and r for red. For more details, see the documentation for plot.

The Basemap object will be returned so it can be manipulated after the function finishes and before the map is plotted, you'll see why in a later example.

def plot_map(lons, lats, elevations, projection='mill', llcrnrlat=-80, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180, resolution='i', min_marker_size=2):
    bins = np.linspace(0, elevations.max(), 10)
    marker_sizes = np.digitize(elevations, bins) + min_marker_size

    m = Basemap(projection=projection, llcrnrlat=llcrnrlat, urcrnrlat=urcrnrlat, llcrnrlon=llcrnrlon, urcrnrlon=urcrnrlon, resolution=resolution)
    m.drawcoastlines()
    m.drawmapboundary()
    m.fillcontinents(color = '#333333')

    for lon, lat, msize in zip(lons, lats, marker_sizes):
        x, y = m(lon, lat)
        m.plot(x, y, '^r', markersize=msize, alpha=.7)

    return m

Map of Stratovolcanos

The 1st map shows the locations of Stratovolcanoes on a world map, so the data frame is filtered on the type column beforehand.

plt.figure(figsize=(16, 8))
df = df_volc[df_volc['Type'] == 'Stratovolcano']
plot_map(df['Longitude'], df['Latitude'], df['Elevation (m)'])
plt.annotate('Stratovolcanoes of the world | ' + chartinfo, xy=(0, -1.04), xycoords='axes fraction')

png

We can clearly see that most Stratovolcanoes are located, where tectonic plates meet. Let's look at all volcanoes of some of those regions now.

Volcanoes of North America

The next map shows all North American volcanoes in the data frame. To display only that part of the map the parameters that determine the bounding box are set accordingly, i. e. the latitudes and longitudes of the lower left and upper right corners of the bounding box.

plt.figure(figsize=(12, 10))
plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'],
         llcrnrlat=5.5, urcrnrlat=83.2, llcrnrlon=-180, urcrnrlon=-52.3, min_marker_size=4)
plt.annotate('Volcanoes of North America | ' + chartinfo, xy=(0, -1.03), xycoords='axes fraction')

png

Volcanoes of Indonesia

Another region with many volcanoes is the Indonesian archipelago. Some of them like the Krakatoa and Mount Tambora have undergone catastrophic eruptions with tens of thousands of victims in the past 200 years.

plt.figure(figsize=(18, 8))
plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'],
         llcrnrlat=-11.1, urcrnrlat=6.1, llcrnrlon=95, urcrnrlon=141.1, min_marker_size=4)
plt.annotate('Volcanoes of Indonesia | ' + chartinfo, xy=(0, -1.04), xycoords='axes fraction')

png

Volcanoes of the world

The final map shows all volcanoes in the data frame and the whole map using a background image obtained from the NASA website. To be able to add this image to the map by calling the warpimage method, is why the plot_map function returns the Basemap object. Moreover, a title and an annotation are added with the code below.

plt.figure(figsize=(20, 12))
m = plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'], min_marker_size=2)
m.warpimage(image='img/raw-bathymetry.jpg', scale=1)

plt.title('Volcanoes of the World', color='#000000', fontsize=40)
plt.annotate(chartinfo + ' | Image: NASA - nasa.gov',
             (0, 0), color='#bbbbbb', fontsize=11)
plt.show()

png

Map Poster

I also created a poster of this map on Zazzle.

Volcanoes of the World - Miller Projection Print

Bonus: volcano globe

In addition to these static maps I created this volcano globe. It is built with the WebGL globe project, that expects the following data structure [ latitude, longitude, magnitude, latitude, longitude, magnitude, ... ].

To achieve this structure, I create a data frame that contains only latitude, longitude, and elevation, call the as_matrix method which creates an array with a list of lists containing the column values, flatten this into a 1-dimensional array, turn it to a list and save the new data structure as a JSON file.

df_globe_values = df_volc[['Latitude', 'Longitude', 'Elevation (m)']]
globe_values = df_globe_values.as_matrix().flatten().tolist()
with open('json/globe_volcanoes.json', 'w') as f:
    json.dump(globe_values, f)

Summary

This notebook shows how you can plot maps with locations using Pandas and Basemap. This was a rather quick walk-through, which did not go into too much detail explaining the code. If you like to learn more about the topic, I highly recommend the Mapping Global Earthquake Activity tutorial, which is more in-depth and contains various nice examples. Have fun mapping data with Python!

%signature

Author: Ramiro Gómez • Last edited: July 31, 2015

Shirts for Python Programmers

Published: May 07, 2014 by Ramiro Gómez.
If you want to be notified about new content, click here to subscribe to the newsletter or RSS feed.

Disclosure: External links on this page may contain affiliate IDs, which means that I earn a commission if you make a purchase via such a link. This allows me to operate this site and offer hopefully valuable content that is freely accessible. More information about affiliate programs.