Retrieve data on natural disaster risk from Wikipedia's List of countries by natural disaster risk, clean it up, and save it in a format suitable for rendering a map with d3.geomap.
import requests
import io
import re
import pandas as pd
import geonamescache
from geonamescache import mappings
gc = geonamescache.GeonamesCache()
cnames = gc.get_countries_by_names()
url = 'http://wikitables.geeksta.net/dl/?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FList_of_countries_by_natural_disaster_risk&idx=0'
re_num = re.compile(r'^[\d,.]+$')
def fix_num(x):
    if (isinstance(x, str) and re.search(re_num, x)):
        x = x.replace(',', '')
        if '.' in x:
            x = float(x)
        else:
            x = int(x)
    return x
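To get a feel for what fix_num does, here is a quick check on a few made-up values. The sample strings are only illustrative and are not taken from the Wikipedia table; the function is repeated so the snippet runs on its own.

```python
import re

re_num = re.compile(r'^[\d,.]+$')


def fix_num(x):
    """Convert numeric-looking strings to int or float; leave anything else alone."""
    if isinstance(x, str) and re_num.search(x):
        x = x.replace(',', '')
        x = float(x) if '.' in x else int(x)
    return x


# Illustrative inputs: thousands separator, decimal, plain text, percentage, non-string.
samples = ['1,234', '36.28', 'Vanuatu', '36.28%', 7]
converted = [fix_num(s) for s in samples]
print(converted)
```

Note that a value such as '36.28%' does not match the pattern and stays a string, which is why the percentage column gets its own conversion step further down.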
Download the data as CSV, read it into a Pandas DataFrame
and convert numbers to floats and integers.
csv = requests.get(url).text
df = pd.read_csv(io.StringIO(csv))
df = df.applymap(fix_num)
df.head()
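The same read-and-convert step can be tried without a network connection. The sketch below feeds a small hand-written CSV string into pandas; the rows and the Population column are invented for illustration and do not come from the actual table. It uses a column-wise apply with Series.map, which gives the same element-wise result as applymap and also works on recent pandas versions, where applymap is deprecated in favour of DataFrame.map.

```python
import io
import re

import pandas as pd

re_num = re.compile(r'^[\d,.]+$')


def fix_num(x):
    """Convert numeric-looking strings to int or float; leave anything else alone."""
    if isinstance(x, str) and re_num.search(x):
        x = x.replace(',', '')
        x = float(x) if '.' in x else int(x)
    return x


# Made-up rows that only imitate the shape of the downloaded CSV.
sample_csv = (
    'Rank,Country,Disaster risk,Population\n'
    '1,Examplestan,36.28%,"1,234,567"\n'
    '2,Samplonia,29.33%,"89,000"\n'
)

df = pd.read_csv(io.StringIO(sample_csv))
# Apply fix_num to every cell, one column at a time.
df = df.apply(lambda col: col.map(fix_num))
print(df)
```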
Map country names to iso3 codes.
def get_iso3(name):
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']
df['iso3'] = df['Country'].apply(get_iso3)
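A lookup like get_iso3 raises a KeyError as soon as a name is found in neither dictionary. If you would rather collect the unmatched names and inspect them, a more forgiving variant could look like the sketch below. The function get_iso3_or_none is a hypothetical helper, and the two dictionaries are tiny hand-written stand-ins for mappings.country_names and gc.get_countries_by_names(), not data copied from geonamescache.

```python
# Stand-ins for mappings.country_names and gc.get_countries_by_names();
# a tiny hand-written subset for illustration only.
country_names = {'Burma': 'Myanmar'}
cnames = {
    'Myanmar': {'iso3': 'MMR'},
    'Vanuatu': {'iso3': 'VUT'},
}


def get_iso3_or_none(name):
    """Return the ISO3 code for a country name, or None if it is unknown."""
    # Translate alternative spellings first, falling back to the name itself.
    name = country_names.get(name, name)
    country = cnames.get(name)
    return country['iso3'] if country else None


names = ['Vanuatu', 'Burma', 'Atlantis']
codes = {n: get_iso3_or_none(n) for n in names}
# Names without a code can be reviewed and added to the mapping by hand.
unmatched = [n for n, code in codes.items() if code is None]
print(codes)
print(unmatched)
```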
Convert the disaster risk percentages to floats, delete unused columns and save the data as a CSV file.
df['Disaster Risk in Percent'] = df['Disaster risk'].apply(lambda x: float(x.replace('%', '')))
del df['Rank'], df['Disaster risk']
df.to_csv('../static/data/csv/natural-disaster-risk.csv', encoding='utf-8', index=False)
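To check what the final CSV looks like without touching the file system, the same steps can be run against an in-memory buffer. This is only a sketch: the rows and the iso3 placeholders are made up, and it uses the vectorised .str.rstrip('%') and drop(columns=...) as alternatives to the lambda and del statements above.

```python
import io

import pandas as pd

# Made-up rows using the column names the post works with.
df = pd.DataFrame({
    'Rank': [1, 2],
    'Country': ['Examplestan', 'Samplonia'],
    'Disaster risk': ['36.28%', '29.33%'],
    'iso3': ['EXA', 'SAM'],  # placeholder codes, not real ISO3 values
})

# Strip the percent sign and convert the remaining text to float.
df['Disaster Risk in Percent'] = df['Disaster risk'].str.rstrip('%').astype(float)
# Drop the columns that are not needed for the map.
df = df.drop(columns=['Rank', 'Disaster risk'])

# Write to a string buffer instead of a file to inspect the result.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
output = buffer.getvalue()
print(output)
```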
This post was written by Ramiro Gómez (@yaph) and published on July 28, 2014.