Murders by Gender

This notebook shows how to pre-process the "Percentage of male and female homicide victims" dataset from the United Nations Office on Drugs and Crime (UNODC) to create choropleth maps that show how the gender ratio of murder victims differs across countries.

The data is in Excel format. It is read into a Pandas DataFrame so that an iso3 identifier can be added for each country and unused columns can be removed. The DataFrame is then saved as a CSV file to be loaded from JavaScript.

In [1]:
import matplotlib.pyplot as plt  # needed for the plt.show() calls below
import pandas as pd
import geonamescache

from geonamescache import mappings

gc = geonamescache.GeonamesCache()
df = pd.read_excel('data/GSH2013_Sex_data.xlsx', 'sex', skiprows=5)
df.head()
Out[1]:
   Region      Sub-region  Country/territory  Source  Unnamed: 4  Year  Males  Females
0  Africa  Eastern Africa            Burundi      PH        IHME  2010  0.704    0.296
1     NaN             NaN            Comoros      PH        IHME  2010  0.706    0.294
2     NaN             NaN           Djibouti      PH        IHME  2010  0.723    0.277
3     NaN             NaN            Eritrea      PH        IHME  2010  0.744    0.256
4     NaN             NaN           Ethiopia      PH        IHME  2010  0.772    0.228

5 rows × 8 columns

Remove the unused columns, then drop rows that contain no data at all.

In [2]:
del df['Region'], df['Sub-region'], df['Source'], df['Unnamed: 4'], df['Year']
df.dropna(axis=0, how='all', inplace=True)
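
The `how='all'` argument matters here: it drops only rows in which every remaining column is empty, whereas `how='any'` would also drop rows with a single missing value. A minimal sketch of the difference, using a made-up frame (the rows `A`, `B` and their values are illustrative and not taken from the UNODC data):

```python
import numpy as np
import pandas as pd

# Illustrative frame: one complete row, one partially filled row, one empty row.
toy = pd.DataFrame({
    'Country/territory': ['A', 'B', np.nan],
    'Males': [0.7, np.nan, np.nan],
    'Females': [0.3, 0.4, np.nan],
})

# how='all' removes only the row in which every column is missing.
only_empty_dropped = toy.dropna(axis=0, how='all')

# how='any' removes every row that has at least one missing value.
any_gap_dropped = toy.dropna(axis=0, how='any')

print(len(only_empty_dropped), len(any_gap_dropped))
```

If partially filled rows should be excluded from the maps as well, `how='any'` or the `subset` parameter of `dropna` would be the place to start.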

Add a column with iso3 codes, mapping country name variants to the names used in GeoNames.

In [3]:
cnames = gc.get_countries_by_names()

def get_iso3(name):
    # * means no homicide was recorded in the respective year
    name = name.replace('*', '').strip()
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']

df['iso3'] = df['Country/territory'].apply(get_iso3)
df.set_index('Country/territory', inplace=True)
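
Note that `get_iso3` raises a `KeyError` when a name is found neither in the mappings nor in the GeoNames country list. The following sketch shows the same lookup logic with a fallback to `None`. It is self-contained, so the two dictionaries are small hypothetical stand-ins for `mappings.country_names` and `gc.get_countries_by_names()`, and the function name `get_iso3_or_none` is an assumption for illustration:

```python
# Hypothetical stand-ins for mappings.country_names and
# gc.get_countries_by_names(); the real dictionaries are much larger.
NAME_VARIANTS = {'Bolivia (Plurinational State of)': 'Bolivia'}
COUNTRIES = {'Bolivia': {'iso3': 'BOL'}, 'Burundi': {'iso3': 'BDI'}}


def get_iso3_or_none(name, variants=NAME_VARIANTS, countries=COUNTRIES):
    """Return the iso3 code for a country name, or None if it is unknown."""
    # Strip the '*' marker and surrounding whitespace, as get_iso3 does.
    cleaned = name.replace('*', '').strip()
    # Translate a known name variant, otherwise keep the cleaned name.
    cleaned = variants.get(cleaned, cleaned)
    country = countries.get(cleaned)
    return country['iso3'] if country else None


print(get_iso3_or_none('Burundi*'))
print(get_iso3_or_none('Bolivia (Plurinational State of)'))
print(get_iso3_or_none('Atlantis'))
```

Rows that come back as `None` can then be inspected and added to the mapping by hand instead of stopping the whole `apply` call.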

Top 10 countries with the highest ratios of female murder victims

In [4]:
# DataFrame.sort() was removed from pandas; sort_values() is its replacement
df[['Females']].sort_values('Females').tail(10).plot(kind='barh')
plt.show()

Top 10 countries with the highest ratios of male murder victims

In [5]:
# DataFrame.sort() was removed from pandas; sort_values() is its replacement
df[['Males']].sort_values('Males').tail(10).plot(kind='barh')
plt.show()
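
As an alternative to sorting and taking the tail, pandas offers `DataFrame.nlargest`, which selects the top rows directly. A small sketch, assuming a frame built from the five rows shown in the `Out[1]` preview rather than the full dataset:

```python
import pandas as pd

# The five rows from the Out[1] preview, indexed by country name.
sample = pd.DataFrame(
    {
        'Males': [0.704, 0.706, 0.723, 0.744, 0.772],
        'Females': [0.296, 0.294, 0.277, 0.256, 0.228],
    },
    index=['Burundi', 'Comoros', 'Djibouti', 'Eritrea', 'Ethiopia'],
)

# nlargest returns the rows in descending order of the given column.
top_female = sample.nlargest(3, 'Females')[['Females']]
print(top_female)
```

Because `nlargest` returns the rows in descending order, a horizontal bar chart would draw the largest value at the bottom. Reversing the result with `.iloc[::-1]` before plotting gives the same layout as the sort-and-tail approach.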

Give the columns more descriptive names and save the DataFrame as a UTF-8 encoded CSV file. Passing index=False leaves out the Country/territory index, so the country names are not written to the file.

In [6]:
df.columns = ['Ratios of Male Victims', 'Ratios of Female Victims', 'iso3']
df.to_csv('../static/data/csv/murder-gender.csv', encoding='utf-8', index=False)
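
To check what ends up in the file, the export can be tried on a small frame and read back. This sketch writes to an in-memory buffer instead of the real path and uses only the Burundi row from the preview. The variable names are assumptions for illustration:

```python
import io

import pandas as pd

# One-row frame with the renamed columns and the country name as index.
frame = pd.DataFrame(
    {
        'Ratios of Male Victims': [0.704],
        'Ratios of Female Victims': [0.296],
        'iso3': ['BDI'],
    },
    index=pd.Index(['Burundi'], name='Country/territory'),
)

# index=False omits the index, so the country name is not written.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back confirms the column names and the iso3 key.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.columns.tolist())
```

Only the iso3 code identifies a row in the resulting CSV, which is sufficient as long as the JavaScript code joins the data to the map features by that code.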

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on June 02, 2014.

