Murders by Gender

This notebook shows how to pre-process the "Percentage of male and female homicide victims" dataset from the United Nations Office on Drugs and Crime to create choropleth maps that show how the gender ratio of murder victims differs across countries.

The data is in Excel format. It is read into a pandas DataFrame so that an iso3 identifier can be added for each country and unused columns can be removed. The DataFrame is then saved as a CSV file to be loaded from JavaScript.

In [1]:

import pandas as pd
import matplotlib.pyplot as plt  # used by plt.show() in the plotting cells
import geonamescache

from geonamescache import mappings

gc = geonamescache.GeonamesCache()
df = pd.read_excel('data/GSH2013_Sex_data.xlsx', 'sex', skiprows=5)
df.head()

Out[1]:

	Region	Sub-region	Country/territory	Source	Unnamed: 4	Year	Males	Females
0	Africa	Eastern Africa	Burundi	PH	IHME	2010	0.704	0.296
1	NaN	NaN	Comoros	PH	IHME	2010	0.706	0.294
2	NaN	NaN	Djibouti	PH	IHME	2010	0.723	0.277
3	NaN	NaN	Eritrea	PH	IHME	2010	0.744	0.256
4	NaN	NaN	Ethiopia	PH	IHME	2010	0.772	0.228

5 rows × 8 columns

Remove unused columns, then drop rows in which every remaining value is missing.

In [2]:

del df['Region'], df['Sub-region'], df['Source'], df['Unnamed: 4'], df['Year']
df.dropna(axis=0, how='all', inplace=True)
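
As a minimal sketch of what how='all' does here, the toy frame below (invented values, not the UNODC data) contrasts it with how='any': the former removes only rows that are completely empty, while the latter would also discard rows with a single gap.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the second row is entirely empty,
# the third row is missing just one value.
toy = pd.DataFrame({
    'Country/territory': ['Burundi', np.nan, 'Comoros'],
    'Males': [0.704, np.nan, np.nan],
    'Females': [0.296, np.nan, 0.294],
})

# how='all' drops a row only if every column in it is NaN
only_empty_rows_removed = toy.dropna(axis=0, how='all')

# how='any' drops a row as soon as one column is NaN
any_gap_removed = toy.dropna(axis=0, how='any')

print(len(only_empty_rows_removed), len(any_gap_removed))
```

With how='all', partially filled rows survive, so any remaining gaps still have to be handled before plotting or export.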

Add a column with iso3 codes, mapping country name variants to the ones used in GeoNames.

In [3]:

cnames = gc.get_countries_by_names()

def get_iso3(name):
    # * means no homicide was recorded in the respective year
    name = name.replace('*', '').strip()
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']

df['iso3'] = df['Country/territory'].apply(get_iso3)
df.set_index('Country/territory', inplace=True)
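
get_iso3 raises a KeyError as soon as a name is found in neither lookup table. The sketch below shows one possible way to list such names before applying the function to the whole column. NAME_VARIANTS and COUNTRIES_BY_NAME are small hypothetical stand-ins for mappings.country_names and gc.get_countries_by_names(), and get_iso3_or_none is an illustrative helper, not part of geonamescache.

```python
# Hypothetical stand-ins for mappings.country_names and
# gc.get_countries_by_names(); the real tables are much larger.
NAME_VARIANTS = {'Bolivia (Plurinational State of)': 'Bolivia'}
COUNTRIES_BY_NAME = {
    'Bolivia': {'iso3': 'BOL'},
    'Burundi': {'iso3': 'BDI'},
}


def get_iso3_or_none(name, variants=NAME_VARIANTS, countries=COUNTRIES_BY_NAME):
    """Return the iso3 code for a dataset country name, or None if unknown."""
    # Strip the '*' footnote marker and surrounding whitespace
    cleaned = name.replace('*', '').strip()
    # Translate a dataset-specific spelling to the lookup table's spelling
    cleaned = variants.get(cleaned, cleaned)
    country = countries.get(cleaned)
    return country['iso3'] if country is not None else None


# 'Atlantis' is a deliberately unknown name used to show the failure case
names = ['Burundi', 'Bolivia (Plurinational State of)*', 'Atlantis']
unmatched = [n for n in names if get_iso3_or_none(n) is None]
print(unmatched)
```

Any name that shows up in unmatched would need an extra entry in the variant mapping before the iso3 column can be built without errors.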

Top 10 countries with the highest ratios of female murder victims

In [4]:

df[['Females']].sort_values('Females').tail(10).plot(kind='barh')
plt.show()

Top 10 countries with the highest ratios of male murder victims

In [5]:

df[['Males']].sort_values('Males').tail(10).plot(kind='barh')
plt.show()

Give the remaining columns more meaningful names and save the DataFrame as a UTF-8 encoded CSV file. Country/territory is now the index, so passing index=False leaves it out of the file.

In [6]:

df.columns = ['Ratios of Male Victims', 'Ratios of Female Victims', 'iso3']
df.to_csv('../static/data/csv/murder-gender.csv', encoding='utf-8', index=False)
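
The effect of index=False can be checked on a toy frame written to an in-memory buffer instead of a file. This is a sketch with two hand-entered rows, assuming the same column order as above. The header that comes out contains only the three renamed columns, with no Country/territory field.

```python
import io

import pandas as pd

# Two illustrative rows with the country name as the index,
# mirroring the state of df after set_index('Country/territory')
toy = pd.DataFrame(
    {'Males': [0.704, 0.706], 'Females': [0.296, 0.294], 'iso3': ['BDI', 'COM']},
    index=pd.Index(['Burundi', 'Comoros'], name='Country/territory'),
)
toy.columns = ['Ratios of Male Victims', 'Ratios of Female Victims', 'iso3']

# Write to a string buffer so nothing is created on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

csv_lines = buffer.getvalue().splitlines()
header = csv_lines[0]
print(header)
```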