Global Slavery Index Ranking

This notebook shows how to retrieve the country rankings from the Global Slavery Index and save them as CSV files.

First import the required libraries.

In [1]:
from lxml import html
import pandas as pd
import geonamescache

gc = geonamescache.GeonamesCache()

Find the fifth HTML table on the page (index 4, since the list of XPath matches is zero-based) and convert it into a pandas data frame. Print the first 10 rows to inspect the data.

In [2]:
url = 'http://www.globalslaveryindex.org/findings/'
xpath = '//table'

tree = html.parse(url)
table = tree.xpath(xpath)[4]
raw_html = html.tostring(table)

df = pd.read_html(raw_html)[0]
df.head(10)
Out[2]:
Rank Country Name Population Calculated Number of Enslaved Estimated Enslaved (Lower Range) Estimate Enslaved (Upper Range)
0 1 Mauritania 3796141 151353 140000 160000
1 2 Haiti 10173775 209165 200000 220000
2 3 Pakistan 179160111 2127132 2000000 2200000
3 4 India 1236686732 13956010 13300000 14700000
4 5 Nepal 27474377 258806 250000 270000
5 6 Moldova 3559541 33325 32000 35000
6 7 Benin 10050702 80371 76000 84000
7 8 Côte d'Ivoire 19839750 156827 150000 160000
8 9 Gambia 1791225 14046 13000 15000
9 10 Gabon 1632572 13707 13000 14000

10 rows × 6 columns
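
The "pick the n-th table by position" step can be tried offline on a small document. The following is only a sketch: it uses the standard library's xml.etree.ElementTree instead of lxml and pandas, and SAMPLE_PAGE and table_rows are made-up names for illustration, not part of the real page or of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the downloaded page.
SAMPLE_PAGE = """
<html><body>
  <table><tr><td>navigation</td></tr></table>
  <table>
    <tr><th>Rank</th><th>Country Name</th></tr>
    <tr><td>1</td><td>Mauritania</td></tr>
    <tr><td>2</td><td>Haiti</td></tr>
  </table>
</body></html>
"""


def table_rows(page, index):
    """Return the cell texts of the table at the given zero-based position."""
    tables = ET.fromstring(page).findall('.//table')
    # Each <tr> yields one list holding the text of its <th>/<td> children.
    return [[cell.text for cell in row] for row in tables[index].findall('tr')]


rows = table_rows(SAMPLE_PAGE, 1)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Selecting by position is brittle: if the site adds or removes a table, the index silently points at different data, so inspecting the header row as done above is a cheap safeguard.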

Add an iso3 column containing the 3-letter country codes. Some country names on the Global Slavery Index website differ from those used in the GeoNames database, so map them to the GeoNames names before looking up the code. Then calculate the percentage of enslaved people relative to the total population for every country in the dataset.

In [3]:
cnames = gc.get_countries_by_names()
mapping = {
    u"Côte d'Ivoire": 'Ivory Coast',
    'Republic of Congo': 'Democratic Republic of the Congo',
    'Timor-Leste': 'East Timor',
    'Hong Kong, SAR China': 'Hong Kong'
}

def get_iso3(name):
    if name in mapping:
        name = mapping[name]
    return cnames[name]['iso3']

df['iso3'] = df['Country Name'].apply(get_iso3)
df['Calculated Percentage of Enslaved'] = df['Calculated Number of Enslaved'] / df['Population'] * 100
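
Because get_iso3 raises a KeyError for any name that is missing from both the mapping and the GeoNames data, it can help to list problem names before applying it. The sketch below assumes stand-in dictionaries (SAMPLE_CNAMES, SAMPLE_MAPPING) in place of the real geonamescache data, and check_names is a hypothetical helper. It also reports source names that resolve to the same code, which would point to a wrong mapping entry.

```python
from collections import defaultdict

# Stand-in for gc.get_countries_by_names(); only the 'iso3' key is modelled.
SAMPLE_CNAMES = {
    'Ivory Coast': {'iso3': 'CIV'},
    'Haiti': {'iso3': 'HTI'},
}
SAMPLE_MAPPING = {"Côte d'Ivoire": 'Ivory Coast'}


def check_names(names, mapping, cnames):
    """Return (names with no match, codes assigned to more than one name)."""
    unmapped = []
    by_code = defaultdict(list)
    for name in names:
        # Apply the manual mapping first, then look up the country entry.
        entry = cnames.get(mapping.get(name, name))
        if entry is None:
            unmapped.append(name)
        else:
            by_code[entry['iso3']].append(name)
    duplicates = {code: ns for code, ns in by_code.items() if len(ns) > 1}
    return unmapped, duplicates


sample_names = ["Côte d'Ivoire", 'Haiti', 'Ivory Coast', 'Atlantis']
unmapped, duplicates = check_names(sample_names, SAMPLE_MAPPING, SAMPLE_CNAMES)
print(unmapped)
print(duplicates)
```

With the real data, the same check could be run on df['Country Name'] together with the mapping and cnames objects from the cell above.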

The modified data frame now has 2 additional columns needed to create the interactive map with D3.

In [4]:
df.head(10)
Out[4]:
Rank Country Name Population Calculated Number of Enslaved Estimated Enslaved (Lower Range) Estimate Enslaved (Upper Range) iso3 Calculated Percentage of Enslaved
0 1 Mauritania 3796141 151353 140000 160000 MRT 3.987023
1 2 Haiti 10173775 209165 200000 220000 HTI 2.055923
2 3 Pakistan 179160111 2127132 2000000 2200000 PAK 1.187280
3 4 India 1236686732 13956010 13300000 14700000 IND 1.128500
4 5 Nepal 27474377 258806 250000 270000 NPL 0.941990
5 6 Moldova 3559541 33325 32000 35000 MDA 0.936216
6 7 Benin 10050702 80371 76000 84000 BEN 0.799656
7 8 Côte d'Ivoire 19839750 156827 150000 160000 CIV 0.790469
8 9 Gambia 1791225 14046 13000 15000 GMB 0.784156
9 10 Gabon 1632572 13707 13000 14000 GAB 0.839595

10 rows × 8 columns
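
As a quick sanity check, the percentage column can be recomputed by hand for a couple of rows. This sketch uses plain Python with the population and enslaved figures copied from the table above; enslaved_percentage is a hypothetical helper name.

```python
def enslaved_percentage(enslaved, population):
    """Share of the population that is enslaved, in percent."""
    return enslaved / population * 100


# (country, population, calculated number of enslaved) from the table above.
sample_rows = [
    ('Mauritania', 3796141, 151353),
    ('Haiti', 10173775, 209165),
]

percentages = {
    name: enslaved_percentage(enslaved, population)
    for name, population, enslaved in sample_rows
}

for name, pct in percentages.items():
    print('{}: {:.6f}'.format(name, pct))
```

The results should agree with the Calculated Percentage of Enslaved values shown in the data frame to the displayed precision.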

Save the full data frame as a CSV file, and save a second CSV file containing only the columns to display on the global slavery map. Make sure to set the correct encoding, since country names such as Côte d'Ivoire contain non-ASCII characters.

In [5]:
df.to_csv('../static/data/csv/globalslaveryindex-full.csv', encoding='utf-8')
df[['iso3', 'Calculated Number of Enslaved', 'Calculated Percentage of Enslaved']].to_csv('../static/data/csv/globalslaveryindex.csv', encoding='utf-8')
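
To see what the encoding setting does to a non-ASCII country name, a small round trip can be run outside the notebook. This is a sketch using the standard library's csv module rather than pandas, and it writes to a temporary directory instead of the paths used above; the file name sample.csv is arbitrary.

```python
import csv
import os
import tempfile

rows = [['iso3', 'Country Name'], ['CIV', "Côte d'Ivoire"]]

# Write the rows as UTF-8 into a throwaway file.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows(rows)

# Inspect the raw bytes, then read the file back with the same encoding.
with open(path, 'rb') as f:
    raw = f.read()
with open(path, encoding='utf-8', newline='') as f:
    restored = list(csv.reader(f))

print(raw)
print(restored)
```

Reading the file with the same encoding it was written in returns the original names unchanged. Reading it with a different encoding would typically garble the accented character.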

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on May 05, 2014.

