Data on Policemen per Country and Population

An IPython Notebook that retrieves, cleans up, displays, and plots Wikipedia data on the number of policemen and the police-to-population ratio for over 100 countries.

In [1]:
import requests
import io
import re

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import geonamescache

from geonamescache import mappings

gc = geonamescache.GeonamesCache()
cnames = gc.get_countries_by_names()

url = 'http://wikitables.geeksta.net/dl/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FList_of_countries_by_number_of_police_officers&idx=0'

re_num = re.compile(r'^[\d,.]+$')

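def fix_num(x):
    # Convert strings made only of digits, commas and dots to int or float;
    # return every other value unchanged.
    if isinstance(x, str) and re_num.search(x):
        x = x.replace(',', '')  # drop thousands separators
        if '.' in x:
            x = float(x)
        else:
            x = int(x)
    return x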
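
To make the conversion rule concrete, here is a small self-contained sketch that repeats fix_num from the cell above and runs it on a few sample values. The sample inputs are made up for illustration and are not taken from the Wikipedia table.

```python
import re

re_num = re.compile(r'^[\d,.]+$')


def fix_num(x):
    """Same logic as the notebook's fix_num, repeated so this runs on its own."""
    if isinstance(x, str) and re_num.search(x):
        x = x.replace(',', '')
        x = float(x) if '.' in x else int(x)
    return x


# Illustrative inputs (assumed, not from the source table).
samples = ['122,000', '401', '1,234.5', 'Afghanistan', 2012]
converted = [fix_num(s) for s in samples]
print(converted)  # [122000, 401, 1234.5, 'Afghanistan', 2012]
```

Strings made up only of separators, such as '.' or ',', also match the pattern and would raise a ValueError in float() or int(), so the function relies on the table not containing such cells.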

Get and clean up data

Convert number strings to floats or integers, remove trailing digits (footnote markers) from country names, and replace country names with those used in geonamescache. Also add an iso3 column for rendering the d3-based map.

In [2]:
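csv = requests.get(url).text
df = pd.read_csv(io.StringIO(csv))
# Convert numeric strings in every cell to int or float.
df = df.applymap(fix_num)

# Strip trailing footnote digits from country names.
num = ''.join([str(i) for i in range(10)])
df['Country'] = df['Country'].apply(lambda x: x.rstrip(num))
# Switch to the names geonamescache uses, then look up the ISO 3166-1 alpha-3 code.
df['Country'] = df['Country'].apply(lambda x: mappings.country_names.get(x, x))
df['iso3'] = df['Country'].apply(lambda x: cnames[x]['iso3'])
df.set_index('Country', inplace=True)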
df.head()
Out[2]:
                         Size  Year  Police per 100,000 people  iso3
Country
Afghanistan            122000  2012                        401  AFG
American Samoa            200  2012                        720  ASM
Andorra                   237  2012                        278  AND
Antigua and Barbuda       600  2012                        733  ATG
Argentina              205902  2000                        558  ARG
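
The name cleanup can be tried without network access or geonamescache. The following is a minimal sketch under stated assumptions: the two dictionaries are hypothetical stand-ins for mappings.country_names and the result of get_countries_by_names(), and the raw names with footnote digits are invented examples.

```python
import string

# Hypothetical stand-ins for geonamescache's data, limited to what the sketch needs.
name_fixes = {'Vatican City': 'Vatican'}
iso3_by_name = {'Argentina': 'ARG', 'Vatican': 'VAT'}


def clean_country(raw):
    """Strip trailing footnote digits, then map to the canonical name."""
    name = raw.rstrip(string.digits)
    return name_fixes.get(name, name)


# Invented raw values that mimic names with footnote markers attached.
raw_names = ['Argentina12', 'Vatican City3']
cleaned = [clean_country(n) for n in raw_names]
iso3 = [iso3_by_name[n] for n in cleaned]
print(cleaned, iso3)
```

A name that is missing from the lookup raises a KeyError, which is also how the notebook's cnames[x]['iso3'] lookup would surface a country that was not mapped.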

Display and plot most and least policed countries

First set some common plotting properties.

In [3]:
footer = 'CC BY-SA 2014 Ramiro Gómez - ramiro.org • Data: en.wikipedia.org/wiki/List_of_countries_by_number_of_police_officers'
mpl.rcParams['font.size'] = 11
mpl.rcParams['font.family'] = 'Ubuntu'
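# axes.prop_cycle replaces the deprecated axes.color_cycle setting in newer Matplotlib releases.
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=['#a6cee3', '#1f78b4', '#b2df8a', '#33a02c', '#fb9a99', '#e31a1c', '#fdbf6f', '#ff7f00', '#cab2d6'])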

Show most policed countries

In [4]:
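# DataFrame.sort is no longer available in current pandas; sort_values is the equivalent call.
df.sort_values('Police per 100,000 people', inplace=True)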
df.tail(10)
Out[4]:
                         Size  Year  Police per 100,000 people  iso3
Country
Niue                       16  2012                        800  NIU
Grenada                   900  2012                        818  GRD
Montenegro               5250  2012                        839  MNE
Bahamas                  3000  2012                        848  BHS
Saint Kitts and Nevis     450  2012                        899  KNA
Saint Helena               69  2012                        939  SHN
Brunei                   4400  2012                       1076  BRN
Monaco                    500  2012                       1374  MCO
Pitcairn                    1  2012                       1492  PCN
Vatican                   130  2012                      15625  VAT
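
The top of this ranking is dominated by very small territories. Since the rate is the number of officers divided by the population, times 100,000, the Size and rate columns together imply a population figure. The sketch below back-calculates it for three rows of the table above; the resulting populations are derived for illustration only and do not appear in the source data.

```python
def implied_population(size, per_100k):
    """Invert rate = size / population * 100000 to estimate the population."""
    return size / per_100k * 100000


# (Size, Police per 100,000 people) copied from the table above.
rows = {'Vatican': (130, 15625), 'Pitcairn': (1, 1492), 'Monaco': (500, 1374)}

for country, (size, rate) in rows.items():
    print(country, round(implied_population(size, rate)))
```

An implied population of roughly 830 for the Vatican and 67 for Pitcairn shows why a small force can still yield an extreme per-capita rate.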

Plot most policed countries

In [5]:
info = 'Number of policemen per 100,000 people.\n'

s = df['Police per 100,000 people'][-10:]
# Series.plot returns the Axes it drew on; reuse it instead of creating new axes.
ax = s.plot(kind='barh', figsize=(10, 6), title='Most Policed Countries\n', fontsize='large')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_color((1, 1, 1))
ax.xaxis.set_label_text(info + footer)
ax.xaxis.set_ticklabels('')
ax.yaxis.set_label_text('')

for i, x in enumerate(s):
    ax.text(x + 100, i - .1, x, ha='left', fontsize='large')

plt.savefig('../static/img/graphs/most-policed-countries.png', bbox_inches='tight')

Show least policed countries

In [6]:
df.head(10)
Out[6]:
                         Size  Year  Police per 100,000 people  iso3
Country
Mali                     7000  2012                         48  MLI
Niger                    8700  2012                         58  NER
Comoros                   500  2012                         66  COM
Togo                     4000  2012                         72  TGO
Iran                    60000  2012                         80  IRN
Kenya                   35000  2012                         81  KEN
French Polynesia          220  2012                         82  PYF
Bangladesh             135000  2012                         83  BGD
Ghana                   23000  2012                         94  GHA
Guinea                  10000  2012                        100  GIN

Plot least policed countries

In [7]:
info = 'Number of policemen per 100,000 people.\n'

s = df['Police per 100,000 people'][:10]
# Series.plot returns the Axes it drew on; reuse it instead of creating new axes.
ax = s.plot(kind='barh', figsize=(10, 6), title='Least Policed Countries\n', fontsize='large')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_color((1, 1, 1))
ax.xaxis.set_label_text(info + footer)
ax.xaxis.set_ticklabels('')
ax.yaxis.set_label_text('')

for i, x in enumerate(s):
    ax.text(x + 1, i - .1, x, ha='left', fontsize='large')

plt.savefig('../static/img/graphs/least-policed-countries.png', bbox_inches='tight')
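
The two plotting cells differ only in the slice, the title, the label offset (100 versus 1), and the output path. As a hedged sketch of how they could share one function, the block below defines a hypothetical helper named plot_ranking that uses plain Matplotlib calls rather than the pandas plotting wrapper. The Agg backend is an assumption made so the example runs without a display, and saving the figure is left to the caller.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_ranking(labels, values, title, label_offset):
    """Hypothetical helper: horizontal bars with the value printed beside each bar."""
    fig, ax = plt.subplots(figsize=(10, 6))
    positions = list(range(len(values)))
    ax.barh(positions, values)
    ax.set_yticks(positions)
    ax.set_yticklabels(labels)
    ax.set_title(title)
    for side in ('top', 'right', 'bottom'):
        ax.spines[side].set_visible(False)
    ax.set_xticks([])  # values are printed next to the bars instead
    for i, x in enumerate(values):
        ax.text(x + label_offset, i, str(x), ha='left', va='center')
    return fig, ax


# Three rows from the least policed table above.
fig, ax = plot_ranking(['Mali', 'Niger', 'Comoros'], [48, 58, 66],
                       'Least Policed Countries', label_offset=1)
```

Calling fig.savefig with bbox_inches='tight' afterwards would write the image in the same way as the cells above.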
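
In [8]:
# Capitalize "People" in the column header for the exported file.
df.columns = ['Size', 'Year', 'Police per 100,000 People', 'iso3']
# index=False leaves out the Country index, so rows are identified by iso3 only.
df.to_csv('../static/data/csv/police-countries.csv', encoding='utf-8', index=False)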

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on October 06, 2014.

