Internet Users by Gender

This notebook shows how to pre-process data from ITU on Internet users by country and gender to create several choropleth maps that show the percentages of male and female users by country as well as the ratio of male to female users. The latter shows gender inequality in access to the Internet.

The data was gathered from official household surveys via an annual ITU questionnaire. ITU has been providing training courses on measuring ICT access and use by households and individuals in developing countries. The latest statistics were gathered in 2012, so the current situation will look different. Read more about it on the ITU Web site.

The data is in Excel format, so this involves a bit of manual work, i.e. opening the file in a program like LibreOffice Calc, removing superfluous rows and columns from the first data sheet, and saving it as a CSV file.

The next steps are:

Import the required Python libraries
Create a GeonamesCache object
Read the CSV file into a Pandas DataFrame
Print the number of records and the first few rows to inspect the data

In [1]:

import pandas as pd
import geonamescache

gc = geonamescache.GeonamesCache()
df = pd.read_csv('data/Internet-Users-by-Gender-2010-2012-ITU.csv')
print(len(df))
df.head(5)

Out[1]:

	Country name	Latest year	Male Percentage	Female Percentage
0	Australia	2011	80.6	78.4
1	Austria	2012	84.1	76.0
2	Bahrain	2012	86.9	90.0
3	Belarus	2012	49.8	44.9
4	Belgium	2012	82.8	78.7

5 rows × 4 columns

Since the ratio is not present in the data, it is calculated from the percentages. Note that this does not take the actual number of male and female inhabitants into account. In theory, a ratio higher than one could still mean that more women than men have access to the Internet in absolute numbers, if there are more women than men in that country, and vice versa.
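To make this caveat concrete, here is a small sketch with made-up population figures. The numbers are hypothetical and not taken from the ITU data. They only show that a percentage ratio above one can coincide with more female than male users in absolute terms.

```python
# Hypothetical figures, chosen only to illustrate the caveat.
male_population = 1_000_000
female_population = 1_200_000
male_pct = 50.0    # percentage of men using the Internet
female_pct = 45.0  # percentage of women using the Internet

# Ratio of percentages, computed the same way as in the notebook.
ratio = male_pct / female_pct

# Absolute user counts, which the ratio above does not reflect.
male_users = male_population * male_pct / 100
female_users = female_population * female_pct / 100

print(round(ratio, 3))
print(male_users, female_users)
```

In this constructed case the ratio is above one, yet the absolute number of female users is higher.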

In [2]:

df['Male/Female Ratio'] = df['Male Percentage'] / df['Female Percentage']
df.sort_values('Male/Female Ratio', ascending=False).head(10)

Out[2]:

	Country name	Latest year	Male Percentage	Female Percentage	Male/Female Ratio
58	Turkey	2012	55.8	34.7	1.608069
40	Morocco	2012	65.4	45.8	1.427948
25	Iran (I.R.)	2010	16.6	12.7	1.307087
43	Palestinian Authority	2011	44.6	34.4	1.296512
10	Croatia	2012	70.2	54.7	1.283364
24	Indonesia	2010	11.1	8.7	1.275862
46	Peru	2010	38.9	30.5	1.275410
28	Italy	2012	60.9	50.8	1.198819
39	Montenegro	2011	38.7	32.5	1.190769
15	El Salvador	2012	22.0	18.8	1.170213

10 rows × 5 columns

I didn't expect Turkey to have the most striking gender inequality in access to the Internet, but we have to consider that the dataset covers only 65 countries and the situation is probably worse in several of the missing ones.

The next step is to add an iso3 column containing the 3-letter country code. Some clean-up of country names is required here, as every data provider likes to use slightly different country names.

In [3]:

cnames = gc.get_countries_by_names()
mapping = {
    'Hong Kong, China': 'Hong Kong',
    'Iran (I.R.)': 'Iran',
    'Korea (Rep.)': 'South Korea',
    'Macao, China': 'Macao',
    'Palestinian Authority': 'Palestinian Territory',
    'Slovak Republic': 'Slovakia',
    'TFYR Macedonia': 'Macedonia'
}

def get_iso3(name):
    if name in mapping:
        name = mapping[name]
    return cnames[name]['iso3']

df['iso3'] = df['Country name'].apply(get_iso3)
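A name that is neither in the mapping nor in the lookup raises a KeyError in get_iso3. One way to spot such names up front is sketched below. To keep it self-contained, it uses a small stand-in dictionary in place of gc.get_countries_by_names() and a handful of sample names, so cnames, mapping and names here are illustrative assumptions, not the full data.

```python
import pandas as pd

# Stand-in for gc.get_countries_by_names(); only the 'iso3' key is used here.
cnames = {
    'Iran': {'iso3': 'IRN'},
    'South Korea': {'iso3': 'KOR'},
    'Turkey': {'iso3': 'TUR'},
}
# Deliberately incomplete mapping, so that one name stays unresolved.
mapping = {
    'Iran (I.R.)': 'Iran',
    'Korea (Rep.)': 'South Korea',
}

names = pd.Series(['Turkey', 'Iran (I.R.)', 'Korea (Rep.)', 'Slovak Republic'])

# Apply the mapping where an entry exists, otherwise keep the original name.
resolved = names.map(lambda n: mapping.get(n, n))

# Names that still have no entry in the lookup need a new mapping entry.
unmatched = sorted(set(resolved) - set(cnames))
print(unmatched)
```

Any name printed here would need an additional entry in mapping before the iso3 column can be filled.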

The modified data frame now contains all the columns needed to create the D3-based maps, as we can see in the extract below, this time sorted by the lowest male to female ratio.

In [4]:

df.sort_values('Male/Female Ratio').head(10)

Out[4]:

	Country name	Latest year	Male Percentage	Female Percentage	Male/Female Ratio	iso3
29	Jamaica	2010	25.4	29.8	0.852349	JAM
44	Panama	2012	38.6	41.9	0.921241	PAN
64	Venezuela	2012	47.5	50.6	0.938735	VEN
2	Bahrain	2012	86.9	90.0	0.965556	BHR
57	Thailand	2012	26.3	26.6	0.988722	THA
62	United States	2011	69.4	70.1	0.990014	USA
26	Ireland	2012	76.6	77.3	0.990944	IRL
45	Paraguay	2012	29.3	29.3	1.000000	PRY
61	United Kingdom	2012	87.7	87.3	1.004582	GBR
17	Finland	2012	90.1	89.6	1.005580	FIN

10 rows × 6 columns

Save the data frame as a CSV file, setting the encoding to UTF-8.

In [5]:

df.to_csv('../static/data/csv/internet-users-gender.csv', encoding='utf-8')
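Note that to_csv writes the row index as an unnamed first column by default. Whether the map code needs that column is not stated here. If it does not, passing index=False omits it. The sketch below compares both variants using an in-memory buffer and a two-row sample frame, which is an illustrative stand-in for df.

```python
import io

import pandas as pd

# Two-row stand-in for the full data frame.
sample = pd.DataFrame({
    'Country name': ['Jamaica', 'Panama'],
    'iso3': ['JAM', 'PAN'],
})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)

# With index=False only the actual columns are written.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with))
print(repr(header_without))
```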
