Internet Users by Gender

This notebook shows how to pre-process data from ITU on Internet users by country and gender to create several choropleth maps that show the percentages of male and female users by country as well as the ratio of male to female users. The latter shows gender inequality in access to the Internet.

The data was gathered through official household surveys, using an annual questionnaire from ITU, which has been providing training courses on measuring ICT access and use by households and individuals in developing countries. The latest statistics were gathered in 2012, so the current situation will look different. Read more about it on the ITU Web site.

The data is in Excel format, so this involves a bit of manual work, i.e. opening the file in a program like LibreOffice Calc, removing superfluous rows and columns from the first data sheet and saving it as a CSV file.
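
The manual clean-up could probably be scripted as well. The following is only a rough sketch, not what was done for this post: the function names are made up for illustration, reading Excel files with pandas assumes an Excel reader such as openpyxl is installed, and the small in-memory frame stands in for the real sheet, whose layout is not shown here. Title and footnote rows would still have to be sliced off by hand.

```python
import pandas as pd

def load_first_sheet(path):
    # Hypothetical helper: header=None keeps every spreadsheet row as data,
    # so nothing is interpreted as column names yet.
    return pd.read_excel(path, sheet_name=0, header=None)

def drop_empty(raw):
    """Drop rows and columns that contain no values at all."""
    return raw.dropna(axis=0, how='all').dropna(axis=1, how='all')

# Tiny in-memory stand-in for a raw sheet with one empty row and one empty column.
raw = pd.DataFrame([
    ['Australia', None, 80.6, 78.4],
    [None, None, None, None],
    ['Austria', None, 84.1, 76.0],
])
cleaned = drop_empty(raw)
print(cleaned.shape)
```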

The next steps are:

  • Import the required Python libraries
  • Create a GeonamesCache object
  • Read the CSV file into a Pandas DataFrame
  • Print the number of records and the first few rows to inspect the data
In [1]:
import pandas as pd
import geonamescache

gc = geonamescache.GeonamesCache()
df = pd.read_csv('data/Internet-Users-by-Gender-2010-2012-ITU.csv')
print(len(df))
df.head(5)
65

Out[1]:
   Country name  Latest year  Male Percentage  Female Percentage
0  Australia            2011             80.6               78.4
1  Austria              2012             84.1               76.0
2  Bahrain              2012             86.9               90.0
3  Belarus              2012             49.8               44.9
4  Belgium              2012             82.8               78.7

5 rows × 4 columns

Since the ratio is not present in the data, it is calculated from the percentages. Note that this does not take the actual number of male and female inhabitants into account. In theory, a ratio higher than one could still mean that more women than men have access to the Internet in absolute numbers, if there are more women than men in that country, and vice versa.
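
To make the caveat concrete, here is a small worked example. All numbers are invented for illustration and do not come from the ITU data.

```python
# Hypothetical survey percentages and population counts (not from the ITU data).
male_pct, female_pct = 50.0, 48.0
male_pop, female_pop = 1_000_000, 1_100_000

# Ratio of percentages, as calculated in this notebook.
ratio = male_pct / female_pct

# Absolute number of users implied by the percentages.
male_users = male_pop * male_pct / 100
female_users = female_pop * female_pct / 100

print(round(ratio, 3))
print(male_users, female_users)
```

The ratio is above one, yet the hypothetical country has more female than male Internet users, because women make up the larger share of its population.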

In [2]:
df['Male/Female Ratio'] = df['Male Percentage'] / df['Female Percentage']
df.sort_values('Male/Female Ratio', ascending=False).head(10)
Out[2]:
    Country name           Latest year  Male Percentage  Female Percentage  Male/Female Ratio
58  Turkey                        2012             55.8               34.7           1.608069
40  Morocco                       2012             65.4               45.8           1.427948
25  Iran (I.R.)                   2010             16.6               12.7           1.307087
43  Palestinian Authority         2011             44.6               34.4           1.296512
10  Croatia                       2012             70.2               54.7           1.283364
24  Indonesia                     2010             11.1                8.7           1.275862
46  Peru                          2010             38.9               30.5           1.275410
28  Italy                         2012             60.9               50.8           1.198819
39  Montenegro                    2011             38.7               32.5           1.190769
15  El Salvador                   2012             22.0               18.8           1.170213

10 rows × 5 columns

I didn't expect Turkey to have the most striking gender inequality in access to the Internet, but we have to consider that the dataset covers only 65 countries and that the situation is probably worse in several of the missing ones.

The next step is to add an iso3 column containing the 3-letter country code. Some clean-up of country names is required here, as every data provider likes to use slightly different country names.

In [3]:
cnames = gc.get_countries_by_names()
mapping = {
    'Hong Kong, China': 'Hong Kong',
    'Iran (I.R.)': 'Iran',
    'Korea (Rep.)': 'South Korea',
    'Macao, China': 'Macao',
    'Palestinian Authority': 'Palestinian Territory',
    'Slovak Republic': 'Slovakia',
    'TFYR Macedonia': 'Macedonia'
}

def get_iso3(name):
    if name in mapping:
        name = mapping[name]
    return cnames[name]['iso3']

df['iso3'] = df['Country name'].apply(get_iso3)
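
Note that get_iso3 raises a KeyError for any name that is neither in the mapping nor in the lookup. One way to find such names before applying the function is sketched below. The find_unmatched helper is hypothetical, and the small cnames dictionary is only a stand-in for the much larger result of gc.get_countries_by_names(), reduced to the 'iso3' key used above.

```python
# Hypothetical stand-in for gc.get_countries_by_names(); the real lookup
# is much larger, and only the 'iso3' key is used here.
cnames = {
    'Iran': {'iso3': 'IRN'},
    'South Korea': {'iso3': 'KOR'},
    'Turkey': {'iso3': 'TUR'},
}
mapping = {
    'Iran (I.R.)': 'Iran',
    'Korea (Rep.)': 'South Korea',
}

def find_unmatched(names, mapping, cnames):
    """Return the source names with no entry in cnames, even after mapping."""
    return [n for n in names if mapping.get(n, n) not in cnames]

names = ['Turkey', 'Iran (I.R.)', 'Korea (Rep.)', 'Slovak Republic']
unmatched = find_unmatched(names, mapping, cnames)
print(unmatched)
```

Every name printed here needs an entry in the mapping before the iso3 column can be built without errors.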

The modified data frame now contains all the columns needed to create the D3-based maps, as we can see in the extract below, this time sorted by the lowest male to female ratio.

In [4]:
df.sort_values('Male/Female Ratio').head(10)
Out[4]:
    Country name    Latest year  Male Percentage  Female Percentage  Male/Female Ratio  iso3
29  Jamaica                2010             25.4               29.8           0.852349  JAM
44  Panama                 2012             38.6               41.9           0.921241  PAN
64  Venezuela              2012             47.5               50.6           0.938735  VEN
 2  Bahrain                2012             86.9               90.0           0.965556  BHR
57  Thailand               2012             26.3               26.6           0.988722  THA
62  United States          2011             69.4               70.1           0.990014  USA
26  Ireland                2012             76.6               77.3           0.990944  IRL
45  Paraguay               2012             29.3               29.3           1.000000  PRY
61  United Kingdom         2012             87.7               87.3           1.004582  GBR
17  Finland                2012             90.1               89.6           1.005580  FIN

10 rows × 6 columns

Save the data frame as a CSV file, setting the encoding to UTF-8.

In [5]:
df.to_csv('../static/data/csv/internet-users-gender.csv', encoding='utf-8')

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on May 16, 2014.

