This notebook shows how to pre-process data from ITU on Internet users by country and gender to create several choropleth maps that show the percentages of male and female users by country as well as the ratio of male to female users. The latter shows gender inequality in access to the Internet.
The data was gathered through official household surveys, collected via an annual questionnaire by ITU, which has been providing training courses on measuring ICT access and use by households and individuals in developing countries. The latest statistics were gathered in 2012, so the current situation will look different. Read more about it on the ITU Web site.
The data is in Excel format, so this involves a bit of manual work, i.e., opening the file in a program like LibreOffice Calc, removing superfluous rows and columns from the first data sheet, and saving it as a CSV file.
The next steps are:
import pandas as pd
import geonamescache
gc = geonamescache.GeonamesCache()
df = pd.read_csv('data/Internet-Users-by-Gender-2010-2012-ITU.csv')
print(len(df))
df.head(5)
Since the ratio is not present, it will be calculated from the percentages. Note that this does not take the actual number of male and female inhabitants into account. So, in theory, a ratio higher than one could still mean that more women than men have access to the Internet in absolute numbers, if the country has more women than men, and vice versa.
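To make this caveat concrete, here is a small worked example with made-up numbers. The population figures and percentages are hypothetical and not taken from the ITU data. It shows a ratio above one even though more women than men are online in absolute terms.

```python
# Hypothetical country: all figures are invented for illustration only.
male_population = 1_000_000
female_population = 1_100_000
male_percentage = 40.0    # percent of men using the Internet
female_percentage = 38.0  # percent of women using the Internet

# Ratio of percentages, calculated the same way as in this notebook.
ratio = male_percentage / female_percentage

# Absolute user counts, which the ratio does not take into account.
male_users = male_population * male_percentage / 100
female_users = female_population * female_percentage / 100

print(round(ratio, 3))            # ratio is above one
print(male_users < female_users)  # yet more women than men are online
```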
df['Male/Female Ratio'] = df['Male Percentage'] / df['Female Percentage']
df.sort_values('Male/Female Ratio', ascending=False).head(10)
I didn't expect Turkey to have the most striking gender inequality in access to the Internet, but we have to consider that the dataset covers only 65 countries and the situation is probably worse in several of the missing ones.
The next step is to add an iso3 column containing the 3-letter country code. Some clean-up of country names is required here, as every data provider likes to use slightly differing country names.
cnames = gc.get_countries_by_names()
mapping = {
    'Hong Kong, China': 'Hong Kong',
    'Iran (I.R.)': 'Iran',
    'Korea (Rep.)': 'South Korea',
    'Macao, China': 'Macao',
    'Palestinian Authority': 'Palestinian Territory',
    'Slovak Republic': 'Slovakia',
    'TFYR Macedonia': 'Macedonia'
}
def get_iso3(name):
    if name in mapping:
        name = mapping[name]
    return cnames[name]['iso3']
df['iso3'] = df['Country name'].apply(get_iso3)
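The lookup above raises a KeyError as soon as a country name is neither in the mapping nor in the geonamescache data. A minimal sketch of a more forgiving variant follows. To run without the library, it assumes a small hand-written dictionary, `cnames_sample`, as a hypothetical stand-in for the result of `gc.get_countries_by_names()`. The helper name `get_iso3_or_none` and the test name 'Atlantis' are also made up for this illustration.

```python
# Hypothetical stand-in for gc.get_countries_by_names(); only two entries.
cnames_sample = {
    'South Korea': {'iso3': 'KOR'},
    'Slovakia': {'iso3': 'SVK'},
}

# Subset of the mapping used in this notebook.
mapping_sample = {
    'Korea (Rep.)': 'South Korea',
    'Slovak Republic': 'Slovakia',
}

def get_iso3_or_none(name, cnames, mapping):
    """Return the 3-letter code for name, or None if it cannot be resolved."""
    name = mapping.get(name, name)  # translate provider-specific names first
    country = cnames.get(name)
    return country['iso3'] if country else None

names = ['Korea (Rep.)', 'Slovakia', 'Atlantis']
codes = {n: get_iso3_or_none(n, cnames_sample, mapping_sample) for n in names}

# Names that could not be resolved and still need a mapping entry.
unmatched = [n for n, code in codes.items() if code is None]
print(codes)
print(unmatched)
```

Collecting the unmatched names this way makes it easier to see which entries still have to be added to the mapping before the iso3 column is complete.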
The modified data frame now contains all the columns needed to create the D3-based maps, as we can see in the extract below, this time sorted by the lowest male-to-female ratio.
df.sort_values('Male/Female Ratio').head(10)
Save the data frame as a CSV file, setting the correct encoding.
df.to_csv('../static/data/csv/internet-users-gender.csv', encoding='utf-8')
This post was written by Ramiro Gómez (@yaph) and published on May 16, 2014.