This notebook shows how to retrieve and pre-process data from kimonolabs about the 736 football players who took part in the FIFA World Cup 2014.
```python
import json
import os

import requests
import pandas as pd
from geonamescache import GeonamesCache, mappings

gc = GeonamesCache()
```
Get the player data from the Web service and save it in the players list.
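```python
players_url = 'http://worldcup.kimonolabs.com/api/players'
params = {
    'apikey': os.environ['KIMONO_API_KEY'],  # API key read from the environment
    'limit': 1000  # more than the 736 players, so one request should cover them all
}
response = requests.get(players_url, params=params)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
players = json.loads(response.text)
```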
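`requests` encodes the `params` dictionary into the query string of the URL. For simple values like these, the standard library's `urllib.parse.urlencode` should produce the same string, which makes it easy to see what is actually sent. This is a minimal sketch. `YOUR_API_KEY` is a placeholder and not a real key.

```python
from urllib.parse import urlencode

players_url = 'http://worldcup.kimonolabs.com/api/players'
# Placeholder key for illustration; the notebook reads the real one from KIMONO_API_KEY.
params = {'apikey': 'YOUR_API_KEY', 'limit': 1000}

# Build the query string that gets appended to players_url.
request_url = players_url + '?' + urlencode(params)
print(request_url)
```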
Create a data frame from the list, print its length and the first records.
```python
df = pd.DataFrame(players)
print(len(df))
df.head()
```
Before mapping the birth countries to their ISO 3166-1 alpha-3 (iso3) codes, I apply the following birth place corrections to the dataset. They are based on the feedback I received for this player migration visualization.
Player ID | Reference | Corrected birthCity | Corrected birthCountry |
---|---|---|---|
E363B6E1-0FA6-408B-AEB4-356EA10B8D10 | https://en.wikipedia.org/wiki/Isa%C3%A1c_Brizuela | San Jose | United States |
B7942D07-C3F5-4373-9240-2D38F2279E60 | https://en.wikipedia.org/wiki/Moussa_Sissoko | Le Blanc-Mesnil | France |
EEAC7973-9E5A-4615-B3F5-22BDC0601650 | https://en.wikipedia.org/wiki/Rio_Mavuba | | Born at sea |
```python
fixes = {
    'E363B6E1-0FA6-408B-AEB4-356EA10B8D10': {'birthCity': 'San Jose', 'birthCountry': 'United States'},
    'B7942D07-C3F5-4373-9240-2D38F2279E60': {'birthCity': 'Le Blanc-Mesnil', 'birthCountry': 'France'},
    'EEAC7973-9E5A-4615-B3F5-22BDC0601650': {'birthCity': '', 'birthCountry': 'Born at sea'}
}

# Overwrite the corrected columns in the row of each listed player.
for pid, fix in fixes.items():
    mask = df['id'] == pid
    for column, value in fix.items():
        df.loc[mask, column] = value
```
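The same corrections could also be applied to the raw `players` list before the data frame is built. The sketch below merges the fixes into each record by player id. The `apply_fixes` helper and the sample records are hypothetical. The `'?'` values stand in for whatever the API originally returned.

```python
# Hypothetical sample records; '?' marks placeholder values, not real API data.
players = [
    {'id': 'B7942D07-C3F5-4373-9240-2D38F2279E60', 'birthCity': '?', 'birthCountry': '?'},
    {'id': 'EEAC7973-9E5A-4615-B3F5-22BDC0601650', 'birthCity': '?', 'birthCountry': '?'},
    {'id': 'NOT-IN-FIXES', 'birthCity': 'Paris', 'birthCountry': 'France'},
]

fixes = {
    'B7942D07-C3F5-4373-9240-2D38F2279E60': {'birthCity': 'Le Blanc-Mesnil', 'birthCountry': 'France'},
    'EEAC7973-9E5A-4615-B3F5-22BDC0601650': {'birthCity': '', 'birthCountry': 'Born at sea'},
}

def apply_fixes(records, fixes):
    """Return new record dicts with any corrections for their id merged in."""
    return [{**rec, **fixes.get(rec['id'], {})} for rec in records]

fixed = apply_fixes(players, fixes)
print(fixed)
```

Records without an entry in `fixes` pass through unchanged, and the input list is not modified. Building the frame with `pd.DataFrame(apply_fixes(players, fixes))` would then make the `df.loc` loop unnecessary.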
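Define a function that maps country names to their iso3 codes. Then group the players by the iso3 code of their birth country and save the resulting data frame as the CSV file used to render the map. The full data frame is saved as well.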
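```python
cnames = gc.get_countries_by_names()

def get_iso3(name):
    """Return the ISO 3166-1 alpha-3 code for a birth country name."""
    if name == 'Born at sea':
        return None  # sorry Rio
    # Translate alternative spellings to the names used by geonamescache.
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']

df['iso3'] = df['birthCountry'].apply(get_iso3)
# Count players per birth country; rows whose iso3 is None are not grouped.
birth_countries = df.groupby('iso3').count()
birth_countries[['id']].to_csv('../static/data/csv/world-cup-2014-players-birth-countries.csv')
df.to_csv('world-cup-2014-players.csv')
```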
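`get_iso3` first translates alternative spellings through `mappings.country_names` and then looks up the name in geonamescache. `groupby('iso3')` leaves out the `None` returned for 'Born at sea', because pandas drops missing group keys by default. The following pure-Python sketch mirrors that logic and uses `collections.Counter` for the per-country counts. The small `ALIASES` and `ISO3_BY_NAME` tables are hypothetical stand-ins for geonamescache. Unlike the notebook's function, this version returns `None` for unknown names instead of raising `KeyError`.

```python
from collections import Counter

# Hypothetical stand-ins for mappings.country_names and gc.get_countries_by_names().
ALIASES = {'USA': 'United States'}
ISO3_BY_NAME = {'France': 'FRA', 'Algeria': 'DZA', 'United States': 'USA'}

def get_iso3(name):
    """Map a birth country name to its ISO 3166-1 alpha-3 code, or None if unknown."""
    if name == 'Born at sea':
        return None
    name = ALIASES.get(name, name)  # normalize alternative spellings first
    return ISO3_BY_NAME.get(name)

birth_countries = ['France', 'Algeria', 'France', 'USA', 'Born at sea']
# Count players per iso3 code, skipping None just as groupby skips missing keys.
counts = Counter(code for code in map(get_iso3, birth_countries) if code is not None)
print(counts.most_common())
```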
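List the players born in France, sorted by first name, together with the team each of them plays for.
```python
fr_born = df[df['birthCountry'] == 'France']
fr_born = fr_born.sort_values('firstName')  # DataFrame.sort no longer exists in current pandas
fr_born['firstName'] + ' ' + fr_born['lastName'] + ' - Team: ' + fr_born['nationality']
```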
Birth countries of the players on the Algerian team.
```python
df[df['nationality'] == 'Algeria']['birthCountry'].value_counts()
```
Birth countries of the players on the French team.
```python
df[df['nationality'] == 'France']['birthCountry'].value_counts()
```
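Instead of filtering one team at a time, the birth-country counts can be computed for every team in a single pass. In pandas, `pd.crosstab(df['nationality'], df['birthCountry'])` builds that table. The sketch below shows the same idea in plain Python, using a `defaultdict` of `Counter` objects over a few hypothetical records.

```python
from collections import Counter, defaultdict

# Hypothetical (nationality, birthCountry) pairs standing in for the players data.
records = [
    ('Algeria', 'France'),
    ('Algeria', 'Algeria'),
    ('Algeria', 'France'),
    ('France', 'France'),
    ('France', 'Born at sea'),
]

# team -> Counter of birth countries
birth_by_team = defaultdict(Counter)
for team, birth_country in records:
    birth_by_team[team][birth_country] += 1

for team, counts in sorted(birth_by_team.items()):
    print(team, counts.most_common())
```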
Create a CSV file with the players who were born in a country other than the one they play for.
```python
df_migrated = df[df['nationality'] != df['birthCountry']]
columns = ['birthCity', 'birthCountry', 'birthDate', 'firstName', 'image',
           'lastName', 'nationality', 'nickname']
df_migrated[columns].to_csv('world-cup-2014-migrated-players.csv', index=False)
```
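The filter compares team names and birth countries as plain strings. A player whose birth country was set to 'Born at sea' therefore always counts as migrated, because that value never equals a team name. The same would happen to any player whose team and birth country were spelled differently in the data. The sketch below shows a stricter check. It assumes both columns can be normalized with the same alias table. `ALIASES`, `NOT_A_COUNTRY`, `is_migrated` and the sample records are all hypothetical.

```python
# Hypothetical alias table mapping alternative spellings to one canonical name.
ALIASES = {'USA': 'United States'}
# Birth "countries" that cannot be compared with a team name.
NOT_A_COUNTRY = {'Born at sea'}

def normalize(name):
    return ALIASES.get(name, name)

def is_migrated(player):
    """True if the player's team differs from a known birth country."""
    if player['birthCountry'] in NOT_A_COUNTRY:
        return False  # no comparable birth country, so don't count as migrated
    return normalize(player['nationality']) != normalize(player['birthCountry'])

players = [
    {'lastName': 'player_a', 'nationality': 'USA', 'birthCountry': 'United States'},
    {'lastName': 'player_b', 'nationality': 'Algeria', 'birthCountry': 'France'},
    {'lastName': 'player_c', 'nationality': 'France', 'birthCountry': 'Born at sea'},
]
migrated = [p['lastName'] for p in players if is_migrated(p)]
print(migrated)
```

Whether a player born at sea should count as migrated is a judgment call, and the sketch simply leaves such records out. With the real data frame, the same function could be applied row-wise, or both columns could be mapped to iso3 codes before comparing.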
This post was written by Ramiro Gómez (@yaph) and published on June 15, 2014.