World Cup Players 2014

This notebook shows how to retrieve and pre-process data from kimonolabs about the 736 football players participating in the FIFA World Cup 2014.

In [1]:

import json
import os
import requests

import pandas as pd
from geonamescache import GeonamesCache, mappings

gc = GeonamesCache()

Get the Data

Get the player data from the Web service and save it in the players list.

In [2]:

players_url = 'http://worldcup.kimonolabs.com/api/players'
params = {
    'apikey': os.environ['KIMONO_API_KEY'],
    'limit': 1000
}
response = requests.get(players_url, params=params)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error body
players = json.loads(response.text)

Create a data frame from the list, print its length and the first records.

In [3]:

df = pd.DataFrame(players)
print(len(df))
df.head()

Out[3]:

	age	birthCity	birthCountry	birthDate	clubId	firstName	foot	heightCm	id	image	lastName	nationality	nickname	position	teamId	type
0	24	Yaoundé	Cameroon	1990-03-27T00:00:00.000Z	5AF524A1-830C-4D75-8C54-2D0BA1F9BE33	Nicolas Alexis Julio	Right	180	D9AD1E6D-4253-4B88-BB78-0F43E02AF016	http://cache.images.globalsportsmedia.com/socc...	N'Koulou N'Doubena	Cameroon	N. N'Koulou	Defender	DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D	Player	...
1	26	Douala	Cameroon	1987-09-09T00:00:00.000Z	35BCEEAF-37D3-4685-83C4-DDCA504E0653	Alexandre Dimitri	Both	180	A84540B7-37B6-416F-8C4D-8EAD55D113D9	http://cache.images.globalsportsmedia.com/socc...	Song-Billong	Cameroon	A. Song	Midfielder	DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D	Player	...
2	28	Yaoundé	Cameroon	1986-05-20T00:00:00.000Z	0CA624BC-83F7-4BEB-A99C-C69352E6C10D	Stéphane	Right	189	A6075C77-134D-4D33-A2A7-0EDE8CFBDC46	http://cache.images.globalsportsmedia.com/socc...	M'Bia Etoundi	Cameroon	S. M'Bia	Midfielder	DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D	Player	...
3	28	Yaoundé	Cameroon	1985-11-28T00:00:00.000Z	0EFD928C-6F64-44E1-A29C-2CD9D02F7863	Landry Joel Tsafack	Right	173	EC77DFDC-044A-43A5-A18D-8B931652883F	http://cache.images.globalsportsmedia.com/socc...	N'Guémo	Cameroon	L. N'Guémo	Midfielder	DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D	Player	...
4	31	Yaoundé	Cameroon	1983-05-29T00:00:00.000Z	13490DB2-38ED-4F4A-A743-05579038ABD3	Jean II	Right	173	4360641A-669A-4E94-8A92-2CA776159E71	http://cache.images.globalsportsmedia.com/socc...	Makoun	Cameroon	J. Makoun	Midfielder	DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D	Player	...

5 rows × 21 columns
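
To make the list-to-data-frame step concrete, here is a minimal sketch that assumes the service returns a JSON array of flat objects. The two records are a hand-made subset of the fields shown in the output above, not the real response, and the names `sample_text`, `sample_players` and `sample_df` are hypothetical.

```python
import json

import pandas as pd

# Hand-made stand-in for response.text: a JSON array of player objects,
# restricted to a few of the fields shown in the output above.
sample_text = '''[
  {"firstName": "Stéphane", "lastName": "M'Bia Etoundi",
   "birthCountry": "Cameroon", "nationality": "Cameroon", "heightCm": 189},
  {"firstName": "Jean II", "lastName": "Makoun",
   "birthCountry": "Cameroon", "nationality": "Cameroon", "heightCm": 173}
]'''

sample_players = json.loads(sample_text)  # a list of dicts
sample_df = pd.DataFrame(sample_players)  # one row per dict, one column per key

print(len(sample_df))
print(sample_df.columns.tolist())
```

Each key of the JSON objects becomes a column, which is why the real data frame carries fields such as `birthCountry` and `nationality` without any further parsing.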

The function that maps country names to their corresponding ISO3 values, get_iso3, is defined in the Birth Countries section below.

Correct Birth Place Errors

Based on the feedback I received on this player migration visualization, I apply the following birth place corrections to the dataset before continuing.

Player ID	Reference	Corrected birthCity	Corrected birthCountry
E363B6E1-0FA6-408B-AEB4-356EA10B8D10	https://en.wikipedia.org/wiki/Isa%C3%A1c_Brizuela	San Jose	United States
B7942D07-C3F5-4373-9240-2D38F2279E60	https://en.wikipedia.org/wiki/Moussa_Sissoko	Le Blanc-Mesnil	France
EEAC7973-9E5A-4615-B3F5-22BDC0601650	https://en.wikipedia.org/wiki/Rio_Mavuba		Born at sea

In [4]:

fixes = {
    'E363B6E1-0FA6-408B-AEB4-356EA10B8D10': {'birthCity': 'San Jose', 'birthCountry': 'United States'},
    'B7942D07-C3F5-4373-9240-2D38F2279E60': {'birthCity': 'Le Blanc-Mesnil', 'birthCountry': 'France'},
    'EEAC7973-9E5A-4615-B3F5-22BDC0601650': {'birthCity': '', 'birthCountry': 'Born at sea'}
}

# Overwrite the birth place fields of each affected player, matched by id.
for pid, fix in fixes.items():
    for k, v in fix.items():
        df.loc[df['id'] == pid, k] = v
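
The correction loop can be tried out on a small example first. The sketch below uses an invented data frame and invented ids (`toy`, `toy_fixes` and their values are assumptions for illustration only) and additionally reports ids that match no row, which a plain assignment would not do.

```python
import pandas as pd

# Toy data frame; the ids and values are invented for illustration.
toy = pd.DataFrame({
    'id': ['A1', 'B2', 'C3'],
    'birthCity': ['Paris', 'Lyon', 'Nice'],
    'birthCountry': ['France', 'France', 'France'],
})

# Hypothetical corrections keyed by id, same shape as the fixes dict above.
toy_fixes = {
    'B2': {'birthCity': 'San Jose', 'birthCountry': 'United States'},
    'ZZ': {'birthCity': 'Nowhere'},  # id that is not present in the data
}

for pid, fix in toy_fixes.items():
    mask = toy['id'] == pid  # boolean Series, True for the matching row(s)
    if not mask.any():
        print('no player with id', pid)
        continue
    for column, value in fix.items():
        toy.loc[mask, column] = value  # assign only where the mask is True

print(toy)
```

Only the row with id `B2` changes; the unknown id is reported and skipped.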

Birth Countries

Map each birth country to its iso3 value, count the players per iso3 value, and save the resulting data frame as a CSV file used for rendering the map.

In [5]:

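# Country records from geonamescache, keyed by country name.
cnames = gc.get_countries_by_names()

def get_iso3(name):
    """Return the iso3 value for a country name (None for 'Born at sea')."""
    if name == 'Born at sea':
        return None  # sorry Rio
    # Translate alternative country names to the ones geonamescache uses.
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']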

df['iso3'] = df['birthCountry'].apply(get_iso3)
birth_countries = df.groupby('iso3').count()
birth_countries[['id']].to_csv('../static/data/csv/world-cup-2014-players-birth-countries.csv')
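
What the grouping step produces is easier to see on a small example. This is a sketch on invented data (the `toy` frame and its values are assumptions); it also shows that a player without an iso3 value, like the one born at sea, does not appear in the counts.

```python
import pandas as pd

# Toy stand-in for the players data frame; values are invented.
toy = pd.DataFrame({
    'id': ['p1', 'p2', 'p3', 'p4'],
    'iso3': ['FRA', 'FRA', 'DZA', None],  # None: no country could be assigned
})

# count() tallies the non-null values of every column within each iso3 group;
# rows whose iso3 is missing are left out of the grouping by default.
counts = toy.groupby('iso3').count()
print(counts[['id']])
```

Selecting `[['id']]` keeps a single count column, so the CSV file holds one row per iso3 value with the number of players born there.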

In [6]:

df.to_csv('world-cup-2014-players.csv')

Players born in France

In [7]:

fr_born = df[df['birthCountry'] == 'France']
fr_born = fr_born.sort_values('firstName')
fr_born['firstName'] + ' ' + fr_born['lastName'] + ' - Team: ' + fr_born['nationality']

Out[7]:

30                Allan Romeo Nyom - Team: Cameroon
454                   André Ayew Pelé - Team: Ghana
411                Antoine Griezmann - Team: France
264                     Aïssa Mandi - Team: Algeria
562                     Bacary Sagna - Team: France
14             Benoît Assou-Ekotto - Team: Cameroon
644                   Blaise Matuidi - Team: France
134                    Carl Medjani - Team: Algeria
6           Charles-Hubert Itandje - Team: Cameroon
535               Cédric Si Mohamed - Team: Algeria
558                 Eliaquim Mangala - Team: France
531                  Faouzi Ghoulam - Team: Algeria
39      Giovanni-Guy Yann Sio - Team: Côte d'Ivoire
681       Gonzalo Gerardo Higuaín - Team: Argentina
283                    Hassan Yebda - Team: Algeria
396                      Hugo Lloris - Team: France
37     Jean-Daniel Akpa Akpro - Team: Côte d'Ivoire
323                       Jordan Ayew - Team: Ghana
400                    Karim Benzema - Team: France
560                Laurent Koscielny - Team: France
263      Liassine Cadamuro Bentaïba - Team: Algeria
595                        Loïc Rémy - Team: France
388                      Lucas Digne - Team: France
256                Madjid Bougherra - Team: Algeria
566                    Mamadou Sakho - Team: France
115                 Mathieu Valbuena - Team: France
395                  Mathieu Debuchy - Team: France
286                     Mehdi Lacen - Team: Algeria
533              Mehdi Mostefa Sbaa - Team: Algeria
379                 Mickaël Landreau - Team: France
459              Morgan Schneiderlin - Team: France
537                   Moussa Sissoko - Team: France
282                  Nabil Bentaleb - Team: Algeria
288                    Nabil Ghilas - Team: Algeria
384                   Olivier Giroud - Team: France
406                       Paul Pogba - Team: France
563                   Raphaël Varane - Team: France
262              Raïs M'Bolhi Ouhab - Team: Algeria
629                    Riyad Mahrez - Team: Algeria
389                     Rémy Cabella - Team: France
7                     Sammy Ndjock - Team: Cameroon
280             Saphir Sliti Taïder - Team: Algeria
317                Sofiane Feghouli - Team: Algeria
633          Souleymane Bamba - Team: Côte d'Ivoire
387                 Stéphane Ruffier - Team: France
489                  Yacine Brahimi - Team: Algeria
404                     Yohan Cabaye - Team: France
dtype: object

Birth countries of the Algerian players.

In [8]:

df[df['nationality'] == 'Algeria']['birthCountry'].value_counts()

Out[8]:

France     16
Algeria     7
dtype: int64

Birth countries of the French players.

In [9]:

df[df['nationality'] == 'France']['birthCountry'].value_counts()

Out[9]:

France         21
Born at sea     1
Senegal         1
dtype: int64
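
The per-team tallies above can also be computed for all teams at once. The following is a sketch on invented data using `pd.crosstab`, not the approach taken in this notebook; the `toy` frame and its values are assumptions.

```python
import pandas as pd

# Toy data; each row is one player with the team played for and the birth country.
toy = pd.DataFrame({
    'nationality': ['Algeria', 'Algeria', 'Algeria', 'France', 'France'],
    'birthCountry': ['France', 'France', 'Algeria', 'France', 'Senegal'],
})

# Rows: team, columns: birth country, cells: number of players.
table = pd.crosstab(toy['nationality'], toy['birthCountry'])
print(table)
```

Each row of `table` corresponds to one `value_counts()` call like the ones above, with zeros for combinations that do not occur.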

Migrated Players

Create a CSV file with the players who were born in a country other than the one they play for.

In [10]:

df_migrated = df[df['nationality'] != df['birthCountry']]
df_migrated[['birthCity', 'birthCountry', 'birthDate', 'firstName', 'image', 'lastName', 'nationality', 'nickname']].to_csv('world-cup-2014-migrated-players.csv', index=False)
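
The filter is a plain string comparison between two columns, which the sketch below illustrates on invented data (the `toy` frame and its values are assumptions). One consequence is that a birth country of 'Born at sea' always counts as different from the team, and so would any pair of names that are spelled differently in the two columns.

```python
import pandas as pd

# Toy data; names and values are invented for illustration.
toy = pd.DataFrame({
    'lastName': ['A', 'B', 'C'],
    'nationality': ['France', 'Algeria', 'France'],
    'birthCountry': ['France', 'France', 'Born at sea'],
})

# Element-wise inequality yields a boolean Series used to select rows.
migrated = toy[toy['nationality'] != toy['birthCountry']]
print(migrated['lastName'].tolist())
```

Players B and C are selected; player A, born in the country played for, is not.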
