World Cup Players 2014

This notebook shows how to retrieve and pre-process data from the kimonolabs World Cup API on the 736 football players taking part in the 2014 FIFA World Cup.

In [1]:
import json
import os
import requests

import pandas as pd
from geonamescache import GeonamesCache, mappings

gc = GeonamesCache()

Get the Data

Get the player data from the Web service and save it in the players list.

In [2]:
players_url = 'http://worldcup.kimonolabs.com/api/players'
params = {
    'apikey': os.environ['KIMONO_API_KEY'],
    'limit': 1000
}
response = requests.get(players_url, params=params)
players = json.loads(response.text)
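
The request above assumes the service answers with a valid JSON list. As a minimal sketch of a more defensive variant, the hypothetical helper `fetch_players` below (not part of the original notebook) takes the HTTP getter as an argument, so it can be tried without network access. It assumes the getter behaves like `requests.get` and that the API returns a top-level list, as the 736-row data frame further down suggests.

```python
import json


def fetch_players(get, url, api_key, limit=1000, timeout=10):
    """Fetch the player list with an injected getter that mimics requests.get.

    Hypothetical helper for illustration; the original notebook calls
    requests.get directly.
    """
    response = get(url, params={'apikey': api_key, 'limit': limit}, timeout=timeout)
    # Fail early on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    players = json.loads(response.text)
    # Assumption: the endpoint returns a top-level JSON list of player records.
    if not isinstance(players, list):
        raise ValueError('expected a JSON list of players, got %s' % type(players).__name__)
    return players
```

With `requests` installed, this could be called as `fetch_players(requests.get, players_url, os.environ['KIMONO_API_KEY'])`.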

Create a data frame from the list, print its length and the first records.

In [3]:
df = pd.DataFrame(players)
print(len(df))
df.head()
736

Out[3]:
age assists birthCity birthCountry birthDate clubId firstName foot goals heightCm id image lastName nationality nickname ownGoals penaltyGoals position teamId type
0 24 0 Yaoundé Cameroon 1990-03-27T00:00:00.000Z 5AF524A1-830C-4D75-8C54-2D0BA1F9BE33 Nicolas Alexis Julio Right 0 180 D9AD1E6D-4253-4B88-BB78-0F43E02AF016 http://cache.images.globalsportsmedia.com/socc... N'Koulou N'Doubena Cameroon N. N'Koulou 0 0 Defender DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D Player ...
1 26 0 Douala Cameroon 1987-09-09T00:00:00.000Z 35BCEEAF-37D3-4685-83C4-DDCA504E0653 Alexandre Dimitri Both 0 180 A84540B7-37B6-416F-8C4D-8EAD55D113D9 http://cache.images.globalsportsmedia.com/socc... Song-Billong Cameroon A. Song 0 0 Midfielder DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D Player ...
2 28 0 Yaoundé Cameroon 1986-05-20T00:00:00.000Z 0CA624BC-83F7-4BEB-A99C-C69352E6C10D Stéphane Right 0 189 A6075C77-134D-4D33-A2A7-0EDE8CFBDC46 http://cache.images.globalsportsmedia.com/socc... M'Bia Etoundi Cameroon S. M'Bia 0 0 Midfielder DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D Player ...
3 28 0 Yaoundé Cameroon 1985-11-28T00:00:00.000Z 0EFD928C-6F64-44E1-A29C-2CD9D02F7863 Landry Joel Tsafack Right 0 173 EC77DFDC-044A-43A5-A18D-8B931652883F http://cache.images.globalsportsmedia.com/socc... N'Guémo Cameroon L. N'Guémo 0 0 Midfielder DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D Player ...
4 31 0 Yaoundé Cameroon 1983-05-29T00:00:00.000Z 13490DB2-38ED-4F4A-A743-05579038ABD3 Jean II Right 0 173 4360641A-669A-4E94-8A92-2CA776159E71 http://cache.images.globalsportsmedia.com/socc... Makoun Cameroon J. Makoun 0 0 Midfielder DF25ABB8-37EB-4C2A-8B6C-BDA53BF5A74D Player ...

5 rows × 21 columns

A helper function, get_iso3, defined in the Birth Countries section, maps country names to their corresponding iso3 values.

Correct Birth Place Errors

Based on feedback I received on this player migration visualization, I apply the following birth place corrections to the dataset before continuing.

Player ID                             Reference                                          Corrected birthCity  Corrected birthCountry
E363B6E1-0FA6-408B-AEB4-356EA10B8D10  https://en.wikipedia.org/wiki/Isa%C3%A1c_Brizuela  San Jose             United States
B7942D07-C3F5-4373-9240-2D38F2279E60  https://en.wikipedia.org/wiki/Moussa_Sissoko       Le Blanc-Mesnil      France
EEAC7973-9E5A-4615-B3F5-22BDC0601650  https://en.wikipedia.org/wiki/Rio_Mavuba           (none)               Born at sea
In [4]:
fixes = {
    'E363B6E1-0FA6-408B-AEB4-356EA10B8D10': {'birthCity': 'San Jose', 'birthCountry': 'United States'},
    'B7942D07-C3F5-4373-9240-2D38F2279E60': {'birthCity': 'Le Blanc-Mesnil', 'birthCountry': 'France'},
    'EEAC7973-9E5A-4615-B3F5-22BDC0601650': {'birthCity': '', 'birthCountry': 'Born at sea'}
}

for pid, fix in fixes.items():
    # Select the player's row by ID and overwrite each corrected field.
    for k, v in fix.items():
        df.loc[df['id'] == pid, k] = v
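
The loop above silently does nothing when a player ID is missing from the data frame. As a minimal sketch, the hypothetical helper `apply_fixes` below (not in the original notebook) applies the same kind of corrections but raises an error unless each ID matches exactly one row. The toy IDs and wrong values are made up for illustration.

```python
import pandas as pd


def apply_fixes(frame, fixes, id_col='id'):
    """Return a copy of `frame` with per-record corrections applied.

    Hypothetical helper: raises KeyError unless each ID matches one row.
    """
    fixed = frame.copy()
    for pid, fix in fixes.items():
        mask = fixed[id_col] == pid
        if mask.sum() != 1:
            raise KeyError('expected exactly one row for id %r, found %d' % (pid, mask.sum()))
        for column, value in fix.items():
            fixed.loc[mask, column] = value
    return fixed


# Toy data with made-up IDs and values, for illustration only.
toy = pd.DataFrame({
    'id': ['p1', 'p2'],
    'birthCity': ['Wrongville', 'Paris'],
    'birthCountry': ['Nowhere', 'France'],
})
toy_fixed = apply_fixes(toy, {'p1': {'birthCity': 'San Jose', 'birthCountry': 'United States'}})
```

Working on a copy keeps the uncorrected data available for comparison; the notebook itself modifies df in place.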

Birth Countries

Map each birth country to its iso3 value, count the players per value, and save the resulting data frame as the CSV file used for rendering the map.

In [5]:
cnames = gc.get_countries_by_names()
def get_iso3(name):
    if name == 'Born at sea':
        return None # sorry Rio
    if name in mappings.country_names:
        name = mappings.country_names[name]
    return cnames[name]['iso3']

df['iso3'] = df['birthCountry'].apply(get_iso3)
birth_countries = df.groupby('iso3').count()
birth_countries[['id']].to_csv('../static/data/csv/world-cup-2014-players-birth-countries.csv')
In [6]:
df.to_csv('world-cup-2014-players.csv')
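
The get_iso3 function above raises a KeyError for any country name found in neither lookup. The following is a minimal sketch of a tolerant variant that returns None instead and reports the unmatched names. The dictionaries `TOY_COUNTRIES` and `TOY_ALIASES` are small hypothetical stand-ins for `gc.get_countries_by_names()` and `mappings.country_names`; their contents are assumptions, not the real geonamescache data.

```python
from collections import Counter

# Hypothetical stand-ins for the geonamescache lookups used in the notebook.
TOY_COUNTRIES = {
    'France': {'iso3': 'FRA'},
    'Cameroon': {'iso3': 'CMR'},
    'Ivory Coast': {'iso3': 'CIV'},
}
TOY_ALIASES = {"Côte d'Ivoire": 'Ivory Coast'}


def get_iso3_or_none(name, countries=TOY_COUNTRIES, aliases=TOY_ALIASES):
    """Return the iso3 code for a country name, or None if it is unknown."""
    name = aliases.get(name, name)  # normalize alternative spellings first
    country = countries.get(name)
    return country['iso3'] if country else None


birth_countries = ['France', "Côte d'Ivoire", 'Born at sea', 'France', 'Atlantis']
codes = [get_iso3_or_none(name) for name in birth_countries]

# Names without a code, so they can be reviewed rather than crash the run.
unmatched = sorted({n for n, c in zip(birth_countries, codes) if c is None})
# Players per iso3 code, skipping records without a code.
counts = Counter(code for code in codes if code is not None)
```

Skipping None here mirrors what pandas does in the cell above: groupby leaves out rows whose key is missing by default, which is why the 'Born at sea' record does not appear in the birth countries CSV.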

Players born in France

In [7]:
fr_born = df[df['birthCountry'] == 'France']
fr_born = fr_born.sort_values('firstName')  # DataFrame.sort was removed in later pandas versions
fr_born['firstName'] + ' ' + fr_born['lastName'] + ' - Team: ' + fr_born['nationality']
Out[7]:
30                Allan Romeo Nyom - Team: Cameroon
454                   André Ayew Pelé - Team: Ghana
411                Antoine Griezmann - Team: France
264                     Aïssa Mandi - Team: Algeria
562                     Bacary Sagna - Team: France
14             Benoît Assou-Ekotto - Team: Cameroon
644                   Blaise Matuidi - Team: France
134                    Carl Medjani - Team: Algeria
6           Charles-Hubert Itandje - Team: Cameroon
535               Cédric Si Mohamed - Team: Algeria
558                 Eliaquim Mangala - Team: France
531                  Faouzi Ghoulam - Team: Algeria
39      Giovanni-Guy Yann Sio - Team: Côte d'Ivoire
681       Gonzalo Gerardo Higuaín - Team: Argentina
283                    Hassan Yebda - Team: Algeria
396                      Hugo Lloris - Team: France
37     Jean-Daniel Akpa Akpro - Team: Côte d'Ivoire
323                       Jordan Ayew - Team: Ghana
400                    Karim Benzema - Team: France
560                Laurent Koscielny - Team: France
263      Liassine Cadamuro Bentaïba - Team: Algeria
595                        Loïc Rémy - Team: France
388                      Lucas Digne - Team: France
256                Madjid Bougherra - Team: Algeria
566                    Mamadou Sakho - Team: France
115                 Mathieu Valbuena - Team: France
395                  Mathieu Debuchy - Team: France
286                     Mehdi Lacen - Team: Algeria
533              Mehdi Mostefa Sbaa - Team: Algeria
379                 Mickaël Landreau - Team: France
459              Morgan Schneiderlin - Team: France
537                   Moussa Sissoko - Team: France
282                  Nabil Bentaleb - Team: Algeria
288                    Nabil Ghilas - Team: Algeria
384                   Olivier Giroud - Team: France
406                       Paul Pogba - Team: France
563                   Raphaël Varane - Team: France
262              Raïs M'Bolhi Ouhab - Team: Algeria
629                    Riyad Mahrez - Team: Algeria
389                     Rémy Cabella - Team: France
7                     Sammy Ndjock - Team: Cameroon
280             Saphir Sliti Taïder - Team: Algeria
317                Sofiane Feghouli - Team: Algeria
633          Souleymane Bamba - Team: Côte d'Ivoire
387                 Stéphane Ruffier - Team: France
489                  Yacine Brahimi - Team: Algeria
404                     Yohan Cabaye - Team: France
dtype: object

Birth countries of the players on the Algerian team.

In [8]:
df[df['nationality'] == 'Algeria']['birthCountry'].value_counts()
Out[8]:
France     16
Algeria     7
dtype: int64

Birth countries of the players on the French team.

In [9]:
df[df['nationality'] == 'France']['birthCountry'].value_counts()
Out[9]:
France         21
Born at sea     1
Senegal         1
dtype: int64
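
The two per-team counts above can also be produced for all teams at once. As a sketch on made-up toy rows (not the real dataset), `pd.crosstab` builds a table with one row per nationality and one column per birth country:

```python
import pandas as pd

# Toy rows for illustration only; the real data frame has 736 players.
toy = pd.DataFrame({
    'nationality': ['Algeria', 'Algeria', 'Algeria', 'France', 'France'],
    'birthCountry': ['France', 'France', 'Algeria', 'France', 'Senegal'],
})

# Rows are teams, columns are birth countries, cells are player counts.
table = pd.crosstab(toy['nationality'], toy['birthCountry'])
```

On the notebook's data the equivalent call would be `pd.crosstab(df['nationality'], df['birthCountry'])`.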

Migrated Players

Create a CSV file with the players who were born in a country other than the one they play for.

In [10]:
df_migrated = df[df['nationality'] != df['birthCountry']]
df_migrated[['birthCity', 'birthCountry', 'birthDate', 'firstName', 'image', 'lastName', 'nationality', 'nickname']].to_csv('world-cup-2014-migrated-players.csv', index=False)
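
The filter above is a plain string comparison, so a placeholder such as 'Born at sea', or any spelling difference between the two columns, also counts as migrated. The sketch below uses made-up toy rows to show the same comparison and to summarize it per team; the column names `sum` and `count` come from the aggregation and are not part of the original notebook.

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    'nationality': ['Algeria', 'Algeria', 'France', 'France'],
    'birthCountry': ['France', 'Algeria', 'France', 'Born at sea'],
})

# True where the birth country string differs from the team name.
migrated = toy['nationality'] != toy['birthCountry']

# Per team: number of migrated players ('sum') and squad size ('count').
summary = (
    toy.assign(migrated=migrated.astype(int))
       .groupby('nationality')['migrated']
       .agg(['sum', 'count'])
)
```

In the toy data the 'Born at sea' record is counted as migrated for France, just as it would be in the CSV written above.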

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on June 15, 2014.

