Dental Health US

Notebook to retrieve and cleanup statistics on dental health in the US published by Bloomberg.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import geonamescache

from lxml import html

gc = geonamescache.GeonamesCache()

url ='http://www.bloomberg.com/visual-data/best-and-worst/worst-dental-health-states'
xpath = '//*[@class="hid"]'

tree = html.parse(url)
table = tree.xpath(xpath)[0]
raw_html = html.tostring(table)

df = pd.read_html(raw_html, header=0, parse_dates=[1])[0]
df.head()
Out[1]:
Rank State Dental health score i Percentage of population living in a dental HPSA i Percentage of adults who visited a dentist in past year Percentage of seniors with no natural teeth i
0 1 Mississippi t 88.4 57.8% 58.1% 27.1%
1 2 Louisiana t 74.6 50.6% 63.9% 25.6%
2 3 West Virginia t 70.2 14.5% 60.5% 36.0%
3 4 Tennessee t 67.0 27.1% 66.3% 33.7%
4 5 Alabama t 62.4 31.8% 64.7% 25.5%

Cleanup columns by removing unnecessary characters and convert percentage value into float.

In [2]:
def fl(x):
    return float(x.rstrip('%'))

df['State'] = df['State'].apply(lambda x: x.rstrip('t').strip())
df['Dental health score'] = df['Dental health score  i']
df['% of population living in a dental HPSA'] = df['Percentage of population living in a dental HPSA  i'].apply(fl)
df['% of adults who visited a dentist in past year'] = df['Percentage of adults who visited a dentist in past year'].apply(fl)
df['% of seniors with no natural teeth'] = df['Percentage of seniors with no natural teeth  i'].apply(fl)
df.head()
Out[2]:
Rank State Dental health score i Percentage of population living in a dental HPSA i Percentage of adults who visited a dentist in past year Percentage of seniors with no natural teeth i Dental health score % of population living in a dental HPSA % of adults who visited a dentist in past year % of seniors with no natural teeth
0 1 Mississippi 88.4 57.8% 58.1% 27.1% 88.4 57.8 58.1 27.1
1 2 Louisiana 74.6 50.6% 63.9% 25.6% 74.6 50.6 63.9 25.6
2 3 West Virginia 70.2 14.5% 60.5% 36.0% 70.2 14.5 60.5 36.0
3 4 Tennessee 67.0 27.1% 66.3% 33.7% 67.0 27.1 66.3 33.7
4 5 Alabama 62.4 31.8% 64.7% 25.5% 62.4 31.8 64.7 25.5

Map state names to FIPS codes and delete columns not needed in the Web based map.

In [3]:
states = gc.get_dataset_by_key(gc.get_us_states(), 'name')
df['fips'] = df['State'].apply(lambda x: 'US' + states[x]['fips'])

del df['Rank'], df['State'], df['Dental health score  i'], df['Percentage of population living in a dental HPSA  i']
del df['Percentage of adults who visited a dentist in past year'], df['Percentage of seniors with no natural teeth  i']
df.head(5)
Out[3]:
Dental health score % of population living in a dental HPSA % of adults who visited a dentist in past year % of seniors with no natural teeth fips
0 88.4 57.8 58.1 27.1 US28
1 74.6 50.6 63.9 25.6 US22
2 70.2 14.5 60.5 36.0 US54
3 67.0 27.1 66.3 33.7 US47
4 62.4 31.8 64.7 25.5 US01

Save as CSV.

In [4]:
df.to_csv('../static/data/csv/dental-health-us-states.csv', encoding='utf-8', index=False)

Map Preview


Ramiro Gómez

About this post

This post was written by Ramiro Gómez (@yaph) and published on August 07, 2014.


blog comments powered by Disqus