Visualizing the dental workforce of OECD countries I

The Organisation for Economic Co-operation and Development host a database with extensive data. In this post we will do some visualizations to compare the number of dentists in each country.

Packages used:

  • tidyverse
  • gghighlight
  • kableExtra

First we load the data. Now there is a package (OECD) able to extract the datasets, but I will use a local copy:

dent_oecd <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vStv7Pr69DtRKv6Nw6gVBep8hbT3pEeO6B1vNwxK_1DUHgpoTgbuRpZ4SvgtHFQnBZJVGeeQVyRuXZl/pub?gid=1330297229&single=true&output=csv")
## Parsed with column specification:
## cols(
##   VAR = col_character(),
##   Variable = col_character(),
##   UNIT = col_character(),
##   Measure = col_character(),
##   COU = col_character(),
##   Country = col_character(),
##   YEA = col_double(),
##   Year = col_double(),
##   Value = col_double(),
##   `Flag Codes` = col_character(),
##   Flags = col_character()
## )

Always is preferable to take a look the data and its structure:

head(dent_oecd)
## # A tibble: 6 x 11
##   VAR   Variable UNIT  Measure COU   Country   YEA  Year Value `Flag Codes`
##   <chr> <chr>    <chr> <chr>   <chr> <chr>   <dbl> <dbl> <dbl> <chr>       
## 1 HEDU… Dentist… NOMB… Number  AUS   Austra…  2008  2008   552 <NA>        
## 2 HEDU… Dentist… NOMB… Number  AUS   Austra…  2009  2009   632 <NA>        
## 3 HEDU… Dentist… NOMB… Number  AUS   Austra…  2010  2010   665 <NA>        
## 4 HEDU… Dentist… NOMB… Number  AUS   Austra…  2011  2011   684 <NA>        
## 5 HEDU… Dentist… NOMB… Number  AUS   Austra…  2012  2012   723 <NA>        
## 6 HEDU… Dentist… NOMB… Number  AUS   Austra…  2013  2013   894 <NA>        
## # … with 1 more variable: Flags <chr>
glimpse(dent_oecd)
## Observations: 956
## Variables: 11
## $ VAR          <chr> "HEDUDNGR", "HEDUDNGR", "HEDUDNGR", "HEDUDNGR", "HE…
## $ Variable     <chr> "Dentists graduates", "Dentists graduates", "Dentis…
## $ UNIT         <chr> "NOMBRENB", "NOMBRENB", "NOMBRENB", "NOMBRENB", "NO…
## $ Measure      <chr> "Number", "Number", "Number", "Number", "Number", "…
## $ COU          <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "A…
## $ Country      <chr> "Australia", "Australia", "Australia", "Australia",…
## $ YEA          <dbl> 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 200…
## $ Year         <dbl> 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 200…
## $ Value        <dbl> 552, 632, 665, 684, 723, 894, 861, 957, 118, 134, 1…
## $ `Flag Codes` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Flags        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

And a summary:

summary(dent_oecd)
##      VAR              Variable             UNIT          
##  Length:956         Length:956         Length:956        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    Measure              COU              Country               YEA      
##  Length:956         Length:956         Length:956         Min.   :2008  
##  Class :character   Class :character   Class :character   1st Qu.:2010  
##  Mode  :character   Mode  :character   Mode  :character   Median :2012  
##                                                           Mean   :2012  
##                                                           3rd Qu.:2014  
##                                                           Max.   :2016  
##       Year          Value            Flag Codes           Flags          
##  Min.   :2008   Min.   :     0.00   Length:956         Length:956        
##  1st Qu.:2010   1st Qu.:     1.34   Class :character   Class :character  
##  Median :2012   Median :     6.00   Mode  :character   Mode  :character  
##  Mean   :2012   Mean   :  3064.18                                        
##  3rd Qu.:2014   3rd Qu.:  1071.00                                        
##  Max.   :2016   Max.   :100994.00

I will use the Dentists graduates, and Per 100 000 population so let’s filter and replace the former data set

dent_oecd <- dent_oecd %>% 
  filter(Variable == "Dentists graduates", 
         Measure == "Per 100 000 population")

There are some useless columns, so unselect them

dent_oecd <- select(dent_oecd, -c(VAR, UNIT, COU, YEA,`Flag Codes`, Flags))

Let’s plot. Since is a temporal trend, a line plot could be a good idea:

dent_oecd %>%  # this means: take the dent_oecd dataframe, and
  ggplot(aes(x = Year, y = Value, color = Country)) + #plot the year in the X axis, the value in the y axis and color the lines per country
  geom_line() + 
  labs(y = "Dentists graduation per 100 000 hab")

Hmmm….the problem here is that isn’t easy to differentiate the countries.

We have to choices: 1. plot each country separate, or 2. highlight one or few countries from the group we can use the package

1. Plotting one graph per country: faceting.

dent_oecd  %>% 
    ggplot(aes(x = Year, y = Value)) + # we delete the color = Country, since is not necessary
  geom_line() + 
  facet_wrap(~Country) + #this means: plot separately each country 
  theme(axis.text.x = element_text(colour="grey20",size=6)) + 
  labs(y = "Dentists graduation per 100 000 hab")

That’s much better. Some observations

  • seem in general dentist graduating per 100 000 habitants has been stable in OECD countries,
  • Chile seems to be increasing at a higher rate his graduation rate. Also Lithuania seems to share this trend.
  • The line in Portugal is higher than in the rest of the countries
  • Few countries decrease their graduation rate, seems that Denmark, Finland and Norway share this trend

TODO: learn how to order the plots, e.g. from higher rate to lower

We can transform the Value var using the log10. With ggplot2 we can directly add the transformation to any axis, as y-axis in this case. We use scale_y_log10(). Sometimes is worth to use to expand some small differences or to graph data with orders of magnitude of difference.

dent_oecd  %>% 
  filter(Year < 2016) %>% 
    ggplot(aes(x = Year, y = Value)) + 
  geom_line() + 
  facet_wrap(~Country) + 
  theme(axis.text.x = element_text(size=6)) + 
  labs(y = "log10 Dentists graduation per 100 000 hab") + 
  scale_y_log10()

2. Using gghighlight

gghighlight is a package that highlight ggplot’s Lines and Points with Predicates

Remember the first graph:

dent_oecd %>%  
  ggplot(aes(x = Year, y = Value, color = Country)) + 
  geom_line() + 
  labs(y = "Dentists graduation per 100 000 hab")

With gghighlight we can, as the name implies, highlight some lines according to a criteria or predicate.

dent_oecd %>% 
  filter(Year < 2016) %>% 
  gghighlight_line(aes(x = Year, y = Value, color = Country), 
                   predicate = mean(Value) > 5) + # this is the threshold
  labs(y = "Dentists graduation per 100 000 hab") 

Pivot table

Pivot table in the dplyr way

dent_oecd %>% 
  group_by(Country, Year) %>% 
  summarise(average = mean(Value)) %>% 
  spread(Year, average)
## # A tibble: 36 x 10
## # Groups:   Country [36]
##    Country   `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016`
##    <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Australia   2.6    2.91   3.02   3.06   3.18   3.87   3.67   4.02  NA   
##  2 Austria     1.42   1.61   1.57   1.5    1.28   1.25   1.65  NA     NA   
##  3 Belgium     1.14   1.44   1.57   1.32   1.46   1.62   1.95   2.05   2.09
##  4 Canada      1.53   1.52   1.47   1.48   1.3    1.51  NA     NA     NA   
##  5 Chile       2.98   4.32   4.11   4.82   5.57   5.86   6.35   7.5    7.17
##  6 Czech Re…   1.59   3.9    3.37   2.86   2.49   3.15   3.26   3.75  NA   
##  7 Denmark     4.28   3.2    3.37   3.41   2.92   2.8    2.73  NA     NA   
##  8 Estonia     3.07   2.17   2.4    2.34   1.74   2.12   1.9    2.13  NA   
##  9 Finland     1.34   1.35   2.44   2.56   3.27   2.87   2.38  NA     NA   
## 10 France      1.36   1.33   1.49   1.72   1.57   1.66   1.48   1.88  NA   
## # … with 26 more rows

Now enhance the table with the package kableExtra:

install.packages("kableExtra")

library(kableExtra)

## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Addf the average as reference value

dent_oecd_avg <- dent_oecd %>% 
  group_by(Year) %>% 
  summarise(Value = mean(Value, na.omit = TRUE)) # create a new dataframe with the average per year  

dent_oecd_avg[,"Country"] <- "OECD average"  # add a column and fill with a string
dent_oecd <- bind_rows(dent_oecd_avg, dent_oecd) # bind the dataframes by columns
rm(dent_oecd_avg) # remove the average dataset

First I create the table and store as an object called table_dent

table_dent <- dent_oecd %>% 
  group_by(Country, Year) %>% 
  summarise(average = mean(Value)) %>% 
  spread(Year, average) 

then apply the kable enhance to the object table_dent to show a nice formatted table:

knitr::kable(table_dent, 
             caption = "Dentists graduating per 100 000 habs. OECD countries", 
             digits = 1)
(#tab:table nice)Dentists graduating per 100 000 habs. OECD countries
Country 2008 2009 2010 2011 2012 2013 2014 2015 2016
Australia 2.6 2.9 3.0 3.1 3.2 3.9 3.7 4.0 NA
Austria 1.4 1.6 1.6 1.5 1.3 1.2 1.6 NA NA
Belgium 1.1 1.4 1.6 1.3 1.5 1.6 2.0 2.0 2.1
Canada 1.5 1.5 1.5 1.5 1.3 1.5 NA NA NA
Chile 3.0 4.3 4.1 4.8 5.6 5.9 6.3 7.5 7.2
Czech Republic 1.6 3.9 3.4 2.9 2.5 3.1 3.3 3.8 NA
Denmark 4.3 3.2 3.4 3.4 2.9 2.8 2.7 NA NA
Estonia 3.1 2.2 2.4 2.3 1.7 2.1 1.9 2.1 NA
Finland 1.3 1.4 2.4 2.6 3.3 2.9 2.4 NA NA
France 1.4 1.3 1.5 1.7 1.6 1.7 1.5 1.9 NA
Germany 2.2 2.2 2.5 2.7 3.0 2.8 2.9 2.8 NA
Greece 2.5 2.4 2.3 2.3 1.6 1.8 NA NA NA
Hungary 1.8 2.2 2.5 2.8 3.5 3.0 3.4 3.1 NA
Iceland 1.9 1.9 1.9 1.9 1.9 1.8 NA NA NA
Ireland 1.7 1.6 1.5 1.5 1.7 1.9 1.8 1.9 NA
Israel 0.8 0.7 1.1 1.2 0.8 0.8 0.7 0.6 NA
Italy 1.8 1.6 1.7 1.4 1.4 1.4 0.6 1.0 NA
Japan 1.9 2.0 1.8 1.9 1.8 1.9 1.9 1.6 1.6
Korea 1.8 1.5 1.7 1.8 1.9 2.2 2.6 2.6 2.9
Latvia 1.3 1.6 1.6 2.1 1.8 2.5 2.0 2.5 2.3
Lithuania 4.4 4.7 5.2 4.7 5.2 5.5 5.8 6.4 NA
Luxembourg 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Mexico 4.5 4.7 4.6 4.7 5.0 3.8 4.3 4.0 4.3
Netherlands 1.5 1.5 1.5 1.7 1.4 1.1 1.1 1.4 1.4
New Zealand 1.8 2.0 2.1 2.1 2.4 2.3 2.3 2.1 2.2
Norway 2.2 2.7 2.3 2.7 3.0 2.5 2.3 2.1 NA
OECD average 2.2 2.2 2.3 2.4 2.3 2.5 2.5 2.8 2.7
Poland 2.2 2.5 2.5 2.5 2.5 2.6 2.4 2.5 NA
Portugal 7.1 7.1 7.5 6.8 5.6 5.9 6.2 6.7 NA
Slovak Republic 0.8 0.8 1.2 1.5 1.5 1.8 1.7 NA NA
Slovenia 2.1 1.7 2.0 1.8 2.0 3.4 2.6 3.3 NA
Spain 3.2 2.8 3.0 3.0 3.3 3.5 3.8 3.8 NA
Sweden 2.0 2.1 2.4 2.3 2.1 2.2 2.5 2.5 NA
Switzerland 1.5 1.6 1.4 1.3 1.3 1.3 1.2 NA NA
Turkey 1.4 1.3 1.3 1.5 1.4 1.8 2.0 2.2 NA
United Kingdom NA 1.7 1.8 1.9 1.9 1.9 1.8 1.9 NA
United States 1.6 1.6 1.6 1.6 1.7 1.7 1.7 1.8 NA

More information about ggplot2

comments powered by Disqus