Introduction
Tables allow you to explore and summarize data efficiently. While graphs are more intuitive for discovering relationships and trends, tables provide detailed information and are the natural format for presenting descriptive statistics and data summaries.
Scientific articles in medicine usually begin with a table that describes the characteristics of the patient sample, known as Table 1. In this post, we will summarize data with base R, the tidyverse and the janitor, expss and table1 packages, and build an example of Table 1 using the NHANES database.
Packages used
Before using each of these packages, you must install it with the command
install.packages("package")
and then load it with
library(package)
Once a package is installed, you can also call one of its functions without loading the whole package, using the format
package::command()
For example, after installing the janitor package, you can use its clean_names function with
janitor::clean_names(dataset)
I suggest using the pacman package to manage other packages. It can be installed with
install.packages("pacman")
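In essence, pacman::p_load() installs any package that is missing and then loads it. Below is a rough base-R sketch of that install-if-missing pattern; missing_pkgs() and load_or_install() are hypothetical helpers written for illustration, not functions from pacman or any other package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  # requireNamespace() returns FALSE (instead of stopping) when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach every package.
load_or_install <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  # character.only = TRUE lets library() take the package name as a string
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage, roughly what pacman::p_load(NHANES, janitor, table1) does:
# load_or_install(c("NHANES", "janitor", "table1"))
```

pacman adds more conveniences than this (for example, handling packages from other sources), so treat the sketch only as an explanation of the basic idea.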
Dataset
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. More information is available at https://wwwn.cdc.gov/nchs/nhanes/tutorials/default.aspx
# install.packages("NHANES")
library(NHANES)
data(NHANES)
Tables with base R
One variable table
Univariate tables are useful for detecting the presence of errors in coding and for obtaining a general summary of the data.
The general syntax to refer to a column (variable) in R is dataset$variable. For example, to select the column Gender from the NHANES data frame, the command is
table(NHANES$Gender)
##
## female male
## 5020 4980
To add the total, use addmargins():
addmargins(table(NHANES$Gender))
##
## female male Sum
## 5020 4980 10000
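Two details of table() matter when you use it to check coding. On a factor, every level is counted, even levels with no observations, whereas a character vector only shows the values that actually occur; and addmargins() simply appends the sum of the counts. A minimal sketch with a small hypothetical vector:

```r
# Hypothetical toy data: a factor with a level that never occurs
sex <- factor(c("female", "male", "female", "female"),
              levels = c("female", "male", "unknown"))

counts <- table(sex)                     # female 3, male 1, unknown 0 (empty level kept)
counts_chr <- table(as.character(sex))   # only observed values: female, male

with_total <- addmargins(counts)         # appends a "Sum" cell equal to sum(counts)
print(with_total)
```

An unexpected empty level, or an unexpected extra value in the character version, is often the first sign of a coding error.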
Now a table by marital status
table(NHANES$MaritalStatus)
##
## Divorced LivePartner Married NeverMarried Separated
## 707 560 3945 1380 183
## Widowed
## 456
To get percentages, use prop.table() and multiply the result by 100 to make it more readable:
prop.table(table(NHANES$MaritalStatus)) * 100
##
## Divorced LivePartner Married NeverMarried Separated
## 9.777348 7.744434 54.556769 19.084497 2.530770
## Widowed
## 6.306182
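prop.table() divides each count by the table total, so the percentages above refer only to the records where MaritalStatus is not missing: the counts add up to 7,231, not 10,000, because table() drops NA values by default. The useNA argument makes missing values visible. A small sketch with hypothetical data:

```r
# Hypothetical toy data with one missing value
status <- c("Married", "Married", "Divorced", NA)

counts <- table(status)                      # NA dropped: Divorced 1, Married 2
pct <- prop.table(counts) * 100              # same as counts / sum(counts) * 100

counts_na <- table(status, useNA = "ifany")  # adds an <NA> cell
pct_na <- prop.table(counts_na) * 100        # now relative to all 4 records
print(pct_na)
```

Which denominator to report (all records or only non-missing ones) is a choice worth stating explicitly in a Table 1.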
The tidyverse equivalent of the gender count table is:
library(dplyr)  # provides %>%, group_by(), summarise() and n()
NHANES %>%
group_by(Gender) %>%
summarise(n = n())
## # A tibble: 2 x 2
## Gender n
## <fct> <int>
## 1 female 5020
## 2 male 4980
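If you want the counts as a data frame, like the tibble above, but without loading dplyr, one base-R option is to apply as.data.frame() to a table: it returns one row per level, with the count in a column named Freq. A sketch with hypothetical data:

```r
# Hypothetical toy data
gender <- factor(c("female", "male", "female"))

# One row per level; counts go in a column called "Freq"
counts_df <- as.data.frame(table(gender))
print(counts_df)
```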
Two-variable tables
Tables become even more useful when they cross-tabulate two nominal variables.
For example, to summarize race by gender as proportions of the whole sample, the command is:
prop.table(table(NHANES$Race1, NHANES$Gender))
##
## female male
## Black 0.0614 0.0583
## Hispanic 0.0320 0.0290
## Mexican 0.0452 0.0563
## White 0.3221 0.3151
## Other 0.0413 0.0393
Now as percentages:
prop.table(table(NHANES$Race1, NHANES$Gender)) * 100
##
## female male
## Black 6.14 5.83
## Hispanic 3.20 2.90
## Mexican 4.52 5.63
## White 32.21 31.51
## Other 4.13 3.93
And with row percentages (the second argument, 1, makes each row add up to 100):
prop.table(table(NHANES$Race1, NHANES$Gender), 1) * 100
##
## female male
## Black 51.29490 48.70510
## Hispanic 52.45902 47.54098
## Mexican 44.53202 55.46798
## White 50.54928 49.45072
## Other 51.24069 48.75931
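The second argument of prop.table() chooses the denominator: 1 divides each cell by its row total, 2 by its column total, and leaving it out divides by the grand total. A quick check on a small hypothetical table:

```r
# Hypothetical 2 x 2 table of counts
tab <- as.table(matrix(c(10, 30, 20, 40), nrow = 2,
                       dimnames = list(group = c("a", "b"),
                                       sex = c("female", "male"))))

overall <- prop.table(tab)      # all cells together sum to 1
by_row  <- prop.table(tab, 1)   # each row sums to 1
by_col  <- prop.table(tab, 2)   # each column sums to 1
print(by_row)
```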
We can choose the number of significant digits printed with options(digits = n):
options(digits = 3)
prop.table(table(NHANES$Race1, NHANES$Gender), 1) * 100
##
## female male
## Black 51.3 48.7
## Hispanic 52.5 47.5
## Mexican 44.5 55.5
## White 50.5 49.5
## Other 51.2 48.8
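Note that options(digits = 3) is a session-wide setting: it affects every number printed afterwards, and it controls significant digits rather than decimal places. If you only want to round one result, round() is more local; if you do change the option, you can restore it afterwards. A minimal sketch:

```r
x <- c(51.29490, 48.70510)   # the Black row of the row-percentage table above

# Local rounding: only this result is affected
rounded <- round(x, 1)

# Temporary change of the global option, restored right afterwards
old <- options(digits = 3)   # options() returns the previous value
print(x)
options(old)                 # back to the previous setting
```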
Adding the margins shows that each row of the row-percentage table adds up to 100:
addmargins(prop.table(table(NHANES$Race1, NHANES$Gender), 1) * 100)
##
## female male Sum
## Black 51.3 48.7 100.0
## Hispanic 52.5 47.5 100.0
## Mexican 44.5 55.5 100.0
## White 50.5 49.5 100.0
## Other 51.2 48.8 100.0
## Sum 250.1 249.9 500.0
With column percentages (second argument 2), each column adds up to 100 instead:
addmargins(prop.table(table(NHANES$Race1, NHANES$Gender), 2) * 100)
##
## female male Sum
## Black 12.23 11.71 23.94
## Hispanic 6.37 5.82 12.20
## Mexican 9.00 11.31 20.31
## White 64.16 63.27 127.44
## Other 8.23 7.89 16.12
## Sum 100.00 100.00 200.00
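In these percentage tables only one of the added margins is meaningful: with row percentages, the Sum row (250.1, 249.9, 500.0) adds up percentages that have different denominators, and with column percentages the same is true of the Sum column. addmargins() has a margin argument that adds only the margin you want: margin = 2 appends a Sum column and margin = 1 a Sum row. A sketch on a hypothetical table:

```r
# Hypothetical counts: 2 groups x 2 sexes
tab <- as.table(matrix(c(10, 30, 20, 40), nrow = 2,
                       dimnames = list(group = c("a", "b"),
                                       sex = c("female", "male"))))

row_pct <- prop.table(tab, 1) * 100

# margin = 2 adds only a "Sum" column (each row totals 100)
row_pct_total <- addmargins(row_pct, margin = 2)
print(row_pct_total)
```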
Tables in tidyverse style
To create this count table:
table(NHANES$Race1, NHANES$Gender)
##
## female male
## Black 614 583
## Hispanic 320 290
## Mexican 452 563
## White 3221 3151
## Other 413 393
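In tidyverse style, the same table is:
library(dplyr)
library(tidyr)  # spread(); pivot_wider(names_from = Gender, values_from = n) is its newer replacement
NHANES %>%
group_by(Gender, Race1) %>%
summarise(n = n()) %>%
spread(Gender, n)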
## # A tibble: 5 x 3
## Race1 female male
## <fct> <int> <int>
## 1 Black 614 583
## 2 Hispanic 320 290
## 3 Mexican 452 563
## 4 White 3221 3151
## 5 Other 413 393
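The long-to-wide step that spread() performs on the summarised counts can also be done in base R with xtabs(), which takes the count column on the left of the formula and the row and column variables on the right. A sketch with hypothetical counts:

```r
# Hypothetical long-format counts, one row per race/gender combination
long <- data.frame(
  race   = c("Black", "Black", "White", "White"),
  gender = c("female", "male", "female", "male"),
  n      = c(6, 5, 30, 31)
)

# Left-hand side: counts; right-hand side: row and column variables
wide <- xtabs(n ~ race + gender, data = long)
print(wide)
```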
Proportion table in tidyverse style
This table of column percentages:
prop.table(table(NHANES$Race1, NHANES$Gender), 2) *100
##
## female male
## Black 12.23 11.71
## Hispanic 6.37 5.82
## Mexican 9.00 11.31
## White 64.16 63.27
## Other 8.23 7.89
And the same column percentages in tidyverse style:
NHANES %>%
group_by(Gender, Race1) %>%
summarise(n = n()) %>%   # the result is still grouped by Gender
mutate(freq = n / sum(n) * 100) %>%   # so sum(n) is the total for each gender
select(-n) %>%
spread(Gender, freq)
## # A tibble: 5 x 3
## Race1 female male
## <fct> <dbl> <dbl>
## 1 Black 12.2 11.7
## 2 Hispanic 6.37 5.82
## 3 Mexican 9.00 11.3
## 4 White 64.2 63.3
## 5 Other 8.23 7.89
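The mutate() step above computes percentages within each gender. A base-R sketch of the same within-group calculation uses ave(), which applies a function within groups and returns a vector aligned with the input rows; the data below are hypothetical.

```r
# Hypothetical long-format counts
long <- data.frame(
  gender = c("female", "female", "male", "male"),
  race   = c("Black", "White", "Black", "White"),
  n      = c(20, 80, 30, 70)
)

# Total per gender, repeated on every row of that gender
gender_total <- ave(long$n, long$gender, FUN = sum)

# Percentage within each gender (column percentages once reshaped to wide)
long$freq <- long$n / gender_total * 100
print(long)
```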
Tables with janitor::tabyl
The janitor package, designed for data cleaning, provides tabyl(), an improved version of table() that takes a data frame, works with the pipe and returns a data frame. Some of its uses:
NHANES %>%
janitor::tabyl(Race1)
## Race1 n percent
## Black 1197 0.1197
## Hispanic 610 0.0610
## Mexican 1015 0.1015
## White 6372 0.6372
## Other 806 0.0806
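For a single variable, tabyl() returns the count n and percent = n / sum(n), which is a proportion between 0 and 1 despite the column name. The hypothetical base-R helper below only imitates that layout, to show what the columns mean; it is not how janitor computes them.

```r
# Hypothetical helper imitating the layout of a one-way janitor::tabyl()
one_way <- function(x) {
  counts <- table(x)
  data.frame(
    value   = names(counts),
    n       = as.integer(counts),
    percent = as.numeric(counts) / sum(counts),  # a proportion, not multiplied by 100
    stringsAsFactors = FALSE
  )
}

race <- c("White", "White", "Black", "Other")  # hypothetical data
res <- one_way(race)
print(res)
```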
Or a two-variable table, here with a totals row:
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_totals(where = "row")
## Race1 female male
## Black 614 583
## Hispanic 320 290
## Mexican 452 563
## White 3221 3151
## Other 413 393
## Total 5020 4980
And with a totals column:
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_totals(where = "col")
## Race1 female male Total
## Black 614 583 1197
## Hispanic 320 290 610
## Mexican 452 563 1015
## White 3221 3151 6372
## Other 413 393 806
You can add both margins with janitor::adorn_totals(where = c("row", "col")):
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_totals(where = c("row","col"))
## Race1 female male Total
## Black 614 583 1197
## Hispanic 320 290 610
## Mexican 452 563 1015
## White 3221 3151 6372
## Other 413 393 806
## Total 5020 4980 10000
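Because tabyl() returns a data frame, adding totals amounts to appending a row and a column of sums. A rough base-R sketch of that idea on a hypothetical data frame of counts, using rowSums() and colSums() (this is not how janitor implements adorn_totals()):

```r
# Hypothetical wide table of counts, as a data frame
counts <- data.frame(
  Race1  = c("Black", "White"),
  female = c(6, 30),
  male   = c(5, 31),
  stringsAsFactors = FALSE
)

# Total column: sum across the count columns of each row
counts$Total <- rowSums(counts[, c("female", "male")])

# Total row: column sums of every numeric column, labelled "Total"
total_row <- data.frame(Race1 = "Total", t(colSums(counts[, -1])),
                        stringsAsFactors = FALSE)
with_totals <- rbind(counts, total_row)
print(with_totals)
```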
To show column percentages with a chosen number of decimals, combine adorn_percentages() with adorn_pct_formatting():
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_percentages(denominator = "col") %>%
janitor::adorn_pct_formatting(digits = 0)
## Race1 female male
## Black 12% 12%
## Hispanic 6% 6%
## Mexican 9% 11%
## White 64% 63%
## Other 8% 8%
And with row percentages:
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_percentages(denominator = "row") %>%
janitor::adorn_pct_formatting(digits = 0)
## Race1 female male
## Black 51% 49%
## Hispanic 52% 48%
## Mexican 45% 55%
## White 51% 49%
## Other 51% 49%
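adorn_percentages() turns counts into proportions and adorn_pct_formatting() turns those proportions into text. A base-R sketch of both steps with sprintf(), using the Black row of the count table above; it illustrates the idea and is not janitor's implementation.

```r
counts <- c(female = 614, male = 583)   # Black row of the count table above

row_prop <- counts / sum(counts)        # like adorn_percentages(denominator = "row")

# Like adorn_pct_formatting(digits = 0): multiply by 100, round, add "%"
row_pct_text <- sprintf("%.0f%%", row_prop * 100)
print(row_pct_text)
```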
One of the most useful features of tabyl is that it lets you combine counts and percentages in the same cell, by row or by column:
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_totals(where = c("row","col")) %>%
janitor::adorn_percentages(denominator = "col") %>%
janitor::adorn_pct_formatting(digits = 0) %>%
janitor::adorn_ns(position = "front")
## Race1 female male Total
## Black 614 (12%) 583 (12%) 1197 (12%)
## Hispanic 320 (6%) 290 (6%) 610 (6%)
## Mexican 452 (9%) 563 (11%) 1015 (10%)
## White 3221 (64%) 3151 (63%) 6372 (64%)
## Other 413 (8%) 393 (8%) 806 (8%)
## Total 5020 (100%) 4980 (100%) 10000 (100%)
Now without the % sign, with one decimal and row percentages:
NHANES %>%
janitor::tabyl(Race1, Gender) %>%
janitor::adorn_totals(where = c("row","col")) %>%
janitor::adorn_percentages(denominator = "row") %>%
janitor::adorn_pct_formatting(digits = 1, rounding = "half to even", affix_sign = FALSE) %>%
janitor::adorn_ns(position = "front")
## Race1 female male Total
## Black 614 (51.3) 583 (48.7) 1197 (100.0)
## Hispanic 320 (52.5) 290 (47.5) 610 (100.0)
## Mexican 452 (44.5) 563 (55.5) 1015 (100.0)
## White 3221 (50.5) 3151 (49.5) 6372 (100.0)
## Other 413 (51.2) 393 (48.8) 806 (100.0)
## Total 5020 (50.2) 4980 (49.8) 10000 (100.0)
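adorn_ns(position = "front") places the counts in front of the formatted percentages. The same "n (pct)" strings can be built by hand with paste0() and sprintf(); a sketch using the Black row from the table above:

```r
counts <- c(female = 614, male = 583)    # Black row, from the table above
row_pct <- counts / sum(counts) * 100    # row percentages

# "n (pct)" with one decimal and no % sign, like the last tabyl table
cells <- paste0(counts, " (", sprintf("%.1f", row_pct), ")")
print(cells)
```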
Tables in (old) SPSS style
The expss package provides tabulation functions with support for ‘SPSS’-style labels, multiple / nested banners, weights, multiple-response variables and significance testing.
pacman::p_load(expss)
Now we can create a table of column percentages by gender, with significance marks:
NHANES::NHANES %>%
expss::tab_cells(AgeDecade, Race1, Education) %>%
expss::tab_cols(Gender) %>%
expss::tab_stat_cpct() %>%
expss::tab_last_sig_means(subtable_marks = "both") %>%
expss::tab_pivot() %>%
expss::set_caption("Table with summary statistics and significance marks.") %>%
htmlTable()
Table with summary statistics and significance marks.

| | Gender: female (A) | Gender: male (B) |
|---|---|---|
| AgeDecade | | |
| 0-9 | 13.5 | 15.2 |
| 10-19 | 14.2 | 14.3 |
| 20-29 | 14.1 | 13.9 |
| 30-39 | 14.0 | 13.7 |
| 40-49 | 14.1 | 14.8 |
| 50-59 | 12.9 | 14.1 |
| 60-69 | 9.9 > B | 9.1 < A |
| 70+ | 7.2 | 4.9 |
| #Total cases | 4827.0 | 4840.0 |
| Race1 | | |
| Black | 12.2 | 11.7 |
| Hispanic | 6.4 | 5.8 |
| Mexican | 9.0 | 11.3 |
| White | 64.2 > B | 63.3 < A |
| Other | 8.2 | 7.9 |
| #Total cases | 5020.0 | 4980.0 |
| Education | | |
| 8th Grade | 5.7 | 6.8 |
| 9 - 11th Grade | 10.9 | 13.7 |
| High School | 20.9 | 21.1 |
| Some College | 32.6 > B | 30.2 < A |
| College Grad | 29.9 | 28.2 |
| #Total cases | 3677.0 | 3544.0 |
Table 1
Finally, to make the first table of a scientific report, the table1 package summarizes several variables grouped by a factor, as follows:
NHANES %>%
table1::table1(~Age + Poverty + Race1 | Gender, data = .)
| | female (n=5020) | male (n=4980) | Overall (n=10000) |
|---|---|---|---|
| Age | | | |
| Mean (SD) | 37.6 (22.7) | 35.8 (22.0) | 36.7 (22.4) |
| Median [Min, Max] | 37.0 [0.00, 80.0] | 36.0 [0.00, 80.0] | 36.0 [0.00, 80.0] |
| Poverty | | | |
| Mean (SD) | 2.76 (1.68) | 2.84 (1.68) | 2.80 (1.68) |
| Median [Min, Max] | 2.63 [0.00, 5.00] | 2.75 [0.00, 5.00] | 2.70 [0.00, 5.00] |
| Missing | 380 (7.6%) | 346 (6.9%) | 726 (7.3%) |
| Race1 | | | |
| Black | 614 (12.2%) | 583 (11.7%) | 1197 (12.0%) |
| Hispanic | 320 (6.4%) | 290 (5.8%) | 610 (6.1%) |
| Mexican | 452 (9.0%) | 563 (11.3%) | 1015 (10.2%) |
| White | 3221 (64.2%) | 3151 (63.3%) | 6372 (63.7%) |
| Other | 413 (8.2%) | 393 (7.9%) | 806 (8.1%) |
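table1 reports each continuous variable as Mean (SD) and Median [Min, Max], and missing values as n (%). When you need a custom layout, a hypothetical base-R helper like the one below can build those strings per group; the fixed one-decimal formatting is a simplification and does not reproduce table1's own rounding rules.

```r
# Hypothetical helper: table1-style summary strings for one numeric vector
describe <- function(x) {
  ok <- x[!is.na(x)]
  n_miss <- sum(is.na(x))
  c(
    mean_sd    = sprintf("%.1f (%.1f)", mean(ok), sd(ok)),
    median_rng = sprintf("%.1f [%.1f, %.1f]", median(ok), min(ok), max(ok)),
    missing    = sprintf("%d (%.1f%%)", n_miss, 100 * n_miss / length(x))
  )
}

# Hypothetical data: ages in two groups, one value missing
age   <- c(20, 30, 40, 50, 60, NA)
group <- c("A", "A", "A", "B", "B", "B")

# One column per group, like the columns of Table 1
summary_tab <- sapply(split(age, group), describe)
print(summary_tab)
```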
And with zebra-striped shading, using the topclass argument:
table1::table1( ~ Age + Poverty + Race1| Gender, data = NHANES,
topclass="Rtable1-zebra" )
| | female (n=5020) | male (n=4980) | Overall (n=10000) |
|---|---|---|---|
| Age | | | |
| Mean (SD) | 37.6 (22.7) | 35.8 (22.0) | 36.7 (22.4) |
| Median [Min, Max] | 37.0 [0.00, 80.0] | 36.0 [0.00, 80.0] | 36.0 [0.00, 80.0] |
| Poverty | | | |
| Mean (SD) | 2.76 (1.68) | 2.84 (1.68) | 2.80 (1.68) |
| Median [Min, Max] | 2.63 [0.00, 5.00] | 2.75 [0.00, 5.00] | 2.70 [0.00, 5.00] |
| Missing | 380 (7.6%) | 346 (6.9%) | 726 (7.3%) |
| Race1 | | | |
| Black | 614 (12.2%) | 583 (11.7%) | 1197 (12.0%) |
| Hispanic | 320 (6.4%) | 290 (5.8%) | 610 (6.1%) |
| Mexican | 452 (9.0%) | 563 (11.3%) | 1015 (10.2%) |
| White | 3221 (64.2%) | 3151 (63.3%) | 6372 (63.7%) |
| Other | 413 (8.2%) | 393 (7.9%) | 806 (8.1%) |
The Journal of Clinical Epidemiology has published an excellent guide to designing Table 1: Hayes-Larson, E., Kezios, K.L., Mooney, S.J., Lovasi, G., 2019. Who is in this study, anyway? Guidelines for a useful Table 1. J. Clin. Epidemiol. 114, 125–132.
