This tutorial demonstrates how to create interactive bar charts like the one below using a variety of libraries, currently plotly and highcharter. Note that there is currently one interactive Shiny app associated with this tutorial:
The datasets that this tutorial considers are structured as follows:
Measure | Category.1 | Category.2 |
---|---|---|
1 | A | X |
2 | B | Y |
3 | A | Y |
Here the “Measure” column contains numerical data, and each remaining column assigns that measurement to a category (or dimension). There are therefore two interesting bar charts that can be generated:
Note that this template covers both how to build such a bar chart inside of an HTML RMarkdown file and how to functionalise the code so as to conveniently switch between different categories and metrics in a Shiny app.
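As a minimal sketch of the structure described above, the toy table can be written as a data.frame and summarised with the two aggregations used later in this tutorial (the mean of the measure per category and the number of observations per category). The names toy_data, toy_mean and toy_count are assumptions made for illustration only.

```r
# Toy data mirroring the example table (toy_data is a hypothetical name)
toy_data <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y")
)

# Mean of the measure within each level of Category.1
toy_mean <- aggregate(Measure ~ Category.1, data = toy_data, FUN = mean)

# Number of observations within each level of Category.1
toy_count <- aggregate(Measure ~ Category.1, data = toy_data, FUN = length)

print(toy_mean)
print(toy_count)
```

Swapping Category.1 for Category.2 in either formula gives the equivalent summaries for the second dimension.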
The data for this template is a .csv file hosted on Figshare, which is read directly from its URL using read.csv.
desktopItems <- read.csv(file = "https://ndownloader.figshare.com/files/5360960")
knitr::kable(head(desktopItems))
Timestamp | Desktop.Items | Operating.System | University.Department | University | Country |
---|---|---|---|---|---|
9/30/2015 13:07:58 | 5 | Mac (OS X) | IT Services | University of Oxford | UK |
11/06/2015 12:20 | 87 | Linux | Physics | University of Durham | UK |
11/06/2015 12:33 | 25 | Windows 10 | Physics | Queen’s University Belfast | UK |
11/06/2015 12:46 | 20 | Windows 7 | Physics | University of Leeds | UK |
11/06/2015 12:48 | 64 | Windows 8 | International Office | University of the West of England | UK |
11/06/2015 12:50 | 34 | Windows 7 | Biology | King’s College London | UK |
An advanced version of this template might attempt to infer the measure and appropriate categories automatically from the data. In this template we explicitly decide which columns are categories (or dimensions) and which column is the measure:
measure_column <- "Desktop.Items"
categories <- c("Operating.System","University.Department","University","Country")
These columns will be used in the bar charts to decide which dimensions of the data are visualised.
Using the aggregate function, the mean number of desktop items per category can easily be calculated. The category chosen for aggregation is assigned to selected_dimension. Because the column names are stored as strings, as.name is used to convert each string into a symbol, which eval then resolves to the corresponding column of the data.
selected_dimension <- categories[1]
aggregate_mean <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = mean)
knitr::kable(aggregate_mean)
eval(as.name(selected_dimension)) | eval(as.name(measure_column)) |
---|---|
Linux | 30.90909 |
Mac (OS X) | 19.76471 |
Windows 10 | 17.07143 |
Windows 7 | 43.37500 |
Windows 8 | 48.09091 |
For convenience in the visualisation, the columns of the data.frame are renamed:
colnames(aggregate_mean) <- c(selected_dimension, measure_column)
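The renaming step is needed because the eval(as.name(...)) expressions end up as the column names. One possible alternative, sketched below, is to build the formula from the strings with reformulate from the stats package, in which case aggregate keeps the original column names. The data.frame reformulate_demo is a hypothetical stand-in for desktopItems so that the sketch runs without network access.

```r
# Hypothetical stand-in for desktopItems (values are made up for illustration)
reformulate_demo <- data.frame(
  Desktop.Items = c(5, 87, 25, 20, 35),
  Operating.System = c("Mac (OS X)", "Linux", "Windows 10", "Windows 7", "Linux")
)
measure_column <- "Desktop.Items"
selected_dimension <- "Operating.System"

# reformulate() builds the formula Desktop.Items ~ Operating.System from strings
aggregate_formula <- reformulate(selected_dimension, response = measure_column)

reformulate_mean <- aggregate(aggregate_formula, data = reformulate_demo, FUN = mean)

# The columns already carry the original names, so no renaming is required
colnames(reformulate_mean)
```

The rest of this tutorial keeps the eval(as.name(...)) approach; the sketch only shows that the two routes lead to the same aggregated table.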
The column names for the categories contain periods instead of spaces (e.g. Operating.System), which does not aid comprehension of the chart. Using gsub, a utility function called format_label is created to replace the periods with spaces:
format_label <- function(dimension){
gsub(pattern = "[.]", replacement = " ", x = dimension)
}
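A short usage sketch of format_label follows. The function is repeated so that the block runs on its own, and the final line illustrates why the pattern is written as "[.]": a bare "." in a regular expression matches any character, not only a period.

```r
# Same definition as above, repeated so that this block is self-contained
format_label <- function(dimension) {
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}

# Works on a single column name...
single_label <- format_label("Operating.System")

# ...and is vectorised over several column names
several_labels <- format_label(c("University.Department", "Country"))

# A bare "." would replace every character with a space
unescaped_result <- gsub(pattern = ".", replacement = " ", x = "a.b")

print(single_label)
print(several_labels)
```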
The data.frame can now be visualised using highcharter as follows. Note that eval is unnecessary with this library, because the columns are extracted from the data.frame and passed to the chart directly as vectors.
library(highcharter)
highchart() %>%
hc_chart(type = "column") %>%
hc_xAxis(categories = aggregate_mean[,selected_dimension]) %>%
hc_add_series(name = format_label(selected_dimension), data = aggregate_mean[,measure_column]) %>%
hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
The highcharter library distinguishes between vertically and horizontally orientated bar charts through hc_chart(type = ): column and bar are vertical and horizontal, respectively. Both varieties of chart are more legible if bars are ordered from largest to smallest. Note that the data.frame is sorted once and the same sorted object supplies both the categories and the series data, so that each label stays paired with its value:
# Sort by the measure, largest first
aggregate_mean_sorted <-
  aggregate_mean[order(aggregate_mean[, measure_column], decreasing = TRUE), ]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = aggregate_mean_sorted[, selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_mean_sorted[, measure_column]) %>%
  hc_yAxis(title = list(text = "Mean Number of Desktop Items")) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ", format_label(selected_dimension)))
The data.frame can now be visualised using plotly as follows. Note that the x and y arguments are expected to be formulas naming columns of the data provided to plotly, so as.formula is used to build each formula from the column name stored as a string.
library(plotly)
as.formula(paste0("~",selected_dimension))
## ~Operating.System
## <environment: 0x105abc318>
plot_ly(data = aggregate_mean,
type = "bar",
x = as.formula(paste0("~",selected_dimension)),
y = as.formula(paste0("~",measure_column))) %>%
layout(xaxis = list(title = format_label(selected_dimension)),
yaxis = list(title = "Mean Number of Desktop Items"),
title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
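To make the role of the formula more concrete, the sketch below builds a one-sided formula from a string and then evaluates its right-hand side against a data.frame. This is only an illustration of the idea in base R, not a description of how plotly is implemented internally; formula_demo_data is a hypothetical data.frame with made-up values.

```r
# Hypothetical data used only for this illustration
formula_demo_data <- data.frame(
  Operating.System = c("Linux", "Windows 7"),
  Desktop.Items = c(30.9, 43.4)
)
selected_dimension <- "Operating.System"

# Build the one-sided formula ~Operating.System from the string
dimension_formula <- as.formula(paste0("~", selected_dimension))

# The formula records which variable is wanted without evaluating it yet
formula_class <- class(dimension_formula)
formula_variables <- all.vars(dimension_formula)

# The right-hand side is an unevaluated symbol; evaluating it inside the
# data.frame retrieves the corresponding column
resolved_column <- eval(dimension_formula[[2]], envir = formula_demo_data)

print(formula_variables)
print(resolved_column)
```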
Bar charts are vertical by default in the plotly library; however, horizontally orientated bar charts are often more appropriate where dimension labels are long. Independent of orientation, bar charts are more legible if bars are ordered from largest to smallest, as shown below. Orientation is controlled in the plotly library through the orientation argument:
plot_ly(data = aggregate_mean[order(aggregate_mean$Desktop.Items, decreasing = TRUE), ],
type = "bar",
y = as.formula(paste0("~",selected_dimension)),
x = as.formula(paste0("~",measure_column)),
orientation = "h") %>%
layout(xaxis = list(title = "Mean Number of Desktop Items"),
yaxis = list(title = format_label(selected_dimension)),
title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)),
margin = list(l = 80))
The number of observations per category can be calculated with aggregate by setting FUN = length, i.e. by counting how many observations fall into each category.
aggregate_number_of_observations <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = length)
colnames(aggregate_number_of_observations) <- c(selected_dimension,"Desktop.Items")
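As a sanity check on this approach, the counts produced by aggregate with FUN = length can be compared against table, which counts the occurrences of each category directly. The sketch below uses a hypothetical data.frame, count_demo, with made-up values. One caveat is that the formula interface of aggregate omits rows containing NA by default, so the two results can differ when the measure has missing values.

```r
# Hypothetical data used only for this illustration
count_demo <- data.frame(
  Desktop.Items = c(5, 87, 25, 20, 35, 12),
  Operating.System = c("Linux", "Linux", "Windows 7", "Linux", "Windows 7", "Mac (OS X)")
)

# Count observations per category via aggregate, as in the tutorial
counts_by_aggregate <- aggregate(Desktop.Items ~ Operating.System,
                                 data = count_demo, FUN = length)

# Count the same thing directly from the category column
counts_by_table <- table(count_demo$Operating.System)

# Look up each aggregated category by name in the table and compare
categories_found <- as.character(counts_by_aggregate$Operating.System)
counts_agree <- all(
  counts_by_aggregate$Desktop.Items == as.vector(counts_by_table[categories_found])
)
print(counts_agree)
```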
Using the same code as above, a barchart of the aggregated data can easily be generated:
# Sort once, largest first, so that categories and values stay paired
aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations[, measure_column], decreasing = TRUE), ]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = aggregate_number_of_observations[, selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_number_of_observations[, measure_column]) %>%
  hc_yAxis(title = list(text = "Number of respondents")) %>%
  hc_title(text = paste0("Number of respondents aggregated by ", format_label(selected_dimension)))
The equivalent plotly chart is generated from the same aggregated data:
aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
plot_ly(data = aggregate_number_of_observations,
type = "bar",
y = as.formula(paste0("~",selected_dimension)),
x = as.formula(paste0("~",measure_column)),
orientation = "h") %>%
layout(xaxis = list(title = "Number of respondents"),
yaxis = list(title = format_label(selected_dimension)),
title = paste0("Number of respondents aggregated by ", format_label(selected_dimension)),
margin = list(l = 80))
It is convenient to proceduralise the creation of these charts by converting the scripts into functions that can easily be called with different parameters; this is particularly useful in Shiny apps. A function for each charting library considered in this document is provided below.
Note that the aggregation function is the same, for this tutorial, regardless of the visualisation library used.
aggregate_data_for_barchart <-
  function(data = NA,
           dimension_column = NA,
           measure_column = NA,
           aggregate_function = NA) {
    aggregated_data <-
      aggregate(data = data,
                eval(as.name(measure_column)) ~ eval(as.name(dimension_column)),
                FUN = aggregate_function)
    colnames(aggregated_data) <- c(dimension_column, measure_column)
    # Sort by the measure, smallest first
    aggregated_data <-
      aggregated_data[order(aggregated_data[, measure_column]), ]
    # Return for use
    aggregated_data
  }
This function can easily be called to aggregate the data as follows:
intermediate_aggregate <- aggregate_data_for_barchart(
data = desktopItems,
dimension_column = "University",
measure_column = "Desktop.Items",
aggregate_function = sum
)
knitr::kable(head(intermediate_aggregate))
Row | University | Desktop.Items |
---|---|---|
8 | University of Cambridge | 0 |
11 | University of Greenwich | 7 |
15 | University of Sheffield | 8 |
6 | University College London | 18 |
10 | University of Glasgow | 20 |
13 | University of Leeds | 20 |
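In a Shiny app the choice of aggregate function would typically arrive as a string (for example from a drop-down input) and not as a function object. The sketch below shows one way such a string could be turned into a function for the aggregate_function argument. The helper resolve_aggregate_function and its allow-list are hypothetical and not part of the original template; match.arg and match.fun are base R functions.

```r
# Hypothetical helper: resolve an aggregate function from its name,
# restricted to an explicit allow-list of function names
resolve_aggregate_function <- function(function_name,
                                       allowed = c("mean", "sum", "length", "median")) {
  # match.arg() raises an error if function_name is not in the allow-list
  function_name <- match.arg(function_name, choices = allowed)
  # match.fun() looks up the function object with that name
  match.fun(function_name)
}

# Usage with made-up data: the string "sum" selects the sum function
resolve_demo <- data.frame(
  Desktop.Items = c(1, 2, 3),
  University = c("A", "A", "B")
)
chosen_function <- resolve_aggregate_function("sum")
resolve_demo_sum <- aggregate(Desktop.Items ~ University,
                              data = resolve_demo, FUN = chosen_function)
print(resolve_demo_sum)
```

Restricting the lookup to an allow-list is a design choice made here so that arbitrary function names supplied by a user are rejected before match.fun is called.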
The function below generates a plotly bar chart from the output of the aggregation function. Note that a number of additional arguments have been added to provide greater control over the output.
plotly_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA,
  left_margin = 100,
  displayFurniture = TRUE
) {
  plot_ly(
    data = data,
    type = "bar",
    y = as.formula(paste0("~", dimension_column)),
    # Build the formula from measure_column so the argument is respected
    x = as.formula(paste0("~", measure_column)),
    orientation = "h"
  ) %>%
    layout(
      xaxis = list(title = aggregate_description),
      yaxis = list(title = ""),
      title = paste0(
        aggregate_description, " aggregated by ",
        format_label(dimension_column)
      ),
      margin = list(l = left_margin)
    ) %>%
    # displayFurniture toggles the plotly mode bar
    config(displayModeBar = displayFurniture)
}
For example:
plotly_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  # intermediate_aggregate was computed with aggregate_function = sum
  aggregate_description = "Total number of desktop items",
  displayFurniture = FALSE
)
The function below generates a highcharter bar chart from the output of the aggregation function. Note that additional arguments have been added to provide greater control over the output.
highcharter_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA
) {
  # aggregate_data_for_barchart sorts ascending, so both the categories and
  # the values are reversed together to show the largest bar first
  highchart() %>%
    hc_chart(type = "bar") %>%
    hc_xAxis(categories = rev(data[, dimension_column])) %>%
    hc_add_series(name = format_label(dimension_column), data = rev(data[, measure_column])) %>%
    hc_yAxis(title = list(text = aggregate_description)) %>%
    hc_title(text = paste0(aggregate_description, " aggregated by ", format_label(dimension_column)))
}
For example:
highcharter_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  # intermediate_aggregate was computed with aggregate_function = sum
  aggregate_description = "Total number of desktop items"
)
A shiny app containing an interactive version of the charts above is available here: https://livedataoxford.shinyapps.io/htmlwidget_template_BarCharts/.
The following types of interaction are supported: