This tutorial demonstrates how to create interactive bar charts using a variety of libraries, currently plotly and highcharter. One interactive Shiny app is associated with this tutorial; it is linked in the Shiny App section at the end.

The datasets that this tutorial considers are structured as follows:

Measure  Category.1  Category.2
1        A           X
2        B           Y
3        A           Y

The “measure” column contains numerical data, which is categorised by a number of categories (or dimensions). Two bar charts are therefore of interest:

  • The mean value of the measure broken down by a particular category (Mean Parameter per Category)
  • The number of observations broken down by a particular category (Number of Observations per Category)
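
As a minimal sketch of these two summaries, the toy table above can be rebuilt as a data.frame and aggregated with base R. The object names toy, mean_per_category and count_per_category are illustrative only and do not appear elsewhere in this tutorial.

```r
# Toy data matching the example table above (illustrative only)
toy <- data.frame(
  Measure = c(1, 2, 3),
  Category.1 = c("A", "B", "A"),
  Category.2 = c("X", "Y", "Y")
)

# Mean Parameter per Category: average of Measure within each Category.1 level
mean_per_category <- aggregate(Measure ~ Category.1, data = toy, FUN = mean)

# Number of Observations per Category: count of rows within each Category.1 level
count_per_category <- aggregate(Measure ~ Category.1, data = toy, FUN = length)

print(mean_per_category)
print(count_per_category)
```

Both results have one row per category, which is the shape the charting code later in this tutorial expects.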

Note that this template covers both how to build such a bar chart inside of an HTML RMarkdown file and how to functionalise the code so as to conveniently switch between different categories and metrics in a Shiny app.

Import and Clean Data

The data for this template is a .csv file hosted on Figshare, read directly from its URL using read.csv.

desktopItems <- read.csv(file = "https://ndownloader.figshare.com/files/5360960")
knitr::kable(head(desktopItems))
Timestamp           Desktop.Items  Operating.System  University.Department  University                         Country
9/30/2015 13:07:58  5              Mac (OS X)        IT Services            University of Oxford               UK
11/06/2015 12:20    87             Linux             Physics                University of Durham               UK
11/06/2015 12:33    25             Windows 10        Physics                Queen’s University Belfast         UK
11/06/2015 12:46    20             Windows 7         Physics                University of Leeds                UK
11/06/2015 12:48    64             Windows 8         International Office   University of the West of England  UK
11/06/2015 12:50    34             Windows 7         Biology                King’s College London              UK

An advanced version of this template might attempt to infer the measure and appropriate categories automatically. In this template we explicitly decide which columns are categories (or dimensions) and which column is the measure:

measure_column <- "Desktop.Items"
categories <- c("Operating.System","University.Department","University","Country")

These columns will be used in the bar charts to decide which dimensions of the data are visualised.

Mean Parameter per Category

The mean number of desktop items per category can be calculated with the aggregate function. The category chosen for aggregation is assigned to selected_dimension. Because the column names are stored as strings, as.name converts each string into a symbol, and eval then evaluates that symbol so that the formula refers to the actual columns of the data.

selected_dimension <- categories[1]
aggregate_mean <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = mean)
knitr::kable(aggregate_mean)
eval(as.name(selected_dimension))  eval(as.name(measure_column))
Linux                              30.90909
Mac (OS X)                         19.76471
Windows 10                         17.07143
Windows 7                          43.37500
Windows 8                          48.09091

For convenience in the visualisation, the column names of the data.frame are renamed:

colnames(aggregate_mean) <- c(selected_dimension, measure_column)
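
An alternative sketch, not used in the rest of this template, builds the formula with reformulate from the stats package. Because the formula then contains the real column names, the aggregated data.frame keeps them and no renaming should be needed. The toy data and the names measure_name, dimension_name and toy_mean are assumptions for illustration.

```r
# Toy stand-in for desktopItems (illustrative values only)
toy <- data.frame(
  Desktop.Items = c(10, 30, 20, 40),
  Operating.System = c("Linux", "Linux", "Windows 7", "Windows 7")
)
measure_name <- "Desktop.Items"
dimension_name <- "Operating.System"

# reformulate("Operating.System", response = "Desktop.Items") yields
# Desktop.Items ~ Operating.System
aggregation_formula <- reformulate(dimension_name, response = measure_name)
toy_mean <- aggregate(aggregation_formula, data = toy, FUN = mean)

# Column names come straight from the formula
print(colnames(toy_mean))
print(toy_mean)
```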

The category column names are formatted with periods instead of spaces (for example, Operating.System), which does not aid comprehension of the chart. A utility function called format_label is therefore created, using gsub, to replace the periods with spaces:

format_label <- function(dimension){
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}
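
For illustration, the function can be applied to a single column name or to a whole vector of names, since gsub is vectorised. The definition is repeated here only so that the snippet runs on its own.

```r
format_label <- function(dimension){
  gsub(pattern = "[.]", replacement = " ", x = dimension)
}

# Applied to a single name
single_label <- format_label("Operating.System")

# Applied to several names at once
several_labels <- format_label(c("University.Department", "Country"))

print(single_label)
print(several_labels)
```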

Highcharter

The data.frame can now be visualised using highcharter as follows. Note that eval is unnecessary with this library: the columns are extracted from the data.frame and passed to the chart as plain vectors, rather than being referred to by name.

library(highcharter)
highchart() %>%
  hc_chart(type = "column") %>%
  hc_xAxis(categories = aggregate_mean[,selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_mean[,measure_column]) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))

The highcharter library distinguishes between vertically and horizontally orientated bar charts through hc_chart(type = ): column and bar are vertical and horizontal, respectively. Both varieties of chart are more legible if bars are ordered from largest to smallest. Note that the categories and the data series are both taken from the same sorted data.frame, so that each label stays paired with its value:

aggregate_mean_sorted <-
  aggregate_mean[order(aggregate_mean[, measure_column], decreasing = TRUE), ]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = aggregate_mean_sorted[,selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_mean_sorted[,measure_column]) %>%
  hc_yAxis(title = list(text = "Mean Number of Desktop Items")) %>%
  hc_title(text = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
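
The ordering step can be sketched in isolation with base R. The values below are the means from the table above, rounded to one decimal place; the names means, measure_name and means_sorted are illustrative. Subsetting the whole data.frame by order() reorders labels and values together, which can be checked before plotting.

```r
means <- data.frame(
  Operating.System = c("Linux", "Mac (OS X)", "Windows 10", "Windows 7", "Windows 8"),
  Desktop.Items = c(30.9, 19.8, 17.1, 43.4, 48.1)
)
measure_name <- "Desktop.Items"

# Reorder whole rows, largest mean first
means_sorted <- means[order(means[[measure_name]], decreasing = TRUE), ]

# Each label is still attached to its own value
print(means_sorted)
```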

Plotly

The data.frame can now be visualised using plotly as follows. Note that plot_ly expects its x and y arguments as formulas that refer to columns of the data provided (for example, ~Operating.System). Because the column names are stored as strings, as.formula(paste0("~", selected_dimension)) is used to build the formula:

library(plotly)

as.formula(paste0("~",selected_dimension))
## ~Operating.System
## <environment: 0x105abc318>
plot_ly(data = aggregate_mean,
        type = "bar",
        x = as.formula(paste0("~",selected_dimension)),
        y = as.formula(paste0("~",measure_column))) %>%
  layout(xaxis = list(title = format_label(selected_dimension)),
         yaxis = list(title = "Mean Number of Desktop Items"),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)))
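
A small sketch of what as.formula produces from a string, independent of plotly, is given below. The helper name column_formula is hypothetical. Backticks are added so that names which are not syntactically valid in R (for example, names containing spaces) are also handled; this goes beyond what this template needs.

```r
# Hypothetical helper: build a one-sided formula such as ~Operating.System
column_formula <- function(column_name) {
  as.formula(paste0("~`", column_name, "`"))
}

f <- column_formula("Operating.System")
print(class(f))      # class of the object that was built
print(all.vars(f))   # the column name the formula refers to

# The same helper applied to a name containing a space
spaced <- all.vars(column_formula("Operating System"))
print(spaced)
```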

Bar charts are vertical by default in the plotly library; however, horizontally orientated bar charts are often more appropriate where dimension labels are long. Independent of orientation, bar charts are more legible if bars are ordered from largest to smallest, as shown below. Orientation is controlled in the plotly library through the orientation argument:

plot_ly(data = aggregate_mean[order(aggregate_mean$Desktop.Items, decreasing = TRUE), ],
        type = "bar",
        y = as.formula(paste0("~",selected_dimension)),
        x = as.formula(paste0("~",measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Mean Number of Desktop Items"),
         yaxis = list(title = format_label(selected_dimension)),
         title = paste0("Mean number of desktop items aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))

Number of Observations per Category

The number of observations per category can be calculated with aggregate by supplying length as FUN, i.e. by counting how many observations of the measure fall into each category.

aggregate_number_of_observations <- aggregate(data = desktopItems, eval(as.name(measure_column)) ~ eval(as.name(selected_dimension)), FUN = length)
colnames(aggregate_number_of_observations) <- c(selected_dimension,"Desktop.Items")
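
As a cross-check, sketched here on toy data with illustrative names, the counts produced by aggregate with length should match those from base R's table function, provided there are no missing values (the formula interface of aggregate drops rows containing NA by default).

```r
toy <- data.frame(
  Desktop.Items = c(5, 87, 25, 20, 64),
  Operating.System = c("Linux", "Linux", "Windows 7", "Windows 7", "Windows 7")
)

# Count observations per category with aggregate
counts_aggregate <- aggregate(Desktop.Items ~ Operating.System, data = toy, FUN = length)

# Count the same thing with table
counts_table <- as.data.frame(table(Operating.System = toy$Operating.System))

print(counts_aggregate)
print(counts_table)
```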

Highcharter

Using the same code as above, a barchart of the aggregated data can easily be generated:

# Sort largest first; categories and data come from the same sorted data.frame
aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations[, measure_column], decreasing = TRUE),]
highchart() %>%
  hc_chart(type = "bar") %>%
  hc_xAxis(categories = aggregate_number_of_observations[,selected_dimension]) %>%
  hc_add_series(name = format_label(selected_dimension), data = aggregate_number_of_observations[,measure_column]) %>%
  hc_yAxis(title = list(text = "Number of respondents")) %>%
  hc_title(text = paste0("Number of respondents aggregated by ",format_label(selected_dimension)))

Plotly

Using the same code as above, a barchart of the aggregated data can easily be generated:

aggregate_number_of_observations <- aggregate_number_of_observations[order(aggregate_number_of_observations$Desktop.Items),]
plot_ly(data = aggregate_number_of_observations,
        type = "bar",
        y = as.formula(paste0("~",selected_dimension)),
        x = as.formula(paste0("~",measure_column)),
        orientation = "h") %>%
  layout(xaxis = list(title = "Number of respondents"),
         yaxis = list(title= format_label(selected_dimension)),
         title = paste0("Number of respondents aggregated by ",format_label(selected_dimension)),
         margin = list(l = 80))

Functionalising

It is convenient to proceduralise the creation of these charts by converting the scripts into functions that can be called with different parameters. This is particularly useful in Shiny apps. A function for each charting library considered in this document is provided below.

Note that the aggregation function is the same, for this tutorial, regardless of the visualisation library used.

aggregate_data_for_barchart <-
  function(data = NA,
           dimension_column = NA,
           measure_column = NA,
           aggregate_function = NA) {
    aggregated_data <-
      aggregate(data = data,
                eval(as.name(measure_column)) ~ eval(as.name(dimension_column)),
                FUN = aggregate_function)
    colnames(aggregated_data) <- c(dimension_column, measure_column)
    
    aggregated_data <-
      aggregated_data[order(aggregated_data[, measure_column]), ]
    # Return for use
    aggregated_data
  }

This function can easily be called to aggregate the data as follows:

intermediate_aggregate <- aggregate_data_for_barchart(
  data = desktopItems,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_function = sum
)
knitr::kable(head(intermediate_aggregate))
    University                 Desktop.Items
8   University of Cambridge    0
11  University of Greenwich    7
15  University of Sheffield    8
6   University College London  18
10  University of Glasgow      20
13  University of Leeds        20
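
A hedged variant of this function, here given the hypothetical name aggregate_data_checked, validates its inputs before aggregating so that a mistyped column name fails early. It uses reformulate rather than eval(as.name()), which is a substitution made for this sketch and not the approach used above; the toy data is illustrative.

```r
aggregate_data_checked <- function(data, dimension_column, measure_column,
                                   aggregate_function = mean) {
  # Fail early if the inputs are not what the aggregation expects
  stopifnot(
    is.data.frame(data),
    dimension_column %in% colnames(data),
    measure_column %in% colnames(data),
    is.function(aggregate_function)
  )
  aggregation_formula <- reformulate(dimension_column, response = measure_column)
  aggregated_data <- aggregate(aggregation_formula, data = data, FUN = aggregate_function)
  # Sort ascending by the aggregated measure, as in the function above
  aggregated_data[order(aggregated_data[[measure_column]]), ]
}

toy <- data.frame(
  Desktop.Items = c(1, 2, 3),
  University = c("A", "B", "A")
)
toy_sum <- aggregate_data_checked(toy, "University", "Desktop.Items", sum)
print(toy_sum)

# A mistyped column name is caught before aggregation
result <- try(aggregate_data_checked(toy, "Universty", "Desktop.Items", sum), silent = TRUE)
print(inherits(result, "try-error"))
```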

Plotly

The function below generates a plotly bar chart from the output of the aggregation function. Note that a number of additional arguments have been added to provide greater flexibility over the output.

plotly_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA,
  left_margin = 100,
  displayFurniture = TRUE
) {
  plot_ly(
    data = data,
    type = "bar",
    y = as.formula(paste0("~",dimension_column)),
    x = as.formula(paste0("~",measure_column)),
    orientation = "h"
  ) %>%
    layout(
      xaxis = list(title = aggregate_description),
      yaxis = list(title = ""),
      title = paste0(
        aggregate_description," aggregated by ",
        format_label(dimension_column)
      ),
      margin = list(l = left_margin)
    ) %>%
    config(displayModeBar = displayFurniture)
}

For example:

plotly_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Total number of desktop items",
  displayFurniture = FALSE
)

Highcharter

The function below generates a highcharter bar chart from the output of the aggregation function. Note that a number of additional arguments have been added to provide greater flexibility over the output.

highcharter_aggregated_barchart <- function(
  data = NA,
  dimension_column = NA,
  measure_column = NA,
  aggregate_description = NA
) {
  highchart() %>%
  hc_chart(type = "bar") %>%
  # data arrives sorted ascending; reverse labels and values together so the largest bar comes first
  hc_xAxis(categories = rev(data[,dimension_column])) %>%
  hc_add_series(name = format_label(dimension_column), data = rev(data[,measure_column])) %>%
  hc_yAxis(title = list(text = aggregate_description)) %>%
  hc_title(text = paste0(aggregate_description," aggregated by ",format_label(dimension_column)))
}

For example:

highcharter_aggregated_barchart(
  data = intermediate_aggregate,
  dimension_column = "University",
  measure_column = "Desktop.Items",
  aggregate_description = "Total number of desktop items"
)

Shiny App

A shiny app containing an interactive version of the charts above is available here: https://livedataoxford.shinyapps.io/htmlwidget_template_BarCharts/.

The following types of interaction are supported:

  • Switch between different visualisation libraries
  • Switch between different measures to parametrize data against
  • Switch between viewing mean response value and number of responses
  • Hover over bars for further information
  • Most visualisation libraries provide further interactive capabilities
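
A minimal sketch of how such an app might be wired together is given below. It is not the code of the app linked above: the input ids, labels and toy data are assumptions made for illustration, and only the plotly library is shown.

```r
library(shiny)
library(plotly)

# Toy stand-in for desktopItems (illustrative values only)
toy <- data.frame(
  Desktop.Items = c(5, 87, 25, 20, 64, 34),
  Operating.System = c("Mac", "Linux", "Windows", "Windows", "Windows", "Linux"),
  Country = c("UK", "UK", "UK", "FR", "FR", "FR")
)

ui <- fluidPage(
  selectInput("dimension", "Category", choices = c("Operating.System", "Country")),
  selectInput("metric", "Metric",
              choices = c("Mean" = "mean", "Number of observations" = "length")),
  plotlyOutput("barchart")
)

server <- function(input, output, session) {
  output$barchart <- renderPlotly({
    # Aggregate by the selected category with the selected function
    aggregation_formula <- reformulate(input$dimension, response = "Desktop.Items")
    aggregated <- aggregate(aggregation_formula, data = toy,
                            FUN = match.fun(input$metric))
    plot_ly(
      data = aggregated,
      type = "bar",
      x = as.formula(paste0("~", input$dimension)),
      y = ~Desktop.Items
    )
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

The app object is assigned rather than printed so that the script can be sourced without starting a server.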