---
title: "Time series data plotting"
output: html_notebook
---

In this notebook, we'll investigate time series plotting in R, aiming for a visually pleasing, high-quality plot template that can be reused for other plots in the future.

Here, we look at engine oil temperature on a relatively hot day, running through periods of B-road and motorway driving. We also include coasting (off/on) as an example of plotting a discrete signal. The notebook is about plot visuals - in order to properly assess oil temperature, at least engine load should be considered as a factor.

## Libraries
```{r}
library(tidyverse)
library(grid)
library(gridExtra)
```
First, let's load the libraries we're going to use. Note that to run the notebook in RStudio, these packages have to be installed beforehand with `install.packages("<package-name>")`.
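
As a minimal sketch of such a pre-check (not part of the original workflow), the chunk below reports which of the required packages are missing. `grid` ships with R itself, so only `tidyverse` and `gridExtra` are checked; the variable names are made up for illustration.

```{r}
# Packages this notebook attaches that do not ship with base R
required_packages <- c("tidyverse", "gridExtra")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # Uncomment to install them:
  # install.packages(missing_packages)
}
```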

## Preparing data
```{r}
coasting <- read_csv("hot.ChannelGroup_0_CAN1_-_message_dsg_10hz_0x359.csv")
oil_temp <- read_csv("hot.ChannelGroup_3_CAN2_-_message_engine_7_50hz_0x588.csv")
speed <- read_csv("hot.ChannelGroup_1_CAN1_-_message_kombi_1_40hz_0x320.csv")
ambient_temperature <- read_csv("hot.ChannelGroup_2_CAN1_-_message_mfd_50hz_0x527.csv")
```
Then, we'll load the data CSVs. These are extracted from CAN dump data and interpolated to a 0.1 s time step, so each CSV has the same timestamps and a single data column. The CSV names reflect the originating channel group, so the actual signal name is not present in the file name. This comes down to how other tools export the data rather than a conscious choice.
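
Since the later join relies on the timestamps matching exactly, it may be worth verifying the shared time base before going further. The sketch below does this on two small made-up data frames (`toy_coasting` and `toy_speed` are illustrative stand-ins, not the real exports).

```{r}
# Toy stand-ins for two exported signals on a 0.1 s grid
toy_coasting <- data.frame(timestamps = seq(0, 0.5, by = 0.1), coasting = 0)
toy_speed <- data.frame(timestamps = seq(0, 0.5, by = 0.1), speed = 0)

# Exact equality matters here, because joins on a double column match exactly
same_time_base <- identical(toy_coasting$timestamps, toy_speed$timestamps)

# Every step between consecutive samples should be 0.1 s (within rounding)
uniform_step <- all(abs(diff(toy_coasting$timestamps) - 0.1) < 1e-9)

same_time_base
uniform_step
```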

```{r}
colnames(coasting)[2] = "coasting"
colnames(oil_temp)[2] = "t_oil"
colnames(ambient_temperature)[2] = "t_amb"
colnames(speed)[2] = "speed"
```
The raw export CSVs had rather verbose column names, so let's rename the columns to something more human-readable.
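
An alternative sketch, assuming you prefer to rename by the exported column name instead of by position: `dplyr::rename()` fails loudly if the expected column is absent, which can help if the export layout ever changes. The toy data frame below only mimics one of the exports.

```{r}
library(dplyr)

# Toy stand-in for the oil temperature export
toy_raw <- data.frame(
  timestamps = c(0, 0.1),
  CAN2.engine_7_50hz.oil_temperature = c(-59, -59)
)

# rename(new_name = old_name) leaves all other columns untouched
toy_renamed <- toy_raw %>%
  rename(t_oil = CAN2.engine_7_50hz.oil_temperature)

names(toy_renamed)
```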

```{r}
df_list <- list(coasting, oil_temp, ambient_temperature, speed)
df <- df_list %>% reduce(full_join, by='timestamps')
head(df)
```
As all CSVs have the same timestamps and the same number of samples, we can join all the data into a single dataframe for easier handling. To do this, let's first construct a list of all datasets and then pass it to the `reduce` function, which joins the rows through the common column `timestamps`. Showing the first few rows of the resulting dataframe, we see that oil temperature starts at a cool -59 before the ignition is on. Rather than fixing this in the data, we'll just filter it out later by limiting the graph axis.
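
To make the join step more concrete, here is a small sketch on toy data (all names are illustrative). One toy signal is deliberately one sample short to show that `full_join` keeps the unmatched timestamp and fills the gap with `NA`, which is what `na.omit()` guards against in the plotting code.

```{r}
library(dplyr)
library(purrr)

toy_list <- list(
  data.frame(timestamps = c(0, 0.1, 0.2), coasting = c(0, 0, 1)),
  data.frame(timestamps = c(0, 0.1), t_oil = c(-59, -59)),
  data.frame(timestamps = c(0, 0.1, 0.2), speed = c(0, 0, 5))
)

# reduce() folds the list pairwise: join(join(first, second), third)
toy_df <- reduce(
  toy_list,
  function(left, right) full_join(left, right, by = "timestamps")
)

toy_df
```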

```{r}
tail(df)
TIME_RANGE = c(0, 2150)
```
By looking at the last few rows of the dataframe, we see the trip length was 2150 seconds. Let's define the time range of interest as a two-element numeric vector - this way we can later easily zoom the graphs to a specific time frame.

## Plotting

```{r}
speed_plot <- df %>%
  select(timestamps, speed) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = speed), linewidth = 0.4, alpha = 0.75, color = "deepskyblue4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(0, 130, by = 10), limits = c(0, 130)) +
  ylab("Speed [km/h]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 5.5, r = 5.5, b = 0, l = 5.5))

```
With the dataframe prepared, we can create the plots for each variable. There's a lot going on here, so let's break down the plot for vehicle speed:

* Take the whole dataframe and select the `timestamps` and `speed` columns
* Feed the selection through `na.omit()`, which drops rows that have a missing value
* Feed the filtered result to a new `ggplot()` object
* Which consists of a line plot with timestamps on the x axis and speed on the y axis
* The x axis is continuous with a break every 120 seconds, limited to the configured time range as opposed to all of the data, and padded by about 1% on both sides
* The y axis is continuous with range 0..130 and a break every 10 km/h
* The y axis label is `Speed [km/h]`
* The plot uses the minimal theme, which basically resets background colors etc.
* But then we override a few theme settings
  * the x axis shows neither a title nor break texts
  * the y axis title margin is adjusted to push the title a little further from the break texts
  * the overall plot margins are adjusted for stacking multiple plots on top of each other

Note that we always start the x break sequence from zero. This allows the time range to start from any second while the breaks remain zero-based.
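
A short sketch of what zero-based breaks mean when zooming, using a hypothetical zoom window of 500..1000 seconds: the break sequence is still generated from zero, and only the breaks that fall inside the limits end up on the axis.

```{r}
# Hypothetical zoom window, standing in for TIME_RANGE
zoom_range <- c(500, 1000)

# Same recipe as in the plots: start at zero, step by 120 s
zoom_breaks <- seq(0, zoom_range[2], by = 120)

# Breaks outside the axis limits are not drawn
visible_breaks <- zoom_breaks[zoom_breaks >= zoom_range[1]]
visible_breaks
# 600 720 840 960
```

Had the sequence started from `zoom_range[1]` instead, the breaks would land on 500, 620, 740 and so on, and would shift every time the window start changes.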


```{r}
oil_temp_plot <- df %>%
  select(timestamps, t_oil) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_oil), linewidth = 0.4, alpha = 0.75, color = "firebrick4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(70, 110, by = 5), limits = c(70, 110)) +
  ylab("Oil Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
```

```{r}
amb_temp_plot <- df %>%
  select(timestamps, t_amb) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_amb), linewidth = 0.4, alpha = 0.75, color = "darkgreen") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(22, 32, by = 1), limits = c(22, 32)) +
  ylab("Ambient Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
```
The plots for oil temperature and ambient temperature are similar, differing only in color, y axis range, title and margins.
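
Because the continuous plots differ only in a handful of settings, the shared part could be wrapped in a helper. The function below is a hypothetical sketch (`stacked_line_plot` and its arguments are not part of the original notebook) and is demonstrated on synthetic data.

```{r}
library(ggplot2)

# Hypothetical helper: one continuous signal as a stackable line plot
stacked_line_plot <- function(data, y_col, y_label, y_breaks, line_color,
                              time_range, top_margin = 0) {
  # Drop rows where the signal is missing, like na.omit() in the notebook
  data <- data[!is.na(data[[y_col]]), ]
  ggplot(data) +
    geom_line(aes(x = timestamps, y = .data[[y_col]]),
              linewidth = 0.4, alpha = 0.75, color = line_color) +
    scale_x_continuous(breaks = seq(0, time_range[2], by = 120),
                       expand = c(0.01, 0.01), limits = time_range) +
    scale_y_continuous(breaks = y_breaks, limits = range(y_breaks)) +
    ylab(y_label) +
    theme_minimal() +
    theme(
      axis.title.x = element_blank(), axis.text.x = element_blank(),
      axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
      plot.margin = margin(t = top_margin, r = 5.5, b = 0, l = 5.5))
}

# Synthetic oil temperature trace, for illustration only
toy_signals <- data.frame(timestamps = seq(0, 240, by = 0.1))
toy_signals$t_oil <- 80 + 10 * sin(toy_signals$timestamps / 40)

toy_oil_plot <- stacked_line_plot(
  toy_signals, "t_oil", "Oil Temperature [C]",
  y_breaks = seq(70, 110, by = 5), line_color = "firebrick4",
  time_range = c(0, 240)
)
```

The top plot of a stack would pass `top_margin = 5.5`, matching the margin used for the speed plot.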


```{r}
coasting_plot <- df %>%
  select(timestamps, coasting) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = coasting), linewidth = 0.4, alpha = 0.75, color = "gray35") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  # coasting is numeric 0/1, so use a continuous scale with two labelled breaks
  scale_y_continuous(breaks = c(0, 1), limits = c(0, 1), labels = c("off", "on")) +
  labs(x = "Time [s]", y = "Coasting") +
  theme_minimal() +
  theme(
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0)),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 5.5, l = 5.5))
```
The plot for the discrete coasting signal is a little different. Firstly, the signal is stored as numeric 0/1, so we keep a continuous y scale but restrict the breaks to 0 and 1 and label them `off` and `on`; passing numeric limits to `scale_y_discrete()` triggers a "Continuous limits supplied to discrete scale" warning. Secondly, as this plot sits at the bottom of the graph stack, we let the x axis title and breaks be drawn - these serve the entire stack.
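
Because coasting only switches between two levels, a step line may represent it more faithfully than straight segments between samples. The sketch below swaps in `geom_step()` for the notebook's `geom_line()` on made-up data, using a continuous y scale with breaks at 0 and 1 labelled off/on; treat it as one possible variation, not the notebook's method.

```{r}
library(ggplot2)

# Toy on/off signal, for illustration only
toy_coasting <- data.frame(
  timestamps = 0:9,
  coasting = c(0, 0, 1, 1, 1, 0, 0, 1, 0, 0)
)

# geom_step() holds each value until the next sample instead of interpolating
toy_coasting_plot <- ggplot(toy_coasting) +
  geom_step(aes(x = timestamps, y = coasting), color = "gray35") +
  scale_y_continuous(breaks = c(0, 1), limits = c(0, 1), labels = c("off", "on")) +
  labs(x = "Time [s]", y = "Coasting") +
  theme_minimal()
```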


```{r}
g1 <- ggplotGrob(speed_plot)
g2 <- ggplotGrob(oil_temp_plot)
g3 <- ggplotGrob(amb_temp_plot)
g4 <- ggplotGrob(coasting_plot)
```
Our intention is to stack the plots on top of each other. We can use a grid to do the layout for us - it could also do multi-column layouts, but here we use a single column as a "stack". In order to use the grid, we need to convert the plots to grid graphical objects, a.k.a. grobs.
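
To get a feel for what a grob is, the sketch below converts a throwaway plot and inspects the result. `ggplotGrob()` returns a gtable whose `widths` field holds one entry per layout column, which is the field the alignment step works on.

```{r}
library(ggplot2)

# Throwaway plot, for illustration only
toy_plot <- ggplot(data.frame(x = 1:3, y = c(2, 4, 8)), aes(x, y)) +
  geom_line()

toy_grob <- ggplotGrob(toy_plot)

class(toy_grob)           # includes "gtable"
length(toy_grob$widths)   # number of layout columns
```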


```{r}
maxWidth = grid::unit.pmax(g1$widths, g2$widths, g3$widths, g4$widths)
g1$widths <- as.list(maxWidth)
g2$widths <- as.list(maxWidth)
g3$widths <- as.list(maxWidth)
g4$widths <- as.list(maxWidth)
```
The grobs might end up with slightly differing widths. We want equal widths in all grobs so that the breaks on the time axis align nicely across all plots. To do this, we take the element-wise maximum of the grob widths and force it onto all grobs. Credit: https://gist.github.com/tomhopper/faa24797bb44addeba79
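
The sketch below reproduces the alignment idea on two toy plots whose y axis labels have different lengths, which is what typically makes the widths differ. It assigns the unit vector directly instead of going through `as.list()`; both plots use the same theme, so their width vectors have the same length.

```{r}
library(ggplot2)
library(grid)

# Short y labels vs long y labels give different axis widths
narrow_grob <- ggplotGrob(
  ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) + geom_line()
)
wide_grob <- ggplotGrob(
  ggplot(data.frame(x = 1:3, y = c(1e5, 2e5, 3e5)), aes(x, y)) + geom_line()
)

# Element-wise maximum over the layout columns of both grobs
shared_widths <- unit.pmax(narrow_grob$widths, wide_grob$widths)
narrow_grob$widths <- shared_widths
wide_grob$widths <- shared_widths
```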


```{r, fig.height=8}
grid.arrange(
  arrangeGrob(g1,g2,g3,g4,
    ncol = 1,
    heights = c(1,1,1,.5)
  )
)
```
Now we can finally arrange the grobs into a grid, which also draws the result. We can control the height of each grob separately - here we give the coasting signal half the height of the others, as it does not need as much vertical real estate as the continuous signals do.
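
The heights are relative weights, so the share of the figure each panel receives is its weight divided by the sum of all weights. A quick sketch of that arithmetic (the names on the vector are only labels for readability):

```{r}
panel_heights <- c(speed = 1, t_oil = 1, t_amb = 1, coasting = 0.5)

# Fraction of the total figure height per panel
panel_shares <- panel_heights / sum(panel_heights)
round(panel_shares, 3)
```

With these weights, each continuous panel gets 2/7 of the height and the coasting panel 1/7.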

## Conclusions

* Using ggplot2 with grids can yield high quality time series plots that mix continuous and discrete signals
* It's a little tedious to get the plots stacked, aligned and scaled
* Vertical real estate is scarce - adding more signals will force viewers to scroll in order to see the time axis
* At present, we know of no easy way to overlay several discrete signals on top of continuous signals - this would save vertical space without a real danger of confusing viewers
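
One direction that might be worth exploring for the last point, offered only as an untested idea on toy data and not as a result of this notebook: shade the intervals where a discrete signal is on behind the continuous line. The sketch assumes a 0/1 signal, uses `rle()` to find the on-runs, and handles a single discrete signal only.

```{r}
library(ggplot2)

# Toy trip data, for illustration only
toy_trip <- data.frame(
  timestamps = 0:9,
  speed = c(50, 52, 55, 54, 53, 50, 48, 49, 51, 52),
  coasting = c(0, 0, 1, 1, 1, 0, 0, 1, 0, 0)
)

# Run-length encode the signal to locate consecutive "on" samples
runs <- rle(toy_trip$coasting)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1
is_on <- runs$values == 1

# Each span runs from the first "on" sample to the sample where it turns off
coasting_spans <- data.frame(
  start = toy_trip$timestamps[run_start[is_on]],
  end = toy_trip$timestamps[pmin(run_end[is_on] + 1, nrow(toy_trip))]
)

overlay_plot <- ggplot() +
  geom_rect(data = coasting_spans,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            fill = "gray35", alpha = 0.2) +
  geom_line(data = toy_trip, aes(x = timestamps, y = speed),
            color = "deepskyblue4") +
  labs(x = "Time [s]", y = "Speed [km/h]") +
  theme_minimal()
```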


