---
title: "Time series data plotting"
output: html_notebook
---

In this notebook, we'll investigate time series plotting in R, aiming for a visually pleasing, high-quality plot template that can be reused for other plots in the future.

Here, we look at engine oil temperature on a relatively hot day, over a drive that alternates between B-road and motorway sections. We also include coasting (off/on) as an example of plotting a discrete signal. The notebook is about plot visuals; to properly assess oil temperature, at least engine load would also have to be considered.

## Libraries
```{r}
library(tidyverse)
library(grid)
library(gridExtra)
```
First, let's load the libraries we're going to use. Note that when running the notebook in RStudio, these packages must already be installed, for example with `install.packages("<package-name>")`.

## Preparing data
```{r}
coasting <- read_csv("hot.ChannelGroup_0_CAN1_-_message_dsg_10hz_0x359.csv")
oil_temp <- read_csv("hot.ChannelGroup_3_CAN2_-_message_engine_7_50hz_0x588.csv")
speed <- read_csv("hot.ChannelGroup_1_CAN1_-_message_kombi_1_40hz_0x320.csv")
ambient_temperature <- read_csv("hot.ChannelGroup_2_CAN1_-_message_mfd_50hz_0x527.csv")
```
Then, we'll load the data CSVs. These were extracted from a CAN dump and interpolated to 0.1 s, so each CSV has the same timestamps and a single data column. The CSV names represent the originating channel group, so the actual signal name does not appear in the file name. This is a consequence of how the export tools name their output rather than a conscious choice.

```{r}
colnames(coasting)[2] = "coasting"
colnames(oil_temp)[2] = "t_oil"
colnames(ambient_temperature)[2] = "t_amb"
colnames(speed)[2] = "speed"
```
The raw export CSVs have rather verbose column names, so let's rename the columns to something more human-readable.
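
Loading and renaming can also be folded into one step. The sketch below is an illustration under assumptions rather than the notebook's method: `read_signal()` is a hypothetical helper, and the temporary file with its `CAN1.demo.signal` column merely stands in for a real export. Passing `show_col_types = FALSE` quiets the column specification message that `read_csv()` otherwise prints.

```{r}
library(readr)

# Hypothetical helper (not part of the original notebook): read a
# two-column signal CSV and give the data column a short name.
read_signal <- function(path, name) {
  signal <- read_csv(path, show_col_types = FALSE)  # suppress the column spec message
  colnames(signal)[2] <- name
  signal
}

# Toy file standing in for one of the exported CSVs
demo_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(timestamps = c(0, 0.1, 0.2), CAN1.demo.signal = c(0, 1, 0)),
  demo_path
)

demo_signal <- read_signal(demo_path, "coasting")
colnames(demo_signal)
```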

```{r}
df_list <- list(coasting, oil_temp, ambient_temperature, speed)
df <- df_list %>% reduce(full_join, by='timestamps')
head(df)
```
As all CSVs have the same timestamps and the same number of samples, we can join all the data into a single dataframe for easier handling. To do this, we first construct a list of all datasets and then pass it to the `reduce` function, which joins the rows on the common `timestamps` column. The first few rows of the resulting dataframe show that oil temperature starts at a cool -59 before the ignition is switched on. Rather than fixing this in the data, we'll just hide it later by limiting the graph axis.
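
To see what the join does when timestamps do not line up perfectly, here is a small sketch on made-up tibbles (`demo_a`, `demo_b`, `demo_c` and their `sig_*` columns are illustrative names, not notebook data). A full join keeps every timestamp and fills the gaps with `NA`, which is one reason to drop missing values before plotting.

```{r}
library(dplyr)
library(purrr)

# Three toy signals; demo_c has a timestamp (0.3) the others lack
demo_a <- tibble(timestamps = c(0, 0.1, 0.2), sig_a = c(1, 2, 3))
demo_b <- tibble(timestamps = c(0, 0.1, 0.2), sig_b = c(10, 20, 30))
demo_c <- tibble(timestamps = c(0, 0.1, 0.3), sig_c = c(5, 6, 7))

# Fold the list pairwise with full_join, matching rows on timestamps
demo_joined <- list(demo_a, demo_b, demo_c) %>%
  reduce(full_join, by = "timestamps")
demo_joined
```

The result has four rows: the timestamp 0.2 has no `sig_c` value, and 0.3 has neither `sig_a` nor `sig_b`.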

```{r}
tail(df)
TIME_RANGE = c(0, 2150)
```
The last few rows of the dataframe show that the trip lasted 2150 seconds. Let's define the time range of interest as a two-element numeric vector; this way we can later easily zoom the graphs to a specific time frame.

## Plotting

```{r}
speed_plot <- df %>%
  select(timestamps, speed) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = speed), linewidth = 0.4, alpha = 0.75, color = "deepskyblue4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(0, 130, by = 10), limits = c(0, 130)) +
  ylab("Speed [km/h]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 5.5, r = 5.5, b = 0, l = 5.5))

```
With the dataframe prepared, we can create the plots for each variable. There's a lot going on here, so let's break down the plot for vehicle speed:

* Take the whole dataframe and select the `timestamps` and `speed` columns
* Pass the selection through `na.omit()`, which drops rows that contain missing values
* Feed the filtered result to a new `ggplot()` object
* Add a line plot with timestamps on the x axis and speed on the y axis
* The x axis is continuous with a break every 120 seconds, limited to the configured time range rather than all of the data, and padded by about 1% on both sides
* The y axis is continuous with range 0..130 and a break every 10 km/h
* The y axis label is `Speed [km/h]`
* The plot uses the minimal theme, which removes the default background and most other decoration
* Then we override a few theme settings
  * the x axis shows neither a title nor break labels
  * the y axis title margin is adjusted to push the title a little further from the break labels
  * the overall plot margins are adjusted for stacking multiple plots on top of each other

Note that we always start the x break sequence from zero. This allows the time range to start from any second while the breaks stay zero-based.
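
The effect of zero-based breaks is easy to check outside of ggplot. In this sketch `demo_range` is a made-up zoom window, and the filtering step only imitates the fact that breaks outside the axis limits are not drawn.

```{r}
demo_range <- c(650, 1000)                      # hypothetical zoom window in seconds
demo_breaks <- seq(0, demo_range[2], by = 120)  # same break rule as in the plots

# Keep only the breaks that fall inside the zoom window
visible_breaks <- demo_breaks[demo_breaks >= demo_range[1] &
                              demo_breaks <= demo_range[2]]
visible_breaks
# 720 840 960
```

The visible breaks stay multiples of 120 even though the window starts at 650.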


```{r}
oil_temp_plot <- df %>%
  select(timestamps, t_oil) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_oil), linewidth = 0.4, alpha = 0.75, color = "firebrick4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(70, 110, by = 5), limits = c(70, 110)) +
  ylab("Oil Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
```
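
The y limits of 70 to 110 are also what hide the -59 readings recorded before ignition. By default, continuous ggplot2 scales replace values outside the limits with `NA`, which is the behaviour of `scales::censor()`, and `geom_line()` then drops those rows with a warning. The sketch below applies that function directly to a few made-up readings (`demo_t_oil` is not notebook data).

```{r}
demo_t_oil <- c(-59, -59, 75, 102)   # made-up readings, not notebook data

# Values outside the 70..110 range become NA, in-range values pass through
demo_censored <- scales::censor(demo_t_oil, range = c(70, 110))
demo_censored
# NA NA 75 102
```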

```{r}
amb_temp_plot <- df %>%
  select(timestamps, t_amb) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_amb), linewidth = 0.4, alpha = 0.75, color = "darkgreen") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(22, 32, by = 1), limits = c(22, 32)) +
  ylab("Ambient Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
```
The plots for oil temperature and ambient temperature are similar, differing only in color, y axis range, title and margins.
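
Since only a handful of settings change between these plots, the shared parts could be wrapped in a function. The following is a minimal sketch, not the notebook's own code: `signal_plot()` and its parameter names are hypothetical, and `demo_data` is synthetic data used only to show a call.

```{r}
library(ggplot2)

# Hypothetical helper: one continuous signal as a stackable line plot.
# Only the column, color, y breaks, label and top margin vary per plot.
signal_plot <- function(data, column, color, y_breaks, y_label,
                        time_range, top_margin = 0) {
  data <- data[!is.na(data[[column]]), ]  # drop rows with a missing signal value
  ggplot(data) +
    geom_line(aes(x = timestamps, y = .data[[column]]),
              linewidth = 0.4, alpha = 0.75, color = color) +
    scale_x_continuous(breaks = seq(0, time_range[2], by = 120),
                       expand = c(0.01, 0.01), limits = time_range) +
    scale_y_continuous(breaks = y_breaks, limits = range(y_breaks)) +
    ylab(y_label) +
    theme_minimal() +
    theme(
      axis.title.x = element_blank(), axis.text.x = element_blank(),
      axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
      plot.margin = margin(t = top_margin, r = 5.5, b = 0, l = 5.5))
}

# Synthetic data standing in for the joined dataframe
demo_data <- data.frame(timestamps = seq(0, 240, by = 0.1))
demo_data$t_oil <- 90 + 5 * sin(demo_data$timestamps / 40)

demo_oil_plot <- signal_plot(demo_data, "t_oil", "firebrick4",
                             seq(70, 110, by = 5), "Oil Temperature [C]",
                             time_range = c(0, 240))
```

The bottom plot of a stack would still need its own variant, because it keeps the x axis title and break labels.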


```{r}
coasting_plot <- df %>%
  select(timestamps, coasting) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = coasting), linewidth = 0.4, alpha = 0.75, color = "gray35") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = c(0, 1), minor_breaks = NULL, limits = c(0, 1), labels = c("off", "on")) +
  labs(x = "Time [s]", y = "Coasting") +
  theme_minimal() +
  theme(
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0)),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 5.5, l = 5.5))
```
The plot for the discrete coasting signal is a little different. First, the signal only takes the values 0 and 1, so we restrict the y axis to those two breaks and label them `off` and `on`. The scale itself stays continuous because the column is numeric; passing numeric limits to `scale_y_discrete()` triggers a "Continuous limits supplied to discrete scale" warning. Second, as this plot sits at the bottom of the graph stack, we let the x axis title and break labels be drawn; they serve the entire stack.


```{r}
g1 <- ggplotGrob(speed_plot)
g2 <- ggplotGrob(oil_temp_plot)
g3 <- ggplotGrob(amb_temp_plot)
g4 <- ggplotGrob(coasting_plot)
```
Our intention is to stack the plots on top of each other. We can let a grid do the layout for us; it could also arrange plots in multiple columns, but here we use a single column as a "stack". To use the grid, we need to convert the plots to grid graphical objects, also known as grobs.


```{r}
maxWidth = grid::unit.pmax(g1$widths, g2$widths, g3$widths, g4$widths)
g1$widths <- as.list(maxWidth)
g2$widths <- as.list(maxWidth)
g3$widths <- as.list(maxWidth)
g4$widths <- as.list(maxWidth)
```
The grobs may end up with slightly different widths. We want equal widths in all grobs so that the breaks on the time axis align across all plots. To achieve this, we take the element-wise maximum of the grob widths and apply it to every grob. Credit: https://gist.github.com/tomhopper/faa24797bb44addeba79
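
The width mismatch typically comes from y axis labels of different lengths. Here is a small sketch on made-up data (`demo_xy`, `demo_g1` and `demo_g2` are illustrative names) in which one plot has much wider tick labels than the other. It assigns the unit vector returned by `unit.pmax()` directly, which is an assumption that holds for whole-vector assignment in recent versions of grid.

```{r}
library(ggplot2)
library(grid)

# Two toy plots whose y tick labels differ in width
demo_xy <- data.frame(x = 1:10, small = 1:10, large = (1:10) * 1e5)
demo_g1 <- ggplotGrob(ggplot(demo_xy, aes(x, small)) + geom_line())
demo_g2 <- ggplotGrob(ggplot(demo_xy, aes(x, large)) + geom_line())

# Element-wise maximum of the column widths, applied to both grobs
demo_max <- unit.pmax(demo_g1$widths, demo_g2$widths)
demo_g1$widths <- demo_max
demo_g2$widths <- demo_max
```

After this step both grobs share the same column widths, so their panels line up when stacked.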


```{r, fig.height=8}
grid.arrange(
  arrangeGrob(g1,g2,g3,g4,
    ncol = 1,
    heights = c(1,1,1,.5)
  )
)
```
Now we can finally arrange the grobs in a grid, which draws the plot. We can control the height of each grob separately; here the coasting signal gets half the height of the others, as it does not need as much vertical real estate as the continuous signals do.
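
The `heights` values are relative weights rather than absolute sizes. A quick calculation, using a throwaway variable `demo_heights`, shows the share of the total figure height each row receives:

```{r}
demo_heights <- c(1, 1, 1, 0.5)            # same weights as in the call above
demo_shares <- round(demo_heights / sum(demo_heights), 3)
demo_shares
# 0.286 0.286 0.286 0.143
```

With `fig.height=8`, each continuous signal therefore gets roughly 2.3 inches and the coasting row about 1.1 inches.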

## Conclusions

* Using ggplot2 with grids can yield high-quality time series plots that mix continuous and discrete signals
* It's a little tedious to get the plots stacked, aligned and scaled
* Vertical real estate is scarce: with more signals, viewers have to scroll to see the time axis
* At present, we know of no easy way to overlay several discrete signals on top of continuous signals; doing so would save vertical space without a real danger of confusing viewers


