In this notebook, we’ll investigate time series plotting in R to
achieve a visually pleasing, reusable, high-quality plot template for
other plots in the future.
Here, we look at engine oil temperature on a relatively hot day
across periods of B-road and motorway driving. We also include
coasting (off/on) as an example of plotting a discrete signal. The
notebook is about plot visuals - to properly assess oil temperature,
at least engine load should also be considered as a factor.
Preparing data
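library(tidyverse)  # read_csv(), %>%, reduce(), full_join(), ggplot()
library(gridExtra)  # grid.arrange(), arrangeGrob()
coasting <- read_csv("hot.ChannelGroup_0_CAN1_-_message_dsg_10hz_0x359.csv")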
Rows: 21501 Columns: 2
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): timestamps, CAN1.dsg_10hz.coasting
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
oil_temp <- read_csv("hot.ChannelGroup_3_CAN2_-_message_engine_7_50hz_0x588.csv")
Rows: 21501 Columns: 2
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): timestamps, CAN2.engine_7_50hz.oil_temperature
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
speed <- read_csv("hot.ChannelGroup_1_CAN1_-_message_kombi_1_40hz_0x320.csv")
Rows: 21501 Columns: 2
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): timestamps, CAN1.kombi_1_40hz.kombi_speed_actual
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ambient_temperature <- read_csv("hot.ChannelGroup_2_CAN1_-_message_mfd_50hz_0x527.csv")
Rows: 21501 Columns: 2
── Column specification ─────────────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): timestamps, CAN1.mfd_50hz.ambient_temperature
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Then, we’ll load the data CSVs. These are extracted from CAN dump
data and interpolated to 0.1 s, so each CSV has the same timestamps
and a single data column. The CSV names represent the originating
channel group, so the actual signal name is not present in the
CSV name. This comes down to how other tools export the data rather than
a conscious choice.
colnames(coasting)[2] = "coasting"
colnames(oil_temp)[2] = "t_oil"
colnames(ambient_temperature)[2] = "t_amb"
colnames(speed)[2] = "speed"
The raw export CSVs have somewhat verbose column names, so let’s
rename the columns to a more human-readable format.
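Since every CSV gets the same read-then-rename treatment, the two steps could be folded into one helper. The following is only a sketch: `read_signal` is a hypothetical name that does not appear in the notebook, and the demo writes a throwaway file with an invented column name instead of using the real exports.

```r
library(readr)

# Hypothetical helper (not part of the notebook): read a two-column signal
# CSV quietly and give the value column a short name.
read_signal <- function(path, name) {
  signal <- read_csv(path, show_col_types = FALSE)
  colnames(signal)[2] <- name
  signal
}

# Throwaway file standing in for a real export; the column name is made up.
demo_path <- tempfile(fileext = ".csv")
write_csv(data.frame(timestamps = c(0, 0.1), CAN1.demo.signal = c(1, 2)), demo_path)

demo <- read_signal(demo_path, "signal")
colnames(demo)
```

Passing `show_col_types = FALSE` also quiets the column specification message that `read_csv` prints for each file.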
df_list <- list(coasting, oil_temp, ambient_temperature, speed)
df <- df_list %>% reduce(full_join, by='timestamps')
head(df)
| timestamps | coasting | t_oil | t_amb | speed |
|---|---|---|---|---|
| 0.0 | 0 | -59 | 28.5 | 0 |
| 0.1 | 0 | -59 | 28.5 | 0 |
| 0.2 | 0 | -59 | 28.5 | 0 |
| 0.3 | 0 | -59 | 28.5 | 0 |
| 0.4 | 0 | -59 | 28.5 | 0 |
| 0.5 | 0 | -59 | 28.5 | 0 |
As all CSVs have the same timestamps and the same number of samples, we
can join all the data into a single dataframe for easier handling. To do
this, let’s first construct a list of all datasets and then pass that to
the `reduce` function to join the rows on the common column
`timestamps`. Showing the first few rows of the resulting dataframe,
we see that oil temperature starts with a cool -59 value before
the ignition is on. Rather than fixing this in the data, we’ll just
filter it out later by limiting the graph axis.
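To see the join pattern in isolation, here is a minimal sketch on toy data. The three small tibbles and their column names are made up for illustration; only the `reduce(full_join, by = ...)` pattern is taken from the notebook.

```r
library(dplyr)
library(purrr)
library(tibble)

# Three toy signals sharing the same timestamps (values are invented).
sig_a <- tibble(timestamps = c(0, 0.1, 0.2), a = c(1, 2, 3))
sig_b <- tibble(timestamps = c(0, 0.1, 0.2), b = c(10, 20, 30))
sig_c <- tibble(timestamps = c(0, 0.1, 0.2), c = c(0, 1, 0))

# reduce() joins the first two tables, then joins the result with the
# third, and so on through the list.
joined <- list(sig_a, sig_b, sig_c) %>% reduce(full_join, by = "timestamps")
joined
```

The columns of the result follow the order of the list, which is why the joined dataframe above starts with `timestamps` and `coasting`.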
| timestamps | coasting | t_oil | t_amb | speed |
|---|---|---|---|---|
| 2149.5 | 0 | 102 | 22.5 | 0 |
| 2149.6 | 0 | 102 | 22.5 | 0 |
| 2149.7 | 0 | 102 | 22.5 | 0 |
| 2149.8 | 0 | 102 | 22.5 | 0 |
| 2149.9 | 0 | 102 | 22.5 | 0 |
| 2150.0 | 0 | 102 | 22.5 | 0 |
By looking at the last few rows of the dataframe we see the trip
length was 2150 seconds. Let’s define a time range of interest as an
integer vector - this way we can later easily zoom the graphs to a
specific time frame.
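The definition itself is not shown here, so the following is an assumption about what it could look like: a two-element integer vector holding the start and end of the window in seconds, named `TIME_RANGE` as in the plotting code.

```r
# Assumed definition (the original chunk is not shown): the full trip,
# in seconds, as c(start, end).
TIME_RANGE <- c(0L, 2150L)

# To zoom into, say, the first ten minutes instead:
# TIME_RANGE <- c(0L, 600L)
```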
Plotting
speed_plot <- df %>%
  select(timestamps, speed) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = speed), linewidth = 0.4, alpha = 0.75, color = "deepskyblue4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(0, 130, by = 10), limits = c(0, 130)) +
  ylab("Speed [km/h]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 5.5, r = 5.5, b = 0, l = 5.5))
Having prepared the dataframe, we can create the plots for each variable.
There’s a lot going on here, so let’s break down the plot for vehicle
speed:
- Take the dataframe and select the `timestamps` and `speed` columns
- Feed the selection through `na.omit()`, which drops rows that
  contain a missing value
- Feed the filtered result to a new `ggplot()` object
- Which consists of a line plot having timestamps as x axis and speed
  as y axis
- The x axis is a continuous scale with breaks every 120 seconds, limited
  to the configured time range as opposed to all of the data, and it has
  about 1% padding on both sides
- The y axis is a continuous scale with range 0..130 and a break every 10
  km/h
- The y axis label is `Speed [km/h]`
- The plot uses a minimal theme, which basically resets background
  colors etc.
- But then we’ll override a few theme settings
  - the x axis will show neither title nor break texts
  - the y axis title margin is adjusted to push the title a little
    further from the break texts
  - overall plot margins are adjusted for stacking multiple plots on top
    of each other
Note that we always start the x break sequence from zero. This allows
the time range to start from any second while the breaks stay
zero-based.
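The effect of zero-based breaks is easiest to see with a window that does not start on a multiple of 120. In this small sketch the window `c(650L, 1200L)` is a made-up example, not a range used in the notebook.

```r
time_range <- c(650L, 1200L)  # hypothetical zoom window in seconds

# Breaks as in the plots: always counted from zero.
zero_based <- seq(0, time_range[2], by = 120)

# Alternative for comparison: breaks counted from the window start.
window_based <- seq(time_range[1], time_range[2], by = 120)

# Only breaks that fall inside the axis limits are drawn.
visible <- zero_based[zero_based >= time_range[1]]
visible       # 720 840 960 1080 1200
window_based  # 650 770 890 1010 1130
```

With zero-based breaks the tick positions stay the same whatever window is chosen, so zoomed and full-trip graphs remain easy to compare.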
oil_temp_plot <- df %>%
  select(timestamps, t_oil) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_oil), linewidth = 0.4, alpha = 0.75, color = "firebrick4") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(70, 110, by = 5), limits = c(70, 110)) +
  ylab("Oil Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
amb_temp_plot <- df %>%
  select(timestamps, t_amb) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = t_amb), linewidth = 0.4, alpha = 0.75, color = "darkgreen") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_continuous(breaks = seq(22, 32, by = 1), limits = c(22, 32)) +
  ylab("Ambient Temperature [C]") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(), axis.text.x = element_blank(),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 0, l = 5.5))
The plots for oil temperature and ambient temperature are
similar, differing only in color, y axis range, title and margins.
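Because the three continuous plots share almost everything, the common layers could be wrapped in a function. This is one possible sketch rather than the notebook's code: `signal_plot` and its parameter names are hypothetical, and the toy dataframe only stands in for the joined data.

```r
library(ggplot2)

# Hypothetical helper (not part of the notebook) wrapping the layers the
# continuous-signal plots have in common.
signal_plot <- function(data, column, label, color, y_breaks, time_range,
                        top_margin = 0) {
  data <- na.omit(data[, c("timestamps", column)])
  ggplot(data) +
    geom_line(aes(x = timestamps, y = .data[[column]]),
              linewidth = 0.4, alpha = 0.75, color = color) +
    scale_x_continuous(breaks = seq(0, time_range[2], by = 120),
                       expand = c(0.01, 0.01), limits = time_range) +
    # y limits are taken from the first and last break
    scale_y_continuous(breaks = y_breaks, limits = range(y_breaks)) +
    ylab(label) +
    theme_minimal() +
    theme(
      axis.title.x = element_blank(), axis.text.x = element_blank(),
      axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
      plot.margin = margin(t = top_margin, r = 5.5, b = 0, l = 5.5))
}

# Toy data standing in for the joined dataframe (values are invented).
toy <- data.frame(timestamps = seq(0, 600, by = 0.1))
toy$t_oil <- 80 + 20 * toy$timestamps / 600

oil_sketch <- signal_plot(toy, "t_oil", "Oil Temperature [C]", "firebrick4",
                          seq(70, 110, by = 5), c(0, 600))
```

The topmost plot would pass `top_margin = 5.5` to match the speed plot's margins, while the bottom plot still needs its own theme because it draws the x axis.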
coasting_plot <- df %>%
  select(timestamps, coasting) %>%
  na.omit() %>%
  ggplot() +
  geom_line(aes(x = timestamps, y = coasting), linewidth = 0.4, alpha = 0.75, color = "gray35") +
  scale_x_continuous(breaks = seq(0, TIME_RANGE[2], by = 120), expand = c(0.01, 0.01), limits = TIME_RANGE) +
  scale_y_discrete(breaks = seq(0, 1, by = 1), limits = c(0, 1), labels = c("off", "on")) +
  labs(x = "Time [s]", y = "Coasting") +
  theme_minimal() +
  theme(
    axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0)),
    axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
    plot.margin = margin(t = 0, r = 5.5, b = 5.5, l = 5.5))
Warning: Continuous limits supplied to discrete scale.
ℹ Did you mean `limits = factor(...)` or `scale_*_continuous()`?
The plot for the discrete coasting signal is a little bit different.
First, we set the y axis scale to discrete and provide labels for
the possible discrete values. ggplot2 warns here because numeric limits
are passed to a discrete scale. Second, as this plot will be at the bottom
of the graph stack, we let the x axis title and breaks be drawn -
these will serve the entire plot.
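The warning suggests `scale_*_continuous()` as one way out. As a variation on the notebook's approach (not what it actually does), the 0/1 signal can stay numeric and the two breaks can be labelled instead. The toy data below is invented for the sketch.

```r
library(ggplot2)

# Toy 0/1 signal standing in for the coasting column (values are invented).
toy <- data.frame(timestamps = seq(0, 10, by = 0.1))
toy$coasting <- as.numeric(toy$timestamps > 3 & toy$timestamps < 6)

coasting_sketch <- ggplot(toy) +
  geom_line(aes(x = timestamps, y = coasting),
            linewidth = 0.4, alpha = 0.75, color = "gray35") +
  # Continuous scale with exactly two breaks, labelled like a discrete one.
  scale_y_continuous(breaks = c(0, 1), limits = c(0, 1),
                     labels = c("off", "on")) +
  labs(x = "Time [s]", y = "Coasting") +
  theme_minimal()
```

The axis still reads off/on, but the limits are now valid for the scale type.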
g1 <- ggplotGrob(speed_plot)
Warning: Removed 1 row containing missing values (`geom_line()`).
g2 <- ggplotGrob(oil_temp_plot)
Warning: Removed 15 rows containing missing values (`geom_line()`).
g3 <- ggplotGrob(amb_temp_plot)
Warning: Removed 1 row containing missing values (`geom_line()`).
g4 <- ggplotGrob(coasting_plot)
Warning: Removed 1 row containing missing values (`geom_line()`).
Our intention is to stack the plots on top of each other. We can use
a grid to do this layout for us. It can also do multi-column layouts, but
here we just use a single column as a “stack”. In order to use the grid,
we’ll need to convert the plots to grid graphical objects, aka grobs.
maxWidth = grid::unit.pmax(g1$widths, g2$widths, g3$widths, g4$widths)
g1$widths <- as.list(maxWidth)
g2$widths <- as.list(maxWidth)
g3$widths <- as.list(maxWidth)
g4$widths <- as.list(maxWidth)
The grobs might get slightly differing widths. We want equal
widths in all grobs so that the breaks on the time axis align nicely
across all plots. To do this, we can take the maximum of each layout
column width across the grobs and force that on all grobs. Credit: https://gist.github.com/tomhopper/faa24797bb44addeba79
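With more plots, repeating one line per grob gets tedious. The same idea can be sketched over a list of grobs; this is an alternative formulation under the assumption that assigning the unit vector directly to `widths` works in the installed grid version, and the two `mtcars` plots are only stand-ins.

```r
library(ggplot2)

# Two stand-in plots whose y tick labels have different widths.
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_line()
p2 <- ggplot(mtcars, aes(wt, disp)) + geom_line()
grobs <- lapply(list(p1, p2), ggplotGrob)

# Element-wise maximum of the layout column widths across all grobs.
max_width <- do.call(grid::unit.pmax, lapply(grobs, function(g) g$widths))

# Force the same widths on every grob so the panels line up.
grobs <- lapply(grobs, function(g) {
  g$widths <- max_width
  g
})
```

The resulting list can be handed to `arrangeGrob()` through its `grobs` argument.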
grid.arrange(
  arrangeGrob(g1, g2, g3, g4,
    ncol = 1,
    heights = c(1, 1, 1, .5)
  )
)

Now we can finally arrange the grobs in a grid, which also draws the
result. We can control the height of each grob separately - here we
give the coasting signal half the height of the other plots, as it does
not need as much vertical real estate as the continuous signals do.
