Interactive exploratory data analysis (EDA) of sensor data with Pandas: Multivariate time series data

Visualizing multivariate time series data with the pandas plotting API

This post shows the basic look and feel of the pandas plotting API applied to typical multivariate sensor data, with each sensor represented as a time series. Feel free to visit the source code repository, press the “Binder” button to open the repository in a Binder environment and explore the interactivity of each plot type in the notebook time_series_multivariate.ipynb. Of course, not every plot type makes sense for visualizing multivariate time series data. However, for the sake of completeness, and to make clear which plot types make no sense, I’ve added GIFs for all of them. If a plotting backend does not support a plot type, I skipped the GIF in the corresponding section. In the Binder environment I’ve tried to plot with all plot types to force the output of the exceptions. These exceptions do not stem from wrong usage of the pandas plotting API; they help to figure out that a plot type is simply not supported (yet).
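Here is a minimal sketch of how one might force these exceptions outside the notebook. It relies only on pandas’ public `plotting.backend` option and `DataFrame.plot(kind=...)`. The helper `probe_plot_kinds` and the list `PLOT_KINDS` are my own illustration and are not taken from the notebook.

```python
import pandas as pd

# Plot kinds accepted by DataFrame.plot(kind=...).
PLOT_KINDS = ["line", "bar", "barh", "hist", "box", "kde",
              "density", "area", "pie", "scatter", "hexbin"]


def probe_plot_kinds(df, backend):
    """Try df.plot(kind=...) for every kind with the given backend (hypothetical helper).

    Returns a dict mapping each kind to "ok" or to the name of the
    exception raised (e.g. an unsupported kind, or scatter/hexbin
    called without x and y columns).
    """
    previous = pd.get_option("plotting.backend")
    try:
        # For backends other than matplotlib, pandas imports the backend
        # while validating the option, so a missing package fails here.
        pd.set_option("plotting.backend", backend)
    except (ImportError, ValueError) as exc:
        return {kind: type(exc).__name__ for kind in PLOT_KINDS}
    results = {}
    try:
        for kind in PLOT_KINDS:
            try:
                df.plot(kind=kind)
                results[kind] = "ok"
            except Exception as exc:  # record the error and keep probing
                results[kind] = type(exc).__name__
    finally:
        # Restore the backend that was active before probing.
        pd.set_option("plotting.backend", previous)
    return results
```

You could call it once per backend, e.g. `probe_plot_kinds(df_column_wise, "pandas_bokeh")` with the DataFrame constructed below. The backend names (`"altair"` from altair_pandas, `"pandas_bokeh"`, and `"holoviews"` registered by hvplot) are assumptions about what each package registers with pandas.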

Again, just as pandas Series ease working with univariate time series data, pandas DataFrames significantly ease working with several time series data sets. One reason is pandas’ built-in support for visualizing DataFrames via the pandas plotting API. Usually it’s most reasonable to construct DataFrames with Series as columns. Most plot types visualize each column the same way as when plotting the Series on its own. Of course, plot types which relate data sets to each other require selecting specific columns before plotting (e.g. the hexbin plot and the scatter plot).

In the example we construct univariate fake data of an ideal temperature sensor as a pandas time series (a Series with a DatetimeIndex), temperature_series. Ideal means that we ignore sensor data uncertainty for now.

import random
import pandas as pd

temperature_d = random.sample(range(20, 20+10), 10)  # 10 distinct integer temperatures in [20, 30)
temperature_dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(data=temperature_d, index=temperature_dti, name="Temperature")

In addition we construct univariate fake data of an ideal humidity sensor as pandas Time Series, humidity_series.

d = list(range(69, 59, -1))  # 69, 68, ..., 60
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
humidity_series = pd.Series(data=d, index=dti, name="Humidity")

Again, ideal means that we ignore sensor data uncertainty for now. Ideal also means that the timestamps of both sensors match exactly, which will never be true in the real world. In practice we’d either choose a coarser timestamp resolution (e.g. seconds instead of microseconds) or use an IntervalIndex with datetimes as interval boundaries. The microsecond resolution was chosen here only to show which resolution pandas can process in general.
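Below is a minimal sketch of both real-world options, assuming two hypothetical sensors whose timestamps differ by a few microseconds. `DatetimeIndex.floor` provides the coarser resolution, and `IntervalIndex.from_arrays` builds left-closed one-second intervals. The helper `to_interval_index` is illustrative only.

```python
import pandas as pd

# Hypothetical readings of two real sensors: the timestamps differ by a
# few microseconds, so they never match exactly.
t_index = pd.DatetimeIndex(["2020-01-01 12:00:00.000001",
                            "2020-01-01 12:00:01.000003"]).tz_localize("Europe/Berlin")
h_index = pd.DatetimeIndex(["2020-01-01 12:00:00.000004",
                            "2020-01-01 12:00:01.000002"]).tz_localize("Europe/Berlin")
temperature = pd.Series([21, 22], index=t_index, name="Temperature")
humidity = pd.Series([65, 64], index=h_index, name="Humidity")

# Combining them as-is yields one row per distinct timestamp, padded with NaN.
naive = pd.concat([temperature, humidity], axis=1)

# Option 1: coarser resolution. Flooring to whole seconds aligns the readings.
coarse = pd.concat([s.set_axis(s.index.floor("s")) for s in (temperature, humidity)], axis=1)


# Option 2: an IntervalIndex with datetimes as interval boundaries.
def to_interval_index(series, freq="1s"):
    """Replace each timestamp by the left-closed interval of width freq containing it."""
    left = series.index.floor(freq)
    right = left + pd.Timedelta(freq)
    return series.set_axis(pd.IntervalIndex.from_arrays(left, right, closed="left"))


binned = pd.concat([to_interval_index(s) for s in (temperature, humidity)], axis=1)
```

`naive` has four rows with gaps, while `coarse` and `binned` each have two fully populated rows. The interval variant keeps the information that a reading lies somewhere within its second, rather than pretending it happened exactly at the full second.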

To be able to plot several time series and relate them to each other, one has to combine the series into a DataFrame, df_row_wise. This can be done e.g. as follows

frames = [temperature_series.to_frame().T, humidity_series.to_frame().T]
df_row_wise = pd.concat(frames)

and results in this DataFrame:

Series combined into a dataframe (row wise).

However, plotting of the DataFrame data structure is column-centric, so it makes more sense to use the Series index as the DataFrame index (one column per time series). This can be achieved e.g. as follows

df_column_wise = df_row_wise.T

and results in this DataFrame:

Series combined into a dataframe (column wise).
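As a sketch of a shortcut, `pd.concat` with `axis=1` places each Series into its own column, aligned on the shared DatetimeIndex. This yields the column-wise DataFrame directly, without the row-wise detour. The example uses deterministic stand-in values instead of `random.sample` so the comparison is reproducible.

```python
import pandas as pd

# Stand-ins for temperature_series and humidity_series with fixed values.
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(list(range(20, 30)), index=dti, name="Temperature")
humidity_series = pd.Series(list(range(69, 59, -1)), index=dti, name="Humidity")

# Route taken in the text: stack the series as rows, then transpose.
df_row_wise = pd.concat([temperature_series.to_frame().T, humidity_series.to_frame().T])
df_column_wise = df_row_wise.T

# Shortcut: one column per Series, aligned on the DatetimeIndex.
df_direct = pd.concat([temperature_series, humidity_series], axis=1)
```

Both routes produce the same values, index and column labels. The shortcut also aligns series whose indexes differ, filling missing timestamps with NaN.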

The following sections show what the plot types look like and how they behave for the different plotting backends in their default configuration. I’ve added comments on visualizing several time series in a single plot (relating data between time series, “data overlap”).

area plot (altair)

The altair backend seems to order the DataFrame columns/time series correctly, so that the higher value area does not hide the lower value area. However, the backend seems to be buggy w.r.t. area plots: the humidity area reaches approx. 90, although the humidity values lie between 60 and 69.
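One possible explanation, which I have not verified against the altair backend, is stacking. pandas’ own `DataFrame.plot.area` stacks the columns by default (`stacked=True`). The upper edge of each area is therefore the cumulative sum across the columns, so the humidity area sits on top of the temperature area. The sketch below computes these upper edges with `DataFrame.cumsum(axis=1)`. It uses ascending temperatures instead of `random.sample` so the numbers are reproducible, and it assumes the temperature column is stacked first.

```python
import pandas as pd

# Same shape of data as in the example, with deterministic temperatures.
dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
df = pd.DataFrame({"Temperature": list(range(20, 30)),
                   "Humidity": list(range(69, 59, -1))}, index=dti)

# In a stacked area plot each area starts where the previous column ends,
# so the visible upper edge of a column is the cumulative sum across columns.
upper_edges = df.cumsum(axis=1)
print(upper_edges["Humidity"].unique())  # [89]
```

With the random temperatures from the example, the upper edge varies between 80 and 98 instead, i.e. roughly 90. pandas’ matplotlib backend accepts `stacked=False` to draw the raw values. Whether the altair backend honours this keyword is not verified here.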

area plot (pandas_bokeh)

The pandas_bokeh backend uses a transparency effect, so overlapping areas do not hide information from the user. In addition, selecting data points is easy, and all time series values are shown for a given point in time.

area plot (hvplot/holoviews)

The hvplot backend hides the lower value area behind the higher value area.

Bar plots are not suitable to visualize time series data. The comments for this plot type have been skipped.

bar plot (altair)
bar plot (pandas_bokeh)

Horizontal bar plots are not suitable to visualize time series data. The comments for this plot type have been skipped.

barh plot (altair)
barh plot (pandas_bokeh)
barh plot (hvplot/holoviews)
box plot (altair)
box plot (hvplot/holoviews)
hexbin plot (hvplot/holoviews)
hist plot (altair)
hist plot (pandas_bokeh)
hist plot (hvplot/holoviews)
KDE plot (hvplot/holoviews)
line plot (altair)
line plot (pandas_bokeh)
line plot (hvplot/holoviews)
pie plot (pandas_bokeh)
scatter plot (altair)
scatter plot (pandas_bokeh)
scatter plot (hvplot/holoviews)

The visualizations of multivariate time series data represented as a DataFrame (time series as columns, datetimes as index) are based on the visualizations of pandas Series. This means the conclusion from the post Interactive exploratory data analysis (EDA) of sensor data with Pandas: Univariate time series data applies here as well. In addition, it should be possible to relate data from different time series to each other, and possible data overlap effects need to be considered.

As for the visualization of Series, the altair backend is recommended for the box plot.

As for the visualization of Series, the hvplot backend is the only plotting backend which supports the density plot and KDE plot.

As for the visualization of Series, the pandas_bokeh backend is the most suitable one for all remaining plot types, with the line plot being the most important one.

For hist plots either the pandas_bokeh or the hvplot backend is recommended.
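The recommendations above could be encoded as a small lookup table and applied with `pd.option_context`, which restores the previously active backend afterwards. This is a hypothetical sketch. The backend name strings are assumptions about what altair_pandas (`"altair"`), hvplot (`"holoviews"`) and pandas-bokeh (`"pandas_bokeh"`) register with pandas, and the chosen backend package must be installed for plotting to work.

```python
import pandas as pd

# Recommendations from the conclusion (backend names are assumptions).
RECOMMENDED_BACKEND = {
    "box": "altair",
    "kde": "holoviews",      # hvplot
    "density": "holoviews",  # hvplot
    "hist": "pandas_bokeh",  # hvplot is an equally good choice
}
DEFAULT_BACKEND = "pandas_bokeh"  # all remaining kinds, e.g. line


def recommended_backend(kind):
    """Return the recommended plotting backend name for a plot kind."""
    return RECOMMENDED_BACKEND.get(kind, DEFAULT_BACKEND)


def plot_recommended(df, kind, **kwargs):
    """Plot df with the recommended backend for kind (hypothetical helper).

    pandas validates the backend when the option is set, so a missing
    backend package raises before anything is plotted.
    """
    with pd.option_context("plotting.backend", recommended_backend(kind)):
        return df.plot(kind=kind, **kwargs)
```

For example, `plot_recommended(df_column_wise, "line")` would render the line plot with pandas_bokeh, provided that package is installed.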

Software Developer for rapid prototype or high quality software with interest in distributed systems and high performance on premise server applications.
