# Interactive exploratory data analysis (EDA) of sensor data with Pandas: Multivariate time series data

## Visualizing multivariate time series data with the pandas plotting API

This post shows the basic look and feel of the pandas plotting API applied to typical multivariate sensor data, with each sensor represented as a time series. Feel free to visit the source code repository, press the “Binder” button to open the repository in a Binder environment, and explore the interactivity of the plot types in the notebook `time_series_multivariate.ipynb`. Not every plot type makes sense for visualizing multivariate time series data. However, for the sake of completeness, and to make it clear that some plot types make no sense, I’ve added GIFs for all of them. If a plotting backend does not support a plot type, I skipped the GIF in the corresponding section. In the Binder environment I’ve tried every plot type with every backend so that the resulting exceptions show up in the output. These exceptions do not indicate wrong usage of the pandas plotting API; they help to figure out that a plot type is simply not supported (yet).

## Pandas DataFrames simplify working with time series data

As with pandas *Series* for univariate time series data, pandas *DataFrames* significantly ease working with several time series at once. One reason is the built-in support for visualizing *DataFrames* via the pandas plotting API. Usually it’s most reasonable to construct *DataFrames* with *Series* as columns. Most plot types then visualize each column the same way they would visualize the *Series* on its own. Plot types that relate data sets to each other (e.g. the *hexbin plot* and *scatter plot*) require selecting the columns before plotting.

In the example we construct univariate fake data of an ideal temperature sensor as a pandas *time series* (a *Series* with a *DatetimeIndex*), `temperature_series`. Ideal means that we ignore sensor data uncertainty for now.

    import random
    import pandas as pd

    # Ten distinct fake readings in 20..29, one per second ("s" replaces the deprecated alias "S")
    temperature_d = random.sample(range(20, 20+10), 10)
    temperature_dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
    temperature_series = pd.Series(data=temperature_d, index=temperature_dti, name="Temperature")

In addition we construct univariate fake data of an ideal humidity sensor as a pandas *time series*, `humidity_series`.

    # Ten fake humidity readings falling from 69 to 60, on the same timestamps as the temperature
    d = list(reversed(range(60, 70)))
    dti = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
    humidity_series = pd.Series(data=d, index=dti, name="Humidity")

Again, ideal means that we ignore sensor data uncertainty for now. Ideal also means that the timestamps of both series match exactly, which will practically never be true in the real world. Usually we’d choose a coarser timestamp resolution (e.g. seconds instead of microseconds). The microsecond resolution was chosen here only to show which resolution pandas can process in general. Another real-world option would be to use an *IntervalIndex* with *datetimes* as interval boundaries.
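Both real-world options can be sketched in a few lines. The readings and the microsecond offsets below are made up for illustration, and the variable names are not taken from the notebook. The sketch first shows what happens when slightly shifted timestamps are combined as they are, then floors them to whole seconds, and finally indexes the readings by one-second intervals.

```python
import pandas as pd

tz = "Europe/Berlin"

# Hypothetical readings: both sensors sample once per second,
# but their clocks are offset by a few microseconds.
temperature = pd.Series(
    [21, 22, 23],
    index=pd.date_range("2020-01-01 12:00:00.000001", periods=3, freq="s", tz=tz),
    name="Temperature",
)
humidity = pd.Series(
    [69, 68, 67],
    index=pd.date_range("2020-01-01 12:00:00.000250", periods=3, freq="s", tz=tz),
    name="Humidity",
)

# Combined as they are, no timestamps match: six rows, half of the cells are NaN.
unaligned = pd.concat([temperature, humidity], axis=1)

# Option 1: reduce the resolution to whole seconds before combining.
temperature_s = temperature.set_axis(temperature.index.floor("s"))
humidity_s = humidity.set_axis(humidity.index.floor("s"))
aligned = pd.concat([temperature_s, humidity_s], axis=1)

# Option 2: index the readings by one-second intervals instead of instants.
intervals = pd.interval_range(
    start=pd.Timestamp("2020-01-01 12:00:00", tz=tz),
    periods=3,
    freq="s",
    closed="left",
)
by_interval = pd.DataFrame(
    {"Temperature": temperature.to_numpy(), "Humidity": humidity.to_numpy()},
    index=intervals,
)

print(unaligned.shape)  # (6, 2)
print(aligned.shape)    # (3, 2)
print(by_interval)
```

Flooring is the simpler choice when all sensors share a sampling rate; the interval index makes explicit that a reading stands for a time span rather than an instant.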

To be able to plot several time series and relate them to each other, one has to combine the series into a *DataFrame*, `df_row_wise`. This can be done e.g. as follows

    # Turn each series into a one-row frame and stack the rows
    frames = [temperature_series.to_frame().T, humidity_series.to_frame().T]
    df_row_wise = pd.concat(frames)

and results in a *DataFrame* with one row per sensor and one column per timestamp.

However, plotting with the *DataFrame* data structure is column-centric, so it makes more sense to use the index of the *Series* as the index of the *DataFrame*. This can be achieved e.g. as follows

    df_column_wise = df_row_wise.T

and results in a *DataFrame* with the timestamps as index and one column per sensor (`Temperature` and `Humidity`).
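The layout of both frames can be checked with a short, self-contained sketch. It rebuilds the fake sensor data, follows the transpose route described above, and compares it with `pd.concat(..., axis=1)`. That shortcut is not used in the post and is shown only as an assumed alternative, and the name `df_direct` is introduced here for illustration.

```python
import random

import pandas as pd

# Same fake sensor data as in the post, rebuilt so the sketch runs on its own.
index = pd.date_range("2020-01-01 12:00:00.000001", periods=10, freq="s").tz_localize("Europe/Berlin")
temperature_series = pd.Series(random.sample(range(20, 30), 10), index=index, name="Temperature")
humidity_series = pd.Series(list(reversed(range(60, 70))), index=index, name="Humidity")

# Route from the post: one-row frames, concatenated, then transposed.
df_row_wise = pd.concat([temperature_series.to_frame().T, humidity_series.to_frame().T])
df_column_wise = df_row_wise.T

# Assumed shortcut (not from the post): concatenate the series side by side.
df_direct = pd.concat([temperature_series, humidity_series], axis=1)

print(df_row_wise.shape)             # (2, 10): one row per sensor
print(df_column_wise.shape)          # (10, 2): one row per timestamp
print(list(df_column_wise.columns))  # ['Temperature', 'Humidity']
print(df_column_wise.head())
```

Both routes end up with the same values, index and column names, so either frame can be passed to the plotting API.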

The following sections show what the plot types look like and how they behave for the different plotting backends in the default configuration. Comments w.r.t. the visualization of several time series in a single plot have been added (relating data between time series, “data overlap”).
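Trying every plot type and collecting the exceptions, as done in the Binder notebook, can be sketched roughly as follows. The helper `survey_plot_kinds` and the two lookup tables are hypothetical names introduced for this example, not code from the notebook. The sketch runs against the default `matplotlib` backend. The backend names `"altair"`, `"pandas_bokeh"` and `"hvplot"` are assumed to be available only when the corresponding packages are installed; a missing backend shows up as an exception name in the result instead of stopping the loop.

```python
import pandas as pd

PLOT_KINDS = ["area", "bar", "barh", "box", "hexbin", "hist", "kde", "line", "pie", "scatter"]

# Plot types that relate two columns need them selected explicitly;
# a pie chart of a whole DataFrame needs one subplot per column.
EXTRA_KWARGS = {
    "hexbin": {"x": "Temperature", "y": "Humidity"},
    "scatter": {"x": "Temperature", "y": "Humidity"},
    "pie": {"subplots": True},
}


def survey_plot_kinds(df, backend="matplotlib"):
    """Try each plot kind and record "ok" or the name of the raised exception."""
    results = {}
    for kind in PLOT_KINDS:
        try:
            df.plot(kind=kind, backend=backend, **EXTRA_KWARGS.get(kind, {}))
            results[kind] = "ok"
        except Exception as exc:  # unsupported plot type or missing package
            results[kind] = type(exc).__name__
    return results


# Small deterministic stand-in for df_column_wise.
index = pd.date_range("2020-01-01 12:00:00", periods=10, freq="s", tz="Europe/Berlin")
df = pd.DataFrame(
    {"Temperature": list(range(20, 30)), "Humidity": list(reversed(range(60, 70)))},
    index=index,
)

results = survey_plot_kinds(df)
for kind, outcome in results.items():
    print(f"{kind:8s} {outcome}")
```

Calling `survey_plot_kinds(df, backend="hvplot")` would repeat the survey for another backend, under the assumption that the backend accepts the same keyword arguments.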

## Area plot

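The **altair backend** seems to order the DataFrame columns/time series correctly, so that the higher-value area does not hide the lower-value area. However, the backend seems to be buggy w.r.t. area plots: the humidity is drawn at approx. 90, although the data only ranges from 60 to 69. This may be an effect of stacking the areas rather than a bug, since temperature plus humidity is roughly 90.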

The **pandas_bokeh backend** uses a transparency effect, which prevents overlapping areas from hiding information from the user. In addition, selecting data points is easy, and the values of all time series are shown for a given point in time.

The **hvplot backend** hides the lower-value area behind the higher-value area.
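The humidity value of approx. 90 seen with the altair backend can be compared with what stacking would produce. In pandas, `DataFrame.plot.area` stacks the columns by default (`stacked=True`); whether the altair backend applies the same default is an assumption here. The sketch below uses a fixed temperature order instead of the random sample, and computes where the upper edge of each area would sit if the columns were stacked.

```python
import pandas as pd

index = pd.date_range("2020-01-01 12:00:00", periods=10, freq="s", tz="Europe/Berlin")
df = pd.DataFrame(
    {
        "Temperature": list(range(20, 30)),         # fixed order instead of a random sample
        "Humidity": list(reversed(range(60, 70))),  # 69 down to 60, as in the post
    },
    index=index,
)

# Upper edge of each area when the columns are stacked on top of each other.
stacked_top = df.cumsum(axis=1)

print(stacked_top["Humidity"].unique())  # [89]
print(df["Humidity"].max())              # 69
```

With stacking, the humidity area ends at 89 although no humidity reading exceeds 69. Passing `stacked=False` to `df.plot.area` requests overlapping areas instead, which is closer to what the other two backends show.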

## Bar plots

Bar plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.

## Horizontal bar plot

Horizontal bar plots are not suitable for visualizing time series data, so the comments for this plot type have been skipped.

## Box plot

## Hexbin plot

## Hist plot

## KDE plot

## Line plot

## Pie plot

## Scatter plot

## Conclusion

The visualization of multivariate time series data represented as a *DataFrame* (time series as columns, datetimes as index) is based on the visualization of pandas *Series*. This means the conclusions from the post *Interactive exploratory data analysis (EDA) of sensor data with Pandas: Univariate time series data* apply here as well. In addition, it should be possible to relate data from different time series to each other, and possible effects of data overlap need to be considered.

As for the visualization of *Series*, the **altair backend** is recommended for the **box plot**.

As for the visualization of *Series*, the **hvplot backend** is the only plotting backend which supports the **density plot** and **KDE plot**.

As for the visualization of *Series*, the **pandas_bokeh backend** is the most suitable one for all remaining plot types, with the **line plot** being the most important one.

For **hist plots**, either the **pandas_bokeh** or the **hvplot backend** is recommended.
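These recommendations can be kept at hand as a small lookup table. The mapping and the helper `recommended_backends` are hypothetical and only restate the conclusions above; the backend names are the ones used in this post and are assumed to match the names under which the packages register themselves with pandas.

```python
# Recommendations from the conclusion, keyed by pandas plot kind.
RECOMMENDED_BACKENDS = {
    "box": ["altair"],
    "kde": ["hvplot"],
    "density": ["hvplot"],
    "hist": ["pandas_bokeh", "hvplot"],
}

# All remaining plot types, with the line plot being the most important one.
DEFAULT_BACKENDS = ["pandas_bokeh"]


def recommended_backends(kind):
    """Return the backend names recommended for a plot kind, best first."""
    return RECOMMENDED_BACKENDS.get(kind, DEFAULT_BACKENDS)


for kind in ["line", "box", "kde", "hist"]:
    print(kind, recommended_backends(kind))
```

With the respective package installed, a call such as `df_column_wise.plot(kind="line", backend=recommended_backends("line")[0])` would then pick the backend per plot.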