Filling large gaps in time series using forecasting

Carlo Alberto Carrucciu
Target Reply | Insights Hub
Apr 2, 2021


Abstract

In this story I will show an easy approach to filling large gaps in time series while keeping the filled-in data plausible and consistent with the observed values.

The approach consists of running a forecast from both sides of the gap and combining the two predictions using interpolation.

In my specific case, I was analyzing a dataset divided into three parts, and one of the required analyses was to try various forecasting techniques. For this reason I will also explain how you can apply this approach to join consecutive files that concern the same data entity.

Case study

In this specific case we are going to use exponential smoothing with seasonality, which suits the characteristics of our time series. Of course, every time series has its own most suitable forecasting technique, and the approach can be repeated with whichever technique you prefer.

Data

The data are split into three separate text files. Every dataset has the same columns. Measurements were recorded minute by minute, and there are no missing rows within each file.

It is important to have a well-formatted index, which can be built with the pandas.date_range() function.

Chronologically first dataset

The first column of each txt file contains the date and time, which Pandas parses and uses as the index. However, the index contains slightly wrong timestamps, so we need to set a new index before continuing, changing values such as 2015-02-02 14:19:59 to 2015-02-02 14:20:00.
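As a minimal sketch of this cleaning step, the snippet below parses a tiny in-memory file and rebuilds the index with pandas.date_range(). The column names and values are made up for illustration, and rebuilding the index this way assumes there are no missing rows in the file.

```python
import io

import pandas as pd

# Hypothetical file content: the real files have more columns and rows.
raw_text = io.StringIO(
    "date,Humidity\n"
    "2015-02-02 14:19:59,26.3\n"
    "2015-02-02 14:21:00,26.2\n"
    "2015-02-02 14:21:59,26.1\n"
)

# Parse the first column as datetimes and use it as the index.
df = pd.read_csv(raw_text, index_col=0, parse_dates=True)

# Round the first timestamp to the nearest minute, then rebuild a clean
# minute-by-minute index (valid only because no rows are missing).
start = df.index[0].round("min")
df.index = pd.date_range(start=start, periods=len(df), freq="min")

print(df)
```

Because the new index is generated rather than corrected row by row, every timestamp falls exactly on a minute boundary.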

The cleaning procedure is repeated for the three files, giving three different datasets, chronologically ordered from 1 to 3.

Forecasting

Now we are ready to start with the forecasting. The example uses the Humidity column. The implementation is provided by the statsmodels library. The method used is exponential smoothing with seasonality, to cover daily periods.

There are sixty records per hour (one per minute) and 24 hours in a day, so the seasonal period is 60 × 24 = 1440.
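The same number can be derived from the sampling step instead of being hard-coded, which helps if the approach is reused on data with a different frequency. The variable names here are my own.

```python
import pandas as pd

sampling_step = pd.Timedelta(minutes=1)  # one record per minute
season_length = pd.Timedelta(days=1)     # daily seasonality

# Number of records in one season.
seasonal_periods = season_length // sampling_step
print(seasonal_periods)  # 1440
```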

Code to fill a gap between two timeseries

Forward

First, the selected forecasting technique is applied in the standard way: it is fitted on the second time series in order to fill the hole between it and the third time series.

Backward

Then the time series are reversed, in order to apply the forecasting from the opposite side. Since the forecasting only works forward (which makes sense), it is important to adjust all the indexes too, taking into account how many records the gap contains.

Interpolation

Now we have the predictions in the two directions, and they only need to be combined by interpolation. The results are shown below.
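One simple way to combine the two predictions is a linear blend: the forward forecast gets full weight at the start of the gap, the backward forecast gets full weight at the end, and the weights change linearly in between. This is an assumption on my part about what "interpolation" can mean here, and the numbers are made up.

```python
import numpy as np
import pandas as pd

gap_index = pd.date_range("2015-02-02 16:00:00", periods=5, freq="min")

# Hypothetical predictions for the same gap from the two directions.
forward = pd.Series([30.0, 31.0, 32.0, 33.0, 34.0], index=gap_index)
backward = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0], index=gap_index)

# Weight of the forward forecast: 1 at the start of the gap, 0 at the end.
w = np.linspace(1.0, 0.0, len(gap_index))

filled = w * forward + (1.0 - w) * backward
print(filled)
```

Each forecast is trusted most near the data it was fitted on, where it is usually most reliable.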

The approach is then repeated for all the attributes of our dataset, until we have a record for every single timestamp, minute by minute, from the first one to the last one.

At this point we can concatenate all the dataframes into a single one and save it.
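A small sketch of this last step, with tiny made-up dataframes standing in for the original datasets and the filled gap. The output file name is hypothetical. The check on the index confirms that no minute is missing after concatenation.

```python
import os
import tempfile

import pandas as pd


def piece(start, n, value):
    """Build a tiny illustrative dataframe with a minute-by-minute index."""
    idx = pd.date_range(start, periods=n, freq="min")
    return pd.DataFrame({"Humidity": value}, index=idx)


second = piece("2015-02-02 14:20:00", 3, 26.0)
gap = piece("2015-02-02 14:23:00", 2, 26.5)  # the filled gap
third = piece("2015-02-02 14:25:00", 3, 27.0)

# Join the pieces in chronological order.
full = pd.concat([second, gap, third]).sort_index()

# Verify that there is exactly one record per minute, first to last.
expected = pd.date_range(full.index[0], full.index[-1], freq="min")
is_complete = full.index.equals(expected)
print(is_complete)

# Save the result (hypothetical file name, written to the temp directory).
out_path = os.path.join(tempfile.gettempdir(), "filled_dataset.csv")
full.to_csv(out_path)
```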

Thanks for your attention

Download the jupyter notebook here

Note: this is my first story on Medium. I appreciate your valuable feedback and encouragement.
