Time Series Forecasting for Call Center Metrics

David Rose
Sep 26, 2018

Tasked with helping to minimize call answer and issue resolution times within a customer support call center, I used a combination of a classical time-series forecasting technique (ARIMA) and a popular ensemble method (gradient-boosted trees, adapted here for regression).

I wanted to break down the general seasonality (hour-by-hour, or time of year) and overall trends of these two metrics over time, which are the result of multiple independent (and some dependent) variables. For the main issue of call answer times, measured as a daily average known as average speed to answer (ASA), the daily mean had been on a consistently upward trajectory for a year or two.

So the two areas of focus are:

  1. Forecasting where this value will be in future quarters.
  2. Identifying the traits that most determine this value.

Forecasting Trends

Looking at the overall quarterly trend, you can see a clear upward trajectory that we would like to reverse.

And when looking closer, you can see how this behaves over the course of an average day. You can see the spikes in the morning and lulls during lunch time.

Applying ARIMA

After some extensive data cleaning and pre-processing, I ran the series through an ARIMA model to get a general idea of where it is headed.

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting).

What affects the ASA?

The next step is to dive a bit deeper and figure out how the components of this metric work. We know it is generally a combination of:

  1. call volume
  2. number of operators
  3. length of calls

We can visualize this with the plot below:

With the above data, we can see that the first two yellow lines (staffing, call volume) have fairly consistent trends, but talk time seems to be all over the place with no clear pattern.

There is an established formula, Erlang C, that uses these numbers to estimate queue times in call centers. For this case it makes too many assumptions (about distributions and operator habits) to work well. For example, an operator is not committed 100% of the time to answering or waiting: some time goes to notes, training, or bathroom breaks. Some operators are much more experienced than others, and that mix can change over time. I also think that if the length of calls (talk time) were more consistent I would have had better luck. As it was, the formula did not work very well, sometimes under-predicting values by as much as half.
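For reference, the Erlang C calculation looks like this. It assumes Poisson call arrivals, exponential handle times, and fully dedicated agents, which are exactly the assumptions that broke down here; the input numbers below are illustrative:

```python
import math

def erlang_c_asa(arrival_rate, aht, agents):
    """Estimate average speed of answer (ASA) via the Erlang C model.

    arrival_rate: calls per second; aht: average handle time in seconds;
    agents: number of operators on duty.
    """
    a = arrival_rate * aht  # offered traffic in erlangs
    if agents <= a:
        return float("inf")  # more traffic than capacity: queue grows forever
    # Probability an arriving call has to wait (the Erlang C formula)
    numer = (a ** agents / math.factorial(agents)) * (agents / (agents - a))
    denom = sum(a ** k / math.factorial(k) for k in range(agents)) + numer
    p_wait = numer / denom
    # Expected wait in seconds for an average caller
    return p_wait * aht / (agents - a)

# Illustrative inputs: 1 call every 20 s, 4-minute handle time, 15 agents
asa = erlang_c_asa(arrival_rate=1 / 20, aht=240, agents=15)
```

The sensitivity to `agents - a` in the denominator is part of why the formula misbehaves near full utilization: small errors in the inputs produce large errors in the predicted wait.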

Solution? Treat it as any other supervised learning problem!

In this case I fall back to one of my favorites: Microsoft’s LightGBM implementation within Python. It builds gradient-boosted decision trees leaf-wise: at each step it splits the leaf with the largest loss reduction, rather than growing the tree level by level.

I will treat the speed-to-answer (ASA) as the target variable and all the other columns as candidate training features. To evaluate the model in an interpretable way, I will still treat this as a kind of time-series problem, where each train/test/evaluate iteration covers a specific slice of time (a day, a week, etc.).

The Sliding Window Approach

To best evaluate the true performance of a model, one would normally take random slices of a dataset, train and evaluate on each of them individually, and average out the errors. This gives a truer feel for all possible future cases by avoiding over-fitting your model to one specific set of test data, which in this case could be a single day or week of data.

But because this is time-series data, information can ‘leak’ between nearby time steps. If call volume is abnormally high at 10am on 2018–03–15, it is probably high at 11am as well. But in the real world, if I am predicting the next day’s ASA, I won’t know that next day’s call volume at 10am. Below is one method of working around this issue:

In this example I train on a year of data, and test the next day. Then slide forward one day and repeat.
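The splitting scheme described above can be sketched as a small generator (pure Python; the day counts are illustrative):

```python
def sliding_window_splits(n_days, train_days=365, test_days=1, step=1):
    """Yield (train, test) index ranges that slide forward through time.

    The test window always comes strictly after the training window,
    so no future information leaks into training.
    """
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += step

# Two years of daily data: train on one year, test the next day, slide by one
splits = list(sliding_window_splits(n_days=730))
first_train, first_test = splits[0]
```

scikit-learn’s `TimeSeriesSplit` offers a related expanding-window variant, but the fixed-width sliding window above matches the train-a-year/test-a-day scheme described here.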

Perhaps the model performs very poorly in the winter months but well in the warmer ones? By evaluating across many such situations, I can be more confident in the model’s true performance.

How does this work in practice?

Below are some plots of single-day evaluations of the predicted ASA, averaged by hour. Keep in mind that while this may look impressive, the model ‘cheats’ by knowing future features such as the operators on staff and the call volume. In reality we don’t know these a day out, but the purpose of this model is to discover how much of the ASA is explained by the training features, so that we can pull those levers in the future and keep ASA within preferred limits.

Say we want to see the change in predicted ASA if we increased staffing by 5% or shortened the average call by 20 seconds: we can artificially impute those values and run a new prediction with the trained model.

Conclusion

In the past, many time-series models focused on extracting trend and seasonality and then projecting them forward. With today’s wealth of data and computational power, there are many more opportunities to build robust models that take advantage of all sorts of extra information.

If you just want a hands-off view of your data’s future, an ARIMA model is just fine. But in many cases there are features in the data that are partially or fully under your control (staffing level, in this case) and that can be adjusted dynamically as time goes on. Implementing a supervised model that can break down the relationships between all the variables is therefore very valuable for making business decisions on more than gut feeling.

The current trend is moving towards deep-learning approaches, notably WaveNet-style architectures for time series, which can better capture temporal structure and learn more intricate patterns. I hope to try that approach next and see how this can be improved.
