Managing Data Spikes to Reduce Forecasting Noise

When customers approach us for guidance or feedback on how they use SkuBrain, we get the opportunity to observe their unique workflows and draw some interesting conclusions.

We’ve noticed that some of our customers upload data, run large forecast jobs, and then re-run them on a similar but “pruned” data set. This is good practice, and we want all our users to think about how forecasting data should be filtered before accepting the results of a forecast.

A common filter is the elimination of data spikes that can throw off a forecast. These filters can be applied earlier in the process to reduce the time needed to upload a new data set, and we illustrate this method using Excel below.
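As a sketch of what such a filter might look like (this is not SkuBrain’s own method, just an illustrative Python example with made-up numbers), a median-based outlier test can flag one-off spikes before you upload:

```python
# Illustrative sketch: flag one-off spikes in a monthly sales series
# before uploading. The threshold `factor` is a judgement call.
from statistics import median

def flag_spikes(sales, factor=3.0):
    """Return indices of points far from the median, measured in
    median absolute deviations (MAD)."""
    med = median(sales)
    mad = median(abs(x - med) for x in sales) or 1.0  # guard against MAD = 0
    return [i for i, x in enumerate(sales) if abs(x - med) / mad > factor]

monthly_sales = [100, 110, 105, 980, 115, 108]  # 980 looks like a data error
print(flag_spikes(monthly_sales))  # -> [3]
```

The median absolute deviation is used here rather than the standard deviation because the spike itself would inflate the latter and hide the outlier.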

Most businesses see occasional spikes in their sales. Sometimes these spikes are data errors, and sometimes they reflect real, cyclical sales (holiday season, back to school, etc.). Unfortunately, error spikes tend to ‘pull’ the demand distribution in the direction of the spike, potentially skewing inventory planning. If you do not look at your data before accepting a forecast, error spikes can ruin the forecast and cost you money. This matters, and it requires some judgement on the part of the business user.
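To see the ‘pull’ concretely, here is a tiny Python example (the numbers are made up) showing how a single bad record shifts the mean demand, which is what ends up inflating inventory targets:

```python
# One erroneous record more than doubles the apparent average demand.
from statistics import mean

clean      = [100, 110, 105, 115, 108]
with_spike = clean + [980]  # one bad record

print(mean(clean))       # -> 107.6
print(mean(with_spike))  # -> 253.0
```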

So always look at your data, even if it is just a sample of SKUs or points of sale (POS). Look for spikes that do not make sense, and do not hesitate to remove a record, or a set of monthly records, that does not match your more recent sales experience. You can always use SkuBrain to re-run a forecast next month, incorporating your most recent sales and inventory experience; dropping some old data makes good sense, since new data arrive all the time and the most recent data are generally the most relevant to your forecasting needs.
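If you do decide to drop older history, the cut itself is trivial. A hypothetical Python sketch (the month keys and cutoff are illustrative, not a SkuBrain feature):

```python
# Keep only records from a cutoff month onward; "YYYY-MM" strings
# compare correctly in lexicographic order.
records = [("2013-05", 40), ("2014-08", 47), ("2015-02", 52), ("2016-11", 60)]
cutoff = "2015-01"  # hypothetical cutoff: keep roughly the last two years

recent = [(month, units) for month, units in records if month >= cutoff]
print(recent)  # -> [('2015-02', 52), ('2016-11', 60)]
```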

From a forecaster’s perspective, spikes in the data should be researched separately from interpreting the forecast, because once the forecast is available most people focus on it and assume the earlier data filtering was sufficient. Consider this situation: a transportation company operates vehicles from various locations. When repairs or maintenance are done at an alternate facility while a vehicle is in transit (say, a freight depot where a breakdown occurred, rather than the maintenance depot where trucks are usually repaired), the purchase of spare parts should be recorded as a one-time event so that the freight depot does not begin to stock parts for future repairs. Although this is a simple example, we have seen similar issues when running massive forecasts that incorporate geographic location. The solution is simple: filter out locations where the demand forecast is irrelevant, and do so with confidence, because you can always revisit the issue with new data, a new forecast, and new insights as they develop.
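A location filter like the one just described is a one-liner in most tools. Here is a hypothetical Python sketch (the location names and records are invented for illustration):

```python
# Drop demand records from locations that should not drive stocking
# decisions, such as a freight depot that handled a one-off repair.
records = [
    ("maintenance_depot", "brake_pad", 4),
    ("freight_depot",     "brake_pad", 1),  # one-time roadside repair
    ("maintenance_depot", "air_filter", 2),
]
exclude = {"freight_depot"}  # locations whose demand is irrelevant

filtered = [r for r in records if r[0] not in exclude]
print(filtered)
```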

Now, back to SkuBrain. In a forecasting system, we generally want to eliminate these types of irregular spikes from our forecasting. SkuBrain manages data spikes inside the algorithm tournaments that it runs, choosing the algorithm with the lowest mean absolute percentage error (MAPE). Algorithms like ETS (exponential smoothing) tend to ‘smooth out’ spikes that are statistically insignificant; the simplest variants are best suited to data with no trend or seasonal pattern.
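For reference, the mean absolute percentage error is easy to compute yourself. A minimal Python sketch (the actual and forecast series below are invented):

```python
# Mean absolute percentage error: the average of |actual - forecast| / |actual|,
# expressed as a percentage. Lower is better.
def mape(actual, forecast):
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

actual   = [100, 110, 105, 115]
forecast = [ 98, 112, 100, 120]
print(round(mape(actual, forecast), 2))  # -> 3.23
```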

The way SkuBrain manages spikes over pre-determined time periods is best illustrated by the following forecasts. Studying these graphs will help you identify situations where a data spike should be pruned. Graph A shows a basic forecast devoid of any significant one-off spiking in the data:

Graph B shows how SkuBrain has forecast against a spike four years prior:

Graph C illustrates a single data spike in 2015:

As you can see, a single spike causes the forecast to flatten and diverge from the smooth upward trend over time. So if you see a spike, you’ll likely also see a forecast that looks a bit odd. If you are an Excel ninja, you could return to your data at this point, aggregate it by month, identify where the spike is being generated, and then massage or remove the offending data. In fact, this is what yielded the results in Graph A.
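The same aggregate-then-inspect step can be done outside Excel, too. Here is a rough Python equivalent (the dates and quantities are made up for illustration):

```python
# Aggregate individual sales records by month, then scan the totals
# for a month that stands out.
from collections import defaultdict

records = [  # (date, units sold) -- illustrative data
    ("2015-01-04", 50), ("2015-01-19", 55),
    ("2015-02-02", 52), ("2015-02-14", 900),  # suspect record
    ("2015-03-07", 54), ("2015-03-21", 51),
]

by_month = defaultdict(int)
for date, units in records:
    by_month[date[:7]] += units  # key on "YYYY-MM"

for month, total in sorted(by_month.items()):
    print(month, total)  # 2015-02 totals 952 -- worth investigating
```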

If the data cleaning task exceeds your Excel capabilities, keep an eye out for a SkuBrain blog post in the coming weeks that will guide you through this process, step by step.

Graph D shows how SkuBrain has incorporated a recent data spike into the forecast. Note that a recent spike is more likely to be treated as a seasonal or cyclical event. Investigation is needed to determine what caused the spike and whether it was erroneous or potentially recurring. This is where your expert knowledge of your specific business, and of how you will use the forecast, is important: it helps you either trust the data (for example, if a promotional event explains the one-time spike) or remove the spike. The forecast is an aid to making better decisions, but never a substitute for that expert knowledge.

As mentioned earlier, in a future post we’ll show you a few simple ways to remove data spikes with Excel before creating forecasts in SkuBrain. So stay tuned, and in the meantime always look at your data and the resulting forecast, and let us know if you have questions. We want you to get value from SkuBrain and become a “Power Forecaster”.

SkuBrain is free to use for 7 days. Sign up here and start forecasting!