Detecting Duplicates

When you upload data to SkuBrain for the first time, the file usually includes historical data up to “today”. You then run a forecast and create a replenishment plan. After a month, you will want to refresh your data so that you can compare forecasts to actuals. Should you re-upload everything, or upload only the new month’s data by itself? The answer depends on how you did your first upload.

Duplicate Row Detection

SkuBrain detects duplicate rows using the LineReference column in the input. This column should contain a unique identifier for that row. Think of it as a key for that row.

If SkuBrain encounters two rows with the same LineReference, the second occurrence is considered an update. Here is an example:

In this example, rows 3 and 4 produce only one entry, because both have the same LineReference (L13272). The resulting quantity will be either 12 or 25, but not 12 + 25 (there is a technical reason for this). The important point is that a single file should never contain rows with the same LineReference.
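One way to guard against this is to check a file for repeated LineReference values before uploading it. The following is a minimal sketch, not part of SkuBrain itself: the function name, the `Sku` and `SaleQuantity` column names, and the sample rows other than L13272 are assumptions made for illustration. Only the `LineReference` column name comes from the text above.

```python
import csv
import io
from collections import Counter


def find_duplicate_line_references(csv_text, column="LineReference"):
    """Return the LineReference values that occur more than once, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return sorted(ref for ref, n in counts.items() if n > 1)


# Hypothetical sample file: data rows 3 and 4 share the same LineReference.
sample = """LineReference,Sku,SaleQuantity
L13270,SKU-A,4
L13271,SKU-B,7
L13272,SKU-C,12
L13272,SKU-C,25
"""

duplicates = find_duplicate_line_references(sample)
print(duplicates)  # ['L13272']
```

An empty result means every row in the file has its own LineReference, so each row will be counted exactly once.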

Next Month

Now, suppose your import file in the next month (for the same project) has additional rows of data, but also includes the following entry:

Because the LineReference (L13272) is the same as that of an existing row in the database, SkuBrain will update the Sale Quantity for that SKU to 3.0. You can also do this on purpose: it is an easy way to handle returns (which affect your sales history).
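The update behaviour described above can be pictured as a table keyed by LineReference. The sketch below is only a toy model under that assumption, not SkuBrain's actual implementation: `apply_upload`, the `SaleQuantity` field name, the original quantity of 12.0, and the rows other than L13272 are all hypothetical.

```python
def apply_upload(store, rows):
    """Insert new rows; overwrite any row whose LineReference already exists."""
    for row in rows:
        store[row["LineReference"]] = row["SaleQuantity"]
    return store


store = {}

# First month (hypothetical values): L13272 is stored with a quantity of 12.0.
apply_upload(store, [
    {"LineReference": "L13272", "SaleQuantity": 12.0},
    {"LineReference": "L13273", "SaleQuantity": 5.0},
])

# Next month: one new row, plus a correction for the existing L13272.
apply_upload(store, [
    {"LineReference": "L13272", "SaleQuantity": 3.0},
    {"LineReference": "L14001", "SaleQuantity": 8.0},
])

total = sum(store.values())
print(store["L13272"])  # 3.0
print(total)            # 16.0
```

In this model the corrected row replaces the earlier one rather than adding to it, which is why a return can be recorded by re-sending the same LineReference with the reduced quantity.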

So What?

Now that we know how LineReference works, the answer to our question above is, “It Depends”:

  1. You specified LineReference values explicitly in your original upload…
    Simply continue to do so in your subsequent uploads. You can upload files with just the new data (“incremental” uploads), or you can upload files that include the old data and the new data together, in one massive file. As long as the old data keeps its original LineReference values, and your new data has new LineReference values, everything will tally up.
  2. You didn’t specify LineReference values in your original CSV upload…
    SkuBrain automatically generates unique LineReference values if these weren’t specified. Therefore, subsequent uploads should contain only new data. They should not contain data you’ve already uploaded before. If they do, SkuBrain will treat these as new rows and you’ll find your Sales History doubling every time you upload a new file (it may look great, but it wouldn’t be accurate!).
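The difference between the two cases can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a description of SkuBrain's internals: `make_uploader`, the `AUTO-n` key format, and all quantities are hypothetical, and rows are assumed to be stored by LineReference with later uploads overwriting matching keys.

```python
import itertools


def make_uploader():
    """Return an upload function backed by its own in-memory store."""
    store = {}
    auto_ids = itertools.count(1)

    def upload(rows):
        for row in rows:
            # Assumption: a missing LineReference gets a fresh generated key.
            ref = row.get("LineReference") or f"AUTO-{next(auto_ids)}"
            store[ref] = row["SaleQuantity"]
        return sum(store.values())  # total sales history after this upload

    return upload


# Case 1: explicit LineReference values, full re-upload in month two.
month1 = [{"LineReference": "L1", "SaleQuantity": 10},
          {"LineReference": "L2", "SaleQuantity": 20}]
month2 = [{"LineReference": "L3", "SaleQuantity": 5}]
upload = make_uploader()
upload(month1)
explicit_total = upload(month1 + month2)

# Case 2: no LineReference values, full re-upload in month two.
plain1 = [{"SaleQuantity": 10}, {"SaleQuantity": 20}]
plain2 = [{"SaleQuantity": 5}]
upload = make_uploader()
upload(plain1)
reupload_total = upload(plain1 + plain2)

# Case 2, done correctly: upload only the new month's rows.
upload = make_uploader()
upload(plain1)
incremental_total = upload(plain2)

print(explicit_total, reupload_total, incremental_total)  # 35 65 35
```

Without explicit keys, the re-uploaded rows are counted a second time (65 instead of 35), whereas the incremental upload gives the same total as the explicitly keyed full re-upload.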