
GUEST COMMENT Big data & ecommerce: towards better demand forecasts

Image courtesy of SpreckleyPR

Retail and ecommerce businesses frequently have to find uses for unsold merchandise, and eventually some of that inventory must be liquidated. The associated financial losses can quickly rack up: in 2020, the value of excess inventory in the clothing industry alone reached an estimated £120bn to £130bn worldwide. While no one could have predicted the pandemic, there is still room for better forecasting. The costs of getting it wrong are simply too severe.

Improving forecasts would bring tremendous benefits to ecommerce businesses and to the world at large. As many may already know, big data is the answer. However, big data by itself won’t solve anything. Developing ways to interpret the meaning of that data is the key to unlocking its true power.

Understanding the essence of data

We have to get a little philosophical before continuing. Data is only as good as it is actionable: it has to have a practical application. However, that application is limited by the data’s source. For example, we generally regard the financial statements a company issues as a way to understand how well it’s doing.

Financial statements can be considered a simplified picture of the state of a company. However, that picture might not be 100% accurate. In some cases, financial statements can be somewhat misleading, for example those of startups that run at massive losses but have large financial backing. In such cases, adding more data would make the picture more accurate.

In an idealised sense, data is a simplified representation of the world. Analysis, in turn, is the process of taking that information and deriving accurate real-world conclusions. Processes like data governance ensure that data is of high quality (i.e., is as accurate as possible).

Even exhaust data, which is produced essentially as a by-product of other activities, represents something. A large collection of a specific type of exhaust data, such as documents produced through regular business activities, can reflect the state of the businesses that generated it.

For example, a sudden rise in new business contracts can indicate that a company is growing. Essentially, we are looking at a cause (a sudden rise in business contracts) and predicting an effect (growth). How accurate that prediction is depends on the volume and quality of the data. If we want to predict something accurately, having more information on its causes is generally better.

Think of any particular future event, such as the next UK general election. If we want to predict it accurately, we’d like to know about every possible cause. Failing to account for even one could lead us to a different conclusion. Therefore, accounting for more potential causes leads to more accurate predictions.

Demand forecasting is no different. Demand is a future event with many causes. Having access to, and the ability to process, large swaths of information gathered from every corner of the globe can greatly increase the accuracy of forecasts.

Improving demand forecasting

In many ecommerce businesses, demand forecasting is still based solely on the historical sales of each individual product. Although sophisticated statistical methods have been developed (mostly for working with time series), in practice they rarely move beyond univariate analysis, that is, modelling one product’s sales history in isolation. In the past, that was a sensible choice, as little other data was available. Now, it looks more like tradition than good mathematics.
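The univariate approach can be illustrated with simple exponential smoothing, one of the classic single-series techniques. The Python sketch below is only an illustration and does not describe any particular retailer's method. The function name, the smoothing parameter `alpha` and the sales figures are all hypothetical. The only input is one product's own sales history, which is exactly the limitation described above.

```python
def simple_exponential_smoothing(sales, alpha=0.3):
    """Return a one-step-ahead forecast from a single product's sales history.

    This is univariate: the only input is this product's own past sales.
    alpha (0 < alpha <= 1) controls how heavily recent observations are weighted.
    """
    if not sales:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = sales[0]
    for observed in sales[1:]:
        # New level = weighted mix of the latest observation and the old level
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical monthly unit sales for one product
history = [120, 135, 128, 150, 160, 155]
print(round(simple_exponential_smoothing(history, alpha=0.3), 2))  # 145.21
```

Nothing in this calculation can anticipate a demand shift that has not already appeared in the product's own past sales.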

Let’s dissect what goes into product demand. Of course, historical data of that particular product is a reflection of its demand at a particular time and place. These circumstances are unlikely to repeat themselves exactly. In order to get a more complete picture of possible variance, as researchers Bandara et al. have noted, we have to take into account other similar products.

Obtaining time series for other products grouped under the same category gives us a glimpse of the different paths the same forecast may take. For simplicity’s sake, think of any product that went through a period of sudden popularity (e.g., fidget spinners). Accurate predictions will be nigh impossible from a single product’s history, as such a product is a non-essential commodity at the mercy of the whims of the crowd.

However, if the yearly sales variance of other similar products were included in the calculations, it would provide insight into possible peaks and troughs. Of course, some proportional adjustments are needed: the historical data of product #1 perfectly reflects its own past demand, whereas the historical data of product #2 will be a less accurate guide to product #1.
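The proportional adjustment can be sketched with seasonal indices. This minimal Python sketch is a deliberately simple stand-in and is not the method of Bandara et al. It works in three steps:

1. Each product's history is divided by its own average, so that products of different sizes become comparable.
2. The peers' normalised profiles are averaged.
3. The average is blended with the target product's own profile.

The blending weight `own_weight` is a hypothetical parameter expressing how much more we trust product #1's own history than its peers'. All names and figures are illustrative.

```python
from statistics import mean


def normalise(series):
    """Divide a series by its own mean so products of different sizes are comparable."""
    m = mean(series)
    if m == 0:
        raise ValueError("series mean must be non-zero")
    return [x / m for x in series]


def blended_profile(target, similar, own_weight=0.5):
    """Blend the target's normalised profile with the peers' average profile.

    own_weight (hypothetical) is the share given to the target's own history;
    the similar products share the remaining 1 - own_weight.
    """
    if not 0 <= own_weight <= 1:
        raise ValueError("own_weight must be between 0 and 1")
    own = normalise(target)
    if not similar:
        return own
    n = len(target)
    if any(len(s) != n for s in similar):
        raise ValueError("all series must cover the same periods")
    peers = [normalise(s) for s in similar]
    peer_avg = [mean(p[i] for p in peers) for i in range(n)]
    return [own_weight * o + (1 - own_weight) * p for o, p in zip(own, peer_avg)]


def forecast_next_cycle(target, similar, own_weight=0.5):
    """Scale the blended profile back to the target product's average level."""
    level = mean(target)
    return [level * f for f in blended_profile(target, similar, own_weight)]


# Hypothetical quarterly sales: product #1 looks flat, but a similar
# product shows a strong second-half peak.
product_1 = [100, 100, 100, 100]
product_2 = [50, 50, 150, 150]
print(forecast_next_cycle(product_1, [product_2], own_weight=0.5))
# [75.0, 75.0, 125.0, 125.0]
```

Both component profiles average to 1, so the blend keeps product #1's overall sales level. Only the timing of the peaks and troughs is borrowed from its peers.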

Yet, even with the most complex mathematical methods and large internal databases, forecasts will not reach their maximum accuracy. To get there, external data is required.

How external data drives forecasts

A decade ago, most companies were limited to internal data. Now, with the advent of web scraping and its ever-increasing global availability, significantly more data points are available. For ecommerce and demand forecasting, that is an unprecedented boon.

Web scraping is so beneficial because it provides access to information much closer to the consumer. Previously, businesses derived data about demand (which is really a prediction about human behaviour more than anything else) from product-related information. Demand can clearly be predicted better if data closely related to the consumer is included.

Let’s go back to the previous example: a product that experienced a surge in popularity. Any prediction derived from historical data alone is likely to be inaccurate, or to have so much variance that it is impractical. Such surges are usually driven by external factors. These can be either instant (e.g., celebrity marketing) or exponential (e.g., compounding good reviews and the growing popularity of related user-generated content).

With large-scale web scraping, that information can be gathered, provided it is public. Advanced machine learning and semantic analysis tools can then be used to derive additional data points, such as sentiment. Adding these factors to forecasting models might make them more resilient in both the short and the long term.
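Adding an external signal to a forecasting model can be sketched as a linear regression whose inputs are last period's sales plus a sentiment score. This dependency-free Python sketch rests on three stated assumptions:

- The data are synthetic and were generated so that sentiment genuinely drives demand.
- The `sentiment` values stand in for whatever a scraping and semantic-analysis pipeline would produce.
- All function names are hypothetical.

Ordinary least squares is used only because it is the simplest multivariate model. Real forecasting systems may well use richer models.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_model(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Each row is a list of features; an intercept column is prepended.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, features):
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


def mean_abs_error(coefs, rows, targets):
    return sum(abs(predict(coefs, r) - y) for r, y in zip(rows, targets)) / len(targets)


# Synthetic (hypothetical) data: demand depends on last period's sales
# AND on a public sentiment score in [-1, 1] for the same period.
sentiment = [0.1, 0.5, -0.2, 0.8, 0.3, 0.0, 0.9, 0.4]
sales = [100.0]
for s in sentiment:
    sales.append(10 + 0.5 * sales[-1] + 40 * s)

targets = sales[1:]
lag_only = [[prev] for prev in sales[:-1]]
lag_and_sentiment = [[prev, s] for prev, s in zip(sales[:-1], sentiment)]

coefs_without = fit_linear_model(lag_only, targets)
coefs_with = fit_linear_model(lag_and_sentiment, targets)
mae_without = mean_abs_error(coefs_without, lag_only, targets)
mae_with = mean_abs_error(coefs_with, lag_and_sentiment, targets)
print(f"MAE, lagged sales only:        {mae_without:.2f}")
print(f"MAE, lagged sales + sentiment: {mae_with:.2f}")
```

The synthetic sales were built from sentiment, so the lag-only model cannot fit them exactly while the model with sentiment can. On real data, any gain would have to be measured on held-out periods rather than assumed.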

In short, web scraping provides access to incredibly valuable and previously inaccessible information. Of course, certain barriers have to be overcome before any business can start aggregating external data. Web scraping is a technically complex process, especially at scale. It is so complex that most businesses don’t do it in-house, but turn to third-party web scraper API providers.

External data does have a unique benefit, though. It is often created and shared publicly by consumers who are already engaged with the product. Incorporating these data points into prediction models covers significantly more of the causes of demand, which in turn makes forecasts more accurate.

Better forecasting benefits more than just the business itself. It reduces losses from unsold products and hours wasted on inventory management. It can also greatly reduce the negative environmental impact across the supply chain. If businesses are less likely to oversupply themselves with certain products, fewer unnecessary items will be produced, and the problem of destroying unsold stock is addressed at its root.

As external data acquisition technologies improve over time, new opportunities for demand forecasting will arise. We have yet to see the full potential of big data harnessed in ecommerce, but the benefits may be much greater than most people anticipate. Big data can help us not only optimise our businesses, but also reduce our environmental impact.


Gediminas Rickevičius, Director of Strategic Partnerships at
