Far from the financial analysts of Wall Street, MIT researchers have developed a more accurate forecasting model that draws on alternative data to estimate a company's financial results and, from them, its value. The study was published in the December 2019 issue of the highly regarded Proceedings of the ACM on Measurement and Analysis of Computing Systems.
Are MIT researchers ready to replace the best financial analysts on Wall Street? Or at least, is the forecasting model they have trained ready to do so? That, at any rate, is what the paper suggests.
A forecasting model for what?
The purpose of this model was to forecast the quarterly earnings of about 30 companies, a task usually performed by financial analysts. It goes without saying that such a tool would ultimately be of use to potential investors who today call on these specialists. To deliver their reports and predict the future profits of a given company, these analysts rely on a variety of public data, calculation tools and, above all, their own intuition. Knowing a company's sales helps to determine its value and whether it is a good idea to invest in it.
Based on this principle, researchers at the Massachusetts Institute of Technology have developed an automated model that outperforms humans at forecasting sales by using different data and, of course, by putting aside feelings and intuition. This model relies in particular on what is known as noisy data, available only in very limited quantities. Noisy data is data that may be corrupted or distorted, or that has a very low signal-to-noise ratio. It is data, yes, but data that may carry uninterpretable information.
Which alternative data make for better predictions?
This noisy data, also known as alternative data, had never before been shown to yield more accurate or more frequent estimates of a company's future sales and profits than those of financial analysts. That was before MIT researchers took up the subject and obtained better results by feeding it into a classical linear systems model, along with more conventional but less frequent financial data such as quarterly earnings, press releases and stock prices. Combine these two types of data and you get more accurate forecasts than those delivered by financial analysts.
However, to fully understand what alternative data contribute to assessing the health of a company, it is first necessary to define what they are. The financial markets have been paying close attention to them for some years now, without yet managing to exploit them, because they provide a great deal of information about consumers and, consequently, about the companies that sell them products or services. Alternative data can be location data from smartphones, data on credit card purchases or, more surprisingly, satellite images used to count the cars in a retailer's parking lot.
Financial analysts outperformed in 57% of forecasts
Michael Fleder, a postdoctoral researcher in the Information and Decision Systems Laboratory at MIT, sums it up as follows: "Alternative data are those strange substitution signals that allow you to track the underlying finances of a company." He adds: "We asked ourselves if we could combine this noisy data with quarterly figures to determine a company's true financial data, and the answer is yes." In 57% of cases, the forecasts made this way beat the estimates of financial analysts, who had access to private and public company data or other machine learning models.
It is thus easy to understand the interest of such a model, whether for potential investors or for retailers who would like to know more about their competitors' sales. Social and political science researchers could also see it as a way to learn more about people through the study of aggregate, anonymous data available for purchase.
A lack of data to be more precise?
Indeed, there is already a great deal of consumer data available for sale. Buying credit card transaction data or location data can allow a retailer to know exactly what its competitors are selling, or advertisers to see whether their campaigns have actually increased sales. Until now, however, drawing those conclusions has depended on human expertise, since no machine learning model had been able to do so.
This time, a model does seem to manage it, even if there are still problems to be solved, such as a lack of data. "We have a problem of 'small data'," says Michael Fleder. "You only get a tiny fraction of what people spend, and you have to extrapolate and infer what actually happens from that fraction of data." A company's quarterly report ends up providing just a single figure as input, while bank card data add only about 100 noisy data points, which may themselves carry uninterpretable information.
How to calculate daily sales?
From a pension fund, the researchers obtained consumer credit card transactions and quarterly reports for 34 retailers between 2015 and 2018. Even with only 306 quarters of data in total across all the companies studied, the MIT researchers managed to do better than financial analysts.
Calculating the daily sales of a business or retailer is, in principle, quite simple. The model assumes that a company's sales remain similar from one day to the next, only slightly increasing or decreasing. In mathematical terms, each day's sales equal the previous day's sales multiplied by a constant, which represents the slight variation, plus a statistical noise value that represents the randomness of a company's sales. Given these parameters, a standard inference algorithm can solve this equation and deliver an accurate forecast of daily sales. The trick, of course, is to determine these parameters.
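The assumption described above can be written as a simple recurrence. What follows is a minimal sketch rather than the researchers' code: the function name `simulate_daily_sales`, the parameters `a` (the constant multiplier) and `noise_std`, the choice of Gaussian noise and all the numbers are assumptions made purely for illustration.

```python
import random


def simulate_daily_sales(first_day, a, noise_std, n_days, seed=0):
    """Simulate sales[t+1] = a * sales[t] + noise.

    Illustrative assumption: the noise is Gaussian with mean 0 and
    standard deviation noise_std. None of these names come from the study.
    """
    rng = random.Random(seed)
    sales = [first_day]
    for _ in range(n_days - 1):
        noise = rng.gauss(0.0, noise_std)
        sales.append(a * sales[-1] + noise)
    return sales


# Made-up numbers: about 1000 in sales on day one, drifting by 0.1% a day.
daily = simulate_daily_sales(first_day=1000.0, a=1.001, noise_std=20.0, n_days=90)
quarter_total = sum(daily)  # what a quarterly report would show as one figure
print(len(daily), round(quarter_total))
```

The quarterly total at the end shows why the problem is hard in reverse: ninety different daily values collapse into a single reported number.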
The Kalman filter for predicting daily sales
This is where quarterly business reports come into play, together with probabilistic techniques. It is not enough to divide the figures in the quarterly reports by 90 days to obtain a company's daily sales. That would mean that a company's sales are the same day after day, which they clearly are not. Then there is the question of including alternative data such as credit card purchases. It is not known in advance what fraction of total sales they represent, nor how noisy, and therefore how inaccurate, they are.
To estimate the likely sales on a single day, the researchers therefore used the Kalman filter, a variant of the standard inference algorithm used, for example, in speed cameras or in your smartphone's GPS. The Kalman filter makes it possible to estimate the state of a dynamic system from incomplete or noisy series of measurements. In this case, it produces a probability distribution over daily sales from noisy data observed over time.
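To make the idea concrete, here is a minimal sketch of a one-dimensional Kalman filter. It is not the model from the study: it assumes a single hidden quantity (daily sales), a known multiplier `a`, a known fraction `c` of sales visible in the card data, and known noise variances `q` and `r`. All names and numbers are hypothetical.

```python
def kalman_filter(observations, a, c, q, r, x0, p0):
    """Scalar Kalman filter (illustrative sketch).

    State:        x[t] = a * x[t-1] + process noise (variance q)
    Observation:  y[t] = c * x[t]   + measurement noise (variance r)
    x0, p0 are the mean and variance of the belief about the day
    before the first observation. An observation of None means that
    no measurement is available for that day.
    Returns a list of (mean, variance) estimates of x[t].
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: push yesterday's belief one day forward.
        x = a * x
        p = a * a * p + q
        if y is not None:
            # Update: correct the prediction with today's noisy measurement.
            k = p * c / (c * c * p + r)  # Kalman gain
            x = x + k * (y - c * x)
            p = (1.0 - k * c) * p
        estimates.append((x, p))
    return estimates


# Made-up card spending, with gaps on days without data.
card_spend = [52.0, None, 49.5, 51.0, None, 50.2]
for day, (mean, var) in enumerate(
    kalman_filter(card_spend, a=1.0, c=0.05, q=25.0, r=4.0, x0=1000.0, p0=1e4)
):
    print(day, round(mean, 1), round(var, 1))
```

Each estimate is a mean and a variance, in other words a probability distribution rather than a single number. On days without data the filter only predicts, so its uncertainty grows until the next measurement arrives.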
The model is then trained as follows: quarterly sales are broken down into a set number of days, with sales allowed to vary from one day to the next. The observed, noisy credit card data are then tied to the as yet unknown daily sales. By extrapolating from the quarterly figures, it becomes possible to determine the likely share of total sales that the credit card data represent. This yields the fraction of sales captured per day, the noise level and even an estimate of the error introduced by the technique.
The loop is then closed: the standard inference algorithm has all the data it needs to predict a company's daily sales, and it does so better than Wall Street analysts in 57.2% of cases. A good start, some will say, while waiting for an even more accurate model.
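One simple way to picture the step that ties card data to quarterly figures is a least-squares fit across quarters. This is a deliberately simplified stand-in, not the estimation procedure from the paper: the function name `estimate_card_fraction` and the sample figures are hypothetical, and the fit assumes the fraction is constant over time.

```python
def estimate_card_fraction(card_totals, reported_sales):
    """Fit card_total ~= fraction * reported_sales by least squares.

    card_totals:    card spending summed over each quarter
    reported_sales: sales from the matching quarterly reports
    Returns (fraction, noise_var), where noise_var is the residual
    variance, used here as a crude estimate of the noise level.
    """
    if not card_totals or len(card_totals) != len(reported_sales):
        raise ValueError("need one card total per quarterly report")
    num = sum(c * s for c, s in zip(card_totals, reported_sales))
    den = sum(s * s for s in reported_sales)
    fraction = num / den
    residuals = [c - fraction * s for c, s in zip(card_totals, reported_sales)]
    dof = max(len(residuals) - 1, 1)  # one parameter was fitted
    noise_var = sum(e * e for e in residuals) / dof
    return fraction, noise_var


# Made-up figures for four quarters of one retailer.
card = [4.9, 5.3, 5.0, 6.1]
reported = [100.0, 104.0, 101.0, 120.0]
fraction, noise_var = estimate_card_fraction(card, reported)
print(round(fraction, 4), round(noise_var, 4))
```

With so few quarters per company, an estimate of this kind is itself uncertain, which is the "small data" problem described earlier.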