AI does not rhyme with (big) data
Is data the lifeblood of the business or can we do without it to create efficient Artificial Intelligence? Forecasting.ai has taken up the subject.
No, get it out of your head! Size does not necessarily determine performance. We are, of course, talking about machine learning. Researchers from the University of Waterloo have recently demonstrated the feasibility of "less than one-shot" learning. In other words, a model can learn to identify something without ever having seen an example of it. How can this be done?
In their September 2020 paper, entitled "'Less Than One'-Shot Learning: Learning N Classes From M<N Samples", researchers Ilia Sucholutsky and Matthias Schonlau from the University of Waterloo in Ontario explained how they were able to create a machine learning model capable of classifying items with fewer than one example per class.
Machine learning in less than one shot
The principle is ultimately simple to understand. Take the example of an individual who must capture a unicorn without ever having seen one before. "He is not familiar with the local wildlife and there are no pictures of unicorns, so the humans show him a picture of a horse and a picture of a rhinoceros, and tell him that a unicorn is something between the two", the two researchers explain. "With just two examples, the individual has now learned to recognize three different animals". In effect, the individual will be able to identify a horse, a rhinoceros and, by "deduction," a unicorn.
To carry out this learning, which we will call LO-shot, the researchers chose a kNN (k-Nearest Neighbors) classifier, a relatively simple supervised machine learning algorithm that normally relies on labeled data. The difference is that the data were not labeled in the usual way. The kNN was fed data with much looser labels than those usually used. As the researchers put it in their paper, a soft label is simply "the vector representation of a point's simultaneous membership in multiple classes". To summarize, a picture of a rhinoceros used to be labeled only "rhinoceros", whereas today it can be labeled not only "rhinoceros" but also "mammal", "quadruped", etc.
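To make the idea of a soft label concrete, here is a minimal Python sketch contrasting a classic "hard" label with a soft one. The class list, the helper names and the 0.6/0.4 memberships are assumptions chosen for illustration; they do not come from the paper.

```python
# Illustrative class list (an assumption, not taken from the paper).
CLASSES = ["horse", "unicorn", "rhinoceros"]

def hard_label(name):
    """Classic one-hot label: the point belongs to exactly one class."""
    return [1.0 if c == name else 0.0 for c in CLASSES]

def soft_label(memberships):
    """Soft label: one membership value per class, normalized to sum to 1."""
    raw = [float(memberships.get(c, 0.0)) for c in CLASSES]
    total = sum(raw)
    return [value / total for value in raw]

# A horse picture labeled the usual way, then with a made-up soft label
# saying it is mostly "horse" but partly "unicorn".
horse_hard = hard_label("horse")
horse_soft = soft_label({"horse": 0.6, "unicorn": 0.4})
```

The hard label carries a single piece of information, while the soft label also says how close the picture is to another class that never appears in the data.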
Less data but still data
By exposing the algorithm to data samples that do not have a direct one-to-one relationship with a specific class, but that instead reflect a continuous spectrum between two points, the researchers theorized that the algorithm would be able to infer the correct class even if it had never actually seen it.
The researchers trained their algorithm, called "soft-label prototype k-NN" (SLaPkNN), on the soft labels and found that it could correctly identify classes that it had not been exposed to in the training data. To return to the unicorn example, the SLaPkNN learned to identify a unicorn by seeing pictures of a horse and a rhinoceros and learning that the unicorn is somewhere in the middle.
Without going into the details of this research, we can see that data is still needed to train a model. Less data is required to perform machine learning, but it is still data.
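The unicorn example can be sketched in a few lines of Python. This is a simplified, distance-weighted soft-label prototype kNN written for illustration, not the authors' code: the one-dimensional "feature" axis, the prototype positions, the soft-label values and the function name `slapknn_predict` are all assumptions.

```python
CLASSES = ["horse", "unicorn", "rhinoceros"]

# Two prototypes on a made-up 1-D feature axis (assumption):
# position 0.0 is the horse picture, position 1.0 the rhinoceros picture.
# Each carries a soft label over all three classes.
PROTOTYPES = [
    (0.0, [0.6, 0.4, 0.0]),
    (1.0, [0.0, 0.4, 0.6]),
]

def slapknn_predict(x, prototypes=PROTOTYPES, k=2, eps=1e-9):
    """Sum the soft labels of the k nearest prototypes, each weighted by
    the inverse of its distance to x, and return the top-scoring class."""
    nearest = sorted(prototypes, key=lambda p: abs(p[0] - x))[:k]
    scores = [0.0] * len(CLASSES)
    for position, label in nearest:
        weight = 1.0 / (abs(position - x) + eps)  # eps avoids division by zero
        for i, membership in enumerate(label):
            scores[i] += weight * membership
    return CLASSES[scores.index(max(scores))]

for x in (0.1, 0.5, 0.9):
    print(x, slapknn_predict(x))
```

With only two labeled points, the sketch separates three classes: points near 0 are classified as "horse", points near 1 as "rhinoceros", and points in the middle as "unicorn", a class for which no example was ever provided.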
Democratization of AI in sight?
Allowing AIs to learn with less data will nonetheless help democratize the field of artificial intelligence. With smaller AIs, it will be easier for academia to continue its research in the field and keep its best researchers. LO-shot learning lowers the barriers to entry by reducing training costs and lowering data requirements. It also gives users more flexibility to create new data sets and experiment with new approaches.
Smaller companies will finally have the opportunity to train models even if they don't have large amounts of data. This new approach could soon liberate many companies, even if data has not yet said its last word!