Automatic product classification from POS receipts
Today, every mom & pop shop, convenience store, restaurant, fast-food franchise, bistro, and cafe has its own Point of Sale (POS) system. POS solutions are used for managing inventory, sales, and finances. They help run the business and make life easier for the staff. Long gone are the days of pen and paper for writing down orders.
The food and beverage industry represents a significant proportion of the FMCG trade, yet this sector is largely untracked owing to the complexity of classifying products outlet by outlet: merchants usually describe the products themselves, without any guidance.
A common product classification would therefore enable reporting of general trends. This would benefit outlet owners as well as suppliers – e.g. beverage producers.
Let’s look at an example of a 330ml can of Coca-Cola. This product can be coded as:
Coca-Cola 330ml
Coke 0,33l
CC 330ml
Coke can
Soda 330
Etc.
We believe you can already see the issue. It is difficult to generalize the sales patterns if the same product is described in many ways.
That said, for every problem, there is a solution.
In this blog post, we will explore the process of unifying the product category classification from receipts.
Product Category Classification in 6 steps
1. Labeled Examples
To train the ML models, labeled examples are required. At datasapiens, we possess a vast product catalogue that can be utilized to pre-train the ML model.
We enhance the model's accuracy and effectiveness via an iterative process of expansion and refinement.
2. Sampling
Stratified random sampling is a technique used to ensure that each product category is equitably represented in the sample.
Why do we balance the sample? Category frequencies in receipt data are usually heavily skewed: a few categories dominate while others have only a handful of examples. A model trained on an unbalanced sample would favor the frequent categories, so we ensure each category has comparable representation and all products can be properly classified.
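As an illustration of how a balanced sample might be drawn, here is a minimal standard-library sketch (the product labels and category names below are invented for the example):

```python
import random
from collections import defaultdict

def stratified_sample(labeled_products, per_category):
    """Draw up to `per_category` examples from each category."""
    by_category = defaultdict(list)
    for label, category in labeled_products:
        by_category[category].append((label, category))
    sample = []
    for category, items in by_category.items():
        k = min(per_category, len(items))
        sample.extend(random.sample(items, k))
    return sample

# Invented toy catalogue: note the skew towards "soft drinks".
products = [
    ("coca-cola 330ml", "soft drinks"),
    ("coke 0,33l", "soft drinks"),
    ("fanta can", "soft drinks"),
    ("pilsner 0.5l", "beer"),
    ("wheat beer", "beer"),
    ("house red 0.2l", "wine"),
]
balanced = stratified_sample(products, per_category=1)
```

In a real pipeline the per-category quota would be tuned so that rare categories are not starved of examples, but the principle is the same.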
3. Pre-processing the dataset
The first step in preparing our dataset involves applying classic transformations to the labels. This includes converting all labels to lowercase and applying Unicode normalization (e.g. stripping diacritics). Then we remove articles, prepositions, and conjunctions that do not affect the meaning of the label. Additionally, we replace mass, volume, packaging, and percentage information with generic keywords.
We then reduce words to their stems by applying stemming techniques to further standardize the label format. An alternative is lemmatization, which relies on morphological analysis of the words.
Finally, we remove common words that are not specific to any category – for example, retailer brands like Kroger or Walmart.
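A minimal sketch of such a normalization step might look as follows. The stop-word and retailer lists are invented placeholders, and stemming (e.g. via NLTK's SnowballStemmer) is left out to keep the example dependency-free:

```python
import re
import unicodedata

# Hypothetical word lists for illustration only.
STOPWORDS = {"of", "the", "and", "with"}
RETAILERS = {"kroger", "walmart"}

def preprocess(label):
    # Lowercase, then strip diacritics via Unicode normalization.
    label = label.lower()
    label = unicodedata.normalize("NFKD", label)
    label = "".join(c for c in label if not unicodedata.combining(c))
    # Replace volume / mass / percentage info with generic keywords.
    label = re.sub(r"\d+([.,]\d+)?\s*(ml|cl|dl|l)\b", "<VOL>", label)
    label = re.sub(r"\d+([.,]\d+)?\s*(kg|g)\b", "<MASS>", label)
    label = re.sub(r"\d+([.,]\d+)?\s*%", "<PCT>", label)
    # Drop stop words and retailer brands.
    tokens = [t for t in label.split() if t not in STOPWORDS | RETAILERS]
    return " ".join(tokens)

print(preprocess("Kroger Coca-Cola 330ml"))  # coca-cola <VOL>
```

After this step, "Coca-Cola 330ml" and "Coke 0,33l" already look much more alike: both end in the same generic `<VOL>` token.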
4. Word Embeddings
To use prediction algorithms on text data, we first need to convert the text into a numeric representation. This process is known as text vectorization, and several methods are available; the most widely used include Bag of Words, TF-IDF, Word2Vec, GloVe, ELMo, and BERT. Count-based methods such as Bag of Words and TF-IDF produce a Term-Document Matrix (TDM), while embedding models such as Word2Vec, GloVe, ELMo, and BERT map words (or whole labels) to dense vectors. Either representation can then be used for machine-learning tasks such as classification or clustering.
The text vectorization methods are coupled with the aforementioned pre-processing steps.
We run various word embedding models and then compare performance to select the best-suited one.
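To make the count-based end of that spectrum concrete, here is a tiny hand-rolled TF-IDF term-document matrix over invented toy labels (in practice one would use a library such as scikit-learn's `TfidfVectorizer`, which also handles smoothing and normalization):

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a term-document matrix with simple TF-IDF weights."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # Term frequency times inverse document frequency.
        row = [(counts[t] / len(doc)) * math.log(n_docs / df[t])
               for t in vocab]
        matrix.append(row)
    return vocab, matrix

docs = ["coca cola can", "coke can", "pilsner beer"]
vocab, tdm = tfidf_matrix(docs)
```

Terms that appear in every document get an IDF of zero, while terms specific to one document (like "beer" above) receive the highest weight, which is exactly what makes TF-IDF useful for distinguishing categories.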
5. Feature Engineering
Feature engineering is a crucial step in the machine learning process: it applies expert knowledge to the data so that learning algorithms can learn effectively.
Good feature engineering can significantly enhance the performance of a model, much like giving accurate information to a learning child. By creating meaningful features, we provide a more faithful representation of the underlying patterns in the data, which in turn improves both the performance and the interpretability of our models.
Examples:
volumetric labels: e.g. food is never sold in ml
adjectives: e.g. red / white / rosé for wine vs. dark / light / wheat for beer
price: e.g. spirits and wine are generally more expensive per unit than beer or soft drinks
volume: e.g. spirits [0.02-0.05 l] vs. wine [0.1-0.2 l] vs. beer in larger quantities
location or chain: e.g. Heineken pubs will sell Heineken
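A few of the features above could be extracted along these lines. The keyword lists and the feature names are hypothetical choices for the sketch, not our production feature set:

```python
import re

# Hypothetical adjective lists for illustration only.
WINE_WORDS = {"red", "white", "rose"}
BEER_WORDS = {"dark", "light", "wheat"}

def extract_features(label, unit_price):
    """Hand-crafted features to complement the text embedding."""
    text = label.lower()
    tokens = text.split()
    volume = None
    m = re.search(r"(\d+(?:[.,]\d+)?)\s*(ml|cl|l)\b", text)
    if m:
        value = float(m.group(1).replace(",", "."))
        # Normalize everything to litres.
        volume = value / {"ml": 1000, "cl": 100, "l": 1}[m.group(2)]
    return {
        "volume_l": volume,            # e.g. food is never sold in ml
        "has_wine_adjective": any(t in WINE_WORDS for t in tokens),
        "has_beer_adjective": any(t in BEER_WORDS for t in tokens),
        "unit_price": unit_price,      # spirits/wine cost more per unit
    }

features = extract_features("house red 0,2l", unit_price=4.5)
```

These features are then concatenated with the embedded label representation before training.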
6. Model Training
By now, we have obtained the embedded representation of the labels along with additional features. We can proceed with training Neural Networks (NN) to predict the product classification.
Convolutional and Recurrent Neural Networks are well suited to this task: they can exploit the local and sequential dependencies in the label text, and with the right architecture they achieve high accuracy in classification.
Overall, leveraging NN techniques helps us classify labels accurately and, as a result, extract valuable insights from the data.
We run multiple NN models and compare performance to select the best-suited one.
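To keep this post self-contained, here is a dependency-free stand-in for that training step: a minimal single-layer softmax classifier trained by gradient descent on two invented hand-crafted features. The real pipeline would train a CNN/RNN over the embedded labels with a framework such as PyTorch or TensorFlow; only the overall train-then-predict shape carries over:

```python
import math

def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Minimal softmax classifier trained with per-sample gradient descent."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            scores = [sum(w_i * x_i for w_i, x_i in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            m = max(scores)                       # for numerical stability
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            probs = [e / total for e in exps]
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the class score.
                grad = probs[c] - (1.0 if c == target else 0.0)
                for i in range(n_feat):
                    W[c][i] -= lr * grad * x[i]
                b[c] -= lr * grad
    return W, b

def predict(W, b, x):
    scores = [sum(w_i * x_i for w_i, x_i in zip(W[c], x)) + b[c]
              for c in range(len(W))]
    return scores.index(max(scores))

# Invented toy features: [volume in litres, has_beer_adjective]
X = [[0.33, 0.0], [0.5, 1.0], [0.33, 0.0], [0.5, 1.0]]
y = [0, 1, 0, 1]  # 0 = soft drink, 1 = beer
W, b = train_softmax(X, y, n_classes=2)
```

The model comparison works the same way at any scale: train several candidates, evaluate them on a held-out set, and keep the best performer.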
The opportunity in hospitality and independent corner-shop retail is huge. Nowadays, suppliers like Coca-Cola, as well as the shop owners themselves, have to navigate these waters almost blind.
Delivering accurate data can help the industry become more data-driven, efficient and customer-centric.
If you’re a food & beverage manufacturer, or a major POS provider to the restaurant and convenience sector, get in touch to discover the untapped value.