Automatic product classification from POS receipts

Today, every mom & pop shop, convenience store, restaurant, fast-food franchise, bistro, or cafe has its own Point of Sale (POS) system. POS solutions manage inventory, sales, and finances; they help run the business and make life easier for the staff. Long gone are the days of writing down orders with pen and paper.

The food and beverage industry represents a significant proportion of the FMCG trade, yet this sector is largely untracked owing to the complexity of classifying products by outlet. Merchants usually describe their products themselves, without any guidance or shared naming convention.

A common product classification would therefore enable reporting of general trends. This would benefit outlet owners as well as suppliers, e.g. beverage producers.

Let’s look at an example of a 330ml can of Coca-Cola. This product can be coded as:

  • Coca-Cola 330ml

  • Coke 0,33l

  • CC 330ml

  • Coke can

  • Soda 330

  • Etc.

We believe you can already see the issue. It is difficult to generalize the sales patterns if the same product is described in many ways.

That said, for every problem, there is a solution.

In this blog post, we will explore the process of unifying the product category classification from receipts.


Product Category Classification in 6 steps

1.    Labeled Examples

To train the ML models, labeled examples are required. At datasapiens, we maintain a large product catalogue that we use to pre-train the ML model.

We enhance the model's accuracy and effectiveness via an iterative process of expansion and refinement.

2.    Sampling

Stratified random sampling is a technique used to ensure that each product category is equitably represented in the sample.

Why do we balance the sample? The raw category distribution in receipt data is heavily skewed: a handful of categories dominate, while others appear only rarely. Ensuring that each category is adequately represented lets the model learn to classify all products, not just the frequent ones.
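
The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not our production pipeline; the function name, record shape, and cap per category are all assumptions made for the example:

```python
import random
from collections import defaultdict

def balanced_sample(records, n_per_category, seed=42):
    """Draw an equal-sized random sample from each category.

    `records` is a list of (label, category) pairs; `n_per_category`
    caps how many examples each category contributes.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for label, category in records:
        by_category[category].append(label)

    sample = []
    for category, labels in by_category.items():
        k = min(n_per_category, len(labels))
        sample.extend((label, category) for label in rng.sample(labels, k))
    return sample

receipts = [
    ("coca-cola 330ml", "soft drinks"),
    ("coke 0,33l", "soft drinks"),
    ("soda 330", "soft drinks"),
    ("pilsner 0.5l", "beer"),
    ("merlot 0.2l", "wine"),
]
# every category contributes exactly one example here,
# even though soft drinks dominate the raw data
sample = balanced_sample(receipts, n_per_category=1)
```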

3. Pre-processing the dataset

The first step in preparing our dataset is to apply classic transformations to the labels: converting them to lowercase and applying Unicode normalization (e.g. stripping accents). We then remove articles, prepositions, and conjunctions that do not affect the meaning of the label. Additionally, we replace mass, volume, packaging, and percentage information with generic keywords.

Next, we reduce words to their stems by applying stemming, which further standardizes the label format. An alternative is lemmatization, which takes the morphological analysis of the words into account.

Lastly, we remove common words that are not specific to any category, for example retailer brand names such as Kroger or Walmart.
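
A minimal sketch of these transformations, using only the Python standard library. The stopword and brand lists are tiny stand-ins (real ones are much larger), the `<vol>`/`<mass>`/`<pct>` keywords are illustrative choices, and stemming is omitted here (in practice a library such as NLTK would supply it):

```python
import re
import unicodedata

# tiny illustrative lists -- production versions are far more complete
STOPWORDS = {"the", "a", "an", "of", "and", "with"}
RETAILER_BRANDS = {"kroger", "walmart"}

def preprocess_label(label):
    # lowercase, then strip accents via Unicode normalization
    label = label.lower()
    label = unicodedata.normalize("NFKD", label)
    label = "".join(ch for ch in label if not unicodedata.combining(ch))

    # replace volume / mass / percentage info with generic keywords
    label = re.sub(r"\b\d+([.,]\d+)?\s*(ml|cl|dl|l)\b", "<vol>", label)
    label = re.sub(r"\b\d+([.,]\d+)?\s*(g|kg)\b", "<mass>", label)
    label = re.sub(r"\b\d+([.,]\d+)?\s*%", "<pct>", label)

    # drop stopwords and retailer brand names
    tokens = [t for t in label.split()
              if t not in STOPWORDS and t not in RETAILER_BRANDS]
    return " ".join(tokens)

print(preprocess_label("Kroger Coca-Cola 330ml"))  # -> "coca-cola <vol>"
```

Note how two of the Coca-Cola codings from the introduction now collapse to the same normalized form.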

4. Word Embeddings

We first need to convert the text into a numeric representation before we can run prediction algorithms on it. This process is known as text vectorization, and there are several methods available to accomplish this task. The most widely used include Bag of Words, TF-IDF, Word2Vec, GloVe, ELMo, and BERT. Count-based methods such as Bag of Words and TF-IDF produce a Term-Document matrix (TDM), while the embedding models produce dense vectors. Either representation can then be used for machine-learning tasks such as classification or clustering.

The text vectorization methods are coupled with the aforementioned pre-processing steps.

We run various word embedding models and then compare performance to select the best-suited one.
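To make the simplest of these methods concrete, here is a from-scratch TF-IDF vectorizer for short labels. This is a didactic sketch (in practice one would reach for a library such as scikit-learn); the function name and the toy labels are invented for the example:

```python
import math
from collections import Counter

def tfidf_vectorize(docs):
    """Build TF-IDF vectors for pre-processed labels (one row per label)."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for doc in tokenised for tok in doc})
    n_docs = len(tokenised)

    # document frequency: in how many labels each term appears
    df = Counter(tok for doc in tokenised for tok in set(doc))

    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vectors.append([
            (tf[term] / len(doc)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors

labels = ["coke can", "coke bottle", "wine bottle"]
vocab, vectors = tfidf_vectorize(labels)
```

Terms that appear in many labels (like "coke" here) get down-weighted, while distinctive terms (like "wine") keep a high weight, which is exactly what helps a downstream classifier.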

5. Feature Engineering

Feature engineering is a crucial step in the machine learning process: it applies expert domain knowledge so that the algorithms can learn effectively.

Good feature engineering can significantly enhance the performance of a model, much like giving accurate information to a learning child. By creating meaningful features, we provide a more faithful representation of the underlying patterns in the data, which in turn improves both the performance and the interpretability of our models.

Examples:

  • volumetric labels: e.g. food is never sold in ml

  • adjectives: red / white / rose for wine vs dark, light, wheat for beer

  • price: e.g. spirits and wine are generally more expensive per unit than beer or soft drinks

  • volume: e.g. spirits [0.02 / 0.05 l] vs wine [0.1 / 0.2 l] vs beer in larger quantities

  • location or chain: e.g. Heineken pubs will sell Heineken
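
Features of this kind can be derived with a small amount of code. The sketch below is illustrative only: the function name, the adjective lists, the price threshold, and the feature keys are all assumptions made for the example, not our actual feature set:

```python
import re

# illustrative colour adjectives that separate wine from beer
WINE_ADJECTIVES = {"red", "white", "rose"}
BEER_ADJECTIVES = {"dark", "light", "wheat"}

def engineer_features(label, unit_price, chain=None):
    """Turn one receipt line into hand-crafted side features."""
    tokens = label.lower().split()
    vol_match = re.search(r"(\d+(?:[.,]\d+)?)\s*(ml|cl|l)\b", label.lower())
    volume_l = None
    if vol_match:
        value = float(vol_match.group(1).replace(",", "."))
        volume_l = value * {"ml": 0.001, "cl": 0.01, "l": 1.0}[vol_match.group(2)]
    return {
        "sold_in_ml": vol_match is not None and vol_match.group(2) == "ml",
        "wine_adjective": any(t in WINE_ADJECTIVES for t in tokens),
        "beer_adjective": any(t in BEER_ADJECTIVES for t in tokens),
        "volume_l": volume_l,
        "high_unit_price": unit_price is not None and unit_price > 10.0,
        "chain": chain,
    }

features = engineer_features("White wine 0,2l", unit_price=4.5, chain="Heineken pub")
```

These features are fed to the model alongside the text embedding, giving the classifier signals that the label text alone cannot provide.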

6. Model Training

By now, we have obtained the embedded representation of the labels along with additional features. We can proceed with training Neural Networks (NN) to predict the product classification.

Convolutional and Recurrent Neural Networks are among the most effective architectures for this task: convolutions capture local character- and word-level patterns in the labels, while recurrent layers capture sequential dependencies. Both are capable of achieving high accuracy in classification tasks.

Overall, leveraging NN techniques helps us classify labels accurately and, as a result, extract valuable insights from the data.

We run multiple NN models and compare performance to select the best-suited one.
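
To give a flavour of the training step, here is a minimal one-hidden-layer network in NumPy, trained with plain gradient descent on toy data standing in for the embedded labels. It is a deliberately simplified stand-in for the CNN/RNN architectures described above; every name and number in it is an assumption made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_classifier(X, y, n_classes, hidden=16, lr=0.5, epochs=500):
    """Train a one-hidden-layer softmax network on embedded labels."""
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                     # one-hot targets

    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # hidden activations
        logits = h @ W2 + b2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)        # softmax probabilities

        # backpropagate the cross-entropy loss
        g_logits = (p - Y) / n
        g_W2, g_b2 = h.T @ g_logits, g_logits.sum(axis=0)
        g_h = g_logits @ W2.T * (1 - h**2)       # tanh derivative
        g_W1, g_b1 = X.T @ g_h, g_h.sum(axis=0)

        W1 -= lr * g_W1; b1 -= lr * g_b1
        W2 -= lr * g_W2; b2 -= lr * g_b2

    def predict(X_new):
        return np.argmax(np.tanh(X_new @ W1 + b1) @ W2 + b2, axis=1)
    return predict

# toy stand-in for embedded labels: two well-separated clusters
X = np.vstack([rng.normal(0, 0.3, (20, 4)), rng.normal(2, 0.3, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
predict = train_classifier(X, y, n_classes=2)
```

In the real pipeline, `X` would be the concatenation of the label embeddings and the engineered features, and `y` the product category from the labeled catalogue.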


The opportunity in the hospitality industry or independent corner shop retail is huge. Nowadays, suppliers like Coca-Cola as well as the shop owners themselves need to navigate these waters almost blind.

Delivering accurate data can help the industry become more data-driven, efficient and customer-centric.

If you’re a food & beverage manufacturer, or a major POS provider to the restaurant and convenience sector, get in touch to discover unlocked value.
