Receipt recognition with Azure

We are constantly looking for ways to help you get the most out of your data. One of our customers asked us for a proof of concept (POC) that recognizes information from receipts.

Filing expense reports can be a cumbersome and time-consuming task. Between all the manual data entry, approval workflows, and auditing, there are many pain points across the end-to-end process. With receipt processing, you can minimize those pain points and increase the productivity of your employees, delivering real value back to your business.

Receipt processing lets you read and save key information from common sales receipts, like those used in restaurants, gas stations, retail, and more. Using this information, you can automatically pre-populate expense reports simply by scanning photos of your receipts. And when you automate the process at a large scale, there is the potential to save you and your business valuable time and money.

The prebuilt model uses state-of-the-art optical character recognition (OCR) to extract both printed and handwritten text from receipts. You can retrieve valuable information such as the merchant details, transaction date and time, list of purchased items, tax, and totals.
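
As a rough illustration of how the extracted fields listed above could pre-populate an expense report, here is a minimal Python sketch. The class names, field names, and the `needs_review` check are assumptions made for this example; the prebuilt model's real output schema may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List


@dataclass
class ReceiptItem:
    name: str
    price: Decimal


@dataclass
class Receipt:
    # Hypothetical field names; the prebuilt model's actual schema may differ.
    merchant: str
    transaction_date: date
    items: List[ReceiptItem] = field(default_factory=list)
    tax: Decimal = Decimal("0")
    total: Decimal = Decimal("0")


def to_expense_row(receipt: Receipt) -> dict:
    """Build one pre-populated expense report row from an extracted receipt."""
    subtotal = sum((item.price for item in receipt.items), Decimal("0"))
    return {
        "merchant": receipt.merchant,
        "date": receipt.transaction_date.isoformat(),
        "amount": str(receipt.total),
        # Flag receipts whose items plus tax do not add up to the extracted
        # total, since OCR can misread digits.
        "needs_review": subtotal + receipt.tax != receipt.total,
    }


# Made-up sample data, for illustration only.
receipt = Receipt(
    merchant="Contoso Cafe",
    transaction_date=date(2019, 8, 12),
    items=[
        ReceiptItem("Coffee", Decimal("3.50")),
        ReceiptItem("Bagel", Decimal("2.25")),
    ],
    tax=Decimal("0.46"),
    total=Decimal("6.21"),
)
row = to_expense_row(receipt)
print(row)
```

Using `Decimal` rather than floating point keeps the consistency check exact for currency amounts.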

No training or prior configuration is required to use this prebuilt model. Start processing receipts right away in your apps and flows using the new canvas app component and AI Builder flow action.

Text translation

You can now use AI Builder to translate text into more than 60 languages. This prebuilt model is powered by the latest innovations in machine translation. You can use Text translation to process text in real time from customers worldwide who write in different languages, for internal and external communications, and to keep the language of the text data that you store consistent. It is now available in preview, and no trial or subscription is required to try this feature.

Building a Text Classifier using Azure Machine Learning

Recently a client came to us to see if we could help them automate their RFP distribution system.  Currently the client has an employee manually check several websites for RFPs and alert the appropriate business vertical when a relevant RFP is found.  The current system requires manual data scraping, meaning the process is slow and results in RFPs being missed.  For the proof of concept phase with the client, we decided to build a machine learning model to classify the RFPs correctly and provide a way to automate the routing of the RFPs.  The client wanted to break the project into stages so once the initial Proof of Concept was successful, other parts required to automate the whole process would receive the go-ahead. If you would like a proof of concept, visit our Business Analytics page for information.

Because of the short timeline, we decided to use Microsoft’s Azure Machine Learning Studio to build the model. Azure Machine Learning Studio also provided great visualizations of the model for the client. When we develop an end-to-end solution for the client, Azure Machine Learning Service will be used instead. If you are curious about the differences between Azure ML Studio and Azure ML Service, this article provides an excellent explanation.


First I looked through the Azure AI Gallery to see if there were any projects that would provide guidance in building our text classifier. I found the BBC News Classifier was a great fit. 


Model Evaluation – Confusion Matrix:  


If the model is built correctly, one should see a distribution like the one shown above. The model assigns a probability to each category to reflect its confidence in how to categorize each story. It is normal for a news story to be classified in one main class while the model still recognizes some probability that the story belongs to other classes.
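
To make the idea of per-category probabilities concrete, here is a small Python sketch. It assumes the probabilities come from a softmax over raw scores, which is one common approach; how Azure Machine Learning Studio computes its scored probabilities internally is not covered here. The categories and scores are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative categories and raw scores for a single news story.
categories = ["business", "politics", "sport"]
raw_scores = [2.0, 0.5, -1.0]

probabilities = dict(zip(categories, softmax(raw_scores)))
predicted = max(probabilities, key=probabilities.get)

for category, p in probabilities.items():
    print(f"{category}: {p:.3f}")
print("predicted:", predicted)
```

The story lands in one main class, but every other class still receives a small, non-zero probability.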

The model's evaluation metrics also showed good accuracy.
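
For readers who have not worked with a confusion matrix before, the following Python sketch shows how one can be computed together with overall accuracy. The labels and predictions are invented for illustration and are not the results from this project.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels and predictions.
labels = ["business", "sport", "tech"]
actual = ["business", "business", "sport", "sport", "tech", "tech"]
predicted = ["business", "tech", "sport", "sport", "tech", "business"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", round(accuracy(actual, predicted), 3))
```

Counts on the diagonal are correct predictions; off-diagonal counts show which categories the model confuses with each other.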


Step 1: Receiving and cleaning the data. 

The client uploaded several RFPs into different folders in Teams that were labeled with the client’s verticals.  One of the challenges not solved in this POC is scraping the data from an RFP.  Our focus was on starting small with the classifier to keep things moving forward. Every municipality creates their own version of an RFP, so most RFPs are not uniform.  For this POC, the RFP summary data was scraped manually and added to a data file. 

Step 2: Create the Model 

To start, we followed the BBC News Classifier model outline.  The R-Script module and the found in the BBC News Classifier were switched with the pre-built Preprocess Text module.  When the initial model was run, it classified all the data into the one bucket that had the largest number of examples.  The model was run again including only data with labels with a high number of examples and a comparable amount in each bucket.  Again, poor results.  Time to re-think the model. 

Microsoft has a great reference library for the modules available in Machine Learning Studio. While looking through the documentation on Text Analytics, we found two additional modules to test: “Extract Key Phrases from Text” and “Extract N-Gram Features from Text.” The Extract Key Phrases from Text module extracts one or more phrases deemed meaningful. The Extract N-Gram Features from Text module creates a dictionary of n-grams from free text and identifies the n-grams that have the most information value. The new model was run with a Multi-class Decision Forest algorithm instead of the Multi-Class Neural Network. When the model was run with all the category labels, the results were closer to what was expected, but still not accurate.
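
The n-gram idea can be sketched in a few lines of Python. This is not the Studio module's implementation: the module ranks n-grams by information value, whereas this sketch uses a simpler stand-in and keeps the n-grams that appear in at least `min_doc_count` documents. The function names and sample summaries are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(documents, n=2, min_doc_count=2):
    """Keep n-grams that appear in at least min_doc_count documents."""
    doc_counts = Counter()
    for text in documents:
        # set() so each n-gram counts once per document.
        doc_counts.update(set(ngrams(tokenize(text), n)))
    return sorted(g for g, c in doc_counts.items() if c >= min_doc_count)


# Made-up RFP summaries.
summaries = [
    "Road resurfacing and bridge repair services",
    "Bridge repair and inspection services",
    "Network security audit services",
]
vocabulary = build_vocabulary(summaries)
print(vocabulary)
```

Each document can then be represented by counting how often each vocabulary n-gram occurs in it.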

One drawback was that labels with minimal data were not classified correctly. The model was re-run using only category labels with larger, comparable amounts of data.
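
Restricting the training data to well-represented labels can be done with a simple count-and-filter step. The Python sketch below assumes the data is a list of (summary, label) pairs; the label names and the threshold are made up for the example.

```python
from collections import Counter


def keep_well_represented(rows, min_examples):
    """Drop rows whose label has fewer than min_examples examples."""
    counts = Counter(label for _, label in rows)
    return [(text, label) for text, label in rows if counts[label] >= min_examples]


# Hypothetical (summary, label) pairs.
rows = [
    ("Bridge repair", "Transportation"),
    ("Road resurfacing", "Transportation"),
    ("Traffic signal upgrade", "Transportation"),
    ("Network security audit", "IT"),
    ("Data center migration", "IT"),
    ("Help desk services", "IT"),
    ("Playground equipment", "Parks"),
]

filtered = keep_well_represented(rows, min_examples=3)
print(Counter(label for _, label in filtered))
```

Here the single "Parks" example is dropped, leaving two labels with comparable amounts of data.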

Whoops! That was a step in the wrong direction.  Maybe the n-gram feature wasn’t the best text analytics module to try.  What happens if Feature Hashing is used instead?  Feature Hashing transforms a stream of English text into a set of features represented as integers.  The hashed features can then be passed to the machine learning algorithm to train the text analysis model. 
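
The core idea of feature hashing fits in a short Python sketch. This is an illustration only: it uses MD5 from the standard library as a stand-in hash function and a tiny bucket count, both of which are assumptions of this example rather than what the Studio module does.

```python
import hashlib


def hash_features(text, num_buckets=16):
    """Map each token to a bucket index and count tokens per bucket."""
    vector = [0] * num_buckets
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first four bytes of the digest as an integer, then fold it
        # into the fixed number of buckets.
        index = int.from_bytes(digest[:4], "big") % num_buckets
        vector[index] += 1
    return vector


features = hash_features("bridge repair bridge")
print(features)
```

The vector always has the same length regardless of vocabulary size, at the cost of occasional collisions where different tokens share a bucket.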

Well, that accuracy is much better, but maybe a bit too good. Even though the model used the lowest number of decision trees, the smallest depth, and the fewest random splits, its accuracy was suspiciously high. We should expect to see some distribution, that is, a small probability that an RFP could be classified in other categories.

This could be due to the small size of the dataset being used. It was good to find out that Feature Hashing does a better job than Extract Key Phrases from Text or Extract N-Gram Features from Text. What happens if a different machine learning algorithm (like the Multi-Class Neural Network) is used?

This is the best model yet.  Distribution is across category labels as expected.  There is a good chance of overfitting, but that can be worked out with additional data added to the model. 

Since this was the best model yet, it was re-run with data from all category labels.

Results were encouraging, but clearly more data will be required to appropriately label all categories.  As more data is added, there will be more improvements to the model.  Two options worth considering would be applying an ensemble approach or trying NLP techniques like entity extraction, chunking, or isolating nouns and verbs. 

Step 3:  Automate the Model 

Azure Machine Learning Studio’s option to Set Up a Web Service was used to create a Predictive Experiment and deploy as a web service.  Then using the ML Studio add-in in Excel, a template was created where data can be added, the model can be run, and predictions bucketed into a scored probability column. 

The next step was to create a table that holds the predicted data so that a Flow can pick it up. The Flow is set up to send a notification to a channel in Microsoft Teams.
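
The routing decision itself is simple enough to sketch in Python. This is not the Flow that was built; it only illustrates the logic of mapping a scored prediction to a notification target. The channel names, the fallback channel, and the 0.6 threshold are all hypothetical.

```python
def route_prediction(scored, channels, fallback_channel, threshold=0.6):
    """Pick the channel to notify for one scored RFP."""
    category = max(scored, key=scored.get)
    confidence = scored[category]
    # Low-confidence or unmapped predictions go to a person for review
    # instead of straight to a business vertical.
    needs_review = confidence < threshold or category not in channels
    return {
        "channel": fallback_channel if needs_review else channels[category],
        "category": category,
        "confidence": confidence,
        "needs_review": needs_review,
    }


# Hypothetical mapping from category label to Teams channel name.
channels = {"Transportation": "rfp-transportation", "IT": "rfp-it"}

confident = route_prediction(
    {"Transportation": 0.82, "IT": 0.18}, channels, "rfp-triage"
)
uncertain = route_prediction(
    {"Transportation": 0.55, "IT": 0.45}, channels, "rfp-triage"
)
print(confident)
print(uncertain)
```

Sending uncertain predictions to a triage channel keeps a person in the loop while the model is still trained on limited data.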

This is not a final solution. A further POC will need to complete several additional steps to set up a fully automated solution, but the initial results are promising. What’s important to understand is how flexible this process can be. If the client scoped a different set of requirements, or was in a different industry, we could easily tailor a solution to fit their pain points.

Overview Of Azure Kinect

Pre-requisite Knowledge

 Before we start with the understanding of what is Azure Kinect, we should know,

Background

 I would like to give a short introduction to artificial intelligence and Kinect before jumping into Azure Kinect.

What is Artificial Intelligence

 In simple words, artificial intelligence (AI) is the creation of systems that, like a human, can observe, react, learn, plan, and process instructions, and act intelligently on them. It is a rapidly emerging, internet-enabled technology. AI is sometimes loosely called machine learning, although machine learning is strictly a subfield of AI.

What is Kinect and its background

 Kinect is a motion-sensing device used with the Xbox 360 gaming console. It provides a natural user interface that lets people interact with it without any intermediate device. It is capable of face detection as well as voice recognition. Its 3D camera creates virtual images, and the motion sensor detects movement in those images. The first-generation Kinect for Xbox 360 was introduced in November 2010. The device was originally created for gaming, but nowadays the technology is applied to real-world applications in virtual shopping, education, healthcare, digital signage, and more. Kinect was developed by Microsoft.

Introduction of Azure Kinect

 As I explained above, Kinect is a motion sensor device. The Azure Kinect device has:

  1. DK camera system
  2. 1MP depth camera
  3. 360-degree microphone array
  4. 12MP RGB camera
  5. Orientation sensor
  6. Size and weight – 103 x 39 x 126 mm, weighing only 440 g

Image Source – Microsoft Docs

Azure Kinect gives developers a platform with AI tools that plugs into the Azure cloud for cloud-based services, computer vision, and speech models. Azure Kinect has its own developer kit (DK) from Microsoft, which is available in the portal site here. The Azure Kinect SDK includes a new sensor SDK, a body tracking SDK, vision APIs, and a speech service SDK for Azure Kinect DK. This is the latest feature released by Microsoft for the Azure cloud. Please note that Azure Kinect DK is not designed for use with Xbox.

With Azure Kinect, we can now build applications such as cashierless stores, product inventory management, patient movement tracking integrated with AI in hospitals, enhanced physical therapy, and athletic performance monitoring and improvement.

We can enhance an Azure Kinect application with Azure Cognitive Services: transcribe and translate speech in real time using Speech Services; add object, scene, and activity detection or optical character recognition using Computer Vision; or use Azure IoT Edge to manage PCs connected to your Azure Kinect DK device.

Image Source – Microsoft Docs

The Azure Kinect device costs $399.00 and can be purchased from Microsoft’s store here. As of now (12th August 2019), this product is only available in the US and China.

Inside of Azure Kinect DK

 Image Source – Microsoft Docs

  1. 1-MP depth sensor with wide and narrow field-of-view (FOV) options
  2. 7-microphone array for speech and sound capture
  3. 12-MP RGB video camera for an additional color stream
  4. Accelerometer and gyroscope (IMU) for sensor orientation and spatial tracking
  5. External sync pins to easily synchronize sensor streams from multiple Kinect devices
  6. Azure Kinect Developer Kit
  7. Purchase Azure Kinect from Microsoft Store