Automating Receipt Digitization with AI

Syed Zakiuddin & Shoan Jain

Syed Zakiuddin is a Data Scientist at Coupa's AI-Center of Excellence. He joined Coupa after completing his Bachelor's degree in Chemical Engineering at the Indian Institute of Technology Bombay.

Shoan Jain is a Senior Data Scientist at Coupa's AI-Center of Excellence. She joined Coupa through the acquisition of Deep Relevance, a startup that used AI to monitor occupational fraud. She holds a Master's degree in International and Development Economics from Yale University and a Bachelor's degree in Economics from the Indian Institute of Technology Kanpur.

We know filing expense reports can be tedious, especially when you have to enter data from receipts by hand. If you travel a lot, you want this to be as easy as possible, and you want your teams and colleagues to spend as little time on it as possible too. Because employees across a company can collectively spend a lot of time on this, we saw automating this task as important.

To put this in perspective, close to a million receipts are submitted every month on Coupa alone. Shaving time off each of the related expense reports adds up to a lot of value for our customers.

Receipt Information Extraction Example

Coupa’s receipt extraction tool, part of Coupa Expense, does exactly that. It also provides a seamless user experience, returning the key fields (merchant, date, currency, and amount, as seen in the figure above) within seconds and with high accuracy. Users upload the receipt, and that’s it! Expense line details are filled in automatically, on the go.

Behind the scenes

Coupa’s receipt extraction model combines Google’s Vision API, which gets the receipt’s text and coordinates, with our in-house machine learning (ML) models built on top. We opted for the Google Vision API because it is the best Optical Character Recognition (OCR) solution on the market. The OCR API returns the text (more specifically, the characters) on the receipt along with their locations on the receipt (much like graph coordinates). But it cannot interpret that text or understand its context: it does not know which string is the merchant name and which number is the total. That is what our ML models are built to do. We built four ML models, one for each field, to extract the merchant, date, currency, and amount from the otherwise unstructured OCR output.
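
To make "text plus coordinates" concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical token format (`OcrToken`, with a left edge, top edge, and height per word) rather than the actual Google Vision API response schema, and the helper `group_into_lines` is illustrative, not part of Coupa's pipeline. It groups words into reading-order lines by vertical position, a common first step before any field model can see context such as "TOTAL" sitting next to "12.50".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrToken:
    """One recognized word and its bounding box (hypothetical, simplified format)."""
    text: str
    x: float       # left edge of the box, in pixels
    y: float       # top edge of the box, in pixels
    height: float  # height of the box, in pixels

    @property
    def centre(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.height / 2


def group_into_lines(tokens: List[OcrToken], tolerance: float = 0.5) -> List[str]:
    """Group OCR tokens into text lines, each read left to right.

    A token joins the current line when its vertical centre lies within
    `tolerance` times the height of the line's first token from that token's centre.
    """
    lines: List[List[OcrToken]] = []
    for token in sorted(tokens, key=lambda t: t.centre):
        if lines:
            first = lines[-1][0]
            if abs(token.centre - first.centre) < tolerance * first.height:
                lines[-1].append(token)
                continue
        lines.append([token])  # start a new line
    # Within each line, order tokens by their left edge.
    return [" ".join(t.text for t in sorted(line, key=lambda t: t.x)) for line in lines]


tokens = [
    OcrToken("12.50", x=300, y=402, height=20),
    OcrToken("TOTAL", x=20, y=400, height=20),
    OcrToken("CAFE", x=20, y=10, height=24),
    OcrToken("ROMA", x=90, y=12, height=24),
]
print(group_into_lines(tokens))  # ['CAFE ROMA', 'TOTAL 12.50']
```

Comparing each token only against the first token of the current line keeps the sketch short; skewed or curved receipt photos would need a more robust grouping.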

All these steps happen within a few seconds. Our solution is not only fast but also accurate. Our models were more accurate than the text-reading services of the largest firms in this area (those services are built to read all types of text, while we focused our algorithms on reading receipts) and more accurate than third-party manual data entry (those services have to work fast, and people make mistakes).

How are we able to do it? We developed robust features from our rich community data, and we use additional context around each expense receipt. For example, we use the submission time of a receipt as an input to correctly extract the expense date. Similarly, we use the user’s default currency as an input to return better results for the receipt’s currency. Combining these innovations has made our receipt extraction models one of the best solutions on the market.
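
The sketch below illustrates, under stated assumptions, how submission time and a default currency can act as context. The functions `pick_expense_date` and `pick_currency`, the date regex, and the currency tables are hypothetical stand-ins for Coupa's learned models. An ambiguous date like 03/04/2023 is read both as DD/MM and as MM/DD, readings after the submission date are discarded, and the latest remaining one wins. An ambiguous "$" resolves to the user's default currency only when that currency is consistent with the symbol.

```python
import re
from datetime import date
from typing import List, Optional

# Hypothetical, deliberately narrow pattern for numeric dates such as 03/04/2023 or 3-4-23.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4}|\d{2})\b")

# Hypothetical lookup tables for illustration only.
SYMBOL_TO_CURRENCIES = {"$": {"USD", "CAD", "AUD"}, "€": {"EUR"}, "£": {"GBP"}}
KNOWN_CODES = {"USD", "CAD", "AUD", "EUR", "GBP", "INR"}


def candidate_dates(text: str) -> List[date]:
    """Every valid calendar date the text could denote, reading each match as DD/MM and MM/DD."""
    candidates = set()
    for first, second, year in DATE_PATTERN.findall(text):
        full_year = int(year) + 2000 if len(year) == 2 else int(year)
        for day, month in ((int(first), int(second)), (int(second), int(first))):
            try:
                candidates.add(date(full_year, month, day))
            except ValueError:
                pass  # impossible reading, e.g. month 30
    return sorted(candidates)


def pick_expense_date(text: str, submitted_on: date) -> Optional[date]:
    """Latest candidate date that is not after the submission date, or None."""
    plausible = [d for d in candidate_dates(text) if d <= submitted_on]
    return max(plausible) if plausible else None


def pick_currency(text: str, default_currency: str) -> Optional[str]:
    """Resolve the receipt currency, using the user's default currency as context."""
    # An explicit ISO code on the receipt wins outright.
    codes = [c for c in re.findall(r"\b[A-Z]{3}\b", text) if c in KNOWN_CODES]
    if codes:
        return codes[0]
    for symbol, currencies in SYMBOL_TO_CURRENCIES.items():
        if symbol in text:
            # Ambiguous symbol: trust the default only if it is consistent with it.
            return default_currency if default_currency in currencies else None
    # No evidence on the receipt: fall back to the default as a prior.
    return default_currency
```

With a submission date of 10 March 2023, `pick_expense_date("Date: 03/04/2023", date(2023, 3, 10))` returns 4 March 2023, because the 3 April reading lies in the future. Returning `None` when the evidence conflicts, rather than guessing, is a deliberate choice in this sketch.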

Within this framework, there were still challenges. Expense receipts come from millions of merchants, many of which appear only once in our community. This means there are hundreds of thousands of templates and layouts that the algorithms must understand correctly to return our fields of interest. Many methods use rule-based extraction built on predefined templates, but with this much variety, that approach would not have worked for us.

Besides the sheer variety of receipt templates, receipts also vary in legibility. Sometimes the images are of poor quality: the receipts are handwritten, faded, wrinkled, or printed in extremely small fonts, or the photos are shaky. In these cases, character recognition fails, and since our models can’t do anything with such bad data, they return null.

Our models were built over multiple iterations. We built and tested many hypotheses, carefully layered the ML with sets of rules to decrease false positives, and experimented with multiple models and features until we reached a level of accuracy that makes this feature reliable.
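
One way to picture "layering ML with rules" is a post-processing gate: a model prediction is kept only if its confidence clears a per-field threshold and it passes cheap sanity rules; otherwise the field is left empty (null), matching the behavior described for illegible receipts. This is a minimal sketch; the thresholds in `MIN_CONFIDENCE`, the rules in `passes_rules`, and the `accept` function are hypothetical and are not Coupa's actual rules.

```python
from datetime import date
from typing import Optional, Union

FieldValue = Union[str, float, date]

# Hypothetical per-field confidence thresholds; real values would be tuned on labelled receipts.
MIN_CONFIDENCE = {"merchant": 0.6, "date": 0.7, "currency": 0.8, "amount": 0.9}


def passes_rules(field: str, value: FieldValue, submitted_on: date) -> bool:
    """Cheap sanity rules that veto predictions no receipt should produce (illustrative)."""
    if field == "amount":
        return isinstance(value, float) and 0 < value < 1_000_000
    if field == "date":
        return isinstance(value, date) and value <= submitted_on  # no future expense dates
    if field == "currency":
        return isinstance(value, str) and len(value) == 3 and value.isalpha() and value.isupper()
    if field == "merchant":
        return isinstance(value, str) and value.strip() != ""
    return False  # unknown field: never accept


def accept(field: str, value: Optional[FieldValue], confidence: float,
           submitted_on: date) -> Optional[FieldValue]:
    """Keep a model prediction only if it is confident enough and passes the rules; else None."""
    if value is None or confidence < MIN_CONFIDENCE.get(field, 1.0):
        return None  # leave the field empty for the user rather than guess
    return value if passes_rules(field, value, submitted_on) else None
```

Rules like these are cheap to evaluate and catch predictions that are confidently wrong, which is where false positives hurt users most.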

It was not just accuracy that we improved. When you are doing expense reports, you want them to go fast. Our models balance the trade-off between speed and accuracy according to how much each extracted field matters to our customers. For amount and currency, higher accuracy matters more than processing time, whereas for the merchant and expense date fields, we tuned the algorithms to prioritize speed.
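
A common way to implement a per-field speed/accuracy trade-off is a model cascade, sketched here under assumptions: every field first goes through a fast model, and only the accuracy-first fields (amount and currency) are escalated to a slower, more accurate model when the fast prediction is missing or below a confidence threshold. The `extract_field` function, the `escalate_below` threshold, and the two models are hypothetical placeholders; the post does not say how Coupa implements this trade-off.

```python
from typing import Callable, Optional, Tuple

# A model takes the receipt's OCR text and returns (predicted value or None, confidence in [0, 1]).
Model = Callable[[str], Tuple[Optional[str], float]]

# Hypothetical policy mirroring the post: accuracy first for amount and currency,
# speed first for merchant and date.
ACCURACY_FIRST = {"amount", "currency"}


def extract_field(field: str, text: str, fast_model: Model, slow_model: Model,
                  escalate_below: float = 0.9) -> Optional[str]:
    """Cascade: always run the fast model; escalate accuracy-first fields to the slow
    model when the fast prediction is missing or not confident enough."""
    value, confidence = fast_model(text)
    if field in ACCURACY_FIRST and (value is None or confidence < escalate_below):
        value, _ = slow_model(text)  # pay extra latency only where accuracy matters most
    return value
```

For example, `extract_field("amount", text, fast, slow)` calls `slow` only when `fast` returns a confidence under 0.9, while merchant and date always return the fast model's answer. In practice, the fast model might be a lightweight classifier and the slow one a larger model or ensemble.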

This product keeps user experience front and center, while deploying the best ML technology in the market behind the scenes. We will continue to increase the accuracy and decrease the processing time as we move ahead, so Coupa Expense users can spend their time in more meaningful ways instead of fretting about that pile of receipts.