Automating Receipt Digitization with AI
We know filing expense reports can be tedious, especially if you have to manually enter data from receipts. If you travel a lot, you want this to be as easy as possible. And, you want your teams and colleagues to minimize the time spent on this activity as well. Since employees in a company can collectively spend a lot of time on this, we thought that automating this task was important.
To give you a perspective, in Coupa alone close to a million receipts get submitted every month. There is a lot of value we give to our customers if we can shave time from all the related expense reports.
Coupa’s receipt extraction tool, which is part of Coupa Expense, helps us do exactly that. It also provides a seamless user experience, returning the key fields (as seen in the figure above: merchant, date, currency, and amount) within seconds and with high accuracy. Users upload the receipt and that’s it! Expense line details are automatically filled in, on the go.
Behind the scenes
Coupa’s receipt extraction model consists of Google’s Vision API to get the receipts’ text and coordinates, with our in-house machine learning (ML) models built on top. We opted to use the Google Vision API as it is the best Optical Character Recognition (OCR) solution in the market. The OCR API returns to us the text (or more specifically the characters) in the receipt and the location of those on the receipt (much like graph coordinates). But it can’t read the text or understand its context. That is what our ML models are built to do. We built four different ML models to extract each of the fields from the incomprehensible OCR output.
All these steps happen within a few seconds. Not only is our solution fast, it’s also accurate. Our models were more accurate than text-reading services from the largest leading firms offering services in this area (because they are focused on reading all types of text, while we focused our algorithms on reading receipts) and more accurate than third-party manual data entry (because these services need to go fast and people make mistakes).
How are we able to do it? We have developed robust feature engineering using our rich community data and utilize additional context around expense receipts. For example, we use the submission time of a receipt as an input to correctly extract the expense date. Similarly, we use the default user currency as an input to the algorithm, to return better results for the receipt’s currency. Combining all these innovations has made our receipt extraction models one of the best solutions in the market.
Within this framework, there were still challenges. Expense receipts come in from over millions of merchants, many of them seen only once in our community. This means there are hundreds of thousands of templates and layouts that the algorithms should be able to understand correctly to return to us our fields of interest. A lot of methods use rule-based extraction built on top of predefined templates. But this approach would not have worked for us.
Besides the sheer variety in the receipt templates, there is also the issue of variety in receipt legibility. Sometimes the receipt images are of poor quality, are handwritten or faded, have shaky images, have extremely small fonts, have wrinkles, etc. In these cases, the character recognition fails and our models can’t do anything with really bad data and return null.
Our models were built over a series of multiple iterations. We built and tested many hypotheses, carefully layered the ML with sets of rules to decrease false positives, and experimented with multiple models and features until we reached the level of accuracy that makes this feature reliable.
It was not just accuracy that we improved. When you are doing expense reports, you want it to go fast. Our models took account of speed and accuracy trade off depending on the value of the extracted field to our customers. For example, having a higher accuracy for amount and currency is more important than their processing time. Whereas, for merchant and expense date fields, we tuned the algorithms to prioritize speed.
This product keeps user experience front and center, while deploying the best ML technology in the market behind the scenes. We will continue to increase the accuracy and decrease the processing time as we move ahead, so Coupa Expense users can spend their time in more meaningful ways instead of fretting about that pile of receipts.