Using AI-Driven Commodity Classification to Improve Business Spend Visibility
Have you ever wondered why your credit card bill is so high? What did I spend it on? What is the source of this subscription fee? Maybe I should get rid of it! I spent $200 a month on coffee? I should probably make my own coffee and reduce trips to the coffee shop. And just like we all need to analyze and understand our own personal spending, there is a similar need for businesses to better analyze, understand, and optimize their spend.
Every day, Coupa’s customers transact and manage billions of dollars of spend through several hundreds of thousands of transactions with their suppliers. These transactions involve the flow of commodities (goods and services) from suppliers to customers and payments flowing from customers to suppliers. Coupa’s customers deeply care about where their spend goes and how they spend in comparison to their peers. In fact, having a thorough understanding of spend by category is foundational for Business Spend Management (BSM).
In most transactions, the commodity purchased is listed from a taxonomy unique to the company, which limits the ability to gain insight from the community. Additionally, commodities listed against transactions are often inaccurate. Mapping transaction-level spend to a standard taxonomy across all customers is therefore foundational. In addition to driving spend visibility, it enables community-driven insights like supplier recommendations, pricing insights, sourcing optimization, fraud mitigation, and much more.
AI-based commodity classification
Coupa has developed a proprietary AI-based commodity classification engine that maps millions of our customers’ transactions to a standard taxonomy. We call it the Coupa taxonomy and, unlike other taxonomies, it is designed to facilitate high-value activities such as category management and strategic sourcing. Currently, the Coupa taxonomy has roughly 60 primary categories and 1,800 sub-categories.
What are the challenges with transaction level classification?
Classifying transactions across all customers and mapping them to a standard Coupa taxonomy comes with several challenges:
1. Scale of data in terms of volume and velocity at which it changes
Active customers per year and transactions per customer have been growing consistently at Coupa. This produces a constant influx of new transactions, which in turn causes pattern drift in the underlying data.
2. Lack of labelled representative data for AI models to train on
AI algorithms need labelled data that is representative of the population to learn patterns that extrapolate well for all customers. Even a small percentage of 160 million transactions adds up to several months of human work-hours to label.
3. Poor signal quality in some segments of the data
About 10 to 20 percent of transactions are hard to identify even for humans. This mainly occurs due to incorrect or incomplete inputs from users. Additionally, Coupa has customers all over the world, so the data comes in many languages, which adds complexity to signal extraction.
4. Many sub-categories in the Coupa taxonomy to map to
The accuracy of the commodity classification engine usually decreases as the number of sub-categories increases. Handling more sub-categories requires more labelled data of higher quality, more compute resources, and better algorithms, all of which add complexity to the commodity classification process.
To overcome these challenges, we designed a “supplier-centric” approach. It grew out of insights from discussions with internal sales and customer-facing teams, along with extensive data analysis on a sample of transactions from the Coupa data. The key signals feeding Coupa’s commodity classification engine come from suppliers: the type of company, invoice and purchase order descriptions, and customer commodities listed against transactions. The sections below give a high-level overview of the approach, which roughly doubled the accuracy of transaction-level classification.
How supplier classification works
In order to classify transactions, we need to understand the suppliers involved in these transactions and the commodities they deal with. Imagine you saw a $5 charge from Starbucks in your credit card statement. You may easily recognize that it was spent on coffee. You identified the commodity “coffee” because you knew what Starbucks supplies.
Figure 2 illustrates this concept. You can see Supplier A doing repair services and supplying bolts for two customers. We can deduce that the supplier deals mainly with two primary categories: (1) Facility Maintenance and Equipment and (2) MRO Equipment and Supplies. Now we can confidently classify all the transactions from Supplier A and restrict the mappings to those two categories.
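As a rough illustration of restricting mappings to a supplier's known categories, here is a minimal Python sketch. It is not Coupa's implementation: the SUPPLIER_CATEGORIES lookup, the function name, and the idea that a model produces one score per category are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical supplier database: supplier -> primary categories it deals with.
SUPPLIER_CATEGORIES: Dict[str, Set[str]] = {
    "Supplier A": {
        "Facility Maintenance and Equipment",
        "MRO Equipment and Supplies",
    },
}


def classify_with_supplier_prior(supplier: str, scores: Dict[str, float]) -> str:
    """Return the best-scoring category, limited to the supplier's categories.

    `scores` stands in for whatever per-category scores a model produces
    (an assumption for this sketch).
    """
    allowed = SUPPLIER_CATEGORIES.get(supplier)
    if allowed is None:
        # Unknown supplier: no restriction can be applied.
        candidates = dict(scores)
    else:
        candidates = {c: s for c, s in scores.items() if c in allowed}
    if not candidates:
        # None of the allowed categories were scored: fall back to all scores.
        candidates = dict(scores)
    return max(candidates, key=candidates.get)


example_scores = {
    "Food and Beverage": 0.5,
    "MRO Equipment and Supplies": 0.3,
    "Facility Maintenance and Equipment": 0.2,
}
restricted = classify_with_supplier_prior("Supplier A", example_scores)
unrestricted = classify_with_supplier_prior("Unknown Supplier", example_scores)
```

In this toy input, the highest raw score belongs to a category Supplier A does not deal with, so the restriction changes the outcome; for an unknown supplier the raw best score is used.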
The challenge here is to accurately map suppliers to the Coupa taxonomy and build a robust supplier database. Based on the sample analysis, we identified a few helpful supplier insights:
- Concentration of spend with few suppliers: 4% of the suppliers contribute 85% of the spend, so the remaining 96% of the suppliers account for only the last 15% (the tail spend).
- Suppliers deal with a few categories: 95% of the suppliers deal with a single Coupa primary category, and most of the remaining suppliers deal with three or fewer. Only a few large marketplace suppliers, such as Amazon and Staples, have more than three primary categories.
- Commodities from suppliers almost always fall in the set of primary categories they deal with: for example, there is almost no chance Starbucks would supply products or services related to “Facility Maintenance & Equipment.”
Based on these insights, we classified the high-spend suppliers (the 4% of suppliers from the first insight above) to Coupa primary categories. We did this with the help of our internal AI Classification (AIC) team, which provides labeling services for our customers. This group’s expertise in classifying spend data helped us create high-quality training data.
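To make the idea of picking out the high-spend suppliers concrete, here is a small Python sketch that selects the fewest suppliers needed to cover a target share of total spend. The function name, the input shape, and the toy numbers are hypothetical; the 85% default simply mirrors the figure quoted above.

```python
from typing import Dict, List


def top_suppliers_by_spend(
    spend_by_supplier: Dict[str, float], target_share: float = 0.85
) -> List[str]:
    """Return suppliers, largest first, until they cover `target_share` of spend."""
    total = sum(spend_by_supplier.values())
    if total <= 0:
        return []
    selected: List[str] = []
    running = 0.0
    ranked = sorted(spend_by_supplier.items(), key=lambda kv: kv[1], reverse=True)
    for supplier, spend in ranked:
        if running / total >= target_share:
            break  # Target share already covered by the suppliers selected so far.
        selected.append(supplier)
        running += spend
    return selected


# Toy data (made up for illustration): one supplier dominates total spend.
toy_spend = {"A": 900.0, "B": 60.0, "C": 30.0, "D": 10.0}
head = top_suppliers_by_spend(toy_spend)           # covers at least 85%
wider_head = top_suppliers_by_spend(toy_spend, 0.95)  # covers at least 95%
```

The suppliers returned form the short "head" that would be worth labeling manually first; everything else is the long tail.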
By first classifying to primary categories, we are able to improve sub-category classification. Also, it enables us to expand to the remaining 96% of the suppliers. We do this with a combination of “AI and human in the loop” techniques, signals from manually labelled data (from 4% of the suppliers), and other relevant signals such as customer commodities.
How customer commodity classification works
When customers use their own taxonomies, it still provides signals at an aggregate level, which also helps us classify suppliers. For example, in Figure 3, customer 1 assigns a transaction as “Toilet Repair” and customer 3 assigns another transaction as “Pipe Leaks & Repairs,” both coming from the same Supplier A.
If you look closely at these two commodities, they both fall into “Facility Maintenance & Equipment” and specifically “Plumbing Services.” This helps us infer that Supplier A deals with “Facility Maintenance & Equipment” related goods and services, which helps us narrow down the possibilities to make the final classification.
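One simple way to picture this aggregate signal is a vote across customers: each customer-assigned commodity is mapped to a primary category, and the most frequent category for a supplier wins. The Python sketch below assumes a hypothetical COMMODITY_TO_PRIMARY lookup and a plain majority vote; the actual engine may combine signals quite differently.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical lookup from customer commodity labels to Coupa primary categories.
COMMODITY_TO_PRIMARY = {
    "toilet repair": "Facility Maintenance & Equipment",
    "pipe leaks & repairs": "Facility Maintenance & Equipment",
    "office snacks": "Food & Beverage",
}


def infer_supplier_category(
    transactions: Iterable[Tuple[str, str]], supplier: str
) -> Optional[str]:
    """Majority vote over the primary categories implied by customer commodities.

    Each transaction is a (supplier, customer_commodity) pair.
    Returns None when no commodity for the supplier can be mapped.
    """
    votes: Counter = Counter()
    for txn_supplier, commodity in transactions:
        if txn_supplier != supplier:
            continue
        primary = COMMODITY_TO_PRIMARY.get(commodity.strip().lower())
        if primary is not None:
            votes[primary] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


toy_transactions = [
    ("Supplier A", "Toilet Repair"),         # from customer 1
    ("Supplier A", "Pipe Leaks & Repairs"),  # from customer 3
    ("Supplier B", "Office Snacks"),
]
inferred = infer_supplier_category(toy_transactions, "Supplier A")
```

A vote like this tolerates a few mislabeled transactions, since a single incorrect customer commodity is outweighed by the consistent ones.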
Using this signal helps further build a robust supplier database. It helps customers with unique taxonomies gain more spend visibility into their own taxonomy. It helps them fill in the gaps in their own data.
Commodity classification shows how Coupa uses AI to enable features
Commodity classification is an example of how Coupa uses AI behind the scenes to enable many Coupa features. Commodity classification at the transaction level enables true spend visibility for our customers and unlocks many community-driven opportunities such as supplier recommendations, pricing insights, sourcing, and risk and fraud mitigation.
Though classification at the transaction level is challenging with pure AI, breaking it into subtasks and bringing in “human in the loop” techniques help reduce a great deal of effort and significantly improve the quality of classification output. We saw a 1.5X to 2X improvement with a partial implementation of this approach, and we expect to see more improvements in the coming months. We believe this new approach will significantly improve our customers’ experience and unlock other opportunities for them.