AI for Business: Understanding AI, Precision, and Recall

Michael Watson
VP of Artificial Intelligence, Coupa

Mike Watson heads the AI Center of Excellence (COE) at Coupa. He has a PhD in Industrial Engineering and 20+ years of experience leading global supply chain business teams. Mike co-founded Opex Analytics and grew the company to success in the AI market before it was integrated with LLamasoft and, later, Coupa. He is also an adjunct professor in two master's programs at Northwestern University and the co-author of two books (Managerial Analytics and Supply Chain Network Design).

Read time: 6 mins

The popular media and some celebrity commentators give the impression that artificial intelligence (AI) is perfect or that the goal is superintelligence.

However, for practical AI applied to business problems, we can be very far from superintelligence and perfection, and still get tremendous value.

To understand how well our AI algorithms predict (think of predicting whether something is true or false), there are two important measures to keep in mind: precision and recall.

According to Wikipedia:

  • Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances.
  • Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved.

It is much easier to understand this definition with a practical example. And, we’ll use this example to highlight the important concepts for a business user or leader.
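Before walking through the example, the two definitions above can be written as a short sketch. This is a minimal illustration in Python, not code from the article; it assumes we can count true positives (correct predictions), false positives (wrong predictions), and false negatives (actual duplicates the algorithm missed).

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted duplicates that really are duplicates."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual duplicates that the algorithm found."""
    return true_positives / (true_positives + false_negatives)


# Example: 4 correct predictions, 1 wrong prediction, 6 missed duplicates
print(precision(4, 1))  # 0.8
print(recall(4, 6))     # 0.4
```

Note that neither formula mentions the other's denominator, which is why one number alone can look deceptively good.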

Finding duplicate invoices with AI

Assume we want to find duplicate invoices. That is, we want the AI algorithms to predict when an invoice might be a duplicate and we are at risk for paying it twice.

Let’s say we have 1,000 invoices and we know that there are exactly 10 duplicate invoices among them. (In new data sets, we won’t know the true number of duplicates, but we train and test our algorithms on data sets for which we’ve already determined the true number of duplicates.)

Figure 1
This represents the 1,000 invoices (not all are shown) and the 10 actual duplicates in yellow. In reality, these wouldn’t necessarily be next to each other.

Our AI algorithm comes back with five invoices that it thinks are duplicates, but really only four of them are duplicates. In this case our precision is 80% (four out of five were correct). This sounds good. However, this is not the full story, because we know there were 10 duplicates. This is where recall comes in. The recall in this case is 40% (we found only four of the 10 duplicates). Now the 80% precision is balanced by the 40% recall. Maybe we don’t like the 40% recall.

Figure 2
Our algorithm predicted five duplicate invoices. Four were correct; one was not. This is a precision of 80%, but a recall of 40%.

So, we go back and adjust the AI algorithm. This time the algorithm predicts 18 duplicates, of which nine are actually duplicates. Our precision drops to 50% (only nine of the 18 were true duplicates), but our recall goes up to 90% (we found nine of the 10 duplicates).

Figure 3
This algorithm predicted 18 duplicate invoices. Nine were correct; nine were not. This is a precision of 50%, but a recall of 90%.
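The two tunings can be put side by side with a few lines of arithmetic. This is an illustrative sketch (the labels "first tuning" and "second tuning" are mine, not the article's); both tunings are measured against the same 10 actual duplicates.

```python
ACTUAL_DUPLICATES = 10  # known number of true duplicates in the data set

tunings = [
    ("first tuning", 5, 4),    # predicted 5 duplicates, 4 were correct
    ("second tuning", 18, 9),  # predicted 18 duplicates, 9 were correct
]

for label, predicted, correct in tunings:
    precision = correct / predicted
    recall = correct / ACTUAL_DUPLICATES
    print(f"{label}: precision {precision:.0%}, recall {recall:.0%}")

# first tuning: precision 80%, recall 40%
# second tuning: precision 50%, recall 90%
```

Casting a wider net (predicting more duplicates) raised recall but diluted precision, which is exactly the tradeoff the next paragraph describes.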

This tradeoff always exists. You can improve both measures with new approaches or new data, but for any given algorithm you will still have to choose between the best precision (at the cost of recall), the best recall (at the cost of precision), or some mix of the two.

From this example, you can get two important insights:

  1. You need both measures when understanding how accurate an AI algorithm is. There is a tradeoff and no one measure will capture how accurate the algorithm is.
  2. Emphasizing precision or recall is dependent on the problem you are solving.

Understanding the tradeoff between precision and recall

At Coupa, for the part of our Spend Guard product that looks for duplicate invoices, we think that recall is more important than precision. That is, since we’ve seen duplicate invoices that were as high as $1.5 million, we want to surface as many of those as possible (we want a high recall). We also think that it is relatively easy to deal with a lower precision. That is, we are guessing that people will be happy to review quite a few invoices that aren’t duplicates in order to find a duplicate that is worth $500,000.

In other cases at Coupa, we think precision is much more important. If predicting the failure of expensive machines in the supply chain means an expensive preemptive repair, we want to make sure that these repairs are done only on machines sure to fail. That is, we don’t want to work on machines that aren’t going to fail.

And, as you think about this tradeoff, you can see why you hear about failed AI efforts in healthcare. In healthcare, the AI algorithms need to perform well on both precision (you don’t want to tell many people they have a serious condition when they don’t) and recall (you don’t want people with a serious condition to go undetected). However, they can’t avoid the tradeoff between the two measures. So, even if the algorithms are great on one measure, it won’t be good enough. To succeed in healthcare, a lot more work is needed to create the right techniques and gather the right data to be good enough on both measures.

It is important to understand both precision and recall. And, when you apply this to practical business situations, you should understand where you want to be on the precision vs. recall tradeoff.