For Arcum, the answer is a definite no! Accuracy is only one metric for measuring the performance of classification models. In a classification model, an event such as a merchant leaving either happens or does not happen (the dependent variable). Accuracy is the proportion of all predictions the model got right. When modeling merchants at risk of leaving, the question would be how many merchants, leavers and stayers alike, the model classified correctly.

The problem with accuracy as a measure of merchant attrition in a portfolio is that most merchants end up staying. If we imagine a portfolio where 10 percent of merchants leave in a year, then a primitive model predicting that every merchant will stay has an accuracy of 90 percent. Yet such a model is not useful at all, since it never identifies a single merchant at risk of leaving.
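To make the arithmetic concrete, here is a minimal sketch in Python. The portfolio of 1,000 merchants and the variable names are made up for illustration; only the 10 percent attrition rate comes from the example above.

```python
# Hypothetical portfolio: 1,000 merchants, 10 percent of whom leave within the year.
# 1 = merchant left, 0 = merchant stayed.
actual = [1] * 100 + [0] * 900

# The primitive model: always predict that the merchant stays.
predicted = [0] * len(actual)

# Accuracy = correct predictions / all predictions.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How many merchants did the model ever flag as leaving?
leavers_flagged = sum(p == 1 for p in predicted)

print(f"accuracy: {accuracy:.0%}")
print(f"merchants flagged as leaving: {leavers_flagged}")
```

The accuracy looks respectable, yet the model never flags anyone, which is exactly why the number is misleading here.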

**Confusion matrix and other metrics**

To develop more useful metrics, we need to introduce some machine learning jargon (apologies in advance). When the model predicts that a merchant will leave and the merchant does leave, we refer to it as a true positive. When the model predicts that a merchant will leave, but the merchant instead stays, we refer to it as a false positive. Similarly, when the model correctly predicts that a merchant will stay with the processor, we refer to it as a true negative. Finally, when the model predicts that a merchant will stay, but the merchant actually leaves, we refer to it as a false negative. Technically speaking, accuracy is equal to the number of true positives plus the number of true negatives, divided by the total number of observations.
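The four outcomes can be counted with a few lines of Python. This is only a sketch: the function name `confusion_counts` and the ten-merchant sample are hypothetical, chosen to keep the numbers easy to check by hand.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn), where 1 = merchant left and 0 = merchant stayed."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted to leave, and left
        elif p == 1 and a == 0:
            fp += 1  # predicted to leave, but stayed
        elif p == 0 and a == 0:
            tn += 1  # predicted to stay, and stayed
        else:
            fn += 1  # predicted to stay, but left
    return tp, fp, tn, fn


# Made-up sample of ten merchants.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)

# Accuracy = (true positives + true negatives) / all observations.
accuracy = (tp + tn) / (tp + fp + tn + fn)

print(f"TP={tp} FP={fp} TN={tn} FN={fn} accuracy={accuracy:.0%}")
```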

**Recall (aka sensitivity)**

What kind of metrics can help us avoid the pitfalls of the primitive model described above? One useful metric to consider is the recall rate. Recall quantifies the number of correct positive predictions out of all actual positives: true positives divided by the sum of true positives and false negatives. In our case, it is the share of merchants who actually left that the model flagged in advance. The primitive model above, which says that all merchants will stay, has a recall of zero, since it identified none of the merchants that actually left.
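As a rough sketch, recall can be computed directly from the counts. The helper `recall` and the numbers below are illustrative assumptions built on a portfolio in which 100 merchants leave.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); None when no merchant actually left."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return None  # undefined: there was nothing to catch
    return tp / actual_positives


# The primitive always-stay model misses all 100 merchants that left.
primitive_recall = recall(tp=0, fn=100)

# A hypothetical model that catches 60 of the 100 merchants that left.
better_recall = recall(tp=60, fn=40)

print(f"always-stay model recall: {primitive_recall:.0%}")
print(f"hypothetical model recall: {better_recall:.0%}")
```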

**Precision**

Another critical metric worth mentioning is precision. Precision is the number of true positives divided by the total number of positive predictions, that is, true positives plus false positives. In practice, if the precision of the model is 90 percent, then about 90 percent of the merchants it predicts will leave actually do leave. We take personal pride and responsibility in the precision of our algorithms. In fact, the word Arcum is Latin for “bow”, which was the first ultra-precise weapon available to humans.
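Precision can be sketched the same way. The helper `precision` and the counts are hypothetical; note that a model that never predicts a departure has no positive predictions at all, so its precision is undefined rather than zero.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); None when the model made no positive predictions."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return None  # undefined: nothing was flagged as leaving
    return tp / predicted_positives


# Hypothetical model: flags 50 merchants as leaving, and 45 of them actually leave.
flagged_precision = precision(tp=45, fp=5)

# The always-stay model flags nobody, so its precision is undefined.
primitive_precision = precision(tp=0, fp=0)

print(f"hypothetical model precision: {flagged_precision:.0%}")
print(f"always-stay model precision: {primitive_precision}")
```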

**F1 score: AI’s best friend**

A great algorithm has both high recall and high precision. It identifies most of the merchants that leave and makes few mistakes. In practice, however, there is a trade-off between high recall and high precision. For example, a model that predicts that only one merchant will leave, and gets that prediction right, will have 100 percent precision but a recall close to zero. On the other hand, a model that says that all merchants will leave will have 100 percent recall but very low accuracy and precision. One solution to this problem is to look at the F1 Score, which is equal to the product of precision and recall divided by their average (in other words, their harmonic mean).
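The two extreme models can be compared with a short sketch. The portfolio (1,000 merchants, 100 of whom leave), the third "balanced" model, and the helper names are assumptions made for illustration.

```python
def f1_score(precision, recall):
    """F1 = product of precision and recall, divided by their average."""
    average = (precision + recall) / 2
    if average == 0:
        return 0.0  # both metrics are zero, so F1 is taken as zero
    return (precision * recall) / average


def f1_from_counts(tp, fp, fn):
    """Compute precision and recall from counts, then combine them into F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return f1_score(precision, recall)


# Hypothetical portfolio: 1,000 merchants, 100 of whom leave.
# Model 1: flags a single merchant and is right (precision 100%, recall 1%).
f1_one = f1_from_counts(tp=1, fp=0, fn=99)

# Model 2: flags every merchant (precision 10%, recall 100%).
f1_all = f1_from_counts(tp=100, fp=900, fn=0)

# Model 3: flags 100 merchants, 70 of whom leave (precision 70%, recall 70%).
f1_balanced = f1_from_counts(tp=70, fp=30, fn=30)

for name, score in [("one merchant", f1_one), ("everyone", f1_all), ("balanced", f1_balanced)]:
    print(f"{name}: F1 = {score:.3f}")
```

Both extreme models score far below the balanced one, even though each of them is perfect on one of the two metrics.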

**Conclusion**

When building AI models, it is important not to focus on a single metric such as accuracy. Instead, it is important to consider an array of them, including precision and recall. F1 Score forces the model to maintain a balance between precision and recall, disproportionately penalizing the model when either of them gets too low. At some point, we were so excited at Arcum about F1 Score that we even wanted to rename the company and purchased the domain f1score.com. Well, we are still Arcum, but we do own the domain and deliver on F1 Score, precision, and recall for all of our clients. Again, apologies for nerding out; sometimes you just can’t avoid it when talking about AI.

Co-author: Mikhail (Mike) Dmitriev, Ph.D.