One of the most common interview questions for Machine Learning Engineers and Data Scientists goes something like this:
Describe the difference between Accuracy, Precision, and Recall.
(see Classification Metrics for a detailed answer)
Accuracy measures what percent of all predictions your model gets right. Precision measures, out of all the cases where you predicted a positive outcome, what percent actually were positive. Recall measures, out of all the actual positive cases in your data, what percent you correctly predicted as positive. They all measure the efficacy of a model, but in different ways.
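To make those three definitions concrete, here is a minimal sketch in plain Python. The function name `classification_metrics` and the toy labels are made up for illustration, and it assumes binary labels where 1 means "positive." When a denominator is zero it returns `None` rather than picking a value.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, was positive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, was negative
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, was positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted negative, was negative

    accuracy = (tp + tn) / len(pairs)
    # Precision: of everything we called positive, how much really was?
    precision = tp / (tp + fp) if (tp + fp) else None
    # Recall: of everything that really was positive, how much did we catch?
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall


# Toy data, invented for this example: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

On this toy data the model gets 6 of 8 predictions right (accuracy 0.75), while 2 of its 3 positive predictions are correct and it catches 2 of the 3 actual positives, so precision and recall both come out to 2/3.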
The classic example is that if you’re predicting a rare cancer that is present in only 1% of patients, you can have an accuracy of 99% by simply always predicting “No Cancer” - but you would have a Recall of 0%, and your Precision would be undefined (you never predicted a positive case, so there is nothing to score), which is usually reported as 0%. If you choose Accuracy as your primary metric you appear to have done quite well, but your model is completely ineffective.
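The arithmetic behind the cancer example can be checked in a few lines of Python. This is only a sketch: the cohort of 1,000 patients is an assumed number chosen to make the 1% prevalence easy to count, and the variable names are illustrative.

```python
# Assumed cohort: 1,000 patients, 1% of whom (10) actually have the cancer.
n_patients = 1000
n_cancer = 10

y_true = [1] * n_cancer + [0] * (n_patients - n_cancer)
y_pred = [0] * n_patients  # the "model" always predicts "No Cancer"

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / n_patients                   # 990 / 1000
recall = tp / (tp + fn)                             # 0 / 10
# With no positive predictions, precision is 0 / 0; we report None here.
precision = tp / (tp + fp) if (tp + fp) else None

print(f"accuracy={accuracy:.2%} recall={recall:.2%} precision={precision}")
```

Every one of the 10 cancer patients is missed, yet the accuracy figure alone gives no hint of that.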
When we begin developing a model for a specific task, one of the first things we must do is choose a set of evaluation metrics we will use to measure success. We choose these metrics before we start developing the model, because choosing metrics helps us align on the goals of the model. Choosing metrics is an intention-setting exercise, and it guides what steps we take next.
When we make changes to the model we measure whether or not the metrics improved to ensure we are on the right path. We explore techniques and parameter tweaks that will have the biggest impact on these metrics. We compare multiple different models using these metrics to see which is “best.” Do you see what I’m getting at here?
If you choose the wrong metrics, you end up with a model that is optimized for the wrong goal - and we’re not just talking about machine learning anymore.
Are you healthy? Are you measuring your mile time, body-fat percentage, ApoB, or daily step count?
Are you successful? Are you measuring your net worth, yearly salary, savings rate, or the hours per week you spend working for someone else?
Are you happy? Are you measuring the number of attendees at your annual birthday party, the duration of your longest romantic partnership, the average number of international trips you take per year, your cortisol levels, or your meditation streak?
The metrics you choose will guide the choices you make. They will influence how you compare yourself to others. They will determine your sense of “enough.”
No one metric is better than another - optimizing for each just leads to different outcomes. So which outcome do you want to optimize for?