Klarna Engineering

Disrupting the financial sector starts and ends with products that work, are easy to use and stable day after day. The Engineering competence is pivotal in creating, maintaining and developing the Klarna experience.

Stop Misusing ROC Curve and GINI: Navigate Imbalanced Datasets with Confidence

Angel Igareta
Published in Klarna Engineering
9 min read · Nov 9, 2023

Understanding Model Predictions and Metrics

ROC Curve

Figure: example Receiver Operating Characteristic (ROC) curves; the area under each curve is its ROC_AUC score. The True Positive Rate (TPR) is plotted on the y-axis against the False Positive Rate (FPR) on the x-axis, with one curve per classifier. The ‘random classifier’ curve is highlighted as the baseline. The better a model performs, the closer its curve sits to the top left corner.
ROC Curve Illustration: Comparing Classifier Performances from Best to Worst
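
To make the construction of such a curve concrete, here is a minimal sketch in plain Python. It assumes binary labels (1 = positive) and model scores where higher means "more likely positive"; the function names and the four-sample toy data are illustrative and not taken from the article.

```python
def roc_points(y_true, scores):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Indices ordered by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under a piecewise-linear curve given by points (x[k], y[k])."""
    return sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2 for k in range(1, len(x)))


# Hypothetical toy data: two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
roc_auc = trapezoid_area(fpr, tpr)
print(f"ROC_AUC = {roc_auc:.2f}")
```

Note that both axes are rates: TPR is normalized by the number of actual positives and FPR by the number of actual negatives, so the class ratio itself never appears in the curve.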

GINI Coefficient
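
In credit scoring, the GINI coefficient is commonly derived from ROC_AUC by the linear rescaling GINI = 2 × ROC_AUC − 1. The sketch below assumes that convention; the helper name is illustrative.

```python
def gini_from_auc(roc_auc):
    """Rescale ROC_AUC from [0.5, 1.0] to [0.0, 1.0] (assumed convention: 2 * AUC - 1)."""
    return 2.0 * roc_auc - 1.0


# A random classifier (AUC 0.5) maps to 0, a perfect one (AUC 1.0) maps to 1.
for auc in (0.5, 0.75, 1.0):
    print(f"ROC_AUC = {auc:.2f} -> GINI = {gini_from_auc(auc):.2f}")
```

Because the mapping is linear, GINI ranks models exactly as ROC_AUC does and shares its behaviour on imbalanced data.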

Precision Recall Curve

Figure: example Precision Recall (PR) curves; the area under each curve is its PR_AUC score. Precision is plotted on the y-axis against Recall (True Positive Rate) on the x-axis, with one curve per classifier. The ‘random classifier’ curve is highlighted as the baseline. The better a model performs, the closer its curve sits to the top right corner.
PR Curve Illustration with a target incidence of 0.1: Comparing Classifier Performances from Best to Worst.
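
As a rough sketch of how such a curve can be computed, the plain-Python example below collects precision and recall at each threshold and summarizes them with average precision, a step-wise sum that is one common way to approximate the area under a PR curve. The article may compute PR_AUC differently; the function names and toy data are illustrative.

```python
def pr_points(y_true, scores):
    """Sweep the threshold from high to low and collect (precision, recall) pairs."""
    n_pos = sum(y_true)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precision, recall = [], []
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            precision.append(tp / (tp + fp))
            recall.append(tp / n_pos)
    return precision, recall


def average_precision(precision, recall):
    """Step-wise sum of precision weighted by the increase in recall."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical toy data: two negatives and two positives.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

precision, recall = pr_points(y_true, scores)
ap = average_precision(precision, recall)
incidence = sum(y_true) / len(y_true)
print(f"average precision = {ap:.3f}, incidence (random baseline) = {incidence:.2f}")
```

At the lowest threshold every sample is predicted positive, so precision equals the target incidence. That is why the random-classifier baseline on a PR plot sits at the incidence (0.1 in the figure) rather than on a diagonal.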

Because neither precision nor recall depends on the number of True Negatives, the PR_AUC score reflects how well the model handles the positive class, which makes it especially informative on imbalanced datasets.

Difference Between ROC_AUC and PR_AUC

A high number of TNs can result in a misleadingly low FPR, even with many False Positives, thereby inflating the ROC_AUC score and painting an overly optimistic picture of the model’s performance.
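
The effect follows from the definitions FPR = FP / (FP + TN) and Precision = TP / (TP + FP). The sketch below uses hypothetical counts (not from the article) to show FPR shrinking as True Negatives grow while precision stays fixed.

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were wrongly flagged as positive."""
    return fp / (fp + tn)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


# Hypothetical counts: the model's positive predictions never change,
# only the number of easy negatives in the dataset grows.
tp, fp = 50, 50
for tn in (950, 9_950, 99_950):
    print(
        f"TN = {tn:>6}: FPR = {false_positive_rate(fp, tn):.4f}, "
        f"precision = {precision(tp, fp):.2f}"
    )
```

Half of the positive predictions are wrong in every row, yet FPR falls by a factor of ten each time the pool of negatives grows tenfold.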

Practical Illustration: The Tale of Two Models

Model A

| Confusion Matrix      | Predicted Non-Defaulters | Predicted Defaulters |
|-----------------------|:------------------------:|:--------------------:|
| Actual Non-Defaulters | 9600 (TN)                | 200 (FP)             |
| Actual Defaulters     | 100 (FN)                 | 100 (TP)             |

Model B

| Confusion Matrix      | Predicted Non-Defaulters | Predicted Defaulters |
|-----------------------|:------------------------:|:--------------------:|
| Actual Non-Defaulters | 9700 (TN)                | 100 (FP)             |
| Actual Defaulters     | 100 (FN)                 | 100 (TP)             |

Comparison

Despite Model A incorrectly predicting 100 more customers as defaulters than Model B (200 False Positives versus 100), the identical ROC_AUC scores would suggest equivalent performance.
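
The per-threshold rates behind this comparison can be checked directly from the two confusion matrices. The sketch below assumes the first matrix (200 FP) is Model A and the second (100 FP) is Model B, which is how the comparison reads; the `ConfusionMatrix` class is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tn: int
    fp: int
    fn: int
    tp: int

    @property
    def tpr(self):
        """Recall: share of actual defaulters that were caught."""
        return self.tp / (self.tp + self.fn)

    @property
    def fpr(self):
        """Share of actual non-defaulters wrongly flagged as defaulters."""
        return self.fp / (self.fp + self.tn)

    @property
    def precision(self):
        """Share of predicted defaulters that actually default."""
        return self.tp / (self.tp + self.fp)


# Counts copied from the two confusion matrices above.
model_a = ConfusionMatrix(tn=9600, fp=200, fn=100, tp=100)
model_b = ConfusionMatrix(tn=9700, fp=100, fn=100, tp=100)

for name, cm in (("Model A", model_a), ("Model B", model_b)):
    print(f"{name}: TPR = {cm.tpr:.3f}, FPR = {cm.fpr:.4f}, precision = {cm.precision:.3f}")
```

Both models have a TPR of 0.5 and their FPRs differ by only about one percentage point (roughly 0.020 versus 0.010), so they land almost on the same spot in ROC space. Precision, however, drops from 0.50 for Model B to about 0.33 for Model A, which is the difference a precision-based metric picks up.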

Conclusions

Written by Angel Igareta

Passionate about digital innovation. My goal is to use the data being collected in different domains to create new solutions that have a real impact.
