merton.backtest.metrics

Backtest metrics for PD models.

All metrics here use the Wilcoxon-Mann-Whitney / rank-based identities so there’s no sklearn dependency. They match sklearn.metrics outputs to machine precision on the binary-classification case.

Functions

auc(→ float)

Area under the ROC curve.

accuracy_ratio(→ float)

Gini coefficient AR = 2 · AUC − 1.

brier(→ float)

Brier score (mean squared error of probabilistic predictions).

ks_statistic(→ float)

Kolmogorov-Smirnov statistic: the maximum gap between the empirical score CDFs of defaulters and non-defaulters.

hosmer_lemeshow(→ tuple[float, float])

Hosmer-Lemeshow goodness-of-fit test; returns (χ², dof).

Module Contents

merton.backtest.metrics.auc(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Area under the ROC curve.

Uses the Wilcoxon-Mann-Whitney identity: AUC equals the probability that a randomly chosen defaulter has a higher predicted PD than a randomly chosen non-defaulter.

\[\mathrm{AUC} = \frac{R_+ - n_+(n_+ + 1)/2}{n_+ \cdot n_-}\]

where R_+ is the sum of ranks of the positive class, n_+ is the number of positives, and n_- the number of negatives.
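The rank-sum formula above can be sketched in pure Python as follows. This is an illustration, not the module's implementation: the name `rank_auc` is hypothetical, and the use of average ranks for tied predictions is an assumption the source does not state.

```python
def rank_auc(predictions, defaults):
    """AUC from the rank-sum identity (hypothetical helper, not merton's code).

    Assumes tied predictions receive their average rank.
    """
    n = len(predictions)
    order = sorted(range(n), key=lambda i: predictions[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of predictions tied with position i.
        while j + 1 < n and predictions[order[j + 1]] == predictions[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for d in defaults if d)
    n_neg = n - n_pos
    r_pos = sum(r for r, d in zip(ranks, defaults) if d)  # R_+
    return (r_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Defaulters score 0.35 and 0.8; non-defaulters score 0.1 and 0.4.
print(rank_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Here the defaulters hold ranks 2 and 4, so R_+ = 6 and AUC = (6 − 3) / (2 · 2) = 0.75.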

merton.backtest.metrics.accuracy_ratio(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Gini coefficient AR = 2 · AUC − 1.
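As a minimal sketch of how the accuracy ratio rescales AUC, the snippet below computes AUC by brute-force pair counting and then maps it to AR. Both helper names (`pairwise_auc`, `gini_from_auc`) are hypothetical and chosen for illustration. The O(n₊ · n₋) pair loop is only suitable for small samples.

```python
def pairwise_auc(predictions, defaults):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count 1/2."""
    pos = [p for p, d in zip(predictions, defaults) if d]
    neg = [p for p, d in zip(predictions, defaults) if not d]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc_value):
    """Map AUC in [0, 1] to AR in [-1, 1]."""
    return 2.0 * auc_value - 1.0


auc_value = pairwise_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc_value, gini_from_auc(auc_value))  # 0.75 0.5
```

Under this mapping a model with no discriminatory power (AUC = 0.5) has AR = 0, and a perfectly ranking model (AUC = 1) has AR = 1.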

merton.backtest.metrics.brier(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Brier score (mean squared error of probabilistic predictions).
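For reference, the mean-squared-error definition can be written out in a few lines. The function name `brier_score` is hypothetical, and the sketch assumes `defaults` holds 0/1 outcomes aligned with `predictions`.

```python
def brier_score(predictions, defaults):
    """Mean of (predicted PD - observed outcome)^2 over all obligors."""
    n = len(predictions)
    return sum((p - d) ** 2 for p, d in zip(predictions, defaults)) / n


# Squared errors: 0.01, 0.01, 0.04, 0.09 -> mean 0.0375
print(brier_score([0.1, 0.9, 0.8, 0.3], [0, 1, 1, 0]))
```

Lower is better: 0 means every prediction equals the realised outcome, while a constant prediction of 0.5 scores 0.25.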

merton.backtest.metrics.ks_statistic(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Kolmogorov-Smirnov statistic: the maximum gap between the empirical score CDFs of defaulters and non-defaulters.
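One way to picture the statistic is to evaluate both empirical CDFs at every observed score and keep the largest absolute difference. The sketch below does exactly that; `ks_gap` is a hypothetical name, and the module itself may compute the same quantity differently.

```python
from bisect import bisect_right


def ks_gap(predictions, defaults):
    """Largest |F_defaulters(t) - F_non_defaulters(t)| over observed scores t."""
    pos = sorted(p for p, d in zip(predictions, defaults) if d)
    neg = sorted(p for p, d in zip(predictions, defaults) if not d)
    best = 0.0
    for t in sorted(set(predictions)):
        # Empirical CDF value = share of the class with score <= t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


print(ks_gap([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.5
```

The empirical CDFs only change at observed scores, so checking those points is enough to find the maximum.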

merton.backtest.metrics.hosmer_lemeshow(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike, *, bins: int = 10) → tuple[float, float]

Hosmer-Lemeshow goodness-of-fit test; returns (χ², dof).

Returns (chi_squared, degrees_of_freedom). P-values come from scipy.stats.chi2.sf(chi2, dof).
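A rough sketch of a Hosmer-Lemeshow computation follows. It rests on several assumptions the source does not confirm: obligors are sorted by predicted PD and split into `bins` groups of near-equal size, each group contributes (observed − expected)² / (n · p̄ · (1 − p̄)), and the degrees of freedom are `bins − 2`. The function name `hosmer_lemeshow_sketch` is hypothetical.

```python
def hosmer_lemeshow_sketch(predictions, defaults, bins=10):
    """Illustrative HL statistic; binning and dof conventions are assumptions."""
    # Stable sort by predicted PD only, so ties keep their input order.
    pairs = sorted(zip(predictions, defaults), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    for b in range(bins):
        group = pairs[b * n // bins:(b + 1) * n // bins]
        if not group:
            continue  # more bins than observations
        size = len(group)
        expected = sum(p for p, _ in group)  # expected number of defaults
        observed = sum(d for _, d in group)  # realised number of defaults
        mean_pd = expected / size
        variance = size * mean_pd * (1.0 - mean_pd)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2, float(bins - 2)


predictions = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
defaults = [1, 0, 0, 0, 0,  1, 1, 0, 0, 0,  1, 1, 1, 0, 0,  1, 1, 1, 1, 1]
chi2, dof = hosmer_lemeshow_sketch(predictions, defaults, bins=4)
print(chi2, dof)  # approximately 1.25 and 2.0
```

Only the last group deviates from its expectation (5 observed against 4 expected), which gives 1² / 0.8 = 1.25. With SciPy installed, the p-value would follow from `scipy.stats.chi2.sf(chi2, dof)`, as noted above.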