merton.backtest.metrics

Backtest metrics for probability-of-default (PD) models.

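The metrics are computed directly, with no scikit-learn dependency: AUC (and therefore the accuracy ratio) uses the Wilcoxon-Mann-Whitney rank identity instead of integrating an ROC curve. Where sklearn.metrics has an equivalent (for example roc_auc_score and brier_score_loss), the outputs match it to machine precision in the binary-classification case.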

Functions

`auc`(→ float)	Area under the ROC curve.
`accuracy_ratio`(→ float)	Gini coefficient `AR = 2 · AUC − 1`.
`brier`(→ float)	Brier score (mean squared error of probabilistic predictions).
`ks_statistic`(→ float)	Kolmogorov-Smirnov statistic: maximum gap between the empirical CDFs of predicted PDs for defaulters and non-defaulters.
`hosmer_lemeshow`(→ tuple[float, float])	Hosmer-Lemeshow goodness-of-fit test; returns (chi², dof).

Module Contents

merton.backtest.metrics.auc(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Area under the ROC curve.

Uses the Wilcoxon-Mann-Whitney identity: AUC equals the probability that a randomly chosen defaulter has a higher predicted PD than a randomly chosen non-defaulter, with ties counted as one half.

\[\mathrm{AUC} = \frac{R_+ - n_+(n_+ + 1)/2}{n_+ \cdot n_-}\]

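where R_+ is the sum of the ranks of the defaulters (positives) when all predictions are ranked together in ascending order, with tied predictions given their average rank; n_+ is the number of defaulters and n_- the number of non-defaulters.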
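The rank identity translates almost line for line into NumPy. The sketch below is an illustration, not the library's implementation: auc_rank is a hypothetical name, and it assumes defaults is a 0/1 indicator with at least one defaulter and one non-defaulter. scipy.stats.rankdata with method="average" supplies the midranks that make ties count one half.

```python
import numpy as np
from scipy.stats import rankdata


def auc_rank(predictions, defaults):
    """AUC via the Wilcoxon-Mann-Whitney rank identity (hypothetical helper)."""
    p = np.asarray(predictions, dtype=float)
    y = np.asarray(defaults, dtype=bool)
    n_pos = int(y.sum())
    n_neg = y.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one defaulter and one non-defaulter")
    # Rank the pooled predictions in ascending order; ties share their average rank.
    ranks = rankdata(p, method="average")
    # R_+: sum of the ranks held by the defaulters.
    r_pos = ranks[y].sum()
    return float((r_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


print(auc_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In the example the pooled ranks are 1, 3, 2, 4, so R_+ = 2 + 4 = 6 and AUC = (6 − 3) / (2 · 2) = 0.75. Ranking costs O(n log n), versus O(n_+ · n_-) for comparing every pair.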

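merton.backtest.metrics.accuracy_ratio(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Accuracy ratio (Gini coefficient), AR = 2 · AUC − 1. It rescales AUC to [−1, 1]: an uninformative ranking scores 0 and a perfect ranking scores 1.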
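As a cross-check on the rank formula, AUC can also be counted directly over every (defaulter, non-defaulter) pair and then mapped to AR. The sketch below is a hypothetical helper, not the library's accuracy_ratio; it assumes 0/1 defaults and uses O(n_+ · n_-) memory, so it is only practical for small samples.

```python
import numpy as np


def accuracy_ratio_pairwise(predictions, defaults):
    """AR = 2 * AUC - 1, with AUC counted over all defaulter/non-defaulter pairs."""
    p = np.asarray(predictions, dtype=float)
    y = np.asarray(defaults, dtype=bool)
    pos, neg = p[y], p[~y]
    # Each pair scores 1 if the defaulter has the higher PD, 1/2 on a tie, 0 otherwise.
    wins = (pos[:, None] > neg[None, :]).astype(float)
    ties = (pos[:, None] == neg[None, :]).astype(float)
    auc = (wins + 0.5 * ties).mean()
    return float(2.0 * auc - 1.0)


print(accuracy_ratio_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.5
```

Three of the four pairs are ordered correctly, so AUC = 0.75 and AR = 0.5.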

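merton.backtest.metrics.brier(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Brier score: the mean squared error between the predicted PDs p_i and the observed default indicators y_i ∈ {0, 1},

\[\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - y_i)^2.\]

Lower is better; 0 requires every prediction to be exactly 0 or 1 and correct.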
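The formula is a one-liner in NumPy. This is a sketch under the assumption that defaults holds 0/1 outcomes; brier_score is a hypothetical name, not the library function.

```python
import numpy as np


def brier_score(predictions, defaults):
    """Mean squared difference between predicted PDs and 0/1 default outcomes."""
    p = np.asarray(predictions, dtype=float)
    y = np.asarray(defaults, dtype=float)
    return float(np.mean((p - y) ** 2))


print(brier_score([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # about 0.158125
```

A useful reference point: a constant forecast equal to the observed default rate π scores π(1 − π), so a model that only knows the base rate gets 0.1875 on a sample with a 25% default rate.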

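merton.backtest.metrics.ks_statistic(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike) → float

Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of the predicted PDs for defaulters, F_+, and for non-defaulters, F_-,

\[\mathrm{KS} = \max_s \lvert F_+(s) - F_-(s) \rvert.\]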
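Because both empirical CDFs are step functions that only jump at observed predictions, the maximum gap can be found by evaluating them at the pooled prediction values. The sketch below assumes a two-sided (absolute) gap and 0/1 defaults; ks_stat is a hypothetical name, and the library may differ in details such as tie handling.

```python
import numpy as np


def ks_stat(predictions, defaults):
    """max |F_+(s) - F_-(s)|, where F_+ and F_- are ECDFs of PDs for defaulters and non-defaulters."""
    p = np.asarray(predictions, dtype=float)
    y = np.asarray(defaults, dtype=bool)
    pos, neg = np.sort(p[y]), np.sort(p[~y])
    # The ECDFs are right-continuous and only jump at observed values,
    # so the maximum gap is attained at one of the pooled predictions.
    thresholds = np.unique(p)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


print(ks_stat([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.5
```

At s = 0.1, half of the non-defaulters but none of the defaulters lie at or below the threshold, which gives the maximum gap of 0.5.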

merton.backtest.metrics.hosmer_lemeshow(predictions: merton._typing.ArrayLike, defaults: merton._typing.ArrayLike, *, bins: int = 10) → tuple[float, float]

Hosmer-Lemeshow goodness-of-fit test for calibration, comparing observed and expected default counts across bins groups.

Returns (chi_squared, degrees_of_freedom). The p-value is not returned; compute it with scipy.stats.chi2.sf(chi_squared, degrees_of_freedom).
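The textbook form of the test sorts observations by predicted PD, splits them into g groups of roughly equal size, and sums (O_g − E_g)² / (n_g · p̄_g · (1 − p̄_g)), where O_g is the number of defaults, E_g the sum of predicted PDs, and p̄_g = E_g / n_g. The sketch below follows that form; the grouping rule and the bins − 2 degrees of freedom are assumptions (bins − 2 is the usual in-sample convention, and some authors use bins for out-of-sample validation), and hosmer_lemeshow_sketch is a hypothetical name, not the library function.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow_sketch(predictions, defaults, bins=10):
    """Textbook Hosmer-Lemeshow statistic over equal-count groups (illustrative only)."""
    p = np.asarray(predictions, dtype=float)
    y = np.asarray(defaults, dtype=float)
    if p.size < bins:
        raise ValueError("need at least `bins` observations")
    # Assumption: sort by predicted PD and cut into `bins` groups of near-equal size.
    order = np.argsort(p, kind="stable")
    stat = 0.0
    for idx in np.array_split(order, bins):
        n_g = idx.size
        observed = y[idx].sum()  # O_g: realised defaults in the group
        expected = p[idx].sum()  # E_g: sum of predicted PDs in the group
        mean_pd = expected / n_g
        # Binomial variance n_g * pbar * (1 - pbar); undefined if pbar is 0 or 1.
        stat += (observed - expected) ** 2 / (n_g * mean_pd * (1.0 - mean_pd))
    # Assumption: the in-sample convention of bins - 2 degrees of freedom.
    dof = bins - 2
    return float(stat), float(dof)


# Synthetic PDs, with outcomes drawn from those same PDs (calibrated by construction).
rng = np.random.default_rng(42)
pd_hat = rng.uniform(0.01, 0.2, size=5_000)
outcomes = (rng.random(5_000) < pd_hat).astype(int)
stat, dof = hosmer_lemeshow_sketch(pd_hat, outcomes)
p_value = chi2.sf(stat, dof)
print(stat, dof, p_value)
```

The p-value comes from the chi-squared survival function, as in the text above; a small value indicates that observed and predicted default counts disagree by more than sampling noise would explain.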