Backtest metrics
merton.backtest provides standalone implementations of the metrics most
commonly used in credit-risk model validation. Where scikit-learn offers an
equivalent metric, the implementation here matches it to machine precision;
we avoid the scikit-learn dependency because validation suites must be
auditable without optional dependencies.

| Metric | Function | Range | Interpretation |
|---|---|---|---|
| AUC | | | Probability a random defaulter has higher PD than a random non-defaulter. 0.5 is random; 1.0 is perfect. |
| Accuracy Ratio (Gini) | | | |
| Brier | | | Mean squared error between PD and the 0/1 default indicator. |
| KS statistic | | | Max gap between the defaulter and non-defaulter CDFs of PD. |
| Hosmer-Lemeshow χ² | | | Goodness-of-fit χ² statistic. Use |
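
As a rough illustration of what the metrics in the table compute, the sketch below implements AUC, Accuracy Ratio, Brier score and the KS statistic in plain Python with no third-party imports. The function names are hypothetical and chosen for this example; they are not the merton.backtest API. The sketch relies on the standard identity AR = 2·AUC − 1 and counts tied PDs as half a win. The pairwise loop is O(n²) for readability, so it only suits small samples.

```python
from bisect import bisect_right


def _split(pds, defaults):
    """Return sorted PDs of defaulters and of non-defaulters (hypothetical helper)."""
    pos = sorted(p for p, d in zip(pds, defaults) if d == 1)
    neg = sorted(p for p, d in zip(pds, defaults) if d == 0)
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    return pos, neg


def auc(pds, defaults):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count 1/2."""
    pos, neg = _split(pds, defaults)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_ratio(pds, defaults):
    """Accuracy Ratio (Gini) via the identity AR = 2 * AUC - 1."""
    return 2.0 * auc(pds, defaults) - 1.0


def brier(pds, defaults):
    """Mean squared error between PD and the 0/1 default indicator."""
    return sum((p - d) ** 2 for p, d in zip(pds, defaults)) / len(pds)


def ks_statistic(pds, defaults):
    """Largest absolute gap between the two empirical CDFs of PD."""
    pos, neg = _split(pds, defaults)
    gap = 0.0
    for t in sorted(set(pos) | set(neg)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


# Toy data, invented for the example.
pds = [0.1, 0.4, 0.35, 0.8]
defaults = [0, 0, 1, 1]
print(auc(pds, defaults), accuracy_ratio(pds, defaults),
      brier(pds, defaults), ks_statistic(pds, defaults))
```

In the toy data, one of the four defaulter/non-defaulter pairs is mis-ordered (PD 0.35 for a defaulter against 0.4 for a non-defaulter), so the AUC is 0.75 and the Accuracy Ratio is 0.5.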
Calibration curves
```python
from merton.backtest import calibration_curve, calibration_plot

cc = calibration_curve(predictions, defaults, bins=10, strategy="quantile")
calibration_plot(predictions, defaults)
```
A well-calibrated model has cc.fraction_positives ≈ cc.mean_predicted
in every bucket. Persistent over- or under-prediction shows up as the
curve diverging from the 45° reference.
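
One way to picture the bucket comparison is the sketch below, which forms equal-count buckets by sorting on PD and reports the mean predicted PD, the observed default rate and the count for each bucket. It is an illustration under stated assumptions, not the library's calibration_curve: the function names are hypothetical, buckets are cut by rank (so tied PDs may straddle a boundary), and the Hosmer-Lemeshow statistic is the textbook sum over buckets of (O − E)² / (n·π·(1 − π)).

```python
def calibration_buckets(pds, defaults, bins=10):
    """Equal-count buckets by PD rank (hypothetical helper).

    Returns a list of (mean_predicted, fraction_positives, count) tuples.
    """
    pairs = sorted(zip(pds, defaults))
    n = len(pairs)
    buckets = []
    for b in range(bins):
        chunk = pairs[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue  # fewer observations than bins
        mean_pd = sum(p for p, _ in chunk) / len(chunk)
        frac = sum(d for _, d in chunk) / len(chunk)
        buckets.append((mean_pd, frac, len(chunk)))
    return buckets


def hosmer_lemeshow_chi2(buckets):
    """Textbook Hosmer-Lemeshow statistic over precomputed buckets."""
    chi2 = 0.0
    for mean_pd, frac, count in buckets:
        variance = count * mean_pd * (1.0 - mean_pd)
        if variance == 0.0:
            continue  # mean PD of exactly 0 or 1 gives no finite term
        observed = frac * count      # defaults seen in the bucket
        expected = mean_pd * count   # defaults the model predicted
        chi2 += (observed - expected) ** 2 / variance
    return chi2


# Toy data, invented for the example.
buckets = calibration_buckets([0.1, 0.1, 0.3, 0.3], [0, 0, 1, 0], bins=2)
for mean_pd, frac, count in buckets:
    print(f"mean PD {mean_pd:.2f}  observed rate {frac:.2f}  n={count}")
print(hosmer_lemeshow_chi2(buckets))
```

A bucket whose observed rate sits far from its mean PD contributes a large term to the statistic, which is the numerical counterpart of the curve leaving the 45° line.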
Rolling window
```python
from merton.backtest import rolling_window

result = rolling_window(panel_df, window="252D", step="21D",
                        pd_col="pd", default_col="default", date_col="date")
result.to_pandas()  # one row per window with AUC, AR, Brier, KS
```
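
To make the windowing idea concrete, here is a minimal stdlib-only sketch that slides a fixed-length window over dated observations and scores each window with the Brier score. The function name rolling_brier and the row layout are hypothetical. The sketch assumes calendar-day windows, half-open intervals [start, end), and that only windows fitting entirely inside the data are reported; rolling_window may interpret its arguments differently.

```python
from datetime import date, timedelta


def rolling_brier(rows, window_days=252, step_days=21):
    """rows: iterable of (date, pd, default). Returns one dict per window."""
    rows = sorted(rows, key=lambda r: r[0])
    if not rows:
        return []
    first, last = rows[0][0], rows[-1][0]
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)
    results = []
    start = first
    # Keep only windows that fit entirely inside the observed date range.
    while start + window <= last + timedelta(days=1):
        end = start + window
        chunk = [(p, d) for t, p, d in rows if start <= t < end]
        if chunk:
            score = sum((p - d) ** 2 for p, d in chunk) / len(chunk)
            results.append({"start": start, "end": end,
                            "n": len(chunk), "brier": score})
        start += step
    return results


# Toy panel, invented for the example: 300 daily rows, constant PD, no defaults.
panel = [(date(2020, 1, 1) + timedelta(days=i), 0.1, 0) for i in range(300)]
for row in rolling_brier(panel):
    print(row["start"], row["end"], row["n"], round(row["brier"], 4))
```

Scoring each window independently like this makes drift visible: a metric that degrades steadily from one window to the next points to a model that is ageing rather than one that was poorly fitted from the start.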