Key Takeaway
Credit risk modeling has traveled from the elegant option-theoretic framework of Merton (1974) through the tractable reduced-form hazard models of the 1990s to modern machine learning pipelines that ingest hundreds of features. Each generation offers genuine improvements but also trades away something valuable: structural models sacrifice empirical fit for economic intuition; reduced-form models gain tractability but lose the firm's balance sheet as an anchor; machine learning gains predictive accuracy but surrenders interpretability and, often, regulatory acceptance. Practitioners rarely choose one paradigm; they combine them, using each where it does the least damage.
The Equity Market Just Repriced Credit Risk
Credit spreads on investment-grade and high-yield corporate debt have widened sharply across 2025 and into 2026 as geopolitical uncertainty has compressed global risk appetite. The move has drawn renewed attention to a question that asset managers, bank risk desks, and bond investors face continuously: how do you estimate the probability that a counterparty defaults, and how much compensation do you need for bearing that risk?
The answer depends heavily on which modeling framework you choose, and each framework sits in a research lineage stretching back more than half a century that determines what it can and cannot see.
Merton (1974): Equity as a Call Option on Firm Assets
The foundational insight of Merton (1974) is deceptively simple. A firm's equity is economically equivalent to a European call option on the firm's assets, with the face value of debt as the strike price. If the firm's asset value exceeds its debt at maturity, shareholders receive the residual. If assets fall below debt, shareholders receive nothing and bondholders absorb the loss.
This framing transforms the default problem into an options pricing problem. Given observable equity prices and volatility, Merton showed that the firm's asset value and asset volatility can be inferred by inverting the Black-Scholes formula. Default occurs when the asset value process, modeled as a geometric Brownian motion, crosses below the debt face value at the maturity date.
The distance-to-default (DD) summarizes this in one intuitive metric:
DD = (V - F) / (V x sigma_V)
where V is the estimated asset value, F is the default boundary (typically the face value of debt), and sigma_V is the asset volatility. A firm with a DD of 5 needs a five-standard-deviation adverse move in its assets to default. A firm with a DD of 1 is already close to the cliff.
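The inversion described above can be sketched in a few lines of Python. This is an illustrative implementation, not KMV's proprietary one, and the firm inputs (equity worth 40, 45 percent equity volatility, debt face value 80) are hypothetical round numbers:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def merton_invert(E, sigma_E, F, r, T):
    """Back out asset value V and asset volatility sigma_V from
    observable equity value E and equity volatility sigma_E (Merton, 1974)."""
    def equations(x):
        V, sV = x
        d1 = (np.log(V / F) + (r + 0.5 * sV**2) * T) / (sV * np.sqrt(T))
        d2 = d1 - sV * np.sqrt(T)
        # equity = Black-Scholes call on firm assets, strike = debt face value
        eq1 = V * norm.cdf(d1) - F * np.exp(-r * T) * norm.cdf(d2) - E
        # Ito linkage between equity volatility and asset volatility
        eq2 = norm.cdf(d1) * sV * V - sigma_E * E
        return [eq1, eq2]
    return fsolve(equations, x0=[E + F, sigma_E * E / (E + F)])

# hypothetical firm
E, sigma_E, F, r, T = 40.0, 0.45, 80.0, 0.03, 1.0
V, sigma_V = merton_invert(E, sigma_E, F, r, T)
dd = (V - F) / (V * sigma_V)   # simplified distance-to-default from the text
```

With these inputs the inferred asset value lands near the sum of equity value and discounted debt, and the firm sits roughly two asset-standard-deviations from the default boundary.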
KMV Corporation (subsequently acquired by Moody's) commercialized this insight in the late 1980s and 1990s. The KMV model estimates expected default frequencies (EDFs) by mapping distance-to-default values to empirical default rates across a large historical database. The core formula is preserved but the mapping from DD to EDF is empirical rather than theoretical.
The Empirical Shortcomings of Structural Models
For all its elegance, the Merton framework has a persistent empirical problem. Eom, Helwege, and Huang (2004) systematically evaluated five structural credit models, including Merton (1974) and the extensions of Longstaff and Schwartz (1995) and Leland and Toft (1996), against observed corporate bond yield spreads.
Their central finding is that structural models systematically misprice corporate bonds. The original Merton model predicts spreads that are too low for most bonds, often by a large margin. The more elaborate structural models solve part of the underprediction problem but introduce a new one: they overpredict spreads for risky firms. No single structural model produces well-calibrated spread predictions across the full rating spectrum.
Three structural problems underlie this empirical failure. First, the model assumes that default can only occur at debt maturity; in practice, firms can enter financial distress at any time. Second, geometric Brownian motion is a poor description of firm asset dynamics; jumps, mean reversion, and stochastic volatility all matter. Third, the model assumes a single zero-coupon debt issue and ignores the complex capital structures, covenants, and strategic default incentives that real firms face.
These are not minor calibration issues. They reflect a fundamental tension in structural models between theoretical tractability and empirical fidelity.
Reduced-Form Models: Intensity and Hazard Rates
The reduced-form (or intensity-based) approach, developed independently by Jarrow and Turnbull (1995) and extended by Duffie and Singleton (1999), abandons the structural link to firm assets entirely. Instead, default is modeled as the first arrival of a Poisson process with a stochastic intensity parameter, often denoted lambda.
The hazard rate (or default intensity) lambda(t) is the instantaneous conditional probability of default given survival to time t. If lambda(t) follows a known process, then the probability of surviving to time T given survival to time t is:
P(survival to T) = E[exp(-integral from t to T of lambda(s) ds)]
This formulation is mathematically analogous to the pricing of zero-coupon bonds in a short-rate interest rate model. In fact, Duffie and Singleton (1999) show that a defaultable bond can be priced exactly like a risk-free bond with a modified discount rate that incorporates the default intensity and the loss given default. This produces tractable closed-form solutions under affine specifications of the hazard process.
The practical advantages over structural models are significant. First, reduced-form models can be calibrated directly to observable credit spreads using straightforward yield-curve stripping techniques, without the need to infer unobservable firm asset values. Second, they handle complex term structures of default probability naturally. Third, they can be extended to accommodate correlated defaults and credit derivatives pricing within the same mathematical framework.
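As a sketch of how direct this calibration is, the snippet below strips a flat hazard from a single observed spread via the "credit triangle" approximation (spread is roughly lambda times one minus recovery) and prices a defaultable zero-coupon bond with the Duffie-Singleton adjusted discount rate. The 200 bp spread and 40 percent recovery are illustrative assumptions:

```python
import numpy as np

# illustrative inputs: 200 bp credit spread, 40% recovery, 3% risk-free rate
spread, R, r, T = 0.02, 0.40, 0.03, 5.0

# "credit triangle": spread ~ lambda * (1 - R), so a flat intensity
# can be stripped directly from one observed spread
lam = spread / (1.0 - R)

def survival(t):
    # P(tau > t) = exp(-lam * t) under a constant intensity
    return np.exp(-lam * t)

def defaultable_zcb(T):
    # Duffie-Singleton: discount a defaultable zero at r + lam * (1 - R)
    return np.exp(-(r + lam * (1.0 - R)) * T)

riskfree = np.exp(-r * T)
risky = defaultable_zcb(T)
```

In practice the flat hazard is replaced with a piecewise-constant or affine process bootstrapped maturity by maturity from the spread curve, but the discounting logic is unchanged.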
The tradeoff is loss of economic content. The hazard rate lambda(t) is a statistical object that describes when defaults happen; it says nothing about why they happen or what firm-level variables drive them. For risk monitoring purposes, where the practitioner wants to understand the sources of credit risk and diagnose deterioration early, the reduced-form approach offers less traction than the structural alternative.
Altman's Z-Score: The Proto-ML Classifier
Before modern machine learning, there was the Z-score. Altman (1968) used multiple discriminant analysis to construct a linear function of five financial ratios that separates bankrupt from non-bankrupt firms:
Z = 1.2 X1 + 1.4 X2 + 3.3 X3 + 0.6 X4 + 1.0 X5
where X1 is working capital / total assets, X2 is retained earnings / total assets, X3 is EBIT / total assets, X4 is market value of equity / book value of total liabilities, and X5 is sales / total assets.
Firms with Z above 2.99 are classified as safe; firms below 1.81 are classified as distress-zone. The grey zone in between is ambiguous. Altman's original sample achieved a classification accuracy of approximately 95 percent one year before bankruptcy.
Viewed from a modern machine learning perspective, the Z-score is a linear classifier trained on a small labeled dataset using discriminant analysis. Its feature set is sensible: it captures liquidity (X1), profitability (X2, X3), leverage (X4), and asset efficiency (X5). Its limitations are equally clear: it is linear, uses only five features, requires recalibration across time periods and industries, and was designed for manufacturing firms in a different macroeconomic era.
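The classifier fits in a few lines; the ratios below are made-up illustrations, not real firms:

```python
def altman_z(x1, x2, x3, x4, x5):
    """Altman (1968) Z-score from the five ratios defined in the text."""
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def zone(z):
    # classification zones from Altman's original cutoffs
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"

# made-up ratios for a stressed firm: thin liquidity, weak earnings
z = altman_z(0.10, 0.05, 0.02, 0.30, 0.80)
```

The entire model is auditable at a glance, which is precisely the property the later methods in this article give up.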
The Z-score remains widely cited and used as a benchmark, not because it is state-of-the-art, but because its interpretability makes it useful for regulatory filings, covenant monitoring, and portfolio screening where auditability matters.
Machine Learning: What Gradient Boosting Added
The shift to gradient-boosted decision trees, particularly XGBoost and LightGBM, brought three genuine improvements over classical discriminant models and logistic regression.
First, nonlinearity. Financial ratios interact in complex ways; a firm with high leverage is dangerous in a high-rate environment but manageable when rates are low. Tree-based models capture these interactions without requiring the analyst to specify them in advance.
Second, feature richness. Modern ML credit models ingest accounting data, market data (equity prices, equity volatility, credit spreads), macroeconomic indicators, industry indicators, and in some implementations textual features from earnings calls and filings. The Merton model runs on a handful of inputs (equity value and volatility, debt face value, a rate, a horizon); a modern gradient boosting model may use 200 or more.
Third, handling missing and imbalanced data. Corporate defaults are rare events. Gradient boosting implementations handle class imbalance natively through sample-weighting and cost-sensitive loss functions, which matters enormously for credit classification where false negatives (missed defaults) are far more costly than false positives.
The empirical gains are real. Across multiple studies and credit datasets, gradient boosting consistently outperforms logistic regression and Altman-style discriminant models on out-of-sample default prediction metrics such as the area under the ROC curve (AUC) and the Kolmogorov-Smirnov (KS) statistic. The margin is not small: improvements of 5 to 10 AUC points over logistic regression are common on datasets with rich market features.
The cost is interpretability. A gradient boosting model with 500 trees and hundreds of features is not auditable in the way that the Z-score is. Feature importance measures (Gini importance, SHAP values) provide approximations to explanations, but they are not structural economic interpretations.
Neural Hazard Models
The most recent methodological frontier applies neural networks to the hazard modeling framework, combining the mathematical structure of reduced-form models with the representational power of deep learning.
Kvamme et al. (2019) and related work reformulate discrete-time hazard models using neural network architectures. Instead of specifying a parametric form for the hazard function, the network learns the mapping from covariates to the conditional default probability at each time step. This enables the model to capture nonlinear effects of firm-level and macro variables on the hazard rate without the restrictive functional form assumptions of affine intensity models.
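The discrete-time hazard structure these models build on can be shown through its classical linear special case: expand each firm into one row per period survived, fit a logistic hazard on (covariates, time), and chain the one-period survival probabilities into a survival curve. Neural versions replace the logistic map with a network; everything below, including the data-generating hazard, is a simulation for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_firms, horizon = 5000, 10
x = rng.normal(size=n_firms)   # one stylized firm covariate (e.g. a leverage score)

def true_hazard(t, xi):
    # ground-truth discrete hazard, used only to simulate the panel
    return 1.0 / (1.0 + np.exp(-(-3.0 + 0.15 * t + 0.8 * xi)))

rows, events = [], []
for i in range(n_firms):
    for t in range(1, horizon + 1):      # one person-period row per period survived
        d = rng.random() < true_hazard(t, x[i])
        rows.append([x[i], t])
        events.append(int(d))
        if d:
            break                        # firm leaves the panel at default

clf = LogisticRegression().fit(np.array(rows), np.array(events))

# survival curve for an average firm (x = 0): S(t) = prod over s <= t of (1 - h_s)
grid = np.column_stack([np.zeros(horizon), np.arange(1, horizon + 1)])
h = clf.predict_proba(grid)[:, 1]
survival = np.cumprod(1.0 - h)
```

Swapping the `LogisticRegression` for a feed-forward or recurrent network, trained on the same person-period likelihood, is exactly the move the neural hazard literature makes.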
Gunnarsson et al. (2021) applied a similar framework specifically to corporate credit risk, finding that neural hazard models outperform both logistic regression and gradient boosting on longer-horizon default prediction, where the temporal dynamics of the hazard rate matter most. The advantage is particularly pronounced for firms in the early stages of financial stress, where the time path of covenant pressure and cash burn is informative in ways that a cross-sectional snapshot misses.
Recurrent architectures (LSTM, GRU) handle the temporal structure directly. Instead of feeding the model a single-period snapshot of financial ratios, recurrent networks process the time series of financial statements and market prices, learning which trajectories precede default. This is closer to what experienced credit analysts do informally: they do not look only at the most recent filing; they look at the trend.
The tradeoff is data hunger. Neural models require much larger training samples than gradient boosting to avoid overfitting, and corporate default datasets are inherently limited by the rarity of defaults. Regularization (dropout, L2 penalties), transfer learning across sectors, and data augmentation help, but the problem does not fully disappear.
The Practitioner's Framework: What Gets Used Where
| Framework | Interpretability | Data Needs | Default Prediction | Regulatory Acceptance |
|---|---|---|---|---|
| Merton / KMV | High | Market + balance sheet | Moderate | High |
| Reduced-form | Medium | Credit spreads | High (for pricing) | High |
| Altman Z-score | Very High | Accounting only | Moderate | Very High |
| Gradient Boosting | Low-Medium | Accounting + market | High | Medium |
| Neural Hazard | Low | Large panel data | Highest | Low |
Investment-grade credit assessment at banks and large asset managers typically relies on structural models (KMV-style EDF estimates) blended with judgmental overlays. The structural model provides an economically grounded anchor; the analysts adjust for factors the model cannot see, such as management quality, litigation risk, and strategic positioning.
High-yield and leveraged loan desks increasingly use gradient boosting models alongside traditional fundamental analysis. The model identifies outliers that warrant closer attention; the analyst decides whether the model's concern reflects genuine deterioration or a data artifact.
Distressed debt and credit special situations practitioners typically rely most heavily on bottom-up fundamental analysis and structural model outputs. At or near default, reduced-form models lose their edge because default timing is no longer a statistical abstraction; it is a negotiated outcome among creditors, management, and regulators.
Quantitative credit hedge funds and fintech lenders are the primary adopters of neural hazard models. They have the data volumes and the technical infrastructure to support these models, and they face fewer regulatory constraints on model form than regulated banks.
What Each Model Loses
Understanding what each model sacrifices is as important as understanding what it gains. The Merton model imposes a specific economic structure; when that structure is wrong (and it often is, particularly for firms with complex capital structures), the model fails systematically rather than randomly. Reduced-form models fit well to market prices but are silent on the mechanism of default; they cannot alert you to deteriorating fundamentals before market prices move. Gradient boosting is powerful but non-causal; it correlates patterns in the training data with defaults, and those correlations can break down out-of-sample when the economic regime shifts. Neural models extend these capabilities temporally but compound the interpretability and data requirements.
None of these frameworks is wrong. Each is a different approximation to the same complex economic reality.
Limitations
Credit models of every type share common limitations. Default datasets are small relative to the model complexity they attempt to support; even with decades of data, investment-grade defaults are rare enough to make out-of-sample validation unreliable. Models trained in one credit cycle may produce systematically biased predictions in the next. The interaction between credit risk and systemic risk (the tendency for defaults to cluster in recessions) is difficult to model without a macro component, and most credit models treat the macro environment as a covariate rather than a co-evolving state.
Regulatory requirements impose a separate constraint. Banks subject to Basel III/IV must use models that satisfy interpretability and auditability standards. This effectively rules out deep neural networks for regulatory capital calculations, even when those networks demonstrate superior out-of-sample performance. The academic frontier and the regulatory frontier are not always the same place.
This analysis was synthesised by the QD Research Engine, Quant Decoded's automated research platform, and reviewed by our editorial team for accuracy.
References
- Merton, R.C. (1974). "On the Pricing of Corporate Debt: The Risk Structure of Interest Rates." Journal of Finance, 29(2), 449-470. https://doi.org/10.1111/j.1540-6261.1974.tb03058.x
- Altman, E.I. (1968). "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy." Journal of Finance, 23(4), 589-609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x
- Jarrow, R.A. & Turnbull, S.M. (1995). "Pricing Derivatives on Financial Securities Subject to Credit Risk." Journal of Finance, 50(1), 53-85. https://doi.org/10.1111/j.1540-6261.1995.tb05167.x
- Duffie, D. & Singleton, K.J. (1999). "Modeling Term Structures of Defaultable Bonds." Review of Financial Studies, 12(4), 687-720. https://doi.org/10.1093/rfs/12.4.687
- Eom, Y.H., Helwege, J. & Huang, J. (2004). "Structural Models of Corporate Bond Pricing: An Empirical Analysis." Review of Financial Studies, 17(2), 499-544. https://doi.org/10.1093/rfs/hhg053
- Kvamme, H., Borgan, Ø. & Scheel, I. (2019). "Time-to-event Prediction with Neural Networks and Cox Regression." Journal of Machine Learning Research, 20(129), 1-30.
- Gunnarsson, B.R., vanden Broucke, S., Baesens, B., Óskarsdóttir, M. & Lemahieu, W. (2021). "Deep Learning for Credit Scoring: Do or Don't?" European Journal of Operational Research, 295(1), 292-305.