ESG Alpha: Real or Artifact?

Key Takeaway

Between 2019 and 2021, ESG funds outperformed traditional benchmarks, and a narrative took hold: doing good and doing well were not just compatible but synergistic. Many investors concluded that sustainable investing generates alpha. The academic evidence tells a more nuanced story. Pastor, Stambaugh, and Taylor (2021) show that green assets should earn lower expected returns in equilibrium, and that an unanticipated surge in ESG demand can still make them outperform for a while. On this view, the observed outperformance reflects a one-time repricing, not persistent alpha. Meanwhile, Berg, Koelbel, and Rigobon (2022) demonstrate that ESG ratings from different providers diverge so dramatically that any "ESG signal" depends heavily on which provider supplies it. Together, these two papers suggest that the most common investor mistake in sustainable investing is confusing a transitory windfall with a structural edge.

Paper A: Sustainable Investing in Equilibrium

Pastor, Stambaugh, and Taylor (2021) develop an equilibrium asset pricing model that incorporates investors' ESG preferences directly into the pricing of securities. Their framework yields several important predictions that challenge conventional thinking about green alpha.

The Taste Effect: Why Green Assets Earn Less

The central mechanism is deceptively simple. If a meaningful fraction of investors derive non-pecuniary utility from holding green assets -- they feel good about owning clean energy stocks and uncomfortable holding fossil fuel companies -- then demand for green assets rises and demand for brown assets falls. This preference-driven demand shift pushes up the prices of green assets and depresses the prices of brown assets.

Higher prices today mean lower expected returns tomorrow. In equilibrium, green assets trade at a premium, which mechanically reduces their future returns. Brown assets trade at a discount, raising their future returns. The model predicts a "carbon premium": brown stocks offer higher expected returns to compensate investors for the disutility of holding socially undesirable assets.

This is analogous to the "sin stock" premium documented by Hong and Kacperczyk (2009), who found that tobacco, alcohol, and gambling stocks earn abnormally high returns precisely because many institutional investors exclude them from portfolios. Reduced demand leads to lower prices and higher expected returns for the excluded assets.

The Repricing Illusion

If green assets have lower expected returns, how do we explain the strong performance of ESG funds in 2019-2021? Pastor, Stambaugh, and Taylor provide an elegant answer: unanticipated increases in ESG demand.

When investor preferences shift toward green assets more rapidly than the market anticipated, green stocks experience a positive price shock: a one-time upward revaluation. During this transition, green assets deliver unexpectedly high realized returns. But this outperformance is the result of the repricing itself, not a persistent return premium. Once prices have adjusted to the new, higher level of ESG demand, expected returns on green assets are even lower than they were before the shift.

The analogy is straightforward. If interest rates drop unexpectedly, bondholders enjoy capital gains. But going forward, the yield on those bonds is lower. Celebrating the capital gains while ignoring the lower future yield is a mistake, yet this is precisely what many ESG investors did during the 2019-2021 period.
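The repricing arithmetic can be made concrete with a constant-growth (Gordon) valuation. The sketch below is a hypothetical illustration, not the Pastor-Stambaugh-Taylor model itself: it assumes a stock with a steadily growing dividend and models an unexpected rise in ESG demand as a one-time drop in the return investors require from that stock. All numbers are illustrative assumptions.

```python
# Hypothetical illustration of a repricing event using the constant-growth
# (Gordon) formula P = D1 / (r - g). The ESG demand shift is modeled, as an
# assumption, as a one-time fall in the required return r.

def gordon_price(next_dividend: float, discount_rate: float, growth: float) -> float:
    """Price of a dividend stream growing at `growth`, discounted at `discount_rate`."""
    if discount_rate <= growth:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth)

D1 = 2.0         # next year's dividend (illustrative)
g = 0.03         # dividend growth rate (illustrative)
r_before = 0.07  # required return before the demand shift (illustrative)
r_after = 0.06   # required return after green demand rises (illustrative)

p_before = gordon_price(D1, r_before, g)
p_after = gordon_price(D1, r_after, g)

# The holder through the shift earns a large one-time capital gain...
repricing_gain = p_after / p_before - 1

print(f"price before shift: {p_before:.2f}")
print(f"price after shift:  {p_after:.2f}")
print(f"one-time repricing gain: {repricing_gain:.1%}")
# ...but the return a new buyer should expect going forward is lower.
print(f"expected return going forward: {r_after:.1%} (was {r_before:.1%})")
```

Under these assumptions, the price rises by about a third on the shift alone, while the forward expected return falls from 7% to 6%. Extrapolating the one-time gain into the future is the mistake the bond analogy describes.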

The Equilibrium Prediction

The model's predictions, in steady state and during a demand shift, can be summarized as follows:

Asset or period	Return prediction	Mechanism
Green assets (steady state)	Lower expected return	Investors accept lower returns for ESG utility
Brown assets (steady state)	Higher expected return	Investors demand compensation for holding undesirable stocks
During an unexpected ESG demand shift	Green outperforms temporarily	One-time repricing, not persistent alpha
After adjustment	Green expected to underperform	New equilibrium with even lower expected returns for green assets

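This framework has found empirical support. Bolton and Kacperczyk (2021) document that firms with higher total carbon emissions, and with larger increases in emissions, earn higher stock returns, consistent with a carbon risk premium of approximately 1-2% per year. The premium is most pronounced for direct (Scope 1) emissions and has grown over time as climate awareness has increased. This fits the model's steady-state prediction of higher expected returns for brown assets, although the same model implies that an unexpected rise in climate concern should depress brown stocks' realized returns in the short run.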

Paper B: The Measurement Problem

Even if ESG preferences affect asset prices in theory, implementing an ESG-based investment strategy requires a reliable way to measure corporate ESG performance. Berg, Koelbel, and Rigobon (2022) show that existing ESG ratings fall well short of that standard.

The Rating Divergence

The authors collected ESG ratings from six major providers -- KLD (now MSCI), Sustainalytics, Vigeo Eiris (now Moody's), RobecoSAM (now S&P Global), Asset4 (now Refinitiv), and MSCI IVA -- and computed pairwise correlations. Their findings are striking.

Comparison	Correlation
ESG ratings across providers	r = 0.54 (average)
Credit ratings (Moody's vs. S&P)	r = 0.99
ESG environmental pillar	r = 0.53
ESG social pillar	r = 0.42
ESG governance pillar	r = 0.30

A correlation of 0.54 means that only about 29% of the variance in one provider's ESG rating is explained by another provider's rating (0.54 × 0.54 ≈ 0.29); roughly 71% is left unexplained. By contrast, when Moody's and S&P both rate a company's creditworthiness, they agree almost perfectly. The ESG rating industry produces assessments so divergent that a company rated as an ESG leader by one provider can simultaneously be rated as a laggard by another.
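A quick way to read the correlation table is to square each value, which gives the share of one rating's cross-sectional variance that a linear fit on the other rating explains. The sketch below does this for the figures in the table. Note that squaring an average correlation only approximates the average shared variance across provider pairs, and that "unexplained" variance is not necessarily measurement error.

```python
def shared_variance(r: float) -> float:
    """Fraction of variance in one rating linearly explained by the other (r squared)."""
    return r ** 2

# Correlations as reported in the table above.
correlations = {
    "ESG ratings (average across providers)": 0.54,
    "Credit ratings (Moody's vs. S&P)": 0.99,
    "Environmental pillar": 0.53,
    "Social pillar": 0.42,
    "Governance pillar": 0.30,
}

for label, r in correlations.items():
    explained = shared_variance(r)
    print(f"{label:<40} r = {r:.2f}  explained = {explained:6.1%}  "
          f"unexplained = {1 - explained:6.1%}")
```

The contrast is starker in variance terms than in correlation terms: credit ratings share about 98% of their variance, while the governance pillars share under a tenth.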

Three Sources of Divergence

Berg, Koelbel, and Rigobon decompose the disagreement into three components:

Scope divergence -- providers measure different things. One provider might include lobbying activities in its governance assessment while another does not. The set of categories that constitute "ESG" is not standardized.

Measurement divergence -- even when providers examine the same attribute, they measure it differently. Two providers assessing labor practices might rely on different indicators, such as workforce turnover versus the number of labor-related court cases; two providers assessing carbon exposure might use emissions intensity versus total emissions. This is the largest source of disagreement, accounting for more than half of the total divergence.

Weight divergence -- providers assign different weights to the same categories when computing aggregate scores. One provider might weight environmental issues at 40% of the total score; another might weight them at 25%. Even if the raw measurements were identical, the composite scores would differ. In the authors' decomposition, this is the smallest of the three sources.
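Weight divergence alone can reorder firms. In the hypothetical example below, two providers see identical pillar scores for three made-up firms and differ only in the weights they apply. The 40% and 25% environmental weights echo the example in the text; the firm names, scores, and remaining weights are illustrative assumptions.

```python
# Identical pillar scores (0-100) seen by both providers; all values hypothetical.
pillar_scores = {
    "Firm A": {"E": 90, "S": 50, "G": 40},
    "Firm B": {"E": 45, "S": 70, "G": 75},
    "Firm C": {"E": 60, "S": 60, "G": 60},
}

# Each provider's pillar weights (each set sums to 1).
weights_provider_1 = {"E": 0.40, "S": 0.30, "G": 0.30}
weights_provider_2 = {"E": 0.25, "S": 0.35, "G": 0.40}

def composite(scores: dict, weights: dict) -> float:
    """Weighted average of pillar scores; weights are assumed to sum to 1."""
    return sum(weights[pillar] * scores[pillar] for pillar in weights)

def ranking(weights: dict) -> list:
    """Firms ordered from highest to lowest composite score."""
    return sorted(pillar_scores,
                  key=lambda firm: composite(pillar_scores[firm], weights),
                  reverse=True)

print("Provider 1 ranking:", ranking(weights_provider_1))
print("Provider 2 ranking:", ranking(weights_provider_2))
```

Here Firm A ranks first under the first weighting (63 vs. 61.5 and 60) and last under the second (56 vs. 65.75 and 60), although no underlying measurement changed.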

The Rater Effect

Perhaps most troublingly, the authors find a "rater effect": a provider's score for a company in one ESG category is influenced by the scores that the same provider assigns the company in other categories. If a provider rates a company highly on environmental performance, it tends to also rate it highly on governance, even though the two are conceptually distinct. This halo effect further inflates the noise in ESG data.

Where the Evidence Converges

Despite their different focuses, the two papers converge on several critical points.

ESG is not a traditional alpha source. Pastor, Stambaugh, and Taylor show theoretically that green assets earn lower expected returns and that outperformance during a surge in ESG demand reflects repricing rather than persistent alpha. Berg, Koelbel, and Rigobon show empirically that the ratings used to construct ESG portfolios disagree so much that any back-tested alpha is likely to be fragile and provider-dependent. Both papers undermine the claim that ESG investing systematically generates excess risk-adjusted returns.

Past performance in ESG is particularly misleading. The 2019-2021 period saw a confluence of factors (massive inflows into ESG funds, regulatory tailwinds, and a tech-heavy composition of green indices) that produced strong returns for sustainable strategies. The equilibrium model explains this as a repricing event that should not be expected to repeat once demand stabilizes, while the measurement literature implies that an investor who earned this outperformance using one provider's ratings might not have earned it using another's.

The concept of "ESG" lacks a stable definition. Pastor, Stambaugh, and Taylor treat ESG preferences as a well-defined taste parameter in their model. In practice, as Berg, Koelbel, and Rigobon demonstrate, there is no consensus on what ESG means, how to measure it, or how to weight its components. This makes it difficult to translate the theoretical framework into a reliable investment strategy.

Where the Evidence Diverges

The two papers also diverge in important ways.

On whether ESG affects prices at all. Pastor, Stambaugh, and Taylor take it as given that investor preferences for green assets are strong enough to affect equilibrium prices. Berg, Koelbel, and Rigobon's findings raise the question of whether the signal is too noisy for preferences to aggregate coherently. If different investors use different ESG ratings, their demand patterns may partially cancel out rather than create a consistent price effect.

On the existence of a carbon premium. The equilibrium model predicts a clear carbon premium -- brown assets should earn higher returns. Empirical evidence from Bolton and Kacperczyk (2021) supports this, but the magnitude and consistency of the premium remain debated. Some studies find the carbon premium only in specific subperiods or for specific emission scopes.

On practical implications. Pastor, Stambaugh, and Taylor suggest that ESG-conscious investors should accept lower expected returns as the cost of their values-based preferences -- a rational and informed trade-off. Berg, Koelbel, and Rigobon's work suggests that even defining a consistent values-based strategy is problematic when the underlying data is this noisy.

Implications for Factor Investors

The intersection of these findings has several practical consequences for quantitative and factor-oriented investors.

ESG Is Not a Factor in the Traditional Sense

Traditional risk factors -- value, momentum, quality -- are defined by clear, replicable metrics (book-to-market, past returns, profitability ratios). ESG lacks this definitional clarity. A portfolio sorted on MSCI ESG ratings will look substantially different from one sorted on Sustainalytics ratings, making ESG unsuitable as a systematic factor in the Fama-French tradition. The lack of a canonical definition means that any ESG "factor" is inherently researcher-dependent.
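How different can two ESG-sorted portfolios be? One rough way to gauge it is to simulate two providers whose ratings are correlated at the levels reported by Berg, Koelbel, and Rigobon and count how many top-quintile firms they share. The sketch assumes, purely for illustration, that the two ratings are jointly normal; real ratings are not, so treat the output as an order-of-magnitude guide rather than an estimate.

```python
import math
import random

def simulated_ratings(n_firms: int, rho: float, seed: int = 0):
    """Draw two standard-normal ratings per firm with correlation rho.

    Hypothetical model: rating_2 = rho * rating_1 + sqrt(1 - rho^2) * independent noise.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_firms):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        first.append(z1)
        second.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return first, second

def top_quintile(scores: list) -> set:
    """Indices of the firms in the top 20% of `scores`."""
    k = len(scores) // 5
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def top_quintile_overlap(n_firms: int, rho: float, seed: int = 0) -> float:
    """Share of provider 1's top-quintile firms that are also in provider 2's."""
    a, b = simulated_ratings(n_firms, rho, seed)
    top_a, top_b = top_quintile(a), top_quintile(b)
    return len(top_a & top_b) / len(top_a)

for rho in (0.99, 0.54, 0.30):
    print(f"rho = {rho:.2f}: top-quintile overlap = {top_quintile_overlap(5000, rho):.0%}")
```

Under this assumption, ratings that agree as closely as credit ratings pick nearly the same top quintile, while ratings correlated around 0.5 leave a large share of each "ESG leader" portfolio outside the other.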

The Carbon Premium Deserves Attention

While aggregate ESG scores are noisy, carbon emissions data is relatively well-defined and increasingly standardized through regulatory mandates. The carbon premium documented by Bolton and Kacperczyk (2021) and predicted by Pastor, Stambaugh, and Taylor (2021) therefore rests on firmer measurement foundations than broader ESG scores. Factor investors seeking to harvest this premium can build signals directly from emissions data rather than from composite ESG ratings. The choice of metric matters, however: Bolton and Kacperczyk find the premium in total emissions and their changes, not in carbon intensity (tonnes of CO2 per unit of revenue).
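Whether a signal uses total emissions or carbon intensity matters, because the two can rank the same firms differently. The figures below are hypothetical (made-up firms, emissions in tonnes of CO2e, revenue in millions of US dollars) and show a case where the largest absolute emitter is not the most carbon-intensive firm.

```python
# Hypothetical firms: total emissions (tonnes CO2e) and revenue ($ millions).
firms = {
    "Utility X": {"emissions_t": 5_000_000, "revenue_musd": 10_000},
    "Cement Y": {"emissions_t": 2_000_000, "revenue_musd": 1_000},
    "Retailer Z": {"emissions_t": 300_000, "revenue_musd": 60_000},
}

def carbon_intensity(emissions_t: float, revenue_musd: float) -> float:
    """Tonnes of CO2e per $1 million of revenue."""
    return emissions_t / revenue_musd

# Rank from "brownest" to "greenest" under each definition.
by_total = sorted(firms, key=lambda f: firms[f]["emissions_t"], reverse=True)
by_intensity = sorted(firms, key=lambda f: carbon_intensity(**firms[f]), reverse=True)

print("ranked by total emissions:", by_total)
print("ranked by intensity:      ", by_intensity)
```

Utility X emits the most in absolute terms, but Cement Y emits four times as much per dollar of revenue (2,000 vs. 500 tonnes per $1 million), so a long-brown portfolio built on one definition need not match one built on the other.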

Beware of Back-tested ESG Alpha

Any back-test showing persistent ESG alpha should be scrutinized for several biases:

Provider selection bias. Results may depend on which ESG data provider is used. A strategy that shows alpha with MSCI ratings may not replicate with Sustainalytics data.
Repricing contamination. If the back-test includes the 2019-2021 period, the results may be inflated by a one-time repricing event that should not be expected to recur.
Sector concentration. ESG-tilted portfolios tend to overweight technology and underweight energy. Much of the "ESG alpha" in recent years may simply be a bet on the tech sector.
Survivorship and look-ahead bias. ESG data coverage has expanded dramatically; historical back-tests may implicitly condition on firms that survived and eventually received ESG coverage.
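A simple robustness check for repricing contamination is to recompute a strategy's average excess return with and without the 2019-2021 window. The helpers below are a hypothetical sketch: they assume monthly excess returns keyed by month-end date and use a simple arithmetic annualization. The toy series is constructed so that all of the excess return falls inside the window; it is not real data.

```python
from datetime import date
from statistics import mean

# Window to test for repricing contamination (inclusive bounds); adjust as needed.
WINDOW_START = date(2019, 1, 1)
WINDOW_END = date(2021, 12, 31)

def split_by_window(returns: dict, start: date = WINDOW_START, end: date = WINDOW_END):
    """Split a {date: monthly excess return} mapping into in-window and out-of-window lists."""
    inside = [r for d, r in returns.items() if start <= d <= end]
    outside = [r for d, r in returns.items() if not start <= d <= end]
    return inside, outside

def annualized_mean(monthly_returns: list) -> float:
    """Arithmetic mean of monthly returns times 12 (no compounding)."""
    if not monthly_returns:
        raise ValueError("no observations")
    return 12 * mean(monthly_returns)

# Toy series, 2015-2023: 1% monthly excess return inside the window, 0% outside.
toy = {
    date(year, month, 28): (0.01 if 2019 <= year <= 2021 else 0.0)
    for year in range(2015, 2024)
    for month in range(1, 13)
}

inside, outside = split_by_window(toy)
print(f"full sample:    {annualized_mean(list(toy.values())):.1%} per year")
print(f"2019-2021 only: {annualized_mean(inside):.1%} per year")
print(f"excluding it:   {annualized_mean(outside):.1%} per year")
```

If the out-of-window average is close to zero, as in this constructed case, the full-sample "alpha" describes one period rather than a persistent premium. The same split can be combined with a sector-neutral version of the portfolio to address the concentration concern above.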

The Rational Trade-Off Framework

Pastor, Stambaugh, and Taylor offer the most intellectually honest framing of ESG investing: it is a consumption good, not an alpha strategy. Investors who hold green assets are paying for the utility they derive from alignment with their values, just as consumers pay a premium for organic food or fair-trade coffee. The cost of this utility is lower expected returns relative to an unconstrained portfolio.

This framing does not diminish the value of ESG investing -- it clarifies what the investor is actually buying. An investor who understands that their ESG portfolio may earn 50-100 basis points less per year than a comparable unconstrained portfolio, and who considers this a worthwhile trade-off for values alignment, is making a rational decision. An investor who believes their ESG portfolio generates alpha is operating under a misconception.
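To see what a 50-100 basis point annual shortfall means over a long horizon, it helps to compound it. The sketch assumes a hypothetical 7% expected return for the unconstrained portfolio and a 30-year horizon; both are illustrative choices, and only the basis-point range comes from the text.

```python
def relative_wealth(baseline_return: float, cost_bps: float, years: int) -> float:
    """ESG terminal wealth as a fraction of unconstrained terminal wealth.

    Both portfolios compound annually; the ESG portfolio earns `cost_bps`
    basis points less per year than `baseline_return`.
    """
    esg_return = baseline_return - cost_bps / 10_000
    return ((1 + esg_return) / (1 + baseline_return)) ** years

BASELINE = 0.07  # hypothetical expected return of the unconstrained portfolio
HORIZON = 30     # hypothetical investment horizon in years

for cost in (50, 100):
    share = relative_wealth(BASELINE, cost, HORIZON)
    print(f"{cost} bp/yr for {HORIZON} years: ESG wealth = {share:.1%} of unconstrained")
```

Under these assumptions, the ESG portfolio ends with roughly 13% to 25% less terminal wealth than the unconstrained one. That is the price the investor is weighing against values alignment.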

Conclusion

The academic evidence on ESG alpha is inconvenient for both its proponents and its detractors. For proponents, the evidence says that ESG outperformance in recent years was a transitory repricing event rather than persistent alpha, and that ESG ratings are too noisy to serve as reliable investment signals. For detractors, the evidence does suggest that investor preferences for green assets genuinely affect equilibrium prices, and that a carbon premium likely exists -- meaning ESG considerations are financially relevant even if they do not generate alpha.

The most important takeaway is the distinction between realized returns and expected returns. Green assets delivered strong realized returns during 2019-2021 precisely because investor demand for them surged unexpectedly. But that surge, once priced in, means green assets now offer lower expected returns going forward. Confusing the windfall with the steady state is the central mistake that investors in sustainable strategies must avoid.

References

Berg, F., Koelbel, J. F., & Rigobon, R. (2022). "Aggregate Confusion: The Divergence of ESG Ratings." Review of Finance, 26(6), 1315-1344. https://doi.org/10.1093/rof/rfac033
Bolton, P., & Kacperczyk, M. (2021). "Do Investors Care about Carbon Risk?" Journal of Financial Economics, 142(2), 517-549. https://doi.org/10.1016/j.jfineco.2021.05.008
Hong, H., & Kacperczyk, M. (2009). "The Price of Sin: The Effects of Social Norms on Markets." Journal of Financial Economics, 93(1), 15-36. https://doi.org/10.1016/j.jfineco.2008.09.001
Pastor, L., Stambaugh, R. F., & Taylor, L. A. (2021). "Sustainable Investing in Equilibrium." Journal of Financial Economics, 142(2), 550-571. https://doi.org/10.1016/j.jfineco.2020.12.011