Document Type

Article

Publication Date

6-26-2026

Abstract

We apply machine learning methods to predict Thoroughbred yearling auction prices at the Keeneland September Sale (2020–2024). Our sample includes 5,788 yearling prices with pedigree data. We fit both linear and tree-based models to log prices and tune their hyperparameters by cross-validation. We select Ridge regression (α = 1.451) as the primary model for interpretation, given its stability and interpretability. The Ridge regression explains approximately 54% of the out-of-sample variation in log prices (R² ≈ 0.5403). Sire and dam reputation emerge as the dominant predictors. The results provide pricing benchmarks and show how reputation and session structure shape Thoroughbred yearling auction prices.
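
For reference, the two quantities the abstract reports (the Ridge penalty α and the out-of-sample R²) can be written in their standard textbook form. The notation is assumed for illustration and is not taken from the article: y_i is the log price of yearling i, x_i its feature vector, β the coefficient vector, T the held-out set, and ȳ the mean log price over T. The article's own specification may differ, for example by standardizing features or leaving the intercept unpenalized.

```latex
% Ridge estimator with penalty weight \alpha (assumed notation)
\hat{\beta} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \alpha \lVert \beta \rVert_2^2 ,
\qquad \alpha = 1.451

% Out-of-sample R^2, computed over the held-out observations T
R^2_{\text{oos}} = 1 - \frac{\sum_{i \in T} \left( y_i - x_i^{\top}\hat{\beta} \right)^2}{\sum_{i \in T} \left( y_i - \bar{y} \right)^2} \approx 0.5403
```

Under this definition, a value of about 0.54 means that the model's squared prediction error on held-out log prices is roughly 46% of what a constant prediction at ȳ would produce.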

Comments

JEL classifications: C45; C55; G12; Q19; L83

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Included in

Economics Commons
