We’ve touched on forecasting models previously, but only briefly discussed how to ensure that the results of the model you built are actually a fair representation of what your forecasting model set out to do. In predictive modeling, ensuring that models are accurate, reliable, and, in short, fit for purpose is of utmost importance, especially in fields such as finance and stock market analysis. This time, let us look into the critical components of model validation: backtesting (quantitative tests and qualitative tests) and the alternative validation methods used to assess the performance of these models.
Backtesting
At the heart of model validation lies backtesting, a process that compares a model’s estimates with actual data to confirm that its predictions align with real-world results. It operates as follows: data from the most recent x months/days (whichever frequency your model is built on) is collected for each estimate under scrutiny, along with the data used during the original model build. Keep in mind that there is no universal minimum amount of data; the ideal dataset length varies with the specific model, the frequency of trading, and the goals of the analysis, so your mileage may vary in building appropriately sized development and testing datasets. This is why backtesting should be seen as just one part of an overall model governance framework that ensures your model remains fit for purpose. Keep in mind as well that the SEC regulates various aspects of the securities industry; while its current scope does not directly cover forecast models, it does require financial professionals to act in their clients’ best interests, which implies using sound and reliable models for investment decisions. Here’s how:
Quantitative Tests
Quantitative tests are essential for rigorously assessing model accuracy. These tests can be categorized into four main groups based on their objectives:
Objective 1: Tests for Stability of Estimates
The Augmented Dickey-Fuller (ADF) test is a widely used statistical test in stock market analysis to evaluate the stability and stationarity of time series data, such as stock prices. Its primary purpose is to determine whether a time series follows a random walk (non-stationary) or exhibits a stationary behavior with a stable mean and variance over time.
The ADF test assesses the stability of stock prices by analyzing their time series properties, helping identify trends and behaviors within the data that are crucial for making informed investment decisions. The test produces a p-value, and interpreting it is the key step: a p-value below the chosen significance level (commonly 0.05) rejects the null hypothesis of a unit root, suggesting the series is stationary, while a higher p-value means the series cannot be distinguished from a random walk.
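To make this concrete, here is a minimal sketch of how the ADF test is typically run in Python with statsmodels’ adfuller; the simulated price series is a stand-in for whatever data your model is built on.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Simulated random walk as a placeholder for a real price series
rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))

adf_stat, p_value, used_lag, n_obs, critical_values, _ = adfuller(prices)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

# Null hypothesis: the series has a unit root (non-stationary)
if p_value < 0.05:
    print("Reject the null: the series looks stationary.")
else:
    print("Fail to reject: the series behaves like a random walk.")
```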
Objective 2: Tests for Discriminatory Power of Model
Discrimination Testing (Accuracy Ratio): This test assesses a model’s ability to differentiate between categories, such as profitable and non-profitable stocks. The Accuracy Ratio (AR) quantifies how precisely the model separates the two outcomes, with higher values indicating stronger discriminatory power.
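A common way to compute the Accuracy Ratio (one convention among several, but a widely used one) is via the Gini coefficient, AR = 2 × AUC − 1. The labels and scores below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 200)                  # 1 = profitable, 0 = not
scores = y_true * 0.3 + rng.normal(0, 0.5, 200)   # noisy model scores

auc = roc_auc_score(y_true, scores)
accuracy_ratio = 2 * auc - 1   # AR in [-1, 1]; 0 means no discrimination
print(f"AUC: {auc:.3f}, Accuracy Ratio: {accuracy_ratio:.3f}")
```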
Information Value: Information Value (IV) measures how well a variable segregates “positive” from “negative” outcomes within a dataset. It is a fundamental concept in stock market analysis, quantifying the predictive power of candidate variables; high IV values suggest strong predictive attributes.
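Here is a minimal sketch of the usual IV computation, assuming the standard weight-of-evidence convention and quantile binning; the bin count and the small clip value guarding against empty bins are assumptions:

```python
import numpy as np
import pandas as pd

def information_value(feature, target, bins=5):
    """IV = sum over bins of (%positive - %negative) * WoE,
    where WoE = ln(%positive / %negative)."""
    df = pd.DataFrame({"x": feature, "y": target})
    df["bin"] = pd.qcut(df["x"], q=bins, duplicates="drop")
    grouped = df.groupby("bin", observed=True)["y"].agg(["sum", "count"])
    pos = grouped["sum"]                      # positives per bin
    neg = grouped["count"] - grouped["sum"]   # negatives per bin
    pct_pos = (pos / pos.sum()).clip(lower=1e-6)
    pct_neg = (neg / neg.sum()).clip(lower=1e-6)
    woe = np.log(pct_pos / pct_neg)           # weight of evidence per bin
    return float(((pct_pos - pct_neg) * woe).sum())

# Illustrative usage: a feature that genuinely drives the outcome
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)
y = (x + rng.normal(0, 1, 1000) > 0).astype(int)
print(f"IV: {information_value(x, y):.3f}")
```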
Objective 3: Tests for Calibration
Concentration Testing (Herfindahl-Hirschman Index): The Herfindahl-Hirschman Index (HHI) measures the concentration of risk within specific stock categories or sectors, helping assess how risk is distributed across a portfolio. Higher HHI values point to concentrated risk.
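The HHI itself is just the sum of squared shares. A minimal sketch on assumed sector weights:

```python
import numpy as np

# Illustrative portfolio weights by sector; they must sum to 1
sector_weights = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
hhi = float(np.sum(sector_weights ** 2))
print(f"HHI: {hhi:.3f}")  # ranges from 1/n (equal weights) up to 1.0 (one sector)
```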
Calibration Testing (Modified Binomial Test): The Modified Binomial Test examines a model’s calibration when stock movements are correlated, evaluating whether the model’s predicted frequencies align with actual stock movements while accounting for correlation between events. This makes it crucial for assessing calibration in the presence of stock correlations.
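The exact form of the modified test varies by practitioner; the sketch below uses one common simplification that inflates the binomial variance by an assumed pairwise event correlation rho, which here is a placeholder rather than an estimated value:

```python
import numpy as np
from scipy.stats import norm

def modified_binomial_test(n, observed, p_forecast, rho=0.1):
    """One-sided calibration test using a normal approximation, with the
    binomial variance inflated for correlated events (rho is assumed)."""
    expected = n * p_forecast
    # Independent-case variance n*p*(1-p), scaled up by the correlation term
    var = n * p_forecast * (1 - p_forecast) * (1 + (n - 1) * rho)
    z = (observed - expected) / np.sqrt(var)
    return 1 - norm.cdf(z)   # p-value for seeing this many or more events

# Illustrative usage: 500 predictions at 5%, 35 events observed
print(f"p-value: {modified_binomial_test(500, 35, 0.05):.3f}")
```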
Rating Migration Testing (Mobility Index): The Mobility Index measures the degree of “Rating Mobility” in stock rating systems, gauging how ratings change over time and whether those changes align with the model’s intended philosophy. A high Mobility Index suggests a point-in-time (PiT) rating system, while a low index indicates a through-the-cycle (TtC) system.
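One common formulation (an assumption here, since several variants exist) is the Shorrocks index, MI = (n − trace(P)) / (n − 1), computed from the rating transition matrix P:

```python
import numpy as np

def mobility_index(transition_matrix):
    """Shorrocks mobility index: 0 means no migration (TtC-like);
    values near 1 indicate highly mobile, PiT-like ratings."""
    P = np.asarray(transition_matrix, dtype=float)
    n = P.shape[0]
    return (n - np.trace(P)) / (n - 1)

# Illustrative 3-grade transition matrix (rows sum to 1)
P = [[0.90, 0.08, 0.02],
     [0.05, 0.90, 0.05],
     [0.02, 0.08, 0.90]]
print(f"Mobility Index: {mobility_index(P):.3f}")
```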
Objective 4: Bias Testing
The Runs Test is designed to uncover patterns and biases in ordered sequences of stock data by identifying runs of consecutive positive or negative stock movements. In financial time series analysis, it is a useful tool for detecting significant patterns or trends in stock price movements.
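Here is a minimal sketch of a Wald–Wolfowitz runs test on the signs of returns, using the standard normal approximation; the return series is simulated for illustration:

```python
import numpy as np
from scipy.stats import norm

def runs_test(returns):
    """Wald-Wolfowitz runs test on the signs of a return series."""
    signs = np.sign(returns)
    signs = signs[signs != 0]                       # drop zero returns
    n_pos = np.sum(signs > 0)
    n_neg = np.sum(signs < 0)
    n = n_pos + n_neg
    runs = 1 + np.sum(signs[1:] != signs[:-1])      # count sign changes
    mu = 2 * n_pos * n_neg / n + 1                  # expected runs
    var = (mu - 1) * (mu - 2) / (n - 1)             # variance under randomness
    z = (runs - mu) / np.sqrt(var)
    return z, 2 * (1 - norm.cdf(abs(z)))            # two-sided p-value

rng = np.random.default_rng(7)
z, p = runs_test(rng.normal(0, 0.01, 250))
print(f"z: {z:.3f}, p-value: {p:.3f}")  # low p suggests non-random patterns
```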
Qualitative Tests
As comprehensive as those objectives are for backtesting, the quantitative methods listed above are almost never enough on their own. In addition to quantitative tests, qualitative tests are crucial for assessing model performance, providing a deeper understanding of model behavior and potential limitations. The qualitative tests most often used are largely self-explanatory, so there’s no need to walk through each one: Expert Review, Sensitivity Analysis, and Scenario Testing. For heavily regulated financial models (like a bank’s credit risk models) there should also be a Model Governance Review that evaluates the processes and controls in place to ensure model accuracy and compliance with regulatory requirements. In the stock market, this review assesses whether the model follows governance protocols and whether regulatory requirements are met.
While quantitative and qualitative tests are highly effective in assessing model accuracy, they have limitations. For instance, they are typically of limited use when the model’s primary purpose is long-term forecasting: forecasting models aim to predict future outcomes, and historical data may not always provide a reliable basis for validation. In such cases, alternative validation methods, like out-of-sample testing or expert judgment, may be more suitable.
Alternative Validation Methods
In situations where historical data and quantitative tests may not suffice, alternative validation methods come into play:
Out-of-Sample Testing
Out-of-sample testing involves assessing a model’s performance on data that was not used in its development. It’s particularly useful for forecasting models. For example, if you have a long-term stock price forecasting model, you test its accuracy by applying it to future data that wasn’t part of the training dataset.
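Here is a minimal sketch of a chronological holdout, assuming a naive training-mean benchmark forecast; the data and the RMSE scoring choice are illustrative:

```python
import numpy as np

def time_split(series, test_fraction=0.2):
    """Split a time-ordered array into train/test without shuffling."""
    cut = int(len(series) * (1 - test_fraction))
    return series[:cut], series[cut:]

rng = np.random.default_rng(1)
returns = rng.normal(0, 0.01, 1000)   # stand-in for real return data
train, test = time_split(returns)

# Naive benchmark: forecast the training mean, scored out of sample
forecast = np.full(len(test), train.mean())
rmse = np.sqrt(np.mean((test - forecast) ** 2))
print(f"Out-of-sample RMSE: {rmse:.5f}")
```

A real forecasting model would replace the training-mean benchmark, but the discipline is the same: the test window must stay untouched during development.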
Expert Judgment
Expert judgment involves consulting with subject matter experts who provide insights and validation of model assumptions and predictions. In stock market forecasting, experts may be consulted to assess the model’s alignment with market dynamics.
The alternative validation methods may sound like cop-outs in terms of “validating” a forecast model, and technically that is what they are. In truth, alternative validation techniques are what keep most forecasting models fit for purpose, along with a comprehensive model validation framework that doesn’t necessarily run every test mentioned above but covers the objective of the model. For example, for the FX forecasting model outlined in the recent articles, you might cover just the key components relevant to that model’s objective: stability, calibration, out-of-sample testing, and enough expertise to know which variables make sense to take out or add.