Thursday, May 7, 2020

Forecasting the next decade in the stock market using time series models

This post will introduce one way of forecasting the stock index returns on the US market. Typically, single measures such as CAPE have been used to do this, but they lack accuracy compared to using many variables and can also have different relationships with returns on different markets. Furthermore, it is possible to train different types of models and combine them to increase the accuracy even more, as is done in this post.

We'll use a variety of time series models, with the goal of forecasting future returns for the S&P 500. The variable to be forecasted is the annualized future ten-year return, and all of the models except ETS are dynamic, i.e. they also use regressors such as valuation multiples, which are mostly the same ones as in this post. We'll also use the same data sources as in the mentioned post, which I highly recommend reading before this one.

The models will be constructed by splitting the sample, which consists of observations between the years 1948 and 2010, into training and test sets using a 65/35 split. These values were chosen to ensure that the test set has both low and high values of the future ten-year returns so that the models can be properly assessed. Time series cross-validation could have been used to get more reliable accuracy metrics, but for our purpose a simple train/test split is good enough.

Due to the value to be forecasted representing the future ten-year returns, we have to further split the test set, separating the first ten years from the rest. This is done to avoid data leakage, or more accurately, the look-ahead bias. This is because we cannot forecast future ten-year returns using last month's future ten-year returns due to not knowing them at the moment of the forecasting, but we can forecast them using the ten-year future returns from ten years ago, which correspond to the returns of the past ten years. In the plots, the gray vertical line is used to separate the test set from the training set, and the red vertical line is used to separate the look-ahead bias-free test set, which is also the set the accuracy measures are calculated with.
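The split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the monthly frequency and the exact number of observations are mine, not taken from the R code used for the post.

```python
def split_with_horizon(n_obs, train_share=0.65, horizon=120):
    """Split observation indices into train, overlap window and clean test set."""
    n_train = int(n_obs * train_share)
    clean_start = min(n_train + horizon, n_obs)
    train = range(0, n_train)
    # The last training targets are ten-year returns realized inside this
    # window, so, following the post's reasoning, it is kept out of the
    # accuracy measures.
    overlap = range(n_train, clean_start)
    clean_test = range(clean_start, n_obs)
    return train, overlap, clean_test


# Assumption: monthly data covering 63 full years (1948 through 2010).
n_obs = 63 * 12
train, overlap, clean_test = split_with_horizon(n_obs)
print(len(train), len(overlap), len(clean_test))
```

In the plots, the boundary between `train` and `overlap` would correspond to the gray line and the boundary between `overlap` and `clean_test` to the red line.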

We'll use five different models plus a combination model, which is the average of these models. In addition, we'll include an ETS model that is not part of the combination model, just to see how using the return of the last ten years as a reference point, which is something that people do, can be dangerous. It is also included to show that there are no clear patterns in the data, such as seasonalities or trends, that can be used to make the forecasts. Some of the models are quite complicated, so we won't go into too much detail in this post, but they all operate on the same idea that information about the future values of the variable to be forecasted exists in the regressors, but also in the past values of the said variable.

Let's fit the models and take a look at the results:

The Prophet model by Facebook, the neural network-based nnetar and the TSLM, which takes into account just the regressors, seasonality and trend, seem to be the most inaccurate models if we exclude the ETS model. The nnetar model has the narrowest prediction intervals, which are however quite inaccurate, perhaps because neural network models usually require much more data. In contrast, the Vector Autoregression and ARIMA models both seem to be quite accurate and have realistic prediction intervals. Of these two, the Vector Autoregression model seems to be more confident about its predictions, most likely because the model is more complicated than ARIMA. Notice that the combination forecast has not been plotted alongside the other models just yet.

The ETS model shows that forecasting based on the past ten-year return would have been horribly wrong. The forecast was made when the ten-year return was at nearly an all-time high, yet the first actual ten-year return was very close to its all-time low. Using the average ten-year return instead of the latest one would have resulted in more accurate forecasts due to mean reversion in the ten-year return.

Let's then take a look at the accuracy measures of all the models on the non-biased test set, sorted from the most to the least accurate based on Mean Absolute Error:

Model ME RMSE MAE MPE MAPE MASE R-squared
Combination -0.0033158 0.0183 0.0129 -0.3281969 1.24 0.679 0.896
ARIMA 0.0114286 0.0363 0.0176 0.9895734 1.6 0.926 0.739
VAR 0.0009312 0.0282 0.0234 0.0740454 2.23 1.23 0.768
nnetar 0.0241653 0.0417 0.0389 2.1628762 3.64 2.05 0.744
Prophet -0.0380064 0.0474 0.0417 -3.5044745 3.87 2.19 0.661
TSLM -0.0150980 0.0609 0.0443 -1.3630049 4.17 2.33 0.458
ETS -0.1346834 0.142 0.135 -12.9110549 12.9 7.09 NA

The combination forecast was the most accurate, which is not surprising considering that some of the models were biased upwards and some downwards from the actual values. The magnitude and direction of the bias can be seen from the Mean Error. The MAE directly tells us how many percentage points the forecast was off on average. So if the actual ten-year return had been ten percent, the forecast would on average have been just 1.29 percentage points off, which is almost twice as good as the accuracy reached by the machine learning ensemble model that was introduced in the previous post. The combination forecast also has fewer large errors, as shown by the considerably lower RMSE compared to the other models. It also has a much higher correlation with the actual values, as shown by the R-squared.
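To make the measures concrete, here is a minimal sketch of Mean Error, MAE and RMSE, and of how averaging an upward-biased and a downward-biased forecast reduces the error. The numbers are hypothetical and are not the data behind the table, and the sign convention (error = actual minus forecast) is an assumption.

```python
import math


def errors(actual, forecast):
    # Convention assumed here: error = actual - forecast, so a negative
    # mean error means the forecasts were too high on average.
    return [a - f for a, f in zip(actual, forecast)]


def mean_error(actual, forecast):
    e = errors(actual, forecast)
    return sum(e) / len(e)


def mae(actual, forecast):
    e = errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def rmse(actual, forecast):
    e = errors(actual, forecast)
    return math.sqrt(sum(x * x for x in e) / len(e))


# Hypothetical annualized ten-year returns, not the data used in the post.
actual = [0.04, 0.06, 0.08]
too_high = [0.06, 0.07, 0.11]
too_low = [0.03, 0.04, 0.06]
combination = [(h + l) / 2 for h, l in zip(too_high, too_low)]

for name, fc in [("too_high", too_high), ("too_low", too_low),
                 ("combination", combination)]:
    print(name, round(mean_error(actual, fc), 4),
          round(rmse(actual, fc), 4), round(mae(actual, fc), 4))
```

Because the two hypothetical models err in opposite directions, their average ends up closer to the actual values than either of them.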

It would be possible to further increase the accuracy of the combination model by combining it with pure machine learning models such as XGBoost, which were tested in the post mentioned before. Another way would be not including all the models but rather some combination of them, such as the ones with the highest accuracies and opposite biases. This would however require further splitting the data and is beyond the scope of this post.

Lastly, we'll forecast the future returns using all the models, but we'll plot just the results of the combination model. The models will now be trained with the full data (training + test sets) so that we can get as accurate results as possible. Since the data is not complete for all of the predictors for the past few months, we'll also have to fill the latest values in using data from multpl and FRED, and impute the missing values in between using spline imputation. The results are as follows:

The zoomed part includes the bias-free test set, forecasted by models trained on just the training set, and the future forecasts in blue. During the past few years, the expected return according to the model has increased from less than five percent to over eight percent. The part we are especially interested in is the latest forecast, which is the forecast for the next ten years starting from this moment. These forecasts for the different models are shown in the table below, sorted from lowest to highest:

Model 10-year CAGR forecast
VAR 2.41%
Prophet 7.08%
nnetar 7.68%
Combination 8.35%
TSLM 10.75%
ETS 13.50%
ARIMA 13.82%

The Vector Autoregression model makes the lowest forecast, while the ARIMA model has the highest forecast. This is especially interesting since these were the two most accurate models based on the test set, yet they make such different forecasts. The combination forecast is about three percentage points below the historical return of the S&P 500, which is not that bad all things considered.
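As a quick check, the combination row in the table is consistent with a plain average of the five component models, with ETS left out as described earlier. The sketch below uses only the figures from the table; the variable names are mine.

```python
# 10-year CAGR forecasts in percent, copied from the table above.
# ETS is excluded because it is not part of the combination model.
forecasts = {
    "VAR": 2.41,
    "Prophet": 7.08,
    "nnetar": 7.68,
    "TSLM": 10.75,
    "ARIMA": 13.82,
}

combination = sum(forecasts.values()) / len(forecasts)
spread = max(forecasts.values()) - min(forecasts.values())

print(round(combination, 2))  # average of the five models
print(round(spread, 2))       # gap between the highest and lowest forecast
```

The spread of more than eleven percentage points between ARIMA and VAR shows how much the individual models disagree even though their average looks reasonable.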

It is important to note that the future forecasts will most likely be less accurate because some of the predictors, such as the unemployment rate and interest rates, are falling outside their historical ranges. This uncertainty is also visible in the individual model prediction intervals, which are not plotted here. However, compared to pure machine learning models, which suffer from the same problem, the time series models are likely more accurate and more robust to these types of structural changes.

If you liked this post, be sure to follow me on Twitter for updates about new blog posts like this!

The R code used in the analysis can be found here, together with the code for the machine learning models from the previous post.

The predictors PE and TR_CAPE have been excluded from all models except ARIMA, since ARIMA seemed to handle the multicollinearity they cause better than the other models. All the other predictors from the last post were used.

Since nnetar and Prophet make different distributional forecasts than the rest of the models, we cannot compute the prediction intervals as easily for the combination model, so they have been left out of the plot with the combination forecast.

Tuesday, April 21, 2020

The performance of small value stocks in bear markets

If you have ever seen comparisons of investment returns with and without reinvested dividends, you know that the difference gets huge as the investment horizon increases. Wouldn't it be great if you could achieve a similar increase in returns by altering your investment style?

Small cap value stocks have historically achieved much higher returns than typical stocks. Using data from French and Shiller, we can calculate that the average yearly (CAGR) total return with dividends has been 14.3 percent for US small cap value stocks (doubling every 5.2 years) and 10.1 percent for the S&P 500 index (doubling every 7.2 years). This means that it has taken a bit over 18 years for the investment in the small cap value stocks to become twice as large as the same investment in the index. For comparison, without reinvested dividends, small cap value stocks have returned 10.8 percent (doubling every 6.8 years) and the S&P 500 has returned 6.1 percent (doubling every 11.7 years) in the same time period. The outperformance gap has therefore been a bit over four percentage points per year historically, which is about as big as the gap between the investment in the index with and without reinvested dividends.
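The doubling times quoted above follow from the compound growth formula. Here is a small sketch of the arithmetic, using only the CAGR figures from the paragraph; the function names are mine.

```python
import math


def doubling_time(cagr):
    """Years needed for an investment growing at `cagr` to double."""
    return math.log(2) / math.log(1 + cagr)


def relative_doubling_time(cagr_fast, cagr_slow):
    """Years until the faster investment is twice the size of the slower one."""
    return math.log(2) / math.log((1 + cagr_fast) / (1 + cagr_slow))


print(round(doubling_time(0.143), 1))  # small cap value, with dividends
print(round(doubling_time(0.101), 1))  # S&P 500, with dividends
print(round(relative_doubling_time(0.143, 0.101), 1))
```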

The outperformance of small cap value investing doesn't however come without costs. Value investing has been underperforming for the longest time in history (pdf). Long periods of underperformance are not unusual, since the underperformance tends to last for years at a time, especially during bull markets. In addition to value stocks, the strategy relies on small cap stocks, which have also underperformed large cap stocks during the past ten years in the US (source). Now in the latest bear market, the small cap value strategy has also so far performed worse than the index, of course for a good reason, but it raises the question of what the future performance might look like.

The recent underperformance does not necessarily mean that the performance will continue to be weak in this market. In this post I'll demonstrate that the strategy has outperformed the index not only during bear markets but also after their bottoms. Bear markets are defined as markets where the index has fallen more than twenty percent.

The data on small cap value stock returns is from Kenneth French's data library. In the data, the market is split into six parts: small and large companies, and low, medium and high book-to-market companies. We'll use the "old" definition of value, in which the strategy selects companies with the largest book-to-market values. This is the same as selecting companies with the smallest price-to-book values (P/B) while excluding companies with negative book values. We'll use nominal, i.e. non-inflation-adjusted, returns, and we also take reinvested dividends into account. The returns we'll use are monthly returns, which may not capture the shortest bear markets in the data.

Let's first take a look at the returns beginning from the peak preceding the bear market through the ten years following the peak. The green line represents the returns of the small cap value stocks, and the black line represents the returns of the S&P 500, while the black horizontal line represents the boundary for bear markets.


Since the data is monthly and begins in 1926, it captures seven different bear markets. Notice that the month and year of the peak are in the title of each of the seven plots. We can see that the small cap value strategy has outperformed in all of the cases except in the Great Depression, and usually by a wide margin. The average return of the index during the ten years was 79.7 percent, or 6.0 percent annually, while the average return of the small cap value strategy was over double at 217 percent, or 12.2 percent annually. The returns were negative only after two out of the seven bear markets for the index, and only once for the small cap value strategy.

Since small cap value has underperformed the index so far in the latest bear market, we should look at how the strategy has performed from the bottoms of the bear markets instead of looking at data beginning at the previous peaks. Below is the same plot but with the bottom of the bear market as the starting point of each plot. This time the title represents the month and the year the market bottomed in instead of when it peaked.

We of course cannot know in advance when the bottom will be; the intention is rather to show whether the performance is weaker after the bottom than during the decline before it. All of the returns were positive from the bottoms of the bear markets, which means that none of the bear markets were followed by another bear market that would have exceeded the previous decline. The average return for the index for the ten years following the bottom was 242 percent, or 13.1 percent annually, and the average return for the small cap value strategy was an astonishing 592 percent, or 21.3 percent annually. This means that the investment would have doubled every 3.6 years! The only time the small cap value strategy underperformed compared to the index was the bear market that bottomed in 2009, but the underperformance wasn't too substantial.
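The annual figures quoted in this post can be recovered from the ten-year cumulative returns with the usual CAGR formula. Below is a short sketch of that conversion using the numbers from the text; the function name is mine.

```python
def annualize(total_return, years=10):
    """Convert a cumulative return (0.797 = 79.7 %) into a yearly CAGR."""
    return (1 + total_return) ** (1 / years) - 1


# Cumulative ten-year returns quoted in the post.
cumulative = {
    "index, from peak": 0.797,
    "small cap value, from peak": 2.17,
    "index, from bottom": 2.42,
    "small cap value, from bottom": 5.92,
}

for label, total in cumulative.items():
    print(label, round(annualize(total) * 100, 1))
```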

As can be seen from the plots, small cap value is quite a volatile strategy, which in addition tends to underperform for years at a time. It has however been one of the most profitable strategies, especially right after a bear market has bottomed. The outperformance has been slightly stronger during the downturns (in relative terms) than from the bottom, achieving a return of almost double that of the index. Since the strategy has outperformed every time except once from the bottoms of the last seven bear markets, it has a good chance of outperforming the index this time, too. This, together with the fact that the returns of the strategy tend to mean revert, i.e. periods of outperformance tend to follow periods of underperformance, gives the strategy one of the best starting points for the following decade.

Be sure to follow me on Twitter for updates about new blog posts like this!

The R code used in the analysis can be found here.

Notice that the S&P 500 index has been reconstructed by Shiller for the years it did not exist yet.

The file which includes the small cap value returns in the French data library is called "6 Portfolios Formed on Size and Book-to-Market (2 x 3)".

Notice that the choice of using total return data changes the definition of a bear market and its bottom a bit, but the same seven bear markets would be found also with the price return data.

Monday, March 16, 2020

A look at past bear markets and implications for the future

The S&P 500 is officially in a bear market, and the crash from the high valuation levels has been fast and painful. There is however light at the end of the tunnel. In this post I'll demonstrate how the US stock market has developed during past bear markets and how the market has recovered during the ten years after the peak.

I chose ten years as the horizon because I believe that you should not invest in stocks any money that you are going to need in the next ten years. The chance of having positive returns increases substantially with time and is almost ninety percent for a period of ten years. The worst annual return for a ten-year period has been about negative four percent since 1928 (sources).

We'll use monthly total return data of the S&P 500 from Shiller, beginning from the year 1871 and running until the very end of last year. The index has been reconstructed to represent the US stock market for dates when the S&P 500 didn't exist yet. The reason why we go so far back in time is to include as many bear markets as possible. Panics and manias have always existed, and human nature has not changed enough in the past 150 years to make the past data less valid. There has however been a substantial change in the spread of information, which causes panic to spread faster and may possibly make bear markets shorter and deeper.

First, let's take a look at the 14 bear markets found in the data in nominal terms, which describes how a portfolio would have developed without taking inflation into account. The horizontal black line indicates the drop needed to reach a bear market at minus 20 percent, and a blue color indicates that the return has been positive in the 10 years following the peak i.e. the ending value is higher than the value at the peak, and a red color indicates the opposite.

Only two of the fourteen bear markets did not recover in ten years from the initial peak. Not surprisingly, the two bear markets were the ones that peaked in bubble territory in 1929 and 2000. Notice that the bear markets that peaked in 1919 and 1987 were followed by these very same bubbles.

Below is the same plot with real returns, so the returns describe the actual change of purchasing power by taking inflation into account. Notice that since bear markets are defined as being down by twenty percent in nominal terms, the returns might not dip below the black line because of deflation.

In real terms, four of the fourteen bear markets did not recover within ten years of the peak. Judging by history, this still leaves us with an over 70 percent chance of the index being higher, after inflation, ten years after the peak. Note that the bear market that peaked in 1968 overlaps heavily with the bear market that peaked in 1972, so they could be considered to be the same bear market, which would increase our chances even further.

Let's then plot the bear markets in red on top of the index to get a sense of the lengths of the bear markets, from peak to full recovery.

The average length of a bear market from peak until recovery has been 3.95 years, and the average fall from peak to bottom, i.e. the peak-to-trough time, has been 1.45 years. The longest bear market, during the 1930s Great Depression, lasted 15.33 years, and the longest time the stock market fell was 2.75 years.

Lastly, let's take a look at just the drawdowns. The bear market threshold is again indicated with a black horizontal line. The monthly data is only until the end of the year 2019, so the recent drawdown of early 2020 is missing from the graph. At the time of writing, the index is down 27 percent, with only seven of the historical drawdowns being as severe as this one.
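The drawdowns and bear market episodes discussed here can be computed from a price series with a running peak. The sketch below is my own illustration, not the R code behind the plots, and the price series is hypothetical. It assumes a bear market is an episode where the index falls more than twenty percent below its previous peak and ends when the peak is regained.

```python
def drawdowns(prices):
    """Return the decline from the running peak for every observation."""
    peak = prices[0]
    result = []
    for price in prices:
        peak = max(peak, price)
        result.append(price / peak - 1.0)
    return result


def bear_markets(prices, threshold=-0.20):
    """Return (peak_idx, trough_idx, recovery_idx) for each bear market.

    recovery_idx is None when the series ends before the peak is regained.
    """
    episodes = []
    peak_idx = trough_idx = 0
    for i, price in enumerate(prices):
        if price >= prices[peak_idx]:
            # Peak regained: record the finished episode if it was deep enough.
            if prices[trough_idx] / prices[peak_idx] - 1.0 < threshold:
                episodes.append((peak_idx, trough_idx, i))
            peak_idx = trough_idx = i
        elif price < prices[trough_idx]:
            trough_idx = i
    if prices[trough_idx] / prices[peak_idx] - 1.0 < threshold:
        episodes.append((peak_idx, trough_idx, None))
    return episodes


# Hypothetical monthly index levels, not Shiller's data.
prices = [100, 110, 95, 80, 90, 112, 120, 90]
print([round(d, 3) for d in drawdowns(prices)])
print(bear_markets(prices))
```

With monthly data, the difference between the recovery and peak indices divided by twelve gives the length of a bear market in years, and the trough minus the peak gives the peak-to-trough time.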

The average drop in a bear market using monthly data has been 33.9 percent, with a maximum of 81.8 percent during the 1930s. Notice again that these are total returns. The drawdowns have been worse during periods with high valuations, as measured by Shiller CAPE or P/B. The maximum drawdowns also seem to have increased with time, which may be caused by lower valuations at the beginning of the time frame and possibly also by people being more connected than ever, which makes the spread of panic easier.

To conclude, this bear market has been rough and short thus far. However, judging by history, most bear markets recover fully in ten years. The valuations, which are still elevated compared to history, may however keep the index from recovering as much as in past bear markets.

Be sure to follow me on Twitter for updates about new blog posts like this!

The R code used in the analysis can be found here.

Tuesday, December 31, 2019

Predicting the next decade in the stock market

Making accurate predictions using the vast amount of data produced by the stock markets and the economy itself is difficult. In this post we will examine the performance of five different machine learning models and predict the future ten-year returns for the S&P 500 using state of the art libraries such as caret, xgboostExplainer and patchwork. We will use data from Shiller, Goyal and BLS. The training data is between the years 1948 and 1991, and the test data set is from 1991 and only until 2009, because the target variable is lagged by ten years.

Different investing strategies tend to work at different times, and you should expect the accuracy of the model you are using to move in cycles; sometimes the connection with returns is very strong, and sometimes very weak. Value investing strategies are a great example of a strategy that has not really worked for the past twelve years (source, pdf). Spurious correlations are another cause of trouble, since for example two stocks might move in tandem by just random chance. This highlights the need for some manual feature selection of intuitive features.

We will use eight different predictors: P/E, P/D, P/B, the CAPE ratio, total return CAPE, inflation, unemployment rate and the 10-year US government bond rate. All five of the valuation measures are calculated for the entire S&P 500. Let's start by inspecting the correlation clusters of the different predictors and the future ten-year return (with dividends), which is used as the target.

The different valuation measures are strongly correlated with each other, as expected. All except P/B have a very strong negative correlation with the future ten-year returns. CAPE and total return CAPE, which is a new measure that also considers reinvested dividends, are very strongly correlated with each other. Total return CAPE is also slightly less correlated with the future ten-year return than the normal CAPE.

The machine learning models

First, we will create a naïve model which predicts the future return to be the same as the average return in the training set. After training the five models we will also make one ensemble model of them to see if it can reach a higher accuracy than any of the five models, which is usually the case.

The models we are going to use are quite different from each other. The glmnet model is just like the linear model, except it shrinks the coefficients according to a penalty to avoid overfitting. It therefore has very low flexibility and also performs automated feature selection (except if the alpha hyperparameter is exactly zero, as in ridge regression). K-nearest-neighbors makes its predictions by comparing the observation to similar observations. MARS on the other hand takes into account nonlinearities in the data, and also considers interactions between the features. XGBoost is a tree model, which also takes into account both nonlinearities and interactions. It however improves each tree by building it based on the residuals of the previous tree (boosting), which may lead to better accuracies. Both MARS and SVM (support vector machines) are really flexible and therefore may overfit quite easily, especially if the data set is small. The XGBoost model is also quite flexible but does not overfit easily since it performs regularization and pruning.

Finally, we have the ensemble model which simply gives the mean of the predictions of all the models. Ensemble models are a quite popular strategy in machine learning competitions to reach accuracies beyond the accuracy of any single model.

The models will be built using the caret wrapper, and the optimal hyperparameters are chosen using time slicing, which is a cross-validation technique that is suitable for time series. We will use five time slices to capture as many periods as possible while having enough observations in each of them. We will do the cross-validation on the training data, which consists of 70 percent of the data, while keeping the remaining 30 percent as a test set. The results are shown below:
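The idea behind time slicing can be sketched as follows. This is a generic expanding-window illustration and not caret's own implementation; the function name, the slice sizes and the number of observations are assumptions made for the example.

```python
def time_slices(n_obs, n_slices, horizon):
    """Build expanding-window slices for time series cross-validation.

    Each slice trains on all observations before a cutoff and validates on
    the next `horizon` observations, so the model never sees the future.
    """
    first_train = n_obs - n_slices * horizon
    if first_train <= 0:
        raise ValueError("not enough observations for the requested slices")
    slices = []
    for k in range(n_slices):
        train_end = first_train + k * horizon
        train_idx = range(0, train_end)
        valid_idx = range(train_end, train_end + horizon)
        slices.append((train_idx, valid_idx))
    return slices


# Hypothetical sizes: 100 training observations, five slices of ten points.
for train_idx, valid_idx in time_slices(100, n_slices=5, horizon=10):
    print(len(train_idx), valid_idx.start, valid_idx.stop)
```

Each candidate set of hyperparameters would be scored on every validation window, and the set with the best average score would be kept.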


The predictions are less accurate after the red line, which separates the training and test sets. The model has not seen the data on the right side of the line, so its accuracy can be thought of as a proxy for how well the model would perform in the future.

We will examine the model accuracies on the test set by using two measures: mean absolute error (MAE) and R-squared (R²). The results are shown in the table below:

Model MAE R²
Naive model 5,16 % -
Ensemble 2,15 % 48,2 %
GLMNET 3,00 % 29,7 %
KNN 3,37 % 10,6 %
MARS 10,70 % 90,2 %
SVM 10,80 % 13,1 %
XGBoost 2,17 % 60,1 %

The two most flexible models, MARS and SVM, behave wildly on the test set and show signs of overfitting. Both of them have mean absolute errors that are about twice as high when compared to the naïve model. Even though MARS has a high R-squared, the mean absolute error is high. This is why you cannot trust R-squared alone. Glmnet has quite plausible predictions until the year 2009, most likely because of the rapid growth of the P/E ratio. K-nearest-neighbors has not reacted to the data too much but still achieves a quite low MAE. Out of the single models, the XGBoost has performed the best. The ensemble model however has performed slightly better as measured by the MAE. It also seems to be the most stable model, which is expected since it combines the predictions of the other models.
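The point about R-squared can be illustrated with a toy example. The sketch assumes that R-squared is computed as the squared correlation between predictions and actual values, which is one common convention, and all numbers are hypothetical.

```python
def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def r_squared(actual, pred):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(pred) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, pred))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_a * var_p)


# Hypothetical annualized returns, not the data from the post.
actual = [0.02, 0.05, 0.08, 0.11]
# Moves in lockstep with the actual values but at the wrong level and scale.
wild = [3 * a - 0.2 for a in actual]
# Barely reacts to the data but stays close to the right level.
calm = [0.06, 0.07, 0.06, 0.07]

print(round(r_squared(actual, wild), 3), round(mae(actual, wild), 3))
print(round(r_squared(actual, calm), 3), round(mae(actual, calm), 3))
```

The first hypothetical model has a near perfect R-squared and still misses by far more on average than the second one, which is the same pattern that MARS shows in the table.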
Let's then look at the feature importances. They are calculated in different ways for the different model types but should still be somewhat comparable. The plotting is done using the library patchwork, which allows plotting to be done by just adding the plots together using a plus sign.

Upon closer inspection of the feature importances, we see that the MARS model uses just the CAPE ratio as a feature, while the rest of the models use the features more evenly. Most of the models perform some sort of feature selection, which can also be seen from the plot.

Future predictions

Lastly, we will predict the next ten years in the stock market and compare the predictions of the different models. We will also look closer at the best performing single model, XGBoost, by inspecting the composition of the prediction. The current values of the features are mostly obtained from the sources listed in the first chapter, but also from Trading Economics and multpl.

Model 10-year CAGR prediction
Ensemble 2,20 %
GLMNET 1,47 %
KNN 4,04 %
MARS -9,85 %
SVM 6,46 %
XGBoost 8,86 %

The MARS model is the most pessimistic, with a return prediction that is quite strongly negative. The model should however not be trusted too much since it uses only one variable and does not behave well on the test data. The XGBoost model is surprisingly optimistic, with a prediction of almost nine percent per year. The prediction of the ensemble model is quite low but would be three percentage points higher without the MARS model.
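The effect of the MARS model on the ensemble is easy to verify from the table, since the ensemble is the mean of the five single-model predictions. A small sketch with the figures from the table follows; the function name is mine.

```python
# 10-year CAGR predictions in percent, copied from the table above.
predictions = {
    "GLMNET": 1.47,
    "KNN": 4.04,
    "MARS": -9.85,
    "SVM": 6.46,
    "XGBoost": 8.86,
}


def ensemble_mean(preds, exclude=()):
    """Average the predictions, optionally leaving some models out."""
    kept = [value for name, value in preds.items() if name not in exclude]
    return sum(kept) / len(kept)


with_mars = ensemble_mean(predictions)
without_mars = ensemble_mean(predictions, exclude=("MARS",))
print(round(with_mars, 2), round(without_mars, 2),
      round(without_mars - with_mars, 2))
```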

Let's then look at the XGBoost model more closely by using the xgboostExplainer library. The resulting plot is a waterfall chart which shows the composition of a single prediction, in this case the predicted CAGR (plus one) for the next ten years. The high CAPE ratio reduces the predicted CAGR by seven percentage points, but the P/B ratio increases it by six percentage points. This is because the model contains interactions between the CAPE and P/B ratios. The effect of the interest rate level is just a bit positive at two percentage points, but the currently high P/E ratio reduces it back to the same level. The rest of the features have a very small effect on the prediction.

The benefit of predicting the returns of a single stock market is mostly limited to the fact that you can adjust your expectations for the future. However, predicting the returns of multiple stock markets and investing in the ones with the highest return predictions is most likely a very profitable strategy. Klement (2012) has shown that the CAPE ratio alone does quite a good job of predicting the returns of different stock markets. Adding more sensible variables to the model is likely to make the model more stable and perhaps better at predicting the outcome.

Be sure to follow me on Twitter for updates about new blog posts like this!

The R code used in the analysis can be found here.

Wednesday, July 17, 2019

Combining momentum and value into a simple strategy to achieve higher returns

In this post I'll introduce a simple investing strategy that is well diversified and has been shown to work across different markets. In short, buying cheap and uptrending stocks has historically led to notably higher returns. The strategy is a combination of these two different investment styles, value and momentum. In a previous post I explained how the range of possible outcomes when investing in a single market is excessively wide. Therefore, global diversification is the key to ensuring that you achieve your investment objective. This strategy is diversified across strategies, markets and different stocks. The benefits of this strategy are the low implementation costs, a high diversification level, higher expected returns and lower drawdowns.

We'll use data from Barclays for the CAPEs which represent valuations, and Yahoo Finance using quantmod for the returns that do not include dividends, which we'll use as absolute momentum. Let's take a look at the paths of valuation and momentum for the U.S. stock market for the last seven years:

The two corrections are easy to spot, because momentum was low, and valuations decreased. The U.S. stock market currently has a strong momentum as measured by six-month absolute return, but the valuation level is really high. Therefore the U.S. is not the optimal country to invest in. So, which market is the optimal place to be? Let's look at just the current values of different markets:

There is only one market that is just in the right spot: Russia. It has the highest momentum and second lowest valuation of all the countries in this sample. In emerging markets things happen faster and more intensively, which leads to more opportunities and makes investing in them more interesting. Different markets also tend to be in different cycles, which makes this combination strategy even more attractive. Let's discuss more about these strategies and why they work well together.
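One simple way to turn the two measures into a single ordering is to rank the markets on each and add the ranks. This is my own illustrative choice and not necessarily how the plot was read in the post. The markets, CAPE values and prices below are hypothetical placeholders.

```python
def six_month_momentum(monthly_prices):
    """Absolute price return over the last six months."""
    return monthly_prices[-1] / monthly_prices[-7] - 1.0


def rank(values, reverse=False):
    """Rank names by value; rank 1 is the smallest (or largest if reverse)."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Hypothetical markets: a CAPE ratio and seven month-end index levels each.
markets = {
    "A": {"cape": 30.0, "prices": [100, 102, 104, 107, 110, 112, 115]},
    "B": {"cape": 6.0, "prices": [100, 103, 105, 110, 114, 117, 120]},
    "C": {"cape": 12.0, "prices": [100, 99, 98, 97, 96, 96, 95]},
}

cape_rank = rank({name: m["cape"] for name, m in markets.items()})
momentum_rank = rank(
    {name: six_month_momentum(m["prices"]) for name, m in markets.items()},
    reverse=True,
)
combined = {name: cape_rank[name] + momentum_rank[name] for name in markets}
best = min(combined, key=combined.get)
print(combined, best)
```

A low combined rank means that the market is both cheap and uptrending, which is the corner of the plot the strategy looks for.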

Research on the topic

Value and momentum factors are negatively correlated, which means that when one has low returns, the other's returns tend to be higher. Both have been found to lead to excess returns and are two of the most researched so-called anomalies. Researchers have tried to explain both strategies using risk-based and behavioral factors, but no single explanation has been agreed on for either of the strategies. The fact that there are multiple explanations for the superior performance can rather be viewed as a good thing for the strategies.

In their book "Your Complete Guide to Factor-Based Investing", Berkin and Swedroe found that the yearly return of a long-short strategy was 4.8 percent for the value anomaly and 9.6 percent for the momentum anomaly. This corresponds to the return of the factor itself and can directly be compared to the market beta factor, which has had a historical annual return of 8.3 percent during the same period. This means that investing just in the momentum factor, and therefore hedging against the market, would have led to a higher return than just investing in the market. It is important to notice that investing normally using just a momentum strategy without shorting gives exposure to both the market beta and momentum factors, which leads to a higher return than investing in either of these factors alone.

Andreu et al. examined momentum on the country level and found that the return of the momentum factor has been about 6 percent per annum for a holding period of six months. For a holding period of twelve months, the return was cut in half (source). A short holding period seems to work best for this momentum strategy. They studied investing in a single country or in three countries at a time while shorting the same number of countries. The smaller number of countries led to higher returns, but no risk measures were presented in the study. As a short-term strategy, I'd suggest equal weighting some of the countries with high momentum and low valuation. I've also tested the combination of value and momentum in the U.S. stock market, and it seems that momentum does not affect the returns at all over longer periods of time.

Value, on the other hand, tends to correlate strongly with future returns only over much longer periods, and over shorter periods the correlation is close to zero, as I demonstrated in a previous post. However, the short-term CAGR of the value strategy on the country level in the U.S. has still been rather impressive at 14.5 percent for a CAPE ratio of 5 to 10, as shown by Faber (source, figure 3A). I chose to show this specific valuation level, since countries such as Turkey and Russia are currently trading at these valuation levels (source).

The 10-year cyclically adjusted price-to-earnings ratio that was discussed in the previous chapter, also known as CAPE, has been shown to be among the best variables for explaining the future returns of the stock market. It has a logarithmic relationship with future 10-15 year returns, and an r-squared as high as 0.49 across 17 country-level indices (source, page 11). A lower CAPE has also led to smaller maximum and average drawdowns (source).
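To make the logarithmic relationship concrete, here is a minimal Python sketch (not the R code used for this post) that regresses future returns on the natural log of CAPE with ordinary least squares and reports the r-squared. The function name `fit_log_cape` and the example numbers are made up for illustration; they are not real market data.

```python
import math

def fit_log_cape(capes, future_returns):
    """Fit future_return = intercept + slope * ln(CAPE) by least squares.

    Returns (intercept, slope, r_squared).
    """
    x = [math.log(c) for c in capes]
    y = list(future_returns)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical inputs: returns constructed to be exactly linear in ln(CAPE)
example_capes = [5, 10, 20, 40]
example_returns = [0.20 - 0.05 * math.log(c) for c in example_capes]
intercept, slope, r2 = fit_log_cape(example_capes, example_returns)
```

With real data the slope would be expected to be negative (a higher CAPE going with lower future returns), and the r-squared would of course be well below one.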

Faber has shown that investing in countries with a low CAPE has returned 14 percent annually since 1993, and the risk-adjusted return has also been really good (source). The strategy, and value investing as a whole, has however underperformed for the last ten years or so (source). This is good news if you believe in mean reversion in the stock market.

The two strategies work well together on the stock level, as shown by Keimling (source). According to the study, the quintile with the highest momentum has led to a yearly excess return of 2.7 percent, and the one with the lowest valuation has led to a yearly excess return of 3 percent globally. Choosing stocks with the highest momentum and lowest valuations has more than doubled the excess return to 7.6 percent. O'Shaughnessy has shown that the absolute return for the quintile with the highest momentum was 11.6 percent, and 11.8 percent for value. Combining the two led to a return of 18.5 percent (source).

Lastly, let's take a closer look at some selected countries and their paths:

As expected, the returns of the emerging markets vary a lot compared to the U.S. market. The U.S. has performed extremely well, but the historical earnings haven't kept up with the prices. Israel, on the other hand, has gotten cheaper while the momentum has been good. Even though the momentum of the U.S. is higher than at any other point in time in this sample, Russia's momentum currently is, and Turkey's momentum has been, much higher. Both Russia's and Turkey's valuations are less than a third of U.S. valuations, which makes these markets very interesting.

In conclusion, combining value and momentum investing into a medium-term strategy is likely to lead to excess returns, as shown by previous research. The strategy can be easily implemented using country-specific exchange traded funds, and the data is easily available. Currently only Russia is in the sweet spot for this strategy, and Turkey might be once it gains some momentum. Investing in just one country is, however, risky, and I suggest diversifying between the markets with high momentum and low valuations.
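One simple way to turn this into a selection rule is to rank the markets by momentum and by valuation, add the two ranks, and equal weight the best few. The Python sketch below is only one possible reading of the strategy, not the method used in the studies cited above; the function name `combined_rank`, the market labels and all the numbers are hypothetical.

```python
def combined_rank(markets, top_n=2):
    """Equal weight the top_n markets by combined momentum and value rank.

    markets maps a name to (six_month_return, cape).
    Higher momentum and lower CAPE both give a better (smaller) rank.
    """
    names = list(markets)
    by_momentum = sorted(names, key=lambda n: markets[n][0], reverse=True)
    by_value = sorted(names, key=lambda n: markets[n][1])
    score = {n: by_momentum.index(n) + by_value.index(n) for n in names}
    # Ties in the combined score are broken alphabetically
    picks = sorted(names, key=lambda n: (score[n], n))[:top_n]
    return {n: 1.0 / top_n for n in picks}

# Hypothetical inputs for illustration only (not real market data)
example = {
    "A": (0.20, 6.0),    # strong momentum, cheap
    "B": (0.15, 30.0),   # strong momentum, expensive
    "C": (0.02, 5.0),    # weak momentum, cheap
    "D": (-0.05, 18.0),  # negative momentum, mid valuation
}
weights = combined_rank(example, top_n=2)
```

In this made-up example markets A and C each get half of the portfolio. Summing ranks treats the two signals as equally important, which is an assumption rather than something the research above prescribes.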

Be sure to follow me on Twitter for updates about new blog posts!

The R code used in the analysis can be found here.

Saturday, June 8, 2019

The most important chart for long-term investors

Time is the investor's best friend. The longer the investment horizon, the less the investment returns depend on factors such as crashes and current valuation levels. It is known that the chance of losing money in the stock market over a 20-year period has historically been about zero. This post attempts to expand on this fact and take a look at how risky the U.S. stock market has actually been for long-term investors.

As usual, we'll use data from Robert Shiller to answer the questions. The data begins from the year 1871, long before the actual S&P 500 index was created. We'll only consider lump-sum investing, since dollar cost averaging is another story.

Let's first look at the inflation-adjusted returns for a U.S. investor, including reinvested dividends. Keep in mind that the U.S. stock market has been one of the best performing in the world, and future returns are likely to be lower because of high valuations and lower productivity and population growth. The upper and lower bands are the 95 percent prediction intervals, i.e. 95 percent of the time the investment return has been between these bands. The y-axis tells how many times your investment would have been multiplied. Notice that the axis is logarithmic.

This chart demonstrates how uncertain investing is. The range of outcomes is very large, but it doesn't necessarily tell the full truth. France once had a 66-year period where stocks didn't beat inflation, for Italy the longest streak was 73 years and for Austria a painful period of 97 years. This is why global diversification is important. There has however been a 5 percent chance that the investment would have increased 64-fold in the U.S. for the same period. The risk works both ways.
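The bands described above can be approximated with empirical percentiles of the historical wealth multiples for each horizon. The Python sketch below assumes a total return index (real or nominal, with dividends reinvested) sampled at regular intervals; the function names and the toy index are my own illustration, and the actual analysis may have built the intervals differently.

```python
def multiples(index, horizon):
    """Wealth multiple of a lump sum held for `horizon` periods, for every start date."""
    return [index[t + horizon] / index[t] for t in range(len(index) - horizon)]

def percentile(values, q):
    """Empirical percentile (q between 0 and 1) with linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def band(index, horizon, coverage=0.95):
    """Lower bound, median and upper bound of the multiples for one horizon."""
    m = multiples(index, horizon)
    tail = (1 - coverage) / 2
    return percentile(m, tail), percentile(m, 0.5), percentile(m, 1 - tail)

# Toy index that doubles every period (illustration only)
toy_index = [1, 2, 4, 8, 16]
lower, median, upper = band(toy_index, horizon=2)
```

Repeating `band` for every horizon from one period up to the longest one of interest gives the three lines of the chart.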

Let's also look at the nominal, non-inflation-adjusted returns to see how inflation eats returns:

The inflation in the U.S. has been quite high, over three percent annually. Inflation of course affects different companies in a different way, but the net effect is that lower inflation does not necessarily lead to higher inflation-adjusted returns.

The R code used in the analysis can be found here.

The original post had a problem with the calculation of dividends. The charts and code have now been updated, and the true returns were higher than in the original post. Sorry for the inconvenience.

Tuesday, January 29, 2019

Correlation analysis of cyclically adjusted valuation measures and subsequent returns

In this post we'll test three different cyclically adjusted valuation measures: CAPE (earnings), CAPD (dividends) and CAPB (book value). CAPE is calculated like the P/E ratio, but by dividing the current real price by the average inflation-adjusted earnings of the last ten years. CAPD uses dividends instead of earnings, and CAPB uses book value. We'll test the optimal measurement (i.e. forward-looking return) period and formation period for all three valuation measures by calculating correlations with the future returns.

Typically CAPE, also known as P/E10, is calculated by using a 10-year formation period. The maximum length we'll use for both the formation period and the measurement period is 30 years, which means that the performance of, for example, P/E1-P/E30 will be tested by looking forward 1-30 years.
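As a rough illustration of the formation period, here is a small Python sketch (not the R code behind this post) that divides the real price by the average of the last N years of a real fundamental. It assumes annual, already inflation-adjusted series; the function name `cyclically_adjusted_ratio` and the example values are hypothetical.

```python
def cyclically_adjusted_ratio(real_prices, real_fundamentals, formation_years=10):
    """Real price divided by the trailing average of a real fundamental.

    Pass earnings for CAPE, dividends for CAPD or book value for CAPB.
    Entries without a full formation window are returned as None.
    """
    ratios = []
    for t in range(len(real_prices)):
        if t + 1 < formation_years:
            ratios.append(None)
            continue
        window = real_fundamentals[t + 1 - formation_years : t + 1]
        ratios.append(real_prices[t] / (sum(window) / formation_years))
    return ratios

# Made-up annual values for illustration
example_prices = [100, 100, 100]
example_earnings = [4, 5, 6]
pe3 = cyclically_adjusted_ratio(example_prices, example_earnings, formation_years=3)
```

Setting `formation_years=1` gives an ordinary one-year ratio, and looping the parameter from 1 to 30 produces the P/E1-P/E30 family mentioned above.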

We'll use Shiller and Goyal data from the US, both of which begin in the year 1871. We'll plot the measurement period (in years) on the x-axis and r-squared on the y-axis, and we'll draw a distinct line for each of the formation periods.

As you can see from the plot above, a measurement period of about ten years, or maybe a little more, has worked the best for CAPE. As expected, valuation measures don't do a good job explaining short-term returns. However, this also applies to long-term returns, which gives the lines a bell-curved shape. The lines with shorter formation periods are lower than the rest, which means that short-term valuation measures such as normal trailing twelve-month P/E also don't work as well as the long-term valuation measures.
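Each line in such a plot could be computed roughly as follows. This Python sketch takes one valuation series and a total return index, both annual and of equal length, and returns the r-squared between the valuation and the subsequent annualized return for each measurement period. The names and the sample numbers are assumptions made for illustration, and the r-squared here is simply the squared Pearson correlation.

```python
def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def r_squared_by_measurement_period(valuations, total_return_index, max_years):
    """Map each measurement period (in years) to its r-squared.

    The series must be longer than max_years + 1 observations.
    """
    result = {}
    for years in range(1, max_years + 1):
        n = len(total_return_index) - years
        forward = [
            (total_return_index[t + years] / total_return_index[t]) ** (1 / years) - 1
            for t in range(n)
        ]
        result[years] = r_squared(valuations[:n], forward)
    return result

# Made-up annual data for illustration only
sample_valuations = [10, 12, 15, 20, 18, 14, 11, 9]
sample_index = [100, 110, 115, 118, 117, 125, 140, 160]
grid = r_squared_by_measurement_period(sample_valuations, sample_index, max_years=3)
```

Calling the function once per formation period, with the valuation series recomputed each time, gives the full set of lines.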

For CAPD, the correlation turns positive at long measurement periods, which is rather unwanted. The better performance of the long-term and the worse performance of the short-term valuation measures are more apparent with CAPD.

For CAPB, longer measurement periods of about twenty years seem to work the best. The r-squared is much larger than with CAPE or CAPD. Even the worst formation periods seem to work better in explaining future returns than the CAPE with the best formation period. This is consistent with Keimling's research (pdf, page 16), which suggests that normal P/B is almost as strong in predicting future returns as CAPE. The plot above shows that the cyclically-adjusted P/B is even stronger than CAPE in predicting future returns.

The r-squared of the CAPE is lower than what is often quoted because of the long time period of the data. As you can see from the plot below, the rolling 10-year correlation of CAPE and subsequent returns has been rising over time.

Another way of viewing these correlations is bringing them into Excel and color coding them. Notice that we are now using simple correlations instead of the r-squared. The x-axis tells the formation period of the valuation measure, and y-axis tells the measurement period i.e. how long into the future the valuation measure is used to predict.

The 10-year CAPE has surprisingly high explanatory power even for forecasting 1-year periods. The explanatory power starts declining noticeably from P/E8 to the left and P/E14 to the right.

For CAPD, the correlations are weaker, but the shape is about the same.

CAPB has the strongest correlations with future returns, but the shape is very different. Interestingly, the strongest explanatory power regarding future returns comes from the 22-24-year P/B with a measurement period of 5-15 years.

This post was partly inspired by the O’Shaughnessy Quarterly Investor Letter Q4 2018.

The R code used in the analysis can be found here.

Sunday, October 28, 2018

How quickly do stock market valuations revert back to their means?

Mean reversion is the assumption that things tend to revert back to their means in the long run. This is especially true for valuations and certain macroeconomic variables, but not so much for stock prices themselves. In this post we'll look at the mean reversion of different valuation measures by forming equal sized baskets from each valuation decile and letting the valuations change as time goes on.
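The basket approach can be sketched in a few lines of Python. The version below works on a single valuation series: every start date is sorted into a basket by its starting valuation, and the average valuation of each basket is then followed forward in time. The function name `decile_paths` and the toy series are assumptions for illustration; the original study and my R code may handle dates and countries differently.

```python
def decile_paths(valuations, horizon, n_baskets=10):
    """Average valuation path of each basket over the next `horizon` periods.

    Basket 0 holds the start dates with the lowest valuations and the last
    basket holds the highest. Each path has horizon + 1 points, starting
    at the formation date.
    """
    starts = range(len(valuations) - horizon)
    if len(starts) < n_baskets:
        raise ValueError("not enough observations for this many baskets")
    ordered = sorted(starts, key=lambda t: valuations[t])
    size = len(ordered) / n_baskets
    paths = []
    for b in range(n_baskets):
        members = ordered[int(b * size) : int((b + 1) * size)]
        paths.append(
            [sum(valuations[t + k] for t in members) / len(members)
             for k in range(horizon + 1)]
        )
    return paths

# Toy series that flips between cheap and expensive every period
toy_valuations = [10, 20, 10, 20, 10, 20, 10, 20, 10]
toy_paths = decile_paths(toy_valuations, horizon=1, n_baskets=2)
```

In the toy series the cheap basket goes from 10 to 20 and the expensive basket from 20 to 10 in one step, which is an extreme case of the reversion the charts in this post show over many years.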

This study (pdf) shows an interesting graph on page 23 about the mean reversion of the 10-year price-to-earnings ratio, also known as CAPE. In this post, the study will be replicated using international CAPE, P/E and P/B as well, and with a longer time frame of twenty years. Let's start with CAPE using Shiller data of the US stock market from years 1926 to 2008:

With a longer time frame, over reversion becomes visible, i.e. high valuations tend to eventually lead to low valuations and vice versa. The only exception is the decile with the highest valuation, which is explained by the housing bubble after the tech bubble. The valuations seem to revert back to their means in 11-12 years.

Let's look at the mean reversion of the same metric using Barclays data from years 1982 to 2008 from 26 different countries or continents:

The mean reversion happens again in about 12 years, but the over reversion seems to disappear. This might be caused by the US having different kinds of bubbles and busts than the rest of the world, or by the shorter time period. The dataset is many times larger and should give a clearer picture of the mean reversion than using only US data.

Next, we'll look at price-to-book:

It seems to take longer for the P/B to revert back to its mean, which is logical since CAPE uses historical 10-year earnings. There is however still some noticeable over reversion.

Let's look at price-to-earnings ratio next:

The P/E ratio seems to revert back to its mean a little bit quicker than the rest, in about 9-10 years. There is still some over reversion.

In summary, different valuation measures tend to revert back to their means in about ten years, and over revert after that.

Hope you enjoyed this short post. Be sure to follow me on Twitter for updates about new blog posts!

The R code used in the analysis can be found here.

Monday, August 6, 2018

Mapping the stock market using self-organizing maps

Self-organizing maps are an unsupervised learning approach for visualizing multi-dimensional data in a two-dimensional plane. They are great for clustering and for finding correlations in the data. In this post we apply self-organizing maps to historical US stock market data to find interesting correlations and clusters. We'll use data from Shiller, Goyal and BLS to calculate the historical valuation levels, interest rates, inflation rates, unemployment rates and future ten-year total real returns from years 1948 to 2008.

You can see a clear correlation between the different valuation measures, and that low valuations have led to high returns. There's a slight negative correlation between the valuation measures and unemployment, i.e. valuations have been higher when unemployment has been lower. Charlie Bilello has a great article on the subject. There's also a positive correlation between unemployment and rates, which means that rates have typically been higher when unemployment has been higher.
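For readers who haven't seen how a self-organizing map is trained, here is a bare-bones Python sketch of the idea on a rectangular grid. It is not the implementation used for the maps above, and the learning-rate and neighbourhood schedules, the function names and the toy data are all my own assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every node from a randomly chosen observation
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighbourhood shrinks, floored at 0.5
        for x in rng.sample(data, len(data)):    # visit observations in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Toy data with two obvious groups (illustration only)
toy_data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
som = train_som(toy_data, rows=1, cols=2)
```

With real data each observation would be one month's standardized variables (valuations, rates, inflation, unemployment), and the grid would be much larger than the two nodes used in this toy example.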

Next, let's look at clusters formed using hierarchical clustering. We'll form four clusters on the same plane as used in the above analysis. Let's look at the results:

The balls inside each hexagon correspond to individual months. We are currently in the green cluster, which has typically led to low returns. Why have low unemployment, low rates and low inflation led to low returns? Aren't these things good for the stock market? I see two possible causes: these conditions tend to revert back to their means (which means worsening macroeconomic conditions), and investors tend to extrapolate past returns into the future (a great tweet on the subject by Michael Batnick). The second cause leads to high valuations, which are present in the green cluster.

Which cluster is the best place to be in? I'd say the gray one, but the data seems to support the blue one as well. The good thing is that there are other countries that are in both of these clusters. Even though I recommend looking at valuations alone rather than macroeconomic indicators, a good place worth checking for all that macro stuff is

The R code used in the analysis is available here.

Sunday, July 22, 2018

How likely is a stock market crash?

In this post we'll look at the odds of a stock market crash from the viewpoint of valuation. We'll use my favorite valuation measure, the Shiller P/E or CAPE ratio, which is just like the regular P/E except that it's calculated using the earnings of the last ten years instead of just one year.

According to, the CAPE ratio is currently at 32.57, which is in the 97th percentile when compared to history. We'll perform logistic regressions to calculate the probability of a correction (which is defined to be a decline of over ten percent from all-time highs) and the probability of a crash (a decline of over twenty percent). We'll use data from Robert Shiller to do the analysis. The data is from years 1881 to 2005.

The probability of a correction during the next year is a little bit higher than usual at 25 percent, as you can see at the point where the two lines intersect. Let's look at the probability of a crash next:

The probability of a crash seems to rise exponentially as the valuations rise. However, the probability is lower than I expected, at fifteen percent.
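The definitions and the regression used in this post can be sketched in Python as follows. The code flags months that are more than a given percentage below the running all-time high, checks whether such a month occurs within a forward window, and fits a one-variable logistic regression with plain gradient descent. This is an illustration, not the R code used for the charts; the function names, the optimizer settings and the example numbers are all assumptions.

```python
import math

def drawdown_flags(prices, threshold):
    """True where the price is more than `threshold` below the running all-time high."""
    flags, peak = [], prices[0]
    for p in prices:
        peak = max(peak, p)
        flags.append(p / peak - 1 < -threshold)
    return flags

def event_within(flags, t, horizon):
    """True if any flag is set during the `horizon` periods after t."""
    return any(flags[t + 1 : t + 1 + horizon])

def fit_logistic(x, y, steps=5000, lr=0.1):
    """One-variable logistic regression; returns a function giving P(y=1 | x)."""
    mean = sum(x) / len(x)
    scale = (sum((xi - mean) ** 2 for xi in x) / len(x)) ** 0.5
    z = [(xi - mean) / scale for xi in x]  # standardize for stable gradient steps
    a = b = 0.0
    for _ in range(steps):
        preds = [1 / (1 + math.exp(-(a + b * zi))) for zi in z]
        grad_a = sum(p - yi for p, yi in zip(preds, y)) / len(y)
        grad_b = sum((p - yi) * zi for p, yi, zi in zip(preds, y, z)) / len(y)
        a -= lr * grad_a
        b -= lr * grad_b
    return lambda v: 1 / (1 + math.exp(-(a + b * (v - mean) / scale)))

# Made-up numbers for illustration only
example_prices = [100, 95, 89, 101, 80]
corrections = drawdown_flags(example_prices, 0.10)  # declines of over ten percent
crashes = drawdown_flags(example_prices, 0.20)      # declines of over twenty percent
example_cape = [10, 12, 15, 18, 22, 25, 28, 32]
example_event = [0, 0, 0, 1, 0, 1, 1, 1]
probability = fit_logistic(example_cape, example_event)
```

With real data, `example_event` would be built by calling `event_within` on the monthly flags with a twelve-month horizon, and `probability(32.57)` would then give the estimated chance of a correction or crash at the current CAPE.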

The R code used in the analysis is available here.