Monday, August 6, 2018

Mapping the stock market using self-organizing maps


Self-organizing maps are an unsupervised learning approach for visualizing multi-dimensional data in a two-dimensional plane. They are great for clustering and for finding correlations in the data. In this post we apply self-organizing maps to historical US stock market data to find interesting correlations and clusters. We'll use data from Shiller, Goyal and BLS to calculate the historical valuation levels, interest rates, inflation rates, unemployment rates and future ten-year total real returns for the years 1948 to 2008.
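For readers who want to try the technique, here is a minimal sketch of training a self-organizing map with the kohonen package. It is not the code behind this post: the data frame toy and its three columns are made-up stand-ins for the real dataset, and the grid size, seed and rlen value are assumptions chosen for illustration.

```r
# Minimal sketch: train a 6x6 hexagonal SOM on synthetic data
# (a stand-in for the real monthly observations, not the post's dataset).
library(kohonen)

set.seed(5)  # the layout of the map depends on the random initialisation

# Hypothetical data: 720 "months" with three made-up variables.
n <- 720
cape <- runif(n, 5, 45)
toy <- data.frame(
  cape         = cape,
  unemployment = runif(n, 3, 10),
  future_ret   = 0.15 - 0.003 * cape + rnorm(n, sd = 0.02)
)

# Standardise so that no single variable dominates the distance calculation.
X <- scale(as.matrix(toy))

som_model <- som(X, grid = somgrid(xdim = 6, ydim = 6, topo = "hexagonal"),
                 rlen = 500)

# Codebook vectors: one row per node, one column per variable (scaled units).
codes <- getCodes(som_model)

# Convert the codebook back to the original units for display,
# then draw one heatmap per variable with hexagonal cells.
center <- attr(X, "scaled:center")
spread <- attr(X, "scaled:scale")
par(mfrow = c(1, 3))
for (v in colnames(codes)) {
  plot(som_model, type = "property",
       property = codes[, v] * spread[v] + center[v],
       main = v, shape = "straight")
}
```

Each heatmap colours the same 36 nodes by a different variable, so variables whose heatmaps look alike (or mirrored) are positively (or negatively) related.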


You can see a clear correlation between the different valuation measures, and that low valuations have led to high returns. There's a slight negative correlation between the valuation measures and unemployment, i.e. valuations have been higher when unemployment has been lower. Charlie Bilello has a great article on the subject. There's also a positive correlation between unemployment and interest rates, which means that interest rates have typically been higher when unemployment has been higher.

Next, let's look at clusters formed using hierarchical clustering. We'll form four clusters on the same plane as used in the above analysis. Let's look at the results:
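As a rough illustration of this step, the sketch below cuts a hierarchical clustering of the map's nodes into four groups using only base R. The codebook matrix codes is randomly generated here as a hypothetical stand-in for a trained map, and the default linkage of hclust() (complete linkage) is an assumption, not necessarily what was used for the figure.

```r
# Sketch: group SOM nodes into four clusters with hierarchical clustering.
set.seed(5)

# Hypothetical codebook: 36 nodes (a 6x6 grid) by 3 scaled variables.
codes <- matrix(rnorm(36 * 3), nrow = 36,
                dimnames = list(NULL, c("cape", "unemployment", "rates")))

# Agglomerative clustering on Euclidean distances between the nodes,
# then cut the tree so that exactly four groups remain.
tree <- hclust(dist(codes))
node_cluster <- cutree(tree, k = 4)

table(node_cluster)  # number of nodes assigned to each cluster
```

With a fitted kohonen model, the same node_cluster vector can be passed to add.cluster.boundaries() to draw the cluster borders on an existing plot, and node_cluster[som_model$unit.classif] gives the cluster of each individual observation.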


The balls inside each hexagon correspond to individual months. We are currently in the green cluster, which has typically led to low returns. Why have low unemployment, low rates and low inflation led to low returns? Aren't these things good for the stock market? I see two possible causes: these conditions tend to revert to their mean (which means worsening macroeconomic conditions), and investors tend to extrapolate past returns into the future (a great tweet on the subject by Michael Batnick). The second cause leads to high valuations, which are present in the green cluster.

Which cluster is the best place to be in? I'd say the gray one, but the data seems to support the blue one as well. The good thing is that there are other countries that are in both of these clusters. Even though I recommend looking at valuations alone rather than macroeconomic indicators, a good place to check all that macro stuff is tradingeconomics.com.

The R code used in the analysis is available here.


11 comments

  1. Sorry to be obtuse, but what do the x and y locations of the hexes signify?

    Replies
    1. The positions are basically chosen at random and can be changed by changing set.seed(). Only the positions relative to each other matter, and the algorithm tries to map similar observations as closely to each other as possible.

  2. I'd like an explanation of how to read the chart (even a link). Why are there 36 hexagons? What does a hexagon represent? If we are comparing the correlations between metrics (e.g., CAPE, PE) and the future return of the market, wouldn't it be easier to compare if the scales were the same?

    Replies
    1. I aimed to make the post less technical, sorry about that. Each hexagon represents multiple observations that were grouped together by their similarity. The number of hexagons should be chosen so that each hexagon has enough observations, and six by six was a good size in my opinion. Also, I forgot to mention that the metrics were actually scaled and then descaled again for the visualization.

  3. Very nice ... trying to replicate this in R, I run into

    bls_data <- read_xlsx("bls_data.xlsx", sheet = 1, skip = 10)

    ... whereas everywhere else you've provided links to the data, here I'm left to the tender mercies of BLS Data Finder.

    Assuming that's not the solution, can you provide a link to the dataset you had in mind?
    Assuming it is ... some pointers about parameters for BLS search? I can figure out 80% of it, but would rather replicate what you've got before getting creative.

    Thanks! Again, nice work

    Replies
    1. Hello, the data source for BLS is here, you just need to change the "from" year to 1948 and click download xlsx. It was mentioned in the beginning of the article. Thanks for the feedback!

    2. Ah, so it was - in my quick scan, I jumbled it in with Shiller Goyal ... and because it was the Friday of a loooong week, wasn't able to disaggregate upon rereading. :-( Thanks!

  4. On my computer, the figures are drawn as circles. How can I get the "hexagonal" presentation?

    Replies
    1. You must set shape = "straight" when plotting, as seen at lines 67, 79-83 and 89 here.

  5. Congrats for this really interesting post!
    I got the R code from your git repo and ran it in RStudio.
    Everything works fine, but the final chart with hierarchical clustering is different for me. Here is the JPEG: https://ibb.co/qYpykW6

    Is this normal, or did I do something wrong? If it is normal, why?

    I downloaded the data as you suggested. You can see it here: https://docs.google.com/spreadsheets/d/1DAitaZKO7dgiHnMKakbFJdsx0tDAdDssKxzubac4oMM/edit?usp=sharing

    Thanks
    PM

    Replies
    1. Hello, sorry for the late answer and thank you for reading my blog. You are not doing anything wrong. Line 10 of the code retrieves the data from Shiller and Goyal, but the data has changed since this post was written (i.e. observations for 2018 have been added). That's why the clustering is different. If you don't like the clusters you get, you can change the set.seed(5) on line 54 to any other value to get different kinds of clusters.
