<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Marco Sammon</title>
    <description>Copyright Marco Sammon
</description>
    <link>http://marcosammon.com/</link>
    <atom:link href="http://marcosammon.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sat, 28 Feb 2026 20:42:34 +0000</pubDate>
    <lastBuildDate>Sat, 28 Feb 2026 20:42:34 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>The Effect of the Smoot-Hawley Tariff Act on the Stock Market</title>
        <description>&lt;p&gt;Over the past few weeks, I have been reading &lt;a href=&quot;https://smile.amazon.com/Big-Debt-Crises-Ray-Dalio-ebook/dp/B07GLBHM48/ref=sr_1_2?dchild=1&amp;amp;keywords=debt+crises+dalio&amp;amp;qid=1596130405&amp;amp;sr=8-2&quot; title=&quot;b1&quot;&gt;Big Debt Crises&lt;/a&gt; by Ray Dalio.  The case study on the Great Depression ties nicely into two projects I’m working on (1) My paper, “What Triggers National Stock Market Jumps?” with Scott Baker, Nick Bloom and Steve Davis, which uses newspaper accounts to identify the causes of large moves in the stock market and (2) my paper on
&lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3340700&quot; title=&quot;b1&quot;&gt;Trade Policy Uncertainty&lt;/a&gt; with Marcelo Bianconi and Federico Esposito.&lt;/p&gt;

&lt;p&gt;In Dalio’s book, he claims, “Stocks sold off sharply as it became clear the tariff bill [Smoot-Hawley Tariff Act] would pass. After falling 5 percent the previous week, the Dow dropped another 7.9 percent on June 16, the day before the tariff bill passed”.  I noticed that in our Wall Street Journal (WSJ) dataset (available at &lt;a href=&quot;https://stockmarketjumps.com/&quot; title=&quot;b1&quot;&gt;https://stockmarketjumps.com/&lt;/a&gt;), we did not attribute this market drop to trade policy.  I wanted to understand why the WSJ didn’t make the connection between the tariffs and the stock market, so I decided to gather more newspaper articles.&lt;/p&gt;

&lt;p&gt;For the most part, I am going to take a step back, and let the journalists of 90 years ago speak for themselves.  Their commentary is not contaminated by years of hindsight, as Dalio’s book might be (although each case study in the book does have a ‘real time’ news feed, so the reader can see what journalists were thinking as the crises he studies evolved).  Every article in this post is from the ProQuest Historical Newspaper database.  All the articles are from 6/17/1930 – the first day after the tariff was passed in the Senate – and are analyzing what happened on 6/16/1930.  They are not in an easily machine-readable format, so I have included pictures of the articles in this post.&lt;/p&gt;

&lt;h1 id=&quot;historical-context&quot;&gt;Historical Context&lt;/h1&gt;

&lt;p&gt;According to Dalio: “Protectionist sentiment resulted most notably in the passage of the Smoot-Hawley Tariff Act (&lt;em&gt;Also known as The Tariff Act of 1930&lt;/em&gt;), which imposed tariffs on nearly 20,000 US imports. Investors and economists alike feared that the proposed 20 percent increase in tariffs would trigger a global trade war and cripple an already weak global economy.”&lt;/p&gt;

&lt;p&gt;This figure (also from Dalio’s book) plots the average tariff rate on imports which would be subject to these taxes:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/sh.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After Smoot-Hawley, average tariffs increased from around 20% to over 50%.&lt;/p&gt;

&lt;p&gt;Not all sectors, however, were equally affected by these tariff changes.  The plot below shows the distribution of new tariff rates, where each observation represents a different SIC 4 industry:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/dist.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Some industries were essentially unaffected, while others had tariffs going to 80%.  This heterogeneity is going to be important later in the post when we examine the effect of the tariffs on the stock market.&lt;/p&gt;

&lt;h1 id=&quot;wall-street-journal&quot;&gt;Wall Street Journal&lt;/h1&gt;

&lt;p&gt;On 6/16/1930, the CRSP value-weighted index, which is what we use in the Stock Market Jumps paper to identify large moves in the stock market, was down 5.4%.&lt;/p&gt;

&lt;p&gt;We coded this day as ‘commodities’ based on the following WSJ articles.&lt;/p&gt;

&lt;p&gt;The first article mentions the tariffs, but says the main factor was the drop in commodity prices.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/wsj.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I believe the drop in commodities was related to the tariffs, but that will come out in other newspaper articles.&lt;/p&gt;

&lt;p&gt;The second article makes no mention of the tariff, attributing the move only to commodity prices:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/wsj3.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In fact, some WSJ articles argue that the tariffs would help the U.S. economy:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/benefits.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This wasn’t the only article touting the benefits of the tariff, even though the market was down over 5%!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/benefits2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Journalists at the time thought that the passage of the bill resolved uncertainty:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/wsj2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And they thought it would stimulate trade.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/benefit3.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With hindsight, we know otherwise.  According to Dalio: “As similar policies piled up in the years that followed, they accelerated the collapse in global trade caused by the economic contraction.”  During the Great Depression, US imports and exports dropped by 67% (Eckes, 1995).&lt;/p&gt;

&lt;p&gt;Not every article in the WSJ believed the tariffs were good news.  This article has some strong words to describe the tariff plan.  At the time, Hoover thought the tariffs were not perfect, but that he would have the power to amend them as needed:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/strongwords1.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here is another passage from the same article, discussing the President’s plans to change the tariffs if the need arose:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/strongwords2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A concern at the time was that our trading partners would levy retaliatory tariffs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/retaliation.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This was not restricted to Mexico.  According to Dalio, “The most impactful initial response came from the US’s largest trading partner, Canada, which at the time took in 20 percent of American exports. Canadian policy makers increased tariffs on 16 US goods while simultaneously lowering tariffs on imports from the British Empire.”&lt;/p&gt;

&lt;p&gt;Interestingly, the journalist thought the tariffs would have no effect on the price of copper:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/copper.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;new-york-times&quot;&gt;New York Times&lt;/h1&gt;

&lt;p&gt;The articles in Dalio’s timeline come from the New York Times (NYT), so I decided to look there for additional evidence.&lt;/p&gt;

&lt;p&gt;This article blames the tariff bill for the drop in the stock market.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/nyt1.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Not every article in the NYT, however, agrees that the tariffs were responsible for the drop in the stock market:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/nyt2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Smoot (one of the eponymous authors of the tariff bill) was very positive on the tariffs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/smoot.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Although he might have been missing a few facts… In this passage, he claims the stock market was up in response to the tariffs, even though it was down:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/facts.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Finally, the main NYT article [Selling Swamps Exchange, shown above] links the drop in commodity prices to the tariffs [Right Panel]. Note that cotton was one of the harder-hit commodities on 6/16/1930.  The left two panels are from the NYT, while the right panel is from “Abreast of the Market”, a long-running market commentary column in the WSJ:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/commodities.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;other-newspapers&quot;&gt;Other Newspapers&lt;/h1&gt;

&lt;p&gt;What did other newspapers have to say?  The Washington Post attributes sinking stocks to the tariffs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/wp.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As does the Chicago Tribune:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/ct.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The Los Angeles Times, however, attributes it to a “deflation in the speculative markets”:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/lat.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The last paragraph says the tariff measure was an ‘aggravating factor’, but does not make a strong link to the stock market’s performance that day.&lt;/p&gt;

&lt;h1 id=&quot;trade-policy-uncertainty-and-stock-returns&quot;&gt;Trade Policy Uncertainty and Stock Returns&lt;/h1&gt;

&lt;p&gt;How does all this link to my paper on Trade Policy Uncertainty?  In the early 1990s, Congress threatened to raise tariffs on Chinese imports to Smoot-Hawley levels.  These fights in Congress led to a lot of uncertainty about China’s future as a major trading partner of the United States (see e.g. Pierce and Schott 2016).  In our paper, we find that the introduction of trade policy uncertainty can push down stock prices.  Here, the firms exposed to more policy uncertainty (dashed red line) drop more in response to the start of the fight over Chinese tariffs (7/18/1990-7/23/1990) than stocks less exposed to this uncertainty (solid blue line):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/fig2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A natural question is: Was there really uncertainty about the Smoot-Hawley tariff bill being signed into law?  Did the President’s statement reduce uncertainty? According to the NYT, no way:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/uncertainty.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The White House, however, denied that passage of the bill was certain:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/uncertainty3.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;what-actually-happened&quot;&gt;What (Actually) Happened?&lt;/h1&gt;

&lt;p&gt;When we present the stock market jumps paper, many audience members ask, “Why not just code what ‘actually’ happened?”  I think these newspaper articles show that people at the time may not have been sure.  And the only thing we can do, without the hindsight of knowing how bad the tariffs ended up being for the global economy, is to code what the journalists said at the time.&lt;/p&gt;

&lt;p&gt;I think the strongest link between the tariffs and the stock market drop, however, is the timing.  The bill was sent over to the president soon after noon, which was exactly when the market started to drop.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/timing.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;So what happened to stock returns? There was a large drop in stock prices (maroon line) on 6/16/1930 (vertical red line), and a huge increase in trading volume (blue line):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/trading.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;How did this relate to the tariffs?  Stocks outside of manufacturing, which were likely not directly affected by the tariffs (green line), went down less than stocks facing smaller tariff increases (blue line), which in turn went down less than stocks facing larger tariff increases (red line):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_31_2020/rets.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;For me, two pieces of evidence point strongly to the link between the tariffs and the drop in the stock market: (1) the fact that they happened at the same time of day, and (2) the fact that the stocks of companies more exposed to tariff increases were hit harder on that day.&lt;/p&gt;
</description>
        <pubDate>Fri, 31 Jul 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/07/31/smoot.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/07/31/smoot.html</guid>
        
        
      </item>
    
      <item>
        <title>Are Momentum Funds Momentum Funds?</title>
        <description>&lt;p&gt;There have been many papers (e.g. Jegadeesh Titman 1993) documenting momentum – the phenomenon where stocks that went up in the past keep going up, while stocks that went down in the past keep going down.  This means it should be a profitable trading strategy to buy ‘winners’ (stocks that went up), and sell ‘losers’ (stocks that went down).  In practice, however, this strategy requires a lot of rebalancing – the winners and losers are always changing – so it may be difficult for the average investor to actually trade on momentum.&lt;/p&gt;

&lt;p&gt;As an alternative, some mutual fund and ETF companies have started offering momentum funds: you pay them a management fee, and they execute momentum for you.  Before investing in one of these funds, however, you probably want to know how close the “momentum” you’re buying is to the momentum in academic research.&lt;/p&gt;

&lt;p&gt;In this blog post, I will review a basic momentum strategy, examine the growth of momentum mutual funds, and see how much these funds’ returns look like the returns to the momentum trading strategy.&lt;/p&gt;

&lt;h1 id=&quot;review-of-momentum&quot;&gt;Review of Momentum&lt;/h1&gt;

&lt;p&gt;There are many ways to trade on momentum, but one way is to assign firms to portfolios based on their returns over the past year.  Specifically, calculate the cumulative return for each stock from months t-12 to t-2 (excluding the most recent month, t-1).  Then, sort firms into 10 groups (deciles) based on their cumulative returns over this period.  Finally, go long the firms that went up the most (top 10%) and short the firms that went down the most (bottom 10%).&lt;/p&gt;
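&lt;p&gt;The steps above can be sketched in pandas.  This is a minimal illustration, not the code behind the figures: it assumes a hypothetical DataFrame of monthly returns with one row per month and one column per stock, and equal weights within deciles.&lt;/p&gt;

```python
import pandas as pd
import numpy as np

def momentum_portfolio(returns):
    """Decile-sorted momentum on a (months x stocks) DataFrame of
    monthly returns.  The input is a hypothetical stand-in for the
    actual return data."""
    # Cumulative return over months t-12 through t-2: an 11-month
    # window, shifted by 2 so the most recent month (t-1) is skipped.
    formation = (1 + returns).rolling(11).apply(np.prod, raw=True).shift(2) - 1

    strategy = []
    for t in range(13, len(returns)):
        signal = formation.iloc[t].dropna()
        deciles = pd.qcut(signal, 10, labels=False)  # 0 = losers, 9 = winners
        winners = deciles[deciles == 9].index        # top 10% of past returns
        losers = deciles[deciles == 0].index         # bottom 10%
        # Equal-weighted long winners, short losers in month t
        strategy.append(returns.iloc[t][winners].mean() - returns.iloc[t][losers].mean())
    return pd.Series(strategy, index=returns.index[13:])
```

&lt;p&gt;The value of $1 invested in the strategy, as plotted below, is then just the cumulative product of one plus the strategy’s monthly returns.&lt;/p&gt;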

&lt;p&gt;This strategy has historically performed well.  Here is a plot of the value of $1 invested in this strategy in 1930 (all data here is from Kenneth French’s Data Library):
&lt;img src=&quot;/Post_Images/7_17_2020/ret0.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I have also included the value of $1 investments in a strategy which is long small stocks and short big stocks (SMB, i.e. the size effect) and a strategy which is long value stocks and short growth stocks (HML, i.e. the value effect).  I also included an investment in the market, financed by borrowing at the risk-free rate.  This way, all of these portfolios are computed using excess returns, making for an apples-to-apples comparison.&lt;/p&gt;

&lt;p&gt;If it weren’t for the crash after the financial crisis, the momentum strategy would seem far superior to the other strategies considered.  This is a reminder that even though momentum had higher returns and a higher Sharpe ratio than the market when many of the famous momentum papers were written, trading on this strategy is not without risk.  The momentum strategy lost more than half its value in the span of a few months, wiping out most of the gains it had relative to the market as a whole.&lt;/p&gt;

&lt;p&gt;Here is the same plot, but starting in 1980.  Momentum had outperformed the market by a factor of about 2x before the crash:
&lt;img src=&quot;/Post_Images/7_17_2020/ret1.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here is a plot starting in 2010, after the crash.  Momentum has under-performed the market as a whole:
&lt;img src=&quot;/Post_Images/7_17_2020/ret2.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Finally, here is a plot starting in 2018:
&lt;img src=&quot;/Post_Images/7_17_2020/ret3.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;From these figures, we can see momentum did very well up until around the 2000s.  Then it experienced some volatility and a big crash.  After 2010, it did worse than the market, although recently it has been performing similarly to the market.&lt;/p&gt;

&lt;p&gt;Given momentum’s poor recent performance, it is surprising that momentum funds have grown as substantially as they have over the past 20 years, as I will show in the next subsection.&lt;/p&gt;

&lt;h1 id=&quot;momentum-mutual-funds&quot;&gt;Momentum Mutual Funds&lt;/h1&gt;

&lt;p&gt;The mutual fund/ETF industry has grown massively over the past 20 years.  Below, I plot the TNA (total net assets) in billions of all mutual funds/ETFs (I will use the terms interchangeably in this blog post).  Note that this includes mutual funds that do not hold stocks, which is why these numbers seem large relative to the whole value of the stock market.
&lt;img src=&quot;/Post_Images/7_17_2020/all.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Using the CRSP mutual fund data, I identify momentum funds as funds with “momentum” in the name which hold at least 50% of their assets in equities (it turns out momentum exists in many asset classes; see, e.g., Asness and Moskowitz 2013). The number of momentum funds has grown, especially after the momentum crash around 2010:
&lt;img src=&quot;/Post_Images/7_17_2020/momnum.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;
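&lt;p&gt;As a rough sketch of this screen (the column names here are hypothetical, not the actual CRSP mutual fund variable names):&lt;/p&gt;

```python
import pandas as pd

def flag_momentum_funds(funds):
    """Screen a hypothetical fund-level table for momentum funds.

    Assumes columns `fund_name` and `pct_equity` (share of assets
    held in equities, 0-100); the real CRSP fields differ."""
    # Keep funds holding at least 50% equities...
    mostly_equity = funds[funds["pct_equity"] >= 50]
    # ...whose name contains "momentum" (case-insensitive)
    name_match = mostly_equity["fund_name"].str.contains("momentum", case=False)
    return mostly_equity[name_match]
```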

&lt;p&gt;The total size of these funds is now near $25 billion:
&lt;img src=&quot;/Post_Images/7_17_2020/allmom.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As I said, momentum requires a lot of trading, so I wanted to know how much asset management companies are charging for executing momentum on your behalf.  Here are the average fees of mutual funds with at least 50% of their assets in equities.  These fees may seem high, but this average includes actively managed funds.
&lt;img src=&quot;/Post_Images/7_17_2020/allexp.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here are the fees for the average index fund, which have gone toward zero in recent years:
&lt;img src=&quot;/Post_Images/7_17_2020/allexpd.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Momentum fund fees look similar to fees for the average fund – but are high relative to index funds (e.g. 0.095% for SPY), which have outlasted and outperformed all these momentum funds (more on this below):
&lt;img src=&quot;/Post_Images/7_17_2020/momexp.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Further, you can’t justify this as an active strategy; momentum is based on mechanical rules and does not require active ‘stock-picking’.&lt;/p&gt;

&lt;p&gt;Here is a list of the biggest momentum funds as of 2019:
&lt;img src=&quot;/Post_Images/7_17_2020/top5.PNG&quot; alt=&quot;fig&quot; /&gt;
There is a lot of concentration in this industry, with more than 10% of all momentum fund assets in the largest fund.&lt;/p&gt;

&lt;p&gt;With the growth of the momentum fund industry in mind, let’s look at these funds’ performance.&lt;/p&gt;

&lt;h1 id=&quot;are-momentum-funds-momentum-funds&quot;&gt;Are Momentum Funds Momentum Funds?&lt;/h1&gt;

&lt;p&gt;The most accurate way to tell if these funds are implementing the momentum strategy I outlined above is to look at their holdings and see if they differ from the holdings predicted by the strategy.  For simplicity, I just look at the correlation between the daily returns of the momentum strategy and the daily returns of the funds.&lt;/p&gt;
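&lt;p&gt;A minimal sketch of this calculation, assuming a hypothetical DataFrame of daily fund returns and a Series of daily momentum-strategy returns (these inputs stand in for the CRSP data and the constructed strategy):&lt;/p&gt;

```python
import pandas as pd

def momentum_correlation(fund_returns, mom_returns):
    """Correlation of each fund's daily returns with the momentum
    strategy's daily returns, over the days both are observed.

    `fund_returns`: DataFrame (rows = days, columns = funds);
    `mom_returns`: Series of daily strategy returns."""
    # Keep only the dates present in both inputs, then correlate
    # each fund column against the strategy series.
    funds_aligned, mom_aligned = fund_returns.align(mom_returns, axis=0, join="inner")
    return funds_aligned.corrwith(mom_aligned)
```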

&lt;p&gt;Here are some histograms of the distribution of this correlation, overall and by year.  Note that for the whole sample, the distribution is centered barely above zero!&lt;br /&gt;
&lt;img src=&quot;/Post_Images/7_17_2020/corrs.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here are the top 10 funds overall by correlation; note that many of these funds had short histories, so the correlation may be spurious:
&lt;img src=&quot;/Post_Images/7_17_2020/overall.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And here are the top funds by correlation with the basic momentum strategy for a few selected years:
&lt;img src=&quot;/Post_Images/7_17_2020/2016.PNG&quot; alt=&quot;fig&quot; /&gt;
&lt;img src=&quot;/Post_Images/7_17_2020/2017.PNG&quot; alt=&quot;fig&quot; /&gt;
&lt;img src=&quot;/Post_Images/7_17_2020/2018.PNG&quot; alt=&quot;fig&quot; /&gt;
&lt;img src=&quot;/Post_Images/7_17_2020/2019.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And how have these funds done if they are not closely tracking momentum?  Well, the average momentum fund looked a lot like the market from 2000-2012, and has done worse than the market since then.
&lt;img src=&quot;/Post_Images/7_17_2020/performance.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note also that the momentum funds did not experience the momentum crash at the end of the financial crisis – more evidence that these funds are not actually trading on momentum (or that the managers were smart enough to avoid it…).&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;It seems like momentum funds are not really momentum funds. They don’t seem to be executing the ‘academic’ momentum strategy, with average correlations in some years, like 2016, near zero. Further, they charge higher fees and have similar, or even worse, performance than the market as a whole. I guess, at least as of now, there is no substitute for directly trading on momentum.&lt;/p&gt;
</description>
        <pubDate>Fri, 17 Jul 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/07/17/momfunds.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/07/17/momfunds.html</guid>
        
        
      </item>
    
      <item>
        <title>Who Owns Passive Ownership?</title>
        <description>&lt;p&gt;I have been updating &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3571409&quot; title=&quot;b1&quot;&gt;my paper on the introduction of ETFs&lt;/a&gt; to incorporate comments from a recent bag lunch &lt;a href=&quot;/Post_Images/6_19_2020/slides.pdf&quot; target=&quot;_blank&quot;&gt; (a copy of the presentation can be found here)&lt;/a&gt;. In the paper, I develop a model where rational agents decide to allocate attention to a systematic risk-factor that affects all assets, or idiosyncratic risk-factors that are specific to only one asset.&lt;/p&gt;

&lt;p&gt;The systematic risk-factor, however, is less volatile than the stock-specific factors.  This is central to a key trade-off in the model: agents want to balance (1) making more profits against (2) avoiding a portfolio that seems too risky.  Because stock-specific risk-factors are more volatile, there are more opportunities for profit trading stocks than trading the ETF, but those trades are also riskier.  Which of these forces dominates is governed by risk aversion: if agents are more risk averse, they prefer to hold diversified portfolios, while if they are closer to risk neutral, they prefer to make targeted bets on stock-specific risks.&lt;/p&gt;

&lt;p&gt;In trying to link the effects of introducing ETFs to &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243910&quot; title=&quot;b1&quot;&gt;my paper on passive ownership&lt;/a&gt;, I had the following question: What is the analogue to passive ownership in the model? Uninformed investors are usually the agents who hold the ETF, while the informed investors are short the ETF.  While this is a good start, I don’t think it’s the whole story.  To tie things back to the data, in this post, I am going to examine which types of institutional investors actually hold ETFs.&lt;/p&gt;

&lt;h1 id=&quot;defining-passive-ownership&quot;&gt;Defining Passive Ownership&lt;/h1&gt;

&lt;p&gt;Let’s start with &lt;a href=&quot;https://www.investopedia.com/ask/answers/040315/what-difference-between-passive-and-active-portfolio-management.asp&quot; title=&quot;b1&quot;&gt;this definition from Investopedia&lt;/a&gt;.  It says:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Active portfolio management focuses on outperforming the market in comparison to a specific benchmark such as the Standard &amp;amp; Poor’s 500 Index.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Passive portfolio management mimics the investment holdings of a particular index in order to achieve similar results.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They also give the following details, which are testable predictions in the model:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Active management requires frequent buying and selling in an effort to outperform a specific benchmark or index.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Passive management replicates a specific benchmark or index in order to match its performance.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Active management portfolios strive for superior returns but take greater risks and entail larger fees.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think this summarizes the general consensus about passive ownership, and is also broadly consistent with the model’s findings on informed and uninformed investors.  In the model, relative to uninformed investors, informed investors (1) trade more (2) take more risk by loading up on the individual stocks (3) outperform the market (i.e. the ETF) &lt;em&gt;on average&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;There is one thing, however, that I think this definition is missing. What about informed investors executing a market-timing strategy?  They may buy an ETF on the S&amp;amp;P 500 when they think the market is going to go up, and short the same ETF (or just stay out of it) when they think the market is going to go down.  At any point in time that they are holding the ETF, it looks like they are passive (they are just replicating the index), even though they are actively trying to outperform this index.  With this in mind, let’s see who actually holds ETFs.&lt;/p&gt;

&lt;h1 id=&quot;holdings-of-institutional-investors&quot;&gt;Holdings of Institutional Investors&lt;/h1&gt;

&lt;p&gt;I construct three empirical measures of ETF ownership:
1) What percent of an investor’s total long equity holdings are in ETFs
2) What is an investor’s total dollar holdings of ETFs
3) What percent of an ETF’s total shares outstanding are owned by a particular investor&lt;/p&gt;
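&lt;p&gt;The three measures above can be sketched from a stylized 13F-style holdings table.  This is only an illustration: the column names (`investor`, `ticker`, `is_etf`, `dollars`, `shares`) are hypothetical, not the actual 13F field names.&lt;/p&gt;

```python
import pandas as pd

def etf_ownership_measures(holdings, shares_out):
    """Compute the three ownership measures from a hypothetical
    13F-style table (one row per investor-position) and a Series
    mapping ETF ticker to total shares outstanding."""
    etf = holdings[holdings["is_etf"]].copy()

    # (1) percent of each investor's long equity portfolio held in ETFs
    total_by_inv = holdings.groupby("investor")["dollars"].sum()
    etf_by_inv = etf.groupby("investor")["dollars"].sum()
    pct_of_portfolio = 100 * etf_by_inv.div(total_by_inv).fillna(0)

    # (2) total dollar ETF holdings per investor
    dollar_etf = etf_by_inv

    # (3) percent of each ETF's shares outstanding held in each position
    etf["pct_of_etf"] = 100 * etf["shares"] / etf["ticker"].map(shares_out)

    return pct_of_portfolio, dollar_etf, etf[["investor", "ticker", "pct_of_etf"]]
```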

&lt;p&gt;I construct these measures for groups of institutional investors based on
&lt;a href=&quot;http://acct.wharton.upenn.edu/faculty/bushee/IIclass.html&quot; title=&quot;b1&quot;&gt;Professor Bushee’s data&lt;/a&gt;.  I merged his classifications with 13F filings – which are quarterly reports of holdings by large institutional investors.&lt;/p&gt;

&lt;p&gt;Here is the first breakdown by type of institutional investor:
&lt;img src=&quot;/Post_Images/7_3_2020/type.PNG&quot; alt=&quot;fig&quot; /&gt;
We can see that the two largest players in the ETF market are: (1) Banks and (2) Advisors (which includes hedge funds).  I was not surprised that &lt;em&gt;Advisors&lt;/em&gt; are large holders of ETFs – this is consistent with active investors executing market timing strategies.  I was, however, surprised that banks now own about 20% of all equity ETF shares outstanding.  My first intuition was that this is related to risk-weights in the Basel framework, but &lt;a href=&quot;https://www.bis.org/publ/bcbs257.pdf&quot; title=&quot;b1&quot;&gt; this document suggests otherwise.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also do a breakdown by whether or not the institution is ‘tax sensitive’:
&lt;img src=&quot;/Post_Images/7_3_2020/taxes.PNG&quot; alt=&quot;fig&quot; /&gt;
ETFs have many tax advantages relative to index mutual funds
&lt;a href=&quot;https://www.fidelity.com/learning-center/investment-products/etf/benefits-of-etfs#:~:text=Tax%20benefits,-ETFs%20have%202&amp;amp;text=Due%20to%20structural%20differences%2C%20mutual,the%20life%20of%20the%20investment&quot; title=&quot;b1&quot;&gt;(see this explanation by Fidelity)&lt;/a&gt; so it’s not surprising that ETFs are a larger part of tax-sensitive institutions’ portfolios.  There is, however, something puzzling about the bottom left panel of this figure: the shares manage to add up to more than one!  The next subsection provides an explanation for this fact.&lt;/p&gt;

&lt;h1 id=&quot;who-is-shorting-etfs&quot;&gt;Who is Shorting ETFs?&lt;/h1&gt;

&lt;p&gt;In the last figure, we can see that the percent of ETF shares outstanding held by institutions added up to more than 100%.  This reveals a limitation of the 13F data: short positions are not reported.  So if, in the data, hedge funds are net short the ETFs, it’s possible that the long positions add up to more than the number of shares outstanding.&lt;/p&gt;

&lt;p&gt;Here is a table from &lt;a href=&quot;https://www.amazon.com/dp/B01CGEKOLM/ref=dp-kindle-redirect?_encoding=UTF8&amp;amp;btkr=1&quot; title=&quot;b1&quot;&gt;The Institutional ETF Toolbox&lt;/a&gt; (a great book if you want to get started in learning about ETFs).  This table, which is based on data from Goldman Sachs Hedge Fund Monitor (2016), shows the long and short positions of Hedge Funds in ETFs:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_3_2020/shorts1.PNG&quot; alt=&quot;fig&quot; /&gt;
&lt;img src=&quot;/Post_Images/7_3_2020/shorts2.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The table shows that hedge funds are net short almost all these ETFs!  This means that the total dollar value of hedge funds’ short positions in these funds is larger than that of their long positions.  This can also explain how there are more dollars in 13F longs in ETFs than there are in Total Net Assets (TNA) in the same ETF: if hedge funds are heavily shorting ETFs, they are effectively taking the opposite side of these long positions. If I borrow a share of SPY from you and sell it to someone else, you still have your long position on your book, and now the person I sold it to also has a long position on their book.&lt;/p&gt;
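&lt;p&gt;A toy numeric example of this double counting (the numbers are made up purely for illustration):&lt;/p&gt;

```python
# One ETF with 100 shares outstanding.  Investor A holds the full
# float; a hedge fund borrows 60 of A's shares and sells them to
# investor B.
shares_outstanding = 100
long_positions = {"A": 100, "B": 60}
short_positions = {"hedge_fund": 60}

# 13F-style data only sees the longs, so reported holdings exceed
# the shares outstanding; netting out the short restores the total.
reported_longs = sum(long_positions.values())
net_longs = reported_longs - sum(short_positions.values())

print(reported_longs)  # 160, more than the 100 shares outstanding
print(net_longs)       # 100, matches shares outstanding
```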

&lt;p&gt;If we check out some data from &lt;a href=&quot;https://www.etf.com/sections/features-and-news/most-shorted-etfs&quot; title=&quot;b1&quot;&gt;ETF.com&lt;/a&gt;, we can see that some of these short positions are massive at the individual-ETF level.  At the time of writing (late June 2020), short interest in the retail sector ETF (XRT) was over 500% of its shares outstanding:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/7_3_2020/shorted.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This is consistent with the bankruptcies of large retail firms during COVID-19, like J.Crew, JCPenney, and Neiman Marcus.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;In the model, informed investors take aggressive bets on individual stocks, while hedging out systematic risk with the ETF.  Uninformed investors take the other side of this trade, buying the ETF from informed investors.  From this alone, it seems like uninformed investors are similar to passive investors in the data, while informed investors are similar to hedge funds, which are presumably active investors.&lt;/p&gt;

&lt;p&gt;This, however, is not the whole story.  Informed investors may decide to learn about systematic risk, and bet long on the ETF.  Consistent with this, in the data, hedge funds have some of the largest long positions in ETFs.  In this instance, my empirical paper would classify them as ‘passive’ investors, even though they may really be doing a market-timing (active) strategy.  On top of that, there are many investors who label themselves as active, who actually hold diversified portfolios (see e.g. data from &lt;a href=&quot;http://www.petajisto.net/data.html&quot; title=&quot;b1&quot;&gt;Antti Petajisto&lt;/a&gt;). While I think the general consensus is mostly correct about how to classify passive and active management, some nuance needs to be applied to match the model to the real world.&lt;/p&gt;

</description>
        <pubDate>Fri, 03 Jul 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/07/03/whoowns.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/07/03/whoowns.html</guid>
        
        
      </item>
    
      <item>
        <title>Index Inclusion vs. Passive Ownership</title>
        <description>&lt;p&gt;I have been updating &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243910&quot; title=&quot;b1&quot;&gt;my paper on passive ownership&lt;/a&gt; to incorporate comments from my recent Kellogg bag lunch &lt;a href=&quot;/Post_Images/6_19_2020/slides.pdf&quot; target=&quot;_blank&quot;&gt; (a copy of the presentation can be found here)&lt;/a&gt;. In the paper, I document a decrease in stock price informativeness over the last thirty years. Using two natural experiments, I show that passive ownership is an important cause of this decline. I also find empirical evidence that passive investors gather less information about stock-specific risks, suggesting a mechanism for the causal results. In a &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3571409&quot; title=&quot;b1&quot;&gt;companion paper&lt;/a&gt; (update coming next week!), I develop a model where this information-gathering mechanism arises endogenously. The introduction of an Exchange Traded Fund (ETF) in the model leads fewer investors to become informed, and the remaining informed investors to learn more about systematic risk and less about stock-specific risks.&lt;/p&gt;

&lt;p&gt;One concern is that the results from the natural experiments are driven by index inclusion effects (see e.g. &lt;em&gt;What Drives the S&amp;amp;P 500 Inclusion Effect? An Analytical Survey&lt;/em&gt;), rather than the increase in passive ownership associated with index inclusion.  For example, the general consensus is that when a firm is added to an index, its stock returns become more correlated with the stock returns of other firms in that index.  If this changed the distribution of stock returns on non-earnings announcement dates, it could affect my measure of pre-earnings price informativeness.&lt;/p&gt;

&lt;p&gt;In this blog post, I am going to (1) Describe one of the price informativeness measures in my paper (2) Describe the natural experiments, and measure index inclusion effects (3) Relate these index inclusion effects to my results.&lt;/p&gt;

&lt;h1 id=&quot;set-up&quot;&gt;Set Up&lt;/h1&gt;

&lt;p&gt;My paper proposes three ways to measure pre-earnings price informativeness: (1) Trading volume in the month before an earnings announcement (2) the pre-earnings drift (3) the share of total annual volatility that occurs on earnings days. In this post, I am going to be focusing on this third measure, because I think it is the most likely to be influenced by index inclusion effects.&lt;/p&gt;

&lt;p&gt;The motivation for this third measure is as follows: If the total amount of information is not changing over time, but prices become less informative before earnings announcements, we would expect there to be (1) relatively larger returns on earnings days and (2) relatively smaller returns on all other days.  I quantify this using the share of total annual volatility occurring on earnings dates.  Specifically, define the quadratic variation share (QVS) for firm \(i\) in year \(t\) as: 
\begin{equation}\label{eq:qvs}
QVS_{i,t}=\sum\limits_{\tau=1}^4 r_{i,\tau}^2/\sum\limits_{j=1}^{252} r_{i,j}^2
\end{equation}
where \(r\) denotes a market-adjusted daily return (the stock return minus the market return). The numerator is the sum of squared returns on the 4 quarterly earnings days in year \(t\), while the denominator is the sum of squared returns for all days in year \(t\). If relatively more information is being learned and incorporated into prices on earnings dates i.e. prices have become less informative before earnings announcements, we would expect larger values of \(QVS\).&lt;/p&gt;
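&lt;p&gt;As a rough sketch of the \(QVS\) calculation (the 1% and 5% daily moves and the announcement dates below are assumptions for illustration, not estimates from the paper):&lt;/p&gt;

```python
import numpy as np

# Hypothetical firm-year: 252 market-adjusted daily returns, with larger
# moves on the 4 quarterly earnings days (all magnitudes are illustrative).
returns = np.full(252, 0.01)          # 1% absolute move on ordinary days
earnings_days = [30, 93, 156, 219]    # assumed quarterly announcement dates
returns[earnings_days] = 0.05         # 5% moves on earnings days

def qvs(r, edays):
    """Quadratic variation share: squared earnings-day returns over all squared returns."""
    r = np.asarray(r)
    return np.sum(r[edays] ** 2) / np.sum(r ** 2)

share = qvs(returns, earnings_days)
benchmark = 4 / 252                   # share if volatility were uniform across days
print(f"QVS = {share:.3f} vs. uniform benchmark {benchmark:.3f}")
```

&lt;p&gt;With these made-up numbers, earnings days account for far more than their 1.6% share of trading days, so \(QVS\) is well above the uniform benchmark.&lt;/p&gt;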

&lt;p&gt;Earnings days make up roughly 1.6% of trading days, so values of \(QVS_{i,t}\) larger than 0.016 imply that earnings days account for a disproportionately large share of total volatility. This figure plots coefficients from a regression of \(QVS\) on a set of year dummy variables: 
&lt;img src=&quot;/Post_Images/6_19_2020/trend.PNG&quot; alt=&quot;fig&quot; /&gt;
Average \(QVS\) increased from 3.0% in 1990 to almost 16% in 2018.  The Appendix of the paper shows that the increase in \(QVS\) was due to a simultaneous increase in the numerator (more volatility on earnings days) and a decrease in the denominator (less volatility on all other days).&lt;/p&gt;

&lt;p&gt;Here is the (value-weighted) distribution of \(QVS\) for a few selected years:
&lt;img src=&quot;/Post_Images/6_19_2020/distribution.png&quot; alt=&quot;fig&quot; /&gt;
As time has gone on, the distribution has spread out more, with an especially long right tail – note the change of scale, especially in 2010 and 2018.&lt;/p&gt;

&lt;p&gt;I find that increases in passive ownership are correlated with increases in \(QVS\).  In the paper, passive ownership is defined as the percent of shares outstanding which are owned by index mutual funds and ETFs.  This table shows a regression of \(QVS\) on the change in passive ownership, a set of firm level controls/fixed effects, and a set of year/quarter fixed effects:
&lt;img src=&quot;/Post_Images/6_19_2020/qvstable.PNG&quot; alt=&quot;fig&quot; /&gt;
These estimates imply that a 10% increase in passive ownership would explain 10-20% of the average increase in \(QVS\) we’ve observed over the past 30 years.  For reference, passive funds now own over 15% of the US stock market!  For firms in the 95th percentile of passive ownership, this number can now be as high as 30%.  In addition, this definition may understate the true size of passive ownership, as there has been a rise in closet indexing over the same time period – so there may be passive owners who still identify themselves as active funds.&lt;/p&gt;

&lt;h1 id=&quot;sp-500-index-addition&quot;&gt;S&amp;amp;P 500 Index Addition&lt;/h1&gt;

&lt;p&gt;When originally writing the paper, I was concerned that the reduced-form results (last table in the previous subsection) were due to reverse causality: maybe passive ownership happened to increase the most in stocks that had the biggest decrease in price informativeness for other reasons.  With this in mind, I wanted to identify increases in passive ownership which are plausibly uncorrelated with firm characteristics. I knew that the largest passive fund is SPY, which tracks the S&amp;amp;P 500 index.  As a result, firms which get added to the S&amp;amp;P 500 index get a big increase in passive ownership, so let’s start by looking into that process.&lt;/p&gt;

&lt;p&gt;Each year, a committee from Standard &amp;amp; Poor’s selects firms to be added/removed from the S&amp;amp;P 500 index. 
For a firm to be added to the index, it has to meet criteria set out by S&amp;amp;P, including a sufficiently large market capitalization, a specific industry classification and financial health. Once a firm is added to the S&amp;amp;P 500 index, it experiences a large increase in passive ownership, as many index funds and ETFs buy the stock (SPY isn’t the only fund which tracks the index itself, and there are many sector ETFs which track subsets of the index).&lt;/p&gt;

&lt;p&gt;With this in mind, I designed the following experiment:  The treated group is going to be firms which are added to the index.  The control group should be firms which reasonably could have been added to the index. Given the index inclusion criteria, I decided to select firms in the same industry, and of a similar size,  which were not added to the index.  As an additional check, I created a second control group, which is firms of a similar size/industry, but which are already in the index.&lt;/p&gt;

&lt;p&gt;This plot shows the change in passive ownership around index addition for the treated group (blue circles), control group of firms out of the index (red diamonds) and control group of firms already in the index (green triangles):
&lt;img src=&quot;/Post_Images/6_19_2020/spfs.PNG&quot; alt=&quot;fig&quot; /&gt;
Unsurprisingly, the firms that are already in the index have higher passive ownership than those outside the index, and when a firm gets added to the index, there is an increase in passive ownership.&lt;/p&gt;

&lt;p&gt;I find that when a firm is added to the S&amp;amp;P 500 index, its CAPM beta increases (marginally statistically significant) and its CAPM R-squared increases (statistically significant).  There is no effect on the magnitude of CAPM residuals (i.e. idiosyncratic volatility), but there is an increase in total volatility (statistically significant).  This last fact is consistent with &lt;em&gt;Do ETFs increase volatility?&lt;/em&gt; and &lt;em&gt;The sound of many funds rebalancing&lt;/em&gt;, where being a member of an ETF basket leads to additional volatility.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/6_19_2020/spcapm.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;russell-10002000-rebalancing&quot;&gt;Russell 1000/2000 Rebalancing&lt;/h1&gt;

&lt;p&gt;The Russell 3000 contains approximately the 3000 largest stocks in the United States stock market. Each May, FTSE Russell selects the 1000 largest stocks by float (float is often very close to market capitalization) to be members of the Russell 1000, while it selects the next 2000 largest stocks by float to be members of the Russell 2000.  Both of these indices are value-weighted, so moving from the 1000 to the 2000 significantly increases the share of passive ownership in a stock. The firm goes from being the smallest firm in an index of large firms, to the biggest firm in an index of small firms, increasing its relative weight by a factor of 10 (see e.g. &lt;em&gt;Passive investors, not passive owners&lt;/em&gt;).&lt;/p&gt;
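&lt;p&gt;The factor-of-10 change in index weight is just value-weighting arithmetic. A sketch with hypothetical index sizes (the firm size and both index totals below are made up):&lt;/p&gt;

```python
# Hypothetical numbers chosen to illustrate the factor-of-10 weight change.
def index_weight(firm_cap, index_total_cap):
    """Weight of a firm in a value-weighted index."""
    return firm_cap / index_total_cap

firm_cap = 3e9        # a $3B firm near the Russell 1000/2000 cutoff (assumed)
r1000_cap = 30e12     # assumed total capitalization of the large-cap index
r2000_cap = 3e12      # assumed total capitalization of the small-cap index

w_large = index_weight(firm_cap, r1000_cap)   # smallest member of the Russell 1000
w_small = index_weight(firm_cap, r2000_cap)   # largest member of the Russell 2000
print(f"weight rises by a factor of {w_small / w_large:.1f}")
```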

&lt;p&gt;The increase in passive ownership corresponding to S&amp;amp;P 500 index addition is not a perfect natural experiment because firms are not added at random.  Once added, firms receive increased attention, and added firms may start marketing their stock differently to institutional investors. The increase in passive ownership associated with the Russell reconstitution sidesteps many of these issues, as moving from the 1000 to the 2000 is based on a mechanical rule, rather than committee selection. Further, because the firm’s market capitalization shrank, it is less likely to change the way the firm is marketing itself to institutions.&lt;/p&gt;

&lt;p&gt;For this experiment, the treated firms are going to be those which switch from the Russell 1000 to the Russell 2000.  For the control group, I identify firms which were in the Russell 1000 at the same time as the switching firms, had ranks between 900 and 1000, but did not switch to the Russell 2000.  This figure shows the change in passive ownership for the switching firms (blue circles), and the control firms (red triangles):
&lt;img src=&quot;/Post_Images/6_19_2020/russellfs.PNG&quot; alt=&quot;fig&quot; /&gt;
Consistent with the Russell reconstitution literature, there is an increase in passive ownership for the switching firms.&lt;/p&gt;

&lt;p&gt;For the Russell experiment, firms which switch have a significant increase in CAPM beta, and a marginally significant decrease in CAPM R-squared.  Switching firms also have significant increases in idiosyncratic volatility and total volatility.  Firms which switch from the Russell 1000 to the 2000 are shrinking, so these last two facts are consistent with bad news usually being associated with increased volatility.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/6_19_2020/russellcapm.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;implications&quot;&gt;Implications&lt;/h1&gt;

&lt;p&gt;Here are the results from the S&amp;amp;P 500 experiment:
&lt;img src=&quot;/Post_Images/6_19_2020/spresults.PNG&quot; alt=&quot;fig&quot; /&gt;
Firms which are added have a significant decrease in pre-earnings trading volume, a significant decrease in pre-earnings drift, and a significant increase in earnings day volatility (\(QVS\), last column).&lt;/p&gt;

&lt;p&gt;Here are the results for the Russell experiment: 
&lt;img src=&quot;/Post_Images/6_19_2020/russellresults.PNG&quot; alt=&quot;fig&quot; /&gt;
While the volume and drift results are consistent with my reduced-form estimates, the volatility (\(QVS\), last column) results are the right sign, but insignificant.&lt;/p&gt;

&lt;p&gt;Why is this the case? The index inclusion effects are likely working against my results on the share of volatility on earnings days.  If total volatility increases after index addition/rebalancing, the denominator of \(QVS\) should increase, and shrink \(QVS\).  This may explain why the volatility results are insignificant for the Russell experiment:  In the post period, treated firms had an increase in total volatility that was two times as large as the increase in total volatility for the corresponding firms in the S&amp;amp;P 500 experiment.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;In the real world, it’s hard to have a perfect laboratory where we can change one variable of interest (e.g. passive ownership), and hold everything else fixed.  In this instance, it appears that one of the confounding effects (index inclusion) was working against me, but I’m sure that even this is not the whole story!  When firms get added to the S&amp;amp;P 500, there are changes beyond passive ownership and volatility: the firms may market themselves differently to institutional investors, face more scrutiny (and thus change their corporate governance), etc.  This is one of the reasons why I want to further develop the theory behind the empirics: It will give me more testable predictions, which will (hopefully) help rule out these alternative explanations.&lt;/p&gt;
</description>
        <pubDate>Fri, 26 Jun 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/06/26/inclusion.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/06/26/inclusion.html</guid>
        
        
      </item>
    
      <item>
        <title>Cluster Analysis</title>
        <description>&lt;p&gt;Who are Amazon’s (AMZN) competitors?  This seems like a simple question, but I don’t think there is an easy answer.  In the finance literature, competitors are often identified using some definition of industries like SIC or NAICS, but I don’t think that will work for AMZN.  What industry is Amazon in?  Books?  General merchandise (Amazon basics)? e-Commerce (Amazon Prime)?  Entertainment (Prime Video and Prime Music)? Business Services (Amazon Web Services i.e. AWS)?  Technology (Alexa, Kindle, Fire, …)? Or all of the above?&lt;/p&gt;

&lt;p&gt;I went on S&amp;amp;P Capital IQ, and their system identified the following competitors for Amazon:&lt;/p&gt;

&lt;p&gt;Alibaba (another e-commerce giant)&lt;br /&gt;
Walmart (another store that sells almost everything)&lt;br /&gt;
jd.com (another e-commerce giant)&lt;br /&gt;
Priceline (not sure that Amazon is in the hotel/airfare business yet…)&lt;br /&gt;
ebay (another e-commerce giant)&lt;br /&gt;
Netflix (Amazon streams movies and TV shows, and produces original content)&lt;br /&gt;
Apple (Amazon makes their own tech products: the Kindle, Fire, Alexa, etc.)&lt;br /&gt;
Google (another tech giant)&lt;br /&gt;
Facebook (not sure that Amazon is in the social media business yet…)&lt;/p&gt;

&lt;p&gt;I found this list dissatisfying, so I wanted to try something different: I could let the stock market tell me which firms are connected to Amazon. Specifically, if a firm’s returns were sufficiently correlated with Amazon’s returns, after accounting for the effect of common components (e.g. market-wide risk), then those firms should somehow be connected to Amazon. This got me interested in Affinity Propagation, a clustering algorithm which I will describe in the next subsection.&lt;/p&gt;

&lt;h1 id=&quot;set-up&quot;&gt;Set Up&lt;/h1&gt;

&lt;p&gt;I am not an expert on cluster analysis, but here is my understanding based on reading the
&lt;a href=&quot;https://en.wikipedia.org/wiki/Affinity_propagation&quot; title=&quot;b1&quot;&gt;Wikipedia page&lt;/a&gt;, the 
&lt;a href=&quot;https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html&quot; title=&quot;b1&quot;&gt;Python package documentation&lt;/a&gt; and one of the &lt;a href=&quot;https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html&quot; title=&quot;b1&quot;&gt;associated examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Affinity Propagation is a clustering algorithm, i.e. an algorithm for grouping data.  The key trade-off is between minimizing the distance between each data point and the closest cluster center (called an &lt;em&gt;‘exemplar’&lt;/em&gt;) and minimizing the number of clusters.  The two extremes of this are (1) Set the distance between all the points and the exemplars to zero by making each data point its own cluster (2) Only have one exemplar.  Which of these forces will dominate depends on parameters you choose when running the algorithm.&lt;/p&gt;

&lt;p&gt;For all the applications in this blog post, the affinity propagation model will have the following ingredients:&lt;br /&gt;
&lt;em&gt;Input:&lt;/em&gt; Sparse covariance matrix of stock returns (computed using &lt;a href=&quot;https://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphicalLassoCV.html&quot; title=&quot;b1&quot;&gt;this python package&lt;/a&gt;).  How the sparsity works: Suppose we are working with firm-level returns.  Then, if two firms have independent returns conditioning on the returns of all other firms, the corresponding coefficient in the precision matrix (i.e. the inverse of this sparse covariance matrix) will be zero. I am using the sparse covariance matrix, rather than the standard covariance matrix, to take out common factors in stock returns.&lt;br /&gt;
&lt;em&gt;Output:&lt;/em&gt; Exemplars i.e. the firms that are most representative of other firms, as well as the members of each cluster.  Note that the model chooses the number of clusters by itself, but the number of clusters chosen depends on a parameter chosen by the user.  This parameter determines which of the two trade-off forces outlined above (small distance between data and exemplars vs. small number of clusters) dominates.  For all the applications in this blog post, I used the default parameters in the python package, and did not attempt to do any tuning.&lt;/p&gt;
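&lt;p&gt;Here is a minimal sketch of this pipeline in Python, along the lines of the scikit-learn stock-market example. Since the returns data is not included in this post, the snippet simulates two groups of correlated ‘firms’; the number of firms, sample length, and volatilities are all assumptions for illustration:&lt;/p&gt;

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)

# Simulated stand-in for daily returns: 20 firms in 2 latent groups,
# each group driven by its own common factor.
n_days, n_firms = 500, 20
factors = rng.normal(0, 0.01, (n_days, 2))
group = np.repeat([0, 1], n_firms // 2)
returns = factors[:, group] + rng.normal(0, 0.003, (n_days, n_firms))
returns /= returns.std(axis=0)  # standardize each series

# Step 1: sparse (conditional) covariance via the graphical lasso.
cov = GraphicalLassoCV().fit(returns).covariance_

# Step 2: affinity propagation on the implied correlations, used as similarities.
corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(corr)
print("number of clusters:", len(ap.cluster_centers_indices_))
```

&lt;p&gt;With real data, the input would be a firms-by-days return matrix instead of the simulated one, and the preference parameter could be tuned rather than left at the default.&lt;/p&gt;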

&lt;p&gt;One last point: How is this affinity propagation algorithm different from just forming groups of firms based on correlation? The answer is, not really. The firms within each cluster are more correlated with each other than with any group of firms outside the cluster (I checked this), but the algorithm does some extra work.  It tells us the &lt;em&gt;right&lt;/em&gt; number of clusters.  Even though the number of clusters depends on an input parameter (so there is some equivalence between choosing the number of clusters and choosing this parameter), we can fix the parameter, feed in data from different time periods, and see how the number of clusters the algorithm chooses changes over time.  This will become clearer in my fourth application below.&lt;/p&gt;

&lt;h1 id=&quot;application-one-simulation&quot;&gt;Application One: Simulation&lt;/h1&gt;

&lt;p&gt;Before I started working with real data, I wanted to work with simulated data.  This way I could get a better understanding for where the clustering algorithm succeeded, and where it struggled.  I started by simulating the following model: 
\begin{equation}
r_{i,t}=\beta_{i,1} r_{1,t} + \beta_{i,2} r_{2,t} + \beta_{i,3} r_{3,t} + e_{i,t}
\end{equation}
where \(r_{i,t}\) is the return on stock \(i\) in period \(t\), \(\beta_{i,j}\) is the loading of stock \(i\) on factor \(j\), \(r_{j,t}\) is the return of factor \(j\) at time \(t\), and \(e_{i,t}\) is a firm-specific error term at time \(t\).&lt;/p&gt;

&lt;p&gt;I wanted to see if the AP algorithm could correctly cluster firms, so I started by creating 4 groups of firms with similar loadings on the common risks – which I call ‘risk groups’.  Specifically, within each of these four risk groups, each \(\beta_{i,j}\) is equal to a constant plus some noise.  For example: If the average firm in group 1 has \(\beta_{i,1}=1\), i.e. a beta of one on the first factor, then each individual firm’s beta is normally distributed around 1 with some positive variance.  I add this noise to the betas because I found that simulating a model with constant factor loadings within each group can lead to a (close to) singular sparse covariance matrix.&lt;/p&gt;
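&lt;p&gt;A minimal version of this simulation (the factor volatilities, group-mean betas, and noise scales below are arbitrary choices for illustration, not the values used in the post):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)

# r_it = beta_i1 * r_1t + beta_i2 * r_2t + beta_i3 * r_3t + e_it,
# with 4 risk groups whose firms share noisy group-level betas.
n_days, n_factors, n_groups, firms_per_group = 1000, 3, 4, 25
n_firms = n_groups * firms_per_group

factor_returns = rng.normal(0, 0.01, (n_days, n_factors))
group_betas = rng.normal(1.0, 0.5, (n_groups, n_factors))  # group-average loadings

# Firm betas: group mean plus noise, so within-group loadings are similar
# but not identical (identical betas can make the covariance near-singular).
firm_betas = (np.repeat(group_betas, firms_per_group, axis=0)
              + rng.normal(0, 0.1, (n_firms, n_factors)))

idio = rng.normal(0, 0.005, (n_days, n_firms))
returns = factor_returns @ firm_betas.T + idio
print(returns.shape)
```

&lt;p&gt;The resulting matrix of simulated returns can then be fed into the sparse-covariance and clustering steps above to check whether the 4 risk groups are recovered.&lt;/p&gt;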

&lt;p&gt;I find that the affinity propagation (AP) model usually correctly groups firms in the same ‘risk group’ into the same cluster.  How accurate this process is depends on (1) How far apart the average betas in each group are from one another and (2) The variance of the noise, i.e. \(var(e_{i,t})\). More dispersion in underlying factor loadings usually leads to fewer clusters, while more noise usually leads to more clusters.&lt;/p&gt;

&lt;p&gt;To get some intuition, I draw plots which summarize the sparse covariance matrix and clustering algorithm. We can think of the distance between points on the plot as how useful one firm’s stock returns are for predicting &lt;em&gt;contemporaneous&lt;/em&gt; variation in other firms’ stock returns (closer means more predictive power).  This is not exactly right, as we are flattening a high-dimensional object, the covariance matrix, into a 2-dimensional picture.  To add more dimensions to the picture, the darker/thicker the lines, the stronger the covariance between the firms.&lt;/p&gt;

&lt;p&gt;In one instance, where I made \(var(e_{i,t})\) relatively large, the algorithm still correctly identifies 4 clusters, but some firms (represented by dots) get grouped incorrectly. 
&lt;img src=&quot;/Post_Images/6_11_2020/3factornoise.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note that in this plot, and all the other plots, the color of the dots/color of the boxes around the firm/industry name identify the clusters.  In this plot, the 4 clusters are denoted by light blue (bottom left), light gray (left and middle), yellow (middle and bottom right) and black (bottom right).&lt;/p&gt;

&lt;p&gt;When I reduce \(var(e_{i,t})\), the algorithm incorrectly identifies 5 clusters by splitting up one of the “correct” clusters:
&lt;img src=&quot;/Post_Images/6_11_2020/3factorlessnoise.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here, the true bottom left cluster is split into two clusters: green and yellow.&lt;/p&gt;

&lt;p&gt;To have the algorithm correctly identify all 4 risk groups I found I needed the following: (1) Reasonably large dispersion in factor loadings between clusters (2) Relatively small  \(var(e_{i,t})\).  This process convinced me that the AP algorithm is useful for grouping firms.  When given a precise enough signal, it is able to identify a cluster structure in the underlying data.  With this in mind, I was ready to take the algorithm to some real-world applications.&lt;/p&gt;

&lt;h1 id=&quot;application-two-fama-french-sizevalue-factor-portfolios&quot;&gt;Application Two: Fama French Size/Value Factor Portfolios&lt;/h1&gt;

&lt;p&gt;The Fama-French 25 size and book-to-market portfolios have been studied extensively.  It is well known that there is a strong factor structure in these portfolios, so I was curious what would happen if we put them into the AP model:
&lt;img src=&quot;/Post_Images/6_11_2020/famafrench25.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here, I am using the monthly returns from these portfolios between 1926 and 2018.  Using this data, the algorithm identified 6 clusters (ME is for market equity/size, BM is for book-to-market/value):
Cluster 1: SMALL LoBM&lt;br /&gt;
Cluster 2: ME1 BM2, ME1 BM3, ME1 BM4, SMALL HiBM&lt;br /&gt;
Cluster 3: ME2 BM1, ME2 BM2, ME3 BM1, ME3 BM2, ME4 BM1&lt;br /&gt;
Cluster 4: ME2 BM3, ME2 BM4, ME2 BM5, ME3 BM3, ME3 BM4, ME3 BM5, ME4 BM2, ME4 BM3, ME4 BM4, ME4 BM5, ME5 BM4&lt;br /&gt;
Cluster 5: BIG LoBM, ME5 BM2, ME5 BM3&lt;br /&gt;
Cluster 6: BIG HiBM&lt;/p&gt;

&lt;p&gt;Obviously, there is something special about the extreme portfolios: SMALL LoBM and BIG HiBM!  Other than that, it is hard for me to take away anything from clusters 2 to 5.&lt;/p&gt;

&lt;p&gt;This, however, got me thinking: how is AP different than factor analysis?  I decided to do principal component analysis (PCA) on the same 25 portfolios.  The figure below plots the loadings of the portfolios on the factors:
&lt;img src=&quot;/Post_Images/6_11_2020/ff25pca.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note that portfolios 1-5 are made up of the smallest 20% of firms by market capitalization, while portfolios 21-25 are made up of the 20% largest by market capitalization.  Portfolios 1, 6, 11, 16, and 21 have the lowest book-to-market, while portfolios 5, 10, 15, 20 and 25 have the highest book-to-market.&lt;/p&gt;

&lt;p&gt;The factor structure is clear here – factor one is pretty much constant, and is probably something like the market.  As we increase size, we increase loading on factor two.  As we increase book to market, we decrease the loading on factor three.  This point has been made before for different groups of portfolios, see e.g.
&lt;a href=&quot;https://johnhcochrane.blogspot.com/2014/12/level-slope-and-curve-for-stocks.html&quot; title=&quot;b1&quot;&gt;John Cochrane’s blog post&lt;/a&gt;.&lt;/p&gt;
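&lt;p&gt;The PCA step itself is short in scikit-learn. With a simulated stand-in for the 25 portfolio series built to have one strong common factor (the return magnitudes are assumptions; the real inputs would be the Fama-French monthly portfolio returns), the first component absorbs most of the variance:&lt;/p&gt;

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Hypothetical stand-in for T monthly returns on 25 portfolios.
T, n_ports = 1100, 25
market = rng.normal(0.005, 0.04, (T, 1))             # one dominant common factor
returns = market + rng.normal(0, 0.02, (T, n_ports)) # plus portfolio-specific noise

pca = PCA(n_components=3).fit(returns)
# Shares of variance explained: the first component dominates by construction,
# mirroring the near-constant "market" loading in the figure above.
print(pca.explained_variance_ratio_.round(2))
```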

&lt;p&gt;It seems like when there is a strong factor structure, PCA may be more useful than cluster analysis.  With this in mind, let’s take the AP algorithm to portfolios without a strong factor structure.&lt;/p&gt;

&lt;h1 id=&quot;application-three-fama-french-industry-portfolios&quot;&gt;Application Three: Fama French Industry Portfolios&lt;/h1&gt;

&lt;p&gt;Now, let’s take the AP model to a set of portfolios that is not well known to have a factor structure:  The Fama-French industry portfolios. Here is the visualization (based on daily returns, for the sample where all the industries have non-missing daily returns):
&lt;img src=&quot;/Post_Images/6_11_2020/2017dailyindustry.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Here are the clusters the algorithm identifies:&lt;br /&gt;
Cluster 1: &lt;em&gt;Agric&lt;/em&gt;&lt;br /&gt;
Cluster 2: Food , Soda , &lt;em&gt;Beer&lt;/em&gt; , Smoke, Hshld&lt;br /&gt;
Cluster 3: &lt;em&gt;Toys&lt;/em&gt;&lt;br /&gt;
Cluster 4: &lt;em&gt;Hlth&lt;/em&gt; , MedEq, Drugs, LabEq&lt;br /&gt;
Cluster 5: Books, Clths, Chems, Rubbr, Txtls, BldMt, Cnstr, &lt;em&gt;Steel&lt;/em&gt;, FabPr, Mach , ElcEq, Autos, Ships, Mines, PerSv, BusSv, Paper, Boxes, Trans, Whlsl, Rtail, Meals, RlEst&lt;br /&gt;
Cluster 6: &lt;em&gt;Aero&lt;/em&gt; , Guns&lt;br /&gt;
Cluster 7: &lt;em&gt;Gold&lt;/em&gt;&lt;br /&gt;
Cluster 8: Coal , &lt;em&gt;Oil&lt;/em&gt;  &lt;br /&gt;
Cluster 9: &lt;em&gt;Util&lt;/em&gt;&lt;br /&gt;
Cluster 10: &lt;em&gt;Telcm&lt;/em&gt;&lt;br /&gt;
Cluster 11: Fun  , Hardw, &lt;em&gt;Softw&lt;/em&gt;, Chips&lt;br /&gt;
Cluster 12: Banks, Insur, &lt;em&gt;Fin&lt;/em&gt;  , Other&lt;br /&gt;
Industries in italics are the exemplars of each cluster.  I don’t think we should put too much weight on which firms are chosen as exemplars, however, as this is not stable, and depends on the input parameters for the AP model.&lt;/p&gt;

&lt;p&gt;Looking at these lists, some of the clusters make intuitive sense.  Cluster 8 looks like energy.  Cluster 12 looks like finance.  Cluster 2 looks like consumer non-durable goods.  Some clusters make less sense, like cluster 5, which looks like a mixture of several industries.  At this point, I was still not sure if the AP algorithm was useful for real-world data, but I wanted to go back to my original idea, and apply cluster analysis to individual firms’ returns.&lt;/p&gt;

&lt;h1 id=&quot;application-four-individual-firms&quot;&gt;Application Four: Individual Firms&lt;/h1&gt;

&lt;p&gt;Let’s apply the AP algorithm to the 100 largest firms traded on US exchanges.  I was curious how the clusters would change over time, so I ran the algorithm on daily stock return data for four separate years: 2000, 2008, 2012 and 2017.&lt;/p&gt;

&lt;p&gt;Here is the plot for 2000:
&lt;img src=&quot;/Post_Images/6_11_2020/2000firmlevel.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;A few groups really stand out.  The energy firms in the bottom cluster, the technology manufacturing firms on the far right, the biotech firms on the middle right, the pharmacy firms on the top, and the consumer products near the ‘middle’.&lt;/p&gt;

&lt;p&gt;Here is the plot for 2008:
&lt;img src=&quot;/Post_Images/6_11_2020/2008firmlevel.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The big difference from 2000 is that (1) All the finance firms are now on their own on the far right (2) All the other firms except oil/gas have been compressed into a big ball (recall that during this time there was also a big shock to oil prices!).  This compression is consistent with systematic risk dominating in a crisis. Also interesting is that Buffett’s Berkshire Hathaway (BRK) is all by itself in the middle of the plot.&lt;/p&gt;

&lt;p&gt;Here’s the plot for 2012:
&lt;img src=&quot;/Post_Images/6_11_2020/2012firmlevel.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The big difference from 2008 is that things have spread out again.  The financial firms are still in their own groups on the far right, but the rest of the firms have spread out as well.&lt;/p&gt;

&lt;p&gt;Finally, here is the plot for 2017:
&lt;img src=&quot;/Post_Images/6_11_2020/2017firmlevel.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Now the tech giants have formed their own group in the top right.  The financial firms are now all in the same cluster (light green, near the top left).  This last exercise showed what I think is the most interesting application of cluster analysis: Keeping the parameters of the model constant, and feeding in data from different time periods.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;From all these exercises, I had the following takeaways: (1) Based on the simulation results, the AP algorithm is capable of correctly identifying clusters of firms exposed to similar risks (2) When we know the data has a strong factor structure, like the Fama-French 25 portfolios formed on size and book-to-market, cluster analysis seems less useful than PCA (3) When there is not a strong factor structure, cluster analysis usually identifies groups that make intuitive sense together (4) In my opinion, the most interesting application is how clusters evolve over time.  We see that during the financial crisis, clusters coalesce, while during expansions, they spread out.  I am definitely still not an expert on these clustering algorithms, but I learned a lot by working through this blog post!&lt;/p&gt;
</description>
        <pubDate>Fri, 19 Jun 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/06/19/competitors.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/06/19/competitors.html</guid>
        
        
      </item>
    
      <item>
        <title>Trends in Market Efficiency</title>
        <description>&lt;p&gt;If you believe markets are efficient, then there should be limited predictability in stock returns.  Why?  Suppose returns were predictable: specifically suppose you received some information that the returns of Apple (AAPL) were &lt;em&gt;certain&lt;/em&gt; to be high in the future.  On this information, you start buying AAPL today, which pushes up the price, and pushes down future returns.  Now, suppose it’s not just you who received that information, but all investors – and this is likely true, as long as you are not doing insider trading!  Then everyone will demand AAPL and push up the price even more today, further pushing down future returns.  If all investors can trade on this information instantaneously, the price would instantaneously adjust to the good news, and your information would have no predictive power for future stock returns.&lt;/p&gt;

&lt;p&gt;In the data, however, return predictability has been well documented (see e.g. &lt;em&gt;Discount Rates&lt;/em&gt; by John Cochrane), but this predictive power only exists over longer horizons, and is usually attributed to time-variation in risk premia.  For example, at the bottom of a recession, predicted future returns (i.e. risk premia) are high: there is a lot of uncertainty about the future path of the economy, so investing in stocks is perceived as risky and investors must be compensated accordingly.  Further, not many investors have capital to put into the stock market, so even though expected returns are high, prices stay low and predictability remains.&lt;/p&gt;

&lt;p&gt;In this post, I am going to show how predictability over shorter horizons – from one month down to two minutes – has changed over the past 100 years.&lt;/p&gt;

&lt;h1 id=&quot;set-up&quot;&gt;Set Up&lt;/h1&gt;

&lt;p&gt;One way to measure return predictability would be with a regression like:
\begin{equation}
r_{(t+n,t+1)}=a_n + \beta_{n}r_{(t,t-n-1)}  + e_{n,t}
\end{equation}
where \(r_{(t+n,t+1)}\) is the cumulative return from \(t+1\) to \(t+n\), and \(r_{(t,t-n-1)}\) is the cumulative return from \(t-n-1\) to \(t\).  Suppose \(n=5\) trading days: then this regression is using last week’s return to predict this week’s return.  If predictability is high, we would expect \(\beta_n\) to be large in absolute value, and the R-squared i.e. the percent of future returns explained by past returns to be high.&lt;/p&gt;
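&lt;p&gt;As a rough sketch of the regression above (using simulated AR(1) returns rather than actual index data; the persistence parameter and sample size are made up for illustration):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def predictability(returns, n):
    """Slope and R-squared from regressing the next n-period cumulative
    return, r_(t+n,t+1), on the previous n-period cumulative return."""
    cum = np.convolve(returns, np.ones(n), mode="valid")  # rolling n-sums
    past, future = cum[:-n], cum[n:]  # adjacent, non-overlapping windows
    beta = np.cov(past, future)[0, 1] / np.var(past, ddof=1)
    r2 = np.corrcoef(past, future)[0, 1] ** 2
    return beta, r2

def ar1(n_obs, phi, sigma=0.01):
    """AR(1) daily returns: phi > 0 builds in short-horizon predictability."""
    r = np.zeros(n_obs)
    for t in range(1, n_obs):
        r[t] = phi * r[t - 1] + rng.normal(0, sigma)
    return r

beta_ar, r2_ar = predictability(ar1(20000, phi=0.5), n=5)           # predictable
beta_iid, r2_iid = predictability(rng.normal(0, 0.01, 20000), n=5)  # efficient
```

The persistent series produces a clearly positive \(\beta_n\) and higher R-squared; the i.i.d. series, like an efficient market, produces a slope and R-squared near zero.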

&lt;p&gt;I am going to run this regression with two sets of data.  One is ‘low frequency’, built up from daily returns on the CRSP value-weighted index from 1927-2018: the horizons will be 1-day, 5-day (one week) and 22-day (one month).  I am also going to use high-frequency data on S&amp;amp;P 500 futures, and look at predictability over shorter intervals: hourly, 30-min, 15-min, 10-min, 5-min and 2-min.  The high-frequency data I have runs from 1983-2015.&lt;/p&gt;

&lt;h1 id=&quot;results&quot;&gt;Results&lt;/h1&gt;

&lt;p&gt;I run the regression described above at each horizon, every year (i.e. using one year of data at a time), and plot the \(\beta_n\)’s for each year in the figure below:
&lt;img src=&quot;/Post_Images/6_6_2020/beta.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In all the plots, the red line represents a moving-average of the betas computed each year.  A blue dot near zero would imply predictability is low in that year.&lt;/p&gt;

&lt;p&gt;For the 22-day and 5-day frequencies, there is a weak downward trend over the past 100 years. For the 1-day frequency, it seems like predictability increases from the late 20’s to the 70’s.  An explanation for this is that the Great Depression was a period of extreme stock volatility (check out the data my co-authors and I posted at: &lt;a href=&quot;https://stockmarketjumps.com/&quot; title=&quot;b1&quot;&gt;https://stockmarketjumps.com/&lt;/a&gt;), so short-run predictability had to be low.  As we came out of the Depression, and volatility declined, predictability mechanically increased.  Later, as markets became more efficient, the 1-day predictability decreased.&lt;/p&gt;

&lt;p&gt;The high-frequency pictures tell the same story, but in a more striking way.  If we look at the 15-min returns regression, it looks like predictability disappeared in the early 1990’s.  For the 10-min, it looks like it disappeared in the late 1990’s.  For the 5-min, it looks to be the mid 2000’s, and for the 2-min it is around 2010.&lt;/p&gt;

&lt;p&gt;This is consistent with the idea that the market has become  ‘faster’ over time.  The massive improvement in technology, co-location of computers near exchanges, and the growth of algorithmic trading firms likely together eliminated predictability, even at very short horizons.  As the market has become faster and faster, predictability at shorter and shorter horizons has vanished.&lt;/p&gt;

&lt;p&gt;From the same regressions, I plot the R-squared i.e. the percent of variation in future returns, \(r_{(t+n,t+1)}\), explained by past returns \(r_{(t,t-n-1)}\) each year:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/6_6_2020/r2.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;An R-squared near zero means zero predictability. I think the R-squared values tell an even stronger story than the betas:  The R-squared values at each horizon went to zero faster than the betas did!  Even at times where there was some predictability left, the low R-squared values imply it would be hard to consistently make money trading on this predictability.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;Mean reversion at the 1-day horizon has decreased significantly from the 1970’s to today.  At the high-frequency level, things appeared to have changed in stages.  In the 1990’s there was still some predictability at 10-min  intervals, but by the 2010’s all predictability at any horizon had essentially vanished.  This suggests that markets have likely become more efficient over time, and the idea that prices should be &lt;a href=&quot;https://en.wikipedia.org/wiki/Martingale_(probability_theory)&quot; title=&quot;b1&quot;&gt;martingales&lt;/a&gt; is more true now than ever before.&lt;/p&gt;
</description>
        <pubDate>Fri, 12 Jun 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/06/12/meanreversion.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/06/12/meanreversion.html</guid>
        
        
      </item>
    
      <item>
        <title>Thinking About Beta</title>
<description>&lt;p&gt;I recently recorded a series of video lectures for an introduction to Finance class at Kellogg.  One of the topics we cover is the Capital Asset Pricing Model (CAPM). Recording this class got me thinking about the “&lt;a href=&quot;https://en.wikipedia.org/wiki/Low-volatility_anomaly&quot; title=&quot;b1&quot;&gt;beta anomaly&lt;/a&gt;” for the first time in years.&lt;/p&gt;

&lt;p&gt;My initial interest in beta was near the end of the second year of my PhD program.  I needed to write a second-year paper, and at the time, I was replicating the Betting Against Beta (BAB) paper by Frazzini and Pedersen. Although their paper is extremely well cited (at the time of writing, the BAB paper has over 1500 citations), I thought I could provide a different explanation for why low beta stocks tend to outperform high beta stocks.  Although I never even self-published that paper, I think the results are interesting enough to post here on my blog.&lt;/p&gt;

&lt;p&gt;In this post, I will provide a quick review of the CAPM, examine the returns of beta-sorted portfolios, and see how sensitive the results are to various sorting modifications.  While I’ve constructed all these beta-sorted portfolios myself, rather than use my replication of the BAB factor from Frazzini and Pedersen’s paper, I will use the data published by AQR &lt;a href=&quot;https://www.aqr.com/Insights/Datasets/Betting-Against-Beta-Equity-Factors-Monthly&quot; title=&quot;b1&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;why-bet-against-beta&quot;&gt;Why Bet Against Beta?&lt;/h1&gt;

&lt;p&gt;There are two pieces of the CAPM.  The first is the CAPM regression:
\begin{equation}
r_{i,t}-r_{f,t}=a_i + \beta_i (r_{m,t}-r_{f,t}) + e_{i,t}
\end{equation}
where \(r_{i,t}\) is the return on stock \(i\) at time \(t\), \(r_{m,t}\) is the return on the market, \(r_{f,t}\) is the risk-free rate and \(\beta_i\) is the CAPM beta. The CAPM regression tells us that moves in the market might be able to explain some of the moves in stocks, but there is error i.e. moves in stocks not explained by the market, \(e_{i,t}\).  This error is firm-specific risk, and the CAPM puts no restriction on how large its variance, \(Var(e_{i,t})\), can be.  The main use for the CAPM regression is to estimate \(\beta_i\) and use it in the CAPM equation:
\begin{equation}
E[r_i]=r_f+\beta_i MRP
\end{equation}
where \(MRP\) is the market risk premium i.e. the compensation for bearing market risk. In a future post, I will talk about how we might measure the market risk premium. In the CAPM equation, there is no error: stocks with higher betas should have higher expected returns.  And this is the &lt;em&gt;only&lt;/em&gt; reason stocks should have different expected returns.&lt;/p&gt;
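&lt;p&gt;A minimal simulated illustration of the two pieces: estimate \(\beta_i\) by OLS from the CAPM regression, then plug it into the CAPM equation.  The firm, the risk-free rate and the MRP numbers here are all hypothetical.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)

def capm_beta(stock_excess, market_excess):
    """OLS estimates of a_i and beta_i from the CAPM regression."""
    X = np.column_stack([np.ones_like(market_excess), market_excess])
    a, beta = np.linalg.lstsq(X, stock_excess, rcond=None)[0]
    return a, beta

# Simulated firm with a true beta of 1.5; note the CAPM regression puts
# no restriction on how large the firm-specific risk Var(e) can be
mkt = rng.normal(0.0003, 0.01, 2520)           # ~10 years of daily data
stock = 1.5 * mkt + rng.normal(0, 0.02, 2520)  # e_it is firm-specific noise

a_hat, beta_hat = capm_beta(stock, mkt)

# CAPM equation: only beta should matter for expected returns
rf, mrp = 0.02, 0.05                 # hypothetical annual numbers
expected_return = rf + beta_hat * mrp
```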

&lt;p&gt;Empirically, however, the opposite is true: low beta stocks tend to have higher (risk-adjusted) returns than high beta stocks.  There are many explanations for this failure of the CAPM, but one of the most well known is explained in &lt;em&gt;Capital market equilibrium with restricted borrowing&lt;/em&gt;, by Fischer Black (1972).  This paper argues that the beta anomaly exists because one cannot actually borrow/lend unlimited amounts at the risk-free rate (an assumption required for the CAPM to hold).  To achieve high returns with low beta stocks, an investor would have to apply leverage, which is not cheap (unless you are &lt;a href=&quot;http://docs.lhpedersen.com/BuffettsAlpha.pdf&quot; title=&quot;b1&quot;&gt;Warren Buffett&lt;/a&gt;).  So, with limited leverage, investors can only achieve high returns with high beta stocks.  This leads to excess demand for these high beta stocks, pushing up their prices, and pushing down their expected returns. Limits on leverage are also the key mechanism in the theoretical model in the BAB paper.&lt;/p&gt;

&lt;p&gt;There are other explanations for the beta anomaly, e.g. that it is demand for lottery-like stocks (see e.g.  Bali, Turan G., et al. “Betting against beta or demand for lottery.” (2014)) or that it is the result of non-standard sorting procedures 
(see e.g. Novy-Marx, Robert, and Mihail Velikov. “Betting against betting against beta.” (2018)). I am more sympathetic to the non-standard sorting procedure argument, as there are a few things in the BAB paper that seemed unusual when I was first reading it: (1) Not using value weights (2) Applying time-varying leverage to the long and short sides of the portfolio.&lt;/p&gt;

&lt;p&gt;In the next few sections, I will go through how I would form these portfolios, then I will examine the effects of changing the weights, and adding the time-varying leverage.  Going through these non-standard sorting procedures will reveal something interesting about the BAB strategy.&lt;/p&gt;

&lt;h1 id=&quot;basic-sorting&quot;&gt;Basic Sorting&lt;/h1&gt;

&lt;p&gt;I start by calculating each firm’s CAPM beta over the past year using daily returns (I require that a firm have at least 126 non-missing daily returns in this calculation to be included in my sample). I then follow Hou, Xue and Zhang (2020), and form 10 value-weighted portfolios based on NYSE breakpoints.  In this and all subsequent analysis, I restrict the sample to ordinary common shares traded on major exchanges.  Finally, I form a long-short portfolio by subtracting the return of the high beta portfolio from the return of the low beta portfolio.&lt;/p&gt;
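&lt;p&gt;The sorting step can be sketched as follows, on a made-up cross-section (this is not my actual code or the CRSP data; it just shows NYSE breakpoints applied to all firms, with value weighting):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cross-section: beta, market cap, NYSE flag, next-month return
n = 3000
beta = rng.normal(1.0, 0.5, n)
mktcap = rng.lognormal(10, 1.5, n)
nyse = rng.random(n) < 0.5
ret = rng.normal(0.01, 0.06, n)

# Decile breakpoints are computed on NYSE firms only, then applied to all
breaks = np.quantile(beta[nyse], np.linspace(0, 1, 11))
decile = np.clip(np.searchsorted(breaks, beta, side="right") - 1, 0, 9)

# Value-weighted return of each beta-sorted decile portfolio
vw_ret = np.array([np.average(ret[decile == d], weights=mktcap[decile == d])
                   for d in range(10)])
long_short = vw_ret[0] - vw_ret[9]  # low-beta minus high-beta
```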

&lt;p&gt;I am going to evaluate these portfolios three ways: (1) the mean; (2) the Sharpe ratio, which is the mean, minus the risk-free rate, divided by the standard deviation i.e. how much additional mean return you are getting for each unit of risk; and (3) the 3-Factor alpha, which is the constant in a regression of the portfolio returns on the market, size and value factors i.e. how much mean return is not explained by these three factors.&lt;/p&gt;
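&lt;p&gt;Here is a sketch of these three metrics on simulated monthly data; the factor moments and the portfolio’s 30bp/month true alpha are invented, and I annualize the Sharpe ratio by \(\sqrt{12}\):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(3)

def sharpe(r, rf):
    """Annualized Sharpe ratio from monthly returns."""
    ex = r - rf
    return np.sqrt(12) * ex.mean() / ex.std(ddof=1)

def three_factor_alpha(r_ex, mkt, smb, hml):
    """Intercept from regressing excess returns on market, size, value."""
    X = np.column_stack([np.ones_like(mkt), mkt, smb, hml])
    return np.linalg.lstsq(X, r_ex, rcond=None)[0][0]

# Hypothetical monthly data: a portfolio with a true alpha of 30bp/month
T = 600
mkt = rng.normal(0.006, 0.045, T)
smb = rng.normal(0.002, 0.03, T)
hml = rng.normal(0.003, 0.03, T)
rf = np.full(T, 0.002)
port_ex = 0.003 + 0.9 * mkt + 0.2 * smb - 0.1 * hml + rng.normal(0, 0.02, T)

sr = sharpe(port_ex + rf, rf)
alpha = three_factor_alpha(port_ex, mkt, smb, hml)
```

The regression strips out what the three factors explain, and the intercept recovers (an estimate of) the built-in alpha.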

&lt;p&gt;Here are the annualized mean, Sharpe ratio and 3-Factor (market, size and value) alpha for the 10 beta-sorted portfolios:
&lt;img src=&quot;/Post_Images/6_5_2019/vwnolev.PNG&quot; alt=&quot;fig&quot; /&gt;
In the first row, the CAPM doesn’t look like a total failure… high beta stocks do have higher returns, but returns tend to peak around the 6th and 7th portfolios.  The real difference is in the risk-adjusted returns.  The 3-Factor alpha of the long-short portfolio is almost 9% a year, which is huge!&lt;/p&gt;

&lt;h1 id=&quot;weights&quot;&gt;Weights&lt;/h1&gt;

&lt;p&gt;In the BAB paper, they use beta weights, rather than value weights.  I talked about the issues with using beta weights in a &lt;a href=&quot;https://marcosammon.com/2016/06/23/do-weights-matter.html&quot; title=&quot;b1&quot;&gt;previous blog post&lt;/a&gt;. Still, I was curious what would happen if we use equal-weighted portfolios instead of value-weighted portfolios.  Here are the results with 10 equal-weighted portfolios:
&lt;img src=&quot;/Post_Images/6_5_2019/ewnolev.PNG&quot; alt=&quot;fig&quot; /&gt;
Now the beta anomaly appears even stronger.  The low beta portfolio actually has higher mean returns than the high beta portfolio, a total failure for the CAPM.  We’ve also increased the alpha to over 13% per year.&lt;/p&gt;

&lt;h1 id=&quot;making-the-portfolio-market-neutral&quot;&gt;Making the Portfolio Market Neutral&lt;/h1&gt;

&lt;p&gt;As I’ve constructed it so far, the long short portfolio is not necessarily market-neutral i.e. the expected beta of that portfolio may not be zero.  Frazzini and Pedersen propose a procedure to remove all (expected) market risk from the BAB portfolio.&lt;/p&gt;

&lt;p&gt;First, shrink all betas toward one: \(\beta_{new}=0.6 \times \beta_{past year}+0.4 \times 1\) (with the idea that the mean cross-sectional beta must be one). Then, compute the value-weighted average \(\beta_{new}\) for each portfolio.  Finally, take the return of each portfolio and multiply it by one over the (value-weighted) average beta. This will increase the leverage on the low-beta portfolios, and decrease leverage on the high-beta portfolios.  If betas were constant, every portfolio would have a beta of one.  I will call multiplying portfolios by \(1/E[\beta_{new}]\) the &lt;em&gt;beta one adjustment&lt;/em&gt;.  This also means that the long-short portfolio should have an expected beta of zero.&lt;/p&gt;
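&lt;p&gt;In code, the shrinkage and beta one adjustment look like this (with made-up portfolio betas and returns, purely to show the mechanics):&lt;/p&gt;

```python
import numpy as np

def shrink(beta_past_year):
    """Frazzini-Pedersen style shrinkage toward one."""
    return 0.6 * beta_past_year + 0.4 * 1.0

# Hypothetical portfolio-level (value-weighted) betas and one month of returns
port_beta = np.array([0.5, 1.0, 1.6])        # low, mid, high beta portfolios
port_ret = np.array([0.008, 0.010, 0.013])

b_new = shrink(port_beta)     # shrunk average beta of each portfolio
adj_ret = port_ret / b_new    # beta one adjustment: lever by 1/E[beta_new]

# The low-beta portfolio is levered up and the high-beta one levered down,
# so each adjusted portfolio has an expected beta of one, and the
# long-short (low minus high) has an expected beta of zero
ls_expected_beta = b_new[0] / b_new[0] - b_new[2] / b_new[2]
```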

&lt;p&gt;Here are the results for the 10 value-weighted portfolios, but setting the expected beta of each portfolio to one:
&lt;img src=&quot;/Post_Images/6_5_2019/vwlev.PNG&quot; alt=&quot;fig&quot; /&gt;
Now the beta anomaly is &lt;em&gt;even&lt;/em&gt; stronger.  The low beta portfolio has a return over 11% higher per year than the high beta portfolio.&lt;/p&gt;

&lt;p&gt;The effect gets downright huge when we do this with the equal-weighted portfolios:
&lt;img src=&quot;/Post_Images/6_5_2019/ewlev.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;time-varying-leverage&quot;&gt;Time-Varying Leverage&lt;/h1&gt;

&lt;p&gt;Making the long-short portfolio market neutral amounts to putting time-varying weights on the long/short sides of the portfolio.  The tables above show that this time-varying leverage matters a lot for the size of the beta anomaly.  While this is too much to get into for a blog post, I believe it has to do with the fact that betas compress toward one in bad times, so this would trim leverage when we would expect the market as a whole to perform poorly.&lt;/p&gt;

&lt;p&gt;This table shows the average weight on the long/short side by decade: 
&lt;img src=&quot;/Post_Images/6_5_2019/leverage.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And here is a plot with the leverage by month:
&lt;img src=&quot;/Post_Images/6_5_2019/tvlev.png&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;time-period&quot;&gt;Time Period&lt;/h1&gt;

&lt;p&gt;Looking at the leverage over time made me want to understand if the beta-sorted portfolios do particularly well in any given decade.  Here are the mean returns, Sharpe ratios and alphas of the long-short portfolios I’ve constructed: 
&lt;img src=&quot;/Post_Images/6_5_2019/dec10.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Nothing really stands out to me here, but what happens if we apply the beta one adjustment?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/6_5_2019/dec10lev.PNG&quot; alt=&quot;fig&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Now the value-weighted factor has a positive mean return, Sharpe ratio, and alpha in every decade.  So betting against beta seems to have been consistently profitable.  But, for this to be true, we need to make the beta one adjustment.&lt;/p&gt;

&lt;h1 id=&quot;embedded-options&quot;&gt;Embedded Options&lt;/h1&gt;

&lt;p&gt;The explanation I put forth in the paper was that the time-varying leverage makes the portfolio look like an option-writing strategy.  Here is a plot of the monthly returns of each long-short portfolio against the market:
&lt;img src=&quot;/Post_Images/6_5_2019/fignolev.png&quot; alt=&quot;fig&quot; /&gt;
Note that the BAB factor actually looks somewhat like a bet against volatility.  It has higher returns when the market has a return close to zero, and lower returns otherwise.  This is not true, however, for the value-weighted or equal-weighted strategies I constructed.&lt;/p&gt;

&lt;p&gt;Here is the same picture, applying the beta one adjustment to my portfolios:
&lt;img src=&quot;/Post_Images/6_5_2019/figlev.png&quot; alt=&quot;fig&quot; /&gt;
We can see that this is the secret sauce which makes the long short portfolio look like a short straddle.&lt;/p&gt;

&lt;p&gt;The slopes are statistically significantly different depending on whether the market has a positive or negative return:
&lt;img src=&quot;/Post_Images/6_5_2019/vwzoomin.png&quot; alt=&quot;drawing&quot; width=&quot;500&quot; /&gt;&lt;/p&gt;
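&lt;p&gt;The up- vs. down-market slope comparison can be sketched with a piecewise regression.  The payoff below is a simulated short-straddle-like series, not the actual BAB returns:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(4)

def up_down_slopes(port, mkt):
    """Separate slopes for positive and negative market months:
    port = a + b_up * mkt * 1{mkt>0} + b_down * mkt * 1{mkt<=0} + e."""
    up = np.where(mkt > 0, mkt, 0.0)
    down = np.where(mkt <= 0, mkt, 0.0)
    X = np.column_stack([np.ones_like(mkt), up, down])
    a, b_up, b_down = np.linalg.lstsq(X, port, rcond=None)[0]
    return b_up, b_down

# A short-straddle-like payoff: loses when the market moves a lot either way
mkt = rng.normal(0.0, 0.04, 1000)
port = 0.01 - 0.8 * np.abs(mkt) + rng.normal(0, 0.01, 1000)

b_up, b_down = up_down_slopes(port, mkt)
```

For a payoff like this, the estimated slope is negative in up markets and positive in down markets, which is the kink visible in the plot above.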

&lt;p&gt;Now, why is this happening?
&lt;img src=&quot;/Post_Images/6_5_2019/ls.png&quot; alt=&quot;fig&quot; /&gt;
The long side looks like the market plus a short call.  The short side looks like the market plus a long call.  So it seems like there are embedded options in these strategies.  Now, why would this earn a premium? It is well documented that option-writing strategies earn a premium, see e.g. Jurek and Stafford. “The cost of capital for alternative investments.” (2015).  My proposed explanation was that the alpha in BAB is due to an option-writing strategy embedded in the portfolio.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;I think the evidence here makes it clear that there is something special about the BAB portfolio constructed by Frazzini and Pedersen.  The basic value-weighted long-short beta portfolio has a negative mean.  But, when we use equal weights, or apply the time-varying leverage to the long and short sides, we can get a large positive mean.  We also get an interesting pattern: the BAB portfolio, constructed using only equities, has payoffs that look like an option-writing strategy.  I think this is why people are still studying the beta anomaly over 40 years after Fischer Black’s paper – it is not clear whether or not it really exists, and if it does exist, why it occurs!&lt;/p&gt;
</description>
        <pubDate>Tue, 02 Jun 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/06/02/beta.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/06/02/beta.html</guid>
        
        
      </item>
    
      <item>
        <title>Early vs. Late Resolution of Uncertainty</title>
        <description>&lt;p&gt;My new paper, 
&lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3571409&quot;&gt;
ETFs, Learning, and Information in Stock Prices
&lt;/a&gt; studies the effect of introducing an ETF into a model with endogenous information acquisition.  Introducing the ETF has two competing effects on learning: (1) the ETF makes it easier for investors to take aggressive bets on stock-specific information, making it more attractive to learn about stock-specific risks (2) the ETF allows investors to directly trade on systematic risk, which cannot otherwise be diversified away.  This makes it more attractive to learn about systematic risk and makes it less attractive to pay a fixed cost and become informed at all.&lt;/p&gt;

&lt;p&gt;Which of these forces dominates in equilibrium depends on the economy’s risk bearing capacity, a function of the share of agents who decide to become informed and risk aversion.  If risk aversion is sufficiently high, introducing the ETF decreases learning about stock-specific risks, increases learning about systematic risk and decreases the share of agents who become informed.  The model provides an explanation for the empirical results in my &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3243910&quot;&gt; job market paper&lt;/a&gt;, where I link the rise of passive ownership to decreased firm-specific information in stock prices.&lt;/p&gt;

&lt;p&gt;My model mainly builds on two papers: (1) &lt;a href=&quot;https://www.jstor.org/stable/1911659?seq=1#metadata_info_tab_contents&quot;&gt; A Noisy Rational Expectations Equilibrium for Multi-Asset Securities Markets &lt;/a&gt; by Anat Admati  and (2) &lt;a href=&quot;https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11412&quot;&gt; A Rational Theory of Mutual Funds’ Attention Allocation
&lt;/a&gt; by Kacperczyk, Van Nieuwerburgh and Veldkamp. While working through the second paper, I noticed the following sentence defining the agents’ objective function when deciding how to allocate their limited attention, &lt;em&gt;“The objective is \(−E[lnE_j[exp(−\rho W_j)]]\).”&lt;/em&gt;  For context, the outer expectation is taken with respect to time 0 information, while the inner expectation is taken with respect to time 1 information.  Going forward, I will use \(E_t\) to denote the expectation with respect to the time \(t\) information set.&lt;/p&gt;

&lt;p&gt;I didn’t understand why the authors put the \(ln\) inside the outer expectation i.e. why the objective function was not \(-E_0[E_1[exp(-\rho W_j)]]\), or standard expected utility.  Reading through the appendix of the paper, I saw that putting this \(ln\) inside the outer expectation, &lt;em&gt;“is a transformation that induces a preference for early resolution of uncertainty.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Still confused, I re-read chapter 8 of Information Choice in Macroeconomics and Finance (Veldkamp (2011)) and found this line in section 8.2.4 explaining what the \(ln\) is doing: &lt;em&gt;“This formulation of utility
is related to Epstein and Zin’s (1989) preference for early resolution of uncertainty.”&lt;/em&gt;  This was the last hint I needed to put the pieces together.  In this post, I will walk through how to derive this utility function as a special case of Epstein-Zin preferences, why this transformation introduces a preference for an early resolution of uncertainty, and why this matters in models with endogenous information acquisition.&lt;/p&gt;

&lt;h1 id=&quot;the-setup&quot;&gt;The Setup&lt;/h1&gt;

&lt;p&gt;In my model, terminal wealth i.e. wealth at time 2 is defined as: 
\(w_{2,j}=\left(w_{0,j}-1_{informed,j} c\right) +\mathbf{q_j} (\mathbf{z} - \mathbf{p})\)
where \(w_{0,j}\) is initial wealth, \(1_{informed,j}\) is an indicator of whether or not agent \(j\) decides to become informed and \(c\) is the cost (in dollars) of becoming informed. If an agent decides to become informed, they get a signal at time 1 about the asset payoffs, and the precision of this signal depends on how they allocate their limited attention.  \(\mathbf{q_j} (\mathbf{z} - \mathbf{p})\) is investor \(j\)’s trading profits: their portfolio \(\mathbf{q_j}\), times the payoff of each asset \(\mathbf{z}\) minus the price \(\mathbf{p}\).  The bold typeface denotes a vector, which is needed because there are multiple risky assets.&lt;/p&gt;

&lt;p&gt;At time 1, agent \(j\) submits demand \(\mathbf{q_j}\) to maximize expected utility over time two wealth:
\(U_{1,j}=E_{1,j}[-exp(-\rho w_{2,j})]\), so investors have Constant Absolute Risk Aversion (CARA) or exponential utility at time 1, with risk aversion \(\rho\).&lt;/p&gt;

&lt;p&gt;At time 0, agent \(j\) decides whether or not to pay \(c\) and become informed.  If informed, agent \(j\) allocates attention to maximize time 0 expected utility.  In line with Kacperczyk et al. (2016), I define agents’ time 0 objective function as: \(-E_0[ln(-U_{1,j})]/\rho\), which simplifies to: \(U_0 = E_0\left[E_{1,j}[w_{2,j}]-0.5 \rho Var_{1,j}[w_{2,j}] \right]\).  This simplification comes from the fact that (1) \(w_{2,j}\) is normally distributed in the model, and (2) \(E[exp(a x)]=exp(a \mu_x + \frac{1}{2}a^2 \sigma_x^2)\) where \(x\) is a normally distributed random variable with mean \(\mu_x\) and standard deviation \(\sigma_x\), and \(a\) is a constant.&lt;/p&gt;
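&lt;p&gt;Fact (2) is easy to check numerically: simulate normal wealth, compute the certainty equivalent \(-ln(E[exp(-\rho w)])/\rho\), and compare it to the mean-variance form.  The parameter values below are arbitrary:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(5)

# Check E[exp(a x)] = exp(a*mu + 0.5*a^2*sigma^2) for normal x, which is
# what turns CARA utility into the mean-variance form E[w] - 0.5*rho*Var[w]
rho, mu, sigma = 2.0, 1.0, 0.5
w = rng.normal(mu, sigma, 1_000_000)

mc = -np.log(np.mean(np.exp(-rho * w))) / rho  # simulated certainty equivalent
closed_form = mu - 0.5 * rho * sigma**2        # mean-variance form
```

The two agree up to Monte Carlo error, confirming that maximizing the log-transformed objective is the same as maximizing mean minus \(0.5\rho\) times variance when wealth is normal.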

&lt;h1 id=&quot;formulation-as-recursive-utility&quot;&gt;Formulation as Recursive Utility&lt;/h1&gt;

&lt;p&gt;To see how the log transformation, \(-E_0[ln(-U_{1,j})]/\rho\), induces a preference for an early resolution of uncertainty relative to expected utility \(E_0[U_{1,j}]\), we can follow the hint in Veldkamp (2011) and cast preferences as recursive utility (Epstein and Zin (1989)).&lt;/p&gt;

&lt;p&gt;Start by writing down the formulation of Epstein-Zin preferences on the &lt;a href=&quot;https://en.wikipedia.org/wiki/Epstein%E2%80%93Zin_preferences&quot;&gt;
Wikipedia Page:&lt;/a&gt; 
\(U_t = \left[(1-\beta)c_t^\alpha + \beta \mu_t\left(U_{t+1}\right)^\alpha \right]^{1/\alpha}\)
where the elasticity of intertemporal substitution (EIS) is \(1/(1-\alpha)\) and \(\mu_t\) is the certainty equivalent (CE) operator.   Note, the Wikipedia page uses \(\rho\) instead of \(\alpha\), but I’ve re-labeled it to avoid confusion with risk aversion.&lt;/p&gt;

&lt;p&gt;In my setting, all consumption happens at time 2, so let’s simplify \(U_t\) from the perspective of \(t=0\).  To further simplify things, set \(\beta=1\).&lt;/p&gt;

&lt;p&gt;Choose the von Neumann-Morgenstern utility index \(u(w)=-exp(-\rho w)\) i.e. the CARA utility at time 1 described above. We can then define the certainty equivalent operator \(\mu_t(U_{t+1})=E_t\left[-ln(-U_{t+1})/\rho\right]\).  This \(\mu_t\) is just the inverse function of the von Neumann-Morgenstern utility index. It makes sense to call this a certainty equivalent operator because it returns the amount of dollars for sure that would yield the same utility as the risky investment. Recall that \(U_{1,j}=E_{1,j}[-exp(-\rho w_{2,j})]\) and wealth is normally distributed, so \(U_{1,j}=-exp(-\rho E_{1,j}[w_{2,j}]+0.5 \rho^2 Var_{1,j}[w_{2,j}])\).&lt;/p&gt;

&lt;p&gt;Setting \(\beta=1\) (as above, so the consumption term drops out, since all consumption happens at time 2): \(U_0 = \left[\mu_0\left(U_{1}\right)^\alpha \right]^{1/\alpha}\)&lt;/p&gt;

&lt;p&gt;Substituting in the expression for the CE operator:  \(U_0 = \left[ E_0\left[-ln(-U_{1})/\rho\right]^\alpha  \right]^{1/\alpha}\)&lt;/p&gt;

&lt;p&gt;Putting in our expression for \(U_1\): \(U_0 = \left[ E_0\left[-ln(exp(-\rho E_{1,j}[w_{2,j}]+0.5 \rho^2 Var_{1,j}[w_{2,j}]))/\rho\right]^\alpha \right]^{1/\alpha}\)&lt;/p&gt;

&lt;p&gt;Simplifying: \(U_0 = \left[ E_0\left[ \left(E_{1,j}[w_{2,j}]-0.5 \rho Var_{1,j}[w_{2,j}]\right) \right]^\alpha  \right]^{1/\alpha}\)&lt;/p&gt;

&lt;p&gt;Setting \(\alpha=1\) i.e. an infinite EIS:
\(U_0 = E_0\left[ \left(E_{1,j}[w_{2,j}]-0.5 \rho Var_{1,j}[w_{2,j}]\right) \right]\)&lt;/p&gt;

&lt;p&gt;which matches Equation 6 in Kacperczyk et al. (2016)! This shows that we can derive their utility function from Epstein-Zin preferences, but it does not make totally clear what this transformation has to do with an early vs. late resolution of uncertainty.&lt;/p&gt;

&lt;h1 id=&quot;another-way-to-view-the-recursive-formulation&quot;&gt;Another Way to View the Recursive Formulation&lt;/h1&gt;

&lt;p&gt;To make things clearer, let’s work with a more well-known version of Epstein-Zin preferences in &lt;a href=&quot;http://people.bu.edu/sgilchri/teaching/EC%20745%20Fall%202013/Lecture%20Slides/lecture6_recursive_preferences.pdf&quot;&gt; Simon Gilchrist’s lecture notes&lt;/a&gt; (these were very helpful when I first learned about recursive utility!): \(V_t = \left((1-\beta)c_t^{1-\rho}+\beta[E_t(V_{t+1}^{1-\alpha})]^{(1-\rho)/(1-\alpha)}\right)^{1/(1-\rho)}\)&lt;/p&gt;

&lt;p&gt;Setting \(t=0\), \(c_0=0\), \(c_1=0\), \(\beta=1\): \(V_0 = \left([E_0(V_{1}^{1-\alpha})]^{(1-\rho)/(1-\alpha)}\right)^{1/(1-\rho)}\)&lt;/p&gt;

&lt;p&gt;Notice that \(c^{1-\alpha}\) is a version of Constant Relative Risk Aversion (CRRA) utility. CRRA utility simplifies to log utility if relative risk aversion is equal to 1.  So, with this in mind, set \(\alpha=1\): \(V_0 = \left(exp[E_0(ln[V_1])]^{(1-\rho)}\right)^{1/(1-\rho)}\)&lt;/p&gt;

&lt;p&gt;Set \(\rho=0\) (i.e. infinite EIS as we did above): \(V_0=exp[E_0(ln[V_1])]\)&lt;/p&gt;

&lt;p&gt;This is equivalent to maximizing: \(\quad V_0=E_0(ln[V_1])\) because \(exp(x)\) is a monotone function.&lt;/p&gt;

&lt;p&gt;In my setting: \(\quad V_1=E_1[exp(-\rho w)]\) i.e. time 1 utility times -1&lt;/p&gt;

&lt;p&gt;So the final maximization problem is: \(\quad V_0=-E_0(ln[-V_1])\)&lt;/p&gt;

&lt;p&gt;With Epstein-Zin, there is a preference for an early resolution of uncertainty if \(\alpha&amp;gt;(1/EIS)\).  As set up here, \(\alpha=1\) and \(1/EIS=0\), so agents have a preference for early resolution of uncertainty.  For expected utility, we would set \(\alpha=0\), and then there would be no preference for early resolution of uncertainty.&lt;/p&gt;

&lt;h1 id=&quot;why-this-matters&quot;&gt;Why this Matters&lt;/h1&gt;

&lt;p&gt;As I said above, \(U_0 = E_0\left[ \left(E_{1,j}[w_{2,j}]-0.5 \rho Var_{1,j}[w_{2,j}]\right) \right]\) introduces a preference for the early resolution of uncertainty (see e.g. Veldkamp, 2011).  There are two types of uncertainty in the model: (1) uncertainty about payoffs at \(t=2\), conditional on signals at \(t=1\) (2) uncertainty about the portfolio you will hold at \(t=1\), from the perspective of \(t=0\).  With these preferences, agents are not averse to uncertainty resolved before time 2 i.e. they are not averse to the uncertainty about which portfolio they will hold.&lt;/p&gt;

&lt;p&gt;An intuitive way to see this is that increases in expected variance, \(E_0\left[ Var_{1,j}[w_{2,j}]\right]\), linearly decrease utility.  With expected utility, \(-E_0[E_1[exp(-\rho w)]]\) simplifies to \(-E_0[exp\left(-\rho E_{1,j}[w_{2,j}]+0.5 \rho^2 Var_{1,j}[w_{2,j}]\right)]\). Because variance is always positive, utility decreases faster than linearly in expected variance.&lt;/p&gt;
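&lt;p&gt;A small numeric check of this linear-vs-convex comparison (the risk aversion, mean, and variance grid below are arbitrary):&lt;/p&gt;

```python
import numpy as np

rho, mu = 2.0, 1.0
variances = np.array([0.1, 0.2, 0.3])

# Recursive (mean-variance) objective: linear in expected variance
mv = mu - 0.5 * rho * variances

# Expected-utility objective: -exp(-rho*mu + 0.5*rho^2*var),
# which falls faster than linearly as variance rises
eu = -np.exp(-rho * mu + 0.5 * rho**2 * variances)

# Utility loss per 0.1 of extra variance: constant for MV, growing for EU
mv_steps = np.diff(mv)
eu_steps = np.diff(eu)
```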

&lt;p&gt;A more nuanced argument requires a discussion of why learning about particular risks is useful. The expected excess portfolio return achieved through learning depends on the covariance between your portfolio \(q\) and asset payoffs net of prices, \(cov(q,z-p)\).  Specializing in learning about one asset leads to a high covariance between payoffs and holdings of that asset.  The actual portfolio you end up holding, however, can deviate substantially from the time 0 expected portfolio.  Learning a little about every risk leads to smaller deviations between the realized and time 0 expected portfolio, but also lowers \(cov(q,z-p)\).&lt;/p&gt;

&lt;p&gt;With expected utility, investors are averse to time 1 portfolio uncertainty (i.e. the risk that signals will lead them to take aggressive bets), so they do not like portfolios that deviate substantially from \(E_0\left[q\right]\).  The utility cost of higher uncertainty from specialization offsets the utility benefit of higher portfolio returns, removing the “planning benefit” enjoyed under the mean-variance specification.&lt;/p&gt;

&lt;p&gt;Recursive utility investors are not averse to risks resolved before time 2, so specialization is a low-risk strategy: it lowers time 2 portfolio risk by loading the portfolio heavily on an asset whose payoff risk will be reduced by learning.&lt;/p&gt;

&lt;p&gt;This also shows why it is desirable to introduce a preference for an early resolution of uncertainty in endogenous learning models.  Think about an investor who wants to learn about AAPL.  They do this so they can hold a lot of Apple (AAPL) when it does well, and hold little AAPL when it does poorly.  An expected utility investor would be hesitant to learn too much about AAPL, because the fact that their portfolio will vary substantially depending on the signal they get seems risky to them.&lt;/p&gt;

&lt;h1 id=&quot;wrap-up&quot;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;In this post, I showed (1) why adding a \(\ln\) inside the outer expectation induces a preference for an early resolution of uncertainty and (2) why this is useful in models with endogenous learning.  For those interested in these topics, I recommend reading Costis Skiadas’ Asset Pricing Theory (2009) textbook for a discussion of recursive utility and Laura Veldkamp’s Information Choice in Macroeconomics and Finance (2011) for a discussion of endogenous learning models.&lt;/p&gt;
</description>
        <pubDate>Mon, 25 May 2020 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2020/05/25/earlyvslate.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2020/05/25/earlyvslate.html</guid>
        
        
      </item>
    
      <item>
        <title>Text Analysis in Python</title>
        <description>&lt;p&gt;Over the past few years, there has been a boom in the number of papers utilizing text as data.&lt;/p&gt;

&lt;p&gt;For example, a &lt;a href=&quot;http://onlinelibrary.wiley.com/doi/10.3982/ECTA11182/abstract&quot;&gt;
   recent paper
  &lt;/a&gt; by Koijen, Philipson and Uhlig uses SEC filings to measure healthcare firms’ exposure to regulation risk.&lt;/p&gt;

&lt;p&gt;I taught a guest lecture last month on using Python to analyze large text datasets, with a focus on SEC filings. The slides are posted below, and I hope they are useful for people just getting into the subject.&lt;/p&gt;

&lt;p&gt;
  &lt;a href=&quot;/images/python_text_analysis.pdf&quot; target=&quot;_blank&quot;&gt;
    A copy of the presentation can be found here (PDF)
  &lt;/a&gt;
&lt;/p&gt;

</description>
        <pubDate>Sun, 25 Feb 2018 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2018/02/25/text.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2018/02/25/text.html</guid>
        
        
      </item>
    
      <item>
        <title>Constructing a Stock Screener</title>
        <description>&lt;h1 id=&quot;value-and-momentum&quot;&gt;Value and Momentum&lt;/h1&gt;

&lt;p&gt;Two of the most discussed effects in the asset pricing literature are Value and Momentum.
Let’s start with the definitions:  &lt;br /&gt;&lt;/p&gt;

&lt;p&gt;1) Value Effect: Stocks with high book-to-market have historically outperformed stocks with low book-to-market  &lt;br /&gt;
Book-to-Market: (book value of common equity + deferred taxes and investment credit - book value of preferred stock)/(market value of equity).  The higher the book-to-market (BM), the “cheaper” the stock, as you are getting more book value for every dollar you invest.  &lt;br /&gt;&lt;/p&gt;

&lt;p&gt;2) Momentum Effect: Stocks with high returns over the previous year (winners) have historically outperformed stocks with low returns over the previous year (losers). Past-year returns are calculated as the cumulative returns from \(t-12\) to \(t-2\) (month \(t-1\) is excluded). &lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Bonus – Size Effect: Stocks with low market capitalization (price \(\times\) shares outstanding) have historically outperformed stocks with high market capitalization.  Both the value and momentum effects are stronger among small stocks. &lt;br /&gt;&lt;/p&gt;
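&lt;p&gt;The momentum signal in definition 2 can be sketched in pandas.  For a series of monthly returns, the signal at month \(t\) is the cumulative return over the 11 months from \(t-12\) to \(t-2\):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

def momentum_signal(monthly_returns):
    """Cumulative return from t-12 to t-2, skipping the most recent month.

    monthly_returns is a pd.Series of simple monthly returns in time order.
    """
    gross = 1.0 + monthly_returns
    # product of 11 consecutive gross returns, minus 1 to get a simple return
    cum = gross.rolling(window=11).apply(np.prod, raw=True) - 1.0
    # shift by 2 so the window ends at t-2 and month t-1 is excluded
    return cum.shift(2)
```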

&lt;h1 id=&quot;trading-on-value-and-momentum&quot;&gt;Trading on Value and Momentum&lt;/h1&gt;

&lt;p&gt;Although there are many ways to construct portfolios based on the value and momentum effects, I will start with the following baseline for back-testing (&lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2961979&quot; title=&quot;b1&quot;&gt;see here for details&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;1) Start with monthly CRSP data, and adjust for delisting returns.  I do this by (1) setting the return to the delisting return in the month the firm delists, if the delisting return is non-missing; (2) setting the delisting return to -0.3 (-30%) if the delisting return is missing and the delisting code is 500, 520, 551-573, 574, 580 or 584; (3) setting the delisting return to -1 (-100%) if the delisting return is missing and the delisting code does not belong to the list above.  This adjustment is important for calculating returns in the extreme value and momentum portfolios, as these firms have a higher-than-average likelihood of delisting.&lt;/p&gt;
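&lt;p&gt;A minimal pandas sketch of the delisting adjustment in step 1.  The column names (ret, dlret, dlstcd) follow common CRSP conventions, but may differ in your extract:&lt;/p&gt;

```python
import pandas as pd

# codes treated as partial (-30%) losses, per the list in step 1
PARTIAL_LOSS_CODES = set(range(551, 574)).union({500, 520, 574, 580, 584})

def delisting_adjusted_return(row):
    """Apply the three delisting rules from step 1 to one stock-month row."""
    if pd.isna(row["dlstcd"]):      # not a delisting month: keep the return
        return row["ret"]
    if pd.notna(row["dlret"]):      # rule (1): use the delisting return
        return row["dlret"]
    if row["dlstcd"] in PARTIAL_LOSS_CODES:
        return -0.3                 # rule (2): assume a -30% return
    return -1.0                     # rule (3): assume a total loss
```

Applied row-wise with `df["ret"] = df.apply(delisting_adjusted_return, axis=1)`.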

&lt;p&gt;2) Each month, select ordinary common shares which have non-stale prices, non-missing returns, non-missing shares outstanding and are traded on the major exchanges (NYSE, AMEX and NASDAQ).&lt;/p&gt;

&lt;p&gt;2a) For the value portfolios, select firms with a non-missing book-to-market.&lt;br /&gt;
2b) For the momentum portfolios, select firms with a non-missing/non-stale price at \(t-12\), and no more than 4 missing returns between \(t-12\) and \(t-2\).&lt;/p&gt;

&lt;p&gt;3) Select only NYSE firms, then calculate percentiles of the sorting variables among these firms each month – these are the breakpoints we are going to use to form portfolios.  For example: If you want to form 5 value portfolios, calculate the 20th, 40th, 60th and 80th percentiles of book-to-market for every month in your sample.&lt;/p&gt;

&lt;p&gt;4) Merge these breakpoints back into the rest of the data (all NYSE, AMEX and NASDAQ firms).  Then sort into portfolios based on the breakpoints.  Back to the five value portfolios example: Firms with a book-to-market below the 20th percentile will be put into portfolio 1, firms with a book-to-market between the 20th and 40th percentiles will be put into portfolio 2, etc.
Note: The portfolios will not all have the same number of firms.  This is because the average NASDAQ/AMEX firm is different from the average NYSE firm, so the percentiles will not line up exactly.  Using NYSE breakpoints prevents small firms from exerting an undue influence on the results.&lt;/p&gt;
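&lt;p&gt;Steps 3 and 4 can be sketched as follows, for the five value portfolios.  The column names are illustrative, and momentum sorting works the same way with a different sorting variable:&lt;/p&gt;

```python
import pandas as pd

def assign_value_portfolios(df):
    """Steps 3-4: NYSE-only breakpoints, applied to firms from all exchanges.

    Assumes columns month, exchange and bm (book-to-market); the names are
    illustrative, not actual CRSP variable names.
    """
    pieces = []
    for month, grp in df.groupby("month"):
        # step 3: percentiles of book-to-market among NYSE firms only
        nyse_bm = grp.loc[grp["exchange"] == "NYSE", "bm"]
        cuts = nyse_bm.quantile([0.2, 0.4, 0.6, 0.8]).tolist()
        bins = [-float("inf")] + cuts + [float("inf")]
        # step 4: sort every firm (NYSE, AMEX, NASDAQ) using the NYSE cutoffs
        grp = grp.copy()
        grp["portfolio"] = pd.cut(grp["bm"], bins=bins, labels=[1, 2, 3, 4, 5])
        pieces.append(grp)
    return pd.concat(pieces)
```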

&lt;p&gt;5) Now you have the portfolio assignments at the end of each month.  Portfolios are rebalanced monthly, so these assignments will be used for the following month to prevent a look-ahead bias.&lt;/p&gt;

&lt;p&gt;6) All portfolios are value-weighted using last month’s ending market capitalization.  This also prevents a look-ahead bias.&lt;/p&gt;
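&lt;p&gt;A sketch of the lagging and value-weighting in steps 5 and 6, again with illustrative column names:&lt;/p&gt;

```python
import pandas as pd

def value_weighted_returns(df):
    """Steps 5-6: lag assignments and market caps one month, then value-weight.

    Assumes columns permno, month, portfolio, me (market equity) and ret.
    """
    df = df.sort_values(["permno", "month"]).copy()
    g = df.groupby("permno")
    # last month's assignment and market cap: this avoids look-ahead bias
    df["port_lag"] = g["portfolio"].shift(1)
    df["me_lag"] = g["me"].shift(1)
    df = df.dropna(subset=["port_lag", "me_lag"])

    def vw(grp):
        # weight each return by last month's market cap
        return (grp["ret"] * grp["me_lag"]).sum() / grp["me_lag"].sum()

    return df.groupby(["month", "port_lag"]).apply(vw)
```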

&lt;p&gt;7) Convert each portfolio return to an excess return by subtracting the monthly risk-free rate, which can be found &lt;a href=&quot;http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html&quot; title=&quot;b2&quot;&gt;at Ken French’s data library.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;8) Form a factor portfolio by subtracting the extreme portfolios from one another.  Using the 5 value portfolio example: Subtract the return of portfolio 1 (lowest BM) from the return of portfolio 5 (highest BM).  This difference is also an excess return.  Call these portfolios high-minus-low (HML).&lt;/p&gt;

&lt;p&gt;9) Following the observation of &lt;a href=&quot;http://pages.stern.nyu.edu/~lpederse/papers/ValMomEverywhere.pdf&quot; title=&quot;b2&quot;&gt;Value and Momentum Everywhere&lt;/a&gt;, form a “Combo” portfolio which is an equal-weighted average of the value and momentum portfolios.  Because the value and momentum effects deliver positive excess returns, and are negatively correlated, a combination of the two should on average outperform either strategy on its own.&lt;/p&gt;
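&lt;p&gt;The Combo construction in step 9, and the Sharpe ratio comparison behind it, can be sketched as:&lt;/p&gt;

```python
import numpy as np

def annualized_sharpe(monthly_excess):
    """Annualized Sharpe ratio from an array of monthly excess returns."""
    return np.sqrt(12.0) * monthly_excess.mean() / monthly_excess.std(ddof=1)

def combo(value_hml, momentum_hml):
    """Step 9: equal-weighted average of the two factor returns."""
    return 0.5 * (value_hml + momentum_hml)
```

Because the two factors are negatively correlated, much of their month-to-month variation cancels in the average while the mean excess return does not, which is what raises the Combo Sharpe ratio.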

&lt;h1 id=&quot;evaluating-portfolio-performance&quot;&gt;Evaluating Portfolio Performance&lt;/h1&gt;

&lt;p&gt;I am going to start by evaluating performance in the simplest way possible: CAPM alpha – which is a measure of average returns that cannot be explained by a portfolio’s covariance with the market portfolio.&lt;/p&gt;

&lt;p&gt;To calculate this, run the following regression:&lt;/p&gt;

&lt;p&gt;\begin{equation}
R^e_{p,t}=\alpha + \beta R^e_{m,t} + \epsilon_{p,t}
\end{equation}&lt;/p&gt;

&lt;p&gt;where \(R^e_{p,t}\) is the excess return on the portfolio of interest, \(R^e_{m,t}\) is the excess return on the market and \(\alpha\) denotes the CAPM alpha we are trying to measure.&lt;/p&gt;

&lt;p&gt;The table below presents the CAPM alphas and corresponding t-Statistics for 10 portfolios formed on value and momentum using data from 1970-2016. All quantities are annualized.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/2_1_2018/table1.PNG&quot; alt=&quot;Figure 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For both value and momentum, the CAPM alpha is almost monotonically increasing from the low-BM/loser portfolios to the high-BM/winner portfolios.  The high-minus-low portfolios both generate positive CAPM alphas, although the CAPM alpha for value is only marginally significant.  As expected, the negative correlation between value and momentum gives the Combo portfolio a higher Sharpe Ratio (average excess return/standard deviation of excess returns) than either portfolio on its own.&lt;/p&gt;

&lt;h1 id=&quot;conditioning-on-size&quot;&gt;Conditioning on Size&lt;/h1&gt;

&lt;p&gt;In this section I am going to use an alternative portfolio construction: At step 3 above, first divide NYSE firms into two groups – above and below median market capitalization.  Then, calculate the breakpoints for value and momentum within each of these two groups.  This is useful for understanding how the value and momentum effects differ between large and small firms.&lt;/p&gt;

&lt;p&gt;I then run the same CAPM regression as above.  The table below presents the CAPM alphas and corresponding t-Statistics for \(2 \times 5\) portfolios formed on size and value/momentum, using data from 1970-2016.  All quantities are annualized.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/Post_Images/2_1_2018/table2.PNG&quot; alt=&quot;Figure 2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As above, the CAPM alpha is almost monotonically increasing from the low-BM/loser portfolios to the high-BM/winner portfolios within both the small and large firm groups.  As mentioned previously, the CAPM alphas for value and momentum are larger for the group of smaller firms.&lt;/p&gt;

&lt;h1 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h1&gt;

&lt;p&gt;With the basics established, you can refine the stock screener by:&lt;/p&gt;

&lt;p&gt;1) Using less stale book-to-market data&lt;/p&gt;

&lt;p&gt;2) Accounting for mis-measurement of book-to-market with intangible assets&lt;/p&gt;

&lt;p&gt;3) Accounting for heterogeneity across industries&lt;/p&gt;

&lt;p&gt;4) Accounting for “junky” stocks that get picked up by a value filter&lt;/p&gt;

&lt;p&gt;5) Imposing restrictions, such as on the value-weighted portfolio beta&lt;/p&gt;

</description>
        <pubDate>Thu, 01 Feb 2018 00:00:00 +0000</pubDate>
        <link>http://marcosammon.com/2018/02/01/stockscreen.html</link>
        <guid isPermaLink="true">http://marcosammon.com/2018/02/01/stockscreen.html</guid>
        
        
      </item>
    
  </channel>
</rss>
