LakeWater Finance Blog
Opinions of a finance PhD student whose office [used to] overlook Lake Michigan. <br /> Copyright Marco Sammon
http://marcosammon.com/
Thu, 01 Nov 2018 22:45:24 +0000
Jekyll v3.7.4
Research Updates
<p>Today, I made the following updates to my research <a href="http://marcosammon.com/research/">page</a>:</p>
<p>1) Uploaded my new paper, <em>Passive Ownership and the Stock Market</em>. This paper shows that rising passive ownership has reduced the firm-specific earnings information in stock prices.</p>
<p>2) Added slides for <em>What Triggers National Stock Market Jumps?</em>, presented at <a href="https://site.stanford.edu/2018/session-6">SITE</a> in August 2018. The biggest new finding is that volatility and trading volume are lower after jumps with high clarity. We define clarity as the first principal component of four measures: (1) agreement across newspapers describing the same jump, (2) how confidently the journalist advanced their explanation, (3) how easy it was to categorize the article, and (4) the share of newspapers that give an explanation for the jump (i.e., the share of newspapers that did not have ‘unknown & no explanation’ coding).</p>
<p>3) Added an abstract for a new paper, <em>Trade Policy Uncertainty and Stock Market Performance</em>, joint with Marcelo Bianconi and Federico Esposito. The paper is still in preparation, but a draft should be uploaded soon. We use the U.S. government's decision to grant China Permanent Normal Trade Relations (PNTR) in 2000 as a resolution of tariff uncertainty. The main result is that stocks more exposed to tariff uncertainty before the shock have relatively low returns after the uncertainty is resolved.</p>
<p>Our theoretical model shows three main channels through which the resolution of tariff uncertainty could affect stock returns. The first is a competition effect: higher tariffs on Chinese goods imply higher prices for those goods, which makes US goods relatively cheaper for US consumers and increases profits for US firms. The second is a direct input effect: higher tariffs on Chinese goods imply a higher cost of intermediate inputs for US firms, and thus lower profits. The third is an indirect input effect: because expensive Chinese intermediate inputs make US goods more expensive, US goods sold in China become more expensive relative to Chinese goods, so US firms lose market share and earn lower profits. Our goal is to isolate and measure the effect of each of these mechanisms on realized stock returns.</p>
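<p>For readers curious about the clarity index in item 2, here is a minimal sketch of taking a first principal component in Python with NumPy. It is not the code used in the paper: the input matrix is simulated and hypothetical, each column stands in for one of the four measures, and the sign flip at the end assumes every measure is coded so that larger values mean a clearer explanation.</p>

```python
import numpy as np


def first_principal_component(measures):
    """Score each observation on the first principal component of its columns.

    measures: (n_jumps, n_measures) array, one column per clarity ingredient.
    Columns are standardized first so no single measure dominates by scale.
    """
    X = np.asarray(measures, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # The sign of a principal component is arbitrary; orient it so that higher
    # scores line up with higher values of the (assumed clarity-increasing) inputs
    if loadings.sum() < 0:
        loadings = -loadings
    return Z @ loadings, loadings


# Hypothetical data: four noisy measures driven by one common "clarity" factor
rng = np.random.default_rng(0)
latent = rng.standard_normal(500)
measures = latent[:, None] + 0.3 * rng.standard_normal((500, 4))
scores, loadings = first_principal_component(measures)
```

<p>Standardizing first means the component is computed from the correlation matrix rather than the covariance matrix, so a measure recorded on a larger scale (such as a rating versus a share) does not dominate the index.</p>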
Tue, 04 Sep 2018 00:00:00 +0000
http://marcosammon.com/2018/09/04/research.html
Rewriting History
<p>Today, I re-read all of my old posts and made at least one change to each. Although most changes were minor, it was a good exercise to review my thinking from two years ago and see how my views have completely changed on some topics and stayed the same on others.</p>
Tue, 04 Sep 2018 00:00:00 +0000
http://marcosammon.com/2018/09/04/history.html
Text Analysis in Python
<p>Over the past few years, there has been a boom in the number of papers utilizing text as data.</p>
<p>For example, a <a href="http://onlinelibrary.wiley.com/doi/10.3982/ECTA11182/abstract">
recent paper
</a> by Koijen, Philipson and Uhlig uses SEC filings to measure healthcare firms’ exposure to regulatory risk.</p>
<p>I taught a guest lecture last month on using Python to analyze large text datasets, with a focus on SEC filings. The slides are posted below, and I hope they are useful for people just getting into the subject.</p>
<p>
<a href="/images/python_text_analysis.pdf" target="_blank">
A copy of the presentation can be found here (PDF)
</a>
</p>
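<p>As a small taste of the kind of exercise covered in the lecture, the sketch below measures how often a set of keywords appears in a block of filing text, using only the Python standard library. The keyword list is a made-up placeholder, not the dictionary from the Koijen, Philipson and Uhlig paper, and real SEC filings would need HTML stripping and other cleaning before this step.</p>

```python
import re
from collections import Counter

# Hypothetical keyword list, for illustration only
REGULATION_TERMS = {"regulation", "regulatory", "medicare", "medicaid", "reimbursement"}


def tokenize(text):
    """Lower-case the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def term_share(text, terms=REGULATION_TERMS):
    """Share of words in `text` that belong to `terms`."""
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[term] for term in terms) / len(words)


sample = "Medicare reimbursement rates and regulatory changes affect our revenue."
print(term_share(sample))
```

<p>Normalizing by the total word count keeps long and short filings comparable; the same function can be mapped over thousands of documents once they are loaded as strings.</p>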
Sun, 25 Feb 2018 00:00:00 +0000
http://marcosammon.com/2018/02/25/text.html
Constructing a Stock Screener
<h1 id="value-and-momentum">Value and Momentum</h1>
<p>Two of the most discussed effects in the asset pricing literature are Value and Momentum.
Let’s start with the definitions: <br /></p>
<p>1) Value Effect: Stocks with high book-to-market have historically outperformed stocks with low book-to-market <br />
Book-to-Market: (book value of common equity + deferred taxes and investment credit - book value of preferred stock)/(market value of equity). The higher the book-to-market (BM), the “cheaper” the stock, as you are getting more book value for every dollar you invest. <br /></p>
<p>2) Momentum Effect: Stocks with high returns over the previous year (winners) have historically outperformed stocks with low returns over the previous year (losers). Past-year returns are calculated as the cumulative returns from <script type="math/tex">t-12</script> to <script type="math/tex">t-2</script> (month <script type="math/tex">t-1</script> is excluded). <br /></p>
<p>Bonus – Size Effect: Stocks with low market capitalization (price <script type="math/tex">\times</script> shares outstanding) have historically outperformed stocks with high market capitalization. Both the value and momentum effects are stronger among small stocks. <br /></p>
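<p>Both sorting variables are simple to compute once the inputs are in hand. The Python sketch below assumes the book equity components and a list of 12 monthly returns (t-12 through t-1) are already available; the function names are hypothetical, the non-stale price check at t-12 is left out, and compounding only the non-missing returns is one possible treatment of gaps, not necessarily the one used for the results below.</p>

```python
import math


def book_to_market(be_common, deferred_tax_credit, pref_stock, market_equity):
    """(book common equity + deferred taxes and investment credit - preferred) / market equity."""
    return (be_common + deferred_tax_credit - pref_stock) / market_equity


def momentum_signal(monthly_returns, max_missing=4):
    """Cumulative return from t-12 to t-2.

    monthly_returns: 12 returns ordered from month t-12 to month t-1;
    None marks a missing return. Month t-1 is dropped.
    """
    window = monthly_returns[:-1]  # t-12 ... t-2 (11 months)
    observed = [r for r in window if r is not None]
    if len(window) - len(observed) > max_missing:
        return None  # too many missing returns to form a signal
    # Assumption: compound the available returns, skipping missing months
    return math.prod(1 + r for r in observed) - 1
```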
<h1 id="trading-on-value-and-momentum">Trading on Value and Momentum</h1>
<p>Although there are many ways to construct portfolios based on the value and momentum effects, I will start with the following baseline for back-testing (<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2961979" title="b1">see here for details</a>):</p>
<p>1) Start with monthly CRSP data, and adjust for delisting returns. I do this by (1) setting the return to the delisting return in the month that the firm delists, if the delisting return is non-missing, (2) setting the delisting return to -0.3 (-30%) if the delisting return is missing and the delisting code is 500, 520, 551-573, 574, 580 or 584, and (3) setting the delisting return to -1 (-100%) if the delisting return is missing and the delisting code does not belong to the list above. This is important for calculating returns in the extreme value and momentum portfolios, as these firms have a higher-than-average likelihood of delisting.</p>
<p>2) Each month, select ordinary common shares which have non-stale prices, non-missing returns, non-missing shares outstanding and are traded on the major exchanges (NYSE, AMEX and NASDAQ).</p>
<p>2a) For the value portfolios, select firms with a non-missing book-to-market.<br />
2b) For the momentum portfolios, select firms with a non-missing/non-stale price at t-12, and no more than 4 missing returns between t-12 and t-2.</p>
<p>3) Select only NYSE firms, then calculate percentiles of the sorting variables among these firms each month – these are the breakpoints we are going to use to form portfolios. For example: If you want to form 5 value portfolios, calculate the 20th, 40th, 60th and 80th percentiles of book-to-market for every month in your sample.</p>
<p>4) Merge these breakpoints back into the rest of the data (all NYSE, AMEX and NASDAQ firms). Then sort into portfolios based on the breakpoints. Back to the five value portfolios example: Firms with a book-to-market below the 20th percentile will be put into portfolio one, firms with a book-to-market between the 20th and 40th percentiles will be put into portfolio 2, etc.
Note: The portfolios will not all have the same number of firms. This is because the average NASDAQ/AMEX firm is different from the average NYSE firm, so the percentiles will not line up exactly. Using NYSE breakpoints prevents small firms from exerting an undue influence on the results.</p>
<p>5) Now you have the portfolio assignments at the end of each month. Portfolios are rebalanced monthly, so these assignments will be used for the following month to prevent a look-ahead bias.</p>
<p>6) All portfolios are value-weighted using last month’s ending market capitalization. This also prevents a look-ahead bias.</p>
<p>7) Convert each portfolio return to an excess return by subtracting the monthly risk-free rate, which can be found <a href="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html" title="b2">at Ken French’s data library.</a></p>
<p>8) Form a factor portfolio by subtracting the extreme portfolios from one another. Using the 5 value portfolio example: subtract portfolio 1 (lowest BM) from portfolio 5 (highest BM). This difference is also an excess return. Call these portfolios high-minus-low (HML).</p>
<p>9) Following the observation of <a href="http://pages.stern.nyu.edu/~lpederse/papers/ValMomEverywhere.pdf" title="b2">Value and Momentum Everywhere</a>, form a “Combo” portfolio, which is an equal-weighted average of the value and momentum portfolios. Because the value and momentum strategies both deliver positive average excess returns and are negatively correlated, a combination of the two should on average have a higher Sharpe ratio than either strategy on its own.</p>
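<p>Steps 1 through 9 can be sketched for a single month's cross-section with NumPy. This is a simplified illustration under stated assumptions, not the code behind the tables below: the inputs are plain arrays rather than CRSP files, a firm that did not delist is marked with a delisting code of <code>None</code>, and firms sitting exactly on a breakpoint are pushed into the higher portfolio.</p>

```python
import numpy as np

# Delisting codes that get a -30% return when the delisting return is missing:
# 500, 520, 551-573, 574, 580 and 584, as listed in step 1
MINUS_30_CODES = {500, 520, 580, 584} | set(range(551, 575))


def adjust_for_delisting(ret, dlret, dlstcd):
    """Step 1: return to use in a month; dlstcd is None if the firm did not delist."""
    if dlstcd is None:
        return ret
    if dlret is not None:
        return dlret
    return -0.3 if dlstcd in MINUS_30_CODES else -1.0


def assign_portfolios(signal, is_nyse, n_port=5):
    """Steps 3-4: NYSE-only percentile breakpoints applied to every firm (1 = lowest)."""
    signal = np.asarray(signal, dtype=float)
    is_nyse = np.asarray(is_nyse, dtype=bool)
    pct = np.linspace(0, 100, n_port + 1)[1:-1]  # 20, 40, 60, 80 when n_port = 5
    breakpoints = np.percentile(signal[is_nyse], pct)
    return np.digitize(signal, breakpoints) + 1


def value_weighted_returns(port, next_ret, formation_mktcap, n_port=5):
    """Steps 5-6: next month's return on each portfolio, weighted by formation-month market cap."""
    port = np.asarray(port)
    next_ret = np.asarray(next_ret, dtype=float)
    w = np.asarray(formation_mktcap, dtype=float)
    return np.array([np.sum(w[port == p] * next_ret[port == p]) / np.sum(w[port == p])
                     for p in range(1, n_port + 1)])


def high_minus_low(vw_rets, rf):
    """Steps 7-8: excess returns, then top portfolio minus bottom portfolio."""
    excess = np.asarray(vw_rets, dtype=float) - rf
    return excess[-1] - excess[0]


def combo(hml_value, hml_momentum):
    """Step 9: equal-weighted average of the value and momentum factors."""
    return 0.5 * (hml_value + hml_momentum)
```

<p>Looping these functions over every month, with portfolio assignments from month t applied to returns in month t+1, gives the time series used in the regressions below. An empty portfolio would produce a division by zero, which a full implementation would need to handle.</p>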
<h1 id="evaluating-portfolio-performance">Evaluating Portfolio Performance</h1>
<p>I am going to start by evaluating performance in the simplest way possible: CAPM alpha – which is a measure of average returns that cannot be explained by a portfolio’s covariance with the market portfolio.</p>
<p>To calculate this, run the following regression:</p>
<p>\begin{equation}
R^e_{p,t}=\alpha + \beta R^e_{m,t} + \epsilon_{p,t}
\end{equation}</p>
<p>where <script type="math/tex">R^e_{p,t}</script> is the excess return on the portfolio of interest, <script type="math/tex">R^e_{m,t}</script> is the excess return on the market and <script type="math/tex">\alpha</script> denotes the CAPM alpha we are trying to measure.</p>
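<p>A minimal sketch of this regression with NumPy is below. It assumes monthly excess returns, annualizes alpha by multiplying by 12, and uses classical (homoskedastic) OLS standard errors; the results in the table may use a different standard error correction, and the simulated returns are purely illustrative.</p>

```python
import numpy as np


def capm_alpha(port_excess, mkt_excess, periods_per_year=12):
    """OLS of portfolio excess returns on market excess returns.

    Returns (annualized alpha, beta, t-statistic of alpha) using classical
    homoskedastic OLS standard errors.
    """
    y = np.asarray(port_excess, dtype=float)
    x = np.asarray(mkt_excess, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    # Annualizing rescales alpha but leaves its t-statistic unchanged
    return coef[0] * periods_per_year, coef[1], t_alpha


# Hypothetical monthly data: true alpha of 0.2% per month and beta of 0.8
rng = np.random.default_rng(42)
mkt = rng.normal(0.005, 0.045, 600)
port = 0.002 + 0.8 * mkt + rng.normal(0, 0.001, 600)
alpha_ann, beta, t_alpha = capm_alpha(port, mkt)
```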
<p>The table below presents the CAPM alphas and corresponding t-Statistics for 10 portfolios formed on value and momentum using data from 1970-2016. All quantities are annualized.</p>
<p><img src="/Post_Images/2_1_2018/table1.PNG" alt="Figure 1" /></p>
<p>For both value and momentum, the CAPM alpha is almost monotonically increasing from the low-BM/loser portfolios to the high-BM/winner portfolios. The high-minus-low portfolios both generate positive CAPM alphas, although the CAPM alpha for value is only marginally significant. As expected, the negative correlation between value and momentum gives the COMBO portfolio a higher Sharpe Ratio (average excess return/standard deviation of excess returns) than either portfolio on its own.</p>
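<p>The diversification argument can be checked with the formula for the variance of a 50/50 portfolio. In the sketch below the inputs are hypothetical: two strategies with the same monthly mean and volatility and a correlation of -0.5. The combination has the same average return as each leg but half the volatility, so its Sharpe ratio doubles.</p>

```python
import math


def annualized_sharpe(mean_monthly, sd_monthly):
    """Average excess return / standard deviation, annualized with sqrt(12)."""
    return mean_monthly / sd_monthly * math.sqrt(12)


def combo_sharpe(mu_1, mu_2, sd_1, sd_2, corr):
    """Sharpe ratio of a 50/50 mix of two strategies' monthly excess returns."""
    mean = 0.5 * (mu_1 + mu_2)
    var = 0.25 * (sd_1 ** 2 + sd_2 ** 2 + 2 * corr * sd_1 * sd_2)
    return annualized_sharpe(mean, math.sqrt(var))


# Hypothetical inputs: two identical strategies with correlation -0.5
single = annualized_sharpe(0.005, 0.03)
mixed = combo_sharpe(0.005, 0.005, 0.03, 0.03, -0.5)
```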
<h1 id="conditioning-on-size">Conditioning on Size</h1>
<p>In this section I am going to use an alternative portfolio construction: At step 3 above, first divide NYSE firms into two groups, above and below median market capitalization. Then, calculate the breakpoints for value and momentum within each of these two groups. This is useful for understanding how the value and momentum effects differ among large and small firms.</p>
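<p>A sketch of this conditional sort for one month, in the same style as a single sort on NYSE breakpoints, is below. The inputs are hypothetical arrays, the function name is illustrative, and treating a firm exactly at the NYSE median as small is an arbitrary choice.</p>

```python
import numpy as np


def size_conditional_portfolios(signal, mktcap, is_nyse, n_port=5):
    """Split at the NYSE median market cap, then sort on the signal within each half.

    Breakpoints in both steps use NYSE firms only and are applied to all firms.
    Returns (size_group, portfolio): size_group 1 = small, 2 = big;
    portfolio 1 = lowest signal within the size group.
    """
    signal = np.asarray(signal, dtype=float)
    mktcap = np.asarray(mktcap, dtype=float)
    is_nyse = np.asarray(is_nyse, dtype=bool)

    big = mktcap > np.median(mktcap[is_nyse])
    pct = np.linspace(0, 100, n_port + 1)[1:-1]
    portfolio = np.zeros(len(signal), dtype=int)
    for group in (False, True):
        members = big == group
        breakpoints = np.percentile(signal[members & is_nyse], pct)
        portfolio[members] = np.digitize(signal[members], breakpoints) + 1
    return big.astype(int) + 1, portfolio
```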
<p>I then run the same CAPM regression as above. The table below presents the CAPM alphas and corresponding t-Statistics for <script type="math/tex">2 \times 5</script> portfolios formed on size and value/momentum, using data from 1970-2016. All quantities are annualized.</p>
<p><img src="/Post_Images/2_1_2018/table2.PNG" alt="Figure 2" /></p>
<p>As above, the CAPM alpha is almost monotonically increasing from the low-BM/loser portfolios to the high-BM/winner portfolios within both the small and large firm groups. As mentioned previously, the CAPM alphas for value and momentum are larger for the group of smaller firms.</p>
<h1 id="next-steps">Next Steps</h1>
<p>With the basics established, you can refine the stock screener by:</p>
<p>1) Using less stale book-to-market data</p>
<p>2) Accounting for mis-measurement of book-to-market with intangible assets</p>
<p>3) Accounting for heterogeneity across industries</p>
<p>4) Accounting for “junky” stocks that get picked up by a value filter</p>
<p>5) Imposing restrictions, such as on the value-weighted portfolio beta</p>
Thu, 01 Feb 2018 00:00:00 +0000
http://marcosammon.com/2018/02/01/stockscreen.html
Expected Discounted Utility as Recursive Utility
<h1 id="expected-discounted-utility">Expected Discounted Utility</h1>
<p>Expected discounted utility is one of the most common ways to represent preferences over risky consumption plans. Consider an agent, sitting at time <script type="math/tex">t</script>, who will receive a consumption stream <script type="math/tex">c</script> until <script type="math/tex">T</script>:
\begin{equation}
U_t(c)= E_t \left[ \sum \limits_{s=t}^T \beta^{s-t}u_s(c_s)\right]
\end{equation}
where <script type="math/tex">\beta</script> is the discount factor and <script type="math/tex">u</script> is a within-period utility function. A problem with expected discounted utility is that it cannot separate preferences for smoothing over time from preferences for smoothing across states.<br />
Consider the following example:
You are stranded on an island at <script type="math/tex">t=0</script>. A man comes in a boat and offers you a choice of two deals: (1) every morning he flips a coin, and if it comes up heads, you get a bushel of bananas that day; (2) he flips a coin once today, and if it comes up heads you get a bushel of bananas every day until time <script type="math/tex">T</script>, while if it comes up tails you get no bananas until time <script type="math/tex">T</script>.
It is intuitive that plan 2 is riskier than plan 1, but under expected discounted utility, for any <script type="math/tex">\beta</script> and <script type="math/tex">u</script> the agent is indifferent between the two plans:
\begin{equation}
U(\text{Plan 1}) = \sum\limits_{t=0}^T \beta^t \frac{u_t(1)+u_t(0)}{2}= U(\text{Plan 2})
\end{equation}</p>
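<p>The indifference result can be verified by brute force. The Python sketch below enumerates every coin-flip history for a short horizon and computes expected discounted utility for both plans; the horizon, the discount factor and the two utility functions are arbitrary illustrative choices.</p>

```python
import itertools
import math


def expected_discounted_utility(plan, beta, u):
    """plan: list of (consumption path, probability) pairs."""
    return sum(prob * sum(beta ** t * u(c) for t, c in enumerate(path))
               for path, prob in plan)


T = 5        # days 0..T (hypothetical horizon)
beta = 0.9   # hypothetical discount factor

# Plan 1: an independent coin flip every morning (all 2^(T+1) histories)
plan_1 = [(path, 0.5 ** (T + 1)) for path in itertools.product([0.0, 1.0], repeat=T + 1)]
# Plan 2: a single flip at t = 0 decides consumption for every day
plan_2 = [((1.0,) * (T + 1), 0.5), ((0.0,) * (T + 1), 0.5)]

for u in (math.sqrt, lambda c: -math.exp(-2 * c)):
    u1 = expected_discounted_utility(plan_1, beta, u)
    u2 = expected_discounted_utility(plan_2, beta, u)
    closed_form = sum(beta ** t * (u(1.0) + u(0.0)) / 2 for t in range(T + 1))
    print(f"{u1:.6f} {u2:.6f} {closed_form:.6f}")
```

<p>Each line prints the same number three times: linearity of the expectation means only the marginal distribution of each day's consumption matters, which is exactly why the correlation across days in plan 2 is invisible to this utility.</p>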
<h1 id="recursive-utility">Recursive Utility</h1>
<p>The only way to even partially separate preferences for smoothing over time from preferences for smoothing across states is to use recursive utility (see Skiadas 2009 for a complete proof; this is an if-and-only-if relationship).
Recursive utility has two ingredients: the aggregator <script type="math/tex">f(t,c_t,\upsilon_t \left(U_{t+1}(c)\right))</script>, which determines preferences over deterministic plans (time smoothing), and the conditional certainty equivalent <script type="math/tex">\upsilon_t(c)</script> (state smoothing). The steps below formulate expected discounted utility as recursive utility.
For simplicity, assume the within-period utility function does not depend on time, so we can drop the subscript <script type="math/tex">t</script> on <script type="math/tex">u</script>. Now, impose a desirable property on the utility function: normalization. A utility is normalized if <script type="math/tex">\bar{U}(\alpha)=\alpha</script> for every deterministic plan <script type="math/tex">\alpha</script>. Normalize the expected discounted utility <script type="math/tex">U</script> defined above as <script type="math/tex">\bar{U}(c)=\psi^{-1}(U(c))</script>, where <script type="math/tex">\psi_t(\alpha)=\sum\limits_{s=t}^T \beta^{s-t} u(\alpha)</script>. Basically, <script type="math/tex">\psi</script> gives the discounted utility of the deterministic plan <script type="math/tex">\alpha</script>, so <script type="math/tex">\psi^{-1}</script> gives the deterministic <script type="math/tex">\alpha</script> required to make the agent indifferent between the potentially risky plan <script type="math/tex">c</script> and the deterministic plan <script type="math/tex">\alpha</script>.<br />
For expected discounted utility, the aggregator is: <script type="math/tex">f(t,x,y)=\psi^{-1}_t (u(x)+\beta \psi_{t+1}(y))</script>.
The intuition is that with expected discounted utility, the agent’s utility from plan <script type="math/tex">c</script> is a weighted average of their consumption today, and the utility of the equivalent deterministic plan until <script type="math/tex">T</script>.
For utility to be normalized, the aggregator must satisfy <script type="math/tex">f(t,\alpha,\alpha)=\alpha</script> for any deterministic plan <script type="math/tex">\alpha</script>. Put this into the equation above to solve for <script type="math/tex">\psi</script>:
<script type="math/tex">f(t,x,x)=\psi_t^{-1}( u(x) + \beta \psi_{t+1}(x)) = x</script>. Then, apply <script type="math/tex">\psi_t</script> to both sides:
\begin{equation}
u(x) + \beta \psi_{t+1} (x) = \psi_t(x)
\end{equation}
Fix <script type="math/tex">\psi_T=u</script>, and interpret terminal consumption value <script type="math/tex">c_T</script> as consuming <script type="math/tex">c_T</script> for the rest of time (equivalently, imagine letting <script type="math/tex">T</script> go to infinity). This implies we can drop the subscripts on the <script type="math/tex">\psi</script>:
\begin{equation}
u(x)=\psi(x)-\beta\psi(x)
\end{equation}</p>
<p>Rearranging yields <script type="math/tex">\psi(x)=(1-\beta)^{-1}u(x)</script> and <script type="math/tex">\psi^{-1}(x)=u^{-1}((1-\beta)x)</script>. Putting this back into our expression above for <script type="math/tex">f(t,x,y)</script> implies:
\begin{equation}
f(t,x,y)=u^{-1}((1-\beta)u(x)+\beta u(y))
\end{equation}
Given the way the aggregator is defined, we can see that <script type="math/tex">f</script> depends on the curvature of <script type="math/tex">u</script>; in other words, the within-period utility function <script type="math/tex">u</script> influences preferences for smoothing over time. This also gives intuition for how to make an agent not indifferent between deal (1) and deal (2) described above: <script type="math/tex">f</script> needs to be defined independently of <script type="math/tex">u</script> (or <script type="math/tex">\upsilon</script>).</p>
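<p>The derivation can be checked numerically. The sketch below assumes the square-root utility <code>u(c) = sqrt(c)</code>, a discount factor of 0.9, and the convention above that terminal consumption is consumed forever (so the time-T utility equals terminal consumption). It confirms the normalization f(x, x) = x, then solves both banana plans backward using the aggregator and a certainty equivalent built from the same u. Both plans come out with the same value, which illustrates why f has to be specified separately from u to break the indifference.</p>

```python
import math

BETA = 0.9  # hypothetical discount factor


def u(c):
    return math.sqrt(c)


def u_inv(v):
    return v * v  # inverse of sqrt on [0, inf)


def aggregator(x, y, beta=BETA):
    """f(t, x, y) = u^{-1}((1 - beta) u(x) + beta u(y))."""
    return u_inv((1 - beta) * u(x) + beta * u(y))


def certainty_equivalent(values, probs):
    """upsilon: the sure amount with the same expected u as the lottery."""
    return u_inv(sum(p * u(v) for v, p in zip(values, probs)))


def plan_1_value(T):
    """Ex-ante value of plan 1 (fresh coin flip each day), solved backward.

    Terminal consumption c_T is treated as consumed forever, so U_T = c_T.
    """
    value = {0.0: 0.0, 1.0: 1.0}  # U_T as a function of that day's bananas
    for _ in range(T):
        cont = certainty_equivalent([value[0.0], value[1.0]], [0.5, 0.5])
        value = {c: aggregator(c, cont) for c in (0.0, 1.0)}
    return certainty_equivalent([value[0.0], value[1.0]], [0.5, 0.5])


def plan_2_value(T):
    """Plan 2: one flip at t = 0; each branch is a deterministic plan."""
    heads, tails = 1.0, 0.0
    for _ in range(T):
        heads, tails = aggregator(1.0, heads), aggregator(0.0, tails)
    return certainty_equivalent([heads, tails], [0.5, 0.5])
```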
<h1 id="conclusion">Conclusion</h1>
<p>Recursive utility is a general framework, with expected discounted utility as a special case. For a deeper look at recursive utility, see <a href="https://press.princeton.edu/titles/8906.html">Asset Pricing Theory</a> by Costis Skiadas.</p>
Wed, 09 Nov 2016 00:00:00 +0000
http://marcosammon.com/2016/11/09/euru.html
Does Adding a Constant Always Increase R-Squared?
<p>Recall the formula for R-squared from your first statistics class:
\begin{equation}
R^2=\frac{\sum\limits_{i=1}^N (\hat{Y_i}-\overline{Y})^2}{\sum\limits_{i=1}^N (\hat{Y_i}-\overline{Y})^2 + \sum\limits_{i=1}^N (\hat{Y_i}-Y_i)^2}= \frac{SS_{model}}{SS_{model}+SS_{residual}}
\end{equation}
In that same class, you are taught that adding more regressors cannot decrease R-squared, even if they have no relationship to the dependent variable. <br />
This, however, does not apply to the constant term. At first pass, this seems hard to believe: an unconstrained model should always do at least as well as a constrained model. <br />
The catch is that the variance explained by the constant term is not included in the calculation of R-squared: we subtract <script type="math/tex">\overline{Y}</script> when calculating <script type="math/tex">SS_{model}</script>. <br />
Without a constant, R-squared is conventionally computed as if <script type="math/tex">\overline{Y}</script> were zero. This convention increases the model sum of squares, and dropping the constant increases the residual sum of squares. When <script type="math/tex">\overline{Y}</script> is far from zero relative to the spread of <script type="math/tex">Y</script>, the model sum of squares effect dominates, and <script type="math/tex">R^2</script> is pushed towards one.<br />
I discovered this today, regressing realized GDP on forecasted GDP. Although the sum of squared errors is nearly identical for both models, the model sum of squares is much larger for the no-intercept case:</p>
<p><img src="/Post_Images/10_17_2016/gdp.png" alt="fig" /></p>
<p>My takeaway: Be careful when writing your own regress command in Matlab, or any other language. Omitting a constant term can drastically change R-squared.</p>
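<p>The effect is easy to reproduce. The NumPy sketch below computes R-squared exactly as in the formula above, subtracting the sample mean for the model with a constant and setting it to zero for the model without one. The data are simulated stand-ins with a large mean, not the GDP series from the figure.</p>

```python
import numpy as np


def r_squared(y, X, centered):
    """R^2 = SS_model / (SS_model + SS_residual), as in the formula above.

    centered=True subtracts the sample mean of y (regression with a constant);
    centered=False sets that mean to zero, the usual convention without a constant.
    """
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef
    center = y.mean() if centered else 0.0
    ss_model = np.sum((y_hat - center) ** 2)
    ss_resid = np.sum((y_hat - y) ** 2)
    return ss_model / (ss_model + ss_resid)


# Hypothetical data: a "forecast" with a large mean, and a noisy "realization"
rng = np.random.default_rng(0)
forecast = 100 + rng.normal(0, 1, 200)
realized = forecast + rng.normal(0, 1, 200)

with_const = r_squared(realized, np.column_stack([np.ones(200), forecast]), centered=True)
no_const = r_squared(realized, forecast[:, None], centered=False)
```

<p>Here the residuals are similar in both fits, but the uncentered model sum of squares is dominated by the level of the series (around 100 squared per observation), which is what drives the no-constant R-squared towards one.</p>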
Mon, 17 Oct 2016 00:00:00 +0000
http://marcosammon.com/2016/10/17/constant_regression.html
Commentary on Understanding Unit Rooters: A Helicopter Tour by Sims and Uhlig (1991)
<p>Sims and Uhlig argue: although classical (frequentist) <script type="math/tex">p</script>-values are asymptotically equivalent to Bayesian posteriors, they should not be interpreted as probabilities. This is because the equivalence breaks down in non-stationary models. <br />
The paper uses small samples, with <script type="math/tex">T=100</script>; this post examines how the results change with <script type="math/tex">T=10,000</script>, when the asymptotic behavior kicks in.</p>
<h1 id="the-setup">The Setup</h1>
<p>Consider a simple AR(1) model:
\begin{equation}
y_t=\rho y_{t-1} + \epsilon_t
\end{equation}
To simplify things, suppose <script type="math/tex">\epsilon_t \sim N(0,1)</script>. Classical inference suggests that for <script type="math/tex">% <![CDATA[
|\rho|<1 %]]></script>, the estimator is asymptotically normal and converges at rate <script type="math/tex">\sqrt{T}</script>:
\begin{equation}
\sqrt{T}(\hat{\rho}-\rho) \rightarrow^{L} N(0,(1-\rho^2))
\end{equation}
For <script type="math/tex">\rho=1</script>, however, we get a totally different distribution, which converges at rate <script type="math/tex">T</script>, instead of rate <script type="math/tex">\sqrt{T}</script>:
\begin{equation}
T(\hat{\rho}-\rho)=T(\hat{\rho}-1) \rightarrow^{L} \frac{(1/2)([W(1)]^2-1)}{\int_0^1 [W(r)]^2 dr}
\end{equation}
where <script type="math/tex">W(r)</script> is a standard Brownian motion and <script type="math/tex">W(1)</script> is its value at <script type="math/tex">r=1</script>. Although the expression looks complicated, it is easier to visualize once you see that <script type="math/tex">[W(1)]^2</script> is a <script type="math/tex">\chi^2(1)</script> variable. The distribution is left skewed: the probability that a <script type="math/tex">\chi^2(1)</script> variable is less than one is about 0.68, so the numerator is negative about 68% of the time, and large realizations of <script type="math/tex">[W(1)]^2</script> in the numerator get down-weighted by a large denominator (it is the same Brownian motion in the numerator and denominator).
In the paper, the authors choose 31 values of <script type="math/tex">\rho</script> from 0.8 to 1.1 in increments of 0.01. For each <script type="math/tex">\rho</script> they simulate 10,000 samples of the AR(1) model described above with <script type="math/tex">T=100</script>. Finally, they run an OLS regression of <script type="math/tex">y_t</script> on <script type="math/tex">y_{t-1}</script> in each sample to get the distribution of <script type="math/tex">\hat{\rho}</script> (the OLS estimator of <script type="math/tex">\rho</script>). Below I show the distribution of <script type="math/tex">\hat{\rho}</script> for selected values of <script type="math/tex">\rho</script>:</p>
<p><img src="/Post_Images/10_1_2016/rhgr2.png" alt="fig" /></p>
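<p>The simulation design can be sketched in a few lines of NumPy. Starting every path at zero and drawing shocks with a fixed seed are assumptions made here for illustration; the paper's exact implementation details may differ.</p>

```python
import numpy as np

# 31 values of rho from 0.8 to 1.1 in steps of 0.01, as in the paper
RHO_GRID = np.linspace(0.8, 1.1, 31)


def simulate_rho_hat(rho, T=100, n_samples=10_000, seed=0):
    """OLS estimates of rho from n_samples simulated AR(1) paths of length T."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, T))
    y = np.zeros((n_samples, T + 1))  # every path starts at y_0 = 0 (assumption)
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + eps[:, t - 1]
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # Regression of y_t on y_{t-1} with no constant
    return np.sum(y_lag * y_cur, axis=1) / np.sum(y_lag ** 2, axis=1)
```

<p>Calling <code>simulate_rho_hat(rho)</code> for each value in <code>RHO_GRID</code> reproduces the 31 by 10,000 design; passing <code>T=10_000</code> gives the large-sample version discussed below, although the path array then holds about 100 million entries per call, so lowering <code>n_samples</code> or looping in batches may be necessary.</p>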
<p>Another way to think about the data is to look at the distribution of <script type="math/tex">\rho</script> given an observed value of <script type="math/tex">\hat{\rho}</script>. For <script type="math/tex">\hat{\rho}=0.95</script>, this distribution is roughly symmetric about 0.95:</p>
<p><img src="/Post_Images/10_1_2016/opp952.png" alt="fig" /></p>
<p>The authors’ objection to interpreting <script type="math/tex">p</script>-values as probabilities is that if we observe <script type="math/tex">\hat{\rho}=0.95</script>, we can reject the null of <script type="math/tex">\rho=0.9</script> but fail to reject the null of <script type="math/tex">\rho=1</script> (think about the area in the tails after normalizing the distribution to integrate to 1), even though the distribution of <script type="math/tex">\rho</script> given <script type="math/tex">\hat{\rho}</script> is roughly symmetric about 0.95:</p>
<p><img src="/Post_Images/10_1_2016/hypo_test2.png" alt="fig" /></p>
<p>The problem is distortion by irrelevant information: Values of <script type="math/tex">\hat{\rho}</script> much below 0.95 are more likely given <script type="math/tex">\rho=1</script> than are values of <script type="math/tex">\hat{\rho}</script> much above 0.95 given <script type="math/tex">\rho=0.9</script>. This is irrelevant as we have already observed <script type="math/tex">\hat{\rho}=0.95</script>, so we know it is not far above or below.</p>
<p>The prior required to generate these results (i.e. the prior that would let us interpret <script type="math/tex">p</script>-values as posterior probabilities) is sample dependent. Usually, classical inference is asymptotically equivalent to Bayesian inference using a flat prior, but it is not the case here. The authors show that classical analysis is implicitly putting progressively more weight on values of <script type="math/tex">\rho</script> above one as <script type="math/tex">\hat{\rho}</script> gets closer to 1.</p>
<h1 id="testing-with-larger-samples">Testing with Larger Samples</h1>
<p>At first, I found the results counter-intuitive. The first figure above shows that the skewness arrives gradually in finite samples. This is strange, because the asymptotic properties of <script type="math/tex">\hat{\rho}</script> are only non-normal for <script type="math/tex">\rho=1</script>. I figured this was the result of using small samples of <script type="math/tex">T=100</script>. Under a flat prior, the distribution of <script type="math/tex">\rho</script> given the data and <script type="math/tex">\epsilon_t</script> having variance of 1 is:
\begin{equation}
\rho \sim N(\hat{\rho}, (\sum\limits_{t=1}^T y_{t-1}^2)^{-1})
\end{equation}
This motivates my intuition for why the skewness arrives slowly: even for small samples, as <script type="math/tex">\rho</script> gets close to 1, <script type="math/tex">\sum\limits_{t=1}^T y_{t-1}^2</script> can be very large.</p>
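<p>To see this intuition numerically, the sketch below simulates AR(1) paths with T=100 and computes the flat-prior posterior mean and standard deviation for each one, assuming unit error variance and a starting value of zero. Paths with rho = 1 have a much larger sum of squared lagged values, and therefore a tighter posterior, than paths with rho = 0.9.</p>

```python
import numpy as np


def simulate_ar1(rho, T, rng):
    """One path of y_t = rho * y_{t-1} + eps_t with eps_t ~ N(0, 1) and y_0 = 0 (assumed)."""
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = rho * y[t - 1] + rng.standard_normal()
    return y


def flat_prior_posterior(y):
    """Posterior mean and sd of rho under a flat prior, with error variance known to be 1."""
    y_lag, y_cur = y[:-1], y[1:]
    sxx = y_lag @ y_lag  # sum of y_{t-1}^2
    return (y_lag @ y_cur) / sxx, 1 / np.sqrt(sxx)


rng = np.random.default_rng(0)
for rho in (0.9, 1.0):
    sds = [flat_prior_posterior(simulate_ar1(rho, 100, rng))[1] for _ in range(500)]
    print(rho, np.mean(sds))
```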
<p>I repeat their analysis, except instead of <script type="math/tex">T=100</script>, I use <script type="math/tex">T=10,000</script>. As you can see, the asymptotic behavior kicks in and the skewness arrives only at <script type="math/tex">\rho=1</script>:</p>
<p><img src="/Post_Images/10_1_2016/rhgr.png" alt="fig" /></p>
<p>I also found that for <script type="math/tex">T=10,000</script>, the distribution of <script type="math/tex">\rho</script> conditional on <script type="math/tex">\hat{\rho}</script> does not spread out more for smaller values of <script type="math/tex">\hat{\rho}</script>; that spreading is a small-sample result.</p>
<h1 id="conclusion">Conclusion</h1>
<p>The point of this paper is to show that the classical way of dealing with unit roots implicitly makes undesirable assumptions: you need a sample-dependent prior which puts more weight on high values of <script type="math/tex">\rho</script>. To a degree, the authors’ results are driven by the short length of the simulated series. The example where you reject <script type="math/tex">\rho=0.9</script> but fail to reject <script type="math/tex">\rho=1</script> wouldn’t happen in large samples, as the asymptotics kick in and the faster rate of convergence for <script type="math/tex">\rho=1</script> gives the distribution less spread.</p>
<p>For now, however, the authors’ criticism is still valid. With quarterly data from 1950 to the present you get about 260 observations. Macroeconomics will have to survive until the year 4450 for there to be 10,000 observations, and that’s a long way off.</p>
Sat, 01 Oct 2016 00:00:00 +0000
http://marcosammon.com/2016/10/01/unit_roots.html
What's in a Name?
<p>Given the success of companies like Apple, Amazon and American Express, I was curious whether a company’s name could predict its expected return. It would be worrying if this worked, as there is no fundamental reason why companies starting with A-E should outperform companies starting with, say, F-J. Stories like stock screens returning results alphabetically or investors having an aversion to certain letters seem too far-fetched. I sort into portfolios based on the first letter of a company’s name to test this.</p>
<h1 id="methodology">Methodology</h1>
<p>I get monthly stock data from 1980-2015 in CRSP, and restrict to ordinary common shares traded on major exchanges. At the end of every month, I extract the first character of a company’s name and give it a numerical value between 1 and 35: <br />
Example 1: First character of 1-800 Flowers is “1” and it is given a value of 1 (there are no names starting with 0) <br />
Example 2: First character of Apple is “A” and it is given a value of 10 <br />
Example 3: First character of Zing Technologies is “Z” and it is given a value of 35 <br />
I then sort into 10 value-weighted portfolios based on deciles of this number (i.e. the bottom 10% of values are in portfolio 1, etc.). This allows the letter breakpoints to change slightly over time with changes in the frequency of first letters. Because this is a discrete measure, even within a month, portfolios don’t necessarily have the same number of securities. I tried making the portfolios contain the same number of securities and the results were similar. <br />
In addition, I create an “HML” style factor, which is the return on portfolio 10 minus the return on portfolio 1 (reading it again, maybe ZMA, for Z minus A, would have been a better name). <br />
I also tried sorting into 6 value-weighted portfolios based on the first letter themselves: Companies starting with 1-9 are in portfolio 1 (although there are no companies like this before 1983), A-E are in portfolio 2, F-J are in portfolio 3, K-O are in portfolio 4, P-T are in portfolio 5, and U-Z are in portfolio 6. I construct another HML style factor, except this time it is portfolio 6 minus portfolio 2, because portfolio 1 is sparsely populated. Again, the portfolios will not have the same number of firms, as some first letters are more popular than others.</p>
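<p>The character-to-number mapping coincides with reading one base-36 digit, which gives a compact way to sketch both sorts in Python. This is an illustration, not the code used for the results: names beginning with a character other than a digit or letter would raise an error here, and ties at the decile breakpoints are assigned to the higher portfolio.</p>

```python
import numpy as np


def first_char_value(name):
    """Map the first character to 1-35: '1'-'9' -> 1-9, 'A' -> 10, ..., 'Z' -> 35."""
    # int(ch, 36) reads a single base-36 digit, which is exactly this mapping
    return int(name.strip()[0], 36)


def letter_group(value):
    """Six groups: 1-9, A-E, F-J, K-O, P-T, U-Z."""
    if value <= 9:
        return 1
    return min(2 + (value - 10) // 5, 6)  # U-Z has six letters, so cap at 6


def decile_portfolios(values):
    """Decile breakpoints of the character values, recomputed each month."""
    values = np.asarray(values, dtype=float)
    breakpoints = np.percentile(values, np.arange(10, 100, 10))
    return np.digitize(values, breakpoints) + 1
```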
<h1 id="empirical-results">Empirical Results</h1>
<p>The table below shows the annualized (multiplied by 12) average returns, annualized (multiplied by <script type="math/tex">\sqrt{12}</script>) standard deviation and annualized Sharpe ratio: <br /></p>
<p><img src="/Post_Images/9_16_2016_2/sumstats_1.PNG" alt="fig" /> <br /></p>
<p>There is little difference in average returns across portfolios and sorting methodologies - they all pretty much track the market. The HML factors have an insignificant alpha once you control for market exposure. <br />
For robustness, I calculated the same statistics using only data from 2000 to 2015:</p>
<p><img src="/Post_Images/9_16_2016_2/sumstats_2.PNG" alt="fig" /> <br /></p>
<p>Post 2000, with the decile portfolios, firms in portfolio 1 returned on average 2.3% more than firms in portfolio 10, and because this is based on deciles, the gap is not caused by portfolio 1 being sparsely populated. A t-test shows this difference is marginally statistically significant, with <script type="math/tex">p=0.08</script>. I thought it was being driven by a size effect (small companies’ names are more likely to start with a number), so I repeated the exercise for the largest 500 firms, and the effect went away. Even before excluding small firms, the alpha was insignificant once controlling for market, SMB and HML exposure. <br />
For the 6-portfolio sort, you can see they pretty much track the market (except for the 1-9 portfolio, which, as mentioned above, is sparsely populated): <br />
<img src="/Post_Images/9_16_2016_2/1.png" alt="fig" /> <br />
<img src="/Post_Images/9_16_2016_2/2.png" alt="fig" /></p>
<h1 id="conclusion">Conclusion</h1>
<p>When searching for new asset pricing factors, it is important to avoid data-snooping. Here, I sorted on something that shouldn’t predict returns, and it didn’t, but I’m sure if I tried 20 more random sorts, one of them would create a new (totally spurious) factor. At least for now, alphabet risk is not priced!</p>
<h1 id="update">Update</h1>
<p>In his 2016 presidential address for the American Finance Association, Campbell Harvey implemented a similar sorting on tickers. See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/jofi.12530">here</a> for the full text.</p>
Fri, 16 Sep 2016 00:00:00 +0000
http://marcosammon.com/2016/09/16/alphabet.html
Pricing a Digital Option
<p>This post is based on problems 2.10 and 2.11 in “Heard on the Street” by Timothy Falcon Crack. I was asked how to price a digital option in a job interview, and had no idea what to do!</p>
<h1 id="european-call-options">European Call Options</h1>
<p>A European call option is the right to buy an asset at the strike price, <script type="math/tex">K</script>, on the option’s expiration date, <script type="math/tex">T</script>. Exercising the call at <script type="math/tex">T</script> pays <script type="math/tex">S-K</script>, where <script type="math/tex">S</script> is the underlying price. A call is therefore only worth exercising if <script type="math/tex">S</script> is greater than <script type="math/tex">K</script>. The plot below shows the call’s payoff at expiration, <script type="math/tex">\max(S-K,0)</script>, as a function of the underlying asset’s price, with <script type="math/tex">K=100</script>:</p>
<p><img src="/Post_Images/9_15_2016/1.PNG" alt="fig" /> <br /></p>
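<p>Selling a call option with strike <script type="math/tex">K=100</script> earns you the call’s price, <script type="math/tex">c</script>, today. However, your payoff at expiration, <script type="math/tex">-\max(S-K,0)</script>, falls one-for-one with the underlying price once <script type="math/tex">S</script> exceeds <script type="math/tex">K</script>:</p>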
<p><img src="/Post_Images/9_15_2016/2.PNG" alt="fig" /> <br /></p>
<h1 id="digital-call-options">Digital Call Options</h1>
<p>A digital call option with <script type="math/tex">K=100</script> is similar - it pays off one dollar if <script type="math/tex">S\geq100</script> at expiration, and pays off zero otherwise:</p>
<p><img src="/Post_Images/9_15_2016/3.PNG" alt="fig" /> <br /></p>
<p>Suppose you have a model for pricing regular call options. If you’re using Black-Scholes, the price of the call, <script type="math/tex">c</script>, is a function of <script type="math/tex">K</script>, <script type="math/tex">S</script>, the time to expiration <script type="math/tex">T-t</script>, the volatility of the underlying asset <script type="math/tex">\sigma</script>, and the risk-free rate <script type="math/tex">r</script>:
\begin{equation}
c=F(K,S,T-t,\sigma,r)
\end{equation}
Now - suppose the model is correct. How can you use <script type="math/tex">F(K,\cdot)</script> to price the digital option?</p>
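<p>For concreteness, here is one possible <script type="math/tex">F</script>: the standard Black-Scholes price of a European call on a non-dividend-paying asset, with a continuously compounded risk-free rate. The argument order mirrors <script type="math/tex">F(K,S,T-t,\sigma,r)</script> above. The function name <code>bs_call</code> and the example inputs are illustrative, not from the post.</p>

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(K, S, tau, sigma, r):
    """Black-Scholes call price F(K, S, T-t, sigma, r); tau = T - t in years.
    Assumes no dividends and a continuously compounded rate r."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

# Hypothetical inputs: at the money, one year, 20% volatility, 5% rate.
print(round(bs_call(K=100, S=100, tau=1.0, sigma=0.2, r=0.05), 2))  # 10.45
```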
<h1 id="replicating-the-digital-option">Replicating the Digital Option</h1>
<p>The trick is to replicate the digital option’s payoff with regular calls. As a starting point, consider buying a call with <script type="math/tex">K=100</script> and selling a call with <script type="math/tex">K=101</script>:</p>
<p><img src="/Post_Images/9_15_2016/4.PNG" alt="fig" /> <br /></p>
<p>This is close to the digital option, but not exactly right. We want to make the slope at 100 steeper, so we need to buy more options. This is because a call’s payoff increases one-for-one with the underlying once the option is in the money, so with one option you are stuck with a slope of one. <br />
Consider buying two calls with <script type="math/tex">K=100</script> and selling two calls at <script type="math/tex">K=100.5</script>:</p>
<p><img src="/Post_Images/9_15_2016/5.PNG" alt="fig" /> <br /></p>
<p>Instead of a slope of 1 between 100 and 101, we now have a slope of 2 between 100 and 100.5. <br /></p>
<p>Generalizing this idea - consider a number <script type="math/tex">\epsilon>0</script>. To get a slope of <script type="math/tex">\frac{1}{\epsilon}</script>, you buy <script type="math/tex">\frac{1}{\epsilon}</script> calls at <script type="math/tex">K=100</script> and you sell <script type="math/tex">\frac{1}{\epsilon}</script> calls at <script type="math/tex">K=100+\epsilon</script>. Here’s what it looks like for <script type="math/tex">\epsilon=\frac{1}{10}</script>:</p>
<p><img src="/Post_Images/9_15_2016/6.PNG" alt="fig" /> <br /></p>
<p>Given that the slope is <script type="math/tex">\frac{1}{\epsilon}</script>, to get an infinite slope, we take the limit as <script type="math/tex">\epsilon</script> goes to zero. <br /></p>
<p>How much will the above portfolio cost? You pay <script type="math/tex">\frac{1}{\epsilon}F(100, \cdot)</script> for the <script type="math/tex">K=100</script> calls and earn <script type="math/tex">\frac{1}{\epsilon}F(100+\epsilon, \cdot)</script> from selling the <script type="math/tex">K=100+\epsilon</script> calls. As <script type="math/tex">\epsilon</script> goes to zero, the net cost is:
\begin{equation}
\lim_{\epsilon \rightarrow 0} \frac{F(100,\cdot)-F(100+\epsilon,\cdot)}{\epsilon}
\end{equation}</p>
<p>What does this look like? A derivative! It might look more familiar if I rewrite it as:</p>
<p>\begin{equation}
-\lim_{\epsilon \rightarrow 0} \frac{F(K+\epsilon)-F(K)}{\epsilon} = -\frac{\partial F}{\partial K}
\end{equation}</p>
<p>The price of the digital option is minus the derivative of <script type="math/tex">F</script> with respect to the strike price <script type="math/tex">K</script>. A higher strike makes a call less valuable, so this derivative is negative and the digital option’s price is positive.</p>
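<p>As a sanity check, assume Black-Scholes is the model behind <script type="math/tex">F</script>. For a European call on a non-dividend-paying asset, differentiating the call price with respect to the strike gives the standard digital call price <script type="math/tex">e^{-r(T-t)}N(d_2)</script>. This is the discounted risk-neutral probability of finishing in the money, where <script type="math/tex">N</script> is the standard normal CDF. The sketch below compares that closed form with the cost of the replicating portfolio, <script type="math/tex">[F(K)-F(K+\epsilon)]/\epsilon</script>, for shrinking <script type="math/tex">\epsilon</script>. The helper names and the inputs are illustrative assumptions, not from the post.</p>

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def d1_d2(K, S, tau, sigma, r):
    """Black-Scholes d1 and d2 (no dividends, continuous compounding)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)

def bs_call(K, S, tau, sigma, r):
    """Black-Scholes call price, playing the role of F(K, S, T-t, sigma, r)."""
    d1, d2 = d1_d2(K, S, tau, sigma, r)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

def digital_closed_form(K, S, tau, sigma, r):
    """-dF/dK under Black-Scholes: exp(-r * tau) * N(d2)."""
    _, d2 = d1_d2(K, S, tau, sigma, r)
    return exp(-r * tau) * norm_cdf(d2)

def replicating_cost(K, eps, S, tau, sigma, r):
    """Cost of buying 1/eps calls at K and selling 1/eps calls at K + eps."""
    return (bs_call(K, S, tau, sigma, r) - bs_call(K + eps, S, tau, sigma, r)) / eps

params = dict(S=100.0, tau=1.0, sigma=0.2, r=0.05)  # hypothetical inputs
exact = digital_closed_form(100.0, **params)
for eps in (1.0, 0.1, 0.01, 0.001):
    approx = replicating_cost(100.0, eps, **params)
    print(f"eps={eps:<6} replicating cost={approx:.6f}  closed form={exact:.6f}")
```

<p>Call prices are convex in the strike. As a result, the replicating cost sits slightly below the closed form for any positive <script type="math/tex">\epsilon</script> and converges to it as <script type="math/tex">\epsilon</script> shrinks.</p>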
<h1 id="conclusion">Conclusion</h1>
<p>Many complicated payoffs can be recreated as combinations of vanilla puts and calls. For an overview, see the first few chapters of Sheldon Natenberg’s “Option Volatility & Pricing”.</p>
Thu, 15 Sep 2016 00:00:00 +0000
http://marcosammon.com/2016/09/15/digital.html
Skewness and Expected Returns
<h1 id="intuition">Intuition</h1>
<p>From Harvey and Siddique (2000): “[Risk averse] investors should
prefer portfolios that are [positive]-skewed to portfolios that are [negative]-skewed.” The <a href="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f8/Negative_and_positive_skew_diagrams_(English).svg/2000px-Negative_and_positive_skew_diagrams_(English).svg.png">graphic</a> below shows the difference - negative skew entails a long left tail and, typically, a mean below the median:</p>
<p><img src="/Post_Images/8_9_2016/skew.png" alt="fig" /></p>
<p>The following example, with a $1 portfolio and <script type="math/tex">u(c_t)=\log(c_t)</script>, supports this claim. Assets 1 and 2 have the same expected return, but asset 1 has left-skewed returns and asset 2 has right-skewed returns. The investor gets higher expected utility from asset 2:</p>
<p><img src="/Post_Images/8_9_2016/example.PNG" alt="fig" /></p>
<h1 id="skewness-and-expected-returns">Skewness and Expected Returns</h1>
<p>Let’s look at two families of 2<script type="math/tex">\times</script>3 sorted portfolios using <a href="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html">Ken French’s data</a> from 1980 to the present: Size/Book-to-Market (BM) and Size/Momentum (Prior). For reference, the CRSP value-weighted index had a skewness of -0.7638 over the same period. <br />
The small/high BM portfolio has the highest average return, and it also has the most negative skewness (the largest in absolute value).</p>
<p><img src="/Post_Images/8_9_2016/sizebm.PNG" alt="fig" /></p>
<p><img src="/Post_Images/8_9_2016/sbm_scatter.png" alt="fig" /></p>
<p>The histograms convey the same point (each fitted normal has the same mean and variance as the underlying data). The small/high BM portfolio has a heavy left tail - mass above the blue line to the left of the mean - that is not present in the big/low BM portfolio: <br />
<img src="/Post_Images/8_9_2016/hist1.png" alt="fig" /></p>
<p>The relationship is not as strong in the size/momentum portfolios (low prior return = low momentum), but it is still present.</p>
<p><img src="/Post_Images/8_9_2016/sizemom.PNG" alt="fig" /></p>
<p><img src="/Post_Images/8_9_2016/smom_scatter.png" alt="fig" /></p>
<p>Here, the histogram paints a slightly different picture from the summary statistics. The small/high prior portfolio has a few extreme events (momentum crashes) that are not present in the big/low prior portfolio. Rare disaster risk could weaken the relationship between skewness and expected returns in these portfolios.</p>
<p><img src="/Post_Images/8_9_2016/hist2.png" alt="fig" /></p>
<h1 id="conclusion">Conclusion</h1>
<p>To a first order, the intuition is correct - the more negatively skewed portfolios have higher expected returns.</p>
Tue, 09 Aug 2016 00:00:00 +0000
http://marcosammon.com/2016/08/09/skewness.html