A Leader in Unstructured Financial Data
In this brief analysis we discuss how Social Sentiment can enhance stock selection and be used to tilt the weights of S&P 500 Constituents to outperform market benchmarks.
SMA uses our patented NLP to calculate sentiment on each Tweet, with fine-grained scores ranging from -1.0000 to 1.0000. A score of -1 is extremely negative and a score of +1 is extremely positive. We aggregate these values by stock over a 24-hour lookback period. This 24-hour summation of Social Sentiment is called ‘Raw-S’. We use a 7-day summation of Raw-S to calculate a Week-Raw-S for each stock. Popular securities, such as TSLA or AAPL, will have more extreme Raw-S and Week-Raw-S values due to higher Twitter volume. The Week-Raw-S is compared to the mean and standard deviation of the previous 13 weeks’ worth of Week-Raw-S values to calculate a weekly standardized score across all securities. We then take a weighted average of the weekly standardized score over the previous 4 weeks to determine the Monthly Sentiment Score. Because the average is weighted, sentiment from Tweets received in the last week has more impact on the Monthly Sentiment Score than sentiment from Tweets 4 weeks ago.
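As a rough illustration of the aggregation steps described above, the Python sketch below walks one stock from daily Raw-S values to a Monthly Sentiment Score. The function names, the per-stock history, the use of the sample standard deviation, the zero fallback for a flat history, and the linear 1-2-3-4 weights are all assumptions made for illustration; the actual weighting scheme is not specified here.

```python
from statistics import mean, stdev


def week_raw_s(daily_raw_s):
    """Sum the seven most recent daily Raw-S values into one Week-Raw-S."""
    return sum(daily_raw_s[-7:])


def weekly_standardized_score(current, history):
    """Standardize the current Week-Raw-S against the prior 13 weekly values.

    Assumption: sample standard deviation; a flat history scores 0.0.
    """
    window = history[-13:]
    sd = stdev(window)
    if sd == 0:
        return 0.0
    return (current - mean(window)) / sd


def monthly_sentiment_score(weekly_scores, weights=(1, 2, 3, 4)):
    """Weighted average of the last four weekly scores, oldest first.

    Assumption: linear weights, so the most recent week counts the most.
    """
    recent = weekly_scores[-4:]
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)


# Hypothetical inputs: 13 prior weekly values and the latest week.
history = [float(v) for v in range(1, 14)]
z = weekly_standardized_score(20.0, history)
score = monthly_sentiment_score([0.2, -0.1, 0.5, z])
```

With these hypothetical weights, a strong reading in the latest week moves the Monthly Sentiment Score four times as much as the same reading four weeks earlier.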
For this theoretical model, we rebalance monthly at the close of the last market day of each month. Constituent weights are determined by the formula below where x is the Monthly Sentiment Score, i is the stock, and k is the number of constituents (500).
We then multiply each stock’s weight by its subsequent monthly price return and sum the weighted returns across all stocks in that month to calculate the monthly return of the Sentiment-enhanced portfolio. Below is the cumulative return series of the SMA portfolio. We use SPY and an equal-weighted average of S&P 500 constituents as baselines.
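The return calculation can be sketched in a few lines of Python. The weights are taken as given inputs, the tickers and numbers are hypothetical, and compounding the monthly returns into a cumulative series is an assumption about how the series is built.

```python
def portfolio_return(weights, returns):
    """Weighted sum of constituent returns for one month."""
    return sum(weights[ticker] * returns[ticker] for ticker in weights)


def cumulative_returns(monthly_returns):
    """Compound monthly returns into a cumulative return series.

    Assumption: returns are compounded rather than simply added.
    """
    growth = 1.0
    series = []
    for r in monthly_returns:
        growth *= 1.0 + r
        series.append(growth - 1.0)
    return series


# Hypothetical two-stock portfolio for a single month.
weights = {"AAA": 0.6, "BBB": 0.4}
returns = {"AAA": 0.10, "BBB": -0.05}
month_return = portfolio_return(weights, returns)
```

The same `portfolio_return` function yields the equal-weighted baseline when every weight is set to 1 divided by the number of constituents.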
The Sentiment-tilted portfolio outperforms SPY by 13% and the equal-weighted portfolio by nearly 20% over a 5-year period. This SMA portfolio also contributes less risk, partially because, unlike SPY, no stocks are ever weighted too heavily. For instance, SPY’s top 10 holdings account for 28% of the ETF and almost all are in similar industries (AAPL, MSFT, AMZN, GOOG, NVDA). The Sentiment-tilted portfolio’s top 10 holdings account for 15.7% of the entire portfolio on average and are not consistently the largest market cap companies (like AAPL or MSFT). The largest holdings are determined by the Monthly Sentiment Score stemming from SMA’s Social Media data feed.
If you are interested in learning more about SMA’s offerings, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.
This blog is a follow-up to our research paper “Machine Readable Filings (MRF) Word Count Alpha” which is an extension of Harvard Lazy Prices. This blog focuses on word count, sentiment factors and changes in those factors associated with regulatory filings.
SMA partnered with S&P Global Market Intelligence to provide textual data in U.S. SEC filings organized by headings with textual data underneath (i.e. Parts, Items). Textual data is parsed to create historical baselines for 10-Ks, 10-Qs, and other regulatory filings. There are 20 filing types in the MRF product; this paper analyzes 10-Ks and 10-Qs to focus on quarterly changes.
Subscribers of the MRF dataset can create derivative metrics stemming from the seven factors. For instance, one metric explored in this paper is Sentiment per Word. That factor is calculated by dividing Sentiment Sum by Word Count. This and other derivative factors are calculated to normalize sentiment based on document length.
Methodology
This analysis looks at the Quarter-over-Quarter changes in regulatory filings. Each 10-K is compared to the same company’s most recent prior 10-K, and each 10-Q to its most recent prior 10-Q. The Percent Change in Word Count is the difference between the word counts of the two filings divided by the word count of the previous filing.
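A minimal Python sketch of the Percent Change in Word Count calculation follows. It assumes one company's filings arrive in chronological order as (form type, word count) pairs; the function name and the sample figures are hypothetical.

```python
def word_count_changes(filings):
    """Percent change in word count versus the prior filing of the same type.

    `filings` is a chronological list of (form_type, word_count) tuples
    for a single company. The first filing of each type has no baseline
    and is skipped.
    """
    last_seen = {}
    changes = []
    for form_type, words in filings:
        previous = last_seen.get(form_type)
        if previous:  # skips a missing baseline and zero word counts
            changes.append((form_type, (words - previous) / previous))
        last_seen[form_type] = words
    return changes


# Hypothetical filing history for one company.
filings = [("10-K", 50000), ("10-Q", 10000), ("10-Q", 12000), ("10-K", 45000)]
changes = word_count_changes(filings)
```

Here the second 10-Q is 20% longer than the first, and the second 10-K is 10% shorter than the first.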
The Universe used for this analysis includes all securities priced over $5. The benchmark used, called ‘Universe’, is the average return of all stocks in any Quintile portfolio at that point in time. The analysis begins in 2007 and concludes in July 2022.
When computing calendar-time portfolio returns, stocks are selected into buckets depending on the factor or change in that factor. Stocks enter the portfolio on the last market day of the month the report was released. Portfolios are rebalanced monthly to introduce new filings submitted in the most recent month.
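The bucketing step can be sketched as a simple rank-and-split in Python, with Quintile 1 holding the lowest factor values and Quintile 5 the highest. The function name, the tickers, and the way ties and uneven group sizes are handled are assumptions for illustration only.

```python
def assign_quintiles(factor_by_ticker):
    """Rank stocks by factor value and split them into five buckets.

    Returns a dict mapping ticker to a quintile from 1 (lowest factor
    value) to 5 (highest). Ties keep their input order.
    """
    ranked = sorted(factor_by_ticker, key=factor_by_ticker.get)
    n = len(ranked)
    return {ticker: (i * 5) // n + 1 for i, ticker in enumerate(ranked)}


# Hypothetical factor values (for example, percent change in word count).
factors = {f"T{i}": value for i, value in enumerate(
    [-0.30, -0.12, -0.05, -0.01, 0.00, 0.02, 0.06, 0.11, 0.25, 0.40]
)}
quintiles = assign_quintiles(factors)
```

Re-running the assignment at each monthly rebalance, using only filings released up to that date, would give the portfolio membership for the following month.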
Results
The graphs and metrics below are calendar-time portfolio returns. Quintile 1 contains stocks with the lowest factor value while Quintile 5 encompasses stocks with the highest factor value.
The graph and table above exemplify how Percentage Change in Word Count can enhance stock selection. Percentage Change in Word Count is calculated by comparing the current document’s Word Count to the Word Count from the same company’s most recent document of that same type (10-Ks are compared to 10-Ks and 10-Qs are compared to 10-Qs).
The green line represents securities with the largest increase in Word Count (Quintile 5), an average increase of 29.05%. The red line denotes securities with the largest decrease in Word Count (Quintile 1), an average decrease of 16.8%. Quintile 1 outperforms all other quintiles while Quintile 5 underperforms all other quintiles.
As filings become longer compared to the company’s previous filing, returns tend to drop relative to the universe. Regulatory filings warn investors about the company’s future proceedings and the associated risks. Typically, companies exclude information that is not required. If there are more words in a document, it suggests there are more potential liabilities, or that the company is over-explaining a facet of the business.
As filings become more concise, subsequent stock returns outperform the universe. Regulatory filings will shrink in size if outstanding issues or risks have been resolved. Companies will remove information that is no longer relevant to the period of the report.
The difference in monthly returns between the two lines (Q1 – Q5) produces a hypothetical Long/Short portfolio. This portfolio has a T-Statistic of 3.85 and is statistically significant at the 95% confidence level. The slow, steady increase of the Long/Short shows limited risk, with a Sharpe Ratio significantly higher than that of all other portfolios.
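For readers who want to reproduce this kind of check on their own return series, the Python sketch below builds a Q1 minus Q5 Long/Short series and computes a t-statistic and Sharpe ratio. It assumes a one-sample t-test against a mean of zero, a zero risk-free rate, and annualization by the square root of 12; the exact conventions behind the figures quoted above are not stated, and the sample returns are made up.

```python
from math import sqrt
from statistics import mean, stdev


def long_short_returns(q1_returns, q5_returns):
    """Monthly Long/Short return: long Quintile 1, short Quintile 5."""
    return [long - short for long, short in zip(q1_returns, q5_returns)]


def t_statistic(returns):
    """One-sample t-statistic for the hypothesis that the mean is zero."""
    return mean(returns) / (stdev(returns) / sqrt(len(returns)))


def annualized_sharpe(returns, periods_per_year=12):
    """Sharpe ratio with an assumed zero risk-free rate."""
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)


# Hypothetical monthly returns for the two extreme quintiles.
q1 = [0.02, 0.01, 0.03, 0.02]
q5 = [0.01, 0.00, 0.01, 0.01]
spread = long_short_returns(q1, q5)
t_stat = t_statistic(spread)
sharpe = annualized_sharpe(spread)
```

For a long sample of monthly returns, a t-statistic above roughly 1.96 in absolute value clears the 95% two-sided threshold.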
The next metric explored is Sentiment per Word. Sentiment is calculated using Social Market Analytics’ patented sentiment dictionary. To calculate Sentiment per Word, we divide Sentiment Sum by Word Count to normalize sentiment by the length of the document. If a company has more words in its document, it is likely to have a more extreme Sentiment Sum compared to shorter documents.
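The normalization is a single division, sketched here in Python with hypothetical numbers to show why it matters when documents differ in length.

```python
def sentiment_per_word(sentiment_sum, word_count):
    """Normalize a document's Sentiment Sum by its length in words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return sentiment_sum / word_count


# Two hypothetical filings with the same Sentiment Sum but different lengths.
short_doc = sentiment_per_word(12.0, 8000)
long_doc = sentiment_per_word(12.0, 40000)
```

Both documents carry the same total sentiment, but the shorter one is five times more positive per word, which is the comparison the quintile ranking relies on.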
In this analysis, Quintile 1 underperforms the Universe; this portfolio contains documents that have negative Sentiment per Word. Quintile 5, which contains stocks with extremely positive Sentiment per Word, outperforms the rest of the universe. As we expect, companies that have good news and talk about topics in a positive manner tend to have better price returns compared to companies with a negative tone in their documents.
SMA’s Machine Readable Filings (MRF) product has insightful, unique information on the structure and sentiment of SEC regulatory filings. This dataset provides you with metrics drilled down by Item and can be used in a variety of Long-term strategies.
If you are interested in learning more about how SMA’s MRF product can help your trading strategies, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.