A Leader in Unstructured Financial Data
In this brief analysis we discuss how Social Sentiment can enhance stock selection and be used to tilt the weights of S&P 500 Constituents to outperform market benchmarks.
SMA uses our patented NLP to calculate sentiment on each Tweet, with fine-grained scores ranging from -1.0000 to 1.0000. A score of -1 is extremely negative and a score of +1 is extremely positive. We aggregate these values by stock over a 24-hour lookback period. This 24-hour summation of Social Sentiment is called ‘Raw-S’. We use a 7-day summation of Raw-S to calculate a Week-Raw-S for each stock. Popular securities, such as TSLA or AAPL, will have more extreme Raw-S and Week-Raw-S values due to higher Twitter volume. The Week-Raw-S is compared to the mean and standard deviation of the previous 13 weeks’ worth of Week-Raw-S values to calculate a weekly standardized score across all securities. We then take a weighted average of the weekly standardized score over the previous 4 weeks to determine the Monthly Sentiment Score. Because the average is weighted, sentiment from Tweets received in the last week has more impact on the Monthly Sentiment Score than sentiment from Tweets 4 weeks ago.
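The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not SMA's implementation: the function names are hypothetical, the sample standard deviation is an assumption, and the recency weights (1, 2, 3, 4) are placeholders because the actual weights are not stated.

```python
from statistics import mean, stdev


def week_raw_s(daily_raw_s):
    """Sum seven daily Raw-S values into one Week-Raw-S value."""
    if len(daily_raw_s) != 7:
        raise ValueError("expected 7 daily Raw-S values")
    return sum(daily_raw_s)


def weekly_z_score(current_week, prior_weeks):
    """Standardize the current Week-Raw-S against the previous 13 weeks.

    Assumption: sample standard deviation; the source does not say which.
    """
    if len(prior_weeks) != 13:
        raise ValueError("expected 13 prior Week-Raw-S values")
    sigma = stdev(prior_weeks)
    if sigma == 0:
        return 0.0  # no variation in the baseline, treat as neutral
    return (current_week - mean(prior_weeks)) / sigma


def monthly_sentiment_score(weekly_z, weights=(1, 2, 3, 4)):
    """Weighted average of four weekly z-scores, ordered oldest to newest.

    Assumption: linearly increasing weights, so the latest week counts most.
    """
    if len(weekly_z) != len(weights):
        raise ValueError("need one weight per weekly score")
    return sum(w * z for w, z in zip(weights, weekly_z)) / sum(weights)
```

With these placeholder weights, a strong score in the most recent week moves the Monthly Sentiment Score four times as much as the same score four weeks ago.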
For this theoretical model, we rebalance monthly at the close of the last market day of each month. Constituent weights are determined by the formula below where x is the Monthly Sentiment Score, i is the stock, and k is the number of constituents (500).
We then multiply each stock’s weight by its subsequent monthly price return and sum all returns in that month to calculate the monthly return of the Sentiment enhanced portfolio. Below is the cumulative return series of the SMA portfolio. We use SPY and an equal weighted average of S&P 500 constituents as a baseline.
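As a rough sketch of the return calculation just described, the snippet below takes constituent weights as given (the weighting formula itself is not reproduced here) and computes one month's portfolio return and a cumulative return series. The function names are hypothetical, and compounding the monthly returns is an assumption about how the cumulative series is built.

```python
def portfolio_monthly_return(weights, returns):
    """Sum of weight * subsequent monthly price return across constituents."""
    missing = set(weights) - set(returns)
    if missing:
        raise KeyError(f"no return for: {sorted(missing)}")
    return sum(w * returns[ticker] for ticker, w in weights.items())


def cumulative_returns(monthly_returns):
    """Compound a list of monthly returns into a cumulative return series."""
    series = []
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
        series.append(growth - 1.0)
    return series


# Toy example with two made-up tickers and weights that sum to 1.
weights = {"AAA": 0.5, "BBB": 0.5}
returns = {"AAA": 0.02, "BBB": -0.01}
month_return = portfolio_monthly_return(weights, returns)
```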
The Sentiment-tilted portfolio outperforms SPY by 13% and the equal-weighted portfolio by nearly 20% over a 5-year period. This SMA portfolio also carries less risk, partially because, unlike in SPY, no stock is ever weighted too heavily. For instance, SPY’s top 10 holdings account for 28% of the ETF and almost all are in similar industries (AAPL, MSFT, AMZN, GOOG, NVDA). The Sentiment-tilted portfolio’s top 10 holdings account for 15.7% of the entire portfolio on average and are not consistently the largest market cap companies (like AAPL or MSFT). The largest holdings are determined by the Monthly Sentiment Score stemming from SMA’s Social Media data feed.
If you are interested in learning more about SMA’s offerings, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.
This blog is a follow-up to our research paper “Machine Readable Filings (MRF) Word Count Alpha” which is an extension of Harvard Lazy Prices. This blog focuses on word count, sentiment factors and changes in those factors associated with regulatory filings.
SMA partnered with S&P Global Market Intelligence to provide textual data in U.S. SEC filings organized by headings with textual data underneath (i.e. Parts, Items). Textual data is parsed to create historical baselines for 10-Ks, 10-Qs, and other regulatory filings. There are 20 filing types in the MRF product; this paper analyzes 10-Ks and 10-Qs to focus on quarterly changes.
Subscribers of the MRF dataset can create derivative metrics stemming from the seven factors. For instance, one metric explored in this paper is Sentiment per Word. That factor is calculated by dividing Sentiment Sum by Word Count. This and other derivative factors are calculated to normalize sentiment based on document length.
Methodology
This analysis looks at the Quarter-over-Quarter changes in regulatory filings. Each 10-K and 10-Q is compared to the same company’s most recent filing of the same type. The Percent Change in Word Count is the difference between the word counts of the current and previous filings, divided by the word count of the previous filing.
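The Percent Change in Word Count can be sketched as follows. The tuple layout and function name are hypothetical, and filings are matched to the prior filing of the same type from the same company (10-K to 10-K, 10-Q to 10-Q), which is how the Results section describes the comparison.

```python
from datetime import date


def word_count_changes(filings):
    """Percent change in word count versus the prior same-type filing.

    filings: iterable of (company, form_type, filing_date, word_count).
    Returns a list of (company, form_type, filing_date, pct_change).
    """
    last_seen = {}  # (company, form_type) -> word count of the prior filing
    changes = []
    for company, form_type, filed, words in sorted(filings, key=lambda f: f[2]):
        key = (company, form_type)
        previous = last_seen.get(key)
        if previous:  # skip first filings and zero-length baselines
            changes.append((company, form_type, filed, (words - previous) / previous))
        last_seen[key] = words
    return changes


# Made-up filings for one hypothetical company.
sample = [
    ("ACME", "10-Q", date(2021, 5, 1), 10000),
    ("ACME", "10-K", date(2021, 2, 1), 50000),
    ("ACME", "10-Q", date(2021, 8, 1), 12000),
]
changes = word_count_changes(sample)
```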
The Universe used for this analysis includes all securities priced over $5. The benchmark used, called ‘Universe’, is the average return of all stocks in any Quintile portfolio at that point in time. The analysis begins in 2007 and concludes in July 2022.
When computing calendar-time portfolio returns, stocks are selected into buckets depending on the factor or change in that factor. Stocks enter the portfolio on the last market day of the month the report was released. Portfolios are rebalanced monthly to introduce new filings submitted in the most recent month.
Results
The graphs and metrics below are calendar-time portfolio returns. Quintile 1 contains stocks with the lowest factor value while Quintile 5 encompasses stocks with the highest factor value.
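For readers who want to reproduce the bucketing, here is a minimal sketch of sorting stocks into quintiles by factor value and averaging returns per bucket. It assumes rank-based, roughly equal-sized buckets and equal weighting within each bucket; neither detail is stated in the methodology, and the function names are hypothetical.

```python
from statistics import mean


def assign_quintiles(factor_by_ticker):
    """Map each ticker to a quintile: 1 = lowest factor value, 5 = highest."""
    ranked = sorted(factor_by_ticker, key=factor_by_ticker.get)
    n = len(ranked)
    return {ticker: (i * 5) // n + 1 for i, ticker in enumerate(ranked)}


def quintile_returns(quintiles, returns):
    """Equal-weighted average return of the stocks in each quintile."""
    buckets = {q: [] for q in range(1, 6)}
    for ticker, q in quintiles.items():
        buckets[q].append(returns[ticker])
    return {q: mean(values) for q, values in buckets.items() if values}


def universe_return(quintiles, returns):
    """Benchmark: average return of every stock held in any quintile."""
    return mean(returns[ticker] for ticker in quintiles)
```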
The graph and table above exemplify how Percentage Change in Word Count can enhance stock selection. Percentage Change in Word Count is calculated by comparing the current document’s Word Count to the Word Count from the same company’s most recent document of that same type (10-Ks are compared to 10-Ks and 10-Qs are compared to 10-Qs).
The green line represents securities with the largest increase in Word Count (Quintile 5), which averages a 29.05% increase in words. The red line denotes securities with the largest decrease in Word Count (Quintile 1), which averages a 16.8% decrease in words. Quintile 1 outperforms all other quintiles, while Quintile 5 underperforms all other quintiles.
As filings become longer compared to the company’s previous filing, returns tend to drop relative to the universe. Regulatory filings warn investors about the company’s future proceedings and associated risks. Typically, companies exclude information that is not required. More words in a document therefore suggest that there are more potential liabilities, or that the company is over-explaining a facet of the business.
As filings become more concise, subsequent stock returns outperform the universe. Regulatory filings will shrink in size if outstanding issues or risks have been resolved. Companies will remove information that is no longer relevant to the period of the report.
The difference in monthly returns between the two lines (Q1 – Q5) produces a hypothetical Long/Short. This portfolio has a T-Statistic of 3.85, which is statistically significant at the 95% confidence level. The slow, steady increase of the Long/Short shows limited risk, with a Sharpe Ratio significantly higher than those of all other portfolios.
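The Long/Short statistics can be approximated with the sketch below. It assumes a one-sample t-test of the mean monthly return against zero and an annualized Sharpe Ratio with a zero risk-free rate; the exact test and Sharpe convention behind the reported figures are not specified, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def long_short_returns(q1_returns, q5_returns):
    """Monthly Long/Short return: long Quintile 1, short Quintile 5."""
    return [long - short for long, short in zip(q1_returns, q5_returns)]


def t_statistic(returns):
    """t-statistic for the hypothesis that the mean monthly return is zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))


def annualized_sharpe(returns, periods_per_year=12):
    """Sharpe Ratio from monthly returns, assuming a zero risk-free rate."""
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)
```

At the 95% confidence level a two-sided test needs an absolute t-statistic of roughly 1.96 or more for a long sample, so 3.85 clears that bar comfortably.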
The next metric explored is Sentiment per Word. Sentiment is calculated using Social Market Analytics’ patented sentiment dictionary. To calculate Sentiment per Word, we divide Sentiment Sum by Word Count to normalize sentiment by the length of the document. If a company has more words in its document, it is likely to have a more extreme Sentiment Sum compared to shorter documents.
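A small sketch shows why the normalization matters. The function name and the numbers are made up for illustration; only the ratio itself (Sentiment Sum divided by Word Count) comes from the text.

```python
def sentiment_per_word(sentiment_sum, word_count):
    """Normalize a document's Sentiment Sum by its length in words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return sentiment_sum / word_count


# A long filing can have the larger Sentiment Sum yet the weaker tone per word.
long_filing = sentiment_per_word(12.0, 60000)
short_filing = sentiment_per_word(6.0, 20000)
more_positive = "short" if short_filing > long_filing else "long"
```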
In this analysis, Quintile 1 underperforms the Universe; this portfolio contains documents that have negative Sentiment per Word. Quintile 5, which contains stocks with extremely positive Sentiment per Word, outperforms the rest of the universe. As we expect, companies that have good news and talk about topics in a positive manner tend to have better price returns compared to companies with a negative tone in their documents.
SMA’s Machine Readable Filings (MRF) product has insightful, unique information on the structure and sentiment of SEC regulatory filings. This dataset provides you with metrics drilled down by Item and can be used in a variety of Long-term strategies.
If you are interested in learning more about how SMA’s MRF product can help your trading strategies, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.
In this video we demonstrate how easy-to-use and powerful the ThemeX tool is on the Unstructured Data Terminal. We use ThemeX to search filings, transcripts and social media for macro insights on the impact of a recession.
To learn more, email us at ContactUs@SocialMarketAnalytics.com or schedule a demo using this link: https://outlook.office365.com/owa/calendar/SocialMarketAnalytics@socialmarketanalytics.com/bookings/
Earnings performance and indicators are among the most closely watched metrics in the finance community when choosing whether to invest in a security. Each earnings announcement is paired with a conference call in which management comments on the company’s performance in a prepared statement. Afterwards, investors can ask questions about specific components of the company and its outlook. This Q&A section is where you can gauge the finance community’s thoughts on a security based on the tone of conversation and the language used.
SMA converts Earnings Call Transcripts to JSON format and applies proprietary sentiment metrics. We calculate sentiment, word count, and section metrics from the Q&A section of the Earnings Call Transcripts to see relationships with subsequent price returns.
Since Earnings Calls typically occur 4 times in a calendar year for each company, this analysis uses quarterly holding periods. Portfolios are determined by the value of sentiment and word count metrics associated with the company’s Earnings Call Q&A section. At the end of each month, we look at any new Earnings Call Transcripts submitted in the past month and rebalance the portfolios. Sentiment and word count values can be held for a company for up to 4 months or until a new Earnings Call occurs. Portfolios are quintiles with Quintile 5 being the highest 20% of the metric and Quintile 1 being the lowest 20%. Each portfolio is rebalanced at market close of the last trading day of each month. Securities must have a Price > $5 to be included in the portfolio to remove volatility in returns from penny stocks.
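The carry-forward and price-filter rules can be sketched as below. Months are represented as plain integers (for example, year * 12 + month), the function names are hypothetical, and reading "up to 4 months" as the call month plus the following three rebalances is an assumption.

```python
def latest_signal(calls, rebalance_month, max_hold_months=4):
    """Most recent call metric still eligible at a month-end rebalance.

    calls: list of (month_index, metric_value) for one company.
    Returns None if there is no call yet or the latest one is too old.
    """
    past = [(month, value) for month, value in calls if month <= rebalance_month]
    if not past:
        return None
    month, value = max(past, key=lambda call: call[0])
    if rebalance_month - month >= max_hold_months:
        return None  # held for the maximum period with no new call
    return value


def eligible_universe(signals, prices, min_price=5.0):
    """Keep companies with a live signal and a price above the $5 floor."""
    return {
        ticker: signal
        for ticker, signal in signals.items()
        if signal is not None and prices.get(ticker, 0.0) > min_price
    }
```

The surviving companies would then be ranked into quintiles on the metric at each month-end close.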
The first metric is Sum Sentiment. SMA scans every word and phrase for sentiment and assigns a score whenever sentiment is identified. We take the total sum of all the sentiment from the words and phrases in the Q&A section. We expect companies that have a higher Sum Sentiment to outperform the market while companies with lower Sum Sentiment underperform.
Sum Sentiment quintiles performed as expected. There is a monotonic signal showing that companies with high Sum Sentiment in the Q&A section of their Earnings Call are more likely to exceed market expectations. On the other side, companies with low Sum Sentiment underperform. The tone, or sentiment, of investors’ questions and of management’s responses reflects how people feel about the future of the company, which affects the security’s future price returns.
A theoretical Long/Short between Quintile 5 and Quintile 1 has a steady increase over time with not much drawdown. This theoretical Long/Short portfolio passes a significance test at a 5% level. Each Quintile has on average 484 securities, which comes to an average of 2,420 securities rebalanced each month.
Next, we look at Hits per Word. Hit Count is the number of words or phrases tagged with a sentiment score, and Word Count is the total number of words in the Q&A section. We divide Hit Count by Word Count to get Hits per Word. This metric is the share of the document scored with sentiment. It helps identify which companies discuss components that are relevant and crucial to the firm’s outlook, rather than blathering about unrelated or unnecessary topics.
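To make the ratio concrete, here is a toy sketch of Hits per Word. The two-entry lexicon is invented for illustration and matches single words only; SMA's dictionary is proprietary and also scores phrases, so this is not a reproduction of it.

```python
import re


def hits_per_word(text, lexicon):
    """Share of words in the text that carry a sentiment score."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in lexicon)
    return hits / len(words)


# Hypothetical lexicon: word -> sentiment score.
toy_lexicon = {"improved": 0.5, "weak": -0.4}
ratio = hits_per_word("Margins improved despite weak demand", toy_lexicon)
```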
The monotonic nature of this graph demonstrates the predictive power of this factor. Companies that have a higher percentage of words/phrases with sentiment in their document tend to outperform the market. These firms are letting investors know insightful information rather than boilerplate responses. Investors respond positively to truthful management insights and respond negatively to Earnings Calls where management doesn’t convey the company’s vision.
The Questions and Answers section of an Earnings Call contains crucial information that typically isn’t being quantified. With SMA’s new Earnings Call Transcripts product, these calls now have quantitative metrics to accompany their qualitative insights. This information can now be tapped to project companies’ price returns with an investment horizon of multiple months. The analysis above doesn’t even account for the alpha immediately after the earnings release, because trades are executed at market close of the last trading day of each month. That means there could be 20+ days of alpha between the new Earnings Call and when the strategy acts on that information. This conservative approach still exemplifies the importance of these factors to the finance community.
If you are interested in learning more about how SMA’s Earnings data can help your trading strategies, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.
Predictive Power of Sentiment on Earnings Q&A
by SMA Team
Earnings calls are among the most closely followed events on the corporate calendar. They give executives the opportunity to comment on earnings and answer questions from those outside of the company. Using our patented Natural Language Processing, Social Market Analytics scores Earnings Call Transcripts in real time and creates metrics based on sentiment, word count, and section count. For this research, we look specifically at the question-and-answer section of call transcripts. The theory is that isolating the section of the call where executives aren’t controlling the topic of conversation will give a more accurate assessment of the sentiment surrounding earnings results. We use Sum of Sentiment to quantify the positivity of the call. Sum of Sentiment adds up the sentiment of all the words and phrases tagged in the section. The following histogram shows the distribution of the Sum of Sentiment variable.
The Sum of Sentiment is centered around 3.5 and is roughly normal with a heavy tail skewing right. As executives of companies want to express good things to come, it makes sense that the sum is predominantly positive. Still, some earnings calls are more positive than others. Based on the distribution of sentiment, we defined an extremely positive earnings call as having a sum greater than 5 and a negative earnings call as having a sum less than 1.5. These thresholds give a roughly equal number of instances over the past 14 years. We took these thresholds and compared returns for different time periods following the Earnings Call. The time periods were subsequent Open-to-Close, subsequent Close-to-Close, subsequent week return, subsequent month return, and subsequent quarter return. Since earnings calls are spaced throughout the year, it is difficult to compound the subsequent returns. Instead, we look at average excess returns for each threshold. The excess return for each security is calculated by subtracting the SPY return over the same time frame from the security’s return. Our hypothesis is that the average excess returns for the extremely positive earnings calls will be higher than those for negative earnings calls. We calculated returns of all instances since the end of 2009.
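The threshold test described above can be sketched as follows. The thresholds (greater than 5, less than 1.5) and the excess-return definition come from the text; the function names and the event tuple layout are hypothetical.

```python
from statistics import mean


def classify_call(sum_sentiment, high=5.0, low=1.5):
    """Label a call 'positive', 'negative', or None for the middle range."""
    if sum_sentiment > high:
        return "positive"
    if sum_sentiment < low:
        return "negative"
    return None


def average_excess_returns(events):
    """Average excess return over SPY for each sentiment bucket.

    events: iterable of (sum_sentiment, security_return, spy_return),
    with both returns measured over the same holding window.
    """
    buckets = {"positive": [], "negative": []}
    for sum_sentiment, security_return, spy_return in events:
        label = classify_call(sum_sentiment)
        if label is not None:
            buckets[label].append(security_return - spy_return)
    return {label: mean(values) for label, values in buckets.items() if values}
```

Running the same function once per holding window (Open-to-Close, Close-to-Close, week, month, quarter) gives one pair of averages per window to compare.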
For long-term holdings, the average excess return for companies with high-sentiment earnings calls was strongly positive. In contrast, returns for companies with negative-sentiment earnings calls were negative for every time frame. Quarterly returns highlight the importance of a positive earnings call, as the average excess return is close to 0.8% higher than for negative calls. The biggest takeaway across time periods was the large difference in returns between the next Open-to-Close and the next Close-to-Close, especially for calls with high sentiment. Entering on the subsequent close rather than the open dropped the excess returns by 0.8% and made them negative. The next Close-to-Close returns were negative regardless of the sentiment threshold. Waiting to enter removed the benefit of high sentiment. We looked at the returns of these two time frames with high sentiment over the past 13 years.
Over the past 14 years, Open-to-Close excess returns were positive in 12 of 14 years, while Close-to-Close excess returns were positive in only 4 of 14 years. The two negative years for the Open-to-Close also came during the abnormal period of the COVID-19 pandemic. The immediate Open-to-Close return benefits from high sentiment far more than the Close-to-Close return. Therefore, there is a premium on knowing the sentiment of an Earnings Call in real time and entering at the next open to maximize short-term returns. Instead of manually reading earnings calls to gain insights, traders can use the sentiment summarized by Social Market Analytics to select positions. Waiting to enter on positive earnings calls generally hurts short-term returns. Social Market Analytics’ scoring of Earnings Calls can give traders the advantage of entering a position as quickly as possible for immediate returns, while also providing a holding option for quarterly returns.
If you are interested in learning more about how SMA’s Earnings data can help your trading strategies, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.
Explore sentiment on earnings calls and all corporate filings on the SMA Unstructured Data Terminal below.
General Market Indicator: Volatility of Sentiment Helpful in Predicting Overall Market Performance
by SMA Team
The target of this research was to find an indicator that helps predict the direction of the overall US Equity market for the next week using sentiment data from the previous week. The hypothesis is that when volatility in sentiment is high over the previous week, meaning investors have differing opinions, the overall market will underperform in the subsequent week. When volatility of sentiment is low or neutral, the crowd has reached a consensus and the general market will outperform over the next week. The sentiment metric used to represent volatility is Raw-Volatility in SMA’s S-Factor data feed, which captures the volatility of the sentiment from Twitter conversations. All Raw-Volatility data points were taken from the 3:40 pm ET timestamp (20 minutes before the market close). We calculated the summation of Raw-Volatility for each date as a proxy for the volatility of Twitter social sentiment on the entire market. The exact calculation is as follows, where “N” is the number of companies with sentiment on that date and “D” is the date:
We then created a 7-day standardized volatility using a 91-day benchmark:
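One plausible reading of these two steps is sketched below. The daily total is the sum of Raw-Volatility across the N companies on a date, as described above. The standardization details are assumptions: a rolling 7-day sum of daily totals, compared with the mean and sample standard deviation of the preceding 91 rolling sums. Function names are hypothetical.

```python
from statistics import mean, stdev


def daily_market_volatility(raw_volatility_by_ticker):
    """Sum Raw-Volatility across all companies with sentiment on one date."""
    return sum(raw_volatility_by_ticker.values())


def rolling_sums(values, window=7):
    """Trailing sums over `window` consecutive daily totals."""
    return [sum(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def z_volatility(daily_totals, window=7, benchmark=91):
    """Standardize the latest 7-day sum against the prior 91 rolling sums."""
    sums = rolling_sums(daily_totals, window)
    if len(sums) < benchmark + 1:
        raise ValueError("not enough history for the benchmark window")
    current = sums[-1]
    history = sums[-benchmark - 1:-1]  # the 91 values before the current one
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (current - mean(history)) / sigma
```

Under these assumptions the function needs at least 98 daily totals: 7 for the first rolling sum, 91 for the benchmark, and the current value.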
This Z_Volatility score follows a roughly normal distribution.
Using the S&P 500 ETF Trust (SPY) as a proxy of general market performance, we then look at the relationship between Z_Volatility and SPY’s return series. The daily close-to-close return is calculated as:
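A close-to-close return is conventionally the percentage change between consecutive closing prices. In the notation assumed here, $P_D$ is SPY's closing price on date $D$ and $P_{D-1}$ is the close on the prior trading day; whether dividends are included is not stated, so this sketch uses prices only:

```latex
r_D = \frac{P_D - P_{D-1}}{P_{D-1}} = \frac{P_D}{P_{D-1}} - 1
```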
Hypothesis: When Z_Volatility for the previous closing Date is high, the subsequent market performance will be lower. When Z_Volatility is low or neutral, the next day’s market performance will be higher.
To test this, our strategy is to open a short position in SPY when Z_Volatility > 1. When Z_Volatility ≤ 1, the portfolio holds SPY as a long position. This hypothetical portfolio is then compared to SPY over the past 10 years:
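The switching rule can be sketched in a few lines. It assumes each Z_Volatility value is known at the close before the return it is paired with, and it ignores transaction and borrowing costs; the function names are hypothetical.

```python
def strategy_returns(z_scores, spy_returns, threshold=1.0):
    """Short SPY when the prior Z_Volatility exceeds the threshold, else long.

    z_scores[i] must be the score observed before spy_returns[i] is earned.
    """
    return [-r if z > threshold else r for z, r in zip(z_scores, spy_returns)]


def growth_of_one(returns):
    """Compound a return series into the ending value of 1 unit invested."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value


# Made-up inputs: the first day is shorted because its prior score is above 1.
example = strategy_returns([1.5, 0.2, 1.0], [0.01, -0.02, 0.03])
```

Comparing `growth_of_one` for the strategy series and for the raw SPY series gives the kind of side-by-side comparison described here.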
Prior to the COVID-19 pandemic, which began in early 2020, SPY outperformed the modified portfolio. However, since then the behavior of this factor changed drastically. Here is the same graph as above starting in 2020:
Taking a closer look, the separation since the beginning of 2020 is quite significant. Adding a short position in SPY when volatility of sentiment is high has enhanced the portfolio’s return. Even though the portfolio maintains a long position on many of the days, Z_Volatility has been predictive of downturns in the market since 2020. Traders could use this metric as an indicator to stay out of the market, or at the very least to trade with more caution. The COVID-19 pandemic led to a large amount of uncertainty surrounding the stock market and the direction it is heading. A high Z_Volatility score indicates that the public is more uncertain about the direction of various stocks. This research shows the value of sentiment from Social Market Analytics in predicting macro-level events and price movements.
If you are interested in learning more about how SMA’s S-Factor data can help your trading strategies, please email us at contactus@socialmarketanalytics.com or schedule a demo using this link.