Bryan Taylor, Chief Economist, Global Financial Data
Global Financial Data has calculated a 100-share index that is more comprehensive than existing long-term indices of the United States stock market. GFD has collected historical data on U.S. shares from 1791 until 2017 and is using this database to calculate a more representative index of U.S. shares than is currently available.
The GFD-100 Index is superior to existing indices for three reasons. First, the index is capitalization weighted from its beginning in 1791 until 2019. Second, the selection of stocks is based upon all shares that were traded on U.S. exchanges and over-the-counter; it is not limited to the New York Stock Exchange (NYSE), but includes shares from regional exchanges and over-the-counter markets. Third, the index includes finance stocks throughout its history. Finance shares were excluded from the S&P 500 Composite until July 1, 1976, when the composition of the S&P 500 was changed from 425 Industrials, 60 Utilities and 15 Rail shares (adopted in March 1957) to 400 Industrials, 40 Utilities, 20 Transportation and 40 Finance stocks.
Few people realize that bank and insurance stocks were excluded from any major United States stock index until 1976. Although bank and finance stocks mostly traded over-the-counter before the 1970s, finance stocks represented between 10% and 20% of the overall capitalization of the stock market before 1976. Bank and insurance companies were excluded because consistent price quotes were not available and finance stocks were not as liquid as the shares traded on exchanges. The exclusion of over-the-counter shares was not limited to banks and insurance companies. Standard Oil and its subsidiaries also traded over-the-counter and were excluded from stock market indices until they moved onto the New York Stock Exchange. Nevertheless, people did invest in bank and insurance stocks, and in the Standard Oil companies, and their exclusion creates biases in the historical calculations of stock market indices.
Existing calculations of long-term stock market returns in the United States are based upon four primary sources: Smith and Cole (1803-1862), Macaulay (1857-1871), Cowles (1871-1928) and Standard and Poor’s (1928-2017). Unfortunately, these indices have major flaws that create biases. Researchers have tolerated these biases until now because no one had collected the historical data on U.S. share prices, corporate actions (dividends and splits) and shares outstanding needed to calculate accurate price and return indices. The current S&P Composite includes data from the Cowles Indices from 1871 until 1927, the 90-share daily S&P Index from January 1928 until February 1957, the 500-share composite that excludes finance stocks until July 1976 and the all-sector 500-share index since July 1, 1976.
Global Financial Data has collected data on all major United States exchanges going back to 1791 as well as data on over-the-counter shares since 1865. The data includes not only share price data, but information on corporate actions (dividends and splits) as well as shares outstanding. These data sources enable us to calculate cap-weighted price and return indices for the United States that accurately reflect what an investment in the 100 largest stocks each year would have produced for investors.
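As a rough illustration of how price, dividend and shares-outstanding data combine into cap-weighted price and return figures, the sketch below computes one period's returns for a small basket. The `Holding` class, the `cap_weighted_returns` function and the numbers are all hypothetical, and splits and other corporate actions are ignored for simplicity; this is not GFD's actual calculation.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    start_price: float         # price at the start of the period
    end_price: float           # price at the end of the period
    dividends: float           # cash dividends per share paid during the period
    shares_outstanding: float  # shares outstanding at the start of the period


def cap_weighted_returns(holdings):
    """Return (price_return, total_return) for one period.

    Each stock is weighted by its market capitalization at the start of
    the period. Assumption: no splits occur during the period.
    """
    total_cap = sum(h.start_price * h.shares_outstanding for h in holdings)
    price_ret = 0.0
    total_ret = 0.0
    for h in holdings:
        weight = h.start_price * h.shares_outstanding / total_cap
        price_ret += weight * (h.end_price / h.start_price - 1.0)
        total_ret += weight * ((h.end_price + h.dividends) / h.start_price - 1.0)
    return price_ret, total_ret


# Made-up example: two stocks with starting caps of 100,000 and 300,000.
holdings = [
    Holding(start_price=100.0, end_price=110.0, dividends=5.0, shares_outstanding=1_000),
    Holding(start_price=50.0, end_price=45.0, dividends=2.0, shares_outstanding=6_000),
]
price_ret, total_ret = cap_weighted_returns(holdings)
print(f"price return: {price_ret:.4f}, total return: {total_ret:.4f}")
```

The larger stock dominates the result because its weight is three times that of the smaller one, which is the property that distinguishes a capitalization-weighted index from an equal-weighted or price-weighted one.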
GFD set up criteria for determining which stocks are included in or excluded from its indices. GFD has followed these rules for inclusion: 1) there had to be at least 9 observations per year, 2) dividend data had to be available for each stock so that both price and return indices could be calculated, and 3) shares outstanding information had to be available so the stock could be included in a capitalization-weighted index.
The index includes all shares that met these criteria from 1791 to 1824, the top 50 shares by capitalization from 1825 to 1850 and the top 100 shares by capitalization from 1851 to 2017. To choose which companies to include, shares were ranked by capitalization during the first month of each year and included if they were among the 100 largest shares in the United States (or the 50 largest before 1851). It was assumed that the stocks were held for the rest of the year. In January of the next year, the same selection methodology was used to choose which shares to hold for the coming year, so the list of the largest companies was recalculated annually. For continuity purposes, if a stock missed a single year, it was kept in the index for that year. For example, a stock that was in the top 100 in 1914 and 1916, but not in 1915, was included in the index in 1915 even though this put over 100 stocks in the index.
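The selection rules described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record fields (`name`, `observations`, `has_dividend_data`, `jan_price`, `shares_outstanding`), the function names and the sample data are hypothetical, and the extra requirement that a bridged stock still meets the inclusion criteria in the missed year is a guess rather than something the methodology states.

```python
def index_size(year):
    """Target number of constituents; None means 'all eligible shares'."""
    if year <= 1824:
        return None
    if year <= 1850:
        return 50
    return 100


def is_eligible(stock):
    """Apply the three inclusion rules to one stock-year record."""
    return (
        stock["observations"] >= 9
        and stock["has_dividend_data"]
        and stock["shares_outstanding"] is not None
    )


def top_by_cap(stocks, n):
    """Names of the n largest eligible stocks by January capitalization."""
    eligible = [s for s in stocks if is_eligible(s)]
    ranked = sorted(
        eligible,
        key=lambda s: s["jan_price"] * s["shares_outstanding"],
        reverse=True,
    )
    chosen = ranked if n is None else ranked[:n]
    return {s["name"] for s in chosen}


def constituents(universe_by_year, year, size_fn=index_size):
    """January top-N for `year`, plus stocks that only missed this one year."""
    def top(y):
        return top_by_cap(universe_by_year.get(y, []), size_fn(y))

    eligible_now = {s["name"] for s in universe_by_year.get(year, []) if is_eligible(s)}
    # Continuity rule: in the top N in both adjacent years but not this one.
    bridged = top(year - 1) & top(year + 1) & eligible_now
    return top(year) | bridged


def rec(name, jan_price, shares, observations=12, has_dividend_data=True):
    """Build one hypothetical stock-year record."""
    return {
        "name": name,
        "jan_price": jan_price,
        "shares_outstanding": shares,
        "observations": observations,
        "has_dividend_data": has_dividend_data,
    }


# Made-up universe, using a top-2 index so the example stays small.
universe = {
    1914: [rec("A", 30.0, 10), rec("B", 20.0, 10), rec("C", 10.0, 10)],
    1915: [rec("A", 30.0, 10), rec("B", 20.0, 10), rec("C", 25.0, 10)],
    1916: [rec("A", 30.0, 10), rec("B", 26.0, 10), rec("C", 10.0, 10)],
}
members_1915 = constituents(universe, 1915, size_fn=lambda y: 2)
print(sorted(members_1915))
```

In this toy universe stock B falls out of the top two in 1915 but is in the top two in both 1914 and 1916, so the continuity rule keeps it and the 1915 index holds three names instead of two.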
A decade-by-decade comparison of the return to stocks in the GFD-100, bonds in GFD’s U.S. Bond Index, bills in GFD’s US Bill Index and the equity-risk premium is provided below.
Although the overall returns between 1870 and 2017 do not differ significantly between the GFD-100 and Cowles/S&P Composite, there are several advantages in using the GFD-100 as the benchmark for long-term historical data for the United States stock market rather than the Cowles/S&P Composite.
1. The GFD-100 uses shares that traded on all United States exchanges and over-the-counter from 1791 until 2017. The Cowles/S&P Composite is limited to the New York Stock Exchange before 1972. Finance companies that traded OTC and companies such as Standard Oil which traded OTC for several decades before moving to the NYSE are included in the GFD-100, but excluded from Cowles/S&P.
2. The GFD-100 uses shares from all sectors, including finance, from 1791 until 2019. The Cowles/S&P Composite only includes finance shares beginning in 1976 and ignores the finance sector before 1976.
3. The GFD-100 includes accurate data on dividends from 1791 to 2019. The Cowles/S&P Composite only calculated dividends from 1871 until 2017. Dividends before 1871 are not included in the Cowles/S&P Composite because no data on dividends were collected. Consequently, existing indices are missing 80 years of dividend data.
4. The GFD-100 is capitalization weighted from 1791 until 2019. The Cowles/S&P is cap-weighted from 1871 until 2019 and includes no capitalization weighting before 1871.
5. The Smith and Cole bank indices that cover the period from 1802 until 1845 suffer from survivorship bias. Banks were chosen for the indices based upon their longevity, not on their size or liquidity. The GFD-100 components are chosen based upon their market capitalization. The largest companies are chosen each January and are “held” in the portfolio for the rest of the year, after which a new portfolio is selected for the coming year.
6. The Smith and Cole indices are based upon a very limited population of six to eighteen companies per year from 1802 until 1862. The GFD-100 includes 50 companies from 1825 until 1850 and 100 companies from 1851 using a broader population of shares.
7. The GFD-100 uses a consistent methodology from 1791 until 2019. The Smith and Cole, Macaulay, Cowles and S&P indices use different methodologies. The data used to put together the composite are collected from four different sources and chain-linked together in an uncertain pattern. During periods when more than one index exists, the choice of index changes the calculated rate of return.
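To see why the choice of splice point matters when indices from different sources are chain-linked, consider the minimal sketch below. The `chain_link` function and both series are hypothetical, with made-up index levels; the sketch shows one common way of splicing (rescaling the later series to match the earlier one at a chosen year) and is not a description of how the Cowles/S&P Composite was actually assembled.

```python
def chain_link(earlier, later, splice_year):
    """Rescale `later` to equal `earlier` at splice_year, then join them.

    Both arguments map year -> index level and must contain splice_year.
    """
    factor = earlier[splice_year] / later[splice_year]
    linked = {y: v for y, v in earlier.items() if y <= splice_year}
    linked.update({y: v * factor for y, v in later.items() if y > splice_year})
    return linked


# Hypothetical index levels for two series that overlap from 1857 to 1862.
series_a = {1857: 100.0, 1860: 120.0, 1862: 90.0}
series_b = {1857: 50.0, 1860: 55.0, 1862: 60.0, 1871: 80.0}

early_splice = chain_link(series_a, series_b, 1857)
late_splice = chain_link(series_a, series_b, 1862)

# Cumulative price change from 1857 to 1871 under each splice choice.
early_change = early_splice[1871] / early_splice[1857] - 1.0
late_change = late_splice[1871] / late_splice[1857] - 1.0
print(f"splice at 1857: {early_change:.0%}, splice at 1862: {late_change:.0%}")
```

With these invented numbers the same two source series yield a 60% gain when spliced in 1857 and a 20% gain when spliced in 1862, purely because the series disagree during the overlap.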
Overall, the GFD-100 provides a superior benchmark stock index. Because of its greater accuracy, we would encourage financial historians to use the GFD-100 for their analysis of long-term trends in the stock market in the United States.