Sector neutralization is a technique to neutralize sector exposures by ranking and comparing factors separately within each sector rather than comparing companies across the entire market. This is done to hedge out sector bets and reduce the impact of sector-specific risks on the portfolio.
Many fundamental metrics naturally vary across sectors and industries. These variations don't necessarily indicate better or worse companies but simply represent differences in the way different industries operate. Sector neutralization can be useful with these kinds of metrics.
In this notebook we'll use the debt-to-equity ratio (D/E ratio) to explore sector neutralization.
Let's start by looking at how the D/E ratio varies across sectors. To do so, we define a pipeline for our base universe, with the D/E ratio and sector as columns:
from zipline.pipeline import master, sharadar, Pipeline
from codeload.fundamental_factors.universe import CommonStocks, BaseUniverse
fundamentals = sharadar.Fundamentals.slice('ART')
de = fundamentals.DE.latest
sector = master.SecuritiesMaster.sharadar_Sector.latest
universe = BaseUniverse()
pipeline = Pipeline(
columns={
'de': de,
'sector': sector
},
initial_universe=CommonStocks(),
screen=universe
)
Then we run the pipeline, group the results by sector, and plot the median D/E ratio for each sector:
from zipline.research import run_pipeline
results = run_pipeline(pipeline, '2022-12-30', '2022-12-30')
results.groupby('sector').median().plot(kind="barh", title="Median debt-to-equity ratio by sector");
The median D/E ratio for the financial sector is much higher than for other sectors, and the D/E ratio for the healthcare sector is considerably lower than for other sectors. This means that if we rank stocks by their D/E ratio, the financial sector will be over-represented at one end of the rankings and the healthcare sector will be over-represented at the other end.
Let's visualize this over-representation. To do so, we'll rank by D/E ratio in ascending order (that is, assigning ranking 1 to the lowest D/E ratio) and form quintiles from the ranks. We will then be able to analyze the sector representation in the highest and lowest quintiles, respectively.
pipeline = Pipeline(
columns={
# we mask rank() with universe to avoid ranking stocks that aren't in our universe
'quintiles': de.rank(mask=universe).quintiles(),
'sector': sector,
},
initial_universe=CommonStocks(),
screen=universe
)
results = run_pipeline(pipeline, '2022-12-30', '2022-12-30')
To establish a baseline, let's count the number of securities per sector to see how different sectors compare in size. We can see that Financial Services, Healthcare, Technology, and Industrials are the largest sectors in the market:
securities_by_sector = results.groupby('sector').quintiles.count()
ax = securities_by_sector.plot(kind="pie", title="Number of securities by sector")
ax.set_ylabel('');
Now, let's count the number of securities per sector in the lowest and highest quintiles by D/E ratio. The quintile labels are zero-indexed, meaning quintile 0 contains stocks with the lowest D/E ratios, and quintile 4 contains stocks with the highest D/E ratios.
lowest_quintile_by_sector = results[results.quintiles == 0].groupby('sector').quintiles.count()
highest_quintile_by_sector = results[results.quintiles == 4].groupby('sector').quintiles.count()
We then view pie charts of the results. As expected, healthcare stocks dominate the low D/E quintile, and financial stocks dominate the high D/E quintile. Consequently, a long-short portfolio formed using the D/E ratio would largely be just a bet on healthcare vs financials:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
lowest_quintile_by_sector.plot(kind="pie", ax=axes[0], title="Number of securities in lowest DE quintile by sector")
highest_quintile_by_sector.plot(kind="pie", ax=axes[1], title="Number of securities in highest DE quintile by sector")
for ax in axes:
ax.set_ylabel('')
fig.tight_layout()
To avoid sector concentration, we can incorporate the sector into the ranking process by using the groupby parameter of rank(). Normally, the rank() method ranks all securities from $1 \rightarrow N$, where $N$ is the number of securities in the universe. In contrast, rank(groupby=sector) ranks securities from $1 \rightarrow N$ within each sector, where $N$ is the number of securities in the sector. In the resulting output, if there are $S$ sectors, there will be $S$ stocks ranked 1, $S$ stocks ranked 2, etc. Ranking by sector will allow us to form a portfolio in which each sector is equally represented.
The following pipeline expression ranks by sector, then forms quintiles from the resulting ranks:
neutral_quintiles = de.rank(mask=universe, groupby=sector).quintiles()
Let's re-run the previous pipeline with the neutral quintiles to see how this affects the composition of our low D/E quintile:
pipeline = Pipeline(
columns={
'quintiles': de.rank(mask=universe).quintiles(),
'neutral_quintiles': neutral_quintiles,
'sector': sector,
},
initial_universe=CommonStocks(),
screen=universe
)
results = run_pipeline(pipeline, '2022-12-30', '2022-12-30')
We plot the sector breakdown of the low DE quintile without sector neutralization (on the left) and with sector neutralization (on the right). Sector neutralization has eliminated the over-weighting of healthcare and created balance across sectors:
lowest_quintile_by_sector_neutralized = results[results.neutral_quintiles == 0].groupby('sector').neutral_quintiles.count()
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
lowest_quintile_by_sector.plot(kind="pie", ax=axes[0], title="Number of securities in lowest DE quintile by sector (not neutralized)")
lowest_quintile_by_sector_neutralized.plot(kind="pie", ax=axes[1], title="Number of securities in lowest DE quintile by sector (neutralized)")
for ax in axes:
ax.set_ylabel('')
fig.tight_layout()
Let's create a similar set of plots for the high D/E quintile, to see if we have eliminated the over-weighting of financial stocks in this quintile:
highest_quintile_by_sector_neutralized = results[results.neutral_quintiles == 4].groupby('sector').neutral_quintiles.count()
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
highest_quintile_by_sector.plot(kind="pie", ax=axes[0], title="Number of securities in highest DE quintile by sector (not neutralized)")
highest_quintile_by_sector_neutralized.plot(kind="pie", ax=axes[1], title="Number of securities in highest DE quintile by sector (neutralized)")
for ax in axes:
ax.set_ylabel('')
fig.tight_layout()
That doesn't look good! The sector-neutralized, high D/E quintile (right-hand plot) is now highly concentrated in four sectors: Healthcare, Financial Services, Technology, and Industrials. What's going on? You may notice that these four sectors are the four largest sectors, as we saw earlier. When we rank by sector using rank(groupby=sector), the stocks in each sector are ranked from $1 \rightarrow N$, where $N$ is the number of securities in the sector. This means that larger sectors end up having securities with larger maximum ranks. As a result, when we get to the last quintile (which contains stocks with high ranks), only the larger sectors have any securities left to put in the quintile.
This means that if you want to construct a sector-neutralized high D/E quintile, you should not use quintile 4 of the D/E ratio ranked by sector in ascending order. Rather, you should form quintiles from D/E ratio ranked by sector in descending order (highest D/E ratio first) and select quintile 0.
This is the approach used by the top() and bottom() Factor methods in Pipeline, so you can avoid the above problem by using those methods to select high and low D/E stocks. The following code selects the 10 stocks from each sector with the lowest D/E ratio and the 10 stocks from each sector with the highest D/E ratio:
# this ranks D/E from low to high and takes the first 10 per sector
lowest_de_stocks = de.bottom(10, mask=universe, groupby=sector)
# this ranks D/E from high to low and takes the first 10 per sector
highest_de_stocks = de.top(10, mask=universe, groupby=sector)
To double-check that this results in the sector balance we expect, let's create a pipeline containing the low D/E and high D/E groups of stocks we just selected. We use the if_else() Filter method combined with Constant() to label the groups as "low" and "high", respectively:
from zipline.pipeline import Constant
# limit the output to the low DE and high DE stocks we just selected
screen = lowest_de_stocks | highest_de_stocks
# label the two groups of stocks (if_else() returns the first argument
# if lowest_de_stocks is True and the second argument if it is False)
label = lowest_de_stocks.if_else(Constant('low'), Constant('high'))
pipeline = Pipeline(
columns={
'label': label,
'sector': sector
},
initial_universe=CommonStocks(),
screen=screen
)
Running this pipeline, we confirm that there are 10 stocks per sector in the low and high D/E groups:
results = run_pipeline(pipeline, start_date="2022-12-30", end_date="2022-12-30")
results['label'] = results.label.cat.remove_unused_categories()
results.groupby(['label', 'sector']).size()
label sector
high 0
Basic Materials 10
Communication Services 10
Consumer Cyclical 10
Consumer Defensive 10
Energy 10
Financial Services 10
Healthcare 10
Industrials 10
Real Estate 10
Technology 10
Utilities 10
low 0
Basic Materials 10
Communication Services 10
Consumer Cyclical 10
Consumer Defensive 10
Energy 10
Financial Services 10
Healthcare 10
Industrials 10
Real Estate 10
Technology 10
Utilities 10
dtype: int64Another way to neutralize sectors is to demean them. With demeaning, we calculate the mean D/E ratio for each sector and subtract it from the observed values. Accordingly, the resulting values are centered around zero for each sector, which allows us to better compare values across sectors. If we rank stocks on their demeaned D/E ratios, we are ranking them not on their absolute D/E ratio but on how high or low their D/E ratio is relative to the sector average.
The following pipeline expression forms quintiles from sector-demeaned D/E ratios:
demeaned_quintiles = de.demean(mask=universe, groupby=sector).quintiles()
Alternatively, we can z-score the D/E ratios, which is like demeaning but includes the additional step of dividing the demeaned values by the standard deviation of D/E ratios for the sector. Whereas demeaning neutralizes the effect of one sector having generally higher or lower D/E ratios than another sector, z-scoring additionally removes the effect of one sector having a wider variation of D/E ratios than another sector.
To z-score in Pipeline, just modify the previous expression to use zscore() instead of demean():
zscored_quintiles = de.zscore(mask=universe, groupby=sector).quintiles()
Note that, unlike the ranking approaches shown above, demeaning and z-scoring don't guarantee that your quantiles will be equally weighted among all sectors. It is still possible that the best or worst D/E ratios will cluster more in one sector than another, even after adjusting for sector differences. Moreover, since some sectors are larger than others, we should naturally expect that those larger sectors will make up a larger portion of any given quantile, on average.
An alternative to neutralizing by sector is to neutralize by industry. Industries are more specific than sectors. In theory, the appeal of using industries instead of sectors is that companies within a given industry are more closely related than companies within a given sector, thus providing a more accurate benchmark. However, many industries are too small to allow for meaningful comparisons. For this reason, sectors usually provide a better balance of granularity and adequate sample size. Industries are best used when you wish to specifically target a particular industry or group of industries, which you know in advance are large enough to yield meaningful results.
The number of stocks per industry is shown below:
pipeline = Pipeline(
columns={
'sector': master.SecuritiesMaster.sharadar_Sector.latest,
'industry': master.SecuritiesMaster.sharadar_Industry.latest,
},
initial_universe=CommonStocks(),
screen=universe
)
results = run_pipeline(pipeline, '2022-12-30', '2022-12-30')
counts = results.groupby(['sector', 'industry']).size()
print(counts[counts > 0].to_string())
sector industry
Basic Materials Agricultural Inputs 11
Aluminum 3
Building Materials 9
Building Products & Equipment 28
Chemicals 16
Coking Coal 6
Copper 3
Gold 10
Lumber & Wood Production 4
Other Industrial Metals & Mining 5
Other Precious Metals & Mining 4
Paper & Paper Products 5
Specialty Chemicals 54
Steel 13
Thermal Coal 6
Uranium 4
Communication Services Advertising Agencies 26
Broadcasting 12
Electronic Gaming & Multimedia 11
Entertainment 27
Internet Content & Information 27
Telecom Services 32
Consumer Cyclical Apparel Manufacturing 16
Apparel Retail 34
Auto & Truck Dealerships 19
Auto Manufacturers 14
Auto Parts 37
Department Stores 4
Footwear & Accessories 11
Furnishings Fixtures & Appliances 25
Gambling 10
Home Improvement Retail 9
Internet Retail 19
Leisure 24
Lodging 8
Luxury Goods 8
Packaging & Containers 21
Personal Services 11
Publishing 7
Recreational Vehicles 13
Residential Construction 21
Resorts & Casinos 17
Restaurants 46
Specialty Retail 38
Textile Manufacturing 3
Consumer Defensive Beverages - Brewers 2
Beverages - Non-Alcoholic 10
Beverages - Wineries & Distilleries 7
Confectioners 4
Discount Stores 9
Education & Training Services 14
Farm Products 14
Food Distribution 11
Grocery Stores 9
Household & Personal Products 19
Packaged Foods 45
Pharmaceutical Retailers 3
Tobacco 5
Energy Oil & Gas Drilling 7
Oil & Gas E&P 71
Oil & Gas Equipment & Services 44
Oil & Gas Integrated 5
Oil & Gas Midstream 37
Oil & Gas Refining & Marketing 19
Financial Services Asset Management 90
Banks - Diversified 5
Banks - Regional 347
Capital Markets 34
Credit Services 41
Financial Conglomerates 3
Financial Data & Stock Exchanges 12
Insurance - Diversified 9
Insurance - Life 17
Insurance - Property & Casualty 34
Insurance - Reinsurance 7
Insurance - Specialty 22
Insurance Brokers 11
Mortgage Finance 14
Healthcare Biotechnology 454
Diagnostics & Research 50
Drug Manufacturers - General 13
Drug Manufacturers - Specialty & Generic 53
Health Information Services 41
Healthcare Plans 9
Medical Care Facilities 38
Medical Devices 106
Medical Distribution 7
Medical Instruments & Supplies 47
Industrials Aerospace & Defense 45
Airlines 14
Airports & Air Services 2
Business Equipment & Supplies 5
Conglomerates 17
Consulting Services 12
Electrical Equipment & Parts 33
Engineering & Construction 36
Farm & Heavy Construction Machinery 23
Industrial Distribution 18
Infrastructure Operations 2
Integrated Freight & Logistics 14
Marine Shipping 6
Metal Fabrication 17
Pollution & Treatment Controls 10
Railroads 10
Rental & Leasing Services 20
Security & Protection Services 16
Shell Companies 49
Specialty Business Services 31
Specialty Industrial Machinery 64
Staffing & Employment Services 21
Tools & Accessories 13
Travel Services 10
Trucking 18
Waste Management 13
Real Estate REIT - Diversified 20
REIT - Healthcare Facilities 15
REIT - Hotel & Motel 17
REIT - Industrial 17
REIT - Mortgage 40
REIT - Office 23
REIT - Residential 21
REIT - Retail 30
REIT - Specialty 20
Real Estate - Development 8
Real Estate - Diversified 5
Real Estate Services 23
Technology Communication Equipment 48
Computer Hardware 22
Consumer Electronics 10
Electronic Components 33
Electronics & Computer Distribution 10
Information Technology Services 41
Scientific & Technical Instruments 24
Semiconductor Equipment & Materials 25
Semiconductors 45
Software - Application 165
Software - Infrastructure 94
Solar 14
Utilities Utilities - Diversified 10
Utilities - Independent Power Producers 2
Utilities - Regulated Electric 32
Utilities - Regulated Gas 12
Utilities - Regulated Water 13
Utilities - Renewable 9