© Copyright Quantopian Inc.

© Modifications Copyright QuantRocket LLC

Licensed under the Creative Commons Attribution 4.0.

Disclaimer

by Maxwell Margenot, Gil Wassermann, James Christopher Hall, and Delaney Granizo-Mackenzie

How can we tell whether an alpha factor is good or not? Unfortunately, there is no cut-off that tells you whether a factor is objectively useful. Instead, we need to compare a particular factor to other options before deciding whether to use it. Our end goal in defining and selecting the best factors is to use them to rank stocks in a long-short equity strategy, covered elsewhere in the lecture series. The more independently predictive the factors we use, the better our ranking scheme and our overall strategy will be.

What we want when comparing factors is to make sure the chosen signal is actually predictive of **relative price movements**. We do not want to predict the absolute amount the assets in our universe will move up or down. We only care that we can choose assets to long that will do better than the assets we short. In a long-short equity strategy, we hold a long basket and a short basket of assets, determined by the factor values associated with each asset in our universe. If our ranking scheme is predictive, assets in the top basket will tend to outperform assets in the bottom basket. As long as this spread is consistent over time, our strategy will have a positive return.

An individual factor can have a lot of moving parts to assess, but ideally it should be independent of other factors that you are already trading on in order to keep your portfolio diverse. We discuss the reasoning for this in the Position Concentration Risk lecture.

In this lecture, we detail and explain relevant statistics to evaluate your alpha factor before attempting to implement it in an algorithm. What's important to keep in mind is that all the metrics provided here are relative to other factors you may be trading or evaluating.

Let's have a look at a factor and try to assess its viability. We will calculate the factor values using Pipeline, so make sure you check out the Pipeline tutorial if you are unfamiliar with how Pipeline works.

In [1]:

```
from zipline.research import run_pipeline, get_forward_returns
from zipline.pipeline import Pipeline, master, EquityPricing
from zipline.pipeline.factors import Returns, AverageDollarVolume
```

Here we will be using a **momentum** factor as our example. Momentum factors are a very common form of alpha factor and they come in many shapes and sizes. All of them, however, try to get at the same idea: that securities in motion will stay in motion. Momentum factors try to quantify trends in financial markets and to "ride the wave", so to speak.

Let's say that we suspect that a momentum factor could potentially be predictive of stock returns.

In [2]:

```
momentum = Returns(window_length=252, exclude_window_length=21)
```

This momentum factor takes the change in price over the past year, up until a month ago.
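Outside of Pipeline, the same "12-month return, skipping the most recent month" idea can be sketched in plain pandas. This is an illustrative approximation with a hypothetical price series; `Returns` handles the exact windowing and cross-sectional alignment for the whole universe:

```
import numpy as np
import pandas as pd

# hypothetical daily close prices for a single stock
prices = pd.Series(np.linspace(100, 150, 300))

# return over a 252-day window ending 21 days ago:
# roughly price[t - 21] / price[t - 21 - 251] - 1
window_length, exclude = 252, 21
momentum = prices.shift(exclude) / prices.shift(exclude + window_length - 1) - 1
```

Note that the first `exclude + window_length - 1` observations are NaN because the lookback window is not yet full.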

In order to judge whether a factor is viable, we have created a package called Alphalens. Its source code is available on GitHub if you want to get into the nitty-gritty of how it works. We use Alphalens to create a "tear sheet" of a factor, similar to how we use pyfolio to create a tear sheet for analyzing backtests.

In [3]:

```
import alphalens as al
```

Alphalens takes your factor and examines how useful it is for predicting relative value through a collection of different metrics. It breaks all the stocks in your chosen universe into different quantiles based on their ranking according to your factor, analyzes the returns, information coefficient (IC), and turnover of each quantile, and provides a breakdown of returns and IC by sector (or other grouping key).

Throughout the course of this lecture we will look at the major sections of an Alphalens tear sheet one by one and detail how to interpret the various individual plots. To do this, we will outline the intermediate steps involved to go from a pipeline definition to a tear sheet. At the end of the notebook, we show an easier way to skip the intermediate steps and generate the whole tear sheet at once.

As always, we need to define our universe. In this case we use a simplified version of the TradableStocksUS universe which does not include a market cap filter (to reduce the data dependencies of this lecture). The universe rules are separated into an `initial_universe` variable consisting of securities master fields and a `screen` variable consisting of all other rules. Defining an `initial_universe` limits the computational universe of the pipeline and thus makes it run faster than if all rules were placed in the `screen`. See the Pipeline Tutorial for a fuller discussion of the distinction between `initial_universe` and `screen`.

In [4]:

```
initial_universe = (
    # common stocks only
    master.SecuritiesMaster.usstock_SecurityType2.latest.eq("Common Stock")
    # primary share only
    & master.SecuritiesMaster.usstock_PrimaryShareSid.latest.isnull()
)

screen = (
    # dollar volume over $2.5M over trailing 200 days
    (AverageDollarVolume(window_length=200) >= 2.5e6)
    # price > $5
    & (EquityPricing.close.latest > 5)
    # no missing data for 200 days (exclude trading halts, IPOs, etc.)
    & EquityPricing.close.all_present(window_length=200)
    & (EquityPricing.volume.latest > 0).all(window_length=200)
)
```

Now we will pull values for our factor for all stocks in our universe by using Pipeline. We also want to make sure that we have the sector for each individual equity, so we add the latest sector from the securities master as another column in our Pipeline. Note that running the Pipeline may take a while.

In [5]:

```
pipe = Pipeline(
    columns={
        'momentum': momentum,
        'sector': master.SecuritiesMaster.usstock_Sector.latest
    },
    initial_universe=initial_universe,
    screen=screen
)
results = run_pipeline(pipe, start_date='2010-01-01', end_date='2011-01-01', bundle='usstock-learn-1d')
```

Let's take a look at the data to get a quick sense of what we have.

In [6]:

```
my_factor = results['momentum']
print(my_factor.head())
```

```
date        asset
2010-01-04  Equity(FIBBG000C2V3D6 [A])      0.812808
            Equity(FIBBG000M7KQ09 [AAI])   -0.011086
            Equity(FIBBG000F7RCJ1 [AAP])    0.188061
            Equity(FIBBG000B9XRY4 [AAPL])   1.170468
            Equity(FIBBG000C5QZ62 [AAV])    0.394707
Name: momentum, dtype: float64
```

Our `my_factor` variable contains a pandas `Series` with a factor value for each equity in our universe for each point in time.

Here we create another `Series` that contains sectors for each equity instead of factor values. This is categorical data that we will use as a parameter for `Alphalens` later.

In [7]:

```
sectors = results['sector']
```

Now that we have the basic components of what we need to analyze our factor, we can start to work with `Alphalens`. Note that we will be breaking out individual components of the package, so this is not the typical workflow for using an `Alphalens` tear sheet. The typical workflow is shown at the end of the notebook.

First we calculate our forward returns. The forward returns are the returns we would have received for holding each security for a given number of days starting on the given date, with the holding periods passed in through the `periods` parameter. In our case, we look 1, 5, and 10 days ahead. We can consider this a budget backtest: the tear sheet does not factor in any commission or slippage costs; rather, it simply reports the returns as if we had magically held the specified equities for the specified numbers of days.

In [8]:

```
periods = [1, 5, 10]
forward_returns = get_forward_returns(my_factor, periods=periods, bundle="usstock-learn-1d")
```
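Conceptually, the N-day forward return at date t is the return from holding from t to t+N. Here is a minimal single-asset sketch in pandas with hypothetical prices; `get_forward_returns` handles the multi-asset case and the trading calendar:

```
import pandas as pd

# hypothetical daily closes for one asset
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

# N-day forward return at date t: price[t + N] / price[t] - 1
forward_returns = pd.DataFrame(
    {f"{n}D": prices.shift(-n) / prices - 1 for n in (1, 2)}
)
```

The last N rows of each column are NaN because those holding periods extend past the end of the data.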

Next, we pass our factor data and forward returns to Alphalens:

In [9]:

```
factor_data = al.utils.get_clean_factor(
my_factor,
forward_returns,
groupby=sectors
)
```

Dropped 0.9% entries from factor data: 0.9% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions). max_loss is 35.0%, not exceeded: OK!

The `factor_data` variable here is similar to the `my_factor` variable above. It has a factor value for every equity in our universe at each point in time. Our `Alphalens` function here has also provided a sector grouping to go along with the factor value.

In [10]:

```
factor_data.head()
```

Out[10]:

| date | asset | 1D | 5D | 10D | factor | group | factor_quantile |
|---|---|---|---|---|---|---|---|
| 2010-01-04 | Equity(FIBBG000C2V3D6 [A]) | 0.007403 | -0.008690 | -0.020599 | 0.812808 | Technology | 5 |
| | Equity(FIBBG000M7KQ09 [AAI]) | -0.007663 | -0.022989 | 0.045977 | -0.011086 | Industrials | 1 |
| | Equity(FIBBG000F7RCJ1 [AAP]) | -0.002470 | 0.003953 | -0.028903 | 0.188061 | Consumer Discretionary | 3 |
| | Equity(FIBBG000B9XRY4 [AAPL]) | 0.015555 | 0.005922 | -0.022787 | 1.170468 | Technology | 5 |
| | Equity(FIBBG000C5QZ62 [AAV]) | 0.047546 | 0.118098 | 0.078221 | 0.394707 | Energy | 3 |

As explained above, the forward returns are the returns we would have received for holding each security for the specified number of days, starting on the given date. These, too, are broken out by sector.

This function also separates our factor into quantiles for each date, replacing the factor value with its appropriate quantile on a given day. Since we will be holding baskets of the top and bottom quantiles, we only care about the factor insofar as it relates to movement into and out of these baskets.
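The per-date quantile assignment is essentially what `pandas.qcut` does. Here is a sketch with hypothetical factor values for a single date; `get_clean_factor` repeats this for every date in the analysis:

```
import pandas as pd

# hypothetical factor values for six assets on one date
factor = pd.Series([-0.5, -0.1, 0.2, 0.6, 1.4, 2.0])

# split into 3 equal-sized quantiles; quantile 1 = lowest, 3 = highest
factor_quantile = pd.qcut(factor, q=3, labels=False) + 1
```

Each quantile receives the same number of assets, regardless of how the raw factor values are distributed.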

Alphalens provides four categories of analysis on alpha factors:

- Factor Distribution
- Returns
- Information
- Turnover

Each of these topics is covered in its own separate tear sheet as well as an all-encompassing full tear sheet.

A full tear sheet begins with a table of statistics depicting the distribution of the factor values. This table can be displayed on its own with the following code:

In [11]:

```
al.plotting.plot_factor_distribution_table(factor_data)
```

| Factor Quantile | min | max | mean | std | count | avg daily count | count % |
|---|---|---|---|---|---|---|---|
| 1 | -0.943 | 0.384 | -0.104 | 0.182 | 94,571 | 373.8 | 20.0% |
| 2 | -0.173 | 0.667 | 0.146 | 0.154 | 94,427 | 373.2 | 20.0% |
| 3 | -0.032 | 1.042 | 0.319 | 0.197 | 94,424 | 373.2 | 20.0% |
| 4 | 0.092 | 1.820 | 0.552 | 0.297 | 94,427 | 373.2 | 20.0% |
| 5 | 0.242 | 111.613 | 1.634 | 2.871 | 94,523 | 373.6 | 20.0% |

The table includes a row for each of the factor quantiles. By default, factors are divided into 5 equal-sized quantiles; this can be overridden using the `quantiles` parameter of the `get_clean_factor` function (or of the more typical functions shown later in the notebook).

For each quantile, we see the min, max, mean, and standard deviation of factor values. This gives us a sense of which factor values are ending up in which quantiles.

We also see the total count of values per quantile for the whole analysis, the average daily count per quantile, and the percentage count per quantile. Here, because we used equal-sized quantiles (the default option), the counts are the same for each quantile. Alternatively, we could have specified the `bins` parameter to `get_clean_factor` (or to the more typically used functions shown later in the notebook) instead of `quantiles`. Using the `bins` parameter results in equal-*width* buckets (spaced according to the overall range of factor values) rather than equal-*sized* quantiles. In that scenario, the counts will typically differ across buckets.
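The distinction mirrors `pandas.qcut` (equal-sized, quantile-based buckets) versus `pandas.cut` (equal-width buckets). A sketch with hypothetical values, where one outlier makes the difference obvious:

```
import pandas as pd

# hypothetical factor values with one large outlier
values = pd.Series([0.0, 0.1, 0.2, 0.3, 0.4, 10.0])

# equal-sized buckets: the same number of values lands in each
by_quantile = pd.qcut(values, q=2, labels=False)

# equal-width buckets: edges are spaced over the overall value range,
# so counts can differ sharply when the distribution is skewed
by_bin = pd.cut(values, bins=2, labels=False)
```

With the outlier, `cut` puts five of the six values in the lower bucket and only one in the upper, while `qcut` splits them three and three.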

If we are solely interested in returns, we can create a tear sheet that only contains the returns analysis. The following code block generates all of our returns plots once we have stored the forward returns data:

In [12]:

```
al.tears.create_returns_tear_sheet(factor_data, by_group=True);
```

| | 1D | 5D | 10D |
|---|---|---|---|
| Ann. alpha | -0.045 | -0.053 | -0.040 |
| beta | 0.206 | 0.233 | 0.181 |
| Mean Relative Return Top Quantile (bps) | 1.130 | 0.890 | 0.886 |
| Mean Relative Return Bottom Quantile (bps) | -0.304 | -0.037 | -0.026 |
| Mean Spread (bps) | 1.434 | 0.861 | 0.869 |
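Conceptually, the "Mean Spread" row is the average difference between top- and bottom-quantile relative returns, expressed in basis points. A simplified sketch with hypothetical daily mean returns by quantile (Alphalens computes this from the demeaned quantile returns internally):

```
import pandas as pd

# hypothetical daily mean relative returns by quantile (1 = bottom, 5 = top),
# in decimal terms
daily_mean_returns = pd.DataFrame({
    1: [-0.0002, -0.0001, -0.0003],
    5: [0.0004, 0.0002, 0.0003],
})

# average top-minus-bottom spread, converted to basis points
spread_bps = (daily_mean_returns[5] - daily_mean_returns[1]).mean() * 10000
```

A consistently positive spread is what makes a ranking scheme usable in a long-short strategy, as discussed at the start of this lecture.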