Risk Models
In addition to the expected returns, mean-variance optimisation requires a risk model, some way of quantifying asset risk. The most commonly used risk model is the covariance matrix, a statistical entity that describes the volatility of asset returns and how they vary with one another. This is important because one of the principles of diversification is that risk can be reduced by making many uncorrelated bets (and correlation is just normalised covariance).
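As a rough illustration of the diversification principle, the sketch below compares the variance of an equally weighted portfolio of uncorrelated assets with that of perfectly correlated ones. The helper portfolio_variance, the ten assets and the 20% volatility are hypothetical choices made for this example; they are not part of PyPortfolioOpt.

```python
import numpy as np

def portfolio_variance(weights, cov):
    """Variance of a portfolio with the given weights: w^T Sigma w."""
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(cov, dtype=float) @ w)

n_assets = 10
vol = 0.2  # hypothetical annualised volatility shared by every asset
equal_weights = np.full(n_assets, 1.0 / n_assets)

# Uncorrelated assets: the covariance matrix is diagonal.
cov_uncorrelated = np.eye(n_assets) * vol**2
# Perfectly correlated assets: every covariance equals vol^2.
cov_correlated = np.full((n_assets, n_assets), vol**2)

var_uncorrelated = portfolio_variance(equal_weights, cov_uncorrelated)
var_correlated = portfolio_variance(equal_weights, cov_correlated)
```

With uncorrelated assets the portfolio variance falls to vol^2 / n_assets, whereas with perfectly correlated assets it stays at vol^2, so spreading the weights brings no reduction in risk.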
In many ways, the subject of risk models is far more important than that of expected returns because historical variance is generally a much more predictive statistic than mean historical returns. In fact, research by Kritzman et al. (2010) [1] suggests that minimum variance portfolios, which do not require expected returns as an input, actually perform much better out of sample.
The problem, however, is that in practice we do not have access to the covariance matrix (in the same way that we don’t have access to expected returns) – the only thing we can do is to make estimates based on past data. The most straightforward approach is to just calculate the sample covariance matrix based on historical returns, but relatively recent (post-2000) research indicates that there are much more robust statistical estimators of the covariance matrix. In addition to providing a wrapper around the estimators in sklearn, PyPortfolioOpt provides some novel alternatives such as semicovariance and exponentially weighted covariance.
Attention
Estimation of the covariance matrix is a very deep and actively researched topic that involves statistics, econometrics, and numerical/computational approaches. Please note that I am not an expert, but I have made an effort to familiarise myself with the seminal papers in the field.
The risk_models module provides functions for estimating the covariance matrix given historical returns. Because of the complexity of estimating covariance matrices (and the importance of efficient computations), this module mostly provides a convenient wrapper around the underrated sklearn.covariance module.
The format of the data input is the same as that in Expected Returns.
Currently implemented:
sample covariance
semicovariance
exponentially weighted covariance
minimum covariance determinant
shrunk covariance matrices:
 manual shrinkage
 Ledoit Wolf shrinkage
 Oracle Approximating shrinkage
covariance to correlation matrix

pypfopt.risk_models.sample_cov(prices, frequency=252)
Calculate the annualised sample covariance matrix of (daily) asset returns.
Parameters:  prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
 frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised sample covariance matrix
Return type: pd.DataFrame
This is the textbook default approach. The entries in the sample covariance matrix (which we denote as \(S\)) are the sample covariances between the i-th and j-th assets (the diagonal entries are the variances). Although the sample covariance matrix is an unbiased estimator of the covariance matrix, i.e. \(E(S) = \Sigma\), in practice it suffers from misspecification error and a lack of robustness. This is particularly problematic in mean-variance optimisation, because the optimiser may give extra credence to the erroneous values.
Note
This should not be your default choice! Please use a shrinkage estimator instead.
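For reference, the calculation described above can be sketched in plain NumPy. This is only an illustration of the textbook estimator, not the library's implementation: the function name sample_cov_sketch, the use of simple percentage returns, and the simulated prices are assumptions made for the example.

```python
import numpy as np

def sample_cov_sketch(prices, frequency=252):
    """Annualised sample covariance from a (dates x assets) price array."""
    prices = np.asarray(prices, dtype=float)
    # Simple (percentage) returns between consecutive rows.
    returns = prices[1:] / prices[:-1] - 1.0
    # rowvar=False: columns are assets; ddof=1 gives the unbiased estimator.
    return np.cov(returns, rowvar=False, ddof=1) * frequency

rng = np.random.default_rng(0)
# Hypothetical price paths for three assets over 500 days.
simulated_returns = rng.normal(0.0005, 0.01, size=(500, 3))
prices = 100 * np.cumprod(1 + simulated_returns, axis=0)

S = sample_cov_sketch(prices)
```

The result is a symmetric 3 x 3 matrix whose diagonal holds the annualised variances of each asset.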

pypfopt.risk_models.semicovariance(prices, benchmark=7.9e-05, frequency=252)
Estimate the semicovariance matrix, i.e. the covariance given that the returns are less than the benchmark.
Parameters:  prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
 benchmark (float) – the benchmark return, defaults to the daily risk-free rate, i.e. \(1.02^{(1/252)} - 1\).
 frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year). Ensure that you use the appropriate benchmark, e.g. if frequency=12 use the monthly risk-free rate.
Returns: semicovariance matrix
Return type: pd.DataFrame
The semivariance is the variance of all returns which are below some benchmark \(B\) (typically the risk-free rate) – it is a common measure of downside risk. There are multiple possible ways of defining a semicovariance matrix, the main differences lying in the ‘pairwise’ nature, i.e. whether we should sum over \(\min(r_i,B)\min(r_j,B)\) or \(\min(r_ir_j, B)\). In this implementation, we have followed the advice of Estrada 2007 [2], preferring:
\[\frac{1}{n}\sum_{i = 1}^n {\sum_{j = 1}^n {\min \left( {{r_i},B} \right)} } \min \left( {{r_j},B} \right)\]
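One common reading of the ‘pairwise’ definition can be sketched as follows. Note the assumptions: the sketch measures each return's shortfall relative to the benchmark, min(r - B, 0), and averages the products of shortfalls over time. That differs slightly from how the formula is written here, and it is not confirmed to match the library's code. The name semicovariance_sketch and the toy returns are hypothetical.

```python
import numpy as np

def semicovariance_sketch(returns, benchmark=0.0, frequency=252):
    """Pairwise semicovariance from a (periods x assets) array of returns."""
    returns = np.asarray(returns, dtype=float)
    # Keep only the shortfall below the benchmark; returns above it count as zero.
    shortfall = np.minimum(returns - benchmark, 0.0)
    n_periods = returns.shape[0]
    # Average product of shortfalls for every pair of assets, then annualise.
    return (shortfall.T @ shortfall) / n_periods * frequency

# Toy returns for two assets over four periods (illustrative values only).
returns = np.array([
    [0.01, -0.02],
    [-0.01, -0.01],
    [0.02, 0.03],
    [-0.03, 0.01],
])
semicov = semicovariance_sketch(returns, benchmark=0.0, frequency=1)
```

Only the periods in which both assets fall below the benchmark contribute to the off-diagonal entries, which is what makes this a measure of joint downside risk.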

pypfopt.risk_models.exp_cov(prices, span=180, frequency=252)
Estimate the exponentially-weighted covariance matrix, which gives greater weight to more recent data.
Parameters:  prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
 span (int, optional) – the span of the exponential weighting function, defaults to 180
 frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised estimate of exponential covariance matrix
Return type: pd.DataFrame
The exponential covariance matrix is a novel way of giving more weight to recent data when calculating covariance, in the same way that the exponential moving average price is often preferred to the simple average price. For a full explanation of how this estimator works, please refer to the blog post on my academic website.
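The idea of weighting recent observations more heavily can be sketched as below. This is an illustrative estimator rather than the library's own: the decay convention alpha = 2 / (span + 1) is an assumption borrowed from the usual definition of span for exponential moving averages, and exp_cov_sketch and the simulated returns are hypothetical.

```python
import numpy as np

def exp_cov_sketch(returns, span=180, frequency=252):
    """Exponentially weighted covariance of a (periods x assets) return array."""
    returns = np.asarray(returns, dtype=float)
    n_periods = returns.shape[0]
    alpha = 2.0 / (span + 1.0)  # assumed span-to-decay convention
    # The last row (most recent) gets the largest weight; older rows decay geometrically.
    weights = (1.0 - alpha) ** np.arange(n_periods - 1, -1, -1)
    weights /= weights.sum()
    weighted_mean = weights @ returns
    demeaned = returns - weighted_mean
    cov = (demeaned * weights[:, None]).T @ demeaned
    return cov * frequency

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, size=(300, 2))
# Make the most recent 50 days far more volatile for the first asset.
returns[-50:, 0] *= 4

ew_cov = exp_cov_sketch(returns, span=30)
plain_cov = np.cov(returns, rowvar=False) * 252
```

Because the recent, more volatile period dominates the weights, the exponentially weighted variance of the first asset comes out higher than the equally weighted sample variance.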

pypfopt.risk_models.min_cov_determinant(prices, frequency=252, random_state=None)
Calculate the minimum covariance determinant, an estimator of the covariance matrix that is more robust to noise.
Parameters:  prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
 frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
 random_state (int, optional) – random seed to make results reproducible, defaults to None
Returns: annualised estimate of covariance matrix
Return type: pd.DataFrame
The minimum covariance determinant (MCD) estimator is designed to be robust to outliers and ‘contaminated’ data [3]. An efficient estimator is implemented in the sklearn.covariance module, which is based on the algorithm presented in Rousseeuw 1999 [4].

pypfopt.risk_models.cov_to_corr(cov_matrix)
Convert a covariance matrix to a correlation matrix.
Parameters: cov_matrix (pd.DataFrame) – covariance matrix
Returns: correlation matrix
Return type: pd.DataFrame
Note
This is especially useful for visualising the ‘correlation matrices’ that are associated with (shrunk) covariance matrices, using Matplotlib’s imshow or Seaborn’s heatmap.
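The conversion itself amounts to dividing each covariance by the product of the two corresponding standard deviations. A minimal NumPy sketch follows; cov_to_corr_sketch and the example matrix are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def cov_to_corr_sketch(cov):
    """Normalise a covariance matrix into a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    # corr_ij = cov_ij / (std_i * std_j)
    return cov / np.outer(std, std)

# Two assets with volatilities 0.2 and 0.3 and covariance 0.006.
cov = np.array([[0.04, 0.006], [0.006, 0.09]])
corr = cov_to_corr_sketch(cov)
```

Here the off-diagonal correlation is 0.006 / (0.2 * 0.3) = 0.1, and the diagonal entries are 1.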
Shrinkage estimators
A great starting point for those interested in understanding shrinkage estimators is Honey, I Shrunk the Sample Covariance Matrix [5] by Ledoit and Wolf, which does a good job at capturing the intuition behind them – we will adopt the notation used therein. I have written a summary of this article, which is available on my website. A more rigorous reference can be found in Ledoit and Wolf (2001) [6].
The essential idea is that the unbiased but often poorly estimated sample covariance \(S\) can be combined with a structured estimator \(F\), using the formula below (where \(\delta\) is the shrinkage constant):
\[\hat{\Sigma} = \delta F + (1 - \delta) S\]
It is called shrinkage because it can be thought of as “shrinking” the sample covariance matrix towards the other estimator, which is accordingly called the shrinkage target. The shrinkage target may be significantly biased but has little estimation error. There are many possible options for the target, and each one will result in a different optimal shrinkage constant \(\delta\). PyPortfolioOpt offers the following shrinkage methods:
Ledoit-Wolf shrinkage:
 constant_variance shrinkage, i.e. the target is the diagonal matrix with the mean of asset variances on the diagonals and zeroes elsewhere. This is the shrinkage offered by sklearn.covariance.LedoitWolf.
 single_factor shrinkage. Based on Sharpe’s single-index model, which effectively uses a stock’s beta to the market as a risk model. See Ledoit and Wolf 2001 [6].
 constant_correlation shrinkage, in which all pairwise correlations are set to the average correlation (sample variances are unchanged). See Ledoit and Wolf 2003 [5].
Oracle approximating shrinkage (OAS), invented by Chen et al. (2010) [7], which has a lower mean-squared error than Ledoit-Wolf shrinkage when samples are Gaussian or near-Gaussian.
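To make the notion of a shrinkage target concrete, the sketch below builds the constant_correlation target from a sample covariance matrix: variances are kept, and every pairwise correlation is replaced by the average off-diagonal sample correlation. It covers only the target, not the estimation of the optimal shrinkage constant. The function name constant_correlation_target and the example matrix are hypothetical.

```python
import numpy as np

def constant_correlation_target(S):
    """Shrinkage target with all pairwise correlations set to their average."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    std = np.sqrt(np.diag(S))
    corr = S / np.outer(std, std)
    # Average of the off-diagonal correlations (the diagonal contributes n ones).
    mean_corr = (corr.sum() - n) / (n * (n - 1))
    F = mean_corr * np.outer(std, std)
    np.fill_diagonal(F, np.diag(S))  # sample variances are unchanged
    return F

# Pairwise sample correlations here are 0.2, 0.0 and 0.2, so the average is 2/15.
S = np.array([
    [0.04, 0.012, 0.0],
    [0.012, 0.09, 0.03],
    [0.0, 0.03, 0.25],
])
F = constant_correlation_target(S)
```

In the target, the zero covariance between the first and third assets is replaced by a positive value implied by the average correlation, while the diagonal matches the sample variances.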
Tip
For most use cases, I would just go with Ledoit Wolf shrinkage, as recommended by Quantopian in their lecture series on quantitative finance.
My implementations have been translated from the Matlab code on Michael Wolf’s webpage, with the help of xtuanta.

class pypfopt.risk_models.CovarianceShrinkage(prices, frequency=252)
Provide methods for computing shrinkage estimates of the covariance matrix, using the sample covariance matrix and choosing the structured estimator to be an identity matrix multiplied by the average sample variance. The shrinkage constant can be input manually, though there exist methods (notably Ledoit-Wolf) to estimate the optimal value.
Instance variables:
 X (returns)
 S (sample covariance matrix)
 delta (shrinkage constant)

__init__(prices, frequency=252)
Parameters:  prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
 frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)

format_and_annualise(raw_cov_array)
Helper method which annualises the output of shrinkage calculations, and formats the result into a dataframe.
Parameters: raw_cov_array (np.ndarray) – raw covariance matrix of daily returns
Returns: annualised covariance matrix
Return type: pd.DataFrame

ledoit_wolf(shrinkage_target='constant_variance')
Calculate the Ledoit-Wolf shrinkage estimate for a particular shrinkage target.
Parameters: shrinkage_target (str, optional) – choice of shrinkage target, either constant_variance, single_factor or constant_correlation. Defaults to constant_variance.
Raises: NotImplementedError – if the shrinkage_target is unrecognised
Returns: shrunk sample covariance matrix
Return type: np.ndarray

oracle_approximating()
Calculate the Oracle Approximating Shrinkage estimate.
Returns: shrunk sample covariance matrix
Return type: np.ndarray

shrunk_covariance(delta=0.2)
Shrink a sample covariance matrix to the identity matrix (scaled by the average sample variance). This method does not estimate an optimal shrinkage parameter; it requires manual input.
Parameters: delta (float, optional) – shrinkage parameter, defaults to 0.2.
Returns: shrunk sample covariance matrix
Return type: np.ndarray
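Manual shrinkage towards the scaled identity can be sketched in a few lines: the target F is the identity matrix multiplied by the average sample variance, and the result is delta * F + (1 - delta) * S. This is an illustration of the arithmetic only, with a hypothetical function name and example matrix; it is not the library's code.

```python
import numpy as np

def shrink_to_scaled_identity(S, delta=0.2):
    """Shrink a sample covariance matrix S towards (average variance) * identity."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    # Target: identity scaled by the mean of the sample variances.
    F = np.identity(n) * np.trace(S) / n
    return delta * F + (1.0 - delta) * S

# Average variance is (0.04 + 0.16) / 2 = 0.1.
S = np.array([[0.04, 0.02], [0.02, 0.16]])
shrunk = shrink_to_scaled_identity(S, delta=0.5)
```

With delta = 0.5 the variances move halfway towards 0.1 and the covariance is halved; delta = 0 returns the sample covariance unchanged, and delta = 1 returns the target itself.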
References
[1] Kritzman, Page & Turkington (2010). In defense of optimization: The fallacy of 1/N. Financial Analysts Journal, 66(2), 31–39.
[2] Estrada (2006). Mean-Semivariance Optimization: A Heuristic Approach.
[3] Rousseeuw, P. J. (1984). Least median of squares regression. The Journal of the American Statistical Association, 79, 871–880.
[4] Rousseeuw, P. J. (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. The Journal of the American Statistical Association, 41, 212–223.
[5] Ledoit, O., & Wolf, M. (2003). Honey, I Shrunk the Sample Covariance Matrix. The Journal of Portfolio Management, 30(4), 110–119. https://doi.org/10.3905/jpm.2004.110
[6] Ledoit, O., & Wolf, M. (2001). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, 10, 603–621.
[7] Chen et al. (2010). Shrinkage Algorithms for MMSE Covariance Estimation. IEEE Transactions on Signal Processing, 58(10), 5016–5029.