
PyPortfolioOpt is a library that implements portfolio optimization methods, including classical efficient frontier techniques and Black-Litterman allocation, as well as more recent developments in the field like shrinkage and Hierarchical Risk Parity, along with some novel experimental features like exponentially-weighted covariance matrices.
It is extensive yet easily extensible, and can be useful for both the casual investor and the serious practitioner. Whether you are a fundamentals-oriented investor who has identified a handful of undervalued picks, or an algorithmic trader who has a basket of strategies, PyPortfolioOpt can help you combine your alpha sources in a risk-efficient way.
Installation¶
If you would like to play with PyPortfolioOpt interactively in your browser, you may launch Binder here. It takes a while to set up, but it lets you try out the cookbook recipes without having to install anything.
Prior to installing PyPortfolioOpt, you need to install C++. On macOS, this means that you need to install XCode Command Line Tools (see here).
For Windows users, download Visual Studio here, with additional instructions here.
Installation can then be done via pip:
pip install PyPortfolioOpt
For the sake of best practice, it is good to do this with a dependency manager. I suggest you set yourself up with poetry, then within a new poetry project run:
poetry add PyPortfolioOpt
The alternative is to clone/download the project, then in the project directory run
python setup.py install
Thanks to Thomas Schmelzer, PyPortfolioOpt now supports Docker (requires
make, docker, docker-compose). Build your first container with
make build
; run tests with make test
. For more information, please read
this guide.
Note
If any of these methods don’t work, please raise an issue with the ‘packaging’ label on GitHub
For developers¶
If you are planning on using PyPortfolioOpt as a starting template for significant modifications, it probably makes sense to clone the repository and to just use the source code
git clone https://github.com/robertmartin8/PyPortfolioOpt
Alternatively, if you still want the convenience of a global from pypfopt import x
,
you should try
pip install -e git+https://github.com/robertmartin8/PyPortfolioOpt.git
A Quick Example¶
This section contains a quick look at what PyPortfolioOpt can do. For a guided tour, please check out the User Guide. For even more examples, check out the Jupyter notebooks in the cookbook.
If you already have expected returns mu
and a risk model S
for your set of
assets, generating an optimal portfolio is as easy as:
from pypfopt.efficient_frontier import EfficientFrontier
ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()
However, if you would like to use PyPortfolioOpt’s built-in methods for calculating the expected returns and covariance matrix from historical data, that’s fine too:
import pandas as pd
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
# Read in price data
df = pd.read_csv("tests/resources/stock_prices.csv", parse_dates=True, index_col="date")
# Calculate expected returns and sample covariance
mu = expected_returns.mean_historical_return(df)
S = risk_models.sample_cov(df)
# Optimize for maximal Sharpe ratio
ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()
ef.portfolio_performance(verbose=True)
This outputs the following:
Expected annual return: 33.0%
Annual volatility: 21.7%
Sharpe Ratio: 1.43
Contents¶
User Guide¶
This is designed to be a practical guide, mostly aimed at users who are interested in a quick way of optimally combining some assets (most likely stocks). However, when necessary I do introduce the required theory and also point out areas that may be suitable springboards for more advanced optimization techniques. Details about the parameters can be found in the respective documentation pages (please see the sidebar).
For this guide, we will be focusing on mean-variance optimization (MVO), which is what most people think of when they hear “portfolio optimization”. MVO forms the core of PyPortfolioOpt’s offering, though it should be noted that MVO comes in many flavours, which can have very different performance characteristics. Please refer to the sidebar to get a feeling for the possibilities, as well as the other optimization methods offered. But for now, we will continue with the standard Efficient Frontier.
PyPortfolioOpt is designed with modularity in mind; the below flowchart sums up the current functionality and overall layout of PyPortfolioOpt.

Processing historical prices¶
Mean-variance optimization requires two things: the expected returns of the assets,
and the covariance matrix (or more generally, a risk model quantifying asset risk).
PyPortfolioOpt provides methods for estimating both (located in
expected_returns
and risk_models
respectively), but also supports
users who would like to use their own models.
However, I assume that most users will (at least initially) prefer to use the built-ins. In this case, all you need to supply is a dataset of historical prices for your assets. This dataset should look something like the one below:
XOM RRC BBY MA PFE JPM
date
2010-01-04 54.068794 51.300568 32.524055 22.062426 13.940202 35.175220
2010-01-05 54.279907 51.993038 33.349487 21.997149 13.741367 35.856571
2010-01-06 54.749043 51.690697 33.090542 22.081820 13.697187 36.053574
2010-01-07 54.577045 51.593170 33.616547 21.937523 13.645634 36.767757
2010-01-08 54.358093 52.597733 32.297466 21.945297 13.756095 36.677460
The index should consist of dates or timestamps, and each column should represent the time series of prices for an asset. A dataset of real-life stock prices has been included in the tests folder of the GitHub repo.
Note
Pricing data does not have to be daily, but the frequency should be the same across all assets (workarounds exist but are not pretty).
After reading your historical prices into a pandas dataframe df
, you need to decide
between the available methods for estimating expected returns and the covariance matrix.
Sensible defaults are expected_returns.mean_historical_return()
and
the Ledoit Wolf shrinkage estimate of the covariance matrix found in
risk_models.CovarianceShrinkage
. It is simply a matter of applying the
relevant functions to the price dataset:
from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage
mu = mean_historical_return(df)
S = CovarianceShrinkage(df).ledoit_wolf()
mu
will then be a pandas series of estimated expected returns for each asset,
and S
will be the estimated covariance matrix (part of it is shown below):
GOOG AAPL FB BABA AMZN GE AMD \
GOOG 0.045529 0.022143 0.006389 0.003720 0.026085 0.015815 0.021761
AAPL 0.022143 0.207037 0.004334 0.002954 0.058200 0.038102 0.084053
FB 0.006389 0.004334 0.029233 0.003770 0.007619 0.003008 0.005804
BABA 0.003720 0.002954 0.003770 0.013438 0.004176 0.002011 0.006332
AMZN 0.026085 0.058200 0.007619 0.004176 0.276365 0.038169 0.075657
GE 0.015815 0.038102 0.003008 0.002011 0.038169 0.083405 0.048580
AMD 0.021761 0.084053 0.005804 0.006332 0.075657 0.048580 0.388916
Now that we have expected returns and a risk model, we are ready to move on to the actual portfolio optimization.
Mean-variance optimization¶
Mean-variance optimization is based on Harry Markowitz’s 1952 classic paper [1], which spearheaded the transformation of portfolio management from an art into a science. The key insight is that by combining assets with different expected returns and volatilities, one can decide on a mathematically optimal allocation.
If \(w\) is the weight vector of stocks with expected returns \(\mu\), then the portfolio return is equal to each stock’s weight multiplied by its return, i.e \(w^T \mu\). The portfolio risk in terms of the covariance matrix \(\Sigma\) is given by \(w^T \Sigma w\). Portfolio optimization can then be regarded as a convex optimization problem, and a solution can be found using quadratic programming. If we denote the target return as \(\mu^*\), the precise statement of the long-only portfolio optimization problem is as follows:
If we vary the target return, we will get a different set of weights (i.e a different portfolio) – the set of all these optimal portfolios is referred to as the efficient frontier.

Each dot on this diagram represents a different possible portfolio, with darker blue corresponding to ‘better’ portfolios (in terms of the Sharpe Ratio). The dotted black line is the efficient frontier itself. The triangular markers represent the best portfolios for different optimization objectives.
The Sharpe ratio is the portfolio’s return in excess of the risk-free rate, per unit risk (volatility).
It is particularly important because it measures the portfolio returns, adjusted for
risk. So in practice, rather than trying to minimise volatility for a given target
return (as per Markowitz 1952), it often makes more sense to just find the portfolio
that maximises the Sharpe ratio. This is implemented as the max_sharpe()
method in the EfficientFrontier
class. Using the series mu
and
dataframe S
from before:
from pypfopt.efficient_frontier import EfficientFrontier
ef = EfficientFrontier(mu, S)
weights = ef.max_sharpe()
If you print these weights, you will get quite an ugly result, because they will
be the raw output from the optimizer. As such, it is recommended that you use
the clean_weights()
method, which truncates tiny weights to zero
and rounds the rest:
cleaned_weights = ef.clean_weights()
ef.save_weights_to_file("weights.txt") # saves to file
print(cleaned_weights)
This prints:
{'GOOG': 0.01269,
'AAPL': 0.09202,
'FB': 0.19856,
'BABA': 0.09642,
'AMZN': 0.07158,
'GE': 0.0,
'AMD': 0.0,
'WMT': 0.0,
'BAC': 0.0,
'GM': 0.0,
'T': 0.0,
'UAA': 0.0,
'SHLD': 0.0,
'XOM': 0.0,
'RRC': 0.0,
'BBY': 0.06129,
'MA': 0.24562,
'PFE': 0.18413,
'JPM': 0.0,
'SBUX': 0.03769}
If we want to know the expected performance of the portfolio with optimal
weights w
, we can use the portfolio_performance()
method:
ef.portfolio_performance(verbose=True)
Expected annual return: 33.0%
Annual volatility: 21.7%
Sharpe Ratio: 1.43
A detailed discussion of optimization parameters is presented in General Efficient Frontier. However, there are two main variations which are discussed below.
Short positions¶
To allow for shorting, simply initialise the EfficientFrontier
object
with bounds that allow negative weights, for example:
ef = EfficientFrontier(mu, S, weight_bounds=(-1,1))
This can be extended to generate market neutral portfolios (with weights
summing to zero), but these are only available for the efficient_risk()
and efficient_return()
optimization methods for mathematical reasons.
If you want a market neutral portfolio, pass market_neutral=True
as shown below:
ef.efficient_return(target_return=0.2, market_neutral=True)
Dealing with many negligible weights¶
From experience, I have found that mean-variance optimization often sets many of the asset weights to be zero. This may not be ideal if you need to have a certain number of positions in your portfolio, for diversification purposes or otherwise.
To combat this, I have introduced an objective function which borrows the idea of
regularisation from machine learning. Essentially, by adding an additional cost
function to the objective, you can ‘encourage’ the optimizer to choose different
weights (mathematical details are provided in the More on L2 Regularisation section).
To use this feature, change the gamma
parameter:
from pypfopt import objective_functions
ef = EfficientFrontier(mu, S)
ef.add_objective(objective_functions.L2_reg, gamma=0.1)
w = ef.max_sharpe()
print(ef.clean_weights())
The result of this has far fewer negligible weights than before:
{'GOOG': 0.06366,
'AAPL': 0.09947,
'FB': 0.15742,
'BABA': 0.08701,
'AMZN': 0.09454,
'GE': 0.0,
'AMD': 0.0,
'WMT': 0.01766,
'BAC': 0.0,
'GM': 0.0,
'T': 0.00398,
'UAA': 0.0,
'SHLD': 0.0,
'XOM': 0.03072,
'RRC': 0.00737,
'BBY': 0.07572,
'MA': 0.1769,
'PFE': 0.12346,
'JPM': 0.0,
'SBUX': 0.06209}
Post-processing weights¶
In practice, we then need to convert these weights into an actual allocation, telling you how many shares of each asset you should purchase. This is discussed further in Post-processing weights, but we provide an example below:
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices
latest_prices = get_latest_prices(df)
da = DiscreteAllocation(w, latest_prices, total_portfolio_value=20000)
allocation, leftover = da.lp_portfolio()
print(allocation)
These are the quantities of shares that should be bought to have a $20,000 portfolio:
{'AAPL': 2.0,
'FB': 12.0,
'BABA': 14.0,
'GE': 18.0,
'WMT': 40.0,
'GM': 58.0,
'T': 97.0,
'SHLD': 1.0,
'XOM': 47.0,
'RRC': 3.0,
'BBY': 1.0,
'PFE': 47.0,
'SBUX': 5.0}
Improving performance¶
Let’s say you have conducted backtests and the results aren’t spectacular. What should you try?
- Try the Hierarchical Risk Parity model (see Other Optimizers) – which seems to robustly outperform mean-variance optimization out of sample.
- Use the Black-Litterman model to construct a more stable model of expected returns.
Alternatively, just drop the expected returns altogether! There is a large body of research
that suggests that minimum variance portfolios (
ef.min_volatility()
) consistently outperform maximum Sharpe ratio portfolios out-of-sample (even when measured by Sharpe ratio), because of the difficulty of forecasting expected returns. - Try different risk models: shrinkage models are known to have better numerical properties compared with the sample covariance matrix.
- Add some new objective terms or constraints. Tune the L2 regularisation parameter to see how diversification affects the performance.
This concludes the guided tour. Head over to the appropriate sections in the sidebar to learn more about the parameters and theoretical details of the different models offered by PyPortfolioOpt. If you have any questions, please raise an issue on GitHub and I will try to respond promptly.
If you’d like even more examples, check out the cookbook recipe.
References¶
[1] | Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), 77–91. https://doi.org/10.1111/j.1540-6261.1952.tb01525.x |
Expected Returns¶
Mean-variance optimization requires knowledge of the expected returns. In practice, these are rather difficult to know with any certainty. Thus the best we can do is to come up with estimates, for example by extrapolating historical data, This is the main flaw in mean-variance optimization – the optimization procedure is sound, and provides strong mathematical guarantees, given the correct inputs. This is one of the reasons why I have emphasised modularity: users should be able to come up with their own superior models and feed them into the optimizer.
Caution
Supplying expected returns can do more harm than good. If predicting stock returns were as easy as calculating the mean historical return, we’d all be rich! For most use-cases, I would suggest that you focus your efforts on choosing an appropriate risk model (see Risk Models).
As of v0.5.0, you can use Black-Litterman Allocation to significantly improve the quality of your estimate of the expected returns.
The expected_returns
module provides functions for estimating the expected returns of
the assets, which is a required input in mean-variance optimization.
By convention, the output of these methods is expected annual returns. It is assumed that
daily prices are provided, though in reality the functions are agnostic
to the time period (just change the frequency
parameter). Asset prices must be given as
a pandas dataframe, as per the format described in the User Guide.
All of the functions process the price data into percentage returns data, before calculating their respective estimates of expected returns.
Currently implemented:
- general return model function, allowing you to run any return model from one function.
- mean historical return
- exponentially weighted mean historical return
- CAPM estimate of returns
Additionally, we provide utility functions to convert from returns to prices and vice-versa.
Note
For any of these methods, if you would prefer to pass returns (the default is prices),
set the boolean flag returns_data=True
-
pypfopt.expected_returns.
mean_historical_return
(prices, returns_data=False, compounding=True, frequency=252)[source]¶ Calculate annualised mean (daily) historical return from input (daily) asset prices. Use
compounding
to toggle between the default geometric mean (CAGR) and the arithmetic mean.Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices. These should not be log returns.
- compounding (bool, defaults to True) – computes geometric mean returns if True, arithmetic otherwise, optional.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised mean (daily) return for each asset
Return type: pd.Series
This is probably the default textbook approach. It is intuitive and easily interpretable, however the estimates are subject to large uncertainty. This is a problem especially in the context of a mean-variance optimizer, which will maximise the erroneous inputs.
-
pypfopt.expected_returns.
ema_historical_return
(prices, returns_data=False, compounding=True, span=500, frequency=252)[source]¶ Calculate the exponentially-weighted mean of (daily) historical returns, giving higher weight to more recent data.
Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices. These should not be log returns.
- compounding (bool, defaults to True) – computes geometric mean returns if True, arithmetic otherwise, optional.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
- span (int, optional) – the time-span for the EMA, defaults to 500-day EMA.
Returns: annualised exponentially-weighted mean (daily) return of each asset
Return type: pd.Series
The exponential moving average is a simple improvement over the mean historical return; it gives more credence to recent returns and thus aims to increase the relevance of the estimates. This is parameterised by the
span
parameter, which gives users the ability to decide exactly how much more weight is given to recent data. Generally, I would err on the side of a higher span – in the limit, this tends towards the mean historical return. However, if you plan on rebalancing much more frequently, there is a case to be made for lowering the span in order to capture recent trends.
-
pypfopt.expected_returns.
capm_return
(prices, market_prices=None, returns_data=False, risk_free_rate=0.02, compounding=True, frequency=252)[source]¶ Compute a return estimate using the Capital Asset Pricing Model. Under the CAPM, asset returns are equal to market returns plus a \(eta\) term encoding the relative risk of the asset.
\[R_i = R_f + \beta_i (E(R_m) - R_f)\]Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- market_prices (pd.DataFrame, optional) – adjusted closing prices of the benchmark, defaults to None
- returns_data (bool, defaults to False.) – if true, the first arguments are returns instead of prices.
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. You should use the appropriate time period, corresponding to the frequency parameter.
- compounding (bool, defaults to True) – computes geometric mean returns if True, arithmetic otherwise, optional.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised return estimate
Return type: pd.Series
-
pypfopt.expected_returns.
returns_from_prices
(prices, log_returns=False)[source]¶ Calculate the returns given prices.
Parameters: - prices (pd.DataFrame) – adjusted (daily) closing prices of the asset, each row is a date and each column is a ticker/id.
- log_returns (bool, defaults to False) – whether to compute using log returns
Returns: (daily) returns
Return type: pd.DataFrame
-
pypfopt.expected_returns.
prices_from_returns
(returns, log_returns=False)[source]¶ Calculate the pseudo-prices given returns. These are not true prices because the initial prices are all set to 1, but it behaves as intended when passed to any PyPortfolioOpt method.
Parameters: - returns (pd.DataFrame) – (daily) percentage returns of the assets
- log_returns (bool, defaults to False) – whether to compute using log returns
Returns: (daily) pseudo-prices.
Return type: pd.DataFrame
Risk Models¶
In addition to the expected returns, mean-variance optimization requires a risk model, some way of quantifying asset risk. The most commonly-used risk model is the covariance matrix, which describes asset volatilities and their co-dependence. This is important because one of the principles of diversification is that risk can be reduced by making many uncorrelated bets (correlation is just normalised covariance).

In many ways, the subject of risk models is far more important than that of expected returns because historical variance is generally a much more persistent statistic than mean historical returns. In fact, research by Kritzman et al. (2010) [1] suggests that minimum variance portfolios, formed by optimising without providing expected returns, actually perform much better out of sample.
The problem, however, is that in practice we do not have access to the covariance
matrix (in the same way that we don’t have access to expected returns) – the only
thing we can do is to make estimates based on past data. The most straightforward
approach is to just calculate the sample covariance matrix based on historical
returns, but relatively recent (post-2000) research indicates that there are much
more robust statistical estimators of the covariance matrix. In addition to
providing a wrapper around the estimators in sklearn
, PyPortfolioOpt
provides some experimental alternatives such as semicovariance and exponentially weighted
covariance.
Attention
Estimation of the covariance matrix is a very deep and actively-researched topic that involves statistics, econometrics, and numerical/computational approaches. PyPortfolioOpt implements several options, but there is a lot of room for more sophistication.
The risk_models
module provides functions for estimating the covariance matrix given
historical returns.
The format of the data input is the same as that in Expected Returns.
Currently implemented:
fix non-positive semidefinite matrices
general risk matrix function, allowing you to run any risk model from one function.
sample covariance
semicovariance
exponentially weighted covariance
minimum covariance determinant
shrunk covariance matrices:
- manual shrinkage
- Ledoit Wolf shrinkage
- Oracle Approximating shrinkage
covariance to correlation matrix
Note
For any of these methods, if you would prefer to pass returns (the default is prices),
set the boolean flag returns_data=True
-
pypfopt.risk_models.
risk_matrix
(prices, method='sample_cov', **kwargs)[source]¶ Compute a covariance matrix, using the risk model supplied in the
method
parameter.Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices.
- method (str, optional) –
the risk model to use. Should be one of:
sample_cov
semicovariance
exp_cov
ledoit_wolf
ledoit_wolf_constant_variance
ledoit_wolf_single_factor
ledoit_wolf_constant_correlation
oracle_approximating
Raises: NotImplementedError – if the supplied method is not recognised
Returns: annualised sample covariance matrix
Return type: pd.DataFrame
-
pypfopt.risk_models.
fix_nonpositive_semidefinite
(matrix, fix_method='spectral')[source]¶ Check if a covariance matrix is positive semidefinite, and if not, fix it with the chosen method.
The
spectral
method sets negative eigenvalues to zero then rebuilds the matrix, while thediag
method adds a small positive value to the diagonal.Parameters: - matrix (pd.DataFrame) – raw covariance matrix (may not be PSD)
- fix_method (str, optional) – {“spectral”, “diag”}, defaults to “spectral”
Raises: NotImplementedError – if a method is passed that isn’t implemented
Returns: positive semidefinite covariance matrix
Return type: pd.DataFrame
Not all the calculated covariance matrices will be positive semidefinite (PSD). This method checks if a matrix is PSD and fixes it if not.
-
pypfopt.risk_models.
sample_cov
(prices, returns_data=False, frequency=252, **kwargs)[source]¶ Calculate the annualised sample covariance matrix of (daily) asset returns.
Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised sample covariance matrix
Return type: pd.DataFrame
This is the textbook default approach. The entries in the sample covariance matrix (which we denote as S) are the sample covariances between the i th and j th asset (the diagonals consist of variances). Although the sample covariance matrix is an unbiased estimator of the covariance matrix, i.e \(E(S) = \Sigma\), in practice it suffers from misspecification error and a lack of robustness. This is particularly problematic in mean-variance optimization, because the optimizer may give extra credence to the erroneous values.
Note
This should not be your default choice! Please use a shrinkage estimator instead.
-
pypfopt.risk_models.
semicovariance
(prices, returns_data=False, benchmark=7.9e-05, frequency=252, **kwargs)[source]¶ Estimate the semicovariance matrix, i.e the covariance given that the returns are less than the benchmark.
Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices.
- benchmark (float) – the benchmark return, defaults to the daily risk-free rate, i.e \(1.02^{(1/252)} -1\).
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number
of trading days in a year). Ensure that you use the appropriate
benchmark, e.g if
frequency=12
use the monthly risk-free rate.
Returns: semicovariance matrix
Return type: pd.DataFrame
The semivariance is the variance of all returns which are below some benchmark B (typically the risk-free rate) – it is a common measure of downside risk. There are multiple possible ways of defining a semicovariance matrix, the main differences lying in the ‘pairwise’ nature, i.e whether we should sum over \(\min(r_i,B)\min(r_j,B)\) or \(\min(r_ir_j, B)\). In this implementation, we have followed the advice of Estrada (2007) [2], preferring:
\[\frac{1}{n}\sum_{i = 1}^n {\sum_{j = 1}^n {\min \left( {{r_i},B} \right)} } \min \left( {{r_j},B} \right)\]
-
pypfopt.risk_models.
exp_cov
(prices, returns_data=False, span=180, frequency=252, **kwargs)[source]¶ Estimate the exponentially-weighted covariance matrix, which gives greater weight to more recent data.
Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices.
- span (int, optional) – the span of the exponential weighting function, defaults to 180
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
Returns: annualised estimate of exponential covariance matrix
Return type: pd.DataFrame
The exponential covariance matrix is a novel way of giving more weight to recent data when calculating covariance, in the same way that the exponential moving average price is often preferred to the simple average price. For a full explanation of how this estimator works, please refer to the blog post on my academic website.
-
pypfopt.risk_models.
cov_to_corr
(cov_matrix)[source]¶ Convert a covariance matrix to a correlation matrix.
Parameters: cov_matrix (pd.DataFrame) – covariance matrix Returns: correlation matrix Return type: pd.DataFrame
-
pypfopt.risk_models.
corr_to_cov
(corr_matrix, stdevs)[source]¶ Convert a correlation matrix to a covariance matrix
Parameters: - corr_matrix (pd.DataFrame) – correlation matrix
- stdevs (array-like) – vector of standard deviations
Returns: covariance matrix
Return type: pd.DataFrame
Shrinkage estimators¶
A great starting point for those interested in understanding shrinkage estimators is Honey, I Shrunk the Sample Covariance Matrix [3] by Ledoit and Wolf, which does a good job at capturing the intuition behind them – we will adopt the notation used therein. I have written a summary of this article, which is available on my website. A more rigorous reference can be found in Ledoit and Wolf (2001) [4].
The essential idea is that the unbiased but often poorly estimated sample covariance can be combined with a structured estimator \(F\), using the below formula (where \(\delta\) is the shrinkage constant):
It is called shrinkage because it can be thought of as “shrinking” the sample covariance matrix towards the other estimator, which is accordingly called the shrinkage target. The shrinkage target may be significantly biased but has little estimation error. There are many possible options for the target, and each one will result in a different optimal shrinkage constant \(\delta\). PyPortfolioOpt offers the following shrinkage methods:
Ledoit-Wolf shrinkage:
constant_variance
shrinkage, i.e the target is the diagonal matrix with the mean of asset variances on the diagonals and zeroes elsewhere. This is the shrinkage offered bysklearn.LedoitWolf
.single_factor
shrinkage. Based on Sharpe’s single-index model which effectively uses a stock’s beta to the market as a risk model. See Ledoit and Wolf 2001 [4].constant_correlation
shrinkage, in which all pairwise correlations are set to the average correlation (sample variances are unchanged). See Ledoit and Wolf 2003 [3]
Oracle approximating shrinkage (OAS), invented by Chen et al. (2010) [5], which has a lower mean-squared error than Ledoit-Wolf shrinkage when samples are Gaussian or near-Gaussian.
Tip
For most use cases, I would just go with Ledoit Wolf shrinkage, as recommended by Quantopian in their lecture series on quantitative finance.
My implementations have been translated from the Matlab code on Michael Wolf’s webpage, with the help of xtuanta.
-
class
pypfopt.risk_models.
CovarianceShrinkage
(prices, returns_data=False, frequency=252)[source]¶ Provide methods for computing shrinkage estimates of the covariance matrix, using the sample covariance matrix and choosing the structured estimator to be an identity matrix multiplied by the average sample variance. The shrinkage constant can be input manually, though there exist methods (notably Ledoit Wolf) to estimate the optimal value.
Instance variables:
X
- pd.DataFrame (returns)S
- np.ndarray (sample covariance matrix)delta
- float (shrinkage constant)frequency
- int
-
__init__
(prices, returns_data=False, frequency=252)[source]¶ Parameters: - prices (pd.DataFrame) – adjusted closing prices of the asset, each row is a date and each column is a ticker/id.
- returns_data (bool, defaults to False.) – if true, the first argument is returns instead of prices.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
-
ledoit_wolf
(shrinkage_target='constant_variance')[source]¶ Calculate the Ledoit-Wolf shrinkage estimate for a particular shrinkage target.
Parameters: shrinkage_target (str, optional) – choice of shrinkage target, either constant_variance
,single_factor
orconstant_correlation
. Defaults toconstant_variance
.Raises: NotImplementedError – if the shrinkage_target is unrecognised Returns: shrunk sample covariance matrix Return type: np.ndarray
-
oracle_approximating
()[source]¶ Calculate the Oracle Approximating Shrinkage estimate
Returns: shrunk sample covariance matrix Return type: np.ndarray
-
shrunk_covariance
(delta=0.2)[source]¶ Shrink a sample covariance matrix to the identity matrix (scaled by the average sample variance). This method does not estimate an optimal shrinkage parameter, it requires manual input.
Parameters: delta (float, optional) – shrinkage parameter, defaults to 0.2. Returns: shrunk sample covariance matrix Return type: np.ndarray
References¶
[1] | Kritzman, Page & Turkington (2010) In defense of optimization: The fallacy of 1/N. Financial Analysts Journal, 66(2), 31-39. |
[2] | Estrada (2006), Mean-Semivariance Optimization: A Heuristic Approach |
[3] | (1, 2) Ledoit, O., & Wolf, M. (2003). Honey, I Shrunk the Sample Covariance Matrix The Journal of Portfolio Management, 30(4), 110–119. https://doi.org/10.3905/jpm.2004.110 |
[4] | (1, 2) Ledoit, O., & Wolf, M. (2001). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection, 10, 603–621. |
[5] | Chen et al. (2010), Shrinkage Algorithms for MMSE Covariance Estimation, IEEE Transactions on Signals Processing, 58(10), 5016-5029. |
Mean-Variance Optimization¶
Mathematical optimization is a very difficult problem in general, particularly when we are dealing with complex objectives and constraints. However, convex optimization problems are a well-understood class of problems, which happen to be incredibly useful for finance. A convex problem has the following form:
where \(\mathbf{x} \in \mathbb{R}^n\), and \(f(\mathbf{x}), g_i(\mathbf{x})\) are convex functions. [1]
Fortunately, portfolio optimization problems (with standard objectives and constraints) are convex. This allows us to immediately apply the vast body of theory as well as the refined solving routines – accordingly, the main difficulty is inputting our specific problem into a solver.
PyPortfolioOpt aims to do the hard work for you, allowing for one-liners like ef.min_volatility()
to generate a portfolio that minimises the volatility, while at the same time allowing for more
complex problems to be built up from modular units. This is all possible thanks to
cvxpy, the fantastic python-embedded modelling
language for convex optimization upon which PyPortfolioOpt’s efficient frontier functionality lies.
Tip
You can find complete examples in the relevant cookbook recipe.
Structure¶
As shown in the definition of a convex problem, there are essentially two things we need to specify: the optimization objective, and the optimization constraints. For example, the classic portfolio optimization problem is to minimise risk subject to a return constraint (i.e the portfolio must return more than a certain amount). From an implementation perspective, however, there is not much difference between an objective and a constraint. Consider a similar problem, which is to maximize return subject to a risk constraint – now, the role of risk and return have swapped.
To that end, PyPortfolioOpt defines an objective_functions
module that contains objective functions
(which can also act as constraints, as we have just seen). The actual optimization occurs in the efficient_frontier.EfficientFrontier
class.
This class provides straightforward methods for optimising different objectives (all documented below).
However, PyPortfolioOpt was designed so that you can easily add new constraints or objective terms to an existing problem. For example, adding a regularisation objective (explained below) to a minimum volatility objective is as simple as:
ef = EfficientFrontier(expected_returns, cov_matrix) # setup
ef.add_objective(objective_functions.L2_reg) # add a secondary objective
ef.min_volatility() # find the portfolio that minimises volatility and L2_reg
Tip
If you would like to plot the efficient frontier, take a look at the Plotting module.
Basic Usage¶
The efficient_frontier
module houses the EfficientFrontier class and its descendants,
which generate optimal portfolios for various possible objective functions and parameters.
-
class
pypfopt.efficient_frontier.
EfficientFrontier
(expected_returns, cov_matrix, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)[source]¶ An EfficientFrontier object (inheriting from BaseConvexOptimizer) contains multiple optimization methods that can be called (corresponding to different objective functions) with various parameters. Note: a new EfficientFrontier object should be instantiated if you want to make any change to objectives/constraints/bounds/parameters.
Instance variables:
Inputs:
n_assets
- inttickers
- str listbounds
- float tuple OR (float tuple) listcov_matrix
- np.ndarrayexpected_returns
- np.ndarraysolver
- strsolver_options
- {str: str} dict
Output:
weights
- np.ndarray
Public methods:
min_volatility()
optimizes for minimum volatilitymax_sharpe()
optimizes for maximal Sharpe ratio (a.k.a the tangency portfolio)max_quadratic_utility()
maximises the quadratic utility, given some risk aversion.efficient_risk()
maximises return for a given target riskefficient_return()
minimises risk for a given target returnadd_objective()
adds a (convex) objective to the optimization problemadd_constraint()
adds a constraint to the optimization problemconvex_objective()
solves for a generic convex objective with linear constraintsportfolio_performance()
calculates the expected return, volatility and Sharpe ratio for the optimized portfolio.set_weights()
creates self.weights (np.ndarray) from a weights dictclean_weights()
rounds the weights and clips near-zeros.save_weights_to_file()
saves the weights to csv, json, or txt.
-
__init__
(expected_returns, cov_matrix, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)[source]¶ Parameters: - expected_returns (pd.Series, list, np.ndarray) – expected returns for each asset. Can be None if optimising for volatility only (but not recommended).
- cov_matrix (pd.DataFrame or np.array) – covariance of returns for each asset. This must be positive semidefinite, otherwise optimization will fail.
- weight_bounds (tuple OR tuple list, optional) – minimum and maximum weight of each asset OR single min/max pair if all identical, defaults to (0, 1). Must be changed to (-1, 1) for portfolios with shorting.
- solver (str) – name of solver. list available solvers with: cvxpy.installed_solvers()
- verbose (bool, optional) – whether performance and debugging info should be printed, defaults to False
- solver_options (dict, optional) – parameters for the given solver
Raises: - TypeError – if
expected_returns
is not a series, list or array - TypeError – if
cov_matrix
is not a dataframe or array
Note
As of v0.5.0, you can pass a collection (list or tuple) of (min, max) pairs representing different bounds for different assets.
Tip
If you want to generate short-only portfolios, there is a quick hack. Multiply your expected returns by -1, then optimize a long-only portfolio.
-
min_volatility
()[source]¶ Minimise volatility.
Returns: asset weights for the volatility-minimising portfolio Return type: OrderedDict
-
max_sharpe
(risk_free_rate=0.02)[source]¶ Maximise the Sharpe Ratio. The result is also referred to as the tangency portfolio, as it is the portfolio for which the capital market line is tangent to the efficient frontier.
This is a convex optimization problem after making a certain variable substitution. See Cornuejols and Tutuncu (2006) for more.
Parameters: risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns. Raises: ValueError – if risk_free_rate
is non-numericReturns: asset weights for the Sharpe-maximising portfolio Return type: OrderedDict Caution
Because
max_sharpe()
makes a variable substitution, additional objectives may not work as intended.
-
max_quadratic_utility
(risk_aversion=1, market_neutral=False)[source]¶ Maximise the given quadratic utility, i.e:
\[\max_w w^T \mu - \frac \delta 2 w^T \Sigma w\]Parameters: - risk_aversion (positive float) – risk aversion parameter (must be greater than 0), defaults to 1
- market_neutral – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.
- market_neutral – bool, optional
Returns: asset weights for the maximum-utility portfolio
Return type: OrderedDict
Note
pypfopt.black_litterman
provides a method for calculating the market-implied risk-aversion parameter, which gives a useful estimate in the absence of other information!
-
efficient_risk
(target_volatility, market_neutral=False)[source]¶ Maximise return for a target risk. The resulting portfolio will have a volatility less than the target (but not guaranteed to be equal).
Parameters: - target_volatility (float) – the desired maximum volatility of the resulting portfolio.
- market_neutral – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.
- market_neutral – bool, optional
Raises: - ValueError – if
target_volatility
is not a positive float - ValueError – if no portfolio can be found with volatility equal to
target_volatility
- ValueError – if
risk_free_rate
is non-numeric
Returns: asset weights for the efficient risk portfolio
Return type: OrderedDict
Caution
If you pass an unreasonable target into
efficient_risk()
orefficient_return()
, the optimizer will fail silently and return weird weights. Caveat emptor applies!
-
efficient_return
(target_return, market_neutral=False)[source]¶ Calculate the ‘Markowitz portfolio’, minimising volatility for a given target return.
Parameters: - target_return (float) – the desired return of the resulting portfolio.
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.
Raises: - ValueError – if
target_return
is not a positive float - ValueError – if no portfolio can be found with return equal to
target_return
Returns: asset weights for the Markowitz portfolio
Return type: OrderedDict
-
portfolio_performance
(verbose=False, risk_free_rate=0.02)[source]¶ After optimising, calculate (and optionally print) the performance of the optimal portfolio. Currently calculates expected return, volatility, and the Sharpe ratio.
Parameters: - verbose (bool, optional) – whether performance should be printed, defaults to False
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.
Raises: ValueError – if weights have not been calcualted yet
Returns: expected return, volatility, Sharpe ratio.
Return type: (float, float, float)
Tip
If you would like to use the
portfolio_performance
function independently of any optimizer (e.g for debugging purposes), you can use:from pypfopt import base_optimizer base_optimizer.portfolio_performance( weights, expected_returns, cov_matrix, verbose=True, risk_free_rate=0.02 )
Note
PyPortfolioOpt defers to cvxpy’s default choice of solver. If you would like to explicitly
choose the solver, simply pass the optional solver = "ECOS"
kwarg to the constructor.
You can choose from any of the supported solvers,
and pass in solver params via solver_options
(a dict
).
Adding objectives and constraints¶
EfficientFrontier inherits from the BaseConvexOptimizer class. In particular, the functions to add constraints and objectives are documented below:
-
class
pypfopt.base_optimizer.
BaseConvexOptimizer
-
BaseConvexOptimizer.
add_constraint
(new_constraint)¶ Add a new constraint to the optimization problem. This constraint must satisfy DCP rules, i.e be either a linear equality constraint or convex inequality constraint.
Examples:
ef.add_constraint(lambda x : x[0] == 0.02) ef.add_constraint(lambda x : x >= 0.01) ef.add_constraint(lambda x: x <= np.array([0.01, 0.08, ..., 0.5]))
Parameters: new_constraint – the constraint to be added
-
BaseConvexOptimizer.
add_sector_constraints
(sector_mapper, sector_lower, sector_upper)¶ Adds constraints on the sum of weights of different groups of assets. Most commonly, these will be sector constraints e.g portfolio’s exposure to tech must be less than x%:
sector_mapper = { "GOOG": "tech", "FB": "tech",, "XOM": "Oil/Gas", "RRC": "Oil/Gas", "MA": "Financials", "JPM": "Financials", } sector_lower = {"tech": 0.1} # at least 10% to tech sector_upper = { "tech": 0.4, # less than 40% tech "Oil/Gas": 0.1 # less than 10% oil and gas }
Parameters: - sector_mapper ({str: str} dict) – dict that maps tickers to sectors
- sector_lower ({str: float} dict) – lower bounds for each sector
- sector_upper ({str:float} dict) – upper bounds for each sector
-
BaseConvexOptimizer.
add_objective
(new_objective, **kwargs)¶ Add a new term into the objective function. This term must be convex, and built from cvxpy atomic functions.
Example:
def L1_norm(w, k=1): return k * cp.norm(w, 1) ef.add_objective(L1_norm, k=2)
Parameters: new_objective (cp.Expression (i.e function of cp.Variable)) – the objective to be added
-
Objective functions¶
The objective_functions
module provides optimization objectives, including the actual
objective functions called by the EfficientFrontier
object’s optimization methods.
These methods are primarily designed for internal use during optimization and each requires
a different signature (which is why they have not been factored into a class).
For obvious reasons, any objective function must accept weights
as an argument, and must also have at least one of expected_returns
or cov_matrix
.
The objective functions either compute the objective given a numpy array of weights, or they
return a cvxpy expression when weights are a cp.Variable
. In this way, the same objective
function can be used both internally for optimization and externally for computing the objective
given weights. _objective_value()
automatically chooses between the two behaviours.
objective_functions
defaults to objectives for minimisation. In the cases of objectives
that clearly should be maximised (e.g Sharpe Ratio, portfolio return), the objective function
actually returns the negative quantity, since minimising the negative is equivalent to maximising
the positive. This behaviour is controlled by the negative=True
optional argument.
Currently implemented:
- Portfolio variance (i.e square of volatility)
- Portfolio return
- Sharpe ratio
- L2 regularisation (minimising this reduces nonzero weights)
- Quadratic utility
- Transaction cost model (a simple one)
- Ex-ante (squared) tracking error
- Ex-post (squared) tracking error
-
pypfopt.objective_functions.
L2_reg
(w, gamma=1)[source]¶ L2 regularisation, i.e \(\gamma ||w||^2\), to increase the number of nonzero weights.
Example:
ef = EfficientFrontier(mu, S) ef.add_objective(objective_functions.L2_reg, gamma=2) ef.min_volatility()
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- gamma (float, optional) – L2 regularisation parameter, defaults to 1. Increase if you want more non-negligible weights
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
-
pypfopt.objective_functions.
ex_ante_tracking_error
(w, cov_matrix, benchmark_weights)[source]¶ Calculate the (square of) the ex-ante Tracking Error, i.e \((w - w_b)^T \Sigma (w-w_b)\).
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- cov_matrix (np.ndarray) – covariance matrix
- benchmark_weights (np.ndarray) – asset weights in the benchmark
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
-
pypfopt.objective_functions.
ex_post_tracking_error
(w, historic_returns, benchmark_returns)[source]¶ Calculate the (square of) the ex-post Tracking Error, i.e \(Var(r - r_b)\).
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- historic_returns (np.ndarray) – historic asset returns
- benchmark_returns (pd.Series or np.ndarray) – historic benchmark returns
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
-
pypfopt.objective_functions.
portfolio_return
(w, expected_returns, negative=True)[source]¶ Calculate the (negative) mean return of a portfolio
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- expected_returns (np.ndarray) – expected return of each asset
- negative (boolean) – whether quantity should be made negative (so we can minimise)
Returns: negative mean return
Return type: float
-
pypfopt.objective_functions.
portfolio_variance
(w, cov_matrix)[source]¶ Calculate the total portfolio variance (i.e square volatility).
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- cov_matrix (np.ndarray) – covariance matrix
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
-
pypfopt.objective_functions.
quadratic_utility
(w, expected_returns, cov_matrix, risk_aversion, negative=True)[source]¶ Quadratic utility function, i.e \(\mu - \frac 1 2 \delta w^T \Sigma w\).
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- expected_returns (np.ndarray) – expected return of each asset
- cov_matrix (np.ndarray) – covariance matrix
- risk_aversion (float) – risk aversion coefficient. Increase to reduce risk.
- negative (boolean) – whether quantity should be made negative (so we can minimise).
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
-
pypfopt.objective_functions.
sharpe_ratio
(w, expected_returns, cov_matrix, risk_free_rate=0.02, negative=True)[source]¶ Calculate the (negative) Sharpe ratio of a portfolio
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- expected_returns (np.ndarray) – expected return of each asset
- cov_matrix (np.ndarray) – covariance matrix
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.
- negative (boolean) – whether quantity should be made negative (so we can minimise)
Returns: (negative) Sharpe ratio
Return type: float
-
pypfopt.objective_functions.
transaction_cost
(w, w_prev, k=0.001)[source]¶ A very simple transaction cost model: sum all the weight changes and multiply by a given fraction (default to 10bps). This simulates a fixed percentage commission from your broker.
Parameters: - w (np.ndarray OR cp.Variable) – asset weights in the portfolio
- w_prev (np.ndarray) – previous weights
- k (float) – fractional cost per unit weight exchanged
Returns: value of the objective function OR objective function expression
Return type: float OR cp.Expression
More on L2 Regularisation¶
As has been discussed in the User Guide, mean-variance optimization often results in many weights being negligible, i.e the efficient portfolio does not end up including most of the assets. This is expected behaviour, but it may be undesirable if you need a certain number of assets in your portfolio.
In order to coerce the mean-variance optimizer to produce more non-negligible
weights, we add what can be thought of as a “small weights penalty” to all
of the objective functions, parameterised by \(\gamma\) (gamma
). Considering
the minimum variance objective for instance, we have:
Note that \(w^T w\) is the same as the sum of squared weights (I didn’t write this explicitly to reduce confusion caused by \(\Sigma\) denoting both the covariance matrix and the summation operator). This term reduces the number of negligible weights, because it has a minimum value when all weights are equally distributed, and maximum value in the limiting case where the entire portfolio is allocated to one asset. I refer to it as L2 regularisation because it has exactly the same form as the L2 regularisation term in machine learning, though a slightly different purpose (in ML it is used to keep weights small while here it is used to make them larger).
Note
In practice, \(\gamma\) must be tuned to achieve the level
of regularisation that you want. However, if the universe of assets is small
(less than 20 assets), then gamma=1
is a good starting point. For larger
universes, or if you want more non-negligible weights in the final portfolio,
increase gamma
.
References¶
[1] | Boyd, S.; Vandenberghe, L. (2004). Convex Optimization. |
General Efficient Frontier¶
The mean-variance optimization methods described previously can be used whenever you have a vector of expected returns and a covariance matrix. The objective and constraints will be some combination of the portfolio return and portfolio volatility.
However, you may want to construct the efficient frontier for an entirely different type of risk model (one that doesn’t depend on covariance matrices), or optimize an objective unrelated to portfolio return (e.g tracking error). PyPortfolioOpt comes with several popular alternatives and provides support for custom optimization problems.
Efficient Semivariance¶
Instead of penalising volatility, mean-semivariance optimization seeks to only penalise downside volatility, since upside volatility may be desirable.
There are two approaches to the mean-semivariance optimization problem. The first is to use a
heuristic (i.e “quick and dirty”) solution: pretending that the semicovariance matrix
(implemented in risk_models
) is a typical covariance matrix and doing standard
mean-variance optimization. It can be shown that this does not yield a portfolio that
is efficient in mean-semivariance space (though it might be a good-enough approximation).
Fortunately, it is possible to write mean-semivariance optimization as a convex problem (albeit one with many variables), that can be solved to give an “exact” solution. For example, to maximise return for a target semivariance \(s^*\) (long-only), we would solve the following problem:
Here, B is the \(T \times N\) (scaled) matrix of excess returns:
B = (returns - benchmark) / sqrt(T)
. Additional linear equality constraints and
convex inequality constraints can be added.
PyPortfolioOpt allows users to optimize along the efficient semivariance frontier
via the EfficientSemivariance
class. EfficientSemivariance
inherits from
EfficientFrontier
, so it has the same utility methods
(e.g add_constraint()
, portfolio_performance()
), but finds portfolios on the mean-semivariance
frontier. Note that some of the parent methods, like max_sharpe()
and min_volatility()
are not applicable to mean-semivariance portfolios, so calling them returns NotImplementedError
.
EfficientSemivariance
has a slightly different API to EfficientFrontier
. Instead of passing
in a covariance matrix, you should past in a dataframe of historical/simulated returns (this can be constructed
from your price dataframe using the helper method expected_returns.returns_from_prices()
). Here
is a full example, in which we seek the portfolio that minimises the semivariance for a target
annual return of 20%:
from pypfopt import expected_returns, EfficientSemivariance
df = ... # your dataframe of prices
mu = expected_returns.mean_historical_returns(df)
historical_returns = expected_returns.returns_from_prices(df)
es = EfficientSemivariance(mu, historical_returns)
es.efficient_return(0.20)
# We can use the same helper methods as before
weights = es.clean_weights()
print(weights)
es.portfolio_performance(verbose=True)
The portfolio_performance
method outputs the expected portfolio return, semivariance,
and the Sortino ratio (like the Sharpe ratio, but for downside deviation).
Interested readers should refer to Estrada (2007) [1] for more details. I’d like to thank Philipp Schiele for authoring the bulk of the efficient semivariance functionality and documentation (all errors are my own). The implementation is based on Markowitz et al (2019) [2].
Caution
Finding portfolios on the mean-semivariance frontier is computationally harder
than standard mean-variance optimization: our implementation uses 2T + N
optimization variables,
meaning that for 50 assets and 3 years of data, there are about 1500 variables.
While EfficientSemivariance
allows for additional constraints/objectives in principle,
you are much more likely to run into solver errors. I suggest that you keep EfficientSemivariance
problems small and minimally constrained.
class pypfopt.efficient_frontier.EfficientSemivariance(expected_returns, returns, frequency=252, benchmark=0, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)

EfficientSemivariance objects allow for optimization along the mean-semivariance frontier. This may be relevant for users who are more concerned about downside deviation.

Instance variables:

Inputs:
- n_assets - int
- tickers - str list
- bounds - float tuple OR (float tuple) list
- returns - pd.DataFrame
- expected_returns - np.ndarray
- solver - str
- solver_options - {str: str} dict

Output:
- weights - np.ndarray

Public methods:
- min_semivariance() minimises the portfolio semivariance (downside deviation)
- max_quadratic_utility() maximises the “downside quadratic utility”, given some risk aversion.
- efficient_risk() maximises return for a given target semideviation
- efficient_return() minimises semideviation for a given target return
- add_objective() adds a (convex) objective to the optimization problem
- add_constraint() adds a constraint to the optimization problem
- convex_objective() solves for a generic convex objective with linear constraints
- portfolio_performance() calculates the expected return, semideviation and Sortino ratio for the optimized portfolio.
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
efficient_return(target_return, market_neutral=False)

Minimise semideviation for a given target return.

Parameters:
- target_return (float) – the desired return of the resulting portfolio.
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Raises:
- ValueError – if target_return is not a positive float
- ValueError – if no portfolio can be found with return equal to target_return

Returns: asset weights for the optimal portfolio
Return type: OrderedDict

efficient_risk(target_semideviation, market_neutral=False)

Maximise return for a target semideviation (downside standard deviation). The resulting portfolio will have a semideviation less than the target (but not guaranteed to be equal).

Parameters:
- target_semideviation (float) – the desired maximum semideviation of the resulting portfolio.
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Returns: asset weights for the efficient risk portfolio
Return type: OrderedDict

max_quadratic_utility(risk_aversion=1, market_neutral=False)

Maximise the given quadratic utility, using portfolio semivariance instead of variance.

Parameters:
- risk_aversion (positive float) – risk aversion parameter (must be greater than 0), defaults to 1
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Returns: asset weights for the maximum-utility portfolio
Return type: OrderedDict

min_semivariance(market_neutral=False)

Minimise portfolio semivariance (see docs for further explanation).

Parameters:
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Returns: asset weights for the semivariance-minimising portfolio
Return type: OrderedDict

portfolio_performance(verbose=False, risk_free_rate=0.02)

After optimising, calculate (and optionally print) the performance of the optimal portfolio, specifically: expected return, semideviation, Sortino ratio.

Parameters:
- verbose (bool, optional) – whether performance should be printed, defaults to False
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.

Raises: ValueError – if weights have not been calculated yet
Returns: expected return, semideviation, Sortino ratio.
Return type: (float, float, float)
Efficient CVaR¶
The conditional value-at-risk (a.k.a expected shortfall) is a popular measure of tail risk. The CVaR can be thought of as the average of losses that occur on “very bad days”, where “very bad” is quantified by the parameter \(\beta\).
For example, if we calculate the CVaR to be 10% for \(\beta = 0.95\), we can be 95% confident that the worst-case average daily loss will be 10%. Put differently, the CVaR is the average of all losses so severe that they only occur a fraction \((1-\beta)\) of the time.
While CVaR is quite an intuitive concept, a lot of new notation is required to formulate it mathematically (see the wiki page for more details). We will adopt the following notation:
- w for the vector of portfolio weights
- r for a vector of asset returns (daily), with probability distribution \(p(r)\).
- \(L(w, r) = - w^T r\) for the loss of the portfolio
- \(\alpha\) for the portfolio value-at-risk (VaR) with confidence \(\beta\).
The CVaR can then be written as:

\[CVaR_\beta(w) = \frac{1}{1-\beta} \int_{L(w, r) \geq \alpha(w)} L(w, r) \, p(r) \, dr\]

This is a nasty expression to optimize because we are essentially integrating over VaR values. The key insight of Rockafellar and Uryasev (2001) [3] is that we can equivalently optimize the following convex function:

\[F_\beta(w, \alpha) = \alpha + \frac{1}{1-\beta} \int [-w^T r - \alpha]^+ p(r) \, dr,\]

where \([x]^+ = \max(x, 0)\). The authors prove that minimising \(F_\beta(w, \alpha)\) over all \(w, \alpha\) minimises the CVaR. Suppose we have a sample of T daily returns (these can either be historical or simulated). The integral in the expression becomes a sum, so the CVaR optimization problem reduces to a linear program:

\[\begin{aligned}
& \underset{w, \alpha, u}{\text{minimise}} & & \alpha + \frac{1}{(1-\beta)T} \sum_{i=1}^{T} u_i \\
& \text{subject to} & & u_i \geq -w^T r_i - \alpha \\
&&& u_i \geq 0
\end{aligned}\]
This formulation introduces a new variable for each datapoint (similar to Efficient Semivariance), so you may run into performance issues for long returns dataframes. At the same time, you should aim to provide a sample of data that is large enough to include tail events.
I am grateful to Nicolas Knudde for the initial draft (all errors are my own). The implementation is based on Rockafellar and Uryasev (2001) [3].
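The usage mirrors EfficientSemivariance. As a minimal sketch (again assuming df is a dataframe of prices, as in the earlier examples):

from pypfopt import expected_returns, EfficientCVaR

df = ...  # your dataframe of prices
mu = expected_returns.mean_historical_return(df)
historical_returns = expected_returns.returns_from_prices(df)

ec = EfficientCVaR(mu, historical_returns, beta=0.95)
ec.efficient_return(0.20)  # minimise CVaR, subject to a 20% target return
weights = ec.clean_weights()
ec.portfolio_performance(verbose=True)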
class pypfopt.efficient_frontier.EfficientCVaR(expected_returns, returns, beta=0.95, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)

The EfficientCVaR class allows for optimization along the mean-CVaR frontier, using the formulation of Rockafellar and Uryasev (2001).

Instance variables:

Inputs:
- n_assets - int
- tickers - str list
- bounds - float tuple OR (float tuple) list
- returns - pd.DataFrame
- expected_returns - np.ndarray
- solver - str
- solver_options - {str: str} dict

Output:
- weights - np.ndarray

Public methods:
- min_cvar() minimises the CVaR
- efficient_risk() maximises return for a given CVaR
- efficient_return() minimises CVaR for a given target return
- add_objective() adds a (convex) objective to the optimization problem
- add_constraint() adds a constraint to the optimization problem
- portfolio_performance() calculates the expected return and CVaR of the portfolio
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
efficient_return(target_return, market_neutral=False)

Minimise CVaR for a given target return.

Parameters:
- target_return (float) – the desired return of the resulting portfolio.
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Raises:
- ValueError – if target_return is not a positive float
- ValueError – if no portfolio can be found with return equal to target_return

Returns: asset weights for the optimal portfolio
Return type: OrderedDict

efficient_risk(target_cvar, market_neutral=False)

Maximise return for a target CVaR. The resulting portfolio will have a CVaR less than the target (but not guaranteed to be equal).

Parameters:
- target_cvar (float) – the desired maximum CVaR of the resulting portfolio.
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Returns: asset weights for the efficient risk portfolio
Return type: OrderedDict

min_cvar(market_neutral=False)

Minimise portfolio CVaR (see docs for further explanation).

Parameters:
- market_neutral (bool, optional) – whether the portfolio should be market neutral (weights sum to zero), defaults to False. Requires negative lower weight bound.

Returns: asset weights for the CVaR-minimising portfolio
Return type: OrderedDict

portfolio_performance(verbose=False)

After optimising, calculate (and optionally print) the performance of the optimal portfolio, specifically: expected return, CVaR.

Parameters: verbose (bool, optional) – whether performance should be printed, defaults to False
Raises: ValueError – if weights have not been calculated yet
Returns: expected return, CVaR.
Return type: (float, float)
Custom optimization problems¶
We have seen previously that it is easy to add constraints to EfficientFrontier
objects (and
by extension, other general efficient frontier objects like EfficientSemivariance
). However, what if you aren’t interested
in anything related to max_sharpe()
, min_volatility()
, efficient_risk()
etc and want to
set up a completely new problem to optimize for some custom objective?
For example, perhaps our objective is to construct a basket of assets that best replicates a particular index, in other words, to minimise the tracking error. This does not fit within a mean-variance optimization paradigm, but we can still implement it in PyPortfolioOpt:
from pypfopt.base_optimizer import BaseConvexOptimizer
from pypfopt.objective_functions import ex_post_tracking_error

historic_rets = ...  # dataframe of historic asset returns
benchmark_rets = ...  # pd.Series of historic benchmark returns (same index as historic_rets)

opt = BaseConvexOptimizer(
    n_assets=len(historic_rets.columns),
    tickers=historic_rets.columns,
    weight_bounds=(0, 1)
)
opt.convex_objective(
    ex_post_tracking_error,
    historic_returns=historic_rets,
    benchmark_returns=benchmark_rets,
)
weights = opt.clean_weights()
The EfficientFrontier
class inherits from BaseConvexOptimizer
. It may be more convenient
to call convex_objective
from an EfficientFrontier
instance than from BaseConvexOptimizer
,
particularly if your objective depends on the mean returns or covariance matrix.
You can either optimize some generic convex_objective
(which must be built using cvxpy
atomic functions – see here)
or a nonconvex_objective
, which uses scipy.optimize
as the backend and thus has a completely
different API. For more examples, check out this cookbook recipe.
class pypfopt.base_optimizer.BaseConvexOptimizer

BaseConvexOptimizer.convex_objective(custom_objective, weights_sum_to_one=True, **kwargs)

Optimize a custom convex objective function. Constraints should be added with ef.add_constraint(). Optimizer arguments must be passed as keyword-args. Example:

# Could define as a lambda function instead
def logarithmic_barrier(w, cov_matrix, k=0.1):
    # 60 Years of Portfolio Optimization, Kolm et al (2014)
    return cp.quad_form(w, cov_matrix) - k * cp.sum(cp.log(w))

w = ef.convex_objective(logarithmic_barrier, cov_matrix=ef.cov_matrix)

Parameters:
- custom_objective (function with signature (cp.Variable, **kwargs) -> cp.Expression) – an objective function to be MINIMISED. This should be written using cvxpy atoms. Should map (w, **kwargs) -> float.
- weights_sum_to_one (bool, optional) – whether to add the default objective, defaults to True

Raises: OptimizationError – if the objective is nonconvex or constraints nonlinear.
Returns: asset weights for the efficient risk portfolio
Return type: OrderedDict
BaseConvexOptimizer.nonconvex_objective(custom_objective, objective_args=None, weights_sum_to_one=True, constraints=None, solver='SLSQP', initial_guess=None)

Optimize some objective function using the scipy backend. This can support nonconvex objectives and nonlinear constraints, but may get stuck at local minima. Example:

# Market-neutral efficient risk
constraints = [
    {"type": "eq", "fun": lambda w: np.sum(w)},  # weights sum to zero
    {
        "type": "eq",
        "fun": lambda w: target_risk ** 2 - np.dot(w.T, np.dot(ef.cov_matrix, w)),
    },  # risk = target_risk
]
ef.nonconvex_objective(
    lambda w, mu: -w.T.dot(mu),  # min negative return (i.e maximise return)
    objective_args=(ef.expected_returns,),
    weights_sum_to_one=False,
    constraints=constraints,
)

Parameters:
- objective_function (function with signature (np.ndarray, args) -> float) – an objective function to be MINIMISED. This function should map (weight, args) -> cost
- objective_args (tuple of np.ndarrays) – arguments for the objective function (excluding weight)
- weights_sum_to_one (bool, optional) – whether to add the default objective, defaults to True
- constraints (dict list) – list of constraints in the scipy format (i.e dicts)
- solver (string) – which SCIPY solver to use, e.g “SLSQP”, “COBYLA”, “BFGS”. User beware: different optimizers require different inputs.
- initial_guess (np.ndarray) – the initial guess for the weights, shape (n,) or (n, 1)

Returns: asset weights that optimize the custom objective
Return type: OrderedDict
References¶
[1] Estrada, J. (2007). Mean-Semivariance Optimization: A Heuristic Approach.
[2] Markowitz, H.; Starer, D.; Fram, H.; Gerber, S. (2019). Avoiding the Downside.
[3] (1, 2) Rockafellar, R.; Uryasev, S. (2001). Optimization of Conditional Value-at-Risk.
Black-Litterman Allocation¶
The Black-Litterman (BL) model [1] takes a Bayesian approach to asset allocation. Specifically, it combines a prior estimate of returns (for example, the market-implied returns) with views on certain assets, to produce a posterior estimate of expected returns. The advantages of this are:
- You can provide views on only a subset of assets and BL will meaningfully propagate it, taking into account the covariance with other assets.
- You can provide confidence in your views.
- Using Black-Litterman posterior returns results in much more stable portfolios than using mean-historical return.
Essentially, Black-Litterman treats the vector of expected returns itself as a quantity to be estimated. The Black-Litterman formula is given below:

\[E(R) = [(\tau \Sigma)^{-1} + P^T \Omega^{-1} P]^{-1} [(\tau \Sigma)^{-1} \Pi + P^T \Omega^{-1} Q]\]
- \(E(R)\) is an Nx1 vector of expected returns, where N is the number of assets.
- \(Q\) is a Kx1 vector of views.
- \(P\) is the KxN picking matrix which maps views to the universe of assets. Essentially, it tells the model which view corresponds to which asset(s).
- \(\Omega\) is the KxK uncertainty matrix of views.
- \(\Pi\) is the Nx1 vector of prior expected returns.
- \(\Sigma\) is the NxN covariance matrix of asset returns (as always).
- \(\tau\) is a scalar tuning constant.
Though the formula appears to be quite unwieldy, it turns out that the formula simply represents a weighted average between the prior estimate of returns and the views, where the weighting is determined by the confidence in the views and the parameter \(\tau\).
Similarly, we can calculate a posterior estimate of the covariance matrix:

\[\hat{\Sigma} = \Sigma + [(\tau \Sigma)^{-1} + P^T \Omega^{-1} P]^{-1}\]
Though the algorithm is relatively simple, BL proved to be a challenge from a software engineering perspective because it’s not quite clear how best to fit it into PyPortfolioOpt’s API. The full discussion can be found on a GitHub issue thread, but I ultimately decided that though BL is not technically an optimizer, it didn’t make sense to split up its methods into expected_returns or risk_models. I have thus made it an independent module and, owing to the comparatively extensive theory, have given it a dedicated documentation page. I’d like to thank Felipe Schneider for his multiple contributions to the Black-Litterman implementation. For a full example of its usage, including the acquisition of market cap data for free, please refer to the cookbook recipe.
Tip
Thomas Kirschenmann has built a neat interactive Black-Litterman tool on top of PyPortfolioOpt, which allows you to visualise BL outputs and compare optimization objectives.
Priors¶
You can think of the prior as the “default” estimate, in the absence of any information. Black and Litterman (1991) [2] provide the insight that a natural choice for this prior is the market’s estimate of the return, which is embedded into the market capitalisation of the asset.
Every asset in the market portfolio contributes a certain amount of risk to the portfolio. Standard theory suggests that investors must be compensated for the risk that they take, so we can attribute to each asset an expected compensation (i.e prior estimate of returns). This is quantified by the market-implied risk premium, which is the market’s excess return divided by its variance:

\[\delta = \frac{R - R_f}{\sigma^2}\]

To calculate the market-implied returns, we then use the following formula:

\[\Pi = \delta \Sigma w_{mkt}\]
Here, \(w_{mkt}\) denotes the market-cap weights. This formula is calculating the total amount of risk contributed by an asset and multiplying it with the market price of risk, resulting in the market-implied returns vector \(\Pi\). We can use PyPortfolioOpt to calculate this as follows:
from pypfopt import black_litterman, risk_models
"""
cov_matrix is a NxN sample covariance matrix
mcaps is a dict of market caps
market_prices is a series of S&P500 prices
"""
delta = black_litterman.market_implied_risk_aversion(market_prices)
prior = black_litterman.market_implied_prior_returns(mcaps, delta, cov_matrix)
There is nothing stopping you from using any prior you see fit (but it must have the same dimensionality as the universe). If you think that the mean historical returns are a good prior, you could go with that. But a significant body of research shows that mean historical returns are a completely uninformative prior.
Note
You don’t technically have to provide a prior estimate to the Black-Litterman model. This is particularly useful if your views (and confidences) were generated by some proprietary model, in which case BL is essentially a clever way of mixing your views.
Views¶
In the Black-Litterman model, users can either provide absolute or relative views. Absolute views are statements like: “AAPL will return 10%” or “XOM will drop 40%”. Relative views, on the other hand, are statements like “GOOG will outperform FB by 3%”.
These views must be specified in the vector \(Q\) and mapped to the asset universe via the picking matrix \(P\). A brief example of this is shown below, though a comprehensive guide is given by Idzorek. Let’s say that our universe is defined by the ordered list: SBUX, GOOG, FB, AAPL, BAC, JPM, T, GE, MSFT, XOM. We want to represent four views on these 10 assets, two absolute and two relative:
- SBUX will drop 20% (absolute)
- MSFT will rise by 5% (absolute)
- GOOG outperforms FB by 10%
- BAC and JPM will outperform T and GE by 15%
The corresponding views vector is formed by taking the numbers above and putting them into a column:
Q = np.array([-0.20, 0.05, 0.10, 0.15]).reshape(-1, 1)
The picking matrix is more interesting. Remember that its role is to link the views (which mention 8 assets) to the universe of 10 assets. Arguably, this is the most important part of the model because it is what allows us to propagate our expectations (and confidences in expectations) into the model:
P = np.array(
[
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, -1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5, 0, 0],
]
)
A brief explanation of the above:
- Each view has a corresponding row in the picking matrix (the order matters)
- Absolute views have a single 1 in the column corresponding to the ticker’s order in the universe.
- Relative views have a positive number in the nominally outperforming asset columns and a negative number in the nominally underperforming asset columns. The numbers in each row should sum up to 0.
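Putting the pieces together, here is a short sketch of constructing the model directly from Q and P (cov_matrix is assumed to be the NxN covariance matrix of the ten assets):

from pypfopt.black_litterman import BlackLittermanModel

bl = BlackLittermanModel(cov_matrix, Q=Q, P=P)
rets = bl.bl_returns()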
PyPortfolioOpt provides a helper method for inputting absolute views as either a dict
or pd.Series
–
if you have relative views, you must build your picking matrix manually:
from pypfopt.black_litterman import BlackLittermanModel
viewdict = {"AAPL": 0.20, "BBY": -0.30, "BAC": 0, "SBUX": -0.2, "T": 0.15}
bl = BlackLittermanModel(cov_matrix, absolute_views=viewdict)
Confidence matrix and tau¶
The confidence matrix is a diagonal covariance matrix containing the variances of each view. One heuristic for calculating \(\Omega\) is to say that it is proportional to the variance of the priors. This is reasonable: quantities that move around a lot are harder to forecast! Hence PyPortfolioOpt does not require you to input a confidence matrix, and defaults to:

\[\Omega = \tau \cdot \text{diag}(P \Sigma P^T)\]
Alternatively, we provide an implementation of Idzorek’s method [1]. This allows you to specify your view uncertainties as percentage confidences. To use this, choose omega="idzorek" and pass a list of confidences (from 0 to 1) into the view_confidences parameter.
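For example (a sketch; the confidence values are purely illustrative):

viewdict = {"AAPL": 0.20, "BBY": -0.30, "BAC": 0, "SBUX": -0.2, "T": 0.15}
confidences = [0.6, 0.4, 0.2, 0.5, 0.7]  # one confidence per view, each in [0, 1]
bl = BlackLittermanModel(
    cov_matrix,
    absolute_views=viewdict,
    omega="idzorek",
    view_confidences=confidences,
)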
You are of course welcome to provide your own estimate. This is particularly applicable if your views are the output of some statistical model, which may also provide the view uncertainty.
Another parameter that controls the relative weighting of the prior and the views is \(\tau\). There is a lot to be said about tuning this parameter, with many contradictory rules of thumb. Indeed, an entire paper has been written on it [3]. We choose the sensible default \(\tau = 0.05\).
Note
If you use the default estimate of \(\Omega\), or omega="idzorek"
, it turns out that the value of \(\tau\) does not matter. This
is a consequence of the mathematics: the \(\tau\) cancels in the matrix multiplications.
Output of the BL model¶
The BL model outputs posterior estimates of the returns and covariance matrix. The default suggestion in the literature is to then input these into an optimizer (see General Efficient Frontier). A quick alternative, which is quite useful for debugging, is to calculate the weights implied by the returns vector [4]. It is actually the reverse of the procedure we used to calculate the returns implied by the market weights.
In PyPortfolioOpt, this is available under BlackLittermanModel.bl_weights()
. Because the BlackLittermanModel
class
inherits from BaseOptimizer
, this follows the same API as the EfficientFrontier
objects:
from pypfopt import black_litterman
from pypfopt.black_litterman import BlackLittermanModel
from pypfopt.efficient_frontier import EfficientFrontier
viewdict = {"AAPL": 0.20, "BBY": -0.30, "BAC": 0, "SBUX": -0.2, "T": 0.15}
bl = BlackLittermanModel(cov_matrix, absolute_views=viewdict)
rets = bl.bl_returns()
ef = EfficientFrontier(rets, cov_matrix)
# OR use return-implied weights
delta = black_litterman.market_implied_risk_aversion(market_prices)
bl.bl_weights(delta)
weights = bl.clean_weights()
Documentation reference¶
The black_litterman
module houses the BlackLittermanModel class, which
generates posterior estimates of expected returns given a prior estimate and user-supplied
views. In addition, two utility functions are defined, which calculate:
- market-implied prior estimate of returns
- market-implied risk-aversion parameter
class pypfopt.black_litterman.BlackLittermanModel(cov_matrix, pi=None, absolute_views=None, Q=None, P=None, omega=None, view_confidences=None, tau=0.05, risk_aversion=1, **kwargs)

A BlackLittermanModel object (inheriting from BaseOptimizer) requires a specific input format, specifying the prior, the views, the uncertainty in views, and a picking matrix to map views to the asset universe. We can then compute posterior estimates of returns and covariance. Helper methods have been provided to supply defaults where possible.

Instance variables:

Inputs:
- cov_matrix - np.ndarray
- n_assets - int
- tickers - str list
- Q - np.ndarray
- P - np.ndarray
- pi - np.ndarray
- omega - np.ndarray
- tau - float

Output:
- posterior_rets - pd.Series
- posterior_cov - pd.DataFrame
- weights - np.ndarray

Public methods:
- default_omega() - view uncertainty proportional to asset variance
- idzorek_method() - convert views specified as percentages into BL uncertainties
- bl_returns() - posterior estimate of returns
- bl_cov() - posterior estimate of covariance
- bl_weights() - weights implied by posterior returns
- portfolio_performance() calculates the expected return, volatility and Sharpe ratio for the allocated portfolio.
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
__init__(cov_matrix, pi=None, absolute_views=None, Q=None, P=None, omega=None, view_confidences=None, tau=0.05, risk_aversion=1, **kwargs)

Parameters:
- cov_matrix (pd.DataFrame or np.ndarray) – NxN covariance matrix of returns
- pi (np.ndarray, pd.Series, optional) – Nx1 prior estimate of returns, defaults to None. If pi="market", calculate a market-implied prior (requires market_caps to be passed). If pi="equal", use an equal-weighted prior.
- absolute_views (pd.Series or dict, optional) – a collection of K absolute views on a subset of assets, defaults to None. If this is provided, we do not need P, Q.
- Q (np.ndarray or pd.DataFrame, optional) – Kx1 views vector, defaults to None
- P (np.ndarray or pd.DataFrame, optional) – KxN picking matrix, defaults to None
- omega (np.ndarray or pd.DataFrame, or string, optional) – KxK view uncertainty matrix (diagonal), defaults to None. Can instead pass "idzorek" to use Idzorek’s method (requires you to pass view_confidences). If omega="default" or None, we set the uncertainty proportional to the variance.
- view_confidences (np.ndarray, pd.Series, list, optional) – Kx1 vector of percentage view confidences (between 0 and 1), required to compute omega via Idzorek’s method.
- tau (float, optional) – the weight-on-views scalar (default is 0.05)
- risk_aversion (positive float, optional) – risk aversion parameter, defaults to 1
- market_caps (np.ndarray, pd.Series, optional) – (kwarg) market caps for the assets, required if pi="market"
- risk_free_rate (float, defaults to 0.02) – (kwarg) risk_free_rate is needed in some methods

Caution

You must specify the covariance matrix and either absolute views or both Q and P, except in the special case where you provide exactly one view per asset, in which case P is inferred.
bl_cov()

Calculate the posterior estimate of the covariance matrix, given views on some assets. Based on He and Litterman (2002). It is assumed that omega is diagonal. If this is not the case, please manually set omega_inv.

Returns: posterior covariance matrix
Return type: pd.DataFrame

bl_returns()

Calculate the posterior estimate of the returns vector, given views on some assets.

Returns: posterior returns vector
Return type: pd.Series

bl_weights(risk_aversion=None)

Compute the weights implied by the posterior returns, given the market price of risk. Technically this can be applied to any estimate of the expected returns, and is in fact a special case of mean-variance optimization:

\[w = (\delta \Sigma)^{-1} E(R)\]

Parameters: risk_aversion (positive float, optional) – risk aversion parameter, defaults to 1
Returns: asset weights implied by returns
Return type: OrderedDict

static default_omega(cov_matrix, P, tau)

If the uncertainty matrix omega is not provided, we calculate using the method of He and Litterman (1999), such that the ratio omega/tau is proportional to the variance of the view portfolio.

Returns: KxK diagonal uncertainty matrix
Return type: np.ndarray

static idzorek_method(view_confidences, cov_matrix, pi, Q, P, tau, risk_aversion=1)

Use Idzorek’s method to create the uncertainty matrix given user-specified percentage confidences. We use the closed-form solution described by Jay Walters in The Black-Litterman Model in Detail (2014).

Parameters: view_confidences (np.ndarray, pd.Series, list, optional) – Kx1 vector of percentage view confidences (between 0 and 1), required to compute omega via Idzorek’s method.
Returns: KxK diagonal uncertainty matrix
Return type: np.ndarray

portfolio_performance(verbose=False, risk_free_rate=0.02)

After optimising, calculate (and optionally print) the performance of the optimal portfolio. Currently calculates expected return, volatility, and the Sharpe ratio. This method uses the BL posterior returns and covariance matrix.

Parameters:
- verbose (bool, optional) – whether performance should be printed, defaults to False
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.

Raises: ValueError – if weights have not been calculated yet
Returns: expected return, volatility, Sharpe ratio.
Return type: (float, float, float)
pypfopt.black_litterman.market_implied_prior_returns(market_caps, risk_aversion, cov_matrix, risk_free_rate=0.02)

Compute the prior estimate of returns implied by the market weights. In other words, given each asset’s contribution to the risk of the market portfolio, how much are we expecting to be compensated?

\[\Pi = \delta \Sigma w_{mkt}\]

Parameters:
- market_caps ({ticker: cap} dict or pd.Series) – market capitalisations of all assets
- risk_aversion (positive float) – risk aversion parameter
- cov_matrix (pd.DataFrame) – covariance matrix of asset returns
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. You should use the appropriate time period, corresponding to the covariance matrix.

Returns: prior estimate of returns as implied by the market caps
Return type: pd.Series

pypfopt.black_litterman.market_implied_risk_aversion(market_prices, frequency=252, risk_free_rate=0.02)

Calculate the market-implied risk-aversion parameter (i.e market price of risk) based on market prices. For example, if the market has excess returns of 10% a year with 5% variance, the risk-aversion parameter is 2, i.e you have to be compensated 2x the variance.

\[\delta = \frac{R - R_f}{\sigma^2}\]

Parameters:
- market_prices (pd.Series with DatetimeIndex) – the (daily) prices of the market portfolio, e.g SPY.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.

Raises: TypeError – if market_prices cannot be parsed
Returns: market-implied risk aversion
Return type: float
References¶
[1] (1, 2) Idzorek, T. A step-by-step guide to the Black-Litterman model: Incorporating user-specified confidence levels. In: Forecasting Expected Returns in the Financial Markets. Elsevier Ltd; 2007. p. 17–38.
[2] Black, F.; Litterman, R. Combining investor views with market equilibrium. The Journal of Fixed Income, 1991.
[3] Walters, J. The Factor Tau in the Black-Litterman Model (October 9, 2013). Available at SSRN: https://ssrn.com/abstract=1701467 or http://dx.doi.org/10.2139/ssrn.1701467
[4] Walters, J. The Black-Litterman Model in Detail (2014). SSRN Electronic Journal; (February 2007): 1–65.
Other Optimizers¶
Efficient frontier methods involve the direct optimization of an objective subject to constraints. However, there are some portfolio optimization schemes that are completely different in character. PyPortfolioOpt provides support for these alternatives, while still giving you access to the same pre and post-processing API.
Note
As of v0.4, these other optimizers now inherit from BaseOptimizer
or
BaseConvexOptimizer
, so you no longer have to implement pre-processing and
post-processing methods on your own. You can thus easily swap out, say,
EfficientFrontier
for HRPOpt
.
Hierarchical Risk Parity (HRP)¶
Hierarchical Risk Parity is a novel portfolio optimization method developed by Marcos Lopez de Prado [1]. Though a detailed explanation can be found in the linked paper, here is a rough overview of how HRP works:
- From a universe of assets, form a distance matrix based on the correlation of the assets.
- Using this distance matrix, cluster the assets into a tree via hierarchical clustering.
- Within each branch of the tree, form the minimum variance portfolio (normally between just two assets).
- Iterate over each level, optimally combining the mini-portfolios at each node.
The advantages of this are that it does not require the inversion of the covariance matrix as with traditional mean-variance optimization, and seems to produce diverse portfolios that perform well out of sample.
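To get a feel for the API, here is a minimal sketch (with df again being a dataframe of prices, as in the earlier examples):

from pypfopt import expected_returns
from pypfopt.hierarchical_portfolio import HRPOpt

df = ...  # your dataframe of prices
rets = expected_returns.returns_from_prices(df)

hrp = HRPOpt(rets)
weights = hrp.optimize(linkage_method="single")
hrp.portfolio_performance(verbose=True)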
The hierarchical_portfolio
module seeks to implement one of the recent advances in
portfolio optimization – the application of hierarchical clustering models in allocation.
All of the hierarchical classes have a similar API to EfficientFrontier
, though since
many hierarchical models currently don’t support different objectives, the actual allocation
happens with a call to optimize().
Currently implemented:
HRPOpt
implements the Hierarchical Risk Parity (HRP) portfolio. Code reproduced with permission from Marcos Lopez de Prado (2016).
class pypfopt.hierarchical_portfolio.HRPOpt(returns=None, cov_matrix=None)

A HRPOpt object (inheriting from BaseOptimizer) constructs a hierarchical risk parity portfolio.

Instance variables:

Inputs:
- n_assets - int
- tickers - str list
- returns - pd.DataFrame

Output:
- weights - np.ndarray
- clusters - linkage matrix corresponding to clustered assets.

Public methods:
- optimize() calculates weights using HRP
- portfolio_performance() calculates the expected return, volatility and Sharpe ratio for the optimized portfolio.
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
__init__(returns=None, cov_matrix=None)

Parameters:
- returns (pd.DataFrame) – asset historical returns
- cov_matrix (pd.DataFrame) – covariance of asset returns

Raises: TypeError – if returns is not a dataframe

optimize(linkage_method='single')

Construct a hierarchical risk parity portfolio, using Scipy hierarchical clustering (see here).

Parameters: linkage_method (str) – which scipy linkage method to use
Returns: weights for the HRP portfolio
Return type: OrderedDict

portfolio_performance(verbose=False, risk_free_rate=0.02, frequency=252)

After optimising, calculate (and optionally print) the performance of the optimal portfolio. Currently calculates expected return, volatility, and the Sharpe ratio, assuming returns are daily.

Parameters:
- verbose (bool, optional) – whether performance should be printed, defaults to False
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02. The period of the risk-free rate should correspond to the frequency of expected returns.
- frequency (int, optional) – number of time periods in a year, defaults to 252 (the number of trading days in a year)

Raises: ValueError – if weights have not been calculated yet
Returns: expected return, volatility, Sharpe ratio.
Return type: (float, float, float)
The Critical Line Algorithm¶
This is a robust alternative to the quadratic solver used to find mean-variance optimal portfolios, that is especially advantageous when we apply linear inequalities. Unlike generic convex optimization routines, the CLA is specially designed for portfolio optimization. It is guaranteed to converge after a certain number of iterations, and can efficiently derive the entire efficient frontier.
Tip
In general, unless you have specific requirements e.g you would like to efficiently compute the entire
efficient frontier for plotting, I would go with the standard EfficientFrontier
optimizer.
I am most grateful to Marcos López de Prado and David Bailey for providing the implementation [2]. Permission for its distribution has been received by email. It has been modified such that it has the same API, though as of v0.5.0 we only support max_sharpe() and min_volatility().
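As a brief sketch (with mu and S as in the Quick Example):

from pypfopt.cla import CLA

cla = CLA(mu, S)
cla.max_sharpe()
cla.portfolio_performance(verbose=True)

# The CLA can also trace out the whole frontier efficiently
returns, risks, weight_list = cla.efficient_frontier(points=100)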
The cla
module houses the CLA class, which
generates optimal portfolios using the Critical Line Algorithm as implemented
by Marcos Lopez de Prado and David Bailey.
class pypfopt.cla.CLA(expected_returns, cov_matrix, weight_bounds=(0, 1))

Instance variables:

Inputs:
- n_assets - int
- tickers - str list
- mean - np.ndarray
- cov_matrix - np.ndarray
- expected_returns - np.ndarray
- lb - np.ndarray
- ub - np.ndarray

Optimization parameters:
- w - np.ndarray list
- ls - float list
- g - float list
- f - float list list

Outputs:
- weights - np.ndarray
- frontier_values - (float list, float list, np.ndarray list)

Public methods:
- max_sharpe() optimizes for maximal Sharpe ratio (a.k.a the tangency portfolio)
- min_volatility() optimizes for minimum volatility
- efficient_frontier() computes the entire efficient frontier
- portfolio_performance() calculates the expected return, volatility and Sharpe ratio for the optimized portfolio.
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
__init__(expected_returns, cov_matrix, weight_bounds=(0, 1))

Parameters:
- expected_returns (pd.Series, list, np.ndarray) – expected returns for each asset. Set to None if optimising for volatility only.
- cov_matrix (pd.DataFrame or np.array) – covariance of returns for each asset
- weight_bounds (tuple (float, float) or (list/ndarray, list/ndarray) or list(tuple(float, float))) – minimum and maximum weight of an asset, defaults to (0, 1). Must be changed to (-1, 1) for portfolios with shorting.

Raises:
- TypeError – if expected_returns is not a series, list or array
- TypeError – if cov_matrix is not a dataframe or array

efficient_frontier(points=100)

Efficiently compute the entire efficient frontier.

Parameters: points (int, optional) – rough number of points to evaluate, defaults to 100
Raises: ValueError – if weights have not been computed
Returns: return list, std list, weight list
Return type: (float list, float list, np.ndarray list)

max_sharpe()

Maximise the Sharpe ratio.

Returns: asset weights for the max-Sharpe portfolio
Return type: OrderedDict

min_volatility()

Minimise volatility.

Returns: asset weights for the volatility-minimising portfolio
Return type: OrderedDict

portfolio_performance(verbose=False, risk_free_rate=0.02)

After optimising, calculate (and optionally print) the performance of the optimal portfolio. Currently calculates expected return, volatility, and the Sharpe ratio.

Parameters:
- verbose (bool, optional) – whether performance should be printed, defaults to False
- risk_free_rate (float, optional) – risk-free rate of borrowing/lending, defaults to 0.02

Raises: ValueError – if weights have not been calculated yet
Returns: expected return, volatility, Sharpe ratio.
Return type: (float, float, float)
Implementing your own optimizer¶
Please note that this is quite different to implementing Custom optimization problems, because in that case we are still using the same convex optimization structure. However, HRP and CLA optimization have a fundamentally different optimization method. In general, these are much more difficult to code up compared to custom objective functions.
To implement a custom optimizer that is compatible with the rest of PyPortfolioOpt, just extend BaseOptimizer (or BaseConvexOptimizer if you want to use cvxpy), both of which can be found in base_optimizer.py. This gives you access to utility methods like clean_weights(), as well as making sure that any output is compatible with portfolio_performance() and post-processing methods. A minimal sketch of such a subclass is shown below.
The base_optimizer module houses the parent class BaseOptimizer, from which all optimizers inherit, and BaseConvexOptimizer, which is the base class for all cvxpy (and scipy) optimization.
Additionally, we define a general utility function portfolio_performance
to
evaluate return and risk for a given set of portfolio weights.
class pypfopt.base_optimizer.BaseOptimizer(n_assets, tickers=None)

Instance variables:
- n_assets - int
- tickers - str list
- weights - np.ndarray

Public methods:
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
__init__(n_assets, tickers=None)

Parameters:
- n_assets (int) – number of assets
- tickers (list) – name of assets

clean_weights(cutoff=0.0001, rounding=5)

Helper method to clean the raw weights, setting any weights whose absolute values are below the cutoff to zero, and rounding the rest.

Parameters:
- cutoff (float, optional) – the lower bound, defaults to 1e-4
- rounding (int, optional) – number of decimal places to round the weights, defaults to 5. Set to None if rounding is not desired.

Returns: asset weights
Return type: OrderedDict
class pypfopt.base_optimizer.BaseConvexOptimizer(n_assets, tickers=None, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)

The BaseConvexOptimizer contains many private variables for use by cvxpy. For example, the immutable optimization variable for weights is stored as self._w. Interacting directly with these variables is discouraged.

Instance variables:
- n_assets - int
- tickers - str list
- weights - np.ndarray
- _opt - cp.Problem
- _solver - str
- _solver_options - {str: str} dict

Public methods:
- add_objective() adds a (convex) objective to the optimization problem
- add_constraint() adds a constraint to the optimization problem
- convex_objective() solves for a generic convex objective with linear constraints
- nonconvex_objective() solves for a generic nonconvex objective using the scipy backend. This is prone to getting stuck in local minima and is generally not recommended.
- set_weights() creates self.weights (np.ndarray) from a weights dict
- clean_weights() rounds the weights and clips near-zeros.
- save_weights_to_file() saves the weights to csv, json, or txt.
__init__(n_assets, tickers=None, weight_bounds=(0, 1), solver=None, verbose=False, solver_options=None)

Parameters:
- weight_bounds (tuple OR tuple list, optional) – minimum and maximum weight of each asset OR single min/max pair if all identical, defaults to (0, 1). Must be changed to (-1, 1) for portfolios with shorting.
- solver (str, optional. Defaults to "ECOS") – name of solver. List available solvers with: cvxpy.installed_solvers()
- verbose (bool, optional) – whether performance and debugging info should be printed, defaults to False
- solver_options (dict, optional) – parameters for the given solver

_map_bounds_to_constraints(test_bounds)

Convert input bounds into a form acceptable by cvxpy and add to the constraints list.

Parameters: test_bounds (tuple OR list/tuple of tuples OR pair of np arrays) – minimum and maximum weight of each asset OR single min/max pair if all identical OR pair of arrays corresponding to lower/upper bounds, defaults to (0, 1).
Raises: TypeError – if test_bounds is not of the right type
Returns: bounds suitable for cvxpy
Return type: tuple pair of np.ndarray

_solve_cvxpy_opt_problem()

Helper method to solve the cvxpy problem and check output, once objectives and constraints have been defined.

Raises: exceptions.OptimizationError – if problem is not solvable by cvxpy
add_constraint(new_constraint)

Add a new constraint to the optimization problem. This constraint must satisfy DCP rules, i.e be either a linear equality constraint or convex inequality constraint.

Examples:

ef.add_constraint(lambda x: x[0] == 0.02)
ef.add_constraint(lambda x: x >= 0.01)
ef.add_constraint(lambda x: x <= np.array([0.01, 0.08, ..., 0.5]))

Parameters: new_constraint – the constraint to be added

add_objective(new_objective, **kwargs)

Add a new term into the objective function. This term must be convex, and built from cvxpy atomic functions.

Example:

def L1_norm(w, k=1):
    return k * cp.norm(w, 1)

ef.add_objective(L1_norm, k=2)

Parameters: new_objective (cp.Expression (i.e function of cp.Variable)) – the objective to be added
add_sector_constraints(sector_mapper, sector_lower, sector_upper)

Adds constraints on the sum of weights of different groups of assets. Most commonly, these will be sector constraints, e.g the portfolio’s exposure to tech must be less than x%:

sector_mapper = {
    "GOOG": "tech",
    "FB": "tech",
    "XOM": "Oil/Gas",
    "RRC": "Oil/Gas",
    "MA": "Financials",
    "JPM": "Financials",
}
sector_lower = {"tech": 0.1}  # at least 10% to tech
sector_upper = {
    "tech": 0.4,  # less than 40% tech
    "Oil/Gas": 0.1,  # less than 10% oil and gas
}

Parameters:
- sector_mapper ({str: str} dict) – dict that maps tickers to sectors
- sector_lower ({str: float} dict) – lower bounds for each sector
- sector_upper ({str: float} dict) – upper bounds for each sector
convex_objective(custom_objective, weights_sum_to_one=True, **kwargs)

Optimize a custom convex objective function. Constraints should be added with ef.add_constraint(). Optimizer arguments must be passed as keyword-args. Example:

# Could define as a lambda function instead
def logarithmic_barrier(w, cov_matrix, k=0.1):
    # 60 Years of Portfolio Optimization, Kolm et al (2014)
    return cp.quad_form(w, cov_matrix) - k * cp.sum(cp.log(w))

w = ef.convex_objective(logarithmic_barrier, cov_matrix=ef.cov_matrix)

Parameters:
- custom_objective (function with signature (cp.Variable, **kwargs) -> cp.Expression) – an objective function to be MINIMISED. This should be written using cvxpy atoms. Should map (w, **kwargs) -> float.
- weights_sum_to_one (bool, optional) – whether to add the default objective, defaults to True

Raises: OptimizationError – if the objective is nonconvex or constraints nonlinear.
Returns: asset weights for the efficient risk portfolio
Return type: OrderedDict
nonconvex_objective(custom_objective, objective_args=None, weights_sum_to_one=True, constraints=None, solver='SLSQP', initial_guess=None)

Optimize some objective function using the scipy backend. This can support nonconvex objectives and nonlinear constraints, but may get stuck at local minima. Example:

# Market-neutral efficient risk
constraints = [
    {"type": "eq", "fun": lambda w: np.sum(w)},  # weights sum to zero
    {
        "type": "eq",
        "fun": lambda w: target_risk ** 2 - np.dot(w.T, np.dot(ef.cov_matrix, w)),
    },  # risk = target_risk
]
ef.nonconvex_objective(
    lambda w, mu: -w.T.dot(mu),  # min negative return (i.e maximise return)
    objective_args=(ef.expected_returns,),
    weights_sum_to_one=False,
    constraints=constraints,
)

Parameters:
- objective_function (function with signature (np.ndarray, args) -> float) – an objective function to be MINIMISED. This function should map (weight, args) -> cost
- objective_args (tuple of np.ndarrays) – arguments for the objective function (excluding weight)
- weights_sum_to_one (bool, optional) – whether to add the default objective, defaults to True
- constraints (dict list) – list of constraints in the scipy format (i.e dicts)
- solver (string) – which SCIPY solver to use, e.g “SLSQP”, “COBYLA”, “BFGS”. User beware: different optimizers require different inputs.
- initial_guess (np.ndarray) – the initial guess for the weights, shape (n,) or (n, 1)

Returns: asset weights that optimize the custom objective
Return type: OrderedDict
References¶
[1] López de Prado, M. (2016). Building Diversified Portfolios that Outperform Out of Sample. The Journal of Portfolio Management, 42(4), 59–69.
[2] Bailey, D. and López de Prado, M. (2013). An Open-Source Implementation of the Critical-Line Algorithm for Portfolio Optimization.
Post-processing weights¶
After optimal weights have been generated, it is often necessary to do some post-processing before they can be used practically. In particular, you are likely using portfolio optimization techniques to generate a portfolio allocation – a list of tickers and corresponding integer quantities that you could go and purchase at a broker.
However, it is not trivial to convert the continuous weights (output by any of our optimization methods) into an actionable allocation. For example, let us say that we have $10,000 that we would like to allocate. If we multiply the weights by this total portfolio value, the result will be dollar amounts of each asset. So if the optimal weight for Apple is 0.15, we need $1500 worth of Apple stock. However, Apple shares come in discrete units ($190 at the time of writing), so we will not be able to buy exactly $1500 of stock. The best we can do is to buy the number of shares that gets us closest to the desired dollar value.
PyPortfolioOpt offers two ways of solving this problem: one using a simple greedy algorithm, the other using integer programming.
Greedy algorithm¶
DiscreteAllocation.greedy_portfolio()
proceeds in two ‘rounds’.
In the first round, we buy as many shares as we can for each asset without going over
the desired weight. In the Apple example, \(1500/190 \approx 7.89\), so we buy 7
shares at a cost of $1330. After iterating through all of the assets, we will have a
lot of money left over (since we always rounded down).
In the second round, we calculate how far the current weights deviate from the existing weights for each asset. We wanted Apple to form 15% of the portfolio (with total value $10,000), but we only bought $1330 worth of Apple stock, so there is a deviation of \(0.15 - 0.133\). Some assets will have a higher deviation from the ideal, so we will purchase shares of these first. We then repeat the process, always buying shares of the asset whose current weight is furthest away from the ideal weight. Though this algorithm will not guarantee the optimal solution, I have found that it allows us to generate discrete allocations with very little money left over (e.g $12 left on a $10,000 portfolio).
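Concretely, here is a sketch of the greedy allocation (with weights being the output of one of the optimizers above, and df the prices dataframe):

from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

latest_prices = get_latest_prices(df)  # most recent price of each asset
da = DiscreteAllocation(weights, latest_prices, total_portfolio_value=10000)
allocation, leftover = da.greedy_portfolio()
print(allocation)
print("Funds remaining: {:.2f}".format(leftover))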
That being said, we can see that on the test dataset (for a standard max_sharpe
portfolio), the allocation method may deviate rather widely from the desired weights,
particularly for companies with a high share price (e.g AMZN).
Funds remaining: 12.15
MA: allocated 0.242, desired 0.246
FB: allocated 0.200, desired 0.199
PFE: allocated 0.183, desired 0.184
BABA: allocated 0.088, desired 0.096
AAPL: allocated 0.086, desired 0.092
AMZN: allocated 0.000, desired 0.072
BBY: allocated 0.064, desired 0.061
SBUX: allocated 0.036, desired 0.038
GOOG: allocated 0.102, desired 0.013
Allocation has RMSE: 0.038
Integer programming¶
This method (credit to Dingyuan Wang for the first implementation) treats the discrete allocation as an integer programming problem. In effect, the integer programming approach searches the space of possible allocations to find the one that is closest to our desired weights. We will use the following notation:
- \(T \in \mathbb{R}\) is the total dollar value to be allocated
- \(p \in \mathbb{R}^n\) is the array of latest prices
- \(w \in \mathbb{R}^n\) is the set of target weights
- \(x \in \mathbb{Z}^n\) is the integer allocation (i.e the result)
- \(r \in \mathbb{R}\) is the remaining unallocated value, i.e \(r = T - x \cdot p\).
The optimization problem is then given by:

\[\begin{aligned}
& \underset{x \in \mathbb{Z}^n}{\text{minimise}} & & r + \lVert wT - x \odot p \rVert_1 \\
& \text{subject to} & & r = T - x \cdot p \\
&&& x \geq 0, \; r \geq 0,
\end{aligned}\]

where \(\odot\) denotes element-wise multiplication.
This is straightforward to translate into cvxpy.
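A hedged sketch of this translation, using the notation defined above (T, p, w assumed given; this mirrors the stated problem rather than the library’s internal implementation):

import cvxpy as cp

x = cp.Variable(len(p), integer=True)
r = T - p @ x  # remaining unallocated value

# minimise leftover cash plus the L1 deviation from the target dollar allocation
objective = cp.Minimize(r + cp.norm(w * T - cp.multiply(x, p), 1))
constraints = [x >= 0, r >= 0]
cp.Problem(objective, constraints).solve(solver="GLPK_MI")  # needs a MIP solver
shares = x.value.astype(int)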
Caution
Though lp_portfolio()
produces allocations with a lower RMSE, some testing
shows that it is between 100 and 1000 times slower than greedy_portfolio()
.
This doesn’t matter for small portfolios (it should still take less than a second),
but the runtime for integer programs grows exponentially with the number of stocks, so
for large portfolios you may have to use greedy_portfolio()
.
Dealing with shorts¶
As of v0.4, DiscreteAllocation
automatically deals with shorts by finding separate discrete
allocations for the long-only and short-only portions. If your portfolio has shorts,
you should pass a short ratio. The default is 0.30, corresponding to a 130/30 long-short balance.
Practically, this means that you would go long $10,000 of some stocks, short $3000 of some other
stocks, then use the proceeds from the shorts to go long another $3000.
Thus the total value of the resulting portfolio would be $13,000.
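For instance, continuing the earlier sketch, a 130/30 long-short allocation might be requested as follows:

da = DiscreteAllocation(weights, latest_prices,
                        total_portfolio_value=10000, short_ratio=0.3)
allocation, leftover = da.lp_portfolio()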
Documentation reference¶
The discrete_allocation
module contains the DiscreteAllocation
class, which
offers multiple methods to generate a discrete portfolio allocation from continuous weights.
class pypfopt.discrete_allocation.DiscreteAllocation(weights, latest_prices, total_portfolio_value=10000, short_ratio=None)

Generate a discrete portfolio allocation from continuous weights.

Instance variables:

Inputs:
- weights - dict
- latest_prices - pd.Series or dict
- total_portfolio_value - int/float
- short_ratio - float

Output:
- allocation - dict

Public methods:
- greedy_portfolio() - uses a greedy algorithm
- lp_portfolio() - uses linear programming
__init__(weights, latest_prices, total_portfolio_value=10000, short_ratio=None)

Parameters:
- weights (dict) – continuous weights generated from the efficient_frontier module
- latest_prices (pd.Series) – the most recent price for each asset
- total_portfolio_value (int/float, optional) – the desired total value of the portfolio, defaults to 10000
- short_ratio (float, defaults to None) – the short ratio, e.g 0.3 corresponds to 130/30. If None, defaults to the input weights.

Raises:
- TypeError – if weights is not a dict
- TypeError – if latest_prices isn’t a series
- ValueError – if short_ratio < 0

_allocation_rmse_error(verbose=True)

Utility function to calculate and print RMSE error between discretised weights and continuous weights. RMSE was used instead of MAE because we want to penalise large variations.

Parameters: verbose (bool) – print weight discrepancies?
Returns: rmse error
Return type: float

static _remove_zero_positions(allocation)

Utility function to remove zero positions (i.e with no shares being bought).

greedy_portfolio(reinvest=False, verbose=False)

Convert continuous weights into a discrete portfolio allocation using a greedy iterative approach.

Parameters:
- reinvest (bool, defaults to False) – whether or not to reinvest cash gained from shorting
- verbose (bool, defaults to False) – print error analysis?

Returns: the number of shares of each ticker that should be purchased, along with the amount of funds leftover.
Return type: (dict, float)

lp_portfolio(reinvest=False, verbose=False, solver='GLPK_MI')

Convert continuous weights into a discrete portfolio allocation using integer programming.

Parameters:
- reinvest (bool, defaults to False) – whether or not to reinvest cash gained from shorting
- verbose (bool) – print error analysis?
- solver (str, defaults to "GLPK_MI") – the CVXPY solver to use (must support mixed-integer programs)

Returns: the number of shares of each ticker that should be purchased, along with the amount of funds leftover.
Return type: (dict, float)
Plotting¶
All of the optimization functions in EfficientFrontier
produce a single optimal portfolio.
However, you may want to plot the entire efficient frontier. This efficient frontier can be thought
of in several different ways:
- The set of all efficient_risk() portfolios for a range of target risks
- The set of all efficient_return() portfolios for a range of target returns
- The set of all max_quadratic_utility() portfolios for a range of risk aversions.
The plotting module provides support for all three of these approaches. To produce a plot of the efficient frontier, you should instantiate your EfficientFrontier object and add constraints like you normally would, but before calling an optimization function (e.g ef.max_sharpe()), you should pass the instantiated object into plotting.plot_efficient_frontier():
ef = EfficientFrontier(mu, S, weight_bounds=(None, None))
ef.add_constraint(lambda w: w[0] >= 0.2)
ef.add_constraint(lambda w: w[2] == 0.15)
ef.add_constraint(lambda w: w[3] + w[4] <= 0.10)
fig, ax = plt.subplots()
plotting.plot_efficient_frontier(ef, ax=ax, show_assets=True)
plt.show()
This produces the following plot:
You can explicitly pass a range of parameters (risk, utility, or returns) to generate a frontier:
# 100 portfolios with risks between 0.10 and 0.40
risk_range = np.linspace(0.10, 0.40, 100)
plotting.plot_efficient_frontier(ef, ef_param="risk", ef_param_range=risk_range,
show_assets=True, showfig=True)
We can easily generate more complex plots. The following script plots both the efficient frontier and randomly generated (suboptimal) portfolios, coloured by the Sharpe ratio:
fig, ax = plt.subplots()
plotting.plot_efficient_frontier(ef, ax=ax, show_assets=False)
# Find the tangency portfolio
ef.max_sharpe()
ret_tangent, std_tangent, _ = ef.portfolio_performance()
ax.scatter(std_tangent, ret_tangent, marker="*", s=100, c="r", label="Max Sharpe")
# Generate random portfolios
n_samples = 10000
w = np.random.dirichlet(np.ones(len(mu)), n_samples)
rets = w.dot(mu)
stds = np.sqrt(np.diag(w @ S @ w.T))
sharpes = rets / stds
ax.scatter(stds, rets, marker=".", c=sharpes, cmap="viridis_r")
# Output
ax.set_title("Efficient Frontier with random portfolios")
ax.legend()
plt.tight_layout()
plt.savefig("ef_scatter.png", dpi=200)
plt.show()
This is the result:
Documentation reference¶
The plotting module houses all the functions to generate various plots.
Currently implemented:
- plot_covariance – plot a covariance (or correlation) matrix
- plot_dendrogram – plot the hierarchical clusters in a portfolio
- plot_efficient_frontier – plot the efficient frontier from an EfficientFrontier or CLA object
- plot_weights – bar chart of weights
Tip
To save the plot, pass filename="somefile.png" as a keyword argument to any of the plotting functions. This (along with some other kwargs) gets passed through _plot_io() before the figure is returned.
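For example (a minimal sketch; the filename and dpi are illustrative, and S is the covariance matrix from the earlier examples):
# saves to disk without displaying, via the filename/showfig kwargs
plotting.plot_covariance(S, plot_correlation=True, filename="cov_matrix.png", dpi=150)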
-
pypfopt.plotting._plot_io(**kwargs)[source]¶
Helper method to optionally save the figure to file.
Parameters:
- filename (str, optional) – name of the file to save to, defaults to None (doesn’t save)
- dpi (int (between 50-500)) – dpi of figure to save or plot, defaults to 300
- showfig (bool, optional) – whether to plt.show() the figure, defaults to False
-
pypfopt.plotting.plot_covariance(cov_matrix, plot_correlation=False, show_tickers=True, **kwargs)[source]¶
Generate a basic plot of the covariance (or correlation) matrix, given a covariance matrix.
Parameters:
- cov_matrix (pd.DataFrame or np.ndarray) – covariance matrix
- plot_correlation (bool, optional) – whether to plot the correlation matrix instead, defaults to False.
- show_tickers (bool, optional) – whether to use tickers as labels (not recommended for large portfolios), defaults to True
Returns: matplotlib axis
Return type: matplotlib.axes object
-
pypfopt.plotting.plot_dendrogram(hrp, ax=None, show_tickers=True, **kwargs)[source]¶
Plot the clusters in the form of a dendrogram.
Parameters:
- hrp (object) – HRPOpt object that has already been optimized.
- show_tickers (bool, optional) – whether to use tickers as labels (not recommended for large portfolios), defaults to True
- filename (str, optional) – name of the file to save to, defaults to None (doesn’t save)
- showfig (bool, optional) – whether to plt.show() the figure, defaults to False
Returns: matplotlib axis
Return type: matplotlib.axes object
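A minimal usage sketch (assuming a price dataframe df as in the earlier examples):
from pypfopt import expected_returns
from pypfopt.hierarchical_portfolio import HRPOpt

returns = expected_returns.returns_from_prices(df)
hrp = HRPOpt(returns)
hrp.optimize()  # the HRPOpt object must be optimized before plotting
plotting.plot_dendrogram(hrp)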
-
pypfopt.plotting.plot_efficient_frontier(opt, ef_param='return', ef_param_range=None, points=100, ax=None, show_assets=True, **kwargs)[source]¶
Plot the efficient frontier based on either a CLA or EfficientFrontier object.
Parameters:
- opt (EfficientFrontier or CLA) – an instantiated optimizer object BEFORE optimising an objective
- ef_param (str, one of {"utility", "risk", "return"}.) – [EfficientFrontier] whether to use a range over utility, risk, or return. Defaults to “return”.
- ef_param_range (np.array or list (recommended to use np.arange or np.linspace)) – the range of parameter values for ef_param. If None, automatically compute a range from min->max return.
- points (int, optional) – number of points to plot, defaults to 100. This is overridden if an ef_param_range is provided explicitly.
- show_assets (bool, optional) – whether we should plot the asset risks/returns also, defaults to True
- filename (str, optional) – name of the file to save to, defaults to None (doesn’t save)
- showfig (bool, optional) – whether to plt.show() the figure, defaults to False
Returns: matplotlib axis
Return type: matplotlib.axes object
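For instance, to trace the frontier over a range of risk-aversion parameters (the range below is illustrative):
delta_range = np.linspace(2, 20, 50)
plotting.plot_efficient_frontier(ef, ef_param="utility", ef_param_range=delta_range)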
-
pypfopt.plotting.plot_weights(weights, ax=None, **kwargs)[source]¶
Plot the portfolio weights as a horizontal bar chart.
Parameters:
- weights ({ticker: weight} dict) – the weights outputted by any PyPortfolioOpt optimizer
- ax (matplotlib.axes) – ax to plot to, optional
Returns: matplotlib axis
Return type: matplotlib.axes
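A minimal sketch (assuming mu and S from the earlier examples):
ef = EfficientFrontier(mu, S)
ef.max_sharpe()
weights = ef.clean_weights()  # rounds negligible weights to zero for a tidier chart
plotting.plot_weights(weights)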
FAQs¶
Constraining the number of assets¶
Unfortunately, cardinality constraints are not convex, making them difficult to implement.
However, we can treat the problem as a mixed-integer program and solve it (provided you have access to a suitable solver).
For small problems with fewer than 1000 variables and constraints, you can use the community edition of CPLEX: pip install cplex.
In the example below, we limit the portfolio to at most 10 assets:
import cvxpy as cp

ef = EfficientFrontier(mu, S, solver=cp.CPLEX)
booleans = cp.Variable(len(ef.tickers), boolean=True)
ef.add_constraint(lambda x: x <= booleans)
ef.add_constraint(lambda x: cp.sum(booleans) <= 10)
ef.min_volatility()
This does not play well with max_sharpe, and needs to be modified for different bounds.
See this issue for further discussion.
Tracking error¶
Tracking error can either be used as an objective (as described in General Efficient Frontier) or as a constraint. This is an example of adding a tracking error constraint:
from pypfopt.objective_functions import ex_ante_tracking_error
benchmark_weights = ... # benchmark
ef = EfficientFrontier(mu, S)
# illustrative bound: cap the squared ex-ante tracking error at (5% vol)^2
ef.add_constraint(
    lambda w: ex_ante_tracking_error(w, ef.cov_matrix, benchmark_weights) <= 0.05**2
)
ef.min_volatility()
Roadmap and Changelog¶
Roadmap¶
These are some of the features that I think would greatly improve PyPortfolioOpt; if you are interested in implementing one of these, raise an issue or send me an email and we can discuss. If you have any other feature requests, please raise them using GitHub issues
- Open-source backtests using either Backtrader or Zipline.
- Risk parity
- Optimising for higher moments (i.e skew and kurtosis)
- Factor modelling - this is conceptually doable, but a lot of thought needs to be put into the API.
- Monte Carlo optimization with custom distributions
- Further support for different risk/return models
1.4.0¶
- Finally implemented CVaR optimization! This has been one of the most requested features. Many thanks to Nicolas Knudde for the initial draft.
- Re-architected plotting so users can pass an ax, allowing for complex plots (see cookbook).
- Helper method to compute the max-return portfolio (thanks to Philipp Schiele for the suggestion).
- Several bug fixes and test improvements (thanks to Carl Peasnell).
1.4.1¶
- 100% test coverage
- Reorganised docs; added FAQ page
- Reorganised module structure to make it more scalable
- Python 3.9 support, dockerfile versioning, misc packaging improvements (e.g cvxopt optional)
1.3.0¶
- Significantly improved plotting functionality: can now plot constrained efficient frontier!
- Efficient semivariance portfolios (thanks to Philipp Schiele)
- Improved functionality for portfolios with short positions (thanks to Rich Caputo).
- Significant improvement in test coverage (thanks to Carl Peasnell).
- Several bug fixes and usability improvements.
- Migrated from TravisCI to Github Actions.
1.3.1¶
- Minor cleanup (forgotten commits from v1.3.0).
1.2.0¶
- Added Idzorek’s method for calculating the omega matrix given percentage confidences.
- Fixed max_sharpe to allow for custom constraints
- Grouped sector constraints
- Improved error tracebacks
- Added a new cookbook for examples (in progress).
- Packaging: added better instructions for Windows, added Docker support.
1.2.1¶
Fixed critical ordering bug in sector constraints
1.2.2¶
Matplotlib is now a required dependency; added support for pandas 1.0.
1.2.3¶
- Added support for changing solvers and verbose output
- Changed dict to OrderedDict to support python 3.5
- Improved packaging/dependencies: simplified requirements.txt, improved processes before pushing.
1.2.4¶
- Fixed bug in Ledoit-Wolf shrinkage calculation.
- Fixed bug in plotting docs that caused them not to render.
1.2.5¶
- Fixed compounding in expected_returns (thanks to Aditya Bhutra).
- Improvements in the advanced cvxpy API (thanks to Pat Newell).
- Deprecated James-Stein.
- Exposed linkage_method in HRP.
- Added support for cvxpy 1.1.
- Added an error check for efficient_risk.
- Small improvements to docs.
1.2.6¶
- Fixed order-dependence bug in Black-Litterman market_implied_prior_returns
- Fixed inaccuracy in BL cookbook.
- Fixed bug in exponential covariance.
1.2.7¶
- Fixed bug which required conservative risk targets for long/short portfolios.
1.1.0¶
- Multiple additions and improvements to risk_models:
  - Introduced a new API, in which the function risk_models.risk_matrix(method="...") allows all the different risk models to be called. This should make testing easier.
  - All methods now accept returns data instead of prices, if you set the flag returns_data=True.
- Automatically fix non-positive semidefinite covariance matrices!
- Additions and improvements to expected_returns:
  - Introduced a new API, in which the function expected_returns.return_model(method="...") allows all the different return models to be called. This should make testing easier.
  - Added option to ‘properly’ compound returns.
  - Added the CAPM return model.
- from pypfopt import plotting: moved all plotting functionality into a new module and added new plots. All other plotting functions (scattered in different classes) have been retained, but are now deprecated.
1.0.0¶
- Migrated backend from scipy to cvxpy and made significant breaking changes to the API:
  - PyPortfolioOpt is now significantly more robust and numerically stable.
  - These changes will not affect basic users, who can still access features like max_sharpe().
  - However, additional objectives and constraints (including L2 regularisation) are now explicitly added before optimising some ‘primary’ objective.
- Added basic plotting capabilities for the efficient frontier, hierarchical clusters, and HRP dendrograms.
- Added a basic transaction cost objective.
- Made breaking changes to some modules and classes so that PyPortfolioOpt is easier to extend in future:
  - Replaced BaseScipyOptimizer with BaseConvexOptimizer.
  - hierarchical_risk_parity was replaced by hierarchical_portfolios to leave the door open for other hierarchical methods.
  - Sadly, removed CVaR optimization for the time being until I can properly fix it.
1.0.1¶
Fixed minor issues in CLA: weight bound bug, efficient_frontier needed weights to be called, set_weights not needed.
1.0.2¶
Fixed a small but important bug where passing expected_returns=None failed. According to the docs, users should be able to pass only the covariance matrix if they just want to optimize min volatility.
0.5.0¶
- Black-Litterman model and docs.
- Custom bounds per asset
- Improved BaseOptimizer, adding a method that writes weights to text and fixing a bug in set_weights.
- Unconstrained quadratic utility optimization (analytic)
- Revamped docs, with information on types of attributes and more examples.
0.5.1¶
Fixed an error with dot products by amending the pandas requirements.
0.5.2¶
Made PuLP, sklearn, noisyopt optional dependencies to improve installation experience.
0.5.3¶
- Fixed an optimization bug in EfficientFrontier.efficient_risk. An error is now thrown if optimization fails.
- Added a hidden API to change the scipy optimizer method.
0.5.4¶
- Improved the Black-Litterman linear algebra to avoid inverting the uncertainty matrix. It is now possible to have 100% confidence in views.
- Clarified the role of tau.
- Added a pipfile for pipenv users.
- Removed Value-at-risk from docs to discourage usage until it is properly fixed.
0.5.5¶
Began migration to cvxpy by changing the discrete allocation backend from PuLP to cvxpy.
0.4.0¶
- Major improvements to discrete_allocation. Added functionality to allocate shorts; modified the linear programming method suggested by Dingyuan Wang; added a postprocessing section to the User Guide.
- Further refactoring and docs for HRPOpt.
- Major documentation update, e.g to support custom optimizers
0.4.1¶
- Added CLA back in after getting permission from Dr Marcos López de Prado
- Added more tests for different risk models.
0.4.2¶
- Minor fix for clean_weights
- Removed official support for python 3.4.
- Minor improvement to semicovariance, thanks to Felipe Schneider.
0.4.3¶
- Added prices_from_returns utility function and provided better docs for returns_from_prices.
- Added cov_to_corr method to produce correlation matrices from covariance matrices.
- Fixed readme examples.
0.3.0¶
- Merged an amazing PR from Dingyuan Wang that rearchitects the project to make it more self-consistent and extensible.
- New algorithm: ML de Prado’s CLA
- New algorithms for converting continuous allocation to discrete (using linear programming).
- Merged a PR implementing Single Factor and Constant Correlation shrinkage.
0.3.3¶
Migrated the project internally to use the poetry dependency manager. Will still keep setup.py and requirements.txt, but poetry is now the recommended way to interact with PyPortfolioOpt.
0.3.4¶
Refactored shrinkage models, including single factor and constant correlation.
0.2.0¶
- Hierarchical Risk Parity optimization
- Semicovariance matrix
- Exponential covariance matrix
- CVaR optimization
- Better support for custom objective functions
- Multiple bug fixes (including minimum volatility vs minimum variance)
- Refactored so all optimizers inherit from a BaseOptimizer.
0.2.1¶
- Included python 3.7 in travis build
- Merged PR from schneiderfelipe to fix an error message.
Contributing¶
Some of the things that I’d love for people to help with:
- Improve performance of existing code (but not at the cost of readability)
- Add new optimization objectives. For example, if you would like to use something other than the Sharpe ratio, write an optimizer! (or suggest it in Issues and I will have a go).
- Help me write more tests! If you are someone learning about quant finance and/or unit testing in python, what better way to practice than to write some tests on an open-source project! Feel free to check for edge cases, or for uncommon parameter combinations which may cause silent errors.
Guidelines¶
Seek early feedback¶
Before you start coding your contribution, it may be wise to raise an issue on GitHub to discuss whether the contribution is appropriate for the project.
Code style¶
For this project I have used Black as the formatting standard, with all of the default settings. It would be much appreciated if PRs follow this standard, because otherwise I will have to reformat the code before merging.
Testing¶
Any contributions must be accompanied by unit tests (written with pytest). These are straightforward to write: find the relevant test file (or create a new one) and write a series of assert statements. The tests should use the dummy dataset I have provided in tests/stock_prices.csv, and should cover core functionality, warnings/errors (check that they are raised as expected), and limiting behaviour or edge cases.
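As a rough sketch of what such a test might look like (the test name and assertions here are illustrative, not part of the existing suite):
import pandas as pd
from pypfopt import risk_models

def test_sample_cov_is_symmetric():
    # the dummy dataset mentioned above; adjust the path to the repo layout if needed
    df = pd.read_csv("tests/stock_prices.csv", parse_dates=True, index_col="date")
    S = risk_models.sample_cov(df)
    # a covariance matrix should be square and symmetric
    assert S.shape == (len(df.columns), len(df.columns))
    assert (S - S.T).abs().max().max() < 1e-10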
Documentation¶
Inline comments are great when needed, but don’t go overboard. Docstring content should follow PEP257 semantically and sphinx syntactically, such that sphinx can automatically document the methods and their arguments. I am personally not a fan of writing long paragraphs in the docstrings: in my view, docstrings should state briefly how an object can be used, while the rest of the explanation and theoretical background should be offloaded to ReadTheDocs.
I would appreciate if changes are accompanied by relevant documentation - it doesn’t have to be pretty, because I will probably try to tidy it up before it goes onto ReadTheDocs, but it’d make things a lot simpler to have the person who wrote the code explain it in their own words.
Questions¶
If you have any questions related to the project, it is probably best to raise an issue and I will tag it as a question.
If you have questions unrelated to the project, drop me an email - contact details can be found on my website.
Bugs/issues¶
If you find any bugs or the portfolio optimization is not working as expected, feel free to raise an issue. I would ask that you provide the following information in the issue:
- Descriptive title so that other users can see the existing issues
- Operating system, python version, and python distribution (optional).
- Minimal example for reproducing the issue.
- What you expected to happen
- What actually happened
- A full traceback of the error message (omit personal details as you see fit).
About¶
I’m Robert, a Natural Sciences undergraduate at the University of Cambridge. I am interested in a broad range of quantitative topics, including physics, statistics, finance and computer science (and the intersection between them). For more about me, please head over to my website.
I learn fastest when making real projects. In early 2018 I began seriously trying to self-educate on certain topics in quantitative finance, and mean-variance optimization is one of the cornerstones of this field. I read quite a few journal articles and explanations but ultimately felt that a real proof of understanding would lie in the implementation. At the same time, I realised that existing open-source (python) portfolio optimization libraries (there are one or two) were unsatisfactory for several reasons, and that people ‘out there’ might benefit from a well-documented and intuitive API. This is what motivated the development of PyPortfolioOpt.
Project principles and design decisions¶
- It should be easy to swap out individual components of the optimization process with the user’s proprietary improvements.
- Usability is everything: it is better to be self-explanatory than consistent.
- There is no point in portfolio optimization unless it can be practically applied to real asset prices.
- Everything that has been implemented should be tested.
- Inline documentation is good: dedicated (separate) documentation is better. The two are not mutually exclusive.
- Formatting should never get in the way of good code: because of this, I have deferred all formatting decisions to Black.
Advantages over existing implementations¶
- Includes classical methods (Markowitz 1952 and Black-Litterman) and suggested best practices (e.g covariance shrinkage), along with many recent developments and novel features like L2 regularisation, exponential covariance, and hierarchical risk parity.
- Native support for pandas dataframes: easily input your daily prices data.
- Extensive practical tests, which use real-life data.
- Easy to combine with your proprietary strategies and models.
- Robust to missing data, and price-series of different lengths (e.g FB data only goes back to 2012 whereas AAPL data goes back to 1980).
Contributors¶
This is a non-exhaustive unordered list of contributors. I am sincerely grateful for all of your efforts!
- Philipp Schiele
- Carl Peasnell
- Felipe Schneider
- Dingyuan Wang
- Pat Newell
- Aditya Bhutra
- Thomas Schmelzer
- Rich Caputo
- Nicolas Knudde