pyblp.Problem¶

class
pyblp.
Problem
(product_formulations, product_data, agent_formulation=None, agent_data=None, integration=None)¶ A BLP-type problem.
This class is initialized with relevant data and solved with
Problem.solve()
. Parameters
product_formulations (Formulation or tuple of Formulation) –
Formulation
configuration or tuple of up to three Formulation
configurations for the matrix of linear product characteristics, \(X_1\), for the matrix of nonlinear product characteristics, \(X_2\), and for the matrix of cost characteristics, \(X_3\), respectively. If the formulation for \(X_3\) is not specified or is None
, a supply side will not be estimated. Similarly, if the formulation for \(X_2\) is not specified or is None
, the logit (or nested logit) model will be estimated.
Variable names should correspond to fields in
product_data
. The shares
variable should not be included in any of the formulations and prices
should be included in the formulation for \(X_1\) or \(X_2\) (or both). The absorb
argument of Formulation
can be used to absorb fixed effects into \(X_1\) and \(X_3\), but not \(X_2\). Characteristics in \(X_2\) should generally be included in \(X_1\). The typical exception is characteristics that are collinear with fixed effects that have been absorbed into \(X_1\).
Characteristics in \(X_1\) that do not involve
prices
, \(X_1^x\), will be combined with excluded demand-side instruments (specified below) to create the full set of demand-side instruments, \(Z_D\). Any fixed effects absorbed into \(X_1\) will also be absorbed into \(Z_D\). Similarly, characteristics in \(X_3\) will be combined with the excluded supply-side instruments to create \(Z_S\), and any fixed effects absorbed into \(X_3\) will also be absorbed into \(Z_S\).
Warning
Characteristics that involve prices, \(p\), should always be formulated with the
prices
variable. If another name is used, Problem
will not understand that the characteristic is endogenous, so it will be erroneously included in \(Z_D\), and derivatives computed with respect to prices will likely be wrong. For example, to include a \(p^2\) characteristic, include I(prices**2)
in a formula instead of manually including a prices_squared
variable in product_data
and a formula.
product_data (structured array-like) –
Each row corresponds to a product. Markets can have differing numbers of products. The following fields are required:
market_ids : (object)  IDs that associate products with markets.
shares : (numeric)  Market shares, \(s\), which should be between zero and one, exclusive. Outside shares should also be between zero and one. Shares in each market should sum to less than one.
prices : (numeric)  Product prices, \(p\).
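The share requirements above can be checked before a problem is constructed. The following is a minimal plain-Python sketch, not part of pyblp: the helper name `check_shares` and the toy data are hypothetical, and the outside share is taken to be one minus the sum of inside shares in a market.

```python
from collections import defaultdict


def check_shares(market_ids, shares):
    """Validate shares market by market and return each market's outside share.

    Hypothetical helper for illustration only; it is not a pyblp function.
    """
    totals = defaultdict(float)
    for market, share in zip(market_ids, shares):
        # Each inside share must lie strictly between zero and one.
        if not 0 < share < 1:
            raise ValueError(f"share {share} in market {market!r} is not in (0, 1)")
        totals[market] += share

    outside = {}
    for market, total in totals.items():
        # Inside shares must sum to less than one so the outside share is positive.
        if total >= 1:
            raise ValueError(f"shares in market {market!r} sum to {total}")
        outside[market] = 1 - total
    return outside


# Toy data: two markets with two and three products.
market_ids = ["A", "A", "B", "B", "B"]
shares = [0.2, 0.3, 0.1, 0.25, 0.4]
outside_shares = check_shares(market_ids, shares)
```

Running a check like this on the rows of `product_data` surfaces invalid markets early, with a message that names the offending market.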
If a formulation for \(X_3\) is specified in
product_formulations
, firm IDs are also required, since they will be used to estimate the supply side of the problem:
firm_ids : (object, optional)  IDs that associate products with firms.
Excluded instruments should generally be specified with the following fields:
demand_instruments : (numeric)  Excluded demand-side instruments, which, together with the formulated exogenous linear product characteristics, \(X_1^x\), constitute the full set of demand-side instruments, \(Z_D\).
supply_instruments : (numeric, optional)  Excluded supply-side instruments, which, together with the formulated cost characteristics, \(X_3\), constitute the full set of supply-side instruments, \(Z_S\).
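To make the construction of \(Z_D\) concrete, the sketch below joins the columns of \(X_1^x\) with the excluded demand-side instruments, row by row. It is plain Python rather than pyblp code: the helper `hstack`, the toy numbers, and the column order are assumptions made for illustration, and the order used internally by the library may differ.

```python
def hstack(*blocks):
    """Join row-aligned blocks of columns, each given as a list of rows."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all blocks need the same number of rows")
    return [[value for block in blocks for value in block[i]] for i in range(n_rows)]


# Exogenous linear characteristics X_1^x: a constant and one characteristic.
X1_exogenous = [[1.0, 3.2], [1.0, 1.5], [1.0, 2.7]]

# Three columns of excluded demand-side instruments for the same three products.
demand_instruments = [[0.4, 1.1, 0.9], [0.7, 0.3, 1.2], [0.1, 0.8, 0.5]]

# Full set of demand-side instruments and its column count M_D.
Z_D = hstack(X1_exogenous, demand_instruments)
M_D = len(Z_D[0])
```

Here \(M_D\) is the number of exogenous linear characteristics plus the number of excluded instruments. \(Z_S\) is built the same way from \(X_3\) and the excluded supply-side instruments.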
The recommendation in Conlon and Gortmaker (2019) is to start with differentiation instruments of Gandhi and Houde (2017), which can be built with
build_differentiation_instruments()
, and then compute feasible optimal instruments with ProblemResults.compute_optimal_instruments()
in the second stage.
If
firm_ids
are specified, custom ownership matrices can be specified as well:
ownership : (numeric, optional)  Custom stacked \(J_t \times J_t\) ownership matrices, \(O\), for each market \(t\), which can be built with
build_ownership()
. To reduce memory usage, standard ownership matrices are by default built only when they are needed. If an ownership field is specified, it should have as many columns as there are products in the market with the most products. Rightmost columns in markets with fewer products will be ignored.
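The stacked layout described above can be illustrated with a small sketch. It is plain Python, not the `build_ownership()` implementation: the helper `standard_ownership` is hypothetical, "standard" is taken to mean that an entry is one when two products in a market share a firm and zero otherwise, and the NaN padding is an arbitrary placeholder, since the rightmost columns are ignored in smaller markets.

```python
from collections import defaultdict


def standard_ownership(market_ids, firm_ids):
    """Return one row per product holding its row of the market's ownership matrix.

    Hypothetical helper for illustration only; it is not a pyblp function.
    """
    rows_by_market = defaultdict(list)
    for index, market in enumerate(market_ids):
        rows_by_market[market].append(index)

    # Every row is as wide as the market with the most products.
    width = max(len(indices) for indices in rows_by_market.values())

    stacked = [None] * len(market_ids)
    for indices in rows_by_market.values():
        for j in indices:
            # Entry is 1 when products j and k are owned by the same firm.
            row = [1.0 if firm_ids[j] == firm_ids[k] else 0.0 for k in indices]
            # Pad smaller markets on the right; these columns are ignored.
            row += [float("nan")] * (width - len(indices))
            stacked[j] = row
    return stacked


# Market A has three products, market B has two.
market_ids = ["A", "A", "A", "B", "B"]
firm_ids = ["f1", "f1", "f2", "f1", "f2"]
ownership = standard_ownership(market_ids, firm_ids)
```

In this toy example the first three rows form the \(3 \times 3\) matrix for market A, and the last two rows hold the \(2 \times 2\) matrix for market B followed by one padded column.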
Note
Fields that can have multiple columns (
demand_instruments
, supply_instruments
, and ownership
) can either be matrices or can be broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of excluded demand-side instruments, a demand_instruments
field with three columns can be replaced by three one-dimensional fields: demand_instruments0
, demand_instruments1
, and demand_instruments2
.
To estimate a nested logit or random coefficients nested logit (RCNL) model, nesting groups must be specified:
nesting_ids : (object, optional)  IDs that associate products with nesting groups. When these IDs are specified,
rho
must be specified in Problem.solve()
as well.
Finally, clustering groups can be specified to account for within-group correlation while updating the weighting matrix and estimating standard errors:
clustering_ids : (object, optional)  Cluster group IDs, which will be used if
W_type
or se_type
in Problem.solve()
is 'clustered'
.
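The column-suffix convention for multi-column fields, described in the note earlier in this parameter's description, can be sketched in plain Python. The helper `split_columns` and the toy matrix are hypothetical and only show how a three-column field maps onto suffixed one-dimensional fields.

```python
def split_columns(name, matrix):
    """Split a matrix (list of rows) into one-dimensional fields name0, name1, ..."""
    return {
        f"{name}{index}": list(column)
        for index, column in enumerate(zip(*matrix))
    }


# Two products with three columns of excluded demand-side instruments.
demand_instruments = [[0.4, 1.1, 0.9], [0.7, 0.3, 1.2]]
fields = split_columns("demand_instruments", demand_instruments)
```

The resulting dictionary has keys `demand_instruments0`, `demand_instruments1`, and `demand_instruments2`, each holding one value per product. The same pattern applies to `supply_instruments` and `ownership`.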
Along with
market_ids
, firm_ids
, nesting_ids
, clustering_ids
, and prices
, the names of any additional fields can typically be used as variables in product_formulations
. However, there are a few variable names such as 'X1'
, which are reserved for use by Products
.
agent_formulation (Formulation, optional) –
Formulation
configuration for the matrix of observed agent characteristics called demographics, \(d\), which will only be included in the model if this formulation is specified. Since demographics are only used if there are nonlinear product characteristics, this formulation should only be specified if \(X_2\) is formulated in product_formulations
. Variable names should correspond to fields in agent_data
.
agent_data (structured array-like, optional) –
Each row corresponds to an agent. Markets can have differing numbers of agents. Since simulated agents are only used if there are nonlinear product characteristics, agent data should only be specified if \(X_2\) is formulated in
product_formulations
. If agent data are specified, market IDs are required:
market_ids : (object)  IDs that associate agents with markets. The set of distinct IDs should be the same as the set in
product_data
. If integration
is specified, there must be at least as many rows in each market as the number of nodes and weights that are built for the market.
If
integration
is not specified, the following fields are required:
weights : (numeric, optional)  Integration weights, \(w\), for integration over agent choice probabilities.
nodes : (numeric, optional)  Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of nonlinear product characteristics), only the first \(K_2\) will be retained.
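As a rough illustration of what custom `weights` and `nodes` fields could contain, the sketch below draws standard normal nodes and assigns equal weights within each market. It is plain Python and not how `build_integration()` works: the helper names, the equal weights that sum to one within a market, and the seed are all assumptions. The second helper mirrors the rule that only the first \(K_2\) node columns are retained.

```python
import random


def monte_carlo_agents(market_ids, draws, K2, seed=0):
    """Build agent rows with `draws` agents per market and K2 node columns each.

    Hypothetical helper for illustration only; it is not a pyblp function.
    """
    rng = random.Random(seed)
    rows = []
    for market in market_ids:
        for _ in range(draws):
            rows.append({
                "market_ids": market,
                # Assumed convention: equal weights that sum to one in each market.
                "weights": 1 / draws,
                # One standard normal draw per nonlinear characteristic.
                "nodes": [rng.gauss(0, 1) for _ in range(K2)],
            })
    return rows


def retain_first_columns(nodes, K2):
    """Keep only the first K2 columns of a node matrix given as a list of rows."""
    return [row[:K2] for row in nodes]


agents = monte_carlo_agents(["A", "B"], draws=4, K2=2)
trimmed = retain_first_columns([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], K2=2)
```

Each market receives the same number of agents here, although the text above allows markets to have differing numbers of agents.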
The convenience function
build_integration()
can be useful when constructing custom nodes and weights.
Note
If
nodes
has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes
field with three columns can be replaced by three one-dimensional fields: nodes0
, nodes1
, and nodes2
.
Along with
market_ids
, the names of any additional fields can typically be used as variables in agent_formulation
. The exception is the name 'demographics'
, which is reserved for use by Agents
.
integration (Integration, optional) –
Integration
configuration for how to build nodes and weights for integration over agent choice probabilities, which will replace any nodes
and weights
fields in agent_data
. This configuration is required if nodes
and weights
in agent_data
are not specified. It should not be specified if \(X_2\) is not formulated in product_formulations
.
If this configuration is specified, \(K_2\) columns of nodes (the number of nonlinear product characteristics) will be built. However, if
sigma
in Problem.solve()
is left unspecified or specified with columns fixed at zero, fewer columns will be used.

product_formulations
¶ Formulation
configurations for \(X_1\), \(X_2\), and \(X_3\), respectively. Type
Formulation or tuple of Formulation

agent_formulation
¶ Formulation
configuration for \(d\). Type
Formulation

products
¶ Product data structured as
Products
, which consists of data taken from product_data
along with matrices built according to Problem.product_formulations
. Type
Products

agents
¶ Agent data structured as
Agents
, which consists of data taken from agent_data
or built by integration
along with any demographics built according to Problem.agent_formulation
. Type
Agents

unique_market_ids
¶ Unique market IDs in product and agent data.
 Type
ndarray

unique_firm_ids
¶ Unique firm IDs in product data.
 Type
ndarray

unique_nesting_ids
¶ Unique nesting group IDs in product data.
 Type
ndarray

T
¶ Number of markets, \(T\).
 Type
int

N
¶ Number of products across all markets, \(N\).
 Type
int

F
¶ Number of firms across all markets, \(F\).
 Type
int

I
¶ Number of agents across all markets, \(I\).
 Type
int
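The dimension attributes \(T\), \(N\), \(F\), and \(I\) are simple counts over the ID fields. The sketch below reproduces them in plain Python on toy data; the variable names and values are hypothetical, and the consistency check reflects the requirement that agent and product data cover the same set of markets.

```python
# One entry per product row.
product_market_ids = ["A", "A", "A", "B", "B"]
firm_ids = ["f1", "f1", "f2", "f1", "f2"]

# One entry per agent row (three agents in each market).
agent_market_ids = ["A", "A", "A", "B", "B", "B"]

# Agent data should cover exactly the markets that appear in product data.
if set(agent_market_ids) != set(product_market_ids):
    raise ValueError("agent and product data refer to different markets")

T = len(set(product_market_ids))  # number of markets
N = len(product_market_ids)       # number of products across all markets
F = len(set(firm_ids))            # number of firms across all markets
I = len(agent_market_ids)         # number of agents across all markets
```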

K1
¶ Number of linear product characteristics, \(K_1\).
 Type
int

K2
¶ Number of nonlinear product characteristics, \(K_2\).
 Type
int

K3
¶ Number of cost product characteristics, \(K_3\).
 Type
int

D
¶ Number of demographic variables, \(D\).
 Type
int

MD
¶ Number of demand-side instruments, \(M_D\), which is the number of excluded demand-side instruments plus the number of exogenous linear product characteristics, \(K_1^x\).
 Type
int

MS
¶ Number of supply-side instruments, \(M_S\), which is the number of excluded supply-side instruments plus the number of cost product characteristics, \(K_3\).
 Type
int

ED
¶ Number of absorbed dimensions of demand-side fixed effects, \(E_D\).
 Type
int

ES
¶ Number of absorbed dimensions of supply-side fixed effects, \(E_S\).
 Type
int

H
¶ Number of nesting groups, \(H\).
 Type
int
Examples
Methods
solve
([sigma, pi, rho, beta, gamma, …]) Solve the problem.