pyblp.Problem

class pyblp.Problem(product_formulations, product_data, agent_formulation=None, agent_data=None, integration=None, costs_type='linear')

A BLP-type problem.

This class is initialized with relevant data and solved with Problem.solve().

Parameters
  • product_formulations (Formulation or sequence of Formulation) –

    Formulation configuration or a sequence of up to three Formulation configurations for the matrix of linear product characteristics, \(X_1\), for the matrix of nonlinear product characteristics, \(X_2\), and for the matrix of cost characteristics, \(X_3\), respectively. If the formulation for \(X_3\) is not specified or is None, a supply side will not be estimated. Similarly, if the formulation for \(X_2\) is not specified or is None, the logit (or nested logit) model will be estimated.

    Variable names should correspond to fields in product_data. The shares variable should not be included in any of the formulations and prices should be included in the formulation for \(X_1\) or \(X_2\) (or both). The absorb argument of Formulation can be used to absorb fixed effects into \(X_1\) and \(X_3\), but not \(X_2\). Characteristics in \(X_2\) should generally be included in \(X_1\). The typical exception is characteristics that are collinear with fixed effects that have been absorbed into \(X_1\).
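    For a single dimension of fixed effects, absorbing them amounts to the within transformation: each column is demeaned within its fixed effect groups. The helper below is a hypothetical illustration of that one-way case, not pyblp's implementation. Absorbing several dimensions at once requires an iterative or otherwise more involved algorithm. The example also shows why characteristics that are collinear with absorbed fixed effects cannot stay in \(X_1\): a column that is constant within each group is demeaned to all zeros.

```python
import numpy as np

def absorb_one_way(matrix, group_ids):
    """Demean each column of `matrix` within the groups given by `group_ids`."""
    matrix = np.asarray(matrix, dtype=float)
    group_ids = np.asarray(group_ids)
    absorbed = matrix.copy()
    for group in np.unique(group_ids):
        rows = group_ids == group
        # Subtract the within-group mean of every column.
        absorbed[rows] -= matrix[rows].mean(axis=0)
    return absorbed

# Hypothetical data: five products in two markets, absorbing market fixed effects.
market_ids = np.array(['m1', 'm1', 'm2', 'm2', 'm2'])
X = np.column_stack([
    np.ones(5),                  # constant: collinear with market fixed effects
    [1.0, 3.0, 2.0, 4.0, 6.0],   # a characteristic that varies within markets
])
print(absorb_one_way(X, market_ids))
```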

    Characteristics in \(X_1\) that do not involve prices, \(X_1^x\), will be combined with excluded demand-side instruments (specified below) to create the full set of demand-side instruments, \(Z_D\). Any fixed effects absorbed into \(X_1\) will also be absorbed into \(Z_D\). Similarly, characteristics in \(X_3\) will be combined with the excluded supply-side instruments to create \(Z_S\), and any fixed effects absorbed into \(X_3\) will also be absorbed into \(Z_S\).
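    As a rough illustration, the sketch below assembles \(Z_D\) with NumPy by stacking the columns of \(X_1\) that do not involve prices next to the excluded demand-side instruments. The number of resulting columns is \(M_D\). The arrays and the boolean mask marking price columns are hypothetical stand-ins: Problem builds these matrices itself from product_formulations and product_data, and this is not its code. The same pattern, with \(X_3\) and the excluded supply-side instruments, gives \(Z_S\).

```python
import numpy as np

# Hypothetical X1 for N = 4 products with columns [1, prices, x].
X1 = np.array([
    [1.0, 2.0, 0.5],
    [1.0, 3.0, 0.1],
    [1.0, 2.5, 0.7],
    [1.0, 4.0, 0.2],
])
involves_prices = np.array([False, True, False])  # hypothetical mask: only prices is endogenous

# Hypothetical excluded demand-side instruments (two columns).
demand_instruments = np.array([
    [0.3, 1.2],
    [0.8, 0.4],
    [0.5, 0.9],
    [0.1, 1.5],
])

# X1^x: the characteristics in X1 that do not involve prices.
X1_exog = X1[:, ~involves_prices]

# Z_D = [X1^x, excluded instruments], so M_D = K1^x + number of excluded instruments.
ZD = np.hstack([X1_exog, demand_instruments])
MD = ZD.shape[1]
print(ZD.shape)  # (4, 4)
```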

    Warning

    Characteristics that involve prices, \(p\), should always be formulated with the prices variable. If another name is used, Problem will not understand that the characteristic is endogenous, so it will be erroneously included in \(Z_D\), and derivatives computed with respect to prices will likely be wrong. For example, to include a \(p^2\) characteristic, include I(prices**2) in a formula instead of manually including a prices_squared variable in product_data and a formula.
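    A small numerical check shows why a manually constructed prices_squared variable breaks price derivatives. Suppose, hypothetically, that mean utility includes \(\alpha p + \beta p^2\) with made-up coefficients. When \(p^2\) is computed from prices, as with I(prices**2), the derivative with respect to \(p\) is \(\alpha + 2\beta p\). When it is a separate variable, it is held fixed and only \(\alpha\) remains. The sketch below is plain Python and does not use pyblp.

```python
# Hypothetical coefficients on prices and prices squared.
ALPHA = -1.5
BETA = 0.2

def utility_formulated(p):
    # p**2 is derived from prices, as with I(prices**2) in a formula.
    return ALPHA * p + BETA * p ** 2

def utility_manual(p, prices_squared):
    # prices_squared is a separate variable that does not move with p.
    return ALPHA * p + BETA * prices_squared

def central_difference(f, x, h=1e-6):
    """Numerical derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

p = 3.0
correct = central_difference(utility_formulated, p)                  # alpha + 2 * beta * p
wrong = central_difference(lambda q: utility_manual(q, p ** 2), p)   # only alpha
print(round(correct, 4), round(wrong, 4))  # -0.3 -1.5
```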

  • product_data (structured array-like) –

    Each row corresponds to a product. Markets can have differing numbers of products. The following fields are required:

    • market_ids : (object) - IDs that associate products with markets.

    • shares : (numeric) - Market shares, \(s\), which should be between zero and one, exclusive. Outside shares should also be between zero and one. Shares in each market should sum to less than one.

    • prices : (numeric) - Product prices, \(p\).
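    These share conditions can be checked before a Problem is constructed. The helper below is a hypothetical plain-Python sketch, not part of pyblp: it computes each market's outside share as one minus the sum of its inside shares and raises an error if any inside or outside share falls outside the open interval from zero to one.

```python
from collections import defaultdict

def check_shares(market_ids, shares):
    """Return outside shares by market, raising ValueError if any share is invalid."""
    totals = defaultdict(float)
    for market, share in zip(market_ids, shares):
        if not 0 < share < 1:
            raise ValueError(f"share {share} in market {market!r} is not in (0, 1)")
        totals[market] += share
    # The outside share is whatever the inside products leave over.
    outside = {market: 1 - total for market, total in totals.items()}
    for market, share in outside.items():
        if not 0 < share < 1:
            raise ValueError(f"outside share in market {market!r} is {share}")
    return outside

print(check_shares(['a', 'a', 'b'], [0.25, 0.5, 0.125]))  # {'a': 0.25, 'b': 0.875}
```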

    If a formulation for \(X_3\) is specified in product_formulations, firm IDs are also required, since they will be used to estimate the supply side of the problem:

    • firm_ids : (object, optional) - IDs that associate products with firms.

    Excluded instruments should generally be specified with the following fields:

    • demand_instruments : (numeric) - Excluded demand-side instruments, which, together with the formulated exogenous linear product characteristics, \(X_1^x\), constitute the full set of demand-side instruments, \(Z_D\).

    • supply_instruments : (numeric, optional) - Excluded supply-side instruments, which, together with the formulated cost characteristics, \(X_3\), constitute the full set of supply-side instruments, \(Z_S\).

    The recommendation in Conlon and Gortmaker (2019) is to start with differentiation instruments of Gandhi and Houde (2017), which can be built with build_differentiation_instruments(), and then compute feasible optimal instruments with ProblemResults.compute_optimal_instruments() in the second stage.
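    To give a sense of what these instruments measure, the sketch below computes a quadratic variant of the Gandhi and Houde (2017) differentiation instruments for a single market: for each product and characteristic, the sum of squared characteristic differences with other products of the same firm and, separately, with rival products. It is a hypothetical NumPy illustration, not the code behind build_differentiation_instruments(), which handles all markets at once. Gandhi and Houde (2017) also discuss other variants, such as counts of nearby products.

```python
import numpy as np

def quadratic_differentiation_ivs(x, firm_ids):
    """Sketch of quadratic differentiation instruments for one market.

    For each product j and characteristic k, returns the sum of (x_jk - x_ik)^2
    over other products i of the same firm, followed by the same sum over
    products of rival firms.
    """
    x = np.asarray(x, dtype=float)
    firm_ids = np.asarray(firm_ids)
    # squared[j, i, k] = (x[j, k] - x[i, k]) ** 2
    squared = (x[:, None, :] - x[None, :, :]) ** 2
    same_firm = firm_ids[:, None] == firm_ids[None, :]
    # The j == i term is zero, so including it in the own-firm sum is harmless.
    own = (squared * same_firm[:, :, None]).sum(axis=1)
    rival = (squared * ~same_firm[:, :, None]).sum(axis=1)
    return np.hstack([own, rival])

# Hypothetical single characteristic for three products; firm 1 owns the first two.
x = np.array([[1.0], [2.0], [4.0]])
result = quadratic_differentiation_ivs(x, [1, 1, 2])
print(result)
```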

    If firm_ids are specified, custom ownership matrices can be specified as well:

    • ownership : (numeric, optional) - Custom stacked \(J_t \times J_t\) ownership matrices, \(O\), for each market \(t\), which can be built with build_ownership(). By default, to reduce memory usage, standard ownership matrices are built only when they are needed. If specified, there should be as many columns as there are products in the market with the most products. Rightmost columns in markets with fewer products will be ignored.
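    A standard ownership matrix has \(O_{jk} = 1\) if products \(j\) and \(k\) in the same market are owned by the same firm and \(O_{jk} = 0\) otherwise. The hypothetical NumPy sketch below, which is not the code behind build_ownership(), stacks these per-market matrices into one array with as many columns as the largest market has products, filling the ignored rightmost columns of smaller markets with NaN.

```python
import numpy as np

def stacked_ownership(market_ids, firm_ids):
    """Sketch of stacked standard ownership matrices, one row per product."""
    market_ids = np.asarray(market_ids)
    firm_ids = np.asarray(firm_ids)
    markets = list(dict.fromkeys(market_ids.tolist()))  # unique IDs, in order of appearance
    max_products = max(int(np.sum(market_ids == t)) for t in markets)
    # Columns beyond a market's own J_t are ignored, so NaN marks them here.
    ownership = np.full((market_ids.size, max_products), np.nan)
    for t in markets:
        rows = np.flatnonzero(market_ids == t)
        firms = firm_ids[rows]
        # O[j, k] = 1 if products j and k in market t share a firm, else 0.
        block = (firms[:, None] == firms[None, :]).astype(float)
        ownership[rows, :rows.size] = block
    return ownership

# Hypothetical data: market 'a' has three products, market 'b' has two.
O = stacked_ownership(['a', 'a', 'a', 'b', 'b'], [1, 1, 2, 1, 3])
print(O)
```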

    Note

    Fields that can have multiple columns (demand_instruments, supply_instruments, and ownership) can either be matrices or can be broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of excluded demand-side instruments, a demand_instruments field with three columns can be replaced by three one-dimensional fields: demand_instruments0, demand_instruments1, and demand_instruments2.
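    Converting from the matrix layout to the suffixed layout is mechanical. The sketch below assumes, hypothetically, that product data are held in a plain dict of NumPy arrays; split_field is an illustrative helper, not a pyblp function. The same conversion applies to nodes in agent_data.

```python
import numpy as np

def split_field(data, name):
    """Return a copy of `data` with the 2-D field `name` split into 1-D fields.

    A field 'demand_instruments' with three columns becomes
    'demand_instruments0', 'demand_instruments1', and 'demand_instruments2'.
    """
    data = dict(data)  # shallow copy so the caller's dict is left unchanged
    matrix = np.asarray(data.pop(name))
    if matrix.ndim == 1:
        matrix = matrix[:, None]  # treat a 1-D field as a single column
    for index in range(matrix.shape[1]):
        data[f'{name}{index}'] = matrix[:, index]
    return data

# Hypothetical product data for three products with three excluded instruments.
product_data = {
    'market_ids': np.array(['a', 'a', 'b']),
    'demand_instruments': np.arange(9.0).reshape(3, 3),
}
split = split_field(product_data, 'demand_instruments')
print(sorted(split))
# ['demand_instruments0', 'demand_instruments1', 'demand_instruments2', 'market_ids']
```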

    To estimate a nested logit or random coefficients nested logit (RCNL) model, nesting groups must be specified:

    • nesting_ids : (object, optional) - IDs that associate products with nesting groups. When these IDs are specified, rho must be specified in Problem.solve() as well.
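    For intuition about what the nesting groups do, the nested logit model decomposes each product's share as \(s_j = s_{j|h} s_h\), where \(s_h\) is the share of product \(j\)'s nesting group \(h\) in its market and \(s_{j|h} = s_j / \sum_{k \in h} s_k\) is its share within that group. The plain-Python sketch below computes \(s_{j|h}\) from hypothetical data; it is an illustration, not pyblp code.

```python
from collections import defaultdict

def within_group_shares(market_ids, nesting_ids, shares):
    """Share of each product within its (market, nesting group) pair."""
    group_totals = defaultdict(float)
    for market, nest, share in zip(market_ids, nesting_ids, shares):
        group_totals[market, nest] += share
    return [
        share / group_totals[market, nest]
        for market, nest, share in zip(market_ids, nesting_ids, shares)
    ]

print(within_group_shares(['a', 'a', 'a'], ['x', 'x', 'y'], [0.25, 0.25, 0.125]))
# [0.5, 0.5, 1.0]
```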

    Finally, clustering groups can be specified to account for within-group correlation while updating the weighting matrix and estimating standard errors:

    • clustering_ids : (object, optional) - Cluster group IDs, which will be used if W_type or se_type in Problem.solve() is 'clustered'.

    Along with market_ids, firm_ids, nesting_ids, clustering_ids, and prices, the names of any additional fields can typically be used as variables in product_formulations. However, a few variable names, such as 'X1', are reserved for use by Products.

  • agent_formulation (Formulation, optional) – Formulation configuration for the matrix of observed agent characteristics called demographics, \(d\), which will only be included in the model if this formulation is specified. Since demographics are only used if there are nonlinear product characteristics, this formulation should only be specified if \(X_2\) is formulated in product_formulations. Variable names should correspond to fields in agent_data.

  • agent_data (structured array-like, optional) –

    Each row corresponds to an agent. Markets can have differing numbers of agents. Since simulated agents are only used if there are nonlinear product characteristics, agent data should only be specified if \(X_2\) is formulated in product_formulations. If agent data are specified, market IDs are required:

    • market_ids : (object) - IDs that associate agents with markets. The set of distinct IDs should be the same as the set in product_data. If integration is specified, there must be at least as many rows in each market as the number of nodes and weights that are built for the market.

    If integration is not specified, the following fields are required:

    • weights : (numeric, optional) - Integration weights, \(w\), for integration over agent choice probabilities.

    • nodes : (numeric, optional) - Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of nonlinear product characteristics), only the first \(K_2\) will be retained.

    The convenience function build_integration() can be useful when constructing custom nodes and weights.
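    To make the role of nodes and weights concrete, the sketch below approximates market shares in one market as a weighted sum of agent logit choice probabilities, with agent-specific utility deviations \(\mu_{ij} = x_{2j}' \Sigma \nu_i\), where \(x_{2j}\) is the \(j\)th row of \(X_2\). It is a hypothetical NumPy illustration, not pyblp code: it assumes plain Monte Carlo nodes with equal weights, ignores demographics, lacks the numerical safeguards a production implementation needs, and, as described above, keeps only the first \(K_2\) columns of nodes.

```python
import numpy as np

def simulated_shares(delta, X2, sigma, nodes, weights):
    """Approximate shares in one market by weighted agent choice probabilities.

    delta: (J,) mean utilities; X2: (J, K2) nonlinear characteristics;
    sigma: (K2, K2) coefficients on nodes; nodes: (I, >= K2); weights: (I,).
    """
    K2 = X2.shape[1]
    nodes = nodes[:, :K2]                 # extra node columns are dropped
    mu = X2 @ sigma @ nodes.T             # (J, I) agent-specific utility deviations
    exp_utilities = np.exp(delta[:, None] + mu)
    # Logit choice probabilities for each agent, with the outside good's utility at zero.
    probabilities = exp_utilities / (1 + exp_utilities.sum(axis=0, keepdims=True))
    return probabilities @ weights        # integrate over agents

rng = np.random.default_rng(0)
n_agents = 1000
nodes = rng.standard_normal((n_agents, 3))   # one more column than K2 = 2
weights = np.full(n_agents, 1 / n_agents)    # equal Monte Carlo weights
delta = np.array([1.0, 0.5])
X2 = np.array([[1.0, 2.0], [1.0, 3.0]])
sigma = np.diag([0.5, 0.2])
print(simulated_shares(delta, X2, sigma, nodes, weights))
```

    Setting sigma to zeros makes every agent identical, so the result reduces to plain logit shares, which is a handy sanity check.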

    Note

    If nodes has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes field with three columns can be replaced by three one-dimensional fields: nodes0, nodes1, and nodes2.

    Along with market_ids, the names of any additional fields can typically be used as variables in agent_formulation. The exception is the name 'demographics', which is reserved for use by Agents.

  • integration (Integration, optional) –

    Integration configuration for how to build nodes and weights for integration over agent choice probabilities, which will replace any nodes and weights fields in agent_data. This configuration is required if nodes and weights in agent_data are not specified. It should not be specified if \(X_2\) is not formulated in product_formulations.

    If this configuration is specified, \(K_2\) columns of nodes (the number of nonlinear product characteristics) will be built. However, if sigma in Problem.solve() is left unspecified or specified with columns fixed at zero, fewer columns will be used.

  • costs_type (str, optional) –

    Functional form of the marginal cost function \(\tilde{c} = f(c)\) in (9). The following specifications are supported:

    • 'linear' (default) - Linear specification: \(\tilde{c} = c\).

    • 'log' - Log-linear specification: \(\tilde{c} = \log c\).

    This specification is only relevant if \(X_3\) is formulated.
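    The two specifications differ only in the transformation \(f\) applied to marginal costs. The plain-Python sketch below, with hypothetical helper names that are not part of pyblp, maps a marginal cost \(c\) to \(\tilde{c}\) and back. Under 'log', \(\tilde{c}\) is defined only for strictly positive \(c\), so the model implies positive marginal costs.

```python
import math

def transform_costs(c, costs_type='linear'):
    """Return c~ = f(c) for the two supported costs_type values."""
    if costs_type == 'linear':
        return c
    if costs_type == 'log':
        if c <= 0:
            raise ValueError("log costs require strictly positive marginal costs")
        return math.log(c)
    raise ValueError(f"unsupported costs_type: {costs_type!r}")

def recover_costs(c_tilde, costs_type='linear'):
    """Invert f to map c~ back to marginal costs c."""
    if costs_type == 'linear':
        return c_tilde
    if costs_type == 'log':
        return math.exp(c_tilde)
    raise ValueError(f"unsupported costs_type: {costs_type!r}")

c = 2.5
for costs_type in ('linear', 'log'):
    c_tilde = transform_costs(c, costs_type)
    print(costs_type, c_tilde, recover_costs(c_tilde, costs_type))
```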

product_formulations

Formulation configurations for \(X_1\), \(X_2\), and \(X_3\), respectively.

Type

Formulation or sequence of Formulation

agent_formulation

Formulation configuration for \(d\).

Type

Formulation

products

Product data structured as Products, which consists of data taken from product_data along with matrices built according to Problem.product_formulations.

Type

Products

agents

Agent data structured as Agents, which consists of data taken from agent_data or built by integration along with any demographics built according to Problem.agent_formulation.

Type

Agents

unique_market_ids

Unique market IDs in product and agent data.

Type

ndarray

unique_firm_ids

Unique firm IDs in product data.

Type

ndarray

unique_nesting_ids

Unique nesting group IDs in product data.

Type

ndarray

costs_type

Functional form of the marginal cost function \(\tilde{c} = f(c)\).

Type

str

T

Number of markets, \(T\).

Type

int

N

Number of products across all markets, \(N\).

Type

int

F

Number of firms across all markets, \(F\).

Type

int

I

Number of agents across all markets, \(I\).

Type

int

K1

Number of linear product characteristics, \(K_1\).

Type

int

K2

Number of nonlinear product characteristics, \(K_2\).

Type

int

K3

Number of cost product characteristics, \(K_3\).

Type

int

D

Number of demographic variables, \(D\).

Type

int

MD

Number of demand-side instruments, \(M_D\), which is the number of excluded demand-side instruments plus the number of exogenous linear product characteristics, \(K_1^x\).

Type

int

MS

Number of supply-side instruments, \(M_S\), which is the number of excluded supply-side instruments plus the number of cost product characteristics, \(K_3\).

Type

int

ED

Number of absorbed dimensions of demand-side fixed effects, \(E_D\).

Type

int

ES

Number of absorbed dimensions of supply-side fixed effects, \(E_S\).

Type

int

H

Number of nesting groups, \(H\).

Type

int


Methods

solve([sigma, pi, rho, beta, gamma, …])

Solve the problem.