pyblp.Simulation

class pyblp.Simulation(product_formulations, beta, sigma, gamma, product_data, agent_formulation=None, pi=None, agent_data=None, integration=None, rho=None, xi=None, omega=None, xi_variance=1, omega_variance=1, correlation=0.9, costs_type='linear', seed=None)

Simulation of synthetic data from BLP-type models.

All data are either loaded or simulated during initialization, except for synthetic prices and shares, which are computed by Simulation.solve().

Unspecified exogenous variables that are used to formulate product characteristics in \(X_1\), \(X_2\), and \(X_3\), as well as agent demographics, \(d\), are all drawn independently from the standard uniform distribution.

Unobserved demand- and supply-side product characteristics, \(\xi\) and \(\omega\), are drawn from a mean-zero bivariate normal distribution.
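The draw described above can be sketched with the standard library alone. This is only an illustration, not the library's implementation: the helper names `draw_xi_omega` and `sample_correlation` are hypothetical, and Python's `random` module stands in for the numpy.random.RandomState generator that the seed parameter refers to.

```python
import math
import random


def draw_xi_omega(n, xi_variance=1.0, omega_variance=1.0, correlation=0.9, seed=None):
    """Draw n (xi, omega) pairs from a mean-zero bivariate normal distribution.

    Hypothetical helper for illustration; argument names mirror the
    xi_variance, omega_variance, and correlation parameters.
    """
    rng = random.Random(seed)
    xi, omega = [], []
    for _ in range(n):
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Scale and mix the draws so the pair has the requested
        # variances and correlation.
        xi.append(math.sqrt(xi_variance) * z1)
        mixed = correlation * z1 + math.sqrt(1.0 - correlation ** 2) * z2
        omega.append(math.sqrt(omega_variance) * mixed)
    return xi, omega


def sample_correlation(a, b):
    """Sample correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

With a large number of draws, `sample_correlation(*draw_xi_omega(n))` should be close to the requested correlation.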

After variables are loaded or simulated, any unspecified integration nodes and weights, \(\nu\) and \(w\), are constructed according to a specified Integration configuration.
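As a rough illustration of what constructed nodes and weights look like, the sketch below assumes a plain Monte Carlo rule: standard normal nodes and equal weights within each market. An Integration configuration may use a different rule, and the helper name `build_monte_carlo_agents` and its dictionary layout are hypothetical.

```python
import random


def build_monte_carlo_agents(market_ids, size, K2, seed=None):
    """Build `size` agents per market with K2 columns of nodes.

    Hypothetical helper: assumes Monte Carlo integration, so nodes are
    standard normal draws and weights are equal within each market.
    """
    rng = random.Random(seed)
    agents = []
    for t in market_ids:
        for _ in range(size):
            agents.append({
                'market_ids': t,
                # Equal weights that sum to one within the market.
                'weights': 1.0 / size,
                # One node per nonlinear product characteristic.
                'nodes': [rng.gauss(0.0, 1.0) for _ in range(K2)],
            })
    return agents


agents = build_monte_carlo_agents(market_ids=['t0', 't1'], size=50, K2=2, seed=1)
```

The field names match the market_ids, weights, and nodes fields of agent_data, so the result has the same shape of information that custom agent data would carry.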

Next, traditional excluded BLP instruments are constructed. Demand-side instruments are BLP instruments constructed by build_blp_instruments() from \(X_1^x\), along with any supply shifters (variables in \(X_3\) but not \(X_1\)). Supply-side instruments are BLP instruments constructed from \(X_3\), along with any demand shifters (variables in \(X_1\) but not \(X_3\)). BLP instruments for constant characteristics are constructed only if there is variation in \(J_t\), the number of products per market.

Note

These excluded instruments are constructed only for convenience. Especially for more complicated formulations, they should be replaced with better instruments. For example, instruments constructed with build_differentiation_instruments() may be preferable.
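For intuition, traditional BLP instruments for a single characteristic are commonly computed as two sums within each market: one over the same firm's other products and one over rival firms' products. The sketch below follows that common definition. It is an assumption about the general technique, not a copy of build_blp_instruments(), and the helper name `blp_instruments` is hypothetical.

```python
def blp_instruments(market_ids, firm_ids, x):
    """Return (other, rival) instrument columns for one characteristic x.

    Hypothetical helper. For product j:
      other[j] sums x over the same firm's other products in j's market;
      rival[j] sums x over rival firms' products in j's market.
    """
    other, rival = [], []
    for j in range(len(x)):
        # All other products sold in the same market as product j.
        same_market = [
            k for k in range(len(x))
            if market_ids[k] == market_ids[j] and k != j
        ]
        other.append(sum(x[k] for k in same_market if firm_ids[k] == firm_ids[j]))
        rival.append(sum(x[k] for k in same_market if firm_ids[k] != firm_ids[j]))
    return other, rival


market_ids = [0, 0, 0, 1, 1]
firm_ids = [0, 0, 1, 0, 1]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
other, rival = blp_instruments(market_ids, firm_ids, x)
```

For a constant characteristic, these sums reduce to counts of own-firm and rival products, which add up to \(J_t - 1\) for every product in market \(t\).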

Parameters
  • product_formulations (tuple) –

    Tuple of three Formulation configurations for the matrix of linear product characteristics, \(X_1\), for the matrix of nonlinear product characteristics, \(X_2\), and for the matrix of cost characteristics, \(X_3\), respectively. If the formulation for \(X_2\) is None, the logit (or nested logit) model will be simulated.

    The shares variable should not be included in any of the formulations and prices should be included in the formulation for \(X_1\) or \(X_2\) (or both). All exogenous characteristics in \(X_2\) should also be included in \(X_1\). Any additional variables that cannot be loaded from product_data will be drawn from independent standard uniform distributions. Unlike in Problem, fixed effect absorption is not supported during simulation.

  • beta (array-like) – Vector of demand-side linear parameters, \(\beta\). Elements correspond to columns in \(X_1\), which is formulated by product_formulations.

  • sigma (array-like) – Cholesky root of the covariance matrix for unobserved taste heterogeneity, \(\Sigma\), which is an upper triangular matrix. Rows and columns correspond to columns in \(X_2\), which is formulated by product_formulations. If the formulation for \(X_2\) is None, this should be None as well.

  • gamma (array-like) – Vector of supply-side linear parameters, \(\gamma\). Elements correspond to columns in \(X_3\), which is formulated by product_formulations.

  • product_data (structured array-like) –

    Each row corresponds to a product. Markets can have differing numbers of products. The convenience function build_id_data() can be used to construct the following required ID data:

    • market_ids : (object) - IDs that associate products with markets.

    • firm_ids : (object) - IDs that associate products with firms.

    Custom ownership matrices can be specified as well:

    • ownership : (numeric, optional) - Custom stacked \(J_t \times J_t\) ownership matrices, \(O\), for each market \(t\), which can be built with build_ownership(). By default, standard ownership matrices are built only when they are needed to reduce memory usage. If specified, there should be as many columns as there are products in the market with the most products. Rightmost columns in markets with fewer products will be ignored.

    Note

    If ownership has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of ownership information, an ownership field with three columns can be replaced by three one-dimensional fields: ownership0, ownership1, and ownership2.

    To simulate a nested logit or random coefficients nested logit (RCNL) model, nesting groups must be specified:

    • nesting_ids : (object, optional) - IDs that associate products with nesting groups. When these IDs are specified, rho must be specified as well.

    Along with market_ids, firm_ids, and nesting_ids, the names of any additional fields can typically be used as variables in product_formulations. However, a few variable names, such as 'X1', are reserved for use by Products.

  • agent_formulation (Formulation, optional) – Formulation configuration for the matrix of observed agent characteristics called demographics, \(d\), which will only be included in the model if this formulation is specified. Any variables that cannot be loaded from agent_data will be drawn from independent standard uniform distributions.

  • pi (array-like, optional) – Parameters that measure how agent tastes vary with demographics, \(\Pi\). Rows correspond to the same product characteristics as in sigma. Columns correspond to columns in \(d\), which is formulated by agent_formulation.

  • agent_data (structured array-like, optional) –

    Each row corresponds to an agent. Markets can have differing numbers of agents. Since simulated agents are only used if there are nonlinear product characteristics, agent data should only be specified if \(X_2\) is formulated in product_formulations. If agent data are specified, market IDs are required:

    • market_ids : (object, optional) - IDs that associate agents with markets. The set of distinct IDs should be the same as the set in product_data. If integration is specified, there must be at least as many rows in each market as the number of nodes and weights that are built for the market.

    If integration is not specified, the following fields are required:

    • weights : (numeric, optional) - Integration weights, \(w\), for integration over agent choice probabilities.

    • nodes : (numeric, optional) - Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of nonlinear product characteristics), only the first \(K_2\) will be used.

    The convenience function build_integration() can be useful when constructing custom nodes and weights.

    Note

    If nodes has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes field with three columns can be replaced by three one-dimensional fields: nodes0, nodes1, and nodes2.

    Along with market_ids, the names of any additional fields can typically be used as variables in agent_formulation. The exception is the name 'demographics', which is reserved for use by Agents.

  • integration (Integration, optional) –

    Integration configuration for how to build nodes and weights for integration over agent choice probabilities, which will replace any nodes and weights fields in agent_data. This configuration is required if nodes and weights in agent_data are not specified. It should not be specified if \(X_2\) is not formulated in product_formulations.

    If this configuration is specified, \(K_2\) columns of nodes (the number of nonlinear product characteristics) will be built. However, if sigma is left unspecified or is specified with columns fixed at zero, fewer columns will be used.

  • rho (array-like, optional) – Parameters that measure within nesting group correlation, \(\rho\). If this is a scalar, it corresponds to all groups defined by the nesting_ids field of product_data. If this is a vector, it must have \(H\) elements, one for each nesting group. Elements correspond to group IDs in the sorted order of Simulation.unique_nesting_ids. If nesting IDs were not specified, this should not be specified either.

  • xi (array-like, optional) – Unobserved demand-side product characteristics, \(\xi\). By default, each pair of unobserved characteristics in this and \(\omega\) is drawn from a mean-zero bivariate normal distribution. This must be specified if omega is specified.

  • omega (array-like, optional) – Unobserved supply-side product characteristics, \(\omega\). By default, each pair of unobserved characteristics in this and \(\xi\) is drawn from a mean-zero bivariate normal distribution. This must be specified if xi is specified.

  • xi_variance (float, optional) – Variance of \(\xi\). The default value is 1.0. This is ignored if xi and omega are specified.

  • omega_variance (float, optional) – Variance of \(\omega\). The default value is 1.0. This is ignored if xi and omega are specified.

  • correlation (float, optional) – Correlation between \(\xi\) and \(\omega\). The default value is 0.9. This is ignored if xi and omega are specified.

  • costs_type (str, optional) –

    Specification of the marginal cost function \(\tilde{c} = f(c)\) in (9). The following specifications are supported:

    • 'linear' (default) - Linear specification: \(\tilde{c} = c\).

    • 'log' - Log-linear specification: \(\tilde{c} = \log c\).

  • seed (int, optional) – Passed to numpy.random.RandomState to seed the random number generator before data are simulated. By default, a seed is not passed to the random number generator.
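The stacked layout of the ownership field in product_data can be illustrated with a small sketch. It assumes the standard ownership rule, in which an element is one when two products in the same market share a firm and zero otherwise. The helper name `build_standard_ownership` is hypothetical and is not the library's build_ownership(). Padding with zeros is an arbitrary choice here, since the rightmost columns in smaller markets are ignored.

```python
def build_standard_ownership(market_ids, firm_ids):
    """Stack J_t x J_t ownership matrices into one N x max(J_t) list of rows.

    Hypothetical helper. Columns follow the order in which each market's
    products appear in the data.
    """
    # The number of columns is the size of the largest market.
    sizes = {}
    for t in market_ids:
        sizes[t] = sizes.get(t, 0) + 1
    max_J = max(sizes.values())

    rows = []
    for t, f in zip(market_ids, firm_ids):
        # Indices of all products in the same market, in data order.
        same_market = [k for k in range(len(market_ids)) if market_ids[k] == t]
        # One if the column product is owned by the same firm, else zero.
        row = [1.0 if firm_ids[k] == f else 0.0 for k in same_market]
        # Pad smaller markets; these rightmost columns are ignored.
        row += [0.0] * (max_J - len(row))
        rows.append(row)
    return rows


ownership = build_standard_ownership(
    market_ids=[0, 0, 0, 1, 1],
    firm_ids=[0, 0, 1, 0, 1],
)
```

Each column of the result could equally be stored as a separate one-dimensional field (ownership0, ownership1, and so on), as the note on multi-column fields describes.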

product_formulations

Formulation configurations for \(X_1\), \(X_2\), and \(X_3\), respectively.

Type

tuple

agent_formulation

Formulation configuration for \(d\).

Type

Formulation

product_data

Synthetic product data that were loaded or simulated during initialization, except for synthetic prices and shares, which are computed by Simulation.solve().

Type

recarray

agent_data

Synthetic agent data that were loaded or simulated during initialization.

Type

recarray

integration

Integration configuration for how any nodes and weights were built during initialization.

Type

Integration

products

Product data structured as Products, which consists of data taken from Simulation.product_data along with matrices built according to Simulation.product_formulations.

Type

Products

agents

Agent data structured as Agents, which consists of data taken from Simulation.agent_data or built by Simulation.integration along with any demographics formulated by Simulation.agent_formulation.

Type

Agents

unique_market_ids

Unique market IDs in product and agent data.

Type

ndarray

unique_firm_ids

Unique firm IDs in product data.

Type

ndarray

unique_nesting_ids

Unique nesting IDs in product data.

Type

ndarray

beta

Demand-side linear parameters, \(\beta\).

Type

ndarray

sigma

Cholesky root of the covariance matrix for unobserved taste heterogeneity, \(\Sigma\).

Type

ndarray

gamma

Supply-side linear parameters, \(\gamma\).

Type

ndarray

pi

Parameters that measure how agent tastes vary with demographics, \(\Pi\).

Type

ndarray

rho

Parameters that measure within nesting group correlation, \(\rho\).

Type

ndarray
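The scalar and vector forms of rho described for the rho parameter can be sketched as a mapping from nesting groups to products. The helper name `expand_rho` is hypothetical. The only behavior taken from the text is that vector elements correspond to group IDs in sorted order.

```python
def expand_rho(rho, nesting_ids):
    """Return one rho value per product, given each product's nesting ID.

    Hypothetical helper: a scalar applies to every group, while a vector
    must have one element per group, matched to the sorted unique IDs.
    """
    unique_ids = sorted(set(nesting_ids))
    if isinstance(rho, (int, float)):
        by_group = {h: float(rho) for h in unique_ids}
    else:
        if len(rho) != len(unique_ids):
            raise ValueError("rho must have one element per nesting group.")
        by_group = dict(zip(unique_ids, rho))
    return [by_group[h] for h in nesting_ids]


# Groups sort as ['a', 'b'], so 0.2 belongs to 'a' and 0.7 to 'b'.
per_product = expand_rho([0.2, 0.7], nesting_ids=['b', 'a', 'b'])
```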

xi

Unobserved demand-side product characteristics, \(\xi\).

Type

ndarray

omega

Unobserved supply-side product characteristics, \(\omega\).

Type

ndarray

costs

Marginal costs, \(c\), which were constructed during initialization.

Type

ndarray

costs_type

The specification according to which Simulation.costs was constructed during initialization.

Type

str
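The two specifications can be sketched as follows. The equation referenced as (9) is not reproduced on this page, so the sketch assumes it takes the form \(\tilde{c} = X_3\gamma + \omega\). Under that assumption, 'linear' returns \(\tilde{c}\) directly and 'log' exponentiates it. The helper name `compute_costs` is hypothetical.

```python
import math


def compute_costs(X3, gamma, omega, costs_type='linear'):
    """Compute marginal costs c, one per product.

    Hypothetical helper: assumes c_tilde = X3 @ gamma + omega, with
    c_tilde = c for 'linear' and c_tilde = log(c) for 'log'.
    """
    costs = []
    for row, shock in zip(X3, omega):
        # Linear index of cost characteristics plus the unobserved shock.
        c_tilde = sum(x * g for x, g in zip(row, gamma)) + shock
        if costs_type == 'linear':
            costs.append(c_tilde)
        elif costs_type == 'log':
            # Invert c_tilde = log(c).
            costs.append(math.exp(c_tilde))
        else:
            raise ValueError("costs_type must be 'linear' or 'log'.")
    return costs


X3 = [[1.0, 2.0], [1.0, 0.0]]
linear_costs = compute_costs(X3, gamma=[0.5, 0.25], omega=[0.0, 0.5])
log_costs = compute_costs(X3, gamma=[0.5, 0.25], omega=[0.0, 0.5], costs_type='log')
```

One consequence of the log specification is that costs are always positive, whereas the linear specification can produce negative costs for sufficiently negative draws of \(\omega\).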

T

Number of markets, \(T\).

Type

int

N

Number of products across all markets, \(N\).

Type

int

F

Number of firms across all markets, \(F\).

Type

int

I

Number of agents across all markets, \(I\).

Type

int

K1

Number of linear product characteristics, \(K_1\).

Type

int

K2

Number of nonlinear product characteristics, \(K_2\).

Type

int

K3

Number of cost product characteristics, \(K_3\).

Type

int

D

Number of demographic variables, \(D\).

Type

int

MD

Number of demand-side instruments, \(M_D\), which is the number of excluded demand-side instruments plus the number of exogenous linear product characteristics, \(K_1^x\).

Type

int

MS

Number of supply-side instruments, \(M_S\), which is the number of excluded supply-side instruments plus the number of cost product characteristics, \(K_3\).

Type

int

ED

Number of absorbed dimensions of demand-side fixed effects, \(E_D\), which is always zero because simulations do not support fixed effect absorption.

Type

int

ES

Number of absorbed dimensions of supply-side fixed effects, \(E_S\), which is always zero because simulations do not support fixed effect absorption.

Type

int

H

Number of nesting groups, \(H\).

Type

int

Methods

solve([firm_ids, ownership, prices, …])

Compute synthetic prices and shares.