pyblp.Simulation

class pyblp.Simulation(product_formulations, beta, sigma, gamma, product_data, agent_formulation=None, pi=None, agent_data=None, integration=None, rho=None, xi=None, omega=None, xi_variance=1, omega_variance=1, correlation=0.9, costs_type='linear', seed=None)

Simulation of synthetic data from BLP-type models.

All data are either loaded or simulated during initialization, except for synthetic prices and shares, which are computed by Simulation.solve().

Unspecified exogenous variables that are used to formulate product characteristics in \(X_1\), \(X_2\), and \(X_3\), as well as agent demographics, \(d\), are all drawn independently from the standard uniform distribution.

Unobserved demand-side and supply-side product characteristics, \(\xi\) and \(\omega\), are drawn from a mean-zero bivariate normal distribution.
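The draw of \(\xi\) and \(\omega\) can be illustrated with a minimal sketch. It is not pyblp's implementation: the function name `draw_xi_omega` is hypothetical, the keyword names simply mirror the `xi_variance`, `omega_variance`, `correlation`, and `seed` arguments in the signature above, and Python's standard `random` module stands in for the NumPy generator that the library seeds.

```python
import math
import random


def draw_xi_omega(n, xi_variance=1.0, omega_variance=1.0, correlation=0.9, seed=None):
    """Draw n (xi, omega) pairs from a mean-zero bivariate normal distribution."""
    rng = random.Random(seed)  # stand-in for a seeded generator (assumption)
    sd_xi = math.sqrt(xi_variance)
    sd_omega = math.sqrt(omega_variance)
    xi, omega = [], []
    for _ in range(n):
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Apply the lower-triangular Cholesky factor of the 2x2 covariance matrix.
        xi.append(sd_xi * z1)
        omega.append(sd_omega * (correlation * z1 + math.sqrt(1.0 - correlation ** 2) * z2))
    return xi, omega


xi, omega = draw_xi_omega(20000, seed=0)
mean_xi = sum(xi) / len(xi)
mean_omega = sum(omega) / len(omega)
var_xi = sum((a - mean_xi) ** 2 for a in xi) / len(xi)
var_omega = sum((b - mean_omega) ** 2 for b in omega) / len(omega)
cov = sum((a - mean_xi) * (b - mean_omega) for a, b in zip(xi, omega)) / len(xi)
sample_correlation = cov / math.sqrt(var_xi * var_omega)
print(round(sample_correlation, 2))
```

With many draws, the sample correlation should be close to the requested `correlation`, and repeating the call with the same seed reproduces the same draws.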
After variables are loaded or simulated, any unspecified integration nodes and weights, \(\nu\) and \(w\), are constructed according to a specified Integration configuration.

Next, traditional excluded BLP instruments are constructed. Demand-side instruments are BLP instruments constructed by build_blp_instruments() from \(X_1^x\), along with any supply shifters (variables in \(X_3\) but not \(X_1\)). Supply-side instruments are BLP instruments constructed from \(X_3\), along with any demand shifters (variables in \(X_1\) but not \(X_3\)). BLP instruments for constant characteristics are constructed only if there is variation in \(J_t\), the number of products per market.

Note
These excluded instruments are constructed only for convenience. Especially for more complicated formulations, they should be replaced with better instruments. For example, instruments constructed with build_differentiation_instruments() may be preferable.

Parameters
product_formulations (tuple) – Tuple of three Formulation configurations for the matrix of linear product characteristics, \(X_1\), for the matrix of nonlinear product characteristics, \(X_2\), and for the matrix of cost characteristics, \(X_3\), respectively. If the formulation for \(X_2\) is None, the logit (or nested logit) model will be simulated.

The shares variable should not be included in any of the formulations and prices should be included in the formulation for \(X_1\) or \(X_2\) (or both). All exogenous characteristics in \(X_2\) should also be included in \(X_1\). Any additional variables that cannot be loaded from product_data will be drawn from independent standard uniform distributions. Unlike in Problem, fixed effect absorption is not supported during simulation.

beta (array-like) – Vector of demand-side linear parameters, \(\beta\). Elements correspond to columns in \(X_1\), which is formulated by product_formulations.

sigma (array-like) – Cholesky root of the covariance matrix for unobserved taste heterogeneity, \(\Sigma\), which is an upper triangular matrix. Rows and columns correspond to columns in \(X_2\), which is formulated by product_formulations. If the formulation for \(X_2\) is None, this should be None as well.

gamma (array-like) – Vector of supply-side linear parameters, \(\gamma\). Elements correspond to columns in \(X_3\), which is formulated by product_formulations.

product_data (structured array-like) – Each row corresponds to a product. Markets can have differing numbers of products. The convenience function build_id_data() can be used to construct the following required ID data:
market_ids : (object) - IDs that associate products with markets.
firm_ids : (object) - IDs that associate products with firms.
Custom ownership matrices can be specified as well:
ownership : (numeric, optional) - Custom stacked \(J_t \times J_t\) ownership matrices, \(O\), for each market \(t\), which can be built with build_ownership(). By default, standard ownership matrices are built only when they are needed, which reduces memory usage. If specified, there should be as many columns as there are products in the market with the most products. Rightmost columns in markets with fewer products will be ignored.
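The stacked layout described above can be sketched in plain Python. This is an illustration rather than the library's `build_ownership()`: the helper name `build_standard_ownership` is hypothetical, and it assumes the standard definition in which entry \((j, k)\) is one when products \(j\) and \(k\) belong to the same firm and zero otherwise. The padding value is also an assumption. Because padded columns are ignored, any value would do, and NaN is used here only to make them easy to spot.

```python
def build_standard_ownership(market_ids, firm_ids, fill=float("nan")):
    """Stack J_t x J_t standard ownership matrices, padded to the largest J_t."""
    # Group row indices by market, preserving row order within each market.
    rows_by_market = {}
    for row, market in enumerate(market_ids):
        rows_by_market.setdefault(market, []).append(row)
    max_j = max(len(rows) for rows in rows_by_market.values())

    stacked = [None] * len(market_ids)
    for rows in rows_by_market.values():
        for j in rows:
            # Entry is 1 when products j and k share a firm, and 0 otherwise.
            entries = [1.0 if firm_ids[j] == firm_ids[k] else 0.0 for k in rows]
            # Pad the rightmost columns in markets with fewer than max_j products.
            stacked[j] = entries + [fill] * (max_j - len(rows))
    return stacked


market_ids = ["a", "a", "a", "b", "b"]
firm_ids = [0, 0, 1, 0, 1]
ownership = build_standard_ownership(market_ids, firm_ids)
for row in ownership:
    print(row)
```

Each product contributes one row, and the number of columns equals the number of products in the largest market, which is three in this small example.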
Note
If ownership has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of ownership information, an ownership field with three columns can be replaced by three one-dimensional fields: ownership0, ownership1, and ownership2.

To simulate a nested logit or random coefficients nested logit (RCNL) model, nesting groups must be specified:
nesting_ids : (object, optional) - IDs that associate products with nesting groups. When these IDs are specified, rho must be specified as well.
Along with market_ids, firm_ids, and nesting_ids, the names of any additional fields can typically be used as variables in product_formulations. However, a few variable names, such as 'X1', are reserved for use by Products.

agent_formulation (Formulation, optional) – Formulation configuration for the matrix of observed agent characteristics called demographics, \(d\), which will only be included in the model if this formulation is specified. Any variables that cannot be loaded from agent_data will be drawn from independent standard uniform distributions.

pi (array-like, optional) – Parameters that measure how agent tastes vary with demographics, \(\Pi\). Rows correspond to the same product characteristics as in sigma. Columns correspond to columns in \(d\), which is formulated by agent_formulation.

agent_data (structured array-like, optional) – Each row corresponds to an agent. Markets can have differing numbers of agents. Since simulated agents are only used if there are nonlinear product characteristics, agent data should only be specified if \(X_2\) is formulated in product_formulations. If agent data are specified, market IDs are required:
market_ids : (object, optional) - IDs that associate agents with markets. The set of distinct IDs should be the same as the set in product_data. If integration is specified, there must be at least as many rows in each market as the number of nodes and weights that are built for the market.
If integration is not specified, the following fields are required:
weights : (numeric, optional) - Integration weights, \(w\), for integration over agent choice probabilities.
nodes : (numeric, optional) - Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of nonlinear product characteristics), only the first \(K_2\) will be used.
The convenience function build_integration() can be useful when constructing custom nodes and weights.

Note
If nodes has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes field with three columns can be replaced by three one-dimensional fields: nodes0, nodes1, and nodes2.

Along with market_ids, the names of any additional fields can typically be used as variables in agent_formulation. The exception is the name 'demographics', which is reserved for use by Agents.

integration (Integration, optional) – Integration configuration for how to build nodes and weights for integration over agent choice probabilities, which will replace any nodes and weights fields in agent_data. This configuration is required if nodes and weights in agent_data are not specified. It should not be specified if \(X_2\) is not formulated in product_formulations.

If this configuration is specified, \(K_2\) columns of nodes (the number of nonlinear product characteristics) will be built. However, if sigma is left unspecified or is specified with columns fixed at zero, fewer columns will be used.

rho (array-like, optional) – Parameters that measure within nesting group correlation, \(\rho\). If this is a scalar, it corresponds to all groups defined by the nesting_ids field of product_data. If this is a vector, it must have \(H\) elements, one for each nesting group. Elements correspond to group IDs in the sorted order of Simulation.unique_nesting_ids. If nesting IDs were not specified, this should not be specified either.

xi (array-like, optional) – Unobserved demand-side product characteristics, \(\xi\). By default, each pair of unobserved characteristics in this and \(\omega\) is drawn from a mean-zero bivariate normal distribution. This must be specified if omega is specified.

omega (array-like, optional) – Unobserved supply-side product characteristics, \(\omega\). By default, each pair of unobserved characteristics in this and \(\xi\) is drawn from a mean-zero bivariate normal distribution. This must be specified if xi is specified.

xi_variance (float, optional) – Variance of \(\xi\). The default value is 1.0. This is ignored if xi and omega are specified.

omega_variance (float, optional) – Variance of \(\omega\). The default value is 1.0. This is ignored if xi and omega are specified.

correlation (float, optional) – Correlation between \(\xi\) and \(\omega\). The default value is 0.9. This is ignored if xi and omega are specified.

costs_type (str, optional) –
Specification of the marginal cost function \(\tilde{c} = f(c)\) in (9). The following specifications are supported:
'linear' (default) - Linear specification: \(\tilde{c} = c\).
'log' - Log-linear specification: \(\tilde{c} = \log c\).
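The two specifications can be sketched as follows. The sketch assumes that equation (9), which is not reproduced on this page, has the form \(\tilde{c} = f(c) = X_3\gamma + \omega\). The helper name `compute_costs` and the input values are hypothetical.

```python
import math


def compute_costs(x3, gamma, omega, costs_type="linear"):
    """Recover marginal costs c from c_tilde = x3 @ gamma + omega (assumed form of (9))."""
    costs = []
    for row, shock in zip(x3, omega):
        c_tilde = sum(x * g for x, g in zip(row, gamma)) + shock
        if costs_type == "linear":
            costs.append(c_tilde)  # c_tilde = c
        elif costs_type == "log":
            costs.append(math.exp(c_tilde))  # c_tilde = log(c), so c = exp(c_tilde)
        else:
            raise ValueError("costs_type must be 'linear' or 'log'.")
    return costs


x3 = [[1.0, 0.5], [1.0, 2.0]]  # two products, two cost characteristics
gamma = [2.0, 1.0]
omega = [0.1, -0.3]
linear_costs = compute_costs(x3, gamma, omega, "linear")
log_costs = compute_costs(x3, gamma, omega, "log")
print(linear_costs)
print(log_costs)
```

Under this assumed form, the log-linear specification always yields strictly positive costs, whereas the linear specification can produce negative costs when \(\omega\) is sufficiently negative.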
seed (int, optional) – Passed to numpy.random.RandomState to seed the random number generator before data are simulated. By default, a seed is not passed to the random number generator.

product_formulations
Formulation configurations for \(X_1\), \(X_2\), and \(X_3\), respectively.
Type: tuple

agent_formulation
Formulation configuration for \(d\).
Type: tuple

product_data
Synthetic product data that were loaded or simulated during initialization, except for synthetic prices and shares, which are computed by Simulation.solve().
Type: recarray

agent_data
Synthetic agent data that were loaded or simulated during initialization.
Type: recarray

integration
Integration configuration for how any nodes and weights were built during initialization.
Type: Integration

products
Product data structured as Products, which consists of data taken from Simulation.product_data along with matrices built according to Simulation.product_formulations.
Type: Products

agents
Agent data structured as Agents, which consists of data taken from Simulation.agent_data or built by Simulation.integration along with any demographics formulated by Simulation.agent_formulation.
Type: Agents

unique_market_ids
Unique market IDs in product and agent data.
Type: ndarray

unique_firm_ids
Unique firm IDs in product data.
Type: ndarray

unique_nesting_ids
Unique nesting IDs in product data.
Type: ndarray

beta
Demand-side linear parameters, \(\beta\).
Type: ndarray

sigma
Cholesky root of the covariance matrix for unobserved taste heterogeneity, \(\Sigma\).
Type: ndarray

gamma
Supply-side linear parameters, \(\gamma\).
Type: ndarray

pi
Parameters that measure how agent tastes vary with demographics, \(\Pi\).
Type: ndarray

rho
Parameters that measure within nesting group correlation, \(\rho\).
Type: ndarray

xi
Unobserved demand-side product characteristics, \(\xi\).
Type: ndarray

omega
Unobserved supply-side product characteristics, \(\omega\).
Type: ndarray

costs
Marginal costs, \(c\), which were constructed during initialization.
Type: ndarray

costs_type
The specification according to which Simulation.costs was constructed during initialization.
Type: str

T
Number of markets, \(T\).
Type: int

N
Number of products across all markets, \(N\).
Type: int

F
Number of firms across all markets, \(F\).
Type: int

I
Number of agents across all markets, \(I\).
Type: int

K1
Number of linear product characteristics, \(K_1\).
Type: int

K2
Number of nonlinear product characteristics, \(K_2\).
Type: int

K3
Number of cost product characteristics, \(K_3\).
Type: int

D
Number of demographic variables, \(D\).
Type: int

MD
Number of demand-side instruments, \(M_D\), which is the number of excluded demand-side instruments plus the number of exogenous linear product characteristics, \(K_1^x\).
Type: int

MS
Number of supply-side instruments, \(M_S\), which is the number of excluded supply-side instruments plus the number of cost product characteristics, \(K_3\).
Type: int

ED
Number of absorbed dimensions of demand-side fixed effects, \(E_D\), which is always zero because simulations do not support fixed effect absorption.
Type: int

ES
Number of absorbed dimensions of supply-side fixed effects, \(E_S\), which is always zero because simulations do not support fixed effect absorption.
Type: int

H
Number of nesting groups, \(H\).
Type: int
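
How the counts \(T\), \(N\), \(F\), and \(H\) relate to the ID fields of product_data can be shown with a small plain-Python sketch. It does not call pyblp, and the helper name `count_dimensions` and the sample IDs are hypothetical. Reporting \(H\) as zero when no nesting IDs are given is an assumption. The sketch also reports whether \(J_t\) varies across markets, which is the condition under which BLP instruments for constant characteristics are constructed.

```python
from collections import Counter


def count_dimensions(market_ids, firm_ids, nesting_ids=None):
    """Count markets, products, firms, and nesting groups from ID columns."""
    products_per_market = Counter(market_ids)  # J_t for each market t
    return {
        "T": len(products_per_market),
        "N": len(market_ids),
        "F": len(set(firm_ids)),
        # Assumption: H is zero when no nesting IDs are given.
        "H": len(set(nesting_ids)) if nesting_ids is not None else 0,
        "J_t": dict(products_per_market),
        # True when the number of products differs across markets.
        "J_t_varies": len(set(products_per_market.values())) > 1,
    }


market_ids = ["a", "a", "a", "b", "b"]
firm_ids = [0, 0, 1, 0, 1]
nesting_ids = ["x", "y", "x", "x", "y"]
dims = count_dimensions(market_ids, firm_ids, nesting_ids)
print(dims)
```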
Examples
Methods
solve([firm_ids, ownership, prices, …])
Compute synthetic prices and shares.