Download the Jupyter Notebook for this section: `simulation.ipynb`

# Problem Simulation Tutorial¶

```
[1]:
```

```
import pyblp
import numpy as np
import pandas as pd
pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__
```

```
[1]:
```

```
'0.8.1'
```

Before configuring and solving a problem with real data, it may be a good idea to perform Monte Carlo analysis on simulated data to verify that it is possible to accurately estimate model parameters. For example, before configuring and solving the example problems in the prior tutorials, it may have been a good idea to simulate data according to the assumed models of supply and demand. During such Monte Carlo anaysis, the data would only be used to determine sample sizes and perhaps to choose reasonable true parameters.

Simulations are configured with the Simulation class, which requires many of the same inputs as Problem. The two main differences are:

Variables in formulations that cannot be loaded from

`product_data`

or`agent_data`

will be drawn from independent uniform distributions.True parameters and the distribution of unobserved product characteristics are specified.

First, we’ll use build_id_data to build market and firm IDs for a model in which there are \(T = 50\) markets, and in each market \(t\), a total of \(J_t = 20\) products produced by \(F = 10\) firms.

```
[2]:
```

```
id_data = pyblp.build_id_data(T=50, J=20, F=10)
```

Next, we’ll create an Integration configuration to build agent data according to a Gauss-Hermite product rule that exactly integrates polynomials of degree \(2 \times 9 - 1 = 17\) or less.

```
[3]:
```

```
integration = pyblp.Integration('product', 9)
integration
```

```
[3]:
```

```
Configured to construct nodes and weights according to the level-9 Gauss-Hermite product rule.
```

We’ll then pass these data to Simulation. We’ll use Formulation configurations to create an \(X_1\) that consists of a constant, prices, and an exogenous characteristic; an \(X_2\) that consists only of the same exogenous characteristic; and an \(X_3\) that consists of the common exogenous characteristic and a cost-shifter.

```
[4]:
```

```
simulation = pyblp.Simulation(
product_formulations=(
pyblp.Formulation('1 + prices + x'),
pyblp.Formulation('0 + x'),
pyblp.Formulation('0 + x + z')
),
beta=[1, -2, 2],
sigma=1,
gamma=[1, 4],
product_data=id_data,
integration=integration,
seed=0
)
simulation
```

```
[4]:
```

```
Dimensions:
=====================================
T N F I K1 K2 K3
--- ---- --- --- ---- ---- ----
50 1000 10 450 3 1 2
=====================================
Formulations:
=================================================
Column Indices: 0 1 2
------------------------------- --- ------ ---
X1: Linear Characteristics 1 prices x
X2: Nonlinear Characteristics x
X3: Linear Cost Characteristics x z
=================================================
Nonlinear Coefficient True Values:
================
Sigma: x
------ --------
x +1.0E+00
================
Beta True Values:
============================
1 prices x
-------- -------- --------
+1.0E+00 -2.0E+00 +2.0E+00
============================
Gamma True Values:
==================
x z
-------- --------
+1.0E+00 +4.0E+00
==================
```

When Simulation is initialized, it constructs Simulation.agent_data and simulates Simulation.product_data.

The Simulation can be further configured with other arguments that determine how unobserved product characteristics are simulated and how marginal costs are specified.

At this stage, simulated variables are not consistent with true parameters, so we still need to solve the simulation with Simulation.replace_endogenous. This method replaced simulated prices and marketshares with values that are consistent with the true parameters. Just like ProblemResults.compute_prices, to do so it iterates over the \(\zeta\)-markup equation from Morrow and Skerlos (2011).

```
[5]:
```

```
simulation_results = simulation.replace_endogenous()
simulation_results
```

```
[5]:
```

```
Simulation Results Summary:
=====================================
Computation Fixed Point Contraction
Time Iterations Evaluations
----------- ----------- -----------
00:00:00 721 721
=====================================
```

Now, we can try to recover the true parameters by creating and solving a Problem.

The convenience method SimulationResults.to_problem constructs some basic “sums of characteristics” BLP instruments that are functions of all exogenous numerical variables in the problem. In this example, excluded demand-side instruments are the cost-shifter `z`

and traditional BLP instruments constructed from `x`

. Excluded supply-side instruments are traditional BLP instruments constructed from `x`

and `z`

.

```
[6]:
```

```
problem = simulation_results.to_problem()
problem
```

```
[6]:
```

```
Dimensions:
=================================================
T N F I K1 K2 K3 MD MS
--- ---- --- --- ---- ---- ---- ---- ----
50 1000 10 450 3 1 2 5 6
=================================================
Formulations:
=================================================
Column Indices: 0 1 2
------------------------------- --- ------ ---
X1: Linear Characteristics 1 prices x
X2: Nonlinear Characteristics x
X3: Linear Cost Characteristics x z
=================================================
```

We’ll choose starting values that are half the true parameters so that the optimization routine has to do some work. Note that since we’re jointly estimating the supply side, we need to provide an initial value for the linear coefficient on prices because this parameter cannot be concentrated out of the problem (unlike linear coefficients on exogenous characteristics).

```
[7]:
```

```
results = problem.solve(
sigma=0.5 * simulation.sigma,
pi=0.5 * simulation.pi,
beta=[None, 0.5 * simulation.beta[1], None]
)
results
```

```
[7]:
```

```
Problem Results Summary:
=====================================================================================================
GMM Objective Projected Reduced Hessian Reduced Hessian Weighting Matrix Covariance Matrix
Step Value Gradient Norm Min Eigenvalue Max Eigenvalue Condition Number Condition Number
---- --------- ------------- --------------- --------------- ---------------- -----------------
2 +6.4E-03 +1.9E-10 +7.2E-03 +3.8E+00 +3.7E+04 +1.5E+04
=====================================================================================================
Cumulative Statistics:
================================================================
Computation Optimization Objective Fixed Point Contraction
Time Iterations Evaluations Iterations Evaluations
----------- ------------ ----------- ----------- -----------
00:00:23 25 30 7762 24616
================================================================
Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
==================
Sigma: x
------ ----------
x +7.8E-01
(+5.2E-01)
==================
Beta Estimates (Robust SEs in Parentheses):
==================================
1 prices x
---------- ---------- ----------
+9.6E-01 -2.0E+00 +2.1E+00
(+9.3E-02) (+2.4E-02) (+1.4E-01)
==================================
Gamma Estimates (Robust SEs in Parentheses):
======================
x z
---------- ----------
+9.8E-01 +4.0E+00
(+8.7E-02) (+8.5E-02)
======================
```

The parameters seem to have been estimated reasonably well.

```
[8]:
```

```
np.c_[simulation.beta, results.beta]
```

```
[8]:
```

```
array([[ 1. , 0.96223514],
[-2. , -2.00792431],
[ 2. , 2.10032015]])
```

```
[9]:
```

```
np.c_[simulation.gamma, results.gamma]
```

```
[9]:
```

```
array([[1. , 0.97820624],
[4. , 4.03121577]])
```

```
[10]:
```

```
np.c_[simulation.sigma, results.sigma]
```

```
[10]:
```

```
array([[1. , 0.78358853]])
```

In addition to checking that the configuration for a model based on actual data makes sense, the Simulation class can also be a helpful tool for better understanding under what general conditions BLP models can be accurately estimated. Simulations are also used extensively in pyblp’s test suite.