Download the Jupyter Notebook for this section: simulation.ipynb

Problem Simulation Tutorial

[1]:
import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__
[1]:
'1.1.0'

Before configuring and solving a problem with real data, it may be a good idea to perform Monte Carlo analysis on simulated data to verify that it is possible to accurately estimate model parameters. For example, before configuring and solving the example problems in the prior tutorials, it may have been a good idea to simulate data according to the assumed models of supply and demand. During such Monte Carlo anaysis, the data would only be used to determine sample sizes and perhaps to choose reasonable true parameters.

Simulations are configured with the Simulation class, which requires many of the same inputs as Problem. The two main differences are:

  1. Variables in formulations that cannot be loaded from product_data or agent_data will be drawn from independent uniform distributions.

  2. True parameters and the distribution of unobserved product characteristics are specified.

First, we’ll use build_id_data to build market and firm IDs for a model in which there are \(T = 50\) markets, and in each market \(t\), a total of \(J_t = 20\) products produced by \(F = 10\) firms.

[2]:
id_data = pyblp.build_id_data(T=50, J=20, F=10)

Next, we’ll create an Integration configuration to build agent data according to a Gauss-Hermite product rule that exactly integrates polynomials of degree \(2 \times 9 - 1 = 17\) or less.

[3]:
integration = pyblp.Integration('product', 9)
integration
[3]:
Configured to construct nodes and weights according to the level-9 Gauss-Hermite product rule with options {}.

We’ll then pass these data to Simulation. We’ll use Formulation configurations to create an \(X_1\) that consists of a constant, prices, and an exogenous characteristic; an \(X_2\) that consists only of the same exogenous characteristic; and an \(X_3\) that consists of the common exogenous characteristic and a cost-shifter.

[4]:
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('1 + prices + x'),
       pyblp.Formulation('0 + x'),
       pyblp.Formulation('0 + x + z')
   ),
   beta=[1, -2, 2],
   sigma=1,
   gamma=[1, 4],
   product_data=id_data,
   integration=integration,
   seed=0
)
simulation
[4]:
Dimensions:
=====================================
 T    N     F    I    K1    K2    K3
---  ----  ---  ---  ----  ----  ----
50   1000  10   450   3     1     2
=====================================

Formulations:
=================================================
        Column Indices:           0     1      2
-------------------------------  ---  ------  ---
  X1: Linear Characteristics      1   prices   x
 X2: Nonlinear Characteristics    x
X3: Linear Cost Characteristics   x     z
=================================================

Nonlinear Coefficient True Values:
================
Sigma:     x
------  --------
  x     +1.0E+00
================

Beta True Values:
============================
   1       prices      x
--------  --------  --------
+1.0E+00  -2.0E+00  +2.0E+00
============================

Gamma True Values:
==================
   x         z
--------  --------
+1.0E+00  +4.0E+00
==================

When Simulation is initialized, it constructs Simulation.agent_data and simulates Simulation.product_data.

The Simulation can be further configured with other arguments that determine how unobserved product characteristics are simulated and how marginal costs are specified.

At this stage, simulated variables are not consistent with true parameters, so we still need to solve the simulation with Simulation.replace_endogenous. This method replaced simulated prices and market shares with values that are consistent with the true parameters. Just like ProblemResults.compute_prices, to do so it iterates over the \(\zeta\)-markup equation from Morrow and Skerlos (2011).

[5]:
simulation_results = simulation.replace_endogenous()
simulation_results
[5]:
Simulation Results Summary:
======================================================================================================
Computation  Fixed Point  Fixed Point  Contraction  Profit Gradients  Profit Hessians  Profit Hessians
   Time       Failures    Iterations   Evaluations      Max Norm      Min Eigenvalue   Max Eigenvalue
-----------  -----------  -----------  -----------  ----------------  ---------------  ---------------
 00:00:00         0           721          721          +1.3E-13         -8.4E-01         -9.6E-06
======================================================================================================

Now, we can try to recover the true parameters by creating and solving a Problem.

The convenience method SimulationResults.to_problem constructs some basic “sums of characteristics” BLP instruments that are functions of all exogenous numerical variables in the problem. In this example, excluded demand-side instruments are the cost-shifter z and traditional BLP instruments constructed from x. Excluded supply-side instruments are traditional BLP instruments constructed from x and z.

[6]:
problem = simulation_results.to_problem()
problem
[6]:
Dimensions:
=================================================
 T    N     F    I    K1    K2    K3    MD    MS
---  ----  ---  ---  ----  ----  ----  ----  ----
50   1000  10   450   3     1     2     5     6
=================================================

Formulations:
=================================================
        Column Indices:           0     1      2
-------------------------------  ---  ------  ---
  X1: Linear Characteristics      1   prices   x
 X2: Nonlinear Characteristics    x
X3: Linear Cost Characteristics   x     z
=================================================

We’ll choose starting values that are half the true parameters so that the optimization routine has to do some work. Note that since we’re jointly estimating the supply side, we need to provide an initial value for the linear coefficient on prices because this parameter cannot be concentrated out of the problem (unlike linear coefficients on exogenous characteristics).

[8]:
results = problem.solve(
    sigma=0.5 * simulation.sigma,
    pi=0.5 * simulation.pi,
    beta=[None, 0.5 * simulation.beta[1, 0], None],
    optimization=pyblp.Optimization('l-bfgs-b', {'gtol': 1e-5})
)
results
[8]:
Problem Results Summary:
==============================================================================================================
GMM   Objective    Projected    Reduced Hessian  Reduced Hessian  Clipped  Weighting Matrix  Covariance Matrix
Step    Value    Gradient Norm  Min Eigenvalue   Max Eigenvalue   Shares   Condition Number  Condition Number
----  ---------  -------------  ---------------  ---------------  -------  ----------------  -----------------
 2    +6.4E+00     +6.9E-08        +7.2E+00         +3.8E+03         0         +3.7E+04          +1.5E+04
==============================================================================================================

Cumulative Statistics:
===========================================================================
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:04       Yes          23           30          8289         26305
===========================================================================

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
==================
Sigma:      x
------  ----------
  x      +7.8E-01
        (+5.2E-01)
==================

Beta Estimates (Robust SEs in Parentheses):
==================================
    1         prices        x
----------  ----------  ----------
 +9.6E-01    -2.0E+00    +2.1E+00
(+9.3E-02)  (+2.4E-02)  (+1.4E-01)
==================================

Gamma Estimates (Robust SEs in Parentheses):
======================
    x           z
----------  ----------
 +9.8E-01    +4.0E+00
(+8.7E-02)  (+8.5E-02)
======================

The parameters seem to have been estimated reasonably well.

[9]:
np.c_[simulation.beta, results.beta]
[9]:
array([[ 1.        ,  0.96223514],
       [-2.        , -2.00792431],
       [ 2.        ,  2.10032015]])
[10]:
np.c_[simulation.gamma, results.gamma]
[10]:
array([[1.        , 0.97820624],
       [4.        , 4.03121577]])
[11]:
np.c_[simulation.sigma, results.sigma]
[11]:
array([[1.        , 0.78358853]])

In addition to checking that the configuration for a model based on actual data makes sense, the Simulation class can also be a helpful tool for better understanding under what general conditions BLP models can be accurately estimated. Simulations are also used extensively in pyblp’s test suite.