Download the Jupyter Notebook for this section: simulation.ipynb

Problem Simulation Tutorial

[1]:
import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__
[1]:
'0.7.0'

Before configuring and solving a problem with real data, it may be a good idea to perform Monte Carlo analysis on simulated data to verify that it is possible to accurately estimate model parameters. For example, before configuring and solving the example problems in the prior tutorials, it may have been a good idea to simulate data according to the assumed models of supply and demand. During such Monte Carlo anaysis, the data would only be used to determine sample sizes and perhaps to choose reasonable true parameters.

Simulations are configured with the Simulation class, which requires many of the same inputs as Problem. The two main differences are:

  1. Variables in formulations that cannot be loaded from product_data or agent_data will be drawn from independent uniform distributions.

  2. True parameters and the distribution of unobserved product characteristics are specified.

First, we’ll use build_id_data to build market and firm IDs for a model in which there are \(T = 50\) markets, and in each market \(t\), a total of \(J_t = 20\) products produced by \(F = 10\) firms.

[2]:
id_data = pyblp.build_id_data(T=50, J=20, F=10)

Next, we’ll create an Integration configuration to build agent data according to a Gauss-Hermite product rule that exactly integrates polynomials of degree \(2 \times 9 - 1 = 17\) or less.

[3]:
integration = pyblp.Integration('product', 9)
integration
[3]:
Configured to construct nodes and weights according to the level-9 Gauss-Hermite product rule.

We’ll then pass these data to Simulation. We’ll use Formulation configurations to create an \(X_1\) that consists of a constant, prices, and an exogenous characteristic; an \(X_2\) that consists only of the same exogenous characteristic; and an \(X_3\) that consists of the common exogenous characteristic and a cost-shifter.

[4]:
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('1 + prices + x'),
       pyblp.Formulation('0 + x'),
       pyblp.Formulation('0 + x + z')
   ),
   beta=[1, -2, 2],
   sigma=1,
   gamma=[1, 4],
   product_data=id_data,
   integration=integration,
   seed=0
)
simulation
[4]:
Dimensions:
=================================================
 T    N     F    I    K1    K2    K3    MD    MS
---  ----  ---  ---  ----  ----  ----  ----  ----
50   1000  10   450   3     1     2     5     6
=================================================

Formulations:
===================================================
       Column Indices:           0      1       2
-----------------------------  -----  ------  -----
 X1: Linear Characteristics      1    prices    x
X2: Nonlinear Characteristics    x
  X3: Cost Characteristics       x      z
===================================================

Nonlinear Coefficient True Values:
==================
Sigma:      x
------  ----------
  x      +1.0E+00
==================

Beta True Values:
==================================
    1         prices        x
----------  ----------  ----------
 +1.0E+00    -2.0E+00    +2.0E+00
==================================

Gamma True Values:
======================
    x           z
----------  ----------
 +1.0E+00    +4.0E+00
======================

When Simulation is initialized, it constructs Simulation.agent_data and simulates all Simulation.product_data except for prices and shares, which are initialized as zeros and need to be solved for.

The excluded instruments in Simulation.product_data include basic instruments computed with build_blp_instruments that are functions of all exogenous numerical variables in the problem. In this example, excluded demand-side instruments are the cost-shifter z and traditional BLP instruments constructed from x. Excluded supply-side instruments are traditional BLP instruments constructed from x and z.

The Simulation can be further configured with other arguments that determine how unobserved product characteristics are simulated and how marginal costs are specified.

Since at this stage prices and shares are all zeros, we still need to solve the simulation with Simulation.solve. This method computes synthetic prices and shares. Just like ProblemResults.compute_prices, to do so it iterates over the \(\zeta\)-markup equation from Morrow and Skerlos (2011).

[5]:
simulation_results = simulation.solve()
simulation_results
[5]:
Simulation Results Summary:
=====================================
Computation  Fixed Point  Contraction
   Time      Iterations   Evaluations
-----------  -----------  -----------
 00:00:01        697          697
=====================================

Now, we can try to recover the true parameters by creating and solving a Problem. By default, the convenience method SimulationResults.to_problem uses the same formulations and unobserved agent data as the simulation, so estimation is relatively easy. However, we’ll choose starting values that are half the true parameters so that the optimization routine has to do some work. Note that since we’re jointly estimating the supply side, we need to provide an initial value for the linear coefficient on prices because this parameter cannot be concentrated out of the problem (unlike linear coefficients on exogenous characteristics).

[6]:
problem = simulation_results.to_problem()
problem
[6]:
Dimensions:
=================================================
 T    N     F    I    K1    K2    K3    MD    MS
---  ----  ---  ---  ----  ----  ----  ----  ----
50   1000  10   450   3     1     2     5     6
=================================================

Formulations:
===================================================
       Column Indices:           0      1       2
-----------------------------  -----  ------  -----
 X1: Linear Characteristics      1    prices    x
X2: Nonlinear Characteristics    x
  X3: Cost Characteristics       x      z
===================================================
[7]:
results = problem.solve(
   sigma=0.5 * simulation.sigma,
   pi=0.5 * simulation.pi,
   beta=[None, 0.5 * simulation.beta[1], None]
)
results
[7]:
Problem Results Summary:
========================================================================================================================
                                                                                                   Smallest    Largest
Computation  GMM   Optimization   Objective   Fixed Point  Contraction  Objective    Gradient      Hessian     Hessian
   Time      Step   Iterations   Evaluations  Iterations   Evaluations    Value    Infinity Norm  Eigenvalue  Eigenvalue
-----------  ----  ------------  -----------  -----------  -----------  ---------  -------------  ----------  ----------
 00:00:42     2         27           33          8318         26498     +9.3E+03     +2.2E-02      +9.5E+03    +4.9E+06
========================================================================================================================

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
==================
Sigma:      x
------  ----------
  x      +1.5E+00
        (+4.5E-01)
==================

Beta Estimates (Robust SEs in Parentheses):
==================================
    1         prices        x
----------  ----------  ----------
 +1.1E+00    -2.0E+00    +1.8E+00
(+1.2E-01)  (+2.3E-02)  (+1.7E-01)
==================================

Gamma Estimates (Robust SEs in Parentheses):
======================
    x           z
----------  ----------
 +9.3E-01    +4.2E+00
(+8.5E-02)  (+7.4E-02)
======================

The parameters seem to have been estimated reasonably well.

[8]:
np.c_[simulation.beta, results.beta]
[8]:
array([[ 1.        ,  1.05244587],
       [-2.        , -1.96850864],
       [ 2.        ,  1.84086991]])
[9]:
np.c_[simulation.gamma, results.gamma]
[9]:
array([[1.        , 0.93276371],
       [4.        , 4.17298671]])
[10]:
np.c_[simulation.sigma, results.sigma]
[10]:
array([[1.        , 1.48688669]])

In addition to checking that the configuration for a model based on actual data makes sense, the Simulation class can also be a helpful tool for better understanding under what general conditions BLP models can be accurately estimated. Simulations are also used extensively in pyblp’s test suite.