Building Differentiation Instruments Example

[1]:
import pyblp
import numpy as np
import pandas as pd

np.set_printoptions(precision=3)
pyblp.__version__
[1]:
'1.1.0'

In this example, we’ll load the automobile product data from Berry, Levinsohn, and Pakes (1995), build some very simple excluded demand-side instruments for the problem in the spirit of Gandhi and Houde (2017), and demonstrate how to update the problem data to use these instruments instead of the default ones.

[2]:
product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION)
product_data.head()
[2]:
market_ids clustering_ids car_ids firm_ids region shares prices hpwt air mpd ... supply_instruments2 supply_instruments3 supply_instruments4 supply_instruments5 supply_instruments6 supply_instruments7 supply_instruments8 supply_instruments9 supply_instruments10 supply_instruments11
0 1971 AMGREM71 129 15 US 0.001051 4.935802 0.528997 0 1.888146 ... 0.0 1.705933 1.595656 87.0 -61.959985 0.0 46.060389 29.786989 0.0 1.888146
1 1971 AMHORN71 130 15 US 0.000670 5.516049 0.494324 0 1.935989 ... 0.0 1.680910 1.490295 87.0 -61.959985 0.0 46.060389 29.786989 0.0 1.935989
2 1971 AMJAVL71 132 15 US 0.000341 7.108642 0.467613 0 1.716799 ... 0.0 1.801067 1.357703 87.0 -61.959985 0.0 46.060389 29.786989 0.0 1.716799
3 1971 AMMATA71 134 15 US 0.000522 6.839506 0.426540 0 1.687871 ... 0.0 1.818061 1.261347 87.0 -61.959985 0.0 46.060389 29.786989 0.0 1.687871
4 1971 AMAMBS71 136 15 US 0.000442 8.928395 0.452489 0 1.504286 ... 0.0 1.933210 1.237365 87.0 -61.959985 0.0 46.060389 29.786989 0.0 1.504286

5 rows × 33 columns

We’ll first build “local” differentiation instruments, which are the default version and which consist of counts of “close” rival and non-rival products in each market. Note that the formulation begins with 0 to exclude the constant column, because including it yields collinear constant columns of differentiation instruments.

[3]:
formulation = pyblp.Formulation('0 + hpwt + air + mpd')
local_instruments = pyblp.build_differentiation_instruments(
    formulation,
    product_data
)
local_instruments
[3]:
array([[ 4.,  4.,  4., 42., 87., 83.],
       [ 4.,  4.,  4., 53., 87., 84.],
       [ 4.,  4.,  4., 51., 87., 78.],
       ...,
       [ 0.,  0.,  0., 86., 70., 62.],
       [ 1.,  1.,  1.,  3., 58., 91.],
       [ 1.,  1.,  1., 13., 58., 72.]])
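To make the counting concrete, here is a minimal NumPy sketch of the idea on a hypothetical toy market with one characteristic. The toy data and the closeness threshold (the standard deviation of all pairwise differences) are assumptions made for illustration, and the exact rule pyblp applies may differ. Non-rival products are taken to be the other products of the same firm.

```python
import numpy as np

# Hypothetical toy market: four products, two firms, one characteristic.
firm_ids = np.array([1, 1, 2, 2])
x = np.array([0.50, 0.55, 0.90, 0.52])

# Pairwise differences: d[j, r] = x[r] - x[j].
d = x[None, :] - x[:, None]

# Assumed closeness threshold: standard deviation of the pairwise differences.
threshold = d.std()

# Product r is "close" to product j if their difference is below the threshold.
close = np.abs(d) < threshold
np.fill_diagonal(close, False)  # a product is not its own neighbor

same_firm = firm_ids[:, None] == firm_ids[None, :]

# Counts of close non-rival (same-firm) and close rival products.
local_other = (close & same_firm).sum(axis=1)
local_rival = (close & ~same_firm).sum(axis=1)

print(local_other)
print(local_rival)
```

The leading columns of the array above (4, 4, 4 for the first products) are small relative to the trailing ones, which is consistent with same-firm counts coming first and rival counts second.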

Next, we’ll build a more continuous “quadratic” version of the instruments, which consists of sums of squared differences in characteristics between each product and the rival and non-rival products in its market.

[4]:
quadratic_instruments = pyblp.build_differentiation_instruments(
    formulation,
    product_data,
    version='quadratic'
)
quadratic_instruments
[4]:
array([[2.132e-02, 0.000e+00, 2.191e-01, 2.011e+00, 0.000e+00, 1.208e+01],
       [8.261e-03, 0.000e+00, 2.983e-01, 2.014e+00, 0.000e+00, 1.198e+01],
       [6.397e-03, 0.000e+00, 1.234e-01, 2.159e+00, 0.000e+00, 1.568e+01],
       ...,
       [0.000e+00, 0.000e+00, 0.000e+00, 2.239e+00, 6.000e+01, 1.312e+02],
       [1.467e-02, 0.000e+00, 6.317e-02, 1.864e+01, 7.100e+01, 6.185e+01],
       [1.467e-02, 0.000e+00, 6.317e-02, 8.961e+00, 7.100e+01, 8.819e+01]])
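The quadratic version can be sketched the same way. The toy data below is hypothetical, and the snippet only illustrates the sum of squared differences described above rather than reproducing pyblp's implementation.

```python
import numpy as np

# Hypothetical toy market: four products, two firms, one characteristic.
firm_ids = np.array([1, 1, 2, 2])
x = np.array([0.50, 0.55, 0.90, 0.52])

# Pairwise differences: d[j, r] = x[r] - x[j]. The diagonal is zero,
# so a product contributes nothing to its own sums.
d = x[None, :] - x[:, None]
same_firm = firm_ids[:, None] == firm_ids[None, :]

# Sums of squared differences over non-rival (same-firm) and rival products.
quadratic_other = np.where(same_firm, d**2, 0.0).sum(axis=1)
quadratic_rival = np.where(~same_firm, d**2, 0.0).sum(axis=1)

print(quadratic_other)
print(quadratic_rival)
```

Unlike the counts, these values change smoothly as characteristics move, which is the sense in which this version is more continuous.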

We could also use interact=True to include interaction terms in either version of the instruments, which would help capture covariances between different product characteristics.
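As a rough illustration of what an interaction term adds, the sketch below computes a cross-product sum for two characteristics in the quadratic case. The toy data is hypothetical, and the cross-product form is an assumption about how interactions enter, so treat it as a reading of the idea rather than a description of pyblp's output.

```python
import numpy as np

# Hypothetical toy market: three products, two firms, two characteristics.
firm_ids = np.array([1, 1, 2])
X = np.array([
    [1.0, 0.0],
    [2.0, 1.0],
    [4.0, 0.0],
])

# Pairwise differences per characteristic: d[j, r, k] = X[r, k] - X[j, k].
d = X[None, :, :] - X[:, None, :]
same_firm = firm_ids[:, None] == firm_ids[None, :]

# Assumed interaction term: product of the differences in the two characteristics.
cross = d[:, :, 0] * d[:, :, 1]

# Sum the cross products over non-rival (same-firm) and rival products.
cross_other = np.where(same_firm, cross, 0.0).sum(axis=1)
cross_rival = np.where(~same_firm, cross, 0.0).sum(axis=1)

print(cross_other)
print(cross_rival)
```

A negative sum, as for the second product's rivals here, indicates that rivals that are higher in one characteristic tend to be lower in the other.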

To use these instruments when setting up a Problem, the existing product data has to be updated or new product data has to be constructed. Since the existing product data is a pandas DataFrame, whose columns cannot hold matrices, each column of instruments has to be added individually after deleting the existing instruments.

[5]:
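# Delete the eight default demand instruments (demand_instruments0 through 7).
for i in range(8):
    del product_data[f'demand_instruments{i}']

# Add each column of the new instruments under the same naming scheme.
for i, column in enumerate(local_instruments.T):
    product_data[f'demand_instruments{i}'] = column

product_data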
[5]:
market_ids clustering_ids car_ids firm_ids region shares prices hpwt air mpd ... supply_instruments8 supply_instruments9 supply_instruments10 supply_instruments11 demand_instruments0 demand_instruments1 demand_instruments2 demand_instruments3 demand_instruments4 demand_instruments5
0 1971 AMGREM71 129 15 US 0.001051 4.935802 0.528997 0 1.888146 ... 46.060389 29.786989 0.0 1.888146 4.0 4.0 4.0 42.0 87.0 83.0
1 1971 AMHORN71 130 15 US 0.000670 5.516049 0.494324 0 1.935989 ... 46.060389 29.786989 0.0 1.935989 4.0 4.0 4.0 53.0 87.0 84.0
2 1971 AMJAVL71 132 15 US 0.000341 7.108642 0.467613 0 1.716799 ... 46.060389 29.786989 0.0 1.716799 4.0 4.0 4.0 51.0 87.0 78.0
3 1971 AMMATA71 134 15 US 0.000522 6.839506 0.426540 0 1.687871 ... 46.060389 29.786989 0.0 1.687871 4.0 4.0 4.0 52.0 87.0 77.0
4 1971 AMAMBS71 136 15 US 0.000442 8.928395 0.452489 0 1.504286 ... 46.060389 29.786989 0.0 1.504286 4.0 4.0 4.0 52.0 87.0 69.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2212 1990 VV74085 5584 6 EU 0.000488 16.140015 0.385917 1 2.639135 ... 97.039220 27.861181 38.0 2.639135 2.0 2.0 2.0 102.0 57.0 109.0
2213 1990 VV760G87 5585 6 EU 0.000091 25.986993 0.435967 1 2.136442 ... 97.039220 27.861181 38.0 2.136442 2.0 2.0 2.0 112.0 57.0 86.0
2214 1990 YGGVPL90 5589 23 EU 0.000067 3.393267 0.358289 0 3.518846 ... 98.024103 28.809765 0.0 3.518846 0.0 0.0 0.0 86.0 70.0 62.0
2215 1990 PS911C90 5590 12 EU 0.000039 44.758990 0.814913 1 3.016154 ... 97.222743 28.407171 19.0 3.016154 1.0 1.0 1.0 3.0 58.0 91.0
2216 1990 PS94490 5592 12 EU 0.000025 32.058148 0.693796 1 3.267500 ... 97.222743 28.407171 19.0 3.267500 1.0 1.0 1.0 13.0 58.0 72.0

2217 rows × 31 columns
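The same replacement can be written without hard-coding the number of existing instruments. The sketch below does this on a small hypothetical DataFrame. The names toy and new_instruments are made up for illustration, and the column prefix follows the naming used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the product data, with two old demand instruments.
toy = pd.DataFrame({
    'market_ids': [1, 1, 2],
    'demand_instruments0': [0.1, 0.2, 0.3],
    'demand_instruments1': [1.0, 2.0, 3.0],
    'supply_instruments0': [5.0, 6.0, 7.0],
})

# Hypothetical matrix of new instruments, one row per product.
new_instruments = np.array([
    [4.0, 42.0, 7.0],
    [4.0, 53.0, 8.0],
    [0.0, 86.0, 9.0],
])

# Drop every existing demand instrument column, however many there are.
old_columns = [c for c in toy.columns if c.startswith('demand_instruments')]
toy = toy.drop(columns=old_columns)

# Build the new columns in one step and attach them alongside the rest.
new_columns = pd.DataFrame(
    new_instruments,
    columns=[f'demand_instruments{i}' for i in range(new_instruments.shape[1])],
    index=toy.index,
)
toy = pd.concat([toy, new_columns], axis=1)

print(toy.columns.tolist())
```

Matching on the prefix avoids leaving stale instrument columns behind when the old set has more columns than the new one.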

Any data type that has fields can be used as product data. An alternative way to specify product_data for Problem initialization is to simply use a dict, where fields can be matrices. For example, we could use the following dict, which includes the new demand instruments and a few other variables that might be used when setting up the problem.

[6]:
# Copy the fields needed to set up the problem into a dict.
fields = ['market_ids', 'firm_ids', 'shares', 'prices', 'hpwt', 'air', 'mpd']
product_data_dict = {k: product_data[k] for k in fields}
product_data_dict['demand_instruments'] = local_instruments
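Because a dict places no constraint on its fields, it can be worth checking that every field has one row per product before using it. The sketch below runs such a check on hypothetical toy data. The helper count_rows is not part of pyblp and is named here only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the product data and the instrument matrix.
toy = pd.DataFrame({
    'market_ids': [1, 1, 2],
    'shares': [0.2, 0.3, 0.4],
    'prices': [5.0, 6.0, 7.0],
})
toy_instruments = np.array([
    [4.0, 42.0],
    [4.0, 53.0],
    [0.0, 86.0],
])

# Vector fields come from the DataFrame; the matrix goes in as a single field.
toy_dict = {k: toy[k] for k in ['market_ids', 'shares', 'prices']}
toy_dict['demand_instruments'] = toy_instruments


def count_rows(data):
    """Return the number of rows of each field (hypothetical helper)."""
    return {name: np.asarray(values).shape[0] for name, values in data.items()}


rows = count_rows(toy_dict)

# Every field should have exactly one row per product.
consistent = len(set(rows.values())) == 1
print(rows, consistent)
```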