
Building Differentiation Instruments Example

[1]:

import pyblp
import numpy as np
import pandas as pd

np.set_printoptions(precision=3)
pyblp.__version__

[1]:

'1.2.0'

In this example, we’ll load the automobile product data from Berry, Levinsohn, and Pakes (1995), build some very simple excluded demand-side instruments for the problem in the spirit of Gandhi and Houde (2025), and demonstrate how to update the problem data to use these instruments instead of the default ones.

[2]:

product_data = pd.read_csv(pyblp.data.BLP_PRODUCTS_LOCATION)
product_data.head()

[2]:

	market_ids	clustering_ids	car_ids	firm_ids	region	shares	prices	hpwt	mpd	...	supply_instruments3	supply_instruments4	supply_instruments5	supply_instruments6	supply_instruments8	supply_instruments9	supply_instruments11
0	1971	AMGREM71	129	15	US	0.001051	4.935802	0.528997	1.888146	...	1.705933	1.595656	87.0	-61.959985	46.060389	29.786989	1.888146
1	1971	AMHORN71	130	15	US	0.000670	5.516049	0.494324	1.935989	...	1.680910	1.490295	87.0	-61.959985	46.060389	29.786989	1.935989
2	1971	AMJAVL71	132	15	US	0.000341	7.108642	0.467613	1.716799	...	1.801067	1.357703	87.0	-61.959985	46.060389	29.786989	1.716799
3	1971	AMMATA71	134	15	US	0.000522	6.839506	0.426540	1.687871	...	1.818061	1.261347	87.0	-61.959985	46.060389	29.786989	1.687871
4	1971	AMAMBS71	136	15	US	0.000442	8.928395	0.452489	1.504286	...	1.933210	1.237365	87.0	-61.959985	46.060389	29.786989	1.504286

5 rows × 33 columns

We’ll first build “local” differentiation instruments, which are the default version and which consist of counts of “close” rival and non-rival products in each market. Note that the leading 0 in the formulation excludes the constant column, which would otherwise yield collinear constant columns of differentiation instruments.

[3]:

formulation = pyblp.Formulation('0 + hpwt + air + mpd')
local_instruments = pyblp.build_differentiation_instruments(
    formulation,
    product_data
)
local_instruments

[3]:

array([[ 4.,  4.,  4., 42., 87., 83.],
       [ 4.,  4.,  4., 53., 87., 84.],
       [ 4.,  4.,  4., 51., 87., 78.],
       ...,
       [ 0.,  0.,  0., 86., 70., 62.],
       [ 1.,  1.,  1.,  3., 58., 91.],
       [ 1.,  1.,  1., 13., 58., 72.]], shape=(2217, 6))
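To make the idea of counting “close” products concrete, here is a minimal NumPy sketch for a single characteristic in a single market. It is not pyblp's implementation: the helper name `local_counts` is hypothetical, and the closeness `threshold` is left as an explicit argument, since how the library chooses it is not covered here.

```python
import numpy as np

def local_counts(x, firm_ids, threshold):
    """Count close same-firm and rival products (hypothetical helper, one market)."""
    x = np.asarray(x, dtype=float)
    firm_ids = np.asarray(firm_ids)

    # Pairwise absolute differences in the characteristic.
    close = np.abs(x[:, None] - x[None, :]) < threshold
    np.fill_diagonal(close, False)  # a product is not its own neighbor

    same_firm = firm_ids[:, None] == firm_ids[None, :]
    own = (close & same_firm).sum(axis=1)
    rival = (close & ~same_firm).sum(axis=1)
    return own, rival

# Toy data (assumed values, not from the BLP data set).
own, rival = local_counts([1.0, 1.1, 2.0, 1.05], [1, 1, 2, 2], threshold=0.2)
print(own, rival)
```

In this toy market, the third product has no close neighbors at all, so both of its counts are zero.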

Next, we’ll build a more continuous “quadratic” version of the instruments, which consist of sums over squared differences between rival and non-rival products in each market.

[4]:

quadratic_instruments = pyblp.build_differentiation_instruments(
    formulation,
    product_data,
    version='quadratic'
)
quadratic_instruments

[4]:

array([[2.132e-02, 0.000e+00, 2.191e-01, 2.011e+00, 0.000e+00, 1.208e+01],
       [8.261e-03, 0.000e+00, 2.983e-01, 2.014e+00, 0.000e+00, 1.198e+01],
       [6.397e-03, 0.000e+00, 1.234e-01, 2.159e+00, 0.000e+00, 1.568e+01],
       ...,
       [0.000e+00, 0.000e+00, 0.000e+00, 2.239e+00, 6.000e+01, 1.312e+02],
       [1.467e-02, 0.000e+00, 6.317e-02, 1.864e+01, 7.100e+01, 6.185e+01],
       [1.467e-02, 0.000e+00, 6.317e-02, 8.961e+00, 7.100e+01, 8.819e+01]],
      shape=(2217, 6))
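The sums of squared differences can be sketched in the same way. Again, this is only an illustration under simplifying assumptions (one characteristic, one market), and `quadratic_sums` is a hypothetical helper rather than part of pyblp.

```python
import numpy as np

def quadratic_sums(x, firm_ids):
    """Sum squared differences over same-firm and rival products (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    firm_ids = np.asarray(firm_ids)

    # Squared pairwise differences; the diagonal is zero, so it adds nothing.
    squared = (x[:, None] - x[None, :]) ** 2

    same_firm = firm_ids[:, None] == firm_ids[None, :]
    own = (squared * same_firm).sum(axis=1)
    rival = (squared * ~same_firm).sum(axis=1)
    return own, rival

# Toy data (assumed values): two products from firm 1, one from firm 2.
own_sq, rival_sq = quadratic_sums([1.0, 2.0, 4.0], [1, 1, 2])
print(own_sq, rival_sq)
```

Unlike the counts, these sums change smoothly as characteristics move, which is the sense in which this version is more continuous.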

We could also use interact=True to include interaction terms in either version of instruments, which would help capture covariances between different product characteristics.
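As a rough illustration of what an interaction term measures, the sketch below sums, over rival products, the product of differences in two characteristics. This assumes the quadratic form of interaction and a single market; `rival_cross_sums` is a hypothetical helper, and the exact terms that pyblp builds may differ.

```python
import numpy as np

def rival_cross_sums(X, firm_ids):
    """Sum cross-products of characteristic differences over rivals (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    firm_ids = np.asarray(firm_ids)

    # diffs[j, r, l] is characteristic l of product r minus that of product j.
    diffs = X[None, :, :] - X[:, None, :]
    rival = (firm_ids[:, None] != firm_ids[None, :]).astype(float)

    # cross[j, l, m] sums diffs[j, r, l] * diffs[j, r, m] over rivals r.
    return np.einsum('jr,jrl,jrm->jlm', rival, diffs, diffs)

# Toy data (assumed values): three products, two characteristics.
cross = rival_cross_sums([[1.0, 10.0], [2.0, 12.0], [4.0, 11.0]], [1, 1, 2])
print(cross[:, 0, 1])
```

The diagonal entries `cross[:, l, l]` reduce to rival sums of squared differences, while the off-diagonal entries carry the covariance-like information.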

To use these instruments when setting up a Problem, the existing product data has to be updated or new product data has to be constructed. Since the existing product data is a Pandas DataFrame, it does not support matrices, so each column of instruments has to be added individually after deleting the existing instruments.

[5]:

for i in range(8):
    del product_data[f'demand_instruments{i}']

for i, column in enumerate(local_instruments.T):
    product_data[f'demand_instruments{i}'] = column

product_data

[5]:

	market_ids	clustering_ids	car_ids	firm_ids	region	shares	prices	hpwt	air	mpd	...	supply_instruments8	supply_instruments9	supply_instruments10	supply_instruments11	demand_instruments0	demand_instruments1	demand_instruments2	demand_instruments3	demand_instruments4	demand_instruments5
0	1971	AMGREM71	129	15	US	0.001051	4.935802	0.528997	0	1.888146	...	46.060389	29.786989	0.0	1.888146	4.0	4.0	4.0	42.0	87.0	83.0
1	1971	AMHORN71	130	15	US	0.000670	5.516049	0.494324	0	1.935989	...	46.060389	29.786989	0.0	1.935989	4.0	4.0	4.0	53.0	87.0	84.0
2	1971	AMJAVL71	132	15	US	0.000341	7.108642	0.467613	0	1.716799	...	46.060389	29.786989	0.0	1.716799	4.0	4.0	4.0	51.0	87.0	78.0
3	1971	AMMATA71	134	15	US	0.000522	6.839506	0.426540	0	1.687871	...	46.060389	29.786989	0.0	1.687871	4.0	4.0	4.0	52.0	87.0	77.0
4	1971	AMAMBS71	136	15	US	0.000442	8.928395	0.452489	0	1.504286	...	46.060389	29.786989	0.0	1.504286	4.0	4.0	4.0	52.0	87.0	69.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2212	1990	VV74085	5584	6	EU	0.000488	16.140015	0.385917	1	2.639135	...	97.039220	27.861181	38.0	2.639135	2.0	2.0	2.0	102.0	57.0	109.0
2213	1990	VV760G87	5585	6	EU	0.000091	25.986993	0.435967	1	2.136442	...	97.039220	27.861181	38.0	2.136442	2.0	2.0	2.0	112.0	57.0	86.0
2214	1990	YGGVPL90	5589	23	EU	0.000067	3.393267	0.358289	0	3.518846	...	98.024103	28.809765	0.0	3.518846	0.0	0.0	0.0	86.0	70.0	62.0
2215	1990	PS911C90	5590	12	EU	0.000039	44.758990	0.814913	1	3.016154	...	97.222743	28.407171	19.0	3.016154	1.0	1.0	1.0	3.0	58.0	91.0
2216	1990	PS94490	5592	12	EU	0.000025	32.058148	0.693796	1	3.267500	...	97.222743	28.407171	19.0	3.267500	1.0	1.0	1.0	13.0	58.0	72.0

2217 rows × 31 columns
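The loop above hard-codes the number of existing instrument columns. A slightly more defensive variant, sketched here on toy data, drops every column that starts with a given prefix before adding the new ones. The helper name `replace_instruments` is hypothetical and only uses standard pandas calls.

```python
import numpy as np
import pandas as pd

def replace_instruments(df, instruments, prefix='demand_instruments'):
    """Return a copy of df with all `prefix` columns replaced (hypothetical helper)."""
    old = [c for c in df.columns if c.startswith(prefix)]
    out = df.drop(columns=old)  # drop returns a new DataFrame
    for i, column in enumerate(np.asarray(instruments).T):
        out[f'{prefix}{i}'] = column
    return out

# Toy frame (assumed values) with three old instrument columns.
toy = pd.DataFrame({
    'shares': [0.1, 0.2],
    'demand_instruments0': [1.0, 2.0],
    'demand_instruments1': [3.0, 4.0],
    'demand_instruments2': [5.0, 6.0],
})
new = replace_instruments(toy, np.array([[7.0, 8.0], [9.0, 10.0]]))
print(list(new.columns))
```

Because the helper works on a copy, the original frame keeps its old instruments, which can be handy when comparing estimates across instrument sets.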

Any data type that has fields can be used as product data. An alternative way to specify product_data for Problem initialization is to use a dict, where fields can be matrices. For example, we could use the following dict, which includes the new demand instruments as well as a few other variables that might be used when setting up the problem.

[6]:

fields = ['market_ids', 'firm_ids', 'shares', 'prices', 'hpwt', 'air', 'mpd']
product_data_dict = {k: product_data[k] for k in fields}
# Unlike a DataFrame column, a dict field can hold the whole instrument matrix.
product_data_dict['demand_instruments'] = local_instruments
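Since a dict does not enforce a common length the way a DataFrame does, it can be worth confirming that every field has the same number of rows before handing the dict to Problem. The sketch below shows one way to do that on toy data; `check_row_counts` is a hypothetical helper, not a pyblp function.

```python
import numpy as np

def check_row_counts(data):
    """Return the shared row count of all fields, or raise (hypothetical helper)."""
    counts = {key: np.asarray(value).shape[0] for key, value in data.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f'fields have different row counts: {counts}')
    return next(iter(counts.values()))

# Toy dict (assumed values) mixing vector fields with a matrix field.
toy_dict = {
    'market_ids': np.array([1, 1, 2]),
    'shares': np.array([0.2, 0.3, 0.4]),
    'demand_instruments': np.ones((3, 6)),
}
n_rows = check_row_counts(toy_dict)
print(n_rows)
```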