Download the Jupyter Notebook for this section: build_matrix.ipynb

Building a Matrix Example

[1]:

import pyblp
import pandas as pd

pyblp.__version__

[1]:

'1.2.0'

In this example, we’ll load the fake cereal data from Nevo (2000a) and use build_matrix to create a simple matrix with three columns: a constant, prices, and shares.
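To make the target concrete, here is a minimal sketch of the same kind of matrix built by hand with NumPy on a small made-up DataFrame. The helper `build_matrix_by_hand` and the toy data are hypothetical and are not part of pyblp. The sketch assumes only that the columns come out in formula order (constant, prices, shares), which matches the `build_matrix` output shown in this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Nevo product data (values are made up).
toy_data = pd.DataFrame({
    "product_ids": ["A", "B", "A", "B"],
    "prices": [0.10, 0.20, 0.14, 0.22],
    "shares": [0.010, 0.020, 0.012, 0.018],
})


def build_matrix_by_hand(data: pd.DataFrame) -> np.ndarray:
    """Stack a constant column, prices, and shares, mirroring '1 + prices + shares'."""
    constant = np.ones(len(data))
    return np.column_stack([constant, data["prices"].to_numpy(), data["shares"].to_numpy()])


manual_matrix = build_matrix_by_hand(toy_data)
print(manual_matrix.shape)  # (4, 3): one row per product observation, three columns
```

With the real data, the same construction would give one row per row of `product_data`, which is why `build_matrix` returns a matrix with 2256 rows.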

[2]:

formulation = pyblp.Formulation('1 + prices + shares')
formulation

[2]:

1 + prices + shares

[3]:

product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()

[3]:

	market_ids	city_ids	quarter	product_ids	firm_ids	brand_ids	shares	prices	sugar	mushy	...	demand_instruments10	demand_instruments11	demand_instruments12	demand_instruments13	demand_instruments14	demand_instruments15	demand_instruments16	demand_instruments17	demand_instruments18	demand_instruments19
0	C01Q1	1	1	F1B04	1	4	0.012417	0.072088	2	1	...	2.116358	-0.154708	-0.005796	0.014538	0.126244	0.067345	0.068423	0.034800	0.126346	0.035484
1	C01Q1	1	1	F1B06	1	6	0.007809	0.114178	18	1	...	-7.374091	-0.576412	0.012991	0.076143	0.029736	0.087867	0.110501	0.087784	0.049872	0.072579
2	C01Q1	1	1	F1B07	1	7	0.012995	0.132391	4	1	...	2.187872	-0.207346	0.003509	0.091781	0.163773	0.111881	0.108226	0.086439	0.122347	0.101842
3	C01Q1	1	1	F1B09	1	9	0.005770	0.130344	3	0	...	2.704576	0.040748	-0.003724	0.094732	0.135274	0.088090	0.101767	0.101777	0.110741	0.104332
4	C01Q1	1	1	F1B11	1	11	0.017934	0.154823	12	0	...	1.261242	0.034836	-0.000568	0.102451	0.130640	0.084818	0.101075	0.125169	0.133464	0.121111

5 rows × 30 columns

[4]:

matrix = pyblp.build_matrix(formulation, product_data)
matrix

[4]:

array([[1.        , 0.07208794, 0.01241721],
       [1.        , 0.11417849, 0.00780939],
       [1.        , 0.13239066, 0.01299451],
       ...,
       [1.        , 0.13701741, 0.00222918],
       [1.        , 0.10017433, 0.01146267],
       [1.        , 0.12755747, 0.02620832]], shape=(2256, 3))

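We may also want to absorb fixed effects into the matrix, for example to avoid including a separate dummy variable for each product. This can be done with the absorb argument of Formulation. We’ll now re-create the matrix, absorbing product-specific fixed effects. Note that the constant column is now ignored: a constant would be perfectly collinear with the product fixed effects, so the resulting matrix has two columns instead of three.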
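For a single set of fixed effects, absorbing them is equivalent to subtracting each product's mean from every column (the within transformation). The following is a minimal sketch of that idea in pandas on made-up data. `absorb_one_way` is a hypothetical helper, not part of pyblp, and pyblp's own routine, which also handles several sets of fixed effects, may be implemented differently. The last lines show why the constant column is dropped: demeaning it within products leaves only zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (made-up values): two products, each observed in two markets.
toy_data = pd.DataFrame({
    "product_ids": ["A", "B", "A", "B"],
    "prices": [0.10, 0.20, 0.14, 0.22],
    "shares": [0.010, 0.020, 0.012, 0.018],
})


def absorb_one_way(data: pd.DataFrame, columns: list[str], group: str) -> np.ndarray:
    """Absorb one set of fixed effects by subtracting within-group means from each column."""
    values = data[columns]
    group_means = values.groupby(data[group]).transform("mean")
    return (values - group_means).to_numpy()


demeaned = absorb_one_way(toy_data, ["prices", "shares"], "product_ids")
print(demeaned)

# A constant column demeans to all zeros within each product, so it carries
# no information once product fixed effects are absorbed.
constant = pd.DataFrame({"constant": np.ones(len(toy_data))})
constant_demeaned = constant - constant.groupby(toy_data["product_ids"]).transform("mean")
print(np.allclose(constant_demeaned.to_numpy(), 0.0))  # True
```

A quick sanity check on any one-way demeaned matrix is that each column averages to zero within every product.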

[5]:

absorb_formulation = pyblp.Formulation('prices + shares', absorb='product_ids')
absorb_formulation

[5]:

prices + shares + Absorb[product_ids]

[6]:

demeaned_matrix = pyblp.build_matrix(absorb_formulation, product_data)
demeaned_matrix

[6]:

array([[-0.01124832, -0.00052161],
       [-0.00713476, -0.03144549],
       [ 0.02367765, -0.01664996],
       ...,
       [ 0.03371995, -0.00779841],
       [-0.00417404, -0.0117508 ],
       [-0.01195648,  0.00666695]], shape=(2256, 2))