Building a Matrix Example

[1]:
import pyblp
import pandas as pd

pyblp.__version__
[1]:
'1.1.0'

In this example, we’ll load the fake cereal data from Nevo (2000a) and build a simple matrix with three columns: a constant, prices, and shares.

[2]:
formulation = pyblp.Formulation('1 + prices + shares')
formulation
[2]:
1 + prices + shares
[3]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()
[3]:
market_ids city_ids quarter product_ids firm_ids brand_ids shares prices sugar mushy ... demand_instruments10 demand_instruments11 demand_instruments12 demand_instruments13 demand_instruments14 demand_instruments15 demand_instruments16 demand_instruments17 demand_instruments18 demand_instruments19
0 C01Q1 1 1 F1B04 1 4 0.012417 0.072088 2 1 ... 2.116358 -0.154708 -0.005796 0.014538 0.126244 0.067345 0.068423 0.034800 0.126346 0.035484
1 C01Q1 1 1 F1B06 1 6 0.007809 0.114178 18 1 ... -7.374091 -0.576412 0.012991 0.076143 0.029736 0.087867 0.110501 0.087784 0.049872 0.072579
2 C01Q1 1 1 F1B07 1 7 0.012995 0.132391 4 1 ... 2.187872 -0.207346 0.003509 0.091781 0.163773 0.111881 0.108226 0.086439 0.122347 0.101842
3 C01Q1 1 1 F1B09 1 9 0.005770 0.130344 3 0 ... 2.704576 0.040748 -0.003724 0.094732 0.135274 0.088090 0.101767 0.101777 0.110741 0.104332
4 C01Q1 1 1 F1B11 1 11 0.017934 0.154823 12 0 ... 1.261242 0.034836 -0.000568 0.102451 0.130640 0.084818 0.101075 0.125169 0.133464 0.121111

5 rows × 30 columns

[4]:
matrix = pyblp.build_matrix(formulation, product_data)
matrix
[4]:
array([[1.        , 0.07208794, 0.01241721],
       [1.        , 0.11417849, 0.00780939],
       [1.        , 0.13239066, 0.01299451],
       ...,
       [1.        , 0.13701741, 0.00222918],
       [1.        , 0.10017433, 0.01146267],
       [1.        , 0.12755747, 0.02620832]])
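
For a formula this simple, the result is just the named columns placed next to a column of ones. The following is a minimal sketch of that idea in plain NumPy. It does not call pyblp, and `toy_data` and `manual_matrix` are hypothetical names with made-up values, not the Nevo data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for product_data (not the Nevo data).
toy_data = pd.DataFrame({
    'prices': [0.07, 0.11, 0.13],
    'shares': [0.012, 0.008, 0.013],
})

# Stack a constant column next to prices and shares,
# mirroring the formula '1 + prices + shares'.
manual_matrix = np.column_stack([
    np.ones(len(toy_data)),
    toy_data['prices'].to_numpy(),
    toy_data['shares'].to_numpy(),
])
print(manual_matrix.shape)
```

The column order follows the order of terms in the formula, which matches the layout of the array printed above.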

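We may want to absorb fixed effects into the matrix, for example to partial out product-level means instead of adding one indicator column per product. This can be done with the absorb argument of Formulation. We’ll now re-create the matrix, absorbing product-specific fixed effects. Note that the constant column is now ignored, since it would be collinear with the fixed effects, so the resulting matrix has only two columns.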

[5]:
absorb_formulation = pyblp.Formulation('prices + shares', absorb='product_ids')
absorb_formulation
[5]:
prices + shares + Absorb[product_ids]
[6]:
demeaned_matrix = pyblp.build_matrix(absorb_formulation, product_data)
demeaned_matrix
[6]:
array([[-0.01124832, -0.00052161],
       [-0.00713476, -0.03144549],
       [ 0.02367765, -0.01664996],
       ...,
       [ 0.03371995, -0.00779841],
       [-0.00417404, -0.0117508 ],
       [-0.01195648,  0.00666695]])
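
With a single fixed effect, absorbing it should amount to subtracting each group's mean from every column, which is presumably why the result above is named `demeaned_matrix`. The following is a rough sketch of that idea in pandas on hypothetical toy data. It is not pyblp's implementation, and the names `toy_data`, `group_means`, and `toy_demeaned` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two products, each observed twice.
toy_data = pd.DataFrame({
    'product_ids': ['A', 'A', 'B', 'B'],
    'prices': [1.0, 3.0, 2.0, 6.0],
    'shares': [0.1, 0.3, 0.2, 0.4],
})

columns = ['prices', 'shares']

# Compute each product's mean and broadcast it back to every row.
group_means = toy_data.groupby('product_ids')[columns].transform('mean')

# Subtract the within-product means; no constant column is included.
toy_demeaned = (toy_data[columns] - group_means).to_numpy()
print(toy_demeaned)
```

After this transformation each column sums to zero within every product, so a constant column would carry no information. This is consistent with the constant being ignored once fixed effects are absorbed.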