# Building a Matrix Example

[1]:

import pyblp
import pandas as pd

pyblp.__version__

[1]:

'0.10.1'


In this example, we’ll load the fake cereal data from Nevo (2000) and create a simple matrix involving a constant, prices, and shares.

[2]:

formulation = pyblp.Formulation('1 + prices + shares')
formulation

[2]:

1 + prices + shares

[3]:

product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()

[3]:

| | market_ids | city_ids | quarter | product_ids | firm_ids | brand_ids | shares | prices | sugar | mushy | ... | demand_instruments10 | demand_instruments11 | demand_instruments12 | demand_instruments13 | demand_instruments14 | demand_instruments15 | demand_instruments16 | demand_instruments17 | demand_instruments18 | demand_instruments19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C01Q1 | 1 | 1 | F1B04 | 1 | 4 | 0.012417 | 0.072088 | 2 | 1 | ... | 2.116358 | -0.154708 | -0.005796 | 0.014538 | 0.126244 | 0.067345 | 0.068423 | 0.034800 | 0.126346 | 0.035484 |
| 1 | C01Q1 | 1 | 1 | F1B06 | 1 | 6 | 0.007809 | 0.114178 | 18 | 1 | ... | -7.374091 | -0.576412 | 0.012991 | 0.076143 | 0.029736 | 0.087867 | 0.110501 | 0.087784 | 0.049872 | 0.072579 |
| 2 | C01Q1 | 1 | 1 | F1B07 | 1 | 7 | 0.012995 | 0.132391 | 4 | 1 | ... | 2.187872 | -0.207346 | 0.003509 | 0.091781 | 0.163773 | 0.111881 | 0.108226 | 0.086439 | 0.122347 | 0.101842 |
| 3 | C01Q1 | 1 | 1 | F1B09 | 1 | 9 | 0.005770 | 0.130344 | 3 | 0 | ... | 2.704576 | 0.040748 | -0.003724 | 0.094732 | 0.135274 | 0.088090 | 0.101767 | 0.101777 | 0.110741 | 0.104332 |
| 4 | C01Q1 | 1 | 1 | F1B11 | 1 | 11 | 0.017934 | 0.154823 | 12 | 0 | ... | 1.261242 | 0.034836 | -0.000568 | 0.102451 | 0.130640 | 0.084818 | 0.101075 | 0.125169 | 0.133464 | 0.121111 |

5 rows × 30 columns

[4]:

matrix = pyblp.build_matrix(formulation, product_data)
matrix

[4]:

array([[1.        , 0.07208794, 0.01241721],
       [1.        , 0.11417849, 0.00780939],
       [1.        , 0.13239066, 0.01299451],
       ...,
       [1.        , 0.13701741, 0.00222918],
       [1.        , 0.10017433, 0.01146267],
       [1.        , 0.12755747, 0.02620832]])
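To make the column layout concrete, here is a minimal NumPy sketch of what the formula `1 + prices + shares` describes: a column of ones followed by the two variables. It does not call pyblp. The three price and share values are copied from the first rows of the data preview above (rounded as displayed), and the name `manual_matrix` is only illustrative.

```python
import numpy as np

# Illustrative values: the first three rows of prices and shares from the
# data preview above, rounded as displayed there.
prices = np.array([0.072088, 0.114178, 0.132391])
shares = np.array([0.012417, 0.007809, 0.012995])

# '1 + prices + shares' corresponds to a constant column followed by the
# two variables, in the order they appear in the formula.
manual_matrix = np.column_stack([np.ones_like(prices), prices, shares])
print(manual_matrix)
```

Up to rounding, these rows should line up with the first three rows of the matrix returned by build_matrix.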


We may want to absorb fixed effects into the matrix, for example to avoid building a large set of indicator columns. This can be done with the absorb argument of Formulation. We’ll now re-create the matrix, absorbing product-specific fixed effects. Note that the constant is dropped from the formula, since it would be collinear with the fixed effects.

[5]:

absorb_formulation = pyblp.Formulation('prices + shares', absorb='product_ids')
absorb_formulation

[5]:

prices + shares + Absorb[product_ids]

[6]:

demeaned_matrix = pyblp.build_matrix(absorb_formulation, product_data)
demeaned_matrix

[6]:

array([[-0.01124832, -0.00052161],
       [-0.00713476, -0.03144549],
       [ 0.02367765, -0.01664996],
       ...,
       [ 0.03371995, -0.00779841],
       [-0.00417404, -0.0117508 ],
       [-0.01195648,  0.00666695]])
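For a single set of fixed effects, absorbing them is numerically the same as subtracting each group's mean from every column, which is why the result above is named demeaned_matrix. The sketch below illustrates that idea with plain NumPy on hypothetical toy data. The helper `demean_within_groups` and the toy arrays are assumptions made for illustration, not part of pyblp, and pyblp's internal algorithm may differ.

```python
import numpy as np

def demean_within_groups(values, group_ids):
    """Subtract each group's column means from the rows in that group."""
    values = np.asarray(values, dtype=float)
    # Map each distinct ID to an integer code 0..G-1.
    _, codes = np.unique(group_ids, return_inverse=True)
    codes = codes.ravel()
    counts = np.bincount(codes)
    # Sum each column within groups, then divide by group sizes to get means.
    sums = np.zeros((counts.size, values.shape[1]))
    np.add.at(sums, codes, values)
    means = sums / counts[:, None]
    return values - means[codes]

# Hypothetical toy data: two products ('A' and 'B'), each observed twice.
toy_ids = np.array(['A', 'B', 'A', 'B'])
toy_values = np.array([
    [1.0, 0.10],
    [2.0, 0.20],
    [3.0, 0.30],
    [4.0, 0.60],
])
toy_demeaned = demean_within_groups(toy_values, toy_ids)
print(toy_demeaned)
```

After demeaning, each column sums to zero within every product, so any constant column would be reduced to zeros. This is the numerical counterpart of leaving the constant out of the absorbed formulation.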