Building a Matrix Example
[1]:
import pyblp
import pandas as pd
pyblp.__version__
[1]:
'1.1.0'
In this example, we’ll load the fake cereal data from Nevo (2000a) and create a simple matrix involving a constant, prices, and shares.
[2]:
formulation = pyblp.Formulation('1 + prices + shares')
formulation
[2]:
1 + prices + shares
[3]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
product_data.head()
[3]:
| | market_ids | city_ids | quarter | product_ids | firm_ids | brand_ids | shares | prices | sugar | mushy | ... | demand_instruments10 | demand_instruments11 | demand_instruments12 | demand_instruments13 | demand_instruments14 | demand_instruments15 | demand_instruments16 | demand_instruments17 | demand_instruments18 | demand_instruments19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C01Q1 | 1 | 1 | F1B04 | 1 | 4 | 0.012417 | 0.072088 | 2 | 1 | ... | 2.116358 | -0.154708 | -0.005796 | 0.014538 | 0.126244 | 0.067345 | 0.068423 | 0.034800 | 0.126346 | 0.035484 |
1 | C01Q1 | 1 | 1 | F1B06 | 1 | 6 | 0.007809 | 0.114178 | 18 | 1 | ... | -7.374091 | -0.576412 | 0.012991 | 0.076143 | 0.029736 | 0.087867 | 0.110501 | 0.087784 | 0.049872 | 0.072579 |
2 | C01Q1 | 1 | 1 | F1B07 | 1 | 7 | 0.012995 | 0.132391 | 4 | 1 | ... | 2.187872 | -0.207346 | 0.003509 | 0.091781 | 0.163773 | 0.111881 | 0.108226 | 0.086439 | 0.122347 | 0.101842 |
3 | C01Q1 | 1 | 1 | F1B09 | 1 | 9 | 0.005770 | 0.130344 | 3 | 0 | ... | 2.704576 | 0.040748 | -0.003724 | 0.094732 | 0.135274 | 0.088090 | 0.101767 | 0.101777 | 0.110741 | 0.104332 |
4 | C01Q1 | 1 | 1 | F1B11 | 1 | 11 | 0.017934 | 0.154823 | 12 | 0 | ... | 1.261242 | 0.034836 | -0.000568 | 0.102451 | 0.130640 | 0.084818 | 0.101075 | 0.125169 | 0.133464 | 0.121111 |
5 rows × 30 columns
[4]:
matrix = pyblp.build_matrix(formulation, product_data)
matrix
[4]:
array([[1. , 0.07208794, 0.01241721],
[1. , 0.11417849, 0.00780939],
[1. , 0.13239066, 0.01299451],
...,
[1. , 0.13701741, 0.00222918],
[1. , 0.10017433, 0.01146267],
[1. , 0.12755747, 0.02620832]])
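To make the output above concrete, here is a minimal sketch of what a formula like '1 + prices + shares' amounts to: a column of ones followed by the named columns, in order. It uses only NumPy and pandas on a small toy table. The values are made up for illustration and are not the Nevo data, and the helper name build_simple_matrix is hypothetical; it is not part of pyblp, which parses formulas far more generally.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the Nevo cereal data).
toy_data = pd.DataFrame({
    'prices': [0.07, 0.11, 0.13],
    'shares': [0.012, 0.008, 0.013],
})


def build_simple_matrix(data, columns):
    """Stack a constant column and the named columns side by side.

    Hypothetical helper for illustration only; not a pyblp function.
    """
    constant = np.ones(len(data))
    return np.column_stack([constant] + [data[c].to_numpy() for c in columns])


# One row per observation: constant, prices, shares.
toy_matrix = build_simple_matrix(toy_data, ['prices', 'shares'])
print(toy_matrix)
```

The column order follows the order of terms in the formula, which is why prices appear before shares in the matrix above.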
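We may want to absorb fixed effects into the matrix rather than include them as explicit indicator columns, for example when there are many of them. This can be done with the absorb argument of Formulation. We’ll now re-create the matrix, absorbing product-specific fixed effects. Note that the constant is dropped from the formula: it would be collinear with the fixed effects, so it is eliminated when they are absorbed.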
[5]:
absorb_formulation = pyblp.Formulation('prices + shares', absorb='product_ids')
absorb_formulation
[5]:
prices + shares + Absorb[product_ids]
[6]:
demeaned_matrix = pyblp.build_matrix(absorb_formulation, product_data)
demeaned_matrix
[6]:
array([[-0.01124832, -0.00052161],
[-0.00713476, -0.03144549],
[ 0.02367765, -0.01664996],
...,
[ 0.03371995, -0.00779841],
[-0.00417404, -0.0117508 ],
[-0.01195648, 0.00666695]])
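With a single absorbed fixed effect, the result can be understood as the within transformation: each column has its group mean subtracted. The following is a minimal sketch of that idea on toy data, assuming one fixed effect. The values are made up for illustration and are not the Nevo data, and this is not pyblp's own absorption routine.

```python
import pandas as pd

# Toy data (hypothetical values): two products observed twice each.
toy_data = pd.DataFrame({
    'product_ids': ['A', 'A', 'B', 'B'],
    'prices': [0.10, 0.14, 0.20, 0.30],
    'shares': [0.02, 0.04, 0.01, 0.03],
})

columns = ['prices', 'shares']

# Mean of each column within each product, aligned row by row with toy_data.
group_means = toy_data.groupby('product_ids')[columns].transform('mean')

# Subtracting the group means removes any product-specific level,
# which is what absorbing a single fixed effect does.
toy_demeaned = (toy_data[columns] - group_means).to_numpy()
print(toy_demeaned)
```

After demeaning, each column sums to zero within every product, so a constant column would be reduced to all zeros. That is why the constant carries no information once the fixed effects are absorbed.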