Download the Jupyter Notebook for this section: data_to_dict.ipynb
Converting Data into a Dictionary Example
[1]:
import pyblp
import numpy as np
import pandas as pd
np.set_printoptions(precision=1)
pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__
[1]:
'1.1.0'
In this example, we’ll convert a dataset constructed by PyBLP into a dictionary that can be more easily ingested by other Python packages. Note that you can also pickle most PyBLP objects, which may be more convenient.
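As a rough illustration of the pickling route, the sketch below round-trips a small record array through Python's standard pickle module in memory. The array, its two fields, and its values are made-up stand-ins. With real PyBLP data you would pass the object itself (for instance, problem) to pickle.dumps or pickle.dump, assuming it is one of the objects that pickle supports.

```python
import pickle

import numpy as np

# Hypothetical stand-in for a PyBLP record array: made-up values, two fields.
dtype = [('market_ids', 'O', (1,)), ('ZD', '<f8', (2,))]
products = np.zeros(2, dtype=dtype).view(np.recarray)
products['market_ids'] = np.array([['C01Q1'], ['C01Q2']], dtype=object)
products['ZD'] = [[0.1, 0.2], [0.3, 0.4]]

# Serialize to bytes and back; pickle.dump and pickle.load do the same with a file.
payload = pickle.dumps(products)
restored = pickle.loads(payload)
```

Pickling keeps the full record-array structure, whereas the dictionary conversion discussed below flattens it for packages that expect one-dimensional columns.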
First we’ll initialize a Problem with the fake cereal data from Nevo (2000a).
[2]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
formulation = pyblp.Formulation('0 + prices', absorb='C(product_ids)')
problem = pyblp.Problem(formulation, product_data)
problem
[2]:
Dimensions:
================================
T N F K1 MD ED
--- ---- --- ---- ---- ----
94 2256 5 1 20 1
================================
Formulations:
==================================
Column Indices: 0
-------------------------- ------
X1: Linear Characteristics prices
==================================
The Problem.products attribute is a typical example of the NumPy record arrays that PyBLP uses to structure data throughout the package.
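Every field in such a record array is stored as a matrix with one row per product, even when it has a single column or none at all. The sketch below builds a small stand-in by hand, using a few field names and widths taken from the dtype PyBLP prints for problem.products. The 4 rows and the 3-column ZD are made up for brevity (ZD has 20 columns in the cereal data). It then reports each field's shape.

```python
import numpy as np

# Field names and widths loosely mirror problem.products; values are zeros.
dtype = [
    ('market_ids', 'O', (1,)),
    ('shares', '<f8', (1,)),
    ('ZD', '<f8', (3,)),  # 20 columns in the cereal data; 3 here for brevity
    ('ZS', '<f8', (0,)),  # an empty field, like ZS in the printed dtype
]
products = np.zeros(4, dtype=dtype).view(np.recarray)

# Every field comes back as a matrix with one row per product.
for name in products.dtype.names:
    print(name, products[name].shape)
# market_ids (4, 1)
# shares (4, 1)
# ZD (4, 3)
# ZS (4, 0)
```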
[3]:
problem.products
[3]:
rec.array([(['C01Q1'], [1], ['F1B04'], [], [], ['F1B04'], [], [], [0.], [-2.5e-01, 4.1e-02, -1.6e+00, -2.7e-01, -1.0e-02, 6.9e-03, -9.2e-01, 5.1e-03, 1.3e-01, 2.8e-01, 2.0e-01, 2.5e-01, -4.1e-03, -3.6e-02, 7.1e-02, 1.2e-02, 1.7e-02, -1.5e-02, 8.1e-02, -1.6e-02], [], [], [-0.], [], [], [0.1]),
(['C01Q1'], [1], ['F1B06'], [], [], ['F1B06'], [], [], [0.], [-2.1e-01, 5.7e-02, -1.0e+01, 1.5e-01, 4.0e-02, 6.1e-03, 1.1e+00, 8.6e-02, 1.1e-01, -2.7e-02, -1.2e+00, -1.3e-01, 2.6e-03, -6.8e-03, -4.5e-02, 6.7e-05, 3.1e-02, 5.8e-03, -3.2e-02, -1.1e-02], [], [], [-0.], [], [], [0.1]),
(['C01Q1'], [1], ['F1B07'], [], [], ['F1B07'], [], [], [0.], [-2.1e-01, 4.6e-02, -2.3e+00, -3.0e-02, 2.4e-03, -1.3e-02, 3.3e-01, -1.7e-01, -2.3e-01, 3.1e-01, 1.0e+00, 2.0e-01, 9.9e-04, 1.8e-02, 8.2e-02, 3.5e-02, 2.8e-02, 1.3e-02, 4.7e-02, 2.7e-02], [], [], [ 0.], [], [], [0.1]),
...,
(['C65Q2'], [4], ['F4B10'], [], [], ['F4B10'], [], [], [0.], [-1.2e-01, -3.2e-04, -1.1e+00, 1.8e-01, 3.6e-02, -1.9e-02, 2.4e-01, 5.4e-02, -3.2e-01, 8.7e-02, 2.7e+00, 1.6e-01, 8.8e-04, 3.8e-02, 1.9e-02, -5.2e-02, -1.8e-02, 3.7e-02, -5.8e-02, 3.6e-02], [], [], [ 0.], [], [], [0.1]),
(['C65Q2'], [4], ['F4B12'], [], [], ['F4B12'], [], [], [0.], [-2.0e-01, 3.3e-04, -5.1e-01, -4.5e-03, 3.2e-02, 6.1e-03, 5.7e-01, 2.3e-02, 1.1e-01, 1.9e-01, 2.1e+00, 1.3e-01, -8.1e-03, -1.2e-02, -3.6e-02, -4.3e-03, -1.7e-02, -6.6e-03, 7.2e-03, -1.5e-02], [], [], [-0.], [], [], [0.1]),
(['C65Q2'], [6], ['F6B18'], [], [], ['F6B18'], [], [], [0.], [-1.4e-01, 3.5e-03, -2.9e-01, 2.9e-01, 3.9e-02, 2.0e-02, -1.9e+00, -4.0e-02, 3.8e-01, 1.1e-01, 3.4e+00, 1.1e-01, -6.1e-03, -1.2e-03, -4.7e-02, -2.4e-02, -2.1e-02, -2.9e-02, -2.6e-02, -2.5e-02], [], [], [-0.], [], [], [0.1])],
dtype=[('market_ids', 'O', (1,)), ('firm_ids', 'O', (1,)), ('demand_ids', 'O', (1,)), ('supply_ids', 'O', (0,)), ('nesting_ids', 'O', (0,)), ('product_ids', 'O', (1,)), ('clustering_ids', 'O', (0,)), ('ownership', '<f8', (0,)), ('shares', '<f8', (1,)), ('ZD', '<f8', (20,)), ('ZS', '<f8', (0,)), ('ZC', '<f8', (0,)), (((prices,), 'X1'), '<f8', (1,)), (((), 'X2'), '<f8', (0,)), (((), 'X3'), '<f8', (0,)), ('prices', '<f8', (1,))])
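This is hard to read, and if we try to convert it into a pandas.DataFrame, we’ll get an error. This is because each pandas.DataFrame column must be one-dimensional, whereas every field in the record array is stored as a matrix, even single-column fields such as shares (ZD, for example, has 20 columns).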
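A minimal reproduction of that error, assuming a reasonably recent pandas version: the made-up structured array below has only a single-column shares field and a two-column ZD field, and passing it straight to pandas.DataFrame raises a ValueError because both fields come back as two-dimensional arrays. The exact message differs between pandas versions, so the sketch only captures it.

```python
import numpy as np
import pandas as pd

# Made-up stand-in: every field is stored as a matrix, as in problem.products.
products = np.zeros(3, dtype=[('shares', '<f8', (1,)), ('ZD', '<f8', (2,))])

try:
    pd.DataFrame(products)
except ValueError as error:
    # pandas rejects the two-dimensional arrays it finds for each field.
    error_message = str(error)
else:
    error_message = None

print(error_message)
```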
Instead, we’ll use the data_to_dict function to first convert the record array into a dictionary, which pandas can ingest directly. Matrices are converted into multiple fields, one for each column (ZD becomes ZD0 through ZD19), and fields with no columns, such as ZS, are dropped.
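The helper below, flatten_record_array, is a hypothetical illustration of this kind of flattening written only for this example. It is not PyBLP's implementation, so for real data call pyblp.data_to_dict. It assumes every field is either one-dimensional or a matrix. It names the columns of multi-column fields with a numeric suffix, skips empty fields, and, matching the keys printed in the output below, keeps single-column fields such as X1 under their own name.

```python
import numpy as np
import pandas as pd


def flatten_record_array(records):
    """Hypothetical illustration of flattening a record array into a dict.

    Not PyBLP's implementation: single-column fields keep their name,
    a field with k columns becomes name0 ... name{k-1}, and fields
    with zero columns are skipped.
    """
    flat = {}
    for name in records.dtype.names:
        matrix = records[name]
        if matrix.ndim == 1:
            # Treat a plain (non-subarray) field as a one-column matrix.
            matrix = matrix[:, np.newaxis]
        n_columns = matrix.shape[1]
        if n_columns == 1:
            flat[name] = matrix[:, 0]
        else:
            # Zero-column fields never enter this loop, so they are dropped.
            for column in range(n_columns):
                flat[f'{name}{column}'] = matrix[:, column]
    return flat


# Made-up stand-in with the same kinds of fields as problem.products.
dtype = [
    ('market_ids', 'O', (1,)),
    ('ZS', '<f8', (0,)),
    ('shares', '<f8', (1,)),
    ('ZD', '<f8', (3,)),
]
products = np.zeros(2, dtype=dtype)
products['market_ids'] = np.array([['C01Q1'], ['C01Q1']], dtype=object)
products['shares'] = [[0.01], [0.02]]
products['ZD'] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

flat = flatten_record_array(products)
print(list(flat))  # ['market_ids', 'shares', 'ZD0', 'ZD1', 'ZD2']
df = pd.DataFrame(flat)
```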
[4]:
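# Flatten the record array into a dictionary of one-dimensional arrays
x = pyblp.data_to_dict(problem.products)
# Report how many values each flattened field holds
print({k: v.size for k, v in x.items()})
# Build the DataFrame from the dictionary instead of converting the record array again
df = pd.DataFrame(x)
df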
{'market_ids': 2256, 'firm_ids': 2256, 'demand_ids': 2256, 'product_ids': 2256, 'shares': 2256, 'ZD0': 2256, 'ZD1': 2256, 'ZD2': 2256, 'ZD3': 2256, 'ZD4': 2256, 'ZD5': 2256, 'ZD6': 2256, 'ZD7': 2256, 'ZD8': 2256, 'ZD9': 2256, 'ZD10': 2256, 'ZD11': 2256, 'ZD12': 2256, 'ZD13': 2256, 'ZD14': 2256, 'ZD15': 2256, 'ZD16': 2256, 'ZD17': 2256, 'ZD18': 2256, 'ZD19': 2256, 'X1': 2256, 'prices': 2256}
[4]:
| | market_ids | firm_ids | demand_ids | product_ids | shares | ZD0 | ZD1 | ZD2 | ZD3 | ZD4 | ... | ZD12 | ZD13 | ZD14 | ZD15 | ZD16 | ZD17 | ZD18 | ZD19 | X1 | prices |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | C01Q1 | 1 | F1B04 | F1B04 | 0.012417 | -0.249518 | 0.040943 | -1.577566 | -0.269073 | -0.010004 | ... | -0.004142 | -0.035593 | 0.070587 | 0.011768 | 0.017287 | -0.015031 | 0.081201 | -0.015833 | -0.011248 | 0.072088 |
1 | C01Q1 | 1 | F1B06 | F1B06 | 0.007809 | -0.205951 | 0.057100 | -10.383954 | 0.150476 | 0.039816 | ... | 0.002585 | -0.006776 | -0.045453 | 0.000067 | 0.031229 | 0.005841 | -0.032121 | -0.010614 | -0.007135 | 0.114178 |
2 | C01Q1 | 1 | F1B07 | F1B07 | 0.012995 | -0.212031 | 0.046246 | -2.278160 | -0.029976 | 0.002390 | ... | 0.000992 | 0.018425 | 0.081555 | 0.034975 | 0.027932 | 0.013156 | 0.047484 | 0.026800 | 0.023678 | 0.132391 |
3 | C01Q1 | 1 | F1B09 | F1B09 | 0.005770 | -0.170725 | 0.049143 | -1.159784 | -0.244789 | 0.002848 | ... | -0.004274 | 0.026440 | 0.064169 | 0.021496 | 0.032372 | 0.033063 | 0.045501 | 0.036154 | 0.029725 | 0.130344 |
4 | C01Q1 | 1 | F1B11 | F1B11 | 0.017934 | -0.164983 | 0.047168 | -4.737563 | -0.070873 | 0.012273 | ... | -0.004694 | -0.029179 | -0.000454 | -0.045272 | -0.025446 | -0.006794 | -0.007560 | -0.011364 | -0.015585 | 0.154823 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2251 | C65Q2 | 3 | F3B14 | F3B14 | 0.024702 | -0.126940 | 0.002240 | -1.067171 | 0.150626 | 0.037091 | ... | -0.004787 | -0.012775 | -0.059399 | 0.043775 | 0.059339 | -0.021934 | 0.034592 | -0.021052 | -0.017337 | 0.126086 |
2252 | C65Q2 | 4 | F4B02 | F4B02 | 0.007914 | -0.109756 | 0.011192 | 0.458133 | 0.066193 | 0.006838 | ... | 0.009385 | 0.037487 | 0.086225 | 0.060856 | 0.028264 | 0.051264 | 0.032965 | 0.033324 | 0.044542 | 0.199167 |
2253 | C65Q2 | 4 | F4B10 | F4B10 | 0.002229 | -0.119689 | -0.000324 | -1.109521 | 0.175027 | 0.036227 | ... | 0.000884 | 0.037634 | 0.019278 | -0.052403 | -0.018107 | 0.036733 | -0.057647 | 0.035662 | 0.033720 | 0.137017 |
2254 | C65Q2 | 4 | F4B12 | F4B12 | 0.011463 | -0.201890 | 0.000334 | -0.507311 | -0.004538 | 0.031569 | ... | -0.008093 | -0.011750 | -0.036333 | -0.004333 | -0.017427 | -0.006647 | 0.007228 | -0.015403 | -0.004174 | 0.100174 |
2255 | C65Q2 | 6 | F6B18 | F6B18 | 0.026208 | -0.139453 | 0.003468 | -0.285143 | 0.291132 | 0.039259 | ... | -0.006138 | -0.001181 | -0.046888 | -0.023637 | -0.021410 | -0.029402 | -0.025971 | -0.025435 | -0.011956 | 0.127557 |
2256 rows × 27 columns
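If a downstream package needs the ZD instrument columns as a single matrix again, the numbered columns can be gathered back together. The sketch below uses a small made-up frame with the same ZD0, ZD1, ... naming. DataFrame.filter keeps the columns in frame order, which matters once there are ten or more of them, since a lexicographic sort would put ZD10 before ZD2.

```python
import pandas as pd

# Made-up frame mimicking the data_to_dict layout: one column per ZD column.
df = pd.DataFrame({
    'market_ids': ['C01Q1', 'C01Q1'],
    'ZD0': [1.0, 4.0],
    'ZD1': [2.0, 5.0],
    'ZD2': [3.0, 6.0],
    'prices': [0.07, 0.11],
})

# Select ZD0, ZD1, ... by name pattern; filter() preserves the frame's column order.
zd_matrix = df.filter(regex=r'^ZD\d+$').to_numpy()
print(zd_matrix.shape)  # (2, 3)
```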