
Converting Data into a Dictionary Example

[1]:
import pyblp
import numpy as np
import pandas as pd

np.set_printoptions(precision=1)
pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__
[1]:
'1.1.0'

In this example, we’ll convert a dataset constructed by PyBLP into a dictionary that other Python packages can ingest more easily. Note that most PyBLP objects can also be pickled, which may be more convenient.
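As a rough illustration of the pickling route, the sketch below round-trips a small NumPy structured array through Python's standard pickle module. The array is a hypothetical stand-in for a PyBLP record array such as Problem.products; we assume, without checking it here, that the PyBLP objects you care about pickle the same way.

```python
import pickle

import numpy as np

# Hypothetical stand-in for a PyBLP record array: each field is a small matrix.
records = np.zeros(3, dtype=[("market_ids", "O", (1,)), ("shares", "<f8", (1,))])
records["market_ids"] = np.array([["C01Q1"], ["C01Q1"], ["C01Q2"]], dtype=object)
records["shares"] = [[0.01], [0.02], [0.03]]

# Serialize to bytes and back; pickle.dump and pickle.load do the same with a file.
payload = pickle.dumps(records)
restored = pickle.loads(payload)

print(restored.dtype == records.dtype)  # True
print(restored["shares"].ravel())
```

Pickling keeps the matrix-shaped fields intact, which is convenient for reloading in Python but less useful when another package needs flat columns.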

First we’ll initialize a Problem with the fake cereal data from Nevo (2000a).

[2]:
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
formulation = pyblp.Formulation('0 + prices', absorb='C(product_ids)')
problem = pyblp.Problem(formulation, product_data)
problem
[2]:
Dimensions:
================================
 T    N     F    K1    MD    ED
---  ----  ---  ----  ----  ----
94   2256   5    1     20    1
================================

Formulations:
==================================
     Column Indices:          0
--------------------------  ------
X1: Linear Characteristics  prices
==================================

The Problem.products attribute is a typical example of the NumPy record arrays that PyBLP uses to structure data throughout the package.
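One way to make sense of such an array is to look at its dtype, which lists each field's name, element type, and per-row shape. The sketch below does this on a small hypothetical stand-in array built with NumPy rather than on PyBLP output; on the real object, problem.products.dtype can be inspected the same way.

```python
import numpy as np

# Hypothetical stand-in following the same layout convention as PyBLP record arrays:
# every field is a matrix, possibly with zero columns.
records = np.zeros(4, dtype=[
    ("market_ids", "O", (1,)),
    ("shares", "<f8", (1,)),
    ("ZD", "<f8", (3,)),
    ("ZS", "<f8", (0,)),
]).view(np.recarray)

# dtype[name].shape is the per-row shape of a field, i.e. its number of columns.
columns = {name: records.dtype[name].shape for name in records.dtype.names}
print(columns)  # {'market_ids': (1,), 'shares': (1,), 'ZD': (3,), 'ZS': (0,)}

# Record arrays also allow attribute access; each field is an (N, K) matrix.
print(records.ZD.shape)  # (4, 3)
```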

[3]:
problem.products
[3]:
rec.array([(['C01Q1'], [1], ['F1B04'], [], [], ['F1B04'], [], [], [0.], [-2.5e-01,  4.1e-02, -1.6e+00, -2.7e-01, -1.0e-02,  6.9e-03, -9.2e-01,  5.1e-03,  1.3e-01,  2.8e-01,  2.0e-01,  2.5e-01, -4.1e-03, -3.6e-02,  7.1e-02,  1.2e-02,  1.7e-02, -1.5e-02,  8.1e-02, -1.6e-02], [], [], [-0.], [], [], [0.1]),
           (['C01Q1'], [1], ['F1B06'], [], [], ['F1B06'], [], [], [0.], [-2.1e-01,  5.7e-02, -1.0e+01,  1.5e-01,  4.0e-02,  6.1e-03,  1.1e+00,  8.6e-02,  1.1e-01, -2.7e-02, -1.2e+00, -1.3e-01,  2.6e-03, -6.8e-03, -4.5e-02,  6.7e-05,  3.1e-02,  5.8e-03, -3.2e-02, -1.1e-02], [], [], [-0.], [], [], [0.1]),
           (['C01Q1'], [1], ['F1B07'], [], [], ['F1B07'], [], [], [0.], [-2.1e-01,  4.6e-02, -2.3e+00, -3.0e-02,  2.4e-03, -1.3e-02,  3.3e-01, -1.7e-01, -2.3e-01,  3.1e-01,  1.0e+00,  2.0e-01,  9.9e-04,  1.8e-02,  8.2e-02,  3.5e-02,  2.8e-02,  1.3e-02,  4.7e-02,  2.7e-02], [], [], [ 0.], [], [], [0.1]),
           ...,
           (['C65Q2'], [4], ['F4B10'], [], [], ['F4B10'], [], [], [0.], [-1.2e-01, -3.2e-04, -1.1e+00,  1.8e-01,  3.6e-02, -1.9e-02,  2.4e-01,  5.4e-02, -3.2e-01,  8.7e-02,  2.7e+00,  1.6e-01,  8.8e-04,  3.8e-02,  1.9e-02, -5.2e-02, -1.8e-02,  3.7e-02, -5.8e-02,  3.6e-02], [], [], [ 0.], [], [], [0.1]),
           (['C65Q2'], [4], ['F4B12'], [], [], ['F4B12'], [], [], [0.], [-2.0e-01,  3.3e-04, -5.1e-01, -4.5e-03,  3.2e-02,  6.1e-03,  5.7e-01,  2.3e-02,  1.1e-01,  1.9e-01,  2.1e+00,  1.3e-01, -8.1e-03, -1.2e-02, -3.6e-02, -4.3e-03, -1.7e-02, -6.6e-03,  7.2e-03, -1.5e-02], [], [], [-0.], [], [], [0.1]),
           (['C65Q2'], [6], ['F6B18'], [], [], ['F6B18'], [], [], [0.], [-1.4e-01,  3.5e-03, -2.9e-01,  2.9e-01,  3.9e-02,  2.0e-02, -1.9e+00, -4.0e-02,  3.8e-01,  1.1e-01,  3.4e+00,  1.1e-01, -6.1e-03, -1.2e-03, -4.7e-02, -2.4e-02, -2.1e-02, -2.9e-02, -2.6e-02, -2.5e-02], [], [], [-0.], [], [], [0.1])],
          dtype=[('market_ids', 'O', (1,)), ('firm_ids', 'O', (1,)), ('demand_ids', 'O', (1,)), ('supply_ids', 'O', (0,)), ('nesting_ids', 'O', (0,)), ('product_ids', 'O', (1,)), ('clustering_ids', 'O', (0,)), ('ownership', '<f8', (0,)), ('shares', '<f8', (1,)), ('ZD', '<f8', (20,)), ('ZS', '<f8', (0,)), ('ZC', '<f8', (0,)), (((prices,), 'X1'), '<f8', (1,)), (((), 'X2'), '<f8', (0,)), (((), 'X3'), '<f8', (0,)), ('prices', '<f8', (1,))])

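This is hard to read, and if we try to convert it directly into a pandas.DataFrame, we’ll get an error. This is because every field of the record array is stored as a matrix (even single-column fields such as shares have shape (N, 1)), while pandas.DataFrame columns must be one-dimensional.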

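Instead, we’ll use the data_to_dict function to first convert the record array into a dictionary, which pandas can easily ingest. Matrices are split into multiple fields, one for each column: for example, the 20 columns of ZD become ZD0 through ZD19. As the output below shows, single-column fields keep their original names, and empty fields such as supply_ids are dropped.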
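To make the conversion rule concrete, here is a minimal NumPy-only sketch of the same kind of flattening. flatten_record_array is a hypothetical helper written for illustration, not PyBLP's implementation; its naming rules (drop empty fields, keep single-column names, append a column index otherwise) are inferred from the data_to_dict output below, and it assumes each field has at most one subarray dimension.

```python
import numpy as np


def flatten_record_array(records):
    """Hypothetical helper: split matrix fields into one 1-D array per column."""
    flat = {}
    for name in records.dtype.names:
        values = records[name]
        if values.ndim == 1:
            # A plain scalar field is treated as a single column.
            values = values[:, np.newaxis]
        n_columns = values.shape[1]
        if n_columns == 0:
            continue  # drop empty fields, such as unused ID columns
        if n_columns == 1:
            flat[name] = values[:, 0]
        else:
            for index in range(n_columns):
                flat[f"{name}{index}"] = values[:, index]
    return flat


records = np.zeros(3, dtype=[
    ("market_ids", "O", (1,)),
    ("ZD", "<f8", (2,)),
    ("ZS", "<f8", (0,)),
])
records["market_ids"] = np.array([["C01Q1"], ["C01Q1"], ["C01Q2"]], dtype=object)
records["ZD"] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

flat = flatten_record_array(records)
print(list(flat))  # ['market_ids', 'ZD0', 'ZD1']
# Every value is now one-dimensional, so pd.DataFrame(flat) can build a frame from it.
```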

[4]:
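# Convert the record array into a dictionary of one-dimensional arrays.
x = pyblp.data_to_dict(problem.products)
print({k: v.size for k, v in x.items()})

# Reuse the dictionary rather than converting the record array a second time.
df = pd.DataFrame(x)
df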
{'market_ids': 2256, 'firm_ids': 2256, 'demand_ids': 2256, 'product_ids': 2256, 'shares': 2256, 'ZD0': 2256, 'ZD1': 2256, 'ZD2': 2256, 'ZD3': 2256, 'ZD4': 2256, 'ZD5': 2256, 'ZD6': 2256, 'ZD7': 2256, 'ZD8': 2256, 'ZD9': 2256, 'ZD10': 2256, 'ZD11': 2256, 'ZD12': 2256, 'ZD13': 2256, 'ZD14': 2256, 'ZD15': 2256, 'ZD16': 2256, 'ZD17': 2256, 'ZD18': 2256, 'ZD19': 2256, 'X1': 2256, 'prices': 2256}
[4]:
market_ids firm_ids demand_ids product_ids shares ZD0 ZD1 ZD2 ZD3 ZD4 ... ZD12 ZD13 ZD14 ZD15 ZD16 ZD17 ZD18 ZD19 X1 prices
0 C01Q1 1 F1B04 F1B04 0.012417 -0.249518 0.040943 -1.577566 -0.269073 -0.010004 ... -0.004142 -0.035593 0.070587 0.011768 0.017287 -0.015031 0.081201 -0.015833 -0.011248 0.072088
1 C01Q1 1 F1B06 F1B06 0.007809 -0.205951 0.057100 -10.383954 0.150476 0.039816 ... 0.002585 -0.006776 -0.045453 0.000067 0.031229 0.005841 -0.032121 -0.010614 -0.007135 0.114178
2 C01Q1 1 F1B07 F1B07 0.012995 -0.212031 0.046246 -2.278160 -0.029976 0.002390 ... 0.000992 0.018425 0.081555 0.034975 0.027932 0.013156 0.047484 0.026800 0.023678 0.132391
3 C01Q1 1 F1B09 F1B09 0.005770 -0.170725 0.049143 -1.159784 -0.244789 0.002848 ... -0.004274 0.026440 0.064169 0.021496 0.032372 0.033063 0.045501 0.036154 0.029725 0.130344
4 C01Q1 1 F1B11 F1B11 0.017934 -0.164983 0.047168 -4.737563 -0.070873 0.012273 ... -0.004694 -0.029179 -0.000454 -0.045272 -0.025446 -0.006794 -0.007560 -0.011364 -0.015585 0.154823
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2251 C65Q2 3 F3B14 F3B14 0.024702 -0.126940 0.002240 -1.067171 0.150626 0.037091 ... -0.004787 -0.012775 -0.059399 0.043775 0.059339 -0.021934 0.034592 -0.021052 -0.017337 0.126086
2252 C65Q2 4 F4B02 F4B02 0.007914 -0.109756 0.011192 0.458133 0.066193 0.006838 ... 0.009385 0.037487 0.086225 0.060856 0.028264 0.051264 0.032965 0.033324 0.044542 0.199167
2253 C65Q2 4 F4B10 F4B10 0.002229 -0.119689 -0.000324 -1.109521 0.175027 0.036227 ... 0.000884 0.037634 0.019278 -0.052403 -0.018107 0.036733 -0.057647 0.035662 0.033720 0.137017
2254 C65Q2 4 F4B12 F4B12 0.011463 -0.201890 0.000334 -0.507311 -0.004538 0.031569 ... -0.008093 -0.011750 -0.036333 -0.004333 -0.017427 -0.006647 0.007228 -0.015403 -0.004174 0.100174
2255 C65Q2 6 F6B18 F6B18 0.026208 -0.139453 0.003468 -0.285143 0.291132 0.039259 ... -0.006138 -0.001181 -0.046888 -0.023637 -0.021410 -0.029402 -0.025971 -0.025435 -0.011956 0.127557

2256 rows × 27 columns
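Going the other way, the per-column fields can be stacked back into a matrix when another tool expects one. The sketch below uses a small hypothetical DataFrame, named sample, laid out like df above, and selects the ZD columns with pandas' DataFrame.filter and a regular expression; it is an illustration, not a PyBLP function.

```python
import pandas as pd

# Small hypothetical frame with the same column naming as df above.
sample = pd.DataFrame({
    "market_ids": ["C01Q1", "C01Q1", "C01Q2"],
    "ZD0": [0.1, 0.3, 0.5],
    "ZD1": [0.2, 0.4, 0.6],
    "prices": [0.07, 0.11, 0.13],
})

# Select ZD0, ZD1, ... (filter keeps the frame's column order) and stack them into a matrix.
ZD = sample.filter(regex=r"^ZD\d+$").to_numpy()
print(ZD.shape)  # (3, 2)
```

Because filter preserves the existing column order, the recovered matrix lines up with the original ZD columns as long as they were written in index order, as data_to_dict does above.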