Download the Jupyter Notebook for this section: data_to_dict.ipynb

Converting Data into a Dictionary Example

[1]:

import pyblp
import numpy as np
import pandas as pd

np.set_printoptions(precision=1)
pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__

[1]:

'1.2.0'

In this example, we’ll convert a dataset constructed by PyBLP into a dictionary that can be more easily ingested by other Python packages. Note that you can also pickle most PyBLP objects, which may be more convenient.

First we’ll initialize a Problem with the fake cereal data from Nevo (2000a).

[2]:

product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)
formulation = pyblp.Formulation('0 + prices', absorb='C(product_ids)')
problem = pyblp.Problem(formulation, product_data)
problem

[2]:

Dimensions:
================================
 T    N     F    K1    MD    ED
---  ----  ---  ----  ----  ----
94   2256   5    1     20    1
================================

Formulations:
==================================
     Column Indices:          0
--------------------------  ------
X1: Linear Characteristics  prices
==================================

The Problem.products attribute is a typical example of the type of NumPy record array that PyBLP uses to structure data throughout the package.

[3]:

problem.products

[3]:

rec.array([(['C01Q1'], [1], ['F1B04'], [], [], ['F1B04'], [], [], [], [0.], [-2.5e-01,  4.1e-02, -1.6e+00, -2.7e-01, -1.0e-02,  6.9e-03, -9.2e-01,  5.1e-03,  1.3e-01,  2.8e-01,  2.0e-01,  2.5e-01, -4.1e-03, -3.6e-02,  7.1e-02,  1.2e-02,  1.7e-02, -1.5e-02,  8.1e-02, -1.6e-02], [], [], [-0.], [], [], [0.1]),
           (['C01Q1'], [1], ['F1B06'], [], [], ['F1B06'], [], [], [], [0.], [-2.1e-01,  5.7e-02, -1.0e+01,  1.5e-01,  4.0e-02,  6.1e-03,  1.1e+00,  8.6e-02,  1.1e-01, -2.7e-02, -1.2e+00, -1.3e-01,  2.6e-03, -6.8e-03, -4.5e-02,  6.7e-05,  3.1e-02,  5.8e-03, -3.2e-02, -1.1e-02], [], [], [-0.], [], [], [0.1]),
           (['C01Q1'], [1], ['F1B07'], [], [], ['F1B07'], [], [], [], [0.], [-2.1e-01,  4.6e-02, -2.3e+00, -3.0e-02,  2.4e-03, -1.3e-02,  3.3e-01, -1.7e-01, -2.3e-01,  3.1e-01,  1.0e+00,  2.0e-01,  9.9e-04,  1.8e-02,  8.2e-02,  3.5e-02,  2.8e-02,  1.3e-02,  4.7e-02,  2.7e-02], [], [], [ 0.], [], [], [0.1]),
           ...,
           (['C65Q2'], [4], ['F4B10'], [], [], ['F4B10'], [], [], [], [0.], [-1.2e-01, -3.2e-04, -1.1e+00,  1.8e-01,  3.6e-02, -1.9e-02,  2.4e-01,  5.4e-02, -3.2e-01,  8.7e-02,  2.7e+00,  1.6e-01,  8.8e-04,  3.8e-02,  1.9e-02, -5.2e-02, -1.8e-02,  3.7e-02, -5.8e-02,  3.6e-02], [], [], [ 0.], [], [], [0.1]),
           (['C65Q2'], [4], ['F4B12'], [], [], ['F4B12'], [], [], [], [0.], [-2.0e-01,  3.3e-04, -5.1e-01, -4.5e-03,  3.2e-02,  6.1e-03,  5.7e-01,  2.3e-02,  1.1e-01,  1.9e-01,  2.1e+00,  1.3e-01, -8.1e-03, -1.2e-02, -3.6e-02, -4.3e-03, -1.7e-02, -6.6e-03,  7.2e-03, -1.5e-02], [], [], [-0.], [], [], [0.1]),
           (['C65Q2'], [6], ['F6B18'], [], [], ['F6B18'], [], [], [], [0.], [-1.4e-01,  3.5e-03, -2.9e-01,  2.9e-01,  3.9e-02,  2.0e-02, -1.9e+00, -4.0e-02,  3.8e-01,  1.1e-01,  3.4e+00,  1.1e-01, -6.1e-03, -1.2e-03, -4.7e-02, -2.4e-02, -2.1e-02, -2.9e-02, -2.6e-02, -2.5e-02], [], [], [-0.], [], [], [0.1])],
          dtype=[('market_ids', 'O', (1,)), ('firm_ids', 'O', (1,)), ('demand_ids', 'O', (1,)), ('supply_ids', 'O', (0,)), ('nesting_ids', 'O', (0,)), ('product_ids', 'O', (1,)), ('clustering_ids', 'O', (0,)), ('lag_indices', '<i8', (0,)), ('ownership', '<f8', (0,)), ('shares', '<f8', (1,)), ('ZD', '<f8', (20,)), ('ZS', '<f8', (0,)), ('ZC', '<f8', (0,)), (((prices,), 'X1'), '<f8', (1,)), (((), 'X2'), '<f8', (0,)), (((), 'X3'), '<f8', (0,)), ('prices', '<f8', (1,))])

This is hard to read, and if we try to convert it directly into a pandas.DataFrame, we’ll get an error. This is because several fields, such as ZD, are matrices with more than one column, and pandas.DataFrame doesn’t support matrix-valued columns.

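Instead, we’ll use the data_to_dict function to first convert the record array into a dictionary, which can be easily ingested by pandas. Matrices are converted into multiple fields, one for each column, so the 20 columns of ZD become ZD0 through ZD19. Fields with no columns, such as supply_ids, do not appear in the output below.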
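To make the column-splitting idea concrete, here is a minimal sketch on a toy structured array. It is not PyBLP's implementation: the helper name flatten_fields and the toy field values are assumptions made for illustration, and the naming scheme (a numeric suffix per column, no suffix for single-column fields) is inferred from the output shown in this example.

```python
import numpy as np

# Toy structured array that mimics the layout of problem.products:
# every field is a sub-array with shape (K,), where K may be 0, 1, or more.
dtype = [
    ('market_ids', 'O', (1,)),
    ('shares', '<f8', (1,)),
    ('ZD', '<f8', (3,)),
    ('ZS', '<f8', (0,)),  # empty field, like ZS in the record array above
]
toy = np.zeros(2, dtype=dtype)
toy['market_ids'][:] = np.array([['A'], ['B']], dtype=object)
toy['shares'][:] = [[0.1], [0.2]]
toy['ZD'][:] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]


def flatten_fields(records):
    """Split each field into one-dimensional columns (hypothetical helper)."""
    columns = {}
    for name in records.dtype.names:
        matrix = records[name]
        if matrix.ndim == 1:
            # Treat a plain field as a matrix with a single column.
            matrix = matrix[:, None]
        if matrix.shape[1] == 1:
            # Single-column fields keep their original name.
            columns[name] = matrix[:, 0]
        else:
            # Multi-column fields get one entry per column; empty fields
            # have zero columns and therefore contribute nothing.
            for k in range(matrix.shape[1]):
                columns[f'{name}{k}'] = matrix[:, k]
    return columns


flat = flatten_fields(toy)
print({name: column.shape for name, column in flat.items()})
```

Every value in the resulting dictionary is one-dimensional and has the same length, which is the shape of input that pandas.DataFrame accepts.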

[4]:

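# Convert the record array into a dictionary of one-dimensional arrays.
data_dict = pyblp.data_to_dict(problem.products)
print({k: v.size for k, v in data_dict.items()})

# Every value has the same length, so the dictionary maps directly onto columns.
df = pd.DataFrame(data_dict)
df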

{'market_ids': 2256, 'firm_ids': 2256, 'demand_ids': 2256, 'product_ids': 2256, 'shares': 2256, 'ZD0': 2256, 'ZD1': 2256, 'ZD2': 2256, 'ZD3': 2256, 'ZD4': 2256, 'ZD5': 2256, 'ZD6': 2256, 'ZD7': 2256, 'ZD8': 2256, 'ZD9': 2256, 'ZD10': 2256, 'ZD11': 2256, 'ZD12': 2256, 'ZD13': 2256, 'ZD14': 2256, 'ZD15': 2256, 'ZD16': 2256, 'ZD17': 2256, 'ZD18': 2256, 'ZD19': 2256, 'X1': 2256, 'prices': 2256}

[4]:

	market_ids	firm_ids	demand_ids	product_ids	shares	ZD0	ZD1	ZD2	ZD3	ZD4	...	ZD12	ZD13	ZD14	ZD15	ZD16	ZD17	ZD18	ZD19	X1	prices
0	C01Q1	1	F1B04	F1B04	0.012417	-0.249518	0.040943	-1.577566	-0.269073	-0.010004	...	-0.004142	-0.035593	0.070587	0.011768	0.017287	-0.015031	0.081201	-0.015833	-0.011248	0.072088
1	C01Q1	1	F1B06	F1B06	0.007809	-0.205951	0.057100	-10.383954	0.150476	0.039816	...	0.002585	-0.006776	-0.045453	0.000067	0.031229	0.005841	-0.032121	-0.010614	-0.007135	0.114178
2	C01Q1	1	F1B07	F1B07	0.012995	-0.212031	0.046246	-2.278160	-0.029976	0.002390	...	0.000992	0.018425	0.081555	0.034975	0.027932	0.013156	0.047484	0.026800	0.023678	0.132391
3	C01Q1	1	F1B09	F1B09	0.005770	-0.170725	0.049143	-1.159784	-0.244789	0.002848	...	-0.004274	0.026440	0.064169	0.021496	0.032372	0.033063	0.045501	0.036154	0.029725	0.130344
4	C01Q1	1	F1B11	F1B11	0.017934	-0.164983	0.047168	-4.737563	-0.070873	0.012273	...	-0.004694	-0.029179	-0.000454	-0.045272	-0.025446	-0.006794	-0.007560	-0.011364	-0.015585	0.154823
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2251	C65Q2	3	F3B14	F3B14	0.024702	-0.126940	0.002240	-1.067171	0.150626	0.037091	...	-0.004787	-0.012775	-0.059399	0.043775	0.059339	-0.021934	0.034592	-0.021052	-0.017337	0.126086
2252	C65Q2	4	F4B02	F4B02	0.007914	-0.109756	0.011192	0.458133	0.066193	0.006838	...	0.009385	0.037487	0.086225	0.060856	0.028264	0.051264	0.032965	0.033324	0.044542	0.199167
2253	C65Q2	4	F4B10	F4B10	0.002229	-0.119689	-0.000324	-1.109521	0.175027	0.036227	...	0.000884	0.037634	0.019278	-0.052403	-0.018107	0.036733	-0.057647	0.035662	0.033720	0.137017
2254	C65Q2	4	F4B12	F4B12	0.011463	-0.201890	0.000334	-0.507311	-0.004538	0.031569	...	-0.008093	-0.011750	-0.036333	-0.004333	-0.017427	-0.006647	0.007228	-0.015403	-0.004174	0.100174
2255	C65Q2	6	F6B18	F6B18	0.026208	-0.139453	0.003468	-0.285143	0.291132	0.039259	...	-0.006138	-0.001181	-0.046888	-0.023637	-0.021410	-0.029402	-0.025971	-0.025435	-0.011956	0.127557

2256 rows × 27 columns
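
If another package needs one of the split matrices back as a single array, the columns can be gathered again by prefix. The sketch below uses a toy DataFrame rather than the real data, and it assumes the prefix-plus-index naming seen above (ZD0, ZD1, ...); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame that follows the same column naming scheme as the output above.
toy_df = pd.DataFrame({
    'market_ids': ['A', 'B'],
    'ZD0': [1.0, 4.0],
    'ZD1': [2.0, 5.0],
    'ZD2': [3.0, 6.0],
    'prices': [0.1, 0.2],
})

# Collect the ZD columns and sort them by numeric suffix, so that a column
# such as ZD10 would come after ZD9 rather than after ZD1.
zd_columns = sorted(
    (c for c in toy_df.columns if c.startswith('ZD') and c[2:].isdigit()),
    key=lambda c: int(c[2:]),
)

# Stack the selected columns back into a single (rows x columns) matrix.
ZD = toy_df[zd_columns].to_numpy()
print(ZD.shape)
```

Sorting by the integer suffix matters for the real data, where a plain string sort would place ZD10 through ZD19 between ZD1 and ZD2.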