pyblp.ProblemResults.compute_micro_scores

ProblemResults.compute_micro_scores(dataset, micro_data, integration=None)

Compute scores for observations \(n \in N_d\) from a micro dataset \(d\).

The score for observation \(n \in N_d\) is

\[\mathscr{S}_n = \frac{\partial\log\mathscr{P}_n}{\partial\theta'}, \tag{1}\]

in which the conditional probability of observation \(n\) is

\[\mathscr{P}_n = \frac{ \sum_{i \in I_n} w_{it_n} s_{ij_nt_n} w_{dij_nt_n} }{ \sum_{t \in T} \sum_{i \in I_t} \sum_{j \in J_t \cup \{0\}} w_{it} s_{ijt} w_{dijt} }, \tag{2}\]

where \(i \in I_n\) integrates over unobserved heterogeneity for observation \(n\).
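As an illustration of equation (2), the following is a minimal NumPy sketch for a single market in which \(I_n\) is taken to be all agents in the market. All arrays and the helper observation_probability are made-up assumptions for this example; this is not how pyblp computes the probability internally.

```python
import numpy as np

# Toy single-market setup; every number here is made up for illustration.
w = np.array([0.5, 0.3, 0.2])  # agent weights w_{it}
s = np.array([                 # choice probabilities s_{ijt}; column 0 is the outside good
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
w_d = np.array([               # dataset weights w_{dijt}; the outside good is never sampled
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
])


def observation_probability(j_n):
    """Hypothetical helper: P_n for an observation that chose j_n in this market."""
    numerator = np.sum(w * s[:, j_n] * w_d[:, j_n])
    denominator = np.sum(w[:, None] * s * w_d)
    return numerator / denominator


# One probability per possible choice, including the outside good.
probabilities = np.array([observation_probability(j) for j in range(s.shape[1])])
```

Because the denominator sums the numerator over every choice the dataset can sample, the probabilities add up to one, and a choice with zero dataset weight has probability zero.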

Parameters
  • dataset (MicroDataset) – The MicroDataset for which scores will be computed. The compute_weights function is called separately for each observation \(n\).

  • micro_data (structured array-like) –

    Each row corresponds either to an observation \(n\) or, if there are multiple rows per observation, to an \(i \in I_n\) that integrates over unobserved heterogeneity. In addition to the names of any demographics used in the agent_formulation and any specification of agent-specific product 'availability', the following fields are required:

    • market_ids : (object) - Market IDs \(t_n\) for each observation \(n\).

    • choice_indices : (int) - Within-market indices of choices \(j_n\). If compute_weights passed to the dataset returns an array with \(J_t\) elements in its second axis, then choice indices take on values from \(0\) to \(J_t - 1\) where \(0\) corresponds to the first inside good. If it returns an array with \(1 + J_t\) elements in its second axis, then choice indices take on values from \(0\) to \(J_t\) where \(0\) corresponds to the outside good.

    If the dataset is configured to support second choice data, second choices are also required:

    • second_choice_indices : (int, optional) - Within-market indices of second choices \(k_n\). If compute_weights passed to the dataset returns an array with \(J_t\) elements in its third axis, then second choice indices take on values from \(0\) to \(J_t - 1\) where \(0\) corresponds to the first inside good. If it returns an array with \(1 + J_t\) elements in its third axis, then second choice indices take on values from \(0\) to \(J_t\) where \(0\) corresponds to the outside good.

    The following fields are required if integration is not specified:

    • micro_ids : (object, optional) - IDs corresponding to observations \(n\), which should be pre-sorted in ascending order.

    • weights : (numeric, optional) - Integration weights, \(w_{it_n}\), for integration over unobserved heterogeneity \(i \in I_n\).

    • nodes : (numeric, optional) - Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of demand-side nonlinear product characteristics), only the first \(K_2\) will be retained. If any columns of sigma are fixed at zero, only the first few columns of these nodes will be used.

    If these fields are specified, each row corresponds to an \(i \in I_n\), and there should generally be multiple rows per observation \(n\).

    The convenience function build_integration() can be useful when constructing custom nodes and weights.

    Note

    If nodes has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes field with three columns can be replaced by three one-dimensional fields: nodes0, nodes1, and nodes2.

  • integration (Integration, optional) –

    Integration configuration for how to build nodes and weights fields in micro_data for each observation \(n\). If this configuration is specified, any micro_ids, weights, and nodes in micro_data will be ignored.

    If specified, each row of micro_data is treated as corresponding to a unique observation \(n\), and will be duplicated by as many rows of nodes as are created by the Integration configuration. Specifically, up to \(K_2\) columns of nodes (the number of demand-side nonlinear product characteristics) will be built for each observation \(n\). If there are zeros on the diagonal of \(\Sigma\), nodes will not be built for those characteristics, to cut down on memory usage.
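The two layouts of micro_data described above can be sketched with NumPy structured arrays. This is an illustration under assumptions: the market IDs, choices, number of draws, and Monte Carlo nodes are made up, integer micro_ids are used for simplicity, and the manual expansion only mimics the row duplication described for integration rather than reproducing what pyblp does internally.

```python
import numpy as np

# Layout 1: one row per observation n, suitable when integration is specified.
compact = np.zeros(2, dtype=[('market_ids', 'O'), ('choice_indices', 'i8')])
compact['market_ids'] = ['A', 'B']   # made-up market IDs t_n
compact['choice_indices'] = [1, 0]   # made-up within-market choices j_n

# Layout 2: several rows per observation, with explicit IDs, weights, and nodes.
size, K2 = 3, 2                      # assumed draws per observation and nonlinear characteristics
rng = np.random.default_rng(0)
rows = np.repeat(np.arange(compact.size), size)  # each observation repeated `size` times

explicit = np.zeros(rows.size, dtype=[
    ('micro_ids', 'i8'), ('market_ids', 'O'), ('choice_indices', 'i8'),
    ('weights', 'f8'), ('nodes0', 'f8'), ('nodes1', 'f8'),
])
explicit['micro_ids'] = rows                          # already sorted in ascending order
explicit['market_ids'] = compact['market_ids'][rows]
explicit['choice_indices'] = compact['choice_indices'][rows]
explicit['weights'] = 1 / size                        # equal weights within each observation

# Nodes split into one-dimensional fields with zero-based column suffixes.
nodes = rng.standard_normal((rows.size, K2))
explicit['nodes0'] = nodes[:, 0]
explicit['nodes1'] = nodes[:, 1]
```

With the first layout, an Integration configuration would be passed through the integration argument; with the second, integration is left as None and the micro_ids, weights, and nodes fields are used as given.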

Returns

A list of scores \(\mathscr{S}_n\), in the same order as ProblemResults.theta (also see ProblemResults.theta_labels). Each element of the list is an array of scores for the corresponding parameter, in the same order as observations appear in micro_data. Note that parameters in ProblemResults.theta can mechanically have zero scores, for example, if they are on a constant demographic.

Taking the mean of a parameter’s scores delivers the observed value for an optimal MicroMoment that matches the score for that parameter.

A score of numpy.nan means that the probability of that observation is \(\mathscr{P}_n = 0\), which suggests that the observation was not generated by the sampling process defined by the dataset.
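A minimal sketch of post-processing the returned list follows. The scores and theta_labels values here are made-up stand-ins for the method's output and for ProblemResults.theta_labels, and dropping flagged observations before averaging is a choice made for this example, not something the library prescribes.

```python
import numpy as np

# Made-up stand-ins: one array of scores per parameter, one label per parameter.
theta_labels = ['sigma_x', 'pi_x_income']  # hypothetical labels
scores = [
    np.array([0.2, -0.1, np.nan, 0.4]),
    np.array([1.0, 0.5, np.nan, -0.5]),
]

# Flag observations with a NaN score for any parameter (probability zero under the dataset).
flagged = np.isnan(np.column_stack(scores)).any(axis=1)

# Mean score per parameter over the remaining observations.
score_means = {
    label: float(np.mean(parameter_scores[~flagged]))
    for label, parameter_scores in zip(theta_labels, scores)
}
```

A plain mean over an array containing numpy.nan is itself numpy.nan, so flagged observations are worth inspecting before any averaging.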

Return type

list