pyblp.ProblemResults.compute_micro_scores
ProblemResults.compute_micro_scores(dataset, micro_data, integration=None)
Compute scores for observations \(n \in N_d\) from a micro dataset \(d\).
The score for observation \(n \in N_d\) is
\[\mathscr{S}_n = \frac{\partial\log\mathscr{P}_n}{\partial\theta'}, \tag{1}\]
in which the conditional probability of observation \(n\) is
\[\mathscr{P}_n = \frac{ \sum_{i \in I_n} w_{it_n} s_{ij_nt_n} w_{dij_nt_n} }{ \sum_{t \in T} \sum_{i \in I_t} \sum_{j \in J_t \cup \{0\}} w_{it} s_{ijt} w_{dijt} } \tag{2}\]
where \(i \in I_n\) integrates over unobserved heterogeneity for observation \(n\).
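As a rough numerical illustration of equations (1) and (2), the sketch below evaluates \(\mathscr{P}_n\) and a finite-difference score in a toy setting. Everything in it is an assumption made for illustration and not how pyblp computes scores: a single market, a one-parameter random-coefficient logit, micro weights \(w_{dijt} = 1\) for every agent and choice, and the hypothetical helper names `choice_probabilities` and `finite_difference_scores`.

```python
import numpy as np

def choice_probabilities(theta, delta, x, nodes, weights):
    """Toy version of equation (2): P_n for each choice j in {0 (outside), 1, ..., J}."""
    # Utilities of inside goods: agents i in rows, products j in columns.
    utilities = delta[None, :] + theta * nodes[:, None] * x[None, :]
    exp_u = np.exp(utilities)
    denom = 1.0 + exp_u.sum(axis=1, keepdims=True)
    # Column 0 holds the outside good's choice probability s_{i0t}.
    shares = np.hstack([1.0 / denom, exp_u / denom])
    micro_weights = np.ones_like(shares)  # assumption: w_dijt = 1 everywhere
    weighted = weights[:, None] * shares * micro_weights
    return weighted.sum(axis=0) / weighted.sum()

def finite_difference_scores(theta, *args, step=1e-6):
    """Toy version of equation (1): central difference of log P_n in theta."""
    upper = np.log(choice_probabilities(theta + step, *args))
    lower = np.log(choice_probabilities(theta - step, *args))
    return (upper - lower) / (2 * step)

# Hypothetical inputs: two inside goods and three integration nodes.
delta = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([0.25, 0.5, 0.25])

probs = choice_probabilities(0.7, delta, x, nodes, weights)
toy_scores = finite_difference_scores(0.7, delta, x, nodes, weights)
print(probs, toy_scores)
```

Because the probabilities sum to one over all choices, the probability-weighted sum of the scores should be approximately zero, which makes a convenient sanity check for a sketch like this.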
- Parameters
dataset (MicroDataset) – The MicroDataset for which scores will be computed. The compute_weights function is called separately for each observation \(n\).
micro_data (structured array-like) –
Each row corresponds either to an observation \(n\) or, if there are multiple rows per observation, to an \(i \in I_n\) that integrates over unobserved heterogeneity. In addition to the names of any demographics used in the agent_formulation and any specification of agent-specific product 'availability', the following fields are required:
market_ids : (object) - Market IDs \(t_n\) for each observation \(n\).
choice_indices : (int) - Within-market indices of choices \(j_n\). If compute_weights passed to the dataset returns an array with \(J_t\) elements in its second axis, then choice indices take on values from \(0\) to \(J_t - 1\), where \(0\) corresponds to the first inside good. If it returns an array with \(1 + J_t\) elements in its second axis, then choice indices take on values from \(0\) to \(J_t\), where \(0\) corresponds to the outside good.
If the dataset is configured to support second choice data, second choices are also required:
second_choice_indices : (int, optional) - Within-market indices of second choices \(k_n\). If compute_weights passed to the dataset returns an array with \(J_t\) elements in its third axis, then second choice indices take on values from \(0\) to \(J_t - 1\), where \(0\) corresponds to the first inside good. If it returns an array with \(1 + J_t\) elements in its third axis, then second choice indices take on values from \(0\) to \(J_t\), where \(0\) corresponds to the outside good.
The following fields are required if integration is not specified:
micro_ids : (object, optional) - IDs corresponding to observations \(n\), which should be pre-sorted from smallest to largest.
weights : (numeric, optional) - Integration weights, \(w_{it_n}\), for integration over unobserved heterogeneity \(i \in I_n\).
nodes : (numeric, optional) - Unobserved agent characteristics called integration nodes, \(\nu\). If there are more than \(K_2\) columns (the number of demand-side nonlinear product characteristics), only the first \(K_2\) will be retained. If any columns of sigma are fixed at zero, only the first few columns of these nodes will be used.
If these fields are specified, each row corresponds to an \(i \in I_n\), and there should generally be multiple rows per observation \(n\).
The convenience function build_integration() can be useful when constructing custom nodes and weights.
Note
If nodes has multiple columns, it can be specified as a matrix or broken up into multiple one-dimensional fields with column index suffixes that start at zero. For example, if there are three columns of nodes, a nodes field with three columns can be replaced by three one-dimensional fields: nodes0, nodes1, and nodes2.
integration (Integration, optional) –
Integration configuration for how to build nodes and weights fields in micro_data for each observation \(n\). If this configuration is specified, any micro_ids, weights, and nodes in micro_data will be ignored.
If specified, each row of micro_data is treated as corresponding to a unique observation \(n\), and will be duplicated by as many rows of nodes as are created by the Integration configuration. Specifically, up to \(K_2\) columns of nodes (the number of demand-side nonlinear product characteristics) will be built for each observation \(n\). If there are zeros on the diagonal of \(\Sigma\), nodes will not be built for those characteristics, to cut down on memory usage.
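For the case where integration is not specified, the layout of micro_data described above can be sketched with a plain NumPy structured array. The field names follow the parameter description; the market IDs, choices, nodes, and weights are hypothetical values, and reusing one small product rule for every observation is an assumption made only to keep the example short. Passing the result to compute_micro_scores would additionally require solved results and a MicroDataset, which are not shown.

```python
import numpy as np

# Hypothetical observations: market IDs t_n and within-market choice indices j_n.
market_ids = np.array(['A', 'A', 'B'], dtype=object)
choice_indices = np.array([0, 2, 1])

# Hypothetical integration rule with two columns of nodes, reused for each observation.
nodes = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
weights = np.full(4, 0.25)

N, I = market_ids.size, weights.size

# One row per (observation n, node i) pair, so N * I rows in total.
micro_data = np.zeros(N * I, dtype=[
    ('micro_ids', 'i8'),
    ('market_ids', 'O'),
    ('choice_indices', 'i8'),
    ('weights', 'f8'),
    ('nodes0', 'f8'),  # first column of nodes, split into its own field
    ('nodes1', 'f8'),  # second column of nodes
])

# Observation-level fields are repeated I times; micro_ids end up pre-sorted.
micro_data['micro_ids'] = np.repeat(np.arange(N), I)
micro_data['market_ids'] = np.repeat(market_ids, I)
micro_data['choice_indices'] = np.repeat(choice_indices, I)

# Node-level fields are tiled once per observation.
micro_data['weights'] = np.tile(weights, N)
micro_data['nodes0'] = np.tile(nodes[:, 0], N)
micro_data['nodes1'] = np.tile(nodes[:, 1], N)

print(micro_data[:I])
```

Splitting nodes into nodes0 and nodes1 uses the suffix convention from the note above; a single two-column nodes field would be the equivalent matrix form.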
- Returns
Scores \(\mathscr{S}_n\). The list is in the same order as ProblemResults.theta (also see ProblemResults.theta_labels). Each element of the list is an array of scores for the corresponding parameter. The array is in the same order as observations appear in the micro_data. Note that it is possible for parameters in ProblemResults.theta to mechanically have zero scores, for example if they are on a constant demographic.
Taking the mean of a parameter’s scores delivers the observed value for an optimal MicroMoment that matches the score for that parameter.
If any scores are numpy.nan, this means that the probability of that observation is \(\mathscr{P}_n = 0\), suggesting that the observation was not generated by the sampling process defined by the dataset.
- Return type
list
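A minimal sketch of post-processing the returned list, assuming it has already been obtained from a call such as `results.compute_micro_scores(dataset, micro_data)`. The arrays and labels below are made-up stand-ins for that return value and for ProblemResults.theta_labels, and `summarize_scores` is a hypothetical helper, not part of pyblp. It reports the mean score per parameter and the rows whose scores are numpy.nan.

```python
import numpy as np

def summarize_scores(scores, labels):
    """Map each parameter label to its mean score and the rows with NaN scores."""
    summary = {}
    for label, parameter_scores in zip(labels, scores):
        parameter_scores = np.asarray(parameter_scores, dtype=float)
        # NaN scores correspond to observations with zero probability.
        nan_rows = np.flatnonzero(np.isnan(parameter_scores))
        summary[label] = {
            'mean': parameter_scores.mean(),  # NaN if any score is NaN
            'nan_rows': nan_rows.tolist(),
        }
    return summary

# Stand-in for the returned list: one array per parameter, one entry per observation.
scores = [np.array([0.3, -0.1, 0.2]), np.array([0.0, np.nan, 0.5])]
labels = ['theta_0', 'theta_1']  # hypothetical labels

summary = summarize_scores(scores, labels)
for label, stats in summary.items():
    print(label, stats)
```

The mean is left to propagate NaN on purpose: a NaN score suggests the observation does not belong to the dataset's sampling process, so it seems safer to flag such rows for inspection than to drop them silently.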