pyblp.Problem.solve¶
-
Problem.solve(sigma=None, pi=None, rho=None, phi=None, beta=None, gamma=None, sigma_bounds=None, pi_bounds=None, rho_bounds=None, phi_bounds=None, beta_bounds=None, gamma_bounds=None, delta=None, method='2s', initial_update=None, optimization=None, scale_objective=True, check_optimality='both', finite_differences=False, error_behavior='revert', error_punishment=1, delta_behavior='first', iteration=None, fp_type='safe_linear', shares_bounds=(1e-300, None), costs_bounds=None, W=None, center_moments=True, W_type='robust', se_type='robust', demand_moment_types='levels', supply_moment_types='levels', covariance_moments_mean=0, micro_moments=(), micro_sample_covariances=None, resample_agent_data=None)¶ Solve the problem.
The problem is solved in one or more GMM steps. During each step, any parameters in \(\theta\) are optimized to minimize the GMM objective value, giving the estimated \(\hat{\theta}\). If there are no parameters in \(\theta\) (for example, in the logit model there are no nonlinear parameters and all linear parameters can be concentrated out), the objective is evaluated once during the step.
If there are nonlinear parameters, the mean utility, \(\delta(\theta)\) is computed market-by-market with fixed point iteration. Otherwise, it is computed analytically according to the solution of the logit model. If a supply side is to be estimated, marginal costs, \(c(\theta)\), are also computed market-by-market. Linear parameters are then estimated, which are used to recover structural error terms, which in turn are used to form the objective value. By default, the objective gradient is computed as well.
Note
This method supports
parallel()processing. If multiprocessing is used, market-by-market computation of \(\delta(\theta)\) (and \(\tilde{c}(\theta)\) if a supply side is estimated), along with associated Jacobians, will be distributed among the processes.- Parameters
sigma (array-like, optional) –
Configuration for which elements in the lower-triangular Cholesky root of the covariance matrix for unobserved taste heterogeneity, \(\Sigma\), are fixed at zero and starting values for the other elements, which, if not fixed by
sigma_bounds, are in the vector of unknown elements, \(\theta\).Rows and columns correspond to columns in \(X_2\), which is formulated according
product_formulationsinProblem. If \(X_2\) was not formulated, this should not be specified, since the logit model will be estimated.Values above the diagonal are ignored. Zeros are assumed to be zero throughout estimation and nonzeros are, if not fixed by
sigma_bounds, starting values for unknown elements in \(\theta\). If any columns are fixed at zero, only the first few columns of integration nodes (specified inProblem) will be used.To have nonzero covariances for only a subset of the random coefficients, the characteristics for those random coefficients with zero covariances should come first in \(X_2\). This can be seen by looking at the expression for \(\Sigma\Sigma'\), the actual covariance matrix of the random coefficients.
pi (array-like, optional) –
Configuration for which elements in the matrix of parameters that measures how agent tastes vary with demographics, \(\Pi\), are fixed at zero and starting values for the other elements, which, if not fixed by
pi_bounds, are in the vector of unknown elements, \(\theta\).Rows correspond to the same product characteristics as in
sigma. Columns correspond to columns in \(d\), which is formulated according toagent_formulationinProblem. If \(d\) was not formulated, this should not be specified.Zeros are assumed to be zero throughout estimation and nonzeros are, if not fixed by
pi_bounds, starting values for unknown elements in \(\theta\).rho (array-like, optional) –
Configuration for which elements in the vector of parameters that measure within nesting group correlation, \(\rho\), are fixed at zero and starting values for the other elements, which, if not fixed by
rho_bounds, are in the vector of unknown elements, \(\theta\).If this is a scalar, it corresponds to all groups defined by the
nesting_idsfield ofproduct_datainProblem. If this is a vector, it must have \(H\) elements, one for each nesting group. Elements correspond to group IDs in the sorted order ofProblem.unique_nesting_ids. If nesting IDs were not specified, this should not be specified either.Zeros are assumed to be zero throughout estimation and nonzeros are, if not fixed by
rho_bounds, starting values for unknown elements in \(\theta\).phi (array-like, optional) –
Configuration for which elements in the vector of parameters that measure unobservable autocorrelation, \(\phi\), are fixed at zero and starting values for the other elements, which, if not fixed by
phi_bounds, are in the vector of unknown elements, \(\theta\).If
lag_indicesinproduct_datawere specified, this is either a scalar \(\phi = \phi_\xi\) or, if a supply side was specified, a \(2 \times 2\) matrix(1)\[\begin{split}\phi = \begin{bmatrix} \phi_\xi & \phi_{\xi\omega} \\ \phi_{\omega\xi} & \phi_\omega \end{bmatrix}\end{split}\]corresponding to demand- and supply-side unobservable autocorrelations:
(2)\[\begin{split}\xi_{jt} = \phi_\xi \cdot L \xi_{jt} + \phi_{\xi\omega} \cdot L \omega_{jt} + \tilde{\xi}_{jt}, \\ \omega_{jt} = \phi_\omega \cdot L \omega_{jt} + \phi_{\omega\xi} \cdot L \xi_{jt} + \tilde{\omega}_{jt},\end{split}\]where the
lag_indicesfield inproduct_datadefines the lag operator \(L\).beta (array-like, optional) –
Configuration for which elements in the vector of demand-side linear parameters, \(\beta\), are concentrated out of the problem. Usually, this is left unspecified, unless there is a supply side, in which case parameters on endogenous product characteristics cannot be concentrated out of the problem. Values specify which elements are fixed at zero and starting values for the other elements, which, if not fixed by
beta_bounds, are in the vector of unknown elements, \(\theta\).Elements correspond to columns in \(X_1\), which is formulated according to
product_formulationsinProblem.Both
Noneandnumpy.nanindicate that the parameter should be concentrated out of the problem. That is, it will be estimated, but does not have to be included in \(\theta\). Zeros are assumed to be zero throughout estimation and nonzeros are, if not fixed bybeta_bounds, starting values for unknown elements in \(\theta\).gamma (array-like, optional) –
Configuration for which elements in the vector of supply-side linear parameters, \(\gamma\), are concentrated out of the problem. Usually, this is left unspecified. Values specify which elements are fixed at zero and starting values for the other elements, which, if not fixed by
gamma_bounds, are in the vector of unknown elements, \(\theta\).Elements correspond to columns in \(X_3\), which is formulated according to
product_formulationsinProblem. If \(X_3\) was not formulated, this should not be specified.Both
Noneandnumpy.nanindicate that the parameter should be concentrated out of the problem. That is, it will be estimated, but does not have to be included in \(\theta\). Zeros are assumed to be zero throughout estimation and nonzeros are, if not fixed bygamma_bounds, starting values for unknown elements in \(\theta\).sigma_bounds (tuple, optional) –
Configuration for \(\Sigma\) bounds of the form
(lb, ub), in which bothlbandubare of the same size assigma. Each element inlbandubdetermines the lower and upper bound for its counterpart insigma. Ifoptimizationdoes not support bounds, these will be ignored. If bounds are supported, the diagonal ofsigmais by default bounded from below by zero.Values above the diagonal are ignored. Lower and upper bounds corresponding to zeros in
sigmaare set to zero. Setting a lower bound equal to an upper bound fixes the corresponding element, removing it from \(\theta\). BothNoneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.pi_bounds (tuple, optional) –
Configuration for \(\Pi\) bounds of the form
(lb, ub), in which bothlbandubare of the same size aspi. Each element inlbandubdetermines the lower and upper bound for its counterpart inpi. Ifoptimizationdoes not support bounds, these will be ignored. By default,piis unbounded.Lower and upper bounds corresponding to zeros in
piare set to zero. Setting a lower bound equal to an upper bound fixes the corresponding element, removing it from \(\theta\). BothNoneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.rho_bounds (tuple, optional) –
Configuration for \(\rho\) bounds of the form
(lb, ub), in which bothlbandubare of the same size asrho. Each element inlbandubdetermines the lower and upper bound for its counterpart inrho. Ifoptimizationdoes not support bounds, these will be ignored.If bounds are supported,
rhois by default bounded from below by0, which corresponds to the simple logit model, and bounded from above by0.99because values greater than1are inconsistent with utility maximization.Lower and upper bounds corresponding to zeros in
rhoare set to zero. Setting a lower bound equal to an upper bound fixes the corresponding element, removing it from \(\theta\). BothNoneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.phi_bounds (tuple, optional) –
Configuration for \(\phi\) bounds of the form
(lb, ub), in which bothlbandubare of the same size asphi. Ifoptimizationdoes not support bounds, these will be ignored.Setting a lower bound equal to an upper bound fixes this parameter, removing it from \(\theta\). Both
Noneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.beta_bounds (tuple, optional) –
Configuration for \(\beta\) bounds of the form
(lb, ub), in which bothlbandubare of the same size asbeta. Each element inlbandubdetermines the lower and upper bound for its counterpart inbeta. Ifoptimizationdoes not support bounds, these will be ignored.Usually, this is left unspecified unless there is a supply side, in which case parameters on endogenous product characteristics cannot be concentrated out of the problem. It is generally a good idea to constrain such parameters to be nonzero so that the intra-firm Jacobian of shares with respect to prices does not become singular.
By default, all non-concentrated out parameters are unbounded. Bounds should only be specified for parameters that are included in \(\theta\); that is, those with initial values specified in
beta.Lower and upper bounds corresponding to zeros in
betaare set to zero. Setting a lower bound equal to an upper bound fixes the corresponding element, removing it from \(\theta\). BothNoneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.gamma_bounds (tuple, optional) –
Configuration for \(\gamma\) bounds of the form
(lb, ub), in which bothlbandubare of the same size asgamma. Each element inlbandubdetermines the lower and upper bound for its counterpart ingamma. Ifoptimizationdoes not support bounds, these will be ignored.By default, all non-concentrated out parameters are unbounded. Bounds should only be specified for parameters that are included in \(\theta\); that is, those with initial values specified in
gamma.Lower and upper bounds corresponding to zeros in
gammaare set to zero. Setting a lower bound equal to an upper bound fixes the corresponding element, removing it from \(\theta\). BothNoneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.delta (array-like, optional) – Initial values for the mean utility, \(\delta\). If there are any nonlinear parameters, these are the values at which the fixed point iteration routine will start during the first objective evaluation. By default, the solution to the logit model in (45) is used. If \(\rho\) is specified, the solution to the nested logit model in (46) under the initial
rhois used instead.method (str, optional) –
The estimation routine that will be used. The following methods are supported:
'1s'- One-step GMM.'2s'(default) - Two-step GMM.
Iterated GMM can be manually implemented by executing single GMM steps in a loop, in which after the first iteration, nonlinear parameters and weighting matrices from the last
ProblemResultsare passed as arguments.initial_update (bool, optional) –
Whether to update starting values for the mean utility \(\delta\) and the weighting matrix \(W\) at the initial parameter values before the first GMM step. This initial update will be called a zeroth step.
By default, an initial update will not be used unless
micro_momentsare specified without an initial weighting matrixW.Note
When trying multiple parameter starting values to verify that the optimization routine converges to the same optimum, using
initial_updateis not recommended because different weighting matrices will be used for these different runs. A better option is to useoptimization=Optimization('return')at the best guess for parameter values and passProblemResults.updated_WtoWfor each set of different parameter starting values.optimization (Optimization, optional) –
Optimizationconfiguration for how to solve the optimization problem in each GMM step, which is only used if there are unfixed nonlinear parameters over which to optimize. By default,Optimization('l-bfgs-b', {'ftol': 0, 'gtol': 1e-8})is used. If available,Optimization('knitro')may be preferable. Generally, it is recommended to consider a number of different optimization routines and starting values, verifying that \(\hat{\theta}\) satisfies both the first and second order conditions. Choosing a routine that supports bounds (and configuring bounds) is typically a good idea. Choosing a routine that does not use analytic gradients will often down estimation.scale_objective (bool, optional) –
Whether to scale the objective in (10) by \(N\), the number of observations, in which case the objective after two GMM steps is equal to the \(J\) statistic from Hansen (1982). By default, the objective is scaled by \(N\).
In theory the scale of the objective should not matter, but in practice having similar objective values for different problem sizes is helpful because similar optimization tolerances can be used.
check_optimality (str, optional) –
How to check for optimality (first and second order conditions) after the optimization routine finishes. The following configurations are supported:
'gradient'- Analytically compute the gradient after optimization finishes, but do not compute the Hessian. Since Jacobians needed to compute standard errors will already be computed, gradient computation will not take a long time. This option may be useful if Hessian computation takes a long time when, for example, there are a large number of parameters.'both'(default) - Also compute the Hessian with central finite differences after optimization finishes.
finite_differences (bool, optional) –
Whether to use finite differences to compute Jacobians and the gradient instead of analytic expressions. Since finite differences comes with numerical approximation error and is typically slower, analytic expressions are used by default.
One situation in which finite differences may be preferable is when there are a sufficiently large number of products and integration nodes in individual markets to make computing analytic Jacobians infeasible because of memory requirements. Note that an analytic expression for the Hessian has not been implemented, so when computed it is always approximated with finite differences.
error_behavior (str, optional) –
How to handle any errors. For example, there can sometimes be overflow or underflow when computing \(\delta(\theta)\) at a large \(\hat{\theta}\). The following behaviors are supported:
'revert'(default) - Revert problematic values to their last computed values. If there are problematic values during the first objective evaluation, revert values in \(\delta(\theta)\) to their starting values; in \(\tilde{c}(\hat{\theta})\), to prices; in the objective, to1e10; and in other matrices such as Jacobians, to zeros.'punish'- Set the objective to1and its gradient to all zeros. This option along with a largeerror_punishmentcan be helpful for routines that do not use analytic gradients.'raise'- Raise an exception.
error_punishment (float, optional) – How to scale the GMM objective value after an error. By default, the objective value is not scaled.
delta_behavior (str, optional) –
Configuration for the values at which the fixed point computation of \(\delta(\theta)\) in each market will start. This configuration is only relevant if there are unfixed nonlinear parameters over which to optimize. The following behaviors are supported:
'first'(default) - Start at the values configured bydeltaduring the first GMM step, and at the values computed by the last GMM step for each subsequent step.'logit'- Start at the solution to the logit model in (45), or if \(\rho\) is specified, the solution to the nested logit model in (46). If the initialdeltais left unspecified and there is no nesting parameter being optimized over, this will generally be equivalent to'first'.'last'- Start at the values of \(\delta(\theta)\) computed during the last objective evaluation, or, if this is the first evaluation, at the values configured bydelta. This behavior tends to speed up computation but may introduce some instability into estimation.
iteration (Iteration, optional) –
Iterationconfiguration for how to solve the fixed point problem used to compute \(\delta(\theta)\) in each market. This configuration is only relevant if there are nonlinear parameters, since \(\delta\) can be estimated analytically in the logit model. By default,Iteration('squarem', {'atol': 1e-14})is used. Newton-based routines such asIteration('lm'`)that compute the Jacobian can often be faster (especially when there are nesting parameters), but the Jacobian-free SQUAREM routine is used by default because it speed is often comparable and in practice it can be slightly more stable.fp_type (str, optional) –
Configuration for the type of contraction mapping used to compute \(\delta(\theta)\). The following types are supported:
'safe_linear'(default) - The standard linear contraction mapping in (13) (or (44) when there is nesting) with safeguards against numerical overflow. Specifically, \(\max_j V_{ijt}\) (or \(\max_j V_{ijt} / (1 - \rho_{h(j)})\) when there is nesting) is subtracted from \(V_{ijt}\) and the logit expression for choice probabilities in (5) (or (42)) is re-scaled accordingly. Such re-scaling is known as the log-sum-exp trick.'linear'- The standard linear contraction mapping without safeguards against numerical overflow. This option may be preferable to'safe_linear'if utilities are reasonably small and unlikely to create overflow problems.'nonlinear'- Iteration over \(\exp \delta_{jt}\) instead of \(\delta_{jt}\). This can be faster than'linear'because it involves fewer logarithms. Also, following Brunner, Heiss, Romahn, and Weiser (2017), the \(\exp \delta_{jt}\) term can be cancelled out of the expression because it also appears in the numerator of (5) in the definition of \(s_{jt}(\delta, \theta)\). This second trick only works when there are no nesting parameters.'safe_nonlinear'- Exponentiated version with minimal safeguards against numerical overflow. Specifically, \(\max_j \mu_{ijt}\) is subtracted from \(\mu_{ijt}\). This helps with stability but is less helpful than subtracting from the full \(V_{ijt}\), so this version is less stable than'safe_linear'.
This option is only relevant if
sigmaorpiare specified because \(\delta\) can be estimated analytically in the logit model with (45) and in the nested logit model with (46).shares_bounds (tuple, optional) –
Configuration for \(s_{jt}(\delta, \theta)\) bounds in the contraction in (13) of the form
(lb, ub), in which bothlbandubare floats orNone. By default, simulated shares are bounded from below by1e-300. This is only relevant iffp_typeis'safe_linear'or'linear'. Bounding shares in the contraction does nothing with a nonlinear fixed point.It can be particularly helpful to bound shares in the contraction from below by a small number to prevent the contraction from failing when there are issues with zero or negative simulated shares. Zero shares can occur when there are underflow issues and negative shares can occur when there are issues with the numerical integration routine having negative integration weights (e.g., for sparse grid integration).
The idea is that a small lower bound will allow the contraction to converge even when it encounters some issues with small or negative shares. However, if these issues are unlikely, disabling this behavior can speed up the iteration routine because fewer checks will be done.
Both
Noneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.costs_bounds (tuple, optional) –
Configuration for \(c_{jt}(\theta)\) bounds of the form
(lb, ub), in which bothlbandubare floats orNone. This is only relevant if \(X_3\) was formulated byproduct_formulationsinProblem. By default, marginal costs are unbounded.When
costs_typeinProblemis'log', nonpositive \(c(\theta)\) values can create problems when computing \(\tilde{c}(\theta) = \log c(\theta)\). One solution is to setlbto a small number. Rows in Jacobians associated with clipped marginal costs will be zero.Both
Noneandnumpy.nanare converted to-numpy.infinlband tonumpy.infinub.W (array-like, optional) – Starting values for the weighting matrix, \(W\). By default, the 2SLS weighting matrix in (23) is used, unless there are any
micro_moments, in which case aninitial_updatewill be used to update starting values \(W\) and the mean utility \(\delta\) at the initial parameter values before the first GMM step.center_moments (bool, optional) – Whether to center each column of the demand- and supply-side moments \(g\) before updating the weighting matrix \(W\) according to (24). By default, the moments are centered. This has no effect if
W_typeis'unadjusted'.W_type (str, optional) –
How to update the weighting matrix. This has no effect if
methodis'1s'. Usually,se_typeshould be the same. The following types are supported:'robust'(default) - Heteroscedasticity robust weighting matrix defined in (24) and (25).'clustered'- Clustered weighting matrix defined in (24) and (26). Clusters must be defined by theclustering_idsfield ofproduct_datainProblem.'unadjusted'- Homoskedastic weighting matrix defined in (24) and (28).
This only affects the standard demand- and supply-side block of the updated weighting matrix. If there are micro moments, this matrix will be block-diagonal with a micro moment block equal to the inverse of the scaled covariance matrix defined in (36).
se_type (str, optional) –
How to compute parameter covariances and standard errors. Usually,
W_typeshould be the same. The following types are supported:'robust'(default) - Heteroscedasticity robust covariances defined in (30) and (25).'clustered'- Clustered covariances defined in (30) and (26). Clusters must be defined by theclustering_idsfield ofproduct_datainProblem.'unadjusted'- Homoskedastic covariances defined in (31), which are computed under the assumption that the weighting matrix is optimal.
This only affects the standard demand- and supply-side block of the matrix of averaged moment covariances. If there are micro moments, the \(S\) matrix defined in the expressions referenced above will be block-diagonal with a micro moment block equal to the scaled covariance matrix defined in (36).
demand_moment_types (str or sequence of (str, int) tuples, optional) –
This can be used to replace demand-side moments \(\bar{g}_D\). The following types are supported:
'levels'(default) - Standard moments in (12): \(g_{D,jt} = \xi_{jt} \cdot Z_{D,jt}\).'innovations'- Replace \(\xi_{jt}\) with its innovation in (2): \(g_{D,jt} = \tilde{\xi}_{jt} \cdot Z_{D,jt}\).'differenced_innovations'- Further difference the innovation in (2) to eliminate, for example, any product-market fixed effects: \(g_{D,jt} = (\tilde{\xi}_{jt} - L \tilde{\xi}_{jt}) \cdot Z_{D,jt}\).
To specify multiple stacked types, a list can be used instead. Each element in the list is a
(type, instruments)tuple wheretypeis one of the above strings andinstrumentsis the number of columns in \(Z_D\) to use. For example,demand_moment_types=[('levels', 1), ('innovations', 3), ('differenced_innovations', 3)]implements Arellano and Bond (1991)’s “system GMM” estimator with an additional moment restriction on levels:(3)\[\begin{split}g_{D,jt} = \begin{bmatrix} \xi_{jt} \cdot Z_{D,jt,1} \\ \tilde{\xi}_{jt} \cdot Z_{D,jt,2:4} \\ (\tilde{\xi}_{jt} - L \tilde{\xi}_{jt}) \cdot Z_{D,jt,5:7} \end{bmatrix} .\end{split}\]Warning
Alternative moment types are still an experimental feature. The way in which they are implemented and used may change somewhat in future releases.
supply_moment_types (str or sequence of (str, int) tuples, optional) – This can be used to replace supply-side moments \(\bar{g}_S\) analogously to how
demand_moment_typesreplaces demand-side moments, but with \(\xi\) replaced by \(\omega\) and \(Z_D\) replaced by \(Z_S\).covariance_moments_mean (float, optional) – If
covariance_instrumentswere specified inproduct_data, this can be used to choose a \(m \neq 0\) in covariance moments \(E[g_{C,jt}] = E[(\xi_{jt}\omega_{jt} - m)Z_{C,jt}] = 0\) where \(m\) is by default zero. This can be used for sensitivity testing to see how different covariances may affect estimates.micro_moments (sequence of MicroMoment, optional) –
Configurations for the \(M_M\)
MicroMomentinstances that will be added to the standard set of moments. By default, no micro moments are used, so \(M_M = 0\).When micro moments are specified, unless an initial weighting matrix
Wis specified as well (with a lower right micro moment block that reflects micro moment covariances), aninitial_updatewill be used to update starting values \(W\) and the mean utility \(\delta\) at the initial parameter values before the first GMM step.Note
When trying multiple parameter starting values to verify that the optimization routine converges to the same optimum, using
initial_updateis not recommended because different weighting matrices will be used for these different runs. A better option is to useoptimization=Optimization('return')at the best guess for parameter values and passProblemResults.updated_WtoWfor each set of different parameter starting values.micro_sample_covariances (array-like, optional) – Sample covariance matrix for the \(M_M\) micro moments. By default, their asymptotic covariance matrix is computed according to (36). This override could be used, for example, if instead of estimating covariances at some estimated \(\hat{\theta}\), one wanted to use a boostrap procedure to compute their covariances directly from the micro data.
resample_agent_data (callable, optional) –
If specified, simulation error in moment covariances will be accounted for by resampling \(r = 1, \dots, R\) sets of agents by iteratively calling this function, which should be of the following form:
resample_agent_data(index) --> agent_data or None
where
indexincrements from0to1and so on andagent_datais the corresponding resampled agent data, which should be a resampled version of theagent_datapassed toProblem. Eachindexshould correspond to a different set of randomly drawn agent data, with different integration nodes and demographics. Ifindexis larger than \(R - 1\), this function should returnNone, at which point agents will stop being resampled.
- Returns
ProblemResultsof the solved problem.- Return type
ProblemResults
Examples