# Background¶

The following sections provide a very brief overview of the BLP model and how it is estimated. This goal is to concisely introduce the notation and terminology used throughout the rest of the documentation. For a more in-depth overview, refer to Conlon and Gortmaker (2019).

## The Model¶

There are $$t = 1, 2, \dotsc, T$$ markets, each with $$j = 1, 2, \dotsc, J_t$$ products produced by $$f = 1, 2, \dotsc, F_t$$ firms, for a total of $$N$$ products across all markets. There are $$i = 1, 2, \dotsc, I_t$$ agents who choose among the $$J_t$$ products and an outside good $$j = 0$$.

### Demand¶

Observed demand-side product characteristics are contained in the $$N \times K_1$$ matrix of linear characteristics, $$X_1$$, and the $$N \times K_2$$ matrix of nonlinear characteristics, $$X_2$$, which is typically a subset of $$X_1$$. Unobserved demand-side product characteristics, $$\xi$$, are a $$N \times 1$$ vector.

In market $$t$$, observed agent characteristics are a $$I_t \times D$$ matrix called demographics, $$d$$. Unobserved agent characteristics are a $$I_t \times K_2$$ matrix, $$\nu$$.

The indirect utility of agent $$i$$ from purchasing product $$j$$ in market $$t$$ is

(1)$U_{jti} = \underbrace{\delta_{jt} + \mu_{jti}}_{V_{jti}} + \epsilon_{jti},$

in which the mean utility is, in vector-matrix form,

(2)$\delta = \underbrace{X_1^p\alpha + X_1^x\beta^x}_{X_1\beta} + \xi.$

The $$K_1 \times 1$$ vector of demand-side linear paramterers, $$\beta$$, is partitioned into two components: $$\alpha$$ is a $$K_1^p \times 1$$ vector of parameters on the $$N \times K_1^p$$ submatrix of endogenous characteristics, $$X_1^p$$, and $$\beta^x$$ is a $$K_1^x \times 1$$ vector of parameters on the $$N \times K_1^x$$ submatrix of exogenous characteristics, $$X_1^x$$. Usually, $$X_1^p = p$$, prices, so $$\alpha$$ is simply a scalar.

The agent-specific portion of utility in a single market is

(3)$\mu = X_2(\Sigma\nu' + \Pi d').$

The model incorporates both observable (demographic) and unobservable taste heterogeneity though random coefficients. For the unobserved heterogeneity, we let $$\nu$$ denote independent draws from the standard normal distribution. These are scaled by a $$K_2 \times K_2$$ upper triangular matrix $$\Sigma$$, which denotes the Cholesky root of the covariance matrix for unobserved taste heterogeneity. The $$K_2 \times D$$ matrix $$\Pi$$ measures how agent tastes vary with demographics.

Random idiosyncratic preferences, $$\epsilon_{jti}$$, are assumed to be Type I Extreme Value, so that conditional on the heterogeneous coefficients, marketshares follow the well known logit form. Aggregate marketshares are obtained by integrating over the distribution of individual heterogeneity. They are approximated with Monte Carlo integration or quadrature rules defined by the $$I_t \times K_2$$ matrix of integration nodes, $$\nu$$, and a $$I_t \times 1$$ vector of integration weights, $$w$$:

(4)$s_{jt} \approx \sum_{i=1}^{I_t} w_i s_{jti},$

where the probability that agent $$i$$ chooses product $$j$$ in market $$t$$ is

(5)$s_{jti} = \frac{\exp V_{jti}}{1 + \sum_{k=1}^{J_t} \exp V_{kti}}.$

There is a one in the denominator because the utility of the outside good is normalized to $$U_{0ti} = 0$$.

### Supply¶

Observed supply-side product characteristics are contained in the $$N \times K_3$$ matrix of cost characteristics, $$X_3$$. Prices cannot be cost characteristics, but non-price product characteristics often overlap with the demand-side characteristics in $$X_1$$ and $$X_2$$. Unobserved supply-side product characteristics, $$\omega$$, are a $$N \times 1$$ vector.

Firm $$f$$ chooses prices in market $$t$$ to maximize the profits of its products $$\mathscr{J}_{ft} \subset \{1, 2, \ldots, J_t\}$$:

(6)$\pi_{ft} = \sum_{j \in \mathscr{J}_{ft}} (p_{jt} - c_{jt})s_{jt}.$

In a single market, the corresponding multi-product differentiated Bertrand first order conditions are, in vector-matrix form,

(7)$p - c = \underbrace{\Delta^{-1}s}_{\eta},$

where the multi-product Bertrand markup $$\eta$$ depends on $$\Delta$$, a $$J_t \times J_t$$ matrix of intra-firm (negative) demand derivatives:

(8)$\Delta = -O \odot \frac{\partial s}{\partial p}.$

Here, $$O$$ denotes the market-level ownership matrix, where $$O_{jk}$$ is typically $$1$$ if the same firm produces products $$j$$ and $$k$$, and $$0$$ otherwise.

To include a supply side, we must specify a functional form for marginal costs:

(9)$\tilde{c} = f(c) = X_3\gamma + \omega.$

The most common choices are $$f(c) = c$$ and $$f(c) = \log(c)$$.

## Estimation¶

A demand side is always estimated but including a supply side is optional. With only a demand side, there are three sets of parameters to be estimated: $$\beta$$ (which may include $$\alpha$$), $$\Sigma$$ and $$\Pi$$. With a supply side, there is also $$\gamma$$. The linear parameters, $$\beta$$ and $$\gamma$$, are typically concentrated out of the problem. The exception is $$\alpha$$, which cannot be concentrated out when there is a supply side because it is needed to compute demand derivatives and hence marginal costs. Linear parameters that are not concentrated out along with unknown nonlinear parameters in $$\Sigma$$ and $$\Pi$$ are collectively denoted $$\theta$$, a $$P \times 1$$ vector.

The GMM problem is

(10)$\min_\theta q(\theta) = N^2\bar{g}(\theta)'W\bar{g}(\theta),$

in which $$W$$ is a $$(M_D + M_S) \times (M_D + M_S)$$ weighting matrix and $$\bar{g}$$ is a $$(M_D + M_S) \times 1$$ vector of sample moment conditions:

(11)$\begin{split}\bar{g} = \frac{1}{N} \begin{bmatrix} \sum_{j,t} Z_{D,jt}'\xi_{jt} \\ \sum_{j,t} Z_{S,jt}'\omega_{jt} \end{bmatrix},\end{split}$

in which $$Z_D$$ and $$Z_S$$ are $$N \times M_D$$ and $$N \times M_S$$ matrices of demand- and supply-side instruments containing excluded instruments along with $$X_1^x$$ and $$X_3$$, respectively.

The vector $$\bar{g}$$ is the sample analogue of the GMM moment conditions $$E[g_{jt}] = 0$$ where

(12)$g_{jt} = \begin{bmatrix} Z_{D,jt}\xi_{jt} & Z_{S,jt}\omega_{jt} \end{bmatrix}.$

defines the $$N \times (M_D + M_S)$$ matrix of GMM moments $$g$$.

In each GMM stage, a nonlinear optimizer finds the $$\hat{\theta}$$ that minimizes the GMM objective value $$q(\theta)$$.

### The Objective¶

Given a $$\hat{\theta}$$, the first step to computing the objective $$q(\hat{\theta})$$ is to compute $$\delta(\hat{\theta})$$ in each market with the following standard contraction:

(13)$\delta_{jt} \leftarrow \delta_{jt} + \log s_{jt} - \log s_{jt}(\delta, \hat{\theta})$

where $$s$$ are the market’s observed shares and $$s(\delta, \hat{\theta})$$ are calculated marketshares. Iteration terminates when the norm of the change in $$\delta(\hat{\theta})$$ is less than a small number.

With a supply side, marginal costs are then computed according to (7):

(14)$c_{jt}(\hat{\theta}) = p_{jt} - \eta_{jt}(\hat{\theta}).$

Concentrated out linear parameters are recovered with linear IV-GMM:

(15)$\begin{split}\begin{bmatrix} \hat{\beta}^x \\ \hat{\gamma} \end{bmatrix} = (X'ZWZ'X)^{-1}X'ZWZ'Y(\hat{\theta})\end{split}$

where

(16)$\begin{split}X = \begin{bmatrix} X_1^x & 0 \\ 0 & X_3 \end{bmatrix}, \quad Z = \begin{bmatrix} Z_D & 0 \\ 0 & Z_S \end{bmatrix}, \quad Y(\hat{\theta}) = \begin{bmatrix} \delta(\hat{\theta}) - X_1^p\hat{\alpha} & 0 \\ 0 & \tilde{c}(\hat{\theta}) \end{bmatrix}.\end{split}$

With only a demand side, $$\alpha$$ can be concentrated out, so $$X = X_1$$, $$Z = Z_D$$, and $$Y = \delta(\hat{\theta})$$ recover the full $$\hat{\beta}$$ in (15).

Finally, the unobserved product characteristics (structural errors),

(17)$\begin{split}\begin{bmatrix} \xi(\hat{\theta}) \\ \omega(\hat{\theta}) \end{bmatrix} = \begin{bmatrix} \delta(\hat{\theta}) - X_1\hat{\beta} \\ \tilde{c}(\hat{\theta}) - X_3\hat{\gamma} \end{bmatrix},\end{split}$

are interacted with the instruments to form the sample moment conditions $$\bar{g}(\hat{\theta})$$ in (11), which give the GMM objective $$q(\hat{\theta})$$ in (10).

The gradient of the GMM objective in (10) is

(18)$\nabla q(\theta) = 2N^2\bar{G}(\theta)'W\bar{g}(\theta)$

where

(19)$\begin{split}\bar{G} = \frac{1}{N} \begin{bmatrix} \sum_{j,t} Z_{D,jt}'\frac{\partial\xi_{jt}}{\partial\theta} \\ \sum_{j,t} Z_{S,jt}'\frac{\partial\omega_{jt}}{\partial\theta} \end{bmatrix}.\end{split}$

Writing $$\delta$$ as an implicit function of $$s$$ in (4) gives the demand-side Jacobian:

(20)$\frac{\partial\xi}{\partial\theta} = \frac{\partial\delta}{\partial\theta} = -\left(\frac{\partial s}{\partial\delta}\right)^{-1}\frac{\partial s}{\partial\theta}.$

The supply-side Jacobian is derived from the definition of $$\tilde{c}$$ in (9):

(21)$\frac{\partial\omega}{\partial\theta} = \frac{\partial\tilde{c}}{\partial\theta_p} = -\frac{\partial\tilde{c}}{\partial c}\frac{\partial\eta}{\partial\theta}$

where the second term is derived from the definition of $$\eta$$ in (7):

(22)$\frac{\partial\eta}{\partial\theta} = -\Delta^{-1}\left(\frac{\partial\Delta}{\partial\theta}\eta + \frac{\partial\Delta}{\partial\xi}\eta\frac{\partial\xi}{\partial\theta}\right).$

### Weighting Matrices¶

Conventionally, the 2SLS weighting matrix is used in the first stage:

(23)$\begin{split}W = \begin{bmatrix} (Z_D'Z_D)^{-1} & 0 \\ 0 & (Z_S'Z_S)^{-1} \end{bmatrix}.\end{split}$

With two-step GMM, $$W$$ is updated before the second stage according to

(24)$W = S^{-1}.$

For heteroscedasticity robust weighting matrices,

(25)$S = \frac{1}{N}\sum_{j,t}^N g_{jt}g_{jt}'.$

For clustered weighting matrices, which account for arbitrary correlation within $$c = 1, 2, \dotsc, C$$ clusters,

(26)$S = \frac{1}{N}\sum_{c=1}^C q_cq_c',$

where, letting the set $$\mathscr{J}_c \subset \{1, 2, \ldots, N\}$$ denote products in cluster $$c$$,

(27)$q_c = \sum_{j\in\mathscr{J}_c} g_j.$

For unadjusted weighting matrices,

(28)$\begin{split}S = \frac{1}{N} \begin{bmatrix} \text{Var}(\xi)Z_D'Z_D & \text{Cov}(\xi, \omega)Z_D'Z_S \\ \text{Cov}(\xi, \omega)Z_S'Z_D & \text{Var}(\omega)Z_S'Z_S \end{bmatrix}.\end{split}$

### Standard Errors¶

The covariance matrix of the estimated parameters is

(29)$\text{Var}(\hat{\theta}) = (\bar{G}'W\bar{G})^{-1}\bar{G}'WSW\bar{G}(\bar{G}'W\bar{G})^{-1}.$

Standard errors are the square root of the diagonal of this matrix divided by $$N$$.

If the weighting matrix was chosen such that $$W = S^{-1}$$, this simplifies to

(30)$\text{Var}(\hat{\theta}) = (\bar{G}'W\bar{G})^{-1}.$

Standard errors extracted from this simpler expression are called unadjusted.

## Fixed Effects¶

The unobserved product characteristics can be partitioned into

(31)$\begin{split}\begin{bmatrix} \xi_{jt} \\ \omega_{jt} \end{bmatrix} = \begin{bmatrix} \xi_{k_1} + \xi_{k_2} + \cdots + \xi_{k_{E_D}} + \Delta\xi_{jt} \\ \omega_{\ell_1} + \omega_{\ell_2} + \cdots + \omega_{\ell_{E_S}} + \Delta\omega_{jt} \end{bmatrix}\end{split}$

where $$k_1, k_2, \dotsc, k_{E_D}$$ and $$\ell_1, \ell_2, \dotsc, \ell_{E_S}$$ index unobserved characteristics that are fixed across $$E_D$$ and $$E_S$$ dimensions. For example, with $$E_D = 1$$ dimension of product fixed effects, $$\xi_{jt} = \xi_j + \Delta\xi_{jt}$$.

Small numbers of fixed effects can be estimated with dummy variables in $$X_1$$, $$X_3$$, $$Z_D$$, and $$Z_S$$. However, this approach does not scale with high dimensional fixed effects because it requires constructing and inverting an infeasibly large matrix in (15).

Instead, fixed effects are typically absorbed into $$X$$, $$Z$$, and $$Y(\hat{\theta})$$ in (15). With one fixed effect, these matrices are simply de-meaned within each level of the fixed effect. Both $$X$$ and $$Z$$ can be de-meaned just once, but $$Y(\hat{\theta})$$ must be de-meaned for each new $$\hat{\theta}$$.

This procedure is equivalent to replacing each column of the matrices with residuals from a regression of the column on the fixed effect. The Frish-Waugh-Lovell (FWL) theorem of Frisch and Waugh (1933) and Lovell (1963) guarantees that using these residualized matrices gives the same results as including fixed effects as dummy variables. When $$E_D > 1$$ or $$E_S > 1$$, the matrices are residualized with more involved algorithms.

Once fixed effects have been absorbed, estimation is as described above with the structural errors $$\Delta\xi$$ and $$\Delta\omega$$.

## Random Coefficients Nested Logit¶

Incorporating parameters that measure within nesting group correlation gives the random coefficients nested logit (RCNL) model of Brenkers and Verboven (2006) and Grigolon and Verboven (2014). There are $$h = 1, 2, \dotsc, H$$ nesting groups and each product $$j$$ is assigned to a group $$h(j)$$. The set $$\mathscr{J}_{ht} \subset \{1, 2, \ldots, J_t\}$$ denotes the products in group $$h$$ and market $$t$$.

In the RCNL model, idiosyncratic preferences are partitioned into

(32)$\epsilon_{jti} = \bar{\epsilon}_{h(j)ti} + (1 - \rho_{h(j)})\bar{\epsilon}_{jti}$

where $$\bar{\epsilon}_{jti}$$ is Type I Extreme Value and $$\bar{\epsilon}_{h(j)ti}$$ is distributed such that $$\epsilon_{jti}$$ is still Type I Extreme Value.

The nesting parameters, $$\rho$$, can either be a $$H \times 1$$ vector or a scalar so that for all groups $$\rho_h = \rho$$. Letting $$\rho \to 0$$ gives the standard BLP model and $$\rho \to 1$$ gives division by zero errors. With $$\rho_h \in (0, 1)$$, the expression for choice probabilities in (5) becomes more complicated:

(33)$s_{jti} = \frac{\exp[V_{jti} / (1 - \rho_{h(j)})]}{\exp[V_{h(j)ti} / (1 - \rho_{h(j)})]}\cdot\frac{\exp V_{h(j)ti}}{1 + \sum_{h=1}^H \exp V_{hti}}$

where

(34)$V_{hti} = (1 - \rho_h)\log\sum_{k\in\mathscr{J}_{ht}} \exp[V_{kti} / (1 - \rho_h)].$

The contraction for $$\delta(\hat{\theta})$$ in (13) is also slightly different:

(35)$\delta_{jt} \leftarrow \delta_{jt} + (1 - \rho_{h(j)})[\log s_{jt} - \log s_{jt}(\delta, \hat{\theta})].$

Otherwise, estimation is as described above with $$\rho$$ included in $$\theta$$.

## Logit and Nested Logit¶

Letting $$\Sigma = 0$$ gives the simpler logit (or nested logit) model where there is a closed-form solution for $$\delta$$. In the logit model,

(36)$\delta_{jt} = \log s_{jt} - \log s_{0t},$

and a lack of nonlinear parameters means that nonlinear optimization is not needed.

In the nested logit model, $$\rho$$ must be optimized over, but there is still a closed-form solution for $$\delta$$:

(37)$\delta_{jt} = \log s_{jt} - \log s_{0t} - \rho_{h(j)}[\log s_{jt} - \log s_{h(j)t}].$

where

(38)$s_{ht} = \sum_{j\in\mathscr{J}_{ht}} s_{jt}.$

In both models, a supply side can still be estimated jointly with demand. Estimation is as described above with a representative agent in each market: $$I_t = 1$$ and $$w_1 = 1$$.

## Equilibrium Prices¶

Counterfactual evaluation, synthetic data simulation, and optimal instrument generation often involve solving for prices implied by the Bertrand first order conditions in (7). Solving this system with Newton’s method is slow and iterating over $$p \leftarrow c + \eta(p)$$ may not converge because it is not a contraction.

Instead, Morrow and Skerlos (2011) reformulate the solution to (7):

(39)$p - c = \underbrace{\Lambda^{-1}(O \odot \Gamma)'(p - c) - \Lambda^{-1}}_{\zeta}$

where $$\Lambda$$ is a diagonal $$J_t \times J_t$$ matrix approximated by

(40)$\Lambda_{jj} \approx \sum_{i=1}^{I_t} w_i s_{jti}\frac{\partial U_{jti}}{\partial p_{jt}}$

and $$\Gamma$$ is a dense $$J_t \times J_t$$ matrix approximated by

(41)$\Gamma_{jk} \approx \sum_{i=1}^{I_t} w_i s_{jti}s_{kti}\frac{\partial U_{jti}}{\partial p_{jt}}.$

Equilibrium prices are computed by iterating over the $$\zeta$$-markup equation in (39),

(42)$p \leftarrow c + \zeta(p),$

which, unlike (7), is a contraction. Iteration terminates when the norm of firms’ first order conditions, $$||\Lambda(p)(p - c - \zeta(p))||$$, is less than a small number.