pyblp.Formulation

class pyblp.Formulation(formula, absorb=None, absorb_method=None)

Configuration for designing matrices and absorbing fixed effects.

Internally, the patsy package is used to convert data and R-style formulas into matrices. All of the standard binary operators can be used to design complex matrices of factor interactions:

  • + - Set union of terms.

  • - - Set difference of terms.

  • * - Short-hand. The formula a * b is the same as a + b + a:b.

  • / - Short-hand. The formula a / b is the same as a + a:b.

  • : - Interactions between two sets of terms.

  • ** - Interactions up to an integer degree.
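The operator semantics above can be illustrated with a minimal sketch that models each term as a set of factor names. This is not how patsy or pyblp implement formulas; the helper names (term, union, interact, and so on) are hypothetical and exist only to make the short-hands concrete.

```python
# Illustrative model only: a term is a frozenset of factor names,
# and one side of a formula is a set of such terms.
def term(*factors):
    return frozenset(factors)

def union(left, right):        # a + b
    return left | right

def difference(left, right):   # a - b
    return left - right

def interact(left, right):     # a : b
    return {l | r for l in left for r in right}

def star(left, right):         # a * b is short-hand for a + b + a:b
    return left | right | interact(left, right)

def slash(left, right):        # a / b is short-hand for a + a:b
    return left | interact(left, right)

def power(terms, degree):      # (a + b + c) ** degree, built by repeated *
    result = set(terms)
    for _ in range(degree - 1):
        result = star(result, terms)
    return result

a, b, c = {term("a")}, {term("b")}, {term("c")}

# a * b expands to the three terms a, b, and a:b.
print(sorted(":".join(sorted(t)) for t in star(a, b)))

# (a + b + c) ** 2 keeps main effects and all two-way interactions.
print(sorted(":".join(sorted(t)) for t in power(a | b | c, 2)))
```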

However, since factors need to be differentiated (for example, when computing elasticities), only the most essential functions are supported:

  • C - Mark a variable as categorical. See patsy.builtins.C(). Arguments are not supported.

  • I - Encapsulate mathematical operations. See patsy.builtins.I().

  • log - Natural logarithm function.

  • exp - Natural exponential function.

Data associated with variables should generally already be transformed. However, when encapsulated by I(), the binary operators listed above act as ordinary arithmetic operators on numeric variables: + adds, - subtracts, * multiplies, / divides, and ** exponentiates.

Internally, mathematical operations are parsed and evaluated by the SymPy package, which is also used to symbolically differentiate terms when derivatives are needed.
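To make the distinction between formula operators and arithmetic inside I() concrete, the sketch below mimics what the resulting columns would contain for a few formulas. It uses plain Python lists rather than patsy, SymPy, or pyblp, and the variable names x and y and their values are hypothetical.

```python
import math

# Hypothetical toy data: two numeric variables with three observations each.
data = {"x": [1.0, 2.0, 4.0], "y": [3.0, 5.0, 7.0]}

# Formula "x + y": outside I(), + is a set union of terms,
# so the design keeps two separate columns.
union_columns = [data["x"], data["y"]]

# Formula "I(x + y)": inside I(), + is arithmetic,
# so the design has a single column of element-wise sums.
sum_column = [x + y for x, y in zip(data["x"], data["y"])]

# Formula "I(x ** 2)": ** exponentiates instead of building interactions.
squared_column = [x ** 2 for x in data["x"]]

# Formula "log(x)": natural logarithm applied element-wise.
log_column = [math.log(x) for x in data["x"]]

print(len(union_columns), sum_column, squared_column)
```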

Parameters
  • formula (str) – R-style formula used to design a matrix. Variable names will be validated when this formulation and data are passed to a function that uses them. By default, an intercept is included, which can be removed with 0 or -1 (for example, '0 + x' or 'x - 1'). If absorb is specified, intercepts are ignored.

  • absorb (str, optional) – R-style formula used to design a matrix of categorical variables representing fixed effects, which will be absorbed into the matrix designed by formula. Fixed effect absorption is only supported for some matrices. Unlike formula, intercepts are ignored. Only categorical variables are supported.

  • absorb_method (str or Iteration, optional) –

    The method with which fixed effects will be absorbed. One of the following:

    • 'simple' (default for one fixed effect) - Use simple de-meaning. This method is very unlikely to fully absorb more than one fixed effect.

    • 'memory' (default for two fixed effects) - Use the Somaini and Wolak (2016) algorithm, which only works for two-way fixed effects, and which requires inversion of a dense matrix with dimensions equal to the smaller number of fixed effect groups.

    • 'speed' - Use the same Somaini and Wolak (2016) algorithm but pre-compute the \(A\) matrix, which is a dense matrix with dimensions equal to the larger number of fixed effect groups. Again, this method only works for two-way fixed effects.

    • Iteration (default for more than two fixed effects) - Use the method of alternating projections described, for example, in Guimarães and Portugal (2010). By default, Iteration('simple', {'atol': 1e-12}) is used to iteratively de-mean the matrix within each fixed effect level until convergence. This method is equivalent to 'simple' for one fixed effect, and it will also work for two fixed effects, although either variant of the Somaini and Wolak (2016) algorithm is often faster.
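The difference between simple de-meaning and alternating projections can be seen in a small pure-Python sketch. It is an illustration under stated assumptions, not pyblp's implementation: the functions demean, absorb, and max_group_mean, the toy data, and the max_iterations safeguard are all hypothetical, and only the 1e-12 tolerance is taken from the default described above.

```python
from collections import defaultdict

def group_means(values, groups):
    """Return the mean of values within each group label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

def demean(values, groups):
    """Simple de-meaning: subtract each observation's group mean."""
    means = group_means(values, groups)
    return [value - means[group] for value, group in zip(values, groups)]

def absorb(values, group_lists, atol=1e-12, max_iterations=10_000):
    """Alternating projections: de-mean within each fixed effect in turn
    until one full sweep changes no value by atol or more."""
    current = list(values)
    for _ in range(max_iterations):
        previous = current
        for groups in group_lists:
            current = demean(current, groups)
        if max(abs(c - p) for c, p in zip(current, previous)) < atol:
            return current
    raise RuntimeError("absorption did not converge")

def max_group_mean(values, groups):
    """Largest absolute group mean; zero means the effect is absorbed."""
    return max(abs(mean) for mean in group_means(values, groups).values())

# Hypothetical unbalanced panel with two fixed effects.
firm = ["a", "a", "a", "b", "b", "c"]
year = [1, 2, 3, 1, 2, 3]
y = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]

# One pass of de-meaning per fixed effect leaves firm means nonzero.
one_pass = demean(demean(y, firm), year)

# Iterating to convergence removes both sets of group means.
absorbed = absorb(y, [firm, year])

print(max_group_mean(one_pass, firm), max_group_mean(absorbed, firm))
```

With a single fixed effect, absorb stops after the second sweep because one pass of de-meaning is already exact, which matches the stated equivalence with 'simple'.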

Examples