pyblp.Formulation

class pyblp.Formulation(formula, absorb=None, absorb_method=None, absorb_options=None)

Configuration for designing matrices and absorbing fixed effects.

Internally, the patsy package is used to convert data and R-style formulas into matrices. All of the standard binary operators can be used to design complex matrices of factor interactions:

  • + – Set union of terms.

  • - – Set difference of terms.

  • * – Shorthand. The formula a * b is the same as a + b + a:b.

  • / – Shorthand. The formula a / b is the same as a + a:b.

  • : – Interactions between two sets of terms.

  • ** – Interactions up to an integer degree. For example, (a + b + c)**2 is the same as a + b + c + a:b + a:c + b:c.
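To make the set semantics of these operators concrete, the sketch below models a term as a frozenset of factor names and a formula as a set of terms. The helpers (terms, union, difference, interact, star, slash, power, show) are hypothetical illustrations of the rules listed above, assuming patsy's documented convention that (a + b) / c expands to a + b + a:b:c. They are not part of patsy or pyblp, and they ignore intercepts and categorical coding.

```python
from itertools import product

# Hypothetical model: a term is a frozenset of factor names (so a:b and b:a
# are the same term) and a formula is a frozenset of terms.
Term = frozenset
TermSet = frozenset


def terms(*names: str) -> TermSet:
    """Single-factor terms, e.g. terms("a", "b") models a + b."""
    return TermSet(Term([name]) for name in names)


def union(left: TermSet, right: TermSet) -> TermSet:
    """left + right"""
    return left | right


def difference(left: TermSet, right: TermSet) -> TermSet:
    """left - right"""
    return left - right


def interact(left: TermSet, right: TermSet) -> TermSet:
    """left : right, every left-hand term combined with every right-hand term."""
    return TermSet(a | b for a, b in product(left, right))


def star(left: TermSet, right: TermSet) -> TermSet:
    """left * right is left + right + left:right."""
    return left | right | interact(left, right)


def slash(left: TermSet, right: TermSet) -> TermSet:
    """left / right: all left-hand factors form one term interacted with right."""
    combined = TermSet([Term().union(*left)])
    return left | interact(combined, right)


def power(base: TermSet, degree: int) -> TermSet:
    """base ** degree: repeated * up to the given integer degree."""
    result = base
    for _ in range(degree - 1):
        result = star(result, base)
    return result


def show(formula: TermSet) -> list:
    """Render terms as sorted 'a:b' strings."""
    return sorted(":".join(sorted(term)) for term in formula)
```

For example, show(star(terms("a"), terms("b"))) returns ['a', 'a:b', 'b'], and show(power(terms("a", "b", "c"), 2)) returns ['a', 'a:b', 'a:c', 'b', 'b:c', 'c'], matching the expansions described above.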

However, since factors need to be differentiated (for example, when computing elasticities), only the most essential functions are supported:

  • C – Mark a variable as categorical. See patsy.builtins.C(). Only the single-argument form C(variable) is supported; additional arguments are not.

  • I – Encapsulate mathematical operations. See patsy.builtins.I().

  • log – Natural logarithm function.

  • exp – Natural exponential function.

Data associated with variables should generally already be transformed. However, when encapsulated by I(), the binary operators listed above function as ordinary mathematical operators on numeric variables: + adds, - subtracts, * multiplies, / divides, and ** exponentiates.
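The difference between an operator inside and outside I() shows up in the number of columns produced. The plain-Python sketch below hand-builds the columns that the formulas a + b, a * b, I(a + b), and I(a * b) would be expected to yield for hypothetical numeric variables a and b. The dictionaries and the elementwise helper are illustrative only, do not call patsy or pyblp, and omit the intercept for brevity.

```python
from operator import add, mul

# Hypothetical numeric data for two variables, a and b.
data = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}


def elementwise(op, x, y):
    """Apply an arithmetic operator row by row, as happens inside I()."""
    return [op(u, v) for u, v in zip(x, y)]


# Outside I(), + and * combine terms, so one formula yields several columns.
columns_a_plus_b = {"a": data["a"], "b": data["b"]}
columns_a_times_b = {
    "a": data["a"],
    "b": data["b"],
    # The interaction of two numeric variables is their row-wise product.
    "a:b": elementwise(mul, data["a"], data["b"]),
}

# Inside I(), the same symbols are arithmetic, so each formula yields one column.
columns_i_a_plus_b = {"I(a + b)": elementwise(add, data["a"], data["b"])}
columns_i_a_times_b = {"I(a * b)": elementwise(mul, data["a"], data["b"])}

for name, columns in [
    ("a + b", columns_a_plus_b),
    ("a * b", columns_a_times_b),
    ("I(a + b)", columns_i_a_plus_b),
    ("I(a * b)", columns_i_a_times_b),
]:
    print(f"{name}: {len(columns)} column(s) {list(columns)}")
```

Note that the single column from I(a * b) holds the same values as the a:b column of a * b; the formula a * b simply adds the two main-effect columns alongside it.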

Internally, mathematical operations are parsed and evaluated by the SymPy package, which is also used to symbolically differentiate terms when derivatives are needed.
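As an illustration of why each supported function needs a known derivative, the sketch below pairs the columns of a hypothetical formula, prices + I(prices ** 2) + log(prices) + exp(prices), with their derivatives with respect to prices (1, 2 * prices, 1 / prices, and exp(prices)). It deliberately avoids SymPy: the derivatives are written by hand and then checked against central finite differences. TERMS, central_difference, and derivative_rows are hypothetical names, not pyblp functions.

```python
import math

# Hypothetical terms of a formula such as
# "prices + I(prices ** 2) + log(prices) + exp(prices)". Each entry pairs the
# column function with its derivative with respect to prices, written by hand
# here (pyblp would obtain these symbolically through SymPy).
TERMS = {
    "prices": (lambda p: p, lambda p: 1.0),
    "I(prices ** 2)": (lambda p: p ** 2, lambda p: 2.0 * p),
    "log(prices)": (math.log, lambda p: 1.0 / p),
    "exp(prices)": (math.exp, math.exp),
}


def central_difference(f, x, h=1e-6):
    """Numerical derivative, used only to check the analytic derivatives."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def derivative_rows(prices):
    """For each price, map every term to its (analytic, numerical) derivative."""
    return [
        {name: (df(p), central_difference(f, p)) for name, (f, df) in TERMS.items()}
        for p in prices
    ]


for row in derivative_rows([0.5, 1.0, 2.5]):
    for name, (analytic, numerical) in row.items():
        print(f"{name:>15}: analytic={analytic:.6f} numerical={numerical:.6f}")
```

Per-term derivatives like these are the ingredients needed when, for example, elasticities with respect to prices are computed.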

Parameters
  • formula (str) – R-style formula used to design a matrix. Variable names will be validated when this formulation and data are passed to a function that uses them. By default, an intercept is included, which can be removed with 0 or -1. If absorb is specified, intercepts are ignored.

  • absorb (str, optional) – R-style formula used to design a matrix of categorical variables representing fixed effects, which will be absorbed into the matrix designed by formula by the PyHDFE package. Fixed effect absorption is only supported for some matrices. Unlike formula, intercepts are ignored. Only categorical variables are supported.

  • absorb_method (str, optional) –

    Method by which fixed effects will be absorbed. For a full list of supported methods, refer to the residualize_method argument of pyhdfe.create().

    By default, the simplest methods are used: simple de-meaning for a single fixed effect and simple iterative de-meaning by way of the method of alternating projections (MAP) for multiple dimensions of fixed effects. For multiple dimensions, non-accelerated MAP is unlikely to be the fastest algorithm. If fixed effect absorption seems to be taking a long time, consider using a different method such as 'lsmr', using absorb_options to specify a MAP acceleration method, or configuring other options such as termination tolerances.

  • absorb_options (dict, optional) – Configuration options for the chosen method, which will be passed to the options argument of pyhdfe.create().
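The sketch below illustrates, in plain Python rather than PyHDFE, the two default strategies described under absorb_method: simple de-meaning for a single fixed effect, and non-accelerated MAP (alternating one-way de-meaning until the change per full sweep falls below a tolerance) for several dimensions. The helpers demean, absorb_map, and slope, the tolerance, and the panel data are all hypothetical; PyHDFE's actual implementation and termination criteria may differ. De-meaning also turns a column of ones into zeros, which is one way to see why an intercept carries no information once fixed effects are absorbed.

```python
from collections import defaultdict


def demean(values, groups):
    """One-way absorption: subtract each group's mean from its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group] for value, group in zip(values, groups)]


def absorb_map(values, dimensions, tol=1e-12, max_iterations=10_000):
    """Non-accelerated MAP: sweep one-way de-meaning over every dimension until
    the largest change produced by a full sweep falls below tol."""
    current = list(values)
    for _ in range(max_iterations):
        previous = current
        for groups in dimensions:
            current = demean(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            return current
    raise RuntimeError("MAP did not converge within max_iterations")


def slope(x, y):
    """OLS slope of y on x with no intercept, appropriate for absorbed data."""
    return sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)


# Hypothetical unbalanced panel: y = 2 * x + market effect + firm effect.
markets = ["m1", "m1", "m1", "m2", "m2", "m3", "m3", "m4", "m4"]
firms = ["f1", "f2", "f3", "f1", "f2", "f2", "f3", "f1", "f3"]
market_effects = {"m1": 10.0, "m2": -3.0, "m3": 0.5, "m4": 4.0}
firm_effects = {"f1": 1.0, "f2": -2.0, "f3": 7.0}
x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 1.5, 2.5, 6.0]
y = [2.0 * xi + market_effects[m] + firm_effects[f] for xi, m, f in zip(x, markets, firms)]

# A single dimension needs only one de-meaning pass; a column of ones vanishes.
ones_absorbed = demean([1.0] * len(x), markets)

# Two dimensions: alternate the one-way projections until convergence.
x_tilde = absorb_map(x, [markets, firms])
y_tilde = absorb_map(y, [markets, firms])
print(slope(x_tilde, y_tilde))
```

Here slope(x_tilde, y_tilde) recovers, up to floating-point error, the coefficient of 2 used to generate y, because both sets of fixed effects are eliminated by the absorption. An accelerated variant or a different algorithm such as LSMR would replace the plain loop in absorb_map.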

Examples