Is it possible to define custom operators? #177
Hi @s3alfisc, thanks for looping me in. Currently this is not directly possible in stock Formulaic :(. The closest you can do today is something like:

```python
import re

import pandas
from formulaic import Formula
from formulaic.utils.stateful_transforms import stateful_transform


@stateful_transform
def varlist(pattern, _context=None):
    # `_context` is the layered evaluation context; its "data" layer holds the
    # input columns. Keep every column whose name matches `pattern`
    # (re.match anchors at the start of the name).
    pattern = re.compile(pattern)
    return {
        variable: values
        for variable, values in _context.named_layers.get("data", {}).items()
        if pattern.match(variable)
    }


# Returning a dict yields one model-matrix column per matched variable.
Formula("varlist('X.*')").get_model_matrix(
    pandas.DataFrame({"X1": [1, 2, 3], "X2": [1, 2, 3]}),
    context={"varlist": varlist},
)
```

This is equivalent to the additive terms you demonstrate above (with ugly naming), but would not work so well for interactions, since the

Thinking through how this could be improved by additions to Formulaic: we are limited by the fact that the formula parser intentionally has no awareness of the dataset for which model matrices will be generated later on. So what we would need is support for rewriting formulae during materialization. Since we evaluate all of the factors prior to substituting them, we could for example return a new nested
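To make the dataset-awareness point concrete: one workaround outside Formulaic is to rewrite the formula string eagerly once the column names are known, which is essentially the "ad hoc" route. The sketch below assumes a hypothetical helper `expand_varlist` (not a Formulaic API) that replaces each `varlist('<regex>')` call with the parenthesized sum of the matching columns, so the result can be handed to `Formula` as usual and still composes with interactions such as `:`.

```python
import re

# Matches calls like varlist('X.*') or varlist("X.*") inside a formula string.
_VARLIST_CALL = re.compile(r"""varlist\(\s*(['"])(.*?)\1\s*\)""")


def expand_varlist(formula: str, columns: list[str]) -> str:
    """Replace each varlist('<regex>') call with the '+'-joined columns it matches.

    Hypothetical helper, not part of Formulaic: it needs the column names up
    front, which is exactly the dataset awareness the Formulaic parser lacks.
    """

    def _replace(match: re.Match) -> str:
        pattern = re.compile(match.group(2))
        # re.match anchors at the start only, mirroring the stateful transform above.
        selected = [col for col in columns if pattern.match(col)]
        if not selected:
            raise ValueError(f"varlist({match.group(2)!r}) matched no columns")
        # Parenthesize so the expansion composes with other operators, e.g. `:`.
        return "(" + " + ".join(selected) + ")"

    return _VARLIST_CALL.sub(_replace, formula)
```

For example, `expand_varlist("Y ~ varlist('X.*')", ["Y", "X1", "X2", "Z"])` returns `"Y ~ (X1 + X2)"`. The trade-off is that the rewrite happens before parsing, so the resulting terms depend on which data frame the formula was expanded against.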
A slightly less general variant of the above is to add specific syntax for this kind of operation. Something like:
Where we leverage the existing Python code quoting and special case "Python" snippets that start with
Are you still interested in exploring this, @s3alfisc? With the changes in 1.1, this is definitely within reach.
Hi @matthewwardrop,

For pyfixest, @Wenzhi-Ding and I are currently discussing adding more syntactic formula sugar. For example, the original R package comes with a custom operator for interacting variables, `i()`, which differs slightly from `C()` in that it allows "dropping" reference-level columns from the model matrix, as well as operators for multiple estimation, which I have implemented in a very clunky and ad hoc way in pyfixest's `FormulaParser`. Eventually I'd like to revisit this part of the code (hopefully sooner rather than later, as I am really not too proud of it), and I am wondering whether it is possible to easily integrate new "formula operators". There are some hints in the docs and codebase suggesting that this might not be an impossible task 😄

As a more concrete example, would it be possible to, e.g., introduce a new operator `varlist` that would evaluate to `Y ~ X1 + X2 + ... + Xk` for all k variables in `data` that start with `X`, by ourselves, without "ad hoc" formula parsing on our end? Or would you recommend that we stick with "ad hoc formula parsing"? Please feel free to just tell me to take a closer look at the docs if appropriate =)
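For reference, the `i()` semantics described above can be sketched in plain Python. This is an illustration only, assuming fixest-like behaviour: one indicator column per level of `factor`, optionally multiplied by a numeric `var`, with any levels listed in `ref` dropped. The function name `i_dummies` and the `i[level]` column naming are hypothetical; this is not how pyfixest or Formulaic implement it.

```python
from typing import Hashable, Optional, Sequence


def i_dummies(
    factor: Sequence[Hashable],
    var: Optional[Sequence[float]] = None,
    ref: Sequence[Hashable] = (),
) -> dict[str, list[float]]:
    """Hypothetical sketch of an i()-style operator.

    Builds one column per level of `factor` (optionally scaled by `var`),
    skipping every level listed in `ref`.
    """
    if var is None:
        # No interaction requested: plain 0/1 indicators.
        var = [1.0] * len(factor)
    if len(var) != len(factor):
        raise ValueError("factor and var must have the same length")
    # Assumes the remaining levels are mutually sortable (e.g. all strings).
    levels = sorted(set(factor) - set(ref))
    columns = {}
    for level in levels:
        # Indicator for `level`, multiplied by `var` where the level is present.
        columns[f"i[{level}]"] = [v if f == level else 0.0 for f, v in zip(factor, var)]
    return columns
```

For example, `i_dummies(["a", "b", "c", "a"], ref=["a"])` yields only the columns `i[b]` and `i[c]`, because level `a` is explicitly dropped as a reference level.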