Add support for other spaCy models for PII detection #64

Open
einarbmag opened this issue Jan 3, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@einarbmag

Is your feature request related to a problem? Please describe.

We want to install NB Defense in resource-constrained environments. The hard-coded en_core_web_trf requirement for PII detection takes up a significant amount of memory and ideally needs a GPU to run reasonably fast.

Describe the solution you'd like

I would like to be able to install any spaCy model I want (e.g. en_core_web_md) and specify which model to use for PII detection using an environment variable or CLI argument.

@einarbmag einarbmag added the enhancement New feature or request label Jan 3, 2024
@einarbmag einarbmag changed the title Add support for other spaCy models Add support for other spaCy models for PII detection Jan 3, 2024
@badarahmed
Collaborator

Thanks for filing the issue. We tested different spaCy models and found the results from the non-transformer models disappointing. The transformer model is definitely more resource-hungry, so it makes sense to use a smaller model in a resource-constrained environment (if you have to).

Please feel free to file a PR. The spaCy model is currently set in:

analyzer = AnalyzerEngine(nlp_engine=SpacyNlpEngine({"en": "en_core_web_trf"}))

and settings can be changed in the following places:

DEFAULT_SETTINGS = {

class Settings(abc.ABC):
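
For anyone picking this up, here is a minimal sketch of how the model name could be resolved before it reaches the line quoted above. It is not the project's implementation: the environment variable name `NBDEFENSE_SPACY_MODEL` and the helpers `resolve_spacy_model` and `build_analyzer` are hypothetical, and the `SpacyNlpEngine` argument shape is copied from the quoted line, so it may need adjusting for other Presidio versions.

```python
import os
from typing import Mapping, Optional

# Default taken from the hard-coded value quoted in this thread.
DEFAULT_SPACY_MODEL = "en_core_web_trf"
# Hypothetical variable name; NB Defense does not define it today.
SPACY_MODEL_ENV_VAR = "NBDEFENSE_SPACY_MODEL"


def resolve_spacy_model(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the spaCy model: CLI argument first, then env var, then default."""
    env = os.environ if env is None else env
    for candidate in (cli_value, env.get(SPACY_MODEL_ENV_VAR)):
        # Treat missing, empty, and whitespace-only values as "not set".
        if candidate and candidate.strip():
            return candidate.strip()
    return DEFAULT_SPACY_MODEL


def build_analyzer(model_name: str):
    """Build the analyzer like the quoted line, but with a configurable model."""
    # Imported lazily so the resolution logic works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import SpacyNlpEngine

    # Argument shape mirrors the quoted line; assumed, not verified per version.
    return AnalyzerEngine(nlp_engine=SpacyNlpEngine({"en": model_name}))
```

With this shape, `build_analyzer(resolve_spacy_model(cli_value))` would keep en_core_web_trf as the default, so existing installs behave the same unless a smaller model such as en_core_web_md is requested explicitly. The chosen model still has to be installed separately.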
