Question for community: We're considering adding pydantic as a base requirement to 🤗 transformers #36329

Closed
gante opened this issue Feb 21, 2025 · 9 comments

Comments

@gante
Member

gante commented Feb 21, 2025

Hi everyone! 👋

We want to expand our argument validation in transformers to improve the library's overall UX. No one wants to store a config object on the Hub with impossible parameterization or go through the code to find the admissible range for a certain input argument.

To that end, we're considering adding pydantic>=2.0 as a base requirement to 🤗 transformers.

Adding a base requirement is not a decision we want to make lightly -- it may place unwanted constraints on downstream projects. We can't anticipate every problem, so we're opening this issue to find them proactively. If requiring >=2.0 turns out to be a widespread problem, we can fall back to a try/except block with import pydantic.v1 as pydantic and use 1.x syntax.

Let us know your opinion about pydantic!

(Related PR: #35910)

@marthos1

Thank you.

@gante
Member Author

gante commented Feb 21, 2025

Extra information:

  • pydantic seems to be the most widely used annotation-based validation library. Here's a graph with the GitHub star history of related libraries:
    [Figure: GitHub star history of related libraries]
  • If we move forward, we'll be aiming at pydantic>=2.0. That would make us incompatible with packages that use pydantic<2.0, since 2.0 introduced breaking changes relative to 1.x. If we find popular projects, or a reasonable number of projects, that are not planning to update from 1.x, we can use a try/except block with import pydantic.v1 as pydantic. However, 1.x is known to be much slower, so we would need to benchmark speed.

@gante
Member Author

gante commented Feb 21, 2025

After some searching -- requiring pydantic v2 syntax would NOT be okay.

We found several libraries that list both transformers and pydantic 1.x in their requirements:

  1. transformers + pydantic<2.0 (link). Notable examples with an unpinned transformers version:
    a. microsoft/responsible-ai-toolbox, 1.5k stars
    b. ludwig-ai/ludwig, 11.3k stars
  2. transformers + pydantic==1 (link). More than 1500 pip dependency matches (most of them with a pinned transformers version)! Notable examples with an unpinned transformers version:
    a. wenge-research/YAYI, 3.3k stars
    b. THUDM/CogVideo, 10.7k stars

@BramVanroy
Collaborator

If you can get it ready for v5, then I think such a change would not be too bad. Other libraries should then pin transformers<5.

I am not sure I completely understand how you would use it though. Could you give a small dummy code example?

@julien-c
Member

I think we would need to be super super cautious & conservative about this.

BTW ping @Wauplin and @LysandreJik given we decided against adding pydantic as a dependency in huggingface_hub (which in turn is a dependency of transformers)

@julien-c
Member

worst case, can't we vendor a subset of pydantic implementation/code instead?

@Manalelaidouni
Contributor

We can support both Pydantic v1 and v2, so that any Transformers-dependent library that pins either version will still work and pip can figure out the right version to use.

Since we expect the ecosystem to move entirely to v2, we want it to be easy to drop v1 in the future. To do that, we can use the Pydantic v2 names internally: if v2 is installed, we call the native v2 methods directly, and if v1 is installed, we call their v1 equivalents under the hood.

We also don’t need to map every feature in Pydantic; we mostly need the validators, the base model configuration, Field, and the serialization methods.
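A minimal sketch of such a compatibility layer, assuming feature detection on method names rather than a version check. The helper names validate_model and dump_model are hypothetical; model_validate/model_dump (v2) and parse_obj/.dict() (v1) are the actual pydantic method names. Internal code would call only the helpers, so dropping v1 later means deleting the fallback branches.

```python
from typing import Any, Dict, Type, TypeVar

ModelT = TypeVar("ModelT")


def validate_model(model_cls: Type[ModelT], data: Dict[str, Any]) -> ModelT:
    """Hypothetical helper: build and validate a model instance from a plain dict."""
    # Prefer the v2 name; pydantic v1 models have no `model_validate`.
    if hasattr(model_cls, "model_validate"):
        return model_cls.model_validate(data)
    # v1 equivalent of `model_validate`.
    return model_cls.parse_obj(data)


def dump_model(model: Any) -> Dict[str, Any]:
    """Hypothetical helper: serialize a model instance to a plain dict."""
    # `model_dump` is the v2 name; `.dict()` is the v1 name.
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()
```

Checking for the v2 name first matters: pydantic v2 still exposes parse_obj and .dict() as deprecated aliases, so testing for the v1 names first would route v2 installs through the deprecated path.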

@gante
Member Author

gante commented Feb 28, 2025

Hi everyone 👋

We've heard your feedback (here and on other social media), and it became clear that pydantic would be problematic for some projects and would give us a few headaches: we would have to start with v1 for overall compatibility, which means migrating to v2 at some point in the future.

Instead, together with huggingface_hub (which is a key dependency of transformers), we will create shared basic functionality to validate classes that mostly contain data, using dataclasses (part of the standard library) and their __post_init__ method.

This means we will get most of the benefits (data validation) with no additional dependencies.
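A minimal sketch of validation with a standard-library dataclass and __post_init__, assuming plain hand-written checks. ToyGenerationConfig, its fields, and the admissible ranges are hypothetical and chosen for illustration; this is not the actual transformers or huggingface_hub code.

```python
from dataclasses import dataclass


@dataclass
class ToyGenerationConfig:
    """Hypothetical data-only class; field names and ranges are illustrative assumptions."""

    max_new_tokens: int = 20
    temperature: float = 1.0
    top_p: float = 1.0

    def __post_init__(self) -> None:
        # __post_init__ runs right after the generated __init__, so an
        # impossible parameterization is rejected before the object is used or saved.
        if isinstance(self.max_new_tokens, bool) or not isinstance(self.max_new_tokens, int):
            raise TypeError(f"max_new_tokens must be an int, got {type(self.max_new_tokens).__name__}")
        if self.max_new_tokens < 1:
            raise ValueError(f"max_new_tokens must be >= 1, got {self.max_new_tokens}")
        if self.temperature < 0.0:
            raise ValueError(f"temperature must be >= 0, got {self.temperature}")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError(f"top_p must be in (0, 1], got {self.top_p}")
```

ToyGenerationConfig(temperature=0.7) constructs normally, while ToyGenerationConfig(top_p=1.5) raises ValueError. Note that __post_init__ only runs at construction time: assigning cfg.top_p = 1.5 afterwards is not checked unless the class also overrides __setattr__ or is declared with frozen=True.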

Stay tuned 🤗

@Wauplin
Contributor

Wauplin commented Feb 28, 2025

I opened huggingface/huggingface_hub#2895 as a suggestion on how to move forward with data validation in the HF ecosystem. Open to feedback and subject to change based on community requirements 🤗

6 participants