Config update mechanism, keep track of explicitly set config parameters #205
Conversation
We probably need to add documentation on the configuration semantics: how to instantiate a config, how to update it, how to create a new config class, which values will be implicit defaults and which will not. It should also explain how to set implicit defaults in
Just a note: if I'm correct, Hydra handles all its override and update semantics within OmegaConf before converting the config into an object (whether it's a dataclass or a Pydantic model). This means it doesn't need to deal with the complexity of default values during the override process — at the OmegaConf level, it's essentially a smart dictionary. Default values either don't exist yet at that level, or are set later by the object itself if the value is missing at the OmegaConf level.
Please see my comment on adding docs for configs.
Agreed we need documentation, but I'd prefer to keep it for a separate PR. #225
See the comment in the description on why I chose not to do that. Hydra can get away with this because its configs are really just dicts, but Fast-LLM ones do quite a bit more: they can be modified post-creation or by other configs, they have the functionality of a full class, etc. So the pre-creation dict (which is roughly what the explicit parameters are tracking) isn't enough to reproduce the exact same config.
Guys, thanks for the work here and the discussion. I want to take a step back and re-center us on Fast-LLM's mission. Fast-LLM is meant to be a focused, opinionated training stack for adapting state-of-the-art LLM techniques to real-world use. It's not meant to be a maximal-flexibility framework or an open-ended experimentation platform. Our priority is operational efficiency, with clear, modular extension points for model, data, and training logic.

Right now, the configuration system is becoming too central and too complex. I don't think that's where our complexity budget should go. It's adding friction for contributors, especially those focused on model development, and it's steering the design of the system in directions that don't serve our primary goals.

Before we invest further in documenting or expanding the current config machinery (e.g. via #225), we need to have a broader team conversation about whether this is even the direction we want. Let's be open to simpler paths like relying on OmegaConf dict overrides and delaying structured validation. That may give us 80% of the benefit with a fraction of the cost.

I'll schedule a design sync to align on this. Until then, let's hold off on major additions to the config layer.
I'm actually not against @jlamypoirier's implementation of the config — I think it even deserves its own library. Some usability aspects could be improved.

That said, I'm not sure switching to just OmegaConf would be enough, since it doesn't support merging config files with command-line arguments — only Hydra does. Let's discuss this more during the meeting.
@bigximik, thanks for your thoughts on this! It's helpful, especially the perspective around usability and documentation. But let's step back and consider our priorities here. Our core mission with Fast-LLM is operational excellence in training and streamlined experimentation with language models. Every bit of cognitive complexity we introduce (especially in the config layer) is complexity we pay for every single day, in every experiment, code contribution, PR review, or debugging session.

Here are a few reasons why I think we need to be cautious about further investment in our custom config stack:
Let's definitely revisit this as a team. But I'm leaning strongly toward reducing the role of the custom config system, favoring simplicity and explicitness over generalized flexibility. The goal is to make Fast-LLM easy to use and easy to contribute to. That's how we scale both the system and the team.
I am not following the opposition here. This PR is not adding complexity, it's reducing it. I made the changes specifically to improve user and developer experience: config updates are made more OmegaConf/Hydra-like, loading pretrained models works just as we'd like, dataset sampling configuration is simpler, and serialized configs are made more useful and readable. @tscholak I understand your preference for OmegaConf/Hydra, and I am well aware that Fast-LLM development is difficult, but the config system is not the problem, quite the opposite. I've seen no real indication from my experience (PR reviews, issue board, bug reports, discussions with developers, etc.) that the configuration mechanism is a noticeable pain point for development.
Feature-creep is nowhere in sight for the config mechanism. This is the first important change to the config system in 6 months (since we added yaml config), and the first change to
It's not. The codebase is still shaped around the principles of speed, flexibility and ease, and we need a robust configuration mechanism to support the latter two.
OmegaConf, Hydra, Pydantic, etc. may be more familiar to many users, but they are absolutely not simpler.
Fast-LLM opted for a custom config mechanism by unanimous consensus when it was time to make that decision. It was driven by actual pain points I've hit in practice. My personal preference was Pydantic first, then Hydra, but these caused too much trouble, mostly:
In the end I chose the plain python

So anyway, the situation right now is that we have a simple, robust and low-maintenance configuration mechanism for Fast-LLM that is working well in practice. We can discuss other options going forward, but a major change of direction would be very difficult to do at this point, if even possible, and would most likely not help with anything.
Thanks @jlamypoirier. Let's disentangle this discussion from this PR.
I still would like to respond though to what you wrote, @jlamypoirier.
A fairer description would be: a custom system was adopted during an earlier phase when the team was smaller and priorities were different. The majority of current contributors weren't part of that discussion, and we now have several people (internally and externally) struggling with this system, despite being productive with standard tools like OmegaConf or Pydantic in other projects. So rather than treat past consensus as a permanent mandate, we should ask whether the current system still serves our needs, and whether it's the right foundation as we grow the team and the codebase.
Just because people aren't filing bug reports or calling you on the weekend doesn't mean the system is working well. Confusion, reluctance to touch config code, or copy-paste programming and then wondering why things break are all signs of friction. We've seen this already from newer contributors and from people trying to integrate external models.
I've also been thinking again about your suggestion to collapse config classes using tagged unions (aka "dynamic classes"). I agree that could help flatten the hierarchy and reduce duplication or conflicts. But I want to be precise about the kind of problem this addresses: it's a structural refactor, not a conceptual simplification.

When I reviewed #211, I realized that the deeper issue isn't necessarily the number of config classes. A good config system can handle a large class hierarchy without much friction. The real problem is that we don't have a clear model of configuration layering. Right now, when someone reads a config field, it's hard to answer basic questions:

Instead of a clean layering model, we have recursive validation that mutates and validates at the same time. There's a lot of hidden state (`_validated`, `_explicit_fields`, `_setting_implicit_default`, etc.) and it's not obvious when a value was set, why it was set, or whether it's safe to override. We're juggling implicit vs. explicit values, global consistency constraints, and derived fields, all wrapped inside a validation system that also mutates. The end result is opaque behaviour, even for experienced developers.

So while tagged unions might help declutter the config class structure, I'm afraid they won't address the bigger problem: the lack of a transparent, predictable config merge model. In the other PR, I proposed something pragmatic: define a clear config precedence stack, and make layering explicit. For example:
Each layer should be immutable, and we should avoid encoding behavior in validation or relying on subclassing to handle logic. Merges should be predictable. Validation should be side-effect-free. This would simplify the mental model and give us a foundation that scales better across contributors. This should help even if the number of config classes stays the same.
Note: The PR is a lot simpler than this description makes it look. Things should "just work".
✨ Description
Config update mechanism
Make a second config update mechanism where we only update provided parameters in nested configs (e.g. like in Hydra) rather than overriding the whole thing. This is what we want when updating a config with another (see the sketch after this list). Also:

- Removed the `tuple` config serialization format in `_to_dict`/`to_serialized`, since it's no longer needed. The tuple format is still available in `from_dict` for nested overrides (used for command-line args), but doesn't need to be specified explicitly; the method can figure it out.
- This made the `_to_dict`/`to_serialized` distinction irrelevant, so I merged them into a single `to_dict`.
- Updated sampling configs to use this feature.
Keep track of explicitly set config parameters
When overriding a config with another, we only want to override explicitly specified parameters, and not replace everything with default values as is the case with the current update mechanism. I addressed this by keeping track of explicitly set parameters in the config, so they are the ones used for serialization and update (`verbose` can still be used to show more). Note that I had to discard some potentially simpler alternatives:

- Keeping `TrainingConfig.model` as a plain, unvalidated dict and storing the actual updated config in a separate non-dataclass field. On top of the confusion, the resulting updated config would not have been serialized anywhere, so would be hard to trace.

Another reason for keeping track of explicitly set parameters is that it has several other positive side effects like:
Add tests for config mechanism, restrict types
Added a bunch of tests for the config mechanism to make sure the new (and old) features work properly. Also restricted the type of most config fields to an exact type match, to address some known typing problems where a derived type was accepted but caused trouble later on (e.g. a numpy float accepted as a float, or an enum accepted as a str).
Misc
All of this allows making the model config override the pretrained config (#170, #211), which in turn allows setting non-architecture parameters in converted configs #166.
Known restrictions/special cases:

- Explicitly set parameters are tracked in `__setattr__`, and the `_set_implicit_default` context is used to mark implicit fields.