
Config update mechanism, keep track of explicitly set config parameters #205


Merged
jlamypoirier merged 16 commits into main from config_updates on Apr 14, 2025

Conversation

jlamypoirier (Collaborator) commented Mar 26, 2025

Note: The PR is a lot simpler than this description makes it look. Things should "just work".

✨ Description

Config update mechanism

Add a second config update mechanism that only updates explicitly provided parameters in nested configs (e.g., as in Hydra) rather than overriding the whole nested config. This is what we want when updating one config with another. Also:

  • Removed the tuple config serialization format in _to_dict/to_serialized since it's no longer needed. The tuple format is still available in from_dict for nested overrides (used for command-line args) but doesn't need to be specified explicitly; the method can detect it.
  • This in turn makes the _to_dict/to_serialized distinction irrelevant, so I merged them into a single to_dict.

Updated sampling configs to use this feature.
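
To illustrate the intended semantics, here is a minimal sketch of the difference between whole-config override and nested update, using plain dicts (hypothetical code, not the actual Fast-LLM API):

```python
# A minimal sketch of the two update modes, using plain dicts; hypothetical
# code, not the actual Fast-LLM implementation.
def nested_update(base: dict, update: dict) -> dict:
    """Recursively merge `update` into `base`, only touching provided keys."""
    result = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = nested_update(result[key], value)  # recurse into nested configs
        else:
            result[key] = value
    return result

base = {"model": {"hidden_size": 1024, "num_layers": 12}}
update = {"model": {"num_layers": 24}}

# Old behavior: the whole nested section is replaced, dropping hidden_size.
assert {**base, **update} == {"model": {"num_layers": 24}}
# New behavior: only the explicitly provided parameter is overridden.
assert nested_update(base, update) == {"model": {"hidden_size": 1024, "num_layers": 24}}
```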

Keep track of explicitly set config parameters

When overriding a config with another, we only want to override explicitly specified parameters, not replace everything with default values as the current update mechanism does. I addressed this by keeping track of explicitly set parameters in the config, so they are the ones used for serialization and updates (verbose can still be used to show more). Note that I had to discard some potentially simpler alternatives:

  • Apply the update during the (delayed) config instantiation (see [Prototype] Make the model config override the pretrained config #171). This was my first attempt and would have mostly worked, but would have caused serialization issues. For example, if overriding a (non-default) pretrained parameter with a default value, that value would not have been saved during serialization, and deserializing the saved config would result in the pretrained value being used without the override. The selected method avoids this problem by saving the override even when it equals the default value, and makes the delayed instantiation unnecessary (a concrete example is sketched after this list).
  • Keep update configs as dicts. This would have worked but would have been really annoying to use. For example, we would have needed to keep TrainingConfig.model as a plain, unvalidated dict and store the actual updated config in a separate non-dataclass field. On top of the confusion, the resulting updated config would not have been serialized anywhere, so it would be hard to trace.
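
For concreteness, here is a hypothetical version of the serialization pitfall with the first alternative (the field name and values are made up):

```python
# Hypothetical illustration of the serialization pitfall described above;
# the field name and values are made up.
default = {"dropout": 0.0}
pretrained = {"dropout": 0.1}     # non-default value from the pretrained config
user_override = {"dropout": 0.0}  # user explicitly overrides back to the default

# If serialization drops fields equal to their default, the override is lost:
serialized = {k: v for k, v in user_override.items() if v != default[k]}
assert serialized == {}

# Reloading the saved config on top of the pretrained one silently restores 0.1:
assert {**pretrained, **serialized} == {"dropout": 0.1}  # not what the user asked for

# Tracking explicitly set fields keeps the override even when it equals the default:
explicit_fields = {"dropout"}
serialized = {k: user_override[k] for k in explicit_fields}
assert {**pretrained, **serialized} == {"dropout": 0.0}
```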

Keeping track of explicitly set parameters also has several other positive side effects.

Add tests for config mechanism, restrict types

Added a bunch of tests for the config mechanism to make sure the new (and old) features work properly. Also restricted the type of most config fields to an exact type match, to address some known typing problems where a derived type was accepted but caused trouble later on (e.g., a numpy float accepted as a float, or an enum member accepted as a str).
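
A small sketch of the kind of typing problem the exact-type match addresses (illustrative only, not the actual Fast-LLM check):

```python
# Illustrative only: why an isinstance check lets derived types slip through
# while an exact type match rejects them.
import enum

class Precision(str, enum.Enum):  # str-mixin enum: instances pass isinstance(..., str)
    FP16 = "fp16"

value = Precision.FP16
assert isinstance(value, str)    # permissive check: accepted, may cause trouble later
assert type(value) is not str    # exact check: rejected, as intended

# The same applies to e.g. numpy.float64, which subclasses Python's float:
# isinstance(numpy.float64(1.0), float) is True, but type(...) is float is False.
```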

Misc

All of this allows making the model config override the pretrained config (#170, #211), which in turn allows setting non-architecture parameters in converted configs (#166).

Known restrictions/special cases:

  • Lists aren't fully supported. Updates with nested lists will cause an error, and non-empty lists (and dicts) will always be included in serialization/update (not strictly needed, but reduces the risk of mutations not being treated as explicit setting of parameters). Configs used as updates should not contain lists with non-empty defaults.
  • Setting parameters from a parent config can't be treated as an implicit default, because we may want to rebuild the config without its parent. So such parameters are considered explicit. This is handled automatically in __setattr__, and the _set_implicit_default context is used to mark implicit fields (see the sketch below).
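
The names below mirror __setattr__ and _set_implicit_default as mentioned above, but the code is a simplified illustration, not the actual config.py implementation:

```python
# Minimal sketch of explicit-field tracking via __setattr__ and an
# implicit-default context; illustrative only.
import contextlib

class TrackedConfig:
    def __init__(self):
        object.__setattr__(self, "_explicit_fields", set())
        object.__setattr__(self, "_setting_implicit_default", False)

    def __setattr__(self, name, value):
        # Any assignment outside the implicit-default context counts as explicit.
        if not self._setting_implicit_default:
            self._explicit_fields.add(name)
        object.__setattr__(self, name, value)

    @contextlib.contextmanager
    def _set_implicit_default(self):
        object.__setattr__(self, "_setting_implicit_default", True)
        try:
            yield
        finally:
            object.__setattr__(self, "_setting_implicit_default", False)

    def to_dict(self):
        # Only explicitly set fields participate in serialization/update.
        return {name: getattr(self, name) for name in self._explicit_fields}

config = TrackedConfig()
with config._set_implicit_default():
    config.hidden_size = 1024  # implicit default: excluded from serialization
config.num_layers = 24         # explicit: serialized and used in updates
assert config.to_dict() == {"num_layers": 24}
```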

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

jlamypoirier changed the title from "Config update mechanism, keep track of explicitly set config parameters, override the pretrained config" to "Config update mechanism, keep track of explicitly set config parameters" on Mar 27, 2025
jlamypoirier marked this pull request as ready for review on April 2, 2025
bigximik (Contributor) commented Apr 4, 2025

We probably need to add documentation on the configuration semantics: how to instantiate a config, how to update it, how to create a new config class, and which values will be implicit defaults and which will not. It should also explain how to set implicit defaults in validate, how to set explicit values (even if it's just by not using _set_implicit_default), and when defaults should be explicit vs. implicit in validate. Additionally, it should cover the restrictions you mentioned and other related details.

bigximik (Contributor) commented Apr 4, 2025

Just a note: if I'm correct, Hydra handles all its override and update semantics within OmegaConf before converting the config into an object (whether it's a dataclass or a Pydantic model). This means it doesn't need to deal with the complexity of default values during the override process: at the OmegaConf level, it's essentially a smart dictionary. Default values either don't exist yet at that level or are set later by the object itself if the value is missing at the OmegaConf level.
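
For reference, one variant of that flow with OmegaConf looks roughly like this (a sketch with made-up field names, not a proposal for Fast-LLM's actual classes):

```python
# Rough sketch of the flow described above: merge at the dict level first,
# then convert to a typed object. Field names are made up for illustration.
from dataclasses import dataclass
from omegaconf import OmegaConf

@dataclass
class ModelConfig:
    hidden_size: int = 1024
    num_layers: int = 12

base = OmegaConf.structured(ModelConfig)         # schema with defaults
override = OmegaConf.create({"num_layers": 24})  # the "smart dictionary" layer
merged = OmegaConf.merge(base, override)         # merging happens before instantiation
model_config = OmegaConf.to_object(merged)       # typed object, defaults filled in

assert model_config == ModelConfig(hidden_size=1024, num_layers=24)
```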

bigximik (Contributor) left a review comment

Please see my comment on adding docs for configs.

jlamypoirier (Collaborator, Author) commented Apr 4, 2025

We probably need to add documentation on the configuration semantics: how to instantiate a config, how to update it, how to create a new config class, and which values will be implicit defaults and which will not. It should also explain how to set implicit defaults in validate, how to set explicit values (even if it's just by not using _set_implicit_default), and when defaults should be explicit vs. implicit in validate. Additionally, it should cover the restrictions you mentioned and other related details.

Agreed we need documentation, but I'd prefer to keep it for a separate PR. #225

Just a note: if I'm correct, Hydra handles all its override and update semantics within OmegaConf before converting the config into an object (whether it's a dataclass or a Pydantic model). This means it doesn't need to deal with the complexity of default values during the override process: at the OmegaConf level, it's essentially a smart dictionary. Default values either don't exist yet at that level or are set later by the object itself if the value is missing at the OmegaConf level.

See the comment in the description on why I chose not to do that. Hydra can get away with this because its configs are really just dicts, but Fast-LLM ones do quite a bit more: they can be modified post-creation or by other configs, they have the functionality of a full class, etc. So the pre-creation dict (which is roughly what the explicit parameters are tracking) isn't enough to reproduce the exact same config.

tscholak (Collaborator) commented Apr 6, 2025

Guys, thanks for the work here and the discussion. I want to take a step back and re-center us on Fast-LLM's mission.

Fast-LLM is meant to be a focused, opinionated training stack for adapting state-of-the-art LLM techniques to real-world use. It's not meant to be a maximal-flexibility framework or an open-ended experimentation platform. Our priority is operational efficiency, with clear, modular extension points for model, data, and training logic.

Right now, the configuration system is becoming too central and too complex. I don't think that's where our complexity budget should go. It's adding friction for contributors, especially those focused on model development, and it's steering the design of the system in directions that don't serve our primary goals.

Before we invest further in documenting or expanding the current config machinery (e.g. via #225), we need to have a broader team conversation about whether this is even the direction we want.

Let's be open to simpler paths like relying on OmegaConf dict overrides and delaying structured validation. That may give us 80% of the benefit with a fraction of the cost.

I'll schedule a design sync to align on this. Until then, let's hold off on major additions to the config layer.

bigximik (Contributor) commented Apr 7, 2025

I'm actually not against @jlamypoirier implementation of the config — I think it even deserves its own library, something like fast-config. I also think it could be superior to OmegaConf or Hydra in some ways, with features like automatic conversion of class names to convenient config names, and autoloading registries, which @jlamypoirier already started with sampling configs.

Some usability aspects could be improved: for example, having a _preprocess callback instead of overriding from_dict, making registries more visible so we don't have to jump across files to see which class is being instantiated, or even enabling automatic class generation for config/configurable pairs. Right now, it feels a bit too verbose.

That said, I’m not sure switching to just OmegaConf would be enough, since it doesn’t support merging config files with command-line arguments — only Hydra does.

Let’s discuss this more during the meeting.

tscholak (Collaborator) commented Apr 7, 2025

@bigximik, thanks for your thoughts on this! It's helpful, especially the perspective around usability and documentation.

But let's step back and consider our priorities here. Our core mission with Fast-LLM is operational excellence in training and streamlined experimenting with language models. Every bit of cognitive complexity we introduce (especially in the config layer) is complexity we pay for every single day, in every experiment, code contribution, PR review, or debugging session.

Here are a few reasons why I think we need to be cautious about further investment in our custom config stack:

  1. Extracting "fast-config" into a separate library doesn't solve the core issue: cost of ownership. Even if it's packaged separately, Fast-LLM would still depend on it at every layer. The complexity would still permeate the entire codebase. It may be slightly more modular, but the learning curve, maintenance burden, and confusion remain.

  2. We don't want to run a configuration lab. Our novelty budget is limited, and we should be spending it on model architecture, training efficiency, and scaling, and not on class machinery. The config system should serve those efforts, not compete with them.

  3. Feature creep is a real risk. Things like autoloading, registries, and implicit wiring may sound useful, but they introduce "magic" that makes the system harder to understand and debug. We need explicitness and predictability over cleverness or generality.

  4. The developer experience isn't where it needs to be. I applaud @jlamypoirier for trying and for putting in a huge amount of work here, and it's clear he's optimizing for flexibility and power. But the result is an interface that is hard to approach and harder to modify unless you already understand the whole system. That's not sustainable. We shouldn't build core infrastructure around a system that only one or two people feel confident dealing with or changing.

  5. The current config system is already shaping how we design the rest of the codebase. That's a red flag. It means we're building around the config layer rather than using config to support our design goals. That reverses the hierarchy of priorities.

  6. We already have simpler tools. OmegaConf (and optionally Hydra) give us robust, battle-tested support for merging configs, overriding values, and managing CLI inputs. These are familiar to virtually all contributors. They are well-documented, and easier to reason about. If we are going to depart from that standard, we need strong, consensus-backed reasons, ideally driven by actual pain points we've hit in practice, not theoretical improvements or personal preferences.

Let's definitely revisit this as a team. But I'm leaning strongly toward reducing the role of the custom config system, favoring simplicity and explicitness over generalized flexibility. The goal is to make Fast-LLM easy to use and easy to contribute to. That's how we scale both the system and the team.

jlamypoirier (Collaborator, Author) commented

I am not following the opposition here. This PR is not adding complexity, it's reducing it. I made these changes specifically to improve user and developer experience: config updates are made more OmegaConf/Hydra-like, loading pretrained models works just as we'd like, dataset sampling configuration is simpler, and serialized configs are made more useful and readable.

@tscholak I understand your preference for OmegaConf/Hydra, and I am well aware that Fast-LLM development is difficult, but the config system is not the problem, quite the opposite. I got no real indication from my experience (PR reviews, the issue board, bug reports, discussions with developers, etc.) that the configuration mechanism is a noticeable pain point for development.

  3. Feature creep is a real risk. Things like autoloading, registries, and implicit wiring may sound useful, but they introduce "magic" that makes the system harder to understand and debug. We need explicitness and predictability over cleverness or generality.

Feature creep is nowhere in sight for the config mechanism. This is the first significant change to the config system in 6 months (since we added yaml configs), and the first change to config.py in 3 months. Of the 50 currently open issues, the only feature request for the config system is #126, and we're not planning on doing it soon.

  5. The current config system is already shaping how we design the rest of the codebase. That's a red flag. It means we're building around the config layer rather than using config to support our design goals. That reverses the hierarchy of priorities.

It's not. The codebase is still shaped around the principles of speed, flexibility and ease, and we need a robust configuration mechanism to support the latter two.
We do plan our code contributions around the constraints of the existing codebase; this kind of technical debt is unavoidable, and a change of direction isn't going to fix it, quite the opposite.

  6. We already have simpler tools. OmegaConf (and optionally Hydra) give us robust, battle-tested support for merging configs, overriding values, and managing CLI inputs. These are familiar to virtually all contributors. They are well-documented, and easier to reason about.

OmegaConf, Hydra, Pydantic, etc. may be more familiar to many users, but they are absolutely not simpler.

If we are going to depart from that standard, we need strong, consensus-backed reasons, ideally driven by actual pain points we've hit in practice, not theoretical improvements or personal preferences.

Fast-LLM opted for a custom config mechanism by unanimous consensus when it was time to make that decision. It was driven by actual pain points I've hit in practice. My personal preference was Pydantic first, then Hydra, but these caused too much trouble, mostly:

  • Poorly formatted error messages that make debugging very difficult.
  • Lots of unnecessary features and "magic" that broke things. I don't like those either.
  • Can't extend them when a feature is missing.
  • (Hydra) Config classes can't have methods.
  • (Hydra) Derived fields are only indirectly supported and unsafe.
  • Limited support for some complex types.

In the end I chose the plain Python dataclasses module as a starting point and added the bare minimum needed, which ended up way simpler and more robust than the alternatives. The feature has grown a bit since then, but remains quite small at less than 1000 lines.

So anyway, the situation right now is that we have a simple, robust and low-maintenance configuration mechanism for Fast-LLM that is working well in practice. We can discuss other options going forward, but a major change of direction would be very difficult to do at this point, if it is even possible, and would most likely not help with anything.

tscholak (Collaborator) commented Apr 9, 2025

thanks @jlamypoirier. let's disentangle this discussion from this PR.

tscholak (Collaborator) commented

I still would like to respond though to what you wrote, @jlamypoirier.

Fast-LLM opted for a custom config mechanism by unanimous consensus when it was time to make that decision.

A fairer description would be: a custom system was adopted during an earlier phase when the team was smaller and priorities were different. The majority of current contributors weren't part of that discussion, and we now have several people (internally and externally) struggling with this system, despite being productive with standard tools like OmegaConf or Pydantic in other projects.

So rather than treat past consensus as a permanent mandate, we should ask whether the current system still serves our needs, and whether it's the right foundation as we grow the team and the codebase.

tscholak (Collaborator) commented

I got no real indication from my experience (PR reviews, the issue board, bug reports, discussions with developers, etc.) that the configuration mechanism is a noticeable pain point for development.

Just because people aren't filing bug reports or calling you on the weekend doesn't mean the system is working well. Confusion, reluctance to touch config code, or copy-paste programming followed by wondering why things break are all signs of friction. We've already seen this from newer contributors and from people trying to integrate external models.

tscholak (Collaborator) commented

I've also been thinking again about your suggestion to collapse config classes using tagged unions (aka "dynamic classes"). I agree that could help flatten the hierarchy and reduce duplication or conflicts. But I want to be precise about the kind of problem this addresses: it's a structural refactor, not a conceptual simplification.

When I reviewed #211, I realized that the deeper issue isn't necessarily the number of config classes. A good config system can handle a large class hierarchy without much friction. The real problem is that we don't have a clear model of configuration layering.

Right now, when someone reads a config field, it's hard to answer basic questions:
Is this value coming from a default? Was it set by the user? Was it imported from Hugging Face and then translated? Was it computed from other fields?

Instead of a clean layering model, we have recursive validation that mutates and validates at the same time. There's a lot of hidden state (_validated, _explicit_fields, _setting_implicit_default, etc.) and it's not obvious when a value was set, why it was set, or whether it's safe to override. We're juggling implicit vs. explicit values, global consistency constraints, and derived fields, all wrapped inside a validation system that also mutates. The end result is opaque behaviour, even for experienced developers.

So while tagged unions might help declutter the config class structure, I'm afraid they won't address the bigger problem: the lack of a transparent, predictable config merge model.

In the other PR, I proposed something pragmatic: define a clear config precedence stack, and make layering explicit. For example:

  1. Defaults
  2. Imported config (e.g., from Hugging Face)
  3. User overrides
  4. Derived/computed fields

Each layer should be immutable, and we should avoid encoding behavior in validation or relying on subclassing to handle logic. Merges should be predictable. Validation should be side-effect-free.

This would simplify the mental model and give us a foundation that scales better across contributors. This should help even if the number of config classes stays the same.
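
A minimal sketch of what such an explicit precedence stack could look like (hypothetical names, plain dicts standing in for config classes):

```python
# A minimal sketch of the proposed precedence stack; hypothetical names,
# plain dicts standing in for config classes.
from types import MappingProxyType

def merge_layer(base: dict, layer: dict) -> dict:
    """Merge one layer over another, recursing into nested mappings."""
    result = dict(base)
    for key, value in layer.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_layer(result[key], value)
        else:
            result[key] = value
    return result

def resolve(defaults: dict, imported: dict, user_overrides: dict, derived: dict):
    """Apply the layers in precedence order; later layers win. No mutation."""
    config: dict = {}
    for layer in (defaults, imported, user_overrides, derived):
        config = merge_layer(config, layer)
    return MappingProxyType(config)  # read-only: validation cannot mutate the result

defaults = {"model": {"hidden_size": 1024, "dropout": 0.0}}
imported = {"model": {"hidden_size": 2048}}   # e.g. translated from Hugging Face
user_overrides = {"model": {"dropout": 0.1}}
derived = {}                                  # computed fields would go here

config = resolve(defaults, imported, user_overrides, derived)
assert config["model"] == {"hidden_size": 2048, "dropout": 0.1}
```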

jlamypoirier merged commit 7a74af0 into main on Apr 14, 2025 (4 checks passed).
jlamypoirier deleted the config_updates branch on April 14, 2025 20:14.
Successfully merging this pull request may close these issues:

[bug] Inconsistent init_method_std in test_load_distributed_checkpoint_dp2