Separate enzymes by whether they are produced by bacteria or fungi #766
base: develop
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop     #766      +/-   ##
===========================================
+ Coverage    94.72%   94.77%   +0.05%
===========================================
  Files           75       75
  Lines         5197     5248      +51
===========================================
+ Hits          4923     4974      +51
  Misses         274      274
```
I'm not convinced about all the code duplication this brings to the table. The same can probably be said about other parts of the code where the same operations are done on a finite, hardcoded set of options.
I am going to suggest an alternative, and we can discuss if it's unclear or feels cumbersome.
- Define an enumeration that contains fungi and bacteria:

```python
from enum import Enum


class Microbium(str, Enum):  # You can get a better name for this
    FUNGI = "fungi"
    BACTERIA = "bacteria"
```
- Define all the relevant constants as `dict[Microbium, float]` and assign them, e.g.:

```python
half_sat_pom_decomposition: dict[Microbium, float] = {
    Microbium.BACTERIA: 70.0,
    Microbium.FUNGI: 35.0,
}
```
- For variables in the `Data` object, it is trickier, to be honest. One option would be to add another axis containing the `Microbium` data, but that's a lot of work and I'm not sure how good the support is for non-spatial axes. So I suggest just having independent variables and constructing the variable names on the fly as needed, which will typically be in loops, e.g.:

```python
for m in Microbium:
    data[f"some_variable_{m.value}"] = complex_calculation(
        ...,  # other arguments elided
        data[f"other_variable_{m.value}"],
    )
```
As I say, this might have some rough edges and aspects we need to look at in detail, but I feel it will make the code cleaner in the long run and might be applicable to other use cases.
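A runnable version of the three steps above, as a minimal sketch: a plain `dict` stands in for the `Data` object, and `complex_calculation` is a hypothetical placeholder formula, not anything from the codebase. The constant values are the illustrative ones from the comment.

```python
from enum import Enum


class Microbium(str, Enum):
    """Producer groups, using the placeholder name from the comment above."""

    FUNGI = "fungi"
    BACTERIA = "bacteria"


# Per-group constant, using the example values from the comment.
half_sat_pom_decomposition: dict[Microbium, float] = {
    Microbium.BACTERIA: 70.0,
    Microbium.FUNGI: 35.0,
}


def complex_calculation(substrate: float, half_sat: float) -> float:
    """Hypothetical stand-in for the real per-group calculation."""
    return substrate / (half_sat + substrate)


# A plain dict stands in for the Data object (an assumption for this sketch).
data: dict[str, float] = {
    "other_variable_fungi": 35.0,
    "other_variable_bacteria": 30.0,
}

# One loop replaces a duplicated block per group; names are built on the fly.
for m in Microbium:
    data[f"some_variable_{m.value}"] = complex_calculation(
        data[f"other_variable_{m.value}"], half_sat_pom_decomposition[m]
    )
```

On Python 3.11 and later, an f-string of a `str`-mixin `Enum` member renders as `Microbium.FUNGI` rather than `fungi`, so spelling out `m.value` keeps the generated names stable across Python versions.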
@dalonsoa, I hadn't come across |
I guess the issue I'm running into is how this would work with the existing constants system?
Also how would non-default constant values be provided? At the moment, an alternative value for e.g.
I guess the new system would work as follows
The only issue I can see there is that both values would always have to be provided when one of the defaults is being changed, which is more complex, but I suppose not the end of the world. |
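One way to avoid forcing users to supply both values when only one default changes is to merge a partial override into the per-group defaults. This is a sketch under assumptions: `merge_group_constant` and `DEFAULT_HALF_SAT_POM` are hypothetical names, and overrides are assumed to arrive as a string-keyed mapping, which may not match how the existing constants system actually loads values.

```python
from enum import Enum


class Microbium(str, Enum):
    FUNGI = "fungi"
    BACTERIA = "bacteria"


# Default values (the illustrative ones from earlier in the thread).
DEFAULT_HALF_SAT_POM: dict[Microbium, float] = {
    Microbium.BACTERIA: 70.0,
    Microbium.FUNGI: 35.0,
}


def merge_group_constant(
    defaults: dict[Microbium, float], overrides: dict[str, float]
) -> dict[Microbium, float]:
    """Return a copy of `defaults` updated with user-supplied per-group values.

    `overrides` has plain string keys, as they might arrive from a config file.
    Microbium(key) looks members up by value and raises ValueError for an
    unknown group name.
    """
    merged = dict(defaults)  # copy so the defaults are never mutated
    for key, value in overrides.items():
        merged[Microbium(key)] = value
    return merged


# Only the fungal value is changed; the bacterial default is kept.
custom = merge_group_constant(DEFAULT_HALF_SAT_POM, {"fungi": 20.0})
```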
I think there is a problem there with the variables system - those would also need to be defined dynamically? |
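If the variables do need defining dynamically, the per-group names could be generated from the same enumeration so that the variable registry and the model code cannot drift apart. A minimal sketch: `VariableDef`, `enzyme_variable_defs` and the `soil_enzyme_<substrate>_<group>` naming scheme are all hypothetical and are not the project's variables system.

```python
from dataclasses import dataclass
from enum import Enum


class Microbium(str, Enum):
    FUNGI = "fungi"
    BACTERIA = "bacteria"


@dataclass(frozen=True)
class VariableDef:
    """Hypothetical minimal variable metadata record."""

    name: str
    description: str
    unit: str


def enzyme_variable_defs(substrates: list[str]) -> list[VariableDef]:
    """Build one enzyme-pool variable definition per substrate and group."""
    return [
        VariableDef(
            name=f"soil_enzyme_{substrate}_{m.value}",
            description=(
                f"Enzyme pool that breaks down {substrate.upper()}, "
                f"produced by {m.value}"
            ),
            unit="kg C m^-3",
        )
        for substrate in substrates
        for m in Microbium
    ]


defs = enzyme_variable_defs(["pom", "maom"])
```

The names still end up listed explicitly in the registry; they are just generated in one place rather than typed out per group.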
Isn't this basically the same issue that led to the Couldn't the parameters just go in here: `virtual_ecosystem/virtual_ecosystem/models/soil/microbial_groups.py`, lines 11 to 17 in 303661e?
|
I guess I'd also try and reduce duplication by allowing multiple instances of classes rather than duplicating names within them, so that, for example:

```python
from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray


@dataclass
class EnzymePoolChanges:
    """Changes to the different enzyme pools due to production and denaturation."""

    net_change_pom: NDArray[np.float32]
    """Net change in the produced enzyme pool that breaks down :term:`POM`.

    Units of [kg C m^-3 day^-1]
    """

    net_change_maom: NDArray[np.float32]
    """Net change in the produced enzyme pool that breaks down :term:`MAOM`.

    Units of [kg C m^-3 day^-1]
    """

    denaturation: NDArray[np.float32]
    """Total denaturation rate for all the enzyme produced by bacteria.

    Units of [kg C m^-3 day^-1]
    """
```

But then you might have a set of |
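A sketch of the "multiple instances" idea: one `EnzymePoolChanges` instance per producer group, held in a dict keyed by an enumeration, instead of duplicating field names for each group. `enzyme_pool_changes`, its first-order denaturation formula and the rate values are hypothetical and chosen only to make the example concrete; the class here drops the per-field docstrings for brevity.

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np
from numpy.typing import NDArray


class Microbium(str, Enum):
    FUNGI = "fungi"
    BACTERIA = "bacteria"


@dataclass
class EnzymePoolChanges:
    """Changes to one group's enzyme pools due to production and denaturation."""

    net_change_pom: NDArray[np.float32]
    net_change_maom: NDArray[np.float32]
    denaturation: NDArray[np.float32]


def enzyme_pool_changes(
    production_pom: NDArray[np.float32],
    production_maom: NDArray[np.float32],
    pool_pom: NDArray[np.float32],
    pool_maom: NDArray[np.float32],
    denaturation_rate: float,
) -> EnzymePoolChanges:
    """Hypothetical calculation: production minus first-order denaturation."""
    loss_pom = denaturation_rate * pool_pom
    loss_maom = denaturation_rate * pool_maom
    return EnzymePoolChanges(
        net_change_pom=production_pom - loss_pom,
        net_change_maom=production_maom - loss_maom,
        denaturation=loss_pom + loss_maom,
    )


pool = np.array([1.0, 2.0], dtype=np.float32)
production = np.array([1.0, 1.0], dtype=np.float32)
rates = {Microbium.BACTERIA: 0.5, Microbium.FUNGI: 0.25}  # illustrative only

# Same class, one instance per group.
changes = {
    m: enzyme_pool_changes(production, production, pool, pool, rates[m])
    for m in Microbium
}
```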
At the moment they could go in there. The issue is that down the road the microbial groups will get further differentiated, and adding an extra functional group shouldn't necessarily imply that extra enzymes need to be added (e.g. I really don't see a value in us trying to distinguish enzymes produced by saprotrophic fungi from those produced by mycorrhizal fungi). I suppose we could make another data class for the different enzymes, and then add something to the |
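A sketch of keeping enzyme classes separate from functional groups, assuming each group simply points at the enzyme class it produces, so adding a new group (e.g. splitting saprotrophic from mycorrhizal fungi) adds no new enzyme parameters. `EnzymeProducer`, `EnzymeParameters`, `FunctionalGroup` and the group names are hypothetical; the constant values are the illustrative ones from earlier in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class EnzymeProducer(str, Enum):
    """Enzyme classes are split by producer type, not by functional group."""

    BACTERIA = "bacteria"
    FUNGI = "fungi"


@dataclass(frozen=True)
class EnzymeParameters:
    """Hypothetical per-enzyme-class parameters."""

    half_sat_pom_decomposition: float


ENZYMES: dict[EnzymeProducer, EnzymeParameters] = {
    EnzymeProducer.BACTERIA: EnzymeParameters(half_sat_pom_decomposition=70.0),
    EnzymeProducer.FUNGI: EnzymeParameters(half_sat_pom_decomposition=35.0),
}


@dataclass(frozen=True)
class FunctionalGroup:
    """A microbial group that references, rather than owns, its enzyme class."""

    name: str
    enzyme_producer: EnzymeProducer


GROUPS = [
    FunctionalGroup("bacteria", EnzymeProducer.BACTERIA),
    FunctionalGroup("saprotrophic_fungi", EnzymeProducer.FUNGI),
    FunctionalGroup("mycorrhizal_fungi", EnzymeProducer.FUNGI),
]


def enzyme_parameters(group: FunctionalGroup) -> EnzymeParameters:
    """Look up the shared enzyme parameters for a functional group."""
    return ENZYMES[group.enzyme_producer]
```

Both fungal groups resolve to the same `EnzymeParameters` object, so the enzyme table stays at two entries however finely the groups are split.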
I guess it seems sensible to me to have parameters for the different microbial groups come in via the same mechanism. It does get complicated if you don't need the full "table" of parameters for every group. You'd have to make attributes that aren't required in all classes optional. There is then a question about enforcement. With a Would you ever have a situation where a parameter must be identical between two groups? It could always be configured as identical but would we ever need to enforce that? |
This is a sketch - I kind of feel like subclasses would be more canonical though!

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass(frozen=True)
class MFT:
    name: str
    a: float
    b: Optional[float] = None

    required: ClassVar[dict] = {
        "aaa": ["a", "b"],
        "bbb": ["a"],
    }

    def __post_init__(self):
        if self.name not in self.required:
            raise ValueError(f"Unknown name value: {self.name}")

        not_populated = [
            attr for attr in self.required[self.name] if getattr(self, attr) is None
        ]
        if not_populated:
            raise ValueError(
                f"The following attributes cannot be undefined for {self.name}: "
                f"{', '.join(not_populated)}"
            )


MFT(name="aab", a=1, b=1)  # Fails - unknown name
MFT(name="aaa", a=1, b=1)  # Good
MFT(name="aaa", a=1)  # Fails, missing b
MFT(name="bbb", a=1)  # Good
```
|
I guess the problem is that some of the enzymes will be shared between functional groups; e.g. I want to differentiate enzymes based on substrate and on whether they were produced by a prokaryote or a eukaryote. Parameterising differences between enzymes based on the specific class of fungi that produced them is, to the best of my knowledge, nearly impossible, so I don't really want to introduce it as an option. |
That's what I feared. You've basically got a tree of parameter definitions, some of which are defined at the tips of groups and some of which are named at internal nodes for all children of that branch. I guess the question then is how much is it worth trying to capture that structure within the code as opposed to just defining a load of parameters and then having overlapping code. |
This is an interesting problem you have here... Based on what I understand of the problem, I think subclassing might be the right way to go, as it allows, essentially, an infinitely deep tree of parameter definitions with branching and overriding as needed, with all the children having access to the parameters of their parents. Not sure if |
@dalonsoa @davidorme, I've tried to address your comments by implementing a new I've also tried to refactor the two worst offenders on the "dense hardcoded logic front" which were Anyway, let me know if you think my approach seems sensible or not! |
Description
This PR splits the enzyme pools (which were previously split only by substrate) based on whether they are produced by bacteria or fungi. The reason for making this split is that fungi (all eukaryotes, really) produce substantially more complex enzymes than bacteria, and we want to capture this in some way. I've done this in a fairly straightforward manner by defining two new enzyme pools, with new constants to represent that these fungal enzymes are (generally) more effective.
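To illustrate what "more effective" can mean through a half-saturation constant, here is a sketch assuming a saturating (Michaelis-Menten style) rate, which is what a parameter named like `half_sat_pom_decomposition` suggests; the actual rate equation in the model may differ. `decomposition_rate` is a hypothetical helper, and the values 70.0 and 35.0 are the illustrative ones from the review discussion, in unspecified units.

```python
def decomposition_rate(
    max_rate: float, enzyme_pool: float, substrate: float, half_sat: float
) -> float:
    """Hypothetical saturating rate: max_rate * enzyme * S / (K + S)."""
    return max_rate * enzyme_pool * substrate / (half_sat + substrate)


# Same enzyme pool and substrate; only the half-saturation constant differs.
bacterial = decomposition_rate(1.0, 1.0, substrate=35.0, half_sat=70.0)  # 35/105
fungal = decomposition_rate(1.0, 1.0, substrate=35.0, half_sat=35.0)  # 35/70
```

Under this assumed form, a lower half-saturation constant gives a higher rate at any positive substrate concentration, so the fungal pool breaks down more substrate per unit of enzyme.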
As always, feedback on code style/readability etc. would be appreciated.
Fixes #761
Type of change
Key checklist
- `pre-commit` checks: `$ pre-commit run -a`
- `$ poetry run pytest`
Further checks