FastCC results differ from respective MATLAB function #1154

Open
NantiaL opened this issue Feb 21, 2022 · 20 comments · May be fixed by #1427
@NantiaL

NantiaL commented Feb 21, 2022

Hello, I would like to raise a question regarding the FastCC implementation in cobrapy. Over multiple runs with the Recon1 model, FastCC outputs a different number of blocked reactions each time. That might be reasonable, because an optimization problem does not necessarily have a unique solution. However, the respective MATLAB function from the COBRA Toolbox always returns the same results over several runs. Since both implementations are based on the same algorithm, I would expect both to give the same results, so I think this is worth checking.

I noticed that the number of irreversible reactions detected in the FastCC implementation deviates from the number of irreversible reactions computed by the MATLAB function. Reactions that are irreversible in the backward direction are not reported as irreversible in the MATLAB command window.

For Recon1:
MATLAB:

  • 2186 irreversible reactions
  • 1274 blocked reactions

Python:

  • 2192 irreversible reactions
  • blocked reactions differ after each run
    Used solver: glpk, python 3.8.5
@Midnighter
Member

Thanks for the report @NantiaL. Can you please post the exact code that you used to generate these results?

@NantiaL
Author

NantiaL commented Feb 21, 2022

Hey, here it is:

MATLAB:

is_active = fastcc(model, 1e-4);
inactiveRxns = setdiff(model.rxns, model.rxns(is_active));

Python:

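# fastcc lives in cobra.flux_analysis, so import it explicitly
from cobra.io import read_sbml_model
from cobra.flux_analysis import fastcc

model = read_sbml_model('RECON1.xml')
model.solver = 'glpk'
fastcc(model)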

In Python I also got different results over multiple runs after setting the zero_cutoff and/or the flux_threshold to 1e-4.

@NantiaL NantiaL changed the title FastCC results differe from respective MATLAB function FastCC results differ from respective MATLAB function Feb 21, 2022
@cdiener
Member

cdiener commented Feb 21, 2022

From a cursory look, it seems there are some bugs in the FastCC coefficient setting. For instance, a reaction with a forward flux of 1 and a reverse flux of 1 (net flux 0) is considered active in our formulation.

@Midnighter Midnighter added the bug label Feb 21, 2022
@synchon
Member

synchon commented Feb 23, 2022

Apologies for the delayed response. I'll take a look. Thanks for the report!

@synchon synchon self-assigned this Feb 23, 2022
@babessell1

@synchon

Hello, I was wondering if you ever found any leads on what is causing this issue. I tried to diagnose it myself, but it got quite overwhelming for me to understand. This issue seems pretty important for research, since results should be reproducible.

Thank you

@synchon
Member

synchon commented Aug 4, 2022

Hi @babessell1! The problem is most likely in the problem formulation, as @cdiener pointed out earlier. I have been quite busy lately and I'm not sure when I can tackle it. Apologies for that, as I had self-assigned it. I'll remove the assignment and let someone else take a stab at it.

@synchon synchon removed their assignment Aug 4, 2022
@synchon synchon added help-wanted An issue that should be easy to implement for anyone in the community. and removed help-wanted An issue that should be easy to implement for anyone in the community. labels Aug 4, 2022
@dagl1

dagl1 commented Feb 19, 2025

Hi there, I have a more general question, as this is apparently still not fixed. The last change to fastCC was in 2021, if I am not mistaken, and at least one bug was identified in 2022, yet the function still exists in this state. Why is it not removed, or why does it not at least warn that it should not be used? If this is the case, it greatly reduces confidence in this repository. (I hope I am mistaken and missed the fix/commit; however, I too find differences in the number of reactions found between the MATLAB and Python implementations.) The fact that I can use the fastCC function without knowing that there are problems with it is problematic, isn't it?

@Midnighter
Member

I could write quite a bit about the project now, instead, please consider the following question:
Whose responsibility is it in your opinion to make those changes?

@dagl1

dagl1 commented Feb 19, 2025

The changes that are necessary to fix and test the function are of course no one's responsibility; however, the people accepting PRs, or even those who identified the issues, probably do have the responsibility to make people aware of known issues. Now I understand that you might say: this is no one's job, and as this is an open-source repository, people could make these changes if they want to. However, I would counter that allowing known bugs to persist for several years without informing users is unbecoming of academics.

(this would only require 2 lines of code):

import warnings
...
(inside fastcc()):
warnings.warn("As of 2021, release V... this function contains a known bug which allows reactions with zero net flux to be considered active, as well as known discrepancies between this function (the cobrapy implementation) and its matlab implementation present in the cobra toolbox, see: #1154")
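
A minimal, self-contained sketch of what such a warning could look like in practice. The function `fastcc_stub` is a hypothetical stand-in used only for illustration; it is not cobrapy's `fastcc`, and the message text is an assumption rather than anything that was merged.

```python
import warnings


def fastcc_stub(model):
    """Hypothetical stand-in for fastcc(); only the warning is of interest."""
    warnings.warn(
        "fastcc has known discrepancies with the COBRA Toolbox "
        "implementation; see issue #1154.",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return model


# Capture the warning so it can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fastcc_stub(object())

print(caught[0].category.__name__, "-", caught[0].message)
```

Using `stacklevel=2` makes the warning point at the user's call site, which is usually more helpful than pointing inside the library.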

@dagl1 dagl1 linked a pull request Feb 19, 2025 that will close this issue
4 tasks
@dagl1

dagl1 commented Feb 20, 2025

@Midnighter Can we keep this issue open (the PR is going to close it), since the bug still exists? @cdiener Since I will be having a look at this function (and some others) soon, I was wondering whether you still see the specific bug you mentioned. I spotted some other discrepancies between this function and the MATLAB implementation, but I do not see the coefficient setting that you mention. The abs(flux) part in _find_sparse_mode should not cause this, since abs(-2 + 2) is still 0. (If you don't remember, no worries; I will take a look later this week/month.)

@cdiener
Member

cdiener commented Feb 20, 2025

@dagl1 The bug is still there, but it is just a sign error. What's implemented here is LP-7 from the paper. However, cobrapy uses a standard form for the LP problem, so v = v_f - v_b. For the z from the paper, the constraint is v_f - v_b >= z, ergo v_f - v_b - z >= 0, so there is a sign error in the implementation. The rest looks pretty good. Maybe it would be faster to flip the objective coefficients instead of the constraints, but what is there now is not wrong AFAICT.
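
To make the sign argument concrete, here is a small pure-Python check with no solver involved. It compares the constraint form reported for the current code elsewhere in this thread (v_f + v_b - z >= 0) with the form derived above (v_f - v_b - z >= 0). The helper names and the `z_ub` parameter are hypothetical and exist only for this illustration.

```python
def max_z_buggy(v_f, v_b, z_ub=1.0):
    """Largest z allowed by v_f + v_b - z >= 0 (form reported in this thread)."""
    return min(z_ub, v_f + v_b)


def max_z_fixed(v_f, v_b, z_ub=1.0):
    """Largest z allowed by v_f - v_b - z >= 0, i.e. z <= net flux."""
    return min(z_ub, v_f - v_b)


# Forward and reverse flux cancel: the net flux is zero.
v_f, v_b = 1.0, 1.0
print(max_z_buggy(v_f, v_b))  # 1.0: z can reach its upper bound
print(max_z_fixed(v_f, v_b))  # 0.0: z is pinned to the net flux
```

With the first form, a reaction whose forward and reverse variables are both 1 can still push z to its upper bound, which matches the earlier observation that a reaction with zero net flux is treated as active.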

Sorry, originally I wanted to give @synchon a chance to fix it first and then I never checked. And there is already a working function to detect blocked reactions and always enough other urgent things to do, so it wasn't the highest priority for me.

@dagl1

dagl1 commented Feb 20, 2025

Ah @cdiener you mean this part right (changed from + var to - var already):

const = prob.Constraint(
    rxn.forward_variable - rxn.reverse_variable - var,
    name="constraint_{}".format(rxn.id),
    lb=0.0,
)

I also spotted a small source of discrepancy in how active fluxes are considered, as the Matlab implementation checks for >= .99*epsilon while cobrapy checks > zero_cutoff (zero_cutoff being the same as the epsilon of the Matlab implementation).
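
The two activity tests can be compared side by side with a short sketch. The function names are hypothetical, and taking the absolute value of the flux is an assumption made here for illustration; only the two comparisons themselves come from the description above.

```python
def active_matlab_style(flux, epsilon):
    """Activity test as described for the MATLAB code: |v| >= 0.99 * epsilon."""
    return abs(flux) >= 0.99 * epsilon


def active_cobrapy_style(flux, zero_cutoff):
    """Activity test as described for cobrapy: |v| > zero_cutoff."""
    return abs(flux) > zero_cutoff


eps = 1e-4
# The two middle values sit in the band where the tests disagree.
for flux in (0.5e-4, 0.995e-4, 1e-4, 2e-4):
    print(flux, active_matlab_style(flux, eps), active_cobrapy_style(flux, eps))
```

Any flux whose magnitude lies between 0.99 * epsilon and epsilon (inclusive) is counted as active by the first test and inactive by the second, so reactions that the solver returns exactly at the cutoff are a plausible source of small differences.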

However even with these things changed, they still lead to discrepancies, so I will try going over some potential solver setting differences, and investigate the reactions that are actually different.
flux_threshold is just a set of ones in the matlab implementation if I see it correctly, so that shouldn't really need to be changed either way, but I will see if setting it as an int might lead to changes.

If anyone has other ideas about what could lead to differences between the functions, let me know. Even after trying over 50 different settings, I cannot get exactly the same number of consistent reactions as the MATLAB function provides (11681 for Human-GEM 1.17 with MATLAB, whereas the closest I got was 11677, which is only 4 off, but still, I would hope to get the same results eventually).

@cdiener
Member

cdiener commented Feb 20, 2025

Yes exactly, your version is correct. Flux threshold shouldn't matter much for the result. Smaller values can make it faster because more reactions can carry flux, requiring fewer iterations. Epsilon is definitely important. Does the human GEM have blocked reactions? Otherwise, the largest number of consistent reactions is probably the more correct result....

Getting the exact same result may be hard because of floating point accuracy. The > eps comparison is only true within the solver tolerances, so some fluxes might be too close to be called consistently. We use a slightly tighter tolerance than most other packages (1e-7), so maybe adjusting the MATLAB settings brings it more in line. The agreement you are seeing looks pretty good already.
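
One way to picture the tolerance effect is to treat any flux within the solver tolerance of the cutoff as undecided. The sketch below is an illustration under that assumption only; `classify` is a hypothetical helper and does not describe how either implementation actually decides.

```python
def classify(flux, eps, tol):
    """Return 'active', 'inactive', or 'borderline' for a flux value.

    'borderline' means |flux| lies within tol of eps, so a solver that is
    only accurate to tol could report it on either side of the cutoff.
    """
    magnitude = abs(flux)
    if abs(magnitude - eps) <= tol:
        return "borderline"
    return "active" if magnitude > eps else "inactive"


eps = 1e-3
flux = eps + 5e-7  # slightly above the cutoff
print(classify(flux, eps, tol=1e-7))  # outside the tolerance band
print(classify(flux, eps, tol=1e-6))  # inside the wider tolerance band
```

The same flux is a clear call at a tolerance of 1e-7 but undecided at 1e-6, which is one reason two packages with different default tolerances may disagree on a handful of reactions.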

@dagl1

dagl1 commented Feb 21, 2025

Yes, I too would think the more lenient values are correct, and floating point arithmetic is most likely the root of this. In general, though, it's nice to be able to get exactly the same output, so that any downstream analysis that depends on the whole network structure does not differ between Python and Matlab. I will let you know what comes out regarding settings (although so far 4 reactions off is still the best I can do).

I am using Human Gem 1.17 (1.19 is the latest), where the original model has ~12700 reactions, so a good ~1000 are filtered away.
Some are trivially easy to spot by just checking for any metabolite that can only be produced or only be removed (taking into account flipping of reversible reactions). I think I will apply loopless FVA either way to really find any reactions that can take no flux, regardless of whether fastCC becomes fully congruent or not, just to make sure I can reduce the model as much as possible.
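
The "only produced or only removed" check can be sketched without any modelling library. The data layout below (a dict of reaction id to stoichiometry plus a reversibility flag) and the toy network are assumptions made for illustration; they are not Human-GEM data or cobrapy's API.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that can only be produced or only be consumed.

    reactions maps a reaction id to (stoichiometry, reversible), where
    stoichiometry maps metabolite id -> coefficient (negative = consumed).
    A reversible reaction counts as both producing and consuming.
    """
    can_produce, can_consume = set(), set()
    for stoich, reversible in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0 or reversible:
                can_produce.add(met)
            if coeff < 0 or reversible:
                can_consume.add(met)
    # Symmetric difference: metabolites present in exactly one of the sets.
    return sorted(can_produce ^ can_consume)


toy = {
    "EX_a": ({"a": 1}, False),         # uptake of a
    "R1": ({"a": -1, "b": 1}, False),  # a -> b
    "R2": ({"b": -1, "c": 1}, True),   # b <-> c
    "R3": ({"b": -1, "d": 1}, False),  # b -> d, but nothing consumes d
    "EX_c": ({"c": -1}, False),        # secretion of c
}
print(dead_end_metabolites(toy))
```

This is a single pass. Removing the reactions attached to a dead-end metabolite can expose new dead ends, so in practice the check would be repeated until nothing changes.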

@dagl1

dagl1 commented Feb 24, 2025

  • Changing solver tolerances to those used standardly by Cobra toolbox does have some minor effects, however this is at most 1 or 2 reactions different.
  • In general (for almost all the hereafter described results) Matlab with default settings on the same .mat model will have additional consistent reactions. There are some exceptions where the cobrapy function might find 1 or 2 reactions to be consistent that matlab did not find, but overall it is heavily skewed towards the Matlab implementation showing more consistent reactions.
  • Changing the code from the implementation in the current cobrapy version to @cdiener's suggestion of flipping the sign (v_forward + v_backward - z_var to v_forward - v_backward - z_var) drastically reduces the total number of consistent reactions (by about 1000). To me this indicates that either the flipping isn't correct, or the Matlab implementation finds reactions to be flux-carrying when they aren't truly capable of carrying flux (however, all my testing was done under the assumption that the Matlab code is correct and verified).
  • Depending on the settings used in the cobrapy FastCC function, it marks reactions as inconsistent that definitely CAN carry flux, and will be identified by changing the settings.
  • Of interest is that zero_cutoff has very little effect, as long as it is in the range of the correct values (meaning similar to epsilon in Matlab, so anywhere around 1e-3 to 1e-6), which is a bit unexpected.
  • What appears to have a much more drastic effect on the included reactions is the flux_threshold setting. It can make a difference of more than 600 reactions, which is not at all what I expected. It does not follow a clear trend either: increasing the flux threshold seems to increase the number of consistent/kept reactions, but this peaks at a value of 900 (with 880 to 920 checked in steps of 5). A flux threshold of 1 (the default) leads to ~460 reactions marked as inconsistent that were not marked as such by the Matlab implementation (many of which can carry flux according to manual checks, although I did not systematically check all of them).
  • The above observation that flux_threshold is important for the included reactions is baffling to me, as it is only used as the upper bound of the z_var. While I can somewhat imagine that allowing higher values might affect the included reactions, I cannot see why raising this upper bound even further would then reduce the number of consistent reactions.
  • Some specific settings are only 4-5 reactions off from the Matlab result. However, the reactions in question should be able to carry flux, which leads me to think that something is still off with regard to floating point arithmetic. Two of these reactions are pool reactions and thus might actually be on the edge of consistent/inconsistent due to very low stoichiometries. For other reactions this is not the case: one of them must be able to be active (barring some really weird scheme, I suppose), as both of its surrounding reactions are found to be consistent, and when maximizing its flux I get a value of 111 through it.
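
When the totals differ by only a few reactions, listing which reactions differ is usually more informative than comparing counts. A minimal sketch, assuming the consistent reaction ids from each tool have already been exported as plain lists of strings; the function name and the ids are made up for illustration and are not results from Human-GEM.

```python
def compare_consistent(matlab_ids, cobrapy_ids):
    """Split two collections of consistent reaction ids into three groups."""
    matlab_ids, cobrapy_ids = set(matlab_ids), set(cobrapy_ids)
    return {
        "both": sorted(matlab_ids & cobrapy_ids),
        "matlab_only": sorted(matlab_ids - cobrapy_ids),
        "cobrapy_only": sorted(cobrapy_ids - matlab_ids),
    }


# Made-up ids for illustration only.
report = compare_consistent(["R1", "R2", "R3", "R4"], ["R2", "R3", "R5"])
for group, ids in report.items():
    print(group, ids)
```

Two runs can report the same count while disagreeing on individual reactions, so the "matlab_only" and "cobrapy_only" groups are worth checking even when the totals match.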

I will use loopless FVA instead, and might come back to this, but as of right now I will not continue working on this as I will require removal of all reactions that cannot carry flux, which this won't do anyway. I had the hope to find the issue, but why these results are the way they are is unclear to me.

Below inside the spoilers are some findings; new_fastCC implies a change from the original code to have cutoffs as >= 0.99 * zero_cutoff instead of > zero_cutoff:
Matlab finds 11681 reactions to be consistent:

Original implementation

VERIFIED WITH OLD FASTCC version that did not utilize the exact same code as the matlab one
Verified specifically it used flux > zero_cutoff while the matlab function performs flux >= 0.99 * zero_cutoff
filtered_cobra_model_zero_1_0, indices_removed = fastcc(cobra_model, zero_cutoff = 1.0, flux_threshold = 1.0) Verified to not work in realistic time (currently at 68 iterations)
print(len(filtered_cobra_model_zero_1_0.reactions)) Verified... as this leads to no correct outcome due to zero_cutoff being too high, only keeps 8617 reactions (matlab standard function is 11681)
filtered_cobra_model_zero_0_1, indices_removed = fastcc(cobra_model, zero_cutoff = 0.1, flux_threshold = 1.0) verified keeps 11387, 24 iterations
filtered_cobra_model_zero_0_01, indices_removed = fastcc(cobra_model, zero_cutoff = 0.01, flux_threshold = 1.0) verified keeps 11399, 26 iterations
filtered_cobra_model_zero_0_001, indices_removed = fastcc(cobra_model, zero_cutoff = 0.001, flux_threshold = 1.0) verified keeps 11469, 12 iterations
filtered_cobra_model_zero_0_0001, indices_removed = fastcc(cobra_model, zero_cutoff = 0.0001, flux_threshold = 1.0) verified keeps 11487, 12 iterations
filtered_cobra_model_zero_0_00001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 1.0) verified keeps 11440, 11 iterations
filtered_cobra_model_zero_0_000001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 1.0) verified keeps 11440, 11 iterations
filtered_cobra_model_zero_0_00001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 10.0) verified keeps 11607, 27 iterations
filtered_cobra_model_zero_0_000001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 10.0) verified keeps 11601, 27 iterations
filtered_cobra_model_zero_0_00001_ep0_1, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 0.1) verified keeps 11335, 6 iterations
filtered_cobra_model_zero_0_000001_ep0_1, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 0.1) verified keeps 11335, 6 iterations
filtered_cobra_model_ep1_0, indices_removed = fastcc(cobra_model, flux_threshold = 1.0) verified keeps 11440 reactions, 11 iterations
filtered_cobra_model_ep0_1, indices_removed = fastcc(cobra_model, flux_threshold = 0.1) verified keeps 11335 reactions, 6 iterations
filtered_cobra_model_ep0_01, indices_removed = fastcc(cobra_model, flux_threshold = 0.01) verified keeps 11328 reactions, 4 iterations
filtered_cobra_model_ep0_001, indices_removed = fastcc(cobra_model, flux_threshold = 0.001) verified keeps 11125, 8 iterations
filtered_cobra_model_ep0_0001, indices_removed = fastcc(cobra_model, flux_threshold = 0.0001) verified keeps 11247, 17 iterations
filtered_cobra_model_ep0_00001, indices_removed = fastcc(cobra_model, flux_threshold = 0.00001) verified keeps 11205, 14 iterations
filtered_cobra_model_ep0_000001, indices_removed = fastcc(cobra_model, flux_threshold = 0.000001) verified keeps 11326, 14 iterations
filtered_cobra_model_zero_0_00001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 100.0) verified keeps 11668, 66 iterations
filtered_cobra_model_zero_0_000001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 100.0) verified keeps 11668, 66 iterations
filtered_cobra_model_zero_0_00001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 1000.0) verified keeps 11674, 211 iterations
filtered_cobra_model_zero_0_000001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 1000.0) verified keeps 11674, 211 iterations
filtered_cobra_model_zero_0_0001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.0001, flux_threshold = 100.0) verified keeps 11668, 66 iterations
filtered_cobra_model_zero_0_0001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.0001, flux_threshold = 1000.0) verified keeps 11674, 211 iterations
IMPORTANT END of old FastCC verification with old matlab version that did not utilize the exact same code as the matlab one
IMPORTANT specifically verified that the old FastCC used flux > zero_cutoff while the matlab function performs flux >= 0.99 * zero_cutoff

edit: fixed layout (removed BOLD from spoiler text)

New implementation

VERIFIED Below is verified with the new fastCC function that utilizes the same cutoff as the matlab function, only using the settings that previously found at least 11500 or higher
new_fastcc_filtered_cobra_model_zero_0_000001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 100.0) verified keeps 11658, 67 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 100.0) verified keeps 11658, 67 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 1000.0) verified keeps 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 1.0) verified keeps 11473, 12 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 1.0) verified keeps 11473, 12 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.00001, flux_threshold = 10.0) verified 11603, 27 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff = 0.000001, flux_threshold = 10.0) verified 11610, 27 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=1000.0) verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1000.0) verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1100.0) verified 11675, 203 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=900.0) verified 11677, 180 iterations important best
new_fastcc_filtered_cobra_model_zero_0_00001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1100.0) verified 11675, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=900.0) verified 11677, 180 iterations
new_fastcc_filtered_cobra_model_zero_0_00002_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=1000.0) verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_000005_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=1000.0) verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep800_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=800.0) verified 11675, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep950_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=950.0) verified 11674, 188 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1050_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1050.0) verified 11675, 200 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1200_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1200.0) verified 11674, 201 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1500_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1500.0) verified 11674, 193 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1100.0) verified 11675, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=900.0) verified 11677, 180 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep880_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=880.0) verified 11674, 179 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep890_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=890.0) verified 11674 reactions, 184 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep895_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=895.0) verified 11676, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep905_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=905.0) verified 11676, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep910_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=910.0) verified 11675, 189
new_fastcc_filtered_cobra_model_zero_0_00001_ep915_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=915.0) verified 11676, 187 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep920_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=920.0) verified 11676, 186 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep880_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=880.0) verified 11674, 184 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep890_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=890.0) verified 11676 reactions, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep895_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=895.0) verified 11676, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep905_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=905.0) verified 11676, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep910_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=910.0) verified 11675, 189 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep915_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=915.0) verified 11676, 187 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep920_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=920.0) verified 11676, 186
new_fastcc_filtered_cobra_model_zero_0_00002_ep800_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=800.0) verified 11674, 177
new_fastcc_filtered_cobra_model_zero_0_00002_ep1200_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=1200.0) verified 11676, 200
new_fastcc_filtered_cobra_model_zero_0_00002_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=900.0) verified 11670, 184
new_fastcc_filtered_cobra_model_zero_0_000005_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=900.0) verified 11674, 184
new_fastcc_filtered_cobra_model_zero_0_0000001_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=999.0) verified 11674, 185 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=999.0) verified 11675, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_00002_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=999.0) verified 116745, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000005_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=999.0) verified 116745, 182 iterations

@cdiener
Member

cdiener commented Feb 24, 2025

You're right. Looking a bit more at the code there are more issues. The flipping shouldn't change the objective and it is missing the singleton step. I will start a fix as a reference.

@cdiener
Member

cdiener commented Feb 26, 2025

I sent in a PR with a fix. This version seems to work fine. It gives the same result as find_blocked_reactions with default solver settings and if I change the tolerance to 1e-6 it gives the same result as the Matlab version for Human Gem 1.17.

>>> from cobra.io import load_matlab_model
>>> from cobra.core.gene import GPR
KeyboardInterrupt
>>> from cobra.flux_analysis import fastcc, find_blocked_reactions
>>> import logging
>>> logging.basicConfig(level="INFO")
>>> mod = load_matlab_model("/home/cdiener/Downloads/Human-GEM-1.17.mat")
>>> mod.tolerance = 1e-6
>>> cmod = fastcc(mod, 1e-3, 1e-6)
INFO:cobra.flux_analysis.fastcc:Initial step found 10241 consistent reactions. Starting the consistency loop for the remaining 2728 reactions.
INFO:cobra.flux_analysis.fastcc:Final - consistent reactions: 11681 - inconsistent reactions: 1288 [eps=0.001, tol=1e-06]
>>> 

There were a few things that needed changing. It could probably be sped up a bit too. The last singleton steps are essentially an FVA, so it might make sense to switch over to that.

@dagl1

dagl1 commented Feb 27, 2025

@cdiener Oh, that is perfect. I will look at the changes in detail later (as it will be interesting to see exactly what changed), but I can already see that the implementation is a lot closer to the Matlab version. I had been running the CycleFreeFlux FVA in the meantime and found that with fastCC (Matlab implementation at a slightly different epsilon than before) followed by CycleFreeFlux FVA (removing any reaction that did not have a flux above 0.001) I am left with 10127 reactions, which is still quite a big difference in the number of unreachable reactions.

As fastCC is LP based, I was thinking that with the now-working fastCC you created (thanks again, I really appreciate it, and I think others will too of course), we might be able to include a loopless option by using CycleFreeFlux. But then I realised that, since in the initial steps (nearly) all reactions are part of the objective (or their z_var is), this will cause problems for CycleFreeFlux, as I remember from the paper that it doesn't deal with cycles whose constituent reactions are part of the objective.
I imagine that this would leave us with only being able to utilize actual loopless constraints (and turning this whole thing into several MILPs). At which point it might just be faster to do CycleFreeFVA in the first place (or do you think it is worth the time for me to look into (do you see any chance of this working in the first place basically))?

@cdiener
Member

cdiener commented Feb 28, 2025

You could try that for sure. CycleFree FVA is much slower though. I think the fastest would be to use fastcc or find_blocked_reactions (which is also fast) and then use CycleFreeFlux once you actually use the model for FBA.

@dagl1

dagl1 commented Feb 28, 2025

The issue would be that I am not doing FBA but using different overall objectives that would be affected by fluxes. Using looplaw constraints is fine for this, but at the moment I am mostly attempting to reduce the model as much as humanly possible before attempting any further processing (it also somewhat affects the amount of model curation I need to do, because adjusting lumped reactions (the bane of my algorithm) is not necessary when a reaction is blocked in the first place). As far as I can see, find_blocked_reactions does not deal with loops either (hence I prefer using FVA directly with allowLoops as True).

It seems strange to me that a reaction that cannot carry flux on its own (because, for instance, it is not connected to the network or the exchange reactions) is considered "not blocked" when it is part of a thermodynamically infeasible triangular loop. Any reaction that cannot carry flux without loops is blocked (to me), and reactions that can carry flux only through loops shouldn't be any less blocked (as they still remain unconnected to the overall network).
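
The point about loops can be illustrated with a toy network that has no exchange reactions at all. The stoichiometric matrix and the helper `mass_balance` below are made up for this sketch; the only claim is that a closed cycle satisfies the steady-state condition S·v = 0 with nonzero flux.

```python
# Toy network with no exchange reactions: A -> B -> C -> A.
# Rows are metabolites A, B, C; columns are reactions R1, R2, R3.
S = [
    [-1, 0, 1],   # A: consumed by R1, produced by R3
    [1, -1, 0],   # B: produced by R1, consumed by R2
    [0, 1, -1],   # C: produced by R2, consumed by R3
]


def mass_balance(S, v):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(coeff * flux for coeff, flux in zip(row, v)) for row in S]


loop_flux = [1.0, 1.0, 1.0]
print(mass_balance(S, loop_flux))        # all zeros: steady state holds
print(mass_balance(S, [1.0, 0.0, 0.0]))  # a single reaction alone is unbalanced
```

A check that only enforces S·v = 0 and the flux bounds therefore sees all three reactions as able to carry flux, even though nothing enters or leaves the network. Excluding such reactions requires an additional loopless or thermodynamic constraint.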

However I admit that I am potentially thinking about this all wrong? Would find_blocked_reactions deal with this, or are there reasons to think that reactions that can carry only thermodynamically infeasible fluxes, should not be considered blocked?
