FastCC results differ from respective MATLAB function #1154
Comments
Thanks for the report @NantiaL. Can you please post the exact code that you used to generate these results? |
Hey, here it is: MATLAB:
Python:
In Python I also got different results over multiple runs after setting the zero_cutoff and/or the flux_threshold to 1e-4. |
From a cursory look it seems like there are some bugs in the FastCC coefficient setting. For instance, a reaction with a forward flux of 1 and a reverse flux of 1 (net flux 0) is considered active in our formulation. |
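To make the forward/reverse example concrete, here is a minimal Python sketch. It is not cobrapy's actual code: the function names and the cutoff value are assumptions chosen for illustration. It contrasts a check that adds the forward and reverse variables with one that uses the net flux.

```python
# Hypothetical illustration; not taken from cobrapy's fastcc implementation.
ZERO_CUTOFF = 1e-6  # assumed value for this example


def active_by_sum(forward: float, reverse: float, cutoff: float = ZERO_CUTOFF) -> bool:
    """Treat the reaction as active if forward + reverse exceeds the cutoff."""
    return forward + reverse > cutoff


def active_by_net_flux(forward: float, reverse: float, cutoff: float = ZERO_CUTOFF) -> bool:
    """Treat the reaction as active only if |forward - reverse| exceeds the cutoff."""
    return abs(forward - reverse) > cutoff


# Forward flux 1 and reverse flux 1 cancel out, so the net flux is 0.
print(active_by_sum(1.0, 1.0))
print(active_by_net_flux(1.0, 1.0))
```

With the summed form, the cancelling pair is reported as active even though nothing flows through the reaction overall; the net-flux form reports it as inactive.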
Apologies for the delayed response. I'll take a look. Thanks for the report! |
Hello, I was wondering if you ever found any leads on what is causing this issue. I tried to diagnose it myself, but it got quite overwhelming for me to understand. This issue seems pretty important for research, since results should be reproducible. Thank you |
Hi @babessell1 ! The problem is most likely in the problem formulation, as @cdiener pointed out earlier. I have been quite busy lately and I'm not sure when I can tackle it. Apologies for that, as I had self-assigned it. I'll remove the assignment and let someone else take a stab at it. |
Hi there, I have a more general question, as this is apparently still not fixed: the last change to fastCC was in 2021 if I am not mistaken, and at least one bug was identified in 2022, yet this function still exists in this state. Why is it not removed, or why does it not at least show a warning that it is not to be used? It greatly reduces any confidence in this repository if this is the case (I hope I am mistaken and missed the fix/commit; however, I too find differences in the number of reactions found between the MATLAB and Python implementations). The fact that I can use the fastCC function without knowing that there are problems with it is problematic, isn't it? |
I could write quite a bit about the project now, instead, please consider the following question: |
The changes that are necessary to fix and test the function are of course no one's responsibility; however, the people accepting PRs, or even those that identified issues, probably do have the responsibility to make people aware of known issues. Now I understand that you might say: this is no one's job, and as this is an open source repository, people could make these changes if they want to. However, I would counter that allowing known bugs to persist without informing users (for several years) is unbecoming of academics. (this would only require 2 lines of code): import warnings |
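The code from the comment above did not survive, so the following is only a sketch of the kind of notice being asked for, using Python's standard `warnings` module. The wrapper name `fastcc_with_notice` and the message text are hypothetical and do not come from cobrapy.

```python
import warnings


def fastcc_with_notice(*args, **kwargs):
    """Hypothetical wrapper; the name and message are illustrative only."""
    warnings.warn(
        "fastcc has known discrepancies with the MATLAB implementation; "
        "results may not be reproducible.",
        UserWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    # A real implementation would call the underlying function here.
    return None
```

Using `UserWarning` keeps the notice visible by default, whereas `DeprecationWarning` is hidden in many contexts unless the user enables it.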
@Midnighter Can we keep this issue open (the PR is going to close it), as the bug still exists? @cdiener Since I will be having a look at this function (and some others) soon, I was wondering if you still see the specific bug you mentioned. I spotted some other discrepancies between this function and the MATLAB implementation, but I do not see the coefficient setting that you mention. The abs(flux) part in _find_sparse_mode won't deal with this; abs(-2 + 2) should still be 0 (if you don't remember, no worries, I will take a look later this week/month) |
@dagl1 Bug is still there but it is just a sign error. What's implemented here is LP-7 from the paper. However, cobrapy is using a standard form for the LP problem, so Sorry, originally I wanted to give @synchon a chance to fix it first and then I never checked. And there is already a working function to detect blocked reactions and always enough other urgent things to do, so it wasn't the highest priority for me. |
Ah @cdiener you mean this part right (changed from + var to - var already): const = prob.Constraint( I also spotted a small source of discrepancy in how active fluxes are considered, as the MATLAB implementation checks for >= .99*epsilon while cobrapy checks > zero_cutoff (zero_cutoff being the same as the epsilon of the MATLAB implementation). However, even with these things changed, there are still discrepancies, so I will try going over some potential solver setting differences and investigate the reactions that are actually different. If anyone has other ideas about what could lead to differences between the functions, let me know. Even after trying over 50 different settings I cannot get the exact same number of consistent reactions as the MATLAB function provides (11681 for Human-GEM 1.17 with MATLAB, whereas the closest I got was 11677, which is only 4 off, but still, I would hope to get the same results eventually). |
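The two activity checks mentioned above can be compared with a small sketch. The function names are hypothetical, and taking the absolute value of the flux is an assumption made here; only the `> zero_cutoff` and `>= 0.99 * epsilon` comparisons come from the comment.

```python
# Hypothetical illustration of the two activity checks discussed above.
def is_active_strict(flux: float, zero_cutoff: float) -> bool:
    """Check described for cobrapy: |flux| > zero_cutoff."""
    return abs(flux) > zero_cutoff


def is_active_lenient(flux: float, epsilon: float) -> bool:
    """Check described for the MATLAB version: |flux| >= 0.99 * epsilon."""
    return abs(flux) >= 0.99 * epsilon


epsilon = 1e-4
for flux in (0.5e-4, 0.995e-4, 1.0e-4, 2.0e-4):
    # Print the flux and the verdict of each check side by side.
    print(flux, is_active_strict(flux, epsilon), is_active_lenient(flux, epsilon))
```

Fluxes that sit exactly at the cutoff, or just below it, are the ones that get classified differently, which is one plausible source of a handful of reactions changing status between implementations.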
Yes exactly, your version is correct. Flux threshold shouldn't matter much for the result. Smaller values can make it faster because more reactions can carry flux, requiring fewer iterations. Epsilon is definitely important. Does the human GEM have blocked reactions? Otherwise, the largest number of consistent reactions is probably the more correct result.... Getting the exact same result may be hard because of floating point accuracy. The > eps comparison is only true within the solver tolerances. So some fluxes might be too close to be called consistently. We use a slightly tighter tolerance than most other packages (1e-7) so maybe adjusting the MATLAB settings brings it more in line. The agreement you are seeing looks pretty good already. |
Yes, I too would think the more lenient values are correct, and most likely floating point arithmetic will be the root of this, but in general it's nice to be able to see the exact same output (just so that any analysis that uses the whole network structure or whatever down the line won't be different between Python and MATLAB). I will let you know what comes out regarding settings (although so far 4 reactions off is the best I can do). I am using Human-GEM 1.17 (1.19 is the latest), where the original model has ~12700 reactions, so a good ~1000 are filtered away. |
I will use loopless FVA instead, and might come back to this, but as of right now I will not continue working on this as I will require removal of all reactions that cannot carry flux, which this won't do anyway. I had the hope to find the issue, but why these results are the way they are is unclear to me. Below inside the spoilers are some findings; new_fastCC implies a change from the original code to have cutoffs as >= 0.99 * zero_cutoff instead of > zero_cutoff:

Original implementation
VERIFIED WITH OLD FASTCC version that did not utilize the exact same code as the MATLAB one

edit: fixed layout (removed BOLD from spoiler text)

New implementation
VERIFIED
Below is verified with the new fastCC function that utilizes the same cutoff as the MATLAB function, only using the settings that previously found at least 11500 or higher

new_fastcc_filtered_cobra_model_zero_0_000001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=100.0)  # verified keeps 11658, 67 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=100.0)  # verified keeps 11658, 67 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1000.0)  # verified keeps 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=1.0)  # verified keeps 11473, 12 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1.0)  # verified keeps 11473, 12 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=10.0)  # verified 11603, 27 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep10_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=10.0)  # verified 11610, 27 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=1000.0)  # verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1000.0)  # verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1100.0)  # verified 11675, 203 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=900.0)  # verified 11677, 180 iterations important best
new_fastcc_filtered_cobra_model_zero_0_00001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1100.0)  # verified 11675, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=900.0)  # verified 11677, 180 iterations
new_fastcc_filtered_cobra_model_zero_0_00002_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=1000.0)  # verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_000005_ep1000_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=1000.0)  # verified 11674, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep800_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=800.0)  # verified 11675, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep950_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=950.0)  # verified 11674, 188 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1050_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1050.0)  # verified 11675, 200 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1200_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1200.0)  # verified 11674, 201 iterations
new_fastcc_filtered_cobra_model_zero_0_0000001_ep1500_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=1500.0)  # verified 11674, 193 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep1100_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=1100.0)  # verified 11675, 205 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=900.0)  # verified 11677, 180 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep880_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=880.0)  # verified 11674, 179 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep890_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=890.0)  # verified 11674 reactions, 184 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep895_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=895.0)  # verified 11676, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep905_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=905.0)  # verified 11676, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep910_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=910.0)  # verified 11675, 189
new_fastcc_filtered_cobra_model_zero_0_00001_ep915_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=915.0)  # verified 11676, 187 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep920_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=920.0)  # verified 11676, 186 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep880_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=880.0)  # verified 11674, 184 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep890_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=890.0)  # verified 11676 reactions, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep895_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=895.0)  # verified 11676, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep905_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=905.0)  # verified 11676, 178 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep910_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=910.0)  # verified 11675, 189 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep915_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=915.0)  # verified 11676, 187 iterations
new_fastcc_filtered_cobra_model_zero_0_000001_ep920_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000001, flux_threshold=920.0)  # verified 11676, 186
new_fastcc_filtered_cobra_model_zero_0_00002_ep800_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=800.0)  # verified 11674, 177
new_fastcc_filtered_cobra_model_zero_0_00002_ep1200_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=1200.0)  # verified 11676, 200
new_fastcc_filtered_cobra_model_zero_0_00002_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=900.0)  # verified 11670, 184
new_fastcc_filtered_cobra_model_zero_0_000005_ep900_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=900.0)  # verified 11674, 184
new_fastcc_filtered_cobra_model_zero_0_0000001_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.0000001, flux_threshold=999.0)  # verified 11674, 185 iterations
new_fastcc_filtered_cobra_model_zero_0_00001_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00001, flux_threshold=999.0)  # verified 11675, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_00002_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.00002, flux_threshold=999.0)  # verified 116745, 182 iterations
new_fastcc_filtered_cobra_model_zero_0_000005_ep999_0, indices_removed = fastcc(cobra_model, zero_cutoff=0.000005, flux_threshold=999.0)  # verified 116745, 182 iterations |
You're right. Looking a bit more at the code there are more issues. The flipping shouldn't change the objective and it is missing the singleton step. I will start a fix as a reference. |
I sent in a PR with a fix. This version seems to work fine. It gives the same result as
>>> from cobra.io import load_matlab_model
>>> from cobra.core.gene import GPR
KeyboardInterrupt
>>> from cobra.flux_analysis import fastcc, find_blocked_reactions
>>> import logging
>>> logging.basicConfig(level="INFO")
>>> mod = load_matlab_model("/home/cdiener/Downloads/Human-GEM-1.17.mat")
>>> mod.tolerance = 1e-6
>>> cmod = fastcc(mod, 1e-3, 1e-6)
INFO:cobra.flux_analysis.fastcc:Initial step found 10241 consistent reactions. Starting the consistency loop for the remaining 2728 reactions.
INFO:cobra.flux_analysis.fastcc:Final - consistent reactions: 11681 - inconsistent reactions: 1288 [eps=0.001, tol=1e-06]
>>>
There were a few things that needed changing. It could probably be sped up a bit too. The last singleton steps are essentially an FVA, so it might make sense to switch over to that. |
@cdiener Oh that is perfect, I will look at the changes later in detail (as it will be interesting to observe exactly what changed), but I can already see the implementation is a lot closer to the MATLAB version. I had been running the CycleFreeFlux FVA in the meantime and found that with fastCC (MATLAB implementation at a slightly different epsilon than before) followed by CycleFreeFlux FVA (removing any reaction that did not have a flux above 0.001) I am left with 10127 reactions, which is still quite a big difference in the number of unreachable reactions. As fastCC is LP based, I was thinking that with the now-working fastCC you created (thanks again, I really appreciate it, and I think others will too of course), we might be able to already include a loopless option by using CycleFreeFlux, but then I realised that since in the initial steps (nearly) all reactions are part of the objective (or their z_var is), this will give problems for CycleFreeFlux, as I remember from the paper that it doesn't deal with cycles that have constituent reactions which are part of the objective. |
You could try that for sure. CycleFree FVA is much slower though. I think the fastest would be to use fastcc or find_blocked_reactions (which is also fast) and then use CycleFreeFlux once you actually use the model for FBA. |
The issue would be that I am not doing FBA but different overall objectives that would be affected by fluxes. Using looplaw constraints is fine for this, but at the moment I am mostly attempting to reduce the model as much as humanly possible before attempting any further processing (it also somewhat affects the amount of model curation I need to do, because adjusting lumped reactions (the bane of my algorithm) is not necessary when a reaction is blocked in the first place). As far as I can see, find_blocked_reactions does not deal with loops either (hence I prefer using FVA directly with allowLoops as True). The idea that a reaction which cannot carry flux (due to it, for instance, not being connected to the network/exchange reactions) would be considered "not blocked" when it is part of a triangular, thermodynamically infeasible loop is strange to me. Any reaction that cannot carry flux without loops is blocked (to me), and reactions that can carry flux only through loops shouldn't be any less blocked (as they still remain unconnected to the overall network). However, I admit that I am potentially thinking about this all wrong. Would find_blocked_reactions deal with this, or are there reasons to think that reactions that can carry only thermodynamically infeasible fluxes should not be considered blocked? |
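The loop situation described above can be sketched with a toy example. The network is an assumption made up for illustration, not taken from Human-GEM: three reactions form a closed cycle with no exchange reactions. A nonzero flux vector still satisfies the steady-state constraint S v = 0, which is why a purely LP-based check without loop constraints would not flag these reactions as blocked.

```python
# Toy network (hypothetical): metabolites A, B, C and a closed cycle of reactions.
# Columns: R1 (A -> B), R2 (B -> C), R3 (C -> A). There are no exchange reactions.
S = [
    [-1, 0, 1],   # A: consumed by R1, produced by R3
    [1, -1, 0],   # B: produced by R1, consumed by R2
    [0, 1, -1],   # C: produced by R2, consumed by R3
]


def mass_balance(stoichiometry, fluxes):
    """Return S @ v, the net production rate of every metabolite."""
    return [
        sum(coeff * flux for coeff, flux in zip(row, fluxes))
        for row in stoichiometry
    ]


loop_flux = [1.0, 1.0, 1.0]  # the same flux through every reaction in the cycle
residual = mass_balance(S, loop_flux)
print(residual)

# All residuals are zero, so the cycle is a feasible steady state even though
# nothing enters or leaves the network.
print(all(abs(r) < 1e-12 for r in residual))
```

Whether such reactions should count as blocked depends on whether thermodynamically infeasible cycles are excluded, which is what the loopless variants of FVA are meant to handle.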
Hello, I would like to raise a question regarding the FastCC implementation of Cobrapy. After multiple runs, FastCC outputs a different number of blocked reactions using the Recon1 model. This might be reasonable, because optimization does not necessarily deliver unique solutions. But the respective MATLAB function from the COBRA Toolbox always returns the same results after several runs. Since both implementations are based on the same algorithm, I guess both should give the same results. So, I think this might be something worth checking.
I noticed that the number of irreversible reactions detected in the FastCC implementation deviates from the number of irreversible reactions computed by the MATLAB function. Reactions that are irreversible in the backward direction are not reported as irreversible in the MATLAB command window.
For Recon1:
MATLAB:
Python:
Used solver: glpk, python 3.8.5