Create multimedia.qmd #768
base: devel
Thanks!
Check the suggestions and the following:
- Increase support for the mia and (Tree)SummarizedExperiment methods where possible; this is very useful for compatibility with many other methods. I added suggestions. Also, multimedia seems to support SE, so this should be done.
- Let us try to avoid repetition in the code; it now has two almost identical parts. Shall we prepare example data that is readily usable for the examples?
- I am wondering whether there is a way to combine the two mediation chapters into one?
- If we consider these methods widely usable, we should consider providing wrappers for the interpretation, summaries, and visualizations of the results, either as contributions to the "multimedia" package or in mia or some other pkg (it doesn't matter which pkg as long as it works).

Could you resolve the cases you have closed (using the "Resolve" buttons)?

Fixed most of the detailed comments; still working on the more general ones, like creating a pkg. Will finish this asap.

Can you resolve the completed suggestions above and confirm whether this is ready to merge?

Just fixed some minor inconsistencies in the variable names.

To be checked in more detail before merge, regarding the following:

Hi Leo, thank you for the suggestions. I have added some wrapper functions, especially where you indicated. For the data cleaning steps, do you mean that we can remove those parts and add some text to explain them instead?

The data cleaning part would work best if we include suitably cleaned demo data in the R pkg; then we can skip the data cleaning steps in OMA. The problem is that if we did the cleaning for every single method, more than half of the book could easily be data cleaning examples, which would also shift the focus from the actual method to general data processing steps. Can we include demo data set/s in multimedia, mia, or another pkg, or can we use already existing demo data sets?

Regarding the wrappers, we could see if we can include these in a package if they would be generally useful anyway.

Hi Leo, just wanted to quickly chime in here. This dataset is a special case where we don't have a conventional, well-defined outcome for mediation analysis. Instead, the outcome is a dysbiosis score derived from taxonomic profiles, as described by Lloyd-Price et al. (2019), which, to my knowledge, hasn't been implemented elsewhere. We are already drawing the data from curatedMetagenomicData, but creating the dysbiosis score is a necessary step to showcase the mediation analysis.

Thanks. I think the dysbiosis score could then stay there. Let's check @TuomasBorman's feedback.

TuomasBorman
left a comment
Thanks, looks very good! However, there are a couple of points to discuss.
This book is intended as teaching material to demonstrate specific analyses and tasks. The main focus is on the underlying ideas and concepts rather than implementation details. Therefore, the code is kept as simple as possible to achieve the intended outcomes. More polished analyses, including advanced plots, are better suited for workflow packages (we are also considering creating a workflow package to showcase more complex analyses).
There are a couple of lengthy code chunks that post-process the results. These long chunks have the opposite of the intended effect: readers are exhausted by the amount of code and cannot focus on the main points. Everything that is not relevant to the main point should be removed, even if the plots or names end up less than optimal.
Ideally, these wrappers will be implemented in a package, but of course that takes some time. At the least, we should have a plan to implement them. If we just leave them here, they will never be implemented, as we are very busy with everything.

Can you @YihanLiu4023 confirm when you're ready with the updates? Can you also press the "Update branch" button above to ensure that this PR is in sync with the latest devel branch?

Updated file submitted!

antagomir
left a comment
Thanks - some points to clarify still.
Overall this looks good, but it will be essential to consider the following:
- Data cleaning operations should either be left out, or a ready-cleaned demo data set should be used (it can be placed in one of our packages). Otherwise OMA would primarily be a collection of data cleaning examples.
- We need to check whether any ready-made functionality is available to fetch and visualize results from mediation analyses. Currently this is done with custom code. Maybe the multimedia package has some utilities; if not, we should see whether some tasks are so central that they should be standardized (either in the multimedia package as a contributed PR, or in mia).

    # Calculate dysbiosis score for each sample
    # For each sample i, we compute the median distance between sample i and all reference samples in `ref_set`
    sample_ids <- tse[["SampleID"]]

    tse[["dysbiosis"]] <- sapply(seq_along(tse$disease_binary), function(i) {

        # Logical vector indicating all other reference samples (excluding i)
        ref_others <- tse$disease_binary & (sample_ids != sample_ids[i])

        # Compute median distance between sample i and these reference samples
        median(diss[i, ref_others], na.rm = TRUE)

    })

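For readers of this thread, the calculation in the chunk above can be sketched on toy data in base R. This is only an illustration under stated assumptions: the abundance matrix, the `is_reference` flag, and the `bray_curtis()` helper are hypothetical stand-ins (the PR computes `diss` elsewhere, and Bray-Curtis is assumed here as the dissimilarity), and nothing below is taken from the PR itself.

```r
# Toy relative-abundance table: 6 samples x 5 taxa (hypothetical data)
set.seed(42)
abund <- matrix(rexp(6 * 5), nrow = 6,
                dimnames = list(paste0("s", 1:6), paste0("taxon", 1:5)))
abund <- abund / rowSums(abund)

# Hypothetical flag marking which samples form the reference set
is_reference <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Pairwise Bray-Curtis dissimilarity (assumed metric for this sketch)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  d
}
diss <- bray_curtis(abund)

# Dysbiosis score: median dissimilarity of sample i to all reference
# samples other than itself
dysbiosis <- vapply(seq_len(nrow(diss)), function(i) {
  ref_others <- is_reference
  ref_others[i] <- FALSE
  median(diss[i, ref_others])
}, numeric(1))
names(dysbiosis) <- rownames(diss)
```

Excluding sample i from its own reference set matters only for reference samples; without it, their zero self-distance would pull the median down.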
Can you check if the functions from the dysbiosis package could be applicable here?
I noticed that the dysbiosisR package does have a similar function called "dysbiosisMedianCLV()", but it can only be applied to phyloseq objects. I can use "mia::convertToPhyloseq()" for the conversion, but I want to make sure that we actually want this conversion, since it is a little complex as well.
We do not want to convert to phyloseq. We could probably add SE support to this package relatively easily, but maybe it is better not to wait for that. Unless @microsud wants to comment on this?

    # Create the Mediation Data object
    exper <- mediation_data(
        se_relative,
        outcomes = "dysbiosis",
        treatments = "treatment",
        mediators = medi_idx
    )

The multimedia package already seems to have support for SummarizedExperiment that we are using. Just confirming, is it really necessary to convert SE into Mediation Data object?
I think it is necessary to convert SE into a Mediation Data object, as confirmed by the multimedia tutorial.
Right, I thought that might be the case. In that case we can keep this, but at the same time an issue should be opened (in OMA?) to propose a PR to multimedia that adds direct SE support (the extra conversion step can probably be automated).

    # Clean the pathway names
    pwy <- rownames(summary_df)
    pwy <- gsub("PWY0\\.", "PWY0-", pwy)   # PWY0.xxx → PWY0-xxx
    pwy <- gsub("PWY\\.", "PWY-", pwy)     # PWY.xxx → PWY-xxx
    pwy <- gsub("\\.", " ", pwy)           # leftover dots → spaces
    pwy <- gsub("__", ": ", pwy)           # double underscores → colon
    pwy <- gsub("_\\.", " ", pwy)          # underscore then dot → space
    pwy <- gsub("_", " ", pwy)             # remaining underscores → space
    pwy <- gsub("\\.\\.", " ", pwy)        # double dots → space
    pwy <- gsub("\\.$", "", pwy)           # remove ending period
    # pwy <- trimws(pwy)                   # final cleanup
    rownames(summary_df) <- pwy

Can we prepare demo data in one of the R packages (e.g. mia) that is sufficiently clean?
OMA is dedicated to showing how methods work, and data cleaning is so common that it will overwhelm all examples in the book unless we actively avoid it.
Yes, I can do that. Do I just need to create a new pull request to the mia package for the cleaned demo data?
Yes, a PR. It should follow the same conventions as the other demo data sets.

    ggplot(summary_df, aes(x=estimate, y=reorder(mediator, estimate), color=significant)) +
        geom_point() +
        geom_errorbarh(aes(xmin=lower, xmax=upper), height=0.2) +
        geom_vline(xintercept=0, linetype="dashed", color="grey") +
        theme_classic() +
        labs(
            x = "Observed Indirect Effect with Bootstrap CI",
            y = "Mediator",
            title = "Forest Plot of Mediation-specific Indirect Effects for Pathway"
        ) +
        scale_color_manual(values=c("black","red")) +
        theme(legend.position="bottom")

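As background for the "Observed Indirect Effect with Bootstrap CI" axis in the plot above, the quantity behind each row of such a forest plot can be sketched for a single mediator in base R. This is a minimal sketch, assuming a simple product-of-coefficients estimate from two linear models and a percentile bootstrap; it is not the estimator used by the multimedia package or by the PR, and the simulated data and the `indirect_effect()` helper are hypothetical.

```r
# Simulated data: one binary treatment, one mediator, one outcome
set.seed(1)
n <- 200
treatment <- rbinom(n, 1, 0.5)
mediator  <- 0.8 * treatment + rnorm(n)
outcome   <- 0.5 * mediator + 0.2 * treatment + rnorm(n)
dat <- data.frame(treatment, mediator, outcome)

# Indirect effect = (treatment -> mediator) * (mediator -> outcome | treatment)
indirect_effect <- function(d) {
  a <- coef(lm(mediator ~ treatment, data = d))[["treatment"]]
  b <- coef(lm(outcome ~ mediator + treatment, data = d))[["mediator"]]
  a * b
}

# Observed estimate on the full data
estimate <- indirect_effect(dat)

# Percentile bootstrap: resample rows with replacement and re-estimate
boot <- replicate(500, indirect_effect(dat[sample(n, replace = TRUE), ]))
ci <- quantile(boot, c(0.025, 0.975))
```

A mediator would be flagged as significant in this scheme when the interval `ci` excludes zero, which is what the dashed vertical line at zero in the forest plot helps to read off.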
Can we consider having a default visualization method for mediation analyses, for instance in the miaViz pkg (if not in the multimedia pkg itself)?
We can add a package-native mediator visualization using plot_mediators(), which shows the observed outcome-mediator relationships (colored by treatment) for the top pathwise indirect effects. Here are the example plots for species:

(Mediators are standardized (z-scored) for comparability across features; hence values may be negative. In addition, many microbial features are sparse: several mediators exhibit near-zero values for most samples, producing vertical bands.)
A miaViz-based visualization would instead emphasize the taxonomic/pathway abundance structure (e.g., distributions/compositional patterns) rather than mediation-effect estimates. Our original bootstrap forest plot complements plot_mediators() by summarizing effect uncertainty via CIs across mediators. Do you want plot_mediators() to be the primary/default visualization (with the forest plot as optional), or should we keep both as complementary views?
plot_mediators() sounds good. In which R/Bioc package would this be placed? Some programming conventions could be considered depending on the answer.
No description provided.