Commit dfae57e

Merge pull request #16 from solislemuslab/develop
MiNAA v1.2.0
2 parents 38795c7 + 0712960 commit dfae57e

13 files changed

+483
-29
lines changed

.github/workflows/draft-pdf.yml

+1 -1
Original file line number | Diff line number | Diff line change
@@ -14,7 +14,7 @@ jobs:
1414
# This should be the path to the paper within your repo.
1515
paper-path: paper/paper.md
1616
- name: Upload
17-
uses: actions/upload-artifact@v1
17+
uses: actions/upload-artifact@v4
1818
with:
1919
name: paper
2020
# This is the output path where Pandoc will write the compiled

README.md

+13 -3
@@ -1,8 +1,8 @@
11
# MiNAA: Microbiome Network Alignment Algorithm
22

3-
<img src="logo.png" style="width:40%;" align=right>
3+
<img src="img/logo.png" style="width:40%;" align=right>
44

5-
[![GitHub Releases](https://img.shields.io/github/v/release/solislemuslab/minaa?display_name=tag)](https://github.com/solislemuslab/minaa/releases) [![GitHub license](https://img.shields.io/github/license/solislemuslab/minaa)](https://github.com/solislemuslab/minaa/blob/main/LICENSE) [![GitHub Issues](https://img.shields.io/github/issues/solislemuslab/minaa)](https://github.com/solislemuslab/minaa/issues) ![ ](https://img.shields.io/github/languages/code-size/solislemuslab/minaa) [![status](https://joss.theoj.org/papers/b4d9f26021065b1759d50413f60aa9c3/status.svg)](https://joss.theoj.org/papers/b4d9f26021065b1759d50413f60aa9c3)
5+
[![GitHub Releases](https://img.shields.io/github/v/release/solislemuslab/minaa?display_name=tag)](https://github.com/solislemuslab/minaa/releases) [![GitHub license](https://img.shields.io/github/license/solislemuslab/minaa?color=yellow)](https://github.com/solislemuslab/minaa/blob/main/LICENSE) [![GitHub Issues](https://img.shields.io/github/issues/solislemuslab/minaa)](https://github.com/solislemuslab/minaa/issues) ![ ](https://img.shields.io/github/languages/code-size/solislemuslab/minaa?color=white) [![status](https://joss.theoj.org/papers/b4d9f26021065b1759d50413f60aa9c3/status.svg)](https://joss.theoj.org/papers/b4d9f26021065b1759d50413f60aa9c3)
66

77
## Description
88

@@ -64,6 +64,10 @@ This utility has the form `./minaa.exe <G> <H> [-B=bio] [-a=alpha] [-b=beta]`.
6464
- **-st=**: similarity threshold; The similarity value above which aligned pairs are included in the output.
6565
- Require: a real number in range [0, 1].
6666
- Default: 0.
67+
- **-c**: conserved subgraphs; whether or not to output a list of the conserved subgraphs in the alignment between G and H.
68+
- Require: none.
69+
- Default: this list is not calculated or returned.
70+
- Note: We define a conserved subgraph as a connected subgraph of G whose nodes are aligned to a connected subgraph of H. See the Examples section for a visual.
6771

6872
#### Uncommon
6973

@@ -105,6 +109,12 @@ This utility has the form `./minaa.exe <G> <H> [-B=bio] [-a=alpha] [-b=beta]`.
105109

106110
### Examples
107111

112+
<img src="img/conserved_subgraph.png" style="width:60%;" align=center>
113+
114+
On the left are adjacency matrices for simple networks G and H, and an imagined **alignment matrix** as returned by MiNAA. On the right is a visual depiction of G overlaying H; the connected purple graph is what we call a **conserved subgraph** in the alignment of G and H. We color a node purple if that node in G is aligned to a node in H, and we color an edge purple if two adjacent nodes in G are aligned to nodes that are also adjacent in H. Note that the blue edge, red edge, and red node `e` are not considered part of this conserved subgraph.
115+
116+
#### Example Execution
117+
108118
Examples of MiNAA's usage with real data and in-depth explanations can be found in the `examples/` directory.
109119

110120
## Simulations in the Manuscript
@@ -117,7 +127,7 @@ Users interested in expanding functionalities in MiNAA are welcome to do so. Iss
117127

118128
## License
119129

120-
MiNAA is licensed under the [MIT](https://opensource.org/licenses/MIT) license. &copy; SolisLemus lab (2024).
130+
MiNAA is licensed under the [MIT](https://opensource.org/licenses/MIT) license. &copy; Solis-Lemus Lab (2024).
121131

122132
## Citation
123133

examples/README.md

+26 -4
@@ -10,27 +10,49 @@ Once MiNAA is successfully compiled, the examples below can be run from this pro
1010

1111
## Example 1
1212

13-
`./minaa.exe examples/g.csv examples/h.csv -a=0.6 -g`
13+
```bash
14+
./minaa.exe examples/g.csv examples/h.csv -a=0.6 -g
15+
```
1416

1517
Output to: `g-h-a0.6/`
1618

1719
Here we align network **g** with network **h** using no biological data. `-a=0.6` sets alpha equal to 0.6, meaning 60% of the topological cost function comes from similarity calculated by GDVs, and 40% from simpler node degree data.
1820

1921
## Example 2
2022

21-
`./minaa.exe examples/g.csv examples/h.csv -B=examples/bio.csv -b=0.85 -st=0.5 -s`
23+
```bash
24+
./minaa.exe examples/g.csv examples/h.csv -B=examples/bio.csv -b=0.85 -st=0.5 -s
25+
```
2226

2327
Output to: `g-h/`
2428

2529
Here we align network **g** with network **h** using topological information and the given biological similarity matrix, **bio**. Because we've provided a similarity matrix instead of a cost matrix (the default), we have to flag that with `-s`. Alpha was unspecified, so it defaults to 1. Beta was set to 0.85, so 85% of the cost weight comes from the calculated topological cost matrix and 15% from **bio**. The similarity threshold `-st=` was set to 0.5, so any aligned pair with a similarity score less than or equal to 0.5 is excluded from the alignment results.
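
As a rough illustration of the weighting described in Example 2, the C++ sketch below converts a similarity matrix to a cost matrix and blends it with a topological cost matrix using beta. The names `one_minus_sketch` and `combine_sketch` are hypothetical. They echo the `Util::one_minus` and `Util::combine` declarations in `include/util.h`, but the formula `beta * topological + (1 - beta) * biological` is an assumption drawn from the prose above, not a confirmed copy of MiNAA's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: turn a similarity matrix with entries in [0, 1]
// into a cost matrix by taking 1 - similarity for every entry.
Matrix one_minus_sketch(const Matrix& similarity)
{
    Matrix cost = similarity;
    for (auto& row : cost) {
        for (double& value : row) {
            value = 1.0 - value;
        }
    }
    return cost;
}

// Hypothetical helper: weighted blend of two equally sized cost matrices.
// ASSUMPTION: beta weights the topological cost and (1 - beta) weights
// the biological cost, as the README prose suggests.
Matrix combine_sketch(const Matrix& topological, const Matrix& biological, double beta)
{
    assert(topological.size() == biological.size());
    Matrix combined = topological;
    for (std::size_t i = 0; i < topological.size(); ++i) {
        assert(topological[i].size() == biological[i].size());
        for (std::size_t j = 0; j < topological[i].size(); ++j) {
            combined[i][j] = beta * topological[i][j] + (1.0 - beta) * biological[i][j];
        }
    }
    return combined;
}
```

Under this assumed formula, with beta = 0.85, a topological cost of 0.2 paired with a biological similarity of 1.0 (cost 0.0) blends to 0.85 * 0.2 + 0.15 * 0.0 = 0.17.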
2630

2731
## Example 3
2832

29-
`./minaa.exe examples/g.csv examples/h.csv -Galias=nonsmoker -Halias=smoker -p -t`
33+
```bash
34+
./minaa.exe examples/g.csv examples/h.csv -Galias=nonsmoker -Halias=smoker -p -t -c
35+
```
3036

3137
Output to: `nonsmoker-smoker-2024_01_16-22_05_34/`
3238

33-
Here we align network **g** with network **h**, where **g** is given the alias "nonsmoker", and **h** is given the alias "smoker". The timestamp option `-t` was specified, so the name of the output folder will be nonsmoker-smoker-T, where T is the date and time of execution. Additionally, because the passthrough option `-p` was specified, g.csv and h.csv will be passed through to the output folder as nonsmoker.csv and smoker.csv, respectively.
39+
Here we align network **g** with network **h**, where **g** is given the alias "nonsmoker", and **h** is given the alias "smoker". The timestamp option `-t` was specified, so the name of the output folder will be nonsmoker-smoker-T, where T is the date and time of execution. Because the passthrough option `-p` was specified, g.csv and h.csv will be passed through to the output folder as nonsmoker.csv and smoker.csv, respectively. Finally, because the `-c` option was specified, the output folder will include the alignment's conserved subgraphs, in a file called `conserved_subgraphs.csv`.
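
The folder naming described in Example 3 can be sketched in C++ as follows. The function name `output_folder_sketch` is hypothetical, and the `strftime` pattern `%Y_%m_%d-%H_%M_%S` is inferred from the sample folder name `nonsmoker-smoker-2024_01_16-22_05_34`; MiNAA's own code may build the name differently.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: build "<G alias>-<H alias>-<timestamp>".
// ASSUMPTION: the timestamp layout is YYYY_MM_DD-HH_MM_SS, inferred
// from the example output folder name in the README.
std::string output_folder_sketch(const std::string& g_alias,
                                 const std::string& h_alias,
                                 const std::tm& when)
{
    char stamp[32] = {0};
    const std::size_t written =
        std::strftime(stamp, sizeof(stamp), "%Y_%m_%d-%H_%M_%S", &when);
    assert(written > 0);  // the 32-byte buffer is large enough for this pattern
    (void)written;        // avoid an unused-variable warning when NDEBUG is set
    return g_alias + "-" + h_alias + "-" + stamp;
}
```

Passing the time as a `std::tm` argument, rather than reading the clock inside the function, keeps the sketch deterministic and easy to check.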
40+
41+
## Example 4
42+
43+
```R
44+
source("plot_alignment.R")
45+
plot_alignment(
46+
g_filepath = "examples/g.csv",
47+
h_filepath = "examples/h.csv",
48+
alignment_filepath = "examples/g-h/alignment_matrix.csv",
49+
output_filepath = "examples/g-h/plot.png",
50+
st = 0.5,
51+
hide_singletons = TRUE
52+
)
53+
```
54+
55+
This is an example execution in R of the alignment visualization script `plot_alignment.R`. The resulting `plot.png` is in `examples/g-h/`.
3456

3557
## Attributions
3658

examples/g-h/plot.png

1.08 MB
@@ -0,0 +1,128 @@
1+
"g1","h146"
2+
3+
"g6","h5"
4+
5+
"g7","h161"
6+
7+
"g8","h138"
8+
"g167","h134"
9+
10+
"g12","h124"
11+
12+
"g13","h43"
13+
"g126","h121"
14+
15+
"g14","h171"
16+
17+
"g16","h96"
18+
19+
"g17","h14"
20+
"g74","h74"
21+
"g82","h71"
22+
"g23","h81"
23+
"g81","h23"
24+
"g90","h90"
25+
"g71","h17"
26+
"g70","h70"
27+
"g58","h13"
28+
29+
"g19","h25"
30+
31+
"g20","h128"
32+
"g28","h87"
33+
34+
"g22","h160"
35+
36+
"g24","h162"
37+
38+
"g30","h130"
39+
40+
"g32","h122"
41+
"g45","h156"
42+
"g42","h145"
43+
44+
"g35","h135"
45+
"g155","h120"
46+
"g164","h60"
47+
"g112","h109"
48+
"g92","h68"
49+
"g51","h104"
50+
51+
"g37","h110"
52+
53+
"g43","h56"
54+
55+
"g46","h155"
56+
57+
"g52","h127"
58+
59+
"g53","h63"
60+
"g68","h142"
61+
62+
"g54","h123"
63+
64+
"g56","h118"
65+
66+
"g59","h152"
67+
68+
"g63","h129"
69+
70+
"g65","h7"
71+
72+
"g73","h115"
73+
74+
"g91","h55"
75+
76+
"g95","h153"
77+
78+
"g103","h117"
79+
80+
"g105","h132"
81+
82+
"g106","h107"
83+
84+
"g108","h28"
85+
86+
"g109","h93"
87+
88+
"g115","h72"
89+
90+
"g116","h169"
91+
92+
"g119","h103"
93+
94+
"g120","h101"
95+
96+
"g123","h73"
97+
98+
"g125","h157"
99+
100+
"g128","h173"
101+
"g157","h6"
102+
103+
"g130","h102"
104+
105+
"g131","h51"
106+
107+
"g133","h32"
108+
109+
"g137","h174"
110+
111+
"g139","h30"
112+
113+
"g142","h31"
114+
115+
"g150","h22"
116+
117+
"g154","h21"
118+
119+
"g166","h18"
120+
121+
"g169","h125"
122+
123+
"g172","h44"
124+
125+
"g173","h106"
126+
127+
"g174","h137"
128+

img/conserved_subgraph.png

52.6 KB

logo.png → img/logo.png

File renamed without changes.

include/file_io.h

+1
@@ -21,6 +21,7 @@ namespace FileIO
2121
void matrix_to_file(std::string, std::vector<std::string>, std::vector<std::string>, std::vector<std::vector<double>>);
2222
void alignment_to_matrix_file(std::string, std::vector<std::string>, std::vector<std::string>, std::vector<std::vector<double>>, double);
2323
void alignment_to_list_file(std::string, std::vector<std::string>, std::vector<std::string>, std::vector<std::vector<double>>, double);
24+
void subgraphs_to_file(std::string, std::vector<std::string>, std::vector<std::string>, std::vector<std::vector<std::pair<unsigned, unsigned>>>);
2425
}
2526

2627
#endif
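
The new `subgraphs_to_file` declaration takes a path, two label vectors, and a list of subgraphs. The C++ sketch below shows one plausible way such a writer could lay out its output, modeled on the 128-line CSV added elsewhere in this commit: one quoted `"g","h"` pair per line, with a blank line between subgraphs. The names `write_subgraphs_sketch` and `subgraphs_to_string_sketch` are hypothetical, and treating each pair as (index into the G labels, index into the H labels) is an assumption, not something the diff confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Subgraph = std::vector<std::pair<unsigned, unsigned>>;

// Hypothetical writer: emits one quoted "g","h" pair per line and a
// blank line between consecutive subgraphs.
// ASSUMPTION: pair.first indexes g_labels and pair.second indexes h_labels.
void write_subgraphs_sketch(std::ostream& out,
                            const std::vector<std::string>& g_labels,
                            const std::vector<std::string>& h_labels,
                            const std::vector<Subgraph>& subgraphs)
{
    for (std::size_t s = 0; s < subgraphs.size(); ++s) {
        if (s > 0) {
            out << '\n';  // blank separator line between subgraphs
        }
        for (const auto& pair : subgraphs[s]) {
            assert(pair.first < g_labels.size());
            assert(pair.second < h_labels.size());
            out << '"' << g_labels[pair.first] << "\",\""
                << h_labels[pair.second] << "\"\n";
        }
    }
}

// Convenience wrapper that returns the same text as a string.
std::string subgraphs_to_string_sketch(const std::vector<std::string>& g_labels,
                                       const std::vector<std::string>& h_labels,
                                       const std::vector<Subgraph>& subgraphs)
{
    std::ostringstream buffer;
    write_subgraphs_sketch(buffer, g_labels, h_labels, subgraphs);
    return buffer.str();
}
```

Writing to a `std::ostream` instead of opening a file directly is a design choice for this sketch only; it lets the same routine target a `std::ofstream` or an in-memory buffer.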

include/util.h

+1
@@ -10,6 +10,7 @@ namespace Util
1010
std::vector<std::vector<double>> normalize(std::vector<std::vector<double>>);
1111
std::vector<std::vector<double>> one_minus(std::vector<std::vector<double>>);
1212
std::vector<std::vector<double>> combine(std::vector<std::vector<double>>, std::vector<std::vector<double>>, double);
13+
std::vector<std::vector<std::pair<unsigned, unsigned>>> conserved_subgraphs(std::vector<std::vector<unsigned>>, std::vector<std::vector<unsigned>>, std::vector<std::vector<double>>, double);
1314
}
1415

1516
#endif
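
The README in this commit defines a conserved subgraph as a connected subgraph of G whose nodes are aligned to a connected subgraph of H. The C++ sketch below is one way to compute such subgraphs from that definition, using a signature shaped like the new `Util::conserved_subgraphs` declaration. It is not MiNAA's implementation: the name `conserved_subgraphs_sketch` is hypothetical, and it assumes that the adjacency matrices are square, that the alignment matrix has one row per G node and one column per H node, that each G node has at most one partner with similarity above the threshold, and that aligned nodes with no conserved edge are reported as single-pair subgraphs.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Adjacency = std::vector<std::vector<unsigned>>;
using Alignment = std::vector<std::vector<double>>;
using Subgraph = std::vector<std::pair<unsigned, unsigned>>;

constexpr unsigned UNALIGNED = static_cast<unsigned>(-1);

// Hypothetical sketch: group aligned (g, h) pairs into components that are
// connected through edges present in both G and H.
std::vector<Subgraph> conserved_subgraphs_sketch(const Adjacency& g,
                                                 const Adjacency& h,
                                                 const Alignment& alignment,
                                                 double st)
{
    const std::size_t n = g.size();
    assert(alignment.size() == n);

    // partner[u] is the H node aligned to G node u, or UNALIGNED.
    // ASSUMPTION: the first column above the threshold is the partner.
    std::vector<unsigned> partner(n, UNALIGNED);
    for (std::size_t u = 0; u < n; ++u) {
        for (std::size_t j = 0; j < alignment[u].size(); ++j) {
            if (alignment[u][j] > st) {
                partner[u] = static_cast<unsigned>(j);
                break;
            }
        }
    }

    std::vector<bool> visited(n, false);
    std::vector<Subgraph> result;
    for (std::size_t start = 0; start < n; ++start) {
        if (partner[start] == UNALIGNED || visited[start]) {
            continue;
        }
        // Breadth-first search over edges conserved in both networks.
        Subgraph component;
        std::queue<std::size_t> frontier;
        frontier.push(start);
        visited[start] = true;
        while (!frontier.empty()) {
            const std::size_t u = frontier.front();
            frontier.pop();
            component.emplace_back(static_cast<unsigned>(u), partner[u]);
            for (std::size_t v = 0; v < n; ++v) {
                if (visited[v] || partner[v] == UNALIGNED) {
                    continue;
                }
                // An edge is conserved when u-v is in G and the aligned
                // partners of u and v are adjacent in H.
                if (g[u][v] > 0 && h[partner[u]][partner[v]] > 0) {
                    visited[v] = true;
                    frontier.push(v);
                }
            }
        }
        result.push_back(component);
    }
    return result;
}
```

For a three-node path in G aligned node-for-node to an H that keeps only the first edge, this sketch returns two subgraphs: the pair of nodes joined by the shared edge, and the remaining aligned node on its own.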

plot_alignment.R

+145
@@ -0,0 +1,145 @@
1+
plot_alignment <- function(
2+
g_filepath,
3+
h_filepath,
4+
alignment_filepath,
5+
output_filepath,
6+
st = 0,
7+
hide_singletons = FALSE
8+
) {
9+
# # Load necessary libraries
10+
# required_packages <- c("igraph", "ggraph", "ggplot2", "readr", "dplyr")
11+
# installed_packages <- rownames(installed.packages())
12+
# for (pkg in required_packages) {
13+
# if (!pkg %in% installed_packages) {
14+
# install.packages(pkg, dependencies = TRUE)
15+
# }
16+
# }
17+
18+
library(igraph)
19+
library(ggraph)
20+
library(ggplot2)
21+
library(readr)
22+
library(dplyr)
23+
24+
# Helper function to read adjacency matrix
25+
read_adjacency <- function(filepath) {
26+
df <- read_csv(filepath, show_col_types = FALSE)
27+
mat <- as.matrix(df[, -1])
28+
rownames(mat) <- df[[1]]
29+
return(mat)
30+
}
31+
32+
# Read adjacency matrices
33+
g_mat <- read_adjacency(g_filepath)
34+
h_mat <- read_adjacency(h_filepath)
35+
36+
# Create igraph objects (assuming undirected graphs; adjust 'mode' if needed)
37+
g_graph <- graph_from_adjacency_matrix(g_mat, mode = "undirected", diag = FALSE)
38+
h_graph <- graph_from_adjacency_matrix(h_mat, mode = "undirected", diag = FALSE)
39+
40+
# Read alignment matrix
41+
alignment_df <- read_csv(alignment_filepath, show_col_types = FALSE)
42+
alignment_mat <- as.matrix(alignment_df[, -1])
43+
rownames(alignment_mat) <- alignment_df[[1]]
44+
45+
# Extract alignment pairs with similarity > the similarity threshold
46+
alignment_pairs <- which(alignment_mat > st, arr.ind = TRUE)
47+
align_df <- data.frame(
48+
g_node = rownames(alignment_mat)[alignment_pairs[, 1]],
49+
h_node = colnames(alignment_mat)[alignment_pairs[, 2]],
50+
similarity = alignment_mat[alignment_pairs]
51+
)
52+
53+
if (hide_singletons) {
54+
# Remove nodes in G and H with degree 0
55+
deg_g <- degree(g_graph)
56+
non_single_g <- names(deg_g[deg_g > 0])
57+
g_graph <- induced_subgraph(g_graph, vids = non_single_g)
58+
59+
deg_h <- degree(h_graph)
60+
non_single_h <- names(deg_h[deg_h > 0])
61+
h_graph <- induced_subgraph(h_graph, vids = non_single_h)
62+
63+
# Update alignment_df to include only existing nodes after subsetting
64+
align_df <- align_df %>%
65+
filter(g_node %in% V(g_graph)$name & h_node %in% V(h_graph)$name)
66+
}
67+
68+
# Rename nodes to ensure uniqueness
69+
V(g_graph)$name <- paste0("G_", V(g_graph)$name)
70+
V(h_graph)$name <- paste0("H_", V(h_graph)$name)
71+
72+
combined_graph <- disjoint_union(g_graph, h_graph)
73+
74+
# Prepare alignment edges
75+
if (nrow(align_df) > 0) {
76+
align_edges <- align_df %>%
77+
mutate(
78+
from = paste0("G_", g_node),
79+
to = paste0("H_", h_node)
80+
)
81+
} else {
82+
align_edges <- data.frame(from = character(0), to = character(0), similarity = numeric(0))
83+
}
84+
85+
# Create a layout with G on the left and H on the right
86+
set.seed(123) # For reproducibility
87+
layout_g <- layout_with_fr(g_graph)
88+
layout_h <- layout_with_fr(h_graph)
89+
90+
# Shift H layout to the right of G
91+
shift_x <- max(layout_g[, 1]) - min(layout_h[, 1]) + 5
92+
layout_h[, 1] <- layout_h[, 1] + shift_x
93+
94+
# Combine layouts
95+
combined_layout <- rbind(layout_g, layout_h)
96+
V(combined_graph)$x <- combined_layout[, 1]
97+
V(combined_graph)$y <- combined_layout[, 2]
98+
99+
# Prepare data for alignment edges
100+
if (nrow(align_edges) > 0) {
101+
align_coords <- align_edges %>%
102+
mutate(
103+
from_x = V(combined_graph)$x[match(from, V(combined_graph)$name)],
104+
from_y = V(combined_graph)$y[match(from, V(combined_graph)$name)],
105+
to_x = V(combined_graph)$x[match(to, V(combined_graph)$name)],
106+
to_y = V(combined_graph)$y[match(to, V(combined_graph)$name)]
107+
)
108+
}
109+
110+
# Plot #
111+
112+
# Adjust the plot size based on the number of nodes
113+
num_nodes <- vcount(combined_graph)
114+
plot_width <- max(2000, 100 + 20 * num_nodes)
115+
plot_height <- max(1000, 100 + 10 * num_nodes)
116+
117+
png(filename = output_filepath, width = plot_width, height = plot_height, res = 150)
118+
119+
p <- ggraph(combined_graph, layout = "manual", x = V(combined_graph)$x, y = V(combined_graph)$y) +
120+
# Internal edges within G and H
121+
geom_edge_link(aes(alpha = 0.5), color = "grey", linewidth = 0.5) +
122+
# G nodes in red, H in blue
123+
geom_node_point(aes(color = ifelse(startsWith(name, "G_"), "G", "H")), size = 3) +
124+
# Node labels without the prefix
125+
geom_node_text(aes(label = gsub("^[GH]_", "", name)), repel = TRUE, size = 3) +
126+
# Color mapping
127+
scale_color_manual(values = c("G" = "red", "H" = "blue")) +
128+
theme_void() +
129+
theme(legend.position = "none")
130+
131+
if (nrow(align_edges) > 0) {
132+
# Add alignment edges
133+
p <- p +
134+
geom_segment(
135+
data = align_coords,
136+
aes(x = from_x, y = from_y, xend = to_x, yend = to_y),
137+
color = "purple", linewidth = 0.5, alpha = 0.6
138+
)
139+
}
140+
141+
print(p)
142+
dev.off()
143+
144+
message("Plot saved to ", output_filepath)
145+
}
