Update negative sampling to work directly on GPU #9608
Conversation
wow this would be awesome!!
Wow, this is pretty cool. Can we fix the tests so that we are Python 3.8 compatible (no Python 3.10 type hints such as `|`)?
Thank you, I am happy to hear this is a desired feature! I updated the code to support Python 3.8, and made small changes to the negative sampling test functions.
- … retrieve the requested number of edges, it raises an exception.
- …kSplit transform.
- … added edges are actually new.
- …have no guarantee that there were no repeated edges. Now it is based on randperm. Also, the number of nodes is doubled to ensure that structured sampling is (almost) always feasible.
Hello, I updated the code to pass all the pytests.
test/utils/test_negative_sampling.py (Outdated)
@@ -153,16 +159,72 @@ def test_structured_negative_sampling():
```python
    assert (adj & neg_adj).sum() == 0

    # Test with no self-loops:
    edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
    #edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
```
```python
#edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
```
@danielecastellana22 this line was causing the lint issue. But why was it commented out? Was that just a temporary change?
No, it can be removed.
I think that in the previous test there were two different `edge_index` definitions to make sure that the structured negative sampling was feasible both with and without negative self-loops. Since I changed the structured negative sampling, it is now feasible also with the first definition of `edge_index`.
Great work. I'll make a few more passes with reviews.
```python
neg_edge_index = negative_sampling(edge_index, method='dense')
assert neg_edge_index.size(1) == edge_index.size(1)
assert is_negative(edge_index, neg_edge_index, (4, 4), bipartite=False)
```
Why drop this test?
Because the method is now inferred automatically based on the graph size. Since the graph used for the test is small, the method is always dense. This reflects the idea that a sparse method (which is based on randomly guessing the negative edges) is reasonable only when the graph is sparse.
To test both the sparse and the dense method, I added a new test called `test_negative_sampling_with_different_edge_density`. Actually, I think that the whole function `test_negative_sampling` can be removed, but I left it there to be sure that the old test still works.
```python
# structured sample is feasible if, for each node, deg > max_neigh/2
return bool(torch.all(2 * deg <= max_num_neighbors))
```
The assumption here is that we don't want the negative edges to repeat. I like the assumption, but it might be too tight; also, we don't guarantee that the negative edges are unique. I'd suggest keeping the old condition. Or we could make it explicit by adding an argument `unique_neg`, and use this new condition when it is `True` and the old condition when it is `False`.
I understand your point, and this makes me wonder about the following:

- Should the negative sampling return duplicate edges? In this new implementation, `structured_negative_sampling` with `method=dense` guarantees that the negative edges are unique. Instead, when `method=sparse`, this is not explicitly guaranteed (although duplicates should be very unlikely, especially if the graph is sparse). However, this could easily be resolved by removing the duplicates after the sampling (see the sketch after this list). It is also worth highlighting that `negative_sampling` never returns duplicates.
- What is the purpose of `structured_negative_sampling_feasible`? In my understanding, this function checks whether structured negative sampling is feasible in a graph. Note that this does not mean that `structured_negative_sampling` will return all the requested negative edges. In fact, when the sparse method is selected, it can fail (i.e. it is not able to sample a negative edge for every true edge) even when sampling is feasible, due to the randomness of the sampling. Vice versa, when the dense method is used, it will always return the correct set of negative edges (if sampling is feasible) since it enumerates all edges.
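As a minimal sketch of the de-duplication idea, assuming negative edges are linearized as `row * num_nodes + col` (the helper name below is illustrative, not the PR code):

```python
import torch

def remove_duplicate_edges(neg_edge_index: torch.Tensor,
                           num_nodes: int) -> torch.Tensor:
    # Linearize (row, col) pairs into single ids so duplicates can be found
    # with a single call to `torch.unique`.
    vector_id = neg_edge_index[0] * num_nodes + neg_edge_index[1]
    vector_id = torch.unique(vector_id)  # drops repeated negative edges
    return torch.stack([vector_id // num_nodes, vector_id % num_nodes], dim=0)
```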
```python
vector_id = torch.arange(N1 * N2)
edge_index3 = torch.stack(vector_id_to_edge_index(vector_id, (N1, N2)),
                          dim=0)
assert edge_index.tolist() == edge_index3.tolist()


def test_negative_sampling():
```
Decorate with `@withCUDA` to test on CPU and GPU. See this for an example.
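For reference, a sketch of what the decorated test could look like; this assumes the `withCUDA` helper from `torch_geometric.testing` injects the device argument, as in other tests in the repository:

```python
import torch

from torch_geometric.testing import withCUDA
from torch_geometric.utils import negative_sampling


@withCUDA
def test_negative_sampling_device(device):
    # The decorator runs the test once per available device (CPU, plus CUDA
    # when present) and passes it in through the `device` argument.
    edge_index = torch.tensor([[0, 0, 1, 2], [0, 1, 2, 3]], device=device)
    neg_edge_index = negative_sampling(edge_index)
    assert neg_edge_index.device == edge_index.device
    assert neg_edge_index.size(1) <= edge_index.size(1)
```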
```python
assert k is not None
k = 2 * k if force_undirected else k
p = torch.tensor([1.], device=device).expand(N1 * N2)
if k > N1 * N2:
    k = N1 * N2
new_edge_id = torch.multinomial(p, k, replacement=False)
```
Suggested change:

```python
assert k is not None
k = 2 * k if force_undirected else k
new_edge_id = torch.randperm(N1 * N2, device=device)[:min(k, N1 * N2)]
```

Why not just do this? Also, given this, are we saying `dense` and `sparse` are not different anymore? Previously, the difference between dense and sparse was in how negatives were identified; now we don't have that difference.
The methods differ in terms of complexity:

- The dense method enumerates all the possible edges and then filters out the positive ones. This requires $O(N^2)$ space and time.
- The sparse method samples $k$ edges and then filters out the positive ones. The complexity now scales with $O(k)$.

The idea is that if we have a big graph, the dense method is prohibitive. However, in most cases big graphs are not dense (i.e. the number of edges is much smaller than $N^2$).
In your suggestion, both sparse and dense will use `torch.randperm(N1*N2)`, which returns a vector of size `N1*N2`, i.e. it still requires $O(N^2)$ memory.
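To make the contrast concrete, a rough sketch of the two candidate strategies (function names are illustrative, not the PR code):

```python
import torch

def sample_candidates_dense(N1, N2, k, device=None):
    # Enumerates all N1 * N2 possible edge ids: O(N1 * N2) memory and time,
    # but always yields exactly min(k, N1 * N2) distinct candidates.
    perm = torch.randperm(N1 * N2, device=device)
    return perm[:min(k, N1 * N2)]

def sample_candidates_sparse(N1, N2, k, device=None):
    # Samples k ids with replacement: O(k) memory and time, but candidates
    # may repeat and have to be de-duplicated afterwards.
    idx = torch.randint(N1 * N2, (k, ), device=device)
    return torch.unique(idx)
```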
But in the above method, aren't you creating `p = torch.tensor([1.], device=device).expand(N1 * N2)`, which is of shape $N^2$?
`torch.expand` does not allocate new memory, as you can see in the PyTorch documentation (see also the small check below).
Actually, I am not sure about the complexity of `torch.multinomial` when `replacement=False`, but I didn't find anything online. However, this can be replaced by `torch.randint` and then handling the repetition of negative edges.
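A quick way to verify the view semantics (a small check, not part of the PR):

```python
import torch

base = torch.tensor([1.])
p = base.expand(10_000)

# `expand` returns a view with stride 0, so no new memory is allocated:
print(p.stride())                       # (0,)
print(p.data_ptr() == base.data_ptr())  # True -> same underlying storage
```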
Hi @wsad1, are there any updates?
@rusty1s I re-implemented the negative edge sampling so that it works directly on GPUs.
In the following, I summarise the main idea of my implementation. Then, I will raise some questions that I hope you can help me answer.
Negative Sampling
The idea of the negative edge sampling is to obtain a list of candidate edges and then discard all the ones that already appear in the input graph.
To perform the existence check, I use `torch.searchsorted`. The input `edge_index` should be sorted for this, which is usually the case.
The initial guess of negative edges can be obtained in a dense or a sparse way. The function also supports an automatic mode that lets the code choose the best method for the input graph.
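As a minimal sketch of the existence check, assuming edges are linearized into ids `row * N + col` and that the positive ids are already sorted (the helper name is illustrative):

```python
import torch

def is_existing_edge(cand_id: torch.Tensor,
                     pos_id_sorted: torch.Tensor) -> torch.Tensor:
    # For each candidate id, find its insertion position in the sorted list
    # of positive ids and check whether the id stored there matches.
    pos = torch.searchsorted(pos_id_sorted, cand_id)
    pos = pos.clamp(max=pos_id_sorted.numel() - 1)
    return pos_id_sorted[pos] == cand_id
```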
Dense Method
The dense method is exact (you always obtain the desired number of negative edge samples whenever that is possible), but it is costly since it enumerates all the possible edges (so the cost is quadratic w.r.t. the number of nodes). The samples are obtained through `torch.randperm` so that the sampling is stochastic.
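A compact sketch of the dense strategy, under the same edge-id linearization as above (it uses `torch.isin` for the membership test just to keep the snippet self-contained):

```python
import torch

def dense_negative_sampling(pos_id_sorted, num_nodes, num_neg, device=None):
    # Enumerate every possible edge id in random order (O(N^2) memory/time),
    # drop the ids of existing edges, and keep the first `num_neg` survivors.
    perm = torch.randperm(num_nodes * num_nodes, device=device)
    mask = torch.isin(perm, pos_id_sorted)  # True for positive edges
    neg_id = perm[~mask][:num_neg]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)
```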
Sparse Method
The sparse method is not exact (you could obtain fewer samples than the requested number), but it is more efficient since it does not enumerate all possible edges.
To obtain the guess, we simply sample $k$ edges using `torch.multinomial`. The number $k$ is crucial to obtain the desired number of negative edges, and it depends on the probability of sampling a negative edge at random.
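A corresponding sketch of the sparse strategy; this variant uses `torch.randint` plus de-duplication (one of the alternatives discussed in the review) instead of `torch.multinomial`:

```python
import torch

def sparse_negative_sampling(pos_id_sorted, num_nodes, num_neg, k,
                             device=None):
    # Randomly guess k candidate edge ids (O(k) memory/time), then discard
    # duplicates and existing edges.  The result may contain fewer than
    # `num_neg` edges if too many guesses were positives or duplicates.
    cand = torch.randint(num_nodes * num_nodes, (k, ), device=device)
    cand = torch.unique(cand)
    mask = torch.isin(cand, pos_id_sorted)
    neg_id = cand[~mask][:num_neg]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)
```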
Structured Negative Sampling
In a similar way, I implemented the structured negative sampling. The main difference here is that we would like to sample a negative edge $(u,w)$ for each edge $(u,v)$ in the graph.
Dense Method
For each node $u$, we obtain a random permutation of all the nodes. Then, we select the first $deg(u)$ nodes that are not linked with $u$.
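A rough sketch of this per-node strategy (illustrative, not the PR code; it assumes sampling is feasible, i.e. every node has at least $deg(u)$ non-neighbours):

```python
import torch

def structured_negative_sampling_dense(edge_index, num_nodes):
    # For every source node: shuffle all nodes, mask out its true neighbours
    # (and itself), and take as many non-neighbours as it has outgoing edges.
    # Uses an O(N^2) dense adjacency matrix.
    row, col = edge_index
    adj = torch.zeros(num_nodes, num_nodes, dtype=torch.bool,
                      device=edge_index.device)
    adj[row, col] = True

    neg_col = torch.empty_like(col)
    for u in range(num_nodes):
        pos = (row == u).nonzero(as_tuple=True)[0]  # edges leaving u
        if pos.numel() == 0:
            continue
        perm = torch.randperm(num_nodes, device=edge_index.device)
        ok = ~adj[u]
        ok[u] = False  # also exclude the self-loop (u, u)
        candidates = perm[ok[perm]]  # shuffled non-neighbours of u
        neg_col[pos] = candidates[:pos.numel()]
    return row, col, neg_col
```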
Sparse Method
We sample $k \cdot E$ edges, where $E$ is the number of edges in the input graph. Here the choice of $k$ is trickier since it depends on the degree of each node. When the method is not able to obtain a negative sample for an edge, it returns $-1$.
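A sketch of the idea behind the sparse variant (again illustrative; self-loop handling and the choice of $k$ are simplified):

```python
import torch

def structured_negative_sampling_sparse(edge_index, num_nodes, k=10):
    # For every positive edge (u, v) draw k random candidate targets and keep
    # the first candidate w such that (u, w) is not an existing edge.
    # Edges for which all k guesses fail are marked with -1.
    row, col = edge_index
    num_edges = row.numel()
    pos_id = (row * num_nodes + col).sort().values  # linearized positive ids

    cand = torch.randint(num_nodes, (num_edges, k), device=edge_index.device)
    cand_id = row.view(-1, 1) * num_nodes + cand
    valid = ~torch.isin(cand_id, pos_id)  # candidate (u, w) is not an edge

    # Index of the first valid guess per edge, or k if every guess failed:
    guess_pos = torch.arange(k, device=edge_index.device).expand_as(valid)
    first = torch.where(valid, guess_pos, torch.full_like(cand, k)).amin(dim=1)

    neg_col = torch.full_like(col, -1)
    ok = first < k
    rows = torch.arange(num_edges, device=edge_index.device)
    neg_col[ok] = cand[rows[ok], first[ok]]
    return row, col, neg_col
```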
Open Questions
Sorting the edges
I use the PyG function `index_sort` to sort the input `edge_index`. However, I believe that in most cases the input is already sorted. Hence, another option could be to assume that the input `edge_index` is already sorted; in this way, sorting becomes the duty of the user.
How to manage the warning
It could be cool to raise a warning when the sparse method could fail. I raise a warning when the probability of sampling a negative edge is low, but it is probably a better idea to raise the warning when we are sure that the method has failed (e.g. when the number of sampled edges is 10% less than the requested number).
Determine the number of samples in the structured negative sampling
I am struggling to find a way to determine the number $k$ in the sparse structured negative sampling. Probably my approach is not the best one since it is "global", hence it is difficult to find the right $k$. For now, I just set $k=10$.
Feasibility of structured negative sampling
If I understood the code of `structured_negative_sampling_feasible` correctly, it returns true if there are no nodes that are connected with all the others. I think this is wrong, since the structured negative sampling should be feasible if and only if $deg(u) < N/2$ for all nodes in the graph.

Let me know what you think about this!