Thank you for your incredible work! I've learned a lot from it. If I already have the edge index computed on the GPU with CUDA, how can I efficiently find the connected components of a graph with arbitrary node degrees? Alternatively, is there a way to group nodes from an adjacency matrix with CUDA? I've tried moving the data to the CPU and using NetworkX to handle the subgraphs, but it's a bit slow.
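
To make the question concrete, here is a rough sketch of the kind of GPU-side approach I have in mind: min-label propagation written with plain PyTorch ops, so nothing leaves the device. Everything in it is an assumption on my side: `connected_components` is a hypothetical helper (not part of any library), the edge index is assumed to be a `[2, E]` int64 tensor of an undirected graph, and it is not benchmarked.

```python
import torch


def connected_components(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Hypothetical helper: label every node with the smallest node id
    in its connected component, using min-label propagation.

    edge_index: [2, E] int64 tensor (assumed layout), treated as undirected.
    Runs on whatever device edge_index lives on (CPU or CUDA).
    """
    row, col = edge_index[0], edge_index[1]
    # Start with every node in its own component.
    labels = torch.arange(num_nodes, device=edge_index.device)
    while True:
        # Push the smaller label across each edge, in both directions.
        new = labels.scatter_reduce(0, row, labels[col], reduce="amin")
        new = new.scatter_reduce(0, col, labels[row], reduce="amin")
        # Pointer jumping: a label is itself a node id in the same
        # component, so following it can only shrink the label further.
        new = new[new]
        if torch.equal(new, labels):  # no label changed: converged
            return labels
        labels = new


device = "cuda" if torch.cuda.is_available() else "cpu"
edge_index = torch.tensor([[0, 1, 3], [1, 2, 4]], device=device)
labels = connected_components(edge_index, num_nodes=6)
# Nodes sharing a label are in the same component; node 5 is isolated.
```

Each iteration is a couple of scatter operations over all edges, and the loop stops once no label changes, so the number of iterations depends on the graph's diameter. I'm not sure whether this is the efficient way to do it, hence the question.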