
Questions on sub-ePDG depth and normalization #9

Open
yunhao-qian opened this issue Dec 29, 2023 · 4 comments

@yunhao-qian

Hi, thank you for the excellent paper and for open-sourcing the project. I am working on a new project that modifies and builds on your code, and I have two questions.

  1. I notice a depth_limit option for controlling the depth of sub-ePDGs at runtime, but this option appears unused in the hector train command line, which would mean that the sub-ePDG of a manifestation point always includes all of its predecessor nodes. There might be situations where a malicious sub-ePDG is a subgraph of another sub-ePDG labelled benign. Are these situations expected, or should we avoid them early, in the data preprocessing step?
  2. The VulChecker paper mentions a batch normalization layer between the GNN and the MLP classifier, while model.py has an orphan BatchNorm module that is left unused. Was there a reason behind that change, or was it replaced because the normalization performed by hector feature_stats achieves a similar goal?
@gmacon

gmacon commented Jan 2, 2024

  1. The depth_limit is set in hector train, but by a somewhat convoluted path: a Predictor is constructed with the parameter embedding_steps set from the command line option --embedding-steps (default value: 4), and then predictor.depth_limit is passed to TrainingData.from_parameters. In the Predictor, the depth_limit attribute is defined to simply expose the embedding_steps parameter.
  2. My memory on this point is a little vague, but I believe the intent was to replace the batch normalization in the model with an ahead-of-time normalization that happens during data loading. Based on that, I think the unused BatchNorm module was left in by mistake.
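The plumbing described in point 1 can be sketched as follows. This is a minimal, hypothetical reconstruction of the class shapes (only the names Predictor, embedding_steps, depth_limit, and TrainingData.from_parameters come from the comment above; everything else is illustrative):

```python
# Hypothetical sketch of how --embedding-steps reaches the data loader.
# The Predictor re-exposes embedding_steps as depth_limit, so sub-ePDG
# depth is in fact bounded even though no --depth-limit flag exists.

class Predictor:
    def __init__(self, embedding_steps=4):
        # set from the --embedding-steps command-line option (default 4)
        self.embedding_steps = embedding_steps

    @property
    def depth_limit(self):
        # depth_limit just exposes the embedding_steps parameter
        return self.embedding_steps


class TrainingData:
    def __init__(self, depth_limit):
        self.depth_limit = depth_limit

    @classmethod
    def from_parameters(cls, depth_limit):
        # the sub-ePDG of each manifestation point is truncated here
        return cls(depth_limit=depth_limit)


predictor = Predictor(embedding_steps=4)
training_data = TrainingData.from_parameters(predictor.depth_limit)
print(training_data.depth_limit)  # 4
```

Tying depth_limit to embedding_steps makes sense for a message-passing GNN: nodes farther than embedding_steps hops from the manifestation point cannot influence its embedding anyway.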
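The ahead-of-time normalization described in point 2 amounts to computing per-feature statistics once over the training set (as hector feature_stats would) and standardizing each feature vector at load time, instead of using a BatchNorm layer inside the model. A hedged sketch with illustrative names, not the project's actual API:

```python
# Sketch: replace in-model BatchNorm with normalization at data-loading
# time, driven by statistics precomputed over the training set.

def compute_feature_stats(rows):
    """Per-feature mean and standard deviation over the training set."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(dims)]
    stds = []
    for i in range(dims):
        var = sum((r[i] - means[i]) ** 2 for r in rows) / n
        std = var ** 0.5
        stds.append(std if std > 0 else 1.0)  # guard against zero variance
    return means, stds


def normalize(row, means, stds):
    """Apply the precomputed statistics to one feature vector."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


train = [[1.0, 10.0], [3.0, 30.0]]
means, stds = compute_feature_stats(train)
# means == [2.0, 20.0], stds == [1.0, 10.0]
normed = [normalize(r, means, stds) for r in train]
# normed == [[-1.0, -1.0], [1.0, 1.0]]: zero mean, unit variance per feature
```

Unlike BatchNorm, these statistics are fixed after preprocessing, so inference on a single graph behaves identically to training; that is one plausible motivation for the swap.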

@ulysseyson

In my opinion, question 1 was not fully answered.
I still don't know whether we should avoid the case where a malicious sub-ePDG is a subgraph of another, benign sub-ePDG.

Is the reason you didn't go deeper into this problem, namely that a sampled sub-ePDG may contain a malicious sub-ePDG, that you trust the manifestation_distance you set in the sampling step? That is, the feature that tells the model it should not concentrate on nodes beyond the malicious manifestation node.

@gmacon

gmacon commented Aug 20, 2024

The question is not really "is this graph buggy?"; the question is "is there a bug at this manifestation point?" So I don't think it matters whether the bug occupies only a subgraph of the graph under consideration: it's buggy either way.

@ulysseyson

That's right. In any case, the s2v embedding captures the difference between a malicious end and a benign end.

I think that is an awesome aspect of the ML approach.
