Poor CPG representation #5098
Comments
Joern’s CFG representation is designed to model program flow in a way that aligns with actual control flow in software, which is inherently non-tree-like. CFGs are often directed graphs with cycles due to loops and other control flow constructs, so they naturally resist a strict hierarchical tree structure.
Joern’s AST representation prioritizes cross-language support and compatibility with its CPG model. This model emphasizes program structure in a way that generalizes across languages, which sometimes requires trading away language-specific AST details. For example, assignments are represented as atomic operations to normalize them across languages, especially for cross-language analysis tasks. Joern’s CPG model includes both data flow and control flow edges. This allows deeper analysis that can distinguish between LHS and RHS in assignments by exploring data dependencies rather than relying on syntactic parsing. By combining Joern’s AST with data flow and control flow analysis, you can perform rich, multi-level queries that go beyond what a traditional AST alone would provide. Pure AST-based queries and traversals are also available.
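To illustrate why normalizing an assignment into a single operation need not lose the LHS/RHS distinction, here is a minimal, library-independent sketch. It assumes an assignment is modeled as a call node with ordered arguments. The `Expr` types and the `target`/`source` helpers are hypothetical and are not Joern's API; only the operator names follow Joern's `<operator>.…` naming convention. The list here is 0-based, whereas Joern numbers arguments from 1.

```scala
object AssignmentAsCall {
  // Hypothetical, simplified expression nodes (not Joern's schema).
  sealed trait Expr
  final case class Identifier(name: String) extends Expr
  final case class Literal(value: String) extends Expr
  final case class Call(name: String, arguments: List[Expr]) extends Expr

  val Assignment = "<operator>.assignment"
  val Addition   = "<operator>.addition"

  // Models the statement `x = y + 1` as a call with two ordered arguments.
  val stmt: Call =
    Call(Assignment, List(Identifier("x"), Call(Addition, List(Identifier("y"), Literal("1")))))

  // The first argument is the assigned-to side (LHS).
  def target(c: Call): Option[Expr] =
    if (c.name == Assignment) c.arguments.headOption else None

  // The second argument is the assigned value (RHS).
  def source(c: Call): Option[Expr] =
    if (c.name == Assignment) c.arguments.lift(1) else None

  def main(args: Array[String]): Unit = {
    println(target(stmt))
    println(source(stmt))
  }
}
```

The point of the sketch is only that position within the argument list carries the LHS/RHS information, so a dedicated assignment node type is not strictly required.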
Quite the opposite is true. Joern’s CPG model is intentionally designed as an interconnected graph that combines AST, CFG, call graphs, and data flow into a unified structure. This model is actually one of Joern’s key advantages, as it allows traversing across these different representations in a single query. If you wish to isolate specific components, like only an AST or only a CFG, Joern provides API calls to retrieve each representation individually or combined as required. The flexibility of Joern’s DSL and the modular CPG design means you can perform specific queries across AST, CFG, and data flows, going beyond the capabilities of isolated graphs.
Just to add on from what @max-leuthaeuser says:
CFGs aren't rooted trees, as they can have loops. In any case, something like this can trace a simple path in the CFG:
There is an open PR on these traversals lifted from the graph database's source code: joernio/flatgraph-docs#2
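As a library-independent illustration of what tracing a simple path through a cyclic CFG involves, here is a minimal sketch over a hypothetical adjacency map. It does not use Joern's or flatgraph's traversal API. The node names and the `simplePath` helper are assumptions made up for this example.

```scala
object CfgSimplePath {
  // Hypothetical CFG of a while loop. The edge body -> cond is the back edge,
  // so the graph contains a cycle and is not a tree.
  val cfg: Map[String, List[String]] = Map(
    "entry" -> List("cond"),
    "cond"  -> List("body", "exit"),
    "body"  -> List("cond"),
    "exit"  -> Nil
  )

  // Depth-first search that never revisits a node already on the current path,
  // so any path it returns is simple even though the graph has a cycle.
  def simplePath(from: String, to: String, onPath: List[String] = Nil): Option[List[String]] = {
    val path = from :: onPath // kept in reverse order while searching
    if (from == to) Some(path.reverse)
    else
      cfg.getOrElse(from, Nil)
        .filterNot(n => path.contains(n))
        .iterator
        .map(next => simplePath(next, to, path))
        .collectFirst { case Some(p) => p }
  }

  def main(args: Array[String]): Unit =
    println(simplePath("entry", "exit"))
}
```

In this sketch the search first descends into the loop body, finds that the only successor is already on the path, backtracks, and then reaches the exit node.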
Have a look at the
Furthermore, I do think that the AISEC work has merit. Last I checked, they were exploring some novel work that we simply don't do, like weighted pushdown systems (maybe even typestate analysis); however, I could not get that running when I pulled the project locally. It is largely aimed at research, so scalability and practicality are not at the forefront of their concerns, e.g., they use Neo4j as their storage backend, which is an inherent limitation on resources when compared to this package's backend, flatgraph.
Hi folks, I really appreciate your work, and realize Joern highly depends on this package.
However, I find the representation of this CPG is poor. For example, here are my practices,
cpg.cfgNode.toList. If we want the tree, we need to select a starting node and then perform a DFS/BFS to get a tree on our own.
I want to implement heavy static analysis based on your work. However, I was stuck in the early stage. If you can give some suggestions, I would really appreciate it! Thank you very much!
Here is the reference: https://github.com/Fraunhofer-AISEC/cpg. They support further static analysis.
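The "select a starting node and then perform a DFS/BFS" step mentioned above can be sketched as follows. This is a minimal example over a hypothetical adjacency map, not Joern's API: in practice the successor lists would come from the CFG edges of the nodes returned by a query such as cpg.cfgNode. The node names and the `bfsTree` helper are assumptions.

```scala
import scala.collection.mutable

object CfgSpanningTree {
  // Hypothetical CFG of a while loop; body -> cond is a back edge.
  val cfg: Map[String, List[String]] = Map(
    "entry" -> List("cond"),
    "cond"  -> List("body", "exit"),
    "body"  -> List("cond"),
    "exit"  -> Nil
  )

  // Breadth-first traversal from `root`. Returns the child -> parent edges
  // of the resulting BFS tree; edges to already visited nodes are dropped.
  def bfsTree(graph: Map[String, List[String]], root: String): Map[String, String] = {
    val parent  = mutable.Map.empty[String, String]
    val visited = mutable.Set(root)
    val queue   = mutable.Queue(root)
    while (queue.nonEmpty) {
      val node = queue.dequeue()
      // `visited.add` returns true only the first time a node is seen.
      for (succ <- graph.getOrElse(node, Nil) if visited.add(succ)) {
        parent(succ) = node
        queue.enqueue(succ)
      }
    }
    parent.toMap
  }

  def main(args: Array[String]): Unit =
    println(bfsTree(cfg, "entry"))
}
```

The tree depends on the chosen root and on traversal order, which is one reason a CFG has no single canonical tree form.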