Code for the methods and algorithms described in the paper "Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task"

worldbeater/code-vecs

Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task

Code for source code embedding algorithms described in the paper Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. This repository also includes code implementing control flow graph-based source code embeddings for reproducing the experiments described in our paper Source Code Embeddings Based on Control Flow Graphs and Markov Chains for Program Classification.

Getting Started

  1. Install Docker CE and GNU make.
  2. Clone the repository, then fetch its submodules with git submodule update --init --recursive.
  3. Download the dataset [2] from Zenodo and extract the task-*.csv files into src/data.
  4. Classification targets can contain digits, which the name check in code2vec rejects because it accepts only letters and the | character. Open external/code2vec/common.py and apply the patch:
     @staticmethod
     def legal_method_names_checker(special_words, name):
-        return name != special_words.OOV and re.match(r'^[a-zA-Z|]+$', name)
+        return name != special_words.OOV
  5. Run make notebook from the repository root, then run the notebooks.
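
To see why the patch in step 4 is needed, the following minimal sketch compares the name check before and after the change. It is an illustration only, not the code2vec implementation: the OOV value and the sample target names are hypothetical, and only the regular expression is taken from the patch.

```python
import re

# Hypothetical stand-in for code2vec's out-of-vocabulary token; the real
# value is defined inside code2vec and may differ.
OOV = "<OOV>"

# The pattern removed by the patch: letters and '|' only, no digits.
LETTERS_AND_PIPES = re.compile(r"^[a-zA-Z|]+$")


def original_check(name: str) -> bool:
    """Mirrors the check before the patch: not OOV and letters or '|' only."""
    return name != OOV and LETTERS_AND_PIPES.match(name) is not None


def patched_check(name: str) -> bool:
    """Mirrors the check after the patch: only the OOV token is rejected."""
    return name != OOV


# Hypothetical target names, chosen only for illustration.
for name in ["get|name", "task|7", "11", OOV]:
    print(f"{name!r}: original={original_check(name)}, patched={patched_check(name)}")
```

With the original check, any target containing a digit is treated as illegal, so digit-bearing classification targets would be rejected; the patched check keeps them and still rejects the out-of-vocabulary token.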

References

  1. Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task. Future Internet 2023, 15, 314.
  2. Demidova, L.A.; Andrianova, E.G.; Sovietov, P.N.; Gorchakov, A.V. Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data 2023, 8 (6), p. 109.
  3. Gorchakov, A.V.; Demidova, L.A.; Maslennikov, V.V. Source Code Embeddings Based on Control Flow Graphs and Markov Chains for Program Classification. Proceedings of the 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). IEEE, 2024, pp. 328-333.

Citation

If you use the code from this repository in your research, please consider citing [1] or [3].
