This repository contains the source code for a code comprehension predictor service for computational notebooks.
To run the code, install the requirements by executing the following command:
pip install -r requirements.txt
After installing the requirements, you can run the CLI or API and start using the service.
To use the functionality provided in this repository, you will need certain CSV files containing notebook code and markdown cell data. These files are available from DistilKaggle (a distilled dataset of Kaggle Jupyter notebooks) and from "A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks".
Use the download links below to get started; a short pandas loading sketch follows the list.
- notebook_metrics.csv: notebook features file, mainly used to train the models.
- code.csv: mainly used for metrics extraction.
- augmented_kernel_quality.csv: notebook scores file, mainly used to train the models.
- sample1050_labeled_by_experts.csv: used to evaluate the models.
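As a quick sanity check after downloading, the files can be loaded with pandas. This is a minimal sketch, assuming the files sit in your working directory; the column names are whatever the CSVs ship with, so inspect them before joining features with scores.

```python
# Minimal sketch: load the downloaded CSVs and inspect their contents.
# The file locations are assumptions; adjust the paths to wherever you saved them.
import pandas as pd

notebook_metrics = pd.read_csv("notebook_metrics.csv")            # features for training
scores = pd.read_csv("augmented_kernel_quality.csv")              # scores for training
expert_sample = pd.read_csv("sample1050_labeled_by_experts.csv")  # evaluation set

print(notebook_metrics.columns.tolist())
print(scores.columns.tolist())
print(expert_sample.shape)
```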
- src: Contains the main code that provides code comprehension prediction and metrics evaluation.
- src/core: includes the main Python files of the project. These classes and functions do the actual work behind the interfaces.
- src/utils: helper files used to manage the project, such as config.py, where all configuration is managed.
- src/notebooks: base notebook files that support the paper's results.
- dataframes: Contains basic data of selected Jupyter notebooks for training the models. For example, code.csv contains the source code used in each notebook, and markdown.csv holds the markdown cell data.
- metrics: Contains CSV files with metrics of selected Jupyter notebooks for training the models. For instance, code_cell_metrics.csv contains metrics of each code cell in the notebook, markdown_cell_metrics.csv contains markdown cell metrics of each notebook, and notebook_metrics.csv holds the aggregated metrics of all cells in the notebook.
- notebooks: Stores the notebooks that are fed to the predictor.
- models: Stores the trained models.
- logs: Keeps the log files.
- cache: Holds cached data.
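To confirm that the layout described above is in place, here is a minimal check. It assumes you run it from the repository root and only verifies that the listed directories exist.

```python
# Minimal sketch: verify the directory layout listed above.
# Directory names are taken directly from the structure above.
from pathlib import Path

for name in ["src", "dataframes", "metrics", "notebooks", "models", "logs", "cache"]:
    print(f"{name}: {'ok' if Path(name).is_dir() else 'missing'}")
```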
First, cd to the src directory and then execute the cli.py file to start your journey.
cd src
export PYTHONPATH="$(pwd)"
python cli.py --help
Use --help with each command to get further instructions. Some use cases are provided below, with a short illustrative sketch after each group of commands.
python cli.py
python cli.py extract-dataframe-metrics --help
python cli.py extract-dataframe-metrics --chunk-size 100 --limit-chunk-count 5
python cli.py extract-dataframe-metrics ../dataframes/markdown.csv ../metrics/markdown_cell_metrics.csv --chunk-size 100 --limit-chunk-count 5 --file-type markdown
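The --chunk-size and --limit-chunk-count flags suggest that the extractor processes the CSV in chunks. The sketch below illustrates that general pandas pattern; it is not the project's actual implementation, and the per-cell metric computation is left as a placeholder.

```python
# Rough illustration of chunked CSV processing, mirroring the CLI flags.
# This is NOT the project's implementation, just the common pandas idiom.
import pandas as pd

chunk_size = 100        # rows per chunk (mirrors --chunk-size)
limit_chunk_count = 5   # stop after this many chunks (mirrors --limit-chunk-count)

with pd.read_csv("../dataframes/markdown.csv", chunksize=chunk_size) as reader:
    for i, chunk in enumerate(reader):
        if i >= limit_chunk_count:
            break
        # A real extractor would compute per-cell metrics here.
        print(f"chunk {i}: {len(chunk)} rows")
```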
python cli.py aggregate-metrics --help
python cli.py aggregate-metrics ../metrics/code_cell_metrics.csv ../metrics/markdown_cell_metrics.csv ../metrics/notebook_metrics_lite.csv
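Conceptually, aggregation rolls the cell-level metrics up to one row per notebook. The sketch below is hypothetical: the kernel_id join key and the mean aggregation are assumptions for illustration, so check the real CSV headers and the CLI output before adapting it.

```python
# Hypothetical sketch of cell-to-notebook aggregation. The "kernel_id"
# join key and mean() aggregation are assumptions, not the project's logic.
import pandas as pd

code = pd.read_csv("../metrics/code_cell_metrics.csv")
markdown = pd.read_csv("../metrics/markdown_cell_metrics.csv")

code_agg = code.groupby("kernel_id").mean(numeric_only=True)
md_agg = markdown.groupby("kernel_id").mean(numeric_only=True)

code_agg.join(md_agg, lsuffix="_code", rsuffix="_md") \
        .to_csv("../metrics/notebook_metrics_lite.csv")
```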
python cli.py extract-notebook-metrics --help
python cli.py extract-notebook-metrics ../notebooks/file.ipynb ../notebooks/results.json
python cli.py extract-notebook-metrics ../notebooks/file.ipynb ../notebooks/results.csv
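For a sense of what the input looks like, an .ipynb file is just JSON. The sketch below pulls a few trivial counts with the standard library; the real extract-notebook-metrics command computes a much richer feature set, so treat this only as an illustration of the format.

```python
# Minimal sketch: read a notebook as JSON and compute trivial counts.
# The real command extracts far more metrics than this.
import json

with open("../notebooks/file.ipynb", encoding="utf-8") as f:
    nb = json.load(f)

code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
md_cells = [c for c in nb["cells"] if c["cell_type"] == "markdown"]
loc = sum(len(c["source"]) for c in code_cells)  # "source" is a list of lines

print({"code_cells": len(code_cells), "markdown_cells": len(md_cells), "loc": loc})
```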
python cli.py predict ../notebooks/file.ipynb cat_boost ../models/catBoostClassifier.withOutPT.sf50.sr20.combined_score.v2.model
python cli.py predict ../notebooks/file.ipynb cat_boost ../models/catBoostClassifier.withPT.sf50.sr20.combined_score.v2.model --pt-score 10
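Under the hood, prediction amounts to loading a trained CatBoost model and scoring a notebook's metric vector. The sketch below assumes the features were first exported with extract-notebook-metrics and happen to match the model's expected columns, which may not hold; prefer the cli.py predict command for the supported path.

```python
# Hypothetical sketch of the predict step: load a CatBoost model and score
# a metrics row. Assumes ../notebooks/results.csv matches the model's
# expected feature columns; use `cli.py predict` in practice.
import pandas as pd
from catboost import CatBoostClassifier

model = CatBoostClassifier()
model.load_model("../models/catBoostClassifier.withOutPT.sf50.sr20.combined_score.v2.model")

features = pd.read_csv("../notebooks/results.csv")
print(model.predict(features))
```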
First, cd to the src directory and then execute the main.py file to start your journey.
cd src
export PYTHONPATH="$(pwd)"
python main.py
After this, you can view the API documentation at http://localhost:8000/docs.
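Once the server is running, a quick way to confirm it is up and list its routes is shown below. This assumes the service is a FastAPI app, which the /docs URL suggests; /openapi.json is FastAPI's standard schema endpoint.

```python
# Quick liveness check against the running API, assuming FastAPI.
import requests

resp = requests.get("http://localhost:8000/openapi.json", timeout=5)
resp.raise_for_status()
print("Available endpoints:", sorted(resp.json()["paths"]))
```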
Use the command below to build and run the image with Docker Compose:
docker compose up --build