This repository focuses on Deep Learning in NLP and implements sentiment classification models on a Twitter Dataset, using machine learning algorithms. It is based on an assignment from the Department of Informatics and Telecommunications at the University of Athens (DIT UOA).

AntonisZks/Deep-Learning-for-Natural-Language-Processing

Deep Learning for Natural Language Processing

This repository contains implementations of various deep learning models for natural language processing (NLP) tasks, specifically sentiment classification on an English Twitter dataset. The project is based on assignments for the Department of Informatics and Telecommunications (DIT) at the University of Athens (UOA).

Overview

The goal of this project is to build and fine-tune sentiment classifiers, ranging from a traditional machine learning baseline (TF-IDF with logistic regression) through feed-forward neural networks on word embeddings to transformer models (BERT and DistilBERT). The models are trained and evaluated on a Twitter dataset, with the final goal of predicting sentiment labels for unseen tweets.


Repository Structure

├── data/
│   ├── sample_submission.csv
│   ├── test_dataset.csv
│   ├── train_dataset.csv
│   ├── val_dataset.csv
├── docs/
│   ├── AI2_Homework_1_2025.pdf
│   ├── AI2_Homework_2_2025.pdf
│   ├── AI2_Homework_3_2025.pdf
├── notebooks/
│   ├── bert_transformer.ipynb
│   ├── distilbert_transformer.ipynb
│   ├── tfidf_logistic_regression.ipynb
│   ├── word_embeddings_deep_neural_networks.ipynb
├── reports/
│   ├── figures/
│   │   ├── activation_functions_training_results.png
│   │   ├── base_model_training_results.png
│   │   ├── dataset_file_sizes_pie.png
│   ├── PDFs/
│   │   ├── BERT_and_DistilBERT_transformers_in_NLP.pdf
│   │   ├── TF-IDF_and_Logistic_Regression_in_NLP.pdf
│   │   ├── Word_Embeddings_and_FeedForward_Neural_Networks.pdf
├── LICENSE
├── README.md

Key Files and Directories

  • data/: Contains the datasets used for training, validation, and testing.
  • docs/: Documentation and assignment PDFs related to the project.
  • notebooks/: Jupyter notebooks implementing various models and experiments.
  • reports/: Visualizations and reports generated during the experiments.
  • requirements.txt: Python dependencies required to run the project (used by the installation steps, but not listed in the tree above).
  • LICENSE: MIT License for the repository.
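
As a quick orientation to the data/ directory, the sketch below reports the header columns and row count of each split. The file names come from the repository tree; the column layout is not documented here, so the code deliberately assumes nothing about column names. The helper names `summarize_split` and `summarize_dataset` are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path

# File names taken from the repository tree; the column layout is not
# documented, so nothing below assumes specific column names.
SPLIT_FILES = {
    "train": "train_dataset.csv",
    "val": "val_dataset.csv",
    "test": "test_dataset.csv",
}


def summarize_split(csv_path):
    """Return the header columns and the number of data rows in one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])        # first row is assumed to be a header
        n_rows = sum(1 for _ in reader)  # remaining rows are data
    return {"columns": header, "rows": n_rows}


def summarize_dataset(data_dir="data"):
    """Summarize every split that is present under data_dir."""
    data_dir = Path(data_dir)
    return {
        split: summarize_split(data_dir / name)
        for split, name in SPLIT_FILES.items()
        if (data_dir / name).exists()
    }


if __name__ == "__main__":
    for split, info in summarize_dataset().items():
        print(split, info["rows"], info["columns"])
```

Run from the repository root, this prints one line per split that is found, which is a cheap way to confirm the files parse before opening a notebook.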

Models Implemented

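  1. BERT (notebooks/bert_transformer.ipynb)

  2. DistilBERT (notebooks/distilbert_transformer.ipynb)

  3. TF-IDF with Logistic Regression (notebooks/tfidf_logistic_regression.ipynb)

  4. Word Embeddings with Deep Neural Networks (notebooks/word_embeddings_deep_neural_networks.ipynb)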
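
To give a feel for the simplest of these models, here is a minimal sketch of a TF-IDF plus logistic regression classifier. It assumes scikit-learn, which this README does not confirm as the library used in the notebook. The example texts are invented, and the label encoding (1 = positive, 0 = negative) is an assumption. It is not the notebook's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples invented for illustration; they are not from the repository's dataset.
train_texts = [
    "I love this phone",
    "what a great day",
    "I hate this traffic",
    "what a terrible day",
]
train_labels = [1, 1, 0, 0]  # assumed encoding: 1 = positive, 0 = negative

# TF-IDF turns each text into a sparse, weighted bag-of-words vector;
# logistic regression then learns one weight per vocabulary term.
classifier = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_texts, train_labels)

# Predict labels for unseen text.
predictions = classifier.predict(["love great", "hate terrible"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary fitted on the training texts only, so the same transformation is reused at prediction time.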


Setup Instructions

Prerequisites

  • Python 3.8 or higher
  • GPU support (optional but recommended for training deep learning models)
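
The two prerequisites can be checked with a short script such as the sketch below. It assumes PyTorch is the deep learning framework, which this README does not state, and it falls back to "cpu" when PyTorch is not installed. The function names `python_version_ok` and `pick_device` are hypothetical.

```python
import importlib.util
import sys


def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    # Assumption: PyTorch is the framework; the README does not say which one is used.
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed, so there is nothing to query
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Python 3.8+:", python_version_ok())
    print("Training device:", pick_device())
```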

Installation

  1. Clone the repository:

    git clone https://github.com/AntonisZks/Deep-Learning-for-Natural-Language-Processing.git
    cd Deep-Learning-for-Natural-Language-Processing
    
  2. Install dependencies:

    pip install -r requirements.txt

  3. Download the datasets and place them in the data/ directory.
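
After step 3, a small check like the following can confirm that the dataset files landed where the notebooks presumably expect them. The file names are those listed under data/ in the repository tree; the function name `missing_dataset_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names as listed under data/ in the repository tree.
EXPECTED_FILES = (
    "sample_submission.csv",
    "test_dataset.csv",
    "train_dataset.csv",
    "val_dataset.csv",
)


def missing_dataset_files(data_dir="data"):
    """Return the expected dataset files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All dataset files are in place.")
```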


License

This project is licensed under the MIT License. See the LICENSE file for details.
