Text-to-Image Transformed MNIST Dataset (text2image_dataset)

Overview

This DeepTrackAI repository contains a dataset of transformed MNIST digits paired with natural-language descriptions.
Images are generated using the notebook create_transformed_images.ipynb, which applies randomized color, rotation/flip, and contrast transformations to the original MNIST digits and saves both the transformed images and corresponding text prompts.

The dataset is designed for benchmarking text-to-image alignment and multimodal learning tasks.

Summary

  • Dataset Size: 60,000 images, one transformed version of each image in the MNIST training set
  • Image Size: 28 × 28 pixels
  • Format: 8-bit RGB PNG images
  • Metadata: JSON file mapping filenames to natural-language transformation descriptions
  • Transformations included:
    • Digit colorization (e.g. “in red on a blue background”)
    • Rotations (±90°, 180°)
    • Horizontal mirror / vertical flip
    • Auto-contrast normalization
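The transformations listed above can be sketched with Pillow. This is an illustrative example, not the code from create_transformed_images.ipynb: the function name transform_digit, its parameters, and the order in which the steps are applied are all assumptions made for this sketch.

```python
from PIL import Image, ImageOps


def transform_digit(gray, digit_color, background_color,
                    rotation=0, mirror=False, flip=False):
    """Hypothetical helper: apply the listed transformations to one digit.

    `gray` is assumed to be a 28x28 mode "L" image with a bright digit
    on a dark background, as in the original MNIST data.
    """
    # Auto-contrast: stretch intensities to the full 0-255 range.
    img = ImageOps.autocontrast(gray)
    # Colorization: dark pixels take the background color, bright pixels
    # take the digit color. The result is an RGB image.
    img = ImageOps.colorize(img, black=background_color, white=digit_color)
    # Rotation in degrees, counter-clockwise. Multiples of 90 keep the
    # 28x28 size because the image is square.
    if rotation:
        img = img.rotate(rotation)
    # Horizontal mirror (left-right) and vertical flip (top-bottom).
    if mirror:
        img = ImageOps.mirror(img)
    if flip:
        img = ImageOps.flip(img)
    return img
```

For instance, transform_digit(gray, "red", "blue", rotation=180) would produce an image that a description such as “in red on a blue background” could accompany, with the rotation described as well. The exact wording of the descriptions is defined by the notebook, not by this sketch.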

Original Source

The transformed dataset in this repository is based on MNIST and therefore distributed under the same license. If you use this dataset, please follow the licensing requirements and provide proper attribution to the original authors.


Dataset Structure

/text2image_dataset  
  └── train/  
      ├── images/  
      │   ├── 0_00001.png  
      │   ├── 0_00002.png  
      │   └── ...  
      └── image_descriptions.json  
  • images/ contains transformed digit images in PNG format.
  • image_descriptions.json stores a dictionary mapping filenames to textual descriptions of the transformations.
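Given this layout, images and descriptions can be paired with a few lines of standard-library Python. The sketch below assumes that image_descriptions.json is a flat dictionary whose keys are bare filenames (such as 0_00001.png) and whose values are description strings; the function name load_pairs is hypothetical.

```python
import json
from pathlib import Path


def load_pairs(root):
    """Return a list of (image_path, description) tuples for the train split.

    Assumes the JSON keys are filenames relative to train/images/.
    Entries whose image file is missing are skipped.
    """
    train_dir = Path(root) / "train"
    with open(train_dir / "image_descriptions.json", encoding="utf-8") as f:
        descriptions = json.load(f)

    pairs = []
    for filename, text in sorted(descriptions.items()):
        image_path = train_dir / "images" / filename
        if image_path.exists():
            pairs.append((image_path, text))
    return pairs
```

Each returned path can then be opened with an image library of your choice, for example Pillow's Image.open(path).convert("RGB").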

How to Access the Data

Clone the Repository

git clone https://github.com/DeepTrackAI/text2image_dataset  
cd text2image_dataset  

Generate the Dataset

Run the notebook to generate transformed images and their descriptions:

jupyter notebook create_transformed_images.ipynb

Attribution

When using this dataset or the code, please cite both the original MNIST dataset and this repository.

Cite MNIST:

LeCun Y, Cortes C, Burges CJC. The MNIST Database of Handwritten Digits. Retrieved from http://yann.lecun.com/exdb/mnist/

@misc{lecun1998mnist,
  title        = {The MNIST Database of Handwritten Digits},
  author       = {LeCun, Yann and Cortes, Corinna and Burges, Christopher J.C.},
  year         = {1998},
  howpublished = {\url{http://yann.lecun.com/exdb/mnist/}}
}

Cite this repository:

Carlo Manzo. Text-to-Image Transformed MNIST Dataset. GitHub (2025). https://github.com/DeepTrackAI/text2image_dataset

@misc{text2image2025,  
  author       = {Carlo Manzo},  
  title        = {Text-to-Image Transformed MNIST Dataset},  
  year         = {2025},  
  howpublished = {\url{https://github.com/DeepTrackAI/text2image_dataset}}  
}  

License

This dataset is shared under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) License, following the original licensing terms.
