Commit 608ca25 ("ViLT"), 1 parent 0c0aa4c

8 files changed: +35,840 −0 lines

Files changed:

- .DS_Store (2 KB; binary file not shown)
- deep-learning/.DS_Store (0 bytes; binary file not shown)
- deep-learning/Transformer-Tutorials/ViLT/Fine_tuning_ViLT_for_VQA.ipynb (+22,864 lines; large diff not rendered by default)
- deep-learning/Transformer-Tutorials/ViLT/Inference_with_ViLT_(visual_question_answering).ipynb (+1,051 lines; large diff not rendered by default)
- deep-learning/Transformer-Tutorials/ViLT/Masked_language_modeling_with_ViLT.ipynb (+3,196 lines; large diff not rendered by default)
# ViLT notebooks
In this directory, you can find several notebooks that illustrate how to use NAVER AI Lab's [ViLT](https://arxiv.org/abs/2102.03334), both for fine-tuning on custom data and for inference. It currently includes the following notebooks:

- fine-tuning ViLT for visual question answering (VQA), based on the [VQAv2 dataset](https://visualqa.org/)
- performing inference with ViLT to illustrate visual question answering (VQA)
- masked language modeling (MLM) with a pre-trained ViLT model
- performing inference with ViLT for image-text retrieval
- performing inference with ViLT to illustrate natural language for visual reasoning, based on the [NLVRv2 dataset](https://lil.nlp.cornell.edu/nlvr/)
All models can be found on the [hub](https://huggingface.co/models?search=vilt).
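As a quick taste of what the inference notebooks cover, below is a minimal VQA sketch using the Hugging Face Transformers API. It assumes the `dandelin/vilt-b32-finetuned-vqa` checkpoint from the hub search above, a demo image from COCO, and an internet connection for the model download:

```python
from PIL import Image
import requests
from transformers import ViltProcessor, ViltForQuestionAnswering

# load a demo image (a COCO validation image) and pose a question about it
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# ViLT fine-tuned on VQAv2; the processor prepares both the image and the text
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

# a single forward pass: VQA is cast as classification over a fixed answer vocabulary
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```

The fine-tuning notebook builds on the same processor/model pair, training the classification head on VQAv2 annotations instead of only running a forward pass.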
