<div align="center">

# LF_LLM-269M

</div>
<div align="center"><img src="./assets/puzzle_fox.jpg" width="300"/></div>

MyLLM is a personal deep-learning project where I built a modern LLM from the ground up. I focused on developing the core components required for pre-training an LLM, including writing the model-architecture code, handling large datasets, training the model efficiently, and evaluating its performance.
# How To Reproduce
You can debug on a Mac (or most Unix/Linux machines) by using `./run_pre_training_e2e_debug.sh`.

To actually train the model I used NVIDIA GPUs (I went with 8xA100s because of cost). To run training end-to-end (downloading all the datasets needed, training, running evals, etc.) you can simply run `./run_pre_training_e2e.sh`. I used [VESSL AI's](https://vessl.ai) Workspaces to set up my training infra, using their `PyTorch 2.3.1 (CUDA 12.1)` image.

Note that this project uses the `./temp_data/` dir as a quick-access place to store temporary data, such as logs, datasets, and checkpoints. To avoid syncing it between a development machine and your accelerated machine, you can use e.g. `rsync -avz --delete --progress --exclude 'temp_data/*' $PWD username@server_ip_address:/home/ubuntu/`.

# Building LF_LLM-269M

### Choosing Model Architecture and Training Parameters
Due to my limited GPU resources (I don't want to spend resources searching for the best parameters), and because this is a learning project, I'll base my parameters on those used by open-source LLMs. It's not a perfect approach by any means, and choosing parameters can be an entire project of its own, but for now this is fine.
Below is a summary table I created to help me tune my parameters (more info in [parameters_tuning.ipynb](./notebooks/parameters_tuning.ipynb)).

![Model parameters summary table](./assets/model_params_table.png)
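
As a sanity check on the parameter budget (the model name suggests roughly 269M parameters), the total can be estimated directly from the architecture choices. Below is a minimal sketch of that arithmetic for a GPT-style decoder-only model with tied input/output embeddings and learned positional embeddings; the shapes are illustrative placeholders, not the exact LF_LLM-269M configuration (see the notebook above for that).

```python
# Rough parameter count for a GPT-style decoder-only transformer with tied
# input/output embeddings. The shapes below are illustrative placeholders,
# not the exact LF_LLM-269M configuration.
def estimate_params(vocab_size, d_model, n_layers, d_ff, context_len):
    token_emb = vocab_size * d_model      # token embeddings (tied with the LM head)
    pos_emb = context_len * d_model       # learned positional embeddings
    attn = 4 * d_model * d_model          # Q, K, V and output projections
    mlp = 2 * d_model * d_ff              # up- and down-projections
    block = attn + mlp + 2 * d_model      # plus two LayerNorm gains per block
    return token_emb + pos_emb + n_layers * block

# Example with GPT-2-like shapes: roughly a quarter-billion parameters.
print(f"{estimate_params(50304, 1024, 16, 4 * 1024, 1024):,}")  # 253,919,232
```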
### Pre-Training Data
For pre-training data I looked at [Dolma](https://allenai.org/dolma) and [RedPajama-v2](https://www.together.ai/blog/redpajama-data-v2), but [build-nanogpt](https://github.com/karpathy/build-nanogpt) showed me that a smaller, more refined dataset is enough for a small project like this.
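
To make that concrete, here is a minimal sketch of streaming and tokenizing such a dataset with the Hugging Face `datasets` library. The FineWeb-Edu `sample-10BT` subset (the corpus build-nanogpt trains on) and the GPT-2 tokenizer are assumptions; the README does not state which dataset this project actually uses.

```python
# Minimal sketch: stream a small, refined web corpus and tokenize it.
# FineWeb-Edu "sample-10BT" and the GPT-2 tokenizer are assumptions here,
# not necessarily what this repo uses.
from datasets import load_dataset  # pip install datasets
import tiktoken                    # pip install tiktoken

ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)
enc = tiktoken.get_encoding("gpt2")

for doc in ds.take(3):
    tokens = enc.encode_ordinary(doc["text"])
    tokens.append(enc.eot_token)   # separate documents with <|endoftext|>
    print(f"{len(tokens)} tokens")
```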
# Results

Metric graphs ([metric_graphs.ipynb](./notebooks/metric_graphs.ipynb)):

![val loss](./assets/val_loss.png)

![hellaswag eval](./assets/hellaswag_eval.png)
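
The HellaSwag plot above tracks multiple-choice accuracy: for each example the model scores four candidate endings and the highest-scoring one is picked. Below is a minimal sketch of that scoring, assuming the model returns next-token logits of shape `(batch, time, vocab)`; the evaluation code in this repo may differ.

```python
# Minimal sketch of HellaSwag-style scoring: pick the candidate ending with the
# highest average per-token log-likelihood under the model. Assumes model(ids)
# returns next-token logits of shape (1, T, vocab).
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_ending(model, ctx_tokens, candidate_endings):
    scores = []
    for ending in candidate_endings:
        ids = torch.tensor(ctx_tokens + ending).unsqueeze(0)        # (1, T)
        logits = model(ids)                                         # (1, T, vocab)
        logp = F.log_softmax(logits[0, :-1], dim=-1)                # predicts tokens 1..T-1
        targets = ids[0, 1:]
        token_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        ending_logp = token_logp[len(ctx_tokens) - 1:]              # ending tokens only
        scores.append(ending_logp.mean().item())
    return max(range(len(scores)), key=scores.__getitem__)          # index of best ending
```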
**At step 0 (no training):**
(Green is the prompt, blue is LLM-generated text.)
- Sample 1: <span style="color:green;">If animals could talk, my pet would probably say </span><span style="color:blue;">undertake undertake distortion intest Gylassotide acids Yankee neoconcept Coming Coming launcherimpl Sussex Sussexinea minim Ding</span>
- Sample 2: <span style="color:green;">HTML stands for </span><span style="color:blue;">campaigns desserts sawradio AUTH sort Pythononto unforeseen rainfall rainfall Host awaits solubleheaded Fever estimate genders proponentMAR</span>
- Sample 3: <span style="color:green;">The clever fox built the strange machine with just a feather, a pebble, and a tiny twig </span><span style="color:blue;">intrusion complying Resist master Yad induced derogatory Magic damageced amusing 290Sn},{" muddy universal prospect prospect prospect Rey</span>

**After the last training step:**
(Green is the prompt, blue is LLM-generated text.)
- Sample 1: <span style="color:green;">If animals could talk, my pet would probably say </span><span style="color:blue;">hello or I would say Hi.
I am excited to have my pet respond to the sound I</span>
- Sample 2: <span style="color:green;">HTML stands for </span><span style="color:blue;">HyperText Markup Language. For more information about the browser, see:<|endoftext|>A few months ago</span>
- Sample 3: <span style="color:green;">The clever fox built the strange machine with just a feather, a pebble, and a tiny twig </span><span style="color:blue;">; by the time it was ready, it was a great working machine. After watching him carefully,</span>

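
For reference, completions like the samples above can be produced with plain autoregressive decoding. The sketch below uses temperature-scaled top-k sampling; the decoding settings actually used for these samples are not stated here, so treat it as an illustration only.

```python
# Minimal sketch of generating a completion from a tokenized prompt.
# Top-k sampling with temperature is an assumption; the decoding settings
# used for the samples above are not stated in this README.
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=20, top_k=50, temperature=1.0):
    tokens = prompt_ids                                           # (1, T) prompt token ids
    for _ in range(max_new_tokens):
        logits = model(tokens)[:, -1, :] / temperature            # next-token logits
        topk_vals, topk_idx = torch.topk(logits, top_k, dim=-1)   # keep the k most likely
        probs = torch.softmax(topk_vals, dim=-1)
        next_id = topk_idx.gather(-1, torch.multinomial(probs, 1))
        tokens = torch.cat([tokens, next_id], dim=1)              # append and continue
    return tokens
```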
<details>
<summary><strong>Resources/References</strong></summary>

- [Molmo (MolmoE)](https://huggingface.co/allenai/MolmoE-1B-0924)
- [apple/corenet](https://github.com/apple/corenet/tree/main)
- [allenai/OLMo](https://github.com/allenai/OLMo)
- [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry)
- [google-research/tuning_playbook](https://github.com/google-research/tuning_playbook)
- [karpathy/build-nanogpt](https://github.com/karpathy/build-nanogpt/tree/master)

</details>

## License
GNU GPLv3 ([LICENSE.txt](./LICENSE.txt))