
Commit 87ba476
add metrics and some draft text to README
1 parent 5de1c7a
8 files changed: +780 −16 lines changed

LICENSE.txt (+674)
Large diffs are not rendered by default.

README.md (+50 −15)
@@ -1,28 +1,63 @@
-Repo where I'll be creating my own LLM from scratch. The goal is to deepen my understanding of model-architecture design, training methodologies, and core deep-learning principles.
+<div align="center">

-Run first
-`export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"`
+# LF_LLM-269M

-- Make sure to use PyTorch 2.0 or later!
+</div>
+<div align="center"><img src="./assets/puzzle_fox.jpg" width="300"/></div>

-To debug on a Mac, use: `./run_pre_training_e2e_on_mac.sh`
-To run on a machine with NVIDIA GPU(s), use: `./run_pre_training_e2e.sh`
+MyLLM is a deep-learning personal project where I built a modern LLM from the ground up. I focused on developing the core components required for pre-training an LLM, including writing the model-architecture code, handling large datasets, training the model efficiently, and evaluating its performance.

-Push to server with NVIDIA GPUs (ignoring contents from `temp_data/` dir):
-```
-rsync -avz --delete --progress --exclude 'temp_data/*' $PWD username@server_ip_address:/home/ubuntu/
-```
+# How To Reproduce
+You can debug on a Mac (or most Unix/Linux machines) by using `./run_pre_training_e2e_debug.sh`.

-# Choosing Model Architecture and Training Parameters
+To actually train the model I used NVIDIA GPUs (I went with 8xA100s because of cost). To run training end-to-end (downloading all needed datasets, training, running evals, etc.), simply run `./run_pre_training_e2e.sh`. I used [VESSL AI's](https://vessl.ai) Workspaces to set up my training infra, using their `PyTorch 2.3.1 (CUDA 12.1)` image.
+
+Note that this project uses the `./temp_data/` dir as a quick-access place for temporary data such as logs, datasets, and checkpoints. To avoid syncing it between a development machine and your accelerated machine, you can use e.g. `rsync -avz --delete --progress --exclude 'temp_data/*' $PWD username@server_ip_address:/home/ubuntu/`.
+
+# Building LF_LLM-269M
+
+### Choosing Model Architecture and Training Parameters
Due to my limited GPU resources (I don't want to spend resources searching for the best parameters), and because this is a learning project, I'll base my parameters on those used by open-source LLMs. It's not a perfect approach by any means, and choosing parameters can be an entire project of its own, but for now this is fine.
Below is a summary table I created to help me tune my parameters (more info in [parameters_tuning.ipynb](./notebooks/parameters_tuning.ipynb)).

![Summary table](./assets/some_open_source_models.png)
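
As a quick sanity check on any candidate configuration, you can estimate a decoder-only transformer's parameter count directly from its hyperparameters. The sketch below is editorial (not code from this repo), and the example values are assumptions chosen only to land near the 269M target:

```python
# Rough parameter-count estimate for a GPT-style decoder-only transformer.
# Illustrative sketch; the example hyperparameters are assumptions, not
# necessarily LF_LLM-269M's actual configuration.
def estimate_params(n_layer, d_model, vocab_size, d_ff=None):
    d_ff = d_ff or 4 * d_model
    attn = 4 * d_model * d_model   # Q, K, V, and output projections
    mlp = 2 * d_model * d_ff       # up- and down-projections
    block = attn + mlp             # per-layer weights (norms/biases ~negligible)
    embed = vocab_size * d_model   # token embeddings (assume tied output head)
    return n_layer * block + embed

# Hypothetical config: 17 layers, d_model=1024, GPT-2-style vocab rounded up.
print(estimate_params(17, 1024, 50304))  # ~265M, in the ballpark of 269M
```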

+### Pre-Training Data
+For pre-training data I looked at [Dolma](https://allenai.org/dolma) and [RedPajama-v2](https://www.together.ai/blog/redpajama-data-v2), but [build-nanogpt](https://github.com/karpathy/build-nanogpt) showed me that a smaller, more refined dataset is enough for a small project like this.
+
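
For a concrete picture of what "handling large datasets" involves, here is a minimal sketch of streaming a corpus and packing it into token shards. The dataset (FineWeb-Edu, the one build-nanogpt uses) and the shard size are illustrative assumptions, not necessarily what this repo does:

```python
# Minimal data-prep sketch: stream a corpus, tokenize with the GPT-2 BPE,
# and pack tokens into fixed-size uint16 shards under temp_data/.
# Dataset choice and shard size are illustrative assumptions.
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")

ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

SHARD_SIZE = 100_000_000  # tokens per shard
buf, shard_idx = [], 0
for doc in ds:
    buf.append(enc.eot_token)  # <|endoftext|> separates documents
    buf.extend(enc.encode_ordinary(doc["text"]))
    while len(buf) >= SHARD_SIZE:
        shard = np.array(buf[:SHARD_SIZE], dtype=np.uint16)
        np.save(f"temp_data/shard_{shard_idx:04d}.npy", shard)
        buf, shard_idx = buf[SHARD_SIZE:], shard_idx + 1
```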
+# Results
+
+[metric_graphs.ipynb](./notebooks/metric_graphs.ipynb)
+![Training Metrics](./assets/training_metrics_over_steps.png)
+![Train Val Loss](./assets/train_val_loss.png)
+![HellaSwag Acc](./assets/hellaswag_acc.png)
+
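
For context on the HellaSwag plot: a common way to evaluate a base model on HellaSwag (this is how build-nanogpt does it; I'm assuming something similar here) is to score each candidate ending by its average per-token loss given the context and pick the lowest. A minimal sketch, where `model` is assumed to map a token tensor to logits:

```python
# Illustrative HellaSwag-style scoring: pick the candidate ending with the
# lowest average per-token cross-entropy under the model. `model` (a callable
# returning logits of shape [B, T, V] over GPT-2 tokens) is an assumption.
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def predict_ending(model, context: str, endings: list[str]) -> int:
    losses = []
    for ending in endings:
        ctx = enc.encode_ordinary(context)
        end = enc.encode_ordinary(" " + ending)
        tokens = torch.tensor(ctx + end).unsqueeze(0)  # [1, T]
        logits = model(tokens)                         # [1, T, V]
        # Loss only over the ending tokens, each predicted from the
        # position before it.
        tgt = tokens[0, len(ctx):]
        pred = logits[0, len(ctx) - 1:-1]
        losses.append(F.cross_entropy(pred, tgt).item())
    return min(range(len(endings)), key=losses.__getitem__)
```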
+**At step 0 (no training):**
+(Green is the prompt, blue is LLM-generated text.)
+- Sample 1: <span style="color:green;">If animals could talk, my pet would probably say </span><span style="color:blue;">undertake undertake distortion intest Gylassotide acids Yankee neoconcept Coming Coming launcherimpl Sussex Sussexinea minim Ding</span>
+- Sample 2: <span style="color:green;">HTML stands for </span><span style="color:blue;">campaigns desserts sawradio AUTH sort Pythononto unforeseen rainfall rainfall Host awaits solubleheaded Fever estimate genders proponentMAR</span>
+- Sample 3: <span style="color:green;">The clever fox built the strange machine with just a feather, a pebble, and a tiny twig </span><span style="color:blue;">intrusion complying Resist master Yad induced derogatory Magic damageced amusing 290Sn},{" muddy universal prospect prospect prospect Rey</span>
+
+**After last training step:**
+(Green is the prompt, blue is LLM-generated text.)
+- Sample 1: <span style="color:green;">If animals could talk, my pet would probably say </span><span style="color:blue;">hello or I would say Hi.
+I am excited to have my pet respond to the sound I</span>
+- Sample 2: <span style="color:green;">HTML stands for </span><span style="color:blue;">HyperText Markup Language. For more information about the browser, see:<|endoftext|>A few months ago</span>
+- Sample 3: <span style="color:green;">The clever fox built the strange machine with just a feather, a pebble, and a tiny twig </span><span style="color:blue;">; by the time it was ready, it was a great working machine. After watching him carefully,</span>
+
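
The samples above come from autoregressive decoding. A minimal top-k sampling loop looks like the sketch below; `model` and the hyperparameters (20 new tokens, k=50) are assumptions rather than the exact settings used for these samples:

```python
# Illustrative top-k sampling loop. `model` (returning logits [B, T, V]) and
# the sampling hyperparameters are assumptions, not this repo's exact setup.
import torch
import torch.nn.functional as F
import tiktoken

enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()
def sample(model, prompt: str, max_new_tokens: int = 20, top_k: int = 50) -> str:
    tokens = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)  # [1, T]
    for _ in range(max_new_tokens):
        logits = model(tokens)[:, -1, :]          # logits for the next token
        topv, topi = torch.topk(logits, top_k)    # keep the k most likely
        probs = F.softmax(topv, dim=-1)
        next_tok = topi.gather(-1, torch.multinomial(probs, 1))
        tokens = torch.cat([tokens, next_tok], dim=1)
    return enc.decode(tokens[0].tolist())
```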
+<details>
+<summary><strong>Resources/References</strong></summary>

-More to come...
+- [Molmo (MolmoE)](https://huggingface.co/allenai/MolmoE-1B-0924)
+- [apple/corenet](https://github.com/apple/corenet/tree/main)
+- [allenai/OLMo](https://github.com/allenai/OLMo)
+- [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry)
+- [google-research/tuning_playbook](https://github.com/google-research/tuning_playbook)
+- [karpathy/build-nanogpt](https://github.com/karpathy/build-nanogpt/tree/master)

+</details>

-11/21/24 2:14PM
-Time (per step) = 767.36 ms.
-Throughput: 341,620.04 tokens/sec
+## License
+GNU GPLv3 ([LICENSE.txt](./LICENSE.txt))
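
A note on the timing lines removed above: the two figures are self-consistent, since tokens/sec times seconds/step gives tokens per optimizer step, and here that product is almost exactly 2^18:

```python
# Consistency check on the removed throughput note.
tokens_per_step = 341_620.04 * (767.36 / 1000)  # tokens/sec * sec/step
print(round(tokens_per_step), 2**18)  # 262146 vs 262144 -- equal up to rounding
```

This suggests an effective batch of 2^18 = 262,144 tokens per optimizer step, though the exact batch/sequence-length split behind it is not stated in the diff.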

assets/hellaswag_acc.png (41.3 KB)
+49
@@ -0,0 +1,49 @@
Every 1.0s: nvidia-smi

Fri Dec  6 20:34:35 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:06:00.0 Off |                  Off |
| N/A   65C    P0            381W / 400W  | 72338MiB / 81920MiB  |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off | 00000000:0C:00.0 Off |                  Off |
| N/A   61C    P0            397W / 400W  | 72482MiB / 81920MiB  |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          Off | 00000000:10:00.0 Off |                  Off |
| N/A   63C    P0            327W / 400W  | 72488MiB / 81920MiB  |     98%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          Off | 00000000:16:00.0 Off |                  Off |
| N/A   62C    P0            331W / 400W  | 72480MiB / 81920MiB  |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          Off | 00000000:19:00.0 Off |                  Off |
| N/A   65C    P0            382W / 400W  | 72480MiB / 81920MiB  |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          Off | 00000000:1B:00.0 Off |                  Off |
| N/A   66C    P0            412W / 400W  | 72480MiB / 81920MiB  |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          Off | 00000000:21:00.0 Off |                  Off |
| N/A   70C    P0            400W / 400W  | 72486MiB / 81920MiB  |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          Off | 00000000:24:00.0 Off |                  Off |
| N/A   62C    P0            389W / 400W  | 72342MiB / 81920MiB  |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

assets/puzzle_fox.jpg (1.27 MB)

assets/train_val_loss.png (40.8 KB)

assets/training_metrics_over_steps.png (169 KB)

notebooks/parameters_tuning.ipynb (+7 −1)
@@ -581,8 +581,14 @@
     }
   ],
   "metadata": {
+    "kernelspec": {
+      "display_name": "my_llm",
+      "language": "python",
+      "name": "python3"
+    },
     "language_info": {
-      "name": "python"
+      "name": "python",
+      "version": "3.9.20"
     }
   },
   "nbformat": 4,
