pretraining_log.txt
2024-12-06 20:27:54,923 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 0 | world_size: 8 | is_main: True
assigned_device: cuda:0 | device_type: cuda.
2024-12-06 20:27:55,752 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 6 | world_size: 8 | is_main: False
assigned_device: cuda:6 | device_type: cuda.
2024-12-06 20:27:56,497 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:56,691 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:56,879 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,045 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,064 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,228 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,251 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,410 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,435 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,593 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,621 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,773 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,804 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,954 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:57,986 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,135 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,159 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 4 | world_size: 8 | is_main: False
assigned_device: cuda:4 | device_type: cuda.
2024-12-06 20:27:58,172 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 2 | world_size: 8 | is_main: False
assigned_device: cuda:2 | device_type: cuda.
2024-12-06 20:27:58,173 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,220 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 1 | world_size: 8 | is_main: False
assigned_device: cuda:1 | device_type: cuda.
2024-12-06 20:27:58,249 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 5 | world_size: 8 | is_main: False
assigned_device: cuda:5 | device_type: cuda.
2024-12-06 20:27:58,249 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 3 | world_size: 8 | is_main: False
assigned_device: cuda:3 | device_type: cuda.
2024-12-06 20:27:58,251 - src.utils.handle_ddp - INFO - Launching worker with config:
local_rank: 7 | world_size: 8 | is_main: False
assigned_device: cuda:7 | device_type: cuda.
2024-12-06 20:27:58,320 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,355 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,508 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,538 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,690 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,722 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,872 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:58,905 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,055 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,086 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,237 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,421 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,463 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,582 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,603 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,603 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,614 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,641 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,642 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,649 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,766 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,787 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,798 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,826 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,831 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,836 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,948 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,970 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:27:59,980 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,009 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,014 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,017 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,132 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,154 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,162 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,194 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,198 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,201 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,317 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,338 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,347 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,381 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,383 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,383 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,499 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,523 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,529 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,564 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,565 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,566 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,681 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,710 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,724 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,747 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,747 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,749 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,863 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,892 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,906 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,929 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,931 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:00,939 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,046 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,075 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,090 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,112 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,114 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,121 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,228 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,260 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,296 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,297 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,302 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,380 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,411 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,443 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,481 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,483 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,488 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,571 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,598 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,629 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,669 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,676 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,677 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,755 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,788 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,815 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,857 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,861 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,861 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,941 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:01,970 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,000 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,043 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,046 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,048 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,124 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,155 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,184 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,230 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,235 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:02,312 - src.rope - INFO - cos_cache and sin_cache have been previously created, skipping creation.
2024-12-06 20:28:14,186 - src.model_utils.adamw_opt - INFO - 0.1 decay params: ['_orig_mod.embd.weight', '_orig_mod.transformer_blocks.0.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.0.attn.out_proj.weight', '_orig_mod.transformer_blocks.0.ffn.net.0.weight', '_orig_mod.transformer_blocks.0.ffn.net.3.weight', '_orig_mod.transformer_blocks.1.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.1.attn.out_proj.weight', '_orig_mod.transformer_blocks.1.ffn.net.0.weight', '_orig_mod.transformer_blocks.1.ffn.net.3.weight', '_orig_mod.transformer_blocks.2.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.2.attn.out_proj.weight', '_orig_mod.transformer_blocks.2.ffn.net.0.weight', '_orig_mod.transformer_blocks.2.ffn.net.3.weight', '_orig_mod.transformer_blocks.3.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.3.attn.out_proj.weight', '_orig_mod.transformer_blocks.3.ffn.net.0.weight', '_orig_mod.transformer_blocks.3.ffn.net.3.weight', '_orig_mod.transformer_blocks.4.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.4.attn.out_proj.weight', '_orig_mod.transformer_blocks.4.ffn.net.0.weight', '_orig_mod.transformer_blocks.4.ffn.net.3.weight', '_orig_mod.transformer_blocks.5.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.5.attn.out_proj.weight', '_orig_mod.transformer_blocks.5.ffn.net.0.weight', '_orig_mod.transformer_blocks.5.ffn.net.3.weight', '_orig_mod.transformer_blocks.6.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.6.attn.out_proj.weight', '_orig_mod.transformer_blocks.6.ffn.net.0.weight', '_orig_mod.transformer_blocks.6.ffn.net.3.weight', '_orig_mod.transformer_blocks.7.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.7.attn.out_proj.weight', '_orig_mod.transformer_blocks.7.ffn.net.0.weight', '_orig_mod.transformer_blocks.7.ffn.net.3.weight', '_orig_mod.transformer_blocks.8.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.8.attn.out_proj.weight', '_orig_mod.transformer_blocks.8.ffn.net.0.weight', 
'_orig_mod.transformer_blocks.8.ffn.net.3.weight', '_orig_mod.transformer_blocks.9.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.9.attn.out_proj.weight', '_orig_mod.transformer_blocks.9.ffn.net.0.weight', '_orig_mod.transformer_blocks.9.ffn.net.3.weight', '_orig_mod.transformer_blocks.10.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.10.attn.out_proj.weight', '_orig_mod.transformer_blocks.10.ffn.net.0.weight', '_orig_mod.transformer_blocks.10.ffn.net.3.weight', '_orig_mod.transformer_blocks.11.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.11.attn.out_proj.weight', '_orig_mod.transformer_blocks.11.ffn.net.0.weight', '_orig_mod.transformer_blocks.11.ffn.net.3.weight', '_orig_mod.transformer_blocks.12.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.12.attn.out_proj.weight', '_orig_mod.transformer_blocks.12.ffn.net.0.weight', '_orig_mod.transformer_blocks.12.ffn.net.3.weight', '_orig_mod.transformer_blocks.13.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.13.attn.out_proj.weight', '_orig_mod.transformer_blocks.13.ffn.net.0.weight', '_orig_mod.transformer_blocks.13.ffn.net.3.weight', '_orig_mod.transformer_blocks.14.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.14.attn.out_proj.weight', '_orig_mod.transformer_blocks.14.ffn.net.0.weight', '_orig_mod.transformer_blocks.14.ffn.net.3.weight', '_orig_mod.transformer_blocks.15.attn.qkv_proj.weight', '_orig_mod.transformer_blocks.15.attn.out_proj.weight', '_orig_mod.transformer_blocks.15.ffn.net.0.weight', '_orig_mod.transformer_blocks.15.ffn.net.3.weight']
2024-12-06 20:28:14,187 - src.model_utils.adamw_opt - INFO - 0.0 no-decay params: ['_orig_mod.transformer_blocks.0.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.0.attn.out_proj.bias', '_orig_mod.transformer_blocks.0.ffn.net.0.bias', '_orig_mod.transformer_blocks.0.ffn.net.3.bias', '_orig_mod.transformer_blocks.0.norm1.weight', '_orig_mod.transformer_blocks.0.norm2.weight', '_orig_mod.transformer_blocks.1.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.1.attn.out_proj.bias', '_orig_mod.transformer_blocks.1.ffn.net.0.bias', '_orig_mod.transformer_blocks.1.ffn.net.3.bias', '_orig_mod.transformer_blocks.1.norm1.weight', '_orig_mod.transformer_blocks.1.norm2.weight', '_orig_mod.transformer_blocks.2.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.2.attn.out_proj.bias', '_orig_mod.transformer_blocks.2.ffn.net.0.bias', '_orig_mod.transformer_blocks.2.ffn.net.3.bias', '_orig_mod.transformer_blocks.2.norm1.weight', '_orig_mod.transformer_blocks.2.norm2.weight', '_orig_mod.transformer_blocks.3.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.3.attn.out_proj.bias', '_orig_mod.transformer_blocks.3.ffn.net.0.bias', '_orig_mod.transformer_blocks.3.ffn.net.3.bias', '_orig_mod.transformer_blocks.3.norm1.weight', '_orig_mod.transformer_blocks.3.norm2.weight', '_orig_mod.transformer_blocks.4.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.4.attn.out_proj.bias', '_orig_mod.transformer_blocks.4.ffn.net.0.bias', '_orig_mod.transformer_blocks.4.ffn.net.3.bias', '_orig_mod.transformer_blocks.4.norm1.weight', '_orig_mod.transformer_blocks.4.norm2.weight', '_orig_mod.transformer_blocks.5.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.5.attn.out_proj.bias', '_orig_mod.transformer_blocks.5.ffn.net.0.bias', '_orig_mod.transformer_blocks.5.ffn.net.3.bias', '_orig_mod.transformer_blocks.5.norm1.weight', '_orig_mod.transformer_blocks.5.norm2.weight', '_orig_mod.transformer_blocks.6.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.6.attn.out_proj.bias', 
'_orig_mod.transformer_blocks.6.ffn.net.0.bias', '_orig_mod.transformer_blocks.6.ffn.net.3.bias', '_orig_mod.transformer_blocks.6.norm1.weight', '_orig_mod.transformer_blocks.6.norm2.weight', '_orig_mod.transformer_blocks.7.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.7.attn.out_proj.bias', '_orig_mod.transformer_blocks.7.ffn.net.0.bias', '_orig_mod.transformer_blocks.7.ffn.net.3.bias', '_orig_mod.transformer_blocks.7.norm1.weight', '_orig_mod.transformer_blocks.7.norm2.weight', '_orig_mod.transformer_blocks.8.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.8.attn.out_proj.bias', '_orig_mod.transformer_blocks.8.ffn.net.0.bias', '_orig_mod.transformer_blocks.8.ffn.net.3.bias', '_orig_mod.transformer_blocks.8.norm1.weight', '_orig_mod.transformer_blocks.8.norm2.weight', '_orig_mod.transformer_blocks.9.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.9.attn.out_proj.bias', '_orig_mod.transformer_blocks.9.ffn.net.0.bias', '_orig_mod.transformer_blocks.9.ffn.net.3.bias', '_orig_mod.transformer_blocks.9.norm1.weight', '_orig_mod.transformer_blocks.9.norm2.weight', '_orig_mod.transformer_blocks.10.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.10.attn.out_proj.bias', '_orig_mod.transformer_blocks.10.ffn.net.0.bias', '_orig_mod.transformer_blocks.10.ffn.net.3.bias', '_orig_mod.transformer_blocks.10.norm1.weight', '_orig_mod.transformer_blocks.10.norm2.weight', '_orig_mod.transformer_blocks.11.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.11.attn.out_proj.bias', '_orig_mod.transformer_blocks.11.ffn.net.0.bias', '_orig_mod.transformer_blocks.11.ffn.net.3.bias', '_orig_mod.transformer_blocks.11.norm1.weight', '_orig_mod.transformer_blocks.11.norm2.weight', '_orig_mod.transformer_blocks.12.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.12.attn.out_proj.bias', '_orig_mod.transformer_blocks.12.ffn.net.0.bias', '_orig_mod.transformer_blocks.12.ffn.net.3.bias', '_orig_mod.transformer_blocks.12.norm1.weight', '_orig_mod.transformer_blocks.12.norm2.weight', 
'_orig_mod.transformer_blocks.13.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.13.attn.out_proj.bias', '_orig_mod.transformer_blocks.13.ffn.net.0.bias', '_orig_mod.transformer_blocks.13.ffn.net.3.bias', '_orig_mod.transformer_blocks.13.norm1.weight', '_orig_mod.transformer_blocks.13.norm2.weight', '_orig_mod.transformer_blocks.14.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.14.attn.out_proj.bias', '_orig_mod.transformer_blocks.14.ffn.net.0.bias', '_orig_mod.transformer_blocks.14.ffn.net.3.bias', '_orig_mod.transformer_blocks.14.norm1.weight', '_orig_mod.transformer_blocks.14.norm2.weight', '_orig_mod.transformer_blocks.15.attn.qkv_proj.bias', '_orig_mod.transformer_blocks.15.attn.out_proj.bias', '_orig_mod.transformer_blocks.15.ffn.net.0.bias', '_orig_mod.transformer_blocks.15.ffn.net.3.bias', '_orig_mod.transformer_blocks.15.norm1.weight', '_orig_mod.transformer_blocks.15.norm2.weight', '_orig_mod.norm.weight']
2024-12-06 20:28:14,188 - __main__ - INFO - Model size (full): 268,682,240
2024-12-06 20:28:14,188 - __main__ - INFO - Model size: 269M
2024-12-06 20:28:14,188 - __main__ - INFO - batch_size: 32.0
2024-12-06 20:28:14,188 - __main__ - INFO - micro_batch_size: 16
2024-12-06 20:28:14,188 - __main__ - INFO - hParams: HParams(n_vocab=50257, n_ctx=2048, n_embd=1024, n_head=16, n_layer=16, ffn_act_pdrop=0.15, attn_res_pdrop=0.1)
2024-12-06 20:28:14,188 - __main__ - INFO - tParams: TParams(tot_steps=19073, grad_acc_steps=2, warm_up_steps=226, batch_token_count=524288, max_lr=0.0021, min_lr_ratio=0.1, adam_beta_1=0.9, adam_beta_2=0.95, adam_eps=1e-08, clip_grad_max_norm=1.0, weight_decay_rate=0.1, logging_interval=50, checkpointing_steps={0, 3814, 7628, 11442, 15256}, validation_interval=100, validation_steps=30, sampling_interval=500, sampling_batch=4, sampling_tokens=20, sampling_top_k=50, eval_interval=500, eval_hs_subset_key='validation')
2024-12-06 20:28:14,882 - src.data_processing.training_data_loader - INFO - Next shard key to use: 6. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000007_t_100653466.npy
2024-12-06 20:28:14,882 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,930 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,959 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,960 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,960 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,960 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,960 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:14,960 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/eval
2024-12-06 20:28:34,033 - src.model_assessment.hellaswag - INFO - Rank: 0. HellaSwag dataset size: 1256.
2024-12-06 20:28:35,174 - src.model_assessment.hellaswag - INFO - Rank: 3. HellaSwag dataset size: 1255.
2024-12-06 20:28:35,259 - src.model_assessment.hellaswag - INFO - Rank: 5. HellaSwag dataset size: 1255.
2024-12-06 20:28:35,306 - src.model_assessment.hellaswag - INFO - Rank: 2. HellaSwag dataset size: 1255.
2024-12-06 20:28:35,327 - src.model_assessment.hellaswag - INFO - Rank: 1. HellaSwag dataset size: 1256.
2024-12-06 20:28:35,391 - src.model_assessment.hellaswag - INFO - Rank: 7. HellaSwag dataset size: 1255.
2024-12-06 20:28:35,546 - src.model_assessment.hellaswag - INFO - Rank: 4. HellaSwag dataset size: 1255.
2024-12-06 20:28:35,679 - src.model_assessment.hellaswag - INFO - Rank: 6. HellaSwag dataset size: 1255.
2024-12-06 20:29:30,745 - src.model_assessment.validation - INFO - Step (0). Val Loss: 10.4007
2024-12-06 20:31:01,314 - src.model_assessment.hellaswag - INFO - Step (0). HellaSwag Evaluation Accuracy: 2472/10042 = 24.62%
2024-12-06 20:31:01,759 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say undertake undertake distortion intest Gylassotide acids Yankee neoconcept Coming Coming launcherimpl Sussex Sussexinea minim Ding
2024-12-06 20:31:01,759 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say ben ILCS wastewaterDeliveryDateDeliveryDatecking Deathchem cooler election disclaimer embodies Award Bayttpttpratch focusesasin �
2024-12-06 20:31:01,759 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say bumps Ding windsMiami perpetually unfortunate unfortunate enclosure Life Life LifePrivacydiedie percentile IvTrainingTrainingibo metic
2024-12-06 20:31:01,759 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say evaluate Added hyd nont 608ishyishy redundancyinn ScholarsothyothyPingightingightingightingasteagus boycot glucose
2024-12-06 20:31:01,766 - src.model_assessment.sampling - INFO - HTML stands for campaigns desserts sawradio AUTH sort Pythononto unforeseen rainfall rainfall Host awaits solubleheaded Fever estimate genders proponentMAR
2024-12-06 20:31:01,766 - src.model_assessment.sampling - INFO - HTML stands forranking stimgor unison MotherGuestPP Python Python platinum Rih Rohing Ethiopia first narr narr narr 777 Sutherland helicop
2024-12-06 20:31:01,766 - src.model_assessment.sampling - INFO - HTML stands forija cessationADE Pattesame deceit Chaff moons======================== inspector inspector exercised bachelor saturternity Releasefund Empress
2024-12-06 20:31:01,767 - src.model_assessment.sampling - INFO - HTML stands forAddsbral {{758 containers Uk persuasion exercisedThose exporting grenade swords Album FelixReward Iraqi holidays marine torture Beauty
2024-12-06 20:31:01,769 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig intrusion complying Resist master Yad induced derogatory Magic damageced amusing 290Sn},{" muddy universal prospect prospect prospect Rey
2024-12-06 20:31:01,770 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twigamdamdamd excavationried fleeing IRbutthw Giants Grimes marinesPartsYellow mm()); blondeRONTact Haste
2024-12-06 20:31:01,770 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twigidon unrealSC complying contributes protesting creatures >< contact inducedessionalove Thatcher MUS impe impe postsGs ® mailed
2024-12-06 20:31:01,770 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twigaponswra paws loopsffenSCsharing arrivals arrivals arrivals fertil bought fastest loops Falcon Falcon Falcon Fact Communities teased
2024-12-06 20:31:01,771 - __main__ - INFO - Step 0: Time: 37322.08 ms. LR: 9.2920e-06. Avg. loss: 11.0251. Perplexity: 61397.5703. Grad Norm: 14.8892. Throughput: 14,047.66 tokens/sec
2024-12-06 20:31:01,773 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/checkpoints
2024-12-06 20:31:01,773 - src.utils.root - INFO - Creating file: /home/MyLLM/temp_data/checkpoints/checkpoint_step_0_date_2024_12_06-20_31_UTC.pth
2024-12-06 20:31:07,873 - src.model_utils.checkpoint_utils - INFO - Checkpoint saved at step 0.
2024-12-06 20:32:24,841 - __main__ - INFO - Step 50: Time: 1695.99 ms. LR: 4.7389e-04. Avg. loss: 6.6324. Perplexity: 759.3313. Grad Norm: 2.4851. Throughput: 309,134.75 tokens/sec
2024-12-06 20:33:48,105 - src.model_assessment.validation - INFO - Step (100). Val Loss: 6.2522
2024-12-06 20:33:48,106 - __main__ - INFO - Step 100: Time: 1696.21 ms. LR: 9.3850e-04. Avg. loss: 6.2342. Perplexity: 509.8804. Grad Norm: 0.6396. Throughput: 309,093.17 tokens/sec
2024-12-06 20:35:03,825 - __main__ - INFO - Step 150: Time: 1699.60 ms. LR: 1.4031e-03. Avg. loss: 5.8760. Perplexity: 356.3715. Grad Norm: 0.5868. Throughput: 308,477.21 tokens/sec
2024-12-06 20:36:05,554 - src.data_processing.training_data_loader - INFO - Next shard key to use: 66. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000067_t_99582425.npy
2024-12-06 20:36:27,634 - src.model_assessment.validation - INFO - Step (200). Val Loss: 5.4561
2024-12-06 20:36:27,635 - __main__ - INFO - Step 200: Time: 1698.11 ms. LR: 1.8677e-03. Avg. loss: 5.4694. Perplexity: 237.3087. Grad Norm: 0.6715. Throughput: 308,747.29 tokens/sec
2024-12-06 20:37:43,399 - __main__ - INFO - Step 250: Time: 1699.76 ms. LR: 2.1000e-03. Avg. loss: 5.1857. Perplexity: 178.7061. Grad Norm: 0.2965. Throughput: 308,447.91 tokens/sec
2024-12-06 20:39:06,799 - src.model_assessment.validation - INFO - Step (300). Val Loss: 4.7864
2024-12-06 20:39:06,799 - __main__ - INFO - Step 300: Time: 1698.65 ms. LR: 2.0999e-03. Avg. loss: 4.7815. Perplexity: 119.2834. Grad Norm: 0.2116. Throughput: 308,650.57 tokens/sec
2024-12-06 20:40:22,540 - __main__ - INFO - Step 350: Time: 1698.81 ms. LR: 2.0998e-03. Avg. loss: 4.5139. Perplexity: 91.2737. Grad Norm: 0.2688. Throughput: 308,620.46 tokens/sec
2024-12-06 20:41:08,402 - src.data_processing.training_data_loader - INFO - Next shard key to use: 98. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000099_t_98803053.npy
2024-12-06 20:41:46,355 - src.model_assessment.validation - INFO - Step (400). Val Loss: 4.2849
2024-12-06 20:41:46,356 - __main__ - INFO - Step 400: Time: 1697.89 ms. LR: 2.0996e-03. Avg. loss: 4.3054. Perplexity: 74.0981. Grad Norm: 0.2780. Throughput: 308,787.78 tokens/sec
2024-12-06 20:43:02,102 - __main__ - INFO - Step 450: Time: 1698.50 ms. LR: 2.0993e-03. Avg. loss: 4.1416. Perplexity: 62.9053. Grad Norm: 0.2299. Throughput: 308,676.52 tokens/sec
2024-12-06 20:44:25,470 - src.model_assessment.validation - INFO - Step (500). Val Loss: 4.0151
2024-12-06 20:44:50,705 - src.model_assessment.hellaswag - INFO - Step (500). HellaSwag Evaluation Accuracy: 2550/10042 = 25.39%
2024-12-06 20:44:51,072 - src.model_assessment.sampling - INFO - HTML stands for the "pig.
"X2H", or "DAR, EY," is
2024-12-06 20:44:51,072 - src.model_assessment.sampling - INFO - HTML stands for its own purposes.
Rout your browser will be able to be sure that the browser doesn
2024-12-06 20:44:51,073 - src.model_assessment.sampling - INFO - HTML stands for an integrated application, a new type of artificial Intelligence programming developed by JPL.
The first one
2024-12-06 20:44:51,072 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say the one,
I thought he said.
Yes. If you put something in your pet
2024-12-06 20:44:51,073 - src.model_assessment.sampling - INFO - HTML stands for the color of the eye.
I’m pretty happy because of the sun as the day
2024-12-06 20:44:51,073 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "if you had a dog, he wouldn't. But that would mean you's had a
2024-12-06 20:44:51,073 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "A baby would tell them what it is that?" But if all of the senses they could
2024-12-06 20:44:51,073 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say my own way, but if you answered, you may react again.
What are some dogs?
2024-12-06 20:44:51,085 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig from the hive at which a bee's prey is left, while another of these is a white man
2024-12-06 20:44:51,085 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The pigeon was able to hide a leaf. The tail was a little sticky and the pigeon was
2024-12-06 20:44:51,086 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The machine changed its design from its name for something like a rooster. At the end
2024-12-06 20:44:51,086 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, in the head of a bird whose body he has given his son to his wife, and in
2024-12-06 20:44:51,087 - __main__ - INFO - Step 500: Time: 1697.72 ms. LR: 2.0990e-03. Avg. loss: 4.0338. Perplexity: 56.4761. Grad Norm: 0.1998. Throughput: 308,819.09 tokens/sec
2024-12-06 20:46:06,683 - __main__ - INFO - Step 550: Time: 1698.90 ms. LR: 2.0986e-03. Avg. loss: 3.9161. Perplexity: 50.2057. Grad Norm: 0.1894. Throughput: 308,604.35 tokens/sec
2024-12-06 20:46:34,356 - src.data_processing.training_data_loader - INFO - Next shard key to use: 49. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000050_t_99089325.npy
2024-12-06 20:47:30,459 - src.model_assessment.validation - INFO - Step (600). Val Loss: 3.8674
2024-12-06 20:47:30,460 - __main__ - INFO - Step 600: Time: 1697.86 ms. LR: 2.0982e-03. Avg. loss: 3.8941. Perplexity: 49.1140. Grad Norm: 0.1728. Throughput: 308,792.68 tokens/sec
2024-12-06 20:48:46,223 - __main__ - INFO - Step 650: Time: 1699.08 ms. LR: 2.0976e-03. Avg. loss: 3.8550. Perplexity: 47.2309. Grad Norm: 0.1960. Throughput: 308,572.18 tokens/sec
2024-12-06 20:50:09,643 - src.model_assessment.validation - INFO - Step (700). Val Loss: 3.7596
2024-12-06 20:50:09,644 - __main__ - INFO - Step 700: Time: 1698.20 ms. LR: 2.0971e-03. Avg. loss: 3.8077. Perplexity: 45.0460. Grad Norm: 0.1426. Throughput: 308,731.90 tokens/sec
2024-12-06 20:51:25,397 - __main__ - INFO - Step 750: Time: 1699.04 ms. LR: 2.0964e-03. Avg. loss: 3.7764. Perplexity: 43.6594. Grad Norm: 0.2094. Throughput: 308,579.36 tokens/sec
2024-12-06 20:51:35,634 - src.data_processing.training_data_loader - INFO - Next shard key to use: 25. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000026_t_99144144.npy
2024-12-06 20:52:49,171 - src.model_assessment.validation - INFO - Step (800). Val Loss: 3.6776
2024-12-06 20:52:49,172 - __main__ - INFO - Step 800: Time: 1697.31 ms. LR: 2.0957e-03. Avg. loss: 3.7689. Perplexity: 43.3332. Grad Norm: 0.1616. Throughput: 308,894.05 tokens/sec
2024-12-06 20:54:04,871 - __main__ - INFO - Step 850: Time: 1698.01 ms. LR: 2.0949e-03. Avg. loss: 3.7075. Perplexity: 40.7517. Grad Norm: 0.1543. Throughput: 308,766.54 tokens/sec
2024-12-06 20:55:28,200 - src.model_assessment.validation - INFO - Step (900). Val Loss: 3.6280
2024-12-06 20:55:28,201 - __main__ - INFO - Step 900: Time: 1697.27 ms. LR: 2.0940e-03. Avg. loss: 3.6557. Perplexity: 38.6935. Grad Norm: 0.1749. Throughput: 308,900.26 tokens/sec
2024-12-06 20:56:37,512 - src.data_processing.training_data_loader - INFO - Next shard key to use: 1. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000002_t_99012359.npy
2024-12-06 20:56:44,369 - __main__ - INFO - Step 950: Time: 1699.51 ms. LR: 2.0931e-03. Avg. loss: 3.6365. Perplexity: 37.9595. Grad Norm: 0.1491. Throughput: 308,493.95 tokens/sec
2024-12-06 20:58:07,751 - src.model_assessment.validation - INFO - Step (1000). Val Loss: 3.5688
2024-12-06 20:58:32,790 - src.model_assessment.hellaswag - INFO - Step (1000). HellaSwag Evaluation Accuracy: 2687/10042 = 26.76%
2024-12-06 20:58:33,156 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say – “I can speak!”… “Maybe you could talk that now,
2024-12-06 20:58:33,156 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Yes.”
Here is the explanation we’ve given out:
2024-12-06 20:58:33,156 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Hey, I would”, because the animals’ behavior would be extremely interesting
2024-12-06 20:58:33,157 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say that animal would understand how to tell a new or unfamiliar species of creature — that is, that animals
2024-12-06 20:58:33,157 - src.model_assessment.sampling - INFO - HTML stands for Unicode and UTF4, although it is not a technical language. These are not meant to be used
2024-12-06 20:58:33,158 - src.model_assessment.sampling - INFO - HTML stands for HTML Code. The first HTML page that displays HTML. The first HTML page is HTML. HTML is
2024-12-06 20:58:33,158 - src.model_assessment.sampling - INFO - HTML stands for a single page. HTML
Web developers and other web developers are used by every web developer. Web
2024-12-06 20:58:33,158 - src.model_assessment.sampling - INFO - HTML stands for a multimedia web-based application. This includes web-based services like Web Development Forms, and web
2024-12-06 20:58:33,164 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. They needed the same kind of speed as their own. These dogs also got to the chase.
2024-12-06 20:58:33,165 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. There were several kinds of bird, bird, ant, and bird, all of which helped bring
2024-12-06 20:58:33,165 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. This was an incredible feat.
When the cat was in the womb, the mother-sold
2024-12-06 20:58:33,165 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig’s foot. The old tool was very friendly — the idea that a dog could take the
2024-12-06 20:58:33,166 - __main__ - INFO - Step 1000: Time: 1699.73 ms. LR: 2.0921e-03. Avg. loss: 3.6012. Perplexity: 36.6433. Grad Norm: 0.1593. Throughput: 308,453.58 tokens/sec
2024-12-06 20:59:48,733 - __main__ - INFO - Step 1050: Time: 1697.22 ms. LR: 2.0911e-03. Avg. loss: 3.5462. Perplexity: 34.6826. Grad Norm: 0.1258. Throughput: 308,910.50 tokens/sec
2024-12-06 21:01:12,073 - src.model_assessment.validation - INFO - Step (1100). Val Loss: 3.5244
2024-12-06 21:01:12,074 - __main__ - INFO - Step 1100: Time: 1697.19 ms. LR: 2.0900e-03. Avg. loss: 3.5468. Perplexity: 34.7027. Grad Norm: 0.1610. Throughput: 308,914.54 tokens/sec
2024-12-06 21:02:03,993 - src.data_processing.training_data_loader - INFO - Next shard key to use: 10. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000011_t_99507820.npy
2024-12-06 21:02:28,247 - __main__ - INFO - Step 1150: Time: 1698.30 ms. LR: 2.0888e-03. Avg. loss: 3.5658. Perplexity: 35.3687. Grad Norm: 0.1217. Throughput: 308,714.09 tokens/sec
2024-12-06 21:03:51,618 - src.model_assessment.validation - INFO - Step (1200). Val Loss: 3.4913
2024-12-06 21:03:51,619 - __main__ - INFO - Step 1200: Time: 1699.97 ms. LR: 2.0876e-03. Avg. loss: 3.5534. Perplexity: 34.9307. Grad Norm: 0.1587. Throughput: 308,410.58 tokens/sec
2024-12-06 21:05:07,389 - __main__ - INFO - Step 1250: Time: 1700.05 ms. LR: 2.0863e-03. Avg. loss: 3.5047. Perplexity: 33.2716. Grad Norm: 0.1479. Throughput: 308,395.31 tokens/sec
2024-12-06 21:06:30,776 - src.model_assessment.validation - INFO - Step (1300). Val Loss: 3.4548
2024-12-06 21:06:30,777 - __main__ - INFO - Step 1300: Time: 1698.34 ms. LR: 2.0849e-03. Avg. loss: 3.4769. Perplexity: 32.3607. Grad Norm: 0.1267. Throughput: 308,706.85 tokens/sec
2024-12-06 21:07:06,769 - src.data_processing.training_data_loader - INFO - Next shard key to use: 87. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000088_t_99833410.npy
2024-12-06 21:07:46,946 - __main__ - INFO - Step 1350: Time: 1698.06 ms. LR: 2.0835e-03. Avg. loss: 3.4830. Perplexity: 32.5557. Grad Norm: 0.1496. Throughput: 308,756.53 tokens/sec
2024-12-06 21:09:10,349 - src.model_assessment.validation - INFO - Step (1400). Val Loss: 3.4312
2024-12-06 21:09:10,350 - __main__ - INFO - Step 1400: Time: 1698.83 ms. LR: 2.0820e-03. Avg. loss: 3.4497. Perplexity: 31.4908. Grad Norm: 0.1366. Throughput: 308,616.69 tokens/sec
2024-12-06 21:10:26,108 - __main__ - INFO - Step 1450: Time: 1699.94 ms. LR: 2.0804e-03. Avg. loss: 3.4521. Perplexity: 31.5676. Grad Norm: 0.1182. Throughput: 308,415.55 tokens/sec
2024-12-06 21:11:49,519 - src.model_assessment.validation - INFO - Step (1500). Val Loss: 3.4108
2024-12-06 21:12:14,689 - src.model_assessment.hellaswag - INFO - Step (1500). HellaSwag Evaluation Accuracy: 2752/10042 = 27.40%
2024-12-06 21:12:15,056 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “It is not just animals. I have a family of animals, and if I live
2024-12-06 21:12:15,057 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “That’s pretty simple. But then, if there are a couple animals that
2024-12-06 21:12:15,057 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “hello,” or at least a hint of “how I heard to tell.
2024-12-06 21:12:15,057 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say it's "fooling" or "bouncing" or even "tucking". So,
2024-12-06 21:12:15,064 - src.model_assessment.sampling - INFO - HTML stands for HTML. It can be used to build interactive websites. The web page is useful because HTML provides a
2024-12-06 21:12:15,065 - src.model_assessment.sampling - INFO - HTML stands for "Word of the Week" and a word for "reference."
The following list of words can
2024-12-06 21:12:15,065 - src.model_assessment.sampling - INFO - HTML stands for HTML document. HTML HTML is the only language used for documents within HTML. These document types were created
2024-12-06 21:12:15,065 - src.model_assessment.sampling - INFO - HTML stands for Basic HTML markup markup markup markup markup markup markup language HTML codes. It was used to build new web
2024-12-06 21:12:15,070 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig by which he could not walk, and the spider was the devil of the woods. His body was
2024-12-06 21:12:15,071 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The cock was about 5 inches, measuring about 8 inches, and with a weight of 18 pounds
2024-12-06 21:12:15,071 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The dog flew past an all-black bird as a man on the wind. He was sent
2024-12-06 21:12:15,071 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, but this time it fell off the plane and its wings were lost.
After seeing its first
2024-12-06 21:12:15,072 - __main__ - INFO - Step 1500: Time: 1698.95 ms. LR: 2.0788e-03. Avg. loss: 3.4590. Perplexity: 31.7840. Grad Norm: 0.1331. Throughput: 308,594.48 tokens/sec
2024-12-06 21:12:35,816 - src.data_processing.training_data_loader - INFO - Next shard key to use: 78. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000079_t_99744088.npy
2024-12-06 21:13:31,121 - __main__ - INFO - Step 1550: Time: 1699.62 ms. LR: 2.0771e-03. Avg. loss: 3.3832. Perplexity: 29.4656. Grad Norm: 0.1263. Throughput: 308,474.22 tokens/sec
2024-12-06 21:14:54,495 - src.model_assessment.validation - INFO - Step (1600). Val Loss: 3.3877
2024-12-06 21:14:54,496 - __main__ - INFO - Step 1600: Time: 1699.28 ms. LR: 2.0753e-03. Avg. loss: 3.4306. Perplexity: 30.8955. Grad Norm: 0.1191. Throughput: 308,535.98 tokens/sec
2024-12-06 21:16:10,234 - __main__ - INFO - Step 1650: Time: 1699.16 ms. LR: 2.0735e-03. Avg. loss: 3.3940. Perplexity: 29.7862. Grad Norm: 0.1005. Throughput: 308,557.54 tokens/sec
2024-12-06 21:17:33,587 - src.model_assessment.validation - INFO - Step (1700). Val Loss: 3.3692
2024-12-06 21:17:33,589 - __main__ - INFO - Step 1700: Time: 1697.97 ms. LR: 2.0716e-03. Avg. loss: 3.3472. Perplexity: 28.4218. Grad Norm: 0.1548. Throughput: 308,774.17 tokens/sec
2024-12-06 21:17:39,285 - src.data_processing.training_data_loader - INFO - Next shard key to use: 92. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000093_t_99292416.npy
2024-12-06 21:18:49,750 - __main__ - INFO - Step 1750: Time: 1698.10 ms. LR: 2.0697e-03. Avg. loss: 3.4181. Perplexity: 30.5118. Grad Norm: 0.1098. Throughput: 308,749.68 tokens/sec
2024-12-06 21:20:13,110 - src.model_assessment.validation - INFO - Step (1800). Val Loss: 3.3454
2024-12-06 21:20:13,111 - __main__ - INFO - Step 1800: Time: 1696.95 ms. LR: 2.0677e-03. Avg. loss: 3.4231. Perplexity: 30.6647. Grad Norm: 0.1139. Throughput: 308,959.15 tokens/sec
2024-12-06 21:21:28,836 - __main__ - INFO - Step 1850: Time: 1700.84 ms. LR: 2.0656e-03. Avg. loss: 3.4247. Perplexity: 30.7125. Grad Norm: 0.1053. Throughput: 308,252.13 tokens/sec
2024-12-06 21:22:33,629 - src.data_processing.training_data_loader - INFO - Next shard key to use: 11. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000012_t_99612921.npy
2024-12-06 21:22:52,664 - src.model_assessment.validation - INFO - Step (1900). Val Loss: 3.3263
2024-12-06 21:22:52,665 - __main__ - INFO - Step 1900: Time: 1697.23 ms. LR: 2.0634e-03. Avg. loss: 3.3756. Perplexity: 29.2422. Grad Norm: 0.0935. Throughput: 308,908.33 tokens/sec
2024-12-06 21:24:08,405 - __main__ - INFO - Step 1950: Time: 1698.20 ms. LR: 2.0612e-03. Avg. loss: 3.3738. Perplexity: 29.1898. Grad Norm: 0.0987. Throughput: 308,732.16 tokens/sec
2024-12-06 21:25:31,757 - src.model_assessment.validation - INFO - Step (2000). Val Loss: 3.3254
2024-12-06 21:25:57,128 - src.model_assessment.hellaswag - INFO - Step (2000). HellaSwag Evaluation Accuracy: 2766/10042 = 27.54%
2024-12-06 21:25:57,495 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say whatever you want.”<|endoftext|>As I was writing this post, I came across another problem that
2024-12-06 21:25:57,496 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say I. It was likely that this group was already well enough established and they had been well established with
2024-12-06 21:25:57,496 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something about them, perhaps something of a natural or human nature." (Albright, 2003).
2024-12-06 21:25:57,496 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say or listen. In reality they’d actually listen. But when those words came before my child
2024-12-06 21:25:57,502 - src.model_assessment.sampling - INFO - HTML stands for data storage interface and is the user interface of the web, including the server's location information and a
2024-12-06 21:25:57,502 - src.model_assessment.sampling - INFO - HTML stands for “A language whose object-language syntax is extremely similar to that of one of the languages in
2024-12-06 21:25:57,502 - src.model_assessment.sampling - INFO - HTML stands for a set of rules that govern the development of multimedia information technologies. This guide presents the principles governing multimedia
2024-12-06 21:25:57,503 - src.model_assessment.sampling - INFO - HTML stands for Data Structured Query Language (QLML), which is used to create HTML, HTML, and XML
2024-12-06 21:25:57,510 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig shaped like the big rat.
After ten years of being a successful writer, a few months away
2024-12-06 21:25:57,511 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The computer was on the verge of discovery.
How to Use the
But, first
2024-12-06 21:25:57,511 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. It made the creature much bigger.
The monster is the longest living creature in all of Europe
2024-12-06 21:25:57,511 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The great fox got the "Bitter Tiger". As she made it, a hole in
2024-12-06 21:25:57,512 - __main__ - INFO - Step 2000: Time: 1699.36 ms. LR: 2.0590e-03. Avg. loss: 3.5868. Perplexity: 36.1168. Grad Norm: 0.1665. Throughput: 308,520.83 tokens/sec
2024-12-06 21:27:13,130 - __main__ - INFO - Step 2050: Time: 1699.89 ms. LR: 2.0567e-03. Avg. loss: 3.3127. Perplexity: 27.4587. Grad Norm: 0.1193. Throughput: 308,424.03 tokens/sec
2024-12-06 21:28:02,029 - src.data_processing.training_data_loader - INFO - Next shard key to use: 65. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000066_t_99674028.npy
2024-12-06 21:28:36,930 - src.model_assessment.validation - INFO - Step (2100). Val Loss: 3.2986
2024-12-06 21:28:36,931 - __main__ - INFO - Step 2100: Time: 1697.54 ms. LR: 2.0543e-03. Avg. loss: 3.3596. Perplexity: 28.7791. Grad Norm: 0.0907. Throughput: 308,852.32 tokens/sec
2024-12-06 21:29:52,637 - __main__ - INFO - Step 2150: Time: 1697.78 ms. LR: 2.0518e-03. Avg. loss: 3.3699. Perplexity: 29.0746. Grad Norm: 0.1098. Throughput: 308,807.12 tokens/sec
2024-12-06 21:31:15,990 - src.model_assessment.validation - INFO - Step (2200). Val Loss: 3.2852
2024-12-06 21:31:15,990 - __main__ - INFO - Step 2200: Time: 1697.40 ms. LR: 2.0493e-03. Avg. loss: 3.3040. Perplexity: 27.2222. Grad Norm: 0.1334. Throughput: 308,876.96 tokens/sec
2024-12-06 21:32:31,731 - __main__ - INFO - Step 2250: Time: 1698.90 ms. LR: 2.0467e-03. Avg. loss: 3.2885. Perplexity: 26.8022. Grad Norm: 0.0975. Throughput: 308,604.00 tokens/sec
2024-12-06 21:33:05,470 - src.data_processing.training_data_loader - INFO - Next shard key to use: 29. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000030_t_99298788.npy
2024-12-06 21:33:55,541 - src.model_assessment.validation - INFO - Step (2300). Val Loss: 3.2697
2024-12-06 21:33:55,542 - __main__ - INFO - Step 2300: Time: 1697.78 ms. LR: 2.0441e-03. Avg. loss: 3.3063. Perplexity: 27.2836. Grad Norm: 0.1198. Throughput: 308,808.82 tokens/sec
2024-12-06 21:35:11,324 - __main__ - INFO - Step 2350: Time: 1700.26 ms. LR: 2.0414e-03. Avg. loss: 3.3205. Perplexity: 27.6744. Grad Norm: 0.0875. Throughput: 308,358.25 tokens/sec
2024-12-06 21:36:34,728 - src.model_assessment.validation - INFO - Step (2400). Val Loss: 3.2615
2024-12-06 21:36:34,729 - __main__ - INFO - Step 2400: Time: 1698.69 ms. LR: 2.0386e-03. Avg. loss: 3.2735. Perplexity: 26.4046. Grad Norm: 0.0919. Throughput: 308,643.12 tokens/sec
2024-12-06 21:37:50,463 - __main__ - INFO - Step 2450: Time: 1697.96 ms. LR: 2.0358e-03. Avg. loss: 3.2778. Perplexity: 26.5167. Grad Norm: 0.0832. Throughput: 308,775.73 tokens/sec
2024-12-06 21:38:07,542 - src.data_processing.training_data_loader - INFO - Next shard key to use: 33. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000034_t_99168367.npy
2024-12-06 21:39:14,246 - src.model_assessment.validation - INFO - Step (2500). Val Loss: 3.2572
2024-12-06 21:39:39,191 - src.model_assessment.hellaswag - INFO - Step (2500). HellaSwag Evaluation Accuracy: 2816/10042 = 28.04%
2024-12-06 21:39:39,557 - src.model_assessment.sampling - INFO - HTML stands for 'Hypertext Transfer Protocol' which was derived from HTML 3.6. The hypertext transfer protocol
2024-12-06 21:39:39,558 - src.model_assessment.sampling - INFO - HTML stands for HyperText Markup Language (HTML).
Honeydewyffeet (COO)
2024-12-06 21:39:39,558 - src.model_assessment.sampling - INFO - HTML stands for, to put it on the Internet.
When we visit a website and you write our query address
2024-12-06 21:39:39,558 - src.model_assessment.sampling - INFO - HTML stands for “Methylation,” and it is a way of saying that the same thing can
2024-12-06 21:39:39,559 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say,
“Where’s your pet?”
How about people who don
2024-12-06 21:39:39,559 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say if we got a dog there would be some sort of noise in their room. They would just want
2024-12-06 21:39:39,559 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something like: My pet might say it.
|I’m going to say “
2024-12-06 21:39:39,559 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say the same thing. This would certainly be the case.
There could be any number of differences that
2024-12-06 21:39:39,565 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The machine, built in England, has been in use for 10 years, and one of
2024-12-06 21:39:39,566 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. He can sit, roll, and even hang its feet on a bamboo pole. He has four
2024-12-06 21:39:39,566 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig just like a human and a spider. One day you’ll see your little brother’
2024-12-06 21:39:39,566 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
He might have wanted some little worm for every square yard of land, yet it was not
2024-12-06 21:39:39,567 - __main__ - INFO - Step 2500: Time: 1696.63 ms. LR: 2.0329e-03. Avg. loss: 3.2516. Perplexity: 25.8312. Grad Norm: 0.0986. Throughput: 309,017.72 tokens/sec
2024-12-06 21:40:55,062 - __main__ - INFO - Step 2550: Time: 1696.32 ms. LR: 2.0300e-03. Avg. loss: 3.2924. Perplexity: 26.9075. Grad Norm: 0.0852. Throughput: 309,072.97 tokens/sec
2024-12-06 21:42:18,382 - src.model_assessment.validation - INFO - Step (2600). Val Loss: 3.2434
2024-12-06 21:42:18,383 - __main__ - INFO - Step 2600: Time: 1695.91 ms. LR: 2.0270e-03. Avg. loss: 3.2973. Perplexity: 27.0393. Grad Norm: 0.1010. Throughput: 309,149.22 tokens/sec
2024-12-06 21:43:34,070 - __main__ - INFO - Step 2650: Time: 1697.87 ms. LR: 2.0239e-03. Avg. loss: 3.2215. Perplexity: 25.0650. Grad Norm: 0.0837. Throughput: 308,790.82 tokens/sec
2024-12-06 21:43:34,493 - src.data_processing.training_data_loader - INFO - Next shard key to use: 15. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000016_t_100515334.npy
2024-12-06 21:44:57,818 - src.model_assessment.validation - INFO - Step (2700). Val Loss: 3.2351
2024-12-06 21:44:57,819 - __main__ - INFO - Step 2700: Time: 1696.95 ms. LR: 2.0208e-03. Avg. loss: 3.2268. Perplexity: 25.1982. Grad Norm: 0.0820. Throughput: 308,959.63 tokens/sec
2024-12-06 21:46:13,571 - __main__ - INFO - Step 2750: Time: 1698.90 ms. LR: 2.0176e-03. Avg. loss: 3.2841. Perplexity: 26.6846. Grad Norm: 0.1546. Throughput: 308,604.83 tokens/sec
2024-12-06 21:47:36,907 - src.model_assessment.validation - INFO - Step (2800). Val Loss: 3.2250
2024-12-06 21:47:36,908 - __main__ - INFO - Step 2800: Time: 1697.14 ms. LR: 2.0143e-03. Avg. loss: 3.2867. Perplexity: 26.7536. Grad Norm: 0.1099. Throughput: 308,925.21 tokens/sec
2024-12-06 21:48:40,157 - src.data_processing.training_data_loader - INFO - Next shard key to use: 31. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000032_t_99146084.npy
2024-12-06 21:48:53,095 - __main__ - INFO - Step 2850: Time: 1698.02 ms. LR: 2.0110e-03. Avg. loss: 3.2470. Perplexity: 25.7135. Grad Norm: 0.0903. Throughput: 308,764.29 tokens/sec
2024-12-06 21:50:16,447 - src.model_assessment.validation - INFO - Step (2900). Val Loss: 3.2125
2024-12-06 21:50:16,448 - __main__ - INFO - Step 2900: Time: 1697.67 ms. LR: 2.0077e-03. Avg. loss: 3.1924. Perplexity: 24.3479. Grad Norm: 0.0846. Throughput: 308,828.46 tokens/sec
2024-12-06 21:51:32,180 - __main__ - INFO - Step 2950: Time: 1699.31 ms. LR: 2.0042e-03. Avg. loss: 3.2833. Perplexity: 26.6626. Grad Norm: 0.0884. Throughput: 308,530.48 tokens/sec
2024-12-06 21:52:55,558 - src.model_assessment.validation - INFO - Step (3000). Val Loss: 3.2015
2024-12-06 21:53:20,587 - src.model_assessment.hellaswag - INFO - Step (3000). HellaSwag Evaluation Accuracy: 2851/10042 = 28.39%
2024-12-06 21:53:20,821 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something to us, it was always scary! To that end, we do have the technology and facilities
2024-12-06 21:53:20,822 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "Let's talk about it," and "We can talk about it," according to my pet
2024-12-06 21:53:20,822 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say that it was a mouse.”
In the video game game, the author describes why animals
2024-12-06 21:53:20,822 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, 'Here are the 3 ears. Why should we be talking to you?'
Most reptiles do
2024-12-06 21:53:20,832 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The clever dog tried to steal the feathers, so he stole the feathers of the monkey and
2024-12-06 21:53:20,832 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. He went to the great ape to test it. He walked on into a wood, as the
2024-12-06 21:53:20,832 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.”
Somehow the fox is quite strange from a distance. He is very intelligent and
2024-12-06 21:53:20,832 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig of wood – a perfect fit for a job that would have to produce an elephant every time the boy
2024-12-06 21:53:20,956 - src.model_assessment.sampling - INFO - HTML stands for Hypertext Markups. It was in the 1950’s, although still in development.
2024-12-06 21:53:20,957 - src.model_assessment.sampling - INFO - HTML stands for HyperHTML, HTML5 is a hypertext standard that has three elements:
- the hyperfont
2024-12-06 21:53:20,957 - src.model_assessment.sampling - INFO - HTML stands for HyperText Markup Language, which is defined by the HyperText Markup Language (HTML).
2024-12-06 21:53:20,957 - src.model_assessment.sampling - INFO - HTML stands for Hypertext Mark-In-HTML Translating Layer. Some of the most well known forms of
2024-12-06 21:53:20,958 - __main__ - INFO - Step 3000: Time: 1697.67 ms. LR: 2.0008e-03. Avg. loss: 3.2418. Perplexity: 25.5802. Grad Norm: 0.0920. Throughput: 308,828.16 tokens/sec
2024-12-06 21:54:07,425 - src.data_processing.training_data_loader - INFO - Next shard key to use: 3. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000004_t_98617101.npy
2024-12-06 21:54:37,008 - __main__ - INFO - Step 3050: Time: 1698.86 ms. LR: 1.9972e-03. Avg. loss: 3.2526. Perplexity: 25.8574. Grad Norm: 0.0802. Throughput: 308,611.45 tokens/sec
2024-12-06 21:56:00,400 - src.model_assessment.validation - INFO - Step (3100). Val Loss: 3.1968
2024-12-06 21:56:00,401 - __main__ - INFO - Step 3100: Time: 1697.66 ms. LR: 1.9936e-03. Avg. loss: 3.3225. Perplexity: 27.7309. Grad Norm: 0.1034. Throughput: 308,829.03 tokens/sec
2024-12-06 21:57:16,159 - __main__ - INFO - Step 3150: Time: 1698.65 ms. LR: 1.9900e-03. Avg. loss: 3.2810. Perplexity: 26.6029. Grad Norm: 0.0799. Throughput: 308,649.57 tokens/sec
2024-12-06 21:58:39,512 - src.model_assessment.validation - INFO - Step (3200). Val Loss: 3.1912
2024-12-06 21:58:39,513 - __main__ - INFO - Step 3200: Time: 1697.67 ms. LR: 1.9862e-03. Avg. loss: 3.2224. Perplexity: 25.0877. Grad Norm: 0.0889. Throughput: 308,828.85 tokens/sec
2024-12-06 21:59:07,930 - src.data_processing.training_data_loader - INFO - Next shard key to use: 2. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000003_t_100027320.npy
2024-12-06 21:59:55,669 - __main__ - INFO - Step 3250: Time: 1699.52 ms. LR: 1.9825e-03. Avg. loss: 3.2977. Perplexity: 27.0514. Grad Norm: 0.3103. Throughput: 308,491.27 tokens/sec
2024-12-06 22:01:18,988 - src.model_assessment.validation - INFO - Step (3300). Val Loss: 3.1833
2024-12-06 22:01:18,989 - __main__ - INFO - Step 3300: Time: 1696.75 ms. LR: 1.9786e-03. Avg. loss: 3.2043. Perplexity: 24.6394. Grad Norm: 0.0920. Throughput: 308,995.88 tokens/sec
2024-12-06 22:02:34,708 - __main__ - INFO - Step 3350: Time: 1699.65 ms. LR: 1.9747e-03. Avg. loss: 3.1962. Perplexity: 24.4386. Grad Norm: 0.0757. Throughput: 308,468.29 tokens/sec
2024-12-06 22:03:58,133 - src.model_assessment.validation - INFO - Step (3400). Val Loss: 3.1781
2024-12-06 22:03:58,134 - __main__ - INFO - Step 3400: Time: 1699.09 ms. LR: 1.9708e-03. Avg. loss: 3.2968. Perplexity: 27.0252. Grad Norm: 0.1006. Throughput: 308,569.79 tokens/sec
2024-12-06 22:04:12,189 - src.data_processing.training_data_loader - INFO - Next shard key to use: 21. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000022_t_100317664.npy
2024-12-06 22:05:14,405 - __main__ - INFO - Step 3450: Time: 1699.92 ms. LR: 1.9668e-03. Avg. loss: 3.2260. Perplexity: 25.1792. Grad Norm: 0.0918. Throughput: 308,419.88 tokens/sec
2024-12-06 22:06:37,797 - src.model_assessment.validation - INFO - Step (3500). Val Loss: 3.1707
2024-12-06 22:07:03,417 - src.model_assessment.hellaswag - INFO - Step (3500). HellaSwag Evaluation Accuracy: 2911/10042 = 28.99%
2024-12-06 22:07:03,784 - src.model_assessment.sampling - INFO - HTML stands for "information security," also known as Internet security, or the protection of personal information. Other security software
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - HTML stands for Non-Exposable Markup Language (HTML), which is used widely and widely for the generation
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say hi, hi, hi. The other thing is, if we don't talk of it, and
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - HTML stands for the collection of files on your computer in a large and portable format that can help you perform many different
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say he loves us. To me, this seems to not be a good thing.”
One
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - HTML stands for Structured Query Language (SQL) and contains a bunch of advanced functions.
If you feel like
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “We don’t love them anymore.” There’s only one
2024-12-06 22:07:03,785 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say 'oh thank you.' If the pig had good eyes, the cat would say 'oh thank you
2024-12-06 22:07:03,793 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, which all drew out their own colors and drew their own faces and tails, and all the while
2024-12-06 22:07:03,794 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, and it grew and grew like a tree, but there was a black and white mulch that
2024-12-06 22:07:03,794 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig to help him build it.
This was a nice little machine, its just that – that
2024-12-06 22:07:03,794 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, while his wife grew a pebble and a pea. Both men were well-trained
2024-12-06 22:07:03,795 - __main__ - INFO - Step 3500: Time: 1697.52 ms. LR: 1.9627e-03. Avg. loss: 3.1821. Perplexity: 24.0982. Grad Norm: 0.0841. Throughput: 308,854.88 tokens/sec
2024-12-06 22:08:19,356 - __main__ - INFO - Step 3550: Time: 1697.83 ms. LR: 1.9586e-03. Avg. loss: 3.2128. Perplexity: 24.8493. Grad Norm: 0.0712. Throughput: 308,799.10 tokens/sec
2024-12-06 22:09:42,696 - src.model_assessment.validation - INFO - Step (3600). Val Loss: 3.1616
2024-12-06 22:09:42,697 - __main__ - INFO - Step 3600: Time: 1697.46 ms. LR: 1.9544e-03. Avg. loss: 3.2260. Perplexity: 25.1784. Grad Norm: 0.0730. Throughput: 308,866.16 tokens/sec
2024-12-06 22:09:43,093 - src.data_processing.training_data_loader - INFO - Next shard key to use: 85. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000086_t_99992086.npy
2024-12-06 22:10:58,866 - __main__ - INFO - Step 3650: Time: 1699.06 ms. LR: 1.9502e-03. Avg. loss: 3.2434. Perplexity: 25.6207. Grad Norm: 0.0716. Throughput: 308,574.82 tokens/sec
2024-12-06 22:12:22,215 - src.model_assessment.validation - INFO - Step (3700). Val Loss: 3.1611
2024-12-06 22:12:22,216 - __main__ - INFO - Step 3700: Time: 1698.00 ms. LR: 1.9459e-03. Avg. loss: 3.3459. Perplexity: 28.3863. Grad Norm: 0.1041. Throughput: 308,767.54 tokens/sec
2024-12-06 22:13:37,978 - __main__ - INFO - Step 3750: Time: 1699.68 ms. LR: 1.9416e-03. Avg. loss: 3.2073. Perplexity: 24.7131. Grad Norm: 0.0792. Throughput: 308,463.01 tokens/sec
2024-12-06 22:14:39,725 - src.data_processing.training_data_loader - INFO - Next shard key to use: 79. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000080_t_99034609.npy
2024-12-06 22:15:01,797 - src.model_assessment.validation - INFO - Step (3800). Val Loss: 3.1553
2024-12-06 22:15:01,798 - __main__ - INFO - Step 3800: Time: 1698.14 ms. LR: 1.9372e-03. Avg. loss: 3.2110. Perplexity: 24.8042. Grad Norm: 0.0814. Throughput: 308,743.17 tokens/sec
2024-12-06 22:15:23,006 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/checkpoints
2024-12-06 22:15:23,006 - src.utils.root - INFO - Creating file: /home/MyLLM/temp_data/checkpoints/checkpoint_step_3814_date_2024_12_06-22_15_UTC.pth
2024-12-06 22:15:29,001 - src.model_utils.checkpoint_utils - INFO - Checkpoint saved at step 3814.
2024-12-06 22:16:23,491 - __main__ - INFO - Step 3850: Time: 1698.35 ms. LR: 1.9328e-03. Avg. loss: 3.2002. Perplexity: 24.5383. Grad Norm: 0.0674. Throughput: 308,704.82 tokens/sec
2024-12-06 22:17:46,857 - src.model_assessment.validation - INFO - Step (3900). Val Loss: 3.1441
2024-12-06 22:17:46,858 - __main__ - INFO - Step 3900: Time: 1698.59 ms. LR: 1.9283e-03. Avg. loss: 3.2527. Perplexity: 25.8597. Grad Norm: 0.0862. Throughput: 308,659.88 tokens/sec
2024-12-06 22:19:02,590 - __main__ - INFO - Step 3950: Time: 1698.63 ms. LR: 1.9237e-03. Avg. loss: 3.1997. Perplexity: 24.5241. Grad Norm: 0.0978. Throughput: 308,652.56 tokens/sec
2024-12-06 22:19:46,916 - src.data_processing.training_data_loader - INFO - Next shard key to use: 35. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000036_t_99808623.npy
2024-12-06 22:20:26,342 - src.model_assessment.validation - INFO - Step (4000). Val Loss: 3.1377
2024-12-06 22:20:51,230 - src.model_assessment.hellaswag - INFO - Step (4000). HellaSwag Evaluation Accuracy: 2916/10042 = 29.04%
2024-12-06 22:20:51,566 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, too, all together.<|endoftext|>A look at how we are getting closer to “normal
2024-12-06 22:20:51,567 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, a bit like a little toothbrush!
As we grow up, we are in awe of
2024-12-06 22:20:51,567 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, and it was almost a perfect replica of the original. At least it’s more than
2024-12-06 22:20:51,567 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig that grew to be the perfect shape.
Another clever fox of the world built such a cat.
2024-12-06 22:20:51,595 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say nothing if there was a cat behind his back. I can say that cats can talk.
So
2024-12-06 22:20:51,596 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say "thank you," as if he could come back. Now we know he's being very social when
2024-12-06 22:20:51,596 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "No! I would just walk and run around with a happy face." (Mammals
2024-12-06 22:20:51,596 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "No," and then he would either say, "No!" or he would just say,
2024-12-06 22:20:51,598 - src.model_assessment.sampling - INFO - HTML stands for "instructions set forth by the Web. Many users of Wikipedia feel strongly that the HTML design
2024-12-06 22:20:51,599 - src.model_assessment.sampling - INFO - HTML stands for text, pictures or sound. All 3 elements of art are represented by a graphic in 3d objects
2024-12-06 22:20:51,599 - src.model_assessment.sampling - INFO - HTML stands for “word and image editing”.
There isn’t much new about how web
2024-12-06 22:20:51,599 - src.model_assessment.sampling - INFO - HTML stands for the complete application of HTML tags to the web pages. This was one of the first things about which
2024-12-06 22:20:51,600 - __main__ - INFO - Step 4000: Time: 1696.99 ms. LR: 1.9191e-03. Avg. loss: 3.2137. Perplexity: 24.8702. Grad Norm: 0.0807. Throughput: 308,952.60 tokens/sec
2024-12-06 22:22:07,121 - __main__ - INFO - Step 4050: Time: 1696.86 ms. LR: 1.9144e-03. Avg. loss: 3.1793. Perplexity: 24.0300. Grad Norm: 0.0801. Throughput: 308,974.65 tokens/sec
2024-12-06 22:23:30,422 - src.model_assessment.validation - INFO - Step (4100). Val Loss: 3.1357
2024-12-06 22:23:30,423 - __main__ - INFO - Step 4100: Time: 1696.25 ms. LR: 1.9097e-03. Avg. loss: 3.1916. Perplexity: 24.3281. Grad Norm: 0.0839. Throughput: 309,086.39 tokens/sec
2024-12-06 22:24:46,113 - __main__ - INFO - Step 4150: Time: 1698.57 ms. LR: 1.9050e-03. Avg. loss: 3.1605. Perplexity: 23.5830. Grad Norm: 0.0834. Throughput: 308,664.47 tokens/sec
2024-12-06 22:25:15,301 - src.data_processing.training_data_loader - INFO - Next shard key to use: 64. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000065_t_99908256.npy
2024-12-06 22:26:09,878 - src.model_assessment.validation - INFO - Step (4200). Val Loss: 3.1288
2024-12-06 22:26:09,879 - __main__ - INFO - Step 4200: Time: 1697.23 ms. LR: 1.9001e-03. Avg. loss: 3.1239. Perplexity: 22.7339. Grad Norm: 0.0807. Throughput: 308,908.42 tokens/sec
2024-12-06 22:27:25,637 - __main__ - INFO - Step 4250: Time: 1700.31 ms. LR: 1.8953e-03. Avg. loss: 3.1629. Perplexity: 23.6402. Grad Norm: 0.0879. Throughput: 308,348.35 tokens/sec
2024-12-06 22:28:49,008 - src.model_assessment.validation - INFO - Step (4300). Val Loss: 3.1340
2024-12-06 22:28:49,009 - __main__ - INFO - Step 4300: Time: 1698.78 ms. LR: 1.8903e-03. Avg. loss: 3.2277. Perplexity: 25.2220. Grad Norm: 0.1039. Throughput: 308,626.66 tokens/sec
2024-12-06 22:30:04,730 - __main__ - INFO - Step 4350: Time: 1698.66 ms. LR: 1.8854e-03. Avg. loss: 3.1786. Perplexity: 24.0137. Grad Norm: 0.0725. Throughput: 308,647.10 tokens/sec
2024-12-06 22:30:19,513 - src.data_processing.training_data_loader - INFO - Next shard key to use: 90. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000091_t_98977145.npy
2024-12-06 22:31:28,542 - src.model_assessment.validation - INFO - Step (4400). Val Loss: 3.1201
2024-12-06 22:31:28,543 - __main__ - INFO - Step 4400: Time: 1698.03 ms. LR: 1.8804e-03. Avg. loss: 3.1422. Perplexity: 23.1555. Grad Norm: 0.0918. Throughput: 308,762.72 tokens/sec
2024-12-06 22:32:44,270 - __main__ - INFO - Step 4450: Time: 1699.36 ms. LR: 1.8753e-03. Avg. loss: 3.1540. Perplexity: 23.4301. Grad Norm: 0.0901. Throughput: 308,521.48 tokens/sec
2024-12-06 22:34:07,649 - src.model_assessment.validation - INFO - Step (4500). Val Loss: 3.1150
2024-12-06 22:34:32,650 - src.model_assessment.hellaswag - INFO - Step (4500). HellaSwag Evaluation Accuracy: 2952/10042 = 29.40%
2024-12-06 22:34:33,016 - src.model_assessment.sampling - INFO - HTML stands for the following:
- There is no definitive record of the existence or presence of the Earth, at
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - HTML stands for the author, and this page gives more information as to what it is about.
The second part
2024-12-06 22:34:33,016 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say that. Some cats and dogs are able to communicate their own language, but some cats and dogs can
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - HTML stands for it. We are not talking about using XHTML and our goal is to write a simple document to
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - HTML stands for Open XML Schema. It comes with the main language so the use of it in the C++
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “If the animal wants us to make up our mind that it is for the animal,
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Well, it’s easy!” If we could talk it off with
2024-12-06 22:34:33,017 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say: "I am a dog!"
What is a dog?
It is a very small carniv
2024-12-06 22:34:33,027 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig and branches. At her height she may have been the tallest animal ever in the world at the time
2024-12-06 22:34:33,027 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The man was named “The V” by the fox – so V, the fox
2024-12-06 22:34:33,028 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. And all the way around the plant the fox built himself: on the moss and mossy leaves
2024-12-06 22:34:33,028 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
A tiny finger would stick out of the feather, and a small nail held a feather out
2024-12-06 22:34:33,029 - __main__ - INFO - Step 4500: Time: 1698.36 ms. LR: 1.8702e-03. Avg. loss: 3.1859. Perplexity: 24.1885. Grad Norm: 0.0757. Throughput: 308,703.13 tokens/sec
2024-12-06 22:35:45,965 - src.data_processing.training_data_loader - INFO - Next shard key to use: 17. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000018_t_99572636.npy
2024-12-06 22:35:49,030 - __main__ - INFO - Step 4550: Time: 1697.22 ms. LR: 1.8650e-03. Avg. loss: 3.1208. Perplexity: 22.6639. Grad Norm: 0.1018. Throughput: 308,910.63 tokens/sec
2024-12-06 22:37:12,332 - src.model_assessment.validation - INFO - Step (4600). Val Loss: 3.1137
2024-12-06 22:37:12,333 - __main__ - INFO - Step 4600: Time: 1696.39 ms. LR: 1.8598e-03. Avg. loss: 3.1523. Perplexity: 23.3894. Grad Norm: 0.0764. Throughput: 309,061.24 tokens/sec
2024-12-06 22:38:28,046 - __main__ - INFO - Step 4650: Time: 1698.86 ms. LR: 1.8545e-03. Avg. loss: 3.1469. Perplexity: 23.2647. Grad Norm: 0.0955. Throughput: 308,610.85 tokens/sec
2024-12-06 22:39:51,419 - src.model_assessment.validation - INFO - Step (4700). Val Loss: 3.1060
2024-12-06 22:39:51,420 - __main__ - INFO - Step 4700: Time: 1698.38 ms. LR: 1.8492e-03. Avg. loss: 3.1846. Perplexity: 24.1580. Grad Norm: 0.0870. Throughput: 308,699.40 tokens/sec
2024-12-06 22:40:48,609 - src.data_processing.training_data_loader - INFO - Next shard key to use: 30. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000031_t_99731166.npy
2024-12-06 22:41:07,587 - __main__ - INFO - Step 4750: Time: 1697.25 ms. LR: 1.8438e-03. Avg. loss: 3.1569. Perplexity: 23.4967. Grad Norm: 0.0788. Throughput: 308,903.64 tokens/sec
2024-12-06 22:42:30,955 - src.model_assessment.validation - INFO - Step (4800). Val Loss: 3.0996
2024-12-06 22:42:30,956 - __main__ - INFO - Step 4800: Time: 1698.24 ms. LR: 1.8384e-03. Avg. loss: 3.1643. Perplexity: 23.6723. Grad Norm: 0.0862. Throughput: 308,725.23 tokens/sec
2024-12-06 22:43:46,693 - __main__ - INFO - Step 4850: Time: 1698.46 ms. LR: 1.8329e-03. Avg. loss: 3.1662. Perplexity: 23.7171. Grad Norm: 0.0739. Throughput: 308,684.10 tokens/sec
2024-12-06 22:45:10,030 - src.model_assessment.validation - INFO - Step (4900). Val Loss: 3.0997
2024-12-06 22:45:10,031 - __main__ - INFO - Step 4900: Time: 1697.18 ms. LR: 1.8274e-03. Avg. loss: 3.1681. Perplexity: 23.7620. Grad Norm: 0.0777. Throughput: 308,916.44 tokens/sec
2024-12-06 22:45:52,089 - src.data_processing.training_data_loader - INFO - Next shard key to use: 56. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000057_t_100864631.npy
2024-12-06 22:46:26,209 - __main__ - INFO - Step 4950: Time: 1698.10 ms. LR: 1.8219e-03. Avg. loss: 3.1428. Perplexity: 23.1690. Grad Norm: 0.0752. Throughput: 308,750.20 tokens/sec
2024-12-06 22:47:49,552 - src.model_assessment.validation - INFO - Step (5000). Val Loss: 3.0927
2024-12-06 22:48:14,568 - src.model_assessment.hellaswag - INFO - Step (5000). HellaSwag Evaluation Accuracy: 2971/10042 = 29.59%
2024-12-06 22:48:14,934 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something. I know there is always a chance it would be up to us that this animal might speak
2024-12-06 22:48:14,935 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say ‘a man wouldn’t talk to me.
‘A man would not talk
2024-12-06 22:48:14,935 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “I won’t.” I wouldn’t touch him if you
2024-12-06 22:48:14,935 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “NO.” It would seem like it didn’t intend to be a joke
2024-12-06 22:48:14,935 - src.model_assessment.sampling - INFO - HTML stands for a set of Unicode characters that are used for coding purposes in most HTML files.
I am going
2024-12-06 22:48:14,935 - src.model_assessment.sampling - INFO - HTML stands for “the same technology and the same technologies that are used to create and add to the web pages
2024-12-06 22:48:14,936 - src.model_assessment.sampling - INFO - HTML stands for data structure that will serve as a basis for writing applications. (HTML is an object-oriented language
2024-12-06 22:48:14,936 - src.model_assessment.sampling - INFO - HTML stands for HTML, and therefore, should be preceded by a URL, and you should avoid a URL that starts
2024-12-06 22:48:14,945 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The cunning fox is usually too small (about 5-8 cm) to have an ear and
2024-12-06 22:48:14,945 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
When the machine failed, the bird flew over the tunnel, as might be expected, to
2024-12-06 22:48:14,946 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
It turned out a light bulb inside of a square-shaped machine, called a barrow
2024-12-06 22:48:14,946 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig in hand. It would move on its own in an amazing display. If it wanted to catch the
2024-12-06 22:48:14,947 - __main__ - INFO - Step 5000: Time: 1696.87 ms. LR: 1.8162e-03. Avg. loss: 3.1586. Perplexity: 23.5378. Grad Norm: 0.0714. Throughput: 308,973.65 tokens/sec
2024-12-06 22:49:30,552 - __main__ - INFO - Step 5050: Time: 1699.01 ms. LR: 1.8106e-03. Avg. loss: 3.1455. Perplexity: 23.2315. Grad Norm: 0.0749. Throughput: 308,584.21 tokens/sec
2024-12-06 22:50:53,895 - src.model_assessment.validation - INFO - Step (5100). Val Loss: 3.0939
2024-12-06 22:50:53,896 - __main__ - INFO - Step 5100: Time: 1700.57 ms. LR: 1.8049e-03. Avg. loss: 3.1717. Perplexity: 23.8488. Grad Norm: 0.0689. Throughput: 308,301.10 tokens/sec
2024-12-06 22:51:23,830 - src.data_processing.training_data_loader - INFO - Next shard key to use: 36. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000037_t_99538740.npy
2024-12-06 22:52:10,057 - __main__ - INFO - Step 5150: Time: 1698.94 ms. LR: 1.7992e-03. Avg. loss: 3.1408. Perplexity: 23.1213. Grad Norm: 0.0881. Throughput: 308,597.81 tokens/sec
2024-12-06 22:53:33,442 - src.model_assessment.validation - INFO - Step (5200). Val Loss: 3.0841
2024-12-06 22:53:33,443 - __main__ - INFO - Step 5200: Time: 1698.46 ms. LR: 1.7934e-03. Avg. loss: 3.1222. Perplexity: 22.6969. Grad Norm: 0.0668. Throughput: 308,684.84 tokens/sec
2024-12-06 22:54:49,202 - __main__ - INFO - Step 5250: Time: 1698.70 ms. LR: 1.7875e-03. Avg. loss: 3.1321. Perplexity: 22.9221. Grad Norm: 0.0817. Throughput: 308,641.21 tokens/sec
2024-12-06 22:56:12,560 - src.model_assessment.validation - INFO - Step (5300). Val Loss: 3.0819
2024-12-06 22:56:12,561 - __main__ - INFO - Step 5300: Time: 1698.21 ms. LR: 1.7817e-03. Avg. loss: 3.1554. Perplexity: 23.4627. Grad Norm: 0.0833. Throughput: 308,729.43 tokens/sec
2024-12-06 22:56:26,612 - src.data_processing.training_data_loader - INFO - Next shard key to use: 16. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000017_t_99616998.npy
2024-12-06 22:57:28,718 - __main__ - INFO - Step 5350: Time: 1698.80 ms. LR: 1.7758e-03. Avg. loss: 3.1237. Perplexity: 22.7313. Grad Norm: 0.0775. Throughput: 308,623.45 tokens/sec
2024-12-06 22:58:52,054 - src.model_assessment.validation - INFO - Step (5400). Val Loss: 3.0777
2024-12-06 22:58:52,055 - __main__ - INFO - Step 5400: Time: 1698.43 ms. LR: 1.7698e-03. Avg. loss: 3.1015. Perplexity: 22.2306. Grad Norm: 0.0703. Throughput: 308,689.30 tokens/sec
2024-12-06 23:00:07,773 - __main__ - INFO - Step 5450: Time: 1698.80 ms. LR: 1.7638e-03. Avg. loss: 3.1622. Perplexity: 23.6218. Grad Norm: 0.1072. Throughput: 308,623.28 tokens/sec
2024-12-06 23:01:22,408 - src.data_processing.training_data_loader - INFO - Next shard key to use: 38. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000039_t_98981562.npy
2024-12-06 23:01:31,565 - src.model_assessment.validation - INFO - Step (5500). Val Loss: 3.0788
2024-12-06 23:01:56,451 - src.model_assessment.hellaswag - INFO - Step (5500). HellaSwag Evaluation Accuracy: 2993/10042 = 29.80%
2024-12-06 23:01:56,819 - src.model_assessment.sampling - INFO - HTML stands for a "Language Web Site". The HTML code is not used. It is a HTML editor. A
2024-12-06 23:01:56,820 - src.model_assessment.sampling - INFO - HTML stands for Web Content Markup Language (HTML). The specification of the website URL, or header, contains a
2024-12-06 23:01:56,820 - src.model_assessment.sampling - INFO - HTML stands for Extensible Markup Language (XHTML), a markup language for the Web. As any web developer
2024-12-06 23:01:56,820 - src.model_assessment.sampling - INFO - HTML stands for Active, Passive, and Extensible Extensible Markup Language. According most users, the term "
2024-12-06 23:01:56,820 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something like “I want to be that animal.” Most of our words have to do
2024-12-06 23:01:56,821 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say the opposite, "Goodbye!"
In this case it would have a happy ending:
"
2024-12-06 23:01:56,821 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something to me, this is what the world looks like for, with many different sounds.
But
2024-12-06 23:01:56,821 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something like "Coyote. I want it, but we don't want our cat any more
2024-12-06 23:01:56,827 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, and a small twig. If you want to run the experiment down the path you need to
2024-12-06 23:01:56,827 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
- The horse-drawn wagon covered the field with mud. A man in a wagg
2024-12-06 23:01:56,827 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The fox quickly removed the feather and went on in his quest for more than 100,000 pounds
2024-12-06 23:01:56,828 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. You can do all of the things that we humans do not know about what makes animals unique.
2024-12-06 23:01:56,829 - __main__ - INFO - Step 5500: Time: 2136.18 ms. LR: 1.7577e-03. Avg. loss: 3.0972. Perplexity: 22.1353. Grad Norm: 0.0821. Throughput: 245,432.54 tokens/sec
2024-12-06 23:03:12,466 - __main__ - INFO - Step 5550: Time: 1698.30 ms. LR: 1.7517e-03. Avg. loss: 3.1430. Perplexity: 23.1723. Grad Norm: 0.0802. Throughput: 308,712.83 tokens/sec
2024-12-06 23:04:35,862 - src.model_assessment.validation - INFO - Step (5600). Val Loss: 3.0745
2024-12-06 23:04:35,863 - __main__ - INFO - Step 5600: Time: 1699.04 ms. LR: 1.7455e-03. Avg. loss: 3.1088. Perplexity: 22.3951. Grad Norm: 0.0852. Throughput: 308,579.75 tokens/sec
2024-12-06 23:05:51,603 - __main__ - INFO - Step 5650: Time: 1699.79 ms. LR: 1.7394e-03. Avg. loss: 3.1460. Perplexity: 23.2431. Grad Norm: 0.0725. Throughput: 308,442.37 tokens/sec
2024-12-06 23:06:48,838 - src.data_processing.training_data_loader - INFO - Next shard key to use: 51. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000052_t_100988833.npy
2024-12-06 23:07:15,461 - src.model_assessment.validation - INFO - Step (5700). Val Loss: 3.0722
2024-12-06 23:07:15,463 - __main__ - INFO - Step 5700: Time: 1698.10 ms. LR: 1.7332e-03. Avg. loss: 3.1017. Perplexity: 22.2362. Grad Norm: 0.0777. Throughput: 308,749.59 tokens/sec
2024-12-06 23:08:31,194 - __main__ - INFO - Step 5750: Time: 1698.45 ms. LR: 1.7269e-03. Avg. loss: 3.1118. Perplexity: 22.4625. Grad Norm: 0.0793. Throughput: 308,685.71 tokens/sec
2024-12-06 23:09:54,552 - src.model_assessment.validation - INFO - Step (5800). Val Loss: 3.0652
2024-12-06 23:09:54,553 - __main__ - INFO - Step 5800: Time: 1696.09 ms. LR: 1.7206e-03. Avg. loss: 3.0943. Perplexity: 22.0719. Grad Norm: 0.0722. Throughput: 309,116.37 tokens/sec
2024-12-06 23:11:10,299 - __main__ - INFO - Step 5850: Time: 1700.11 ms. LR: 1.7143e-03. Avg. loss: 3.1154. Perplexity: 22.5427. Grad Norm: 0.0899. Throughput: 308,385.19 tokens/sec
2024-12-06 23:11:56,181 - src.data_processing.training_data_loader - INFO - Next shard key to use: 20. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000021_t_98598536.npy
2024-12-06 23:12:34,135 - src.model_assessment.validation - INFO - Step (5900). Val Loss: 3.0613
2024-12-06 23:12:34,136 - __main__ - INFO - Step 5900: Time: 1697.74 ms. LR: 1.7079e-03. Avg. loss: 3.0777. Perplexity: 21.7093. Grad Norm: 0.0702. Throughput: 308,814.93 tokens/sec
2024-12-06 23:13:49,878 - __main__ - INFO - Step 5950: Time: 1699.42 ms. LR: 1.7015e-03. Avg. loss: 3.1109. Perplexity: 22.4402. Grad Norm: 0.0787. Throughput: 308,510.70 tokens/sec
2024-12-06 23:15:13,291 - src.model_assessment.validation - INFO - Step (6000). Val Loss: 3.0583
2024-12-06 23:15:38,202 - src.model_assessment.hellaswag - INFO - Step (6000). HellaSwag Evaluation Accuracy: 3029/10042 = 30.16%
2024-12-06 23:15:38,569 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "I wonder," or "I want my pet." I find it a bit upsetting that people
2024-12-06 23:15:38,569 - src.model_assessment.sampling - INFO - HTML stands for the “Assessing and Identifying Data from Resources”, as well as “The
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “no.” He is probably saying no. I guess I’m not a
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say yes.
It certainly doesn’t make sense if we want to use vocalisations in an
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - HTML stands for 'Comunicator', the 'Word Creator', and the 'Word Builder'. The new version of
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something very similar: Hey, that’s a terrible day, and what you’re
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - HTML stands for "hypertext writing software". This means your site will do not support HTML from any source.
2024-12-06 23:15:38,570 - src.model_assessment.sampling - INFO - HTML stands for ‘The American Standard Version of the Bible’. These are the words that are used in
2024-12-06 23:15:38,577 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig from the pine resin tree. On one side of the cat’s head, you can see
2024-12-06 23:15:38,578 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. We can only guess that he had got the head from which to hold the feather when walking on
2024-12-06 23:15:38,578 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. Each arm had a pair of tiny hinges that held the box in place. The feather and pe
2024-12-06 23:15:38,578 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig stuck next to it.”
When I asked her, she said “How many feathers
2024-12-06 23:15:38,579 - __main__ - INFO - Step 6000: Time: 1697.95 ms. LR: 1.6951e-03. Avg. loss: 3.1735. Perplexity: 23.8901. Grad Norm: 0.0772. Throughput: 308,776.34 tokens/sec
2024-12-06 23:16:54,195 - __main__ - INFO - Step 6050: Time: 1699.00 ms. LR: 1.6886e-03. Avg. loss: 3.1208. Perplexity: 22.6645. Grad Norm: 0.0727. Throughput: 308,586.38 tokens/sec
2024-12-06 23:17:21,876 - src.data_processing.training_data_loader - INFO - Next shard key to use: 60. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000061_t_99221277.npy
2024-12-06 23:18:17,990 - src.model_assessment.validation - INFO - Step (6100). Val Loss: 3.0555
2024-12-06 23:18:17,991 - __main__ - INFO - Step 6100: Time: 1698.26 ms. LR: 1.6821e-03. Avg. loss: 3.0783. Perplexity: 21.7210. Grad Norm: 0.0706. Throughput: 308,720.68 tokens/sec
2024-12-06 23:19:33,769 - __main__ - INFO - Step 6150: Time: 1699.47 ms. LR: 1.6755e-03. Avg. loss: 3.0991. Perplexity: 22.1790. Grad Norm: 0.0876. Throughput: 308,501.09 tokens/sec
2024-12-06 23:20:57,146 - src.model_assessment.validation - INFO - Step (6200). Val Loss: 3.0512
2024-12-06 23:20:57,147 - __main__ - INFO - Step 6200: Time: 1697.03 ms. LR: 1.6689e-03. Avg. loss: 3.0691. Perplexity: 21.5219. Grad Norm: 0.0681. Throughput: 308,944.96 tokens/sec
2024-12-06 23:22:12,916 - __main__ - INFO - Step 6250: Time: 1699.57 ms. LR: 1.6623e-03. Avg. loss: 3.1143. Perplexity: 22.5178. Grad Norm: 0.0765. Throughput: 308,482.92 tokens/sec
2024-12-06 23:22:23,950 - src.data_processing.training_data_loader - INFO - Next shard key to use: 88. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000089_t_99472319.npy
2024-12-06 23:23:36,769 - src.model_assessment.validation - INFO - Step (6300). Val Loss: 3.0479
2024-12-06 23:23:36,770 - __main__ - INFO - Step 6300: Time: 1700.02 ms. LR: 1.6556e-03. Avg. loss: 3.0897. Perplexity: 21.9698. Grad Norm: 0.0634. Throughput: 308,401.54 tokens/sec
2024-12-06 23:24:52,527 - __main__ - INFO - Step 6350: Time: 1699.33 ms. LR: 1.6489e-03. Avg. loss: 3.0800. Perplexity: 21.7594. Grad Norm: 0.0721. Throughput: 308,525.68 tokens/sec
2024-12-06 23:26:15,872 - src.model_assessment.validation - INFO - Step (6400). Val Loss: 3.0441
2024-12-06 23:26:15,873 - __main__ - INFO - Step 6400: Time: 1697.57 ms. LR: 1.6422e-03. Avg. loss: 3.1129. Perplexity: 22.4868. Grad Norm: 0.0726. Throughput: 308,846.59 tokens/sec
2024-12-06 23:27:26,700 - src.data_processing.training_data_loader - INFO - Next shard key to use: 77. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000078_t_99732892.npy
2024-12-06 23:27:32,056 - __main__ - INFO - Step 6450: Time: 1699.14 ms. LR: 1.6354e-03. Avg. loss: 3.1164. Perplexity: 22.5648. Grad Norm: 0.0788. Throughput: 308,561.61 tokens/sec
2024-12-06 23:28:55,430 - src.model_assessment.validation - INFO - Step (6500). Val Loss: 3.0473
2024-12-06 23:29:20,444 - src.model_assessment.hellaswag - INFO - Step (6500). HellaSwag Evaluation Accuracy: 3016/10042 = 30.03%
2024-12-06 23:29:20,811 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “He who eats cows has a brain!”
And no, he who eats
2024-12-06 23:29:20,812 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “My dog?” I can not tell how he was feeling, and I get
2024-12-06 23:29:20,812 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “My name is Roo, it is a big fish,” and that
2024-12-06 23:29:20,812 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, ‘no,’ rather than ‘yes,’ it’s likely
2024-12-06 23:29:20,813 - src.model_assessment.sampling - INFO - HTML stands for non-linear programming language. In its simplest form it has two distinct levels of abstraction: the main
2024-12-06 23:29:20,814 - src.model_assessment.sampling - INFO - HTML stands for “The Extensible Markup Language,” and this is a great way to help teach
2024-12-06 23:29:20,814 - src.model_assessment.sampling - INFO - HTML stands for Hypertext Markup Language and Microsoft's Internet Explorer for Windows. We use it to format web pages
2024-12-06 23:29:20,814 - src.model_assessment.sampling - INFO - HTML stands for
The first section can be seen
[B], where a
is the head of each argument
2024-12-06 23:29:20,818 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
These are not the first animals, but they are certainly the last. If there is ever
2024-12-06 23:29:20,819 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The “little boy” can turn into a tiny robot if he touches two fingers,
2024-12-06 23:29:20,819 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, so that the wind would roll off its limbs, and it would drift off to the next day
2024-12-06 23:29:20,819 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
The story was first told in America in 1868; it would go on to come in
2024-12-06 23:29:20,820 - __main__ - INFO - Step 6500: Time: 1698.09 ms. LR: 1.6286e-03. Avg. loss: 3.1610. Perplexity: 23.5938. Grad Norm: 0.0751. Throughput: 308,752.06 tokens/sec
2024-12-06 23:30:36,393 - __main__ - INFO - Step 6550: Time: 1699.80 ms. LR: 1.6218e-03. Avg. loss: 3.0669. Perplexity: 21.4750. Grad Norm: 0.0829. Throughput: 308,441.51 tokens/sec
2024-12-06 23:31:59,747 - src.model_assessment.validation - INFO - Step (6600). Val Loss: 3.0426
2024-12-06 23:31:59,748 - __main__ - INFO - Step 6600: Time: 1696.97 ms. LR: 1.6149e-03. Avg. loss: 3.1188. Perplexity: 22.6197. Grad Norm: 0.0731. Throughput: 308,954.55 tokens/sec
2024-12-06 23:32:55,424 - src.data_processing.training_data_loader - INFO - Next shard key to use: 95. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000096_t_99705588.npy
2024-12-06 23:33:15,898 - __main__ - INFO - Step 6650: Time: 1698.49 ms. LR: 1.6080e-03. Avg. loss: 3.0328. Perplexity: 20.7552. Grad Norm: 0.0728. Throughput: 308,679.34 tokens/sec
2024-12-06 23:34:39,213 - src.model_assessment.validation - INFO - Step (6700). Val Loss: 3.0374
2024-12-06 23:34:39,214 - __main__ - INFO - Step 6700: Time: 1695.96 ms. LR: 1.6011e-03. Avg. loss: 3.1220. Perplexity: 22.6913. Grad Norm: 0.0756. Throughput: 309,139.36 tokens/sec
2024-12-06 23:35:54,923 - __main__ - INFO - Step 6750: Time: 1698.73 ms. LR: 1.5942e-03. Avg. loss: 3.1254. Perplexity: 22.7681. Grad Norm: 0.0746. Throughput: 308,635.88 tokens/sec
2024-12-06 23:37:18,286 - src.model_assessment.validation - INFO - Step (6800). Val Loss: 3.0361
2024-12-06 23:37:18,287 - __main__ - INFO - Step 6800: Time: 1697.61 ms. LR: 1.5872e-03. Avg. loss: 3.0914. Perplexity: 22.0076. Grad Norm: 0.0685. Throughput: 308,838.61 tokens/sec
2024-12-06 23:37:58,837 - src.data_processing.training_data_loader - INFO - Next shard key to use: 47. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000048_t_99264953.npy
2024-12-06 23:38:34,463 - __main__ - INFO - Step 6850: Time: 1698.85 ms. LR: 1.5801e-03. Avg. loss: 3.0962. Perplexity: 22.1141. Grad Norm: 0.0753. Throughput: 308,613.71 tokens/sec
2024-12-06 23:39:57,823 - src.model_assessment.validation - INFO - Step (6900). Val Loss: 3.0297
2024-12-06 23:39:57,824 - __main__ - INFO - Step 6900: Time: 1697.41 ms. LR: 1.5731e-03. Avg. loss: 3.0829. Perplexity: 21.8207. Grad Norm: 0.0872. Throughput: 308,874.79 tokens/sec
2024-12-06 23:41:13,561 - __main__ - INFO - Step 6950: Time: 1698.75 ms. LR: 1.5660e-03. Avg. loss: 3.0771. Perplexity: 21.6962. Grad Norm: 0.0727. Throughput: 308,631.20 tokens/sec
2024-12-06 23:42:36,912 - src.model_assessment.validation - INFO - Step (7000). Val Loss: 3.0297
2024-12-06 23:43:01,787 - src.model_assessment.hellaswag - INFO - Step (7000). HellaSwag Evaluation Accuracy: 3097/10042 = 30.84%
2024-12-06 23:43:02,154 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something very different...
By contrast to a cat, only animals are capable of vocalizations, and
2024-12-06 23:43:02,155 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something very similar to its young dog or cat or a human, but we’re going to
2024-12-06 23:43:02,155 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, ‘It’s a big deal.’” He feels like he
2024-12-06 23:43:02,155 - src.model_assessment.sampling - INFO - HTML stands for “universal markup language for web pages.”
Why do I need the W3C
2024-12-06 23:43:02,155 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “I can’t feel him or her”. “So that’
2024-12-06 23:43:02,155 - src.model_assessment.sampling - INFO - HTML stands for the "European Commission and the European Centre for Agricultural Communications and Innovation.|
|• Summer 2009
2024-12-06 23:43:02,156 - src.model_assessment.sampling - INFO - HTML stands for Content-Based Learning, since a user is going to learn content material for a specific purpose and can
2024-12-06 23:43:02,156 - src.model_assessment.sampling - INFO - HTML stands for Light, Light, Light, Light, Light
Light is the power to be measured as far as
2024-12-06 23:43:02,164 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. A piece of wood, wrapped in silk threads, was placed on top of a heap of feathers
2024-12-06 23:43:02,164 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. This is the first example of what the intelligence of the creature is, like the creature in the
2024-12-06 23:43:02,165 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The man was very curious.
“Who is the first to create the invisible machine?
2024-12-06 23:43:02,165 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
Then, the next night, the frightened fox saw that one thing, he had never seen
2024-12-06 23:43:02,166 - __main__ - INFO - Step 7000: Time: 1696.71 ms. LR: 1.5589e-03. Avg. loss: 3.0791. Perplexity: 21.7399. Grad Norm: 0.0782. Throughput: 309,002.87 tokens/sec
2024-12-06 23:43:25,922 - src.data_processing.training_data_loader - INFO - Next shard key to use: 97. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000098_t_99316308.npy
2024-12-06 23:44:18,134 - __main__ - INFO - Step 7050: Time: 1697.94 ms. LR: 1.5518e-03. Avg. loss: 3.0605. Perplexity: 21.3382. Grad Norm: 0.0741. Throughput: 308,778.77 tokens/sec
2024-12-06 23:45:41,433 - src.model_assessment.validation - INFO - Step (7100). Val Loss: 3.0255
2024-12-06 23:45:41,434 - __main__ - INFO - Step 7100: Time: 1696.86 ms. LR: 1.5446e-03. Avg. loss: 3.0926. Perplexity: 22.0335. Grad Norm: 0.0712. Throughput: 308,975.74 tokens/sec
2024-12-06 23:46:57,142 - __main__ - INFO - Step 7150: Time: 1698.67 ms. LR: 1.5374e-03. Avg. loss: 3.0936. Perplexity: 22.0562. Grad Norm: 0.0639. Throughput: 308,645.76 tokens/sec
2024-12-06 23:48:20,493 - src.model_assessment.validation - INFO - Step (7200). Val Loss: 3.0231
2024-12-06 23:48:20,494 - __main__ - INFO - Step 7200: Time: 1697.59 ms. LR: 1.5302e-03. Avg. loss: 3.0629. Perplexity: 21.3896. Grad Norm: 0.0671. Throughput: 308,841.82 tokens/sec
2024-12-06 23:48:27,698 - src.data_processing.training_data_loader - INFO - Next shard key to use: 91. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000092_t_99779299.npy
2024-12-06 23:49:36,680 - __main__ - INFO - Step 7250: Time: 1697.94 ms. LR: 1.5230e-03. Avg. loss: 3.0987. Perplexity: 22.1693. Grad Norm: 0.0814. Throughput: 308,779.03 tokens/sec
2024-12-06 23:51:00,032 - src.model_assessment.validation - INFO - Step (7300). Val Loss: 3.0208
2024-12-06 23:51:00,033 - __main__ - INFO - Step 7300: Time: 1697.16 ms. LR: 1.5157e-03. Avg. loss: 3.0557. Perplexity: 21.2355. Grad Norm: 0.0753. Throughput: 308,920.39 tokens/sec
2024-12-06 23:52:15,724 - __main__ - INFO - Step 7350: Time: 1698.21 ms. LR: 1.5084e-03. Avg. loss: 3.1501. Perplexity: 23.3378. Grad Norm: 0.0660. Throughput: 308,729.26 tokens/sec
2024-12-06 23:53:23,491 - src.data_processing.training_data_loader - INFO - Next shard key to use: 69. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000070_t_99186271.npy
2024-12-06 23:53:39,474 - src.model_assessment.validation - INFO - Step (7400). Val Loss: 3.0176
2024-12-06 23:53:39,475 - __main__ - INFO - Step 7400: Time: 1696.72 ms. LR: 1.5011e-03. Avg. loss: 3.0280. Perplexity: 20.6556. Grad Norm: 0.0703. Throughput: 309,000.83 tokens/sec
2024-12-06 23:54:55,184 - __main__ - INFO - Step 7450: Time: 1698.52 ms. LR: 1.4938e-03. Avg. loss: 3.0883. Perplexity: 21.9403. Grad Norm: 0.0680. Throughput: 308,672.79 tokens/sec
2024-12-06 23:56:18,516 - src.model_assessment.validation - INFO - Step (7500). Val Loss: 3.0181
2024-12-06 23:56:43,539 - src.model_assessment.hellaswag - INFO - Step (7500). HellaSwag Evaluation Accuracy: 3077/10042 = 30.64%
2024-12-06 23:56:43,906 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Serve up. That’s the way dogs go!”
This
2024-12-06 23:56:43,906 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Pets, what do you think dogs and cats love, or dogs and cats might
2024-12-06 23:56:43,906 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, 'I wonder if God would allow me to talk like you.
|In the days of
2024-12-06 23:56:43,907 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say goodbye to her.
I’m sure you didn’t ask “Why is
2024-12-06 23:56:43,909 - src.model_assessment.sampling - INFO - HTML stands for the standard markup language used to create HTML markup for use in Web browser applications. The HTML markup language
2024-12-06 23:56:43,909 - src.model_assessment.sampling - INFO - HTML stands for ‘The Word’, and to speak of ‘a’ as ‘of
2024-12-06 23:56:43,909 - src.model_assessment.sampling - INFO - HTML stands for hypertext markup language, and HTML stands for HyperText Markup Language. The syntax of HTML is
2024-12-06 23:56:43,909 - src.model_assessment.sampling - INFO - HTML stands for "The Message-Transfer Layer."
In this article, we will look at the history of TCP
2024-12-06 23:56:43,916 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig of pine – and he built it, the other did not.
There is no contradiction in this
2024-12-06 23:56:43,916 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
In the beginning there was a simple machine that would do everything, but it was not very
2024-12-06 23:56:43,917 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig of cork and pine. The old man then sat down, and said, "My friend,
2024-12-06 23:56:43,917 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig of wire. A tiny wire flew across the surface of the water, which was then turned on.
2024-12-06 23:56:43,918 - __main__ - INFO - Step 7500: Time: 1697.28 ms. LR: 1.4864e-03. Avg. loss: 3.0068. Perplexity: 20.2217. Grad Norm: 0.0782. Throughput: 308,899.13 tokens/sec
2024-12-06 23:57:59,479 - __main__ - INFO - Step 7550: Time: 1697.99 ms. LR: 1.4790e-03. Avg. loss: 3.1364. Perplexity: 23.0202. Grad Norm: 0.0746. Throughput: 308,769.01 tokens/sec
2024-12-06 23:58:50,605 - src.data_processing.training_data_loader - INFO - Next shard key to use: 83. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000084_t_99315055.npy
2024-12-06 23:59:23,263 - src.model_assessment.validation - INFO - Step (7600). Val Loss: 3.0111
2024-12-06 23:59:23,264 - __main__ - INFO - Step 7600: Time: 1698.65 ms. LR: 1.4716e-03. Avg. loss: 3.1008. Perplexity: 22.2154. Grad Norm: 0.0813. Throughput: 308,650.39 tokens/sec
2024-12-07 00:00:05,674 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/checkpoints
2024-12-07 00:00:05,675 - src.utils.root - INFO - Creating file: /home/MyLLM/temp_data/checkpoints/checkpoint_step_7628_date_2024_12_07-00_00_UTC.pth
2024-12-07 00:00:11,651 - src.model_utils.checkpoint_utils - INFO - Checkpoint saved at step 7628.
2024-12-07 00:00:44,945 - __main__ - INFO - Step 7650: Time: 1697.70 ms. LR: 1.4642e-03. Avg. loss: 3.1312. Perplexity: 22.9010. Grad Norm: 0.0791. Throughput: 308,822.95 tokens/sec
2024-12-07 00:02:08,285 - src.model_assessment.validation - INFO - Step (7700). Val Loss: 3.0105
2024-12-07 00:02:08,286 - __main__ - INFO - Step 7700: Time: 1697.18 ms. LR: 1.4567e-03. Avg. loss: 3.0800. Perplexity: 21.7595. Grad Norm: 0.0796. Throughput: 308,917.66 tokens/sec
2024-12-07 00:03:23,993 - __main__ - INFO - Step 7750: Time: 1697.76 ms. LR: 1.4492e-03. Avg. loss: 3.0664. Perplexity: 21.4640. Grad Norm: 0.0808. Throughput: 308,811.63 tokens/sec
2024-12-07 00:03:58,465 - src.data_processing.training_data_loader - INFO - Next shard key to use: 61. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000062_t_99772284.npy
2024-12-07 00:04:47,761 - src.model_assessment.validation - INFO - Step (7800). Val Loss: 3.0099
2024-12-07 00:04:47,762 - __main__ - INFO - Step 7800: Time: 1698.36 ms. LR: 1.4417e-03. Avg. loss: 3.0255. Perplexity: 20.6036. Grad Norm: 0.0867. Throughput: 308,702.95 tokens/sec
2024-12-07 00:06:03,510 - __main__ - INFO - Step 7850: Time: 1699.20 ms. LR: 1.4342e-03. Avg. loss: 3.0680. Perplexity: 21.4980. Grad Norm: 0.0676. Throughput: 308,549.27 tokens/sec
2024-12-07 00:07:26,841 - src.model_assessment.validation - INFO - Step (7900). Val Loss: 3.0050
2024-12-07 00:07:26,842 - __main__ - INFO - Step 7900: Time: 1696.15 ms. LR: 1.4267e-03. Avg. loss: 3.0703. Perplexity: 21.5494. Grad Norm: 0.0870. Throughput: 309,104.73 tokens/sec
2024-12-07 00:08:42,536 - __main__ - INFO - Step 7950: Time: 1697.23 ms. LR: 1.4191e-03. Avg. loss: 3.0496. Perplexity: 21.1066. Grad Norm: 0.0861. Throughput: 308,908.68 tokens/sec
2024-12-07 00:09:01,871 - src.data_processing.training_data_loader - INFO - Next shard key to use: 12. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000013_t_99454541.npy
2024-12-07 00:10:06,282 - src.model_assessment.validation - INFO - Step (8000). Val Loss: 3.0008
2024-12-07 00:10:31,330 - src.model_assessment.hellaswag - INFO - Step (8000). HellaSwag Evaluation Accuracy: 3127/10042 = 31.14%
2024-12-07 00:10:31,698 - src.model_assessment.sampling - INFO - HTML stands for Web Server, a type of server that contains and manages the web server.
- HTML – stands
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - HTML stands for “universal design”, and is a specific language designed to help developers integrate the design of
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - HTML stands for “Universal Resource Description Framework”.
The internet of things (IoT) technologies
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - HTML stands for “Internet of Things,” which is a term that refers to the technology that allows for
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say to the other animal "If I have a dog at home I should go and say to your pet
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "I love you!" It's because our understanding of animals is based on the same principles.
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say they just want to show how well they’re doing.
Of the creatures I would likely
2024-12-07 00:10:31,699 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “No“. Even the most intelligent pet that my animals are known to be able to
2024-12-07 00:10:31,705 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
In 1837, the first real Fox Hunter took his first trip, a trip that would
2024-12-07 00:10:31,706 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig that connected it to a long, thin, and very thin metal wire that controlled the electricity. One
2024-12-07 00:10:31,706 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
One day, the fox crept toward the machine.
“Come out,”
2024-12-07 00:10:31,706 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig, which is actually a ball.
The two teams of 3-4, three teams of 4
2024-12-07 00:10:31,707 - __main__ - INFO - Step 8000: Time: 1697.41 ms. LR: 1.4116e-03. Avg. loss: 3.0460. Perplexity: 21.0301. Grad Norm: 0.0709. Throughput: 308,874.92 tokens/sec
2024-12-07 00:11:47,289 - __main__ - INFO - Step 8050: Time: 1698.27 ms. LR: 1.4040e-03. Avg. loss: 3.0607. Perplexity: 21.3422. Grad Norm: 0.0727. Throughput: 308,718.42 tokens/sec
2024-12-07 00:13:10,632 - src.model_assessment.validation - INFO - Step (8100). Val Loss: 3.0010
2024-12-07 00:13:10,633 - __main__ - INFO - Step 8100: Time: 1697.36 ms. LR: 1.3964e-03. Avg. loss: 3.0096. Perplexity: 20.2797. Grad Norm: 0.0714. Throughput: 308,884.38 tokens/sec
2024-12-07 00:14:26,370 - __main__ - INFO - Step 8150: Time: 1697.62 ms. LR: 1.3888e-03. Avg. loss: 3.0840. Perplexity: 21.8460. Grad Norm: 0.0691. Throughput: 308,837.01 tokens/sec
2024-12-07 00:14:29,824 - src.data_processing.training_data_loader - INFO - Next shard key to use: 23. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000024_t_100357847.npy
2024-12-07 00:15:50,163 - src.model_assessment.validation - INFO - Step (8200). Val Loss: 2.9969
2024-12-07 00:15:50,164 - __main__ - INFO - Step 8200: Time: 1698.27 ms. LR: 1.3811e-03. Avg. loss: 3.0873. Perplexity: 21.9187. Grad Norm: 0.0785. Throughput: 308,719.03 tokens/sec
2024-12-07 00:17:05,886 - __main__ - INFO - Step 8250: Time: 1698.83 ms. LR: 1.3735e-03. Avg. loss: 3.0453. Perplexity: 21.0160. Grad Norm: 0.0817. Throughput: 308,616.82 tokens/sec
2024-12-07 00:18:29,244 - src.model_assessment.validation - INFO - Step (8300). Val Loss: 2.9943
2024-12-07 00:18:29,245 - __main__ - INFO - Step 8300: Time: 1696.95 ms. LR: 1.3658e-03. Avg. loss: 3.0736. Perplexity: 21.6199. Grad Norm: 0.0921. Throughput: 308,958.28 tokens/sec
2024-12-07 00:19:34,773 - src.data_processing.training_data_loader - INFO - Next shard key to use: 5. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000006_t_98852760.npy
2024-12-07 00:19:45,415 - __main__ - INFO - Step 8350: Time: 1698.91 ms. LR: 1.3581e-03. Avg. loss: 3.0405. Perplexity: 20.9166. Grad Norm: 0.0697. Throughput: 308,601.71 tokens/sec
2024-12-07 00:21:08,773 - src.model_assessment.validation - INFO - Step (8400). Val Loss: 2.9911
2024-12-07 00:21:08,774 - __main__ - INFO - Step 8400: Time: 1697.65 ms. LR: 1.3504e-03. Avg. loss: 3.0310. Perplexity: 20.7173. Grad Norm: 0.0738. Throughput: 308,832.41 tokens/sec
2024-12-07 00:22:24,514 - __main__ - INFO - Step 8450: Time: 1699.31 ms. LR: 1.3427e-03. Avg. loss: 3.0645. Perplexity: 21.4233. Grad Norm: 0.0722. Throughput: 308,530.61 tokens/sec
2024-12-07 00:23:47,843 - src.model_assessment.validation - INFO - Step (8500). Val Loss: 2.9882
2024-12-07 00:24:12,774 - src.model_assessment.hellaswag - INFO - Step (8500). HellaSwag Evaluation Accuracy: 3154/10042 = 31.41%
2024-12-07 00:24:13,140 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “Honey!” or “Thank you, good evening.” This sounds
2024-12-07 00:24:13,140 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say to me, “Listen, it’s time for the game.”
2024-12-07 00:24:13,141 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "Hooroh."
- Acknowledge, say, "I'm coming here."
2024-12-07 00:24:13,141 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something about what kind of animal she was.
Also remember that in most cases a dog is not
2024-12-07 00:24:13,141 - src.model_assessment.sampling - INFO - HTML stands for
In this context, the phrase "I do not know" is "I don't know what
2024-12-07 00:24:13,142 - src.model_assessment.sampling - INFO - HTML stands for Red-Nose Disease in which the person with the disease eats green onions instead of red onions.
2024-12-07 00:24:13,142 - src.model_assessment.sampling - INFO - HTML stands for “open source” in English.
It is a term used to describe any form of
2024-12-07 00:24:13,142 - src.model_assessment.sampling - INFO - HTML stands for “the HyperText Markup Language,” which can be found on most of your software
2024-12-07 00:24:13,152 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig and a few small pieces of clay…
To me, the original design was a machine that mim
2024-12-07 00:24:13,153 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig which were placed directly onto its front paw. He made the little man out of the bark of a
2024-12-07 00:24:13,153 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig into a shape that resembled a bird nest. Now, when the tiny one had filled the nest,
2024-12-07 00:24:13,153 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig underneath.
This clever old machine is a huge deal: it made in Canada, and almost all
2024-12-07 00:24:13,154 - __main__ - INFO - Step 8500: Time: 1697.30 ms. LR: 1.3350e-03. Avg. loss: 3.0519. Perplexity: 21.1547. Grad Norm: 0.0822. Throughput: 308,895.70 tokens/sec
2024-12-07 00:25:01,100 - src.data_processing.training_data_loader - INFO - Next shard key to use: 62. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000063_t_99172879.npy
2024-12-07 00:25:29,148 - __main__ - INFO - Step 8550: Time: 1699.09 ms. LR: 1.3272e-03. Avg. loss: 3.0451. Perplexity: 21.0122. Grad Norm: 0.0828. Throughput: 308,569.62 tokens/sec
2024-12-07 00:26:52,487 - src.model_assessment.validation - INFO - Step (8600). Val Loss: 2.9855
2024-12-07 00:26:52,488 - __main__ - INFO - Step 8600: Time: 1696.74 ms. LR: 1.3195e-03. Avg. loss: 3.0216. Perplexity: 20.5234. Grad Norm: 0.0746. Throughput: 308,998.14 tokens/sec
2024-12-07 00:28:08,215 - __main__ - INFO - Step 8650: Time: 1697.85 ms. LR: 1.3117e-03. Avg. loss: 3.0410. Perplexity: 20.9268. Grad Norm: 0.0693. Throughput: 308,794.59 tokens/sec
2024-12-07 00:29:31,545 - src.model_assessment.validation - INFO - Step (8700). Val Loss: 2.9842
2024-12-07 00:29:31,546 - __main__ - INFO - Step 8700: Time: 1696.69 ms. LR: 1.3039e-03. Avg. loss: 3.0376. Perplexity: 20.8546. Grad Norm: 0.0997. Throughput: 309,007.17 tokens/sec
2024-12-07 00:30:02,975 - src.data_processing.training_data_loader - INFO - Next shard key to use: 0. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000001_t_99059140.npy
2024-12-07 00:30:47,663 - __main__ - INFO - Step 8750: Time: 1697.44 ms. LR: 1.2962e-03. Avg. loss: 3.0053. Perplexity: 20.1922. Grad Norm: 0.0730. Throughput: 308,869.41 tokens/sec
2024-12-07 00:32:11,013 - src.model_assessment.validation - INFO - Step (8800). Val Loss: 2.9817
2024-12-07 00:32:11,014 - __main__ - INFO - Step 8800: Time: 1697.23 ms. LR: 1.2884e-03. Avg. loss: 3.0749. Perplexity: 21.6468. Grad Norm: 0.0673. Throughput: 308,907.46 tokens/sec
2024-12-07 00:33:26,740 - __main__ - INFO - Step 8850: Time: 1698.56 ms. LR: 1.2806e-03. Avg. loss: 3.0973. Perplexity: 22.1388. Grad Norm: 0.0837. Throughput: 308,665.60 tokens/sec
2024-12-07 00:34:50,083 - src.model_assessment.validation - INFO - Step (8900). Val Loss: 2.9792
2024-12-07 00:34:50,084 - __main__ - INFO - Step 8900: Time: 1697.20 ms. LR: 1.2728e-03. Avg. loss: 3.0734. Perplexity: 21.6163. Grad Norm: 0.0792. Throughput: 308,914.06 tokens/sec
2024-12-07 00:35:04,145 - src.data_processing.training_data_loader - INFO - Next shard key to use: 28. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000029_t_100287359.npy
2024-12-07 00:36:06,274 - __main__ - INFO - Step 8950: Time: 1698.37 ms. LR: 1.2649e-03. Avg. loss: 3.1025. Perplexity: 22.2536. Grad Norm: 0.0881. Throughput: 308,701.22 tokens/sec
2024-12-07 00:37:29,611 - src.model_assessment.validation - INFO - Step (9000). Val Loss: 2.9753
2024-12-07 00:37:54,530 - src.model_assessment.hellaswag - INFO - Step (9000). HellaSwag Evaluation Accuracy: 3179/10042 = 31.66%
2024-12-07 00:37:54,896 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say it if I spoke to him.
When a dog has the “fear-free-
2024-12-07 00:37:54,896 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Oh, what is that you are talking about?” If they can hear me
2024-12-07 00:37:54,896 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say what he wants. This means he can’t feel what he wants; it’s
2024-12-07 00:37:54,897 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say
1"Don't scream for a dog, I won't talk for your dog. It would
2024-12-07 00:37:54,900 - src.model_assessment.sampling - INFO - HTML stands for “State of the World English.”
Since its humble beginnings, the English language has
2024-12-07 00:37:54,900 - src.model_assessment.sampling - INFO - HTML stands for the fact that it was designed to be easy to use, maintainable, and reliable, and it
2024-12-07 00:37:54,900 - src.model_assessment.sampling - INFO - HTML stands for "Web Page Information System" or "Web Application System" which is also known as web application system
2024-12-07 00:37:54,900 - src.model_assessment.sampling - INFO - HTML stands for "information services." This is a category under which an application runs, not the entire application (A
2024-12-07 00:37:54,909 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig.
When the sun began to set, a creature like a wolf saw a man that had a
2024-12-07 00:37:54,909 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. That was the idea. It was not very long before other clever people were able to build a
2024-12-07 00:37:54,909 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig of wood. The animal didn't resemble the one pictured above, however, but it did resemble a
2024-12-07 00:37:54,910 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. On the inside the bird's face came four golden eyes in circles and a few tiny droppings
2024-12-07 00:37:54,911 - __main__ - INFO - Step 9000: Time: 1696.89 ms. LR: 1.2571e-03. Avg. loss: 3.0329. Perplexity: 20.7577. Grad Norm: 0.0727. Throughput: 308,969.79 tokens/sec
2024-12-07 00:39:10,488 - __main__ - INFO - Step 9050: Time: 1697.66 ms. LR: 1.2493e-03. Avg. loss: 3.0656. Perplexity: 21.4468. Grad Norm: 0.0750. Throughput: 308,829.85 tokens/sec
2024-12-07 00:40:33,802 - src.model_assessment.validation - INFO - Step (9100). Val Loss: 2.9784
2024-12-07 00:40:33,803 - __main__ - INFO - Step 9100: Time: 1697.26 ms. LR: 1.2414e-03. Avg. loss: 3.0474. Perplexity: 21.0610. Grad Norm: 0.0940. Throughput: 308,902.04 tokens/sec
2024-12-07 00:40:34,198 - src.data_processing.training_data_loader - INFO - Next shard key to use: 7. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000008_t_98785750.npy
2024-12-07 00:41:49,958 - __main__ - INFO - Step 9150: Time: 1699.27 ms. LR: 1.2336e-03. Avg. loss: 3.0830. Perplexity: 21.8232. Grad Norm: 0.0724. Throughput: 308,537.06 tokens/sec
2024-12-07 00:43:13,314 - src.model_assessment.validation - INFO - Step (9200). Val Loss: 2.9707
2024-12-07 00:43:13,315 - __main__ - INFO - Step 9200: Time: 1698.18 ms. LR: 1.2257e-03. Avg. loss: 2.9920. Perplexity: 19.9264. Grad Norm: 0.0746. Throughput: 308,735.76 tokens/sec
2024-12-07 00:44:29,017 - __main__ - INFO - Step 9250: Time: 1698.45 ms. LR: 1.2179e-03. Avg. loss: 3.0129. Perplexity: 20.3472. Grad Norm: 0.0698. Throughput: 308,686.23 tokens/sec
2024-12-07 00:45:26,953 - src.data_processing.training_data_loader - INFO - Next shard key to use: 76. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000077_t_99364791.npy
2024-12-07 00:45:52,758 - src.model_assessment.validation - INFO - Step (9300). Val Loss: 2.9771
2024-12-07 00:45:52,759 - __main__ - INFO - Step 9300: Time: 1697.02 ms. LR: 1.2100e-03. Avg. loss: 2.9884. Perplexity: 19.8544. Grad Norm: 0.0914. Throughput: 308,945.91 tokens/sec
2024-12-07 00:47:08,462 - __main__ - INFO - Step 9350: Time: 1698.43 ms. LR: 1.2022e-03. Avg. loss: 2.9794. Perplexity: 19.6761. Grad Norm: 0.0774. Throughput: 308,690.43 tokens/sec
2024-12-07 00:48:31,810 - src.model_assessment.validation - INFO - Step (9400). Val Loss: 2.9669
2024-12-07 00:48:31,811 - __main__ - INFO - Step 9400: Time: 1697.10 ms. LR: 1.1943e-03. Avg. loss: 3.0264. Perplexity: 20.6227. Grad Norm: 0.0823. Throughput: 308,932.11 tokens/sec
2024-12-07 00:49:47,507 - __main__ - INFO - Step 9450: Time: 1696.95 ms. LR: 1.1864e-03. Avg. loss: 3.0144. Perplexity: 20.3773. Grad Norm: 0.0757. Throughput: 308,959.33 tokens/sec
2024-12-07 00:50:29,540 - src.data_processing.training_data_loader - INFO - Next shard key to use: 72. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000073_t_98249388.npy
2024-12-07 00:51:11,300 - src.model_assessment.validation - INFO - Step (9500). Val Loss: 2.9644
2024-12-07 00:51:36,958 - src.model_assessment.hellaswag - INFO - Step (9500). HellaSwag Evaluation Accuracy: 3208/10042 = 31.95%
2024-12-07 00:51:37,325 - src.model_assessment.sampling - INFO - HTML stands for “the complete documentation: documentation, documentation, documentation, documentation, documentation, documentation.”
2024-12-07 00:51:37,325 - src.model_assessment.sampling - INFO - HTML stands for Simple English Language, a language that allows you to understand and speak English using the basic English expressions.
2024-12-07 00:51:37,325 - src.model_assessment.sampling - INFO - HTML stands for data and it is the data that provides information about a system. It may contain useful information about the
2024-12-07 00:51:37,326 - src.model_assessment.sampling - INFO - HTML stands for “Hypertext Markup Language,” which means that HTML contains all the HTML tags contained
2024-12-07 00:51:37,326 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Yes I do,” and I would let them know. But they don
2024-12-07 00:51:37,327 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something that I don't think I am going to say.”
“I am just
2024-12-07 00:51:37,327 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "Well there's something wrong with me!" or, "Let's try it again." I
2024-12-07 00:51:37,327 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "Hi! My son, my son, why don't you be sad? I never need
2024-12-07 00:51:37,335 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig! The bird used this for his nest and had two sets of feathers!
A third team,
2024-12-07 00:51:37,336 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig from some fruit.
When he touched that part of the string, his finger caught on a piece
2024-12-07 00:51:37,336 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. They could then run underneath it to search for prey.
A very clever fox made a small
2024-12-07 00:51:37,336 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig at the top to keep the tail from snapping. The spider flew with the feather and the insect was
2024-12-07 00:51:37,337 - __main__ - INFO - Step 9500: Time: 1697.78 ms. LR: 1.1785e-03. Avg. loss: 3.0357. Perplexity: 20.8156. Grad Norm: 0.0749. Throughput: 308,808.82 tokens/sec
2024-12-07 00:52:52,856 - __main__ - INFO - Step 9550: Time: 1697.40 ms. LR: 1.1707e-03. Avg. loss: 3.0303. Perplexity: 20.7028. Grad Norm: 0.0727. Throughput: 308,876.31 tokens/sec
2024-12-07 00:54:16,190 - src.model_assessment.validation - INFO - Step (9600). Val Loss: 2.9614
2024-12-07 00:54:16,191 - __main__ - INFO - Step 9600: Time: 1697.59 ms. LR: 1.1628e-03. Avg. loss: 2.9647. Perplexity: 19.3885. Grad Norm: 0.0699. Throughput: 308,842.12 tokens/sec
2024-12-07 00:55:31,877 - __main__ - INFO - Step 9650: Time: 1697.58 ms. LR: 1.1549e-03. Avg. loss: 2.9882. Perplexity: 19.8489. Grad Norm: 0.0687. Throughput: 308,844.16 tokens/sec
2024-12-07 00:55:54,224 - src.data_processing.training_data_loader - INFO - Next shard key to use: 96. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000097_t_98942070.npy
2024-12-07 00:56:55,642 - src.model_assessment.validation - INFO - Step (9700). Val Loss: 2.9587
2024-12-07 00:56:55,642 - __main__ - INFO - Step 9700: Time: 1697.29 ms. LR: 1.1470e-03. Avg. loss: 3.0019. Perplexity: 20.1246. Grad Norm: 0.0760. Throughput: 308,896.40 tokens/sec
2024-12-07 00:58:11,328 - __main__ - INFO - Step 9750: Time: 1697.62 ms. LR: 1.1392e-03. Avg. loss: 3.0132. Perplexity: 20.3515. Grad Norm: 0.0816. Throughput: 308,836.79 tokens/sec
2024-12-07 00:59:34,675 - src.model_assessment.validation - INFO - Step (9800). Val Loss: 2.9568
2024-12-07 00:59:34,676 - __main__ - INFO - Step 9800: Time: 1698.90 ms. LR: 1.1313e-03. Avg. loss: 3.0162. Perplexity: 20.4142. Grad Norm: 0.0820. Throughput: 308,604.74 tokens/sec
2024-12-07 01:00:50,417 - __main__ - INFO - Step 9850: Time: 1699.55 ms. LR: 1.1234e-03. Avg. loss: 3.0061. Perplexity: 20.2077. Grad Norm: 0.0717. Throughput: 308,486.34 tokens/sec
2024-12-07 01:00:55,371 - src.data_processing.training_data_loader - INFO - Next shard key to use: 32. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000033_t_99950025.npy
2024-12-07 01:02:14,180 - src.model_assessment.validation - INFO - Step (9900). Val Loss: 2.9546
2024-12-07 01:02:14,181 - __main__ - INFO - Step 9900: Time: 1697.54 ms. LR: 1.1156e-03. Avg. loss: 3.0206. Perplexity: 20.5037. Grad Norm: 0.0833. Throughput: 308,851.23 tokens/sec
2024-12-07 01:03:29,916 - __main__ - INFO - Step 9950: Time: 1699.42 ms. LR: 1.1077e-03. Avg. loss: 3.0281. Perplexity: 20.6587. Grad Norm: 0.0656. Throughput: 308,510.66 tokens/sec
2024-12-07 01:04:53,271 - src.model_assessment.validation - INFO - Step (10000). Val Loss: 2.9525
2024-12-07 01:05:18,325 - src.model_assessment.hellaswag - INFO - Step (10000). HellaSwag Evaluation Accuracy: 3264/10042 = 32.50%
2024-12-07 01:05:18,692 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say the word. The voice that makes you happy could be heard with a high frequency.
I can
2024-12-07 01:05:18,693 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say and then he would be happy to talk.
But in the case of most domesticated animals,
2024-12-07 01:05:18,693 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, "I’m thinking of you, and I’m thinking of you now."
2024-12-07 01:05:18,693 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say: “I don’t know why I should not.”
How about if
2024-12-07 01:05:18,693 - src.model_assessment.sampling - INFO - HTML stands for the abbreviation of the German word: w-tze-so-so (German for the
2024-12-07 01:05:18,694 - src.model_assessment.sampling - INFO - HTML stands for "Java and C++" and "C++".
|This section related to Java or C
2024-12-07 01:05:18,694 - src.model_assessment.sampling - INFO - HTML stands for “open-source audio.” In your case, most of your audio goes for a
2024-12-07 01:05:18,694 - src.model_assessment.sampling - INFO - HTML stands for "open standard format." A number of organizations have agreed to put a code format, so that each
2024-12-07 01:05:18,699 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig and used it as if he were about to feed the dog. The dog began to think that he
2024-12-07 01:05:18,700 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig that the fox then used to climb out of the animal’s mouth. He used the feathers
2024-12-07 01:05:18,700 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig
The clever fox decided to use the shape of the pigeon to create the cat he wanted and got
2024-12-07 01:05:18,700 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. When he climbed over his wooden platform, he did not care to take a step. He would
2024-12-07 01:05:18,701 - __main__ - INFO - Step 10000: Time: 1696.45 ms. LR: 1.0998e-03. Avg. loss: 3.0032. Perplexity: 20.1504. Grad Norm: 0.0637. Throughput: 309,050.21 tokens/sec
2024-12-07 01:06:24,812 - src.data_processing.training_data_loader - INFO - Next shard key to use: 43. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000044_t_100013364.npy
2024-12-07 01:06:34,710 - __main__ - INFO - Step 10050: Time: 1697.38 ms. LR: 1.0920e-03. Avg. loss: 2.9923. Perplexity: 19.9316. Grad Norm: 0.0751. Throughput: 308,880.34 tokens/sec
2024-12-07 01:07:57,971 - src.model_assessment.validation - INFO - Step (10100). Val Loss: 2.9501
2024-12-07 01:07:57,972 - __main__ - INFO - Step 10100: Time: 1695.07 ms. LR: 1.0841e-03. Avg. loss: 2.9658. Perplexity: 19.4097. Grad Norm: 0.0856. Throughput: 309,302.33 tokens/sec
2024-12-07 01:09:13,674 - __main__ - INFO - Step 10150: Time: 1699.36 ms. LR: 1.0763e-03. Avg. loss: 3.0714. Perplexity: 21.5725. Grad Norm: 0.0745. Throughput: 308,520.53 tokens/sec
2024-12-07 01:10:36,970 - src.model_assessment.validation - INFO - Step (10200). Val Loss: 2.9478
2024-12-07 01:10:36,971 - __main__ - INFO - Step 10200: Time: 1696.16 ms. LR: 1.0684e-03. Avg. loss: 3.0276. Perplexity: 20.6475. Grad Norm: 0.0833. Throughput: 309,102.08 tokens/sec
2024-12-07 01:11:28,854 - src.data_processing.training_data_loader - INFO - Next shard key to use: 70. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000071_t_99054586.npy
2024-12-07 01:11:53,148 - __main__ - INFO - Step 10250: Time: 1700.06 ms. LR: 1.0606e-03. Avg. loss: 3.0361. Perplexity: 20.8243. Grad Norm: 0.0747. Throughput: 308,393.37 tokens/sec
2024-12-07 01:13:16,501 - src.model_assessment.validation - INFO - Step (10300). Val Loss: 2.9463
2024-12-07 01:13:16,502 - __main__ - INFO - Step 10300: Time: 1696.86 ms. LR: 1.0527e-03. Avg. loss: 2.9588. Perplexity: 19.2747. Grad Norm: 0.0808. Throughput: 308,974.87 tokens/sec
2024-12-07 01:14:32,184 - __main__ - INFO - Step 10350: Time: 1699.10 ms. LR: 1.0449e-03. Avg. loss: 3.0230. Perplexity: 20.5530. Grad Norm: 0.0696. Throughput: 308,568.45 tokens/sec
2024-12-07 01:15:55,490 - src.model_assessment.validation - INFO - Step (10400). Val Loss: 2.9428
2024-12-07 01:15:55,491 - __main__ - INFO - Step 10400: Time: 1697.02 ms. LR: 1.0371e-03. Avg. loss: 2.9782. Perplexity: 19.6515. Grad Norm: 0.0830. Throughput: 308,946.30 tokens/sec
2024-12-07 01:16:29,949 - src.data_processing.training_data_loader - INFO - Next shard key to use: 54. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000055_t_100480113.npy
2024-12-07 01:17:11,628 - __main__ - INFO - Step 10450: Time: 1697.71 ms. LR: 1.0293e-03. Avg. loss: 3.0022. Perplexity: 20.1288. Grad Norm: 0.0687. Throughput: 308,821.13 tokens/sec
2024-12-07 01:18:34,947 - src.model_assessment.validation - INFO - Step (10500). Val Loss: 2.9422
2024-12-07 01:18:59,886 - src.model_assessment.hellaswag - INFO - Step (10500). HellaSwag Evaluation Accuracy: 3257/10042 = 32.43%
2024-12-07 01:19:00,254 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say that they were happy and happy. And if I could talk, then, you could say that these
2024-12-07 01:19:00,254 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, `Oh, you see what a big dog can do.'
(9) The dog says
2024-12-07 01:19:00,255 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say,
“I’m not here to talk to animals, I’m here
2024-12-07 01:19:00,255 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “Let’s make some music!”
One of the most common vocal
2024-12-07 01:19:00,258 - src.model_assessment.sampling - INFO - HTML stands for:
In other words, this module is a subset of HTML, which is a combination of text
2024-12-07 01:19:00,259 - src.model_assessment.sampling - INFO - HTML stands for "Open Source Software".
Huge number of free software can be downloaded at www.gnu.
2024-12-07 01:19:00,259 - src.model_assessment.sampling - INFO - HTML stands for “Text Input Code”. The key is the “Text Markup” key
2024-12-07 01:19:00,259 - src.model_assessment.sampling - INFO - HTML stands for “open source” and is owned by Microsoft.
- Google’s parent company
2024-12-07 01:19:00,266 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig; but it was useless since there were no wires and the squirrel got it wrong. The wires were
2024-12-07 01:19:00,267 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. But the result? Not all those details have now to be verified. A new study finds that
2024-12-07 01:19:00,267 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. But by looking at the whole thing, your eye felt puzzled that this particular building might be made
2024-12-07 01:19:00,267 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig-like object. The object, the brain, was filled with something the dog could recognize as a
2024-12-07 01:19:00,268 - __main__ - INFO - Step 10500: Time: 1697.25 ms. LR: 1.0215e-03. Avg. loss: 3.0258. Perplexity: 20.6104. Grad Norm: 0.0750. Throughput: 308,905.25 tokens/sec
2024-12-07 01:20:15,830 - __main__ - INFO - Step 10550: Time: 1698.83 ms. LR: 1.0137e-03. Avg. loss: 2.9913. Perplexity: 19.9121. Grad Norm: 0.0730. Throughput: 308,616.35 tokens/sec
2024-12-07 01:21:39,157 - src.model_assessment.validation - INFO - Step (10600). Val Loss: 2.9400
2024-12-07 01:21:39,158 - __main__ - INFO - Step 10600: Time: 1695.69 ms. LR: 1.0059e-03. Avg. loss: 3.0226. Perplexity: 20.5449. Grad Norm: 0.0896. Throughput: 309,187.78 tokens/sec
2024-12-07 01:22:00,769 - src.data_processing.training_data_loader - INFO - Next shard key to use: 68. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000069_t_99557492.npy
2024-12-07 01:22:55,271 - __main__ - INFO - Step 10650: Time: 1697.76 ms. LR: 9.9813e-04. Avg. loss: 3.0118. Perplexity: 20.3248. Grad Norm: 0.0947. Throughput: 308,810.85 tokens/sec
2024-12-07 01:24:18,565 - src.model_assessment.validation - INFO - Step (10700). Val Loss: 2.9377
2024-12-07 01:24:18,566 - __main__ - INFO - Step 10700: Time: 1696.70 ms. LR: 9.9037e-04. Avg. loss: 2.9999. Perplexity: 20.0835. Grad Norm: 0.0832. Throughput: 309,004.74 tokens/sec
2024-12-07 01:25:34,258 - __main__ - INFO - Step 10750: Time: 1698.63 ms. LR: 9.8262e-04. Avg. loss: 2.9740. Perplexity: 19.5709. Grad Norm: 0.0787. Throughput: 308,653.60 tokens/sec
2024-12-07 01:26:57,640 - src.model_assessment.validation - INFO - Step (10800). Val Loss: 2.9347
2024-12-07 01:26:57,641 - __main__ - INFO - Step 10800: Time: 1698.50 ms. LR: 9.7488e-04. Avg. loss: 2.9646. Perplexity: 19.3865. Grad Norm: 0.0784. Throughput: 308,677.08 tokens/sec
2024-12-07 01:27:03,341 - src.data_processing.training_data_loader - INFO - Next shard key to use: 40. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000041_t_98647850.npy
2024-12-07 01:28:13,787 - __main__ - INFO - Step 10850: Time: 1697.86 ms. LR: 9.6716e-04. Avg. loss: 3.0125. Perplexity: 20.3380. Grad Norm: 0.0652. Throughput: 308,794.03 tokens/sec
2024-12-07 01:29:37,065 - src.model_assessment.validation - INFO - Step (10900). Val Loss: 2.9324
2024-12-07 01:29:37,066 - __main__ - INFO - Step 10900: Time: 1695.74 ms. LR: 9.5944e-04. Avg. loss: 2.9998. Perplexity: 20.0813. Grad Norm: 0.0857. Throughput: 309,179.74 tokens/sec
2024-12-07 01:30:52,751 - __main__ - INFO - Step 10950: Time: 1698.23 ms. LR: 9.5174e-04. Avg. loss: 2.9488. Perplexity: 19.0835. Grad Norm: 0.0732. Throughput: 308,726.88 tokens/sec
2024-12-07 01:31:55,994 - src.data_processing.training_data_loader - INFO - Next shard key to use: 55. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000056_t_99391669.npy
2024-12-07 01:32:16,524 - src.model_assessment.validation - INFO - Step (11000). Val Loss: 2.9304
2024-12-07 01:32:42,062 - src.model_assessment.hellaswag - INFO - Step (11000). HellaSwag Evaluation Accuracy: 3252/10042 = 32.38%
2024-12-07 01:32:42,430 - src.model_assessment.sampling - INFO - HTML stands for the Uniform Resource Identifiers.
The name of this particular name has to do with the way it
2024-12-07 01:32:42,430 - src.model_assessment.sampling - INFO - HTML stands for Internet Content-accessibility guidelines and standards. It defines standards that apply to various formats and allows for
2024-12-07 01:32:42,430 - src.model_assessment.sampling - INFO - HTML stands for Digital Object Identifiers.
In HTML5 the term “tags” refer to a subset
2024-12-07 01:32:42,431 - src.model_assessment.sampling - INFO - HTML stands for an Internet Service Provider.
If you think you'll have a problem accessing a web page on your
2024-12-07 01:32:42,430 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say: “That sounds ridiculous!” And if I didn’t reply, I
2024-12-07 01:32:42,431 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say something like: "Let me see if it's okay," the child would have said something like:
2024-12-07 01:32:42,431 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say “Pair!”
More research is needed, but the results from my research and
2024-12-07 01:32:42,431 - src.model_assessment.sampling - INFO - If animals could talk, my pet would probably say, “No, the animal is still sick, she’s in bed all day,
2024-12-07 01:32:42,438 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. They then went through the design and it produced the object that we have come to call “
2024-12-07 01:32:42,439 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. At first, the robot only appeared to be capable of picking a few pebbles from a
2024-12-07 01:32:42,439 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig. The feather proved useful in attaching the robot's tail — or as simple as a nail, if
2024-12-07 01:32:42,439 - src.model_assessment.sampling - INFO - The clever fox built the strange machine with just a feather, a pebble, and a tiny twig that he cut and folded to make the small device.
As the Fox later explained, the only
2024-12-07 01:32:42,440 - __main__ - INFO - Step 11000: Time: 1696.37 ms. LR: 9.4406e-04. Avg. loss: 2.9605. Perplexity: 19.3074. Grad Norm: 0.0743. Throughput: 309,064.97 tokens/sec
2024-12-07 01:33:57,910 - __main__ - INFO - Step 11050: Time: 1696.08 ms. LR: 9.3639e-04. Avg. loss: 3.0144. Perplexity: 20.3768. Grad Norm: 0.0768. Throughput: 309,117.98 tokens/sec
2024-12-07 01:35:21,177 - src.model_assessment.validation - INFO - Step (11100). Val Loss: 2.9292
2024-12-07 01:35:21,178 - __main__ - INFO - Step 11100: Time: 1697.56 ms. LR: 9.2873e-04. Avg. loss: 3.0112. Perplexity: 20.3110. Grad Norm: 0.1041. Throughput: 308,847.94 tokens/sec
2024-12-07 01:36:36,845 - __main__ - INFO - Step 11150: Time: 1697.07 ms. LR: 9.2110e-04. Avg. loss: 2.9945. Perplexity: 19.9748. Grad Norm: 0.0792. Throughput: 308,937.97 tokens/sec
2024-12-07 01:37:24,187 - src.data_processing.training_data_loader - INFO - Next shard key to use: 48. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000049_t_99566025.npy
2024-12-07 01:38:00,589 - src.model_assessment.validation - INFO - Step (11200). Val Loss: 2.9266
2024-12-07 01:38:00,590 - __main__ - INFO - Step 11200: Time: 1696.55 ms. LR: 9.1347e-04. Avg. loss: 3.0250. Perplexity: 20.5936. Grad Norm: 0.0853. Throughput: 309,032.49 tokens/sec
2024-12-07 01:39:16,296 - __main__ - INFO - Step 11250: Time: 1699.27 ms. LR: 9.0587e-04. Avg. loss: 3.0047. Perplexity: 20.1799. Grad Norm: 0.0723. Throughput: 308,536.50 tokens/sec
2024-12-07 01:40:39,671 - src.model_assessment.validation - INFO - Step (11300). Val Loss: 2.9216
2024-12-07 01:40:39,672 - __main__ - INFO - Step 11300: Time: 1697.73 ms. LR: 8.9828e-04. Avg. loss: 2.9649. Perplexity: 19.3931. Grad Norm: 0.0820. Throughput: 308,817.58 tokens/sec
2024-12-07 01:41:55,373 - __main__ - INFO - Step 11350: Time: 1697.43 ms. LR: 8.9071e-04. Avg. loss: 2.9617. Perplexity: 19.3314. Grad Norm: 0.0733. Throughput: 308,871.06 tokens/sec
2024-12-07 01:42:26,790 - src.data_processing.training_data_loader - INFO - Next shard key to use: 80. shard_file_path: /home/MyLLM/temp_data/edu_fineweb10B/edufineweb_i_000081_t_98676258.npy
2024-12-07 01:43:19,083 - src.model_assessment.validation - INFO - Step (11400). Val Loss: 2.9186
2024-12-07 01:43:19,083 - __main__ - INFO - Step 11400: Time: 1696.39 ms. LR: 8.8316e-04. Avg. loss: 3.0333. Perplexity: 20.7654. Grad Norm: 0.0745. Throughput: 309,060.50 tokens/sec
2024-12-07 01:44:22,637 - src.utils.root - INFO - Creating dir: /home/MyLLM/temp_data/checkpoints
2024-12-07 01:44:22,637 - src.utils.root - INFO - Creating file: /home/MyLLM/temp_data/checkpoints/checkpoint_step_11442_date_2024_12_07-01_44_UTC.pth
2024-12-07 01:44:28,794 - src.model_utils.checkpoint_utils - INFO - Checkpoint saved at step 11442.
2024-12-07 01:44:40,893 - __main__ - INFO - Step 11450: Time: 1694.66 ms. LR: 8.7562e-04. Avg. loss: 2.8939. Perplexity: 18.0634. Grad Norm: 0.0752. Throughput: 309,375.96 tokens/sec