Why a locked model? #20
Replies: 3 comments 2 replies
The 2am LoRA tune was an old technique we have since moved away from (it was in v1). We moved to the geometric lens because it introduces new ways to work with the data the model already has rather than requiring an entire retrain, which is a lot more efficient. I also chose a frozen model because it demonstrates how much the model already knows without any fine-tuning or retraining, and it allows for solid ablation and benchmarking across each infra version. But feel free to make it your own! I will mention that it's not completely model-agnostic yet: it's straightforward with the llama++ server, but you would still need to retrain c(x) and g(x) to match the correct dimensions of your model, and adjust the config and any templates.
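To make the dimension-matching point concrete, here's a minimal sketch (all names and sizes are hypothetical, not the repo's actual c(x)/g(x) code): a head trained against one base model's hidden size simply cannot consume hidden states of another size, which is why swapping base models means retraining those heads.

```python
# Hypothetical illustration of why c(x)/g(x) are tied to a base model's
# hidden size. Nothing here is the project's real implementation.

def make_head(hidden_dim, out_dim):
    """Return a toy linear head: a (hidden_dim x out_dim) weight matrix."""
    return [[0.0] * out_dim for _ in range(hidden_dim)]

def apply_head(weight, x):
    """Compute y = x @ W; fail loudly if x's size doesn't match the head."""
    hidden_dim, out_dim = len(weight), len(weight[0])
    if len(x) != hidden_dim:
        raise ValueError(
            f"head expects {hidden_dim}-dim hidden states, got {len(x)}; "
            "retrain c(x)/g(x) when you swap base models"
        )
    return [sum(x[i] * weight[i][j] for i in range(hidden_dim))
            for j in range(out_dim)]

# A head sized for a hypothetical 4096-dim model works on 4096-dim states...
c = make_head(4096, 64)
print(len(apply_head(c, [0.0] * 4096)))   # 64

# ...but rejects hidden states from a hypothetical 3072-dim model:
try:
    apply_head(c, [0.0] * 3072)
except ValueError as e:
    print("mismatch:", e)
```

The same logic is why the quantization level alone doesn't matter here: quantization changes weight precision, not tensor shapes, so the heads only break when the base model's hidden size changes.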
You know, I thought I had it working, but seeing your post I realized I didn't retrain anything... I think q4_k_m uses the same embedding dimensions.
I'm reasonably sure that my quant has the same weights and doesn't require training, but I'm having an issue related to either token length or Atlas' ability to save files, where the software never 'passes'. It does present code, but I suspect it just does a normal qwen build and picks the best result without the layers working properly. Still working on diagnostics at the moment; the benchmark is some time off. Fun project though.
On Thu, Apr 9, 2026, 8:02 PM Johnathon Isaac Tigges <***@***.***> wrote:
I have yet to bench the 9B, and would love to bench larger models to see
what type of scaling laws are associated with the infrastructure. I doubt
it's linear, but I am curious!
Keep me updated with results!
I've been running some tweaks to get it running on a 5070, like using a slightly quantized model and reducing token length.
Can I ask why you use a locked model? I thought I saw a reference to retraining at 2am; is that something automatic or user-initiated?