Training RoPE ViT #2557

Answered by rwightman
sinahmr asked this question in Q&A
Jul 25, 2025 · 1 comment · 2 replies
@sinahmr yes, currently all of the ViT models with RoPE embeddings in timm are based on the EVA model (eva.py). It was the first to include RoPE embeddings, and it has since been extended to support several variants. It's essentially a ViT with RoPE support (with an option for absolute position embeddings) and an optional SwiGLU MLP. The comments in the file have references for the model sources, papers, etc. There are also several timm definitions that I trained with registers, such as:

vit_base_patch16_rope_reg1_gap_256.sbb_in1k
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k

All of the models are a bit different, even at the same 'size' range. …
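
As a rough illustration of how the names listed above could be used as a starting point for training, here is a minimal sketch. It assumes a timm release recent enough to include these definitions. The helpers `split_model_name` and `build_model` and the 10-class head are hypothetical and only for illustration; `timm.create_model` is the only timm call used.

```python
import importlib.util

# Model names copied from the list above; timm names follow the
# "<architecture>.<pretrained_tag>" convention.
ROPE_REG_VITS = [
    "vit_base_patch16_rope_reg1_gap_256.sbb_in1k",
    "vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k",
    "vit_medium_patch16_rope_reg1_gap_256.sbb_in1k",
    "vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k",
]


def split_model_name(name: str) -> tuple[str, str]:
    """Split a timm model name into (architecture, pretrained tag)."""
    arch, _, tag = name.partition(".")
    return arch, tag


def build_model(name: str, num_classes: int = 1000):
    """Create a randomly initialised model (hypothetical helper)."""
    import timm  # imported lazily so the helpers above work without timm

    # pretrained=False skips the weight download, i.e. training from scratch.
    return timm.create_model(name, pretrained=False, num_classes=num_classes)


if __name__ == "__main__":
    for name in ROPE_REG_VITS:
        print(split_model_name(name))

    if importlib.util.find_spec("timm") is not None:
        try:
            model = build_model(ROPE_REG_VITS[0], num_classes=10)
        except RuntimeError as err:
            # Older timm releases may not register these definitions.
            print(f"model not available in this timm version: {err}")
        else:
            n_params = sum(p.numel() for p in model.parameters())
            # The module path shows which file defines the model class.
            print(type(model).__module__, f"{n_params / 1e6:.1f}M params")
```

Passing `pretrained=True` instead would load the released weights for fine-tuning, assuming they can be downloaded in your environment.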

Answer selected by sinahmr