Training NesT from scratch (on CIFAR10) #751

Answered by alexander-soare
cjsg asked this question in Q&A

@cjsg having had a glance at what you've done, here are some things that might get you a lot closer:

  1. Probably most importantly, you need patch_size=1. It's a little confusing when comparing to the initial implementation, because there patch_size refers to the size of a "block" (as the term is used in the paper) in units of "patches". To be clear, the official -> timm mapping is:
  • patch_size -> block_size as determined here
  • init_patch_embed_size -> patch_size as set in the model_kwargs
    Some sanity checks to make sure we're on the same page: your image size is 32x32. You set num_levels to 4, which means the first hierarchical level has an 8x8 grid of "blocks", each covering 4x4 pixels. And your patch size is 1x1, …
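
For anyone who wants to redo that arithmetic for other settings, here is a minimal sketch in plain Python. The helper name `nest_layout` and its return keys are hypothetical (they are not part of timm), and the derivation is an assumption: each level is taken to merge 2x2 blocks, so the first level has 4^(num_levels - 1) blocks, and block_size is derived from img_size, patch_size and num_levels rather than set directly.

```python
import math


def nest_layout(img_size: int, patch_size: int, num_levels: int) -> dict:
    """Hypothetical helper: derive the first-level block layout of a NesT-style model.

    Assumes every level merges 2x2 blocks, so level 0 has 4 ** (num_levels - 1) blocks.
    """
    # Blocks per level, finest level first: 4^(L-1), ..., 4, 1
    num_blocks = [4 ** i for i in reversed(range(num_levels))]

    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    patches_per_side = img_size // patch_size

    # The first level arranges its blocks in a square grid
    blocks_per_side = math.isqrt(num_blocks[0])
    if patches_per_side % blocks_per_side != 0:
        raise ValueError("patch grid must divide evenly into blocks")

    # Block size is measured in patches, not pixels
    block_size = patches_per_side // blocks_per_side

    return {
        "num_blocks": num_blocks,
        "blocks_per_side": blocks_per_side,
        "block_size": block_size,
        "pixels_per_block_side": block_size * patch_size,
    }


# The CIFAR10 setting discussed above: 32x32 images, 1x1 patches, 4 levels
layout = nest_layout(img_size=32, patch_size=1, num_levels=4)
print(layout)
```

With the settings from this thread this gives an 8x8 grid of blocks at the first level, each 4 patches (and 4 pixels) on a side, which matches the numbers quoted in the answer.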

Replies: 1 comment 7 replies

Answer selected by cjsg