Thanks for sharing this great project. I have a question about the 2D pixel shuffle. vit_embeds has the shape [N, L, C]. It is first reshaped to [N, h, w, C], then pixel-shuffled in two dimensions (reshape, permute, contiguous) to [N, h/2, w/2, C*4], and finally reshaped to [N, L/4, C*4] for further use. Why not apply the pixel shuffle directly to vit_embeds, going from [N, L, C] to [N, L/4, C*4]? If we modify the inference code to use this pixel shuffle, will the change have a significant influence on performance?
InternVL/internvl_chat/internvl/model/internvl_chat/modeling_internvl_chat.py
Line 287 in 34a8100
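To make the two options in the question concrete, here is a minimal pure-Python sketch of which input token indices end up merged into each output token. It is not the repository's code: `groups_2d` and `groups_flat` are hypothetical helper names, the 2x2 block size and row-major token order are assumptions, and the order of tokens inside each group may differ from what the actual implementation produces.

```python
def groups_2d(h, w, r=2):
    """Indices merged per output token when r x r spatial blocks are folded
    into channels (assumes row-major token order: index = row * w + col)."""
    groups = []
    for i in range(0, h, r):
        for j in range(0, w, r):
            groups.append(
                [(i + di) * w + (j + dj) for di in range(r) for dj in range(r)]
            )
    return groups


def groups_flat(h, w, k=4):
    """Indices merged per output token when [N, L, C] is reshaped straight
    to [N, L/k, C*k]: every k consecutive tokens are concatenated."""
    length = h * w
    return [list(range(start, start + k)) for start in range(0, length, k)]


if __name__ == "__main__":
    h, w = 4, 4  # small hypothetical grid, L = 16
    print("2D  :", groups_2d(h, w))
    print("flat:", groups_flat(h, w))
```

Both variants yield L/4 groups, so the output shapes match, but the 2D version merges a 2x2 spatial neighbourhood while the flat version merges four horizontally adjacent tokens from one row. The sketch only shows that the token grouping differs; it says nothing about how much that affects model quality.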