
fix: Qwen2-VL generate with inputs_embeds #35466

Merged

Conversation

minostauros
Contributor

@minostauros minostauros commented Dec 31, 2024

What does this PR do?

Fixes #35463
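
For reference, a minimal end-to-end example of the use case this PR targets, written as a hedged sketch (the checkpoint name and the text-only prompt are illustrative, not taken from the PR): building inputs_embeds manually and calling generate with them, which per the linked issue failed because the position ids were derived from input_ids.

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

inputs = processor(
    text=["Describe large language models in one sentence."], return_tensors="pt"
).to(model.device)

# Build the embeddings by hand and drop input_ids; the point of the fix is that
# generate() should work when only inputs_embeds (plus attention_mask) is given.
inputs_embeds = model.get_input_embeddings()(inputs.input_ids)
output = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=20,
)
print(processor.batch_decode(output, skip_special_tokens=True))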

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@zucchini-nlp

@minostauros
Contributor Author

The failing test_contrastive_generate might be related to how self.rope_deltas is initialized.
What do you think?
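
To make the concern concrete, here is a minimal sketch (hypothetical class, not the actual transformers code) of what initializing rope_deltas up front could look like; contrastive search calls forward more than once with the same inputs, so the attribute should start from a known value rather than whatever a previous call left behind.

from torch import nn

class Qwen2VLSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Start from a known state so a value cached by an earlier forward pass
        # cannot leak into a later one (relevant for contrastive search, which
        # re-runs forward with the same inputs).
        self.rope_deltas = None

    def forward(self, position_ids=None):
        if position_ids is None and self.rope_deltas is None:
            # First (prefill) call: compute and cache the deltas; placeholder value here.
            self.rope_deltas = 0
        return self.rope_deltas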

Member

@zucchini-nlp zucchini-nlp left a comment

@minostauros hey, thanks for submitting a PR!

Unfortunately, we'd like to keep the RoPE delta related code within the forward method, as it has been causing errors in some generation techniques. The failing tests are also related to the rope deltas: some techniques call the model's forward several times with the same inputs, and thus the position ids might not be correct in such cases. Is there any way to fix this by changing only the forward code?

I suggest refactoring get_rope_index a bit and removing the dependency on input_ids as a required argument when no video/image is passed
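
A rough sketch of the direction suggested here, assuming the method in question is get_rope_index (the helper name and fallback logic below are illustrative, not the PR's actual diff): when no image or video grid is passed, the position ids can be derived from the attention mask or sequence length alone, so input_ids no longer has to be required.

import torch

def get_rope_index_sketch(input_ids=None, inputs_embeds=None, attention_mask=None,
                          image_grid_thw=None, video_grid_thw=None):
    # Text-only path: no multimodal rope deltas, positions are just 0..seq_len-1,
    # so nothing here actually needs input_ids if embeddings or a mask are given.
    if image_grid_thw is None and video_grid_thw is None:
        if attention_mask is not None:
            position_ids = attention_mask.long().cumsum(-1) - 1
            position_ids.masked_fill_(attention_mask == 0, 1)
        else:
            seq_len = (input_ids if input_ids is not None else inputs_embeds).shape[1]
            position_ids = torch.arange(seq_len).unsqueeze(0)
        # Qwen2-VL uses 3D (temporal/height/width) positions: expand to (3, batch, seq).
        position_ids = position_ids.unsqueeze(0).expand(3, -1, -1)
        rope_deltas = torch.zeros(position_ids.shape[1], 1, dtype=torch.long)
        return position_ids, rope_deltas
    # The multimodal path (images/videos present) is omitted in this sketch.
    raise NotImplementedError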

@minostauros minostauros force-pushed the fix/#35463-qwen2-vl-inputs_embeds branch 3 times, most recently from 8a22d10 to d1541d7, on January 7, 2025 02:06
@minostauros
Contributor Author

Fixed the failing tests.
One small concern is that similar code is now duplicated in forward and prepare_inputs_for_generation, but without the code in prepare_inputs_for_generation my newly added test fails (a sketch of this duplication follows below).

I suggest refactoring get_rope_index a bit and removing the dependency on input_ids as a required argument when no video/image is passed

Applied in d1541d7
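
For illustration, a hedged sketch of the kind of duplication described above (hypothetical signature and placeholder position computation, not the actual PR diff): on the first decoding step with only embeddings, prepare_inputs_for_generation has to arrange for correct position ids itself, mirroring logic that also lives in forward.

import torch

def prepare_inputs_for_generation_sketch(input_ids=None, inputs_embeds=None,
                                         cache_position=None, attention_mask=None,
                                         position_ids=None, **kwargs):
    first_step = cache_position is None or bool(cache_position[0] == 0)
    if position_ids is None and inputs_embeds is not None and first_step:
        # Same kind of computation that also exists in forward(); duplicated so that
        # step 0 with only embeddings still gets 3D (temporal/height/width) positions.
        seq_len = inputs_embeds.shape[1]
        position_ids = torch.arange(seq_len).view(1, 1, -1).expand(3, 1, -1)
    return {
        "input_ids": input_ids if inputs_embeds is None else None,
        "inputs_embeds": inputs_embeds,
        "position_ids": position_ids,
        "attention_mask": attention_mask,
        "cache_position": cache_position,
        **kwargs,
    }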

@minostauros
Contributor Author

minostauros commented Jan 7, 2025

Additional failing tests

These tests are failing on the main branch as well, so I suppose they are not caused by this PR ;)


⬢ [Docker] ❯ RUN_SLOW=1 pytest -vv tests/models/qwen2_vl/test_modeling_qwen2_vl.py
======================================================================= test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python3.10
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/Github/transformers_qwen2vlfix/.hypothesis/examples'))
rootdir: /workspace/Github/transformers_qwen2vlfix
configfile: pyproject.toml
plugins: hypothesis-6.123.2, xdist-3.6.1, timeout-2.3.1, rich-0.2.0, asyncio-0.23.8, anyio-4.7.0, hydra-core-1.3.2
asyncio: mode=strict
collected 145 items                                                                                                                                               

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search <- ../../../usr/local/lib/python3.10/dist-packages/_pytest/mark/structures.py PASSED [  0%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search_0_random <- tests/generation/test_utils.py PASSED [  1%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search_1_same <- tests/generation/test_utils.py PASSED [  2%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_sample <- tests/generation/test_utils.py PASSED                   [  2%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_with_num_logits_to_keep <- tests/generation/test_utils.py SKIPPED [  3%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attention_outputs <- tests/test_modeling_common.py PASSED                           [  4%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attn_implementation_composite_models <- tests/test_modeling_common.py SKIPPED       [  4%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_batching_equivalence <- tests/test_modeling_common.py PASSED                        [  5%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_sample_generate <- tests/generation/test_utils.py PASSED                       [  6%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_sample_generate_dict_output <- tests/generation/test_utils.py PASSED           [  6%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate <- tests/generation/test_utils.py PASSED                       [  7%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED           [  8%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED [  8%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_low_memory SKIPPED (Qwen2-VL can't do low-memory generation because
position IDs have extra dimension and split function doesn't work for that)                                                                                 [  9%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_can_use_safetensors <- tests/test_modeling_common.py PASSED                         [ 10%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_config PASSED                                                                       [ 11%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_constrained_beam_search_generate <- tests/generation/test_utils.py PASSED           [ 11%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_constrained_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED [ 12%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate <- tests/generation/test_utils.py PASSED                       [ 13%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED [ 13%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate_low_memory <- tests/generation/test_utils.py PASSED            [ 14%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_correct_missing_keys <- tests/test_modeling_common.py PASSED                        [ 15%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_cpu_offload SKIPPED (CPU offload is not yet supported)                              [ 15%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_custom_4d_attention_mask <- tests/test_modeling_common.py SKIPPED                   [ 16%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_determinism <- tests/test_modeling_common.py PASSED                                 [ 17%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_disk_offload_bin SKIPPED (Some undefined behavior encountered with test versions of
this model. Skip for now.)                                                                                                                                  [ 17%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_disk_offload_safetensors SKIPPED (Some undefined behavior encountered with test
versions of this model. Skip for now.)                                                                                                                      [ 18%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_dola_decoding_sample <- tests/generation/test_utils.py PASSED                       [ 19%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_fa2_generate <- tests/generation/test_utils.py FAILED                 [ 20%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_generate <- tests/generation/test_utils.py PASSED                [ 20%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_0_float16 <- tests/test_modeling_common.py PASSED      [ 21%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_1_bfloat16 <- tests/test_modeling_common.py PASSED     [ 22%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_2_float32 <- tests/test_modeling_common.py PASSED      [ 22%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_equivalence_flax_to_pt <- tests/test_modeling_common.py SKIPPED (No Flax model
exists for this class)                                                                                                                                      [ 23%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_equivalence_pt_to_flax <- tests/test_modeling_common.py SKIPPED (No Flax model
exists for this class)                                                                                                                                      [ 24%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_feed_forward_chunking SKIPPED (Feedforward chunking is not yet supported)           [ 24%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids <- tests/test_modeling_common.py SKIPPED [ 25%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_can_dispatch_composite_models <- tests/test_modeling_common.py SKIPPED [ 26%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_fp32_ln <- tests/test_modeling_common.py SKIPPED (test requires
bitsandbytes and torch)                                                                                                                                     [ 26%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_from_config <- tests/test_modeling_common.py PASSED                    [ 27%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_inference_equivalence <- tests/test_modeling_common.py PASSED          [ 28%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_inference_equivalence_right_padding <- tests/test_modeling_common.py PASSED [ 28%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flax_from_pt_safetensors <- tests/test_modeling_common.py SKIPPED (transformers
does not have this model in Flax version yet)                                                                                                               [ 29%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_forward_with_num_logits_to_keep <- tests/test_modeling_common.py SKIPPED (This
model does not support `num_logits_to_keep` argument.)                                                                                                      [ 30%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_from_pretrained_no_checkpoint <- tests/test_modeling_common.py PASSED               [ 31%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_0_forward_only <- tests/generation/test_utils.py PASSED            [ 31%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_1_end_to_end <- tests/generation/test_utils.py FAILED              [ 32%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_fullgraph SKIPPED (Can't compile fullgraph due to dynamic control
flow in `prepare_inputs_for_generate`)                                                                                                                      [ 33%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_continue_from_past_key_values <- tests/generation/test_utils.py PASSED     [ 33%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds <- ../../../usr/local/lib/python3.10/dist-packages/_pytest/mark/structures.py PASSED [ 34%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_0_greedy <- tests/generation/test_utils.py PASSED       [ 35%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_1_beam_search <- tests/generation/test_utils.py PASSED  [ 35%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_with_static_cache SKIPPED (VLMs can't generate from
inputs embeds and pixels. This can be tested as part of bacbone LM, no need to run the tes for VLMs)                                                        [ 36%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_methods_with_num_logits_to_keep <- tests/generation/test_utils.py SKIPPED  [ 37%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_head_masking <- tests/generation/test_utils.py PASSED                 [ 37%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_quant_cache <- tests/generation/test_utils.py SKIPPED (test requires
optimum-quanto)                                                                                                                                             [ 38%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_static_cache <- tests/generation/test_utils.py PASSED                 [ 39%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_without_input_ids <- tests/generation/test_utils.py PASSED                 [ 40%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_gradient_checkpointing_backward_compatibility <- tests/test_modeling_common.py PASSED [ 40%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_gradient_checkpointing_enable_disable <- tests/test_modeling_common.py PASSED       [ 41%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate <- tests/generation/test_utils.py PASSED                            [ 42%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate_dict_outputs <- tests/generation/test_utils.py PASSED               [ 42%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED     [ 43%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_group_beam_search_generate <- tests/generation/test_utils.py PASSED                 [ 44%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_group_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED     [ 44%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning <- tests/test_modeling_common.py SKIPPED (Pruning is not activated)    [ 45%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_integration <- tests/test_modeling_common.py SKIPPED (Pruning is not
activated)                                                                                                                                                  [ 46%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_save_load_from_config_init <- tests/test_modeling_common.py SKIPPED    [ 46%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_save_load_from_pretrained <- tests/test_modeling_common.py SKIPPED     [ 47%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_headmasking <- tests/test_modeling_common.py SKIPPED (Model does not support head
masking)                                                                                                                                                    [ 48%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_hidden_states_output <- tests/test_modeling_common.py PASSED                        [ 48%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inherits_generation_mixin <- tests/generation/test_utils.py PASSED                  [ 49%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_initialization PASSED                                                               [ 50%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inputs_embeds <- tests/test_modeling_common.py PASSED                               [ 51%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inputs_embeds_matches_input_ids <- tests/test_modeling_common.py PASSED             [ 51%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_keep_in_fp32_modules <- tests/test_modeling_common.py SKIPPED (Model class has no
_keep_in_fp32_modules attribute defined)                                                                                                                    [ 52%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_left_padding_compatibility <- tests/generation/test_utils.py PASSED                 [ 53%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_load_save_without_tied_weights <- tests/test_modeling_common.py PASSED              [ 53%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_load_with_mismatched_shapes <- tests/test_modeling_common.py PASSED                 [ 54%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist <- tests/test_modeling_common.py PASSED [ 55%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_mismatched_shapes_have_properly_initialized_weights <- tests/test_modeling_common.py PASSED [ 55%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_mismatching_num_image_tokens PASSED                                                 [ 56%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_get_set_embeddings <- tests/test_modeling_common.py PASSED                    [ 57%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_is_small SKIPPED (We cannot configure to output a smaller model.)             [ 57%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_main_input_name <- tests/test_modeling_common.py PASSED                       [ 58%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_outputs_equivalence <- tests/test_modeling_common.py PASSED                   [ 59%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallel_beam_search <- tests/generation/test_utils.py PASSED                 [ 60%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallel_equal_results <- tests/test_modeling_common.py SKIPPED               [ 60%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallelism SKIPPED (Some undefined behavior encountered with test versions
of this model. Skip for now.)                                                                                                                               [ 61%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallelization <- tests/test_modeling_common.py SKIPPED (test_model_parallel
is set to False)                                                                                                                                            [ 62%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_weights_reload_no_missing_tied_weights <- tests/test_modeling_common.py PASSED [ 62%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_multi_gpu_data_parallel_forward SKIPPED (Got `CUDA error: misaligned address` with
PyTorch 2.0.0.)                                                                                                                                             [ 63%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_0 <- tests/generation/test_utils.py PASSED                         [ 64%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_1 <- tests/generation/test_utils.py PASSED                         [ 64%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_2 <- tests/generation/test_utils.py PASSED                         [ 65%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_offloaded_cache_implementation_0_offloaded <- tests/generation/test_utils.py PASSED [ 66%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_past_key_values_format <- tests/generation/test_utils.py PASSED                     [ 66%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_peft_gradient_checkpointing_enable_disable <- tests/test_modeling_common.py PASSED  [ 67%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_problem_types <- tests/test_modeling_common.py PASSED                               [ 68%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_prompt_lookup_decoding_matches_greedy_search <- tests/generation/test_utils.py PASSED [ 68%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_prompt_lookup_decoding_stops_at_eos <- tests/generation/test_utils.py PASSED        [ 69%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_pt_tf_model_equivalence <- tests/test_modeling_common.py SKIPPED (transformers does
not have TF version of this model yet)                                                                                                                      [ 70%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied <- tests/test_modeling_common.py PASSED                    [ 71%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied_with_deepspeed <- tests/test_modeling_common.py SKIPPED    [ 71%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied_with_deepspeed_multi_gpu <- tests/test_modeling_common.py SKIPPED [ 72%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_position_vector_embeddings <- tests/test_modeling_common.py SKIPPED (Model
does not have position embeddings)                                                                                                                          [ 73%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings <- tests/test_modeling_common.py PASSED                    [ 73%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings_with_deepspeed <- tests/test_modeling_common.py SKIPPED    [ 74%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings_with_deepspeed_multi_gpu <- tests/test_modeling_common.py SKIPPED [ 75%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions <- tests/test_modeling_common.py PASSED        [ 75%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sample_generate <- tests/generation/test_utils.py PASSED                            [ 76%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sample_generate_dict_output <- tests/generation/test_utils.py PASSED                [ 77%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load <- tests/test_modeling_common.py PASSED                                   [ 77%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_fast_init_from_base <- tests/test_modeling_common.py PASSED               [ 78%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_fast_init_to_base <- tests/test_modeling_common.py PASSED                 [ 79%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_keys_to_ignore_on_save <- tests/test_modeling_common.py PASSED            [ 80%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage <- tests/test_modeling_common.py PASSED                 [ 80%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage_checkpoints <- tests/test_modeling_common.py PASSED     [ 81%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage_no_safetensors <- tests/test_modeling_common.py PASSED  [ 82%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_compile_dynamic SKIPPED (Compile not yet supported because in Qwen2VL
models)                                                                                                                                                     [ 82%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_composite_models <- tests/test_modeling_common.py SKIPPED         [ 83%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_non_composite_models <- tests/test_modeling_common.py PASSED      [ 84%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_on_flash SKIPPED (Compile not yet supported because in Qwen2VL
models)                                                                                                                                                     [ 84%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_matches_eager_sliding_window <- tests/test_modeling_common.py SKIPPED (Model
architecture does not support attentions)                                                                                                                   [ 85%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tf_from_pt_safetensors <- tests/test_modeling_common.py SKIPPED (transformers does
not have this model in TF version yet)                                                                                                                      [ 86%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tie_model_weights <- tests/test_modeling_common.py PASSED                           [ 86%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tied_weights_keys <- tests/test_modeling_common.py PASSED                           [ 87%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_compile_for_training <- tests/test_modeling_common.py SKIPPED                 [ 88%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_fx <- tests/test_modeling_common.py SKIPPED (Either torch.fx is not
available, or the model type qwen2_vl is not compatible with torch.fx)                                                                                      [ 88%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_fx_output_loss <- tests/test_modeling_common.py SKIPPED (Either torch.fx is
not available, or the model type qwen2_vl is not compatible with torch.fx)                                                                                  [ 89%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_save_load <- tests/test_modeling_common.py PASSED                             [ 90%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions <- tests/test_modeling_common.py PASSED               [ 91%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state <- tests/test_modeling_common.py PASSED             [ 91%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple <- tests/test_modeling_common.py PASSED                          [ 92%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training <- tests/test_modeling_common.py PASSED                                    [ 93%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing <- tests/test_modeling_common.py PASSED             [ 93%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing_use_reentrant <- tests/test_modeling_common.py PASSED [ 94%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing_use_reentrant_false <- tests/test_modeling_common.py PASSED [ 95%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test FAILED                                           [ 95%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch FAILED                                     [ 96%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_different_resolutions FAILED               [ 97%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_flashatt2 FAILED                           [ 97%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_inputs_embeds PASSED                       [ 98%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image FAILED                            [ 99%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image_flashatt2 FAILED                  [100%]

============================================================================ FAILURES =============================================================================
________________________________________________________ Qwen2VLModelTest.test_eager_matches_fa2_generate _________________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLModelTest testMethod=test_eager_matches_fa2_generate>

    @pytest.mark.flash_attn_test
    @require_flash_attn
    @require_torch_gpu
    @slow
    def test_eager_matches_fa2_generate(self):
        """Tests that generate has equivalent outputs with FA2 and eager attention implementations."""
        # TODO (@joao @raushan) -- this test is failing the output checks on most models, investigate. After fixing,
        # check whether we still need the overwrites
>       self._test_attention_implementation("flash_attention_2")

tests/generation/test_utils.py:2232: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/generation/test_utils.py:2211: in _test_attention_implementation
    res_attn = model_attn.generate(**inputs_dict, **generate_kwargs)
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
src/transformers/generation/utils.py:2254: in generate
    result = self._sample(
src/transformers/generation/utils.py:3253: in _sample
    outputs = self(**model_inputs, return_dict=True)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1636: in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1013: in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, rotary_pos_emb=rotary_pos_emb)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:426: in forward
    hidden_states = hidden_states + self.attn(
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:371: in forward
    attn_output = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen).reshape(
/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:1412: in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:575: in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:901: in forward
    out_padded, softmax_lse, S_dmask, rng_state = _wrapped_flash_attn_varlen_forward(
/usr/local/lib/python3.10/dist-packages/torch/_ops.py:1116: in __call__
    return self._op(*args, **(kwargs or {}))
/usr/local/lib/python3.10/dist-packages/torch/_library/autograd.py:113: in autograd_impl
    result = forward_no_grad(*args, Metadata(keyset, keyword_only_args))
/usr/local/lib/python3.10/dist-packages/torch/_library/autograd.py:40: in forward_no_grad
    result = op.redispatch(keyset & _C._after_autograd_keyset, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_ops.py:721: in redispatch
    return self._handle.redispatch_boxed(keyset, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:324: in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_compile.py:32: in inner
    return disable_fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:632: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:367: in wrapped_fn
    return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

q = tensor([[[-0.1935,  0.0630,  0.0069,  0.0455, -0.0185, -0.1489, -0.0538,
          -0.1925],
         [ 0.0688,  0.078...0675,  0.1012, -0.0778, -0.1439, -0.1144, -0.0988, -0.0464,
          -0.0062]]], device='cuda:0', dtype=torch.float16)
k = tensor([[[-0.0680,  0.1868, -0.1798,  0.0471,  0.1980, -0.0596,  0.1442,
          -0.1207],
         [-0.1365,  0.209...0978,  0.0904,  0.1068,  0.0186,  0.0360,  0.0588,  0.0048,
           0.1798]]], device='cuda:0', dtype=torch.float16)
v = tensor([[[-0.1302,  0.0261, -0.2043,  0.0259, -0.0839, -0.0353,  0.0321,
           0.0205],
         [-0.1661, -0.047...1198, -0.1594,  0.0762,  0.1050,  0.1483,  0.0617,  0.1683,
           0.0405]]], device='cuda:0', dtype=torch.float16)
cu_seqlens_q = tensor([0, 1, 2], dtype=torch.int32), cu_seqlens_k = tensor([0, 1, 2], dtype=torch.int32), max_seqlen_q = 1, max_seqlen_k = 1, dropout_p = 0.0
softmax_scale = 0.3535533905932738, causal = False, window_size_left = -1, window_size_right = -1, softcap = 0.0, alibi_slopes = None, return_softmax = False
block_table = None, leftpad_k = None, seqused_k = None

    @_torch_custom_op_wrapper("flash_attn::_flash_attn_varlen_forward", mutates_args=(), device_types="cuda")
    def _flash_attn_varlen_forward(
        q: torch.Tensor,
        k: torch.Tensor,
        v: torch.Tensor,
        cu_seqlens_q: torch.Tensor,
        cu_seqlens_k: torch.Tensor,
        max_seqlen_q: int,
        max_seqlen_k: int,
        dropout_p: float,
        softmax_scale: float,
        causal: bool,
        window_size_left: int = -1,
        window_size_right: int = -1,
        softcap: float = 0.0,
        alibi_slopes: Optional[torch.Tensor] = None,
        return_softmax: bool = False,
        block_table: Optional[torch.Tensor] = None,
        leftpad_k: Optional[torch.Tensor] = None,
        seqused_k: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        q, k, v = [maybe_contiguous(x) for x in (q, k, v)]
>       out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.varlen_fwd(
            q,
            k,
            v,
            None,
            cu_seqlens_q,
            cu_seqlens_k,
            seqused_k,
            leftpad_k,
            block_table,
            alibi_slopes,
            max_seqlen_q,
            max_seqlen_k,
            dropout_p,
            softmax_scale,
            False,
            causal,
            window_size_left,
            window_size_right,
            softcap,
            return_softmax,
            None,
        )
E       RuntimeError: cu_seqlens_q must be on CUDA

/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:169: RuntimeError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
_______________________________________________________ Qwen2VLModelTest.test_generate_compile_1_end_to_end _______________________________________________________

a = (<tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLModelTest testMethod=test_generate_compile_1_end_to_end>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/generation/test_utils.py:2077: in test_generate_compile
    compiled_outputs.append(model.generate(model_inputs, generation_config=generation_config))
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:465: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:1269: in __call__
    return self._torchdynamo_orig_callable(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:526: in __call__
    return _compile(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:924: in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:666: in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
/usr/local/lib/python3.10/dist-packages/torch/_utils_internal.py:87: in wrapper_function
    return function(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:699: in _compile_inner
    out_code = transform_code_object(code, transform)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py:1322: in transform_code_object
    transformations(instructions, code_options)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:219: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:634: in transform
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:2796: in run
    super().run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/lazy.py:156: in realize_and_forward
    return getattr(self.realize(), name)(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch._dynamo.symbolic_convert.InliningInstructionTranslator object at 0x7f5792190250>
inst = Instruction(opcode=114, opname='POP_JUMP_IF_FALSE', arg=75, argval=150, offset=124, starts_line=1763, is_jump_target=F...ffset=150, starts_line=1768, is_jump_target=True, positions=None, target=None, exn_tab_entry=None), exn_tab_entry=None)

    def inner(self: "InstructionTranslatorBase", inst: Instruction):
        value: VariableTracker = self.pop()
        if (
            config.rewrite_assert_with_torch_assert
            and _detect_and_normalize_assert_statement(self, truth_fn, push)
        ):
            error_msg: VariableTracker = self.pop()
            # Skip over things like `assert True`
            if value.is_python_constant():
                if bool(value.as_python_constant()):
                    return self.jump(inst)
                else:
                    jump_graph_break(self, inst, value)
    
            # TODO maybe should respect DtoH sync intention of users later??
            # Manually insert torch._assert_async instead of python assert and jump over
            # assert related instructions as we don't need them anymore.
    
            # if we see Tensor as assert statement, no need to call scalar_tensor
            if isinstance(value, TensorVariable):
                self.output.create_proxy(
                    "call_function",
                    torch._assert_async,
                    *proxy_args_kwargs((value, error_msg), {}),
                )
                self.jump(inst)
                return
    
            if isinstance(value, SymNodeVariable):
                # if the assertion is normal shape expression.
                # just install guard and bail out.
                sym_expr = value.sym_num
                if not isinstance(sym_expr, torch.SymBool):
                    sym_expr = sym_expr != 0
    
                result = torch.fx.experimental.symbolic_shapes.expect_true(sym_expr)
                if not result:
                    unimplemented(
                        "Assertion failed on symbolic shapes. Did you make sure eager mode succeeds?"
                    )
                self.jump(inst)
                return
    
            scalar_to_tensor_proxy = self.output.create_proxy(
                "call_function", torch.scalar_tensor, *proxy_args_kwargs((value,), {})
            )
    
            scalar_to_tensor = wrap_fx_proxy(
                self,
                scalar_to_tensor_proxy,
                example_value=get_fake_value(scalar_to_tensor_proxy.node, self),
            )
    
            self.output.create_proxy(
                "call_function",
                torch._assert_async,
                *proxy_args_kwargs((scalar_to_tensor, error_msg), {}),
            )
            self.jump(inst)
            return
    
        if value.is_python_constant():
            if truth_fn(value.as_python_constant()):
                if push:
                    self.push(value)
                self.jump(inst)
        elif (
            isinstance(value, (TensorVariable)) and self.should_compile_partial_graph()
        ):
            jump_graph_break(self, inst, value)
        elif isinstance(value, NNModuleVariable):
            # Equivalent of "self.nn_module is not None"
            mod = self.output.get_submodule(value.module_key)
            if truth_fn(mod):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, UnspecializedNNModuleVariable):
            mod = value.value
            if truth_fn(mod):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, UserDefinedObjectVariable):
            try:
                x = value.var_getattr(self, "__bool__")  # type: ignore[arg-type]
            except exc.ObservedAttributeError:
                exc.handle_observed_exception(self)
                # if __bool__ is missing, trying __len__ to infer a truth value.
                try:
                    x = value.var_getattr(self, "__len__")  # type: ignore[arg-type]
                except exc.ObservedAttributeError:
                    exc.handle_observed_exception(self)
                    x = None
    
            # __bool__ or __len__ is function
            if isinstance(x, UserMethodVariable):
                result = x.call_function(self, [], {})  # type: ignore[arg-type]
                if isinstance(result, ConstantVariable) and isinstance(
                    result.value, (bool, int)
                ):
                    if truth_fn(result.value):
                        if push:
                            self.push(value)
                        self.jump(inst)
                else:
                    unimplemented(
                        "generic_jump on UserDefined with __bool__ returning non-constant"
                    )
            # __bool__ or __len__ is non-function or not existed in the user defined object
            else:
                if truth_fn(True):
                    if push:
                        self.push(value)
                    self.jump(inst)
        elif not isinstance(value, TensorVariable) and value.has_unpack_var_sequence(
            self
        ):
            if truth_fn(len(value.unpack_var_sequence(self))):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, SymNodeVariable):
            try:
                eval_result = value.evaluate_expr(self.output)
            except exc.UserError as e:
                if self.should_compile_partial_graph():
                    return jump_graph_break(self, inst, value, extra_msg=f"\n{e}")
                raise
            if truth_fn(eval_result):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, variables.BackwardHookVariable):
            if truth_fn(True):
                if push:
                    self.push(value)
                self.jump(inst)
        else:
            from .source import is_constant_source
    
            if value.source is not None and is_constant_source(value.source):
                if truth_fn(value.get_real_value()):  # type: ignore[attr-defined]
                    if push:
                        self.push(value)
                    self.jump(inst)
            else:
                # TODO link the torch.cond doc later
>               raise exc.UserError(
                    exc.UserErrorType.DYNAMIC_CONTROL_FLOW,
                    "Dynamic control flow is not supported at the moment. Please use "
                    "functorch.experimental.control_flow.cond to explicitly capture the control flow.",
                    case_name="cond_operands",
                )
E               torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands
E               
E               from user code:
E                  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 40, in inner
E                   return fn(*args, **kwargs)
E                 File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
E                   return func(*args, **kwargs)
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 2254, in generate
E                   result = self._sample(
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 3246, in _sample
E                   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1763, in prepare_inputs_for_generation
E                   if cache_position is None or (cache_position is not None and cache_position[0] == 0):
E               
E               Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
E               
E               
E               You can suppress this exception and fall back to eager by setting:
E                   import torch._dynamo
E                   torch._dynamo.config.suppress_errors = True

/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:560: UserError
____________________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test _____________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test>

    @slow
    def test_small_model_integration_test(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
    
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text], images=[self.image], return_tensors="pt")
    
        expected_input_ids = [151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151652, 151655, 151655]  # fmt: skip
        assert expected_input_ids == inputs.input_ids[0].tolist()[:17]
    
        expected_pixel_slice = torch.tensor(
            [
                [0.8792, 0.8792, 0.9084],
                [1.1858, 1.1858, 1.2296],
                [1.2004, 1.2004, 1.2150],
                [1.4340, 1.4340, 1.4194],
                [1.3902, 1.4048, 1.4194],
                [1.5216, 1.5362, 1.5362],
            ],
            dtype=torch.float32,
            device="cpu",
        )
        assert torch.allclose(expected_pixel_slice, inputs.pixel_values[:6, :3], atol=3e-3)
    
        # verify generation
        inputs = inputs.to(torch_device)
    
        output = model.generate(**inputs, max_new_tokens=30)
        EXPECTED_DECODED_TEXT = "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets"
    
>       self.assertEqual(
            self.processor.decode(output[0], skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: 'syst[165 chars]r friendly and intelligent nature, making them popular choices' != 'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       Diff is 685 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:391: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.40it/s]
_________________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch __________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch>

    @slow
    def test_small_model_integration_test_batch(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text], images=[self.image, self.image], return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular choices',
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets'
        ]  # fmt: skip
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[402 chars] friendly and intelligent nature, making them popular choices'] != ['sys[402 chars] friendly and intelligent nature, making them popular pets']
E       
E       First differing element 1:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 786 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:413: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.34it/s]
______________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_different_resolutions _______________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_different_resolutions>

    @slow
    def test_small_model_integration_test_batch_different_resolutions(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        text2 = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        image2 = self.image.resize((224, 224))
        inputs = self.processor(text=[text, text2], images=[self.image, image2], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
        ]
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1024 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:464: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.30it/s]
____________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_flashatt2 _____________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_flashatt2>

    @slow
    @require_flash_attn
    @require_torch_gpu
    def test_small_model_integration_test_batch_flashatt2(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct",
            torch_dtype=torch.bfloat16,
            attn_implementation="flash_attention_2",
            device_map="auto",
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text], images=[self.image, self.image], return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
        ]
    
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1024 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:492: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.32it/s]
_____________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_wo_image _____________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_wo_image>

    @slow
    def test_small_model_integration_test_batch_wo_image(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        messages2 = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ]
        text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text2], images=[self.image], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets',
            'system\nYou are a helpful assistant.\nuser\nWho are you?\nassistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to assist with various tasks and answer questions to the best of my'
        ]  # fmt: skip
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[174 chars] my']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1005 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:440: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.38it/s]
________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_wo_image_flashatt2 ________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_wo_image_flashatt2>

    @slow
    @require_flash_attn
    @require_torch_gpu
    def test_small_model_integration_test_batch_wo_image_flashatt2(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct",
            torch_dtype=torch.bfloat16,
            attn_implementation="flash_attention_2",
            device_map="auto",
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        messages2 = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ]
        text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text2], images=[self.image], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWho are you?\nassistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to answer a wide range of questions and provide information on various topics",
        ]
    
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[184 chars]ics']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1015 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:529: AssertionError
---------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.31it/s]
======================================================================== warnings summary =========================================================================
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attention_outputs
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_attentions` is. When `return_dict_in_generate` is not `True`, `output_attentions` is ignored.
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_batching_equivalence
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_hidden_states_output
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_continue_from_past_key_values
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:606: UserWarning: `pad_token_id` should be positive but got -1. This will cause errors when batch generating, if there is padding. Please set `pad_token_id` explicitly as `model.generation_config.pad_token_id=PAD_TOKEN_ID` to avoid errors in generation
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:2529: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    assert padding_idx < weight.size(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:578: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attention_mask.shape[-1] > target_length:

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/modeling_attn_mask_utils.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    elif sliding_window is None or key_value_length < sliding_window:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================================== short test summary info =====================================================================
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_fa2_generate - RuntimeError: cu_seqlens_q must be on CUDA
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_1_end_to_end - torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture th...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test - AssertionError: 'syst[165 chars]r friendly and intelligent nature, making them popular choices' != 'syst[165 chars]r friendly and intelligent nature, making t...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch - AssertionError: Lists differ: ['sys[402 chars] friendly and intelligent nature, making them popular choices'] != ['sys[402 chars] friendly and intelligent nat...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_different_resolutions - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_flashatt2 - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a...
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image_flashatt2 - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a...
================================================ 8 failed, 93 passed, 44 skipped, 19 warnings in 106.06s (0:01:46) ================================================
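
For context on the test_generate_compile_1_end_to_end failure above: the branch on cache_position[0] == 0 inside prepare_inputs_for_generation depends on tensor data, which torch.compile cannot fold into a single graph. A minimal, hypothetical repro of that class of failure (not code from this repo):

import torch

def step(cache_position: torch.Tensor) -> torch.Tensor:
    # A Python branch whose outcome depends on tensor values: fine in eager
    # mode, but dynamo cannot trace it into one graph.
    if cache_position is None or cache_position[0] == 0:
        return torch.zeros(1)
    return torch.ones(1)

compiled = torch.compile(step, fullgraph=True)
compiled(torch.tensor([0]))  # fails with a dynamic-control-flow / graph-break error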

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks! Yes, the failing tests you mentioned are not related to this PR, and the slow tests sometimes fail unless specific hardware is used.

For the new changes, we can remove the repetition in prepare_inputs with tiny tricks. LMK if that works :)
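
Something along these lines (a rough sketch, not the exact diff of this PR) would keep forward as the single owner of the RoPE-delta logic; it assumes the generic GenerationMixin.prepare_inputs_for_generation already handles input slicing and cache plumbing:

# Sketch of a method on Qwen2VLForConditionalGeneration:
def prepare_inputs_for_generation(self, *args, **kwargs):
    # Reuse the generic GenerationMixin logic for slicing inputs and cache handling.
    model_inputs = super().prepare_inputs_for_generation(*args, **kwargs)
    # Drop position_ids: forward recomputes them via get_rope_index on the
    # prefill step and from the cached self.rope_deltas on decode steps, so the
    # RoPE-delta computation lives in exactly one place.
    model_inputs["position_ids"] = None
    return model_inputs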

@minostauros minostauros force-pushed the fix/#35463-qwen2-vl-inputs_embeds branch 2 times, most recently from b28244c to 9f121fd Compare January 7, 2025 11:13
@minostauros
Contributor Author

Thank you! Applied your suggestions, and I guess it is ready.

@minostauros minostauros force-pushed the fix/#35463-qwen2-vl-inputs_embeds branch from 9f121fd to b06981c Compare January 7, 2025 12:58
Member

@zucchini-nlp zucchini-nlp left a comment

Great, looks good to me! I guess the tests for Qwen2-VL are all passing now?

Will request a second review and we can merge then :)

@minostauros
Contributor Author

minostauros commented Jan 7, 2025

I guess the tests for Qwen2-VL are all passing now?

Integration tests are still failing, just like on the main branch, but the newly added test_generate_from_inputs_embeds is passing.
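
For reference, a minimal usage sketch of the path the new test covers (checkpoint name borrowed from the integration tests; text-only prompt, so no pixel_values): embed the prompt yourself and pass inputs_embeds plus attention_mask to generate.

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

# Embed the prompt manually and generate from the embeddings instead of input_ids.
inputs_embeds = model.get_input_embeddings()(inputs.input_ids)
output = model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=30,
)
print(processor.batch_decode(output, skip_special_tokens=True))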

test report including slow tests
⬢ [Docker] ❯ RUN_SLOW=1 pytest -vv tests/models/qwen2_vl/test_modeling_qwen2_vl.py
================================================================== test session starts ==================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /usr/bin/python3.10
cachedir: .pytest_cache
rootdir: /workspace/Github/transformers_qwen2vlfix
configfile: pyproject.toml
plugins: anyio-4.7.0, hydra-core-1.3.2
collected 144 items                                                                                                                                     

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search <- ../../../usr/local/lib/python3.10/dist-packages/_pytest/mark/structures.py PASSED [  0%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search_0_random <- tests/generation/test_utils.py PASSED [  1%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_matches_greedy_search_1_same <- tests/generation/test_utils.py PASSED [  2%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_sample <- tests/generation/test_utils.py PASSED         [  2%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_assisted_decoding_with_num_logits_to_keep <- tests/generation/test_utils.py SKIPPED [  3%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attention_outputs <- tests/test_modeling_common.py PASSED                 [  4%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attn_implementation_composite_models <- tests/test_modeling_common.py SKIPPED [  4%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_batching_equivalence <- tests/test_modeling_common.py PASSED              [  5%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_sample_generate <- tests/generation/test_utils.py PASSED             [  6%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_sample_generate_dict_output <- tests/generation/test_utils.py PASSED [  6%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate <- tests/generation/test_utils.py PASSED             [  7%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED [  8%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED [  9%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_beam_search_low_memory SKIPPED (Qwen2-VL can't do low-memory generation
because position IDs have extra dimension and split function doesn't work for that)                                                               [  9%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_can_use_safetensors <- tests/test_modeling_common.py PASSED               [ 10%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_config PASSED                                                             [ 11%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_constrained_beam_search_generate <- tests/generation/test_utils.py PASSED [ 11%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_constrained_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED [ 12%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate <- tests/generation/test_utils.py PASSED             [ 13%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED [ 13%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_contrastive_generate_low_memory <- tests/generation/test_utils.py PASSED  [ 14%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_correct_missing_keys <- tests/test_modeling_common.py PASSED              [ 15%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_cpu_offload SKIPPED (CPU offload is not yet supported)                    [ 15%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_custom_4d_attention_mask <- tests/test_modeling_common.py SKIPPED         [ 16%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_determinism <- tests/test_modeling_common.py PASSED                       [ 17%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_disk_offload_bin SKIPPED (Some undefined behavior encountered with test
versions of this model. Skip for now.)                                                                                                            [ 18%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_disk_offload_safetensors SKIPPED (Some undefined behavior encountered
with test versions of this model. Skip for now.)                                                                                                  [ 18%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_dola_decoding_sample <- tests/generation/test_utils.py PASSED             [ 19%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_fa2_generate <- tests/generation/test_utils.py FAILED       [ 20%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_generate <- tests/generation/test_utils.py PASSED      [ 20%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_0_float16 <- tests/test_modeling_common.py PASSED [ 21%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_1_bfloat16 <- tests/test_modeling_common.py PASSED [ 22%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_sdpa_inference_2_float32 <- tests/test_modeling_common.py PASSED [ 22%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_equivalence_flax_to_pt <- tests/test_modeling_common.py SKIPPED (test is
PT+FLAX test)                                                                                                                                     [ 23%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_equivalence_pt_to_flax <- tests/test_modeling_common.py SKIPPED (test is
PT+FLAX test)                                                                                                                                     [ 24%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_feed_forward_chunking SKIPPED (Feedforward chunking is not yet supported) [ 25%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attention_2_padding_matches_padding_free_with_position_ids <- tests/test_modeling_common.py SKIPPED [ 25%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_can_dispatch_composite_models <- tests/test_modeling_common.py SKIPPED [ 26%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_fp32_ln <- tests/test_modeling_common.py SKIPPED (test
requires bitsandbytes and torch)                                                                                                                  [ 27%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_from_config <- tests/test_modeling_common.py PASSED          [ 27%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_inference_equivalence <- tests/test_modeling_common.py PASSED [ 28%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flash_attn_2_inference_equivalence_right_padding <- tests/test_modeling_common.py PASSED [ 29%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_flax_from_pt_safetensors <- tests/test_modeling_common.py SKIPPED (test
is PT+FLAX test)                                                                                                                                  [ 29%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_forward_with_num_logits_to_keep <- tests/test_modeling_common.py SKIPPED  [ 30%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_from_pretrained_no_checkpoint <- tests/test_modeling_common.py PASSED     [ 31%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_0_forward_only <- tests/generation/test_utils.py PASSED  [ 31%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_1_end_to_end <- tests/generation/test_utils.py FAILED    [ 32%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_fullgraph SKIPPED (Can't compile fullgraph due to
dynamic control flow in `prepare_inputs_for_generate`)                                                                                            [ 33%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_continue_from_past_key_values <- tests/generation/test_utils.py PASSED [ 34%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds <- ../../../usr/local/lib/python3.10/dist-packages/_pytest/mark/structures.py PASSED [ 34%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_0_greedy <- tests/generation/test_utils.py PASSED [ 35%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_1_beam_search <- tests/generation/test_utils.py PASSED [ 36%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_from_inputs_embeds_with_static_cache SKIPPED (VLMs can't
generate from inputs embeds and pixels. This can be tested as part of bacbone LM, no need to run the tes for VLMs)                                [ 36%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_methods_with_num_logits_to_keep <- tests/generation/test_utils.py SKIPPED [ 37%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_head_masking <- tests/generation/test_utils.py PASSED       [ 38%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_quant_cache <- tests/generation/test_utils.py SKIPPED (test
requires optimum-quanto)                                                                                                                          [ 38%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_with_static_cache <- tests/generation/test_utils.py PASSED       [ 39%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_without_input_ids <- tests/generation/test_utils.py PASSED       [ 40%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_gradient_checkpointing_backward_compatibility <- tests/test_modeling_common.py PASSED [ 40%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_gradient_checkpointing_enable_disable <- tests/test_modeling_common.py PASSED [ 41%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate <- tests/generation/test_utils.py PASSED                  [ 42%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate_dict_outputs <- tests/generation/test_utils.py PASSED     [ 43%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_greedy_generate_dict_outputs_use_cache <- tests/generation/test_utils.py PASSED [ 43%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_group_beam_search_generate <- tests/generation/test_utils.py PASSED       [ 44%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_group_beam_search_generate_dict_output <- tests/generation/test_utils.py PASSED [ 45%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning <- tests/test_modeling_common.py SKIPPED (Pruning is not
activated)                                                                                                                                        [ 45%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_integration <- tests/test_modeling_common.py SKIPPED         [ 46%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_save_load_from_config_init <- tests/test_modeling_common.py SKIPPED [ 47%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_head_pruning_save_load_from_pretrained <- tests/test_modeling_common.py SKIPPED [ 47%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_headmasking <- tests/test_modeling_common.py SKIPPED (Model does not
support head masking)                                                                                                                             [ 48%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_hidden_states_output <- tests/test_modeling_common.py PASSED              [ 49%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inherits_generation_mixin <- tests/generation/test_utils.py PASSED        [ 50%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_initialization PASSED                                                     [ 50%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inputs_embeds <- tests/test_modeling_common.py PASSED                     [ 51%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_inputs_embeds_matches_input_ids <- tests/test_modeling_common.py PASSED   [ 52%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_keep_in_fp32_modules <- tests/test_modeling_common.py SKIPPED (Model
class has no _keep_in_fp32_modules attribute defined)                                                                                             [ 52%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_left_padding_compatibility <- tests/generation/test_utils.py PASSED       [ 53%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_load_save_without_tied_weights <- tests/test_modeling_common.py PASSED    [ 54%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_load_with_mismatched_shapes <- tests/test_modeling_common.py PASSED       [ 54%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist <- tests/test_modeling_common.py PASSED [ 55%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_mismatched_shapes_have_properly_initialized_weights <- tests/test_modeling_common.py PASSED [ 56%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_mismatching_num_image_tokens PASSED                                       [ 56%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_get_set_embeddings <- tests/test_modeling_common.py PASSED          [ 57%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_is_small SKIPPED (We cannot configure to output a smaller model.)   [ 58%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_main_input_name <- tests/test_modeling_common.py PASSED             [ 59%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_outputs_equivalence <- tests/test_modeling_common.py PASSED         [ 59%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallel_beam_search <- tests/generation/test_utils.py FAILED       [ 60%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallel_equal_results <- tests/test_modeling_common.py SKIPPED     [ 61%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallelism SKIPPED (Some undefined behavior encountered with test
versions of this model. Skip for now.)                                                                                                            [ 61%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallelization <- tests/test_modeling_common.py SKIPPED            [ 62%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_weights_reload_no_missing_tied_weights <- tests/test_modeling_common.py PASSED [ 63%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_multi_gpu_data_parallel_forward SKIPPED (Got `CUDA error: misaligned
address` with PyTorch 2.0.0.)                                                                                                                     [ 63%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_0 <- tests/generation/test_utils.py PASSED               [ 64%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_1 <- tests/generation/test_utils.py PASSED               [ 65%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_new_cache_format_2 <- tests/generation/test_utils.py PASSED               [ 65%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_offloaded_cache_implementation_0_offloaded <- tests/generation/test_utils.py PASSED [ 66%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_past_key_values_format <- tests/generation/test_utils.py PASSED           [ 67%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_peft_gradient_checkpointing_enable_disable <- tests/test_modeling_common.py PASSED [ 68%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_problem_types <- tests/test_modeling_common.py PASSED                     [ 68%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_prompt_lookup_decoding_matches_greedy_search <- tests/generation/test_utils.py PASSED [ 69%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_prompt_lookup_decoding_stops_at_eos <- tests/generation/test_utils.py PASSED [ 70%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_pt_tf_model_equivalence <- tests/test_modeling_common.py SKIPPED (test is
PT+TF test)                                                                                                                                       [ 70%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied <- tests/test_modeling_common.py PASSED          [ 71%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied_with_deepspeed <- tests/test_modeling_common.py SKIPPED [ 72%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_embeddings_untied_with_deepspeed_multi_gpu <- tests/test_modeling_common.py SKIPPED [ 72%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_position_vector_embeddings <- tests/test_modeling_common.py SKIPPED [ 73%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings <- tests/test_modeling_common.py PASSED          [ 74%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings_with_deepspeed <- tests/test_modeling_common.py SKIPPED [ 75%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_resize_tokens_embeddings_with_deepspeed_multi_gpu <- tests/test_modeling_common.py SKIPPED [ 75%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions <- tests/test_modeling_common.py PASSED [ 76%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sample_generate <- tests/generation/test_utils.py PASSED                  [ 77%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sample_generate_dict_output <- tests/generation/test_utils.py PASSED      [ 77%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load <- tests/test_modeling_common.py PASSED                         [ 78%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_fast_init_from_base <- tests/test_modeling_common.py PASSED     [ 79%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_fast_init_to_base <- tests/test_modeling_common.py PASSED       [ 79%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_keys_to_ignore_on_save <- tests/test_modeling_common.py PASSED  [ 80%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage <- tests/test_modeling_common.py PASSED       [ 81%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage_checkpoints <- tests/test_modeling_common.py PASSED [ 81%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_save_load_low_cpu_mem_usage_no_safetensors <- tests/test_modeling_common.py PASSED [ 82%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_compile_dynamic SKIPPED (Compile not yet supported because in
Qwen2VL models)                                                                                                                                   [ 83%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_composite_models <- tests/test_modeling_common.py SKIPPED [ 84%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_non_composite_models <- tests/test_modeling_common.py PASSED [ 84%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_can_dispatch_on_flash SKIPPED (Compile not yet supported because in
Qwen2VL models)                                                                                                                                   [ 85%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_sdpa_matches_eager_sliding_window <- tests/test_modeling_common.py SKIPPED [ 86%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tf_from_pt_safetensors <- tests/test_modeling_common.py SKIPPED (test is
PT+TF test)                                                                                                                                       [ 86%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tie_model_weights <- tests/test_modeling_common.py PASSED                 [ 87%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_tied_weights_keys <- tests/test_modeling_common.py PASSED                 [ 88%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_compile_for_training <- tests/test_modeling_common.py SKIPPED       [ 88%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_fx <- tests/test_modeling_common.py SKIPPED (Either torch.fx is not
available, or the model type qwen2_vl is not compatible with torch.fx)                                                                            [ 89%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_fx_output_loss <- tests/test_modeling_common.py SKIPPED (Either
torch.fx is not available, or the model type qwen2_vl is not compatible with torch.fx)                                                            [ 90%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torch_save_load <- tests/test_modeling_common.py PASSED                   [ 90%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions <- tests/test_modeling_common.py PASSED     [ 91%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state <- tests/test_modeling_common.py PASSED   [ 92%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple <- tests/test_modeling_common.py PASSED                [ 93%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training <- tests/test_modeling_common.py PASSED                          [ 93%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing <- tests/test_modeling_common.py PASSED   [ 94%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing_use_reentrant <- tests/test_modeling_common.py PASSED [ 95%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_training_gradient_checkpointing_use_reentrant_false <- tests/test_modeling_common.py PASSED [ 95%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test FAILED                                 [ 96%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch FAILED                           [ 97%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_different_resolutions FAILED     [ 97%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_flashatt2 FAILED                 [ 98%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image FAILED                  [ 99%]
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image_flashatt2 FAILED        [100%]

======================================================================= FAILURES ========================================================================
___________________________________________________ Qwen2VLModelTest.test_eager_matches_fa2_generate ____________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLModelTest testMethod=test_eager_matches_fa2_generate>

    @pytest.mark.flash_attn_test
    @require_flash_attn
    @require_torch_gpu
    @slow
    def test_eager_matches_fa2_generate(self):
        """Tests that generate has equivalent outputs with FA2 and eager attention implementations."""
        # TODO (@joao @raushan) -- this test is failing the output checks on most models, investigate. After fixing,
        # check whether we still need the overwrites
>       self._test_attention_implementation("flash_attention_2")

tests/generation/test_utils.py:2230: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/generation/test_utils.py:2209: in _test_attention_implementation
    res_attn = model_attn.generate(**inputs_dict, **generate_kwargs)
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
src/transformers/generation/utils.py:2254: in generate
    result = self._sample(
src/transformers/generation/utils.py:3253: in _sample
    outputs = self(**model_inputs, return_dict=True)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1636: in forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1013: in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, rotary_pos_emb=rotary_pos_emb)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:426: in forward
    hidden_states = hidden_states + self.attn(
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:371: in forward
    attn_output = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen).reshape(
/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:1412: in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py:575: in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:901: in forward
    out_padded, softmax_lse, S_dmask, rng_state = _wrapped_flash_attn_varlen_forward(
/usr/local/lib/python3.10/dist-packages/torch/_ops.py:1116: in __call__
    return self._op(*args, **(kwargs or {}))
/usr/local/lib/python3.10/dist-packages/torch/_library/autograd.py:113: in autograd_impl
    result = forward_no_grad(*args, Metadata(keyset, keyword_only_args))
/usr/local/lib/python3.10/dist-packages/torch/_library/autograd.py:40: in forward_no_grad
    result = op.redispatch(keyset & _C._after_autograd_keyset, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_ops.py:721: in redispatch
    return self._handle.redispatch_boxed(keyset, *args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:324: in backend_impl
    result = self._backend_fns[device_type](*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_compile.py:32: in inner
    return disable_fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:632: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_library/custom_ops.py:367: in wrapped_fn
    return fn(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

q = tensor([[[-0.1614,  0.0496,  0.0310,  0.0848, -0.0069,  0.0024, -0.0278,
          -0.2484],
         [ 0.1320,  0.103...1920,  0.0258, -0.0024,  0.0219, -0.1159, -0.2313, -0.0026,
          -0.0903]]], device='cuda:0', dtype=torch.float16)
k = tensor([[[-0.0649,  0.0501, -0.1707,  0.1211,  0.1405, -0.0281,  0.1024,
          -0.0083],
         [ 0.0125,  0.233...1232,  0.0568,  0.0165,  0.0043,  0.0517, -0.0721, -0.0253,
           0.1096]]], device='cuda:0', dtype=torch.float16)
v = tensor([[[-7.0496e-02, -2.0706e-02, -1.9617e-01,  6.9153e-02, -3.0197e-02,
          -5.8868e-02, -4.4037e-02,  3.0319...4626e-03,  6.3171e-02,
          -6.2599e-03,  1.1542e-01,  8.6365e-02]]], device='cuda:0',
       dtype=torch.float16)
cu_seqlens_q = tensor([0, 1, 2], dtype=torch.int32), cu_seqlens_k = tensor([0, 1, 2], dtype=torch.int32), max_seqlen_q = 1, max_seqlen_k = 1
dropout_p = 0.0, softmax_scale = 0.3535533905932738, causal = False, window_size_left = -1, window_size_right = -1, softcap = 0.0, alibi_slopes = None
return_softmax = False, block_table = None, leftpad_k = None, seqused_k = None

    @_torch_custom_op_wrapper("flash_attn::_flash_attn_varlen_forward", mutates_args=(), device_types="cuda")
    def _flash_attn_varlen_forward(
        q: torch.Tensor,
        k: torch.Tensor,
        v: torch.Tensor,
        cu_seqlens_q: torch.Tensor,
        cu_seqlens_k: torch.Tensor,
        max_seqlen_q: int,
        max_seqlen_k: int,
        dropout_p: float,
        softmax_scale: float,
        causal: bool,
        window_size_left: int = -1,
        window_size_right: int = -1,
        softcap: float = 0.0,
        alibi_slopes: Optional[torch.Tensor] = None,
        return_softmax: bool = False,
        block_table: Optional[torch.Tensor] = None,
        leftpad_k: Optional[torch.Tensor] = None,
        seqused_k: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        q, k, v = [maybe_contiguous(x) for x in (q, k, v)]
>       out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.varlen_fwd(
            q,
            k,
            v,
            None,
            cu_seqlens_q,
            cu_seqlens_k,
            seqused_k,
            leftpad_k,
            block_table,
            alibi_slopes,
            max_seqlen_q,
            max_seqlen_k,
            dropout_p,
            softmax_scale,
            False,
            causal,
            window_size_left,
            window_size_right,
            softcap,
            return_softmax,
            None,
        )
E       RuntimeError: cu_seqlens_q must be on CUDA

/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py:169: RuntimeError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
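For reference, the `cu_seqlens_q must be on CUDA` failure above comes from the cumulative sequence lengths being built on CPU while `q`, `k`, and `v` already live on the GPU, so the varlen FlashAttention kernel rejects them. Below is a minimal sketch of the kind of device transfer that would avoid it; the helper is hypothetical and only mirrors the argument names from the traceback, it is not the actual fix in the vision attention module:

```python
import torch
from flash_attn import flash_attn_varlen_func

def varlen_attention(q, k, v, cu_seqlens, max_seqlen):
    # flash_attn_varlen_func expects int32 cumulative sequence lengths on the
    # same CUDA device as q/k/v; building them on CPU triggers the error above.
    cu_seqlens = cu_seqlens.to(device=q.device, dtype=torch.int32)
    return flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen)
```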
__________________________________________________ Qwen2VLModelTest.test_generate_compile_1_end_to_end __________________________________________________

a = (<tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLModelTest testMethod=test_generate_compile_1_end_to_end>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/generation/test_utils.py:2075: in test_generate_compile
    compiled_outputs.append(model.generate(model_inputs, generation_config=generation_config))
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:465: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:1269: in __call__
    return self._torchdynamo_orig_callable(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:526: in __call__
    return _compile(
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:924: in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:666: in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
/usr/local/lib/python3.10/dist-packages/torch/_utils_internal.py:87: in wrapper_function
    return function(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:699: in _compile_inner
    out_code = transform_code_object(code, transform)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py:1322: in transform_code_object
    transformations(instructions, code_options)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:219: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py:634: in transform
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:2796: in run
    super().run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/lazy.py:156: in realize_and_forward
    return getattr(self.realize(), name)(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:582: in wrapper
    return inner_fn(self, inst)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:1680: in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:830: in call_function
    self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:385: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:324: in call_function
    return super().call_function(tx, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/variables/functions.py:111: in call_function
    return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:836: in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3011: in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:3139: in inline_call_
    tracer.run()
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:983: in run
    while self.step():
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:895: in step
    self.dispatch_table[inst.opcode](self, inst)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch._dynamo.symbolic_convert.InliningInstructionTranslator object at 0x7f7ff05a8550>
inst = Instruction(opcode=114, opname='POP_JUMP_IF_FALSE', arg=51, argval=102, offset=92, starts_line=1762, is_jump_target=Fa...ffset=102, starts_line=1767, is_jump_target=True, positions=None, target=None, exn_tab_entry=None), exn_tab_entry=None)

    def inner(self: "InstructionTranslatorBase", inst: Instruction):
        value: VariableTracker = self.pop()
        if (
            config.rewrite_assert_with_torch_assert
            and _detect_and_normalize_assert_statement(self, truth_fn, push)
        ):
            error_msg: VariableTracker = self.pop()
            # Skip over things like `assert True`
            if value.is_python_constant():
                if bool(value.as_python_constant()):
                    return self.jump(inst)
                else:
                    jump_graph_break(self, inst, value)
    
            # TODO maybe should respect DtoH sync intention of users later??
            # Manually insert torch._assert_async instead of python assert and jump over
            # assert related instructions as we don't need them anymore.
    
            # if we see Tensor as assert statement, no need to call scalar_tensor
            if isinstance(value, TensorVariable):
                self.output.create_proxy(
                    "call_function",
                    torch._assert_async,
                    *proxy_args_kwargs((value, error_msg), {}),
                )
                self.jump(inst)
                return
    
            if isinstance(value, SymNodeVariable):
                # if the assertion is normal shape expression.
                # just install guard and bail out.
                sym_expr = value.sym_num
                if not isinstance(sym_expr, torch.SymBool):
                    sym_expr = sym_expr != 0
    
                result = torch.fx.experimental.symbolic_shapes.expect_true(sym_expr)
                if not result:
                    unimplemented(
                        "Assertion failed on symbolic shapes. Did you make sure eager mode succeeds?"
                    )
                self.jump(inst)
                return
    
            scalar_to_tensor_proxy = self.output.create_proxy(
                "call_function", torch.scalar_tensor, *proxy_args_kwargs((value,), {})
            )
    
            scalar_to_tensor = wrap_fx_proxy(
                self,
                scalar_to_tensor_proxy,
                example_value=get_fake_value(scalar_to_tensor_proxy.node, self),
            )
    
            self.output.create_proxy(
                "call_function",
                torch._assert_async,
                *proxy_args_kwargs((scalar_to_tensor, error_msg), {}),
            )
            self.jump(inst)
            return
    
        if value.is_python_constant():
            if truth_fn(value.as_python_constant()):
                if push:
                    self.push(value)
                self.jump(inst)
        elif (
            isinstance(value, (TensorVariable)) and self.should_compile_partial_graph()
        ):
            jump_graph_break(self, inst, value)
        elif isinstance(value, NNModuleVariable):
            # Equivalent of "self.nn_module is not None"
            mod = self.output.get_submodule(value.module_key)
            if truth_fn(mod):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, UnspecializedNNModuleVariable):
            mod = value.value
            if truth_fn(mod):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, UserDefinedObjectVariable):
            try:
                x = value.var_getattr(self, "__bool__")  # type: ignore[arg-type]
            except exc.ObservedAttributeError:
                exc.handle_observed_exception(self)
                # if __bool__ is missing, trying __len__ to infer a truth value.
                try:
                    x = value.var_getattr(self, "__len__")  # type: ignore[arg-type]
                except exc.ObservedAttributeError:
                    exc.handle_observed_exception(self)
                    x = None
    
            # __bool__ or __len__ is function
            if isinstance(x, UserMethodVariable):
                result = x.call_function(self, [], {})  # type: ignore[arg-type]
                if isinstance(result, ConstantVariable) and isinstance(
                    result.value, (bool, int)
                ):
                    if truth_fn(result.value):
                        if push:
                            self.push(value)
                        self.jump(inst)
                else:
                    unimplemented(
                        "generic_jump on UserDefined with __bool__ returning non-constant"
                    )
            # __bool__ or __len__ is non-function or not existed in the user defined object
            else:
                if truth_fn(True):
                    if push:
                        self.push(value)
                    self.jump(inst)
        elif not isinstance(value, TensorVariable) and value.has_unpack_var_sequence(
            self
        ):
            if truth_fn(len(value.unpack_var_sequence(self))):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, SymNodeVariable):
            try:
                eval_result = value.evaluate_expr(self.output)
            except exc.UserError as e:
                if self.should_compile_partial_graph():
                    return jump_graph_break(self, inst, value, extra_msg=f"\n{e}")
                raise
            if truth_fn(eval_result):
                if push:
                    self.push(value)
                self.jump(inst)
        elif isinstance(value, variables.BackwardHookVariable):
            if truth_fn(True):
                if push:
                    self.push(value)
                self.jump(inst)
        else:
            from .source import is_constant_source
    
            if value.source is not None and is_constant_source(value.source):
                if truth_fn(value.get_real_value()):  # type: ignore[attr-defined]
                    if push:
                        self.push(value)
                    self.jump(inst)
            else:
                # TODO link the torch.cond doc later
>               raise exc.UserError(
                    exc.UserErrorType.DYNAMIC_CONTROL_FLOW,
                    "Dynamic control flow is not supported at the moment. Please use "
                    "functorch.experimental.control_flow.cond to explicitly capture the control flow.",
                    case_name="cond_operands",
                )
E               torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands
E               
E               from user code:
E                  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 40, in inner
E                   return fn(*args, **kwargs)
E                 File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
E                   return func(*args, **kwargs)
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 2254, in generate
E                   result = self._sample(
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 3246, in _sample
E                   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
E                 File "/workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1762, in prepare_inputs_for_generation
E                   if cache_position[0] != 0:
E               
E               Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
E               
E               
E               You can suppress this exception and fall back to eager by setting:
E                   import torch._dynamo
E                   torch._dynamo.config.suppress_errors = True

/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py:560: UserError
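The `test_generate_compile_1_end_to_end` failure is a separate, pre-existing limitation: `prepare_inputs_for_generation` branches on the value of a tensor (`if cache_position[0] != 0:`), and Dynamo cannot capture data-dependent Python control flow in a single graph when compiling end to end. Here is a self-contained repro of the same class of error, unrelated to the model code itself (the exact exception message varies across PyTorch versions):

```python
import torch

@torch.compile(fullgraph=True)
def prepare(cache_position: torch.Tensor) -> torch.Tensor:
    # Branching on a tensor value forces a graph break; with fullgraph=True
    # Dynamo raises a dynamic-control-flow error like the one in the log above.
    if cache_position[0] != 0:
        return cache_position + 1
    return cache_position

prepare(torch.arange(4))  # raises a torch._dynamo error under fullgraph=True
```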
___________________________________________________ Qwen2VLModelTest.test_model_parallel_beam_search ____________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLModelTest testMethod=test_model_parallel_beam_search>

    @require_accelerate
    @require_torch_multi_accelerator
    @pytest.mark.generate
    def test_model_parallel_beam_search(self):
        for model_class in self.all_generative_model_classes:
            if "xpu" in torch_device:
                return unittest.skip(reason="device_map='auto' does not work with XPU devices")
    
            if model_class._no_split_modules is None:
                continue
    
            config, inputs_dict = self.prepare_config_and_inputs_for_generate()
    
            model = model_class(config).eval()
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.cpu().save_pretrained(tmp_dir)
                new_model = model_class.from_pretrained(tmp_dir, device_map="auto")
    
>               new_model.generate(
                    max_new_tokens=self.max_new_tokens,
                    num_beams=2,
                    **inputs_dict,
                )

tests/generation/test_utils.py:689: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
src/transformers/generation/utils.py:2285: in generate
    result = self._beam_search(
src/transformers/generation/utils.py:3505: in _beam_search
    outputs = self(**model_inputs, return_dict=True)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:170: in new_forward
    output = module._old_forward(*args, **kwargs)
src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1677: in forward
    position_ids, rope_deltas = self.get_rope_index(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Qwen2VLForConditionalGeneration(
  (visual): Qwen2VisionTransformerPretrainedModel(
    (patch_embed): PatchEmbed(
   ...e-05)
    (rotary_emb): Qwen2VLRotaryEmbedding()
  )
  (lm_head): Linear(in_features=32, out_features=99, bias=False)
)
input_ids = tensor([38, 11, 32, 47, 37, 97, 19, 43, 33, 25, 80, 10, 76, 76, 27, 15, 11, 27,
        28, 64, 20, 12, 31, 78, 69, 61, 93, 73, 38, 21, 21, 54,  4, 25, 67, 48,
        37,  1,  2], device='cuda:1')
image_grid_thw = tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], device='cuda:1'), video_grid_thw = None
attention_mask = tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1..., 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:2')

    def get_rope_index(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        image_grid_thw: Optional[torch.LongTensor] = None,
        video_grid_thw: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Calculate the 3D rope index based on image and video's temporal, height and width in LLM.
    
        Explanation:
            Each embedding sequence contains vision embedding and text embedding or just contains text embedding.
    
            For pure text embedding sequence, the rotary position embedding has no difference with modern LLMs.
            Examples:
                input_ids: [T T T T T], here T is for text.
                temporal position_ids: [0, 1, 2, 3, 4]
                height position_ids: [0, 1, 2, 3, 4]
                width position_ids: [0, 1, 2, 3, 4]
    
            For vision and text embedding sequence, we calculate 3D rotary position embedding for vision part
            and 1D rotary position embedding for text part.
            Examples:
                Assume we have a video input with 3 temporal patches, 2 height patches and 2 width patches.
                input_ids: [V V V V V V V V V V V V T T T T T], here V is for vision.
                vision temporal position_ids: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
                vision height position_ids: [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
                vision width position_ids: [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
                text temporal position_ids: [3, 4, 5, 6, 7]
                text height position_ids: [3, 4, 5, 6, 7]
                text width position_ids: [3, 4, 5, 6, 7]
                Here we calculate the text start position_ids as the max vision position_ids plus 1.
    
        Args:
            input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
                Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
                it.
            image_grid_thw (`torch.LongTensor` of shape `(num_images, 3)`, *optional*):
                The temporal, height and width of feature shape of each image in LLM.
            video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
                The temporal, height and width of feature shape of each video in LLM.
            attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
                Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
    
                - 1 for tokens that are **not masked**,
                - 0 for tokens that are **masked**.
    
        Returns:
            position_ids (`torch.LongTensor` of shape `(3, batch_size, sequence_length)`)
            mrope_position_deltas (`torch.Tensor` of shape `(batch_size)`)
        """
        spatial_merge_size = self.config.vision_config.spatial_merge_size
        image_token_id = self.config.image_token_id
        video_token_id = self.config.video_token_id
        vision_start_token_id = self.config.vision_start_token_id
        mrope_position_deltas = []
        if input_ids is not None and (image_grid_thw is not None or video_grid_thw is not None):
            total_input_ids = input_ids
            if attention_mask is None:
                attention_mask = torch.ones_like(total_input_ids)
            position_ids = torch.ones(
                3, input_ids.shape[0], input_ids.shape[1], dtype=input_ids.dtype, device=input_ids.device
            )
            image_index, video_index = 0, 0
            for i, input_ids in enumerate(total_input_ids):
>               input_ids = input_ids[attention_mask[i] == 1]
E               RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)

src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1481: RuntimeError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
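The multi-GPU beam-search failure is a plain device mismatch: with `device_map="auto"` the `attention_mask` can be placed on a different GPU (`cuda:2` in the log) than `input_ids` (`cuda:1`), so the boolean indexing inside `get_rope_index` fails. A minimal sketch of the kind of guard that would sidestep it follows; whether such a transfer belongs inside `get_rope_index` or further upstream is left open:

```python
import torch

def mask_select(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Boolean indexing requires the mask to live on the same device as the
    # indexed tensor; under device_map="auto" the two can land on different GPUs.
    attention_mask = attention_mask.to(input_ids.device)
    return input_ids[attention_mask == 1]
```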
_______________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test ________________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test>

    @slow
    def test_small_model_integration_test(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
    
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text], images=[self.image], return_tensors="pt")
    
        expected_input_ids = [151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151652, 151655, 151655]  # fmt: skip
        assert expected_input_ids == inputs.input_ids[0].tolist()[:17]
    
        expected_pixel_slice = torch.tensor(
            [
                [0.8792, 0.8792, 0.9084],
                [1.1858, 1.1858, 1.2296],
                [1.2004, 1.2004, 1.2150],
                [1.4340, 1.4340, 1.4194],
                [1.3902, 1.4048, 1.4194],
                [1.5216, 1.5362, 1.5362],
            ],
            dtype=torch.float32,
            device="cpu",
        )
        assert torch.allclose(expected_pixel_slice, inputs.pixel_values[:6, :3], atol=3e-3)
    
        # verify generation
        inputs = inputs.to(torch_device)
    
        output = model.generate(**inputs, max_new_tokens=30)
        EXPECTED_DECODED_TEXT = "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets"
    
>       self.assertEqual(
            self.processor.decode(output[0], skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: 'syst[165 chars]r friendly and intelligent nature, making them popular choices' != 'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       Diff is 685 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:391: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.44it/s]
____________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch _____________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch>

    @slow
    def test_small_model_integration_test_batch(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text], images=[self.image, self.image], return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular choices',
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets'
        ]  # fmt: skip
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[402 chars] friendly and intelligent nature, making them popular choices'] != ['sys[402 chars] friendly and intelligent nature, making them popular pets']
E       
E       First differing element 1:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 786 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:413: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.33it/s]
_________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_different_resolutions __________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_different_resolutions>

    @slow
    def test_small_model_integration_test_batch_different_resolutions(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        text2 = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        image2 = self.image.resize((224, 224))
        inputs = self.processor(text=[text, text2], images=[self.image, image2], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
        ]
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1024 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:464: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.34it/s]
_______________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_flashatt2 ________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_flashatt2>

    @slow
    @require_flash_attn
    @require_torch_gpu
    def test_small_model_integration_test_batch_flashatt2(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct",
            torch_dtype=torch.bfloat16,
            attn_implementation="flash_attention_2",
            device_map="auto",
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text], images=[self.image, self.image], return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
        ]
    
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1024 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:492: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.32it/s]
________________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_wo_image ________________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_wo_image>

    @slow
    def test_small_model_integration_test_batch_wo_image(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        messages2 = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ]
        text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text2], images=[self.image], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            'system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets',
            'system\nYou are a helpful assistant.\nuser\nWho are you?\nassistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to assist with various tasks and answer questions to the best of my'
        ]  # fmt: skip
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[174 chars] my']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1005 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:440: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.34it/s]
___________________________________ Qwen2VLIntegrationTest.test_small_model_integration_test_batch_wo_image_flashatt2 ___________________________________

self = <tests.models.qwen2_vl.test_modeling_qwen2_vl.Qwen2VLIntegrationTest testMethod=test_small_model_integration_test_batch_wo_image_flashatt2>

    @slow
    @require_flash_attn
    @require_torch_gpu
    def test_small_model_integration_test_batch_wo_image_flashatt2(self):
        model = Qwen2VLForConditionalGeneration.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct",
            torch_dtype=torch.bfloat16,
            attn_implementation="flash_attention_2",
            device_map="auto",
        )
        text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
        messages2 = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ]
        text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
        inputs = self.processor(text=[text, text2], images=[self.image], padding=True, return_tensors="pt").to(
            torch_device
        )
    
        # it should not matter whether two images are the same size or not
        output = model.generate(**inputs, max_new_tokens=30)
    
        EXPECTED_DECODED_TEXT = [
            "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?\nassistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and intelligent nature, making them popular pets",
            "system\nYou are a helpful assistant.\nuser\nWho are you?\nassistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to answer a wide range of questions and provide information on various topics",
        ]
    
>       self.assertEqual(
            self.processor.batch_decode(output, skip_special_tokens=True),
            EXPECTED_DECODED_TEXT,
        )
E       AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[184 chars]ics']
E       
E       First differing element 0:
E       'syst[165 chars]r friendly and intelligent nature, making them popular choices'
E       'syst[165 chars]r friendly and intelligent nature, making them popular pets'
E       
E       Diff is 1015 characters long. Set self.maxDiff to None to see it.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py:529: AssertionError
----------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------
Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.36it/s]
=================================================================== warnings summary ====================================================================
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_attention_outputs
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_attentions` is. When `return_dict_in_generate` is not `True`, `output_attentions` is ignored.
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_batching_equivalence
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_hidden_states_output
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_retain_grad_hidden_states_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:818: UserWarning: `return_dict_in_generate` is NOT set to `True`, but `output_hidden_states` is. When `return_dict_in_generate` is not `True`, `output_hidden_states` is ignored.
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_continue_from_past_key_values
  /workspace/Github/transformers_qwen2vlfix/src/transformers/generation/configuration_utils.py:606: UserWarning: `pad_token_id` should be positive but got -1. This will cause errors when batch generating, if there is padding. Please set `pad_token_id` explicitly as `model.generation_config.pad_token_id=PAD_TOKEN_ID` to avoid errors in generation
    warnings.warn(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:2529: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    assert padding_idx < weight.size(

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:578: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_attentions
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    if attention_mask.shape[-1] > target_length:

tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_output_hidden_state
tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_torchscript_simple
  /workspace/Github/transformers_qwen2vlfix/src/transformers/modeling_attn_mask_utils.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
    elif sliding_window is None or key_value_length < sliding_window:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================ short test summary info ================================================================
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_eager_matches_fa2_generate - RuntimeError: cu_seqlens_q must be on CUDA
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_generate_compile_1_end_to_end - torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow. For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands

from user code:
   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 40, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 2254, in generate
    result = self._sample(
  File "/workspace/Github/transformers_qwen2vlfix/src/transformers/generation/utils.py", line 3246, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/workspace/Github/transformers_qwen2vlfix/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1762, in prepare_inputs_for_generation
    if cache_position[0] != 0:

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLModelTest::test_model_parallel_beam_search - RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test - AssertionError: 'syst[165 chars]r friendly and intelligent nature, making them popular choices' != 'syst[165 chars]r friendly and intelligent nature, making them popular pets'
Diff is 685 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch - AssertionError: Lists differ: ['sys[402 chars] friendly and intelligent nature, making them popular choices'] != ['sys[402 chars] friendly and intelligent nature, making them popular pets']

First differing element 1:
'syst[165 chars]r friendly and intelligent nature, making them popular choices'
'syst[165 chars]r friendly and intelligent nature, making them popular pets'

Diff is 786 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_different_resolutions - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']

First differing element 0:
'syst[165 chars]r friendly and intelligent nature, making them popular choices'
'syst[165 chars]r friendly and intelligent nature, making them popular pets'

Diff is 1024 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_flashatt2 - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[198 chars]ces'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[192 chars]ets']

First differing element 0:
'syst[165 chars]r friendly and intelligent nature, making them popular choices'
'syst[165 chars]r friendly and intelligent nature, making them popular pets'

Diff is 1024 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[174 chars] my']

First differing element 0:
'syst[165 chars]r friendly and intelligent nature, making them popular choices'
'syst[165 chars]r friendly and intelligent nature, making them popular pets'

Diff is 1005 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen2_vl/test_modeling_qwen2_vl.py::Qwen2VLIntegrationTest::test_small_model_integration_test_batch_wo_image_flashatt2 - AssertionError: Lists differ: ['sys[216 chars]ular choices', 'system\nYou are a helpful assi[107 chars]en.'] != ['sys[216 chars]ular pets', 'system\nYou are a helpful assista[184 chars]ics']

First differing element 0:
'syst[165 chars]r friendly and intelligent nature, making them popular choices'
'syst[165 chars]r friendly and intelligent nature, making them popular pets'

Diff is 1015 characters long. Set self.maxDiff to None to see it.
=========================================== 9 failed, 91 passed, 44 skipped, 19 warnings in 94.99s (0:01:34) ============================================

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Okay, the failing tests would still need to be worked on (CUDA transfer for cu_seqlens, etc.), but this works. Thanks!

@ArthurZucker ArthurZucker merged commit 4349a0e into huggingface:main Jan 8, 2025
25 checks passed
@minostauros minostauros mentioned this pull request Jan 9, 2025
@marthos1

I'd like to help review and contribute to this PR. Could you guide me on what improvements are needed?

Successfully merging this pull request may close these issues.

Qwen2-VL used to work with inputs_embeds instead of input_ids, but no more