Data Type Mismatch and Discrepancy in K and V Acquisition in Self-Attention #14

@camerayuhang

Description

A Data Type Mismatch Causes a Code Error

In the code for the wavelet transform and the inverse wavelet transform, why are the wavelet filters cast to torch.float16? The input x is still torch.float32, which causes a dtype-mismatch error when running the code. Did I encounter this issue because I used float32 inputs during debugging, while you used float16? If that's the case, I am sorry for mistakenly assuming the data type mismatch was a problem.

Code references:

self.w_ll = self.w_ll.to(dtype=torch.float16)

self.filters = self.filters.to(dtype=torch.float16)
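
If the mismatch is only a matter of where the cast happens, one workaround is to cast the filters to the input's dtype inside forward instead of hard-coding torch.float16. Below is a minimal, self-contained sketch to illustrate the idea: a toy Haar DWT, not the repository's implementation, where the buffer name filters only mirrors the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarDWT(nn.Module):
    # Toy 2D Haar wavelet transform, for illustration only.
    def __init__(self):
        super().__init__()
        ll = torch.tensor([[ 0.5,  0.5], [ 0.5,  0.5]])
        lh = torch.tensor([[-0.5, -0.5], [ 0.5,  0.5]])
        hl = torch.tensor([[-0.5,  0.5], [-0.5,  0.5]])
        hh = torch.tensor([[ 0.5, -0.5], [-0.5,  0.5]])
        # A buffer follows the module through .half()/.float()/.to(device).
        self.register_buffer("filters", torch.stack([ll, lh, hl, hh]).unsqueeze(1))

    def forward(self, x):
        B, C, H, W = x.shape
        # Cast the filters to the input's dtype at call time, so both
        # float32 debugging runs and float16 / autocast training work.
        filters = self.filters.to(dtype=x.dtype).repeat(C, 1, 1, 1)
        return F.conv2d(x, filters, stride=2, groups=C)

x = torch.randn(1, 8, 32, 32)     # float32 input
print(HaarDWT()(x).shape)         # torch.Size([1, 32, 16, 16])
```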

The Acquisition of K and V in Self-Attention Does Not Match the Paper

[Screenshot from the paper, 2024-10-01]

According to the paper, K and V are obtained from the feature map down-sampled by the wavelet transform, which is 1/2 the size of Q. However, in the code, that down-sampled feature map is further downsampled by a factor of 4 in the first stage and by a factor of 2 in the second stage, which does not match the paper. Could you explain the reason behind this discrepancy? Is it just to reduce the computational cost?

Code references:

self.kv_embed = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio) if sr_ratio > 1 else nn.Identity()

kv = self.kv_embed(x_dwt).reshape(B, C, -1).permute(0, 2, 1)
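
To make the size difference concrete, here is a small standalone sketch (not the repository's code; the wavelet transform is replaced by a stride-2 average pool, and kv_embed / sr_ratio only mirror the snippet above) that shows how the extra strided conv shrinks K/V beyond the 1/2 downsampling:

```python
import torch
import torch.nn as nn

B, C, H, W, sr_ratio = 1, 64, 56, 56, 4

x = torch.randn(B, C, H, W)                       # Q is built from x: H*W tokens
dwt_stub = nn.AvgPool2d(kernel_size=2, stride=2)  # stand-in for the wavelet transform
x_dwt = dwt_stub(x)                               # (B, C, H/2, W/2): 1/2 the spatial size of Q

kv_embed = nn.Conv2d(C, C, kernel_size=sr_ratio, stride=sr_ratio) if sr_ratio > 1 else nn.Identity()
kv = kv_embed(x_dwt).reshape(B, C, -1).permute(0, 2, 1)

print("Q tokens :", H * W)          # 3136
print("KV tokens:", kv.shape[1])    # 49, i.e. (H/2/sr_ratio) * (W/2/sr_ratio)
# With sr_ratio=4 the K/V map is 1/8 of Q per side, not the 1/2 described in the paper.
```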
