
A Question about the Calculation of cauchy_cross_entropy Loss #28

Closed
MaxChanger opened this issue May 12, 2019 · 11 comments

Comments

@MaxChanger

MaxChanger commented May 12, 2019

Hello, I have recently been looking at the DCH method in the DeepHash library, but there is something I don't understand, as shown in the following figure.

[screenshot: the cauchy_cross_entropy function in the DeepHash DCH code, with the v-is-None branch highlighted in red]

The parameters v and label_v of the function cauchy_cross_entropy are always None, so the part in the red box is always executed, and u and v end up being the same.

At the same time, I'm also looking at HashNet's PyTorch implementation. It also requires two sets to calculate the loss, but outputs1 and outputs2 in the figure below are taken from two different batches of the data.

[screenshot: the pairwise loss in HashNet's PyTorch code, computed from outputs1 and outputs2]

So now I'm wondering why you directly set u and v to be the same.
In the paper, h_i and h_j are also used to denote two sets, which are not the same.

I hope I can get your answer. Thank you.
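
For context, here is a minimal NumPy sketch of the situation I mean. The function name cauchy_pair_loss, the gamma value, and the cosine-based distance are just my illustration and not the exact DeepHash formula; the point is only the structure where v defaults to u:

```python
import numpy as np

def cauchy_pair_loss(u, label_u, v=None, label_v=None, gamma=20.0):
    """Illustrative pairwise Cauchy cross-entropy; v/label_v default to u/label_u."""
    if v is None:                 # the branch in the red box: no second batch is passed,
        v, label_v = u, label_u   # so both sides of every pair come from the same batch

    # label_ip[i, j] = 1 if samples i and j share at least one class label
    label_ip = (label_u @ label_v.T > 0).astype(np.float32)   # shape [bs, bs]

    # cosine similarity between every row of u and every row of v
    nu = np.linalg.norm(u, axis=1, keepdims=True)
    nv = np.linalg.norm(v, axis=1, keepdims=True)
    cos = (u @ v.T) / (nu @ nv.T)

    # a cosine-based pairwise distance (0 on the diagonal when v is u)
    dist = 0.5 * (1.0 - cos)

    # Cauchy probability that a pair is similar, clipped so the logs stay finite
    p = np.clip(gamma / (gamma + dist), 1e-6, 1.0 - 1e-6)

    # pairwise cross entropy over all (i, j); with v = u the diagonal terms are ~0
    loss = -(label_ip * np.log(p) + (1.0 - label_ip) * np.log(1.0 - p))
    return loss.mean()

# toy usage: one batch of 4 codes with 8 bits and 3 one-hot classes
rng = np.random.default_rng(0)
u = rng.standard_normal((4, 8))
label_u = np.eye(3)[[0, 1, 0, 2]]
print(cauchy_pair_loss(u, label_u))
```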

@bl0
Contributor

bl0 commented May 12, 2019

Hi, I think that calculating the loss within the batch and across two batches are both OK.
But computing it within the batch is obviously more convenient.
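
For example, both options can be written against the same pairwise function; continuing the hypothetical cauchy_pair_loss sketch from the comment above (u2 and label_u2 are just stand-ins for a second batch):

```python
# within one batch: v defaults to u, as in the DCH code discussed above
loss_within = cauchy_pair_loss(u, label_u)

# across two batches: pass the second batch explicitly, HashNet-style
u2 = rng.standard_normal((4, 8))
label_u2 = np.eye(3)[[1, 2, 2, 0]]
loss_cross = cauchy_pair_loss(u, label_u, v=u2, label_v=label_u2)
```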

@MaxChanger
Author

I found this problem when I tried to reimplement DCH on top of the HashNet code, because I am more familiar with PyTorch. I modified HashNet's loss function according to the DCH paper and the TensorFlow code, but if I use image data from two different batches to calculate the loss, the result becomes NaN. If, as in the picture above, both inputs come from the same batch, I get an mAP on COCO similar to the one in the paper, 0.73.

@MaxChanger
Author

I can't understand why it's set to come from the same batch.
I think that if u and v come from the same batch, the values of h_i and h_j should be equal, i.e. h_i = h_j.
And if that's true, then dist(h_i, h_j) = dist(h_i, h_i); what is the meaning of dist(h_i, h_i)?
I don't know if I understand it correctly.

@bl0
Contributor

bl0 commented May 13, 2019

Hi, a batch also contains multiple samples. So i and j can be any index in [0, batch_size), and we can calculate the cross-entropy loss for all pairs (i, j) within the batch.
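
To make it concrete (reusing a cosine-based pairwise distance purely as an illustration): even within one batch, i and j index different samples, and only the diagonal has i == j:

```python
import numpy as np

# three codes from ONE batch; i and j both index rows of this matrix
u = np.array([[ 1.,  1., -1.],
              [ 1., -1., -1.],
              [-1.,  1.,  1.]])

n = np.linalg.norm(u, axis=1, keepdims=True)
dist = 0.5 * (1.0 - (u @ u.T) / (n @ n.T))   # pairwise cosine distance, shape [3, 3]

print(np.round(dist, 2))
# approximately [[0.  0.33 0.67], [0.33 0.  1.], [0.67 1.  0.]]
# only the diagonal is dist(h_i, h_i) = 0; every off-diagonal entry is a pair
# of two *different* samples taken from the same batch
```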

@bl0
Contributor

bl0 commented May 13, 2019

FYI: in our implementation, ip_1 and label_ip both have shape [bs, bs], where bs is the batch size.

@MaxChanger
Author

Hi, I think u in the code has shape batch_size×hash_bits, for example 128×48, meaning the batch has 128 images and the hash code length is 48, and label_u has shape batch_size×label_bits, where label_bits depends on train.txt/database.txt. Hence, as you said, ip_1 and label_ip both have shape [batch_size, batch_size].

But as I understand it, h_i and h_j are rows of the matrix u, and ip_1 = tf.matmul(u, tf.transpose(v)) means matrix u is multiplied by the transpose of v. If u and v are the same, it is equivalent to ip_1 = tf.matmul(u, tf.transpose(u)); the value of ip_1[i][j] is the dot product of row h_i of u and column h_j of u^T, but is the value of ip[i][i] meaningful?
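
A quick shape check of what I mean (NumPy stand-ins for the TensorFlow tensors; 128 and 48 follow my example above, and 80 label bits would correspond to e.g. COCO):

```python
import numpy as np

bs, hash_bits, label_bits = 128, 48, 80   # 128 images, 48-bit codes, 80-dim multi-hot labels

u = np.random.randn(bs, hash_bits)                                    # one batch of codes
label_u = (np.random.rand(bs, label_bits) > 0.9).astype(np.float32)  # multi-hot labels

ip_1 = u @ u.T                  # tf.matmul(u, tf.transpose(u)) when v is the same as u
label_ip = label_u @ label_u.T

print(ip_1.shape, label_ip.shape)   # both (128, 128), i.e. [batch_size, batch_size]
print(np.allclose(np.diag(ip_1), (u * u).sum(axis=1)))  # True: ip_1[i][i] is h_i dotted with itself
```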

@bl0
Contributor

bl0 commented May 13, 2019

Why is ip[i][i] not meaningful? They come from different i and j.

@MaxChanger
Author

ip[i][j] is meaningful, since it comes from different i and j, but ip[i][i] is dist(h_i, h_i), not dist(h_i, h_j); it represents the hash code of an image dotted with itself.

@bl0
Contributor

bl0 commented May 13, 2019

OK, I understand your question now.
Yes, ip[i, i] may not be meaningful. But the loss for the pair (i, i) will be zero and will not affect the training, so we don't need to worry about that.
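
For example, under the Cauchy formulation sketched earlier in this thread, the (i, i) term can be checked directly: the distance is 0, the similarity label is 1, and the similar-pair term of the cross entropy vanishes:

```python
import numpy as np

gamma = 20.0
d_ii = 0.0                      # distance of a code to itself
s_ii = 1.0                      # an image is trivially "similar" to itself

p_ii = gamma / (gamma + d_ii)   # Cauchy probability of similarity = 1
loss_ii = -s_ii * np.log(p_ii)  # similar-pair term of the cross entropy
print(loss_ii)                  # prints -0.0, i.e. numerically zero: the (i, i) pairs add nothing
```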

@MaxChanger
Author

Thank you for your answer. Through the discussion with you, I am now more confident about my understanding.
But I still need to find out why using two different batches as input in my code makes the loss NaN, while using the same batch works perfectly.
If I make some progress, I will report back in this issue. Thanks.

@bl0
Contributor

bl0 commented May 13, 2019

OK. You're welcome to discuss it with me further.

bl0 closed this as completed May 14, 2019