
Question about the loss of Masked LM #49

Open
zhezhaoa opened this issue Dec 7, 2018 · 4 comments
Labels
good first issue

Comments

@zhezhaoa commented Dec 7, 2018

Thank you very much for this great contribution.
I found that the masked LM loss stops decreasing once it reaches a value of around 7, whereas in the official TensorFlow implementation the MLM loss easily decreases to 1. I think something is wrong in this implementation.
In addition, I found that the code cannot predict the next sentence correctly. I think the reason is: self.criterion = nn.NLLLoss(ignore_index=0). This cannot be used as the criterion for sentence prediction, because the sentence label is 1 or 0, so every label-0 example is silently ignored. We should remove ignore_index=0 for sentence prediction.
I am looking forward to your reply~
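A minimal sketch of the two-criterion fix described above, assuming a PyTorch trainer (the variable names here are hypothetical, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Sketch of the proposed fix (hypothetical names, not the repo's code).
# MLM targets use index 0 for padding, so ignore_index=0 is correct there;
# next-sentence labels are genuinely 0 ("not next") or 1 ("is next"), so
# ignore_index=0 silently drops every label-0 pair from the NSP loss.
mlm_criterion = nn.NLLLoss(ignore_index=0)  # excludes padded token positions
nsp_criterion = nn.NLLLoss()                # both labels 0 and 1 contribute

# Toy demonstration: log-probabilities for two sentence pairs.
nsp_log_probs = torch.log_softmax(
    torch.tensor([[2.0, -1.0],    # confident "not next" (label 0)
                  [0.5, 0.5]]),   # undecided pair (label 1)
    dim=-1)
nsp_labels = torch.tensor([0, 1])

buggy = nn.NLLLoss(ignore_index=0)(nsp_log_probs, nsp_labels)
fixed = nsp_criterion(nsp_log_probs, nsp_labels)
# `buggy` only scores the label-1 pair; `fixed` averages over both.
```

With ignore_index=0, the loss looks artificially low (only the "is next" examples are scored), which is consistent with the ~50% accuracy reported below.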

@tanaka-jp commented Dec 14, 2018

I think the reason is: self.criterion = nn.NLLLoss(ignore_index=0). It can not be used as criterion for sentence prediction because the label of sentence is 1 or 0.

I think you are right.
My next-sentence loss is very low, but the accuracy (next_correct) is always near 50%.

@raulpuric commented Jan 25, 2019

I've been trying to reproduce BERT's pretraining results from scratch in my own time, and I have been unable to train beyond a masked LM loss of 5.4. So if anyone is able to get past this point, I'd love to learn what you did.

@codertimo (Owner) commented Apr 8, 2019

Sorry for my late update; I think your point is right as well. I'll fix it up ASAP.

@codertimo codertimo added the good first issue label Apr 8, 2019
@itamargol commented May 5, 2019

What is the verdict here regarding the next-sentence task?
Should we use two different loss functions, without ignore_index=0 for sentence prediction?

And what about the MLM? Has anyone found a solution? I also can't get below 6-7...
