language-model
Here are 718 public repositories matching this topic...
The Split class accepts SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
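The difference between the two behaviors named above can be sketched in plain Python. This is only an illustration of the idea, not the tokenizers library API: the helper split_on_punctuation and its "isolated" / "removed" string arguments are hypothetical names chosen for this example.

```python
import re
import string

# Hypothetical helper for illustration only; not part of the tokenizers library.
# The capturing group makes re.split keep each matched delimiter in the result.
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "])")


def split_on_punctuation(text, behavior):
    """Split text on ASCII punctuation.

    behavior="isolated": each punctuation mark becomes its own piece.
    behavior="removed":  punctuation marks are dropped from the output.
    """
    pieces = _PUNCT.split(text)  # text at even indices, delimiters at odd indices
    if behavior == "isolated":
        return [p for p in pieces if p]
    if behavior == "removed":
        return [p for i, p in enumerate(pieces) if i % 2 == 0 and p]
    raise ValueError(f"unknown behavior: {behavior!r}")


isolated = split_on_punctuation("Hello, world!", "isolated")
removed = split_on_punctuation("Hello, world!", "removed")
```

With "isolated" the comma and exclamation mark survive as separate pieces, while with "removed" only the text between them is returned.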
chooses 15% of token
The paper states:
Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.
This suggests that exactly 15% of the tokens are always chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, each token independently has a 15% chance of going through the follow-up procedure.
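The two readings discussed in this issue can be contrasted with a small sketch. The function names choose_independent and choose_exact are hypothetical, and neither is copied from the paper's code or from BERT-pytorch; they only show how a per-token 15% chance differs from picking a fixed 15% of the positions.

```python
import random


def choose_independent(num_tokens, prob, rng):
    """Per-token reading: every position is selected with probability `prob`.

    The number of selected positions varies from call to call.
    """
    return [i for i in range(num_tokens) if rng.random() < prob]


def choose_exact(num_tokens, prob, rng):
    """Fixed-count reading: select round(prob * num_tokens) positions.

    At least one position is selected (an assumption of this sketch).
    """
    k = max(1, round(prob * num_tokens))
    return sorted(rng.sample(range(num_tokens), k))


rng = random.Random(0)
independent_counts = [len(choose_independent(100, 0.15, rng)) for _ in range(5)]
exact_counts = [len(choose_exact(100, 0.15, rng)) for _ in range(5)]
```

For a 100-token sequence, exact_counts is always 15, whereas independent_counts only averages 15 and fluctuates around it.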
PositionalEmbedding
Many users in our community have been asking to have easier ways to return the output of intermediate nodes. I can see that this could be very useful for debugging and also qualitative evaluation.
I think this feature would be very useful, though the exact design is not yet fully clear.
Issue to track tutorial requests:
- Deep Learning with PyTorch: A 60 Minute Blitz - #69
- Sentence Classification - #79
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size.
Motivation
Following this issue: huggingface/transformers#12264, it is clear that an error message should be thrown if any of the labels are > self.config.vocab_size, or else silent errors can sneak into the training script.
So w
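A check of the kind requested in this issue might look roughly like the sketch below. It is written in plain Python rather than against the transformers code base, and the helper name validate_ctc_labels is hypothetical. Two assumptions are made that the issue text does not state: valid label ids run from 0 to vocab_size - 1 (so an id equal to vocab_size is also rejected), and padding positions are marked with -100 and skipped.

```python
def validate_ctc_labels(label_batch, vocab_size, ignore_index=-100):
    """Raise ValueError if any label id cannot index a vocabulary of `vocab_size`.

    label_batch:  iterable of label-id sequences (one per example).
    ignore_index: id used for padding and skipped here (assumed to be -100).
    """
    largest = max(
        (label for seq in label_batch for label in seq if label != ignore_index),
        default=None,  # empty batch or only padding: nothing to validate
    )
    if largest is not None and largest >= vocab_size:
        raise ValueError(
            f"Label id {largest} is out of range for vocab_size={vocab_size}; "
            f"expected ids in [0, {vocab_size - 1}]."
        )


# Passes silently: every non-padding id is below the vocabulary size.
validate_ctc_labels([[1, 2, -100], [0, 31]], vocab_size=32)
```

Failing early with an explicit message like this is what keeps an out-of-range label from turning into a silent training error.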