natural-language-understanding
Here are 512 public repositories matching this topic...
The Split class accepts SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
pretokenized.split(|_, s| s.spl
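To make the difference between the two behaviors named above concrete, here is a minimal pure-Python sketch. It is not the tokenizers library's implementation: the function name split_with_behavior and the string values "removed" and "isolated" are hypothetical stand-ins for SplitDelimiterBehavior::Removed and SplitDelimiterBehavior::Isolated, and the regex patterns are assumptions chosen for illustration.

```python
import re


def split_with_behavior(text, pattern, behavior):
    """Split `text` on regex `pattern`, handling delimiters per `behavior`.

    Simplified illustration only (hypothetical helper):
      "removed"  -> delimiters are dropped from the output
      "isolated" -> each delimiter becomes its own piece
    """
    if behavior not in ("removed", "isolated"):
        raise ValueError(f"unsupported behavior: {behavior!r}")

    pieces = []
    last = 0
    for match in re.finditer(pattern, text):
        if match.start() == match.end():
            continue  # ignore empty matches so the loop always makes progress
        if match.start() > last:
            pieces.append(text[last:match.start()])  # text before the delimiter
        if behavior == "isolated":
            pieces.append(match.group())  # keep the delimiter as a separate piece
        last = match.end()
    if last < len(text):
        pieces.append(text[last:])  # trailing text after the last delimiter
    return pieces


# Punctuation-like split: delimiters kept as their own pieces.
print(split_with_behavior("Hello, world!", r"[^\w\s]", "isolated"))
# Whitespace-like split: delimiters dropped.
print(split_with_behavior("Hello, world!", r"\s+", "removed"))
```

Passing the behavior as a parameter, as Split does, is what would let a punctuation splitter choose between these outcomes instead of being fixed to one.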
[Error Message] Improve error message in SentencepieceTokenizer when arguments are not expected.
Description
While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and cannot generate the BERT vocab file.
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p
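One way the error message could be made more actionable is to name the offending tokens. The sketch below is a standalone illustration, not gluonnlp's code: check_special_tokens, its arguments, and the sample token values are all hypothetical, and it assumes the set of control tokens has already been read from the sentencepiece model.

```python
def check_special_tokens(special_tokens, control_tokens):
    """Raise a descriptive error if any special token is not a control token.

    Hypothetical helper: `special_tokens` maps an argument name to its token
    string, and `control_tokens` is the set of control tokens assumed to have
    been extracted from the sentencepiece vocabulary beforehand.
    """
    missing = {
        name: token
        for name, token in special_tokens.items()
        if token not in control_tokens
    }
    if missing:
        details = ", ".join(
            f"{name}={token!r}" for name, token in sorted(missing.items())
        )
        raise ValueError(
            "Mismatch vocabulary! All special tokens specified must be control "
            "tokens in the sentencepiece vocabulary. "
            f"Not found as control tokens: {details}. "
            f"Control tokens in the model: {sorted(control_tokens)}."
        )


# Example with made-up values: '[CLS]' is not among the control tokens.
try:
    check_special_tokens(
        {"cls_token": "[CLS]", "unk_token": "<unk>"},
        {"<unk>", "<s>", "</s>"},
    )
except ValueError as err:
    print(err)
```

Listing both the rejected arguments and the tokens the model actually provides tells the user which argument to change, which the original message does not.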
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size.
Motivation
Following this issue: huggingface/transformers#12264 it is clear that an error message should be thrown if any of the labels are > self.config.vocab_size or else silent errors can sneak into the training script. So w
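The kind of check being requested can be sketched in plain Python. This is an illustration only, not the transformers implementation: validate_ctc_labels is a hypothetical helper operating on nested lists rather than tensors, and it assumes label ids are zero-based, so any id greater than or equal to vocab_size is treated as out of range.

```python
def validate_ctc_labels(labels, vocab_size):
    """Fail loudly if any label id cannot index a vocabulary of `vocab_size`.

    Hypothetical helper: `labels` is a list of label-id sequences. Ids are
    assumed to be zero-based, so the valid range is 0..vocab_size - 1.
    """
    flat = [label for sequence in labels for label in sequence]
    if not flat:
        return  # nothing to validate
    largest = max(flat)
    if largest >= vocab_size:
        raise ValueError(
            f"Label id {largest} is out of range for vocab_size={vocab_size}; "
            f"valid ids are 0..{vocab_size - 1}."
        )


# Passes: every id is below the vocabulary size.
validate_ctc_labels([[1, 5, 31], [0, 2]], vocab_size=32)

# Fails early instead of silently corrupting training.
try:
    validate_ctc_labels([[1, 5, 32]], vocab_size=32)
except ValueError as err:
    print(err)
```

Running such a check before the loss is computed turns a silent training problem into an immediate, descriptive failure.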