bert
Here are 1,672 public repositories matching this topic...
The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        // is_punc is the crate's punctuation predicate; Isolated is hard-coded.
        pretokenized.split(|_, s| s.split(is_punc, SplitDelimiterBehavior::Isolated))
    }
}
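A possible workaround, as a sketch only: it assumes the Python bindings expose a Split pre-tokenizer taking a Regex pattern, a behavior string, and an invert flag, which would let you emulate Punctuation with any behavior.

from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

# Hypothetical workaround: split on a Unicode-punctuation regex instead of
# using the built-in Punctuation pre-tokenizer, whose behavior is fixed to
# Isolated. Here the delimiters are dropped entirely ("removed").
pre_tok = Split(Regex(r"\p{P}"), "removed")
print(pre_tok.pre_tokenize_str("Hello, world!"))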
Chooses 15% of tokens
The paper says: "Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy." That wording implies that exactly 15% of the tokens are chosen. However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, every single token independently has a 15% chance of going through the follow-up masking procedure, so the realized fraction only fluctuates around 15%.
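A minimal sketch of the two readings (both helper names below are hypothetical, not from the paper or the repo):

import random

def mask_exact_fraction(tokens, fraction=0.15):
    # Reading 1: pick exactly 15% of positions, as the paper's wording suggests.
    k = int(len(tokens) * fraction)
    return set(random.sample(range(len(tokens)), k))

def mask_per_token(tokens, prob=0.15):
    # Reading 2: each token is selected independently with probability 0.15,
    # as in BERT-pytorch's dataset.py; the realized fraction varies per sample.
    return {i for i in range(len(tokens)) if random.random() < prob}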
A web crawler was added in #775, but the test cases are missing.
Some concrete suggestions
1. It would help to flesh out the docs for the lower-level APIs (e.g., encoder, decoder), to make it easier to reproduce papers.
2. It would help to maintain a PyTorch-to-Paddle API correspondence table, as complete as possible.
3. Could the error logs be made more precise? Some errors raised from the dataloader are hard to pinpoint.
4. Could gradient accumulation be supported, to further increase the effective batch size? (A sketch follows this list.)
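A minimal sketch of gradient accumulation in PyTorch-style code (the model, loss, and loader below are placeholders; the same pattern carries over to Paddle):

import torch
from torch import nn

model = nn.Linear(10, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]

accum_steps = 4  # effective batch size = 8 * 4 = 32

for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets)
    (loss / accum_steps).backward()           # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:         # update once every accum_steps batches
        optimizer.step()
        optimizer.zero_grad()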
Modify the download URL for the pre-trained models
Add better error message to HubertForCTC, Wav2Vec2ForCTC if labels are bigger than vocab size.
Motivation
Following huggingface/transformers#12264, it is clear that an error message should be thrown if any of the labels are >= self.config.vocab_size, or else silent errors can sneak into the training script.
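A minimal sketch of such a check, as a hypothetical standalone helper rather than the actual patch that landed in transformers:

import torch

def check_ctc_labels(labels: torch.Tensor, vocab_size: int) -> None:
    # Fail fast: an out-of-range label index would otherwise surface as a
    # cryptic device-side assert or a silently wrong CTC loss.
    if labels.max() >= vocab_size:
        raise ValueError(
            f"Label values must be < vocab_size ({vocab_size}), "
            f"but got max label {int(labels.max())}"
        )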