transformer
Here are 1,739 public repositories matching this topic...
Bidirectional RNN
Is there a way to train a bidirectional RNN (such as an LSTM or GRU) in Trax nowadays?
Chooses 15% of tokens
The paper says:
"Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy."
This reads as if exactly 15% of the tokens are always chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, each token independently has a 15% chance of going through the follow-up procedure, so the number of chosen tokens varies from sentence to sentence.
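The two readings can be compared with a small sketch. It assumes nothing about either codebase: `choose_exact` and `choose_per_token` are hypothetical helper names, the first picking exactly round(0.15 * n) positions (at least one) and the second flipping an independent 15% coin per token, as the per-token check described above does. Only the standard `random` module is used.

```python
import random

MASK_RATE = 0.15  # fraction of tokens selected for masked-LM prediction


def choose_exact(num_tokens, rate=MASK_RATE, rng=random):
    """Reading 1 (hypothetical helper): select exactly round(rate * n) positions, at least one."""
    k = max(1, round(num_tokens * rate))
    return sorted(rng.sample(range(num_tokens), k))


def choose_per_token(num_tokens, rate=MASK_RATE, rng=random):
    """Reading 2 (hypothetical helper): each position is selected independently with probability `rate`."""
    return [i for i in range(num_tokens) if rng.random() < rate]


# Compare the number of selected positions over many 20-token sentences.
rng = random.Random(0)
n = 20
exact_counts = [len(choose_exact(n, rng=rng)) for _ in range(1000)]
per_token_counts = [len(choose_per_token(n, rng=rng)) for _ in range(1000)]

print(set(exact_counts))                             # {3}: always the same count
print(min(per_token_counts), max(per_token_counts))  # spread of counts across sentences
print(sum(per_token_counts) / 1000)                  # average count, near 0.15 * 20 = 3
```

Both readings select 15% of tokens on average; they differ only in whether the count is fixed per sentence or binomially distributed around that average.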
PositionalEmbedding
We keep this issue open to collect feature requests from users and hear your voice. Our monthly release plan is also available here.
You can either:
- Suggest a new feature by leaving a comment.
- Vote for a feature request with 👍 or against it with 👎. (Remember that developers are busy and cannot respond to every feature request, so vote for your favorite one!)
- Tell us that you wo
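Suggestion: document tokenizer types and examples
You are welcome to report problems with using PaddleNLP, and thank you very much for contributing to PaddleNLP!
When submitting your issue, please also provide the following information:
- Version and environment information
1) PaddleNLP and PaddlePaddle versions: please give your PaddleNLP and PaddlePaddle version numbers, e.g. PaddleNLP 2.0.4, PaddlePaddle 2.1.1
2) System environment: please describe your OS type, e.g. Linux/Windows/MacOS, and your Python version
- Reproduction information: if this is an error report, please give the environment and steps to reproduce it
Paddle version 2.0.8, PaddleNLP version 2.1.0
Suggestion: could the PaddleNLP documentation list which type each model's tokenizer is based on (e.g. the BERT tokenizer is WordPiece-based, the XLNet tokenizer is SentencePiece-based), together with corresponding input/output examples?
Specific suggestions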
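As an example of the kind of input/output sample requested, here is a sketch of the greedy longest-match-first procedure that BERT-style WordPiece tokenizers apply to each word: take the longest prefix found in the vocabulary, then continue with "##"-prefixed pieces, and fall back to "[UNK]" if some remainder cannot be matched. This is not PaddleNLP's API; `VOCAB`, `wordpiece`, and `tokenize` are hypothetical, the vocabulary is a toy, and `tokenize` only lowercases and splits on whitespace (a real BERT tokenizer also splits punctuation first).

```python
# Toy vocabulary (hypothetical); "##" marks a piece that continues a word.
VOCAB = {"my", "dog", "is", "hair", "##y", "un", "##aff", "##able"}


def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Split one word into WordPiece tokens by greedy longest match first."""
    if len(word) > max_chars:
        return [unk]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # some part of the word cannot be matched at all
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """Simplified pre-tokenization: lowercase, split on whitespace, then WordPiece."""
    return [p for word in text.lower().split() for p in wordpiece(word, vocab)]


print(tokenize("my dog is hairy", VOCAB))  # ['my', 'dog', 'is', 'hair', '##y']
print(wordpiece("unaffable", VOCAB))       # ['un', '##aff', '##able']
```

A SentencePiece tokenizer such as XLNet's works differently: it treats the input as a raw character stream (whitespace included) rather than splitting on whitespace first, which is why a side-by-side documentation table would be useful.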
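Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. We would like to build a polyphone prediction model based on BERT (or ERNIE). Put simply: suppose a language has 100 polyphonic characters, each with at most 3 pronunciations. We can then attach 100 3-way classifiers (a simple fc layer each) on top of BERT, and at prediction time find the classifier that corresponds to the character and use it to classify.
Reference paper:
tencent_polyphone.pdf
For data, the dataset provided by https://github.com/kakaobrain/g2pM can be used.
Advanced: multi-task BERT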
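The proposed architecture (one shared encoder, one small classifier per polyphonic character, and a lookup of the right classifier at prediction time) can be sketched in plain Python. This is an illustration under stated assumptions, not PaddleNLP or PaddleSpeech code: `HIDDEN`, `POLYPHONES`, `make_heads`, and `predict` are hypothetical, the three-character inventory stands in for the 100 characters, and random vectors stand in for BERT/ERNIE outputs, so the predicted readings are arbitrary until the heads are trained.

```python
import random

HIDDEN = 8  # hypothetical hidden size (BERT-base uses 768)

# Hypothetical inventory: polyphonic character -> candidate readings (at most 3 each).
POLYPHONES = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "还": ["hai2", "huan2"],
}


def make_heads(polyphones, hidden, rng):
    """Create one linear classifier (weights + bias) per polyphonic character.

    Each head has one output per candidate reading, so a character with fewer
    than 3 readings simply gets a smaller head.
    """
    heads = {}
    for ch, readings in polyphones.items():
        weights = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in readings]
        bias = [0.0 for _ in readings]
        heads[ch] = (weights, bias)
    return heads


def predict(text, hidden_states, heads, polyphones):
    """Classify every polyphonic character in `text` with its own head.

    `hidden_states[i]` is the encoder output vector for character i; in the
    proposed model it would come from BERT or ERNIE.
    """
    predictions = {}
    for pos, ch in enumerate(text):
        if ch not in heads:
            continue  # not a polyphone: no classifier needed
        weights, bias = heads[ch]
        h = hidden_states[pos]
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
        best = max(range(len(logits)), key=lambda k: logits[k])
        predictions[pos] = polyphones[ch][best]
    return predictions


# Usage with random stand-in encoder outputs (untrained heads).
rng = random.Random(0)
heads = make_heads(POLYPHONES, HIDDEN, rng)
text = "银行行长"
hidden_states = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in text]
print(predict(text, hidden_states, heads, POLYPHONES))  # {position: reading} for 行, 行, 长
```

Both 行 characters here share one head; which reading each gets depends only on its contextual vector, which is what the BERT encoder would provide. Giving each head exactly as many outputs as its character has readings is a design choice; fixed 3-way heads with the unused classes masked out would be an alternative. Training (not shown) would apply a cross-entropy loss at each labeled polyphone position.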
 when Python 3.9 is released (which already happened on 2020.10.5). Is there any plan to complete this?
https://github.com/huggingface/transformers/blob/2c2a31ffbcfe03339b1721348781aac4fc05bc5e/src/transformers/hf_argparser.py#L85-L90