
bert

Here are 1,672 public repositories matching this topic...

transformers
tokenizers
david-waterworth commented Feb 27, 2021

The Split class accepts a SplitDelimiterBehavior, which is really useful. The Punctuation pre-tokenizer, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
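
To make the difference between the two behaviors named above concrete, here is a minimal standalone sketch. It does not use the tokenizers crate: the DelimiterBehavior enum and the split_on function are hypothetical stand-ins written for illustration, and the library's real splitting logic may differ in details such as offsets and empty pieces.

```rust
/// Local stand-in for the two behaviors discussed in the issue.
/// This is NOT the library's `SplitDelimiterBehavior` type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DelimiterBehavior {
    /// Drop the delimiter from the output.
    Removed,
    /// Emit the delimiter as its own piece.
    Isolated,
}

/// Split `text` at every char for which `is_delim` returns true
/// (hypothetical helper, for illustration only).
fn split_on<F>(text: &str, is_delim: F, behavior: DelimiterBehavior) -> Vec<String>
where
    F: Fn(char) -> bool,
{
    let mut pieces = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if is_delim(c) {
            // Flush whatever was collected before the delimiter.
            if !current.is_empty() {
                pieces.push(std::mem::take(&mut current));
            }
            // Keep the delimiter only when it should be isolated.
            if behavior == DelimiterBehavior::Isolated {
                pieces.push(c.to_string());
            }
        } else {
            current.push(c);
        }
    }
    if !current.is_empty() {
        pieces.push(current);
    }
    pieces
}
```

Calling split_on("Hello, world!", |c| c.is_ascii_punctuation(), DelimiterBehavior::Isolated) keeps "," and "!" as separate pieces, while DelimiterBehavior::Removed drops them. A configurable Punctuation pre-tokenizer, as the issue suggests, would let the caller choose between these.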
myboyliu commented Jun 25, 2021

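1. Please improve the documentation for the low-level APIs, such as the encoder and decoder, to make it easier to reproduce papers.
2. Please maintain a PyTorch-to-Paddle API comparison table, as complete as possible.
3. Could the error logs be made more precise? Errors raised from the DataLoader are sometimes hard to trace.
4. Could gradient accumulation be supported, so that the effective batch size can be increased further?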
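
As background for item 4, the idea behind gradient accumulation can be sketched in a few lines. The example below is a framework-independent illustration in plain Rust, not Paddle or PyTorch code: mse_grad and accumulated_grad are hypothetical helpers, and the model is assumed to be a one-parameter linear fit y ≈ w * x with mean squared error loss.

```rust
/// Gradient of the mean squared error of y ≈ w * x with respect to w,
/// averaged over one batch (hypothetical helper, for illustration).
fn mse_grad(w: f64, xs: &[f64], ys: &[f64]) -> f64 {
    let n = xs.len() as f64;
    xs.iter()
        .zip(ys)
        .map(|(x, y)| 2.0 * (w * x - y) * x)
        .sum::<f64>()
        / n
}

/// Process the batch in micro-batches of size `micro`, accumulating each
/// micro-batch gradient scaled by 1 / (number of micro-batches).
/// Assumes the batch length is a multiple of `micro`.
fn accumulated_grad(w: f64, xs: &[f64], ys: &[f64], micro: usize) -> f64 {
    assert!(micro > 0 && xs.len() % micro == 0);
    let steps = (xs.len() / micro) as f64;
    let mut acc = 0.0;
    for (cx, cy) in xs.chunks(micro).zip(ys.chunks(micro)) {
        // Only one micro-batch is needed in memory at a time.
        acc += mse_grad(w, cx, cy) / steps;
    }
    acc
}
```

With equally sized micro-batches, the accumulated value matches the full-batch gradient up to floating-point rounding, which is why a single optimizer step after several small forward/backward passes behaves like a step on one larger batch.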
