Skip to content
#

language-model

Here are 718 public repositories matching this topic...

transformers
tokenizers
david-waterworth
david-waterworth commented Feb 27, 2021

The Split class accepts SplitDelimiterBehavior which is really useful. The Punctuation however always uses SplitDelimiterBehavior::Isolated (and Whitespace on the other hand behaves like SplitDelimiterBehavior::Removed).

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
haystack
brandenchan
brandenchan commented Jun 14, 2021

Many users in our community have been asking to have easier ways to return the output of intermediate nodes. I can see that this could be very useful for debugging and also qualitative evaluation.

I think this feature would be very useful, though the exact design is not yet fully clear.

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

  • Updated Jun 25, 2021

Improve this page

Add a description, image, and links to the language-model topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the language-model topic, visit your repo's landing page and select "manage topics."

Learn more