CTC beamsearch decoding via ctcdecode #773


Closed

Conversation

Antoine-Caubriere
Collaborator

Quick add: beam search decoding using this implementation: https://github.com/parlance/ctcdecode

@TParcollet
Collaborator

@30stomercury Here.

@TParcollet
Collaborator

@Antoine-Caubriere Could you please add an example in your class, just like for the other decoders? This will help users understand how to use it.

@30stomercury
Collaborator

Hi @Antoine-Caubriere , can you share some examples or details of your experiments? Thanks.

@Antoine-Caubriere
Collaborator Author

Hi @TParcollet, @30stomercury,

This implementation is very simple to use.

First, you have to add the CTCDecodeBeamSearch module to your YAML configuration file.
You must give it the labels array and the path to your LM (KenLM).

An example for my experiments :

ctc_beam_search_module: !new:speechbrain.decoders.CTCDecodeBeamSearch
        labels: ['<blank>', 'u', ' ', 'r', 'e', 'v', 'o', 'i', 'm', 'c', 's', 'x', 'n', 't', 'h', "'", 'p', 'f', 'd', 'b', 'é', 'z', 'ç', 'è', 'à', 'l', 'g', 'j', 'ô', 'k', 'q', 'â', 'ù', 'î', 'y', 'û', 'ë', 'ê', 'w', '[', ']', 'ï', 'a', '<unk>']
        model_path: /users/lm/media.noconcept.4g.unk.mmap
        log_probs_input: True

Then you can simply use it in compute_objectives.

An example :

    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""

        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig

        # Forward pass
        feats = self.modules.wav2vec2(wavs)
        x = self.modules.enc(feats)
        logits = self.modules.ctc_lin(x)
        p_ctc = self.hparams.log_softmax(logits)
        return p_ctc, wav_lens


    def compute_objectives(self, predictions, batch, stage):
        """Computes the CTC loss given predictions and targets."""

        p_ctc, wav_lens = predictions
        chars, char_lens = batch.char_encoded
        
        loss = self.hparams.ctc_cost(
                p_ctc, chars, wav_lens, char_lens
            )
        self.ctc_metrics.append(batch.id, p_ctc, chars, wav_lens, char_lens)

        if stage != sb.Stage.TRAIN:
            if self.hparams.ctc_beam_search:  # <--- here, a boolean in the configuration file
                sequence = self.hparams.ctc_beam_search_module(p_ctc)
            else:
                sequence = sb.decoders.ctc_greedy_decode(
                    p_ctc, wav_lens, self.hparams.blank_index
                )

            self.cer_metrics.append(
                ids=batch.id,
                predict=sequence,
                target=chars,
                target_len=char_lens,
                ind2lab=self.label_encoder.decode_ndim,
            )

        return loss
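For contrast, the greedy fallback in the `else` branch above applies the standard CTC collapse rule (merge repeated symbols, then drop blanks). A minimal standalone sketch of that rule (hypothetical illustration code, not the actual SpeechBrain implementation):

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Standard CTC greedy rule: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in frame_ids:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# Frame-level argmax ids -> collapsed label sequence
print(ctc_greedy_collapse([0, 1, 1, 0, 2, 2, 2, 0]))  # [1, 2]
```

The beam search replaces this single-path collapse with a scored search over many candidate prefixes, optionally weighted by the KenLM model.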

By default, the blank token is at index 0, but you can change this in the YAML if you need to (as with all the other parameters: alpha, beta, beam_width, ...).
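For instance, the YAML entry could be extended like this (a hedged sketch; the parameter names alpha, beta, beam_width, and blank_id follow ctcdecode's CTCBeamDecoder constructor and may be named differently in the wrapper):

```yaml
ctc_beam_search_module: !new:speechbrain.decoders.CTCDecodeBeamSearch
        labels: !ref <labels>
        model_path: !ref <lm_path>
        alpha: 0.5        # LM weight (illustrative value)
        beta: 1.0         # word insertion bonus (illustrative value)
        beam_width: 100
        blank_id: 0       # change if your blank token is not at index 0
        log_probs_input: True
```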

Also, you can use the "nBest" parameter to extract more than the 1-best hypothesis.

example :

sequences = self.hparams.ctc_beam_search_module(p_ctc, nBest=5)

@30stomercury
Collaborator

30stomercury commented May 23, 2021

Hi @Antoine-Caubriere , thanks. The make_ngram implementation in their scorer.cpp can convert a prefix into an n-gram for KenLM scoring, even when the vocab is in subword units. This is very useful.

@mravanelli
Collaborator

mravanelli commented May 31, 2021

Thank you @Antoine-Caubriere! That looks like a nice addition. Our goal is to integrate our code with k2 and WFST, but this looks like a reasonable solution for now.
As for the code, my suggestions are the following:
1- Add a docstring example to clarify how the beam search works. One issue here is that this CTC decoding code is a non-mandatory external dependency, but there is a way to avoid testing it during the CI.
2- The current PR changes too many files for no reason (e.g., removing one space). I suggest only pushing the code parts that are needed.
3- I think it is important to show the effectiveness of the approach on one of our recipes. For instance, I would plug this on top of our transducer recipe. @30stomercury, @aheba could you help with that?

@mravanelli mravanelli mentioned this pull request May 31, 2021
@mravanelli mravanelli added the work in progress Not ready for merge label Jun 2, 2021
@30stomercury
Collaborator

Hi @Antoine-Caubriere , thanks for this PR. Have you tested this CTC decoder with wordpiece tokens? On my side, the n-gram model can only be a wordpiece-based n-gram model.
Also, for a wordpiece-based model, giving the labels in the YAML file is intractable (thousands of labels). Do you think it would be better to obtain the labels array in another way? For example, with a tokenizer.
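One possible approach (a hypothetical sketch; it assumes a SentencePiece-style tokenizer exposing `get_piece_size()` and `id_to_piece()`, and that the decoder expects the blank token at index 0):

```python
def labels_from_tokenizer(tokenizer, blank_token="<blank>"):
    """Build the labels array for the beam-search decoder from a tokenizer,
    instead of listing thousands of wordpieces by hand in the YAML."""
    labels = [tokenizer.id_to_piece(i) for i in range(tokenizer.get_piece_size())]
    if labels[0] != blank_token:
        labels = [blank_token] + labels  # prepend blank so it sits at index 0
    return labels
```

The resulting list could then be passed to the decoder at instantiation time in train.py rather than through the YAML.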

@TParcollet
Collaborator

@30stomercury Can't we simply extract the labels from the currently loaded tokenizer? @30stomercury, since you already have a BPE-based n-gram LM, could you try to see if it works? I don't think @Antoine-Caubriere has one.

@TParcollet
Collaborator

I would like to merge this PR ASAP ...

@30stomercury
Collaborator

Hi @TParcollet , we can merge it first. I think the labels can be obtained from the currently loaded tokenizer in train.py if we would like to use it for wordpiece-based models.

@mravanelli
Collaborator

It looks like the tests are failing. We have to think about what we would like to do here. This C++ solution seems very fast, while the solution that @30stomercury is setting up is slower but more flexible (it supports many things, including CTC + Transformer LM). Moreover, in the future we might consider the integration with the WFSTs of k2. The risk is having too many search solutions, where users can easily get lost. What do you think?

@TParcollet
Collaborator

This is a quick patch, as long as we do not advertise it. @30stomercury, do you have an estimate of the time needed to finish your work? 1 month?

I think we can merge this and remove it later. We can drop the support of features, just like we will do with the Fairseq w2v2 once I integrate the pretraining phase.

@TParcollet
Collaborator

@mravanelli let's discuss this on Slack later today. Maybe it makes sense to wait for @30stomercury's solution, and we just redirect people interested to this PR?

@30stomercury
Collaborator

Hi @TParcollet , I've discussed this with Mirco in the meeting this week. The main blocks are done and the performance looks reasonable. I will adapt it to all recipes once the other developers are okay with the implementation of the scoring part in #751.

@gkucsko

gkucsko commented Sep 8, 2021

Hey, another more lightweight and maybe more flexible option could be: https://github.com/kensho-technologies/pyctcdecode
Happy to help with the implementation if needed.

@mravanelli
Collaborator

I think we should close this as we are moving to k2 FST, right? @TParcollet and @Antoine-Caubriere

@TParcollet
Collaborator

Yes

@TParcollet TParcollet closed this Dec 15, 2021
@TParcollet
Collaborator

@mravanelli I was wondering last time if we shouldn't bring this class back. I mean, for now, we have absolutely no n-gram rescoring, only LM fusion (which doesn't work with word-level LMs). Maybe we should add it back, as it is always great to be compatible with ctcdecode? I dunno.
