
All Questions

1 vote · 1 answer · 211 views

How to interpret probabilities of sequences given by n-gram language modelling?

Question about n-gram models, might be a stupid question: with n-gram models, the probability of a sequence is the product of the conditional probabilities of the n-grams into which the sequence can be ...
asked by Gog
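
For reference, a minimal sketch of the computation the question describes, with made-up bigram probabilities; in practice the log-probabilities are summed to avoid underflow:

    import math

    # Toy bigram conditionals P(w_i | w_{i-1}); values are illustrative only.
    bigram_prob = {
        ("<s>", "the"): 0.3,
        ("the", "cat"): 0.1,
        ("cat", "sat"): 0.2,
        ("sat", "</s>"): 0.4,
    }

    def sequence_logprob(tokens):
        """Sum of log P(w_i | w_{i-1}) over the sentence, boundary symbols included."""
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(math.log(bigram_prob[pair]) for pair in zip(padded, padded[1:]))

    logp = sequence_logprob(["the", "cat", "sat"])
    print(logp, math.exp(logp))  # exp(logp) == 0.3 * 0.1 * 0.2 * 0.4 = 0.0024
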
1 vote · 0 answers · 33 views

Language model created with SRILM does not sum to 1

I created an n-gram language model on the Penn Treebank using the following command:

    ngram-count -text $trainfile -order 5 -lm $temp/templm.ptb -gt3min 1 -gt4min 1 -kndiscount -interpolate -unk

This ...
asked by Kim Yung
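
A rough way to check this by hand, assuming the standard ARPA layout (log10 probabilities in the \1-grams: section; SRILM conventionally writes -99 for <s>, which carries no probability mass):

    def unigram_mass(arpa_path):
        """Sum 10**log10prob over the unigram section of an ARPA file."""
        total, in_unigrams = 0.0, False
        with open(arpa_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line == "\\1-grams:":
                    in_unigrams = True
                    continue
                if in_unigrams:
                    if not line or line.startswith("\\"):
                        break  # end of the unigram section
                    logprob = float(line.split()[0])
                    if logprob > -98:  # skip the -99 placeholder used for <s>
                        total += 10 ** logprob
        return total

    print(unigram_mass("templm.ptb"))  # should be close to 1.0
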
0 votes · 1 answer · 4k views

Add-1 (Laplace) smoothing for bigram implementation

I am doing an exercise where I am determining the most likely corpus from a number of corpora when given a test sentence. I am trying to test an add-1 (Laplace) smoothing model for this exercise. I ...
asked by Héctor
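
For reference, a minimal sketch of add-1 smoothing as usually defined, P(w2 | w1) = (C(w1, w2) + 1) / (C(w1) + V), on a toy corpus:

    from collections import Counter

    def laplace_bigram_prob(bigram_counts, unigram_counts, vocab_size, w1, w2):
        # Add-1 (Laplace): P(w2 | w1) = (C(w1, w2) + 1) / (C(w1) + V)
        return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

    tokens = "the cat sat on the mat".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    V = len(unigrams)
    print(laplace_bigram_prob(bigrams, unigrams, V, "the", "cat"))  # (1 + 1) / (2 + 5)
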
2 votes · 0 answers · 415 views

Simple bigram letter model

I am working through an exercise where, given a set of corpora, I will implement a simple model on a test corpus to determine the most likely corpus. Say the corpora with which I want to learn are ...
asked by Héctor
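
A minimal sketch of one way to do this (not necessarily the asker's setup): train add-1-smoothed letter bigrams on each candidate corpus and pick the corpus under which the test string is most likely:

    import math
    from collections import Counter

    def char_bigram_logprob(text, corpus, alphabet_size=27):
        """Log-likelihood of text under add-1-smoothed letter bigrams from corpus.
        alphabet_size=27 assumes lowercase letters plus space."""
        bigrams = Counter(zip(corpus, corpus[1:]))
        unigrams = Counter(corpus)
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + alphabet_size))
                   for a, b in zip(text, text[1:]))

    corpora = {"A": "the quick brown fox", "B": "zzz zzz zzz zzz zzz"}
    test = "the fox"
    best = max(corpora, key=lambda name: char_bigram_logprob(test, corpora[name]))
    print(best)  # "A": the test string is far more likely under corpus A's bigrams
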
1 vote · 1 answer · 263 views

Train a language model with Google N-grams [closed]

I want to find the conditional probability of a word given the set of words that precede it. I plan to use Google N-grams for this. However, given what a huge resource it is, I don't think it is ...
asked by Riken Shah
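
The underlying estimate is just a ratio of counts; a sketch with hypothetical numbers standing in for actual Google N-gram lookups:

    def conditional_prob(ngram_count, context_count):
        # MLE estimate: P(w | context) = count(context + w) / count(context)
        return ngram_count / context_count if context_count else 0.0

    # Hypothetical counts standing in for actual Google N-gram lookups:
    count_new_york = 5_000_000    # count("new york")
    count_new = 40_000_000        # count("new")
    print(conditional_prob(count_new_york, count_new))  # 0.125
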
9 votes · 3 answers · 4k views

What's the real need for an end-symbol in n-gram models?

There's a footnote in Jurafsky & Martin (2008, p.89) pointing out that, without an end-symbol, an n-gram model would not be "a true probability distribution". Even after seeking the paper they've ...
asked by mcrisc
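
A toy illustration of the footnote's point: with an end symbol, probability mass is split between continuing and stopping, so the probabilities of all finite sentences sum to 1; without one, every sentence length gets a full unit of mass and the total diverges. A sketch with a one-word vocabulary:

    # Toy bigram model over the one-word vocabulary {a}, with P(a | <s>) = 1.
    p_continue, p_end = 0.5, 0.5       # P(a | a) and P(</s> | a)

    # With an end symbol: P("a" * n) = p_continue**(n-1) * p_end, summing to 1.
    with_end = sum(p_continue ** (n - 1) * p_end for n in range(1, 50))

    # Without one, all mass stays on continuing: every length-n string has
    # probability 1, so the total over all lengths grows without bound.
    without_end = sum(1.0 for n in range(1, 50))

    print(with_end)     # ~1.0: a proper distribution over sentences
    print(without_end)  # 49.0, and climbing as more lengths are added
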
4 votes · 2 answers · 434 views

Probabilities for 2-grams are higher than for 1-grams in an ARPA file produced by KenLM

I'm using the 1 billion word language corpus to build a model with 1- and 2-grams. When using the lmplz program that comes with KenLM, I noticed that the ARPA file seems to have higher probabilities ...
asked by kristianp
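
Worth noting when reading such files: ARPA entries are log10 conditional probabilities, so a 2-gram entry can legitimately exceed the corresponding 1-gram entry whenever the context is strongly predictive. With hypothetical numbers:

    # Hypothetical log10 values of the kind found in an ARPA file:
    log10_p_york = -4.5             # 1-gram: "york" is rare overall
    log10_p_york_given_new = -0.7   # 2-gram: "new" strongly predicts "york"

    print(10 ** log10_p_york)            # ~3.16e-05
    print(10 ** log10_p_york_given_new)  # ~0.1995, higher and legitimately so
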
4 votes · 2 answers · 714 views

I am looking for an Arabic n-gram corpus

I am working on a project where I need to use an n-gram model, so I want to know whether an Arabic n-gram corpus exists. I have tried to find one, but all my searches have failed. I know that for languages ...
asked by Riadh Belkebir
3 votes · 2 answers · 431 views

What is the most efficient way to store n-grams in a database / data structure?

Let's assume we have Google's 1T n-grams. I want to be able to: Search for n-grams containing all of a set of words (such as finding all n-grams containing the words "dog" and "bone" in any position) ...
asked by mtanti
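
One common answer is an inverted index from words to the n-grams containing them, so multi-word queries become set intersections; a toy sketch (at Google-1T scale the sets would live in a database, but the access pattern is the same):

    from collections import defaultdict

    # Inverted index: word -> set of n-gram ids.
    ngrams = ["the dog ate the bone", "a dog buried a bone", "the cat slept"]
    index = defaultdict(set)
    for i, gram in enumerate(ngrams):
        for word in gram.split():
            index[word].add(i)

    def containing_all(words):
        # "All n-grams containing every word in the set" = intersection of postings.
        ids = set.intersection(*(index[w] for w in words))
        return [ngrams[i] for i in sorted(ids)]

    print(containing_all(["dog", "bone"]))  # both of the first two n-grams
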
2 votes · 1 answer · 1k views

Is perplexity in SRILM normalized for sentence length?

If I generate a language model with SRILM's ngram-count and then use ngram -unk -ppl text -lm model to get log probabilities and perplexity values, are the perplexities normalized for sentence length?
asked by L3viathan
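
As far as SRILM's documented conventions go, ppl is normalized per token: the total log10 probability is divided by the number of words plus sentence ends (minus OOVs), while ppl1 excludes the sentence ends. A sketch of the normalization itself:

    # Perplexity as a per-token normalization of a total log10 probability;
    # the exact token count differs between SRILM's ppl and ppl1, but both
    # are length-normalized, so longer texts are not penalized per se.
    def perplexity(total_log10_prob, n_tokens):
        return 10 ** (-total_log10_prob / n_tokens)

    print(perplexity(-120.0, 50))  # 10^(120/50) ~= 251.19
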
5 votes · 1 answer · 5k views

Common English bigrams / trigrams - recognising that a jumble of letters contains only valid English words

I have a database of one million strings which I want to rank against one another so that I can tell which contain meaningful English words / sentences. These strings contain no spaces or punctuation....
asked by StuR
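
One simple approach: score each string by the average log-frequency of its letter trigrams against a table built from real English text; the tiny table below is a made-up stand-in:

    import math

    # Stand-in trigram frequency table; a real one would come from an English corpus.
    trigram_freq = {"the": 0.018, "ing": 0.011, "and": 0.009}
    FLOOR = 1e-9  # unseen trigrams get a small floor instead of zero

    def english_score(s):
        trigrams = [s[i:i + 3] for i in range(len(s) - 2)]
        return sum(math.log(trigram_freq.get(t, FLOOR))
                   for t in trigrams) / max(len(trigrams), 1)

    strings = ["theandingthe", "xqzjvkwpfh"]
    print(sorted(strings, key=english_score, reverse=True))  # English-like first
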
11 votes · 1 answer · 715 views

The power of trigram language models (2nd order Markov models)

Many people in computational linguistics seem to mention the unexpected power of trigram (or 2nd order Markov) models for language modeling. For instance, it has been stated (verbally) to me on ...
asked by Julie
9 votes · 2 answers · 2k views

Are there any statistics or web services for n-grams of frequent English words?

I found this for six common subjects, but it doesn't contain complete statistics for all common English words.
asked by ARZ