Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article, "Computing Machinery and Intelligence," that proposed a measure of intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

Tagging this as a Good First issue if anyone's interested.

When you look at the variables in the pretrained base uncased BERT, the variables look like list 1. When you train from scratch, 2 additional variables per layer are introduced, with the suffixes adam_m and adam_v. It would be nice if someone could explain what these variables are and what their significance is to the training process.
If one were to manually initialize variables from a pri
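
For context on the question above: the adam_m and adam_v suffixes most likely name the Adam optimizer's per-parameter slot variables, the running average of the gradient (first moment) and the running average of the squared gradient (second moment). The sketch below is a textbook Adam step in plain Python, not BERT's own optimizer code; the function name adam_step and the hyperparameter defaults are assumptions made for illustration, and the bias-correction step follows the textbook formulation.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a single scalar parameter.

    m and v play the role of the adam_m / adam_v slots: running averages of
    the gradient and of the squared gradient. t is the 1-based step count.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(x) = x**2 starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    grad = 2.0 * x
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
```

Because m and v only carry optimizer state between steps, they matter when training or resuming training, not when running inference with the learned weights.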

I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration ✨ For existing plugins and projects, check out the spaCy universe.

If you have questions about the projects I suggested,

Example (from TfidfTransformer)

if isinstance(docs[0], tuple):
    docs = [docs]
return [self.gensim_model[doc] for doc in docs]

This method expects a list of tuples, instead of an iterable. This means that the entire corpus has to be stored as a lis
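
One possible way to avoid materializing the corpus is to peek at the first element and then stream the rest. The sketch below is only an illustration of that idea, not gensim's code: ToyModel is a hypothetical stand-in for any object that supports model[doc], and transform_lazily is an assumed helper name.

```python
from itertools import chain

class ToyModel:
    """Hypothetical stand-in for a gensim-style model: model[doc] -> transformed doc."""
    def __getitem__(self, doc):
        # Double every weight so the transformation is visible.
        return [(token_id, 2 * weight) for token_id, weight in doc]

def transform_lazily(model, docs):
    """Yield model[doc] for each bag-of-words doc without storing all docs."""
    iterator = iter(docs)
    try:
        first = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    if isinstance(first, tuple):
        # A single document (a sequence of (id, weight) tuples) was passed.
        yield model[[first, *iterator]]
        return
    # Otherwise treat the input as a stream of documents.
    for doc in chain([first], iterator):
        yield model[doc]

corpus = ([(0, 1.0), (1, 2.0)] for _ in range(3))  # a generator, not a list
results = list(transform_lazily(ToyModel(), corpus))
```

The single-document check mirrors the isinstance test in the snippet above, but it inspects only the first item instead of indexing into a list.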

Looping the process of writing images into the .tfrecords-file works fine, but how do I read multiple images from a .tfrecords-file?

Is there a simple solution? It would be great if one were added to the code.

Some typical variations of email addresses are not detected:

works: nlp("send a message to bob@host.com today").emails()
empty: nlp("send a message to mr.bob@host.com today").emails()
empty: nlp("send a message to mailto:bob@host.com today").emails()
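
To illustrate the kind of pattern that would cover these variations, here is a rough Python sketch. It is not the library's implementation (the library itself is JavaScript); the regular expression and the find_emails name are assumptions, and the pattern is deliberately simplified rather than a full address grammar.

```python
import re

# Simplified pattern: a local part that may contain dots, then "@", then a
# dotted domain. A "mailto:" prefix is skipped because ":" is not allowed
# in the local part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_emails(text):
    """Return all email-like substrings found in text."""
    return EMAIL_RE.findall(text)

print(find_emails("send a message to mr.bob@host.com today"))
print(find_emails("send a message to mailto:bob@host.com today"))
```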

This output is unexpected: PorterStemmer returns 'In' with its capital letter intact.

>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('In')
'In'

More details on https://stackoverflow.com/q/60387288/610569

My feature request is an option on buttons made with the choice skill to redirect to an external URL...

Here's a detailed explanation including screenshots

https://help.botpress.io/t/how-to-redirect-to-an-external-url-while-using-a-button-made-in-choice-skill/1791

This option will be really beneficial using choice skill buttons since at the moment, you can only add an ext

I tried selecting hyperparameters for my model following "Tutorial 8: Model Tuning" below:
https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_8_MODEL_OPTIMIZATION.md

Although I got the "param_selection.txt" file in the result directory, I am not sure how to interpret the file, i.e. which parameter combination to use. At the bottom of the "param_selection.txt" file, I found "

Describe the bug

Calling Predictor.get_gradients() returns an empty dictionary

To Reproduce
I am replicating the binary sentiment classification task described in the paper 'Attention is not Explanation' (Jain and Wallace 2019 - https://arxiv.org/pdf/1902.10186.pdf).

My first experiment is on the Stanford Sentiment TreeBank Dataset. I need to measure the correlation between th

Description of Problem:
Had to add a Facebook Messenger channel integration to a bot I spun up and found the Facebook Messenger docs a bit lacking, especially on the special response template keys quick_replies and elements; I had to read through the Facebook channel source to figure out what was allowed/disallowed.

Overview of the Solution:
FB Messenger docs updated with additio

Prerequisites

Please fill in by replacing [ ] with [x].

Are you running the latest bert-as-service?
Did you follow the installation and the usage instructions in README.md?
Did you check the [FAQ list in README.md](https://github.com/hanxiao/bert-as-se

As per the StanfordCoreNLP documentation for CoreLabel, the functions after() and before() should return the whitespace strings between the token and the next/previous tokens, respectively.
However, they always return an empty string, even if there is some whitespace, when the tokenizer option **normalizeOth

The documentation specifies:

The words and sentences properties are helpers that use the textblob.tokenizers.WordTokenizer and textblob.tokenizers.SentenceTokenizer classes, respectively.

You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the t

Excuse me, the comment at https://github.com/graykode/nlp-tutorial/blob/master/1-1.NNLM/NNLM-Torch.py#L50 may be wrong. It should be X = X.view(-1, n_step * m) # [batch_size, n_step * m]

Sorry for disturbing you.

Hi, I would like to propose a better implementation for 'test_indices':

We can remove the unneeded np.array casting:

Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))

Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
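
A small self-contained check of the proposed form, with made-up data: the texts and train_indices values below are hypothetical. The sorted call is an addition not in the proposal, used here because the iteration order of a set difference is not something to rely on.

```python
# Hypothetical toy data standing in for the real corpus and split.
texts = ["a", "b", "c", "d", "e"]
train_indices = [0, 2, 3]

# Every index not used for training, in ascending order.
test_indices = sorted(set(range(len(texts))) - set(train_indices))

print(test_indices)  # [1, 4]
```

A plain list of integers can still be used to index a NumPy array, so dropping the np.array cast should not affect that kind of downstream use.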

Hi, can the batchify method only batch one doc per file, not two docs in the same file? Why is the EOD flag not used to distinguish different docs in data_utils.py?

Here are 9,591 public repositories matching this topic...

huggingface / transformers

apachecn / AiLearning

google-research / bert

hankcs / HanLP

explosion / spaCy

oxford-cs-deepnlp-2017 / lectures

virgili0 / Virgilio

RaRe-Technologies / gensim

keon / awesome-nlp

bharathgs / Awesome-pytorch-list

chiphuyen / stanford-tensorflow-tutorials

spencermountain / compromise

nltk / nltk

botpress / botpress

flairNLP / flair

allenai / allennlp

RasaHQ / rasa

hanxiao / bert-as-service

stanfordnlp / CoreNLP

sloria / TextBlob

graykode / nlp-tutorial

brightmart / text_classification

nfmcclure / tensorflow_cookbook

NLPchina / ansj_seg

crownpku / Awesome-Chinese-NLP

zihangdai / xlnet

dragen1860 / TensorFlow-2.x-Tutorials

NLP-LOVE / ML-NLP

brightmart / nlp_chinese_corpus

haifengl / smile