natural-language-processing
Natural language processing (NLP) is a field of computer science concerned with the interactions between computers and human language. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
When you look at the variables in the pretrained base uncased BERT, the variables look like list 1. When you train from scratch, two additional variables per layer are introduced, with the suffixes adam_m and adam_v. It would be nice for someone to explain what these variables are and what their significance is to the training process.
If one were to manually initialize variables from a pri
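For context, adam_m and adam_v are the Adam optimizer's slot variables: per-parameter running averages of the gradient (first moment) and of the squared gradient (second moment), which Adam needs to compute its update. A minimal NumPy sketch of the update they support (generic Adam, not BERT's exact optimizer code):

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-6):
    # m and v play the role of the adam_m / adam_v variables
    m = b1 * m + (1 - b1) * grad         # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2    # second moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

They only matter during training; for inference the adam_m / adam_v variables can be dropped from the checkpoint.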
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration.
If you have questions about the projects I suggested,
Example (from TfidfTransformer)

def transform(self, docs):
    # only the first element is checked, so docs must support indexing
    if isinstance(docs[0], tuple):
        docs = [docs]
    return [self.gensim_model[doc] for doc in docs]
This method expects a list of tuples instead of an iterable. This means that the entire corpus has to be stored as a list
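A lazy variant that accepts any iterable could peek at the first element instead of indexing, e.g. (a sketch of the idea, not gensim's actual fix):

from itertools import chain

def transform_iter(self, docs):
    docs = iter(docs)
    first = next(docs)
    if isinstance(first, tuple):
        # a single bag-of-words document was passed; rebuild and wrap it
        return [self.gensim_model[list(chain([first], docs))]]
    # otherwise stream documents one at a time instead of materializing a list
    return (self.gensim_model[doc] for doc in chain([first], docs))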
Looping the process of writing images into the .tfrecords-file works fine, but how do I read multiple images from a .tfrecords-file?
Is there any simple solution? It would be great if one could be added to the code.
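One way to do it with the tf.data API (a sketch; the feature keys image_raw and label are assumptions about how the records were written):

import tensorflow as tf

feature_spec = {
    'image_raw': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse(example_proto):
    parsed = tf.io.parse_single_example(example_proto, feature_spec)
    image = tf.io.decode_raw(parsed['image_raw'], tf.uint8)
    return image, parsed['label']

# iterates over every example stored in the file, not just the first
dataset = tf.data.TFRecordDataset('images.tfrecords').map(parse)
for image, label in dataset:
    ...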
This output is unexpected: stemming the token In returns the capitalized In from PorterStemmer.
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('In')
'In'
More details on https://stackoverflow.com/q/60387288/610569
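A common workaround until this is fixed is to lowercase the token before stemming:
>>> porter.stem('In'.lower())
'in'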
I tried selecting hyperparameters for my model following "Tutorial 8: Model Tuning" below:
https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_8_MODEL_OPTIMIZATION.md
Although I got the "param_selection.txt" file in the result directory, I am not sure how to interpret the file, i.e. which parameter combination to use. At the bottom of the "param_selection.txt" file, I found "
Description of Problem:
Had to add a Facebook Messenger channel integration into a bot I spun up and found the Facebook Messenger docs a bit lacking, especially on the special response template keys quick_replies and elements; I had to read through the Facebook channel source to figure out what was allowed/disallowed.
Overview of the Solution:
FB Messenger docs updated with additio
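For reference, these keys map onto Facebook's Send API message shape, roughly like this (the titles and payloads here are made-up examples):

message = {
    'text': 'Pick a color:',
    'quick_replies': [
        {'content_type': 'text', 'title': 'Red', 'payload': 'PICK_RED'},
        {'content_type': 'text', 'title': 'Green', 'payload': 'PICK_GREEN'},
    ],
}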
Prerequisites
Please fill in by replacing [ ] with [x].
- [ ] Are you running the latest bert-as-service?
- [ ] Did you follow the installation and the usage instructions in README.md?
- [ ] Did you check the [FAQ list in README.md](https://github.com/hanxiao/bert-as-se
The Dutch sentiment file (see nl-sentiment.xml) has words with negative subjectivity, which does not respect the boundary values for subjectivity: [0.0, 1.0]. I did not check how many cases there are, but the word "verloren" is an example.
Sentiment files for other languages may have negative subjectivity as well. Since
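A quick way to count the offending entries (assuming the pattern-style XML layout where each word element carries form and subjectivity attributes):

import xml.etree.ElementTree as ET

tree = ET.parse('nl-sentiment.xml')
bad = [w.get('form') for w in tree.iter('word')
       if float(w.get('subjectivity', 0.0)) < 0.0]
print(len(bad), bad[:10])  # 'verloren' should show up here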
The words and sentences properties are helpers that use the textblob.tokenizers.WordTokenizer and textblob.tokenizers.SentenceTokenizer classes, respectively.
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the t
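For example, in the style of TextBlob's docs (using NLTK's TabTokenizer):

>>> from textblob import TextBlob
>>> from nltk.tokenize import TabTokenizer
>>> blob = TextBlob('This is\ta test', tokenizer=TabTokenizer())
>>> blob.tokens
WordList(['This is', 'a test'])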
Is your feature request related to a problem? Please describe.
Other related issues: #408 #251
I trained a Chinese model for spaCy, linked it to [spacy's package folder]/data/zh (using spacy link), and want to use it with ludwig. However, when I tried to set the config for ludwig, I received an error telling me that there is no way to load the Chinese model.
ValueError: Key ch
Excuse me, the comment at https://github.com/graykode/nlp-tutorial/blob/master/1-1.NNLM/NNLM-Torch.py#L50 may be wrong. It should be X = X.view(-1, n_step * m) # [batch_size, n_step * m]
Sorry for disturbing you.
Description
Add a ReadMe file in the GitHub folder.
Explain usage of the Templates
Other Comments
Principles of NLP Documentation
Each landing page at the folder level should have a ReadMe which explains:
○ A summary of what this folder offers.
○ Why and how it benefits users.
○ As applicable, documentation for using it, a brief description, etc.
Scenarios folder:
○
When using a pocketsphinx wake word, mycroft tries to load a language-specific model. If the model doesn't exist, the load fails. (Reported on the forums.)
This should be handled by a fallback mechanism: if no language-specific model exists, it should log a warning and fall back to using the English model that is included in mycr
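A sketch of that fallback (paths and names are hypothetical, not Mycroft's actual code):

import logging
import os

LOG = logging.getLogger(__name__)

def find_pocketsphinx_model(lang, model_dir):
    path = os.path.join(model_dir, lang)
    if not os.path.isdir(path):
        # warn instead of failing, then use the bundled English model
        LOG.warning('No pocketsphinx model for %s, falling back to en-us', lang)
        path = os.path.join(model_dir, 'en-us')
    return path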
It would be worthwhile to provide a tutorial on how to train a simple cross-language classification model using sentencepiece. Suppose we have a given training set and have chosen a model (let's say a simple Word2Vec plus softmax, an LSTM model, etc.): how do we use the created sentencepiece model (vocabulary/codes) to feed this model for training and inference?
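Until such a tutorial exists, the basic train/encode loop looks like this (file names and vocab size are placeholders):

import sentencepiece as spm

# train one shared subword model over the multilingual training text
spm.SentencePieceTrainer.Train(
    '--input=train.txt --model_prefix=xlang --vocab_size=8000')

sp = spm.SentencePieceProcessor()
sp.Load('xlang.model')
pieces = sp.EncodeAsPieces('This is a test')  # subword strings
ids = sp.EncodeAsIds('This is a test')        # integer ids to feed the model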
When trying to find the options that can be used to initialize NLP.js, I have to delve into the code and navigate through many files to find what I am looking for, if I am lucky. Also, the structure of the settings is hard to figure out and requires a lot of trial and error.
For example, we are using the NlpManager with these:
{
  ner: { builtins: [] },
  autoSave: false,
  langua
It would be great if simple replace could be made (partly) case sensitive. By this I mean:
- detect if there is uppercase in the part before =
- if so, change the behaviour to check the left part case-sensitively, and suggest the right parts with the same casing, except for the first token when it is at the start of the sentence (a sketch follows below)
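One possible reading of that behaviour, as a rough sketch (function names and the exact casing rules are assumptions):

def is_match(left, token, sentence_start):
    # the rule is case sensitive only if the part before '=' has uppercase
    if not any(c.isupper() for c in left):
        return token.lower() == left.lower()
    if sentence_start:
        # allow a capitalized variant of the pattern at sentence start
        return token in (left, left[:1].upper() + left[1:])
    return token == left

def suggest(right, token, sentence_start):
    # mirror sentence-initial capitalization onto the suggested replacement
    if sentence_start and token[:1].isupper():
        return right[:1].upper() + right[1:]
    return right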
In a nutshell
A model pretrained on both natural language and program code. As in translation, the natural language and the program code are joined with a separator token for training. In addition to BERT's (#959) masked-token objective, it uses ELECTRA's (#1539) replaced-token detection as an objective. Effectiveness is confirmed on natural-language code search and on missing-token inference (multiple choice).
Paper link
https://arxiv.org/abs/2002.08155
Authors / Affiliations
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou
- Microsoft Research Asia
- Ha
Tagging this as a Good First issue if anyone's interested.