natural-language-processing
Natural language processing (NLP) is a field of computer science that studies how computers process, understand, and generate human language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a test of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Here are 5,494 public repositories matching this topic...
When you look at the variables in the pretrained base uncased BERT, the variables look like list 1. When you train from scratch, 2 additional variables per layer are introduced, with suffixes adam_m and adam_v. It would be nice if someone could explain what these variables are and what their significance is to the training process.
If one were to manually initialize variables from a pri
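For context on the question above: in the TensorFlow BERT checkpoints, the adam_m and adam_v suffixes correspond to the Adam optimizer's per-parameter first and second moment estimates, that is, running averages of the gradient and of the squared gradient. They are optimizer state rather than model weights, so they matter when resuming training but not for inference. Below is a minimal pure-Python sketch of one Adam-style update without bias correction; the function name and hyperparameter defaults are illustrative assumptions, not taken from the BERT code.

```python
import math


def adam_step(param, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-6):
    """One Adam-style update for a single scalar parameter (no bias correction)."""
    # First moment (the "adam_m" slot): running average of gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment (the "adam_v" slot): running average of squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # The step is scaled per parameter by the square root of the second moment.
    param = param - lr * m / (math.sqrt(v) + eps)
    return param, m, v


# Both slots start at zero and are updated alongside the parameter.
param, m, v = 1.0, 0.0, 0.0
for grad in [0.5, 0.4, 0.3]:
    param, m, v = adam_step(param, grad, m, v)
print(param, m, v)
```

Because each trainable variable gets its own m and v, a checkpoint saved during training holds roughly three values per weight, which is why the extra variables appear only after training from scratch.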
I was going through the existing enhancement issues again and thought it'd be nice to collect ideas for spaCy plugins and related projects. There are always people in the community who are looking for new things to build, so here's some inspiration
If you have questions about the projects I suggested,
Example (from TfidfTransformer)
    if isinstance(docs[0], tuple):
        docs = [docs]
    return [self.gensim_model[doc] for doc in docs]
This method expects a list of tuples, instead of an iterable. This means that the entire corpus has to be stored as a lis
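The concern in this excerpt is that indexing docs[0] forces the corpus to be a list. A rough sketch of a lazy alternative follows: it peeks at the first item and then yields results one document at a time. The names `transform_stream` and `DoubleWeights` are hypothetical and only stand in for the wrapped model; this is not gensim's API.

```python
from itertools import chain


def transform_stream(model, docs):
    """Lazily apply model[doc] to each document in any iterable.

    `model` is any object supporting model[doc]; it stands in for the
    wrapped gensim model in the excerpt (an assumption for illustration).
    """
    iterator = iter(docs)
    try:
        first = next(iterator)
    except StopIteration:
        return  # empty corpus: nothing to yield
    if isinstance(first, tuple):
        # A single document (a sequence of (id, weight) tuples) was passed:
        # treat the whole input as one document, as the original code does.
        yield model[[first, *iterator]]
        return
    for doc in chain([first], iterator):
        yield model[doc]


class DoubleWeights:
    """Toy stand-in model: doubles every weight in a bag-of-words document."""

    def __getitem__(self, doc):
        return [(token_id, 2 * weight) for token_id, weight in doc]


corpus = ([(0, 1.0), (1, 2.0)] for _ in range(3))  # a generator, not a list
results = list(transform_stream(DoubleWeights(), corpus))
print(results)
```

Only one document is held in memory at a time, at the cost of returning a generator instead of a list.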
Looping the process of writing images into the .tfrecords-file works fine, but how do I read multiple images from a .tfrecords-file?
Is there a simple solution? It would be great if this were added to the code.
This output is unexpected. Stemming 'In' returns the capitalized 'In' from PorterStemmer.
>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> porter.stem('In')
'In'
More details on https://stackoverflow.com/q/60387288/610569
I tried selecting hyperparameters for my model following "Tutorial 8: Model Tuning" below:
https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_8_MODEL_OPTIMIZATION.md
Although I got the "param_selection.txt" file in the result directory, I am not sure how to interpret the file, i.e. which parameter combination to use. At the bottom of the "param_selection.txt" file, I found "
Describe the bug
Calling Predictor.get_gradients() returns an empty dictionary
To Reproduce
I am replicating the binary sentiment classification task described in the paper 'Attention is not Explanation' (Jain and Wallace 2019 - https://arxiv.org/pdf/1902.10186.pdf).
My first experiment is on the Stanford Sentiment TreeBank Dataset. I need to measure the correlation between th
Description of Problem:
I had to add a Facebook Messenger channel integration to a bot I spun up and found the Facebook Messenger docs a bit lacking, especially on the special response template keys quick_replies and elements; I had to read through the Facebook channel source to figure out what was allowed and disallowed.
Overview of the Solution:
FB Messenger docs updated with additio
Prerequisites
Please fill in by replacing [ ] with [x].
- Are you running the latest bert-as-service?
- Did you follow the installation and the usage instructions in README.md?
- Did you check the [FAQ list in README.md](https://github.com/hanxiao/bert-as-se
The Dutch sentiment file (see nl-sentiment.xml) contains words with negative subjectivity, which violates the valid range for subjectivity, [0.0, 1.0]. I did not check how many cases there are, but the word "verloren" is one example.
Sentiment files for other languages may have negative subjectivity as well. Since
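One way to count the affected entries is to scan a sentiment file for out-of-range values. The sketch below assumes each entry is a `word` element with `form` and `subjectivity` attributes; the inline sample is made up for illustration and is not copied from nl-sentiment.xml.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout; not real data from any sentiment file.
SAMPLE = """
<sentiment language="nl">
  <word form="voorbeeld1" polarity="0.3" subjectivity="0.6" />
  <word form="voorbeeld2" polarity="-0.4" subjectivity="-0.2" />
  <word form="voorbeeld3" polarity="0.0" subjectivity="1.0" />
</sentiment>
"""


def out_of_range(root, low=0.0, high=1.0):
    """Return (form, subjectivity) pairs whose subjectivity lies outside [low, high]."""
    bad = []
    for word in root.iter("word"):
        value = float(word.get("subjectivity", "0"))
        if not low <= value <= high:
            bad.append((word.get("form"), value))
    return bad


violations = out_of_range(ET.fromstring(SAMPLE))
print(violations)
```

For a real file, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`, and the same function can be run over the files for other languages.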
As per the StanfordCoreNLP documentation for CoreLabel, the functions after() and before() should return the whitespace strings between the token and the next and previous tokens, respectively.
However, they always return an empty string, even when whitespace is present, when the tokenizer option **normalizeOth
The words and sentences properties are helpers that use the textblob.tokenizers.WordTokenizer and textblob.tokenizers.SentenceTokenizer classes, respectively.
You can use other tokenizers, such as those provided by NLTK, by passing them into the TextBlob constructor then accessing the t
Is your feature request related to a problem? Please describe.
Other related issues: #408 #251
I trained a Chinese model for spaCy, linked it to [spacy's package folder]/data/zh (using spacy link) and want to use that for ludwig. However, when I tried to set the config for ludwig, I received an error, which tells me that there is no way to load the Chinese model.
ValueError: Key ch
Excuse me, https://github.com/graykode/nlp-tutorial/blob/master/1-1.NNLM/NNLM-Torch.py#L50 The comment here may be wrong. It should be X = X.view(-1, n_step * m) # [batch_size, n_step * m]
Sorry for disturbing you.
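The shape claimed in the proposed comment can be checked quickly. The sketch below uses NumPy's `reshape` as a stand-in for PyTorch's `view` (an assumption made to keep the example dependency-light), and the sizes are illustrative rather than taken from the tutorial.

```python
import numpy as np

batch_size, n_step, m = 4, 2, 3          # illustrative sizes, not from the tutorial
X = np.zeros((batch_size, n_step, m))    # embedded input: [batch_size, n_step, m]
flat = X.reshape(-1, n_step * m)         # NumPy counterpart of X.view(-1, n_step * m)
print(flat.shape)                        # (4, 6), i.e. [batch_size, n_step * m]
```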
Description
Add a ReadMe file in the GitHub folder.
Explain usage of the Templates
Other Comments
Principles of NLP Documentation
Each landing page at the folder level should have a ReadMe which explains -
○ Summary of what this folder offers.
○ Why and how it benefits users
○ As applicable - Documentation of using it, brief description etc
Scenarios folder:
○
When using a pocketsphinx wake word, mycroft tries to load a language-specific model. If the model doesn't exist, the load fails. (report on the forums)
This should be handled with a fallback mechanism: if no language-specific model exists, it should log a warning and fall back to using the English model that is included in mycr
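The fallback described above might look roughly like the following. The function name, the one-directory-per-language layout, and the `en-us` default are assumptions for illustration, not Mycroft's actual code.

```python
import logging
import os
import tempfile

LOG = logging.getLogger(__name__)


def resolve_model_dir(model_root, lang, fallback_lang="en-us"):
    """Return the model directory for `lang`, or the fallback if it is missing."""
    candidate = os.path.join(model_root, lang)
    if os.path.isdir(candidate):
        return candidate
    LOG.warning("No wake word model for %s, falling back to %s", lang, fallback_lang)
    return os.path.join(model_root, fallback_lang)


# Demo with a temporary directory that holds only two language models.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "en-us"))
    os.makedirs(os.path.join(root, "es-es"))
    missing = os.path.basename(resolve_model_dir(root, "de-de"))
    present = os.path.basename(resolve_model_dir(root, "es-es"))
print(missing, present)
```

Logging a warning instead of raising keeps the wake word usable while still making the missing model visible in the logs.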
Tagging this as a Good First issue if anyone's interested.