Natural language processing

Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In the 1950s, Alan Turing published an article that proposed a measure of intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.

Recently, the HF Trainer was extended to support full fp16 evaluation via --fp16_full_eval. I'd have expected it to be as fast as or faster than eval with an fp32 model, but surprisingly I have noticed a 25% slowdown when using it.

This may also affect DeepSpeed, which runs eval in fp16 as well, but we can't compare it against a baseline, since it only runs in fp16.

I wonder if someone would like t
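A slowdown like the 25% reported above is easiest to confirm with a repeatable timing harness. The sketch below is a generic, hypothetical helper (median_runtime and relative_slowdown are illustrative names, not part of transformers): it times a zero-argument callable, for example one wrapping Trainer.evaluate() on a Trainer configured with and without --fp16_full_eval, and reports the fractional slowdown.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds of fn() over `repeats` timed runs.

    `warmup` untimed calls run first so one-off costs (allocation,
    kernel selection, caching) do not skew the measurement.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def relative_slowdown(baseline_s: float, candidate_s: float) -> float:
    """Fractional slowdown of candidate vs. baseline (0.25 means 25% slower)."""
    return candidate_s / baseline_s - 1.0


# Hypothetical usage, assuming two Trainer objects built with identical
# arguments except for fp16_full_eval:
#   fp32_s = median_runtime(lambda: trainer_fp32.evaluate())
#   fp16_s = median_runtime(lambda: trainer_fp16.evaluate())
#   print(f"fp16 full eval slowdown: {relative_slowdown(fp32_s, fp16_s):.0%}")
```

The median is used rather than the mean so a single outlier run does not dominate. If the timed callable launches asynchronous CUDA work and returns before it finishes, it should call torch.cuda.synchronize() before returning, otherwise the timing only covers kernel launches.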

@gojomo

Not a high priority at all, but it would be more sensible to implement such a tutorial/testing utility corpus elsewhere (maybe under /test/ or some other data- or doc-related module) rather than in gensim.models.word2vec.

Originally posted by @gojomo in RaRe-Technologies/gensim#2939 (comment)

When setting train_parameters to False, we very often also want to disable dropout/batchnorm, in other words, to run the pretrained model in eval mode.
We've made a small modification to PretrainedTransformerEmbedder that lets you specify whether the token embedder should be forced into eval mode during the training phase.

Do you think this feature might be handy? Should I open a PR?
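The idea can be sketched in plain PyTorch. This is not AllenNLP's PretrainedTransformerEmbedder; FrozenEvalEmbedder and its eval_mode flag are hypothetical names for illustration. The key point is that nn.Module.train() recursively switches every child back to training mode, so the wrapper overrides train() and re-applies eval() to the inner module afterwards. Since nn.Module.eval() is implemented as train(False), the override covers both calls.

```python
import torch
from torch import nn


class FrozenEvalEmbedder(nn.Module):
    """Hypothetical wrapper: optionally freezes an inner module and keeps it in eval mode."""

    def __init__(self, inner: nn.Module, train_parameters: bool = False, eval_mode: bool = False):
        super().__init__()
        self.inner = inner
        self.eval_mode = eval_mode
        # Mirrors train_parameters=False: exclude the weights from gradient updates.
        for param in self.inner.parameters():
            param.requires_grad = train_parameters
        if eval_mode:
            self.inner.eval()

    def train(self, mode: bool = True) -> "FrozenEvalEmbedder":
        # nn.Module.train() sets `mode` on this module and all children...
        super().train(mode)
        # ...so force the inner module back into eval mode (dropout off,
        # batchnorm using running statistics) when requested.
        if self.eval_mode:
            self.inner.eval()
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.inner(x)


# Usage with a toy stand-in for a pretrained transformer.
inner = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
embedder = FrozenEvalEmbedder(inner, train_parameters=False, eval_mode=True)
embedder.train()  # the outer model enters training mode...
print(embedder.training, embedder.inner.training)  # ...but the inner module stays in eval
```

Because dropout is disabled inside the wrapper, repeated forward passes on the same input give identical outputs even while the surrounding model is training.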

Hello,

It seems that when a cache file is saved by calling dataset.map for preprocessing, it gets the user's permissions but none of the group permissions. Since we share data files across members of our team, this is causing problems: we have to keep resetting the permissions on the files. Do you know of a workaround, or a way to set the permissions correctly?
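Until there is a library-side fix, one possible workaround is to add the group permission bits after preprocessing. The sketch below assumes a POSIX filesystem and that you know which directory holds the cache files; grant_group_access is a hypothetical helper, not part of the datasets API, and it does not address why the files are created without group bits in the first place.

```python
from __future__ import annotations

import os
import stat
from pathlib import Path


def grant_group_access(cache_dir: str) -> list[Path]:
    """Add group read/write bits (plus execute on directories) under cache_dir.

    Returns the paths whose mode was actually changed.
    """
    root = Path(cache_dir)
    changed = []
    for path in [root, *root.rglob("*")]:
        current = stat.S_IMODE(path.stat().st_mode)
        wanted = current | stat.S_IRGRP | stat.S_IWGRP
        if path.is_dir():
            # Group members need execute permission to traverse a directory.
            wanted |= stat.S_IXGRP
        if wanted != current:
            os.chmod(path, wanted)
            changed.append(path)
    return changed
```

Running this after each dataset.map call (or from a scheduled job over the shared cache directory) keeps existing bits intact and only adds the group ones, so files the team can already access are left unchanged.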

Hi, I would like to propose a cleaner implementation of 'test_indices':

We can remove the unneeded np.array cast:

Cleaner/New:
test_indices = list(set(range(len(texts))) - set(train_indices))

Old:
test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))
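A quick check with hypothetical toy data shows that both versions select the same indices. One caveat applies to both: iteration order over a set is an implementation detail, so if the order of the test split matters, sorting makes it explicit. np.setdiff1d is a NumPy alternative that returns the sorted, unique difference directly.

```python
import numpy as np

# Hypothetical toy data standing in for the real corpus and training split.
texts = ["t0", "t1", "t2", "t3", "t4", "t5"]
train_indices = [4, 0, 2]

# Proposed version: a plain list, no np.array cast.
test_indices = list(set(range(len(texts))) - set(train_indices))

# Old version: the same elements wrapped in an ndarray.
old_test_indices = np.array(list(set(range(len(texts))) - set(train_indices)))

# Same elements either way; compare sorted copies because set order is not guaranteed.
assert sorted(test_indices) == sorted(old_test_indices.tolist())

# Deterministic alternatives if order matters downstream.
test_indices_sorted = sorted(set(range(len(texts))) - set(train_indices))
test_indices_np = np.setdiff1d(np.arange(len(texts)), train_indices)

print(test_indices_sorted)       # [1, 3, 5]
print(test_indices_np.tolist())  # [1, 3, 5]
```

If texts is a NumPy array, fancy indexing such as texts[test_indices] accepts either a list or an ndarray of indices; if it is a plain Python list, a comprehension like [texts[i] for i in test_indices] works with either form.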


Here are 13,390 public repositories matching this topic:

huggingface / transformers

apachecn / AiLearning

google-research / bert

hankcs / HanLP

explosion / spaCy

oxford-cs-deepnlp-2017 / lectures

virgili0 / Virgilio

RaRe-Technologies / gensim

keon / awesome-nlp

bharathgs / Awesome-pytorch-list

RasaHQ / rasa

flairNLP / flair

chiphuyen / stanford-tensorflow-tutorials

allenai / allennlp

spencermountain / compromise

nltk / nltk

botpress / botpress

hanxiao / bert-as-service

graykode / nlp-tutorial

NLP-LOVE / ML-NLP

stanfordnlp / CoreNLP

sloria / TextBlob

huggingface / datasets

brightmart / text_classification

crownpku / Awesome-Chinese-NLP

brightmart / nlp_chinese_corpus

dragen1860 / TensorFlow-2.x-Tutorials

nfmcclure / tensorflow_cookbook

NLPchina / ansj_seg

zihangdai / xlnet