Natural language processing
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published the article "Computing Machinery and Intelligence", which proposed a measure of machine intelligence now called the Turing test. More recent techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Describe the bug
Streaming Datasets can't be pickled, so any interaction between them and multiprocessing results in a crash.
Steps to reproduce the bug
import transformers
from transformers import Trainer, AutoModelForCausalLM, TrainingArguments
import datasets
ds = datasets.load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True).with_format("
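The report above says streaming datasets cannot be pickled. As a minimal sketch of the general Python behaviour that could lead to this kind of crash (an assumption; it does not reproduce the internals of the datasets library), the following compares a live generator with a plain description of the data. The names `is_picklable`, `live_stream`, and `recipe` are hypothetical.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# A generator holds suspended execution state, which pickle refuses to serialize.
live_stream = (line for line in ["a", "b", "c"])

# A plain description of where the data lives (hypothetical shard names) is
# ordinary data, so it can be sent to a worker process and re-opened there.
recipe = {"shards": ["part-0.txt", "part-1.txt"], "offset": 0}

print(is_picklable(live_stream))
print(is_picklable(recipe))
```

Multiprocessing typically pickles the objects it sends to worker processes, so an object that wraps a live iterator tends to fail at that boundary, while one that stores only the recipe for re-creating the iterator does not.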
In gensim/models/fasttext.py:
model = FastText(
    vector_size=m.dim,
    window=m.ws,
    epochs=m.epoch,
    negative=m.neg,
    # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
    # or model=3 supervi
Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor. This fails when it tries to load data from my compressed files.
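One way the request above could be addressed is to choose the opener by file suffix before reading lines. The sketch below uses only the Python standard library; the helpers `open_text` and `read_lines` are hypothetical and are not part of AllenNLP.

```python
import gzip
from pathlib import Path


def open_text(path, encoding="utf-8"):
    """Open a plain or gzip-compressed text file for reading, chosen by suffix."""
    path = Path(path)
    if path.suffix == ".gz":
        # Mode "rt" makes gzip decode bytes to str, like a normal text file.
        return gzip.open(path, mode="rt", encoding=encoding)
    return open(path, mode="r", encoding=encoding)


def read_lines(path):
    """Return the non-blank lines of a file, with trailing newlines removed."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

Because both branches return a text-mode file object, code that consumes the lines does not need to know whether the input was compressed.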
Checking the Python files in NLTK with "python -m doctest" reveals that many tests are failing. In many cases, the failures are just cosmetic discrepancies between the expected and the actual output, such as missing a blank line, or unescaped linebreaks. Other cases may be real bugs.
If these failures could be avoided, it would become possible to improve CI by running "python -m doctest" each t
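For the purely cosmetic whitespace mismatches mentioned above, the standard doctest module offers option flags such as `NORMALIZE_WHITESPACE`. The following is a minimal sketch of the effect on a made-up example; `SAMPLE` and `count_failures` are hypothetical, and whether this flag would resolve the specific NLTK failures is an assumption.

```python
import doctest

# A made-up doctest whose actual output differs from the expected output
# only in the amount of whitespace.
SAMPLE = '''
>>> print("tokens:   3")
tokens: 3
'''


def count_failures(text, optionflags=0):
    """Run the doctest examples found in text and return the number of failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name="sample", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts are of interest here.
    results = runner.run(test, out=lambda message: None)
    return results.failed


strict = count_failures(SAMPLE)
lenient = count_failures(SAMPLE, doctest.NORMALIZE_WHITESPACE)
```

On the command line, the same flag can be passed with `python -m doctest -o NORMALIZE_WHITESPACE file.py`.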
Created by Alan Turing
Feature request
Dear huggingface community,
I am experimenting with the ViTMAE model from the transformers library. The ViTMAEConfig class has the option "num_channels" to specify the number of input (color) channels belonging to an image. If I modify this, say, to 1 (for processing grayscale images), the model throws an error, due to the number "3" being hard-coded into the functions "patch
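To illustrate the kind of change this request is about, here is a minimal NumPy sketch of a patch-splitting step that reads the channel count from its input instead of assuming 3. The function `split_into_patches` is hypothetical and is not the transformers implementation; the channel-first layout and the patch ordering are assumptions.

```python
import numpy as np


def split_into_patches(images, patch_size):
    """Split images of shape (batch, channels, height, width) into flat patches.

    Returns an array of shape (batch, num_patches, patch_size**2 * channels).
    The channel count is taken from the input rather than hard-coded.
    """
    batch, channels, height, width = images.shape
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (batch, channels, rows, patch, cols, patch)
    x = images.reshape(batch, channels, rows, patch_size, cols, patch_size)
    # Reorder to (batch, rows, cols, patch, patch, channels)
    x = x.transpose(0, 2, 4, 3, 5, 1)
    return x.reshape(batch, rows * cols, patch_size * patch_size * channels)


gray = np.zeros((2, 1, 8, 8))  # single-channel (grayscale) input
rgb = np.zeros((2, 3, 8, 8))   # three-channel input
print(split_into_patches(gray, 4).shape)
print(split_into_patches(rgb, 4).shape)
```

Deriving the last dimension from `images.shape` means the same function handles grayscale and RGB inputs without modification.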