Dataset Features

Dataset changes

Dataset Cards

Documentation

General improvements and bug fixes

New Contributors

Full Changelog: 2.3.2...2.4.0


Bug fixes

  • Fix double dots in data files by @lhoestq in #4505
    • Fixes a FileNotFoundError raised when a path containing /../ is passed to data_files
  • fix ETT m1/m2 test/val dataset by @kashif in #4499
  • Corrected broken links in doc by @clefourrier in #4501
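
To illustrate the kind of path involved in #4505, here is a minimal sketch that uses only the Python standard library. It shows how a data_files value containing /../ collapses to a shorter path. The example path and the collapse_double_dots helper are hypothetical, and this is not the fix that the library itself applies.

```python
import posixpath


def collapse_double_dots(path: str) -> str:
    """Collapse '.' and '..' segments in a POSIX-style path (hypothetical helper)."""
    return posixpath.normpath(path)


# Hypothetical data_files value that contains a parent-directory segment.
raw = "data/train/../test/part-0.csv"
print(collapse_double_dots(raw))
```

The sketch only normalizes the string; it does not check that the resulting file exists.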

New Contributors

Full Changelog: 2.3.1...2.3.2


Bug fixes

  • Fix patching module that doesn't exist by @lhoestq in #4495
    • Fixes a bug that occurred when importing the library without scipy installed
  • Re-add download_manager module in utils by @lhoestq in #4497
    • Fixes the moved imports of DownloadConfig, DownloadMode, and DownloadManager
  • Support streaming UDHR dataset by @albertvillanova in #4487

Full Changelog: 2.3.0...2.3.1


Datasets Changes

Dataset Features

Dataset Cards

  • Minor fixes/improvements in scene_parse_150 card by @mariosasko in #4447
  • Tidy up license metadata for google_wellformed_query, newspop, sick by @leondz in #4378
  • Fix example in opus_ubuntu, Add license info by @leondz in #4360
  • Update README.md of fquad by @lhoestq in #4450

Documentation

Other improvements and bug fixes

New Contributors

Full Changelog: 2.2.2...2.3.0


Datasets fixes

Bug fixes

  • Support lists of multi-dimensional numpy arrays by @albertvillanova in #4194
  • Check if dataset features match before push in DatasetDict.push_to_hub by @mariosasko in #4372
  • Pin dill by @albertvillanova in #4380
    • dill 0.3.5 causes some issues in transformers, so the version is pinned to <0.3.5 for now
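
As a rough illustration of what the dill pin in #4380 means, the sketch below checks whether a version string falls under the <0.3.5 bound (the equivalent pip requirement would be written dill<0.3.5). The parse_version and is_below_pin helpers are hypothetical, handle only plain numeric versions, and are not part of the library.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.3.4' into (0, 3, 4); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def is_below_pin(installed: str, upper_bound: str = "0.3.5") -> bool:
    """Return True if `installed` is strictly lower than `upper_bound`."""
    return parse_version(installed) < parse_version(upper_bound)


for candidate in ("0.3.4", "0.3.5"):
    print(candidate, is_below_pin(candidate))
```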

Dataset Cards

  • Adding eval metadata for ade v2 by @sashavor in #4319
  • Adding eval metadata for AG News by @sashavor in #4329
  • Adding eval metadata to Allociné dataset by @sashavor in #4330
  • Adding eval metadata to Amazon Polarity by @sashavor in #4331
  • Adding eval metadata for arabic speech corpus by @sashavor in #4332
  • Adding eval metadata for Banking 77 by @sashavor in #4333
  • Eval metadata Batch 4: Tweet Eval, Tweets Hate Speech Detection, VCTK, Weibo NER, Wisesight Sentiment, XSum, Yahoo Answers Topics, Yelp Polarity, Yelp Review Full by @sashavor in #4338
  • Eval metadata batch 3: Reddit, Rotten Tomatoes, SemEval 2010, Sentiment 140, SMS Spam, Snips, SQuAD, SQuAD v2, Timit ASR by @sashavor in #4337
  • Eval metadata batch 1: BillSum, CoNLL2003, CoNLLPP, CUAD, Emotion, GigaWord, GLUE, Hate Speech 18, Hate Speech by @sashavor in #4335
  • Eval metadata batch 2 : Health Fact, Jigsaw Toxicity, LIAR, LJ Speech, MSRA NER, Multi News, NCBI Disease, Poem Sentiment by @sashavor in #4336

Docs

  • Add API code examples for Builder classes by @stevhliu in #4313
  • Add redirect to dataset script in the repo structure page by @lhoestq in #4369

Other improvements and bug fixes

New Contributors

Full Changelog: 2.2.1...2.2.2


Datasets bug fixes

  • Fix cnn_dailymail (dm stories were ignored) by @lhoestq in #4317
    • datasets 2.2.0 introduced a bug in cnn_dailymail that left some examples missing from the dataset

General improvements and bug fixes

New Contributors

Full Changelog: 2.2.0...2.2.1


Dataset Changes

Dataset Features

Dataset Cards

Metrics Changes

Metric Cards

Documentation

  • Document save_to_disk and push_to_hub on images and audio files by @lhoestq in #4193
  • Add to docs how to load from local script by @albertvillanova in #4200
  • Add code examples to API docs by @stevhliu in #4168
  • Add code examples for DatasetDict by @stevhliu in #4245
  • Add API code examples for IterableDataset by @stevhliu in #4274
  • Add packaged builder configs to the documentation by @lhoestq in #4307
  • [Imagefolder] Docs + Don't infer labels from file names when there are metadata + Error messages when metadata and images aren't linked correctly by @lhoestq in #4311

General improvements and bug fixes

New Contributors

Full Changelog: 2.1.0...2.2.0


Datasets Changes

Dataset Cards

Datasets Tags and Search on the Hugging Face Hub

Metrics Changes

Metric Cards

Documentation

General improvements and bug fixes

New Contributors

Full Changelog: 2.0.0...2.1.0


🤗 Datasets 2.0.0

We're happy to announce that our new documentation is available at hf.co/docs/datasets!

Dataset Features

  • Load a folder of images using the imagefolder dataset loader:
  • Push your image and audio datasets to the Hugging Face Hub with push_to_hub:
    • Add support for Audio and Image feature in push_to_hub by @mariosasko in #3685
  • New processing methods for streaming datasets:
    • Add IterableDataset.filter by @lhoestq in #3826
    • Manipulate columns on IterableDataset (rename columns, cast, etc.) by @lhoestq in #3862
    • Add the new methods to IterableDatasetDict by @lhoestq in #3923
  • And more:
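
As a sketch of the imagefolder loader listed above, the example below builds a small directory layout and wraps the load_dataset("imagefolder", data_dir=...) call. It assumes a train/<label>/ layout with label names taken from the directory names; the make_layout and load_image_folder helpers and the cat/dog labels are hypothetical. The datasets import is kept inside the function so the layout helper runs without the library installed.

```python
import tempfile
from pathlib import Path


def make_layout(root: Path, labels=("cat", "dog")) -> list:
    """Create root/train/<label>/ directories and return them sorted (hypothetical helper)."""
    created = []
    for label in labels:
        label_dir = root / "train" / label
        label_dir.mkdir(parents=True, exist_ok=True)
        created.append(label_dir)
    return sorted(created)


def load_image_folder(data_dir: str):
    """Load a folder of images with the packaged imagefolder loader."""
    # Imported lazily: requires the datasets library (2.0.0 or later).
    from datasets import load_dataset

    return load_dataset("imagefolder", data_dir=data_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_layout(Path(tmp)):
            print(path.relative_to(tmp))
```

Calling load_image_folder only makes sense once the label directories contain image files.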

Breaking changes

  • API changes for map and shuffle for datasets loaded in streaming mode:
    • Align map when streaming: update instead of overwrite + add missing parameters by @lhoestq in #3801
    • Align IterableDataset.shuffle with Dataset.shuffle by @lhoestq in #3842
  • Rename GenerateMode to DownloadMode by @albertvillanova in #3759
  • Remove deprecated methods/params (preparation for v2.0) by @mariosasko in #3803
  • Remove deprecated remove_columns param in filter by @mariosasko in #3827
  • Module namespace cleanup for v2.0 by @mariosasko in #3875

Dataset Changes

Dataset cards

Metric Changes

Metric cards

New documentation

General improvements and bug fixes

New Contributors

Full Changelog: 1.18.3...2.0.0


Bug fixes

Full Changelog: 1.18.3...1.18.4