
Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 19,401 public repositories matching this topic...

jnothman
jnothman commented May 12, 2021

We should be using pkg_resources (or importlib.resources if our min Python version is 3.7) instead of uses of __file__.

$ git grep '__file__' sklearn/
sklearn/__check_build/__init__.py:    local_dir = os.path.split(__file__)[0]
sklearn/datasets/_base.py:    module_path = dirname(__file__)
sklearn/datasets/_base.py:    module_path = dirname(__file__)
sklearn/datasets/_base.py:    
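
For illustration, here is a minimal sketch of reading a packaged data file through importlib.resources rather than through __file__. It is not scikit-learn's code: `demo_pkg`, `data.csv`, and `read_packaged_text` are hypothetical names created only for this demo, and the `files()` API used here assumes Python 3.9 or later (the 3.7 API mentioned above offers `read_text` and `path` instead).

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_packaged_text(package: str, resource: str) -> str:
    """Read a text resource shipped inside `package` without using __file__."""
    # files() returns a Traversable rooted at the package, so this also works
    # for packages that are not laid out as plain directories (e.g. zip imports).
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


# Build a throwaway package (hypothetical names) so the sketch is runnable.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "data.csv").write_text("a,b\n1,2\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        content = read_packaged_text("demo_pkg", "data.csv")
    finally:
        # Undo the import-system changes made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(content)
```

The same helper could stand in for the `dirname(__file__)` pattern shown in the grep output, provided the data files are declared as package data.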
superset
GregOnEvo
GregOnEvo commented May 11, 2021

Keyboard navigation in the control panel of the Explore view is difficult.

Expected results

You should be able to move focus between adjacent controls in the control panel with a single Tab key press
and visually distinguish which element has focus. You should be able to interact with controls using the keyboard
(Enter or space bar for button-like things).

Actual results

Several tab

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated May 13, 2021
  • Python
rlan
rlan commented Jun 11, 2021

What is the problem?

After running tune.run, the experiment results are missing from progress.csv but are in result.json.
A possible solution is written by mannyv: https://discuss.ray.io/t/saving-checkpoints-with-good-custom-metric-using-tune-run/2109/12

Ray version and other system information (Python version, TensorFlow version, OS):

Ray version 1.2.0.
TensorFlow 1.15.4.
Python

dash
pytorch-lightning
carmocca
carmocca commented Jun 10, 2021

🐛 Bug

If accumulate_grad_batches is enabled, we don't call on_after_backward until we step the optimizers

https://github.com/PyTorchLightning/pytorch-lightning/blob/d209b689796719d1ab4fcc8e1c26b8b57cd348c4/pytorch_lightning/trainer/training_loop.py#L757-L763

This means on_after_backward is acting like on_before_optimizer_step.

So we should add that and always run `on_after_b
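
To make the hook ordering concrete, here is a plain-Python simulation of gradient accumulation. It is only a sketch of the behavior the issue asks for, not PyTorch Lightning's training loop: `run_epoch` and the event strings are hypothetical, and the stepping rule (step every `accumulate_grad_batches` batches, plus once on the last batch) is an assumption made for the demo.

```python
from typing import List


def run_epoch(num_batches: int, accumulate_grad_batches: int) -> List[str]:
    """Record the order of hook calls in a simulated accumulation loop."""
    events: List[str] = []
    for batch_idx in range(num_batches):
        # Gradients are accumulated on every batch...
        events.append(f"backward:{batch_idx}")
        # ...so a hook named on_after_backward fires on every batch too.
        events.append(f"on_after_backward:{batch_idx}")

        is_last_batch = batch_idx == num_batches - 1
        should_step = (batch_idx + 1) % accumulate_grad_batches == 0 or is_last_batch
        if should_step:
            # Only here do the accumulated gradients get applied.
            events.append(f"on_before_optimizer_step:{batch_idx}")
            events.append(f"optimizer_step:{batch_idx}")
    return events


events = run_epoch(num_batches=4, accumulate_grad_batches=2)
for event in events:
    print(event)
```

With four batches and an accumulation factor of two, the after-backward event is recorded four times and the before-step event twice, which is the distinction between the two hooks.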

gensim
gojomo
gojomo commented Jun 12, 2021

(triggered by SO question: https://stackoverflow.com/questions/67944732/using-my-own-stopword-list-with-gensim-corpora-textcorpus-textcorpus/67951592#67951592)

Gensim has two remove_stopwords() functions with similar, but slightly-different behavior that risks confusing users.

gensim.parsing.preprocessing.remove_stopwords takes a space-delimited string, and always consults the current

danieldeutsch
danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file directly and reads lines for the Predictor, which fails when it tries to load data from my compressed files.
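
One way such a feature could work is sketched below using only the standard library. This is not AllenNLP code: `open_maybe_gzip` and `read_lines` are hypothetical helpers, and detecting compression from the two-byte gzip magic number (rather than the file extension) is an assumption of this sketch.

```python
import gzip
import os
import tempfile
from typing import IO, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(path: str) -> IO[str]:
    """Open `path` as text, transparently decompressing gzip files."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_lines(path: str) -> Iterator[str]:
    """Yield non-empty lines, as a line-based predict command might."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


# Demo: the same content read from a plain file and from a gzipped file.
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "inputs.jsonl")
    gz_path = os.path.join(tmp, "inputs.jsonl.gz")
    payload = '{"sentence": "first"}\n{"sentence": "second"}\n'
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write(payload)
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write(payload)

    plain_lines = list(read_lines(plain_path))
    gz_lines = list(read_lines(gz_path))

print(plain_lines == gz_lines)
```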

nni