Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Here are 26,173 public repositories matching this topic...

GaelVaroquaux
GaelVaroquaux commented Feb 7, 2022

Describe the issue linked to the documentation

Many legitimate notebook-style examples have been broken, specifically by the following PR:
scikit-learn/scikit-learn#9061

List of examples to update

Note for maintainers: the content between begin/end_auto_generated is updated automatically by a script. If you edit it by hand your changes may be revert

Easy Documentation good first issue
superset
rumbin
rumbin commented Jan 31, 2022

The Mixed Time-Series chart type allows for configuring the title of the primary and the secondary y-axis.
However, while the title of the primary axis is shown next to the axis, the title of the secondary one is placed at the upper end of the axis, where it is hidden by bar values and zoom controls.

How to reproduce the bug

  1. Create a mixed time-series chart
  2. Configure axi
good first issue #bug validation:validated preset:cares

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated Apr 3, 2022
  • Python
gjoliver
gjoliver commented Apr 13, 2022

Description

There are multiple user requests for using GraphNN data (node and edge lists) as sample batches in a custom RLlib model.

https://discuss.ray.io/t/rllib-variable-length-observation-spaces-without-padding/726
https://discuss.ray.io/t/working-with-graph-neural-networks-varying-state-space/5730/2

The recommended method today is to use Repeated observation space and VariableVal

good first issue enhancement P2 rllib-models
asaini
asaini commented Oct 1, 2021

Problem

See #3856. Developers would like the ability to configure whether the developer menu or the viewer menu is displayed while they are developing on cloud IDEs like Gitpod or GitHub Codespaces.

Solution

Create a config option

showDeveloperMenu: true | false | auto

where

  • true: always shows the developer menu locally and while deployed
  • false: always sho
enhancement good first issue
pytorch-lightning
tsuga
tsuga commented Apr 15, 2022

🐛 Bug

tuner.scale_batch_size finds a suitable batch size and updates the batch size of the model AND the datamodule.
For the model, tuner.scale_batch_size updates the batch size in the model regardless of model.batch_size and model.hparams.batch_size.

However, for the datamodule, tuner.scale_batch_size updates datamodule.batch_size only, and keeps datamodule.hparams.batch_size

bug good first issue trainer: tune lightningdatamodule
dash
baloe
baloe commented Apr 12, 2022

Documentation Link

https://matplotlib.org/stable/api/_as_gen/matplotlib.animation.FFMpegFileWriter.html

Problem

There is no word on what arguments to set via *args, **kwargs.

Also, I am wondering how to control where the temporary frame files are stored. I was unable to find them in my /tmp directory
okay, that is actually rather clear (frame_prefix in setup)

Documentation Good first issue
tirkarthi
tirkarthi commented Jan 12, 2022

Python 3.10 added suggestions for AttributeError and NameError in the error messages. It seems the suggestions are not stored in the exception object but are calculated when the error is displayed. There is a note that this won't work with IPython, but it would be good to see if it's feasible. Opening an issue for discussion.

https://bugs.python.org/issue38530
https://docs.python.org/3/whatsnew/3.
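
As a rough illustration of the idea, the sketch below shows how a front end could compute its own hint at display time from the `name` and `obj` attributes that Python 3.10 attaches to an AttributeError. The function names are hypothetical, and `difflib.get_close_matches` is a stand-in for CPython's own matching logic, so the hints may not match the interpreter's. It is not IPython code.

```python
import difflib


def suggest_name(wrong_name, candidates, cutoff=0.6):
    """Return the closest candidate to wrong_name, or None if nothing is close.

    Uses difflib as a stand-in matcher (an assumption, not CPython's algorithm).
    """
    matches = difflib.get_close_matches(wrong_name, list(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def suggestion_for(exc):
    """Compute a hint for an AttributeError at display time (hypothetical helper)."""
    if not isinstance(exc, AttributeError):
        return None
    # `name` and `obj` are filled in by the interpreter on Python 3.10+;
    # on older versions, or for hand-built exceptions, `name` is missing or None.
    name = getattr(exc, "name", None)
    if name is None:
        return None
    return suggest_name(name, dir(getattr(exc, "obj", None)))


if __name__ == "__main__":
    try:
        [].apend  # deliberate typo for `append`
    except AttributeError as exc:
        print("hint:", suggestion_for(exc))
```

Because the hint is derived from the exception's attributes rather than read from a stored message, this approach only works where the interpreter populates `name` and `obj`.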

gensim
mpenkov
mpenkov commented Jun 22, 2021

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        window=m.ws,
        epochs=m.epoch,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervi
bug difficulty easy good first issue fasttext
AnirudhDagar
AnirudhDagar commented Jan 24, 2022

Although the results look nice and ideal in all TensorFlow plots and are consistent across all frameworks, there is a small difference (more of a consistency issue). The resulting training loss/accuracy plots look like they are sampled at fewer points. The curves look straighter, smoother, and less wiggly compared to PyTorch or MXNet.

It can be clearly seen in chapter 6([CNN Lenet](ht

tensorflow-adapt-track good first issue
nni
pkubik
pkubik commented Mar 14, 2022

Describe the issue:
While computing channel dependencies, reshape_break_channel_dependency runs the following code to ensure that the number of input channels equals the number of output channels:

    in_shape = op_node.auxiliary['in_shape']
    out_shape = op_node.auxiliary['out_shape']
    in_channel = in_shape[1]
    out_channel = out_shape[1]
    return in_channel != out_channel

This is correct
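
To make the check concrete, here is a minimal runnable sketch that wraps the quoted lines in a function. `OpNode` is a hypothetical stand-in for the real graph node type and the shapes are made-up examples. Only the five lines of the function body come from the issue.

```python
from dataclasses import dataclass, field


@dataclass
class OpNode:
    """Hypothetical stand-in for the graph node; only `auxiliary` is modelled."""
    auxiliary: dict = field(default_factory=dict)


def reshape_break_channel_dependency(op_node):
    # Body copied from the issue: compare dimension 1 (the channel axis)
    # of the reshape's input and output shapes.
    in_shape = op_node.auxiliary['in_shape']
    out_shape = op_node.auxiliary['out_shape']
    in_channel = in_shape[1]
    out_channel = out_shape[1]
    return in_channel != out_channel


if __name__ == "__main__":
    # Channel count preserved (8 -> 8): the check reports no break.
    same = OpNode({'in_shape': [1, 8, 4, 4], 'out_shape': [1, 8, 16]})
    # Channels folded into a flat vector (8 -> 128): the check reports a break.
    flat = OpNode({'in_shape': [1, 8, 4, 4], 'out_shape': [1, 128]})
    print(reshape_break_channel_dependency(same))
    print(reshape_break_channel_dependency(flat))
```

Note that the comparison looks only at dimension 1 and ignores every other dimension of the two shapes.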

bug help wanted good first issue model compression
danieldeutsch
danieldeutsch commented Jun 2, 2021

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor. This fails when it tries to load data from my compressed files.
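
One way the requested behaviour could look, as a minimal standard-library sketch: detect gzip by its two-byte magic number and choose the opener accordingly. The helper name `read_lines` is hypothetical and this is not AllenNLP's API. It only illustrates reading lines transparently from plain or gzipped input.

```python
import gzip
from pathlib import Path
from typing import Iterator, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_lines(path: Union[str, Path]) -> Iterator[str]:
    """Yield lines (without the trailing newline) from a plain or gzipped text file."""
    path = Path(path)
    # Sniff the header instead of trusting the file extension.
    with path.open("rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Sniffing the header rather than checking for a `.gz` suffix means misnamed files still load, at the cost of opening each file twice.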

Good First Issue Contributions welcome Feature request