Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

Describe the issue linked to the documentation

Many legitimate notebook-style examples have been broken, specifically by the following PR:
scikit-learn/scikit-learn#9061

List of examples to update

Note for maintainers: the content between begin/end_auto_generated is updated automatically by a script. If you edit it by hand, your changes may be reverted

The Mixed Time-Series chart type allows for configuring the title of the primary and the secondary y-axis.
However, only the title of the primary axis is shown next to its axis. The title of the secondary axis is placed at the upper end of the axis, where it is hidden by bar values and zoom controls.

How to reproduce the bug

Create a mixed time-series chart
Configure axi

@simon-mo

Description

Per https://discuss.ray.io/t/how-do-i-sample-from-a-ray-datasets/5308, we should add a random_sample(N) API that returns N records from a Dataset. This can be implemented via a map_batches() followed by a take().

cc @simon-mo @clarkzinzow
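
The "filter each batch, then take N" idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call Ray, and the names sample_batch and random_sample, the fraction parameter, and the list-of-lists batch layout are all hypothetical stand-ins for a map_batches() step followed by a take().

```python
import random
from itertools import islice


def sample_batch(batch, fraction, rng):
    """Keep each row of one batch independently with probability `fraction`."""
    return [row for row in batch if rng.random() < fraction]


def random_sample(batches, n, fraction, seed=None):
    """Return at most `n` rows: filter every batch, then take the first `n`.

    `batches` is any iterable of lists (a hypothetical stand-in for the
    batches a dataset would hand to a per-batch function).
    """
    rng = random.Random(seed)
    # Lazily chain the filtered batches, mirroring a per-batch map step.
    sampled_rows = (
        row for batch in batches for row in sample_batch(batch, fraction, rng)
    )
    # Stop as soon as `n` rows have been collected, mirroring a take step.
    return list(islice(sampled_rows, n))


if __name__ == "__main__":
    data = [list(range(start, start + 10)) for start in range(0, 100, 10)]
    print(random_sample(data, n=5, fraction=0.2, seed=42))
```

One design caveat: because the take step stops at the first N surviving rows, the result leans toward earlier batches unless fraction is chosen so that roughly N rows survive across the whole dataset.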

Use case

Random sampling is useful in a variety of scenarios, including creating training batches and downsampling the dataset for

Problem

See #3856. Developers would like the ability to configure whether the developer menu or the viewer menu is displayed while they are developing on cloud IDEs such as Gitpod or GitHub Codespaces.

Solution

Create a config option

showDeveloperMenu: true | false | auto

where

true: always shows the developer menu locally and while deployed
false: always sho

🐛 Bug

tuner.scale_batch_size finds a suitable batch size and updates the batch size of both the model AND the datamodule.
For the model, tuner.scale_batch_size updates the batch size in the model regardless of model.batch_size and model.hparams.batch_size.

However, for the datamodule, tuner.scale_batch_size updates datamodule.batch_size only, and keeps datamodule.hparams.batch_size

Describe your context
Please provide us your environment, so we can easily reproduce the issue.

replace the result of pip list | grep dash below

dash                      2.0.0
dash-bootstrap-components 1.0.0

if frontend related, tell us your Browser, Version and OS
- OS: Windows
- Browser: Chrome 96.0x, Edge 96.0x, Firefox

Bug summary

When the build gets to https://github.com/matplotlib/matplotlib/blob/main/src/_tkagg.cpp#L262-L273 on Cygwin, it fails with a few "goto crosses initialization" warnings, which are easy to fix, and two "error: ‘PyErr_SetFromWindowsErr’ was not declared in this scope" errors, which are less easy to fix.

Code for reproduction

pip install matplotlib

The warnings at

https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html

do not mention the issues with reloading modules with enums:

Enum and Flag members are compared by identity (is), even when == is used (similarly to None)
reloading a module, or importing the same module under a different name, creates new enums (they look the same, but are not the same)
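
The identity behaviour described above can be reproduced without autoreload. The sketch below is an illustration only: it simulates a reload by executing the same class definition twice through a hypothetical helper, define_color, rather than calling IPython's autoreload machinery.

```python
from enum import Enum


def define_color():
    """Execute the class body again on every call, as a module reload would."""

    class Color(Enum):
        RED = 1
        GREEN = 2

    return Color


# Two class objects built from identical source, like "before" and "after" a reload.
ColorBefore = define_color()
ColorAfter = define_color()

# A member captured before the simulated reload.
old_member = ColorBefore.RED

# Within one class, == behaves as expected.
same_class_equal = old_member == ColorBefore.RED

# Across the simulated reload, == is False even though name and value match.
across_reload_equal = old_member == ColorAfter.RED

# Comparing the underlying values is one way to sidestep the identity check.
values_equal = old_member.value == ColorAfter.RED.value

if __name__ == "__main__":
    print(same_class_equal, across_reload_equal, values_equal)
```

Code that holds on to enum members across a reload (for example in a cache or a default argument) would therefore stop matching freshly imported members, which is the situation the issue asks the documentation to mention.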

Although the results look nice and ideal in all TensorFlow plots and are consistent across all frameworks, there is a small difference (more of a consistency issue). The resulting training loss/accuracy plots look as if they were sampled at fewer points: they are straighter, smoother, and less wiggly than the PyTorch or MXNet plots.

It can be clearly seen in chapter 6([CNN Lenet](ht

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        window=m.ws,
        epochs=m.epoch,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervi

Describe the issue:
While computing channel dependencies, reshape_break_channel_dependency runs the following code to ensure that the number of input channels equals the number of output channels:

in_shape = op_node.auxiliary['in_shape']
out_shape = op_node.auxiliary['out_shape']
in_channel = in_shape[1]
out_channel = out_shape[1]
return in_channel != out_channel
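
To make the check above concrete, here is a minimal standalone sketch. It assumes only what the snippet shows, namely that a node carries an auxiliary dict with 'in_shape' and 'out_shape'. The function name reshape_breaks_channels and the SimpleNamespace stand-in nodes are hypothetical and are not part of NNI.

```python
from types import SimpleNamespace


def reshape_breaks_channels(op_node):
    """Return True when dimension 1 (the channel axis) differs between input and output."""
    in_shape = op_node.auxiliary["in_shape"]
    out_shape = op_node.auxiliary["out_shape"]
    return in_shape[1] != out_shape[1]


# Hypothetical flatten-style reshape: (1, 8, 4, 4) -> (1, 128) changes dimension 1.
flatten_node = SimpleNamespace(
    auxiliary={"in_shape": [1, 8, 4, 4], "out_shape": [1, 128]}
)

# Hypothetical reshape that only merges spatial dims: (1, 8, 4, 4) -> (1, 8, 16).
spatial_node = SimpleNamespace(
    auxiliary={"in_shape": [1, 8, 4, 4], "out_shape": [1, 8, 16]}
)

if __name__ == "__main__":
    print(reshape_breaks_channels(flatten_node))  # dimension 1 changes: 8 vs 128
    print(reshape_breaks_channels(spatial_node))  # dimension 1 unchanged: 8 vs 8
```

The check looks only at index 1 of each shape, so any reshape that keeps that single dimension equal is treated as preserving the channel dependency.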

This is correct

Is your feature request related to a problem? Please describe.
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the predict command opens the file and reads lines for the Predictor. This fails when it tries to load data from my compressed files.
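
One way the requested behaviour could look is a reader that detects gzip input and otherwise falls back to plain text. The sketch below uses only the Python standard library. The helpers open_maybe_gzip and read_lines are hypothetical names for illustration and are not AllenNLP APIs.

```python
import gzip
from pathlib import Path
from typing import Iterator

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open `path` for text reading, decompressing transparently if it is gzipped."""
    path = Path(path)
    # Peek at the first two bytes instead of trusting the file extension.
    with path.open("rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("rt", encoding="utf-8")


def read_lines(path) -> Iterator[str]:
    """Yield non-empty lines from a plain or gzipped text file."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line
```

Checking the magic bytes rather than a ".gz" suffix means files with unusual names are still handled, at the cost of opening each file twice.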

Here are 26,466 public repositories matching this topic...

keras-team / keras

scikit-learn / scikit-learn

apache / superset

microsoft / ML-For-Beginners

GokuMohandas / MadeWithML

CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

explosion / spaCy

donnemartin / data-science-ipython-notebooks

eriklindernoren / ML-From-Scratch

ray-project / ray

eugeneyan / applied-ml

AMAI-GmbH / AI-Expert-Roadmap

streamlit / streamlit

academic / awesome-datascience

PyTorchLightning / pytorch-lightning

plotly / dash

matplotlib / matplotlib

ipython / ipython

fastai / fastbook

afshinea / stanford-cs-229-machine-learning

virgili0 / Virgilio

d2l-ai / d2l-en

RaRe-Technologies / gensim

microsoft / recommenders

bharathgs / Awesome-pytorch-list

qax-os / excelize

rasbt / python-machine-learning-book

microsoft / nni

allenai / allennlp

0xnr / awesome-bigdata
