Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plt.hist() fails with TensorFlow Numpy emulation #19574

Open
FlorinAndrei opened this issue Feb 24, 2021 · 24 comments
Open

plt.hist() fails with TensorFlow Numpy emulation #19574

FlorinAndrei opened this issue Feb 24, 2021 · 24 comments

Comments

@FlorinAndrei
Copy link

@FlorinAndrei FlorinAndrei commented Feb 24, 2021

Bug report

Bug summary

Generating np.random.randn(1000) values, visualizing them with plt.hist(). Works fine with Numpy.

When I replace Numpy with tensorflow.experimental.numpy, Matplotlib 3.3.4 fails to display the histogram correctly. Matplotlib 3.2.2 works fine.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# bad image
labels1 = 15 + 2 * tnp.random.randn(1000)
_ = plt.hist(labels1)

# good image
labels2 = 15 + 2 * np.random.randn(1000)
_ = plt.hist(labels2)

Actual outcome

np-bad

Expected outcome

np-good

Matplotlib version

  • Operating system: Windows 10
  • Matplotlib version (import matplotlib; print(matplotlib.__version__)): 3.3.4
  • Matplotlib backend (print(matplotlib.get_backend())): module://ipykernel.pylab.backend_inline
  • Python version: 3.8.7
  • Jupyter version (if applicable): see below
  • Other libraries: see below

TensorFlow 2.4.1

jupyter --version
jupyter core     : 4.7.0
jupyter-notebook : 6.1.6
qtconsole        : 5.0.1
ipython          : 7.20.0
ipykernel        : 5.4.2
jupyter client   : 6.1.7
jupyter lab      : not installed
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.0.8
traitlets        : 5.0.5

Python installed from python.org as an exe installer. Everything else is pip install --user

Bug opened with TensorFlow on this same issue:

tensorflow/tensorflow#46274

@jklymak
Copy link
Member

@jklymak jklymak commented Feb 25, 2021

I think we are happy to help, but we make no guarantee to support all numpy-like objects that exist in the python universe. If you want help with this I suggest figuring out how to reproduce w/o tensorflow so we can advise.

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Feb 25, 2021

This looks like a failure of the up-cast to 2D logic (and instead of 1 histogram of N elements we are seeing N histograms of 1 element).

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Feb 25, 2021

Specifically this code:

def _reshape_2D(X, name):
"""
Use Fortran ordering to convert ndarrays and lists of iterables to lists of
1D arrays.
Lists of iterables are converted by applying `numpy.asanyarray` to each of
their elements. 1D ndarrays are returned in a singleton list containing
them. 2D ndarrays are converted to the list of their *columns*.
*name* is used to generate the error message for invalid inputs.
"""
# unpack if we have a values or to_numpy method.
try:
X = X.to_numpy()
except AttributeError:
try:
if isinstance(X.values, np.ndarray):
X = X.values
except AttributeError:
pass
# Iterate over columns for ndarrays.
if isinstance(X, np.ndarray):
X = X.T
if len(X) == 0:
return [[]]
elif X.ndim == 1 and np.ndim(X[0]) == 0:
# 1D array of scalars: directly return it.
return [X]
elif X.ndim in [1, 2]:
# 2D array, or 1D array of iterables: flatten them first.
return [np.reshape(x, -1) for x in X]
else:
raise ValueError(f'{name} must have 2 or fewer dimensions')
# Iterate over list of iterables.
if len(X) == 0:
return [[]]
result = []
is_1d = True
for xi in X:
# check if this is iterable, except for strings which we
# treat as singletons.
if (isinstance(xi, collections.abc.Iterable) and
not isinstance(xi, str)):
is_1d = False
xi = np.asanyarray(xi)
nd = np.ndim(xi)
if nd > 1:
raise ValueError(f'{name} must have 2 or fewer dimensions')
result.append(xi.reshape(-1))
if is_1d:
# 1D array of scalars: directly return it.
return [np.reshape(result, -1)]
else:
# 2D array, or 1D array of iterables: use flattened version.
return result

which recently got re-worked due to changes in how numpy auto-up-casts object / ragged arrays.

@FlorinAndrei
Copy link
Author

@FlorinAndrei FlorinAndrei commented Feb 25, 2021

I suggest figuring out how to reproduce w/o tensorflow so we can advise

The Tensorflow failure is the only case I know, and it's how I ended up reporting this bug. I was just trying to learn the Numpy emulation in Tensorflow.

@mwaskom
Copy link

@mwaskom mwaskom commented Feb 25, 2021

I think the two problems are:

  1. Matplotlib preprocesses instances of np.ndarray so they are always 2D, with vectors gaining an outer dimension. The tensorflow object doesn't get this preprocessing. (Matplotlib should probably be using __array__ here?)

  2. The introspection of the elements in X to test for iterableness (which lets a normal sequence of numbers work as expected) fails because the elements of the tensorflow object are 0D but an instance of Iterable:

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Feb 25, 2021

It is reasonable to expect that we will ducktype TF arrays correctly here. For now I think there is a user-space work around to cant to numpy before passing to us. That said, none of the core developers are regular TF users so any help with duck-typing them correctly would be greatly appreciated.

Internally we can not directly use np.asarray before handling units because that will break things like pint, but checking for __array__ may work better?

Labeled this as good first issue because I think all of the work that needs to be done is in _reshape_2D and requires little understanding of the rest of Matplotlib internals.

@jklymak
Copy link
Member

@jklymak jklymak commented Feb 25, 2021

I think it is desirable to ducktype what we can, but we need some rules for what that means. Random object x used to work, but now it doesn't is not a good argument that random x should ever have worked. It would be preferable to try and stipulate what properties x needs to have for successful duck typing.

@mwaskom
Copy link

@mwaskom mwaskom commented Feb 25, 2021

Isn't that what obj.__array__ is for?

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Feb 25, 2021

Long term we can rely on the behavior defined by https://data-apis.org/blog/array_api_standard_release/ but for now I suspect we are going to have to keep playing whack-a-mole with these.

@jklymak
Copy link
Member

@jklymak jklymak commented Feb 25, 2021

As discussed on the dev call: We are happy to take a patch that improves the duck typing here, and provides a mock class that behaves like the object so that we can run tests. We cannot take downstream libraries as a test dependency (except for pandas) so someone will have to go through the effort of making a mock.

In user space, please convert to a numpy array or list of lists or list of numpy arrays. Nothing else is documented or guaranteed to work. While we don't try to break things, passing other objects are not guaranteed to be stable.

Downstream libraries should be able to avoid these problems by providing a to_numpy method. Its possible we could check for __array__ as well, but it would be nice to have a test object to work with so we can test that. (In this case its not clear that will work if array elements are all going to be Iterable.)

I'll close because we don't have an immediate action, but again we very much welcome feedback either as a new self-contained issue or PR. Re-opening at @tacaswell request - we will definitely still take a PR for this.

@jklymak jklymak closed this Feb 25, 2021
@jklymak jklymak reopened this Feb 25, 2021
@mwaskom
Copy link

@mwaskom mwaskom commented Feb 25, 2021

(In this case its not clear that will work if array elements are all going to be Iterable.)

If _reshape_2D duck-typed ndarrays, this wouldn't be an issue because the tensorflow object would be 2D when you get to that check. But it's an added wrinkle that explains why it doesn't work as you might expect from the fact that the tensorflow array looks like a vector.

@QuLogic QuLogic removed this from the v3.4.1 milestone Mar 30, 2021
@QuLogic QuLogic added this to the v3.4.2 milestone Mar 30, 2021
@tacaswell tacaswell removed this from the v3.4.2 milestone May 5, 2021
@tacaswell tacaswell added this to the v3.5.0 milestone May 5, 2021
@tacaswell
Copy link
Member

@tacaswell tacaswell commented May 5, 2021

Pushed to 3.5. I think the action items here are

  1. we need to document what we expect to work or not so that in the future we can have a clear line of what we
  2. we would likely take a patch to cbook that handles the reality of what TensorFlow is doing because we do not have 1 yet.

@jklymak
Copy link
Member

@jklymak jklymak commented May 5, 2021

Is this really a good first issue?

@timhoffm
Copy link
Member

@timhoffm timhoffm commented May 5, 2021

Probably 2. can be done as a first issue.

For 1. I don't think we even know what we expect.

@tacaswell
Copy link
Member

@tacaswell tacaswell commented May 5, 2021

I think part 2 is a good first issue to Matplotlib for someone who is fluent in tensorflow and array conventions.

@sriakshata
Copy link

@sriakshata sriakshata commented Aug 19, 2021

Hello, I need clarity on this issue more correctly I couldn't get it . The problem is only with the matplotlib 3.3.4 version, as with the latest version 3.4.3 version it shows correct output on google colab. I am new here and I attempting to first time. Please help me understand.
Screenshot (314)

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Aug 19, 2021

Are you using the same version of tensorflow with both mpl3.4.3 and mpl3.3.4?

If not, it could be that something changed on the TF side to match our expectation of what an "array" is better.

If so, then maybe we accidentally fixed this and it can be closed!

@sriakshata
Copy link

@sriakshata sriakshata commented Aug 20, 2021

I have used mpl 3.4.3 with tensorflow 2. I haven't worked with mpl 3.3.4. It works fine with mpl 3.4.3 with tensorflow 2.

@mwaskom
Copy link

@mwaskom mwaskom commented Aug 20, 2021

Do you get the same results as in this comment?

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

@sriakshata
Copy link

@sriakshata sriakshata commented Aug 20, 2021

@mwaskom I am not sure about the idea mentioned in that comment. I just tried getting the plot with the given code and it worked fine.

@mwaskom
Copy link

@mwaskom mwaskom commented Aug 20, 2021

Can you try running that code in the environment where the plot works?

@sriakshata
Copy link

@sriakshata sriakshata commented Aug 21, 2021

I have tried running it. It works fine with mpl 3.4.3 and tensorflow 2

@mwaskom
Copy link

@mwaskom mwaskom commented Aug 21, 2021

Yes but what does it print?

@sriakshata
Copy link

@sriakshata sriakshata commented Aug 22, 2021

See above comment https://github.com/matplotlib/matplotlib/issues/19574#issuecomment-902075023. I have attached the output I got.

@QuLogic QuLogic removed this from the v3.5.0 milestone Sep 25, 2021
@QuLogic QuLogic added this to the v3.6.0 milestone Sep 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants