plt.hist() fails with TensorFlow Numpy emulation #19574

FlorinAndrei · 2021-02-24T21:52:19Z

Bug report

Bug summary

Generating np.random.randn(1000) values, visualizing them with plt.hist(). Works fine with Numpy.

When I replace Numpy with tensorflow.experimental.numpy, Matplotlib 3.3.4 fails to display the histogram correctly. Matplotlib 3.2.2 works fine.

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# bad image
labels1 = 15 + 2 * tnp.random.randn(1000)
_ = plt.hist(labels1)

# good image
labels2 = 15 + 2 * np.random.randn(1000)
_ = plt.hist(labels2)

Actual outcome

Expected outcome

Matplotlib version

Operating system: Windows 10
Matplotlib version (import matplotlib; print(matplotlib.__version__)): 3.3.4
Matplotlib backend (print(matplotlib.get_backend())): module://ipykernel.pylab.backend_inline
Python version: 3.8.7
Jupyter version (if applicable): see below
Other libraries: see below

TensorFlow 2.4.1

jupyter --version
jupyter core     : 4.7.0
jupyter-notebook : 6.1.6
qtconsole        : 5.0.1
ipython          : 7.20.0
ipykernel        : 5.4.2
jupyter client   : 6.1.7
jupyter lab      : not installed
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.0.8
traitlets        : 5.0.5

Python installed from python.org as an exe installer. Everything else is pip install --user

Bug opened with TensorFlow on this same issue:

tensorflow/tensorflow#46274

The text was updated successfully, but these errors were encountered:

jklymak · 2021-02-25T00:36:38Z

I think we are happy to help, but we make no guarantee to support all numpy-like objects that exist in the python universe. If you want help with this I suggest figuring out how to reproduce w/o tensorflow so we can advise.

tacaswell · 2021-02-25T00:59:19Z

This looks like a failure of the up-cast to 2D logic (and instead of 1 histogram of N elements we are seeing N histograms of 1 element).

tacaswell · 2021-02-25T01:01:24Z

Specifically this code:

matplotlib/lib/matplotlib/cbook/__init__.py

Lines 1384 to 1444 in a510e32

    
           def _reshape_2D(X, name): 
        
               """ 
        
               Use Fortran ordering to convert ndarrays and lists of iterables to lists of 
        
               1D arrays. 
        
               Lists of iterables are converted by applying `numpy.asanyarray` to each of 
        
               their elements.  1D ndarrays are returned in a singleton list containing 
        
               them.  2D ndarrays are converted to the list of their *columns*. 
        
               *name* is used to generate the error message for invalid inputs. 
        
               """ 
        
               # unpack if we have a values or to_numpy method. 
        
               try: 
        
                   X = X.to_numpy() 
        
               except AttributeError: 
        
                   try: 
        
                       if isinstance(X.values, np.ndarray): 
        
                           X = X.values 
        
                   except AttributeError: 
        
                       pass 
        
               # Iterate over columns for ndarrays. 
        
               if isinstance(X, np.ndarray): 
        
                   X = X.T 
        
                   if len(X) == 0: 
        
                       return [[]] 
        
                   elif X.ndim == 1 and np.ndim(X[0]) == 0: 
        
                       # 1D array of scalars: directly return it. 
        
                       return [X] 
        
                   elif X.ndim in [1, 2]: 
        
                       # 2D array, or 1D array of iterables: flatten them first. 
        
                       return [np.reshape(x, -1) for x in X] 
        
                   else: 
        
                       raise ValueError(f'{name} must have 2 or fewer dimensions') 
        
               # Iterate over list of iterables. 
        
               if len(X) == 0: 
        
                   return [[]] 
        
               result = [] 
        
               is_1d = True 
        
               for xi in X: 
        
                   # check if this is iterable, except for strings which we 
        
                   # treat as singletons. 
        
                   if (isinstance(xi, collections.abc.Iterable) and 
        
                           not isinstance(xi, str)): 
        
                       is_1d = False 
        
                   xi = np.asanyarray(xi) 
        
                   nd = np.ndim(xi) 
        
                   if nd > 1: 
        
                       raise ValueError(f'{name} must have 2 or fewer dimensions') 
        
                   result.append(xi.reshape(-1)) 
        
               if is_1d: 
        
                   # 1D array of scalars: directly return it. 
        
                   return [np.reshape(result, -1)] 
        
               else: 
        
                   # 2D array, or 1D array of iterables: use flattened version. 
        
                   return result

which recently got re-worked due to changes in how numpy auto-up-casts object / ragged arrays.

FlorinAndrei · 2021-02-25T08:05:53Z

I suggest figuring out how to reproduce w/o tensorflow so we can advise

The Tensorflow failure is the only case I know, and it's how I ended up reporting this bug. I was just trying to learn the Numpy emulation in Tensorflow.

mwaskom · 2021-02-25T16:28:55Z

I think the two problems are:

Matplotlib preprocesses instances of np.ndarray so they are always 2D, with vectors gaining an outer dimension. The tensorflow object doesn't get this preprocessing. (Matplotlib should probably be using __array__ here?)
The introspection of the elements in X to test for iterableness (which lets a normal sequence of numbers work as expected) fails because the elements of the tensorflow object are 0D but an instance of Iterable:

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

tacaswell · 2021-02-25T18:47:39Z

It is reasonable to expect that we will ducktype TF arrays correctly here. For now I think there is a user-space work around to cant to numpy before passing to us. That said, none of the core developers are regular TF users so any help with duck-typing them correctly would be greatly appreciated.

Internally we can not directly use np.asarray before handling units because that will break things like pint, but checking for __array__ may work better?

Labeled this as good first issue because I think all of the work that needs to be done is in _reshape_2D and requires little understanding of the rest of Matplotlib internals.

jklymak · 2021-02-25T19:30:10Z

I think it is desirable to ducktype what we can, but we need some rules for what that means. Random object x used to work, but now it doesn't is not a good argument that random x should ever have worked. It would be preferable to try and stipulate what properties x needs to have for successful duck typing.

mwaskom · 2021-02-25T19:55:55Z

Isn't that what obj.__array__ is for?

tacaswell · 2021-02-25T20:04:43Z

Long term we can rely on the behavior defined by https://data-apis.org/blog/array_api_standard_release/ but for now I suspect we are going to have to keep playing whack-a-mole with these.

jklymak · 2021-02-25T21:38:39Z

As discussed on the dev call: We are happy to take a patch that improves the duck typing here, and provides a mock class that behaves like the object so that we can run tests. We cannot take downstream libraries as a test dependency (except for pandas) so someone will have to go through the effort of making a mock.

In user space, please convert to a numpy array or list of lists or list of numpy arrays. Nothing else is documented or guaranteed to work. While we don't try to break things, passing other objects are not guaranteed to be stable.

Downstream libraries should be able to avoid these problems by providing a to_numpy method. Its possible we could check for __array__ as well, but it would be nice to have a test object to work with so we can test that. (In this case its not clear that will work if array elements are all going to be Iterable.)

~~I'll close because we don't have an immediate action, but again we very much welcome feedback either as a new self-contained issue or PR.~~ Re-opening at @tacaswell request - we will definitely still take a PR for this.

mwaskom · 2021-02-25T22:33:30Z

(In this case its not clear that will work if array elements are all going to be Iterable.)

If _reshape_2D duck-typed ndarrays, this wouldn't be an issue because the tensorflow object would be 2D when you get to that check. But it's an added wrinkle that explains why it doesn't work as you might expect from the fact that the tensorflow array looks like a vector.

tacaswell · 2021-05-05T19:09:22Z

Pushed to 3.5. I think the action items here are

we need to document what we expect to work or not so that in the future we can have a clear line of what we
we would likely take a patch to cbook that handles the reality of what TensorFlow is doing because we do not have 1 yet.

jklymak · 2021-05-05T19:54:22Z

Is this really a good first issue?

timhoffm · 2021-05-05T20:15:00Z

Probably 2. can be done as a first issue.

For 1. I don't think we even know what we expect.

tacaswell · 2021-05-05T20:15:17Z

I think part 2 is a good first issue to Matplotlib for someone who is fluent in tensorflow and array conventions.

sriakshata · 2021-08-19T16:47:02Z

Hello, I need clarity on this issue more correctly I couldn't get it . The problem is only with the matplotlib 3.3.4 version, as with the latest version 3.4.3 version it shows correct output on google colab. I am new here and I attempting to first time. Please help me understand.

tacaswell · 2021-08-19T22:58:12Z

Are you using the same version of tensorflow with both mpl3.4.3 and mpl3.3.4?

If not, it could be that something changed on the TF side to match our expectation of what an "array" is better.

If so, then maybe we accidentally fixed this and it can be closed!

sriakshata · 2021-08-20T06:09:41Z

I have used mpl 3.4.3 with tensorflow 2. I haven't worked with mpl 3.3.4. It works fine with mpl 3.4.3 with tensorflow 2.

mwaskom · 2021-08-20T10:52:47Z

Do you get the same results as in this comment?

from collections.abc import Iterable
print(isinstance(labels1[0], Iterable))
print(isinstance(labels2[0], Iterable))

sriakshata · 2021-08-20T13:34:23Z

@mwaskom I am not sure about the idea mentioned in that comment. I just tried getting the plot with the given code and it worked fine.

mwaskom · 2021-08-20T21:18:40Z

Can you try running that code in the environment where the plot works?

sriakshata · 2021-08-21T06:56:35Z

I have tried running it. It works fine with mpl 3.4.3 and tensorflow 2

mwaskom · 2021-08-21T11:18:09Z

Yes but what does it print?

sriakshata · 2021-08-22T10:41:02Z

See above comment https://github.com/matplotlib/matplotlib/issues/19574#issuecomment-902075023. I have attached the output I got.

FlorinAndrei mentioned this issue Feb 24, 2021

tf.experimental.numpy fails with plt.hist tensorflow/tensorflow#46274

Closed

jklymak added status: downstream fix required status: needs clarification labels Feb 25, 2021

tacaswell added this to the v3.4.1 milestone Feb 25, 2021

tacaswell added Good first issue and removed status: downstream fix required status: needs clarification labels Feb 25, 2021

tacaswell added the status: downstream fix required label Feb 25, 2021

jklymak closed this Feb 25, 2021

jklymak reopened this Feb 25, 2021

QuLogic removed this from the v3.4.1 milestone Mar 30, 2021

QuLogic added this to the v3.4.2 milestone Mar 30, 2021

tacaswell removed this from the v3.4.2 milestone May 5, 2021

tacaswell added this to the v3.5.0 milestone May 5, 2021

jklymak added the topic: units label May 5, 2021

QuLogic removed this from the v3.5.0 milestone Sep 25, 2021

QuLogic added this to the v3.6.0 milestone Sep 25, 2021

matplotlib / matplotlib Public

plt.hist() fails with TensorFlow Numpy emulation #19574

plt.hist() fails with TensorFlow Numpy emulation #19574

FlorinAndrei commented Feb 24, 2021

jklymak commented Feb 25, 2021

tacaswell commented Feb 25, 2021

tacaswell commented Feb 25, 2021

FlorinAndrei commented Feb 25, 2021

mwaskom commented Feb 25, 2021 •

edited

tacaswell commented Feb 25, 2021

jklymak commented Feb 25, 2021

mwaskom commented Feb 25, 2021

tacaswell commented Feb 25, 2021

jklymak commented Feb 25, 2021 •

edited

mwaskom commented Feb 25, 2021

tacaswell commented May 5, 2021

jklymak commented May 5, 2021

timhoffm commented May 5, 2021

tacaswell commented May 5, 2021

sriakshata commented Aug 19, 2021

tacaswell commented Aug 19, 2021

sriakshata commented Aug 20, 2021

mwaskom commented Aug 20, 2021 •

edited

sriakshata commented Aug 20, 2021

mwaskom commented Aug 20, 2021

sriakshata commented Aug 21, 2021

mwaskom commented Aug 21, 2021

sriakshata commented Aug 22, 2021

matplotlib / matplotlib Public

plt.hist() fails with TensorFlow Numpy emulation #19574

plt.hist() fails with TensorFlow Numpy emulation #19574

Comments

FlorinAndrei commented Feb 24, 2021

Bug report

jklymak commented Feb 25, 2021

tacaswell commented Feb 25, 2021

tacaswell commented Feb 25, 2021

FlorinAndrei commented Feb 25, 2021

mwaskom commented Feb 25, 2021 • edited

tacaswell commented Feb 25, 2021

jklymak commented Feb 25, 2021

mwaskom commented Feb 25, 2021

tacaswell commented Feb 25, 2021

jklymak commented Feb 25, 2021 • edited

mwaskom commented Feb 25, 2021

tacaswell commented May 5, 2021

jklymak commented May 5, 2021

timhoffm commented May 5, 2021

tacaswell commented May 5, 2021

sriakshata commented Aug 19, 2021

tacaswell commented Aug 19, 2021

sriakshata commented Aug 20, 2021

mwaskom commented Aug 20, 2021 • edited

sriakshata commented Aug 20, 2021

mwaskom commented Aug 20, 2021

sriakshata commented Aug 21, 2021

mwaskom commented Aug 21, 2021

sriakshata commented Aug 22, 2021

mwaskom commented Feb 25, 2021 •

edited

jklymak commented Feb 25, 2021 •

edited

mwaskom commented Aug 20, 2021 •

edited