Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stale information drawn by nbAgg #19116

Open
ianhi opened this issue Dec 15, 2020 · 18 comments
Open

Stale information drawn by nbAgg #19116

ianhi opened this issue Dec 15, 2020 · 18 comments

Comments

@ianhi
Copy link
Contributor

@ianhi ianhi commented Dec 15, 2020

Bug report

This is a follow up to the glitching noted in #19059.

Bug summary
If you are not forcing a full frontend redraw and are sending messages to the frontend fast enough then there can be artifacts left in the plot.

Code for reproduction

import itertools

cnt = itertools.count()
bg = None

def onclick_handle(event):
    """Should draw elevating green line on each mouse click"""
    global bg
    if bg is None:
        bg = ax.figure.canvas.copy_from_bbox(ax.bbox)        
    ax.figure.canvas.restore_region(bg)

    cur_y = (next(cnt) % 10) * 0.1
    ln.set_ydata([cur_y, cur_y])
    ax.draw_artist(ln)
    ax.figure.canvas.blit(ax.bbox)


fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], 'r')
ln, = ax.plot([0, 1], [0, 0], 'g', animated=True)
plt.show()
ax.figure.canvas.draw()

ax.figure.canvas.mpl_connect('button_press_event', onclick_handle)

Actual outcome
smearing

Expected outcome
Only one green line should be ever be present.

Matplotlib version

  • Operating system:
  • Matplotlib version: Master
  • Matplotlib backend (print(matplotlib.get_backend())): nbAgg or ipympl
  • Python version:
  • Jupyter version (if applicable): Jupyterlab 2.2.9

In #19059 (comment) @tacaswell noted the following:

@danielballan has also noticed this form of glitching in ipympl and is interested in fixing it. I agree the issue is that via some mechanism one or more of the diff frames are being lost or processed out of order. We either need to make the front ends smart enough to detect this (by tacking a frame number on to everything and verifying that that they come in order) or on the kernel side force a non-diff every N frame (aka key-frames analogous to the way that mpeg works internally).

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 15, 2020

Wasn't sure on the best title for this issue, open to suggestions

@story645
Copy link
Member

@story645 story645 commented Dec 16, 2020

Does it only happen w/ the jupyter backends or is it a general animation glitch?

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 16, 2020

I can't to replicate it with the tkAgg backend so I think it is specific to the juptyer backends.

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 16, 2020

Actually, it may also occur if you embed webAgg into a generic webpage. But I also may have found a bug with this example https://matplotlib.org/3.3.1/gallery/user_interfaces/embedding_webagg_sgskip.html that prevented me from checking that. It seems I need to refresh in order to have updates the graph actually get updated. See this GIF:
Peek 2020-12-16 00-58

Embed Code
import io
import json
import mimetypes
from pathlib import Path

try:
    import tornado
except ImportError as err:
    raise RuntimeError("This example requires tornado.") from err
import tornado.web
import tornado.httpserver
import tornado.ioloop
import tornado.websocket


import matplotlib as mpl
from matplotlib.backends.backend_webagg_core import (
    FigureManagerWebAgg, new_figure_manager_given_figure)
from matplotlib.figure import Figure

import numpy as np


def create_figure():
    """
    Creates a simple example figure.
    """

    return fig


# The following is the content of the web page.  You would normally
# generate this using some sort of template facility in your web
# framework, but here we just use Python string formatting.
html_content = """
<html>
  <head>
    <!-- TODO: There should be a way to include all of the required javascript
               and CSS so matplotlib can add to the set in the future if it
               needs to. -->
    <link rel="stylesheet" href="_static/css/page.css" type="text/css">
    <link rel="stylesheet" href="_static/css/boilerplate.css"
          type="text/css" />
    <link rel="stylesheet" href="_static/css/fbm.css" type="text/css" />
    <link rel="stylesheet" href="_static/css/mpl.css" type="text/css">
    <script src="mpl.js"></script>

    <script>
      /* This is a callback that is called when the user saves
         (downloads) a file.  Its purpose is really to map from a
         figure and file format to a url in the application. */
      function ondownload(figure, format) {
        window.open('download.' + format, '_blank');
      };

      function ready(fn) {
        if (document.readyState != "loading") {
          fn();
        } else {
          document.addEventListener("DOMContentLoaded", fn);
        }
      }

      ready(
        function() {
          /* It is up to the application to provide a websocket that the figure
             will use to communicate to the server.  This websocket object can
             also be a "fake" websocket that underneath multiplexes messages
             from multiple figures, if necessary. */
          var websocket_type = mpl.get_websocket_type();
          var websocket = new websocket_type("%(ws_uri)sws");

          // mpl.figure creates a new figure on the webpage.
          var fig = new mpl.figure(
              // A unique numeric identifier for the figure
              %(fig_id)s,
              // A websocket object (or something that behaves like one)
              websocket,
              // A function called when a file type is selected for download
              ondownload,
              // The HTML element in which to place the figure
              document.getElementById("figure"));
        }
      );
    </script>

    <title>matplotlib</title>
  </head>

  <body>
    <div id="figure">
    </div>
  </body>
</html>
"""


class MyApplication(tornado.web.Application):
    class MainPage(tornado.web.RequestHandler):
        """
        Serves the main HTML page.
        """

        def get(self):
            manager = self.application.manager
            ws_uri = "ws://{req.host}/".format(req=self.request)
            content = html_content % {
                "ws_uri": ws_uri, "fig_id": manager.num}
            self.write(content)

    class MplJs(tornado.web.RequestHandler):
        """
        Serves the generated matplotlib javascript file.  The content
        is dynamically generated based on which toolbar functions the
        user has defined.  Call `FigureManagerWebAgg` to get its
        content.
        """

        def get(self):
            self.set_header('Content-Type', 'application/javascript')
            js_content = FigureManagerWebAgg.get_javascript()

            self.write(js_content)

    class Download(tornado.web.RequestHandler):
        """
        Handles downloading of the figure in various file formats.
        """

        def get(self, fmt):
            manager = self.application.manager
            self.set_header(
                'Content-Type', mimetypes.types_map.get(fmt, 'binary'))
            buff = io.BytesIO()
            manager.canvas.figure.savefig(buff, format=fmt)
            self.write(buff.getvalue())

    class WebSocket(tornado.websocket.WebSocketHandler):
        """
        A websocket for interactive communication between the plot in
        the browser and the server.

        In addition to the methods required by tornado, it is required to
        have two callback methods:

            - ``send_json(json_content)`` is called by matplotlib when
              it needs to send json to the browser.  `json_content` is
              a JSON tree (Python dictionary), and it is the responsibility
              of this implementation to encode it as a string to send over
              the socket.

            - ``send_binary(blob)`` is called to send binary image data
              to the browser.
        """
        supports_binary = True

        def open(self):
            # Register the websocket with the FigureManager.
            manager = self.application.manager
            manager.add_web_socket(self)
            if hasattr(self, 'set_nodelay'):
                self.set_nodelay(True)

        def on_close(self):
            # When the socket is closed, deregister the websocket with
            # the FigureManager.
            manager = self.application.manager
            manager.remove_web_socket(self)

        def on_message(self, message):
            # The 'supports_binary' message is relevant to the
            # websocket itself.  The other messages get passed along
            # to matplotlib as-is.

            # Every message has a "type" and a "figure_id".
            message = json.loads(message)
            if message['type'] == 'supports_binary':
                self.supports_binary = message['value']
            else:
                manager = self.application.manager
                manager.handle_json(message)

        def send_json(self, content):
            self.write_message(json.dumps(content))

        def send_binary(self, blob):
            if self.supports_binary:
                self.write_message(blob, binary=True)
            else:
                data_uri = "data:image/png;base64,{0}".format(
                    blob.encode('base64').replace('\n', ''))
                self.write_message(data_uri)

    def __init__(self, figure):
        self.figure = figure
        self.manager = new_figure_manager_given_figure(id(figure), figure)

        super().__init__([
            # Static files for the CSS and JS
            (r'/_static/(.*)',
             tornado.web.StaticFileHandler,
             {'path': FigureManagerWebAgg.get_static_file_path()}),

            # Static images for the toolbar
            (r'/_images/(.*)',
             tornado.web.StaticFileHandler,
             {'path': Path(mpl.get_data_path(), 'images')}),

            # The page that contains all of the pieces
            ('/', self.MainPage),

            ('/mpl.js', self.MplJs),

            # Sends images and events to the browser, and receives
            # events from the browser
            ('/ws', self.WebSocket),

            # Handles the downloading (i.e., saving) of static images
            (r'/download.([a-z0-9.]+)', self.Download),
        ])


import itertools

if __name__ == "__main__":
    cnt = itertools.count()
    fig = Figure()
    application = MyApplication(fig)
    ax = fig.add_subplot(111)
    ax.plot([0, 1], [0, 1], 'r')
    ln, = ax.plot([0, 1], [0, 0], 'g', animated=False)
    ax.figure.canvas.draw()
    fig.canvas.draw()

    bg = ax.figure.canvas.copy_from_bbox(ax.bbox)
    times_clicked = 0

    def onclick_handle(event):
        """Should draw elevating green line on each mouse click"""
        global times_clicked
        times_clicked += 1
        ax.figure.canvas.restore_region(bg)
        print('weeeeeeeeeee')

        cur_y = (next(cnt) % 10) * 0.1
        ln.set_ydata([cur_y, cur_y])
        ax.draw_artist(ln)
        ax.figure.canvas.blit(ax.bbox)
        ax.set_title(f"clicked: {times_clicked} times")

    ax.figure.canvas.mpl_connect('button_press_event', onclick_handle)

    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8080)

    print("http://127.0.0.1:8080/")
    print("Press Ctrl+C to quit")

    tornado.ioloop.IOLoop.instance().start()
but I think that's a separate issue that deserves to be filed in own right

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Dec 16, 2020

Can you try wrapping the callback in a threading lock? I am worried that somehow threads are getting involved (I recently skimmed the ipykernel code and I think it is single threaded, but given that zmq and zmq calbacks are involved I am not 100% sure of that). A way to get this effect without errors in the sending is

  1. thread A: restore
  2. thread B: restore
  3. thread A: draw
  4. thread B: draw

However it is not clear to me how that survives in the double buffer on the Python side I am not clear (as even if you ended up with the same wrong version in both buffers, on the next slow click through I think it should be cleaned up.

It would also be interesting to see if you make the axes background 50% green if you can see a "negative line" anyplace. Maybe make the axes background (128, 128, 128) and then randomize the line color?

I'm still mostly sure that this is a dropped diff issue, but it might be worth being very sure of that.


What version of mpl and browser? @QuLogic has been making progress on cleaning up the js and either you may not have a fix or we put a regression in.

What are those dom exceptions referring to?

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 16, 2020

Can you try wrapping the callback in a threading lock?

I will poke around with this sometime in the next few days.

It would also be interesting to see if you make the axes background 50% green if you can see a "negative line" anyplace. Maybe make the axes background (128, 128, 128) and then randomize the line color?

This is a nice idea, I will try this out.

What version of mpl and browser?

Firefox 83.0 on Ubuntu, matplotlib: 3.3.1

DOM exceptions (not even a screenshot! I guess I'm learning)

Uncaught DOMException: An attempt was made to use an object that is not, or is no longer, usable mpl.js:377
    send_message http://127.0.0.1:8080/mpl.js:377
    request_resize http://127.0.0.1:8080/mpl.js:371
    resizeObserver http://127.0.0.1:8080/mpl.js:218
    (Async: ResizeObserverCallback)
    _init_canvas http://127.0.0.1:8080/mpl.js:169
    figure http://127.0.0.1:8080/mpl.js:55
    <anonymous> http://127.0.0.1:8080/:40
    (Async: EventListener.handleEvent)
    ready http://127.0.0.1:8080/:26
    <anonymous> http://127.0.0.1:8080/:30

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 16, 2020

If I upgrade to matplotlib 3.3.3 it's actually even worse, then the graph totally fails to render (blank page only) with this error in the console:

Uncaught DOMException: An attempt was made to use an object that is not, or is no longer, usable mpl.js:390
    send_message http://127.0.0.1:8080/mpl.js:390
    _init_canvas http://127.0.0.1:8080/mpl.js:161
    figure http://127.0.0.1:8080/mpl.js:55
    <anonymous> http://127.0.0.1:8080/:40

and that's with no modifications to the example here https://matplotlib.org/3.3.3/gallery/user_interfaces/embedding_webagg_sgskip.html

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 16, 2020

I tried the webAgg example on master matplotlib and found it's broken there as well. So I opened #19129 to keep it separate from this issue

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Dec 17, 2020

This same glitching does not occur when using the webAgg backend (tested when running the branch for #19131) so I'd guess it's an issue either with threads in ipython, or with the lagginess of the jupyter comms.

TBD on testing with a threading lock, but I have some thoughts on the latter.

Pure websockets are significantly faster than the jupyter comms - see my investigation here: https://github.com/ianhi/widget_message_speed#widget_message_speed (there are matplotlib plots in the readme!). I don't really understand how those comms work, but maybe they're not actually a setting up a proper queue, and so it can end up out of order especially given the times that are in play?

Is it possible for maptlotlib to hijack the tornado event loop that underlies jupyter, add a new websocket handler, and then open a direct websocket to the frontend? This would bypass a bit how I think widgets are meant to work, but I think would also come with significant performance improvements for interactivity.

@QuLogic
Copy link
Member

@QuLogic QuLogic commented Mar 19, 2021

Initially, I thought the problem might have been due to #18583, and that the binary comm might be a separate channel, with things arriving in parallel, but not necessarily in order. However, reverting that change does not fix the artifacting, though it maybe happens a bit less often (I'm not sure exactly how you tested other than clicking a lot.)

@QuLogic
Copy link
Member

@QuLogic QuLogic commented Mar 19, 2021

So the way display works is a bit funny; I think there may be a few extra buffers and async than necessary. We draw on a Canvas, which mostly makes sense. There is an 'image mode', which is either 'diff' or 'full', which determines whether the Canvas is cleared first. This mode is sent before the image data in a separate message. While this could be causing a mismatch, I think it might not be the problem.

When the JS side receives image data, it doesn't draw that directly on the Canvas. Instead, it creates a new object URL, and sets the source on an Image. Then in the Image's onload, it does the actual draw on the Canvas. Now I believe this may be the correct way to do things due to what sources Canvas supports, but since we re-use the same Image object, I think there may be some room for the Image to trigger onload and start drawing while the websocket has received a new diff and started modifying the Image again.

I've been thinking maybe we should use Canvas.putImageData, but I think that would be raw bytes, and mean dropping the whole 'diff' image mode. Not encoding as PNG will save CPU, but increase transfer size. I haven't had a chance to try this out, and it may end up being slower because of that.

@QuLogic
Copy link
Member

@QuLogic QuLogic commented Mar 19, 2021

Some quick print debug shows that when I get an artifact, I see that 2-3 image data loads have happened, but only one draw in the onload function. Though I'm not sure how that created a double-draw of things.

Switching from a single Image to just making a new one every time seems like it might have fixed it, but I don't know if I've clicked fast enough. It also might be leaking the old objects right now.

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Mar 24, 2021

Not encoding as PNG will save CPU, but increase transfer size.

I think Mike did a systematic study of this and the png compression time was worth it for reduced data (and CPUs have only gotten comparatively faster).

@tacaswell tacaswell added this to the v3.4.1 milestone Mar 24, 2021
@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Mar 24, 2021

I've been thinking maybe we should use Canvas.putImageData, but I think that would be raw bytes, and mean dropping the whole 'diff' image mode. Not encoding as PNG will save CPU, but increase transfer size. I haven't had a chance to try this out, and it may end up being slower because of that.

What about generic compression of the bytes/array? For example ipydatawidgets provides serializers for numpy arrays to scijs arrays in the frontend (https://github.com/vidartf/ipydatawidgets#arrays) I discussed this some w.r.t ipycanvas martinRenou/ipycanvas#104. The actual compressed serialization happens here: https://github.com/vidartf/ipydatawidgets/blob/da6db7a9f9ace74c87583e633b22c3634526520e/ipydatawidgets/ndarray/serializers.py#L94 and on the js side you can use pako to decompress: https://github.com/vidartf/ipydatawidgets/blob/da6db7a9f9ace74c87583e633b22c3634526520e/packages/serializers/src/compression.ts#L16

but since we re-use the same Image object, I think there may be some room for the Image to trigger onload and start drawing while the websocket has received a new diff and started modifying the Image again.

I agree that this seems the most likely culprit. Maybe some sort of lock or usage of await somewhere could help with this?

@QuLogic
Copy link
Member

@QuLogic QuLogic commented Mar 24, 2021

but since we re-use the same Image object, I think there may be some room for the Image to trigger onload and start drawing while the websocket has received a new diff and started modifying the Image again.

I agree that this seems the most likely culprit. Maybe some sort of lock or usage of await somewhere could help with this?

This seemed possibly okay, and didn't leak (kind of depends on JS garbage collection, I think), but it is difficult to test. I'm trying to put together something in selenium, but only started with WebAgg for now.

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Mar 24, 2021

I think the thing may be to do:

await image.decode()

immediately after setting the src.

@tacaswell
Copy link
Member

@tacaswell tacaswell commented Jun 29, 2021

@ianhi To verify, blitting is still broken with webagg/nbagg/ipympl?

@ianhi
Copy link
Contributor Author

@ianhi ianhi commented Jun 30, 2021

On the current matplotlib master it was not happening in webagg but was for nbagg. I also tried with ipympl and was not able to replicate it, but in the past I trouble doing it consistently so I don't expect that that is fixed.

@QuLogic QuLogic removed this from the v3.4.3 milestone Aug 12, 2021
@QuLogic QuLogic added this to the v3.4.4 milestone Aug 12, 2021
@QuLogic QuLogic removed this from the v3.4.4 milestone Nov 16, 2021
@QuLogic QuLogic added this to the v3.5.1 milestone Nov 16, 2021
@QuLogic QuLogic removed this from the v3.5.1 milestone Dec 10, 2021
@QuLogic QuLogic added this to the v3.5.2 milestone Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants