[Bug]: UnicodeDecodeError when using some special and accented characters in TeX #23019

Closed
23ccozad opened this issue May 9, 2022 · 5 comments · Fixed by #23033
Labels
status: has patch patch suggested, PR still needed topic: text/usetex
Comments

@23ccozad
Contributor

23ccozad commented May 9, 2022

Bug summary

I'm just getting started with the development side of matplotlib, and I'm getting a UnicodeDecodeError in some cases (not all cases) when TeX is being used in unit tests and building docs. The cause of the issue may be specific to my installation and seems to be isolated to the use of certain special and accented characters, but I'm not sure how to resolve it.

The code below (taken from tex_demo.py in the docs) is one instance where the error occurs. \N{DEGREE SIGN} is causing the issue: removing it lets the code run without error.

Code for reproduction

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams['text.usetex'] = True

t = np.linspace(0.0, 1.0, 100)
s = np.cos(4 * np.pi * t) + 2

fig, ax = plt.subplots(figsize=(6, 4), tight_layout=True)
ax.plot(t, s)

ax.set_xlabel(r'\textbf{time (s)}')
ax.set_ylabel('\\textit{Velocity (\N{DEGREE SIGN}/sec)}', fontsize=16)  # \N{DEGREE SIGN} appears to be the culprit
ax.set_title(r'\TeX\ is Number $\displaystyle\sum_{n=1}^\infty'
             r'\frac{-e^{i\pi}}{2^n}$!', fontsize=16, color='r')

plt.show()  # drawing the figure runs latex, which is where the error surfaces

Actual outcome

Traceback from warning/error in building docs:

generating gallery for gallery\text_labels_and_annotations... [ 78%] tex_demo.py
Warning, treated as error:
C:\Users\username\Documents\GitHub\matplotlib\examples\text_labels_and_annotations\tex_demo.py failed to execute correctly: Traceback (most recent call last):

  File "C:\Users\username\miniconda3\envs\matplotlib-dev\lib\site-packages\sphinx_gallery\scrapers.py", line 378, in save_figures
    rst = scraper(block, block_vars, gallery_conf)
  File "C:\Users\username\Documents\GitHub\matplotlib\doc\conf.py", line 173, in matplotlib_reduced_latex_scraper
    return matplotlib_scraper(block, block_vars, gallery_conf, **kwargs)
  File "C:\Users\username\miniconda3\envs\matplotlib-dev\lib\site-packages\sphinx_gallery\scrapers.py", line 171, in matplotlib_scraper
    fig.savefig(image_path, **these_kwargs)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\figure.py", line 3099, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\backend_bases.py", line 2262, in print_figure
    self.figure.draw(renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\artist.py", line 73, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\figure.py", line 2891, in draw
    mimage._draw_list_compositing_images(
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\axes\_base.py", line 3042, in draw
    self._update_title_position(renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\axes\_base.py", line 2982, in _update_title_position
    ax.yaxis.get_tightbbox(renderer)  # update offsetText
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\axis.py", line 1244, in get_tightbbox
    bb = self.label.get_window_extent(renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\text.py", line 915, in get_window_extent
    bbox, info, descent = self._get_layout(self._renderer)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\text.py", line 321, in _get_layout
    w, h, d = _get_text_metrics_with_cache(
    w, h, d = _get_text_metrics_with_cache(
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\text.py", line 97, in _get_text_metrics_with_cache
    return _get_text_metrics_with_cache_impl(
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\text.py", line 105, in _get_text_metrics_with_cache_impl
    return renderer_ref().get_text_width_height_descent(text, fontprop, ismath)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\backends\backend_agg.py", line 235, in get_text_width_height_descent
    w, h, d = texmanager.get_text_width_height_descent(
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\texmanager.py", line 359, in get_text_width_height_descent
    dvifile = cls.make_dvi(tex, fontsize)
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\texmanager.py", line 292, in make_dvi
    cls._run_checked_subprocess(
  File "c:\users\username\documents\github\matplotlib\lib\matplotlib\texmanager.py", line 268, in _run_checked_subprocess
    exc=exc.output.decode('utf-8'))) from exc
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 1673: invalid start byte

The test_savefig_to_stringio() test also fails four times, once for each of its parametrized argument sets. This is also a UnicodeDecodeError, but it originates from the accented characters in the plot title ("Déjà vu"). Removing those characters resolves the UnicodeDecodeError but leaves me with an AssertionError at the end of the test. The traceback is omitted here because it is over 200 lines long; it ends in an error similar to the one above.

Expected outcome

The expected outcome is that the docs build and all tests pass without failure. For the failing tex_demo.py example in the docs, that code should produce the rendered figure (image not included).

Additional information

Again, this could be isolated to my installation of matplotlib, but if anyone can reproduce this error or knows of a possible fix, that would be great.

Operating system

Windows 11

Matplotlib Version

3.6.0.dev2170+g40676bb62f

Matplotlib Backend

module://matplotlib_inline.backend_inline

Python version

3.8.13

Jupyter version

3.4.0

Installation

git checkout

@oscargus
Contributor

oscargus commented May 9, 2022

Fwiw, I get the same error on my Windows 10 machine.

Unfortunately, the error actually occurs while formatting the error message. Removing the .decode() call gives:

RuntimeError: latex was not able to process the following string:
b'\\\\textit{Velocity (\\xb0/sec)}'

Here is the full report generated by latex:
RuntimeError: latex was not able to process the following string:
b'This is pdfTeX, Version 3.141592653-2.6-1.40.24 (MiKTeX 22.3) (preloaded format=latex.fmt)\r\n restricted \\write18 enabled.\r\nentering extended mode\r\n(../4045204a3cad939c86f044077310bd0a.tex\r\nLaTeX2e <2021-11-15> patch level 1\r\nL3 programming layer <2022-02-24>\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/base\\article.cls\r\nDocument Class: article 2021/10/04 v1.4n Standard LaTeX document class\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/base\\size10.clo))\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/type1cm\\type1cm.sty)\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/cm-super\\type1ec.sty\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/base\\t1cmr.fd))\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/base\\inputenc.sty)\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/geometry\\geometry.sty\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/graphics\\keyval.sty)\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/generic/iftex\\ifvtex.sty\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/generic/iftex\\iftex.sty))\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/geometry\\geometry.cfg))\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/underscore\\underscore.sty)\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/base\\textcomp.sty)\r\n(C:\\Program Files\\MiKTeX 2.9\\tex/latex/l3backend\\l3backend-dvips.def)\r\nNo file 4045204a3cad939c86f044077310bd0a.aux.\r\n*geometry* driver: auto-detecting\r\n*geometry* detected driver: dvips\r\n\r\nLaTeX Font Warning: Font shape `OT1/cmss/m/it\' in size <16> not available\r\n(Font)              Font shape `OT1/cmss/m/sl\' tried instead on input line 29.\r\n\r\n\r\n! LaTeX Error: Invalid UTF-8 byte "B0.\r\n\r\nSee the LaTeX manual or LaTeX Companion for explanation.\r\nType  H <return>  for immediate help.\r\n ...                                              
\r\n                                                  \r\nl.29 ...eylines\\sffamily \\textit{Velocity (\xb0/sec)}\r\n                                                  }%\r\nNo pages of output.\r\nTranscript written on 4045204a3cad939c86f044077310bd0a.log.\r\nlatex: major issue: So far, you have not checked for updates as a MiKTeX user.\r\n'
...

0xb0 (\xb0) is the byte for ° in Latin-1/cp1252; ° itself is U+00B0 and is not an ASCII character. As far as I know, plain latex doesn't handle Unicode characters that well, so it is a bit surprising that the example renders correctly. Adding plt.rcParams['text.latex.preamble'] = [r'\usepackage[utf8]{inputenc}'] leads to a different error on my machine.
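To make the byte-level mismatch concrete, here is a small standard-library sketch (not matplotlib code). It assumes the .tex file ended up encoded with cp1252, a common Windows locale default: in that encoding ° is the single byte 0xB0, which is a UTF-8 continuation byte and therefore not a valid start byte. That matches both the `Invalid UTF-8 byte "B0` message in the latex log and the Python traceback.

```python
# Compare how the degree sign is encoded under cp1252 (assumed Windows
# locale default) and under UTF-8.
degree = "\N{DEGREE SIGN}"

cp1252_bytes = degree.encode("cp1252")  # one byte: 0xB0
utf8_bytes = degree.encode("utf-8")     # two bytes: 0xC2 0xB0
print(cp1252_bytes, utf8_bytes)

# A strict UTF-8 decoder rejects the lone 0xB0 byte; latex and Python's
# .decode("utf-8") both fail on it.
reason = None
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    reason = err.reason
print(reason)
```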

(FYI, you may get test errors due to slightly different fonts when running locally, especially on Windows, but this is clearly a different issue.)

@anntzer
Contributor

anntzer commented May 9, 2022

Can you check whether

diff --git i/lib/matplotlib/texmanager.py w/lib/matplotlib/texmanager.py
index 2ffe0d5f66..a9022db5ed 100644
--- i/lib/matplotlib/texmanager.py
+++ w/lib/matplotlib/texmanager.py
@@ -243,7 +243,7 @@ class TexManager:
         Return the file name.
         """
         texfile = cls.get_basefile(tex, fontsize) + ".tex"
-        Path(texfile).write_text(cls._get_tex_source(tex, fontsize))
+        Path(texfile).write_text(cls._get_tex_source(tex, fontsize), encoding="utf-8")
         return texfile
 
     @classmethod

fixes the issue?
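Why the one-line patch helps: Path.write_text() without an encoding argument falls back to the locale's preferred encoding (often cp1252 on Windows, unless Python's UTF-8 mode is on), while encoding="utf-8" pins the bytes on every platform. The sketch below uses only pathlib and tempfile; passing encoding="cp1252" explicitly is an assumption made to simulate the old Windows behaviour on any machine, and the file name example.tex is arbitrary.

```python
import tempfile
from pathlib import Path

# Hypothetical TeX snippet with a non-ASCII character (no trailing newline,
# so Windows newline translation does not change the bytes).
tex_source = "\\textit{Velocity (\N{DEGREE SIGN}/sec)}"

with tempfile.TemporaryDirectory() as tmp:
    texfile = Path(tmp) / "example.tex"

    # Simulated old behaviour: locale encoding, here forced to cp1252.
    texfile.write_text(tex_source, encoding="cp1252")
    old_bytes = texfile.read_bytes()

    # Patched behaviour: always UTF-8, independent of the locale.
    texfile.write_text(tex_source, encoding="utf-8")
    new_bytes = texfile.read_bytes()

print(old_bytes)  # contains the lone 0xB0 byte latex complained about
print(new_bytes)  # contains the valid UTF-8 sequence 0xC2 0xB0
```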

(We should probably also fix _run_checked_subprocess to use something like .decode("utf-8", "backslashreplace").)
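A sketch of the suggested error-handling change. run_checked below is a hypothetical stand-in for the error path of TexManager._run_checked_subprocess, not its actual code: it fakes a failed latex run with subprocess.CalledProcessError and compares a strict decode (the UnicodeDecodeError escapes and the intended RuntimeError is lost) with "backslashreplace" (the RuntimeError is raised and the bad byte shows up as a readable \xb0).

```python
import subprocess

# Fragment of a latex log that echoes the stray cp1252 byte 0xB0.
LATEX_LOG = b"! LaTeX Error: Invalid UTF-8 byte \"B0.\r\nl.29 \\textit{Velocity (\xb0/sec)}"


def run_checked(log: bytes, errors: str = "strict") -> None:
    """Hypothetical stand-in for the error path of _run_checked_subprocess."""
    try:
        # Pretend latex exited with status 1 and produced `log` as output.
        raise subprocess.CalledProcessError(1, ["latex"], output=log)
    except subprocess.CalledProcessError as exc:
        # With errors="strict", this decode itself raises UnicodeDecodeError,
        # so the RuntimeError below is never constructed.
        raise RuntimeError(
            "latex was not able to process the following string:\n"
            "Here is the full report generated by latex:\n"
            + exc.output.decode("utf-8", errors)) from exc


results = {}
for errors in ("strict", "backslashreplace"):
    try:
        run_checked(LATEX_LOG, errors)
    except Exception as err:
        results[errors] = err
        print(errors, "->", type(err).__name__)
```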

@oscargus
Contributor

oscargus commented May 9, 2022

Seems to fix it for me at least!

@anntzer
Contributor

anntzer commented May 9, 2022

Feel free to pick up the patch, then.

@23ccozad
Contributor Author

23ccozad commented May 9, 2022

Yep, this also works for me!
