New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-28660: make TextWrapper break long words on hyphens #22721
Conversation
…ng_words=True and break_on_hyphens=True
Lib/textwrap.py
Outdated
chunk = reversed_chunks[-1] | ||
if self.break_on_hyphens and len(chunk) > space_left: | ||
hyphen = chunk.rfind('-', 0, space_left) | ||
if hyphen != -1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"-1234567890"
would split as "-"
and "1234567890"
. Maybe check hyphen > 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Got me wondering about 1E-123456789 and 2**-123456789, but they are probably quite esoteric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have "--a_long_command_name" it will break right after the --. To avoid that we could check that there is something other than '-' in chunk[0:hyphen]. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be.
Misc/NEWS.d/next/Library/2020-10-16-16-08-04.bpo-28660.eX9pvD.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Lib/textwrap.py
Outdated
hyphen = chunk.rfind('-', 0, space_left) | ||
if hyphen != -1: | ||
if hyphen > 0 and any(c != '-' for c in chunk[0:hyphen]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply chunk[:hyphen]
instead of chunk[0:hyphen]
.
Other ways to perform this test are:
chunk[:hyphen].rstrip('-')
and
re.compile(r'[^-]').search(chunk, 0, hyphen)
What do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose this method on the unsubstantiated hunch that it would have better performance - rstrip creates a new string, right? And re seemed like overkill. But I don't feel strongly about it.
Yes, the 0 index can be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slicing also creates a new string.
Your version looks the most obvious (it may the slowest one because Python generators are slower than C code), strip-version is the shortest code, and re-version may be potentially the most efficient for very long strings, but I am not sure that the difference exist in common case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the rstrip version most readable, so if there is no performance concern I would go with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it does not matter and I already merged your initial version. Thank you for your PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, thank you for the review and merge!
textwrap currently ignores hyphens when breaking up words longer than width (with the break_long_words=True option), even if break_on_hyphens=True was selected. This PR makes it break long words on hyphens if those exist.
https://bugs.python.org/issue28660