Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-28660: make TextWrapper break long words on hyphens #22721

Merged
merged 7 commits into from Oct 18, 2020

Conversation

iritkatriel
Copy link
Member

@iritkatriel iritkatriel commented Oct 16, 2020

textwrap currently ignores hyphens when breaking up words longer than width (with the break_long_words=True option), even if break_on_hyphens=True was selected. This PR makes it break long words on hyphens if those exist.

https://bugs.python.org/issue28660

Lib/textwrap.py Outdated Show resolved Hide resolved
Lib/textwrap.py Outdated
chunk = reversed_chunks[-1]
if self.break_on_hyphens and len(chunk) > space_left:
hyphen = chunk.rfind('-', 0, space_left)
if hyphen != -1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"-1234567890" would split as "-" and "1234567890". Maybe check hyphen > 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Got me wondering about 1E-123456789 and 2**-123456789, but they are probably quite esoteric.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have "--a_long_command_name" it will break right after the --. To avoid that we could check that there is something other than '-' in chunk[0:hyphen]. Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be.

iritkatriel and others added 3 commits October 18, 2020 14:32
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Lib/textwrap.py Outdated
hyphen = chunk.rfind('-', 0, space_left)
if hyphen != -1:
if hyphen > 0 and any(c != '-' for c in chunk[0:hyphen]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply chunk[:hyphen] instead of chunk[0:hyphen].

Other ways to perform this test are:

chunk[:hyphen].rstrip('-')

and

re.compile(r'[^-]').search(chunk, 0, hyphen)

What do you prefer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose this method on the unsubstantiated hunch that it would have better performance - rstrip creates a new string, right? And re seemed like overkill. But I don't feel strongly about it.

Yes, the 0 index can be dropped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slicing also creates a new string.

Your version looks the most obvious (it may the slowest one because Python generators are slower than C code), strip-version is the shortest code, and re-version may be potentially the most efficient for very long strings, but I am not sure that the difference exist in common case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the rstrip version most readable, so if there is no performance concern I would go with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it does not matter and I already merged your initial version. Thank you for your PR!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thank you for the review and merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants