Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string formatting: normalize negative zero #90153

Closed
belm0 mannequin opened this issue Dec 6, 2021 · 24 comments
Closed

string formatting: normalize negative zero #90153

belm0 mannequin opened this issue Dec 6, 2021 · 24 comments
Assignees
Labels
3.11 stdlib type-feature

Comments

@belm0
Copy link
Mannequin

@belm0 belm0 mannequin commented Dec 6, 2021

BPO 45995
Nosy @mdickinson, @belm0, @ericvsmith, @stevendaprano, @serhiy-storchaka, @belm0
PRs
  • #30049
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = None
    created_at = <Date 2021-12-06.12:03:46.943>
    labels = ['type-feature', 'library', '3.11']
    title = 'string formatting: normalize negative zero'
    updated_at = <Date 2022-03-13.12:04:51.159>
    user = 'https://github.com/belm0'

    bugs.python.org fields:

    activity = <Date 2022-03-13.12:04:51.159>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-12-06.12:03:46.943>
    creator = 'John Belmonte'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45995
    keywords = ['patch']
    message_count = 23.0
    messages = ['407792', '407793', '407796', '407810', '407822', '407857', '407905', '407928', '407929', '407930', '407934', '407973', '407990', '408367', '408413', '408529', '408800', '410838', '411243', '411295', '411302', '412360', '415034']
    nosy_count = 6.0
    nosy_names = ['mark.dickinson', 'jbelmonte', 'eric.smith', 'steven.daprano', 'serhiy.storchaka', 'John Belmonte']
    pr_nums = ['30049']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue45995'
    versions = ['Python 3.11']

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 6, 2021

    proposal: add a string formatting option to normalize negative 0 values to 0

    use case: rounded display of a float that is nominally 0, where the distraction of a flashing minus sign from minute changes around 0 is unwanted

    example:
    >>> '%~5.1f' % -.00001
    '  0.0'

    format spec before:
    format_spec ::= [[fill]align][sign][#][0][width][grouping_option][.precision][type]
    after:
    format_spec ::= [[fill]align][sign][~][#][0][width][grouping_option][.precision][type]
    where '~' is only allowed for number types

    implementation: if '~' is present in the spec, add 0 to the value after applying precision

    @belm0 belm0 mannequin added stdlib type-feature labels Dec 6, 2021
    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Dec 6, 2021

    To normalize negative 0.0 to 0.0 you can just add 0.0. It will work with any method of converting floats to string, there is no need to change all formatting specifications.

    >>> x = -0.0
    >>> x
    -0.0
    >>> x + 0.0
    0.0

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 6, 2021

    To normalize negative 0.0 to 0.0 you can just add 0.0.

    yes, I'm aware-- see "implementation" in the original post

    there is no need to change all formatting specifications

    For adding 0 to work, it must be done after the rounding. That means if you want make use of anything in the current formatting spec regarding precision, normalizing negative zero would need to be a proper option of the formatting spec.

    @stevendaprano
    Copy link
    Member

    @stevendaprano stevendaprano commented Dec 6, 2021

    It was decided long ago that % formatting would not be enhanced with new features. I think that it is supposed to match the standard C formatting codes, and nothing else.

    So this is unlikely to be approved for % formatting.

    It *might* be approved for the format method and f-strings. (I suspect that unless you get immediate and uncontroversial agreement from multiple core developers, you may need to take it for further discussion and perhaps even a PEP.)

    What you call a "distraction" I consider to be critical part of the display. The numbers you are displaying actually are negative, and rounding them for display does not change that. So they ought to show the minus sign. So I'm not really very sympathetic to this feature request.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 6, 2021

    Here is the same proposal made for C++ std::format:

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1496r2.pdf

    It makes fair arguments for the feature's use, and explains why the problem is hard to work around.

    It was withdrawn by the author for C++20, but a consensus proposal is promised for C++23.

    @mdickinson
    Copy link
    Member

    @mdickinson mdickinson commented Dec 6, 2021

    I'd support having this functionality available for format and for f-strings. (As Steven says, changing %-formatting doesn't seem viable.) It really is awkward to do this in any other way, and I'm reliably informed that normal people don't expect to see negative zeros in formatted numeric output.

    It did take me a few minutes to get my head around the idea that f"{-0.01:+.1f}" would return "+0.0" rather than "-0.0" or " 0.0" or just plain "0.0" under this proposal, but I agree that it seems like the only thing that can be consistent and make sense.

    I'm not 100% convinced by the particular spelling proposed, but I don't have anything better to suggest. If C++ might be going with a "z", would it make sense to do the same for Python?

    I don't forsee any implementation difficulties for float and complex types. For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that. Once we've done that, again the implementation doesn't seem onerous.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Dec 7, 2021

    Well, it makes sense for negative zero produced by rounding.

    But if we add a special support for this case, it would be useful to have some control on the type of rounding. Currently floats are rounded to the nearest decimal number, but in some cases it would be better to round up, down, toward zero or infinity (seed for example bpo-44884).

    You can round explicitly before formatting, but this solution is also applicable for this issue:

    >>> '%5.1f' % (round(-.00001, 1) + 0.0)
    '  0.0'

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 7, 2021

    changing %-formatting doesn't seem viable

    I'm concerned about treating %-formatting specially. As far as float/complex, the logical and efficient place to put this change seems to be PyOS_double_to_string(), which affects all three formatting options.

    For example, the dtoa case is as simple as this change to format_float_short():

    /* coerce negative zero to positive */
    if (sign == 1 && ((digits_len == 0 && decpt == -1) ||
                      (digits_len == 1 && digits[0] == '0'))) {
        sign = 0;
    }
    

    @stevendaprano
    Copy link
    Member

    @stevendaprano stevendaprano commented Dec 7, 2021

    Sorry John, I don't understand your comment about "treating %-formatting
    specifically". Isn't the point here not to change %-formatting at all?

    @ericvsmith
    Copy link
    Member

    @ericvsmith ericvsmith commented Dec 7, 2021

    %-formatting already doesn't support some formats that float.__format__ does, for example ','.

    So I agree we shouldn't modify %-formatting. I don't have much of an opinion on whether changing __format__ is a good idea or not.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 7, 2021

    I see now. PyOS_double_to_string() could gain the extra flag to coerce negative zero but, out of the three formatting methods, only format() and f-string would use the flag.

    @ericvsmith
    Copy link
    Member

    @ericvsmith ericvsmith commented Dec 7, 2021

    PyOS_double_to_string is part of the stable ABI. I don't recall if we're allowed to add new bitfield flags to a stable ABI function. We'd use a new Py_DTSF_NORMALIZE_NEGATIVE_0 flag for this feature.

    I suspect we can't add a flag, due to comparability reasons (new code setting the flag being used in old versions of python without it), but we'd need to research. I saw a similar discussion within the last few years, but of course now I can't find it. Maybe old versions would correctly ignore the new bit being set.

    This proposal becomes less interesting if we'd need to add a new function to support it. Although I guess we could do something that's internal-only.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 8, 2021

    I'll share a draft PR soon (excluding Decimal), so far it's still looking straightforward.

    Maybe old versions would correctly ignore the new bit being set.

    That's one of the benefits of using bit flags in an ABI: backward-compatible extensibility. (An implementation can defeat it by intentionally failing if unknown bits are encountered, but the code in question doesn't appear to be doing this.)

    You can round explicitly before formatting

    >>> '%5.1f' % (round(-.00001, 1) + 0.0)

    Yes, I have experience with it.

    1. even as a one-off, it's questionable. If someone accidentally changes the precision in only one of the spec string or round call, that's a bug.
    2. since applications and libraries may pass around format specs, and because of (1), you'll try to make a programmatic solution. Now you're parsing format spec strings.
    3. while a programmatic solution can be done for a function API like format() that takes a separate spec and value, there is no sane way to wrap f-strings

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 12, 2021

    For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that.

    After some digging, I believe this is the background on forking pieces of mpdecimal (and why the existing source copy inside Python doesn't count as a fork):
    https://bugs.python.org/issue45708#msg405895
    #29438

    If I understand correctly, the PR for supporting underscore separators in Decimal formatting is only taking control of generating a mpd_spec_t from the spec string. Formatting itself is still done by mpd_qformat_spec().

    So there's outstanding work to also pull the formatting code itself into _decimal.c. (And this is wanted anyway to reconcile existing libmpdec formatting modifications: 298131a)

    And this is all because vendors have the crazy practice of unbundling libmpdec from Python. (If a project is bundling the source of another, there may be some reason...?)

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 12, 2021

    potential short-term solution for Decimal:

      if negative zero option is set and sign is negative:
          pre-round into a temp using mpd_qrescale()
          if mpd_iszero(temp):
              change sign to positive

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Dec 14, 2021

    implemented float and Decimal-- PR is ready for review

    @mdickinson
    Copy link
    Member

    @mdickinson mdickinson commented Dec 17, 2021

    Thanks, John. I should have time to review within the next week or so.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Jan 18, 2022

    Mark, would you give it a review this month? (PR has been marked stale.)

    @mdickinson
    Copy link
    Member

    @mdickinson mdickinson commented Jan 22, 2022

    [John]

    Mark, would you give it a review this month?

    Apologies; my holiday-break free time was nobbled from unexpected quarters. I can't promise to find time this month, but I can promise to try. I did at least skim through the PR, and while there are still likely some iterations needed I'm satisfied that this is technically feasible.

    But I'm afraid that's the easy part. If this is to go in, the other problem we still have to solve is achieving some consensus among the core developers that this is worth doing. Right now, judging by comments on this issue, I think I'm the only core dev who thinks this is a good idea; others are lukewarm at best, and I'm not willing to unilaterally approve and merge these changes without something closer to a consensus. There are a couple of ways forward here:

    • Post the proposal on python-ideas to get wider visibility and feedback. If everyone agrees this is a great idea (from experience, this seems an unlikely outcome), then we can go ahead and merge. Otherwise we'd likely need a PEP to move forward.

    • Bypass the python-ideas step, write the PEP, discuss in the appropriate forums, and then submit to the SC for approval / rejection.

    • Convince Eric Smith. :-) With apologies to Eric for singling him out: Eric could reasonably be described as the steward/maintainer of the formatting machinery, so if he's persuaded, that's good enough for me.

    The fact that you've already created a working implementation so that people can experiment is a bonus when it comes to trying to sell this to others.

    I don't have the bandwidth to write a PEP, but I would be happy to act as PEP sponsor.

    @ericvsmith
    Copy link
    Member

    @ericvsmith ericvsmith commented Jan 22, 2022

    Wow, thanks, Mark!

    I'm generally in favor. The selling points to me are that it needs to happen post-rounding, and the C++ discussion. It would be better if this were already accepted in C++. I'll note that the paper is proposing a 'z' modifier to the sign, so I guess for us that would translate to: [sign[optional-z]] instead of just sign. I'd have to noodle through the differences between that the proposed [sign][~]. I guess this would all be worked out in a PEP.

    My only reservation is Mark's comment: """For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that."""

    If Mark is okay with that (right back at you, Mark!), then I think a PEP is the next step. It doesn't need to be huge, sort of like PEP-378.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Jan 22, 2022

    Thank you Mark and Eric.

    I'll note that the paper is proposing a 'z' modifier to the sign, so I guess for us that would translate to: [sign[optional-z]] instead of just sign. I'd have to noodle through the differences between that the proposed [sign][~].

    The C++ paper proposes [sign][z] (i.e. you can have the z alone without an explicit +/-), and this is what I implemented in the Python PR. My original proposal with tilde was discarded.

    My only reservation is Mark's comment: """For Decimal, we'd need to "own" the string formatting, taking that responsibility away from mpdecimal, but there are already other reasons to do that."""

    In the PR I was able to avoid taking that on by preprocessing the format string before handing it to mpdecimal. The code was already doing such things to handle the NULL fill character.

    @belm0
    Copy link
    Mannequin Author

    @belm0 belm0 mannequin commented Feb 2, 2022

    @mdickinson
    Copy link
    Member

    @mdickinson mdickinson commented Mar 13, 2022

    I forgot to update here:

    PEP at python/peps#2295

    For the record, PEP-682 has been accepted.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    mdickinson added a commit that referenced this issue Apr 11, 2022
    …0049)
    
    Add "z" format specifier to coerce negative 0 to zero.
    
    See #90153 (originally https://bugs.python.org/issue45995) for discussion.
    This covers `str.format()` and f-strings.  Old-style string interpolation is not supported.
    
    Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
    @mdickinson
    Copy link
    Member

    @mdickinson mdickinson commented Apr 11, 2022

    PR #30049 has been merged!

    gvanrossum pushed a commit to gvanrossum/cpython that referenced this issue Apr 13, 2022
    …onGH-30049)
    
    Add "z" format specifier to coerce negative 0 to zero.
    
    See python#90153 (originally https://bugs.python.org/issue45995) for discussion.
    This covers `str.format()` and f-strings.  Old-style string interpolation is not supported.
    
    Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
    mdickinson pushed a commit that referenced this issue Jun 11, 2022
    Add what's new entry for PEP 682 in Python 3.11.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 11, 2022
    Add what's new entry for PEP 682 in Python 3.11.
    (cherry picked from commit 010284b)
    
    Co-authored-by: John Belmonte <john@neggie.net>
    miss-islington added a commit that referenced this issue Jun 11, 2022
    Add what's new entry for PEP 682 in Python 3.11.
    (cherry picked from commit 010284b)
    
    Co-authored-by: John Belmonte <john@neggie.net>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 stdlib type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants