Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyLong_FromString documentation should state that the string must be null-terminated #59200

Closed
rfk mannequin opened this issue Jun 4, 2012 · 10 comments
Closed

PyLong_FromString documentation should state that the string must be null-terminated #59200

rfk mannequin opened this issue Jun 4, 2012 · 10 comments
Labels
3.11 docs Documentation in the Doc dir easy type-feature A feature request or enhancement

Comments

@rfk
Copy link
Mannequin

rfk mannequin commented Jun 4, 2012

BPO 14995
Nosy @Rosuav, @MojoVampire, @cryvate, @cryvate
PRs
  • bpo-14995: Clarify in PyLong_FromString that digits should be ended by a null byte #26774
  • Files
  • PyLong_FromString-doc.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-06-04.00:32:52.870>
    labels = ['easy', '3.11', 'type-feature', 'docs']
    title = 'PyLong_FromString documentation should state that the string must be null-terminated'
    updated_at = <Date 2021-06-18.04:13:42.001>
    user = 'https://bugs.python.org/rfk'

    bugs.python.org fields:

    activity = <Date 2021-06-18.04:13:42.001>
    actor = 'josh.r'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation']
    creation = <Date 2012-06-04.00:32:52.870>
    creator = 'rfk'
    dependencies = []
    files = ['25812']
    hgrepos = []
    issue_num = 14995
    keywords = ['patch', 'easy']
    message_count = 3.0
    messages = ['162242', '212559', '396032']
    nosy_count = 6.0
    nosy_names = ['rfk', 'docs@python', 'Rosuav', 'josh.r', 'Cryvate', 'cryvate']
    pr_nums = ['26774']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue14995'
    versions = ['Python 3.11']

    @rfk
    Copy link
    Mannequin Author

    rfk mannequin commented Jun 4, 2012

    PyLong_FromString will raise a ValueError if the given string doesn't contain a null byte after the digits. For example, this will result in a ValueError

       char *pend;
       PyLong_FromString("1234 extra", &pend, 10)

    While this will successfully read the number and set the pointer to the extra data:

       char *pend;
       PyLong_FromString("1234\0extra", &pend, 10)

    The requirement for a null-terminated string of digits is not clear from the docs. Suggested re-wording attached.

    @rfk rfk mannequin assigned docspython Jun 4, 2012
    @rfk rfk mannequin added the docs Documentation in the Doc dir label Jun 4, 2012
    @Rosuav
    Copy link
    Contributor

    Rosuav commented Mar 2, 2014

    Patch doesn't apply to current Python trunk (18 months later). Do you know what version you wrote this against? The current wording is different.

    @iritkatriel iritkatriel added easy 3.11 type-feature A feature request or enhancement labels Jun 17, 2021
    @MojoVampire
    Copy link
    Mannequin

    MojoVampire mannequin commented Jun 18, 2021

    The description is nonsensical as is; not sure the patch goes far enough. C-style strings are *defined* to end at the NUL terminator; if it really needs a NUL after the int, saying it "points to the first character which follows the representation of the number" is highly misleading; the NUL isn't logically a character in the C-string way of looking at things.

    The patch is also wrong; the digits need not end in a NUL byte (trailing whitespace is allowed).

    AFAICT, the function really uses pend for two purposes:

    1. If it succeeds in parsing, then pend reports the end of the string, nothing else
    2. If it fails, because the string is not a legal input (contains non-numeric, or non-leading/terminal whitespace or whatever), pend tells you where the first violation character that couldn't be massaged to meet the rules for int() occurred.

    #1 is a mostly useless bit of info (strlen would be equally informative, and if the value parsed, you rarely care how long it was anyway), so pend is, practically speaking, solely for error-checking/reporting.

    The rewrite should basically say what is allowed (making it clear anything beyond the single parsable integer value with optional leading/trailing whitespace is illegal), and making it clear that pend always points to the end of the string on success (not just after the representation of the number, it's after the trailing whitespace too), and on failure indicates where parsing failed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @JustAnotherArchivist
    Copy link
    Contributor

    JustAnotherArchivist commented Jun 14, 2022

    Has it been considered to instead change the function to fit its description, i.e. not require a NULL-terminated string? I have a use case that would benefit from this (decoding arbitrary ints when parsing JSON without memcpy or mangling to insert a NUL).

    @Rosuav
    Copy link
    Contributor

    Rosuav commented Jun 14, 2022

    It's documented as taking a const char * bytestring, which is C's standard NUL-terminated string. The current behaviour does fit the documentation. If you want a version that doesn't require a \0 at the end of the string, you'd probably do best to write it as a more restricted function (eg no support for bases other than 10, unless that is actually a requirement).

    @JustAnotherArchivist
    Copy link
    Contributor

    JustAnotherArchivist commented Jun 14, 2022

    @Rosuav No, the behaviour does not fit the documentation. The problem isn't the end of str but that the string might contain more than just the number to be parsed. The documentation implies that "1234 foo" should be parseable, resulting in an int(1234) and **pend pointing at the f (or, actually, the space?). Instead, the function expects a NUL immediately after the int (and any optional whitespace). And that is not what the documentation says.

    @Rosuav
    Copy link
    Contributor

    Rosuav commented Jun 14, 2022

    Hmm, I'm not convinced that it implies that "1234 foo" is parseable. But if there is an issue, it's one with documentation clarity, not with the code failing to fit the specification.

    @JustAnotherArchivist
    Copy link
    Contributor

    JustAnotherArchivist commented Jun 14, 2022

    Hmm, I suppose it could also be read differently, although at least two people have understood it that way (the original issue submitter and me).

    #26774 has been stale for almost a year, although the reviews requesting changes were only added a few months ago. Fine if I create a replacement PR for it?

    @Rosuav
    Copy link
    Contributor

    Rosuav commented Jun 14, 2022

    Yeah, I think a replacement PR would be the way to go here - I'm not aware of a way to contribute to someone else's PR when you're not part of the organization.

    JustAnotherArchivist added a commit to JustAnotherArchivist/cpython that referenced this issue Jun 28, 2022
    * Indicate return value on error
    * Explain in what `*pend` points to exactly both on success and on error
    * Mention that trailing whitespace is ignored; also replace 'spaces' with 'whitespace' since all ASCII whitespace is skipped, not just spaces
    * Add that `str` must be NULL-terminated
    @hauntsaninja
    Copy link
    Contributor

    hauntsaninja commented Oct 12, 2022

    Thanks, looks like this has been completed

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 docs Documentation in the Doc dir easy type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants