Make inspect.signature expression evaluation more powerful #68155

Open
larryhastings opened this issue Apr 15, 2015 · 11 comments
Assignees: larryhastings
Labels: stdlib (Python modules in the Lib dir), type-feature (a feature request or enhancement)

Comments

@larryhastings
Contributor

larryhastings commented Apr 15, 2015

BPO 23967
Nosy @ncoghlan, @larryhastings, @zware, @serhiy-storchaka, @1st1, @pdmccormick
Files
  • larry.improved.signature.expressions.1.txt
  • pdm-argument_clinic-mixed_py_and_c_defaults-v1.patch: Argument Clinic patch simplifying the use of the improved signatures
  • larry.improved.signature.expressions.2.txt
  • larry.improved.signature.expressions.3.txt
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/larryhastings'
    closed_at = None
    created_at = <Date 2015-04-15.18:52:37.516>
    labels = ['type-feature', 'library']
    title = 'Make inspect.signature expression evaluation more powerful'
    updated_at = <Date 2020-05-29.17:47:00.140>
    user = 'https://github.com/larryhastings'

    bugs.python.org fields:

    activity = <Date 2020-05-29.17:47:00.140>
    actor = 'brett.cannon'
    assignee = 'larry'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2015-04-15.18:52:37.516>
    creator = 'larry'
    dependencies = []
    files = ['39047', '39066', '39123', '39181']
    hgrepos = []
    issue_num = 23967
    keywords = ['patch']
    message_count = 11.0
    messages = ['241140', '241204', '241205', '241478', '241533', '241534', '241850', '241853', '241855', '242006', '365315']
    nosy_count = 7.0
    nosy_names = ['ncoghlan', 'larry', 'zach.ware', 'serhiy.storchaka', 'yselivanov', 'pdmccormick', 'Eric Wieser']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue23967'
    versions = ['Python 3.5']

    @larryhastings
    Contributor Author

    larryhastings commented Apr 15, 2015

    Peter's working on converting socket to use Argument Clinic. He had a default that really should look like this:

        min(SOME_SOCKET_MODULE_CONSTANT, 128)

    "min" wasn't something we'd needed before. I thought about it and realized we could do a much better job of simulating the evaluation context of a shared module.

    Initially I thought all I needed was to bolster the environment we used for eval() by adding the builtins. (Which I've done.) But this wasn't sufficient, because we deliberately used ast.literal_eval(), which for better security doesn't support function calls by design. Nor does it support subscripting or attribute access, though I think we already worked around those two.
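To illustrate the limitation described above, here is a minimal sketch using only the standard library. It shows ast.literal_eval() refusing a call expression that a plain eval() with the builtins exposed accepts. The literal 4096 is an assumed stand-in for the socket constant, chosen only for illustration.

```python
import ast
import builtins

# 4096 is a stand-in for SOME_SOCKET_MODULE_CONSTANT (illustrative value only).
expr = "min(4096, 128)"

try:
    ast.literal_eval(expr)
    literal_ok = True
except ValueError:
    # literal_eval only accepts literals and containers of literals,
    # so a function call such as min(...) is rejected.
    literal_ok = False

# eval() with the builtins exposed can resolve and call min().
value = eval(expr, {"__builtins__": builtins})

print(literal_ok, value)
```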

    But how concerned are we about security? What is the attack vector here? If the user is able to construct an object that has a villainous __text_signature__ on it... surely they could already do as they like?

    So here's a first draft at modifying the __text_signature__ evaluation environment so it can handle much more sophisticated expressions. It can use anything from builtins, or anything in sys.modules, or anything in the current module; it can call functions, and subscript, and access attributes, and everything.
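A rough sketch of what such an evaluation environment might look like follows. The helper name `evaluate_default` and the lookup order (current module first, then sys.modules, then builtins) are assumptions made for illustration; they are not taken from the attached patch.

```python
import builtins
import os
import socket
import sys


def evaluate_default(expr, module_name):
    """Evaluate a default-value expression (hypothetical helper, not the patch)."""
    namespace = dict(vars(builtins))      # lowest priority: builtins
    namespace.update(sys.modules)         # then imported modules, keyed by name
    module = sys.modules.get(module_name)
    if module is not None:
        namespace.update(vars(module))    # highest priority: the current module
    # Names in the expression are resolved through `namespace` (the locals).
    return eval(expr, {"__builtins__": {}}, namespace)


# SOMAXCONN comes from the socket module, min from builtins.
backlog = evaluate_default("min(SOMAXCONN, 128)", "socket")
# "os" is found through sys.modules, then attribute access is applied.
separator = evaluate_default("os.sep", "socket")
print(backlog, separator)
```

Building one flat dictionary keeps the sketch short; a layered mapping would avoid copying, at the cost of a little more code.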

    To make this work I had to write an ast printer that produces evaluatable Python code. Note that it's not complete, I know it's not complete, it's missing loads of operators. Assume that if this is a good idea I will add all the missing operators.

    Nick was worried that *in the future* we might expose a "turn this string into a signature" function. That might make an easier attack vector. So he asked that the "trusted=" keyword flag be added, and the full-on eval only happen if the string is trusted.

    @larryhastings larryhastings self-assigned this Apr 15, 2015
    @larryhastings larryhastings added the stdlib (Python modules in the Lib dir) and type-feature (a feature request or enhancement) labels Apr 15, 2015
    @pdmccormick
    Mannequin

    pdmccormick mannequin commented Apr 16, 2015

    This definitely works for the _socket.listen use case!

    In terms of generating such a signature using Argument Clinic, currently this is required:

    backlog: int(py_default="builtins.min(SOMAXCONN, 128)", c_default="Py_MIN(SOMAXCONN, 128)") = 000
    

    The attached patch lets Tools/clinic/clinic.py make an exception when both C and Python defaults are specified, simplifying the above to:

    backlog: int(py_default="builtins.min(SOMAXCONN, 128)", c_default="Py_MIN(SOMAXCONN, 128)")
    

    @pdmccormick
    Mannequin

    pdmccormick mannequin commented Apr 16, 2015

    I missed the fact that Larry's patch obviates the need for the "builtins." prefix, shortening the Argument Clinic parameter specification into:

    backlog: int(py_default="min(SOMAXCONN, 128)", c_default="Py_MIN(SOMAXCONN, 128)")
    

    @larryhastings
    Contributor Author

    larryhastings commented Apr 19, 2015

    I should mention that evalify_node() is pretty hacked up here, and is not ready to be checked in. (I'm proposing separately that we simply add something like this directly into the standard library, see issue bpo-24002.)

    @larryhastings
    Contributor Author

    larryhastings commented Apr 19, 2015

    Thanks to bpo-24002 I now know how to write evalify_node properly. This patch is now much better.

    Note that I deliberately made the new function _eval_ast_expr() as a "private" module-level routine. I need that same functionality in Argument Clinic too, so if both patches are accepted I'll have Clinic switch to calling this version.

    @larryhastings
    Contributor Author

    larryhastings commented Apr 19, 2015

    Whoops. Here's the revised patch.

    @larryhastings
    Contributor Author

    larryhastings commented Apr 23, 2015

    Cleaned up the patch some more--the code was stupid in a couple places. I think it's ready to go in.

    @serhiy-storchaka
    Member

    serhiy-storchaka commented Apr 23, 2015

    Using complex expressions is deceptive. In Python functions the default value is evaluated only once, at function creation time, but inspect.signature will evaluate it every time. For example, foo(x={}) and foo(x=dict()) mean the same thing in a function declaration, but behave differently in a signature.
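The difference can be seen in a small standalone sketch. It does not call inspect.signature; it only contrasts a default evaluated once at definition time with re-evaluating the default's source text, which is an assumed model of what evaluating a text signature on every call would do.

```python
def foo(x=dict()):
    return x


# The default was evaluated once, when `def` ran, so every call shares it.
shared = foo() is foo()

# Re-evaluating the default's source text builds a fresh object each time.
default_source = "dict()"
first = eval(default_source)
second = eval(default_source)
fresh_each_time = first is not second

print(shared, fresh_each_time)
```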

    It could also affect security, because it allows arbitrary code execution in a place where it was not allowed before.

    I think this issue should be discussed on Python-Dev. I'm not sure that it is pythonic.

    @larryhastings
    Contributor Author

    larryhastings commented Apr 23, 2015

    It's only used for signatures in builtins. Any possible security hole here is uninteresting because the evil hacker already got to run arbitrary C code in the module init.

    Because it's only used for signatures in builtins, we shouldn't encounter a function with a mutable default value like {} or [] which gets mutated later. Builtins don't have those.

    In case you're wondering about the "trusted" parameter, that was suggested by Nick Coghlan at the PyCon sprints. He's thinking that other callers may use _signature_fromstr() in the future, and he wanted the API to make it clear that future uses may be on non-trustworthy sources.

    And, finally, consider that the original version already calls eval(). Admittedly it uses eval() in a way that should be much harder to exploit. But it's not an enormous difference between the two calls.

    I don't really think we need to post to python-dev about this.

    @ncoghlan
    Contributor

    ncoghlan commented Apr 25, 2015

    Right, Larry and I had a fairly long discussion about this idea at the sprints, and I was satisfied that all the cases where he's proposing to use this are safe: in order to exploit them you need to be able to set __text_signature__ on arbitrary objects, and if an attacker can do that, you've already lost control of the process.

    However, a natural future extension is to expose this as a public alternative constructor for Signature objects, and for that, the fact that it ultimately calls eval() under the hood presents more of a security risk. The "trusted=False" default on _signature_fromstr allows the function to be used safely on untrusted data, while allowing additional flexibility when you *do* trust the data you're evaluating.
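One way to picture the trusted/untrusted split is sketched below. The function `parse_default` and its `namespace` parameter are hypothetical names invented for this illustration; this is not the signature or the body of _signature_fromstr.

```python
import ast
import builtins


def parse_default(expr, *, trusted=False, namespace=None):
    """Hypothetical helper; not the API of inspect._signature_fromstr."""
    if not trusted:
        # Untrusted text: literals only, no calls or name lookups.
        return ast.literal_eval(expr)
    # Trusted text: full eval() with builtins plus any supplied names.
    env = {"__builtins__": builtins}
    env.update(namespace or {})
    return eval(expr, env)


# The SOMAXCONN value is illustrative, not read from the socket module.
safe = parse_default("[1, 2]")
powerful = parse_default("min(SOMAXCONN, 128)", trusted=True,
                         namespace={"SOMAXCONN": 4096})
print(safe, powerful)
```

Defaulting to `trusted=False` means a caller has to opt in explicitly before any code from the string can run.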

    @EricWieser
    Mannequin

    EricWieser mannequin commented Mar 30, 2020

    > To make this work I had to write an ast printer that produces evaluatable Python code. Note that it's not complete, I know it's not complete, it's missing loads of operators. Assume that if this is a good idea I will add all the missing operators.

    Now that ast.unparse is in (bpo-38870), can this patch be simplified?
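As a minimal sketch of what ast.unparse offers here (it requires Python 3.9 or newer), the snippet below round-trips a default expression through the AST and evaluates the result. How the patch would actually use it is not specified in this thread, and the SOMAXCONN value is an assumed placeholder.

```python
import ast

default_source = "min(SOMAXCONN, 128)"

# Parse the default as an expression, then turn the tree back into source text.
tree = ast.parse(default_source, mode="eval")
round_tripped = ast.unparse(tree)  # requires Python 3.9 or newer

# SOMAXCONN's value here is illustrative, not read from the socket module.
value = eval(round_tripped, {"SOMAXCONN": 4096})
print(round_tripped, value)
```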

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Oct 28, 2022
    …handling
    
    This makes a couple related changes to inspect.signature's behaviour
    when parsing a signature from `__text_signature__`.
    
    First, `inspect.signature` is documented as only raising ValueError or
    TypeError. However, in some cases, we could raise RuntimeError.  This PR
    changes that, thereby fixing python#83685.
    
    (Note that the new ValueErrors in RewriteSymbolics are caught and then
    reraised with a message)
    
    Second, `inspect.signature` could randomly drop parameters that it
    didn't understand (corresponding to `return None` in the `p` function).
    This is the core issue in python#85267. I think this is very surprising
    behaviour and it seems better to fail outright.
    
    Third, adding this new failure broke a couple tests. To fix them (and to
    e.g. allow `inspect.signature(select.epoll.register)` as in python#85267), I
    add constant folding of a couple binary operations to RewriteSymbolics.
    
    (There's some discussion of making signature expression evaluation
    arbitrarily powerful in python#68155. I think that's out of scope. The
    additional constant folding here is pretty straightforward, useful, and
    not much of a slippery slope)
    
    Fourth, while python#85267 is incorrect about the cause of the issue, it turns
    out if you had consecutive newlines in __text_signature__, you'd get
    `tokenize.TokenError`.
    
    Finally, the `if name is invalid:` code path was dead, since
    `parse_name` never returned `invalid`.
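The constant folding mentioned in the commit message can be pictured with a small ast.NodeTransformer. This is only a sketch: the class `FoldConstants`, the helper `fold_default`, and the chosen set of operators are assumptions for illustration and are not the RewriteSymbolics code from the commit. It also fails outright on anything it does not understand, in the spirit of the message above.

```python
import ast
import operator

# Operators this sketch is willing to fold (an illustrative choice).
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.BitOr: operator.or_,
}


class FoldConstants(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested operands first
        op = _FOLDABLE.get(type(node.op))
        if (op is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        # Fail loudly instead of silently dropping the parameter.
        raise ValueError("unsupported expression in default value")


def fold_default(expr):
    tree = FoldConstants().visit(ast.parse(expr, mode="eval"))
    return ast.literal_eval(tree)


print(fold_default("1 | 4 | 2"))
```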
    JelleZijlstra pushed a commit that referenced this issue Dec 21, 2022
    …ng (#98796)
    
    hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Dec 21, 2022
    …ture__ handling (pythonGH-98796)
    
    (cherry picked from commit 79311cb)
    
    Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
    JelleZijlstra pushed a commit that referenced this issue Dec 21, 2022
    … handling (GH-98796) (#100392)
    
    JelleZijlstra pushed a commit that referenced this issue Dec 21, 2022
    … handling (GH-98796) (#100393)
    