Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argparse can't parse subparsers with parse_known_args #91199

Open
rive-n mannequin opened this issue Mar 17, 2022 · 6 comments
Open

Argparse can't parse subparsers with parse_known_args #91199

rive-n mannequin opened this issue Mar 17, 2022 · 6 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@rive-n
Copy link
Mannequin

rive-n mannequin commented Mar 17, 2022

BPO 47043
Nosy @lysnikolaou, @rive-n

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-03-17.13:49:54.593>
labels = ['interpreter-core', 'type-bug', '3.8']
title = "Argparse can't parse subparsers with parse_known_args"
updated_at = <Date 2022-03-22.08:42:30.118>
user = 'https://github.com/rive-n'

bugs.python.org fields:

activity = <Date 2022-03-22.08:42:30.118>
actor = 'rive-n'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Parser']
creation = <Date 2022-03-17.13:49:54.593>
creator = 'rive-n'
dependencies = []
files = []
hgrepos = []
issue_num = 47043
keywords = []
message_count = 4.0
messages = ['415412', '415414', '415490', '415751']
nosy_count = 2.0
nosy_names = ['lys.nikolaou', 'rive-n']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue47043'
versions = ['Python 3.8']

@rive-n
Copy link
Mannequin Author

rive-n mannequin commented Mar 17, 2022

Let's say we have default parser configuration:
https://docs.python.org/3/library/argparse.html#other-utilities

Like this one:

import argparse

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('--foo', action='store_true', help='foo help')
subparsers = parser.add_subparsers(help='sub-command help')

# create the parser for the "a" command
parser_a = subparsers.add_parser('a', help='a help')
parser_a.add_argument('-a', help='bar help')

# create the parser for the "b" command
parser_b = subparsers.add_parser('b', help='b help')
parser_b.add_argument('-b', help='baz help')

So i want to parse all subparsers arguments. For this purpose i could use parse_known_args method.

But for some reason there is no way to get all of current args specified via command prompt or via list:
print(parser.parse_known_args(['a', '-a', '12', 'b', '-b', '32'])) - will not print 'a' and 'b' data. It will only contain first valid argument;

This is pretty strange behavior. Why i can't just get all of actions (subparsers) ?

@rive-n rive-n mannequin added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Mar 17, 2022
@rive-n
Copy link
Mannequin Author

rive-n mannequin commented Mar 17, 2022

Let's say we have default parser configuration:
https://docs.python.org/3/library/argparse.html#other-utilities

Like this one:

import argparse

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('--foo', action='store_true', help='foo help')
subparsers = parser.add_subparsers(help='sub-command help')

# create the parser for the "a" command
parser_a = subparsers.add_parser('a', help='a help')
parser_a.add_argument('-a', help='bar help')

# create the parser for the "b" command
parser_b = subparsers.add_parser('b', help='b help')
parser_b.add_argument('-b', help='baz help')

So i want to parse all subparsers arguments. For this purpose i could use parse_known_args method.

But for some reason there is no way to get all of current args specified via command prompt or via list:
print(parser.parse_known_args(['a', '-a', '12', 'b', '-b', '32'])) - will not print 'a' and 'b' data. It will only contain first valid argument;

This is pretty strange behavior. Why i can't just get all of actions (subparsers) ?


I've found a solutions:

  1. Easy way (and most stupid):
args, unk_args = (parser.parse_known_args(['a', '-a', '12', 'b', '-b', '32']))
args, unk_args = (parser.parse_known_args(unk_args))

Simple recursive calls on unk_args

  1. Hard way -> sources fork (or just fix this, lul!)

parse_know_args -> 1799-1807 lines:

        try:
            namespace, args = self._parse_known_args(args, namespace)
            if hasattr(namespace, _UNRECOGNIZED_ARGS_ATTR):
                args.extend(getattr(namespace, _UNRECOGNIZED_ARGS_ATTR))
                delattr(namespace, _UNRECOGNIZED_ARGS_ATTR)
            return namespace, args
        except ArgumentError:
            err = _sys.exc_info()[1]
            self.error(str(err))

There is only 1 call to self._parse_known_args instead of amount of current subparsers.

@rive-n
Copy link
Mannequin Author

rive-n mannequin commented Mar 18, 2022

Hi again.

previous solution with:

        try:
            namespace, args = self._parse_known_args(args, namespace)
            if hasattr(namespace, _UNRECOGNIZED_ARGS_ATTR):
                args.extend(getattr(namespace, _UNRECOGNIZED_ARGS_ATTR))
                delattr(namespace, _UNRECOGNIZED_ARGS_ATTR)
            return namespace, args
        except ArgumentError:
            err = _sys.exc_info()[1]
            self.error(str(err))

Is not working at all. So i spent some time (a lot of time) to make changes in source code.
I figured out exactly how the algorithm works, I read about 3,000 lines of code. And here's what I came up with:
argparse.py:
line 1774: parse_known_args
basically this function calling protected one on line 1800:
amespace, args = self._parse_known_args(args, namespace)

This one is making magic. But we need to check line 1856:
def take_action(action, argument_strings, option_string=None):

This function creating objects of necessary classes. For example:
_SubParsersAction (problem in it).

Before fix:

    def __call__(self, parser, namespace, values, option_string=None):
        parser_name = values[0]
        arg_strings = values[1:]

        # set the parser name if requested
        if self.dest is not SUPPRESS:
            setattr(namespace, self.dest, parser_name)

        # select the parser
        try:
            parser = self._name_parser_map[parser_name]
        except KeyError:
            args = {'parser_name': parser_name,
                    'choices': ', '.join(self._name_parser_map)}
            msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
            raise ArgumentError(self, msg)

        # parse all the remaining options into the namespace
        # store any unrecognized options on the object, so that the top
        # level parser can decide what to do with them

        # In case this subparser defines new defaults, we parse them
        # in a new namespace object and then update the original
        # namespace for the relevant parts.
        subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
        for key, value in vars(subnamespace).items():
            setattr(namespace, key, value)

        if arg_strings:
            vars(namespace).setdefault(_UNRECOGNIZED_ARGS_ATTR, [])
            getattr(namespace, _UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)

After fix:

    def __call__(self, parser, namespace, values, option_string=None, arg_strings_pattern:list =None):
        o_amount = arg_strings_pattern.count("O")
        if not o_amount:
            raise ValueError("No Os found")
        o_start, o_stop, indexes = arg_strings_pattern.index('O'), len(arg_strings_pattern), []
        print(parser)
        try:
            while arg_strings_pattern.index('O', o_start, o_stop):
                indexes.append(arg_strings_pattern.index('O', o_start, o_stop))
                o_start = arg_strings_pattern.index('O', o_start + 1, o_stop)
        except ValueError:
            pass

        for parser in range(o_amount):
            parser_name = values[indexes[parser] - 1] # indexes[parser] could give int (real index)
            arg_strings = values[indexes[parser]: indexes[(parser + 1)] - 1]  if parser < len(indexes) - 1 else \
            values[indexes[parser]:] # could give all data

            # set the parser name if requested
            if self.dest is not SUPPRESS:
                setattr(namespace, self.dest, parser_name)

            # select the parser
            try:
                parser = self._name_parser_map[parser_name]
            except KeyError:
                args = {'parser_name': parser_name,
                        'choices': ', '.join(self._name_parser_map)}
                msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
                raise ArgumentError(self, msg)

            # parse all the remaining options into the namespace
            # store any unrecognized options on the object, so that the top
            # level parser can decide what to do with them

            # In case this subparser defines new defaults, we parse them
            # in a new namespace object and then update the original
            # namespace for the relevant parts.
            subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
            for key, value in vars(subnamespace).items():
                setattr(namespace, key, value)

            if arg_strings:
                vars(namespace).setdefault(_UNRECOGNIZED_ARGS_ATTR, [])
                getattr(namespace, _UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)

That's not the best solution but this solution works.

@rive-n
Copy link
Mannequin Author

rive-n mannequin commented Mar 22, 2022

Long time no updates here. Another fix. In past version more than 1 argument could not be parsed. Fix (finally with unittests):

    def __call__(self, parser, namespace, values, option_string=None, arg_strings_pattern:list =None):
        o_amount = arg_strings_pattern.count("O")
        if not o_amount:
            raise ValueError("No Os found")
        o_start, o_stop, indexes = arg_strings_pattern.index('O'), len(arg_strings_pattern), []
        print(parser)
        try:
            while arg_strings_pattern.index('O', o_start, o_stop):
                indexes.append(arg_strings_pattern.index('O', o_start, o_stop))
                o_start = arg_strings_pattern.index('O', o_start + 1, o_stop)
        except ValueError:
            pass

        used_indexes = []
        known_args = {}
        for i, index in enumerate(indexes):
            parser_name = values[index - 1]
            if not known_args.get(parser_name):
                known_args[parser_name] = []
                known_args[parser_name] += values[index: indexes[i + 1] - 1] if i + 1 < len(indexes) else values[index:]
            if index not in used_indexes:
                for s, subindex in enumerate(indexes[1:]):
                    subparser_name = values[subindex - 1]
                    if parser_name == subparser_name:
                        used_indexes.append(index)
                        used_indexes.append(subindex)
                        subparser_args = values[subindex: indexes[s + 2] - 1] if s + 2 < len(indexes) else values[subindex:]
                        known_args[parser_name] += subparser_args

        for parser_name, args in known_args.items():
            self._create_parser(namespace, parser_name, args)

    def _create_parser(self, namespace, parser_name, arg_strings):

        # set the parser name if requested
        if self.dest is not SUPPRESS:
            setattr(namespace, self.dest, parser_name)

        # select the parser
        try:
            parser = self._name_parser_map[parser_name]
        except KeyError:
            args = {'parser_name': parser_name,
                    'choices': ', '.join(self._name_parser_map)}
            msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
            raise ArgumentError(self, msg)

        # parse all the remaining options into the namespace
        # store any unrecognized options on the object, so that the top
        # level parser can decide what to do with them

        # In case this subparser defines new defaults, we parse them
        # in a new namespace object and then update the original
        # namespace for the relevant parts.
        subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
        for key, value in vars(subnamespace).items():
            setattr(namespace, key, value)

        if arg_strings:
            vars(namespace).setdefault(_UNRECOGNIZED_ARGS_ATTR, [])
            getattr(namespace, _UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)

Unittests:

import unittest
import argfork as argparse
from argfork import Namespace


class argparseTest(unittest.TestCase):

    def setUp(self) -> None:
        self.parser = argparse.ArgumentParser(prog='PROG')
        subparsers = self.parser.add_subparsers(help='sub-command help')

        # create the parser for the "a" command
        parser_a = subparsers.add_parser('a', help='a help')
        parser_a.add_argument('-a', help='bar help')

        # create the parser for the "b" command
        parser_b = subparsers.add_parser('b', help='b help')
        parser_b.add_argument('-b', help='baz help')
        parser_b.add_argument('-q', help='baz help')

        # create the parser for the "c" command
        parser_b = subparsers.add_parser('c', help='b help')
        parser_b.add_argument('-c', help='baz help')
        parser_b.add_argument('-k', help='baz help')

        # create the parser for the "c" command
        parser_b = subparsers.add_parser('d', help='b help')
        parser_b.add_argument('-d', help='baz help')
        parser_b.add_argument('-D', help='baz help')
        parser_b.add_argument('-R', help='baz help')

    def testSimple(self):
        case = ['a', '-a', 'test']
        res_obj = Namespace(a='test').__dict__
        rest_obj = self.parser.parse_known_args(case)[0].__dict__

        res_k, res_v = res_obj.keys(), list(res_obj.values())
        test_k, test_v = rest_obj.keys(), list(rest_obj.values())

        self.assertEqual(res_v, test_v)
        self.assertEqual(res_k, test_k)

    def testMany(self):
        case = ['d', '-d', '1234', 'd', '-D', '12345', 'd', '-R', '1', 'c', '-c', '123', 'c', '-k', '555', 'b', '-q', 'test']
        res_obj = Namespace(d='1234', D='12345', R='1', c='123', k='555', b=None, q='test').__dict__
        rest_obj = self.parser.parse_known_args(case)[0].__dict__

        res_k, res_v = res_obj.keys(), list(res_obj.values())
        test_k, test_v = rest_obj.keys(), list(rest_obj.values())

        self.assertEqual(res_v, test_v)
        self.assertEqual(res_k, test_k)

    def testZero(self):
        case = []
        res_obj = Namespace().__dict__
        rest_obj = self.parser.parse_known_args(case)[0].__dict__

        res_k, res_v = res_obj.keys(), list(res_obj.values())
        test_k, test_v = rest_obj.keys(), list(rest_obj.values())

        self.assertEqual(res_v, test_v)
        self.assertEqual(res_k, test_k)

if __name__ == '__main__':
    unittest.main()

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@hpaulj
Copy link

hpaulj commented Apr 11, 2022

You got caught up in the migration, so didn't get any responses.

The behavior that bothers you is normal and expected. add_subparsers creates a positional argument, and explicitly allows only one of its kind. Handling only one subcommand might not be universal, but it is typical - pip and jupyter are two examples I use.

You've gotten around this by giving the Action itself the ability to find and call multiple parsers. But the change is not reflected in the help or usage, which depends only on what the main parser "knows".

You are welcome to develop and publish the change as a pypi package, but I don't think it should be considered as a change to the standard argparse module.

@hpaulj
Copy link

hpaulj commented Apr 20, 2022

This should be changed from type-bug to type-feature

@AlexWaygood AlexWaygood removed interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.8 only security fixes labels Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
Status: Bugs
Development

No branches or pull requests

2 participants