Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argparse.ArgumentParser is slow when parsing a very long option list #96859

Open
alexeypa opened this issue Sep 16, 2022 · 3 comments
Open

argparse.ArgumentParser is slow when parsing a very long option list #96859

alexeypa opened this issue Sep 16, 2022 · 3 comments
Labels
3.12 new features, bug and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@alexeypa
Copy link

alexeypa commented Sep 16, 2022

Bug report

The following parser exhibits O(N^2) behavior when parsing a very long option list (e.g. --item=a --item=b ...):

parser = argparse.ArgumentParser(description="test")
parser.add_argument('--item', dest='accumulate', action='append')

Here is a simple repro that measures the time parse_args() take depending on the number of options:

#!/usr/bin/env python3
import argparse
from datetime import datetime


def run_test(num_iterations):
    """Parse a list with 'append' action and measure the time it takes."""
    parser = argparse.ArgumentParser(description="test")
    parser.add_argument('--item', dest='accumulate', action='append')

    start = datetime.now()

    args = parser.parse_args(['--item=x' for _ in range(num_iterations)])

    end = datetime.now()
    diff = end - start

    print(f"{num_iterations:8} iterations: {diff} s")


def main():
    for i in range(8, 18):
        run_test(pow(2, i))


if __name__ == "__main__":
    main()

The output on my machine:

> ./repro.py
     256 iterations: 0:00:00.002699 s
     512 iterations: 0:00:00.008444 s
    1024 iterations: 0:00:00.028508 s
    2048 iterations: 0:00:00.104813 s
    4096 iterations: 0:00:00.389232 s
    8192 iterations: 0:00:01.552962 s
   16384 iterations: 0:00:06.132139 s
   32768 iterations: 0:00:25.233547 s
   65536 iterations: 0:01:51.921170 s
  131072 iterations: 0:07:42.047914 s

Your environment

  • CPython versions tested on: HEAD of main (a9d58fe)
  • Operating system and architecture: Fedora release 36 (Thirty Six)
@alexeypa alexeypa added the type-bug An unexpected behavior, bug, or error label Sep 16, 2022
alexeypa added a commit to alexeypa/cpython that referenced this issue Sep 16, 2022
…onals

The loop consuming optional and positional arguments in ArgumentParser._parse_known_args()
rescans option_string_indices on every iteration of the loop in order to compute the next
option index:

```py
next_option_string_index = min([
    index
    for index in option_string_indices
    if index >= start_index])
```

This becomes very slow when parsing a very long list of arguments, e.g. when parsing
a response file.

This commit refactors the loop to scan the option indices list only once.
@alexeypa
Copy link
Author

Here is a simple repro that measures the time parse_args() take depending on the number of options

For comparison, here is how alexeypa@548376e behaves with the same repro:

> ./repro.py
     256 iterations: 0:00:00.001061 s
     512 iterations: 0:00:00.002236 s
    1024 iterations: 0:00:00.005026 s
    2048 iterations: 0:00:00.013403 s
    4096 iterations: 0:00:00.040210 s
    8192 iterations: 0:00:00.133397 s
   16384 iterations: 0:00:00.479814 s
   32768 iterations: 0:00:01.837926 s
   65536 iterations: 0:00:07.891365 s
  131072 iterations: 0:00:34.732420 s

There is still non-linear growth caused by _AppendAction.__call__() cloning the whole list of parsed values (see _copy_items()) on every append but it is much faster already. The _copy_items() behavior can be adjusted on a separate commit.

Let me go through a submission checklist before creating a PR for this fix.

alexeypa added a commit to alexeypa/cpython that referenced this issue Sep 19, 2022
Currently the append, append_const, and extend actions make a copy of the destination
list every time the action is called. This is noteceable with very long option lists.

This commit adds Action.clone_dest() method that is called the first time an action
runs for each destination. This allows to copy lists only once, significantky speeding
up appends.
@alexeypa
Copy link
Author

There is still non-linear growth caused by _AppendAction.call() cloning the whole list of parsed values (see _copy_items()) on every append but it is much faster already. The _copy_items() behavior can be adjusted on a separate commit.

I updated the actions code to copy lists only once on alexeypa@856ca35. This improved the situation significantly:

> ./repro.py                                 
     256 iterations: 0:00:00.001032 s
     512 iterations: 0:00:00.001732 s
    1024 iterations: 0:00:00.003265 s
    2048 iterations: 0:00:00.006883 s
    4096 iterations: 0:00:00.012096 s
    8192 iterations: 0:00:00.024929 s
   16384 iterations: 0:00:00.049330 s
   32768 iterations: 0:00:00.101852 s
   65536 iterations: 0:00:00.199928 s
  131072 iterations: 0:00:00.428487 s

I think it makes sense to lend this fix on a separate PR, since it addresses a different problem then #96904.

@AlexWaygood AlexWaygood added performance Performance or resource usage stdlib Python modules in the Lib dir labels Sep 19, 2022
alexeypa added a commit to alexeypa/cpython that referenced this issue Sep 20, 2022
alexeypa added a commit to alexeypa/cpython that referenced this issue Sep 20, 2022
@iritkatriel iritkatriel added the 3.12 new features, bug and security fixes label Sep 24, 2022
@alexeypa
Copy link
Author

Hey, trying to bring some attention to #96904. The PR is still waiting for a core review. Is anyone available to take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 new features, bug and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Bugs
Development

No branches or pull requests

3 participants