Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up urllib.request.getproxies_environment #91539

Closed
bohea opened this issue Apr 14, 2022 · 1 comment · Fixed by #91566
Closed

speed up urllib.request.getproxies_environment #91539

bohea opened this issue Apr 14, 2022 · 1 comment · Fixed by #91566
Labels
3.12 performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@bohea
Copy link

bohea commented Apr 14, 2022

in module urllib.request, function getproxies_environment (https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L2493)

iterate over all environment variables twice

I made a test:

  1. create 2500 environment variables
  2. call and timing getproxies_environment()
  3. the result is: getproxies_environment() take more than 50ms, which is a big overhead

so if there are many environment variables(like 10 thousands) in a machine, packages depend on function urllib.request.getproxies_environment (like requests does) shall be very slow

is 10 thousands environment variables reasonable?

yes it is, for example, in k8s cluster,kubelete may add a set of environment variables all Services(could be 10 thousands even 100 thousands), in purpose of service discovery

so would it be ok that getproxies_environment don't iterate over all environment variables?

@AlexWaygood AlexWaygood added performance Performance or resource usage stdlib Python modules in the Lib dir labels Apr 14, 2022
@eendebakpt
Copy link
Contributor

eendebakpt commented Apr 15, 2022

@bohea The following code reduces the computation time by a factor 3 or more on my system.

def getproxies_environment():
    """Return a dictionary of scheme -> proxy server URL mappings.
    Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention.  If you need a
    different way, you can pass a proxies dictionary to the
    [Fancy]URLopener constructor.
    """
    # in order to prefer lowercase variables, process environment in
    # two passes: first matches any, second pass matches lowercase only

    # select only environment variables which end in (after making lowercase) _proxy 
    candidate_names = [name for name in os.environ.keys() if name[-6:]=='_'] # fast selection of candidates
    environment = [(name, os.environ[name], name.lower()) for name in candidate_names if name[-6:].lower()=='_proxy'] 
    
    proxies = {}
    for name, value, name_lower in environment:
        if value and name_lower[-6:] == '_proxy':
            proxies[name_lower[:-6]] = value
    # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
    # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
    # header from the client
    # If "proxy" is lowercase, it will still be used thanks to the next block
    if 'REQUEST_METHOD' in os.environ:
        proxies.pop('http', None)
    for name, value, name_lower in environment:
        if name[-6:] == '_proxy':
            if value:
                proxies[name_lower[:-6]] = value
            else:
                proxies.pop(name_lower[:-6], None)
    return proxies

Does this work for you? If so, then I will make a PR

orsenthil pushed a commit that referenced this issue Oct 5, 2022
* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2022
…nGH-91566)

* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
(cherry picked from commit aeb28f5)

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 5, 2022
…nGH-91566)

* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
(cherry picked from commit aeb28f5)

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
miss-islington added a commit that referenced this issue Oct 5, 2022
* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
(cherry picked from commit aeb28f5)

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
carljm added a commit to carljm/cpython that referenced this issue Oct 6, 2022
* main: (66 commits)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  pythongh-93357: Port test cases to IsolatedAsyncioTestCase, part 2 (python#97896)
  pythongh-95196: Disable incorrect pickling of the C implemented classmethod descriptors (pythonGH-96383)
  pythongh-97758: Fix a crash in getpath_joinpath() called without arguments (pythonGH-97759)
  pythongh-74696: Pass root_dir to custom archivers which support it (pythonGH-94251)
  pythongh-97661: Improve accuracy of sqlite3.Cursor.fetchone docs (python#97662)
  pythongh-87092: bring compiler code closer to a preprocessing-opt-assembler organisation (pythonGH-97644)
  pythonGH-96704: Add {Task,Handle}.get_context(), use it in call_exception_handler() (python#96756)
  pythongh-93738: Documentation C syntax (:c:type:`PyTypeObject*` -> :c:expr:`PyTypeObject*`) (python#97778)
  pythongh-97825: fix AttributeError when calling subprocess.check_output(input=None) with encoding or errors args (python#97826)
  Add re.VERBOSE flag documentation example (python#97678)
  ...
carljm added a commit to carljm/cpython that referenced this issue Oct 8, 2022
* main: (53 commits)
  pythongh-94808: Coverage: Test that maximum indentation level is handled (python#95926)
  pythonGH-88050: fix race in closing subprocess pipe in asyncio  (python#97951)
  pythongh-93738: Disallow pre-v3 syntax in the C domain (python#97962)
  pythongh-95986: Fix the example using match keyword (python#95989)
  pythongh-97897: Prevent os.mkfifo and os.mknod segfaults with macOS 13 SDK (pythonGH-97944)
  pythongh-94808: Cover `PyUnicode_Count` in CAPI (python#96929)
  pythongh-94808: Cover `PyObject_PyBytes` case with custom `__bytes__` method (python#96610)
  pythongh-95691: Doc BufferedWriter and BufferedReader (python#95703)
  pythonGH-88968: Add notes about socket ownership transfers (python#97936)
  pythongh-96865: [Enum] fix Flag to use CONFORM boundary (pythonGH-97528)
  pythongh-65961: Raise `DeprecationWarning` when `__package__` differs from `__spec__.parent` (python#97879)
  docs(typing): add "see PEP 675" to LiteralString (python#97926)
  pythongh-97850: Remove all known instances of module_repr() (python#97876)
  I changed my surname early this year (python#96671)
  pythongh-93738: Documentation C syntax (:c:type:<C type> -> :c:expr:<C type>) (python#97768)
  pythongh-91539: improve performance of get_proxies_environment  (python#91566)
  build(deps): bump actions/stale from 5 to 6 (python#97701)
  pythonGH-95172 Make the same version `versionadded` oneline (python#95172)
  pythongh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (python#32073)
  pythongh-93738: Documentation C syntax (Function glob patterns -> literal markup) (python#97774)
  ...
mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
…n#91566)

* improve performance of get_proxies_environment when there are many environment variables

* 📜🤖 Added by blurb_it.

* fix case of short env name

* fix formatting

* fix whitespace

* whitespace

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

* whitespace

* Update Misc/NEWS.d/next/Library/2022-04-15-11-29-38.gh-issue-91539.7WgVuA.rst

Co-authored-by: Carl Meyer <carl@oddbird.net>

* Update Lib/urllib/request.py

Co-authored-by: Carl Meyer <carl@oddbird.net>

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Carl Meyer <carl@oddbird.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants