
Cannot cleanly shut down an asyncio based server #113538

Open
CendioOssman opened this issue Dec 28, 2023 · 27 comments
Labels
topic-asyncio · type-bug (An unexpected behavior, bug, or error)

Comments

@CendioOssman
Contributor

CendioOssman commented Dec 28, 2023

Bug report

Bug description:

When writing an asyncio based service, you basically have this sequence:

  1. Create an event loop
  2. Register a SIGTERM handler
  3. Start your server
  4. loop.run_forever()
  5. SIGTERM causes a loop.stop()
  6. Close the server
  7. Close event loop
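
As a minimal sketch of that sequence (the echo protocol and addresses are illustrative, and `add_signal_handler` is a Unix-only API):

```python
import asyncio
import signal

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Echo everything back to the client
        self.transport.write(data)

def serve():
    # 1. Create an event loop
    loop = asyncio.new_event_loop()
    # 2. Register a SIGTERM handler (Unix only)
    loop.add_signal_handler(signal.SIGTERM, loop.stop)
    # 3. Start the server (port 0 picks any free port)
    server = loop.run_until_complete(
        loop.create_server(EchoProtocol, "127.0.0.1", 0))
    try:
        # 4./5. Run until SIGTERM triggers loop.stop()
        loop.run_forever()
    finally:
        # 6. Stop accepting new connections; live connections survive...
        server.close()
        # 7. ...so closing the loop now abandons them, and they emit
        #    ResourceWarnings at interpreter shutdown.
        loop.close()
```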

If there are any connections active at this point, then they don't get discarded until interpreter shutdown, with the result that you get a bunch of ResourceWarnings (and cleanup code might not run).

It would be very useful if there were a Server.close_clients() or something like it. Even a Server.all_transports() would help, as you could then do something similar to calling Task.cancel() on each task returned by asyncio.all_tasks().

We could poke at Server._transports, but that is something internal that might change in the future.

There is Server.wait_closed(), but that hangs until all clients have gracefully disconnected. It doesn't help us when we want to shut down the service now.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

@CendioOssman added the type-bug label Dec 28, 2023
@CendioOssman
Contributor Author

This might be a symptom of a more general issue, that there is a need for cancelling everything that's dependent on the event loop we are shutting down.

Currently, it is possible to cancel tasks, async generators, and executors. The issue here is (sort-of) about shutting down pending file descriptors. But it might also be necessary to be able to cancel pending futures, and possibly more things?

@gvanrossum
Member

I've always assumed that it is up to the application to keep track of connections (etc.) in a form that is suitable for shutting down. After all, in a typical application there's more to clean up than the client connections -- there may be database connections, and who knows what else.

I would definitely not mess with pending futures -- those are sufficiently low-level that you cannot know who is waiting for them and why, and in which order they should be cancelled.

The key point of control you have is probably cancelling tasks -- tasks can do cleanup in response to a cancellation request, and if your tasks form a tree (or even a DAG), they will be cancelled recursively (if task t1 is waiting for task t2, cancelling t1 first cancels t2). So perhaps your application should keep track of the main task per connection?
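
A sketch of that suggestion, assuming the application uses the streams API with a plain-function callback so it can create and track the per-connection task itself (all names here are illustrative):

```python
import asyncio

# Application-owned registry of per-connection tasks
connection_tasks: set[asyncio.Task] = set()

async def handle(reader, writer):
    try:
        while data := await reader.read(1024):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

def on_connection(reader, writer):
    # Plain-function callback: we create the task, so we can track it
    task = asyncio.get_running_loop().create_task(handle(reader, writer))
    connection_tasks.add(task)
    task.add_done_callback(connection_tasks.discard)

async def shutdown(server):
    server.close()                  # stop accepting new connections
    tasks = list(connection_tasks)  # snapshot; the set shrinks as tasks die
    for task in tasks:
        task.cancel()               # cancellation propagates to awaited work
    await asyncio.gather(*tasks, return_exceptions=True)
    await server.wait_closed()
```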

@gvanrossum
Member

gvanrossum commented Dec 28, 2023

Another thought is that this might be something that a higher-level framework or library could offer. E.g. gunicorn or aiohttp? (Note that I've never used either, so I may be way off base here.)

@CendioOssman
Contributor Author

I've always assumed that it is up to the application to keep track of connections (etc.) in a form that is suitable for shutting down. After all in a typical application there's more to clean up than the client connections -- there may be database connections, and who know what else to handle.

asyncio unfortunately makes this difficult, as it is the one in control of creating the connections. It is done by the Server class, which you cannot override or extend. At best, you can try to track things in the protocol factory callback. But that requires you to also have handling in each protocol class, which is messy and fragile.

Cleanup should hopefully be possible using the normal notifications from the transports, i.e. connection_lost().

I would definitely not mess with pending futures -- those are sufficiently low-level that you cannot know who is waiting for them and why, and in which order they should be cancelled.

You are probably right. But it does make me uneasy that there isn't a way to centrally force a clean-up on shutdown. If you have to build your own everywhere, then it's easy to make a mistake and overlook something.

The key point of control you have is probably cancelling tasks -- tasks can do cleanup in response to a cancellation request, and if your tasks form a tree (or even a DAG), they will be cancelled recursively (if task t1 is waiting for task t2, cancelling t1 first cancels t2). So perhaps your application should keep track of the main task per connection?

Unfortunately, there isn't always a task involved. Sometimes there is just a protocol, or a construct using futures and callbacks.

Another thought is that this might be something that a higher-level framework or library could offer. E.g. gunicorn or aiohttp? (Note that I've never used either, so I may be way off base here.)

Maybe. Those two specific projects are not sufficient, though, as they are just for HTTP.

@gvanrossum
Member

Okay, let's dive into the original problem a bit more. I think I would like to see a "scaled-down" example of a realistic server that we might be able to use to demonstrate the problem, reason about it, and evaluate potential solutions. It should be small enough that I can understand the code in 5-10 minutes (my typical attention span :-). Would you be willing to code up a first attempt at such an example? We can iterate on it together.

Regarding Futures, these are used all over the place (internally in asyncio) with specific semantics, sometimes involving cancellation. If we were to start cancelling Futures automatically when a loop is stopped that could have all sorts of unexpected consequences -- probably worse than leaving them alone. So let's declare this out of scope (or if it really bugs you, please open a separate issue, as it has nothing to do with closing servers).

@CendioOssman
Contributor Author

This should be a rough mockup of the situation:

#!/usr/bin/python3

import asyncio

class EchoServerProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        peername = transport.get_extra_info('peername')
        print('Connection from {}'.format(peername))
        self.transport = transport

    def data_received(self, data):
        message = data.decode()
        print('Data received: {!r}'.format(message))

        print('Send: {!r}'.format(message))
        self.transport.write(data)

        print('Close the client socket')
        self.transport.close()

def stop():
    loop = asyncio.get_running_loop()
    loop.stop()

def server_up(fut):
    global server
    server = fut.result()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

fut = asyncio.ensure_future(loop.create_server(
        lambda: EchoServerProtocol(),
        '127.0.0.1', 8888))
fut.add_done_callback(server_up)

# Simulate termination
loop.call_later(2.5, stop)

try:
    loop.run_forever()
finally:
    server.close()
    # Can't run this as it will hang:
    #loop.run_until_complete(server.wait_closed())
    loop.close()

If you run this in dev mode and connect a client, you see the issue:

$ python3.12 -X dev echosrv.py 
Connection from ('127.0.0.1', 56290)
sys:1: ResourceWarning: unclosed <socket.socket fd=7, family=2, type=1, proto=6, laddr=('127.0.0.1', 8888), raddr=('127.0.0.1', 56290)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib64/python3.12/asyncio/selector_events.py:875: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

@jsundahl

jsundahl commented Jan 2, 2024

This wasn't a problem until 3.11.5. Here's a simpler example of what I think is the same root issue:

#!/usr/bin/env python3

import asyncio

async def main():
    async def on_connection(reader, writer):
        while True:
            buf = await reader.read(1024)
            print("got bytes", buf)


    server = await asyncio.start_unix_server(on_connection, path="/tmp/sock")
    cr, cw = await asyncio.open_unix_connection(path="/tmp/sock")
    print("sending hi")
    cw.write(b"hi")
    await cw.drain()

    await asyncio.sleep(0.2)

    server.close()
    await server.wait_closed()


asyncio.run(main())

Then, using pyenv to select the Python version and running it, we get the following:

$ echo "3.11.4" > .python-version 
$ ./t.py 
sending hi
got bytes b'hi'

I have yet to look at the underlying code but I'm guessing the server implementation in <=3.11.4 is cancelling the reader.read coroutine and ignoring the cancellation error as I'd expect.

$ echo "3.11.5" > .python-version 
$ ./t.py 
sending hi
got bytes b'hi'
Exception ignored in: <function StreamWriter.__del__ at 0x7fe69d0f2c00>
Traceback (most recent call last):
  File "/home/jsundahl/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 396, in __del__
    self.close()
  File "/home/jsundahl/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 344, in close
    return self._transport.close()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jsundahl/.pyenv/versions/3.11.5/lib/python3.11/asyncio/selector_events.py", line 860, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/home/jsundahl/.pyenv/versions/3.11.5/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/home/jsundahl/.pyenv/versions/3.11.5/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

skipping 3.11.6, it's identical to 3.11.5.
3.11.7 is slightly different:

$ echo "3.11.7" > .python-version 
$ ./t.py 
sending hi
got bytes b'hi'
Exception in callback StreamReaderProtocol.connection_made.<locals>.callback(<Task cancell...oon/./t.py:6>>) at /home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/streams.py:248
handle: <Handle StreamReaderProtocol.connection_made.<locals>.callback(<Task cancell...oon/./t.py:6>>) at /home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/streams.py:248>
Traceback (most recent call last):
  File "/home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/streams.py", line 249, in callback
    exc = task.exception()
          ^^^^^^^^^^^^^^^^
  File "/home/jsundahl/scripts/callsoon/./t.py", line 8, in on_connection
    buf = await reader.read(1024)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/streams.py", line 708, in read
    await self._wait_for_data('read')
  File "/home/jsundahl/.pyenv/versions/3.11.7/lib/python3.11/asyncio/streams.py", line 540, in _wait_for_data
    await self._waiter
asyncio.exceptions.CancelledError

then 3.12.0 and 3.12.1 hang indefinitely on wait_closed and have the same issue in StreamWriter.__del__ that 3.11.5 has on SIGINT.

@jsundahl

jsundahl commented Jan 2, 2024

looks like this is related to #109538

@gvanrossum
Member

gvanrossum commented Jan 3, 2024

With @CendioOssman's example (connecting a client that doesn't send anything, just hangs) I get different results -- at first I thought I saw what they saw, but then I couldn't repro it again. Weird, let's say that's my fault.

With @jsundahl's example I get exactly what they got; main is the same as 3.12. Let me do some spelunking.

@gvanrossum
Member

A bit more...

  • In 3.10 and before, server.wait_closed() was a no-op, unless you called it before server.close(), in a task (asyncio.create_task(server.wait_closed())). The unclosed connection was just getting abandoned.
  • I attempted to fix this in 3.12.0, and backported the fix (I think this ended up in 3.11.5).
  • Something was wrong with the fix and an improved fix landed in 3.12.1. I think this is what you have in 3.11.7 (3.11.5 regression: StreamWriter.__del__ fails if event loop is already closed #109538).

Alas, the fix waits until all handlers complete, which in this example is never. Hence the hang. I guess the correct way to write the handler is to put a timeout on the await reader.read(1024) call.
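
For example (a sketch only; READ_TIMEOUT is an arbitrary application choice), the handler from the reproducer could bound its reads with asyncio.wait_for:

```python
import asyncio

READ_TIMEOUT = 5.0  # arbitrary idle limit chosen by the application

async def on_connection(reader, writer):
    try:
        while True:
            try:
                buf = await asyncio.wait_for(reader.read(1024), READ_TIMEOUT)
            except asyncio.TimeoutError:
                # Idle too long: give up so wait_closed() can finish
                break
            if not buf:
                break  # client closed its end
            print("got bytes", buf)
    finally:
        writer.close()
        await writer.wait_closed()
```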

I have to think about this more, but I'm out of time for today. Sorry.

@CendioOssman
Contributor Author

I think the use case for server.wait_closed() is perfectly reasonable for some scenarios. So I don't think we should change that. If you have the option of trusting your clients to finish in a reasonable time, then it works great. What's missing is a method for when you don't trust your clients.

E.g. a clean version of:

    for fd, transport in loop._transports.items():
        transport.abort()
    loop.run_until_complete(server.wait_closed())

@gvanrossum
Member

One workaround: if you're happy with the old (pre-3.11) behavior, just delete the wait_closed() call. It used to be a no-op, and apparently you were happy with that, so you ought to be happy with not calling it at all. :-)

If you don't trust your clients, you can just exit the process without prejudice, right? Perhaps after a timeout. Any warnings or errors getting logged at that point will help you track down the unreliable bits.

I am reluctant to backport any more fixes to 3.12 or 3.11, given how sketchy this has been already.

I don't think we can afford to keep track of all transports -- those are owned by the application, not by the loop. However, you could (after a delay) enumerate all tasks and cancel them. That should clean things up.
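
A sketch of that task-sweeping cleanup (the helper name is made up; this is not an asyncio API):

```python
import asyncio

async def cancel_remaining_tasks():
    # Cancel every task except the one doing the cancelling
    current = asyncio.current_task()
    tasks = [t for t in asyncio.all_tasks() if t is not current]
    for task in tasks:
        task.cancel()
    # Give the tasks a chance to run their cleanup handlers;
    # return_exceptions=True swallows the CancelledErrors.
    await asyncio.gather(*tasks, return_exceptions=True)
```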

@jsundahl

jsundahl commented Jan 3, 2024

@gvanrossum that's only kind of a solution for 3.12. For 3.11 there's still that "event loop is closed" issue. It seems to me like it should be feasible for the server to own its "on_connection" tasks and cancel them on close.

@jsundahl

jsundahl commented Jan 3, 2024

some way to forcibly cancel this task defined here

self._task = self._loop.create_task(res)

@gvanrossum
Member

Okay, now I see one improvement: the task.exception() call in callback() inside StreamReaderProtocol.connection_made() should be guarded by if not task.cancelled(), so that when the task is cancelled, the callback (which was added purely to log the error) doesn't crash and cause more errors to be logged.
This is as simple as adding this to the top of the callback:

    if task.cancelled():
        transport.close()
        return

and this can be backported to 3.12 and 3.11.

I think you will find that with this change, even when that task lingers because the connection is never closed, it doesn't cause any problems.

Shall I proceed with that?

@kumaraditya303 I suppose this was an oversight in your original PR #111601 ?

@jsundahl

jsundahl commented Jan 3, 2024

Yeah that's definitely an improvement that could be made. I don't think that fixes the hang in python 3.12 though, but maybe that's not something to take on for 3.12. I'm going to try to find some time in the next couple days to dig in a bit deeper and get a dev environment set up to play around with this issue.

@gvanrossum
Member

I'm actually getting cold feet about that fix. The task exists because asyncio.streams.start_server() takes a client_connected_cb which may be either a plain function or a coroutine; if it's a coroutine, it's run in its own task.

Not much is said about the case where client_connected_cb fails. If it's a plain function, it will cause connection_made to fail. It looks like connection_made methods of protocols in general are called using loop.call_soon(), so that means that errors are logged but nothing else is done about it. The transport may remain in a bad state?

If it's a coroutine (i.e., async def), it's wrapped in a task. I think #111601 added the callback so that something is logged when the coroutine fails (causing the task to exit with an exception); the issue #110894 describes a scenario where the client_connected_cb is a coroutine that always fails immediately, and the transport is never closed. The fix retrieves the exception, logs it, and closes the transport; if there's no exception, it logs nothing and leaves the transport open.

But what should happen if the task is cancelled? The current state of affairs is that the task.exception() call raises CancelledError, the callback then raises that, something logs that, and the transport remains open. I would like to avoid the spurious logged exception, but should it close the transport or not?

@gvanrossum
Member

FWIW the hang is because the callback never returns and the (toy) application has neither a timeout on its reads nor another way to exit its infinite loop. One way for the application to have more control would be to use a plain function as the client_connected_cb instead, and let that plain function create its own task that reads from the reader and writes to the writer -- the application can then manage that task as it pleases.

The most likely reason the task is cancelled would seem to be when ^C is hit. In this case, if we don't close the transport, we get these messages (in addition to the traceback):

sys:1: ResourceWarning: unclosed <socket.socket fd=8, family=1, type=1, proto=0, laddr=/tmp/sock>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/guido/cpython/Lib/asyncio/selector_events.py:865: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=8>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/guido/cpython/Lib/asyncio/streams.py:414: ResourceWarning: loop is closed
  warnings.warn("loop is closed", ResourceWarning)
/Users/guido/cpython/Lib/asyncio/selector_events.py:865: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>

If we do close the transport, that's reduced to this:

/Users/guido/cpython/Lib/asyncio/streams.py:414: ResourceWarning: loop is closed
  warnings.warn("loop is closed", ResourceWarning)
/Users/guido/cpython/Lib/asyncio/selector_events.py:865: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>

Those remaining messages are about the client (cr, cw) so these are expected. If I arrange to call cw.close() before leaving main(), they go away.

So, concluding, I think that closing the transport when the task is cancelled is the right thing to do. If you agree, please approve the PR.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 4, 2024
…task is cancelled (pythonGH-113690)

(cherry picked from commit 4681a52)

Co-authored-by: Guido van Rossum <guido@python.org>
@gvanrossum
Member

@jsundahl @CendioOssman Is there anything else you think needs to be done here? You have my view on the hang in 3.12.

gvanrossum added a commit that referenced this issue Jan 4, 2024
… task is cancelled (GH-113690) (#113714)

(cherry picked from commit 4681a52)

Co-authored-by: Guido van Rossum <guido@python.org>
gvanrossum added a commit that referenced this issue Jan 4, 2024
… task is cancelled (GH-113690) (#113713)

(cherry picked from commit 4681a52)

Co-authored-by: Guido van Rossum <guido@python.org>
@jsundahl

jsundahl commented Jan 4, 2024

nope, thanks for taking action on this @gvanrossum !

@gvanrossum
Member

You're welcome! I hope I didn't break yet another workflow this time. :-)

@CendioOssman
Contributor Author

I don't think we can afford to keep track of all transports -- those are owned by the application, not by the loop. However, you could (after a delay) enumerate all tasks and cancel them. That should clean things up.

In case that wasn't clear, loop._transports is something that already exists. It just isn't an official API.

@jsundahl @CendioOssman Is there anything else you think needs to be done here? You have my view on the hang in 3.12.

Unfortunately, that only discusses tasks. In my example, there are no tasks, and hence no mechanism to cancel/close anything. So I'm afraid I still think there is something missing in the API.

If you don't trust your clients, you can just exit the process without prejudice, right? Perhaps after a timeout. Any warnings or errors getting logged at that point will help you track down the unreliable bits.

But I'd like to avoid cluttering up logs with those warnings and errors. There is nothing unreliable in the given scenario. External actors simply cannot generally be trusted to be well-behaved, and a robust server should be able to deal with that.

There might also be other clean-ups that would trigger on a transport close that currently isn't being executed.

The ideal scenario might look something like:

  1. Call close() on all open transports
  2. Give them some time to close gracefully
  3. Call abort() on all still open transports
  4. Give some time for callbacks to react to the closed transports
  5. cancel() any remaining tasks
  6. Give some time for the tasks to clean up
  7. Terminate, abandoning any tasks that still refuse to die

(ideally 2, 4, and 6 are not fixed sleeps, but can finish early if everything terminates cleanly)
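
Under the assumption that the application can somehow enumerate its transports (which is exactly what's missing today), steps 1-6 might be sketched as:

```python
import asyncio

async def staged_shutdown(transports, grace=1.0):
    # 1. Ask every open transport to close gracefully
    for tr in transports:
        if not tr.is_closing():
            tr.close()
    # 2. Give them some time (a real server would wait on events, not sleep)
    await asyncio.sleep(grace)
    # 3. Force-close whatever is still open
    for tr in transports:
        if not tr.is_closing():
            tr.abort()
    # 4. Let connection_lost() callbacks run
    await asyncio.sleep(0)
    # 5. Cancel any remaining tasks
    current = asyncio.current_task()
    tasks = [t for t in asyncio.all_tasks() if t is not current]
    for task in tasks:
        task.cancel()
    # 6. Give the tasks a chance to clean up
    await asyncio.gather(*tasks, return_exceptions=True)
```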

@CendioOssman
Contributor Author

The quick fix would probably be to promote loop._transports() to an official API and create asyncio.all_transports(), equivalent to asyncio.all_tasks().

@gvanrossum
Member

@CendioOssman

In case that wasn't clear, loop._transports is something that already exists. It just isn't an official API.

I'm very reluctant to make a public API for loop._transports() -- reading python/asyncio#372, where it is introduced, makes it clear that it only exists to be able to diagnose one particular error condition in user code (calling add_reader/add_writer for a socket that's owned by a transport). In a future version we might decide to implement that differently, or only in debug mode, or whatever.

What I also learned from that issue is that transports are supposed to clean up when they are GC'ed. I am guessing that in your scenario this isn't enough, because the protocol never exits so it never releases the transport. (I forget what keeps protocols alive -- it could be just a reference cycle between transport and protocol.)

Finally, there's a whole other world on Windows, where instead of selectors we (by default) use proactors. A proactor event loop does not keep track of its transports, because there is no need (proactors don't support add_reader and friends).

Unfortunately, that only discusses tasks. In my example, there are no tasks, and hence no mechanism to cancel/close anything. So I'm afraid I still think there is something missing in the API.

Aha! While @jsundahl's example used asyncio.start_unix_server, passing a coroutine as client_connected_cb, causing it to run in a task created by StreamReaderProtocol, your example defines its own protocol class (EchoServerProtocol) and uses the lower-level loop.create_server call. Indeed there is no task in this case. (I regret that asyncio has so many different ways to do things -- I always recommend using coroutines, but the low-level protocol/transport APIs also exist, and there's even lower-level add_reader etc.)

So the question is how to set a timeout on a protocol. I think the answer is that your protocol should define a method on_timeout which is scheduled using asyncio.get_running_loop().call_later(DELAY, self.on_timeout), saving the returned TimerHandle instance as an instance variable on the protocol. Then in data_received and connection_lost you cancel that handle, and in data_received you create a new one. The on_timeout method should close the transport. (In fact, you can probably get away with just calling loop.call_later(DELAY, transport.close), if you don't want to log those timeouts.)

If that looks like a lot of boilerplate, maybe it is, but I'm sure you could define a helper class to make it simpler.

External actors simply cannot generally be trusted to be well-behaved [...]

By this you are referring to clients, right? I completely agree. But I think it is the job of the application to deal with these -- the server infrastructure cannot know when a client is assumed to be misbehaving. If you get warnings in the logs due to misbehaving clients, your first reaction ought to be to add something to your application code (e.g. a timeout like I suggested above). That's why the warnings are being logged in the first place.

So, I am still firm in my insistence that applications already have everything they need to handle misbehaving clients (in particular those that never close their connection), and a universal mechanism to track down and close all transports is not needed.

@gvanrossum
Member

I'm going to assume silence is assent. Later this week I'd like to close this issue as completed, unless there is pushback. (If you need more time or feel I am too pushy, just add a brief note and I'll keep it open.)

@CendioOssman
Contributor Author

I'm very reluctant to make a public API for loop._transports() -- reading python/asyncio#372, where it is introduced, makes it clear that it only exists to be able to diagnose one particular error condition in user code (calling add_reader/add_writer for a socket that's owned by a transport). In a future version we might decide to implement that differently, or only in debug mode, or whatever.

That is the current use case, but I don't see why that would prevent us from adding more things if we think they are useful and worthwhile? :)

We can always lock down correct behaviour using unit tests.

What I also learned from that issue is that transports are supposed to clean up when they are GC'ed. I am guessing that in your scenario this isn't enough, because the protocol never exits so it never releases the transport. (I forget what keeps protocols alive -- it could be just a reference cycle between transport and protocol.)

Abandoning things doesn't seem very Pythonic. And the current code complains with a ResourceWarning, so I am not convinced that's the intended way to do things. I read that comment as "it's a common antipattern to abandon transports, so we cannot rely on close() or abort() being called."

And transports are kept alive by the main loop anyway, so they cannot be abandoned. Which is a good thing! If you're using a protocol-based system, the protocol will often not be able to start tasks until it has had some initial communication over the transport (e.g. reading an HTTP request).

Finally, there's a whole other world on Windows, where instead of selectors we (by default) use proactors. A proactor event loop does not keep track of its transports, because there is no need (proactors don't support add_reader and friends).

That's just a small matter of implementation. :)

The key issue is what API we want/need. I'm confident there is a way to implement tracking of transports on Windows as well.

(I regret that asyncio has so many different ways to do things -- I always recommend using coroutines, but the low-level protocol/transport APIs also exist, and there's even lower-level add_reader etc.)

We actually prefer the low-level API for our core stuff. It makes the asynchronous nature more visible, which is very important when trying to write robust code. In a coroutine, it's much easier to miss where it executes concurrently with other things and create races.

So the question is how to set a timeout on a protocol.

I think there might be some misunderstanding of the scenario here. Our use case is when the service is shutting down. Until then, the clients can have all the time in the world. Which means we cannot start the timeout from the protocol. It has no idea when the service is shutting down.

But I think it is the job of the application to deal with these -- the server infrastructure cannot know when a client is assumed to be misbehaving.

I fully agree! The application is the only party that can decide this. So that's not what we're asking for. What we're asking for is better tools to find the misbehaving clients. Which in this scenario is every client¹, so the definition is pretty straightforward.

¹ So "misbehaving" might not be a fully appropriate description in this case

So, I am still firm in my insistence that applications already have everything they need to handle misbehaving clients (in particular those that never close their connection), and a universal mechanism to track down and close all transports is not needed.

I'm afraid I still don't agree. Yes, it is possible to solve the issue. But it is messy, which to me indicates something is missing in the core infrastructure.

We need a list of all active transports so we can call close() or abort() on them. To do this ourselves, I only see one option, which is to do it in the protocols, using connection_made() and connection_lost(). Annoying since there are many protocols, and this is a universal problem. The protocols are also isolated from the Server (which is nice and modular), which means we need to provide some connection between those as well.

Options that aren't available:

a) Do it in the protocol factory. This is more tightly coupled to the Server and its context. However, we only get a call when connections are started, not when they end. We also don't get the transport at this point for us to save away.

b) Subclassing the event loop or the Server object. There are no publicly defined hooks that would allow us to track transports.
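
For what it's worth, the connection_made()-based tracking described above can be packaged as a wrapper around the protocol factory; this is a sketch of the workaround, not a proposed API (all names are hypothetical):

```python
import asyncio
import weakref

class TransportTracker:
    """Wrap a protocol factory so every live transport of one server
    can be reached later (e.g. to abort them all at shutdown)."""

    def __init__(self, protocol_factory):
        self._factory = protocol_factory
        self.transports = weakref.WeakSet()

    def __call__(self):
        protocol = self._factory()
        original_connection_made = protocol.connection_made
        tracker = self

        def connection_made(transport):
            # Intercept the transport as it is handed to the protocol
            tracker.transports.add(transport)
            original_connection_made(transport)

        protocol.connection_made = connection_made
        return protocol

    def abort_all(self):
        for transport in list(self.transports):
            if not transport.is_closing():
                transport.abort()
```

Passing TransportTracker(EchoServerProtocol) as the factory to loop.create_server() would then let the shutdown path call tracker.abort_all() before loop.close().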

@gvanrossum
Member

I'm sorry, but I still think you are misunderstanding the architecture around transports and protocols. What you want to do is access all transports associated with a particular Server instance. But the _transports weak value dict exists on the event loop. Transports in there may be owned by any Server instance associated with the loop, or by none (you can call create_connection yourself). There are also transports that aren't registered with the loop at all (e.g. loop.subprocess_shell() or loop.connect_write_pipe()).

I would have less of a problem with a proposal that added a weak set of transports associated with a particular Server. When a *SocketTransport is instantiated, the server (or None) is passed to the constructor. We could use this to add the transport to the weak set and remove it later -- just look for _attach() and _detach() calls. It'd then be easy to add a method to the server that returns the transports in that weak set (or maybe just the ones that aren't yet closing) and then you can close or abort those.

And transports are kept alive by the main loop anyway, so they cannot be abandoned.

Where? The ._transports attribute is a weak value dict, which doesn't keep its values alive.
