Bypass the pathological case of too many threads #17615
Conversation
the-knights-who-say-ni commented Dec 15, 2019

Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!
semenyaka commented Dec 15, 2019
The _adjust_thread_count() function does not check the result of starting a new thread. In the extreme case where a thread cannot be created, an exception is raised and no new future is created. Catching this exception in the main program doesn't help: there is no future anyway, and the new worker isn't running. This creates situations like the following:
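```
>>> def placeholder():
...     pass
...
>>> import concurrent.futures
>>> exe = concurrent.futures.ThreadPoolExecutor(10000)
>>> futlist = [exe.submit(placeholder) for _ in range(4095)]
>>> futlist1 = [exe.submit(placeholder) for _ in range(2)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 174, in submit
    self._adjust_thread_count()
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 196, in _adjust_thread_count
    t.start()
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```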
Now not a single future is running, yet a new future cannot be created, and everything is stuck.
At the moment, there are two options:
1) terminate the current ThreadPoolExecutor and recreate it with fewer workers;
2) adjust the value of _max_workers.
Option 1) has many disadvantages, the main ones being:
- the logic of the program becomes significantly more complex, and readability drops dramatically;
- if threads can be created in parallel (say, in frameworks/libraries being used), this re-creation can lead to a significant performance drop;
- workers can perform non-idempotent tasks, and recreating them can be nearly impossible.
Option 2) has one, but global, disadvantage: it is completely undocumented and relies on external access to the "intimate parts" of the class.
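For reference, with the exe executor from the session above, option 2) currently boils down to a one-liner that pokes private attributes; a sketch only, since nothing guarantees these internals stay stable across CPython versions:

```python
# Undocumented workaround (option 2): clamp the limit to the worker threads
# that were actually started, so subsequent submit() calls stop trying to
# spawn new ones. _max_workers and _threads are private internals.
exe._max_workers = max(1, len(exe._threads))
```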
In this regard, it is proposed to add optional exception handling to the _adjust_thread_count() function, which would adjust the value of _max_workers transparently for the user and let execution continue. A future is still created, and all is well.
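The effect of the proposal can be approximated today from user code by subclassing; this is only a rough sketch relying on private internals, not the actual patch:

```python
import concurrent.futures


class ShrinkingThreadPoolExecutor(concurrent.futures.ThreadPoolExecutor):
    """Rough approximation of the proposed behaviour (not the real patch)."""

    def _adjust_thread_count(self):
        try:
            super()._adjust_thread_count()
        except RuntimeError:
            # A new worker thread could not be started. The work item is
            # already on the queue, so lower the limit to the workers that
            # do exist and let them drain the queue instead of propagating
            # the error out of submit().
            self._max_workers = max(1, len(self._threads))
```

With this in place, submit() still returns a future when thread creation fails, and the workers that were started execute the queued task.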
It is also proposed to control this exception handling with a new optional shrink_on_exception parameter in the __init__ constructor of the ThreadPoolExecutor class. It defaults to False, which ensures full backward compatibility.
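Until such a parameter exists, the opt-in can only be imitated with the subclass sketched above; for example, with placeholder as defined in the reproduction session:

```python
# Approximates the proposed ThreadPoolExecutor(..., shrink_on_exception=True).
exe = ShrinkingThreadPoolExecutor(10000)
futs = [exe.submit(placeholder) for _ in range(5000)]  # no RuntimeError escapes
# If thread creation failed along the way, exe._max_workers now reflects the
# number of workers that could actually be started.
```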
This approach has one drawback: the number of threads that a ThreadPoolExecutor can create may vary dynamically over time (if threads are being created in other parts of the program in parallel). Repeatedly reducing _max_workers can therefore leave it at a pathologically low value. To let the user manage this situation, it is also proposed to add a documented method ThreadPoolExecutor.set_size(new_max_workers: int) -> int, which for now would only change the value of _max_workers.
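A minimal sketch of what such a method could look like, assuming (this is not specified above) that it returns the previous limit; set_size() does not exist in any released Python:

```python
from concurrent.futures import ThreadPoolExecutor


class ResizableThreadPoolExecutor(ThreadPoolExecutor):
    def set_size(self, new_max_workers: int) -> int:
        """Set a new worker limit and return the previous one (sketch only)."""
        if new_max_workers <= 0:
            raise ValueError("new_max_workers must be a positive integer")
        old_limit, self._max_workers = self._max_workers, new_max_workers
        # Raising the limit takes effect lazily: new worker threads are only
        # spawned by subsequent submit() calls; lowering it never terminates
        # already-running workers.
        return old_limit
```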