Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upRefresh aspired servables/versions following config update #1518
Conversation
Currently when the configured model list is updated via a call to handleReloadConfigRequest, the request thread blocks until any newly added models become available. Their availability however depends on the filesystem polling thread rescanning the filesystem at some periodic interval, meaning that there's an arbitrary delay before the requested changes actually take effect and the RPC returns. This problem may not be very noticeable with the default polling interval of 1 second, but seems undesirable for longer intervals and in particular makes API-based dynamic reconfiguration incompatible with the --file_system_poll_wait_seconds=0 setting (in this case all handleReloadConfigRequest calls time-out and do not take effect).
I have opened this against 1.15 since that's the version we are using, but can rebase on a different branch if needed. Also apologies in advance for the code, I am not very familiar with C++. |
tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc
Show resolved
Hide resolved
Can we add unit test coverage? |
Thanks @christisg, I've pushed a commit to address your logging comment. I will aim to add unit test coverage when I get a chance... it will take me a bit longer due to unfamiliarity with C++ and the codebase/test framework. |
@njhill thanks for reporting this bug. It's painful and took me more than 3 hrs to figure out REAL behavior when setting |
Currently when the configured model list is updated via a call to
handleReloadConfigRequest
, the request thread blocks until any newly added models become available.Their availability however depends on the filesystem polling thread rescanning the filesystem at some periodic interval, meaning that there's an arbitrary delay before the requested changes actually take effect and the RPC returns.
This problem may not be very noticeable with the default polling interval of 1 second, but seems undesirable for longer intervals and in particular makes API-based dynamic reconfiguration incompatible with the
--file_system_poll_wait_seconds=0
setting (in this case allhandleReloadConfigRequest
calls time-out and do not take effect).Fixes #1519