Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make altinstall for PGO is not parallel-safe #93584

Closed
tiran opened this issue Jun 7, 2022 · 7 comments
Closed

make altinstall for PGO is not parallel-safe #93584

tiran opened this issue Jun 7, 2022 · 7 comments
Assignees
Labels
3.11 3.12 build release-blocker type-bug

Comments

@tiran
Copy link
Member

@tiran tiran commented Jun 7, 2022

Bug report

setup.py has a helper method add_multiarch_paths. The method creates a temporary file build/lib.{platform}/multiarch and unlinks it at the end of the function call. This is not parallel-safe.

PGO builds of Python use $(MAKE), so called recursive make. Recursive makes are considered harmful because the main make process has no understanding what the child make process is doing. With heavy parallel makes this can cause race conditions.

The combination of unsafe add_multiarch_paths(), recursive make and loooots of CPU cores can lead to build issues like #84461 (comment).

Possible workarounds

  1. Rewrite our Makefile to not use $(MAKE)
  2. Move CC -print-multiarch and dpkg-architecture ... -qDEB_HOST_MULTIARCH calls to configure.ac
  3. Fix _bootsubprocess.check_output() and add_multiarch_paths() to use a more safe tmp file

Option (3) is the simplest approach. Even tmpfile = os.path.join(self.build_temp, f'multiarch-{os.getpid()}') would be good enough to avoid file conflicts in parallel builds.

@tiran tiran added type-bug build 3.11 3.12 labels Jun 7, 2022
@tiran tiran self-assigned this Jun 7, 2022
tiran added a commit to tiran/cpython that referenced this issue Jun 7, 2022
tiran added a commit to tiran/cpython that referenced this issue Jun 7, 2022
tiran added a commit to tiran/cpython that referenced this issue Jun 7, 2022
@tiran
Copy link
Member Author

@tiran tiran commented Jun 7, 2022

My PR solved the race condition in add_multiarch_paths() but revealed another race condition later.

@pablogsal 's problem is caused by the fact that PGO builds use recursive make. The sharedmods target and the profile-opt target run ./python setup.py build at the same time. setup.py is not designed to be parallel-safe.

tiran added a commit to tiran/cpython that referenced this issue Jun 7, 2022
All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.

"test" targets now correctly compile PGO build in a clean repo.
tiran added a commit to tiran/cpython that referenced this issue Jun 7, 2022
All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.

"test" targets now correctly compile PGO build in a clean repo.
@ethanhs
Copy link
Contributor

@ethanhs ethanhs commented Jun 7, 2022

Weird, I never hit a compilation failure after 2 hours of trying on my 32 core machine, maybe this is an edge case you can only hit with the macOS scheduler.

@tiran
Copy link
Member Author

@tiran tiran commented Jun 7, 2022

Did you create a build with configure --enable-optimizations? The issue only occurs with PGO builds.

@ethanhs
Copy link
Contributor

@ethanhs ethanhs commented Jun 7, 2022

Yeah, I was running a script in a loop using the steps from Pablo's reproducer here: #84461 (comment)

After 2 hours it still hadn't failed

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Jun 7, 2022

Yeah, I was running a script in a loop using the steps from Pablo's reproducer here: #84461 (comment)

After 2 hours it still hadn't failed

That's odd. I have been able to reproduce in at least 3 systems, including Linux with arch Linux and RHEL7

@tiran
Copy link
Member Author

@tiran tiran commented Jun 8, 2022

In comment #93589 (comment) @nascheme suggested to rename some of the targets to make the intention more clear. I like the idea, but I don't want to mix improvements with a release blocker fix.

@tiran tiran changed the title setup.py add_multiarch_paths() is not parallel-safe with recursive make make altinstall for PGO is not parallel-save Jun 8, 2022
tiran added a commit that referenced this issue Jun 8, 2022
All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.

"test" targets now correctly compile PGO build in a clean repo.
tiran added a commit to tiran/cpython that referenced this issue Jun 8, 2022
…ythonGH-93589)

All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.

"test" targets now correctly compile PGO build in a clean repo.
(cherry picked from commit 243ed54)

Co-authored-by: Christian Heimes <christian@python.org>
tiran added a commit that referenced this issue Jun 8, 2022
) (GH-93603)

All install targets use the "all" target as synchronization point to
prevent race conditions with PGO builds. PGO builds use recursive make,
which can lead to two parallel `./python setup.py build` processes that
step on each others toes.

"test" targets now correctly compile PGO build in a clean repo.
(cherry picked from commit 243ed54)

Co-authored-by: Christian Heimes <christian@python.org>
@gpshead gpshead changed the title make altinstall for PGO is not parallel-save make altinstall for PGO is not parallel-safe Jun 15, 2022
@tiran
Copy link
Member Author

@tiran tiran commented Jun 15, 2022

@pablogsal As far as I know this bug has been fixed by GH-93589.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 3.12 build release-blocker type-bug
Projects
None yet
Development

No branches or pull requests

3 participants