
On Linux, os.count() should read cgroup cpu.shares and cpu.cfs (CPU count inside docker container) #80235

Open
keirlawson mannequin opened this issue Feb 20, 2019 · 25 comments
Labels
3.12 interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage stdlib Python modules in the Lib dir

Comments


keirlawson mannequin commented Feb 20, 2019

BPO 36054
Nosy @vstinner, @giampaolo, @tiran, @jab, @methane, @matrixise, @corona10, @Zheaoli, @mcnelsonphd, @NargiT, @Redeyed

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-02-20.16:56:22.384>
labels = ['interpreter-core', '3.8', '3.7', 'library', 'performance']
title = 'On Linux, os.count() should read cgroup cpu.shares and cpu.cfs (CPU count inside docker container)'
updated_at = <Date 2021-10-11.14:25:34.762>
user = 'https://github.com/keirlawson'

bugs.python.org fields:

activity = <Date 2021-10-11.14:25:34.762>
actor = 'corona10'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Library (Lib)']
creation = <Date 2019-02-20.16:56:22.384>
creator = 'keirlawson'
dependencies = []
files = []
hgrepos = []
issue_num = 36054
keywords = []
message_count = 24.0
messages = ['336117', '336126', '336146', '336148', '336149', '338724', '338778', '338780', '338783', '339401', '339404', '339429', '339439', '353690', '364894', '364898', '364901', '364902', '364916', '365071', '365075', '366811', '403622', '403654']
nosy_count = 12.0
nosy_names = ['vstinner', 'giampaolo.rodola', 'christian.heimes', 'jab', 'methane', 'matrixise', 'corona10', 'Manjusaka', 'mcnelsonphd', 'galen', 'nargit', 'RedEyed']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue36054'
versions = ['Python 3.7', 'Python 3.8']


keirlawson mannequin commented Feb 20, 2019

There appears to be no way to detect the number of CPUs allotted to a Python program within a docker container. With the following script:

import os

print("os.cpu_count(): " + str(os.cpu_count()))
print("len(os.sched_getaffinity(0)): " + str(len(os.sched_getaffinity(0))))

when run in a container (from an Ubuntu 18.04 host) I get:

docker run -v "$PWD":/src/ -w /src/ --cpus=1 python:3.7 python detect_cpus.py
os.cpu_count(): 4
len(os.sched_getaffinity(0)): 4

Recent versions of Java are able to correctly detect the CPU allocation:

docker run -it --cpus 1 openjdk:10-jdk
Feb 20, 2019 4:20:29 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
| Welcome to JShell -- Version 10.0.2
| For an introduction type: /help intro

jshell> Runtime.getRuntime().availableProcessors()
$1 ==> 1

@keirlawson keirlawson mannequin added 3.7 stdlib Python modules in the Lib dir performance Performance or resource usage labels Feb 20, 2019

matrixise commented Feb 20, 2019

I would like to work on this issue.


matrixise commented Feb 20, 2019

So, I have also tested with the latest golang docker image.

docker run --rm --cpus 1 -it golang /bin/bash

Here is the Go code:

package main

import "fmt"
import "runtime"

func main() {
    cores := runtime.NumCPU()
    fmt.Printf("This machine has %d CPU cores.\n", cores)
}

Here is the output

./demo
This machine has 4 CPU cores.

When I grep /proc/cpuinfo, I get this result:

grep processor /proc/cpuinfo -c
4

I will test with openjdk, since this report involves Java, and see if I can get a result of 1.


matrixise commented Feb 20, 2019

ok, I didn't see your test with openjdk:10, sorry


keirlawson mannequin commented Feb 20, 2019

I believe this is related to this ticket: https://bugs.python.org/issue26692

Looking at Java's implementation, it seems they check whether cgroups are enabled via /proc/self/cgroup and, if so, parse the cgroup information out of the filesystem.
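As a rough illustration of that first step, here is a minimal sketch of parsing the /proc/self/cgroup format; the helper name parse_proc_cgroup is my own, not something from OpenJDK or the stdlib:

```python
def parse_proc_cgroup(text):
    """Parse /proc/self/cgroup content into a {controller: path} mapping.

    Hypothetical helper: v1 lines look like "3:cpu,cpuacct:/docker/<id>",
    while the v2 unified hierarchy uses a single "0::/path" line.
    """
    mapping = {}
    for line in text.strip().splitlines():
        _hierarchy_id, controllers, path = line.split(":", 2)
        # v2 lines have an empty controller list, which maps the "" key
        for name in controllers.split(","):
            mapping[name] = path
    return mapping
```

A caller would feed it `open('/proc/self/cgroup').read()` and then look up the "cpu" controller's path to locate the right files under /sys/fs/cgroup.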


matrixise commented Mar 24, 2019

I am really sorry: I thought I would work on this issue, but that's no longer the case. Feel free to submit a PR.

@matrixise matrixise added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) 3.8 labels Mar 24, 2019

Zheaoli mannequin commented Mar 25, 2019

I think that I may work on a PR for this issue. Has anybody already worked on it?


matrixise commented Mar 25, 2019

Hi Manjusaka,

Could you explain your solution? I have read the OpenJDK code (C++) and, to be honest, it was not complex but not really clear.

Also, if you need my help for the review or for the construction of your
PR, I can help you.

Have a nice day,

Stéphane


Zheaoli mannequin commented Mar 25, 2019

Hi Stéphane

Thanks a lot!

In my opinion, I would like to make an independent library named cgroups. For ease of use and compatibility, I think that is better than combining the code with the os module.

Thanks for your work!

Manjusaka


Zheaoli mannequin commented Apr 3, 2019

Hi Stéphane:

I have checked the JVM implementation of the container improvements. I confirm that we may need a new library for the container environment; I don't think combining it into the os module is a good idea. I will make a PR during this week.


tiran commented Apr 3, 2019


Zheaoli mannequin commented Apr 4, 2019

Yes, and not only CPU: it should also support getting the real memory limit.

look at https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits


matrixise commented Apr 4, 2019

Yes, and not only CPU: it should also support getting the real memory limit.

look at https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits

Yep, but in this case, you have to create another issue for the memory limit.


mcnelsonphd mannequin commented Oct 1, 2019

Is this issue still being worked on as a core feature? I needed a solution for this on 2.7.11 to make some old code work properly in a container environment on AWS Batch, and was forced to figure out what OpenJDK was doing and came up with a solution. The process in OpenJDK seems to be: find where the cgroups for docker are located in the file system, then, depending on the values in different files, determine the number of CPUs available.

The inelegant code below is what worked for me:

import os

def query_cpu():
    cpu_quota = -1
    avail_cpu = None
    if os.path.isfile('/sys/fs/cgroup/cpu/cpu.cfs_quota_us'):
        cpu_quota = int(open('/sys/fs/cgroup/cpu/cpu.cfs_quota_us').read().rstrip())
        # -1 means "no quota" (the case on AWS Batch); a positive value works on local linux systems
    if cpu_quota != -1 and os.path.isfile('/sys/fs/cgroup/cpu/cpu.cfs_period_us'):
        cpu_period = int(open('/sys/fs/cgroup/cpu/cpu.cfs_period_us').read().rstrip())
        # quota divided by period gives the number of CPUs allotted to the container, rounded down if fractional
        avail_cpu = int(cpu_quota / cpu_period)
    elif os.path.isfile('/sys/fs/cgroup/cpu/cpu.shares'):
        cpu_shares = int(open('/sys/fs/cgroup/cpu/cpu.shares').read().rstrip())
        # For AWS, cpu.shares is the correct value multiplied by 1024
        avail_cpu = int(cpu_shares / 1024)
    return avail_cpu

This solution makes several assumptions about the cgroup locations within the container, rather than dynamically finding where those files are located as OpenJDK does. I also haven't included the more robust method for the case where cpu.quota and cpu.shares are -1.

Hopefully this is a start for getting this implemented.

@vstinner vstinner changed the title Way to detect CPU count inside docker container On Linux, os.count() should read cgroup cpu.shares and cpu.cfs (CPU count inside docker container) Oct 1, 2019

Zheaoli mannequin commented Mar 23, 2020

Hello Mike, thanks for your code.

I think it's a good way.

I think that if cpu.quota and cpu.shares are -1, just returning the original os.cpu_count() value is OK.
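That fallback could be sketched like this; cpu_count_with_quota is a hypothetical name for illustration, not a proposed API:

```python
import os

def cpu_count_with_quota(quota, period):
    """Hypothetical sketch: derive a container CPU count from a cfs quota.

    A quota of -1 (or None) means "no limit", so we fall back to the
    host-wide os.cpu_count() value, as suggested above.
    """
    if quota is None or quota <= 0:
        return os.cpu_count()
    # Round down, but never report fewer than one usable CPU
    return max(1, quota // period)
```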


Zheaoli mannequin commented Mar 23, 2020

I will make a PR this week.


vstinner commented Mar 23, 2020

I'm not sure that it's a good idea to change os.cpu_count(). I suggest adding a new function instead.

os.cpu_count() is documented as:
"Return the number of CPUs in the system."
https://docs.python.org/dev/library/os.html#os.cpu_count

By the way, the documentation adds:

"This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with len(os.sched_getaffinity(0))"
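Combining the two documented calls gives the usual portable pattern; usable_cpus below is a hypothetical helper name, not an existing os function:

```python
import os

def usable_cpus():
    """Number of CPUs the current process may run on (hypothetical helper).

    The affinity mask reflects sched_setaffinity restrictions, but note it
    does NOT reflect cgroup cfs quotas, which is the gap this issue is about.
    """
    if hasattr(os, "sched_getaffinity"):  # Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # cpu_count() may return None
```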


tiran commented Mar 23, 2020

I suggest that you provide a low-level solution that returns general information from cgroup v1 and the unified cgroup v2, rather than a solution that focuses on CPU only. Then you can provide a high-level interface that returns the effective CPU cores.

cgroup v2 (the unified hierarchy) has been around for 6 years and is slowly gaining traction as container platforms start to support it.
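For cgroup v2, the CPU limit lives in a single cpu.max file whose content is "<quota> <period>", or "max <period>" when unlimited. A minimal parsing sketch, with a hypothetical helper name:

```python
def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content (hypothetical helper).

    Returns the CPU limit as a float (e.g. 1.5 CPUs), or None when the
    quota field is the literal "max", meaning no limit is set.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)
```

A caller on a v2 system would feed it `open('/sys/fs/cgroup/cpu.max').read()` and decide how to round the fractional result.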


Zheaoli mannequin commented Mar 24, 2020

Actually, we already have some third-party libs that support cgroups, but most of them have these problems:

  1. They are not in the stdlib.

  2. They only support cgroup v1.

But if we want to add a new stdlib module, should we create a PEP?


Zheaoli mannequin commented Mar 26, 2020

Hello guys, I have some ideas about this issue.

First, maybe we should add a new API named cpu_usable_count(). I think it's more meaningful than len(os.sched_getaffinity(0)).

Second, more and more people use docker to run their apps today, so people need an official way to get the environment info: not just the CPU, but also the memory and network traffic limits. Because docker is based on cgroups in Linux, maybe we can add a cgroup lib as an officially supported lib.

But I'm not sure about this idea, because there are some problems:

  1. cgroups are only supported on Linux. I'm not sure that adding a platform-specific lib is a good idea.

  2. Many languages have not added official cgroup support yet, although some are optimized for cgroups (such as Java in the JVM).


vstinner commented Mar 26, 2020

Hello guys,

Please try to find a more inclusive way to say hello: https://heyguys.cc/ ;-)


NargiT mannequin commented Apr 20, 2020

Do we have any news about this?


RedEyed mannequin commented Oct 11, 2021

Do we have any news about this?

There is an IBM effort to do this at the container level, so that os.cpu_count() will return the right result in a container:

https://www.phoronix.com/scan.php?page=news_item&px=Linux-CPU-Namespace


corona10 commented Oct 11, 2021

There is an IBM effort to do this at the container level, so that os.cpu_count() will return the right result in a container

Good news!

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

d1gl3 commented Jun 10, 2022

> (quoting mcnelsonphd's query_cpu() solution from Oct 1, 2019, above, in full)

FYI, this solution was actually implemented in pylint==2.14.0. Yesterday pylint crashed due to that solution when run in a Kubernetes pod. Pods can actually get cpu_shares < 1024, which leads to 0 available cores due to rounding down.
That crashed pylint, as it was using this logic to determine how many processes to pass to multiprocessing (which needs at least 1) when --jobs=0 was passed.

I fixed this for pylint by setting avail_cpu to 1 if 0 has been calculated. (see pylint PR)
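The fix amounts to clamping the computed value to at least one worker; a minimal sketch, where effective_jobs is my own name for illustration, not pylint's:

```python
def effective_jobs(avail_cpu):
    """Clamp a derived CPU count to a value multiprocessing can accept.

    Hypothetical sketch: cpu.shares below 1024 floor-divides to 0, and
    query_cpu()-style helpers may also return None when no cgroup files
    exist, so both cases must collapse to at least 1.
    """
    return max(avail_cpu or 0, 1)
```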

6 participants